in

InaPlex Forums

Forums and downloads for InaPlex customers and partners.

Removing NULLs and other illegal characters from text

Last post 03-13-2009 5:29 by admin. 2 replies.
Page 1 of 1 (3 items)
Sort Posts: Previous Next
  • 11-08-2008 18:39

    Removing NULLs and other illegal characters from text

    A frequent problem with data migrations is text data with illegal characters in it. For example, the GoldMine notes fields sometimes contains NULL characters (binary 0), which can be problematic when inserting the data into SQL Server.

    Inaport and Sage Migrator provide the snip() function, which can be used to remove characters or strings from a field. For example:

    snip("aa111bb", "1")

    will remove the "1" characters from the string, returning "aabb".

    The characters to be snipped can be built from a regular expression. In particular, \xNN allows you to specify the binary value of any character. Examples:

    \x00  -  NULL
    \x0A  -  Carriage Return
    \x0D  -  Line Feed

    So to snip the NULL character from a field, do:

    snip(#myField, "\x00")

    If you need to remove a range of characters, use the regular expression "[]" construct, which allows you to specify a set or range of characters:

    [\x00\x02]  -  specifes the characters \x00 and \x02
    [\x00-\x08]  -  specifies all characters between \x00 and \x08

    So to snip a range of characters fro the field, do:

    snip(#myField, "[\x00-\x08]+")

     

    Filed under: ,
  • 03-12-2009 20:26 In reply to

    • DougW
    • Top 10 Contributor
    • Joined on 03-25-2008
    • Posts 9

    Re: Removing NULLs and other illegal characters from text

    This is a great example of how Inaport can be used to cleanse data that's already in our database! Thanks.

    Just two corrections to your tutorial. You need to use the Hex code for the ASCII characters, not binary as you stated. And the Hex codes for some of your examples appear to be reversed. I found two published charts of ASCII/Hex/etc codes at these sites:

       http://www.asciitable.com/

       http://www.pcguide.com/res/tablesASCII-c.html

    I used this tutorial as a way to cleanse phone number fields of bogus characters like ( ) - . : and spaces using this expression:

       snip(#MAINPHONE, "[\x20\x28\x29\x2D\x2E\x3A]")

  • 03-13-2009 5:29 In reply to

    Re: Removing NULLs and other illegal characters from text

    Hi Doug

    Thank you for the comments and the feedback - nice to get! I am sorry that I was not clear that hexadecimal and binary are equivalent.

    Your example is a good one. As is often the case, there are also other ways of achieving this.

    Using regular expressions, "\d" stands for any digit character, and "\D" stands for any NON-digit character. You can therefore do this:
          snip(#MAINPHONE, "\D")
    This will remove any non-digit character. You might then feed the result to the usphfmt() function to format it as (123)456-7890, like this:
          usphfmt(snip(#MAINPHONE, "\D"))

    Inaport also has a stripphno() function, that will remove all non-digit characters; it is a bit faster because it does not invoke the regular expression engine. The speed difference is negligible in all but extreme circumstances.

    Regards

    Admin

     

    Filed under:
Page 1 of 1 (3 items)
InaPlex Limited 2011