Split text with a delimiter and Text Qualifier

I read an article on CodeProject by Larry Steinle about splitting a string on a given delimiter while honouring a text qualifier. Larry's example used a FULL-STOP as the delimiter and a DOUBLE-QUOTE as the text qualifier.

If you are unsure what I am talking about, consider this example. Say we have the text test string "example.".cool! and we want to split it wherever we see a FULL-STOP, which is what the normal string.Split() does anyway. But we also want DOUBLE-QUOTES to act as text qualifiers: any text enclosed between two DOUBLE-QUOTES should be treated as literal text and should not be split, even if it contains a FULL-STOP.

Input : test string "example.".cool!
Delimiter : .
Text Qualifier : "

Our result after the split should be:

  1. test string "example."
  2. cool!
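
To see why the plain string.Split() is not enough here, a minimal sketch (the class and method names are only illustrative, not from Larry's article or the attached project) shows what it returns for the same input:

```csharp
using System;

static class PlainSplitDemo
{
    // Illustrative helper: splits on every full stop, with no notion of a text qualifier.
    public static string[] SplitPlain(string input)
    {
        return input.Split('.');
    }

    static void Main()
    {
        string input = "test string \"example.\".cool!";

        // The full stop inside the quotes is treated like any other delimiter,
        // so the quoted text is cut in two and we get three parts instead of two.
        foreach (string part in SplitPlain(input))
        {
            Console.WriteLine("[" + part + "]");
        }
        // Prints:
        // [test string "example]
        // ["]
        // [cool!]
    }
}
```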

It looks like Larry intended to have a generic method that takes a given delimiter and text qualifier and does the split, and he wrote a function that does the work. I thought a regular expression might be an alternative, so I came up with some code and posted it in a comment. I was surprised how few lines I needed (3 at most; oh yeah, we could go crazy and make it a single line too). I have attached a sample project too, for your reference.

I started out simple, with the issue as stated: splitting on a delimiter while respecting a text qualifier. I was pretty sure that a regular expression was the better option. Regular expressions are easier to implement (less code and less time) and perform pretty well. Yes, hand-written code can sometimes beat a regular expression by a couple of percent, usually somewhere between 1% and 5%, but it can take days to write and optimise.

First I googled for existing code that was a 100% match for what I wanted. I had trouble finding exactly that, so I settled for something that was almost what I wanted. I knew I had parsed CSV files with a regular expression before, where the delimiter is ',' and the qualifier is '"'. I just replaced the comma with a dot, written as \. because the dot is regular expression syntax and has to be escaped. Bingo, it's all done.
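
For reference, the CSV starting point might look like the sketch below. The exact expression used for CSV parsing is not given in this post, so the pattern here is an assumption, obtained by putting the comma back in place of the escaped dot; the class and method names are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

static class CsvSplitDemo
{
    // Matches a comma only when an even number of double quotes follows it,
    // i.e. when the comma is not inside a quoted field.
    static readonly Regex CommaOutsideQuotes =
        new Regex(@",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))");

    public static string[] SplitCsvLine(string line)
    {
        return CommaOutsideQuotes.Split(line);
    }

    static void Main()
    {
        foreach (string field in SplitCsvLine("one,\"two,three\",four"))
        {
            Console.WriteLine(field);
        }
        // Prints:
        // one
        // "two,three"
        // four
    }
}
```

Note that this is only a splitter, not a full CSV parser: the quotes stay in the fields, and a line with an unbalanced quote is not handled specially.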

Regex re = new Regex(@"\.(?=(?:[^""]*""[^""]*"")*(?![^""]*""))", RegexOptions.IgnoreCase | RegexOptions.Compiled);
string[] arr = re.Split(@"test string ""example."".cool!");

Larry asked for a generic method that can support any delimiter and text qualifier. I placed {0} wherever I saw '\.' and {1} wherever I saw the double quote, and that was it:

string reString = string.Format(@"{0}(?=(?:[^{1}]*{1}[^{1}]*{1})*(?![^{1}]*{1}))", Regex.Escape(delimiter), Regex.Escape(textQualifier));
Regex re = new Regex(reString, RegexOptions.IgnoreCase | RegexOptions.Compiled | RegexOptions.Multiline);
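
Wrapped up as a method, the two lines above could look roughly like this. It is a sketch rather than the code from the attached project: the class and method names are hypothetical, and it assumes the text qualifier is a single ordinary character such as " or ', because the qualifier is placed inside a character class ([^...]) where a multi-character qualifier would not behave as intended.

```csharp
using System;
using System.Text.RegularExpressions;

static class QualifiedSplitter
{
    // Hypothetical helper. Assumption: textQualifier is a single ordinary
    // character (for example " or '), since it ends up inside [^...].
    public static string[] Split(string input, string delimiter, char textQualifier)
    {
        string d = Regex.Escape(delimiter);
        string q = Regex.Escape(textQualifier.ToString());

        // {0} = delimiter, {1} = qualifier.
        // The lookahead succeeds only when the rest of the input contains
        // complete qualifier pairs and no leftover qualifier, which means
        // the delimiter we are looking at is outside any qualified text.
        string pattern = string.Format(
            @"{0}(?=(?:[^{1}]*{1}[^{1}]*{1})*(?![^{1}]*{1}))", d, q);

        return Regex.Split(input, pattern);
    }

    static void Main()
    {
        foreach (string part in Split("test string \"example.\".cool!", ".", '"'))
        {
            Console.WriteLine(part);
        }
        // Prints:
        // test string "example."
        // cool!

        foreach (string part in Split("a|'b|c'|d", "|", '\''))
        {
            Console.WriteLine(part);
        }
        // Prints:
        // a
        // 'b|c'
        // d
    }
}
```

The sketch leaves out RegexOptions.IgnoreCase and RegexOptions.Multiline. The first would make a letter used as a delimiter match in either case, and the second only changes how ^ and $ behave, which this pattern does not use.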

I have attached the source code for anyone else who wants to try it. This is what the source code does:

[Screenshot: what the source code does]


5 thoughts on “Split text with a delimiter and Text Qualifier”

  1. Hi. Really good guide here. What if I want a slightly different result with your expression? I would like the result in your example to be:

    1. test string example.
    2. cool!

    So get rid of the " entirely, but keep the "." inside them?

    Thanks 🙂
