Blog coding and discussion of coding about JavaScript, PHP, CGI, general web building etc.

Wednesday, March 8, 2017

C# Regular expressions, retrieving two words separated by a comma, parenthesis operator



I've been playing around with retrieving data from a string using regular expressions, mostly as an exercise for myself. The pattern that I'm trying to match looks like this:

"(SomeWord,OtherWord)"  

After reading some documentation and looking at a cheat sheet I came to the conclusion that the following regex should give me 2 matches:

"\((\w),(\w)\)"  

That is because, according to the documentation, parentheses do the following:

(pattern) Matches pattern and remembers the match. The matched substring can be retrieved from the resulting Matches collection, using Item [0]...[n]. To match parentheses characters ( ), use "\(" or "\)".
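The escaping rule at the end of that quote can be checked with a small sketch, written as a top-level C# program (C# 9 or later). It uses Regex.Escape, which the question does not mention, to produce the escaped form of a literal string, and then shows what happens when the parentheses are left unescaped. The input strings are only illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// Regex.Escape backslash-escapes regex metacharacters such as ( and ),
// so the literal text "(A,B)" becomes the pattern \(A,B\).
string escaped = Regex.Escape("(A,B)");
Console.WriteLine(escaped);          // prints \(A,B\)

// Hand-written equivalent: \( and \) match literal parenthesis characters.
bool literalMatch = Regex.IsMatch("(A,B)", @"^\(A,B\)$");
Console.WriteLine(literalMatch);     // prints True

// Unescaped, ( and ) form a capturing group instead of matching
// parenthesis characters, so only the text A,B is matched.
Match unescaped = Regex.Match("(A,B)", "(A,B)");
Console.WriteLine(unescaped.Value);  // prints A,B
```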

However, the following code (error checking removed for conciseness) matches something quite different:

string line = "(A,B)";
string pattern = @"\((\w),(\w)\)";
MatchCollection matches = Regex.Matches(line, pattern);
string left = matches[0].Value;
string right = matches[1].Value;

Now I would expect left to become "A" and right to become "B". However, left becomes "(A,B)" and there is no second match at all. What am I missing here?
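To see what the question's code actually produces, here is a small sketch (a top-level C# program, C# 9 or later) that prints the match count and the first match's value. It guards the second index instead of reading matches[1] directly, because indexing past the end of a MatchCollection throws an ArgumentOutOfRangeException; the "(no second match)" placeholder is just for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

string line = "(A,B)";
string pattern = @"\((\w),(\w)\)";
MatchCollection matches = Regex.Matches(line, pattern);

// The whole pattern matches the input once, so there is exactly one Match.
Console.WriteLine(matches.Count);      // prints 1
// Value is the text matched by the entire pattern, parentheses included.
Console.WriteLine(matches[0].Value);   // prints (A,B)
// matches[1] would throw ArgumentOutOfRangeException, so check Count first.
string right = matches.Count > 1 ? matches[1].Value : "(no second match)";
Console.WriteLine(right);              // prints (no second match)
```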

(I know this example is trivial to solve without regexes, but to learn how to use regexes properly I should be able to make something as simple as this work.)

Answer by pstrjds for C# Regular expressions, retrieving two words separated by a comma, parenthesis operator


You want the Groups member of the first match. In your example there is only one match, which is the whole string. Its Groups collection has three items: groups[0] is the whole string, and groups[1] and groups[2] are the two captured values. Try this sample code; left should be A and right should be B.

string line = "(A,B)";
string pattern = @"\((\w),(\w)\)";
MatchCollection matches = Regex.Matches(line, pattern);
GroupCollection groups = matches[0].Groups;
string left = groups[1].Value;
string right = groups[2].Value;
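The point about the Groups collection becomes visible when every group is printed. This sketch, a top-level C# program, uses Regex.Match, which returns the first match directly, in place of Regex.Matches(...)[0]; for this input both give the same Match.

```csharp
using System;
using System.Text.RegularExpressions;

string line = "(A,B)";
string pattern = @"\((\w),(\w)\)";
Match match = Regex.Match(line, pattern);

// Groups[0] is always the whole match; Groups[1] and Groups[2] are the
// two capturing (...) groups, numbered by their opening parenthesis.
GroupCollection groups = match.Groups;
for (int i = 0; i < groups.Count; i++)
{
    Console.WriteLine($"groups[{i}] = {groups[i].Value}");
}
// prints:
// groups[0] = (A,B)
// groups[1] = A
// groups[2] = B
```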

Answer by Olivier Jacot-Descombes for C# Regular expressions, retrieving two words separated by a comma, parenthesis operator


\w matches only one word character. If words have to contain at least one character, the expression should be:

string pattern = @"\((\w+),(\w+)\)";   

If words may be empty:

string pattern = @"\((\w*),(\w*)\)";   

+ means one or more repetitions.

* means zero or more repetitions.

In any case, you will get one match with three groups: the first contains the whole string, including the left and right parentheses, and the other two contain the two words.
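The difference between \w, \w+ and \w* can be tested side by side. In this sketch (a top-level C# program) TwoWords is a hypothetical helper, not part of the .NET API: it returns the two captured words, or null when the pattern does not match. The input strings are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns groups 1 and 2, or null if the pattern fails.
static (string Left, string Right)? TwoWords(string input, string pattern)
{
    Match m = Regex.Match(input, pattern);
    if (!m.Success) return null;
    return (m.Groups[1].Value, m.Groups[2].Value);
}

string line = "(SomeWord,OtherWord)";

// \w matches exactly one character, so multi-character words fail to match.
var single = TwoWords(line, @"\((\w),(\w)\)");
// \w+ requires at least one word character per word.
var oneOrMore = TwoWords(line, @"\((\w+),(\w+)\)");
// \w* also accepts an empty word, as in "(,B)" ...
var emptyLeft = TwoWords("(,B)", @"\((\w*),(\w*)\)");
// ... while \w+ rejects it.
var emptyRejected = TwoWords("(,B)", @"\((\w+),(\w+)\)");

Console.WriteLine(single is null);                               // prints True
Console.WriteLine($"{oneOrMore?.Left} {oneOrMore?.Right}");     // prints SomeWord OtherWord
Console.WriteLine($"[{emptyLeft?.Left}] [{emptyLeft?.Right}]"); // prints [] [B]
Console.WriteLine(emptyRejected is null);                        // prints True
```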

Answer by Timothy Khouri for C# Regular expressions, retrieving two words separated by a comma, parenthesis operator


First off, it's one "match" with two capturing "groups".

I would recommend naming the groups anyway:

string pattern = @"\((?<FirstWord>\w+),(?<SecondWord>\w+)\)";  

Then you could do...

Match m = Regex.Match(line, pattern);
string firstWord = m.Groups["FirstWord"].Value;
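Here is a fuller version of the named-group approach as a top-level C# program. .NET writes a named capturing group as (?<name>...). FirstWord comes from the answer, while SecondWord is an assumed name for the second group, and throwing InvalidOperationException on a failed match is just one possible way to handle bad input.

```csharp
using System;
using System.Text.RegularExpressions;

string line = "(SomeWord,OtherWord)";
// "FirstWord" is from the answer; "SecondWord" is an assumed name.
string pattern = @"\((?<FirstWord>\w+),(?<SecondWord>\w+)\)";

Match m = Regex.Match(line, pattern);
if (!m.Success)
{
    throw new InvalidOperationException("Input is not in (word,word) form.");
}

// Named groups are looked up by name instead of by position.
string firstWord = m.Groups["FirstWord"].Value;
string secondWord = m.Groups["SecondWord"].Value;
Console.WriteLine($"{firstWord} / {secondWord}");  // prints SomeWord / OtherWord
```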

Answer by ean5533 for C# Regular expressions, retrieving two words separated by a comma, parenthesis operator


I think the problem is that you're confusing the concepts of a match and a group.

A MatchCollection contains one Match for each place in the input where your entire regex matched, not the parenthetical groups inside that regex. For example, if the string you searched looked like this...

(A,B)(C,D)  

...then you would have two matches: (A,B) and (C,D).

However, there's good news: you can get the groups from each match very easily, like so:

string line = "(A,B)";
string pattern = @"\((\w),(\w)\)";
MatchCollection matches = Regex.Matches(line, pattern);
string left = matches[0].Groups[1].Value;
string right = matches[0].Groups[2].Value;

The Groups property is a collection of the parenthetical groups from a single match. Groups[0] is the whole match, so the first parenthesized group is Groups[1].
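The (A,B)(C,D) example can be run directly: looping over Regex.Matches yields one Match per parenthesized pair, and each Match carries its own Groups. This top-level C# sketch switches \w to \w+, following the point about quantifiers, so that longer words would also work; the pairs list exists only to collect the results.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string line = "(A,B)(C,D)";
string pattern = @"\((\w+),(\w+)\)";

// Each Match is one place where the whole pattern matched;
// Groups[1] and Groups[2] are the words captured inside that match.
var pairs = new List<(string Left, string Right)>();
foreach (Match match in Regex.Matches(line, pattern))
{
    pairs.Add((match.Groups[1].Value, match.Groups[2].Value));
    Console.WriteLine($"{match.Value} -> {match.Groups[1].Value}, {match.Groups[2].Value}");
}
// prints:
// (A,B) -> A, B
// (C,D) -> C, D
```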

Edit: Olivier Jacot-Descombes made a very good point: we all got so hung up explaining match vs. group that we forgot to notice a second problem: \w will only match a SINGLE character. You need to add a quantifier (such as +) in order to grab more than one character at a time. Olivier's answer should explain that part clearly.

Answer by Novus for C# Regular expressions, retrieving two words separated by a comma, parenthesis operator


Since all you are looking for are the characters separated by a comma, you can simply use \w as your pattern. The matches will be A and B.
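This simpler approach can be sketched as a top-level C# program. Each \w match is a single character, so it only works when the "words" are one character long; the second half swaps in \w+, an extension of the answer, for longer words. Neither pattern checks for the surrounding parentheses or the comma, so input such as "A B" would give the same result.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string line = "(A,B)";

// With the bare pattern \w, every word character is its own Match,
// so the parentheses and the comma are simply skipped.
string[] letters = Regex.Matches(line, @"\w")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();
Console.WriteLine(string.Join(" ", letters));   // prints A B

// For multi-character words the same idea needs \w+;
// \w alone would split "SomeWord" into single letters.
string[] words = Regex.Matches("(SomeWord,OtherWord)", @"\w+")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();
Console.WriteLine(string.Join(" ", words));     // prints SomeWord OtherWord
```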

A handy site for testing your Regex is http://gskinner.com/RegExr/


