Regex with non-capturing group in C#

Solution 1

mc[0].Captures is equivalent to mc[0].Groups[0].Captures. Groups[0] always refers to the whole match, so there will only ever be the one Capture associated with it. The part you're looking for is captured in group #1, so you should be using mc[0].Groups[1].Captures.

But your regex is designed to match the whole input in one attempt, so the Matches() method will always return a MatchCollection with only one Match in it (assuming the match is successful). You might as well use Match() instead:

  Match m = Regex.Match(source, jointPattern);
  if (m.Success)
  {
    foreach (Capture c in m.Groups[1].Captures)
    {
      Console.WriteLine(c.Value);
    }
  }

output:

1            0.000000E+00           0.975415E+01           0.616921E+01
2            0.000000E+00           0.000000E+00           0.000000E+00
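
To make the difference between the two capture collections concrete, here is a minimal, self-contained sketch. The class name CaptureDemo, the method CapturesOf, and the sample text are hypothetical and only for illustration. The sample assumes \r\n line endings with a trailing line break, because the question's pattern requires \r\n after every data line.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original question's code.
public static class CaptureDemo
{
    // Sample input modeled on the question's data. The explicit \r\n line
    // endings (including the trailing one) are an assumption.
    public const string Sample =
        " JOINTS   DISPL.-X   DISPL.-Y   ROTATION\r\n" +
        "\r\n" +
        "    1   0.000000E+00   0.975415E+01   0.616921E+01\r\n" +
        "    2   0.000000E+00   0.000000E+00   0.000000E+00\r\n";

    // The pattern from the question, unchanged.
    public const string JointPattern =
        @"JOINTS.*\s*(?:(\d*\s*\S*\s*\S*\s*\S*)\r\n\s*)*";

    // Returns the text of every capture recorded for the given group number.
    public static List<string> CapturesOf(string source, int groupNumber)
    {
        var values = new List<string>();
        Match m = Regex.Match(source, JointPattern);
        if (m.Success)
        {
            foreach (Capture c in m.Groups[groupNumber].Captures)
            {
                values.Add(c.Value);
            }
        }
        return values;
    }
}
```

With this sample, CaptureDemo.CapturesOf(CaptureDemo.Sample, 0) yields a single item (the whole match, header included), while CaptureDemo.CapturesOf(CaptureDemo.Sample, 1) yields one item per data line.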

Solution 2

I would not use Regex for the heavy lifting here; I would just parse the text line by line.

var data = @"     JOINTS               DISPL.-X               DISPL.-Y               ROTATION


         1            0.000000E+00           0.975415E+01           0.616921E+01
         2            0.000000E+00           0.000000E+00           0.000000E+00";

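// Requires: using System.Linq; using System.Text.RegularExpressions;
// Split into lines and drop the blank ones.
var lines = data.Split('\r', '\n').Where(s => !string.IsNullOrWhiteSpace(s));
// Each run of non-whitespace characters is one field.
var regex = new Regex(@"(\S+)");

// One sequence of field strings per line; the header line is included as the first item.
var dataItems = lines.Select(s => regex.Matches(s)).Select(m => m.Cast<Match>().Select(c => c.Value));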
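
For readers who prefer plain loops to LINQ, the same line-by-line idea can be sketched without any regex. The class JointTableParser and its tuple result are hypothetical names chosen for illustration (the question's own JointOutput type is not shown in full, so it is not used here). The sketch assumes every data row has exactly four whitespace-separated fields and that the numbers use invariant-culture formatting.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper for illustration; assumes four whitespace-separated fields per data row.
public static class JointTableParser
{
    // Parses rows of the form "<joint> <x> <y> <rotation>" and skips any line
    // (such as the header) whose first field is not an integer.
    public static List<(int Joint, double X, double Y, double Rotation)> Parse(string text)
    {
        var rows = new List<(int Joint, double X, double Y, double Rotation)>();
        string[] lines = text.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string line in lines)
        {
            // A null separator array makes Split break on any whitespace.
            string[] fields = line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
            if (fields.Length != 4)
            {
                continue;
            }

            if (!int.TryParse(fields[0], NumberStyles.Integer, CultureInfo.InvariantCulture, out int joint))
            {
                continue; // header or other non-data line
            }

            // InvariantCulture keeps "0.975415E+01" from being misread on
            // machines whose decimal separator is a comma.
            rows.Add((
                joint,
                double.Parse(fields[1], CultureInfo.InvariantCulture),
                double.Parse(fields[2], CultureInfo.InvariantCulture),
                double.Parse(fields[3], CultureInfo.InvariantCulture)));
        }

        return rows;
    }
}
```

Calling JointTableParser.Parse(data) on the sample text above should give two rows; the header is dropped because "JOINTS" does not parse as an integer.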

Solution 3

There are two problems: the repeating part (?:...) is not matching properly, and the .* is greedy, so it can consume more of the input than intended and leave the repeating part nothing to match.

Use this instead:

JOINTS.*?[\r\n]+(?:\s*(\d+\s*\S*\s*\S*\s*\S*)[\r\n\s]*)*

This has a non-greedy leading part, ensures that the line-matching part starts on a new line (not in the middle of a title), and uses [\r\n\s]* in case the newlines are not exactly as you expect.
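
As a rough sketch of how this pattern could be used from C#, the snippet below reads the per-line captures from group 1 (as Solution 1 explains, the default Captures property only covers group 0). The class JointLineExtractor and the method ExtractLines are hypothetical names for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration.
public static class JointLineExtractor
{
    // The pattern suggested above, unchanged.
    private const string Pattern =
        @"JOINTS.*?[\r\n]+(?:\s*(\d+\s*\S*\s*\S*\s*\S*)[\r\n\s]*)*";

    // Returns one string per data line, taken from the captures of group 1.
    public static List<string> ExtractLines(string source)
    {
        var result = new List<string>();
        Match m = Regex.Match(source, Pattern);
        if (m.Success)
        {
            foreach (Capture c in m.Groups[1].Captures)
            {
                result.Add(c.Value);
            }
        }
        return result;
    }
}
```

Because the pattern accepts any mix of \r and \n and does not require a line break after the last row, this version should also cope with input that uses bare \n line endings.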

Personally, I would use regexes for this, but I like regexes :-) If you happen to know that the structure of the string will always be [title][newline][newline][lines] then perhaps it's more straightforward (if less flexible) to just split on newlines and process accordingly.

Finally, you can use regex101.com or one of the many other regex testing sites to help debug your regular expressions.

Solution 4

Why not just capture the values and ignore the rest? Here is a regex that extracts the values using named groups.

string data = @"JOINTS DISPL.-X DISPL.-Y ROTATION
 1 0.000000E+00 0.975415E+01 0.616921E+01
 2 0.000000E+00 0.000000E+00 0.000000E+00";

string pattern = @"^
\s+
 (?<Joint>\d+)
\s+
 (?<ValX>[^\s]+)
\s+
 (?<ValY>[^\s]+)
\s+
 (?<Rotation>[^\s]+)";

var result = Regex.Matches(data, pattern, RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace | RegexOptions.ExplicitCapture)
                  .OfType<Match>()
                  .Select (mt => new
                  {
                    Joint = mt.Groups["Joint"].Value,
                    ValX  = mt.Groups["ValX"].Value,
                    ValY  = mt.Groups["ValY"].Value,
                    Rotation = mt.Groups["Rotation"].Value,
                  });
/* result is
IEnumerable<> (2 items)
Joint ValX ValY Rotation
1 0.000000E+00 0.975415E+01 0.616921E+01
2 0.000000E+00 0.000000E+00 0.000000E+00
*/

Author: ian93

Updated on June 11, 2022

Comments

  • ian93
    ian93 almost 2 years

    I am using the following Regex

    JOINTS.*\s*(?:(\d*\s*\S*\s*\S*\s*\S*)\r\n\s*)*
    

    on the following type of data:

     JOINTS               DISPL.-X               DISPL.-Y               ROTATION
    
    
         1            0.000000E+00           0.975415E+01           0.616921E+01
         2            0.000000E+00           0.000000E+00           0.000000E+00
    

    The idea is to extract two groups, each containing a line (starting with the Joint Number, 1, 2, etc.) The C# code is as follows:

    string jointPattern = @"JOINTS.*\s*(?:(\d*\s*\S*\s*\S*\s*\S*)\r\n\s*)*";
    MatchCollection mc = Regex.Matches(outFileSection, jointPattern );
    foreach (Capture c in mc[0].Captures)
    {
        JointOutput j = new JointOutput();
        string[] vals = c.Value.Split();
        j.Joint = int.Parse(vals[0]) - 1;
        j.XDisplacement = float.Parse(vals[1]);
        j.YDisplacement = float.Parse(vals[2]);
        j.Rotation = float.Parse(vals[3]);
        joints.Add(j);
    }
    

    However, this does not work: rather than returning two captured groups (the inside group), it returns one group: the entire block, including the column headers. Why does this happen? Does C# deal with un-captured groups differently?

    Finally, are RegExes the best way to do this? (I really do feel like I have two problems now.)

  • ian93
    ian93 about 11 years
    Nope, still doesn't work. It gives one big capture group containing everything from JOINTS through to the last floating point number.
  • Cameron
    Cameron about 11 years
    @ian93: Try it now, I fixed the newline handling/start of line handling. Also, why are you using Matches if you know there's only going to be one match?
  • ian93
    ian93 about 11 years
    I think I may go with this approach, but using less Linq, as I have to maintain the code and looking at yours I have no idea what's going on...
  • Dustin Kingen
    Dustin Kingen about 11 years
    Might want to learn Linq because it's really powerful. The last line matches each line, pulls out everything that isn't a space, and then extracts the values from the Match objects inside each MatchCollection.
  • tttony
    tttony about 11 years
    @Cameron it will match only the first, add * or + at the end
  • Cameron
    Cameron about 11 years
    @tttony: Oops, I forgot a character when I copied it from where I was testing it. Good catch, thanks :-)
  • Jake H
    Jake H about 11 years
    I admit, it is kinda funny when a guy using regex statements dismisses a linq statement for looking complex. There is power in being succinct; it applies to linq as well as regexes.
  • Cameron
    Cameron about 11 years
    You know, I actually checked MSDN to find out how the Captures property works (I've never used it myself), and I didn't notice that it refers to group 0 (which is obviously the main cause of consternation for the OP). +1!