Regex find all occurrences of a pattern in a string

Solution 1

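When the regex engine sees the .* sequence, it first matches everything up to the end of the string and then backs off, character by character, until the rest of the pattern can match (a greedy match). To avoid the problem, you can use a non-greedy match or explicitly list the characters that may appear in each part of the string. Note that ? must stay out of those character classes, and that the pattern needs a verbatim (@) string in C#, because \? is not a valid escape in a regular string literal:

@"=\?[a-zA-Z0-9_-]*\?B\?[a-zA-Z0-9+/=]*\?="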

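To make the difference concrete, here is a minimal sketch that counts the matches each variant produces on the string from the question. CountMatches is a hypothetical helper written for this illustration, and the exact character classes in the third pattern are an assumption (letters, digits, _ and - for the charset; the base64 alphabet for the payload).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for this sketch: counts the matches of a pattern in a string.
static int CountMatches(string pattern, string input)
{
    return Regex.Matches(input, pattern, RegexOptions.IgnoreCase).Count;
}

string msg = "=?windows-1258?B?UkU6IFRyIDogUGxhbiBkZSBjb250aW51aXTpIGQnYWN0aXZpdOkgZGVz?= =?windows-1258?B?IHNlcnZldXJzIFdlYiBHb1ZveWFnZXN=?=";

string greedy = @"=\?.*\?B\?.*\?=";          // pattern from the question
string lazy = @"=\?.*?\?B\?.*?\?=";          // non-greedy quantifiers
string explicitClass = @"=\?[a-zA-Z0-9_-]*\?B\?[a-zA-Z0-9+/=]*\?="; // assumed classes

Console.WriteLine(CountMatches(greedy, msg));        // 1 (the whole string)
Console.WriteLine(CountMatches(lazy, msg));          // 2
Console.WriteLine(CountMatches(explicitClass, msg)); // 2
```

The greedy pattern only finds one match because the first .* runs to the end of the string and backtracks to the last ?B?, which belongs to the second encoded word.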
Solution 2

A non-regex way:

string msg= "=?windows-1258?B?UkU6IFRyIDogUGxhbiBkZSBjb250aW51aXTpIGQnYWN0aXZpdOkgZGVz?= =?windows-1258?B?IHNlcnZldXJzIFdlYiBHb1ZveWFnZXN=?=";
string[] charSetOccurences = msg.Split(new string[]{ " " }, StringSplitOptions.None);
foreach (string s in charSetOccurences)
{
    string charSet = s.Replace("=?", "").Replace("?B?", "").Replace("?b?", "");
    Console.WriteLine(charSet);
}

If you still want to use a regex, make each .* lazy by appending a ?. The other answers already mention this, but it seems you are still not getting matches?

string msg= "=?windows-1258?B?UkU6IFRyIDogUGxhbiBkZSBjb250aW51aXTpIGQnYWN0aXZpdOkgZGVz?= =?windows-1258?B?IHNlcnZldXJzIFdlYiBHb1ZveWFnZXN=?=";
var charSetOccurences = new Regex(@"=\?.*?\?B\?.*?\?=", RegexOptions.IgnoreCase);
var charSetMatches = charSetOccurences.Matches(msg);
foreach (Match match in charSetMatches)
{
    string charSet = match.Groups[0].Value.Replace("=?", "").Replace("?B?", "").Replace("?b?", "");
    Console.WriteLine(charSet);
}

The output is the same in both cases:

windows-1258UkU6IFRyIDogUGxhbiBkZSBjb250aW51aXTpIGQnYWN0aXZpdOkgZGVz?=
windows-1258IHNlcnZldXJzIFdlYiBHb1ZveWFnZXN=
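Note that the Replace chain used in both snippets is not symmetric: Replace("=?", "") removes the leading =? but never the trailing ?=, which is why the first line of output still ends in ?=. On the second word it also removes the base64 padding = together with the ? of the terminator, so the = left at the end is really the terminator's. A minimal sketch of stripping exactly one prefix and one suffix instead (StripDelimiters is a hypothetical helper, not part of the original answer):

```csharp
using System;

// Hypothetical helper: removes exactly one leading "=?" and one trailing "?=".
// Anything that does not look like an encoded word is returned unchanged.
static string StripDelimiters(string encodedWord)
{
    if (encodedWord.Length >= 4 &&
        encodedWord.StartsWith("=?", StringComparison.Ordinal) &&
        encodedWord.EndsWith("?=", StringComparison.Ordinal))
    {
        return encodedWord.Substring(2, encodedWord.Length - 4);
    }
    return encodedWord;
}

string first = "=?windows-1258?B?UkU6IFRyIDogUGxhbiBkZSBjb250aW51aXTpIGQnYWN0aXZpdOkgZGVz?=";

Console.WriteLine(first.Replace("=?", ""));  // the trailing "?=" is still there
Console.WriteLine(StripDelimiters(first));   // both delimiters are gone
```

The result still contains ?B? between the charset and the payload, so it can be split on ? afterwards.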

EDIT: Following your update, here is an all-in-one solution that also extracts the charset, the encoding type, and the encoded string:

string msg= "=?windows-1258?B?UkU6IFRyIDogUGxhbiBkZSBjb250aW51aXTpIGQnYWN0aXZpdOkgZGVz?= =?windows-1258?B?IHNlcnZldXJzIFdlYiBHb1ZveWFnZXN=?=";
var charSetOccurences = new Regex(@"=\?.*?\?[BQ]\?.*?\?=", RegexOptions.IgnoreCase);
MatchCollection matches = charSetOccurences.Matches(msg);
foreach (Match match in matches)
{
    string[] encoding = match.Groups[0].Value.Split(new string[]{ "?" }, StringSplitOptions.None);
    string charSet = encoding[1];
    string encodeType = encoding[2];
    string encodedString = encoding[3];
    Console.WriteLine("Charset: " + charSet);
    Console.WriteLine("Encoding type: " + encodeType);
    Console.WriteLine("Encoded String: " + encodedString + "\n");
}

Returns:

Charset: windows-1258
Encoding type: B
Encoded String: UkU6IFRyIDogUGxhbiBkZSBjb250aW51aXTpIGQnYWN0aXZpdOkgZGVz

Charset: windows-1258
Encoding type: B
Encoded String: IHNlcnZldXJzIFdlYiBHb1ZveWFnZXN=

Or, since we already have the regex, we can add capture groups and read the parts directly instead of splitting:

string msg= "=?windows-1258?B?UkU6IFRyIDogUGxhbiBkZSBjb250aW51aXTpIGQnYWN0aXZpdOkgZGVz?= =?windows-1258?B?IHNlcnZldXJzIFdlYiBHb1ZveWFnZXN=?=";
var charSetOccurences = new Regex(@"=\?(.*?)\?([BQ])\?(.*?)\?=", RegexOptions.IgnoreCase);
MatchCollection matches = charSetOccurences.Matches(msg);
foreach (Match match in matches)
{
    Console.WriteLine("Charset: " + match.Groups[1].Value);
    Console.WriteLine("Encoding type: " + match.Groups[2].Value);
    Console.WriteLine("Encoded String: " + match.Groups[3].Value + "\n");
}

Returns the same output.

Solution 3

.* is greedy: the first .* will match everything from the first =? up to the last ?B?, so the whole string comes back as a single match.

You need to use either a non-greedy match

=\?.*?\?B\?.*?\?=

or exclude ? from your list of characters

=\?[^?]*\?B\?[^?]*\?=
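The two options are not fully equivalent. As a small sketch, consider a made-up header (not from the question) in which a Q-encoded word precedes a B-encoded one. FirstMatch is a hypothetical helper for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the first match of a pattern, or "" if there is none.
static string FirstMatch(string pattern, string input)
{
    Match m = Regex.Match(input, pattern);
    return m.Success ? m.Value : "";
}

// Made-up input: a Q-encoded word followed by a B-encoded word.
string mixed = "=?utf-8?Q?abc?= =?utf-8?B?ZGVm?=";

string lazy = @"=\?.*?\?B\?.*?\?=";
string negated = @"=\?[^?]*\?B\?[^?]*\?=";

Console.WriteLine(FirstMatch(lazy, mixed));    // =?utf-8?Q?abc?= =?utf-8?B?ZGVm?=
Console.WriteLine(FirstMatch(negated, mixed)); // =?utf-8?B?ZGVm?=
```

The lazy .*? may still cross a ? while it searches for ?B?, so it starts at the first =? and swallows the Q-encoded word. [^?]* cannot cross a ?, so each field stays inside its own encoded word.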
Author by

CloudAnywhere

Updated on June 25, 2022

Comments

  • CloudAnywhere, almost 2 years ago

    I have a problem finding all occurrences of a pattern in a string.

    Consider this string:

    string msg= "=?windows-1258?B?UkU6IFRyIDogUGxhbiBkZSBjb250aW51aXTpIGQnYWN0aXZpdOkgZGVz?= =?windows-1258?B?IHNlcnZldXJzIFdlYiBHb1ZveWFnZXN=?=";
    

    I want to return the 2 occurrences (in order to later decode them):

    =?windows-1258?B?UkU6IFRyIDogUGxhbiBkZSBjb250aW51aXTpIGQnYWN0aXZpdOkgZGVz?=

    and

    =?windows-1258?B?IHNlcnZldXJzIFdlYiBHb1ZveWFnZXN=?=

    With the following regex code, it returns only 1 occurrence: the full string.

    var charSetOccurences = new Regex(@"=\?.*\?B\?.*\?=", RegexOptions.IgnoreCase);
    var charSetMatches = charSetOccurences.Matches(msg);
    foreach (Match match in charSetMatches)
    {
        charSet = match.Groups[0].Value.Replace("=?", "").Replace("?B?", "").Replace("?b?", "");
    }
    

    Do you know what I'm missing?