Check if a string contains a word, but only in a specific position?

Solution 1

I like Sertac's idea of deleting the strings enclosed in brackets and searching the remaining text. Here is a code sample extended with options for whole-word matching and case sensitivity:

function ContainsWord(const AText, AWord: string; AWholeWord: Boolean = True;
  ACaseSensitive: Boolean = False): Boolean;
var
  S: string;
  BracketEnd: Integer;
  BracketStart: Integer;
  SearchOptions: TStringSearchOptions;
begin
  S := AText;
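  // Remove every [...] pair. PosEx (StrUtils, like SearchBuf) looks for the
  // closing bracket only after the opening one, so a stray ']' that precedes
  // a '[' cannot stall the loop.
  BracketStart := Pos('[', S);
  while BracketStart > 0 do
  begin
    BracketEnd := PosEx(']', S, BracketStart + 1);
    if BracketEnd = 0 then
      Break; // unmatched '[': keep the rest of the text searchable
    Delete(S, BracketStart, BracketEnd - BracketStart + 1);
    BracketStart := Pos('[', S);
  end;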
  SearchOptions := [soDown];
  if AWholeWord then
    Include(SearchOptions, soWholeWord);
  if ACaseSensitive then
    Include(SearchOptions, soMatchCase);
  Result := Assigned(SearchBuf(PChar(S), StrLen(PChar(S)), 0, 0, AWord,
    SearchOptions));
end;

Here is an optimized version of the function, which iterates over the string with character pointers and does no string manipulation. It also explicitly handles a string with a missing closing bracket, for instance My [favorite color is. Such a string evaluates to True, because an unmatched opening bracket does not hide the text that follows it.

The principle is to walk the string character by character. When an opening bracket is found, check whether it has a matching closing bracket. If it does, search the substring from the stored position up to the opening bracket for the word. If the word is found, exit the function; if not, move the stored position to the closing bracket. If the opening bracket has no closing pair, search for the word from the stored position to the end of the string and exit the function.

For a commented version of this code, follow this link.

function ContainsWord(const AText, AWord: string; AWholeWord: Boolean = True;
  ACaseSensitive: Boolean = False): Boolean;
var
  CurrChr: PChar;
  TokenChr: PChar;
  TokenLen: Integer;
  SubstrChr: PChar;
  SubstrLen: Integer;
  SearchOptions: TStringSearchOptions;
begin
  Result := False;
  if (Length(AText) = 0) or (Length(AWord) = 0) then
    Exit;
  SearchOptions := [soDown];
  if AWholeWord then
    Include(SearchOptions, soWholeWord);
  if ACaseSensitive then
    Include(SearchOptions, soMatchCase);
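  CurrChr := PChar(AText);
  // SubstrChr and SubstrLen describe the current run of text outside brackets
  SubstrChr := CurrChr;
  SubstrLen := 0;
  while CurrChr^ <> #0 do
  begin
    if CurrChr^ = '[' then
    begin
      // scan ahead for the closing bracket that belongs to this '['
      TokenChr := CurrChr;
      TokenLen := 0;
      while (TokenChr^ <> #0) and (TokenChr^ <> ']') do
      begin
        Inc(TokenChr);
        Inc(TokenLen);
      end;
      // no closing bracket: the rest of the string counts as plain text
      if TokenChr^ = #0 then
        SubstrLen := SubstrLen + TokenLen;
      Result := Assigned(SearchBuf(SubstrChr, SubstrLen, 0, 0, AWord,
        SearchOptions));
      if Result or (TokenChr^ = #0) then
        Exit;
      // skip the bracketed part; the next run starts at the closing bracket
      CurrChr := TokenChr;
      SubstrChr := CurrChr;
      SubstrLen := 0;
    end
    else
    begin
      Inc(CurrChr);
      Inc(SubstrLen);
    end;
  end;
  // search the text that follows the last closing bracket
  Result := Assigned(SearchBuf(SubstrChr, SubstrLen, 0, 0, AWord,
    SearchOptions));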
end;

Solution 2

Regular expressions have a feature called look-around that you could use. In your case a negative lookbehind solves it: you want "favorite" unless it is preceded by an opening bracket that has not been closed yet. It could look like this:

(?<!\[[^\[\]]*)favorite

Step by step: (?<! opens the negative lookbehind; inside it we look for \[ followed by zero or more characters that are neither opening nor closing brackets, [^\[\]]*; the ) closes the lookbehind, and favorite follows right after.
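
One caveat worth knowing: the lookbehind above has no fixed length, which the .NET engine accepts but PCRE-based engines generally reject, and as far as I know Delphi's System.RegularExpressions wraps PCRE. A possible workaround is to turn the check around into a negative lookahead: accept the word unless it is followed by a ] with no other bracket in between. The sketch below assumes Delphi XE2 or later; ContainsWordOutsideBrackets is a made-up helper name, the pattern is a swapped-in alternative and not the expression from this answer, and the word is assumed to contain no regex metacharacters.

```pascal
program RegexOutsideBrackets;

{$APPTYPE CONSOLE}

uses
  System.RegularExpressions;

// Hypothetical helper, not part of the answers above. Assumes AWord holds
// no regex metacharacters, because it is pasted into the pattern as-is.
function ContainsWordOutsideBrackets(const AText, AWord: string): Boolean;
var
  Pattern: string;
begin
  // \b ... \b asks for a whole word; the negative lookahead rejects an
  // occurrence that is followed by ']' with no other bracket in between,
  // i.e. an occurrence that sits inside a [ ] pair
  Pattern := '\b' + AWord + '\b(?![^\[\]]*\])';
  Result := TRegEx.IsMatch(AText, Pattern, [roIgnoreCase]);
end;

begin
  // the three sample strings from the question
  Writeln(ContainsWordOutsideBrackets(
    'What is your favorite color? my [my favorite] color is blue',
    'favorite'));
  Writeln(ContainsWordOutsideBrackets(
    'What is your favorite color? my [blah blah] color is blue',
    'favorite'));
  Writeln(ContainsWordOutsideBrackets(
    'What is your blah blah color? my [some favorite] color is blue',
    'favorite'));
end.
```

With the question's samples this should report the first two strings as matches and the third as no match. Like the pointer version, it treats an opening bracket without a closing one as plain text.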

Author by

Admin

Updated on June 09, 2022

Comments

  • Admin
    Admin almost 2 years

    How can I check if a string contains a substring, but only in a specific position?

    Example string:

    What is your favorite color? my [favorite] color is blue

    If I wanted to check if the string contained a specific word I usually do this:

    var
      S: string;
    begin
      S := 'What is your favorite color? my [favorite] color is blue';
      if (Pos('favorite', S) > 0) then
      begin
        //
      end;
    end;
    

    What I need is to determine whether the word favorite exists in the string while ignoring any occurrence inside the [ ] symbols, which the above code sample clearly does not do.

    So if we put the code into a boolean function, some sample results would look like this:

    TRUE: What is your favorite color? my [my favorite] color is blue

    TRUE: What is your favorite color? my [blah blah] color is blue

    FALSE: What is your blah blah color? my [some favorite] color is blue

    The first two samples above are true because the word favorite is found outside of the [ ] symbols, regardless of whether it also appears inside them.

    The 3rd sample is false because even though there is the word favorite, it only appears inside the [ ] symbols - we should only check if it exists outside of the symbols.

    So I need a function to determine whether or not a word (favorite in this example) appears in a string, ignoring any occurrence enclosed in [ ] symbols.

  • Admin
    Admin over 11 years
    Great answer, especially useful is the link to the answer with comments, makes it a little easier to digest and understand what is happening.
  • TLama
    TLama over 11 years
    Thanks! Anyway, regex is the right way to do what you need (and surely easier), but on the other hand, this is more direct for this specific task (and more efficient, I'd say, since a regex engine at least needs to parse the expression before it starts to match). If you're not building something like a parser, where you would have many similar matching tasks, this solution might be lighter than including regex. But the main reason I posted it is that none of the answers here used pure Delphi.
  • diegoaguilar
    diegoaguilar almost 11 years
    I think yours is an elegant and proper solution