"^ backslash not last character on line" in gawk

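gensub() expects a string as its second argument. You are trying to concatenate "/" and "," around an expression, (\\1+3), which you assume the function will evaluate. It will not: awk evaluates every argument before calling the function. \1 refers to the capture group () in the regexp, but it only has that meaning inside the replacement string, not in an expression. Outside a string or regexp constant, awk source allows a backslash only as a line continuation, immediately before a newline, which is why gawk complains "backslash not last character on line".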

So at best you could use "/\\1+3," as the second argument (gensub() also requires a third argument saying which match to replace, such as 1 or "g"), but you would then get the result ...Backslash/49+3,Black. You cannot evaluate the 49+3 part in this way.
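To see what "evaluated before calling the function" means in practice, here is a minimal sketch that swaps in a legal expression, (1 + 3), for the illegal \\1+3. It uses POSIX sub() in place of gensub() so that it runs in any awk; the wrapper name show_eager_eval and the variable names are made up for illustration.

```shell
# Hypothetical demo function; plain awk is enough, gawk is not required.
show_eager_eval() {
    awk 'BEGIN {
        line = "The Ubiquitous Backslash/49,Black"
        repl = "/" (1 + 3) ","        # evaluated right here, giving the string "/4,"
        sub(/\/[0-9]+,/, repl, line)  # sub() only ever sees the finished string
        print line
    }'
}
```

Calling show_eager_eval prints The Ubiquitous Backslash/4,Black: the replacement was already the fixed string "/4," when sub() ran, so the matched 49 never takes part in the arithmetic.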

If you want to do arithmetic on the match, you must first extract the string, do the arithmetic, then place it back in the string. For example,

awk '{ n = split($0, d, /\/([0-9]+),/, s)
       print d[1] "/"(substr(s[1],2)+3)"," d[2] }'

This uses GNU awk's split() function with a regexp separator and a fourth argument to split the line into three parts: the part before the match in d[1], the part after the match in d[2], and the matched string "/49," in s[1]. You should really check that n is 2 to ensure you got exactly one match.

You can then extract the number from the matched string by simply skipping over the initial "/", do the arithmetic, and concatenate all the parts together again.
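One way to add the check that n is 2 is sketched below. It assumes GNU awk (the four-argument split() is a gawk extension) and that lines with no match, or with more than one, should be printed unchanged; the shell function name add3_once is hypothetical.

```shell
# Hypothetical wrapper; needs gawk for the four-argument split().
add3_once() {
    gawk '{
        n = split($0, d, /\/[0-9]+,/, s)
        if (n == 2)   # exactly one match: d[1] and d[2] surround s[1]
            print d[1] "/" (substr(s[1], 2) + 3) "," d[2]
        else          # no match, or several: leave the line alone
            print
    }'
}
```

For example, printf '%s\n' 'The Ubiquitous Backslash/49,Black' | add3_once should print The Ubiquitous Backslash/52,Black. The trailing "," left in substr(s[1], 2) does no harm, because awk uses only the leading digits when it converts the string to a number.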


If the pattern may appear several times in one line of your data, a better solution is to use match() to find only the last occurrence and cut up the line using substr():

awk '{ match($0, /.*\/([0-9]+),/, m)
       a = m[1,"start"]
       b = m[1,"length"]
       if(a)print substr($0,1,a-1) substr($0,a,b)+3 substr($0,a+b)
       else print }'

Here the pattern has .* added at the front so that it matches only the last occurrence. a is set to the character position of the start of the capture group () in the regexp, and b to its length, so substr($0,a,b) is just the number. The final line is reassembled from that number plus 3 and the two other parts of the original data.
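The m[1,"start"] and m[1,"length"] entries are specific to gawk. As a rough sketch of the same last-occurrence idea in portable awk, the following uses only RSTART and RLENGTH; the function name add3_last and the variables end and head are illustrative, and a second match() call stands in for the capture group.

```shell
# Hypothetical wrapper; uses only POSIX awk features.
add3_last() {
    awk '{
        if (match($0, /.*\/[0-9]+,/)) {
            end = RLENGTH                  # match starts at 1, so this is the position of the last comma
            head = substr($0, 1, end - 1)  # everything before that final comma
            match(head, /[0-9]+$/)         # the digits at the end of head are the number
            print substr(head, 1, RSTART - 1) (substr(head, RSTART) + 3) substr($0, end)
        } else {
            print                          # no match: pass the line through unchanged
        }
    }'
}
```

With the input Chapter 3/10,Red/49,Black this changes only the last number and prints Chapter 3/10,Red/52,Black.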


Author: Tim


Updated on September 18, 2022

Comments

  • Tim
    Tim almost 2 years

    I would like to match a number between / and , in each line, and increase it by 3. For example

    The Ubiquitous Backslash/49,Black

    becomes

    The Ubiquitous Backslash/52,Black

    My gawk command is:

    $ gawk '{b=gensub(/\/([0-9]+),/, "/" (\\1+3) ",") ; print b}' add.jpdf 
    gawk: cmd. line:1:                    ^ backslash not last character on line
    

    I was wondering what "^ backslash not last character on line" means? Which gawk syntax rule does my solution violate?

    Thanks.

  • Tim
    Tim almost 7 years
    Thanks. (1) From gnu.org/software/gawk/manual/html_node/String-Functions.html, b = gensub(/(.+) (.+)/, "\\2 \\1", "g", a), are \1 and \2 evaluated by the function or before calling the function? Why does it work? (2) Does substr(s[1],2) return the suffix of string s[1] starting from the second character of s[1]? Will its output include the last character ,?
  • meuh
    meuh almost 7 years
    \1 and \2 in the string passed to gensub will get replaced by gensub for every match. But the string passed to gensub stays otherwise fixed. In your post you try to provide an expression, but that must be converted into a string (which can contain \1) before it gets passed to gensub. The substr call is as you described, it starts from character position 2 until the end.
  • Tim
    Tim almost 7 years
    Thanks. I see. How would you "check n is 2 to ensure you got exactly one match"?
  • Tim
    Tim almost 7 years
    In gensub(/\/([0-9]+),/, "/" (\\1+3) ","), what will \\1+3 in the second argument become after evaluating and then what will it become after converting to string?
  • meuh
    meuh almost 7 years
    I added an alternative answer for when there is more than one match on a line. You cannot ask about \\1+3, as it is illegal awk, so means nothing.
  • Tim
    Tim almost 7 years
    Thanks. Can your way deal with the case that there is no match (where we should do nothing)?
  • meuh
    meuh almost 7 years
    The if(a) tests if there was a match. If not just print the line. I edited the answer.
  • Tim
    Tim almost 7 years
    Thanks a lot @meuh. I was wondering what the syntax error "^ backslash not last character on line" in my original solution means?