How to find all patterns between two characters?
Solution 1
First of all, your `grep -Po '"\K[^"]*' file` idea fails because `grep` sees both `"One"` and `". the second is here"` as being inside quotes. Personally, I'd probably just do
$ grep -oP '"[^"]+"' file | tr -d '"'
One
Two
Three
Four
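If trimming the whitespace inside the quotes matters, the same two-stage pipeline can probably handle it by swapping `tr` for `sed`. This is only a sketch: `quoted_trimmed` is a hypothetical helper name, it assumes quotes are balanced on each line, and it relies on `grep -o`, which is a common extension rather than strict POSIX.

```shell
#!/bin/sh
# quoted_trimmed is a hypothetical helper, not part of the original answer.
# Stage 1: grep -o prints each "..." pair (quotes included) on its own line.
#          [^"]* (rather than +) keeps pairs aligned even if an empty "" occurs.
# Stage 2: sed strips the quotes together with any whitespace next to them.
quoted_trimmed() {
    grep -o '"[^"]*"' "$@" |
        sed 's/^"[[:space:]]*//; s/[[:space:]]*"$//'
}

# Sample input taken from the question.
quoted_trimmed <<'EOF'
first matched is "One". the second is here"Two "
and here are in second line" Three ""Four".
EOF
```

With no file argument the helper reads standard input, so it can also sit at the end of another pipeline.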
But that is two commands. To do it with a single command, you could use one of:
-
Perl
$ perl -lne '@F=/"\s*([^"]+)\s*"/g; print for @F' file
One
Two
Three
Four
Here, the `@F` array holds all matches of the regex (a quote, followed by as many non-`"` as possible until the next `"`). The `print for @F` just means "print each element of `@F`".
-
Perl
$ perl -F'"' -lane 'for($i=1;$i<=$#F;$i+=2){print $F[$i]}' file
One
Two
Three
Four
To remove leading/trailing spaces from each match, use this:
perl -F'"' -lane 'for($i=1;$i<=$#F;$i+=2){$F[$i]=~s/^\s+|\s+$//g; print $F[$i]}' file
Here, Perl is behaving like `awk`. The `-a` switch causes it to automatically split input lines into fields on the character given by `-F`. Since I have given it `"`, the fields are:
$ perl -F'"' -lane 'for($i=0;$i<=$#F;$i++){print "Field $i: $F[$i]"}' file
Field 0: first matched is
Field 1: One
Field 2: . the second is here
Field 3: Two
Field 0: and here are in second line
Field 1:  Three
Field 2:
Field 3: Four
Field 4: .
Because we are looking for text between two consecutive field separators, we know we want every second field. So, `for($i=1;$i<=$#F;$i+=2){print $F[$i]}` will print the ones we care about.
-
The same idea but in `awk`:
$ awk -F'"' '{for(i=2;i<=NF;i+=2){print $(i)}}' file
One
Two
Three
Four
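The field-splitting idea can be combined with whitespace trimming in `awk` as well. The following is a rough sketch rather than a tested drop-in: `extract_quoted` is a hypothetical helper name, and it assumes quotes are balanced and never span lines. It loops while `i < NF` instead of `i <= NF`, so a trailing field after an unclosed quote is skipped.

```shell
#!/bin/sh
# extract_quoted is a hypothetical helper, not part of the original answer.
# Splitting on the double quote puts quoted text in the even-numbered fields.
extract_quoted() {
    awk -F'"' '{
        for (i = 2; i < NF; i += 2) {        # even fields lie between quotes
            s = $i
            gsub(/^[ \t]+|[ \t]+$/, "", s)   # trim blanks at both ends
            print s
        }
    }' "$@"
}

# Sample input taken from the question; prints One, Two, Three, Four.
extract_quoted <<'EOF'
first matched is "One". the second is here"Two "
and here are in second line" Three ""Four".
EOF
```

Copying the field into `s` before trimming avoids modifying `$i`, which would make `awk` rebuild the record.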
Solution 2
The key is to consume the quotes in your expression. Hard to do that with a single grep command. Here's a perl one-liner:
perl -0777 -nE 'say for /"(.*?)"/sg' file
That slurps the whole input and prints out the captured part of each match. It will work even if there's a newline inside the quotes, although it then becomes difficult to tell elements with and without newlines apart. To help with that, use a different character as the output record separator, the null character for instance:
perl -0777 -lne 'print for /"(.*?)"/sg} BEGIN {$\="\0"' <<DATA | od -c
blah "first" blah "second
quote with newline" blah "third"
DATA
0000000 f i r s t \0 s e c o n d \n q u o
0000020 t e w i t h n e w l i n e \0
0000040 t h i r d \0
0000046
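Once the records are NUL-terminated, something downstream has to split on NUL rather than on newline. One possible approach, sketched here with a hypothetical `nul_to_lines` helper, is to flatten the embedded newlines to spaces and then turn each NUL into a newline. The `printf` line only simulates the NUL-separated stream from the `od` dump above; whether flattening newlines to spaces is acceptable depends on the data.

```shell
#!/bin/sh
# nul_to_lines is a hypothetical helper, not part of the original answer.
# tr maps each newline inside a record to a space and each NUL
# terminator to a newline, giving exactly one line per record.
nul_to_lines() {
    tr '\n\0' ' \n'
}

# Simulated NUL-separated records, same bytes as in the od dump.
printf 'first\0second\nquote with newline\0third\0' | nul_to_lines
```

Under those assumptions this prints three lines: `first`, `second quote with newline`, and `third`.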
Solution 3
This is possible with the grep one-liner below, assuming that your quotation marks are balanced.
grep -oP '"\s*\K[^"]+?(?=\s*"(?:[^"]*"[^"]*")*[^"]*$)' file
Example:
$ cat file
first matched is "One". the second is here"Two "
and here are in second line" Three ""Four".
$ grep -oP '"\s*\K[^"]+?(?=\s*"(?:[^"]*"[^"]*")*[^"]*$)' file
One
Two
Three
Four
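Since the lookahead depends on the quotes being balanced on each line, it may be worth checking the input first. A minimal sketch, assuming a per-line check is sufficient; `report_unbalanced` is a hypothetical helper name and the second sample line is made up for illustration:

```shell
#!/bin/sh
# report_unbalanced is a hypothetical helper, not part of the original answer.
# It reports every line that contains an odd number of double quotes.
report_unbalanced() {
    awk '{
        n = gsub(/"/, "&")    # replace each quote with itself; returns the count
        if (n % 2) print NR ": odd number of quotes (" n ")"
    }' "$@"
}

# The first line is from the question; the second is an invented bad line.
report_unbalanced <<'EOF'
first matched is "One". the second is here"Two "
this line has a "stray quote
EOF
```

For this input the helper should flag only line 2; empty output suggests every line has an even quote count.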
Another hair-pulling solution, using the PCRE verbs `(*SKIP)(*F)`:
$ grep -oP '[^"]+(?=(?:"[^"]*"[^"]*)*[^"]*$)(*SKIP)(*F)|\s*\K[^"]+(?=\b\s*)' file
One
Two
Three
Four
αғsнιη
Updated on September 18, 2022
Comments
-
αғsнιη over 1 year
I'm trying to find all patterns between a pair of double quotes. Let's say I have a file whose contents look like the following:
first matched is "One". the second is here"Two "
and here are in second line" Three ""Four".
I want the words below as output:
One
Two
Three
Four
As you can see, all strings in the output are between a pair of quotes.
What I tried is this command:
grep -Po ' "\K[^"]*' file
The above command works fine if there is a space before the first `"` of each pair. For example, it works if my input file contains the following:
first matched is "One". the second is here "Two "
and here are in second line " Three " "Four".
I know I can do this by combining multiple commands, but I'm looking for one command that is not invoked multiple times, unlike the command below:
grep -oP '"[^"]*"' file | grep -oP '[^"]*'
How can I achieve/print all of my patterns just using one command?
Reply to comments: It's not important for me to remove the whitespace around the matched pattern inside a pair of quotes, but it would be better if the command supported that too. Also, my files contain nested quotes like `"foo "bar" zoo"`. All of the quoted words are on separate lines and do not span multiple lines.
Thanks in advance.
-
Admin over 9 years
Can you have nested quotes? Things like `"foo "bar""`? If yes, how should those be dealt with?
-
Admin over 9 years
@terdon I think what I wrote, `"One". the second is here "Two "` and also `" Three ""Four"`, is nested, isn't it?
-
Admin over 9 years
No, nested would be where the first quote includes the second. Yours are just next to each other. Nested: `"foo "bar" baz"`, not nested: `"foo""bar"`.
-
Admin over 9 years
Is it possible for the quoted text to contain newlines?
-
Admin over 9 years
@KasiyA could you post a single example which satisfies all your needs along with the expected output?
-
Admin over 9 years
@KasiyA added an answer, check it :-)
-
-
αғsнιη over 9 years
Is there any option to remove the last printed char, like `\b` for example in C++?
-
terdon over 9 years
@KasiyA where? What printed char? From which of the suggestions?
-
αғsнιη over 9 years
Thank you Glenn. My command `grep -Po ' "\K[^"]*' file` works if I have a single space before the first (left) `"` of each pair in my input file. Is there any regex I could put in place of the space here, `... -Po '[HERE]"\K ...`, so that instead of a space it matches any character, like `[a-zA-Z]`?
-
terdon over 9 years
@KasiyA no. The problem is that `grep` will match the `One` and print it. Then, the remaining text is `". the second is here"` which also matches. I don't think that grep's PCRE engine has any way of avoiding that.
-
terdon over 9 years
@KasiyA to do it without scripting, just use the `grep`/`tr` suggestion. Remember that pipes are The UNIX Way®, there's no reason to avoid them. You can't do it in `grep` (at least I don't think so) because `grep` will start matching again where the last match ended, which means that after the first hit, everything will start with a `"`.
-
fiatux over 9 years
which is why I wrote that the expression must consume the trailing quote.
-
terdon over 9 years
@glennjackman exactly. Do you have any idea if that's possible in `grep`?