Perl multiline regex
Solution 1
/m affects what ^ and $ match. You use neither, so /m has no effect.
You only read a single line at a time, so you only match against a single line at a time. /m cannot possibly cause the regex to match against data that is still waiting to be read from a file handle the regex knows nothing about.
You could load the entire file into memory by using -0777 and loop over all matches instead of just grabbing the first.
Solution 2
This is pretty straightforward with just grep and sed:
grep adGroupId listado.txt | sed -E "s/[^0-9]+//g"
- Match lines with adGroupId in them
- Remove everything that isn't a digit
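If you need both ids paired up, the same idea can be extended with paste. This is only a sketch, and it assumes each adGroupId and keywordId sits on its own line and that they strictly alternate; the sample variable is illustrative data, not your file.

```shell
# Illustrative input: one field per line, adGroupId and keywordId alternating.
sample='"adGroupId" : NumberLong(1230610621),
"keywordId" : NumberLong("5458816773"),
"adGroupId" : NumberLong(1230613681),
"keywordId" : NumberLong("3204196588")'

paired=$(printf '%s\n' "$sample" \
  | grep -E 'adGroupId|keywordId' \
  | sed -E 's/[^0-9]+//g' \
  | paste -d- - - \
  | paste -sd, -)
# grep keeps both kinds of line, sed strips the non-digits,
# the first paste joins every two lines with "-",
# and the second paste joins all resulting lines with ",".
echo "$paired"
```

If several ids share one line, as in the single-line sample from the question, sed would run their digits together, so this variant does not apply there.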
Solution 3
Depending on the exact structure of your data, you can make use of line numbers:
while (<>) {
    if ( /NumberLong\("?(?<nr>\d+)/ ) {
        $.%2 ? print "$+{nr}-" : print "$+{nr}\n";
    }
}
Or use flags:
my $flag = 0;
while (<>) {
    if ( /NumberLong\("?(?<nr>\d+)/ ) {
        !$flag
            ? (print "$+{nr}-" and $flag++)
            : (print "$+{nr}\n" and $flag--);
    }
}
Or with slurping:
use 5.010;
my $file;
{
    local $/;
    $file = <>;
}
while ($file =~ /adGroupId" : NumberLong\("?(?<first>\d+).+?keywordId" : NumberLong\("?(?<second>\d+)/gs ) {
    say "$+{first}-$+{second}";
}
Nicolas Rodríguez Seara
Updated on June 14, 2022
Comments
-
Nicolas Rodríguez Seara almost 2 years
I have a file full of JSON objects to parse, similar to this one:
{ "_id" : ObjectId("523a58c1e4b09611f4c58a66"), "_items" : [ { "adGroupId" : NumberLong(1230610621), "keywordId" : NumberLong("5458816773") }, { "adGroupId" : NumberLong(1230613681), "keywordId" : NumberLong("3204196588") }, { "adGroupId" : NumberLong(1230613681), "keywordId" : NumberLong("4340421772") }, { "adGroupId" : NumberLong(1230615571), "keywordId" : NumberLong("10525630645") }, { "adGroupId" : NumberLong(1230617641), "keywordId" : NumberLong("4178290208") } ]}
I want to take the numbers from inside the NumberLong(). At first I needed just the keywordId, and managed to accomplish it with:
cat listado.txt |& perl -ne 'print "$1," if /\"keywordId\" : NumberLong\(\"?(\d*)\"?\)/' > keywordIds.txt
This generated a comma-separated file with the numbers. I now also need the adGroupIds, so I'm trying the following regex, with no luck:
cat ./work/listado.txt |& perl -ne 'print "$1-$2," if /\"adGroupId\" : NumberLong\(\"?(\d*)\"?\),\s*\"keywordId\" : NumberLong\(\"?(\d*)\"?\)/m'
The regex matches, but I believe perl is not doing multiline, even though I'm using /m. Any ideas?
-
Hunter McMillen over 10 years
He claims to want the numbers. How is this any different? (Other than the lack of commas)
-
Nicolas Rodríguez Seara over 10 years
You are only capturing the adGroupId numbers; I need both adGroupId and keywordId, in a file like this: group1-keyword1, group2-keyword2, ...
-
ikegami over 10 years
There's a big difference between 1-2,3-4,5-6 and 1\n3\n5
-
Nicolas Rodríguez Seara over 10 years
That correctly returns the first group; output: "1230610621-5458816773,". How do I make it keep going? Oh, and the file is 100MB, so I'd rather avoid loading it all into memory.
-
ikegami over 10 years
print "$1-$2," while /.../g;
Or, without the extra comma:
push @matches, "$1-$2" while /.../g; END { print join ',', @matches }
-
tijagi over 9 years
@Nicolas It surprises me that nobody posted a variant in sed yet.
sed -nr 's/.*adGroupId.*\(([0-9]+)\).*/\1/; Te; N; s/\n.*keywordId.*\("([0-9]+)"\).*$/-\1/; H; :e ${g;s/^\n//;s/\n/,/g;p};' <file