Extracting Data with Python Regular Expressions
Solution 1
import re

t = "\"productId\":\"111111\""
m = re.match(r"\W*productId[^:]*:\D*(\d+)", t)
if m:
    print(m.group(1))
This means: match any non-word characters (\W*), then productId followed by any non-colon characters ([^:]*) and a colon (:). Then skip any non-digits (\D*) and capture the digits that follow ((\d+)).
Output
111111
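Note that re.match only matches at the start of the string, while the question asks for all values on a page. A minimal sketch of one way to handle that is to pass the same pattern to re.findall; the sample text in page is made up for illustration and is not from the original page.

```python
import re

# Hypothetical sample text with two IDs in the escaped format from the question.
page = r'{\"productId\":\"111111\",\"name\":\"x\"},{\"productId\":\"222222\"}'

# Same pattern as above; with one capture group, findall returns
# the captured digits for every match in the text.
ids = re.findall(r"\W*productId[^:]*:\D*(\d+)", page)
print(ids)  # ['111111', '222222']
```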
Solution 2
Something like this, which simply collects every run of digits in the string:
In [13]: s=r'\"productId\":\"111111\"'
In [14]: print(s)
\"productId\":\"111111\"
In [15]: import re
In [16]: re.findall(r'\d+', s)
Out[16]: ['111111']
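One caveat with a bare \d+ is that it returns every number in the text, not only product IDs. The sketch below shows the difference on a made-up string (the price field is an assumption, not something from the original page) and one possible way to tie the digits to productId.

```python
import re

# Hypothetical text with a second, unrelated number next to the product ID.
s = r'\"productId\":\"111111\",\"price\":\"25\"'

# Every run of digits, wherever it appears.
all_numbers = re.findall(r'\d+', s)

# Only the digits that follow the literal word productId.
product_ids = re.findall(r'productId\D*(\d+)', s)

print(all_numbers)  # ['111111', '25']
print(product_ids)  # ['111111']
```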
Solution 3
The backslashes here might add to the confusion, because they are used as an escape character both by (non-raw) Python strings and by the regexp syntax.
This extracts the product ids from the format you posted:
re_prodId = re.compile(r'\\"productId\\":\\"([^"]+)\\"')
The raw string r'...' does away with one level of backslash escaping; using a single quote as the string delimiter does away with the need to escape double quotes; and finally the backslashes are doubled (only once) because of their special meaning in the regexp language.
You can use the regexp object's findall() method to find all matches in some text:
re_prodId.findall(text_to_search)
This will return a list of all product ids.
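Put together, a runnable sketch might look like the following. The contents of text_to_search are a made-up fragment in the format from the question; the real page text would be loaded from wherever you are scraping it.

```python
import re

# Pattern from above: a literal backslash and quote around productId,
# then capture everything up to the closing backslash-quote.
re_prodId = re.compile(r'\\"productId\\":\\"([^"]+)\\"')

# Hypothetical page fragment; the backslashes are literally present in the text.
text_to_search = r'[{\"productId\":\"111111\"},{\"productId\":\"222222\"}]'

product_ids = re_prodId.findall(text_to_search)
print(product_ids)  # ['111111', '222222']
```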
greyfox
Updated on November 18, 2020
Comments
-
greyfox over 3 years
I am having some trouble wrapping my head around Python regular expressions to come up with a regular expression to extract specific values.
The page I am trying to parse has a number of productIds, which appear in the following format:
\"productId\":\"111111\"
I need to extract all the values, 111111 in this case.
-
cmd about 11 years
Is it that you are new to regex, Python, or both? Which part do you need help with? What have you tried?
-
Андрей Беньковский over 8 years
Possible duplicate of how to extract a substring from inside a string in Python?
-
skytreader almost 9 years
I find this more Pythonic. :)
-
Tim MB over 2 years
Does this not need to be a raw string, or to have the backslashes escaped?