Extracting URL link using regular expression re - string matching - Python

python string url matching extraction


```
import re
re.findall(r'https?://[^\s<>"]+|www\.[^\s<>"]+', str(STRING))
```

The `[^\s<>"]+` part matches a run of characters that are not whitespace, a double quote, or an angle bracket, so the match stops before the surrounding markup in strings like:

```
<a href="http://www.example.com/stuff">
http://www.example.com/stuff</br>
```
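As a quick check, here is a minimal sketch that applies the answer's pattern to the fragments quoted in the question and to the two markup examples above. `URL_PATTERN` and `samples` are illustrative names, not part of the original answer.

```python
import re

# The answer's pattern: a URL ends at whitespace, '<', '>' or '"'.
URL_PATTERN = re.compile(r'https?://[^\s<>"]+|www\.[^\s<>"]+')

samples = [
    'http://www.website.com/science/</span></a><o:p></o:p></span></div><div',
    'www.website.com/library/</span></a></span></i><span',
    'http://awebsite.com/Groups</a><div>',
    '<a href="http://www.example.com/stuff">',
    'http://www.example.com/stuff</br>',
]

for s in samples:
    print(URL_PATTERN.findall(s))
# ['http://www.website.com/science/']
# ['www.website.com/library/']
# ['http://awebsite.com/Groups']
# ['http://www.example.com/stuff']
# ['http://www.example.com/stuff']
```

Because the pattern only stops at whitespace, `<`, `>` and `"`, trailing punctuation stays attached: `'see www.example.com.'` yields `'www.example.com.'`, so plain prose may still need some post-processing.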


Author: Eternity (Developer)

Updated on June 26, 2022

Comments

Eternity
I've been trying to extract URLs from a text file using the `re` module: any link that starts with http://, https://, or www.

The file contains plain text as well as HTML source code. The HTML part is easy because I can extract the links with BeautifulSoup, but the plain text seems to be more challenging. I found the following online, and it seems to be the best implementation of URL extraction; however, it fails on certain inputs: specifically, it can't handle HTML tags and includes them in the URL. Any help is appreciated, because I'm not familiar with string matching at all myself.

Here are the patterns:
```
import re
sp1 = re.findall(r"http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+", str(STRING))
sp2 = re.findall(r'www.(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', str(STRING))
```
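A likely reason for the failure, assuming the patterns are used exactly as shown: inside a character class, `$-_` is parsed as a range from `$` (0x24) to `_` (0x5F), not as three literal characters. That range includes `<`, `>`, `/` and `:`, so every character of a trailing tag satisfies the class. This sketch checks that; `SP1` and `sample` are illustrative names, and `sample` is one of the question's example fragments.

```python
import re

# "$-_" inside [...] is a range: '$' (36) up to '_' (95), covering '<' (60) and '>' (62).
print(ord('$'), ord('<'), ord('>'), ord('_'))  # 36 60 62 95

# Only the lowercase letters of the tag fall outside the range.
print(re.findall(r'[$-_]', '</span>'))  # ['<', '/', '>']

# The question's first pattern swallows the whole fragment, tags included.
SP1 = r"http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+"
sample = "http://www.website.com/science/</span></a><o:p></o:p></span></div><div"
print(re.findall(SP1, sample) == [sample])  # True
```

Moving the hyphen to the start or end of the class (or escaping it as `\-`) makes it literal, but that also drops `/` and `:` from the class, so the pattern would then stop at the first `/` after the host. Excluding specific delimiters, as the answer does with `[^\s<>"]`, avoids that trade-off.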
Examples of matches that wrongly include the trailing tags:
```
http://www.website.com/science/</span></a><o:p></o:p></span></div><div
www.website.com/library/</span></a></span></i><span
http://awebsite.com/Groups</a><div>
```
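For a file that mixes HTML and plain text, one option is to separate markup from text first and only run the regex on the text. This sketch uses the standard-library `html.parser.HTMLParser` as a stand-in for BeautifulSoup (a swapped-in technique, not what the question used). `LinkCollector`, `extract_urls` and `URL_PATTERN` are hypothetical names, and the regex is the one from the answer.

```python
import re
from html.parser import HTMLParser

# Same pattern as the answer: a URL ends at whitespace, '<', '>' or '"'.
URL_PATTERN = re.compile(r'https?://[^\s<>"]+|www\.[^\s<>"]+')


class LinkCollector(HTMLParser):
    """Hypothetical helper: records href values and the text between tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.text_chunks = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        for name, value in attrs:
            if name == "href" and value:
                self.hrefs.append(value)

    def handle_data(self, data):
        self.text_chunks.append(data)


def extract_urls(document):
    """Return hrefs plus URLs found in the text, de-duplicated in order."""
    parser = LinkCollector()
    parser.feed(document)
    parser.close()
    # Join with spaces so text from adjacent elements cannot fuse into one URL.
    text = " ".join(parser.text_chunks)
    found = parser.hrefs + URL_PATTERN.findall(text)
    return list(dict.fromkeys(found))


doc = ('Visit www.example.org today or '
       '<a href="http://www.website.com/science/">http://www.website.com/science/</a>')
print(extract_urls(doc))
# ['http://www.website.com/science/', 'www.example.org']
```

Since the tags are consumed by the parser before the regex runs, the regex never sees them, which is the failure mode described in the question.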
Eternity

Awesome, works like a champ :) Thanks, mate!

