BeautifulSoup: what's the difference between 'lxml' and 'html.parser' and 'html5lib' parsers?

python html web-scraping beautifulsoup lxml

16,401

Solution 1

From the summary table of advantages and disadvantages in the docs:

html.parser - BeautifulSoup(markup, "html.parser")
- Advantages: Batteries included, Decent speed, Lenient (as of Python 2.7.3 and 3.2.2)
- Disadvantages: Not very lenient (before Python 2.7.3 or 3.2.2)
lxml - BeautifulSoup(markup, "lxml")
- Advantages: Very fast, Lenient
- Disadvantages: External C dependency
html5lib - BeautifulSoup(markup, "html5lib")
- Advantages: Extremely lenient, Parses pages the same way a web browser does, Creates valid HTML5
- Disadvantages: Very slow, External Python dependency
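
Since lxml and html5lib are external dependencies, they may not be installed everywhere. Here is a minimal sketch of falling back to whichever parser is available. The helper name `make_soup` and the preference order are assumptions made for illustration, not something from the docs:

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup, preferred=("lxml", "html5lib", "html.parser")):
    """Parse markup with the first parser in `preferred` that is installed.

    Returns (soup, parser_name). The order of `preferred` is an
    illustrative choice; reorder it to suit your needs.
    """
    for name in preferred:
        try:
            return BeautifulSoup(markup, name), name
        except FeatureNotFound:
            # Raised by Beautiful Soup when the requested parser is missing.
            continue
    raise RuntimeError("none of the requested parsers is available")


if __name__ == "__main__":
    soup, used = make_soup("<p>Hello</p>")
    print(used, soup.p.get_text())
```

Because html.parser ships with Python, the loop should always find at least one usable parser as long as it stays in the list.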

Solution 2

The key differences are highlighted in the BeautifulSoup documentation:

Differences between parsers

The basic reasons to prefer one parser over the others:

html.parser - built-in, no extra dependencies needed
html5lib - the most lenient, so the better choice if the HTML is broken
lxml - the fastest
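
To see the difference on broken HTML for yourself, here is a rough sketch that feeds the same invalid snippet to every parser that happens to be installed. The snippet `<a></p>` is the one the Beautiful Soup documentation uses in its "Differences between parsers" section; the helper name `parse_with_each` is made up for this example:

```python
from bs4 import BeautifulSoup, FeatureNotFound


def parse_with_each(markup, parsers=("html.parser", "lxml", "html5lib")):
    """Return {parser_name: serialized tree}, or None for a missing parser."""
    results = {}
    for name in parsers:
        try:
            results[name] = str(BeautifulSoup(markup, name))
        except FeatureNotFound:
            # The parser library is not installed in this environment.
            results[name] = None
    return results


if __name__ == "__main__":
    # An unclosed <a> followed by a stray </p>: invalid HTML.
    for name, html in parse_with_each("<a></p>").items():
        print(f"{name:12} {html if html is not None else '(not installed)'}")
```

Each parser repairs the invalid markup in its own way, so the printed trees can differ. That is why code that navigates the tree may behave differently after swapping parsers, even when the parsers look interchangeable on well-formed pages.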



Author by

duc hathaway

Updated on July 17, 2022

Comments

duc hathaway almost 2 years
When using Beautiful Soup, what is the difference between 'lxml', 'html.parser', and 'html5lib'?

When would you use one over the other, and what are the benefits of each? When I used each they seemed to be interchangeable, but people here have told me that I should be using a different one. I'd like to strengthen my understanding; I've read a couple of posts on here about this, but they don't go over the uses much, if at all.

Example:
```
from bs4 import BeautifulSoup
soup = BeautifulSoup(response.text, 'lxml')
```
kd88 almost 6 years

Thanks - html5lib (as a parser of broken HTML) just saved my bacon

Related

How to scrape google maps using python

Find index of tag with certain text in beautifulsoup/python

Python - Web Scraping HTML table and printing to CSV

Get the href text of a link that has a certain class attribute using BeautifulSoup in Python

HTML encoding and lxml parsing

Using BeautifulSoup to find specific text on a webpage

lxml.html parsing with XPath and variables

Beautiful Soup and Table Scraping - lxml vs html parser

Python + BeautifulSoup: How to get ‘href’ attribute of ‘a’ element?

Python BeautifulSoup scrape tables