Message 312379 - Python tracker


Author	steven.daprano
Recipients	ezio.melotti, hanno, steven.daprano
Date	2018-02-19.23:02:09
Message-id	<[email protected]>
In-reply-to

Content
The stdlib HTML parser requires correct HTML.

To parse broken HTML, as you find in the real world, you need a third-party library like BeautifulSoup. BeautifulSoup is much more complex (about 7-8 times as many LOC) but can handle nearly anything a browser can.
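As a rough illustration of the gap described here, the sketch below feeds markup to the stdlib `html.parser.HTMLParser` and records the tag events it reports. The names `TagCollector` and `collect_tags` are made up for this example and are not part of the stdlib. The stdlib parser only reports tags as it encounters them; it does not build a tree or supply missing closing tags, so any repair of broken markup is left to the caller.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical helper: records start and end tags in the order seen."""

    def __init__(self):
        super().__init__()
        self.start_tags = []
        self.end_tags = []

    def handle_starttag(self, tag, attrs):
        # Called once for each opening tag the parser recognizes.
        self.start_tags.append(tag)

    def handle_endtag(self, tag):
        # Called only for closing tags actually present in the input.
        self.end_tags.append(tag)


def collect_tags(markup):
    """Return (start_tags, end_tags) reported for the given markup."""
    parser = TagCollector()
    parser.feed(markup)
    parser.close()  # flush any buffered input
    return parser.start_tags, parser.end_tags


if __name__ == "__main__":
    # Well-formed input: every start tag has a matching end tag.
    print(collect_tags("<p>Hello <b>world</b></p>"))
    # Unclosed input: the end tags are simply never reported.
    print(collect_tags("<p>Hello <b>world"))
```

With the unclosed input, the two start tags are reported and no end tags, which is the kind of bookkeeping a library such as BeautifulSoup does on the caller's behalf.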

I doubt the stdlib will ever compete with BeautifulSoup.

History
Date	User	Action	Args
2018-02-19 23:02:09	steven.daprano	set	recipients: + steven.daprano, ezio.melotti, hanno
2018-02-19 23:02:09	steven.daprano	set	messageid: <[email protected]>
2018-02-19 23:02:09	steven.daprano	link	issue32876 messages
2018-02-19 23:02:09	steven.daprano	create