Working With Beautiful Soup in Python


Beautiful Soup is a popular Python library for extracting data from HTML and XML files. It works with the parser of your choice (Python's built-in html.parser, lxml, or html5lib) to navigate, search, and modify the parse tree. In this series we will first learn the basics of Beautiful Soup, and at the end we will work on a demo project.

Installing Beautiful Soup:-

To install Beautiful Soup on a Windows machine, use the pip command below.

[code]pip install beautifulsoup4[/code]

Download an HTML document and display it with Beautiful Soup:-

To download an HTML document and display it neatly formatted, first import Beautiful Soup and the urllib.request module into the project.

from bs4 import BeautifulSoup
import urllib.request
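Before fetching a live page, it can help to try Beautiful Soup on a small inline string. The following is only a minimal sketch, assuming Python's built-in html.parser backend; the `SAMPLE_HTML` string and the `parse_sample` helper are made up for illustration and are not part of the original project.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from any real page.
SAMPLE_HTML = (
    "<html><head><title>Demo page</title></head>"
    "<body><p>Hello, soup!</p></body></html>"
)


def parse_sample(markup):
    """Parse markup with Python's built-in parser, named explicitly."""
    # Naming the parser keeps results consistent across machines,
    # whichever optional parsers happen to be installed.
    return BeautifulSoup(markup, "html.parser")


soup = parse_sample(SAMPLE_HTML)
print(soup.title.string)  # text inside <title>
print(soup.p.get_text())  # text inside the first <p>
```

If lxml or html5lib is installed, its name can be passed in place of "html.parser"; the rest of the sketch stays the same.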

We are going to use the prettify() method to print the HTML document to the console.

[python]
__author__ = 'WP8Dev'

from bs4 import BeautifulSoup
import urllib.request


def main():
    print("***********")
    testUrl = "http://scrolltest.com/about-us/"
    # Download the page; urlopen returns a file-like response object.
    pageSource = urllib.request.urlopen(testUrl)
    # Name the parser explicitly so Beautiful Soup does not have to guess.
    soupPKG = BeautifulSoup(pageSource, "html.parser")
    # prettify() returns the parse tree as an indented string.
    print(soupPKG.prettify())


if __name__ == "__main__":
    main()
[/python]

Output:- the page's HTML, printed with each tag on its own line and nested tags indented.

Basics of Beautiful Soup:-

Beautiful Soup transforms a complex HTML document into a complex tree of Python objects. But you'll only ever have to deal with about four kinds of objects:

– Tag
– NavigableString
– BeautifulSoup
– Comment

Tag:-

A Tag object corresponds to an XML or HTML tag in the original document, e.g.

[python]
soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', "html.parser")
tag = soup.b
print(tag)  # <b class="boldest">Extremely bold</b>
[/python]

You can access a tag's attributes by treating the tag like a dictionary.

A single attribute (class can hold several values, so Beautiful Soup returns it as a list):

[code]tag['class']  # ['boldest'][/code]

All attributes:

[code]tag.attrs  # {'class': ['boldest']}[/code]

Multi-valued attributes:

[python]
css_soup = BeautifulSoup('<p class="body strikeout"></p>', "html.parser")
css_soup.p['class']  # ['body', 'strikeout']
[/python]

NavigableString:-

A NavigableString is just like a Python string, except that it also supports some of the features described in the "Navigating the tree" and "Searching the tree" sections of the Beautiful Soup documentation. In Python 3, convert it to a plain string with str():

[python]
unicode_string = str(tag.string)
unicode_string  # 'Extremely bold'
[/python]

Comments and other special strings:-

A Comment object is a special type of NavigableString.

[code]
markup = "<b><!--Hey, buddy. Want to buy a used parser?--></b>"
soup = BeautifulSoup(markup, "html.parser")
comment = soup.b.string
print(comment)  # Hey, buddy. Want to buy a used parser?
[/code]

BeautifulSoup:-

The BeautifulSoup object itself represents the document as a whole. For most purposes, you can treat it as a Tag object.

Let's build a simple program that gets all the links on a page using find_all("a").
[python]
from bs4 import BeautifulSoup
import urllib.request


def main():
    print("***********")
    testUrl = "http://scrolltest.com/about-us/"
    pageSource = urllib.request.urlopen(testUrl)
    soupPKG = BeautifulSoup(pageSource, "html.parser")
    # print(soupPKG.prettify())
    # find_all("a") returns every <a> tag in the document.
    for link in soupPKG.find_all("a"):
        print(str(link))


if __name__ == "__main__":
    main()
[/python]

Now that we have the basics of Beautiful Soup, in the next tutorial we will learn more about its usage and create a demo project to scrape a website.
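The link program above prints each whole <a> tag. Often only the link target is wanted, so here is a hedged sketch of one way to pull out the href values and turn relative paths into absolute URLs. It runs on an inline string instead of a live page; `SAMPLE_HTML`, `BASE_URL`, and `extract_links` are hypothetical names chosen for this example.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup and base URL, used only for illustration.
SAMPLE_HTML = """
<a href="/about-us/">About</a>
<a href="https://example.com/blog">Blog</a>
<a name="top">No href here</a>
"""
BASE_URL = "https://example.com/"


def extract_links(markup, base_url):
    """Return absolute URLs for every <a> tag that has an href attribute."""
    soup = BeautifulSoup(markup, "html.parser")
    links = []
    # href=True skips anchors that have no href attribute at all.
    for anchor in soup.find_all("a", href=True):
        # urljoin resolves relative paths against the base URL and
        # leaves already-absolute URLs unchanged.
        links.append(urljoin(base_url, anchor["href"]))
    return links


links = extract_links(SAMPLE_HTML, BASE_URL)
for link in links:
    print(link)
```

Filtering with href=True avoids a KeyError on anchors such as named bookmarks, which have no href to read.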


