Intro to beautiful soup

AndreasChandra3 1,513 views 13 slides Jun 28, 2017

About This Presentation

Introducing Beautiful Soup for web scraping using Python 3

Size: 411.21 KB

Language: en

Added: Jun 28, 2017

Slides: 13 pages

Slide Content

Intro to Beautiful Soup
ANDREAS CHANDRA

What is Beautiful Soup?
crummy.com defines it this way: "Beautiful Soup is a Python library for pulling data out of HTML and XML
files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and
modifying the parse tree. It commonly saves programmers hours or days of work."
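To make "navigating, searching, and modifying" concrete, here is a minimal sketch on a tiny made-up HTML string. The markup, the `intro` class and the `greeting` id are illustrative only, not from the slides. It uses Python's built-in `html.parser`, so no extra parser has to be installed.

```python
from bs4 import BeautifulSoup

# Made-up markup used only for illustration
html = "<html><body><p class='intro'>Hello</p><p>World</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# Navigating: walk the tree by tag name and across siblings
first_p = soup.body.p
second_p = first_p.find_next_sibling("p")
print(first_p.text, second_p.text)  # Hello World

# Searching: collect every <p> tag in the document
print([p.text for p in soup.find_all("p")])  # ['Hello', 'World']

# Modifying: replace a tag's text and add an attribute
first_p.string = "Hi"
first_p["id"] = "greeting"
print(soup.body)
```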

Install
Open your terminal or command prompt and run:
◦ $ easy_install beautifulsoup4
Or
◦ $ pip install beautifulsoup4
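After installing, a quick way to confirm it worked is to import the package from Python. Note that the package is installed as `beautifulsoup4` but imported as `bs4`. The one-line parse below is only a smoke test.

```python
import bs4
from bs4 import BeautifulSoup

# Prints whichever version pip installed
print(bs4.__version__)

# Smoke test: parse a tiny fragment with the built-in parser
check = BeautifulSoup("<p>ok</p>", "html.parser")
print(check.p.text)  # ok
```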

Getting the Basics - Making a soup
Beautiful Soup takes HTML as a string.
Example:
html_doc = """
<html><head><title>Andreas Chandra</title></head>
<body>
<h1>Hello World!</h1>
</body>
</html>
"""

Getting the Basics - Making a soup
Then convert the string into a BeautifulSoup object:
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, "html.parser")

Getting the Basics - Extract
To get the title of the page, simply write:
soup.title.text
Result:
'Andreas Chandra'
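Put together, the three "Getting the Basics" steps form one small script. The `soup.h1` and `soup.find("h1")` lookups at the end are extra illustrations, not from the slides: dot access like `soup.title` is a shortcut that returns the first matching tag, the same as `find`.

```python
from bs4 import BeautifulSoup

# Step 1: the HTML as a string
html_doc = """
<html><head><title>Andreas Chandra</title></head>
<body>
<h1>Hello World!</h1>
</body>
</html>
"""

# Step 2: make the soup with Python's built-in parser
soup = BeautifulSoup(html_doc, "html.parser")

# Step 3: extract text from tags
print(soup.title.text)          # Andreas Chandra
print(soup.h1.text)             # Hello World!
print(soup.find("h1").get_text())  # Hello World!
```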

Case Study – Detik.com
Suppose you want to get the titles of the most popular articles on the website.
What do you do first?

Case Study – Detik.com
1. Import the libraries bs4 and urllib3 (Python 3)

Case Study – Detik.com
2. Download the HTML of the page
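The slides do not show the download code itself, so the following is a sketch using urllib3, the library named in step 1 (it is a separate package, installed with `pip install urllib3` if it is not already present). The function name `download_html` is hypothetical, and decoding the body as UTF-8 is an assumption; a page that declares another charset would need a different codec.

```python
import urllib3


def download_html(url: str) -> str:
    """Fetch `url` with urllib3 and return the body as text (hypothetical helper)."""
    http = urllib3.PoolManager()  # manages connections; follows redirects by default
    response = http.request("GET", url)
    if response.status != 200:
        raise RuntimeError(f"GET {url} returned HTTP {response.status}")
    # Assumption: the page is UTF-8 encoded; undecodable bytes are replaced
    return response.data.decode("utf-8", errors="replace")
```

For the case study it would be called as `html = download_html("https://www.detik.com/")`. The exact page URL the author used is not shown in the slides, so the homepage here is an assumption.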

Case Study – Detik.com
3. Find the tag and id of the "most popular" section; you can get both by using Inspect Element on the
page
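The real tag name and id come from inspecting the live page and are not given in the text, so this sketch runs on a small made-up snippet in which the section is a `<div id="most-popular">`. Both values are hypothetical placeholders to swap for what Inspect Element shows. `find` with an `id` keyword and the CSS selector in `select_one` are two equivalent ways to grab the section.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the downloaded page; tag name and id are hypothetical
sample_html = """
<div id="most-popular">
  <ul>
    <li><a href="/article-1">First popular article</a></li>
    <li><a href="/article-2">Second popular article</a></li>
  </ul>
</div>
<div id="latest">
  <ul><li><a href="/article-3">Latest article</a></li></ul>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Select the section by tag name and id
popular_box = soup.find("div", id="most-popular")

# The same lookup written as a CSS selector
same_box = soup.select_one("div#most-popular")

print(popular_box["id"])  # most-popular
```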

Case Study – Detik.com
4. Find all 'li' tags in the list of most popular articles
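A sketch of this step on the same kind of made-up snippet (the `most-popular` id is still a hypothetical placeholder). Calling `find_all("li")` on the selected section, rather than on the whole soup, keeps list items from other parts of the page out of the result.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the downloaded page; the id is hypothetical
sample_html = """
<div id="most-popular">
  <ul>
    <li><a href="/article-1">First popular article</a></li>
    <li><a href="/article-2">Second popular article</a></li>
  </ul>
</div>
<div id="latest">
  <ul><li><a href="/article-3">Latest article</a></li></ul>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")
popular_box = soup.find("div", id="most-popular")

# Scoped search: only <li> tags inside the "most popular" section
items = popular_box.find_all("li")

# Unscoped search for comparison: every <li> on the page
all_items = soup.find_all("li")

print(len(items), len(all_items))  # 2 3
```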

Case Study – Detik.com
5. Then iterate over the selected 'li' tags and get the title of each article
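A sketch of the final step on placeholder markup. It assumes each `<li>` wraps an `<a>` holding the title, which is a guess about the real page's structure, and falls back to the `<li>` text otherwise. `get_text(strip=True)` drops surrounding whitespace.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the downloaded page; the id and structure are hypothetical
sample_html = """
<div id="most-popular">
  <ul>
    <li><a href="/article-1"> First popular article </a></li>
    <li><a href="/article-2">Second popular article</a></li>
  </ul>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")
items = soup.find("div", id="most-popular").find_all("li")

titles = []
for rank, li in enumerate(items, start=1):
    link = li.find("a")
    # Prefer the link text; fall back to the whole <li> if there is no link
    source = link if link is not None else li
    title = source.get_text(strip=True)
    titles.append(title)
    print(rank, title)
```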

Done
Now you can get the titles of the most popular articles on detik.com without selecting, copying, and
pasting them into Excel or Word by hand. As a next step, you can save them to a CSV or TXT file
for text mining.
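One way to do that follow-up step with only the standard library is sketched below. The helper names `save_titles_csv` and `save_titles_txt` are hypothetical. Opening the CSV file with `newline=""` follows the `csv` module's documented recommendation, which avoids blank lines between rows on Windows; the `csv` writer also quotes titles that contain commas.

```python
import csv
from pathlib import Path


def save_titles_csv(titles, path):
    """Write titles to a CSV file with a rank column (hypothetical helper)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["rank", "title"])
        for rank, title in enumerate(titles, start=1):
            writer.writerow([rank, title])


def save_titles_txt(titles, path):
    """Write one title per line to a plain-text file (hypothetical helper)."""
    Path(path).write_text("\n".join(titles) + "\n", encoding="utf-8")
```

With the `titles` list from the previous step, `save_titles_csv(titles, "popular_titles.csv")` and `save_titles_txt(titles, "popular_titles.txt")` produce files ready for a text-mining tool; the file names are placeholders.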
