---
title: 'BeautifulSoup Tutorial: Web Scraping with Python [2026]'
source: 'https://youtube.com/watch?v=ztxIxbRyPJI'
video_id: 'ztxIxbRyPJI'
date: 2026-06-17
duration_sec: 0
---

# BeautifulSoup Tutorial: Web Scraping with Python [2026]

> Source: [BeautifulSoup Tutorial: Web Scraping with Python (2026)](https://youtube.com/watch?v=ztxIxbRyPJI)

## Summary

This tutorial teaches web scraping with Python's BeautifulSoup library. It covers setting up a virtual environment, fetching web pages with requests, parsing HTML, extracting data, saving to CSV, and handling common errors. The video emphasizes responsible scraping practices.

### Key Points

- **Introduction to Web Scraping** [0:00] — Web scraping allows automatic extraction of data from websites, solving the problem of manual copy-pasting.
- **What is BeautifulSoup?** [0:25] — BeautifulSoup is a Python library that reads webpage code and extracts specific pieces, like a highlighter for a magazine.
- **Setting Up a Virtual Environment** [1:04] — Use pipenv to create a clean workspace for the project, preventing library conflicts.
- **Installing Requests** [1:38] — Install the requests library with 'pip install requests' to fetch webpages.
- **Fetching a Website with Requests** [2:05] — A simple script fetches a URL and prints raw HTML, which is messy and unreadable.
- **User-Agent Header** [2:41] — Adding a user-agent header (e.g., 'Mozilla/5.0') makes the script look like a real browser, which helps avoid blocks on some sites.
- **Installing BeautifulSoup** [3:07] — Install BeautifulSoup with 'pip install beautifulsoup4'.
- **Scraping Quotes from Practice Site** [4:52] — Use BeautifulSoup to parse HTML and extract quotes, authors, and tags from quotes.toscrape.com.
- **Saving Data to CSV** [6:55] — Import csv module, open a file, write header row, and loop through parsed data to save as CSV.
- **Common Errors and Fixes** [8:36] — Three common errors: NoneType error (wrong class name), empty results (wrong tag/class), and connection timeout (use try-except).
- **Responsible Scraping** [10:48] — Check robots.txt, add delays (time.sleep(1)), and read terms of service before scraping.

### Conclusion

With BeautifulSoup, you can scrape any publicly visible web data. Start with the practice site, then apply these techniques to real projects responsibly.

## Transcript

Have you ever been on a website, scrolling 
through a huge list of quotes, names, or data,  
and thought – I wish I could just grab all of 
this automatically instead of copying it one  
by one? Well, that's exactly what web scraping 
lets you do. And today, we're going to learn how  
to use a Python library called BeautifulSoup.
So what is BeautifulSoup? In plain English,  
it's a tool that reads the code behind a webpage 
and lets you pull out exactly the pieces you  
want. Think of a webpage like a magazine. 
BeautifulSoup is like a highlighter that  
lets you mark just the quotes, just the author 
names, or just the tags – whatever you're after. 
The best part? You don't need to be an expert. 
If you know basic Python – variables, loops,  
print statements – you're completely 
ready for this. We're going to go slow,  
explain every single line, and by the end of 
this video you'll be pulling real data off a  
real website and saving it to a file.
Alright – let's get into it. 
Before we install anything, let's set up a 
virtual environment. Think of it like a clean,  
separate workspace just for this project – so 
the libraries we install here don't interfere  
with anything else on your computer.
First, let’s install pipenv. 
Then navigate to your project 
folder and run this command. 
That's it – one command. It creates the virtual 
environment and activates it at the same time.  
You'll see your terminal change, which means 
you're now inside the environment and ready to go. 
Now, let's get another thing 
installed. Open your terminal and type: 
pip install requests
What's requests? It's a  
Python library that lets your code fetch 
a webpage – kind of like telling Python,  
'go open this URL for me and bring 
back whatever you find.' That's it. 
We're NOT installing BeautifulSoup yet. There's 
a reason for that – and you're about to see why. 
Alright, let's write a quick script. 
We're going to try fetching a website  
using just requests, with no extra setup.
Simple enough – we're telling Python: go to this  
URL, grab the content, and print it out.
Let's run it.
Okay - you'll see a massive wall of HTML 
printed out. The data is in there somewhere,  
but it's basically unreadable. You'd have to 
dig through hundreds of lines just to find  
one author's name. We need something better.
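
The script shown at this step is not captured in the transcript. A minimal sketch of a requests-only fetch might look like the following, assuming the URL is the practice site used later in the video; the names `URL`, `fetch_raw_html`, and `main`, and the 10-second timeout, are illustrative choices rather than the video's own.

```python
import requests

# Assumed URL: the transcript does not name the site fetched at this step.
URL = "https://quotes.toscrape.com"


def fetch_raw_html(url):
    """Fetch a page with requests alone and return its raw HTML as a string."""
    response = requests.get(url, timeout=10)  # timeout is an added safeguard
    return response.text


def main():
    # Prints the whole page source: valid HTML, but hard to read by eye.
    print(fetch_raw_html(URL))
```

Calling `main()` prints the full page source, which is the wall of HTML described above.
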
When your browser visits a website, it sends  
a small label called a user-agent – basically 
saying 'I'm Google Chrome on a Mac.' A plain  
Python script identifies itself as a script instead, so some websites 
block it immediately. Adding that label to our request, combined  
with BeautifulSoup, fixes both problems – the label makes us look like 
a real browser, and BeautifulSoup organizes that messy HTML into something 
we can actually search through. Let's set that up. 
Time to install BeautifulSoup in 
your terminal with this command. 
Note that it's beautifulsoup4 with a 4 at the 
end. That's just how the package is named. 
Now here are our imports.
And this time we're adding a headers dictionary. 
Now you might be looking at Mozilla/5.0 
and thinking – what is that?  
It's actually a browser signature. When a real 
browser like Chrome or Firefox visits a website,  
it sends this string to identify itself. 
Mozilla/5.0 is the base signature that  
almost every modern browser uses – Chrome, 
Firefox, Safari, they all start with it. 
So by adding this to our request, 
we're telling the website – 'hey,  
I'm a normal browser, not a Python script.' 
Most websites will see this and let us through. 
If you're curious, a full real user-agent 
string actually looks like this. 
But for our tutorial today, the 
short version works perfectly fine. 
Let’s continue the code and 
confirm if everything is working. 
If you see 200 printed – that means success. 
Status code 200 is the web's way of saying  
everything is fine and you're in. 
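
The headers dictionary and status check described above could be sketched roughly like this. The URL is assumed to be the practice site named in the next section, and `get_status` is a hypothetical helper name. The prepared request is built without being sent, purely to show which header would go out.

```python
import requests

URL = "https://quotes.toscrape.com"  # assumed: practice site named later in the video

# Short user-agent string discussed in the video.
headers = {"User-Agent": "Mozilla/5.0"}

# Build the request without sending it, to inspect the header that would go out.
prepared = requests.Request("GET", URL, headers=headers).prepare()
print(prepared.headers["User-Agent"])  # Mozilla/5.0


def get_status(url):
    """Send the request with the custom header and return the HTTP status code."""
    response = requests.get(url, headers=headers, timeout=10)
    return response.status_code  # 200 means the page was served successfully
```

Calling `get_status(URL)` sends the real request; a return value of 200 matches the success case described above.
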
Alright – let's start scraping. 
This is the main part of the video. We're going 
to scrape real quotes, author names, and tags  
from quotes.toscrape.com. This site was built 
specifically for practicing web scraping – so  
it's completely safe and legal to use here.
First, we pass the HTML into  
BeautifulSoup with this command.
response.text is the raw HTML of  
the page – all the code that makes the website. 
And 'html.parser' is telling BeautifulSoup which  
tool to use to read that code. The good news - 
it's built into Python, no extra install needed. 
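
As a rough illustration of this step, the sketch below parses a small hand-written HTML string in place of `response.text`, so it runs without a network connection. The sample markup is hypothetical.

```python
from bs4 import BeautifulSoup

# Stand-in for response.text: a tiny hand-written page (hypothetical sample).
html = (
    "<html><head><title>Quotes to Scrape</title></head>"
    "<body><p>Hello</p></body></html>"
)

# "html.parser" selects Python's built-in HTML parser.
soup = BeautifulSoup(html, "html.parser")

print(soup.title.string)  # Quotes to Scrape
print(soup.p.text)        # Hello
```
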
Now let's find the quotes. If 
you right-click and hit Inspect,  
you'll see each quote lives inside a div with a 
class of quote. So we grab them all like this. 
The find_all method searches through the entire 
HTML and returns every element that matches. 
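
A minimal sketch of that lookup, run against hypothetical sample markup rather than the live page:

```python
from bs4 import BeautifulSoup

# Hypothetical sample: two quote blocks and one unrelated div.
html = """
<div class="quote"><span class="text">First quote</span></div>
<div class="quote"><span class="text">Second quote</span></div>
<div class="other">Not a quote</div>
"""

soup = BeautifulSoup(html, "html.parser")

# class_ has a trailing underscore because "class" is a reserved word in Python.
quotes = soup.find_all("div", class_="quote")

print(len(quotes))  # 2
```
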
Now quotes is a list and let's loop through it.
This just means – go through each quote  
one by one and do the following:
We're looking inside each quote block  
for a span with the class text, and grabbing 
the text inside it. That's the quote itself. 
Same idea – find a small tag with the class 
author. We include the tag name small to be more  
specific, because a class name alone isn't always 
unique on a page. That gives us the author's name.
Here we create an empty list called tags, then 
loop through all the tag links inside each  
quote and add each one to that list.
And finally we print everything out. 
The '\n' just adds a blank line between 
each quote so the output is easy to read. 
And join(tags) joins all the tags into 
one clean string separated by commas. 
Let's run it. 
Look at that. Real quotes, real authors, 
real tags – all pulled automatically. 
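
The loop described above might look something like the sketch below. The on-screen code is not in the transcript, so the `a` tag with class `tag` is an assumption about the page markup, the sample HTML is made up, and `extract_quotes` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical sample that mimics the structure described in the video.
SAMPLE_HTML = """
<div class="quote">
  <span class="text">Sample quote one.</span>
  <small class="author">Author A</small>
  <a class="tag" href="#">life</a>
  <a class="tag" href="#">books</a>
</div>
<div class="quote">
  <span class="text">Sample quote two.</span>
  <small class="author">Author B</small>
  <a class="tag" href="#">humor</a>
</div>
"""


def extract_quotes(html):
    """Return a list of (text, author, tags) tuples, one per quote block."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for quote in soup.find_all("div", class_="quote"):
        text = quote.find("span", class_="text").text
        author = quote.find("small", class_="author").text
        tags = []
        for tag in quote.find_all("a", class_="tag"):  # assumed tag markup
            tags.append(tag.text)
        rows.append((text, author, tags))
    return rows


for text, author, tags in extract_quotes(SAMPLE_HTML):
    print(text)
    print(author)
    print(", ".join(tags) + "\n")  # trailing "\n" leaves a blank line between quotes
```
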
Now let's save this data to a CSV file so 
you can open it in Excel or Google Sheets. 
First, import the csv module.
Next, write this command. 
This line opens a new file called quotes.csv. 
The 'w' means we're writing to it – creating  
it fresh. Newline prevents extra blank lines 
appearing between rows, and encoding='utf-8'  
makes sure special characters like apostrophes 
or accented letters don't break anything. 
The writer variable holds a CSV writer – think 
of it as the pen that writes into our file. 
writer.writerow writes our header row. The column 
names at the top of the spreadsheet – the text,  
author, and tags we already extracted above.
The parsing code in the middle is exactly the  
same as before – we're just wrapping it inside the 
file writer now. No need to change anything there. 
Run it, and you'll see a quotes.csv file 
appear in your project folder. Open it,  
and all your quotes are right there.
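
A sketch of the CSV step under a few assumptions: the header capitalization (`Text`, `Author`, `Tags`) is a guess, the sample rows are made up, and the writing logic is pulled into a hypothetical `save_quotes` function instead of wrapping the parsing loop as the video does.

```python
import csv

# Hypothetical rows in the (text, author, tags) shape produced by the parsing step.
sample_rows = [
    ("Quote one", "Author A", ["life", "books"]),
    ("Quote two", "Author B", ["humor"]),
]


def save_quotes(rows, path="quotes.csv"):
    """Write a header row plus one CSV row per quote."""
    # newline="" avoids blank lines between rows; utf-8 keeps special characters intact.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Text", "Author", "Tags"])  # assumed header names
        for text, author, tags in rows:
            writer.writerow([text, author, ", ".join(tags)])
```

Calling `save_quotes(sample_rows)` creates `quotes.csv` in the current folder.
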
Before we wrap up, let's talk about the  
errors you will almost definitely run 
into. And I mean this happens to everyone,  
it's completely normal, and once you know 
what they are you'll fix them in seconds.
We will have a new errors.py file for that.
Error one - NoneType error. 
Let me show you this live. Watch what 
happens when I use the wrong class name. 
See that? AttributeError: 'NoneType' object has 
no attribute 'text' – BeautifulSoup returned None  
because it found nothing, and then .text on 
None crashes. The fix is simple – let’s see.
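
The fix shown on screen is not captured in the transcript. One common approach, sketched here with a hypothetical `safe_text` helper and made-up sample markup, is to check for `None` before reading `.text`.

```python
from bs4 import BeautifulSoup

html = '<div class="quote"><span class="text">A quote</span></div>'
soup = BeautifulSoup(html, "html.parser")

# Wrong class name ("txt" instead of "text"): find returns None.
missing = soup.find("span", class_="txt")
print(missing)  # None


def safe_text(parent, tag_name, class_name, default=""):
    """Return the element's text, or a default if the element is not found."""
    element = parent.find(tag_name, class_=class_name)
    if element is None:
        return default
    return element.text


print(safe_text(soup, "span", "text"))  # A quote
print(safe_text(soup, "span", "txt", default="not found"))  # not found
```
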
Error two - Empty results.
Similar idea but this time with find_all. Watch. 
It just returns an empty list [] - no crash, but 
no data either. This means a wrong class name or  
tag. When you inspect the page, copy the exact 
class name carefully. One typo is all it takes.
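
A small sketch of the empty-result case, using made-up sample markup. The explicit check and warning message are an illustrative addition, not necessarily what the video shows.

```python
from bs4 import BeautifulSoup

html = '<div class="quote"><span class="text">A quote</span></div>'
soup = BeautifulSoup(html, "html.parser")

# Typo in the class name ("quotes" instead of "quote"): no crash, just no matches.
wrong = soup.find_all("div", class_="quotes")
right = soup.find_all("div", class_="quote")

if not wrong:
    print("No elements matched - double-check the tag and class name")

print(len(wrong), len(right))  # 0 1
```
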
Error three - Connection timeout.
This one I can't show you live because  
our practice site is too fast and reliable – 
which is a good thing. But in the real world,  
when you're scraping slower or larger websites,  
timeouts will happen. Here's the code you'll 
need when they do – just keep it handy. 
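
The on-screen code is not in the transcript. A minimal sketch of a try-except around the request might look like this; `fetch_html`, the 10-second default timeout, and the printed messages are assumptions for illustration.

```python
import requests

HEADERS = {"User-Agent": "Mozilla/5.0"}


def fetch_html(url, timeout=10):
    """Return the page HTML, or None if the request fails or times out."""
    try:
        response = requests.get(url, headers=HEADERS, timeout=timeout)
        response.raise_for_status()  # treat 4xx/5xx responses as errors too
        return response.text
    except requests.exceptions.Timeout:
        print(f"Timed out after {timeout} seconds: {url}")
    except requests.exceptions.RequestException as error:
        # Base class for requests errors: connection failures, bad URLs, HTTP errors.
        print(f"Request failed: {error}")
    return None
```

Returning `None` on failure lets the calling code decide whether to retry, skip the page, or stop.
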
That's it for errors. Three 
simple fixes that'll cover the  
vast majority of what you'll face as a beginner.
Before we finish, a few important 
things about scraping responsibly. 
First – always check robots.txt. Go to 
any website and add /robots.txt at the  
end of the URL. Let me show 
you – walmart.com/robots.txt 
See these lines?
The Sitemap lines aren't relevant to us. 
Disallow means don't scrape this section. 
Allow means this part is open. Always read  
this file before scraping any real 
website and respect what it says. 
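
The same check can also be done in code. The sketch below uses Python's standard-library `urllib.robotparser` on made-up rules, so it runs offline; the video itself only reads the file in the browser, and the rules and example.com URLs here are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, not taken from any real site.
sample_rules = [
    "User-agent: *",
    "Disallow: /private/",
    "Allow: /public/",
]

parser = RobotFileParser()
parser.parse(sample_rules)

print(parser.can_fetch("*", "https://example.com/public/page"))   # True
print(parser.can_fetch("*", "https://example.com/private/page"))  # False

# For a live site, call parser.set_url(".../robots.txt") and parser.read()
# instead of parser.parse(...).
```
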
Now our practice site quotes.toscrape.com doesn't 
even have a robots.txt – and that's actually  
intentional. It was built specifically 
to be scraped freely, no restrictions.
Now let's talk about how to be polite  
when scraping multiple pages.
We can do this in our multi-page  
example with time.sleep(1). You add it inside 
your loop, after each request. One second  
between requests is polite – you're 
not hammering the server all at once.
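
The multi-page example is not included in the transcript. A rough sketch of a polite loop could look like this, assuming the practice site paginates as `/page/N/`; `page_urls` and `scrape_pages` are hypothetical helper names.

```python
import time

import requests

HEADERS = {"User-Agent": "Mozilla/5.0"}


def page_urls(base_url, last_page):
    """Build page URLs, assuming a /page/N/ pagination pattern."""
    return [f"{base_url}/page/{n}/" for n in range(1, last_page + 1)]


def scrape_pages(urls, delay=1.0):
    """Fetch each URL in turn, pausing after every request."""
    pages = []
    for url in urls:
        response = requests.get(url, headers=HEADERS, timeout=10)
        pages.append(response.text)
        time.sleep(delay)  # one-second pause by default, as suggested in the video
    return pages
```

For example, `scrape_pages(page_urls("https://quotes.toscrape.com", 3))` would request three pages with a one-second pause after each.
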
Third – always read the terms of service 
of any site before scraping it seriously.  
Some sites explicitly say no scraping. Better 
to check than to get blocked or in trouble. 
And that's a wrap! Let's do a quick 
recap of what we covered today. 
We started with a problem – a plain requests 
call with no setup, messy unreadable HTML,  
and no way to extract anything useful. 
Then we installed BeautifulSoup,  
added a user-agent header to look like a real 
browser, and suddenly everything worked cleanly. 
We scraped real quotes, authors, and tags from a 
live website, handled some common errors you'll  
run into, and talked about scraping responsibly 
with robots.txt, delays, and terms of service. 
That's a lot for one video – 
and you did it all from scratch. 
From here you can scrape 
news headlines, job postings,  
sports stats, product listings – anything 
publicly visible on the web. BeautifulSoup  
is your starting point for all of it.
If this helped, hit subscribe and drop  
a comment telling me what you're planning 
to scrape first. See you in the next one.
