
BeautifulSoup + Requests | Web Scraping in Python

0h 06m video | Transcribed Jun 30, 2026
Beginner | 3 min read | For: Python beginners interested in learning web scraping fundamentals.
277.3K views | 4.8K likes | 85 comments | 83 dislikes | 1.8% 📊 Average

AI Summary

This lesson introduces web scraping in Python using the BeautifulSoup and Requests libraries. The instructor demonstrates how to import these packages, send a GET request to a static web page, retrieve the raw HTML, and parse it into a usable BeautifulSoup object. The video also briefly covers common HTTP response codes and the `prettify()` method for better HTML visualization.

[00:00]
Introduction to BeautifulSoup and Requests

These two packages are the main tools for beginner web scraping in Python: Requests retrieves the HTML from a website, and BeautifulSoup parses it.

[00:26]
Lesson Plan Overview

The lesson will cover importing packages, getting HTML from a website, and making it usable. The next lesson will cover querying the HTML for specific tags, navigable strings, classes, and attributes.

[00:42]
Importing Packages

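Import `BeautifulSoup` from `bs4` and import `requests`. If `bs4` is not installed, run `pip install beautifulsoup4` in the terminal (the video's `pip install bs4` also works, since `bs4` on PyPI is a placeholder that installs `beautifulsoup4`). Jupyter through Anaconda normally includes it already.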
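A small sketch for checking both dependencies before importing them, using only the standard library's `importlib.util.find_spec`. The helper `missing_packages` and the `REQUIRED` mapping are hypothetical names for illustration, not code from the video. Note that the import name `bs4` differs from the package name you pass to pip.

```python
import importlib.util

# Hypothetical mapping: import name -> pip package name.
# The import name "bs4" differs from the PyPI distribution "beautifulsoup4".
REQUIRED = {"bs4": "beautifulsoup4", "requests": "requests"}


def missing_packages(required=REQUIRED):
    """Return the pip package names whose modules cannot be found."""
    return [
        package
        for module, package in required.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_packages()
if missing:
    # Show the command to run in a terminal instead of installing automatically.
    print("Run in a terminal: pip install " + " ".join(missing))
else:
    from bs4 import BeautifulSoup  # the two imports used in the lesson
    import requests

    print("BeautifulSoup and requests imported successfully")
```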

[01:34]
Setting the URL

Assign the target URL to a variable (e.g., `url = '...'`). This URL will be used to pull data from the static web page.

[02:07]
Sending a GET Request

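Use `requests.get(url)` to send a GET request; it returns a response object whose `status_code` shows the result. A code of 200 (OK) means success. 204 (No Content) is technically a success code, but the server returned no body, so there is nothing to scrape. 400 (Bad Request), 401 (Unauthorized), and 404 (Not Found) are client errors.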
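The status-code rules can be tried without any network access using the standard library's `http.HTTPStatus`. In practice you would pass in `page.status_code` from `requests.get(url)`; the `describe_status` helper below is hypothetical. It makes the point that 204 counts as a success (just with an empty body), while 400, 401, and 404 are client errors.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Label an HTTP status code with its standard reason phrase and class."""
    phrase = HTTPStatus(code).phrase  # raises ValueError for unknown codes
    if 200 <= code < 300:
        kind = "success"
    elif 300 <= code < 400:
        kind = "redirect"
    elif 400 <= code < 500:
        kind = "client error"
    elif 500 <= code < 600:
        kind = "server error"
    else:
        kind = "informational"  # 1xx codes
    return f"{code} {phrase} ({kind})"


for code in (200, 204, 400, 401, 404):
    print(describe_status(code))
# 200 OK (success)
# 204 No Content (success)
# 400 Bad Request (client error)
# 401 Unauthorized (client error)
# 404 Not Found (client error)
```

With Requests itself, `page.raise_for_status()` is the built-in way to turn a 4xx or 5xx response into an exception.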

[03:06]
Parsing HTML with BeautifulSoup

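Create a BeautifulSoup object by passing the raw HTML (from `page.text`, where `page` is the response returned by `requests.get(url)`) and a parser. The video uses `'html'`, which lets BeautifulSoup choose an installed HTML parser; `'html.parser'` names Python's built-in parser explicitly. The result is a parse tree that is much easier to navigate and query than the raw HTML string.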
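A minimal offline sketch of the parsing step. It assumes `bs4` is installed, and it substitutes a small made-up HTML snippet for `page.text` so no network request is needed. It uses `'html.parser'` rather than the video's `'html'` so the result does not depend on which optional parsers are installed.

```python
from bs4 import BeautifulSoup

# Stand-in for page.text: a tiny, made-up HTML snippet so the example runs offline.
raw_html = """
<html>
  <head><title>Sample Table</title></head>
  <body>
    <table>
      <tr><th>Name</th><th>Team</th></tr>
      <tr><td>Ada</td><td>Blue</td></tr>
    </table>
  </body>
</html>
"""

# "html.parser" is Python's built-in parser; the video passes "html",
# which lets BeautifulSoup pick an installed HTML parser instead.
soup = BeautifulSoup(raw_html, "html.parser")

print(soup.title.string)   # Sample Table
print(soup.th.get_text())  # Name
print(soup.td.get_text())  # Ada
```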

[05:38]
Using prettify() for Visualization

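Call `soup.prettify()` to get the HTML as a string with one tag per line, indented to show the hierarchy; wrap it in `print()` so the line breaks render. This is useful for visual inspection, but querying (e.g., with `find()` and `find_all()`) is still done on the `soup` object itself.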
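A short sketch contrasting the flat output with `prettify()`, assuming `bs4` is installed and using a made-up one-row table. `prettify()` returns a new string and leaves `soup` unchanged, which is why querying still happens on `soup`.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<table><tr><td>Ada</td></tr></table>", "html.parser")

flat = str(soup)           # the parsed HTML on a single line
pretty = soup.prettify()   # a new string: one tag per line, indented by depth

print(flat)
print(pretty)  # print() renders the newlines; a bare expression in Jupyter shows "\n" escapes
```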

This lesson provides the foundational steps for web scraping with BeautifulSoup and Requests, preparing the HTML for detailed querying in subsequent lessons.

Clickbait Check

95% Legit

"The title accurately describes the content: a beginner tutorial on using BeautifulSoup and Requests for web scraping in Python."

Tutorial Checklist

1 00:42 Import `BeautifulSoup` from `bs4` and import `requests`.
2 01:34 Assign the target URL to a variable (e.g., `url = '...'`).
3 02:07 Send a GET request with `page = requests.get(url)` and check `page.status_code` (200 = success).
4 03:06 Create a BeautifulSoup object: `soup = BeautifulSoup(page.text, 'html')`.
5 05:38 (Optional) Use `soup.prettify()` to visualize the HTML with indentation.
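Putting the checklist together, one possible sketch separates the network call from the parsing, so the parsing half can be tried on any object with `status_code` and `text` attributes. The function names `soup_from_response` and `scrape`, the `timeout=10` argument, and the placeholder URL are assumptions, not from the video; `'html.parser'` stands in for the video's `'html'`. It assumes `requests` and `bs4` are installed.

```python
import requests
from bs4 import BeautifulSoup


def soup_from_response(page) -> BeautifulSoup:
    """Steps 3-4: confirm a 200 response, then parse the raw HTML in page.text."""
    if page.status_code != 200:
        raise ValueError(f"GET failed with status code {page.status_code}")
    return BeautifulSoup(page.text, "html.parser")


def scrape(url: str) -> BeautifulSoup:
    """Steps 2-4 combined: send a GET request and return the parsed soup."""
    # timeout is an added safeguard, not part of the video's code
    page = requests.get(url, timeout=10)
    return soup_from_response(page)


# Usage (needs network access; the URL is a placeholder, not the video's page):
# soup = scrape("https://example.com/")
# print(soup.prettify())
```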

Study Flashcards (6)

What are the two main Python packages used for beginner web scraping?

easy

BeautifulSoup and Requests.

How do you install Beautiful Soup (`bs4`) if it is not already installed?

easy

Run `pip install beautifulsoup4` in the terminal (`pip install bs4`, as used in the video, installs the same package).

01:06

What does a 200 response code from a GET request indicate?

easy

The request was successful.

02:07

What does a 404 response code mean?

easy

The server was reached, but the requested page or resource was not found (404 Not Found).

02:26

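What two arguments does the lesson pass when creating a BeautifulSoup object?

medium

The raw HTML (e.g., `page.text`) and the parser (e.g., `'html'` or `'html.parser'`). Strictly, only the HTML is required; if the parser is omitted, BeautifulSoup picks one and emits a warning.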

03:18

What is the purpose of the `prettify()` method in BeautifulSoup?

medium

It formats the HTML with indentation and hierarchy for easier visual inspection.

05:38

💡 Key Takeaways

💡

Beginner Web Scraping Tools

Establishes BeautifulSoup and Requests as the foundational tools for beginners in web scraping.

📊

HTTP Response Codes

Explains common HTTP response codes (200, 204, 400, 404) and their meanings, which is essential for debugging.

02:07
🔧

Parsing HTML with BeautifulSoup

Demonstrates the core step of converting raw HTML into a structured BeautifulSoup object using the 'html' parser.

03:06
🔧

Using prettify() for Visualization

Introduces a method to make HTML more readable, though notes it is not used for querying.

05:38

✂️ Creator Tools: Viral Hooks

AI-generated clip ideas for Shorts based on the transcript

Why BeautifulSoup & Requests are Essential

45s

Explains the core tools for web scraping, a highly sought-after skill, in a clear beginner-friendly way.

How to Fix Import Errors Fast

60s

Provides a quick troubleshooting tip for common installation issues, saving viewers time and frustration.

HTTP Status Codes: What They Mean

40s

Demystifies common HTTP status codes (200, 404, etc.) that many beginners encounter, making it highly educational and shareable.

BeautifulSoup: Turning Messy HTML into Soup

54s

The creator's funny backstory about the library's name adds personality, making the technical content more memorable and engaging.

[00:00] Hello everybody. In this lesson, we're going to be taking a look at Beautiful Soup and Requests. Now these packages in Python are really useful. These are the two main ones that I used when I was first starting out with web scraping. They can get a lot of what you want done in order to get

[00:14] that information out. Now of course, there are other packages that you can use that may be a little bit more advanced, but again, this is just the beginner series, and in a future series we'll look at other packages as well that have some more advanced functionality. So what we're going to be doing is

[00:26] importing these packages, and then we're going to get all of the HTML from our website and make sure that it's in a usable state. And then in the next lesson, we're going to kind of query around in the HTML and pick and choose exactly what we want. We'll look at things like tags, navigable

[00:42] strings, classes, attributes, and more. So let's get started by importing our packages. What we're going to say is from bs4. This is the module that we're taking it from. We're going to say import

[00:54] and then we'll do BeautifulSoup. Then we're going to come down and we're going to say import requests. Now, let's go ahead and run this. I'm going to hit Shift+Enter and it works well for me.

[01:06] Now, if this does not work for you, you may potentially need to actually install bs4. So you may have to go to your terminal window and say pip install bs4. I'll just let you Google how to do that if you need to, because it's pretty easy. But if you're using Jupyter notebooks through

[01:20] Anaconda, like how we set it up at the beginning of this Python series, then you should be totally fine. It should be there for you. The next thing that we need to do is specify where we're taking this HTML from. So what we need to actually do is come right over here to our web page and we need to

[01:34] get the URL. So we're going to go here. We're going to copy this URL and I'm just going to put it right here for a second. We're going to be using this URL quite a bit, so we just want to assign it to a variable. So we'll just say URL is equal to and then we'll put it right

[01:50] in here. Now we can get rid of that. So now this is our URL going forward. This is where we're pulling data from. Let's go ahead and run this. Now we're going to use requests and what we're going to do is we're going to say requests dot get and then we're going to put in URL. Now this

[02:07] get function is going to use the requests library. It's going to send a GET request to that URL and it's going to return a response object. Let's go ahead and run this. As you can see here, I got a response of 200. If you got something like a 204 or a 400 or 401 or 404, all of these things are

[02:26] potentially bad. Something like a 204 would mean there was no content in the actual web page. 400 means a bad request. So it was invalid. The server couldn't process it and you don't get any response. If you got a 404, that might be one that you're familiar with. That's an error that means

[02:40] the server cannot be found. The next thing that we're going to do is take the HTML. Now if you remember, we come right back here and we inspect this. We have all of this HTML right here. Now in this web page

[02:52] specifically right now, it's completely static because it's not a bunch of moving stuff or anything like that. Usually when you're looking at HTML on something like Amazon, those web pages can update, but when you actually pull that into Python, you're basically getting a snapshot

[03:06] of the HTML at that time. So what we're going to do is bring in all of this HTML, which is our snapshot of our website and then we can take a look at it. So we're going to come right down here and now we're

[03:18] going to say BeautifulSoup. So now we'll use the Beautiful Soup library when we say BeautifulSoup. And we're going to open a parenthesis. We're going to do two things. There are two parameters that we need to put in here. First, we need to put in this get request. We actually need to

[03:32] name this and we'll call this page. We'll say page is equal to and let's run this. And now we're going to put that page in here and what we're going to say is dot text. So the page is what's sending that request. And then the dot text is what's retrieving the actual raw HTML that we're going to be using.

[03:49] Then we're going to put a comma here and what we need to specify is how we're going to parse this information. Now this is HTML, so what we're going to do is put in 'html' just like this. This is a standard that's already built into this library, so we don't need to go any further, but it's basically going

[04:05] to parse the information in an HTML format. Let's go ahead and run this. Let's see what we get. And as you can see, we have a lot of information. And as we scroll down, I'll try to point out some things that we've already looked at in previous lessons. Something like this th tag,

[04:24] that should be very similar. That's the title. Then we have these td tags. And then of course, if we scroll down even further, we'll have things like a tr tag. So these are all things that we looked at in that first lesson when learning about HTML. Now again, we want to assign this to a variable.

[04:39] So we're going to say soup is equal to this information right here. Now I'm not going to go into all the history behind Beautiful Soup, but what I will say is that the guy who created this Beautiful Soup library said that it takes this really messy HTML or

[04:55] XML, which you can also use it for. And it makes it into this beautiful soup. So I just thought that was kind of funny. But that's why we're calling it soup right here. And we're going to go ahead and run this. And we'll come right down here. And we'll say print soup. And let's run it.

[05:11] And now we have everything in here. So we have our HTML, our head. We have some href and some links in here. Let's scroll down a little bit more. And then we have our body right there. And of course,

[05:23] we have a bunch of information here. Now in the next lesson, what we're going to be doing is learning how to kind of query all of this to take specific information out and basically understand a lot of what's going on in this HTML to make sure we can actually get what we need. Now, if this looks really

[05:38] kind of messy to you and it just doesn't make a lot of sense, there is one more thing that I'm going to show you. And we'll come right down here. So we'll say soup.prettify(). And if you've ever used

[05:50] other programming languages, prettify is very common in a lot of them. It'll just make it a little bit easier to visualize and see. You'll notice that it kind of has this hierarchy built in, whereas if we scroll up, there's no hierarchy built in. It's all just down this left-hand side.

[06:05] So if you kind of want to view it and just kind of visually see the differences, this does help a lot. But it doesn't actually help a lot when you're querying it or using, you know, find and find_all,

[06:17] which is what we're going to look at in the next lesson. So that is our lesson on Beautiful Soup and Requests. In the next two lessons we're going to be looking at find and find_all as well as really diving into things like navigable strings and tags and classes and all those things. And then

[06:30] in the last lesson, we're going to do kind of this mini project where we try to get all the data from this web page that we've been using from that table and put it into a pandas data frame. So thank you guys so much for watching. I really appreciate it. If you like this video,

[06:42] be sure to like and subscribe below. And I will see you in the next lesson.
