Scrape GPU Prices with Python?
Promises a practical application of web scraping to find sought-after GPU prices, tapping into current market demand.
This tutorial introduces Beautiful Soup 4, a Python library for web scraping and HTML parsing. It covers installation, reading local HTML files, modifying tags, and fetching web pages using the requests library. The video also demonstrates a practical example of extracting GPU prices from a website.
Beautiful Soup allows extracting information from HTML documents and modifying them programmatically using Python.
Install Beautiful Soup 4 using 'pip install beautifulsoup4'. Alternative commands include 'pip3 install beautifulsoup4' or 'python -m pip install beautifulsoup4'.
Use 'with open("index.html", "r") as f: soup = BeautifulSoup(f, "html.parser")' to read and parse a local HTML file.
Access the first occurrence of a tag using 'soup.tagname' (e.g., 'soup.title'). Use '.string' to get or modify the text inside a tag.
Use 'soup.find_all("tagname")' to get all tags of a given type. Use 'soup.find("tagname")' to get only the first match.
Install the requests library with 'pip install requests' to fetch web pages.
Send a GET request with 'requests.get(url)' and access the HTML content via 'result.text'. Then parse it with BeautifulSoup.
Use 'soup.find_all(text="$")' to find every text node that is exactly a dollar sign (newer Beautiful Soup releases call this argument 'string'). Then navigate to the parent tag to extract the full price.
Use '.parent' to move up the parse tree, then '.find("strong")' to locate the price tag, and '.string' to get the numeric value.
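The last two points can be sketched on an inline HTML string, so no network access is needed. The markup below is a hypothetical stand-in loosely modelled on the listing described in the video, not the real page, and it uses 'string=' (the newer name for the 'text=' argument).

```python
from bs4 import BeautifulSoup

# Hypothetical markup, assumed for illustration only
html = """
<ul>
  <li class="price-current">
    <span class="price-current-label"></span>$<strong>2,613</strong><sup>.99</sup>
  </li>
</ul>
"""

doc = BeautifulSoup(html, "html.parser")

# Find text nodes that are exactly "$"
prices = doc.find_all(string="$")

# Move up to the tag that contains the dollar sign
parent = prices[0].parent

# Search inside that tag for the <strong> tag holding the number
strong = parent.find("strong")
print(strong.string)
```

On a real page the dollar sign may appear more than once, so it is worth checking each match rather than assuming the first one is the price you want.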
"The title accurately describes the tutorial content; it's a genuine introduction to Beautiful Soup 4 for web scraping."
How do you install Beautiful Soup 4?
pip install beautifulsoup4
01:55
What is the import statement for Beautiful Soup?
from bs4 import BeautifulSoup
02:55
How do you read a local HTML file with Beautiful Soup?
with open('index.html', 'r') as f: soup = BeautifulSoup(f, 'html.parser')
03:08
How do you get the first tag of a certain type?
soup.tagname (e.g., soup.title)
05:50
How do you access the text inside a tag?
tag.string
06:29
How do you modify the text inside a tag?
tag.string = 'new text'
06:46
How do you find all tags of a certain type?
soup.find_all('tagname')
07:47
How do you fetch a webpage's HTML using requests?
result = requests.get(url); html = result.text
10:58
How do you parse HTML from a string?
soup = BeautifulSoup(html_string, 'html.parser')
11:16
How do you find all occurrences of a specific text?
soup.find_all(text='$')
13:22
How do you get the parent tag of an element?
element.parent
15:05
How do you find a tag within another tag?
parent.find('tagname')
16:03
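As a rough illustration of the first few answers above, here is a minimal sketch that parses a string, reads the first title tag, and changes its text in place. The HTML is a hypothetical stand-in for the index.html used in the video; passing an open file handle to BeautifulSoup works the same way.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the video's index.html
html = "<html><head><title>Your Title</title></head><body><p>Hello</p></body></html>"

doc = BeautifulSoup(html, "html.parser")

tag = doc.title        # first <title> tag in the document
original = tag.string  # the text held inside the tag
tag.string = "hello"   # replaces the text in place

print(original)
print(doc.title)
```

Because the change is made in place, printing the whole document afterwards shows the new title as well.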
Beautiful Soup's Core Functionality
Defines the library's dual purpose: extracting and modifying HTML, which is the foundation of the tutorial.
00:12
Accessing Tags by Name
Shows the simplest way to retrieve elements, a fundamental technique for web scraping.
05:07
Using find_all for Multiple Matches
Demonstrates how to collect all tags of a type, essential for scraping lists or repeated data.
07:29
Fetching Web Pages with Requests
Introduces the requests library to retrieve live HTML, bridging local files and real-world scraping.
10:58
Navigating the Parse Tree with Parent
Explains the tree structure and how to move upward to extract contextual data, a key scraping pattern.
13:46
[00:00] Hello everybody and welcome to a brand new tutorial series on this channel, which is on Beautiful Soup 4. Now, Beautiful Soup 4 is kind of a web scraping and HTML parsing module. So what this
[00:12] allows you to do is actually extract information from HTML documents and then modify HTML documents as well. So you could use this for web scraping. You could also use this to read in, say, an HTML file,
[00:24] modify it programmatically using Python code, and then recreate like a new HTML file that has those modifications to it. It's very versatile. There's a ton of stuff to show you, but in this first video here what I will be doing is just giving you an introduction to how it works, showing you how
[00:39] to read in a local file, showing you how to read in HTML from the web, and then I'll kind of just give you, you know, like a brief walkthrough of how Beautiful Soup works and some of the main, most common functionality that you're going to want to know. In the very last video of the series, I will show you how
[00:54] to write a relatively automated web scraping program that goes and looks for prices of graphics cards. I know a lot of people are looking for graphics cards right now, so I thought that would be an interesting application that we could kind of conclude everything with, writing that code.
[01:07] Anyways, I hope you guys are excited. If you are, make sure to leave a like and subscribe to the channel. Let me know in the comments anything you want to see in this series. Let's go ahead and get started. All right, so in front of me, I have the Beautiful Soup 4.9 documentation. I'll leave a link to this
[01:30] in the description in case you'd like to read this yourself. Pretty much everything I'm going to show you here is coming directly from this documentation page. I've just kind of summarized it and grabbed what I figured was the most important stuff from here. Anyways, if you want to see all of the functionality,
[01:43] you can see there is quite a bit of it. This is quite a long document. Then you can click the link in the description. All right, so the first thing we need to do when we're going to start working with Beautiful Soup is we need to install it. Now what we need to do is install the Python package,
[01:55] which comes from pip. So if you're on Windows, open up Command Prompt. If you're on Mac or Linux, open up your terminal and then type the following: pip install, and then this is beautifulsoup4, like that.
[02:09] I think I spelled that correctly. So you're going to pip install beautifulsoup4 and then that should install the package for you. Now if for some reason this command does not work for you, try pip3 install beautifulsoup4. If that doesn't work for you, try python -m pip install beautifulsoup4.
[02:26] And if that doesn't work, try python3 -m pip install beautifulsoup4. Lastly, add a 3 here. Those are the kind of different combinations you can try. If none of those work, I do have some videos I will leave in the description that show you how to fix your pip. Anyways, at this point,
[02:40] I'm going to assume that you've installed that Python package. I'm using Python version 3.8, I believe. Right now you can do this in pretty much any version. It should work the same. All right, so now that we've got that installed, we can start writing our Python code. I'm currently in Sublime Text. You can use
[02:55] any editor that you like. This is just the one that I prefer for these types of videos. And what I'm going to do is start by importing: from bs4 import BeautifulSoup, like that. So this is what you need to
[03:08] do to get started. And then what we're going to do after this is I'm going to show you how to read in an HTML file and then to modify that file. Then later in the video, I will show you how to read in kind of a web page. So if you want to read an HTML file, first of all, you need an HTML file. So I
[03:23] have this kind of dummy HTML file here. I'll leave a link in the description to a GitHub repository that has all the code that I write here, including this document. So you can grab it from there if you want. But this is just kind of a dummy HTML file. Okay. So this is in the same directory as where I have this
[03:38] web scraping.py file. Make sure it's in the same directory. Otherwise, it's going to be a bit of a headache. And what you're going to do is open this file and then use Beautiful Soup to read it. So we're going to say with open, and then this is going to be index.html, comma. And then we're going to say
[03:53] html.parser. So sorry, not html.parser. This is going to be "r", because we're opening this in read mode. I'm getting a little bit ahead of myself. And I'm just going to call this f, standing for file. Okay. So with open index.html in read mode as f. And then what I'm going to do is say my
[04:11] document is equal to BeautifulSoup. And then I am going to put f as the document that I want to read in here. And then I'm going to do html.parser. So there's a few other parsers you can use here.
[04:24] I'm not really going to talk about what those are, but pretty much since this is an HTML document, we want to parse it as an HTML document. So we write html.parser. This is like an accepted type for the Beautiful Soup module. Okay. Then what I'm going to show you is just what this looks like,
[04:39] as kind of a Python object. So I'm going to print out the doc and run my code and show you that we get the HTML document like that. So that's as easy as it is to actually read in an HTML file. This is
[04:52] local on your machine. Now what I'm going to show you is one cool thing that you can do here. So it's usually better, because your HTML is always going to be all kind of like jumbled together, to prettify this before you print it out. So if you print doc.prettify(), then what this does is give you all
[05:07] the indentation. And you can see this is a lot nicer. And then obviously is much easier to read. Okay. So that is how you read in a document. Now I'm going to show you a few pieces of the functionality. So usually what happens when you read an HTML file like this is you want to search for a specific
[05:22] aspect. If we're going to the example I mentioned at the beginning of the video, maybe you're looking for the price of something, maybe you're looking for the name, you're looking for maybe a table, usually searching for some type of information. So you need to be able to find that in the document.
[05:34] So the first thing I'm going to show you is how you can find things by the tag name. So actually, I should go here and just print this out again. So obviously we have like the head tag, the HTML tag, the center tag, all of these things. It's really easy to actually find stuff that is named a
[05:50] specific tag in Beautiful Soup. What you can do is doc dot and then the tag name, and this will actually give you access to the first tag that has this name in the document. So just bear with me
[06:02] for a second here, I'm going to say tag is equal to doc dot, and then we'll go with title. And then what I can actually do here is print out the tag. And you'll see that this is the title tag. Right. So if
[06:14] you want to access a specific tag, just put the name. Now, obviously if there's multiple things named or using the same tag, it's only going to give you the first one. I'll show you how to get all of them in a second. Okay, so now that we have the tag, what if I just want to access what's inside of
[06:29] here? Well, to access the string that is being held inside of a tag, what you can do is use .string. So I can say tag.string. And then notice it gives me your title here. Now one of the cool things about this, though, is I can also modify these tags. So what I can do is something like tag.string
[06:46] is equal to, and then "hello". And now if I print out my tag, notice that it's actually modified this in place and changed it to hello. Now what I can also do is show you that when I print
[06:59] the entire document again, so print doc, we don't need to prettify it. If we go here and we find the title, notice it's actually changed in the document. So the same way that you can access things, you can change them. Pretty straightforward. Now, there's a lot of other things that you can change as well. I'll
[07:13] show you those in later videos, but that's kind of the basics. That's how you access what's inside of a tag and then how you actually get the string within the tag. Okay. Now what else can we do here? Well, we need to be able to find tags that aren't just the first ones that occur in the document. So in
[07:29] order to do that, what you can do is say doc.find, and then you can put the tag. So if I put the tag a, for example, here, this will give me any links. But again, this is only going to give me the first a tag that occurs. So what you can do instead is find_all, excuse me here,
[07:47] if now I print tags, you'll see this will give me all of the a tags in the document. Actually, I'm going to go with p because I don't know if there's multiple a tags here. And when I do this, notice I get all of the p tags being printed out right here and it also shows me what's inside of these p tags.
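The difference between find and find_all described here can be sketched as follows. The markup is a hypothetical stand-in for the video's dummy HTML file, assumed only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, not the actual file from the video
html = """
<body>
  <p>First <b>bold one</b> and <b>bold two</b></p>
  <p>Second <i>italic</i></p>
</body>
"""

doc = BeautifulSoup(html, "html.parser")

first_p = doc.find("p")    # only the first <p> tag
all_p = doc.find_all("p")  # every <p> tag, returned as a list-like result

print(first_p)
print(len(all_p))
```

Indexing the result, for example all_p[0], gives back an ordinary tag that can be searched again in the same way.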
[08:04] Okay. So that is how you can get that. So as you probably noticed here, these p tags have things inside of them, right? Like this p tag has another tag inside of it. So I'm going to show you now how you can actually access the nested tags. Now, this is the exact same way that you would access
[08:19] the tags just from your regular document. But now you're going to do it on an existing tag. So this will show you kind of how this works. This is pretty straightforward, but let's just have a look here. So let's say we want to access the very first
[08:34] tag. So tag zero. In fact, let's just put a zero right here. And I want to access, let's say, the... actually, let me print this out and see what we get here. Maybe I want to access the b tag, right? Or
[08:46] all of the bold tags. Well, if I want to do that, what I can do is the following. I say tags.find_all, just like I found everything on my document. And then I can access the b tags. And when I do this now,
[08:59] it gives me all of the different b tags, right? And then same thing within here, I could go and access the text of these b tags or I could go and access maybe the italics tag or whatever I want. But that's kind of how you can search through and parse the document. And again, I'll do an entire
[09:13] video on how you can actually find stuff in more detail. So we will continue in one second. But I need to quickly thank the sponsor of this video and the series, which is AlgoExpert. AlgoExpert is the best platform to use for preparing for your software engineering coding interviews. They have over 160
[09:28] coding interview practice questions on the platform taught by the best instructors, one of which is me. If you want to prepare for your technical coding interviews, make sure to check out AlgoExpert today by clicking the link in the description and using the code tech with Tim for a discount on
[09:42] the platform. All right. So now that I've shown you how to read in an HTML file from your local system, remember, I actually have this file here in the same directory. I'm going to show you how you can read in HTML from a website. So what I'm going to do is go to my command prompt here in the same way
[09:56] that we installed Beautiful Soup. We are going to now install requests. So follow the same format of installing as I showed you previously, but pip install requests. And you notice I obviously
[10:08] already have this installed, but you guys won't most likely. And now we can actually access a website. So the website that I want to access is actually Newegg. And as I mentioned, we're going to be looking for GPU prices later in the video series. But for now, let's say I just want to check the price of
[10:25] a specific GPU. So I'm going to steal this link right here. This is for a 3080. And this is the price. I'm going to show you how we can actually find and access this price. Okay. So what I'm going to do now is I need to leave this import and I need to import requests. Now what I'm going
[10:44] to do is say that my URL is equal to the URL of whatever website I want to access. And then what I'm going to do is I am going to send a request. I'm going to say my result is equal to requests.get.
[10:58] And I'm just going to put my URL like this. So super simple. All this is doing is sending an HTTP GET request to this URL. It's going to return the content of the page and the content of the page will be stored in result.text. So if I do this and I run my code, notice we're going to get a bunch
[11:16] of gibberish here, but we are actually getting an HTML document. Okay. Now to prove this to you, what I'm going to do is now read in result.text using Beautiful Soup. So I just need to jump in here for one second and quickly mention that the URL that we're using here does actually allow us to grab
[11:33] its HTML from a script. Now there are a lot of websites, Amazon is one of them that I tried and that I failed with, that have like bot protection built in and that don't actually let you grab the HTML
[11:46] of a page by just doing what I'm doing right here. This is a super simple way we're just sending a get request from a Python script. Websites can detect you're using a script and they'll try to actively block you. Now there's some kind of like policy and legal related stuff when it comes to
[12:01] scraping websites. So just make sure you're not like spamming requests on any websites or like DoSing or DDoSing anyone or something like that. But what we're doing here is most likely fine, but I just want to mention that there's a lot of websites this won't work for. And if they don't work
[12:15] for it, I'm not necessarily going to show you how to get around the anti-robot stuff. Regardless, let's continue the video. So I'm going to say that my doc is equal to BeautifulSoup, result.text.
[12:27] And then in the same way as before, we want to use the html.parser. And then I'm going to print out doc.prettify(). Okay, so let's run this. I'll go back to that code in a second in case I went too fast
[12:40] for you. And now notice obviously it's quite long, but we are actually getting the HTML document. Perfect. So we can see all the div tags and everything like that. So now what I want to do is actually find the price of this GPU. So let me go back to the website right here and notice that this is what
[12:56] the price looks like. Now I'm going to assume that I don't know what the actual figure is. I don't know that it's $2,600. And I just want to look for the dollar sign and then find the price afterwards.
[13:08] So to do that is actually pretty easy. What we can do here is, let's just make a new variable, and let's say prices is equal to, and then doc.find_all, but this time we're not looking for a
[13:22] specific tag. We're looking for some text. The text I'm looking for is a dollar sign. So I'm going to say text is equal to "$", like that. And then I'm just going to print out prices and show you
[13:34] what we get. So run this and notice we get two dollar signs. Now that's not very helpful. Obviously we want the entire thing. We want the actual price, not just the dollar sign, but the thing is these dollar
[13:46] signs actually allow us to access what the price is. And the way we can do that is by using this thing called a parent. So the way that this is kind of set up in Beautiful Soup is everything is in a tree-like structure. So when you read in the document, the HTML tag is kind of the first, I don't even know
[14:04] what to call it, branch of the tree, if you want to call it that, or the root of the tree. And then there's all kinds of tags inside of the HTML tag, right? So if I have HTML here, I have a head tag inside of the HTML tag, and inside of the head tag I have the title tag. Well, we kind of have this
[14:19] tree-like structure where a descendant of HTML is the head tag and the body tag, a descendant of the head tag is the title tag. And then these tags right here also have a parent. So the title tag's
[14:32] parent is the head tag, the head tag's parent is the HTML tag. Pretty straightforward, but it just works in kind of a general tree structure. And so what we've accessed here, let me just write some kind of pseudo HTML here, is imagine we have a p tag, okay, another p tag. And then we have like our dollar
[14:52] sign and 2,613, whatever it is. We've accessed this single dollar sign right here. And so if I access this dollar sign and I want the entire price, what I want is the parent of this dollar sign,
[15:05] because this just like everything else is a descendant of whatever its parent is. And so if I access the parent, this will give me the contents of the entire tag that this dollar sign is in. And then I can try to search for the 2,613. Hopefully that kind of makes sense, but that's the best way
[15:22] that I can really explain that to you. So anyways, we have these prices. What I'm going to do now is say prices[0].parent. And let's just print out what this is. So let's just say parent is equal to
[15:34] that and let's print the parent and run this. And notice we get this kind of large tag here, right? And we have this list item tag and then we have the price current label and then we have strong
[15:48] and then we have what the actual price is. So what I want here is what's inside of this strong tag. I want the actual value. So what I'm going to do now is search for the strong tag within the parent tag. And then I'm going to look for the contents of the strong tag. So now what I'm
[16:03] going to do is say strong is equal to parent.find. And then I'm looking for strong. And then I will print out strong, like that. So now when I do this, notice that we get 2,613. Perfect.
[16:23] Now I just want to get the 2,613. So what do I do? I use my .string and I get 2,613. All right. So with that, I am going to end the video here. I just want to give you a quick
[16:37] introduction to how this module works. In later videos, I will show you more advanced stuff and all the other features that you need to know. In my opinion, this is a pretty cool thing. Really easy to use. Hope you guys enjoyed the video. If you did, make sure you leave a like,
[16:50] subscribe to the channel and I will see you in another one.
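For reference, the fetch-and-parse step from the middle of the video could be wrapped in a small helper like the one below. This is only a sketch: the name fetch_soup, the timeout value, and the raise_for_status call are assumptions added here and are not shown in the video, and the URL in the usage comment is a placeholder.

```python
import requests
from bs4 import BeautifulSoup


def fetch_soup(url, timeout=10):
    """Send an HTTP GET request and parse the response body as HTML."""
    result = requests.get(url, timeout=timeout)
    # Assumption: fail loudly on 4xx/5xx rather than parse an error page
    result.raise_for_status()
    return BeautifulSoup(result.text, "html.parser")


# Example usage (placeholder URL; use a page you are allowed to scrape):
# doc = fetch_soup("https://example.com/")
# print(doc.prettify())
```

As mentioned in the video, some sites block scripted requests, so a helper like this will not work everywhere.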