Python Web Scraping Examples

You want to build a web scraper to capture valuable information from various websites. You know the value of the data, but how do you actually capture it? Python is one of the best languages for building a web scraper, and it is easy to learn even for those without much coding experience. Still, a Python scrape website example can help you understand the details and processes involved.

In this Python scraper example, we will show you how simple and powerful Python can be at extracting valuable data from websites. We will help you learn to do this using the requests library, fetching HTML content from the website you select, and then parsing that information using Beautiful Soup. This allows you to capture the very specific information you need to research.

We will also provide you with some strategies for using Selenium for dynamic content and how to handle factors like headers, pagination, and storing your data in a database or CSV. Ready to learn?


Here’s Where to Start First: Get Up to Speed


There is a lot to learn through this process, so we encourage you to explore some of our other resources that can help you get up to speed on the details. We encourage you to read our “How to Scrape the Web Using Python Requests” tutorial as a first step. You can also learn more in-depth strategies in Advanced Web Scraping in Python.

We offer a range of tools to help you. Additionally, it is well worth getting started now with using a proxy for web scraping. Proxies are an essential component of this process because they protect your identity while you navigate the internet. We encourage you to set up a proxy for web scraping now and maintain it throughout the process. For background, see our Proxies and Python Web Scraping (Why a Proxy Is Required) guide.
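As a minimal sketch of how a proxy plugs into your scraper, here is how you would route a requests call through one. The proxy host, port, and credentials below are placeholders, not real endpoints; substitute your own:

import requests

# Placeholder proxy endpoint -- swap in your own host, port, and credentials.
proxies = {
    'http': 'http://username:password@proxy.example.com:8000',
    'https': 'http://username:password@proxy.example.com:8000',
}

response = requests.get('https://rayobyte.com/', proxies=proxies)
print(response.status_code)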

Python Scraper Example: Where to Get Started


Before we provide you with the Python web scraper example you need, here are some basics you need to know. The process for building a web scraper includes several steps:

  • Download and install the most up-to-date version of Python so it is ready to go.
  • Install the Python libraries that provide the tools to build your web scraper: Beautiful Soup, lxml, Selenium, and requests. Read our articles on each of them.
  • Find the HTML elements that contain the information you need.
  • Save the scraped data to a database or CSV.

Now, let’s build out a Python scrape website example to show you how to create your own scraper.

Requests library: You need to send HTTP requests to the website to capture the information you need. This typically means GET and POST requests, and the requests library makes it possible for you to send them. Start by opening the terminal and running the following command:

python -m pip install requests

You can then use GET and POST requests within your code to communicate with the website. Here is a Python screen scraping example using requests:

import requests

response = requests.get('https://Rayobyte.com/')

print(response.text)
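We mentioned handling headers earlier. Many websites respond differently depending on the request headers they receive, so it can help to send a browser-like User-Agent along with your request. Here is a minimal sketch; the header value is illustrative:

import requests

# Illustrative headers -- some sites reject or alter responses for
# non-browser user agents, so a browser-like value often helps.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}

response = requests.get('https://rayobyte.com/', headers=headers)
print(response.status_code)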

Beautiful Soup: The next step in this web scraping with Python example is to use Beautiful Soup, another library, along with a parser to extract the data you need from HTML. Beautiful Soup can even turn invalid markup into a parse tree. Start by installing Beautiful Soup with this command:

pip install beautifulsoup4

You will need a parser to facilitate this process. In the following example, we use html.parser, which is part of the Python standard library, to parse the information.

To get HTML using requests, you will use this type of code:

import requests

url = 'https://Rayobyte.com/blog'

response = requests.get(url)

Now, let’s say we want to find the title element – the target of this search. You can use this web scraper Python example to find it:

import requests

from bs4 import BeautifulSoup

url = 'https://rayobyte.com/blog'

response = requests.get(url)

soup = BeautifulSoup(response.text, 'html.parser')

print(soup.title)

If you have followed that Python scrape website example, you should get the following title:

<title>Rayobyte Blog | Rayobyte</title>

You can then modify the requests you send to include the specific information you need. Use the find_all() method to narrow down the information, as shown below. You can also use the more advanced tools available to help you navigate each of the requests you need to send.
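For instance, here is a minimal sketch of find_all() in action. The 'a' tag is just illustrative; swap in whatever element holds the data you need:

import requests
from bs4 import BeautifulSoup

url = 'https://rayobyte.com/blog'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# find_all() returns every matching element; here, every link on the page.
for link in soup.find_all('a'):
    print(link.get('href'))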

LXML: The next component of this web scraper Python example is parsing. Parsing is critical for web scraping, but it is not easy to do – by any means. With lxml, you have a powerful and easy-to-use tool that parses the information you need. It works with both HTML and XML files and can extract the necessary information from very large datasets.

There are a few factors to keep in mind here. For example, poorly formed HTML affects how well lxml can parse a page, so you will need to handle it carefully to achieve your objectives. To get started, install lxml with this command:

pip install lxml

This package contains the html module, which (of course!) works with HTML. You will need an HTML string first, though, which you can fetch using the requests library. This is where it gets a bit confusing, but when you pull it all together, as in this Python web scraping example, it becomes a fast and efficient process.

Here’s what you need to do now. Once you have the HTML available, build the tree using the fromstring function. To do that, use this code:

import requests

from lxml import html

url = 'https://rayobyte.com/blog'

response = requests.get(url)

tree = html.fromstring(response.text)

You will find a lot of benefits in using this tool, but there are a few more steps that can significantly enhance the outcome of your project.
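One of those steps is querying the tree with XPath. As a sketch (the //title expression is only an example; your selector will depend on the page you are scraping):

import requests
from lxml import html

url = 'https://rayobyte.com/blog'
response = requests.get(url)
tree = html.fromstring(response.text)

# XPath query returning the text of every <title> element; adjust the
# expression to target the elements you actually need.
titles = tree.xpath('//title/text()')
print(titles)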

Selenium: Dynamic pages can be one of the most important factors to consider when it comes to web scraping. Many of today’s websites contain dynamic content (even our website does!), which helps make the content more engaging and beneficial to users. However, it makes web scraping more challenging. The next part of this Python scrape website example is to use Selenium, another Python library, to get around dynamic content.

Selenium is an open-source browser automation tool that will automate a variety of the tasks necessary to navigate dynamic pages. That includes logging into websites or capturing information to answer questions. It is also one of the best ways to avoid CAPTCHA, those difficult boxes and tests that aim to prevent you from getting beyond the page with a bot.

Install it using this command:

pip install selenium

You will then need to configure Selenium for the browser you are using. Most commonly, people use Chrome, so here is the example code for Chrome:

from selenium import webdriver

from selenium.webdriver.common.by import By

driver = webdriver.Chrome()

We can now use the get() method to navigate to pages. Here is an example:

driver.get('https://rayobyte.com/blog')

Now, here is another web scraping Python example to try out. Let’s say you are going to use Selenium with CSS selectors and XPath to extract elements from the page. Our objective in this example is to capture all of the titles on our blog – have you read them all? Certainly worth it!

Using a CSS selector, we could use this code:

blog_titles = driver.find_elements(By.CSS_SELECTOR, 'a.e1dscegp1')

for title in blog_titles:
    print(title.text)

driver.quit()  # closing the browser
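Selenium also supports XPath, mentioned above, as an alternative to CSS selectors. Here is a sketch of the equivalent lookup, assuming the same class name on the anchor tags; you would swap this line in for the CSS selector version before the loop runs:

# Equivalent lookup using XPath instead of a CSS selector.
blog_titles = driver.find_elements(By.XPATH, "//a[contains(@class, 'e1dscegp1')]")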

One caveat about using Selenium for this type of project is that it will slow down the process. That is because the browser has to execute the JavaScript on every page before the scraper can parse it.

If you are trying to navigate a huge amount of data, then you may find this to be a bit slow. For most web scraping projects, though, Selenium is all you need.

Breaking Down the Tools to Facilitate Web Scraping in Python


Now that we have the basics and Python scrape website examples, we can take a closer look at some of the core details that you need to really produce the results you need.

Handling Pagination and Dynamic Content: One of the struggles that many have when creating a web scraper is being able to navigate dynamic content. We have already mentioned this a bit, but let’s talk about other tasks. For example, what happens when you need multiple pages of data – and not just a simple website URL?

Selenium will help you overcome this and handle pagination. Use Selenium for web scraping involving any of the following (see the sketch after this list):

  • Delayed content, such as data that takes a few seconds to show up on the page before it actually displays
  • JavaScript websites, including any website that is heavily reliant on JavaScript
  • JavaScript-rendered content blocks, which some sites use in place of static HTML.
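For delayed content in particular, Selenium's explicit waits let you pause until an element actually appears rather than sleeping for a fixed time. A minimal sketch, assuming the same blog-link class used earlier:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get('https://rayobyte.com/blog')

# Wait up to 10 seconds for the links to render before reading them.
links = WebDriverWait(driver, 10).until(
    EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'a.e1dscegp1'))
)
for link in links:
    print(link.text)

driver.quit()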

How to Pick a URL: Another component of this process is selecting a URL. In our web scraping examples in Python, we have provided a range of specific URLs to our blog, but there are a few key factors to remember when choosing any URL to include:

  • Watch out for hidden JavaScript elements. If a page hides its data behind JavaScript, the simple methods we provide here may not work the way you need them to.
  • Image scraping requires more extensive processes. You can find an example in our Selenium guide. Web scraping images with Python takes a bit more detail to make it a successful process for you.
  • Make sure you follow the rules. Our guide here and any other on this website is meant to provide you with the tools you need to scrape content from the web in an ethical manner. You should use it only for public data, and you should never overstep on third-party rights. Be sure to read the terms and conditions of the website you are using.

Exporting Your Data


Now that you have worked through most of the tasks necessary for web scraping with Python, our next objective is to do something with that data. The best option is to export the data to a CSV file.

Before moving on to that process, it is important to check your data at this point. You want to make sure the data is assigned to the right object so that it exports properly. Use the print() function for this:

for x in results:
    print(x)

And then:

print(results)

Now, when you remove the print loop, you will be able to move the data to the CSV file. You can see how to do that here:

import pandas as pd

df = pd.DataFrame({'Names': results})

df.to_csv('names.csv', index=False, encoding='utf-8')
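If you would rather save the scraped data to a database, as mentioned earlier, Python's built-in sqlite3 module is enough for small projects. A sketch under the assumption that results holds the list from above; the table and column names are illustrative:

import sqlite3

# Illustrative schema: one table with a single "name" column.
conn = sqlite3.connect('results.db')
conn.execute('CREATE TABLE IF NOT EXISTS names (name TEXT)')
conn.executemany('INSERT INTO names (name) VALUES (?)', [(x,) for x in results])
conn.commit()
conn.close()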

Try Out This Python Scrape Website Example


The following example pulls together all of the web scraping examples in Python we have used (and adds a few more specific elements to the process that you can learn more about in our advanced tutorials). Update it to the specs you need to build the type of scraper you desire.

import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver import ChromeOptions
import pandas as pd

# Generate 5 URLs of search results.
pages = ['https://sandbox.rayobyte.com/products?page=' + str(i) for i in range(1, 6)]

# Crawl all URLs and extract each product's URL.
product_urls = []
for page in pages:
    print(f'Crawling page \033[38;5;120m{page}\033[0m')
    response = requests.get(page)
    soup = BeautifulSoup(response.text, 'lxml')
    for product in soup.select('.product-card'):
        href = product.find('a').get('href')
        product_urls.append('https://sandbox.rayobyte.com' + href)

print(f'\nFound \033[38;5;229m{len(product_urls)}\033[0m product URLs.')

# Initialize a Chrome browser without its GUI.
options = ChromeOptions()
options.add_argument('--headless=new')
driver = webdriver.Chrome(options=options)

# Scrape all product URLs and parse each product's data.
products = []
for i, url in enumerate(product_urls, 1):
    print(f'Scraping URL \033[1;34m{i}\033[0m/{len(product_urls)}.', end='\r')
    driver.get(url)
    soup = BeautifulSoup(driver.page_source, 'lxml')
    info = soup.select_one('.brand-wrapper')
    product_data = {
        'Title': soup.find('h2').get_text(),
        'Price': soup.select_one('.price').get_text(),
        'Availability': soup.select_one('.availability').get_text(),
        'Stars': len(soup.select('.star-rating > svg')),
        'Description': soup.select_one('.description').get_text(),
        'Genres': ', '.join([genre.get_text().strip() for genre in soup.select('.genre')]),
        'Developer': info.select_one('.brand.developer').get_text().replace('Developer:', '').strip() if info else None,
        'Platform': info.select_one('.game-platform').get_text() if info and info.select_one('.game-platform') else None,
        'Type': info.select('span')[-1].get_text().replace('Type:', '').strip() if info else None
    }
    # Append each product's data to a list.
    products.append(product_data)

driver.quit()

# Save results to a CSV file.
df = pd.DataFrame(products)
df.to_csv('products.csv', index=False, encoding='utf-8')
print('\n\n\033[32mDone!\033[0m Products saved to a CSV file.')


Ready to Get Started?


To help you get started, learn more about proxies for web scraping (we strongly recommend this step). At Rayobyte, we aim to educate you about all of your options. Use this Python scrape website example to help you get started building your own.

The information contained within this article, including information posted by official staff, guest-submitted material, message board postings, or other third-party material is presented solely for the purposes of education and furtherance of the knowledge of the reader. All trademarks used in this publication are hereby acknowledged as the property of their respective owners.
