Scrape product titles from Currys UK using Python

Ahmose Tetty · 2024-12-13T08:19:36+00:00


Posted by Ahmose Tetty on 12/13/2024 at 8:19 am
Scraping product titles from Currys UK involves using Python with the BeautifulSoup library for efficient HTML parsing. Product titles are generally located within specific tags, such as h1 or span, often accompanied by class attributes that help differentiate them from other elements. The first step is to inspect the HTML structure of the page to identify these tags and their corresponding class names.
After determining the structure, the script fetches the page content using the requests library and parses it with BeautifulSoup. It then uses CSS selectors or tag-based queries to extract the titles. The same approach can be extended to category pages by iterating over the product list. Below is a basic implementation that extracts the title from a single Currys UK product page. The URL and the product-title class are placeholders, so replace them with the values found when inspecting the live page:
```
import requests
from bs4 import BeautifulSoup

# URL of the Currys product page (placeholder; replace with a real product URL)
url = "https://www.currys.co.uk/product-page"

# Headers to mimic a browser request
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
}

# Fetch the page content; the timeout stops the request from hanging indefinitely
response = requests.get(url, headers=headers, timeout=10)

if response.status_code == 200:
    soup = BeautifulSoup(response.content, "html.parser")
    # Extract the product title (the class name depends on the page's actual markup)
    product_title = soup.find("h1", class_="product-title")
    if product_title:
        print("Product Title:", product_title.get_text(strip=True))
    else:
        print("Product title not found.")
else:
    print(f"Failed to fetch the page. Status code: {response.status_code}")
```
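For category pages, a CSS selector can pull every title in one pass. The snippet below is only a sketch: the sample markup, the `product-item` and `product-title` class names, and the `extract_titles` helper are all assumptions made for illustration, not Currys' real HTML.

```python
from bs4 import BeautifulSoup

# Hypothetical category-page markup; the real Currys HTML will differ.
SAMPLE_HTML = """
<ul class="product-list">
  <li class="product-item"><h2 class="product-title"> Laptop A </h2></li>
  <li class="product-item"><h2 class="product-title">Headphones B</h2></li>
  <li class="product-item"><span class="price">£99</span></li>
</ul>
"""


def extract_titles(html, selector="li.product-item .product-title"):
    """Return the stripped text of every element matching the CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    return [tag.get_text(strip=True) for tag in soup.select(selector)]


titles = extract_titles(SAMPLE_HTML)
print(titles)
```

Items without a matching title element (the third `li` here) are simply skipped, so the selector can be tightened or loosened once the actual page structure is known.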
4 Replies

Mawunyo Ajdin

Member
12/14/2024 at 9:35 am

The script could be improved by adding a retry mechanism for requests that fail because of network issues or server errors. A loop that retries a limited number of times, with a delay between attempts, would make the scraper more reliable.
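A rough sketch of such a retry loop, assuming three attempts and a delay that grows after each failure. The `fetch_with_retries` name, its defaults, and the injectable `get` parameter are illustrative choices rather than anything from the original script.

```python
import time

import requests


def fetch_with_retries(url, headers=None, max_retries=3, delay=2.0,
                       timeout=10, get=requests.get):
    """Return a successful response, or None once every attempt has failed.

    `get` defaults to requests.get and can be swapped out for testing.
    """
    for attempt in range(1, max_retries + 1):
        try:
            response = get(url, headers=headers, timeout=timeout)
            if response.status_code == 200:
                return response
            print(f"Attempt {attempt}: status code {response.status_code}")
        except requests.RequestException as exc:
            # Covers connection errors, timeouts and similar network failures
            print(f"Attempt {attempt}: request failed ({exc})")
        if attempt < max_retries:
            time.sleep(delay * attempt)  # wait longer after each failure
    return None
```

In the original script, `requests.get(url, headers=headers)` could then be replaced with `fetch_with_retries(url, headers=headers)`, followed by a check for `None` before parsing.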
Sanja Yevgeny

Member
12/17/2024 at 7:37 am

Enhancing the script to handle pagination would allow scraping of multiple product titles from category pages. This can be done by identifying and following the “Next Page” link dynamically until no further pages exist.
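One possible shape for that loop is sketched below. The `a.next-page` and `.product-title` selectors are guesses that would need checking against the real category page, and `fetch_html` is a hypothetical callable that returns the HTML for a URL. Here it reads from an in-memory dictionary so the example runs without network access.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def scrape_all_pages(start_url, fetch_html, max_pages=50):
    """Collect titles page by page, following the next-page link until none is left."""
    titles, url, seen = [], start_url, set()
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)  # guards against a next link that loops back
        soup = BeautifulSoup(fetch_html(url), "html.parser")
        titles.extend(t.get_text(strip=True) for t in soup.select(".product-title"))
        next_link = soup.select_one("a.next-page[href]")  # hypothetical selector
        url = urljoin(url, next_link["href"]) if next_link else None
    return titles


# Two fake pages standing in for a paginated category listing.
PAGES = {
    "https://example.com/laptops?page=1":
        '<h2 class="product-title">Laptop A</h2>'
        '<a class="next-page" href="?page=2">Next Page</a>',
    "https://example.com/laptops?page=2":
        '<h2 class="product-title">Laptop B</h2>',
}

all_titles = scrape_all_pages("https://example.com/laptops?page=1", PAGES.__getitem__)
print(all_titles)
```

Against a live site, `fetch_html` could be something like `lambda u: requests.get(u, headers=headers, timeout=10).text`.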
Minik Hamid

Member
12/18/2024 at 7:11 am

To improve data storage, the script could save the extracted product titles in a CSV file or database. This would make it easier to organize, analyze, and share the collected information.
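For the CSV option, Python's built-in `csv` module is enough. A minimal sketch, where the `save_titles_to_csv` name, the default file name, and the single `title` column are assumptions for illustration:

```python
import csv


def save_titles_to_csv(titles, path="product_titles.csv"):
    """Write one title per row under a single 'title' header."""
    # newline="" lets the csv module control line endings itself
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["title"])
        writer.writerows([title] for title in titles)
```

Calling `save_titles_to_csv(["Laptop A", "Headphones B"])` would create `product_titles.csv` in the current directory. Titles containing commas or quotes are escaped by the writer automatically.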
Soheil Sarala

Member
12/19/2024 at 5:30 am

Adding user-agent rotation and proxy support would help avoid detection by Currys’ anti-bot mechanisms. This would reduce the chance of the script being blocked and make scraping more consistent.
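A simple way to sketch this is a helper that picks a random user agent (and proxy, if any are configured) for each request. The user-agent strings below are just examples, the `PROXIES` list is an empty placeholder to fill in, and `build_request_options` is a made-up helper name.

```python
import random

# Example user-agent strings; swap in whichever ones suit your setup.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.1 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0",
]

# Placeholder: add proxy URLs such as "http://host:port" here.
PROXIES = []


def build_request_options(user_agents=USER_AGENTS, proxies=PROXIES):
    """Return keyword arguments for requests.get with a random user agent and proxy."""
    options = {
        "headers": {"User-Agent": random.choice(user_agents)},
        "timeout": 10,
    }
    if proxies:
        proxy = random.choice(proxies)
        # requests expects a mapping from URL scheme to proxy URL
        options["proxies"] = {"http": proxy, "https": proxy}
    return options
```

Each request can then be made with `requests.get(url, **build_request_options())`, so consecutive requests do not all share the same headers and exit address.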
