
  • Scrape product titles from Currys UK using Python

    Posted by Ahmose Tetty on 12/13/2024 at 8:19 am

    Product titles on Currys UK can be scraped with Python, using the requests library to fetch pages and BeautifulSoup to parse the HTML. Titles usually sit in specific tags, such as h1 or span, with class attributes that distinguish them from other elements. The first step is to inspect the page's HTML structure to identify these tags and their class names.
    Once the structure is known, the script fetches the page content with requests and parses it with BeautifulSoup. It then uses CSS selectors or tag-based queries to extract the titles. The same process can be extended to category pages by iterating over the product list. Below is a basic implementation that extracts the title from a single Currys UK product page using Python:

    import requests
    from bs4 import BeautifulSoup

    # Placeholder URL: replace with the address of a real Currys product page
    url = "https://www.currys.co.uk/product-page"

    # Headers to mimic a browser request
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
    }

    # Fetch the page content; the timeout stops the script from hanging indefinitely
    response = requests.get(url, headers=headers, timeout=10)

    if response.status_code == 200:
        soup = BeautifulSoup(response.content, "html.parser")
        # Extract the product title. The "product-title" class is an assumption;
        # confirm the actual tag and class in the browser's developer tools.
        product_title = soup.find("h1", class_="product-title")
        if product_title:
            print("Product Title:", product_title.get_text(strip=True))
        else:
            print("Product title not found.")
    else:
        print(f"Failed to fetch the page. Status code: {response.status_code}")
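
    For category pages, the same parsing step can return several titles at once. The sketch below is only an illustration and runs against an inline HTML sample rather than the live site. The product-item and product-title class names and the extract_titles helper are assumptions, so the real selectors have to be taken from the page's markup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a category page; the real class
# names on Currys may differ and should be checked in the browser.
SAMPLE_HTML = """
<ul>
  <li class="product-item"><h2 class="product-title"> Laptop A </h2></li>
  <li class="product-item"><h2 class="product-title">Laptop B</h2></li>
  <li class="product-item"><span class="price">499</span></li>
</ul>
"""


def extract_titles(html):
    """Return the text of every title element found inside a product item."""
    soup = BeautifulSoup(html, "html.parser")
    # CSS selector: any element with class product-title nested in a product-item
    nodes = soup.select(".product-item .product-title")
    return [node.get_text(strip=True) for node in nodes]


if __name__ == "__main__":
    for title in extract_titles(SAMPLE_HTML):
        print(title)
```

    On a real page, response.content from the script above would be passed to extract_titles in place of SAMPLE_HTML.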
    
  • 4 Replies
  • Mawunyo Ajdin

    Member
    12/14/2024 at 9:35 am

    The script could be improved with a retry mechanism for requests that fail because of network issues or server errors. A loop that retries a limited number of times, with a delay between attempts, would make the script more reliable.
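
    A rough sketch of such a retry loop is shown here. The fetch_with_retries name, the default of three attempts, and the doubling delay are all assumptions chosen for illustration. The session and sleep parameters exist only so the function can be exercised without touching the network.

```python
import time

import requests


def fetch_with_retries(url, headers=None, max_attempts=3, delay=2.0,
                       session=None, sleep=time.sleep):
    """Fetch a URL, retrying on network errors and 5xx responses.

    Returns the first response that is not a 5xx error, or None once
    all attempts have failed.
    """
    session = session or requests.Session()
    for attempt in range(1, max_attempts + 1):
        try:
            response = session.get(url, headers=headers, timeout=10)
            if response.status_code < 500:
                # Success, or a client error (e.g. 404) that a retry will not fix
                return response
            print(f"Attempt {attempt}: server returned {response.status_code}")
        except requests.RequestException as exc:
            # Covers connection failures, timeouts, and similar network errors
            print(f"Attempt {attempt}: request failed ({exc})")
        if attempt < max_attempts:
            # Wait longer after each failure: delay, 2*delay, 4*delay, ...
            sleep(delay * 2 ** (attempt - 1))
    return None
```

    The original script would then call fetch_with_retries(url, headers=headers) instead of requests.get and treat a None result as a failed fetch.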

  • Sanja Yevgeny

    Member
    12/17/2024 at 7:37 am

    Enhancing the script to handle pagination would allow scraping of multiple product titles from category pages. This can be done by identifying and following the “Next Page” link dynamically until no further pages exist.
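
    One possible shape for this is sketched below. The .product-title and a.next-page selectors are guesses, not confirmed Currys markup, and scrape_all_titles and its fetch_html argument are hypothetical names. The visited set and the max_pages limit guard against a "next" link that loops back on itself.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def scrape_all_titles(start_url, fetch_html, max_pages=50):
    """Collect titles from start_url and every page reachable via 'next' links.

    fetch_html is any callable that takes a URL and returns the page HTML
    (or None on failure), for example a thin wrapper around requests.get.
    """
    titles = []
    visited = set()
    url = start_url
    while url and url not in visited and len(visited) < max_pages:
        visited.add(url)
        html = fetch_html(url)
        if html is None:
            break
        soup = BeautifulSoup(html, "html.parser")
        # Assumed selectors; replace with the ones found on the real page
        for node in soup.select(".product-title"):
            titles.append(node.get_text(strip=True))
        next_link = soup.select_one("a.next-page[href]")
        # Resolve relative links against the current URL; stop when none exists
        url = urljoin(url, next_link["href"]) if next_link else None
    return titles
```

    Passing the fetch function in as an argument keeps the pagination logic separate from the HTTP code, so it can be tried against saved HTML before running it on the live site.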

  • Minik Hamid

    Member
    12/18/2024 at 7:11 am

    To improve data storage, the script could save the extracted product titles in a CSV file or database. This would make it easier to organize, analyze, and share the collected information.
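
    For the CSV option, a minimal sketch using only the standard library could look like this. The save_titles_to_csv name and the single "title" column are assumptions; more columns (price, URL, and so on) can be added in the same way.

```python
import csv


def save_titles_to_csv(titles, path):
    """Write a header row followed by one row per product title."""
    # newline="" lets the csv module control line endings on every platform
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["title"])
        for title in titles:
            # csv.writer quotes values containing commas or quotes automatically
            writer.writerow([title])
```

    For example, save_titles_to_csv(titles, "currys_titles.csv") would write the collected list to a file that opens directly in a spreadsheet.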

  • Soheil Sarala

    Member
    12/19/2024 at 5:30 am

    Adding user-agent rotation and proxy support would reduce the chance of the script being detected and blocked by Currys' anti-bot mechanisms, allowing it to scrape data more consistently.
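
    A simple way to try this with requests is sketched below. The user-agent strings are only examples, the proxy addresses are placeholders that must be replaced with working proxies, and build_request_options and fetch are hypothetical helper names.

```python
import itertools
import random

import requests

# Example values only: the user-agent strings are illustrative and the proxy
# addresses are placeholders that must be replaced with working proxies.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0",
]
PROXIES = ["http://proxy1.example.com:8080", "http://proxy2.example.com:8080"]

# Cycle through the proxies in order so requests are spread evenly
_proxy_cycle = itertools.cycle(PROXIES)


def build_request_options():
    """Return keyword arguments for requests.get with a rotated identity."""
    proxy = next(_proxy_cycle)
    return {
        "headers": {"User-Agent": random.choice(USER_AGENTS)},
        # requests expects a mapping from URL scheme to proxy address
        "proxies": {"http": proxy, "https": proxy},
        "timeout": 10,
    }


def fetch(url):
    """Fetch a URL with a randomly chosen user agent and the next proxy."""
    return requests.get(url, **build_request_options())
```

    Each call to fetch picks a new user agent at random and moves on to the next proxy in the list, so consecutive requests do not share the same header and address.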
