Scrape product reviews, pricing, and categories from Currys UK with Python
Scraping data from Currys UK, a leading electronics retailer, involves extracting key information such as product reviews, pricing, and categories, whether to build insights or to automate workflows like price monitoring. In Python, this is typically done with the requests and BeautifulSoup libraries. The first step is to identify the URL structure of the pages you want to scrape: visit a few product pages, look for patterns in their URLs, and check whether the content is served as static HTML or rendered dynamically in the browser. A quick way to test this is sketched below.
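Before writing any parsing code, it helps to confirm that the data you need is present in the raw HTML at all. The URL below is a placeholder, and the pound-sign check is only a crude heuristic; substitute a value you can actually see on the page:

import requests

# Placeholder URL -- substitute a real Currys product page
url = "https://www.currys.co.uk/products/your-product-url"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}

response = requests.get(url, headers=headers, timeout=10)
# If a value visible in the browser (e.g. the displayed price) is missing
# from response.text, the page is likely rendered with JavaScript and
# plain requests will not be enough on its own.
print("Price marker present:", "£" in response.text)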
Next, inspect the page source (using your browser's developer tools) to identify the tags and classes associated with the data you wish to extract. Reviews are often stored in a section separate from the main product description, while pricing and categories are usually embedded directly in the product details. It's also important to handle pagination when reviews span multiple pages, as in the sketch that follows.
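Here is a minimal pagination sketch. The page query parameter and the review-text class are assumptions for illustration; confirm the real parameter name and selectors in your browser's developer tools first:

import requests
from bs4 import BeautifulSoup

base_url = "https://www.currys.co.uk/products/your-product-url"  # placeholder
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
all_reviews = []

for page in range(1, 6):  # walk the first five review pages
    # "page" is a hypothetical query parameter -- verify the real one
    response = requests.get(f"{base_url}?page={page}", headers=headers, timeout=10)
    soup = BeautifulSoup(response.text, "html.parser")
    reviews = soup.find_all("p", class_="review-text")  # assumed selector
    if not reviews:  # stop once a page returns no reviews
        break
    all_reviews.extend(r.get_text(strip=True) for r in reviews)

print(f"Collected {len(all_reviews)} reviews")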
One of the challenges with scraping reviews is handling dynamically loaded content rendered with JavaScript. If the required data isn't present in the HTML response from requests, you may need a browser-automation tool such as Selenium, or you can inspect network activity for the API calls that fetch the data; a minimal Selenium sketch is shown below. For simplicity, the main example in this post focuses on static HTML content.
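If the reviews turn out to be JavaScript-rendered, a sketch along these lines can stand in for the requests call. It assumes the same illustrative review-text class as the static example and requires a working Chrome/ChromeDriver setup:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()  # Selenium 4 can manage the driver binary itself
try:
    driver.get("https://www.currys.co.uk/products/your-product-url")  # placeholder
    driver.implicitly_wait(10)  # allow client-side rendering to finish
    # "p.review-text" is the same assumed selector as in the static example
    for element in driver.find_elements(By.CSS_SELECTOR, "p.review-text"):
        print(element.text)
finally:
    driver.quit()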
After fetching the HTML content with the requests library, BeautifulSoup is used to parse and navigate the document tree. This allows us to locate and extract data using tags and attributes, such as product names, prices, reviews, and associated categories. Once extracted, the data can be stored in a structured format like a CSV or database for further processing. For instance, you might want to analyze the reviews to determine customer sentiment or study pricing trends.
Below is the complete Python script using requests and BeautifulSoup for scraping reviews, pricing, and categories from Currys UK. Note that the URL is a placeholder and the CSS class names (product-title, price, breadcrumb-link, reviews-section, review-text) are illustrative; verify them against the live page in your browser's developer tools before running the script:

import requests
from bs4 import BeautifulSoup
import csv

# URL of the product page (placeholder -- substitute a real product URL)
url = "https://www.currys.co.uk/products/your-product-url"

# Headers to mimic a real browser
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
}

# Send a GET request to the page
response = requests.get(url, headers=headers)

# Check if the request was successful
if response.status_code == 200:
    soup = BeautifulSoup(response.content, "html.parser")

    def text_or_default(element, default="N/A"):
        # find() returns None when a selector does not match, so guard
        # before reading the text to avoid an AttributeError
        return element.get_text(strip=True) if element else default

    # Scrape product name, price, and category (class names are illustrative)
    product_name = text_or_default(soup.find("h1", class_="product-title"))
    print("Product Name:", product_name)

    price = text_or_default(soup.find("span", class_="price"))
    print("Price:", price)

    category = text_or_default(soup.find("a", class_="breadcrumb-link"))
    print("Category:", category)

    # Scrape reviews
    review_texts = []
    reviews_section = soup.find("div", class_="reviews-section")
    if reviews_section:
        reviews = reviews_section.find_all("p", class_="review-text")
        for idx, review in enumerate(reviews, 1):
            text = review.get_text(strip=True)
            review_texts.append(text)
            print(f"Review {idx}:", text)

    # Save to CSV
    with open("currys_data.csv", "w", newline="", encoding="utf-8") as file:
        writer = csv.writer(file)
        writer.writerow(["Product Name", "Price", "Category", "Reviews"])
        writer.writerow([product_name, price, category,
                         " | ".join(review_texts) or "No reviews"])
else:
    print(f"Failed to fetch the page. Status code: {response.status_code}")
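As a follow-up to the sentiment idea mentioned earlier, here is a deliberately naive sketch that scores the saved reviews with a hand-rolled word list. It reads the currys_data.csv file produced by the script above and is meant only as a starting point, not a real sentiment model:

import csv

# Tiny illustrative word lists -- a real analysis would use a proper
# sentiment library or model
POSITIVE = {"great", "excellent", "good", "love", "fast"}
NEGATIVE = {"bad", "poor", "slow", "broken", "terrible"}

with open("currys_data.csv", newline="", encoding="utf-8") as file:
    row = next(csv.DictReader(file))

for review in row["Reviews"].split(" | "):
    words = set(review.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    print(f"{score:+d}  {review[:60]}")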