Scrape customer reviews from Zalando Poland using Python

Posted by Nitin Annemarie on 12/13/2024 at 10:46 am

Zalando is one of Poland’s most popular online fashion retailers, with an extensive collection of clothing and accessories. To scrape customer reviews from Zalando Poland, you can use Python with the requests library to fetch the product page and BeautifulSoup to parse its HTML. Reviews usually sit in a dedicated section of the product page as user-written text accompanied by ratings, and they offer useful insight into customer satisfaction and product quality.
The first step is to inspect the product page with your browser’s developer tools and identify the tags and classes that hold the review data: typically reviewer names, star ratings, and text feedback. The script then fetches the page, parses the HTML, and selects those elements with BeautifulSoup. The class names used in the script (review-item, reviewer-name, review-rating, review-comment) are examples, so replace them with the ones you find on the live page. If the reviews are rendered by JavaScript after the page loads, they will not appear in the HTML that requests returns. Below is the Python script for extracting customer reviews from Zalando Poland:

import requests
from bs4 import BeautifulSoup
# URL of the Zalando product page (placeholder; replace with a real product URL)
url = "https://www.zalando.pl/product-page"
# Headers to mimic a browser request
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
}
# Fetch the page content
response = requests.get(url, headers=headers, timeout=10)
if response.status_code == 200:
    soup = BeautifulSoup(response.content, "html.parser")
    # Find the review containers (example class name; confirm it in developer tools)
    reviews = soup.find_all("div", class_="review-item")
    if reviews:
        for idx, review in enumerate(reviews, 1):
            reviewer = review.find("span", class_="reviewer-name")
            rating = review.find("div", class_="review-rating")
            comment = review.find("p", class_="review-comment")
            print(f"Review {idx}:")
            print("Reviewer:", reviewer.text.strip() if reviewer else "Anonymous")
            print("Rating:", rating.text.strip() if rating else "No rating")
            print("Comment:", comment.text.strip() if comment else "No comment")
            print("-" * 40)
    else:
        print("No reviews found.")
else:
    print(f"Failed to fetch the page. Status code: {response.status_code}")

4 Replies

Ekaterina Kenyatta

Member
12/14/2024 at 10:18 am

The script could benefit from additional error handling to ensure it gracefully manages missing elements. For example, using try-except blocks for specific sections like reviewer names or comments would prevent the script from failing if some reviews are incomplete.
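
One way to sketch the try-except idea is a small helper that looks up a child element and falls back to a default when it is missing. The name field_text is hypothetical, and the tag and class names are the same example values used in the original script, not confirmed Zalando markup.

```python
def field_text(review, tag, class_name, default):
    """Return the stripped text of a child element, or `default` if it is missing.

    `review` is expected to be a BeautifulSoup Tag (anything with a
    compatible find() method works).
    """
    try:
        return review.find(tag, class_=class_name).get_text(strip=True)
    except AttributeError:
        # find() returned None because the element is absent,
        # so calling get_text() on it raised AttributeError.
        return default
```

Inside the loop this would replace the three inline conditionals, for example field_text(review, "span", "reviewer-name", "Anonymous"), so one incomplete review does not stop the run.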
Yolande Alojz

Member
12/17/2024 at 8:14 am

An improvement would be to add support for pagination to scrape reviews across multiple pages. Many products on Zalando have multiple review pages, and the script could be extended to iterate through them dynamically.
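
A minimal sketch of that loop, assuming reviews can be requested one page at a time. How Zalando actually exposes further review pages (a query parameter, a "load more" request, or something else) has to be checked in the browser's network tab, so the page-fetching step is left as a hypothetical callable, fetch_reviews, that you supply.

```python
import time


def collect_paginated(fetch_reviews, max_pages=50, delay=1.0):
    """Call fetch_reviews(page) for page 1, 2, ... until a page comes back empty.

    fetch_reviews is a caller-supplied function that downloads and parses
    one page and returns a list of reviews (an empty list when there are
    no more). max_pages is a safety cap against endless loops.
    """
    all_reviews = []
    for page in range(1, max_pages + 1):
        batch = fetch_reviews(page)
        if not batch:
            break  # an empty page is treated as the end of the reviews
        all_reviews.extend(batch)
        time.sleep(delay)  # pause between requests to avoid hammering the site
    return all_reviews
```

The fetch_reviews function would wrap the requests.get and BeautifulSoup code from the original post, with the page number added to the request in whatever way the site expects.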
Marzieh Daniela

Member
12/18/2024 at 7:44 am

Saving the extracted reviews into a structured format like JSON or CSV would make it easier to analyze customer feedback. This would also allow for easy integration with sentiment analysis tools or visualization software.
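
As a rough sketch, this can be done with the standard csv and json modules once each review is collected into a dict instead of being printed. The keys reviewer, rating, and comment mirror the fields in the original script; the function names are illustrative only.

```python
import csv
import json

FIELDS = ["reviewer", "rating", "comment"]


def write_csv(reviews, fileobj):
    """Write a list of review dicts to an open text file as CSV with a header row."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(reviews)


def write_json(reviews, fileobj):
    """Write a list of review dicts to an open text file as JSON."""
    # ensure_ascii=False keeps Polish characters readable in the output
    json.dump(reviews, fileobj, ensure_ascii=False, indent=2)


# Example usage (file names are placeholders):
# with open("reviews.csv", "w", newline="", encoding="utf-8") as f:
#     write_csv(reviews, f)
# with open("reviews.json", "w", encoding="utf-8") as f:
#     write_json(reviews, f)
```

Opening the CSV file with newline="" is what the csv module documentation recommends, and it avoids blank rows on Windows.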
Lugos Shanti

Member
12/19/2024 at 11:03 am

To enhance scalability, the script could store the reviews in a database. This would allow for efficient querying and make it easier to compare reviews across different products or categories over time.
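
For a small project, SQLite from the standard library may be enough to try this out. The sketch below assumes each review is a dict with reviewer, rating, and comment keys; the table layout and function names are made up for illustration.

```python
import sqlite3


def init_db(conn):
    """Create the reviews table if it does not exist yet."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS reviews (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               product_url TEXT NOT NULL,
               reviewer TEXT,
               rating TEXT,
               comment TEXT,
               scraped_at TEXT DEFAULT CURRENT_TIMESTAMP
           )"""
    )


def store_reviews(conn, product_url, reviews):
    """Insert all reviews for one product in a single transaction."""
    with conn:  # commits on success, rolls back if an insert fails
        conn.executemany(
            "INSERT INTO reviews (product_url, reviewer, rating, comment) "
            "VALUES (?, ?, ?, ?)",
            [(product_url, r["reviewer"], r["rating"], r["comment"]) for r in reviews],
        )


def count_by_product(conn):
    """Return (product_url, review_count) rows, one per product."""
    return conn.execute(
        "SELECT product_url, COUNT(*) FROM reviews "
        "GROUP BY product_url ORDER BY product_url"
    ).fetchall()
```

Call sqlite3.connect("reviews.db") once, run init_db, and then call store_reviews after each product is scraped. The scraped_at column is there so reviews can be compared over time.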
