  • How to scrape product reviews from Etsy.com using Python?

    Posted by Javed Roland on 12/17/2024 at 7:47 am

    Scraping product reviews from Etsy.com is a valuable way to analyze customer sentiment, product popularity, and review trends. Python is an excellent tool for extracting such data, using libraries like requests and BeautifulSoup for static pages or Selenium for dynamically rendered content. Etsy organizes its reviews into structured sections on product pages, typically containing the reviewer’s name, star rating, and the review text. However, ensure that your scraping activities align with Etsy’s terms of service and respect user privacy.
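    The first step is to inspect the product review section using browser developer tools. Identify the tags and classes associated with review details, such as ratings and text. Here’s an example of scraping reviews using BeautifulSoup. The class names in it are illustrative placeholders and will likely need to be replaced with the ones you find on the live page: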

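    import requests
    from bs4 import BeautifulSoup

    # Target URL for an Etsy product page (placeholder listing ID)
    url = "https://www.etsy.com/listing/123456789/example-product"
    headers = {"User-Agent": "Mozilla/5.0"}

    def text_or_none(parent, tag, class_name):
        """Return the stripped text of the first matching child, or None if it is missing."""
        element = parent.find(tag, class_=class_name)
        return element.get_text(strip=True) if element else None

    # Fetch the page; the timeout keeps a stalled connection from hanging the script
    response = requests.get(url, headers=headers, timeout=10)
    if response.status_code == 200:
        soup = BeautifulSoup(response.content, "html.parser")
        # Extract review details (class names are placeholders; use the ones
        # you find in the browser developer tools)
        reviews = soup.find_all("div", class_="review")
        for review in reviews:
            reviewer_name = text_or_none(review, "span", "reviewer-name")
            rating = text_or_none(review, "span", "star-rating")
            review_text = text_or_none(review, "p", "review-text")
            print(f"Reviewer: {reviewer_name}, Rating: {rating}, Review: {review_text}")
    else:
        print(f"Failed to fetch the Etsy product page (status {response.status_code}).")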
    

    If the reviews are dynamically loaded, you can use Selenium to render the content before extracting it. Below is an example using Selenium:

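    from selenium import webdriver
    from selenium.common.exceptions import NoSuchElementException, TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Initialize Selenium WebDriver
    driver = webdriver.Chrome()
    try:
        driver.get("https://www.etsy.com/listing/123456789/example-product")
        # Wait up to 10 seconds for at least one review element to be rendered
        # (the class names are placeholders; use the ones from the live page)
        WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.CLASS_NAME, "review"))
        )
        # Extract review details
        for review in driver.find_elements(By.CLASS_NAME, "review"):
            try:
                reviewer_name = review.find_element(By.CLASS_NAME, "reviewer-name").text.strip()
                rating = review.find_element(By.CLASS_NAME, "star-rating").text.strip()
                review_text = review.find_element(By.CLASS_NAME, "review-text").text.strip()
            except NoSuchElementException:
                continue  # Skip review blocks that are missing an expected field
            print(f"Reviewer: {reviewer_name}, Rating: {rating}, Review: {review_text}")
    except TimeoutException:
        print("No review elements appeared within 10 seconds.")
    finally:
        # Close the browser even if an error occurred
        driver.quit()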
    

    For large-scale scraping, add a delay between requests using time.sleep() and consider rotating proxies to reduce the chance of being blocked. You can also store the scraped data in a CSV file or an SQLite database (for example via Python’s built-in sqlite3 module) and analyze it with pandas.
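
    As a rough sketch of the delay and storage ideas above, something like the following could work. The helper names (polite_sleep, save_reviews_to_csv), the column names, and the default delay values are my own assumptions, not anything Etsy requires, so adjust them to your data.

```python
import csv
import random
import time

# Column names are an assumption; match them to the fields you actually scrape
FIELDNAMES = ["reviewer", "rating", "text"]


def polite_sleep(base_seconds=2.0, jitter_seconds=1.0, sleep=time.sleep):
    """Pause for base_seconds plus a random jitter and return the delay used.

    The sleep function is injectable so the helper can be tested without waiting.
    """
    delay = base_seconds + random.uniform(0, jitter_seconds)
    sleep(delay)
    return delay


def save_reviews_to_csv(reviews, path):
    """Write a list of review dicts (keys matching FIELDNAMES) to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(reviews)
```

    You would call polite_sleep() between page fetches and save_reviews_to_csv(rows, "etsy_reviews.csv") once at the end. The resulting file can then be loaded with pandas.read_csv for analysis.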

  • 2 Replies
  • Antonio Elfriede

    Member
    12/19/2024 at 7:21 am

    A significant improvement to the scraper would be handling pagination. Etsy reviews are often split across multiple pages, requiring the scraper to navigate through each page. By identifying and automating the “Next” button in Selenium or parsing the pagination URLs in BeautifulSoup, you can ensure complete review collection. Adding a delay between requests can help avoid rate-limiting or detection.
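
    Here is a minimal sketch of the URL-based variant of that pagination loop. It assumes the review pages can be reached through a "page" query parameter, which is a guess on my part and needs to be checked against the live site (Etsy may load further reviews via JavaScript instead). The names page_url, collect_all_reviews, and fetch_page are hypothetical; fetch_page is whatever function you write to download one page and return its list of reviews.

```python
import time
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def page_url(base_url, page, param="page"):
    """Return base_url with the pagination query parameter set to the given page."""
    parts = urlparse(base_url)
    query = dict(parse_qsl(parts.query))
    query[param] = str(page)
    return urlunparse(parts._replace(query=urlencode(query)))


def collect_all_reviews(base_url, fetch_page, max_pages=50,
                        delay_seconds=2.0, sleep=time.sleep):
    """Fetch pages 1, 2, 3, ... until a page yields no reviews or max_pages is hit.

    fetch_page(url) is supplied by the caller and must return a list of reviews.
    """
    all_reviews = []
    for page in range(1, max_pages + 1):
        reviews = fetch_page(page_url(base_url, page))
        if not reviews:
            break  # An empty page is treated as the end of the reviews
        all_reviews.extend(reviews)
        sleep(delay_seconds)  # Delay between requests to stay under rate limits
    return all_reviews
```

    The max_pages cap is there so the loop cannot run forever if the site keeps returning the last page for out-of-range page numbers.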

  • Medine Daniyal

    Member
    12/21/2024 at 11:20 am

    Rotating user-agent headers and using proxies can improve the scraper’s robustness. Etsy may block repeated requests from the same IP address, so distributing traffic across multiple proxies can prevent detection. Libraries like fake_useragent or proxy rotation services can make your scraper appear more like a real user.
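
    A rough sketch of the rotation idea using only the standard library is below. The proxy addresses are placeholders that do not exist, the user-agent strings are just sample browser strings, and next_request_kwargs is a name I made up. In practice you would fill PROXIES from your proxy provider, and you could generate user agents with fake_useragent instead of a fixed list.

```python
import itertools
import random

# Sample browser user-agent strings; swap in current ones as needed
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0",
]

# Placeholder proxy addresses (hypothetical); replace with real ones
PROXIES = [
    "http://proxy1.example.com:8080",
    "http://proxy2.example.com:8080",
]

# Cycle through the proxies in order so traffic is spread evenly
_proxy_cycle = itertools.cycle(PROXIES)


def next_request_kwargs():
    """Return headers and proxies for the next request.

    The proxy advances round-robin; the user agent is picked at random.
    """
    proxy = next(_proxy_cycle)
    return {
        "headers": {"User-Agent": random.choice(USER_AGENTS)},
        "proxies": {"http": proxy, "https": proxy},
    }
```

    The returned dict uses the same keyword names as requests, so it can be passed straight through, e.g. requests.get(url, timeout=10, **next_request_kwargs()).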
