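  • Scraping reviews from Shopee Thailand with Puppeteer involves opening the product page, loading additional reviews if necessary (the review section typically loads only as you scroll), and then parsing the content with custom selectors. Because reviews arrive asynchronously, use Puppeteer's waiting functions such as page.waitForSelector before extracting anything. Keep in mind that waitForSelector only guarantees that at least one matching element exists, not that every review has loaded, so scroll first and wait afterwards. Also treat each field as optional: a review with a missing element would otherwise throw an error inside $$eval and abort the whole scrape.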

    const puppeteer = require('puppeteer');

    (async () => {
        const browser = await puppeteer.launch({ headless: true });
        try {
            const page = await browser.newPage();
            // Wait for network activity to settle so the page's scripts have run
            await page.goto('https://shopee.co.th/product-page-url', { waitUntil: 'networkidle2' });
            // Scroll first so the lazily loaded review section is requested
            await page.evaluate(() => {
                window.scrollBy(0, window.innerHeight);
            });
            // Then wait until at least one review item has been rendered
            await page.waitForSelector('.shopee-review-item', { timeout: 15000 });
            // Scrape review data; a missing element yields null instead of throwing
            const reviews = await page.$$eval('.shopee-review-item', reviewItems => {
                const text = (el, sel) => el.querySelector(sel)?.innerText.trim() ?? null;
                return reviewItems.map(review => ({
                    name: text(review, '.shopee-review-item__user-name'),
                    rating: text(review, '.shopee-star-rating'),
                    reviewText: text(review, '.shopee-review-item__content')
                }));
            });
            console.log(reviews);
        } finally {
            // Close the browser even if navigation or scraping fails
            await browser.close();
        }
    })();
    
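  • When scraping Lazada Thailand, send request headers that resemble a real browser's. The site may block or challenge requests that don't look like they come from a browser, and a bare "Mozilla/5.0" User-Agent is easy to flag, so send at least a complete User-Agent string and an Accept-Language header. The HTML structure can also differ between product categories, and class names such as c1ZEkM look auto-generated and are likely to change, so write selectors that tolerate missing elements instead of assuming every product card has the same layout. If the results are rendered by JavaScript, the static HTML that requests receives may contain no product markup at all, in which case a browser-automation tool such as Puppeteer is needed. Always check the terms of service of any site you scrape and make sure you comply with them.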

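    import requests
    from bs4 import BeautifulSoup

    url = 'https://www.lazada.co.th/catalog/?q=shoes'
    # A fuller browser-like header set; a bare 'Mozilla/5.0' is easy to flag
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
        'Accept-Language': 'th-TH,th;q=0.9,en;q=0.8',
    }
    response = requests.get(url, headers=headers, timeout=15)
    response.raise_for_status()  # fail loudly on 403/429 instead of parsing an error page
    soup = BeautifulSoup(response.text, 'html.parser')
    # Parse and print product details. These class names look auto-generated and
    # change over time; re-check them in the browser's developer tools.
    products = soup.find_all('div', class_='c1ZEkM')
    for product in products:
        title_el = product.find('div', class_='c16H9d')
        price_el = product.find('span', class_='c13VH6')
        if title_el is None or price_el is None:
            continue  # skip cards whose layout differs
        title = title_el.get_text(strip=True)
        price = price_el.get_text(strip=True)
        print(f'Title: {title}, Price: {price}')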
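For the Shopee reply: a single scrollBy call triggers at most one batch of lazy loading. One common way to load additional reviews is to repeat the load step until the number of review items stops growing. Below is a minimal Python sketch of that loop under stated assumptions: load_more and count_items are hypothetical callables you would wire to your browser automation (for example, a scroll or a click on the next-page button plus a short pause, and a count of .shopee-review-item elements); they are not part of Puppeteer or any other library.

```python
from typing import Callable


def load_until_stable(
    load_more: Callable[[], None],
    count_items: Callable[[], int],
    max_rounds: int = 20,
    patience: int = 2,
) -> int:
    """Call load_more() until count_items() stops increasing.

    Stops after `patience` consecutive rounds with no new items, or after
    `max_rounds` rounds in total, and returns the final item count.
    """
    count = count_items()
    unchanged = 0
    for _ in range(max_rounds):
        load_more()
        new_count = count_items()
        if new_count > count:
            count = new_count
            unchanged = 0  # progress was made, reset the patience counter
        else:
            unchanged += 1
            if unchanged >= patience:
                break
    return count


if __name__ == "__main__":
    # Hypothetical fake page: 10 reviews per load, 35 in total.
    state = {"loaded": 10}

    def fake_load_more() -> None:
        state["loaded"] = min(state["loaded"] + 10, 35)

    print(load_until_stable(fake_load_more, lambda: state["loaded"]))  # prints 35
```

In the Puppeteer script, the load step would be a page.evaluate scroll followed by a short wait, and the count could come from page.$$eval('.shopee-review-item', els => els.length). Capping max_rounds keeps a page with endless lazy loading from hanging the scraper.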
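For the Lazada reply: one way to make selectors more tolerant of layout differences between categories is to try several CSS selectors in order and take the first match. This is a minimal sketch using BeautifulSoup's select_one; the fallback selectors, the first_text helper, and the sample HTML are all hypothetical and would need to be checked against the live page.

```python
from bs4 import BeautifulSoup


def first_text(node, selectors):
    """Return the stripped text of the first selector that matches, else None."""
    for selector in selectors:
        element = node.select_one(selector)
        if element is not None:
            return element.get_text(strip=True)
    return None


# Hypothetical selector chains: the exact class first, then looser fallbacks.
TITLE_SELECTORS = ['div.c16H9d', '[class*="title"]', 'a[title]']
PRICE_SELECTORS = ['span.c13VH6', '[class*="price"]']

# Hypothetical markup: two cards with different layouts.
sample_html = """
<div class="card"><div class="c16H9d">Running Shoes</div><span class="c13VH6">฿590</span></div>
<div class="card"><a class="item-title" title="Sandals">Sandals</a><span class="item-price">฿250</span></div>
"""

soup = BeautifulSoup(sample_html, 'html.parser')
for card in soup.select('div.card'):
    title = first_text(card, TITLE_SELECTORS)
    price = first_text(card, PRICE_SELECTORS)
    print(f'Title: {title}, Price: {price}')
```

Ordering the selectors from most specific to most general means the exact class wins when it is present, while the substring matches act as a safety net. The looser a selector is, the more likely it is to match an unrelated element, so it is worth validating the results, for example by checking that a price actually parses as a number.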