
Forum Replies Created

Zaheer Arethusa

Member

12/14/2024 at 6:29 am in reply to: How can I scrape product reviews from Shopee Thailand using Node.js n Puppeteer?

Scraping reviews from Shopee Thailand with Puppeteer involves opening the product page, scrolling to load additional reviews if necessary, and then parsing the content with custom selectors. Because reviews load asynchronously, use Puppeteer's waiting functions (such as page.waitForSelector) so that you only scrape reviews after they have been rendered. The class names in the example below are illustrative; check them against the live page, since they may change.

```
const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch({ headless: true });
    try {
        const page = await browser.newPage();
        // Replace with a real product URL
        await page.goto('https://shopee.co.th/product-page-url', { waitUntil: 'networkidle2' });
        // Reviews are lazy-loaded below the fold, so scroll first...
        await page.evaluate(() => {
            window.scrollBy(0, window.innerHeight);
        });
        // ...then wait until the review items have actually rendered
        await page.waitForSelector('.shopee-review-item', { timeout: 30000 });
        // Scrape review data (selectors are examples and may need updating)
        const reviews = await page.$$eval('.shopee-review-item', reviewItems =>
            reviewItems.map(review => {
                // Return '' instead of throwing when an element is missing
                const text = sel => review.querySelector(sel)?.innerText.trim() ?? '';
                return {
                    name: text('.shopee-review-item__user-name'),
                    rating: text('.shopee-star-rating'),
                    reviewText: text('.shopee-review-item__content'),
                };
            })
        );
        console.log(reviews);
    } finally {
        // Always close the browser, even if a step above fails
        await browser.close();
    }
})();
```
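The snippet above scrolls only once. If more reviews load as you scroll, one option is to repeat the scroll until the number of review items stops growing. This is a minimal sketch that assumes the review list grows on scroll (if the page paginates reviews with buttons instead, you would click the next-page control inside the loop). `loadMoreReviews` is a hypothetical helper name, and the `maxRounds` and `delayMs` defaults are illustrative values, not tuned for Shopee.

```javascript
// Hypothetical helper: scroll repeatedly until the review count stops growing.
// Assumes reviews are lazy-loaded on scroll; `page` is a Puppeteer Page.
async function loadMoreReviews(page, itemSelector, { maxRounds = 10, delayMs = 1500 } = {}) {
    let previousCount = -1;
    for (let round = 0; round < maxRounds; round++) {
        // Count the review items currently in the DOM
        const count = await page.$$eval(itemSelector, items => items.length);
        // Stop once a scroll produced no new items
        if (count === previousCount) break;
        previousCount = count;
        // Scroll one viewport further to trigger the next batch
        await page.evaluate(() => window.scrollBy(0, window.innerHeight));
        // Give the page time to fetch and render new reviews
        await new Promise(resolve => setTimeout(resolve, delayMs));
    }
    // Last observed item count
    return previousCount;
}
```

You could call it in place of the single scroll step, for example `await loadMoreReviews(page, '.shopee-review-item');`, before running `$$eval`. Waiting a fixed delay is a simple choice; a stricter version could wait for the item count to change instead.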

Zaheer Arethusa

Member
12/14/2024 at 6:28 am in reply to: How can I scrape product data from Lazada Thailand using Python n BeautifulSoup?
When scraping Lazada Thailand, set your request headers carefully. The site may block requests that don't look like they come from a real browser, so send browser-like headers (at minimum a realistic User-Agent). The HTML structure can also differ between product categories, and class names such as c1ZEkM appear to be auto-generated and may change when the site is updated, so use flexible selectors and check for missing elements instead of assuming every product card matches. Always review the terms of service of any site you scrape and make sure you comply with them.
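```
import requests
from bs4 import BeautifulSoup

url = 'https://www.lazada.co.th/catalog/?q=shoes'
# Browser-like headers; a bare 'Mozilla/5.0' may be easy to flag
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/120.0 Safari/537.36',
    'Accept-Language': 'th-TH,th;q=0.9,en;q=0.8',
}
response = requests.get(url, headers=headers, timeout=30)
response.raise_for_status()  # stop early on 4xx/5xx, e.g. a blocked request
soup = BeautifulSoup(response.text, 'html.parser')
# Parse and print product details (class names may change over time)
for product in soup.find_all('div', {'class': 'c1ZEkM'}):
    title = product.find('div', {'class': 'c16H9d'})
    price = product.find('span', {'class': 'c13VH6'})
    if title is not None and price is not None:  # skip cards with a different layout
        print(f'Title: {title.text.strip()}, Price: {price.text.strip()}')
```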