Forum Replies Created

Gerel Tomislav

Member

12/13/2024 at 6:36 am in reply to: Scrape customer reviews from Tesco Lotus Thailand using Node.js and Puppeteer?

To scrape customer reviews from Tesco Lotus Thailand with Puppeteer, you’ll need to handle dynamic content. After navigating to a product page, wait for the reviews section to render before reading it: page.waitForSelector() resolves once an element matching the selector is in the DOM, and with the visible option it also waits until that element is visible. Review details such as reviewer names, ratings and comments can then be extracted with CSS selectors. Some reviews may be lazy-loaded as you scroll, so you may need to scroll the page and wait again before extracting. Keep in mind that waitForSelector() only waits for the first matching element, not for every review to finish loading.

const puppeteer = require('puppeteer');
(async () => {
    const browser = await puppeteer.launch({ headless: true });
    try {
        const page = await browser.newPage();
        // Placeholder URL: replace it with a real product page
        await page.goto('https://www.tescolotus.com/product-page', { waitUntil: 'networkidle2' });
        // Wait until the review section is rendered and visible
        await page.waitForSelector('.review-section', { visible: true });
        // Extract reviews; the class names are placeholders for the site's real markup
        const reviewData = await page.evaluate(() => {
            // Return trimmed text, or null when the element is missing
            const text = (root, selector) => root.querySelector(selector)?.innerText.trim() ?? null;
            return Array.from(document.querySelectorAll('.review'), review => ({
                reviewerName: text(review, '.reviewer-name'),
                rating: text(review, '.rating'),
                comment: text(review, '.review-comment'),
            }));
        });
        console.log(reviewData);
    } finally {
        // Close the browser even if navigation or extraction fails
        await browser.close();
    }
})();

Gerel Tomislav

Member

12/13/2024 at 6:35 am in reply to: Scrape product availability, price from Central Thailand’s using Python n Scrapy

Scraping Central Thailand requires attention to both static and dynamic content, especially for prices and product availability. Scrapy parses the HTML the server returns and does not run JavaScript, so check (for example in scrapy shell) that the elements you target are actually present in the response. Anything rendered client-side will need a different source, such as the underlying API request or a headless browser. Prices and availability often need extra logic, for instance to tell a sale price from the regular price or to flag out-of-stock items. It’s also important to follow pagination so that every product in a category is scraped.

import scrapy


class CentralPriceScraper(scrapy.Spider):
    name = 'central_price_scraper'
    start_urls = ['https://www.central.co.th/en/shop/']

    def parse(self, response):
        # Class names below are placeholders; check them against the live markup
        for product in response.xpath('//div[@class="product-item"]'):
            # default='' keeps .strip() from failing when an element is missing
            title = product.xpath('.//h2[@class="product-title"]/text()').get(default='')
            price = product.xpath('.//span[@class="price"]/text()').get(default='')
            availability = product.xpath('.//div[@class="availability-status"]/text()').get(default='')
            yield {
                'product': title.strip(),
                'price': price.strip(),
                'availability': availability.strip(),
            }
        # Follow pagination until there is no next-page link
        next_page = response.xpath('//a[contains(@class, "next-page")]/@href').get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)
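
The extra logic for sale prices and out-of-stock items mentioned above could look roughly like the sketch below. It is only an illustration: parse_price, is_in_stock and normalize_item are hypothetical helper names, and the price format ("฿1,290.00") and the status strings are assumptions rather than values taken from the Central site.

```python
import re
from decimal import Decimal

# Hypothetical status strings; adjust them to whatever the site actually shows.
OUT_OF_STOCK_MARKERS = ('out of stock', 'sold out', 'unavailable')


def parse_price(text):
    """Return the first number in a price string as a Decimal, or None."""
    if not text:
        return None
    # Digits with optional thousands separators and an optional decimal part
    match = re.search(r'\d[\d,]*(?:\.\d+)?', text)
    if match is None:
        return None
    return Decimal(match.group().replace(',', ''))


def is_in_stock(text):
    """Assume an item is in stock unless a known out-of-stock marker appears."""
    status = (text or '').strip().lower()
    return not any(marker in status for marker in OUT_OF_STOCK_MARKERS)


def normalize_item(title, price_text, sale_price_text, availability_text):
    """Build a cleaned item; the sale price is used when one is present."""
    regular = parse_price(price_text)
    sale = parse_price(sale_price_text)
    return {
        'product': (title or '').strip(),
        'price': sale if sale is not None else regular,
        'regular_price': regular,
        'on_sale': sale is not None and regular is not None and sale < regular,
        'in_stock': is_in_stock(availability_text),
    }
```

Inside parse(), the spider could yield normalize_item(title, price, sale_price, availability) instead of the raw strings, where sale_price would come from whatever element the site uses for discounted prices. Treating a missing status as in stock is a design choice; returning None for unknown statuses may be safer if the data feeds stock alerts.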