Forum Replies Created

  • Scrapy is well suited to scraping product availability and prices from Central Thailand’s e-commerce site, particularly when the listing pages are served as structured HTML. Scrapy’s XPath selectors let you walk each product card and extract the title, price, and availability status. One potential challenge is that Scrapy does not execute JavaScript, so any content the page loads client-side will be missing from the response unless you render it with a browser-based tool or request the underlying data endpoint directly. Separately, consider configuring Scrapy’s download delay so that requests are spaced out and less likely to trigger rate-limiting on the server.

    import scrapy


    class CentralScraperSpider(scrapy.Spider):
        name = 'central_scraper'
        start_urls = ['https://www.central.co.th/en/product-category']

        def parse(self, response):
            for product in response.xpath('//div[@class="product-item"]'):
                # default='' keeps .strip() from raising AttributeError when a node is missing
                title = product.xpath('.//div[@class="product-name"]/text()').get(default='')
                price = product.xpath('.//span[@class="price"]/text()').get(default='')
                availability = product.xpath('.//span[@class="in-stock"]/text()').get(default='')
                yield {
                    'product': title.strip(),
                    'price': price.strip(),
                    'availability': availability.strip(),
                }
            # Handle pagination: follow the "next" link until there is none
            next_page = response.xpath('//a[contains(@class, "pagination-next")]/@href').get()
            if next_page:
                yield response.follow(next_page, self.parse)
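
The download delay mentioned above can be set per spider. The following is only a minimal sketch: the keys are standard Scrapy settings, but the name `POLITE_SETTINGS` is hypothetical and every numeric value is an assumption to tune for the target site, not something taken from the original reply.

```python
# Illustrative throttling settings; all numeric values are assumptions.
POLITE_SETTINGS = {
    # Base wait, in seconds, between consecutive requests to the same site.
    'DOWNLOAD_DELAY': 2,
    # Vary each wait between 0.5x and 1.5x of DOWNLOAD_DELAY (Scrapy's default).
    'RANDOMIZE_DOWNLOAD_DELAY': True,
    # Let the AutoThrottle extension adapt the delay to observed latency.
    'AUTOTHROTTLE_ENABLED': True,
    'AUTOTHROTTLE_START_DELAY': 2,
    'AUTOTHROTTLE_MAX_DELAY': 30,
    # Cap the number of parallel requests sent to a single domain.
    'CONCURRENT_REQUESTS_PER_DOMAIN': 2,
}
```

To apply it, one option is to assign the dictionary to the spider's `custom_settings` class attribute (for example `custom_settings = POLITE_SETTINGS` inside `CentralScraperSpider`); another is to copy the keys into the project's `settings.py`.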
    
  • Scraping customer reviews from Tesco Lotus Thailand with Puppeteer means dealing with dynamically loaded content. After navigating to the product page, wait for the review section to render (for example with page.waitForSelector()) before reading it. Once the reviews are in the DOM, use Puppeteer’s page.evaluate() to extract each reviewer’s name, rating, and comment. Longer review lists are often paginated or lazy-loaded, so you may also need to click through the pages or simulate scrolling to collect them all.

    const puppeteer = require('puppeteer');

    (async () => {
        const browser = await puppeteer.launch({ headless: false });
        try {
            const page = await browser.newPage();
            await page.goto('https://www.tescolotus.com/product-page');
            // Wait for the review section to load
            await page.waitForSelector('.customer-reviews');
            // Scrape review data; ?. and ?? null guard against missing child elements
            const reviews = await page.evaluate(() => {
                const reviewsList = [];
                const reviewElements = document.querySelectorAll('.customer-review');
                reviewElements.forEach(review => {
                    const reviewer = review.querySelector('.reviewer-name')?.textContent.trim() ?? null;
                    const rating = review.querySelector('.star-rating')?.textContent.trim() ?? null;
                    const reviewText = review.querySelector('.review-text')?.textContent.trim() ?? null;
                    reviewsList.push({ reviewer, rating, reviewText });
                });
                return reviewsList;
            });
            console.log(reviews);
        } finally {
            // Close the browser even if navigation or scraping throws
            await browser.close();
        }
    })();