Replies – Discussions – Elisavet Jordana

Forum Replies Created

Elisavet Jordana

Member

12/12/2024 at 8:43 am in reply to: Scrape product availability, price from Central Thailand’s using Python n Scrapy

Scrapy is well suited to scraping product availability and prices from Central Thailand’s e-commerce site, particularly where the product listings are structured HTML. With Scrapy’s XPath selectors you can walk the page and extract each product’s title, price, and availability status. One potential challenge is that the site may load some content with JavaScript. Scrapy does not execute JavaScript, so anything rendered client-side will be missing from the response it receives; in that case you may need to locate the underlying data request the page makes, or use a headless-browser integration instead. You may also need to set Scrapy’s download delay to space out requests, which mimics human browsing and lowers the risk of being rate-limited by the server.
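As a rough sketch of the download-delay suggestion, the dictionary below uses Scrapy's standard `DOWNLOAD_DELAY`, `RANDOMIZE_DOWNLOAD_DELAY`, `CONCURRENT_REQUESTS_PER_DOMAIN`, and AutoThrottle settings. The values are assumptions to tune against how the site actually responds, not figures known to work for central.co.th, and `POLITE_SETTINGS` is a hypothetical name.

```python
# Hypothetical polite-crawling settings; every value is an assumption to tune.
POLITE_SETTINGS = {
    # Base wait (seconds) between consecutive requests to the same site.
    "DOWNLOAD_DELAY": 2.0,
    # Scrapy waits a random 0.5x to 1.5x of DOWNLOAD_DELAY, so requests are less regular.
    "RANDOMIZE_DOWNLOAD_DELAY": True,
    # Keep only one request in flight per domain.
    "CONCURRENT_REQUESTS_PER_DOMAIN": 1,
    # AutoThrottle adjusts the delay based on observed server latency.
    "AUTOTHROTTLE_ENABLED": True,
    "AUTOTHROTTLE_START_DELAY": 2.0,
    "AUTOTHROTTLE_MAX_DELAY": 30.0,
    "AUTOTHROTTLE_TARGET_CONCURRENCY": 1.0,
}
```

You could attach these to a single spider with the `custom_settings = POLITE_SETTINGS` class attribute, or copy the keys into the project's `settings.py` so they apply to every spider.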

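import scrapy


class CentralScraperSpider(scrapy.Spider):
    name = 'central_scraper'
    # Placeholder listing URL; replace it with the category page you want to crawl
    start_urls = ['https://www.central.co.th/en/product-category']

    def parse(self, response):
        # One node per product card (check class names against the live HTML)
        for product in response.xpath('//div[@class="product-item"]'):
            # .get(default='') returns '' instead of None when a node is missing,
            # so the .strip() calls below cannot raise AttributeError
            title = product.xpath('.//div[@class="product-name"]/text()').get(default='')
            price = product.xpath('.//span[@class="price"]/text()').get(default='')
            # Empty string when the in-stock label is absent
            availability = product.xpath('.//span[@class="in-stock"]/text()').get(default='')
            yield {
                'product': title.strip(),
                'price': price.strip(),
                'availability': availability.strip(),
            }

        # Handle pagination: follow the "next" link until there is none
        next_page = response.xpath('//a[contains(@class, "pagination-next")]/@href').get()
        if next_page:
            # response.follow resolves relative URLs against the current page
            yield response.follow(next_page, callback=self.parse)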
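The spider yields the price as raw page text. If you need numeric prices for sorting or comparison, a small helper like the one below can normalize them. It assumes, without having checked the live site, that prices appear in a form such as `฿1,290.00` or `THB 1,290`; `parse_price` is a hypothetical helper, not part of Scrapy.

```python
import re
from decimal import Decimal
from typing import Optional

# Assumed format: optional currency marker, then digits with comma
# thousands separators and an optional decimal part.
_PRICE_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")


def parse_price(raw: Optional[str]) -> Optional[Decimal]:
    """Return the first number in the scraped price text as a Decimal, or None."""
    if not raw:
        return None
    match = _PRICE_RE.search(raw)
    if match is None:
        return None
    # Drop thousands separators; what remains is always a valid Decimal literal.
    return Decimal(match.group().replace(",", ""))
```

Inside `parse` you could yield `parse_price(price)` alongside the raw string. Decimal is used rather than float to avoid rounding artifacts in currency values. If a listing shows an original and a sale price in the same text node, this helper returns only the first number.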

Elisavet Jordana

Member

12/12/2024 at 8:42 am in reply to: Scrape customer reviews from Tesco Lotus Thailand using Node.js and Puppeteer?

Scraping customer reviews from Tesco Lotus Thailand with Puppeteer means dealing with dynamic content. After loading the product page, wait until the review section has rendered (for example with page.waitForSelector()), then use page.evaluate() to extract the review data in the browser context: each reviewer’s name, rating, and comment. Longer review lists are often paginated or lazy-loaded, so you will need to handle multiple pages, either by clicking through the pagination controls or by scrolling to trigger further loads.

const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch({ headless: false });
    try {
        const page = await browser.newPage();
        // networkidle2 waits until network activity has mostly settled
        await page.goto('https://www.tescolotus.com/product-page', { waitUntil: 'networkidle2' });
        // Wait for the review section to render
        await page.waitForSelector('.customer-reviews');
        // Extract review data in the browser context (first page of reviews only)
        const reviews = await page.evaluate(() => {
            const reviewsList = [];
            const reviewElements = document.querySelectorAll('.customer-review');
            reviewElements.forEach(review => {
                // Optional chaining yields undefined instead of throwing when a field is missing
                const reviewer = review.querySelector('.reviewer-name')?.textContent.trim();
                const rating = review.querySelector('.star-rating')?.textContent.trim();
                const reviewText = review.querySelector('.review-text')?.textContent.trim();
                reviewsList.push({ reviewer, rating, reviewText });
            });
            return reviewsList;
        });
        console.log(reviews);
    } finally {
        // Close the browser even if navigation or waitForSelector fails
        await browser.close();
    }
})();