  • Scrape customer reviews from Tesco Lotus Thailand using Node.js and Puppeteer?

    Posted by Anapa Jerilyn on 12/11/2024 at 11:19 am

    Puppeteer works well for scraping Tesco Lotus Thailand because it can handle dynamically loaded content such as customer reviews. The site often loads reviews with JavaScript, so wait for the review elements to appear before you try to scrape them. After navigating to the product page, you can extract details like the rating, review text, and reviewer name inside page.evaluate(). Reviews may also span multiple pages; handle that by simulating user scrolling or by clicking the pagination links. The URL and CSS selectors below are placeholders, so check them against the real page markup.

    const puppeteer = require('puppeteer');
    (async () => {
        const browser = await puppeteer.launch({ headless: false });
        const page = await browser.newPage();
        await page.goto('https://www.tescolotus.com/product-page');
        // Wait for the review section to load
        await page.waitForSelector('.review-item');
        // Scrape reviews
        const reviews = await page.evaluate(() => {
            const reviewElements = document.querySelectorAll('.review-item');
            const reviewData = [];
            reviewElements.forEach(element => {
                const name = element.querySelector('.reviewer-name')?.innerText;
                const rating = element.querySelector('.rating-stars')?.innerText;
                const reviewText = element.querySelector('.review-text')?.innerText;
                reviewData.push({ name, rating, reviewText });
            });
            return reviewData;
        });
        console.log(reviews);
        await browser.close();
    })();
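
    The script above reads only the reviews on the first page. As a minimal sketch of the "clicking on pagination links" idea, the helper below keeps collecting until no "next" control is left. The function name scrapeAllReviewPages, the '.pagination-next' selector, and the maxPages cap are assumptions for illustration and have not been checked against the live site.

```javascript
// Sketch only: walk paginated reviews by clicking a "next" control.
// `page` is a Puppeteer Page; every selector here is a placeholder.
async function scrapeAllReviewPages(page, options = {}) {
  const {
    itemSelector = '.review-item',
    nextSelector = '.pagination-next', // hypothetical selector
    maxPages = 20,                     // safety cap against endless loops
  } = options;

  const all = [];
  for (let pageNo = 1; pageNo <= maxPages; pageNo++) {
    await page.waitForSelector(itemSelector);

    // Read every review currently in the DOM.
    const batch = await page.$$eval(itemSelector, (items) =>
      items.map((el) => ({
        name: el.querySelector('.reviewer-name')?.textContent?.trim() ?? null,
        rating: el.querySelector('.rating-stars')?.textContent?.trim() ?? null,
        reviewText: el.querySelector('.review-text')?.textContent?.trim() ?? null,
      }))
    );
    all.push(...batch);

    // No "next" control means this was the last page.
    const next = await page.$(nextSelector);
    if (!next) break;

    const firstBefore = batch[0]?.reviewText ?? null;
    await next.click();
    try {
      // Wait until the first review text differs from the previous page.
      await page.waitForFunction(
        (sel, before) => {
          const first = document.querySelector(`${sel} .review-text`);
          return first !== null && first.textContent.trim() !== before;
        },
        { timeout: 10000 },
        itemSelector,
        firstBefore
      );
    } catch (err) {
      // Content never changed (for example a disabled "next" button): stop.
      break;
    }
  }
  return all;
}
```

    Inside the script you would call it after page.goto(), for example const reviews = await scrapeAllReviewPages(page); and then log or save the result as before.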
    
  • 2 Replies
  • Elisavet Jordana

    Member
    12/12/2024 at 8:42 am

    Customer reviews on Tesco Lotus Thailand are loaded dynamically, so with Puppeteer you need to wait for them after the product page loads. Once the reviews are in the DOM, use Puppeteer’s page.evaluate() to extract the review data: the reviewer’s name, their rating, and their comments. Longer review lists are often paginated, so you’ll have to handle multiple pages by clicking through them or by simulating scrolling.

    const puppeteer = require('puppeteer');
    (async () => {
        const browser = await puppeteer.launch({ headless: false });
        const page = await browser.newPage();
        await page.goto('https://www.tescolotus.com/product-page');
        // Wait for review section to load
        await page.waitForSelector('.customer-reviews');
        // Scrape review data
        const reviews = await page.evaluate(() => {
            const reviewsList = [];
            const reviewElements = document.querySelectorAll('.customer-review');
            reviewElements.forEach(review => {
                // Optional chaining avoids a TypeError when the name element is missing
                const reviewer = review.querySelector('.reviewer-name')?.textContent;
                const rating = review.querySelector('.star-rating')?.textContent;
                const reviewText = review.querySelector('.review-text')?.textContent;
                reviewsList.push({ reviewer, rating, reviewText });
            });
            return reviewsList;
        });
        console.log(reviews);
        await browser.close();
    })();
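
    The values returned from page.evaluate() are raw strings and can be undefined when an element is missing. A small clean-up step is sketched below, assuming the rating text contains a number such as "4.5 out of 5". That is an assumption: the site may render ratings as icons or attributes instead, in which case parseRating returns null. The helper names parseRating, normalizeReview, and normalizeReviews are made up for this example.

```javascript
// Sketch only: tidy the raw records produced by the scraping step.

// Pull the first number out of a rating string, or return null.
function parseRating(text) {
  if (typeof text !== 'string') return null;
  // Accept a decimal comma ("4,5") as well as a decimal point.
  const match = text.replace(',', '.').match(/\d+(?:\.\d+)?/);
  return match ? Number(match[0]) : null;
}

// Trim strings and convert the rating; missing fields become null.
function normalizeReview(raw) {
  const clean = (value) => (typeof value === 'string' ? value.trim() : null);
  return {
    reviewer: clean(raw.reviewer),
    rating: parseRating(raw.rating),
    reviewText: clean(raw.reviewText),
  };
}

// Normalize a list and drop entries that have no review text.
function normalizeReviews(rawReviews) {
  return rawReviews
    .map(normalizeReview)
    .filter((review) => review.reviewText);
}
```

    In the script above this would be used as console.log(normalizeReviews(reviews)); in place of the plain console.log(reviews).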
    
  • Gerel Tomislav

    Member
    12/13/2024 at 6:36 am

    To scrape customer reviews from Tesco Lotus Thailand using Puppeteer, you’ll need to handle dynamic content. After you navigate to a product page, Puppeteer lets you wait for the reviews section to load. Review details such as user ratings and comments can then be extracted with CSS selectors. It’s also important to handle delays in loading, as some reviews may be lazy-loaded only as you scroll. Note that Puppeteer’s waitForSelector() resolves as soon as the first element matching the selector is in the DOM (pass { visible: true } to also require that it is visible); it does not guarantee that every review has loaded.

    const puppeteer = require('puppeteer');
    (async () => {
        const browser = await puppeteer.launch({ headless: true });
        const page = await browser.newPage();
        await page.goto('https://www.tescolotus.com/product-page');
        // Ensure the review section is visible
        await page.waitForSelector('.review-section');
        // Extract reviews
        const reviewData = await page.evaluate(() => {
            const reviews = [];
            const reviewElements = document.querySelectorAll('.review');
            reviewElements.forEach(review => {
                // Optional chaining keeps one incomplete review from aborting the whole run
                const reviewerName = review.querySelector('.reviewer-name')?.innerText;
                const rating = review.querySelector('.rating')?.innerText;
                const comment = review.querySelector('.review-comment')?.innerText;
                reviews.push({ reviewerName, rating, comment });
            });
            return reviews;
        });
        console.log(reviewData);
        await browser.close();
    })();
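
    For lazy-loaded reviews, one common approach is to scroll to the bottom repeatedly until the number of review elements stops growing. The sketch below shows that idea; the helper name scrollUntilStable and the maxRounds and pauseMs defaults are assumptions chosen for illustration, and the pause may need tuning for the real site.

```javascript
// Sketch only: scroll until no new elements matching itemSelector appear.
// `page` is a Puppeteer Page. Returns the last observed element count.
async function scrollUntilStable(page, itemSelector, options = {}) {
  const { maxRounds = 15, pauseMs = 800 } = options;

  let previous = -1;
  for (let round = 0; round < maxRounds; round++) {
    // $$eval passes an empty array when nothing matches, so count is 0 then.
    const count = await page.$$eval(itemSelector, (els) => els.length);
    if (count === previous) return count; // nothing new since the last scroll
    previous = count;

    // Jump to the bottom of the page to trigger lazy loading.
    await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));

    // Give the page time to fetch and render the next batch.
    await new Promise((resolve) => setTimeout(resolve, pauseMs));
  }
  return previous;
}
```

    In the script above you could call await scrollUntilStable(page, '.review'); after waitForSelector() and before page.evaluate(), so the extraction step sees every review that scrolling has loaded.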
    
