Use Node.js to scrape seller ratings from JD.com product pages

Ariah Alfred · 2024-12-13T08:30:44+00:00

Scraping seller ratings from JD.com, one of the largest e-commerce websites in China, involves handling dynamic content rendered by JavaScript. Node.js, along with Puppeteer, is an excellent choice for this task as it provides the ability to interact with dynamically loaded pages and extract content effectively. Seller ratings on JD.com are typically displayed near the product description or alongside the seller's name. These ratings are important for determining the reliability and quality of the seller and are often displayed as numerical values or percentages.To begin, the script uses Puppeteer to navigate to a product page on JD.com. It waits for the seller ratings section to load fully to ensure the data is accessible in the DOM. By inspecting the webpage structure, the script identifies the specific element containing the ratings and extracts its text content. Additional error handling is added to account for cases where the ratings section is missing or fails to load. Below is a complete implementation of the script using Puppeteer:const puppeteer require('puppeteer'); (async () > { const browser await puppeteer.launch({ headless: true }); const page await browser.newPage(); // Navigate to a JD.com product page await page.goto('https://item.jd.com/123456.html', { waitUntil: 'networkidle2' }); // Wait for the seller ratings section to load await page.waitForSelector('.seller-info .rating'); // Extract seller ratings const sellerRating await page.evaluate(() > { const ratingElement document.querySelector('.seller-info .rating'); return ratingElement ? ratingElement.innerText.trim() : 'Seller rating not found'; }); console.log('Seller Rating:', sellerRating); await browser.close();})();

General Web Scraping

Use Node.js to scrape seller ratings from JD.com product pages

Posted by Ariah Alfred on 12/13/2024 at 8:30 am
Scraping seller ratings from JD.com, one of the largest e-commerce websites in China, involves handling dynamic content rendered by JavaScript. Node.js, along with Puppeteer, is an excellent choice for this task as it provides the ability to interact with dynamically loaded pages and extract content effectively. Seller ratings on JD.com are typically displayed near the product description or alongside the seller’s name. These ratings are important for determining the reliability and quality of the seller and are often displayed as numerical values or percentages.
To begin, the script uses Puppeteer to navigate to a product page on JD.com. It waits for the seller ratings section to load fully to ensure the data is accessible in the DOM. By inspecting the webpage structure, the script identifies the specific element containing the ratings and extracts its text content. Additional error handling is added to account for cases where the ratings section is missing or fails to load. Below is a complete implementation of the script using Puppeteer:
```
const puppeteer = require('puppeteer');
(async () => {
    const browser = await puppeteer.launch({ headless: true });
    const page = await browser.newPage();
    // Navigate to a JD.com product page
    await page.goto('https://item.jd.com/123456.html', { waitUntil: 'networkidle2' });
    // Wait for the seller ratings section to load
    await page.waitForSelector('.seller-info .rating');
    // Extract seller ratings
    const sellerRating = await page.evaluate(() => {
        const ratingElement = document.querySelector('.seller-info .rating');
        return ratingElement ? ratingElement.innerText.trim() : 'Seller rating not found';
    });
    console.log('Seller Rating:', sellerRating);
    await browser.close();
})();
```
Subhash Siddiqa replied 3 months, 2 weeks ago 5 Members · 4 Replies
4 Replies

Lencho Coenraad

Member
12/14/2024 at 9:53 am

The script could be improved by adding support for scraping seller ratings across multiple products. By extracting links to similar products on the page, the script could navigate through them and gather data at scale.
Javed Roland

Member
12/17/2024 at 7:47 am

Integrating a feature to handle location-specific seller ratings would enhance the script. JD.com may show different ratings or policies depending on the buyer’s region, so simulating location-based inputs would provide more accurate data.
Elen Honorata

Member
12/18/2024 at 7:22 am

The script could store the extracted ratings in a database or JSON file instead of printing them to the console. This would make it easier to analyze and share the data, especially when working with large datasets.
Subhash Siddiqa

Member
12/19/2024 at 5:45 am

Adding user-agent rotation and proxy support would make the scraper more resilient to detection by JD.com’s anti-bot measures. This would allow for consistent data scraping without interruptions or IP bans.

Use Node.js to scrape seller ratings from JD.com product pages

Lencho Coenraad

Javed Roland

Elen Honorata

Subhash Siddiqa