
  • Scrape delivery times from Empik Poland using Node.js

    Posted by Abidan Grete on 12/13/2024 at 9:54 am

    Empik is one of Poland’s largest e-commerce platforms, offering a wide range of books, electronics, and lifestyle products. Scraping delivery times from Empik requires handling dynamic content since delivery estimates are often generated based on the customer’s location or stock availability. Using Node.js and Puppeteer, we can load the product page, wait for the delivery section to appear, and extract the estimated delivery times.
    The first step is to inspect the delivery time section on the product page using browser developer tools to identify the relevant HTML tags and classes. Puppeteer allows us to mimic user interactions, such as scrolling or inputting a postal code if required, to fetch location-based delivery estimates. Below is a complete implementation for scraping delivery times from Empik Poland using Puppeteer:

    const puppeteer = require('puppeteer');
    (async () => {
        const browser = await puppeteer.launch({ headless: true });
        const page = await browser.newPage();
        // Navigate to the Empik product page
        await page.goto('https://www.empik.com/product-page', { waitUntil: 'networkidle2' });
        // Wait for the delivery time section to load
        await page.waitForSelector('.delivery-time');
        // Extract delivery time
        const deliveryTime = await page.evaluate(() => {
            const element = document.querySelector('.delivery-time');
            return element ? element.innerText.trim() : 'Delivery time not available';
        });
        console.log('Delivery Time:', deliveryTime);
        await browser.close();
    })();
    
  • 4 Replies
  • Ken Josefiina

    Member
    12/14/2024 at 10:06 am

    The script could be improved by adding support for scraping delivery times across multiple locations. By simulating input for different postal codes, the scraper could collect regional delivery estimates, providing a more comprehensive dataset.
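    As a minimal sketch of that idea, assuming the page exposes a postal-code field (the '#postal-code-input' selector and the Enter-to-submit flow here are placeholders that would need to be confirmed in the browser's developer tools):

    const puppeteer = require('puppeteer');

    // Hypothetical postal codes for Warsaw, Kraków, and Gdańsk
    const postalCodes = ['00-001', '30-001', '80-001'];

    (async () => {
        const browser = await puppeteer.launch({ headless: true });
        const page = await browser.newPage();
        const results = {};
        for (const code of postalCodes) {
            await page.goto('https://www.empik.com/product-page', { waitUntil: 'networkidle2' });
            // '#postal-code-input' is a placeholder selector for the delivery location field
            await page.type('#postal-code-input', code);
            await page.keyboard.press('Enter');
            await page.waitForSelector('.delivery-time');
            results[code] = await page.$eval('.delivery-time', el => el.innerText.trim());
        }
        console.log(results);
        await browser.close();
    })();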

  • Dafne Stanko

    Member
    12/17/2024 at 8:01 am

    Error handling could be enhanced to capture scenarios where the delivery time section fails to load. Adding retries or logging errors with detailed messages would make the script more robust and easier to debug.
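    A rough sketch of that retry pattern, reusing the selectors from the original script, might look like this:

    async function getDeliveryTime(page, url, retries = 3) {
        for (let attempt = 1; attempt <= retries; attempt++) {
            try {
                await page.goto(url, { waitUntil: 'networkidle2' });
                // Give the delivery section up to 10 seconds to render before retrying
                await page.waitForSelector('.delivery-time', { timeout: 10000 });
                return await page.$eval('.delivery-time', el => el.innerText.trim());
            } catch (err) {
                console.error(`Attempt ${attempt} failed: ${err.message}`);
                if (attempt === retries) return 'Delivery time not available';
            }
        }
    }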

  • Navneet Gustavo

    Member
    12/18/2024 at 7:33 am

    To improve scalability, the script could save the extracted delivery times into a database or a file like JSON or CSV. This approach would make the data easier to query and analyze, especially when scraping delivery estimates for multiple products.
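    Something along these lines would work, assuming the results are collected as an array of { url, deliveryTime } objects:

    const fs = require('fs');

    function saveResults(results) {
        // Write the full structure as JSON for easy programmatic access
        fs.writeFileSync('delivery_times.json', JSON.stringify(results, null, 2));
        // Write a flat CSV as well (naive formatting; assumes fields contain no commas)
        const rows = results.map(r => `${r.url},${r.deliveryTime}`);
        fs.writeFileSync('delivery_times.csv', ['url,deliveryTime', ...rows].join('\n'));
    }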

  • Jayesh Reuben

    Member
    12/19/2024 at 10:55 am

    Adding user-agent rotation and proxy support would help the script avoid detection by Empik’s anti-scraping measures and maintain consistent access to the website when scraping large amounts of data.
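    As a rough sketch, Puppeteer supports both out of the box — the proxy address and user-agent strings below are placeholders:

    const puppeteer = require('puppeteer');

    const userAgents = [
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36',
        'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0 Safari/537.36',
    ];

    (async () => {
        const browser = await puppeteer.launch({
            headless: true,
            // Placeholder proxy; swap in a real rotating proxy endpoint
            args: ['--proxy-server=http://your-proxy-host:8080'],
        });
        const page = await browser.newPage();
        // Pick a random user agent for this session
        await page.setUserAgent(userAgents[Math.floor(Math.random() * userAgents.length)]);
        await page.goto('https://www.empik.com/product-page', { waitUntil: 'networkidle2' });
        // ...extract the delivery time as in the original script...
        await browser.close();
    })();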
