General Web Scraping

Compare Node.js and Python for scraping product prices on Elgiganten Sweden

Posted by Scilla Phoebe on 12/14/2024 at 8:14 am

How does scraping product prices from Elgiganten, one of Sweden’s largest electronics retailers, differ between Node.js and Python? Would Python’s BeautifulSoup and requests libraries provide a more straightforward solution for parsing static content, or does Node.js with Puppeteer offer a better approach for handling dynamic content, such as discounts or price changes? Which language would be more scalable when scraping a large number of product pages?
Here are two implementations, one in Node.js and one in Python, for scraping product prices from Elgiganten. Which is better suited for handling the complexities of modern web scraping?

Node.js Implementation:

const puppeteer = require('puppeteer');
(async () => {
    const browser = await puppeteer.launch({ headless: true });
    try {
        const page = await browser.newPage();
        // Navigate to the Elgiganten product page
        await page.goto('https://www.elgiganten.se/product-page', { waitUntil: 'networkidle2' });
        // Wait for the price section to load
        await page.waitForSelector('.product-price');
        // Extract product price
        const price = await page.evaluate(() => {
            const priceElement = document.querySelector('.product-price');
            return priceElement ? priceElement.innerText.trim() : 'Price not found';
        });
        console.log('Product Price:', price);
    } catch (err) {
        // goto and waitForSelector throw on timeout, e.g. when the selector never appears
        console.error('Scraping failed:', err.message);
    } finally {
        // Always close the browser so no headless browser process is left running
        await browser.close();
    }
})();

Python Implementation:

import requests
from bs4 import BeautifulSoup
# URL of the Elgiganten product page
url = "https://www.elgiganten.se/product-page"
# Headers to mimic a browser request
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
}
# Fetch the page content (the timeout keeps the script from hanging on a stalled connection)
response = requests.get(url, headers=headers, timeout=10)
if response.status_code == 200:
    soup = BeautifulSoup(response.content, "html.parser")
    # Extract product price
    price = soup.find("span", class_="product-price")
    if price:
        print("Product Price:", price.text.strip())
    else:
        print("Price not found.")
else:
    print(f"Failed to fetch the page. Status code: {response.status_code}")
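
Both scripts print the price as raw text. To compare prices or track changes, the text usually has to become a number first. Below is a minimal Python sketch of that step. It assumes prices are whole kronor with a space as thousands separator, for example "1 990:-" or "12 990 kr". That format is an assumption and not something checked against the live site, and parse_price_sek is a hypothetical helper name.

```python
import re
from typing import Optional


def parse_price_sek(text: str) -> Optional[int]:
    """Turn scraped price text into whole kronor, or None if no digits are found.

    Assumes whole-krona prices with space-like thousands separators
    (an assumption about the page, not a confirmed format).
    """
    # Drop regular, non-breaking and narrow non-breaking spaces
    cleaned = re.sub(r"[\s\u00a0\u202f]", "", text)
    # Take the first run of digits that remains
    match = re.search(r"\d+", cleaned)
    return int(match.group()) if match else None
```

If the site turns out to show decimals or a different separator, the regular expression would need adjusting.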

4 Replies

Senka Leontios

Member
12/17/2024 at 10:37 am

Node.js with Puppeteer is ideal for handling dynamic content, such as prices that are updated via JavaScript. Its ability to render full pages in a headless browser makes it highly reliable for modern websites like Elgiganten.

Orrin Ajay

Member
12/18/2024 at 10:12 am

Python’s BeautifulSoup and requests are lightweight and easier to set up for scraping static content. However, if the prices are loaded dynamically, integrating Selenium might be necessary, which adds complexity.
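
One way to find out which case applies is to check whether the price is already in the raw HTML before reaching for a browser. The sketch below is a rough illustration of that check. It uses the standard library's html.parser instead of BeautifulSoup so it runs without extra installs, and it reuses the product-price class from the original post, which is itself a placeholder. PriceFinder and static_price are hypothetical names.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class PriceFinder(HTMLParser):
    """Collects the text of the first element whose class list contains target_class."""

    def __init__(self, target_class: str) -> None:
        super().__init__()
        self.target_class = target_class
        self.depth = 0        # greater than 0 while inside the matched element
        self.found = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif not self.found:
            classes = (dict(attrs).get("class") or "").split()
            if self.target_class in classes:
                self.found = True
                self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def static_price(html: str, target_class: str = "product-price"):
    """Return the price text if it is present in the raw HTML, else None."""
    finder = PriceFinder(target_class)
    finder.feed(html)
    text = "".join(finder.chunks).strip()
    return text or None
```

If static_price returns None for a page that shows a price in a normal browser, the price is probably injected by JavaScript, and a headless browser such as Puppeteer or Selenium is the next thing to try.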

Anita Maria

Member
12/21/2024 at 5:40 am

For large-scale scraping, Node.js handles concurrency well because of its non-blocking I/O model, which makes it a natural fit for scraping many product pages at once. Python's threading or multiprocessing can do the same job but usually needs more setup, although asyncio narrows that gap.
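
As a rough Python counterpart to the non-blocking model described here, the sketch below uses asyncio with a semaphore to cap how many pages are fetched at once. The function scrape_all and the fetch_price argument are hypothetical: fetch_price stands for any coroutine that takes a URL and returns the price text, for example one built on an async HTTP client. Nothing here is specific to Elgiganten.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, Optional


async def scrape_all(
    urls: Iterable[str],
    fetch_price: Callable[[str], Awaitable[str]],
    limit: int = 5,
) -> Dict[str, Optional[str]]:
    """Fetch prices for many URLs with at most `limit` requests in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url: str):
        # The semaphore keeps concurrency bounded so the site is not flooded
        async with semaphore:
            try:
                return url, await fetch_price(url)
            except Exception:
                # One failing page should not abort the whole batch
                return url, None

    pairs = await asyncio.gather(*(bounded(url) for url in urls))
    return dict(pairs)
```

It would be started with asyncio.run(scrape_all(urls, fetch_price)). A failed page maps to None, so the caller can retry only those URLs.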

Sanjit Andria

Member
12/21/2024 at 5:52 am

Python offers a simpler learning curve and a vast library ecosystem, making it easier for beginners to implement scraping tasks. On the other hand, Node.js is better for developers already familiar with JavaScript and building scalable, asynchronous scraping solutions.
