
  • How to scrape API data using Node.js and node-fetch?

    Posted by Halinka Landric on 12/10/2024 at 10:27 am

    Scraping data from APIs with Node.js and node-fetch is an efficient way to gather structured data without parsing HTML. APIs often return JSON, which is easy to process and store. With node-fetch, you send HTTP requests to the API endpoint and handle the responses asynchronously. Before scraping, make sure the API permits this kind of access and that your use complies with the site’s terms of service. To start, identify the API endpoint using browser developer tools or the API documentation. Here’s an example of using node-fetch to fetch and display API data:

    // node-fetch v2 supports require(); v3 is ESM-only and must be loaded with import.
    const fetch = require('node-fetch');

    const fetchData = async () => {
        const url = 'https://api.example.com/products';
        const options = {
            method: 'GET',
            headers: {
                'User-Agent': 'Mozilla/5.0',
                // Only needed if the API requires authentication.
                'Authorization': 'Bearer your-api-token',
            },
        };
        try {
            const response = await fetch(url, options);
            // fetch only rejects on network failures, so check the HTTP status explicitly.
            if (!response.ok) throw new Error(`HTTP error! status: ${response.status}`);
            const data = await response.json();
            // Assumes the payload looks like { products: [{ name, price }, ...] }.
            const products = Array.isArray(data.products) ? data.products : [];
            products.forEach(product => {
                console.log(`Name: ${product.name}, Price: ${product.price}`);
            });
        } catch (error) {
            console.error('Error fetching data:', error.message);
        }
    };
    fetchData();
    

    Using this approach allows you to interact directly with the API and avoid scraping the HTML structure of a webpage. How do you handle rate limits or authentication challenges with APIs?
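
    On the rate-limit side, one common pattern is to retry when the API answers 429 (Too Many Requests), waiting for the Retry-After header if the server sends one and backing off exponentially otherwise. A minimal sketch under those assumptions follows; fetchWithRetry, retryDelayMs, and the fetchImpl, maxRetries, and baseDelayMs options are hypothetical names for illustration, not part of node-fetch, and the default values are arbitrary.

```javascript
// Sketch only: retry a request while the API responds with 429.
// All names below are hypothetical helpers, not node-fetch APIs.

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Decide how long to wait before the next attempt.
// Uses Retry-After when it is a positive number of seconds; a missing header
// or an HTTP-date value falls back to exponential backoff (base, 2x, 4x, ...).
const retryDelayMs = (response, attempt, baseDelayMs) => {
    const retryAfter = Number(response.headers.get('retry-after'));
    if (Number.isFinite(retryAfter) && retryAfter > 0) return retryAfter * 1000;
    return baseDelayMs * 2 ** attempt;
};

// fetchImpl defaults to the global fetch (available in Node 18+);
// pass the function returned by require('node-fetch') on older setups.
const fetchWithRetry = async (
    url,
    options = {},
    { fetchImpl = globalThis.fetch, maxRetries = 3, baseDelayMs = 500 } = {}
) => {
    for (let attempt = 0; ; attempt++) {
        const response = await fetchImpl(url, options);
        // Return on anything other than 429, or once the retries are used up.
        if (response.status !== 429 || attempt >= maxRetries) return response;
        await sleep(retryDelayMs(response, attempt, baseDelayMs));
    }
};
```

    In the fetchData example, this would replace the direct call with: await fetchWithRetry(url, options, { fetchImpl: fetch }). Returning the last 429 response instead of throwing leaves the existing response.ok check in charge of error handling.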

  • 3 Replies
  • Gerri Hiltraud

    Member
    12/10/2024 at 11:00 am

    If the site provides JSON endpoints, I query those directly instead of parsing HTML. It’s faster and avoids issues with complex layouts or JavaScript rendering.

  • Gohar Maksimilijan

    Member
    12/10/2024 at 11:18 am

    I design my scraper with flexible CSS selectors or XPath queries that target attributes rather than static class names, making it easier to adapt to layout updates.

  • Gohar Maksimilijan

    Member
    12/10/2024 at 11:19 am

    For infinite scrolling pages, I use Capybara (a Ruby library that drives a browser) to simulate scrolling until all content is loaded. This ensures complete data extraction without missing items that only load on scroll.
