  • How to scrape hotel prices from Expedia.com using JavaScript?

    Posted by Ella Karl on 12/19/2024 at 11:50 am

    Hotel prices on Expedia.com can be scraped with JavaScript using Node.js and Puppeteer, a library that controls a headless browser. Because the browser executes the page's own JavaScript, dynamically rendered hotel listings are present in the DOM by the time you extract them, which a plain HTTP request would not give you. The process is to navigate to the desired Expedia page, wait for the content to load, and then target elements such as hotel names, prices, and ratings with DOM selectors. Puppeteer is particularly useful when content is paginated or updated dynamically. Below is an example script to scrape hotel prices from Expedia.

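    const puppeteer = require('puppeteer');

    (async () => {
        const browser = await puppeteer.launch({ headless: true });
        try {
            const page = await browser.newPage();
            const url = 'https://www.expedia.com/Hotels';
            // networkidle2: navigation counts as finished once no more than
            // 2 network connections have been open for at least 500 ms
            await page.goto(url, { waitUntil: 'networkidle2' });
            // This callback runs inside the page, so it can only return serializable data
            const hotels = await page.evaluate(() => {
                const hotelData = [];
                // Class names reflect Expedia's markup at the time of writing and may change
                const items = document.querySelectorAll('.uitk-card');
                items.forEach(item => {
                    // Optional chaining yields undefined for a missing element, so the fallback is used
                    const name = item.querySelector('.uitk-heading-5')?.textContent.trim() || 'Name not available';
                    const price = item.querySelector('.uitk-price')?.textContent.trim() || 'Price not available';
                    const rating = item.querySelector('.uitk-star-rating-score')?.textContent.trim() || 'No rating';
                    hotelData.push({ name, price, rating });
                });
                return hotelData;
            });
            console.log(hotels);
        } finally {
            // Close the browser even if navigation or extraction throws
            await browser.close();
        }
    })();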
    

    This script uses Puppeteer to scrape hotel names, prices, and ratings from Expedia. Waiting for networkidle2 gives the page time to finish rendering before extraction starts, although it does not guarantee that every listing has loaded. As written, the script reads a single page; adding pagination would let the scraper move through multiple pages of listings and collect a more complete dataset. Using random delays between requests reduces the risk of being flagged by Expedia's anti-scraping mechanisms. The extracted data can be saved in a structured format like JSON or CSV for further analysis.
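
    For the JSON/CSV step, here is a minimal sketch using only Node's built-in fs module. The function names (csvField, toCsv, saveHotels) and the output file names are my own placeholders, and it assumes each record has the same name, price, and rating fields as the script above.

```javascript
const fs = require('fs');

// Quote one CSV field; embedded double quotes are doubled (RFC 4180 style).
function csvField(value) {
    return `"${String(value).replace(/"/g, '""')}"`;
}

// Convert an array of { name, price, rating } objects to CSV text with a header row.
function toCsv(hotels) {
    const columns = ['name', 'price', 'rating'];
    const rows = hotels.map(h => columns.map(c => csvField(h[c] ?? '')).join(','));
    return [columns.join(','), ...rows].join('\n');
}

// Write both formats side by side; "hotels" is a placeholder base name.
function saveHotels(hotels, basename = 'hotels') {
    fs.writeFileSync(`${basename}.json`, JSON.stringify(hotels, null, 2));
    fs.writeFileSync(`${basename}.csv`, toCsv(hotels));
}
```

    Calling saveHotels(hotels) in place of console.log(hotels) would write hotels.json and hotels.csv to the working directory. Every field is quoted because prices such as "$1,249" contain commas.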

  • 2 Replies
  • Agathi Toviyya

    Member
    12/20/2024 at 7:41 am

    A key improvement for the scraper is handling pagination to collect data from multiple pages. Expedia's hotel listings often span several pages, and scraping just the first page limits the scope of the data. By programmatically following the "Next" button and looping until no further page is available, you can collect the full result set. Introducing random delays between page loads reduces the chance of being detected by anti-bot measures, though it does not rule it out. The larger dataset allows for more detailed analysis of hotel pricing trends.
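
    A rough sketch of that loop, assuming the "Next" control triggers a full navigation. The helper names (randomDelay, scrapeAllPages), the nextSelector option, and the 2 to 5 second delay range are all my own assumptions, not anything Expedia documents. Here page is a Puppeteer Page and extract is the same kind of function you would pass to page.evaluate.

```javascript
// Resolve after a random pause between minMs and maxMs milliseconds (inclusive).
function randomDelay(minMs = 2000, maxMs = 5000) {
    const ms = minMs + Math.floor(Math.random() * (maxMs - minMs + 1));
    return new Promise(resolve => setTimeout(resolve, ms));
}

// Collect results page by page until the "Next" control is gone or maxPages is reached.
async function scrapeAllPages(page, extract, { nextSelector, maxPages = 10, minMs, maxMs } = {}) {
    const all = [];
    for (let n = 1; ; n++) {
        all.push(...await page.evaluate(extract));
        if (n >= maxPages) break;            // hard cap so the loop always terminates
        const next = await page.$(nextSelector);
        if (!next) break;                    // no "Next" element: last page reached
        await randomDelay(minMs, maxMs);
        // Start waiting for the navigation before clicking so it is not missed.
        await Promise.all([
            page.waitForNavigation({ waitUntil: 'networkidle2' }),
            next.click(),
        ]);
    }
    return all;
}
```

    If the site appends results in place instead of navigating, waitForNavigation would time out; waiting for new cards with page.waitForSelector would be the alternative there. A "Next" button that stays in the DOM but is disabled on the last page would also need an extra check.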

  • Umeda Domenica

    Member
    12/20/2024 at 11:28 am

    To make the scraper more robust, adding error handling for missing or incomplete elements is essential. For instance, some hotels may not display ratings or prices, and calling a method on the null that querySelector returns for a missing element throws a TypeError that can crash the script. By checking for null or undefined values before attempting to extract data, the scraper can avoid these runtime errors. Logging skipped items helps identify problem areas and refine the scraper. Regular updates to the script keep it compatible with changes in Expedia's page structure.
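
    One way to sketch this is to have the in-page extraction return null for anything it cannot find, then validate the records back in Node. The names parsePrice and validateHotels are hypothetical, and the price parsing assumes US-style formatting such as "$1,249" with commas as thousands separators.

```javascript
// Parse a displayed price such as "$1,249" into a number, or return null.
function parsePrice(text) {
    if (typeof text !== 'string') return null;
    const match = text.replace(/,/g, '').match(/\d+(\.\d+)?/);
    return match ? Number(match[0]) : null;
}

// Split raw records into usable hotels and skipped ones, logging a reason for each skip.
function validateHotels(rawHotels) {
    const valid = [];
    const skipped = [];
    rawHotels.forEach((raw, index) => {
        const name = raw.name?.trim() || null;
        const price = parsePrice(raw.price);
        if (!name || price === null) {
            skipped.push({ index, reason: !name ? 'missing name' : 'missing price', raw });
            return;
        }
        // A missing rating is tolerated and stored as null.
        valid.push({ name, price, rating: raw.rating?.trim() || null });
    });
    skipped.forEach(s => console.warn(`Skipped item ${s.index}: ${s.reason}`));
    return { valid, skipped };
}
```

    Keeping the skipped records, not just counting them, makes it easier to see which selector stopped matching after a page redesign.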
