
  • How can I scrape websites with infinite scroll without losing data?

    Posted by Puleng Evy on 11/14/2024 at 5:45 am

    I usually rely on Puppeteer for this. You can set it to scroll a fixed number of pixels at a time, or keep scrolling until a specific element becomes visible. Combine that with a wait after each scroll so the new data has fully loaded before you collect it.
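    Puppeteer itself is a Node.js library, but since the later replies lean toward Python, here is a minimal Python sketch of the same idea (scroll a fixed distance, wait, repeat until the page stops growing) using Playwright as a stand-in. The URL, the `.item` selector, the scroll distance, and the wait time are placeholders, not anything from the original post.

```python
from playwright.sync_api import sync_playwright

def scrape_infinite_scroll(url: str) -> list[str]:
    """Scroll a fixed number of pixels at a time until no new content appears."""
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")

        items: set[str] = set()
        last_height = 0
        while True:
            # Collect whatever is currently rendered (".item" is a placeholder selector).
            for el in page.query_selector_all(".item"):
                items.add(el.inner_text())

            # Scroll down by a fixed number of pixels, then wait for lazy-loaded data.
            page.mouse.wheel(0, 1500)
            page.wait_for_timeout(2000)

            height = page.evaluate("document.body.scrollHeight")
            if height == last_height:
                break  # page height stopped growing: assume the feed is exhausted
            last_height = height

        browser.close()
        return sorted(items)

if __name__ == "__main__":
    print(scrape_infinite_scroll("https://example.com/feed"))  # placeholder URL
```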

  • 3 Replies
  • Yannig Avicenna

    Member
    11/15/2024 at 8:29 am

    If you’re using Python, Selenium makes scrolling straightforward: run a loop that scrolls down and checks whether new data has loaded. I wrap the loop in a try/except block in case loading fails.
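    For reference, a minimal sketch of that loop with Selenium; the URL and the `.item` selector are placeholders, and the try/except simply stops the loop instead of retrying.

```python
import time

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com/feed")  # placeholder URL

seen: set[str] = set()
last_height = driver.execute_script("return document.body.scrollHeight")

while True:
    try:
        # Grab whatever is currently in the DOM (".item" is a placeholder selector).
        for el in driver.find_elements(By.CSS_SELECTOR, ".item"):
            seen.add(el.text)

        # Scroll to the bottom and give the next batch time to load.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(2)

        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new loaded, so the feed is probably exhausted
        last_height = new_height
    except Exception:
        # If a scroll or lookup fails mid-run, keep what was already collected.
        break

driver.quit()
print(f"Collected {len(seen)} items")
```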

  • Raul Marduk

    Member
    11/15/2024 at 9:49 am

    Check whether the site loads new items through an AJAX request. If it does, you can find the request URL in the Network tab of your browser’s dev tools and call that endpoint directly. That way you get the data as JSON without scrolling at all.
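    As a sketch of that approach: once you have copied the request URL and its query parameters from the Network tab, you can page through the endpoint directly with `requests`. The URL, parameter names, and response shape below are hypothetical; use whatever the site actually sends.

```python
import requests

# Hypothetical endpoint and parameters: copy the real ones from the request
# your browser makes when the page loads more items.
API_URL = "https://example.com/api/items"

session = requests.Session()
session.headers.update({"User-Agent": "Mozilla/5.0"})  # mimic a normal browser

items = []
page = 1
while True:
    resp = session.get(API_URL, params={"page": page, "per_page": 50}, timeout=10)
    resp.raise_for_status()
    batch = resp.json()
    if not batch:
        break  # an empty page means we've gone past the last item
    items.extend(batch)
    page += 1

print(f"Fetched {len(items)} items as JSON, no scrolling needed")
```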

  • Ravi Ernestas

    Member
    11/16/2024 at 4:55 am

    I’ve also written scripts that detect the ‘load more’ button, which some sites use instead of infinite scrolling. Simulating clicks on this button in a loop allows you to retrieve all content without scrolling.
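    A minimal Selenium sketch of that click-the-button loop; the button selector, item selector, and URL are placeholders for whatever the target site actually uses.

```python
import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import (
    ElementClickInterceptedException,
    NoSuchElementException,
)

driver = webdriver.Chrome()
driver.get("https://example.com/articles")  # placeholder URL

while True:
    try:
        # "button.load-more" is a placeholder; inspect the real button's selector.
        button = driver.find_element(By.CSS_SELECTOR, "button.load-more")
        driver.execute_script("arguments[0].scrollIntoView();", button)
        button.click()
        time.sleep(2)  # give the next batch time to render
    except (NoSuchElementException, ElementClickInterceptedException):
        break  # button gone or unclickable: all content should be loaded

items = [el.text for el in driver.find_elements(By.CSS_SELECTOR, ".item")]
driver.quit()
print(f"Collected {len(items)} items")
```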
