-
Seon Theotleib started the discussion What are the most reliable ways to detect website blocks before scraping? in the forum General Web Scraping a year ago
What are the most reliable ways to detect website blocks before scraping?
One method I use is checking for specific response codes, like 403 (Forbidden) or 429 (Too Many Requests). If these start showing up more frequently, it’s usually a sign that a block is imminent. Some sites even send custom messages in their response headers to warn you.
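A rough sketch of how that check could be automated, in Python. The `BlockMonitor` class, the window size, and the threshold are all hypothetical choices for illustration, and treating a `Retry-After` header as a warning sign is an assumption (it is a standard HTTP header, but a site may use it for other reasons or send something custom instead).

```python
from collections import deque

BLOCK_STATUS_CODES = {403, 429}  # the two codes mentioned in the post


class BlockMonitor:
    """Tracks the share of block-like responses over a sliding window."""

    def __init__(self, window=20, threshold=0.25):
        # window and threshold are illustrative defaults, not tuned values
        self.recent = deque(maxlen=window)
        self.threshold = threshold

    def record(self, status_code, headers=None):
        """Record one response; return True if it looks like a block signal."""
        headers = {k.lower(): v for k, v in (headers or {}).items()}
        # Assumption: a Retry-After header counts as a warning as well.
        blocked = status_code in BLOCK_STATUS_CODES or "retry-after" in headers
        self.recent.append(blocked)
        return blocked

    def block_rate(self):
        """Fraction of recent responses that looked like block signals."""
        return sum(self.recent) / len(self.recent) if self.recent else 0.0

    def should_back_off(self):
        """True once block-like responses reach the configured share."""
        return self.block_rate() >= self.threshold
```

In practice you would call `monitor.record(response.status_code, response.headers)` after each request and slow down or pause once `should_back_off()` returns True.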
-
Thibaut Ron replied to the discussion What’s the best way to handle CAPTCHAs while scraping? in the forum General Web Scraping a year ago
What’s the best way to handle CAPTCHAs while scraping?
Another way is to drive a headless browser with a tool like Puppeteer or Selenium and try to solve CAPTCHAs using machine learning models, though this requires extra setup.
-
Thibaut Ron started the discussion How do I identify hidden APIs that might be easier to scrape? in the forum General Web Scraping a year ago
How do I identify hidden APIs that might be easier to scrape?
I start by opening the network tab in dev tools and navigating through the site. Often, you’ll find JSON or AJAX requests that fetch the data directly.
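If the network tab gets noisy, one option is to export the session as a HAR file (most browser dev tools can do this) and filter it with a short script. The sketch below is in Python and assumes the standard HAR layout (`log.entries[].request` and `log.entries[].response.content.mimeType`); the function name `find_json_requests` and the file name `site.har` are made up for the example.

```python
import json


def find_json_requests(har):
    """Return (method, url) pairs for HAR entries whose response is JSON."""
    hits = []
    for entry in har.get("log", {}).get("entries", []):
        mime = entry.get("response", {}).get("content", {}).get("mimeType", "")
        if "json" in mime.lower():
            request = entry.get("request", {})
            hits.append((request.get("method"), request.get("url")))
    return hits


if __name__ == "__main__":
    # "site.har" is a placeholder for a file exported from the network tab.
    with open("site.har", encoding="utf-8") as fh:
        for method, url in find_json_requests(json.load(fh)):
            print(method, url)
```

This only matches on the response MIME type, so endpoints that return JSON under a generic type such as `text/plain` would be missed.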
-
Zahir Xiu replied to the discussion What are the best methods for scraping data from dynamically-loaded websites? in the forum General Web Scraping a year ago
What are the best methods for scraping data from dynamically-loaded websites?
Sometimes, inspecting the network activity in dev tools can reveal a JSON API endpoint that loads the data, allowing you to directly query that endpoint instead of scraping rendered content.
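Once an endpoint like that turns up, querying it directly might look roughly like this in Python, using only the standard library. The URL and the `page` / `per_page` parameters are placeholders; the real names depend entirely on what the site's own requests show in dev tools.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_api_url(base_url, **params):
    """Append query parameters to the endpoint found in the network tab."""
    return f"{base_url}?{urlencode(params)}" if params else base_url


def fetch_json(url, timeout=10):
    """Request the endpoint and decode the JSON body."""
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical endpoint and parameters, for illustration only.
    url = build_api_url("https://example.com/api/items", page=1, per_page=50)
    print(fetch_json(url))
```

Some endpoints also expect the cookies or extra headers the browser sent, so it can help to compare your request against the one shown in the network tab.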
-
Zahir Xiu started the discussion How can I optimize my scraping code for faster performance? in the forum General Web Scraping a year ago
How can I optimize my scraping code for faster performance?
Running requests concurrently is a big help. Python’s concurrent.futures (thread pools) or asyncio lets me keep multiple requests in flight at once, which speeds up scraping significantly.
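A minimal sketch of the thread-pool version with `concurrent.futures`, again in Python. The helper names `fetch` and `fetch_all` and the default of 8 workers are assumptions for the example, not recommendations; a sensible worker count depends on the site and on how much load it tolerates.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one URL and return the raw response body."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_all(urls, fetch_one=fetch, max_workers=8):
    """Fetch URLs in a thread pool; map each URL to its result or exception."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch_one, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                # Keep the error so one failed request does not lose the rest.
                results[url] = exc
    return results
```

Threads suit this workload because the time goes to waiting on the network rather than to computation. Passing `fetch_one` as a parameter makes it easy to swap in a different HTTP client or a stub for testing.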