-
Ratan Carol replied to the discussion What’s the best approach to scraping PDF documents online? in the forum General Web Scraping a year ago
What’s the best approach to scraping PDF documents online?
For websites that host multiple PDFs, I use BeautifulSoup to locate and download all PDF links in bulk before extraction.
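A minimal sketch of the link-collection step described above, assuming the page HTML has already been fetched. The function name `extract_pdf_links` and the example URLs are hypothetical, chosen for illustration; the actual download would be a separate step.

```python
from urllib.parse import urljoin, urlparse

from bs4 import BeautifulSoup


def extract_pdf_links(html, base_url):
    """Return absolute URLs of all links whose path ends in .pdf, in page order."""
    soup = BeautifulSoup(html, "html.parser")
    seen = set()
    links = []
    for anchor in soup.find_all("a", href=True):
        # Resolve relative hrefs against the page URL.
        url = urljoin(base_url, anchor["href"])
        # Check the path only, so query strings like ?v=2 do not hide the extension.
        if urlparse(url).path.lower().endswith(".pdf") and url not in seen:
            seen.add(url)
            links.append(url)
    return links
```

This only catches links with a `.pdf` extension in the path. PDFs served from extensionless URLs would need a check on the response's `Content-Type` header instead.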
-
Ratan Carol replied to the discussion How can I scrape multi-step verification processes? in the forum General Web Scraping a year ago
How can I scrape multi-step verification processes?
Some systems let you whitelist an IP address so that requests from it skip verification. A static IP, or a VPN with a fixed exit address, gives you a stable address to whitelist.
-
Ratan Carol replied to the discussion How do I deal with scraped data that has inconsistent formatting? in the forum General Web Scraping a year ago
How do I deal with scraped data that has inconsistent formatting?
I add error logging to flag particularly messy fields for manual review, which saves time during data cleaning.
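One way the logging idea above could look, as a rough sketch. The price field, the regex, and the names `clean_price` and `record_id` are assumptions for illustration, not part of the original reply.

```python
import logging
import re

logger = logging.getLogger("cleaning")

# Accepts values such as "$1,299.50", "42" or " 1234.5 "; anything else is treated as messy.
PRICE_RE = re.compile(r"^\s*[$€£]?\s*(\d{1,3}(?:,\d{3})*|\d+)(\.\d+)?\s*$")


def clean_price(raw, record_id):
    """Return the price as a float, or None after logging the field for manual review."""
    match = PRICE_RE.match(raw or "")
    if match is None:
        logger.warning(
            "record %s: unparseable price %r, needs manual review", record_id, raw
        )
        return None
    whole = match.group(1).replace(",", "")
    return float(whole + (match.group(2) or ""))
```

Returning `None` instead of raising keeps the cleaning run going, and the log then serves as the list of records to check by hand.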
-
Ratan Carol started the discussion What are some efficient ways to scrape Real.de’s marketplace data with Golang? in the forum General Web Scraping a year ago
What are some efficient ways to scrape Real.de’s marketplace data with Golang?
Golang’s Colly framework is efficient for crawling Real.de’s static product pages and extracting fields such as product names and prices.
-
Maksims Emmy replied to the discussion How do I handle scraping pages with endless AJAX requests? in the forum General Web Scraping a year ago
How do I handle scraping pages with endless AJAX requests?
I identify the JSON or XML responses of AJAX calls to pull out data directly, avoiding the need to render the entire page.
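A sketch of calling the underlying endpoint directly, under some loud assumptions: the endpoint takes a `page` query parameter and returns a JSON object with an `items` list. Both are hypothetical here, so check the real request in the browser's network tab and adjust to match.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def fetch_json(url, params):
    """GET url with query params and decode the JSON body."""
    request = Request(
        f"{url}?{urlencode(params)}",
        headers={"Accept": "application/json"},
    )
    with urlopen(request, timeout=30) as response:
        return json.load(response)


def iter_items(url, fetch=fetch_json, max_pages=100):
    """Yield items page by page until a page comes back empty.

    max_pages is a safety cap so an endpoint that never runs dry
    cannot keep the loop going forever.
    """
    for page in range(1, max_pages + 1):
        payload = fetch(url, {"page": page})  # "page" is an assumed parameter name
        items = payload.get("items", [])      # "items" is an assumed response key
        if not items:
            break
        yield from items
```

The `fetch` argument is there so the paging logic can be tried against canned responses before pointing it at a live site.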
-
Maksims Emmy replied to the discussion What’s the best approach to scraping PDF documents online? in the forum General Web Scraping a year ago
What’s the best approach to scraping PDF documents online?
Tabula is another great tool for extracting tables from PDFs. I use it to pull tabular data into a structured format for further processing.