-
Florianne Andrius replied to the discussion What’s the best approach to scraping PDF documents online? in the forum General Web Scraping a year ago
What’s the best approach to scraping PDF documents online?
Optical Character Recognition (OCR) with Tesseract works for scanned PDFs, though it needs more processing and is less accurate than extracting text from PDFs that already carry a text layer.
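A rough sketch of how that could look in Python, assuming the `pdf2image` and `pytesseract` packages are installed along with the Poppler and Tesseract binaries. The helper names `needs_ocr` and `ocr_pdf`, and the `min_chars` threshold, are made up for illustration. Checking for a text layer first is my own suggestion, so OCR only runs when it's really needed.

```python
def needs_ocr(extracted_text: str, min_chars: int = 20) -> bool:
    """Heuristic: treat a page as scanned if its text layer is nearly empty."""
    return len(extracted_text.strip()) < min_chars


def ocr_pdf(path: str, dpi: int = 300) -> list:
    """OCR every page of a PDF and return one string per page."""
    # Imported lazily so the heuristic above works without these packages.
    from pdf2image import convert_from_path
    import pytesseract

    pages = convert_from_path(path, dpi=dpi)  # one PIL image per page
    return [pytesseract.image_to_string(page) for page in pages]
```

A higher `dpi` usually helps recognition but makes each page slower to process, so it's worth tuning on a few sample documents.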
-
Florianne Andrius replied to the discussion How do I handle scraping for real-time data that updates frequently? in the forum General Web Scraping a year ago
How do I handle scraping for real-time data that updates frequently?
Running the scraper on a server with high bandwidth helps it keep up with frequent updates without lag.
-
Florianne Andrius replied to the discussion How can I handle pagination when scraping JavaScript-heavy sites? in the forum General Web Scraping a year ago
How can I handle pagination when scraping JavaScript-heavy sites?
In cases where driving the JavaScript pagination isn’t feasible, I often look for URL parameters that can be manipulated to move between pages.
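For example, something like this rewrites the page parameter using only the standard library. The URL and the parameter name `page` are placeholders. Real sites may use `p`, `offset`, `start`, or something else entirely, so check the network tab first.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def page_url(url: str, page: int, param: str = "page") -> str:
    """Return `url` with the query parameter `param` set to `page`."""
    parts = urlsplit(url)
    query = parse_qs(parts.query, keep_blank_values=True)
    query[param] = [str(page)]  # add the parameter or overwrite its old value
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


# Hypothetical listing URL, used only to show the rewritten output.
for n in range(1, 4):
    print(page_url("https://example.com/items?sort=new&page=1", n))
```

Other query parameters such as `sort` are left in place, so filters applied in the browser carry over to every page.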
-
Florianne Andrius started the discussion What’s the best way to track used or second-hand listings on Carousell? in the forum General Web Scraping a year ago
What’s the best way to track used or second-hand listings on Carousell?
Carousell has an API in some regions, making it easier to monitor new listings and prices for specific categories like electronics or fashion.
-
Florianne Andrius changed their photo a year ago
-
Florianne Andrius became a registered member a year ago
-
Keith Marwin replied to the discussion How do I scrape data from sites using custom fonts or icons? in the forum General Web Scraping a year ago
How do I scrape data from sites using custom fonts or icons?
I sometimes find that simply copying and pasting into a text editor can reveal hidden font or icon text that doesn’t render normally in browsers.
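A scripted version of the same trick, as a small sketch: print the code point behind each character so that glyphs from an icon font stand out. Icon fonts often map glyphs into Unicode's private-use area, which is what the `"Co"` category check looks for. The function name `reveal` and the sample string are made up for illustration.

```python
import unicodedata


def reveal(text: str) -> list:
    """List (code point, name) pairs so icon or invisible glyphs show up."""
    out = []
    for ch in text:
        if unicodedata.category(ch) == "Co":  # private-use area character
            name = "PRIVATE USE"
        else:
            name = unicodedata.name(ch, "UNNAMED")
        out.append((f"U+{ord(ch):04X}", name))
    return out


# "\uf07a" is an arbitrary private-use code point chosen for the demo.
print(reveal("A\uf07a"))
```

Once the private-use code points are known, they can be matched against the site's font or CSS to work out what each icon means.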
-
Keith Marwin replied to the discussion How do I handle scraping for real-time data that updates frequently? in the forum General Web Scraping a year ago
How do I handle scraping for real-time data that updates frequently?
I prioritize only the most essential data fields, which speeds up each request and allows for more frequent updates.
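Roughly what I mean, as a minimal sketch. The field names in `ESSENTIAL` are hypothetical, and comparing the slimmed records between polls is just one way to use them.

```python
ESSENTIAL = ("id", "price", "stock")  # hypothetical field names


def slim(record: dict, fields=ESSENTIAL) -> dict:
    """Keep only the fields worth tracking on every poll."""
    return {key: record[key] for key in fields if key in record}


def changed(old: dict, new: dict, fields=ESSENTIAL) -> bool:
    """True if any essential field differs between two polls."""
    return slim(old, fields) != slim(new, fields)
```

Edits to fields outside `ESSENTIAL`, such as a long description, are ignored, so only meaningful changes trigger downstream work.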
-
Keith Marwin replied to the discussion How can I handle pagination when scraping JavaScript-heavy sites? in the forum General Web Scraping a year ago
How can I handle pagination when scraping JavaScript-heavy sites?
For sites with “Load More” buttons, I simulate clicks on the button until all items are loaded. This works well for e-commerce and content sites.
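A sketch of that loop, assuming a Playwright sync `Page` object is passed in. The function name, the selector, and the default limits are placeholders to adapt per site.

```python
def click_load_more(page, selector: str, max_clicks: int = 50, pause_ms: int = 1000) -> int:
    """Click the button matched by `selector` until it is gone or hidden.

    `page` is expected to behave like a Playwright sync `Page`.
    Returns the number of clicks performed.
    """
    clicks = 0
    while clicks < max_clicks:
        button = page.locator(selector)
        if button.count() == 0 or not button.first.is_visible():
            break  # nothing left to load
        button.first.click()
        page.wait_for_timeout(pause_ms)  # give the new items time to render
        clicks += 1
    return clicks
```

The `max_clicks` cap is there so an endless feed doesn't keep the scraper running forever. A fixed pause is the simplest option, though waiting for the item count to grow would be more robust.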
-
Keith Marwin started the discussion How can I monitor price trends on Lazada for specific product categories? in the forum General Web Scraping a year ago
How can I monitor price trends on Lazada for specific product categories?
Lazada has APIs in some countries, which makes data retrieval easier and more reliable for categories like electronics or home goods.