

Saori Mariana
-
Saori Mariana replied to the discussion How can I scrape JavaScript-based content without headless browsers? in the forum General Web Scraping 4 months ago
How can I scrape JavaScript-based content without headless browsers?
I look for preloaded JSON embedded in the HTML source. It sometimes contains all the data I need, so no JavaScript has to be executed.
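As a rough illustration of that approach, here is a minimal sketch using only the Python standard library. It assumes the page ships its data in a `<script type="application/json">` (or `application/ld+json`) tag. The sample HTML, the `__NEXT_DATA__` id, and the helper name `extract_embedded_json` are made up for the example; real sites may embed their data differently.

```python
import json
from html.parser import HTMLParser


class EmbeddedJSONParser(HTMLParser):
    """Collects parsed payloads from JSON-typed <script> tags."""

    JSON_TYPES = {"application/json", "application/ld+json"}

    def __init__(self):
        super().__init__()
        self._in_json_script = False
        self._buffer = []
        self.payloads = []

    def handle_starttag(self, tag, attrs):
        # Start buffering when a JSON-typed script tag opens.
        if tag == "script" and dict(attrs).get("type") in self.JSON_TYPES:
            self._in_json_script = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_script:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_script:
            self._in_json_script = False
            try:
                self.payloads.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # The tag did not hold valid JSON; skip it.


def extract_embedded_json(html):
    """Hypothetical helper: return every JSON payload embedded in the HTML."""
    parser = EmbeddedJSONParser()
    parser.feed(html)
    parser.close()
    return parser.payloads


# Invented sample page standing in for a fetched HTML response.
SAMPLE_HTML = """
<html><body>
<div id="app"></div>
<script id="__NEXT_DATA__" type="application/json">
{"props": {"products": [{"name": "Tent", "price": 49.99}]}}
</script>
</body></html>
"""

payloads = extract_embedded_json(SAMPLE_HTML)
print(payloads[0]["props"]["products"])
```

In practice the HTML would come from an ordinary HTTP request, and the interesting part is working out which keys in the payload hold the data you want.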
-
Saori Mariana replied to the discussion What’s the best way to scrape map-based data from websites? in the forum General Web Scraping 4 months ago
What’s the best way to scrape map-based data from websites?
I sometimes take screenshots of the map and run OCR on them to extract names and locations, though it’s not as accurate as JSON-based scraping.
-
Saori Mariana replied to the discussion How can I detect and manage duplicate data in my scraped results? in the forum General Web Scraping 4 months ago
How can I detect and manage duplicate data in my scraped results?
For more complex data, I create custom matching algorithms to compare similar fields and flag duplicates with slight variations.
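One possible shape for such a matcher, sketched with Python's standard `difflib`. The field names, the sample records, and the 0.85 threshold are assumptions chosen for illustration, not a recommendation. The pairwise comparison is quadratic, so it only suits small result sets.

```python
from difflib import SequenceMatcher


def normalize(value):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(value.lower().split())


def similarity(a, b):
    """Return a ratio in [0, 1] describing how alike two strings are."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def flag_duplicates(records, fields=("name", "city"), threshold=0.85):
    """Return index pairs whose chosen fields are all at least `threshold` similar."""
    flagged = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            scores = [similarity(records[i][f], records[j][f]) for f in fields]
            if min(scores) >= threshold:
                flagged.append((i, j))
    return flagged


# Invented sample data: the first two rows differ only in case and punctuation.
records = [
    {"name": "Acme Sports Ltd", "city": "Lyon"},
    {"name": "ACME Sports Ltd.", "city": "lyon"},
    {"name": "Blue Ridge Outfitters", "city": "Paris"},
]

duplicates = flag_duplicates(records)
print(duplicates)
```

Flagged pairs are only candidates. Whether to merge them automatically or review them by hand depends on how costly a false match is for the dataset.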
-
Saori Mariana replied to the discussion How do I handle scraping for real-time data that updates frequently? in the forum General Web Scraping 4 months ago
How do I handle scraping for real-time data that updates frequently?
A message queue like RabbitMQ or Kafka sits between the scraper and the processing step. It buffers incoming updates so they can be processed at a steady rate without overloading resources.
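To show the producer/consumer idea without a broker, here is a small sketch that uses Python's in-process `queue.Queue` as a stand-in. It is not the RabbitMQ or Kafka API. The item fields, the `max_pending` limit, and the function names are hypothetical.

```python
import queue
import threading


def consume(updates, processed):
    """Worker: pull updates off the queue until the None sentinel arrives."""
    while True:
        item = updates.get()
        if item is None:
            break
        # Placeholder processing step: store the price in integer cents.
        processed.append({**item, "price_cents": round(item["price"] * 100)})


def run_pipeline(scraped_items, max_pending=2):
    """Feed scraped items through a bounded queue to a single worker thread."""
    updates = queue.Queue(maxsize=max_pending)
    processed = []
    worker = threading.Thread(target=consume, args=(updates, processed))
    worker.start()
    for item in scraped_items:
        # put() blocks while the queue is full, which throttles the producer.
        updates.put(item)
    updates.put(None)  # Tell the worker there is nothing more to process.
    worker.join()
    return processed


# Invented sample updates standing in for freshly scraped records.
results = run_pipeline([
    {"symbol": "AAA", "price": 1.5},
    {"symbol": "BBB", "price": 2.25},
])
print(results)
```

The bounded queue is what keeps a fast scraper from flooding a slower processing step. A real broker adds persistence and lets the two sides run on separate machines.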
-
Saori Mariana started the discussion What’s the best way to scrape product pages on Decathlon with Ruby? in the forum General Web Scraping 4 months ago
What’s the best way to scrape product pages on Decathlon with Ruby?
Use the Nokogiri gem in Ruby for HTML parsing, which is effective for scraping Decathlon’s static pages with consistent HTML structure.
-
Saori Mariana changed their photo 4 months ago
-
Saori Mariana became a registered member 4 months ago