-
Gianna Xanti replied to the discussion What’s the most efficient way to handle scraped data in multiple languages? in the forum General Web Scraping a year ago
What’s the most efficient way to handle scraped data in multiple languages?
Encoding issues can arise with non-English characters, so I make sure all data is decoded, processed, and stored as UTF-8 for consistency.
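A minimal Python sketch of the "everything in UTF-8" rule, assuming the raw response bytes and (optionally) the charset declared in the Content-Type header are available. The helper name `decode_scraped` and the Shift_JIS fallback (useful for older Japanese sites) are illustrative choices, not a fixed recipe:

```python
import unicodedata
from typing import Optional


def decode_scraped(raw: bytes, declared_charset: Optional[str] = None) -> str:
    """Hypothetical helper: turn scraped bytes into normalized text.

    Tries the server-declared charset first, then UTF-8, then Shift_JIS
    as an example fallback. If all fail, undecodable bytes are replaced
    instead of raising.
    """
    candidates = [c for c in (declared_charset, "utf-8", "shift_jis") if c]
    for charset in candidates:
        try:
            text = raw.decode(charset)
            break
        except (UnicodeDecodeError, LookupError):  # bad bytes or unknown codec name
            continue
    else:
        text = raw.decode("utf-8", errors="replace")
    # NFC makes visually identical strings compare equal,
    # e.g. "é" as one code point vs "e" + a combining accent.
    return unicodedata.normalize("NFC", text)
```

When writing results out, passing `encoding="utf-8"` to `open()` explicitly keeps the platform default encoding from sneaking back in.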
-
Gianna Xanti replied to the discussion How can I scrape JavaScript-based content without headless browsers? in the forum General Web Scraping a year ago
How can I scrape JavaScript-based content without headless browsers?
When a site loads its content from predictable URLs, requests and BeautifulSoup can fetch and parse that data directly, with no browser interaction required.
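A rough sketch of the "predictable URL" idea in Python. It assumes a hypothetical JSON endpoint (`https://example.com/api/products/{id}`) of the kind often visible in the browser's Network tab behind JavaScript-rendered pages. The standard library's `urllib` stands in for requests here; with requests the fetch would be `requests.get(url, timeout=10).json()`. The `name`/`price` field names are assumptions about the response shape:

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical URL pattern; only the product ID and a query parameter vary.
URL_TEMPLATE = "https://example.com/api/products/{product_id}"


def product_url(product_id: int, lang: str = "en") -> str:
    # Build the URL directly instead of clicking through the rendered page.
    return URL_TEMPLATE.format(product_id=product_id) + "?" + urlencode({"lang": lang})


def fetch_json(url: str, timeout: float = 10.0) -> dict:
    # Plain HTTP GET: no JavaScript runs, we just read the data the page would load.
    req = Request(url, headers={"Accept": "application/json"})
    with urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def extract_fields(payload: dict) -> dict:
    # Field names are assumptions about the hypothetical API's response.
    return {"name": payload.get("name"), "price": payload.get("price")}
```

If the URL returns HTML rather than JSON, the same loop applies and BeautifulSoup takes over the parsing step.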
-
Gianna Xanti replied to the discussion How do I handle scraping pages with endless AJAX requests? in the forum General Web Scraping a year ago
How do I handle scraping pages with endless AJAX requests?
Sometimes, scrolling more slowly gives AJAX calls time to complete, so dynamically loaded content isn't missed.
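A sketch of slow, stepwise scrolling, written against two callbacks so it isn't tied to one browser tool. With Selenium, for example, `get_height` could be `lambda: driver.execute_script("return document.body.scrollHeight")` and `scroll_to` could be `lambda y: driver.execute_script(f"window.scrollTo(0, {y})")`. The step size, pause, and idle-round count are illustrative defaults, and `max_steps` is an added guard against pages that load forever:

```python
import time
from typing import Callable


def scroll_slowly(
    get_height: Callable[[], int],
    scroll_to: Callable[[int], None],
    step: int = 400,
    pause: float = 1.0,
    max_idle_rounds: int = 3,
    max_steps: int = 1000,
) -> int:
    """Scroll in small steps, pausing so pending AJAX requests can finish.

    Stops after `max_idle_rounds` consecutive checks at the bottom with no
    growth in page height, or after `max_steps`. Returns the last height seen.
    """
    position = 0
    idle = 0
    last_height = get_height()
    for _ in range(max_steps):
        if idle >= max_idle_rounds:
            break
        position = min(position + step, last_height)
        scroll_to(position)
        time.sleep(pause)  # give AJAX calls time to complete
        height = get_height()
        if height > last_height:
            idle = 0  # new content arrived; keep going
            last_height = height
        elif position >= last_height:
            idle += 1  # at the bottom and nothing new loaded
    return last_height
```

Raising `pause` is the "lower scroll speed" knob; a smaller `step` also spreads requests out.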
-
Gianna Xanti started the discussion How do I scrape product reviews from ZozoTown using PHP? in the forum General Web Scraping a year ago
How do I scrape product reviews from ZozoTown using PHP?
PHP’s cURL extension lets me fetch the product page HTML from ZozoTown, and then I parse the reviews using DOMDocument and XPath.
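The post does this in PHP; to keep the examples here in one language, below is a rough Python analogue of the parsing half only. It uses the standard library's `html.parser` in place of DOMDocument/XPath, and the class name `review-text` is a hypothetical placeholder, since ZozoTown's real markup has to be checked in the browser's developer tools:

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not affect nesting depth.
VOID = {"br", "img", "hr", "input", "meta", "link", "wbr", "source"}


class ReviewParser(HTMLParser):
    """Collect the text of every element whose class list has TARGET_CLASS."""

    TARGET_CLASS = "review-text"  # hypothetical class name

    def __init__(self) -> None:
        super().__init__()
        self.reviews: list[str] = []
        self._depth = 0  # > 0 while inside a matching element
        self._buf: list[str] = []

    def handle_starttag(self, tag, attrs):
        if self._depth:
            if tag in VOID:
                self._buf.append(" ")  # e.g. <br> separates words
            else:
                self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.TARGET_CLASS in classes:
            self._depth = 1
            self._buf = []

    def handle_endtag(self, tag):
        if not self._depth or tag in VOID:
            return
        self._depth -= 1
        if self._depth == 0:
            # Collapse runs of whitespace into single spaces.
            self.reviews.append(" ".join("".join(self._buf).split()))

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)


def extract_reviews(html: str) -> list[str]:
    parser = ReviewParser()
    parser.feed(html)
    parser.close()
    return parser.reviews
```

In the PHP version, the equivalent selection would be an XPath query against the loaded DOMDocument; the idea of "find the review containers, then take their text" is the same.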
-
Gianna Xanti became a registered member a year ago
-
Filipp Maglocunos replied to the discussion How can I detect and manage duplicate data in my scraped results? in the forum General Web Scraping a year ago
How can I detect and manage duplicate data in my scraped results?
By using unique constraints in SQL databases, I can prevent duplicates at the database level, which simplifies post-processing.
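A small illustration with SQLite (Python's built-in `sqlite3`), assuming the product URL is the natural key for a scraped item. The table and column names are hypothetical. `UNIQUE` makes the database reject duplicates, and `INSERT OR IGNORE` skips them quietly instead of raising:

```python
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # The UNIQUE constraint on url enforces deduplication at the database level.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS items (
               id INTEGER PRIMARY KEY,
               url TEXT NOT NULL UNIQUE,
               title TEXT,
               price INTEGER
           )"""
    )
    return conn


def save_items(conn: sqlite3.Connection, rows: list[tuple[str, str, int]]) -> int:
    """Insert scraped (url, title, price) rows, skipping URLs already stored.

    Returns the number of rows actually inserted.
    """
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO items (url, title, price) VALUES (?, ?, ?)", rows
    )
    conn.commit()
    return conn.total_changes - before
```

To refresh existing rows instead of skipping them, SQLite 3.24+ also supports `INSERT ... ON CONFLICT(url) DO UPDATE`; PostgreSQL has the same `ON CONFLICT` clause.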
-
Filipp Maglocunos replied to the discussion How do I handle scraping pages with endless AJAX requests? in the forum General Web Scraping a year ago
How do I handle scraping pages with endless AJAX requests?
Inspecting the URL structure of AJAX requests often reveals pagination parameters, which I can modify to control data retrieval directly.
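A sketch of driving an AJAX endpoint directly once its pagination parameter is known. The observed URL, the parameter name `page`, and the `fetch` callback (anything that returns the list of items for a URL) are assumptions for illustration; offset-based APIs can use `param="offset"` with a larger `step`:

```python
from typing import Callable, Iterator
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def with_param(url: str, name: str, value) -> str:
    """Return `url` with query parameter `name` set to `value`."""
    parts = urlparse(url)
    query = parse_qs(parts.query, keep_blank_values=True)
    query[name] = [str(value)]  # overwrite (or add) just this parameter
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))


def iterate_pages(
    observed_url: str,
    fetch: Callable[[str], list],
    param: str = "page",
    start: int = 1,
    step: int = 1,
    max_pages: int = 500,
) -> Iterator:
    """Walk the endpoint page by page until it returns no items."""
    for value in range(start, start + max_pages * step, step):
        items = fetch(with_param(observed_url, param, value))
        if not items:
            break  # an empty page means we've reached the end
        yield from items
```

Because the loop stops on an empty response, it avoids the "endless" part of infinite scroll: you decide how many pages to request instead of the page's JavaScript.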
-
Filipp Maglocunos replied to the discussion What’s the best approach to scraping PDF documents online? in the forum General Web Scraping a year ago
What’s the best approach to scraping PDF documents online?
Cloud-based OCR services, such as the Google Cloud Vision API, handle complex PDFs more effectively, though they come at a cost.
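A hedged sketch of calling the Vision API's synchronous `files:annotate` REST method for a PDF. It assumes API-key authentication and a small PDF (the synchronous method accepts only a few pages per call; larger files go through the asynchronous, Cloud Storage based flow). Check Google's current API reference and pricing before relying on it:

```python
import base64
import json
from urllib.request import Request, urlopen

ENDPOINT = "https://vision.googleapis.com/v1/files:annotate"


def build_request(pdf_bytes: bytes, pages: list[int]) -> dict:
    """Request body asking for dense-text OCR on the given 1-based pages."""
    return {
        "requests": [
            {
                "inputConfig": {
                    "content": base64.b64encode(pdf_bytes).decode("ascii"),
                    "mimeType": "application/pdf",
                },
                "features": [{"type": "DOCUMENT_TEXT_DETECTION"}],
                "pages": pages,
            }
        ]
    }


def extract_text(response: dict) -> list[str]:
    """Collect the recognized text of each page from the JSON response."""
    texts = []
    for file_resp in response.get("responses", []):
        for page_resp in file_resp.get("responses", []):
            texts.append(page_resp.get("fullTextAnnotation", {}).get("text", ""))
    return texts


def annotate(pdf_bytes: bytes, pages: list[int], api_key: str) -> list[str]:
    # Assumes API-key auth; service-account OAuth tokens are the other option.
    body = json.dumps(build_request(pdf_bytes, pages)).encode("utf-8")
    req = Request(
        f"{ENDPOINT}?key={api_key}",
        data=body,  # supplying data makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req, timeout=60) as resp:
        return extract_text(json.load(resp))
```

Since every call is billed, it can pay to run cheaper text extraction first and send only pages that come back empty.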
-
Filipp Maglocunos started the discussion How can I track price changes on Mercari Japan using Ruby? in the forum General Web Scraping a year ago
How can I track price changes on Mercari Japan using Ruby?
I use the Nokogiri gem in Ruby for scraping Mercari Japan’s product listings, focusing on price data and seller information.
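The scraping itself is done with Nokogiri in Ruby; the "tracking" half is language-agnostic, so here is a minimal Python sketch of it, assuming each run produces a snapshot mapping item ID to price in yen. The IDs in the usage example are hypothetical, and parsing Mercari's pages is not shown:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceChange:
    item_id: str
    old: int
    new: int

    @property
    def delta(self) -> int:
        return self.new - self.old


def diff_prices(previous: dict[str, int], current: dict[str, int]) -> list[PriceChange]:
    """Compare two snapshots (item ID -> price) and report changed prices.

    Newly listed or vanished items are not price changes and are skipped;
    track them separately if needed.
    """
    changes = [
        PriceChange(item_id, previous[item_id], price)
        for item_id, price in current.items()
        if item_id in previous and previous[item_id] != price
    ]
    return sorted(changes, key=lambda c: c.item_id)
```

Storing each snapshot (for example as JSON or in a table keyed by item ID) between runs is enough to feed this comparison.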