-
Gojko Diomedes replied to the discussion How can I scrape structured data from sites without standard HTML tags? in the forum General Web Scraping a year ago
How can I scrape structured data from sites without standard HTML tags?
Scrapy’s XPath expressions are especially helpful for locating non-standard elements by their position in the DOM structure.
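A rough sketch of the positional idea, for anyone who wants to try it without installing anything. It uses the standard library's ElementTree (which supports only a subset of XPath) rather than Scrapy itself, and the `x-row`/`x-cell` tags and sample markup are made up for illustration. In a Scrapy callback, similar paths would be passed to `response.xpath(...)`.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: custom tags with no classes or ids to hook onto.
DOC = """
<page>
  <x-row><x-cell>Widget</x-cell><x-cell>9.99</x-cell></x-row>
  <x-row><x-cell>Gadget</x-cell><x-cell>24.50</x-cell></x-row>
</page>
"""


def extract_rows(markup):
    """Read each row by the position of its cells, not by tag semantics."""
    root = ET.fromstring(markup.strip())
    rows = []
    for row in root.findall(".//x-row"):
        # XPath positions are 1-based: [1] is the first x-cell child.
        name = row.find("./x-cell[1]")
        price = row.find("./x-cell[2]")
        rows.append({"name": name.text, "price": float(price.text)})
    return rows


rows = extract_rows(DOC)
```

Positional paths break as soon as the site reorders its columns, so it is worth checking the extracted values against an expected type or pattern.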
-
Gojko Diomedes replied to the discussion How can I scrape JavaScript-based content without headless browsers? in the forum General Web Scraping a year ago
How can I scrape JavaScript-based content without headless browsers?
Inspecting the JavaScript in the page source can reveal the endpoints that serve the data, which can then be requested independently of the interactive content.
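One way to do that inspection programmatically is a simple regex pass over the script text. This is only a sketch: the keyword list (`api`, `ajax`, `json`), the sample script, and the helper name `find_endpoints` are assumptions, and real sites often build URLs dynamically, so manual reading of the source or the browser's network tab is still needed.

```python
import re

# Quoted, root-relative paths that mention api/ajax/json somewhere.
ENDPOINT_RE = re.compile(
    r"""["'](/[^"'\s]*?(?:api|ajax|json)[^"'\s]*)["']""",
    re.IGNORECASE,
)

# Hypothetical inline script, as it might appear in a page's source.
SCRIPT = """
function loadReviews(id) {
  return fetch("/api/v1/reviews?product=" + id).then(r => r.json());
}
$.getJSON('/ajax/stock.json', render);
var logo = "/static/img/logo.png";
"""


def find_endpoints(script_source):
    """Return candidate data endpoints in order of first appearance."""
    seen = []
    for match in ENDPOINT_RE.finditer(script_source):
        path = match.group(1)
        if path not in seen:
            seen.append(path)
    return seen


endpoints = find_endpoints(SCRIPT)
```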
-
Gojko Diomedes started the discussion Best methods to scrape SKU or UPC metadata from Amazon or eBay? in the forum General Web Scraping a year ago
Best methods to scrape SKU or UPC metadata from Amazon or eBay?
Amazon often includes SKUs or ASINs directly in the page metadata, which I extract with BeautifulSoup so that each record can be organized by its product ID.
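A minimal sketch of pulling product IDs out of page metadata. To stay dependency-free it uses the standard library's `html.parser` instead of BeautifulSoup, and the markup is hypothetical: the `itemprop` names, the `data-asin` attribute, and the sample values are assumptions about where a page might carry its IDs, not a description of any site's actual layout.

```python
import re
from html.parser import HTMLParser

# An ASIN is a 10-character alphanumeric identifier.
ASIN_RE = re.compile(r"^[A-Z0-9]{10}$")

# Hypothetical page fragment with IDs in <meta> tags and data attributes.
PAGE = """
<html><head>
<meta itemprop="sku" content="WID-001">
<meta itemprop="gtin12" content="012345678905">
</head><body>
<div data-asin="B000000001">Widget</div>
<div data-asin="">Ad slot</div>
</body></html>
"""


class ProductIdParser(HTMLParser):
    META_KEYS = {"sku", "gtin12", "gtin13"}  # assumed metadata names

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        key = attr.get("itemprop")
        if tag == "meta" and key in self.META_KEYS and attr.get("content"):
            self.found.setdefault(key, []).append(attr["content"])
        asin = attr.get("data-asin")
        if asin and ASIN_RE.match(asin):  # skips empty placeholder slots
            self.found.setdefault("asin", []).append(asin)


def extract_product_ids(html):
    parser = ProductIdParser()
    parser.feed(html)
    return parser.found


ids = extract_product_ids(PAGE)
```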
-
Robert Yehoyaqim replied to the discussion How do I extract text from images or infographics? in the forum General Web Scraping a year ago
How do I extract text from images or infographics?
I use layout analysis tools to detect text regions, which allows me to extract text while ignoring non-text elements.
-
Robert Yehoyaqim replied to the discussion What’s the most efficient way to handle scraped data in multiple languages? in the forum General Web Scraping a year ago
What’s the most efficient way to handle scraped data in multiple languages?
Combining translation and NLP libraries, like spaCy, enables me to analyze multilingual data without extensive preprocessing.
-
Robert Yehoyaqim replied to the discussion How can I scrape JavaScript-based content without headless browsers? in the forum General Web Scraping a year ago
How can I scrape JavaScript-based content without headless browsers?
Requesting the JSON behind the page's AJAX calls directly with Requests is another workaround, provided the data structure is accessible without rendering.
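As a sketch of that workaround, the helper below takes any session object with a `get()` method, such as a `requests.Session()`, and asks for JSON directly. The function name, the header choices, and the "return `None` when the body is not JSON" behaviour are assumptions for illustration; some endpoints need cookies or tokens that this does not handle.

```python
def fetch_json(session, url, params=None, timeout=10):
    """GET an AJAX endpoint and return its decoded JSON, or None if the
    response body is not JSON (for example, a rendered HTML fallback)."""
    headers = {
        "Accept": "application/json",
        # Many backends use this header to recognise AJAX requests.
        "X-Requested-With": "XMLHttpRequest",
    }
    response = session.get(url, params=params, headers=headers, timeout=timeout)
    response.raise_for_status()  # surface 4xx/5xx instead of parsing an error page
    try:
        return response.json()
    except ValueError:
        return None
```

With Requests installed, usage would look like `fetch_json(requests.Session(), url, {"page": 1})`, where `url` is whatever endpoint turned up in the page's network traffic.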
-
Robert Yehoyaqim started the discussion How to handle site formatting differences when scraping multiple Shopify stores? in the forum General Web Scraping a year ago
How to handle site formatting differences when scraping multiple Shopify stores?
Shopify sites use unique themes, so I adjust CSS selectors or XPath expressions for each store based on its layout to capture data accurately.
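One way to keep the per-store adjustments manageable is a selector table keyed by store, so the extraction loop stays the same. This is a sketch under assumptions: the store names, class names, and sample pages are invented, and it uses the standard library's ElementTree path subset on well-formed markup. Real storefront HTML would normally go through Scrapy or a similar HTML parser with the same table-driven idea.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-store selector table; only this changes per theme.
STORE_SELECTORS = {
    "store-a.example": {
        "title": ".//h1[@class='product-title']",
        "price": ".//span[@class='price']",
    },
    "store-b.example": {
        "title": ".//div[@class='product__info']/h2",
        "price": ".//div[@class='product__info']/p[2]",
    },
}

PAGE_A = (
    "<html><body><h1 class='product-title'>Blue Mug</h1>"
    "<span class='price'>$12.00</span></body></html>"
)
PAGE_B = (
    "<html><body><div class='product__info'><h2>Blue Mug</h2>"
    "<p>In stock</p><p>$12.00</p></div></body></html>"
)


def scrape_product(store, markup):
    """Apply the store's own selectors and return a uniform record."""
    root = ET.fromstring(markup)
    record = {"store": store}
    for field, path in STORE_SELECTORS[store].items():
        node = root.find(path)
        # A missing node becomes None so layout drift is easy to spot.
        has_text = node is not None and node.text
        record[field] = node.text.strip() if has_text else None
    return record


record_a = scrape_product("store-a.example", PAGE_A)
record_b = scrape_product("store-b.example", PAGE_B)
```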