-
Hepsie Sobekhotep replied to the discussion Scraping with BeautifulSoup vs. Scrapy: Which one should I choose? in the forum General Web Scraping a year ago
Scraping with BeautifulSoup vs. Scrapy: Which one should I choose?
Start with BeautifulSoup, then switch to Scrapy if your scraping needs grow.
-
Hepsie Sobekhotep replied to the discussion Legal considerations and ethics of web scraping: What are the boundaries? in the forum General Web Scraping a year ago
Legal considerations and ethics of web scraping: What are the boundaries?
Avoid scraping login-protected or personal data, as that could lead to legal issues under privacy laws like GDPR.
-
Hepsie Sobekhotep started the discussion What’s the best way to scrape media files (images, videos) from a website? in the forum General Web Scraping a year ago
What’s the best way to scrape media files (images, videos) from a website?
You can use BeautifulSoup to parse the HTML and extract image or video URLs, then download them using Python’s requests library.
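A rough sketch of that approach, in case it helps. The function names, the `media` output folder, and the choice to look only at `src` attributes on `img`, `video`, and `source` tags are my own assumptions; real pages may also use `srcset`, lazy-loading attributes, or JavaScript-injected players that this won't catch.

```python
from pathlib import Path
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup


def extract_media_urls(html, base_url):
    """Return absolute, de-duplicated URLs from <img>, <video>, <source> src attributes."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for tag in soup.find_all(["img", "video", "source"]):
        src = tag.get("src")
        if not src:
            continue  # tag has no src attribute (e.g. <video> wrapping <source>)
        url = urljoin(base_url, src)  # resolve relative paths against the page URL
        if url not in urls:
            urls.append(url)
    return urls


def download(url, dest_dir="media", timeout=30):
    """Stream one file to disk so large videos are not held in memory."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Fall back to a generic name when the URL path has no file name.
    name = Path(urlparse(url).path).name or "download.bin"
    target = dest / name
    with requests.get(url, stream=True, timeout=timeout) as resp:
        resp.raise_for_status()
        with open(target, "wb") as fh:
            for chunk in resp.iter_content(chunk_size=8192):
                fh.write(chunk)
    return target
```

Typical use would be to fetch the page with `requests.get(page_url).text`, pass the HTML and `page_url` to `extract_media_urls`, then call `download` on each result.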
-
Oskar Dannie replied to the discussion How do I extract data from a PDF using web scraping tools? in the forum General Web Scraping a year ago
How do I extract data from a PDF using web scraping tools?
For scanned PDFs, where the pages are images rather than selectable text, try an OCR engine such as Tesseract to extract the text.
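One possible way to wire this up, as a sketch only. It assumes the `pytesseract` and `pdf2image` Python packages, which are my choice and not something the reply names; they in turn need the Tesseract and Poppler binaries installed on the system. The function name `ocr_pdf` and the 300 DPI default are also assumptions.

```python
def ocr_pdf(pdf_path, dpi=300, lang="eng"):
    """Return a list with the OCR text of each page of a scanned PDF."""
    # Imported inside the function because both packages are optional
    # dependencies that rely on external binaries (Poppler and Tesseract).
    from pdf2image import convert_from_path
    import pytesseract

    # Render every page to an image; a higher DPI usually helps OCR accuracy
    # at the cost of speed and memory.
    images = convert_from_path(pdf_path, dpi=dpi)
    return [pytesseract.image_to_string(image, lang=lang) for image in images]
```

If the PDF already has a text layer, a regular PDF text extractor is usually faster and more accurate than OCR, so this is only worth running on image-only files.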
-
Oskar Dannie replied to the discussion How do you handle pagination when scraping websites? in the forum General Web Scraping a year ago
How do you handle pagination when scraping websites?
Sites often have an undocumented pagination API behind the ‘next’ button. Inspect the network requests in your browser’s developer tools to see whether there’s a JSON endpoint or similar. If there is, you can request the JSON directly and skip HTML parsing altogether.
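Here is a minimal sketch of walking such an endpoint once you've found it. Everything about the API shape is hypothetical: the `page` and `per_page` query parameters and the `items` and `has_next` response keys are placeholders, so substitute whatever the network tab actually shows for your site.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_json(url, params):
    """GET url with query parameters and decode the JSON response body."""
    with urlopen(f"{url}?{urlencode(params)}", timeout=30) as resp:
        return json.load(resp)


def iter_items(api_url, fetch=fetch_json, page_size=50, max_pages=100):
    """Yield items page by page until the API reports there are no more."""
    for page in range(1, max_pages + 1):
        # Hypothetical parameter names; match them to the real endpoint.
        payload = fetch(api_url, {"page": page, "per_page": page_size})
        items = payload.get("items", [])
        if not items:
            break  # an empty page means we ran past the end
        yield from items
        if not payload.get("has_next", False):
            break  # the API says this was the last page
```

The `max_pages` cap is there so a misbehaving endpoint can't loop forever, and `fetch` is a parameter so you can swap in a session with cookies or headers if the endpoint needs them.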
-
Oskar Dannie started the discussion How can I handle anti-scraping mechanisms when extracting data? in the forum General Web Scraping a year ago
How can I handle anti-scraping mechanisms when extracting data?
Rotate proxies and user-agents frequently to avoid detection by the site’s anti-scraping systems.
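A small sketch of what the rotation could look like. The proxy addresses and user-agent strings below are placeholders I made up for illustration, and the generator name is my own; the output is shaped so it can be passed as keyword arguments to `requests.get`.

```python
import itertools
import random

# Placeholder values: replace with proxies you are allowed to use and
# user-agent strings that match real, current browsers.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) ExampleBrowser/1.0",
]
PROXIES = [
    "http://proxy1.example.com:8080",
    "http://proxy2.example.com:8080",
]


def rotating_request_kwargs(user_agents, proxies, timeout=30):
    """Yield request settings forever: proxies round-robin, user-agents at random.

    Both lists must be non-empty.
    """
    proxy_cycle = itertools.cycle(proxies)
    while True:
        proxy = next(proxy_cycle)
        yield {
            "headers": {"User-Agent": random.choice(user_agents)},
            "proxies": {"http": proxy, "https": proxy},
            "timeout": timeout,
        }
```

With `settings = rotating_request_kwargs(USER_AGENTS, PROXIES)`, each call would look like `requests.get(url, **next(settings))`. Adding a short random delay between requests tends to matter as much as the rotation itself.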