-
Gerlind Kelley replied to the discussion Compare Ruby and Go to scrape shipping details from Yahoo! Taiwan in the forum General Web Scraping a year ago
Compare Ruby and Go to scrape shipping details from Yahoo! Taiwan
Ruby’s Nokogiri is simple and intuitive, making it a great choice for developers who need a straightforward way to parse HTML. However, it may not perform as efficiently as Go’s Colly library when handling a large number of pages.
-
Gerlind Kelley replied to the discussion Compare Python and Node.js to scrape product reviews from Momo Taiwan in the forum General Web Scraping a year ago
Compare Python and Node.js to scrape product reviews from Momo Taiwan
Python’s BeautifulSoup is lightweight and excels at parsing static HTML, making it a good choice for simpler pages. However, it may struggle with dynamically loaded content unless combined with a tool like Selenium.
-
Gerlind Kelley started the discussion What’s the best approach for scraping table data from websites? in the forum General Web Scraping a year ago
What’s the best approach for scraping table data from websites?
Scraping table data is one of the most common tasks in web scraping. Tables hold structured data, which makes them an ideal target. But how do you approach this? The first step is to inspect the website’s HTML to identify the table structure. Most tables use <table> as the container, <tr> for rows, <th> for header cells, and <td> for data cells. Using Python’s…
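For anyone who wants to see the <table>/<tr>/<td>/<th> structure turned into rows, here is a minimal sketch. It uses only Python's standard-library html.parser, which is my own choice for a dependency-free example and not necessarily the library the post goes on to name. SAMPLE_HTML, TableParser, and parse_table are made-up names for illustration, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_table(html):
    """Return the table as a list of rows (lists of cell strings)."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical sample markup, standing in for a fetched page.
SAMPLE_HTML = """
<table>
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Notebook</td><td>120</td></tr>
  <tr><td>Pen</td><td>35</td></tr>
</table>
"""

rows = parse_table(SAMPLE_HTML)
# Treat the first row as the header and pair it with each data row.
records = [dict(zip(rows[0], row)) for row in rows[1:]]
```

Pairing the header row with each data row gives one dict per record, which is usually easier to export to CSV or JSON than raw lists.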
-
Mildburg Beth replied to the discussion Use Node.js to scrape product titles from Books.com.tw in the forum General Web Scraping a year ago
Use Node.js to scrape product titles from Books.com.tw
To handle Chinese characters correctly, confirm which encoding the site actually serves (ideally UTF-8) and decode the response with that encoding. Using Puppeteer avoids most encoding issues because it drives a real browser, which decodes the page according to its declared charset and returns text as Unicode strings.
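The thread is about Node.js, but the encoding check itself is language-independent, so here is a rough sketch of it in Python. It assumes you have the raw response bytes and the Content-Type header value. decode_body is a hypothetical helper, and Big5 appears only as an example of a non-UTF-8 charset a site might declare, not as a claim about what Books.com.tw serves.

```python
from email.message import Message


def decode_body(raw, content_type=None, default="utf-8"):
    """Decode response bytes using the charset declared in Content-Type."""
    charset = default
    if content_type:
        # Let the standard library parse the header parameters.
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_content_charset() or default
    try:
        return raw.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown or wrong charset label: fall back, replacing bad bytes.
        return raw.decode(default, errors="replace")


title = "中文書"
utf8_text = decode_body(title.encode("utf-8"), "text/html; charset=UTF-8")
big5_text = decode_body(title.encode("big5"), "text/html; charset=Big5")
no_header_text = decode_body(title.encode("utf-8"))
```

All three calls should give back the same string. If the header is missing or wrong, the fallback keeps the scraper running, but the replaced characters are a sign that the declared charset needs checking.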
-
Mildburg Beth replied to the discussion Use Python to scrape product availability from Ruten Taiwan in the forum General Web Scraping a year ago
Use Python to scrape product availability from Ruten Taiwan
If the product availability is loaded dynamically, a browser automation tool like Selenium or Playwright, driving a headless browser, might be necessary. These tools render JavaScript content and can wait until the availability element is present in the page before scraping.
-
Mildburg Beth started the discussion How to handle multi-page scraping with pagination in Python? in the forum General Web Scraping a year ago
How to handle multi-page scraping with pagination in Python?
Scraping data across multiple pages can be challenging, especially when dealing with pagination. The key is to identify how the website handles its “Next Page” button or pagination links. For some sites, the URL changes with each page (e.g., adding ?page=2 to the URL), while others might rely on JavaScript to load more content dynamically. How…
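For the case where the URL changes with each page (?page=2 and so on), a minimal sketch of the loop could look like this. page_url, scrape_all_pages, and the example.com URLs are made up for illustration, and fake_fetch stands in for a real HTTP request plus HTML parsing so the snippet runs offline. It assumes the parameter is literally called page and that an empty page means the end of the results.

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse


def page_url(base_url, page):
    """Return base_url with its 'page' query parameter set to `page`."""
    parts = urlparse(base_url)
    query = dict(parse_qsl(parts.query))
    query["page"] = str(page)
    return urlunparse(parts._replace(query=urlencode(query)))


def scrape_all_pages(base_url, fetch_items, max_pages=50):
    """Call fetch_items(url) for page 1, 2, ... until a page yields no items."""
    results = []
    for page in range(1, max_pages + 1):
        items = fetch_items(page_url(base_url, page))
        if not items:
            break  # an empty page is treated as the end of the listing
        results.extend(items)
    return results


# Stand-in for a real site: maps page URLs to the items found on them.
FAKE_SITE = {
    "https://example.com/products?sort=new&page=1": ["A", "B"],
    "https://example.com/products?sort=new&page=2": ["C"],
}


def fake_fetch(url):
    return FAKE_SITE.get(url, [])


all_items = scrape_all_pages("https://example.com/products?sort=new", fake_fetch)
```

The max_pages cap is there so a site that never returns an empty page cannot keep the loop running forever. For sites that load more content with JavaScript, this URL-based loop does not apply as written.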