
  • Use Go to scrape product prices from PChome Taiwan

    Posted by Zaheer Arethusa on 12/14/2024 at 6:22 am

    How would you scrape product prices from PChome, one of Taiwan’s leading e-commerce websites? Are the prices located in a specific HTML element that is consistent across all products? Or does the page structure vary depending on the product category? What about dynamically loaded content—are the prices rendered directly in the HTML or fetched via an API after the page loads?
    Would Go, along with the Colly library, be suitable for this task? Colly is lightweight and efficient for scraping static content, but does it handle dynamic content well? If the prices are loaded via JavaScript, should the scraper fetch and parse the API response instead of relying on HTML elements? Here’s a potential script that might work—does it address these challenges?
    Does this script properly identify and handle the price elements? If the prices are loaded dynamically, would adding API scraping or a headless browser improve accuracy? How should edge cases, like missing prices or formatting variations, be handled? These are all considerations for building a robust scraper for PChome.

    package main

    import (
    	"fmt"
    	"log"
    	"strings"

    	"github.com/gocolly/colly"
    )

    func main() {
    	// Create a new Colly collector with an explicit User-Agent,
    	// since some sites reject requests that send none
    	c := colly.NewCollector(
    		colly.UserAgent("Mozilla/5.0 (compatible; price-scraper/1.0)"),
    	)

    	// Scrape product prices. Note that ".price-section" is a guess at
    	// the selector; inspect the page to confirm the actual class name
    	c.OnHTML(".price-section", func(e *colly.HTMLElement) {
    		price := strings.TrimSpace(e.Text)
    		if price == "" {
    			log.Println("Found a price element with no text")
    			return
    		}
    		fmt.Println("Product Price:", price)
    	})

    	// Handle request errors
    	c.OnError(func(_ *colly.Response, err error) {
    		log.Println("Error occurred:", err)
    	})

    	// Visit the PChome product page (placeholder URL)
    	if err := c.Visit("https://24h.pchome.com.tw/product-page"); err != nil {
    		log.Fatalf("Failed to visit website: %v", err)
    	}
    }
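
    On the headless-browser question: Colly does not execute JavaScript, so if inspection shows the price is rendered client-side rather than present in the initial HTML, Colly alone will never see it. Below is a minimal sketch using chromedp, reusing the same placeholder URL and ".price-section" selector as above; both are assumptions to verify against the real page.

    package main

    import (
    	"context"
    	"fmt"
    	"log"
    	"time"

    	"github.com/chromedp/chromedp"
    )

    func main() {
    	// Start a headless Chrome session
    	ctx, cancel := chromedp.NewContext(context.Background())
    	defer cancel()

    	// Give the page a bounded amount of time to render
    	ctx, cancel = context.WithTimeout(ctx, 30*time.Second)
    	defer cancel()

    	var price string
    	err := chromedp.Run(ctx,
    		chromedp.Navigate("https://24h.pchome.com.tw/product-page"),
    		// Wait until the price element exists, then read its text
    		chromedp.WaitVisible(".price-section", chromedp.ByQuery),
    		chromedp.Text(".price-section", &price, chromedp.ByQuery),
    	)
    	if err != nil {
    		log.Fatalf("Headless fetch failed: %v", err)
    	}
    	fmt.Println("Product Price:", price)
    }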
    
  • 4 Replies
  • Samir Sergo

    Member
    12/17/2024 at 9:46 am

    If the prices are dynamically loaded via JavaScript, fetching the underlying API might be a more reliable approach. Examining the network requests in the browser’s developer tools could reveal the API endpoints used to load pricing data.
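    Once the endpoint is known, a plain net/http request with JSON decoding is usually enough. In the sketch below, the endpoint URL, product ID, and response fields are all placeholders; substitute whatever the network tab actually shows.

    package main

    import (
    	"encoding/json"
    	"fmt"
    	"log"
    	"net/http"
    )

    // priceResponse mirrors an assumed JSON payload such as {"Id":"...","Price":1299}
    type priceResponse struct {
    	Id    string  `json:"Id"`
    	Price float64 `json:"Price"`
    }

    func main() {
    	// Placeholder endpoint; replace with the real one from the network tab
    	resp, err := http.Get("https://ecapi.example.com/prod/DYAJ9D-EXAMPLE/price")
    	if err != nil {
    		log.Fatalf("Request failed: %v", err)
    	}
    	defer resp.Body.Close()
    	if resp.StatusCode != http.StatusOK {
    		log.Fatalf("Unexpected status: %s", resp.Status)
    	}

    	var p priceResponse
    	if err := json.NewDecoder(resp.Body).Decode(&p); err != nil {
    		log.Fatalf("Failed to decode JSON: %v", err)
    	}
    	fmt.Printf("Product %s price: %.0f\n", p.Id, p.Price)
    }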

  • Sunny Melanija

    Member
    12/18/2024 at 8:27 am

    If the price structure varies by product category, adding conditional logic to handle different classes or tags could make the script more robust. For example, the scraper could first identify the category and adjust its queries accordingly.
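    A minimal sketch of that idea: try several candidate selectors in order and take the first non-empty match, logging pages where nothing matches. Every class name here is an assumption to verify against the real category pages.

    package main

    import (
    	"fmt"
    	"log"
    	"strings"

    	"github.com/gocolly/colly"
    )

    func main() {
    	c := colly.NewCollector()

    	// Try each candidate price selector in order of likelihood
    	c.OnHTML("body", func(e *colly.HTMLElement) {
    		for _, sel := range []string{".price-section", ".prod-price", "#PriceTotal"} {
    			if price := strings.TrimSpace(e.ChildText(sel)); price != "" {
    				fmt.Println("Product Price:", price, "(matched", sel+")")
    				return
    			}
    		}
    		log.Println("No known price selector matched:", e.Request.URL)
    	})

    	if err := c.Visit("https://24h.pchome.com.tw/product-page"); err != nil {
    		log.Fatal(err)
    	}
    }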

  • Indiana Valentim

    Member
    12/19/2024 at 11:29 am

    To handle cases where prices are missing or incorrectly formatted, the script could log these issues for review. Including detailed error messages would make it easier to identify and resolve problems in future runs.
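    One way to sketch this: extract the digits with a regular expression and return a descriptive error for anything that does not parse, so the caller can log and skip it. The accepted input formats below (e.g. "NT$1,299") are assumptions.

    package main

    import (
    	"fmt"
    	"log"
    	"regexp"
    	"strconv"
    	"strings"
    )

    var priceRe = regexp.MustCompile(`[\d,]+`)

    // parsePrice extracts a numeric value from raw price text, returning a
    // descriptive error for missing or unrecognizable input
    func parsePrice(raw string) (int, error) {
    	raw = strings.TrimSpace(raw)
    	if raw == "" {
    		return 0, fmt.Errorf("price text is empty")
    	}
    	m := priceRe.FindString(raw)
    	if m == "" {
    		return 0, fmt.Errorf("no digits found in %q", raw)
    	}
    	return strconv.Atoi(strings.ReplaceAll(m, ",", ""))
    }

    func main() {
    	// Mix of valid, blank, and malformed inputs for demonstration
    	for _, raw := range []string{"NT$1,299", "  $499 ", "", "價格未定"} {
    		price, err := parsePrice(raw)
    		if err != nil {
    			log.Printf("Skipping bad price %q: %v", raw, err)
    			continue
    		}
    		fmt.Println("Parsed price:", price)
    	}
    }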

  • Anwar Riya

    Member
    12/21/2024 at 5:15 am

    Integrating a database to store the scraped prices would allow for efficient tracking and analysis of price trends over time. Using a time-stamped schema could also help detect and monitor price fluctuations for competitive analysis.
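    A rough sketch using database/sql with SQLite (the driver choice here is arbitrary; any SQL database works the same way). Each scrape inserts a new time-stamped row, so price history accumulates for trend analysis.

    package main

    import (
    	"database/sql"
    	"log"
    	"time"

    	_ "github.com/mattn/go-sqlite3"
    )

    func main() {
    	db, err := sql.Open("sqlite3", "prices.db")
    	if err != nil {
    		log.Fatal(err)
    	}
    	defer db.Close()

    	// One row per observation, so history accumulates over time
    	_, err = db.Exec(`CREATE TABLE IF NOT EXISTS prices (
    		product_id TEXT NOT NULL,
    		price      INTEGER NOT NULL,
    		scraped_at TIMESTAMP NOT NULL
    	)`)
    	if err != nil {
    		log.Fatal(err)
    	}

    	// Example insert; in the scraper this would run once per scraped product
    	_, err = db.Exec(
    		"INSERT INTO prices (product_id, price, scraped_at) VALUES (?, ?, ?)",
    		"DYAJ9D-EXAMPLE", 1299, time.Now().UTC(),
    	)
    	if err != nil {
    		log.Fatal(err)
    	}
    }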
