
  • Use Go to scrape product prices from PChome Taiwan

    Posted by Zaheer Arethusa on 12/14/2024 at 6:22 am

    How would you scrape product prices from PChome, one of Taiwan’s leading e-commerce websites? Are the prices located in a specific HTML element that is consistent across all products, or does the page structure vary by product category? What about dynamically loaded content: are the prices rendered directly in the HTML, or fetched via an API after the page loads?
    Would Go, along with the Colly library, be suitable for this task? Colly is lightweight and efficient for scraping static content, but does it handle dynamic content well, given that it does not execute JavaScript? If the prices are loaded via JavaScript, should the scraper fetch and parse the API response instead of relying on HTML elements?
    Here’s a potential script. Does it properly identify and handle the price elements? If the prices are loaded dynamically, would calling the API directly or using a headless browser improve accuracy? How should edge cases, like missing prices or formatting variations, be handled? These are all considerations for building a robust scraper for PChome.

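    package main

    import (
    	"fmt"
    	"log"
    	"strings"

    	"github.com/gocolly/colly"
    )

    func main() {
    	// Create a Colly collector restricted to PChome's 24h shopping domain
    	c := colly.NewCollector(
    		colly.AllowedDomains("24h.pchome.com.tw"),
    	)

    	// Scrape product prices. ".price-section" only matches if the price
    	// is present in the server-rendered HTML; Colly does not run JavaScript.
    	c.OnHTML(".price-section", func(e *colly.HTMLElement) {
    		price := strings.TrimSpace(e.Text)
    		if price == "" {
    			log.Printf("Empty price element on %s", e.Request.URL)
    			return
    		}
    		fmt.Println("Product Price:", price)
    	})

    	// Handle request and HTTP errors
    	c.OnError(func(_ *colly.Response, err error) {
    		log.Println("Error occurred:", err)
    	})

    	// Visit the PChome product page (placeholder URL)
    	if err := c.Visit("https://24h.pchome.com.tw/product-page"); err != nil {
    		log.Fatalf("Failed to visit website: %v", err)
    	}
    }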
    
  • 4 Replies
  • Samir Sergo

    Member
    12/17/2024 at 9:46 am

    If the prices are dynamically loaded via JavaScript, fetching the underlying API might be a more reliable approach. Examining the network requests in the browser’s developer tools could reveal the API endpoints used to load pricing data.
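A minimal sketch of the API approach using only Go's standard library. The endpoint URL and the JSON shape (an array of objects with `id` and `price` fields) are hypothetical placeholders; the real endpoint and field names have to be read off the browser's Network tab for the page being scraped.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"log"
	"net/http"
	"time"
)

// ProductPrice mirrors a HYPOTHETICAL JSON payload; rename the fields and
// tags to match whatever the real pricing endpoint returns.
type ProductPrice struct {
	ID    string `json:"id"`
	Price int    `json:"price"`
}

// parsePrices decodes a JSON array of ProductPrice objects.
func parsePrices(data []byte) ([]ProductPrice, error) {
	var prices []ProductPrice
	if err := json.Unmarshal(data, &prices); err != nil {
		return nil, fmt.Errorf("decoding price JSON: %w", err)
	}
	return prices, nil
}

// fetchPrices GETs the pricing endpoint and hands the body to parsePrices.
func fetchPrices(client *http.Client, url string) ([]ProductPrice, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	// Some sites reject requests that lack a browser-like User-Agent.
	req.Header.Set("User-Agent", "Mozilla/5.0 (compatible; price-sketch)")

	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %d from %s", resp.StatusCode, url)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return parsePrices(body)
}

func main() {
	client := &http.Client{Timeout: 10 * time.Second}
	// HYPOTHETICAL placeholder: substitute the endpoint found in dev tools.
	const apiURL = "https://example.com/hypothetical-price-endpoint"

	prices, err := fetchPrices(client, apiURL)
	if err != nil {
		log.Fatalf("Failed to fetch prices: %v", err)
	}
	for _, p := range prices {
		fmt.Printf("Product %s: %d\n", p.ID, p.Price)
	}
}
```

Decoding is split out from the HTTP call so the JSON handling can be checked against a saved response body without touching the network.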

  • Sunny Melanija

    Member
    12/18/2024 at 8:27 am

    If the price structure varies by product category, adding conditional logic to handle different classes or tags could make the script more robust. For example, the scraper could first identify the category and adjust its queries accordingly.
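One way to express that conditional logic without tying it to Colly: keep a table of candidate selectors per category and try them in order until one yields text. The category names and CSS selectors below are hypothetical placeholders, not PChome's actual markup; how the category is detected (breadcrumb, URL path) is left open.

```go
package main

import (
	"fmt"
	"strings"
)

// selectorsByCategory maps a category to candidate price selectors, tried
// in order. All names and selectors here are HYPOTHETICAL placeholders.
var selectorsByCategory = map[string][]string{
	"electronics": {".prod-price .value", ".price-section"},
	"books":       {".book-price", ".price-section"},
}

// defaultSelectors is used when the category is unknown.
var defaultSelectors = []string{".price-section", ".price"}

// selectorsFor returns the candidate selectors for a category.
func selectorsFor(category string) []string {
	if sels, ok := selectorsByCategory[strings.ToLower(category)]; ok {
		return sels
	}
	return defaultSelectors
}

// findPrice tries each selector with lookup and returns the first
// non-empty (trimmed) result together with the selector that produced it.
func findPrice(category string, lookup func(selector string) string) (price, selector string, ok bool) {
	for _, sel := range selectorsFor(category) {
		if text := strings.TrimSpace(lookup(sel)); text != "" {
			return text, sel, true
		}
	}
	return "", "", false
}

func main() {
	// Stand-in for a real page: only ".book-price" has any text.
	fakePage := map[string]string{".book-price": "  $450 "}
	lookup := func(sel string) string { return fakePage[sel] }

	if price, sel, ok := findPrice("Books", lookup); ok {
		fmt.Printf("found %q via %s\n", price, sel)
	} else {
		fmt.Println("no price found")
	}
}
```

In a Colly `OnHTML` callback, the method value `e.ChildText` has the `func(string) string` shape expected by `lookup`, so it could be passed in directly; logging the matching selector also shows which layout each page used.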

  • Indiana Valentim

    Member
    12/19/2024 at 11:29 am

    To handle cases where prices are missing or incorrectly formatted, the script could log these issues for review. Including detailed error messages would make it easier to identify and resolve problems in future runs.

  • Anwar Riya

    Member
    12/21/2024 at 5:15 am

    Integrating a database to store the scraped prices would allow for efficient tracking and analysis of price trends over time. Using a time-stamped schema could also help detect and monitor price fluctuations for competitive analysis.
