
  • Scrape bestsellers, delivery charges, and star ratings from WHSmith UK using Go

    Posted by Candice Agata on 12/13/2024 at 6:50 am

    Scraping bestsellers, delivery charges, and star ratings from WHSmith UK involves setting up a Go application using the Colly library for efficient HTML parsing and data extraction. Bestsellers are often highlighted prominently on the homepage or in a dedicated section, typically with tags like “Bestseller” or “Top Picks.” These elements can be identified by inspecting the webpage’s structure and locating the relevant sections.
    Delivery charges, an essential detail for online shopping, are generally displayed either in the product page’s delivery information section or during the checkout process. Identifying and extracting these charges requires focusing on specific text blocks or classes labeled as shipping or delivery details.
    Star ratings for products are usually displayed as visual elements, such as filled stars, or as numerical ratings near the product reviews section. Using Colly, you can target the section where these ratings are embedded and extract them as plain text for easier analysis. Below is a complete Go implementation for scraping bestsellers, delivery charges, and star ratings from WHSmith UK:

    package main

    import (
    	"encoding/csv"
    	"fmt"
    	"log"
    	"os"

    	"github.com/gocolly/colly"
    )

    func main() {
    	// Create a new collector
    	c := colly.NewCollector()

    	// Open a CSV file to save the scraped data
    	file, err := os.Create("whsmith_data.csv")
    	if err != nil {
    		log.Fatalf("Failed to create CSV file: %v", err)
    	}
    	defer file.Close()

    	writer := csv.NewWriter(file)
    	defer writer.Flush()

    	// Write the CSV header
    	writer.Write([]string{"Bestseller", "Delivery Charges", "Star Ratings"})

    	// Scrape bestsellers. The CSS selectors below describe the listing,
    	// title, delivery, and rating elements; they must match WHSmith's
    	// current markup and need updating if the site layout changes.
    	c.OnHTML(".bestseller-item", func(e *colly.HTMLElement) {
    		bestseller := e.ChildText(".bestseller-title")
    		deliveryCharges := e.ChildText(".delivery-info")
    		starRatings := e.ChildText(".star-ratings")
    		fmt.Printf("Bestseller: %s | Delivery Charges: %s | Star Ratings: %s\n", bestseller, deliveryCharges, starRatings)
    		writer.Write([]string{bestseller, deliveryCharges, starRatings})
    	})

    	// Visit the WHSmith bestsellers page
    	err = c.Visit("https://www.whsmith.co.uk/bestsellers")
    	if err != nil {
    		log.Fatalf("Failed to visit website: %v", err)
    	}
    }
    
  • 4 Replies
  • Sandrine Vidya

    Member
    12/13/2024 at 10:34 am

    The script could be improved by implementing better error handling to manage cases where specific elements are missing or the page fails to load. For example, adding a conditional check for each element ensures the script doesn’t break unexpectedly.
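    A minimal sketch of that idea, reusing the assumed selectors from the original script: colly's OnError callback logs failed requests, and empty-string checks skip items with no title or substitute a placeholder for missing optional fields.

    package main

    import (
    	"log"

    	"github.com/gocolly/colly"
    )

    func main() {
    	c := colly.NewCollector()

    	// Log failed requests (timeouts, 404s, blocked pages) instead of ignoring them.
    	c.OnError(func(r *colly.Response, err error) {
    		log.Printf("Request to %s failed (status %d): %v", r.Request.URL, r.StatusCode, err)
    	})

    	c.OnHTML(".bestseller-item", func(e *colly.HTMLElement) {
    		title := e.ChildText(".bestseller-title")
    		if title == "" {
    			// Skip items that expose no title at all.
    			return
    		}
    		// Fall back to a placeholder when optional fields are missing,
    		// so a layout change does not produce blank or misaligned rows.
    		delivery := e.ChildText(".delivery-info")
    		if delivery == "" {
    			delivery = "N/A"
    		}
    		rating := e.ChildText(".star-ratings")
    		if rating == "" {
    			rating = "N/A"
    		}
    		log.Printf("Bestseller: %s | Delivery: %s | Rating: %s", title, delivery, rating)
    	})

    	if err := c.Visit("https://www.whsmith.co.uk/bestsellers"); err != nil {
    		log.Fatalf("Failed to visit website: %v", err)
    	}
    }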

  • Laleh Korina

    Member
    12/14/2024 at 8:29 am

    Adding functionality to scrape additional categories or pages dynamically would make the script more versatile. This can be achieved by extracting links to other sections and visiting them recursively.
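    One way to sketch that, assuming a hypothetical a.category-link selector for links to other listing pages: restrict the collector to the WHSmith domain, cap the crawl depth, and follow discovered links with e.Request.Visit so every reached page is scraped by the same product handler. Colly de-duplicates visited URLs by default, so revisits are skipped automatically.

    package main

    import (
    	"log"

    	"github.com/gocolly/colly"
    )

    func main() {
    	c := colly.NewCollector(
    		colly.AllowedDomains("www.whsmith.co.uk"), // stay on the WHSmith site
    		colly.MaxDepth(2),                         // start page plus one level of followed links
    	)

    	// Follow links to other category or listing pages (selector is an assumption).
    	c.OnHTML("a.category-link", func(e *colly.HTMLElement) {
    		link := e.Request.AbsoluteURL(e.Attr("href"))
    		if link != "" {
    			e.Request.Visit(link)
    		}
    	})

    	// The same product handler runs on every page the crawler reaches.
    	c.OnHTML(".bestseller-item", func(e *colly.HTMLElement) {
    		log.Printf("%s | %s | %s",
    			e.ChildText(".bestseller-title"),
    			e.ChildText(".delivery-info"),
    			e.ChildText(".star-ratings"))
    	})

    	if err := c.Visit("https://www.whsmith.co.uk/bestsellers"); err != nil {
    		log.Fatalf("Failed to visit website: %v", err)
    	}
    }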

  • Laurids Liljana

    Member
    12/17/2024 at 7:12 am

    To enhance usability, the script could include an option to filter products based on star ratings or delivery charges. This would make it more practical for real-world analysis or reporting.
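    A rough sketch of such a filter, assuming the rating text reduces to a leading number like "4.5 out of 5" and that free delivery is signalled by the word "free" in the delivery text; both parsing rules are guesses that would need adjusting to the real markup.

    package main

    import (
    	"log"
    	"strconv"
    	"strings"

    	"github.com/gocolly/colly"
    )

    // parseRating pulls the first numeric token out of text like "4.5 out of 5".
    func parseRating(s string) float64 {
    	for _, field := range strings.Fields(s) {
    		if v, err := strconv.ParseFloat(field, 64); err == nil {
    			return v
    		}
    	}
    	return 0
    }

    func main() {
    	minRating := 4.0 // report only well-rated products

    	c := colly.NewCollector()
    	c.OnHTML(".bestseller-item", func(e *colly.HTMLElement) {
    		title := e.ChildText(".bestseller-title")
    		delivery := e.ChildText(".delivery-info")
    		rating := parseRating(e.ChildText(".star-ratings"))

    		freeDelivery := strings.Contains(strings.ToLower(delivery), "free")
    		if rating >= minRating && freeDelivery {
    			log.Printf("%s | %.1f stars | %s", title, rating, delivery)
    		}
    	})

    	if err := c.Visit("https://www.whsmith.co.uk/bestsellers"); err != nil {
    		log.Fatalf("Failed to visit website: %v", err)
    	}
    }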

  • Sandra Gowad

    Member
    12/17/2024 at 11:10 am

    Finally, integrating the script with a database, such as MySQL or PostgreSQL, would allow for better storage and querying of the scraped data. This approach is more efficient for larger datasets compared to saving the data in a CSV file.
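    A sketch of that using database/sql with the lib/pq PostgreSQL driver; the connection string, table name, and columns are placeholders, and the selectors are again the assumed ones from the original script.

    package main

    import (
    	"database/sql"
    	"log"

    	"github.com/gocolly/colly"
    	_ "github.com/lib/pq" // PostgreSQL driver
    )

    func main() {
    	// Connection string is a placeholder; adjust credentials and database name.
    	db, err := sql.Open("postgres", "postgres://user:password@localhost/whsmith?sslmode=disable")
    	if err != nil {
    		log.Fatalf("Failed to open database: %v", err)
    	}
    	defer db.Close()

    	// Create the target table if it does not exist yet.
    	_, err = db.Exec(`CREATE TABLE IF NOT EXISTS bestsellers (
    		id SERIAL PRIMARY KEY,
    		title TEXT,
    		delivery_charges TEXT,
    		star_ratings TEXT
    	)`)
    	if err != nil {
    		log.Fatalf("Failed to create table: %v", err)
    	}

    	c := colly.NewCollector()
    	c.OnHTML(".bestseller-item", func(e *colly.HTMLElement) {
    		_, err := db.Exec(
    			"INSERT INTO bestsellers (title, delivery_charges, star_ratings) VALUES ($1, $2, $3)",
    			e.ChildText(".bestseller-title"),
    			e.ChildText(".delivery-info"),
    			e.ChildText(".star-ratings"),
    		)
    		if err != nil {
    			log.Printf("Failed to insert row: %v", err)
    		}
    	})

    	if err := c.Visit("https://www.whsmith.co.uk/bestsellers"); err != nil {
    		log.Fatalf("Failed to visit website: %v", err)
    	}
    }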
