
  • Scrape bestsellers, delivery charges, and star ratings from WHSmith UK using Go

    Posted by Candice Agata on 12/13/2024 at 6:50 am

    Scraping bestsellers, delivery charges, and star ratings from WHSmith UK involves setting up a Go application using the Colly library for efficient HTML parsing and data extraction. Bestsellers are often highlighted prominently on the homepage or in a dedicated section, typically with tags like “Bestseller” or “Top Picks.” These elements can be identified by inspecting the webpage’s structure and locating the relevant sections.
    Delivery charges, an essential detail for online shopping, are generally displayed either in the product page’s delivery information section or during the checkout process. Identifying and extracting these charges requires focusing on specific text blocks or classes labeled as shipping or delivery details.
    Star ratings for products are usually displayed as visual elements, such as filled stars, or as numerical ratings near the product reviews section. Using Colly, you can target the section where these ratings are embedded and extract them as plain text for easier analysis. Below is a complete Go implementation for scraping bestsellers, delivery charges, and star ratings from WHSmith UK:

    package main

    import (
    	"encoding/csv"
    	"fmt"
    	"log"
    	"os"

    	"github.com/gocolly/colly"
    )

    func main() {
    	// Create a new collector
    	c := colly.NewCollector()

    	// Open a CSV file to save the scraped data
    	file, err := os.Create("whsmith_data.csv")
    	if err != nil {
    		log.Fatalf("Failed to create CSV file: %v", err)
    	}
    	defer file.Close()

    	writer := csv.NewWriter(file)
    	defer writer.Flush()

    	// Write the CSV header
    	writer.Write([]string{"Bestseller", "Delivery Charges", "Star Ratings"})

    	// Scrape bestsellers. The CSS selectors below describe the listing,
    	// title, delivery, and rating elements; they must match WHSmith's
    	// current markup and need updating if the site layout changes.
    	c.OnHTML(".bestseller-item", func(e *colly.HTMLElement) {
    		bestseller := e.ChildText(".bestseller-title")
    		deliveryCharges := e.ChildText(".delivery-info")
    		starRatings := e.ChildText(".star-ratings")
    		fmt.Printf("Bestseller: %s | Delivery Charges: %s | Star Ratings: %s\n", bestseller, deliveryCharges, starRatings)
    		writer.Write([]string{bestseller, deliveryCharges, starRatings})
    	})

    	// Visit the WHSmith bestsellers page
    	err = c.Visit("https://www.whsmith.co.uk/bestsellers")
    	if err != nil {
    		log.Fatalf("Failed to visit website: %v", err)
    	}
    }
    
  • 4 Replies
  • Sandrine Vidya

    Member
    12/13/2024 at 10:34 am

    The script could be improved by implementing better error handling to manage cases where specific elements are missing or the page fails to load. For example, adding a conditional check for each element ensures the script doesn’t break unexpectedly.
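    A minimal sketch of that idea, reusing the assumed selectors from the original script: colly's OnError callback logs failed requests, and empty-string checks skip items with no title or substitute a placeholder for missing optional fields.

    package main

    import (
    	"log"

    	"github.com/gocolly/colly"
    )

    func main() {
    	c := colly.NewCollector()

    	// Log failed requests (timeouts, 404s, blocked pages) instead of ignoring them.
    	c.OnError(func(r *colly.Response, err error) {
    		log.Printf("Request to %s failed (status %d): %v", r.Request.URL, r.StatusCode, err)
    	})

    	c.OnHTML(".bestseller-item", func(e *colly.HTMLElement) {
    		title := e.ChildText(".bestseller-title")
    		if title == "" {
    			// Skip items that expose no title at all.
    			return
    		}
    		// Fall back to a placeholder when optional fields are missing,
    		// so a layout change does not produce blank or misaligned rows.
    		delivery := e.ChildText(".delivery-info")
    		if delivery == "" {
    			delivery = "N/A"
    		}
    		rating := e.ChildText(".star-ratings")
    		if rating == "" {
    			rating = "N/A"
    		}
    		log.Printf("Bestseller: %s | Delivery: %s | Rating: %s", title, delivery, rating)
    	})

    	if err := c.Visit("https://www.whsmith.co.uk/bestsellers"); err != nil {
    		log.Fatalf("Failed to visit website: %v", err)
    	}
    }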

  • Laleh Korina

    Member
    12/14/2024 at 8:29 am

    Adding functionality to scrape additional categories or pages dynamically would make the script more versatile. This can be achieved by extracting links to other sections and visiting them recursively.
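    One way to sketch that, assuming a hypothetical a.category-link selector for links to other listing pages: restrict the collector to the WHSmith domain, cap the crawl depth, and follow discovered links with e.Request.Visit so every reached page is scraped by the same product handler. Colly de-duplicates visited URLs by default, so revisits are skipped automatically.

    package main

    import (
    	"log"

    	"github.com/gocolly/colly"
    )

    func main() {
    	c := colly.NewCollector(
    		colly.AllowedDomains("www.whsmith.co.uk"), // stay on the WHSmith site
    		colly.MaxDepth(2),                         // start page plus one level of followed links
    	)

    	// Follow links to other category or listing pages (selector is an assumption).
    	c.OnHTML("a.category-link", func(e *colly.HTMLElement) {
    		link := e.Request.AbsoluteURL(e.Attr("href"))
    		if link != "" {
    			e.Request.Visit(link)
    		}
    	})

    	// The same product handler runs on every page the crawler reaches.
    	c.OnHTML(".bestseller-item", func(e *colly.HTMLElement) {
    		log.Printf("%s | %s | %s",
    			e.ChildText(".bestseller-title"),
    			e.ChildText(".delivery-info"),
    			e.ChildText(".star-ratings"))
    	})

    	if err := c.Visit("https://www.whsmith.co.uk/bestsellers"); err != nil {
    		log.Fatalf("Failed to visit website: %v", err)
    	}
    }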

  • Laurids Liljana

    Member
    12/17/2024 at 7:12 am

    To enhance usability, the script could include an option to filter products based on star ratings or delivery charges. This would make it more practical for real-world analysis or reporting.
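    A rough sketch of such a filter, assuming the rating text reduces to a leading number like "4.5 out of 5" and that free delivery is signalled by the word "free" in the delivery text; both parsing rules are guesses that would need adjusting to the real markup.

    package main

    import (
    	"log"
    	"strconv"
    	"strings"

    	"github.com/gocolly/colly"
    )

    // parseRating pulls the first numeric token out of text like "4.5 out of 5".
    func parseRating(s string) float64 {
    	for _, field := range strings.Fields(s) {
    		if v, err := strconv.ParseFloat(field, 64); err == nil {
    			return v
    		}
    	}
    	return 0
    }

    func main() {
    	minRating := 4.0 // report only well-rated products

    	c := colly.NewCollector()
    	c.OnHTML(".bestseller-item", func(e *colly.HTMLElement) {
    		title := e.ChildText(".bestseller-title")
    		delivery := e.ChildText(".delivery-info")
    		rating := parseRating(e.ChildText(".star-ratings"))

    		freeDelivery := strings.Contains(strings.ToLower(delivery), "free")
    		if rating >= minRating && freeDelivery {
    			log.Printf("%s | %.1f stars | %s", title, rating, delivery)
    		}
    	})

    	if err := c.Visit("https://www.whsmith.co.uk/bestsellers"); err != nil {
    		log.Fatalf("Failed to visit website: %v", err)
    	}
    }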

  • Sandra Gowad

    Member
    12/17/2024 at 11:10 am

    Finally, integrating the script with a database, such as MySQL or PostgreSQL, would allow for better storage and querying of the scraped data. This approach is more efficient for larger datasets compared to saving the data in a CSV file.
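    A sketch of that using database/sql with the lib/pq PostgreSQL driver; the connection string, table name, and columns are placeholders, and the selectors are again the assumed ones from the original script.

    package main

    import (
    	"database/sql"
    	"log"

    	"github.com/gocolly/colly"
    	_ "github.com/lib/pq" // PostgreSQL driver
    )

    func main() {
    	// Connection string is a placeholder; adjust credentials and database name.
    	db, err := sql.Open("postgres", "postgres://user:password@localhost/whsmith?sslmode=disable")
    	if err != nil {
    		log.Fatalf("Failed to open database: %v", err)
    	}
    	defer db.Close()

    	// Create the target table if it does not exist yet.
    	_, err = db.Exec(`CREATE TABLE IF NOT EXISTS bestsellers (
    		id SERIAL PRIMARY KEY,
    		title TEXT,
    		delivery_charges TEXT,
    		star_ratings TEXT
    	)`)
    	if err != nil {
    		log.Fatalf("Failed to create table: %v", err)
    	}

    	c := colly.NewCollector()
    	c.OnHTML(".bestseller-item", func(e *colly.HTMLElement) {
    		_, err := db.Exec(
    			"INSERT INTO bestsellers (title, delivery_charges, star_ratings) VALUES ($1, $2, $3)",
    			e.ChildText(".bestseller-title"),
    			e.ChildText(".delivery-info"),
    			e.ChildText(".star-ratings"),
    		)
    		if err != nil {
    			log.Printf("Failed to insert row: %v", err)
    		}
    	})

    	if err := c.Visit("https://www.whsmith.co.uk/bestsellers"); err != nil {
    		log.Fatalf("Failed to visit website: %v", err)
    	}
    }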
