
  • Extract top deals, shipping costs, and ratings from John Lewis UK using Go

    Posted by Narayanan Syed on 12/12/2024 at 11:34 am

Scraping data from John Lewis, a well-known UK retailer, involves targeting key information such as top deals, shipping costs, and product ratings. Using Go with the Colly library, you can automate this collection efficiently. The first step is to inspect the page's HTML and identify the specific classes or IDs that hold each piece of information.
    Top deals are usually displayed prominently on the page, often within a div or span tag that highlights discounted prices or promotions. These sections may include tags like “Offer” or “Deal,” making them easier to identify and extract. Once extracted, this data can be processed to calculate savings or identify the most competitive offers.
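    For instance, a fragment like the following (meant to sit inside the scraper shown further down) would keep only products that carry a badge. The .deal-badge selector is a guess and must be checked against the live HTML:

    c.OnHTML(".product-card", func(e *colly.HTMLElement) {
    	// The .deal-badge selector is a placeholder for whatever element
    	// carries the "Offer"/"Deal" label on the live page.
    	badge := e.ChildText(".deal-badge")
    	if badge == "" {
    		return // not a deal, skip this product
    	}
    	fmt.Println("Deal found:", e.ChildText(".product-card__title"), "-", badge)
    })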
    Shipping costs, on the other hand, are typically shown in the product details section or at checkout. It’s important to handle cases where shipping depends on the buyer’s location. If the page requires interaction before it displays shipping costs, such as selecting a region, additional tools like browser automation might be needed.
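    If the shipping estimate only renders after in-page interaction, a headless-browser sketch using the chromedp library might look like this. The product URL and the .delivery-options selector are placeholders, not the site's real values:

    package main

    import (
    	"context"
    	"fmt"
    	"log"
    	"time"

    	"github.com/chromedp/chromedp"
    )

    func main() {
    	ctx, cancel := chromedp.NewContext(context.Background())
    	defer cancel()
    	ctx, cancel = context.WithTimeout(ctx, 30*time.Second)
    	defer cancel()

    	// Both the URL and the selector below are placeholders;
    	// inspect a real product page to find the correct ones.
    	var shipping string
    	err := chromedp.Run(ctx,
    		chromedp.Navigate("https://www.johnlewis.com/p/example-product"),
    		chromedp.WaitVisible(".delivery-options", chromedp.ByQuery),
    		chromedp.Text(".delivery-options", &shipping, chromedp.ByQuery),
    	)
    	if err != nil {
    		log.Fatalf("chromedp run failed: %v", err)
    	}
    	fmt.Println("Shipping details:", shipping)
    }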
    Ratings are usually displayed in the form of stars or a numerical value. This information is critical for understanding customer feedback and product quality. Scraping ratings involves locating the HTML element where this data is stored, often in a div or span tag adjacent to reviews.
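    Since scraped ratings often come back as text, a small helper can extract the numeric value. The input format here ("4.5 out of 5 stars") is an assumption about how the site renders ratings:

    package main

    import (
    	"fmt"
    	"regexp"
    	"strconv"
    )

    // parseRating extracts the first decimal number from text such as
    // "4.5 out of 5 stars" and returns 0 when no number is present.
    func parseRating(text string) float64 {
    	match := regexp.MustCompile(`\d+(\.\d+)?`).FindString(text)
    	value, err := strconv.ParseFloat(match, 64)
    	if err != nil {
    		return 0
    	}
    	return value
    }

    func main() {
    	fmt.Println(parseRating("4.5 out of 5 stars")) // 4.5
    }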
    Using Colly in Go, you can design a scraper to navigate through pages, extract the relevant data, and save it in a structured format like JSON or CSV. Colly’s simplicity and speed make it an excellent choice for such tasks. Below is a complete implementation to scrape top deals, shipping costs, and ratings from John Lewis UK:

    package main

    import (
    	"encoding/csv"
    	"fmt"
    	"log"
    	"os"

    	"github.com/gocolly/colly"
    )

    func main() {
    	// Create a new Colly collector
    	c := colly.NewCollector()

    	// Open a CSV file for saving the scraped data
    	file, err := os.Create("john_lewis_deals.csv")
    	if err != nil {
    		log.Fatalf("Could not create file: %v", err)
    	}
    	defer file.Close()

    	writer := csv.NewWriter(file)
    	defer writer.Flush()

    	// Write the CSV header
    	writer.Write([]string{"Deal Title", "Price", "Shipping Cost", "Rating"})

    	// Scrape each product card. Note: the class names below are
    	// illustrative; inspect the live page and adjust the selectors
    	// to match the site's current markup.
    	c.OnHTML(".product-card", func(e *colly.HTMLElement) {
    		dealTitle := e.ChildText(".product-card__title")
    		price := e.ChildText(".price")
    		shippingCost := e.ChildText(".shipping-cost")
    		rating := e.ChildText(".star-rating")

    		// Print and save the data
    		fmt.Printf("Deal: %s | Price: %s | Shipping: %s | Rating: %s\n", dealTitle, price, shippingCost, rating)
    		writer.Write([]string{dealTitle, price, shippingCost, rating})
    	})

    	// Visit the deals page
    	if err := c.Visit("https://www.johnlewis.com/deals"); err != nil {
    		log.Fatalf("Failed to visit website: %v", err)
    	}
    }
    
  • 4 Replies
  • Ahmose Tetty

    Member
    12/13/2024 at 8:22 am

    One improvement to this script is handling edge cases where certain elements might be missing, such as a product without a rating or shipping cost. Adding conditional checks would ensure the script does not break when such elements are absent.
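    For example, the OnHTML callback in the script above could substitute placeholders when a field is empty (the selectors follow the original script and are themselves assumptions about the markup):

    c.OnHTML(".product-card", func(e *colly.HTMLElement) {
    	// ChildText returns an empty string when a selector matches nothing,
    	// so fall back to placeholders instead of writing blank CSV cells.
    	rating := e.ChildText(".star-rating")
    	if rating == "" {
    		rating = "N/A"
    	}
    	shippingCost := e.ChildText(".shipping-cost")
    	if shippingCost == "" {
    		shippingCost = "Unknown"
    	}
    	writer.Write([]string{e.ChildText(".product-card__title"), e.ChildText(".price"), shippingCost, rating})
    })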

  • Abidan Grete

    Member
    12/13/2024 at 10:05 am

    The script could be enhanced by adding pagination support to scrape all deals across multiple pages. This can be achieved by using Colly’s OnHTML callback to extract the “Next Page” link and visiting it recursively.
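    A minimal sketch of that idea; the a.pagination__next selector is hypothetical and needs to match the site's real pagination markup:

    c.OnHTML("a.pagination__next", func(e *colly.HTMLElement) {
    	// Resolve the relative href and queue the next page on the same collector.
    	nextPage := e.Request.AbsoluteURL(e.Attr("href"))
    	if nextPage != "" {
    		e.Request.Visit(nextPage)
    	}
    })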

  • Nitin Annemarie

    Member
    12/13/2024 at 10:48 am

    To make the script more robust, you can include a retry mechanism for failed requests. Colly’s Request type exposes a Retry method that can be called from an OnError callback, which is useful for handling temporary network issues or server-side rate limiting.
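    A sketch of that approach, capping attempts via the request context (the "retries" key name and the limit of three are arbitrary choices):

    c.OnError(func(r *colly.Response, err error) {
    	// Track the attempt count in the request context and
    	// re-enqueue the same request up to three times.
    	retries, _ := r.Ctx.GetAny("retries").(int)
    	if retries < 3 {
    		r.Ctx.Put("retries", retries+1)
    		log.Printf("Retrying %s (attempt %d): %v", r.Request.URL, retries+1, err)
    		r.Request.Retry()
    	}
    })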

  • Alexius Poncio

    Member
    12/14/2024 at 7:45 am

    Finally, integrating a database like PostgreSQL instead of writing to a CSV file would allow better data management. Using a database ensures scalability and makes querying the scraped data more efficient.
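    A rough sketch using database/sql with the lib/pq driver; the connection string, table name, and schema are all assumptions to adapt:

    package main

    import (
    	"database/sql"
    	"log"

    	_ "github.com/lib/pq" // PostgreSQL driver
    )

    func main() {
    	// The connection string is a placeholder; adjust credentials and database name.
    	db, err := sql.Open("postgres", "postgres://user:password@localhost/deals?sslmode=disable")
    	if err != nil {
    		log.Fatalf("Could not open database: %v", err)
    	}
    	defer db.Close()

    	// Create the table once; in the scraper, each OnHTML callback would
    	// run the INSERT below instead of writer.Write.
    	_, err = db.Exec(`CREATE TABLE IF NOT EXISTS deals (
    		id SERIAL PRIMARY KEY,
    		title TEXT, price TEXT, shipping_cost TEXT, rating TEXT)`)
    	if err != nil {
    		log.Fatalf("Could not create table: %v", err)
    	}

    	_, err = db.Exec(
    		`INSERT INTO deals (title, price, shipping_cost, rating) VALUES ($1, $2, $3, $4)`,
    		"Example deal", "£49.99", "£3.50", "4.5")
    	if err != nil {
    		log.Printf("Insert failed: %v", err)
    	}
    }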
