Harvesting Electronics Deals from Cimri.com Using Go & SQLite: Collecting Tech Discounts, Best-Selling Products, and Consumer Ratings

In the digital age, finding the best deals on electronics can be a daunting task. With countless online platforms offering a myriad of products, consumers often find themselves overwhelmed. Cimri.com, a popular Turkish price comparison site, offers a solution by aggregating deals and consumer ratings. This article explores how to harness the power of Go and SQLite to efficiently scrape and store data from Cimri.com, enabling users to make informed purchasing decisions.

Understanding the Basics of Web Scraping

Web scraping is the process of extracting data from websites. It involves fetching the HTML of a webpage and parsing it to retrieve the desired information. This technique is invaluable for collecting data from e-commerce sites like Cimri.com, where prices and product details are frequently updated.

Using Go, a statically typed, compiled language known for its efficiency and simplicity, we can create a robust web scraper. Go’s concurrency model allows for efficient handling of multiple requests, making it ideal for scraping large datasets. Additionally, SQLite, a lightweight database engine, provides a simple yet powerful way to store and query the scraped data.

Setting Up Your Go Environment

Before diving into the code, ensure that you have Go installed on your machine. You can download it from the official Go website. Once installed, set up your workspace by creating a new directory for your project. This will help keep your files organized and make it easier to manage dependencies.

Next, initialize a new Go module in your project directory. This will allow you to manage your project’s dependencies using Go modules. Run the following command in your terminal:

go mod init cimri-scraper

With your environment set up, you can now start writing the code to scrape Cimri.com.

Writing the Web Scraper in Go

To begin, you’ll need to import the necessary packages. The “net/http” package will allow you to make HTTP requests, while the “golang.org/x/net/html” package provides tools for parsing HTML. Additionally, you’ll use the “github.com/mattn/go-sqlite3” package to interact with SQLite.

Here’s a basic structure for your Go web scraper:

package main

import (
    "database/sql"
    "fmt"
    "net/http"
    "golang.org/x/net/html"
    _ "github.com/mattn/go-sqlite3"
)

func main() {
    // Connect to SQLite database
    db, err := sql.Open("sqlite3", "./cimri.db")
    if err != nil {
        panic(err)
    }
    defer db.Close()

    // Create table for storing product data
    createTable(db)

    // Fetch and parse data from Cimri.com
    url := "https://www.cimri.com"
    resp, err := http.Get(url)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    // Parse HTML and extract data
    doc, err := html.Parse(resp.Body)
    if err != nil {
        panic(err)
    }

    // Extract and store product data
    extractData(doc, db)
}

func createTable(db *sql.DB) {
    query := `
    CREATE TABLE IF NOT EXISTS products (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT,
        price TEXT,
        rating TEXT
    );`
    _, err := db.Exec(query)
    if err != nil {
        panic(err)
    }
}

func extractData(n *html.Node, db *sql.DB) {
    // Implement data extraction logic here
}

This code sets up a connection to an SQLite database and creates a table for storing product data. The `extractData` function will contain the logic for parsing the HTML and extracting the relevant information.

Extracting and Storing Data

To extract data from the HTML, you’ll need to traverse the DOM tree and identify the elements containing the desired information. This can be done using a recursive function that visits each node in the tree.

For example, if you’re interested in extracting product names, prices, and ratings, you might look for elements with specific class names or attributes. Once you’ve identified the relevant elements, you can extract their text content and store it in the database.

Here’s an example of how you might implement the `extractData` function:

func extractData(n *html.Node, db *sql.DB) {
    if n.Type == html.ElementNode && n.Data == "div" {
        for _, a := range n.Attr {
            if a.Key == "class" && a.Val == "product-info" {
                // Extract product details
                name := extractText(n, "product-name")
                price := extractText(n, "product-price")
                rating := extractText(n, "product-rating")

                // Insert data into database
                insertData(db, name, price, rating)
            }
        }
    }

    // Recursively visit child nodes
    for c := n.FirstChild; c != nil; c = c.NextSibling {
        extractData(c, db)
    }
}

func extractText(n *html.Node, className string) string {
    // Implement logic to extract text content based on class name
    return ""
}

func insertData(db *sql.DB, name, price, rating string) {
    query := `INSERT INTO products (name, price, rating) VALUES (?, ?, ?)`
    _, err := db.Exec(query, name, price, rating)
    if err != nil {
        panic(err)
    }
}

This code defines a recursive function that traverses the DOM tree and extracts product details based on class names. The extracted data is then inserted into the SQLite database.

Analyzing and Utilizing the Data

Once you’ve successfully scraped and stored the data, you can use SQL queries to analyze it. For example, you might want to find the top-rated products or identify the best deals based on price reductions.

SQLite provides a powerful query language that allows you to perform complex analyses on your data. You can use SQL functions to calculate averages, find maximum or minimum values, and group data by specific criteria.

Here’s an example of a query that retrieves the top 5 highest-rated products. Because the `rating` column was created as TEXT, cast it to a number so the ordering is numeric rather than lexicographic:

SELECT name, price, rating
FROM products
ORDER BY CAST(rating AS REAL) DESC
LIMIT 5;

This query sorts the products by their ratings in descending order and returns the top 5 results. You can modify the query to suit your specific needs and gain valuable insights from the data you’ve collected.
