Extract Data from Mudah.my with C# & MySQL: Extracting Classified Ads, Seller Contact Info, and Listing Prices for Market Research

In the digital age, data is a powerful tool for businesses looking to gain a competitive edge. One of the most valuable sources of data is online classified ads, which can provide insights into market trends, pricing strategies, and consumer behavior. Mudah.my, a popular online marketplace in Malaysia, offers a wealth of information that can be harnessed for market research. This article will guide you through the process of extracting data from Mudah.my using C# and MySQL, focusing on classified ads, seller contact information, and listing prices.

Understanding the Importance of Data Extraction

Data extraction from online platforms like Mudah.my is crucial for businesses aiming to understand market dynamics. By analyzing classified ads, companies can identify popular products, assess pricing strategies, and gauge consumer demand. This information is invaluable for making informed business decisions and staying ahead of competitors.

Moreover, extracting seller contact information allows businesses to build a network of potential partners or clients. It also enables targeted marketing efforts, ensuring that promotional activities reach the right audience. Finally, analyzing listing prices helps businesses set competitive prices for their products or services, maximizing profitability.

Setting Up the Development Environment

Before diving into the data extraction process, it’s essential to set up a suitable development environment. This involves installing the necessary software and tools to facilitate the extraction process. For this project, you’ll need to have C# and MySQL installed on your system.

C# is a versatile programming language that is well-suited for web scraping tasks. It offers robust libraries and frameworks that simplify the process of extracting data from websites. MySQL, on the other hand, is a powerful database management system that allows you to store and manage the extracted data efficiently.

Extracting Classified Ads with C#

To extract classified ads from Mudah.my, you’ll need to write a C# script that can navigate the website and retrieve the desired information. This involves using web scraping techniques to parse the HTML content of the site and extract relevant data points.

Here’s a basic example of a C# script that extracts classified ads from Mudah.my:

using System;
using HtmlAgilityPack;
using System.Net.Http;
using System.Threading.Tasks;

class Program
{
    static async Task Main(string[] args)
    {
        var url = "https://www.mudah.my";
        var httpClient = new HttpClient();
        // Some sites reject requests without a browser-like User-Agent header.
        httpClient.DefaultRequestHeaders.UserAgent.ParseAdd("Mozilla/5.0");
        var html = await httpClient.GetStringAsync(url);

        var htmlDocument = new HtmlDocument();
        htmlDocument.LoadHtml(html);

        // The class name 'listing_ads' is illustrative; inspect the live page,
        // as Mudah.my's markup and class names change over time.
        var ads = htmlDocument.DocumentNode.SelectNodes("//div[@class='listing_ads']");
        if (ads == null)
        {
            Console.WriteLine("No ads matched the XPath query.");
            return;
        }

        foreach (var ad in ads)
        {
            // SelectSingleNode returns null when a node is missing, so guard each lookup.
            var title = ad.SelectSingleNode(".//h2")?.InnerText.Trim() ?? "N/A";
            var price = ad.SelectSingleNode(".//span[@class='price']")?.InnerText.Trim() ?? "N/A";
            Console.WriteLine($"Title: {title}, Price: {price}");
        }
    }
}

This script uses the HtmlAgilityPack library to parse the HTML content of Mudah.my and extract the titles and prices of classified ads. You can modify the script to extract additional information, such as seller contact details, by adjusting the XPath queries.

Storing Extracted Data in MySQL

Once you’ve extracted the data, the next step is to store it in a MySQL database for further analysis. This involves creating a database schema that can accommodate the extracted information, such as ad titles, prices, and seller contact details.

Here’s an example of a MySQL script that creates a database schema for storing the extracted data:

CREATE DATABASE MudahData;

USE MudahData;

CREATE TABLE ClassifiedAds (
    AdID INT AUTO_INCREMENT PRIMARY KEY,
    Title VARCHAR(255),
    Price VARCHAR(50),
    SellerContact VARCHAR(100)
);

This script creates a database named “MudahData” and a table called “ClassifiedAds” with columns for storing ad titles, prices, and seller contact information. You can expand the schema to include additional fields as needed.
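
Note that the schema stores Price as a VARCHAR, because listings arrive as display strings. Before running numeric analysis (averages, price ranges), those strings need to be normalized. A small sketch, shown here in Python for brevity (the "RM 1,200"-style format is an assumption about how prices appear; adjust the pattern to the values your scraper actually returns):

```python
import re

def parse_price(raw):
    """Extract a numeric value from a price string such as 'RM 1,200'.

    Returns None for non-numeric listings (e.g. 'FREE' or empty strings).
    The 'RM 1,200' format is an assumption; adjust the regex to match
    what the scraper actually returns.
    """
    match = re.search(r"(\d[\d,]*(?:\.\d+)?)", raw.replace(" ", ""))
    if not match:
        return None
    return float(match.group(1).replace(",", ""))

print(parse_price("RM 1,200"))   # 1200.0
print(parse_price("RM950.50"))   # 950.5
print(parse_price("FREE"))       # None
```

Keeping the raw string in one column and the parsed number in another lets you audit parsing mistakes later.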

Integrating C# and MySQL for Data Storage

To integrate the C# script with the MySQL database, you’ll need to establish a connection between the two. This involves using a MySQL connector library in your C# project to execute SQL queries and insert the extracted data into the database.

Here’s an example of how you can modify the C# script to store the extracted data in MySQL:

using MySql.Data.MySqlClient;

// Add this method to your existing C# script
static void InsertDataIntoDatabase(string title, string price, string sellerContact)
{
    string connectionString = "Server=localhost;Database=MudahData;User ID=root;Password=yourpassword;";
    using (var connection = new MySqlConnection(connectionString))
    {
        connection.Open();
        var query = "INSERT INTO ClassifiedAds (Title, Price, SellerContact) VALUES (@Title, @Price, @SellerContact)";
        using (var command = new MySqlCommand(query, connection))
        {
            command.Parameters.AddWithValue("@Title", title);
            command.Parameters.AddWithValue("@Price", price);
            command.Parameters.AddWithValue("@SellerContact", sellerContact);
            command.ExecuteNonQuery();
        }
    }
}

This method establishes a connection to the MySQL database and inserts the extracted data into the “ClassifiedAds” table. You can call this method within your main script to store each ad’s information as it’s extracted.

Legal and Ethical Considerations

When extracting data from websites, it’s crucial to ensure compliance with legal and ethical standards. This includes respecting the website’s terms of service and privacy policies, as well as adhering to data protection regulations such as Malaysia’s Personal Data Protection Act (PDPA) and the EU’s General Data Protection Regulation (GDPR).

Before proceeding with data extraction, review Mudah.my’s terms of service to ensure that your activities are permitted. Additionally, consider implementing measures to anonymize and protect any personal data you collect, such as seller contact information.
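
One lightweight safeguard is to pseudonymize contact details before they are stored, so analyses such as counting repeat sellers still work but raw phone numbers never sit in the database. A minimal sketch in Python (the same idea can be implemented in C# with `System.Security.Cryptography`; the key name below is a placeholder):

```python
import hashlib
import hmac

# Placeholder secret for illustration only; in practice load it from a
# secrets manager or environment variable, never from source control.
SECRET_KEY = b"replace-with-a-real-secret"

def pseudonymize_contact(contact: str) -> str:
    """Map a raw contact string to a stable, irreversible token.

    The same input always yields the same token, so duplicate sellers can
    still be grouped, but the original number cannot be recovered from
    what is stored.
    """
    return hmac.new(SECRET_KEY, contact.encode("utf-8"), hashlib.sha256).hexdigest()

token = pseudonymize_contact("+60 12-345 6789")
print(len(token))  # 64 hex characters
```

You would then store the token in the SellerContact column instead of the raw number.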

Conclusion

Extracting data from Mudah.my using C# and MySQL can provide valuable insights for market research. By analyzing classified ads, seller contact information, and listing prices, businesses can make informed decisions and gain a competitive edge. This article has outlined the steps involved in setting up a development environment, extracting data with C#, storing it in a MySQL database, and ensuring compliance with legal and ethical standards. By following these guidelines, you can harness the power of data to drive your business forward.

Responses

  1. Tools Used
    Requests: To fetch HTML content.

    BeautifulSoup: To parse and extract data from the HTML.

    Psycopg2: To connect and store data in PostgreSQL.

    1. Install Required Libraries
    Before running the script, install the necessary Python libraries:

    bash
    pip install requests beautifulsoup4 psycopg2
    2. Python Web Scraper for Mudah.my
    This script scrapes title, price, location, and date posted from Mudah.my classified ads.

    python
    import requests
    from bs4 import BeautifulSoup
    import psycopg2

    # Database connection setup
    DB_NAME = "mudah_data"
    DB_USER = "postgres"
    DB_PASSWORD = "yourpassword"
    DB_HOST = "localhost"
    DB_PORT = "5432"

    def connect_db():
        return psycopg2.connect(
            dbname=DB_NAME,
            user=DB_USER,
            password=DB_PASSWORD,
            host=DB_HOST,
            port=DB_PORT,
        )

    def create_table():
        """Creates the table if it does not exist"""
        conn = connect_db()
        cursor = conn.cursor()
        cursor.execute("""
            CREATE TABLE IF NOT EXISTS classified_ads (
                id SERIAL PRIMARY KEY,
                title TEXT,
                price TEXT,
                location TEXT,
                date_posted TEXT
            );
        """)
        conn.commit()
        cursor.close()
        conn.close()

    def scrape_mudah():
        url = "https://www.mudah.my/malaysia/all"
        headers = {"User-Agent": "Mozilla/5.0"}

        response = requests.get(url, headers=headers)
        if response.status_code != 200:
            print("Failed to retrieve the page")
            return

        soup = BeautifulSoup(response.text, "html.parser")
        ads = soup.select("div.listing_ads")  # Adjust based on Mudah's actual structure

        extracted_data = []

        for ad in ads:
            title = ad.select_one("h2").get_text(strip=True) if ad.select_one("h2") else "N/A"
            price = ad.select_one("span.price").get_text(strip=True) if ad.select_one("span.price") else "N/A"
            location = ad.select_one("span.location").get_text(strip=True) if ad.select_one("span.location") else "N/A"
            date_posted = ad.select_one("span.date-posted").get_text(strip=True) if ad.select_one("span.date-posted") else "N/A"

            extracted_data.append((title, price, location, date_posted))
            print(f"Title: {title}, Price: {price}, Location: {location}, Date: {date_posted}")

        return extracted_data

    def store_data(data):
        """Stores extracted data in PostgreSQL"""
        conn = connect_db()
        cursor = conn.cursor()

        for title, price, location, date_posted in data:
            cursor.execute("""
                INSERT INTO classified_ads (title, price, location, date_posted)
                VALUES (%s, %s, %s, %s);
            """, (title, price, location, date_posted))

        conn.commit()
        cursor.close()
        conn.close()

    if __name__ == "__main__":
        create_table()  # Ensure table exists
        data = scrape_mudah()
        if data:
            store_data(data)
    3. PostgreSQL Database Setup
    Run the following SQL commands in PostgreSQL to create the database:

    sql
    CREATE DATABASE mudah_data;

    \c mudah_data;

    CREATE TABLE classified_ads (
        id SERIAL PRIMARY KEY,
        title TEXT,
        price TEXT,
        location TEXT,
        date_posted TEXT
    );
    4. How This Works
    Scrapes classified ads from Mudah.my.

    Extracts title, price, location, and date posted.

    Stores the data in a PostgreSQL database.

    5. Future Improvements
    Use Selenium for dynamic content.

    Schedule the script with cron jobs.

    Store additional details like seller contact.
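
    Building on the improvements above, a paginated crawl with a polite delay between requests could be sketched as follows. The `?o=<n>` page parameter is an assumption about Mudah.my's URL scheme; inspect the site's real "next page" links before relying on it:

```python
import time

def build_page_urls(base_url, pages):
    # The "?o=<n>" pagination parameter is a guess at Mudah.my's URL scheme;
    # verify it against the site's actual pagination links.
    return [base_url if page == 1 else f"{base_url}?o={page}"
            for page in range(1, pages + 1)]

def crawl(base_url, pages, fetch, delay_seconds=2.0):
    # fetch is any callable that takes a URL and returns parsed results,
    # e.g. a wrapper around the scrape logic above. Sleeping between
    # requests keeps the load on the server reasonable.
    results = []
    for url in build_page_urls(base_url, pages):
        results.append(fetch(url))
        time.sleep(delay_seconds)
    return results

print(build_page_urls("https://www.mudah.my/malaysia/all", 3))
# ['https://www.mudah.my/malaysia/all',
#  'https://www.mudah.my/malaysia/all?o=2',
#  'https://www.mudah.my/malaysia/all?o=3']
```

    The same loop structure works whether the fetcher uses requests or Selenium.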

  2. My version would be to use Node.js for web scraping and MongoDB for data storage

    Programming Language: JavaScript (Node.js)

    Web Scraping Library: axios, cheerio

    Database: MongoDB (NoSQL)

    Database Connector: mongodb (MongoDB Node.js driver)

    1️⃣ Install Required Dependencies
    Before running the script, install the required libraries:

    sh
    npm install axios cheerio mongodb
    2️⃣ Node.js Web Scraper
    javascript
    const axios = require("axios");
    const cheerio = require("cheerio");
    const { MongoClient } = require("mongodb");

    // MongoDB connection details
    const DB_URI = "mongodb://localhost:27017";
    const DB_NAME = "mudahDB";
    const COLLECTION_NAME = "classified_ads";

    // Function to store data in MongoDB
    async function storeData(ads) {
        // The old useNewUrlParser/useUnifiedTopology options are no longer
        // needed in recent versions of the MongoDB driver.
        const client = new MongoClient(DB_URI);
        try {
            await client.connect();
            const db = client.db(DB_NAME);
            const collection = db.collection(COLLECTION_NAME);

            await collection.insertMany(ads);
            console.log("✅ Data successfully stored in MongoDB!");
        } catch (error) {
            console.error("❌ Error storing data:", error);
        } finally {
            await client.close();
        }
    }

    // Function to scrape Mudah.my
    async function scrapeMudah() {
        const url = "https://www.mudah.my/malaysia/for-sale";
        const headers = { "User-Agent": "Mozilla/5.0" };

        try {
            const response = await axios.get(url, { headers });
            const $ = cheerio.load(response.data);

            let ads = [];
            // Class names below are illustrative; inspect the live page for current selectors
            $(".sc-1sj3nln-0").each((index, element) => {
                const title = $(element).find("h2").text().trim();
                const price = $(element).find(".sc-1kn4z61-1").text().trim() || "N/A";
                const location = $(element).find(".listing-location").text().trim() || "Unknown";
                const postDate = $(element).find(".listing-post-date").text().trim() || "Unknown";
                const seller = $(element).find(".seller-name").text().trim() || "Unknown";

                ads.push({ title, price, location, postDate, seller });
            });

            console.log("📋 Scraped Ads:", ads);
            // insertMany throws on an empty array, so only store when ads were found
            if (ads.length > 0) {
                await storeData(ads);
            }
        } catch (error) {
            console.error("❌ Error scraping Mudah.my:", error);
        }
    }

    // Run the scraper
    scrapeMudah();
    3️⃣ MongoDB Database Setup
    Start MongoDB and create the database:

    sh
    mongo
    use mudahDB
    db.createCollection("classified_ads")
    You can check stored ads using:

    mongo
    db.classified_ads.find().pretty()
    🌟 Key Improvements in This Version
    Switched to Node.js – Non-blocking, asynchronous scraping with axios and cheerio.

    Switched to MongoDB – A NoSQL database that stores ads in a flexible format.

    Extracted More Data Points:

    Location (where the item is being sold)

    Post Date (when the ad was posted)

    Seller Name (who is selling the item)

    Bulk Insertion – Instead of inserting records one by one, we insert all at once for better efficiency.

    Improved Error Handling – Handles missing data and MongoDB connection issues.

    🔜 Next Steps
    Schedule Scraper using node-cron to run periodically.

    Use Proxies to prevent IP blocks.

    Build a Dashboard to visualize data in a web UI.
