Checking Broken Links on Any Website Using C++ and MariaDB

In the digital age, maintaining a website’s integrity is crucial for user experience and search engine optimization. Broken links can lead to frustrated users and a drop in search engine rankings. This article explores how to check for broken links on any website using C++ and MariaDB, providing a comprehensive guide with examples and code snippets.

Broken links, also known as dead links, are hyperlinks that no longer lead to the intended destination. They can occur due to various reasons such as the deletion of a webpage, changes in URL structure, or server issues. Identifying and fixing these links is essential for maintaining a website’s credibility and user satisfaction.

Search engines like Google penalize websites with numerous broken links, affecting their ranking. Moreover, users encountering broken links may leave the site, increasing the bounce rate. Therefore, regular checks for broken links are necessary for both SEO and user retention.

Setting Up the Environment

Before diving into the code, it’s essential to set up the necessary environment. This involves installing a C++ compiler and MariaDB, a popular open-source relational database management system. MariaDB will be used to store and manage the URLs and their statuses.

To begin, download and install a C++ compiler such as GCC or Microsoft Visual Studio. For MariaDB, you can download the installer from the official website and follow the installation instructions. Ensure that both are correctly installed and configured on your system.
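On a Debian/Ubuntu system, for example, the pieces used in this article can be installed roughly as follows (package names vary by distribution, and `-lmariadb` vs. `-lmysqlclient` depends on which connector package you chose):

```shell
# C++ compiler and build tools
sudo apt-get install build-essential

# libcurl development headers (for the HTTP requests below)
sudo apt-get install libcurl4-openssl-dev

# MariaDB server plus the C connector headers
sudo apt-get install mariadb-server libmariadb-dev

# Later, a link checker combining both libraries would build with
# something like:
#   g++ linkchecker.cpp -lcurl -lmariadb -o linkchecker
```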

Web Scraping with C++

Web scraping is the process of extracting data from websites. In this context, we will use C++ to scrape URLs from a given website. The C++ library libcurl is an excellent tool for making HTTP requests and handling responses, which is essential for web scraping.

#include <curl/curl.h>
#include <iostream>
#include <string>

size_t WriteCallback(void* contents, size_t size, size_t nmemb, void* userp) {
    ((std::string*)userp)->append((char*)contents, size * nmemb);
    return size * nmemb;
}

void scrapeWebsite(const std::string& url) {
    CURL* curl;
    CURLcode res;
    std::string readBuffer;

    curl = curl_easy_init();
    if (curl) {
        curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
        curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, WriteCallback);
        curl_easy_setopt(curl, CURLOPT_WRITEDATA, &readBuffer);
        res = curl_easy_perform(curl);
        curl_easy_cleanup(curl);

        if (res != CURLE_OK) {
            std::cerr << "curl_easy_perform() failed: " << curl_easy_strerror(res) << std::endl;
        } else {
            std::cout << "Scraped data: " << readBuffer << std::endl;
        }
    }
}

This code snippet demonstrates how to use libcurl to fetch data from a website. The `scrapeWebsite` function takes a URL as input and prints the HTML content of the page. This content can then be parsed to extract URLs.

Storing URLs in MariaDB

Once URLs are extracted, they need to be stored in a database for further processing. MariaDB is an ideal choice due to its robustness and ease of use. Below is a simple script to create a database and a table to store URLs and their statuses.

CREATE DATABASE IF NOT EXISTS LinkChecker;
USE LinkChecker;

CREATE TABLE IF NOT EXISTS Links (
    id INT AUTO_INCREMENT PRIMARY KEY,
    url VARCHAR(255) NOT NULL,
    status VARCHAR(50)
);

This script creates a database named `LinkChecker` and a table `Links` with columns for storing the URL and its status. The `status` column will later be used to indicate whether a link is broken or not.

With URLs stored in the database, the next step is to check each link’s status. This involves sending an HTTP request to each URL and examining the response code. A 404 response indicates a broken link; other 4xx and 5xx codes, or a failure to connect at all, also signal problems worth recording.

#include <mysql/mysql.h>
#include <iostream>
#include <string>

void checkLinks() {
    MYSQL* conn;
    MYSQL_RES* res;
    MYSQL_ROW row;

    conn = mysql_init(NULL);
    if (conn == NULL) {
        std::cerr << "mysql_init() failed\n";
        return;
    }

    if (mysql_real_connect(conn, "localhost", "user", "password", "LinkChecker", 0, NULL, 0) == NULL) {
        std::cerr << "mysql_real_connect() failed\n";
        mysql_close(conn);
        return;
    }

    if (mysql_query(conn, "SELECT url FROM Links")) {
        std::cerr << "SELECT failed. Error: " << mysql_error(conn) << std::endl;
        mysql_close(conn);
        return;
    }

    res = mysql_store_result(conn);
    if (res == NULL) {
        std::cerr << "mysql_store_result() failed. Error: " << mysql_error(conn) << std::endl;
        mysql_close(conn);
        return;
    }

    while ((row = mysql_fetch_row(res)) != NULL) {
        std::string url = row[0];
        // Check URL status using libcurl and update the database
    }

    mysql_free_result(res);
    mysql_close(conn);
}

This C++ code connects to the MariaDB database, retrieves all URLs, and checks their status. The `checkLinks` function uses the MySQL C API to interact with the database. The actual status check can be implemented using libcurl, similar to the web scraping example.

After checking each link’s status, the database should be updated to reflect whether the link is broken. This involves executing an SQL `UPDATE` statement for each URL.

void updateLinkStatus(MYSQL* conn, const std::string& url, const std::string& status) {
    // Escape the URL so quotes in it cannot break the statement
    // (or be used for SQL injection).
    std::string escapedUrl(url.size() * 2 + 1, '\0');
    escapedUrl.resize(mysql_real_escape_string(conn, &escapedUrl[0], url.c_str(), url.size()));

    std::string query = "UPDATE Links SET status='" + status + "' WHERE url='" + escapedUrl + "'";
    if (mysql_query(conn, query.c_str())) {
        std::cerr << "UPDATE failed. Error: " << mysql_error(conn) << std::endl;
    }
}

The `updateLinkStatus` function updates the status of a given URL in the database. It takes the database connection, URL, and status as parameters and executes an SQL `UPDATE` statement.
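Once every link has been checked, producing a report is a single query away. Assuming the checker writes the literal string `broken` into the `status` column (the exact value is your choice), for example:

```sql
SELECT url FROM Links WHERE status = 'broken';
```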

Conclusion

Checking for broken links is a vital task for maintaining a website’s credibility, search ranking, and user experience. By combining libcurl for HTTP requests with MariaDB for storing URLs and their statuses, you can build a simple automated link checker in C++ and run it regularly to catch dead links before your visitors do.
