Use PHP to scrape discount banners from the homepage of Selfridges UK

Paula Odalys · 2024-12-13T07:33:34+00:00


Posted by Paula Odalys on 12/13/2024 at 7:33 am
Scraping discount banners from the Selfridges UK homepage with PHP involves fetching the page with the Guzzle HTTP client and parsing the HTML with DOMDocument and XPath. Discount banners are typically prominent on the homepage, showcasing ongoing promotions, seasonal sales, or limited-time offers. They often contain text like “Up to 50% off” or “Limited Time Offer” and usually sit inside div elements or sections styled for marketing.
To begin, inspect the homepage with browser developer tools to identify the structure of the banner elements: their tags and class names determine the XPath query. The script then fetches the HTML, parses it, and uses XPath expressions to extract the banner text and any associated links or images. Below is the complete PHP script for scraping discount banners from the Selfridges UK homepage:
```
<?php
require 'vendor/autoload.php';

use GuzzleHttp\Client;

// Initialize the Guzzle client and fetch the homepage
$client = new Client();
$response = $client->get('https://www.selfridges.com/GB/en/homepage');
$html = $response->getBody()->getContents();

// Load the HTML into DOMDocument, suppressing warnings from malformed markup
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

// Initialize XPath
$xpath = new DOMXPath($dom);

// Scrape discount banners. "banner-discount" is an assumed class name;
// confirm the real one by inspecting the page.
$banners = $xpath->query('//div[contains(@class, "banner-discount")]');
if ($banners->length > 0) {
    foreach ($banners as $banner) {
        $text = trim($banner->textContent);
        // item(0) is null when the banner has no <a>, so check before reading href
        $anchor = $banner->getElementsByTagName('a')->item(0);
        $link = ($anchor !== null && $anchor->hasAttribute('href'))
            ? $anchor->getAttribute('href')
            : 'No link';
        echo "Banner: $text\n";
        echo "Link: $link\n";
        echo "-------------------\n";
    }
} else {
    echo "No discount banners found.\n";
}
?>
```
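Before pointing the query at the live page, it can help to check it against a small saved fragment. The sketch below does that with an inline HTML string. The markup, the `banner-discount` class, and the `findBanners` helper are all made up for illustration and would need to be replaced with whatever the developer tools show on the real page.

```
<?php
// Hypothetical markup standing in for a saved copy of the homepage.
$sampleHtml = <<<HTML
<div class="hero banner-discount"><a href="/sale">Up to 50% off</a></div>
<div class="banner-discount">Limited Time Offer</div>
<div class="footer">Contact us</div>
HTML;

// Hypothetical helper: returns text and first link for each matching div.
function findBanners(string $html, string $className): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // Match whole class tokens so "banner-discount-old" is not picked up.
    $query = sprintf(
        '//div[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $className
    );

    $results = [];
    foreach ($xpath->query($query) as $banner) {
        $anchor = $banner->getElementsByTagName('a')->item(0);
        $results[] = [
            'text' => trim($banner->textContent),
            'link' => $anchor !== null ? $anchor->getAttribute('href') : null,
        ];
    }
    return $results;
}

print_r(findBanners($sampleHtml, 'banner-discount'));
```

Matching on a whole class token is a little stricter than a plain `contains(@class, ...)`, which also matches longer class names that merely include the string.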
4 Replies

Artur Mirjam

Member
12/13/2024 at 11:31 am

The script could be improved by handling cases where discount banners have multiple images or links. Adding logic to extract and save all related assets, such as images and descriptions, would provide a more complete dataset.
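A minimal sketch of that idea, assuming each banner is a single DOM element that may hold several `a` and `img` tags. The `extractBannerAssets` name and the sample markup are hypothetical.

```
<?php
// Collect every link and image inside one banner element.
function extractBannerAssets(DOMElement $banner): array
{
    $links = [];
    foreach ($banner->getElementsByTagName('a') as $a) {
        if ($a->hasAttribute('href')) {
            $links[] = $a->getAttribute('href');
        }
    }

    $images = [];
    foreach ($banner->getElementsByTagName('img') as $img) {
        $images[] = [
            'src' => $img->getAttribute('src'),
            'alt' => $img->getAttribute('alt'),
        ];
    }

    return [
        // Collapse runs of whitespace left over from nested markup.
        'text' => trim(preg_replace('/\s+/', ' ', $banner->textContent)),
        'links' => $links,
        'images' => $images,
    ];
}

// Hypothetical banner markup for demonstration.
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML('<div><a href="/sale"><img src="/a.jpg" alt="Sale"></a><a href="/men">Men</a></div>');
libxml_clear_errors();

$banner = $dom->getElementsByTagName('div')->item(0);
print_r(extractBannerAssets($banner));
```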
Aretha Melech

Member
12/14/2024 at 8:39 am

Another enhancement could involve scheduling this script to run periodically to track changes in discounts or new promotions on the homepage. This could be done using cron jobs in a Linux environment.
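For the tracking part, one possible approach is to compare the banner texts from the current run with those saved by the previous one. The sketch below assumes banners are compared by their text alone. The `diffBanners` helper, the sample data, and the script path in the crontab comment are illustrative only.

```
<?php
// Compare the banners from this run with those saved by the previous run.
function diffBanners(array $previous, array $current): array
{
    return [
        'added' => array_values(array_diff($current, $previous)),
        'removed' => array_values(array_diff($previous, $current)),
    ];
}

// Hypothetical data from two consecutive runs.
$previous = ['Up to 50% off', 'Free delivery'];
$current = ['Up to 50% off', 'Limited Time Offer'];
print_r(diffBanners($previous, $current));

// Hypothetical crontab entry to run a scraper script every day at 06:00:
// 0 6 * * * /usr/bin/php /path/to/scrape_banners.php
```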
Anil Dalila

Member
12/17/2024 at 7:26 am

Guzzle verifies SSL certificates by default (its 'verify' request option defaults to true), so the main thing is to make sure that option is never switched off. Leaving verification on ensures the script is really talking to the intended server over HTTPS.
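As a rough illustration, the client options could be written out explicitly. Guzzle's `verify` option already defaults to `true`, so this only makes the default visible. The timeout values and the User-Agent string are assumptions, not values from the original script.

```
<?php
// Load Composer's autoloader when it is present next to this script.
if (is_file(__DIR__ . '/vendor/autoload.php')) {
    require __DIR__ . '/vendor/autoload.php';
}

// Request options for Guzzle; 'verify' already defaults to true.
$options = [
    'verify' => true,           // keep TLS certificate verification on
    'timeout' => 10.0,          // assumed: total seconds before giving up
    'connect_timeout' => 5.0,   // assumed: seconds allowed for connecting
    'headers' => ['User-Agent' => 'BannerScraper/1.0 (hypothetical)'],
];

// Only build the client when Guzzle is actually installed.
if (class_exists(\GuzzleHttp\Client::class)) {
    $client = new \GuzzleHttp\Client($options);
}
```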
Ivo Joris

Member
12/18/2024 at 6:57 am

The script could be extended to store the scraped discount banners in a database or JSON file. This would make it easier to analyze trends or share the collected data with other systems.
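A small sketch of the JSON option, assuming the banners have already been collected into an array of text/link pairs. The `saveBanners` helper, the file name, and the sample data are hypothetical.

```
<?php
// Write the banners to a JSON file together with a timestamp for the run.
function saveBanners(array $banners, string $path): void
{
    $payload = [
        'scraped_at' => date(DATE_ATOM),
        'banners' => $banners,
    ];
    $json = json_encode(
        $payload,
        JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES | JSON_THROW_ON_ERROR
    );
    file_put_contents($path, $json, LOCK_EX);
}

// Hypothetical scraped data.
$banners = [['text' => 'Up to 50% off', 'link' => '/sale']];
$path = sys_get_temp_dir() . '/banners.json';
saveBanners($banners, $path);

// Read the file back to confirm it round-trips.
$loaded = json_decode(file_get_contents($path), true);
echo count($loaded['banners']), " banner(s) saved\n";
```

Writing one file per run, or appending rows to a database table with the same fields, would keep a history that can be compared later.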
