
  • Use PHP to scrape discount banners from the homepage of Selfridges UK

    Posted by Paula Odalys on 12/13/2024 at 7:33 am

    To scrape discount banners from the Selfridges UK homepage with PHP, fetch the page with the Guzzle HTTP client and parse the HTML with DOMDocument and DOMXPath. Discount banners are usually prominent on the homepage and showcase ongoing promotions, seasonal sales, or limited-time offers. They often contain text like “Up to 50% off” or “Limited Time Offer” and usually sit inside div elements or sections styled for marketing purposes.
    To begin, inspect the homepage with browser developer tools to identify the exact structure of the discount banner elements, noting their tags and classes so you can build a reliable query. The script then fetches the HTML and parses it to locate the relevant sections. With XPath expressions you can extract the banner text and any associated links or images; adjust the class name in the query to match what you find during inspection. Below is the complete PHP script for scraping discount banners from the Selfridges UK homepage:

    <?php
    require 'vendor/autoload.php';
    use GuzzleHttp\Client;
    // Initialize Guzzle client
    $client = new Client();
    $response = $client->get('https://www.selfridges.com/GB/en/homepage');
    $html = $response->getBody()->getContents();
    // Load HTML into DOMDocument
    $dom = new DOMDocument;
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    // Initialize XPath
    $xpath = new DOMXPath($dom);
    // Scrape discount banners
    $banners = $xpath->query('//div[contains(@class, "banner-discount")]');
    if ($banners->length > 0) {
        foreach ($banners as $banner) {
            $text = trim($banner->nodeValue);
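            // item(0) is null when the banner has no <a>, and getAttribute() returns ''
            // (not null) for a missing attribute, so ?? alone would not catch either case.
            $anchor = $banner->getElementsByTagName('a')->item(0);
            $link = ($anchor !== null && $anchor->getAttribute('href') !== '') ? $anchor->getAttribute('href') : 'No link';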
            echo "Banner: $text\n";
            echo "Link: $link\n";
            echo "-------------------\n";
        }
    } else {
        echo "No discount banners found.\n";
    }
    ?>
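
    To try the XPath query without requesting the live site, the same parsing steps can be run against a small inline HTML fragment. This is only a minimal sketch: the sample markup, the helper name extractBanners, and the banner-discount class are assumptions for illustration and may not match the real Selfridges page structure.

```php
<?php
// Hypothetical helper: parse an HTML string and return banner text/link pairs.
function extractBanners(string $html, string $className = 'banner-discount'): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate real-world, non-valid HTML
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $results = [];
    foreach ($xpath->query('//div[contains(@class, "' . $className . '")]') as $banner) {
        // Collapse runs of whitespace left over from nested markup.
        $text = trim(preg_replace('/\s+/', ' ', $banner->textContent));
        // Relative XPath: first <a> with an href inside this banner, if any.
        $anchor = $xpath->query('.//a[@href]', $banner)->item(0);
        $results[] = [
            'text' => $text,
            'link' => $anchor instanceof DOMElement ? $anchor->getAttribute('href') : null,
        ];
    }
    return $results;
}

// Made-up sample markup, not copied from the real homepage.
$sample = <<<HTML
<html><body>
  <div class="banner-discount"><a href="/sale">Up to 50% off</a></div>
  <div class="hero banner-discount">Limited Time Offer</div>
  <div class="footer">Contact us</div>
</body></html>
HTML;

$banners = extractBanners($sample);
foreach ($banners as $b) {
    echo "Banner: {$b['text']}\n";
    echo 'Link: ' . ($b['link'] ?? 'No link') . "\n";
}
```

    Returning null for a missing link, rather than a display string, leaves the choice of fallback text to the caller.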
    
  • 4 Replies
  • Artur Mirjam

    Member
    12/13/2024 at 11:31 am

    The script could be improved by handling cases where discount banners have multiple images or links. Adding logic to extract and save all related assets, such as images and descriptions, would provide a more complete dataset.
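
    One way this might look, as a rough sketch rather than a finished solution: loop over every a and img element inside a banner instead of taking only the first link. The helper name collectBannerAssets and the sample markup are made up for illustration.

```php
<?php
// Hypothetical helper: collect every link and image inside one banner element.
function collectBannerAssets(DOMElement $banner): array
{
    $links = [];
    foreach ($banner->getElementsByTagName('a') as $a) {
        if ($a->hasAttribute('href')) {
            $links[] = $a->getAttribute('href');
        }
    }

    $images = [];
    foreach ($banner->getElementsByTagName('img') as $img) {
        if ($img->hasAttribute('src')) {
            $images[] = [
                'src' => $img->getAttribute('src'),
                'alt' => $img->getAttribute('alt'), // '' when no alt text is set
            ];
        }
    }

    // Drop duplicate links and reindex the array from 0.
    return ['links' => array_values(array_unique($links)), 'images' => $images];
}

// Made-up markup for illustration only.
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML('<div class="banner-discount">
    <a href="/sale/women"><img src="/img/w.jpg" alt="Women"></a>
    <a href="/sale/men"><img src="/img/m.jpg"></a>
</div>');
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$banner = $xpath->query('//div[contains(@class, "banner-discount")]')->item(0);
$assets = collectBannerAssets($banner);
print_r($assets);
```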

  • Aretha Melech

    Member
    12/14/2024 at 8:39 am

    Another enhancement could involve scheduling this script to run periodically to track changes in discounts or new promotions on the homepage. This could be done using cron jobs in a Linux environment.
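
    A small sketch of the tracking part, assuming each run yields a plain list of banner texts: compare the previous run with the current one to see what was added or removed. The function name diffBanners, the sample data, and the paths in the crontab comment are placeholders, not part of the original script.

```php
<?php
// Hypothetical helper: compare the banners from the previous run with the current ones.
function diffBanners(array $previous, array $current): array
{
    return [
        'added'   => array_values(array_diff($current, $previous)),
        'removed' => array_values(array_diff($previous, $current)),
    ];
}

// Example crontab entry (paths are placeholders) to run a scraper script at the top of every hour:
//   0 * * * * /usr/bin/php /path/to/scrape_banners.php >> /path/to/scrape.log 2>&1

// Made-up banner texts from two consecutive runs.
$previous = ['Up to 50% off', 'Free delivery weekend'];
$current  = ['Up to 50% off', 'Limited Time Offer'];

$changes = diffBanners($previous, $current);
print_r($changes);
```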

  • Anil Dalila

    Member
    12/17/2024 at 7:26 am

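    Keeping SSL verification enabled for the Guzzle client matters for the security of the script when accessing the website. Guzzle verifies certificates by default (the verify request option defaults to true), so avoid setting it to false; if needed, verify can instead point to a custom CA bundle file. This ensures that the data is fetched over a verified HTTPS connection.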
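
    As a hedged sketch of how the client options might be set up: Guzzle's verify option accepts true (system CA bundle) or a path to a CA bundle file, and timeout and connect_timeout are standard request options. The helper name buildClientOptions and the timeout values are assumptions for illustration.

```php
<?php
// Hypothetical helper: build Guzzle client options with TLS verification kept on.
function buildClientOptions(?string $caBundlePath = null): array
{
    return [
        // true = verify against the system CA bundle; a string = path to a custom CA bundle.
        'verify'          => $caBundlePath ?? true,
        'timeout'         => 10, // seconds for the whole request (assumed value)
        'connect_timeout' => 5,  // seconds to establish the connection (assumed value)
    ];
}

$options = buildClientOptions();

// With Guzzle installed via Composer (after require 'vendor/autoload.php'),
// the options are passed to the client constructor.
if (class_exists(\GuzzleHttp\Client::class)) {
    $client = new \GuzzleHttp\Client($options);
}
```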

  • Ivo Joris

    Member
    12/18/2024 at 6:57 am

    The script could be extended to store the scraped discount banners in a database or JSON file. This would make it easier to analyze trends or share the collected data with other systems.
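
    For the JSON option, a minimal sketch could look like the following. It assumes the banners have already been collected into an array of text/link pairs; the helper name saveBannersAsJson, the sample data, and the temporary file path are all made up for illustration.

```php
<?php
// Hypothetical helper: write one scrape run to a JSON file with a timestamp.
function saveBannersAsJson(array $banners, string $path): void
{
    $payload = [
        'scraped_at' => date(DATE_ATOM),
        'banners'    => $banners,
    ];
    $json = json_encode($payload, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE);
    if ($json === false) {
        throw new RuntimeException('JSON encoding failed: ' . json_last_error_msg());
    }
    // LOCK_EX avoids interleaved writes if two runs overlap.
    if (file_put_contents($path, $json, LOCK_EX) === false) {
        throw new RuntimeException("Could not write to $path");
    }
}

// Made-up data and a temporary file, for illustration only.
$banners = [
    ['text' => 'Up to 50% off', 'link' => '/sale'],
    ['text' => 'Limited Time Offer', 'link' => null],
];
$path = sys_get_temp_dir() . '/banners.json';
saveBannersAsJson($banners, $path);

// Read the file back to confirm the round trip.
$loaded = json_decode(file_get_contents($path), true);
echo count($loaded['banners']) . " banners saved to $path\n";
```

    Writing one timestamped file per run (or appending runs to a list) would make it easier to compare promotions over time.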
