
  • Analyze discounts, seller details, and shipping options from ASDA UK using PHP

    Posted by Arturs Caleb on 12/12/2024 at 11:55 am

    Scraping discounts, seller details, and shipping options from ASDA UK requires a structured approach because modern e-commerce pages are dynamic. The first step is to inspect the page's HTML and identify the elements that hold each data point. Discounts usually appear next to the product price, often labelled “Offer” or “Discount.” These labels generally sit in span or div tags that can be targeted with CSS selectors or, when parsing with PHP's DOMXPath, with equivalent XPath expressions.
    Seller details are a critical data point, especially for third-party sellers on platforms like ASDA. This information is often presented below the product description or near the pricing section. Using PHP, you can extract these details by targeting the specific class or ID associated with the seller’s name, rating, and other details.
    Shipping options and costs vary by delivery region. These details are usually shown during checkout or as part of the product details. PHP's cURL extension fetches the page content, while DOMDocument and DOMXPath parse it and extract the data. The script targets discounts, seller information, and shipping options and saves them in a structured format such as JSON for further processing.
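    Because DOMXPath evaluates XPath rather than CSS selectors, a class selector has to be rewritten as a test on the class attribute. The sketch below shows one common way to do that. The classXPath helper, the class names, and the sample markup are illustrative assumptions, not ASDA's actual markup.

```php
<?php
// Hypothetical helper: builds an XPath expression that matches an element by a
// single class token, similar to what the CSS selector "tag.className" would match.
function classXPath(string $tag, string $class): string
{
    return sprintf(
        '//%s[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $tag,
        $class
    );
}

// Sample markup standing in for a fetched page (assumed structure).
$html = '<div class="seller-details card"><span class="name">Example Seller</span></div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // tolerate imperfect real-world HTML
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$node = $xpath->query(classXPath('div', 'seller-details'))->item(0);
$sellerName = $node ? trim($node->textContent) : null;
```

    Unlike an exact comparison such as @class="seller-details", this form still matches when the element carries additional classes.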
    Below is a basic implementation in PHP for scraping discounts, seller details, and shipping options from an ASDA UK product page.

    <?php
    // Fetch the page with cURL (the URL is a placeholder for a real product page)
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, 'https://groceries.asda.com/product-page');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    $html = curl_exec($ch);
    // Stop early if the request failed or returned nothing
    if ($html === false || $html === '') {
        fwrite(STDERR, 'Request failed: ' . curl_error($ch) . "\n");
        curl_close($ch);
        exit(1);
    }
    curl_close($ch);
    // Load HTML into DOMDocument, suppressing warnings from malformed markup
    $dom = new DOMDocument;
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    // Initialize XPath for querying
    $xpath = new DOMXPath($dom);
    // Scrape discounts (class names are examples; confirm them against the live markup)
    $discount = $xpath->query('//span[@class="offer-text"]')->item(0);
    $discountText = $discount ? trim($discount->nodeValue) : 'No discounts available';
    echo "Discount: $discountText\n";
    // Scrape seller details
    $seller = $xpath->query('//div[@class="seller-details"]')->item(0);
    $sellerText = $seller ? trim($seller->nodeValue) : 'ASDA (no third-party seller)';
    echo "Seller: $sellerText\n";
    // Scrape shipping options
    $shipping = $xpath->query('//div[@class="shipping-options"]')->item(0);
    $shippingText = $shipping ? trim($shipping->nodeValue) : 'Shipping information not available';
    echo "Shipping Options: $shippingText\n";
    // Save to JSON
    $data = [
        'discount' => $discountText,
        'seller' => $sellerText,
        'shipping' => $shippingText
    ];
    $json = json_encode($data, JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE);
    if ($json === false || file_put_contents('asda_data.json', $json) === false) {
        fwrite(STDERR, "Could not write asda_data.json\n");
    }
    ?>
    
  • 4 Replies
  • Flora Abdias

    Member
    12/13/2024 at 9:28 am

    The script could be improved by adding error handling for missing elements. For example, if discounts or shipping options are not found, logging these cases separately can help identify trends in missing data.
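    A minimal sketch of that idea, assuming a hypothetical extractField helper that records every field it could not find. The class names and sample markup are placeholders, not taken from the live site.

```php
<?php
// Hypothetical helper: returns the trimmed text of the first node matching
// $query, or null after recording and logging the missing field.
function extractField(DOMXPath $xpath, string $field, string $query, array &$missing): ?string
{
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        $missing[] = $field;
        error_log("Missing field '$field' for query: $query");
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

// Sample markup with no shipping block, standing in for a fetched page.
$html = '<div><span class="offer-text"> Rollback </span>'
      . '<div class="seller-details">Example Seller</div></div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Field name => XPath query (assumed selectors).
$queries = [
    'discount' => '//span[@class="offer-text"]',
    'seller'   => '//div[@class="seller-details"]',
    'shipping' => '//div[@class="shipping-options"]',
];

$missing = [];
$result = [];
foreach ($queries as $field => $query) {
    $result[$field] = extractField($xpath, $field, $query, $missing);
}
```

    After a run, $missing lists the fields that were absent, so it can be written to a separate log file and compared across products.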

  • Jessie Georgijs

    Member
    12/14/2024 at 8:04 am

    Another improvement would be implementing pagination support to scrape discounts and sellers across multiple products. By identifying “Next Page” buttons and recursively visiting them, the script could collect a broader dataset.
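    One way this might look, written as a loop with a page cap rather than true recursion so that a circular "next" link cannot run forever. The crawlPages function, the next-page class name, and the example.test URLs are all assumptions for illustration, and the sketch expects absolute URLs in the href attribute.

```php
<?php
// Hypothetical crawler: follows "next page" links until none is found,
// a page repeats, or $maxPages is reached.
// $fetch is any callable that takes a URL and returns HTML, or false on failure.
function crawlPages(string $startUrl, callable $fetch, string $nextQuery, int $maxPages = 20): array
{
    $pages = [];   // URL => HTML, in visit order
    $seen = [];
    $url = $startUrl;

    while ($url !== null && !isset($seen[$url]) && count($pages) < $maxPages) {
        $seen[$url] = true;
        $html = $fetch($url);
        if ($html === false || $html === '') {
            break;
        }
        $pages[$url] = $html;

        $dom = new DOMDocument();
        libxml_use_internal_errors(true);
        $dom->loadHTML($html);
        libxml_clear_errors();

        // Look for the link to the following page.
        $next = (new DOMXPath($dom))->query($nextQuery)->item(0);
        $href = $next instanceof DOMElement ? $next->getAttribute('href') : '';
        $url = $href !== '' ? $href : null;
    }
    return $pages;
}

// In-memory stand-in for the site so the sketch runs without network access.
$fakeSite = [
    'https://example.test/page1' => '<a class="next-page" href="https://example.test/page2">Next</a>',
    'https://example.test/page2' => '<p>last page</p>',
];
$fetch = fn(string $url) => $fakeSite[$url] ?? false;

$pages = crawlPages('https://example.test/page1', $fetch, '//a[@class="next-page"]');
```

    In a real run, $fetch would wrap the cURL request from the original script, and each returned page would go through the same discount and seller extraction.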

  • Isa Charly

    Member
    12/17/2024 at 6:25 am

    To make the script more dynamic, you could pass the product URL as a parameter to avoid hardcoding it. This way, the script can be reused for multiple pages without modification.
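    As a rough sketch, the URL could come from the first command-line argument and be validated before any request is made. The productUrlFromArgs helper and the scrape.php file name are hypothetical.

```php
<?php
// Hypothetical helper: reads the product URL from the first CLI argument
// and rejects anything that is not a well-formed http(s) URL.
function productUrlFromArgs(array $args): ?string
{
    $url = $args[1] ?? null;
    if (!is_string($url) || filter_var($url, FILTER_VALIDATE_URL) === false) {
        return null;
    }
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    return in_array($scheme, ['http', 'https'], true) ? $url : null;
}

// Example invocation: php scrape.php https://groceries.asda.com/product-page
$url = productUrlFromArgs($argv ?? []);
if ($url === null) {
    error_log('Usage: php scrape.php <product-url>');
}
```

    The validated $url would then replace the hardcoded string passed to CURLOPT_URL.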

  • Elora Shani

    Member
    12/17/2024 at 10:51 am

    Finally, using a database like MySQL to store the scraped data instead of saving it as JSON would allow for better querying and analysis. This approach would also make the data more accessible for integration with other tools or reports.
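    A possible starting point using PDO with a prepared statement. The asda_products table, its columns, and the connection details in the comments are assumptions, and the SQL is kept generic enough to run on MySQL.

```php
<?php
// Hypothetical schema: one row per scraped product page.
function createProductsTable(PDO $pdo): void
{
    $pdo->exec(
        'CREATE TABLE IF NOT EXISTS asda_products (
            url VARCHAR(512) NOT NULL,
            discount TEXT,
            seller TEXT,
            shipping TEXT,
            scraped_at DATETIME NOT NULL
        )'
    );
}

// Inserts one scraped record; missing fields are stored as NULL.
function saveProduct(PDO $pdo, array $row): void
{
    $stmt = $pdo->prepare(
        'INSERT INTO asda_products (url, discount, seller, shipping, scraped_at)
         VALUES (:url, :discount, :seller, :shipping, :scraped_at)'
    );
    $stmt->execute([
        ':url'        => $row['url'],
        ':discount'   => $row['discount'] ?? null,
        ':seller'     => $row['seller'] ?? null,
        ':shipping'   => $row['shipping'] ?? null,
        ':scraped_at' => date('Y-m-d H:i:s'),
    ]);
}

// Example connection (placeholder host, database name, and credentials):
// $pdo = new PDO('mysql:host=localhost;dbname=scraper;charset=utf8mb4', 'user', 'password');
// $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
// createProductsTable($pdo);
// saveProduct($pdo, ['url' => $url] + $data);
```

    Storing NULL for missing fields, instead of the fallback strings used in the JSON version, keeps queries such as "products with no discount" straightforward.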
