  • Extract discounts, product reviews, seller details from The Entertainer UK -PHP

    Posted by Michael Woo on 12/05/2024 at 12:30 pm

    Limited-time discounts, product reviews, and seller details can be scraped from The Entertainer UK in PHP, using the Guzzle HTTP client to fetch the page and DOMDocument to parse the HTML. Limited-time discounts are usually displayed prominently on the product page or in site banners, often marked with text like “Hurry, ends soon” or a countdown timer. Inspecting the markup of these sections shows which tags to target when extracting the discount details.
    Product reviews provide valuable insight into customer opinions and are typically located in a dedicated reviews section, which often includes the reviewer’s name, rating, and comment. If the reviews span multiple pages, the scraper has to handle pagination. The extracted text should also be cleaned and structured before further analysis.
    Seller details, especially for marketplace items, are usually presented near the pricing section or at the bottom of the product description. They may include the seller’s name, rating, and policies. Extracting them means targeting the specific tags that contain this information.
    Below is a PHP script that uses Guzzle and DOMDocument to scrape limited-time discounts, product reviews, and seller details from The Entertainer UK. The URL and class names are illustrative and should be adjusted to match the site's actual markup.

    <?php
    require 'vendor/autoload.php';

    use GuzzleHttp\Client;
    use GuzzleHttp\Exception\GuzzleException;

    // Initialize the Guzzle client; the timeout keeps a stalled request from hanging the script
    $client = new Client(['timeout' => 10]);

    // Fetch the page (placeholder URL) and stop early if the request fails
    try {
        $response = $client->get('https://www.thetoyshop.com/product-page');
    } catch (GuzzleException $e) {
        fwrite(STDERR, 'Request failed: ' . $e->getMessage() . "\n");
        exit(1);
    }
    $html = (string) $response->getBody();

    // Load HTML into DOMDocument, silencing warnings from imperfect real-world markup
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    // Initialize XPath
    $xpath = new DOMXPath($dom);

    // Scrape limited-time discounts
    $discount = $xpath->query('//span[contains(@class, "limited-time-discount")]')->item(0);
    $discountText = $discount ? trim($discount->nodeValue) : 'No limited-time discounts available';
    echo "Discount: $discountText\n";

    // Scrape product reviews; each field falls back to a default when its node is missing
    $reviews = $xpath->query('//div[@class="product-review"]');
    if ($reviews->length > 0) {
        foreach ($reviews as $review) {
            $reviewer = trim($xpath->query('.//span[@class="reviewer-name"]', $review)->item(0)->nodeValue ?? 'Anonymous');
            $rating = trim($xpath->query('.//span[@class="review-rating"]', $review)->item(0)->nodeValue ?? 'No rating');
            $comment = trim($xpath->query('.//p[@class="review-comment"]', $review)->item(0)->nodeValue ?? 'No comment');
            echo "Reviewer: $reviewer | Rating: $rating | Comment: $comment\n";
        }
    } else {
        echo "No reviews available\n";
    }

    // Scrape seller details
    $seller = $xpath->query('//div[@class="seller-details"]')->item(0);
    $sellerText = $seller ? trim($seller->nodeValue) : 'No seller information available';
    echo "Seller: $sellerText\n";
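
    The script above reads a single page and prints raw text, so the review pagination and data cleaning mentioned earlier are not covered. A minimal sketch of both follows, assuming PHP 7.4 or later. The helpers parseReviews and collectReviews are hypothetical names, the class names are the same placeholders as in the script, and the page source is passed in as a callable so that in-memory HTML can stand in for real HTTP responses.

```php
<?php
// Hypothetical helper: parse one page of review HTML into cleaned arrays.
// The class names mirror the placeholder selectors used in the main script.
function parseReviews(string $html): array
{
    if (trim($html) === '') {
        return [];
    }
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);

    $reviews = [];
    foreach ($xpath->query('//div[@class="product-review"]') as $node) {
        // Read a child node's text, collapse runs of whitespace, fall back to a default
        $text = function (string $query, string $default) use ($xpath, $node): string {
            $value = $xpath->query($query, $node)->item(0)->nodeValue ?? '';
            $value = trim(preg_replace('/\s+/', ' ', $value));
            return $value !== '' ? $value : $default;
        };
        $reviews[] = [
            'reviewer' => $text('.//span[@class="reviewer-name"]', 'Anonymous'),
            'rating'   => (float) $text('.//span[@class="review-rating"]', '0'),
            'comment'  => $text('.//p[@class="review-comment"]', ''),
        ];
    }
    return $reviews;
}

// Hypothetical pagination loop: $fetchPage receives a 1-based page number and
// returns that page's HTML. Stops at the first page without reviews or at $maxPages.
function collectReviews(callable $fetchPage, int $maxPages = 10): array
{
    $all = [];
    for ($page = 1; $page <= $maxPages; $page++) {
        $pageReviews = parseReviews($fetchPage($page));
        if ($pageReviews === []) {
            break;
        }
        $all = array_merge($all, $pageReviews);
    }
    return $all;
}

// Usage with in-memory pages standing in for real HTTP responses
$pages = [
    1 => '<div class="product-review"><span class="reviewer-name"> Ann </span>'
       . '<span class="review-rating">4.5</span><p class="review-comment">Great   toy</p></div>',
    2 => '<div class="product-review"><span class="review-rating">3</span>'
       . '<p class="review-comment">Okay</p></div>',
];
$reviews = collectReviews(fn (int $page): string => $pages[$page] ?? '');
print_r($reviews);
```

    In a real run the callable would request each review page with Guzzle. How the site exposes further pages (a query parameter, a "next" link, or an API call) is not something this sketch assumes, so it has to be checked against the live page.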
    
  • 4 Replies
  • Sandrine Vidya

    Member
    12/13/2024 at 10:34 am

    The script could be improved by implementing retries for failed requests. For instance, if the server responds with a temporary error or timeout, the script could attempt to fetch the page again before failing. This would help ensure reliability when dealing with unstable network connections or rate-limited websites.
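
    One way to sketch the retry idea is a small wrapper with exponential backoff. The name withRetries and its parameters are hypothetical, and the example assumes the wrapped call throws on failure, which Guzzle does by default for connection errors and 4xx/5xx responses.

```php
<?php
// Hypothetical helper: run $operation up to $maxAttempts times, sleeping with
// exponential backoff between failures. Rethrows the last exception if every attempt fails.
function withRetries(callable $operation, int $maxAttempts = 3, int $baseDelayMs = 500)
{
    for ($attempt = 1; ; $attempt++) {
        try {
            return $operation($attempt);
        } catch (Exception $e) {
            if ($attempt >= $maxAttempts) {
                throw $e;
            }
            // The delay doubles after each failure: base, 2x base, 4x base, ...
            usleep($baseDelayMs * 1000 * (2 ** ($attempt - 1)));
        }
    }
}

// Usage: a stand-in operation that fails twice, then succeeds.
// In the scraper this closure would wrap $client->get($url) instead.
$calls = 0;
$result = withRetries(function (int $attempt) use (&$calls): string {
    $calls++;
    if ($attempt < 3) {
        throw new RuntimeException("Temporary failure on attempt $attempt");
    }
    return '<html>ok</html>';
}, 3, 10);

echo "Succeeded after $calls attempts\n";
```

    Retrying every exception is a simplification. A stricter version would retry only timeouts and 5xx or 429 responses, since a 404 will not recover on a second attempt.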

  • Isaia Niko

    Member
    12/13/2024 at 11:16 am

    Adding functionality to handle dynamic content loaded via JavaScript would make the script more robust. Tools like Puppeteer or headless browsers could be integrated with PHP to ensure data that isn’t included in the initial HTML response can still be scraped. This would be especially useful for sections like dynamically updated reviews or countdown timers.
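
    A lightweight way to try this from PHP, without a Node.js bridge, is to shell out to headless Chrome and read back the rendered DOM. This is only a sketch under assumptions: the helper names are hypothetical, the binary name google-chrome depends on the system (it may be chromium or a full path), and the stderr redirect is Unix-specific.

```php
<?php
// Hypothetical helper: build a shell command asking headless Chrome to print the
// rendered DOM. The binary name 'google-chrome' is an assumption; it may be
// 'chromium' or a full path on other systems.
function buildDumpDomCommand(string $url, string $binary = 'google-chrome'): string
{
    return escapeshellcmd($binary) . ' --headless --disable-gpu --dump-dom ' . escapeshellarg($url);
}

// Hypothetical helper: run the command and return the rendered HTML, or null on failure.
function fetchRenderedHtml(string $url, string $binary = 'google-chrome'): ?string
{
    // Discard browser log noise on stderr (Unix-style redirect)
    $output = shell_exec(buildDumpDomCommand($url, $binary) . ' 2>/dev/null');
    return is_string($output) && $output !== '' ? $output : null;
}

// Show the command that would be executed for the placeholder product URL
$command = buildDumpDomCommand('https://www.thetoyshop.com/product-page');
echo $command . "\n";
```

    The string returned by fetchRenderedHtml can be passed to DOMDocument::loadHTML in place of the Guzzle response body. Content that only appears after user interaction or a delay would still need a scripted browser such as Puppeteer.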

  • Laleh Korina

    Member
    12/14/2024 at 8:29 am

    To improve security, the script could sanitize its input and validate the URLs being scraped. This would prevent potential vulnerabilities if user input is passed directly to the script, ensuring that only valid, whitelisted domains are processed. Additionally, SSL certificate verification should stay enabled for HTTPS requests; Guzzle verifies certificates by default through its verify request option, so the main point is never to turn it off.
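
    The domain check could look roughly like this. The function name isAllowedUrl and the allow-list contents are assumptions made for illustration.

```php
<?php
// Hypothetical helper: accept only well-formed HTTPS URLs whose host is on an allow-list.
function isAllowedUrl(string $url, array $allowedHosts): bool
{
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    $parts = parse_url($url);
    if ($parts === false || strtolower($parts['scheme'] ?? '') !== 'https') {
        return false;
    }
    // Compare the exact host so look-alike domains are rejected
    $host = strtolower($parts['host'] ?? '');
    return in_array($host, $allowedHosts, true);
}

$allowed = ['www.thetoyshop.com'];
var_dump(isAllowedUrl('https://www.thetoyshop.com/product-page', $allowed));   // bool(true)
var_dump(isAllowedUrl('http://www.thetoyshop.com/product-page', $allowed));    // bool(false)
var_dump(isAllowedUrl('https://www.thetoyshop.com.evil.example/x', $allowed)); // bool(false)
var_dump(isAllowedUrl('not a url', $allowed));                                 // bool(false)
```

    Matching the full host rather than searching for a substring is what stops the third URL, which merely starts with the trusted domain name.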

  • Laurids Liljana

    Member
    12/17/2024 at 7:13 am

    The script could be extended to store scraped data in a relational database like MySQL instead of simply printing it. By designing a proper schema for discounts, reviews, and seller details, the data could be queried and analyzed efficiently. For larger-scale applications, database indexing and optimization could further improve performance when querying large datasets.
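
    As a rough sketch of the storage step, the example below uses PDO with prepared statements. To stay runnable without a database server it swaps in an in-memory SQLite database instead of MySQL; the table layout, column names, and sample rows are all hypothetical.

```php
<?php
// In-memory SQLite keeps the sketch runnable without a server. For MySQL the DSN
// would look like 'mysql:host=localhost;dbname=scraper' (assumed names) plus
// credentials, and the column types would change (e.g. INT AUTO_INCREMENT).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical schema for scraped reviews
$pdo->exec(
    'CREATE TABLE reviews (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        product_url TEXT NOT NULL,
        reviewer TEXT NOT NULL,
        rating REAL,
        comment TEXT
    )'
);
// Index the column most likely to be used for lookups
$pdo->exec('CREATE INDEX idx_reviews_product_url ON reviews (product_url)');

// Sample rows standing in for scraped data
$productUrl = 'https://www.thetoyshop.com/product-page';
$scraped = [
    ['reviewer' => 'Ann', 'rating' => 5.0, 'comment' => 'Great toy'],
    ['reviewer' => 'Anonymous', 'rating' => 3.0, 'comment' => 'Okay'],
];

// A prepared statement keeps scraped text from being interpreted as SQL
$insert = $pdo->prepare(
    'INSERT INTO reviews (product_url, reviewer, rating, comment)
     VALUES (:product_url, :reviewer, :rating, :comment)'
);

// One transaction for the whole batch avoids a commit per row
$pdo->beginTransaction();
foreach ($scraped as $row) {
    $insert->execute([
        ':product_url' => $productUrl,
        ':reviewer'    => $row['reviewer'],
        ':rating'      => $row['rating'],
        ':comment'     => $row['comment'],
    ]);
}
$pdo->commit();

$count = (int) $pdo->query('SELECT COUNT(*) FROM reviews')->fetchColumn();
$average = (float) $pdo->query('SELECT AVG(rating) FROM reviews')->fetchColumn();
echo "Stored $count reviews, average rating $average\n";
```

    Discounts and seller details would get their own tables keyed by the same product URL or a product ID, which is where the schema design mentioned above comes in.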
