
  • Extracting property images and prices with PHP and DOMDocument

    Posted by Fanni Marija on 12/18/2024 at 11:02 am

    Scraping property images and prices from real estate websites is a common task in data aggregation and market analysis. PHP’s DOMDocument and DOMXPath classes provide robust methods for extracting structured data. For static pages, you can load the HTML into DOMDocument and query elements such as image URLs and prices with XPath. If the site renders its listings with JavaScript, the data will not be in the initial HTML; in that case you usually need to find the AJAX endpoint the page calls and fetch its response (often JSON) directly with cURL. Handling multiple image sizes or formats can add further complexity to the scraping process.
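For the JavaScript-rendered case, the response from the AJAX endpoint can often be decoded directly instead of parsed as HTML. The following is a minimal sketch, not a definitive implementation: the function names `parseListingsJson` and `fetchJson` and the payload shape (`listings` with `title`, `price`, `image`) are assumptions made for illustration. The real endpoint and field names have to be read from the browser's network tab.

```php
<?php
// Assumed payload shape: {"listings": [{"title": ..., "price": ..., "image": ...}]}.
// Real field names must be taken from the site's actual AJAX response.
function parseListingsJson(string $json): array {
    $data = json_decode($json, true);
    if (!is_array($data) || !isset($data['listings']) || !is_array($data['listings'])) {
        return []; // Not valid JSON, or not the shape assumed above.
    }
    $rows = [];
    foreach ($data['listings'] as $item) {
        if (!is_array($item)) {
            continue;
        }
        $rows[] = [
            'title' => $item['title'] ?? null,
            'price' => $item['price'] ?? null,
            'image' => $item['image'] ?? null,
        ];
    }
    return $rows;
}

// Fetch a JSON endpoint with cURL; returns null when the request fails.
function fetchJson(string $url): ?string {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HTTPHEADER, [
        'Accept: application/json',
        'X-Requested-With: XMLHttpRequest', // Some endpoints check for this header.
    ]);
    $body = curl_exec($ch);
    curl_close($ch);
    return is_string($body) ? $body : null;
}
```

Usage would look like `parseListingsJson(fetchJson($endpointUrl) ?? '')`, where `$endpointUrl` is whatever URL the page itself requests.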
    Here’s an example of extracting property images and prices from a static page using PHP:

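    <?php
    // Fetch the listing page and stop early if the request fails.
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, "https://example.com/properties");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $html = curl_exec($ch);
    $error = curl_error($ch);
    curl_close($ch);
    if ($html === false || $html === '') {
        exit("Request failed: $error\n");
    }
    // Real-world HTML is rarely valid; collect parser warnings instead of silencing them with @.
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);

    // Trimmed text of the first match, or 'N/A' when the node is missing.
    function firstText(DOMXPath $xpath, string $query, DOMNode $context): string {
        $node = $xpath->query($query, $context)->item(0);
        return $node ? trim($node->nodeValue) : 'N/A';
    }

    // Note: @class='...' matches only when that is the element's sole class.
    $properties = $xpath->query("//div[@class='property-item']");
    foreach ($properties as $property) {
        $title = firstText($xpath, ".//h2[@class='property-title']", $property);
        $price = firstText($xpath, ".//span[@class='property-price']", $property);
        $image = firstText($xpath, ".//img[@class='property-image']/@src", $property);
        echo "Title: $title, Price: $price, Image: $image\n";
    }
    ?>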
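The loop above reads only the `src` attribute. Where a listing offers several image sizes through a `srcset` attribute, one option is to parse the candidates and keep the widest. This is a rough sketch under stated assumptions: the helper name `largestFromSrcset` is made up for illustration, it only understands width descriptors such as `800w` (not density descriptors such as `2x`), and it splits on commas, so URLs that themselves contain commas would need more careful parsing.

```php
<?php
// Hypothetical helper: pick the URL with the largest width descriptor from a srcset value.
function largestFromSrcset(string $srcset): ?string {
    $best = null;
    $bestWidth = -1;
    foreach (explode(',', $srcset) as $candidate) {
        // Each candidate looks like "url 800w"; split it on whitespace.
        $parts = preg_split('/\s+/', trim($candidate));
        if ($parts === false || $parts[0] === '') {
            continue;
        }
        // Candidates without a width descriptor count as width 0.
        $width = 0;
        if (isset($parts[1]) && preg_match('/^(\d+)w$/', $parts[1], $m)) {
            $width = (int) $m[1];
        }
        if ($width > $bestWidth) {
            $bestWidth = $width;
            $best = $parts[0];
        }
    }
    return $best;
}
```

Inside the loop, the `img` element itself (rather than its `@src` attribute node) can be queried, and `getAttribute('srcset')` passed to this helper when `hasAttribute('srcset')` is true, falling back to `src` otherwise.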
    

    For better performance, caching responses and using multi-cURL for parallel requests can significantly speed up scraping. How do you ensure image URLs are valid and properly formatted?
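As an illustration of the caching and parallel-request idea, here is a minimal sketch using PHP's `curl_multi_*` functions and a simple file cache. The function names `fetchAll`, `cacheGet`, and `cachePut`, the cache directory, and the TTL are all assumptions for the example; a real scraper would also want rate limiting and retries.

```php
<?php
// Fetch several URLs in parallel; returns [url => body, or null on failure].
function fetchAll(array $urls, int $timeout = 15): array {
    $mh = curl_multi_init();
    $handles = [];
    foreach (array_unique($urls) as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        curl_multi_add_handle($mh, $ch);
        $handles[$url] = $ch;
    }
    // Drive all transfers until none are still running.
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running) {
            curl_multi_select($mh, 1.0); // Wait for activity instead of busy-looping.
        }
    } while ($running && $status === CURLM_OK);

    $results = [];
    foreach ($handles as $url => $ch) {
        // Treat anything outside 2xx (including 0 for connection errors) as a failure.
        $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        $results[$url] = ($code >= 200 && $code < 300) ? curl_multi_getcontent($ch) : null;
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $results;
}

// Minimal file cache keyed by a hash of the URL; $ttl is in seconds.
function cacheGet(string $dir, string $url, int $ttl): ?string {
    $file = $dir . '/' . sha1($url) . '.html';
    if (is_file($file) && time() - filemtime($file) < $ttl) {
        $body = file_get_contents($file);
        return $body === false ? null : $body;
    }
    return null;
}

function cachePut(string $dir, string $url, string $body): void {
    file_put_contents($dir . '/' . sha1($url) . '.html', $body);
}
```

A typical flow would check `cacheGet` for each URL first, pass only the misses to `fetchAll`, and store each non-null body with `cachePut`.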

  • 1 Reply
  • Hirune Islam

    Member
    12/20/2024 at 11:51 am

    I use PHP’s filter_var function to validate and sanitize image URLs. This helps ensure the URLs are well-formed and usable for downloading images later.
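A sketch of what that could look like, offered as one possible reading rather than the exact approach described in the reply. `FILTER_VALIDATE_URL` only accepts absolute URLs and does not restrict the scheme, so this example first resolves simple relative paths and then checks for `http`/`https` explicitly. The helper name `cleanImageUrl` is hypothetical, and the resolution step is deliberately simplified: it handles `//host/...` and `/path` forms only, and ignores ports and `../` segments.

```php
<?php
// Return a cleaned absolute http(s) image URL, or null if it does not validate.
// $base is the URL of the page the image was found on.
function cleanImageUrl(string $src, string $base): ?string {
    $src = trim($src);
    if ($src === '') {
        return null;
    }
    $b = parse_url($base);
    if (isset($b['scheme'], $b['host'])) {
        if (strpos($src, '//') === 0) {
            $src = $b['scheme'] . ':' . $src;                 // Protocol-relative URL.
        } elseif ($src[0] === '/') {
            $src = $b['scheme'] . '://' . $b['host'] . $src;  // Root-relative path.
        }
    }
    // Strip characters that are not legal in a URL, then validate the result.
    $src = filter_var($src, FILTER_SANITIZE_URL);
    if ($src === false || filter_var($src, FILTER_VALIDATE_URL) === false) {
        return null;
    }
    // FILTER_VALIDATE_URL accepts other schemes too, so restrict to http(s).
    $scheme = strtolower((string) parse_url($src, PHP_URL_SCHEME));
    return in_array($scheme, ['http', 'https'], true) ? $src : null;
}
```

Note that `FILTER_SANITIZE_URL` removes illegal characters such as spaces rather than encoding them, so a URL that needed encoding may point somewhere different after sanitizing.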
