  • Extracting property images and prices with PHP and DOMDocument

    Posted by Fanni Marija on 12/18/2024 at 11:02 am

    Scraping property images and prices from real estate websites is a common task for data aggregation or market analysis. PHP's DOM extension provides the DOMDocument and DOMXPath classes, which parse HTML and let you select elements such as image URLs and prices with XPath queries. This only works for content that is present in the server's HTML response. If the listings are rendered by JavaScript, DOMDocument never sees them, so you need to find the AJAX (usually JSON) endpoint the page calls and fetch it directly with cURL. Handling multiple image sizes or formats, such as responsive srcset attributes or lazy-loaded images, adds further complexity.
    Here’s an example of extracting property images and prices using PHP:

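    <?php
    // Fetch the listings page and stop if the request fails.
    $ch = curl_init("https://example.com/properties");
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT        => 30,
    ]);
    $html = curl_exec($ch);
    if ($html === false || $html === '') {
        exit("Request failed: " . curl_error($ch) . "\n");
    }
    curl_close($ch);

    // Collect libxml warnings about malformed markup instead of hiding them with @.
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);
    // Trimmed text of the first match, or null when the element is missing.
    function firstValue(DOMXPath $xpath, string $query, DOMNode $context): ?string {
        $node = $xpath->query($query, $context)->item(0);
        return $node ? trim($node->nodeValue) : null;
    }

    foreach ($xpath->query("//div[@class='property-item']") as $property) {
        $title = firstValue($xpath, ".//h2[@class='property-title']", $property);
        $price = firstValue($xpath, ".//span[@class='property-price']", $property);
        $image = firstValue($xpath, ".//img[@class='property-image']/@src", $property);
        echo "Title: $title, Price: $price, Image: $image\n";
    }
    ?>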
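
    The example reads only the src attribute. Listing sites often serve several sizes through srcset or lazy-load photos, so the full-size URL may live elsewhere. Below is a minimal sketch of picking the widest srcset candidate. It assumes width descriptors like "800w" and a data-src lazy-loading attribute, which is a common convention but not a standard one. The helper name pickImageUrl is hypothetical.

    ```php
    <?php
    // Hypothetical helper: prefer the widest srcset candidate ("url 480w, url 800w"),
    // then an assumed data-src lazy-loading attribute, then the plain src attribute.
    function pickImageUrl(DOMElement $img): ?string {
        $best = null;
        $bestWidth = -1;
        foreach (explode(',', $img->getAttribute('srcset')) as $candidate) {
            $parts = preg_split('/\s+/', trim($candidate));
            if ($parts[0] === '') {
                continue; // empty srcset or trailing comma
            }
            // Width descriptors look like "800w"; a missing descriptor counts as 0.
            $width = (isset($parts[1]) && str_ends_with($parts[1], 'w')) ? (int) $parts[1] : 0;
            if ($width > $bestWidth) {
                $best = $parts[0];
                $bestWidth = $width;
            }
        }
        foreach ([$best, $img->getAttribute('data-src'), $img->getAttribute('src')] as $url) {
            if ($url !== null && $url !== '') {
                return $url;
            }
        }
        return null;
    }
    ```

    In the loop above you would query the img element itself (.//img[@class='property-image']) rather than its @src and pass it to pickImageUrl. Splitting on commas is a simplification that breaks for URLs that themselves contain commas.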
    

    For better performance, caching responses and using cURL's multi interface (the curl_multi_* functions) to send requests in parallel can significantly speed up scraping. How do you ensure image URLs are valid and properly formatted?
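
    Here is a minimal sketch of the parallel-fetch idea using PHP's curl_multi_* functions. fetchAll is a hypothetical helper name, and caching is not shown. All requests are added to one multi handle and driven together, with curl_multi_select used to wait for activity instead of busy-looping.

    ```php
    <?php
    // Hypothetical helper: fetch many URLs concurrently and return their bodies,
    // keyed like the input array. A failed transfer comes back as an empty string.
    function fetchAll(array $urls, int $timeout = 30): array {
        $mh = curl_multi_init();
        $handles = [];
        foreach ($urls as $key => $url) {
            $ch = curl_init($url);
            curl_setopt_array($ch, [
                CURLOPT_RETURNTRANSFER => true,
                CURLOPT_FOLLOWLOCATION => true,
                CURLOPT_TIMEOUT        => $timeout,
            ]);
            curl_multi_add_handle($mh, $ch);
            $handles[$key] = $ch;
        }
        // Drive all transfers until none are still running.
        do {
            $status = curl_multi_exec($mh, $running);
            if ($running) {
                curl_multi_select($mh); // block until a transfer has activity
            }
        } while ($running && $status === CURLM_OK);

        $results = [];
        foreach ($handles as $key => $ch) {
            $results[$key] = (string) curl_multi_getcontent($ch);
            curl_multi_remove_handle($mh, $ch);
            curl_close($ch);
        }
        curl_multi_close($mh);
        return $results;
    }
    ```

    Each returned body can then be passed to DOMDocument::loadHTML as in the original example. Which page URLs to request depends on the target site.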

  • 3 Replies
  • Hirune Islam

    Member
    12/20/2024 at 11:51 am

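    I use PHP's filter_var function, with FILTER_SANITIZE_URL and FILTER_VALIDATE_URL, to clean and validate image URLs. This catches malformed URLs before I download the images later. It only checks syntax, though. Relative paths have to be resolved against the page URL first, and a well-formed URL can still point to a missing image.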
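
    Here is a minimal sketch of resolving and validating a scraped src before download. It assumes the page URL is absolute. It encodes spaces instead of using FILTER_SANITIZE_URL, because that filter removes disallowed characters such as spaces rather than percent-encoding them. The helper name normalizeImageUrl is hypothetical. The sketch does not collapse "../" segments.

    ```php
    <?php
    // Hypothetical helper: turn a scraped src into an absolute, validated http(s) URL,
    // or return null when it cannot be used. Assumes $pageUrl is absolute.
    function normalizeImageUrl(string $src, string $pageUrl): ?string {
        // Encode spaces rather than letting a sanitize filter strip them.
        $src = str_replace(' ', '%20', trim($src));
        if ($src === '' || str_starts_with($src, 'data:')) {
            return null; // nothing to download
        }
        $base = parse_url($pageUrl);
        $origin = $base['scheme'] . '://' . $base['host']
            . (isset($base['port']) ? ':' . $base['port'] : '');
        if (str_starts_with($src, '//')) {
            $src = $base['scheme'] . ':' . $src;   // protocol-relative
        } elseif (str_starts_with($src, '/')) {
            $src = $origin . $src;                 // root-relative
        } elseif (!preg_match('#^[a-z][a-z0-9+.-]*:#i', $src)) {
            // Document-relative: resolve against the page's directory.
            $dir = preg_replace('#/[^/]*$#', '/', $base['path'] ?? '/');
            $src = $origin . $dir . $src;
        }
        // FILTER_VALIDATE_URL checks syntax only, not that the image exists.
        if (filter_var($src, FILTER_VALIDATE_URL) === false) {
            return null;
        }
        $scheme = strtolower((string) parse_url($src, PHP_URL_SCHEME));
        return in_array($scheme, ['http', 'https'], true) ? $src : null;
    }
    ```

    For example, normalizeImageUrl('/img/house.jpg', 'https://example.com/properties') yields https://example.com/img/house.jpg. A javascript: or data: URL yields null.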

  • Martyn Ramadan

    Member
    01/03/2025 at 7:16 am

    For JavaScript-rendered content, I use JavaScript tooling or cURL to fetch the JSON responses the page itself loads. Working with that JSON avoids parsing HTML for every request and is usually more efficient.
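
    Here is a minimal sketch of the JSON route. It assumes a hypothetical endpoint whose response looks like {"listings": [{"title": ..., "price": ..., "images": [...]}]}. Real sites use their own field names, which you can read from the browser's network tab. The helper names extractListings and fetchJson are hypothetical.

    ```php
    <?php
    // Hypothetical response shape:
    // {"listings": [{"title": "...", "price": 350000, "images": ["...", "..."]}]}
    function extractListings(string $json): array {
        // JSON_THROW_ON_ERROR turns a malformed payload into a JsonException.
        $data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
        $rows = [];
        foreach ($data['listings'] ?? [] as $item) {
            $rows[] = [
                'title' => $item['title'] ?? null,
                'price' => $item['price'] ?? null,
                // Keep only the first image as the listing thumbnail.
                'image' => $item['images'][0] ?? null,
            ];
        }
        return $rows;
    }

    // Fetch the endpoint with cURL, asking for JSON; throws on transport failure.
    function fetchJson(string $url): string {
        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_HTTPHEADER     => ['Accept: application/json'],
        ]);
        $body = curl_exec($ch);
        if ($body === false) {
            throw new RuntimeException(curl_error($ch));
        }
        return $body;
    }
    ```

    Combined, this reads as extractListings(fetchJson($endpointUrl)), where $endpointUrl is whatever endpoint the site's own JavaScript calls.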

  • Satyendra

    Administrator
    01/20/2025 at 1:43 pm

    To manage large-scale scraping, I store images in cloud storage while maintaining metadata like titles and prices in a database for easy retrieval.
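
    Here is a minimal sketch of the database side of that setup. It assumes SQLite via PDO and a hypothetical properties table. The upload to cloud storage is left out because it depends on the provider's SDK. The helper names (initDb, storageKeyFor, saveProperty) and the property-images/ key prefix are illustrative. Deriving the object key from a hash of the image URL means re-scraping the same listing maps to the same key instead of creating a duplicate.

    ```php
    <?php
    // Hypothetical schema: image bytes live in object storage under storage_key;
    // only the key and the listing metadata are stored in the database.
    function initDb(PDO $db): void {
        $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
        $db->exec('CREATE TABLE IF NOT EXISTS properties (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            title TEXT,
            price TEXT,
            image_url TEXT UNIQUE,
            storage_key TEXT
        )');
    }

    // Deterministic object key: the same image URL always maps to the same key.
    function storageKeyFor(string $imageUrl): string {
        $path = (string) parse_url($imageUrl, PHP_URL_PATH);
        $ext = pathinfo($path, PATHINFO_EXTENSION);
        return 'property-images/' . sha1($imageUrl) . ($ext !== '' ? ".$ext" : '');
    }

    // Records the listing; INSERT OR IGNORE is SQLite syntax for skipping duplicates.
    function saveProperty(PDO $db, string $title, string $price, string $imageUrl): string {
        $key = storageKeyFor($imageUrl);
        $stmt = $db->prepare('INSERT OR IGNORE INTO properties (title, price, image_url, storage_key)
                              VALUES (?, ?, ?, ?)');
        $stmt->execute([$title, $price, $imageUrl, $key]);
        return $key;
    }
    ```

    The returned key is what you would hand to your storage provider's upload call. Other databases need their own upsert syntax in place of INSERT OR IGNORE.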
