Extracting property images and prices with PHP and DOMDocument
Scraping property images and prices from real estate websites is a common use case for data aggregation or market analysis. PHP’s DOMDocument and DOMXPath classes provide robust methods for extracting structured data. For static pages, you can parse the HTML with these classes and extract elements like image URLs and prices. If the site renders its listings with JavaScript, the data will not be in the initial HTML; in that case, use cURL to request the AJAX endpoints the page calls and parse those responses instead. Listings that serve each image in multiple sizes or formats add further complexity to the scraping process.
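Here’s an example of extracting property images and prices using PHP:

<?php
// Fetch the listing page.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://example.com/properties");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$html = curl_exec($ch);
curl_close($ch);
if ($html === false || $html === '') {
    exit("Request failed or returned an empty body\n");
}

// Real-world markup is rarely valid, so collect libxml warnings instead of suppressing them with @.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Return the trimmed text of the first match under $context, or null when nothing matches.
$first = function (string $query, DOMNode $context) use ($xpath): ?string {
    $node = $xpath->query($query, $context)->item(0);
    return $node ? trim($node->nodeValue) : null;
};

// Note: @class='...' only matches elements whose class attribute is exactly that value.
foreach ($xpath->query("//div[@class='property-item']") as $property) {
    $title = $first(".//h2[@class='property-title']", $property) ?? 'n/a';
    $price = $first(".//span[@class='property-price']", $property) ?? 'n/a';
    $image = $first(".//img[@class='property-image']/@src", $property) ?? 'n/a';
    echo "Title: $title, Price: $price, Image: $image\n";
}
?>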
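On the multiple-image-sizes point, here is a minimal sketch of one way to handle it, assuming the site exposes the sizes through a standard srcset attribute on the img element (the example above only reads src). The helper name pickLargestImage and the sample markup are hypothetical, and splitting srcset on commas is a simplification that would break on URLs containing commas.

```php
<?php
// Hypothetical helper: return the URL with the largest width descriptor
// in srcset, falling back to src when srcset is missing or has no "w" entries.
function pickLargestImage(DOMElement $img): ?string
{
    $best = $img->getAttribute('src') ?: null; // getAttribute() returns '' when absent
    $bestWidth = 0;

    $candidates = array_filter(array_map('trim', explode(',', $img->getAttribute('srcset'))));
    foreach ($candidates as $candidate) {
        // Each candidate looks like "url 640w" or "url 2x"; only width descriptors are compared.
        $parts = preg_split('/\s+/', $candidate);
        if (count($parts) === 2 && preg_match('/^(\d+)w$/', $parts[1], $m)) {
            if ((int) $m[1] > $bestWidth) {
                $bestWidth = (int) $m[1];
                $best = $parts[0];
            }
        }
    }
    return $best;
}

// Usage with made-up markup:
$demo = new DOMDocument();
$demo->loadHTML('<img class="property-image" src="/img/a-small.jpg" srcset="/img/a-320.jpg 320w, /img/a-1024.jpg 1024w">');
echo pickLargestImage($demo->getElementsByTagName('img')->item(0)), "\n"; // prints /img/a-1024.jpg
```

In the loop above you would query the img node itself (without the trailing /@src) and pass it to this helper instead of reading nodeValue.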
For better performance, caching responses and using multi-cURL for parallel requests can significantly speed up scraping. How do you ensure image URLs are valid and properly formatted?
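On the multi-cURL suggestion, here is a rough sketch of how parallel fetching could look with PHP's curl_multi_* functions. The function name fetchAll, the 15-second timeout, and the rule of treating anything outside 2xx as a failure are assumptions for illustration, not requirements of the API.

```php
<?php
// Hypothetical helper: fetch several URLs in parallel.
// Returns [url => body], with null for requests that did not end in a 2xx status.
function fetchAll(array $urls, int $timeoutSeconds = 15): array
{
    $multi = curl_multi_init();
    $handles = [];
    foreach (array_unique($urls) as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);
        curl_multi_add_handle($multi, $ch);
        $handles[$url] = $ch;
    }

    // Drive all transfers until none are still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0) {
            curl_multi_select($multi, 1.0); // wait for activity instead of busy-looping
        }
    } while ($running > 0 && $status === CURLM_OK);

    $results = [];
    foreach ($handles as $url => $ch) {
        // A failed connection reports HTTP code 0, so it also ends up as null.
        $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        $results[$url] = ($code >= 200 && $code < 300) ? curl_multi_getcontent($ch) : null;
        curl_multi_remove_handle($multi, $ch);
    }
    return $results;
}
```

You would pass it a list of listing-page URLs and feed each non-null body into the DOMDocument parsing shown earlier. For larger crawls it is worth batching the URLs so you do not open hundreds of connections to the same host at once.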