Extracting property images and prices with PHP and DOMDocument

Fanni Marija · 2024-12-18T11:02:10+00:00


Scraping property images and prices from real estate websites is a common task in data aggregation and market analysis. PHP’s DOMDocument and DOMXPath classes are well suited to extracting structured data from static pages: load the HTML, then query for elements such as image URLs and prices. DOMDocument does not execute JavaScript, so if the site renders its listings client-side, you need to find the AJAX endpoints the page calls and fetch those responses directly with cURL. Sites that serve each image in several sizes or formats add further complexity.
Here’s an example of extracting property images and prices using PHP:
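```
<?php
// Fetch the listing page and stop early if the request fails.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://example.com/properties");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$html = curl_exec($ch);
if ($html === false || $html === '') {
    exit("Request failed: " . curl_error($ch) . "\n");
}
curl_close($ch);

// Real-world HTML is rarely valid; collect libxml warnings instead of using @.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Text of the first node matching $query under $context, or "n/a" if none.
$first = function ($query, $context) use ($xpath) {
    $node = $xpath->query($query, $context)->item(0);
    return $node ? trim($node->nodeValue) : "n/a";
};

$properties = $xpath->query("//div[@class='property-item']");
foreach ($properties as $property) {
    $title = $first(".//h2[@class='property-title']", $property);
    $price = $first(".//span[@class='property-price']", $property);
    $image = $first(".//img[@class='property-image']/@src", $property);
    echo "Title: $title, Price: $price, Image: $image\n";
}
?>
```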
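One place where multiple image sizes show up is the `srcset` attribute, which some sites use alongside `src`. The sketch below assumes the target site does this and picks the candidate with the largest descriptor. The function name `largest_srcset_candidate` is made up for illustration, and the simple comma split does not handle URLs that themselves contain commas.

```php
<?php
// Hypothetical helper: return the URL with the largest descriptor in a
// srcset string such as "a-320.jpg 320w, a-640.jpg 640w" or "a.jpg 1x, b.jpg 2x".
// Assumption: candidate URLs do not contain commas.
function largest_srcset_candidate(string $srcset): ?string
{
    $bestUrl = null;
    $bestSize = -1.0;
    foreach (explode(',', $srcset) as $candidate) {
        $parts = preg_split('/\s+/', trim($candidate));
        if ($parts === false || $parts[0] === '') {
            continue; // skip empty candidates
        }
        // "640w" casts to 640.0 and "2x" to 2.0; a missing descriptor means 1x.
        $size = isset($parts[1]) ? (float) $parts[1] : 1.0;
        if ($size > $bestSize) {
            $bestSize = $size;
            $bestUrl = $parts[0];
        }
    }
    return $bestUrl;
}
```

Inside the loop over properties, the attribute could be read with a query such as `.//img/@srcset` and passed to this function, falling back to `src` when it returns null.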
For better performance, caching responses and using multi-cURL for parallel requests can significantly speed up scraping. How do you ensure image URLs are valid and properly formatted?
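As a rough illustration of the multi-cURL idea, here is a minimal sketch that downloads several pages in parallel with PHP's `curl_multi_*` functions. The function name `fetch_all` and the timeout default are assumptions for this example. It treats an empty body as a failure and does not inspect HTTP status codes, which a real scraper probably should.

```php
<?php
// Hypothetical helper: fetch several URLs in parallel.
// Returns an array with the same keys as $urls; a value is null when the
// transfer produced no body.
function fetch_all(array $urls, int $timeoutSeconds = 15): array
{
    $multi = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);
        curl_multi_add_handle($multi, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none are still running.
    $running = 0;
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0) {
            curl_multi_select($multi, 1.0); // wait for activity instead of busy-looping
        }
    } while ($running > 0 && $status === CURLM_OK);

    $bodies = [];
    foreach ($handles as $key => $ch) {
        $body = curl_multi_getcontent($ch);
        $bodies[$key] = ($body === null || $body === '') ? null : $body;
        curl_multi_remove_handle($multi, $ch);
        curl_close($ch);
    }
    curl_multi_close($multi);
    return $bodies;
}
```

A call might look like `fetch_all(['p1' => 'https://example.com/properties?page=1', 'p2' => 'https://example.com/properties?page=2'])`, where the `page` query parameter is hypothetical. Each non-null body can then be passed to `DOMDocument::loadHTML` as in the example above.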
3 Replies

Hirune Islam

Member
12/20/2024 at 11:51 am

I use PHP’s filter_var function to validate and sanitize image URLs. This ensures the URLs are safe and usable for downloading images later.
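A minimal sketch of that approach might look like the following. The helper names `is_valid_image_url` and `resolve_image_url` are made up for this example. It assumes only `http` and `https` images are wanted, and it resolves relative `src` values against the page URL first, since `FILTER_VALIDATE_URL` rejects relative paths. Dot segments such as `../` are not normalized.

```php
<?php
// Hypothetical helper: accept only syntactically valid http(s) URLs.
function is_valid_image_url(string $url): bool
{
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    return in_array($scheme, ['http', 'https'], true);
}

// Hypothetical helper: turn a scraped src value into an absolute URL,
// or return null if the result does not validate.
function resolve_image_url(string $src, string $pageUrl): ?string
{
    $src = trim($src);
    $base = parse_url($pageUrl);
    if ($src === '' || !is_array($base) || !isset($base['scheme'], $base['host'])) {
        return null;
    }
    $origin = $base['scheme'] . '://' . $base['host']
        . (isset($base['port']) ? ':' . $base['port'] : '');

    if (is_string(parse_url($src, PHP_URL_SCHEME))) {
        $url = $src;                              // already absolute
    } elseif (substr($src, 0, 2) === '//') {
        $url = $base['scheme'] . ':' . $src;      // protocol-relative
    } elseif ($src[0] === '/') {
        $url = $origin . $src;                    // root-relative
    } else {
        $path = $base['path'] ?? '/';             // relative to the page's directory
        $dir = substr($path, 0, (int) strrpos($path, '/') + 1);
        $url = $origin . $dir . $src;
    }
    return is_valid_image_url($url) ? $url : null;
}
```

For instance, `resolve_image_url('/img/house.jpg', 'https://example.com/properties')` yields `https://example.com/img/house.jpg`, while a `javascript:` value yields null.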
Martyn Ramadan

Member
01/03/2025 at 7:16 am

For dynamic content, I use JavaScript libraries or cURL to fetch JSON responses. This method avoids parsing HTML for every request and improves efficiency.
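To illustrate the JSON route with cURL, here is a small sketch. Everything about the response shape is an assumption: the `properties` key and the `title`, `price`, and `image_url` fields are invented for the example, and a real site's endpoint would have to be found by inspecting the page's network requests. The function names are likewise hypothetical.

```php
<?php
// Hypothetical helper: fetch a JSON endpoint and return the raw body.
function fetch_json(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HTTPHEADER, ['Accept: application/json']);
    $body = curl_exec($ch);
    $error = curl_error($ch);
    curl_close($ch);
    if ($body === false) {
        throw new RuntimeException("Request failed: $error");
    }
    return $body;
}

// Hypothetical helper: map an assumed payload of the form
// {"properties": [{"title": ..., "price": ..., "image_url": ...}]}
// to a list of rows. Throws JsonException on malformed JSON.
function parse_listings_json(string $json): array
{
    $data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    $items = is_array($data) ? ($data['properties'] ?? []) : [];
    $rows = [];
    foreach (is_array($items) ? $items : [] as $item) {
        $rows[] = [
            'title' => $item['title'] ?? null,
            'price' => $item['price'] ?? null,
            'image' => $item['image_url'] ?? null,
        ];
    }
    return $rows;
}
```

The two would be combined as `parse_listings_json(fetch_json($endpoint))`, with `$endpoint` set to whatever URL the site's own JavaScript requests.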
Satyendra

Administrator
01/20/2025 at 1:43 pm

To manage large-scale scraping, I store images in cloud storage while maintaining metadata like titles and prices in a database for easy retrieval.
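One way this split could be sketched is shown below, using PDO with SQLite purely as a stand-in database. The table layout, the `properties/` key prefix, and the function names are all assumptions for illustration. The upload itself is left out because it depends on the storage provider's SDK; only the object key under which the image would be stored is recorded.

```php
<?php
// Hypothetical schema: one row of metadata per stored image.
function create_schema(PDO $db): void
{
    $db->exec('CREATE TABLE IF NOT EXISTS properties (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        title TEXT NOT NULL,
        price TEXT,
        image_key TEXT NOT NULL UNIQUE
    )');
}

// Hypothetical helper: derive a stable storage key from the image URL so the
// same image is never stored twice. The "properties/" prefix is an assumption.
function image_key(string $imageUrl): string
{
    $path = (string) parse_url($imageUrl, PHP_URL_PATH);
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return 'properties/' . sha1($imageUrl) . ($ext !== '' ? '.' . $ext : '');
}

// Hypothetical helper: record the metadata and return the key under which the
// image should be uploaded with the provider's own SDK (not shown here).
function save_property(PDO $db, string $title, ?string $price, string $imageUrl): string
{
    $key = image_key($imageUrl);
    $stmt = $db->prepare(
        'INSERT OR IGNORE INTO properties (title, price, image_key)
         VALUES (:title, :price, :key)'
    );
    $stmt->execute([':title' => $title, ':price' => $price, ':key' => $key]);
    return $key;
}
```

Looking up a listing later is then a plain `SELECT` on the `properties` table, with `image_key` pointing at the object in cloud storage.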
