How to scrape rental property data from Trulia.com using Ruby?
Scraping rental property data from Trulia.com with Ruby lets you collect information such as property addresses, rental prices, and key features. Ruby's open-uri library fetches the HTML, and the nokogiri gem parses it so you can extract structured data. The script walks the page structure, finds the property cards, and pulls specific details from each one. This approach works for content that is present in the static HTML; anything rendered later by JavaScript will not appear in the fetched page. Below is an example Ruby script to scrape rental property information from Trulia.
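require 'open-uri'
require 'nokogiri'

# Target URL
url = 'https://www.trulia.com/for_rent/San_Francisco,CA'
html = URI.open(url).read

# Parse HTML
doc = Nokogiri::HTML(html)

# Return the stripped text of the first match, or a default when the element
# is missing or empty. A trailing `rescue` never fires here, because calling
# .text on an empty result returns "" instead of raising.
def text_or(node, selector, default)
  text = node.at_css(selector)&.text.to_s.strip
  text.empty? ? default : text
end

# Extract property details.
# NOTE: all three fields use the same selector, as in the original post, so
# they return the same text until each one is replaced with a field-specific
# selector taken from the live page.
doc.css('.Grid__CellBox-sc-1njij7e-0').each do |property|
  name    = text_or(property, '.Text__TextBase-sc-1cait9d-0', 'No name available')
  price   = text_or(property, '.Text__TextBase-sc-1cait9d-0', 'No price available')
  details = text_or(property, '.Text__TextBase-sc-1cait9d-0', 'No details available')
  puts "Name: #{name}, Price: #{price}, Details: #{details}"
end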
This script uses open-uri to retrieve the Trulia rental listings page and nokogiri to parse the HTML. It extracts property names, prices, and features using CSS selectors, and is meant to fall back to a default message when an element is missing. The class names in the selectors look auto-generated, so check them against the live page before relying on them. To scrape multiple pages, detect and follow the “Next” link until none remains. Adding random delays between requests spaces out the traffic and makes blocking by anti-scraping mechanisms less likely, and storing the results in a structured format such as CSV or a database makes them easier to analyze.