Scraping job postings and locations using Ruby and Nokogiri
Scraping job postings and their locations is a common task for market research and recruitment platforms. Ruby's Nokogiri library parses HTML well and makes it straightforward to pull data out of job boards or company career pages with CSS selectors. Most listings mark up the job title, company name, and location with consistent HTML tags or classes, so a handful of selectors usually covers a whole page. Nokogiri only sees the HTML the server returns, so for pages that render their listings with JavaScript, Capybara paired with a JavaScript-capable driver can load the page first. Pagination also needs handling; otherwise only the first page of postings is collected.
require 'nokogiri'
require 'open-uri'

url = 'https://example.com/jobs'
doc = Nokogiri::HTML(URI.open(url))

# Each posting is expected to sit in an element with the class "job-item"
doc.css('.job-item').each do |job|
  title = job.css('.job-title').text.strip
  location = job.css('.job-location').text.strip
  puts "Job Title: #{title}, Location: #{location}"
end
Adding error handling and validating the extracted data makes the scraper more reliable. How do you address frequently changing HTML structures when scraping job data?
- This discussion was modified 1 week, 6 days ago by Katriona Felicyta.