How to scrape car listings from AutoScout24.com using Python?
Scraping car listings from AutoScout24.com using Python allows you to gather data such as car models, prices, and mileage, providing valuable insights into the automotive market. AutoScout24 is one of Europe’s largest online car marketplaces, making it a great source for analyzing pricing trends and car availability. By using Python’s HTTP libraries, you can fetch content from the website and extract relevant information by parsing the HTML structure. This involves identifying the tags or classes that contain data like car names, prices, and specifications, and automating the process to handle multiple listings efficiently.
Pagination is essential when scraping AutoScout24, as car listings are distributed across multiple pages. Following the “Next” link from each results page, or requesting each page URL in turn, ensures that the scraper collects all available data. Adding random delays between requests spaces out the traffic and reduces the risk of being rate-limited or blocked. Once extracted, storing the data in structured formats like CSV or JSON allows for easier analysis and comparison. Below is an example script for extracting car data from AutoScout24.

    import requests
    from html.parser import HTMLParser


    class AutoScoutParser(HTMLParser):
        def __init__(self):
            super().__init__()
            self.in_car_name = False
            self.in_car_price = False
            self.cars = []
            self.current_car = {}

        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            # "car-name" and "price" are placeholder class names; check the live markup.
            if tag == "h2" and "car-name" in (attrs.get("class") or ""):
                self.in_car_name = True
            if tag == "span" and "price" in (attrs.get("class") or ""):
                self.in_car_price = True

        def handle_endtag(self, tag):
            if self.in_car_name and tag == "h2":
                self.in_car_name = False
            if self.in_car_price and tag == "span":
                self.in_car_price = False

        def handle_data(self, data):
            text = data.strip()
            if not text:
                return  # skip whitespace-only text nodes
            if self.in_car_name:
                self.current_car["name"] = text
            if self.in_car_price:
                self.current_car["price"] = text
                # A price closes the current listing.
                self.cars.append(self.current_car)
                self.current_car = {}


    # Replace with the URL of the search-results page you want to scrape.
    url = "https://www.autoscout24.com/"
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # stop early on HTTP errors

    parser = AutoScoutParser()
    parser.feed(response.text)
    for car in parser.cars:
        # .get() avoids a KeyError when a price was found without a name.
        print(f"Car: {car.get('name', 'N/A')}, Price: {car.get('price', 'N/A')}")
This script fetches a single page from AutoScout24 and extracts car names and prices from its HTML. The car-name and price class names are illustrative and may not match the site's current markup, so inspect the page and adjust them before running it. To capture all listings, wrap the request in a loop over the result pages and pause for a random interval between requests.
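As a rough sketch of the pagination, delay, and storage steps described above, the code below loops over numbered result pages, sleeps for a random interval between requests, and writes the results to CSV. Everything named here is hypothetical and not taken from AutoScout24 itself: the ListingParser, fetch_html, scrape_pages, and save_csv helpers, the {page} placeholder in the URL template, and the car-name and price class names are all assumptions you would need to adapt.

```python
import csv
import random
import time
from html.parser import HTMLParser


class ListingParser(HTMLParser):
    """Collects name/price pairs. The class names are placeholders."""

    def __init__(self):
        super().__init__()
        self.field = None   # field currently being read, if any
        self.cars = []
        self.current = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "h2" and "car-name" in classes:
            self.field = "name"
        elif tag == "span" and "price" in classes:
            self.field = "price"

    def handle_endtag(self, tag):
        if self.field == "name" and tag == "h2":
            self.field = None
        elif self.field == "price" and tag == "span":
            # The closing price tag ends the current listing.
            self.field = None
            self.cars.append(self.current)
            self.current = {}

    def handle_data(self, data):
        if self.field:
            # Text may arrive in several chunks, so append rather than overwrite.
            self.current[self.field] = self.current.get(self.field, "") + data.strip()


def fetch_html(url):
    """Default fetcher; requests is imported lazily because it is third-party."""
    import requests
    response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
    response.raise_for_status()
    return response.text


def scrape_pages(url_template, max_pages, fetch=fetch_html, delay_range=(2.0, 5.0)):
    """Fetch pages 1..max_pages, stopping early at the first page with no listings."""
    all_cars = []
    for page in range(1, max_pages + 1):
        parser = ListingParser()
        parser.feed(fetch(url_template.format(page=page)))
        if not parser.cars:
            break  # assumption: an empty page means there are no more results
        all_cars.extend(parser.cars)
        if page < max_pages:
            time.sleep(random.uniform(*delay_range))  # random pause between requests
    return all_cars


def save_csv(cars, path):
    """Write the collected listings to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "price"])
        writer.writeheader()
        writer.writerows(cars)
```

A call might look like save_csv(scrape_pages("https://www.autoscout24.com/lst?page={page}", max_pages=3), "cars.csv"), where the URL pattern is an assumption to verify against the site. Passing the fetcher as a parameter keeps the loop testable without network access, and capping max_pages keeps a run from requesting more pages than intended.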