  • How to scrape car listings from AutoScout24.com using Python?

    Posted by Linda Ylva on 12/21/2024 at 7:31 am

    Scraping car listings from AutoScout24.com using Python allows you to gather data such as car models, prices, and mileage, providing valuable insights into the automotive market. AutoScout24 is one of Europe’s largest online car marketplaces, making it a great source for analyzing pricing trends and car availability. By using Python’s HTTP libraries, you can fetch content from the website and extract relevant information by parsing the HTML structure. This involves identifying the tags or classes that contain data like car names, prices, and specifications, and automating the process to handle multiple listings efficiently.
    Pagination is essential when scraping AutoScout24, as car listings are distributed across multiple pages. Because an HTTP client cannot click a “Next” button, the scraper has to follow the URL that button links to (or increment the page number in the URL) until no more results come back. Adding random delays between requests spaces out the traffic and reduces the risk of being blocked. Once extracted, storing the data in structured formats like CSV or JSON allows for easier analysis and comparison. Below is an example script for extracting car data from AutoScout24.

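    import requests
    from html.parser import HTMLParser

    class AutoScoutParser(HTMLParser):
        """Collects car names and prices from listing HTML.

        The "car-name" and "price" class names are placeholders; inspect the
        live page and substitute the classes it actually uses.
        """
        def __init__(self):
            super().__init__()
            self.in_car_name = False
            self.in_car_price = False
            self.cars = []
            self.current_car = {}

        def handle_starttag(self, tag, attrs):
            # A valueless class attribute is reported as None, so fall back to "".
            css_class = dict(attrs).get("class") or ""
            if tag == "h2" and "car-name" in css_class:
                self.in_car_name = True
            if tag == "span" and "price" in css_class:
                self.in_car_price = True

        def handle_endtag(self, tag):
            if self.in_car_name and tag == "h2":
                self.in_car_name = False
            if self.in_car_price and tag == "span":
                self.in_car_price = False
                # The price closes a listing: store it and start a new one.
                self.cars.append(self.current_car)
                self.current_car = {}

        def handle_data(self, data):
            # Text can arrive in several chunks, so append instead of overwriting.
            if self.in_car_name:
                self.current_car["name"] = self.current_car.get("name", "") + data.strip()
            if self.in_car_price:
                self.current_car["price"] = self.current_car.get("price", "") + data.strip()

    url = "https://www.autoscout24.com/"  # replace with a search-results URL
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # stop early on HTTP errors
    parser = AutoScoutParser()
    parser.feed(response.text)
    for car in parser.cars:
        # .get() avoids a KeyError when a listing is missing a field
        print(f"Car: {car.get('name', 'N/A')}, Price: {car.get('price', 'N/A')}")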
    

    This script parses the HTML of a single page to extract car names and prices. The class names it looks for are illustrative and may not match the live markup, so check them against the page source before running it. To capture all listings, wrap the request in pagination logic that walks through the result pages, with a random delay between requests.
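
    The pagination, delay, and CSV steps described above could be sketched roughly as follows. This is only a sketch under stated assumptions: scrape_pages and save_csv are hypothetical helper names, the fetch and parse callables are supplied by the caller, and the "{page}" placeholder assumes the site exposes the page number in the URL, which should be verified against the real result pages.

```python
import csv
import random
import time
from typing import Callable, Dict, List, Tuple

Car = Dict[str, str]


def scrape_pages(
    fetch: Callable[[str], str],
    parse: Callable[[str], List[Car]],
    url_template: str,
    max_pages: int = 5,
    delay_range: Tuple[float, float] = (1.0, 3.0),
) -> List[Car]:
    """Fetch numbered result pages until one is empty or max_pages is reached."""
    cars: List[Car] = []
    for page in range(1, max_pages + 1):
        # url_template is assumed to contain a "{page}" placeholder
        found = parse(fetch(url_template.format(page=page)))
        if not found:
            break  # an empty page is treated as the end of the results
        cars.extend(found)
        if page < max_pages:
            # random pause so requests are not sent in a rapid burst
            time.sleep(random.uniform(*delay_range))
    return cars


def save_csv(cars: List[Car], path: str) -> None:
    """Write the collected listings to a CSV file with name and price columns."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        # missing fields are written as empty strings; unknown fields are skipped
        writer = csv.DictWriter(f, fieldnames=["name", "price"], extrasaction="ignore")
        writer.writeheader()
        writer.writerows(cars)
```

    To connect this to the script above, fetch could be a small function that calls requests.get(url, timeout=10) and returns response.text, and parse could create a fresh AutoScoutParser, feed it the HTML, and return parser.cars. Passing both in as arguments keeps the loop independent of any one HTTP library and makes it easy to try out on saved HTML before hitting the live site.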
