  • How to scrape car listings from AutoScout24.com using Python?

    Posted by Linda Ylva on 12/21/2024 at 7:31 am

    Scraping car listings from AutoScout24.com with Python lets you gather data such as car models, prices, and mileage, which is useful for studying the automotive market. AutoScout24 is one of Europe’s largest online car marketplaces, which makes it a good source for analyzing pricing trends and car availability. The basic approach is to fetch each results page with an HTTP library such as requests, identify the tags or CSS classes that hold the car names, prices, and specifications, and parse those values out of the HTML for every listing on the page.
    Pagination matters on AutoScout24 because the listings are spread across many result pages. A scraper that follows the “Next” link (or increments the page number in the URL) until no further page exists collects all available results instead of only the first page. Adding random delays between requests keeps the request rate low and reduces the chance that the site blocks the scraper. Storing the extracted records in a structured format such as CSV or JSON makes them easier to analyze and compare. Below is an example script for extracting car data from AutoScout24.

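    import requests
    from html.parser import HTMLParser


    class AutoScoutParser(HTMLParser):
        # The "car-name" and "price" classes are placeholders: inspect the live
        # page and substitute the class names it actually uses.
        def __init__(self):
            super().__init__()
            self.in_car_name = False
            self.in_car_price = False
            self.cars = []
            self.current_car = {}

        def handle_starttag(self, tag, attrs):
            # Attribute values may be None, and "class" can hold several names.
            classes = (dict(attrs).get("class") or "").split()
            if tag == "h2" and "car-name" in classes:
                self.in_car_name = True
                self.current_car = {"name": ""}  # a new listing starts at its title
            elif tag == "span" and "price" in classes:
                self.in_car_price = True
                self.current_car["price"] = ""

        def handle_endtag(self, tag):
            if self.in_car_name and tag == "h2":
                self.in_car_name = False
                self.current_car["name"] = " ".join(self.current_car["name"].split())
            elif self.in_car_price and tag == "span":
                # Assumes no nested <span> inside the price element.
                self.in_car_price = False
                self.current_car["price"] = " ".join(self.current_car["price"].split())
                self.cars.append(self.current_car)
                self.current_car = {}

        def handle_data(self, data):
            # Text can arrive in several chunks, so accumulate rather than overwrite.
            if self.in_car_name:
                self.current_car["name"] += data
            elif self.in_car_price:
                self.current_car["price"] += data


    url = "https://www.autoscout24.com/"  # may need to be a search-results page
    headers = {"User-Agent": "Mozilla/5.0"}  # some sites reject the default requests UA
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # stop on 4xx/5xx instead of parsing an error page

    parser = AutoScoutParser()
    parser.feed(response.text)
    parser.close()
    for car in parser.cars:
        print(f"Car: {car.get('name', 'n/a')}, Price: {car.get('price', 'n/a')}")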
    

    This script fetches a single page and extracts car names and prices from it; it only finds results if the class names it looks for match the page’s actual markup. Pagination logic can be added to walk through every results page so that all listings are captured, and random delays between requests keep the scraper from overloading the site and lower the risk of being blocked.
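    One way to add the pagination and delays described here is a loop over numbered result pages. This is a minimal sketch, not AutoScout24’s documented interface: the `/lst` path and `page` query parameter in `LISTING_URL` are assumptions to verify in your browser, and `scrape_all_pages` is a hypothetical helper. Fetching and parsing are passed in as functions so the loop can be tested without network access.

```python
import random
import time
from typing import Callable, Dict, List

# Assumed URL template: the /lst path and the "page" query parameter are
# hypothetical here; check the real listing URLs in your browser first.
LISTING_URL = "https://www.autoscout24.com/lst?page={page}"

Car = Dict[str, str]


def scrape_all_pages(
    fetch: Callable[[str], str],
    parse: Callable[[str], List[Car]],
    max_pages: int = 20,
    min_delay: float = 2.0,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Car]:
    """Walk numbered result pages until one yields no cars or max_pages is hit."""
    cars: List[Car] = []
    for page in range(1, max_pages + 1):
        html = fetch(LISTING_URL.format(page=page))
        page_cars = parse(html)
        if not page_cars:
            break  # an empty page is taken to mean we are past the last page
        cars.extend(page_cars)
        if page < max_pages:
            # Random pause between requests to keep the request rate low.
            sleep(random.uniform(min_delay, max_delay))
    return cars
```

    With the script above, `fetch` could be `lambda u: requests.get(u, timeout=10).text` and `parse` a function that feeds the HTML to a fresh `AutoScoutParser()` and returns its `cars` list. The loop stops at the first page that yields no cars; the `max_pages` cap is a backstop in case the site keeps returning results for out-of-range page numbers.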

  • 2 Replies
  • Kjerstin Thamina

    Member
    01/01/2025 at 10:44 am

    Pagination is critical for collecting data from all product listings on Otto.de. Automating navigation through “Next” buttons ensures that no products are missed. Adding random delays between requests mimics human behavior, reducing the likelihood of detection. This functionality enhances the scraper’s effectiveness and makes it ideal for collecting comprehensive datasets. Proper pagination handling allows for more detailed analysis of pricing trends across categories.
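    Following the “Next” link, as this reply describes, avoids having to guess the URL scheme. The sketch below is site-independent and assumes the next-page link is marked with `rel="next"`; if the page uses a button or a different attribute, the check in `NextLinkParser` has to change. `NextLinkParser`, `find_next_url`, and `crawl_via_next_links` are hypothetical names. Relative links are resolved with `urljoin`, and a `seen` set stops the loop if a page links back to one already visited.

```python
import random
import time
from html.parser import HTMLParser
from typing import Callable, List, Optional, Set
from urllib.parse import urljoin


class NextLinkParser(HTMLParser):
    """Records the href of the first <a> whose rel attribute contains "next"."""

    def __init__(self) -> None:
        super().__init__()
        self.next_href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "a" or self.next_href is not None:
            return
        attr_map = dict(attrs)
        rel_values = (attr_map.get("rel") or "").lower().split()
        href = attr_map.get("href")
        if "next" in rel_values and href:
            self.next_href = href


def find_next_url(html: str, current_url: str) -> Optional[str]:
    """Return the absolute URL of the next page, or None on the last page."""
    parser = NextLinkParser()
    parser.feed(html)
    parser.close()
    if parser.next_href is None:
        return None
    return urljoin(current_url, parser.next_href)


def crawl_via_next_links(
    start_url: str,
    fetch: Callable[[str], str],
    max_pages: int = 50,
    min_delay: float = 2.0,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Fetch pages by following "next" links; returns the HTML of each page."""
    pages: List[str] = []
    seen: Set[str] = set()
    url: Optional[str] = start_url
    while url is not None and url not in seen and len(pages) < max_pages:
        if pages:
            # Random pause before every request except the first one.
            sleep(random.uniform(min_delay, max_delay))
        seen.add(url)
        html = fetch(url)
        pages.append(html)
        url = find_next_url(html, url)
    return pages
```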

  • Giiwedin Vesna

    Member
    01/16/2025 at 2:15 pm

    Handling pagination is crucial for collecting all car listings from AutoScout24.com. Cars are often distributed across multiple pages, so automating navigation ensures that no data is missed. Random delays between requests mimic human behavior, reducing the chances of detection. Pagination handling allows for a more comprehensive dataset, which is essential for analyzing car pricing trends. Properly navigating through all pages ensures that both common and rare listings are captured.
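    Before analyzing pricing trends, the price text usually has to become a number. Here is a minimal sketch of that step plus CSV/JSON export, assuming the `name`/`price` dictionaries produced by the parser in the original post and whole-euro price strings such as `€ 12,500`. The helpers `parse_price`, `normalize`, `write_csv`, `to_csv_text`, and `write_json` are hypothetical.

```python
import csv
import io
import json
import re
from typing import Dict, List, Optional, TextIO


def parse_price(text: str) -> Optional[int]:
    """Turn a price string such as '€ 12,500' or '€ 12.500,-' into whole euros.

    Assumes whole-euro amounts: every non-digit (currency sign, thousands
    separator, trailing ',-') is dropped, so prices with cents would be misread.
    """
    digits = re.sub(r"\D", "", text)
    return int(digits) if digits else None


def normalize(cars: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Keep the name and convert the price text into a number for analysis."""
    return [
        {"name": car.get("name", ""), "price_eur": parse_price(car.get("price", ""))}
        for car in cars
    ]


def write_csv(rows: List[Dict[str, object]], out: TextIO) -> None:
    """Write rows as CSV; a None price becomes an empty cell."""
    writer = csv.DictWriter(out, fieldnames=["name", "price_eur"])
    writer.writeheader()
    writer.writerows(rows)


def to_csv_text(rows: List[Dict[str, object]]) -> str:
    """Render the CSV into a string, e.g. for a quick preview."""
    buf = io.StringIO()
    write_csv(rows, buf)
    return buf.getvalue()


def write_json(rows: List[Dict[str, object]], out: TextIO) -> None:
    """Write rows as a pretty-printed JSON array, keeping non-ASCII names."""
    json.dump(rows, out, ensure_ascii=False, indent=2)
```

    For example: `with open("cars.csv", "w", newline="", encoding="utf-8") as f: write_csv(normalize(parser.cars), f)`. Opening the file with `newline=""` is what the `csv` module documentation recommends for files it writes.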
