Scraping Infinite Google Maps Results

Scraping More Than 120 Results on Google Maps

Welcome back to Rayobyte University! If you've already learned how to scrape Google Maps and are now hitting the 120-result ceiling, this guide is for you. We'll explore advanced techniques to bypass this limitation using infinite scrolling and strategic search parameters, so you can capture as much data as you need.


The Challenge: Breaking the 120 Results Barrier

When scraping Google Maps, you might find that your scraper only returns up to 120 results. This limit is imposed by Google to prevent excessive scraping. But don't worry—there are ways to work around this limitation and gather more extensive data.

Step 1: Implementing Infinite Scroll

Google Maps uses infinite scroll to load more results as you navigate through the page. We can modify our spider to take advantage of this feature by continuously scrolling until all available results are loaded.

  1. Modify the Spider for Infinite Scroll: In your spider's start_requests method, enable Playwright to keep the webpage open and interact with it:
from urllib.parse import quote_plus

from googlemaps.items import UniversityItemLoader
from scrapy import Selector, Spider, Request
from scrapy_playwright.page import PageMethod

class UniversitySpider(Spider):
    name = "university"

    def start_requests(self):
        yield Request(
            url="https://www.google.com/maps/search/university+in+nebraska+United+States?hl=en-US",
            callback=self.parse_universities,
            meta=dict(
                playwright=True,
                playwright_include_page=True,
                playwright_page_methods=[
                    PageMethod("wait_for_selector", selector="form"),
                    PageMethod("click", selector="button"),
                ]
            )
        )

    async def parse_universities(self, response):
        # The live Playwright page is available because
        # playwright_include_page=True was set on the request.
        page = response.meta["playwright_page"]
        html = await page.content()
        # Keep scrolling the results feed until Google Maps reports
        # that there is nothing left to load.
        while "You've reached the end of the list." not in html:
            await page.get_by_role('feed').press("PageDown")
            await page.wait_for_timeout(500)  # give new results time to render
            html = await page.content()

        await page.close()

        # Re-parse the fully loaded HTML and follow each result link.
        sel = Selector(text=html)
        links = sel.css('div[role="feed"] > div > div > a')
        for link in links:
            yield response.follow(
                url=link,
                callback=self.parse_university,
                meta={"playwright": True}
            )

    def parse_university(self, response):
        # Extract the place details from the rendered detail page.
        item = UniversityItemLoader(response=response)
        item.add_css('name', 'h1::text')
        item.add_xpath('rating', ".//*[contains(@aria-label,'stars')]/@aria-label")
        item.add_xpath('phone', '//button[contains(@aria-label, "Phone:")]/@aria-label')
        yield item.load_item()

This approach ensures that your spider keeps scrolling through the results until it reaches the end of the list, effectively bypassing the 120-result limit.
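
The UniversityItemLoader imported at the top of the spider comes from the earlier lessons and isn't defined here. If you're building the project from scratch, a minimal sketch of googlemaps/items.py could look like the following (the field list and processors are assumptions; adapt them to your own item definition):

from itemloaders.processors import MapCompose, TakeFirst
from scrapy import Field, Item
from scrapy.loader import ItemLoader

class UniversityItem(Item):
    name = Field()
    rating = Field()
    phone = Field()

class UniversityItemLoader(ItemLoader):
    default_item_class = UniversityItem
    default_output_processor = TakeFirst()  # keep only the first match per field
    # Strip the "Phone: " prefix that Google Maps puts in the aria-label
    phone_in = MapCompose(lambda value: value.replace("Phone: ", "").strip())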

Step 2: Expanding Your Search Scope

Even with infinite scroll, you might not get all the results you need. Google Maps might limit the results further depending on the breadth of your search query. The solution? Narrow down your search parameters by focusing on smaller regions or categories.

  1. State-by-State Search: Instead of searching for universities across the entire United States in one go, break your search down by state so Google Maps returns fuller results for each region:
def parse_consent(self, response):
    # Callback for the consent page; fans out one
    # Playwright-enabled search request per state.
    us_states = [
        "Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado",
        "Connecticut", "Delaware", "Florida", "Georgia", "Hawaii", "Idaho",
        "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana",
        "Maine", "Maryland", "Massachusetts", "Michigan", "Minnesota",
        "Mississippi", "Missouri", "Montana", "Nebraska", "Nevada",
        "New Hampshire", "New Jersey", "New Mexico", "New York",
        "North Carolina", "North Dakota", "Ohio", "Oklahoma", "Oregon",
        "Pennsylvania", "Rhode Island", "South Carolina", "South Dakota",
        "Tennessee", "Texas", "Utah", "Vermont", "Virginia", "Washington",
        "West Virginia", "Wisconsin", "Wyoming"
    ]

    for state in us_states:
        yield Request(
            # quote_plus (from urllib.parse) URL-encodes the state name
            url=f"https://www.google.com/maps/search/university+in+{quote_plus(state)}+United+States?hl=en-US",
            callback=self.parse_universities,
            meta=dict(
                playwright=True,
                playwright_include_page=True
            )
        )

This method ensures that your scraper gathers data from each state individually, circumventing the overall result limit and providing a more extensive dataset.
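
For the state-by-state flow to kick in, start_requests should target Google Maps once and hand off to parse_consent rather than going straight to a search URL. Here's a minimal sketch, reusing the consent-dialog selectors from Step 1 (the entry URL here is an assumption):

    def start_requests(self):
        # Open Maps once so the consent dialog can be dismissed;
        # parse_consent then fans out one search request per state.
        yield Request(
            url="https://www.google.com/maps?hl=en-US",  # assumed entry point
            callback=self.parse_consent,
            meta=dict(
                playwright=True,
                playwright_page_methods=[
                    PageMethod("wait_for_selector", selector="form"),
                    PageMethod("click", selector="button"),
                ],
            ),
        )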

Step 3: Optimizing the Scraping Process

To speed up your scraping and reduce the load on your system, you can instruct Playwright to skip downloading unnecessary content, such as images.

  1. Abort Unnecessary Requests: Modify your settings.py so Playwright skips images, map tiles, and telemetry pings:
def should_abort_request(request):
    # Drop requests the scraper doesn't need: rendered images,
    # Google telemetry pings (gen_204), map tiles (/maps/vt),
    # and anything with an image file extension.
    return (
        request.resource_type == "image"
        or any(
            x in request.url
            for x in [
                "gen_204",   # telemetry/logging pings
                "/maps/vt",  # map tile requests
            ]
        )
        or any(ext in request.url for ext in [".jpg", ".png", ".jpeg", ".gif", ".svg", ".webp", ".ico"])
    )

PLAYWRIGHT_ABORT_REQUEST = should_abort_request

This optimization makes your scraping faster and more efficient, especially when dealing with large datasets.
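
One more reminder: none of the Playwright-backed requests will fire unless settings.py also contains the standard scrapy-playwright wiring from the earlier lessons, namely the download handlers and the asyncio reactor:

# settings.py
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

With that in place, scrapy crawl university -O universities.json runs the whole pipeline and writes the scraped items to a JSON file.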

Conclusion

By implementing infinite scroll and breaking down your search into smaller, manageable queries, you can effectively gather more than the standard 120 results from Google Maps. This advanced approach ensures you capture all the data you need, whether it's for market research, lead generation, or any other application.

Happy scraping! 🎉
