Welcome back to Rail Bite University! If you've already learned how to scrape Google Maps and are now hitting a limit of 120 results, this guide is for you. We’ll explore advanced techniques to bypass this limitation using infinite scrolling and strategic search parameters, ensuring you can capture as much data as you need.
When scraping Google Maps, you might find that your scraper returns at most 120 results per search. Google imposes this cap to discourage excessive scraping, but there are ways to work around it and gather a far more complete dataset.
Google Maps uses infinite scroll to load more results as you navigate through the page. We can modify our spider to take advantage of this feature by continuously scrolling until all available results are loaded.
In your spider's start_requests method, enable Playwright so the page stays open for interaction:

from googlemaps.items import UniversityItemLoader
from scrapy.selector import Selector
from scrapy import Spider, Request
from scrapy_playwright.page import PageMethod
class UniversitySpider(Spider):
    name = "university"

    def start_requests(self):
        yield Request(
            url="https://www.google.com/maps/search/university+in+nebraska+United+States?hl=en-US",
            callback=self.parse_universities,
            meta=dict(
                playwright=True,
                playwright_include_page=True,
                playwright_page_methods=[
                    # Wait for the consent form and accept it before scraping
                    PageMethod("wait_for_selector", selector="form"),
                    PageMethod("click", selector="button"),
                ],
            ),
        )

    async def parse_universities(self, response):
        page = response.meta["playwright_page"]
        html = await page.content()
        # Keep scrolling the results feed until Google signals the end
        while True:
            if "You've reached the end of the list." in html:
                break
            await page.get_by_role("feed").press("PageDown")
            await page.wait_for_timeout(500)
            html = await page.content()
        await page.close()
        # Parse the fully loaded HTML snapshot and follow each result link
        sel = Selector(text=html)
        links = sel.css('div[role="feed"] > div > div > a')
        for link in links:
            yield response.follow(
                url=link,
                callback=self.parse_university,
                meta={"playwright": True},
            )

    def parse_university(self, response):
        item = UniversityItemLoader(response=response)
        item.add_css('name', 'h1::text')
        item.add_xpath('rating', ".//*[contains(@aria-label,'stars')]/@aria-label")
        item.add_xpath('phone', '//button[contains(@aria-label, "Phone:")]/@aria-label')
        yield item.load_item()
This approach ensures that your spider keeps scrolling through the results until it reaches the end of the list, effectively bypassing the 120-result limit.
Even with infinite scroll, you might not get all the results you need. Google Maps might limit the results further depending on the breadth of your search query. The solution? Narrow down your search parameters by focusing on smaller regions or categories.
For example, split the query by US state instead of searching the whole country at once. The snippet below assumes it lives in the spider's callback for Google's consent page (quote_plus comes from Python's urllib.parse; import it at the top of the module):

def parse_consent(self, response):
    us_states = [
        "Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado",
        "Connecticut", "Delaware", "Florida", "Georgia", "Hawaii", "Idaho",
        "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana",
        "Maine", "Maryland", "Massachusetts", "Michigan", "Minnesota",
        "Mississippi", "Missouri", "Montana", "Nebraska", "Nevada",
        "New Hampshire", "New Jersey", "New Mexico", "New York",
        "North Carolina", "North Dakota", "Ohio", "Oklahoma", "Oregon",
        "Pennsylvania", "Rhode Island", "South Carolina", "South Dakota",
        "Tennessee", "Texas", "Utah", "Vermont", "Virginia", "Washington",
        "West Virginia", "Wisconsin", "Wyoming"
    ]
    # One search request per state keeps each result set under the cap
    for state in us_states:
        yield Request(
            url=f"https://www.google.com/maps/search/university+in+{quote_plus(state)}+United+States?hl=en-US",
            callback=self.parse_universities,
            meta=dict(
                playwright=True,
                playwright_include_page=True,
            ),
        )
This method ensures that your scraper gathers data from each state individually, circumventing the overall result limit and providing a more extensive dataset.
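The quote_plus helper from Python's standard urllib.parse is what keeps multi-word state names valid in the URL, encoding spaces as "+" to match Google Maps' search URL format:

```python
from urllib.parse import quote_plus

# Spaces become "+" so multi-word states produce valid search URLs
state = "New Hampshire"
url = f"https://www.google.com/maps/search/university+in+{quote_plus(state)}+United+States?hl=en-US"
print(url)
# https://www.google.com/maps/search/university+in+New+Hampshire+United+States?hl=en-US
```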
To speed up your scraping and reduce the load on your system, you can instruct Playwright to skip downloading unnecessary content, such as images.
Add the following to your settings.py to skip loading images, along with Google's map-tile and telemetry requests:

def should_abort_request(request):
    return (
        request.resource_type == "image"
        or any(
            x in request.url
            for x in [
                "gen_204",
                "/maps/vt",
            ]
        )
        or any(ext in request.url for ext in [".jpg", ".png", ".jpeg", ".gif", ".svg", ".webp", ".ico"])
    )
PLAYWRIGHT_ABORT_REQUEST = should_abort_request
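You can sanity-check the filter without running Scrapy at all: it only reads two attributes, so a stand-in object works. The example below copies the filter for self-containedness and uses SimpleNamespace (the URLs are made up for illustration):

```python
from types import SimpleNamespace

# Copy of the filter from settings.py, for a standalone check
def should_abort_request(request):
    return (
        request.resource_type == "image"
        or any(x in request.url for x in ["gen_204", "/maps/vt"])
        or any(ext in request.url for ext in [".jpg", ".png", ".jpeg", ".gif", ".svg", ".webp", ".ico"])
    )

# Stand-ins exposing only the two attributes the filter reads
tile = SimpleNamespace(resource_type="xhr", url="https://www.google.com/maps/vt/pb")
photo = SimpleNamespace(resource_type="image", url="https://example.com/pic")
page = SimpleNamespace(resource_type="document", url="https://www.google.com/maps/search/university")

assert should_abort_request(tile)     # map tiles are skipped
assert should_abort_request(photo)    # images are skipped
assert not should_abort_request(page) # the results page itself still loads
```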
This optimization makes your scraping faster and more efficient, especially when dealing with large datasets.
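Note that PLAYWRIGHT_ABORT_REQUEST only takes effect when scrapy-playwright's download handler is active. If you haven't configured it yet, settings.py also needs the standard scrapy-playwright setup:

```python
# settings.py: route requests through scrapy-playwright's handler
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# scrapy-playwright requires the asyncio-based Twisted reactor
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
```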
By implementing infinite scroll and breaking down your search into smaller, manageable queries, you can effectively gather more than the standard 120 results from Google Maps. This advanced approach ensures you capture all the data you need, whether it's for market research, lead generation, or any other application.
Happy scraping! 🎉