Scrape Google My Business Data Using Python: A Step-by-Step Guide


Google My Business is a crucial tool for businesses to manage their online presence. In this tutorial, we’ll show you how to build a Google My Business scraper using Python. You’ll learn how to extract valuable business information such as business names, reviews, ratings, contact details, and more. This tool will help you gather insights and manage your business’s online reputation effectively.

What You’ll Need:

  • Python 3.x
  • Playwright library for web scraping
  • CSV for storing the scraped data
  • Proxy support for anonymity
  • Basic understanding of HTML and CSS selectors

Introduction

Google My Business (renamed Google Business Profile in 2021) is an essential platform that helps businesses appear in local search results on Google, including Google Maps. With millions of businesses listing their details online, scraping this data can provide valuable insights into local markets, business performance, and competition.

In this step-by-step guide, we will walk you through creating a scraper using Python and the Playwright library to extract business data, including:

  • Business Name
  • Address
  • Phone Number
  • Website
  • Ratings & Reviews

The Python code provided will allow you to scrape Google My Business data directly from Google Search results, store it in a CSV file, and use a proxy to enhance your scraping process and avoid getting blocked by Google.

Prerequisites

Before we dive into the code, make sure you have Python 3.x installed on your computer. You will also need to install the Playwright library, which is a powerful web automation tool for Python.

Run the following commands to install Playwright and download the browser binaries it needs:

pip install playwright
python -m playwright install

You will also need a proxy service to mask your IP address while scraping, which reduces the chance of Google blocking it. If you don’t have one, consider a paid proxy service such as Rayobyte.

The proxy itself is configured in the main() function in Step 3.

Step-by-Step Guide

Step 1: Import Libraries

We will use the sync_playwright function from the Playwright library, which lets us drive a real browser from Python. We’ll also import Python’s built-in csv module to save the scraped data.

from playwright.sync_api import sync_playwright
import csv

Step 2: Define the Scrape Function

The scrape_page() function is designed to scrape specific information from a Google My Business listing, such as:

  • Business Name
  • Address
  • Phone Number
  • Website
  • Ratings and Reviews

Here’s how it works:

def scrape_page(page, writer):
    # Each .rllt__details element is one business entry in the local results list
    all_business = page.query_selector_all(".rllt__details")
    
    for business in all_business:
        business.click()
        page.wait_for_timeout(2000)  # Wait for 2 seconds

        # Extract business info
        business_name = page.query_selector('.SPZz6b')
        business_name = business_name.text_content() if business_name else "not found"
        
        business_address = page.query_selector(".LrzXr")
        business_address = business_address.text_content() if business_address else "not found"
        
        try:
            business_phone_number = page.query_selector(".LrzXr.zdqRlf.kno-fv")
            business_phone_number = business_phone_number.text_content() if business_phone_number else "not found"
        except Exception:
            business_phone_number = "not found"

        try:
            business_website = page.query_selector(".xFAlBc")
            business_website = business_website.text_content() if business_website else "not found"
        except Exception:
            business_website = "not found"

        try:
            rating_reviews = page.query_selector(".TLYLSe.MaBy9")
            rating_reviews = rating_reviews.text_content() if rating_reviews else "not found"
        except Exception:
            rating_reviews = "not found"

        # Store the scraped data in CSV
        writer.writerow([business_name, business_address, business_phone_number, business_website, rating_reviews])
        print(f"Data saved: {business_name}, {business_address}, {business_phone_number}, {business_website}, {rating_reviews}\n")
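
The scraper stores ratings and reviews as a single text value, so you may want to split it into a numeric rating and a review count before analysis. Here is a minimal sketch of one way to do that. The helper name parse_rating_reviews is hypothetical, and the input formats it handles (for example "4.5(1,234)" or "4.5 · 1,234 reviews") are assumptions about what Google renders, so check them against your own CSV output.

```python
import re


def parse_rating_reviews(text):
    """Split a combined rating/review string into (rating, review_count).

    Assumption: the rating is the first number in the text (e.g. "4.5")
    and the review count is the next number, possibly with thousands
    separators. Returns None for any part that cannot be found.
    """
    if not text or text == "not found":
        return None, None

    # First number in the string: a single digit with an optional decimal part.
    rating_match = re.search(r"\d(?:[.,]\d)?", text)
    if rating_match is None:
        return None, None
    rating = float(rating_match.group().replace(",", "."))

    # Next number after the rating: keep only its digits.
    rest = text[rating_match.end():]
    count_match = re.search(r"\d[\d.,]*", rest)
    count = int(re.sub(r"\D", "", count_match.group())) if count_match else None

    return rating, count
```

Abbreviated counts such as "1.2K" are not handled by this sketch and would need extra logic.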


Step 3: Main Function to Scrape Data and Use Proxy

The main() function will use Playwright to navigate through the pages, scrape the data, and store it in a CSV file. It also includes proxy support to help hide your identity during the scraping process.

def main():
    with sync_playwright() as p:
        # Set up the proxy and the browser context
        browser = p.chromium.launch(headless=False, slow_mo=50)
        context = browser.new_context(
            viewport={"width": 1920, "height": 1080},
            device_scale_factor=1,
            proxy={
                "server": "",    # Replace with your proxy server address and port
                "username": "",  # Replace with your proxy username (if required)
                "password": ""   # Replace with your proxy password (if required)
            }
        )
        
        page = context.new_page()
        url = input("Give URL and press enter: ").strip()
        page.goto(url)

        # Open CSV file to store the scraped data
        with open('google_my_business_data.csv', mode='w', newline='', encoding='utf-8') as file:
            writer = csv.writer(file)
            writer.writerow(["Business Name", "Business Address", "Phone Number", "Website", "Ratings & Reviews"])

            while True:
                page.wait_for_timeout(1000)  # Wait for 1 second
                scrape_page(page, writer)

                try:
                    # Check for and click the next page button
                    next_page = page.query_selector(".oeN89d")
                    if next_page:
                        next_page.click()
                        page.wait_for_timeout(2000)  # Wait for 2 seconds
                    else:
                        print("No more pages.")
                        break
                except Exception as e:
                    print("Error navigating to next page:", e)
                    break

        browser.close()

if __name__ == "__main__":
    main()
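
The while True loop in main() keeps going for as long as a next-page button is found. If you would rather put an upper bound on a run, the loop can be pulled out into a small function with a page limit. This is only a sketch: scrape_all_pages and max_pages are hypothetical names, the ".oeN89d" selector is the one used above, and the page object is assumed to be a Playwright page.

```python
def scrape_all_pages(page, writer, scrape_fn, next_selector=".oeN89d", max_pages=20):
    """Call scrape_fn on each results page and return how many pages were scraped.

    Stops when there is no next-page button or when max_pages is reached,
    whichever comes first.
    """
    pages_scraped = 0
    while pages_scraped < max_pages:
        page.wait_for_timeout(1000)  # Give the results time to render
        scrape_fn(page, writer)
        pages_scraped += 1

        if pages_scraped == max_pages:
            break  # Limit reached; do not click through to another page

        next_button = page.query_selector(next_selector)
        if next_button is None:
            break  # No more pages
        next_button.click()
        page.wait_for_timeout(2000)  # Wait for the next page to load

    return pages_scraped
```

Inside main(), the whole while loop could then be replaced by a call such as scrape_all_pages(page, writer, scrape_page, max_pages=5).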

Here is a screenshot of how the CSV result looks:

[Screenshot of the resulting CSV file]

Step 4: Running the Script

Once the script is ready, save it as a Python file (e.g., google_business_scrape.py) and run it with python google_business_scrape.py. The script will prompt you for the URL of a Google Search results page that lists businesses, scrape the listings, and store the information in google_my_business_data.csv. You can easily modify the script to handle more complex tasks or scrape more details.
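
After a run, it is worth checking how complete the output is. The sketch below reads the CSV back with Python’s csv module and counts how many rows have "not found" in a given column. The helper names load_businesses and count_missing are hypothetical; the file name and column headers are the ones the script writes.

```python
import csv


def load_businesses(csv_path="google_my_business_data.csv"):
    """Read the scraper's CSV output into a list of dicts keyed by column header."""
    with open(csv_path, newline="", encoding="utf-8") as file:
        return list(csv.DictReader(file))


def count_missing(rows, column):
    """Count rows where the scraper wrote its "not found" placeholder."""
    return sum(1 for row in rows if row.get(column) == "not found")


# Example usage after the scraper has finished:
#   rows = load_businesses()
#   print(len(rows), "businesses scraped")
#   print(count_missing(rows, "Phone Number"), "without a phone number")
```

A high "not found" count for one column is usually a sign that the selector for that field no longer matches the page.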

Important Notes

1. Google’s Continuous HTML Updates

Google frequently updates the structure of its HTML pages. This means that the CSS selectors used in the scraper may not always work. If the script stops working or throws errors, you may need to update the CSS selectors in the script to match the new structure. Here are some things to check:

  • Element Class Names: These may change over time. The script uses class names like .rllt__details or .LrzXr. If Google changes these, the script won’t be able to find the data.
  • Element Structure: The order or position of certain elements on the page may change, requiring updates to the scraper.

To fix these issues, inspect the page elements using a browser’s developer tools (F12) to find the new CSS selectors and update the script accordingly.
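
One way to make these updates less painful is to keep every selector in a single dictionary, so a change on Google’s side means editing one place. The sketch below is an alternative layout, not the script’s original structure: SELECTORS, extract_fields, and missing_fields are hypothetical names, the class names are copied from scrape_page() above, and page is assumed to be a Playwright page.

```python
# All CSS selectors in one place; update these when Google changes its markup.
SELECTORS = {
    "name": ".SPZz6b",
    "address": ".LrzXr",
    "phone": ".LrzXr.zdqRlf.kno-fv",
    "website": ".xFAlBc",
    "rating_reviews": ".TLYLSe.MaBy9",
}


def extract_fields(page, selectors=SELECTORS, default="not found"):
    """Return a dict of field name -> text, using default when nothing matches."""
    record = {}
    for field, selector in selectors.items():
        element = page.query_selector(selector)
        text = element.text_content() if element else None
        record[field] = text.strip() if text else default
    return record


def missing_fields(record, default="not found"):
    """List the fields whose selector matched nothing (possible stale selectors)."""
    return [field for field, value in record.items() if value == default]
```

If missing_fields() starts returning the same field for every business, that selector is the first one to re-check in the developer tools.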

2. Legal Considerations

Scraping Google My Business data may violate Google’s terms of service. Always ensure that you are scraping data in accordance with the relevant legal guidelines and the site’s terms.

3. Proxy Usage

Using proxies helps you avoid being blocked by Google while scraping. A proxy service routes your traffic through IP addresses other than your own, and many services can rotate those addresses, which makes blocks and rate limits less likely. Here’s an example of how to configure the proxy in Playwright:

context = browser.new_context(
    proxy={
        "server": "server_name:port",
        "username": "username",
        "password": "password"
    }
)

Make sure to replace server_name:port, username, and password with your actual proxy details. Most proxy services will provide these details when you sign up or subscribe to their services.
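
To keep credentials out of the script itself, you could read them from environment variables instead of hard-coding them. This is a minimal sketch: build_proxy_settings and the SCRAPER_PROXY_* variable names are hypothetical, so pick whatever names suit your setup. It returns None when no server is configured, which is also the default value of the proxy argument in new_context().

```python
import os


def build_proxy_settings(env=None):
    """Build a Playwright-style proxy dict from environment variables.

    Returns None when SCRAPER_PROXY_SERVER is unset or empty, so the
    browser context is created without a proxy in that case.
    """
    env = os.environ if env is None else env

    server = env.get("SCRAPER_PROXY_SERVER", "").strip()
    if not server:
        return None

    proxy = {"server": server}
    username = env.get("SCRAPER_PROXY_USERNAME", "")
    if username:
        # Only include credentials when the proxy actually needs them.
        proxy["username"] = username
        proxy["password"] = env.get("SCRAPER_PROXY_PASSWORD", "")
    return proxy
```

The result can be passed straight to the context, for example browser.new_context(proxy=build_proxy_settings()).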

Conclusion

In this guide, we showed how to build a Google My Business scraper using Python and Playwright. This script extracts business information like name, address, phone number, website, and ratings, and stores it in a CSV file. Additionally, we integrated proxy support to help prevent blocking during scraping.

Remember that Google frequently updates its HTML structure, so keep your CSS selectors up to date. Always respect legal guidelines and Google’s terms of service when scraping data from their platform.

Happy scraping!
