{"id":2746,"date":"2025-12-31T12:22:23","date_gmt":"2025-12-31T12:22:23","guid":{"rendered":"https:\/\/rayobyte.com\/community\/?post_type=scraping_project&#038;p=2746"},"modified":"2025-12-31T12:22:23","modified_gmt":"2025-12-31T12:22:23","slug":"scrape-google-my-business-data-using-python-a-step-by-step-guide","status":"publish","type":"scraping_project","link":"https:\/\/rayobyte.com\/community\/scraping-project\/scrape-google-my-business-data-using-python-a-step-by-step-guide\/","title":{"rendered":"Scrape Google My Business Data Using Python: A Step-by-Step Guide"},"content":{"rendered":"<h1>Table of content<\/h1>\n<ul>\n<li><a href=\"#Introduction\">Introduction<\/a><\/li>\n<li><a href=\"#Prerequisites\">Prerequisites<\/a><\/li>\n<li><a href=\"#Import Libraries\">Import Libraries<\/a><\/li>\n<li><a href=\"#Define the Scrape Function\">Define the Scrape Function<\/a><\/li>\n<li><a href=\"#Step-by-Step Guide\">Step-by-Step Guide<\/a><\/li>\n<li><a href=\"#Important Notes\">Important Notes<\/a><\/li>\n<li><a href=\"#Conclusion\">Conclusion<\/a><\/li>\n<\/ul>\n<p><strong>Google My Business<\/strong> is a crucial tool for businesses to manage their online presence. In this tutorial, we\u2019ll show you how to build a <strong>Google My Business scraper<\/strong> using Python. You\u2019ll learn how to extract valuable business information such as business names, reviews, ratings, contact details, and more. 
This tool will help you gather insights and manage your business\u2019s online reputation effectively.<\/p>\n<h3>What You&#8217;ll Need:<\/h3>\n<ul>\n<li><strong>Python 3.x<\/strong><\/li>\n<li><strong>Playwright<\/strong> library for web scraping<\/li>\n<li><strong>CSV<\/strong> for storing the scraped data<\/li>\n<li><strong>Proxy<\/strong> support for anonymity<\/li>\n<li>Basic understanding of HTML and CSS selectors<\/li>\n<\/ul>\n<h3 id=\"Introduction\">Introduction<\/h3>\n<p>Google My Business is an essential platform that helps businesses appear in local search results on Google, including Google Maps. With millions of businesses listing their details online, scraping data from Google My Business can provide valuable insights into local markets, business performance, and competition.<\/p>\n<p>In this step-by-step guide, we will walk you through creating a scraper using <strong>Python<\/strong> and the <strong>Playwright<\/strong> library to extract business data, including:<\/p>\n<ul>\n<li>Business Name<\/li>\n<li>Address<\/li>\n<li>Phone Number<\/li>\n<li>Website<\/li>\n<li>Ratings &amp; Reviews<\/li>\n<\/ul>\n<p>The Python code provided will allow you to scrape Google My Business data directly from Google Search results, store it in a <strong>CSV<\/strong> file, and use a <strong>proxy<\/strong> to enhance your scraping process and avoid getting blocked by Google.<\/p>\n<h3 id=\"Prerequisites\">Prerequisites<\/h3>\n<p>Before we dive into the code, make sure you have <strong>Python 3.x<\/strong> installed on your computer. You will also need to install the <strong>Playwright<\/strong> library, which is a powerful web automation tool for Python.<\/p>\n<p>Run the following command to install Playwright:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">pip install playwright \npython -m playwright install<\/pre>\n<p>You will also need a <strong>proxy service<\/strong> to hide your identity while scraping. 
This will prevent your IP from being blocked by Google. If you don\u2019t have one, consider a paid proxy service such as <strong>Rayobyte<\/strong>; we\u2019ll configure the proxy in Step 3.<\/p>\n<h3 id=\"Step-by-Step Guide\">Step-by-Step Guide<\/h3>\n<h4 id=\"Import Libraries\">Step 1: Import Libraries<\/h4>\n<p>We will use the <strong>sync_playwright<\/strong> function from the Playwright library, which lets us interact with web pages as if we were using a browser. Additionally, we&#8217;ll import the built-in <strong>csv<\/strong> module to save the scraped data.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">from playwright.sync_api import sync_playwright\nimport csv\n<\/pre>\n<h4 id=\"Define the Scrape Function\">Step 2: Define the Scrape Function<\/h4>\n<p>The <code>scrape_page()<\/code> function scrapes specific information from each Google My Business listing on the current results page, such as:<\/p>\n<ul>\n<li>Business Name<\/li>\n<li>Address<\/li>\n<li>Phone Number<\/li>\n<li>Website<\/li>\n<li>Ratings and Reviews<\/li>\n<\/ul>\n<p>Here\u2019s how it works:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">def scrape_page(page, writer):\n    # Collect all business cards in the local results panel\n    all_business = page.query_selector_all(\".rllt__details\")\n\n    for business in all_business:\n        business.click()\n        page.wait_for_timeout(2000)  # Wait 2 seconds for the detail panel to load\n\n        # query_selector() returns None when an element is missing,\n        # so each field falls back to \"not found\"\n        business_name = page.query_selector(\".SPZz6b\")\n        business_name = business_name.text_content() if business_name else \"not found\"\n\n        business_address = page.query_selector(\".LrzXr\")\n        business_address = business_address.text_content() if business_address else \"not found\"\n\n        business_phone_number = page.query_selector(\".LrzXr.zdqRlf.kno-fv\")\n        business_phone_number = business_phone_number.text_content() if business_phone_number else \"not found\"\n\n        business_website = page.query_selector(\".xFAlBc\")\n        business_website = business_website.text_content() if business_website else \"not found\"\n\n        rating_reviews = page.query_selector(\".TLYLSe.MaBy9\")\n        rating_reviews = rating_reviews.text_content() if rating_reviews else \"not found\"\n\n        # Store the scraped data in the CSV file\n        writer.writerow([business_name, business_address, business_phone_number, business_website, rating_reviews])\n        print(f\"Data saved: {business_name}, {business_address}, {business_phone_number}, {business_website}, {rating_reviews}\")\n<\/pre>\n<p>Here\u2019s how you can set up the proxy in your script.<\/p>\n<h4>Step 3: Main Function to Scrape Data and Use Proxy<\/h4>\n<p>The <strong>main()<\/strong> function uses Playwright to navigate through the result pages, scrape the data, and store it in a CSV file. 
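<\/p>\n<p>Playwright\u2019s <code>proxy<\/code> option is a dictionary with a <code>server<\/code> key plus optional <code>username<\/code> and <code>password<\/code> credentials. Since the credentials are not always required, it can be convenient to build that dictionary with a small helper. The sketch below is our own addition (the helper name and the placeholder address are hypothetical):<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">def build_proxy(server, username=None, password=None):\n    # Return a dict in the shape Playwright expects, or None when no proxy is set\n    if not server:\n        return None\n    proxy = {\"server\": server}\n    if username:\n        proxy[\"username\"] = username\n        proxy[\"password\"] = password\n    return proxy\n\n# e.g. context = browser.new_context(proxy=build_proxy(\"http:\/\/203.0.113.10:8000\"))\n<\/pre>\n<p>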
It also includes <strong>proxy support<\/strong> to help hide your identity during the scraping process.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">def main():\n    with sync_playwright() as p:\n        # Set up the proxy and the browser context\n        browser = p.chromium.launch(headless=False, slow_mo=50)\n        context = browser.new_context(\n            viewport={\"width\": 1920, \"height\": 1080},\n            device_scale_factor=1,\n            proxy={\n                \"server\": \"\",    # Replace with your proxy server address and port\n                \"username\": \"\",  # Replace with your proxy username (if required)\n                \"password\": \"\"   # Replace with your proxy password (if required)\n            }\n        )\n\n        page = context.new_page()\n        url = input(\"Give URL and press enter: \").strip()\n        page.goto(url)\n\n        # Open the CSV file that will store the scraped data\n        with open('google_my_business_data.csv', mode='w', newline='', encoding='utf-8') as file:\n            writer = csv.writer(file)\n            writer.writerow([\"Business Name\", \"Business Address\", \"Phone Number\", \"Website\", \"Ratings &amp; Reviews\"])\n\n            while True:\n                page.wait_for_timeout(1000)  # Wait for 1 second\n                scrape_page(page, writer)\n\n                try:\n                    # Check for and click the next-page button\n                    next_page = page.query_selector(\".oeN89d\")\n                    if next_page:\n                        next_page.click()\n                        page.wait_for_timeout(2000)  # Wait for 2 seconds\n                    else:\n                        print(\"No more pages.\")\n                        break\n                except Exception as e:\n                    print(\"Error navigating to next page:\", e)\n                    break\n\n        browser.close()\n\nif __name__ == \"__main__\":\n    
main()\n<\/pre>\n<p>Here is a screenshot of how the resulting CSV file looks:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-2764 size-full\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/12\/Screenshot-2024-12-16-195105.png\" alt=\"\" width=\"690\" height=\"974\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/12\/Screenshot-2024-12-16-195105.png 690w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/12\/Screenshot-2024-12-16-195105-213x300.png 213w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/12\/Screenshot-2024-12-16-195105-624x881.png 624w\" sizes=\"auto, (max-width: 690px) 100vw, 690px\" \/><\/p>\n<h4>Step 4: Running the Script<\/h4>\n<p>Once the script is ready, save it as a Python file (e.g., <code>google_business_scrape.py<\/code>) and run it. The script will prompt you for a <strong>Google My Business URL<\/strong>, scrape the listings, and store the information in a CSV file. You can easily modify the script to handle more complex tasks or scrape more details.<\/p>\n<h3 id=\"Important Notes\">Important Notes<\/h3>\n<h4>1. <strong>Google\u2019s Continuous HTML Updates<\/strong><\/h4>\n<p>Google frequently updates the structure of its HTML pages. This means that the CSS selectors used in the scraper may not always work. If the script stops working or throws errors, you may need to update the <strong>CSS selectors<\/strong> in the script to match the new structure. Here are some things to check:<\/p>\n<ul>\n<li><strong>Element Class Names<\/strong>: These may change over time. The script uses class names like <code>.rllt__details<\/code> or <code>.LrzXr<\/code>. 
If Google changes these, the script won\u2019t be able to find the data.<\/li>\n<li><strong>Element Structure<\/strong>: The order or position of certain elements on the page may change, requiring updates to the scraper.<\/li>\n<\/ul>\n<p>To fix these issues, inspect the page elements using a browser&#8217;s developer tools (F12) to find the new CSS selectors and update the script accordingly.<\/p>\n<h4>2. <strong>Legal Considerations<\/strong><\/h4>\n<p>Scraping Google My Business data may violate Google\u2019s terms of service. Always ensure that you are scraping data in accordance with the relevant legal guidelines and the site&#8217;s terms.<\/p>\n<h4>3. <strong>Proxy Usage<\/strong><\/h4>\n<p>Using proxies is important to avoid being blocked by Google while scraping. You can use a proxy service to change your IP address for each request, thus ensuring anonymity. Here&#8217;s an example of how to configure the proxy in Playwright:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">context = browser.new_context(\n    proxy={\n        \"server\": \"server_name:port\",\n        \"username\": \"username\",\n        \"password\": \"password\"\n    }\n)\n<\/pre>\n<p>Make sure to replace <code>server_name:port<\/code>, <code>username<\/code>, and <code>password<\/code> with your actual proxy details. Most proxy services will provide these details when you sign up or subscribe to their services.<\/p>\n<h3 id=\"Conclusion\">Conclusion<\/h3>\n<p>In this guide, we showed how to build a <strong>Google My Business scraper<\/strong> using <strong>Python<\/strong> and <strong>Playwright<\/strong>. This script extracts business information like name, address, phone number, website, and ratings, and stores it in a <strong>CSV file<\/strong>. 
Additionally, we integrated <strong>proxy support<\/strong> to help prevent blocking during scraping.<\/p>\n<p>Remember that <strong>Google frequently updates its HTML structure<\/strong>, so keep your CSS selectors up to date. Always respect legal guidelines and Google\u2019s terms of service when scraping data from their platform.<\/p>\n<p>Happy scraping!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Table of content Introduction Prerequisites Import Libraries Define the Scrape Function Step-by-Step Guide Important Notes Conclusion Google My Business is a crucial tool for businesses&hellip;<\/p>\n","protected":false},"author":23,"featured_media":2747,"comment_status":"open","ping_status":"closed","template":"","meta":{"rank_math_lock_modified_date":false},"categories":[],"class_list":["post-2746","scraping_project","type-scraping_project","status-publish","has-post-thumbnail","hentry"],"_links":{"self":[{"href":"https:\/\/rayobyte.com\/community\/wp-json\/wp\/v2\/scraping_project\/2746","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rayobyte.com\/community\/wp-json\/wp\/v2\/scraping_project"}],"about":[{"href":"https:\/\/rayobyte.com\/community\/wp-json\/wp\/v2\/types\/scraping_project"}],"author":[{"embeddable":true,"href":"https:\/\/rayobyte.com\/community\/wp-json\/wp\/v2\/users\/23"}],"replies":[{"embeddable":true,"href":"https:\/\/rayobyte.com\/community\/wp-json\/wp\/v2\/comments?post=2746"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/rayobyte.com\/community\/wp-json\/wp\/v2\/media\/2747"}],"wp:attachment":[{"href":"https:\/\/rayobyte.com\/community\/wp-json\/wp\/v2\/media?parent=2746"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rayobyte.com\/community\/wp-json\/wp\/v2\/categories?post=2746"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}