Build a Google Shopping Scraper in Python for Price and Product Data
Table of contents
- Google Shopping Scraper
- Why Build a Google Shopping Scraper?
- Tools You’ll Need
- Prerequisites
- Writing the Scraper
- Code Explanation
- Important Notes
- Conclusion
Google Shopping Scraper
Google Shopping provides a wealth of product and price data, making it a valuable resource for e-commerce analysis. In this tutorial, we’ll guide you through building a Google Shopping scraper using Python. You’ll learn how to extract product details, prices, and seller information, enabling you to monitor market trends and competitor pricing.
Why Build a Google Shopping Scraper?
Accessing structured product and price data can be incredibly useful for:
- Competitor Analysis: Monitor pricing trends and offers from various sellers.
- Market Research: Understand market dynamics by analyzing product availability and seller competition.
- Inventory Management: Keep track of products listed by different vendors.
Tools You’ll Need
We’ll use the following tools to build our scraper:
- Python: The programming language for scripting.
- Selenium: A browser-automation library for interacting with dynamic web pages.
- selenium-stealth: A plugin that helps evade common bot-detection checks.
- ChromeDriver: The driver Selenium uses to control the Chrome browser.
- csv: Python's built-in module for writing the scraped data to a file for analysis.
Prerequisites
Make sure you have Python installed on your system. You can install the required libraries using pip:
pip install selenium selenium-stealth
Download and install a ChromeDriver build that matches your Chrome browser version. (If you are on Selenium 4.6 or later, Selenium Manager can download a matching driver for you automatically.)
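Before writing the scraper, it is worth confirming the environment works. Here is a minimal smoke test, assuming Selenium 4.6+ so the driver is resolved automatically:

from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument("--headless=new")  # no visible window needed for this check

driver = webdriver.Chrome(options=options)  # Selenium Manager fetches ChromeDriver if needed
driver.get("https://www.google.com")
print("Page title:", driver.title)  # expect "Google"
driver.quit()

If this prints the page title and exits cleanly, you are ready to build the scraper.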
Writing the Scraper
Here is the full Python code for scraping Google Shopping:
from selenium import webdriver
from selenium_stealth import stealth
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import traceback
import csv

# Set up Chrome options
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
# Uncomment the line below to run in headless mode
# options.add_argument("--headless")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option("useAutomationExtension", False)
driver = webdriver.Chrome(options=options)

# Enable stealth mode to avoid detection
stealth(
    driver,
    languages=["en-US", "en"],
    vendor="Google Inc.",
    platform="Win32",
    webgl_vendor="Intel Inc.",
    renderer="Intel Iris OpenGL Engine",
    fix_hairline=True,
)

# Set up the CSV file
csv_file = "products_data.csv"
csv_columns = [
    "Product Title", "Num of Vendor", "Vendor Name", "Price",
    "Image URL", "Vendor Link", "Product Rating",
]

# Write the header to the CSV file
with open(csv_file, mode="w", newline="", encoding="utf-8") as file:
    writer = csv.DictWriter(file, fieldnames=csv_columns)
    writer.writeheader()

# Open the Google Shopping search page (query: winter coats)
driver.get("https://www.google.com/search?sca_esv=272b628add5e1644&sxsrf=ADLYWIKOv6zMptOSkrduUIogA-_1QFM4Vw:1736252785259&q=winter+coats&udm=28&fbs=AEQNm0Aa4sjWe7Rqy32pFwRj0UkWxyMMuf0D-HOMEpzq2zertRy7G-dme1ONMLTCBvZzSliUAtuJwXTPBVWQOpeBsM3fUSkUhBhk1d2K-btGU92EzjqhYy18vZp6Kq7rwx5djl86CZ3SmcnTCNSLLpUAYm1ku78BstqWpCyrkw60NVNGecT-nd3TIWJpCihPwU0PqL2HsfQ2bCEzBN5b0AAdvbM5G21Byw&ved=1t:220175&ictx=111&biw=1541&bih=945&dpr=1#ip=1")

# Scrape each product card on the results page
all_box = driver.find_elements(By.CSS_SELECTOR, ".SsM98d")
for box in all_box:
    try:
        box.click()
        # Wait for the product title to load
        WebDriverWait(driver, 10).until(
            EC.presence_of_element_located(
                (By.CSS_SELECTOR, "div.bi9tFe.PZPZlf[jsname='ZOnBqe'][data-attrid='product_title']")
            )
        )
        time.sleep(3)  # Allow additional time for content to load

        product_title = driver.find_element(By.CSS_SELECTOR, "div.bi9tFe").text
        image = driver.find_element(By.CSS_SELECTOR, ".q5hmpb img.KfAt4d").get_attribute("src")
        vendors = driver.find_elements(By.CSS_SELECTOR, "a.P9159d")

        for vendor in vendors:
            vendor_name = vendor.find_element(By.CSS_SELECTOR, ".uWvFpd").text
            try:
                price = vendor.find_element(By.CSS_SELECTOR, ".GBgquf span span").text
            except Exception:
                # Fall back to the alternate price markup
                price = vendor.find_element(By.CSS_SELECTOR, "span.Pgbknd.xUrPFc").text
            try:
                href = vendor.get_attribute("href")
            except Exception:
                href = None
            try:
                rating = vendor.find_element(By.CSS_SELECTOR, "span.NFq8Ad").text
            except Exception:
                rating = ""

            # Append the row to the CSV file
            with open(csv_file, mode="a", newline="", encoding="utf-8") as file:
                writer = csv.DictWriter(file, fieldnames=csv_columns)
                writer.writerow({
                    "Product Title": product_title,
                    "Num of Vendor": len(vendors),
                    "Vendor Name": vendor_name,
                    "Price": price,
                    "Image URL": image,
                    "Vendor Link": href,
                    "Product Rating": rating,
                })
            print(
                f"\nProduct Title: {product_title}\nNum of Vendor: {len(vendors)}\n"
                f"Vendor Name: {vendor_name}\nPrice: {price}\nImage URL: {image}\n"
                f"Vendor Link: {href}\nRating: {rating}\n"
            )
    except Exception:
        print("An error occurred:")
        traceback.print_exc()

# Close the browser
driver.quit()
Code Explanation
- Chrome Options: Configures the browser to run maximized and disables automation flags to avoid detection.
- Stealth Mode: Makes the browser appear human-like to bypass anti-bot measures.
- CSV Setup: Prepares the file that stores the scraped data, with headers such as "Product Title" and "Price."
- Google Shopping URL: Opens a Google search with a specific query (winter+coats). The query is customizable: replace it with your desired product search (e.g., summer+shoes), as shown in the sketch after this list.
- Product Scraping: Locates product elements, clicks each one to expand its details, and extracts the Product Title, Vendor Name, Price, Image URL, Vendor Link, and Rating.
- Error Handling: Logs errors so the script keeps running when data is missing or inaccessible.
- Save to CSV: Appends the extracted data to a CSV file for later analysis.
- Close Browser: Frees resources by quitting the browser after scraping.
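Rather than pasting a long search URL, you can build one from any query. The sketch below keeps only the two parameters that matter here: q (the search terms) and udm=28, which is the parameter selecting the Shopping tab in the URL used above; the remaining parameters appear to be session and viewport tokens that can be dropped. This parameter is undocumented, so Google may change its behavior at any time.

from urllib.parse import quote_plus

def shopping_search_url(query: str) -> str:
    # udm=28 selects the Shopping results tab (taken from the URL above;
    # undocumented and subject to change on Google's side)
    return f"https://www.google.com/search?q={quote_plus(query)}&udm=28"

# Usage: replaces the hard-coded driver.get(...) call in the scraper
driver.get(shopping_search_url("summer shoes"))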
Important Notes
- Dynamic Selectors: Google frequently changes its HTML structure, so you'll need to update the scraper's CSS selectors when the page markup changes. Keeping them in one place makes this easier (see the sketch after this list).
- Legal Considerations: Scraping data from websites may violate their terms of service. Use this script responsibly.
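One way to contain selector churn is to collect every selector into a single mapping, so a markup change becomes a one-line edit rather than a hunt through the script. A minimal sketch of that refactor, using the selectors from the scraper above (which will themselves go stale):

# All Google-specific selectors in one place; update here when the markup changes.
SELECTORS = {
    "product_box": ".SsM98d",
    "product_title": "div.bi9tFe",
    "image": ".q5hmpb img.KfAt4d",
    "vendor_row": "a.P9159d",
    "vendor_name": ".uWvFpd",
    "price_primary": ".GBgquf span span",
    "price_fallback": "span.Pgbknd.xUrPFc",
    "rating": "span.NFq8Ad",
}

# Usage inside the scraper:
# all_box = driver.find_elements(By.CSS_SELECTOR, SELECTORS["product_box"])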
Conclusion
Building a Google Shopping scraper in Python can provide you with valuable insights into product pricing and market competition. While the process is straightforward, keeping the scraper updated with Google’s ever-changing HTML is crucial. With this tool, you can extract and analyze data efficiently, giving you an edge in the competitive e-commerce landscape.
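As a starting point for that analysis, here is a minimal, standard-library-only sketch that reads the products_data.csv file produced by the scraper and prints the cheapest vendor for each product. It assumes prices are simple strings such as "$49.99"; other currency formats would need extra parsing.

import csv

def parse_price(text: str):
    # Strip the currency symbol and thousands separators; return None if unparseable.
    cleaned = text.replace("$", "").replace(",", "").strip()
    try:
        return float(cleaned)
    except ValueError:
        return None

cheapest = {}  # product title -> (price, vendor name)
with open("products_data.csv", newline="", encoding="utf-8") as file:
    for row in csv.DictReader(file):
        price = parse_price(row["Price"])
        if price is None:
            continue
        title = row["Product Title"]
        if title not in cheapest or price < cheapest[title][0]:
            cheapest[title] = (price, row["Vendor Name"])

for title, (price, vendor) in cheapest.items():
    print(f"{title}: ${price:.2f} from {vendor}")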