Automate Retail Price Monitoring with a Python Scraper


Introduction

Tracking prices manually can be overwhelming. On big e-commerce sites like Amazon and Walmart, which process many thousands of transactions a day, prices can change almost by the minute. This is exactly where automated retail price monitoring comes in. Automating the process saves businesses a great deal of time, eliminates human error, and keeps you informed of market trends as they happen. Automated reports keep you thinking big, not small.

In this tutorial, we are going to build an automated retail price monitoring system with Python, using the BeautifulSoup library for what is known as web scraping. The code we write will let us scrape prices from Amazon and Walmart, track how those prices change over time, and get alerted whenever there is a substantial shift in cost. By the end of this Automate Retail Price Monitoring with a Python Scraper tutorial, you'll have a solid tool for keeping your retail business ahead of the competitive curve.


Retail Price Monitoring System

Price monitoring lets you track what your competitors are charging, not only so you can react to changes but also so you can see the bigger picture. With that context you can anticipate the moves other players are likely to make before they happen. If prices for a certain type of product keep declining, for example, that may signal falling demand or new models on the way. Information like that is critical to outpacing the competition and making the moves that drive revenue in an increasingly competitive business environment.

Tools and Libraries Needed


In this section we will cover the tools and libraries you need to set up your Python environment. This is the foundation on which we will build our price monitoring system.

We will go over the key Python libraries needed (a high-level overview):

BeautifulSoup: This library provides tools for extracting information from web pages written in HTML or XML.

Requests: This library lets you make GET and POST requests; we will use it to fetch the web pages that contain the prices we want.

Pandas: A data manipulation library (installed separately, not built in) that is great for organizing and cleaning the data you scrape from web pages.

You need to install these before you start. If you don't have them yet, install them with Python's package manager `pip`. In your command line or terminal, run:

pip install beautifulsoup4

pip install requests

pip install pandas

pip install lxml

This will download and install the libraries so that you can use them in your Python scripts.
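To confirm everything installed correctly, you can print the installed versions using importlib.metadata from Python's standard library (available in Python 3.8+); a quick sanity check:

from importlib.metadata import version

# Print the installed version of each library we just installed
for package in ("beautifulsoup4", "requests", "pandas", "lxml"):
    print(package, version(package))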

Web Scraping Introduction

Now that you have your tools, let's step back a little and talk about web scraping. Web scraping is simply the automated extraction of data from websites. Instead of manually copying information from web pages, you write a script that does it for you: faster, more reliable, and with fewer errors.

This gives us the ability to scrape product prices from e-commerce websites like Amazon and Walmart. That data is what we will then rely on to monitor prices over time.

You should know, however, that scraping has legal implications. Most websites have terms of service that detail what you can and cannot do. Some sites ban scraping outright, while others allow only a limited number of requests or restrict what you may do with the data. Amazon and Walmart, for instance, do not allow automated scraping of their platforms.

Here are some best practices to keep you on the right side of the law:

– Respect the robots.txt file: most sites have a `robots.txt` file that describes where crawlers are and are not allowed to go. Inspect this file and follow what it says (a sketch of checking it programmatically follows this list).

– Don't overload the server: never fire off too many requests in a short span. Doing so puts load on the server and can get your IP address blocked.

– Use proxies as needed: to avoid blocking and throttling, spread large volumes of requests across multiple IPs. Getting blocked is a common issue when scraping, which is why integrating a proxy into your scraper is often necessary.

Following these practices lets you use web scraping safely and ethically while still building an effective retail price monitoring system, gathering the data you need without breaking the rules and guidelines of the sites you work with.
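To make the robots.txt advice concrete, Python's standard library ships a parser for these files. A minimal sketch; the user-agent string "MyPriceMonitor/1.0" is just an example name:

from urllib import robotparser

# Download and parse the site's robots.txt
rp = robotparser.RobotFileParser()
rp.set_url("https://quotes.toscrape.com/robots.txt")
rp.read()

# can_fetch() reports whether this user agent may crawl the given URL
if rp.can_fetch("MyPriceMonitor/1.0", "https://quotes.toscrape.com/"):
    print("Allowed to scrape this page")
else:
    print("robots.txt disallows this page, so skip it")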

Scraping Basics

Before we scrape any specific website, let's start with the basics of web scraping. The idea is to fetch a web page programmatically and extract its content, so you can turn raw markup into useful data.

  1. Making HTTP Requests:


The first step in web scraping is sending an HTTP request to the website you want to scrape. In Python this is done with the Requests library. The response is the web page's HTML content, which you'll extract data from.

– Here’s a simple example:

import requests

url = "https://quotes.toscrape.com/"

download_content = requests.get(url).text  # download the page's HTML with the Requests library

print(download_content)

– In this code we make a GET request to "https://quotes.toscrape.com/" and save the HTML content of the response in the download_content variable.
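It is also good practice to confirm the request succeeded before parsing anything; a minimal sketch:

import requests

url = "https://quotes.toscrape.com/"
response = requests.get(url)

# Only use the body if the server returned HTTP 200 (OK)
if response.status_code == 200:
    download_content = response.text
else:
    print(f"Request failed with status {response.status_code}")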

  2. Parsing HTML:


Now that you have the HTML content, you need to parse it. This is where the BeautifulSoup library comes in. BeautifulSoup lets you target specific HTML elements on the page using methods such as soup.find, soup.select, and others.

Parsing the HTML content:

import requests 

from bs4 import BeautifulSoup

url = "https://quotes.toscrape.com/"

download_content = requests.get(url).text  # download the page's HTML with the Requests library

soup = BeautifulSoup(download_content, 'lxml')  # parse the content with BeautifulSoup

print(soup)

– Now you can search soup for elements of interest, such as product prices, by their tags, classes, or IDs. You can read more in the BeautifulSoup documentation.
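For instance, on quotes.toscrape.com each quote's text sits in a <span> with the class "text" and each author name in a <small> with the class "author", so you can extract them like this:

import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup(requests.get("https://quotes.toscrape.com/").text, 'lxml')

# soup.find_all() returns every element matching the tag and class
for quote in soup.find_all("span", class_="text"):
    print(quote.text)

# soup.select_one() does the same kind of lookup with a CSS selector
print(soup.select_one("small.author").text)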

Extracting Prices from Amazon


With the concepts in place, let's put them into practice and scrape product prices from Amazon.

  1. Identify the Target Element:

– First, go to the Amazon product page and inspect the price element. You can do this by right-clicking on the price and choosing "Inspect" in your browser. Look for a specific identifier, such as an ID or a class name, that you can use to point your scraper at the price.

  2. Write the Scraping Code:

– Once you know which element to target, fetch the page's HTML with `Requests` and extract just the price with BeautifulSoup:

import requests

from bs4 import BeautifulSoup

import smtplib

import csv

import os

from datetime import datetime



url = "https://www.amazon.com/Fitbit-Management-Intensity-Tracking-Midnight/dp/B0B5F9SZW7/?_encoding=UTF8&pd_rd_w=raGwi&content-id=amzn1.sym.9929d3ab-edb7-4ef5-a232-26d90f828fa5&pf_rd_p=9929d3ab-edb7-4ef5-a232-26d90f828fa5&pf_rd_r=A1B0XQ919M066QVE71VN&pd_rd_wg=Aw2vX&pd_rd_r=69a343dc-b5f2-4e2a-ae85-2ca4e3945a26&ref_=pd_hp_d_btf_crs_zg_bs_3375251&th=1"



# Scraping the price from the webpage

download_content = requests.get(url).text  # download the HTML with the Requests library

soup = BeautifulSoup(download_content, 'lxml') #parsing the HTML content using BeautifulSoup

price = soup.find("span", class_="a-price-whole")  # use soup.find to target the price element




# Cleaning and converting the price

price = price.text.strip().replace('.', '')  # the "a-price-whole" text ends with a period, e.g. "79.", so removing it leaves the whole-dollar amount

price = float(price)




# Get the current date

current_date = datetime.now().strftime("%Y-%m-%d")




# File name for the CSV file

file_name = 'price_data.csv'




# Function to save price data to CSV

def save_to_csv(url, price):

    file_exists = os.path.isfile(file_name)

    

    with open(file_name, 'a', newline='') as csvfile:

        fieldnames = ['Date', 'URL', 'Price']

        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)




        # Write header only if the file does not exist

        if not file_exists:

            writer.writeheader()




        # Write data row

        writer.writerow({'Date': current_date, 'URL': url, 'Price': price})




    print(f"Data saved to {file_name}")




# Function to send an email alert

def send_email():

    email = "your email"

    receiver_email = "receiver_email"

    subject = "Walmart Price Alert"

    message = f"Great news! The price has dropped. The new price is now {price}!"

    text = f"Subject:{subject}nn{message}"




    server = smtplib.SMTP("smtp.gmail.com", 587)

    server.starttls()

    server.login(email, "your app password")

    server.sendmail(email, receiver_email, text)

    server.quit()

    print("Email sent!")




# Function to check price and notify if needed

def check_and_notify():

    if price < 80:  # Threshold price for notification

        send_email()




# Save the scraped data to CSV

save_to_csv(url, price)

# Check the price and send an alert if it is below the threshold

check_and_notify()

Code Explanation: The line `requests.get(url).text` makes an HTTP request with Python's Requests library and downloads the HTML content. Then `BeautifulSoup(download_content, 'lxml')` parses that content, and the `soup.find` method targets the price element. The `send_email()` function is responsible for sending the email notification, and `check_and_notify()` only sends an email if the price falls below our threshold of 80. Finally, after scraping, we save the results to a CSV file with Date, URL, and Price columns.

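This is also where pandas, installed earlier, comes in: once a few runs have accumulated, you can read the history back and track the price change over time. A minimal sketch, assuming the price_data.csv produced above:

import pandas as pd

# Load the price history written by save_to_csv()
df = pd.read_csv("price_data.csv", parse_dates=["Date"])

# Minimum, maximum, mean, and most recent price per product URL
print(df.groupby("URL")["Price"].agg(["min", "max", "mean", "last"]))

# Day-over-day price change for the first product in the file
product = df[df["URL"] == df["URL"].iloc[0]].sort_values("Date").copy()
product["Change"] = product["Price"].diff()
print(product[["Date", "Price", "Change"]])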

  3. Handle Potential Issues:

– Amazon is notorious for changing its HTML structure more often than Dr Jekyll adjusts his personality. Be sure to set a proxy as well so that Amazon does not block you. There are many proxy providers on the market; you can use any of them (I am using Rayobyte). Here is how you can use a proxy in your scraper:

import requests

from bs4 import BeautifulSoup

import smtplib

import csv

import os

from datetime import datetime




# Proxy and headers configuration to avoid blocking

proxies = {

    "https": "http://PROXY_USERNAME:PROXY_PASS@PROXY_SERVER:PROXY_PORT/"

}




headers = {

    'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36',

    'accept-encoding': 'gzip'  # ask the server for a gzip-compressed response

}




url = "https://www.amazon.com/Fitbit-Management-Intensity-Tracking-Midnight/dp/B0B5F9SZW7/?_encoding=UTF8&pd_rd_w=raGwi&content-id=amzn1.sym.9929d3ab-edb7-4ef5-a232-26d90f828fa5&pf_rd_p=9929d3ab-edb7-4ef5-a232-26d90f828fa5&pf_rd_r=A1B0XQ919M066QVE71VN&pd_rd_wg=Aw2vX&pd_rd_r=69a343dc-b5f2-4e2a-ae85-2ca4e3945a26&ref_=pd_hp_d_btf_crs_zg_bs_3375251&th=1"




# Scraping the price from the webpage

download_content = requests.get(url, proxies=proxies, headers=headers).text  # download the HTML through the proxy to avoid blocking

soup = BeautifulSoup(download_content, 'lxml') #parsing the HTML content using BeautifulSoup

price = soup.find("span", class_="a-price-whole")  # use soup.find to target the price element





# Cleaning and converting the price

price = price.text.strip().replace('.', '')  # the "a-price-whole" text ends with a period, e.g. "79.", so removing it leaves the whole-dollar amount

price = float(price)




# Get the current date

current_date = datetime.now().strftime("%Y-%m-%d")




# File name for the CSV file

file_name = 'price_data.csv'




# Function to save price data to CSV

def save_to_csv(url, price):

    file_exists = os.path.isfile(file_name)

    

    with open(file_name, 'a', newline='') as csvfile:

        fieldnames = ['Date', 'URL', 'Price']

        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)




        # Write header only if the file does not exist

        if not file_exists:

            writer.writeheader()




        # Write data row

        writer.writerow({'Date': current_date, 'URL': url, 'Price': price})




    print(f"Data saved to {file_name}")




# Function to send an email alert

def send_email():

    email = "your email"

    receiver_email = "receiver_email"

    subject = "Amazon Price Alert"

    message = f"Great news! The price has dropped. The new price is now {price}!"

    text = f"Subject:{subject}nn{message}"




    server = smtplib.SMTP("smtp.gmail.com", 587)

    server.starttls()

    server.login(email, "your app password")

    server.sendmail(email, receiver_email, text)

    server.quit()

    print("Email sent!")




# Function to check price and notify if needed

def check_and_notify():

    if price < 80:  # Threshold price for notification

        send_email()




# Save the scraped data to CSV

save_to_csv(url, price)

# Check the price and send an alert if it is below the threshold

check_and_notify()

Extracting Prices from Walmart

Next, let's scrape prices from Walmart.

  1. Inspect the Product Page:

– Just like with Amazon, inspect the Walmart product page to track down the price element. Walmart has a different HTML structure, so look for its particular class names or IDs.

  2. Write the Scraping Code:

Once you have identified the element, here is how you can scrape its price:

import requests

from bs4 import BeautifulSoup

import smtplib

import csv

import os

from datetime import datetime




# Proxy and headers configuration

proxies = {

    "https": "http://PROXY_USERNAME:PROXY_PASS@PROXY_SERVER:PROXY_PORT/"

}




headers = {

    'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36',

    'accept-encoding': 'gzip'  # ask the server for a gzip-compressed response

}




url = "https://www.walmart.com/ip/Men-s-G-Shock-GA100L-8A-Tan-Silicone-Japanese-Quartz-Sport-Watch/166515367?classType=REGULAR"




# Scraping the price from the webpage

download_content = requests.get(url, proxies=proxies, headers=headers).text  # route the request through the proxy defined above

soup = BeautifulSoup(download_content, 'lxml')

price = soup.find("span", {"itemprop": "price"})  # Walmart marks the price span with itemprop="price"

price = price.text

price = price.replace("Now $", "")  # strip the "Now $" prefix, e.g. "Now $71.20" -> "71.20"

price = float(price)




# Get the current date

current_date = datetime.now().strftime("%Y-%m-%d")




# File name for the CSV file

file_name = 'price_data.csv'




# Function to save price data to CSV

def save_to_csv(url, price):

    file_exists = os.path.isfile(file_name)

    

    with open(file_name, 'a', newline='') as csvfile:

        fieldnames = ['Date', 'URL', 'Price']

        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)




        # Write header only if the file does not exist

        if not file_exists:

            writer.writeheader()




        # Write data row

        writer.writerow({'Date': current_date, 'URL': url, 'Price': price})




    print(f"Data saved to {file_name}")




# Function to send an email alert

def send_email():

    email = "your email"

    receiver_email = "receiver_email"

    subject = "Walmart Price Alert"

    message = f"Great news! The price has dropped. The new price is now {price}!"

    text = f"Subject:{subject}nn{message}"




    server = smtplib.SMTP("smtp.gmail.com", 587)

    server.starttls()

    server.login(email, "your app password")

    server.sendmail(email, receiver_email, text)

    server.quit()

    print("Email sent!")




# Function to check price and notify if needed

def check_and_notify():

    if price < 80:  # Threshold price for notification

        send_email()




# Save the scraped data to CSV

save_to_csv(url, price)

# Check the price and send an alert if it is below the threshold

check_and_notify()

This code fetches the product's price from Walmart. And remember, monitor changes in Walmart's HTML structure just as you would with Amazon's.
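One pitfall shared by both scrapers: soup.find returns None when its selector no longer matches anything, so the scripts above would crash with a confusing AttributeError the day the markup changes. A small guard fails loudly instead; this sketch wraps the Walmart lookup in a hypothetical extract_price() helper:

from bs4 import BeautifulSoup

def extract_price(html):
    """Parse the Walmart price, failing loudly if the markup changed."""
    soup = BeautifulSoup(html, 'lxml')
    price_element = soup.find("span", {"itemprop": "price"})
    if price_element is None:
        # The selector found nothing: the layout changed or we were blocked
        raise RuntimeError("Price element not found on the page")
    return float(price_element.text.replace("Now $", ""))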

Setting Up the Price Monitoring System

We will use the Gmail SMTP server, which requires an app password for the SMTP settings. If two-factor authentication is not enabled on your Gmail account, enable it first, then go to the App Password page.

Enter the name of your app and click on ‘Create’.

Copy the password and store it in a secure place, then click 'Done'.


Setting Up Alerts

A price monitoring system needs to alert you when there is a significant change in the market. You can add an alert that notifies you whenever the price crosses above or below your predefined limits, so you don't need to check for changes manually. That way you always know what is happening and can adjust to market changes as needed.

Let me show you how to enable email alerts for price changes.

  1. Choose an Alert Trigger:

First, determine what counts as a "significant" price change for your use case. This could be a percentage change or a dollar amount; you might, for instance, want to be notified if the price of a product drops by more than 5%, or by $5 (a percent-based check is sketched after this list).

  2. Sending Email Alerts with Python:

smtplib is a module in Python's standard library for sending emails directly from your script. It is easy to automate so that when your scraper registers a major price change, it mails you an alert.

  3. Example Code for Sending Email Alerts:

Here is how to set up email alerts step by step.

import smtplib





def send_email():

    email = "your_email"

    receiver_email = "receiver_email"




    subject = "price alert"

    message = f"Great news! The price has dropped. The new price is now {price}!"

    text = f"Subject:{subject}nn{message}"




    server = smtplib.SMTP("smtp.gmail.com",587)

    server.starttls()

    server.login(email,"app password")

    server.sendmail(email, receiver_email, text)

    server.quit()

    print("email sent")

 

 

def check_and_notify():

    if price < 80:  # "price" comes from the scraping code above; 80 is our threshold

        send_email()

– Explanation:

send_email() Function: Builds an email with a subject and body and sends it to the given address. It connects to your email provider's SMTP server (in this case Gmail) via the smtplib library and sends the message.

check_and_notify() Function: Checks whether the price has crossed the threshold (here, $80). If it has, it calls send_email() to notify you.

  4. Customizing the Alert:

The alert message can be extended or modified to include details such as the product's name, the old and new prices, and a link back to its page.
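Going back to the trigger in step 1, here is a sketch of a percent-based check that compares the newly scraped price against the last price recorded for the same product in price_data.csv. The 5% default and the helper name are illustrative:

import csv
import os

def price_dropped_significantly(new_price, url, threshold_pct=5.0, file_name='price_data.csv'):
    if not os.path.isfile(file_name):
        return False  # no history recorded yet
    # Scan the CSV for the most recent price recorded for this URL
    last_price = None
    with open(file_name, newline='') as csvfile:
        for row in csv.DictReader(csvfile):
            if row['URL'] == url:
                last_price = float(row['Price'])
    if last_price is None:
        return False  # this product has no history yet
    drop_pct = (last_price - new_price) / last_price * 100
    return drop_pct > threshold_pct

You would then call send_email() whenever price_dropped_significantly(price, url) returns True.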

At a minimum, you are notified quickly when a price changes drastically, which is especially useful in markets where timing is crucial. With automatic alerts you can respond in real time, adjust your own prices, or make a more informed decision about whether buying at that moment is optimal.
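For those alerts to arrive on time, the scraper has to run on a schedule rather than only when you remember to launch it. You can use cron on Linux/macOS or Task Scheduler on Windows, or keep everything in Python with a simple loop. A minimal sketch, assuming the scrape, save, and notify steps above are wrapped in a check_price() function; the hourly interval is just an example:

import time
import random

def check_price():
    # Placeholder for the scrape -> save_to_csv -> check_and_notify steps above
    print("Checking price...")

while True:
    check_price()
    # Sleep roughly an hour, with jitter so the requests are not machine-timed
    time.sleep(3600 + random.uniform(-300, 300))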

Legal and Ethical Concerns

When web scraping, it is important to do so responsibly and lawfully. Scraping can be a powerful way to gather and structure information, but remember that websites are someone else's property. Failing to follow the rules can get you into legal trouble, get your IP banned, or damage your reputation.

Compliance With Website Terms of Service

Always visit a website's terms of service (ToS) page before you start scraping it. Many websites state whether web scraping is allowed. Some e-commerce sites, for instance, may allow scraping for personal use but not grant you the right to use the data commercially. Breaking these terms could get you into trouble or blacklisted.

Understanding robots.txt:

The robots.txt file describes which parts of a site can and cannot be accessed by automated tools. While robots.txt is not a legally binding document, following it is good practice; ignoring it can result in your scraper being blocked or your IP blacklisted.

Preventing Server Overload

Scraping sends many requests to a server to extract data. If your scraper fires off frequent requests in a short interval, the server can become overloaded and slow down, or even crash. Beyond that, your scraper can easily be labelled malicious if the site notices you scraping too aggressively. To get around this, space out your requests with a delay between each one and cap the number of attempts per second.
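In code, that pacing can be as simple as a short randomized pause between requests; a sketch with placeholder URLs:

import time
import random
import requests

urls = [
    "https://example.com/product/1",
    "https://example.com/product/2",
]

for url in urls:
    response = requests.get(url)
    print(url, response.status_code)
    # Wait two to five seconds so the server is never hammered
    time.sleep(random.uniform(2, 5))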

Always Take Due Care with Personal Data:

If you scrape user reviews or comments, for example, your script may also collect personal data, which must be handled responsibly. Make sure your solution complies with data protection regulations such as the GDPR (which governs how personal data can be collected, stored, and used) if you target consumers based in Europe.
