How to Scrape Amazon Product Data With Python
Data is everything to a business. And whether you’re a merchant, a product retailer, or a developer of new goods for the market, there’s no greater source of public data than Amazon.
So the obvious question is, how can you best scrape Amazon product data? Done right, you can gain exceptional insights and resources, but done wrong, you can find yourself up against Amazon’s anti-bot defenses and other measures.
Need Public Amazon Data?
For pricing intelligence, market research and more, explore our Amazon scraping API & proxies.

In this guide, we’ll not only show how to successfully scrape Amazon products, but also explain why a prebuilt Amazon product scraper isn’t going to meet your large-scale needs. If you’re serious about gathering data, we’ll show you not only the value of building your own Amazon product scraper, but also how best to do it.
Use Cases: Why Scrape Amazon Product Data?
There are a number of benefits to capturing data from Amazon to use for your own needs. A product description, for example, provides insight into branding, features, and functions that could influence your product.
Consider what a product description needs to do – educate the buyer by presenting the best features and functions of that product. It should create a desire to buy and entice the buyer to purchase now.

As a product seller or owner, you need to be able to monitor what the competition is promising your customers. What are they offering, what features are most desirable, and what marketing language and keywords are they using to pull in customers? What they are doing may influence changes you make in your own campaign.
What Product Data Can You Extract?
Consider which data points matter most to your own products, and which you’d want when analyzing the competition. At a public level, there is a wealth of information available, including:
| Data Type | Common HTML Element / Selector | Notes |
| --- | --- | --- |
| Product Title | #productTitle | Use .text.strip() to clean. |
| Price | #corePrice_feature_div .a-offscreen | Can vary by product; also try #priceblock_ourprice or .a-price .a-offscreen. |
| Discount / Deal | #corePrice_feature_div .a-price .a-text-price, #dealprice | Use fallback logic as the structure varies. |
| Rating | #acrPopover title attribute | Example: “4.3 out of 5 stars”. Use .get("title"). |
| Review Count | #acrCustomerReviewText | Example: “1,234 ratings”. |
| Product Description | #productDescription, #feature-bullets ul | Some descriptions are only in the bullet list (#feature-bullets). |
| Images (main) | #landingImage src attribute | For galleries, check #altImages img. |
| ASIN | #ASIN (hidden input), or URL segment (/dp/<ASIN>) | You can also find it under the “Product details” section. |
| Categories | #wayfinding-breadcrumbs_container ul li span.a-list-item | Text list; each span contains a breadcrumb. |
| Availability | #availability span | Example: “In Stock.” or “Currently unavailable.” |
| Shipping Info | #mir-layout-DELIVERY_BLOCK-slot-PRIMARY_DELIVERY_MESSAGE_LARGE | Use .text.strip(). Often dynamically loaded. |
| Seller Info | #merchant-info or #tabular-buybox | Identifies third-party or Amazon as seller. |
| Bestseller Tag | .badge-text, or look for text like “#1 Best Seller” | May appear near title or image. |
| Amazon’s Choice | .ac-badge or badge container with similar text | Not always present. |
| Specifications | #productDetails_techSpec_section_1, #productDetails_detailBullets_sections1 | Tables containing technical and additional product info. |
| Bullet Points | #feature-bullets ul li span | Use .text.strip() on each <span>. |
| Variations (size, color) | Look for twister elements, e.g., #twister .a-dropdown-prompt | Parsing can be complex; may require AJAX calls or JS execution. |
| Customer Questions | #ask_lazy_load_div .a-section.askTeaserQuestions | Often hidden behind a tab; may require JS-rendered scraping. |
As you look at this list, consider what details could help you make informed decisions about your products and marketing efforts.
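Because several of these selectors vary by page layout, it pays to try them in order and fall back when one fails. Here is a minimal sketch of that fallback logic; the `first_text` helper and the inline HTML snippet are illustrative, not Amazon’s actual markup:

```python
from bs4 import BeautifulSoup

def first_text(soup, selectors):
    """Return the stripped text of the first selector that matches, else None."""
    for selector in selectors:
        element = soup.select_one(selector)
        if element:
            return element.text.strip()
    return None

# Illustrative snippet standing in for a product page's price markup
html = '<div id="corePrice_feature_div"><span class="a-offscreen">$49.99</span></div>'
soup = BeautifulSoup(html, 'html.parser')

# Try the most common price selectors in order of likelihood
price = first_text(soup, [
    '#corePrice_feature_div .a-offscreen',
    '#priceblock_ourprice',
    '.a-price .a-offscreen',
])
print(price)  # $49.99
```

The same pattern works for any of the fields in the table above: pass the candidate selectors in order of likelihood, and treat a `None` result as “field not present on this layout.”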
Are There Pre-Built Tools for Scraping Amazon Product Data on the Market?
Sure, there are a variety of tools available, but most have limitations that could interfere with your process and skew your decisions in unintended ways.
This includes:
- Pricing tools that claim to pull the information you need. As dedicated tools for specific sites, even with a reasonable amount of IPs, they inevitably get flagged and blocked by Amazon. They also often limit how fresh the data might be, or how much data you can scrape at a time.
- Plugins and extensions for common browsers. There are a wide range of Chrome extensions, for example, that can often work, at least for a short time. However, most extensions and plugins lack robust proxy rotation or CAPTCHA-solving capabilities, limiting their reliability when scraping Amazon at scale.
In both cases, you also have to consider the manual effort required. Looking to analyze an entire product category? You’re likely looking at tens of thousands of potential pages to scrape. There’s no way to manually do that via Chrome, or even via a cheap tool.

To be frank, pre-built solutions only offer quick results at a smaller scale. As a result, they rarely fit the needs of most businesses.
That’s not to say they can’t succeed, just that most market options are optimized for smaller workloads. For a good example, take a look at our own Web Scraping API. Yes, it’s still a prebuilt tool, but it’s powered by our diverse proxy pool, handles CAPTCHAs, and delivers raw data in seconds. Outside an initial 5,000 free scrapes, however, it certainly isn’t free!
Building Your Own Amazon Product Scraper
If you’ve got big data ambitions, it’s better to build a custom solution. Not only can you optimize it via appropriate proxy rotations, fingerprints, and other bypasses, you can also ensure you’re extracting the exact product details you require.

To build our example Amazon product scraper, we’ll use Python. It’s a popular language that’s robust enough for web scraping challenges and, indeed, we’ve covered this in more detailed guides:
- How to Web Scrape in Python
- 13 Python Web Scraping Projects to Try
- A Comprehensive Guide to Python Web Crawlers
There are several steps you’ll take for scraping Amazon products:
- Install the necessary libraries, such as Requests, BeautifulSoup, and Pandas
- Create a project folder and set up a virtual environment to isolate your dependencies
- Requests will allow you to fetch the site’s HTML response as a string.
- BeautifulSoup will allow you to pull the data you want out of those HTML and XML responses based on attributes, tags, and specific text mentions.
Note: If you don’t know which pages to scrape, or haven’t built an effective means of doing so, we also have a detailed guide on building your own web crawler for Amazon.
We’ll also be adding proxy integration here, in order to get the best results. When it comes to scraping Amazon, we highly recommend using rotating residential proxies. These offer authentic, ISP-backed IPs that are significantly less likely to get blocked. Nothing is perfect, however, so you need to rotate accordingly to best avoid rate limits and potential bans.
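Rotation can be as simple as cycling through a pool of proxy strings and building a fresh proxies dictionary for each request. Here is a minimal sketch; the IPs and credentials are placeholders, and in practice you’d use your provider’s rotating endpoint or a much larger pool:

```python
from itertools import cycle

# Placeholder pool in IP:Port:Username:Password format
proxy_pool = cycle([
    '123.45.67.89:4444:user:pass',
    '98.76.54.32:4444:user:pass',
])

def next_proxies():
    """Build a requests-style proxies dict from the next entry in the pool."""
    ip, port, username, password = next(proxy_pool).split(':')
    url = f'http://{username}:{password}@{ip}:{port}'
    return {'http': url, 'https': url}

# Each call advances the cycle, so consecutive requests use different IPs
print(next_proxies()['http'])  # http://user:pass@123.45.67.89:4444
```

You would then pass `proxies=next_proxies()` to each `requests.get()` call instead of reusing a single static dictionary.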
In this particular code, we’re scraping for the product’s name, rating, price, images, and even the product description. However, feel free to modify this code to extract the exact details you need.
```python
import time, random
import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup
import pandas as pd

# Your proxy string in the format IP:Port:Username:Password
proxy_string = '123.45.67.89:4444:your_username:your_password'

# Split the string into its components
ip, port, username, password = proxy_string.split(':')

# Format it for the 'proxies' dictionary used by requests
proxies = {
    'http': f'http://{username}:{password}@{ip}:{port}',
    'https': f'http://{username}:{password}@{ip}:{port}'
}

custom_headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
                  ' (KHTML, like Gecko) Chrome/135.0.0.0 Safari/537.36',
    'Accept-Language': 'da, en-gb, en',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept': 'text/html,application/xhtml+xml,application/xml;'
              'q=0.9,image/avif,image/webp,*/*;q=0.8',
    'Referer': 'https://www.google.com/'
}

def get_product_info(url):
    response = requests.get(url, headers=custom_headers, proxies=proxies)
    if response.status_code != 200:
        print(f'Error in getting webpage: {url}')
        return None

    soup = BeautifulSoup(response.text, 'lxml')

    title_element = soup.select_one('#productTitle')
    title = title_element.text.strip() if title_element else None

    price_element = soup.select_one('#corePrice_feature_div span.a-offscreen')
    price = price_element.text if price_element else None

    rating_element = soup.select_one('#acrPopover')
    rating_text = rating_element.attrs.get('title') if rating_element else None
    rating = rating_text.replace('out of 5 stars', '') if rating_text else None

    image_element = soup.select_one('#landingImage')
    image = image_element.attrs.get('src') if image_element else None

    description_element = soup.select_one(
        '#productDescription, #feature-bullets > ul'
    )
    description = (
        description_element.text.strip() if description_element else None
    )

    return {
        'title': title,
        'price': price,
        'rating': rating,
        'image': image,
        'description': description,
        'url': url
    }

def parse_listing(listing_url, visited_urls, current_page=1, max_pages=2):
    response = requests.get(
        listing_url, headers=custom_headers, proxies=proxies
    )
    print(response.status_code)
    soup_search = BeautifulSoup(response.text, 'lxml')
    link_elements = soup_search.select(
        '[data-cy="title-recipe"] > a.a-link-normal'
    )
    page_data = []

    for link in link_elements:
        full_url = urljoin(listing_url, link.attrs.get('href'))
        if full_url not in visited_urls:
            visited_urls.add(full_url)
            print(f'Scraping product from {full_url[:100]}', flush=True)
            product_info = get_product_info(full_url)
            if product_info:
                page_data.append(product_info)
            # Randomized delay between product requests to avoid rate limits
            time.sleep(random.uniform(3, 7))

    next_page_el = soup_search.select_one('a.s-pagination-next')
    if next_page_el and current_page < max_pages:
        next_page_url = next_page_el.attrs.get('href')
        next_page_url = urljoin(listing_url, next_page_url)
        print(
            f'Scraping next page: {next_page_url} '
            f'(Page {current_page+1} of {max_pages})',
            flush=True
        )
        page_data += parse_listing(
            next_page_url, visited_urls, current_page+1, max_pages
        )

    return page_data

def main():
    visited_urls = set()
    search_url = 'https://www.amazon.com/s?k=bose'
    data = parse_listing(search_url, visited_urls)
    df = pd.DataFrame(data)
    df.to_csv('headphones.csv', index=False)

if __name__ == '__main__':
    main()
```
What next? After acquiring the data, you might want to export it into a spreadsheet. If that’s the case, see our guide on how to extract data from Amazon to Excel.
Additional Tips for Successful Scraping
There are a variety of ways to improve efficiency and achieve your goals in this process. Here are a few things to remember:
Timing
Timing your requests properly is important. You want to capture the most up-to-date information, and you need to ensure that you are getting new releases on a consistent basis. Set up a schedule to automate this process.
Real-time versus historical data
During this process, be sure you know what type of product description you are capturing. Historical product data could be helpful to some tasks, but most businesses need access to real-time data. You can design your Amazon product scraper to capture that data for you.
Rescrape content at a rate that makes sense
If your industry frequently launches new products or new sellers come online, then more frequent checks may be necessary. For other products, you may not need to rescrape data more than once every week or so.
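One hedged way to encode that cadence is a per-record freshness check, so a scheduled job only refetches products whose data has gone stale. The seven-day window below is just an example; tune it to your industry’s pace:

```python
from datetime import datetime, timedelta

def needs_rescrape(last_scraped, max_age=timedelta(days=7), now=None):
    """True if a record is older than max_age, or was never scraped at all."""
    if last_scraped is None:
        return True
    now = now or datetime.utcnow()
    return now - last_scraped > max_age

# Example: with a reference date of Jan 15, a record from Jan 1 is stale,
# but one from Jan 12 is still fresh under a 7-day window.
ref = datetime(2025, 1, 15)
print(needs_rescrape(datetime(2025, 1, 1), now=ref))   # True
print(needs_rescrape(datetime(2025, 1, 12), now=ref))  # False
```

Your scheduler (cron, a task queue, or a simple loop) can then filter the product list through this check before each run instead of re-scraping everything.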
Rotate user agents
Many of the best sites are investing in TLS fingerprinting and other forms of analysis. In this case, if you don’t rotate your user agents and request headers, even if you rotate your IPs, it might not be enough.
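In a requests-based scraper like the one above, rotating headers can be as simple as picking a different User-Agent per request. A minimal sketch follows; the strings here are examples, and you should maintain a current, realistic list of your own:

```python
import random

# Example pool; keep these up to date with real browser versions
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
    ' (KHTML, like Gecko) Chrome/135.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15'
    ' (KHTML, like Gecko) Version/17.4 Safari/605.1.15',
]

def rotated_headers():
    """Return a fresh header dict with a randomly chosen User-Agent."""
    return {
        'User-Agent': random.choice(USER_AGENTS),
        'Accept-Language': 'en-US,en;q=0.9',
    }

headers = rotated_headers()
print(headers['User-Agent'])
```

Call `rotated_headers()` for each request (instead of reusing one static `custom_headers` dict), ideally alongside your proxy rotation so IP and fingerprint change together.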
Be aware of Amazon’s anti-bot defenses
Amazon’s website is heavily protected, and there are plenty of additional surprises you might encounter. For example:
- You will likely need to enable JavaScript rendering and dynamic content loading. Additional tools like Selenium or Playwright can help here.
- If you come across CAPTCHAs, first try rotating proxies faster and lowering your request frequency. If you still trigger them, consider a solver like 2Captcha.
We didn’t include these in the main Amazon product scraper, because it often depends on your specific situation and how you are interacting with Amazon’s website. However, as your scraping scales up, you should be prepared for these extra challenges.
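As one lightweight precaution, you can check each response body for CAPTCHA markers before parsing it, and back off when one appears. A hedged sketch follows; the marker strings are ones commonly seen on Amazon’s robot-check interstitial, but treat them as assumptions and adjust them based on what you actually observe:

```python
# Assumed marker strings; verify against the interstitial pages you receive
CAPTCHA_MARKERS = (
    'api-services-support@amazon.com',
    'Type the characters you see in this image',
)

def looks_like_captcha(html):
    """Heuristic: does this response body resemble a CAPTCHA interstitial?"""
    lowered = html.lower()
    return any(marker.lower() in lowered for marker in CAPTCHA_MARKERS)

if looks_like_captcha('<p>Type the characters you see in this image</p>'):
    print('CAPTCHA detected: slow down and rotate proxies')
```

Wiring this into `get_product_info()` (skip parsing, swap proxies, and retry after a longer delay) is usually cheaper than routing every challenge to a paid solver.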
The Easiest Way to Scrape
Turn Amazon URLs into raw data in seconds with our Web Scraping API!

Getting Started with Scraping Amazon Products
It’s important to be ethical. Ethical data collection means that you should only ever be pulling data that is publicly available. That means you should not need to log into the Amazon system to capture it.
When you use these strategies to create an Amazon product scraper, you gain the ability to capture the specific information you need, at the scale that’s right for your business, and with the accuracy you require. This makes it a very worthwhile task and, with our tools, it only takes a few minutes to set up.

Rayobyte can help you along the way. Use our proxy services for building your Amazon product scraper to protect your identity. And, you can trust our web scraping API to help you navigate the challenges of creating a scraper that is fast and efficient.
As a business, you need to monitor your product descriptions and those of your competitors. Building a scraper for Amazon product data makes that possible.
The information contained within this article, including information posted by official staff, guest-submitted material, message board postings, or other third-party material is presented solely for the purposes of education and furtherance of the knowledge of the reader. All trademarks used in this publication are hereby acknowledged as the property of their respective owners.