How to Scrape Amazon Prices Using Python

Published on: August 12, 2025

Amazon is a fantastic resource for information. Whether you’re an online merchant or a research firm, it’s a great place to find current prices and gain insights into the global e-commerce market.

However, getting that vital data is a different story. In this guide, we explore how to scrape Amazon prices with a custom solution, why you shouldn’t trust most Amazon price tools on the market, and how to get the best results.

Need Useful Data Fast?

We’ve got all the proxies, APIs, and expertise needed for reliable Amazon scraping!

Why Scraping Amazon Prices Matters

When you scrape Amazon prices, you gain powerful competitive intelligence. Whether you’re a business owner monitoring rival products, a researcher tracking trends, or simply curious, price tracking lets you:

  • Spot pricing trends and competitor strategies
  • Track discounts, deals, and seasonal patterns
  • Monitor delivery and shipping costs
  • Understand customer preferences and market demand

The challenge? Amazon prices change frequently, and manual checking simply can’t keep up. Automated tracking is the answer — but scraping Amazon isn’t as simple as sending a GET request.

Why Simple Amazon Price Tools Fail

There are plenty of Amazon price-checker extensions and basic scripts, but they often break quickly because of Amazon’s anti-bot systems.

Amazon uses a combination of techniques to detect and block bots, including:

  • Rate limiting, restricting the number of requests per IP in a given timeframe
  • IP blocking, banning those that show bot-like behavior. For free tools, the IP pool is often very small and easily detectable, whereas browser extensions place the burden of IP replacement entirely on you.
  • Bot detection, often through checking for missing browser fingerprints, unnatural browsing patterns, or headless automation markers
  • CAPTCHAs, which are triggered when suspicious behavior is detected
  • Dynamic content loading, as many prices and product details are rendered with JavaScript after page load

Extensions and basic HTTP requests usually can’t handle these obstacles for more than a few runs. 
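
To see why, here’s a minimal sketch of the naive approach (the ASIN in the URL is just a placeholder). A bare GET request with default headers is exactly the traffic these defenses are designed to catch, and in practice it often returns a 503 error or a CAPTCHA page instead of product HTML:

import requests

# A naive request with default headers: this is exactly the kind of
# traffic Amazon's anti-bot systems are built to flag. Expect a 503 or
# a CAPTCHA page rather than real product HTML on most runs.
url = "https://www.amazon.com/dp/B0BSHF7WHW"  # placeholder ASIN
response = requests.get(url)

print(response.status_code)  # frequently 503 instead of 200
if "captcha" in response.text.lower():
    print("Blocked: Amazon served a CAPTCHA instead of the product page.")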

Why Should You Build Your Own Amazon Price Scraper?

As you can see, free tools and extensions aren’t going to cut it for most business purposes. For something enterprise-grade, there are two main options.

The first, as explained further down, is to build your own Amazon price scraper. With a custom scraper, you can:

  • Target exactly the data you need
  • Control how often you scrape (hourly, daily, weekly, or based on seasonal patterns)
  • Scale up to track hundreds or thousands of products
  • Integrate with other systems for automatic analysis and reporting
  • Fine-tune anti-detection strategies for higher reliability

Alternatively, you can consider our Web Scraping API, which can convert URLs into raw data in seconds. You’ll still need to know which pages to scrape (and for that, you can build an Amazon Web Crawler), but our API handles proxy rotation and other common detection signals for you. It’s not a free tool, but it is worth it!

However, a custom solution that you own will always give you the most control, along with the most responsibility. So that’s what we will explore here. Even if you choose not to go down this path, you’ll at least understand what goes into successfully scraping Amazon prices.

The Importance of Proxies and Fingerprinting

A robust Amazon scraper needs more than code. It needs infrastructure to stay undetected. 

This means:

  • Use rotating proxies to change your IP on every request to avoid detection. Residential proxies and rotating ISP proxies are best for Amazon.
  • Rotate User-Agent and headers to change your browser fingerprint regularly.
  • Implement human-like timing by randomizing delays between requests to mimic natural browsing.
  • Ensure fingerprint robustness and, for high-volume operations, consider solutions that randomize deeper browser characteristics (e.g., canvas fingerprints, WebRTC data, fonts).

Without these measures, even the most advanced scraper will get blocked.
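
As a rough illustration of the first three points, here’s a minimal sketch of per-request rotation using plain requests (the proxy URLs are placeholders; the full script later in this guide applies the same ideas with Playwright):

import random
import time
import requests

# Placeholder pools: swap in your real proxies and a larger UA list.
PROXIES = ["http://proxy1.example:8000", "http://proxy2.example:8000"]
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 13_4) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.4 Safari/605.1.15",
]

def fetch(url: str) -> requests.Response:
    # A fresh IP and User-Agent on every request...
    proxy = random.choice(PROXIES)
    headers = {
        "User-Agent": random.choice(USER_AGENTS),
        "Accept-Language": "en-US,en;q=0.9",
    }
    response = requests.get(
        url,
        headers=headers,
        proxies={"http": proxy, "https": proxy},
        timeout=30,
    )
    # ...followed by a human-like pause before the next one.
    time.sleep(random.uniform(2, 5))
    return response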

Amazon Price Scraping Tech Stack

There is more than one way to scrape data, but we wanted a solution with a high chance of success that is widely used among tech specialists.

So, while alternatives are of course available, we’ve chosen the following tech stack to build our Amazon price tool. 

Python

We use Python because it’s one of the most widely adopted languages for web scraping. It offers:

  • A huge ecosystem of scraping-related libraries and frameworks
  • Strong community support and abundant documentation
  • Quick development and easy maintenance
  • Built-in and third-party tools for data parsing, storage, and automation

Python also integrates seamlessly with scraping-friendly libraries such as BeautifulSoup for HTML parsing and Playwright for browser automation, which we’re also using here.

Playwright (Browser Automation)

Instead of relying solely on requests and BeautifulSoup to fetch static HTML, we use Playwright to control a real browser instance. This allows us to:

  • Load dynamic, JavaScript-rendered content (including prices that don’t appear in static HTML)
  • Simulate genuine browsing behavior
  • Handle more complex anti-bot mechanisms

Playwright also offers more reliable cross-browser testing than older tools like Selenium, with better async support out of the box.

Asynchronous Execution

We’ve chosen an async architecture for three key reasons:

  1. Async lets us scrape multiple Amazon product pages concurrently without spinning up multiple separate processes.
  2. We avoid unnecessary idle time waiting for network responses by letting other tasks run in the meantime.
  3. Async execution also makes it easier to rotate proxies and headers per request while keeping a high request throughput.

In short, async scraping means more pages scraped in less time, while still letting us add random delays and other anti-detection tactics.
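
To make the pattern concrete, here’s a minimal sketch (fetch_page is a stand-in for whatever per-page scraping function you use; the full script below follows the same shape):

import asyncio
import random

async def fetch_page(url: str) -> str:
    # Stand-in for real fetching logic (Playwright, an HTTP client, etc.).
    # The sleep simulates network wait plus a randomized human-like delay.
    await asyncio.sleep(random.uniform(2, 5))
    return f"<html for {url}>"

async def main():
    urls = [f"https://www.amazon.com/dp/EXAMPLE{i}" for i in range(5)]  # placeholders
    # gather() runs all the coroutines on one event loop: while one task
    # waits on the network, the others keep making progress.
    pages = await asyncio.gather(*(fetch_page(u) for u in urls))
    print(f"Fetched {len(pages)} pages concurrently")

asyncio.run(main())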

Supporting Libraries

Finally, we use a few more Python libraries that shouldn’t surprise anyone. They’re effective, purpose-built go-to options.

  • BeautifulSoup — Simple and powerful HTML parsing, perfect for extracting product data after Playwright renders the page.
  • CSV Handling — Built-in Python CSV tools to store data in a structured format.
  • Regex — To extract embedded JSON/image URLs and parse price formats (see the example after this list).
  • Random/Time — For adding realistic human-like pauses between requests.
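
As a quick example of the regex point, here’s how a small pattern (the same one used in the script below) splits a raw price string into a currency symbol and a numeric value:

import re

raw = "$1,299.99"  # example raw price text as scraped from a product page
m = re.search(r"([^\d.,]*)([\d,]+(?:[.,]\d+)?)", raw)
if m:
    currency = m.group(1).strip()        # "$"
    price = m.group(2).replace(",", "")  # "1299.99"
    print(currency, price)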

Python Script for Amazon Price Scraping

The following script is a starting point for reliable Amazon scraping. It includes:

  • Async Playwright for parallel page loads
  • Proxy rotation for IP diversity
  • User-Agent rotation for basic fingerprint variation
  • Resilient selectors with multiple fallbacks
  • Random delays to reduce bot-like patterns
  • Regex-based extraction of image URLs from embedded JSON
  • CSV output for storing results
  • Retry logic with exponential backoff

You will need to modify it to suit your preferences, the proxies you’re connecting to, and any additional data you may require. Our guides on scraping Amazon product data and extracting data from Amazon to Excel may also be useful additions. 

"""
async_playwright_amazon_scraper.py
Requires: playwright, beautifulsoup4, aiofiles
Install: pip install playwright beautifulsoup4 aiofiles
Then: playwright install
"""

import asyncio
import aiofiles
import io
import re
import csv
import random
import time
from pathlib import Path
from typing import Optional, Dict, List
from bs4 import BeautifulSoup
from playwright.async_api import async_playwright, TimeoutError as PlaywrightTimeoutError

# -------------------
# Configuration
# -------------------
PROXIES = [
    # Format: "http://username:password@ip:port" or "http://ip:port"
    "http://proxy1.example:8000",
    "http://proxy2.example:8000",
]

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 13_4) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.4 Safari/605.1.15",
]

OUTPUT_CSV = Path("amazon_prices.csv")
MAX_RETRIES = 3
BASE_SLEEP = (2, 5)
NAV_TIMEOUT = 30000  # ms

# -------------------
# Helper Functions
# -------------------
def choose_proxy() -> Optional[str]:
    return random.choice(PROXIES) if PROXIES else None

def choose_user_agent() -> str:
    return random.choice(USER_AGENTS)

async def save_results(rows: List[Dict]):
    """Append scraped data to CSV asynchronously."""
    write_header = not OUTPUT_CSV.exists()
    # csv.DictWriter needs a synchronous file object, so render the CSV
    # into an in-memory buffer first, then write it out asynchronously.
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["url", "title", "price", "currency", "rating",
                    "images", "timestamp", "raw_price_text"],
    )
    if write_header:
        writer.writeheader()
    for r in rows:
        writer.writerow(r)
    async with aiofiles.open(OUTPUT_CSV, "a", encoding="utf-8", newline="") as f:
        await f.write(buffer.getvalue())

def extract_images_from_html(text: str) -> List[str]:
    """Extract image URLs from Amazon's HTML."""
    images = re.findall(r'"hiRes":"(https?://[^"]+?)"', text)
    return list(set([u.replace('\\/', '/') for u in images]))

def parse_product_from_html(html: str, url: str) -> Dict:
    """Parse title, price, rating, and images from the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    o = {"url": url, "title": None, "price": None, "currency": None, "rating": None, "images": [], "timestamp":
time.time(), "raw_price_text": None}

    # Title
    title_elem = soup.select_one("#productTitle") or soup.select_one("#title")
    if title_elem:
        o["title"] = title_elem.get_text(strip=True)

    # Price
    price_elem = soup.select_one("span.a-price span.a-offscreen") \
        or soup.select_one("#priceblock_ourprice") \
        or soup.select_one("#priceblock_dealprice")
    if price_elem:
        raw = price_elem.get_text(strip=True)
        o["raw_price_text"] = raw
        m = re.search(r"([^\d.,]*)([\d,]+(?:[.,]\d+)?)", raw)
        if m:
            o["currency"] = m.group(1).strip() or None
            o["price"] = m.group(2).replace(',', '')

    # Rating
    rating_elem = soup.select_one("#acrPopover") or soup.select_one(".a-icon-alt")
    if rating_elem:
        o["rating"] = rating_elem.get_text(strip=True)

    # Images
    o["images"] = extract_images_from_html(html)
    return o

# -------------------
# Main Scraper Function
# -------------------
async def scrape_with_playwright(url: str) -> Optional[Dict]:
    """Scrape a single Amazon product page."""
    for attempt in range(MAX_RETRIES):
        # Pick a fresh proxy and User-Agent on every attempt so a
        # blocked or failing proxy isn't reused on retries.
        proxy = choose_proxy()
        ua = choose_user_agent()
        try:
            async with async_playwright() as p:
                browser_args = {"headless": True}
                if proxy:
                    browser_args["proxy"] = {"server": proxy}
                browser = await p.chromium.launch(**browser_args)
                context = await browser.new_context(user_agent=ua, locale="en-US")
                page = await context.new_page()
                await page.set_extra_http_headers({"Accept-Language": "en-US,en;q=0.9"})
                try:
                    await page.goto(url, timeout=NAV_TIMEOUT, wait_until="networkidle")
                except PlaywrightTimeoutError:
                    await page.goto(url, timeout=NAV_TIMEOUT)
                html = await page.content()
                if "captcha" in html.lower():
                    print(f"[WARN] CAPTCHA detected for {url}")
                    await context.close()
                    await browser.close()
                    return None

                result = parse_product_from_html(html, url)

                await context.close()
                await browser.close()
                return result

        except Exception as e:
            print(f"[ERROR] Attempt {attempt+1} failed for {url}: {e}")
            await asyncio.sleep((2 ** attempt) + random.random())

    return None

# -------------------
# Runner Function
# -------------------
async def scrape_urls(urls: List[str]):
    """Scrape multiple Amazon product URLs."""
    random.shuffle(urls)  # Avoid scraping in the same order every run
    tasks = []
    for url in urls:
        # create_task() starts the coroutine immediately, so the sleep
        # below actually staggers the launch of each scrape.
        tasks.append(asyncio.create_task(scrape_with_playwright(url)))
        await asyncio.sleep(random.uniform(*BASE_SLEEP))
    results = [r for r in await asyncio.gather(*tasks) if r]

    if results:
        await save_results(results)

# -------------------
# Entry Point
# -------------------
if __name__ == "__main__":
    test_urls = ["https://www.amazon.com/dp/B0BSHF7WHW"]
    asyncio.run(scrape_urls(test_urls))

Implementation Notes & Tuning Checklist

As we said, the above is a solid but rudimentary tool for scraping Amazon prices. For large-scale needs, fine-tuning will certainly be needed, and we’ve included some suggestions for tackling the more common challenges.

  • Playwright vs requests+BS: use Playwright when the price or elements are loaded by JS (common). requests + BeautifulSoup is faster but brittle for Amazon.
  • Selectors to prioritize:
    • Title: #productTitle
    • Price: span.a-price span.a-offscreen, #priceblock_ourprice, #priceblock_dealprice, or .a-size-medium.a-color-price
    • Rating: #acrPopover or .a-icon-alt
    • Image JSON: ImageBlockATF or other inline JS blobs
    • Details/specs: #productDetails_techSpec_section_1, #productDetails_detailBullets_sections1, or #feature-bullets
  • Proxy management: maintain proxy health checks (see the sketch after this list) and prefer residential/ISP proxies for retail sites. Rotate proxies per product or per request. Don’t reuse a single bad proxy.
  • Headers / fingerprint: rotate the User-Agent, and also rotate other request headers such as Accept-Language, Referer, sec-fetch-*, and sec-ch-ua to increase header entropy. For each session, vary viewport size, timezone, and locale to make automation harder to detect. For high-scale work, evaluate full fingerprint robustness; commercial services exist for managing browser fingerprints. The script above does not attempt to override low-level fingerprinting signals (WebRTC, canvas, fonts, etc.).
  • CAPTCHAs & blocks: detect them and route those URLs to a manual/paid solution rather than attempting to bypass automatically in an ad-hoc way.
  • Logging & monitoring: capture response codes, elapsed time, proxy used, and whether any bot checks were triggered.
  • Rate limits & randomization: add random.uniform() sleeps and jitter. Use a job queue (Redis/RQ, Celery) or APScheduler for scheduled scrapes.
  • Scaling: for many URLs, use a distributed job queue and proxy pool with health checks. Persist results to a database rather than CSV.
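
As an example of the proxy health checks mentioned above, here’s a minimal sketch that filters a pool down to working proxies before a run (the proxy URLs are placeholders, and httpbin.org is used only as a convenient test endpoint):

import requests

PROXIES = ["http://proxy1.example:8000", "http://proxy2.example:8000"]  # placeholders

def healthy_proxies(proxies, test_url="https://httpbin.org/ip", timeout=10):
    """Return only the proxies that can currently complete a request."""
    alive = []
    for proxy in proxies:
        try:
            r = requests.get(test_url, proxies={"http": proxy, "https": proxy}, timeout=timeout)
            if r.ok:
                alive.append(proxy)
        except requests.RequestException:
            pass  # Dead or blocked proxy: drop it from the rotation
    return alive

PROXIES = healthy_proxies(PROXIES)  # refresh the pool before each run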

Tracking Prices Over Time

To monitor price changes, schedule this script to run at random intervals using a job scheduler (like cron on Linux or Task Scheduler on Windows).

Avoid fixed, predictable scraping times. Even with rotating proxies, if your requests arrive at exactly the same times each day, it’s easier for anti-bot systems to flag the activity as automated. Instead:

  • Use randomized intervals — for example, scrape every 25–40 minutes instead of exactly every 30 (see the sketch after this list).
  • Introduce jitter per request within each scraping run.
  • Randomize product order within your list, so the same items aren’t always scraped in the same sequence.
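
For instance, here’s a minimal long-running loop built on the runner from the script above (it assumes scrape_urls and a list of product URLs are already defined; a cron job that sleeps for a random jitter at startup achieves the same effect):

import asyncio
import random

async def run_forever(urls):
    while True:
        await scrape_urls(urls)  # the runner defined in the script above
        # Scrape every 25-40 minutes instead of exactly every 30.
        delay = random.uniform(25 * 60, 40 * 60)
        print(f"Next run in {delay / 60:.1f} minutes")
        await asyncio.sleep(delay)

asyncio.run(run_forever(["https://www.amazon.com/dp/B0BSHF7WHW"]))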

This helps mimic real-world browsing patterns and reduces the likelihood of detection. And because Amazon’s HTML and anti-bot systems change often, you should also:

  • Run regular selector checks to confirm your scraper is still finding prices
  • Log every failed request to review patterns and adapt
  • Keep your proxy pool fresh and healthy

Final Thoughts

By combining async Playwright with proxy rotation, header randomization, and robust parsing, you’ll have a much higher success rate scraping Amazon prices than with older static HTML methods.

This setup is not a finished product — it’s a foundation you can adapt to your scale, data needs, and infrastructure. High-profile sites often deploy the most advanced protection measures, so your scraping methods need to be equally advanced.

With fine-tuning, you can reliably track Amazon prices over time and integrate that data into your pricing intelligence systems. We also offer a wide range of Amazon scraping solutions to support you, ranging from ethically sourced residential proxies to our unique Web Scraping API.

Need Useful Data Fast?

We’ve got all the proxies, APIs, and expertise needed for reliable Amazon scraping!

The information contained within this article, including information posted by official staff, guest-submitted material, message board postings, or other third-party material is presented solely for the purposes of education and furtherance of the knowledge of the reader. All trademarks used in this publication are hereby acknowledged as the property of their respective owners.
