-
How to scrape movie ratings and reviews from RottenTomatoes.com using Python?
Scraping movie ratings and reviews from RottenTomatoes.com is an excellent way to analyze audience feedback, review trends, and critic scores for films. Python, along with libraries like BeautifulSoup and requests, can be used to scrape static content from the site. If the reviews or ratings are dynamically loaded, Selenium can help render the JavaScript content for scraping. RottenTomatoes structures its data in a systematic way, with dedicated sections for audience reviews, critic reviews, and movie details, making it straightforward to target specific data points.
Before starting, use the developer tools in your browser to inspect the webpage. Identify the HTML tags and classes that house the ratings and reviews. This will guide your scraping script. Here’s an example of scraping static content using BeautifulSoup:import requests from bs4 import BeautifulSoup # Target URL for a specific movie url = "https://www.rottentomatoes.com/m/example_movie" headers = { "User-Agent": "Mozilla/5.0" } # Fetch the page response = requests.get(url, headers=headers) if response.status_code == 200: soup = BeautifulSoup(response.content, "html.parser") # Extract movie details movie_title = soup.find("h1", class_="scoreboard__title").text.strip() critic_score = soup.find("span", class_="scoreboard__score").text.strip() audience_score = soup.find("span", class_="scoreboard__percentage").text.strip() print(f"Movie: {movie_title}") print(f"Critic Score: {critic_score}") print(f"Audience Score: {audience_score}") # Extract audience reviews reviews = soup.find_all("div", class_="audience-review") for review in reviews[:5]: # Limit to first 5 reviews review_text = review.find("p", class_="audience-review__text").text.strip() print(f"Review: {review_text}") else: print("Failed to fetch Rotten Tomatoes page.")
This script extracts the movie title, critic score, audience score, and a few audience reviews. If the reviews are dynamically loaded, Selenium is a better alternative. Here’s an example:
from selenium import webdriver from selenium.webdriver.common.by import By # Initialize Selenium WebDriver driver = webdriver.Chrome() driver.get("https://www.rottentomatoes.com/m/example_movie") # Wait for the page to load driver.implicitly_wait(10) # Extract movie details movie_title = driver.find_element(By.CLASS_NAME, "scoreboard__title").text.strip() critic_score = driver.find_element(By.CLASS_NAME, "scoreboard__score").text.strip() audience_score = driver.find_element(By.CLASS_NAME, "scoreboard__percentage").text.strip() print(f"Movie: {movie_title}") print(f"Critic Score: {critic_score}") print(f"Audience Score: {audience_score}") # Extract audience reviews reviews = driver.find_elements(By.CLASS_NAME, "audience-review") for review in reviews[:5]: # Limit to first 5 reviews review_text = review.find_element(By.CLASS_NAME, "audience-review__text").text.strip() print(f"Review: {review_text}") # Close the browser driver.quit()
In both examples, it’s important to include headers in your requests to mimic a browser and avoid being flagged as a bot. If you want to scrape reviews for multiple movies, you can create a loop that navigates through movie URLs or categories.
For long-term scraping projects, storing the data in a structured format, such as a CSV file or database, is recommended. Libraries like pandas make it easy to write data to a CSV file, while databases like SQLite or PostgreSQL allow for efficient querying and analysis.
Log in to reply.