Extract Restaurant Details, Customer Reviews and Ratings from TripAdvisor using Python

Video tutorial: https://youtu.be/kFs9YkxGvZE?si=1ZmcXPBpnUWgjNT7

Table of Contents

Introduction
Ethical consideration
Prerequisites
Project setup
Inspecting the Elements
Complete Code First Page
The result in CSV format
Handling Pagination

Understanding the URL Structure
Implementing Pagination in the code
Implementing Proxy Rotation

The complete code
Conclusion

Introduction

TripAdvisor stands as a premier platform for reviews and ratings related to hotels, restaurants, attractions, and travel experiences. For professionals in the hospitality and travel sectors, the ability to analyze and extract data from TripAdvisor is invaluable. Whether your goal is to gather customer feedback, monitor service trends, or conduct comparative analyses of various establishments, scraping TripAdvisor reviews can yield powerful insights.

In this tutorial, we will provide a detailed, step-by-step guide on how to scrape the restaurant details, customer reviews, ratings, and review dates from a Michelin Star restaurant in New York using Python. Our focus will be on extracting data from the following page:

https://www.tripadvisor.com/Restaurant_Review-g60763-d478965-Reviews-Gallaghers_Steakhouse-New_York_City_New_York.html

By the end of this tutorial, you’ll have the tools to efficiently scrape and collect user feedback for deeper analysis, helping you make data-driven decisions. Let’s dive into the world of web scraping and get started with practical examples and source code!

The information we want to scrape is:

Restaurant Name
Price Level
Cuisine Type
Total Rating
Total Reviews
Ranking
City
Food Rating
Service Rating
Value Rating
Atmosphere Rating
Address
Phone Number
Customer Rating
Review Title
Review Details
Customer Type
Written Date

Restaurant Details

Details Page

Ethical consideration

This tutorial employs widely used web scraping methods for educational purposes. When interacting with public servers, it is crucial to approach the task responsibly. Here are some essential guidelines:

Avoid scraping at a speed that could negatively impact the website’s performance.
Do not scrape data that isn’t publicly accessible.
Refrain from redistributing entire public datasets, as this may violate legal regulations in certain jurisdictions.
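To put the first guideline into practice, one option is to pause for a random interval between consecutive requests. Below is a minimal sketch, assuming a 3 to 8 second range (an arbitrary choice on our part, not a documented TripAdvisor limit). polite_pause is a hypothetical helper name, and its sleep parameter exists only so the function can be tested without actually waiting.

```python
import random
import time


def polite_pause(min_seconds=3.0, max_seconds=8.0, sleep=time.sleep):
    """Sleep for a random number of seconds between requests and return the delay."""
    if min_seconds < 0 or max_seconds < min_seconds:
        raise ValueError("expected 0 <= min_seconds <= max_seconds")
    # A random delay avoids hitting the server at a fixed, machine-like rhythm.
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay
```

You would call polite_pause() between requests.get() calls, for example between review pages.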

Prerequisites

Python installed on your machine.

Project Setup

For this tutorial, we will utilize the requests and BeautifulSoup libraries:

pip install requests beautifulsoup4

Given that TripAdvisor employs robust anti-bot detection mechanisms, it is advisable to incorporate headers and proxies when scraping data from this site. While scraping without proxies may suffice for small-scale tasks, using them helps prevent IP blocking during extensive scraping activities.

Let’s begin by importing the necessary libraries and defining the start URL:

import requests
from bs4 import BeautifulSoup
import logging
import csv

url = 'https://www.tripadvisor.com/Restaurant_Review-g60763-d478965-Reviews-Gallaghers_Steakhouse-New_York_City_New_York.html'

We set headers such as User-Agent so that our request looks like it comes from a regular browser, which makes it less likely to be flagged as a bot. DNT stands for “Do Not Track”; including it makes the request look more like one sent by a real user’s browser.

We recommend using a proxy for this project. In this tutorial, I’m using the residential proxy from Rayobyte. You can sign up for a free trial of 50MB, and the best part is that no credit card is required to redeem it.

headers = {
    # Browser-like User-Agent; the value must be a quoted string
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    # No "br": requests decodes Brotli only if the optional brotli package is installed
    "Accept-Encoding": "gzip, deflate",
    "DNT": "1",
    "Connection": "keep-alive",
    "Upgrade-Insecure-Requests": "1",
}


proxies = {
    'http': 'http://username:password@host:port',
    'https': 'http://username:password@host:port'
}
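Rather than hard-coding credentials in the script, we could read them from environment variables. This is a minimal sketch: the variable names PROXY_USER, PROXY_PASS, PROXY_HOST and PROXY_PORT are hypothetical, and the username and password are percent-encoded so that characters such as @ or : in a password do not break the proxy URL.

```python
import os
from urllib.parse import quote


def proxies_from_env():
    """Build a requests-style proxies dict from environment variables, or return None."""
    # Hypothetical variable names chosen for this example.
    names = ("PROXY_USER", "PROXY_PASS", "PROXY_HOST", "PROXY_PORT")
    values = [os.environ.get(name) for name in names]
    if not all(values):
        return None  # with proxies=None, requests sends no explicit proxy
    user, password, host, port = values
    # Percent-encode the credentials so characters like "@" or ":" stay unambiguous.
    proxy_url = f"http://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"
    return {"http": proxy_url, "https": proxy_url}
```

With the variables set in your shell, proxies = proxies_from_env() can replace the literal dictionary above.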

To handle potential errors during execution, we will use a try-except block:

try:
    response = requests.get(url, headers=headers, proxies=proxies, timeout=30)
    response.raise_for_status()  # treat 403/429/5xx responses as errors too
    soup = BeautifulSoup(response.content, 'html.parser')

    # Write the prettified HTML to a txt file
    with open('tripadvisor_restaurant_review.txt', 'w', encoding='utf-8') as file:
        file.write(soup.prettify())
    print("Content successfully written to 'tripadvisor_restaurant_review.txt'")

except requests.exceptions.RequestException as e:
    logging.error(f"Error during requests to {url} : {str(e)}")

Instead of printing output results directly in the terminal, we will save them in a text file for easier review. This output file allows us to verify whether we successfully received a response from the website.

Response txt file

The output file shows that, with the headers and proxies in place, we receive the page’s HTML and can parse it.

The purpose of this txt file is to check whether we get the response from the website or not. We can delete this file after that.
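Blocks and timeouts are often temporary, so it can help to retry a failed request a few times with a growing delay. The helper below is a generic sketch of exponential backoff with jitter; with_retries is a hypothetical name, and the attempt count and delays are assumptions rather than values taken from TripAdvisor.

```python
import random
import time


def with_retries(func, attempts=3, base_delay=2.0, retry_on=(Exception,), sleep=time.sleep):
    """Call func(); if it raises one of retry_on, wait and try again, up to `attempts` calls."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # no attempts left: re-raise the last error
            # Exponential backoff (2s, 4s, 8s, ...) plus up to 1s of random jitter.
            sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, 1))
```

For example, you could wrap a small function that calls requests.get(url, headers=headers, proxies=proxies, timeout=30) followed by response.raise_for_status(), and pass retry_on=(requests.exceptions.RequestException,) so that only network and HTTP errors trigger a retry.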

Inspecting the Elements

To inspect the elements, go back to the website, right-click anywhere on the page, and choose “Inspect”. Then click the arrow icon and hover over the element you want.

Inspect element arrow

In this tutorial, we’ll focus our efforts on collecting reviews from a single restaurant, allowing us to scrape its details just once. After gathering this foundational information, we can easily replicate it across subsequent rows. The key details that will remain consistent include the restaurant’s name, ranking, total rating, price range, categories, address, city, and phone number. With this information in place, we’ll dive into scraping individual ratings and customer reviews to enrich our dataset.

Restaurant Name

Restaurant Name Element

restaurant_name = soup.find('h1').text.strip()

Price Level and Cuisine Type

Price Level and Cuisine Type element

general_infos = soup.find('span', class_='cPbcf').text.strip()

This returns all of the values in a single string, such as “$$$$, Steakhouse, Seafood, Gluten free options”. We can split the string on commas: the first part is the price_level, and the remaining parts form the cuisine_type.

info_parts = general_infos.split(', ')
price_level = info_parts[0]
cuisine_type = ', '.join(info_parts[1:])

Price Level and Cuisine Type output separate
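The same split can be wrapped in a small function that does not misassign fields when a restaurant lists no price level. This sketch assumes, as on this page, that the price level, when present, comes first and starts with “$”; split_general_info is a hypothetical helper name.

```python
def split_general_info(general_infos):
    """Split e.g. "$$$$, Steakhouse, Seafood" into (price_level, cuisine_type)."""
    text = general_infos.strip()
    first, _, rest = text.partition(', ')
    # Assumption: a price level, when present, is listed first and starts with "$".
    if first.startswith('$'):
        return first, rest
    return '', text  # no price level: everything is cuisine
```

For this restaurant, split_general_info(general_infos) returns ('$$$$', 'Steakhouse, Seafood, Gluten free options').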

Total Rating

Total Rating Element Total Rating Tag

detail_cards = soup.find_all('div', attrs={'data-automation': 'OVERVIEW_TAB_ELEMENT'})

These detail_cards are the overview cards that appear here:

Details card location

All the rating information is inside the first card, at index 0.

rating_info = detail_cards[0]
total_rating = rating_info.find('span', class_='biGQs').text.strip()

Total Reviews

Total review element Total review tag

total_reviews = rating_info.find('div', class_='jXaJR').text.strip().replace(' reviews', '')
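For analysis, a number is usually more useful than text such as “5,977 reviews”. The sketch below (parse_review_count is a hypothetical name) pulls out the first number with a regular expression and removes the thousands separator. Unlike replace(' reviews', ''), it also copes with a singular “1 review” label, should the page use one.

```python
import re


def parse_review_count(text):
    """Return the integer in text such as "5,977 reviews", or None if there is none."""
    match = re.search(r'\d[\d,]*', text)
    if match is None:
        return None
    return int(match.group().replace(',', ''))
```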

Ranking text, Ranking and City

Ranking and city element

ranking_tag = rating_info.find_all('a', class_='BMQDV')
ranking_text = ranking_tag[1].find('span').text.strip().replace('#', '')
ranking = ranking_text.split()[0]

To extract the city from ranking_text, we split the string into words, find the index of the word ‘in’, and join every word after it.

in_index = ranking_text.split().index('in')
city = ' '.join(ranking_text.split()[in_index + 1:])

Ranking and City output
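The ranking and city logic can also be wrapped in a function that returns empty strings instead of raising a ValueError when the text contains no ‘in’. This is a sketch; parse_ranking is a hypothetical name.

```python
def parse_ranking(ranking_text):
    """Return (ranking, city) from text like "#43 of ... Restaurants in New York City"."""
    words = ranking_text.replace('#', '').split()
    if not words:
        return '', ''
    ranking = words[0]
    if 'in' not in words:
        return ranking, ''  # no "in": keep the ranking, leave the city empty
    # The city is every word after the first standalone "in".
    city = ' '.join(words[words.index('in') + 1:])
    return ranking, city
```

For example, parse_ranking("#43 of 6,000 Restaurants in New York City") returns ('43', 'New York City'); the “6,000” in that string is made up for illustration.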

Food Rating, Service Rating, Value Rating and Atmosphere Rating

All rating element All rating tag

Inside the div tag with class="khxWm" there are four div tags with the same class="YwaWb". If we expand them, we can see one rating category in each.

rating_container = rating_info.find('div', class_='khxWm')
rating_category = rating_container.find_all('div', class_='YwaWb')

The food rating is in the first category, followed by service, value, and atmosphere:

Food rating tag example

food_rating = rating_category[0].find('svg', class_='UctUV').find('title').text.strip().replace(' of 5 bubbles', '')
service_rating = rating_category[1].find('svg', class_='UctUV').find('title').text.strip().replace(' of 5 bubbles', '')
value_rating = rating_category[2].find('svg', class_='UctUV').find('title').text.strip().replace(' of 5 bubbles', '')
atmosphere_rating = rating_category[3].find('svg', class_='UctUV').find('title').text.strip().replace(' of 5 bubbles', '')

All rating output
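Because the same .find('title').text.strip().replace(' of 5 bubbles', '') chain is repeated for every rating, and again for customer ratings below, it could be factored into one helper that also converts the value to a number. This is a sketch, assuming the title text always has the form “X of 5 bubbles”; bubbles_to_float is a hypothetical name.

```python
import re


def bubbles_to_float(title_text):
    """Convert an SVG title like "4.5 of 5 bubbles" to 4.5, or None if it doesn't match."""
    match = re.match(r'\s*(\d+(?:\.\d+)?) of 5 bubbles', title_text)
    return float(match.group(1)) if match else None
```

Usage would look like food_rating = bubbles_to_float(rating_category[0].find('svg', class_='UctUV').find('title').text). Returning None for unexpected text keeps one odd card from crashing the whole run.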

Address and Phone Number

Address and Phone Number element

location_info = detail_cards[2]
address = location_info.find('span', class_='biGQs').text.strip()

Phone Number tag

phone_no = location_info.find('a', attrs={'aria-label': 'Call'}).get('href').replace('tel:','')

Customer Rating

Please note that the reviews and ratings we retrieve follow the “Detailed Reviews” order rather than “Most Recent”. Although the browser shows the “Most Recent” view, the HTML returned by our requests lists the reviews in “Detailed Reviews” order.

First, locate all review cards within the webpage:

Customer review card

Here is an example of the first review card, inside the div tag with class="_c".

We will use data-automation="reviewCard" for specificity:

review_cards = soup.find_all('div', attrs={'data-automation': 'reviewCard'})

Customer rating

The rating is displayed as a visual element (a row of bubbles) with no numerical value. However, if we expand the svg tag, we can find the rating value inside its title tag, which reads “5.0 of 5 bubbles”.

Customer rating tag

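The snippets in this part run inside a loop over review_cards (for review in review_cards:), where review is the current card. To extract just the numerical value (e.g., ‘5.0’), we use the replace() method to remove the “ of 5 bubbles” part of the string.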

rating_element = review.find('svg', class_='UctUV')
customer_rating = rating_element.find('title').text.strip().replace(' of 5 bubbles', '')

Review Title

Review title element Review title tag

review_title = review.find('div', attrs={'data-test-target': 'review-title'}).text.strip()

Review Details

Review details element Review details tag

review_details = review.find('div', attrs={'data-test-target': 'review-body'}).text.strip()

Customer Type

Customer type tag

customer_type = review.find('span', class_='DlAxN').text.strip()

Written Date

Written date element Written date tag

To extract the date from the HTML, the div tag with the class="neAPm" contains two inner div tags. We need to target the first inner div to find the date. Here’s the code:

date_element = review.find('div', {'class': 'neAPm'})
child_divs = date_element.find_all('div')
date = child_divs[0].text.strip().replace('Written ', '')
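The written date is plain text such as “October 2, 2022”. If you want to sort or filter reviews by date, you could convert it to a datetime.date. A minimal sketch; parse_written_date is a hypothetical name, and %B assumes English month names (what Python uses under the default C locale).

```python
from datetime import datetime


def parse_written_date(text):
    """Parse "Written October 2, 2022" or "October 2, 2022" into a datetime.date."""
    cleaned = text.strip()
    prefix = 'Written '
    if cleaned.startswith(prefix):
        cleaned = cleaned[len(prefix):]
    # %B expects English month names, as produced under the default C locale.
    return datetime.strptime(cleaned, '%B %d, %Y').date()
```

Calling .isoformat() on the result gives a sortable string such as 2022-10-02 for the CSV.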

The complete code for scraping the reviews on the first page and saving them in CSV format

import requests
from bs4 import BeautifulSoup
import logging
import csv
import time


def setup_request():

    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.5",
        # No "br": requests decodes Brotli only if the optional brotli package is installed
        "Accept-Encoding": "gzip, deflate",
        "DNT": "1",
        "Connection": "keep-alive",
        "Upgrade-Insecure-Requests": "1",
    }
    proxies = {
        'http': 'http://username:password@host:port',
        'https': 'http://username:password@host:port'
    }

    return headers, proxies


def get_restaurant_info(soup):
    restaurant_info = {
        'name': '',
        'price_level': '',
        'cuisine_type': '',
        'total_rating': '',
        'total_reviews': '',
        'food_rating': '',
        'service_rating': '',
        'value_rating': '',
        'atmosphere_rating': '',
        'ranking': '',
        'city': '',
        'address': '',
        'phone_no': ''
    }

    restaurant_info['name'] = soup.find('h1').text.strip()

    # General info processing
    general_infos = soup.find('span', class_='cPbcf').text.strip()
    info_parts = general_infos.split(', ')
    restaurant_info['price_level'] = info_parts[0]
    restaurant_info['cuisine_type'] = ', '.join(info_parts[1:])

    # Rating and review info
    detail_cards = soup.find_all(
        'div', attrs={'data-automation': 'OVERVIEW_TAB_ELEMENT'})
    if detail_cards:
        rating_info = detail_cards[0]
        restaurant_info['total_rating'] = rating_info.find(
            'span', class_='biGQs').text.strip()
        reviews_text = rating_info.find('div', class_='jXaJR').text.strip()
        restaurant_info['total_reviews'] = reviews_text.replace(' reviews', '')

        # Detailed ratings
        try:
            rating_container = rating_info.find('div', class_='khxWm')
            if rating_container:
                rating_category = rating_container.find_all(
                    'div', class_='YwaWb')
                if len(rating_category) >= 4:
                    restaurant_info['food_rating'] = rating_category[0].find(
                        'svg', class_='UctUV').find('title').text.strip().replace(' of 5 bubbles', '')
                    restaurant_info['service_rating'] = rating_category[1].find(
                        'svg', class_='UctUV').find('title').text.strip().replace(' of 5 bubbles', '')
                    restaurant_info['value_rating'] = rating_category[2].find(
                        'svg', class_='UctUV').find('title').text.strip().replace(' of 5 bubbles', '')
                    restaurant_info['atmosphere_rating'] = rating_category[3].find(
                        'svg', class_='UctUV').find('title').text.strip().replace(' of 5 bubbles', '')
        except Exception as e:
            logging.error(f"Error extracting detailed ratings: {str(e)}")

        # Ranking and city info
        ranking_tag = rating_info.find_all('a', class_='BMQDV')
        if len(ranking_tag) > 1:
            ranking_text = ranking_tag[1].find('span').text.strip()
            restaurant_info['ranking'] = ranking_text.split()[
                0].replace('#', '')
            in_index = ranking_text.split().index('in')
            restaurant_info['city'] = ' '.join(
                ranking_text.split()[in_index + 1:])

    # Address and phone info
    if len(detail_cards) > 2:
        location_info = detail_cards[2]
        restaurant_info['address'] = location_info.find(
            'span', class_='biGQs').text.strip()

        # Phone number
        try:
            phone_link = location_info.find('a', attrs={'aria-label': 'Call'})
            if phone_link:
                restaurant_info['phone_no'] = phone_link.get(
                    'href').replace('tel:', '')
        except Exception as e:
            logging.error(f"Error extracting phone number: {str(e)}")

    return restaurant_info


def scrape_reviews(soup):
    reviews = []
    review_cards = soup.find_all(
        'div', attrs={'data-automation': 'reviewCard'})

    for review in review_cards:
        review_data = {
            'rating': '',
            'title': '',
            'text': '',
            'date': ''
        }

        rating_element = review.find('svg', class_='UctUV')
        if rating_element:
            review_data['rating'] = rating_element.find(
                'title').text.strip().replace(' of 5 bubbles', '')

        title_element = review.find(
            'div', attrs={'data-test-target': 'review-title'})
        if title_element:
            review_data['title'] = title_element.text.strip()

        text_element = review.find(
            'div', attrs={'data-test-target': 'review-body'})
        if text_element:
            review_data['text'] = text_element.text.strip()

        date_element = review.find('div', class_='neAPm')
        if date_element:
            child_divs = date_element.find_all('div')
            if child_divs:
                review_data['date'] = child_divs[0].text.strip().replace(
                    'Written ', '')

        reviews.append(review_data)

    return reviews


def save_to_csv(restaurant_info, reviews, filename):
    with open(filename, mode='w', newline='', encoding='utf-8') as file:
        writer = csv.writer(file)

        # Write header
        header = ['RESTAURANT_NAME', 'PRICE_LEVEL', 'CUISINE_TYPE', 'TOTAL_RATING',
                  'TOTAL_REVIEWS', 'FOOD_RATING', 'SERVICE_RATING', 'VALUE_RATING',
                  'ATMOSPHERE_RATING', 'RANKING', 'CITY', 'ADDRESS', 'PHONE_NO',
                  'RATING', 'REVIEW_TITLE', 'REVIEW_TEXT', 'REVIEW_DATE']
        writer.writerow(header)

        # Write reviews with restaurant info
        for review in reviews:
            row = [
                restaurant_info['name'],
                restaurant_info['price_level'],
                restaurant_info['cuisine_type'],
                restaurant_info['total_rating'],
                restaurant_info['total_reviews'],
                restaurant_info['food_rating'],
                restaurant_info['service_rating'],
                restaurant_info['value_rating'],
                restaurant_info['atmosphere_rating'],
                restaurant_info['ranking'],
                restaurant_info['city'],
                restaurant_info['address'],
                restaurant_info['phone_no'],
                review['rating'],
                review['title'],
                review['text'],
                review['date']
            ]
            writer.writerow(row)


def main():
    # First page: the URL has no "-or<offset>-" segment
    url = 'https://www.tripadvisor.com/Restaurant_Review-g60763-d478965-Reviews-Gallaghers_Steakhouse-New_York_City_New_York.html'
    headers, proxies = setup_request()

    try:
        response = requests.get(url, headers=headers, proxies=proxies, timeout=30)
        response.raise_for_status()  # raise on 403/429/5xx instead of parsing an error page
        soup = BeautifulSoup(response.content, 'html.parser')
        time.sleep(10)

        restaurant_info = get_restaurant_info(soup)
        reviews = scrape_reviews(soup)

        save_to_csv(restaurant_info, reviews,
                    'tripadvisor_ny_restaurant_reviews_details_page_1.csv')
        print("All information saved successfully")

    except requests.exceptions.RequestException as e:
        logging.error(f"Error during requests to {url} : {str(e)}")


if __name__ == "__main__":
    main()
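save_to_csv() keeps the header list and the row list in sync by hand, so adding a column means editing both. An alternative sketch uses csv.DictWriter with a single column mapping. write_reviews_csv, RESTAURANT_COLUMNS and REVIEW_COLUMNS are hypothetical names; the function takes an already open file object, so you would pass it the result of open(filename, 'w', newline='', encoding='utf-8').

```python
import csv

# (CSV column, dict key) pairs, in the same order as the header in save_to_csv().
RESTAURANT_COLUMNS = [
    ('RESTAURANT_NAME', 'name'), ('PRICE_LEVEL', 'price_level'),
    ('CUISINE_TYPE', 'cuisine_type'), ('TOTAL_RATING', 'total_rating'),
    ('TOTAL_REVIEWS', 'total_reviews'), ('FOOD_RATING', 'food_rating'),
    ('SERVICE_RATING', 'service_rating'), ('VALUE_RATING', 'value_rating'),
    ('ATMOSPHERE_RATING', 'atmosphere_rating'), ('RANKING', 'ranking'),
    ('CITY', 'city'), ('ADDRESS', 'address'), ('PHONE_NO', 'phone_no'),
]
REVIEW_COLUMNS = [
    ('RATING', 'rating'), ('REVIEW_TITLE', 'title'),
    ('REVIEW_TEXT', 'text'), ('REVIEW_DATE', 'date'),
]


def write_reviews_csv(restaurant_info, reviews, fileobj):
    """Write a header plus one row per review, repeating the restaurant details."""
    writer = csv.DictWriter(
        fileobj, fieldnames=[column for column, _ in RESTAURANT_COLUMNS + REVIEW_COLUMNS])
    writer.writeheader()
    # Restaurant details are identical on every row, so build them once.
    base_row = {column: restaurant_info.get(key, '') for column, key in RESTAURANT_COLUMNS}
    for review in reviews:
        row = dict(base_row)
        row.update({column: review.get(key, '') for column, key in REVIEW_COLUMNS})
        writer.writerow(row)
```

Using .get(key, '') means a field the scraper could not find is written as an empty cell rather than raising a KeyError.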

The result in CSV format

Here is the result from the first page. Please note that the reviews follow the “Detailed Reviews” order rather than “Most Recent”.

csv output

RESTAURANT_NAME,PRICE_LEVEL,CUISINE_TYPE,TOTAL_RATING,TOTAL_REVIEWS,FOOD_RATING,SERVICE_RATING,VALUE_RATING,ATMOSPHERE_RATING,RANKING,CITY,ADDRESS,PHONE_NO,RATING,REVIEW_TITLE,REVIEW_TEXT,REVIEW_DATE
Gallaghers Steakhouse,$$$$,"Steakhouse, Seafood, Gluten free options",4.5,"5,977",,,,,43,New York City,"228 W 52nd St, New York City, NY 10019-5802",,4,Pricey but worth it,"We booked ahead and glad we did, it was packed on a Thursday evening.

Lovely old school vibe to the restaurant with the majority of wait staff of an older generation.

Great meal, carpaccio and salmon tartare to start, rib eye steak and fillet for main. We weren’t told that there was a specials menu which I would have ordered from (8oz fillet rather than 10oz). We only knew about it as we overheard the table next to us being told what they were. When we asked the waiter about it he wasn’t happy we weren’t told about them and gave us a complimentary fruit platter, which was a nice gesture but I would have preferred them to ask if they could offer a dessert on the house (all the same price including the fruit platter). My husband ordered the pecan pie with ice cream which should have been hot but was cold.

Be prepared, this is a pricey joint but worth it. Food above with 2 sides, 1 glass of Prosecco and 2 glasses of Sangiovese came to $300 dollars with taxes and without tipRead more","October 2, 2022"
Gallaghers Steakhouse,$$$$,"Steakhouse, Seafood, Gluten free options",4.5,"5,977",,,,,43,New York City,"228 W 52nd St, New York City, NY 10019-5802",,5,An experience to remember!,"I had a reservation for 5:30- made weeks in advance as we had broadway tickets and it was my sweetheart’s birthday and this is where I wanted to take him. We were a little late getting out of the hotel, not thinking about rush-hour, there was no way we were going to get a cab, so we had to make it 17 blocks in about a 15 minute window. without stoplights and other people on the sidewalk, we probably would have made it in time, but we were running late. I called, James answered, I asked if he would please hold my reservation. He said that he would. We were still running behind, and I called back to be sure James was still going to hold my reservation since we were 15 minutes late. Thankfully, he did. I really appreciated that!!

The restaurant itself is really nice, high end. It was super crowded. We were seated at a small table in between two other couples, it was cramped. I was so happy to be there, I did not worry about rubbing elbows with strangers. It is also quite loud. This is not a quiet candlelight dinner spot if that’s what you are looking for.

The service was wonderful, quick, efficient, and very friendly. We had two mixed drinks to start, for the meal I had a sirloin steak, he had the surf and turf with a filet and lobster tail. We had Caesar salad, mashed potatoes, and spinach. We also had a glass of Shiraz with the dinner.

Everything was so outstanding. I like my steaks medium well, the steak that was brought to me was medium rare at best. Normally, I would have sent it back, but it tasted good, i did not. I did only eat half of it because the closer I got to the center, the more rare it was.
I do not know how they make their mashed potatoes, but they were the best ones I’ve ever had. The sautéed spinach was also quite delicious.

We did not finish our food, there was just so much of it! I did mention to the waiter that it was my boyfriends birthday, and he brought out a small chocolate cake. Even though we were too full to eat another bite, we ate the whole cake. It was so delicious.

Our bill was $275 plus tip. That is a really big bill for a dinner but this was an experience.Read more","October 24, 2022"
Gallaghers Steakhouse,$$$$,"Steakhouse, Seafood, Gluten free options",4.5,"5,977",,,,,43,New York City,"228 W 52nd St, New York City, NY 10019-5802",,5,Prime Rib all the time!,"I recently went to the restaurant with friends. Nicely set out with plenty of room and plenty of meat to choose from. We all went for the Prime Rib which we ordered in advance ( must do this ) absolutely delicious melt in your mouth. Great service, fantastic waiter an all round great evening out!Read more","March 14, 2020"
Gallaghers Steakhouse,$$$$,"Steakhouse, Seafood, Gluten free options",4.5,"5,977",,,,,43,New York City,"228 W 52nd St, New York City, NY 10019-5802",,5,Amazing restaurant,"Gallagher's is absolutely the best steak restaurant we have ever been to, the place is stylish and comfortable, the staff were very professional and friendly, service couldn't be faulted. The kitchen is in full view and very clean, great to watched the steaks being cooked on the open ovens.
The steaks were cooked to perfection and delicious, a really good cut of meat and a good choice of sides that were all big enough to share.
We can't wait to return on our next visit to NYC and have recommended this place to many others.Read more","January 20, 2020"
Gallaghers Steakhouse,$$$$,"Steakhouse, Seafood, Gluten free options",4.5,"5,977",,,,,43,New York City,"228 W 52nd St, New York City, NY 10019-5802",,5,Exceptional meal with fantastic service,"My husband and I enjoyed a fabulous dinner here. Our waiter, Derek, was phenomenal. The service was impeccable. My water glass was constantly filled, the food arrived in the right amount of time and was cooked absolutely perfectly and the manager stopped by to make sure everything was good. My NY strip was out of this world good. The sides are large and tasty. The wedge salad was the best I've ever eaten. The prices were just right for what you get. As an added bonus, a photographer stopped by to take table side pictures available for purchase. Nice touch! Overall, it was a meal we will always remember and we hope to visit again the next time we're in Vegas!Read more","June 28, 2021"
Gallaghers Steakhouse,$$$$,"Steakhouse, Seafood, Gluten free options",4.5,"5,977",,,,,43,New York City,"228 W 52nd St, New York City, NY 10019-5802",,5,Outstanding food and service,"Managed to squeeze in an early evening booking, lunch menu still
available!
Large restaurant, elegant, good atmosphere!
Super value 3 courses!
Basket of 4 different breads and whipped salted butter was a nice touch.
Caesar salad, clam chowder starters were first class.
Fillet mignonette 10oz with supplemental fee was simply sublime, cooked perfectly!
Desserts NY cheesecake and Key lime pie were delicious!
The service was old school perfection! Experienced, unfussy and extremely professional!
Outstanding dining experience, highly recommended!
Book a table, if you can get one!Read more","December 14, 2022"
Gallaghers Steakhouse,$$$$,"Steakhouse, Seafood, Gluten free options",4.5,"5,977",,,,,43,New York City,"228 W 52nd St, New York City, NY 10019-5802",,5,Outstanding Traditional Steakhouse,"The restaurant was full when we arrived and full when we left. That says it all really. Gallaghers has the feel of a traditional 1930's type establishment. All staff appear to be ""old school"", mature is years and sooo attentive with their service. The atmosphere was superb, the food divine. I had the Filet Mignon - to die for. Others had the ""Surf & Turf"" - the lobster tail was huge and so well prepared. The deserts were something else - absolutely huge. For us, we struggled with 2 courses. The bill wasn't cheap but, upon reflection, it was absolutely worth every dollar. Outstanding. If we ever go back to NYC, Gallaghers is a ""Must Do"" again.Read more","June 9, 2022"
Gallaghers Steakhouse,$$$$,"Steakhouse, Seafood, Gluten free options",4.5,"5,977",,,,,43,New York City,"228 W 52nd St, New York City, NY 10019-5802",,4,Great but Pricey,"Our first night in NY and not a disappointment! We dined early (5.30pm) as we had The Lion King booked down the road (Mishkov is about 10 minutes walk through Times Square from here).
The Dr Loosen Blue riesling by the glass is lovely. So is the Zardetto Prosecco. We had 3 Poterhouses to share between 3 adults and it was way too much food! We should have ordered 2 but never learn! Too much food but what great quality and perfectly cooked. Medium Rare is Rare to our UK pallets but was fine for me!
Our party also had the 10oz Filet and the Salmon which were also great.
The onion rings are exceptional. Cauliflower and Mac and cheese also very good. The service here is more formal than many places we went in NY subsequently but I guess that's what you should expect.
Great job Alvaro thanks.Read more","January 7, 2023"
Gallaghers Steakhouse,$$$$,"Steakhouse, Seafood, Gluten free options",4.5,"5,977",,,,,43,New York City,"228 W 52nd St, New York City, NY 10019-5802",,1,Awful service. Intimidating serving staff.,"Where to begin with this one. As soon as you walk through the door, you know the belt is already being unbuckled. This was with hindsight. To get to the point the service started okay. A little fake and rehearsed but okay. Starter was great. Steak was average at best. Lukewarm to cold. Sides were average. Drinks not of a sufficient standard. Guinness was poor. The real fun started when the bill arrived. Bill was 474 between the 4 of us. A decent spend, we thought. Tip time, not happy. This is when the real aggression started. We left what we thought was a suitable cash tip for 1 hours service. Server not happy. Informed head honcho. He wasn't happy. Wanted to know why we were not leaving a 78 dollar tip. The reason was is the fact we do not believe in entitlement. Servers do not earn 78 dollars an hour and seeing as the food was average, we thought what we had left was more than enough. To sum up. Find somewhere else to spend your hard earned. This place was recommended to us. Very disappointed. All in all the experience left an extremely nasty taste in our mouths.Read more","February 12, 2023"
Gallaghers Steakhouse,$$$$,"Steakhouse, Seafood, Gluten free options",4.5,"5,977",,,,,43,New York City,"228 W 52nd St, New York City, NY 10019-5802",,5,Tops!,"On par with the “chophouse” dining we have in SC. We had an exceptional experience for several reasons. Our waiter made it feel like we could taste every dish as he presented the options and specials. We had several staff members check on us throughout the meal. The table setup had us close to other diners and despite the reputation of northerners to be adverse to being bothered, we had great conversations with outlets neighbors. Wonderful ambiance as you can see the steak storage room and kitchen from all angles. Beautiful experience all around.Read more","October 5, 2021"
Gallaghers Steakhouse,$$$$,"Steakhouse, Seafood, Gluten free options",4.5,"5,977",,,,,43,New York City,"228 W 52nd St, New York City, NY 10019-5802",,1,A birthday celebration ruined,"Staff were opinionated and interrupted conversation with sarcastic comments. Calamari was lukewarm and lacked seasoning. The Caesar salad was ok but nothing special. The steaks were good but not hot, jacket potato was cold and was sent back but waiter instead of apologising preferred to antagonise us. Bottle of Rioja served wasn’t the same as the wine list which wasn’t explained and not very professional. The toilets had cheap hand soap at this prices you would expect better quality. A birthday celebration ruined.Read more","December 12, 2022"
Gallaghers Steakhouse,$$$$,"Steakhouse, Seafood, Gluten free options",4.5,"5,977",,,,,43,New York City,"228 W 52nd St, New York City, NY 10019-5802",,4,Delicious and get the NYC vibe,"This was worth a visit because it is steeped in history and the experience and service we got was quite special.
It is expensive and perhaps because we go out to eat steak alot, we have ruined it for ourselves a bit, because although the food was delicious and we had a great time, we didn’t feel it was worth the amount we spent comparing it to where we usually eat steak for much less.
We really did enjoy our food though, we had pork belly special and crab cocktail to start, porterhouse for two for mains with French fries and we also had key lime pie and NY cheesecake to finish.
We overindulged and it was a wonderful night but definitely in the high price bracket.
Our waiters were friendly, helpful and prompt. It was a good vibe.
The disclaimer on the photo near the host desk make me laugh re “this is not a photo of Jeffrey epstein, it’s Perry como”, I wonder how many times people asked about it for the sign to be necessary.
Worth a visit but remember that the prices don’t include the tax and service charge so don’t get carried away ordering like we did! Thanks for a lovely evening.Read more","February 12, 2023"
Gallaghers Steakhouse,$$$$,"Steakhouse, Seafood, Gluten free options",4.5,"5,977",,,,,43,New York City,"228 W 52nd St, New York City, NY 10019-5802",,5,Old time NYC steakhouse,"Excellent steakhouse. We make a reservation every time we come to NYC. The steak was fantastic. We had a Porterhouse for two and it was so good. Also had the Mac and cheese, which was delicious. The bread is so good, especially the date nut bread.
We were celebrating my husband’s birthday and the waiter brought a cake with a candle that was yummy.Read more","September 26, 2021"
Gallaghers Steakhouse,$$$$,"Steakhouse, Seafood, Gluten free options",4.5,"5,977",,,,,43,New York City,"228 W 52nd St, New York City, NY 10019-5802",,4,"Great dinner, but priced out- 1 & done!","Good food, service, and dinner experience, but it is the most expensive dinner we have ever eaten in our entire life! $22 for a glass of wine. Side of asparagus was $16 and my goat cheese salad had little goat cheese and beets. The filet was very good and watching the kitchen staff prepare food was a treat. Also, the lobby area/coat check area is not efficient. Basically had to fight to get my coat through the people waiting for the host.Read more","January 7, 2023"
Gallaghers Steakhouse,$$$$,"Steakhouse, Seafood, Gluten free options",4.5,"5,977",,,,,43,New York City,"228 W 52nd St, New York City, NY 10019-5802",,3,"Good food, disinterested waiter","We were greeted at our table by a lovely guy called Melvin...but that was the last we saw of him. We could see tables around us, who arrived later all being attended to. When our waiter arrived to take our order he pulled a face when my husband and aunt asked for their steaks to be cooked well done. We had the $29 lunch and my husband upgraded to 10oz filet mignon. Food was good but mine arrived with no roast potatoes. We were brought the wrong bill when it was time to pay. Not once in the whole time we were there did anyone ask if everything was OK with our meal. We all felt we were being hurried. This is second time in gallaghers (last time was full priced dinner) and whilst food has been great both times service wasn't that great on either. Dont think I'll be back as there are many other steakhouses in the area who are happy to see their customers and make sure their experience is enjoyable.Read more","September 4, 2022"


Handling Pagination

The code provided earlier only scrapes the first page, which yields 15 reviews. To collect reviews from additional pages, we need to examine how the URL changes when we click the “Next” button.


Understanding the URL Structure

The URL for the first page is:

https://www.tripadvisor.com/Restaurant_Review-g60763-d478965-Reviews-Gallaghers_Steakhouse-New_York_City_New_York.html

For subsequent pages, the URL changes as follows:

Second Page: https://www.tripadvisor.com/Restaurant_Review-g60763-d478965-Reviews-or15-Gallaghers_Steakhouse-New_York_City_New_York.html

Third Page: https://www.tripadvisor.com/Restaurant_Review-g60763-d478965-Reviews-or30-Gallaghers_Steakhouse-New_York_City_New_York.html

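The pattern shows that an offset segment ("or" followed by a number) is inserted right after "Reviews" and increases by 15 with each page: page 2 uses or15, page 3 uses or30, and in general page n uses an offset of (n - 1) × 15. The first page has no offset segment at all.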
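The arithmetic can be made explicit with a small sketch that computes the offset for a page and reads the page number back out of a URL. The names offset_for_page, page_from_url, and REVIEWS_PER_PAGE are illustrative, and the page size of 15 is inferred from the URLs above rather than from any documented TripAdvisor rule.

```python
import re

REVIEWS_PER_PAGE = 15  # Inferred from the example URLs; TripAdvisor may change this


def offset_for_page(page_number):
    """Return the number that follows 'or' in the URL for a 1-based page number."""
    return (page_number - 1) * REVIEWS_PER_PAGE


def page_from_url(url):
    """Recover the 1-based page number from a review URL (page 1 has no -orNN- part)."""
    match = re.search(r"-Reviews-or(\d+)-", url)
    if match is None:
        return 1
    return int(match.group(1)) // REVIEWS_PER_PAGE + 1


for page in range(1, 4):
    print(page, offset_for_page(page))
# 1 0
# 2 15
# 3 30
```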

Implementing Pagination in the code

To effectively scrape multiple pages of reviews, we’ll need to:

Create a function to generate URLs
Modify the main function to handle multiple pages
Fetch the restaurant info only once, on the first page

1. Create a generate_url function that builds the URL for each page:

def generate_url(base_url, page_number):
    if page_number == 1:
        return base_url
    offset = (page_number - 1) * 15
    parts = base_url.split('Reviews')
    return f"{parts[0]}Reviews-or{offset}{parts[1]}"
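As a quick sanity check, the function can be run against the base URL to confirm that it reproduces the second- and third-page URLs listed earlier. The copy below is the tutorial's generate_url with one defensive tweak, which is an assumption and not something the Gallaghers URL needs: it passes maxsplit=1 to str.split, so a second occurrence of "Reviews" later in a restaurant slug would not drop part of the URL.

```python
def generate_url(base_url, page_number):
    # Page 1 is the plain restaurant URL with no offset segment
    if page_number == 1:
        return base_url
    offset = (page_number - 1) * 15
    # Split only at the first 'Reviews' so the rest of the slug stays intact
    parts = base_url.split('Reviews', 1)
    return f"{parts[0]}Reviews-or{offset}{parts[1]}"


base_url = ('https://www.tripadvisor.com/Restaurant_Review-g60763-d478965-'
            'Reviews-Gallaghers_Steakhouse-New_York_City_New_York.html')

for page in (1, 2, 3):
    print(generate_url(base_url, page))
```

The second and third lines printed should match the page 2 and page 3 URLs shown in "Understanding the URL Structure".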

2. Create a check_last_page function to determine when to stop paginating:

def check_last_page(soup):
    try:
        pagination = soup.find('div', class_='pageNumbers')
        if pagination:
            last_page = int(pagination.find_all('a')[-1].text)
            return last_page
        return None
    except Exception as e:
        logging.error(f"Error checking last page: {str(e)}")
        return None
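check_last_page depends on a pageNumbers container, and class names like this tend to change when TripAdvisor updates its markup, in which case the function returns None. One possible fallback, sketched here as an assumption rather than part of the original scraper, is to derive the page count from the total-reviews figure that get_restaurant_info already extracts (for example "5,977"). pages_from_review_count is a hypothetical helper, and its result is only an estimate: the displayed count may not match the number of reviews actually listed, for instance when filters are applied.

```python
import math
import re

REVIEWS_PER_PAGE = 15  # Page size inferred from the pagination URLs


def pages_from_review_count(total_reviews_text, per_page=REVIEWS_PER_PAGE):
    """Estimate the number of review pages from text such as '5,977' or '5,977 reviews'."""
    digits = re.sub(r"[^\d]", "", total_reviews_text)
    if not digits:
        return None  # Nothing numeric to work with
    return math.ceil(int(digits) / per_page)


print(pages_from_review_count("5,977"))  # 399
```

In main, this could serve as a fallback, e.g. last_page = check_last_page(soup) or pages_from_review_count(restaurant_info['total_reviews']).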

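3. Modify save_to_csv so it can append to an existing file:

def save_to_csv(restaurant_info, reviews, filename, mode='w'):
    # 'w' starts a new file (first page); 'a' appends to it (later pages)
    with open(filename, mode=mode, newline='', encoding='utf-8') as file:
        writer = csv.writer(file)
        # ... header list defined as in the complete code ...
        if mode == 'w':  # Write header only if it's a new file
            writer.writerow(header)
        # ... then write one row per review ...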
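The effect of the mode argument can be checked in isolation. The sketch below uses a trimmed-down, hypothetical append_rows helper and a throwaway file in a temporary directory: the first call opens the file with 'w' and writes the header, and later calls open it with 'a' and add rows only, so the header appears exactly once.

```python
import csv
import os
import tempfile

HEADER = ['RATING', 'REVIEW_TITLE', 'REVIEW_DATE']  # Shortened header for the demo


def append_rows(filename, rows, mode='w'):
    # 'w' truncates the file and starts fresh; 'a' adds to the end of it
    with open(filename, mode=mode, newline='', encoding='utf-8') as file:
        writer = csv.writer(file)
        if mode == 'w':
            writer.writerow(HEADER)  # Header only for a brand-new file
        writer.writerows(rows)


path = os.path.join(tempfile.mkdtemp(), 'demo.csv')
append_rows(path, [['5', 'Old time NYC steakhouse', 'September 26, 2021']], mode='w')
append_rows(path, [['4', 'Great dinner, but priced out- 1 & done!', 'January 7, 2023']], mode='a')

with open(path, newline='', encoding='utf-8') as file:
    rows = list(csv.reader(file))
print(len(rows))  # 3: one header row plus two review rows
```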

Implementing Proxy Rotation

To implement proxy rotation, let’s first generate a proxy list from our proxy dashboard.

Download or copy the proxies in the format “username:password@hostname:port” and save them in a text file, one proxy per line (here, proxies.txt).

1. Create a load_proxies function to read the proxies from the file:

def load_proxies(file_path):
    try:
        with open(file_path, 'r') as file:
            proxies = [line.strip() for line in file if line.strip()]
        return proxies
    except Exception as e:
        logging.error(f"Error loading proxies from file: {str(e)}")
        return []
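load_proxies keeps every non-empty line, so a typo in proxies.txt only shows up later as a failed request. If you want to catch that earlier, one option (an addition to the tutorial, not part of it) is to check each line against the username:password@hostname:port format with the standard library's urllib.parse.urlsplit. validate_proxy_lines is a hypothetical helper name.

```python
import logging
from urllib.parse import urlsplit


def validate_proxy_lines(lines):
    """Keep only lines shaped like username:password@hostname:port."""
    valid = []
    for line in lines:
        entry = line.strip()
        if not entry:
            continue
        parts = urlsplit(f"http://{entry}")
        try:
            port = parts.port  # Raises ValueError if the port is not a valid number
        except ValueError:
            port = None
        if parts.username and parts.password and parts.hostname and port:
            valid.append(entry)
        else:
            logging.warning("Skipping malformed proxy line: %s", entry)
    return valid


sample = [
    "user1:secret@proxy.example.com:8080",
    "proxy.example.com:8080",          # no credentials
    "user2:secret@proxy.example.com",  # no port
    "",
]
print(validate_proxy_lines(sample))
# ['user1:secret@proxy.example.com:8080']
```

It could be applied to the result of load_proxies before the list is passed on to the request functions.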

2. Create a get_random_proxy function to pick a proxy at random:

def get_random_proxy(proxy_list):
    if not proxy_list:
        return None
    proxy = random.choice(proxy_list)
    return {
        'http': f'http://{proxy}',
        'https': f'http://{proxy}'
    }
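random.choice can hand out the same proxy on consecutive requests. If you would rather spread requests evenly, round-robin rotation with itertools.cycle is a simple alternative; the sketch below swaps that technique in and returns the same requests-style proxies dictionary as get_random_proxy. make_proxy_rotator is a hypothetical name.

```python
from itertools import cycle


def make_proxy_rotator(proxy_list):
    """Return a function that hands out proxies in round-robin order (None if the list is empty)."""
    if not proxy_list:
        return lambda: None
    pool = cycle(proxy_list)

    def next_proxy():
        proxy = next(pool)
        # Same shape as get_random_proxy: both schemes are routed through the HTTP proxy
        return {'http': f'http://{proxy}', 'https': f'http://{proxy}'}

    return next_proxy


next_proxy = make_proxy_rotator(['user:pass@p1.example.com:8000',
                                 'user:pass@p2.example.com:8000'])
for _ in range(3):
    print(next_proxy()['http'])
# http://user:pass@p1.example.com:8000
# http://user:pass@p2.example.com:8000
# http://user:pass@p1.example.com:8000
```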

3. Modify setup_request to pick a random proxy for each request:

def setup_request(proxy_list):
    # ... headers setup ...
    proxies = get_random_proxy(proxy_list)
    return headers, proxies

4. Create a make_request function that retries failed requests with a different proxy:

def make_request(url, proxy_list, max_retries=3):
    retries = 0
    while retries < max_retries:
        try:
            headers, proxies = setup_request(proxy_list)
            if proxies:
                print(f"Using proxy: {proxies['http']}")
            response = requests.get(url, headers=headers, proxies=proxies, timeout=30)
            response.raise_for_status()
            return response
        except requests.RequestException as e:
            retries += 1
            if retries == max_retries:
                logging.error(f"Failed after {max_retries} attempts: {str(e)}")
                return None
            print(f"Retry {retries} with a different proxy")
            time.sleep(5)
    return None
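make_request waits a fixed 5 seconds between attempts and may draw the same failing proxy again. The sketch below shows one way to refine that, assuming you want exponential backoff and want to avoid reusing a proxy that just failed. To keep it testable offline, the HTTP call is passed in as a fetch callable; fetch_with_retries, fetch, and base_delay are illustrative names, not part of the original code. Catching OSError also covers requests.RequestException, because that class derives from IOError (an alias of OSError).

```python
import random
import time


def fetch_with_retries(url, proxy_list, fetch, max_retries=3, base_delay=2.0, sleep=time.sleep):
    """Try up to max_retries proxies, never reusing one that already failed.

    fetch(url, proxy) should return a response or raise an OSError subclass
    (requests.RequestException qualifies, since it derives from IOError).
    """
    untried = list(proxy_list)
    for attempt in range(max_retries):
        if proxy_list and not untried:
            break  # Every proxy failed once; give up instead of connecting directly
        proxy = random.choice(untried) if untried else None
        try:
            return fetch(url, proxy)
        except OSError as exc:
            print(f"Attempt {attempt + 1} failed via {proxy}: {exc}")
            if proxy is not None:
                untried.remove(proxy)  # Do not pick this proxy again
            if attempt < max_retries - 1:
                sleep(base_delay * 2 ** attempt)  # 2 s, 4 s, 8 s, ...
    return None


def fake_fetch(url, proxy):
    # Stand-in for requests.get: one proxy always fails, the other succeeds
    if proxy == 'bad:proxy@h:1':
        raise ConnectionError("proxy refused")
    return f"ok via {proxy}"


delays = []
result = fetch_with_retries('https://example.com', ['bad:proxy@h:1', 'good:proxy@h:2'],
                            fake_fetch, sleep=delays.append)
print(result)  # ok via good:proxy@h:2
```

With real requests, fetch could be a small wrapper that builds the proxies dictionary from proxy and calls requests.get(url, headers=headers, proxies=proxies, timeout=30) followed by response.raise_for_status().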

5. Update the main function to use proxy rotation, so that it:

Loads the proxies from the file
Uses make_request to fetch each page
Waits a random delay between requests
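The loop in main has to decide when to stop: after the last page reported by check_last_page, after a chosen page limit, or when a page comes back without review cards. Pulling that decision into a small pure function, as in the hypothetical should_stop below, makes the rules easy to test on their own; the max_pages limit is an assumption that mirrors the 10-page run mentioned at the end of this tutorial.

```python
def should_stop(current_page, review_count, last_page=None, max_pages=None):
    """Return True when the pagination loop should end after current_page."""
    if not review_count:
        return True  # Empty page: past the last page, or the markup changed
    if last_page is not None and current_page >= last_page:
        return True  # Reached the last page reported by the site
    if max_pages is not None and current_page >= max_pages:
        return True  # Reached our own limit
    return False


print(should_stop(3, review_count=15, last_page=None, max_pages=10))   # False
print(should_stop(10, review_count=15, last_page=None, max_pages=10))  # True
print(should_stop(4, review_count=0, last_page=399, max_pages=10))     # True
```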

The complete code

import requests
from bs4 import BeautifulSoup
import logging
import csv
import random  # Used by get_random_proxy and the random delay in main
import time


def load_proxies(file_path):
    try:
        with open(file_path, 'r') as file:
            proxies = [line.strip() for line in file if line.strip()]
        return proxies
    except Exception as e:
        logging.error(f"Error loading proxies from file: {str(e)}")
        return []


def get_random_proxy(proxy_list):
    if not proxy_list:
        return None
    proxy = random.choice(proxy_list)
    return {
        'http': f'http://{proxy}',
        'https': f'http://{proxy}'
    }


def setup_request(proxy_list):
    headers = {
        "User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.5",
        "Accept-Encoding": "gzip, deflate",  # Omit "br": requests only decodes Brotli if the brotli package is installed
        "DNT": "1",
        "Connection": "keep-alive",
        "Upgrade-Insecure-Requests": "1",
    }
    proxies = get_random_proxy(proxy_list)
    return headers, proxies


def make_request(url, proxy_list, max_retries=3):
    retries = 0
    while retries < max_retries:
        try:
            headers, proxies = setup_request(proxy_list)
            if proxies:
                print(f"Using proxy: {proxies['http']}")
            response = requests.get(
                url, headers=headers, proxies=proxies, timeout=30)
            response.raise_for_status()
            return response
        except requests.RequestException as e:
            retries += 1
            if retries == max_retries:
                logging.error(f"Failed after {max_retries} attempts: {str(e)}")
                return None
            print(f"Retry {retries} with a different proxy")
            time.sleep(5)
    return None


def get_restaurant_info(soup):
    restaurant_info = {
        'name': '',
        'price_level': '',
        'cuisine_type': '',
        'total_rating': '',
        'total_reviews': '',
        'food_rating': '',
        'service_rating': '',
        'value_rating': '',
        'atmosphere_rating': '',
        'ranking': '',
        'city': '',
        'address': '',
        'phone_no': ''
    }

    restaurant_info['name'] = soup.find('h1').text.strip()

    # General info processing
    general_infos = soup.find('span', class_='cPbcf').text.strip()
    info_parts = general_infos.split(', ')
    restaurant_info['price_level'] = info_parts[0]
    restaurant_info['cuisine_type'] = ', '.join(info_parts[1:])

    # Rating and review info
    detail_cards = soup.find_all(
        'div', attrs={'data-automation': 'OVERVIEW_TAB_ELEMENT'})
    if detail_cards:
        rating_info = detail_cards[0]
        restaurant_info['total_rating'] = rating_info.find(
            'span', class_='biGQs').text.strip()
        reviews_text = rating_info.find('div', class_='jXaJR').text.strip()
        restaurant_info['total_reviews'] = reviews_text.replace(' reviews', '')

        # Detailed ratings
        try:
            rating_container = rating_info.find('div', class_='khxWm')
            if rating_container:
                rating_category = rating_container.find_all(
                    'div', class_='YwaWb')
                if len(rating_category) >= 4:
                    restaurant_info['food_rating'] = rating_category[0].find(
                        'svg', class_='UctUV').find('title').text.strip().replace(' of 5 bubbles', '')
                    restaurant_info['service_rating'] = rating_category[1].find(
                        'svg', class_='UctUV').find('title').text.strip().replace(' of 5 bubbles', '')
                    restaurant_info['value_rating'] = rating_category[2].find(
                        'svg', class_='UctUV').find('title').text.strip().replace(' of 5 bubbles', '')
                    restaurant_info['atmosphere_rating'] = rating_category[3].find(
                        'svg', class_='UctUV').find('title').text.strip().replace(' of 5 bubbles', '')
        except Exception as e:
            logging.error(f"Error extracting detailed ratings: {str(e)}")

        # Ranking and city info
        ranking_tag = rating_info.find_all('a', class_='BMQDV')
        if len(ranking_tag) > 1:
            ranking_text = ranking_tag[1].find('span').text.strip()
            restaurant_info['ranking'] = ranking_text.split()[
                0].replace('#', '')
            in_index = ranking_text.split().index('in')
            restaurant_info['city'] = ' '.join(
                ranking_text.split()[in_index + 1:])

    # Address and phone info
    if len(detail_cards) > 2:
        location_info = detail_cards[2]
        restaurant_info['address'] = location_info.find(
            'span', class_='biGQs').text.strip()

        # Phone number
        try:
            phone_link = location_info.find('a', attrs={'aria-label': 'Call'})
            if phone_link:
                restaurant_info['phone_no'] = phone_link.get(
                    'href').replace('tel:', '')
        except Exception as e:
            logging.error(f"Error extracting phone number: {str(e)}")

    return restaurant_info


def scrape_reviews(soup):
    reviews = []
    review_cards = soup.find_all(
        'div', attrs={'data-automation': 'reviewCard'})

    for review in review_cards:
        review_data = {
            'rating': '',
            'title': '',
            'text': '',
            'date': ''
        }

        rating_element = review.find('svg', class_='UctUV')
        if rating_element:
            review_data['rating'] = rating_element.find(
                'title').text.strip().replace(' of 5 bubbles', '')

        title_element = review.find(
            'div', attrs={'data-test-target': 'review-title'})
        if title_element:
            review_data['title'] = title_element.text.strip()

        text_element = review.find(
            'div', attrs={'data-test-target': 'review-body'})
        if text_element:
            review_data['text'] = text_element.text.strip()

        date_element = review.find('div', class_='neAPm')
        if date_element:
            child_divs = date_element.find_all('div')
            if child_divs:
                review_data['date'] = child_divs[0].text.strip().replace(
                    'Written ', '')

        reviews.append(review_data)  # No sleep here: parsing makes no requests

    return reviews


def generate_url(base_url, page_number):
    if page_number == 1:
        return base_url
    offset = (page_number - 1) * 15
    parts = base_url.split('Reviews')
    return f"{parts[0]}Reviews-or{offset}{parts[1]}"


def check_last_page(soup):
    try:
        pagination = soup.find('div', class_='pageNumbers')
        if pagination:
            last_page = int(pagination.find_all('a')[-1].text)
            return last_page
        return None
    except Exception as e:
        logging.error(f"Error checking last page: {str(e)}")
        return None


def save_to_csv(restaurant_info, reviews, filename, mode='w'):
    # mode='w' creates or overwrites the file; mode='a' appends to it
    with open(filename, mode=mode, newline='', encoding='utf-8') as file:
        writer = csv.writer(file)

        # Write header only when starting a new file
        header = ['RESTAURANT_NAME', 'PRICE_LEVEL', 'CUISINE_TYPE', 'TOTAL_RATING',
                  'TOTAL_REVIEWS', 'FOOD_RATING', 'SERVICE_RATING', 'VALUE_RATING',
                  'ATMOSPHERE_RATING', 'RANKING', 'CITY', 'ADDRESS', 'PHONE_NO',
                  'RATING', 'REVIEW_TITLE', 'REVIEW_DETAILS', 'REVIEW_DATE']
        if mode == 'w':
            writer.writerow(header)

        # Write reviews with restaurant info
        for review in reviews:
            row = [
                restaurant_info['name'],
                restaurant_info['price_level'],
                restaurant_info['cuisine_type'],
                restaurant_info['total_rating'],
                restaurant_info['total_reviews'],
                restaurant_info['food_rating'],
                restaurant_info['service_rating'],
                restaurant_info['value_rating'],
                restaurant_info['atmosphere_rating'],
                restaurant_info['ranking'],
                restaurant_info['city'],
                restaurant_info['address'],
                restaurant_info['phone_no'],
                review['rating'],
                review['title'],
                review['text'],
                review['date']
            ]
            writer.writerow(row)


def main():
    base_url = 'https://www.tripadvisor.com/Restaurant_Review-g60763-d478965-Reviews-Gallaghers_Steakhouse-New_York_City_New_York.html'
    output_filename = 'tripadvisor_ny_restaurant_reviews.csv'
    proxy_file = 'proxies.txt'  # Make sure this matches your actual proxy file name

    # Load proxies
    proxy_list = load_proxies(proxy_file)
    if not proxy_list:
        print("No proxies loaded. Exiting.")
        return

    print(f"Loaded {len(proxy_list)} proxies")

    restaurant_info = None
    last_page = None
    current_page = 1
    max_pages = 10  # Page limit used for this tutorial; set to None to scrape all pages

    try:
        while True:
            url = generate_url(base_url, current_page)
            print(f"Scraping page {current_page}...")

            response = make_request(url, proxy_list)
            if not response:
                print(f"Failed to fetch page {current_page}. Stopping.")
                break

            soup = BeautifulSoup(response.content, 'html.parser')

            # Get restaurant info only once
            if restaurant_info is None:
                restaurant_info = get_restaurant_info(soup)
                last_page = check_last_page(soup)
                print(f"Total pages to scrape: {last_page}")

            reviews = scrape_reviews(soup)
            # An empty page means we went past the last page (or the markup changed)
            if not reviews:
                print(f"No reviews found on page {current_page}. Stopping.")
                break

            # Save to CSV (append mode for all pages after the first)
            save_mode = 'w' if current_page == 1 else 'a'
            save_to_csv(restaurant_info, reviews,
                        output_filename, mode=save_mode)

            print(f"Page {current_page} completed.")

            if last_page and current_page >= last_page:
                print("Reached the last page. Scraping completed.")
                break
            if max_pages and current_page >= max_pages:
                print(f"Reached the {max_pages}-page limit. Scraping completed.")
                break

            current_page += 1
            time.sleep(random.uniform(8, 12))  # Random delay between pages

        print(f"All information saved successfully to {output_filename}")

    except Exception as e:
        logging.error(f"An error occurred: {str(e)}")


if __name__ == "__main__":
    main()

Screenshot of the terminal output. For this tutorial, I limited the scraper to the first 10 pages of reviews.


An example of the resulting CSV file is available with the source code.
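Once the CSV exists, a few lines of standard-library Python are enough for a first look at the data. The sketch below assumes the column names written by save_to_csv (RATING, REVIEW_DATE) and the "Month D, YYYY" date format seen in the sample rows; summarize_reviews is a hypothetical helper, and rows with a missing rating are skipped, while an unparseable date only drops that row from the date range.

```python
import csv
import os
from collections import Counter
from datetime import datetime


def summarize_reviews(filename):
    """Return (review count, average rating, rating histogram, earliest date, latest date)."""
    ratings, dates = [], []
    with open(filename, newline='', encoding='utf-8') as file:
        for row in csv.DictReader(file):
            try:
                ratings.append(float(row['RATING']))
            except (TypeError, ValueError):
                continue  # Skip rows without a usable rating
            try:
                dates.append(datetime.strptime(row['REVIEW_DATE'], '%B %d, %Y').date())
            except (TypeError, ValueError):
                pass  # Keep the rating even if the date format is unexpected
    if not ratings:
        return 0, None, Counter(), None, None
    average = sum(ratings) / len(ratings)
    return (len(ratings), round(average, 2), Counter(ratings),
            min(dates, default=None), max(dates, default=None))


output_file = 'tripadvisor_ny_restaurant_reviews.csv'  # File name used in main()
if os.path.exists(output_file):
    print(summarize_reviews(output_file))
```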

Video Tutorial: Extract Restaurant Details, Customer Reviews and Ratings from TripAdvisor using Python

Conclusion

We’ve successfully learned how to scrape TripAdvisor reviews using Python, extracting key details like ratings, review dates, and more. Here’s a quick recap of what we’ve achieved:

Scraped restaurant details: Name, address, cuisine, and contact info.
Extracted review data: Customer feedback, ratings, and review titles.
Handled multiple review pages: Using pagination to get more data.
Stored data: Saved everything to a CSV file, ready for analysis.

As you move forward, remember:

Ethics matter: Scrape responsibly and respect platform rules.
Expand your skills: Explore more data points or analyze the sentiment in reviews.

With these skills, you’re ready to gain valuable insights from TripAdvisor and beyond. Happy scraping!