{"id":1111,"date":"2024-10-08T09:28:44","date_gmt":"2024-10-08T09:28:44","guid":{"rendered":"https:\/\/rayobyte.com\/community\/?post_type=scraping_project&#038;p=1111"},"modified":"2024-10-09T14:51:34","modified_gmt":"2024-10-09T14:51:34","slug":"extract-restaurant-details-customer-reviews-and-ratings-from-tripadvisor-using-python","status":"publish","type":"scraping_project","link":"https:\/\/rayobyte.com\/community\/scraping-project\/extract-restaurant-details-customer-reviews-and-ratings-from-tripadvisor-using-python\/","title":{"rendered":"Extract Restaurant Details, Customer Reviews and Ratings from TripAdvisor using Python"},"content":{"rendered":"<p style=\"text-align: center;\"><iframe loading=\"lazy\" title=\"YouTube video player\" src=\"https:\/\/www.youtube.com\/embed\/kFs9YkxGvZE?si=-5-ggPqMdOlAM3xt\" width=\"560\" height=\"315\" frameborder=\"0\" allowfullscreen=\"allowfullscreen\"><\/iframe><\/p>\n<p><em><span style=\"font-weight: 400;\">Source code: <\/span><a href=\"https:\/\/github.com\/ainacodes\/tripadvisor_scraper\" rel=\"nofollow noopener\" target=\"_blank\"><span style=\"font-weight: 400;\">tripadvisor_scraper<\/span><\/a><\/em><\/p>\n<p><em>Video tutorial:\u00a0<a href=\"https:\/\/youtu.be\/kFs9YkxGvZE?si=1ZmcXPBpnUWgjNT7\" rel=\"nofollow noopener\" target=\"_blank\">https:\/\/youtu.be\/kFs9YkxGvZE?si=1ZmcXPBpnUWgjNT7<\/a><\/em><\/p>\n<h2>Table of Content<\/h2>\n<p><a href=\"#introduction\">Introduction<\/a><br \/>\n<a href=\"#ethical-consideration\">Ethical consideration<\/a><br \/>\n<a href=\"#prerequisites\">Prerequisites<\/a><br \/>\n<a href=\"#project-setup\">Project setup<\/a><br \/>\n<a href=\"#inspecting-elements\">Inspecting the Elements<\/a><br \/>\n<a href=\"#complete-code-first-page\">Complete Code First Page<\/a><br \/>\n<a href=\"#result-in-csv\">The result in CSV format<\/a><br \/>\n<a href=\"#handling-pagination\">Handling Pagination<\/a><\/p>\n<ul>\n<li><a href=\"#url-structure\">Understanding the URL 
Structure<\/a><\/li>\n<li><a href=\"#implementing-pagination\">Implementing Pagination in the code<\/a><\/li>\n<li><a href=\"#implementing-proxy-rotation\">Implementing Proxy Rotation<\/a><\/li>\n<\/ul>\n<p><a href=\"#the-complete-code\">The complete code<\/a><br \/>\n<a href=\"#conclusion\">Conclusion<\/a><\/p>\n<h2 id=\"introduction\">Introduction<\/h2>\n<p><span style=\"font-weight: 400;\">TripAdvisor stands as a premier platform for reviews and ratings related to hotels, restaurants, attractions, and travel experiences. For professionals in the hospitality and travel sectors, the ability to analyze and extract data from TripAdvisor is invaluable. Whether your goal is to gather customer feedback, monitor service trends, or conduct comparative analyses of various establishments, scraping TripAdvisor reviews can yield powerful insights.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In this tutorial, we will provide a detailed, step-by-step guide on how to scrape the restaurant details, customer reviews, ratings, and review dates from a Michelin Star restaurant in New York using Python. 
Our focus will be on extracting data from the following page:<\/span><\/p>\n<p><a href=\"https:\/\/www.tripadvisor.com\/Restaurant_Review-g60763-d478965-Reviews-Gallaghers_Steakhouse-New_York_City_New_York.html\" rel=\"nofollow noopener\" target=\"_blank\"><span style=\"font-weight: 400;\">https:\/\/www.tripadvisor.com\/Restaurant_Review-g60763-d478965-Reviews-Gallaghers_Steakhouse-New_York_City_New_York.html<\/span><\/a><\/p>\n<p><strong> <img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1125\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/01_landing_page.png\" alt=\"TA landing page\" width=\"1920\" height=\"996\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/01_landing_page.png 1920w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/01_landing_page-300x156.png 300w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/01_landing_page-1024x531.png 1024w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/01_landing_page-768x398.png 768w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/01_landing_page-1536x797.png 1536w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/01_landing_page-624x324.png 624w\" sizes=\"auto, (max-width: 1920px) 100vw, 1920px\" \/><\/strong><\/p>\n<p><span style=\"font-weight: 400;\">By the end of this tutorial, you\u2019ll have the tools to efficiently scrape and collect user feedback for deeper analysis, helping you make data-driven decisions. 
Let\u2019s dive into the world of web scraping and get started with practical examples and source code!<\/span><\/p>\n<h3><span style=\"font-weight: 400;\">The information we want to scrape is:<\/span><\/h3>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Restaurant Name<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Price Level<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Cuisine Type<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Total Rating<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Total Reviews<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Ranking<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">City<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Food Rating<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Service Rating<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Value Rating<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Atmosphere Rating<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Address<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Phone Number<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Customer Rating<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Review Title<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Review Details<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Customer Type<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Written Date<\/span><\/li>\n<\/ul>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full 
wp-image-1127\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/02_restaurant_details.png\" alt=\"Restaurant Details\" width=\"1920\" height=\"996\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/02_restaurant_details.png 1920w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/02_restaurant_details-300x156.png 300w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/02_restaurant_details-1024x531.png 1024w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/02_restaurant_details-768x398.png 768w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/02_restaurant_details-1536x797.png 1536w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/02_restaurant_details-624x324.png 624w\" sizes=\"auto, (max-width: 1920px) 100vw, 1920px\" \/><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1128\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/03_details_page.png\" alt=\"Details Page\" width=\"1920\" height=\"996\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/03_details_page.png 1920w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/03_details_page-300x156.png 300w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/03_details_page-1024x531.png 1024w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/03_details_page-768x398.png 768w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/03_details_page-1536x797.png 1536w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/03_details_page-624x324.png 624w\" sizes=\"auto, (max-width: 1920px) 100vw, 1920px\" \/><\/p>\n<h2><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1129\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/04_reviews_section.png\" alt=\"Reviews 
section\" width=\"1920\" height=\"1032\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/04_reviews_section.png 1920w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/04_reviews_section-300x161.png 300w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/04_reviews_section-1024x550.png 1024w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/04_reviews_section-768x413.png 768w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/04_reviews_section-1536x826.png 1536w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/04_reviews_section-624x335.png 624w\" sizes=\"auto, (max-width: 1920px) 100vw, 1920px\" \/><\/h2>\n<h2 id=\"ethical-consideration\"><span style=\"font-weight: 400;\">Ethical consideration<\/span><\/h2>\n<p><span style=\"font-weight: 400;\">This tutorial employs widely-used web scraping methods for educational purposes. When engaging with public servers, it is crucial to approach the task responsibly. 
Here are some essential guidelines:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Avoid scraping at a speed that could negatively impact the website&#8217;s performance.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Do not scrape data that isn\u2019t publicly accessible.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Refrain from redistributing entire public datasets, as this may violate legal regulations in certain jurisdictions.<\/span><\/li>\n<\/ul>\n<h2 id=\"prerequisites\"><span style=\"font-weight: 400;\">Prerequisites<\/span><\/h2>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Python installed on your machine.<\/span><\/li>\n<\/ul>\n<h2 id=\"project-setup\"><span style=\"font-weight: 400;\">Project Setup<\/span><\/h2>\n<p><span style=\"font-weight: 400;\">For this tutorial, we will utilize the <strong>requests<\/strong> and <strong>BeautifulSoup<\/strong> libraries:<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">pip install requests beautifulsoup4<\/pre>\n<p><span style=\"font-weight: 400;\">Given that TripAdvisor employs robust anti-bot detection mechanisms, it is advisable to incorporate headers and proxies when scraping data from this site. 
While scraping without proxies may suffice for small-scale tasks, using them helps prevent IP blocking during extensive scraping activities.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Let\u2019s begin by importing the necessary libraries and defining the start URL:<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">import requests\r\nfrom bs4 import BeautifulSoup\r\nimport logging\r\nimport csv\r\n\r\nurl = 'https:\/\/www.tripadvisor.com\/Restaurant_Review-g60763-d478965-Reviews-Gallaghers_Steakhouse-New_York_City_New_York.html'<\/pre>\n<p><span style=\"font-weight: 400;\">We set headers such as User-Agent so that the request looks like it comes from a browser, which reduces the chance of detection. DNT stands for &#8216;Do Not Track&#8217;; including it can make the request look more like one from a real user.<\/span><\/p>\n<p>We recommend using a proxy for this project. In this tutorial, I&#8217;m using a residential proxy from <a href=\"https:\/\/rayobyte.com\/products\/residential-proxies\/\">Rayobyte<\/a>. 
You can sign up for a free trial of 50MB, and the best part is that no credit card is required to redeem it.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">headers = {\r\n    \"User-Agent\": \"Mozilla\/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/119.0.0.0 Safari\/537.36\",\r\n    \"Accept\": \"text\/html,application\/xhtml+xml,application\/xml;q=0.9,image\/webp,*\/*;q=0.8\",\r\n    \"Accept-Language\": \"en-US,en;q=0.5\",\r\n    # \"br\" responses are only decoded if the brotli package is installed\r\n    \"Accept-Encoding\": \"gzip, deflate, br\",\r\n    \"DNT\": \"1\",\r\n    \"Connection\": \"keep-alive\",\r\n    \"Upgrade-Insecure-Requests\": \"1\",\r\n}\r\n\r\n\r\n# Replace the placeholders with your own proxy credentials\r\nproxies = {\r\n    'http': 'http:\/\/username:password@host:port',\r\n    'https': 'http:\/\/username:password@host:port'\r\n}<\/pre>\n<p><span style=\"font-weight: 400;\">To handle potential errors during execution, we will use a try-except block:<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">try:\r\n    response = requests.get(url, headers=headers, proxies=proxies, timeout=30)\r\n    # Raise an HTTPError (a RequestException subclass) on 4xx\/5xx responses\r\n    response.raise_for_status()\r\n    soup = BeautifulSoup(response.content, 'html.parser')\r\n\r\n\r\n    # Write the prettified HTML to a txt file\r\n    with open('tripadvisor_restaurant_review.txt', 'w', encoding='utf-8') as file:\r\n        file.write(soup.prettify())\r\n    print(\"Content successfully written to 'tripadvisor_restaurant_review.txt'\")\r\n\r\n\r\nexcept requests.exceptions.RequestException as e:\r\n    logging.error(f\"Error during requests to {url} : {str(e)}\")<\/pre>\n<p><span style=\"font-weight: 400;\">Instead of printing the response in the terminal, we save it to a text file for easier review. 
This output file allows us to verify whether we successfully received a response from the website.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1133\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/06_response_txt.png\" alt=\"Response txt file\" width=\"863\" height=\"510\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/06_response_txt.png 863w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/06_response_txt-300x177.png 300w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/06_response_txt-768x454.png 768w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/06_response_txt-624x369.png 624w\" sizes=\"auto, (max-width: 863px) 100vw, 863px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">The output file shows that, with headers and proxies in place, we were able to retrieve and parse the page&#8217;s HTML.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">This text file exists only to confirm the response, so we can delete it afterwards.<\/span><\/p>\n<h2 id=\"inspecting-elements\"><span style=\"font-weight: 400;\">Inspecting the Elements<\/span><\/h2>\n<p><span style=\"font-weight: 400;\">To inspect the elements, go back to the website, \u201c<\/span><b>right-click<\/b><span style=\"font-weight: 400;\">\u201d anywhere and click on \u201c<\/span><b>Inspect<\/b><span style=\"font-weight: 400;\">\u201d. 
Click on this arrow icon and start hovering on the element that we want.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1135\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/07_inspect_arrow.png\" alt=\"Inspect element arrow\" width=\"317\" height=\"261\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/07_inspect_arrow.png 317w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/07_inspect_arrow-300x247.png 300w\" sizes=\"auto, (max-width: 317px) 100vw, 317px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">In this tutorial, we\u2019ll focus our efforts on collecting reviews from a single restaurant, allowing us to scrape its details just once. After gathering this foundational information, we can easily replicate it across subsequent rows. The key details that will remain consistent include the restaurant&#8217;s name, ranking, total rating, price range, categories, address, city, and phone number. 
With this information in place, we\u2019ll dive into scraping individual ratings and customer reviews to enrich our dataset.\u00a0<\/span><\/p>\n<h3><\/h3>\n<h3><span style=\"font-weight: 400;\">Restaurant Name<\/span><\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1137\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/08_name_element.png\" alt=\"Restaurant Name Element\" width=\"628\" height=\"169\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/08_name_element.png 628w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/08_name_element-300x81.png 300w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/08_name_element-624x168.png 624w\" sizes=\"auto, (max-width: 628px) 100vw, 628px\" \/><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">restaurant_name = soup.find('h1').text.strip()<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1139\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/09_name_result.png\" alt=\"restaurant name output\" width=\"340\" height=\"35\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/09_name_result.png 340w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/09_name_result-300x31.png 300w\" sizes=\"auto, (max-width: 340px) 100vw, 340px\" \/><\/p>\n<h3><span style=\"font-weight: 400;\">Price Level and Cuisine Type<\/span><\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1140\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/10_price_cuisine_element.png\" alt=\"Price Level and Cuisine Type element\" width=\"578\" height=\"143\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/10_price_cuisine_element.png 578w, 
https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/10_price_cuisine_element-300x74.png 300w\" sizes=\"auto, (max-width: 578px) 100vw, 578px\" \/><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">general_infos = soup.find('span', class_='cPbcf').text.strip()<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1143\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/11_price_cuisine_result.png\" alt=\"Price Level and Cuisine Type output\" width=\"544\" height=\"38\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/11_price_cuisine_result.png 544w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/11_price_cuisine_result-300x21.png 300w\" sizes=\"auto, (max-width: 544px) 100vw, 544px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">This returns the price level and cuisine types as one comma-separated string, so we split it on the commas and extract <code>price_level<\/code> and <code>cuisine_type<\/code>.<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">info_parts = general_infos.split(', ')\r\nprice_level = info_parts[0]\r\ncuisine_type = ', '.join(info_parts[1:])<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1145\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/12_price_cuisine_final_result.png\" alt=\"Price Level and Cuisine Type output separate\" width=\"479\" height=\"50\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/12_price_cuisine_final_result.png 479w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/12_price_cuisine_final_result-300x31.png 300w\" sizes=\"auto, (max-width: 479px) 100vw, 479px\" \/><\/p>\n<h3><span style=\"font-weight: 400;\">Total Rating<\/span><\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1147\" 
src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/13_total_rating_element.png\" alt=\"Total Rating Element\" width=\"694\" height=\"877\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/13_total_rating_element.png 694w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/13_total_rating_element-237x300.png 237w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/13_total_rating_element-624x789.png 624w\" sizes=\"auto, (max-width: 694px) 100vw, 694px\" \/> <img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1148\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/14_total_rating_tag.png\" alt=\"Total Rating Tag\" width=\"547\" height=\"541\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/14_total_rating_tag.png 547w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/14_total_rating_tag-300x297.png 300w\" sizes=\"auto, (max-width: 547px) 100vw, 547px\" \/><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">detail_cards = soup.find_all('div', attrs={'data-automation': 'OVERVIEW_TAB_ELEMENT'})<\/pre>\n<p><span style=\"font-weight: 400;\">The <code>detail_cards<\/code> list refers to the cards shown here:<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1149\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/15_total_rating_location.png\" alt=\"Details card location\" width=\"1912\" height=\"802\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/15_total_rating_location.png 1912w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/15_total_rating_location-300x126.png 300w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/15_total_rating_location-1024x430.png 1024w, 
https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/15_total_rating_location-768x322.png 768w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/15_total_rating_location-1536x644.png 1536w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/15_total_rating_location-624x262.png 624w\" sizes=\"auto, (max-width: 1912px) 100vw, 1912px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">All the rating information is inside the first card, at index 0.<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">rating_info = detail_cards[0]\r\ntotal_rating = rating_info.find('span', class_='biGQs').text.strip()<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1150\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/16_total_rating_output.png\" alt=\"Total rating output\" width=\"160\" height=\"35\" title=\"\"><\/p>\n<h3><span style=\"font-weight: 400;\">Total Reviews<\/span><\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1151\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/17_total_reviews_element.png\" alt=\"Total review element\" width=\"180\" height=\"125\" title=\"\"> <img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1152\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/18_total_reviews_tag.png\" alt=\"Total review tag\" width=\"486\" height=\"307\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/18_total_reviews_tag.png 486w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/18_total_reviews_tag-300x190.png 300w\" sizes=\"auto, (max-width: 486px) 100vw, 486px\" \/><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">total_reviews = rating_info.find('div', class_='jXaJR').text.strip().replace(' reviews', '')<\/pre>\n<p><img loading=\"lazy\" 
decoding=\"async\" class=\"alignnone size-full wp-image-1153\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/19_total_review_output.png\" alt=\"Total review output\" width=\"180\" height=\"31\" title=\"\"><\/p>\n<h3><span style=\"font-weight: 400;\">Ranking text, Ranking and City<\/span><\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1154\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/20_ranking_city_element.png\" alt=\"Ranking and city element\" width=\"488\" height=\"148\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/20_ranking_city_element.png 488w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/20_ranking_city_element-300x91.png 300w\" sizes=\"auto, (max-width: 488px) 100vw, 488px\" \/><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">ranking_tag = rating_info.find_all('a', class_='BMQDV')\r\nranking_text = ranking_tag[1].find('span').text.strip().replace('#', '')\r\nranking = ranking_text.split()[0]<\/pre>\n<p><span style=\"font-weight: 400;\">To extract the <code>city<\/code> from the <code>ranking_text<\/code> string, we split it into words and find the index of the word &#8216;<strong>in<\/strong>&#8217;. 
Then extract everything after the &#8216;<strong>in<\/strong>&#8216;.<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">in_index = ranking_text.split().index('in')\r\ncity = ' '.join(ranking_text.split()[in_index + 1:])<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1155\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/21_ranking_city_output.png\" alt=\"Ranking and City output\" width=\"480\" height=\"67\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/21_ranking_city_output.png 480w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/21_ranking_city_output-300x42.png 300w\" sizes=\"auto, (max-width: 480px) 100vw, 480px\" \/><\/p>\n<h3><span style=\"font-weight: 400;\">Food Rating, Service Rating, Value Rating and Atmosphere Rating<\/span><\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1156\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/22_all_rating_elements.png\" alt=\"All rating element\" width=\"610\" height=\"362\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/22_all_rating_elements.png 610w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/22_all_rating_elements-300x178.png 300w\" sizes=\"auto, (max-width: 610px) 100vw, 610px\" \/> <img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1157\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/23_all_rating_teg.png\" alt=\"All rating tag\" width=\"501\" height=\"261\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/23_all_rating_teg.png 501w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/23_all_rating_teg-300x156.png 300w\" sizes=\"auto, (max-width: 501px) 100vw, 501px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">Inside this div tag 
with <code>class=\"khxWm\"<\/code>, there are four <code>div<\/code> tags that share <code>class=\"YwaWb\"<\/code>. Expanding each of these <code>div<\/code> tags reveals one rating category.<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">rating_container = rating_info.find('div', class_='khxWm')\r\nrating_category = rating_container.find_all('div', class_='YwaWb')<\/pre>\n<p><span style=\"font-weight: 400;\">The <code>food_rating<\/code> is inside the first category, followed by service, value, and atmosphere in that order.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1158\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/24_all_rating_tag_2.png\" alt=\"Food rating tag example\" width=\"499\" height=\"468\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/24_all_rating_tag_2.png 499w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/24_all_rating_tag_2-300x281.png 300w\" sizes=\"auto, (max-width: 499px) 100vw, 499px\" \/><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">food_rating = rating_category[0].find('svg', class_='UctUV').find('title').text.strip().replace(' of 5 bubbles', '')\r\nservice_rating = rating_category[1].find('svg', class_='UctUV').find('title').text.strip().replace(' of 5 bubbles', '')\r\nvalue_rating = rating_category[2].find('svg', class_='UctUV').find('title').text.strip().replace(' of 5 bubbles', '')\r\natmosphere_rating = rating_category[3].find('svg', class_='UctUV').find('title').text.strip().replace(' of 5 bubbles', '')<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1159\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/25_all_rating_outout.png\" alt=\"All rating output\" width=\"216\" height=\"87\" title=\"\"><\/p>\n<h3><span style=\"font-weight: 400;\">Address and Phone Number<\/span><\/h3>\n<p><img loading=\"lazy\" 
decoding=\"async\" class=\"alignnone size-full wp-image-1160\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/26_address_phone_element.png\" alt=\"Address and Phone Number element\" width=\"515\" height=\"112\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/26_address_phone_element.png 515w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/26_address_phone_element-300x65.png 300w\" sizes=\"auto, (max-width: 515px) 100vw, 515px\" \/><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">location_info = detail_cards[2]\r\naddress = location_info.find('span', class_='biGQs').text.strip()<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1161\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/27_address_output.png\" alt=\"Address output\" width=\"459\" height=\"29\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/27_address_output.png 459w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/27_address_output-300x19.png 300w\" sizes=\"auto, (max-width: 459px) 100vw, 459px\" \/><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1162\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/28_phone_tag.png\" alt=\"Phone Number tag\" width=\"473\" height=\"136\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/28_phone_tag.png 473w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/28_phone_tag-300x86.png 300w\" sizes=\"auto, (max-width: 473px) 100vw, 473px\" \/><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">phone_no = location_info.find('a', attrs={'aria-label': 'Call'}).get('href').replace('tel:','')<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1163\" 
src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/29_phone_output.png\" alt=\"Phone Number Output\" width=\"261\" height=\"36\" title=\"\"><\/p>\n<h3><span style=\"font-weight: 400;\">Customer Rating<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">Please note that all reviews and ratings are retrieved in the &#8220;<strong>Detailed Reviews<\/strong>&#8221; format rather than the &#8220;<strong>Most Recent<\/strong>&#8221; format. While the page displays reviews in the &#8220;<strong>Most Recent<\/strong>&#8221; view, the data returned by our requests will be in the &#8220;<strong>Detailed Reviews<\/strong>&#8221; format.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">First, locate all review cards within the webpage:<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1164\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/30_customer_card.png\" alt=\"Customer review card\" width=\"758\" height=\"562\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/30_customer_card.png 758w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/30_customer_card-300x222.png 300w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/30_customer_card-624x463.png 624w\" sizes=\"auto, (max-width: 758px) 100vw, 758px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">This is the example of the first review card inside the <code>div<\/code> tag with <code>class=\"_c\"<\/code><\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1165\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/32_customer_card_tag.png\" alt=\"Customer review card tag\" width=\"595\" height=\"43\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/32_customer_card_tag.png 595w, 
https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/32_customer_card_tag-300x22.png 300w\" sizes=\"auto, (max-width: 595px) 100vw, 595px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">We will use <code>data-automation=\"reviewCard\"<\/code> for specificity:<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">review_cards = soup.find_all('div', attrs={'data-automation': 'reviewCard'})<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1167\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/33_customer_rating.png\" alt=\"\" width=\"324\" height=\"144\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/33_customer_rating.png 324w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/33_customer_rating-300x133.png 300w\" sizes=\"auto, (max-width: 324px) 100vw, 324px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">The rating is initially represented as a visual element (a circle) without a numerical value. 
However, if we expand the <code>svg<\/code> tag, we can access the rating value inside the <code>title<\/code> tag, which appears as &#8220;<strong>5.0 of 5 bubbles<\/strong>&#8221;.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1168\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/34_customer_rating_tag.png\" alt=\"Customer rating tag\" width=\"680\" height=\"157\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/34_customer_rating_tag.png 680w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/34_customer_rating_tag-300x69.png 300w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/34_customer_rating_tag-624x144.png 624w\" sizes=\"auto, (max-width: 680px) 100vw, 680px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">To extract just the numerical value (e.g., &#8216;<strong>5.0<\/strong>&#8217;), we use the <code>replace()<\/code> method to remove the &#8220;<strong> of 5 bubbles<\/strong>&#8221; part of the string. In the snippets that follow, <code>review<\/code> is a single card taken from <code>review_cards<\/code>, for example inside a <code>for review in review_cards:<\/code> loop.<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">rating_element = review.find('svg', class_='UctUV')\r\ncustomer_rating = rating_element.find('title').text.strip().replace(' of 5 bubbles', '')<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1166\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/35_customer_rating_output.png\" alt=\"Customer rating output\" width=\"186\" height=\"37\" title=\"\"><\/p>\n<h3><span style=\"font-weight: 400;\">Review Title<\/span><\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1169\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/36_review_title_element.png\" alt=\"Review title element\" width=\"287\" height=\"126\" title=\"\">\u00a0 <img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1170\" 
src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/37_review_title_tag.png\" alt=\"Review title tag\" width=\"712\" height=\"298\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/37_review_title_tag.png 712w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/37_review_title_tag-300x126.png 300w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/37_review_title_tag-624x261.png 624w\" sizes=\"auto, (max-width: 712px) 100vw, 712px\" \/><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">review_title = review.find('div', attrs={'data-test-target': 'review-title'}).text.strip()<\/pre>\n<h3><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1171\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/review_title_output.png\" alt=\"Review title output\" width=\"408\" height=\"55\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/review_title_output.png 408w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/review_title_output-300x40.png 300w\" sizes=\"auto, (max-width: 408px) 100vw, 408px\" \/><\/h3>\n<h3><span style=\"font-weight: 400;\">Review Details<\/span><\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1172\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/38_review_details_element.png\" alt=\"Review details element\" width=\"695\" height=\"239\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/38_review_details_element.png 695w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/38_review_details_element-300x103.png 300w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/38_review_details_element-624x215.png 624w\" sizes=\"auto, (max-width: 695px) 100vw, 695px\" \/> <img loading=\"lazy\" decoding=\"async\" class=\"alignnone 
size-full wp-image-1173\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/39_review_details_tag.png\" alt=\"Review details tag\" width=\"775\" height=\"599\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/39_review_details_tag.png 775w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/39_review_details_tag-300x232.png 300w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/39_review_details_tag-768x594.png 768w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/39_review_details_tag-624x482.png 624w\" sizes=\"auto, (max-width: 775px) 100vw, 775px\" \/><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">review_details = review.find('div', attrs={'data-test-target': 'review-body'}).text.strip()<\/pre>\n<h3><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1174\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/review_text_output.png\" alt=\"Review details output\" width=\"1414\" height=\"93\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/review_text_output.png 1414w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/review_text_output-300x20.png 300w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/review_text_output-1024x67.png 1024w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/review_text_output-768x51.png 768w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/review_text_output-624x41.png 624w\" sizes=\"auto, (max-width: 1414px) 100vw, 1414px\" \/><\/h3>\n<h3><span style=\"font-weight: 400;\">Customer Type\u00a0<\/span><\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1175\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/40_customer_type_element.png\" alt=\"Customer type tag\" width=\"193\" height=\"138\" 
title=\"\"><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">customer_type = review.find('span', class_='DlAxN').text.strip()<\/pre>\n<h3><span style=\"font-weight: 400;\">Written Date<\/span><\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1176\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/41_written_date_element.png\" alt=\"Written date element\" width=\"828\" height=\"216\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/41_written_date_element.png 828w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/41_written_date_element-300x78.png 300w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/41_written_date_element-768x200.png 768w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/41_written_date_element-624x163.png 624w\" sizes=\"auto, (max-width: 828px) 100vw, 828px\" \/> <img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1177\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/42_written_date_tag.png\" alt=\"Written date tag\" width=\"600\" height=\"242\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/42_written_date_tag.png 600w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/42_written_date_tag-300x121.png 300w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">The date sits inside the <code>div<\/code> tag with <code>class=\"neAPm\"<\/code>, which contains <em>two inner<\/em> <code>div<\/code> tags. We need to target the first inner <code>div<\/code> to find the date. 
Here&#8217;s the code:<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">date_element = review.find('div', {'class': 'neAPm'})\r\nchild_divs = date_element.find_all('div')\r\ndate = child_divs[0].text.strip().replace('Written ', '')<\/pre>\n<h2><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1178\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/written_date_output.png\" alt=\"Written date output\" width=\"490\" height=\"56\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/written_date_output.png 490w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/written_date_output-300x34.png 300w\" sizes=\"auto, (max-width: 490px) 100vw, 490px\" \/><\/h2>\n<h2 id=\"complete-code-first-page\"><span style=\"font-weight: 400;\">The complete code for scraping the first page of reviews and saving the results to CSV<\/span><\/h2>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">import requests\r\nfrom bs4 import BeautifulSoup\r\nimport logging\r\nimport csv\r\nimport time\r\n\r\n\r\ndef setup_request():\r\n    \"\"\"Return the request headers and proxy settings.\"\"\"\r\n    headers = {\r\n        \"User-Agent\": \"Mozilla\/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/119.0.0.0 Safari\/537.36\",\r\n        \"Accept\": \"text\/html,application\/xhtml+xml,application\/xml;q=0.9,image\/webp,*\/*;q=0.8\",\r\n        \"Accept-Language\": \"en-US,en;q=0.5\",\r\n        \"Accept-Encoding\": \"gzip, deflate, br\",\r\n        \"DNT\": \"1\",\r\n        \"Connection\": \"keep-alive\",\r\n        \"Upgrade-Insecure-Requests\": \"1\",\r\n    }\r\n    # Replace the placeholders with your own proxy credentials\r\n    proxies = {\r\n        'http': 
'http:\/\/username:password@host:port',\r\n        'https': 'http:\/\/username:password@host:port'\r\n    }\r\n\r\n    return headers, proxies\r\n\r\n\r\ndef get_restaurant_info(soup):\r\n    \"\"\"Extract the restaurant-level details from the parsed page.\"\"\"\r\n    restaurant_info = {\r\n        'name': '',\r\n        'price_level': '',\r\n        'cuisine_type': '',\r\n        'total_rating': '',\r\n        'total_reviews': '',\r\n        'food_rating': '',\r\n        'service_rating': '',\r\n        'value_rating': '',\r\n        'atmosphere_rating': '',\r\n        'ranking': '',\r\n        'city': '',\r\n        'address': '',\r\n        'phone_no': ''\r\n    }\r\n\r\n    restaurant_info['name'] = soup.find('h1').text.strip()\r\n\r\n    # General info processing\r\n    general_infos = soup.find('span', class_='cPbcf').text.strip()\r\n    info_parts = general_infos.split(', ')\r\n    restaurant_info['price_level'] = info_parts[0]\r\n    restaurant_info['cuisine_type'] = ', '.join(info_parts[1:])\r\n\r\n    # Rating and review info\r\n    detail_cards = soup.find_all(\r\n        'div', attrs={'data-automation': 'OVERVIEW_TAB_ELEMENT'})\r\n    if detail_cards:\r\n        rating_info = detail_cards[0]\r\n        restaurant_info['total_rating'] = rating_info.find(\r\n            'span', class_='biGQs').text.strip()\r\n        reviews_text = rating_info.find('div', class_='jXaJR').text.strip()\r\n        restaurant_info['total_reviews'] = reviews_text.replace(' reviews', '')\r\n\r\n        # Detailed ratings\r\n        try:\r\n            rating_container = 
rating_info.find('div', class_='khxWm')\r\n            if rating_container:\r\n                rating_category = rating_container.find_all(\r\n                    'div', class_='YwaWb')\r\n                if len(rating_category) &gt;= 4:\r\n                    restaurant_info['food_rating'] = rating_category[0].find(\r\n                        'svg', class_='UctUV').find('title').text.strip().replace(' of 5 bubbles', '')\r\n                    restaurant_info['service_rating'] = rating_category[1].find(\r\n                        'svg', class_='UctUV').find('title').text.strip().replace(' of 5 bubbles', '')\r\n                    restaurant_info['value_rating'] = rating_category[2].find(\r\n                        'svg', class_='UctUV').find('title').text.strip().replace(' of 5 bubbles', '')\r\n                    restaurant_info['atmosphere_rating'] = rating_category[3].find(\r\n                        'svg', class_='UctUV').find('title').text.strip().replace(' of 5 bubbles', '')\r\n        except Exception as e:\r\n            logging.error(f\"Error extracting detailed ratings: {str(e)}\")\r\n\r\n        # Ranking and city info\r\n        ranking_tag = rating_info.find_all('a', class_='BMQDV')\r\n        if len(ranking_tag) &gt; 1:\r\n            ranking_text = 
ranking_tag[1].find('span').text.strip()\r\n            restaurant_info['ranking'] = ranking_text.split()[\r\n                0].replace('#', '')\r\n            in_index = ranking_text.split().index('in')\r\n            restaurant_info['city'] = ' '.join(\r\n                ranking_text.split()[in_index + 1:])\r\n\r\n    # Address and phone info\r\n    if len(detail_cards) &gt; 2:\r\n        location_info = detail_cards[2]\r\n        restaurant_info['address'] = location_info.find(\r\n            'span', class_='biGQs').text.strip()\r\n\r\n        # Phone number\r\n        try:\r\n            phone_link = location_info.find('a', attrs={'aria-label': 'Call'})\r\n            if phone_link:\r\n                restaurant_info['phone_no'] = phone_link.get(\r\n                    'href').replace('tel:', '')\r\n        except Exception as e:\r\n            logging.error(f\"Error extracting phone number: {str(e)}\")\r\n\r\n    return restaurant_info\r\n\r\n\r\ndef scrape_reviews(soup):\r\n    \"\"\"Extract rating, title, text and date from every review card.\"\"\"\r\n    reviews = []\r\n    review_cards = soup.find_all(\r\n        'div', attrs={'data-automation': 'reviewCard'})\r\n\r\n    for review in review_cards:\r\n        review_data = 
{\r\n            'rating': '',\r\n            'title': '',\r\n            'text': '',\r\n            'date': ''\r\n        }\r\n\r\n        rating_element = review.find('svg', class_='UctUV')\r\n        if rating_element:\r\n            review_data['rating'] = rating_element.find(\r\n                'title').text.strip().replace(' of 5 bubbles', '')\r\n\r\n        title_element = review.find(\r\n            'div', attrs={'data-test-target': 'review-title'})\r\n        if title_element:\r\n            review_data['title'] = title_element.text.strip()\r\n\r\n        text_element = review.find(\r\n            'div', attrs={'data-test-target': 'review-body'})\r\n        if text_element:\r\n            review_data['text'] = text_element.text.strip()\r\n\r\n        date_element = review.find('div', class_='neAPm')\r\n        if date_element:\r\n            child_divs = date_element.find_all('div')\r\n            if child_divs:\r\n                review_data['date'] = child_divs[0].text.strip().replace(\r\n                    'Written ', '')\r\n\r\n        reviews.append(review_data)\r\n        time.sleep(3)\r\n\r\n    return reviews\r\n\r\n\r\ndef save_to_csv(restaurant_info, reviews, filename):\r\n    \"\"\"Write one CSV row per review, repeating the restaurant details.\"\"\"\r\n    with open(filename, mode='w', newline='', encoding='utf-8') as file:\r\n        writer = csv.writer(file)\r\n\r\n        # Write header\r\n        header = 
['RESTAURANT_NAME', 'PRICE_LEVEL', 'CUISINE_TYPE', 'TOTAL_RATING',\r\n                  'TOTAL_REVIEWS', 'FOOD_RATING', 'SERVICE_RATING', 'VALUE_RATING',\r\n                  'ATMOSPHERE_RATING', 'RANKING', 'CITY', 'ADDRESS', 'PHONE_NO',\r\n                  'RATING', 'REVIEW_TITLE', 'REVIEW_TEXT', 'REVIEW_DATE']\r\n        writer.writerow(header)\r\n\r\n        # Write reviews with restaurant info\r\n        for review in reviews:\r\n            row = 
[\r\n                restaurant_info['name'],\r\n                restaurant_info['price_level'],\r\n                restaurant_info['cuisine_type'],\r\n                restaurant_info['total_rating'],\r\n                restaurant_info['total_reviews'],\r\n                restaurant_info['food_rating'],\r\n                restaurant_info['service_rating'],\r\n                restaurant_info['value_rating'],\r\n                restaurant_info['atmosphere_rating'],\r\n                restaurant_info['ranking'],\r\n                restaurant_info['city'],\r\n                restaurant_info['address'],\r\n                restaurant_info['phone_no'],\r\n                review['rating'],\r\n                review['title'],\r\n                review['text'],\r\n                review['date']\r\n            ]\r\n            writer.writerow(row)\r\n\r\n\r\ndef main():\r\n    url = 'https:\/\/www.tripadvisor.com\/Restaurant_Review-g60763-d478965-Reviews-or45-Gallaghers_Steakhouse-New_York_City_New_York.html'\r\n    headers, proxies = setup_request()\r\n\r\n    try:\r\n        response = requests.get(url, headers=headers, proxies=proxies, timeout=30)\r\n        # Raise an HTTPError (a RequestException) on 4xx\/5xx responses\r\n        response.raise_for_status()\r\n        soup = BeautifulSoup(response.content, 'html.parser')\r\n        time.sleep(10)\r\n\r\n        restaurant_info = get_restaurant_info(soup)\r\n        reviews = scrape_reviews(soup)\r\n\r\n        save_to_csv(restaurant_info, reviews,\r\n                    'tripadvisor_ny_restaurant_reviews_details_page_1.csv')\r\n        print(\"All information saved successfully\")\r\n\r\n    except requests.exceptions.RequestException as e:\r\n        logging.error(f\"Error during requests to {url} : {str(e)}\")\r\n\r\n\r\nif __name__ == \"__main__\":\r\n    main()<\/pre>\n<h2 id=\"result-in-csv\"><span style=\"font-weight: 400;\">The result in CSV format<\/span><\/h2>\n<p><span style=\"font-weight: 400;\">Here is the result from the first page. 
Note that the reviews appear in the &#8220;<strong>Detailed Reviews<\/strong>&#8221; format rather than the &#8220;<strong>Most Recent<\/strong>&#8221; format.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1179\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/43_csv.png\" alt=\"csv output\" width=\"1259\" height=\"338\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/43_csv.png 1259w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/43_csv-300x81.png 300w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/43_csv-1024x275.png 1024w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/43_csv-768x206.png 768w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/43_csv-624x168.png 624w\" sizes=\"auto, (max-width: 1259px) 100vw, 1259px\" \/><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">RESTAURANT_NAME,PRICE_LEVEL,CUISINE_TYPE,TOTAL_RATING,TOTAL_REVIEWS,FOOD_RATING,SERVICE_RATING,VALUE_RATING,ATMOSPHERE_RATING,RANKING,CITY,ADDRESS,PHONE_NO,RATING,REVIEW_TITLE,REVIEW_TEXT,REVIEW_DATE\r\nGallaghers Steakhouse,$$$$,\"Steakhouse, Seafood, Gluten free options\",4.5,\"5,977\",,,,,43,New York City,\"228 W 52nd St, New York City, NY 10019-5802\",,4,Pricey but worth it,\"We booked ahead and glad we did, it was packed on a Thursday evening. \r\n\r\nLovely old school vibe to the restaurant with the majority of wait staff of an older generation. \r\n\r\nGreat meal, carpaccio and salmon tartare to start, rib eye steak and fillet for main. We weren\u2019t told that there was a specials menu which I would have ordered from (8oz fillet rather than 10oz). We only knew about it as we overheard the table next to us being told what they were.  
When we asked the waiter about it he wasn\u2019t happy we weren\u2019t told about them and gave us a complimentary fruit platter, which was a nice gesture but I would have preferred them to ask if they could offer a dessert on the house (all the same price including the fruit platter). My husband ordered the pecan pie with ice cream which should have been hot but was cold. \r\n\r\nBe prepared, this is a pricey joint but worth it. Food above with 2 sides, 1 glass of Prosecco and 2 glasses of Sangiovese came to $300 dollars with taxes and without tipRead more\",\"October 2, 2022\"\r\nGallaghers Steakhouse,$$$$,\"Steakhouse, Seafood, Gluten free options\",4.5,\"5,977\",,,,,43,New York City,\"228 W 52nd St, New York City, NY 10019-5802\",,5,An experience to remember!,\"I had a reservation for 5:30- made weeks in advance as we had broadway tickets and it was my sweetheart\u2019s birthday and this is where I wanted to take him. We were a little late getting out of the hotel, not thinking about rush-hour, there was no way we were going to get a cab, so we had to make it 17 blocks in about a 15 minute window. without stoplights and other people on the sidewalk, we probably would have made it in time, but we were running late. I called, James answered, I asked if he would please hold my reservation. He said that he would. We were still running behind, and I called back to be sure James was still going to hold my reservation since we were 15 minutes late. Thankfully, he did. I really appreciated that!!\r\n\r\nThe restaurant itself is really nice, high end. It was super crowded. We were seated at a small table in between two other couples, it was cramped. I was so happy to be there, I did not worry about rubbing elbows with strangers. It is also quite loud. This is not a quiet candlelight dinner spot if that\u2019s what you are looking for. \r\n\r\nThe service was wonderful, quick, efficient, and very friendly. 
We had two mixed drinks to start, for the meal I had a sirloin steak, he had the surf and turf with a filet and lobster tail. We had Caesar salad, mashed potatoes, and spinach. We also had a glass of Shiraz with the dinner.\r\n\r\nEverything was so outstanding. I like my steaks medium well, the steak that was brought to me was medium rare at best. Normally, I would have sent it back, but it tasted good, i did not. I did only eat half of it because the closer I got to the center, the more rare it was.\r\nI do not know how they make their mashed potatoes, but they were the best ones I\u2019ve ever had. The saut\u00e9ed spinach was also quite delicious.\r\n\r\nWe did not finish our food, there was just so much of it! I did mention to the waiter that it was my boyfriends birthday, and he brought out a small chocolate cake. Even though we were too full  to eat another bite, we ate the whole cake. It was so delicious.\r\n\r\nOur bill was $275 plus tip. That is a really big bill for a dinner but this was an experience.Read more\",\"October 24, 2022\"\r\nGallaghers Steakhouse,$$$$,\"Steakhouse, Seafood, Gluten free options\",4.5,\"5,977\",,,,,43,New York City,\"228 W 52nd St, New York City, NY 10019-5802\",,5,Prime Rib all the time!,\"I recently went to the restaurant with friends. Nicely set out with plenty of room and plenty of meat to choose from. We all went for the Prime Rib which we ordered in advance ( must do this ) absolutely delicious melt in your mouth.  Great service, fantastic waiter an all round great evening out!Read more\",\"March 14, 2020\"\r\nGallaghers Steakhouse,$$$$,\"Steakhouse, Seafood, Gluten free options\",4.5,\"5,977\",,,,,43,New York City,\"228 W 52nd St, New York City, NY 10019-5802\",,5,Amazing restaurant,\"Gallagher's is absolutely the best steak restaurant we have ever been to, the place is stylish and comfortable, the staff were very professional and friendly, service couldn't be faulted. 
The kitchen is in full view and very clean, great to watched the steaks being cooked on the open ovens.\r\nThe steaks were cooked to perfection and delicious, a really good cut of meat and a good choice of sides that were all big enough to share.\r\nWe can't wait to return on our next visit to NYC and have recommended this place to many others.Read more\",\"January 20, 2020\"\r\nGallaghers Steakhouse,$$$$,\"Steakhouse, Seafood, Gluten free options\",4.5,\"5,977\",,,,,43,New York City,\"228 W 52nd St, New York City, NY 10019-5802\",,5,Exceptional meal with fantastic service,\"My husband and I enjoyed a fabulous dinner here. Our waiter, Derek, was phenomenal. The service was impeccable. My water glass was constantly filled, the food arrived in the right amount of time and was cooked absolutely perfectly and the manager stopped by to make sure everything was good. My NY strip was out of this world good. The sides are large and tasty. The wedge salad was the best I've ever eaten. The prices were just right for what you get. As an added bonus, a photographer stopped by to take table side pictures available for purchase. Nice touch! Overall, it was a meal we will always remember and we hope to visit again the next time we're in Vegas!Read more\",\"June 28, 2021\"\r\nGallaghers Steakhouse,$$$$,\"Steakhouse, Seafood, Gluten free options\",4.5,\"5,977\",,,,,43,New York City,\"228 W 52nd St, New York City, NY 10019-5802\",,5,Outstanding food and service,\"Managed to squeeze in an early evening booking, lunch menu still\r\navailable!\r\nLarge restaurant, elegant, good atmosphere!\r\nSuper value 3 courses!\r\nBasket of 4 different breads and whipped salted butter was a nice touch.\r\nCaesar salad, clam chowder starters were first class.\r\nFillet mignonette 10oz with supplemental fee was simply sublime, cooked perfectly!\r\nDesserts NY cheesecake and Key lime pie were delicious!\r\nThe service was old school perfection! 
Experienced, unfussy and extremely professional!\r\nOutstanding dining experience, highly recommended! \r\nBook a table, if you can get one!Read more\",\"December 14, 2022\"\r\nGallaghers Steakhouse,$$$$,\"Steakhouse, Seafood, Gluten free options\",4.5,\"5,977\",,,,,43,New York City,\"228 W 52nd St, New York City, NY 10019-5802\",,5,Outstanding Traditional Steakhouse,\"The restaurant was full when we arrived and full when we left. That says it all really. Gallaghers has the feel of a traditional 1930's type establishment. All staff appear to be \"\"old school\"\", mature is years and sooo attentive with their service. The atmosphere was superb, the food divine. I had the Filet Mignon - to die for. Others had the \"\"Surf &amp; Turf\"\" - the lobster tail was huge and so well prepared. The deserts were something else - absolutely huge. For us, we struggled with 2 courses. The bill wasn't cheap but, upon reflection, it was absolutely worth every dollar. Outstanding. If we ever go back to NYC, Gallaghers is a \"\"Must Do\"\" again.Read more\",\"June 9, 2022\"\r\nGallaghers Steakhouse,$$$$,\"Steakhouse, Seafood, Gluten free options\",4.5,\"5,977\",,,,,43,New York City,\"228 W 52nd St, New York City, NY 10019-5802\",,4,Great but Pricey,\"Our first night in NY and not a disappointment!  We dined early (5.30pm) as we had The Lion King booked down the road (Mishkov is about 10 minutes walk through Times Square from here).\r\nThe Dr Loosen Blue riesling by the glass is lovely.  So is the Zardetto Prosecco.  We had 3 Poterhouses to share between 3 adults and it was way too much food!  We should have ordered 2 but never learn!  Too much food but what great quality and perfectly cooked.  Medium Rare is Rare to our UK pallets but was fine for me!\r\nOur party also had the 10oz Filet and the Salmon which were also great.\r\nThe onion rings are exceptional.  Cauliflower and Mac and cheese also very good.  
The service here is more formal than many places we went in NY subsequently but I guess that's what you should expect.\r\nGreat job Alvaro thanks.Read more\",\"January 7, 2023\"\r\nGallaghers Steakhouse,$$$$,\"Steakhouse, Seafood, Gluten free options\",4.5,\"5,977\",,,,,43,New York City,\"228 W 52nd St, New York City, NY 10019-5802\",,1,Awful service. Intimidating serving staff.,\"Where to begin with this one. As soon as you walk through the door, you know the belt is already being unbuckled. This was with hindsight. To get to the point the service started okay. A little fake and rehearsed but okay. Starter was great. Steak was average at best. Lukewarm to cold. Sides were average. Drinks not of a sufficient standard. Guinness was poor. The real fun started when the bill arrived. Bill was 474 between the 4 of us. A decent spend, we thought.  Tip time, not happy. This is when the real aggression started. We left what we thought was a suitable cash tip for 1 hours service. Server not happy. Informed head honcho. He wasn't happy. Wanted to know why we were not leaving a 78 dollar tip. The reason was is the fact we do not believe in entitlement. Servers do not earn 78 dollars an hour and seeing as the food was average, we thought what we had left was more than enough. To sum up. Find somewhere else to spend your hard earned. This place was recommended to us. Very disappointed. All in all the experience left an extremely nasty taste in our mouths.Read more\",\"February 12, 2023\"\r\nGallaghers Steakhouse,$$$$,\"Steakhouse, Seafood, Gluten free options\",4.5,\"5,977\",,,,,43,New York City,\"228 W 52nd St, New York City, NY 10019-5802\",,5,Tops!,\"On par with the \u201cchophouse\u201d dining we have in SC. We had an exceptional experience for several reasons. Our waiter made it feel like we could taste every dish as he presented the options and specials. We had several staff members check on us throughout the meal. 
The table setup had us close to other diners and despite the reputation of northerners to be adverse to being bothered, we had great conversations with outlets neighbors. Wonderful ambiance as you can see the steak storage room and kitchen from all angles. Beautiful experience all around.Read more\",\"October 5, 2021\"\r\nGallaghers Steakhouse,$$$$,\"Steakhouse, Seafood, Gluten free options\",4.5,\"5,977\",,,,,43,New York City,\"228 W 52nd St, New York City, NY 10019-5802\",,1,A birthday celebration ruined,\"Staff were opinionated and interrupted conversation with sarcastic comments. Calamari was lukewarm and lacked seasoning. The Caesar salad was ok but nothing special. The steaks were good but not hot, jacket potato was cold and was sent back but waiter instead of apologising preferred to antagonise us. Bottle of Rioja served wasn\u2019t the same as the wine list which wasn\u2019t explained and not very professional. The toilets had cheap hand soap at this prices you would expect better quality. A birthday celebration ruined.Read more\",\"December 12, 2022\"\r\nGallaghers Steakhouse,$$$$,\"Steakhouse, Seafood, Gluten free options\",4.5,\"5,977\",,,,,43,New York City,\"228 W 52nd St, New York City, NY 10019-5802\",,4,Delicious and get the NYC vibe,\"This was worth a visit because it is steeped in history and the experience and service we got was quite special. \r\nIt is expensive and perhaps because we go out to eat steak alot, we have ruined it for ourselves a bit, because although the food was delicious and we had a great time, we didn\u2019t feel it was worth the amount we spent comparing it to where we usually eat steak for much less. \r\nWe really did enjoy our food though, we had pork belly special and crab cocktail to start, porterhouse for two for mains with French fries and we also had key lime pie and NY cheesecake to finish. 
\r\nWe overindulged and it was a wonderful night but definitely in the high price bracket.\r\nOur waiters were friendly, helpful and prompt. It was a good vibe. \r\nThe disclaimer on the photo near the host desk make me laugh re \u201cthis is not a photo of Jeffrey epstein, it\u2019s Perry como\u201d, I wonder how many times people asked about it for the sign to be necessary. \r\nWorth a visit but remember that the prices don\u2019t include the tax and service charge so don\u2019t get carried away ordering like we did! Thanks for a lovely evening.Read more\",\"February 12, 2023\"\r\nGallaghers Steakhouse,$$$$,\"Steakhouse, Seafood, Gluten free options\",4.5,\"5,977\",,,,,43,New York City,\"228 W 52nd St, New York City, NY 10019-5802\",,5,Old time NYC steakhouse,\"Excellent steakhouse. We make a reservation every time we come to NYC. The steak was fantastic. We had a Porterhouse for two and it was so good. Also had the Mac and cheese, which was delicious. The bread is so good, especially the date nut bread. \r\nWe were celebrating my husband\u2019s birthday and the waiter brought a cake with a candle that was yummy.Read more\",\"September 26, 2021\"\r\nGallaghers Steakhouse,$$$$,\"Steakhouse, Seafood, Gluten free options\",4.5,\"5,977\",,,,,43,New York City,\"228 W 52nd St, New York City, NY 10019-5802\",,4,\"Great dinner, but priced out- 1 &amp; done!\",\"Good food, service, and dinner experience, but it is the most expensive dinner we have ever eaten in our entire life!   $22 for a glass of wine. Side of asparagus was $16 and my goat cheese salad had little goat cheese and beets.  The filet was very good and watching the kitchen staff prepare food was a treat. Also, the lobby area\/coat check area is not efficient. 
Basically had to fight to get my coat through the people waiting for the host.Read more\",\"January 7, 2023\"\r\nGallaghers Steakhouse,$$$$,\"Steakhouse, Seafood, Gluten free options\",4.5,\"5,977\",,,,,43,New York City,\"228 W 52nd St, New York City, NY 10019-5802\",,3,\"Good food, disinterested waiter\",\"We were greeted at our table by a lovely guy called Melvin...but that was the last we saw of him.  We could see tables around us, who arrived later all being attended to.  When our waiter arrived to take our order he pulled a face when my husband and aunt asked for their steaks  to be cooked well done.  We had the $29 lunch and my husband upgraded to 10oz filet mignon.  Food was good but mine arrived with no roast potatoes.  We were brought the wrong bill when it was time to pay. Not once in the whole time we were there did anyone ask if everything was OK with our meal.  We all felt we were being hurried.  This is second time in gallaghers (last time was full priced dinner) and whilst food has been great both times service wasn't that great on either.  
Dont think I'll be back as there are many other steakhouses in the area who are happy to see their customers and make sure their experience is enjoyable.Read more\",\"September 4, 2022\"\r\n<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1180\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/44_sort_review.png\" alt=\"Details reviews sort\" width=\"478\" height=\"403\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/44_sort_review.png 478w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/44_sort_review-300x253.png 300w\" sizes=\"auto, (max-width: 478px) 100vw, 478px\" \/><\/p>\n<h2 id=\"handling-pagination\"><span style=\"font-weight: 400;\">Handling Pagination<\/span><\/h2>\n<p><span style=\"font-weight: 400;\">The code provided earlier only scrapes information from the first page, yielding a total of 15 results. To gather reviews from additional pages, we need to examine how the URL changes when we click on the &#8220;<strong>next<\/strong>&#8221; button.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1181\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/45_pagination.png\" alt=\"Pagination\" width=\"313\" height=\"145\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/45_pagination.png 313w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/45_pagination-300x139.png 300w\" sizes=\"auto, (max-width: 313px) 100vw, 313px\" \/><\/p>\n<h3 id=\"url-structure\"><span style=\"font-weight: 400;\">Understanding the URL Structure<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">The URL for the first page is:<\/span><\/p>\n<p><a href=\"https:\/\/www.tripadvisor.com\/Restaurant_Review-g60763-d478965-Reviews-Gallaghers_Steakhouse-New_York_City_New_York.html\" rel=\"nofollow noopener\" target=\"_blank\"><span style=\"font-weight: 
400;\">https:\/\/www.tripadvisor.com\/Restaurant_Review-g60763-d478965-Reviews-Gallaghers_Steakhouse-New_York_City_New_York.html<\/span><\/a><\/p>\n<p><span style=\"font-weight: 400;\">For subsequent pages, the URL changes as follows:<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Second Page: <\/span><a href=\"https:\/\/www.tripadvisor.com\/Restaurant_Review-g60763-d478965-Reviews-or15-Gallaghers_Steakhouse-New_York_City_New_York.html\" rel=\"nofollow noopener\" target=\"_blank\"><span style=\"font-weight: 400;\">https:\/\/www.tripadvisor.com\/Restaurant_Review-g60763-d478965-Reviews-or15-Gallaghers_Steakhouse-New_York_City_New_York.html<\/span><\/a><span style=\"font-weight: 400;\">\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Third Page: <\/span><a href=\"https:\/\/www.tripadvisor.com\/Restaurant_Review-g60763-d478965-Reviews-or30-Gallaghers_Steakhouse-New_York_City_New_York.html\" rel=\"nofollow noopener\" target=\"_blank\"><span style=\"font-weight: 400;\">https:\/\/www.tripadvisor.com\/Restaurant_Review-g60763-d478965-Reviews-or30-Gallaghers_Steakhouse-New_York_City_New_York.html<\/span><\/a><span style=\"font-weight: 400;\">\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The pattern shows that an <strong>or<\/strong> offset is inserted after <strong>Reviews<\/strong> in the URL and increases by <strong>15<\/strong> for each new page, matching the 15 reviews shown per page.<\/span><\/p>\n<h3 id=\"implementing-pagination\"><span style=\"font-weight: 400;\">Implementing Pagination in the code<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">To effectively scrape multiple pages of reviews, we&#8217;ll need to:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Create a function to generate URLs<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Modify the main function to handle multiple pages<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Ensure we don&#8217;t duplicate restaurant info for each 
page<\/span><\/li>\n<\/ul>\n<ol>\n<li><span style=\"font-weight: 400;\">Create <strong>generate_url<\/strong><\/span><span style=\"font-weight: 400;\"><strong>\u00a0<\/strong>function to create URLs for each page:<\/span><\/li>\n<\/ol>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">def generate_url(base_url, page_number):\r\n\u00a0 \u00a0 if page_number == 1:\r\n\u00a0 \u00a0 \u00a0 \u00a0 return base_url\r\n\u00a0 \u00a0 offset = (page_number - 1) * 15\r\n\u00a0 \u00a0 parts = base_url.split('Reviews')\r\n\u00a0 \u00a0 return f\"{parts[0]}Reviews-or{offset}{parts[1]}\"<\/pre>\n<p><span style=\"font-weight: 400;\">2. Create <\/span><strong>check_last_page<\/strong><span style=\"font-weight: 400;\"> function to determine when to stop pagination:<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">def check_last_page(soup):\r\n\u00a0 \u00a0 try:\r\n\u00a0 \u00a0 \u00a0 \u00a0 pagination = soup.find('div', class_='pageNumbers')\r\n\u00a0 \u00a0 \u00a0 \u00a0 if pagination:\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 last_page = int(pagination.find_all('a')[-1].text)\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 return last_page\r\n\u00a0 \u00a0 \u00a0 \u00a0 return None\r\n\u00a0 \u00a0 except Exception as e:\r\n\u00a0 \u00a0 \u00a0 \u00a0 logging.error(f\"Error checking last page: {str(e)}\")\r\n\u00a0 \u00a0 \u00a0 \u00a0 return None<\/pre>\n<p><span style=\"font-weight: 400;\">3. 
Modify <\/span><strong>save_to_csv<\/strong><span style=\"font-weight: 400;\"> to support appending to the file:<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">def save_to_csv(restaurant_info, reviews, filename, mode='w'):\r\n\u00a0 \u00a0 with open(filename, mode=mode, newline='', encoding='utf-8') as file:\r\n\u00a0 \u00a0 \u00a0 \u00a0 writer = csv.writer(file)\r\n\u00a0 \u00a0 \u00a0 \u00a0 # ... header list defined as before ...\r\n\u00a0 \u00a0 \u00a0 \u00a0 # Write header only if it's a new file\r\n\u00a0 \u00a0 \u00a0 \u00a0 if mode == 'w':\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 writer.writerow(header)\r\n\u00a0 \u00a0 \u00a0 \u00a0 # ... write one row per review as before ...<\/pre>\n<h3 id=\"implementing-proxy-rotation\"><span style=\"font-weight: 400;\">Implementing Proxy Rotation<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">To implement proxy rotation, let\u2019s first generate the proxy list from our proxy dashboard.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Download or copy the proxies in the format &#8220;<strong>username:password@hostname:port<\/strong>&#8221;, one per line, and save them in a text file, for example <\/span><span style=\"font-weight: 400;\">proxies.txt.<\/span><\/p>\n<ol>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Create <\/span><strong>load_proxies<\/strong><span style=\"font-weight: 400;\"> function to read proxies from a file:<\/span><\/li>\n<\/ol>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">def load_proxies(file_path):\r\n\u00a0 \u00a0 try:\r\n\u00a0 \u00a0 \u00a0 \u00a0 with open(file_path, 'r') as file:\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 proxies = [line.strip() for line in file if line.strip()]\r\n\u00a0 \u00a0 \u00a0 \u00a0 return proxies\r\n\u00a0 \u00a0 except Exception as e:\r\n\u00a0 \u00a0 \u00a0 \u00a0 logging.error(f\"Error loading proxies from file: {str(e)}\")\r\n\u00a0 \u00a0 \u00a0 \u00a0 return []<\/pre>\n<p><span style=\"font-weight: 400;\">2. 
Create <\/span><strong>get_random_proxy<\/strong><span style=\"font-weight: 400;\"> function to randomly select a proxy:<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">def get_random_proxy(proxy_list):\r\n\u00a0 \u00a0 # Requires \"import random\" at the top of the script\r\n\u00a0 \u00a0 if not proxy_list:\r\n\u00a0 \u00a0 \u00a0 \u00a0 return None\r\n\u00a0 \u00a0 proxy = random.choice(proxy_list)\r\n\u00a0 \u00a0 # Both HTTP and HTTPS requests are sent through the same http:\/\/ proxy endpoint\r\n\u00a0 \u00a0 return {\r\n\u00a0 \u00a0 \u00a0 \u00a0 'http': f'http:\/\/{proxy}',\r\n\u00a0 \u00a0 \u00a0 \u00a0 'https': f'http:\/\/{proxy}'\r\n\u00a0 \u00a0 }<\/pre>\n<p><span style=\"font-weight: 400;\">3. Modify <\/span><strong>setup_request<\/strong><span style=\"font-weight: 400;\"> to use random proxies:<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">def setup_request(proxy_list):\r\n\u00a0 \u00a0 # ... headers setup ...\r\n\u00a0 \u00a0 proxies = get_random_proxy(proxy_list)\r\n\u00a0 \u00a0 return headers, proxies<\/pre>\n<p><span style=\"font-weight: 400;\">4. Create <\/span><strong>make_request<\/strong><span style=\"font-weight: 400;\"> function for handling retries with different proxies:<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">def make_request(url, proxy_list, max_retries=3):\r\n\u00a0 \u00a0 retries = 0\r\n\u00a0 \u00a0 while retries &lt; max_retries:\r\n\u00a0 \u00a0 \u00a0 \u00a0 try:\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 headers, proxies = setup_request(proxy_list)\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 if proxies:\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 print(f\"Using proxy: {proxies['http']}\")\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 response = requests.get(url, headers=headers, proxies=proxies, timeout=30)\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 response.raise_for_status()\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 return response\r\n\u00a0 \u00a0 \u00a0 \u00a0 except requests.RequestException as e:\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 retries += 1\r\n\u00a0 
\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 if retries == max_retries:\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 logging.error(f\"Failed after {max_retries} attempts: {str(e)}\")\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 return None\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 print(f\"Retry {retries} with a different proxy\")\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 time.sleep(5)\r\n\u00a0 \u00a0 return None<\/pre>\n<p><span style=\"font-weight: 400;\">5. <\/span><span style=\"font-weight: 400;\">Update the <\/span><strong>main<\/strong><span style=\"font-weight: 400;\"> function to use proxy rotation. The updated function:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Loads proxies from the file<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Uses the <\/span><strong>make_request<\/strong><span style=\"font-weight: 400;\"> function for each page<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Adds a random delay between requests<\/span><\/li>\n<\/ul>\n<h2 id=\"the-complete-code\"><span style=\"font-weight: 400;\">The complete code<\/span><\/h2>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">import requests\r\nfrom bs4 import BeautifulSoup\r\nimport logging\r\nimport csv\r\nimport random\r\nimport time\r\n\r\n\r\ndef load_proxies(file_path):\r\n\u00a0 \u00a0 try:\r\n\u00a0 \u00a0 \u00a0 \u00a0 with open(file_path, 'r') as file:\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 proxies = [line.strip() for line in file if line.strip()]\r\n\u00a0 \u00a0 \u00a0 \u00a0 return proxies\r\n\u00a0 \u00a0 except Exception as e:\r\n\u00a0 \u00a0 \u00a0 \u00a0 logging.error(f\"Error loading proxies from file: {str(e)}\")\r\n\u00a0 \u00a0 \u00a0 \u00a0 return []\r\n\r\n\r\ndef get_random_proxy(proxy_list):\r\n\u00a0 \u00a0 if not proxy_list:\r\n\u00a0 \u00a0 \u00a0 \u00a0 return None\r\n\u00a0 \u00a0 proxy = random.choice(proxy_list)\r\n\u00a0 \u00a0 return 
{\r\n\u00a0 \u00a0 \u00a0 \u00a0 'http': f'http:\/\/{proxy}',\r\n\u00a0 \u00a0 \u00a0 \u00a0 'https': f'http:\/\/{proxy}'\r\n\u00a0 \u00a0 }\r\n\r\n\r\ndef setup_request(proxy_list):\r\n\u00a0 \u00a0 headers = {\r\n\u00a0 \u00a0 \u00a0 \u00a0 \"User-Agent\":\"Mozilla\/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/119.0.0.0 Safari\/537.36\",\r\n\u00a0 \u00a0 \u00a0 \u00a0 \"Accept\": \"text\/html,application\/xhtml+xml,application\/xml;q=0.9,image\/webp,*\/*;q=0.8\",\r\n\u00a0 \u00a0 \u00a0 \u00a0 \"Accept-Language\": \"en-US,en;q=0.5\",\r\n\u00a0 \u00a0 \u00a0 \u00a0 \"Accept-Encoding\": \"gzip, deflate, br\",\r\n\u00a0 \u00a0 \u00a0 \u00a0 \"DNT\": \"1\",\r\n\u00a0 \u00a0 \u00a0 \u00a0 \"Connection\": \"keep-alive\",\r\n\u00a0 \u00a0 \u00a0 \u00a0 \"Upgrade-Insecure-Requests\": \"1\",\r\n\u00a0 \u00a0 }\r\n\u00a0 \u00a0 proxies = get_random_proxy(proxy_list)\r\n\u00a0 \u00a0 return headers, proxies\r\n\r\n\r\ndef make_request(url, proxy_list, max_retries=3):\r\n\u00a0 \u00a0 retries = 0\r\n\u00a0 \u00a0 while retries &lt; max_retries:\r\n\u00a0 \u00a0 \u00a0 \u00a0 try:\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 headers, proxies = setup_request(proxy_list)\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 if proxies:\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 print(f\"Using proxy: {proxies['http']}\")\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 response = requests.get(\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 url, headers=headers, proxies=proxies, timeout=30)\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 response.raise_for_status()\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 return response\r\n\u00a0 \u00a0 \u00a0 \u00a0 except requests.RequestException as e:\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 retries += 1\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 if retries == max_retries:\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 logging.error(f\"Failed after {max_retries} attempts: 
{str(e)}\")\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 return None\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 print(f\"Retry {retries} with a different proxy\")\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 time.sleep(5)\r\n\u00a0 \u00a0 return None\r\n\r\n\r\ndef get_restaurant_info(soup):\r\n\u00a0 \u00a0 restaurant_info = {\r\n\u00a0 \u00a0 \u00a0 \u00a0 'name': '',\r\n\u00a0 \u00a0 \u00a0 \u00a0 'price_level': '',\r\n\u00a0 \u00a0 \u00a0 \u00a0 'cuisine_type': '',\r\n\u00a0 \u00a0 \u00a0 \u00a0 'total_rating': '',\r\n\u00a0 \u00a0 \u00a0 \u00a0 'total_reviews': '',\r\n\u00a0 \u00a0 \u00a0 \u00a0 'food_rating': '',\r\n\u00a0 \u00a0 \u00a0 \u00a0 'service_rating': '',\r\n\u00a0 \u00a0 \u00a0 \u00a0 'value_rating': '',\r\n\u00a0 \u00a0 \u00a0 \u00a0 'atmosphere_rating': '',\r\n\u00a0 \u00a0 \u00a0 \u00a0 'ranking': '',\r\n\u00a0 \u00a0 \u00a0 \u00a0 'city': '',\r\n\u00a0 \u00a0 \u00a0 \u00a0 'address': '',\r\n\u00a0 \u00a0 \u00a0 \u00a0 'phone_no': ''\r\n\u00a0 \u00a0 }\r\n\r\n\u00a0 \u00a0 restaurant_info['name'] = soup.find('h1').text.strip()\r\n\r\n\u00a0 \u00a0 # General info processing\r\n\u00a0 \u00a0 general_infos = soup.find('span', class_='cPbcf').text.strip()\r\n\u00a0 \u00a0 info_parts = general_infos.split(', ')\r\n\u00a0 \u00a0 restaurant_info['price_level'] = info_parts[0]\r\n\u00a0 \u00a0 restaurant_info['cuisine_type'] = ', '.join(info_parts[1:])\r\n\r\n\u00a0 \u00a0 # Rating and review info\r\n\u00a0 \u00a0 detail_cards = soup.find_all(\r\n\u00a0 \u00a0 \u00a0 \u00a0 'div', attrs={'data-automation': 'OVERVIEW_TAB_ELEMENT'})\r\n\u00a0 \u00a0 if detail_cards:\r\n\u00a0 \u00a0 \u00a0 \u00a0 rating_info = detail_cards[0]\r\n\u00a0 \u00a0 \u00a0 \u00a0 restaurant_info['total_rating'] = rating_info.find(\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 'span', class_='biGQs').text.strip()\r\n\u00a0 \u00a0 \u00a0 \u00a0 reviews_text = rating_info.find('div', class_='jXaJR').text.strip()\r\n\u00a0 \u00a0 \u00a0 \u00a0 
restaurant_info['total_reviews'] = reviews_text.replace(' reviews', '')\r\n\r\n\u00a0 \u00a0 \u00a0 \u00a0 # Detailed ratings\r\n\u00a0 \u00a0 \u00a0 \u00a0 try:\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 rating_container = rating_info.find('div', class_='khxWm')\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 if rating_container:\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 rating_category = rating_container.find_all(\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 'div', class_='YwaWb')\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 if len(rating_category) &gt;= 4:\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 restaurant_info['food_rating'] = rating_category[0].find(\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 'svg', class_='UctUV').find('title').text.strip().replace(' of 5 bubbles', '')\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 restaurant_info['service_rating'] = rating_category[1].find(\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 'svg', class_='UctUV').find('title').text.strip().replace(' of 5 bubbles', '')\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 restaurant_info['value_rating'] = rating_category[2].find(\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 'svg', class_='UctUV').find('title').text.strip().replace(' of 5 bubbles', '')\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 restaurant_info['atmosphere_rating'] = rating_category[3].find(\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 'svg', class_='UctUV').find('title').text.strip().replace(' of 5 bubbles', '')\r\n\u00a0 \u00a0 \u00a0 \u00a0 except Exception as e:\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 logging.error(f\"Error extracting detailed ratings: {str(e)}\")\r\n\r\n\u00a0 
\u00a0 \u00a0 \u00a0 # Ranking and city info\r\n\u00a0 \u00a0 \u00a0 \u00a0 ranking_tag = rating_info.find_all('a', class_='BMQDV')\r\n\u00a0 \u00a0 \u00a0 \u00a0 if len(ranking_tag) &gt; 1:\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 ranking_text = ranking_tag[1].find('span').text.strip()\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 restaurant_info['ranking'] = ranking_text.split()[\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 0].replace('#', '')\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 in_index = ranking_text.split().index('in')\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 restaurant_info['city'] = ' '.join(\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 ranking_text.split()[in_index + 1:])\r\n\r\n\u00a0 \u00a0 # Address and phone info\r\n\u00a0 \u00a0 if len(detail_cards) &gt; 2:\r\n\u00a0 \u00a0 \u00a0 \u00a0 location_info = detail_cards[2]\r\n\u00a0 \u00a0 \u00a0 \u00a0 restaurant_info['address'] = location_info.find(\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 'span', class_='biGQs').text.strip()\r\n\r\n\u00a0 \u00a0 \u00a0 \u00a0 # Phone number\r\n\u00a0 \u00a0 \u00a0 \u00a0 try:\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 phone_link = location_info.find('a', attrs={'aria-label': 'Call'})\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 if phone_link:\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 restaurant_info['phone_no'] = phone_link.get(\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 'href').replace('tel:', '')\r\n\u00a0 \u00a0 \u00a0 \u00a0 except Exception as e:\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 logging.error(f\"Error extracting phone number: {str(e)}\")\r\n\r\n\u00a0 \u00a0 return restaurant_info\r\n\r\n\r\ndef scrape_reviews(soup):\r\n\u00a0 \u00a0 reviews = []\r\n\u00a0 \u00a0 review_cards = soup.find_all(\r\n\u00a0 \u00a0 \u00a0 \u00a0 'div', attrs={'data-automation': 'reviewCard'})\r\n\r\n\u00a0 \u00a0 for review in review_cards:\r\n\u00a0 \u00a0 \u00a0 \u00a0 
review_data = {\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 'rating': '',\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 'title': '',\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 'text': '',\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 'date': ''\r\n\u00a0 \u00a0 \u00a0 \u00a0 }\r\n\r\n\u00a0 \u00a0 \u00a0 \u00a0 rating_element = review.find('svg', class_='UctUV')\r\n\u00a0 \u00a0 \u00a0 \u00a0 if rating_element:\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 review_data['rating'] = rating_element.find(\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 'title').text.strip().replace(' of 5 bubbles', '')\r\n\r\n\u00a0 \u00a0 \u00a0 \u00a0 title_element = review.find(\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 'div', attrs={'data-test-target': 'review-title'})\r\n\u00a0 \u00a0 \u00a0 \u00a0 if title_element:\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 review_data['title'] = title_element.text.strip()\r\n\r\n\u00a0 \u00a0 \u00a0 \u00a0 text_element = review.find(\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 'div', attrs={'data-test-target': 'review-body'})\r\n\u00a0 \u00a0 \u00a0 \u00a0 if text_element:\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 review_data['text'] = text_element.text.strip()\r\n\r\n\u00a0 \u00a0 \u00a0 \u00a0 date_element = review.find('div', class_='neAPm')\r\n\u00a0 \u00a0 \u00a0 \u00a0 if date_element:\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 child_divs = date_element.find_all('div')\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 if child_divs:\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 review_data['date'] = child_divs[0].text.strip().replace(\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 'Written ', '')\r\n\r\n\u00a0 \u00a0 \u00a0 \u00a0 reviews.append(review_data)\r\n\u00a0 \u00a0 \u00a0 \u00a0 time.sleep(3)\r\n\r\n\u00a0 \u00a0 return reviews\r\n\r\n\r\ndef generate_url(base_url, page_number):\r\n\u00a0 \u00a0 if page_number == 1:\r\n\u00a0 \u00a0 \u00a0 \u00a0 return base_url\r\n\u00a0 
\u00a0 offset = (page_number - 1) * 15\r\n\u00a0 \u00a0 parts = base_url.split('Reviews')\r\n\u00a0 \u00a0 return f\"{parts[0]}Reviews-or{offset}{parts[1]}\"\r\n\r\n\r\ndef check_last_page(soup):\r\n\u00a0 \u00a0 try:\r\n\u00a0 \u00a0 \u00a0 \u00a0 pagination = soup.find('div', class_='pageNumbers')\r\n\u00a0 \u00a0 \u00a0 \u00a0 if pagination:\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 last_page = int(pagination.find_all('a')[-1].text)\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 return last_page\r\n\u00a0 \u00a0 \u00a0 \u00a0 return None\r\n\u00a0 \u00a0 except Exception as e:\r\n\u00a0 \u00a0 \u00a0 \u00a0 logging.error(f\"Error checking last page: {str(e)}\")\r\n\u00a0 \u00a0 \u00a0 \u00a0 return None\r\n\r\n\r\ndef save_to_csv(restaurant_info, reviews, filename):\r\n\u00a0 \u00a0 with open(filename, mode='w', newline='', encoding='utf-8') as file:\r\n\u00a0 \u00a0 \u00a0 \u00a0 writer = csv.writer(file)\r\n\r\n\u00a0 \u00a0 \u00a0 \u00a0 # Write header\r\n\u00a0 \u00a0 \u00a0 \u00a0 header = ['RESTAURANT_NAME', 'PRICE_LEVEL', 'CUISINE_TYPE', 'TOTAL_RATING',\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 'TOTAL_REVIEWS', 'FOOD_RATING', 'SERVICE_RATING', 'VALUE_RATING',\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 'ATMOSPHERE_RATING', 'RANKING', 'CITY', 'ADDRESS', 'PHONE_NO',\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 'RATING', 'REVIEW_TITLE', 'REVIEW_DETAILS', 'REVIEW_DATE']\r\n\u00a0 \u00a0 \u00a0 \u00a0 writer.writerow(header)\r\n\r\n\u00a0 \u00a0 \u00a0 \u00a0 # Write reviews with restaurant info\r\n\u00a0 \u00a0 \u00a0 \u00a0 for review in reviews:\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 row = [\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 restaurant_info['name'],\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 restaurant_info['price_level'],\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 restaurant_info['cuisine_type'],\r\n\u00a0 \u00a0 \u00a0 \u00a0 
\u00a0 \u00a0 \u00a0 \u00a0 restaurant_info['total_rating'],\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 restaurant_info['total_reviews'],\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 restaurant_info['food_rating'],\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 restaurant_info['service_rating'],\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 restaurant_info['value_rating'],\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 restaurant_info['atmosphere_rating'],\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 restaurant_info['ranking'],\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 restaurant_info['city'],\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 restaurant_info['address'],\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 restaurant_info['phone_no'],\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 review['rating'],\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 review['title'],\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 review['text'],\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 review['date']\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 ]\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 writer.writerow(row)\r\n\r\n\r\ndef main():\r\n\u00a0 \u00a0 base_url = 'https:\/\/www.tripadvisor.com\/Restaurant_Review-g60763-d478965-Reviews-Gallaghers_Steakhouse-New_York_City_New_York.html'\r\n\u00a0 \u00a0 output_filename = 'tripadvisor_ny_restaurant_reviews.csv'\r\n\u00a0 \u00a0 proxy_file = 'proxies.txt'\u00a0 # Make sure this matches your actual proxy file name\r\n\r\n\u00a0 \u00a0 # Load proxies\r\n\u00a0 \u00a0 proxy_list = load_proxies(proxy_file)\r\n\u00a0 \u00a0 if not proxy_list:\r\n\u00a0 \u00a0 \u00a0 \u00a0 print(\"No proxies loaded. 
Exiting.\")\r\n\u00a0 \u00a0 \u00a0 \u00a0 return\r\n\r\n\u00a0 \u00a0 print(f\"Loaded {len(proxy_list)} proxies\")\r\n\r\n\u00a0 \u00a0 restaurant_info = None\r\n\u00a0 \u00a0 current_page = 1\r\n\r\n\u00a0 \u00a0 try:\r\n\u00a0 \u00a0 \u00a0 \u00a0 while True:\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 url = generate_url(base_url, current_page)\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 print(f\"Scraping page {current_page}...\")\r\n\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 response = make_request(url, proxy_list)\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 if not response:\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 print(f\"Failed to fetch page {current_page}. Stopping.\")\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 break\r\n\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 soup = BeautifulSoup(response.content, 'html.parser')\r\n\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 # Get restaurant info only once\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 if restaurant_info is None:\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 restaurant_info = get_restaurant_info(soup)\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 last_page = check_last_page(soup)\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 print(f\"Total pages to scrape: {last_page}\")\r\n\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 reviews = scrape_reviews(soup)\r\n\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 # Save to CSV (append mode for all pages after the first)\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 save_mode = 'w' if current_page == 1 else 'a'\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 save_to_csv(restaurant_info, reviews,\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 output_filename, mode=save_mode)\r\n\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 print(f\"Page {current_page} completed.\")\r\n\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 if last_page and current_page &gt;= 
last_page:\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 print(\"Reached the last page. Scraping completed.\")\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 break\r\n\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 current_page += 1\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 time.sleep(random.uniform(8, 12))\u00a0 # Random delay between pages\r\n\r\n\u00a0 \u00a0 \u00a0 \u00a0 print(f\"All information saved successfully to {output_filename}\")\r\n\r\n\u00a0 \u00a0 except Exception as e:\r\n\u00a0 \u00a0 \u00a0 \u00a0 logging.error(f\"An error occurred: {str(e)}\")\r\n\r\n\r\nif __name__ == \"__main__\":\r\n\u00a0 \u00a0 main()<\/pre>\n<p><span style=\"font-weight: 400;\">The screenshot below shows the terminal output. For this tutorial, I limited the run to the first 10 pages of reviews.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1182\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/46_run.png\" alt=\"\" width=\"887\" height=\"570\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/46_run.png 887w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/46_run-300x193.png 300w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/46_run-768x494.png 768w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/10\/46_run-624x401.png 624w\" sizes=\"auto, (max-width: 887px) 100vw, 887px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">You can visit the <\/span><a href=\"https:\/\/github.com\/ainacodes\/tripadvisor_scraper\" rel=\"nofollow noopener\" target=\"_blank\"><span style=\"font-weight: 400;\">source code<\/span><\/a><span style=\"font-weight: 400;\"> to see an example result.<\/span><\/p>\n<p><em><span style=\"font-weight: 400;\">Video Tutorial: <\/span><a href=\"https:\/\/youtu.be\/kFs9YkxGvZE?si=oIsMSJxdhVO6Nf1V\" rel=\"nofollow noopener\" target=\"_blank\"><span 
style=\"font-weight: 400;\">Extract Restaurant Details, Customer Reviews and Ratings from TripAdvisor using Python<\/span><\/a><\/em><\/p>\n<h2 id=\"conclusion\"><span style=\"font-weight: 400;\">Conclusion<\/span><\/h2>\n<p><span style=\"font-weight: 400;\">We&#8217;ve successfully learned how to scrape TripAdvisor reviews using Python, extracting key details like ratings, review dates, and more. Here&#8217;s a quick recap of what we&#8217;ve achieved:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><b>Scraped restaurant details<\/b><span style=\"font-weight: 400;\">: Name, address, cuisine, and contact info.<\/span><\/li>\n<li style=\"font-weight: 400;\"><b>Extracted review data<\/b><span style=\"font-weight: 400;\">: Customer feedback, ratings, and review titles.<\/span><\/li>\n<li style=\"font-weight: 400;\"><b>Handled multiple review pages<\/b><span style=\"font-weight: 400;\">: Using pagination to get more data.<\/span><\/li>\n<li style=\"font-weight: 400;\"><b>Stored data<\/b><span style=\"font-weight: 400;\">: In a structured format for analysis.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">As you move forward, remember:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\"><b>Ethics matter<\/b><span style=\"font-weight: 400;\">: Scrape responsibly and respect platform rules.<\/span><\/li>\n<li style=\"font-weight: 400;\"><b>Expand your skills<\/b><span style=\"font-weight: 400;\">: Explore more data points or analyze the sentiment in reviews.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">With these skills, you\u2019re ready to gain valuable insights from TripAdvisor and beyond. 
Happy scraping!<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Source code: tripadvisor_scraper Video tutorial:\u00a0https:\/\/youtu.be\/kFs9YkxGvZE?si=1ZmcXPBpnUWgjNT7 Table of Content Introduction Ethical consideration Prerequisites Project setup Inspecting the Elements Complete Code First Page The result in CSV&hellip;<\/p>\n","protected":false},"author":25,"featured_media":1184,"comment_status":"open","ping_status":"closed","template":"","meta":{"rank_math_lock_modified_date":false},"categories":[],"class_list":["post-1111","scraping_project","type-scraping_project","status-publish","has-post-thumbnail","hentry"],"_links":{"self":[{"href":"https:\/\/rayobyte.com\/community\/wp-json\/wp\/v2\/scraping_project\/1111","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rayobyte.com\/community\/wp-json\/wp\/v2\/scraping_project"}],"about":[{"href":"https:\/\/rayobyte.com\/community\/wp-json\/wp\/v2\/types\/scraping_project"}],"author":[{"embeddable":true,"href":"https:\/\/rayobyte.com\/community\/wp-json\/wp\/v2\/users\/25"}],"replies":[{"embeddable":true,"href":"https:\/\/rayobyte.com\/community\/wp-json\/wp\/v2\/comments?post=1111"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/rayobyte.com\/community\/wp-json\/wp\/v2\/media\/1184"}],"wp:attachment":[{"href":"https:\/\/rayobyte.com\/community\/wp-json\/wp\/v2\/media?parent=1111"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rayobyte.com\/community\/wp-json\/wp\/v2\/categories?post=1111"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}