{"id":2233,"date":"2025-12-31T12:20:21","date_gmt":"2025-12-31T12:20:21","guid":{"rendered":"https:\/\/rayobyte.com\/community\/?post_type=scraping_project&#038;p=2233"},"modified":"2025-12-31T12:21:17","modified_gmt":"2025-12-31T12:21:17","slug":"create-a-flight-price-tracker-scraping-airlines-ticket-prices-from-google-flights-using-python","status":"publish","type":"scraping_project","link":"https:\/\/rayobyte.com\/community\/scraping-project\/create-a-flight-price-tracker-scraping-airlines-ticket-prices-from-google-flights-using-python\/","title":{"rendered":"Create a Flight Price Tracker: Scraping Airlines Ticket Prices from Google Flights using Python"},"content":{"rendered":"<h1><span style=\"font-weight: 400\">Create a Flight Price Tracker: Scraping Airlines Ticket Prices from Google Flights using Python<\/span><\/h1>\n<p><span style=\"font-weight: 400\">Source code: <\/span><a href=\"https:\/\/github.com\/ainacodes\/google_flight_scraper\" rel=\"nofollow noopener\" target=\"_blank\"><span style=\"font-weight: 400\">google_flight_scraper<\/span><\/a><span style=\"font-weight: 400\">\u00a0<\/span><\/p>\n<h2><span style=\"font-weight: 400\">Table of Contents<\/span><\/h2>\n<p><a href=\"#introduction\">Introduction<\/a><br \/><a href=\"#ethical-consideration\">Ethical Consideration<\/a><br \/><a href=\"#scape-data\">Data that we want to scrape<\/a><br \/><a href=\"#prerequisite\">Prerequisites<\/a><br \/><a href=\"#project-setup\">Project Setup<\/a><br \/><a href=\"#palywright\">Why Playwright?<\/a><br \/><a href=\"#browser-automation\">Setting up Browser Automation<\/a><br \/><a href=\"#google-flight-url\">Understanding Google Flights URL Structure<\/a><br \/><a href=\"#flight-data\">Scrape the flight data<\/a><br \/><a href=\"#csv-result\">Saving to CSV<\/a><br \/><a href=\"#complete-code\">The complete code<\/a><br \/><a href=\"#proxy-setup\">Setting Up Proxy Rotation<\/a><br \/><a href=\"#conclusion\">Conclusion<\/a><br \/><a 
href=\"#disclaimer\">Disclaimer<\/a><\/p>\n<h2 id=\"introduction\"><span style=\"font-weight: 400\">Introduction<\/span><\/h2>\n<p><span style=\"font-weight: 400\">Google Flights aggregates data from various airlines and travel companies, providing travelers with comprehensive information about available flights, pricing, and schedules. This allows travelers to compare airline prices, assess flight durations, and monitor environmental impacts, ultimately helping them secure the best travel deals.<\/span><\/p>\n<p><span style=\"font-weight: 400\">In this tutorial, I will guide you through the process of scraping essential flight data from Google Flights using Python and Playwright. You will learn how to extract valuable information such as departure and arrival times, flight durations, prices, and more\u2014all while ensuring that your scraping methods are effective and efficient.<\/span><\/p>\n<p><span style=\"font-weight: 400\">This information is not only valuable for individual travelers but also for businesses. Companies can leverage flight data to conduct competitor analysis, understand customer preferences, and make informed decisions about pricing and marketing strategies. 
By scraping data from Google Flights, businesses can gain insights into market trends and optimize their offerings to better meet the needs of their customers.<\/span><\/p>\n<h2 id=\"ethical-consideration\"><span style=\"font-weight: 400\">Ethical Consideration<\/span><\/h2>\n<p><span style=\"font-weight: 400\">Web scraping involves legal and ethical responsibilities:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Respect website terms of service<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Avoid overwhelming server resources<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Use scraping for research and personal purposes<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Implement rate limiting and proxy rotation<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Ensure data is not used for commercial exploitation without permission<\/span><\/li>\n<\/ul>\n<h2 id=\"scape-data\"><span style=\"font-weight: 400\">Data that we want to scrape<\/span><\/h2>\n<p><span style=\"font-weight: 400\">We will collect this comprehensive flight information:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Departure times<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Arrival times<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Airline name<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Flight duration<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Number of stops<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Price<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">CO2 emissions<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Emissions comparison with typical 
flights.<\/span><\/li>\n<\/ul>\n<h2 id=\"prerequisite\"><span style=\"font-weight: 400\">Prerequisites<\/span><\/h2>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Python 3.7+<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Basic Python knowledge<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Required packages: <code>playwright<\/code> and <code>pandas<\/code> (<code>asyncio<\/code>, <code>csv<\/code> and <code>base64<\/code> ship with Python and need no installation)<\/span><\/li>\n<\/ul>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">pip install playwright pandas\nplaywright install  # Install browser binaries<\/pre>\n<h2 id=\"palywright\"><span style=\"font-weight: 400\">Why Playwright?<\/span><\/h2>\n<p><span style=\"font-weight: 400\">Playwright is a modern automation framework that makes browser automation straightforward. It supports multiple browsers and offers robust features for handling dynamic websites, including waiting for elements to load and intercepting network requests.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Its asynchronous API lets a single script drive several pages or tasks concurrently, which suits web scraping where speed and performance matter.<\/span><\/p>\n<h2 id=\"browser-automation\"><span style=\"font-weight: 400\">Setting up Browser Automation<\/span><\/h2>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">from playwright.async_api import async_playwright\n\n\nasync def setup_browser():\n    p = await async_playwright().start()\n    browser = await p.chromium.launch(headless=False)  # Set to True in production\n    page = await browser.new_page()\n    return p, browser, page<\/pre>\n<p><span style=\"font-weight: 400\">This function starts Playwright, launches Chromium and opens a new page, returning all three so the caller can close them once scraping is done. 
The <code>headless<\/code> parameter can be toggled for visibility during development.<\/span><\/p>\n<h2 id=\"google-flight-url\"><span style=\"font-weight: 400\">Understanding Google Flights URL Structure<\/span><\/h2>\n<p><span style=\"font-weight: 400\">One of the trickiest parts of scraping Google Flights is constructing the correct URLs. Google Flights packs the search parameters into a binary payload and places its URL-safe base64 encoding in the <code>tfs<\/code> query parameter. Base64 only makes the bytes safe to put in a URL; it is not encryption, so the payload can be decoded and inspected.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-2259 size-full\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/01_browser_url.png\" alt=\"browser url\" width=\"1653\" height=\"136\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/01_browser_url.png 1653w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/01_browser_url-300x25.png 300w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/01_browser_url-1024x84.png 1024w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/01_browser_url-768x63.png 768w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/01_browser_url-1536x126.png 1536w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/01_browser_url-624x51.png 624w\" sizes=\"auto, (max-width: 1653px) 100vw, 1653px\" \/><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-2261 size-full\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/02_search_browser.png\" alt=\"Search browser\" width=\"1756\" height=\"490\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/02_search_browser.png 1756w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/02_search_browser-300x84.png 300w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/02_search_browser-1024x286.png 1024w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/02_search_browser-768x214.png 768w, 
https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/02_search_browser-1536x429.png 1536w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/02_search_browser-624x174.png 624w\" sizes=\"auto, (max-width: 1756px) 100vw, 1756px\" \/><\/p>\n<p><span style=\"font-weight: 400\">After clicking the \u201cSearch\u201d button, a string like this appears in the URL: \u201cCBwQAhoeEgoyMDI0LTEyLTI1agcIARIDU0ZPcgcIARIDTEFYQAFIAXABggELCP___________wGYAQI\u201d<\/span><\/p>\n<p><span style=\"font-weight: 400\">Let\u2019s decode this string using <code>base64<\/code>:<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">import base64\n\nencoded = \"CBwQAhoeEgoyMDI0LTEyLTI1agcIARIDU0ZPcgcIARIDTEFYQAFIAXABggELCP___________wGYAQI\"\n# The URL drops the trailing \"=\" padding, so add it back before decoding\ndecoded = base64.urlsafe_b64decode(encoded + \"==\")\nprint(decoded)<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-2238\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/03_decode_url_result.png\" alt=\"decode url result\" width=\"963\" height=\"121\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/03_decode_url_result.png 963w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/03_decode_url_result-300x38.png 300w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/03_decode_url_result-768x96.png 768w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/03_decode_url_result-624x78.png 624w\" sizes=\"auto, (max-width: 963px) 100vw, 963px\" \/><\/p>\n<p><span style=\"font-weight: 400\">The decoded bytes contain the date (2024-12-25) and both airport codes in plain text, confirming a flight on 25 December 2024 from San Francisco (SFO) to Los Angeles (LAX).<\/span><\/p>\n<p><span style=\"font-weight: 400\">To use this in our scraper, we need to reverse the process and build the encoded string ourselves.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Let&#8217;s break down 
how to handle this by creating a <code>FlightURLBuilder<\/code> class.<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">class FlightURLBuilder:<\/pre>\n<h3>Creating Binary Data<\/h3>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">@staticmethod\ndef _create_one_way_bytes(departure: str, destination: str, date: str) -&gt; bytes:\n    # The \\x.. escapes are raw bytes copied from the decoded browser URL\n    return (\n        b'\\x08\\x1c\\x10\\x02\\x1a\\x1e\\x12\\n' + date.encode() +\n        b'j\\x07\\x08\\x01\\x12\\x03' + departure.encode() +\n        b'r\\x07\\x08\\x01\\x12\\x03' + destination.encode() +\n        b'@\\x01H\\x01p\\x01\\x82\\x01\\x0b\\x08\\xfc\\x06`\\x04\\x08'\n    )\n<\/pre>\n<p>This code generates a <code>bytes<\/code> object that encodes the flight details (departure, destination, and date). The <code>\\n<\/code> (byte value 10) before the date and the <code>\\x03<\/code> before each airport code are length prefixes, so the date must be a 10-character <code>YYYY-MM-DD<\/code> string and each airport a 3-letter code.<\/p>\n<h3>Modifying a Base64 String<\/h3>\n<p>Base64-encoding these bytes for SFO to LAX on 2024-12-25 gives:<\/p>\n<p>&#8220;CBwQAhoeEgoyMDI0LTEyLTI1agcIARIDU0ZPcgcIARIDTEFYQAFIAXABggELCPwGYAQI&#8221;<\/p>\n<p>Compared with the string from the browser URL, this output lacks the run of underscores before its last 6 characters, so the helper below inserts 7 underscores at that position.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">@staticmethod\ndef _modify_base64(encoded_str: str) -&gt; str:\n    insert_index = len(encoded_str) - 6\n    return encoded_str[:insert_index] + '_' * 7 + encoded_str[insert_index:]\n<\/pre>\n<h3>Building the Full URL<\/h3>\n<p>Lastly, generate the complete Google Flights URL by prepending &#8220;https:\/\/www.google.com\/travel\/flights\/search?tfs=&#8221; to the modified string.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">@classmethod\ndef build_url(\n    cls,\n    departure: str,\n    destination: str,\n    departure_date: str\n) -&gt; str:\n    flight_bytes = cls._create_one_way_bytes(departure, destination, departure_date)\n    base64_str = base64.b64encode(flight_bytes).decode('utf-8')\n    modified_str = cls._modify_base64(base64_str)\n    return f'https:\/\/www.google.com\/travel\/flights\/search?tfs={modified_str}'\n<\/pre>\n<h2 id=\"flight-data\"><span 
style=\"font-weight: 400\">Scrape the flight data<\/span><\/h2>\n<p><span style=\"font-weight: 400\">We extract the text of each element using a CSS selector and, where applicable, an <code>aria-label<\/code>.<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">from typing import Optional\n\n\nasync def extract_flight_element_text(flight, selector: str, aria_label: Optional[str] = None) -&gt; str:\n    if aria_label:\n        # Match elements whose aria-label contains the given text\n        element = await flight.query_selector(f'{selector}[aria-label*=\"{aria_label}\"]')\n    else:\n        element = await flight.query_selector(selector)\n    return await element.inner_text() if element else \"N\/A\"<\/pre>\n<p><span style=\"font-weight: 400\">The <code>extract_flight_element_text<\/code><\/span><span style=\"font-weight: 400\">\u00a0function is an asynchronous utility that extracts text from elements on a web page. Here&#8217;s how it works:<\/span><\/p>\n<p><span style=\"font-weight: 400\"><strong>Parameters<\/strong>:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\"><code>flight<\/code>: The web element to search within.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\"><code>selector<\/code>: A string representing the CSS selector to locate the element.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\"><code>aria_label<\/code> (optional): An accessibility label to refine the search within the selected elements.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\"><strong>Logic<\/strong>:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">If an <code>aria_label<\/code> is provided, the function appends an attribute condition to the selector so that it matches only elements whose <code>aria-label<\/code> contains the specified text.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">It queries the element using the combined 
selector.<\/span><\/li>\n<\/ul>\n<p><strong>Return Value:<\/strong><\/p>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">If the element is found, the function returns its inner text.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">If no element matches the query, it returns &#8220;N\/A&#8221; as a fallback.<\/span><\/li>\n<\/ul>\n<h3><span style=\"font-weight: 400\">Departure time<\/span><\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-2240\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/04_departure_time_element.png\" alt=\"Departure time element\" width=\"851\" height=\"229\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/04_departure_time_element.png 851w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/04_departure_time_element-300x81.png 300w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/04_departure_time_element-768x207.png 768w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/04_departure_time_element-624x168.png 624w\" sizes=\"auto, (max-width: 851px) 100vw, 851px\" \/><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">departure_time = await extract_flight_element_text(flight, 'span', \"Departure time\")<\/pre>\n<h3><span style=\"font-weight: 400\">Arrival Time<\/span><\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-2242\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/05_arrival_time_element.png\" alt=\"Arrival time element\" width=\"774\" height=\"153\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/05_arrival_time_element.png 774w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/05_arrival_time_element-300x59.png 300w, 
https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/05_arrival_time_element-768x152.png 768w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/05_arrival_time_element-624x123.png 624w\" sizes=\"auto, (max-width: 774px) 100vw, 774px\" \/><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">arrival_time = await extract_flight_element_text(flight, 'span', \"Arrival time\")<\/pre>\n<h3><span style=\"font-weight: 400\">Airline<\/span><\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-2244\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/06_airline_element.png\" alt=\"Airline element\" width=\"767\" height=\"149\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/06_airline_element.png 767w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/06_airline_element-300x58.png 300w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/06_airline_element-624x121.png 624w\" sizes=\"auto, (max-width: 767px) 100vw, 767px\" \/><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">airline = await extract_flight_element_text(flight, \".sSHqwe\")<\/pre>\n<h3><span style=\"font-weight: 400\">Flight Duration<\/span><\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-2246\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/07_flight_duration_element.png\" alt=\"Flight duration element\" width=\"723\" height=\"153\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/07_flight_duration_element.png 723w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/07_flight_duration_element-300x63.png 300w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/07_flight_duration_element-624x132.png 624w\" sizes=\"auto, (max-width: 723px) 100vw, 723px\" \/><\/p>\n<pre 
class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">duration = await extract_flight_element_text(flight, \"div.gvkrdb\")<\/pre>\n<h3><span style=\"font-weight: 400\">Stops<\/span><\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-2248\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/08_stops_element.png\" alt=\"Stop element\" width=\"773\" height=\"224\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/08_stops_element.png 773w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/08_stops_element-300x87.png 300w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/08_stops_element-768x223.png 768w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/08_stops_element-624x181.png 624w\" sizes=\"auto, (max-width: 773px) 100vw, 773px\" \/><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">stops = await extract_flight_element_text(flight, \"div.EfT7Ae span.ogfYpf\")<\/pre>\n<h3><span style=\"font-weight: 400\">Price<\/span><\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-2250\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/09_price_element.png\" alt=\"Price element\" width=\"737\" height=\"259\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/09_price_element.png 737w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/09_price_element-300x105.png 300w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/09_price_element-624x219.png 624w\" sizes=\"auto, (max-width: 737px) 100vw, 737px\" \/><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">price = await extract_flight_element_text(flight, \"div.FpEdX span\")<\/pre>\n<h3><span style=\"font-weight: 400\">CO2 emission<\/span><\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" 
class=\"alignnone size-full wp-image-2252\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/10_co2_element.png\" alt=\"co2 emission element\" width=\"655\" height=\"291\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/10_co2_element.png 655w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/10_co2_element-300x133.png 300w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/10_co2_element-624x277.png 624w\" sizes=\"auto, (max-width: 655px) 100vw, 655px\" \/><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">co2_emissions = await extract_flight_element_text(flight, \"div.O7CXue\")<\/pre>\n<h3><span style=\"font-weight: 400\">Emission Variation<\/span><\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-2254\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/11_emission_variation_element.png\" alt=\"Emission variation element\" width=\"726\" height=\"156\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/11_emission_variation_element.png 726w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/11_emission_variation_element-300x64.png 300w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/11_emission_variation_element-624x134.png 624w\" sizes=\"auto, (max-width: 726px) 100vw, 726px\" \/><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">emissions_variation = await extract_flight_element_text(flight, \"div.N6PNV\")<\/pre>\n<h2 id=\"csv-result\"><span style=\"font-weight: 400\">Saving to CSV<\/span><\/h2>\n<p><span style=\"font-weight: 400\">The scraped text can contain stray characters (for example \u00c2 or narrow no-break spaces), so after writing the CSV file we read it back and strip them.<\/span><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">import pandas as pd\n\n\ndef clean_csv(filename: str):\n    data = 
pd.read_csv(filename, encoding=\"utf-8\")\n\n    def clean_text(value):\n        # Strip mis-decoded characters and normalise narrow no-break spaces\n        if isinstance(value, str):\n            return value.replace('\u00c2', '').replace('\u202f', ' ').replace('\u00c3', '').replace('\u00b6', '').strip()\n        return value\n\n    cleaned_data = data.applymap(clean_text)\n    cleaned_file_path = f\"{filename}\"  # overwrite the original file\n    cleaned_data.to_csv(cleaned_file_path, index=False)\n    print(f\"Cleaned CSV saved to: {cleaned_file_path}\")<\/pre>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">import csv\nfrom typing import Dict, List\n\n\ndef save_to_csv(data: List[Dict[str, str]], filename: str = \"flight_data.csv\") -&gt; None:\n    if not data:\n        return\n    headers = list(data[0].keys())\n    with open(filename, 'w', newline='', encoding='utf-8') as csvfile:\n        writer = csv.DictWriter(csvfile, fieldnames=headers)\n        writer.writeheader()\n        writer.writerows(data)\n\n    # Clean the saved CSV\n    clean_csv(filename)\n<\/pre>\n<p><span style=\"font-weight: 400\">Here\u2019s the result for flights from San Francisco (SFO) to Los Angeles (LAX) on 25 December 2024:<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-large wp-image-2256\" src=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/12_csv_result-1024x405.png\" alt=\"CSV result\" width=\"640\" height=\"253\" title=\"\" srcset=\"https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/12_csv_result-1024x405.png 1024w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/12_csv_result-300x119.png 300w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/12_csv_result-768x303.png 768w, 
https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/12_csv_result-624x247.png 624w, https:\/\/rayobyte.com\/community\/wp-content\/uploads\/2024\/11\/12_csv_result.png 1493w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" \/><\/p>\n<h2 id=\"complete-code\"><span style=\"font-weight: 400\">The complete code<\/span><\/h2>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">import asyncio\nimport csv\nimport base64\nfrom playwright.async_api import async_playwright\nfrom typing import List, Dict, Optional\nimport pandas as pd\n\n\nclass FlightURLBuilder:\n    \"\"\"Class to handle flight URL creation with base64 encoding.\"\"\"\n\n    @staticmethod\n    def _create_one_way_bytes(departure: str, destination: str, date: str) -&gt; bytes:\n        \"\"\"Create bytes for one-way flight.\"\"\"\n        # The \\x.. escapes are raw bytes copied from the decoded browser URL\n        return (\n            b'\\x08\\x1c\\x10\\x02\\x1a\\x1e\\x12\\n' + date.encode() +\n            b'j\\x07\\x08\\x01\\x12\\x03' + departure.encode() +\n            b'r\\x07\\x08\\x01\\x12\\x03' + destination.encode() +\n            b'@\\x01H\\x01p\\x01\\x82\\x01\\x0b\\x08\\xfc\\x06`\\x04\\x08'\n        )\n\n    @staticmethod\n    def _modify_base64(encoded_str: str) -&gt; str:\n        \"\"\"Add underscores at the specific position in base64 string.\"\"\"\n        insert_index = len(encoded_str) - 6\n        return encoded_str[:insert_index] + '_' * 7 + encoded_str[insert_index:]\n\n    @classmethod\n    def build_url(\n        cls,\n        departure: str,\n        destination: str,\n        departure_date: str\n    ) -&gt; str:\n        \"\"\"Build the one-way Google Flights search URL.\"\"\"\n        flight_bytes = cls._create_one_way_bytes(departure, destination, departure_date)\n        base64_str = base64.b64encode(flight_bytes).decode('utf-8')\n        modified_str = cls._modify_base64(base64_str)\n        return f'https:\/\/www.google.com\/travel\/flights\/search?tfs={modified_str}'\n\n\nasync def setup_browser():\n    p = await async_playwright().start()\n    browser = await p.chromium.launch(headless=False)\n    page = await 
browser.new_page()\n    return p, browser, page\n\n\nasync def extract_flight_element_text(flight, selector: str, aria_label: Optional[str] = None) -&gt; str:\n    \"\"\"Extract text from a flight element using selector and optional aria-label.\"\"\"\n    if aria_label:\n        element = await flight.query_selector(f'{selector}[aria-label*=\"{aria_label}\"]')\n    else:\n        element = await flight.query_selector(selector)\n    return await element.inner_text() if element else \"N\/A\"\n\n\nasync def scrape_flight_info(flight) -&gt; Dict[str, str]:\n    \"\"\"Extract all relevant information from a single flight element.\"\"\"\n    departure_time = await extract_flight_element_text(flight, 'span', \"Departure time\")\n    arrival_time =  await extract_flight_element_text(flight, 'span', \"Arrival time\")\n    airline = await extract_flight_element_text(flight, \".sSHqwe\")\n    duration = await extract_flight_element_text(flight, \"div.gvkrdb\")\n    stops =  await extract_flight_element_text(flight, \"div.EfT7Ae span.ogfYpf\")\n    price =  await extract_flight_element_text(flight, \"div.FpEdX span\")\n    co2_emissions =  await extract_flight_element_text(flight, \"div.O7CXue\")\n    emissions_variation =  await extract_flight_element_text(flight, \"div.N6PNV\")\n    return {\n        \"Departure Time\": departure_time,\n        \"Arrival Time\": arrival_time,\n        \"Airline Company\": airline,\n        \"Flight Duration\": duration,\n        \"Stops\": stops,\n        \"Price\": price,\n        \"co2 emissions\": co2_emissions,\n        \"emissions variation\": emissions_variation\n    }\n\ndef clean_csv(filename: str):\n    \"\"\"Clean unwanted characters from the saved CSV file.\"\"\"\n    data = pd.read_csv(filename, encoding=\"utf-8\")\n    \n    def clean_text(value):\n        if isinstance(value, str):\n            return value.replace('\u00c2', '').replace('\u202f', ' ').replace('\u00c3', '').replace('\u00b6', '').strip()\n        return value\n\n 
   cleaned_data = data.applymap(clean_text)\n    cleaned_file_path = f\"{filename}\"\n    cleaned_data.to_csv(cleaned_file_path, index=False)\n    print(f\"Cleaned CSV saved to: {cleaned_file_path}\")\n\ndef save_to_csv(data: List[Dict[str, str]], filename: str = \"flight_data.csv\") -&gt; None:\n    \"\"\"Save flight data to a CSV file.\"\"\"\n    if not data:\n        return\n    \n    headers = list(data[0].keys())\n    \n    with open(filename, 'w', newline='', encoding='utf-8') as csvfile:\n        writer = csv.DictWriter(csvfile, fieldnames=headers)\n        writer.writeheader()\n        writer.writerows(data)\n    \n    # Clean the saved CSV\n    clean_csv(filename)\n\nasync def scrape_flight_data(one_way_url):\n    flight_data = []\n\n    playwright, browser, page = await setup_browser()\n    \n    try:\n        await page.goto(one_way_url)\n        \n        # Wait for flight data to load\n        await page.wait_for_selector(\".pIav2d\")\n        \n        # Get all flights and extract their information\n        flights = await page.query_selector_all(\".pIav2d\")\n        for flight in flights:\n            flight_info = await scrape_flight_info(flight)\n            flight_data.append(flight_info)\n        \n        # Save the extracted data in CSV format\n        save_to_csv(flight_data)\n            \n    finally:\n        await browser.close()\n        await playwright.stop()\n\nif __name__ == \"__main__\":\n    one_way_url = FlightURLBuilder.build_url(\n        departure=\"SFO\",\n        destination=\"LAX\",\n        departure_date=\"2024-12-25\"\n    )\n    print(\"One-way URL:\", one_way_url)\n\n    # Run the scraper\n    asyncio.run(scrape_flight_data(one_way_url))<\/pre>\n<h2 id=\"proxy-setup\"><span style=\"font-weight: 400\">Setting Up Proxy Rotation<\/span><\/h2>\n<p><span style=\"font-weight: 400\">For large scale scrapers, using proxies helps to distribute your requests across multiple IPs, reducing the risk of being 
blocked.<\/span><\/p>\n<p><span style=\"font-weight: 400\">I\u2019m using the free residential proxy from <a href=\"https:\/\/rayobyte.com\/products\/residential-proxies\/\">Rayobyte.<\/a><\/span><\/p>\n<h3><span style=\"font-weight: 400\">Save the proxy credentials in a .env file<\/span><\/h3>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\"># Proxy Configuration\nPROXY_SERVER=http:\/\/proxy.example.com:8080\nPROXY_USERNAME=your_username\nPROXY_PASSWORD=your_password\nPROXY_BYPASS=localhost,127.0.0.1<\/pre>\n<h3><span style=\"font-weight: 400\">Set up the proxy<\/span><\/h3>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">import os\nfrom typing import Dict, Optional\n\n# Assumption: the .env file is loaded with python-dotenv (pip install python-dotenv);\n# os.getenv alone does not read .env files.\nfrom dotenv import load_dotenv\n\nload_dotenv()\n\n\nclass ProxyConfig:\n    def __init__(self):\n        self.server = os.getenv('PROXY_SERVER')\n        self.username = os.getenv('PROXY_USERNAME')\n        self.password = os.getenv('PROXY_PASSWORD')\n        self.bypass = os.getenv('PROXY_BYPASS')\n\n    def get_proxy_settings(self) -&gt; Optional[Dict]:\n        if not self.server:\n            return None\n\n        proxy_settings = {\n            \"server\": self.server\n        }\n\n        if self.username and self.password:\n            proxy_settings.update({\n                \"username\": self.username,\n                \"password\": 
self.password\n            })\n\n        if self.bypass:\n            proxy_settings[\"bypass\"] = self.bypass\n\n        return proxy_settings\n\n    @property\n    def is_configured(self) -&gt; bool:\n        return bool(self.server)<\/pre>\n<h3><span style=\"font-weight: 400\">Set up the browser with the proxy<\/span><\/h3>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">async def setup_browser():\n    p = await async_playwright().start()\n    browser_settings = {\n        \"headless\": False\n    }\n\n    # Initialize proxy configuration from environment variables\n    proxy_config = ProxyConfig()\n    if proxy_config.is_configured:\n        proxy_settings = proxy_config.get_proxy_settings()\n        if proxy_settings:\n            browser_settings[\"proxy\"] = proxy_settings\n\n    browser = await p.chromium.launch(**browser_settings)\n    page = await browser.new_page()\n\n    return p, browser, page<\/pre>\n<h2 id=\"conclusion\"><span style=\"font-weight: 400\">Conclusion<\/span><\/h2>\n<p><span style=\"font-weight: 400\">Building a flight price tracker using Python and Playwright allows you to automate the collection of valuable flight data for personal or business purposes. 
In this tutorial, you learned how to:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Understand and decode Google Flights URLs<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Automate browser actions with Playwright<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Extract and clean flight data<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Save data in a structured CSV format<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Enhance the scraper with proxy rotation and delays<\/span><\/li>\n<\/ul>\n<h2 id=\"disclaimer\"><span style=\"font-weight: 400\">Disclaimer<\/span><\/h2>\n<p><span style=\"font-weight: 400\">Scraping Google Flights or any similar service should always adhere to ethical guidelines and respect the site&#8217;s terms of service. This tutorial is for educational purposes only. Use this tool responsibly and only for permitted purposes.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Check out this <a href=\"https:\/\/github.com\/ainacodes\/google_flight_scraper\" rel=\"nofollow noopener\" target=\"_blank\">GitHub Repository<\/a> for the complete source code.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Happy Scraping! 
\ud83d\ude80<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Create a Flight Price Tracker: Scraping Airlines Ticket Prices from Google Flights using Python Source code: google_flight_scraper\u00a0 Table of Content IntroductionEthical ConsiderationData that we want&hellip;<\/p>\n","protected":false},"author":25,"featured_media":2257,"comment_status":"open","ping_status":"closed","template":"","meta":{"rank_math_lock_modified_date":false},"categories":[],"class_list":["post-2233","scraping_project","type-scraping_project","status-publish","has-post-thumbnail","hentry"],"_links":{"self":[{"href":"https:\/\/rayobyte.com\/community\/wp-json\/wp\/v2\/scraping_project\/2233","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rayobyte.com\/community\/wp-json\/wp\/v2\/scraping_project"}],"about":[{"href":"https:\/\/rayobyte.com\/community\/wp-json\/wp\/v2\/types\/scraping_project"}],"author":[{"embeddable":true,"href":"https:\/\/rayobyte.com\/community\/wp-json\/wp\/v2\/users\/25"}],"replies":[{"embeddable":true,"href":"https:\/\/rayobyte.com\/community\/wp-json\/wp\/v2\/comments?post=2233"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/rayobyte.com\/community\/wp-json\/wp\/v2\/media\/2257"}],"wp:attachment":[{"href":"https:\/\/rayobyte.com\/community\/wp-json\/wp\/v2\/media?parent=2233"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rayobyte.com\/community\/wp-json\/wp\/v2\/categories?post=2233"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}