
Compare using Python and Go to scrape hotel prices from Traveloka Indonesia

Posted by Alexius Poncio on 12/14/2024 at 7:44 am

How does scraping hotel prices from Traveloka, a popular travel booking website in Indonesia, differ when using Python versus Go? Does Python’s BeautifulSoup library provide enough flexibility to handle the site’s structure, or does Go’s Colly library offer better performance for large-scale scraping tasks? How do both languages handle dynamic content, such as discounts or region-specific pricing, which are common on travel websites?
Below are two possible implementations, one in Python and one in Go, that scrape hotel prices from a Traveloka page. Which approach better handles the complexities of dynamic content and ensures accurate data extraction?

Python Implementation:

import requests
from bs4 import BeautifulSoup
# URL of the Traveloka hotel page
url = "https://www.traveloka.com/en-id/hotel/product-page"
# Headers to mimic a browser request
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
}
# Fetch the page content; the timeout keeps a stalled connection from hanging the script
response = requests.get(url, headers=headers, timeout=30)
if response.status_code == 200:
    soup = BeautifulSoup(response.content, "html.parser")
    # Extract hotel prices. The "hotel-price" class must match the live markup, and
    # prices injected later by JavaScript are not present in this static HTML.
    prices = soup.find_all("div", class_="hotel-price")
    if not prices:
        print("No price elements found; the prices may be rendered by JavaScript.")
    for idx, price in enumerate(prices, 1):
        print(f"Hotel {idx} Price:", price.get_text(strip=True))
else:
    print(f"Failed to fetch the page. Status code: {response.status_code}")

Go Implementation:

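package main
import (
	"fmt"
	"log"
	"strings"
	"github.com/gocolly/colly"
)
func main() {
	// Create a new Colly collector that sends a browser-like User-Agent
	c := colly.NewCollector(
		colly.UserAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"),
	)
	// Scrape hotel prices. Colly does not execute JavaScript, so only prices
	// present in the server-rendered HTML will match this selector.
	count := 0
	c.OnHTML(".hotel-price", func(e *colly.HTMLElement) {
		count++
		price := strings.TrimSpace(e.Text)
		fmt.Printf("Hotel %d Price: %s\n", count, price)
	})
	// Handle errors, including non-2xx responses (StatusCode is 0 on network errors)
	c.OnError(func(r *colly.Response, err error) {
		log.Printf("Error occurred (status %d): %v", r.StatusCode, err)
	})
	// Visit the Traveloka hotel page
	err := c.Visit("https://www.traveloka.com/en-id/hotel/product-page")
	if err != nil {
		log.Fatalf("Failed to visit website: %v", err)
	}
}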


4 Replies

Marta Era

Member
12/17/2024 at 10:24 am

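Python’s BeautifulSoup is great for beginners because of its simplicity and readability. Keep in mind that it is only an HTML parser: paired with requests, it fetches and parses pages one at a time unless you add concurrency yourself, so it may not perform as well as Go’s Colly, which has concurrency built in, when handling large datasets or many pages at once.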
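In practice the slow, sequential part is usually the network fetching rather than BeautifulSoup itself. As a minimal sketch, assuming a hypothetical scrape_page(url) function that fetches one hotel page and returns its price strings (for example, the requests + BeautifulSoup code from the question wrapped in a function), Python's standard-library thread pool can fetch several pages at once:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List


def scrape_many(
    urls: List[str],
    scrape_page: Callable[[str], List[str]],  # hypothetical per-page scraper
    max_workers: int = 8,
) -> Dict[str, List[str]]:
    """Run scrape_page over many URLs using a bounded pool of threads.

    Threads overlap the time spent waiting on the network; CPU-bound
    HTML parsing is still serialized by CPython's GIL.
    """
    results: Dict[str, List[str]] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the URL it is scraping
        futures = {pool.submit(scrape_page, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                # One failing page should not abort the whole batch
                print(f"{url} failed: {exc}")
                results[url] = []
    return results
```

Keeping max_workers small matters on a commercial site like Traveloka; a large pool mostly increases the chance of being rate-limited or blocked.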
Laura Warda

Member
12/18/2024 at 9:46 am

Go’s Colly library is designed for speed and concurrency, which makes it a good fit for large-scale scraping of travel websites like Traveloka. In asynchronous mode (colly.Async(true)) it can fetch multiple pages simultaneously, with a LimitRule capping parallelism per domain, which is useful for extracting prices from many hotels.
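Colly pairs its concurrency with politeness limits (its LimitRule has Parallelism and Delay fields). A rough Python counterpart, sketched here as a hypothetical RateLimiter class rather than any library's API, spaces out request starts across threads:

```python
import threading
import time


class RateLimiter:
    """Let at most one request start every min_interval seconds, across all threads."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._lock = threading.Lock()
        self._next_allowed = 0.0  # monotonic timestamp of the next free slot

    def wait(self) -> float:
        """Block until this caller's slot arrives; return how long it slept."""
        with self._lock:
            now = time.monotonic()
            delay = max(0.0, self._next_allowed - now)
            # Reserve the following slot before releasing the lock
            self._next_allowed = max(now, self._next_allowed) + self.min_interval
        if delay:
            time.sleep(delay)
        return delay
```

Calling limiter.wait() immediately before each requests.get(...) call keeps a multithreaded scraper to roughly one request per min_interval seconds, similar in spirit to a Colly Delay setting.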
Alheri Mien

Member
12/19/2024 at 12:05 pm

Dynamic content, such as hotel prices loaded via JavaScript, may require a browser automation tool: Selenium or Playwright in Python, or chromedp or the community-maintained playwright-go in Go. These tools drive a real (often headless) browser, so you can wait until the price elements have rendered before extracting them.
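A minimal Playwright for Python sketch of that approach, assuming Playwright is installed (pip install playwright, then playwright install chromium) and that prices still use the question's .hotel-price selector. The function name and the id-ID locale default are illustrative choices, not taken from Traveloka's documentation:

```python
from typing import List


def scrape_rendered_prices(
    url: str, selector: str = ".hotel-price", locale: str = "id-ID"
) -> List[str]:
    """Load the page in headless Chromium, wait for prices to render, return their text."""
    # Imported lazily so this module can be loaded without Playwright installed
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            # The browser locale is sent to the site and may affect what it serves
            page = browser.new_page(locale=locale)
            page.goto(url, wait_until="networkidle")
            # Block until at least one price element has been rendered by JavaScript
            page.wait_for_selector(selector, timeout=15_000)
            return [text.strip() for text in page.locator(selector).all_inner_texts()]
        finally:
            browser.close()
```

Calling scrape_rendered_prices("https://www.traveloka.com/en-id/hotel/product-page") would return the rendered price strings. Passing a different locale is one way to probe for region-specific pricing, although whether Traveloka varies prices by locale is an assumption to verify.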
Niclas Yvonne

Member
12/21/2024 at 5:32 am

For scalability, Go’s lightweight goroutines and modest memory use make it well suited to high-volume scraping. However, Python’s ecosystem includes mature data processing and analysis libraries such as pandas, which can help with post-scraping tasks like cleaning and comparing prices.
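For example, once the price strings are scraped, a little standard-library Python turns them into numbers for comparison. This sketch assumes Indonesian-rupiah labels like "Rp 1.250.000" with no decimal part; parse_idr, discount_pct and summarize are hypothetical helper names, and pandas could do the same aggregation on larger datasets:

```python
import re
import statistics
from typing import Dict, Iterable, Optional


def parse_idr(text: str) -> Optional[int]:
    """Convert a scraped label such as "Rp 1.250.000" into an integer number of rupiah.

    Assumes whole-rupiah prices, so every non-digit character (the "Rp" prefix,
    spaces, and "." or "," thousands separators) is simply dropped.
    """
    digits = re.sub(r"\D", "", text)
    return int(digits) if digits else None


def discount_pct(original: int, final: int) -> float:
    """Percentage saved when a struck-through original price is shown beside the final price."""
    return round(100 * (original - final) / original, 1)


def summarize(labels: Iterable[str]) -> Dict[str, float]:
    """Parse price labels, skip ones without digits, and report basic statistics."""
    prices = [p for p in (parse_idr(label) for label in labels) if p is not None]
    if not prices:
        return {"count": 0}
    return {
        "count": len(prices),
        "min": min(prices),
        "median": statistics.median(prices),
        "max": max(prices),
    }
```

With these helpers, parse_idr("Rp 1.250.000") returns 1250000 and discount_pct(1000000, 850000) returns 15.0.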
