  • Compare using Python and Go to scrape hotel prices from Traveloka Indonesia

    Posted by Alexius Poncio on 12/14/2024 at 7:44 am

    How does scraping hotel prices from Traveloka, a popular travel booking website in Indonesia, differ when using Python versus Go? Does Python’s BeautifulSoup library provide enough flexibility to handle the site’s structure, or does Go’s Colly library offer better performance for large-scale scraping tasks? How do both languages handle dynamic content, such as discounts or region-specific pricing, which are common on travel websites?
    Below are two potential implementations, one in Python and one in Go, to scrape hotel prices from a Traveloka page. Which approach better handles the complexities of dynamic content and ensures accurate data extraction?

    Python Implementation:

    import requests
    from bs4 import BeautifulSoup
    # URL of the Traveloka hotel page
    url = "https://www.traveloka.com/en-id/hotel/product-page"
    # Headers to mimic a browser request
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
    }
    # Fetch the page content; the timeout stops a stalled connection from hanging the script
    try:
        response = requests.get(url, headers=headers, timeout=10)
    except requests.RequestException as exc:
        raise SystemExit(f"Request failed: {exc}")
    if response.status_code == 200:
        soup = BeautifulSoup(response.content, "html.parser")
        # Extract hotel prices; "hotel-price" is an assumed class name, and only
        # elements present in the initial HTML (not added later by JavaScript) will match
        prices = soup.find_all("div", class_="hotel-price")
        for idx, price in enumerate(prices, 1):
            print(f"Hotel {idx} Price:", price.text.strip())
    else:
        print(f"Failed to fetch the page. Status code: {response.status_code}")
    

    Go Implementation:

    package main
    import (
    	"fmt"
    	"log"
    	"strings"

    	"github.com/gocolly/colly"
    )
    func main() {
    	// Create a new Colly collector
    	c := colly.NewCollector()
    	// Scrape hotel prices; ".hotel-price" is an assumed selector
    	c.OnHTML(".hotel-price", func(e *colly.HTMLElement) {
    		// Trim surrounding whitespace, matching strip() in the Python version
    		price := strings.TrimSpace(e.Text)
    		fmt.Println("Hotel Price:", price)
    	})
    	// Handle errors
    	c.OnError(func(_ *colly.Response, err error) {
    		log.Println("Error occurred:", err)
    	})
    	// Visit the Traveloka hotel page
    	err := c.Visit("https://www.traveloka.com/en-id/hotel/product-page")
    	if err != nil {
    		log.Fatalf("Failed to visit website: %v", err)
    	}
    }
    
  • 4 Replies
  • Marta Era

    Member
    12/17/2024 at 10:24 am

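    Python’s BeautifulSoup library is great for beginners due to its simplicity and readability. However, it is only an HTML parser: paired with plain requests calls, it fetches pages one at a time unless you add threading or asyncio yourself, so it might not perform as well as Go’s Colly library when handling large datasets or concurrent scraping tasks.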
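
    To illustrate the concurrency point, here is a minimal sketch of fetching several pages in parallel from Python with a thread pool. It uses the standard library's urllib rather than requests so it has no dependencies. The names fetch, fetch_all, and USER_AGENT, and the worker count, are illustrative assumptions, not part of the code posted above.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import Request, urlopen

# Placeholder User-Agent string; substitute whatever your scraper should send.
USER_AGENT = "Mozilla/5.0 (compatible; example-scraper)"


def fetch(url, timeout=10):
    """Download one page and return its body as text."""
    req = Request(url, headers={"User-Agent": USER_AGENT})
    with urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fetch_all(urls, fetch_one=fetch, max_workers=8):
    """Fetch many URLs concurrently.

    Returns a dict mapping each URL to its body, or to the exception
    raised while fetching it, so one failure does not stop the batch.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every URL up front so the downloads overlap.
        futures = {url: pool.submit(fetch_one, url) for url in urls}
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except Exception as exc:
                results[url] = exc
    return results
```

    Each returned body can then be passed to BeautifulSoup as in the original script. Threads suit this workload because the time is spent waiting on the network; keeping max_workers small also avoids hammering the site.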

  • Laura Warda

    Member
    12/18/2024 at 9:46 am

    Go’s Colly library is optimized for speed and concurrency, making it a better choice for scraping large-scale travel websites like Traveloka. It can handle multiple pages simultaneously, which is useful for extracting prices from several hotels.

  • Alheri Mien

    Member
    12/19/2024 at 12:05 pm

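    Dynamic content, such as hotel prices loaded via JavaScript, may require additional tools: Selenium or Playwright in Python, or chromedp or the community playwright-go bindings in Go. These tools drive a real browser, so you can wait until the price elements have rendered before scraping them. Neither BeautifulSoup nor Colly executes JavaScript on its own.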
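
    As a rough sketch of the "wait until rendered" idea in Python with Selenium: the code below polls the page until elements matching a selector contain text, then returns that text. Selenium's own WebDriverWait with expected_conditions is the usual way to do this; the loop is written out by hand here only so the waiting logic is visible. The function names, the timeout values, and the .hotel-price selector (taken from the question) are assumptions, not confirmed details of Traveloka's markup.

```python
import time


def wait_for_prices(driver, selector=".hotel-price", timeout=15.0, poll=0.5):
    """Poll the rendered page until elements matching `selector` have text.

    `driver` is any object exposing Selenium's find_elements(by, value).
    Returns the stripped text of each match, or [] if the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        # "css selector" is the string value of selenium's By.CSS_SELECTOR.
        elements = driver.find_elements("css selector", selector)
        texts = [t for t in (el.text.strip() for el in elements) if t]
        if texts:
            return texts
        if time.monotonic() >= deadline:
            return []
        time.sleep(poll)


def scrape_rendered(url, selector=".hotel-price"):
    """Open `url` in headless Chrome and return the rendered price strings."""
    # Imported here so wait_for_prices can be used without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return wait_for_prices(driver, selector)
    finally:
        # Always close the browser, even if the page fails to load.
        driver.quit()
```

    Calling scrape_rendered with the product-page URL from the question would return whatever price strings the browser ends up displaying, assuming the selector matches. A browser per page is far slower than a plain HTTP request, so this is usually reserved for pages where the static HTML lacks the data.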

  • Niclas Yvonne

    Member
    12/21/2024 at 5:32 am

    For scalability, Go’s efficient resource handling makes it ideal for projects involving high-volume scraping. However, Python’s ecosystem includes advanced data processing and analysis libraries, which might be beneficial for post-scraping tasks.
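
    As a small example of the post-scraping side in Python, the sketch below turns scraped rupiah strings into integers and summarizes them using only the standard library. It assumes prices arrive in the Indonesian format, with "." as the thousands separator and an optional "," decimal part. The function names and the sample strings in the usage note are made up for illustration, not real Traveloka data.

```python
import re
import statistics


def parse_idr(text):
    """Convert a rupiah string such as 'Rp 1.250.000' to an integer amount."""
    whole = text.split(",")[0]         # drop a decimal part like ',50' if present
    digits = re.sub(r"\D", "", whole)  # strip 'Rp', spaces, and '.' separators
    if not digits:
        raise ValueError(f"no digits in price string: {text!r}")
    return int(digits)


def summarize(price_texts):
    """Return count, min, max, and median for a list of price strings."""
    amounts = [parse_idr(t) for t in price_texts]
    return {
        "count": len(amounts),
        "min": min(amounts),
        "max": max(amounts),
        "median": statistics.median(amounts),
    }
```

    For instance, summarize(["Rp 850.000", "Rp 1.200.000", "IDR 975.500"]) gives a minimum of 850000, a maximum of 1200000, and a median of 975500. For larger datasets the same cleaning step would typically feed a pandas DataFrame.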
