  • Compare Ruby and Go to scrape shipping details from Yahoo! Taiwan

    Posted by Rayna Meinrad on 12/14/2024 at 7:19 am

    How does scraping shipping details from Yahoo! Taiwan differ when using Ruby versus Go? Is Ruby’s Nokogiri gem easier to implement for parsing HTML, or does Go’s Colly library provide better performance for large-scale scraping? How do both languages handle dynamically loaded content, such as shipping costs or estimated delivery times that might depend on user interactions?
    Below are two potential implementations, one in Ruby and one in Go, to scrape shipping details from Yahoo! Taiwan. Which approach is better suited to the task, and which would you choose for scalability and ease of maintenance?

    Ruby Implementation:

    require 'nokogiri'
    require 'open-uri'
    # URL of the Yahoo! Taiwan product page
    url = 'https://tw.buy.yahoo.com/product-page'
    # Fetch the page content
    doc = Nokogiri::HTML(URI.open(url))
    # Scrape shipping details
    shipping_section = doc.at_css('.shipping-info')
    if shipping_section
      shipping_cost = shipping_section.at_css('.cost')&.text&.strip || 'No shipping cost available'
      delivery_time = shipping_section.at_css('.time')&.text&.strip || 'No delivery time specified'
      puts "Shipping Cost: #{shipping_cost}"
      puts "Delivery Time: #{delivery_time}"
    else
      puts "Shipping details not found."
    end
    

    Go Implementation:

    package main
    import (
    	"fmt"
    	"log"
    	"github.com/gocolly/colly"
    )
    func main() {
    	// Create a new Colly collector
    	c := colly.NewCollector()
    	// Scrape shipping details
    	c.OnHTML(".shipping-info", func(e *colly.HTMLElement) {
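    		cost := e.ChildText(".cost")
    		// Named deliveryTime rather than time to avoid clashing with Go's standard time package
    		deliveryTime := e.ChildText(".time")
    		if cost == "" {
    			cost = "No shipping cost available"
    		}
    		if deliveryTime == "" {
    			deliveryTime = "No delivery time specified"
    		}
    		fmt.Printf("Shipping Cost: %s\nDelivery Time: %s\n", cost, deliveryTime)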
    	})
    	// Handle errors
    	c.OnError(func(_ *colly.Response, err error) {
    		log.Println("Error occurred:", err)
    	})
    	// Visit the Yahoo! Taiwan product page
    	err := c.Visit("https://tw.buy.yahoo.com/product-page")
    	if err != nil {
    		log.Fatalf("Failed to visit website: %v", err)
    	}
    }
    
  • 4 Replies
  • Gerlind Kelley

    Member
    12/17/2024 at 10:11 am

    Ruby’s Nokogiri is simple and intuitive, making it a great choice for developers who need a straightforward way to parse HTML. However, the Ruby script above fetches one page at a time with open-uri, so it will likely be slower than a Colly-based crawler when handling a large number of pages.

  • Deisy Swarna

    Member
    12/18/2024 at 9:36 am

    Go’s Colly library tends to be faster for large-scale scraping because it can issue requests concurrently, for example through its asynchronous mode and per-domain parallelism limits. If scalability is a concern, Go might be the better choice for scraping shipping details from multiple product pages.
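
    To make the concurrency point concrete, here is a rough standard-library sketch of bounded concurrent fetching. It is not Colly's implementation; FetchAll, Result, and the fake fetch function are hypothetical names for illustration, and the example.com URLs are placeholders.

```go
package main

import (
	"fmt"
	"sync"
)

// Result pairs a URL with whatever the fetch function returned for it.
type Result struct {
	URL  string
	Body string
	Err  error
}

// FetchAll runs fetch over urls using at most `workers` goroutines and
// returns the results in the same order as the input slice.
func FetchAll(urls []string, workers int, fetch func(string) (string, error)) []Result {
	if workers < 1 {
		workers = 1
	}
	results := make([]Result, len(urls))
	jobs := make(chan int)
	var wg sync.WaitGroup

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each worker pulls indexes until the channel is closed.
			for i := range jobs {
				body, err := fetch(urls[i])
				// Every index is written by exactly one goroutine, so no lock is needed.
				results[i] = Result{URL: urls[i], Body: body, Err: err}
			}
		}()
	}

	for i := range urls {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	// fakeFetch stands in for a real HTTP request so the sketch runs offline.
	fakeFetch := func(url string) (string, error) {
		return "page for " + url, nil
	}
	// Placeholder URLs, not real product pages.
	urls := []string{
		"https://example.com/product/1",
		"https://example.com/product/2",
		"https://example.com/product/3",
	}
	for _, r := range FetchAll(urls, 2, fakeFetch) {
		fmt.Println(r.URL, "->", r.Body, r.Err)
	}
}
```

    The worker count caps how many requests are in flight at once, which matters as much for being polite to the target site as for speed.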

  • Ella Karl

    Member
    12/19/2024 at 11:51 am

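    If the shipping details are dynamically loaded, neither Ruby’s Nokogiri nor Go’s Colly will suffice alone, because both only see the HTML the server returns and do not execute JavaScript. In such cases, a browser automation tool is needed to render the page first: Selenium (via the selenium-webdriver gem) for Ruby, or chromedp or the community-maintained playwright-go for Go.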
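
    Before reaching for a full browser, it may be worth checking the browser's network tab: dynamically loaded values often arrive as JSON from a background request, and that response can sometimes be decoded directly. The sketch below assumes such a payload exists. The field names cost and delivery_time, and the ShippingInfo and ParseShipping names, are hypothetical and not Yahoo! Taiwan's actual schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ShippingInfo mirrors a hypothetical JSON payload; the field names are
// assumptions for illustration, not a real site's schema.
type ShippingInfo struct {
	Cost         string `json:"cost"`
	DeliveryTime string `json:"delivery_time"`
}

// ParseShipping decodes the payload and fills in the same fallback
// messages used by the Ruby and Go scrapers in the original post.
func ParseShipping(data []byte) (ShippingInfo, error) {
	var info ShippingInfo
	if err := json.Unmarshal(data, &info); err != nil {
		return ShippingInfo{}, fmt.Errorf("decode shipping JSON: %w", err)
	}
	if info.Cost == "" {
		info.Cost = "No shipping cost available"
	}
	if info.DeliveryTime == "" {
		info.DeliveryTime = "No delivery time specified"
	}
	return info, nil
}

func main() {
	// Sample payload made up for this example; delivery_time is left out
	// on purpose to show the fallback.
	payload := []byte(`{"cost": "NT$60"}`)

	info, err := ParseShipping(payload)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("Shipping Cost: %s\nDelivery Time: %s\n", info.Cost, info.DeliveryTime)
}
```

    If no such endpoint exists, or it requires tokens that only the rendered page can produce, a headless browser is still the fallback.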

  • Fiachna Iyabo

    Member
    12/20/2024 at 10:04 am

    Ruby is often considered easier to learn and implement for smaller tasks, while Go’s strong performance and concurrency capabilities make it ideal for larger projects. Choosing between the two depends on the scale and complexity of the scraping task.
