Compare Ruby and Go to scrape shipping details from Yahoo! Taiwan

Posted by Rayna Meinrad on 12/14/2024 at 7:19 am

How does scraping shipping details from Yahoo! Taiwan differ when using Ruby versus Go? Is Ruby’s Nokogiri gem easier to implement for parsing HTML, or does Go’s Colly library provide better performance for large-scale scraping? How do both languages handle dynamically loaded content, such as shipping costs or estimated delivery times that might depend on user interactions?
Below are two potential implementations, one in Ruby and one in Go, to scrape shipping details from Yahoo! Taiwan. Which approach is better suited to the task, and which would you choose for scalability and ease of maintenance?

Ruby Implementation:

require 'nokogiri'
require 'open-uri'
# URL of the Yahoo! Taiwan product page
url = 'https://tw.buy.yahoo.com/product-page'
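# Fetch the page content; URI.open raises OpenURI::HTTPError on 4xx/5xx responses
begin
  doc = Nokogiri::HTML(URI.open(url))
rescue OpenURI::HTTPError, SocketError => e
  abort "Failed to fetch #{url}: #{e.message}"
end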
# Scrape shipping details
shipping_section = doc.at_css('.shipping-info')
if shipping_section
  shipping_cost = shipping_section.at_css('.cost')&.text&.strip || 'No shipping cost available'
  delivery_time = shipping_section.at_css('.time')&.text&.strip || 'No delivery time specified'
  puts "Shipping Cost: #{shipping_cost}"
  puts "Delivery Time: #{delivery_time}"
else
  puts "Shipping details not found."
end

Go Implementation:

package main

import (
	"fmt"
	"log"

	"github.com/gocolly/colly"
)

func main() {
	// Create a new Colly collector
	c := colly.NewCollector()
	// Scrape shipping details
	c.OnHTML(".shipping-info", func(e *colly.HTMLElement) {
		cost := e.ChildText(".cost")
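		// Named deliveryTime to avoid shadowing Go's standard "time" package
		deliveryTime := e.ChildText(".time")
		if cost == "" {
			cost = "No shipping cost available"
		}
		if deliveryTime == "" {
			deliveryTime = "No delivery time specified"
		}
		fmt.Printf("Shipping Cost: %s\nDelivery Time: %s\n", cost, deliveryTime)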
	})
	// Handle errors
	c.OnError(func(_ *colly.Response, err error) {
		log.Println("Error occurred:", err)
	})
	// Visit the Yahoo! Taiwan product page
	err := c.Visit("https://tw.buy.yahoo.com/product-page")
	if err != nil {
		log.Fatalf("Failed to visit website: %v", err)
	}
}

4 Replies

Gerlind Kelley

Member
12/17/2024 at 10:11 am

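Ruby’s Nokogiri is simple and intuitive: a CSS selector passed to at_css is all the script above needs, which makes it a great choice for developers who want a straightforward way to parse HTML. However, a script that fetches pages one at a time with open-uri, as the one above does, will likely be slower than a Colly collector issuing requests in parallel once the number of pages gets large.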

Deisy Swarna

Member
12/18/2024 at 9:36 am

Go’s Colly library tends to be faster for large-scale scraping because a collector can issue requests concurrently (asynchronous mode combined with a parallelism limit), with each request handled in a lightweight goroutine. If scalability is a concern, Go is probably the better choice for scraping shipping details from many product pages.

Ella Karl

Member
12/19/2024 at 11:51 am

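If the shipping details are loaded dynamically by JavaScript, neither Ruby’s Nokogiri nor Go’s Colly will suffice on its own, because both only see the HTML the server returns and do not execute scripts. In such cases, a browser automation tool such as Selenium for Ruby, or chromedp or the community playwright-go port for Go, can render the page so that the required data becomes accessible.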
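Before reaching for a headless browser, it can be worth checking whether the shipping markup is already present in the raw response. The sketch below is a crude heuristic, not a parser: it only looks for the class name as a substring. The function name appearsInStaticHTML and both sample pages are invented for illustration; .shipping-info is the selector used in the question and may not match the real site.

```go
package main

import (
	"fmt"
	"strings"
)

// appearsInStaticHTML reports whether className occurs anywhere in the raw
// HTML returned by the server. This is a plain substring check, so it can
// give false positives (for example, the name appearing inside a script).
func appearsInStaticHTML(rawHTML, className string) bool {
	return strings.Contains(rawHTML, className)
}

func main() {
	// Made-up sample responses: one server-rendered, one filled in by JavaScript.
	samples := []struct {
		name string
		html string
	}{
		{"server-rendered", `<div class="shipping-info"><span class="cost">NT$60</span></div>`},
		{"client-rendered", `<div id="app"></div><script src="/bundle.js"></script>`},
	}

	for _, s := range samples {
		if appearsInStaticHTML(s.html, "shipping-info") {
			fmt.Println(s.name + ": markup present, a plain HTTP fetch is probably enough")
		} else {
			fmt.Println(s.name + ": markup missing, the page likely needs a real browser")
		}
	}
}
```

If the check fails on the real page, that suggests the data arrives after page load, which is the case where a rendering tool becomes necessary.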

Fiachna Iyabo

Member
12/20/2024 at 10:04 am

Ruby is often considered easier to learn and implement for smaller tasks, while Go’s strong performance and concurrency capabilities make it ideal for larger projects. Choosing between the two depends on the scale and complexity of the scraping task.
