Compare Python and Ruby for scraping product reviews on Tiki Vietnam
How does scraping product reviews from Tiki, one of Vietnam’s largest e-commerce platforms, differ between Python and Ruby? Would Python’s BeautifulSoup library be more efficient for parsing static HTML, or does Ruby’s Nokogiri offer a simpler and more elegant solution? How do both languages handle dynamic content, such as paginated reviews or JavaScript-rendered elements?
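Paginated reviews are handled the same way in either language: request successive pages until one comes back empty. Below is a minimal Python sketch of that loop. It assumes, hypothetically, that a `page` number is enough to select a page of reviews. The real Tiki URL scheme is not given in this post, so both the fetcher and the parser are passed in as functions. The demo uses stand-in data so it runs offline. If the reviews are rendered by JavaScript, the fetched HTML will not contain them and the loop stops after page 1, which is itself a useful signal that a headless browser or the site's underlying data requests are needed instead.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def scrape_all_pages(
    fetch_page: Callable[[int], str],
    parse_page: Callable[[str], List[T]],
    max_pages: int = 50,
    delay: float = 1.0,
) -> List[T]:
    """Walk pages 1, 2, ... until a page yields nothing or max_pages is reached."""
    collected: List[T] = []
    for page in range(1, max_pages + 1):
        items = parse_page(fetch_page(page))
        if not items:
            break  # an empty page is taken to mean we are past the last page
        collected.extend(items)
        if delay:
            time.sleep(delay)  # pause between requests to avoid hammering the site
    return collected


if __name__ == "__main__":
    # Stand-in pages so the loop runs offline. A real fetch_page might call
    # requests.get(url, params={"page": n}), where "page" is an ASSUMED parameter.
    fake_pages = {1: "a,b", 2: "c"}
    reviews = scrape_all_pages(
        fetch_page=lambda n: fake_pages.get(n, ""),
        parse_page=lambda html: [x for x in html.split(",") if x],
        delay=0,
    )
    print(reviews)  # ['a', 'b', 'c']
```

Keeping `parse_page` separate means the same loop works with the extraction logic from either implementation below, ported to a function that returns a list.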
Below are two implementations, one in Python and one in Ruby, for scraping product reviews from a Tiki product page. Which approach better handles the site's structure and ensures accurate data extraction?

Python Implementation:

```python
import requests
from bs4 import BeautifulSoup

# URL of the Tiki product page
url = "https://tiki.vn/product-page"

# Headers to mimic a browser request
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
}


def text_or_default(parent, tag, css_class, default):
    """Return the stripped text of the first matching child, or a default."""
    element = parent.find(tag, class_=css_class)  # look the element up only once
    return element.get_text().strip() if element else default


# Fetch the page content (the timeout keeps a stalled connection from hanging forever)
response = requests.get(url, headers=headers, timeout=10)

if response.status_code == 200:
    soup = BeautifulSoup(response.content, "html.parser")

    # Extract reviews
    reviews = soup.find_all("div", class_="review-item")
    if not reviews:
        print("No reviews found.")
    for idx, review in enumerate(reviews, 1):
        reviewer = text_or_default(review, "span", "reviewer-name", "Anonymous")
        comment = text_or_default(review, "p", "review-text", "No comment")
        print(f"Review {idx}: {reviewer} - {comment}")
else:
    print(f"Failed to fetch the page. Status code: {response.status_code}")
```
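The Python version gives up on the first non-200 response. If the site throttles requests (for example with HTTP 429) or returns a transient 5xx error, a `requests.Session` mounted with urllib3's `Retry` can retry with backoff before failing. This is a minimal sketch. The retry count, backoff factor, and status list are assumptions to tune, not values Tiki documents, and `make_session` is a hypothetical helper name.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(user_agent: str, retries: int = 3) -> requests.Session:
    """Build a session that retries transient failures with exponential backoff."""
    retry = Retry(
        total=retries,
        backoff_factor=1.0,  # sleep between attempts roughly doubles after each failure
        status_forcelist=(429, 500, 502, 503, 504),  # assumed "transient" statuses
        allowed_methods=frozenset({"GET"}),  # only retry idempotent GETs
    )
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    adapter = HTTPAdapter(max_retries=retry)
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session
```

With this helper, `session = make_session(headers["User-Agent"])` followed by `session.get(url, timeout=10)` can replace the bare `requests.get` call. Once retries on a listed status are exhausted, requests raises `requests.exceptions.RetryError` instead of returning the response. Wrap the call in `try`/`except` if you still want the "Failed to fetch" message.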
Ruby Implementation:
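```ruby
require 'nokogiri'
require 'open-uri'

# URL of the Tiki product page
url = 'https://tiki.vn/product-page'

# Send the same browser-like User-Agent as the Python version
headers = {
  'User-Agent' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' \
                  '(KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
}

begin
  # Fetch the page content; open-uri raises OpenURI::HTTPError on 4xx/5xx responses
  doc = URI.open(url, headers) { |page| Nokogiri::HTML(page) }
rescue OpenURI::HTTPError => e
  abort "Failed to fetch the page. Status: #{e.io.status.join(' ')}"
end

# Scrape reviews
reviews = doc.css('.review-item')

if reviews.any?
  reviews.each_with_index do |review, index|
    reviewer = review.at_css('.reviewer-name')&.text&.strip || 'Anonymous'
    comment = review.at_css('.review-text')&.text&.strip || 'No comment'
    puts "Review #{index + 1}: #{reviewer} - #{comment}"
  end
else
  puts 'No reviews found.'
end
```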
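On the "simpler and more elegant" question, BeautifulSoup also accepts CSS selectors through `select()` and `select_one()`. The Python extraction can therefore mirror Nokogiri's `css`/`at_css` calls almost line for line. The sketch below uses the same class names as both implementations. Those class names are placeholders until they are checked against Tiki's real markup. The sketch runs on a made-up HTML snippet rather than a live page, which is also a cheap way to confirm the extraction is accurate before pointing it at the site.

```python
from bs4 import BeautifulSoup


def parse_reviews(html):
    """Return (reviewer, comment) tuples, mirroring the Nokogiri version."""
    doc = BeautifulSoup(html, "html.parser")
    results = []
    for review in doc.select(".review-item"):  # like doc.css('.review-item')
        name = review.select_one(".reviewer-name")  # like review.at_css(...)
        text = review.select_one(".review-text")
        results.append((
            name.get_text().strip() if name else "Anonymous",
            text.get_text().strip() if text else "No comment",
        ))
    return results


# Made-up sample markup, not real Tiki HTML
sample = """
<div class="review-item">
  <span class="reviewer-name"> Minh </span>
  <p class="review-text">Sản phẩm tốt</p>
</div>
<div class="review-item"><p class="review-text">Giao hàng nhanh</p></div>
"""

for idx, (reviewer, comment) in enumerate(parse_reviews(sample), 1):
    print(f"Review {idx}: {reviewer} - {comment}")
```

The remaining difference is mostly syntax. Ruby's safe-navigation operator `&.` folds the missing-element check into the method chain, while Python needs an explicit conditional for each field.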