
  • Compare Python and Ruby for scraping product reviews on Tiki Vietnam

    Posted by Aretha Melech on 12/14/2024 at 8:37 am

    How does scraping product reviews from Tiki, one of Vietnam’s largest e-commerce platforms, differ between Python and Ruby? Would Python’s BeautifulSoup library be more efficient for parsing static HTML, or does Ruby’s Nokogiri offer a simpler and more elegant solution? How do both languages handle dynamic content, such as paginated reviews or JavaScript-rendered elements?
    Below are two implementations, one in Python and one in Ruby, for scraping product reviews from a Tiki product page. Which approach better handles the site’s structure and ensures accurate data extraction?

    Python Implementation:

    import requests
    from bs4 import BeautifulSoup

    # URL of the Tiki product page (placeholder)
    url = "https://tiki.vn/product-page"
    # Headers to mimic a browser request
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
    }

    def text_or_default(parent, tag, class_name, default):
        """Return the stripped text of the first matching child, or a default."""
        node = parent.find(tag, class_=class_name)
        return node.text.strip() if node else default

    # Fetch the page content; the timeout keeps a stalled request from hanging the script
    response = requests.get(url, headers=headers, timeout=10)
    if response.status_code == 200:
        soup = BeautifulSoup(response.content, "html.parser")
        # Extract reviews
        reviews = soup.find_all("div", class_="review-item")
        for idx, review in enumerate(reviews, 1):
            reviewer = text_or_default(review, "span", "reviewer-name", "Anonymous")
            comment = text_or_default(review, "p", "review-text", "No comment")
            print(f"Review {idx}: {reviewer} - {comment}")
    else:
        print(f"Failed to fetch the page. Status code: {response.status_code}")
    

    Ruby Implementation:

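    require 'nokogiri'
    require 'open-uri'
    # URL of the Tiki product page (placeholder)
    url = 'https://tiki.vn/product-page'
    # Header to mimic a browser request, matching the Python version
    user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
    # Fetch the page content; open-uri raises OpenURI::HTTPError on 4xx/5xx responses
    begin
      doc = Nokogiri::HTML(URI.open(url, 'User-Agent' => user_agent))
    rescue OpenURI::HTTPError => e
      abort "Failed to fetch the page. Status: #{e.message}"
    end
    # Scrape reviews
    reviews = doc.css('.review-item')
    if reviews.any?
      reviews.each_with_index do |review, index|
        reviewer = review.at_css('.reviewer-name')&.text&.strip || 'Anonymous'
        comment = review.at_css('.review-text')&.text&.strip || 'No comment'
        puts "Review #{index + 1}: #{reviewer} - #{comment}"
      end
    else
      puts "No reviews found."
    end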
    
  • 4 Replies
  • Shakti Siria

    Member
    12/18/2024 at 10:27 am

    Python’s BeautifulSoup handles static HTML well, which makes it a good choice for smaller tasks. It only parses the HTML the server returns, though, so JavaScript-rendered content requires pairing it with a tool like Selenium.
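
    A minimal sketch of why static parsing misses JavaScript-rendered reviews, assuming (hypothetically) that the page ships an empty review container which a script fills in later. The markup and the `review-list` id are invented for illustration, and the `review-item` class is taken from the original post rather than from confirmed Tiki markup.

```python
from bs4 import BeautifulSoup

# Hypothetical server response: the review container is empty until
# client-side JavaScript populates it. This is not real Tiki markup.
STATIC_HTML = """
<html><body>
  <div id="review-list"></div>
  <script>/* reviews would be injected here at runtime */</script>
</body></html>
"""

# Hypothetical DOM after a browser has executed the script.
RENDERED_HTML = """
<html><body>
  <div id="review-list">
    <div class="review-item"><p class="review-text">Good product</p></div>
  </div>
</body></html>
"""

def count_reviews(html: str) -> int:
    """Count elements carrying the review-item class in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.find_all("div", class_="review-item"))

static_count = count_reviews(STATIC_HTML)
rendered_count = count_reviews(RENDERED_HTML)
print(static_count, rendered_count)
```

    In practice the rendered HTML would come from a browser automation tool (for example Selenium's `driver.page_source`) rather than from `requests`, and the same BeautifulSoup code could then parse it.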

  • Lilla Roma

    Member
    12/21/2024 at 6:00 am

    Ruby’s Nokogiri is simple and intuitive for scraping static content, but like BeautifulSoup it does not execute JavaScript, so JavaScript-heavy pages or dynamic content require an additional tool such as Watir.

  • Rayan Todorka

    Member
    12/21/2024 at 6:34 am

    Both the Python and Ruby scripts would need extending to handle paginated reviews. By following the “Next Page” link until none remains, they could collect reviews across multiple pages for a more comprehensive dataset.
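
    A rough sketch of that pagination loop, assuming each page exposes the URL of the next page (or nothing on the last page). The `fetch_page` callable and the in-memory `FAKE_SITE` are hypothetical stand-ins; in a real script `fetch_page` would download and parse a page the way the original scripts do.

```python
from typing import Callable, Iterator, Optional, Tuple

# A page is modelled as (reviews on this page, URL of the next page or None).
Page = Tuple[list, Optional[str]]

def iter_reviews(start_url: str,
                 fetch_page: Callable[[str], Page],
                 max_pages: int = 50) -> Iterator[str]:
    """Yield reviews page by page until there is no next page.

    max_pages and the seen set are safety nets so that a "next" link
    pointing back to an earlier page cannot loop forever.
    """
    url: Optional[str] = start_url
    seen = set()
    pages = 0
    while url is not None and url not in seen and pages < max_pages:
        seen.add(url)
        reviews, next_url = fetch_page(url)
        yield from reviews
        url = next_url
        pages += 1

# Hypothetical in-memory site standing in for real HTTP responses.
FAKE_SITE = {
    "/reviews?page=1": (["Great", "Fast delivery"], "/reviews?page=2"),
    "/reviews?page=2": (["Average"], "/reviews?page=3"),
    "/reviews?page=3": (["Would buy again"], None),
}

all_reviews = list(iter_reviews("/reviews?page=1", FAKE_SITE.__getitem__))
print(all_reviews)
```

    Passing the fetcher in as a parameter keeps the loop independent of the HTTP and parsing details, so the same logic applies whether the pages come from `requests`, a browser, or a test fixture.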

  • Margery Roxana

    Member
    12/21/2024 at 6:52 am

    For large-scale scraping, Python scales more easily thanks to its rich ecosystem of libraries and frameworks: Scrapy, for example, handles concurrent requests, retries, and throttling out of the box. Ruby, while powerful, may require more manual effort for advanced scraping tasks involving concurrency.
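
    As a small illustration of the concurrency point, Python's standard library alone can fetch several pages in parallel. This is a sketch under assumptions: `fetch_reviews` is a hypothetical stub that sleeps instead of making a network call, and the URLs are placeholders.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_reviews(url: str) -> list:
    """Hypothetical stub: pretend to download and parse one review page."""
    time.sleep(0.05)  # stands in for network latency
    return [f"review from {url}"]

def fetch_all(urls, max_workers: int = 4) -> list:
    """Fetch pages concurrently and return reviews in the order of urls."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order even if pages finish out of order.
        pages = list(pool.map(fetch_reviews, urls))
    return [review for page in pages for review in page]

urls = [f"https://example.com/reviews?page={n}" for n in range(1, 5)]
results = fetch_all(urls)
print(results)
```

    Threads suit this workload because the time is spent waiting on the network rather than on computation; `max_workers` caps how many requests are in flight at once.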
