Forum Replies Created

  • To scrape product details from JD Central Thailand, you’ll need to target the HTML elements containing product names, prices, and availability. JD Central’s product listings typically expose this structured data through specific class attributes, which Scrapy can extract with CSS or XPath selectors. Handling pagination correctly is essential for scraping products across multiple pages; Scrapy’s response.follow() method makes it straightforward to follow the “next page” link. Note that the class names below are illustrative — verify them against the live page markup before running the spider.

    import scrapy


    class JDSpider(scrapy.Spider):
        name = 'jd_spider'
        start_urls = ['https://www.jd.co.th/th/search?query=laptop']

        def parse(self, response):
            # Each product card is assumed to live in a div.product-item container
            for product in response.xpath('//div[@class="product-item"]'):
                title = product.xpath('.//div[@class="product-name"]/text()').get()
                price = product.xpath('.//span[@class="product-price"]/text()').get()
                availability = product.xpath('.//span[@class="product-availability"]/text()').get()
                yield {
                    # Guard against missing fields so .strip() never runs on None
                    'title': title.strip() if title else None,
                    'price': price.strip() if price else None,
                    'availability': availability.strip() if availability else None,
                }
            # Follow the "next page" link, if present, with this same callback
            next_page = response.xpath('//a[@class="next"]/@href').get()
            if next_page:
                yield response.follow(next_page, self.parse)
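
    Because `.get()` returns None whenever a selector matches nothing, the whitespace cleanup is worth factoring into a small helper. This is a minimal sketch (the `clean` name is mine, not part of Scrapy) that keeps the spider from crashing when a field is missing from a product card:

```python
def clean(text):
    """Strip surrounding whitespace from a scraped field, tolerating None.

    Scrapy's .get() returns None when a selector matches nothing, so
    calling .strip() directly on the result would raise AttributeError.
    """
    return text.strip() if text is not None else None
```

    Inside the spider you would then write `'title': clean(title)` and so on, which keeps the `yield` dict readable as more fields are added.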
    
  • The Central Thailand website may require special handling when scraping product prices and availability because of the dynamic nature of its e-commerce platform. With Scrapy you can extract product details using CSS selectors while dealing with issues such as pagination and dynamically loaded content (heavily JavaScript-rendered pages may need a rendering layer such as scrapy-playwright rather than plain Scrapy). One key step is identifying the correct selector for the price and availability fields, since these are often found in separate span tags or in attributes like data-availability. Following the “next” link with response.follow() lets the spider scrape multiple pages of products automatically.

    import scrapy


    class CentralSpider(scrapy.Spider):
        name = 'central_spider'
        start_urls = ['https://www.central.co.th/en/collections']

        def parse(self, response):
            # Each product card is assumed to be a div.product container
            for product in response.css('div.product'):
                title = product.css('.product-title::text').get()
                price = product.css('.product-price span::text').get()
                availability = product.css('.availability::text').get()
                yield {
                    # Guard against missing fields so .strip() never runs on None
                    'product': title.strip() if title else None,
                    'price': price.strip() if price else None,
                    'availability': availability.strip() if availability else None,
                }
            # Handle pagination: follow the "next" link with the same callback
            next_page = response.css('a.pagination-next::attr(href)').get()
            if next_page:
                yield response.follow(next_page, self.parse)
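
    The raw price text is rarely usable as-is; a small normalizer that turns strings like “฿12,345.00” into a number makes downstream filtering and sorting much easier. This is a sketch under the assumption that prices come with a baht sign and comma grouping — adjust the cleanup to match the strings you actually see on the page:

```python
def parse_price(raw):
    """Convert a raw price string such as '฿12,345.00' to a float.

    The baht-prefixed, comma-grouped format is an assumption about the
    site's markup, not a documented guarantee. Returns None for missing
    fields, matching what Scrapy's .get() yields on a failed selector.
    """
    if raw is None:
        return None
    cleaned = raw.strip().lstrip('฿').replace(',', '')
    return float(cleaned)
```

    Calling this in the `yield` dict (`'price': parse_price(price)`) means your exported items carry numeric prices instead of display strings.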