Forum Replies Created

  • To scrape product details from JD Central Thailand, you’ll need to target the HTML elements containing product names, prices, and availability. JD Central’s product listings typically expose this structured data through specific class attributes, which Scrapy can extract with CSS or XPath selectors. Handling pagination correctly is essential for scraping products across multiple pages; Scrapy’s response.follow() method makes it straightforward to follow the “next page” link. Note that the class names below are illustrative — verify them against the live page markup before running the spider.

    import scrapy


    class JDSpider(scrapy.Spider):
        name = 'jd_spider'
        start_urls = ['https://www.jd.co.th/th/search?query=laptop']

        def parse(self, response):
            # Each product card is assumed to live in a div.product-item container
            for product in response.xpath('//div[@class="product-item"]'):
                title = product.xpath('.//div[@class="product-name"]/text()').get()
                price = product.xpath('.//span[@class="product-price"]/text()').get()
                availability = product.xpath('.//span[@class="product-availability"]/text()').get()
                yield {
                    # Guard against missing fields so .strip() never runs on None
                    'title': title.strip() if title else None,
                    'price': price.strip() if price else None,
                    'availability': availability.strip() if availability else None,
                }
            # Follow the "next page" link, if present, with this same callback
            next_page = response.xpath('//a[@class="next"]/@href').get()
            if next_page:
                yield response.follow(next_page, self.parse)
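
    Because `.get()` returns None whenever a selector matches nothing, the whitespace cleanup is worth factoring into a small helper. This is a minimal sketch (the `clean` name is mine, not part of Scrapy) that keeps the spider from crashing when a field is missing from a product card:

```python
def clean(text):
    """Strip surrounding whitespace from a scraped field, tolerating None.

    Scrapy's .get() returns None when a selector matches nothing, so
    calling .strip() directly on the result would raise AttributeError.
    """
    return text.strip() if text is not None else None
```

    Inside the spider you would then write `'title': clean(title)` and so on, which keeps the `yield` dict readable as more fields are added.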
    
  • The Central Thailand website may require special handling when scraping product prices and availability because of the dynamic nature of its e-commerce platform. With Scrapy you can extract product details using CSS selectors while dealing with issues such as pagination and dynamically loaded content (heavily JavaScript-rendered pages may need a rendering layer such as scrapy-playwright rather than plain Scrapy). One key step is identifying the correct selector for the price and availability fields, since these are often found in separate span tags or in attributes like data-availability. Following the “next” link with response.follow() lets the spider scrape multiple pages of products automatically.

    import scrapy


    class CentralSpider(scrapy.Spider):
        name = 'central_spider'
        start_urls = ['https://www.central.co.th/en/collections']

        def parse(self, response):
            # Each product card is assumed to be a div.product container
            for product in response.css('div.product'):
                title = product.css('.product-title::text').get()
                price = product.css('.product-price span::text').get()
                availability = product.css('.availability::text').get()
                yield {
                    # Guard against missing fields so .strip() never runs on None
                    'product': title.strip() if title else None,
                    'price': price.strip() if price else None,
                    'availability': availability.strip() if availability else None,
                }
            # Handle pagination: follow the "next" link with the same callback
            next_page = response.css('a.pagination-next::attr(href)').get()
            if next_page:
                yield response.follow(next_page, self.parse)
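
    The raw price text is rarely usable as-is; a small normalizer that turns strings like “฿12,345.00” into a number makes downstream filtering and sorting much easier. This is a sketch under the assumption that prices come with a baht sign and comma grouping — adjust the cleanup to match the strings you actually see on the page:

```python
def parse_price(raw):
    """Convert a raw price string such as '฿12,345.00' to a float.

    The baht-prefixed, comma-grouped format is an assumption about the
    site's markup, not a documented guarantee. Returns None for missing
    fields, matching what Scrapy's .get() yields on a failed selector.
    """
    if raw is None:
        return None
    cleaned = raw.strip().lstrip('฿').replace(',', '')
    return float(cleaned)
```

    Calling this in the `yield` dict (`'price': parse_price(price)`) means your exported items carry numeric prices instead of display strings.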