Forum Replies Created

  • JD Central Thailand loads product information dynamically with JavaScript, so a plain Scrapy request may only return the page skeleton without the product data. Scrapy excels at static HTML, but for JavaScript-rendered pages you will likely need to pair it with a rendering service such as Splash (via the scrapy-splash middleware) or call the site’s underlying AJAX endpoints directly. Once you have rendered HTML, XPath selectors make it straightforward to extract product data such as name, price, and availability from JD Central’s markup.

    import scrapy

    class JDSpider(scrapy.Spider):
        name = 'jd_spider'
        start_urls = ['https://www.jd.co.th/th/products']

        def parse(self, response):
            # Each product card holds the name, price, and stock status.
            for product in response.xpath('//div[@class="product-card"]'):
                name = product.xpath('.//h3[@class="product-name"]/text()').get()
                price = product.xpath('.//span[@class="price"]/text()').get()
                availability = product.xpath('.//span[@class="availability-status"]/text()').get()
                yield {
                    # .get() returns None when a selector matches nothing,
                    # so guard before calling .strip() to avoid a crash.
                    'name': name.strip() if name else None,
                    'price': price.strip() if price else None,
                    'availability': availability.strip() if availability else None,
                }
            # Follow pagination until no "next" link remains.
            next_page = response.xpath('//a[@class="next-page"]/@href').get()
            if next_page:
                yield response.follow(next_page, self.parse)
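    The Splash integration mentioned above is typically wired in with the scrapy-splash package. Here is a minimal sketch, assuming a Splash instance is running locally on port 8050; the middleware order values follow the scrapy-splash README, while the spider name, wait time, and selector are illustrative:

```python
# Sketch: rendering JavaScript pages through Splash before parsing.
# Assumes `pip install scrapy-splash` and a Splash server at localhost:8050
# (e.g. started with: docker run -p 8050:8050 scrapinghub/splash).
import scrapy
from scrapy_splash import SplashRequest

class JDSplashSpider(scrapy.Spider):
    name = 'jd_splash_spider'

    custom_settings = {
        'SPLASH_URL': 'http://localhost:8050',
        'DOWNLOADER_MIDDLEWARES': {
            'scrapy_splash.SplashCookiesMiddleware': 723,
            'scrapy_splash.SplashMiddleware': 725,
            'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
        },
        'SPIDER_MIDDLEWARES': {
            'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
        },
        'DUPEFILTER_CLASS': 'scrapy_splash.SplashAwareDupeFilter',
    }

    def start_requests(self):
        # SplashRequest routes the fetch through Splash, which executes
        # the page's JavaScript before returning the rendered HTML.
        yield SplashRequest(
            'https://www.jd.co.th/th/products',
            self.parse,
            args={'wait': 2},  # seconds to let AJAX calls finish
        )

    def parse(self, response):
        # response now contains the rendered DOM, so the same XPath
        # selectors used for static pages apply.
        for product in response.xpath('//div[@class="product-card"]'):
            yield {'name': product.xpath('.//h3[@class="product-name"]/text()').get()}
```

    Splash adds latency per request, so if you can locate the JSON endpoints the page calls via AJAX (visible in the browser's network tab), hitting those directly with plain Scrapy requests is usually faster.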
    
  • When scraping Central Thailand’s product pages, Scrapy makes it straightforward to extract prices and availability. The price is typically found within a specific HTML tag, often a span or div, while availability is usually indicated by classes such as in-stock or out-of-stock. One challenge is that some products show different availability statuses depending on geographic location, so it pays to check for these variations. Scrapy’s ability to crawl large numbers of pages concurrently makes it an excellent tool for this task.

    import scrapy

    class CentralProductScraper(scrapy.Spider):
        name = 'central_product_scraper'
        start_urls = ['https://www.central.co.th/en/shop/category']

        def parse(self, response):
            for product in response.css('div.product-listing'):
                title = product.css('h2.product-name::text').get()
                price = product.css('span.product-price span.price::text').get()
                availability = product.css('div.availability-status::text').get()
                yield {
                    # .get() returns None on no match, so guard before stripping
                    'title': title.strip() if title else None,
                    'price': price.strip() if price else None,
                    'availability': availability.strip() if availability else None,
                }
            # Pagination handling
            next_page = response.css('a.pagination-next::attr(href)').get()
            if next_page:
                yield response.follow(next_page, self.parse)
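    Because availability labels can vary by region and by page template, it helps to normalize them into a fixed set of values before storing the items. A small sketch; the raw labels handled below are hypothetical examples, not confirmed strings from Central’s markup:

```python
def normalize_availability(raw):
    """Map a raw availability label to 'in_stock', 'out_of_stock', or 'unknown'.

    Handles None (selector matched nothing), surrounding whitespace,
    case differences, and hyphen/underscore variants of the same label.
    """
    if raw is None:
        return 'unknown'
    text = raw.strip().lower().replace('-', ' ').replace('_', ' ')
    # Check out-of-stock phrases first so "out of stock" is not
    # mistaken for an in-stock match.
    if 'out of stock' in text or 'sold out' in text:
        return 'out_of_stock'
    if 'in stock' in text or 'available' in text:
        return 'in_stock'
    return 'unknown'
```

    You can then yield `'availability': normalize_availability(availability)` instead of the raw string, which keeps downstream filtering and comparison across regions simple.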