JD Central Thailand loads much of its product information dynamically with JavaScript and AJAX, so Scrapy's default downloader won't always see the fully rendered markup. Scrapy excels at scraping static HTML, but for pages that require JavaScript rendering you may need to pair it with a rendering tool such as Splash, wired in through Scrapy's downloader middleware settings. Once you have the rendered HTML, XPath selectors make it straightforward to extract product data such as name, price, and availability from JD Central's page structure.
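If the product grid only appears after JavaScript runs, one common option is the scrapy-splash middleware. A minimal `settings.py` sketch, assuming a Splash instance is running locally on port 8050 (the URL and priorities follow scrapy-splash's documented defaults, not anything specific to JD Central):

```python
# settings.py -- scrapy-splash wiring (assumes Splash at localhost:8050)
SPLASH_URL = 'http://localhost:8050'

DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
```

With this in place, the spider issues `SplashRequest` objects (`from scrapy_splash import SplashRequest`) instead of plain requests wherever a page needs rendering.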
import scrapy


class JDSpider(scrapy.Spider):
    name = 'jd_spider'
    start_urls = ['https://www.jd.co.th/th/products']

    def parse(self, response):
        # Class names below reflect JD Central's current markup and may change.
        for product in response.xpath('//div[@class="product-card"]'):
            name = product.xpath('.//h3[@class="product-name"]/text()').get()
            price = product.xpath('.//span[@class="price"]/text()').get()
            availability = product.xpath('.//span[@class="availability-status"]/text()').get()
            yield {
                # .get() returns None when a node is missing, so guard before stripping
                'name': name.strip() if name else None,
                'price': price.strip() if price else None,
                'availability': availability.strip() if availability else None,
            }

        # Follow pagination links until no "next" link remains
        next_page = response.xpath('//a[@class="next-page"]/@href').get()
        if next_page:
            yield response.follow(next_page, self.parse)
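Because `.get()` returns `None` when an XPath matches nothing, and calling `.strip()` on `None` raises `AttributeError`, it can help to normalize fields defensively before yielding them. A small hypothetical helper (`clean_field` is not part of Scrapy, just a convenience for this spider):

```python
def clean_field(value, default=None):
    """Strip whitespace from a scraped string, tolerating missing values.

    XPath .get() yields None when no node matches; this guard avoids
    crashing on partially rendered or restructured product cards.
    """
    return value.strip() if value else default
```

In the item dict this reads as, e.g., `'name': clean_field(name, default='unknown')`, which keeps the pipeline running even when JD Central changes a class name and a selector comes back empty.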