Forum Replies Created

  • JD Central Thailand loads product information dynamically with JavaScript, so a plain Scrapy request may only return the page skeleton without the product data. Scrapy excels at static HTML, but for JavaScript-rendered pages you will likely need to pair it with a rendering service such as Splash (via the scrapy-splash middleware) or call the site’s underlying AJAX endpoints directly. Once you have rendered HTML, XPath selectors make it straightforward to extract product data such as name, price, and availability from JD Central’s markup.

    import scrapy

    class JDSpider(scrapy.Spider):
        name = 'jd_spider'
        start_urls = ['https://www.jd.co.th/th/products']

        def parse(self, response):
            # Each product card holds the name, price, and stock status.
            for product in response.xpath('//div[@class="product-card"]'):
                name = product.xpath('.//h3[@class="product-name"]/text()').get()
                price = product.xpath('.//span[@class="price"]/text()').get()
                availability = product.xpath('.//span[@class="availability-status"]/text()').get()
                yield {
                    # .get() returns None when a selector matches nothing,
                    # so guard before calling .strip() to avoid a crash.
                    'name': name.strip() if name else None,
                    'price': price.strip() if price else None,
                    'availability': availability.strip() if availability else None,
                }
            # Follow pagination until no "next" link remains.
            next_page = response.xpath('//a[@class="next-page"]/@href').get()
            if next_page:
                yield response.follow(next_page, self.parse)
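    The Splash integration mentioned above is typically wired in with the scrapy-splash package. Here is a minimal sketch, assuming a Splash instance is running locally on port 8050; the middleware order values follow the scrapy-splash README, while the spider name, wait time, and selector are illustrative:

```python
# Sketch: rendering JavaScript pages through Splash before parsing.
# Assumes `pip install scrapy-splash` and a Splash server at localhost:8050
# (e.g. started with: docker run -p 8050:8050 scrapinghub/splash).
import scrapy
from scrapy_splash import SplashRequest

class JDSplashSpider(scrapy.Spider):
    name = 'jd_splash_spider'

    custom_settings = {
        'SPLASH_URL': 'http://localhost:8050',
        'DOWNLOADER_MIDDLEWARES': {
            'scrapy_splash.SplashCookiesMiddleware': 723,
            'scrapy_splash.SplashMiddleware': 725,
            'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
        },
        'SPIDER_MIDDLEWARES': {
            'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
        },
        'DUPEFILTER_CLASS': 'scrapy_splash.SplashAwareDupeFilter',
    }

    def start_requests(self):
        # SplashRequest routes the fetch through Splash, which executes
        # the page's JavaScript before returning the rendered HTML.
        yield SplashRequest(
            'https://www.jd.co.th/th/products',
            self.parse,
            args={'wait': 2},  # seconds to let AJAX calls finish
        )

    def parse(self, response):
        # response now contains the rendered DOM, so the same XPath
        # selectors used for static pages apply.
        for product in response.xpath('//div[@class="product-card"]'):
            yield {'name': product.xpath('.//h3[@class="product-name"]/text()').get()}
```

    Splash adds latency per request, so if you can locate the JSON endpoints the page calls via AJAX (visible in the browser's network tab), hitting those directly with plain Scrapy requests is usually faster.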
    
  • When scraping Central Thailand’s product pages, Scrapy makes it straightforward to extract prices and availability. The price is typically found within a specific HTML tag, often a span or div, while availability is usually indicated by classes such as in-stock or out-of-stock. One challenge is that some products show different availability statuses depending on geographic location, so it pays to check for these variations. Scrapy’s ability to crawl large numbers of pages concurrently makes it an excellent tool for this task.

    import scrapy

    class CentralProductScraper(scrapy.Spider):
        name = 'central_product_scraper'
        start_urls = ['https://www.central.co.th/en/shop/category']

        def parse(self, response):
            for product in response.css('div.product-listing'):
                title = product.css('h2.product-name::text').get()
                price = product.css('span.product-price span.price::text').get()
                availability = product.css('div.availability-status::text').get()
                yield {
                    # .get() returns None on no match, so guard before stripping
                    'title': title.strip() if title else None,
                    'price': price.strip() if price else None,
                    'availability': availability.strip() if availability else None,
                }
            # Pagination handling
            next_page = response.css('a.pagination-next::attr(href)').get()
            if next_page:
                yield response.follow(next_page, self.parse)
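    Because availability labels can vary by region and by page template, it helps to normalize them into a fixed set of values before storing the items. A small sketch; the raw labels handled below are hypothetical examples, not confirmed strings from Central’s markup:

```python
def normalize_availability(raw):
    """Map a raw availability label to 'in_stock', 'out_of_stock', or 'unknown'.

    Handles None (selector matched nothing), surrounding whitespace,
    case differences, and hyphen/underscore variants of the same label.
    """
    if raw is None:
        return 'unknown'
    text = raw.strip().lower().replace('-', ' ').replace('_', ' ')
    # Check out-of-stock phrases first so "out of stock" is not
    # mistaken for an in-stock match.
    if 'out of stock' in text or 'sold out' in text:
        return 'out_of_stock'
    if 'in stock' in text or 'available' in text:
        return 'in_stock'
    return 'unknown'
```

    You can then yield `'availability': normalize_availability(availability)` instead of the raw string, which keeps downstream filtering and comparison across regions simple.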