Scrape product availability and price from Central Thailand’s e-commerce site using Python and Scrapy

  • Scrape product availability and price from Central Thailand’s e-commerce site using Python and Scrapy

    Posted by Ketut Hippolytos on 12/11/2024 at 10:59 am

    Scrapy works well for extracting structured data such as product availability and price from Central Thailand’s e-commerce website. Start by navigating to the specific product category page and inspecting the HTML to locate the price and availability elements, then target those fields with Scrapy’s XPath or CSS selectors. One challenge when scraping availability is handling items that are not listed as “in stock” but are available via special request. Another is paginating through every product page so the dataset is complete.

    import scrapy

    class CentralEcommerceSpider(scrapy.Spider):
        name = 'central_ecommerce'
        start_urls = ['https://www.central.co.th/en/product-category']

        def parse(self, response):
            for product in response.css('div.product-item'):
                # get(default='') keeps strip() from failing when a field is missing
                yield {
                    'title': product.css('div.product-name::text').get(default='').strip(),
                    'price': product.css('span.product-price::text').get(default='').strip(),
                    'availability': product.css('span.availability-status::text').get(default='').strip(),
                }
            # Pagination handling
            next_page = response.css('a.next-page::attr(href)').get()
            if next_page:
                yield response.follow(next_page, self.parse)
    
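    The “available via special request” case can be folded into a small normalization step so downstream analysis sees a consistent vocabulary. A minimal sketch; the label strings below are assumptions about what the site renders, not confirmed values:

    ```python
    def normalize_availability(label):
        """Map a raw availability label to a small, consistent vocabulary.

        The specific label strings checked here are assumptions; verify them
        against the text the site actually renders.
        """
        if label is None:
            return 'unknown'
        text = label.strip().lower()
        # Check "special request" before "available", since a label like
        # "Available via Special Request" contains both phrases.
        if 'special request' in text or 'pre-order' in text:
            return 'special_request'
        if 'in stock' in text or 'available' in text:
            return 'in_stock'
        if 'out of stock' in text or 'sold out' in text:
            return 'out_of_stock'
        return 'unknown'
    ```

    In the spider, you could yield both the raw string and the normalized value, so unexpected labels can still be audited later.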
    Gerel Tomislav replied 1 week, 2 days ago · 5 Members · 4 Replies
  • 4 Replies
  • Lalitha Kreka

    Member
    12/12/2024 at 8:07 am

    The Central Thailand website may require special handling when scraping product prices and availability because of the dynamic nature of its e-commerce platform. With Scrapy you can extract product details using selectors while dealing with issues like pagination and dynamic content loading. One key step is identifying the correct XPath for the price and availability fields, since these are often found in separate span tags or inside attributes like data-availability. Following the next-page link with response.follow lets you scrape multiple pages of products automatically.

    import scrapy

    class CentralSpider(scrapy.Spider):
        name = 'central_spider'
        start_urls = ['https://www.central.co.th/en/collections']

        def parse(self, response):
            for product in response.css('div.product'):
                # get(default='') guards against missing fields returning None
                yield {
                    'product': product.css('.product-title::text').get(default='').strip(),
                    'price': product.css('.product-price span::text').get(default='').strip(),
                    'availability': product.css('.availability::text').get(default='').strip(),
                }
            # Handle pagination
            next_page = response.css('a.pagination-next::attr(href)').get()
            if next_page:
                yield response.follow(next_page, self.parse)
    
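    Since availability sometimes lives in an attribute like data-availability rather than in visible text, a small fallback helper keeps the spider tidy. A sketch (the attribute name is an assumption); in the spider you would pass it `product.css('.availability::text').get()` and `product.css('::attr(data-availability)').get()`:

    ```python
    def pick_availability(text_value, attr_value):
        """Prefer the visible text; fall back to a data-availability
        attribute value; return 'unknown' when both are empty."""
        for candidate in (text_value, attr_value):
            if candidate and candidate.strip():
                return candidate.strip()
        return 'unknown'
    ```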
  • Elisavet Jordana

    Member
    12/12/2024 at 8:43 am

    To scrape product availability and prices from Central Thailand’s e-commerce site, Scrapy is well-suited for the task, particularly when dealing with structured HTML pages. By using Scrapy’s XPath selectors, you can navigate the page and extract information such as product price, availability status, and product titles. The page might use JavaScript to load certain content, so a potential challenge here is ensuring that all the content is loaded before scraping. You may need to configure Scrapy’s download delay to mimic human behavior and prevent rate-limiting from the server.

    import scrapy

    class CentralScraperSpider(scrapy.Spider):
        name = 'central_scraper'
        start_urls = ['https://www.central.co.th/en/product-category']

        def parse(self, response):
            for product in response.xpath('//div[@class="product-item"]'):
                # get(default='') guards against missing fields returning None
                yield {
                    'product': product.xpath('.//div[@class="product-name"]/text()').get(default='').strip(),
                    'price': product.xpath('.//span[@class="price"]/text()').get(default='').strip(),
                    'availability': product.xpath('.//span[@class="in-stock"]/text()').get(default='').strip(),
                }
            # Handle pagination
            next_page = response.xpath('//a[contains(@class, "pagination-next")]/@href').get()
            if next_page:
                yield response.follow(next_page, self.parse)
    
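    The download-delay idea maps onto Scrapy’s built-in throttling settings. A sketch with illustrative, untuned values; they can go in settings.py or a spider’s custom_settings dict:

    ```python
    # Illustrative throttling settings; the numbers are untuned assumptions.
    THROTTLE_SETTINGS = {
        'DOWNLOAD_DELAY': 1.5,                # base delay between requests (seconds)
        'RANDOMIZE_DOWNLOAD_DELAY': True,     # jitter the delay (0.5x to 1.5x)
        'AUTOTHROTTLE_ENABLED': True,         # adapt the delay to server latency
        'AUTOTHROTTLE_START_DELAY': 1.0,
        'AUTOTHROTTLE_MAX_DELAY': 10.0,
        'CONCURRENT_REQUESTS_PER_DOMAIN': 2,  # keep parallelism modest
    }
    ```

    AutoThrottle adjusts the delay from observed response times, which is usually gentler on the server than a fixed delay alone.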
  • Aston Martial

    Member
    12/12/2024 at 10:35 am

    When scraping Central Thailand’s product pages, Scrapy offers an easy way to extract prices and availability. The product price is typically found within a specific HTML tag, often within a span or div. Availability might be indicated with classes like in-stock or out-of-stock. One challenge here is that some products may show different availability statuses based on geographic location, so checking for these variations can be key. Scrapy’s ability to handle large amounts of data from multiple pages makes it an excellent tool for this task.

    import scrapy

    class CentralProductScraper(scrapy.Spider):
        name = 'central_product_scraper'
        start_urls = ['https://www.central.co.th/en/shop/category']

        def parse(self, response):
            for product in response.css('div.product-listing'):
                # get(default='') keeps strip() from failing when a field is missing
                yield {
                    'title': product.css('h2.product-name::text').get(default='').strip(),
                    'price': product.css('span.product-price span.price::text').get(default='').strip(),
                    'availability': product.css('div.availability-status::text').get(default='').strip(),
                }
            # Pagination handling
            next_page = response.css('a.pagination-next::attr(href)').get()
            if next_page:
                yield response.follow(next_page, self.parse)
    
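    To control for location-dependent availability, one option is to pin the storefront to a single region on every request so statuses are comparable across runs. A sketch; the store_region cookie name is a hypothetical placeholder that would need to be read from the real site’s cookies:

    ```python
    def region_request_kwargs(region_code):
        """Build keyword arguments for scrapy.Request that pin the
        storefront to one region. The cookie name is hypothetical."""
        return {
            'cookies': {'store_region': region_code},
            'headers': {'Accept-Language': 'en-TH'},
        }
    ```

    In a spider this would be used as `yield scrapy.Request(url, callback=self.parse, **region_request_kwargs('BKK'))`.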
  • Gerel Tomislav

    Member
    12/13/2024 at 6:35 am

    Scraping Central Thailand’s site requires attention to both static and dynamic content, especially for prices and product availability. Scrapy excels at parsing static content, but for more complex sites with dynamic loading you need to make sure you are targeting the correct elements. Extracting prices and availability can require additional logic to account for out-of-stock items or sale prices. It is also important to handle pagination so you capture every product in a category.

    import scrapy

    class CentralPriceScraper(scrapy.Spider):
        name = 'central_price_scraper'
        start_urls = ['https://www.central.co.th/en/shop/']

        def parse(self, response):
            for product in response.xpath('//div[@class="product-item"]'):
                # get(default='') guards against missing fields returning None
                yield {
                    'product': product.xpath('.//h2[@class="product-title"]/text()').get(default='').strip(),
                    'price': product.xpath('.//span[@class="price"]/text()').get(default='').strip(),
                    'availability': product.xpath('.//div[@class="availability-status"]/text()').get(default='').strip(),
                }
            # Follow pagination
            next_page = response.xpath('//a[contains(@class, "next-page")]/@href').get()
            if next_page:
                yield response.follow(next_page, self.parse)
    
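    The additional logic for out-of-stock items and sale prices mostly comes down to tolerating rows where the price is missing or wrapped in currency text. A sketch of a tolerant parser, assuming prices render in a format like “฿1,299.00”:

    ```python
    import re

    def parse_price(raw):
        """Extract a numeric price from a raw string such as '฿1,299.00'.

        Returns None for empty or non-numeric input (e.g. out-of-stock rows),
        so callers can distinguish 'no price' from a real zero.
        """
        if not raw:
            return None
        match = re.search(r'[\d,]+(?:\.\d+)?', raw)
        if not match:
            return None
        return float(match.group(0).replace(',', ''))
    ```

    Storing the parsed number alongside the raw string makes it easy to spot rows where the format assumption did not hold.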
