
  • What data can be extracted from MarksandSpencer.com using Python?

    Posted by Jacinda Thilini on 12/21/2024 at 11:57 am

    Scraping MarksandSpencer.com with Python lets you gather details such as product names, prices, and availability across categories like clothing, food, and homeware. Marks & Spencer is a leading UK retailer with a wide product range, which makes it a useful source for market research. Python lets you automate the collection of this data. The first step is to inspect the HTML structure of the product pages and identify the relevant elements, such as product titles and price tags. With the requests library to send HTTP requests and the standard library's html.parser module to parse the HTML, you can build a working scraper.
    One challenge when scraping Marks & Spencer is the size of the site, which is split into many categories and subcategories. Automating navigation across them lets the scraper capture a more complete dataset. Collecting customer reviews and ratings alongside product data adds depth to the analysis. Below is an example Python script for scraping product data from MarksandSpencer.com.
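    As a rough aid for the inspection step, the sketch below counts how often each tag and class combination appears in a page, since repeated combinations often mark product cards. The names TagClassCounter and summarize_markup and the sample markup are invented for illustration; they are not taken from the Marks & Spencer site.

```python
from collections import Counter
from html.parser import HTMLParser


class TagClassCounter(HTMLParser):
    """Count (tag, class) pairs so repeated product markup stands out."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # A class attribute with no value arrives as None.
        class_value = dict(attrs).get("class") or ""
        for class_name in class_value.split():
            self.counts[(tag, class_name)] += 1


def summarize_markup(html, top=10):
    """Return the `top` most frequent (tag, class) pairs in `html`."""
    counter = TagClassCounter()
    counter.feed(html)
    return counter.counts.most_common(top)


# Invented sample markup, for illustration only.
SAMPLE_HTML = """
<div class="product-card"><h2 class="product-title">Cotton Shirt</h2>
<span class="price">25.00</span></div>
<div class="product-card"><h2 class="product-title">Wool Jumper</h2>
<span class="price">45.00</span></div>
"""

for (tag, class_name), count in summarize_markup(SAMPLE_HTML):
    print(f"{tag}.{class_name}: {count}")
```

    Running the same function on a saved copy of a real listing page should show which tag and class pairs repeat once per product, and those are the candidates to target in the parser.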

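    import requests
    from html.parser import HTMLParser


    class MarksAndSpencerParser(HTMLParser):
        """Collect product names and prices from listing-page HTML.

        The tag and class names checked below are placeholders; confirm them
        against the live page markup before relying on the output.
        """

        def __init__(self):
            super().__init__()
            self.in_product_name = False
            self.in_product_price = False
            self.products = []
            self.current_product = {}

        def handle_starttag(self, tag, attrs):
            # A class attribute with no value comes through as None.
            css_class = dict(attrs).get("class") or ""
            if tag == "h2" and "product-title" in css_class:
                self.in_product_name = True
            if tag == "span" and "price" in css_class:
                self.in_product_price = True

        def handle_endtag(self, tag):
            if self.in_product_name and tag == "h2":
                self.in_product_name = False
            if self.in_product_price and tag == "span":
                self.in_product_price = False

        def handle_data(self, data):
            text = data.strip()
            if not text:
                return  # skip whitespace-only text nodes
            if self.in_product_name:
                self.current_product["name"] = text
            if self.in_product_price:
                self.current_product["price"] = text
                # A price closes the record; drop prices with no preceding name.
                if "name" in self.current_product:
                    self.products.append(self.current_product)
                self.current_product = {}


    # Replace with a category or search-results URL that lists products.
    url = "https://www.marksandspencer.com/"
    # The timeout stops the script from hanging on a stalled connection.
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    parser = MarksAndSpencerParser()
    parser.feed(response.text)
    for product in parser.products:
        print(f"Product: {product['name']}, Price: {product['price']}")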
    

    This script extracts product names and prices from the page it fetches, provided the tag and class names it checks match the live markup. Extending the parser to capture stock levels or review data would give a fuller picture of each product. Adding random delays between requests spreads the load on the server and makes the scraper less likely to be rate-limited or blocked.
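    The random delays mentioned above could be sketched as follows. This is a minimal example rather than a tested crawler for the site: crawl_with_delays is a hypothetical helper, the delay bounds are arbitrary, and the fetch and sleep arguments are passed in so the pacing logic can be tried without touching the network.

```python
import random
import time


def crawl_with_delays(urls, fetch, min_delay=1.0, max_delay=3.0, sleep=time.sleep):
    """Fetch each URL in turn, pausing a random interval between requests.

    `fetch` is any callable that takes a URL and returns the page text.
    `sleep` is injectable so the pacing can be exercised without waiting.
    """
    pages = {}
    for index, url in enumerate(urls):
        if index > 0:
            # Pause before every request except the first.
            sleep(random.uniform(min_delay, max_delay))
        pages[url] = fetch(url)
    return pages


# Example usage (assumes the requests library; the category paths are
# placeholders, not real Marks & Spencer URLs):
#
# import requests
# category_urls = [
#     "https://www.marksandspencer.com/<category-path-1>",
#     "https://www.marksandspencer.com/<category-path-2>",
# ]
# pages = crawl_with_delays(
#     category_urls,
#     fetch=lambda u: requests.get(u, timeout=10).text,
# )
```

    Each returned page can then be fed to the parser one at a time, which also covers the multi-category case: the list of URLs is simply the set of category pages you want to visit.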
