To begin, you'll need to install Scrapy in your Python environment. Run the following command in your terminal:
pip install scrapy
This command installs Scrapy along with its dependencies, preparing your environment for building the project.
With Scrapy installed, start a new project by running:
scrapy startproject myproject
cd myproject
scrapy genspider example example.com
This creates the basic folder structure for your Scrapy project, and the genspider command initializes your first spider, here named "example" and pointed at the example.com domain. Spiders are essential in Scrapy, as they define how Scrapy should navigate a website, what data to extract, and how to follow links.
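For reference, startproject generates a layout roughly like this (the exact files may vary slightly between Scrapy versions):

myproject/
    scrapy.cfg            # deploy configuration
    myproject/
        __init__.py
        items.py          # item definitions
        middlewares.py    # spider and downloader middleware
        pipelines.py      # item pipelines
        settings.py       # project settings
        spiders/
            __init__.py
            example.py    # created by genspider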
Open your spider file (example.py), located in the spiders folder. Here, you'll define what to scrape from the target page. A simple example might look like this:
import scrapy

class ExampleSpider(scrapy.Spider):
    name = "example"
    start_urls = ['http://example.com']

    def parse(self, response):
        # Extract the text of the first <h1> element on the page.
        title = response.css('h1::text').get()
        yield {'title': title}
In this example, the spider navigates to the URL, scrapes the <h1> text, and yields it as a dictionary. You can customize this further to target other elements or attributes on the page.
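For instance, here is a sketch of an expanded parse method you could drop into the spider above; the a.next selector is a hypothetical example and depends on the markup of the page you're actually scraping:

def parse(self, response):
    yield {
        'title': response.css('h1::text').get(),
        # getall() returns every match, not just the first.
        'paragraphs': response.css('p::text').getall(),
        # ::attr(...) reads an attribute instead of text content.
        'first_link': response.css('a::attr(href)').get(),
    }
    # Follow a "next page" link, if one exists, and parse it the same way.
    next_page = response.css('a.next::attr(href)').get()
    if next_page is not None:
        yield response.follow(next_page, callback=self.parse)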
Scrapy allows you to export the scraped data in formats such as JSON or CSV, making it easy to analyze later. Run the following command to save your spider's output in a data.json file:
scrapy crawl example -o data.json
This command starts the spider and writes the results into data.json, offering a structured way to store your data. Note that in recent Scrapy versions, -o appends to an existing file, while -O overwrites it.
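If you'd rather not pass -o on every run, Scrapy 2.1 and later also let you configure exports once in settings.py through the FEEDS setting; a minimal sketch:

# settings.py -- export scraped items to JSON and CSV on every crawl.
FEEDS = {
    'data.json': {'format': 'json'},
    'data.csv': {'format': 'csv'},
}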
You can also run the spider without writing an output file, simply by entering:
scrapy crawl example
This will run the spider as defined, scraping the data and storing it according to your configuration. Scrapy handles everything from sending requests to parsing the HTML and saving the output.
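Beyond the command line, you can launch the crawl from a plain Python script using Scrapy's CrawlerProcess. A minimal sketch, assuming the import path matches the project layout created above:

# run_spider.py -- run the spider programmatically.
from scrapy.crawler import CrawlerProcess
from myproject.spiders.example import ExampleSpider

process = CrawlerProcess(settings={
    # Equivalent to passing -o data.json on the command line.
    'FEEDS': {'data.json': {'format': 'json'}},
})
process.crawl(ExampleSpider)
process.start()  # blocks until the crawl finishes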
Congratulations! You've just created and run your first Scrapy project, setting the foundation for more advanced scraping tasks. With Scrapy’s built-in support for data pipelines and storage, you’re well-equipped to manage larger projects and handle more complex data needs.
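As a taste of those pipelines, here is a minimal item pipeline sketch that tidies the scraped title before it is stored; TitleCleanerPipeline is a hypothetical name used for illustration:

# pipelines.py -- a minimal item pipeline sketch.
class TitleCleanerPipeline:
    def process_item(self, item, spider):
        # Normalize whitespace in the scraped title, if one was found.
        if item.get('title'):
            item['title'] = ' '.join(item['title'].split())
        return item

Enable it by registering the class in settings.py:

ITEM_PIPELINES = {
    'myproject.pipelines.TitleCleanerPipeline': 300,
}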
Stay tuned for the next tutorial, where we’ll dive into advanced extraction techniques using CSS Selectors and XPath for more precise data targeting.
Our community is here to support your growth, so why wait? Join now and let’s build together!