Naukri Job Scraper with Python and PostgreSQL: A Comprehensive Guide

In today’s digital age, data is the new oil. Companies and individuals alike are constantly seeking ways to harness the power of data to gain insights and make informed decisions. One of the most valuable sources of data is job listings, which can provide insights into industry trends, skill demands, and employment opportunities. In this article, we will explore how to create a job scraper for Naukri.com using Python and PostgreSQL, two powerful tools that can help you efficiently collect and store job data.

Understanding the Basics of Web Scraping

Web scraping is the process of extracting data from websites. It involves fetching the content of a webpage and parsing it to extract the desired information. Python, with its rich ecosystem of libraries, is a popular choice for web scraping due to its simplicity and versatility.

Before diving into the technical details, it’s important to understand the legal and ethical considerations of web scraping. Always ensure that you comply with a website’s terms of service and robots.txt file, which outlines the rules for web crawlers.
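
As a quick check, Python’s built-in urllib.robotparser can read a site’s robots.txt and report whether a given URL may be crawled. Here is a minimal sketch; the verdict depends on the site’s current rules, so run it yourself before scraping:

import urllib.robotparser

# Download and parse the site's robots.txt rules
parser = urllib.robotparser.RobotFileParser()
parser.set_url('https://www.naukri.com/robots.txt')
parser.read()

# can_fetch() reports whether a given user agent may crawl a URL
print(parser.can_fetch('*', 'https://www.naukri.com/python-jobs'))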

Setting Up Your Python Environment

To get started with web scraping, you’ll need to set up your Python environment. This involves installing the necessary libraries and tools. The primary libraries we’ll use for this project are BeautifulSoup and Requests.

  • BeautifulSoup: A library for parsing HTML and XML documents. It provides Pythonic idioms for iterating, searching, and modifying the parse tree.
  • Requests: A simple and elegant HTTP library for Python, used to send HTTP requests to websites and receive responses.

To install these libraries, you can use pip, the Python package manager:

pip install beautifulsoup4 requests
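
To confirm both libraries are installed correctly, here is a minimal fetch-and-parse sketch, using example.com as a neutral test page:

import requests
from bs4 import BeautifulSoup

# Fetch a page and parse its HTML
response = requests.get('https://example.com')
soup = BeautifulSoup(response.content, 'html.parser')

# Print the page title to confirm both libraries work together
print(soup.title.text)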

Building the Naukri Job Scraper

Now that we have our environment set up, let’s build the Naukri job scraper. The goal is to extract job listings from Naukri.com and store them in a PostgreSQL database for further analysis.

First, we’ll write a Python script to fetch job listings from Naukri.com. We’ll use the Requests library to send an HTTP request to the website and BeautifulSoup to parse the HTML content.

import requests
from bs4 import BeautifulSoup

def fetch_job_listings(url):
    # A browser-like User-Agent makes the request less likely to be blocked
    headers = {'User-Agent': 'Mozilla/5.0 (compatible; JobScraper/1.0)'}
    response = requests.get(url, headers=headers)
    response.raise_for_status()  # stop early on HTTP errors
    soup = BeautifulSoup(response.content, 'html.parser')
    jobs = []

    # The class names below match Naukri's markup at the time of writing
    # and may change; inspect the live page and adjust as needed
    for job_card in soup.find_all('div', class_='jobTuple'):
        title = job_card.find('a', class_='title')
        company = job_card.find('a', class_='subTitle')
        location = job_card.find('li', class_='location')
        # Skip cards missing any expected element instead of crashing
        if title and company and location:
            jobs.append({
                'title': title.text.strip(),
                'company': company.text.strip(),
                'location': location.text.strip()
            })

    return jobs

url = 'https://www.naukri.com/python-jobs'
job_listings = fetch_job_listings(url)
print(job_listings)

Note that Naukri renders much of its listing markup with JavaScript, so a plain HTTP request may return fewer job cards than a browser displays. If the script prints an empty list, inspect the page source and update the selectors, or consider a browser-automation tool such as Selenium.

Storing Data in PostgreSQL

With the job data extracted, the next step is to store it in a PostgreSQL database. PostgreSQL is a powerful, open-source relational database system known for its robustness and scalability.

First, ensure that PostgreSQL is installed on your system. You can download it from the official PostgreSQL website. Once installed, create a new database and table to store the job listings.

CREATE DATABASE job_scraper;

\c job_scraper

CREATE TABLE job_listings (
    id SERIAL PRIMARY KEY,
    title VARCHAR(255),
    company VARCHAR(255),
    location VARCHAR(255)
);

Next, we’ll use the psycopg2 library to connect to the PostgreSQL database and insert the job data. Install psycopg2 using pip (or install psycopg2-binary, which ships precompiled and avoids the need for a local C build toolchain):

pip install psycopg2

Now, let’s write a Python script to insert the job listings into the database:

import psycopg2

def insert_job_listings(jobs):
    # Connection details are placeholders; substitute your own credentials
    conn = psycopg2.connect(
        dbname='job_scraper',
        user='your_username',
        password='your_password',
        host='localhost'
    )
    cursor = conn.cursor()

    # Parameterized queries (%s placeholders) let psycopg2 escape values safely
    for job in jobs:
        cursor.execute(
            "INSERT INTO job_listings (title, company, location) VALUES (%s, %s, %s)",
            (job['title'], job['company'], job['location'])
        )

    conn.commit()
    cursor.close()
    conn.close()

insert_job_listings(job_listings)
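
For larger batches, psycopg2’s executemany sends every row through a single parameterized statement, and using the connection as a context manager handles commit and rollback automatically. Here is a minimal variant of the same function, assuming the same placeholder credentials:

import psycopg2

def insert_job_listings_batch(jobs):
    conn = psycopg2.connect(
        dbname='job_scraper',
        user='your_username',
        password='your_password',
        host='localhost'
    )
    # "with conn" wraps a transaction: commit on success, rollback on error
    with conn:
        with conn.cursor() as cursor:
            # executemany runs the statement once per parameter tuple
            cursor.executemany(
                "INSERT INTO job_listings (title, company, location) VALUES (%s, %s, %s)",
                [(job['title'], job['company'], job['location']) for job in jobs]
            )
    conn.close()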

Conclusion

In this article, we’ve explored how to create a Naukri job scraper using Python and PostgreSQL. By leveraging the power of web scraping and relational databases, you can efficiently collect and store job data for analysis. This project serves as a foundation for more advanced data analysis and visualization tasks, enabling you to gain valuable insights into the job market.
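
As a first taste of that analysis, a simple aggregate query over the stored table shows, for example, which locations have the most openings (same placeholder credentials as above):

import psycopg2

conn = psycopg2.connect(
    dbname='job_scraper',
    user='your_username',
    password='your_password',
    host='localhost'
)
cursor = conn.cursor()

# Count listings per location, most common first
cursor.execute(
    "SELECT location, COUNT(*) FROM job_listings GROUP BY location ORDER BY COUNT(*) DESC"
)
for location, count in cursor.fetchall():
    print(f"{location}: {count}")

cursor.close()
conn.close()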

Remember to always adhere to ethical guidelines and legal requirements when scraping data from websites. With the right tools and techniques, you can unlock the potential of data to drive informed decision-making and stay ahead in the competitive job market.
