Scrape YouTube Comments Using Python: A Step-by-Step Guide


Comments on YouTube videos can provide valuable insights and feedback. In this tutorial, we’ll guide you through building a YouTube comment scraper in Python on top of the official YouTube Data API. You’ll learn how to extract and analyze comments from any public YouTube video that has comments enabled, helping you gather user opinions and sentiment data. Whether you’re a developer, marketer, or researcher, this guide will equip you with the tools to draw insights from YouTube’s vast comment ecosystem.

Why Scrape YouTube Comments?

YouTube comments are a treasure trove of information. Here’s why scraping them might be useful:

  • Audience Analysis: Understand what users think about your content or a competitor’s video.
  • Sentiment Analysis: Gauge the general tone (positive, negative, neutral) of audience feedback.
  • Content Ideas: Extract common questions and feedback to inspire future content.
  • Data Mining: Collect data for academic or market research.

Tools You’ll Need

To build a YouTube comment scraper in Python, you’ll use the following tools and libraries:

  1. YouTube Data API: Provided by Google, it allows programmatic access to YouTube data.
  2. Python Libraries:
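    • googleapiclient (installed as google-api-python-client) to interact with the YouTube Data API.
    • pandas for organizing and analyzing the scraped data.
    • requests and json for handling HTTP requests and parsing responses (optional here: googleapiclient already sends the requests and parses the JSON responses for you).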
  3. API Key: A YouTube Data API key from the Google Cloud Console.

Setting Up Your Environment

Step 1: Install Required Libraries

First, ensure you have Python installed. Then, install the required libraries:

pip install google-api-python-client pandas requests

Step 2: Obtain Your API Key

  • Go to the Google Cloud Console.
  • Create a new project and enable the “YouTube Data API v3.”
  • Generate an API key for accessing the API.


You can also read my Build a YouTube Scraper tutorial, where I provided detailed instructions on how to create a YouTube API key from the Google Console.

Step-by-Step Guide to Scraping YouTube Comments

Step 1: Import Libraries

Start by importing the necessary Python libraries:

from googleapiclient.discovery import build
import pandas as pd

Step 2: Initialize the YouTube API Client

Use your API key to create an API client:

api_key = "YOUR_API_KEY" 
youtube = build("youtube", "v3", developerKey=api_key)
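
Hard-coding the key is fine for a quick test, but it is easy to commit it to version control by accident. As a minimal sketch, the key could instead be read from an environment variable. The variable name YOUTUBE_API_KEY and the helper load_api_key are assumptions made for this example, not something the API requires:

```python
import os


def load_api_key(env_var="YOUTUBE_API_KEY"):
    """Return the API key stored in an environment variable.

    The variable name is an assumption for this sketch; use any name you like.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key.")
    return key


# Hypothetical usage with the client from this step:
# youtube = build("youtube", "v3", developerKey=load_api_key())
```

Failing early with a clear message tends to be easier to debug than an authentication error returned by the first API call.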

Step 3: Find the Video ID

Before fetching comments, you need to determine the video ID of the YouTube video for which you want to scrape comments. The video ID is a unique identifier in the YouTube URL. For example:

  • URL: https://www.youtube.com/watch?v=VIDEO_ID
  • Video ID: The portion after ?v=, e.g., VIDEO_ID.

You can manually extract the video ID or automate this process. Here’s how you can implement an automated solution:

Extract Video ID from URL

If you have the video URL, you can use the following Python function to extract the video ID:

import re

def extract_video_id(url):
    """
    Extract the video ID from a YouTube URL.
    Args:
    url (str): The YouTube video URL.

    Returns:
    str: The video ID or None if invalid URL.
    """
    # Regular expression to match YouTube video IDs
    video_id_match = re.search(r"v=([a-zA-Z0-9_-]{11})", url)
    if video_id_match:
        return video_id_match.group(1)
    else:
        print("Invalid YouTube URL.")
        return None

# Example usage
url = "https://www.youtube.com/watch?v=tXiD9XnCBXg"
video_id = extract_video_id(url)
print("Video ID:", video_id)

Explanation of the Code:

  1. Regex Matching: The regex pattern r"v=([a-zA-Z0-9_-]{11})" captures the 11-character video ID after v= in the URL.
  2. Validation: If the regex doesn’t find a match, the function returns None and prints a message indicating an invalid URL.
  3. Example: If the URL is https://www.youtube.com/watch?v=dQw4w9WgXcQ, the function returns dQw4w9WgXcQ.
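
The regex above only covers URLs that contain v=. Short links (youtu.be) and paths such as /shorts/ or /embed/ carry the ID in the path instead. The following is a sketch of a more tolerant variant built on urllib.parse. The function name extract_video_id_any and the list of accepted hosts and path prefixes are assumptions for illustration and may not cover every URL form YouTube uses:

```python
import re
from urllib.parse import urlparse, parse_qs

# Video IDs are assumed to be 11 characters from this alphabet, as in the regex above.
_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id_any(url):
    """Return the video ID from several common YouTube URL shapes, or None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    candidate = None
    if host == "youtu.be":
        # Short links: https://youtu.be/<id>
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in ("youtube.com", "m.youtube.com", "music.youtube.com"):
        if parsed.path == "/watch":
            # Standard links: the ID is the "v" query parameter, in any position.
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        else:
            # Path-based links: /shorts/<id>, /embed/<id>, /live/<id>
            parts = parsed.path.strip("/").split("/")
            if len(parts) >= 2 and parts[0] in ("shorts", "embed", "live"):
                candidate = parts[1]

    if candidate and _ID_PATTERN.fullmatch(candidate):
        return candidate
    return None
```

Unlike the regex version, this one checks the host name, so a URL from another site that happens to contain v= is rejected. URLs without a scheme (for example youtube.com/watch?v=...) are not recognized by this sketch.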

Step 4: Extract Video Comments

Define a function to fetch comments from a YouTube video:

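def get_comments(video_id):
    """Return a list of {"Author", "Comment"} dicts for a video's top-level comments."""
    comments = []
    # First page: up to 100 comment threads, the maximum the API allows per request.
    request = youtube.commentThreads().list(
        part="snippet",
        videoId=video_id,
        maxResults=100
    )
    response = request.execute()

    while response:
        # Each item is a comment thread; the top-level comment holds the text and author.
        for item in response['items']:
            comment = item['snippet']['topLevelComment']['snippet']['textDisplay']
            author = item['snippet']['topLevelComment']['snippet']['authorDisplayName']
            comments.append({"Author": author, "Comment": comment})
        
        # Keep requesting pages until the API stops returning a nextPageToken.
        if 'nextPageToken' in response:
            request = youtube.commentThreads().list(
                part="snippet",
                videoId=video_id,
                pageToken=response['nextPageToken'],
                maxResults=100
            )
            response = request.execute()
        else:
            break
    return comments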

The get_comments function fetches YouTube comments for a given video ID using the YouTube Data API. Here’s a concise explanation:

  1. Initialize: Creates an empty list comments to store comment data.
  2. API Request: Makes an initial API call to fetch up to 100 comments for the video.
  3. Extract Data: Loops through the response to extract comment text and author name, appending them to the comments list.
  4. Pagination: Checks for nextPageToken to fetch additional pages of comments if available.
  5. Return: Outputs the complete list of comments as dictionaries containing Author and Comment.

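This function handles pagination and retrieves the top-level comments of a video; replies are not included. Each page is a separate API request that counts against your project’s daily quota, and the request raises an error if the video has comments disabled.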
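
For large videos it can be useful to cap the number of pages and to keep the replies that the API returns alongside each thread. The sketch below is one possible variant, not a replacement for the function above. The names iter_comment_rows, max_pages, and the IsReply column are assumptions made for this example. It relies on the client’s list_next helper for pagination, and it passes the client in as a parameter so the function can be exercised with a stand-in object. Note that part="snippet,replies" only includes a subset of replies for busy threads; fetching every reply would need separate comments().list calls, which are not shown here.

```python
def iter_comment_rows(youtube, video_id, max_pages=None):
    """Yield one dict per top-level comment and per reply returned with it.

    youtube   -- a client created with build("youtube", "v3", ...)
    max_pages -- optional cap on the number of API requests (None means no cap)
    """
    threads = youtube.commentThreads()
    request = threads.list(
        part="snippet,replies",
        videoId=video_id,
        maxResults=100,
        textFormat="plainText",  # ask for plain text instead of HTML
    )
    pages = 0
    while request is not None:
        if max_pages is not None and pages >= max_pages:
            break
        response = request.execute()
        pages += 1
        for item in response.get("items", []):
            yield _row(item["snippet"]["topLevelComment"], is_reply=False)
            # "replies" is absent when a thread has no replies.
            for reply in item.get("replies", {}).get("comments", []):
                yield _row(reply, is_reply=True)
        # list_next returns None once there are no more pages.
        request = threads.list_next(request, response)


def _row(comment, is_reply):
    """Flatten one comment resource into the columns used in this tutorial."""
    snippet = comment["snippet"]
    return {
        "Author": snippet["authorDisplayName"],
        "Comment": snippet["textDisplay"],
        "IsReply": is_reply,
    }
```

Because it is a generator, rows can be consumed lazily or collected with list(iter_comment_rows(youtube, video_id, max_pages=5)) and passed to pd.DataFrame as before.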

Step 5: Save Comments to a CSV File

Save the extracted comments into a CSV file for analysis:

video_id = "YOUR_VIDEO_ID"
comments = get_comments(video_id)
df = pd.DataFrame(comments)
df.to_csv("youtube_comments.csv", index=False)
print("Comments saved to youtube_comments.csv")
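
If the CSV will be opened in Excel, comments with emoji or non-Latin characters may display incorrectly unless the file starts with a byte-order mark. One way to handle this, sketched here with the standard csv module, is to write the file with the utf-8-sig encoding. The helper name save_rows_for_excel is an assumption for this example; with pandas, passing encoding="utf-8-sig" to df.to_csv should have the same effect.

```python
import csv


def save_rows_for_excel(rows, path, fieldnames=("Author", "Comment")):
    """Write comment dicts to a CSV file that starts with a UTF-8 byte-order mark."""
    with open(path, "w", newline="", encoding="utf-8-sig") as handle:
        # extrasaction="ignore" skips any keys that are not listed in fieldnames.
        writer = csv.DictWriter(handle, fieldnames=list(fieldnames), extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
```

The list returned by get_comments can be passed in directly, for example save_rows_for_excel(comments, "youtube_comments.csv").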

Here is a screenshot showing what the CSV result looks like:

[Screenshot: the resulting youtube_comments.csv file]

Analyzing YouTube Comments

With the comments saved in a CSV file, you can analyze them using Python or a tool like Excel. For instance, you can use Python’s TextBlob library to perform sentiment analysis on the comments.
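
Before analysis, it can help to normalize the comment text. Unless the request asks for plain text, the textDisplay field can contain HTML such as <br> tags, links, and entities like &#39;. The following is a rough sketch of a cleaner built on the standard library. The function name clean_comment_html is an assumption for this example, and a regex-based tag stripper like this is only meant for the simple markup found in comments, not for arbitrary HTML:

```python
import html
import re

_BR = re.compile(r"<br\s*/?>", re.IGNORECASE)
_TAG = re.compile(r"<[^>]+>")


def clean_comment_html(text):
    """Turn the HTML form of a comment into plain text."""
    text = _BR.sub("\n", text)   # keep line breaks as newlines
    text = _TAG.sub("", text)    # drop remaining tags such as <a> or <b>
    # Unescape last, so that escaped text like "&lt;3" is not mistaken for a tag.
    return html.unescape(text).strip()
```

With the DataFrame from the previous step, this could be applied as df['Comment'] = df['Comment'].map(clean_comment_html).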

Example: Sentiment Analysis

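Install the textblob library:

pip install textblob

Then score the sentiment of each comment. The polarity value ranges from -1.0 (most negative) to 1.0 (most positive):

from textblob import TextBlob

# str(x) guards against empty cells that pandas reads back as NaN
df['Sentiment'] = df['Comment'].apply(lambda x: TextBlob(str(x)).sentiment.polarity)
print(df.head())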
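
Raw polarity scores are easier to summarize once they are grouped into the positive, negative, and neutral buckets mentioned earlier. The sketch below shows one way to do that. The function names and the 0.1 threshold are assumptions chosen for illustration, so tune the cutoff to your own data:

```python
from collections import Counter


def label_sentiment(polarity, threshold=0.1):
    """Map a polarity score in [-1.0, 1.0] to a coarse label."""
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


def sentiment_counts(polarities, threshold=0.1):
    """Count how many scores fall into each label."""
    return Counter(label_sentiment(p, threshold) for p in polarities)
```

For example, df['Label'] = df['Sentiment'].apply(label_sentiment) adds a label column, and sentiment_counts(df['Sentiment']) gives the overall split.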

Ethical Considerations

When scraping data from YouTube, ensure you adhere to ethical and legal guidelines:

  • Respect YouTube’s Terms of Service.
  • Use the data responsibly, especially for public comments.

Full Code

from googleapiclient.discovery import build
import pandas as pd

api_key = "YOUR_API_KEY"
youtube = build("youtube", "v3", developerKey=api_key)

def get_comments(video_id):
    comments = []
    request = youtube.commentThreads().list(
        part="snippet",
        videoId=video_id,
        maxResults=100
    )
    response = request.execute()

    while response:
        for item in response['items']:
            comment = item['snippet']['topLevelComment']['snippet']['textDisplay']
            author = item['snippet']['topLevelComment']['snippet']['authorDisplayName']
            comments.append({"Author": author, "Comment": comment})
        
        if 'nextPageToken' in response:
            request = youtube.commentThreads().list(
                part="snippet",
                videoId=video_id,
                pageToken=response['nextPageToken'],
                maxResults=100
            )
            response = request.execute()
        else:
            break
    return comments


video_id = "YOUR_VIDEO_ID"
comments = get_comments(video_id)
df = pd.DataFrame(comments)
df.to_csv("youtube_comments.csv", index=False)
print("Comments saved to youtube_comments.csv")

Conclusion

Building a YouTube comment scraper in Python is a straightforward and powerful way to gather insights from audience feedback. With the tools and steps provided, you can extract, save, and analyze YouTube comments to uncover trends, opinions, and actionable insights.

What will you do with your scraped YouTube comments? Share your thoughts or questions in the comments below!
