How to Scrape YouTube Data — Video Metrics and Comments
Need to scrape YouTube data for market research, competitor analysis, or content strategy? YouTube is the second largest search engine, and its data — view counts, engagement metrics, comment sentiment, channel growth — is valuable for creators, marketers, and researchers alike.
This tutorial covers every approach: the official YouTube Data API v3, lightweight libraries for metadata extraction, and SearchHive for scraping when the API falls short. Each method includes working Python code.
Key Takeaways
- YouTube Data API v3 is the official but rate-limited option (10K quota units/day on free tier)
- yt-dlp extracts video metadata without API keys — faster setup, no quotas
- SearchHive ScrapeForge handles full page scraping including comments, thumbnails, and related videos
- For comment scraping at scale, combining yt-dlp metadata with SearchHive page scraping is the most reliable approach
- Always respect YouTube's Terms of Service and rate limits to avoid API key suspension
Prerequisites
pip install searchhive yt-dlp google-api-python-client
- searchhive — Web scraping API (free tier: 50K requests/month)
- yt-dlp — Video/metadata downloader, no API key needed
- google-api-python-client — Official YouTube Data API client
Optional: A YouTube Data API v3 key from Google Cloud Console (free, 10K quota/day).
Step 1: Extract Video Metadata with yt-dlp
yt-dlp is the fastest way to pull video metadata without an API key:
import yt_dlp
def get_video_metadata(video_url: str) -> dict:
"""Extract comprehensive metadata from a YouTube video."""
ydl_opts = {
'quiet': True,
        'skip_download': True,  # metadata only; don't download the video
}
with yt_dlp.YoutubeDL(ydl_opts) as ydl:
info = ydl.extract_info(video_url, download=False)
return {
'id': info.get('id'),
'title': info.get('title'),
'uploader': info.get('uploader'),
'uploader_id': info.get('uploader_id'),
'channel_url': info.get('channel_url'),
'duration': info.get('duration'),
'view_count': info.get('view_count'),
'like_count': info.get('like_count'),
'comment_count': info.get('comment_count'),
'upload_date': info.get('upload_date'),
'description': info.get('description', '')[:500],
'tags': info.get('tags', []),
'categories': info.get('categories', []),
'thumbnail': info.get('thumbnail'),
'live_status': info.get('live_status'),
}
# Usage
metadata = get_video_metadata("https://www.youtube.com/watch?v=dQw4w9WgXcQ")
print(f"Title: {metadata['title']}")
print(f"Views: {metadata['view_count'] or 0:,}")
print(f"Likes: {metadata['like_count'] or 0:,}")  # like_count can be None when hidden
print(f"Duration: {metadata['duration']}s")
print(f"Tags: {metadata['tags'][:5]}")
This works without any API key. yt-dlp parses the video page directly and extracts structured metadata.
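Two of the returned fields need post-processing before display: duration comes back as raw seconds, and upload_date as a YYYYMMDD string. A couple of small helpers (illustrative, not part of yt-dlp):

```python
from datetime import datetime

def format_duration(seconds: int) -> str:
    """Render yt-dlp's duration (seconds) as H:MM:SS, or M:SS for short videos."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}" if h else f"{m}:{s:02d}"

def parse_upload_date(yyyymmdd: str) -> str:
    """yt-dlp returns upload_date as a YYYYMMDD string; convert to ISO format."""
    return datetime.strptime(yyyymmdd, "%Y%m%d").date().isoformat()
```

For example, `format_duration(212)` gives `"3:32"` and `parse_upload_date("20091025")` gives `"2009-10-25"`.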
Batch Extract Multiple Videos
def batch_metadata(video_urls: list[str]) -> list[dict]:
"""Extract metadata from multiple YouTube videos."""
results = []
for url in video_urls:
try:
meta = get_video_metadata(url)
results.append(meta)
print(f"OK: {meta['title'][:50]}...")
except Exception as e:
print(f"FAIL {url}: {e}")
return results
urls = [
"https://www.youtube.com/watch?v=video1",
"https://www.youtube.com/watch?v=video2",
"https://www.youtube.com/watch?v=video3",
]
videos = batch_metadata(urls)
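Fetching many pages back-to-back invites throttling. A hedged variant that accepts any fetch function (so it can wrap get_video_metadata above) and pauses between requests:

```python
import time
from typing import Callable

def batch_throttled(urls: list[str], fetch: Callable[[str], dict],
                    delay: float = 1.0) -> list[dict]:
    """Fetch metadata for each URL with a fixed pause between requests."""
    results = []
    for url in urls:
        try:
            results.append(fetch(url))
        except Exception as e:
            print(f"FAIL {url}: {e}")
        time.sleep(delay)  # simple fixed delay; tune to your volume
    return results
```

Usage: batch_throttled(urls, get_video_metadata, delay=2.0). A fixed delay is the simplest option; exponential backoff on failures is the next refinement.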
Step 2: Use the YouTube Data API v3 for Structured Queries
The official API gives you search, channel info, and playlist data — things yt-dlp doesn't handle well:
from googleapiclient.discovery import build
def get_channel_stats(api_key: str, channel_id: str) -> dict:
"""Get channel statistics using YouTube Data API v3."""
youtube = build('youtube', 'v3', developerKey=api_key)
request = youtube.channels().list(
part='statistics,snippet,brandingSettings',
id=channel_id
)
response = request.execute()
    if response.get('items'):  # 'items' is absent when no channel matches
channel = response['items'][0]
stats = channel['statistics']
snippet = channel['snippet']
return {
'name': snippet['title'],
'subscribers': int(stats.get('subscriberCount', 0)),
'total_views': int(stats.get('viewCount', 0)),
'video_count': int(stats.get('videoCount', 0)),
'description': snippet.get('description', '')[:300],
'published_at': snippet.get('publishedAt'),
'thumbnails': snippet.get('thumbnails', {}),
}
return {}
def search_videos(api_key: str, query: str, max_results: int = 10) -> list[dict]:
"""Search YouTube videos by keyword."""
youtube = build('youtube', 'v3', developerKey=api_key)
request = youtube.search().list(
part='snippet',
q=query,
type='video',
maxResults=max_results,
order='viewCount'
)
response = request.execute()
videos = []
for item in response.get('items', []):
videos.append({
'video_id': item['id']['videoId'],
'title': item['snippet']['title'],
'channel': item['snippet']['channelTitle'],
'published_at': item['snippet']['publishedAt'],
'url': f"https://youtube.com/watch?v={item['id']['videoId']}",
})
return videos
# Usage (replace with your API key)
# stats = get_channel_stats("YOUR_API_KEY", "UCxxxxxxxx")
# results = search_videos("YOUR_API_KEY", "python web scraping tutorial")
API Quota Management
YouTube API v3 costs quota units:
- search.list: 100 units per request
- videos.list: 1 unit per request
- commentThreads.list: 1 unit per request
- Free tier: 10,000 units/day = ~100 searches or ~10,000 video lookups
If you're hitting quota limits, switch to yt-dlp for metadata and save API quota for searches.
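To know when to switch, you can tally your calls against the costs above. A minimal budgeting helper (illustrative; the per-method costs are the documented API v3 values):

```python
# Documented quota costs per YouTube Data API v3 method
QUOTA_COSTS = {"search.list": 100, "videos.list": 1,
               "commentThreads.list": 1, "channels.list": 1}

def quota_remaining(calls: dict[str, int], daily_quota: int = 10_000) -> int:
    """Estimate units left today given a tally of {method: call count}."""
    used = sum(QUOTA_COSTS.get(method, 1) * count for method, count in calls.items())
    return daily_quota - used
```

For example, 50 searches plus 200 video lookups consume 5,200 units, leaving 4,800 for the day.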
Step 3: Scrape Comments with SearchHive
The official API does expose comments via commentThreads.list, but every page costs quota and deep pagination gets slow on popular videos. For bulk comment scraping, use SearchHive:
from searchhive import ScrapeForge
def scrape_youtube_comments(video_url: str, max_comments: int = 50) -> list[dict]:
"""Scrape comments from a YouTube video using SearchHive."""
client = ScrapeForge()
result = client.scrape(
url=video_url,
render_js=True,
wait_for="#comments-section, ytd-comments",
selectors={
"comments": {
"each": "#content ytd-comment-thread-renderer",
"fields": {
"author": "#author-text span",
"text": "#content-text",
"likes": "#vote-count-middle",
"time": ".time a",
}
}
}
)
comments = result.data.get("comments", [])
return comments[:max_comments]
# Usage
comments = scrape_youtube_comments("https://www.youtube.com/watch?v=dQw4w9WgXcQ")
for c in comments[:10]:
print(f"{c.get('author', 'Unknown')}: {c.get('text', '')[:80]}...")
YouTube loads comments dynamically with JavaScript. SearchHive's render_js=True waits for the comments section to populate before extracting data.
Step 4: Get Video Transcript
Transcripts are gold for content analysis, NLP, and SEO research:
# pip install youtube-transcript-api
from youtube_transcript_api import YouTubeTranscriptApi
def get_transcript(video_id: str) -> str:
"""Get the full transcript of a YouTube video."""
try:
transcript_list = YouTubeTranscriptApi.get_transcript(video_id)
full_text = " ".join(
entry['text'] for entry in transcript_list
)
return full_text
except Exception as e:
print(f"No transcript available: {e}")
return ""
# Extract video ID from URL
video_id = "dQw4w9WgXcQ"
transcript = get_transcript(video_id)
print(f"Transcript length: {len(transcript)} characters")
print(transcript[:500] + "...")
Step 5: Scrape Search Results and Trending
YouTube search results pages contain rankings, view counts, and channel info. Use SearchHive to scrape them:
from searchhive import ScrapeForge
def scrape_youtube_search(query: str) -> list[dict]:
"""Scrape YouTube search results without using the API."""
client = ScrapeForge()
url = f"https://www.youtube.com/results?search_query={query.replace(' ', '+')}"
result = client.scrape(
url=url,
render_js=True,
wait_for="ytd-video-renderer",
selectors={
"videos": {
"each": "ytd-video-renderer",
"fields": {
"title": "#video-title",
"channel": "ytd-channel-name a",
"views": "#metadata-line span:first-child",
"time": "#metadata-line span:last-child",
"url": "a#video-title @href",
}
}
}
)
return result.data.get("videos", [])
# Usage
results = scrape_youtube_search("python web scraping tutorial")
for v in results[:5]:
print(f"{v.get('title', 'N/A')[:60]} — {v.get('views', 'N/A')}")
This bypasses API quotas entirely. Each search query uses one SearchHive request instead of 100 API quota units.
Step 6: Build a Complete YouTube Analytics Pipeline
Combine all the methods into a unified pipeline:
import json
import csv
from datetime import datetime
from searchhive import ScrapeForge
import yt_dlp
class YouTubeScraper:
def __init__(self):
self.scrape_client = ScrapeForge()
        self.ydl_opts = {'quiet': True, 'skip_download': True}
def get_metadata(self, video_url: str) -> dict:
with yt_dlp.YoutubeDL(self.ydl_opts) as ydl:
info = ydl.extract_info(video_url, download=False)
return {
'id': info['id'],
'title': info['title'],
'channel': info['uploader'],
'views': info.get('view_count', 0),
'likes': info.get('like_count', 0),
'duration': info.get('duration', 0),
'upload_date': info.get('upload_date'),
'tags': info.get('tags', []),
}
def get_comments(self, video_url: str) -> list[dict]:
result = self.scrape_client.scrape(
url=video_url, render_js=True,
wait_for="#comments-section",
selectors={"comments": {
"each": "ytd-comment-thread-renderer",
"fields": {
"author": "#author-text span",
"text": "#content-text",
"likes": "#vote-count-middle",
}
}}
)
return result.data.get("comments", [])
def full_analysis(self, video_url: str) -> dict:
metadata = self.get_metadata(video_url)
comments = self.get_comments(video_url)
metadata['comments'] = comments
metadata['comment_count'] = len(comments)
metadata['scraped_at'] = datetime.utcnow().isoformat()
return metadata
# Usage
scraper = YouTubeScraper()
analysis = scraper.full_analysis("https://www.youtube.com/watch?v=dQw4w9WgXcQ")
with open("youtube_analysis.json", "w") as f:
json.dump(analysis, f, indent=2, default=str)
print(f"Title: {analysis['title']}")
print(f"Views: {analysis['views']:,}")
print(f"Comments scraped: {len(analysis['comments'])}")
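With comments in hand, a crude word-frequency pass is a useful sanity check before real sentiment analysis. This helper is illustrative (the stopword list is a minimal stand-in, not a real NLP resource):

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "is", "it", "this", "to", "of", "i", "in", "for"}

def top_comment_terms(comments: list[dict], n: int = 10) -> list[tuple[str, int]]:
    """Most frequent words across scraped comment text, minus common stopwords."""
    words = []
    for c in comments:
        words += [w for w in re.findall(r"[a-z']+", c.get("text", "").lower())
                  if w not in STOPWORDS and len(w) > 2]
    return Counter(words).most_common(n)
```

Feed it the comments list from full_analysis to see which terms dominate the discussion.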
Complete Code Example
Here's a production-ready script that analyzes multiple videos and exports results:
from searchhive import ScrapeForge
import yt_dlp
import json
import csv
from datetime import datetime
def analyze_videos(video_urls: list[str], output_csv: str = "youtube_data.csv"):
"""Analyze multiple YouTube videos and export to CSV."""
client = ScrapeForge()
    ydl_opts = {'quiet': True, 'skip_download': True}
results = []
for url in video_urls:
try:
# Get metadata via yt-dlp
with yt_dlp.YoutubeDL(ydl_opts) as ydl:
info = ydl.extract_info(url, download=False)
video_data = {
'url': url,
'title': info.get('title', ''),
'channel': info.get('uploader', ''),
'views': info.get('view_count', 0),
'likes': info.get('like_count', 0),
'duration_sec': info.get('duration', 0),
'upload_date': info.get('upload_date', ''),
'tags': '|'.join(info.get('tags', [])[:10]),
}
# Get comment count via SearchHive
try:
scrape_result = client.scrape(
url=url, render_js=True,
wait_for="#comments-section",
selectors={
"comment_count": "#count ytd-comments-header-renderer h2",
"top_comments": {
"each": "ytd-comment-thread-renderer",
"limit": 5,
"fields": {
"author": "#author-text span",
"text": "#content-text",
}
}
}
)
                if scrape_result.data:
                    comments = scrape_result.data.get("top_comments", [])
                    video_data['comments_scraped'] = len(comments)  # capped at 5 by the selector's limit
                    video_data['top_comment'] = comments[0].get('text', '')[:100] if comments else ""
            except Exception:
                video_data['comments_scraped'] = 0
results.append(video_data)
print(f"OK: {video_data['title'][:50]} — {video_data['views']:,} views")
except Exception as e:
print(f"FAIL {url}: {e}")
# Export to CSV
if results:
fieldnames = list(results[0].keys())
with open(output_csv, 'w', newline='', encoding='utf-8') as f:
writer = csv.DictWriter(f, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(results)
print(f"\nExported {len(results)} videos to {output_csv}")
if __name__ == "__main__":
urls = [
"https://www.youtube.com/watch?v=example1",
"https://www.youtube.com/watch?v=example2",
]
analyze_videos(urls)
Common Issues
yt-dlp returns "Video unavailable"
The video may be private, deleted, or geo-restricted. Check with info.get('availability') before processing.
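A guard you can apply to the info dict yt-dlp returns (availability values follow yt-dlp's conventions; None means the extractor didn't report one):

```python
def is_scrapable(info: dict) -> bool:
    """Filter out videos yt-dlp marks as private, members-only, or login-gated."""
    return info.get("availability") in (None, "public", "unlisted")
```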
YouTube API quota exhausted
Switch to yt-dlp for metadata extraction and SearchHive for page scraping. Neither uses API quota.
Comments not loading with SearchHive
YouTube may require scrolling to load more comments. Use SearchHive's actions parameter to trigger scroll events, or increase the wait_for timeout.
Rate limiting from YouTube
SearchHive's proxy rotation distributes requests across different IPs, reducing the chance of rate limiting. Keep concurrency low (2-3 simultaneous requests) for YouTube specifically.
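One way to enforce that cap, with a fetch function of your choosing, is a thread pool with max_workers set to the concurrency ceiling (a sketch; results come back in input order):

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

def scrape_concurrently(urls: list[str], fetch: Callable[[str], dict],
                        max_workers: int = 3) -> list[dict]:
    """Run fetches with a hard cap on simultaneous requests."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))
```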
Next Steps
- Combine YouTube data with SearchHive DeepDive for sentiment analysis on comments
- Check /blog/how-to-monitor-competitor-prices-with-python-automated-system for scraping competitor pricing strategies
- See /compare/scraperapi for how SearchHive compares to other scraping APIs on reliability and cost
Start scraping YouTube data with SearchHive's free tier — 50,000 requests/month with JS rendering and proxy rotation. No API key needed for page scraping. Read the docs.