How to Scrape Twitter/X Data with Python in 2026
Twitter (now X) remains one of the most valuable data sources on the internet — real-time conversations, sentiment signals, trending topics, and professional networking data. But scraping X has gotten significantly harder since 2023. The official API moved to pay-per-use pricing, free tiers disappeared, and X has aggressively pursued legal action against scraping services.
This tutorial covers every approach that still works in 2026 — from the official API to third-party services — with real code examples and an honest assessment of each option.
Key Takeaways
- X's official API is now credit-based with no free tier — you must purchase credits to make any requests
- twscrape (v0.17) is the best free option but requires real X accounts and technically violates ToS
- SocialData.tools at $0.20/1,000 items is the cheapest legal-ish alternative
- tweepy is the best Python wrapper for the official API
- SearchHive can complement X data with broader web research and sentiment analysis
Prerequisites
- Python 3.9+
- For official API: X Developer account at console.x.com + purchased credits
- For twscrape: One or more X accounts (username/password + email access for verification)
- For SocialData.tools: API key from socialdata.tools
```bash
# For official API
pip install tweepy

# For unofficial approach
pip install twscrape

# For third-party API
pip install requests

# For SearchHive enrichment
pip install searchhive
```
Step 1: Official X API with tweepy
X's official API is the only fully legal method for accessing X data. As of 2026, it uses a credit-based, pay-per-use model — no subscription tiers.
```python
import tweepy

# App-only auth (Bearer Token) — read public data
bearer_token = "YOUR_BEARER_TOKEN"
client = tweepy.Client(bearer_token=bearer_token)

# Search recent tweets
response = client.search_recent_tweets(
    query="python web scraping",
    max_results=10,
    tweet_fields=["created_at", "public_metrics", "author_id", "lang"],
    expansions=["author_id"],
    user_fields=["username", "name", "public_metrics"]
)

# Map the expanded user objects so we can print usernames, not just numeric IDs
users = {u.id: u for u in (response.includes or {}).get("users", [])}

for tweet in response.data:
    author = users.get(tweet.author_id)
    handle = author.username if author else tweet.author_id
    print(f"[{tweet.created_at}] @{handle}")
    print(f"  {tweet.text[:100]}...")
    print(f"  Likes: {tweet.public_metrics['like_count']} | RTs: {tweet.public_metrics['retweet_count']}")
    print()
```
Cost structure: Credits are purchased upfront and deducted per request, with different endpoints costing different amounts. The same resource requested again within 24 hours is charged only once (deduplication).
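Because identical requests within a 24-hour window are deduplicated, a matching local cache avoids spending credits on repeat lookups. A minimal in-memory sketch (the key scheme and TTL are illustrative, not part of the API):

```python
import hashlib
import json
import time

class RequestCache:
    """In-memory cache keyed by endpoint + params, mirroring X's 24h dedup window."""

    def __init__(self, ttl_seconds=24 * 3600):
        self.ttl = ttl_seconds
        self._store = {}

    def _key(self, endpoint, params):
        # Stable hash of the endpoint plus sorted parameters
        raw = endpoint + json.dumps(params, sort_keys=True)
        return hashlib.sha256(raw.encode()).hexdigest()

    def get(self, endpoint, params):
        entry = self._store.get(self._key(endpoint, params))
        if entry and time.time() - entry["at"] < self.ttl:
            return entry["data"]
        return None

    def put(self, endpoint, params, data):
        self._store[self._key(endpoint, params)] = {"at": time.time(), "data": data}

# Usage: check the cache before making a billable call
cache = RequestCache()
params = {"query": "python web scraping", "max_results": 10}
if cache.get("search_recent", params) is None:
    # result = client.search_recent_tweets(**params)  # real API call goes here
    cache.put("search_recent", params, {"placeholder": True})
print(cache.get("search_recent", params))
```

For long-running jobs, the same idea works with a small SQLite or file-backed store instead of a dict.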
User Context Auth (for reading/writing)
```python
# OAuth 1.0a User Context
client = tweepy.Client(
    consumer_key="YOUR_API_KEY",
    consumer_secret="YOUR_API_SECRET",
    access_token="YOUR_ACCESS_TOKEN",
    access_token_secret="YOUR_ACCESS_TOKEN_SECRET"
)

# Get user by username
user = client.get_user(username="elonmusk", user_fields=["public_metrics", "description", "created_at"])
print(f"Name: {user.data.name}")
print(f"Followers: {user.data.public_metrics['followers_count']}")
print(f"Following: {user.data.public_metrics['following_count']}")

# Get user's tweets
tweets = client.get_users_tweets(
    id=user.data.id,
    max_results=10,
    tweet_fields=["created_at", "public_metrics"]
)
```
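The query strings passed to search_recent_tweets support standard X search operators such as from:, lang:, and -is:retweet. A small helper (illustrative, not part of tweepy) keeps queries readable:

```python
def build_query(keywords, from_user=None, lang=None, exclude_retweets=True, has_links=False):
    """Compose an X search query string from standard search operators."""
    parts = [keywords]
    if from_user:
        parts.append(f"from:{from_user}")
    if lang:
        parts.append(f"lang:{lang}")
    if exclude_retweets:
        parts.append("-is:retweet")  # drop retweets to reduce duplicates
    if has_links:
        parts.append("has:links")
    return " ".join(parts)

query = build_query("python web scraping", lang="en")
print(query)  # python web scraping lang:en -is:retweet
```

Filtering out retweets at query time also saves credits, since you are not billed for results you would discard anyway.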
Step 2: Free Alternative — twscrape
twscrape uses X's internal GraphQL API with real account credentials. It's free and actively maintained (v0.17, updated April 2025).
Warning: This violates X's Terms of Service. Use for research purposes only.
```python
import asyncio
from twscrape import API, gather

async def setup_and_search():
    api = API()

    # Add accounts (username, password, email, email_password)
    await api.pool.add_account(
        "username", "password",
        "email@gmail.com", "email_password"
    )
    await api.pool.login_all()

    # Search tweets
    tweets = await gather(api.search("python web scraping", limit=20))
    for tweet in tweets:
        print(f"@{tweet.user.username} ({tweet.likes} likes)")
        print(f"  {tweet.rawContent[:120]}...")
        print(f"  {tweet.url}")
        print()

    # Get user profile
    user = await api.user_by_login("GaborVarga84")
    print(f"{user.displayname}: {user.followersCount} followers")

    # Get followers
    followers = await gather(api.followers(user_id=user.id, limit=50))
    for f in followers:
        print(f"  @{f.username}: {f.followersCount} followers")

asyncio.run(setup_and_search())
```
twscrape Key Features
- Automatic account rotation to handle rate limits
- Async/await for concurrent requests
- Search tweets (Latest, Top, Media tabs)
- User profiles, followers, following
- Tweet details, retweeters, replies, quotes
- CLI tool included
- SNScrape-compatible data models
Step 3: Budget Third-Party — SocialData.tools
SocialData.tools is the cheapest third-party option at $0.20 per 1,000 tweets or users, with 3 free requests/minute.
```python
import requests
import json

API_KEY = "YOUR_SOCIALDATA_KEY"
headers = {"Authorization": f"Bearer {API_KEY}"}

# Search tweets
resp = requests.get(
    "https://api.socialdata.tools/v1/search",
    headers=headers,
    params={"query": "python web scraping", "limit": 20}
)
data = resp.json()
for tweet in data.get("tweets", []):
    print(f"@{tweet.get('author', {}).get('username')}")
    print(f"  {tweet.get('text', '')[:120]}")
    print()

# Get user profile
resp = requests.get(
    "https://api.socialdata.tools/v1/user/profile/xdevelopers",
    headers=headers
)
profile = resp.json()
print(json.dumps(profile, indent=2))

# Get tweet details
resp = requests.get(
    "https://api.socialdata.tools/v1/tweet/1234567890",
    headers=headers
)
tweet = resp.json()
print(tweet.get("text", "N/A"))
```
Pricing Comparison
| Service | Price per 1K items | Free tier | Auth required | Legal status |
|---|---|---|---|---|
| Official X API | Variable (credit-based) | None | X Developer account | Fully legal |
| twscrape | Free (needs accounts) | Unlimited | Real X accounts | ToS violation |
| SocialData.tools | $0.20 | 3 req/min | API key | Gray area |
| Bright Data | $0.75–$1.00 | Trial | Account | Gray area |
| RapidAPI proxies | $1–$10 | Varies | API key | Varies |
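For budgeting, the table's per-1K prices translate directly into project costs. A quick estimator (prices hard-coded from the table above; the official API is excluded because its credit costs vary by endpoint):

```python
# Prices in USD per 1,000 items, taken from the comparison table above
PRICE_PER_1K = {
    "socialdata": 0.20,
    "brightdata_low": 0.75,
    "brightdata_high": 1.00,
}

def estimate_cost(items, service):
    """Estimated USD cost for collecting `items` tweets/users via a third-party service."""
    return round(items / 1000 * PRICE_PER_1K[service], 2)

for service in PRICE_PER_1K:
    print(f"{service}: ${estimate_cost(100_000, service):.2f} per 100K items")
```

At these rates, a 100K-item collection runs about $20 on SocialData.tools versus $75–$100 on Bright Data, which is why the cheaper tier is attractive for exploratory work.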
Step 4: Handle Rate Limits
All approaches have rate limits. Here's how to handle them properly:
```python
import time

class XRateLimiter:
    """Rate limiter for X API access."""

    def __init__(self, max_rpm=50, max_rps=5):
        self.max_rpm = max_rpm  # requests per minute
        self.max_rps = max_rps  # requests per second
        self.requests = []
        self._last_request = 0

    def wait_if_needed(self):
        """Wait if we're approaching rate limits."""
        # Per-second throttle
        elapsed = time.time() - self._last_request
        if elapsed < 1.0 / self.max_rps:
            time.sleep((1.0 / self.max_rps) - elapsed)

        # Per-minute throttle (recompute `now` after any sleep above)
        now = time.time()
        cutoff = now - 60
        self.requests = [t for t in self.requests if t > cutoff]
        if len(self.requests) >= self.max_rpm:
            wait = self.requests[0] + 60 - now + 1
            if wait > 0:
                print(f"Rate limit: waiting {wait:.1f}s")
                time.sleep(wait)

        self.requests.append(time.time())
        self._last_request = time.time()

# Usage
limiter = XRateLimiter(max_rpm=30, max_rps=2)
for i in range(100):
    limiter.wait_if_needed()
    # Make your API call here
```
Step 5: Complement with SearchHive
X data gives you what people are posting. SearchHive gives you the broader context — what the web is saying about those same topics.
```python
from searchhive import SwiftSearch, DeepDive

api_key = "your_searchhive_key"

# Find broader web discussions about trending X topics
search = SwiftSearch(api_key=api_key)
results = search.search(
    query="AI regulation policy 2026",
    domains=["reuters.com", "techcrunch.com", "arstechnica.com", "theverge.com"],
    extract_fields=["title", "description", "url", "date"]
)

# Analyze sentiment and extract entities from articles
dd = DeepDive(api_key=api_key)
for article in results[:5]:
    analysis = dd.analyze(
        url=article["url"],
        summarize=True,
        extract_entities=True
    )
    print(f"{article['title']}")
    print(f"  Summary: {analysis.get('summary', 'N/A')[:200]}")
    print(f"  Entities: {analysis.get('entities', [])}")
    print()
```
How SearchHive complements X data:
| X/Twitter Data | SearchHive Adds |
|---|---|
| Real-time posts and opinions | In-depth article analysis and fact-checking |
| User profiles and follower counts | Broader web presence (blogs, news, forums) |
| Trending hashtags | Context: why something is trending, what's driving it |
| Short-form text (280 chars) | Long-form analysis and sentiment from full articles |
| User-posted links | Content extraction and analysis of linked articles |
Step 6: Build a Complete X Research Pipeline
Combine everything into a reusable research workflow:
```python
import json
from datetime import datetime

class XResearchPipeline:
    """Complete X/Twitter research pipeline."""

    def __init__(self, method="official", api_key=None, bearer_token=None, searchhive_key=None):
        self.method = method
        self.api_key = api_key            # SocialData.tools key (if method="socialdata")
        self.bearer_token = bearer_token  # official API token (if method="official")
        self.searchhive_key = searchhive_key

    def search_x(self, query, limit=50):
        """Search X for tweets matching query."""
        if self.method == "official" and self.bearer_token:
            return self._search_official(query, limit)
        elif self.method == "socialdata" and self.api_key:
            return self._search_socialdata(query, limit)
        else:
            print("No valid credentials configured")
            return []

    def _search_official(self, query, limit):
        """Search using the official X API via tweepy."""
        import tweepy
        client = tweepy.Client(bearer_token=self.bearer_token)
        response = client.search_recent_tweets(
            query=query,
            max_results=min(limit, 100),
            tweet_fields=["created_at", "public_metrics", "author_id", "lang"]
        )
        results = []
        if response.data:
            for tweet in response.data:
                results.append({
                    "id": tweet.id,
                    "text": tweet.text,
                    "created_at": str(tweet.created_at),
                    "metrics": tweet.public_metrics,
                    "source": "x_official_api"
                })
        return results

    def _search_socialdata(self, query, limit):
        """Search using the SocialData.tools API."""
        import requests
        headers = {"Authorization": f"Bearer {self.api_key}"}
        resp = requests.get(
            "https://api.socialdata.tools/v1/search",
            headers=headers,
            params={"query": query, "limit": limit}
        )
        data = resp.json()
        return data.get("tweets", [])

    def enrich_with_web_context(self, topic):
        """Add broader web context using SearchHive (requires its own API key)."""
        from searchhive import SwiftSearch
        search = SwiftSearch(api_key=self.searchhive_key)
        return search.search(
            query=topic,
            extract_fields=["title", "description", "url"]
        )

    def save_results(self, results, filename):
        """Save results to a timestamped JSON file."""
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        filepath = f"{filename}_{timestamp}.json"
        with open(filepath, "w") as f:
            json.dump(results, f, indent=2, default=str)
        print(f"Saved {len(results)} results to {filepath}")

# Usage
pipeline = XResearchPipeline(
    method="socialdata",
    api_key="your_api_key",
    searchhive_key="your_searchhive_key"
)

results = pipeline.search_x("artificial intelligence regulation", limit=50)
print(f"Found {len(results)} tweets")

# Add web context
web_context = pipeline.enrich_with_web_context("AI regulation policy 2026")
print(f"Found {len(web_context)} related web articles")

pipeline.save_results(results, "x_research_ai_regulation")
```
Common Issues
Official API: 403 Forbidden
Cause: Insufficient credits, expired token, or rate limit exceeded. Fix: Check your credit balance at console.x.com. Monitor x-rate-limit-remaining headers. Implement exponential backoff for retries.
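The exponential backoff mentioned above can wrap any API call. A generic sketch, not tied to a particular client library:

```python
import time

def with_backoff(call, max_retries=5, base_delay=2.0):
    """Retry `call` with exponential backoff; `call` should raise on failure."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception as exc:
            if attempt == max_retries - 1:
                raise  # out of retries, surface the error
            delay = base_delay * (2 ** attempt)  # 2s, 4s, 8s, ...
            print(f"Attempt {attempt + 1} failed ({exc}); retrying in {delay:.2f}s")
            time.sleep(delay)

# Usage with a flaky call (a stand-in for a real API request)
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("rate limited")
    return "ok"

print(with_backoff(flaky, base_delay=0.01))  # "ok" after two retries
```

In a real client you would catch only rate-limit and transient-server errors here, and check the x-rate-limit-remaining header before retrying rather than retrying blindly.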
twscrape: Account Suspended
Cause: X detected automated activity. Fix: Use cookie-based auth instead of password login, use residential proxies, limit activity patterns to human-like frequency, and rotate between multiple accounts.
SocialData.tools: Rate Limited
Cause: Free tier limit (3 req/min). Fix: Add a 21-second delay between requests, or upgrade to paid usage. At $0.20/1K items, costs are minimal.
Missing or Truncated Data
Cause: Not requesting all fields via tweet_fields/user_fields parameters. Fix: Explicitly request all needed fields. For the official API, use expansions to get related user objects.
Legal Considerations
- X's ToS explicitly prohibits unauthorized data collection
- X has sued scraping services (Bright Data, others) and sent DMCA takedowns
- The official API is the only fully legal method
- Mass collection and redistribution of tweets may infringe copyright
- GDPR applies to personal data of EU users — even public data
- Use data minimization: only collect what you need
- Don't redistribute scraped data — this significantly increases legal risk
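The data-minimization point above is easy to enforce in code: whitelist the fields you actually need and drop everything else before storing. A simple sketch (the field names are illustrative):

```python
# Keep only what the analysis needs; adjust to your use case
ALLOWED_FIELDS = {"id", "text", "created_at", "public_metrics"}

def minimize(tweet: dict) -> dict:
    """Drop every field not explicitly whitelisted before persisting."""
    return {k: v for k, v in tweet.items() if k in ALLOWED_FIELDS}

raw = {
    "id": 1,
    "text": "hi",
    "author_email": "x@y.z",     # hypothetical PII field: never store this
    "geo": {"lat": 1.0},
    "created_at": "2026-01-01",
}
print(minimize(raw))  # {'id': 1, 'text': 'hi', 'created_at': '2026-01-01'}
```

Running minimization at ingest time, rather than at analysis time, means sensitive fields never touch disk in the first place.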
Next Steps
- Start small — test with the official API on a limited dataset before scaling
- Combine sources — X data + web search via SearchHive gives a more complete picture
- Implement caching — deduplication within 24h windows saves credits
- Monitor costs — set spending limits in the X Developer Console
- Build dashboards — visualize trends, sentiment, and engagement metrics
Need to research X topics alongside broader web data? Start with SearchHive's free tier — 100 free requests per month. Check the API docs for integration guides.
See also: How to build a news scraper | SearchHive vs SerpApi | Web scraping with Python