How to Scrape Twitter/X Data with Python in 2026
Twitter (now X) remains one of the most valuable data sources on the internet — real-time conversations, sentiment signals, trending topics, and professional networking data. But scraping X has gotten significantly harder since 2023. The official API moved to pay-per-use pricing, free tiers disappeared, and X has aggressively pursued legal action against scraping services.
This tutorial covers every approach that still works in 2026 — from the official API to third-party services — with real code examples and an honest assessment of each option.
Key Takeaways
- X's official API is now credit-based with no free tier — you must purchase credits to make any requests
- twscrape (v0.17) is the best free option but requires real X accounts and technically violates ToS
- SocialData.tools at $0.20/1,000 items is the cheapest legal-ish alternative
- tweepy is the best Python wrapper for the official API
- SearchHive can complement X data with broader web research and sentiment analysis
Prerequisites
- Python 3.9+
- For official API: X Developer account at console.x.com + purchased credits
- For twscrape: One or more X accounts (username/password + email access for verification)
- For SocialData.tools: API key from socialdata.tools
```bash
# For official API
pip install tweepy

# For unofficial approach
pip install twscrape

# For third-party API
pip install requests

# For SearchHive enrichment
pip install searchhive
```
Step 1: Official X API with tweepy
X's official API is the only fully legal method for accessing X data. As of 2026, it uses a credit-based, pay-per-use model — no subscription tiers.
```python
import tweepy

# App-only auth (Bearer Token) — read public data
bearer_token = "YOUR_BEARER_TOKEN"
client = tweepy.Client(bearer_token=bearer_token)

# Search recent tweets
response = client.search_recent_tweets(
    query="python web scraping",
    max_results=10,
    tweet_fields=["created_at", "public_metrics", "author_id", "lang"],
    expansions=["author_id"],
    user_fields=["username", "name", "public_metrics"]
)

# Map the expanded user objects so we can print usernames, not just numeric IDs
users = {u.id: u for u in (response.includes or {}).get("users", [])}

for tweet in response.data:
    author = users.get(tweet.author_id)
    handle = author.username if author else tweet.author_id
    print(f"[{tweet.created_at}] @{handle}")
    print(f"  {tweet.text[:100]}...")
    print(f"  Likes: {tweet.public_metrics['like_count']} | RTs: {tweet.public_metrics['retweet_count']}")
    print()
```
Cost structure: Credits are purchased upfront and deducted per request, with different endpoints costing different amounts. The same resource requested again within 24 hours is charged only once (deduplication).
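Because identical requests within a 24-hour window are deduplicated, a matching local cache avoids spending credits on repeat lookups. A minimal in-memory sketch (the key scheme and TTL are illustrative, not part of the API):

```python
import hashlib
import json
import time

class RequestCache:
    """In-memory cache keyed by endpoint + params, mirroring X's 24h dedup window."""

    def __init__(self, ttl_seconds=24 * 3600):
        self.ttl = ttl_seconds
        self._store = {}

    def _key(self, endpoint, params):
        # Stable hash of the endpoint plus sorted parameters
        raw = endpoint + json.dumps(params, sort_keys=True)
        return hashlib.sha256(raw.encode()).hexdigest()

    def get(self, endpoint, params):
        entry = self._store.get(self._key(endpoint, params))
        if entry and time.time() - entry["at"] < self.ttl:
            return entry["data"]
        return None

    def put(self, endpoint, params, data):
        self._store[self._key(endpoint, params)] = {"at": time.time(), "data": data}

# Usage: check the cache before making a billable call
cache = RequestCache()
params = {"query": "python web scraping", "max_results": 10}
if cache.get("search_recent", params) is None:
    # result = client.search_recent_tweets(**params)  # real API call goes here
    cache.put("search_recent", params, {"placeholder": True})
print(cache.get("search_recent", params))
```

For long-running jobs, the same idea works with a small SQLite or file-backed store instead of a dict.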
User Context Auth (for reading/writing)
```python
# OAuth 1.0a User Context
client = tweepy.Client(
    consumer_key="YOUR_API_KEY",
    consumer_secret="YOUR_API_SECRET",
    access_token="YOUR_ACCESS_TOKEN",
    access_token_secret="YOUR_ACCESS_TOKEN_SECRET"
)

# Get user by username
user = client.get_user(username="elonmusk", user_fields=["public_metrics", "description", "created_at"])
print(f"Name: {user.data.name}")
print(f"Followers: {user.data.public_metrics['followers_count']}")
print(f"Following: {user.data.public_metrics['following_count']}")

# Get user's tweets
tweets = client.get_users_tweets(
    id=user.data.id,
    max_results=10,
    tweet_fields=["created_at", "public_metrics"]
)
```
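The query strings passed to search_recent_tweets support standard X search operators such as from:, lang:, and -is:retweet. A small helper (illustrative, not part of tweepy) keeps queries readable:

```python
def build_query(keywords, from_user=None, lang=None, exclude_retweets=True, has_links=False):
    """Compose an X search query string from standard search operators."""
    parts = [keywords]
    if from_user:
        parts.append(f"from:{from_user}")
    if lang:
        parts.append(f"lang:{lang}")
    if exclude_retweets:
        parts.append("-is:retweet")  # drop retweets to reduce duplicates
    if has_links:
        parts.append("has:links")
    return " ".join(parts)

query = build_query("python web scraping", lang="en")
print(query)  # python web scraping lang:en -is:retweet
```

Filtering out retweets at query time also saves credits, since you are not billed for results you would discard anyway.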
Step 2: Free Alternative — twscrape
twscrape uses X's internal GraphQL API with real account credentials. It's free and actively maintained (v0.17, updated April 2025).
Warning: This violates X's Terms of Service. Use for research purposes only.
```python
import asyncio
from twscrape import API, gather

async def setup_and_search():
    api = API()

    # Add accounts (username, password, email, email_password)
    await api.pool.add_account(
        "username", "password",
        "email@gmail.com", "email_password"
    )
    await api.pool.login_all()

    # Search tweets
    tweets = await gather(api.search("python web scraping", limit=20))
    for tweet in tweets:
        print(f"@{tweet.user.username} ({tweet.likes} likes)")
        print(f"  {tweet.rawContent[:120]}...")
        print(f"  {tweet.url}")
        print()

    # Get user profile
    user = await api.user_by_login("GaborVarga84")
    print(f"{user.displayname}: {user.followersCount} followers")

    # Get followers
    followers = await gather(api.followers(user_id=user.id, limit=50))
    for f in followers:
        print(f"  @{f.username}: {f.followersCount} followers")

asyncio.run(setup_and_search())
```
twscrape Key Features
- Automatic account rotation to handle rate limits
- Async/await for concurrent requests
- Search tweets (Latest, Top, Media tabs)
- User profiles, followers, following
- Tweet details, retweeters, replies, quotes
- CLI tool included
- SNScrape-compatible data models
Step 3: Budget Third-Party — SocialData.tools
SocialData.tools is the cheapest third-party option at $0.20 per 1,000 tweets or users, with 3 free requests/minute.
```python
import requests
import json

API_KEY = "YOUR_SOCIALDATA_KEY"
headers = {"Authorization": f"Bearer {API_KEY}"}

# Search tweets
resp = requests.get(
    "https://api.socialdata.tools/v1/search",
    headers=headers,
    params={"query": "python web scraping", "limit": 20}
)
data = resp.json()
for tweet in data.get("tweets", []):
    print(f"@{tweet.get('author', {}).get('username')}")
    print(f"  {tweet.get('text', '')[:120]}")
    print()

# Get user profile
resp = requests.get(
    "https://api.socialdata.tools/v1/user/profile/xdevelopers",
    headers=headers
)
profile = resp.json()
print(json.dumps(profile, indent=2))

# Get tweet details
resp = requests.get(
    "https://api.socialdata.tools/v1/tweet/1234567890",
    headers=headers
)
tweet = resp.json()
print(tweet.get("text", "N/A"))
```
Pricing Comparison
| Service | Price per 1K items | Free tier | Auth required | Legal status |
|---|---|---|---|---|
| Official X API | Variable (credit-based) | None | X Developer account | Fully legal |
| twscrape | Free (needs accounts) | Unlimited | Real X accounts | ToS violation |
| SocialData.tools | $0.20 | 3 req/min | API key | Gray area |
| Bright Data | $0.75–$1.00 | Trial | Account | Gray area |
| RapidAPI proxies | $1–$10 | Varies | API key | Varies |
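For budgeting, the table's per-1K prices translate directly into project costs. A quick estimator (prices hard-coded from the table above; the official API is excluded because its credit costs vary by endpoint):

```python
# Prices in USD per 1,000 items, taken from the comparison table above
PRICE_PER_1K = {
    "socialdata": 0.20,
    "brightdata_low": 0.75,
    "brightdata_high": 1.00,
}

def estimate_cost(items, service):
    """Estimated USD cost for collecting `items` tweets/users via a third-party service."""
    return round(items / 1000 * PRICE_PER_1K[service], 2)

for service in PRICE_PER_1K:
    print(f"{service}: ${estimate_cost(100_000, service):.2f} per 100K items")
```

At these rates, a 100K-item collection runs about $20 on SocialData.tools versus $75–$100 on Bright Data, which is why the cheaper tier is attractive for exploratory work.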
Step 4: Handle Rate Limits
All approaches have rate limits. Here's how to handle them properly:
```python
import time

class XRateLimiter:
    """Rate limiter for X API access."""

    def __init__(self, max_rpm=50, max_rps=5):
        self.max_rpm = max_rpm  # requests per minute
        self.max_rps = max_rps  # requests per second
        self.requests = []
        self._last_request = 0

    def wait_if_needed(self):
        """Wait if we're approaching rate limits."""
        # Per-second throttle
        elapsed = time.time() - self._last_request
        if elapsed < 1.0 / self.max_rps:
            time.sleep((1.0 / self.max_rps) - elapsed)

        # Per-minute throttle (recompute `now` after any sleep above)
        now = time.time()
        cutoff = now - 60
        self.requests = [t for t in self.requests if t > cutoff]
        if len(self.requests) >= self.max_rpm:
            wait = self.requests[0] + 60 - now + 1
            if wait > 0:
                print(f"Rate limit: waiting {wait:.1f}s")
                time.sleep(wait)

        self.requests.append(time.time())
        self._last_request = time.time()

# Usage
limiter = XRateLimiter(max_rpm=30, max_rps=2)
for i in range(100):
    limiter.wait_if_needed()
    # Make your API call here
```
Step 5: Complement with SearchHive
X data gives you what people are posting. SearchHive gives you the broader context — what the web is saying about those same topics.
```python
from searchhive import SwiftSearch, DeepDive

api_key = "your_searchhive_key"

# Find broader web discussions about trending X topics
search = SwiftSearch(api_key=api_key)
results = search.search(
    query="AI regulation policy 2026",
    domains=["reuters.com", "techcrunch.com", "arstechnica.com", "theverge.com"],
    extract_fields=["title", "description", "url", "date"]
)

# Analyze sentiment and extract entities from articles
dd = DeepDive(api_key=api_key)
for article in results[:5]:
    analysis = dd.analyze(
        url=article["url"],
        summarize=True,
        extract_entities=True
    )
    print(f"{article['title']}")
    print(f"  Summary: {analysis.get('summary', 'N/A')[:200]}")
    print(f"  Entities: {analysis.get('entities', [])}")
    print()
```
How SearchHive complements X data:
| X/Twitter Data | SearchHive Adds |
|---|---|
| Real-time posts and opinions | In-depth article analysis and fact-checking |
| User profiles and follower counts | Broader web presence (blogs, news, forums) |
| Trending hashtags | Context: why something is trending, what's driving it |
| Short-form text (280 chars) | Long-form analysis and sentiment from full articles |
| User-posted links | Content extraction and analysis of linked articles |
Step 6: Build a Complete X Research Pipeline
Combine everything into a reusable research workflow:
```python
import json
from datetime import datetime

class XResearchPipeline:
    """Complete X/Twitter research pipeline."""

    def __init__(self, method="official", api_key=None, bearer_token=None, searchhive_key=None):
        self.method = method
        self.api_key = api_key            # SocialData.tools key (if method="socialdata")
        self.bearer_token = bearer_token  # official API token (if method="official")
        self.searchhive_key = searchhive_key

    def search_x(self, query, limit=50):
        """Search X for tweets matching query."""
        if self.method == "official" and self.bearer_token:
            return self._search_official(query, limit)
        elif self.method == "socialdata" and self.api_key:
            return self._search_socialdata(query, limit)
        else:
            print("No valid credentials configured")
            return []

    def _search_official(self, query, limit):
        """Search using the official X API via tweepy."""
        import tweepy
        client = tweepy.Client(bearer_token=self.bearer_token)
        response = client.search_recent_tweets(
            query=query,
            max_results=min(limit, 100),
            tweet_fields=["created_at", "public_metrics", "author_id", "lang"]
        )
        results = []
        if response.data:
            for tweet in response.data:
                results.append({
                    "id": tweet.id,
                    "text": tweet.text,
                    "created_at": str(tweet.created_at),
                    "metrics": tweet.public_metrics,
                    "source": "x_official_api"
                })
        return results

    def _search_socialdata(self, query, limit):
        """Search using the SocialData.tools API."""
        import requests
        headers = {"Authorization": f"Bearer {self.api_key}"}
        resp = requests.get(
            "https://api.socialdata.tools/v1/search",
            headers=headers,
            params={"query": query, "limit": limit}
        )
        data = resp.json()
        return data.get("tweets", [])

    def enrich_with_web_context(self, topic):
        """Add broader web context using SearchHive (requires its own API key)."""
        from searchhive import SwiftSearch
        search = SwiftSearch(api_key=self.searchhive_key)
        return search.search(
            query=topic,
            extract_fields=["title", "description", "url"]
        )

    def save_results(self, results, filename):
        """Save results to a timestamped JSON file."""
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        filepath = f"{filename}_{timestamp}.json"
        with open(filepath, "w") as f:
            json.dump(results, f, indent=2, default=str)
        print(f"Saved {len(results)} results to {filepath}")

# Usage
pipeline = XResearchPipeline(
    method="socialdata",
    api_key="your_api_key",
    searchhive_key="your_searchhive_key"
)

results = pipeline.search_x("artificial intelligence regulation", limit=50)
print(f"Found {len(results)} tweets")

# Add web context
web_context = pipeline.enrich_with_web_context("AI regulation policy 2026")
print(f"Found {len(web_context)} related web articles")

pipeline.save_results(results, "x_research_ai_regulation")
```
Common Issues
Official API: 403 Forbidden
Cause: Insufficient credits, expired token, or rate limit exceeded. Fix: Check your credit balance at console.x.com. Monitor x-rate-limit-remaining headers. Implement exponential backoff for retries.
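The exponential backoff mentioned above can wrap any API call. A generic sketch, not tied to a particular client library:

```python
import time

def with_backoff(call, max_retries=5, base_delay=2.0):
    """Retry `call` with exponential backoff; `call` should raise on failure."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception as exc:
            if attempt == max_retries - 1:
                raise  # out of retries, surface the error
            delay = base_delay * (2 ** attempt)  # 2s, 4s, 8s, ...
            print(f"Attempt {attempt + 1} failed ({exc}); retrying in {delay:.2f}s")
            time.sleep(delay)

# Usage with a flaky call (a stand-in for a real API request)
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("rate limited")
    return "ok"

print(with_backoff(flaky, base_delay=0.01))  # "ok" after two retries
```

In a real client you would catch only rate-limit and transient-server errors here, and check the x-rate-limit-remaining header before retrying rather than retrying blindly.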
twscrape: Account Suspended
Cause: X detected automated activity. Fix: Use cookie-based auth instead of password login, use residential proxies, limit activity patterns to human-like frequency, and rotate between multiple accounts.
SocialData.tools: Rate Limited
Cause: Free tier limit (3 req/min). Fix: Add a 21-second delay between requests, or upgrade to paid usage. At $0.20/1K items, costs are minimal.
Missing or Truncated Data
Cause: Not requesting all fields via tweet_fields/user_fields parameters. Fix: Explicitly request all needed fields. For the official API, use expansions to get related user objects.
Legal Considerations
- X's ToS explicitly prohibits unauthorized data collection
- X has sued scraping services (Bright Data, others) and sent DMCA takedowns
- The official API is the only fully legal method
- Mass collection and redistribution of tweets may infringe copyright
- GDPR applies to personal data of EU users — even public data
- Use data minimization: only collect what you need
- Don't redistribute scraped data — this significantly increases legal risk
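The data-minimization point above is easy to enforce in code: whitelist the fields you actually need and drop everything else before storing. A simple sketch (the field names are illustrative):

```python
# Keep only what the analysis needs; adjust to your use case
ALLOWED_FIELDS = {"id", "text", "created_at", "public_metrics"}

def minimize(tweet: dict) -> dict:
    """Drop every field not explicitly whitelisted before persisting."""
    return {k: v for k, v in tweet.items() if k in ALLOWED_FIELDS}

raw = {
    "id": 1,
    "text": "hi",
    "author_email": "x@y.z",     # hypothetical PII field: never store this
    "geo": {"lat": 1.0},
    "created_at": "2026-01-01",
}
print(minimize(raw))  # {'id': 1, 'text': 'hi', 'created_at': '2026-01-01'}
```

Running minimization at ingest time, rather than at analysis time, means sensitive fields never touch disk in the first place.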
Next Steps
- Start small — test with the official API on a limited dataset before scaling
- Combine sources — X data + web search via SearchHive gives a more complete picture
- Implement caching — deduplication within 24h windows saves credits
- Monitor costs — set spending limits in the X Developer Console
- Build dashboards — visualize trends, sentiment, and engagement metrics
Need to research X topics alongside broader web data? Start with SearchHive's free tier — 100 free requests per month. Check the API docs for integration guides.
See also: How to build a news scraper | SearchHive vs SerpApi | Web scraping with Python