Academic Search APIs — Scholar, PubMed, and Semantic Scholar Compared

Research-driven applications need access to academic papers, citations, and metadata. Whether you're building a literature review tool, an AI research assistant, or a citation graph, the quality of your academic search API determines the quality of your output.

The landscape has shifted in 2026. Google Scholar still doesn't offer an official API (third-party wrappers remain the only option). PubMed's E-utilities are free but limited to biomedical literature. Semantic Scholar provides a well-designed, free API with AI-powered features. And new commercial options have entered the market.

This guide compares the available academic search APIs head-to-head with real pricing, feature matrices, and code examples.

Key Takeaways

Semantic Scholar API is the best free option: no-cost access (free API keys start at an introductory limit of about 1 request/second), AI-powered paper recommendations, and structured citation data
PubMed E-utilities remains the gold standard for biomedical literature — free, reliable, but biomedical-only
Google Scholar has no official API — third-party wrappers (SerpApi, SearchHive) proxy it, with varying reliability
CrossRef API provides free DOI metadata and citation counts across all disciplines
SearchHive SwiftSearch can extract Google Scholar results when you need broader academic coverage

1. Semantic Scholar — Best Free Academic API

Semantic Scholar, backed by the Allen Institute for AI, offers a comprehensive academic search API with natural language queries, paper recommendations, and citation graph traversal.

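Pricing: Free. Unauthenticated requests share a common rate-limited pool; a free API key gives you a dedicated limit that starts at an introductory 1 request/second. An API key is recommended but not required for low-volume use.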

import requests

# Search for papers
resp = requests.get(
    "https://api.semanticscholar.org/graph/v1/paper/search",
    params={
        "query": "transformer architecture attention mechanism",
        "limit": 10,
        "fields": "title,abstract,year,citationCount,authors,openAccessPdf,url"
    }
)
for paper in resp.json().get("data", []):
    print(f"[{paper.get('year')}] {paper['title']}")
    print(f"  Citations: {paper.get('citationCount', 0)}")
    print(f"  URL: {paper.get('url', 'N/A')}")
    print()

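# Get details, references, and citations for the first search hit
results = resp.json().get("data", [])
if results:
    paper_id = results[0]["paperId"]
    paper_resp = requests.get(
        f"https://api.semanticscholar.org/graph/v1/paper/{paper_id}",
        params={
            "fields": "title,abstract,references,citations,embedding,tldr"
        }
    )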

Key features:

Natural language paper search
Citation graph traversal (references + citing papers)
AI-generated TLDR summaries for papers
Paper embeddings for similarity search
Author and venue information
Open access PDF links

Limitations: Coverage gaps in older papers (pre-2000), some disciplines less covered than CS/biomedicine, occasional rate limiting for unauthenticated requests.

2. PubMed E-utilities — Best for Biomedical Research

PubMed's E-utilities API provides free access to the MEDLINE database — over 36 million biomedical citations. It's been the standard for biomedical research for decades.

Pricing: Free. No API key required. Rate limit: 3 requests/second without a key, 10/second with an API key.

import requests
import xml.etree.ElementTree as ET

# Search PubMed for papers
search_resp = requests.get(
    "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
    params={
        "db": "pubmed",
        "term": "CRISPR gene editing clinical trial 2024:2026[dp]",
        "retmax": 20,
        "sort": "relevance"
    }
)
ids = ET.fromstring(search_resp.text).findall(".//Id")
id_list = ",".join(i.text for i in ids)

# Fetch details for the results
fetch_resp = requests.get(
    "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi",
    params={"db": "pubmed", "id": id_list, "retmode": "xml"}
)
articles = ET.fromstring(fetch_resp.text).findall(".//PubmedArticle")
for article in articles:
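    # Titles can contain inline markup (<i>, <sub>), so join all text nodes
    title_el = article.find(".//ArticleTitle")
    title = "".join(title_el.itertext()) if title_el is not None else "No title"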
    print(title)

Strengths: Comprehensive biomedical coverage, free, reliable, well-documented

Weaknesses: Biomedical only, XML responses (not JSON), limited full-text access, strict rate limits
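Since the responses are XML, most of the work with PubMed is parsing. The following is a minimal sketch of pulling a few fields out of an efetch-style document. `SAMPLE_XML` is a hand-written stand-in (not real PubMed data) that mirrors only the elements used here, and `parse_articles` is a hypothetical helper name. It joins all text nodes of the title, because titles may contain inline markup such as `<i>`, in which case `.text` alone returns only the leading fragment.

```python
import xml.etree.ElementTree as ET

# Hand-written sample shaped like an efetch response; not real PubMed data.
SAMPLE_XML = """
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>00000001</PMID>
      <Article>
        <Journal><JournalIssue><PubDate><Year>2025</Year></PubDate></JournalIssue></Journal>
        <ArticleTitle>Editing <i>BRCA1</i> in vitro</ArticleTitle>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>00000002</PMID>
      <Article><Journal><JournalIssue><PubDate/></JournalIssue></Journal></Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
"""


def parse_articles(xml_text):
    """Return one dict per PubmedArticle, tolerating missing titles and years."""
    records = []
    for article in ET.fromstring(xml_text).findall(".//PubmedArticle"):
        title_el = article.find(".//ArticleTitle")
        title = "".join(title_el.itertext()).strip() if title_el is not None else ""
        records.append({
            "pmid": article.findtext(".//PMID"),
            "title": title or "No title",
            "year": article.findtext(".//PubDate/Year"),  # None when absent
        })
    return records


records = parse_articles(SAMPLE_XML)
for record in records:
    print(record)
```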

3. CrossRef API — Best for DOI Metadata and Citation Counts

CrossRef is a DOI registration agency for scholarly publications. Its API provides metadata for over 150 million scholarly records across all disciplines.

Pricing: Free. Polite pool gets higher rate limits (identify yourself by sending your email as a mailto query parameter or in the User-Agent header).

import requests

# Search works by title or DOI
resp = requests.get(
    "https://api.crossref.org/works",
    params={
        "query": "attention is all you need transformer",
        "rows": 5,
        "select": "DOI,title,author,published,container-title,is-referenced-by-count",
        "mailto": "your@email.com"  # Polite pool: sent as a query parameter, not a header
    }
)
for item in resp.json().get("message", {}).get("items", []):
    print((item.get('title') or ['Untitled'])[0])  # title is a list and may be absent
    print(f"  DOI: {item.get('DOI', 'N/A')}")
    print(f"  Citations: {item.get('is-referenced-by-count', 0)}")
    print()

Strengths: Cross-disciplinary, free, DOI resolution, citation counts, funding information

Weaknesses: Abstracts only where publishers deposit them, no full text, limited search relevance, metadata quality varies by publisher
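Because metadata quality varies by publisher, it can help to flatten each CrossRef item defensively before using it. This is a sketch under assumptions: `flatten_crossref_item` is a hypothetical helper, `SAMPLE_ITEM` is hand-written (not a real record), and the field shapes it relies on (`title` as a list, `published` holding `date-parts`, authors with `given`/`family` or an organizational `name`) reflect the response format as commonly documented, so verify them against live responses.

```python
def flatten_crossref_item(item):
    """Reduce a CrossRef work item to a flat dict, tolerating missing fields."""
    titles = item.get("title") or []
    date_parts = (item.get("published") or {}).get("date-parts") or [[]]
    first_date = date_parts[0] if date_parts else []
    authors = [
        # Organizational authors are assumed to carry "name" instead of given/family
        a.get("name") or " ".join(p for p in (a.get("given"), a.get("family")) if p)
        for a in item.get("author", [])
    ]
    return {
        "doi": item.get("DOI"),
        "title": titles[0] if titles else None,
        "year": first_date[0] if first_date else None,
        "authors": authors,
        "citations": item.get("is-referenced-by-count", 0),
    }


# Hand-written sample item; the DOI and values are placeholders.
SAMPLE_ITEM = {
    "DOI": "10.1234/example.5678",
    "title": ["An Example Title"],
    "author": [{"given": "Ada", "family": "Lovelace"}],
    "published": {"date-parts": [[2021, 3]]},
    "is-referenced-by-count": 12,
}

flat = flatten_crossref_item(SAMPLE_ITEM)
empty = flatten_crossref_item({})
print(flat)
print(empty)
```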

4. Google Scholar via SearchHive — Broadest Coverage

Google Scholar has no official API. SearchHive's SwiftSearch can extract Google Scholar results, giving you access to its broad interdisciplinary coverage.

import requests

API_KEY = "your-searchhive-key"

# Search Google Scholar via SearchHive
resp = requests.get(
    "https://api.searchhive.dev/v1/swift/search",
    headers={"Authorization": f"Bearer {API_KEY}"},
    params={
        "query": "site:scholar.google.com transformer neural network efficiency",
        "limit": 10
    }
)
for result in resp.json().get("results", []):
    print(f"{result['title']}")
    print(f"  {result['url']}")
    print(f"  {result.get('description', '')[:150]}")
    print()

Strengths: Broadest academic coverage, familiar Google-quality results, works alongside other SearchHive APIs

Weaknesses: No structured citation data, dependent on Google's HTML structure, may hit anti-bot limits at high volume

5. SerpApi Scholar — Google Scholar Wrapper

SerpApi provides a structured API for Google Scholar results, parsing titles, authors, citations, and PDF links into clean JSON.

Pricing: Scholar searches count against your SerpApi plan. $25/mo for 1K total searches (not just Scholar).

import requests

resp = requests.get(
    "https://serpapi.com/search",
    params={
        "engine": "google_scholar",
        "q": "large language models survey 2026",
        "api_key": "YOUR_KEY"
    }
)
for organic in resp.json().get("organic_results", []):
    print(organic.get("title"))
    print(f"  Cited by: {organic.get('inline_links', {}).get('cited_by', {}).get('total', 0)}")

Strengths: Structured Google Scholar data, handles anti-bot measures automatically

Weaknesses: Expensive, rate limits on high-volume academic search, no full-text access

Comparison Table

API	Free Tier	Rate Limit	Disciplines	Full Text	Citation Data	Output Format
Semantic Scholar	Yes	1 req/s with key (introductory)	All (CS strongest)	Links to OA	Full graph	JSON
PubMed E-utilities	Yes	3 req/s (10 with key)	Biomedical	Links to PMC	Limited	XML
CrossRef	Yes (polite pool)	50 req/s	All disciplines	No	Citation counts	JSON
SearchHive (Scholar)	500 credits	Plan-based	All (via Google)	No	Snippets only	JSON
SerpApi Scholar	250/mo	Plan-based	All (via Google)	No	Structured	JSON

Choosing the Right Academic Search API

For AI/CS research: Start with Semantic Scholar. The API quality, citation graph, and TLDR summaries are purpose-built for this domain.

For biomedical research: PubMed is irreplaceable. Combine it with Semantic Scholar for broader context.

For cross-disciplinary research: Layer CrossRef (for DOIs and citation counts) under Semantic Scholar (for abstracts and recommendations).

When you need Google Scholar results: Use SearchHive's SwiftSearch to extract Scholar results. The site:scholar.google.com operator limits results to academic sources while using SearchHive's credit system.

For a complete research pipeline: Combine multiple APIs. Search with Semantic Scholar, validate DOIs with CrossRef, fetch biomedical details from PubMed, and use SearchHive DeepDive to extract data from paper pages.
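When combining sources this way, the DOI is the natural join key. The sketch below shows one possible way to merge records from several APIs. The helper names (`normalize_doi`, `merge_by_doi`) and the flat record shape with a `doi` field are assumptions for illustration, so you would first map each API's response into that shape. Earlier sources take priority for each field, and records without a DOI are skipped.

```python
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(doi):
    """Lowercase a DOI and strip resolver prefixes so variants compare equal."""
    if not doi:
        return None
    doi = doi.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi or None


def merge_by_doi(*sources):
    """Merge record lists by DOI; the first source to supply a field wins."""
    merged = {}
    for records in sources:
        for record in records:
            key = normalize_doi(record.get("doi"))
            if key is None:
                continue  # No DOI, so no reliable way to match this record
            entry = merged.setdefault(key, {"doi": key})
            for field, value in record.items():
                if field != "doi" and value is not None and entry.get(field) is None:
                    entry[field] = value
    return list(merged.values())


# Hand-written placeholder records standing in for two API responses.
from_semantic_scholar = [{"doi": "10.1234/ABC", "title": "Example", "abstract": "Text"}]
from_crossref = [
    {"doi": "https://doi.org/10.1234/abc", "title": "Example (publisher)", "citations": 7},
    {"doi": None, "title": "No DOI, skipped"},
]

combined = merge_by_doi(from_semantic_scholar, from_crossref)
print(combined)
```

Putting the source you trust most for a given field first in the argument list controls which value survives a conflict.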

Getting Started

Semantic Scholar requires only a free API key for high-volume use. PubMed needs no key, though one raises the limit from 3 to 10 requests/second. SearchHive gives you 500 free credits to test Scholar search alongside scraping and extraction.
