Why EDGAR full-text search is painful, and the API that fixes it
EDGAR full-text search at efts.sec.gov is the right tool for one job: find every
filing of a given form across every filer, regardless of which company's page it
lands on. The per-company submissions feed cannot do that, because some forms (a
Schedule 13D, for one) are filed under the reporting person's CIK, not the
target's. So if you want "every new activist stake this week," full-text search is
where you go. Then you meet the quirks.
The five things that bite
Ten hits per page. The search index returns ten documents at a time. Paging with
from= walks you forward in tens, and the result set is capped near 10,000, so a
broad query needs a tight date filter or it truncates without telling you.
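A minimal sketch of the paging arithmetic, assuming the ten-hit page size and the roughly 10,000-hit cap described above. The name plan_pages and its parameters are illustrative, not part of any SEC interface. It lists the from= offsets worth requesting and flags a total that exceeds the cap, so truncation stops being silent.

```python
from typing import List, Tuple

PAGE_SIZE = 10        # hits per page, per the text above
RESULT_CAP = 10_000   # approximate cap on reachable hits (assumption: exact value may differ)


def plan_pages(total_hits: int,
               page_size: int = PAGE_SIZE,
               cap: int = RESULT_CAP) -> Tuple[List[int], bool]:
    """Return the from= offsets to fetch and whether the result set is truncated."""
    reachable = min(total_hits, cap)
    offsets = list(range(0, reachable, page_size))
    truncated = total_hits > cap
    return offsets, truncated


offsets, truncated = plan_pages(total_hits=37)
if truncated:
    print("narrow the date filter: results past the cap are unreachable")
```

When truncated comes back true, the usual remedy is to split the date range and rerun each half rather than to keep paging.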
The form-name mismatch. The submissions API and the filing cover page call it
SC 13D. The full-text index calls the same form SCHEDULE 13D. Query with the
wrong token and you get a clean 200 with zero hits and no hint that your string
was the problem. The same trap waits on SC 13G versus SCHEDULE 13G and on the
424B variants.
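One way to keep the two vocabularies straight is a small lookup applied before every query. This is a sketch only: the table holds just the two pairs named above, the helper name to_fulltext_form is hypothetical, and carrying an amendment suffix such as /A across unchanged is an assumption you should verify against real hits. The 424B variants are left unmapped because their index tokens are not given here.

```python
# Only the pairs named in the text are mapped; extend the table as you verify others.
SUBMISSIONS_TO_FULLTEXT = {
    "SC 13D": "SCHEDULE 13D",
    "SC 13G": "SCHEDULE 13G",
}


def to_fulltext_form(form: str) -> str:
    """Translate a submissions-style form name into the full-text index token."""
    cleaned = " ".join(form.upper().split())     # normalize case and spacing
    base, sep, suffix = cleaned.partition("/")   # keep an amendment suffix such as "/A"
    mapped = SUBMISSIONS_TO_FULLTEXT.get(base, base)
    return mapped + sep + suffix
```

Unknown forms pass through untouched, so a form you have not mapped yet fails the same way it would have without the helper instead of being rewritten into something wrong.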
Day-only dates. startdt and enddt filter to the calendar day. You cannot ask
for "filings since 14:32," so an intraday poller has to over-fetch the current
day and dedupe the overlap itself.
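The dedup step can be as small as a set of accession numbers carried between polls. The sketch below assumes each hit has already been reduced to a dict with an "accession" key; that shape and the name new_hits are hypothetical and do not reflect the raw response format.

```python
from typing import Dict, Iterable, List, Set


def new_hits(hits: Iterable[Dict[str, str]], seen: Set[str]) -> List[Dict[str, str]]:
    """Return hits whose accession has not been seen yet, recording them in `seen`."""
    fresh = []
    for hit in hits:
        accession = hit["accession"]
        if accession in seen:
            continue            # already handled in an earlier poll of the same day
        seen.add(accession)
        fresh.append(hit)
    return fresh
```

Each poll refetches the whole current day and passes the result through new_hits with the same seen set, so only filings that arrived since the last poll come out.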
Hits are documents, not facts. A result tells you a CIK, an accession, and that
your term appeared somewhere in the filing. It does not tell you the holder, the
target ticker, or the percent of class. You still fetch each hit's primary
document and parse it. Newer filings carry a structured primary_doc.xml, while
older ones bury the same numbers in cover-page prose.
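Getting from a hit to its document is mostly string handling. The sketch below assumes the usual EDGAR Archives layout, with the CIK stripped of leading zeros and the accession stripped of dashes; the function name primary_doc_url is hypothetical, and the default filename only applies to filings that actually carry a primary_doc.xml.

```python
def primary_doc_url(cik: str, accession: str, filename: str = "primary_doc.xml") -> str:
    """Build the Archives URL for a document inside one filing."""
    cik_path = str(int(cik))                     # Archives paths drop the CIK's leading zeros
    accession_path = accession.replace("-", "")  # and the accession's dashes
    return (
        "https://www.sec.gov/Archives/edgar/data/"
        f"{cik_path}/{accession_path}/{filename}"
    )
```

Parsing what comes back is the harder half, since the structured and prose variants need separate code paths.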
The User-Agent rule. EDGAR wants a declared User-Agent with contact info and
expects you to stay under 10 requests a second. Miss the header and you get 403s
that look like outages.
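Both halves of that rule fit in a few lines of standard-library Python. This is a sketch under assumptions: the User-Agent string is a placeholder to replace with your own name and contact address, the 8-per-second default is an arbitrary margin under the limit, and Throttle and build_request are illustrative names.

```python
import time
import urllib.request
from typing import Callable, Optional

USER_AGENT = "Example Research admin@example.com"  # placeholder: use your own contact info


class Throttle:
    """Space calls at least 1/max_per_second seconds apart."""

    def __init__(self, max_per_second: float = 8.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        """Block until enough time has passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def build_request(url: str) -> urllib.request.Request:
    """Attach the declared User-Agent to every outgoing request."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
```

Call wait() on a shared Throttle immediately before each urllib.request.urlopen(build_request(url)). The clock and sleep arguments exist only so the pacing can be tested without real delays.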
None of this is hard in isolation. It is fiddly, it breaks quietly, and it is the kind of plumbing that works for months and then drops events when the SEC rotates a schema version.
Hand the search off, get typed records back
EDGAR Events runs that full-text pipeline for you. It normalizes the form names, pages through the hits, follows each one to its cover-page XML, and returns the parsed entity instead of a document pointer. The activist-stakes endpoint is the clearest example, and it is live data:
curl -s -H "X-API-Key: $EDGAR_KEY" \
  "https://api.edgarevents.com/activist-stakes?min_percent=5&limit=2"
A single result, pulled from the feed today:
{
  "event_id": "0001829126-26-006953",
  "event_type": "activist_stake",
  "form": "SCHEDULE 13D/A",
  "filed_date": "2026-06-26",
  "accession": "0001829126-26-006953",
  "target": {
    "name": "Indaptus Therapeutics, Inc.",
    "ticker": "INDP",
    "tickers": ["INDP"],
    "cik": "0001857044"
  },
  "holders": [
    {
      "name": "Sino Lion Ventures Ltd",
      "cik": "0002120638",
      "shares": 38895000.0,
      "percent_of_class": 29.19,
      "sole_voting": 0.0,
      "shared_voting": 38895000.0,
      "type_of_reporting_person": "CO"
    }
  ],
  "percent_of_class": 29.19,
  "shares": 38895000.0,
  "security_class": "Common Stock, par value $0.01 per share",
  "date_of_event": "06/24/2026",
  "filing_url": "https://www.sec.gov/Archives/edgar/data/2120638/000182912626029689/primary_doc.xml",
  "source": "fulltext_search",
  "parsed_structured": true
}
The source: "fulltext_search" field is the tell: the service resolved the
SCHEDULE 13D/A form name, ran the index, deduped the document hits down to one
event, and read the cover page. You filter on the numbers you trade on, not on
search syntax. min_percent thresholds the largest stake in the filing,
include_amendments=false drops the 13D/A updates, and ticker narrows to a
target. When a filing is too old for structured XML, parsed_structured comes
back false and you get a percent_narrative field with the raw cover-page text
rather than a guessed number.
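On the client side, those filters reduce to building a query string and branching once on parsed_structured. The sketch below uses only the parameter and field names that appear in this article; the helper names stakes_url and stake_percent are hypothetical, and sending include_amendments only when it is false is a guess about how the API treats its default.

```python
from typing import Any, Dict, Optional
from urllib.parse import urlencode

BASE_URL = "https://api.edgarevents.com/activist-stakes"


def stakes_url(min_percent: Optional[float] = None,
               include_amendments: bool = True,
               ticker: Optional[str] = None,
               limit: Optional[int] = None) -> str:
    """Build the request URL from the filters described above."""
    params: Dict[str, Any] = {}
    if min_percent is not None:
        params["min_percent"] = min_percent
    if not include_amendments:
        params["include_amendments"] = "false"   # assumption: omitted means amendments included
    if ticker:
        params["ticker"] = ticker
    if limit is not None:
        params["limit"] = limit
    return f"{BASE_URL}?{urlencode(params)}" if params else BASE_URL


def stake_percent(event: Dict[str, Any]) -> Optional[float]:
    """Return the parsed percent, or None when only narrative text is available."""
    if event.get("parsed_structured"):
        return event.get("percent_of_class")
    return None   # caller should fall back to event.get("percent_narrative")
```

Returning None for unstructured filings keeps the narrative case explicit, so downstream code cannot mistake a missing number for a zero stake.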
The same wrapping covers form scanning generally. The /filings read returns an
envelope that names what it actually searched:
{
  "count": 1,
  "universe": ["AAPL", "MSFT", "NVDA", "TSLA", "WBD", "F", "GM", "AMD"],
  "events": [ ... ],
  "errors": null
}
universe is the resolved set the query ran against, and errors surfaces a
ticker that failed to resolve instead of silently dropping it from the search. The
two failure modes that hurt most in raw full-text search, a query that matched
nothing because the form token was wrong and a symbol that quietly fell out, both
become visible.
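A caller can turn that envelope into explicit warnings before touching the events. This is a sketch only: the exact shape of a non-null errors value is not shown here, so the code treats anything truthy as a problem, and the name envelope_warnings is hypothetical.

```python
from typing import Any, Dict, List


def envelope_warnings(envelope: Dict[str, Any], requested: List[str]) -> List[str]:
    """Collect the conditions that would otherwise fail silently."""
    warnings = []
    errors = envelope.get("errors")
    if errors:   # assumption: null or empty means every ticker resolved
        warnings.append(f"resolution errors: {errors}")
    missing = sorted(set(requested) - set(envelope.get("universe") or []))
    if missing:
        warnings.append(f"not in searched universe: {missing}")
    if envelope.get("count", 0) == 0:
        warnings.append("query returned no events")
    return warnings
```

An empty list means the query ran against everything you asked for and found something; anything else is worth logging before you trust an empty result.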
When raw efts is the better call
If you need ad-hoc, exploratory search across the whole corpus, run efts.sec.gov
directly. It is free, it covers every form, and it is the right primitive for a
one-off investigation. The API earns its price when the search is a standing
production dependency: you want the form-name normalization, the cover-page
parser kept current across schema versions, the SEC rate-limit handling, and a
typed record your code can act on without a second parsing stage.
Free tier to start, then $29/month, self-serve, cancel anytime. Also on RapidAPI. Get a key at edgarevents.com; the full reference is at api.edgarevents.com/docs.
EDGAR Events is an independent service and is not affiliated with or endorsed by the U.S. Securities and Exchange Commission. Data comes from the public EDGAR system (data.sec.gov, efts.sec.gov).