
Why EDGAR full-text search is painful, and the API that fixes it

EDGAR full-text search at efts.sec.gov is the right tool for one job: find every filing of a given form across every filer, regardless of which company's page it lands on. The per-company submissions feed cannot do that, because some forms (a Schedule 13D, for one) are filed under the reporting person's CIK, not the target's. So if you want "every new activist stake this week," full-text search is where you go. Then you meet the quirks.

The five things that bite

Ten hits per page. The search index returns ten documents at a time. Paging with from= walks you forward in tens, and the result set is capped near 10,000, so a broad query needs a tight date filter or it truncates without telling you.
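The paging arithmetic is easy to get subtly wrong, so here is a minimal sketch of it. The page size and the cap are the figures quoted above (the cap is approximate), and the function names are made up for illustration; the HTTP request itself is left out.

```python
PAGE_SIZE = 10       # hits per page, as described above
RESULT_CAP = 10_000  # approximate ceiling on reachable results, as described above


def page_offsets(total_hits: int, page_size: int = PAGE_SIZE, cap: int = RESULT_CAP) -> range:
    """Return the from= offsets needed to walk every reachable hit."""
    reachable = min(total_hits, cap)
    return range(0, reachable, page_size)


def is_truncated(total_hits: int, cap: int = RESULT_CAP) -> bool:
    """Flag a query whose total sits at or above the cap.

    A total at the cap is treated as suspect too, since the reported
    count may itself have been clipped.
    """
    return total_hits >= cap
```

If is_truncated comes back true, narrowing the date window and re-running is safer than trusting the pages you did get.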

The form-name mismatch. The submissions API and the filing cover page call it SC 13D. The full-text index calls the same form SCHEDULE 13D. Query with the wrong token and you get a clean 200 with zero hits and no hint that your string was the problem. The same trap waits on SC 13G versus SCHEDULE 13G and on the 424B variants.
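One way to guard against the mismatch is to translate form names before they reach the query. The sketch below covers only the two pairs named above and assumes an amendment suffix such as /A carries over unchanged; the table and function name are hypothetical, and the 424B variants are left out because their index tokens are not given here.

```python
# Submissions-style name -> full-text index name (only the pairs named above).
FULLTEXT_FORM_NAMES = {
    "SC 13D": "SCHEDULE 13D",
    "SC 13G": "SCHEDULE 13G",
}


def to_fulltext_form(form: str) -> str:
    """Translate a submissions-style form name into the full-text token."""
    # Split off an amendment suffix such as "/A" so the base name can be looked up.
    base, sep, suffix = form.strip().upper().partition("/")
    base = " ".join(base.split())  # collapse repeated whitespace
    mapped = FULLTEXT_FORM_NAMES.get(base, base)
    return mapped + sep + suffix
```

Forms without an entry pass through untouched, so the helper is safe to apply to every query.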

Day-only dates. startdt and enddt filter to the calendar day. You cannot ask for "filings since 14:32," so an intraday poller over-fetches the current day and dedups the overlap itself.
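The dedup half of that poller can be as small as a set of accession numbers. This is a sketch under one assumption: each hit has already been reduced to a dict with an "accession" key, which is a hypothetical shape and not the raw search response.

```python
from typing import Iterable


class SeenAccessions:
    """Remember accession numbers across polls and return only new hits."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def new_hits(self, hits: Iterable[dict]) -> list[dict]:
        fresh = []
        for hit in hits:
            accession = hit["accession"]
            if accession not in self._seen:
                self._seen.add(accession)
                fresh.append(hit)
        return fresh
```

Each poll re-fetches the whole current day and passes the result through new_hits. Because the key is the accession, several document hits from one filing also collapse to a single entry.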

Hits are documents, not facts. A result tells you a CIK, an accession, and that your term appeared somewhere in the filing. It does not tell you the holder, the target ticker, or the percent of class. You still fetch each hit's primary document and parse it, and newer filings carry a structured primary_doc.xml while older ones bury the same numbers in cover-page prose.
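Getting from a hit to its primary document is mostly string handling. The sketch below assumes the usual EDGAR archive layout (CIK without leading zeros, accession without dashes) and assumes the file is named primary_doc.xml, which holds only for filings that carry the structured document; the function name is hypothetical.

```python
def primary_doc_url(cik: str, accession: str, filename: str = "primary_doc.xml") -> str:
    """Build an EDGAR archive URL for one document of a filing."""
    cik_path = str(int(cik))                     # "0002120638" -> "2120638"
    accession_path = accession.replace("-", "")  # drop the dashes
    return f"https://www.sec.gov/Archives/edgar/data/{cik_path}/{accession_path}/{filename}"
```

For older filings you would first have to look up the real document name from the filing index, then parse the cover-page prose yourself.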

The User-Agent rule. EDGAR wants a declared User-Agent with contact info and expects you to stay under 10 requests a second. Miss the header and you get 403s that look like outages.
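Both halves of that rule fit in a few lines. This is a sketch, not a hardened client: the header layout (an app name followed by a contact email) and the default of 8 requests a second are assumptions chosen to leave headroom under the limit, and the class and function names are hypothetical.

```python
import time
from typing import Callable


def edgar_headers(app_name: str, contact_email: str) -> dict[str, str]:
    """Build a declared User-Agent header with contact info."""
    return {"User-Agent": f"{app_name} {contact_email}"}


class Throttle:
    """Space calls so they never exceed max_per_second."""

    def __init__(
        self,
        max_per_second: float = 8.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._min_gap = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self) -> float:
        """Block until the next request is allowed; return seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay > 0:
            self._sleep(delay)
        # Schedule the following slot one gap after this one.
        self._next_allowed = max(now, self._next_allowed) + self._min_gap
        return delay
```

Call wait() before every request and send the headers with each one. The injectable clock and sleep exist only so the spacing can be tested without real delays.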

None of this is hard in isolation. It is fiddly, it breaks quietly, and it is the kind of plumbing that works for months and then drops events when the SEC rotates a schema version.

Hand the search off, get typed records back

EDGAR Events runs that full-text pipeline for you. It normalizes the form names, pages through the hits, follows each one to its cover-page XML, and returns the parsed entity instead of a document pointer. The activist-stakes endpoint is the clearest example, and it is live data:

curl -s -H "X-API-Key: $EDGAR_KEY" \
  "https://api.edgarevents.com/activist-stakes?min_percent=5&limit=2"

A single result, pulled from the feed today:

{
  "event_id": "0001829126-26-006953",
  "event_type": "activist_stake",
  "form": "SCHEDULE 13D/A",
  "filed_date": "2026-06-26",
  "accession": "0001829126-26-006953",
  "target": {
    "name": "Indaptus Therapeutics, Inc.",
    "ticker": "INDP",
    "tickers": ["INDP"],
    "cik": "0001857044"
  },
  "holders": [
    {
      "name": "Sino Lion Ventures Ltd",
      "cik": "0002120638",
      "shares": 38895000.0,
      "percent_of_class": 29.19,
      "sole_voting": 0.0,
      "shared_voting": 38895000.0,
      "type_of_reporting_person": "CO"
    }
  ],
  "percent_of_class": 29.19,
  "shares": 38895000.0,
  "security_class": "Common Stock, par value $0.01 per share",
  "date_of_event": "06/24/2026",
  "filing_url": "https://www.sec.gov/Archives/edgar/data/2120638/000182912626029689/primary_doc.xml",
  "source": "fulltext_search",
  "parsed_structured": true
}

The "source": "fulltext_search" field is the tell. The service resolved the SCHEDULE 13D/A form name, ran the index, deduped the document hits down to one event, and read the cover page. You filter on the numbers you trade on, not on search syntax: min_percent thresholds the largest stake in the filing, include_amendments=false drops the 13D/A updates, and ticker narrows to a target. When a filing is too old for structured XML, parsed_structured comes back false and you get a percent_narrative field with the raw cover-page text rather than a guessed number.
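On the consuming side, the one branch worth writing down is the parsed_structured check. The sketch below uses only field names from the sample record and the paragraph above; the helper names are hypothetical, and it assumes percent_narrative is a plain string.

```python
from typing import Optional


def largest_stake(event: dict) -> Optional[float]:
    """Return the largest holder percentage, or None when the filing was not parsed."""
    if not event.get("parsed_structured"):
        # No structured cover page: do not guess a number from prose.
        return None
    percents = [
        holder["percent_of_class"]
        for holder in event.get("holders", [])
        if holder.get("percent_of_class") is not None
    ]
    if percents:
        return max(percents)
    return event.get("percent_of_class")


def describe(event: dict) -> str:
    """One-line summary that falls back to the narrative text for old filings."""
    target = event.get("target", {})
    label = target.get("ticker") or target.get("name") or "unknown target"
    stake = largest_stake(event)
    if stake is None:
        narrative = event.get("percent_narrative", "")
        return f"{label}: stake not parsed, narrative: {narrative}"
    return f"{label}: {stake:.2f}% via {event['form']}"
```

Returning None instead of a default keeps an unparsed filing from slipping through a numeric threshold as if it were a zero-percent stake.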

The same wrapping covers form scanning generally. The /filings endpoint returns an envelope that names what it actually searched:

{
  "count": 1,
  "universe": ["AAPL", "MSFT", "NVDA", "TSLA", "WBD", "F", "GM", "AMD"],
  "events": [ ... ],
  "errors": null
}

universe is the resolved set the query ran against, and errors surfaces a ticker that failed to resolve instead of silently dropping it from the search. That makes the two failure modes that hurt most in raw full-text search visible: a query that matched nothing because the form token was wrong, and a symbol that quietly fell out of the search.
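A caller can turn that envelope into a hard check before trusting an empty result. This sketch assumes only what the sample shows: universe is a list of tickers and errors is null when nothing went wrong. The shape of a non-null errors value is not shown here, so the code treats any truthy value as a problem; the function name is hypothetical.

```python
def envelope_problems(envelope: dict, requested: list[str]) -> list[str]:
    """Return human-readable problems; an empty list means the search ran as asked."""
    problems = []
    if envelope.get("errors"):
        problems.append(f"resolution errors: {envelope['errors']}")
    searched = set(envelope.get("universe") or [])
    missing = sorted(set(requested) - searched)
    if missing:
        problems.append(f"not in searched universe: {missing}")
    return problems
```

An empty events list with no problems means nothing was filed. An empty list with problems means the query did not cover what you asked for.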

When raw efts is the better call

If you need ad-hoc, exploratory search across the whole corpus, query efts.sec.gov directly. It is free, it covers every form, and it is the right primitive for a one-off investigation. The API earns its price when the search is a standing production dependency: you want the form-name normalization, a cover-page parser kept current across schema versions, SEC rate-limit handling, and a typed record your code can act on without a second parsing stage.

Free tier to start, then $29/month, self-serve, cancel anytime. Also on RapidAPI. Get a key at edgarevents.com; the full reference is at api.edgarevents.com/docs.

EDGAR Events is an independent service and is not affiliated with or endorsed by the U.S. Securities and Exchange Commission. Data comes from the public EDGAR system (data.sec.gov, efts.sec.gov).
