Documentation

How Spotlight works.

A versioned skill bundle that turns any AI agent runtime into an OSINT investigation pipeline. Brief → methodology → execution cycles → fact-check gate → vault. Open by default; opt-in sensitive track for local-model sessions.

01 — What it is
An OSINT pipeline you check into your repo.

Spotlight is a set of skills, agent prompts, schemas, and verification rules stored as version-controlled markdown. You point your AI agent runtime at the bundle, give it a lead, and it walks the investigation end-to-end: planning the methodology, running search and scrape cycles, archiving evidence locally before citing it, fact-checking with an independent pass, and writing the result into a knowledge vault you own.

The point isn't to replace the journalist. The point is to remove rote work — link-walking, archive-before-cite, schema compliance, citation hygiene, fact-check provenance — so editorial time goes to judgement and source work, not to babysitting the agent.

Everything in the bundle is editable. If your newsroom needs a different verdict taxonomy or a custom integration, you fork the repo, edit the markdown, and your agent picks up the change next session.

02 — The pipeline
Six stages, one editorial loop.

Every investigation runs through the same six stages. The agent moves between them under explicit gates — no silent transitions, no half-finished cases.

01 Preflight → 02 Brief → 03 Methodology → 04 Execution × 5 → 05 Gate 1 → 06 Ingestion

Preflight

Runtime check, vault check, integrations check. Spotlight refuses to start if Firecrawl is missing, the vault is unreadable, or a required external tool isn't responding. Failures here are surfaced in plain English; no work happens until the environment is sound.
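The configuration and vault parts of preflight can be pictured as a function that returns plain-English problems and lets work start only when the list is empty. This is a minimal sketch, not Spotlight's code: the real checks belong to the integrations skill, and the helper name `preflight` is hypothetical. It covers only the required variables and vault readability; probing whether external tools are actually responding is left out.

```python
import os
from pathlib import Path

# Hypothetical sketch: the variable names come from the Configuration section,
# but this function is illustrative and not part of the Spotlight bundle.
REQUIRED_VARS = ("SPOTLIGHT_VAULT_PATH", "FIRECRAWL_API_KEY", "OSINT_NAV_API_KEY")


def preflight(env: dict[str, str]) -> list[str]:
    """Return plain-English problems; an empty list means work may start."""
    problems = []
    for name in REQUIRED_VARS:
        if not env.get(name):
            problems.append(f"{name} is not set; add it to ~/.config/spotlight/.env")
    vault = env.get("SPOTLIGHT_VAULT_PATH")
    if vault:
        path = Path(vault).expanduser()
        if not path.is_dir() or not os.access(path, os.R_OK):
            problems.append(f"The vault at {path} is missing or unreadable")
    return problems
```

Usage: `preflight(dict(os.environ))` returns the list of problems to show the journalist before any investigation work begins.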

Brief

The agent restates the lead in its own words, names the affected parties, sets a scope, and writes a hypothesis. You read it back. If the agent has misunderstood the lead, correct it here: redirecting it with two sentences of prose now is faster than redirecting it three execution cycles later.

Methodology

The agent writes a methodology JSON: which entities to investigate, which sources to consult (corporate registries, social media, archived web, OSINT Navigator tool lookups), which fact-check threshold applies. The file is editable. The execution cycles will follow this plan; if you don't like the plan, change it before the agent starts.
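Because the methodology is a plain JSON file, a quick check before execution can catch an empty or malformed plan. The sketch below assumes hypothetical keys (`entities`, `sources`, `fact_check_threshold`) that mirror the three things the paragraph says the plan contains; Spotlight's actual schema may name or structure them differently.

```python
import json

# Assumed key names and types; Spotlight's real methodology schema may differ.
REQUIRED_KEYS = {"entities": list, "sources": list, "fact_check_threshold": str}

EXAMPLE = """
{
  "entities": ["Example Holdings Ltd"],
  "sources": ["corporate_registry", "archived_web", "osint_navigator"],
  "fact_check_threshold": "high"
}
"""


def load_methodology(text: str) -> dict:
    """Parse a methodology file and check that each required key has the expected type."""
    plan = json.loads(text)
    for key, expected in REQUIRED_KEYS.items():
        if not isinstance(plan.get(key), expected):
            raise ValueError(f"methodology.{key} must be a {expected.__name__}")
    if not plan["entities"]:
        raise ValueError("methodology.entities is empty; nothing to investigate")
    return plan
```

Editing the file by hand and re-running a check like this before the first cycle is one way to confirm the edited plan is still usable.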

Execution cycles

Up to five execution cycles, each in two modes: PLANNING (what to do next) and EXECUTION (do it, record what came back). The investigator agent runs the cycles; every source is archived locally before it can be cited; every claim gets an evidence trail with grounding rationale.
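The archive-before-cite rule can be enforced structurally: a claim only accepts a source that already has a local file. This is a hypothetical data model for illustration; the class names, the `NotArchivedError` exception and the `access_method` values are assumptions, not Spotlight's schema.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional


class NotArchivedError(Exception):
    """Raised when a claim tries to cite a source that has no local archive."""


@dataclass
class Source:
    url: str
    access_method: str                  # illustrative values, e.g. "public_web"
    local_file: Optional[Path] = None   # set once the capture is archived locally


@dataclass
class Claim:
    text: str
    grounding: str                      # rationale linking the evidence to the claim
    sources: list[Source] = field(default_factory=list)


def cite(claim: Claim, source: Source) -> None:
    """Attach a source to a claim, refusing anything not yet archived locally."""
    if source.local_file is None:
        raise NotArchivedError(f"archive {source.url} before citing it")
    claim.sources.append(source)
```

Raising instead of silently skipping means the EXECUTION mode has to archive the page and retry, which keeps every citation backed by a file on disk.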

Gate 1 — six readiness criteria

Before ingestion, the case has to pass six checks:

  1. Findings are non-empty and schema-valid.
  2. Independence — the fact-checker is spawned as a separate agent, not run in the investigator's context.
  3. Disputes recorded — contradictions in the evidence are surfaced, not papered over.
  4. Affected perspective consulted (where the lead concerns a named entity).
  5. Document trail intact — every source has an access_method and a local file.
  6. Gap assessment — what we don't know is named, not hidden.
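The six criteria map naturally onto independent boolean checks, so a failing case reports exactly which criterion blocked it. In this sketch the `Case` shape and every field name are hypothetical, and "schema-valid" is approximated by requiring `claim` and `evidence` keys on each finding; the review skill applies Spotlight's own schema.

```python
from dataclasses import dataclass, field


@dataclass
class Case:
    # Hypothetical case shape, for illustration only.
    findings: list[dict]
    fact_checker_context_id: str
    investigator_context_id: str
    contradictions_found: int
    disputes_recorded: int
    names_entity: bool
    affected_party_consulted: bool
    sources: list[dict] = field(default_factory=list)
    gaps: list[str] = field(default_factory=list)


def gate_one(case: Case) -> dict[str, bool]:
    """Evaluate the six readiness criteria; the case passes only if all are True."""
    return {
        # 1. Non-empty findings, each with the (assumed) minimal keys.
        "findings": bool(case.findings)
        and all("claim" in f and "evidence" in f for f in case.findings),
        # 2. The fact-checker ran in a different context from the investigator.
        "independence": case.fact_checker_context_id != case.investigator_context_id,
        # 3. Every contradiction found has a recorded dispute.
        "disputes": case.disputes_recorded >= case.contradictions_found,
        # 4. Only required when the lead concerns a named entity.
        "affected_perspective": case.affected_party_consulted or not case.names_entity,
        # 5. Every source has an access method and a local file.
        "document_trail": bool(case.sources)
        and all(s.get("access_method") and s.get("local_file") for s in case.sources),
        # 6. Unknowns are named explicitly.
        "gaps": bool(case.gaps),
    }
```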

If a criterion fails, the agent loops back. If the failure persists past cycle five, the stall protocol triggers: the investigation is paused and the journalist decides whether to continue, change scope, or shelve the case.
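The loop-back and stall behaviour reduces to a bounded loop. This sketch assumes Gate 1 is re-checked after every cycle; `step` and `ready` are hypothetical callables standing in for one PLANNING plus EXECUTION pass and for the readiness checks.

```python
from typing import Callable

MAX_CYCLES = 5


def run_cycles(step: Callable[[int], None], ready: Callable[[], bool]) -> str:
    """Run up to five cycles; return "ingest" on a Gate 1 pass, else "stalled"."""
    for cycle in range(1, MAX_CYCLES + 1):
        step(cycle)      # one PLANNING pass followed by one EXECUTION pass
        if ready():      # all six Gate 1 criteria pass
            return "ingest"
    return "stalled"     # pause and hand the decision to the journalist
```

Returning a status rather than continuing indefinitely is what keeps the decision to continue, rescope, or shelve with the journalist.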

Ingestion

Findings, fact-check verdicts, methodology, and source archives are written into your knowledge vault as structured markdown with wikilinks. Future investigations search the vault via QMD and build on prior work without rediscovering the same entities from scratch.
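Wikilinks are what let a later investigation land on an existing entity note instead of rediscovering it. The sketch below renders one such note; the frontmatter field (`type`) and section headings are assumptions about layout, not the ingest skill's actual template.

```python
def render_entity_note(name: str, kind: str, related: list[str], sources: list[str]) -> str:
    """Render a markdown entity note; [[wikilinks]] connect it to other vault notes."""
    lines = [
        "---",
        f"type: {kind}",                          # assumed frontmatter field
        "---",
        f"# {name}",
        "",
        "## Related",
        *[f"- [[{other}]]" for other in related],
        "",
        "## Sources",
        *[f"- [[{src}]]" for src in sources],     # links to archived source notes
    ]
    return "\n".join(lines) + "\n"
```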

03 — Skills inside Spotlight
15 skills that compose the pipeline.

Each skill is a markdown file with a frontmatter contract (name, description, allowed verbs). The agent loads them at runtime. You can edit any of them.
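A runtime loading a skill has to read its frontmatter contract before anything else. This sketch handles only flat `key: value` lines between `---` markers (no YAML library), and the key name `allowed-verbs` is an assumption standing in for however the bundle actually spells the allowed-verbs field.

```python
# Assumed key names; check an existing SKILL.md for the real contract.
REQUIRED = ("name", "description", "allowed-verbs")


def parse_frontmatter(text: str) -> dict[str, str]:
    """Read flat key: value pairs from a leading --- block and check required keys."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a --- frontmatter block")
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    else:  # loop finished without finding the closing ---
        raise ValueError("frontmatter block is never closed")
    missing = [k for k in REQUIRED if k not in meta]
    if missing:
        raise ValueError(f"frontmatter missing: {', '.join(missing)}")
    return meta
```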

spotlight
Orchestrator — walks the pipeline, manages state, calls other skills.
investigate
Execution-cycle logic. PLANNING/EXECUTION modes, stall protocol.
review
Gate 1 readiness checks. Renders human-review HTML.
ingest
Vault archival — entity notes, methodology notes, source notes, registries.
epistemic-grounding
Claim-to-evidence grounding. Confidence caps by access method.
osint
Tool lookup via OSINT Navigator. Country-specific registries.
follow-the-money
Corporate filings, beneficial-ownership chains, registry walks.
social-media-intelligence
Platform-specific scraping, archive-first capture.
content-access
Legal access hierarchy for paywalled or restricted sources.
web-archiving
Wayback / Archive.today / local archive chain, chain-of-custody.
acquisition-graduation
Source-acquisition workflow — when a lead becomes a story.
integrations
External tool manifests, preflight, dev-browser, Junkipedia, Unpaywall.
monitoring
Persistent monitoring tied to a case — Scoutpost, Mycroft, runtime-native.
provenance-signing
Noosphere C2PA provenance manifest for published findings.
shell-safety
Validates untrusted input before any shell execution; probes for destructive ops.
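The epistemic-grounding skill's "confidence caps by access method" can be read as a ceiling lookup. Every access-method name and number below is invented for illustration, and capping at the strongest backing method (rather than the weakest) is a design assumption; the skill's own table and rule may differ.

```python
# Illustrative caps only: names and values are assumptions, not the shipped table.
CONFIDENCE_CAPS = {
    "primary_document": 0.95,
    "official_registry": 0.9,
    "archived_web": 0.8,
    "social_media": 0.6,
    "secondary_report": 0.5,
}
UNKNOWN_METHOD_CAP = 0.3


def capped_confidence(raw: float, access_methods: list[str]) -> float:
    """Cap a claim's confidence at the strongest access method backing it."""
    if not access_methods:
        return 0.0  # no evidence, no confidence
    ceiling = max(CONFIDENCE_CAPS.get(m, UNKNOWN_METHOD_CAP) for m in access_methods)
    return min(raw, ceiling)
```

Under this rule a claim resting only on social-media captures can never score above 0.6, however sure the agent sounds.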

Two agent prompts sit alongside the skills — investigator and fact-checker — and a thin runtime contract in AGENTS.md defines the 13 verbs the host runtime has to implement.

04 — Runtimes
Use the agent you already have.

Spotlight is runtime-agnostic. The same skill bundle drives any agent that can read markdown skills and call shell tools.

Cloud runtimes

Claude Code, Gemini CLI, Codex CLI, and OpenCode (routed to OpenRouter / Fireworks / Together) all work natively. Authentication goes through each runtime's own login: the subscription-backed runtimes (Claude, Gemini, Codex) need no API key, while OpenCode needs a provider key for pay-per-token access.

Cloud runtimes are the right choice for open material — public-record research, archived web, corporate filings. They're faster and more capable than what fits on a laptop, and the trade-off is that the agent's context goes to the third-party provider.

Local runtimes

OpenCode pointed at a local Ollama endpoint keeps everything on the journalist's machine. The installer ships two abliterated journalism models:

  • Default — Qwen 3.5 9B Journalist (tomvaillant/qwen3.5-9b-abliterated-journalist-GGUF:Q4_K_M). 9B dense, ~6 GB on disk, fits 16 GB Macs comfortably. Bench-tested at 100% refusal-resistance on OSINT-grade prompts.
  • Heavy tier — Qwen 3.6 27B Journalist (tomvaillant/qwen3.6-27b-abliterated-journalist-GGUF:Q4_K_M). 27B dense, fine-tuned on the same investigative-journalism corpus as the 9B, on Huihui's abliterated Qwen 3.6 base. ~15 GB on disk, ~22 GB resident at runtime. Runs in thinking mode (the abliterated /no_think path produces token soup). Needs 32 GB unified memory minimum — the setup form's fit-check enforces this before the option commits.

Local runtimes are slower than cloud runtimes: the 9B completes an investigation in about 15 minutes versus about 5 on a cloud runtime, and the 27B takes 45–90 minutes. They're the right choice when material can't leave the machine.

Hardware fit-check

The install form probes your hardware via navigator.deviceMemory + WebGPU and recommends a model: 9B Journalist for any 16 GB+ Mac (default), the 27B if RAM reports ≥ 32 GB. Below 16 GB the form pushes users to Frontier mode.
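The recommendation thresholds in the paragraph above amount to a two-step comparison. This sketch takes the RAM figure as an already-known number (how the form obtains it is out of scope) and returns the `SPOTLIGHT_LOCAL_MODEL` values from the Configuration section; the string `"frontier"` is a hypothetical placeholder for "use a cloud runtime", not a real config value.

```python
def recommend_model(ram_gb: float) -> str:
    """Map reported RAM to a local model choice, mirroring the form's thresholds."""
    if ram_gb >= 32:
        return "qwen27b"   # 27B Journalist needs 32 GB unified memory minimum
    if ram_gb >= 16:
        return "qwen9b"    # default for 16 GB+ Macs
    return "frontier"      # placeholder: below 16 GB, use a cloud runtime
```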

05 — Configuration
One install, a handful of env vars.

The install form generates a single Terminal one-liner that writes your config and runs the installer. Everything lives in ~/.config/spotlight/.env; you can edit it after install to reconfigure.

Required

SPOTLIGHT_VAULT_PATH=~/Obsidian/main          # where investigations get archived
FIRECRAWL_API_KEY=fc-...                      # web scraping (free tier ok)
OSINT_NAV_API_KEY=on_...                      # tool lookup (free tier ok)

Runtime selection

SPOTLIGHT_RUNTIME=local                       # or claude / gemini / codex / opencode
SPOTLIGHT_LOCAL_MODEL=qwen9b                  # qwen9b (16 GB) / qwen27b (32 GB+)
SPOTLIGHT_LOCAL_SERVER=ollama                 # ollama / llamacpp

Optional — sensitive vault

SPOTLIGHT_SENSITIVE_ENABLED=1                 # opt-in; default off

# Both of these are derived by convention if SPOTLIGHT_SENSITIVE_ENABLED=1.
# Override them only if you want the sensitive vault on an unusual path.
# SPOTLIGHT_SENSITIVE_VAULT_PATH=~/Obsidian/main-sensitive
# SPOTLIGHT_SENSITIVE_INDEX=main-sensitive

Optional — integrations

JUNKIPEDIA_API_KEY=...     # disinformation monitoring (optional)
UNPAYWALL_EMAIL=...        # OA paper access (optional)
SPOTLIGHT_INT_DEVBROWSER=true   # dev-browser for interactive browser acquisition
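The lines above mix values with trailing notes, so any tool reading `~/.config/spotlight/.env` has to drop those notes. This is one hypothetical way to do it, not the installer's parser: it assumes inline comments start with a space followed by `#`, does not handle quoted values, and leaves `~` unexpanded.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blanks, full-line comments and trailing ' # ...' notes."""
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or commented-out setting
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=value line
        config[key.strip()] = value.split(" #", 1)[0].strip()
    return config
```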

06 — Sensitive vault
Two ingest targets, one union query.

Spotlight investigations default to open material. If you also work with sensitive material — source-protected documents, off-record interview notes, unpublished tip identities — you can opt in to a second, parallel knowledge vault that lives next to the main one with a -sensitive suffix.

Convention

Set SPOTLIGHT_SENSITIVE_ENABLED=1. That's it. The sensitive vault path defaults to ${SPOTLIGHT_VAULT_PATH}-sensitive and the sensitive QMD index name follows the same suffix. No second name field to maintain.
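The convention can be sketched as a derivation with optional overrides. One assumption is labeled in the code: the open index is taken to be named after the vault directory, which matches the commented example in the Configuration section (`~/Obsidian/main` paired with `main-sensitive`) but is not confirmed by it.

```python
import os
from typing import Optional


def sensitive_targets(env: dict[str, str]) -> Optional[tuple[str, str]]:
    """Return (vault_path, index_name) for the sensitive vault, or None if disabled."""
    if env.get("SPOTLIGHT_SENSITIVE_ENABLED") != "1":
        return None
    vault = env["SPOTLIGHT_VAULT_PATH"].rstrip("/")
    # Assumption: the open QMD index is named after the vault directory ("main").
    default_index = os.path.basename(vault) + "-sensitive"
    return (
        env.get("SPOTLIGHT_SENSITIVE_VAULT_PATH", vault + "-sensitive"),
        env.get("SPOTLIGHT_SENSITIVE_INDEX", default_index),
    )
```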

How it works

  • The ingest skill accepts --target sensitive. With the flag, notes land in the sensitive vault and get indexed into a separate QMD database. Without the flag, ingestion is unchanged.
  • A local-runtime wrapper (qmd-spotlight) is installed on PATH only when SPOTLIGHT_RUNTIME=local. Pass --with-sensitive and the wrapper queries both the open and sensitive indices and unions the results.
  • Frontier-runtime sessions don't get the wrapper installed. If the wrapper is somehow invoked from a frontier session anyway, an env-var check refuses with a clear error.
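The wrapper's behaviour (refuse outside local sessions, otherwise query one or both indices and union the results) can be sketched as below. The `search` callable is a hypothetical stand-in for invoking QMD against a named index, and checking `SPOTLIGHT_RUNTIME` is an assumption about which env var the real wrapper inspects.

```python
import os
from typing import Callable

Search = Callable[[str, str], list[str]]  # (index_name, query) -> matching note paths


def qmd_spotlight(query: str, search: Search, open_index: str,
                  with_sensitive: bool = False) -> list[str]:
    """Query the open index and, with --with-sensitive, the -sensitive index too."""
    if os.environ.get("SPOTLIGHT_RUNTIME") != "local":
        raise SystemExit("qmd-spotlight only runs in local-runtime sessions")
    results = list(search(open_index, query))  # copy so the caller's list is untouched
    if with_sensitive:
        seen = set(results)
        results += [r for r in search(open_index + "-sensitive", query) if r not in seen]
    return results
```

As the next section explains, this guard only affects calls routed through the wrapper; it does nothing about a direct query against the sensitive index.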

What this is — and what it isn't

This is plumbing, not enforcement. The flag lets a local-model session reach material the journalist deliberately put in a separate vault, without making that material discoverable from a frontier-model session that just calls bare qmd. It does not prevent a frontier agent on the same machine from running qmd --index <name>-sensitive directly and reading the sensitive vault that way.

If your threat model requires that level of guarantee, run sensitive work on a separate machine and treat the sensitive vault on the laptop as a summary archive of work done elsewhere — not as the workspace for the sensitive investigation itself. The design spec on GitHub walks through why earlier drafts proposed UID separation and air-gap phases, why those were cut, and what's left.

Suggested workflow

  1. Open research happens via the normal Spotlight pipeline against the default vault, under whichever runtime is best for the material (usually cloud).
  2. If a sensitive piece arrives, work on it in your own environment — local model on an isolated machine, encrypted disk, paper, whatever matches your operational layer.
  3. When you're ready, the output of that work can be ingested into the sensitive vault (invoke-skill ingest --target sensitive) so future local-runtime Spotlight sessions can search it alongside the open knowledge base.
  4. Declassification (moving a finding from sensitive to open) is a manual file copy + re-ingest — no special command, by design.

07 — FAQ
Common questions.

Is Spotlight a SaaS?

No. Spotlight is a markdown skill bundle you check out from GitHub and point your existing agent at. There's no Spotlight server, no Spotlight account, no Spotlight API key. The tools it leans on (Firecrawl, OSINT Navigator) each have their own free tier.

What does it cost?

The Spotlight bundle itself is free and open-source (MIT). Costs come from whatever runtime you choose:

  • Claude Code / Gemini CLI / Codex CLI — covered by your existing subscription (Claude Max, Gemini Advanced, ChatGPT Plus/Pro/Team).
  • OpenCode — pay-per-token through OpenRouter / Fireworks / Together. Roughly $0.50–$2 per investigation.
  • Local model — free at runtime; one-time disk cost for the model weights (~6 GB for the Qwen 3.5 9B Journalist, ~15 GB for the Qwen 3.6 27B Journalist).

Firecrawl and OSINT Navigator have free tiers sufficient for normal newsroom volume.

Why does the runtime matter?

Different runtimes have different trust properties. Cloud runtimes send the agent's context to a third-party provider — fine for open material, not fine for sensitive material. Local runtimes keep the context on the journalist's machine. The journalist picks the runtime per session; there's no automatic switching.

How is fact-checking different from the main investigation?

The fact-check pass runs as an independent agent spawn — a fresh context, separate from the investigator. It applies SIFT methodology (Stop, Investigate the source, Find better coverage, Trace claims), produces a verdict taxonomy (verified / unverified / disputed / false / partially_verified / mischaracterized), and writes its evidence trail to a separate JSON. The investigator can't see the fact-checker's output until after Gate 1. This is deliberately adversarial — same-context fact-checking under-catches errors the investigator already convinced itself of.
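A closed verdict taxonomy is easy to enforce when reading the fact-checker's JSON: any label outside the six values is rejected. The enum values come from the paragraph above; the file shape (a flat claim-ID-to-label mapping) and the function name are assumptions for illustration.

```python
import json
from enum import Enum


class Verdict(str, Enum):
    VERIFIED = "verified"
    UNVERIFIED = "unverified"
    DISPUTED = "disputed"
    FALSE = "false"
    PARTIALLY_VERIFIED = "partially_verified"
    MISCHARACTERIZED = "mischaracterized"


def parse_verdicts(text: str) -> dict[str, Verdict]:
    """Map claim IDs to verdicts; an unknown label raises ValueError."""
    raw = json.loads(text)  # assumed shape: {"claim-id": "verdict", ...}
    return {claim_id: Verdict(label) for claim_id, label in raw.items()}
```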

What's the sensitive vault flag for?

It turns on a second, parallel ingest target so material you deliberately classified as sensitive doesn't show up in default QMD searches that a frontier-model session might run. It's a convenience boundary, not a confidentiality guarantee; see "What this is" in the Sensitive vault section above for the honest framing.

Can I bring my own skills / custom integrations?

Yes. Fork the repo, add a skill folder under skills/ with a SKILL.md matching the frontmatter contract, register it in AGENTS.md. Your runtime picks it up next session. The 13-verb runtime contract is the only thing that has to stay stable; everything else is editable.

What happens if the agent gets stuck?

The stall protocol fires after five execution cycles without passing Gate 1's readiness criteria. The investigation pauses, the agent writes a stall summary (what was tried, what didn't work, what's missing), and you decide: continue with a different scope, shelve the case, or escalate to a manual investigation. The stall summary becomes part of the case record either way, so no investigation is silently abandoned.

Where does provenance live?

Each case produces a provenance-manifest.json built against the Noosphere C2PA contract. By default it's unsigned (the signing block exists but credential / endpoint are null); newsrooms that have a Noosphere signing setup can wire it in via the provenance-signing skill. The manifest tracks ingestion events, fact-check verdicts, and (when enabled) tier transitions.
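The default unsigned state can be pictured as a manifest whose signing block exists with null values. This is a simplified stand-in: the top-level keys are hypothetical and do not reproduce the C2PA manifest structure or the Noosphere contract, only the "present but null" signing idea described above.

```python
import json
from datetime import datetime, timezone


def unsigned_manifest(case_id: str, events: list[dict]) -> str:
    """Build an unsigned manifest: the signing block is present but left null."""
    manifest = {
        "case_id": case_id,                                   # hypothetical key
        "created": datetime.now(timezone.utc).isoformat(),
        "events": events,        # e.g. ingestion events and fact-check verdicts
        "signing": {"credential": None, "endpoint": None},    # filled in once signing is wired
    }
    return json.dumps(manifest, indent=2)
```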

Where do I file bugs or contribute?

Issues, PRs, and discussions on GitHub: github.com/buriedsignals/spotlight. The contribution guide is at CONTRIBUTING.md in the repo root.

08 — Read more
Deeper docs on GitHub.

The repo holds the full operator manual — architecture, runtime wiring, integrations, vulnerabilities, recovery. Most readers won't need these unless they're customizing Spotlight or debugging.

MIT licensed · v2 in development