Spotlight — Docs

01 — What it is

An OSINT pipeline you check into your repo.

Spotlight is a set of skills, agent prompts, schemas, and verification rules stored as version-controlled markdown. You point your AI agent runtime at the bundle, give it a lead, and it walks the investigation end-to-end: planning the methodology, running search and scrape cycles, archiving evidence locally before citing it, fact-checking with an independent pass, and writing the result into a knowledge vault you own.

The point isn't to replace the journalist. The point is to remove rote work — link-walking, archive-before-cite, schema compliance, citation hygiene, fact-check provenance — so editorial time goes to judgement and source work, not to babysitting the agent.

Everything in the bundle is editable. If your newsroom needs a different verdict taxonomy or a custom integration, you fork the repo, edit the markdown, and your agent picks up the change next session.

02 — The pipeline

Six stages, one editorial loop.

Every investigation runs through the same six stages. The agent moves between them under explicit gates — no silent transitions, no half-finished cases.

Preflight → Brief → Methodology → Execution × 5 → Gate 1 → Ingestion

Preflight

Runtime check, vault check, integrations check. Spotlight refuses to start if the vault is unreadable, the bundled scraping/search stack (Crawl4AI + SearXNG) isn't responding, or a required external tool is missing. Failures here are surfaced in plain English; no work happens until the environment is sound.
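The refuse-to-start behaviour described above can be sketched as a list of checks that each return either nothing or a plain-English problem. This is a minimal illustration only: the function names, the check labels, and the messages are hypothetical and are not taken from the Spotlight bundle.

```python
import os
from typing import Callable, Optional

# Hypothetical convention: a check returns None when healthy,
# or a plain-English description of the problem.
Check = Callable[[], Optional[str]]


def vault_check(vault_path: str) -> Check:
    """Build a check that fails when the vault directory is unreadable."""
    def run() -> Optional[str]:
        path = os.path.expanduser(vault_path)
        if not os.path.isdir(path) or not os.access(path, os.R_OK):
            return f"The vault at {path} is missing or unreadable."
        return None
    return run


def preflight(checks: dict[str, Check]) -> list[str]:
    """Run every check and collect failures; an empty list means sound."""
    failures = []
    for name, check in checks.items():
        problem = check()
        if problem is not None:
            failures.append(f"{name}: {problem}")
    return failures


def may_start(checks: dict[str, Check]) -> bool:
    """No work happens unless every check passes."""
    return not preflight(checks)
```

Running every check before reporting, rather than stopping at the first failure, lets the operator fix the whole environment in one pass.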

Brief

The agent restates the lead in its own words, names the affected parties, sets a scope, and writes a hypothesis. You read it back. If the agent has misunderstood the lead, you correct it here — it's faster to redirect in two sentences of prose than three execution cycles later.

Methodology

The agent writes a methodology JSON: which entities to investigate, which sources to consult (corporate registries, social media, archived web, OSINT Navigator tool lookups), and which fact-check threshold applies. The file is editable. The execution cycles will follow this plan; if you don't like the plan, change it before the agent starts.
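To make the idea of an editable plan concrete, here is a rough sketch of what such a file might contain and how it could be sanity-checked before execution. The field names, the threshold value, and the entity are placeholders invented for illustration; the real schema ships with the bundle and may look quite different.

```python
import json

# Hypothetical field names and values, for illustration only.
methodology = {
    "entities": ["Example Holdings Ltd"],
    "sources": ["corporate_registries", "social_media", "archived_web", "osint_navigator"],
    "fact_check_threshold": "standard",
}

REQUIRED_KEYS = {"entities", "sources", "fact_check_threshold"}


def validate_methodology(plan: dict) -> list[str]:
    """Return a list of problems; an empty list means the plan is usable."""
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - set(plan))]
    for key in ("entities", "sources"):
        if key in plan and not plan[key]:
            problems.append(f"{key} must not be empty")
    return problems


# Round-trip through JSON, as a file on disk would be edited and reloaded.
reloaded = json.loads(json.dumps(methodology, indent=2))
```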

Execution cycles

Up to five execution cycles, each in two modes: PLANNING (what to do next) and EXECUTION (do it, record what came back). The investigator agent runs the cycles; every source is archived locally before it can be cited; every claim gets an evidence trail with grounding rationale.
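The archive-before-cite rule can be pictured as a ledger that refuses a citation until the source has been saved locally. The sketch below assumes a simple content-hash file naming scheme and a made-up access method label (`direct_fetch`); neither comes from Spotlight itself.

```python
import hashlib
import tempfile
from pathlib import Path


class ArchiveFirstLedger:
    """Refuses to cite a URL until its content has been saved locally."""

    def __init__(self, archive_dir: Path) -> None:
        self.archive_dir = archive_dir
        self.archived: dict[str, tuple[Path, str]] = {}

    def archive(self, url: str, content: bytes, access_method: str) -> Path:
        # Name the file by content hash so identical captures deduplicate.
        digest = hashlib.sha256(content).hexdigest()
        path = self.archive_dir / f"{digest}.bin"
        path.write_bytes(content)
        self.archived[url] = (path, access_method)
        return path

    def cite(self, url: str, claim: str) -> dict:
        if url not in self.archived:
            raise ValueError(f"{url} must be archived before it is cited")
        path, access_method = self.archived[url]
        return {
            "claim": claim,
            "source": url,
            "access_method": access_method,
            "local_file": str(path),
        }


with tempfile.TemporaryDirectory() as tmp:
    ledger = ArchiveFirstLedger(Path(tmp))
    try:
        ledger.cite("https://example.org/report", "premature citation")
        premature_allowed = True
    except ValueError:
        premature_allowed = False
    ledger.archive("https://example.org/report", b"page body", "direct_fetch")
    citation = ledger.cite("https://example.org/report", "The report exists.")
```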

Gate 1 — six readiness criteria

Before ingestion, the case has to pass six checks:

Findings are non-empty and schema-valid.
Independence — fact-checker spawned as a separate agent, not the investigator's same context.
Disputes recorded — contradictions in the evidence are surfaced, not papered over.
Affected perspective consulted (where the lead concerns a named entity).
Document trail intact — every source has an access_method and a local file.
Gap assessment — what we don't know is named, not hidden.

If a criterion fails, the agent loops back. If the failure persists past cycle five, the stall protocol triggers: the investigation is paused and the journalist decides whether to continue, change scope, or shelve the case.
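The interaction between the six criteria, the five-cycle limit, and the stall protocol can be sketched as a small loop. The criterion keys and return values are hypothetical labels chosen for this example; only the cycle limit and the six criteria themselves come from the text above.

```python
from typing import Callable

MAX_CYCLES = 5  # from the text: up to five execution cycles

# Hypothetical keys standing in for the six readiness criteria.
GATE_1_CRITERIA = (
    "findings_valid",
    "independence",
    "disputes_recorded",
    "affected_perspective",
    "document_trail",
    "gap_assessment",
)


def failed_criteria(case: dict) -> list[str]:
    """List the criteria the case does not yet satisfy."""
    return [c for c in GATE_1_CRITERIA if not case.get(c, False)]


def run_case(case: dict, run_cycle: Callable[[dict, int], None]) -> str:
    """Loop execution cycles until Gate 1 passes or the stall protocol fires."""
    for cycle in range(1, MAX_CYCLES + 1):
        run_cycle(case, cycle)
        if not failed_criteria(case):
            return "ready_for_ingestion"
    # Paused: the journalist decides whether to continue, rescope, or shelve.
    return "stalled"
```

Returning a status instead of raising keeps the stall a normal, recorded outcome rather than an error path.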

Ingestion

During a case, source archives and working JSON stay in the active case workspace. After explicit approval, verified or clearly caveated findings, fact-check verdicts, methodology lessons, and entity notes can be ingested into your knowledge vault as structured markdown with wikilinks. Future investigations search the vault via QMD and build on prior work without rediscovering the same entities from scratch.
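As an illustration of structured markdown with wikilinks, the sketch below renders an entity note whose related entities are written in the double-bracket link style. The note layout and field names are assumptions made for this example, not the vault format Spotlight actually writes.

```python
def render_entity_note(name: str, summary: str, related: list[str], verdict: str) -> str:
    """Render a finding as markdown with [[wikilinks]] to related notes."""
    links = ", ".join(f"[[{r}]]" for r in related) or "none"
    return "\n".join([
        f"# {name}",
        "",
        summary,
        "",
        f"Fact-check verdict: {verdict}",
        f"Related: {links}",
        "",
    ])


note = render_entity_note(
    "Example Holdings Ltd",
    "Placeholder summary for illustration.",
    ["Jane Doe"],
    "verified",
)
```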

03 — Skills inside Spotlight

17 skills that compose the pipeline.

Each skill is a markdown file with a frontmatter contract (name, description, allowed verbs). The agent loads them at runtime. You can edit any of them.
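A frontmatter contract of this kind can be checked with a few lines of parsing. The sketch assumes a flat `key: value` block between `---` lines and uses `allowed_verbs` as a hypothetical key name; the sample skill and its verb names are invented, and the bundle's real contract may differ.

```python
def parse_frontmatter(text: str) -> dict[str, str]:
    """Parse a flat 'key: value' block delimited by '---' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("skill file must start with a '---' frontmatter block")
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
    raise ValueError("frontmatter block is never closed")


REQUIRED = ("name", "description", "allowed_verbs")  # hypothetical key names


def missing_fields(text: str) -> list[str]:
    """Return the contract fields that are absent or empty."""
    fields = parse_frontmatter(text)
    return [k for k in REQUIRED if not fields.get(k)]


SAMPLE = """---
name: my-skill
description: Example skill for illustration.
allowed_verbs: read, search
---
Skill body goes here.
"""
```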

spotlight

Orchestrator — walks the pipeline, manages state, calls other skills.

investigate

Execution-cycle logic. PLANNING/EXECUTION modes, stall protocol.

review

Gate 1 readiness checks. Renders human-review HTML.

ingest

Vault archival — entity notes, methodology notes, source notes, registries.

epistemic-grounding

Claim-to-evidence grounding. Confidence caps by access method.

osint

Tool lookup via OSINT Navigator. Country-specific registries.

follow-the-money

Corporate filings, beneficial-ownership chains, registry walks.

technical-investigation

Passive indicators, infrastructure history, document/email metadata, public GitHub.

social-media-intelligence

Platform-specific scraping, archive-first capture.

content-access

Legal access hierarchy for paywalled or restricted sources.

web-archiving

Wayback / Archive.today / local archive chain, chain-of-custody.

acquisition-graduation

Source-acquisition workflow — when a lead becomes a story.

integrations

External tool manifests, preflight, dev-browser, Junkipedia, Unpaywall.

monitoring

User-approved monitoring recommendations and Mycroft handoff.

provenance-signing

Noosphere C2PA provenance manifest for published findings.

report-drafting

Post-gate public report with an evidence ledger and reproducible methods.

shell-safety

Validates untrusted input before any shell execution; probes for destructive ops.
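One common way to keep untrusted input away from a shell is to build an argument list instead of a command string, and to quote arguments only when displaying them. The sketch below shows that general technique with Python's standard `shlex` and `re` modules; the deny-list pattern and function names are hypothetical and far narrower than what a real shell-safety skill would need.

```python
import re
import shlex

# Hypothetical deny-list; a real check would cover many more cases.
DESTRUCTIVE = re.compile(r"\b(rm\s+-rf|mkfs|dd\s+if=)")


def is_destructive(command: str) -> bool:
    """Flag command strings that match a known destructive pattern."""
    return bool(DESTRUCTIVE.search(command))


def safe_fetch_command(url: str) -> list[str]:
    """Build an argv list so the URL is never interpreted by a shell."""
    if not re.fullmatch(r"https?://\S+", url):
        raise ValueError("not an http(s) URL")
    return ["curl", "--fail", "--silent", url]


def for_display(argv: list[str]) -> str:
    """Quote each argument so the logged command is copy-paste safe."""
    return " ".join(shlex.quote(a) for a in argv)
```

Passing the list to `subprocess.run` without `shell=True` means characters such as `&` or `$( )` inside the URL reach the program as literal text.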

Two agent prompts sit alongside the skills — investigator and fact-checker — and a thin runtime contract in AGENTS.md defines the 13 verbs the host runtime has to implement.

04 — Runtimes

Use the agent you already have.

Spotlight is runtime-agnostic. The same skill bundle drives any agent that can read markdown skills and call shell tools.

Cloud runtimes

Claude Code, Codex CLI, Pi, and OpenCode (routed to OpenRouter / Fireworks / Together) all work natively. Authentication goes through each runtime's own login: the subscription-backed runtimes (Claude, Codex, Pi) need no API key, while OpenCode's pay-per-token routing needs a provider key.

Cloud runtimes are the right choice for open material — public-record research, archived web, corporate filings. They're faster and more capable than what fits on a laptop, and the trade-off is that the agent's context goes to the third-party provider.

Local runtimes

OpenCode pointed at a local Ollama endpoint keeps everything on the journalist's machine. The installer ships two abliterated journalism models:

Default — Qwen 3.5 9B Journalist (tomvaillant/qwen3.5-9b-abliterated-journalist-GGUF:Q4_K_M). 9B dense, ~6 GB on disk, fits 16 GB Macs comfortably. Bench-tested at 100% refusal-resistance on OSINT-grade prompts.
Heavy tier — Qwen 3.6 27B Journalist (tomvaillant/qwen3.6-27b-abliterated-journalist-GGUF:Q4_K_M). 27B dense, fine-tuned on the same investigative-journalism corpus as the 9B, on Huihui's abliterated Qwen 3.6 base. ~15 GB on disk, ~22 GB resident at runtime. Runs in thinking mode (the abliterated /no_think path produces token soup). Needs 32 GB unified memory minimum — the setup form's fit-check enforces this before the option commits.

Local runtimes are slower than cloud runtimes — the 9B does an investigation in ~15 min vs ~5 min on cloud; the 27B takes 45–90 min. They're the right choice when material can't leave the machine.

Hardware fit-check

The installer opens a local configurator that probes your hardware via navigator.deviceMemory and WebGPU and recommends a model: the 9B Journalist for any Mac with 16 GB or more (the default), or the 27B if RAM reports ≥ 32 GB. Below 16 GB it pushes users to Frontier mode.
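The thresholds above reduce to a small decision function. This sketch takes the reported RAM as a plain number and reads the text as recommending the 27B whenever 32 GB or more is reported; the return labels are hypothetical, and the real configurator runs in the browser rather than in Python.

```python
def recommend_model(ram_gb: float) -> str:
    """Map reported RAM to a recommendation, using the thresholds in the text."""
    if ram_gb < 16:
        return "frontier"        # below 16 GB: steer the user to Frontier mode
    if ram_gb >= 32:
        return "27b-journalist"  # heavy tier needs 32 GB unified memory
    return "9b-journalist"       # default for machines with 16 GB or more
```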

05 — Configuration

One install, browser-connected research.

The public installer offers a direct local browser sign-in for Navigator or an explicit Skip. A successful Pro or Lab connection stores a revocable credential in the system keychain; no Navigator API key is pasted into Spotlight configuration. Lab additionally unlocks the Data Navigator tool.

Required

SPOTLIGHT_VAULT_PATH=~/Obsidian/main          # durable knowledge vault for search/ingest
SPOTLIGHT_CASES_ROOT=~/Documents/Spotlight/cases  # active case workspace

Search and scraping need no key: fetch runs on Crawl4AI and search on SearXNG, both open-source and installed locally. FIRECRAWL_API_KEY=fc-... is optional — set it only to enable Firecrawl as an explicit fallback provider.

Runtime selection

SPOTLIGHT_RUNTIME=local                       # or claude / codex / pi / opencode
SPOTLIGHT_MODEL_TIER=12b                      # 12b / 26b / 31b — sets reasoning budget + compaction profile
SPOTLIGHT_LOCAL_SERVER=llamacpp               # llamacpp (:8080) / ollama (:11434)
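For readers wiring these variables into their own tooling, a loader might look like the sketch below. It uses only the variable names, runtime values, and ports shown above; the function name, the returned dictionary shape, and the choice to treat an unset runtime as `None` are assumptions, not Spotlight behaviour.

```python
import os
from pathlib import Path

ALLOWED_RUNTIMES = {"local", "claude", "codex", "pi", "opencode"}
LOCAL_SERVER_PORTS = {"llamacpp": 8080, "ollama": 11434}


def load_config(env: dict[str, str]) -> dict:
    """Validate the Spotlight variables found in an environment mapping."""
    for key in ("SPOTLIGHT_VAULT_PATH", "SPOTLIGHT_CASES_ROOT"):
        if not env.get(key):
            raise KeyError(f"{key} is required")

    runtime = env.get("SPOTLIGHT_RUNTIME")
    if runtime is not None and runtime not in ALLOWED_RUNTIMES:
        raise ValueError(f"unknown runtime: {runtime}")

    server = env.get("SPOTLIGHT_LOCAL_SERVER")
    if server is not None and server not in LOCAL_SERVER_PORTS:
        raise ValueError(f"unknown local server: {server}")

    return {
        # Expand '~' so later file operations get an absolute path.
        "vault_path": Path(os.path.expanduser(env["SPOTLIGHT_VAULT_PATH"])),
        "cases_root": Path(os.path.expanduser(env["SPOTLIGHT_CASES_ROOT"])),
        "runtime": runtime,
        "local_server_port": LOCAL_SERVER_PORTS.get(server) if server else None,
    }
```

In practice the function would be called with `dict(os.environ)`; taking the mapping as a parameter keeps it easy to test.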

Optional — sensitive mode

# Runtime-level sensitive mode is controlled by AGENTS.md.
# A separate sensitive-vault install path is documented as a design target,
# but it is not enabled by the current installer.

Optional — integrations

JUNKIPEDIA_API_KEY=...     # disinformation monitoring (optional)
UNPAYWALL_EMAIL=...        # OA paper access (optional)
SPOTLIGHT_INT_DEVBROWSER=true   # dev-browser for interactive browser acquisition

06 — Sensitive mode

Local-only investigations first.

Spotlight investigations default to open material. For source-protected documents, off-record interview notes, unpublished tip identities, or other sensitive material, use runtime-level sensitive mode so agents work from local case files and vault context instead of live network acquisition.

Current support

The current installer does not create a second sensitive vault or install a qmd-spotlight wrapper. The implemented control is the runtime contract's sensitive mode: when enabled, adapters strip live fetch and search access and the case must be worked from local material.

How it works

Set sensitive: true in AGENTS.md or use the equivalent runtime command when your adapter supports it.
Preload material into the resolved {CASE_DIR}/research/ workspace and the local vault before starting the sensitive investigation.
Do not rely on sensitive-vault commands from the design spec until they are implemented in setup, ingest, and doctor checks.
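The adapter-side effect of sensitive mode, stripping live fetch and search access, can be pictured as filtering the set of verbs a session exposes. The verb names below are hypothetical stand-ins, since the actual 13-verb contract lives in AGENTS.md, and the sketch shares the limitation stated in this section: it constrains the adapter, not the host.

```python
NETWORK_VERBS = frozenset({"fetch", "search"})  # hypothetical verb names


def effective_verbs(declared: set[str], sensitive: bool) -> set[str]:
    """Return the verbs an adapter would expose for this session."""
    return set(declared) - NETWORK_VERBS if sensitive else set(declared)


def dispatch(verb: str, declared: set[str], sensitive: bool) -> str:
    """Refuse any verb that sensitive mode has stripped."""
    if verb not in effective_verbs(declared, sensitive):
        raise PermissionError(f"verb '{verb}' is unavailable in this session")
    return f"running {verb}"
```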

What this is — and what it isn't

This is plumbing, not enforcement. Sensitive mode reduces accidental network use by the Spotlight adapter, but it is not a host security boundary. It does not stop a sufficiently privileged local agent from using unrelated shell or search tools outside the Spotlight contract.

If your threat model requires that level of guarantee, run sensitive work on a separate machine and treat the sensitive vault on the laptop as a summary archive of work done elsewhere — not as the workspace for the sensitive investigation itself. The design spec on GitHub walks through why earlier drafts proposed UID separation and air-gap phases, why those were cut, and what's left.

Suggested workflow

Open research happens via the normal Spotlight pipeline against the default vault, under whichever runtime is best for the material (usually cloud).
If a sensitive piece arrives, work on it in your own environment — local model on an isolated machine, encrypted disk, paper, whatever matches your operational layer.
When you're ready, write sanitized outputs into the normal case files or keep them in a separately managed local vault until the sensitive-vault workflow is implemented.
Declassification (moving a finding from sensitive to open) is a manual file copy + re-ingest — no special command, by design.

07 — FAQ

Common questions.

Is Spotlight a SaaS?

No. Spotlight is a markdown skill bundle you check out from GitHub and point your existing agent at. There's no Spotlight server or Spotlight account. Search and scraping run on bundled open-source tools (SearXNG, Crawl4AI) with no account; Navigator is an optional member integration.

What does it cost?

The Spotlight bundle itself is free and open-source (MIT). Costs come from whatever runtime you choose:

Claude Code / Codex CLI / Pi — covered by your existing subscription (Claude Max, ChatGPT Plus/Pro/Team, or a supported Pi subscription).
OpenCode — pay-per-token through OpenRouter / Fireworks / Together. Roughly $0.50–$2 per investigation.
Local model — free at runtime; one-time disk cost for the model weights (~7 GB for the tuned Gemma-4 12B Q4, ~18–22 GB for the 26B tier).

Search and scraping are open-source and free (SearXNG + Crawl4AI; Firecrawl is an optional keyed fallback). Pro members can connect OSINT Navigator; Lab members also receive the Data Navigator tool.

Why does the runtime matter?

Different runtimes have different trust properties. Cloud runtimes send the agent's context to a third-party provider — fine for open material, not fine for sensitive material. Local runtimes keep the context on the journalist's machine. The journalist picks the runtime per session; there's no automatic switching.

How is fact-checking different from the main investigation?

The fact-check pass runs as an independent agent spawn: a fresh context, separate from the investigator. It applies SIFT methodology (Stop, Investigate the source, Find better coverage, Trace claims), assigns each claim a verdict from a fixed taxonomy (verified / unverified / disputed / false / partially_verified / mischaracterized), and writes its evidence trail to a separate JSON. The investigator can't see the fact-checker's output until after Gate 1. This is deliberately adversarial, because same-context fact-checking under-catches errors the investigator has already convinced itself of.
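The six verdict values lend themselves to a closed enumeration, so that a typo in a fact-check file fails loudly instead of creating a seventh category. The class and function names are illustrative; only the six values come from the text above.

```python
from enum import Enum


class Verdict(str, Enum):
    """The six verdict values named in the fact-check taxonomy."""
    VERIFIED = "verified"
    UNVERIFIED = "unverified"
    DISPUTED = "disputed"
    FALSE = "false"
    PARTIALLY_VERIFIED = "partially_verified"
    MISCHARACTERIZED = "mischaracterized"


def parse_verdict(raw: str) -> Verdict:
    """Normalise case and whitespace, then reject anything outside the taxonomy."""
    return Verdict(raw.strip().lower())
```

A newsroom that forks the bundle to use a different taxonomy would change the members here and nowhere else.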

What's the sensitive vault flag for?

In the design spec, it turns on a second, parallel ingest target so that material you have deliberately classified as sensitive doesn't show up in default QMD searches that a frontier-model session might run. The current installer does not enable it yet. It's a convenience boundary, not a confidentiality guarantee; see "What this is and what it isn't" above for the honest framing.

Can I bring my own skills / custom integrations?

Yes. Fork the repo, add a skill folder under skills/ with a SKILL.md matching the frontmatter contract, and register it in AGENTS.md. Your runtime picks it up next session. The 13-verb runtime contract is the only thing that has to stay stable; everything else is editable.

What happens if the agent gets stuck?

The stall protocol fires after five execution cycles without passing Gate 1's readiness criteria. The investigation pauses, the agent writes a stall summary (what was tried, what didn't work, what's missing), and you decide: continue with a different scope, shelve the case, or escalate to a manual investigation. The stall summary becomes part of the case record either way, so no investigation is silently abandoned.

Where does provenance live?

Each case produces a provenance-manifest.json built against the Noosphere C2PA contract. By default it's unsigned (the signing block exists but credential / endpoint are null); newsrooms that have a Noosphere signing setup can wire it in via the provenance-signing skill. The manifest tracks ingestion events, fact-check verdicts, and (when enabled) tier transitions.
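The unsigned-by-default idea can be illustrated with a manifest whose signing block is present but empty. The structure below is a simplified, hypothetical shape for explanation only; it is not the Noosphere C2PA schema, and the field and event names are invented.

```python
import json
from datetime import datetime, timezone


def new_manifest(case_id: str) -> dict:
    """Create an unsigned manifest: the signing block exists but is empty."""
    return {
        "case_id": case_id,
        "events": [],
        "signing": {"credential": None, "endpoint": None},
    }


def record_event(manifest: dict, kind: str, detail: str) -> None:
    """Append a timestamped event such as an ingestion or a verdict."""
    manifest["events"].append({
        "kind": kind,
        "detail": detail,
        "at": datetime.now(timezone.utc).isoformat(),
    })


def is_signable(manifest: dict) -> bool:
    """Signing is only possible once both fields have been configured."""
    signing = manifest["signing"]
    return signing["credential"] is not None and signing["endpoint"] is not None


manifest = new_manifest("case-001")
record_event(manifest, "fact_check_verdict", "claim-1: verified")
serialized = json.dumps(manifest, indent=2)
```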

Where do I file bugs or contribute?

Issues, PRs, and discussions on GitHub: github.com/buriedsignals/spotlight. The contribution guide is at CONTRIBUTING.md in the repo root.

08 — Read more

Deeper docs on GitHub.

The repo holds the full operator manual — architecture, runtime wiring, integrations, vulnerabilities, recovery. Most readers won't need these unless they're customizing Spotlight or debugging.

structure.md — repo layout, the 13-verb contract
runtimes.md — wiring Spotlight into each agent runtime
integrations.md — external tool manifests + preflight
investigating.md — pipeline detail, gate criteria, stall protocol
technical-investigation.md — CTI-derived methods, model-tier loading, verified export, and source review
fact-checking.md — independent verification pass
epistemic-grounding.md — claim-to-evidence grounding, confidence caps
sensitivity.md — design spec for the sensitive vault (this page summarizes it)
vulnerabilities.md — shell-safety vulnerabilities and v2 mitigations
monitoring.md — persistent monitoring tied to a case
recovery.md — when things break
DISCLAIMER.md — scope limits and editorial responsibility

09 — Acknowledgements

Standing on open work.

Two Spotlight skills — content-access and social-media-intelligence — are adapted from claude-skills-journalism by Joe Amditis (Center for Cooperative Media, Montclair State University), MIT licensed. CTI Expert by Hieu Ngo / chongluadao.vn (GitHub: 7onez) is the source for selected passive technical-investigation, coverage, contradiction, and public-ledger methods. Spotlight substantially adapts the reviewed f9ecc9b revision under the MIT License with the upstream Ethical Use Addendum. Methodology throughout also draws on Bellingcat and GIJN training material, Jim Shultz's follow-the-money work, and OpenSanctions' evidence-preservation practice. The full record lives in NOTICE.md.

MIT licensed · v2 in development