12 crates planned · 0 crates implemented · MSRV 1.76 · version alpha.1 · 0 Rust LOC

Xiphos is a single-binary scanner that fans out to AST-based static analysis (tree-sitter), SCA + SPDX/CycloneDX SBOM, secret detection, IaC misconfig, and a Wasmtime-hosted plugin surface for community rules. Output is SARIF 2.1.0 with CWE / OWASP / CVSS metadata on every finding. As of writing, the workspace is an architectural skeleton: the layering rules and RFCs deliberately exist before the crates do.

Tech scope

  • End-user binary xiphos-cli drives subcommands (scan, sbom, ...) and is the only crate with a bin target.
  • Capability scanners ship as separate crates: xiphos-sast, -sca, -secrets, -heuristic, -iac, each unable to see the others. Coordination only happens in xiphos-scanner.
  • Three rule flavors, one host: built-in Rust rules, declarative YAML in rules/, and community WASM components run by xiphos-plugin against the xiphos:plugin@0.1.0 WIT world.
  • Plugin capabilities (read-source, read-metadata, read-findings, emit-finding, decorate-finding, http-egress) are default-deny and per-call enforced inside the host, not in the scanner that called it.
  • Optional xiphos-llm may add explanation and remediation text and may downgrade severity with a justification, but it cannot create findings.
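To make "default-deny, enforced per call inside the host" concrete, here is a minimal Rust sketch. The capability names come from the list above; the `Capability` enum, the `Grants` type, and the `host_emit_finding` function are hypothetical stand-ins, not the generated bindings for the xiphos:plugin@0.1.0 WIT world or any Wasmtime API.

```rust
use std::collections::BTreeSet;

/// Hypothetical host-side mirror of the plugin capabilities listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Capability {
    ReadSource,
    ReadMetadata,
    ReadFindings,
    EmitFinding,
    DecorateFinding,
    HttpEgress,
}

/// Error returned when a plugin calls a host function it was not granted.
#[derive(Debug, PartialEq, Eq)]
pub struct Denied(pub Capability);

/// Grants held by one plugin instance. `Default` yields an empty set: default-deny.
#[derive(Debug, Default)]
pub struct Grants {
    allowed: BTreeSet<Capability>,
}

impl Grants {
    pub fn grant(&mut self, cap: Capability) {
        self.allowed.insert(cap);
    }

    /// Checked by the host on every call; callers cannot skip it.
    pub fn check(&self, cap: Capability) -> Result<(), Denied> {
        if self.allowed.contains(&cap) {
            Ok(())
        } else {
            Err(Denied(cap))
        }
    }
}

/// Hypothetical host function backing `emit-finding`: the capability check
/// runs inside the host before any side effect happens.
pub fn host_emit_finding(grants: &Grants, sink: &mut Vec<String>, rule_id: &str) -> Result<(), Denied> {
    grants.check(Capability::EmitFinding)?;
    sink.push(rule_id.to_string());
    Ok(())
}
```

Because the check sits in the host function itself, a scanner that forgets to check cannot widen what a plugin is allowed to do.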

Architecture

The pipeline runs in one direction:

  • xiphos-cli builds a ScanRequest.
  • xiphos-scanner walks the tree (using the ignore crate for .gitignore / .xiphosignore semantics), classifies files, and schedules AnalysisUnits onto a Rayon pool.
  • Capability scanners produce Vec<Finding> into a deduplicating FindingSet.
  • The WASM plugin host may add or decorate findings, never delete them.
  • The LLM decorator may downgrade severity, with a justification.
  • xiphos-sarif emits the result.
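The add/decorate/downgrade rules can be expressed as the only mutation methods a FindingSet exposes. The sketch below assumes a `FindingSet` keyed by `Finding.id` in a `BTreeMap`, so duplicates collapse and iteration order does not depend on insertion order; the field names, the `Rejected` error type, and the four-level `Severity` are illustrative assumptions, not the project's types.

```rust
use std::collections::BTreeMap;

/// Illustrative severity scale; ordering is Low < Medium < High < Critical.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Severity { Low, Medium, High, Critical }

#[derive(Debug, Clone, PartialEq)]
pub struct Finding {
    pub id: u64,
    pub rule_id: String,
    pub severity: Severity,
    pub notes: Vec<String>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Rejected { UnknownId, NotADowngrade, MissingJustification }

/// Hypothetical FindingSet. Note there is deliberately no `remove` method.
#[derive(Debug, Default)]
pub struct FindingSet {
    by_id: BTreeMap<u64, Finding>,
}

impl FindingSet {
    /// Scanners and plugins may add. Returns false if the id was already present.
    pub fn add(&mut self, f: Finding) -> bool {
        if self.by_id.contains_key(&f.id) {
            return false;
        }
        self.by_id.insert(f.id, f);
        true
    }

    /// Plugins may decorate an existing finding with extra text.
    pub fn decorate(&mut self, id: u64, note: &str) -> Result<(), Rejected> {
        let f = self.by_id.get_mut(&id).ok_or(Rejected::UnknownId)?;
        f.notes.push(note.to_string());
        Ok(())
    }

    /// LLM stage: may only lower severity, and only with a non-empty justification.
    pub fn downgrade(&mut self, id: u64, to: Severity, justification: &str) -> Result<(), Rejected> {
        if justification.trim().is_empty() {
            return Err(Rejected::MissingJustification);
        }
        let f = self.by_id.get_mut(&id).ok_or(Rejected::UnknownId)?;
        if to >= f.severity {
            return Err(Rejected::NotADowngrade);
        }
        f.severity = to;
        f.notes.push(format!("downgraded: {justification}"));
        Ok(())
    }

    pub fn len(&self) -> usize { self.by_id.len() }
    pub fn get(&self, id: u64) -> Option<&Finding> { self.by_id.get(&id) }
}
```

Leaving `remove` off the type makes "never delete" a compile-time property rather than a runtime check. In a fuller design, the plugin host and the LLM decorator would presumably each receive a narrower handle exposing only their permitted methods, which is also how "cannot create findings" could be enforced.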

Determinism is a hard contract. Same inputs → byte-identical canonical JSON, regardless of thread count. Finding.id is a content hash over (rule_id, location_fingerprint, message_template). Time, env vars, and machine identifiers must never enter a finding. tests/determinism.rs runs every scanner twice and diffs; CI blocks regressions.
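A content-addressed id can be sketched in a few lines. The source does not say which hash Xiphos uses; this sketch assumes 64-bit FNV-1a purely because it is short and fully specified, and a real implementation may well use a different hash. The properties that matter are that each field is length-prefixed, so field boundaries cannot shift, and that nothing time-, environment-, or machine-dependent is fed in. `finding_id` and `fnv1a64` are hypothetical names.

```rust
/// FNV-1a 64-bit offset basis and prime (standard constants).
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Continue an FNV-1a hash from `state` over `bytes`.
fn fnv1a64(state: u64, bytes: &[u8]) -> u64 {
    let mut h = state;
    for &b in bytes {
        h ^= u64::from(b);
        h = h.wrapping_mul(FNV_PRIME);
    }
    h
}

/// Hypothetical content hash over (rule_id, location_fingerprint, message_template).
/// Each field is prefixed with its length so ("ab", "c") and ("a", "bc") differ.
/// No timestamp, env var, hostname, or thread id is an input.
pub fn finding_id(rule_id: &str, location_fingerprint: &str, message_template: &str) -> u64 {
    let mut h = FNV_OFFSET;
    for field in [rule_id, location_fingerprint, message_template] {
        h = fnv1a64(h, &(field.len() as u64).to_le_bytes());
        h = fnv1a64(h, field.as_bytes());
    }
    h
}
```

`std::collections::hash_map::DefaultHasher` is avoided on purpose: its algorithm is unspecified and may change between Rust releases, which would silently break byte-identical output across toolchains.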

Threat model and constraints

  • Scanned source is treated as hostile: per-file size cap (default 10 MiB), per-scanner timeout, OOM → non-zero exit (not panic).
  • xiphos.toml is read only from explicit --config, never from the scanned tree.
  • --output is canonicalized; symlinks disallowed when writing.
  • LLM passes redact / truncate freeform code excerpts to defend against prompt injection.
  • Severity is derived from a contextual CVSS calculation, not set directly: final_cvss = base_cvss ☆ environmental_modifiers ☆ confidence_modifier.
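Deriving severity rather than setting it can be enforced by making a score-to-band mapping the only way to obtain a severity. This sketch uses the CVSS v3.1 qualitative rating scale (None 0.0, Low 0.1 to 3.9, Medium 4.0 to 6.9, High 7.0 to 8.9, Critical 9.0 to 10.0). It deliberately takes `final_cvss` as input: the source writes the combining operator as ☆ without defining it, so how the base, environmental, and confidence terms combine is left out. `severity_from_cvss` is a hypothetical name.

```rust
/// CVSS v3.1 qualitative severity bands.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Severity { None, Low, Medium, High, Critical }

/// Hypothetical: map an already-computed final CVSS score to a severity band.
/// Out-of-range scores and NaN are rejected instead of being clamped.
pub fn severity_from_cvss(final_cvss: f64) -> Option<Severity> {
    // RangeInclusive::contains is false for NaN, so NaN is rejected here too.
    if !(0.0..=10.0).contains(&final_cvss) {
        return None;
    }
    let band = if final_cvss == 0.0 {
        Severity::None
    } else if final_cvss < 4.0 {
        Severity::Low
    } else if final_cvss < 7.0 {
        Severity::Medium
    } else if final_cvss < 9.0 {
        Severity::High
    } else {
        Severity::Critical
    };
    Some(band)
}
```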
Figure: crate layering (core, scanner, cli, plugin, sarif, llm), enforced by `cargo xtask check-layers` once code lands · as of 2026-04-26
Figure: pipeline, CLI → scanner → capabilities → FindingSet → WASM plugins → LLM decorate → SARIF 2.1 · as of 2026-04-26
Figure: workspace crates (scanner, sast, core, sca, plugin, secrets, iac, heuristic, llm, sarif, rules, cli) · 12 crates declared in Cargo.toml workspace · 0 LOC implemented as of 2026-04-26
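`cargo xtask check-layers` is not written yet, so exactly what it will check is not specified. One rule from Tech scope is easy to state mechanically: no capability crate may depend on another. A minimal sketch over an in-memory dependency map follows; a real xtask would presumably read the graph from `cargo metadata`. The `layer_violations` function and the map shape are assumptions.

```rust
use std::collections::BTreeMap;

/// Capability crates named in Tech scope; none may depend on another.
const CAPABILITIES: [&str; 5] = [
    "xiphos-sast",
    "xiphos-sca",
    "xiphos-secrets",
    "xiphos-heuristic",
    "xiphos-iac",
];

/// Hypothetical check: return every (crate, dependency) edge that links one
/// capability crate to another. Other crates (e.g. the orchestrator) are skipped.
pub fn layer_violations(deps: &BTreeMap<&str, Vec<&str>>) -> Vec<(String, String)> {
    let mut out = Vec::new();
    for (krate, its_deps) in deps {
        if !CAPABILITIES.contains(krate) {
            continue;
        }
        for dep in its_deps {
            if CAPABILITIES.contains(dep) {
                out.push((krate.to_string(), dep.to_string()));
            }
        }
    }
    out
}
```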

Surface

Today the surface is the workspace and the architecture document; the crates declared in Cargo.toml are intent, not yet implementation. The eventual end-user binary is xiphos-cli, the only crate that will carry a bin target. xiphos scan, xiphos sbom, and xiphos plugin compose into one binary; capability scanners (sast, sca, secrets, iac, heuristic) live in their own crates and only see the work the orchestrator hands them. Plugins are WebAssembly components targeting the xiphos:plugin@0.1.0 WIT world; capabilities are default-deny and enforced per call inside the host.
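The boundary "scanners only see the work the orchestrator hands them" maps naturally onto a trait that takes one unit and returns findings. The sketch below is illustrative only: `CapabilityScanner`, `NeedleScanner`, and `run` are hypothetical names, and `std::thread::scope` stands in for the Rayon pool the architecture describes, to keep the example dependency-free. The final sort is what makes the output independent of scheduling.

```rust
use std::thread;

/// Hypothetical unit of work handed out by the orchestrator.
#[derive(Debug, Clone)]
pub struct AnalysisUnit {
    pub path: String,
    pub contents: String,
}

/// Ordering is (path, rule_id), used to make output order deterministic.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct Finding {
    pub path: String,
    pub rule_id: String,
}

/// A capability scanner sees only the unit it is given. `Sync` lets the
/// orchestrator share one scanner across worker threads.
pub trait CapabilityScanner: Sync {
    fn scan(&self, unit: &AnalysisUnit) -> Vec<Finding>;
}

/// Toy scanner for illustration: flags units containing `needle`.
pub struct NeedleScanner {
    pub needle: &'static str,
    pub rule_id: &'static str,
}

impl CapabilityScanner for NeedleScanner {
    fn scan(&self, unit: &AnalysisUnit) -> Vec<Finding> {
        if unit.contents.contains(self.needle) {
            vec![Finding { path: unit.path.clone(), rule_id: self.rule_id.to_string() }]
        } else {
            Vec::new()
        }
    }
}

/// Orchestrator: one scoped thread per unit, then sort + dedup so the
/// result is the same no matter how the threads were scheduled.
pub fn run(units: &[AnalysisUnit], scanners: &[&dyn CapabilityScanner]) -> Vec<Finding> {
    let mut all: Vec<Finding> = thread::scope(|s| {
        // Spawn every thread first, then join, so units are scanned concurrently.
        let handles: Vec<_> = units
            .iter()
            .map(|u| s.spawn(move || scanners.iter().flat_map(|sc| sc.scan(u)).collect::<Vec<_>>()))
            .collect();
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
    });
    all.sort();
    all.dedup();
    all
}
```

Because each scanner receives only `&AnalysisUnit`, it has no handle through which to reach another scanner's results; merging happens in one place, after all threads have joined.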

Numbers

The numbers above are an honest snapshot of the project's stage: 12 crates declared, 0 implemented. The workspace is the architectural skeleton onto which the code will be poured. MSRV 1.76 is firm, so Rust and Cargo features stabilized after 1.76 are off-limits. unsafe_code = "deny" applies workspace-wide, with one fenced exception in the Wasmtime glue. Determinism is the load-bearing contract: same inputs → byte-identical canonical SARIF, regardless of thread count. Finding.id is a content hash; time, env vars, and machine identifiers must never enter a finding.
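The one-exception `unsafe_code` policy relies on a lint-level detail: `deny` can be overridden by a narrower `#[allow]`, whereas `forbid` cannot. In a workspace this setting presumably lives under `[workspace.lints.rust]` in Cargo.toml (workspace lints are stable since Cargo 1.74, within the 1.76 MSRV); the sketch below reproduces the same lint levels with attributes so it compiles as one file. The module names and `byte_at` are hypothetical, not the actual Wasmtime glue.

```rust
/// Stand-in for a workspace crate compiled with `unsafe_code = "deny"`.
#[deny(unsafe_code)]
pub mod crate_root {
    /// Ordinary safe code: an `unsafe` block here would fail to compile.
    pub fn first_byte(bytes: &[u8]) -> Option<u8> {
        bytes.first().copied()
    }

    /// Hypothetical fenced module standing in for the Wasmtime glue.
    /// `allow` overrides the outer `deny` for this module only.
    #[allow(unsafe_code)]
    pub mod wasm_glue {
        /// Reads `bytes[i]` without a second bounds check.
        pub fn byte_at(bytes: &[u8], i: usize) -> Option<u8> {
            if i < bytes.len() {
                // SAFETY: `i < bytes.len()` was checked just above.
                Some(unsafe { *bytes.get_unchecked(i) })
            } else {
                None
            }
        }
    }
}
```

Had the workspace used `forbid`, the inner `#[allow(unsafe_code)]` would itself be a compile error, so `deny` is the level that makes a single reviewed exception possible.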
