7275 TS LOC · 478 MD LOC (docs) · AGPL license · self-host model · GEO primary metric

gapr is built around the premise that ranking inside an LLM's generated answer matters as much as ranking on a Google SERP. Beyond traditional SEO (crawls, on-page audits, SERP tracking, backlinks, content briefs, keyword difficulty) it fans a tracked prompt out to ChatGPT, Claude, Perplexity, and Gemini, captures the answer + citations, and computes a transparent GEO Presence Score. Every score in the system exposes its factor breakdown via the Score.breakdown JSON column.
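As a hypothetical illustration of the shape the Score.breakdown column might hold (the factor names and weights below are assumed for illustration, not taken from the actual schema), the key property is that the published score is recoverable as the sum of its factor contributions:

```typescript
// Hypothetical shape for a Score row's breakdown column.
// Factor names, weights, and values are illustrative, not from the real schema.
type ScoreBreakdown = Record<string, { weight: number; value: number; contribution: number }>;

const breakdown: ScoreBreakdown = {
  citationShare:  { weight: 0.35, value: 0.50, contribution: 17.5 },
  proseShare:     { weight: 0.25, value: 0.40, contribution: 10.0 },
  position:       { weight: 0.15, value: 0.80, contribution: 12.0 },
  engineCoverage: { weight: 0.15, value: 0.75, contribution: 11.25 },
  sentiment:      { weight: 0.10, value: 0.60, contribution: 6.0 },
};

// Transparency invariant: the score is the sum of contributions,
// so any reader of the JSON column can audit how it was produced.
const score = Object.values(breakdown).reduce((s, f) => s + f.contribution, 0);
```

The point of the invariant is that no metric is a black box: anything that writes a Score must also publish the factors that add up to it.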

Tech scope

  • Three-process runtime. apps/api (Fastify, JWT, Swagger at /docs) handles HTTP. apps/web is the Next.js dashboard. apps/worker is the BullMQ consumer that owns Playwright, LLM calls, and SerpAPI.
  • API never blocks on long work: routes enqueue jobs, and only the worker touches network-heavy operations. Concurrency is per-queue (serp and rank at 8, crawl/audit at 2).
  • Multi-tenant via Workspace → WorkspaceMember → Project. Every domain record (Keyword, Crawl, Audit, AiQuery, Backlink, ContentBrief, Report) hangs off a Project.
  • The generic Score model is the transparent-scoring ledger — any new metric writes its factor breakdown there.
  • Raw HTML is content-addressed (Page.hash, SerpSnapshot.htmlHash) and stored in object storage; Postgres only holds the references.
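The content-addressing in the last bullet can be sketched as hashing the raw HTML and using the digest as the object-storage key. `htmlKey` is a hypothetical helper name; the real reference columns are Page.hash and SerpSnapshot.htmlHash:

```typescript
import { createHash } from 'node:crypto';

// Content-address raw HTML: the sha256 digest doubles as the object-storage
// key, so identical pages dedupe for free and Postgres stores only the hash.
function htmlKey(rawHtml: string): string {
  return createHash('sha256').update(rawHtml).digest('hex');
}
```

Because the key is derived from the bytes, re-crawling an unchanged page produces the same key and writes nothing new to object storage.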

GEO subsystem

Generative Engine Optimization is a first-class concern, not a bolt-on. The model is:

  • AiQuery — a tracked prompt for a project.
  • AiAnswer — one row per engine per fetch (raw response retained).
  • AiBrandMention — per-domain mention with kind = CITATION / PROSE / BOTH and sentiment.

packages/geo and packages/llm route across OPENAI_CHATGPT, ANTHROPIC_CLAUDE, PERPLEXITY, and GOOGLE_GEMINI. The score combines citation share, prose share, position, engine coverage, and sentiment, and the factor breakdown is published alongside the score.
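The CITATION / PROSE / BOTH distinction above can be sketched as follows. `detectMention` is a hypothetical helper, not the actual packages/geo API; a real implementation would need brand aliases and subdomain handling beyond this naive check:

```typescript
// Sketch of per-domain mention classification, assuming the kinds named above.
// detectMention is illustrative; the real detection logic lives in packages/geo.
type MentionKind = 'CITATION' | 'PROSE' | 'BOTH' | null;

function detectMention(domain: string, answerText: string, citationUrls: string[]): MentionKind {
  // CITATION: the engine's cited sources include a URL on this domain.
  const cited = citationUrls.some((u) => {
    try { return new URL(u).hostname.endsWith(domain); } catch { return false; }
  });
  // PROSE: the brand name appears in the answer body itself (naive word match).
  const brand = domain.split('.')[0];
  const inProse = new RegExp(`\\b${brand}\\b`, 'i').test(answerText);
  if (cited && inProse) return 'BOTH';
  if (cited) return 'CITATION';
  if (inProse) return 'PROSE';
  return null;
}
```

The kind matters for scoring because a citation is a stronger presence signal than a prose mention the engine never sourced.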

Workspace map

  • apps/web — Next.js 15 dashboard.
  • apps/api — Fastify HTTP API; all writes enqueue jobs.
  • apps/worker — BullMQ consumer of 8 queues.
  • packages/db — Prisma schema + generated client.
  • packages/types — shared Zod schemas + queue payload types + QUEUES registry.
  • packages/llm — provider router (Anthropic / OpenAI / Perplexity / Gemini).
  • packages/crawler — Playwright crawler (robots.txt-aware, polite).
  • packages/serp — SERP fetcher (with SerpAPI fallback) + Google/Bing parser + intent classifier.
  • packages/geo — AI-engine fan-out, brand-mention detection, GEO Presence scoring.
  • packages/analyzer, packages/entities, packages/scoring — on-page rules, entity extraction, score writers.
Language breakdown as of 2026-04-26: TypeScript 4992 · TSX 2283 · Markdown 478 · CSS 57 · JavaScript 25 LOC. Monorepo split: apps/ 4242 · packages/ 3033.

Surface

Self-hostable means it ships with everything an operator needs to bring a single-tenant index up: the crawler, the extractor, the GEO scorer (which runs tracked prompts against a pinned set of major models and scores presence in their answers), the index, an API, and a small operator console. Single-tenant is deliberate: per-tenant boundaries are a known source of cross-tenant data leakage, and the AGPL model lets gapr ship the boundary itself, with each operator running their own instance.

Roadmap

The shape ahead: more LLM targets pinned to specific snapshot dates (so a GEO score is reproducible against a specific model version), expansion of the crawl schedulers to honor robots.txt and per-host rate caps without operator hand-holding, and a small import path for existing Ahrefs / SEMrush exports so operators can backfill. None of this is shippable yet; the scoring rubric is the first thing that needs to be settled, because it determines what every other piece is being asked to optimize for.
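Reproducibility essentially means the model version becomes part of a score's identity. One way the pinning could look (types and field names assumed; none of this exists in the repo yet):

```typescript
// Hypothetical pinned-engine descriptor for reproducible GEO scores.
// Field names are assumptions, not the planned schema.
interface PinnedEngine {
  engine: 'OPENAI_CHATGPT' | 'ANTHROPIC_CLAUDE' | 'PERPLEXITY' | 'GOOGLE_GEMINI';
  model: string;    // exact snapshot identifier for the model version
  pinnedAt: string; // ISO date the pin was taken
}

// A score keyed by (prompt, engine, model snapshot) can be re-run later
// against the same model version and compared apples-to-apples.
function scoreKey(promptId: string, pin: PinnedEngine): string {
  return `${promptId}:${pin.engine}:${pin.model}`;
}
```

Without the snapshot in the key, two runs of the same prompt against "the same engine" may silently compare different model versions.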
