gapr,
SEO and GEO in one ledger.
A self-hostable, AGPL-licensed alternative to Ahrefs and SEMrush. First-class GEO (Generative Engine Optimization) scoring across the major LLMs.
gapr is built around the premise that ranking inside an LLM's generated answer matters as much as ranking on a Google SERP. Beyond traditional SEO (crawls, on-page audits, SERP tracking, backlinks, content briefs, keyword difficulty) it fans a tracked prompt out to ChatGPT, Claude, Perplexity, and Gemini, captures the answer + citations, and computes a transparent GEO Presence Score. Every score in the system exposes its factor breakdown via the Score.breakdown JSON column.
Tech scope
- Three-process runtime. apps/api (Fastify, JWT, Swagger at /docs) handles HTTP. apps/web is the Next.js dashboard. apps/worker is the BullMQ consumer that owns Playwright, LLM calls, and SerpAPI.
- API never blocks on long work: routes enqueue jobs, and only the worker touches network-heavy operations. Concurrency is per-queue (serp and rank at 8, crawl/audit at 2).
- Multi-tenant via Workspace → WorkspaceMember → Project. Every domain record (Keyword, Crawl, Audit, AiQuery, Backlink, ContentBrief, Report) hangs off a Project.
- The generic Score model is the transparent-scoring ledger: any new metric writes its factor breakdown there.
- Raw HTML is content-addressed (Page.hash, SerpSnapshot.htmlHash) and stored in object storage; Postgres only holds the references.
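The enqueue-only rule and the per-queue concurrency figures above can be sketched as follows. This is a minimal illustration, not the project's code: `InMemoryQueues`, `takeBatch`, and `handleCrawlRequest` are hypothetical names, and the in-memory class only stands in for the real queue layer (it does not use the BullMQ API). Only the concurrency numbers come from the text.

```typescript
// Hypothetical sketch: an in-memory stand-in for the queue layer, not BullMQ.
type QueueName = "serp" | "rank" | "crawl" | "audit";

// Per-queue worker concurrency, using the figures quoted in the list above.
const QUEUE_CONCURRENCY: Record<QueueName, number> = {
  serp: 8,
  rank: 8,
  crawl: 2,
  audit: 2,
};

interface Job<T> {
  id: number;
  queue: QueueName;
  payload: T;
}

class InMemoryQueues {
  private nextId = 1;
  private pending: Job<unknown>[] = [];

  // Called from the API process: records the job and returns immediately.
  enqueue<T>(queue: QueueName, payload: T): Job<T> {
    const job: Job<T> = { id: this.nextId++, queue, payload };
    this.pending.push(job);
    return job;
  }

  // Called from the worker process: takes at most `concurrency` jobs
  // for one queue and leaves everything else pending.
  takeBatch(queue: QueueName): Job<unknown>[] {
    const limit = QUEUE_CONCURRENCY[queue];
    const batch: Job<unknown>[] = [];
    const rest: Job<unknown>[] = [];
    for (const job of this.pending) {
      if (job.queue === queue && batch.length < limit) batch.push(job);
      else rest.push(job);
    }
    this.pending = rest;
    return batch;
  }
}

// A route handler in this style does no crawling itself; it only enqueues
// and answers with 202 Accepted plus the job id.
function handleCrawlRequest(
  queues: InMemoryQueues,
  url: string,
): { status: number; jobId: number } {
  const job = queues.enqueue("crawl", { url });
  return { status: 202, jobId: job.id };
}
```

Three crawl requests enqueued back to back would all return 202 at once, while the worker side picks up only two of them in its first batch because the crawl queue is capped at 2.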
GEO subsystem
Generative Engine Optimization is a first-class concern, not a bolt-on. The model is:
- AiQuery: a tracked prompt for a project.
- AiAnswer: one row per engine per fetch (raw response retained).
- AiBrandMention: per-domain mention with kind = CITATION / PROSE / BOTH and sentiment.
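As an illustration of how a mention's kind could be derived from one answer, here is a minimal sketch. It assumes a mention counts as CITATION when the domain appears among the cited URLs and as PROSE when the domain or brand name appears in the answer text; the actual detection rules in packages/geo are not described here, so `AiAnswerLike`, `hostOf`, and `classifyMention` are hypothetical, and sentiment is left out entirely.

```typescript
type MentionKind = "CITATION" | "PROSE" | "BOTH";

// Hypothetical, trimmed-down view of one AiAnswer row.
interface AiAnswerLike {
  engine: string;      // e.g. "PERPLEXITY"
  text: string;        // generated answer prose
  citations: string[]; // cited URLs as returned by the engine
}

// Hypothetical helper: strip scheme, "www." and path to get a bare host.
function hostOf(url: string): string {
  const stripped = url
    .toLowerCase()
    .replace(/^[a-z]+:\/\//, "")
    .replace(/^www\./, "");
  return stripped.split(/[/?#]/)[0] ?? "";
}

// Returns the mention kind for one domain in one answer,
// or null when the answer does not mention it at all.
function classifyMention(
  domain: string,
  brandName: string,
  answer: AiAnswerLike,
): MentionKind | null {
  const target = hostOf(domain);
  // Cited if any citation points at the domain or one of its subdomains.
  const cited = answer.citations.some((url) => {
    const host = hostOf(url);
    return host === target || host.endsWith("." + target);
  });
  // Named in prose if the text contains the domain or the brand name.
  const prose = answer.text.toLowerCase();
  const inProse =
    prose.includes(target) || prose.includes(brandName.toLowerCase());

  if (cited && inProse) return "BOTH";
  if (cited) return "CITATION";
  if (inProse) return "PROSE";
  return null;
}
```

Plain substring matching is the simplest possible rule and will over-match short brand names; a real detector would likely need word boundaries or entity extraction.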
The packages/geo and packages/llm packages route across OPENAI_CHATGPT, ANTHROPIC_CLAUDE, PERPLEXITY, GOOGLE_GEMINI. The score is citation share + prose share + position + engine coverage + sentiment, and the breakdown is published with the score.
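A score of this shape, published together with its breakdown, might look like the sketch below. The five factor names come from the sentence above; everything else is an assumption made for illustration: the weights (30/25/15/20/10), the 0 to 100 scale, the normalization of every input to the range 0..1, and the names `geoPresenceScore`, `WEIGHTS`, and `ScoreResult`. The real rubric may differ.

```typescript
type FactorName =
  | "citationShare"   // fraction of answers citing the domain
  | "proseShare"      // fraction of answers naming the brand in prose
  | "position"        // 1 = mentioned first, 0 = last or absent
  | "engineCoverage"  // engines with any mention / engines queried
  | "sentiment";      // normalized so 0 = negative, 1 = positive

type GeoFactors = Record<FactorName, number>;

// Hypothetical weights in percentage points; they sum to 100,
// so the final score lands in the range 0..100.
const WEIGHTS: GeoFactors = {
  citationShare: 30,
  proseShare: 25,
  position: 15,
  engineCoverage: 20,
  sentiment: 10,
};

interface FactorLine {
  input: number;        // clamped factor value, 0..1
  weight: number;       // weight applied to this factor
  contribution: number; // weight * input, rounded to 2 decimals
}

interface ScoreResult {
  value: number;
  breakdown: Record<FactorName, FactorLine>;
}

function clamp01(x: number): number {
  return Math.min(1, Math.max(0, x));
}

// Computes the weighted sum and keeps every term, so the breakdown
// can be stored next to the value (for example in a JSON column).
function geoPresenceScore(factors: GeoFactors): ScoreResult {
  const breakdown = {} as Record<FactorName, FactorLine>;
  let value = 0;
  for (const name of Object.keys(WEIGHTS) as FactorName[]) {
    const input = clamp01(factors[name]);
    const weight = WEIGHTS[name];
    const contribution = Math.round(weight * input * 100) / 100;
    breakdown[name] = { input, weight, contribution };
    value += contribution;
  }
  return { value: Math.round(value * 100) / 100, breakdown };
}
```

Because the contributions are stored per factor, a reader of the breakdown can re-add them and arrive at the published value, which is the point of a transparent score.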
Workspace map
- apps/web: Next.js 15 dashboard.
- apps/api: Fastify HTTP API; all writes enqueue jobs.
- apps/worker: BullMQ consumer of 8 queues.
- packages/db: Prisma schema + generated client.
- packages/types: shared Zod schemas + queue payload types + QUEUES registry.
- packages/llm: provider router (Anthropic / OpenAI / Perplexity / Gemini).
- packages/crawler: Playwright crawler (robots.txt-aware, polite).
- packages/serp: SERP fetcher (with SerpAPI fallback) + Google/Bing parser + intent classifier.
- packages/geo: AI-engine fan-out, brand-mention detection, GEO Presence scoring.
- packages/analyzer, packages/entities, packages/scoring: on-page rules, entity extraction, score writers.
Surface
Self-hostable means it ships with everything an operator needs to bring a single-tenant index up: the crawler, the extractor, the GEO scorer (which runs LLM-style queries against a pinned set of major models and scores presence in their answers), the index, an API, and a small operator console. Single-tenant is deliberate: tenant boundaries inside a shared deployment are a known source of cross-tenant data leakage, so the AGPL alternative is to ship the whole stack and make the deployment itself the boundary.
Roadmap
The shape ahead: more LLM targets pinned to specific snapshot dates (so a GEO score is reproducible against a specific model version), expansion of the crawl schedulers to honor robots.txt + per-host rate caps without operator hand-holding, and a small import path for existing Ahrefs / SEMrush exports so operators can backfill. None of this is shippable yet — the scoring rubric is the first thing that needs to be settled, because it determines what every other piece is being asked to optimize for.