Foundations

The five papers CiteForge is built on.

Every product decision maps to a citable claim—numbers you can audit, not marketing adjectives. Start here when you need to explain why a feature exists.

Cite map

[Cite map figure: papers on the left, product surfaces on the right. Each card below states the same mapping in prose.]

KDD 2024

GEO: Generative Engine Optimization

Aggarwal et al. (KDD 2024)

Why it matters

Foundational. Shows that GEO methods raise visibility by up to 41%, and by 115% for sites ranked fifth in the source results. The headline result: structural changes can move a site from invisible to cited.
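
The lift figures are relative changes, so a +115% lift means a site's visibility score more than doubles. A minimal sketch of that arithmetic, assuming visibility is any positive per-site score; the function names and the baseline value are hypothetical, not taken from the paper:

```python
def relative_lift(baseline: float, treated: float) -> float:
    """Percentage change from baseline to treated: (treated - baseline) / baseline * 100."""
    if baseline <= 0:
        raise ValueError("baseline visibility must be positive")
    return (treated - baseline) / baseline * 100.0


def apply_lift(baseline: float, lift_pct: float) -> float:
    """Inverse of relative_lift: the score after a given percentage lift."""
    return baseline * (1.0 + lift_pct / 100.0)


if __name__ == "__main__":
    baseline = 10.0  # hypothetical visibility score for a rank-5 site
    print(apply_lift(baseline, 115.0))    # more than double the baseline
    print(relative_lift(baseline, 14.1))  # roughly a 41% lift
```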

Applied in

Content Agent applies Aggarwal et al.'s top-performing techniques: Cite Sources, Statistics Addition, and Quotation Addition.

Content Agent

arXiv:2311.09735

Preprint 2025

Structural Feature Engineering for Generative Engine Optimization

Yu et al. (2025)

Why it matters

Operational. Defines six numerical thresholds (heading depth, paragraph length, format diversity, emphasis density, internal-link density, content density) and three engine paradigms (STS, IR, ISG) with per-paradigm weight profiles.
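
A minimal sketch of how threshold checks and per-paradigm weight profiles can combine into one structural score. Every number below is a hypothetical placeholder, not one of Yu et al.'s published thresholds or weights, and scoring each feature as simple pass/fail is an assumed simplification:

```python
# Hypothetical (min, max) acceptable ranges per structural feature.
# Placeholders only: substitute the thresholds published by Yu et al.
THRESHOLDS: dict[str, tuple[float, float]] = {
    "heading_depth": (2, 4),              # deepest heading level used
    "paragraph_length": (40, 150),        # mean words per paragraph
    "format_diversity": (3, 10),          # distinct block types (lists, tables, ...)
    "emphasis_density": (0.01, 0.05),     # emphasized words / total words
    "internal_link_density": (0.5, 3.0),  # internal links per 100 words
    "content_density": (0.6, 1.0),        # main-content share of page text
}

# Hypothetical weight profiles for the three paradigms; each sums to 1.0.
PARADIGM_WEIGHTS: dict[str, dict[str, float]] = {
    "STS": {"heading_depth": 0.25, "paragraph_length": 0.25, "format_diversity": 0.10,
            "emphasis_density": 0.10, "internal_link_density": 0.10, "content_density": 0.20},
    "IR": {"heading_depth": 0.15, "paragraph_length": 0.15, "format_diversity": 0.10,
           "emphasis_density": 0.10, "internal_link_density": 0.20, "content_density": 0.30},
    "ISG": {"heading_depth": 0.20, "paragraph_length": 0.10, "format_diversity": 0.30,
            "emphasis_density": 0.20, "internal_link_density": 0.10, "content_density": 0.10},
}


def within_threshold(feature: str, value: float) -> bool:
    """True if the measured value falls inside the feature's acceptable range."""
    low, high = THRESHOLDS[feature]
    return low <= value <= high


def structural_score(features: dict[str, float], paradigm: str) -> float:
    """Weighted share of features that pass their threshold, in [0, 1]."""
    weights = PARADIGM_WEIGHTS[paradigm]
    return sum(w for name, w in weights.items() if within_threshold(name, features[name]))
```

Scoring one page under all three profiles shows which paradigm a given structural fix would help most.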

Applied in

Content Analyzer encodes every threshold and paradigm weight verbatim. The semantic-preservation guardrail uses the thresholds from Yu's eq. 16.
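
The guardrail's job is to reject a rewrite that drifts too far from the original meaning. A minimal sketch, assuming a similarity score compared against a threshold; bag-of-words cosine similarity stands in for whatever measure eq. 16 specifies, and `MIN_SIMILARITY = 0.8` is a hypothetical value:

```python
import math
import re
from collections import Counter

# Hypothetical threshold; Yu et al.'s eq. 16 values are not reproduced here.
MIN_SIMILARITY = 0.8


def _bag_of_words(text: str) -> Counter:
    """Lower-cased token counts; a crude stand-in for a semantic embedding."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the two token-count vectors (0.0 if either is empty)."""
    va, vb = _bag_of_words(a), _bag_of_words(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def preserves_meaning(original: str, rewrite: str, threshold: float = MIN_SIMILARITY) -> bool:
    """Accept the rewrite only if it stays at or above the similarity threshold."""
    return cosine_similarity(original, rewrite) >= threshold
```

A sentence-embedding model is the more usual choice for the similarity step; the accept/reject logic would stay the same.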

Content Analyzer

arXiv:2603.29979

Under review

Source Coverage and Citation Bias in LLM-based vs. Traditional Search Engines

Anonymous (under review, 2025)

Why it matters

Strategic. Shows that 37% of the domains LLM-based search engines (LLM-SEs) cite are unique relative to Google, and that every LLM-SE except ChatGPT prefers less popular domains. This undercuts the objection that big brands will dominate AEO.
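
One reading of the unique-domain figure is the share of distinct domains an LLM-SE cites that never appear in Google's results for the same queries. A minimal sketch of that calculation; the function names and the hostname-based domain matching are assumptions, not the paper's method:

```python
from urllib.parse import urlparse


def normalized_host(url: str) -> str:
    """Lower-cased hostname with a leading 'www.' stripped (a simplification:
    subdomains are not collapsed to the registrable domain)."""
    host = (urlparse(url).hostname or "").lower()
    return host.removeprefix("www.")


def unique_domain_share(llm_citations: list[str], google_results: list[str]) -> float:
    """Fraction of distinct domains cited by an LLM-SE that are absent from Google's results."""
    llm_domains = {normalized_host(u) for u in llm_citations}
    google_domains = {normalized_host(u) for u in google_results}
    if not llm_domains:
        return 0.0
    return len(llm_domains - google_domains) / len(llm_domains)
```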

Applied in

Authority Agent targets less popular but topically relevant domains. Citation Graph computes citation lift controlled for domain popularity (Tranco rank).
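
Controlling for popularity means comparing a domain against peers of similar Tranco rank rather than against all domains. A minimal sketch, assuming lift is a domain's citation rate divided by the mean rate of its rank band; the order-of-magnitude banding and all names are hypothetical:

```python
from collections import defaultdict


def rank_band(tranco_rank: int) -> int:
    """Hypothetical banding by order of magnitude: 1-9 -> 0, 10-99 -> 1, 100-999 -> 2, ..."""
    if tranco_rank < 1:
        raise ValueError("Tranco ranks start at 1")
    return len(str(tranco_rank)) - 1


def popularity_controlled_lift(
    citations: dict[str, int], probes: int, ranks: dict[str, int]
) -> dict[str, float]:
    """Citation rate of each domain divided by the mean rate of its Tranco rank band.

    citations: domain -> number of probes that cited it
    probes: total probes run (the denominator for every rate)
    ranks: domain -> Tranco rank; only domains listed here are scored
    """
    rates = {d: citations.get(d, 0) / probes for d in ranks}
    band_rates: dict[int, list[float]] = defaultdict(list)
    for d, rank in ranks.items():
        band_rates[rank_band(rank)].append(rates[d])
    band_mean = {b: sum(r) / len(r) for b, r in band_rates.items()}
    # A band where nothing was cited has mean 0; report lift 0.0 rather than divide by zero.
    return {
        d: rates[d] / band_mean[rank_band(rank)] if band_mean[rank_band(rank)] > 0 else 0.0
        for d, rank in ranks.items()
    }
```

With this normalization, a rank-5,000 domain cited well above its band average earns the same lift as a rank-50 domain doing equally well within its own band.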

arXiv:2512.09483

2025

Generative Engine Optimization: How to Dominate AI Search

Chen et al. (2025)

Why it matters

Quantitative. A large-scale empirical comparison across engines that documents earned-media bias, cross-engine variability, paraphrase sensitivity, and cross-language stability.

Applied in

Probe Engine runs paraphrase ensembles by default and reports stability scores. Per-platform weighting uses Chen's bias measurements.
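
A minimal sketch of a paraphrase-ensemble stability score alongside a per-platform weighted citation rate. It assumes stability is the share of paraphrases that agree with the majority outcome; the platform names and weights are hypothetical placeholders, not Chen's bias measurements:

```python
from collections import Counter


def citation_rate(cited: list[bool]) -> float:
    """Share of paraphrased probes in which the brand was cited."""
    if not cited:
        raise ValueError("need at least one paraphrase result")
    return sum(cited) / len(cited)


def stability_score(cited: list[bool]) -> float:
    """Share of paraphrases agreeing with the majority outcome; 1.0 means all agreed."""
    if not cited:
        raise ValueError("need at least one paraphrase result")
    return max(Counter(cited).values()) / len(cited)


def weighted_rate(rate_by_platform: dict[str, float], weight_by_platform: dict[str, float]) -> float:
    """Weighted mean of per-platform citation rates."""
    total = sum(weight_by_platform[p] for p in rate_by_platform)
    if total <= 0:
        raise ValueError("platform weights must sum to a positive number")
    return sum(rate * weight_by_platform[p] for p, rate in rate_by_platform.items()) / total


if __name__ == "__main__":
    # One query, four paraphrases per platform (hypothetical results).
    results = {
        "platform_a": [True, True, True, False],
        "platform_b": [True, False, False, False],
    }
    weights = {"platform_a": 1.0, "platform_b": 3.0}  # hypothetical per-platform weights
    rates = {p: citation_rate(r) for p, r in results.items()}
    print(rates, {p: stability_score(r) for p, r in results.items()})
    print(weighted_rate(rates, weights))
```

Reporting stability next to the rate separates a brand that is cited consistently from one whose citation depends on the exact wording of the query.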

Probe Engine

arXiv:2509.08919

ICLR 2025

Adversarial Search Engine Optimization for Large Language Models

Nestaas et al. (ICLR 2025)

Why it matters

Defensive. Catalogues the classes of preference manipulation attack (PMA): visual hidden injection, instruction override, preference biasing, and cross-page injection. Demonstrates the position effect and the prisoner's dilemma that PMAs create among competing sites.

Applied in

Defensive Mode's detector covers all four attack classes. The Enterprise tier extends scanning from your own pages to competitor pages.
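
A minimal sketch of two of the four detectors, visual hidden injection and instruction override, using Python's standard `html.parser`. The style patterns and override phrases are illustrative heuristics, not Defensive Mode's actual rules or the paper's attack corpus; preference biasing and cross-page injection are not covered:

```python
import re
from html.parser import HTMLParser

# Illustrative heuristics only (hypothetical rule set).
HIDDEN_STYLE = re.compile(
    r"display\s*:\s*none|visibility\s*:\s*hidden|font-size\s*:\s*0(?![.\d])", re.IGNORECASE)
OVERRIDE_PHRASES = re.compile(
    r"ignore (?:all )?(?:previous|prior) instructions"
    r"|disregard (?:the|all) (?:above|previous)"
    r"|you must (?:always )?recommend", re.IGNORECASE)
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class InjectionScanner(HTMLParser):
    """Collects text chunks in document order, tagged with whether they are hidden."""

    def __init__(self) -> None:
        super().__init__()
        self._hidden_stack: list[bool] = []  # one entry per open element
        self.chunks: list[tuple[str, bool]] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:  # void elements never get a closing tag
            return
        style = dict(attrs).get("style") or ""
        hidden = bool(HIDDEN_STYLE.search(style)) or any(name == "hidden" for name, _ in attrs)
        inherited = bool(self._hidden_stack) and self._hidden_stack[-1]
        self._hidden_stack.append(hidden or inherited)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._hidden_stack:
            self._hidden_stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            hidden = bool(self._hidden_stack) and self._hidden_stack[-1]
            self.chunks.append((text, hidden))


def scan_page(html: str) -> dict[str, list[str]]:
    """Flag all hidden text for review, plus any text matching an override phrase."""
    scanner = InjectionScanner()
    scanner.feed(html)
    scanner.close()
    return {
        "hidden_text": [t for t, hidden in scanner.chunks if hidden],
        "instruction_override": [t for t, _ in scanner.chunks if OVERRIDE_PHRASES.search(t)],
    }
```

The tag stack is naive about unclosed elements, so treat the output as candidates for review rather than verdicts.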

Defensive Mode

arXiv:6195

What's not in here yet

The v2 research-ingestion roadmap includes deeper Google AI Overviews coverage, ContextCite-style attribution beyond our enterprise defensive tier, and Tranco rank as a first-class popularity prior for the Citation Graph. Each ships as its pipeline hardens.