AI Systems · In Development
Project Corpus
Harvesting the entirety of human literature and vectorizing it into usable, queryable knowledge.
Overview
Corpus is a large-scale literature ingestion and vectorization platform targeting 50,000–100,000 biomedical and scientific papers. The pipeline combines programmatic retrieval, AI-driven semantic interpretation, and dense vector embeddings. The result is a knowledge base for retrieval-augmented generation that Cadence and other systems can query with biological and conceptual precision.
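As a rough illustration of how the stages hand data to one another (retrieve, interpret, embed, query), here is a minimal Python sketch. Everything in it is hypothetical: `Paper`, `Record`, `interpret`, `embed`, and `VectorIndex` are illustrative names, the interpretation stage is reduced to a string heuristic where Corpus uses an AI model, and `embed` is a hashed bag-of-words stand-in for a real dense embedding model.

```python
import math
import re
import zlib
from dataclasses import dataclass, field

DIM = 256  # hypothetical vector width; a real embedding model fixes its own size


@dataclass
class Paper:
    pmid: str
    title: str
    abstract: str


@dataclass
class Record:
    pmid: str
    insight: str  # output of the (hypothetical) interpretation stage
    vector: list[float] = field(repr=False)


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text: str) -> list[float]:
    """Stand-in for a dense embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for tok in tokenize(text):
        vec[zlib.crc32(tok.encode()) % DIM] += 1.0  # deterministic hash bucket
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def interpret(paper: Paper) -> str:
    """Hypothetical placeholder for the AI interpretation layer:
    keeps the title plus the first sentence of the abstract."""
    first = paper.abstract.split(". ")[0].rstrip(".")
    return f"{paper.title}: {first}"


class VectorIndex:
    """Brute-force in-memory index ranked by cosine similarity."""

    def __init__(self) -> None:
        self.records: list[Record] = []

    def add(self, paper: Paper) -> None:
        insight = interpret(paper)
        self.records.append(Record(paper.pmid, insight, embed(insight)))

    def query(self, text: str, k: int = 3) -> list[tuple[str, float]]:
        q = embed(text)
        # Vectors are unit length, so the dot product equals cosine similarity.
        scored = [(r.pmid, sum(a * b for a, b in zip(q, r.vector))) for r in self.records]
        scored.sort(key=lambda s: s[1], reverse=True)
        return scored[:k]


index = VectorIndex()
index.add(Paper("1", "Synaptic plasticity in hippocampus",
                "Long-term potentiation strengthens synapses. Further details."))
index.add(Paper("2", "Telomere attrition and aging",
                "Telomere shortening limits cellular lifespan. More details."))
print(index.query("telomere aging", k=1))
```

Swapping `embed` for a real model and the linear scan for an approximate nearest-neighbor index would keep the same `add`/`query` shape; the brute-force scan is used here only because it is easy to verify.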
Key Capabilities
- PubMed MCP connector for programmatic article retrieval at scale
- AI interpretation layer — extracts structured insight, not just text
- Dense vector embeddings indexed for semantic similarity retrieval
- Targeted ingestion: neuroscience, genomics, longevity, AI architecture
- Self-modeling loop — Cadence reads its own architecture literature
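The interface of the PubMed MCP connector is not described here. Assuming it wraps NCBI's public E-utilities, targeted ingestion could be planned as paged ESearch requests per domain, as in the sketch below. The domain query strings, the `esearch_urls` helper, and the page size are hypothetical; `db`, `term`, `retstart`, `retmax`, and `retmode` are standard ESearch parameters. The sketch only builds URLs and makes no network calls.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Hypothetical search terms for the four target domains.
DOMAINS = {
    "neuroscience": "neuroscience",
    "genomics": "genomics",
    "longevity": "longevity OR aging",
    "ai_architecture": '"neural networks" OR "deep learning"',
}


def esearch_urls(term: str, total: int, page_size: int = 500) -> list[str]:
    """Return one ESearch URL per page of PMIDs, paging with retstart/retmax."""
    urls = []
    for start in range(0, total, page_size):
        params = {
            "db": "pubmed",
            "term": term,
            "retstart": start,
            "retmax": min(page_size, total - start),  # last page may be short
            "retmode": "json",
        }
        urls.append(f"{ESEARCH}?{urlencode(params)}")
    return urls


# Plan 1,200 PMIDs per domain (an illustrative number, not a project figure).
plan = {name: esearch_urls(term, total=1200) for name, term in DOMAINS.items()}
for name, urls in plan.items():
    print(name, len(urls), urls[0])
```

Each returned page lists PMIDs, which a fetch step (for example EFetch) would then resolve to titles and abstracts before interpretation and embedding.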
Strategic Motivation
The limiting factor in AI-assisted research is not compute — it is the absence of deep, domain-specific knowledge grounded in primary literature. Corpus closes that gap.