v2.0 — Self-improving AI
Open Aya OS — the agentic, in-browser cognitive operating system.
Open Aya uses a CAISI-inspired evaluation framework to measure capability, cost, latency, auditability, and workflow lift across baseline models, the Aya Pipeline, and reasoner routes. The goal is not to claim AGI; the goal is to prove whether an AI operating layer completes organizational work better than fragmented AI tools.
28 integrated apps. Voice-native, vision-aware, local-first with optional cloud sync. Multi-agent routing across six strategies (planner, executor, memory retriever, verifier, router, and self-critic), with every step auditable through a public benchmark harness, not a brochure.
Open Aya OS
Intelligence Card
Live system facts. Every number on this card is generated at request time from the runtime registry or the public eval database — there is no separate marketing source to drift.
Model layer
- anthropic/claude-sonnet-4.6 (conversation tier)
- anthropic/claude-opus-4.6 (extended thinking, 10k budget)
- google/gemini-3-flash (multimodal)
- anthropic/claude-opus-4.6 (SWE-Bench leader)
Agent layer
6 routed strategies: planner, executor, memory_retriever, verifier, router, self_critic
Strategy-Auction routing implemented as system-prompt routing rules
Memory layer
Supabase + browser IndexedDB (local-first)
Kinds: short-term turn cache · long-term Auto-Dream consolidation · GraphRAG knowledge edges
Tool layer
6 built-in tools across 28 apps
Web search · Code execution (Code Lab) · File store (Spatial Files) · Calendar / Notes / Word Processor · …
Local-first status
Yes — runs in-browser; data stays on device by default
Cloud sync status
Optional — Supabase auth + persistence when signed in
Apps in registry
28
Generated from lib/app-registry.ts
Routed agents
6
Strategy-auction policies, system-prompt routed
Eval score (avg)
—
Across 0 completed runs
Last eval run
no runs yet
UTC server time
Avg latency / task
—
Wall-clock, includes network hop
Audit mode
Public — every eval result writes a reasoning trace to /api/aya/inspect and aggregates to /api/aya/audit
A/B comparison — pass rate by route
baseline
—
Claude Sonnet 4.6, no spine (control)
aya_pipeline
—
Claude Sonnet 4.6 + 7-stage cognitive spine
aya_reasoner
—
Claude Opus 4.6, extended thinking (10k)
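A rough sketch of how a per-route pass rate like the one above could be aggregated. Only the three route names come from this page; the EvalRun record, its passed field, and the passRateByRoute helper are assumptions made for illustration, not the harness's actual schema.

```typescript
// Route names are taken from the card; everything else here is hypothetical.
type Route = "baseline" | "aya_pipeline" | "aya_reasoner";

interface EvalRun {
  route: Route;
  passed: boolean; // assumed field, not documented on this page
}

// Returns the pass rate per route, or null when a route has no completed runs
// (the case the card renders as "no data").
function passRateByRoute(runs: EvalRun[]): Record<Route, number | null> {
  const totals: Record<Route, { passed: number; total: number }> = {
    baseline: { passed: 0, total: 0 },
    aya_pipeline: { passed: 0, total: 0 },
    aya_reasoner: { passed: 0, total: 0 },
  };
  for (const run of runs) {
    totals[run.route].total += 1;
    if (run.passed) totals[run.route].passed += 1;
  }
  const rate = (r: Route): number | null =>
    totals[r].total === 0 ? null : totals[r].passed / totals[r].total;
  return {
    baseline: rate("baseline"),
    aya_pipeline: rate("aya_pipeline"),
    aya_reasoner: rate("aya_reasoner"),
  };
}
```

Returning null for an empty route, rather than 0, keeps "no runs yet" distinct from "every run failed".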
What you can verify, right now, without an account.
- Public eval API. /api/evaluate accepts a prompt and returns the canonical result shape (task_id, category, answer, agents_used, confidence, latency_ms, cost_estimate, memory_used, audit_trace).
- Public status JSON. /api/aya/status lists every capability flag with an honest functional / claimed marker — no inference required.
- Public audit aggregates. /api/aya/audit publishes the A/B verdict between baseline Claude Sonnet 4.6, Aya's 7-stage cognitive pipeline on anthropic/claude-sonnet-4.6, and the anthropic/claude-opus-4.6 reasoner (extended thinking, 10k budget) across all completed runs.
- Three live demos. Reasoning, memory, and agent routing run a canned task end-to-end and show the full reasoning trace.
- No claim without a receipt. Every superlative on this site links to a reproducible run with a JSON trace. Where data isn't available yet, we say so plainly instead of rounding up.
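One way to check the canonical result shape named in the first bullet is to test a parsed response for the nine listed keys. This is a minimal sketch: the key names come from this page, while the value types are left as unknown because the page does not document them, and missingKeys and hasEvalResultShape are hypothetical helpers.

```typescript
// Key names are copied from the page; value types are deliberately left unknown.
const REQUIRED_KEYS = [
  "task_id", "category", "answer", "agents_used", "confidence",
  "latency_ms", "cost_estimate", "memory_used", "audit_trace",
] as const;

type EvalResult = Record<(typeof REQUIRED_KEYS)[number], unknown>;

// Returns the names of required keys missing from a parsed JSON payload.
function missingKeys(payload: unknown): string[] {
  if (typeof payload !== "object" || payload === null) {
    return [...REQUIRED_KEYS];
  }
  const record = payload as Record<string, unknown>;
  return REQUIRED_KEYS.filter((key) => !(key in record));
}

// Type guard: checks key presence only, not the types of the values.
function hasEvalResultShape(payload: unknown): payload is EvalResult {
  return missingKeys(payload).length === 0;
}
```

The request side is left out on purpose: the page says /api/evaluate accepts a prompt but does not specify the HTTP method or body format.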
What we are not yet, and how you'll know when we are.
- Open Aya OS is not AGI and does not claim to be. ARC-AGI alignment refers to architecture (multi-strategy reasoning, verifier loops, cost-per-task accounting) — not to a published score.
- The strategy auction is currently implemented as deterministic system-prompt routing rules, not as six independent learned policies. The /eval harness measures the lift this routing actually provides over a Claude Sonnet 4.6 baseline running the same conversation tier without the cognitive spine, so the A/B delta isolates the wrapping, not a model upgrade.
- “Self-improving” refers to per-user memory consolidation (Auto-Dream) and TinyAdapter parameter drift, not to weight updates of the underlying base model.
- Pass rates on /receipts are computed from real, persisted eval runs. If a tier shows “no data”, no run of that tier has completed yet.
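To make "deterministic system-prompt routing rules" concrete, the sketch below renders a fixed rule table into system-prompt text. The six strategy names come from this page; the rule wording, the RoutingRule type, and renderRoutingPrompt are invented for illustration and do not show Open Aya's actual prompt.

```typescript
// Strategy names are from the page; the triggers below are hypothetical examples.
type Strategy =
  | "planner" | "executor" | "memory_retriever"
  | "verifier" | "router" | "self_critic";

interface RoutingRule {
  when: string; // hypothetical trigger description
  use: Strategy;
}

const RULES: RoutingRule[] = [
  { when: "the task needs multiple steps", use: "planner" },
  { when: "the task refers to earlier sessions", use: "memory_retriever" },
  { when: "an answer is ready to be checked", use: "verifier" },
];

// Renders the rules as numbered system-prompt text. The same rule table always
// yields the same prompt, and rule order sets priority.
function renderRoutingPrompt(rules: RoutingRule[]): string {
  const lines = rules.map(
    (rule, i) => `${i + 1}. If ${rule.when}, use the ${rule.use} strategy.`,
  );
  return ["Routing rules (first match wins):", ...lines].join("\n");
}
```

Nothing in this form is learned: changing the routing means editing the table, which is the distinction the second bullet draws against six independent learned policies.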