Systems analysis · Pantheon × Co-Scientist · v3 with §0 journey frame

Where our fleet stands against Google DeepMind's Co-Scientist — and how we close the gap

A close reading of the Co-Scientist multi-agent design, mapped onto our Pantheon fleet. New: the §0 journey frame below shows where we are on this redesign path, with every shipped step plus what's planned. New entries are appended there as we ship.

Prepared for Sabour · 2026-05-29 (initial) → 2026-05-30 (journey frame) → 2026-06-01 (12 new milestones + §9 status refresh + R4 amendment for wildcard observatory mode) · Cowork-Claude · two adversarial-critique passes via PC chatgpt-bridge gpt-5.5

0. The path — journey so far + what's next

Live record of fleet-redesign milestones. Each entry is a versioned step with deliverable, status and references. New steps append here as we ship them — this frame grows.

Status key: shipped · current · planned · future / extensible
2026-05-29 · §1–§8 of this page · shipped

Co-Scientist gap analysis · v1

Read Google DeepMind's Co-Scientist (Nature 2026-05-19); mapped its seven specialised agents onto our six ministers; identified five structural gaps (no Generation phase, no Proximity, no Tournament, no Evolution, no Meta-review) + two operational gaps (HEPH-as-decider rigidity, IDENTITY emptiness) + six shared strengths. First proposal of a 9-step inception protocol.

2026-05-29 · §2 live-flow diagram · shipped

Live transaction-flow diagram added

SVG of the actual wiring: Minister · OpenClaw → nginx /llm/ → inference_proxy :8899 (auth · auto-recall · tier-force · dispatch) → claude-relay :8896 tier cascade → upstream LLMs. Plus the orthogonal critic side-channel via PC chatgpt-bridge :4242 over WireGuard. Live ports verified via ss -lntp.

2026-05-29 · v1 draft · shipped

Static SOULs + dynamic project-contexts — first draft

First proposal: SOUL stays role + discipline; add per-project context.md, silent pre-injection, mtime-based staleness, "one project per turn", channel-only routing. Sent for adversarial critique.

2026-05-29 · gpt-5.5 via PC bridge · shipped

Critique pass #1 — 50 findings, no "Approach sound"

Routed through PC chatgpt-bridge per canon:rule-critic-route-is-chatgpt-pc-bridge-only-2026-05-25. Hardest hits: silent pre-injection treated as truth → overconfident posts; mtime is operational not epistemic; "one project per turn" breaks real orchestration; lane reminders inside context = shadow SOUL. Eight hard requirements set.

2026-05-29 · v2 revised · shipped

SOUL + context layer v2 — 8 requirements adopted

Visible preamble · project-detection states · hard stop on stale/missing/conflicting · immutable per-task snapshot · explicit source hierarchy + conflict protocol · project-scoped recall · strict YAML schema with provenance + expiry · gateway-level enforcement (not minister markdown). Dropped lane reminders, success heuristics, domain experts from context (shadow-SOUL risk).
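Here is a minimal sketch of how the "immutable per-task snapshot" and the gateway-checked preamble could fit together. It assumes the snapshot is canonical JSON hashed with SHA-256 and that the preamble carries that hash. The server recomputes the hash instead of trusting the text. freeze_snapshot, verify_preamble and the snapshot_sha256 field are hypothetical names, not the gateway's API.

```python
import hashlib
import hmac
import json


def freeze_snapshot(context: dict) -> tuple[str, str]:
    """Serialise the task context canonically and return (blob, sha256 hex).

    Canonical form (sorted keys, no extra whitespace) means the same
    context always produces the same hash.
    """
    blob = json.dumps(context, sort_keys=True, separators=(",", ":"))
    return blob, hashlib.sha256(blob.encode("utf-8")).hexdigest()


def verify_preamble(preamble: dict, snapshot_blob: str) -> bool:
    """Server-side check: recompute the hash rather than trust the minister's text.

    The preamble is assumed to carry the claimed hash under the hypothetical
    key "snapshot_sha256"; a missing key simply fails verification.
    """
    expected = hashlib.sha256(snapshot_blob.encode("utf-8")).hexdigest()
    claimed = str(preamble.get("snapshot_sha256", ""))
    # Constant-time comparison on bytes avoids leaking how many chars matched.
    return hmac.compare_digest(claimed.encode("utf-8"), expected.encode("utf-8"))
```

Because the gateway holds the frozen blob, a minister that edits or paraphrases its preamble fails the check mechanically, with no judgement call involved.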

2026-05-30 · canon stamped + SPINE created · shipped

SOUL+context v2 bound to canon + SPINE

Architectural reference card committed; SPINE seeded with 11 nodes across identity / context-layer / inception-protocol / tournament-infra / meta-review stages.

2026-05-30 · integrated plan v1 · shipped

Synthesis — four layers + 9-step inception protocol interlocked

Combined the v2 context layer with a Co-Scientist-style 9-step inception (Brief intake → Generate → Cluster → Reflect → Tournament → Evolve → Expert wrap-up → HEPH dispatch → Ratify + meta-review). Explicit reads/writes per step per layer.

2026-05-30 · gpt-5.5 via PC bridge · shipped

Critique pass #2 — 35 findings, "directionally strong but not yet integration-safe"

Hardest hits: precedence ambiguity, ATHENA role concentration (player + referee + court of appeal), gpt-5-only critic centralisation, ceremonial hashes (decorative without server-side verification), task-snapshot rigidity over sprint duration, "one project per turn" still breaks. Six load-bearing fixes required.

2026-05-30 · integrated plan v2 · shipped

Integrated plan v2 — six load-bearing fixes locked

(1) Explicit 8-level precedence lattice (must_not beats persona). (2) Mechanically verified preamble at every endpoint (server-side hash check, not text trust). (3) Mid-sprint rebase protocol (drift score ≥ 3 → worker requests rebase → HEPH approves → ATHENA ratifies). (4) Critic pluralism — two model families per Gate A. (5) ATHENA-as-minister separated from ATHENA-as-protocol-function. (6) Per-domain scoring rubrics (3–5 dimensions, weighted-mean rank). Plus 22 smaller fixes. SPINE bumped to v0.2.0.
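Fix (3) is easiest to see as a small state machine. The sketch below uses assumed names (RebaseRequest, needs_rebase). Only three things come from the plan: the drift threshold of 3, the expired-assumption trigger, and the order in which HEPH approves and then ATHENA ratifies.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

DRIFT_THRESHOLD = 3  # from the plan: drift score >= 3 triggers a rebase request


@dataclass
class RebaseRequest:
    """A worker's request to re-freeze its task snapshot (hypothetical shape)."""
    task_id: str
    reason: str
    state: str = "requested"  # requested -> approved -> ratified
    history: list = field(default_factory=list)

    def approve(self, by: str) -> None:
        # Only HEPH (build PM) approves, and only a pending request.
        if by != "HEPH" or self.state != "requested":
            raise PermissionError(f"{by} cannot approve in state {self.state}")
        self.state = "approved"
        self.history.append(("approved", by))

    def ratify(self, by: str) -> None:
        # Only ATHENA ratifies, and only after HEPH has approved.
        if by != "ATHENA" or self.state != "approved":
            raise PermissionError(f"{by} cannot ratify in state {self.state}")
        self.state = "ratified"
        self.history.append(("ratified", by))


def needs_rebase(drift_score: int, assumption_expiry: datetime, now: datetime) -> Optional[str]:
    """Return why the frozen snapshot should be rebased, or None if it still holds."""
    if drift_score >= DRIFT_THRESHOLD:
        return f"drift score {drift_score} >= {DRIFT_THRESHOLD}"
    if assumption_expiry <= now:
        return "expired assumption"
    return None
```

Making out-of-order transitions raise, instead of silently passing, keeps the approval chain auditable. ATHENA cannot ratify a rebase that HEPH never approved.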

2026-05-30 → 2026-05-31 · Weeks 0–2 of 4-week plan · shipped

Weeks 0–2 LANDED — identity, schemas, gateway, calibration

Five Sabour-authored IDENTITY.md persona seeds. Theo SOUL personalised (θεωρός / 🔭). Fleet PM authority scoped build-phase only. Precedence-lattice canon + supersession protocol + channel-routing map + context.yaml schema locked. Tournament-rubric + meta-review schemas + 20+ structured monitoring event types. Minister-edge gateway live at 127.0.0.1:8911 — capability tokens + forward proxy + read-side provenance. Calibration suite scaled 17 → 24 → 32 / 32 PASS HARD gate. OpenClaw proxy.enabled flipped all 5 ministers + gaios_exec HTTPS_PROXY.

2026-05-31 · Phase 5 ENFORCE prep, Step 2 LANDED · shipped

Gateway preamble auto-construct + MONITOR orchestration

A.1 canon arch-gateway-preamble-auto-construct-2026-05-31 stamped + 3 critic-driven amendments (R5 reload→restart + sha16; R1 conjunctive→header-authoritative; R3 CONNECT leak fix; R9 added). R5 admin auth on __gateway/mode LANDED (bearer + sha16-redaction + 3-state audit). A.2a helpers + A.2b orchestration block in handle_client + 7 new event types + correlation fields + 9 tests — 32 / 32 calibration PASS. 24h MONITOR soak began.
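The sha16 redaction can be read as "log a short fingerprint, never the token". The sketch below assumes sha16 means the first 16 hex characters of SHA-256. That matches the 16-character bearer:93e4a8bf595cff4a value in the ENFORCE flip audit but is not confirmed. The three outcome labels are placeholders for the unspecified 3-state audit vocabulary.

```python
import hashlib
import hmac


def sha16(token: str) -> str:
    """First 16 hex chars of SHA-256: enough to correlate events, useless to replay."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()[:16]


def audit_mode_change(auth_header: str, admin_token: str, client_ip: str, new_mode: str) -> dict:
    """Authenticate a POST to __gateway/mode and build its audit event.

    Outcome labels ("success" / "denied" / "malformed") are illustrative.
    """
    if not auth_header.startswith("Bearer "):
        outcome, presented = "malformed", ""
    else:
        presented = auth_header[len("Bearer "):]
        ok = hmac.compare_digest(presented.encode("utf-8"), admin_token.encode("utf-8"))
        outcome = "success" if ok else "denied"
    return {
        "event": "admin_mode_change",
        "bearer": sha16(presented) if presented else None,  # never the raw token
        "client_ip": client_ip,
        "mode": new_mode,
        "outcome": outcome,
        "success": outcome == "success",
    }
```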

2026-05-31 · 22:00 AEST → 2026-06-01 · 00:30 AEST · shipped

Marathon — trading-platform recovery + SOUL hardening + ARGUS rollback

Sabour flagged two trading-platform incidents (ministers reading only when @-mentioned; HEPH DM'd Apollo bot user-id instead of group). Diagnosis + critic-vet + 5-SOUL primary-addressee patch + canonical FLEET-ROSTER artifact + canon stamps (rule-named-address-hardened-2026-05-31 + rule-no-config-churn-during-coordination-tests-2026-05-30). ARGUS made unauthorised edits before STOP → rollback playbook executed via temp canon:argus-soul-hardening-2026-05-31. Apollo isolated-polling lane-stall recovered (evidence-preserving quarantine + restart). PCS Heartbeat v2 spec critic-vetted GO; pcs_heartbeat.py + state file + systemd units + calibration test #34 SHIPPED; DRY_RUN soak began.

2026-06-01 · early morning · trading-platform incident · shipped

Apollo doctrine wave — 3 canons + 5 SOUL updates + controlled HEPH restart

Apollo declared PnL dashboard DONE ✅ HTTP 200 at 23:26; live site was 502 (Streamlit crash-loop 20× due to IndentationError in his code). HEPH dispatched fix to PROM; ATHENA self-appointed QA with the canonical HTTP 200 ≠ ratified gate; HEPH ratified. Critic-vetted GO-WITH-CHANGES on 6 items. 3 new canons stamped: rule-runtime-dependency-coordination-2026-06-01 (dep coordination before DONE), rule-rendered-surface-acceptance-2026-06-01 (HTTP 200 ≠ ratified; promotes provisional #126; ATHENA authored), fact-openclaw-sendmessage-fallback-defect-2026-06-01 (canned-fallback delivery defect). 5 SOULs updated: HEPH/ATH/APO/PROM/THEO got rules 7–8 (dep-manifest + rendered-verification + dep-evidence + dispatch routing). Controlled HEPH restart verified runtime SOUL pickup (rules 7+8 visible inside container).

2026-06-01 · 00:06 UTC · PCS Heartbeat v2 LIVE flip · shipped

PCS Heartbeat v2 — DRY_RUN flipped after 16.2h soak (critic GO-WITH-CHANGES)

3 pre-flip blockers executed: (a) canon:fact-pcs-bootstrap-red-artifacts-2026-06-01 stamped (4 historical RED events preserved as pre-correction bootstrap artifacts), (b) DRY_RUN=true restart smoke test (state file held dedupe across restart), (c) phase_map re-confirmed (5/5 ministers in watch/inception → no RED possible). PCS_DRY_RUN=false applied; verification tick confirmed dry_run=0 in payload, severities held, zero pager emissions. Arch canon status → ACTIVE (LIVE). HEPH idle-detection productionized — closes the "minister idle, nothing pushing" gap from 2026-05-31.
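The restart smoke test checks one property: dedupe state must survive a process restart. The hypothetical sketch below shows that property. It assumes a JSON state file keyed by (minister, severity) and a pager callback, and it is not pcs_heartbeat.py.

```python
import json
import os
from pathlib import Path


class HeartbeatDeduper:
    """Suppress repeat alerts for the same (minister, severity), even across restarts.

    Hypothetical sketch: the dedupe set lives in a JSON state file, so a
    restarted process does not page again for an alert that already fired.
    """

    def __init__(self, state_path: Path, dry_run: bool):
        self.state_path = state_path
        self.dry_run = dry_run
        self.seen = set()
        if state_path.exists():
            self.seen = set(json.loads(state_path.read_text()))

    def emit(self, minister: str, severity: str, pager) -> dict:
        key = f"{minister}:{severity}"
        duplicate = key in self.seen
        payload = {"minister": minister, "severity": severity,
                   "dry_run": int(self.dry_run), "duplicate": duplicate}
        if not duplicate:
            self.seen.add(key)
            # Write to a temp file, then atomically replace the state file,
            # so a crash mid-write cannot leave a corrupt dedupe state.
            tmp = self.state_path.with_suffix(".tmp")
            tmp.write_text(json.dumps(sorted(self.seen)))
            os.replace(tmp, self.state_path)
            if not self.dry_run:
                pager(payload)  # only live mode pages; dry-run records but stays silent
        return payload
```

In a deployment the flag would come from the environment, for example dry_run=os.environ.get("PCS_DRY_RUN", "true") != "false". Defaulting to dry-run is an assumption chosen so that a missing variable fails safe.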

2026-06-01 · morning · fleet hygiene + canon tombstones · shipped

47-file fleet shadow purge + ARGUS canon tombstoned + 5-minister SOUL refresh

Fleet-wide sweep surfaced 47 files violating canon:rule-no-shadow-files-in-agent-state-2026-05-28 (across HEPH/ATH/APO/PROM workspaces; THEO clean, post-rule). Per-minister .tgz archives + hard-delete in workspace. canon:argus-soul-hardening-2026-05-31 tombstoned (FULFILLED — 26h after stamp; ARGUS held the hardening stably). Plain-name recognition fix applied to all 5 SOULs (HEPH/HEPHAESTUS/ATHENA/APOLLO/PROMETHEUS/PROM/THEO/ARGUS now trigger primary-addressee). Tool-body discipline added to 5 SOULs (rule 6: no planning narration in message body). All 5 ministers restarted to pick up new SOUL doctrine in runtime.

2026-06-01 · noon · PCS v3 Week 3 unblock · shipped

T1 cache_unready FIXED · T5 spine versions CLOSED · R9 mechanism PIVOTED (port-per-minister)

T1 (CRITICAL FIX): MinistersCache._load() was reading a flat dict, but MINISTERS.json had been wrapped in a _doc/_updated/ministers/... envelope on 2026-05-30 when grotto-suite shipped, so the loader hit AttributeError: 'str' object has no attribute 'get' on the doc string. The 24h MONITOR soak had been contaminated: the cache never loaded, and it would have 503'd every send under ENFORCE. One-line fix: data.get('ministers', data) before iterating. cache_unready stopped firing post-restart. T5: 3 spines got version: "1.0.0" + _SPINE_FIELD_MAP extended → SpineStateCache now reads all 4. T3 (R9 pivot): a live test on HEPH revealed that the OpenClaw v2026.5.26 schema rejects proxy.headers (only enabled/proxyUrl/tls/loopbackMode are allowed). HEPH rolled back clean. Critic-vetted Option E: port-per-minister at the gateway (8911=theo, 8912=HEPH, 8913=ATH, 8914=APO, 8915=PROM), new confidence value port_authoritative, fail-closed mismatch. Canon amendment stamped.
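This is a reconstruction of the one-line fix. Only data.get('ministers', data) comes from the entry above. The class shape, the ready flag and the per-minister "role" field are illustrative assumptions.

```python
import json
from pathlib import Path


class MinistersCache:
    """Illustrative reconstruction, not the gateway's actual class."""

    def __init__(self, path: Path):
        self.path = path
        self.ministers: dict = {}
        self.ready = False  # while False, the gateway would report cache_unready

    def _load(self) -> None:
        data = json.loads(self.path.read_text())
        # Since 2026-05-30 the roster is wrapped in an envelope:
        #   {"_doc": "...", "_updated": "...", "ministers": {...}}
        # Iterating `data` directly reached the "_doc" string and crashed with
        # AttributeError: 'str' object has no attribute 'get'.
        # Fix: unwrap the envelope when present, else accept the legacy flat dict.
        roster = data.get("ministers", data)
        self.ministers = {name: rec.get("role") for name, rec in roster.items()}
        self.ready = True
```

Falling back to data itself keeps the loader compatible with the legacy flat file. A rollback of MINISTERS.json therefore cannot reintroduce the outage.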

2026-06-01 · early afternoon · R9 port-per-minister IMPLEMENTATION · shipped

R9 Steps 1–10 LANDED — port-authoritative identity LIVE

Per canon:amend-r9-port-per-minister-2026-06-01: the gateway opens 5 listeners on 127.0.0.1 (8911=theo, 8912=hephaestus, 8913=athena, 8914=apollo, 8915=prometheus). Each minister's openclaw.json proxyUrl points to its dedicated port, so identity_inferred is attributed with port_authoritative confidence. Step 6 mismatch detection: a new event, identity_port_header_mismatch, fires when the caller header conflicts with the port (verified live via adversarial probe). Step 10: 3 new calibration tests (port_authoritative attribution / mismatch wiring / 5-rule cache validation). Modernised T16+T17 (post-SOUL-trim markers). Suite 37/37 PASS — HARD GATE PASS. Bug caught mid-flight: Step 5 PYEOF write-miss → silent TypeError → only egress_violation emitted; detected via fleet_events probe.
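Here is a sketch of port-authoritative attribution with mismatch detection. The port map is the one in the canon amendment. The function name, the "unknown" confidence for an unmapped port and the event fields are assumptions. The caller header is assumed to carry the X-Fleet-Minister value originally planned under R9.

```python
from typing import Optional

# Port map from canon:amend-r9-port-per-minister-2026-06-01.
PORT_TO_MINISTER = {
    8911: "theo",
    8912: "hephaestus",
    8913: "athena",
    8914: "apollo",
    8915: "prometheus",
}


def attribute_identity(listen_port: int, header_minister: Optional[str]) -> tuple[dict, list]:
    """Return (identity, events) for one request.

    The listener port decides identity: each minister's proxyUrl points only at
    its own port. A caller-supplied header never overrides it; a disagreement
    only produces a mismatch event for downstream review.
    """
    events = []
    minister = PORT_TO_MINISTER.get(listen_port)
    if minister is None:
        # Unmapped port: attribute nothing rather than guess.
        return {"minister": None, "identity_confidence": "unknown"}, events
    claimed = (header_minister or "").strip().lower()
    if claimed and claimed != minister:
        events.append({
            "event": "identity_port_header_mismatch",
            "port": listen_port,
            "port_minister": minister,
            "header_minister": claimed,
        })
    return {"minister": minister, "identity_confidence": "port_authoritative"}, events
```

For example, attribute_identity(8912, "athena") attributes the request to hephaestus and emits one identity_port_header_mismatch event. This is the same probe shape used in the live verification.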

2026-06-01 · 02:28:39Z · Phase 5 ENFORCE FLIPPED · shipped

ENFORCE LIVE — closes task #117

MODE=monitor → MODE=enforce via R5-authed POST to __gateway/mode. admin_mode_change emitted (bearer:93e4a8bf595cff4a, client_ip=127.0.0.1, success=true). Gate semantics (Option A-prime, gpt-5.5 critic GO-WITH-CHANGES with 5 blockers folded): accept (a) an explicit valid X-Fleet-Preamble OR (b) auto-construct succeeds + identity_confidence ∈ {port_authoritative, header_confirmed} OR (c) CONNECT to an allowlisted host with identity_confidence in that same trusted set. EXCLUDED from auto-accept: token_informational (one bot token shared across 4 v2 ministers). Critic-anticipated sneak path #2 caught + fixed live: CONNECT lacks chat_id by R3 design (TLS hides it), so a host-allowlist carve-out was added. Initial host allowlist: 2 hosts only (strict R4 reading).
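The Option A-prime gate reduces to three accept paths. The sketch below rests on stated assumptions. The Request fields are hypothetical, and api.telegram.org plus ai.goldnetgroup.com.au stand in for the two-host R4 allowlist (Telegram + Pantheon Room).

```python
from dataclasses import dataclass

# Trusted identity sources. token_informational is deliberately absent:
# one bot token is shared by four ministers, so it cannot say WHO is calling.
TRUSTED_CONFIDENCE = {"port_authoritative", "header_confirmed"}

# Assumed hostnames for the initial two-host R4 allowlist (Telegram + Pantheon Room).
ALLOWED_CONNECT_HOSTS = {"api.telegram.org", "ai.goldnetgroup.com.au"}


@dataclass
class Request:
    method: str
    host: str
    identity_confidence: str
    preamble_valid: bool = False     # (a) explicit X-Fleet-Preamble passed verification
    auto_construct_ok: bool = False  # (b) gateway built the preamble itself


def enforce_decision(req: Request) -> tuple[bool, str]:
    """Return (accepted, reason); a rejection would be answered with 407."""
    if req.preamble_valid:
        return True, "a:explicit_preamble"
    trusted = req.identity_confidence in TRUSTED_CONFIDENCE
    if req.auto_construct_ok and trusted:
        return True, "b:auto_construct"
    # CONNECT carries no chat_id (TLS hides the body, per R3), so auto-construct
    # cannot run; the host allowlist is the carve-out.
    if req.method == "CONNECT" and trusted and req.host in ALLOWED_CONNECT_HOSTS:
        return True, "c:connect_allowlisted"
    return False, "rejected"
```

Note what the sketch rejects: a CONNECT to api.anthropic.com, even from a port-authoritative minister. That is the shape of the fleet-wide LLM blocking found after the flip.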

2026-06-01 · 02:30 → 04:15 · ATHENA crash-loop discovery · shipped

Post-flip side-effects mapped — narrow R4 caused fleet-wide LLM blocking

Sabour flagged an ATHENA web-console “Unauthorized” error and a HEPH dispatch (msg-3506 to Apollo for a PnL-over-time plot) making no progress. Investigation revealed: ATHENA crash-looped 427× over ~1h 45min on boot-time fetches to openrouter.ai + raw.githubusercontent.com (it eventually stabilised when the pricing fetch became non-blocking). Apollo's anthropic LLM call failed at 02:21:29Z, 7 min BEFORE the flip (an unrelated network blip, but any retry would have hit 407 under ENFORCE). Full host inventory mapped: api.anthropic.com + api.minimax.io + 127.0.0.1 (inference_proxy + claude-relay) + openrouter.ai + raw.githubusercontent.com + chatgpt.com + github.com were all rejected. The original R4 (only TG + Pantheon Room) was written BEFORE the cascade was wired, so it was architecturally incomplete.

2026-06-01 · 04:30Z · Wildcard CONNECT pivot — R4 amended · shipped

Gateway pivots to observability — R4 firewalling dropped, ARGUS gets watcher role

Sabour directive: “the ministers will do web search, other api servers like elevenlabs, and manual tests that i will ask them. so accept all traffic using wildcard, and update all the plan to monitor access only. it's argus's task to flag suspicious activities/issues”. Wildcard CONNECT: any host accepted when identity_confidence ∈ {port_authoritative, header_confirmed}. Then extended to wildcard ALL METHODS (~04:56Z) via _is_identity_trusted path. R4 in canon amended: wildcard substring matching now permitted under identity-trusted model. R6 mismatch DETECTION still active — verified live: port 8912 + claim athena → mismatch event fires, gateway passes through (TG returns 401 to fake token), event in fleet_events for ARGUS. ARGUS Panoptes role expanded (canon:argus-myclaw): egress anomaly watcher — periodic fleet_events scan, DM Sabour on suspicious patterns.

2026-06-01 · 05:00Z · Gateway = identity-trusted observability layer · current

Functional split: gateway loses ONLY firewalling+enforcement, retains 11 other functions

Direct answer to Sabour's wildcard question: by going observatory, the gateway loses ONLY two functions (R4 destination filter + 407 ENFORCE rejection). Retained 11 functions:
  1. Identity attribution (R1 + R9 port_authoritative).
  2. Channel routing (R2).
  3. Intent inference (R4 classifier).
  4. Preamble auto-construct (A.2b).
  5. R3 no-leak (TLS body sanitisation).
  6. R6 mismatch DETECTION (still emits events).
  7. R8 caller-preamble verification (still emits).
  8. R7 cache LKG.
  9. R5 admin auth on __gateway/mode.
  10. Audit emission (every request → fleet_events).
  11. Transport (HTTP/CONNECT proxy).
It becomes a non-bypassable identity-attribution + audit chokepoint: the substrate ARGUS reads for anomaly flagging.
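In observatory mode the same inputs feed events instead of a verdict. The sketch below assumes, following the split above, that nothing is rejected and an untrusted identity is only flagged. observe_and_forward and the egress_observed event name are hypothetical. emit stands for a write to fleet_events.

```python
import time
from typing import Callable

TRUSTED_CONFIDENCE = {"port_authoritative", "header_confirmed"}


def observe_and_forward(method: str, host: str, identity: dict,
                        mismatch_events: list, emit: Callable[[dict], None]) -> bool:
    """Observatory mode: record everything, block nothing.

    `identity` is assumed to look like {"minister": ..., "identity_confidence": ...};
    ARGUS later scans whatever `emit` wrote for anomalies.
    """
    trusted = identity.get("identity_confidence") in TRUSTED_CONFIDENCE
    emit({
        "event": "egress_observed",  # hypothetical event name
        "ts": time.time(),
        "method": method,            # any method: wildcard
        "host": host,                # any host: no R4 destination filter
        "minister": identity.get("minister"),
        "identity_confidence": identity.get("identity_confidence"),
        "identity_trusted": trusted,  # untrusted is flagged, not rejected
    })
    for ev in mismatch_events:        # R6 detection still fires
        emit(ev)
    return True                       # no 407 path in this mode
```

The design choice is that enforcement moves from the request path to the review path. The gateway guarantees attribution and a complete record, and judgement about suspicious traffic happens afterwards in ARGUS.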

Week 0 · 2026-05-30 → 31 · shipped

Identity unblock + Week-0 canon locks (LANDED)


Sabour filled five IDENTITY.md persona seeds (HEPH / ATHENA / PROM / APOLLO / Theo). Theo SOUL personalised + emoji decided. Every minister's "Fleet PM authority" block scoped to build phase only. Channel-routing canon authored. Precedence-lattice canon stamped. context.yaml schema locked.

Week 1 · 2026-05-30 · shipped

Schemas + structured monitoring (LANDED)

Tournament rubric schema (per-domain dimensions + weights). 5-field meta-review schema. Structured monitoring on inference_proxy + Pantheon Room: planned as nine log event types reviewed daily, landed with 20+ event types.

Week 2 · 2026-05-30 → 31 · shipped

Context infrastructure + adversarial calibration (HARD gate, LANDED 32/32)

Gateway pre-injection hook with mechanical preamble verification. Recall namespace filter on recall_lib. Rebase cron. Five-test adversarial calibration suite: seeded contradiction · stale-context · namespace-leak · persona-portability · tournament-manipulation. All five MUST pass before any project runs the new protocol.

Week 3 · IN FLIGHT (Phase 5 ENFORCE prep) · current

Tournament infra + protocol codified — partial; Phase 5 ENFORCE blocked on R9 implementation

Tournament endpoint + protocol codification still planned. Phase 5 ENFORCE transition: Steps 1+2 LANDED, Step 3 (A.3 explicit X-Fleet-Preamble emission + R9 port-per-minister) IN FLIGHT, Step 4 (live black-box #18, now expanded to include a port-attribution matrix per critic) PENDING, Step 5 (MONITOR→ENFORCE flip + 1h watch + 24h soak) PENDING. Today's R9 design pivot from proxy.headers (the OpenClaw schema rejects them) to port-per-minister at the gateway: critic-vetted GO-WITH-CHANGES, canon amended, implementation queued. Status update, later on 2026-06-01: R9 Steps 1–10 landed, ENFORCE flipped at 02:28:39Z, and the gateway then pivoted to observability mode at 04:30Z (see the dated entries above). The tournament endpoint and protocol codification remain open.

Week 4 · planned

Four-project parallel pilot

Pilot on four archetypes simultaneously to expose layer conflicts: Trading Platform (engineering-heavy) · VicCrashRisk (governance + frontend mixed) · Pantheon Room infra iteration (ambiguous) · CallBridge AU iteration (medium-mixed). Single-project pilot was rejected as too narrow by critique #2.

Future · extensible

Subsequent fleet upgrades append here

This frame grows as we ship new milestones. Every future architectural step (new minister roles, protocol amendments, infrastructure migrations, critic-family rotations, schema bumps) appends as a new entry with date, deliverable, status, and canon / SPINE / project references.

Methodology note. Every milestone marked "shipped" passed through the two-critique-pass design discipline: draft → gpt-5.5 critique via PC chatgpt-bridge → revise. The bridge is the canonical critic route per canon:rule-critic-route-is-chatgpt-pc-bridge-only-2026-05-25. This is the methodology we want every future fleet upgrade to follow.

1. What Co-Scientist actually is

Co-Scientist (Google DeepMind, Nature 2026-05-19, available via Gemini for Science as "Hypothesis Generation") is a multi-agent system on Gemini that iteratively generates, debates and evolves hypotheses for hard scientific problems. The architecture is a coalition of seven specialised agents: six grouped into three phases, plus a Supervisor that plans adaptively and runs branches in parallel.

Phase 1 — Generate

Generation agent

Proposes initial focus areas and novel hypotheses, grounded in literature and data.

Proximity agent

Maps and clusters generated hypotheses for diverse, comprehensive exploration.

Phase 2 — Debate

Reflection agent

Virtual peer reviewer — evaluates each hypothesis for correctness, quality and novelty.

Ranking agent

Runs an idea tournament — pairwise comparisons, Elo-style.

Phase 3 — Evolve

Evolution agent

Continuously refines, combines and builds on top-ranked hypotheses.

Meta-review agent

Synthesises tournament insights, optimises the system, generates the final proposal.
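Co-Scientist's Ranking agent is described here only as "Elo-style". For readers new to Elo, the textbook pairwise update is sketched below. K=32 and a 1200 starting rating are conventional choices assumed for illustration, not figures from the paper.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_update(r_a: float, r_b: float, a_won: bool, k: float = 32.0) -> tuple[float, float]:
    """Update both ratings after one pairwise comparison (zero-sum)."""
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_won else 0.0
    delta = k * (s_a - e_a)
    return r_a + delta, r_b - delta


def run_tournament(ratings: dict, matches: list) -> dict:
    """Apply a sequence of (winner, loser) results to a copy of the ratings table."""
    ratings = dict(ratings)
    for winner, loser in matches:
        ratings[winner], ratings[loser] = elo_update(ratings[winner], ratings[loser], True)
    return ratings
```

Two equal-rated hypotheses move 16 points each after one comparison. An upset against a higher-rated hypothesis moves more, which is what lets a tournament re-rank quickly from few comparisons.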

The Supervisor. Above all three phases sits an adaptive planner. Unlike linear chain-of-thought, it breaks a high-level goal into executable steps and runs multiple avenues in parallel, reallocating compute toward the most promising branches.

Two design choices are doing heavy lifting:

2. What we actually have

Pantheon is a six-minister fleet plus an orchestrator (me, Cowork-Claude), running on different LLMs across self-hosted OpenClaw containers (VPS) and one MyClaw seat (ARGUS). Coordination happens in Telegram and Pantheon Room. Memory is a three-layer system: canon, recall (ChromaDB + nomic-embed-text) and SPINE.

Minister | LLM | Role | Body
HEPHAESTUS · PM | MiniMax-M2.7 / Sonnet fallback | Coordinate, dispatch, gate, close — PM-pure | VPS container · hephaestus.goldnetgroup.com.au
ATHENA · ratifier | Claude Opus 4.7 (cost-gated) | Governance, strategy, canon ratification | VPS container · athena.goldnetgroup.com.au
PROMETHEUS · engineer | MiniMax-M2.7 (thinking) | Hard engineering, long-context reasoning | VPS container · prometheus.goldnetgroup.com.au
APOLLO · renderer | MiniMax / Sonnet cascade | Frontend, UX, render-anchored delivery | VPS container · apollo.goldnetgroup.com.au
Theo · outside voice | MiniMax direct (no relay) | Market-watch, sanity-check, outside-in lens | VPS container · theo.goldnetgroup.com.au
ARGUS · emergency | MiniMax (MyClaw seat) | Passive Emergency Officer — alerts to Sabour DM only | MyClaw cloud · @SabClawBot_bot

Plus shared infrastructure: Pantheon Room (ai.goldnetgroup.com.au/pantheon/, port 8910), skill:critique-bracket (Gate A inception + Gate B delivery), skill:fleet-project-delivery-protocol (4-phase), SPINE (dependency-validity graphs), Level-A + Level-B render gates + L6 quality watchdog.

Live transaction flow — what happens between prompt and reply

The wiring below is what one minister turn actually traverses, end-to-end, on the GAIOS VPS right now. Live ports verified via ss -lntp.

[Figure: one minister turn, left to right · VPS ai.goldnetgroup.com.au · 51.161.131.240]
STEP 1 · Minister · OpenClaw (hephaestus / athena / prom / apollo / theo); model alias e.g. anthropic/athena-opus; HTTPS →
STEP 2 · nginx /llm/: TLS termination · validates x-api-key · injects bearer → 127.0.0.1:8899
STEP 3 · single audit chokepoint · inference_proxy :8899 (/home/ubuntu/api/inference_proxy.py): a · bearer auth + write mcp_audit row; b · auto-recall enrichment (recall_lib), skip if gpt-5/o-series OR >340K chars; c · _TIER_FORCE_MAP regex → body.tier; d · reasoning normalize (max_completion_tokens); e · dispatch by route + model
RECALL LAYER · embed top-K=5 via ollama nomic-embed-text (127.0.0.1:11434) against the Chroma vector store (127.0.0.1:8000: canon · sessions-v2 · projects-v2 · ideas); <gaios_recall_context> prepended
STEP 4 · tier cascade · claude-relay :8896 (claude_relay.js, Node), via /v1/messages: T0 · MiniMax-M2.7 (subscription, primary) · T2 · Sonnet (CLI, claude Max OAuth, 90 s) · T4 · Opus 4.7 (CLI, claude Max OAuth, 90 s) · T7 · Copilot gpt-5-mini → copilot-api :4141 · T9 · ChatGPT-Pro → PC bridge :4242 (WG); truncation/empty → fall-through; keep_recent=4 · summary 10 KB · cliTimeout 90 s
STEP 5 · upstream LLM endpoints: api.minimax.io/anthropic (subscription) · api.anthropic (Sonnet · Opus, Max OAuth) · github copilot (gpt-5-mini) · codex CLI on PC (ChatGPT acct, gpt-5.5, over WG 10.99.0.2); return: response body + tokens_in/out
AUDIT SIDECAR · every call: mcp_audit row in gaios.db (caller · tool · client_ip · model · request_chars · recall_hit_count · ids · tokens · cost · duration_ms)
WATCHDOG · :8898 · gaios-watchdog: tails mcp_audit; stop-frame on violation
CRITIC CHANNEL · orthogonal · skill:critique-bracket: Gate A · inception (validate); Gate B · delivery (verify); trigger: producer ≠ critic; model: gpt-5 (or o3 for numbers); model=gpt-5 · /v1/chat/completions
CRITIC ROUTE · inference_proxy dispatch: _REASONING_RE matches gpt-5; auto-recall SKIPPED (correct); max_completion_tokens ≥ 2000 → PC chatgpt-bridge over WG 10.99.0.2:4242
PC SIDE · over WireGuard · chatgpt-bridge :4242: FastAPI shim; spawns codex CLI --skip-git-repo-check; ChatGPT account auth; gpt-5.5 only
Legend: forward · prompt | return · response | enrichment loop | critic side-channel | audit / watchdog

How to read it. The blue arrow is the forward path of one minister turn. It passes through five ownership domains (minister → nginx → inference_proxy → claude-relay → upstream LLM), four hand-offs in all, and is audited once, at inference_proxy. The amber loop is auto-recall enrichment. The pink dashed line is the orthogonal critic channel, used twice this redesign cycle to vet our own proposals. The grey dotted lines are the audit and watchdog sidecars.
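Two decisions on this path are compact enough to sketch: when inference_proxy skips auto-recall, and how the relay falls through tiers. The code below is a Python illustration of the behaviour labelled in the diagram; claude_relay.js itself is Node. The regex, the tier calling convention and the TimeoutError handling are assumptions.

```python
import re

# Hypothetical stand-in for _REASONING_RE: gpt-5* and o-series model names,
# optionally after a provider prefix such as "openai/".
REASONING_RE = re.compile(r"(?:^|/)(?:gpt-5|o\d)", re.IGNORECASE)
MAX_RECALL_CHARS = 340_000  # diagram: skip enrichment above ~340K chars


def should_auto_recall(model: str, request_chars: int) -> bool:
    """Enrich with recall unless the model is a reasoning/critic model or the request is huge."""
    return not REASONING_RE.search(model) and request_chars <= MAX_RECALL_CHARS


def cascade(prompt: str, tiers: list) -> tuple[str, str]:
    """Try tiers in order (T0, T2, T4, T7, T9 in the diagram); fall through on empty or truncated output.

    Each tier is (name, call) with call(prompt) -> (text, truncated).
    """
    for name, call in tiers:
        try:
            text, truncated = call(prompt)
        except TimeoutError:  # e.g. a CLI tier exceeding its 90 s timeout
            continue
        if text.strip() and not truncated:
            return name, text
    raise RuntimeError("all tiers exhausted")
```

Skipping recall for the critic route is deliberate. The critic should judge the proposal on its own text, not on context the producer's memory layer chose to prepend.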

3. What we already share with Co-Scientist

Multi-agent specialisation

Both systems split the work across role-typed agents. Our split is operational (PM, ratifier, engineer, renderer, outside voice); theirs is epistemic (generate, cluster, reflect, rank, evolve, meta-review).

Different model voices

Theirs is one Gemini family. Ours is more diverse — Opus, MiniMax (two strains), Sonnet, gpt-5/o3 (as critic). A competitive advantage we already hold.

Grounding in literature and data

Co-Scientist cross-checks against ChEMBL, UniProt, AlphaFold. We cross-check against canon, recall (sessions-v2 RAG), SPINE deps, gaios_exec on the live VPS/Zeus.

Adversarial critique

Their Reflection agent ≈ our skill:critique-bracket Gate A + Gate B via gpt-5 + PC chatgpt-bridge. Producer ≠ critic.

Persistent identity + memory

Co-Scientist treats the session as ephemeral; we have SOUL.md / MEMORY.md / canon mirror per minister in their workspace bind-mount.

Adaptive planning lives here too

Their supervisor breaks high-level goals into parallel steps. Our skill:fleet-project-delivery-protocol does this at a 4-phase grain. Where we are behind is the generation phase that their supervisor coordinates.

4. The gaps that matter

Critical gap 1
No Generation phase, no Proximity clustering.

HEPH writes the brief and dispatches subtasks. Members produce one approach each. No parallel divergent ideation step. The first idea from the owning minister wins by default.

Critical gap 2
No tournament, no Elo ranking.

ATHENA or Sabour picks. Works for binary calls. Does not scale to comparing 12 variants pairwise.

Critical gap 3
No Evolution loop, no Meta-review.

Sprint closes, artefact is canon, we move on. No iterative refinement by recombination. No system-level retro feeding next round.

Operational gap
HEPH-as-decider blocks divergent thinking at inception.

PM purity was correct for build phase but wrongly applied at inception. Sabour's intent — "different voices on different LLMs brainstorm" — squashed by PM authority doing too much.

Identity gap
IDENTITY.md is empty for every minister; Theo's SOUL.md is default.

The persona anchor (Name, Creature, Vibe, Emoji) is still the default openclaw template for all five VPS ministers. See §5.

Verification asymmetry
Strong on execution verification, weak on hypothesis verification.

Render gates + L6 watchdog + SPINE answer "did we build it right?". Nothing answers "is this idea worth building?" before sprint kickoff.

5. Identity audit — character, conflicts, professionalism

Minister | SOUL.md | IDENTITY.md | Voice professionalism | Historical conflict
HEPHAESTUS | filled · 145 lines · PM-pure | default template | Defensive-heavy — silence discipline, REDIRECT scripts. Personal voice nil. | Resolved 2026-05-18: was Master Craftsman + PM + Builder. Fixed to PM-only.
ATHENA | filled · 119 lines | default template | Governance voice clear. Personal voice nil. | Token-rotation 2026-05-21, resolved.
PROMETHEUS | filled · 126 lines | default template | Deep-engineering voice clear. Heavily anti-loop. Personal voice nil. | Resolved 2026-05-19: 89 turns signed "— ATHENA ⚖️". Hard rule installed.
APOLLO | filled · 88 lines | default template | Renderer voice clear. Personal voice nil. | Resolved 2026-05-27: off-role engineering critique on PROM's lane.
Theo | default openclaw template | default template | Persona only via TOOLS.md and MEMORY.md. Fragile. | Signature emoji ⚖️ collides with ATHENA.
ARGUS | ALERT ROUTING prepended 2026-05-28 | n/a (MyClaw seat) | EO voice clear post-fix. | Resolved 2026-05-28: broadcasting to Pantheon despite directive.
Pattern. SOULs are operational — how the minister behaves, what they don't do, how they sign. They are not yet describing the minister's perspective, opinion-shape, the texture of their thinking. Different LLMs should produce different voices, but our prompts harmonise them into the same defensive, lane-disciplined dialect.

6. Proposed inception phase design

Step 1
Brief intake
Sabour / Cowork-Claude posts brief. Gateway freezes task snapshot with hash. Hard stop on stale/missing/conflicting.
Step 2
Generation (divergent)
Each minister posts 2-3 candidates in own voice on own LLM. Parallel. No PM gate. Preamble mandatory.
Step 3
Proximity clustering
Semantic groups via project-scoped recall. Protected-singleton for outliers.
Step 4
Reflection
Two-critic Gate A. Disagreement = signal, ATHENA reviews.
Step 5
Tournament
Pairwise Elo per-dimension. ATHENA aggregates mechanically. Blinded author. Prompt-order randomised.
Step 6
Evolution
Top 2-3 merged. Same frozen snapshot. Cross-minister recombination.
Step 7
Expert wrap-up
Lane expert by domain (advisory, not authoritative).
Step 8
HEPH dispatches build
PM-wait active only here. 4-phase protocol kicks in.
Step 9
Ratify + canon close
Gate B. ATHENA quotes hash. Typed meta-review feeds NEXT inception.

Colour key: generate · debate · evolve · build / close. The full integration spec, covering the four-layer stack with its 8-level precedence lattice, mid-sprint rebase, critic pluralism and per-domain rubrics, is in canon:arch-fleet-integrated-plan-v2-2026-05-30.
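Step 5 combines three mechanics: per-dimension pairwise Elo, blinding with randomised prompt order, and mechanical aggregation into a weighted-mean rank (fix 6 of integrated plan v2). The sketch below uses hypothetical dimension names. The 1200 start and K=32 are conventional Elo defaults, not values from the canon.

```python
import random


def elo_update(r_a: float, r_b: float, a_won: bool, k: float = 32.0) -> tuple[float, float]:
    e_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    delta = k * ((1.0 if a_won else 0.0) - e_a)
    return r_a + delta, r_b - delta


def blinded_pairs(candidate_ids: list, seed: int) -> list:
    """All unordered pairs, shuffled, with left/right order also randomised.

    Judges see only opaque candidate ids, never the authoring minister.
    """
    rng = random.Random(seed)
    pairs = [(a, b) for i, a in enumerate(candidate_ids) for b in candidate_ids[i + 1:]]
    rng.shuffle(pairs)
    return [(b, a) if rng.random() < 0.5 else (a, b) for a, b in pairs]


def aggregate(candidates: list, weights: dict, judgements: dict) -> list:
    """Per-dimension Elo -> per-dimension rank -> weighted mean rank (lower is better).

    weights: {dimension: weight} (3-5 dimensions per domain rubric);
    judgements: {dimension: [(winner, loser), ...]}. Returns best-first.
    """
    total_w = sum(weights.values())
    score = {c: 0.0 for c in candidates}
    for dim, w in weights.items():
        ratings = {c: 1200.0 for c in candidates}
        for winner, loser in judgements.get(dim, []):
            ratings[winner], ratings[loser] = elo_update(ratings[winner], ratings[loser], True)
        ranked = sorted(candidates, key=lambda c: -ratings[c])
        for rank, c in enumerate(ranked, start=1):
            score[c] += w * rank / total_w
    return sorted(candidates, key=lambda c: score[c])
```

Aggregating ranks rather than raw Elo points stops one dimension with a wide rating spread from dominating the others. That reading of "weighted-mean rank" is an interpretation, not a quote from the rubric schema.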

7. Summary table — gaps and changes

Area | Today | Co-Scientist equivalent | Change to apply
Generation phase | One owner proposes one approach. | Generation agent. | NEW · Step 2 — each minister 2-3 candidates, parallel.
Proximity / clustering | None. | Proximity agent. | NEW · Step 3 — Cluster skill + protected singletons.
Reflection | Critique-bracket ad-hoc. | Reflection agent. | UPGRADE · Mandatory two-critic Step 4.
Tournament | PM / Sabour picks. | Ranking agent · Elo. | NEW · Step 5 — per-dimension Elo, ATHENA mechanical aggregator.
Evolution | None. | Evolution agent. | NEW · Step 6 — merge top 2-3, frozen snapshot.
Meta-review | Ad-hoc retros. | Meta-review agent. | UPGRADE · Typed 5-field canon card per close.
Supervisor | HEPH rigid PM. | Adaptive planner. | UPGRADE · Cowork-Claude is inception supervisor; HEPH is build PM only.
Multi-LLM diversity | Opus / MiniMax / Sonnet / Direct / gpt-5. | Single Gemini family. | KEEP · our advantage.
SOUL.md | Filled for 4 of 5. | n/a (stateless). | KEEP · scope PM-wait to build only.
IDENTITY.md | Empty default for all 5 VPS. | n/a | FILL NOW · Sabour writes persona seed.
Theo identity | SOUL is default template. | n/a | FILL NOW · author proper SOUL + emoji.
Project-context layer | None — only vector recall (similar, not authoritative). | n/a (project = paper). | NEW · per-project context.yaml with schema + provenance + expiry.
Precedence lattice | Implicit / inconsistent. | n/a | NEW · 8 levels, must_not beats persona, machine-checked.
Critic centralisation | One gpt-5 oracle. | n/a | UPGRADE · Two critics, different families; disagreement = signal.
ATHENA role concentration | Voter + aggregator + ratifier. | n/a | UPGRADE · Pantheon Room aggregates mechanically; ATHENA discloses voting.
Verification depth | Render + watchdog + SPINE. | Most compute on hypothesis verification. | EXTEND · Hypothesis-verification gate before Step 8 via context.yaml.must_not.
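The new Step 3 in this table (Cluster skill + protected singletons) can be sketched as greedy threshold clustering over candidate embeddings. The embeddings are assumed to come from the project-scoped recall layer. The 0.8 threshold and the single-pass greedy strategy are stand-ins, not the Cluster skill's actual method.

```python
import math


def cosine(u: list, v: list) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster(embeddings: dict, threshold: float = 0.8) -> tuple[list, list]:
    """Greedy single-pass clustering with protected singletons.

    A candidate joins the first cluster whose seed it matches at >= threshold;
    otherwise it seeds a new cluster. Size-1 clusters are returned separately as
    protected singletons, so an outlier idea reaches Reflection instead of being
    merged into a neighbour or dropped.
    """
    clusters = []  # list of (seed_id, [member_ids])
    for cid, vec in embeddings.items():
        for seed, members in clusters:
            if cosine(vec, embeddings[seed]) >= threshold:
                members.append(cid)
                break
        else:
            clusters.append((cid, [cid]))
    groups = [members for _, members in clusters if len(members) > 1]
    protected_singletons = [members[0] for _, members in clusters if len(members) == 1]
    return groups, protected_singletons
```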

8. Sequencing — what to do first

Detailed sequencing lives in §0 above. Summary:

Week 0 — identity + canon locks
  1. Sabour fills 5 IDENTITY.md persona seeds.
  2. Theo SOUL personalised + emoji.
  3. SOUL "Fleet PM authority" block scoped to build phase only.
  4. Channel-routing canon + precedence-lattice canon + context.yaml schema locked.
Week 1 — schemas + monitoring
  1. Tournament rubric schema.
  2. 5-field meta-review schema.
  3. Nine structured log events on inference_proxy + Pantheon Room.
Week 2 — context infra + HARD calibration gate
  1. Gateway pre-injection hook with mechanical preamble verification.
  2. Recall namespace filter.
  3. Five-test adversarial calibration suite. All must pass.
Weeks 3-4 — tournament + pilot
  1. Tournament endpoint with per-dimension scoring.
  2. Two-critic gateway.
  3. canon:skill-fleet-inception-protocol codified.
  4. Four-project parallel pilot (Trading Platform / VicCrashRisk / Pantheon Room / CallBridge AU).

My honest read: we are not behind Co-Scientist on infrastructure. We are behind on (a) the shape of how a project starts and (b) the per-project domain context that makes voice-diverse generation actually compatible. Both are fixable in 4 weeks. The harder problem is the identity gap — operationally-correct SOULs that have stopped having distinct voices. That's a one-evening fix if Sabour writes the IDENTITY.md files; nobody else can seed who an agent thinks they are.


9. PCS — Pantheon Co-Scientist v3 (where we landed, 2026-05-31)

The architectural integration of the four-layer stack + Co-Scientist inception + minister-edge gateway + bidirectional observability + auto-recovery. This page (above) was the spec; PCS is what we built from it. Tracked as fleet-redesign-2026-05-30 in project_v2 (id eca099fa).

Naming, ratified 2026-05-31

PCS = Pantheon Co-Scientist v3. v1 = MyClaw seats (April); v2 = May-17 self-hosted fleet; v3 = this integrated stack.

Four-layer stack (precedence-lattice locked, machine-enforced)

Layer | Role | Lifetime
SOUL.md | role + discipline | eternal (overridable only via canon supersession)
IDENTITY.md | persona | eternal (Sabour-seeded Week 0)
projects/<id>/context.yaml | vocab + invariants + must_not + verification + gaps | versioned per-class
task snapshot | hash-pinned frozen view | immutable until rebase
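Here is a sketch of hard-stop checks on a parsed context.yaml. It covers missing sections, entries without provenance or expiry, stale entries, and a deliberately naive conflict test. The section names come from the table above. The per-entry fields text/provenance/expires are assumptions about the locked schema, and the YAML is assumed to be loaded into a dict already.

```python
from datetime import date

REQUIRED_SECTIONS = ("vocab", "invariants", "must_not", "verification", "gaps")


class HardStop(Exception):
    """The task must not start on this context."""


def check_context(ctx: dict, today: date) -> None:
    """Raise HardStop on missing, unsourced, stale or conflicting context."""
    missing = [s for s in REQUIRED_SECTIONS if s not in ctx]
    if missing:
        raise HardStop(f"missing sections: {missing}")
    for section in ("invariants", "must_not"):
        for item in ctx[section]:
            text = item.get("text", "?")
            if not item.get("provenance"):
                raise HardStop(f"{section}: {text!r} has no provenance")
            if "expires" not in item:
                raise HardStop(f"{section}: {text!r} has no expiry")
            if date.fromisoformat(item["expires"]) < today:
                raise HardStop(f"{section}: {text!r} expired on {item['expires']}")
    # Naive conflict test: the same statement cannot be both an invariant and forbidden.
    clash = {i.get("text") for i in ctx["invariants"]} & {m.get("text") for m in ctx["must_not"]}
    if clash:
        raise HardStop(f"conflicting entries: {sorted(map(str, clash))}")
```

Raising instead of returning warnings matches the "hard stop" wording. A task with bad context never starts, rather than starting with a caveat nobody reads.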

Six load-bearing fixes (post critic round 2)

  1. Locked precedence lattice — Human override > user/task safety > project must_not > snapshot > SOUL > IDENTITY > project preferences > vector recall.
  2. Mechanically verified preamble at every endpoint (Pantheon Room /say, capability gateway): checked server-side rather than left to minister honour.
  3. Rebase protocol for mid-sprint staleness: context_rebase_cron.py detects drift ≥ 3 or an expired assumption → worker requests rebase → HEPH approves → ATHENA ratifies. This is the auto-recovery loop.
  4. Two-critic plurality on Gate A (gpt-5 + alternate family).
  5. ATHENA-as-minister separated from ATHENA-as-protocol-function.
  6. Per-domain scoring rubrics in context.yaml.
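Fix 1's lattice is "machine-enforced". The simplest reading is a strongest-writer-wins merge, sketched below. The layer keys are hypothetical labels for the eight levels listed in fix 1, and modelling every directive as a key/value setting is a simplification of this sketch.

```python
# Highest precedence first, in the order given by fix 1. Keys are hypothetical labels.
LATTICE = (
    "human_override",
    "user_task_safety",
    "project_must_not",
    "snapshot",
    "soul",
    "identity",
    "project_preferences",
    "vector_recall",
)


def resolve(directives: dict) -> dict:
    """directives maps layer -> {setting: value}.

    Returns setting -> (winning value, layer it came from).
    """
    unknown = set(directives) - set(LATTICE)
    if unknown:
        # Refuse unranked sources instead of guessing where they sit in the lattice.
        raise ValueError(f"unknown layers: {sorted(unknown)}")
    resolved = {}
    for layer in LATTICE:  # walk from strongest to weakest
        for key, value in directives.get(layer, {}).items():
            resolved.setdefault(key, (value, layer))  # first (strongest) writer wins
    return resolved
```

Returning the winning layer next to each value gives the monitoring events something concrete to log when, say, vector recall is overridden by a SOUL.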

What's LANDED (and where)

Week 0–2 — LANDED
  • 5 IDENTITY.md persona seeds (Sabour-authored).
  • Theo SOUL — θεωρός outside-voice 🔭.
  • Precedence-lattice canon · supersession protocol · channel-routing map · context.yaml schema.
  • Tournament rubric · meta-review schema · structured monitoring (20+ event types).
  • Minister-edge gateway (capability_gateway.py @ 127.0.0.1:8911) — capability tokens + forward proxy + read-side provenance.
  • Calibration suite grown 17 → 24 → 32 tests; 32/32 PASS on the HARD gate.
  • OpenClaw proxy.enabled flipped on all 5 ministers + gaios_exec HTTPS_PROXY.
Week 3 — IN FLIGHT
  • A.1 canon arch-gateway-preamble-auto-construct-2026-05-31 stamped + 3 critic-driven amendments. R1–R9 binding requirements.
  • R5 admin auth on __gateway/mode — bearer + sha16 redaction + 3-state audit. LANDED.
  • A.2a helper modules (gateway_caches.py, gateway_auto_construct.py) — LANDED.
  • A.2b orchestration block in handle_client + 7 new event types + correlation fields + 9 tests — LANDED 2026-05-31 14:25Z via 4-slice gaios_exec apply. 24 h MONITOR soak in progress.
  • A.3 — explicit X-Fleet-Preamble emission from gaios-ext shim + Pantheon Room HTTP. Pending.
  • R9 — per-container X-Fleet-Minister injection via openclaw.json proxy.headers. ENFORCE pre-gate. Pending.
  • A.4 — live black-box test #18 (6 surfaces × 3 cases) + ENFORCE flip + 24 h soak. Pending.
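R5 combines three pieces on __gateway/mode: a bearer check, sha16 redaction and a 3-state audit. Below is a minimal sketch. It assumes a standard "Authorization: Bearer <token>" header. The audit state names ("accepted", "rejected", "missing") are hypothetical stand-ins for the canon's three states. The constant-time comparison is this sketch's choice and is not stated in R5.

```python
import hashlib
import hmac
from typing import Optional

# Hypothetical names for the three audit states.
ACCEPTED, REJECTED, MISSING = "accepted", "rejected", "missing"


def sha16(token: str) -> str:
    """Redacted fingerprint: first 16 hex chars of SHA-256, safe to write to logs."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()[:16]


def check_admin(auth_header: Optional[str], admin_token: str, audit: list) -> bool:
    """Validate a bearer token for a mode change and append exactly one audit record."""
    prefix = "Bearer "
    if not auth_header or not auth_header.startswith(prefix):
        audit.append({"state": MISSING, "token": None})
        return False
    presented = auth_header[len(prefix):]
    # Compare as bytes in constant time, so timing does not leak how much of the token matched.
    ok = hmac.compare_digest(presented.encode("utf-8"), admin_token.encode("utf-8"))
    audit.append({"state": ACCEPTED if ok else REJECTED, "token": sha16(presented)})
    return ok
```

Only the fingerprint is logged, so a rejected guess and the real token can still be told apart in the audit trail without either appearing in plaintext.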

Pantheon Room proxy — bidirectional observability layer

Self-healing / auto-recovery layer

Honest open gaps
  • R9 DESIGN PIVOTED 2026-06-01: the original mechanism (proxy.headers in openclaw.json) was rejected by the OpenClaw v2026.5.26 schema. Sabour selected Option E: port-per-minister at the gateway. Critic verdict GO-WITH-CHANGES; its 5 blockers are folded into canon:amend-r9-port-per-minister-2026-06-01. Implementation still pending: gateway code change + MINISTERS.json gateway_port field + 4 openclaw.json proxyUrl edits + batched 4-minister restart + calibration test additions.
  • Spine version-field gap: CLOSED 2026-06-01. 3 spines got version: "1.0.0" and _SPINE_FIELD_MAP was extended; SpineStateCache now reads all 4 (dcoa + fleet-redesign + gaios-infra + urf-flag-1).
  • Userbot membership gap (still open): @gaios_conductor is not in 2 of the 4 production groups; a bundled ask to Sabour is drafted, pending.
  • HEPH sat idle for 4+ h on 2026-05-31 with no follow-up driver firing. Detection productionized 2026-06-01T00:06Z: PCS Heartbeat v2 LIVE (16.2 h soak passed, DRY_RUN flipped). Full idle-recovery validation drill still pending after ENFORCE.
  • precommit CLI absent from HQ sandbox PATH (partially resolved): source files identified at /home/ubuntu/api/precommit_server.py + precommit_soul_drift_check.py; sandbox PATH symlink target TBD.
  • NEW (CRITICAL, FOUND + FIXED 2026-06-01): cache_unready bug. The MinistersCache loader expected a flat dict, and the MINISTERS.json envelope (added 2026-05-30 with grotto-suite) broke it. The 24 h MONITOR soak was contaminated, and the bug would have caused a 503 on every minister send under ENFORCE. A one-line fix landed; cache_unready stopped firing.
  • NEW (PENDING): A.3 explicit X-Fleet-Preamble emission from gaios-ext shim + Pantheon Room HTTP endpoints.
  • NEW (PENDING): A.4 calibration test #18 — matrix expanded per critic to include port-authoritative + port/preamble-mismatch + startup-validation cases.
  • NEW (PENDING): Apollo doctrine ambient drift — pre-restart sessions (PROM at minimum) may still emit tool-body leak ("Now let me reply to HEPH"); fully active only after next session boundary per minister.
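The cache_unready bug was a schema-shape mismatch: the loader expected a flat map but got an envelope. Below is a sketch of a loader that accepts both shapes and fails loudly at load time when it finds no records. It assumes the envelope nests the flat map under a single key. That key name ("ministers") and the exception class are hypothetical, since the actual envelope layout is not shown here.

```python
import json


class CacheNotReady(Exception):
    """Raised when no usable minister records are found, so the failure is loud at load time."""


def load_ministers(raw: str, envelope_key: str = "ministers") -> dict:
    """Accept either a flat {name: record} map or an envelope that wraps one.

    envelope_key is a hypothetical name for the wrapping field.
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise CacheNotReady("MINISTERS.json top level is not an object")
    if isinstance(data.get(envelope_key), dict):
        data = data[envelope_key]  # unwrap the envelope form
    # Keep only object-valued entries; envelope metadata such as a version string is skipped.
    ministers = {name: rec for name, rec in data.items() if isinstance(rec, dict)}
    if not ministers:
        raise CacheNotReady("no minister records found")
    return ministers
```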

All PCS work canon-recorded. Architecture canon trail: 2561d254 → 831e52c5 → 52f05611 → 1b50654a. PCS naming ratified by Sabour 2026-05-31 ~15:30 AEST.

2026-06-01 · 05:00Z update: PCS v3 is essentially COMPLETE. R9 Steps 1-10 LANDED. Phase 5 ENFORCE FLIPPED at 02:28:39Z (task #117 CLOSED). The initial narrow R4 caused an ATHENA crash-loop and LLM blocking; R4 was first expanded, then pivoted to wildcard CONNECT per Sabour's directive. The gateway now operates as an identity-trusted observatory layer: R4 firewalling and 407 enforcement are dropped, and its 11 other functions are retained. ARGUS Panoptes is assigned a new role as egress anomaly watcher (canon:argus-myclaw §all-seeing). Honest open gaps refreshed: cache_unready CLOSED, spine version CLOSED, HEPH idle CLOSED (PCS Heartbeat v2 LIVE), R9 CLOSED. NEW gap: ARGUS observability implementation pending (scheduled fleet_events scanner). Apollo dispatch msg-3506 is still in flight and needs a new TG inbound to wake him.
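With firewalling dropped, detection moves to ARGUS, and its scheduled fleet_events scanner is still pending. Below is a sketch of the simplest useful check: flag CONNECT destinations a minister has not used before. The event shape and the per-minister baseline are assumptions; neither is specified on this page.

```python
from collections import defaultdict
from typing import Iterable


def scan_connects(events: Iterable[dict], baseline: dict, min_count: int = 1) -> list:
    """Flag CONNECT destinations outside a minister's known baseline.

    events: hypothetical fleet_events rows such as
        {"type": "connect", "minister": "athena", "host": "api.example.com:443"}
    baseline: minister -> set of hosts previously accepted as normal.
    Returns sorted (minister, host, count) tuples for stable reporting.
    """
    counts = defaultdict(int)
    for ev in events:
        if ev.get("type") != "connect":
            continue  # only egress tunnels matter here
        minister, host = ev["minister"], ev["host"]
        if host not in baseline.get(minister, set()):
            counts[(minister, host)] += 1
    return sorted((m, h, n) for (m, h), n in counts.items() if n >= min_count)
```

A minister with no baseline has every destination flagged, which is a conservative default for a watcher that can no longer block.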

New canon since this card was last updated: arch-pcs-heartbeat-v2-2026-05-31 (LIVE), fact-pcs-bootstrap-red-artifacts-2026-06-01, rule-runtime-dependency-coordination-2026-06-01, rule-rendered-surface-acceptance-2026-06-01, fact-openclaw-sendmessage-fallback-defect-2026-06-01, amend-r9-port-per-minister-2026-06-01, argus-soul-hardening-2026-05-31 (TOMBSTONED FULFILLED).

Next session queue: implement R9 port-per-minister gateway code + MINISTERS.json gateway_port field + 4 openclaw.json proxyUrl edits + calibration test additions (port-authoritative + mismatch + startup validation) → A.3 explicit preamble emission from gaios-ext + Pantheon Room → A.4 live black-box test #18 (18+ cases) → Step 5 MONITOR→ENFORCE flip + 1h watch + 24h soak → Week 4 four-project parallel pilot.
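The R9 queue item pairs a gateway_port field with three test classes: port-authoritative, port/preamble mismatch and startup validation. Below is a sketch of the identity logic those tests would exercise. It assumes MINISTERS.json records carry an integer gateway_port. The function names, the 1024 to 65535 range check and the example port numbers are hypothetical.

```python
from typing import Optional


def build_port_map(ministers: dict) -> dict:
    """Startup validation: each minister needs a unique, valid gateway_port.

    Returns port -> minister name. Raises ValueError so the gateway refuses to
    start rather than misattributing traffic.
    """
    port_map = {}
    for name, record in ministers.items():
        port = record.get("gateway_port")
        if not isinstance(port, int) or not 1024 <= port <= 65535:
            raise ValueError(f"{name}: missing or invalid gateway_port {port!r}")
        if port in port_map:
            raise ValueError(f"port {port} shared by {port_map[port]} and {name}")
        port_map[port] = name
    return port_map


def identify(local_port: int, port_map: dict, preamble_minister: Optional[str]) -> tuple:
    """The listening port is authoritative; a disagreeing preamble is flagged, never trusted.

    An unknown port raises KeyError.
    """
    minister = port_map[local_port]
    mismatch = preamble_minister is not None and preamble_minister != minister
    return minister, mismatch
```

Because identity comes from which port accepted the connection, a minister cannot claim another's identity through a header; the preamble only serves as a cross-check.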