Multiple AI agents analyze your problem independently,
critique each other's solutions, and iterate until
they reach consensus, or an orchestrator resolves the disagreement.
The value isn't individual outputs.
It's the meta-cognition: agents reviewing agents.
Architecture decisions fail at integration points. A single perspective—human or LLM—optimizes for one dimension while missing others.
Schema normalized for queries, but connection pool exhaustion under load
Microservices boundary clean, but distributed transaction hell
Auth flow secure, but latency budget blown on token validation
Cache invalidation "solved" until eventual consistency bites
Single-agent LLMs echo back your framing. Peer review surfaces the friction points.
The skeptical question: "Why not just prompt a single LLM to consider multiple perspectives?"
Each agent has a focused system prompt. A database agent isn't trying to also think about security—it reviews the security agent's work instead. Specialization without context pollution.
When Agent A objects to Agent B's solution, you see the specific critique. Single-LLM "multi-perspective" prompts tend to smooth over conflicts. Peer review makes friction visible.
Agents revise based on peer critique, not just their own re-reading. The revision incorporates external signal, not just self-consistency checks.
73% consensus is information. "I think this is good" isn't. The approval scores tell you where disagreement lives—and that's often where bugs hide.
Not "how similar are outputs?" but "would each agent approve the others' work?"
# 3 agents = 6 pairwise reviews
Database → Backend:  APPROVE  (1.0)
Database → Infra:    CONCERNS (0.7)
Backend  → Database: APPROVE  (1.0)
Backend  → Infra:    OBJECT   (0.0)
Infra    → Database: CONCERNS (0.7)
Infra    → Backend:  APPROVE  (1.0)
─────────────────────────────────
Aggregate: (1.0 + 0.7 + 1.0 + 0.0 + 0.7 + 1.0) / 6 = 73%
Threshold: 80%
Result: Iterate or orchestrator resolves
Beyond the core workflow—features that make Consult practical for real engineering work.
Run the same query across Anthropic, OpenAI, and Google simultaneously. Agents from different providers critique each other's solutions. Claude reviews GPT's architecture. Gemini challenges Claude's assumptions. Disagreement across model families surfaces blind spots that single-provider analysis misses.
consult -p "..." -m team
Before burning tokens on full analysis, a lightweight pre-flight detects ambiguous queries. Asks only high-impact questions: scope boundaries, constraints, success criteria. Skips clarification for follow-ups clearly scoped by prior context. Explains why each question matters.
Automatic—triggers when ambiguity detected
Follow-up queries preserve full context. "Now add rate limiting to that design"—without re-explaining your schema, constraints, or prior decisions. Session state persisted to ~/.consult/sessions/. Resume conversations across terminal sessions.
Just keep typing in TUI, or use session flags in CLI
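A rough sketch of a follow-up from the CLI; the session flag shown here is hypothetical, so check consult --help for the actual name:

$ consult -p "Product cache: 50k SKUs, Redis, 5min TTL showing stale prices" -e database_expert
# later, in a new terminal: continue from the persisted session without restating constraints
$ consult -p "Now add rate limiting to that design" --session <session-id>   # hypothetical flag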
Drop in your schema.sql, architecture diagrams, error logs. PDFs automatically converted to images for providers lacking native support. Provider-specific size limits enforced gracefully. Conversion cached—no redundant processing across workflow phases.
F key in TUI, or --attach in CLI
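For instance (repeating --attach once per file is an assumption; adjust to the actual syntax):

$ consult -p "Orders queries time out under load; schema and error log attached" \
  --attach schema.sql --attach error.log \
  -e "database_expert,performance_expert"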
Long conversations get AI-summarized when context window fills. Preserves: original question, final solution, key insights and constraints. Discards: intermediate back-and-forth, superseded ideas. Inspired by Claude Code's context management strategy.
C key in TUI, or automatic when threshold exceeded
Cost-optimized defaults: Haiku, GPT-4o-mini, Gemini Flash for expert agents. SOTA model (Opus) reserved for meta-review synthesis only. Override any model via environment variables. Switch providers mid-session in TUI.
ANTHROPIC_MODEL=claude-sonnet-4-20250514 consult ...
Prefer scripts and pipelines? The CLI runs the same workflow without the TUI.
$ consult -p "Order service: event sourcing vs state-based. 100k orders/day, \
need audit trail, eventual consistency for reads, strong for inventory" \
-e "database_expert,backend_expert,software_architect"
$ consult -p "SaaS auth: OAuth2 + SAML, tenant isolation, session management \
across subdomains, SOC2 compliance. Pain point: JWT token bloat" \
-e security_focused -i 3
$ consult -p "Product cache: 50k SKUs, prices from ERP every 15min, inventory \
real-time. Redis 5min TTL showing stale prices. Avoid cache stampede" \
-e "database_expert,backend_expert,performance_expert"
$ consult -p "Migrate user table UUID→ULID: 200M rows, 50+ FK references, \
zero-downtime. Aurora PostgreSQL. Evaluate dual-write vs shadow vs CDC" \
-e architecture -t 0.9
Not a web dashboard. A proper terminal UI with collapsible workflow visualization, live consensus tracking, and keyboard-driven navigation.
Those are code editors—they write code in your IDE. Consult is for architecture decisions before you write code. Use Consult to decide what to build, then your editor to build it. Complementary, not competitive.
You still need to understand your problem. Consult enhances analysis, it doesn't replace your judgment.
It's LLMs with structure. Better than raw ChatGPT for architecture decisions, but still LLM-powered with LLM limitations.
Full reasoning chains visible. Every approval/objection shows rationale. You see exactly why agents agreed or disagreed.
Not a chat wrapper. A structured peer review workflow that surfaces disagreement between perspectives.
BYOK model — you bring your own API keys, pay providers directly. We don't upcharge.
$9 USD/month
$90 USD/year
Save $18 — 2 months free
Your API keys and data never leave your machine. Here's exactly how.
Keys read from ~/.consult/.env or environment variables. Loaded into memory at runtime, never written to logs, never sent over the network except directly to the provider (Anthropic/OpenAI/Google) via HTTPS.
Verify: tcpdump or mitmproxy shows only outbound connections to api.anthropic.com, api.openai.com, generativelanguage.googleapis.com.
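One minimal way to run that check, assuming a Linux host with tcpdump (on macOS, substitute your active interface for -i any); the capture shows destination IPs, so match them against the provider hostnames with a reverse lookup or your DNS logs:

# watch outbound TLS connection attempts while a Consult query runs
$ sudo tcpdump -n -i any 'tcp dst port 443 and tcp[tcpflags] & tcp-syn != 0'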
Zero outbound connections to our servers. License validation is cryptographic signature verification performed locally—no network call required. We don't know who's using Consult, how often, or for what.
Verify: Block all outbound traffic except LLM providers. Consult continues to work.
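A rough sketch of that test on Linux with iptables; the rules and hostname resolution are illustrative only, so adapt them to your firewall and flush them afterwards:

$ sudo iptables -A OUTPUT -p udp --dport 53 -j ACCEPT   # keep DNS working
$ for host in api.anthropic.com api.openai.com generativelanguage.googleapis.com; do \
    for ip in $(dig +short "$host" | grep -E '^[0-9.]+$'); do \
      sudo iptables -A OUTPUT -d "$ip" -p tcp --dport 443 -j ACCEPT; \
    done; \
  done
$ sudo iptables -A OUTPUT -j DROP                        # everything else is blocked
$ consult -p "Design a user authentication system" -e essentials   # should still complete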
Session history stored in ~/.consult/sessions/. Logs in ~/.consult/logs/. Both are local filesystem—encrypt at rest with FileVault/LUKS if your policy requires. No cloud sync, no external backup.
Verify: ls -la ~/.consult/ shows all persisted data. Delete anytime.
API keys pattern-matched and redacted in logs (sk-ant-***, sk-proj-***). Full prompt content logged only at DEBUG level (disabled by default). Production logging shows workflow events, not payload content.
Verify: grep -r "sk-" ~/.consult/logs/ returns zero matches.
We can't prevent Anthropic/OpenAI/Google from seeing your prompts—that's how LLMs work. If your compliance requires on-prem inference, Consult isn't the right tool. We're transparent about this boundary.
Mitigation: Use providers with data retention opt-outs. Anthropic, for example, doesn't train on API data by default and offers zero-retention arrangements for API traffic.
Each agent has domain-specific prompts that shape analysis. Peer review catches what individual perspectives miss.
Curated combinations for common scenarios. Use -e set_name in CLI.
# Install
$ pip install getconsult

# Configure API key (at least one required)
$ mkdir -p ~/.consult
$ echo 'ANTHROPIC_API_KEY=sk-ant-...' > ~/.consult/.env
$ chmod 600 ~/.consult/.env

# Or export directly
$ export ANTHROPIC_API_KEY=sk-ant-...

# Verify setup
$ consult --status

# Run your first query
$ consult -p "Design a user authentication system" -e essentials
Supports Anthropic, OpenAI, and Google models.
Default uses cost-optimized models (Haiku, GPT-4o-mini, Gemini Flash).
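To put all three providers in the mix, add one key per provider to ~/.consult/.env; the OPENAI_API_KEY and GOOGLE_API_KEY variable names are assumptions, so run consult --status to confirm each provider is picked up:

# ~/.consult/.env
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-proj-...   # assumed variable name
GOOGLE_API_KEY=...           # assumed variable name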