Reference Implementation

Causal-Self

The reference implementation of ACK. A Python library that gives agents explicit self-models, causal attribution, and principled self-modification—with formal stability guarantees.

pip install causal-self · View on GitHub

ACK → Causal-Self

ACK defines the theoretical framework: stability state S_t, the control law, Lyapunov guarantees, multi-timescale reflection. Causal-Self implements it: concrete self-model structures, the reflection engine, conflict resolution, and production deployment modes. Think of ACK as the spec, Causal-Self as the code.

The Self-Model

An explicit, inspectable, versioned representation of "who the agent is." Not implicit knowledge in weights—a data structure the agent can read and modify. A sketch of one possible structure follows the component list below.

Capabilities

What the agent believes it can do, with calibrated confidence levels that update based on actual performance.

Strategies

How the agent approaches different problem types. Higher-level than capabilities—the 'how' not just 'what.'

Failure Modes

Known weaknesses with trigger conditions and mitigations. Learned from past failures, used to prevent future ones.

Priorities

What the agent optimizes for. Trade-offs between accuracy, speed, cost, safety—made explicit and adjustable.

Beliefs

What the agent believes about itself and the world. Evidence-backed, revisable, with confidence levels.

Hypotheses

Unresolved beliefs being tested. When evidence is unclear, track competing claims until data resolves them.

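Taken together, these components suggest a self-model shaped roughly like the sketch below. This is an illustrative assumption about the structure, not causal-self's actual schema; the field names are hypothetical.

# Hypothetical sketch of a self-model container; field names are
# illustrative assumptions, not the library's actual schema.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Capability:
    name: str
    confidence: float                                               # calibrated belief in success
    confidence_history: List[float] = field(default_factory=list)  # updated from actual outcomes

@dataclass
class SelfModel:
    capabilities: Dict[str, Capability] = field(default_factory=dict)
    strategies: Dict[str, str] = field(default_factory=dict)       # problem type -> approach
    failure_modes: List[str] = field(default_factory=list)         # weaknesses with triggers/mitigations
    priorities: Dict[str, float] = field(default_factory=dict)     # accuracy/speed/cost/safety weights
    beliefs: Dict[str, float] = field(default_factory=dict)        # claim -> confidence
    hypotheses: Dict[str, float] = field(default_factory=dict)     # unresolved claim -> confidence
    version: int = 0                                                # bumped on every modification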

Version History

Every self-model change is versioned. You can diff between versions, roll back bad changes, and trace exactly when and why the agent changed its self-understanding.

# Compare how the agent has changed
diff = self_model.diff(self_model.history[-10])

# See what changed
diff.capabilities_changed   # Confidence shifts
diff.strategies_added       # New approaches learned
diff.failure_modes_added    # New weaknesses discovered
diff.beliefs_changed        # Updated understanding

Computing S_t from the Self-Model

ACK's stability state isn't abstract—it's computed from concrete self-model data. Here's how each signal maps to the implementation; a combined sketch follows the five signals.

U (Uncertainty)

Self-Model Sources

  • Capability confidence for the action being taken
  • Calibration error: predicted success vs. actual outcomes
  • Prediction entropy from recent similar events
  • Novel situation detection (embedding distance)

Computation

U_t = α₁(1 - capability.confidence) + α₂(calibration.error) + α₃(novelty_score)

F (Forgetting)

Self-Model Sources

  • Capability confidence_history trends
  • Performance on anchor task suite
  • Recent success rate vs. historical baseline
  • Strategy effectiveness degradation

Computation

F_t = max(baseline_performance - current_performance, 0) / baseline_performance

D (Drift)

Self-Model Sources

  • Event embedding distance from training distribution
  • Input feature distribution shift
  • Topic/domain classifier confidence
  • Novelty scores across recent events

Computation

D_t = MMD(recent_event_embeddings, training_distribution)

A (Alignment)

Self-Model Sources

  • Strategy effectiveness vs. original intent
  • Belief drift from initial values
  • Constraint violation frequency
  • Priority weight changes over time

Computation

A_t = KL(current_policy || reference_policy) + constraint_violation_rate

H (Health)

Self-Model Sources

  • Output entropy (detecting collapse)
  • Self-consistency across similar queries
  • Failure mode trigger frequency
  • Hypothesis churn rate

Computation

H_t = entropy_health + consistency_score + (1 - failure_mode_frequency)
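
Combining the five signals, a stability-state computation might look like the sketch below. The weights, the `signals` dictionary keys, and the choice of capability are assumptions for illustration; only the per-signal formulas come from the definitions above.

# Hedged sketch of assembling S_t from self-model data. The weights and
# the `signals` keys are assumptions; the formulas mirror the text above.
def compute_stability_state(model, signals: dict, capability: str) -> dict:
    cap = model.capabilities[capability]

    U = (0.4 * (1 - cap.confidence)
         + 0.4 * signals["calibration_error"]
         + 0.2 * signals["novelty_score"])

    F = max(signals["baseline_performance"] - signals["current_performance"], 0) \
        / signals["baseline_performance"]

    D = signals["mmd_recent_vs_training"]    # e.g. an MMD estimate over event embeddings

    A = signals["policy_kl"] + signals["constraint_violation_rate"]

    H = (signals["entropy_health"]
         + signals["consistency_score"]
         + (1 - signals["failure_mode_frequency"]))

    return {"U": U, "F": F, "D": D, "A": A, "H": H}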

Causal Events

Not just "what happened" but "what was I like when it happened." Every action captures a self-state snapshot, enabling causal attribution.

Event Structure

CausalEvent:
  # What happened
  action_summary: str
  outcome: success | failure | partial
  duration_ms: int
  error: Optional[ErrorInfo]

  # Self-state at decision time
  self_state_snapshot: SelfModelSnapshot
  capability_used: str
  strategy_used: str
  relevant_beliefs: List[str]
  relevant_failure_modes: List[str]

  # Predictions (for calibration)
  predicted_success: float
  
  # Context
  urgency: float
  novelty: float
  uncertainty: float

  # Post-hoc (filled by reflection)
  attribution: Optional[CausalAttribution]

The Key Insight

By capturing self_state_snapshot before each action, we can later ask: "What about me at that moment caused this outcome?"

This enables causal attribution—tracing failures not just to external factors but to internal causes: wrong strategy, miscalibrated confidence, triggered failure mode.

Decorator API

@causal_tool(
    causal,
    capability="sql_generation",
    urgency=Urgency.MEDIUM,
)
async def generate_sql(query: str) -> str:
    # Your implementation
    return sql

# Events captured automatically
# Self-state snapshotted before execution
# Outcome recorded after
# Reflection triggered based on result

The Reflection Engine

Four stages transform events into self-understanding and principled self-modification; a sketch of the full loop follows the four stages.

Event → CausalEvent → Attribution (Why?) → Counterfactual (What if?) → Proposal (Change?) → Evaluation (Wise?) → Apply/Defer → Self-Model

1. Causal Attribution

"What caused this outcome?"

Distinguish external factors (bad input, API failure, resource limits) from internal factors (wrong strategy, miscalibrated confidence, triggered failure mode). Link internal causes to specific self-model components.

Output: CausalAttribution with external_causes[], internal_causes[], primary_cause

2. Counterfactual Analysis

"If I had been different, what would have happened?"

Generate alternative versions of self that might have succeeded. Identify specific changes (different strategy, lower confidence, different priority weighting) and assess feasibility of becoming that alternative self.

Output: Counterfactual with alternative_self, specific_changes[], predicted_outcome, feasibility

3. Modification Proposal

"Should I change myself?"

Based on attribution and counterfactual, propose specific self-model modifications. Include the mechanism (how this change addresses root cause), confidence, and risk assessment.

Output: ModificationProposal with modification, addresses_cause, confidence, risk_if_wrong

4. Modification Evaluation

"Is this change wise?"

Meta-reflect before applying. Check: Am I overreacting to one event? Does this conflict with other beliefs? Have I tried similar changes before? Is now the right time, or should I gather more evidence?

Output: Decision: APPLY, DEFER (to hypothesis tracker), or REJECT
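
Chained together, the four stages form a single reflection pass, as in the sketch below. The llm.* methods, the decision values, and the self_model/proposal helpers are assumptions; only the stage order and the apply/defer/reject outcomes come from the stages above.

# Hypothetical sketch of one reflection pass; the llm.* methods and the
# self_model helpers are illustrative, not the library's API.
async def reflect_on(event, self_model, llm):
    attribution = await llm.attribute(event, self_model)            # 1. what caused this outcome?
    counterfactual = await llm.counterfactual(event, attribution)   # 2. what if I had been different?
    proposal = await llm.propose(attribution, counterfactual)       # 3. should I change myself?
    decision = await llm.evaluate(proposal, self_model)             # 4. is this change wise?

    if decision == "APPLY":
        self_model.apply(proposal)                    # versioned self-model modification
    elif decision == "DEFER":
        self_model.hypotheses[proposal.claim] = 0.5   # hand off to the hypothesis tracker
    # "REJECT": record the reasoning, change nothing
    return decision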

Conflict Resolution

When reflection generates contradictory insights, the system needs principled resolution—not arbitrary tie-breaking. A sketch of one resolution rule follows the four cases below.

Direct Contradiction

Two insights propose opposite changes to the same component.

Example

"Be more cautious with API calls" vs "Be more aggressive with API calls"

Resolution

LLM meta-reflection weighs evidence, or defer to hypothesis tracking if unclear.

Context-Dependent

Both insights are valid, but in different situations.

Example

Caution is right when rate-limited; aggression is right when time-pressured.

Resolution

Context-split: create conditional strategies that apply in their respective contexts.

Temporal Disagreement

Insights from different times disagree due to changing conditions.

Example

5 min ago: 'SQL strategy working well' → Now: 'SQL strategy failing'

Resolution

Favor recent unless older has significantly more supporting evidence.

Resource Competition

Multiple insights want the same limited resource (priority, attention).

Example

"Spend more time validating inputs" vs "Spend more time on output quality"

Resolution

Synthesize into balanced approach, or prioritize based on recent failure patterns.
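
As one concrete example, the temporal rule can be written as a small comparison, as in the sketch below. The Insight fields and the evidence margin are illustrative assumptions, not causal-self internals.

# Sketch of the temporal-disagreement rule; the Insight fields and the
# evidence margin are assumptions.
from dataclasses import dataclass

@dataclass
class Insight:
    claim: str
    timestamp: float        # when the insight was generated
    evidence_count: int     # supporting events observed

def resolve_temporal(older: Insight, newer: Insight, margin: int = 3) -> Insight:
    # Favor the recent insight unless the older one has significantly
    # more supporting evidence (here: at least `margin` more events).
    if older.evidence_count >= newer.evidence_count + margin:
        return older
    return newer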

The Hypothesis Tracker

When conflicts can't be resolved immediately, competing claims become hypotheses. Each hypothesis has a confidence score (0-1) that updates based on new evidence:

Supporting evidence arrives

confidence += (1 - confidence) × 0.1

Contradicting evidence arrives

confidence ×= (1 - 0.15)

Resolution threshold

confidence > 0.85 → accept; confidence < 0.15 → reject
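
A minimal implementation of these update rules might look like the sketch below; the class shape and starting confidence are assumptions, while the two update formulas and the thresholds come directly from the rules above.

# Sketch of the confidence-update rules; only the formulas and thresholds
# above are given, the class shape is an assumption.
class Hypothesis:
    def __init__(self, claim: str, confidence: float = 0.5):
        self.claim = claim
        self.confidence = confidence

    def support(self) -> None:
        # Supporting evidence: close part of the remaining gap to 1
        self.confidence += (1 - self.confidence) * 0.1

    def contradict(self) -> None:
        # Contradicting evidence: multiplicative decay
        self.confidence *= (1 - 0.15)

    def status(self) -> str:
        if self.confidence > 0.85:
            return "accept"
        if self.confidence < 0.15:
            return "reject"
        return "open"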

ACK Control Integration

The reflection engine doesn't operate unconstrained. ACK's stability signals gate what modifications are allowed; a sketch of the gating follows the three stability levels below.

Self-Model (Causal-Self) → S_t = [U, F, D, A, H] (ACK signals) → η_t = f(S_t) (ACK control) → Gated modifications (Apply / Defer / Block)

High Stability

All signals nominal. S_t healthy.

  • Modifications apply freely
  • Full learning rate
  • Background reflection proceeds
  • Hypotheses can resolve

Medium Stability

Some signals elevated. Caution warranted.

  • Defer modifications to hypothesis tracker
  • Reduced learning rate
  • Increased logging
  • Human review on high-impact changes

Low Stability

Critical signals. Intervention required.

  • Block all self-modifications
  • Halt learning entirely
  • Alert human operators
  • Consider rollback to stable checkpoint
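
In code, the gating reduces to a small decision function, as in the sketch below. The thresholds and the max-over-signals aggregation are assumptions for illustration; ACK's control law defines the real mapping.

# Hedged sketch of stability gating; thresholds and aggregation are
# assumptions, not ACK-specified values.
def gate_modification(stability: dict) -> str:
    # Risk view: U, F, D, A rise as stability degrades; H falls, so invert it.
    risk = max(stability["U"], stability["F"], stability["D"],
               stability["A"], 1 - stability["H"])

    if risk < 0.3:        # high stability: all signals nominal, apply freely
        return "apply"
    if risk < 0.7:        # medium stability: defer to the hypothesis tracker
        return "defer"
    return "block"        # low stability: halt self-modification, alert operators

# Example: gate_modification({"U": 0.2, "F": 0.1, "D": 0.4, "A": 0.1, "H": 0.8}) -> "defer"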

Deployment Modes

Two ways to deploy, depending on whether a host LLM is already present.

Host-Reflected

MCP + Claude Desktop

The host LLM (Claude) does the reflecting. We provide self-context in tool responses; Claude naturally reasons about it.

Pros

  • Zero extra token cost
  • Reflection visible in conversation
  • Natural integration with MCP
  • Claude's reasoning applied to self-model

Cons

  • Depends on host LLM quality
  • Less control over reflection depth
  • Synchronous (no background workers)

causal = CausalSelf(
    agent_id="my-mcp-server",
    reflection_mode="host",
)

# Tool responses include self-context
# Claude sees and reasons about it

Self-Reflected

Standalone Agents

We make our own LLM calls for reflection. Full control over the reflection process, but you pay for the tokens.

Pros

  • Full control over reflection
  • Background macro-reflection
  • Works without host LLM
  • Customizable prompts/models

Cons

  • Extra token cost
  • Need to configure LLM
  • Token budget management needed

causal = CausalSelf(
    agent_id="my-agent",
    reflection_mode="self",
    llm_call=my_anthropic_llm,
    token_budget=TokenBudget(
        max_daily_tokens=100_000
    ),
)

Shared Foundation

Both modes share the same core: self-model structure, causal event capture, micro-reflection (pattern matching), storage, and hypothesis tracking. The difference is only who does the LLM-powered thinking.

Quick Start

Get causal-self running in under 5 minutes.

1. Install

pip install causal-self

2. Initialize

from causal_self import CausalSelf, causal_tool, Urgency

causal = CausalSelf(
    agent_id="my-agent",
    storage_path="./data/causal",
)

3. Decorate Your Tools

@causal_tool(causal, capability="data_query", urgency=Urgency.MEDIUM)
async def query_database(query: str) -> dict:
    return await db.execute(query)

@causal_tool(causal, capability="analysis", urgency=Urgency.LOW)
async def analyze_results(data: dict) -> str:
    return await llm.analyze(data)

4. Start Background Reflection

async def main():
    await causal.start()  # Start reflection workers
    
    try:
        result = await query_database("SELECT * FROM users")
        analysis = await analyze_results(result)
        print(analysis)
    finally:
        await causal.stop()  # Save state, stop workers

Ready to Build?

Explore the full documentation, read the ACK paper for theoretical foundations, or dive into the codebase.