Specification v1.0

Query Provenance Store

A companion spec to the Tabular Manifold Spec (TMS).

Where queries go to be remembered.

1. What Is QPS?

The Query Provenance Store is a registry that maintains the relationship between manifolds and the queries that generated them.

QPS is:

  • Not a query engine (it doesn't execute queries)
  • Not a cache (it doesn't store results)
  • A provenance ledger — it records what query built what manifold, and tracks what happens when those queries are re-executed

TMS manifolds reference QPS entries. QPS entries contain the actual reconstruction logic.

┌─────────────────┐         ┌─────────────────┐         ┌─────────────────┐
│   TMS Manifold  │────────▶│   QPS Entry     │────────▶│  Data Platform  │
│   (cognitive)   │  ref    │   (provenance)  │  exec   │  (source)       │
└─────────────────┘         └─────────────────┘         └─────────────────┘

2. Core Principles

2.1 Separation of Concerns

Layer          Responsibility
-------------  -----------------------------------------
TMS Manifold   What the data shows (cognitive interface)
QPS Entry      How the data was produced (provenance)
Data Platform  Where the data lives (execution)

Manifolds stay clean and portable. Provenance stays auditable. Execution stays flexible.

2.2 Immutable Generation, Mutable Execution History

A QPS entry has two parts:

  • Generation record: Immutable. Captures exactly what query produced the manifold.
  • Execution log: Append-only. Tracks every replay and whether drift occurred.
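A minimal in-memory sketch of this split, assuming Python dataclasses as the representation. The class and method names (`GenerationRecord`, `QPSEntry`, `append_execution`) are hypothetical and only cover a subset of the §3.3 fields. Python-level immutability can be bypassed, so durable guarantees still belong in the storage layer (§7).

```python
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Any, Mapping


@dataclass(frozen=True)
class GenerationRecord:
    """Immutable part: exactly what produced the manifold (subset of §3.3)."""
    generated_at: str
    generated_by: str
    row_count: int
    checksum: str
    checksum_method: str = "row_content_hash"


@dataclass(frozen=True)
class QPSEntry:
    qps_id: str
    generation: GenerationRecord
    # Private backing list: the frozen dataclass blocks reassigning it,
    # and append_execution is the only method that writes to it.
    _executions: list = field(default_factory=list, repr=False)

    @property
    def executions(self) -> tuple:
        # Return a tuple copy so callers cannot reorder or drop past records.
        return tuple(self._executions)

    def append_execution(self, record: Mapping[str, Any]) -> None:
        # Wrap each record in a read-only mapping so it cannot be edited later.
        self._executions.append(MappingProxyType(dict(record)))
```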

2.3 Drift Detection as a First-Class Concept

When a reconstruction query returns different data than at generation time, that's drift. QPS doesn't prevent drift — it detects and records it.

3. QPS Entry Schema

3.1 Required Fields

Field        Type    Description
-----------  ------  ----------------------------------------------------------
qps_id       string  Unique identifier (should match manifold_id for 1:1 cases)
qps_version  string  Spec version (e.g., "1.0")
query        object  The reconstruction query (see §3.2)
generation   object  Metadata about when/how the manifold was built (see §3.3)
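A sketch of a presence-and-type check for the four required fields. The function name `check_required_fields` is hypothetical; a production implementation might use a JSON Schema validator instead.

```python
from typing import Any

# Required top-level fields from §3.1 and the Python type each JSON value maps to.
REQUIRED_FIELDS: dict[str, type] = {
    "qps_id": str,
    "qps_version": str,
    "query": dict,
    "generation": dict,
}


def check_required_fields(entry: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the required fields look valid."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in entry:
            problems.append(f"missing field: {name}")
        elif not isinstance(entry[name], expected):
            problems.append(f"{name} should be {expected.__name__}")
    return problems
```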

3.2 Query Block

{
  "query": {
    "dialect": "databricks_sql",
    "template": "SELECT ts, unit_price, po_id, notes FROM silver.price_events WHERE part_id = :part_id AND supplier_id = :supplier_id AND ts >= :start AND ts < :end ORDER BY ts",
    "params": {
      "part_id": "P-123456",
      "supplier_id": "S-789",
      "start": "2025-01-01",
      "end": "2026-01-01"
    },
    "param_types": {
      "part_id": "string",
      "supplier_id": "string",
      "start": "date",
      "end": "date"
    }
  }
}
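Because the template and params are stored separately, a replay has to bind them without string interpolation. The sketch below checks that every `:name` placeholder has a matching param and converts each value according to `param_types`. The `string` and `date` converters come from the example above; `int` is an assumed extra, and `bind_params` is a hypothetical name. The resulting dict would be passed to the database driver as bound parameters.

```python
import re
from datetime import date
from typing import Any, Callable

# Named placeholders such as :part_id; the lookbehind skips Postgres-style ::casts.
PLACEHOLDER = re.compile(r"(?<!:):([A-Za-z_]\w*)")

# "string" and "date" appear in the spec example; "int" is an assumed extension.
CONVERTERS: dict[str, Callable[[str], Any]] = {
    "string": str,
    "date": date.fromisoformat,
    "int": int,
}


def bind_params(query: dict) -> dict[str, Any]:
    """Verify placeholders match params, then coerce each value per param_types."""
    names = set(PLACEHOLDER.findall(query["template"]))
    params, types = query["params"], query["param_types"]
    if names != set(params):
        raise ValueError(f"placeholder/param mismatch: {sorted(names ^ set(params))}")
    return {name: CONVERTERS[types[name]](value) for name, value in params.items()}
```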

Supported Dialects

Dialect         Description
--------------  ---------------------------------
databricks_sql  Databricks SQL warehouse
snowflake       Snowflake SQL
bigquery        Google BigQuery Standard SQL
postgres        PostgreSQL
duckdb          DuckDB
mcp_tool        Opaque MCP tool call (see §3.2.1)

3.2.1 Opaque Tool References

When the query itself should not be exposed (for security, complexity, or abstraction reasons), the entry can reference an MCP tool instead:

{
  "query": {
    "dialect": "mcp_tool",
    "tool_name": "get_price_telemetry",
    "tool_args": {
      "manifold_id": "mfld_abc123"
    }
  }
}

The tool implementation handles actual query execution internally.
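The two query shapes can be told apart by `dialect`. This sketch of per-dialect checks assumes every SQL entry carries `template`, `params`, and `param_types` as in §3.2, and treats a SQL template on an `mcp_tool` entry as a policy violation, since it would expose the query the tool is meant to hide. Both rules and the name `query_problems` are illustrative assumptions.

```python
SQL_DIALECTS = {"databricks_sql", "snowflake", "bigquery", "postgres", "duckdb"}


def query_problems(query: dict) -> list[str]:
    """Shape checks for a query block; an empty list means no problems found."""
    dialect = query.get("dialect")
    if dialect == "mcp_tool":
        # Opaque reference: a tool name plus its arguments, no SQL exposed.
        problems = [f"mcp_tool requires {k}" for k in ("tool_name", "tool_args") if k not in query]
        if "template" in query:
            problems.append("mcp_tool entries should not carry a SQL template")
        return problems
    if dialect in SQL_DIALECTS:
        problems = [f"{dialect} requires {k}" for k in ("template", "params", "param_types") if k not in query]
        if not problems and set(query["params"]) != set(query["param_types"]):
            problems.append("every param needs a param_types entry (and vice versa)")
        return problems
    return [f"unsupported dialect: {dialect!r}"]
```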

3.3 Generation Block

{
  "generation": {
    "generated_at": "2026-01-14T22:30:00Z",
    "generated_by": "tms_generator_v1.2",
    "source_dataset": "silver.price_events",
    "source_dataset_version": "v5.2",
    "row_count": 517,
    "checksum": "sha256:a1b2c3d4e5f6...",
    "checksum_method": "row_content_hash"
  }
}

Checksum Methods

Method             Description
-----------------  ----------------------------------------
row_content_hash   SHA256 of sorted, serialized row content
row_count_only     Just the count (weak but cheap)
column_stats_hash  Hash of min/max/sum per column
none               No checksum computed
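One way to realize `row_content_hash`. The spec fixes only "sorted, serialized", so the serialization here (compact JSON with sorted keys, `str()` for non-JSON types such as dates) is an assumption. Sorting the serialized rows makes the hash independent of row order, which matters when `ORDER BY` has ties.

```python
import hashlib
import json
from typing import Any, Iterable


def _serialize(row: dict[str, Any]) -> str:
    # Assumed canonical form: sorted keys, compact separators, str() fallback.
    return json.dumps(row, sort_keys=True, separators=(",", ":"), default=str)


def row_content_hash(rows: Iterable[dict[str, Any]]) -> str:
    """SHA256 over sorted, serialized rows; row order does not affect the result."""
    digest = hashlib.sha256()
    for line in sorted(_serialize(r) for r in rows):
        digest.update(line.encode("utf-8"))
        # json.dumps escapes raw newlines, so "\n" cleanly delimits rows.
        digest.update(b"\n")
    return "sha256:" + digest.hexdigest()
```

Whatever encoding an implementation chooses must be applied identically at generation and at every replay; otherwise each comparison would report `content_change` even when the data is unchanged.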

3.4 Optional Fields

Field              Type    Description
-----------------  ------  -----------------------------------------
executions         array   Log of reconstruction attempts (see §4)
access_control     object  Who can execute this query
ttl                object  Retention policy for this entry
related_manifolds  array   Other manifolds built from the same query
notes              string  Human-readable context

4. Execution Log Schema

Each time a reconstruction query is executed with logging enabled, a record is appended to the executions array:

{
  "executions": [
    {
      "executed_at": "2026-01-15T10:00:00Z",
      "executed_by": "agent_session_xyz",
      "execution_context": "manifold_drilldown",
      "row_count": 517,
      "checksum": "sha256:a1b2c3d4e5f6...",
      "drift_detected": false,
      "execution_time_ms": 234
    },
    {
      "executed_at": "2026-01-16T14:30:00Z",
      "executed_by": "human_debug_session",
      "execution_context": "manual_audit",
      "row_count": 523,
      "checksum": "sha256:d4e5f67890ab...",
      "drift_detected": true,
      "drift_type": "row_count_increase",
      "drift_delta": {
        "row_count_expected": 517,
        "row_count_actual": 523,
        "rows_added": 6,
        "rows_removed": 0
      },
      "drift_note": "Late-arriving POs from batch reconciliation",
      "execution_time_ms": 287
    }
  ]
}
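The spec does not say how `rows_added` and `rows_removed` are derived, and a checksum alone cannot yield them. This sketch assumes the generation-time rows are still recoverable (for example through time travel on the source table) and diffs the two row sets as multisets. Under this approach an edited row counts as one removal plus one addition. The function name `compute_drift_delta` is hypothetical.

```python
import json
from collections import Counter
from typing import Any, Sequence


def _row_key(row: dict[str, Any]) -> str:
    # Canonical JSON so rows with equal content compare equal.
    return json.dumps(row, sort_keys=True, default=str)


def compute_drift_delta(
    before: Sequence[dict[str, Any]],
    after: Sequence[dict[str, Any]],
) -> dict[str, int]:
    """Diff two row sets as multisets and report counts in drift_delta shape."""
    old = Counter(_row_key(r) for r in before)
    new = Counter(_row_key(r) for r in after)
    return {
        "row_count_expected": len(before),
        "row_count_actual": len(after),
        # Counter subtraction keeps only positive counts, so each side
        # counts rows (duplicates included) that the other side lacks.
        "rows_added": sum((new - old).values()),
        "rows_removed": sum((old - new).values()),
    }
```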

Drift Types

Type                Description
------------------  ---------------------------------
row_count_increase  More rows than at generation
row_count_decrease  Fewer rows than at generation
content_change      Same row count, different content
schema_change       Column structure changed
query_failure       Query no longer executes
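A replay can be mapped onto these types with a fixed precedence. The order used here (failure, then schema, then row count, then content) is an assumption, as is the availability of a generation-time column list, which the §3.3 generation block does not record. The name `classify_drift` is hypothetical.

```python
from typing import Optional, Sequence


def classify_drift(
    expected_rows: int,
    expected_checksum: str,
    expected_columns: Sequence[str],
    actual_rows: Optional[int],
    actual_checksum: Optional[str],
    actual_columns: Optional[Sequence[str]],
) -> Optional[str]:
    """Return a drift type from the table above, or None when nothing drifted.

    Pass None for the actual_* values when the query failed to execute.
    """
    if actual_rows is None or actual_checksum is None or actual_columns is None:
        return "query_failure"
    if list(actual_columns) != list(expected_columns):
        return "schema_change"
    if actual_rows > expected_rows:
        return "row_count_increase"
    if actual_rows < expected_rows:
        return "row_count_decrease"
    if actual_checksum != expected_checksum:
        return "content_change"
    return None
```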

5. TMS Integration

5.1 Manifold Reference to QPS

In a TMS manifold, the lineage block references QPS:

{
  "lineage": {
    "manifold_id": "mfld_abc123",
    "qps_id": "mfld_abc123",
    "reconstruction_available": true
  }
}

Or with explicit tool routing:

{
  "lineage": {
    "manifold_id": "mfld_abc123",
    "qps_id": "mfld_abc123",
    "reconstruction_available": true,
    "reconstruction_method": "mcp_tool",
    "tool_name": "qps_reconstruct",
    "tool_args_template": {
      "qps_id": "mfld_abc123"
    }
  }
}
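A consumer can turn either lineage form into a concrete tool call. In this sketch, explicit routing wins when present. Without it, the sketch falls back to `qps_reconstruct` keyed on `qps_id`; that default is an assumption, not something the spec mandates. Manifolds without a QPS reference, such as TMS 1.0 manifolds, yield `None`. The name `reconstruction_call` is hypothetical.

```python
from typing import Any, Optional


def reconstruction_call(lineage: dict[str, Any]) -> Optional[tuple[str, dict[str, Any]]]:
    """Return (tool_name, args) for a lineage block, or None if not reconstructable."""
    if not lineage.get("reconstruction_available") or "qps_id" not in lineage:
        return None
    if lineage.get("reconstruction_method") == "mcp_tool":
        # Explicit routing: use the named tool and copy its argument template.
        return lineage["tool_name"], dict(lineage.get("tool_args_template", {}))
    # Assumed default when no routing is given: the standard QPS tool.
    return "qps_reconstruct", {"qps_id": lineage["qps_id"]}
```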

5.2 MCP Tool Patterns

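# `mcp_tool` stands for your MCP server framework's tool-registration
# decorator; it is not defined by this spec. This no-op stub only lets
# the interface sketch below import and run.
def mcp_tool(fn):
    return fn


@mcp_tool
def qps_reconstruct(qps_id: str, log_execution: bool = True) -> dict:
    """
    Reconstruct telemetry from a QPS entry.

    Args:
        qps_id: The QPS entry identifier
        log_execution: Whether to append to the execution log (default: True)

    Returns:
        {
            "rows": [...],
            "drift_detected": bool,
            "drift_summary": {...} | null
        }
    """
    # 1. Look up the QPS entry by qps_id
    # 2. Execute the query (template + params, or the opaque tool call)
    # 3. Compare row count and checksum to the generation record
    # 4. Append to the execution log if log_execution is True
    # 5. Return rows + drift status
    raise NotImplementedError


@mcp_tool
def qps_check_drift(qps_id: str) -> dict:
    """
    Check whether a QPS entry would now return different data than at generation.
    Executes the query but does NOT append to the execution log (dry run).

    Returns:
        {
            "drift_detected": bool,
            "drift_type": str | null,
            "drift_delta": {...} | null
        }
    """
    raise NotImplementedError


@mcp_tool
def qps_get_entry(qps_id: str) -> dict:
    """
    Retrieve the full QPS entry, including query and execution history.
    For human debugging and audit.
    """
    raise NotImplementedError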

6. Complete Example

QPS Entry

{
  "qps_id": "mfld_price_P123456_S789_2025",
  "qps_version": "1.0",
  
  "query": {
    "dialect": "databricks_sql",
    "template": "SELECT ts, unit_price, po_id, notes FROM silver.price_events WHERE part_id = :part_id AND supplier_id = :supplier_id AND ts >= :start AND ts < :end ORDER BY ts",
    "params": {
      "part_id": "P-123456",
      "supplier_id": "S-789",
      "start": "2025-01-01",
      "end": "2026-01-01"
    },
    "param_types": {
      "part_id": "string",
      "supplier_id": "string",
      "start": "date",
      "end": "date"
    }
  },
  
  "generation": {
    "generated_at": "2026-01-14T22:30:00Z",
    "generated_by": "tms_generator_v1.2",
    "source_dataset": "silver.price_events",
    "source_dataset_version": "v5.2",
    "row_count": 517,
    "checksum": "sha256:a1b2c3d4e5f67890abcdef1234567890abcdef1234567890abcdef1234567890",
    "checksum_method": "row_content_hash"
  },
  
  "executions": [
    {
      "executed_at": "2026-01-15T10:00:00Z",
      "executed_by": "procurement_agent_v2",
      "execution_context": "price_anomaly_investigation",
      "row_count": 517,
      "checksum": "sha256:a1b2c3d4e5f67890abcdef1234567890abcdef1234567890abcdef1234567890",
      "drift_detected": false,
      "execution_time_ms": 234
    }
  ],
  
  "access_control": {
    "allowed_roles": ["procurement_analyst", "agent_service_account"],
    "requires_audit_log": true
  },
  
  "notes": "Price history for Acme Industrial Supply on stainless steel widgets. July 2025 shows expedite fee anomalies."
}

Corresponding TMS Manifold (lineage block only)

{
  "lineage": {
    "manifold_id": "mfld_price_P123456_S789_2025",
    "qps_id": "mfld_price_P123456_S789_2025",
    "computed_at": "2026-01-14T22:30:00Z",
    "computed_by": "tms_generator_v1.2",
    "reconstruction_available": true,
    "reconstruction_method": "mcp_tool",
    "tool_name": "qps_reconstruct",
    "tool_args_template": {
      "qps_id": "mfld_price_P123456_S789_2025"
    }
  }
}

7. Storage Considerations

QPS doesn't mandate a storage backend. Implementations could use:

Backend                         Tradeoffs
------------------------------  ---------------------------------------------------
PostgreSQL/MySQL                ACID, familiar, good for moderate scale
Document store (Mongo, Cosmos)  Flexible schema, easy JSON
Delta Lake / Iceberg            Co-located with data platform, time travel
Git repository                  Version control, human-readable, audit trail
Embedded in manifold            No external dependency, but loses execution logging

The key requirements are:

  1. Generation records are immutable
  2. Execution logs are append-only
  3. Queries are retrievable by qps_id
  4. Checksums can be verified
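With a relational backend, the first three requirements can be enforced in the database rather than in application code. This sketch uses SQLite (from Python's standard library) purely for illustration. The table and trigger names are hypothetical, and anyone with DDL rights can still drop a trigger, so this is a guard rather than a guarantee.

```python
import sqlite3

SCHEMA = """
CREATE TABLE qps_entry (
    qps_id          TEXT PRIMARY KEY,   -- requirement 3: retrievable by qps_id
    query_json      TEXT NOT NULL,
    generation_json TEXT NOT NULL
);
CREATE TABLE qps_execution (
    qps_id      TEXT NOT NULL REFERENCES qps_entry(qps_id),
    record_json TEXT NOT NULL
);
-- Requirement 1: generation records are immutable.
CREATE TRIGGER qps_entry_no_update BEFORE UPDATE ON qps_entry
BEGIN SELECT RAISE(ABORT, 'QPS generation records are immutable'); END;
CREATE TRIGGER qps_entry_no_delete BEFORE DELETE ON qps_entry
BEGIN SELECT RAISE(ABORT, 'QPS generation records are immutable'); END;
-- Requirement 2: execution logs are append-only (INSERT is the only write).
CREATE TRIGGER qps_execution_no_update BEFORE UPDATE ON qps_execution
BEGIN SELECT RAISE(ABORT, 'execution log is append-only'); END;
CREATE TRIGGER qps_execution_no_delete BEFORE DELETE ON qps_execution
BEGIN SELECT RAISE(ABORT, 'execution log is append-only'); END;
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a QPS store with the append-only triggers installed."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

Blocking DELETE outright conflicts with a `ttl` retention policy (§3.4); a real store would route expiry through a separate, privileged path.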

8. Security Notes

  • Query templates may contain sensitive schema information. Access to QPS entries should be controlled.
  • Execution logs reveal access patterns. Consider retention policies.
  • Parameterized queries only. Never store interpolated SQL — always template + params.
  • The mcp_tool dialect exists precisely for cases where query exposure is unacceptable.
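A sketch of enforcing the `access_control` block from the §6 example before any execution. Two policies here are assumptions: entries without `access_control` are denied (fail closed), and `requires_audit_log: true` is read as forbidding replays with `log_execution=False`. Under that reading, a non-logging `qps_check_drift` would also be refused for audited entries. The name `authorize` is hypothetical.

```python
from typing import Any


def authorize(entry: dict[str, Any], caller_roles: set[str], log_execution: bool) -> None:
    """Raise PermissionError unless this caller may execute the entry's query."""
    acl = entry.get("access_control")
    if acl is None:
        # Assumed fail-closed policy for entries with no access_control block.
        raise PermissionError("no access_control block; denying by default")
    if not caller_roles & set(acl.get("allowed_roles", [])):
        raise PermissionError("caller holds none of the allowed_roles")
    if acl.get("requires_audit_log") and not log_execution:
        # Assumed reading: audited entries must never replay without a log record.
        raise PermissionError("requires_audit_log is set; log_execution must be True")
```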

Relationship to TMS Versioning

TMS Version  QPS Support
-----------  ------------------------------------------------------------
TMS 1.0      No QPS reference (inline queries or no reconstruction)
TMS 1.1      Optional QPS reference via lineage block
TMS 1.2+     Recommended QPS reference for all reconstructable manifolds

QPS is backwards-compatible. Manifolds without QPS references continue to work; they just aren't reconstructable via the standard pattern.

Changelog

v1.0 (2026-01-15)

  • Initial specification
  • Core schema: query block, generation block, execution log
  • Drift detection framework
  • TMS integration pattern
  • MCP tool patterns

License

QPS is released under Apache 2.0, same as TMS.

QPS exists because debuggability shouldn't be an afterthought.

When an agent makes a decision, you should be able to see exactly what it saw — and whether reality has changed since.