Query Provenance Store
A companion spec to the Tabular Manifold Spec (TMS).
Where queries go to be remembered.
1. What Is QPS?
The Query Provenance Store is a registry that maintains the relationship between manifolds and the queries that generated them.
QPS is:
- Not a query engine (it doesn't execute queries)
- Not a cache (it doesn't store results)
- A provenance ledger — it records what query built what manifold, and tracks what happens when those queries are re-executed
TMS manifolds reference QPS entries. QPS entries contain the actual reconstruction logic.
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ TMS Manifold │────────▶│ QPS Entry │────────▶│ Data Platform │
│ (cognitive) │ ref │ (provenance) │ exec │ (source) │
└─────────────────┘ └─────────────────┘ └─────────────────┘
2. Core Principles
2.1 Separation of Concerns
| Layer | Responsibility |
|---|---|
| TMS Manifold | What the data shows (cognitive interface) |
| QPS Entry | How the data was produced (provenance) |
| Data Platform | Where the data lives (execution) |
Manifolds stay clean and portable. Provenance stays auditable. Execution stays flexible.
2.2 Immutable Generation, Mutable Execution History
A QPS entry has two parts:
- Generation record: Immutable. Captures exactly what query produced the manifold.
- Execution log: Append-only. Tracks every replay and whether drift occurred.
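One way to picture the two parts is a frozen record plus a log that only grows. The sketch below is illustrative, not part of the spec: the class names `GenerationRecord` and `QpsEntry` and the `append_execution` method are hypothetical, and only a subset of the generation fields from §3.3 is modeled.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class GenerationRecord:
    """Immutable part: assigning to any field raises FrozenInstanceError."""
    generated_at: str
    generated_by: str
    row_count: int
    checksum: str
    checksum_method: str


@dataclass
class QpsEntry:
    """Hypothetical in-memory entry: frozen generation record, growing log."""
    qps_id: str
    generation: GenerationRecord
    _executions: list[dict[str, Any]] = field(default_factory=list)

    def append_execution(self, execution: dict[str, Any]) -> None:
        # Store a copy so later changes to the caller's dict do not alter the log.
        self._executions.append(dict(execution))

    @property
    def executions(self) -> tuple[dict[str, Any], ...]:
        # Expose a tuple so callers cannot remove or reorder log entries.
        return tuple(self._executions)
```

Freezing the dataclass guards against accidental mutation in process; a real deployment would still need the storage layer to enforce the same rule.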
2.3 Drift Detection as a First-Class Concept
When a reconstruction query returns different data than at generation time, that's drift. QPS doesn't prevent drift — it detects and records it.
3. QPS Entry Schema
3.1 Required Fields
| Field | Type | Description |
|---|---|---|
| qps_id | string | Unique identifier (should match manifold_id for 1:1 cases) |
| qps_version | string | Spec version (e.g., "1.0") |
| query | object | The reconstruction query (see §3.2) |
| generation | object | Metadata about when/how the manifold was built (see §3.3) |
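A loader might check the required fields before trusting an entry. The following is a minimal sketch; the function name `validate_entry` and the error message wording are assumptions, and it checks only presence and top-level type, not the contents of the `query` or `generation` blocks.

```python
# Required top-level fields from the table above, mapped to the Python
# type that a parsed JSON value of that kind would have.
REQUIRED_FIELDS = {
    "qps_id": str,
    "qps_version": str,
    "query": dict,
    "generation": dict,
}


def validate_entry(entry: dict) -> list[str]:
    """Return a list of problems; an empty list means the required fields look fine."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in entry:
            problems.append(f"missing required field: {name}")
        elif not isinstance(entry[name], expected):
            actual = type(entry[name]).__name__
            problems.append(f"{name} should be {expected.__name__}, got {actual}")
    return problems
```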
3.2 Query Block
{
"query": {
"dialect": "databricks_sql",
"template": "SELECT ts, unit_price, po_id, notes FROM silver.price_events WHERE part_id = :part_id AND supplier_id = :supplier_id AND ts >= :start AND ts < :end ORDER BY ts",
"params": {
"part_id": "P-123456",
"supplier_id": "S-789",
"start": "2025-01-01",
"end": "2026-01-01"
},
"param_types": {
"part_id": "string",
"supplier_id": "string",
"start": "date",
"end": "date"
}
}
}
Supported Dialects
| Dialect | Description |
|---|---|
| databricks_sql | Databricks SQL warehouse |
| snowflake | Snowflake SQL |
| bigquery | Google BigQuery Standard SQL |
| postgres | PostgreSQL |
| duckdb | DuckDB |
| mcp_tool | Opaque MCP tool call (see §3.2.1) |
3.2.1 Opaque Tool References
Use an opaque tool reference when the query shouldn't be exposed (for security, complexity, or abstraction reasons):
{
"query": {
"dialect": "mcp_tool",
"tool_name": "get_price_telemetry",
"tool_args": {
"manifold_id": "mfld_abc123"
}
}
}
The tool implementation handles actual query execution internally.
3.3 Generation Block
{
"generation": {
"generated_at": "2026-01-14T22:30:00Z",
"generated_by": "tms_generator_v1.2",
"source_dataset": "silver.price_events",
"source_dataset_version": "v5.2",
"row_count": 517,
"checksum": "sha256:a1b2c3d4e5f6...",
"checksum_method": "row_content_hash"
}
}
Checksum Methods
| Method | Description |
|---|---|
| row_content_hash | SHA-256 of sorted, serialized row content |
| row_count_only | Row count only (cheap, but misses any change that leaves the count unchanged) |
| column_stats_hash | Hash of min/max/sum per column |
| none | No checksum computed |
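The three non-trivial methods could be computed roughly as follows. This is a sketch under stated assumptions: the spec does not define a canonical row serialization, so JSON with sorted keys is assumed here; the `sha256:` prefix follows the examples in this document, while the `count:` prefix and all function names are hypothetical.

```python
import hashlib
import json
import math


def _canonical(row: dict) -> str:
    # Assumed canonical form: JSON with sorted keys. Values JSON cannot
    # encode directly (dates, decimals) are serialized with str().
    return json.dumps(row, sort_keys=True, default=str)


def row_content_hash(rows: list[dict]) -> str:
    """SHA-256 over the sorted serialized rows, so row order does not matter."""
    digest = hashlib.sha256()
    for line in sorted(_canonical(row) for row in rows):
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")  # separator so adjacent rows cannot run together
    return "sha256:" + digest.hexdigest()


def row_count_only(rows: list[dict]) -> str:
    """Weak fingerprint: the row count and nothing else."""
    return f"count:{len(rows)}"


def column_stats_hash(rows: list[dict], numeric_columns: list[str]) -> str:
    """Hash of min/max/sum for each listed numeric column."""
    stats = {}
    for column in sorted(numeric_columns):
        values = [row[column] for row in rows if row.get(column) is not None]
        stats[column] = {
            "min": min(values, default=None),
            "max": max(values, default=None),
            # fsum keeps the float sum independent of row order.
            "sum": math.fsum(values),
        }
    payload = json.dumps(stats, sort_keys=True)
    return "sha256:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Whatever serialization is chosen, generation and every later execution need to use the same one, otherwise every replay will look like drift.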
3.4 Optional Fields
| Field | Type | Description |
|---|---|---|
| executions | array | Log of reconstruction attempts (see §4) |
| access_control | object | Who can execute this query |
| ttl | object | Retention policy for this entry |
| related_manifolds | array | Other manifolds built from the same query |
| notes | string | Human-readable context |
4. Execution Log Schema
Each time a reconstruction query is executed, an entry is appended:
{
"executions": [
{
"executed_at": "2026-01-15T10:00:00Z",
"executed_by": "agent_session_xyz",
"execution_context": "manifold_drilldown",
"row_count": 517,
"checksum": "sha256:a1b2c3d4e5f6...",
"drift_detected": false,
"execution_time_ms": 234
},
{
"executed_at": "2026-01-16T14:30:00Z",
"executed_by": "human_debug_session",
"execution_context": "manual_audit",
"row_count": 523,
"checksum": "sha256:d4e5f67890ab...",
"drift_detected": true,
"drift_type": "row_count_increase",
"drift_delta": {
"row_count_expected": 517,
"row_count_actual": 523,
"rows_added": 6,
"rows_removed": 0
},
"drift_note": "Late-arriving POs from batch reconciliation",
"execution_time_ms": 287
}
]
}
Drift Types
| Type | Description |
|---|---|
| row_count_increase | More rows than at generation |
| row_count_decrease | Fewer rows than at generation |
| content_change | Same row count, different content |
| schema_change | Column structure changed |
| query_failure | Query no longer executes |
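A replay can derive the first three drift types by comparing its row count and checksum against the generation block. The sketch below assumes hypothetical helper names (`classify_drift`, `execution_record`). It leaves out `schema_change` and `query_failure`, which need information beyond counts and checksums, and it omits `rows_added` and `rows_removed`, which would require a row-level diff.

```python
from datetime import datetime, timezone
from typing import Optional


def classify_drift(generation: dict, row_count: int, checksum: str) -> Optional[str]:
    """Return a drift type from the table above, or None when nothing changed."""
    expected = generation["row_count"]
    if row_count > expected:
        return "row_count_increase"
    if row_count < expected:
        return "row_count_decrease"
    if checksum != generation["checksum"]:
        return "content_change"
    return None


def execution_record(generation: dict, row_count: int, checksum: str,
                     executed_by: str, execution_context: str,
                     executed_at: Optional[str] = None) -> dict:
    """Build one execution log entry shaped like the examples in this section."""
    drift_type = classify_drift(generation, row_count, checksum)
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    record = {
        "executed_at": executed_at or now,
        "executed_by": executed_by,
        "execution_context": execution_context,
        "row_count": row_count,
        "checksum": checksum,
        "drift_detected": drift_type is not None,
    }
    if drift_type is not None:
        record["drift_type"] = drift_type
        record["drift_delta"] = {
            "row_count_expected": generation["row_count"],
            "row_count_actual": row_count,
        }
    return record
```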
5. TMS Integration
5.1 Manifold Reference to QPS
In a TMS manifold, the lineage block references QPS:
{
"lineage": {
"manifold_id": "mfld_abc123",
"qps_id": "mfld_abc123",
"reconstruction_available": true
}
}
Or with explicit tool routing:
{
"lineage": {
"manifold_id": "mfld_abc123",
"qps_id": "mfld_abc123",
"reconstruction_available": true,
"reconstruction_method": "mcp_tool",
"tool_name": "qps_reconstruct",
"tool_args_template": {
"qps_id": "mfld_abc123"
}
}
}
5.2 MCP Tool Patterns
@mcp_tool
def qps_reconstruct(qps_id: str, log_execution: bool = True) -> dict:
    """
    Reconstruct telemetry from a QPS entry.

    Args:
        qps_id: The QPS entry identifier
        log_execution: Whether to append to execution log (default: True)

    Returns:
        {
            "rows": [...],
            "drift_detected": bool,
            "drift_summary": {...} | null
        }
    """
    # 1. Look up QPS entry
    # 2. Execute query
    # 3. Compare checksum to generation
    # 4. Log execution if requested
    # 5. Return rows + drift status

@mcp_tool
def qps_check_drift(qps_id: str) -> dict:
    """
    Check if a QPS entry would return different data than at generation.
    Does NOT log an execution (dry run).

    Returns:
        {
            "drift_detected": bool,
            "drift_type": str | null,
            "drift_delta": {...} | null
        }
    """

@mcp_tool
def qps_get_entry(qps_id: str) -> dict:
    """
    Retrieve the full QPS entry including query and execution history.
    For human debugging and audit.
    """
6. Complete Example
QPS Entry
{
"qps_id": "mfld_price_P123456_S789_2025",
"qps_version": "1.0",
"query": {
"dialect": "databricks_sql",
"template": "SELECT ts, unit_price, po_id, notes FROM silver.price_events WHERE part_id = :part_id AND supplier_id = :supplier_id AND ts >= :start AND ts < :end ORDER BY ts",
"params": {
"part_id": "P-123456",
"supplier_id": "S-789",
"start": "2025-01-01",
"end": "2026-01-01"
},
"param_types": {
"part_id": "string",
"supplier_id": "string",
"start": "date",
"end": "date"
}
},
"generation": {
"generated_at": "2026-01-14T22:30:00Z",
"generated_by": "tms_generator_v1.2",
"source_dataset": "silver.price_events",
"source_dataset_version": "v5.2",
"row_count": 517,
"checksum": "sha256:a1b2c3d4e5f67890abcdef1234567890abcdef1234567890abcdef1234567890",
"checksum_method": "row_content_hash"
},
"executions": [
{
"executed_at": "2026-01-15T10:00:00Z",
"executed_by": "procurement_agent_v2",
"execution_context": "price_anomaly_investigation",
"row_count": 517,
"checksum": "sha256:a1b2c3d4e5f67890abcdef1234567890abcdef1234567890abcdef1234567890",
"drift_detected": false,
"execution_time_ms": 234
}
],
"access_control": {
"allowed_roles": ["procurement_analyst", "agent_service_account"],
"requires_audit_log": true
},
"notes": "Price history for Acme Industrial Supply on stainless steel widgets. July 2025 shows expedite fee anomalies."
}
Corresponding TMS Manifold (lineage block only)
{
"lineage": {
"manifold_id": "mfld_price_P123456_S789_2025",
"qps_id": "mfld_price_P123456_S789_2025",
"computed_at": "2026-01-14T22:30:00Z",
"computed_by": "tms_generator_v1.2",
"reconstruction_available": true,
"reconstruction_method": "mcp_tool",
"tool_name": "qps_reconstruct",
"tool_args_template": {
"qps_id": "mfld_price_P123456_S789_2025"
}
}
}
7. Storage Considerations
QPS doesn't mandate a storage backend. Implementations could use:
| Backend | Tradeoffs |
|---|---|
| PostgreSQL/MySQL | ACID, familiar, good for moderate scale |
| Document store (Mongo, Cosmos) | Flexible schema, easy JSON |
| Delta Lake / Iceberg | Co-located with data platform, time travel |
| Git repository | Version control, human-readable, audit trail |
| Embedded in manifold | No external dependency, but loses execution logging |
The key requirements are:
- Generation records are immutable
- Execution logs are append-only
- Queries are retrievable by qps_id
- Checksums can be verified
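To illustrate how a backend might enforce these requirements, here is a small sketch using SQLite as a stand-in for the relational option. SQLite is not one of the backends listed above; the table layout, trigger names, and function names are all hypothetical. Triggers reject updates to stored entries and any update or delete on the execution log.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS entries (
    qps_id TEXT PRIMARY KEY,
    body   TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS executions (
    id     INTEGER PRIMARY KEY AUTOINCREMENT,
    qps_id TEXT NOT NULL,
    body   TEXT NOT NULL
);
CREATE TRIGGER IF NOT EXISTS entries_no_update BEFORE UPDATE ON entries
BEGIN SELECT RAISE(ABORT, 'generation records are immutable'); END;
CREATE TRIGGER IF NOT EXISTS executions_no_update BEFORE UPDATE ON executions
BEGIN SELECT RAISE(ABORT, 'execution log is append-only'); END;
CREATE TRIGGER IF NOT EXISTS executions_no_delete BEFORE DELETE ON executions
BEGIN SELECT RAISE(ABORT, 'execution log is append-only'); END;
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def put_entry(conn: sqlite3.Connection, entry: dict) -> None:
    # Plain INSERT: a second write for the same qps_id fails on the primary key.
    body = json.dumps({"query": entry["query"], "generation": entry["generation"]})
    with conn:
        conn.execute("INSERT INTO entries (qps_id, body) VALUES (?, ?)",
                     (entry["qps_id"], body))


def append_execution(conn: sqlite3.Connection, qps_id: str, record: dict) -> None:
    with conn:
        conn.execute("INSERT INTO executions (qps_id, body) VALUES (?, ?)",
                     (qps_id, json.dumps(record)))


def get_entry(conn: sqlite3.Connection, qps_id: str) -> dict:
    row = conn.execute("SELECT body FROM entries WHERE qps_id = ?",
                       (qps_id,)).fetchone()
    if row is None:
        raise KeyError(qps_id)
    entry = {"qps_id": qps_id, **json.loads(row[0])}
    log = conn.execute("SELECT body FROM executions WHERE qps_id = ? ORDER BY id",
                       (qps_id,))
    entry["executions"] = [json.loads(r[0]) for r in log]
    return entry
```

Because `get_entry` returns the stored generation block, a caller can recompute the checksum from fresh rows and compare it with `generation["checksum"]`.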
8. Security Notes
- Query templates may contain sensitive schema information. Access to QPS entries should be controlled.
- Execution logs reveal access patterns. Consider retention policies.
- Parameterized queries only. Never store interpolated SQL — always template + params.
- The mcp_tool dialect exists precisely for cases where query exposure is unacceptable.
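The template-plus-params rule can be shown with a short sketch. It uses Python's built-in `sqlite3` driver as a stand-in, since it accepts the same `:name` placeholder style as the templates in this spec; SQLite is not one of the supported dialects, and the names `COERCERS`, `bind_params`, and `run_query` are hypothetical. Values are validated against `param_types` and handed to the driver separately from the SQL text, never spliced into it.

```python
import sqlite3
from datetime import date

# Assumed coercion rules for the two param_types used in this document.
COERCERS = {
    "string": str,
    # Validate the ISO date, then pass it on as ISO text.
    "date": lambda value: date.fromisoformat(value).isoformat(),
}


def bind_params(query: dict) -> dict:
    """Coerce each stored param according to its declared type."""
    bound = {}
    for name, raw in query["params"].items():
        type_name = query["param_types"][name]
        bound[name] = COERCERS[type_name](raw)
    return bound


def run_query(conn: sqlite3.Connection, query: dict) -> list:
    # The template is sent unchanged; the driver binds values by name.
    return conn.execute(query["template"], bind_params(query)).fetchall()
```

A value such as `x' OR '1'='1` is then compared as a literal string and matches nothing, instead of changing the query's meaning.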
Relationship to TMS Versioning
| TMS Version | QPS Support |
|---|---|
| TMS 1.0 | No QPS reference (inline queries or no reconstruction) |
| TMS 1.1 | Optional QPS reference via lineage block |
| TMS 1.2+ | Recommended QPS reference for all reconstructable manifolds |
QPS is backwards-compatible. Manifolds without QPS references continue to work; they just aren't reconstructable via the standard pattern.
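A client that has to handle manifolds from all three TMS versions might resolve reconstruction along these lines. This is only a sketch: the function name `resolve_reconstruction` is hypothetical, and falling back to `qps_reconstruct` when no explicit tool routing is given is an assumption, not something the spec states.

```python
from typing import Optional


def resolve_reconstruction(manifold: dict) -> Optional[dict]:
    """Return a tool call for reconstructing the manifold, or None if it has none."""
    lineage = manifold.get("lineage") or {}
    qps_id = lineage.get("qps_id")
    # Manifolds without a QPS reference (for example TMS 1.0) are still valid;
    # they simply cannot be reconstructed this way.
    if not qps_id or not lineage.get("reconstruction_available", False):
        return None
    if lineage.get("reconstruction_method") == "mcp_tool":
        return {
            "tool_name": lineage["tool_name"],
            "tool_args": dict(lineage.get("tool_args_template", {})),
        }
    # Assumed default when the lineage block gives no explicit routing.
    return {"tool_name": "qps_reconstruct", "tool_args": {"qps_id": qps_id}}
```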
Changelog
v1.0 (2026-01-15)
- Initial specification
- Core schema: query block, generation block, execution log
- Drift detection framework
- TMS integration pattern
- MCP tool patterns
License
QPS is released under Apache 2.0, same as TMS.
QPS exists because debuggability shouldn't be an afterthought.
When an agent makes a decision, you should be able to see exactly what it saw — and whether reality has changed since.