Tabular Manifold Spec
A cognitive transmission format for AI agents.
Feature engineering as an interface contract.
1. What Is TMS?
The Tabular Manifold Spec defines a structured, multi-resolution data format optimized for consumption by AI agents and LLMs.
TMS is:
- Not a storage format (use Parquet/Delta for that)
- Not a visualization format (use dashboards for humans)
- A cognitive transmission format — the interface layer between data pipelines and agent reasoning
TMS manifolds sit alongside dashboards, not instead of them:
               ┌─────────────────┐
               │    Data Lake    │
               │ (Parquet/Delta) │
               └────────┬────────┘
                        │
         ┌──────────────┴──────────────┐
         │                             │
         ▼                             ▼
┌─────────────────┐           ┌─────────────────┐
│   Dashboards    │           │  TMS Manifolds  │
│  (for humans)   │           │  (for agents)   │
└─────────────────┘           └─────────────────┘
2. Core Principles
2.1 Columnar Encoding
Keys appear once. Values are dense arrays aligned to column order.
{
"format": "columnar_json_v1",
"schema": {
"columns": [
{ "name": "period", "type": "string" },
{ "name": "value", "type": "double" }
]
},
"rows": [
["2025-01", 10.5],
["2025-02", 11.2]
]
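}
Why: Eliminates key repetition. Row-oriented JSON for a 1000-row table with 10 columns repeats its keys 10,000 times; columnar encoding states each of the 10 keys once, saving about 9,990 key occurrences.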
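A minimal sketch of the conversion between row-oriented records and the columnar layout shown above. The helper names `to_columnar` and `from_columnar` are illustrative assumptions, not part of the spec.

```python
from typing import Any


def to_columnar(records: list[dict[str, Any]], columns: list[dict[str, str]]) -> dict[str, Any]:
    """Pack row-oriented records into the columnar_json_v1 layout."""
    names = [col["name"] for col in columns]
    return {
        "format": "columnar_json_v1",
        "schema": {"columns": columns},
        # Each row is a dense array aligned to the column order; keys appear only in the schema.
        "rows": [[record.get(name) for name in names] for record in records],
    }


def from_columnar(table: dict[str, Any]) -> list[dict[str, Any]]:
    """Expand a columnar table back into one dict per row."""
    names = [col["name"] for col in table["schema"]["columns"]]
    return [dict(zip(names, row)) for row in table["rows"]]


records = [
    {"period": "2025-01", "value": 10.5},
    {"period": "2025-02", "value": 11.2},
]
columns = [
    {"name": "period", "type": "string"},
    {"name": "value", "type": "double"},
]
table = to_columnar(records, columns)
```

Round-tripping through `from_columnar(table)` recovers the original records, which is a cheap sanity check for a generator.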
2.2 Progressive Disclosure (Three Levels)
| Level | Name | Purpose | Token Cost |
|---|---|---|---|
| Level 0 | Summary | Instant situational awareness | 200-500 tokens |
| Level 1 | Geometry | Aggregated structure/trends | 500-2000 tokens |
| Level 2 | Telemetry | Raw evidence for forensics | 2000-50000+ tokens |
Agents start at Level 0. They drill down only when anomalies or quality flags demand it.
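The drill-down rule can be sketched as a small decision function. This is one possible reading of the rule, not a normative algorithm; the function name `next_level` and the `investigating_anomaly` parameter are hypothetical.

```python
def next_level(level_0_summary: dict, investigating_anomaly: bool = False) -> int:
    """Return the deepest level an agent should load, given a Level 0 summary."""
    flags = level_0_summary.get("quality_flags", {})
    if investigating_anomaly:
        # Raw evidence is only worth its token cost during a specific investigation.
        return 2
    if any(flags.values()):
        # At least one quality flag is set, so look at the aggregated structure.
        return 1
    # Nothing demands attention; stay at the summary.
    return 0
```

With this reading, a summary whose flags are all false never triggers a deeper load unless the caller explicitly asks for forensics.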
2.3 Self-Describing
Every manifold contains enough metadata that an agent can interpret it without external documentation:
- Column types and descriptions
- Quality flags and reliability scores
- Interpretation hints in natural language
3. Manifold Envelope Schema
{
"artifact_type": "tabular_manifold",
"artifact_version": "1.1",
"manifold_kind": "<canonical_kind>",
"subject": { },
"time_window": { },
"level_0_summary": { },
"level_1_geometry": { },
"level_2_telemetry": { },
"token_budget": { },
"lineage": { }
}
3.1 Required Fields
| Field | Type | Description |
|---|---|---|
| artifact_type | string | Always "tabular_manifold" |
| artifact_version | string | Spec version (e.g., "1.1") |
| manifold_kind | enum | One of the canonical kinds (see §4) |
| subject | object | What this manifold describes |
| level_0_summary | object | Required. The cheap cognitive entry point. |
3.2 Optional Fields
| Field | Type | Description |
|---|---|---|
| time_window | object | For time-based manifolds |
| level_1_geometry | object | Aggregated data in columnar format |
| level_2_telemetry | object | Raw data in columnar format |
| token_budget | object | Hints for agent token management |
| lineage | object | Provenance metadata |
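As a rough illustration, a producer could check an envelope against the two tables above before emitting it. The helper names below are assumptions made for this sketch.

```python
REQUIRED_FIELDS = (
    "artifact_type",
    "artifact_version",
    "manifold_kind",
    "subject",
    "level_0_summary",
)
OPTIONAL_FIELDS = (
    "time_window",
    "level_1_geometry",
    "level_2_telemetry",
    "token_budget",
    "lineage",
)


def missing_required_fields(manifold: dict) -> list[str]:
    """List required envelope fields that are absent, in table order."""
    return [field for field in REQUIRED_FIELDS if field not in manifold]


def unknown_fields(manifold: dict) -> list[str]:
    """List top-level keys that appear in neither table."""
    known = set(REQUIRED_FIELDS) | set(OPTIONAL_FIELDS)
    return [field for field in manifold if field not in known]
```

Whether unknown top-level keys should be rejected or tolerated is not stated in the tables, so `unknown_fields` only reports them.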
4. Canonical Manifold Kinds
TMS defines five canonical manifold kinds. Each has a fixed Level 0 schema with optional extensions.
timeseries_metric
For any metric observed over time (prices, counts, rates, etc.)
Level 0 required fields:
{
"observation_count": 150,
"time_coverage": {
"expected_periods": 12,
"observed_periods": 10,
"coverage_ratio": 0.833
},
"distribution": {
"min": 9.85,
"max": 18.40,
"mean": 12.62,
"median": 10.90,
"stddev": 3.44,
"cv": 0.273
},
"reliability": { },
"quality_flags": { },
"interpretation_hints": []
}
funnel_conversion
For sequential stage-based processes (sales funnels, onboarding flows, etc.)
Level 0 required fields:
{
"stage_count": 5,
"total_entered": 10000,
"total_converted": 342,
"overall_conversion_rate": 0.0342,
"bottleneck_stage": "checkout",
"bottleneck_drop_rate": 0.67,
"reliability": { },
"quality_flags": { },
"interpretation_hints": []
}
cohort_behavior
For tracking groups over time (user cohorts, customer segments, etc.)
Level 0 required fields:
{
"cohort_count": 12,
"total_subjects": 5000,
"observation_periods": 6,
"retention_summary": {
"period_1": 0.85,
"period_3": 0.62,
"period_6": 0.41
},
"reliability": { },
"quality_flags": { },
"interpretation_hints": []
}
inventory_snapshot
For point-in-time inventory or resource states
Level 0 required fields:
{
"snapshot_timestamp": "2025-01-14T00:00:00Z",
"total_skus": 1500,
"total_units": 125000,
"total_value": 2500000.00,
"stockout_skus": 45,
"overstock_skus": 120,
"reliability": { },
"quality_flags": { },
"interpretation_hints": []
}
anomaly_detection
For systems monitoring and alerting contexts
Level 0 required fields:
{
"detection_window": {
"start": "2025-01-01T00:00:00Z",
"end": "2025-01-14T00:00:00Z"
},
"anomaly_count": 3,
"severity_distribution": {
"critical": 1,
"warning": 2,
"info": 0
},
"top_anomaly": {
"timestamp": "2025-01-10T14:30:00Z",
"metric": "cpu_usage",
"observed": 98.5,
"expected_range": [20, 60],
"severity": "critical"
},
"reliability": { },
"quality_flags": { },
"interpretation_hints": []
}
5. Reliability Block (Required in Level 0)
Every Level 0 must include a reliability block that quantifies confidence in the summary statistics.
"reliability": {
"sample_size_class": "sparse|adequate|robust",
"sample_size_n": 5,
"sample_size_threshold_adequate": 30,
"sample_size_threshold_robust": 100,
"confidence_in_mean": {
"level": 0.95,
"margin_of_error": 2.4,
"interval": [10.22, 15.02]
},
"data_quality_score": 0.85,
"data_quality_notes": "3% of rows had imputed values",
"staleness": {
"last_observation": "2025-11-21T00:00:00Z",
"days_since_last": 54,
"is_stale": true,
"stale_threshold_days": 30
}
}
Sample Size Classes
| Class | Criteria | Implication |
|---|---|---|
| sparse | n < 30 | Summary stats are unstable. Treat with caution. |
| adequate | 30 ≤ n < 100 | Stats are reasonable but not rock-solid. |
| robust | n ≥ 100 | High confidence in summary statistics. |
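A sketch of how a generator might fill in two parts of the reliability block: the sample size class and the staleness sub-block. The function names are hypothetical, and treating "stale" as strictly more than the threshold is an assumption, since the spec does not say how equality is handled.

```python
from datetime import datetime


def sample_size_class(n: int, adequate: int = 30, robust: int = 100) -> str:
    """Map an observation count to sparse / adequate / robust."""
    if n < adequate:
        return "sparse"
    if n < robust:
        return "adequate"
    return "robust"


def parse_utc(timestamp: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def staleness(last_observation: str, as_of: str, stale_threshold_days: int = 30) -> dict:
    """Build the staleness sub-block relative to a reference time."""
    days = (parse_utc(as_of) - parse_utc(last_observation)).days
    return {
        "last_observation": last_observation,
        "days_since_last": days,
        # Assumption: stale means strictly more days than the threshold.
        "is_stale": days > stale_threshold_days,
        "stale_threshold_days": stale_threshold_days,
    }
```

For a last observation on 2025-11-21 evaluated as of 2026-01-14, this yields 54 days and `is_stale: true`, matching the example block.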
6. Quality Flags (Required in Level 0)
Standardized boolean flags that trigger agent attention:
"quality_flags": {
"low_sample_size": true,
"missing_periods": true,
"suspected_outliers": true,
"data_staleness": true,
"high_variance": false,
"imputation_applied": false,
"schema_drift_detected": false
}
| Flag | Trigger Condition |
|---|---|
| low_sample_size | reliability.sample_size_class == "sparse" |
| missing_periods | time_coverage.coverage_ratio < 0.8 |
| suspected_outliers | Any value > 3σ from mean, or IQR-based detection |
| data_staleness | reliability.staleness.is_stale == true |
| high_variance | distribution.cv > 0.3 |
| imputation_applied | Any values were filled/estimated |
| schema_drift_detected | Column types or names changed from baseline |
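Four of these flags can be derived from Level 0 fields alone. The sketch below covers only those four; the remaining flags need access to raw values or a baseline schema and are left out. The function name `derive_quality_flags` and the defaults used for absent fields are assumptions.

```python
def derive_quality_flags(summary: dict) -> dict:
    """Compute the quality flags that depend only on Level 0 fields."""
    reliability = summary.get("reliability", {})
    coverage = summary.get("time_coverage", {})
    distribution = summary.get("distribution", {})
    return {
        "low_sample_size": reliability.get("sample_size_class") == "sparse",
        # Absent coverage is treated as complete (assumption).
        "missing_periods": coverage.get("coverage_ratio", 1.0) < 0.8,
        "data_staleness": bool(reliability.get("staleness", {}).get("is_stale", False)),
        # Absent cv is treated as zero variance (assumption).
        "high_variance": distribution.get("cv", 0.0) > 0.3,
    }
```

Keeping the thresholds next to the trigger table makes it easy to spot drift between the generator and the spec.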
7. Token Budget Block
Helps agents decide whether to load deeper levels:
"token_budget": {
"level_0_tokens_approx": 450,
"level_1_tokens_approx": 1200,
"level_2_tokens_approx": 8500,
"level_2_row_count": 517,
"level_2_inline_row_limit": 50,
"level_2_inline_strategy": "preview_outliers",
"compression_ratios": {
"level_1_vs_level_2": 7.1,
"level_0_vs_level_2": 18.9
},
"recommended_strategy": "Load Level 0 first. If quality_flags has any true values, load Level 1. Only load Level 2 if investigating specific anomalies."
}
Level 2 Inline Strategies
| Strategy | Behavior |
|---|---|
| preview_outliers | Inline only rows flagged as outliers or anomalies |
| preview_recent | Inline only the N most recent rows |
| preview_sample | Inline a random sample of N rows |
| full_inline | Inline all rows (use only if row_count ≤ level_2_inline_row_limit) |
| none | No rows inlined; agent must use retrieval |
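One way a generator could apply these strategies to a list of Level 2 rows is sketched here. The function `select_inline_rows`, its parameters, and the caller-supplied `is_outlier` predicate are all hypothetical; the spec does not prescribe how outliers are identified or how samples are drawn.

```python
import random
from typing import Callable, Optional

STRATEGIES = {"preview_outliers", "preview_recent", "preview_sample", "full_inline", "none"}


def select_inline_rows(
    rows: list[list],
    strategy: str,
    limit: int,
    ts_index: int = 0,
    is_outlier: Optional[Callable[[list], bool]] = None,
    seed: Optional[int] = None,
) -> list[list]:
    """Pick the rows to inline for a given strategy and row limit."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")
    if strategy == "none" or limit <= 0:
        return []
    if strategy == "full_inline":
        # A count equal to the limit is accepted, in line with the 50-row guidance in section 9.
        if len(rows) > limit:
            raise ValueError("full_inline requires the row count to be within the limit")
        return list(rows)
    if strategy == "preview_recent":
        # ISO 8601 UTC timestamps sort chronologically as plain strings.
        return sorted(rows, key=lambda row: row[ts_index])[-limit:]
    if strategy == "preview_sample":
        return random.Random(seed).sample(rows, min(limit, len(rows)))
    # Remaining case: preview_outliers.
    if is_outlier is None:
        raise ValueError("preview_outliers needs an is_outlier predicate")
    return [row for row in rows if is_outlier(row)][:limit]
```

Passing a fixed `seed` makes `preview_sample` reproducible, which helps when the same manifold is regenerated for auditing.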
8. Level 1 Geometry Schema
Level 1 uses columnar encoding for aggregated data.
"level_1_geometry": {
"format": "columnar_json_v1",
"granularity": "month",
"schema": {
"columns": [
{ "name": "period", "type": "string", "description": "Aggregation bucket (YYYY-MM)" },
{ "name": "n", "type": "integer", "description": "Observation count in period" },
{ "name": "min", "type": "double", "description": "Minimum value in period" },
{ "name": "max", "type": "double", "description": "Maximum value in period" },
{ "name": "mean", "type": "double", "description": "Mean value in period" },
{ "name": "median", "type": "double", "description": "Median value in period" },
{ "name": "stddev", "type": "double", "description": "Standard deviation", "nullable": true },
{ "name": "flag", "type": "string", "description": "Optional anomaly flag", "nullable": true }
],
"primary_sort": ["period"]
},
"rows": [
["2025-01", 15, 9.80, 11.20, 10.45, 10.40, 0.35, null],
["2025-02", 12, 10.10, 11.50, 10.80, 10.75, 0.42, null],
["2025-07", 3, 17.50, 18.90, 18.20, 18.40, 0.70, "spike_detected"]
],
"missing_periods": ["2025-03", "2025-04", "2025-05", "2025-06"]
}
9. Level 2 Telemetry Schema
Level 2 contains raw observations. For large datasets, use preview + retrieval.
"level_2_telemetry": {
"format": "columnar_json_v1",
"schema": {
"columns": [
{ "name": "ts", "type": "timestamp", "description": "Observation timestamp" },
{ "name": "value", "type": "double", "description": "Observed value" },
{ "name": "document_id", "type": "string", "description": "Source document reference", "nullable": true },
{ "name": "notes", "type": "string", "description": "Human or ETL notes", "nullable": true }
],
"primary_sort": ["ts"]
},
"row_count_total": 517,
"inline_rows": {
"strategy": "preview_outliers",
"rows": [
["2025-07-09T00:00:00Z", 18.40, "INV-12002", "Expedite fee applied"],
["2025-07-15T00:00:00Z", 17.90, "INV-12015", "Small lot surcharge"]
]
},
"retrieval": {
"method": "mcp_tool",
"tool_name": "get_timeseries_telemetry",
"tool_args": {
"manifold_id": "mfld_abc123",
"level": 2,
"filters": {}
},
"pagination": {
"default_page_size": 100,
"max_page_size": 500
}
}
}
When to Inline vs. Retrieve
| Row Count | Recommendation |
|---|---|
| ≤ 50 | Full inline ("strategy": "full_inline") |
| 51-500 | Preview inline + retrieval available |
| > 500 | Preview inline only; retrieval required for full data |
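The thresholds in this table translate into a small lookup. The function name `inline_plan` and the labels it returns are illustrative assumptions; only the 50 and 500 cut-offs come from the table.

```python
def inline_plan(row_count: int, inline_limit: int = 50, optional_retrieval_max: int = 500) -> dict:
    """Recommend how to expose Level 2 rows for a given total row count."""
    if row_count <= inline_limit:
        # Small enough to ship everything inline.
        return {"inline": "full", "retrieval": "not_needed"}
    if row_count <= optional_retrieval_max:
        # Inline a preview; retrieval is offered for the rest.
        return {"inline": "preview", "retrieval": "available"}
    # Too large to inline meaningfully; full data only via retrieval.
    return {"inline": "preview", "retrieval": "required"}
```

For the 517-row example above, this returns a preview with retrieval required.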
10. Interpretation Hints
Natural language guidance for agents, always an array of strings:
"interpretation_hints": [
"Sparse series: only 5 observations across 12 months. Summary statistics are unreliable.",
"July 2025 shows a price spike (18.40 vs median 10.90). Investigate Level 2 for evidence.",
"Coefficient of variation (0.27) is close to the 0.3 high_variance threshold, suggesting inconsistent pricing or mixed product types.",
"Coverage ratio is 0.42, meaning 58% of expected periods have no data."
]
Guidelines for Hint Authoring
- Lead with the most actionable insight
- Reference specific numbers from the manifold
- Suggest next steps (e.g., "Investigate Level 2")
- Keep each hint to 1-2 sentences
11. Lineage Block
Provenance metadata for auditability:
"lineage": {
"manifold_id": "mfld_abc123",
"computed_at": "2026-01-14T22:10:00Z",
"computed_by": "tms_generator_v1.2",
"computation_duration_ms": 450,
"inputs": [
{
"dataset_id": "purchase_orders_silver",
"dataset_version": "v5.2",
"row_count": 125000,
"as_of_timestamp": "2026-01-14T00:00:00Z"
}
],
"filters_applied": [
{ "field": "supplier_id", "operator": "eq", "value": "S-789" },
{ "field": "part_id", "operator": "eq", "value": "P-123456" }
],
"transformations": [
"Converted unit_price from cents to dollars",
"Excluded cancelled PO lines",
"Imputed missing facility codes as 'UNKNOWN'"
]
}
12. Complete Example
A full timeseries_metric manifold:
{
"artifact_type": "tabular_manifold",
"artifact_version": "1.1",
"manifold_kind": "timeseries_metric",
"subject": {
"entity_type": "part_supplier_price",
"part_id": "P-123456",
"part_number": "WIDGET-SS-075",
"part_description": "Widget, stainless steel, 3/4 inch",
"supplier_id": "S-789",
"supplier_name": "Acme Industrial Supply",
"metric_name": "unit_price",
"currency": "USD",
"unit_of_measure": "EA"
},
"time_window": {
"start": "2025-01-01",
"end": "2025-12-31",
"timezone": "UTC",
"granularity": "month"
},
"level_0_summary": {
"observation_count": 5,
"time_coverage": {
"expected_periods": 12,
"observed_periods": 4,
"coverage_ratio": 0.333
},
"distribution": {
"min": 9.85,
"max": 18.40,
"mean": 11.99,
"median": 10.70,
"stddev": 3.23,
"cv": 0.269
},
"reliability": {
"sample_size_class": "sparse",
"sample_size_n": 5
},
"quality_flags": {
"low_sample_size": true,
"missing_periods": true,
"suspected_outliers": true,
"data_staleness": true
},
"interpretation_hints": [
"Sparse data: only 5 observations across 12 months.",
"One outlier detected: $18.40 in July (72% above median).",
"Recommended: inspect Level 1 to identify spike month."
]
},
"level_1_geometry": {
"format": "columnar_json_v1",
"granularity": "month",
"schema": {
"columns": [
{ "name": "period", "type": "string" },
{ "name": "n", "type": "integer" },
{ "name": "mean", "type": "double" },
{ "name": "flag", "type": "string", "nullable": true }
]
},
"rows": [
["2025-01", 1, 10.10, null],
["2025-03", 2, 10.80, null],
["2025-07", 1, 18.40, "outlier_spike"],
["2025-11", 1, 9.85, null]
]
},
"level_2_telemetry": {
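"format": "columnar_json_v1",
"schema": {
"columns": [
{ "name": "ts", "type": "timestamp" },
{ "name": "value", "type": "double" },
{ "name": "document_id", "type": "string", "nullable": true }
]
},
"row_count_total": 5,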
"inline_rows": {
"strategy": "full_inline",
"rows": [
["2025-01-14T00:00:00Z", 10.10, "PO-555"],
["2025-03-02T00:00:00Z", 10.70, "PO-612"],
["2025-03-28T00:00:00Z", 10.90, "PO-640"],
["2025-07-09T00:00:00Z", 18.40, "PO-777"],
["2025-11-21T00:00:00Z", 9.85, "PO-901"]
]
}
},
"lineage": {
"manifold_id": "mfld_price_P123456_S789_2025",
"computed_at": "2026-01-14T22:30:00Z",
"computed_by": "tms_generator_v1.1"
}
}
13. JSON Schema
Key validation rules in the formal JSON Schema:
- artifact_type must equal "tabular_manifold"
- manifold_kind must be one of the five canonical kinds
- level_0_summary is required and must include reliability and quality_flags
- If level_1_geometry or level_2_telemetry exists, it must have format: "columnar_json_v1"
- Every row in rows must have the same length, equal to the length of the columns array
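These rules can be approximated with a hand-rolled check. This sketch is not the formal JSON Schema; the function name `validate_manifold` and the error messages are assumptions, and Level 2 rows are read from `inline_rows.rows` as in the Level 2 telemetry example.

```python
CANONICAL_KINDS = {
    "timeseries_metric",
    "funnel_conversion",
    "cohort_behavior",
    "inventory_snapshot",
    "anomaly_detection",
}


def validate_manifold(manifold: dict) -> list[str]:
    """Return a list of rule violations; an empty list means the checks passed."""
    errors = []
    if manifold.get("artifact_type") != "tabular_manifold":
        errors.append('artifact_type must equal "tabular_manifold"')
    if manifold.get("manifold_kind") not in CANONICAL_KINDS:
        errors.append("manifold_kind must be one of the five canonical kinds")
    level_0 = manifold.get("level_0_summary")
    if not isinstance(level_0, dict):
        errors.append("level_0_summary is required")
    else:
        for key in ("reliability", "quality_flags"):
            if key not in level_0:
                errors.append(f"level_0_summary must include {key}")
    for level in ("level_1_geometry", "level_2_telemetry"):
        block = manifold.get(level)
        if block is None:
            continue  # Both deeper levels are optional.
        if block.get("format") != "columnar_json_v1":
            errors.append(f'{level}.format must be "columnar_json_v1"')
        width = len(block.get("schema", {}).get("columns", []))
        # Level 1 keeps rows at the top of the block; Level 2 nests them under inline_rows.
        rows = block.get("rows", block.get("inline_rows", {}).get("rows", []))
        for index, row in enumerate(rows):
            if len(row) != width:
                errors.append(f"{level} row {index} has {len(row)} values, expected {width}")
    return errors
```

Returning all violations at once, rather than raising on the first, gives a generator author a complete picture in one pass.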
14. MCP Integration Pattern
TMS manifolds are designed to be returned by MCP tools. Recommended pattern:
# mcp_tool is a placeholder for the tool decorator of your MCP server framework.
@mcp_tool
def get_price_manifold(part_id: str, supplier_id: str, year: int) -> dict:
    """
    Returns a TMS manifold for part/supplier price history.
    The manifold includes:
    - Level 0: Summary statistics and quality flags
    - Level 1: Monthly aggregates
    - Level 2: Raw transactions (preview only if >50 rows)
    Start with level_0_summary. Check quality_flags to decide
    whether deeper investigation is needed.
    """
    # ... generate manifold ...
    return manifold
Tool Documentation Should Instruct the Agent:
- Always read level_0_summary first
- Check quality_flags for any true values
- If flags are set, read level_1_geometry to locate the issue
- Only load level_2_telemetry when investigating specific anomalies
- Use interpretation_hints as reasoning guidance
Changelog
v1.1 (2026-01-14)
- Added required reliability block to Level 0
- Defined five canonical manifold_kind values with fixed Level 0 schemas
- Added token_budget block with compression ratios
- Specified inline_rows.strategy enum for Level 2
- Removed underspecified drilldown_policy DSL (replaced with interpretation_hints and recommended_strategy)
- Added outlier_summary to Level 0 for timeseries_metric
- Clarified inline vs retrieval thresholds
v1.0 (2026-01-10)
- Initial spec
License
TMS is released under Apache 2.0. Use it freely.