Configuring Space Partitions for Multi-Tenant Time-Series in TimescaleDB
Multi-tenant time-series architectures demand strict data isolation, predictable query performance, and automated lifecycle management. When ingesting telemetry from thousands of devices across distinct tenants, relying solely on time-based chunking creates hot chunks, uneven I/O distribution, and inefficient vacuum cycles. Space partitioning resolves these bottlenecks by distributing tenant data across independent physical chunks, enabling parallel writes, targeted compression, and granular retention policies. This guide details production-ready patterns for configuring space partitions, integrating continuous aggregates, and automating data retention using TimescaleDB and Python.
flowchart LR
t(["tenant_id"]) --> h{"hash mod N"}
h --> p0["partition 0"]
h --> p1["partition 1"]
h --> p2["partition 2"]
h --> p3["partition N-1"]
Architectural Baselines and Prerequisites
Before implementing space partitions, ensure your TimescaleDB instance is tuned for high-concurrency ingestion and that your schema explicitly models tenant boundaries. A typical multi-tenant telemetry table models a time column and a tenant identifier; note that if you define a primary key or unique constraint on a hypertable, it must include the time partitioning column (and any space partitioning column). The underlying Core Hypertable Architecture & Partitioning Strategy dictates that space partitions must be defined at hypertable creation time and cannot be altered post-creation. This constraint requires careful capacity planning and tenant cardinality forecasting.
Production prerequisites:
- TimescaleDB 2.10+ (for continuous aggregate refresh policies and optimized space partitioning)
- PostgreSQL 14+
- Dedicated schema for tenant isolation (e.g.,
telemetry) - A small, fixed
number_partitionsaligned to your parallel I/O capacity (not your tenant count)
Idempotent Hypertable Configuration
Space partitioning splits a hypertable into chunks based on both time and a specified column, typically tenant_id. The partition count should align with your expected tenant growth and hardware concurrency. Over-partitioning increases metadata overhead and autovacuum pressure, while under-partitioning causes chunk bloat and query planner degradation.
-- Idempotent base table creation
CREATE TABLE IF NOT EXISTS telemetry.raw_metrics (
time TIMESTAMPTZ NOT NULL,
tenant_id UUID NOT NULL,
device_id VARCHAR(64) NOT NULL,
metric_name VARCHAR(32) NOT NULL,
value DOUBLE PRECISION NOT NULL,
metadata JSONB
);
-- Safely convert to hypertable with space partitioning
DO $$
BEGIN
IF NOT EXISTS (
SELECT 1 FROM timescaledb_information.hypertables
WHERE hypertable_name = 'raw_metrics' AND hypertable_schema = 'telemetry'
) THEN
PERFORM create_hypertable(
'telemetry.raw_metrics',
'time',
partitioning_column => 'tenant_id',
number_partitions => 16,
chunk_time_interval => INTERVAL '7 days',
create_default_indexes => false
);
END IF;
END $$;
-- Production-grade indexes (idempotent)
CREATE INDEX IF NOT EXISTS idx_raw_metrics_tenant_time ON telemetry.raw_metrics (tenant_id, time DESC);
CREATE INDEX IF NOT EXISTS idx_raw_metrics_device_time ON telemetry.raw_metrics (device_id, time DESC);
The number_partitions => 16 setting creates 16 independent space partitions per time interval. Queries filtering on tenant_id will prune irrelevant chunks at the planner level, drastically reducing I/O. For deeper implementation patterns, consult the Space Partitioning for Multi-Tenant IoT reference.
Continuous Aggregate Integration Across Space Partitions
Continuous aggregates benefit directly from space partitioning because the materialization engine can parallelize refresh jobs across independent chunks. Instead of scanning the entire hypertable, TimescaleDB only processes newly appended chunks per space partition, reducing refresh latency and CPU contention.
-- Create tenant-aware continuous aggregate
CREATE MATERIALIZED VIEW IF NOT EXISTS telemetry.tenant_hourly_metrics
WITH (timescaledb.continuous) AS
SELECT
time_bucket('1 hour', time) AS bucket,
tenant_id,
metric_name,
AVG(value) AS avg_value,
MAX(value) AS max_value,
COUNT(*) AS sample_count
FROM telemetry.raw_metrics
GROUP BY bucket, tenant_id, metric_name
WITH NO DATA;
-- Automated refresh policy (runs every 10 minutes, covers last 24 hours)
SELECT add_continuous_aggregate_policy(
'telemetry.tenant_hourly_metrics',
start_offset => INTERVAL '24 hours',
end_offset => INTERVAL '1 hour',
schedule_interval => INTERVAL '10 minutes',
if_not_exists => true
);
When combined with space partitions, the refresh policy executes per-chunk, allowing the query planner to leverage chunk exclusion. This architecture ensures that high-frequency telemetry from specific tenants does not block aggregate computation for others.
Python-Driven Retention Lifecycle Automation
While TimescaleDB provides native add_retention_policy, multi-tenant environments often require tenant-specific SLAs or dynamic retention windows based on storage costs. Python automation enables programmatic chunk lifecycle management, leveraging PostgreSQL’s system catalogs to identify expired partitions and execute safe drop operations.
import psycopg
import logging
from datetime import datetime, timedelta, timezone
logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s")
def automate_retention_lifecycle(conn_string: str, retention_days: int = 90):
"""
Idempotent retention automation for space-partitioned telemetry.
Queries chunk metadata, identifies expired ranges, and applies native drop_chunks.
"""
# Use an aware UTC timestamp; datetime.utcnow() is naive and would be
# interpreted in the session time zone when cast to timestamptz.
cutoff_date = datetime.now(timezone.utc) - timedelta(days=retention_days)
with psycopg.connect(conn_string) as conn:
with conn.cursor() as cur:
# Query expired chunks across space partitions
cur.execute("""
SELECT chunk_schema, chunk_name, range_start, range_end
FROM timescaledb_information.chunks
WHERE hypertable_name = 'raw_metrics'
AND hypertable_schema = 'telemetry'
AND range_end < %s
AND is_compressed = false
ORDER BY range_end ASC;
""", (cutoff_date,))
expired_chunks = cur.fetchall()
if not expired_chunks:
logging.info("No expired chunks found for retention window.")
return
logging.info(f"Identified {len(expired_chunks)} chunks for retention processing.")
# drop_chunks removes every chunk older than the cutoff in a single
# call, so there is no need to loop over the identified chunks.
try:
cur.execute(
"SELECT drop_chunks(%s::regclass, older_than => %s::timestamptz);",
("telemetry.raw_metrics", cutoff_date),
)
logging.info(f"Dropped chunks older than {cutoff_date.isoformat()}")
conn.commit()
except psycopg.Error as e:
logging.error(f"Retention drop failed: {e}")
conn.rollback()
raise
if __name__ == "__main__":
# Replace with secure credential management (e.g., AWS Secrets Manager, Vault)
DSN = "postgresql://tsdb_user:password@localhost:5432/telemetry_db"
automate_retention_lifecycle(DSN, retention_days=90)
This script leverages timescaledb_information.chunks to evaluate partition boundaries before invoking drop_chunks. The operation remains idempotent and respects transaction boundaries, preventing partial drops during network interruptions. For comprehensive retention configuration, refer to the official TimescaleDB Data Retention documentation.
Production Validation and Planner Optimization
After deploying space partitions, validate query execution plans to ensure chunk pruning is active. Use EXPLAIN (ANALYZE, BUFFERS) on tenant-scoped queries to verify that the planner excludes irrelevant partitions. Monitor pg_stat_user_tables to track autovacuum efficiency across high-churn chunks. If vacuum pressure increases, consider adjusting autovacuum_vacuum_scale_factor and autovacuum_vacuum_cost_delay at the table level.
-- Verify chunk pruning and index utilization
EXPLAIN (ANALYZE, BUFFERS)
SELECT time, value
FROM telemetry.raw_metrics
WHERE tenant_id = 'a1b2c3d4-e5f6-7890-abcd-ef1234567890'
AND time > NOW() - INTERVAL '3 days';
Confirm that only the relevant chunks appear as child nodes under the Append/Custom Scan — excluded chunks are simply absent from the plan — which verifies that space partitioning and chunk exclusion are working. Additionally, monitor continuous aggregate refresh stats via timescaledb_information.job_stats (joined with timescaledb_information.jobs on the refresh job) to detect refresh bottlenecks caused by skewed tenant ingestion rates.
Space partitioning transforms multi-tenant telemetry from a monolithic ingestion bottleneck into a scalable, isolated, and automatable data pipeline. By aligning partition counts with hardware concurrency, enforcing idempotent lifecycle automation, and leveraging continuous aggregates for real-time summarization, engineering teams can maintain predictable performance at scale.