How to Calculate Optimal chunk_interval for IoT Sensor Data

The chunk_time_interval parameter decides how TimescaleDB physically partitions IoT telemetry on disk, and sizing it wrong quietly degrades compression ratios, continuous aggregate refresh latency, and retention sweep predictability. This page turns that decision into arithmetic: gather four measurable ingestion metrics, plug them into a deterministic formula, round to an operational boundary, and verify the result against the catalog. It applies the sizing rules introduced in the parent guide on time-based chunk partitioning strategies to the specific case of high-frequency sensor fleets.

Input Profiling: Metrics to Gather First

Before deriving an interval, establish a reproducible ingestion profile. IoT platforms rarely maintain uniform write rates; network buffering, gateway aggregation, and protocol translation introduce burst patterns. Capture the following from a representative production window:

Ingestion rate: average and peak rows per second across the fleet
Row footprint: average uncompressed size per telemetry record, including metadata, JSON payloads, and indexed columns
Retention horizon: how many days raw data must remain queryable before it is dropped or archived
Compression cadence: how long chunks stay uncompressed before the compression policy converts them to the columnar compression models used for cold telemetry (typically 7–30 days for IoT)

These values feed directly into the sizing formula. TimescaleDB’s background workers and PostgreSQL’s MVCC machinery perform best when a single uncompressed chunk occupies roughly 100 MB to 1 GB. The band exists because two costs pull in opposite directions:

Chunks below 100 MB multiply catalog rows, fragment VACUUM, and inflate planning time — the planner still evaluates constraints for every chunk it prunes.
Chunks above 1 GB degrade ANALYZE sampling accuracy, slow continuous aggregate materialization, and lengthen the lock window when a retention job drops one.

The Deterministic Sizing Formula

The optimal interval is the target chunk size divided by the projected daily ingestion volume. Expressed in terms of the profiled inputs:

$$I_{chunk} = \frac{B_{target}}{R_{rows/s} \times 86400 \times B_{row}}$$

where $B_{target}$ is the target uncompressed chunk size in bytes (use the 500 MB midpoint of the band unless you have a reason to skew), $R_{rows/s}$ is rows per second, $86400$ is seconds per day, and $B_{row}$ is the average uncompressed bytes per row. The result $I_{chunk}$ is expressed in days; multiply by 1440 to read it in minutes.

The same calculation runs directly in psql so you can sanity-check the number against live data:

sql

-- All three inputs are numeric literals. With integer literals the product
-- 2500 * 86400 * 180 exceeds the int4 range and fails with "integer out of range".
WITH telemetry_profile AS (
  SELECT
    500000000.0 AS target_chunk_bytes,  -- 500 MB midpoint of the 100 MB-1 GB band
    2500.0      AS avg_rows_per_sec,
    180.0       AS avg_row_bytes
)
SELECT
  target_chunk_bytes
    / (avg_rows_per_sec * 86400 * avg_row_bytes) AS chunk_interval_days,
  1440 * target_chunk_bytes
    / (avg_rows_per_sec * 86400 * avg_row_bytes) AS chunk_interval_minutes
FROM telemetry_profile;

Round the raw result upward to a clean boundary (15 min, 30 min, 1 h, 2 h, 6 h, 12 h, 1 day), then confirm the rounded interval still keeps chunks under the 1 GB ceiling. Rounding up rather than down keeps the catalog small and lets compression, retention, and the continuous aggregate refresh policy all operate on tidy chunk boundaries instead of straddling them.

Automating the calculation in Python

For fleets that scale non-linearly, a calibration routine can read live metrics and emit a ready-to-apply interval string before you onboard a new device tier. The snippet uses psycopg v3 and reflects the boundaries above:

python

import math
import psycopg
from psycopg.rows import dict_row


def calculate_optimal_chunk_interval(conn_str: str, target_mb: int = 500) -> str:
    """Query live telemetry metrics and return a PostgreSQL interval string."""
    with psycopg.connect(conn_str, row_factory=dict_row) as conn:
        with conn.cursor() as cur:
            cur.execute("""
                SELECT
                    COUNT(*)::float
                      / NULLIF(EXTRACT(EPOCH FROM (MAX(time) - MIN(time))), 0) AS rps,
                    -- Average the row size over the window instead of sampling
                    -- one row's JSON rendering; cast so Python receives a float.
                    AVG(pg_column_size(t.*))::float AS row_bytes
                FROM sensor_readings t
                WHERE time > NOW() - INTERVAL '1 hour';
            """)
            row = cur.fetchone()

    if not row or not row["rps"]:
        return "1 hour"  # safe fallback (bare interval literal for ::interval)

    daily_bytes = row["rps"] * 86400 * row["row_bytes"]
    target_bytes = target_mb * 1_000_000  # decimal MB, matching the 500,000,000-byte SQL target
    interval_minutes = max(15, math.ceil((target_bytes / daily_bytes) * 1440))

    # Round up to the next operational boundary rather than the nearest one.
    boundaries = [15, 30, 60, 120, 360, 720, 1440]
    optimal = next((b for b in boundaries if b >= interval_minutes), 1440)
    return f"{optimal} minutes"


def apply_chunk_interval(conn_str: str, interval_sql: str) -> None:
    """Update the interval for FUTURE chunks; existing chunks are untouched."""
    with psycopg.connect(conn_str) as conn:
        with conn.cursor() as cur:
            cur.execute(
                "SELECT set_chunk_time_interval('sensor_readings', %s::interval);",
                (interval_sql,),
            )
        conn.commit()
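The rounding logic is easier to test when it is separated from the database call. The following is a sketch of that idea, not part of the routine above: `interval_from_metrics` is a hypothetical helper that takes the two profiled numbers directly and a target expressed in bytes, and reuses the same boundary list.

```python
import math

# Operational boundaries in minutes, copied from the calibration routine.
BOUNDARIES_MIN = [15, 30, 60, 120, 360, 720, 1440]


def interval_from_metrics(
    rps: float, row_bytes: float, target_bytes: float = 500_000_000
) -> str:
    """Pure version of the sizing arithmetic: no database access."""
    daily_bytes = rps * 86400 * row_bytes
    raw_minutes = max(15, math.ceil(target_bytes / daily_bytes * 1440))
    # Round up to the next boundary; cap at one day if none is large enough.
    chosen = next((b for b in BOUNDARIES_MIN if b >= raw_minutes), 1440)
    return f"{chosen} minutes"


print(interval_from_metrics(2500, 180))
```

With the profile used elsewhere on this page (2,500 rows/s, 180 bytes per row) the helper returns `30 minutes`. Keeping the arithmetic pure lets a unit test cover very high and very low ingestion tiers without a live hypertable.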

Worked Example

Take a fleet of 50,000 devices, each emitting one reading every 20 seconds, so the hypertable ingests 2,500 rows/sec. Each record — timestamptz, device_id, metric_type, a double precision value, and a small JSONB payload — averages 180 bytes uncompressed. Targeting the 500 MB midpoint:

$$I_{chunk} = \frac{500{,}000{,}000}{2500 \times 86400 \times 180} = \frac{5\times10^{8}}{3.888\times10^{10}} \approx 0.0129 \text{ days} \approx 18.5 \text{ minutes}$$

The raw answer is roughly 18.5 minutes, so the next clean boundary up is 30 minutes. At that interval each chunk holds about 4.5 million rows and lands near 810 MB uncompressed, inside the band. A 1-hour interval would hold about 9 million rows and reach roughly 1.6 GB, which overshoots the ceiling, so 30 minutes is the better choice. This is the interval that then anchors the rest of the lifecycle:

sql

CREATE TABLE IF NOT EXISTS sensor_readings (
    time        TIMESTAMPTZ NOT NULL,
    device_id   UUID        NOT NULL,
    metric_type TEXT        NOT NULL,
    value       DOUBLE PRECISION,
    payload     JSONB
);

-- Apply the calculated interval at creation time to avoid a later resize.
SELECT create_hypertable(
    'sensor_readings', 'time',
    chunk_time_interval => INTERVAL '30 minutes',
    if_not_exists       => TRUE
);

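-- Compression must be enabled on the hypertable before a policy can be added.
-- Segmenting by device_id and ordering by time is an assumed layout; adjust to your queries.
ALTER TABLE sensor_readings SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'device_id',
    timescaledb.compress_orderby   = 'time DESC'
);

-- Compression, aggregate refresh, and retention all key off the chunk boundary.
SELECT add_compression_policy('sensor_readings', INTERVAL '7 days', if_not_exists => TRUE);
SELECT add_retention_policy('sensor_readings', drop_after => INTERVAL '90 days', if_not_exists => TRUE);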

Because retention drops whole chunks rather than issuing row-level DELETEs, a 30-minute interval means the drop_after sweep reclaims space in clean, predictable increments instead of scanning live data.
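The chosen interval also fixes how many chunks exist at steady state. As a rough sketch, assuming the 90-day retention and 7-day compression windows from the example and no space partitioning, the counts can be estimated with plain `timedelta` arithmetic; `chunks_in_window` is a hypothetical helper name.

```python
from datetime import timedelta


def chunks_in_window(window: timedelta, chunk_interval: timedelta) -> int:
    """Number of chunks needed to cover `window` (ceiling division)."""
    return -(-window // chunk_interval)


interval = timedelta(minutes=30)
retained = chunks_in_window(timedelta(days=90), interval)
uncompressed = chunks_in_window(timedelta(days=7), interval)
print(f"chunks under retention: {retained}")        # 90 days of 30-minute chunks
print(f"chunks not yet compressed: {uncompressed}")  # newest 7 days
```

For this example that is 4,320 retained chunks, of which the newest 336 are uncompressed. Running the same numbers for a 15-minute or 1-hour interval shows how quickly the catalog grows or shrinks.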

Edge Cases & When to Deviate

The formula assumes steady append-mostly ingestion at a stable row width. Deviate when:

Bursty or bimodal write rates: size against the sustained peak rows/sec, not the daily average, or the busiest hours will overshoot the 1 GB ceiling.
Wide or growing rows: large JSONB payloads or many indexed columns push bytes/row up over time; re-profile pg_column_size quarterly rather than trusting the launch-day figure.
Very low ingestion tiers: when the formula returns more than one day, clamp it to a 1 day or 7 days interval. Chunks will fall below the band, which is acceptable here, but do not go shorter than a day or sparse fleets accumulate near-empty chunks.
Space partitioning in play: when you also partition by device_id for multi-tenant space partitioning, each time chunk is further split per partition and receives only a fraction of the rows. Divide the ingestion rate by the number of space partitions before applying the formula; the resulting interval grows by the same factor.
Aggregate-heavy read paths: if a 5-minute continuous aggregate refreshes slower than its schedule, prefer a slightly larger interval to cut per-refresh catalog overhead, even if it nudges chunks toward the upper band.
Aligning retention with lifecycle policies: keep drop_after an exact multiple of the interval so every chunk ages out exactly at the retention boundary instead of lingering until its newest rows expire; the mechanics are covered in TTL policy mapping and enforcement.
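Two of these checks are simple enough to script. The sketch below is illustrative only: the 4,000 rows/s peak is an invented figure, and both helper names are hypothetical. It projects chunk size at the peak rate and tests whether a retention window is an exact multiple of the interval.

```python
from datetime import timedelta


def peak_chunk_bytes(
    peak_rows_per_sec: float, row_bytes: float, interval: timedelta
) -> float:
    """Uncompressed bytes written into one chunk if the peak rate holds
    for the whole interval (a worst-case estimate)."""
    return peak_rows_per_sec * row_bytes * interval.total_seconds()


def is_aligned(drop_after: timedelta, interval: timedelta) -> bool:
    """True when drop_after is an exact multiple of the chunk interval."""
    return drop_after % interval == timedelta(0)


# Assumed peak of 4,000 rows/s against the 2,500 rows/s average.
for minutes in (15, 30):
    size = peak_chunk_bytes(4000, 180, timedelta(minutes=minutes))
    print(f"{minutes} min at peak: {size / 1e6:.0f} MB")

print(is_aligned(timedelta(days=90), timedelta(minutes=30)))
print(is_aligned(timedelta(days=90), timedelta(days=7)))
```

Under that assumed peak a 30-minute chunk would reach about 1,296 MB, while 15 minutes stays near 648 MB. The alignment check passes for 90 days over 30 minutes and fails for 90 days over 7 days.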

Verification

After applying the interval, confirm chunks are landing inside the target band and that lifecycle jobs run on the boundary you sized for:

sql

-- Actual size of recent chunks — the number you sized against.
SELECT
    chunk_name,
    pg_size_pretty(
      pg_total_relation_size(format('%I.%I', chunk_schema, chunk_name))
    )                       AS chunk_size,
    is_compressed,
    range_start,
    range_end
FROM timescaledb_information.chunks
WHERE hypertable_name = 'sensor_readings'
ORDER BY range_start DESC
LIMIT 10;

-- Confirm the interval the hypertable will use for FUTURE chunks (time_interval is an INTERVAL value).
SELECT hypertable_name, column_name, time_interval
FROM timescaledb_information.dimensions
WHERE hypertable_name = 'sensor_readings';

If the chunk_size column drifts above 1 GB, halve the interval; if chunks routinely come in under 100 MB, double it. Because set_chunk_time_interval only affects chunks created after the change, existing chunks keep their original span — a resize is a forward-looking adjustment, not a rewrite.
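The halve-or-double rule can be written down as a small decision function. This is a minimal sketch under the same 100 MB to 1 GB band, in decimal bytes; `adjust_interval_minutes` is a hypothetical name, and its result would still need to be applied with set_chunk_time_interval.

```python
def adjust_interval_minutes(
    current_minutes: int,
    observed_chunk_bytes: float,
    low: float = 100e6,   # 100 MB floor
    high: float = 1e9,    # 1 GB ceiling
) -> int:
    """Suggest the next interval from the observed size of recent chunks."""
    if observed_chunk_bytes > high:
        return max(1, current_minutes // 2)  # too large: halve
    if observed_chunk_bytes < low:
        return current_minutes * 2           # too small: double
    return current_minutes                   # inside the band: leave as is


print(adjust_interval_minutes(60, 1.62e9))  # 1-hour chunks at ~1.6 GB
print(adjust_interval_minutes(30, 810e6))   # 30-minute chunks at ~810 MB
```

Feeding it a representative size from several recent chunks, rather than a single outlier, avoids oscillating between two intervals.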

Related

Time-Based Chunk Partitioning Strategies — the parent guide on sizing, creating, and pruning time chunks
TimescaleDB Chunk Partitioning vs PostgreSQL Table Inheritance — why automatic chunking beats manual child tables
Compression Models for High-Frequency Telemetry — how chunk size shapes columnar compression ratios
Space Partitioning for Multi-Tenant IoT — adjusting the formula when partitioning by device or tenant
TTL Policy Mapping & Enforcement — aligning retention windows with chunk boundaries
