Continuous Aggregate Creation & Refresh Management

Time-series workloads in IoT telemetry, infrastructure monitoring, and financial tick data demand predictable query latency at scale. Raw event ingestion rates frequently exceed analytical throughput, making on-the-fly aggregation a persistent bottleneck: a dashboard that runs avg(temperature) over a billion-row hypertable cannot stay under a one-second budget without help. TimescaleDB solves this with continuous aggregates, which materialize pre-computed summaries over time buckets while maintaining a unified query interface over both the rollup and the raw tail. This guide is written for the time-series data engineers, IoT platform developers, DevOps operators, and Python automation builders who own that pipeline end to end — from the CREATE MATERIALIZED VIEW statement through refresh scheduling, background-worker tuning, retention alignment, and the monitoring queries that catch drift before it reaches a dashboard. It ties together every stage of the continuous aggregate lifecycle and links out to the focused guides that go deeper on each one.

The continuous aggregate lifecycle: raw telemetry is materialized into rollups that queries read alongside fresh data, driven by a refresh policy and bounded by a retention policy.

Architecture Baseline

A continuous aggregate is not a standalone object — it is a materialized hypertable wired to a source hypertable through an invalidation log and serviced by the TimescaleDB job scheduler. Getting the environment right before the first CREATE MATERIALIZED VIEW avoids the most common class of production failures, where a policy silently never fires because a background worker slot was never available. Confirm each of the following before you build:

TimescaleDB 2.10 or later installed and CREATE EXTENSION timescaledb run in the target database (2.7 introduced modern continuous aggregates; 2.10+ adds IF NOT EXISTS on the view and stable policy helpers).
PostgreSQL 14 or later, so the planner rewrites that merge materialized rollups with the raw tail behave as documented.
The source table is already a hypertable, partitioned by a TIMESTAMPTZ time column via create_hypertable().
timescaledb.max_background_workers is set high enough to cover every refresh policy plus retention and compression jobs — the default of 16 is shared across the whole instance, and this parameter requires a PostgreSQL restart, not just pg_reload_conf().
max_worker_processes is at least max_background_workers plus PostgreSQL’s own parallel and replication workers, or the scheduler cannot launch jobs.
The connecting role has SELECT on the source hypertable and ownership of the aggregate view for policy management.
Time-bucket granularity, refresh cadence, and retention horizon have been decided together — they are one design, not three independent knobs.
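The two worker-limit items in the checklist reduce to simple arithmetic. The sketch below is a hypothetical pre-flight check, not part of TimescaleDB: the function name and the "plan for every job waking at once" rule are assumptions, and the inputs would come from SHOW queries against your own instance.

```python
def worker_budget_problems(
    max_worker_processes: int,
    ts_max_background_workers: int,
    parallel_workers: int,
    replication_workers: int,
    concurrent_jobs: int,
) -> list[str]:
    """Return a list of problems; an empty list means the settings fit."""
    problems: list[str] = []

    # Assumption: plan for the worst case where every policy wakes at once.
    if concurrent_jobs > ts_max_background_workers:
        problems.append(
            f"{concurrent_jobs} jobs may run at once but "
            f"timescaledb.max_background_workers is {ts_max_background_workers}"
        )

    # From the checklist: max_worker_processes must cover the TimescaleDB pool
    # plus PostgreSQL's own parallel and replication workers.
    required = ts_max_background_workers + parallel_workers + replication_workers
    if max_worker_processes < required:
        problems.append(
            f"max_worker_processes is {max_worker_processes}, "
            f"need at least {required}"
        )
    return problems


if __name__ == "__main__":
    for line in worker_budget_problems(16, 16, 8, 4, 20):
        print("WARN:", line)
```

Running a check like this in CI before applying new policies is one way to surface the "policy silently never fires" failure before it reaches production.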

The source hypertable’s physical layout has a direct effect on refresh cost. Because a refresh reads the invalidated ranges of the raw hypertable, the chunk_time_interval sizing you chose when building the hypertable determines how many chunks each refresh must scan. Oversized chunks force a refresh to touch more data than the invalidated window strictly requires; undersized chunks inflate catalog overhead. Treat the aggregate design as an extension of your core hypertable architecture and partitioning strategy rather than a bolt-on.

How Continuous Aggregates Materialize

A continuous aggregate stores its results in an internal materialization hypertable, distinct from the view name you query. When rows are inserted, updated, or deleted in the source hypertable, TimescaleDB records the affected time ranges in an invalidation log rather than eagerly recomputing anything. A refresh — whether triggered by a policy or a manual refresh_continuous_aggregate() call — reads that log, recomputes only the invalidated buckets, and advances a watermark that marks the boundary between materialized history and the live tail. The exact storage layout, partial-aggregate representation, and planner rewrite rules are covered in depth in the materialized view architecture and syntax guide; the essential point for lifecycle management is that refresh cost scales with invalidated data, not total data.

Real-time continuous aggregates combine materialized history with fresh, un-materialized rows at query time, eliminating refresh latency for the most recent buckets. That behaviour is controlled by the materialized_only option. Its default depends on the version: releases before TimescaleDB 2.13 default to false (real-time on), while 2.13 and later default to true (real-time off), so set it explicitly rather than relying on the default. The creation statement itself requires a WITH (timescaledb.continuous) clause and a deterministic time-bucketing function.

sql

-- Idempotent creation. IF NOT EXISTS (TimescaleDB 2.10+) makes the DDL safe to
-- re-run from CI/CD without guarding it in application code.
CREATE MATERIALIZED VIEW IF NOT EXISTS sensor_metrics_1h
WITH (timescaledb.continuous) AS
SELECT
  time_bucket('1 hour', ts) AS bucket,   -- deterministic bucket boundary
  device_id,
  avg(temperature)  AS avg_temp,
  max(temperature)  AS max_temp,
  min(temperature)  AS min_temp,
  count(*)          AS reading_count
FROM raw_sensor_data
GROUP BY bucket, device_id
WITH NO DATA;                            -- defer the initial backfill

-- Keep the real-time union on so recent buckets are never stale.
ALTER MATERIALIZED VIEW sensor_metrics_1h
  SET (timescaledb.materialized_only = false);

The WITH NO DATA flag accelerates creation by deferring the initial population, which for a large hypertable can otherwise run for hours and hold resources. In production, trigger the first backfill during a maintenance window with explicit, bounded CALL refresh_continuous_aggregate('sensor_metrics_1h', start, end) invocations, one time range at a time, rather than a single unbounded refresh that materializes the entire history in one transaction. When the output needs contiguous buckets even where the source has gaps (a frequent requirement for alerting on missing devices), note that time_bucket_gapfill is not supported inside the continuous aggregate definition itself; apply it in the query that reads the view, as described in creating continuous aggregates with time_bucket_gapfill.
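A bounded backfill comes down to slicing the history into consecutive windows, oldest first. The following is a minimal sketch of that slicing only; backfill_windows is a hypothetical helper, and the 3-day step is an illustrative assumption rather than a recommended value. Each (lo, hi) pair would be passed to refresh_continuous_aggregate in its own transaction.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator


def backfill_windows(
    start: datetime, end: datetime, step: timedelta
) -> Iterator[tuple[datetime, datetime]]:
    """Yield contiguous half-open [lo, hi) windows covering [start, end)."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    lo = start
    while lo < end:
        hi = min(lo + step, end)  # clamp the final window to the end bound
        yield lo, hi
        lo = hi


if __name__ == "__main__":
    first = datetime(2024, 1, 1, tzinfo=timezone.utc)
    last = datetime(2024, 1, 11, tzinfo=timezone.utc)
    for lo, hi in backfill_windows(first, last, timedelta(days=3)):
        # Printed for review; a real run would execute one refresh per window.
        print(f"refresh {lo.isoformat()} -> {hi.isoformat()}")
```

Keeping the windows half-open and contiguous means no bucket is refreshed twice and none is skipped, and a failed window can be retried without restarting the whole backfill.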

Automation Patterns

Production systems treat aggregate lifecycle as infrastructure: creation, policy attachment, and retention alignment are version-controlled and applied idempotently, so the same script converges the database to the desired state whether it runs on a fresh instance or an existing one. The TimescaleDB policy helpers already accept if_not_exists => true, which makes them safe to re-run; the automation layer’s job is to compose them, verify the result, and retry through transient failures. The following psycopg v3 routine attaches a refresh policy, aligns retention, and reports health — labelled with the workflow steps it performs.

python

import logging
import psycopg
from psycopg.rows import dict_row
from tenacity import (
    retry, stop_after_attempt, wait_exponential, retry_if_exception_type,
)

logging.basicConfig(level=logging.INFO)
DB_CONN_STR = "postgresql://user:pass@localhost:5432/telemetry_db"


@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=2, max=10),
    retry=retry_if_exception_type(psycopg.OperationalError),
)
def provision_aggregate(view_name: str) -> None:
    """Step 1-3: idempotently attach refresh + retention policies to an aggregate."""
    with psycopg.connect(DB_CONN_STR) as conn:
        with conn.cursor() as cur:
            # Step 1: refresh policy — never touches the in-flight newest bucket.
            cur.execute(
                """
                SELECT add_continuous_aggregate_policy(
                    %s,
                    start_offset      => INTERVAL '3 hours',
                    end_offset        => INTERVAL '1 hour',
                    schedule_interval => INTERVAL '1 hour',
                    if_not_exists     => true
                );
                """,
                (view_name,),
            )
            # Step 2: retention on the AGGREGATE view itself (longer horizon than raw).
            cur.execute(
                """
                SELECT add_retention_policy(
                    %s, drop_after => INTERVAL '2 years', if_not_exists => true
                );
                """,
                (view_name,),
            )
        conn.commit()
    logging.info("Provisioned refresh + retention policies for %s", view_name)


def verify_aggregate_health(view_name: str) -> dict | None:
    """Step 4: resolve the refresh job via the catalog and read its last status."""
    with psycopg.connect(DB_CONN_STR, row_factory=dict_row) as conn:
        with conn.cursor() as cur:
            # A cagg's policies attach to its materialization hypertable, not the
            # view name, so join through the catalog. Retention and compression
            # jobs attach there too, so filter on the refresh procedure name.
            cur.execute(
                """
                SELECT js.job_id, js.last_run_status,
                       js.last_successful_finish, js.next_start
                FROM timescaledb_information.job_stats js
                JOIN timescaledb_information.jobs j
                  ON j.job_id = js.job_id
                JOIN timescaledb_information.continuous_aggregates ca
                  ON ca.materialization_hypertable_schema = js.hypertable_schema
                 AND ca.materialization_hypertable_name = js.hypertable_name
                WHERE ca.view_name = %s
                  AND j.proc_name = 'policy_refresh_continuous_aggregate';
                """,
                (view_name,),
            )
            # None when no refresh policy exists for the view.
            return cur.fetchone()


if __name__ == "__main__":
    provision_aggregate("sensor_metrics_1h")
    stats = verify_aggregate_health("sensor_metrics_1h")
    if stats and stats["last_run_status"] != "Success":
        logging.warning(
            "Refresh job %s last status: %s", stats["job_id"], stats["last_run_status"]
        )

The tenacity decorator absorbs transient connection drops and lock timeouts so a flaky network does not translate into stale rollups. For the retry semantics that belong inside the database — trigger-driven recovery and dead-letter logging of failed refreshes — see error handling and retry mechanisms and the pattern for handling refresh failures with custom PL/pgSQL triggers. Full details on tuning the start_offset, end_offset, and schedule_interval trio — including the boundary math that keeps a policy off the actively written bucket — live in refresh policy design and scheduling, with a concrete recipe in setting up automatic refresh policies for 5-minute intervals.

Both offsets are measured back from now: start_offset sets how far back a run reaches, end_offset keeps the run clear of the still-filling bucket. The gap between them is the range each refresh materializes.
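The offset arithmetic described above can be checked with a few lines of date math. This is a sketch under stated assumptions: refresh_window and buckets_in_window are hypothetical helpers, and the scheduler's own alignment of the window to bucket boundaries is not modelled.

```python
from datetime import datetime, timedelta, timezone


def refresh_window(
    now: datetime, start_offset: timedelta, end_offset: timedelta
) -> tuple[datetime, datetime]:
    """Return the (oldest, newest) bounds one policy run would cover."""
    if start_offset <= end_offset:
        raise ValueError("start_offset must reach further back than end_offset")
    return now - start_offset, now - end_offset


def buckets_in_window(
    start_offset: timedelta, end_offset: timedelta, bucket: timedelta
) -> int:
    """Count the whole buckets that fit between the two offsets."""
    return (start_offset - end_offset) // bucket


if __name__ == "__main__":
    now = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
    lo, hi = refresh_window(now, timedelta(hours=3), timedelta(hours=1))
    n = buckets_in_window(timedelta(hours=3), timedelta(hours=1), timedelta(hours=1))
    print(f"each run covers {lo.isoformat()} to {hi.isoformat()} ({n} buckets)")
```

With the values used in the provisioning routine (3-hour start_offset, 1-hour end_offset, 1-hour buckets), each run covers two whole buckets and leaves the newest hour untouched.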

Refresh Strategy Selection

Choosing between incremental and full refreshes depends on data mutation patterns and backfill requirements. Incremental refreshes process only the ranges recorded in the invalidation log, minimizing I/O and compute; this is the default policy behaviour and the correct choice for steady-state ingestion. Full refreshes rebuild an entire time range and are reserved for schema changes, corrected historical data, or recovering a corrupted materialization. The trade-offs, and how to bound a full refresh so it does not lock the aggregate for the duration, are worked through in incremental vs full refresh strategies.

For platforms handling late-arriving telemetry, such as an IoT gateway that buffers offline and flushes hours of backlog on reconnect, set materialized_only = false so foreground queries merge cached rollups with the raw tail. This removes the pressure to run aggressive refresh frequencies during high-velocity windows: the data is answerable from the raw hypertable until the next scheduled refresh catches up. When late data lands behind the watermark, however, the corresponding buckets are invalidated and must be re-materialized; a policy whose start_offset does not reach far enough back will never revisit them, which is the root cause behind most reports of stale continuous aggregates in production.

Performance & Scale

At fleet scale, continuous aggregate performance is governed less by the aggregation query and more by how refresh work is scheduled against finite background workers and how the underlying chunks are laid out. Four factors dominate.

Chunk count vs catalog overhead. Every chunk on both the source and the materialization hypertable is a catalog row plus a physical table with its own indexes and statistics. A refresh must resolve which chunks intersect the invalidated range, so a hypertable fragmented into hundreds of thousands of tiny chunks pays planning cost on every refresh and bloats pg_class. Size chunks so that a typical chunk holds roughly a few days of a device’s data and the working set of recent chunks fits comfortably in memory — the calculation is spelled out in the chunk_time_interval sizing guide.

Background-worker concurrency. All refresh, retention, and compression jobs draw from the same timescaledb.max_background_workers pool. If the number of policies whose next_start falls in the same instant exceeds available workers, jobs queue and refresh lag climbs even though no single job is slow. Stagger schedule_interval values or offset policy start times across aggregate tiers so the scheduler flattens compute spikes instead of stacking them. The dispatch mechanics and how to size the pool are detailed in asynchronous execution and queue management, and the technique for shrinking a single large refresh into fast, bounded passes is covered in incremental refresh performance tuning for large datasets.
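One way to stagger policies is to spread their first run times evenly across a single schedule interval. The sketch below computes those start times only; staggered_starts is a hypothetical helper, and feeding the result into a policy's initial_start argument assumes a TimescaleDB release that accepts that parameter, which you should confirm for your version.

```python
from datetime import datetime, timedelta, timezone


def staggered_starts(
    view_names: list[str], first_start: datetime, schedule_interval: timedelta
) -> dict[str, datetime]:
    """Assign each view an evenly spaced start time within one interval."""
    if not view_names:
        return {}
    slot = schedule_interval / len(view_names)  # spacing between neighbours
    return {
        name: first_start + i * slot
        for i, name in enumerate(view_names)
    }


if __name__ == "__main__":
    base = datetime(2024, 5, 1, 0, 0, tzinfo=timezone.utc)
    # Illustrative view names; only sensor_metrics_1h appears in this guide.
    views = ["sensor_metrics_1h", "sensor_metrics_1d", "fleet_1h", "fleet_1d"]
    for name, start in staggered_starts(views, base, timedelta(hours=1)).items():
        print(f"{name}: {start.isoformat()}")
```

Four hourly policies spaced this way start 15 minutes apart, so at most one of them competes for a worker slot at any given start time.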

IOPS distribution. A refresh reads invalidated source chunks and writes materialized chunks; retention drops chunks; compression rewrites them. When these coincide, they contend for the same disk bandwidth. Keeping refresh windows narrow (a small start_offset-to-end_offset span per run) spreads I/O evenly rather than producing hourly spikes that starve foreground queries.

Compression interaction. Materialization chunks that have aged past the real-time window are excellent compression candidates: a 1-hour rollup of per-second readings is already a large reduction, and applying columnar compression models on top typically yields another large factor on the rollup itself. Compress materialized chunks only after they fall behind the watermark and are no longer subject to invalidation, or a late-arriving refresh will have to decompress and recompress them.

The following formula estimates the number of materialized rows a single-tier aggregate produces, which drives its storage and refresh cost:

$$R_{\text{agg}} = D \times \left\lceil \frac{H}{B} \right\rceil$$

where $D$ is the number of distinct grouping keys (for example device IDs), $H$ is the retention horizon of the aggregate, and $B$ is the bucket width. A 50,000-device fleet with 1-hour buckets retained for two years produces roughly $50{,}000 \times (2 \times 8760) \approx 8.76 \times 10^{8}$ rollup rows — small next to the raw stream, but large enough that its own chunk sizing and compression matter.
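The same estimate can be expressed as a small function, which is convenient when comparing candidate bucket widths. This is a direct transcription of the formula; rollup_rows is a hypothetical name, and a year is taken as 365 days to match the 8760-hour figure above.

```python
from datetime import timedelta


def rollup_rows(distinct_keys: int, horizon: timedelta, bucket: timedelta) -> int:
    """Estimate materialized rows: D * ceil(H / B)."""
    if bucket <= timedelta(0):
        raise ValueError("bucket width must be positive")
    # Integer ceiling division on timedeltas avoids float rounding.
    buckets = -(-horizon // bucket)
    return distinct_keys * buckets


if __name__ == "__main__":
    two_years = timedelta(days=2 * 365)
    for width in (timedelta(minutes=5), timedelta(hours=1), timedelta(days=1)):
        print(width, rollup_rows(50_000, two_years, width))
```

The estimate is an upper bound: it assumes every grouping key reports in every bucket, so fleets with intermittent devices will materialize fewer rows.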

Failure Modes & Operational Gotchas

Watermark drift from late data. If telemetry regularly arrives more than start_offset behind the current time, the refresh window never reaches those buckets and the aggregate silently diverges from the raw data. Mitigation: widen start_offset to cover your worst-case ingestion lag, or run a periodic bounded full refresh over the affected range. Verify with the freshness query in the monitoring section.
In-flight bucket materialized too early. An end_offset shorter than the time it takes ingestion to finalize a bucket lets the refresh materialize a partial bucket. Rows that arrive afterwards invalidate that bucket, but it is only corrected if a later run's refresh window still reaches it. Mitigation: set end_offset to at least one bucket width plus your ingestion settling time.
Refresh starvation under worker pressure. Too many policies competing for max_background_workers leaves some jobs perpetually queued; next_start keeps sliding forward. Mitigation: raise the worker pool (restart required), reduce policy count by consolidating tiers, or stagger schedule intervals.
Retention drops raw chunks the refresh still needs. If the raw retention policy removes chunks before the aggregate has materialized them, those buckets are lost forever. Mitigation: ensure the raw drop_after horizon is strictly greater than start_offset, and order retention after refresh in the schedule.
Unbounded initial backfill. A single refresh_continuous_aggregate() over the whole history runs in one long transaction, holds locks, and can exhaust WAL. Mitigation: backfill in bounded ranges, oldest to newest.
Compressing materialization chunks inside the invalidation window. Compressing rollup chunks that late data can still invalidate forces expensive decompress/recompress cycles on refresh. Mitigation: only compress materialized chunks older than start_offset.
materialized_only mismatch. Toggling to materialized_only = true while assuming real-time behaviour makes recent buckets vanish from query results until the next refresh. Mitigation: keep the real-time union on unless you have a specific reason to disable it, and document the choice.
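Several of the mitigations above are inequalities between intervals, so they can be linted before a policy is deployed. The sketch below encodes three of them; PolicyDesign and lint_policy are hypothetical names, and the worst-case lag and settling time are values you would have to measure for your own ingestion path.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class PolicyDesign:
    bucket: timedelta            # time_bucket width of the aggregate
    start_offset: timedelta      # refresh policy start_offset
    end_offset: timedelta        # refresh policy end_offset
    worst_ingest_lag: timedelta  # measured worst-case lateness of telemetry
    settle_time: timedelta       # time for ingestion to finalize a bucket
    raw_drop_after: timedelta    # retention horizon on the raw hypertable


def lint_policy(d: PolicyDesign) -> list[str]:
    """Return one finding per violated rule; empty means no rule is violated."""
    findings: list[str] = []
    if d.worst_ingest_lag > d.start_offset:
        findings.append("late data can land outside the refresh window")
    if d.end_offset < d.bucket + d.settle_time:
        findings.append("end_offset may capture a partially filled bucket")
    if d.raw_drop_after <= d.start_offset:
        findings.append("raw retention can drop chunks the refresh still needs")
    return findings


if __name__ == "__main__":
    design = PolicyDesign(
        bucket=timedelta(hours=1),
        start_offset=timedelta(hours=3),
        end_offset=timedelta(hours=1),
        worst_ingest_lag=timedelta(hours=6),  # illustrative value
        settle_time=timedelta(0),
        raw_drop_after=timedelta(days=30),
    )
    for finding in lint_policy(design):
        print("WARN:", finding)
```

A check like this covers only the interval relationships; worker starvation, unbounded backfills, and compression timing still need the runtime monitoring described in the next section.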

Monitoring Checklist

Track these signals continuously; each maps to a query against the TimescaleDB system views. Alert on the first two — they are the difference between a healthy pipeline and one that has quietly stopped materializing.

Refresh job status and next run — catches jobs that are failing or no longer scheduled.
Refresh lag / aggregate freshness — the gap between the watermark and now.
Chunk count — early warning for over-fragmentation.
Compression ratio on materialized chunks — confirms the storage plan is holding.
Job error history — surfaces transient failures that retries hid.

sql

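-- 1. Refresh job health for every continuous aggregate.
-- Retention and compression jobs attach to the same materialization
-- hypertable, so filter on the refresh procedure name.
SELECT ca.view_name,
       js.job_id,
       js.last_run_status,
       js.last_successful_finish,
       js.next_start,
       js.total_failures
FROM timescaledb_information.job_stats js
JOIN timescaledb_information.jobs j
  ON j.job_id = js.job_id
JOIN timescaledb_information.continuous_aggregates ca
  ON ca.materialization_hypertable_schema = js.hypertable_schema
 AND ca.materialization_hypertable_name = js.hypertable_name
WHERE j.proc_name = 'policy_refresh_continuous_aggregate'
ORDER BY js.last_successful_finish NULLS FIRST;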

-- 2. Aggregate freshness: how far the watermark lags "now".
-- Note: newer TimescaleDB releases expose these helpers under the
-- _timescaledb_functions schema; adjust the schema name if the call fails.
SELECT view_name,
       now() - watermark AS refresh_lag
FROM (
  SELECT ca.view_name,
         _timescaledb_internal.to_timestamp(
           _timescaledb_internal.cagg_watermark(ca_int.mat_hypertable_id)
         ) AS watermark
  FROM _timescaledb_catalog.continuous_agg ca_int
  JOIN timescaledb_information.continuous_aggregates ca
    ON ca.view_schema = ca_int.user_view_schema
   AND ca.view_name   = ca_int.user_view_name
) s
ORDER BY refresh_lag DESC;

-- 3. Chunk count per hypertable (raw + materialization) to spot fragmentation.
SELECT hypertable_name, count(*) AS chunks
FROM timescaledb_information.chunks
GROUP BY hypertable_name
ORDER BY chunks DESC;

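-- 4. Recent job errors (transient failures the scheduler retried).
-- The information view exposes the message as err_message; the raw JSON
-- error_data column exists only on the underlying internal table.
SELECT job_id, proc_name, start_time, sqlerrcode, err_message AS message
FROM timescaledb_information.job_errors
ORDER BY start_time DESC
LIMIT 20;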

Wire query 1 into an alert on last_run_status <> 'Success' or a next_start that has passed without a corresponding last_successful_finish. Feed query 2 into a lag threshold tied to your dashboard SLA. Retention alignment is the final piece: dropping raw chunks does not remove already-materialized aggregate rows — the rollup retains its summaries independently, which is exactly what lets you keep long-lived history after the raw data ages out. To bound aggregate storage, attach a separate retention policy to the view with its own, typically longer, drop_after horizon, and coordinate it with your broader data retention, compression, and lifecycle automation — the mechanics of mapping business TTLs onto drop_after intervals are covered in TTL policy mapping and enforcement.
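The alert rule for query 1 can be sketched as a pure function over one result row, which keeps it testable without a database. The function name, the dictionary keys (mirroring the column names in query 1), and the 5-minute grace period are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def job_alert(
    row: dict, now: datetime, grace: timedelta = timedelta(minutes=5)
) -> Optional[str]:
    """Return an alert message for one job_stats row, or None if healthy."""
    status = row.get("last_run_status")
    if status is not None and status != "Success":
        return f"job {row.get('job_id')} last status: {status}"

    next_start = row.get("next_start")
    last_ok = row.get("last_successful_finish")
    # Overdue: the scheduled start has passed (plus grace) and no successful
    # run has finished since that scheduled start.
    if next_start is not None and now > next_start + grace:
        if last_ok is None or last_ok < next_start:
            return f"job {row.get('job_id')} overdue since {next_start.isoformat()}"
    return None


if __name__ == "__main__":
    now = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
    sample = {
        "job_id": 1000,  # illustrative job id
        "last_run_status": "Success",
        "last_successful_finish": now - timedelta(hours=2),
        "next_start": now - timedelta(hours=1),
    }
    print(job_alert(sample, now))
```

The grace period avoids paging on a job that is merely a few seconds late to start; tune it to the queueing delay you consider normal for your worker pool.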

Materialized View Architecture & Syntax — storage layout, partial aggregates, and planner rewrites behind a continuous aggregate.
Refresh Policy Design & Scheduling — tuning start_offset, end_offset, and schedule_interval.
Incremental vs Full Refresh Strategies — when to rebuild a range versus process only invalidated data.
Asynchronous Execution & Queue Management — background-worker dispatch, queue depth, and concurrency limits.
Error Handling & Retry Mechanisms — resilient recovery for failed refresh jobs.

Cross-topic: Core Hypertable Architecture & Partitioning Strategy · Columnar Compression Models for High-Frequency Telemetry · Data Retention, Compression & Lifecycle Automation
