Note
I'm using InfluxDB 1.8 because starting from version 2.0 there is no ARM 32bit builds. Also, version 1.x is much simpler than 2.0.
However, things should be similar with InfluxDB 2.0 tasks.
The InfluxDB documentation includes a great guide for downsampling and retaining data. However, if you are not starting from an empty database, the guide is not enough.
I have all the data from the sensors in the database sensors
. The data looks like this:
SELECT * FROM eco2 WHERE time > now() - 10s;
name: eco2
time sensor value
---- ------ -----
1683080134061871104 SGP30 836
1683080135183121152 SGP30 833
1683080136307070720 SGP30 830
1683080137418324480 SGP30 820
1683080138542053888 SGP30 826
1683080139652603392 SGP30 833
1683080140753946368 SGP30 831
1683080141879817472 SGP30 828
1683080143005833728 SGP30 835
And I want to downsample the data into another database called sensors-history
:
CREATE DATABASE sensors_history;
Note
You can just create another retention policy instead of creating a new database, and you just need to change the queries below from
"sensors-history"."autogen"
to"sensors"."oneyear"
whereoneyear
is the name of the retention policy you just created.
Downsample all existing data #
Before creating a retention policy on sensors
, we first need to downsample all existing data into sensors-history
:
USE sensors;
SELECT MEAN(*) INTO "sensors_history"."autogen".:MEASUREMENT
FROM /.*/ GROUP BY time(10m),*;
Note
I added newlines in the query for readability. If you are executing the queries from the
influx
REPL, you need to remove the newlines for each query.
The wildcard and the backreferencing syntax :MEASUREMENT
can be found in the InfluxQL documentation. All it does is to select all data from all measurements (the regex /.*/
part), downsample them with the interval of 10m
, and write to the corresponding :MEASUREMENT
s in sensors-history
database with the autogen
retention policy, preserving all the tags (GROUP BY *
part).
If you want more details, you can do this:
SELECT
MEAN("value") AS mean_value,
PERCENTILE("value",95) AS high_value,
PERCENTILE("value",5) AS low_value
INTO "sensors_history"."autogen".:MEASUREMENT
FROM /.*/
GROUP BY time(10m),*;
where I used the PERCENTILE
function instead of MAX
and MIN
for a more robust aggregation. As a consequence, I have to use AS
for aliasing, as the default query naming fails to distinguish the two percentile functions.
Setting up a retention policy #
After checking that all data is downsampled into sensors-history
, it's time to set up the retention policy and delete old data. Here I will just keep the data for one day:
CREATE RETENTION POLICY "oneday" ON "sensors" DURATION 1d REPLICATION 1 DEFAULT;
where REPLICATION 1
is the default for single-node InfluxDB deployment.
Since I've set the new policy oneday
as the default, the query will return an empty result if the retention policy is not specified in the query. In other words, it may look like you've lost all your data after setting a retention policy.
To make the query return recent 24h
data, I need to move the recent data from old autogen
to oneday
:
SELECT * INTO "oneday".:MEASUREMENT
FROM "autogen"./.*/
WHERE time > now() - 1d GROUP BY *;
After checking everything is working fine, it's time to remove old data.
DELETE FROM /.*/ WHERE time < now() - 1d;
Creating the continuous query #
The last step is to create the continuous query:
CREATE CONTINUOUS QUERY "cq_tidy_sensors" ON "sensors"
BEGIN
SELECT
MEAN("value") AS mean_value,
PERCENTILE("value",95) AS high_value,
PERCENTILE("value",5) AS low_value
INTO "sensors_history"."autogen".:MEASUREMENT
FROM "sensors"."oneday"./.*/
GROUP BY time(10m),*;
END
The SELECT
query is similar to the query above, but it's better to specify the retention policy oneday
, especially if you didn't make it the default.
The continuous query will automatically execute every 10m
, selecting recent 10m
of data from the sensors
database to sensors-history
. You can check your continuous query by running
SHOW CONTINUOUS QUERIES;
If everything goes fine, you will see something like this every 10m
in your InfluxDB log (docker compose logs
if using Docker Compose):
msg="Continuous query execution (start)" service=continuous_querier op_name=continuous_querier_execute op_event=start
msg="Executing continuous query" service=continuous_querier op_name=continuous_querier_execute name=cq_tidy_sensors db_instance=sensors start=2023-05-03T03:00:00.000000Z end=2023-05-03T03:10:00.000000Z
msg="Executing query" service=query query="SELECT mean(value) AS mean_value, percentile(value, 95) AS high_value, percentile(value, 5) AS low_value INTO sensors_history.autogen.:MEASUREMENT FROM sensors.oneday./.*/ WHERE time >= '2023-05-03T03:00:00Z' AND time < '2023-05-03T03:10:00Z' GROUP BY time(10m), *"
msg="Finished continuous query" service=continuous_querier op_name=continuous_querier_execute name=cq_tidy_sensors db_instance=sensors written=7 start=2023-05-03T03:00:00.000000Z end=2023-05-03T03:10:00.000000Z duration=1069ms
msg="Continuous query execution (end)"
🥳 We have set up downsampling with existing data!