Skip to content

rustyconover/duckdb-cron-extension

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Cron Expressions For DuckDB

Motivation

Cron jobs are a fundamental tool for scheduling tasks, often used for analyzing system behavior or verifying the completeness of data collections. For instance, a task might be scheduled to run daily at 10:30 AM, while another runs every Tuesday at 3 PM. These times might be in different time zones, such as UTC or New York time.

To efficiently compare these scheduled tasks against actual data, it’s beneficial to generate a list of expected timestamps directly within DuckDB. While DuckDB's generate_series() function can create a series of timestamps, it lacks the expressiveness needed for complex recurring patterns. I want to fill this with a new DuckDB extension that interprets cron expressions, leveraging the Rust crate croner.

This extension introduces a table-returning function cron(), which calculates upcoming or past times that satisfy a given cron expression.

Examples

Basic Usage

The cron() function returns the next timestamp for a given cron expression:

-- This expression occurs every day at 5 AM.
SELECT * FROM cron('0 5 * * *');
┌─────────────────────┐
│        cron         │
│     timestamp_s     │
├─────────────────────┤
│ 2024-05-26 05:00:00 │
└─────────────────────┘

Future Timestamps

To retrieve all future occurrences up to a specific date:

select * from cron('0 5 * * *', until='2024-06-05');
┌─────────────────────┐
│        cron         │
│     timestamp_s     │
├─────────────────────┤
│ 2024-05-26 05:00:00 │
│ 2024-05-27 05:00:00 │
│ 2024-05-28 05:00:00 │
│ 2024-05-29 05:00:00 │
│ 2024-05-30 05:00:00 │
│ 2024-05-31 05:00:00 │
│ 2024-06-01 05:00:00 │
│ 2024-06-02 05:00:00 │
│ 2024-06-03 05:00:00 │
│ 2024-06-04 05:00:00 │
├─────────────────────┤
│       10 rows       │
└─────────────────────┘

Past Timestamps

To retrieve occurrences within a past date range:

select * from cron('0 5 * * *', until='2020-08-04', start='2020-08-01');
┌─────────────────────┐
│        cron         │
│     timestamp_s     │
├─────────────────────┤
│ 2020-08-01 05:00:00 │
│ 2020-08-02 05:00:00 │
│ 2020-08-03 05:00:00 │
└─────────────────────┘

Timezone Handling

Cron expressions can be evaluated in specific time zones:

select * from
cron(
  '0 5 * * *',
  until='2024-03-14 00:00:00',
  start='2024-03-08 01:58:00',
  timezone='America/New_York');

┌─────────────────────┐
│        cron         │
│     timestamp_s     │
├─────────────────────┤
│ 2024-03-08 10:00:00 │
│ 2024-03-09 10:00:00 │
│ 2024-03-10 09:00:00 │
│ 2024-03-11 09:00:00 │
│ 2024-03-12 09:00:00 │
│ 2024-03-13 09:00:00 │
└─────────────────────┘

Changing DuckDB's timezone to match.

set timezone to 'America/New_York';

select * from
cron(
  '0 5 * * *',
  until='2024-03-14 00:00:00',
  start='2024-03-08 01:58:00',
  timezone='America/New_York');
┌─────────────────────┐
│        cron         │
│     timestamp_s     │
├─────────────────────┤
│ 2024-03-08 10:00:00 │
│ 2024-03-09 10:00:00 │
│ 2024-03-10 09:00:00 │
│ 2024-03-11 09:00:00 │
│ 2024-03-12 09:00:00 │
│ 2024-03-13 09:00:00 │
└─────────────────────┘

Second Level Precision

select * from cron('*/3 0 5 * * *', until='2024-05-30') limit 5;
┌─────────────────────┐
│        cron         │
│     timestamp_s     │
├─────────────────────┤
│ 2024-05-26 05:00:00 │
│ 2024-05-26 05:00:03 │
│ 2024-05-26 05:00:06 │
│ 2024-05-26 05:00:09 │
│ 2024-05-26 05:00:12 │
└─────────────────────┘

More complicated cron expressions

Using cron expressions for specific days of the week:

select * from cron('0 5 * * Mon', until='2024-09-30') limit 5;
┌─────────────────────┐
│        cron         │
│     timestamp_s     │
├─────────────────────┤
│ 2024-05-27 05:00:00 │
│ 2024-06-03 05:00:00 │
│ 2024-06-10 05:00:00 │
│ 2024-06-17 05:00:00 │
│ 2024-06-24 05:00:00 │
└─────────────────────┘

Function Documentation

cron(VARCHAR, start=TIMESTAMP, until=TIMESTAMP, timezone=VARCHAR)

Parameters:

  • pattern (VARCHAR): The cron pattern to evaluate.

Optional Named Parameters:

  • start (TIMESTAMP): The timestamp at which to begin evaluating the cron pattern.
  • until (TIMESTAMP): The timestamp at which to stop evaluating the cron pattern (exclusive).
  • timezone (VARCHAR): The time zone in which to evaluate the cron pattern (e.g., 'America/New_York', 'America/Chicago').

Returning

A single column cron, which contains timestamps when the cron pattern is satisfied.

About

DuckDB Cron Expression Extension

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published