How to calculate days to and after in table SQL Vertica?
Mia Lopez
I have SQL table like below:
date col1
2020-01-02 xxxAnd I have special dates like: 2020-01-01, 2020-01-05, 2020-05-10
And I need to calculate number of days to the nearest special date and number of days after last special date, so i need a result like below:
next_special = 3 because the nearest special date for 2020-01-02 is 2020-01-05 (3 days)
last_special = 1 because last special date for 2020-01-02 was 1 day ago (2020-01-01)
date col1 next_special last_special
2020-01-02 xxx 3 1 2 2 Answers
If I understand correctly, you can use complex case expressions along with least() and greatest():
select t.*, nullif( least(case when '2020-01-01' < date then datediff(day, '2020-01-01', date) else 999999 end, case when '2020-01-05' < date then datediff(day, '2020-01-05', date) else 999999 end, case when '2020-05-10' < date then datediff(day, '2020-05-10', date) else 999999 end ), 999999 ) as prev_special, nullif( least(case when '2020-01-01' > date then datediff(day, date, '2020-01-01') else 999999 end, case when '2020-01-05' > date then datediff(day, date, '2020-01-05') else 999999 end, case when '2020-05-10' > date then datediff(day, date, '2020-05-10') else 999999 end ), 999999 ) as next_special
from t;I think this would be simpler to express in almost any other database -- because they support correlated scalar subqueries in the SELECT clause.
EDIT:
If you have the values in a separate table and you have a unique id on your original table, you could use:
select t.date, t.col1, min(case when d.date < t.date then datediff(day, d.date, t.date end) as prev_special, min(case when d.date > t.date then datediff(day, t.date, d.date end) as next_special
from t cross join dates d
group by t.date, t.col1;Easier to express. Much worse performance-wise.
5Vertica works with the event series join, which is obtained using the INTERPOLATE PREVIOUS VALUE predicate in a LEFT JOIN clause. Instead of NULLS in a LEFT JOIN, you get the data from the immediately preceding row.
And Vertica is pretty good at OLAP functions - and at pipeline parallelism, so that nesting several queries usually hurts a bit less.
In the example below, I create a series of 15 consecutive dates out of the two limit dates, from 1st Jan to 15th Jan, add 'xxx' as col1 in indata , and join with a specdays table (after enriching each row with its successor date), then apply an event series join, and to the maths.
Is that what you're after?
WITH
specdays(dt) AS ( SELECT DATE '2020-01-01' -- new year's day
UNION ALL SELECT DATE '2020-01-03' -- sunday
UNION ALL SELECT DATE '2020-01-06' -- epiphany
UNION ALL SELECT DATE '2020-01-10' -- sunday
)
,
prevnext AS ( SELECT dt
, LEAD(dt) OVER w AS nextdt
FROM specdays
WINDOW w AS (ORDER BY dt)
)
,
-- list of dates beween 1st Jan and 15th Jan
indata AS (
SELECT tms::DATE AS dt
, 'xxx' AS col1
FROM ( SELECT TIMESTAMP '2020-01-01'
UNION ALL SELECT TIMESTAMP '2020-01-15'
) limits(dt)
TIMESERIES tms AS '1 DAY' OVER(ORDER BY dt)
)
SELECT indata.*
, prevnext.dt AS prevspecday
, prevnext.nextdt AS nextspecday
, TIMESTAMPDIFF('DAY',prevnext.dt ,indata.dt ) AS last_special
, CASE WHEN prevnext.dt = indata.dt THEN 0 ELSE TIMESTAMPDIFF('DAY',indata.dt ,prevnext.nextdt) END AS next_special
FROM indata
LEFT JOIN prevnext ON indata.dt INTERPOLATE PREVIOUS VALUE prevnext.dt
;
dt |col1|prevspecday|nextspecday|last_special|next_special
2020-01-01|xxx |2020-01-01 |2020-01-03 | 0| 0
2020-01-02|xxx |2020-01-01 |2020-01-03 | 1| 1
2020-01-03|xxx |2020-01-03 |2020-01-06 | 0| 0
2020-01-04|xxx |2020-01-03 |2020-01-06 | 1| 2
2020-01-05|xxx |2020-01-03 |2020-01-06 | 2| 1
2020-01-06|xxx |2020-01-06 |2020-01-10 | 0| 0
2020-01-07|xxx |2020-01-06 |2020-01-10 | 1| 3
2020-01-08|xxx |2020-01-06 |2020-01-10 | 2| 2
2020-01-09|xxx |2020-01-06 |2020-01-10 | 3| 1
2020-01-10|xxx |2020-01-10 |- | 0| 0
2020-01-11|xxx |2020-01-10 |- | 1|-
2020-01-12|xxx |2020-01-10 |- | 2|-
2020-01-13|xxx |2020-01-10 |- | 3|-
2020-01-14|xxx |2020-01-10 |- | 4|-
2020-01-15|xxx |2020-01-10 |- | 5|-