Skip to content

asof and window join

q) // setup
q)\l 2_where_clause_fakedb.q
q)makedb[50000;10000]
q)tables[]
`depth`quotes`trades
q)count quotes
50000
q)count trades
10000
* there are many more quotes than trades * we want to find quotes just before the trade took place for each sym

As of Join aj aj0

  • format : aj[c1..cn;table1;table2]
  • cn : column on which as-of-join is performed, most recent time from t2 which is asof(<=) time in table1
  • c1..cn-1 : columns are matched exactly in table1 and table2
  • aj will have all columns from table1 and values of columns c1..cn are from table1 and value from table2 will have matching values
    q)select from aj[`sym`time;trades;quotes] where sym=`AAPL
    time                          sym  src price size bid   ask   bsize asize
    -------------------------------------------------------------------------
    2020.12.26D08:00:03.057000000 AAPL L   25.33 4527 25.33 25.37 9000  3000
    2020.12.26D08:00:12.190000000 AAPL O   25.38 1889 25.35 25.38 4000  7000
    2020.12.26D08:01:22.084000000 AAPL L   25.37 5413 25.35 25.37 6000  5500
    2020.12.26D08:02:12.686000000 AAPL O   25.33 1841 25.32 25.33 4000  5000
    2020.12.26D08:02:59.207000000 AAPL N   25.32 957  25.32 25.36 1500  6500
    2020.12.26D08:03:04.870000000 AAPL N   25.34 991  25.32 25.34 3000  1000
    2020.12.26D08:03:05.194000000 AAPL N   25.34 454  25.32 25.34 3000  1000
    2020.12.26D08:03:50.205000000 AAPL N   25.34 176  25.31 25.34 1500  2000
    2020.12.26D08:03:52.988000000 AAPL N   25.34 491  25.31 25.34 1500  2000
    2020.12.26D08:04:00.136000000 AAPL L   25.35 4392 25.32 25.35 1000  5500
    2020.12.26D08:04:14.266000000 AAPL N   25.32 530  25.32 25.33 5500  9000
    2020.12.26D08:05:16.333000000 AAPL L   25.35 1118 25.33 25.35 7000  6000
    2020.12.26D08:06:36.068000000 AAPL L   25.33 1321 25.33 25.33 5500  6000
    2020.12.26D08:06:46.472000000 AAPL L   25.33 1973 25.33 25.35 2500  8500
    2020.12.26D08:07:40.059000000 AAPL O   25.36 4447 25.36 25.4  10000 500
    2020.12.26D08:08:01.875000000 AAPL N   25.37 4330 25.37 25.42 7500  10000
    2020.12.26D08:08:07.323000000 AAPL N   25.42 3511 25.37 25.42 7500  10000
    2020.12.26D08:08:11.894000000 AAPL N   25.37 4588 25.37 25.42 7500  10000
    2020.12.26D08:08:16.016000000 AAPL N   25.37 3810 25.37 25.42 7500  10000
    2020.12.26D08:08:25.522000000 AAPL N   25.37 2264 25.37 25.42 7500  10000
    ..
    q)select from quotes where sym=`AAPL
    time                          sym  src bid   ask   bsize asize
    --------------------------------------------------------------
    2020.12.26D08:00:02.705000000 AAPL N   25.32 25.37 6500  1000
    2020.12.26D08:00:02.882000000 AAPL L   25.33 25.37 9000  3000
    2020.12.26D08:00:04.111000000 AAPL L   25.33 25.37 9500  7000
    2020.12.26D08:00:05.040000000 AAPL L   25.35 25.37 2500  9500
    2020.12.26D08:00:09.235000000 AAPL N   25.35 25.37 4000  6500
    2020.12.26D08:00:10.576000000 AAPL O   25.35 25.38 4000  7000
    2020.12.26D08:00:16.432000000 AAPL O   25.35 25.39 3000  10000
    2020.12.26D08:00:25.528000000 AAPL N   25.35 25.37 3000  4000
    2020.12.26D08:00:26.237000000 AAPL N   25.34 25.37 6000  9000
    2020.12.26D08:00:32.058000000 AAPL L   25.35 25.38 3000  4000
    2020.12.26D08:00:47.294000000 AAPL L   25.34 25.36 9500  2500
    2020.12.26D08:01:06.551000000 AAPL N   25.35 25.38 4500  2500
    2020.12.26D08:01:08.708000000 AAPL N   25.32 25.37 9000  7500
    2020.12.26D08:01:16.411000000 AAPL L   25.35 25.37 6000  5500
    2020.12.26D08:01:25.050000000 AAPL L   25.34 25.37 500   5500
    2020.12.26D08:01:33.147000000 AAPL N   25.35 25.36 6500  5000
    2020.12.26D08:01:46.838000000 AAPL L   25.34 25.37 2500  4500
    2020.12.26D08:01:47.149000000 AAPL L   25.32 25.35 3000  5500
    2020.12.26D08:01:48.825000000 AAPL N   25.33 25.36 500   7000
    2020.12.26D08:01:59.163000000 AAPL L   25.35 25.36 5000  500
    ..
    
  • query performance will matter a lot if join columnsc1..cn has grouped(g#) attribute set for in-memory tables and parted(p#) attribute set for on-disk tables
    q)\t aj[`sym`time;trades;quotes]
    12
    q)meta quotes
    c    | t f a
    -----| -----
    time | p
    sym  | s   g
    src  | s   g
    bid  | f
    ask  | f
    bsize| i
    asize| i
    q)meta trades
    c    | t f a
    -----| -----
    time | p
    sym  | s   g
    src  | s   g
    price| f
    size | i
    q)update `#sym from `quotes
    `quotes
    q)\t aj[`sym`time;trades;quotes]
    5805
    
  • aj0 will used if in result table we want to show values for column cn from table2

aj in hdb

  • main difference with on-disk tables is we would have to load them in-memory first
  • while loading parted attribute p# is lost - workaroun : load data for each date and raze them into 1 table - which will preserve parted attributes

Window Join wj wj0

  • in aj we would get last value/most-recent value
  • in wj we can specify our window and also aggregation function which would be applied on joined data
  • format: wj[w; c; t; (q; (f0;c0); (f1;c1))]
  • t and q are tables
  • w is pair of lists of times/timestamps, begin and end
  • c are the common columns
  • f0, f1 are aggregate function applied to values in q columns c0 and c1 over the intervals
  • returns for each record in t, a record with additional columns c0 and c1, which are the results of the aggregation functions applied to values over the matching intervals in w.
  • aj is a special case of wj or wj is generalized form of aj
  • exact match columns c needs to have attribute #p or #g applied to them, other wise query would be much slower and also might give wrong results as well
    q)w: -1000 1000+\:trades.time
    q)w
    08:00:02.057 08:00:03.230 08:00:10.150 08:00:11.190 08:00:11.531 08:00:14.312..
    08:00:04.057 08:00:05.230 08:00:12.150 08:00:13.190 08:00:13.531 08:00:16.312..
    q)\t wj[w; `sym`time;trades;(quotes; (max;`bid);(min;`ask))]
    61
    q)wj[w; `sym`time;trades;(quotes; (max;`bid);(min;`ask))]
    time         sym  src price size bid   ask
    --------------------------------------------
    08:00:03.057 AAPL L   25.33 4527 25.33 25.37
    08:00:04.230 YHOO N   35.49 1457 35.49 35.52
    08:00:11.150 DELL L   29.08 6788 43.53 25.38
    08:00:12.190 AAPL O   25.38 1889 25.35 25.38
    08:00:12.531 MSFT L   36.08 4974 36.08 36.11
    08:00:15.312 ORCL O   32.2  668  43.53 25.37
    08:00:24.269 ORCL L   32.23 1116 32.19 32.23
    08:00:29.856 MSFT L   36.13 766  36.1  36.13
    08:00:33.033 ORCL L   32.23 1749 43.54 25.37
    08:00:33.126 ORCL L   32.19 1394 43.54 25.37
    08:00:40.441 MSFT L   36.12 1151 36.12 36.13
    08:00:46.699 IBM  N   43.57 1958 43.55 43.57
    08:00:49.597 CSCO L   35.46 2164 35.46 35.49
    08:00:51.534 CSCO L   35.46 2762 35.46 35.49
    08:00:56.880 YHOO N   35.52 227  35.49 35.52
    08:00:57.649 MSFT L   36.13 2028 36.08 36.13
    08:00:58.363 CSCO O   35.49 2273 35.45 35.49
    08:01:00.037 DELL L   29.06 3260 29.04 29.06
    08:01:04.382 NOK  O   31.79 3765 31.79 31.83
    08:01:06.810 IBM  N   43.55 4224 43.55 25.36
    ..
    
Back to top