Skip to content

Rack Align Pivot

Racking

  • common operation with timeseries data is grouping data by time bucket using function xbar
  • However using xbar we don't get buckets for all symbols for all possible time buckets, rather buckets are created for which data is present
  • Let's create timeseries table
    q)tab:([]sym:`TCS`SBI`TCS`SBI`SBI;time: 09:00 09:01 09:20 09:32 09:34; size: 200 300 5000 400 350; price: 39.99 45.3 34.56 22.34 99.89)
    q)tab
    sym time  size price
    --------------------
    TCS 09:00 200  39.99
    SBI 09:01 300  45.3
    TCS 09:20 5000 34.56
    SBI 09:32 400  22.34
    SBI 09:34 350  99.89
    q)select sum price by sym, 15 xbar time.minute from tab
    sym minute| price
    ----------| ------
    SBI 09:00 | 45.3
    SBI 09:30 | 122.23
    TCS 09:00 | 39.99
    TCS 09:15 | 34.56
    // here we dont see 09:15 bucket for SBI
    
  • we can resolve above issue with racking - creating bucket for each 15 minute interval and merging it with table tab
  • first we define start, end and bucket
    q)start:09:00
    q)end:09:59
    q)bucket:15
    q)(end-start)%bucket
    3.933333
    q)ceiling (end-start)%bucket
    4
    q)til ceiling (end-start)%bucket
    0 1 2 3
    q)bucket*til ceiling (end-start)%bucket
    0 15 30 45
    q)start+bucket*til ceiling (end-start)%bucket
    09:00 09:15 09:30 09:45
    q)times: start+bucket*til ceiling (end-start)%bucket
    q)times
    09:00 09:15 09:30 09:45
    
  • to create a rack we need original symbols from tables
    q)select distinct sym from tab
    sym
    ---
    TCS
    SBI
    q)rack: (`sym xasc select distinct sym from tab) cross ([]minute: times)
    q)rack
    sym minute
    ----------
    SBI 09:00
    SBI 09:15
    SBI 09:30
    SBI 09:45
    TCS 09:00
    TCS 09:15
    TCS 09:30
    TCS 09:45
    
  • now we can perform same operation but with value for each sym for each window
    q)select sum price by sym,bucket xbar time.minute from tab //earlier
    sym minute| price
    ----------| ------
    SBI 09:00 | 45.3
    SBI 09:30 | 122.23
    TCS 09:00 | 39.99
    TCS 09:15 | 34.56
    q)rack#select sum price by sym,bucket xbar time.minute from tab //after using rack
    sym minute| price
    ----------| ------
    SBI 09:00 | 45.3
    SBI 09:15 |
    SBI 09:30 | 122.23
    SBI 09:45 |
    TCS 09:00 | 39.99
    TCS 09:15 | 34.56
    TCS 09:30 |
    TCS 09:45 |
    
  • we can fill the blank data with fills or ^
    q)update 0^price from rack#select sum price by sym,bucket xbar time.minute from tab
    sym minute| price
    ----------| ------
    SBI 09:00 | 45.3
    SBI 09:15 | 0
    SBI 09:30 | 122.23
    SBI 09:45 | 0
    TCS 09:00 | 39.99
    TCS 09:15 | 34.56
    TCS 09:30 | 0
    TCS 09:45 | 0
    q)update fills size by sym from update 0^price from rack#select sum sum price, last size by sym,bucket xbar time.minute from tab
    sym minute| price  size
    ----------| -----------
    SBI 09:00 | 45.3   300
    SBI 09:15 | 0      300
    SBI 09:30 | 122.23 350
    SBI 09:45 | 0      350
    TCS 09:00 | 39.99  200
    TCS 09:15 | 34.56  5000
    TCS 09:30 | 0      5000
    TCS 09:45 | 0      5000
    
  • we can create a function to do same on HDB
    q)\l fakedb.q
    USAGE: makedb[NUM QUOTES;NUM TRADES] eg makedb[100000;10000]
    
    makedb1[NUM QUOTES;NUM TRADES;DATE;RANDOMISED COUNT FACTOR] eg makedb1[100000;10000;.z.d;.3]
    
    makehdb[HDBDIR;NUM DAYS;APPROXIMATE NUM QUOTES PER DAY; APPROXIMATE NUM TRADES PER DAY] eg makehdb[`:hdb; 5; 100000; 10000]
    
    makecsv[CSVDIR;NUM DAYS;NUM QUOTES;NUM TRADES]
    
    q)
    q)makehdb[`:hdb;5;100000;10000]
    2022.10.05T17:37:30.075 saving data for date 2014.04.21 to :hdb
    2022.10.05T17:37:30.139 saving data for date 2014.04.22 to :hdb
    2022.10.05T17:37:30.225 saving data for date 2014.04.23 to :hdb
    2022.10.05T17:37:30.306 saving data for date 2014.04.24 to :hdb
    2022.10.05T17:37:30.380 saving data for date 2014.04.25 to :hdb
    q)\l hdb
    q)\a
    `depth`quotes`trades
    q)// function to create rack
    q)createrack:{[start;end;bucket;symbol] times:start+bucket*til ceiling (end-start:bucket xbar start)%bucket}
    q)createrack[2022.10.05D09:00;2022.10.05D09:59;0D00:15;`SBI]
    2022.10.05D09:00:00.000000000 2022.10.05D09:15:00.000000000 2022.10.05D09:30:..
    q)//only create list of times
    q)//lets add cross with sym logic
    q)createrack:{[start;end;bucket;symbol] times:start+bucket*til ceiling (end-start:bucket xbar start)%bucket; ([]sym:symbol,()) cross ([]time: times)}
    q)createrack[2022.10.05D09:00;2022.10.05D09:59;0D00:15;`SBI]
    sym time
    ---------------------------------
    SBI 2022.10.05D09:00:00.000000000
    SBI 2022.10.05D09:15:00.000000000
    SBI 2022.10.05D09:30:00.000000000
    SBI 2022.10.05D09:45:00.000000000
    
  • we need to create another function for using createrack function with hdb data
    q)jointables:{[start;end;bucket;symbol] createrack[start;end;bucket;symbol]#select sum price by sym, bucket xbar time from trades where date within `date$(start;end), sym in symbol, time within `timestamp$(start;end)}
    q)jointables[2014.04.21D09:00;2014.04.21D12:59;0D00:15;`AAPL]
    sym  time                         | price
    ----------------------------------| -------
    AAPL 2014.04.21D09:00:00.000000000| 853.09
    AAPL 2014.04.21D09:15:00.000000000| 823.7
    AAPL 2014.04.21D09:30:00.000000000| 794.11
    AAPL 2014.04.21D09:45:00.000000000| 799.56
    AAPL 2014.04.21D10:00:00.000000000| 624.37
    AAPL 2014.04.21D10:15:00.000000000| 650.2
    AAPL 2014.04.21D10:30:00.000000000| 803.53
    AAPL 2014.04.21D10:45:00.000000000| 933.05
    AAPL 2014.04.21D11:00:00.000000000| 785.23
    AAPL 2014.04.21D11:15:00.000000000| 1034.56
    AAPL 2014.04.21D11:30:00.000000000| 779.19
    AAPL 2014.04.21D11:45:00.000000000| 1205.52
    AAPL 2014.04.21D12:00:00.000000000| 873.81
    AAPL 2014.04.21D12:15:00.000000000| 991.83
    AAPL 2014.04.21D12:30:00.000000000| 945.7
    AAPL 2014.04.21D12:45:00.000000000| 823.28
    

Align

  • Another common problem is to align two asynchronous timeseries data where the time rarely match
  • Alignment can be for the same dataset but different instruments for example aligning GOOG with AAPL
  • There are three main approached to aligning
    1. we can use bucketing to align - have buckets of common time and then join
    2. use asof join
    3. use fill join where we want all data at every timepoint
  • we can load the data from HDB
    q)trades1: select time, sym, price, size from trades where date=2014.04.21
    q)quotes1: select time, sym, bid,ask from quotes where date=2014.04.21
    
  • aj will join prevailing quote price to each trade i.e; the quote which occured just before will be joined on to that trade
  • aj makes exact match on the sym and as-of match on time
    q)aj[`sym`time;trades1;quotes1]
    time                          sym  price size bid   ask
    ---------------------------------------------------------
    2014.04.21D08:00:12.155000000 AAPL 25.31 2450 25.31 25.33
    2014.04.21D08:00:42.186000000 AAPL 25.32 289  25.32 25.34
    2014.04.21D08:00:51.764000000 AAPL 25.34 3167 25.34 25.37
    2014.04.21D08:00:54.526000000 AAPL 25.34 6474 25.34 25.37
    2014.04.21D08:00:57.071000000 AAPL 25.34 1335 25.34 25.37
    2014.04.21D08:01:19.773000000 AAPL 25.4  317  25.36 25.4
    
  • Now, what if we want to align prices and cumulative volumes of two different syms
  • example: we want to align prevailing GOOG data to AAPL data
  • For this, first we create a table for price and total volume for AAPL trades and same for GOOG trades
    q)apple: select time, appleprice:price, applevol: sums size from trades1 where sym in `AAPL
    q)apple
    time                          appleprice applevol
    -------------------------------------------------
    2014.04.21D08:00:12.155000000 25.31      2450
    2014.04.21D08:00:42.186000000 25.32      2739
    2014.04.21D08:00:51.764000000 25.34      5906
    2014.04.21D08:00:54.526000000 25.34      12380
    2014.04.21D08:00:57.071000000 25.34      13715
    q)google: select time, googprice:price, googvol: sums size from trades1 where sym in `GOOG
    q)google
    time                          googprice googvol
    -----------------------------------------------
    2014.04.21D08:00:01.184000000 41.3      2422
    2014.04.21D08:00:23.103000000 41.3      4313
    2014.04.21D08:00:49.515000000 41.29     5444
    2014.04.21D08:00:57.435000000 41.26     12207
    2014.04.21D08:01:02.314000000 41.29     18694
    
  • now we can use aj to align the two tables
    q)aj[`time;apple;google]
    time                          appleprice applevol googprice googvol
    -------------------------------------------------------------------
    2014.04.21D08:00:12.155000000 25.31      2450     41.3      2422
    2014.04.21D08:00:42.186000000 25.32      2739     41.3      4313
    2014.04.21D08:00:51.764000000 25.34      5906     41.29     5444
    2014.04.21D08:00:54.526000000 25.34      12380    41.29     5444
    2014.04.21D08:00:57.071000000 25.34      13715    41.29     5444
    

Offset Alignment

  • it is shifting the time field of one table to calculate something either side of another
  • using similar approach, calculate for every trade the VWAP for the 5 minutes following the trade (excluding the contribution from current trade)
  • we can calculate VWAP using the running sum of the size and the running sum of the size*price
    q)makedb[100000;10000]
    q)tables`
    `depth`quotes`trades
    q)trades1: update sumps: sums price*size, sumsize: sums size by sym from trades
    q)trades1
    time                          sym  src price size sumps    sumsize
    ------------------------------------------------------------------
    2022.12.25D08:00:04.569000000 CSCO N   35.48 3827 135782   3827
    2022.12.25D08:00:22.311000000 MSFT L   36.11 354  12782.94 354
    2022.12.25D08:00:27.438000000 IBM  N   43.54 2589 112725.1 2589
    2022.12.25D08:00:27.511000000 YHOO L   35.54 6956 247216.2 6956
    2022.12.25D08:00:27.906000000 IBM  N   43.54 2397 217090.4 4986
    2022.12.25D08:00:30.278000000 NOK  L   31.76 353  11211.28 353
    
  • Now we want to take just the running vwap from trades1 and shift it all by 5 mins, we can do this by creating new table

q)shiftreades: select time-0D00:05,sym,sumplus5: sumps, sumsizeplus5: sumsize from trades1
q)shiftreades
time                          sym  sumplus5 sumsizeplus5
--------------------------------------------------------
2022.12.25D07:55:04.569000000 CSCO 135782   3827
2022.12.25D07:55:22.311000000 MSFT 12782.94 354
2022.12.25D07:55:27.438000000 IBM  112725.1 2589
2022.12.25D07:55:27.511000000 YHOO 247216.2 6956
2022.12.25D07:55:27.906000000 IBM  217090.4 4986
2022.12.25D07:55:30.278000000 NOK  11211.28 353
* Now we want to join this time shifted vwap info back onto the non-shifted vwap using aj * we can have desired result by subtracting shifted from non-shifted

q)aligntab: aj[`sym`time;delete src,size from trades1;shiftreades]
q)aligntab
time                          sym  price sumps    sumsize sumplus5 sumsizeplus5
-------------------------------------------------------------------------------
2022.12.25D08:00:04.569000000 CSCO 35.48 135782   3827    1492906  42000
2022.12.25D08:00:22.311000000 MSFT 36.11 12782.94 354     507052.6 14077
2022.12.25D08:00:27.438000000 IBM  43.54 112725.1 2589    1258692  28796
2022.12.25D08:00:27.511000000 YHOO 35.54 247216.2 6956    1170549  33014
2022.12.25D08:00:27.906000000 IBM  43.54 217090.4 4986    1258692  28796
2022.12.25D08:00:30.278000000 NOK  31.76 11211.28 353     498678.1 15746
2022.12.25D08:00:30.579000000 CSCO 35.45 161518.7 4553    1571526  44201
2022.12.25D08:00:32.203000000 DELL 29.1  30758.7  1057    734669.4 25218

q)update vwapplus5: (sumplus5-sumps)%sumsizeplus5 - sumsize from delete date, time from aligntab
sym  price sumps    sumsize sumplus5 sumsizeplus5 vwapplus5
-----------------------------------------------------------
CSCO 35.48 135782   3827    1492906  42000        35.55193
MSFT 36.11 12782.94 354     507052.6 14077        36.01761
IBM  43.54 112725.1 2589    1258692  28796        43.72753
YHOO 35.54 247216.2 6956    1170549  33014        35.43376
IBM  43.54 217090.4 4986    1258692  28796        43.74641
NOK  31.76 11211.28 353     498678.1 15746        31.66809
CSCO 35.45 161518.7 4553    1571526  44201        35.56313
DELL 29.1  30758.7  1057    734669.4 25218        29.13417
ORCL 32.25 181728.8 5635    1451326  45032        32.22575
ORCL 32.26 268250.1 8317    1451326  45032        32.22324

Pivoting

  • Used to move row values into column values
  • We have a table tab which has sym, time and price
    q)tab:([]sym:`TCS`SBI`TCS`SBI`SBI;time: 09:00 09:01 09:20 09:32 09:34; size: 200 300 5000 400 350; price: 39.99 45.3 34.56 22.34 99.89)
    q)tab
    sym time  size price
    --------------------
    TCS 09:00 200  39.99
    SBI 09:01 300  45.3
    TCS 09:20 5000 34.56
    SBI 09:32 400  22.34
    SBI 09:34 350  99.89
    
  • Now we want to swap the row values to column values
  • Ex: we can use sym column as column headers
  • we get the distinct sym from table, these will form as column headers
    q)colnames: asc exec distinct sym from tab
    q)colnames
    `s#`SBI`TCS
    
  • we then extract dictionary for sym!price for each time value that we have
    q)exec sym!price by time from tab
    09:00| (,`TCS)!,39.99
    09:01| (,`SBI)!,45.3
    09:20| (,`TCS)!,34.56
    09:32| (,`SBI)!,22.34
    09:34| (,`SBI)!,99.89
    
  • then we can populate the dictionary using our colnames list
    q)pivtab: exec colnames#sym!price by time from tab
    q)pivtab
         | SBI   TCS
    -----| -----------
    09:00|       39.99
    09:01| 45.3
    09:20|       34.56
    09:32| 22.34
    09:34| 99.89
    
  • we need to name the time column
    q)pivtab: exec colnames#sym!price by time:time from tab
    q)pivtab
    time | SBI   TCS
    -----| -----------
    09:00|       39.99
    09:01| 45.3
    09:20|       34.56
    09:32| 22.34
    09:34| 99.89
    
  • finally, we can forward fill the gaps using fills
    q)pivtab: fills exec colnames#sym!price by time:time from tab
    q)pivtab
    time | SBI   TCS
    -----| -----------
    09:00|       39.99
    09:01| 45.3  39.99
    09:20| 45.3  34.56
    09:32| 22.34 34.56
    09:34| 99.89 34.56
    q)pivtab: 0^fills exec colnames#sym!price by time:time from tab
    q)pivtab
    time | SBI   TCS
    -----| -----------
    09:00| 0     39.99
    09:01| 45.3  39.99
    09:20| 45.3  34.56
    09:32| 22.34 34.56
    09:34| 99.89 34.56
    
  • simple formula can be: pivot_table: 0^fills exec (exec asc distinct col1 from tab)#col1!col2 by col3:col3 from tab
  • here col1 is column header row
  • col2 are values inside matrix
  • col3 is row of matrix
Back to top