Skip to content

Attributes

kdb has 4 attributes that can be applied to datastructures (generally lists)

Attr Symbol Description
sorted `s# Items are sorted - binary search
grouped `g# A dictionary maps each occurence to position in array
unique `u# no duplicates
parted `p# stored same items together; dictionary maps first occurence
  • Each attribute provides different performance benefits
  • Attributes may be lost during modification

  • Applying attributes

    q)list: 1 2 3 4 5
    q)list
    1 2 3 4 5
    q)list: `s#1 2 3 4 5
    q)list
    `s#1 2 3 4 5
    

  • we can check which attributes have been applied by checking meta data of the table

    q)\l 2_where_clause_fakedb.q
    USAGE: makedb[NUM QUOTES;NUM TRADES] eg makedb[100000;10000]
    
    makedb1[NUM QUOTES;NUM TRADES;DATE;RANDOMISED COUNT FACTOR] eg makedb1[100000;10000;.z.d;.3]
    
    makehdb[HDBDIR;NUM DAYS;APPROXIMATE NUM QUOTES PER DAY; APPROXIMATE NUM TRADES PER DAY] eg makehdb[`:hdb; 5; 100000; 10000]
    
    makecsv[CSVDIR;NUM DAYS;NUM QUOTES;NUM TRADES]
    
    q)makedb[10000;50000]
    q)meta trades
    c    | t f a
    -----| -----
    time | p    
    sym  | s   g
    src  | s   g
    price| f    
    size | i    
    

  • we can also update a table by applying an attribute to a specific column

q)update time:`s#time from `trades
`trades
q)meta trades
c    | t f a
-----| -----
time | p   s
sym  | s   g
src  | s   g
price| f    
size | i    
  • applying attribute to a column in table may improve query speed
  • they are commonly applied to column with many repeating values

1. Sorted `s

  • tells q that list is sorted in ascending order
  • if we use function asc to sorted a list, then q will automatically add `s# attribute
  • Just having a list in ascending order wont apply `s# attribute, we need to explicitly tell q to apply it
    q)list: 1 2 3 4 5
    q)list
    1 2 3 4 5
    q)asc list
    `s#1 2 3 4 5
    q)list: `s#list
    q)list
    `s#1 2 3 4 5
    
  • `s# attribute optimises the searching on list by performing binary search on the list
  • with `s# attribute search takes approx same time for element at start, mid and end
  • without `s# attribute search takes more time as we search for element at end
q)a: til 100000
q)a
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 ..
q)b:`s#til 100000
q)b
`s#0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 ..
q)\t do[1000;a?2]
0
q)\t do[10000;a?2]
4
q)\t do[10000;a?1000]
20
q)\t do[10000;a?10000-4]
87
q)\t do[10000;a?100000-4]
505
q)\t do[10000;b?100000-4]
11
q)\t do[10000;b?10000-4]
10
q)\t do[10000;b?1000]
7
q)\t do[10000;b?2]
8

2. Unique `u

  • tells q that list has unique elements and behind the scenes q creates a hash map, so it can query the elements in constant time
    q)n:1000000
    q)list: til n
    q)listu: `u#til n
    q)\t do[10000;list?`int$n%2]
    2364
    q)\t do[10000;listu?`int$n%2]
    24
    
  • `u# attribute is preserved when appending to list provided list still has unique elements after appending

3. Grouped `g

  • when `g# attribute is applied to a list, q forms a lookup table of elements in list against list of their indexes
q)52?`1
`o`d`d`h`e`d`l`e`m`p`a`l`k`k`p`p`f`d`m`h`f`g`o`o`i`n`d`m`e`l`f`n`a`b`g`f`f`o`..
// lookup table looks something like this:
q)group 52?`1
a| 0 13 17 42 44
i| 1 4
l| ,2
f| 3 9 29 32 38
h| 5 18 21 25 26
b| 6 49
d| 7 14 23 27 41
c| 8 15 35 45
g| 10 28
m| 11 16 31 47
k| 12 19 22 30 33 34
j| 20 46
p| 24 51
e| 36 48
o| 37 39 40
n| 43 50
  • it causes q to take more time while applying `g# attribute and it takes more space(to store lookup table)
    q)n
    1000000
    q)syms: -1000?`4
    q)t1:([]a:n#syms;b:n?1000 )
    q)tg:([]a:`g#n#syms;b:n?1000 )
    q)\t do[10000;select from t1 where a=`lghi]
    4930
    q)\t do[10000;select from tg where a=`lghi]
    19
    q)
    

4. Parted `p

  • similar to grouped but here q only stores index of first occurence of an element
  • `p# is not preserved when appending to list even if elements added maintains list partitions
  • This is mostly used for on-disk data

    q)list
    `a`a`a`a`b`b`b`c`c
    q)`p#list
    `p#`a`a`a`a`b`b`b`c`c
    q)pl: `p#list
    q)pl
    `p#`a`a`a`a`b`b`b`c`c
    // underlying lookup table looks something like this:
    q)first each group pl
    a| 0
    b| 4
    c| 7
    q)pl,`c // appending `c at end causes pl to lose `p# 
    `a`a`a`a`b`b`b`c`c`c
    

  • when querying from HDB we should use partition column first(date/month/year) and then use column with attribute, if we use any other column after date then attribute is lost

    q)\l .
    q)quotes
    date       sym  time                          src bid ask bsize asize
    ---------------------------------------------------------------------
    2014.04.21 aacj 2014.04.21D08:52:45.114000000 O           1500  5000
    2014.04.21 aacj 2014.04.21D09:56:58.529000000 L           9500  8000
    2014.04.21 aacj 2014.04.21D10:11:02.073000000 O           500   5500
    2014.04.21 aacj 2014.04.21D11:05:48.110000000 O           5000  10000
    2014.04.21 aacj 2014.04.21D11:57:34.256000000 N           6500  7000
    2014.04.21 aacj 2014.04.21D13:13:43.109000000 O           6500  4000
    2014.04.21 aacj 2014.04.21D14:53:19.787000000 N           2500  7500
    2014.04.21 aacj 2014.04.21D15:42:19.164000000 L           4000  1000
    2014.04.21 aacj 2014.04.21D16:14:48.307000000 L           9500  1500
    2014.04.21 aacj 2014.04.21D16:24:17.107000000 O           8000  7500
    2014.04.21 aakk 2014.04.21D08:33:13.751000000 N           6500  6500
    2014.04.21 aakk 2014.04.21D09:00:10.954000000 N           5500  4000
    2014.04.21 aakk 2014.04.21D09:10:39.732000000 N           9000  6000
    2014.04.21 aakk 2014.04.21D10:31:16.183000000 L           7500  9500
    2014.04.21 aakk 2014.04.21D11:05:31.205000000 L           3000  2500
    2014.04.21 aakk 2014.04.21D11:06:45.876000000 O           8000  6500
    2014.04.21 aakk 2014.04.21D11:44:05.643000000 O           2000  8500
    2014.04.21 aakk 2014.04.21D11:47:34.308000000 N           8000  1000
    2014.04.21 aakk 2014.04.21D12:25:59.515000000 L           10000 5000
    2014.04.21 aakk 2014.04.21D12:26:02.080000000 N           1000  9500
    ..
    q)meta quotes
    c    | t f a
    -----| -----
    date | d
    sym  | s   p
    time | p
    src  | s
    bid  | f
    ask  | f
    bsize| i
    asize| i
    q)meta select from quotes where date=2014.04.21
    c    | t f a
    -----| -----
    date | d
    sym  | s   p
    time | p
    src  | s
    bid  | f
    ask  | f
    bsize| i
    asize| i
    q)meta select from quotes where date=2014.04.21,bid>50
    c    | t f a
    -----| -----
    date | d
    sym  | s
    time | p
    src  | s
    bid  | f
    ask  | f
    bsize| i
    asize| i
    

Speacial Case - step function using `s

  • we can use `s# to create step function
  • if `s# is not applied we cant search for values in between
  • if `s# is applied we get values which is prevaling the argument value

    q)d:1 1.6 1.9 2.1!`a`b`c`d
    q)d
    1  | a
    1.6| b
    1.9| c
    2.1| d
    q)d[1.0]
    `a
    q)d[1.1]
    `
    q)d:`s#1 1.6 1.9 2.1!`a`b`c`d
    q)d[1.0]
    `a
    q)d[1.1]
    `a
    q)d[1.6]
    `b
    q)d[1.85]
    `b
    

  • same can be applied to tables as well

    q)t:`s#([sym:`A`A`B;date:2014.12.05 2014.12.22 2015.01.10] px: 10 12 15)
    q)t
    sym date      | px
    --------------| --
    A   2014.12.05| 10
    A   2014.12.22| 12
    B   2015.01.10| 15
    q)meta t
    c   | t f a
    ----| -----
    sym | s   p
    date| d
    px  | j
    q)t(`A;2014.12.05)
    px| 10
    q)t(`A;2014.12.21)
    px| 10
    q)t(`A;2015.12.21)
    px| 12
    q)t(`B;2015.12.21)
    px| 15
    

Back to top