Attributes
kdb has 4 attributes that can be applied to datastructures (generally lists)
Attr | Symbol | Description |
---|---|---|
sorted | `s# | Items are sorted - binary search |
grouped | `g# | A dictionary maps each occurence to position in array |
unique | `u# | no duplicates |
parted | `p# | stored same items together; dictionary maps first occurence |
- Each attribute provides different performance benefits
-
Attributes may be lost during modification
-
Applying attributes
-
we can check which attributes have been applied by checking meta data of the table
q)\l 2_where_clause_fakedb.q USAGE: makedb[NUM QUOTES;NUM TRADES] eg makedb[100000;10000] makedb1[NUM QUOTES;NUM TRADES;DATE;RANDOMISED COUNT FACTOR] eg makedb1[100000;10000;.z.d;.3] makehdb[HDBDIR;NUM DAYS;APPROXIMATE NUM QUOTES PER DAY; APPROXIMATE NUM TRADES PER DAY] eg makehdb[`:hdb; 5; 100000; 10000] makecsv[CSVDIR;NUM DAYS;NUM QUOTES;NUM TRADES] q)makedb[10000;50000] q)meta trades c | t f a -----| ----- time | p sym | s g src | s g price| f size | i
-
we can also update a table by applying an attribute to a specific column
q)update time:`s#time from `trades
`trades
q)meta trades
c | t f a
-----| -----
time | p s
sym | s g
src | s g
price| f
size | i
- applying attribute to a column in table may improve query speed
- they are commonly applied to column with many repeating values
1. Sorted `s
- tells q that list is sorted in ascending order
- if we use function
asc
to sorted a list, then q will automatically add `s# attribute - Just having a list in ascending order wont apply `s# attribute, we need to explicitly tell q to apply it
- `s# attribute optimises the searching on list by performing binary search on the list
- with `s# attribute search takes approx same time for element at start, mid and end
- without `s# attribute search takes more time as we search for element at end
q)a: til 100000
q)a
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 ..
q)b:`s#til 100000
q)b
`s#0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 ..
q)\t do[1000;a?2]
0
q)\t do[10000;a?2]
4
q)\t do[10000;a?1000]
20
q)\t do[10000;a?10000-4]
87
q)\t do[10000;a?100000-4]
505
q)\t do[10000;b?100000-4]
11
q)\t do[10000;b?10000-4]
10
q)\t do[10000;b?1000]
7
q)\t do[10000;b?2]
8
2. Unique `u
- tells q that list has unique elements and behind the scenes q creates a hash map, so it can query the elements in constant time
- `u# attribute is preserved when appending to list provided list still has unique elements after appending
3. Grouped `g
- when `g# attribute is applied to a list, q forms a lookup table of elements in list against list of their indexes
q)52?`1
`o`d`d`h`e`d`l`e`m`p`a`l`k`k`p`p`f`d`m`h`f`g`o`o`i`n`d`m`e`l`f`n`a`b`g`f`f`o`..
// lookup table looks something like this:
q)group 52?`1
a| 0 13 17 42 44
i| 1 4
l| ,2
f| 3 9 29 32 38
h| 5 18 21 25 26
b| 6 49
d| 7 14 23 27 41
c| 8 15 35 45
g| 10 28
m| 11 16 31 47
k| 12 19 22 30 33 34
j| 20 46
p| 24 51
e| 36 48
o| 37 39 40
n| 43 50
- it causes q to take more time while applying `g# attribute and it takes more space(to store lookup table)
4. Parted `p
- similar to grouped but here q only stores index of first occurence of an element
- `p# is not preserved when appending to list even if elements added maintains list partitions
-
This is mostly used for on-disk data
-
when querying from HDB we should use partition column first(date/month/year) and then use column with attribute, if we use any other column after date then attribute is lost
q)\l . q)quotes date sym time src bid ask bsize asize --------------------------------------------------------------------- 2014.04.21 aacj 2014.04.21D08:52:45.114000000 O 1500 5000 2014.04.21 aacj 2014.04.21D09:56:58.529000000 L 9500 8000 2014.04.21 aacj 2014.04.21D10:11:02.073000000 O 500 5500 2014.04.21 aacj 2014.04.21D11:05:48.110000000 O 5000 10000 2014.04.21 aacj 2014.04.21D11:57:34.256000000 N 6500 7000 2014.04.21 aacj 2014.04.21D13:13:43.109000000 O 6500 4000 2014.04.21 aacj 2014.04.21D14:53:19.787000000 N 2500 7500 2014.04.21 aacj 2014.04.21D15:42:19.164000000 L 4000 1000 2014.04.21 aacj 2014.04.21D16:14:48.307000000 L 9500 1500 2014.04.21 aacj 2014.04.21D16:24:17.107000000 O 8000 7500 2014.04.21 aakk 2014.04.21D08:33:13.751000000 N 6500 6500 2014.04.21 aakk 2014.04.21D09:00:10.954000000 N 5500 4000 2014.04.21 aakk 2014.04.21D09:10:39.732000000 N 9000 6000 2014.04.21 aakk 2014.04.21D10:31:16.183000000 L 7500 9500 2014.04.21 aakk 2014.04.21D11:05:31.205000000 L 3000 2500 2014.04.21 aakk 2014.04.21D11:06:45.876000000 O 8000 6500 2014.04.21 aakk 2014.04.21D11:44:05.643000000 O 2000 8500 2014.04.21 aakk 2014.04.21D11:47:34.308000000 N 8000 1000 2014.04.21 aakk 2014.04.21D12:25:59.515000000 L 10000 5000 2014.04.21 aakk 2014.04.21D12:26:02.080000000 N 1000 9500 .. q)meta quotes c | t f a -----| ----- date | d sym | s p time | p src | s bid | f ask | f bsize| i asize| i q)meta select from quotes where date=2014.04.21 c | t f a -----| ----- date | d sym | s p time | p src | s bid | f ask | f bsize| i asize| i q)meta select from quotes where date=2014.04.21,bid>50 c | t f a -----| ----- date | d sym | s time | p src | s bid | f ask | f bsize| i asize| i
Speacial Case - step function using `s
- we can use `s# to create step function
- if `s# is not applied we cant search for values in between
-
if `s# is applied we get values which is prevaling the argument value
-
same can be applied to tables as well
q)t:`s#([sym:`A`A`B;date:2014.12.05 2014.12.22 2015.01.10] px: 10 12 15) q)t sym date | px --------------| -- A 2014.12.05| 10 A 2014.12.22| 12 B 2015.01.10| 15 q)meta t c | t f a ----| ----- sym | s p date| d px | j q)t(`A;2014.12.05) px| 10 q)t(`A;2014.12.21) px| 10 q)t(`A;2015.12.21) px| 12 q)t(`B;2015.12.21) px| 15