forked from cruppstahl/upscaledb
-
Notifications
You must be signed in to change notification settings - Fork 0
/
TODO
300 lines (247 loc) · 11.9 KB
/
TODO
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
x improve unittests
x erlang does not build because ./configure overwrites CFLAGS
x cleanup cursor code
x clean up the public interface
x BtreeCursor: use intrusive_list for linked list
x BtreeCursor: do not expose parent pointer
x LocalDb: cleanup the whole code
x LocalCursor: cleanup private functions
x create cursors on the stack
x avoid frequent cloning of cursors
x the LocalCursor should have 3 states:
- nil
- coupled to btree
- coupled to txn
x is_nil()
x is_btree_active()
x is_txn_active()
x set_to_nil()
x activate_btree(bool exclusive = false)
x activate_txn(bool exclusive = false)
x cleanup transactions
x hide refcount in base class ("ReferenceCounted")
x use intrusive_list.h for transactions in TxnManager
x get rid of the Context object (see trello comments)
x remove context.env
x improve journal performance
The journal only needs to be updated if a transaction is committed, but
not if it's aborted or for each separate operation.
x no need to have buffers in the journal - flush them after each
transaction was committed
x append to log only when txn is committed
In txn->commit: foreach (TxnOperation *op)
journal->append(op);
journal->append_commit(txn); // flush journal buffer
x fix Journal/recoverAfterChangesetAndCommit2Test
x refactor the code flow
x flush_transaction_to_journal();
x flush_transaction_to_changeset();
x flush_changeset_to_journal();
x flush_changeset_to_file();
x batch-flush multiple transactions
x manual test: make sure that the file sizes do not explode!
x make sure that the recovery tests are running stable
x run perftests/benchmarks
x improve MaskedVbyte
x review the current sources
x use an interface similar to libfor
x compression (32bit, 64bit, sorted, unsorted)
x decompression (32bit, 64bit, sorted, unsorted)
x add function to select from a stream (32bit, 64bit)
x add function to linear_search from an unsorted stream (32bit, 64bit)
x add function to lower_bound search from a sorted stream (32bit, 64bit)
x add function to append to a stream (32bit, 64bit)
x add function to get compressed size of a stream (32bit, 64bit)
x scalar version must not use ANY SIMD!
x use lookup table for select()? run benchmarks - no!
x make sure to use the newest version of MaskedVbyte
x improve test.cc - it has lots of duplicate code
x test.cc - make sure asserts also fail in release mode
x use MaskedVbyte if SIMD is available
x header file: when is "length" number of integers or size of block?
use "size" vs "length"
x LICENSE should be APL2
x add README.md file
x need a run-time check to disable MaskedVbyte on older platforms
x publish and promote
x remove os_get_simd_lane_width
x more Btree refactoring
x once again clean up the code
x move range size, LocalDb*, BtreeNode* to the base class
x common base class for KeyList and RecordList
x the namespaces (Zint32, Pax, ...) are not required
x deprecate support for the GroupVarbyte and StreamVbyte codecs
x document pro/cons for each codec in the header file
x VARBYTE and MASKEDVBYTE become synonyms
x use libvbyte for the VARBYTE codec
x make sure to use libvbyte w/o SIMD if --disable-simd is specified
x investigate a few issues that were reported by users
x tcmalloc does not return memory to OS (in txn with 10 mio ops) - ok
x c++ API improve documentation (UPS_AUTO_CLEANUP is ignored)
x ups_env_open with CRC32 *and* UPS_AUTO_RECOVERY *and* fsync -> fails?
x make sure that UQI queries with blobs use mmap, and do not copy the blob!
x recovery tests are brittle and sometimes fail
(enable core files and wait for segfault)
-> error inducer fires in a separate thread -> application terminates
x ignore failures in the perl script
x recovering "erase": the duplicate ID is lost!? - verify and fix
-> needs to insert with a "duplicate position" - i.e. DUPLICATE_FIRST
x fix segfault when reading an empty file
(happened a few times with the mysql plugin)
x do not flush committed transactions immediately, but batch them
x add option to disable this feature (UPS_FLUSH_TRANSACTIONS_IMMEDIATELY)
x add to ups_bench
x make sure that transactions are flushed when the env is closed
x mysql benchmark does not work
x run monster tests
x add UPS_FLUSH_TRANSACTIONS_IMMEDIATELY
x remove libuv, use boost::asio
-> the libuv dependency is a hassle; it's too difficult to install
and blocks end users from using upscaledb
x replace the code; make sure all unittests are working
x clean up handle management code
x check against memory leaks
x make sure all config settings are applied
x only bind to localhost (but make this configurable)
x fix the tests
x new API for bulk operations
x take care: internal key/record storage will be overwritten in subsequent
calls!
x add unittests
x add unittests w/ user-allocated keys and records
x add unittests w/ approx matching - make sure that keys are copied!
x add negative unittests
x test with valgrind
x use from remote
x add unittests (re-use existing ones)
x use from Java
x use from .NET
x windows build: update to protobuf 2.6.1 (fixes a memory leak)
o prebuilt windows files: add JAR file to the package!
-- file format changes --------------------------------------------------------
o move to C++11
o clang-tidy with
modernize-use-nullptr -use-override -use-using -use-default
-use-bool-literals -use-auto -make-unique
o can we improve space efficiency of the blobs? create header structures
with less overhead?
-> innodb has even higher overhead per blob, but don't require a freelist.
instead, deleted blobs are flagged as 'deleted'.
o reduce blob overhead to 10 bytes (1 byte flags, 1 byte checksum of the
id, 8 bytes allocated size/real size)
o get rid of the freelist; use a counter for the number of deleted blobs,
the total size of free bytes and track the largest free blob
o can we improve performance of duplicate keys? they are slow if the
duplicate tables grow fast
alternatives: blobs form a linked list
-> innodb uses a linked list of blobs
alternatives: use a tree structure instead of a duplicate table
- suggestion: create a linked list of duplicates, but: needs double linked
list (+ 16 bytes!)
-> but this will only work for blobs, not for inline records!
-> not good!!
o cheap solution: split the big table into smaller ones; purge when one is
empty
o run tests for the best block size
o use a faster crc32 algorithm (see Daniel Lemire's blog)
o The PageManager state is currently stored in a compressed encoding, but
it is less efficient than the standard varbyte encoding because
pages > 15 * page_size have to be split. Use a standard vbyte encoding
instead (it will anyway be required later on).
. create a nodejs-wrapper
. improve Java API wrapper (see trello)
. investigate the use of more simd code
x download/install http://ispc.github.io/documentation.html
x run the samples
x read the documentation
o create and benchmark a prototype, i.e. for "sum" (uint32_t)
x write the ispc functions
o use them in the code
o create benchmarking functions
o with and without ispc
o then extend the prototype for other platforms (sse2, sse4, avx, avx2)
and choose the correct one at runtime
o fix the other plugins (avg, sum)
o need to test the build process with and without ispc
. BlobManager: move to Database
o the Environment maybe also needs one? not sure
o move record compressor to BlobManager
o each database keeps its "last known blob page"
o blob pages are not shared between databases
o can we remove the "db" pointer from the Context structure?
. BlobManagerDisk: improve the read() code path. First check if a mapped
pointer is sufficient. Get a pointer to the header structure, then
skip the header and return the pointer.
Otherwise decompress and/or return a deep copy.
. new test case for cursors
insert (1, a)
insert (1, b) (duplicate of 1)
move (last) (-> 1, b)
insert (1, c)
move (last) (-> 1, c)? is the dupecache updated correctly?
. libvbyte
. add function to insert into a compressed sorted stream (32bit, 64bit)
. add function to delete from a compressed sorted stream (32bit, 64bit)
------------------- idea soup ---------------------------------------------
- compilation for ARM:
sudo apt-get install g++-arm-linux-gnueabihf
./configure --host=arm-linux-gnueabihf --disable-simd --enable-debug
o add tests to verify that the cursor is not modified if an operation fails!
(in cursor.cpp:LongTxnCursorTest are some wrapper functions to move or
insert the cursor; that's a good starting point)
o More things to refactor in the btree
o EraseAction uses duplicate_index + 1, InsertAction uses duplicate_index
-> use a common behaviour/indexing
o EraseAction line 71: if the node is empty then it should be merged and
moved to the freelist!
o when splitting and HAM_HINT_APPEND is set, the new page is appended.
do the same for prepend!
o Implement record compression - a few notes
ByteSlice: Pushing the Envelop of Main Memory Data
Processing with a New Storage Layout
http://delivery.acm.org/10.1145/2750000/2747642/p31-feng.pdf
1) the user defines the record structure.
2) an optimization stage reorders the record columns to optimize storage
(i.e. with dynamic programming)
3) SIMD code is generated on the fly to pack, unpack records and single
elements (see http://stackoverflow.com/questions/4911993/how-to-generate-and-run-native-code-dynamically or ask Ben/Andi...)
4) Pack like PAX (group all column values together) or each record
standalone?
o look for a better compression for DefaultRecordList, i.e.
- Each group is a GroupedVarInt w/ 4 bits per entry; a 64bit
number can then hold flags for 16 numbers
-> (but maybe increase this to hold at least 32 or 64 numbers, to
reduce the overhead ratio)
o create a GroupedVarInt<Max, T> class, where |Max| is the maximum number
of elements that are grouped, and T is the type of these elements
(i.e. uint64_t)
-> memory is managed by the caller
-> the state (i.e. used block size etc) is stored externally, and
managed by the caller
o append a key
o prepend a key
o insert a key in the middle
o grow blocks
o split blocks
o can perform copies w/o re-compressing
o try to move the Zint32 index to a base class
o Use small index which stores offset + bits for each group
o a separate bit is used to signal whether the (compressed) number is
a record id
o avoid ripple effects by growing/splitting the block
o use compression also for duplicate records
i.e. use GroupedVarint for inline duplicates
o Concurrency: merge BtreeUpdates in the background
o when recovering, give users the choice if active transactions should be
aborted (default behavior) or re-created
o needs a function to enumerate them
o A new transactional mode: read-only transactions can run "in the past" - only
on committed transactions. therefore they avoid conflicts and will always
succeed.
o need a function to get the txn of a conflict (same as in v2)
ups_status_t ups_txn_get_conflicting_txn(ups_txn_t *txn, ups_txn_t **other);
oder: txn-id zurückgeben? sonst gibt's ne race condition wenn ein anderer
thread "other" committed/aborted
o also add to c++ API
o add documentation (header file)
o add documentation (wiki)