Use 4 keys in the bloom (which is closer to the optimal size). This should halve the fpr - as we can now use the large ExtraHash rather than being constrained by the SegmentHash here.
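As a rough illustration of the expected effect (a sketch, not leveled's code - the bits-per-key ratio below is an assumption), the standard bloom false-positive approximation can be compared at 2 and 4 hashes:

    %% Sketch only: the standard bloom fpr approximation (1 - e^(-K*N/M))^K,
    %% where K = number of hashes, N = keys, M = bits in the filter.
    -module(bloom_fpr_sketch).
    -export([fpr/3]).

    fpr(K, N, M) ->
        math:pow(1 - math:exp(-K * N / M), K).

    %% With an assumed 8 bits per key:
    %% bloom_fpr_sketch:fpr(2, 1000, 8000) ~ 0.049
    %% bloom_fpr_sketch:fpr(4, 1000, 8000) ~ 0.024 - roughly half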
Switch from magic hash to md5 - to hopefully remove the need for some
of the artificial jumps required to get expected false positive ratios.
Also split the hash into two 16-bit integers. We assume that SegmentID
(from the perspective of AAE merkle/tictac trees) will always be at
least 16 bits. The idea is that hashes should be used in blooms and
indexes such that some advantage can be gained from just knowing the
SegmentID - in particular when folding over all the keys in a bucket.
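A minimal sketch of the shape this takes (the exact bit split and the names are illustrative, not necessarily the leveled implementation):

    %% Sketch: md5 the key and expose the first 16 bits as a SegmentHash
    %% (lining up with an AAE SegmentID of at least 16 bits), and the next
    %% 16 bits as an ExtraHash for use in blooms and indexes.
    -module(segment_hash_sketch).
    -export([segment_hash/1]).

    segment_hash(KeyBin) when is_binary(KeyBin) ->
        <<SegmentHash:16/integer, ExtraHash:16/integer, _Rest/binary>> =
            crypto:hash(md5, KeyBin),
        {SegmentHash, ExtraHash}.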
Performance testing has been difficult so far - I think due to “cloud”
mysteries.
As described in https://github.com/martinsumner/leveled/issues/92
Only the first fix was made.
Just to be safe - archiving means renaming to another file with a different extension. The assumption is that renamed files can be manually reaped if necessary.
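For illustration only - the extension and function name below are assumptions, not what leveled actually uses:

    %% Sketch: archive by renaming to a different extension rather than
    %% deleting, so the file can be manually reaped later if needed.
    -module(archive_sketch).
    -export([archive_file/1]).

    archive_file(Filename) ->
        ArchiveName = filename:rootname(Filename) ++ ".bak",
        file:rename(Filename, ArchiveName).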
Because there's no sensible way of using it if objects are mutable - you still end up with the same false positives in the tictactree.
Didn't fully roll back the change, as spec and docs were added which could be useful going forward.
This is an interim stage towards enhancing the proxy object so that it contains more helper information (other than size).
The aim is to be able to run more efficient fold_heads queries that might filter on LMD range (so as not to have to co-ordinate the running of comparative queries). For example, if producing a tictactree to compare between two different offsets, a max LMD could be passed in so that changes beyond the time the first query was requested can be ignored.
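A hedged sketch of how a max LMD might be applied in a fold_heads accumulator (the accessor is passed in because the real proxy object format is not assumed here):

    %% Sketch: build a fold fun that ignores any head whose last modified
    %% date falls after MaxLMD, i.e. changes made after the first of the
    %% comparative queries was requested. LMDFun extracts the LMD from the
    %% proxy object - its real shape in leveled is not assumed here.
    -module(lmd_filter_sketch).
    -export([make_fold_fun/2]).

    make_fold_fun(MaxLMD, LMDFun) ->
        fun(_Bucket, _Key, ProxyObj, Acc) ->
            case LMDFun(ProxyObj) =< MaxLMD of
                true  -> Acc + 1;  % in range - include in the comparison
                false -> Acc       % changed after the query was requested
            end
        end.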
The idea is that sometimes you may wish to compare a tictac tree between leveled and something that doesn't understand erlang:phash or term_to_binary. So allow the magic_hash to be used instead - and perhaps an extract function that does base64 encoding or something similar.
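A sketch of the shape of that option (names are illustrative, and the base64 extract is just the one possibility mentioned above):

    %% Sketch: let the caller choose the extract and hash functions used
    %% when building the tictac tree, so a non-Erlang store can reproduce it.
    -module(tictac_extract_sketch).
    -export([tree_hash/2]).

    tree_hash(Key, native) ->
        %% the existing default: Erlang-specific serialisation and hash
        erlang:phash2(term_to_binary(Key));
    tree_hash(Key, {HashFun, ExtractFun}) ->
        %% e.g. ExtractFun = fun(K) -> base64:encode(term_to_binary(K)) end,
        %% and HashFun a DJ Bernstein-style magic hash over that binary
        HashFun(ExtractFun(Key)).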
Need to know {Bucket, Key} not just Key if all buckets are being covered
by nrt aae. So shoehorning this in - this will also allow for proper use
of FilterFun when filtering by partition.
With basic ct test.
Doesn't currently prove expiry of index. Doesn't prove ability to find
segments.
Assumes that either "all" buckets or a special list of buckets require
indexing this way. Will lead to unexpected results if the same bucket
name is used across different Tags.
The format of the index has been chosen so that hopefully standard index
features can be used (e.g. return_terms).
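Purely for illustration - the index name, term layout and date encoding below are all assumptions, not the format leveled settled on:

    %% Sketch: an AAE index entry whose term leads with a last-modified date
    %% (so date ranges can be selected and old entries expired) and carries
    %% the SegmentID, while the entry itself is keyed by {Bucket, Key} so
    %% return_terms-style queries hand back everything needed for comparison.
    -module(aae_index_sketch).
    -export([index_entry/4]).

    index_entry(Bucket, Key, SegmentID, DateBin) when is_binary(DateBin) ->
        Term = <<DateBin/binary, ".",
                 (integer_to_binary(SegmentID))/binary>>,
        {<<"$aae">>, Term, {Bucket, Key}}.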
Remove the ets index in pmem and use a binary index instead. This may
be slower, but avoids the bulk upload to ets, and means that matches
know the position (so only skiplists with a match need be tried).
Also stops the discrepancy between snapshots and non-snapshots - as
previously the snapshots were always slowed by not having access to the
ETS table.
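A sketch of the lookup idea (the entry widths are assumptions; the real leveled index layout may differ):

    %% Sketch: the penciller memory keeps a flat binary of
    %% {HashSlot, CachePosition} pairs rather than an ETS index. A lookup
    %% walks the binary and returns only the cache positions whose slot
    %% matches, so only the skiplists at those positions need to be checked.
    -module(pmem_index_sketch).
    -export([add_entry/3, matching_positions/2]).

    add_entry(IndexBin, HashSlot, Position) ->
        <<IndexBin/binary, HashSlot:24/integer, Position:8/integer>>.

    matching_positions(IndexBin, HashSlot) ->
        matching_positions(IndexBin, HashSlot, []).

    matching_positions(<<>>, _HashSlot, Acc) ->
        lists:reverse(Acc);
    matching_positions(<<Slot:24/integer, Pos:8/integer, Rest/binary>>,
                       HashSlot, Acc) when Slot =:= HashSlot ->
        matching_positions(Rest, HashSlot, [Pos|Acc]);
    matching_positions(<<_Slot:24/integer, _Pos:8/integer, Rest/binary>>,
                       HashSlot, Acc) ->
        matching_positions(Rest, HashSlot, Acc).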
The full riak metadata had been stripped from the Ledger update for
performance reasons. However, the full metadata is required in order to
save a GET before a PUT. Therefore we want to do isolated testing on
this change to establish whether the extra cost is worth that saving.
It is expensive on the CPU - but it leads to a 4x increase in cache
coverage.
Try to make some small micro-gains in list handling in create_block.
Move to using the DJ Bernstein Magic Hash consistently, and try to
make sure we only hash once for each operation (as the hash is more
expensive than phash2).
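For reference, a DJ Bernstein-style hash has the shape below (a sketch of the general djb2-xor form, not necessarily byte-for-byte what leveled_codec implements); the intent is then to carry the resulting hash alongside the key rather than rehashing at each stage:

    %% Sketch: djb2 (xor variant) over the binary form of the key, kept to
    %% 32 bits. Compute once per operation and pass around with the key.
    -module(magic_hash_sketch).
    -export([magic_hash/1]).

    magic_hash(Key) ->
        hash_bytes(term_to_binary(Key), 5381) band 16#FFFFFFFF.

    hash_bytes(<<>>, H) ->
        H;
    hash_bytes(<<B:8/integer, Rest/binary>>, H) ->
        hash_bytes(Rest, ((H bsl 5) + H) bxor B).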
The improved lookup time for missing keys should allow for the L0 index
to be removed, and hence speed up the completion time for push_mem
operations.
It is expected there will be a second stage of creating a tinybloom as
part of the SFT creation process, and then adding that tinybloom to the
manifest. This will then reduce the message passing required for a GET
not in the cache or higher levels.
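A sketch of the intended flow once the bloom sits in the manifest (the record fields and the check/fetch functions are placeholders, not leveled's API):

    %% Sketch: a GET that misses the cache can test the per-file tinybloom
    %% held in the manifest before messaging the SFT file process at all.
    -module(manifest_bloom_sketch).
    -export([maybe_fetch/5]).

    -record(manifest_entry, {filename, owner, bloom}).

    maybe_fetch(Key, Hash, #manifest_entry{bloom = Bloom, owner = FilePid},
                CheckFun, FetchFun) ->
        case CheckFun(Hash, Bloom) of
            false -> not_present;            % no message passed to the file
            true  -> FetchFun(FilePid, Key)  % may still be a false positive
        end.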
Change the extraction of Riak metadata.
In Riak-based volume tests the writing of SFT files is tanking. Could
this be the "extra" metadata? i.e. there are only current plans to look
at the vclock, and the sibling count is free to fetch. If we just get
these two items, will it take less CPU to extract the metadata, and
will the reduced weight also reduce the downstream impact?
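A hedged sketch of pulling just those two items out (this assumes the v1 riak_object binary layout; the pattern is illustrative, not verified against the Riak source):

    %% Sketch: assuming a riak_object v1 binary of the form
    %% <<Magic:8, Vers:8, VclockLen:32, Vclock:VclockLen/binary,
    %%   SibCount:32, Sibs/binary>>,
    %% take only the vclock binary and the sibling count, without touching
    %% the sibling data itself.
    -module(riak_md_sketch).
    -export([vclock_and_sibcount/1]).

    vclock_and_sibcount(<<_Magic:8/integer, _Vers:8/integer,
                          VclockLen:32/integer, VclockBin:VclockLen/binary,
                          SibCount:32/integer, _SibsBin/binary>>) ->
        {VclockBin, SibCount}.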