In production-scale testing, placing the check_modified call in get_kvrange rather than get_slots made the performance difference.
It should help in get_slots as well, but it has not been possible to get reliable test coverage for this. So for now it is left out, until a proper test can be constructed which demonstrates any benefit.
When scanning over a leveled store with a helper (e.g. segment filter and last modified date range), applying the filter will speed up the query when the block index cache is available to get_slots.
Previously, if the cache was not available, leveled_sst did not promote the cache after it had accessed the underlying blocks.
Now the code does this; and once the cache has been fully populated, it extracts the largest last-modified date, so that SST files older than the passed-in date can be immediately dismissed.
This potentially reduces the overhead of scoring each file on every run.
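A minimal sketch of the dismissal check (the names are illustrative, not the actual leveled_sst state or API): once the largest last-modified date for a file is known, a query carrying a modified-date range can skip the file without touching its blocks.

```erlang
%% Sketch only: MaxLMD and the range tuple are hypothetical names,
%% not the real leveled_sst fields.
-module(lmd_skip_sketch).
-export([can_dismiss/2]).

%% If the newest entry in the file is older than the low end of the
%% requested last-modified-date range, no key in the file can match,
%% so the whole SST file is dismissed without reading any blocks.
-spec can_dismiss(non_neg_integer(),
                  {non_neg_integer(), non_neg_integer() | infinity})
        -> boolean().
can_dismiss(MaxLMD, {LowDate, _HighDate}) ->
    MaxLMD < LowDate.
```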
The change also alters the default thresholds for compaction to favour longer runs (which will tend towards greater storage efficiency).
Giving an empty file a score of 0 exposed a race condition. A file might not be active, but might still be rolling - it can then be scored as 0 and immediately compacted, and will then be removed from the journal manifest.
Check that each file is not rolling before making it a candidate for compaction.
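A hedged sketch of the candidate filter (the tuple shape is hypothetical, not the real journal manifest entry): a file is only considered for scoring if it is neither active nor still rolling.

```erlang
%% Sketch only: the {FileName, IsActive, IsRolling} shape is a
%% stand-in for the real manifest entry.
-module(compaction_candidates_sketch).
-export([candidates/1]).

%% A rolling file scored as empty (0) could otherwise be compacted
%% and removed from the manifest while still being written.
candidates(Files) ->
    [Name || {Name, false, false} <- Files].
```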
Initial test included for running with recallc, and also transition from retain to recalc.
Move all logic for the startup fold into leveled_bookie, to avoid the Inker requiring any direct knowledge of the implementation of the Penciller.
Change the penciller check so that it returns current/replaced/missing not just true/false.
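A sketch of why the three-way result is more useful than true/false when deciding what to do with a journal entry during compaction (the helper and its decision atoms are illustrative; only current/replaced/missing come from the change itself).

```erlang
%% Sketch only: keep_or_drop/1 is a hypothetical helper, not the
%% real leveled_iclerk logic.
-module(pcl_check_sketch).
-export([keep_or_drop/1]).

-spec keep_or_drop(current | replaced | missing) -> retain | drop.
keep_or_drop(current) ->
    retain;  % still the live value in the ledger - must be kept
keep_or_drop(replaced) ->
    drop;    % superseded by a later update
keep_or_drop(missing) ->
    drop.    % no longer referenced by the penciller
```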
Reduce unnecessary penciller checks for non-standard keys that will always be retained - and remove redundant code.
Expand tests of retain and recover to make sure that compaction on delete is well covered.
Also move the SQN along during initial loads - to stop an aggressive loop to find the starting SQN for every file.
Improve the speed of leveled_cdb tests by disabling sync on write.
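As a rough illustration of the underlying idea, using plain file:open/2 modes rather than the actual leveled_cdb option plumbing: dropping the sync mode is what removes the per-write flush cost in tests.

```erlang
%% Sketch only: shows the sync vs non-sync open modes, not the real
%% leveled_cdb option handling.
-module(sync_mode_sketch).
-export([open_file/2]).

open_file(Path, true) ->
    %% sync mode: every write is flushed to disk (O_SYNC), safe but slow
    file:open(Path, [binary, raw, read, write, sync]);
open_file(Path, false) ->
    %% no sync: writes hit the OS cache only, which is fine for tests
    file:open(Path, [binary, raw, read, write]).
```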
Strengthen the check of correct behaviour when compacting with a reduced journal size.
Two helpers for memory management:
1 - a scan over the CDB file may lead to a lot of binary references being made, so force a GC after the scan.
2 - the penciller files contain slots that will be frequently read, so advise the page cache to pre-load them on startup.
This is in response to unexpected memory management issues in a potentially non-conventional setup, where the Erlang VM held a lot of memory (that could have been GC'd) in preference to the page cache - and consequently disk I/O and request latency were higher than expected.
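Both helpers map onto standard calls: erlang:garbage_collect/0 for the post-scan GC, and file:advise/4 with will_need for the page-cache hint. A sketch, with the surrounding leveled plumbing omitted:

```erlang
%% Sketch of the two memory-management helpers; ScanFun and the
%% offset/length pair are placeholders for the real leveled logic.
-module(mem_helpers_sketch).
-export([scan_then_gc/2, advise_preload/2]).

%% Helper 1: a scan can create many sub-binary references into the
%% file, so force a garbage collection of this process afterwards.
scan_then_gc(ScanFun, Acc0) ->
    Result = ScanFun(Acc0),
    true = erlang:garbage_collect(),
    Result.

%% Helper 2: hint to the OS page cache that a region of the file
%% (e.g. the frequently-read slots) will be needed soon.
advise_preload(IoDevice, {Offset, Length}) ->
    file:advise(IoDevice, Offset, Length, will_need).
```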
Extracting a binary from within a binary leaves a reference to the whole of the original binary.
If there are a lot of very large objects received back-to-back, this can explode the amount of memory the penciller appears to hold (and GC cannot resolve this).
To dereference from the larger binary, a binary copy is needed.
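The dereference is the standard binary:copy/1 pattern; a sketch, with the actual extraction happening inside leveled's own code:

```erlang
%% Sketch: binary:copy/1 creates a fresh binary so the large original
%% can be garbage collected, instead of keeping a sub-binary that
%% references the whole thing.
-module(bin_copy_sketch).
-export([extract_value/3]).

extract_value(LargeBin, Offset, Length) ->
    SubBin = binary:part(LargeBin, Offset, Length),
    %% Without this copy, SubBin keeps the whole of LargeBin alive.
    binary:copy(SubBin).
```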
Confirm that this results in many more files if the slot count is reduced. The test has to handle the fact that the Level 0 file has unlimited slots, regardless of the number of slots configured.
This allows leveled_iclerk:clerk_stop to be a sync call, so that files will only be closed once the iclerk has stopped. This is designed to prevent iclerk crashes during shutdowns, when files it is depending on are closed mid-shutdown.
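In OTP terms this is the difference between a cast and a call for the stop message. A sketch of the client side only, not the real leveled_iclerk code:

```erlang
%% Sketch of the sync-stop pattern: gen_server:call blocks the caller
%% until the clerk has replied, so files the clerk depends on are not
%% closed underneath it mid-shutdown (a cast would return at once).
-module(sync_stop_sketch).
-export([clerk_stop/1]).

clerk_stop(ClerkPid) ->
    gen_server:call(ClerkPid, stop, infinity).
```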
A test that will cause leveled to crash due to a low cache size being set - but protect against this (as well as the general scenario of the cache being full).
There could be a case where an L0 file is present (post pending) without the work backlog being set. In this case we want to roll the level zero to memory, but not accept the cache update if the L0 cache is already full.
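A sketch of the acceptance guard (the size arguments are hypothetical stand-ins for the real penciller state; the point is the guard, not the data structure):

```erlang
%% Sketch only: accept a new cache update only if the L0 cache is
%% not already at capacity; otherwise the caller must back off.
-module(l0_guard_sketch).
-export([accept_push/2]).

accept_push(CurrentL0Size, MaxL0Size) when CurrentL0Size < MaxL0Size ->
    accept;
accept_push(_CurrentL0Size, _MaxL0Size) ->
    reject.
```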
This will not lead to immediate run-time changes in SST or CDB logs. These log settings will only change once the files are re-written.
To completely change the log level, a restart of the store with new startup options is necessary.
Test fails as fetching repeated object is too slow.
```
Head check took 124301 microseconds checking list of length 5000
Head check took 112286 microseconds checking list of length 5000
Head check took 1336512 microseconds checking list of length 5
2018-12-10T11:54:41.342 B0013 <0.2459.0> Long running task took 260788 microseconds with task of type pcl_head
2018-12-10T11:54:41.618 B0013 <0.2459.0> Long running task took 276508 microseconds with task of type pcl_head
2018-12-10T11:54:41.894 B0013 <0.2459.0> Long running task took 275225 microseconds with task of type pcl_head
2018-12-10T11:54:42.173 B0013 <0.2459.0> Long running task took 278836 microseconds with task of type pcl_head
2018-12-10T11:54:42.477 B0013 <0.2459.0> Long running task took 304524 microseconds with task of type pcl_head
```
It takes twice as long to check for one repeated object as it does to check for 5K non-repeated objects.
To allow for the extraction of metadata, and the building of head responses, it should be possible to dynamically add user-defined tags, and functions to handle them.
If no function is defined, revert to the behaviour of the ?STD tag.
More obvious how to extend the code as it is all in one module.
Also add a new field to the standard object metadata tuple that may in future hold other object metadata based on user-defined functions.
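A hedged sketch of the idea (the function names and the tag-to-function map are illustrative, not the actual leveled head API): user-defined tags map to extract functions, with a fallback to the standard (?STD-style) behaviour when no function is registered.

```erlang
%% Sketch only: TagFuns and extract_std/1 are hypothetical; they
%% illustrate dynamically-added tags, not the real implementation.
-module(user_tag_sketch).
-export([build_head/3]).

%% TagFuns maps a tag atom to a fun(ObjectBin) -> Metadata. If the
%% tag has no registered function, fall back to standard behaviour.
build_head(Tag, ObjectBin, TagFuns) ->
    case maps:find(Tag, TagFuns) of
        {ok, ExtractFun} ->
            ExtractFun(ObjectBin);
        error ->
            extract_std(ObjectBin)
    end.

%% Stand-in for the default (?STD-style) metadata extraction.
extract_std(ObjectBin) ->
    {standard_metadata, byte_size(ObjectBin)}.
```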
To show how this works, and prove that it does work that way.
Test may require adjusting if tested on a slow node (e.g. reduce KeyCount or increase TTL)
Both log_level and forced_logs. This allows log_level to be changed at startup and at runtime. It also allows for a list of forced logs, so that if log_level is set above info, individual info logs can be forced to be seen (such as the stats logs).
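For illustration, the startup options might be supplied roughly as below; the option keys and the log reference atoms are placeholders and may not match the actual names in leveled's API.

```erlang
%% Illustration only: option keys and log references are placeholders.
-module(log_opts_sketch).
-export([startup_opts/0]).

startup_opts() ->
    [{root_path, "/var/db/leveled"},
     %% raise the overall level so routine info logs are suppressed
     {log_level, warn},
     %% force individual info-level logs (e.g. stats) to still appear
     {forced_logs, [b0015, b0016]}].
```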