Two helpers for memory management:
1 - a scan over the cdb file may lead to a lot of binary references being made. So force a GC fater the scan.
2 - the penciller files contain slots that will be frequently read - so advice the page cache to pre-load them on startup.
This is in response to unexpected memory mangement issues in a potentially non-conventional setup - where the erlang VM held a lot of memory (that could be GC'd , in preference to the page cache - and consequently disk I/O and request latency were higher than expected.
Extracting binary from within a binary leaves a reference to the whole of the original binary.
If there are a lot of very large objects received abck toback - this can explode the amount of memory the penciller appears to hold (and gc cannot resolve this).
To dereference from the larger binary, need to do a binary copy
When there is hevay PUT load, leveled_sst files could go into the delete-pending state befre the GC message is receieved - and the GC message would then interrupt the timeout cycle and lead ot the file not being GC'd until close.
in OTP R16 (and perhaps other OTP releases) there is a failure to fully garbage collect leveled_sst files after thya have initialised. They sppear to maintain a 4MB "hangover" from the initialisation process.
This can be removed by manually calling garbage_collect. So we do this now on all new non-L0 files. A L0 file will be short-lived or switched - short-lived and it doesn't matter, switched and this is already GC'd.
The L0 Pid has used a lot of memory in the construction of the file (something like 50MB). This won't be GC'd immediately. This is fine, as this will normally be short-lived. However if the SST file is switched levels ... then this may mean thta we have multiple SST files with memory not being GC'd.
This is now down on an async message passing loop between the penciller and the new SST file. this way when the penciller it shuts down, and can call close on a L0 file that is awaiting a fetch - rather than be trapped in deadlock.
The deadlock otherwise occurs if a penciller is sent a close immediately after if thas prompted a new level zero.
Make sure there is no change pending regardless of why maybe_roll_memory has been called.
Also, check that the manifest SQN has been incremented before accepting the change.
Conflict here would lead to data loss in the penciller, so extra safety is important.
the Journla snapshot is not a true snapshot, in that the active file in the snapshot can still be taking appends. So when getting a snapshot it is necessary to check if folding over the snapshot that the SQN is <= JournalSQN when the snapshot is taken.
Normally consistency of the snapshot is managed as the operation depends on the penciller, and the penciller *is* a snapshot. Not in this case, as the penciller will return true on a sqn check if the pcl SQN is behind the Journal. So the Journal folder, has been given an additionla check to stop at the JournalSQN.
This is perhaps a fault in the pcl check sqn, which should only return true on an exact match? I'm nervous about changing this though, so we have a less pure fix for now.
These logs duplicate information being received from other logs, so reduced to debug.
The long running test needs to change with the LONG_RUNNING macro
This test (prior tot his commit) works fine. However, something about running it with riak `make test` causes it to fail. The process crashes when the file:delete(F2) is called.
As the test works in isolation on R16, and also because aprocess crashing in the real world in this stage would not be the end of the world (this whole part was added as a way of dealing with some unlikely but possible tidy-up scenarios in eqc tests), losing the test in R16 is tolerable.
So In R16 tests (which includes riak make test), the delete will no longer be called in this test.