Commit graph

133 commits

Author SHA1 Message Date
Martin Sumner
4e9fa2a206 Timeout long-running snapshots
Add logic to timeout long-running snapshots.
2017-04-05 09:16:01 +01:00
martinsumner
80b62cbff2 Debug excessive log
Logs excessively during 2i tests.  Set to debug for now, until can think
further about this
2017-03-16 20:03:18 +00:00
martinsumner
8b2091cef7 Remaining ledger snapshots log -> debug
This log under 2i load appears thousands of times per second.  Not
sustainable as an info log.  Will need to think about how to manage
this, but setting back to debug for now
2017-03-16 19:37:39 +00:00
martinsumner
4f0622d2ac Merge remote-tracking branch 'refs/remotes/origin/mas-sstblock-i42' into mas-sstblockv2-i42 2017-03-13 21:09:13 +00:00
martinsumner
c5bb150f97 Drop some logs
Not found to be interesting so far
2017-03-13 20:30:33 +00:00
martinsumner
5311a157d5 Merge remote-tracking branch 'refs/remotes/origin/mas-sstblock-i42' into mas-sstblockv2-i42 2017-03-13 19:22:41 +00:00
martinsumner
f3e962c43a Add level to SST slow fetch log 2017-03-13 12:16:36 +00:00
martinsumner
b2f3d882a9 Draft of branch to condense range_only keys 2017-03-10 20:43:37 +00:00
martinsumner
9ad6969b0d Seed randomnes at Actor startup 2017-03-06 21:35:02 +00:00
martinsumner
87f2c5d7ae Merge remote-tracking branch 'origin/mas-2iphase2-i34' into mas-2iphase2-i34
# Conflicts:
#	src/leveled_log.erl
2017-03-06 18:44:22 +00:00
martinsumner
c92107e4b4 2i order of events
When running a load of mainly 2i queries, there is a huge cost in the
previous snapshot code.  The time taken to create a clone of the
Penciller (duplicating all the LoopState) varied between 1 and 200ms
depedning on the size of the LoopState.

For 2i queries, most of that LoopState was then being thrown away after
running the query against the levelzero_cache.  This was taking < 1ms on
average.  It would be better to avoid the o(100)ms of CPU  burning and
block for o(1)ms - so th eorder of events have been changed ot filter
first so only the small part of the LoopState actually required is
copied to the clone.
2017-03-06 18:42:32 +00:00
Martin Sumner
c1dc92720c Random, random, random
well random had me foxed.  As the clone was short-lived process it only
called random once - and so always got the same answer.

random has to be seeded to give different answers when called once from
a process - so this is now seeded in leveed_log
2017-03-06 13:51:38 +00:00
martinsumner
eb6f668fcd Use log at random
Easy way to sample frequent things - especially when they'r ein ocverag
equeries
2017-03-06 10:34:56 +00:00
martinsumner
5c2f05858d Alter logging to help understand performance factors
Change logging of the snapshots to better understand performance
2017-03-06 10:17:51 +00:00
martinsumner
fad9bfbff1 Trigger log points, deprecate lower PUT log points
Only record PUT timings at the Bookie, and trigger them correctly on the
right log point
2017-02-26 19:07:55 +00:00
martinsumner
adb78f5c5a Resolve pclerk crash
Need to add extra logging to understand why pclerk crashes in some
volume tests.

Penciller's Clerk <0.813.0> shutdown now complete for reason
{badarg,[{dict,fetch,[63,{dict,0,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}],[{file,[100,105,99,116,46,101,114,108]},{line,126}]},{leveled_pclerk,handle_cast,2,[{file,[115,114,99,47,108,101,118,101,108,101,100,95,112,99,108,101,114,107,46,101,114,108]},{line,96}]},{gen_server,handle_msg,5,[{file,[103,101,110,95,115,101,114,118,101,114,46,101,114,108]},{line,604}]},{proc_lib,init_p_do_apply,3,[{file,[112,114,111,99,95,108,105,98,46,101,114,108]},{line,239}]}]}

Should only be prompted after prompt deletions had bene updated.
Perhaps a race whereby somehow it is prompted again after it is emptied.
2017-02-09 14:33:39 +00:00
martinsumner
d57b74d967 Re-introduce tinybloom to SST
This had been removed due to the CPU cost of adding - however then the
tinybloom wa simplemented by directly manipulating bits through binary
comprehension - rather than applying bor band bsl bsr operations.

With these operations the cost of producing and checking the bloom is
<10% by comparison.
2017-01-24 21:51:12 +00:00
martinsumner
861cedf45e Add back missed space in logs 2017-01-23 19:16:17 +00:00
martinsumner
fb896f13b1 Improve logging - add timestamp to logs 2017-01-23 18:56:01 +00:00
martinsumner
90c920fe86 Additional unit test work
Reverts a previous ct test fix
2017-01-23 15:15:40 +00:00
Martin Sumner
2c4c5c9597 Corrections
Support an empty list of entries being added.  Also specify a tree
correctly in all from_orderedlist scenarios
2017-01-23 00:22:53 +00:00
martinsumner
14ebf94e56 Correct botched manifest entry removal 2017-01-17 10:53:13 +00:00
martinsumner
5076187545 Reduce log level of troubleshooting log 2017-01-17 10:39:31 +00:00
martinsumner
5adfb5c5ef Extra logging for troubleshooting 2017-01-17 10:32:15 +00:00
martinsumner
e6c4c9eff8 Log deletions from mnaifest (via GC) 2017-01-15 11:04:26 +00:00
martinsumner
13c5cc8899 Add timing points
Add some timing points to manifest updates
2017-01-15 10:36:50 +00:00
martinsumner
7b0b3e9b83 Add logging of Manifest SQN at startup 2017-01-14 22:26:26 +00:00
martinsumner
dcf3afc056 Log basement setting when creating files 2017-01-14 21:19:51 +00:00
martinsumner
85c03a61f9 Log changes 2017-01-14 20:11:01 +00:00
martinsumner
13c81f0ed1 Basic working
Some basic tests working - but still outstanding issues.
2017-01-14 19:41:09 +00:00
martinsumner
0204a23a58 Refactor - STILL BROKEN
Will at least compile, but in need of a massive eunit rewrite and
associated debug to get back to a potentially verifiable state again
2017-01-13 18:23:57 +00:00
martinsumner
08641e05cf Manifest changes - BROKEN
Going to abandond this branch for now.  The change is beoming
excessively time consuming, and it is not clear that a smaller change
might not achieve more of the objectives.

All this is broken - but perhaps could get picke dup another day.
2017-01-12 13:48:43 +00:00
martinsumner
ed27a53452 New manifest code
The manifest had previously been a list for eveyr leevl of the manifest,
and keys were found by folding over the list.  By Level 4 the list will
be 4096 items long, and so the fold would be expensive, and would be
required many times.

To make this less expensive an ETS table is to use.  However, the ETS
table needs to be shared between snapshots and so in order to use the
ETS the entries to the table need to support multi-versioning - whereby
each clone can see a version of the table at the Manifest SQN the clone
is supporting.
2017-01-09 14:52:26 +00:00
martinsumner
70c6e52fa7 Remove logs for slot_cache 2017-01-03 15:27:28 +00:00
martinsumner
b6ae0e1af5 Fix broken SST cache 2017-01-03 13:03:59 +00:00
martinsumner
31d4346806 Log improvements
Log on bad CRC, and also not seeing SST timing logs, so log these more
frequently
2017-01-02 18:54:19 +00:00
Martin Sumner
2079fff7f8 Switched to indexed blocks as slot implementation
Prior to this refactor, the slot and been made up of four blocks with
an external binary index.  Although the form of the index has changed
again, micro-benchmarking once again showed that this was a relatively
efficient mechanism.
2017-01-02 10:47:04 +00:00
martinsumner
0c543ae3ec Remove legacy logs 2016-12-29 05:10:11 +00:00
martinsumner
e01b310d20 Handle production of empty file 2016-12-29 05:09:47 +00:00
martinsumner
dc28388c76 Removed SFT
Now moved over to SST on this branch
2016-12-29 02:07:14 +00:00
martinsumner
3716de1c82 Revert back to sampling
timing logs should be based on a sample
2016-12-28 15:49:58 +00:00
martinsumner
cbad375373 Refactoring of skiplist ranges and support for sst ranges
the Skiplist range code was needlessly complicated.  It may be faster
than the new code, but the complexity delta cannot be support for such a
small change.

This was incovered whilst troubleshooting the initial kv range test.
2016-12-28 15:48:04 +00:00
martinsumner
0d0ab32653 Some end-to-end testing 2016-12-24 17:48:31 +00:00
martinsumner
58d8e60994 Some basic code layout work 2016-12-24 15:12:24 +00:00
martinsumner
0ddaaf9ac3 Stopped unnecessary seek for last_key
When rolling we already know the last_key - no need to seek for it on
startup.

The time it takes for this seek needs to be considered with regards to
startup time.  Can we do without knowing lastkey?
2016-12-22 19:51:39 +00:00
martinsumner
a131b99082 Randomising logging of PUT timings 2016-12-22 17:33:14 +00:00
martinsumner
353fb08e21 Randomise logging of GET/HEAD samples 2016-12-22 17:28:41 +00:00
martinsumner
ee534081c3 Reduce log levels
Remove some log noise to debug level
2016-12-22 17:15:42 +00:00
martinsumner
151dd3ab70 Sample only for HEAD/GET response times
Report regularly but only on a sample
2016-12-22 16:47:36 +00:00
Martin Sumner
676e8fa494 Add Get Timing 2016-12-22 15:45:38 +00:00