jsx/doc/overview.edoc

@author Alisdair Sullivan <alisdairsullivan@yahoo.ca>
@copyright 2010 Alisdair Sullivan
@version really, really beta
@title jsx
@doc jsx is a json parser with an easily transformable representation, the ability to parse streams incrementally, low memory overhead and a clean generator based interface. it also includes an implementation of <a href="http://www.erlang.org/eeps/eep-0018.html">eep0018</a>, a json reformatter and a json verifier


== contents ==

<ol>
    <li>{@section introduction}</li>
    <li>{@section features}</li>
    <li>{@section usage}</li>
    <li>{@section contributing}</li>
    <li>{@section acknowledgements}</li>
</ol>


== introduction ==

what's a language without strong json integration? one that no one is gonna use for much of anything that requires integration with other systems

erlang has a number of json libraries, some of them very good, but all of them tied to fairly specific representations and usage scenarios. working with json in erlang, outside of simple decoding into terms that probably aren't ideal, is laborious and clumsy. jsx seeks to correct this unfortunate state


== features ==

jsx is not an end to end json parser. jsx takes a json document and produces a generator that returns a jsx event and a new generator. a jsx event is an atom or tuple that represents a json structural element (like the start or end of an object or array) or a json value (like a string or a number). this provides a simple, easy to consume, iterative api that makes it easy to produce arbitrarily complex representations of the json document

the representation of jsx events was chosen for pragmatic reasons. strings, integers and floats are encoded in json as strings of unicode characters and many erlang functions operate on lists so returning them as lists of unicode codepoints is both efficient and convenient. structural elements are converted to descriptive atoms for ease of matching and clarity. json literals (`true', `false' and `null') are encoded as atoms for ease of matching and wrapped in a tagged tuple to differentiate them from structural elements

in cases where an incomplete json document is supplied to the parser, upon reaching the end of the document the generator may also return a new function that allows another chunk of the json document to be parsed as if parsing were never interrupted. this is useful for parsing very large json documents (to avoid holding the entire document in memory) or for parsing as data is made available (like over a network or from storage)

jsx attempts to follow the <a href="http://www.ietf.org/rfc/rfc4627.txt?number=4627">json specification</a> as closely as possible but is realistic about actual usage of json and provides optional support for comments, json values not wrapped in a json object or array and streams containing multiple json documents

jsx is wicked fast. sort of. it's as fast as possible without sacrificing a useable interface, at least. things like efficiency of binary matching, elimination of unused intermediate states, the relative costs of using lists or binaries and even garbage collection were tested and taken into consideration during development


== usage ==

`jsx:parser()' is the entry point for a jsx parser. it returns a function that takes a binary and attempts to parse it as if it were a chunk of valid (or not so valid, depending on the options passed to `jsx:parser/1') json encoded in utf8, utf16 (big or little endian) or utf32 (big or little endian). it's an incremental parser, it's parsed on demand; an 'event' at a time. events are tuples of the form:

```
    {event, Event, Next}
      Event = start_object
        | start_array
        | end_object
        | end_array
        | {string, [Character]}
        | {integer, [Character]}
        | {float, [Character]}
        | {literal, true}
        | {literal, false}
        | {literal, null}
        | end_json
      Character -- a unicode codepoint represented by an erlang integer
      Next -- a function of arity zero that, when invoked, returns the next event
'''

the decoder can also return two other tuples:

```
    {incomplete, More}
      More -- a function of arity 1 that accepts additional input for 
      the parser and resumes parsing as if never interrupted. the semantics 
      are as if the new binary were appended to the already parsed binary
      
    {error, badjson}
'''
    
`incomplete' is returned when input is exhausted. `error' is returned when invalid json input is detected. how obvious

putting all of this together, the following short module:

```
    -module(jsx_ex).

    -export([simple_decode/1]).

    simple_decode(JSON) when is_binary(JSON) ->
        P = jsx:parser(),
        decode(P(JSON), []).

    decode({event, end_json, _Next}, Acc) -> 
        lists:reverse(Acc);    
    decode({event, Event, Next}, Acc) -> 
        decode(Next(), [Event] ++ Acc).
'''

does this when called from the shell with suitable input:

```
    1> jsx_ex:simple_decode(
           <<"{
               \"dead rock stars\": [\"kurt cobain\", \"elliott smith\", \"nicky wire\"], 
               \"total\": 3.0, 
               \"courtney killed kurt\": true
           }">>
       ).
    [start_object,
     {key,"dead rock stars"},
     start_array,
     {string,"kurt cobain"},
     {string,"elliott smith"},
     {string,"nicky wire"},
     end_array,
     {key,"total"},
     {float,"3.0"},
     {key,"courtney killed kurt"},
     {literal,true},
     end_object]
'''

     
jsx also has an eep0018 decoder and encoder, a json pretty printer and a json verifier:

```
    1> jsx:json_to_term(<<"[1, true, \"hi\"]">>).
    [1,true,<<"hi">>]
    2> jsx:term_to_json([{name, <<"alisdair">>}, {balance, -3742.35}, {employed, false}]).
    <<"{\"name\":\"alisdair\",\"balance\":-3742.35,\"employed\":false}">>
    3> jsx:is_json(<<"{}">>).
    true
    4> jsx:format(<<"  \t\t   [\n   \"i love whitespace\"\n ]\n ">>).
    <<"[\"i love whitespace\"]">>
'''


== contributing ==

jsx is available on <a href="http://github.com/talentdeficit/jsx">github</a>. users are encouraged to fork, edit and make pull requests


== acknowledgments ==

jsx wouldn't be possible without Lloyd Hilaiel's <a href="http://lloyd.github.com/yajl/">yajl</a> and the encouragement of the erlang community. thanks also must be given to the <a href="https://mail.mozilla.org/listinfo/es-discuss">es-discuss mailing list</a> and Douglas Crockford for insight into the intricacies of json
edoc documentation added 2010-08-19 23:30:22 -07:00			`@author Alisdair Sullivan <alisdairsullivan@yahoo.ca>`
			`@copyright 2010 Alisdair Sullivan`
			`@version really, really beta`
			`@title jsx`
			`@doc jsx is a json parser with an easily transformable representation, the ability to parse streams incrementally, low memory overhead and a clean generator based interface. it also includes an implementation of <a href="http://www.erlang.org/eeps/eep-0018.html">eep0018</a>, a json reformatter and a json verifier`


			`== contents ==`

			`<ol>`
			`<li>{@section introduction}</li>`
			`<li>{@section features}</li>`
			`<li>{@section usage}</li>`
			`<li>{@section contributing}</li>`
			`<li>{@section acknowledgements}</li>`
			`</ol>`


			`== introduction ==`

			`what's a language without strong json integration? one that no one is gonna use for much of anything that requires integration with other systems`

			`erlang has a number of json libraries, some of them very good, but all of them tied to fairly specific representations and usage scenarios. working with json in erlang, outside of simple decoding into terms that probably aren't ideal, is laborious and clumsy. jsx seeks to correct this unfortunate state`


			`== features ==`

			`jsx is not an end to end json parser. jsx takes a json document and produces a generator that returns a jsx event and a new generator. a jsx event is an atom or tuple that represents a json structural element (like the start or end of an object or array) or a json value (like a string or a number). this provides a simple, easy to consume, iterative api that makes it easy to produce arbitrarily complex representations of the json document`

minor updates to edocs 2010-08-20 18:25:06 -07:00			the representation of jsx events was chosen for pragmatic reasons. strings, integers and floats are encoded in json as strings of unicode characters and many erlang functions operate on lists so returning them as lists of unicode codepoints is both efficient and convenient. structural elements are converted to descriptive atoms for ease of matching and clarity. json literals (`true', `false' and `null') are encoded as atoms for ease of matching and wrapped in a tagged tuple to differentiate them from structural elements
edoc documentation added 2010-08-19 23:30:22 -07:00
			`in cases where an incomplete json document is supplied to the parser, upon reaching the end of the document the generator may also return a new function that allows another chunk of the json document to be parsed as if parsing were never interrupted. this is useful for parsing very large json documents (to avoid holding the entire document in memory) or for parsing as data is made available (like over a network or from storage)`

			`jsx attempts to follow the <a href="http://www.ietf.org/rfc/rfc4627.txt?number=4627">json specification</a> as closely as possible but is realistic about actual usage of json and provides optional support for comments, json values not wrapped in a json object or array and streams containing multiple json documents`

			`jsx is wicked fast. sort of. it's as fast as possible without sacrificing a useable interface, at least. things like efficiency of binary matching, elimination of unused intermediate states, the relative costs of using lists or binaries and even garbage collection were tested and taken into consideration during development`


			`== usage ==`

			`jsx:parser()' is the entry point for a jsx parser. it returns a function that takes a binary and attempts to parse it as if it were a chunk of valid (or not so valid, depending on the options passed to `jsx:parser/1') json encoded in utf8, utf16 (big or little endian) or utf32 (big or little endian). it's an incremental parser, it's parsed on demand; an 'event' at a time. events are tuples of the form:

			```
			`{event, Event, Next}`
			`Event = start_object`
			`\| start_array`
			`\| end_object`
			`\| end_array`
			`\| {string, [Character]}`
			`\| {integer, [Character]}`
			`\| {float, [Character]}`
			`\| {literal, true}`
			`\| {literal, false}`
			`\| {literal, null}`
			`\| end_json`
minor updates to edocs 2010-08-20 18:25:06 -07:00			`Character -- a unicode codepoint represented by an erlang integer`
edoc documentation added 2010-08-19 23:30:22 -07:00			`Next -- a function of arity zero that, when invoked, returns the next event`
			`'''`

			`the decoder can also return two other tuples:`

			```
			`{incomplete, More}`
			`More -- a function of arity 1 that accepts additional input for`
			`the parser and resumes parsing as if never interrupted. the semantics`
			`are as if the new binary were appended to the already parsed binary`

			`{error, badjson}`
			`'''`

			`incomplete' is returned when input is exhausted. `error' is returned when invalid json input is detected. how obvious

			`putting all of this together, the following short module:`

			```
			`-module(jsx_ex).`

			`-export([simple_decode/1]).`

			`simple_decode(JSON) when is_binary(JSON) ->`
			`P = jsx:parser(),`
			`decode(P(JSON), []).`

			`decode({event, end_json, _Next}, Acc) ->`
			`lists:reverse(Acc);`
			`decode({event, Event, Next}, Acc) ->`
			`decode(Next(), [Event] ++ Acc).`
			`'''`

			`does this when called from the shell with suitable input:`

			```
			`1> jsx_ex:simple_decode(`
			`<<"{`
			`\"dead rock stars\": [\"kurt cobain\", \"elliott smith\", \"nicky wire\"],`
			`\"total\": 3.0,`
			`\"courtney killed kurt\": true`
			`}">>`
			`).`
			`[start_object,`
			`{key,"dead rock stars"},`
			`start_array,`
			`{string,"kurt cobain"},`
			`{string,"elliott smith"},`
			`{string,"nicky wire"},`
			`end_array,`
			`{key,"total"},`
			`{float,"3.0"},`
			`{key,"courtney killed kurt"},`
			`{literal,true},`
			`end_object]`
			`'''`


			`jsx also has an eep0018 decoder and encoder, a json pretty printer and a json verifier:`

			```
			`1> jsx:json_to_term(<<"[1, true, \"hi\"]">>).`
			`[1,true,<<"hi">>]`
			`2> jsx:term_to_json([{name, <<"alisdair">>}, {balance, -3742.35}, {employed, false}]).`
			`<<"{\"name\":\"alisdair\",\"balance\":-3742.35,\"employed\":false}">>`
			`3> jsx:is_json(<<"{}">>).`
			`true`
			`4> jsx:format(<<" \t\t [\n \"i love whitespace\"\n ]\n ">>).`
			`<<"[\"i love whitespace\"]">>`
			`'''`


			`== contributing ==`

			`jsx is available on <a href="http://github.com/talentdeficit/jsx">github</a>. users are encouraged to fork, edit and make pull requests`


			`== acknowledgments ==`

			`jsx wouldn't be possible without Lloyd Hilaiel's <a href="http://lloyd.github.com/yajl/">yajl</a> and the encouragement of the erlang community. thanks also must be given to the <a href="https://mail.mozilla.org/listinfo/es-discuss">es-discuss mailing list</a> and Douglas Crockford for insight into the intricacies of json`