Destructuring JSON in Erlang Made Easy

October 18, 2010

Ever search Google for ways to decode JSON in Erlang only to come out feeling dirty on the other end? Are you sick of writing cryptic patterns to pull out a value only a few levels deep? Does the thought of pulling apart a deeply nested JSON object in Erlang make you wince? Wouldn't it be nice if you could access JSON in Erlang as easily and succinctly as you do in JavaScript? If you answered yes to any of these questions then you'll want to read further.

Destructuring JSON in Erlang

You've got some JSON in an Erlang string. The mochijson2 module is on your code path. You call mochijson2:decode/1 passing it your string. You get...this?

1> Obj = mochijson2:decode("{ \"post\": { \"title\": \"Destructuring JSON in Erlang Made Easy\" }}"). 
{struct,[{<<"post">>,
          {struct,[{<<"title">>,
                    <<"Destructuring JSON in Erlang Made Easy">>}]}}]}

Ouch! That's a far cry from the simple object you get in JavaScript. Furthermore, if you want to access the title of the post, you have two choices, neither of which is ideal.

  1. Pattern match the whole thing to pull out the title. Sure, you could use _ (underscore) to make it shorter, but then you lose some readability.

    2> {struct, [{<<"post">>, {struct, [{<<"title">>, Title}]}}]} = Obj. 
    {struct,[{<<"post">>,
              {struct,[{<<"title">>,
                        <<"Destructuring JSON in Erlang Made Easy">>}]}}]}
    3> Title.
    <<"Destructuring JSON in Erlang Made Easy">>
    
  2. A combination of {struct, Foo} and proplists:get_value/2 (along with lists:nth/2 for JavaScript arrays). This approach generalizes to any structure, a property I'll make use of shortly.

    4> {struct, InnerObj} = Obj.
    {struct,[{<<"post">>,
              {struct,[{<<"title">>,
                        <<"Destructuring JSON in Erlang Made Easy">>}]}}]}
    5> Post = proplists:get_value(<<"post">>, InnerObj).
    {struct,[{<<"title">>,
              <<"Destructuring JSON in Erlang Made Easy">>}]}
    6> {struct, InnerObj2} = Post.
    {struct,[{<<"title">>,
              <<"Destructuring JSON in Erlang Made Easy">>}]}
    7> Title = proplists:get_value(<<"title">>, InnerObj2).
    <<"Destructuring JSON in Erlang Made Easy">>
    8> Title.
    <<"Destructuring JSON in Erlang Made Easy">>
    

Destructuring JSON in JavaScript

Considering that JSON is essentially a subset of JavaScript, it seems only logical that the canonical way of accessing JSON should be found in JavaScript's syntax.

var Obj = { post: { title: "Destructuring JSON in Erlang Made Easy" }};
undefined
Obj.post.title;
"Destructuring JSON in Erlang Made Easy"

Right, so how do we go about translating Obj.post.title into a series of Erlang expressions?

Find the Pattern

Let's go back to the point I made above about technique #2 being general. Each object is "unwrapped" by pattern matching it with {struct, Foo}, and each property's value is obtained with proplists:get_value/2. These two operations can be repeated as many times as necessary to reach the desired value. Notice also that each . (dot) in the JavaScript syntax corresponds to one such pair of operations, while each [n] corresponds to lists:nth(n + 1, Array), since lists:nth/2 is one-based. What if we had a way to take an Erlang string containing JavaScript syntax and generate a function that returns the appropriate value? That is, a function that takes a string and generates a function that accesses the decoded (mochijson2:decode/1) JSON.
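To make the pattern concrete, here is a minimal sketch that applies those two operations repeatedly, driven by a list of steps rather than a JavaScript string. The module json_path and the function get_in/2 are hypothetical names, not part of mochijson2; the sketch assumes the {struct, Props} representation shown above, binary keys, and zero-based integer indices.

```erlang
-module(json_path).
-export([get_in/2]).

%% Hypothetical helper (not part of mochijson2): walk a term produced by
%% mochijson2:decode/1 along a list of steps. A binary step unwraps
%% {struct, Props} and looks the key up; an integer step indexes a JSON
%% array, zero-based like JavaScript.
get_in(Path, JSON) ->
    lists:foldl(fun step/2, JSON, Path).

%% One "." in JavaScript syntax: unwrap the object, then get the value.
step(Key, {struct, Props}) when is_binary(Key) ->
    proplists:get_value(Key, Props);
%% One "[n]" in JavaScript syntax: lists:nth/2 is one-based, hence + 1.
step(Index, Array) when is_integer(Index), is_list(Array) ->
    lists:nth(Index + 1, Array).
```

For example, json_path:get_in([<<"post">>, <<"title">>], Obj) follows the same route as Obj.post.title. A missing key yields undefined, and any step after it fails with a function_clause error; that is a deliberate simplification in this sketch.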

Neotoma

Neotoma is a packrat parser generator for PEGs, written in Erlang by Sean Cribbs [1]. What exactly that means is not nearly as important as what it allows you to do. In this case, Neotoma lets us take what is essentially an external DSL, i.e. the subset of JavaScript syntax used to access an object, and generate Erlang code from it. It does this via PEGs and literal Erlang code [2].

The PEG

object <- var path `
[_Var, PathFun] = Node,
fun(JSON) ->
  PathFun(JSON)
end
`;
path <- "." var path? / "[" int "]" path? `
case Node of
  [".", Key, []] ->
    fun({struct, Obj}) ->
      proplists:get_value(Key, Obj)
    end;
  [".", Key, PathFun] ->
    fun({struct, Obj}) ->
      V = proplists:get_value(Key, Obj),
      PathFun(V)
    end;
  ["[", I, "]", []] ->
    fun(Array) ->
      lists:nth(I + 1, Array)
    end;
  ["[", I, "]", PathFun] ->
    fun(Array) ->
      V = lists:nth(I + 1, Array),
      PathFun(V)
    end
end
`;
int <- [0-9]+ `list_to_integer(Node)`;
var <- [_a-zA-Z] [_a-zA-Z0-9]* `list_to_binary(Node)`;

There are four non-terminals in this PEG: object, path, int and var. Each symbol on the left-hand side of <- is a non-terminal, and the expression on the right-hand side is called a parsing expression. The notation is similar to BNF, except that its choice operator (| in BNF, / in PEG) is ordered: the alternatives are tried left to right, and the first successful match wins.

Following each parsing expression is Erlang code surrounded by ` (back ticks). This code will be executed for each match of the corresponding expression and the matching tokens will be bound to the variable Node. This is what allows you to generate code directly in the grammar.
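To see what the grammar recognizes, here is a hand-written recursive descent sketch of the same rules, without Neotoma and without the semantic actions: it turns an accessor string into a list of steps (binaries for keys, integers for indices). The module name path_parse is hypothetical; this is not the code Neotoma generates, and invalid input simply crashes instead of producing a parse error.

```erlang
-module(path_parse).
-export([parse/1]).

%% Hand-written sketch mirroring the PEG rules (not Neotoma output).
%% Turns "Var.key[0]..." into a list of steps: binaries are object keys,
%% integers are zero-based array indices. Bad input crashes.

%% object <- var path   (the leading variable name is discarded)
parse(String) ->
    {_Var, Rest} = var(String),
    path(Rest).

%% path <- "." var path? / "[" int "]" path?
%% Clauses are tried top to bottom, like the PEG's ordered choice.
path([$. | Rest]) ->
    {Key, Rest1} = var(Rest),
    [Key | optional_path(Rest1)];
path([$[ | Rest]) ->
    %% int <- [0-9]+ ; the closing "]" must follow the digits
    {Digits, [$] | Rest1]} = lists:splitwith(fun is_digit/1, Rest),
    [list_to_integer(Digits) | optional_path(Rest1)].

%% path? : an empty remainder means there is no further path
optional_path([]) -> [];
optional_path(Rest) -> path(Rest).

%% var <- [_a-zA-Z] [_a-zA-Z0-9]*
var([C | Rest]) when C =:= $_; C >= $a, C =< $z; C >= $A, C =< $Z ->
    {Tail, Rest1} = lists:splitwith(fun is_word_char/1, Rest),
    {list_to_binary([C | Tail]), Rest1}.

is_digit(C) -> C >= $0 andalso C =< $9.

is_word_char(C) ->
    C =:= $_ orelse (C >= $a andalso C =< $z)
        orelse (C >= $A andalso C =< $Z) orelse is_digit(C).
```

For example, path_parse:parse("Obj.person.friends[1]") returns [<<"person">>, <<"friends">>, 1]. Here the two alternatives of path/1 start with different characters, so clause order never changes the result; in a real PEG, ordering matters when alternatives overlap.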

1> neotoma:file("destructure_json.peg").
ok

Feeding the PEG to Neotoma, as above, generates an Erlang source file named destructure_json.erl that can parse the small subset of JavaScript defined in the PEG. If you read the code embedded in the grammar, you'll see that it generates nested anonymous functions, each picking apart a portion of the object and then feeding the value to the next function, until the path is exhausted.
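The nesting of those anonymous functions can also be built by hand. The sketch below (hypothetical module path_fun, not Neotoma output) folds a list of steps, binaries for object keys and zero-based integers for array indices, into a chain of funs: each applies one step and passes the result to the fun for the following step.

```erlang
-module(path_fun).
-export([compile/1]).

%% Hypothetical sketch (not Neotoma output): build nested anonymous
%% functions from a list of steps, in the spirit of the grammar's actions.
compile(Steps) ->
    %% Fold from the right so the innermost fun handles the last step;
    %% each new fun applies its own step, then calls the fun it wraps.
    lists:foldr(fun(Step, Next) ->
                        fun(Value) -> Next(step(Step, Value)) end
                end,
                fun(Value) -> Value end,
                Steps).

%% Object key: unwrap {struct, Props} and look the key up.
step(Key, {struct, Props}) when is_binary(Key) ->
    proplists:get_value(Key, Props);
%% Array index: zero-based, while lists:nth/2 is one-based.
step(Index, Array) when is_integer(Index), is_list(Array) ->
    lists:nth(Index + 1, Array).
```

For example, F = path_fun:compile([<<"post">>, <<"title">>]) gives a fun where F(Obj) returns the post's title. Folding from the right mirrors how the grammar's actions place each PathFun inside the fun for the preceding step.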

The generated module exports two functions, file/1 and parse/1. Once destructure_json.erl has been compiled, you can call destructure_json:parse("Obj.post.title") and get a function back.

1> destructure_json:parse("Obj.post.title").
#Fun<destructure_json.24.9136604>

Main Attraction

First, let's create a small wrapper around the generated parser in order to make it easier to use.

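-module(json).
-export([destructure/2]).

%% Parse a JavaScript-style accessor such as "Obj.post.title" into an
%% accessor fun, then apply it to a term from mochijson2:decode/1.
%% Note: OTP 27 and later ship a stdlib module named json, which clashes
%% with this module name; choose a different name on those releases.
destructure(JS, JSON) ->
    F = destructure_json:parse(JS),
    F(JSON).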

Now we can use it like so.

1> Obj = mochijson2:decode("{ \"post\": { \"title\": \"Destructuring JSON in Erlang Made Easy\" }}").
{struct,[{<<"post">>,
          {struct,[{<<"title">>,
                    <<"Destructuring JSON in Erlang Made Easy">>}]}}]}
2> json:destructure("Obj.post.title", Obj).
<<"Destructuring JSON in Erlang Made Easy">>
3> Obj2 = mochijson2:decode("{ \"person\": { \"name\": \"ryan\", \"friends\": [ \"Brendan\", \"Smokey\", \"Bams\" ] }}").
{struct,[{<<"person">>,
          {struct,[{<<"name">>,<<"ryan">>},
                   {<<"friends">>,[<<"Brendan">>,<<"Smokey">>,<<"Bams">>]}]}}]}
4> json:destructure("Obj.person.friends[1]", Obj2).
<<"Smokey">>

  1. My first attempt at better JSON access was to use Erlang parse transformations, which led me to a great series of posts by Sean in which he describes how he implemented Neotoma using parse transformations. It was then that a lightbulb went off in my head: I could use Neotoma to parse the JavaScript string (which is essentially an external DSL) and dynamically generate Erlang code. You can find this excellent series of posts here and here.

  2. PEG stands for Parsing Expression Grammar. A PEG is a lot like a BNF grammar in that you describe a grammar using non-terminals and terminals, from which a parser can be generated. PEGs, however, are unambiguous: a given input has at most one parse tree. Neotoma then adds the ability to attach semantics, in the form of Erlang code, to whichever rule matched in the PEG. Put another way, you can think of Neotoma as a concise way to generate Erlang code from BNF-like syntax; I guess you could say it produces a very concise recursive descent parser.