Dataspace 12: AR3 to AR4, or [`] = []

Prelude:
Dataspace 11: AR2, A Better Array Representation

Okay, that didn’t quite work.

I tried to implement a Javascript parser for AR2, and it kept feeling like it was much more complicated than it ought to be. Eventually I figured out that the problem was the distinction between NIL [] and EMPTY [`]. EMPTY just… did not like existing.

So I took the plunge and added an axiom that NIL == EMPTY, calling AR2 with this axiom AR3, and everything seemed to be much better. I wrote a working Javascript parser for AR3 and it was okay-ish.

But then I started to feel that even AR3 had some problems. The main one being that it loses the array character: you can’t feed any AR3 array to a standard Javascript function that expects an array, and get a sensible result. You always have that extra cell on the end which is not part of the array.

So I took the next step, a new notation I call AR4. This one seems better.

AR4 is an array with three slots.

[Head,Body,Tail]

Head is the head (Prolog functor equivalent). It is NIL [] if the term is a List.

Body is an array. Each element of the array is an AR4.

Tail is [] or a new AR4 continuing the expression.

Then, this array can be simplified to a one- or two-slot array with a few rules:

1 If Head and Tail are both [], ie, it’s a simple proper list, it can be represented as just one slot. This is very common so we want this to be as simple as possible.

[Body]

Term-expression

[1 2 3]

represents into JSON as

[[1,2,3]]

2 If Body and Tail are both [], AND if Head is an atom (a string or number), then it can be represented as just one slot. This is also a very common case.

[Head]

eg term-expression

[`true]

represents into JSON as

["true"]

3 If Tail is [] then we can drop the third slot entirely.

[Head, Body]

eg term-expression

[`foo 1 2 3]

represents into JSON as

["foo", [1, 2, 3]]

This is pretty much all there is to AR4. A couple other special-case simplification rules:

4 If both Head and Body are [], then we can reduce the whole thing to just Tail.

eg term-expression

[`[] `foo bar]

becomes

[`foo bar]

or JSON

["foo", ["bar"]]

This simplification rule is pretty important, as it means that NIL is always [].

5 If Body only has one element and it’s an atom (string or number), then we can reduce Body to its first element.

eg term-expression

[`foo bar]

becomes JSON

["foo", "bar"]

which is really just a little bit of syntax sugar for the JSON, and I’m not sure if it’s a good idea or not, but we can do it so we might as well.

Parsing up AR4 is not too much more difficult than parsing AR3, and means we can now use all standard array functions on the Body of an expression, and it’s still fairly simple to read the raw Javascript arrays.

Dividing a term-expression conceptually into a Head, a Body and a Tail also feels like a very natural thing to do, and helps explain what is going on. It’s just an array (Body) marked up with two standard added fields: the Head telling us what kind of object it is, and the Tail giving us the Lisp CDR equivalent.

Putting together both the Head and the Tail, as well as the size and random-addressable array-ness of an array Body, gives us a “tagged linked list of arrays”, which seems like it must be a fairly well understood kind of data type, but letting the tags be fully general and combining it with a standard serialisation format still seems like something that hasn’t been explored very much. We’ve had universal data types that give us a Head (objects and predicates), we’ve had universal data types that give us a tail (Lisp conses, and other forms of singly linked lists), but this is a type that includes both as standard. It’s really just a parse tree, but it’s a concrete parse tree not an abstract one.

My feeling is that this is indeed a very powerful universal data type, one that could serve as the basis of a thousand-year language, but making that argument take actual shape will be the next step.