Dataspace 6: Terms as Types

Prelude:
Dataspace 0: Those Memex Dreams Again
Dataspace 1: In Search of a Data Model
Dataspace 2: Revenge of the Data Model
Dataspace 3: It Came From The S-Expressions
Dataspace 4: The Term-inator

I want a Memex. Roughly, I want some kind of personal but shareable information desktop where I can enter very small pieces of data, cluster them into large chunks of data, and – most importantly – point to any of these small pieces of data from any of these chunks.

‘Pointable data’ needs a data model. The data model that I am currently exploring is what I call term-expressions (or T-expressions): a modified S-expression syntax and semantics that allows a list to end with (or even simply be, with no preceding list) a logical term in the Prolog sense.

I’m not a huge fan of type theory as it currently exists in functional programming languages such as Haskell. Type theory seems, to me, to be merely an application of logic – and I think that what we’d find much more useful than a type-inference engine that runs only at compile time is a general logical inference engine like Prolog, one that operates on general logical terms and can operate at runtime.

(Because on the Internet, it’s always runtime. There is no ‘compile time’ where you can escape the entire network; all you can do is pass the output of one program into another by sending and receiving data. At some point, we need to start thinking about ‘types’ as merely syntactic transformations of data, or logical statements (themselves pieces of data) made, inferred and proved about data. To handle the creation of new types at runtime, we need functions that can take types – or structured data containing types – as arguments, and return types as values.)
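One way to picture that last point, as a minimal Python sketch rather than anything taken from this series: treat a 'type' as an ordinary runtime value (here, simply a predicate over data) and build new types with plain functions that take types and return types. The names `is_int`, `is_str`, `list_of` and `either` are hypothetical, and a predicate is only one of several possible encodings.

```python
from typing import Any, Callable

# Assumption for illustration: a "type" is just a predicate over data,
# i.e. an ordinary value that exists at runtime.
Type = Callable[[Any], bool]

def is_int(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)

def is_str(value: Any) -> bool:
    return isinstance(value, str)

def list_of(item_type: Type) -> Type:
    """Take a type as an argument and return a new type as a value."""
    def check(value: Any) -> bool:
        return isinstance(value, list) and all(item_type(v) for v in value)
    return check

def either(left: Type, right: Type) -> Type:
    """Combine two types into a new one, again at runtime."""
    return lambda value: left(value) or right(value)

# A new type assembled at runtime from existing types.
int_or_str_list = list_of(either(is_int, is_str))

print(int_or_str_list([1, "two", 3]))  # True
print(int_or_str_list([1, 2.5]))       # False
```

Nothing here is checked ahead of time: `int_or_str_list` is created and applied while the program runs, which is the property the paragraph above asks for.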

Dataspace 5: Introducing /all

I want a Memex. Roughly, I want some kind of personal but shareable information desktop where I can enter very small pieces of data, cluster them into large chunks of data, and – most importantly – point to any of these small pieces of data from any of these chunks.

To get ‘pointable data’, I want a data model that allows me to embed pointers anywhere, but keeps the simplicity and clarity that, e.g., RDF triples and XML/HTML don’t have.

The data model that I am currently exploring is what I call term-expressions (or T-expressions): a modified S-expression syntax and semantics that allows a list to end with (or even simply be, with no preceding list) a logical term in the Prolog sense. In term-expressions we simply represent a term as a list whose head is the designated term marker – in this case, I’m using () to indicate a list and / as the term marker. We can have the dot, as in S-expressions, or we could skip it entirely (which gives us some interesting expanded semantics for storage that I’ll talk about later).
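As a rough illustration of that description, here is a small Python sketch that reads parenthesised text into nested lists and then splits a list at the `/` marker into its leading items and its trailing term. This is only one reading of the summary above, not the author's implementation: the function names are hypothetical, the dot notation is not modelled, and unbalanced input is not handled.

```python
import re

TERM_MARKER = "/"  # the term marker named in the text

def tokenize(text):
    # Parentheses are their own tokens; everything else splits on whitespace.
    return re.findall(r"[()]|[^\s()]+", text)

def parse(tokens):
    """Parse one expression, consuming tokens from the front of the list."""
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens[0] != ")":
            items.append(parse(tokens))
        tokens.pop(0)  # drop the closing ")"
        return items
    return token  # an atom; the marker "/" is kept as an ordinary atom

def split_term(expr):
    """Split a parsed list into (leading items, trailing term or None)."""
    if TERM_MARKER in expr:
        i = expr.index(TERM_MARKER)
        return expr[:i], expr[i + 1:]
    return expr, None

# A list that ends with a term:
print(split_term(parse(tokenize("(a b / funny cats)"))))
# (['a', 'b'], ['funny', 'cats'])

# A list that simply is a term, with no preceding items:
print(split_term(parse(tokenize("(/ funny cats)"))))
# ([], ['funny', 'cats'])
```

A plain list such as `(a b)` comes back with `None` as its term, so ordinary S-expression lists pass through unchanged under this reading.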

Dataspace 4: The Term-inator

I want a Memex. Roughly, I want some kind of personal but shareable information desktop where I can enter very small pieces of data, cluster them into large chunks of data, and – most importantly – point to any of these small pieces of data from any of these chunks.

Before we can build such a system, we have to settle on a data model. The logic programming language Prolog gives us a potentially useful universal data model – term structure – but we don’t get the most out of it unless we express it in Lisp-style S-expressions, which reveal hidden semantics that even the formal logic and logic programming communities didn’t catch up with until 1989. But S-expressions also introduce ambiguity that wasn’t there in the original Prolog term structure.

There is a very simple way forward from here, and it’s one that I haven’t seen described before. I think it has potential as a fundamental data semantics for building very large or very small distributed systems.

Dataspace 3: It Came from the S-Expressions

I want a Memex. Roughly, I want some kind of personal but shareable information desktop where I can enter very small pieces of data, cluster them into large chunks of data, and – most importantly – point to any of these small pieces of data from any of these chunks.

The Prolog data model – based on logical terms, which are very similar to SQL relations but can be nested and computed like functions – looks useful, but still has a few rough edges because it was built in 1972 and hasn’t changed much since.

For one thing, a Prolog term looks like a C function call:

funny(cats)

but that syntax is rather irregular. Can we come up with a simpler syntax that gives us more options for how we organise data?

Yes. Yes we can. But there’s a price to pay.
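To make the contrast concrete, here is a minimal Python sketch that holds a term as a nested tuple and prints it both in Prolog's call-like syntax and as a Lisp-style S-expression, the alternative the post title points to. The tuple encoding and the `likes`/`alice` example are assumptions made for illustration; only `funny(cats)` comes from the text.

```python
def to_prolog(term):
    """Render a nested (functor, *args) tuple in Prolog's call-like syntax."""
    if isinstance(term, str):
        return term
    functor, *args = term
    return f"{functor}({', '.join(to_prolog(a) for a in args)})"

def to_sexpr(term):
    """Render the same structure as a Lisp-style S-expression."""
    if isinstance(term, str):
        return term
    return "(" + " ".join(to_sexpr(t) for t in term) + ")"

term = ("funny", "cats")
nested = ("likes", "alice", ("funny", "cats"))  # hypothetical nested term

print(to_prolog(term))    # funny(cats)
print(to_sexpr(term))     # (funny cats)
print(to_prolog(nested))  # likes(alice, funny(cats))
print(to_sexpr(nested))   # (likes alice (funny cats))
```

The Prolog renderer has to treat the functor differently from the arguments, while the S-expression renderer treats every element the same way. That uniformity is the regularity in question, and it is also why the functor is no longer syntactically distinguished from the rest of the list.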

Dataspace 2: Revenge of the Data Model

I want a Memex. Roughly, I want some kind of personal but shareable information desktop where I can enter very small pieces of data, cluster them into large chunks of data, and – most importantly – point to any of these small pieces of data from any of these chunks.

But to even begin we need a data model that allows us to point at small pieces of data. This turns out, surprisingly, to be harder than it looks. The data models in common use that we’ve looked at and rejected so far are:

  1. Filesystems (as in Unix or the Web – the end-point components are too big)
  2. Object-oriented programming (objects are fragmented, ill-defined and too complex)
  3. Dictionaries (promising, but don’t let us have multiple values per key)
  4. Relational databases (our data is not structured in fixed-length tuples)
  5. Graphs made of triples (they’re neither homoiconic nor recursively structured)

So are we out of options? Not quite. There’s one weird old trick we haven’t looked at. #6 may surprise you.
