Dataspace 9: A Tower of Nulls, And Awkward Sets

Prelude:
Dataspace 0: Those Memex Dreams Again
Dataspace 1: In Search of a Data Model
Dataspace 2: Revenge of the Data Model
Dataspace 3: It Came From The S-Expressions
Dataspace 4: The Term-inator

I want a Memex. Roughly, I want some kind of personal but shareable information desktop where I can enter very small pieces of data, cluster them into large chunks of data, and – most importantly – point to any of these small pieces of data from any of these chunks.

‘Pointable data’ needs a data model. The data model that I am currently exploring is what I call term-expressions (or T-expressions): a modified S-expression syntax and semantics that allows a list to end with (or even simply be, with no preceding list) a logical term in the Prolog sense.

Looking at term-expressions, one of the first things we notice is that there are a large number of null-like terms. I’m wondering what the meaning of these varieties of null might be.

  • The simplest null-like term is the nil pair or empty list: ()
  • The next one is the empty term : (/)
  • Then we have the empty set (if can think of /all as a set) or empty union:  (/all)
  • Then, for every other term functor X, the empty X: (/X)

An interesting question is whether terms correspond to types, (and if so, in what particular type system) or whether the notion of ‘type’ is unrelated to what we’re looking at here.

But some modern type systems, such as Scala’s – in fact I think any with parameterised or dependent types – seem to allow something similar to what we see here in terms. See, for example , this post “Representing emptiness in Scala“:

Nothing is a trait that is guaranteed to have zero instances. It is a subtype of all other types.

Nil is just an empty list, exactly like the result of List(). It is of type List[Nothing]

Scala’s type system is a little weird because it has to interact with Java; I think Nothing corresponds to the type usually otherwise called ‘bottom’.

But if, for example, we called the empty term (/) Nothing, then for all the other terms X (if not the empty pair () itself),  X[Nothing] might be equivalent to (/X).

The empty set or union, (/all),  seems to be represent a different, more empty, kind of emptiness than the empty pair (). Since, if we apply Lisplike syntax rules, and say that () is equivalent to (/all ()), then obviously () is ‘bigger’ than (/all). We have to have a rule like this if we are going to track, for instance, the union of these two statements

()
(b)

which should be something like

(/all () (b))

It seems too that (/all), while being ’empty’ and ’emptier than ()‘, is also ‘less empty than (/). Because we at least know that it’s meant to be a set/union – it’s a term beginning with the /all functor – but it doesn’t have any data in it. But while we know (/) is a term, we know it’s not any term in particular. Whatever term functor we can think of, it’s not that.

Exactly how we would use these varieties of null, I’m not sure.

In the case of /all, it seems to me that – interpreted as a logical statement – a list ending in the empty set (/all) must be an invalid or empty logical statement, as it’s a list ending in ‘nothingness’. While it’s not strictly a logical ‘false’ value it is a logical ’emptiness’ value that doesn’t imply truth, and can’t be asserted or retracted on its own.

I think that ending in (/) would have the same logical effect, but I’m not entirely sure.

One of the difficulties of /all is thinking whether it represents a set, or not. We could perhaps define it informally like this (assuming a S-expression dot for the moment)

(/all) = = (/)
(/all X) == (X)
(/all X . Y) == (/and X /all Y)

ie, so that

(/all 1) = 1
(/all 1 2) = (/and 1 . 2)
(/all 1 2 3) = (/and 1 /and 2 . 3)

where

(/and X . Y) implies both X and Y

In other words, /all is just a ‘multiway AND’. It’s a simpler way of writing a whole bunch of /ands, and simplicity and clarity mostly seem to be what we want when we’re representing data.

But a multiway AND is not quite a set, because a set has a boundary which a logical AND doesn’t have. If we nest an AND inside another AND, we get the same resultas if we just put all their clauses into one big AND. Also, if those rules above apply for all Xes, then (/all (/all)) == (/all) == (/) — but in set theory, {{}} does not equal {}. And nor does {X} = X.  Ie if you nest an empty set inside another empty set, you get two nested sets, and all of set theory relies on being able to do this. But if you nest an AND inside another AND, the boundary between the two evaporates.

We could define a term with strict set-like boundary semantics, certainly. Just call it (/set). But the usefulness of (/all) as I’ve defined it here is that it’s the structure you naturally want if you want to express a collection of logical statements. Such as, eg, a Prolog database.

This seems to be quite a deep issue, and I don’t feel like I have a lot of guidance from either logic or set theory at this point. I assume that there are deep connections between set theory and logic – but here, I seem to have struck an equally deep discontinuity between them.

A fundamental guiding principle I’ve been following with this entire idea of ‘dataspace’ is that an entire data set and computation must be made out of the same logical ‘stuff’. Computers always only work on representations, not abstract ideas; all they perform is syntactic transformations on data structures. Since (following Prolog) I’ve picked ‘logical term structure’ as that fundamental ‘stuff’ that my data is made out of, it follows that if I want anything like a ‘set’, then sets must be some kind of logical term. Either a literal representing multiple data items, or an expression which can be evaluated (somehow) to produce more items. Either way, it should be a chunk of data structure which a computer program can parse.

This oddness about ‘when is a set not a set, and when do we care about it having or not having boundaries’ exists in Prolog and in Kanren too; it is never quite defined. A Prolog predicate or program produces a sequence of values, but (unless we use a higher-level predicate like ‘findall’), those values don’t exist in the computer at the same time – they are ‘backtracked’ after each one is produced. In Kanren, they are usually returned as a ‘lazy list’ with similar semantics, but allowing us to compute multiple values. A lazy list is NOT a Kanren term, though.

(We could very easily represent a lazy list in term-expression syntax: eg:

(value1 /expression-that-generates-next-values)

assuming we feed this into an evaluator that knows how to handle the expression. This is where having a term marker seems a very useful thing).

The HiLog paper says we could think of a higher-order predicate as a set, but really, it doesn’t have set semantics either. The semantics of Prolog and Kanren searches is really that of a bag – an ordered list of values that may contain the same value multiple times. And they just aren’t usually ever nested. If they are nested, it is in the form of a list. Not a set or a bag.

This is something that I’d like a lot more clarity on, but exactly what theory even connects with these representations, I don’t know.