Matteo Franchin's corner

11 Nov 2012: The new type system of Box: reflection and dynamic typing

I have spent the last four months working hard on a new type system for Box, and I think it is now time to comment on how things are going.

It was hard work, but also fun. Box now comes with a C API for creating and inspecting types. The main difference from the previous version is that types are now pointers to separately allocated type objects, rather than type identifiers (integers) indexing a global table of types. The new design is more sensible. After all, the old type system was quite dated. It was also one of the few parts of the compiler that had not yet been rewritten from scratch a third time (the next candidates for a complete rewrite are value.c, registers.c and friends).
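Here is a minimal C sketch of a type represented as a pointer to a heap-allocated type object, to make the difference concrete. It does not reproduce Box's real C API. The names Type, type_new_intrinsic, type_new_structure, type_member and type_free are hypothetical, and padding and alignment are ignored when computing sizes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch, not the actual Box C API: a type is a pointer to a
   separately allocated type object, not an integer index into a table. */
typedef enum { TYPE_INTRINSIC, TYPE_STRUCTURE } TypeKind;

typedef struct Type Type;
struct Type {
  TypeKind kind;
  const char *name;    /* e.g. "Int"; NULL for anonymous structures */
  size_t size;         /* instance size in bytes (padding ignored) */
  size_t num_members;  /* structures only */
  Type **members;      /* structures only: member types, held by pointer */
};

static Type *type_new_intrinsic(const char *name, size_t size) {
  Type *t = calloc(1, sizeof *t);
  assert(t != NULL);
  t->kind = TYPE_INTRINSIC;
  t->name = name;
  t->size = size;
  return t;
}

/* Build a structure type out of existing types. The structure refers to
   its members directly by pointer, so no global table is involved. */
static Type *type_new_structure(Type **members, size_t n) {
  assert(n > 0);
  Type *t = calloc(1, sizeof *t);
  assert(t != NULL);
  t->kind = TYPE_STRUCTURE;
  t->members = malloc(n * sizeof *t->members);
  assert(t->members != NULL);
  memcpy(t->members, members, n * sizeof *members);
  t->num_members = n;
  for (size_t i = 0; i < n; i++)
    t->size += members[i]->size;  /* calloc started size at 0 */
  return t;
}

/* Inspection helpers. */
static size_t type_num_members(const Type *t) { return t->num_members; }

static Type *type_member(const Type *t, size_t i) {
  assert(t->kind == TYPE_STRUCTURE && i < t->num_members);
  return t->members[i];
}

/* Frees only t itself: member types are not owned by the structure. */
static void type_free(Type *t) {
  free(t->members);  /* NULL for intrinsic types, which is fine */
  free(t);
}
```

A structure refers to its member types directly. Each type can therefore be created, inspected and freed on its own, without a global registry.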

I embarked on all this work for several reasons, the most important being the ability to box and unbox objects. This is a relatively simple feature to implement once a decent type system is in place. It is also an incredibly useful capability which, I found, was required to make progress with the language design. With this card up my sleeve, I can focus on implementing heterogeneous arrays and dictionaries, hashing, serialisation and more. More importantly, I can do all this in the right way, without having to introduce ad-hoc extensions to the type system each time I add support for arrays, dictionaries, etc.

Boxing, Unboxing and dynamic typing

Now, let’s be practical and see how objects can be boxed in the Box language (well, in its next version):

any1 = Any[123]
any2 = Any["string"]
any3 = Any[Color[0.5]]

These three lines create three Any objects. Under the hood, an Any object is essentially a pair: the type of the boxed object and a pointer to the boxed object. Any allows different objects to be wrapped in the same object type, so that they can be treated uniformly. Let’s see how these objects can then be used. For example,

Print[any1; any2; any3;]

Any objects are unboxed automatically at run time and matched against the appropriate combination. In the line above, passing the three objects to Print generates three dynamic calls. At run time, these Any objects are expanded to the objects they box, and the matching combinations are executed: Int@Print, Str@Print and Color@Print, respectively.
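A dynamic call like the ones above can be sketched in C as a lookup keyed on the type pointer stored in the Any. This is a hypothetical illustration, not how the Box VM actually dispatches. The Type and Any structs, PRINT_TABLE and print_dynamic are made-up names. The combinations write into a buffer instead of printing, so the result is easy to check.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical minimal type object and Any pair (type + pointer). */
typedef struct { const char *name; } Type;
typedef struct { const Type *type; void *ptr; } Any;

/* Hypothetical built-in types: only their addresses matter here. */
static const Type INT_T = {"Int"}, STR_T = {"Str"}, REAL_T = {"Real"};

/* A combination Child@Print, writing its output into a buffer. */
typedef void (*Combination)(const void *obj, char *out, size_t n);

static void int_at_print(const void *obj, char *out, size_t n) {
  snprintf(out, n, "%ld", *(const long *)obj);
}

static void str_at_print(const void *obj, char *out, size_t n) {
  snprintf(out, n, "%s", *(const char *const *)obj);
}

/* Print combinations, keyed by the child's type pointer.
   REAL_T deliberately has no entry. */
static const struct { const Type *child; Combination fn; } PRINT_TABLE[] = {
  {&INT_T, int_at_print},
  {&STR_T, str_at_print},
};

/* Dynamic call: unbox a, find the combination for its type, run it.
   Returns 0 on success, -1 if no combination exists for that type. */
static int print_dynamic(Any a, char *out, size_t n) {
  assert(out != NULL && n > 0);
  for (size_t i = 0; i < sizeof PRINT_TABLE / sizeof PRINT_TABLE[0]; i++) {
    if (PRINT_TABLE[i].child == a.type) {
      PRINT_TABLE[i].fn(a.ptr, out, n);
      return 0;
    }
  }
  return -1;
}
```

REAL_T has no registered combination, so print_dynamic returns -1 for it. A real implementation would probably report a run-time error at that point.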

Having the compiler behave this way is pretty convenient. For example, it makes it easy to implement (from C) a new Array object representing a heterogeneous collection of objects (this is what I am planning to do next). I could imagine code like this:

objs = Array[1, "hello",]
i = 0, Print[objs.Get[i++], For[i < Num[objs]]]

(but I promise I will get rid of .Get.) Similarly, for dictionaries we could have:

objs = Dict[Set[Key["one"], Value[1]]
            Set[Key["two"], Value[2]]]
Print[objs.Get["one"] + objs.Get["two"];]

Here, Set = ^(Object key, value). Admittedly, some syntactic sugar may be required to make the notation leaner (e.g. a shorthand for the Set[Key[...], Value[...]] construct).
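On the C side, the heterogeneous Array could be little more than a growable vector of (type, pointer) pairs. The sketch below is an assumption, not Box code. It uses hypothetical names (AnyArray, any_array_push, any_array_get), and it borrows the boxed objects rather than owning them. A Dict would also need hashing of its keys, which is not shown.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical minimal type object and Any pair (type + pointer). */
typedef struct { const char *name; } Type;
typedef struct { const Type *type; void *ptr; } Any;

/* Heterogeneous array: a growable vector of Any values.
   The boxed objects are borrowed, not owned. */
typedef struct {
  Any *items;
  size_t num, cap;
} AnyArray;

static void any_array_init(AnyArray *arr) {
  arr->items = NULL;
  arr->num = arr->cap = 0;
}

/* Append a, doubling the capacity when the vector is full. */
static void any_array_push(AnyArray *arr, Any a) {
  if (arr->num == arr->cap) {
    size_t new_cap = arr->cap ? 2 * arr->cap : 4;
    Any *p = realloc(arr->items, new_cap * sizeof *p);
    assert(p != NULL);
    arr->items = p;
    arr->cap = new_cap;
  }
  arr->items[arr->num++] = a;
}

/* Rough equivalent of objs.Get[i]: returns the i-th Any, still boxed. */
static Any any_array_get(const AnyArray *arr, size_t i) {
  assert(i < arr->num);
  return arr->items[i];
}

/* Release the vector itself and leave it empty and reusable. */
static void any_array_fini(AnyArray *arr) {
  free(arr->items);
  any_array_init(arr);
}
```

Each element keeps its own type pointer. Iterating over the array and passing each element to a dynamic call is then enough to print an Array[1, "hello",].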

Let’s now return to the Any type. There are some corner cases worth discussing. Firstly, we saw that Any[object] allows boxing object inside an Any. What if object is itself an Any object? For example:

any1 = Any[Any[x]]
any2 = Any[x]

I decided that the two lines above should produce the same result: Any should only box objects which are not already Any objects. After all, this behaviour can be seen as a consequence of what we saw before. When Any[x] is given to Any, it is first expanded by a dynamic call into x, which the outer Any then boxes again.

Similarly, if SomeType is defined as,

SomeType = (Any obj, Int something_else)
Any@SomeType[.obj = $]

Then the two following lines are equivalent:

y = SomeType[Any[x]]
y = SomeType[x]

Notice also that Any objects can be used in assignments:

a = Any[]
a = "object"

AB = (Any a, b)
ab = AB[.a="string", .b=Color[0.0]]
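Assigning to an Any can be thought of as replacing both halves of the pair. The following C sketch is an assumption about how this might work, not Box's implementation. In it, any_empty plays the role of Any[]. any_assign boxes a shallow copy of the new value (so it only suits plain data) and then releases the previous one.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal type object (with instance size) and Any pair. */
typedef struct { const char *name; size_t size; } Type;
typedef struct { const Type *type; void *ptr; } Any;

/* An empty Any, as produced by Any[]: no type, no object. */
static Any any_empty(void) {
  Any a = {NULL, NULL};
  return a;
}

/* a = value: box a shallow copy of obj (of any type) into dst,
   releasing whatever dst held before. The copy is made first, so
   assigning an Any's own contents back to it is safe. */
static void any_assign(Any *dst, const Type *type, const void *obj) {
  assert(dst != NULL && type != NULL && type->size > 0);
  void *copy = malloc(type->size);
  assert(copy != NULL);
  memcpy(copy, obj, type->size);
  free(dst->ptr);  /* free(NULL) is a no-op for an empty Any */
  dst->type = type;
  dst->ptr = copy;
}
```

The type of the target changes along with its value. This is what lets a single Any variable, or an Any member such as AB.a, hold a string now and a colour later.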

Finally, I am unsure whether introducing an ANY object will end up being necessary. This object could be internally identical to Any, but would be treated as an ordinary object, meaning that it would not cause automatic boxing/unboxing.

What comes next...

As you may have guessed, the new type system required a new object management system. The new system is more powerful than the old one, but it is also slower, because initialisation, finalisation and copying of objects are now fully dynamic processes. I will try to make them faster and to fix some inefficiencies in the way some of the code is generated (working on assignments and general aspects of the VM bytecode). This should be enough for the upcoming release, version 4.0.
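A sketch can show why fully dynamic object management costs more. In it, each type object carries its own initialisation, finalisation and copy hooks, and generic code calls them through function pointers. The names below (obj_new, obj_copy, obj_free, the Str type) are hypothetical and only illustrate the idea, not Box's object management system.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type object carrying its own lifecycle hooks. */
typedef struct Type Type;
struct Type {
  const char *name;
  size_t size;
  void (*init)(void *obj);                    /* NULL: zero-fill */
  void (*fini)(void *obj);                    /* NULL: nothing to do */
  void (*copy)(void *dst, const void *src);   /* NULL: memcpy */
};

/* Generic operations: each one is resolved at run time via the type. */
static void *obj_new(const Type *t) {
  void *obj = malloc(t->size);
  assert(obj != NULL);
  if (t->init) t->init(obj); else memset(obj, 0, t->size);
  return obj;
}

/* dst must already be an initialised object of type t. */
static void obj_copy(const Type *t, void *dst, const void *src) {
  if (t->copy) t->copy(dst, src); else memcpy(dst, src, t->size);
}

static void obj_free(const Type *t, void *obj) {
  if (t->fini) t->fini(obj);
  free(obj);
}

/* Example type owning a heap buffer, so it needs all three hooks. */
typedef struct { char *text; } Str;

static void str_init(void *obj) { ((Str *)obj)->text = NULL; }
static void str_fini(void *obj) { free(((Str *)obj)->text); }

/* Deep copy: duplicate src's text, then drop dst's old text. */
static void str_copy(void *dst, const void *src) {
  const char *s = ((const Str *)src)->text;
  char *dup = NULL;
  if (s) {
    dup = malloc(strlen(s) + 1);
    assert(dup != NULL);
    strcpy(dup, s);
  }
  free(((Str *)dst)->text);
  ((Str *)dst)->text = dup;
}

static const Type STR_T = {"Str", sizeof(Str), str_init, str_fini, str_copy};
```

Every operation goes through an indirect call or a memset/memcpy fallback. That is flexible, but harder to optimise than code generated for a type known at compile time.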

The next release will probably provide arrays, dictionaries, and extensions and cleanups of the syntax. Introspection and reflection are also good candidates. After all, the new type objects are already (almost) ordinary Box objects, handled by the very object management system they themselves sustain. A cyclic garbage collector (to collect cyclic references) would also be a good excuse for some fun programming.

I would also like to get back to the GUI and the graphics library, to provide better support for drawing arcs and (Bézier) curves, and to enable users to share content with one another.