Clustering activity for the XO

Source: Semantic Web world for you

In the past few years many data sets have been published and made public in what is now often called the Web of Linked Data, making a step towards the “Web 3.0”: a Web combining a network of documents and data suitable for both human and machine processing. In this Web 3.0, programs are expected to give more precise answers to queries as they will be able to associate a meaning (the semantics) with the information they process. Sugar, the graphical environment found on the XO, is currently Web 2.0 enabled – it can browse web sites – but has no dedicated tools to interact with the Web 3.0. The goal of the SemanticXO project introduced earlier in this blog is to make Sugar Web 3.0 ready by adding semantic software on the XO.

One cornerstone of this project is to get a triple store, the software in charge of storing the semantic data, running on the limited hardware of the machine (in our case, an XO-1). As this proved to be feasible, we can now go further and start building activities that make use of it. To begin with, here is a simple clustering activity: the goal is to cluster items into boxes using drag and drop. The user can create as many boxes as needed, and the items may be moved from one box to another. Here is a screenshot of the application, showing Amerindian items:

Prototype of the clustering activity

The most interesting aspect of this activity is actually under its hood and is not visible on the screenshot. Here are some of the triples generated by the application (note that the URLs have been shortened for readability):

subject	predicate	object
olpc:resource/a05864b4	rdf:type	olpc:Item
olpc:resource/a05864b4	olpc:name	“image114”
olpc:resource/a05864b4	olpc:hasDepiction	“image114.jpg”
olpc:resource/a82045c2	rdf:type	olpc:Box
olpc:resource/a82045c2	olpc:hasItem	olpc:resource/a05864b4
olpc:resource/78cbb1f0	rdf:type	olpc:Box
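
To make the shape of this data concrete, here is a minimal sketch that holds the six triples above as plain Python tuples and looks them up by pattern. This is not how the triple store on the XO works internally; the `match` and `items_in` helpers are hypothetical names introduced only for illustration.

```python
# The six triples from the table, as (subject, predicate, object) tuples.
TRIPLES = {
    ("olpc:resource/a05864b4", "rdf:type", "olpc:Item"),
    ("olpc:resource/a05864b4", "olpc:name", "image114"),
    ("olpc:resource/a05864b4", "olpc:hasDepiction", "image114.jpg"),
    ("olpc:resource/a82045c2", "rdf:type", "olpc:Box"),
    ("olpc:resource/a82045c2", "olpc:hasItem", "olpc:resource/a05864b4"),
    ("olpc:resource/78cbb1f0", "rdf:type", "olpc:Box"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def items_in(triples, box):
    """Items linked to `box` by an olpc:hasItem triple (possibly none)."""
    return [o for _, _, o in match(triples, s=box, p="olpc:hasItem")]


print(items_in(TRIPLES, "olpc:resource/a82045c2"))  # ['olpc:resource/a05864b4']
print(items_in(TRIPLES, "olpc:resource/78cbb1f0"))  # []
```

Nothing in this structure says which predicates a resource must carry: adding a new kind of statement is just adding another tuple.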

It is worth noting the flexibility of this data model. The assignment of the item to a box is stated by a single triple using the predicate “hasItem”, and the other box is empty simply because no such statement links it to an item. Any number of such triples can be used, without any constraint, and the same goes for all the other triples in the system: there is no required set of predicates that every item must have. Let’s see what can be done with this data through three different SPARQL queries, ordered from the simplest to the most sophisticated:

List the URIs of all the boxes and the items they contain

SELECT ?box ?item WHERE { ?box rdf:type olpc:Box. ?box olpc:hasItem ?item. }

List of the items and their attributes

SELECT ?item ?property ?val WHERE { ?item rdf:type olpc:Item. ?item ?property ?val. }

List of the items that are not in a box

SELECT ?item WHERE { ?item rdf:type olpc:Item. OPTIONAL { ?box rdf:type olpc:Box. ?box olpc:hasItem ?item. } FILTER (!bound(?box)) }
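
The last query relies on a common SPARQL idiom for negation: OPTIONAL tries to bind ?box for each item, and the FILTER keeps only the solutions where that binding failed. Here is a rough Python sketch of the same logic over an in-memory set of tuples. The `unboxed_items` function is a hypothetical helper, and one extra item that is not in the original data has been added so that the result is not empty.

```python
RDF_TYPE, ITEM, BOX, HAS_ITEM = "rdf:type", "olpc:Item", "olpc:Box", "olpc:hasItem"

TRIPLES = {
    ("olpc:resource/a05864b4", RDF_TYPE, ITEM),
    ("olpc:resource/a82045c2", RDF_TYPE, BOX),
    ("olpc:resource/a82045c2", HAS_ITEM, "olpc:resource/a05864b4"),
    ("olpc:resource/78cbb1f0", RDF_TYPE, BOX),
    # Hypothetical extra item, not in the original data, that sits in no box.
    ("olpc:resource/unboxed-example", RDF_TYPE, ITEM),
}


def unboxed_items(triples):
    """Mimic: OPTIONAL { ?box a olpc:Box. ?box olpc:hasItem ?item } FILTER (!bound(?box))."""
    items = {s for s, p, o in triples if p == RDF_TYPE and o == ITEM}
    boxes = {s for s, p, o in triples if p == RDF_TYPE and o == BOX}

    solutions = []
    for item in sorted(items):
        # OPTIONAL part: every box holding this item, if there is one.
        holders = sorted(
            s for s, p, o in triples
            if p == HAS_ITEM and o == item and s in boxes
        )
        if holders:
            solutions.extend((item, box) for box in holders)
        else:
            solutions.append((item, None))  # ?box stays unbound

    # FILTER (!bound(?box)): keep only the items for which no box was found.
    return [item for item, box in solutions if box is None]


print(unboxed_items(TRIPLES))  # ['olpc:resource/unboxed-example']
```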

These three queries are just examples. The really nice thing about this query mechanism is that (almost) anything can be asked through SPARQL. There is no need to define a set of API calls covering a list of anticipated needs: as soon as the SPARQL endpoint is made available, every activity may ask whatever it wants to ask!
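
As an illustration of what "asking" could look like from an activity, here is a small sketch that prepares a request following the standard SPARQL protocol (the query goes in a `query` parameter of an HTTP GET). The endpoint address and the full IRI behind the `olpc:` prefix are assumptions made up for this example, since the URLs above were shortened; the request is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Both values below are hypothetical placeholders, not taken from SemanticXO.
ENDPOINT = "http://localhost:8080/sparql"
OLPC_NS = "http://example.org/olpc/"

PREFIXES = (
    "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
    f"PREFIX olpc: <{OLPC_NS}>\n"
)


def build_request(query, endpoint=ENDPOINT):
    """Wrap a SPARQL SELECT query into an HTTP GET request for the endpoint."""
    params = urlencode({"query": PREFIXES + query})
    return Request(
        f"{endpoint}?{params}",
        # Ask for the standard JSON serialization of SPARQL results.
        headers={"Accept": "application/sparql-results+json"},
    )


request = build_request(
    "SELECT ?box ?item WHERE { ?box rdf:type olpc:Box. ?box olpc:hasItem ?item. }"
)
print(request.get_method(), request.full_url)
# With a real endpoint, urllib.request.urlopen(request) would send the query.
```

Any of the three queries above could be passed to `build_request` unchanged, which is the point: the activity decides what to ask, not the endpoint.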

We are not done yet as there is still a lot to develop to finish the application (game mechanism, sharing of items, …). If you are interested in knowing more about the clustering prototype, feel free to drop a comment on this post and/or follow this activity on GitHub. You can also find more information in this technical report about the current achievements of SemanticXO and the ongoing work.