News and Updates on the KRR Group

Author Archives: cgueret

Source: Semantic Web world for you

Google recently announced a new project, the Google Art Project, which gives access to paintings from around the world in very high definition. It also provides some information related to these paintings. This is a very cool service, but the data is not provided in a machine-friendly way. So we thought it would be nice to have a wrapper exporting it in RDF, so that this data could be more easily consumed by any semantic-aware application.

The GoogleArt2RDF wrapper offers such a wrapping service for any painting made available through GoogleArt. In order to use it, just copy the path of the artwork and paste it after “http://linkeddata.few.vu.nl/googleart/”. For instance, change “http://www.googleartproject.com/museums/rijks/night-watch” into “http://linkeddata.few.vu.nl/googleart/museums/rijks/night-watch”.
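As a quick sketch, the resulting description can be fetched from the command line with curl (assuming the wrapper serves the RDF directly at that URL; the exact serialisation returned may differ):

# Fetch the RDF description of "The Night Watch" through the wrapper
curl http://linkeddata.few.vu.nl/googleart/museums/rijks/night-watch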

The data is expressed using mainly the FOAF and Dublin Core vocabularies. When possible, the resources are linked to DBPedia for the author of the painting and the medium used (oil on canvas, etc.). This is a first version of the system which does not yet export all the data from Google; comments and suggestions on how to improve it are most welcome!


Source: Semantic Web world for you

Although it is commonly depicted as one giant graph, the Web of Data is not a single entity that can be queried. Instead, it is a distributed architecture made of different datasets, each providing some triples (see the LOD Cloud picture and CKAN.net). Each of these data sources can be queried separately, most often through an endpoint understanding the SPARQL query language. Looking for answers that make use of information spanning several datasets is a more challenging task, as the mechanisms used internally to query one dataset (database-like joins, query planning, …) do not scale easily over several data sources.

When you want to combine information from, say, DBPedia and the Semantic Web dog food site, the easiest and quickest workaround is to download the content of the two datasets, possibly filtering out the triples you don’t need, and load the retrieved content into a single data store. This approach has some limitations: you must have a store running somewhere (which may require a significantly powerful machine to host it), the downloaded data must be updated from time to time, and the data you need may not be available for download in the first place.

When used along with a SPARQL data layer, eRDF offers you a solution when one of these limitations prevents you from executing your SPARQL query over several datasets. The application runs on a low-end laptop and can query, and combine the results from, several SPARQL endpoints. eRDF is a novel RDF query engine that uses evolutionary computing to search for solutions. Instead of the traditional resolution mechanism, an iterative trial-and-error process is used to progressively find some answers to the query (more information can be found in the published papers listed on erdf.nl and in this technical report). It is a versatile optimisation tool that can run over different kinds of data layers, and the SPARQL data layer offers an abstraction over a set of SPARQL endpoints.

Let’s suppose you want to find some persons and the capital of the country they live in:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX db: <http://dbpedia.org/ontology/>

SELECT DISTINCT ?person ?first ?last ?home ?capital WHERE {
	?person  rdf:type         foaf:Person.
	?person  foaf:firstName   ?first.
	?person  foaf:family_name ?last.
	OPTIONAL {
	?person  foaf:homepage    ?home.
	}
	?person  foaf:based_near  ?country.
	?country rdf:type         db:Country.
	?country db:capital       ?capital.
	?capital rdf:type         db:Place.
}
ORDER BY ?first

Such a query can be answered by combining data from the dog food server and DBpedia. More datasets may also contain lists of people, but let’s focus on researchers as a start. We have to tell eRDF which endpoints to query; this is done with a simple CSV listing:

DBpedia;http://dbpedia.org/sparql
Semantic Web Dog Food;http://data.semanticweb.org/sparql

Assuming the query is saved into a “people.sparql” file and the endpoint list goes into an “endpoints.csv” file, the query engine is called like this:

java -cp nl.erdf.datalayer-sparql-0.1-SNAPSHOT.jar nl.erdf.main.SPARQLEngine -q people.sparql -s endpoints.csv -t 5

The query is first scanned for its basic graph patterns, which are all grouped and sent to the eRDF optimiser as a set of constraints to solve. eRDF then looks for solutions matching as many of these constraints as possible and pushes all the relevant triples found back into an RDF model. After some time (set with the parameter “t”), eRDF is stopped and Jena is used to issue the query over the model that was just populated. The answers are then displayed, along with a list of the data sources that contributed to finding them.

If you don’t know which endpoints are likely to contribute to the answers, you can just query all of the WoD and see what happens… ;-)
The package comes with a tool that fetches a list of SPARQL endpoints from CKAN, tests them and creates a configuration file. It is called like this:

java -cp nl.erdf.datalayer-sparql-0.1-SNAPSHOT.jar nl.erdf.main.GetEndPointsFromCKAN

After a few minutes, you will get a “ckan-endpoints.csv” allowing you to query the WoD from your laptop.
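The generated file can then be passed to the query engine in place of the hand-written listing, for instance reusing the query from above:

java -cp nl.erdf.datalayer-sparql-0.1-SNAPSHOT.jar nl.erdf.main.SPARQLEngine -q people.sparql -s ckan-endpoints.csv -t 5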

The source code, along with a package including all the dependencies, is available on GitHub. Please note that this is a first public release of the tool, still in snapshot state, so bugs are expected to show up. If you spot some, report them and help us improve the software. Comments and suggestions are also most welcome :)


The work on eRDF is supported by the LOD Around-The-Clock (LATC) Support Action funded under the European Commission FP7 ICT Work Programme, within the Intelligent Information Management objective (ICT-2009.4.3).

Source: Semantic Web world for you

A few days ago, I posted about SemanticXO and announced that you would see how to install a triple store on your XO. Here are the steps to follow to compile and install RedStore on the XO, put some triples in it and issue some queries. The following has been tested with an XO-1 running software 10.1.3 and a MacBook Pro running ArchLinux x64 (it is not so easy to compile directly on the XO, which is why you will need a secondary machine). All the scripts are available here.

Installation of RedStore

RedStore depends on some external libraries that are not yet packaged for Fedora 11, which is used as a base for the operating system of the XO. The script build_redstore.sh will download and compile all the necessary components. You may however need to install external dependencies on your system, such as libxml. That script only takes care of the things RedStore directly depends on, namely raptor2, rasqal and redland (all available here). Here is the full list of commands to issue:

mkdir /tmp/xo
cd /tmp/xo
wget --no-check-certificate https://github.com/cgueret/SemanticXO/raw/master/build_redstore.sh
sh build_redstore.sh

Once done, you will get four files to copy onto the XO; if you don’t want to compile them yourself, you can also download this pre-compiled package. These files shall be put all together somewhere, for instance in “/opt/redstore”. Note that all the data RedStore needs will be put into that same directory. In addition to these 4 files, you’ll need a wrapper script and an init script. Both are available on the source code repository. So, here is what to do on the XO, as root (replacing “cgueret@192.168.1.105” by the login/IP accurate for you):

mkdir /opt/redstore
cd /opt/redstore
scp cgueret@192.168.1.105:/tmp/xo/libraptor2.so.0 .
scp cgueret@192.168.1.105:/tmp/xo/librasqal.so.2 .
scp cgueret@192.168.1.105:/tmp/xo/librdf.so.0 .
scp cgueret@192.168.1.105:/tmp/xo/redstore .
wget --no-check-certificate https://github.com/cgueret/SemanticXO/raw/master/wrapper.sh
chmod +x wrapper.sh
cd /etc/init.d
wget --no-check-certificate https://github.com/cgueret/SemanticXO/raw/master/redstoredaemon
chmod +x redstoredaemon
chkconfig --add redstoredaemon

Then you can reboot your XO and enjoy the triple store through its HTTP frontend, available on port 8080 :)
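As a quick sanity check, you can send a simple SPARQL query to the store from any machine on the network (a sketch assuming RedStore exposes the standard SPARQL protocol endpoint at /sparql, with “192.168.1.104” being the IP of the XO as used below):

curl -G "http://192.168.1.104:8080/sparql" --data-urlencode "query=SELECT * WHERE { ?s ?p ?o } LIMIT 10"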

Loading some triples

Now that the triple store is running, it’s time to add some triples. The SP2Bench benchmark comes with a tool (sp2b_gen) to generate any number of triples. To begin with, you can generate 50,000 triples. That should be about the maximum number of triples an XO will have to deal with later on, once the activities store data in it. Here is what to do, with “192.168.1.104” being the IP of the XO:

sp2b_gen -t 50000
rapper -i guess -o rdfxml sp2b.n3 > sp2b.rdf
curl -T sp2b.rdf 'http://192.168.1.104:8080/data/http://example.com/data'

It takes about 43 minutes to upload these 50k triples, which gives an average of roughly 53 milliseconds per triple, or 19 triples per second. That’s not fast, but it should be enough to have an API allowing activities to store a bunch of triples with an acceptable response time. The data takes 4 MB of disk space on the XO for an initial RDF file of about 9.8 MB.

Issue some queries

The SP2Bench benchmark comes with a generator for the triples and a set of 17 SPARQL queries expressed over this data. The queries are of varying complexity in order to benchmark different triple stores. Unfortunately, 9 of them were too complex for RedStore on the XO with these 50k triples: these queries were not solved, even after being executed over a full night! The 8 remaining queries are solved without much trouble, as long as you have enough time to wait for the answer:

Query file Execution time
q1.sparql 14229.4 ms
q2.sparql 44189.2 ms
q3a.sparql 21506.8 ms
q3b.sparql 19498.4 ms
q3c.sparql 19663.9 ms
q10.sparql 3940.6 ms
q11.sparql 4685.2 ms
q12c.sparql 3539.6 ms

The queries have been executed using the “sparql-query” command line client, as follows:

cat q2.sparql | sparql-query http://192.168.1.104:8080/sparql -t -p -n

The long delays may sound like bad news, but it must be noted that this was with 50k triples and with queries designed to be tricky in order to test triple store capabilities. Considering normal usage, with fewer triples and more standard queries, we can expect things to go better.

Source: Semantic Web world for you

The three XOs received for the project

The One Laptop Per Child (OLPC) project has provided millions of kids worldwide with a low-cost connected laptop, helping them to enhance their knowledge and develop learning skills. Learning a foreign language, getting an introduction to reading and writing, or preserving and reviving an endangered or extinct language are among the possible usages of these XOs. Such activities could benefit significantly from a storage layer optimised for multilingual and loosely structured data.

One of the building blocks of the Semantic Web, the “Triple Store”, is such a data storage service. A triple store is like a database engine optimised to store and provide access to triples: atomic statements binding together a subject, a predicate and an object, for instance <Amsterdam,isLocatedIn,Netherlands>. The object can also be a literal carrying a language tag, so these two triples would give the name of the country in two different languages: <Amsterdam,isLocatedIn,"Netherlands"@en> and <Amsterdam,isLocatedIn,"Pays-Bas"@fr>.

SemanticXO is a new project from the contributor program aimed at adding a triple store and a front-end API to the XOs’ operating system. This triple store will extend the functionality of Sugar with the possibility for all activities to store loosely structured, multilingual data and easily connect information across activities. In addition, the SPARQL protocol will allow for easy access to the data stored on any device.

A first goal is to set up RedStore on the XOs allocated to this project. RedStore is a lightweight triple store that should be able to run on low-end hardware and still provide decent performance. Stay tuned for the results! ;-)

Source: Semantic Web world for you

This is the first post on this blog, which aims at giving and pointing to information about the Semantic Web. The Semantic Web (or Web 3.0) is a new technology and research topic aimed at adding more semantics to the Web as we know it. The changes are happening in a way that is not very visible but very concrete for you, the user of the Web. On this blog you will learn more about it and how you can benefit from it, whoever you are.