News and Updates on the KRR Group
Header image

If you liked WebPIE, you’ll also like QueryPIE

WebPIE performed forward inference over up to 100 billion triples (yes, that’s 10^11). Our about-to-be-published QueryPIE can do on the fly backward-chaining inference at query-time, over a billion triples, in milliseconds, on just 8 parallel machines.

Last year, Jacopo Urbani and co-authors from the LarKC team broke the speed record for forward chaining inference over OWL-Horst.  Computing the complete closure over 100 billion of triples in a number of hours using a MapReduce/Hadoop implementation on a medium-sized cluster. The performance of WebPie [see conference and journal paper] is:

  • 1 billion FactForge triples in 1.5 hours on 32 compute nodes
  • 24 billion Bio2RDF triples in 10 hours on 32 compute nodes
  • 100 billion LUBM triples in 15 hours on 64 compute notes
  • deriving anywhere between 150K-650K triples per second, depending on the dataset
  • runtime growing linearly with number of triples
  • speedup growing linearly the number of compute nodes

Now, a year later, we’re breaking another speed record, but this time for “backward chaining“: not doing all inferencing up front, but doing the inferencing “on the fly”, at query time, as and when they are needed.

Until now, backward-chaining was considered to be unfeasible on very large realistic data, since it would slow down the query response time too much. Our paper at ISWC this year shows it’s not all that impossible: on different real-life datasets of up to 1 billion triples, QueryPIE can do on the fly backward-chaining inference at query-time, implementing the full OWL Horst fragment with response times in millisecs on just 8 machines.

All code available at http://few.vu.nl/~jui200/files/querypie-1.0.0.tar.gz

Enhanced by Zemanta

Source: Semantic Web world for you

With the last post about SemanticXO dating back from April, it’s time for an update, isn’t it? ;-)

A lot of things happened since April. First, a paper about the project was accepted for presentation at the First International Conference on e-Technologies and Networks for Development (ICeND2011). Then, I spoke about the project during the symposium of the Network Institute as well as during the SugarCamp #2. Lastly, a first release of a triple-store powered Journal is now available for testing.

Publication

The paper entitled “SemanticXO : connecting the XO with the World’s largest information network ” is available from Mendeley. It explains what the goal of the project is and then report on some performance assessement and a first test activity. Most of the information contained has actually been blogged before on this blog (c.f. there and there) but if you want a global overview of the project, this paper is still worth a read. The conference in itself was very nice and I did some networking. I came back with a lot of business card and the hope of keeping in touch with the people I met there. The slides from the presentation are available from SlideShare

Presentations

The Network Institute of Amsterdam organised on May 10 the Network Institute organized a one-day symposium to strengthen the ties between its members and to stimulate further collaboration. This institute is a long-term collaboration between groups from the Department of Computer Science, the Department of Mathematics, the Faculty of Social Sciences and the Faculty of Economics and Business Administration. I presented a poster about SemanticXO and an abstract went into the proceedings of the event.

More recently, I spent the 10 and the 11 of September at Paris for the Sugar Camp #2 organised by OLPC France. Bastien managed me a bit of time on Sunday afternoon to re-do the presentation from ICeND2011 (thanks again for that!) and get some feedback. This was a very well organised event held at a cool location (“La cité des sciences“), it was also the first time I met so many other people working on Sugar and I could finally put some faces on the name I saw so many time on the mailing lists and on the GIT logs :)

First SemanticXO prototype

The project developement effort is split in 3 parts: a common layer hidding the complexity of SPARQL, a new implementation of the journal datastore and the coding of diverse activities making use of the new semantic capabilities. All three are going more or less in parallel, at different speed, as, for instance, the work on activities direct what the common layer will contain. I’ve focused my efforts on the journal datastore to get something ready to test. It’s a very first prototype that has been coded starting with the genuine datastore 0.92 and replacing the part in charge of the metadata. The code taking care of the files remains the same. This new datastore is available from Gitorious but because installing the triple store and replacing the journal is a tricky manual process, I bundled all of that ;-)

Installation

The installation bundle consists of two files, a “semanticxo.tgz” and a script “patch-my-xo.sh“. To install SemanticXO, you need to download the two and put them in the same location somewhere on your machine and then type (as root):

sh ./patch-my-xo.sh setup

This will install a triple store, add it to the daemons to start at boot time and replace the default journal by one using the triple store. Be careful to have backups if needed as this will remove all the content previously stored in the journal! Once the script has been executed, reboot the machine to start using the new software.

The bundle has been tested on an XO-1 running the software release 11.2.0 but it should work on any software release on both the XO-1 and XO-1.5. This bundle won’t work on the 1.75 has it contains a binary (the triple store) not compiled for ARM.

What now?

Now that you have the thing installed, open the browser and go to “http://127.0.0.1:8080″. You will see the web interface of the triple store which allows you to make some SPARQL queries and see which named graphs are stored. If you are not fluent in SPARQL, the named graph interface is the most interesting part to play with. Every entry in the journal gets its own named graph, after having populated the journal with some entries you will see this list of named graphs growing. Click on one of them and the content of the journal entry will be displayed. Note that this web interface is also accessible from any other machine on the same network as the XO. This yields new opportunities in terms of backup and information gathering: a teacher can query the journal of any XO directly from a school server, or an other XO.

Removing

The patch script comes with an install function if you want to revert the XO to its original setup. To use it, simply type (as root):

sh ./patch-my-xo.sh remove

and then reboot the machine.

Data2Semantics aims to provide essential semantic infrastructure for bringing e-Science to the next level.

A core task for scientific publishers is to speed up scientific progress by improving the availability of scientific knowledge. This holds both for dissemination of results through traditional publications, as well as through the publication of scientific data. The Data2Semantics project focuses on a key problem for data management in e-Science:

How to share, publish, access, analyse, interpret and reuse data?

Data2Semantics is a collaboration between the VU University Amsterdam, the University of Amsterdam, Data Archiving and Networked Services (DANS) of the KNAW, Elsevier Publishing and Philips, and is funded under the COMMIT programme of the NL Agency of the Dutch Ministry of Economic Affairs, Agriculture and Innovation.

Enhanced by Zemanta