Another speed record for OWL Horst inference
If you liked WebPIE, you’ll also like QueryPIE
WebPIE performed forward inference over up to 100 billion triples (yes, that's 10^11). Our about-to-be-published QueryPIE can do on-the-fly backward-chaining inference at query time, over a billion triples, in milliseconds, on just 8 parallel machines.
Last year, Jacopo Urbani and co-authors from the LarKC team broke the speed record for forward-chaining inference over OWL Horst, computing the complete closure over 100 billion triples in a matter of hours using a MapReduce/Hadoop implementation on a medium-sized cluster (a small code sketch of the forward-chaining idea follows the list below). The performance of WebPIE [see conference and journal paper] is:
- 1 billion FactForge triples in 1.5 hours on 32 compute nodes
- 24 billion Bio2RDF triples in 10 hours on 32 compute nodes
- 100 billion LUBM triples in 15 hours on 64 compute nodes
- deriving anywhere between 150K-650K triples per second, depending on the dataset
- runtime growing linearly with number of triples
- speedup growing linearly with the number of compute nodes
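To make the forward-chaining idea concrete, here is a minimal, in-memory sketch that applies just one of the OWL Horst rules (rdfs11, the transitivity of rdfs:subClassOf) until a fixed point is reached. This is not WebPIE code: WebPIE encodes such rules as optimized MapReduce joins over distributed triple files, while the Triple record and the naive nested loop below are purely illustrative.

```java
import java.util.*;

// Minimal sketch of forward chaining: repeatedly apply one rule
// (rdfs11, subClassOf transitivity) until no new triples are derived.
public class ForwardChainingSketch {

    record Triple(String s, String p, String o) {}   // illustrative, not WebPIE's representation

    static final String SUBCLASS = "rdfs:subClassOf";

    // Compute the closure of the subClassOf hierarchy.
    static Set<Triple> closure(Set<Triple> triples) {
        Set<Triple> closed = new HashSet<>(triples);
        boolean changed = true;
        while (changed) {                            // iterate to a fixed point
            changed = false;
            List<Triple> derived = new ArrayList<>();
            for (Triple a : closed) {
                if (!a.p().equals(SUBCLASS)) continue;
                for (Triple b : closed) {
                    // rdfs11: (x subClassOf y), (y subClassOf z) => (x subClassOf z)
                    if (b.p().equals(SUBCLASS) && a.o().equals(b.s())) {
                        derived.add(new Triple(a.s(), SUBCLASS, b.o()));
                    }
                }
            }
            for (Triple t : derived) {
                if (closed.add(t)) changed = true;
            }
        }
        return closed;
    }

    public static void main(String[] args) {
        Set<Triple> input = new HashSet<>(List.of(
            new Triple(":Student", SUBCLASS, ":Person"),
            new Triple(":Person",  SUBCLASS, ":Agent")
        ));
        // Derives (:Student subClassOf :Agent) and prints all three triples.
        closure(input).forEach(System.out::println);
    }
}
```

The point of the sketch is the fixed-point loop: everything derivable is materialized up front, which is why the closure of 100 billion triples takes hours even on a cluster.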
Now, a year later, we're breaking another speed record, but this time for "backward chaining": not doing all the inferencing up front, but doing it "on the fly", at query time, as and when inferences are needed.
Until now, backward chaining was considered infeasible on very large realistic data, since it would slow down query response times too much. Our paper at ISWC this year shows otherwise: on real-life datasets of up to 1 billion triples, QueryPIE does on-the-fly backward-chaining inference at query time, implementing the full OWL Horst fragment with response times in milliseconds on just 8 machines.
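For contrast, here is a minimal sketch of the backward-chaining alternative for the same rdfs11 rule: nothing is materialized up front; instead, the query "is x a subclass of z?" is answered at query time by recursing over the stored triples. QueryPIE applies the full OWL Horst rule set this way, with considerably more machinery (parallel execution, pruning of rule branches); the class and method names below are illustrative only.

```java
import java.util.*;

// Minimal sketch of backward chaining for rdfs11: the rule is applied
// backwards from the query goal, at query time, instead of up front.
public class BackwardChainingSketch {

    // Stored (explicit) subClassOf edges: subject -> direct superclasses.
    private final Map<String, Set<String>> subClassOf = new HashMap<>();

    void assertSubClass(String sub, String sup) {
        subClassOf.computeIfAbsent(sub, k -> new HashSet<>()).add(sup);
    }

    // Goal: (x subClassOf z)? Proved either by an explicit triple or by
    // rdfs11 applied backwards: find some y with (x subClassOf y) stored
    // and recursively prove (y subClassOf z). 'visited' guards against cycles.
    boolean isSubClassOf(String x, String z, Set<String> visited) {
        if (!visited.add(x)) return false;           // cycle guard
        Set<String> supers = subClassOf.getOrDefault(x, Set.of());
        if (supers.contains(z)) return true;         // explicit triple
        for (String y : supers) {                    // backward rule application
            if (isSubClassOf(y, z, visited)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        BackwardChainingSketch kb = new BackwardChainingSketch();
        kb.assertSubClass(":Student", ":Person");
        kb.assertSubClass(":Person", ":Agent");
        // Nothing was materialized, yet the query succeeds at query time.
        System.out.println(kb.isSubClassOf(":Student", ":Agent", new HashSet<>()));
    }
}
```

The trade-off is visible even in this toy: backward chaining does work per query instead of per dataset, so keeping response times in milliseconds at a billion triples is exactly the hard part the paper addresses.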
All code is available at http://few.vu.nl/~jui200/files/querypie-1.0.0.tar.gz