News and Updates on the KRR Group

The Botari application from the LarKC project has won the Open Track of the Semantic Web Challenge.

Botari is a LarKC workflow running on servers in Seoul, plus a user frontend that runs on a Galaxy Tab.

The workflow combines open data from the city of Seoul (OpenStreetMap, POIs) with Twitter traffic, using stream processing, machine learning and querying over RDF datasets and streams to give personalised restaurant information and recommendations, presented in an augmented-reality interface on the Galaxy Tab.
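To make the combination of open POI data and Twitter traffic concrete, here is a minimal sketch (not the actual Botari code; all venue names and coordinates are invented placeholders) of one building block: scoring restaurant POIs by nearby geotagged tweet activity.

```python
# Illustrative sketch only: rank restaurant POIs by the number of recent
# geotagged tweets falling within a radius of each venue.
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + \
        cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371000 * asin(sqrt(a))

def rank_pois(pois, tweets, radius_m=200):
    """Score each POI by the number of tweets within radius_m; highest first."""
    scored = []
    for name, lat, lon in pois:
        hits = sum(1 for tlat, tlon in tweets
                   if haversine_m(lat, lon, tlat, tlon) <= radius_m)
        scored.append((hits, name))
    return [name for hits, name in sorted(scored, reverse=True)]

pois = [("Gwangjang Market", 37.5700, 126.9996),
        ("Itaewon Grill", 37.5345, 126.9946)]
tweets = [(37.5702, 126.9994), (37.5698, 127.0001), (37.5344, 126.9947)]
print(rank_pois(pois, tweets))  # ['Gwangjang Market', 'Itaewon Grill']
```

In the real workflow this kind of spatial join runs over RDF streams rather than in-memory lists, but the ranking idea is the same.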

For more info on Botari, see the website, the demo movie, the slide deck, or the paper.


The LarKC project’s development team would like to announce a new release (v3.0) of the LarKC platform, which is available for download here. The new release is a considerable improvement over the previous release (v2.5), with the following distinctive features:

Platform:
  • a new (plain) plug-in registry
  • light-weight plug-in loading, and thus a very low platform start-up time [...]

If you liked WebPIE, you’ll also like QueryPIE

WebPIE performed forward inference over up to 100 billion triples (yes, that’s 10^11). Our about-to-be-published QueryPIE can do on-the-fly backward-chaining inference at query time, over a billion triples, in milliseconds, on just 8 parallel machines.

Last year, Jacopo Urbani and co-authors from the LarKC team broke the speed record for forward-chaining inference over OWL-Horst, computing the complete closure over 100 billion triples in a matter of hours using a MapReduce/Hadoop implementation on a medium-sized cluster. The performance of WebPIE [see the conference and journal papers] is:

  • 1 billion FactForge triples in 1.5 hours on 32 compute nodes
  • 24 billion Bio2RDF triples in 10 hours on 32 compute nodes
  • 100 billion LUBM triples in 15 hours on 64 compute nodes
  • deriving anywhere between 150K-650K triples per second, depending on the dataset
  • runtime growing linearly with number of triples
  • speedup growing linearly with the number of compute nodes
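The MapReduce structure behind this kind of forward chaining can be sketched in a few lines. The toy below (single-machine, in-memory; not WebPIE's actual code) shows one map/reduce pass for a single RDFS rule, rdfs9: if (x rdf:type C) and (C rdfs:subClassOf D), then (x rdf:type D). WebPIE applies such rules repeatedly on Hadoop until a fixpoint is reached.

```python
# Toy single-pass MapReduce join for RDFS rule rdfs9.
from collections import defaultdict

TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"

def map_phase(triples):
    """Key each relevant triple by the class it mentions."""
    for s, p, o in triples:
        if p == TYPE:
            yield o, ("instance", s)   # x is an instance of class o
        elif p == SUBCLASS:
            yield s, ("super", o)      # class s has superclass o

def reduce_phase(grouped):
    """Join instances with superclasses that share the same class key."""
    for key, values in grouped.items():
        instances = [v for tag, v in values if tag == "instance"]
        supers = [v for tag, v in values if tag == "super"]
        for x in instances:
            for d in supers:
                yield (x, TYPE, d)

def rdfs9_pass(triples):
    grouped = defaultdict(list)
    for key, value in map_phase(triples):
        grouped[key].append(value)
    return set(reduce_phase(grouped)) - set(triples)

triples = [("lassie", TYPE, "Dog"), ("Dog", SUBCLASS, "Mammal"),
           ("Mammal", SUBCLASS, "Animal")]
print(rdfs9_pass(triples))  # {('lassie', 'rdf:type', 'Mammal')}
```

A second pass over the enlarged set would then derive ('lassie', 'rdf:type', 'Animal'); iterating to fixpoint is what "computing the complete closure" means.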

Now, a year later, we’re breaking another speed record, but this time for “backward chaining”: not doing all the inferencing up front, but doing it “on the fly”, at query time, as and when it is needed.

Until now, backward chaining was considered infeasible on very large realistic data, since it would slow down query response times too much. Our paper at ISWC this year shows otherwise: on different real-life datasets of up to 1 billion triples, QueryPIE can do on-the-fly backward-chaining inference at query time, implementing the full OWL Horst fragment, with response times in milliseconds on just 8 machines.
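The contrast with forward chaining is easy to see in miniature. This sketch (in the spirit of QueryPIE, not its code) answers "is x of type C?" at query time by recursing over rdfs:subClassOf, touching only the triples the query needs instead of materialising the full closure up front.

```python
# Toy backward-chaining sketch: derive rdf:type answers at query time.
def has_type(x, cls, triples, seen=None):
    seen = set() if seen is None else seen
    if (x, "rdf:type", cls) in triples:
        return True
    # backward step: x : C holds if x : B for some B with B rdfs:subClassOf C
    for s, p, o in triples:
        if p == "rdfs:subClassOf" and o == cls and s not in seen:
            seen.add(s)
            if has_type(x, s, triples, seen):
                return True
    return False

triples = {("lassie", "rdf:type", "Dog"),
           ("Dog", "rdfs:subClassOf", "Mammal"),
           ("Mammal", "rdfs:subClassOf", "Animal")}
print(has_type("lassie", "Animal", triples))  # True
```

The engineering challenge QueryPIE addresses is making exactly this kind of goal-directed search fast over a billion triples, which naive recursion like the above would not be.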

All code available at


By Zhisheng Huang

The China Higher Education Press will publish a LarKC book in Chinese. The book will appear in the book series Web Intelligence and Web Science.

This Chinese LarKC book consists of two parts: a technology part and an application part. The technology part covers the LarKC platform, the development guide, and various plug-ins and workflows. The application part covers Linked Life Data, semantic information retrieval, urban computing, and cancer study. The main contributors to the book are six Chinese researchers in the LarKC Consortium, from Amsterdam, WICI, and Siemens. See the appended text below for details.
The book is expected to be published by the end of this year.

Here is the outline of the book content and the main contributors.

Chapter 1 Introduction to LarKC
by Zhisheng Huang (VUA) and Ning Zhong (WICI)

Chapter 2 LarKC Platform
by Jun Fang (VUA)

Chapter 3 Identification and Selection
by Yi Zeng (WICI)

Chapter 4 Abstraction and Transformation
by Yi Huang (SIEMENS)

Chapter 5 Reasoning and Deciding
by Jun Fang (VUA) and Zhisheng Huang (VUA)

Chapter 6 LarKC Development Guide
by Zhisheng Huang (VUA) and Jun Fang (VUA)

Chapter 7 Linked Life Data
by Yi Huang (SIEMENS) and Zhisheng Huang (VUA)

Chapter 8 Semantic information retrieval for biomedical applications
by Ru He (SIEMENS) and Zhisheng Huang (VUA)

Chapter 9 Semantic Technology and Gene Study
by Zhisheng Huang (VUA)

Chapter 10 Urban Computing
by Yi Huang (SIEMENS) and Zhisheng Huang (VUA)

Chapter 11 Conclusions
by Zhisheng Huang (VUA),  Ru He (SIEMENS), and Ning Zhong (WICI)

The LarKC folk at the German High Performance Computing Centre in Stuttgart did a rather nice write-up on LarKC from a high-performance computing perspective, intended for their own community. Find the relevant pages here.


A LarKC workflow for traffic-aware route-planning has won first prize in the AI Mashup Challenge at the ESWC 2011 conference, held this week on Crete.

The details of “Traffic_LarKC” can be found elsewhere, but in brief:

Four different datasets are used:

  • the traffic sensors data, obtained from Milano Municipality
  • the Milano street topology
  • historical weather data from the Italian website
  • calendar information (week days and week-end days, holidays, etc.) from Milano Municipality and from the Mozilla Calendar project.

These are used in a batch-time workflow to predict the traffic situation over the next two hours, and in a runtime workflow to respond to route-planning queries from users.
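The batch-time idea can be illustrated with a deliberately tiny sketch: predict a traffic level for a road segment from historical observations, keyed by the calendar and weather context. All field names and values below are invented for illustration; the real workflow operates on RDF streams from the four datasets above.

```python
# Hypothetical sketch: average historical speeds per (day type, weather)
# context, then predict by looking up the current context.
from collections import defaultdict

def train(history):
    """history: list of (day_type, weather, speed_kmh) observations."""
    sums = defaultdict(lambda: [0.0, 0])
    for day_type, weather, speed in history:
        acc = sums[(day_type, weather)]
        acc[0] += speed
        acc[1] += 1
    return {k: total / n for k, (total, n) in sums.items()}

def predict(model, day_type, weather, default=50.0):
    """Fall back to a default speed for contexts never seen in training."""
    return model.get((day_type, weather), default)

history = [("weekday", "rain", 22.0), ("weekday", "rain", 18.0),
           ("weekday", "dry", 41.0), ("weekend", "dry", 55.0)]
model = train(history)
print(predict(model, "weekday", "rain"))  # 20.0
```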

This LarKC workflow shows that Linked Open Data and the corresponding technologies are now getting good enough to compete with what’s possible in closed commercial systems.

Congratulations to the entire team that has made this possible!

LarKC traffic demo

The LarKC development team is proud to announce the new release V2.5 of the LarKC platform. The new release is a considerable improvement over the previous V2.0 edition, with the following distinctive features:

  • V2.5 is fully compliant with the LarKC final architecture. You can now develop your workflows and plugins, and be assured that future updates won’t change the main APIs.
  • The Management Interface, which makes it possible to run LarKC from your browser, has an updated RESTful implementation. Besides RDF/XML, workflows can now be described in very readable N3 notation.
  • The endpoint for submitting queries to LarKC is now user-definable, and multiple endpoints are supported.
  • The Plug-in Registry has been improved, and is now coupled with the browser-based Management Interface.
  • LarKC now uses a Maven-based build system, giving improved version and dependency management and a simplified procedure for new plug-in creation.
  • A number of extra tools have been introduced to make life a lot easier for LarKC users. Besides the Management Interface to run LarKC from your browser, V2.5 also contains:
    • A WYSIWYG Workflow Designer tool that allows you to construct workflows by drag-and-drop, right from your browser: click on some plugins, drag them to the workspace, click to connect them, and press run! (see screenshot below).
    • An updated plug-in wizard for Eclipse.
  • We have thoroughly updated the distributed execution framework. Besides deploying LarKC plug-ins through Apache (simply by dropping them in your Apache folder), it is now also possible to deploy plug-ins through JEE (for web servers) or GAT (for clusters).
  • The WYSIWYG Workflow Designer allows you to specify remote execution of a plugin simply by connecting a plugin to a remote host. Templates are provided for such remote host declaration.
  • LarKC now takes care of advanced data caching for plug-ins.
  • V2.5 comes with extended and improved JUnit tests.
  • Last but not least, we have considerably improved documentation and user manuals, including a quick-start guide, tutorial materials and example workflows.

The release can be downloaded from
The platform’s manual is available at

Bugs can be submitted using the bug tracker at

As usual, you are encouraged to use the discussion forums and mailing lists served by the LarKC@SourceForge development environment.

LarKC Workflow Editor

Should the semantic web be just about querying RDF? Or is it useful (or even feasible) to use the semantics of RDF, RDF Schema and OWL to derive additional information from published RDF graphs? Both the feasibility and the usefulness of this depend on the number of additional triples derived by inference: when almost zero, there is little point to inference; when explosively large, it might become infeasible.

LarKC researchers at Ontotext produced an informative table showing the number of additional triples that can be inferred from some of the most popular datasets on the Web. It’s interesting to see how the datasets differ in their semantic richness, with the ratio of inferred to explicit triples ranging from close to zero (CIA Factbook) to a 16-fold increase (DBpedia). Please let us know if you have similar statistics for other datasets.

All of the data below is taken from FactForge, which by itself now contains 1.5 billion triples, nearly four times more than at the beginning of the LarKC project in 2008. All of the figures below were obtained with BigOWLIM 3.4, under OWL-Horst semantics. Sizes are reported in thousands of triples.

Dataset                               Explicit   Inferred   Total      Entities   Inferred
                                      indexed    indexed    indexed    (graph     closure
                                      triples    triples    triples    nodes)     ratio
Schemata (Proton, DC) and
ontologies (DBpedia, Geonames)        15         9          23         8          0.6
DBpedia (SKOS)                        2,915      47,837     50,751     1,135      16.4
NY Times                              574        328        902        185        0.6
UMBEL                                 4,638      6,936      11,575     1,190      1.5
Lingvoj                               20         182        201        18         9.2
CIA Factbook                          76         4          80         24         0.1
WordNet                               1,943      6,067      8,010      842        3.1
Geonames                              142,011    194,191    336,202    42,738     1.4
DBpedia core                          825,162    166,740    991,902    125,803    0.2
Freebase                              494,344    52,411     546,754    123,511    0.1
MusicBrainz                           45,492     36,572     82,064     15,585     0.8
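The "inferred closure ratio" in the table is simply inferred triples divided by explicit triples. The self-contained toy below illustrates the idea on a tiny graph under a single rule, rdfs:subClassOf transitivity (rdfs11); a real reasoner such as BigOWLIM applies the full OWL-Horst rule set.

```python
# Compute an inferred-closure ratio for a toy graph under rdfs11
# (subClassOf transitivity) only.
def closure_ratio(explicit):
    sub = {(s, o) for s, p, o in explicit if p == "rdfs:subClassOf"}
    closed = set(sub)
    changed = True
    while changed:  # naive fixpoint: add (a, c) whenever (a, b) and (b, c)
        changed = False
        for a, b in list(closed):
            for b2, c in list(closed):
                if b == b2 and (a, c) not in closed:
                    closed.add((a, c))
                    changed = True
    inferred = closed - sub
    return len(inferred) / len(explicit)

g = [("A", "rdfs:subClassOf", "B"),
     ("B", "rdfs:subClassOf", "C"),
     ("C", "rdfs:subClassOf", "D")]
print(closure_ratio(g))  # 3 inferred (A-C, B-D, A-D) over 3 explicit -> 1.0
```

Deep class hierarchies inflate this ratio quadratically, which is one reason schema-rich datasets like DBpedia's SKOS categories sit at the high end of the table.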


By Bosse Andersson
The first LarKC Pharma workshop was held in Stuttgart on April 19 and 20. An interesting mix of participants from pharmaceutical companies, semantic web companies and research/academia created an open atmosphere with many intense discussions and, hopefully, future interactions.

The workshop had an outline similar to previous LarKC tutorials with a twist from the pharma domain in presentations and examples.

Participants found the LarKC platform and the Linked Life Data repository useful:

  • From the pharma perspective, questions circulated around the requirements for hosting and using LarKC as an internal experimental platform.
  • The semantic web companies were more interested in how to use components of LarKC, or in providing services that leverage the LarKC platform.
  • The research/academia community had a specific need to learn how to quickly get LarKC up and running for the first iteration of the Innovative Medicines Initiative project Open PHACTS.

Many questions came up during the lively discussions; some were answered, and others will be brought back to the consortium to address, e.g. how to lower the barrier to getting started with LarKC.

Although LarKC is based in Europe, the project of building, and applying, web-scale reasoning is worldwide. One of the most exciting things about living in a connected world, and a world of abundant, location-independent computational resources, is that people anywhere in the world can do world-class AI research and develop applications based on that research. The recent, and very rapid, increase in internet bandwidth going into Africa means that one can now use Shazam to get impromptu karaoke lyrics for the Texas country-and-western playing in a hotel bar in Accra. It also means that previously isolated African researchers can make a full contribution to the advance of semantic technology.

In February, partially supported by the FP7 Active project, we had the opportunity to present LarKC, and the potential benefits of AI and human-computer collaboration, to students and researchers at the Ghana-India Kofi Annan Centre of Excellence in ICT in Ghana. Discussion following the talks was lively, with great local ideas for the application of AI in knowledge capture from small farmers and in resource allocation for rural health care. Video from some of the talks is being made available, there was good coverage from the local media, and we look forward to building a collaboration with our new colleagues.

