News and Updates on the KRR Group

The LarKC development team is proud to announce the new release V2.5 of the LarKC platform. The new release is a considerable improvement over the previous V2.0 edition, with the following distinctive features:

  • V2.5 is fully compliant with the LarKC final architecture. You can now develop your workflows and plugins, and be assured that future updates won’t change the main APIs.
  • The Management Interface, which makes it possible to run LarKC from your browser, has an updated RESTful implementation. Besides RDF/XML, workflows can now be described in very readable N3 notation.
  • The endpoint for submitting queries to LarKC is now user-definable, and multiple endpoints are supported.
  • The Plug-in Registry has been improved, and is now coupled with the browser-based Management Interface.
  • LarKC now uses a Maven-based build system, giving improved version and dependency management, and a simplified procedure for new plug-in creation.
  • A number of extra tools have been introduced to make life a lot easier for LarKC users. Besides the Management Interface to run LarKC from your browser, V2.5 also contains:
    • A WYSIWYG Workflow Designer tool that allows you to construct workflows by drag-and-drop, right from your browser: click on some plugins, drag them to the workspace, click to connect them, and press run! (see screenshot below).
    • An updated plug-in wizard for Eclipse.
  • We have thoroughly updated the distributed execution framework. Besides deploying LarKC plugins through Apache (simply by dropping them in your Apache folder), it is now also possible to deploy plugins through JEE (for webservers) or GAT (for clusters).
  • The WYSIWYG Workflow Designer allows you to specify remote execution of a plugin simply by connecting a plugin to a remote host. Templates are provided for such remote host declaration.
  • LarKC now takes care of advanced data caching for plug-ins.
  • V2.5 comes with extended and improved JUnit tests.
  • Last but not least, we have considerably improved documentation and user manuals, including a quick-start guide, tutorial materials and example workflows.

The release can be downloaded from
The platform’s manual is available at

Bugs can be submitted using the bug tracker at

As usual, you are encouraged to use the discussion forums and mailing lists served by the LarKC@SourceForge development environment.
please see at

LarKC Workflow Editor

Source: Think Links

While exploring the London Science Museum, I saw this great exhibit for the Toaster Project. The idea was to try to build a modern-day toaster from scratch. There’s a video describing the project below and more info about the project from the site linked above. What was interesting was that, to find out how things were produced, Thomas Thwaites had to look in some pretty old books. I think it would be cool to make it easy to link every product in my house to how to produce it (or how it was created) without going through a 9-month process to figure it out.

Filed under: supply chains Tagged: cool project, real world provenance, toaster project

Should the semantic web be just about querying RDF? Or is it useful (or even feasible) to use the semantics of RDF, RDF Schema and OWL to derive additional information from the published RDF graphs? Both the feasibility and the usefulness of this depend on the number of additional triples that are derived by inference: when it is almost zero, there is little point to inference; when it is explosively large, inference might become infeasible.
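To make "additional triples derived by inference" concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: it applies just two RDFS entailment rules (subclass transitivity and type propagation along subclass links), the `ex:` triples are made up, and it has nothing to do with how any real reasoner is implemented.

```python
# Toy forward-chaining over two RDFS rules; all "ex:" data is invented for illustration.
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def rdfs_closure(explicit):
    """Return the input triples plus everything the two rules below entail."""
    closure = set(explicit)
    changed = True
    while changed:
        new = set()
        for (s, p, o) in closure:
            for (s2, p2, o2) in closure:
                # Only pairs of the form (s p o), (o rdfs:subClassOf o2) are relevant.
                if p2 != SUBCLASS or s2 != o:
                    continue
                if p == SUBCLASS:
                    # Subclass transitivity (rule rdfs11).
                    new.add((s, SUBCLASS, o2))
                elif p == TYPE:
                    # Instances of a subclass are instances of the superclass (rule rdfs9).
                    new.add((s, TYPE, o2))
        changed = not new <= closure
        closure |= new
    return closure


explicit = {
    ("ex:Toaster", SUBCLASS, "ex:Appliance"),
    ("ex:Appliance", SUBCLASS, "ex:Product"),
    ("ex:myToaster", TYPE, "ex:Toaster"),
}
closure = rdfs_closure(explicit)
inferred = closure - explicit
print(len(explicit), "explicit,", len(inferred), "inferred")
```

In this toy graph three explicit triples yield three inferred ones. Real datasets differ widely in how much such rules add, which is exactly the question raised here.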

LarKC researchers at OntoText produced an informative table showing the number of additional triples that can be inferred from some of the most popular datasets on the Web. It’s interesting to see how the datasets differ in their semantic richness, with the ratio of inferred to explicit triples ranging from close to zero (CIA Factbook) to a 16-fold increase (for DBpedia). Please let us know if you have similar statistics for other datasets.

All of the data below is taken from FactForge, which by itself now contains 1.5 billion triples, nearly four times as many as at the beginning of the LarKC project in 2008. All of the figures below were obtained with BigOWLIM 3.4, under the OWL-Horst semantics. Sizes are reported in thousands of triples.

| Dataset | Explicit indexed triples | Inferred indexed triples | Total indexed triples | Entities (nodes in the graph) | Inferred closure ratio |
|---|---|---|---|---|---|
| Schemata (Proton, DC) and ontologies (DBpedia, Geonames) | 15 | 9 | 23 | 8 | 0.6 |
| DBpedia (SKOS | 2,915 | 47,837 | 50,751 | 1,135 | 16.4 |
| NY Times | 574 | 328 | 902 | 185 | 0.6 |
| UMBEL | 4,638 | 6,936 | 11,575 | 1,190 | 1.5 |
| Lingvoj | 20 | 182 | 201 | 18 | 9.2 |
| CIA Factbook | 76 | 4 | 80 | 24 | 0.1 |
| WordNet | 1,943 | 6,067 | 8,010 | 842 | 3.1 |
| Geonames | 142,011 | 194,191 | 336,202 | 42,738 | 1.4 |
| DBpedia core | 825,162 | 166,740 | 991,902 | 125,803 | 0.2 |
| Freebase | 494,344 | 52,411 | 546,754 | 123,511 | 0.1 |
| MusicBrainz | 45,492 | 36,572 | 82,064 | 15,585 | 0.8 |
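The last column appears to be the number of inferred triples divided by the number of explicit triples. That reading is an assumption based on the figures themselves, not something the table states. A short Python check on a few rows, with counts copied from the table:

```python
# (explicit, inferred) counts copied from the table; the formula is an assumed reading.
rows = {
    "CIA Factbook": (76, 4),
    "WordNet": (1943, 6067),
    "Geonames": (142011, 194191),
    "DBpedia core": (825162, 166740),
}


def closure_ratio(explicit, inferred):
    """Inferred triples per explicit triple, rounded to one decimal place."""
    return round(inferred / explicit, 1)


ratios = {name: closure_ratio(e, i) for name, (e, i) in rows.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio}")
```

For these rows the computed values match the reported ratios. A row such as Lingvoj comes out slightly different (182 / 20 = 9.1 versus the reported 9.2), presumably because the published counts are themselves rounded.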

Source: Think Links

I’m in London for a number of meetings. Last week I had a great time talking with chemists and IT people about how to deal with chemistry data in the new project I’m working on, OpenPhacts. You’ll probably hear more about this from me as the project gets up and running. This week I’m at a workshop discussing and hacking some next generation ways of measuring impact in science.

Anyway, on the weekend I got to visit some of London’s fantastic museums. I spend a lot of my time thinking about ways of describing the provenance of things, particularly data. This tends to get rather complicated… But visiting these museums, you see how some very simple provenance can add a lot to understanding something. Here are some examples:

A very cool looking map of Britain from the Natural History Museum:

Checking out the bit of text that goes with it:

We now know that it was produced by William Smith by himself in 1815 and that this version is a facsimile. Furthermore, we find out that it was the first geological map of Britain. That little bit of information about the map’s origins makes it even cooler to look at.

Another example, this time from the Victoria and Albert Museum. An action-packed sculpture:

And we look at the text associated with it:

and find some interesting provenance information. We have a rough idea of when it was produced (between 1622 and 1623) and who made it (Bernini). Interestingly, we also find out how it passed through its series of owners: from Cardinal Montalto to Joshua Reynolds, then into the Yarborough Collection, and finally it was purchased by the museum. This chain of ownership is classic provenance. Actually, Wikipedia has even more complete provenance of the sculpture.

These examples illustrate how a bit of provenance can add so much more richness and meaning to objects. I’m going to be on the lookout for provenance in the wild.

If you spot some cool examples of provenance, let me know.

Filed under: communicating provenance

Source: Think Links

I’m pretty excited about the Beyond Impact workshop next week in London. It’s a workshop/hackathon to look at next generation ways of measuring impact in science. This has to do with the altmetrics initiative I’m involved with and our Semantically Mapping Science project.

Here’s me giving a video introduction of myself for the workshop….

Filed under: academia Tagged: #altmetrics, beyondimpact

By Bosse Andersson
The first LarKC Pharma workshop was held in Stuttgart on April 19 and 20. An interesting mix of participants from pharmaceutical companies, semantic web companies and research/academia created an open atmosphere with many intense discussions and, hopefully, future interactions.

The workshop had an outline similar to previous LarKC tutorials with a twist from the pharma domain in presentations and examples.

Participants found the LarKC platform and the Linked Life Data repository useful:

  • From the pharma perspective, questions centered on what the requirements would be for pharma companies to host and use LarKC as an internal experimental platform.
  • The semantic web companies were more interested in how to use components of LarKC or provide services that build on the LarKC platform.
  • The research/academia community had a specific need to learn how to quickly get LarKC up and running for the first iteration in the Innovative Medicines Initiative project, OpenPhacts.

Many questions came up during lively discussions. Some were answered; others will be brought back to the consortium to address, e.g. how to lower the barrier to entry for new LarKC users.