News and Updates on the KRR Group

Source: Data2Semantics

The website with additional material for our paper “A Fast Approximation of the Weisfeiler-Lehman Graph Kernel for RDF Data” has won the Open Science Award at ECML/PKDD 2013. The jury praised the submission as “a perfect example of open science”.

A goal of the Data2Semantics project is to provide reusable software that supports the semantic enrichment of data. The software used for the paper therefore builds on existing, well-known libraries (SESAME, LibSVM) and was split into three distinct projects from the start. The heart of the software is the proppred library, which contains all the code for property prediction using graph kernels on RDF data. Additional support code for handling RDF lives in the d2s-tools project. All the code to run the experiments from the paper(s) is in a separate project called kernelexperiments. This setup makes it easy to replicate the experiments (and to run new ones) and to integrate the RDF property-prediction library into other projects.
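For readers unfamiliar with Weisfeiler-Lehman (WL) graph kernels, here is a minimal Python sketch of the standard WL subtree kernel on two tiny RDF-like graphs. It is only an illustration under simplifying assumptions: it is not the fast approximation from the paper and not the proppred API. The function names and toy triples are hypothetical, and RDF is reduced to outgoing (predicate, object) edges, with every node initially labelled by its own identifier.

```python
from collections import Counter


def build_graph(triples):
    """Build an adjacency map of outgoing (predicate, object) edges from
    (subject, predicate, object) triples. Assumption: each node starts out
    labelled with its own identifier, like a URI or literal in RDF."""
    out_edges = {}
    for s, p, o in triples:
        out_edges.setdefault(s, []).append((p, o))
        out_edges.setdefault(o, [])  # objects are vertices too
    labels = {v: v for v in out_edges}
    return out_edges, labels


def wl_relabel(out_edges, labels):
    """One WL iteration: a node's new label combines its current label
    with the sorted (predicate, neighbour label) pairs of its out-edges."""
    return {
        v: str((labels[v], tuple(sorted((p, labels[o]) for p, o in edges))))
        for v, edges in out_edges.items()
    }


def wl_features(graph, iterations):
    """Count every label that appears in iterations 0..iterations."""
    out_edges, labels = graph
    features = Counter(labels.values())
    for _ in range(iterations):
        labels = wl_relabel(out_edges, labels)
        features.update(labels.values())
    return features


def wl_kernel(g1, g2, iterations=2):
    """Kernel value: dot product of the two label-count vectors."""
    f1 = wl_features(g1, iterations)
    f2 = wl_features(g2, iterations)
    return sum(count * f2[label] for label, count in f1.items())


# Hypothetical toy graphs that share the "bob worksAt vu" part.
g1 = build_graph([("alice", "knows", "bob"), ("bob", "worksAt", "vu")])
g2 = build_graph([("carol", "knows", "bob"), ("bob", "worksAt", "vu")])
print(wl_kernel(g1, g2))  # 6: bob and vu match at iterations 0, 1 and 2
```

The kernel grows with the number of neighbourhood patterns the two graphs share, which is what lets an SVM (e.g. via LibSVM) compare RDF instances by structure rather than by hand-built features.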

In the future, we aim to provide even more scientific openness through the experimental machine learning platform that we are developing. One of the aims of the platform is to make experimentation easier without introducing too much overhead. Furthermore, we will export the provenance of the experiments in the PROV-O format. This provenance is visualized using Prov-O-Viz (also developed in Data2Semantics), allowing researchers to gain better insight into the experiments without having to study the code.

Source: Data2Semantics

Paul Groth co-authored an article about altmetrics in the Elsevier Library Connect newsletter for librarians. The newsletter reaches 18,000 librarians in 138 countries around the world.

Academic research and publishing have transitioned from paper to online platforms, and that migration has continued to evolve from closed platforms to connected networks. With this evolution, there is growing interest in the academic community in how we might measure scholarly activity online beyond formal citation.

See more at:


Source: Data2Semantics

As a complement to two papers that we will present at the ECML/PKDD 2013 conference in Prague in September, we created a webpage with additional material.

The first paper, “A Fast Approximation of the Weisfeiler-Lehman Graph Kernel for RDF Data”, was accepted into the main conference, and the second paper, “A Fast and Simple Graph Kernel for RDF”, was accepted at the DMoLD workshop.

We include links to the papers, to the software and to the datasets used in the experiments, which are stored in Figshare. Furthermore, we explain how to rerun the experiments from the papers using a precompiled JAR file, to keep the required effort to a minimum.

Source: Data2Semantics

On June 10 and 11, the Data2Semantics team locked itself in a room in the Amsterdam Public Library to build a first version of the Data2Semantics Golden Demo: a pipeline for publishing enriched data (‘semantics’) directly from Dropbox to Figshare, integrated in the Linkitup web service.

In two days, we built and integrated:

Watch the video!



Source: Data2Semantics

During the COMMIT/ Community event, April 2-3 in Lunteren, Data2Semantics won one of the three COMMIT/ Valorization awards. The award is a 10,000 euro subsidy to encourage the project to bring one of its products closer to use outside academia.

At the event, we presented the Data2Semantics philosophy of embedding new enrichment tools in the existing workflow of individual researchers. We are working closely with both (with our Linkitup tool) and Elsevier Labs to bring semantics to the fingertips of the researcher.

Source: Data2Semantics


Data2Semantics was the local organizer of the Beyond the PDF 2 conference. The conference brought together over 200 scholars, technologists, librarians and publishers to discuss the future of research communication. It had a huge social media presence, with 3,500 tweets sent by 625 participants over the course of the two days. There were many other outcomes as well.

This is another example of how Data2Semantics is reaching out to the scientific and research communities to push new ways of doing research.


Source: Data2Semantics


Learn to build better code in less time.

Software Carpentry is a two-day bootcamp that teaches researchers how to be more productive when writing code and creating software. VU University Amsterdam brings Software Carpentry to the Netherlands for the first time. PhD students, postdocs and researchers in physics are cordially invited to this free two-day workshop on May 2–3, 2013, in Amsterdam.

Data2Semantics is sponsoring the event to learn more about the issues scientists face in managing their data.

Go to for more information and registration (max. 40!).


Last Wednesday, Frank van Harmelen appeared on the Dutch science TV program “Labyrint”, where he interviewed George Dyson, Luc Steels and François Pachet about their ideas on the future of computers.

The program can be watched online (in Dutch):

And here’s the discussion session afterwards (in Dutch):

More information at the website of Labyrint.

Source: Data2Semantics

A few months ago, Laurens Rietveld was looking for a query interface from which he could easily query any SPARQL endpoint.

But he couldn’t find any that fit his requirements:

So he decided to make his own!

Give it a try at:

Future work (next year probably):

Comments are appreciated (including feature ideas and bug reports).

Sources are available at


This week we received notification from the EU that the LDBC project has been granted. We think this is great news. The LDBC project is a STREP and will run until Q2 2015. LDBC stands for Linked Data Benchmark Council, and linked data here of course comprises RDF data management, but also includes the emerging class of graph database systems.

The mission of the LDBC project is to establish a long-term independent association of RDF and graph database companies that defines benchmarks, specifies benchmarking practices and publishes officially vetted benchmark results. Beyond the project partners, many commercial vendors of RDF and graph database systems have already expressed their interest in joining this council (once we have founded the legal entity, which will still take a few months).

The motivation behind the project is to show the strengths (and weaknesses) of RDF and graph database technologies to the wider IT community that is considering adopting them, by enabling comparisons between the various products and with established relational database technologies. Also, by establishing competition on these benchmarks, LDBC aims to stimulate technical progress in RDF and graph database systems.

From the RDF database community, the LDBC project partners include Ontotext and Openlink; from the graph database side there is Neo Technologies (of neo4j fame), and Sparsity is indirectly involved through the academic project partner UPC (Barcelona). Other project partners are the University of Innsbruck, FORTH, VU University Amsterdam and Technical University Munich (TUM). The academic partners will help provide the council with an initial set of benchmarks.

The technical topics of interest for benchmarking are:

  • complex analytical queries for both graph and RDF
  • graph analysis algorithms and traversals
  • large-scale reasoning on RDF data
  • transaction performance
  • systems support for data integration and provenance

The use-case scenarios for these are:

  • social networking (e.g. marketing companies)
  • dynamic publishing (e.g. BBC)
  • telecommunication network analysis
  • bioinformatics data integration (e.g. OpenPhacts)

LDBC interacts with users of graph and RDF technologies through its Technical User Community (TUC), and the TUC is holding its first users workshop in Barcelona next week, on November 19 and 20, on the premises of UPC. The main reason for users to engage with the TUC is to influence the benchmarking agenda of the LDBC. Talk to us, and RDF vendors might start competing on how best to solve your problems! Even if the Barcelona meeting is too short notice, please drop us a note if you want to be involved in the TUC or know people who should be.

Finally, please fill in the questionnaire to tell us about your usage of (and problems with) RDF or graph database technologies. We will look at the questionnaire results received by Friday, November 16 to help set the agenda for the users meeting, so if you want to contribute, doing so this week would be highly appreciated.

Thanks for your time, also on behalf of the full LDBC consortium,

Peter Boncz (scientific director LDBC)
Paul Groth
Frank van Harmelen
