News and Updates on the KRR Group

Author Archives: data2semantics

Source: Data2Semantics

Our website with additional material for our paper: “A Fast Approximation of the Weisfeiler-Lehman Graph Kernel for RDF Data” has won the Open Science Award at ECML/PKDD 2013. The jury praised the submission as “a perfect example of open science”.

A goal of the Data2Semantics project is to provide reusable software to support semantic enrichment of data. The software used for the paper therefore builds on existing, well-known libraries (SESAME, LibSVM) and was split into three distinct projects from the start. The heart of the software is the proppred library, which contains all the code for property prediction using graph kernels on RDF data. Additional support code for handling RDF lives in the d2s-tools project. All the code to run the experiments from the paper(s) is in a separate project called kernelexperiments. This setup makes it easy to replicate the experiments (and to run new ones), and to integrate the property-prediction-on-RDF library into other projects.
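The proppred library itself implements the fast approximation described in the paper; purely as an illustration of the underlying idea, here is a minimal Python sketch of the classic Weisfeiler-Lehman relabeling kernel on small labelled graphs. All names below are our own for this sketch, not taken from the library:

```python
from collections import Counter

def wl_relabel(adj, labels, iterations):
    """Weisfeiler-Lehman relabeling: each round, a node's new label is its
    old label joined with the sorted multiset of its neighbours' labels."""
    history = [Counter(labels.values())]
    for _ in range(iterations):
        labels = {v: labels[v] + "|" + ",".join(sorted(labels[u] for u in adj[v]))
                  for v in adj}
        history.append(Counter(labels.values()))
    return history

def wl_kernel(adj1, lab1, adj2, lab2, iterations=2):
    """Kernel value: dot product of the label-count feature vectors,
    summed over all relabeling rounds."""
    h1 = wl_relabel(adj1, lab1, iterations)
    h2 = wl_relabel(adj2, lab2, iterations)
    return sum(c1[l] * c2[l] for c1, c2 in zip(h1, h2) for l in c1)

# Two tiny labelled graphs as adjacency lists (here: the same small star).
g = {0: [1, 2], 1: [0], 2: [0]}
lab = {0: "A", 1: "B", 2: "B"}
print(wl_kernel(g, lab, g, lab))  # 15: self-similarity of the star graph
```

The paper's contribution is an approximation that avoids recomputing these labels for every pair of RDF subgraphs; the sketch above only shows the exact kernel the approximation speeds up.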

For the future, we aim to provide even more scientific openness via the experimental machine learning platform that we are developing. One of the aims of the platform is to make experimentation easier without introducing too much overhead. Furthermore, we will export the provenance of the experiments in the PROV-O format. This provenance is visualized using Prov-O-Viz (also developed in Data2Semantics), allowing researchers to gain better insight into the experiments without having to study the code.

Source: Data2Semantics

Paul Groth co-authored an article about altmetrics in the Elsevier Library Connect newsletter for librarians. The newsletter reaches 18,000 librarians in 138 countries around the world.

Academic research and publishing have transitioned from paper to online platforms, and that migration has continued to evolve from closed platforms to connected networks. With this evolution, there is growing interest in the academic community in how we might measure scholarly activity online beyond formal citation.

See more at:


Source: Data2Semantics

As a complement to two papers that we will present at the ECML/PKDD 2013 conference in Prague in September, we have created a webpage with additional material.

The first paper, “A Fast Approximation of the Weisfeiler-Lehman Graph Kernel for RDF Data”, was accepted at the main conference; the second, “A Fast and Simple Graph Kernel for RDF”, was accepted at the DMoLD workshop.

We include links to the papers, the software and the datasets used in the experiments, which are stored on figshare. Furthermore, we explain how to rerun the experiments from the papers using a precompiled JAR file, to keep the required effort minimal.

Source: Data2Semantics

This June 10 and 11, the Data2Semantics team locked itself in a room in the Amsterdam Public Library to build a first version of the Data2Semantics Golden Demo: a pipeline for publishing enriched data (‘semantics’) directly from Dropbox to Figshare, integrated into the Linkitup webservice.

In two days, we built and integrated:

Watch the video!



Source: Data2Semantics

During the COMMIT/ Community event, April 2-3 in Lunteren, Data2Semantics won one of the three COMMIT/ Valorization awards. The award is a 10,000 euro subsidy to encourage the project to bring one of its products closer to use outside academia.

At the event, we presented and emphasized the philosophy of Data2Semantics: embedding new enrichment tools in the current workflow of individual researchers. We are working closely with both (with our Linkitup tool) and Elsevier Labs to bring semantics to the fingertips of the researcher.

Source: Data2Semantics

BeyondThePDF2 – TimeLapse from Jongens van de Tekeningen on Vimeo.

Data2Semantics was the local organizer of the Beyond the PDF 2 conference. This conference brought together over 200 scholars, technologists, librarians and publishers to discuss the future of research communication. The conference had a huge social media presence, with 3,500 tweets sent by 625 participants over the course of the two days, and it produced many other outcomes as well.

This is another example of how Data2Semantics is reaching out to the scientific and research communities to push new ways of doing research.


Source: Data2Semantics


VU University Amsterdam (Photo credit: Wikipedia)

Learn to build better code in less time.

Software Carpentry is a two-day bootcamp for researchers to learn how to be more productive with code and software creation. VU University Amsterdam brings Software Carpentry to the Netherlands for the first time. PhD students, postdocs and researchers in physics are cordially invited to this free, two-day workshop on May 2–3, 2013, in Amsterdam.

Data2Semantics is sponsoring the event to help address the issues scientists face in managing their data.

Go to for more information and registration (max. 40!).


Last Wednesday, Frank van Harmelen appeared on the Dutch science TV program “Labyrint”, where he interviewed George Dyson, Luc Steels and François Pachet about their ideas on the future of computers.

The program can be watched online (in Dutch):

And here’s the discussion session afterwards (in Dutch):

More information at the website of Labyrint.

Source: Data2Semantics

A few months ago Laurens Rietveld was looking for a query interface from which he could easily query any other SPARQL endpoint.

But he couldn’t find any that fit his requirements:

So he decided to make his own!

Give it a try at:

Future work (next year probably):

Comments are appreciated (including feature ideas / bug reports).

Sources are available at


Source: Data2Semantics

Complexity metrics form the backbone of graph analysis. Centrality, betweenness, assortativity and scale freeness are just a handful of selections from a large and quickly growing literature. It seems that every purpose has its own notion of complexity. Can we find a way to tie these disparate notions together?

Algorithmic statistics provides an answer. It posits that any useful property induced from data can be used to compress that data, that is, to store it more efficiently. If I know that my network is scale free, or that a set of points is normally distributed, that information allows me to come up with a more efficient representation of the data. If it does not, the property we have learned is of no use.

This notion allows us to see data compression, learning and complexity analysis as three names for the same thing. The less a dataset can be compressed, the more complex it is; the more it can be compressed, the more useful our induced information is.
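A standard off-the-shelf compressor already makes this concrete: its output length is a crude upper bound on a dataset's complexity. As a small sketch (our own illustration, not part of any Data2Semantics tool), structured data compresses far below its raw size while random data barely compresses at all:

```python
import random
import zlib

def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data: a crude, computable upper
    bound on the data's descriptive (Kolmogorov) complexity."""
    return len(zlib.compress(data, 9))

random.seed(0)
regular = bytes(i % 7 for i in range(10_000))                # highly structured
noise = bytes(random.randrange(256) for _ in range(10_000))  # no structure

print(compressed_size(regular))  # tiny: the period-7 pattern is exploited
print(compressed_size(noise))    # close to 10,000: nothing to exploit
```

By this measure the noise is the more "complex" object, exactly in the sense of the paragraph above: no induced property helps to describe it more briefly.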

But we can go further than just complexity. Occam’s razor tells us that the simplest explanation is often the best. Algorithmic statistics provides us with a more precise version. If our data is the result of a computational process, and we have found a short description of it, then with high probability the model that allowed that compression is also a description of the process that generated our data. And that is ultimately what semantics is, a description of a generating process. Whether it’s the mental state that led to a linguistic expression, or the provenance trail that turned one form of data into another. When we talk about semantics, we are usually discussing computational processes generating data.
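The precise version of Occam's razor at work here is the two-part (minimum description length) code: first describe a model, then describe the data under that model. A minimal sketch for the simplest possible model family, Bernoulli sources over bit strings (an illustration of the principle, not of any specific Data2Semantics software):

```python
import math
import random

def two_part_codelength(bits):
    """Two-part MDL code for a bit string: first the model (the count of
    ones, log2(n+1) bits), then the data under that model (n * H(p) bits,
    the Shannon bound for a Bernoulli source with parameter p)."""
    n, ones = len(bits), sum(bits)
    p = ones / n
    model_bits = math.log2(n + 1)  # which of the n+1 possible counts
    data_bits = 0.0 if p in (0.0, 1.0) else n * (
        -p * math.log2(p) - (1 - p) * math.log2(1 - p))
    return model_bits + data_bits

random.seed(0)
fair = [random.getrandbits(1) for _ in range(1000)]  # no Bernoulli structure
biased = [1] * 900 + [0] * 100                       # p = 0.9: real structure

print(round(two_part_codelength(biased)))  # roughly 479 bits: model helps
print(round(two_part_codelength(fair)))    # roughly 1000 bits: it does not
```

When the two-part code is substantially shorter than the raw data, the fitted model has, with high probability, captured something real about the generating process; that is the link between compression and semantics drawn above.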

Practically, algorithmic statistics will give us a means to turn any family of network models (from frequent subgraphs to graph grammars) into a family of statistics. If the network model is powerful enough, the statistics should be able to capture any existing property of complex graphs, including scale freeness, assortativity or fractal scaling.
