Update: Machine Learning and Linked Data

Source: Data2Semantics

Part of work package 2 is developing machine learning techniques to automatically enrich linked data. The web of data has become so large that maintaining it by hand is no longer possible. In contrast to existing techniques for learning for the semantic web, we aim to apply these techniques directly to the linked data.

We use kernel-based machine learning techniques, which can deal well with structured data such as RDF graphs. Different graph kernels exist, but they were typically developed in the bioinformatics domain, so which kernels are most suited to RDF is an unanswered question. A big advantage of the graph kernel approach is that relatively little preprocessing or feature selection of the RDF graph is necessary. Graph kernels can also be applied to a wide range of tasks, such as property prediction, link prediction, node clustering and node ranking.
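To make the idea of a graph kernel concrete, here is a minimal Python sketch of a Weisfeiler-Lehman style subtree kernel. It is only an illustration, not the code from our repository: it assumes undirected graphs with labelled nodes, whereas RDF graphs are directed and have labelled edges, and the names wl_features and wl_kernel are made up for this example. It also assumes that the initial labels do not contain parentheses or commas.

```python
from collections import Counter


def wl_features(adjacency, labels, iterations=2):
    """Count the Weisfeiler-Lehman labels that occur in one graph.

    adjacency: dict mapping every node to a list of its neighbours
    labels:    dict mapping every node to its initial string label
    """
    current = dict(labels)
    counts = Counter(current.values())
    for _ in range(iterations):
        refined = {}
        for node in adjacency:
            # New label = own label plus the sorted labels of the neighbours.
            neighbour_labels = sorted(current[n] for n in adjacency[node])
            refined[node] = current[node] + "(" + ",".join(neighbour_labels) + ")"
        current = refined
        counts.update(current.values())
    return counts


def wl_kernel(graph_a, graph_b, iterations=2):
    """Dot product of the label-count vectors of two graphs.

    Each graph is an (adjacency, labels) pair.
    """
    features_a = wl_features(*graph_a, iterations=iterations)
    features_b = wl_features(*graph_b, iterations=iterations)
    shared = features_a.keys() & features_b.keys()
    return sum(features_a[label] * features_b[label] for label in shared)


# Toy graphs: a path A - B - A and a single edge A - B.
path = ({1: [2], 2: [1, 3], 3: [2]}, {1: "A", 2: "B", 3: "A"})
edge = ({1: [2], 2: [1]}, {1: "A", 2: "B"})

similarity = wl_kernel(path, edge, iterations=1)
```

The kernel value grows with the number of label patterns the two graphs share, and it can be plugged into any kernel method, such as a support vector machine, without designing features by hand.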

Currently our research focusses on:

which graph kernels are best suited to RDF,
which part of the RDF graph we need for the graph kernel,
which tasks are well suited to being solved with kernels.
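As an illustration of the second question, one simple way to select part of an RDF graph is to take the neighbourhood of an instance up to a fixed depth. The Python sketch below does this for a list of (subject, predicate, object) triples. It is an assumption made for the sake of the example, not necessarily the selection we use; the function name neighbourhood, the depth parameter and the ex: identifiers are hypothetical.

```python
def neighbourhood(triples, root, depth):
    """Return the triples reachable from root via at most `depth` outgoing edges.

    triples: list of (subject, predicate, object) tuples without duplicates
    """
    frontier = {root}
    seen = {root}
    selected = []
    for _ in range(depth):
        next_frontier = set()
        for s, p, o in triples:
            if s in frontier:
                selected.append((s, p, o))
                # Only expand nodes we have not visited before.
                if o not in seen:
                    seen.add(o)
                    next_frontier.add(o)
        frontier = next_frontier
    return selected


# Hypothetical toy data.
triples = [
    ("ex:a", "ex:knows", "ex:b"),
    ("ex:b", "ex:knows", "ex:c"),
    ("ex:c", "ex:knows", "ex:d"),
    ("ex:x", "ex:knows", "ex:a"),
]

subgraph = neighbourhood(triples, "ex:a", depth=2)
```

A larger depth gives the kernel more context for each instance, at the price of larger subgraphs and longer computation.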

A paper with our most recent results is currently under submission at SDM 2013. Code for the different graph kernels and for reproducing our experiments is available at: https://github.com/Data2Semantics/d2s-tools.