2012Q3 Heads up

A quick heads up on the progress of Data2Semantics over the course of the third quarter of 2012.

Management summary: we have made headway in developing data enrichment and analysis tools that will be useful in practice.

First of all, we developed a first version of a tool for enriching and interlinking data stored in the popular Figshare data repository. This tool, called Linkitup, takes metadata provided by the data publisher (usually the author) and can link it to DBpedia/Wikipedia, DBLP, ORCID, ScopusID, and the Elsevier Linked Data Repository. These links are fed back to the publication in Figshare, but can also be published separately as RDF. This way, we use Web 2.0 mechanisms that are popular amongst researchers to allow them to enrich their data and reap immediate benefit. Our plans are to integrate increasingly elaborate enrichment and analysis tools into the Linkitup dashboard (e.g. annotation, complexity analysis, and provenance reconstruction).

Linkitup is available from http://github.com/Data2Semantics/linkitup . We are aiming for a first release soon!
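To give a feel for what "publishing the links separately as RDF" could look like, here is a minimal sketch. It is not taken from the Linkitup code base: the function name, the example URIs, and the choice of rdfs:seeAlso as the linking predicate are all assumptions made for illustration.

```python
# Hypothetical sketch, not Linkitup's actual implementation.
# Assumption: links are expressed with rdfs:seeAlso and written as N-Triples.
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"


def links_to_ntriples(dataset_uri, link_uris, predicate=RDFS_SEE_ALSO):
    """Return one N-Triples statement per distinct link target."""
    lines = []
    # Deduplicate and sort so the output is stable between runs.
    for target in sorted(set(link_uris)):
        lines.append(f"<{dataset_uri}> <{predicate}> <{target}> .")
    return "\n".join(lines)


# Example URIs are placeholders, not real Figshare records.
example = links_to_ntriples(
    "http://example.org/dataset/1",
    [
        "http://dbpedia.org/resource/Linked_data",
        "http://dbpedia.org/resource/Provenance",
        "http://dbpedia.org/resource/Linked_data",  # duplicate, emitted once
    ],
)
print(example)
```

A real export would more likely go through an RDF library and a vocabulary chosen to fit each link type (author, topic, publication); the point here is only that each enrichment link becomes one statement about the dataset.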

Furthermore, we have made a good start at reconstructing provenance information from documents stored in a Dropbox folder. This technology can be used to trace back research ideas through chains of publications in a way that augments and extends the citation network. The provenance graph does not necessarily span just published papers, but can be constructed for a variety of document types (posters, presentations, notes, documents, spreadsheets, etc.).

Finally, we have a first implementation of (partial) linked data replication, which will make dealing with large volumes of linked data much more manageable. The crux of partial replication lies in the selection (i.e. ranking) of a suitable subset of data to replicate. We will use data information measures, graph analysis, and statistical methods to perform this selection. The use case of clinical decision support will be the primary testing ground for this technology.
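The selection step described above can be illustrated with a deliberately simple sketch. It ranks triples by how well connected their subject and object are, and keeps the top few. Node degree is used here only as a stand-in for the graph analysis mentioned above; the function name, the `budget` parameter, and the `ex:` example data are hypothetical and do not reflect the project's actual ranking method.

```python
from collections import Counter


def select_subset(triples, budget):
    """Keep the `budget` triples whose subject and object are best connected.

    Assumption: node degree is a crude proxy for importance; a real
    ranking would combine information measures and statistical methods.
    """
    degree = Counter()
    for s, _, o in triples:
        degree[s] += 1
        degree[o] += 1
    # Highest combined degree first; ties are broken by the triple itself
    # so that the result is deterministic.
    ranked = sorted(triples, key=lambda t: (-(degree[t[0]] + degree[t[2]]), t))
    return ranked[:budget]


# Made-up example data in the spirit of the clinical use case.
triples = [
    ("ex:patient1", "ex:hasDiagnosis", "ex:diabetes"),
    ("ex:patient2", "ex:hasDiagnosis", "ex:diabetes"),
    ("ex:diabetes", "ex:treatedWith", "ex:metformin"),
    ("ex:patient3", "ex:hasAllergy", "ex:penicillin"),
]
subset = select_subset(triples, 2)
for triple in subset:
    print(triple)
```

In this toy graph the isolated allergy statement ranks last and is left out of the replica, while statements touching the well-connected `ex:diabetes` node are kept.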