News and Updates on the KRR Group

Author Archives: data2semantics

Source: Data2Semantics

On June 10 and 11, the Data2Semantics team locked itself in a room in the Amsterdam Public Library to build a first version of the Data2Semantics Golden Demo: a pipeline for publishing enriched data (‘semantics’) directly from Dropbox to Figshare, integrated into the Linkitup web service.

In two days, we built and integrated:

Watch the video!

 


Source: Data2Semantics


During the COMMIT/ Community event, April 2-3 in Lunteren, the Data2Semantics project won one of three COMMIT/ Valorization awards. The award is a 10,000 euro subsidy to encourage the project to bring one of its products closer to use outside academia.

At the event, we presented the philosophy of Data2Semantics: to embed new enrichment tools in the current workflow of individual researchers. We are working closely with both Figshare.com (with our Linkitup tool) and Elsevier Labs to bring semantics to the fingertips of the researcher.

Source: Data2Semantics

BeyondThePDF2 – TimeLapse from Jongens van de Tekeningen on Vimeo.

Data2Semantics was the local organizer for the Beyond the PDF 2 conference. This conference brought together over 200 scholars, technologists, librarians and publishers to discuss the future of research communication. The conference had a huge social media presence with 3,500 tweets sent by 625 participants over the course of the two days. There were also lots of other outcomes.

This is another example of how Data2Semantics is reaching out to the scientific and research communities to push new ways of doing research.

 

Source: Data2Semantics

VU University Amsterdam (Photo credit: Wikipedia)

Learn to build better code in less time.

Software Carpentry (http://www.software-carpentry.org) is a two-day bootcamp for researchers to learn how to be more productive with code and software creation. VU University Amsterdam is bringing Software Carpentry to the Netherlands for the first time. PhD students, postdocs and researchers in physics are cordially invited to this free two-day workshop on May 2–3, 2013, in Amsterdam.

Data2Semantics is sponsoring the event to learn more about the issues scientists face in managing their data.

Go to http://www.data2semantics.org/bootcamp for more information and registration (max. 40!).


Last Wednesday, Frank van Harmelen appeared on the Dutch science TV program “Labyrint”, where he interviewed George Dyson, Luc Steels and François Pachet about their ideas on the future of computers.

The program can be watched online (in Dutch):

And here’s the discussion session afterwards (in Dutch):

More information at the website of Labyrint.

Source: Data2Semantics

A few months ago Laurens Rietveld was looking for a query interface from which he could easily query any other SPARQL endpoint.

But he couldn’t find any that fit his requirements:

So he decided to make his own!

Give it a try at: http://aers.data2semantics.org/yasgui/
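For readers who would rather script their queries, here is a minimal sketch of what any generic query interface has to do: send the query to an endpoint using the SPARQL Protocol and read back the standard SPARQL JSON results. It is not taken from the YASGUI sources; the endpoint URL and the helper names are hypothetical placeholders.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_sparql_request(endpoint, query):
    """Build (but do not send) a SPARQL Protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask the endpoint for the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def bindings_to_rows(results_json):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    doc = json.loads(results_json)
    variables = doc["head"]["vars"]
    return [
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in doc["results"]["bindings"]
    ]


# Hypothetical endpoint, for illustration only.
request = build_sparql_request(
    "http://example.org/sparql",
    "SELECT ?s WHERE { ?s ?p ?o } LIMIT 1",
)

# A hand-written response in the SPARQL JSON results format.
sample = (
    '{"head": {"vars": ["s"]}, "results": {"bindings": '
    '[{"s": {"type": "uri", "value": "http://example.org/a"}}]}}'
)
rows = bindings_to_rows(sample)
```

Passing `request` to `urllib.request.urlopen` would actually send the query; the sketch stops short of that so it runs without a network connection.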

Future work (next year probably):

Comments are appreciated (including feature ideas / bug reports).

Sources are available at https://github.com/LaurensRietveld/yasgui


Source: Data2Semantics

Complexity metrics form the backbone of graph analysis. Centrality, betweenness, assortativity and scale freeness are just a handful of selections from a large and quickly growing literature. It seems that every purpose has its own notion of complexity. Can we find a way to tie these disparate notions together?

Algorithmic statistics provides an answer. It posits that any useful property that is induced from data can be used to compress it, that is, to store it more efficiently. If I know that my network is scale free, or that a set of points is distributed normally, that information will allow me to come up with a more efficient representation of the data. If it does not, the property we have learned is of no use.

This notion allows us to see data compression, learning and complexity analysis as simply three names for the same thing. The less a dataset can be compressed, the more complex it is; the more it can be compressed, the more useful our induced information is.
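The link between regularity and compressibility is easy to try out. The sketch below uses zlib as a stand-in compressor; this is an assumption made for illustration, since an off-the-shelf compressor only gives an upper bound on the ideal description length and will miss any structure it was not designed to find.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means more regular)."""
    return len(zlib.compress(data, 9)) / len(data)


# Highly structured data: one short pattern repeated many times.
regular = b"ab" * 5000

# Random bytes: no structure a general-purpose compressor can exploit.
random_like = os.urandom(10000)

regular_ratio = compression_ratio(regular)
random_ratio = compression_ratio(random_like)
```

The repeated pattern shrinks to a small fraction of its size, while the random bytes do not shrink at all: in the terms used above, the first dataset is simple and the second is complex.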

But we can go further than just complexity. Occam’s razor tells us that the simplest explanation is often the best, and algorithmic statistics provides us with a more precise version of it. If our data is the result of a computational process, and we have found a short description of it, then with high probability the model that allowed that compression is also a description of the process that generated our data. And that is ultimately what semantics is: a description of a generating process, whether it is the mental state that led to a linguistic expression or the provenance trail that turned one form of data into another. When we talk about semantics, we are usually discussing computational processes generating data.

Practically, algorithmic statistics will give us a means to turn any family of network models (from frequent subgraphs to graph grammars) into a family of statistics. If the network model is powerful enough, the statistics should be able to capture any existing property of complex graphs, including scale freeness, assortativity or fractal scaling.
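As a toy illustration of turning a network model into a statistic, the sketch below takes the simplest model we could think of (all graphs with the same number of edges are equally likely) and reports how many bits it saves over a naive one-bit-per-possible-edge encoding. The model choice, the encoding of the edge count and the function names are our assumptions for this example; richer models such as graph grammars would slot into the same scheme.

```python
import math


def baseline_bits(n: int) -> float:
    """Naive code: one bit per possible undirected edge (no self-loops)."""
    return n * (n - 1) / 2


def uniform_model_bits(n: int, m: int) -> float:
    """Bits to state the edge count, then to pick one graph among all with m edges."""
    possible = n * (n - 1) // 2
    return math.log2(possible + 1) + math.log2(math.comb(possible, m))


def compression_statistic(n: int, m: int) -> float:
    """Bits saved by the model relative to the naive code."""
    return baseline_bits(n) - uniform_model_bits(n, m)


# A sparse graph: knowing the edge count saves a lot of bits.
sparse = compression_statistic(100, 50)

# Half of all possible edges present: the model has nothing to offer.
half_full = compression_statistic(100, 2475)
```

A large positive value means the model has captured real structure in the graph; a value near or below zero means the property it encodes is of no use for this graph.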


Source: Data2Semantics

TabLinker, introduced in an earlier post, is a spreadsheet-to-RDF converter. It takes Excel/CSV files as input, and produces enriched RDF graphs with cell contents, properties and annotations using the DataCube and Open Annotation vocabularies.

TabLinker interprets spreadsheets based on hand-made markup using a small set of predefined styles (e.g. it needs to know what the header cells are). Work package 6 is currently investigating whether and how we can perform this step automatically.

Features:

  • Raw, model-agnostic conversion from spreadsheets to RDF
  • Interactive spreadsheet marking within Excel
  • Automatic annotation recognition and export with OA
  • Round-trip conversion: revive the original spreadsheet files from the produced RDF (UnTabLinker)
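To give a feel for what a raw, cell-by-cell conversion looks like, here is a minimal sketch that turns a small CSV table into N-Triples, typing every data cell as a Data Cube observation. It is not TabLinker's code or its output format: the `http://example.org/census/` namespace, the property names and the sample figures are all made up, and the sketch assumes the first row and first column are headers rather than reading that from style markup.

```python
import csv
import io

QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/census/"  # hypothetical namespace


def table_to_ntriples(csv_text):
    """Emit one observation per data cell, linked to its row and column header."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    triples = []
    for r, row in enumerate(body, start=1):
        row_label = row[0]
        for c, value in enumerate(row[1:], start=1):
            obs = f"<{BASE}obs/r{r}c{c}>"
            triples.append(f"{obs} <{RDF_TYPE}> <{QB}Observation> .")
            # Literals are written as-is; real data would need escaping.
            triples.append(f'{obs} <{BASE}rowHeader> "{row_label}" .')
            triples.append(f'{obs} <{BASE}columnHeader> "{header[c]}" .')
            triples.append(f'{obs} <{BASE}value> "{value}" .')
    return triples


# Made-up sample table, not taken from the census data.
sample = "municipality,men,women\nAmsterdam,100,110\nUtrecht,40,42\n"
triples = table_to_ntriples(sample)
```

Each of the four data cells yields four triples, so the structure of the table survives in the graph without committing to any domain model.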

In Data2Semantics, we have used TabLinker to publish linked socio-historical data, converting the historical Dutch censuses (1795-1971) to RDF (see slides).

 

Social historians are actively doing research using these datasets, producing rich annotations that correct or reinterpret data; these annotations are very useful when checking dataset quality and consistency (see model). The published RDF is ready to query and visualize via SPARQL.

 

 


Source: Data2Semantics


Part of work package 2 is developing machine learning techniques to automatically enrich linked data. The web of data has become so large that maintaining it by hand is no longer possible. In contrast to existing techniques for learning for the semantic web, we aim to apply the techniques directly to the linked data.

We use kernel-based machine learning techniques, which can deal well with structured data such as RDF graphs. Different graph kernels exist, but they were typically developed in the bioinformatics domain, so which kernels are best suited to RDF is an open question. A big advantage of the graph kernel approach is that relatively little preprocessing or feature selection of the RDF graph is necessary, and that graph kernels can be applied to a wide range of tasks, such as property prediction, link prediction, node clustering and node ranking.

Currently our research focuses on:

  • which graph kernels are best suited to RDF,
  • what part of the RDF graph do we need for the graph kernel,
  • which tasks are well suited to solve using kernels.
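To make the idea of a graph kernel concrete, here is a minimal sketch of a Weisfeiler-Lehman style kernel applied directly to (subject, predicate, object) triples. It is a simplified illustration and not one of the kernels in d2s-tools: it assumes uniform starting labels, follows outgoing edges only, and the function names are ours.

```python
from collections import Counter


def wl_features(triples, iterations=2):
    """Count Weisfeiler-Lehman style labels for a graph given as (s, p, o) triples."""
    nodes = {s for s, _, _ in triples} | {o for _, _, o in triples}
    labels = {node: "node" for node in nodes}  # uniform start labels (an assumption)
    features = Counter(labels.values())
    for _ in range(iterations):
        new_labels = {}
        for node in nodes:
            # New label = old label plus the sorted (predicate, label) pairs
            # of the node's outgoing edges.
            neighbourhood = sorted(
                (p, labels[o]) for s, p, o in triples if s == node
            )
            new_labels[node] = repr((labels[node], neighbourhood))
        labels = new_labels
        features.update(labels.values())
    return features


def wl_kernel(triples_a, triples_b, iterations=2):
    """Kernel value: dot product of the two label-count vectors."""
    fa = wl_features(triples_a, iterations)
    fb = wl_features(triples_b, iterations)
    return sum(count * fb[label] for label, count in fa.items())


g1 = [("a", "knows", "b"), ("b", "knows", "c")]
g2 = [("x", "knows", "y"), ("y", "knows", "z")]  # same shape as g1
g3 = [("x", "likes", "y")]                       # different predicate and shape

same_shape = wl_kernel(g1, g2)
different = wl_kernel(g1, g3)
```

No features are hand-picked here: the kernel only needs the triples themselves, which is the property that makes this family of methods attractive for RDF. Graphs with the same structure score as high as a graph compared with itself, and structurally different graphs score lower.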

A paper with the most recent results is currently under submission at SDM 2013. Code for different graph kernels and for redoing our experiments is available at: https://github.com/Data2Semantics/d2s-tools.


Source: Data2Semantics

Linkitup is a Web-based dashboard for enrichment of research output published via the Figshare.com repository service. For license terms, see below.

Linkitup currently does two things:

  • it takes metadata entered through Figshare.com and tries to find equivalent terms, categories, persons or entities on the Linked Data cloud and several Web 2.0 services.
  • it extracts references from publications in Figshare.com, and tries to find the corresponding Digital Object Identifier (DOI).
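The easiest case of the second task is a reference that already contains its DOI. The sketch below covers only that case, using a regular expression; it is not how Linkitup itself resolves references (that goes through CrossRef), and the pattern, the function name and the sample reference are our own illustrative assumptions.

```python
import re

# Assumed pattern for modern DOIs: "10.", a 4-9 digit registrant code,
# a slash, and a suffix without whitespace, quotes or angle brackets.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def find_dois(reference):
    """Return DOIs that appear verbatim in a reference string."""
    # Strip punctuation that usually belongs to the sentence, not the DOI.
    return [m.group(0).rstrip(".,;)") for m in DOI_PATTERN.finditer(reference)]


# Made-up reference, for illustration only.
reference = (
    "A. Author (2012). An example paper. Journal of Examples. "
    "doi:10.1234/example.5678."
)
dois = find_dois(reference)
```

References without an embedded DOI need a lookup against a bibliographic service, which is the harder part of the problem and is not shown here.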

Linkitup is developed as part of our strategy to bring technology for adding semantics to research data to actual users.

Linkitup currently contains five plugins:

  • Wikipedia/DBPedia linking to tags and categories
  • Linking of authors to the DBLP bibliography
  • CrossRef linking of papers to DOIs through bibliographic reference extraction
  • Elsevier Linked Data Repository linking to tags and categories
  • ORCID linking to authors

Using Figshare allows Data2Semantics to:

  • tap into a wealth of research data already published,
  • provide state-of-the-art data enrichment services on a prominent platform with a significant user base,
  • bring RDF data publishing to a mainstream platform, and
  • do without a separate Data2Semantics data repository.

Linkitup feeds the enriched metadata back as links to the original article in Figshare, but also builds an RDF representation of the metadata that can be downloaded separately, or published as research output on Figshare.

We aim to extend Linkitup to connect to other research repositories such as EASY and the Dataverse Network.

A live version of Linkitup is available at http://linkitup.data2semantics.org. Note that the software is still in beta! You will need a Figshare login and some data published in Figshare to get started.

More information, including installation instructions, is available on GitHub.

 
