News and Updates on the KRR Group

TabLinker is experimental software for converting manually annotated Microsoft Excel workbooks to the RDF Data Cube vocabulary. It is used in the Data2Semantics project to investigate the use of Linked Data for humanities research (Dutch census data produced by DANS).

TabLinker was designed to convert Excel or CSV files to RDF (triplification, or RDF-izing) when those files have a complex layout that fully automatic csv2rdf scripts cannot handle.
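The RDF Data Cube vocabulary records each data point as a `qb:Observation` that belongs to a dataset. To give a rough idea of the kind of output involved, here is a minimal sketch in plain Python (no RDF library) that writes one observation as N-Triples. Only the `qb:` namespace (`http://purl.org/linked-data/cube#`) and the RDF/XSD namespaces are real. The `http://example.org/census/` base URI, the function name, the property names and the values are hypothetical, and TabLinker's actual output may be structured differently.

```python
# Real namespaces: W3C RDF Data Cube, RDF and XML Schema.
QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
# Hypothetical base URI used only for this illustration.
EX = "http://example.org/census/"


def observation_triples(obs_id, dataset, dimensions, measure, value):
    """Return N-Triples lines describing one qb:Observation.

    dimensions maps a (hypothetical) dimension property name to a value name;
    both are turned into URIs under the EX namespace.
    """
    obs = f"<{EX}observation/{obs_id}>"
    lines = [
        f"{obs} <{RDF_TYPE}> <{QB}Observation> .",
        f"{obs} <{QB}dataSet> <{EX}dataset/{dataset}> .",
    ]
    # One triple per dimension, in a stable (sorted) order.
    for prop, val in sorted(dimensions.items()):
        lines.append(f"{obs} <{EX}property/{prop}> <{EX}value/{val}> .")
    # The measured number becomes an xsd:integer typed literal.
    lines.append(f'{obs} <{EX}property/{measure}> "{value}"^^<{XSD_INTEGER}> .')
    return lines


if __name__ == "__main__":
    # Toy input; none of these names or numbers come from a real dataset.
    for line in observation_triples("1", "sheet1", {"region": "r1"}, "count", 7):
        print(line)
```

Writing strings directly keeps the sketch dependency-free. A real converter would more likely build a graph with an RDF library and let it handle URI escaping and serialization.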

A presentation about Linked Census Data, including TabLinker, is available from SlideShare.

Please consult the Github page for the latest release information.

Using TabLinker

TabLinker takes annotated Excel files (found using the srcMask option in the config.ini file) and converts them to RDF. This RDF is serialized to the target folder specified using the targetFolder option in config.ini.
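The two settings mentioned above can be illustrated with a minimal sketch that reads them with Python's standard `configparser` module and expands `srcMask` as a glob pattern. Only the option names `srcMask` and `targetFolder` come from the description. The section name `[paths]`, the example values and the helper functions are assumptions for illustration, so check the `config.ini` shipped with TabLinker for its actual layout.

```python
import configparser
import glob

# Hypothetical config.ini contents: the section name "paths" and both values
# are assumptions; only the option names come from the TabLinker description.
EXAMPLE_CONFIG = """
[paths]
srcMask = ../input/*.xls
targetFolder = ../output/
"""


def read_paths(config_text, section="paths"):
    """Return (srcMask, targetFolder) from INI-formatted text."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names case-sensitive, e.g. srcMask
    parser.read_string(config_text)
    return parser.get(section, "srcMask"), parser.get(section, "targetFolder")


def input_files(src_mask):
    """Expand the srcMask glob pattern into a sorted list of workbook paths."""
    return sorted(glob.glob(src_mask))


if __name__ == "__main__":
    mask, target = read_paths(EXAMPLE_CONFIG)
    print("target folder:", target)
    for path in input_files(mask):
        print("would convert:", path)
```

Because `srcMask` is a pattern rather than a single file name, one run can pick up every matching workbook in the input folder.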

Annotate the Excel file using Excel's built-in cell style functionality (you can define the styles by hand). TabLinker currently recognises seven styles:

  • TabLink Title – The cell containing the title of a sheet
  • TabLink Data – A cell that contains data, e.g. a number for the population size
  • TabLink ColHeader – Used for the headers of columns
  • TabLink RowHeader – Used for row headers
  • TabLink HierarchicalRowHeader – Used for multi-column row headers with subsumption/taxonomic relations between the values of the columns
  • TabLink Property – Typically used for the header cells directly above RowHeader or HierarchicalRowHeader cells; their values are the properties that relate Data cells to RowHeader and HierarchicalRowHeader cells.
  • TabLink Label – Used for cells that contain a label for one of the HierarchicalRowHeader cells.

An eighth style, TabLink Metadata, is currently ignored (see #3).
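The style list above can be illustrated with a minimal sketch of how styled cells might be grouped and how each Data cell could be tied to its headers. It assumes the cells have already been read from the workbook into a Python dict of `(row, col) -> (value, style name)`. Reading them from an actual Excel file is out of scope here. The linking rules are deliberately simplified assumptions and are not TabLinker's actual algorithm:

- a Data cell's column header is the nearest ColHeader above it;
- its row headers are the RowHeader cells in the same row, keyed by the Property cell in that column;
- only plain RowHeader cells are handled.

All function names and the toy sheet are hypothetical.

```python
from collections import defaultdict

# Styles this sketch recognises; "TabLink Metadata" and non-TabLink styles
# (e.g. "Normal") are skipped, mirroring the description above.
TABLINK_STYLES = {
    "TabLink Title", "TabLink Data", "TabLink ColHeader", "TabLink RowHeader",
    "TabLink HierarchicalRowHeader", "TabLink Property", "TabLink Label",
}

# Toy sheet, {(row, col): (value, style name)}; the values are made up.
EXAMPLE_SHEET = {
    (0, 0): ("Example table", "TabLink Title"),
    (1, 0): ("region", "TabLink Property"),
    (1, 1): ("men", "TabLink ColHeader"),
    (1, 2): ("women", "TabLink ColHeader"),
    (2, 0): ("A", "TabLink RowHeader"),
    (2, 1): (10, "TabLink Data"),
    (2, 2): (11, "TabLink Data"),
    (3, 0): ("a footnote", "Normal"),
}


def group_by_style(cells):
    """Group {(row, col): (value, style)} cells by TabLink style name."""
    groups = defaultdict(dict)
    for pos, (value, style) in cells.items():
        if style in TABLINK_STYLES:
            groups[style][pos] = value
    return groups


def link_data_cells(cells):
    """Attach column and row headers to every Data cell (simplified rules).

    Column header: nearest ColHeader cell above, in the same column.
    Row headers: RowHeader cells in the same row, keyed by the Property cell
    found in that RowHeader's column (or a generic "column<N>" key).
    """
    groups = group_by_style(cells)
    col_headers = groups["TabLink ColHeader"]
    row_headers = groups["TabLink RowHeader"]
    # If a column holds several Property cells, the last one read wins.
    prop_of_col = {c: v for (r, c), v in groups["TabLink Property"].items()}

    observations = []
    for (row, col), value in sorted(groups["TabLink Data"].items()):
        above = [r for (r, c) in col_headers if c == col and r < row]
        column_header = col_headers[(max(above), col)] if above else None
        dims = {
            prop_of_col.get(c, f"column{c}"): v
            for (r, c), v in row_headers.items()
            if r == row
        }
        observations.append(
            {"value": value, "column_header": column_header, "row_headers": dims}
        )
    return observations


if __name__ == "__main__":
    for obs in link_data_cells(EXAMPLE_SHEET):
        print(obs)
```

Each resulting dict carries roughly what one Data Cube observation needs: a measured value plus the header values that identify it.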

An example of such an annotated Excel file is provided in the input directory. You can import the styles defined in that file into your own workbooks, for instance with the Merge Styles command in Excel's Cell Styles gallery.

Tip: If your table contains totals for HierarchicalRowHeader cell values, use a non-TabLink style to mark the cells between the level to which the total belongs and the cell that contains the name of the total. The example annotated Excel file shows how this is done (up to row 428).
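HierarchicalRowHeader columns encode a taxonomy in which an empty cell inherits the value from the rows above. A minimal sketch of turning such columns into (broader, narrower) pairs looks like this. It assumes the header values have already been extracted row by row, with `None` for empty cells, and that total rows (the ones the tip handles with a non-TabLink style) have been left out. The function name and sample values are hypothetical, and this is not TabLinker's own code.

```python
def hierarchy_pairs(rows):
    """Derive (broader, narrower) pairs from hierarchical row header columns.

    rows holds one tuple per spreadsheet row with the HierarchicalRowHeader
    values from the leftmost (broadest) to the rightmost (narrowest) column;
    None marks an empty cell, which inherits the value from the rows above.
    """
    if not rows:
        return []
    width = max(len(row) for row in rows)
    path = [None] * width  # current value at each level, broadest first
    pairs, seen = [], set()
    for row in rows:
        for level, value in enumerate(row):
            if value is None:
                continue
            path[level] = value
            # A new value at this level invalidates all narrower levels.
            for deeper in range(level + 1, width):
                path[deeper] = None
            parent = path[level - 1] if level > 0 else None
            if parent is not None and (parent, value) not in seen:
                seen.add((parent, value))
                pairs.append((parent, value))
    return pairs


if __name__ == "__main__":
    # Toy two-level header: province in the first column, town in the second.
    rows = [
        ("Province A", "Town 1"),
        (None, "Town 2"),
        ("Province B", None),
        (None, "Town 3"),
    ]
    print(hierarchy_pairs(rows))
```

Each pair could then be published as a subsumption link, for example with `skos:broader`. That choice is only one option and is not necessarily what TabLinker emits.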

Once you’re all set, start TabLinker by changing to the src folder (cd src) and running:

python tablinker.py

Requirements

TabLinker was developed under the following environment:


We have opened up a Data2Semantics GitHub organisation for publishing all (open source) code produced within the Data2Semantics project. Point your browser (or Git client) to http://github.com/Data2Semantics for the latest and greatest!


Data2Semantics at ICTDelta 2011

Posted by data2semantics in collaboration | computer science | large scale | semantic web | vu university amsterdam

The COMMIT programme was officially kicked off by Maxime Verhagen, Minister of Economic Affairs, Agriculture and Innovation, at the ICTDelta 2011 event held at the World Forum in The Hague on November 16.

Throughout the day, members of the Data2Semantics project manned a very busy stand in the foyer, featuring prior and current work by the project partners such as the AIDA toolkit, OpenPHACTS, LarKC and the MetaLex Document Server.


Data2Semantics aims to provide essential semantic infrastructure for bringing e-Science to the next level.

A core task for scientific publishers is to speed up scientific progress by improving the availability of scientific knowledge. This holds both for the dissemination of results through traditional publications and for the publication of scientific data. The Data2Semantics project focuses on a key problem for data management in e-Science:

How to share, publish, access, analyse, interpret and reuse data?

Data2Semantics is a collaboration between the VU University Amsterdam, the University of Amsterdam, Data Archiving and Networked Services (DANS) of the KNAW, Elsevier Publishing and Philips, and is funded under the COMMIT programme of the NL Agency of the Dutch Ministry of Economic Affairs, Agriculture and Innovation.
