News and Updates on the KRR Group

TabLinker is experimental software for converting manually annotated Microsoft Excel workbooks to the RDF Data Cube vocabulary. It is used in the context of the Data2Semantics project to investigate the use of Linked Data for humanities research (Dutch census data produced by DANS).

TabLinker was designed for converting Excel or CSV files that have a complex layout, and therefore cannot be handled by fully automatic csv2rdf scripts, to RDF (triplification, RDF-izing).

A presentation about Linked Census Data, including TabLinker, is available from SlideShare.

Please consult the GitHub page for the latest release information.

Using TabLinker

TabLinker takes annotated Excel files (found using the srcMask option in the config.ini file) and converts them to RDF. This RDF is serialized to the target folder specified using the targetFolder option in config.ini.
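As a rough illustration of how the two options could be read, here is a minimal sketch using Python's standard configparser and glob modules. The option names srcMask and targetFolder come from the text above; the section name [paths], the example values and the helper load_paths are assumptions made for this sketch and are not taken from TabLinker itself.

```python
import configparser
import glob

# Hypothetical config.ini content; only the option names srcMask and
# targetFolder come from the text, the section name and values are assumed.
EXAMPLE_CONFIG = """
[paths]
srcMask = ../input/*.xls
targetFolder = ../output/
"""


def load_paths(config_text):
    """Return (matching source files, target folder) for config.ini-style text."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(config_text)
    section = parser["paths"]  # assumed section name
    source_files = sorted(glob.glob(section["srcMask"]))
    return source_files, section["targetFolder"]


if __name__ == "__main__":
    files, target = load_paths(EXAMPLE_CONFIG)
    print("Files matching srcMask:", files)
    print("RDF would be written to:", target)
```

The list of files is empty when nothing matches the mask, so a quick run like this is a cheap way to check a mask before starting a conversion.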

Annotations in the Excel file should be done using the built-in style functionality of Excel (you can specify these by hand). TabLinker currently recognises seven styles:

  • TabLink Title – The cell containing the title of a sheet
  • TabLink Data – A cell that contains data, e.g. a number for the population size
  • TabLink ColHeader – Used for the headers of columns
  • TabLink RowHeader – Used for row headers
  • TabLink HierarchicalRowHeader – Used for multi-column row headers with subsumption/taxonomic relations between the values of the columns
  • TabLink Property – Typically used for the header cells directly above RowHeader or HierarchicalRowHeader cells; the cell values are the properties that relate Data cells to RowHeader and HierarchicalRowHeader cells.
  • TabLink Label – Used for cells that contain a label for one of the HierarchicalRowHeader cells.

An eighth style, TabLink Metadata, is currently ignored (see #3).
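The style vocabulary above can be summarised in a small lookup. This is only a sketch of how a script might sort cell style names into the three groups described in the text; the function classify_style and the group labels are hypothetical and do not reflect TabLinker's internals.

```python
# Style names as listed in the text above.
RECOGNISED_STYLES = {
    "TabLink Title",
    "TabLink Data",
    "TabLink ColHeader",
    "TabLink RowHeader",
    "TabLink HierarchicalRowHeader",
    "TabLink Property",
    "TabLink Label",
}

# The eighth style, which the text says is currently ignored.
IGNORED_STYLES = {"TabLink Metadata"}


def classify_style(style_name):
    """Return 'recognised', 'ignored' or 'unannotated' for a cell style name."""
    if style_name in RECOGNISED_STYLES:
        return "recognised"
    if style_name in IGNORED_STYLES:
        return "ignored"
    # Any other style, such as Excel's default "Normal", carries no annotation.
    return "unannotated"


if __name__ == "__main__":
    for name in ("TabLink Data", "TabLink Metadata", "Normal"):
        print(name, "->", classify_style(name))
```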

An example of such an annotated Excel file is provided in the input directory. There are ways to import the styles defined in that file into your own Excel files.

Tip: If your table contains totals for HierarchicalRowHeader cell values, use a non-TabLink style to mark the cells between the level to which the total belongs, and the cell that contains the name of the total. Have a look at the example annotated Excel file to see how this is done (up to row 428).

Once you’re all set, start the TabLinker by cd-ing to the src folder, and running:

python tablinker.py

Requirements

TabLinker was developed under the following environment:


We have opened up a Data2Semantics GitHub organisation for publishing all (open source) code produced within the Data2Semantics project. Point your browser (or Git client) to http://github.com/Data2Semantics for the latest and greatest!


Data2Semantics at ICTDelta 2011

Posted by data2semantics in collaboration | computer science | large scale | semantic web | vu university amsterdam

The COMMIT programme was officially kicked off by Maxime Verhagen, Minister of Economic Affairs, Agriculture and Innovation, at the ICTDelta 2011 event held at the World Forum in The Hague on November 16.

Throughout the day, members of the Data2Semantics project manned a very busy stand in the foyer, featuring prior and current work by the project partners such as the AIDA toolkit, OpenPHACTS, LarKC and the MetaLex Document Server.


The Botari application from the LarKC project has won the Open Track of the Semantic Web Challenge.

Botari is a LarKC workflow running on servers in Seoul, plus a user frontend that runs on a Galaxy Tab.

The workflow combines open data from the city of Seoul (OpenStreetMap, POIs) with Twitter traffic, and uses stream processing, machine learning and querying over RDF datasets and streams to give personalised restaurant information and recommendations, presented in an augmented reality interface on the Galaxy Tab.

For more info on Botari, see the website, the demo movie, the slide deck or the paper.


The LarKC project’s development team would like to announce a new release (v.3.0) of the LarKC platform, which is available for downloading here. The new release is a considerable improvement over the previous release (v.2.5), with the following distinctive features. Platform: a new (plain) plug-in registry; light-weight plug-in loading and thus a very low platform start-up time […]

If you liked WebPIE, you’ll also like QueryPIE

WebPIE performed forward inference over up to 100 billion triples (yes, that’s 10^11). Our about-to-be-published QueryPIE can do on the fly backward-chaining inference at query-time, over a billion triples, in milliseconds, on just 8 parallel machines.

Last year, Jacopo Urbani and co-authors from the LarKC team broke the speed record for forward-chaining inference over OWL Horst, computing the complete closure over 100 billion triples in a number of hours using a MapReduce/Hadoop implementation on a medium-sized cluster. The performance of WebPIE [see conference and journal paper] is:

  • 1 billion FactForge triples in 1.5 hours on 32 compute nodes
  • 24 billion Bio2RDF triples in 10 hours on 32 compute nodes
  • 100 billion LUBM triples in 15 hours on 64 compute nodes
  • deriving anywhere between 150K-650K triples per second, depending on the dataset
  • runtime growing linearly with number of triples
  • speedup growing linearly with the number of compute nodes
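To put the three runs listed above on a common scale, the sketch below divides the input size by the wall-clock time. Note that this is input triples processed per second, which is a different quantity from the rate of derived triples quoted in the list; the function name and the per-node breakdown are choices made for this illustration only.

```python
# (dataset, input triples, hours, compute nodes) as listed above.
RUNS = [
    ("FactForge", 1_000_000_000, 1.5, 32),
    ("Bio2RDF", 24_000_000_000, 10, 32),
    ("LUBM", 100_000_000_000, 15, 64),
]


def input_triples_per_second(triples, hours):
    """Input triples processed per second of wall-clock time."""
    return triples / (hours * 3600)


if __name__ == "__main__":
    for name, triples, hours, nodes in RUNS:
        rate = input_triples_per_second(triples, hours)
        print(f"{name}: {rate:,.0f} input triples/s overall, "
              f"{rate / nodes:,.0f} per node")
```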

Now, a year later, we’re breaking another speed record, but this time for “backward chaining”: not deriving all inferences up front, but deriving them “on the fly”, at query time, as and when they are needed.

Until now, backward chaining was considered to be infeasible on very large realistic data, since it would slow down the query response time too much. Our paper at ISWC this year shows it’s not all that impossible: on different real-life datasets of up to 1 billion triples, QueryPIE can do on-the-fly backward-chaining inference at query time, implementing the full OWL Horst fragment, with response times in milliseconds on just 8 machines.
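The difference between the two strategies can be illustrated with a toy example. The sketch below is not the WebPIE or QueryPIE algorithm: it runs on a small in-memory set of triples and covers only two RDFS rules (subclass transitivity and type inheritance), and all names and example data in it are made up for illustration. Forward chaining materialises every derivable triple before any query is asked; backward chaining starts from one query and follows only the rules needed to answer it.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

# Made-up example data.
TRIPLES = {
    ("ex:amsterdam", TYPE, "ex:City"),
    ("ex:City", SUBCLASS, "ex:Municipality"),
    ("ex:Municipality", SUBCLASS, "ex:Place"),
}


def forward_closure(triples):
    """Forward chaining: derive everything up front, until nothing new appears."""
    closure = set(triples)
    while True:
        new = set()
        for s, p, o in closure:
            if p not in (SUBCLASS, TYPE):
                continue
            for s2, p2, o2 in closure:
                # (s p o) and (o subClassOf o2) imply (s p o2),
                # for p being subClassOf or type.
                if p2 == SUBCLASS and s2 == o:
                    new.add((s, p, o2))
        if new <= closure:
            return closure
        closure |= new


def backward_types(triples, instance):
    """Backward chaining: answer 'which types does instance have?' at query time."""
    superclasses = {}
    for s, p, o in triples:
        if p == SUBCLASS:
            superclasses.setdefault(s, set()).add(o)
    # Start from the asserted types and walk up the subclass hierarchy.
    pending = [o for s, p, o in triples if s == instance and p == TYPE]
    found = set()
    while pending:
        cls = pending.pop()
        if cls not in found:
            found.add(cls)
            pending.extend(superclasses.get(cls, ()))
    return found


if __name__ == "__main__":
    closure = forward_closure(TRIPLES)
    print("Materialised triples:", len(closure))
    print("Types found at query time:", sorted(backward_types(TRIPLES, "ex:amsterdam")))
```

Both functions give the same types for ex:amsterdam. The trade-off is where the work happens: the forward version pays once for the whole dataset, while the backward version pays per query but never touches triples that are irrelevant to it.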

All code is available at http://few.vu.nl/~jui200/files/querypie-1.0.0.tar.gz


Data2Semantics aims to provide essential semantic infrastructure for bringing e-Science to the next level.

A core task for scientific publishers is to speed up scientific progress by improving the availability of scientific knowledge. This holds both for the dissemination of results through traditional publications and for the publication of scientific data. The Data2Semantics project focuses on a key problem for data management in e-Science:

How to share, publish, access, analyse, interpret and reuse data?

Data2Semantics is a collaboration between the VU University Amsterdam, the University of Amsterdam, Data Archiving and Networked Services (DANS) of the KNAW, Elsevier Publishing and Philips, and is funded under the COMMIT programme of the NL Agency of the Dutch Ministry of Economic Affairs, Agriculture and Innovation.


 by Zhisheng Huang

The China Higher Education Press will publish a LarKC book in Chinese. This book will appear in the book series on Web Intelligence and Web Science (http://www.wici-lab.org/wici/WIWS/).

This Chinese LarKC book consists of two parts: a technology part and an application part. The technology part covers the LarKC platform, a development guide, and various plug-ins and workflows. The application part covers Linked Life Data, semantic information retrieval, urban computing, and cancer study. The main contributors to the book are six Chinese researchers in the LarKC Consortium, from Amsterdam, WICI, and Siemens. See the outline below for details.
The book is expected to be published by the end of this year.

Here is the outline of the book content and the main contributors.

Chapter 1 Introduction to LarKC
by Zhisheng Huang (VUA) and Ning Zhong (WICI)

Chapter 2 LarKC Platform
by Jun Fang (VUA)

Chapter 3 Identification and Selection
by Yi Zeng (WICI)

Chapter 4 Abstraction and Transformation
by Yi Huang (SIEMENS)

Chapter 5 Reasoning and Deciding
by Jun Fang (VUA) and Zhisheng Huang (VUA)

Chapter 6 LarKC Development Guide
by Zhisheng Huang (VUA) and Jun Fang (VUA)

Chapter 7 Linked Life Data
by Yi Huang (SIEMENS) and Zhisheng Huang (VUA)

Chapter 8 Semantic information retrieval for biomedical applications
by Ru He (SIEMENS) and Zhisheng Huang (VUA)

Chapter 9 Semantic Technology and Gene Study
by Zhisheng Huang (VUA)

Chapter 10 Urban Computing
by Yi Huang (SIEMENS) and Zhisheng Huang (VUA)

Chapter 11 Conclusions
by Zhisheng Huang (VUA), Ru He (SIEMENS), and Ning Zhong (WICI)

The LarKC folk at the German High Performance Computing Centre in Stuttgart did a rather nice write-up on LarKC from a high-performance computing perspective, intended for their own community. Find the relevant pages here.


A LarKC workflow for traffic-aware route-planning has won the first prize in the AI Mashup Challenge at the ESWC 2011 conference, held this week on Crete.

Details of “Traffic_LarKC” can be found at https://sites.google.com/a/fh-hannover.de/aimashup11/home/traffic_larkc, but in brief:

Four different datasets are used:

  • the traffic sensors data, obtained from Milano Municipality
  • the Milano street topology
  • historical weather data from the Italian website ilMeteo.it
  • calendar information (week days and week-end days, holidays, etc.) from Milano Municipality and from the Mozilla Calendar project.

These are used in a batch-time workflow to predict the traffic situation over the next two hours, and in a runtime workflow to respond to route-planning queries from users.

This LarKC workflow shows that Linked Open Data and the corresponding technologies are now getting good enough to compete with what’s possible in closed commercial systems.

Congratulations to the entire team that has made this possible!
