News and Updates on the KRR Group

Source: Data2Semantics

As a complement to two papers that we will present at the ECML/PKDD 2013 conference in Prague in September, we have created a webpage with additional material.

The first paper, “A Fast Approximation of the Weisfeiler-Lehman Graph Kernel for RDF Data”, was accepted into the main conference, and the second paper, “A Fast and Simple Graph Kernel for RDF”, was accepted at the DMoLD workshop.

We include links to the papers, to the software, and to the datasets used in the experiments, which are stored on Figshare. Furthermore, we explain how to rerun the experiments from the papers using a precompiled JAR file, to keep the required effort to a minimum.
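For readers unfamiliar with the Weisfeiler-Lehman kernel that the first paper builds on, here is a toy Python sketch of the underlying idea: nodes are iteratively relabeled by hashing their own label together with the sorted multiset of their neighbours' labels, and two graphs are compared via the dot product of their label counts. This is a generic illustration only, not the paper's fast RDF-specific approximation, and all names are made up.

```python
from collections import Counter

def wl_label_counts(adj, labels, iterations=2):
    """Weisfeiler-Lehman relabeling: each round, a node's new label is a
    hash of (its old label, sorted labels of its neighbours). Returns the
    counts of every label observed across all rounds."""
    counts = Counter(labels.values())
    for _ in range(iterations):
        labels = {
            v: hash((labels[v], tuple(sorted(labels[u] for u in nbrs))))
            for v, nbrs in adj.items()
        }
        counts.update(labels.values())
    return counts

def wl_kernel(c1, c2):
    """Kernel value: dot product of the two label-count vectors."""
    return sum(n * c2[label] for label, n in c1.items())
```

For example, a 4-node star and a 4-node path start with identical label counts but diverge after one relabeling round, so a graph scores higher against itself than against the other.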

Source: Semantic Web world for you
Anyone visiting the Netherlands will inevitably stumble upon a “Bakfiets” in the streets. This Dutch speciality, which looks like the result of cross-breeding a pick-up truck with a bicycle, can be used for many things, from getting the kids around to moving a fridge. Now, let’s consider a Dutch bike shop that sells some Bakfiets […]

Source: Semantic Web world for you
Yesterday was the closing event of the Pilot Linked Open Data project. A sizeable crowd of politicians, civil servants, hackers, SME owners, open data activists and researchers gathered in the very nice building of the RCE in Amersfoort to hear about what has been done within this one-year project led by Erwin Folmer. […]

Source: Data2Semantics

On June 10 and 11, the Data2Semantics team locked itself in a room in the Amsterdam Public Library to build a first version of the Data2Semantics Golden Demo: a pipeline for publishing enriched data (‘semantics’) directly from Dropbox to Figshare, integrated into the Linkitup web service.

In two days, we built and integrated:

Watch the video!


Source: Think Links

I think I’ve been attending ESWC (Extended/European Semantic Web Conference) since I moved to Europe, and I always get something out of the event. There are plenty of familiar faces but also quite a few new people, and it’s a great environment for having chats. In addition, the quality of the content is always quite good. This year the event was held in Montpellier and was for the most part well organized: the main conference wifi worked!

The stats:

  • 300 participants
  • 42 accepted papers from 162 submissions
  • 26% acceptance rate
  • 11 workshops + 7 tutorials

So what was I doing there:

The VU Semantic Web group also had a strong showing:

  • Albert Meroño-Peñuela won the best PhD symposium paper for his work on digital humanities and the semantic web.
  • The USEWOD workshop’s (led by Laura Hollink) datasets were used by a number of main track papers for evaluation.
  • Stefan Schlobach and Laura Hollink were on the organizing committee. And we organized a couple of workshops & tutorials.
  • Posters/Demos:
    • Albert Meroño-Peñuela, Rinke Hoekstra, Andrea Scharnhorst, Christophe Guéret and Ashkan Ashkpour. Longitudinal Queries over Linked Census Data.
    • Niels Ockeloen, Victor de Boer and Lora Aroyo. LDtogo: A Data Querying and Mapping Framework for Linked Data Applications.
  • Several workshop papers.

I’ll try to pull out what I thought were the highlights of the event.

What is a semantic web application?

Can you escape Frank?

The keynotes from Enrico Motta and David Karger focused on trying to define what a semantic web application is. This starts out as the question of whether a Semantic Web application needs to use the Semantic Web set of standards (e.g. RDF, OWL, etc.). From my perspective, the answer is no. These standards are great infrastructure for building these applications, but they are not necessary (see the Google Knowledge Graph). Then what is a semantic web application?

From what I could gather, Motta would define it as an application that is scalable, uses the web and embraces model-theoretic semantics. For me that’s rather limiting: there are many other semantics that may be appropriate, and we can ground meaning in something other than model theory. I think a good example of this is the work on Pragmatic Semantics that my colleague Stefan Schlobach presented at the Artificial Intelligence meets the Semantic Web workshop. Or we can reach back into AI and revisit the discussion in Brooks’ classic paper Elephants Don’t Play Chess. I felt that Karger’s definition (in what was a great keynote) was getting somewhere. He defined a semantic web application essentially as:

An application whose schema is expected to change.

This seems to me to capture the semantic portion of the definition, in the sense that the semantics need to be understood on the fly. However, I think we need to roll the web back into this definition… Overall, I thought this discussion was worth having and helps the field define what it is that we are aiming at. To be continued…

Homebrew databases

As I said, I thought Karger’s keynote was great. He gave a talk within a talk, on the subject of homebrew databases from this paper in CHI 2011:

Amy Voida, Ellie Harmon, and Ban Al-Ani. 2011. Homebrew databases: complexities of everyday information management in nonprofit organizations. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’11). ACM, New York, NY, USA, 915-924. DOI=10.1145/1978942.1979078

They define a homebrew database as “an assemblage of information management resources that people have pieced together to satisfice their information management needs.” This is just what we see all the time: the combination of Excel, Word, email, databases, and don’t forget plain paper, brought together to attack information management problems. A number of our use cases from the pharma industry as well as science reflect essentially this practice. It’s great to see a good definition of this problem grounded in ethnographic studies.

The Concerns of Linking

There were a couple of good papers on generating linkage across datasets (the central point of linked data). In Open PHACTS, we’ve been dealing with the notion of essentially context-dependent linkages. I think this notion is becoming more prevalent in the community. We had a lot of positive response on this in the poster session when presenting Open PHACTS. Probably my favorite paper was on linking the Smithsonian American Art Museum to the Linked Data cloud. They use PROV to drive their link generation: essentially, proposing links to humans, who then verify the connections. See:

I also liked the following paper on which hardware environment you should use when doing link discovery. Result: use GPUs, they’re fast!

Additionally, I think the following paper is cool because they use network statistics not just to measure but to do something, namely create links:


APIs were a growing theme of the event, with things like the Linked Data Platform working group and the successful SALAD workshop (fantastic acronym). Although I was surprised people in the workshop hadn’t heard of the Linked Data API. We had a lot of good feedback on the Open PHACTS API. It’s just the case that there is more developer expertise for using web service APIs than for semweb tech. I’ve actually seen a lot of demand for semweb skills, and while we are doing our best to train people, there is still this gap. It’s good, then, that we are thinking about how these two technologies can play together nicely.

Random Notes

Filed under: events, linked data Tagged: conference, eswc2013, linked data, semantic web

Source: Think Links

Last week, I attended ACM CHI 2013 and Web Science 2013 in Paris. I had a great time and wanted to give a recap of both conferences, which were collocated.


This was my first time at CHI, the main computer-human interaction conference. It’s not my main field of study, but I was there to Data DJ. I had an interactivity submission accepted with Ayman from Yahoo! Research on using turntables to manipulate data. Here’s the abstract:

Spinning Data: Remixing live data like a music DJ

This demonstration investigates data visualization as a performance through the use of disc jockey (DJ) mixing boards. We assert that the tools DJs use in situ can deeply inform the creation of data mixing interfaces and performances. We present a prototype system, DMix, which allows one to filter and summarize information from social streams using an audio mixing deck. It enables the Data DJ to distill multiple feeds of information in order to give an overview of a live event.

Paul Groth and David A. Shamma. 2013. Spinning data: remixing live data like a music dj. In CHI ’13 Extended Abstracts on Human Factors in Computing Systems (CHI EA ’13). ACM, New York, NY, USA, 3063-3066. DOI=10.1145/2468356.2479611 (PDF)

It was a fun experience… although it was a lot of demo giving (the reception plus all the coffee breaks). The reactions were really positive. Essentially, once a person touched the deck, they really got the interaction. Plus, a couple of notable people who stopped by seemed to like it: Jakob Nielsen and @kristw from Twitter data science. The kind of response I got made me really want to pursue the project further. I also learned about how we can make the interaction better.

The whole prototype system is available on GitHub. I wrote the whole thing using Node.js and JavaScript in a web browser. Warning: this is very ugly code.

In addition to my demo, I was impressed with the cool stuff on display (e.g. traceable skateboards) as well as the number of companies there looking for talent. The conference itself was huge with 3500 people and it was the first conference I attended where they had multiple sponsored parties.


Web Science was after CHI and is more in my area of research.

What we presented

I was pleased that the VU had 8 publications at the conference, which is a really strong showing. Also, two of our papers were nominated for the best paper award.

The two papers I had in the conference were very interdisciplinary.

These papers were chiefly done by the first authors, both students at the VU. Anca attended Web Science and did a great job presenting our poster on using Google Scholar to measure academic independence. There was a lot of interest, and we got quite a few ideas on how to improve the paper (bigger sample!).

The other paper, by Fabian Eikelboom, was very well received. It compared online and offline prayer cards and tried to see how the web has modified this form of communication. Here are a couple of tweets:

Interesting talk on online and offline prayer and comparing them at #websci13. First time I've heard of this
Alvin Chin (@GadgetMan4U) May 02, 2013

Absolutely BRILLIANT #websci13 Pecha Kucha on comparisons of on and offline prayer @WebSciDTC
maire byrne (evans) (@MaireAByrne) May 02, 2013

Conference thoughts

I found quite a few things that I really liked at this year’s web science. A couple of pointers:

  • Henry S Thompson, Jonathan A Rees and Jeni Tennison: URIs in data: for entities, or for descriptions of entities: A critical analysis – Talked about the httpRange-14 issue and the problem of unintended extensibility points within standards. I think a critical area of Web Science is how the social construction of technical standards impacts the Web and its development. This is an example of this kind of research.
  • Catherine C. Marshall and Frank M. Shipman: Experiences Surveying the Crowd: Reflections on methods, participation, and reliability – really got me thinking about the notion of hypotheticals in law and how this relates to provenance on the web.
  • Panagiotis Metaxas and Eni Mustafaraj: The Rise and the Fall of a Citizen Reporter – a compelling example of how Twitter influences the Mexican drug war and how trust is difficult to determine online. The subsequent Trust Trails project looks interesting.
  • The folks over at the UvA are doing a lot of fun work with respect to studying the web as a social object. It’s worth looking at their work.
  • Jérôme Kunegis, Marcel Blattner and Christine Moser. Preferential Attachment in Online Networks: Measurement and Explanations – an interesting discussion of how good our standard network models are. Check out their collection of networks to download and analyze!
  • Sebastien Heymann and Benedicte Le Grand. Towards A Redefinition of Time in Information Networks?

Unfortunately, there were some things that I hope will improve for next year. First, as you can tell from the above, the papers were not available online during the conference. This is really a bummer when you’re trying to tweet about things you see and follow up later. Secondly, I thought there were a few too many philosophy papers. In particular, it worries me when a computer scientist is presenting a philosophy paper at a science conference. I think the program committee needs to watch out for spreading too thinly in the name of interdisciplinarity. Finally, the pecha kucha session was a real success: short, succinct presentations that really raised interest in the work. This, however, didn’t carry over into the main sessions, which often ran too long.

Overall, both CHI and Web Science were well worth the time – I made a bunch of connections and saw some good research that will influence some of my work. Oh and it turns out Paris has some amazing coffee:

Filed under: data dj, interdisciplinary research Tagged: #chi2013, conference, data dj, web science, websci13

Source: Think Links


For the past couple of days (April 8–10, 2013), I attended the UKSG conference. UKSG is an organization for academic publishers and librarians. The conference itself has over 700 attendees and is focused on these two groups. I hadn’t heard of it until I was invited by Mike Taylor from Elsevier Labs to give a session with him on altmetrics.

The session was designed both to introduce altmetrics to publishers and librarians and to give a state-of-the-art update on it. You can see what I had to say in the clip above, but my main point was that altmetrics is at a stage where it can be advantageously used by scholars, projects and institutions, not to rank but to tell a story about their research. It’s particularly important now that many scientific artifacts beyond the article (e.g. data, posters, blog posts, videos) are becoming increasingly trackable and can help scholars tell their story.

The conference itself was a bit weird for me, as it was a completely different crowd than I would normally connect with… I was one of the few “actual” academics there, which led to my first-day tweet:

being at #uksglive as an academic is interesting – talking to people who talk about me in the abstract is seriously meta

— Paul Groth (@pgroth) April 8, 2013

It was fun to randomly go up to the ACM and IEEE stands and introduce myself not as a librarian or another publisher but as an actual member of their organizations. Overall, though, people were quite receptive to my comments and were keen to get my views on what publishers and librarians could be doing to help me as a researcher. I do have to say that it was a fairly well-funded operation (there is money in academia somewhere)… I came away with a lot of free t-shirts and USB sticks, and I have never been to a conference that had bumper cars for the evening entertainment:

UKSG bumper cars

In addition to (hopefully) contributing to the conference, I learned some things myself. Here are some bullet points in no particular order:

  • Outrageous talk by @textfiles – the Archive Team is super important
  • I talked a lot to Geoffrey Bilder from CrossRef. Topics included but not limited to:
    • why and when indirection is important for permanence in url space
    • the need for a claims (i.e. nanopublications) database referencing ORCID
    • the need for consistent url policies on sites and a “living will” for sites of importance
    • when will scientists get back to being scientists and stop being marketers (is this statement true, false, in-between, or is it even a bad thing)
    • the coolness of
  • It’s clear that librarians are the publishers’ customers; academics come second. I think this particular indirection badly impacts the market.
  • Academic content output is situated in a network – why do we de-link it all the time?
  • The open access puppy
  • Wasting no time, #mendeley already up in the #elsevier booths at #uksglive…

    — Mark Hahnel (@MarkHahnel) April 9, 2013


  • It was interesting to see the business of academic publishing in action. There were lots of pretty intense-looking dealings going down that I witnessed in the cafe.
  • Bournemouth looks like it could have some nice surfing conditions.

Overall, UKSG was a good experience to see, from the inside, this completely other part of the academic complex.

Filed under: academia, altmetrics Tagged: #altmetrics, #uksglive, trip report

Data2Semantics wins COMMIT/ Valorization Award!

Posted by data2semantics in collaboration | computer science | large scale | semantic web | vu university amsterdam

Source: Data2Semantics

During the COMMIT/ Community event, April 2–3 in Lunteren, the Data2Semantics project won one of three COMMIT/ Valorization awards. The award is a €10,000 subsidy to encourage the project to bring one of its products closer to use outside academia.

At the event, we presented and emphasized the philosophy of Data2Semantics: to embed new enrichment tools in the current workflow of individual researchers. We are working closely with both (with our Linkitup tool) and Elsevier Labs to bring semantics to the fingertips of the researcher.