News and Updates on the KRR Group

The KR&R group investigates modelling and representation of different forms of knowledge and reasoning, as found in a large variety of AI systems. We have an interest in both applications and theory. We study theoretical properties of knowledge representation and reasoning formalisms, but are also involved in developing practical knowledge-based systems. Recently, we have been very active in developments around the Semantic Web.

Posts on this website are continuously aggregated from project and member blogs.

NWO has awarded 12 million euros to CLARIAH, a project to build a digital infrastructure for software, data, enrichment, search and analytics in the Humanities. Frank van Harmelen, Maarten de Rijke and Cees Snoek are among the 9 scientists that form the core team of the project. See http://clariah.nl/, http://bit.ly/TERNC0, http://bit.ly/1mWtnje for more details.

Recently the Linked Data Benchmark Council (LDBC) launched its portal at http://ldbcouncil.org.

LDBC is an organization for benchmarking graph and RDF data management systems that came out of an EU FP7 project of the same name (ldbc.eu).

The LDBC will outlive the EU project: it will be industry-supported and will operate with ldbcouncil.org as its web presence.

LDBC also announced public drafts of its first two benchmarks. "Public draft" means that the benchmark software and technical specification documents are available and ready for public testing and comments. The two benchmarks are

- the Semantic Publishing Benchmark (SPB – http://ldbcouncil.org/benchmarks/spb), which is based on the BBC use case and ontologies, and

- the Social Network Benchmark (SNB – http://ldbcouncil.org/benchmarks/snb), for which an interactive workload has been released. A Business Intelligence workload and a Graph Analytics workload on the same dataset will follow later. The SNB data generator was recently used in the ACM SIGMOD programming contest, which was about graph analytics.

The ldbcouncil.org website also hosts a blog with news and technical background on LDBC. The most recent post is about “Choke-Point based Benchmark Design”, by Peter Boncz.

Source: Semantic Web world for you
Today I attended an event entitled “Data-driven Visualization Symposium” in the beautiful Trippenhuis building of the KNAW in Amsterdam. There was a really rich schedule, with 10 speakers showcasing some of their work in the area of big data and visualisation. Though I would have appreciated getting a bit more of the how instead […]

Source: Semantic Web world for you
Following the discussion I had after my previous posts, here is a more structured explanation of the ideas. Please feel free to ping me and/or comment on this post if you too think it’s a good idea.

Last week the RDF and graph DB benchmarking project LDBC had its 3rd Technical User Community meeting in London, held in collaboration with the GraphConnect event. This meeting marks the official launch of the LDBC non-profit company, which is the successor of the present EU FP7 project.
The meeting was very well attended, and most of the new advisory board was there. Xavier Lopez from Oracle, Luis Ceze from the University of Washington and Abraham Bernstein of the University of Zurich were present. Jans Aasman of Franz, Inc. and Karl Huppler, former chairman of the TPC, were not present but are signed up as advisory board members.
We had great talks by the new board members and by invited graph and RDF DB users.
Nuno Carvalho of Fujitsu Labs presented the Fujitsu RDF use cases and benchmarking requirements, based around analytics on time series of streaming data. The technology platform is diverse, with anything from RDF stores to HBase. The challenge is integration. I pointed out that with the Virtuoso column store you can now also efficiently host time-series data alongside RDF. Sure, a relational format is more efficient for time-series data, but it can be collocated with RDF and queries can join between the two. This is especially so after our stellar bulk-load speed measured with the TPC-H dataset.
Luis Ceze of the University of Washington presented Grappa, a C++ graph programming framework that, in his words, would be like the Cray XMT, later YarcData, in software. The idea is to have a graph algorithm divided into small executable steps, millions in number, and to have very efficient scheduling and switching between these, building latency tolerance into every step of the application. Commodity interconnects like InfiniBand deliver poor throughput with small messages, but with endless message-combination opportunities from millions of mini work units the overall throughput stays good. We know the same from all the Virtuoso scale-out work. Luis is presently working on Graphbench, a research project at the University of Washington funded by Oracle for graph algorithm benchmarking. The major interest for LDBC is in having a library of common graph analytics as a starting point. Having these, the data generation can further evolve so as to create challenges for the algorithms. One issue that came up is the question of validating graph algorithm results: unlike in SQL queries, there is not necessarily a single correct answer. If the algorithm to use and the number of iterations to run are not fully specified, response times will vary widely. Random walks will in any case create variation between consecutive runs.
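As a rough illustration of that pattern, here is a minimal, hypothetical Java sketch (this is not Grappa's API, just a single-process toy): work is split into many tiny tasks, and outgoing messages are buffered per destination and flushed in batches, so a high-latency interconnect sees a few large transfers instead of millions of small ones.

```java
import java.util.*;

// Hypothetical sketch of "many small tasks + message combining"; not Grappa's API.
public class CombiningScheduler {
    private final Map<Integer, List<String>> outbox = new HashMap<>(); // buffered messages per destination
    private final Deque<Runnable> tasks = new ArrayDeque<>();
    private static final int BATCH_SIZE = 1024;                        // flush threshold (assumed)

    void spawn(Runnable task) { tasks.addLast(task); }

    void send(int destinationNode, String message) {
        List<String> buffer = outbox.computeIfAbsent(destinationNode, k -> new ArrayList<>());
        buffer.add(message);
        if (buffer.size() >= BATCH_SIZE) flush(destinationNode);       // combine many small messages
    }

    private void flush(int destinationNode) {
        List<String> batch = outbox.remove(destinationNode);
        if (batch != null && !batch.isEmpty()) {
            // In a real system this would be one network transfer carrying the whole batch.
            System.out.printf("node %d <- batch of %d messages%n", destinationNode, batch.size());
        }
    }

    void run() {
        while (!tasks.isEmpty()) tasks.removeFirst().run();            // millions of tiny steps in practice
        for (Integer node : new ArrayList<>(outbox.keySet())) flush(node); // drain remaining buffers
    }

    public static void main(String[] args) {
        CombiningScheduler s = new CombiningScheduler();
        // Toy "graph step": each of 10,000 vertices sends one message to one of 4 nodes.
        for (int v = 0; v < 10_000; v++) {
            final int vertex = v;
            s.spawn(() -> s.send(vertex % 4, "update from vertex " + vertex));
        }
        s.run();
    }
}
```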
Abraham Bernstein presented the work on his Signal-Collect graph programming framework and its applications in fraud detection. He also talked about the EU FP7 project ViSTA-TV, which does massive stream processing around the real-time behavior of internet TV users. Again, Abraham gave very direct suggestions for what to include in the LDBC graph analytics workload.
Andreas Both of Unister presented on RDF ontology-driven applications in an e-commerce context. Unister is Germany’s leading e-commerce portal operator, with a large number of properties ranging from travel to most B2C areas. The RDF use cases are many, in principle down to final content distribution, but high online demand often calls for specialized solutions like bit-field intersections for combining conditions. Sufficiently advanced database technology may also offer this, but this is not a guarantee. Selecting travel destinations based on attributes like sports opportunities, culture, etc. can be turned into efficient query plans, but this requires perfect query plans also for short queries. I expect to learn more about this when visiting on site. There is clear input for LDBC in these workloads.
There were three talks on semantic applications in cultural heritage. Robina Clayphan of Europeana talked about this pan-European digital museum and library, and the Europeana Data Model (EDM). C. E. Ore of the University of Oslo talked about the CIDOC CRM (Conceptual Reference Model) ontology (ISO standard 21127:2006) and its role in representing cultural, historic and archaeological information. Atanas Kiryakov of Ontotext gave a talk on a possible benchmark around CIDOC CRM reasoning. In the present LDBC work RDF inference plays a minor role, but reasoning would be emphasized in this CRM workload, in which the inference needed revolves around abbreviating unions between many traversal paths of different lengths between modeled objects. The data is not very large but the ontology has a lot of detail. This still is not the elusive use case which would really require all the OWL complexities. We will first see how the semantic publishing benchmark work led by Ontotext in LDBC plays out. There is in any case work enough there.
The most concrete result was that the graph analytics part of the LDBC agenda starts to take shape. The LDBC organization is getting formed and its processes and policies are getting defined. I visited Thomas Neumann’s group in Munich just prior to the TUC meeting to work on this. Nowadays Peter Boncz, who was recently awarded the Humboldt Prize, goes to Munich on a weekly basis, so Munich is the favored destination for much LDBC-related work.
The first workload of the social network benchmark is taking shape and there is good progress also on the semantic publishing benchmark. I will give more commentary on these workloads in a future post, now that the initial drafts from the respective task forces are out.
Orri Erling
OpenLink Software, Inc.
 


On Monday the 7th of October the Knowledge Representation and Reasoning and the Web & Media research groups of the Vrije Universiteit Amsterdam joined for their biweekly Semantic Web (SW) meeting. The topic was the purpose of using HTTP URIs for denoting SW resources and the implications for archiving Linked Data (LD).

An important aspect of archiving LD is that, in the process, the data is decoupled from its native Web environment (hence the title of the talk). The two most important Web-based properties that are lost are (1) authority and (2) dereferenceability. We first discussed the relevance of both these properties.
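To make dereferenceability concrete, the small Java sketch below (using a DBpedia URI purely as an example) asks an HTTP URI for an RDF representation via content negotiation; an archived copy of the same triples no longer supports this kind of live lookup.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

// Minimal illustration of dereferencing a Linked Data URI with content negotiation.
public class Dereference {
    public static void main(String[] args) throws Exception {
        URL uri = new URL("http://dbpedia.org/resource/Amsterdam");   // example URI
        HttpURLConnection conn = (HttpURLConnection) uri.openConnection();
        conn.setInstanceFollowRedirects(true);             // typically a 303 redirect to the data document
        conn.setRequestProperty("Accept", "text/turtle");  // ask for an RDF serialization
        System.out.println("HTTP status: " + conn.getResponseCode());
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            for (int i = 0; i < 10; i++) {                 // print the first few lines of the response
                String line = in.readLine();
                if (line == null) break;
                System.out.println(line);
            }
        }
    }
}
```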

(more…)

Source: Data2Semantics

Our website with additional material for our paper “A Fast Approximation of the Weisfeiler-Lehman Graph Kernel for RDF Data” has won the Open Science Award at ECML/PKDD 2013. The jury praised the submission as “a perfect example of open science”.

A goal of the Data2Semantics project is to provide reusable software to support semantic enrichment of data. Therefore, the software used for the paper builds on existing, well-known libraries (SESAME, LibSVM) and was organized into three distinct projects from the start. The heart of the software is the proppred library, which contains all the code for doing property prediction using graph kernels on RDF data. Some additional support code for handling RDF is in the d2s-tools project. All the code to run the experiments from the paper(s) is in a separate project called kernelexperiments. This setup allows for easy replication of the experiments (and running new ones), and for easier integration of the property-prediction-on-RDF library into other projects.
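For readers unfamiliar with Weisfeiler-Lehman kernels, the sketch below shows one WL relabeling iteration on a tiny labeled graph. It is a generic illustration only, not the fast approximation from the paper and not the proppred implementation.

```java
import java.util.*;

// One round of Weisfeiler-Lehman relabeling on a small labeled, undirected graph.
// A WL kernel compares two graphs by counting how often each (re)label occurs in each.
public class WLSketch {
    static Map<Integer, String> wlIteration(Map<Integer, String> labels,
                                            Map<Integer, List<Integer>> adjacency) {
        Map<Integer, String> newLabels = new HashMap<>();
        for (Map.Entry<Integer, String> e : labels.entrySet()) {
            List<String> neighborLabels = new ArrayList<>();
            for (int n : adjacency.getOrDefault(e.getKey(), Collections.emptyList())) {
                neighborLabels.add(labels.get(n));
            }
            Collections.sort(neighborLabels); // multiset of neighbor labels, in canonical order
            // Compress (own label, sorted neighbor labels) into a new label; a real
            // implementation would use an injective relabeling dictionary, not a hash.
            newLabels.put(e.getKey(), Integer.toHexString((e.getValue() + neighborLabels).hashCode()));
        }
        return newLabels;
    }

    public static void main(String[] args) {
        Map<Integer, String> labels = Map.of(0, "A", 1, "B", 2, "A", 3, "C");
        Map<Integer, List<Integer>> adj = Map.of(
                0, List.of(1, 2), 1, List.of(0, 3), 2, List.of(0), 3, List.of(1));
        for (int i = 0; i < 2; i++) {        // two WL iterations
            labels = wlIteration(labels, adj);
            System.out.println("iteration " + (i + 1) + ": " + labels);
        }
    }
}
```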

For the future, we aim to provide even more scientific openness via the experimental machine learning platform that we are developing. One of the aims of the platform is to make experimentation easier without introducing too much overhead. Furthermore, we will export provenance of the experiments in the PROV-O format. This provenance is visualized using Prov-O-Viz (also developed in Data2Semantics), allowing researchers to gain better insight into the experiments without having to study the code.
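As a rough sketch of what such a PROV-O export could look like (the experiment URIs below are made up, and this is not the platform's actual export code), a few provenance triples can be written out with Jena like this:

```java
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;

// Sketch: describe an experiment run as a prov:Activity that used a dataset
// and generated a results entity, then serialize the model as Turtle.
public class ProvExport {
    public static void main(String[] args) {
        String PROV = "http://www.w3.org/ns/prov#";
        String EX = "http://example.org/experiments/";   // hypothetical namespace
        Model m = ModelFactory.createDefaultModel();
        m.setNsPrefix("prov", PROV);
        m.setNsPrefix("ex", EX);

        Resource run     = m.createResource(EX + "run-42",        m.createResource(PROV + "Activity"));
        Resource dataset = m.createResource(EX + "input-dataset", m.createResource(PROV + "Entity"));
        Resource results = m.createResource(EX + "results-42",    m.createResource(PROV + "Entity"));

        run.addProperty(m.createProperty(PROV, "used"), dataset);
        results.addProperty(m.createProperty(PROV, "wasGeneratedBy"), run);

        m.write(System.out, "TURTLE"); // a trace like this is what Prov-O-Viz can visualize
    }
}
```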

Protégé + Apache Jena + YASGUI

 

This year we again have a nice group of students (almost 70) following the third-year bachelor Semantic Web course. Until this year it was quite a hassle to combine the bits and pieces into a complete workflow, starting from ontology creation (in Protégé) and ending with a nice SPARQL endpoint that reasons in OWL over the ontology plus instances.

Like in previous years, the instructors (Stefan Schlobach and Ronald Siebes) updated the assignment to reflect the latest developments in available toolkits and software. We were surprised by how easy it was to integrate these latest tools; we are now able to do the following within 30 minutes on any machine:

- create a simple ontology in Protégé

- install a SPARQL endpoint with OWL reasoning (Jena Fuseki)

- import the ontology

- connect this local endpoint to YASGUI (http://yasgui.laurensrietveld.nl)

- run, via YASGUI, a federated query combining results from our local endpoint with results from other endpoints (e.g. DBpedia); a sketch of such a query follows below
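The same kind of federated query can also be issued programmatically. Below is a small sketch using Apache Jena's ARQ (assuming a recent Jena); the local Fuseki endpoint URL and the ex: property in the local graph pattern are assumptions for illustration, while dbo:populationTotal is a real DBpedia property.

```java
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QuerySolution;
import org.apache.jena.query.ResultSet;

// Sketch of a federated SPARQL query: the outer pattern runs on the local
// Fuseki endpoint, the SERVICE block is evaluated remotely on DBpedia.
public class FederatedQueryExample {
    public static void main(String[] args) {
        String localEndpoint = "http://localhost:3030/ds/sparql";  // assumed Fuseki dataset URL
        String query =
            "PREFIX dbo: <http://dbpedia.org/ontology/>\n" +
            "PREFIX ex:  <http://example.org/ontology#>\n" +       // hypothetical course ontology
            "SELECT ?city ?population WHERE {\n" +
            "  ?something ex:locatedIn ?city .\n" +                 // evaluated locally
            "  SERVICE <http://dbpedia.org/sparql> {\n" +           // evaluated on DBpedia
            "    ?city dbo:populationTotal ?population .\n" +
            "  }\n" +
            "} LIMIT 10";

        try (QueryExecution qe = QueryExecutionFactory.sparqlService(localEndpoint, query)) {
            ResultSet results = qe.execSelect();
            while (results.hasNext()) {
                QuerySolution row = results.next();
                System.out.println(row.get("city") + "\t" + row.get("population"));
            }
        }
    }
}
```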

Conclusion: it is now within reach of many people to get a nice Semantic Web infrastructure up and running without a lot of suffering, and to connect it to the vast amount of external Linked Data from various endpoints.

The Semantic Web works!

Here you can find a manual to do this yourself and, hopefully, come to share the conclusion.

 

Source: Semantic Web world for you
Yesterday I was sitting in a very interesting meeting with some experts in data visualisation. A lot of impressive things were presented, and the names of the Wii Remote and Kinect were mentioned a couple of times. From what I have observed so far, these devices are used as a cheap way to get sensors. And they certainly […]

Source: Data2Semantics

Paul Groth co-authored an article about altmetrics in the Elsevier Library Connect newsletter for librarians. The newsletter reaches 18,000 librarians in 138 countries around the world.

Academic research and publishing have transitioned from paper to online platforms, and that migration has continued to evolve from closed platforms to connected networks. With this evolution, there is growing interest in the academic community in how we might measure scholarly activity online beyond formal citation.

See more at: http://bit.ly/19bEVpD
