News and Updates on the KRR Group

To watch Laurens’s ‘lekenpraatje’ (layman’s talk) on YouTube, click here.

Laurens Rietveld graduated on 9 March 2016. Find out more about his PhD project here.

On Wednesday 9 March, the Knowledge Representation and Reasoning group of VU University Amsterdam organises a seminar preceding the PhD defence of Laurens Rietveld. You are cordially invited to attend both the seminar and the defence.

The seminar takes place from 13.00 to 14.30 at VU University Amsterdam, main building, EZ HG-03A10 (Agora 4). The defence of Laurens Rietveld, on the topic of Publishing and Consuming Linked Data, starts at 15.45 in the Aula.


13.00, Prof.dr. Sören Auer
Title: Linking Data on the Web and within Enterprises

Abstract:
In recent years, the Linked Data concept has gained wide attention for integrating distributed, heterogeneous data on the Web and within enterprises. In this talk, we discuss some crucial research and technology challenges of the Linked Data management life-cycle, including extraction, linking, quality assurance, authoring and visualization. We look at existing and promising future Linked Data applications in the Digital Humanities/Cultural Heritage, Enterprise Data and Internet of Things/Industry 4.0 domains.

Bio:
Sören heads the Enterprise Information Systems group at the computer science department of the University of Bonn. He is also a member of the leadership council of the Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS) and scientific head of the Fraunhofer IAIS department Organized Knowledge. Before joining the University of Bonn and Fraunhofer IAIS in 2013, Sören founded the AKSW research group at the University of Leipzig and worked at TU Chemnitz as well as the University of Pennsylvania.


13.45, Prof.dr. Stefan Decker
Title: Knowledge Representation on the Web revisited: the case for Prototypes

Abstract:
In recent years, RDF and OWL have become the most common knowledge representation languages in use on the Web, propelled by the recommendation of the W3C. In this talk I will report on work to develop a case for a different approach to representing knowledge on the Web. I argue that an approach based on Prototypes is more suitable than the currently available paradigms. I will discuss requirements and design principles for Knowledge Representation based on Prototypes on the Web, after which I propose a formal syntax and semantics. I also report on an implementation and on the usability of the system. This is joint work with Michael Cochez (University of Jyväskylä).
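
A rough intuition for the prototype approach: every prototype derives from a base prototype and applies “add” and “remove” changes to its properties. The Python sketch below is my own illustration of that idea, not the formal syntax and semantics proposed in the talk:

```python
# A minimal sketch of prototype-based knowledge representation:
# each prototype inherits from a base and applies add/remove changes.
# Structure and names are illustrative, not the paper's formal syntax.
from dataclasses import dataclass, field

@dataclass
class Prototype:
    id: str
    base: "Prototype | None" = None
    add: dict = field(default_factory=dict)     # property -> values to add
    remove: dict = field(default_factory=dict)  # property -> values to remove

    def resolve(self):
        """Compute derived properties: inherit from base, then apply changes."""
        props = {} if self.base is None else {
            k: set(v) for k, v in self.base.resolve().items()
        }
        for k, vals in self.remove.items():
            props[k] = props.get(k, set()) - set(vals)
        for k, vals in self.add.items():
            props[k] = props.get(k, set()) | set(vals)
        return props

bird = Prototype("bird", add={"can": {"fly"}, "has": {"feathers"}})
penguin = Prototype("penguin", base=bird,
                    remove={"can": {"fly"}}, add={"can": {"swim"}})
print(penguin.resolve())  # {'can': {'swim'}, 'has': {'feathers'}}
```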

Bio:
Prof. Stefan Decker is a professor at RWTH Aachen University and a Director of the Fraunhofer Institute for Applied Information Technology (FIT). He previously held a Chair at the National University of Ireland, Galway, in conjunction with the Directorship of the Digital Enterprise Research Institute (now Insight), a Research Assistant Professor position at the Information Sciences Institute of the University of Southern California, and positions at Stanford University and the University of Karlsruhe. His current research interests include the Semantic Web, Linked Data, Open Data, ontologies and semi-structured data, and applications for Digital Humanities and the Life Sciences.

CTcue, a company that develops technology to help recruit patients for medical trials, is one of the 10 finalists in the Amsterdam Science & Innovation Award 2015.
CTcue has collaborated with our group, has employed one of our PhD students, and their products and services are based in part on our publications.

NWO has awarded 12M euro to CLARIAH, a project to build a digital infrastructure for software, data, enrichment, search and analytics in the Humanities. Frank van Harmelen, Maarten de Rijke and Cees Snoek are among the 9 scientists that form the core team of the project. See http://clariah.nl/, http://bit.ly/TERNC0 and http://bit.ly/1mWtnje for more details.

Recently the Linked Data Benchmark Council (LDBC) launched its portal, http://ldbcouncil.org.

LDBC is an organization for benchmarking graph and RDF data management systems that grew out of an EU FP7 project of the same name (ldbc.eu).

LDBC will outlive the EU project: it will be industry-supported and operate with ldbcouncil.org as its web presence.

Also, LDBC announced public drafts of its first two benchmarks. Public draft means that implementations of the benchmark software and technical specification documents are available and ready for public testing and comments. The two benchmarks are

– the Semantic Publishing Benchmark (SPB, http://ldbcouncil.org/benchmarks/spb), which is based on the BBC use case and ontologies, and

– the Social Network Benchmark (SNB, http://ldbcouncil.org/benchmarks/snb), for which an interactive workload has been released. A Business Intelligence workload and a Graph Analytics workload will follow on the same dataset. The SNB data generator was recently used in the ACM SIGMOD programming contest, which was about graph analytics.
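
To make concrete what running such a benchmark draft involves at the lowest level, here is a minimal Python sketch that times a query against a SPARQL endpoint over the standard protocol. The endpoint URL and query are placeholders; the actual LDBC drivers are considerably more elaborate:

```python
# A minimal sketch of timing queries against a SPARQL endpoint.
# ENDPOINT and QUERY are placeholders, not part of any LDBC benchmark.
import time
import urllib.parse
import urllib.request

ENDPOINT = "http://localhost:8890/sparql"  # placeholder endpoint URL
QUERY = "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }"

def run_once(query):
    """Issue one query over the SPARQL protocol and return its latency."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    req = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        resp.read()
    return time.perf_counter() - start

latencies = sorted(run_once(QUERY) for _ in range(20))
print(f"median: {latencies[len(latencies) // 2] * 1000:.1f} ms, "
      f"worst: {latencies[-1] * 1000:.1f} ms")
```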

The ldbcouncil.org website also hosts a blog with news and technical background on LDBC. The most recent post, by Peter Boncz, is about “Choke-Point based Benchmark Design”.

Last week the RDF and graph database benchmarking project LDBC held its third Technical User Community (TUC) meeting in London, in collaboration with the GraphConnect event. This meeting marks the official launch of the LDBC non-profit company, the successor of the present EU FP7 project.
The meeting was very well attended, with most of the new advisory board present: Xavier Lopez from Oracle, Luis Ceze from the University of Washington and Abraham Bernstein of the University of Zurich. Jans Aasman of Franz, Inc. and Karl Huppler, former chairman of the TPC, were not present but have signed up as advisory board members.
We had great talks by the new board members and by invited graph and RDF database users.
Nuno Carvalho of Fujitsu Labs presented the Fujitsu RDF use cases and benchmarking requirements, centred on analytics over streaming time-series data. The technology platform is diverse, with anything from RDF stores to HBase; the challenge is integration. I pointed out that with the Virtuoso column store you can now efficiently host time-series data alongside RDF. A relational format is certainly more efficient for time series, but it can be collocated with RDF, and queries can join between the two. This is especially so after our stellar bulk-load speed measured with the TPC-H dataset.
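
As a toy illustration of that collocation argument (my own example, not Virtuoso’s actual storage model), consider keeping relational time-series rows and RDF-style metadata side by side and joining them on a shared key:

```python
# Toy data model: relational time-series rows collocated with
# RDF-style metadata, joined on the sensor identifier.
rows = [  # (sensor, timestamp, value) -- relational, columnar-friendly
    ("s1", 1, 20.5), ("s1", 2, 21.0), ("s2", 1, 3.3),
]
triples = [  # RDF-style metadata about the same sensor identifiers
    ("s1", "locatedIn", "lab"), ("s2", "locatedIn", "office"),
]

# A query can "join" the two representations on the sensor key:
lab_sensors = {s for s, p, o in triples if p == "locatedIn" and o == "lab"}
lab_readings = [(s, t, v) for s, t, v in rows if s in lab_sensors]
print(lab_readings)  # [('s1', 1, 20.5), ('s1', 2, 21.0)]
```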
Luis Ceze of the University of Washington presented Grappa, a C++ graph programming framework that, in his words, is like the Cray XMT, later YarcData, in software. The idea is to divide a graph algorithm into small executable steps, millions in number, and to have very efficient scheduling and switching between these, building latency tolerance into every step of the application. Commodity interconnects like InfiniBand deliver bad throughput with small messages, but with endless message-combining opportunities from millions of mini work units, the overall throughput stays good. We know the same from all the Virtuoso scale-out work. Luis is presently working on Graphbench, a research project at the University of Washington funded by Oracle for graph-algorithm benchmarking. The major interest for LDBC is in having a library of common graph analytics as a starting point. Having these, the data generation can further evolve so as to create challenges for the algorithms. One issue that came up is the question of validating graph-algorithm results: unlike with SQL queries, there is not necessarily a single correct answer. If the algorithm to use and the number of iterations to run are not fully specified, response times will vary widely, and random walks will in any case create variation between consecutive runs.
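
The validation problem is easy to demonstrate. In the toy Python sketch below (my own example, not Graphbench code), a random-walk estimate of node importance returns different, equally valid answers for different seeds:

```python
# Why graph-analytics results resist single-answer validation: a
# random-walk importance estimate varies from run to run by design.
import random

edges = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}  # tiny directed graph

def random_walk_rank(steps, seed):
    """Estimate node importance by counting visits of a damped random walk."""
    rng = random.Random(seed)
    visits = {n: 0 for n in edges}
    node = "a"
    for _ in range(steps):
        # With probability 0.15 jump to a random node (damping);
        # otherwise follow a random outgoing edge.
        node = (rng.choice(list(edges)) if rng.random() < 0.15
                else rng.choice(edges[node]))
        visits[node] += 1
    return {n: round(c / steps, 3) for n, c in visits.items()}

# Same algorithm, two seeds: different, equally "correct" estimates.
print(random_walk_rank(10_000, seed=1))
print(random_walk_rank(10_000, seed=2))
```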
Abraham Bernstein presented his work on the Signal-Collect graph programming framework and its applications in fraud detection. He also talked about the EU FP7 project ViSTA-TV, which does massive stream processing around the real-time behaviour of internet TV users. Again, Abraham gave very direct suggestions for what to include in the LDBC graph analytics workload.
Andreas Both of Unister presented on RDF ontology-driven applications in an e-commerce context. Unister is Germany’s leading e-commerce portal operator, with a large number of properties ranging from travel to most B2C sectors. The RDF use cases are many, in principle extending down to final content distribution, but high online demand often calls for specialized solutions such as bit-field intersections for combining conditions. Sufficiently advanced database technology may also offer this, but it is not guaranteed. Selecting travel destinations based on attributes like sports opportunities, culture and so on can be turned into efficient query plans, but this requires perfect query plans also for short queries. I expect to learn more about this when visiting them on site. There is clear input for LDBC in these workloads.
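
The bit-field trick can be sketched in a few lines: represent each attribute as a bitmask over destination IDs and intersect the masks of the selected conditions. The data and names below are my own toy illustration, not Unister’s implementation:

```python
# Bit-field intersection for combining attribute conditions,
# as used in faceted search over e-commerce catalogues (toy data).
attribute_bits = {        # bit i set => destination i has the attribute
    "sports":  0b10110,
    "culture": 0b01110,
    "beach":   0b00011,
}

def matching_destinations(required):
    """AND together the bitmasks of all required attributes."""
    masks = [attribute_bits[a] for a in required]
    mask = masks[0]
    for m in masks[1:]:
        mask &= m
    # Decode the surviving bits back into destination IDs.
    return [i for i in range(mask.bit_length()) if mask >> i & 1]

print(matching_destinations(["sports", "culture"]))  # -> [1, 2]
```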
There were three talks on semantic applications in cultural heritage. Robina Clayphan of Europeana talked about this pan-European digital museum and library and about the Europeana Data Model (EDM). C.E. Ore of the University of Oslo talked about the CIDOC Conceptual Reference Model (CRM) ontology (ISO standard 21127:2006) and its role in representing cultural, historical and archaeological information. Atanas Kiryakov of Ontotext gave a talk on a possible benchmark around CIDOC CRM reasoning. In the present LDBC work RDF inference plays a minor role, but reasoning would be emphasized in this CRM workload, where the inference needed revolves around abbreviating unions of many traversal paths of different lengths between modelled objects. The data is not very large, but the ontology has a lot of detail. This is still not the elusive use case that would really require all the OWL complexities. We will first see how the semantic publishing benchmark work led by Ontotext in LDBC plays out; there is enough work there in any case.
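
To make the “unions of traversal paths” point concrete: the inference amounts to materialising one shortcut property from several longer paths between objects. A toy Python sketch, with illustrative names rather than actual CRM properties:

```python
# Toy triple store; names are illustrative, not real CRM property IDs.
triples = {
    ("vase", "produced_by", "workshop_event"),
    ("workshop_event", "carried_out_by", "potter"),
    ("coin", "found_at", "excavation"),
    ("excavation", "carried_out_by", "archaeologist"),
}

def shortcut(paths, name):
    """Materialise one shortcut property as the union of two-hop paths."""
    inferred = set()
    for first, second in paths:
        for s, p1, mid in triples:
            if p1 != first:
                continue
            for m, p2, o in triples:
                if m == mid and p2 == second:
                    inferred.add((s, name, o))
    return inferred

# "related_actor" abbreviates the union of two different traversal paths:
print(shortcut([("produced_by", "carried_out_by"),
                ("found_at", "carried_out_by")], "related_actor"))
```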
The most concrete result was that the graph analytics part of the LDBC agenda is starting to take shape. The LDBC organization is being formed, and its processes and policies are being defined. I visited Thomas Neumann’s group in Munich just prior to the TUC meeting to work on this. Nowadays Peter Boncz, who was recently awarded the Humboldt Prize, goes to Munich on a weekly basis, so Munich is the favoured destination for much LDBC-related work.
The first workload of the Social Network Benchmark is taking shape, and there is good progress also on the Semantic Publishing Benchmark. I will give more commentary on these workloads in a future post, now that the initial drafts from the respective task forces are out.
Orri Erling
OpenLink Software, Inc.

Read more about Big Data RDF Store Benchmarking Experiences at:
http://lod2.eu/BlogPost/1584-big-data-rdf-store-benchmarking-experiences.html

Benchmarking can stimulate technological progress. Check the latest Berlin SPARQL Benchmark report for RDF- and SPARQL-compliant DBMS engines: http://bit.ly/Yf5etP and http://bit.ly/12UsFbu
