News and Updates on the KRR Group

Source: Think Links

I just received 50 copies of this in the mail today:

Literature is so much better in its electronic form, but it's still fun to get a physical copy. Most importantly, this proceedings represents scientific content and a scientific community that I'm proud to be part of. You can obviously access the full proceedings online. Preprints are also available from most of the authors' sites. You can also read my summary of the 4th International Provenance and Annotation Workshop (IPAW 2012).

Filed under: academia, provenance Tagged: ipaw, lecture notes in computer science, lncs, provenance

Source: Think Links

If you read this blog a bit, you’ll know I’m a fairly big fan of RDF as a data format. It’s really great for easily mashing different data sources together. The common syntax gets rid of a lot of headaches before you can start querying the data and you get nice things like simple reasoning to boot.
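To make that concrete, here's a minimal sketch using rdflib; the data and namespace are made up for illustration. Two Turtle snippets that could come from different sources are parsed into one graph, and a single SPARQL query runs over the merged result with no schema alignment step:

```python
from rdflib import Graph

# Two snippets that could come from different sources; "mashing" them
# together is just parsing them into the same graph.
source_a = """
@prefix ex: <http://example.org/> .
ex:ipaw2012 ex:title "IPAW 2012" ; ex:location "Santa Barbara" .
"""
source_b = """
@prefix ex: <http://example.org/> .
ex:ipaw2012 ex:series ex:ipaw .
"""

g = Graph()
g.parse(data=source_a, format="turtle")
g.parse(data=source_b, format="turtle")

# One SPARQL query over the merged data.
for row in g.query("""
    PREFIX ex: <http://example.org/>
    SELECT ?p ?o WHERE { ex:ipaw2012 ?p ?o }
"""):
    print(row.p, row.o)
```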

One thing that I've been looking for is a nice way to store a ton of RDF data that is:

  1. easy to deploy;
  2. easy to query;
  3. works well with scalable analysis & reasoning techniques (e.g. stuff built using MapReduce/Hadoop);
  4. oh and obviously scalable.

This past spring I was at a Dagstuhl workshop where I had the chance to briefly talk to Chris Ré about the data storage environment used by Hazy, one of the leading projects in the world on large-scale statistical inference. At the time, he was fairly enthusiastic about using HBase as a storage layer.

Based on that suggestion, I played around with deploying HBase myself on Amazon. Using Whirr, it was pretty straightforward to deploy a nice environment in a matter of hours. In addition, HBase has the nice side effect that it uses the same file system as Hadoop (HDFS), so you can run Hadoop jobs over the data that's stored in the database.
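As a quick smoke test that a deployed cluster is reachable, here's a sketch using the happybase Python client, which talks to HBase through its Thrift gateway. The hostname, table name, and column family are hypothetical; this assumes the HBase Thrift server is running on the cluster:

```python
import happybase

# Hypothetical master host; assumes the HBase Thrift server is up there.
conn = happybase.Connection("ec2-master.example.com")

# One table with a single column family is enough for a smoke test.
conn.create_table("smoke_test", {"cf": dict()})
table = conn.table("smoke_test")
table.put(b"row1", {b"cf:value": b"hello hbase"})
print(table.row(b"row1"))  # {b'cf:value': b'hello hbase'}
```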

With that, I wanted to see (a) what a good way to store a bunch of RDF in HBase would be and (b) whether retrieval of that RDF was performant. Sever Fundatureanu worked on this as his master's thesis.

One of the novel things he looked at was using coprocessors (built-in user-defined functions in HBase) to try and improve the building of indexes for RDF within the database. That is, instead of running multiple Hadoop load jobs, you run roughly one and then let the coprocessors in each worker node build the rest of the indexes you want for fast retrieval. While it didn't improve performance, I thought the idea was cool. I'm still interested in how much user-side processing you can shove into the worker nodes within HBase. Below you'll find an abstract and a link to his full thesis.
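To illustrate the idea (not the thesis's actual schema), here's a Python sketch of the kind of derivation a coprocessor would do server-side: each quad is written once, and the row keys for the other index permutations are computed mechanically from it. The table names, key layout, and separator are all made up for this example:

```python
import happybase

SEP = b"|"

def index_rows(s, p, o, c):
    """Row keys for four hypothetical index permutations of one quad."""
    return {
        "spoc": SEP.join([s, p, o, c]),
        "pocs": SEP.join([p, o, c, s]),
        "ocsp": SEP.join([o, c, s, p]),
        "cspo": SEP.join([c, s, p, o]),
    }

# Done client-side here for clarity, and assuming the four index tables
# already exist. The point of the coprocessor approach is to have each
# region server perform these derived writes itself after a single load.
conn = happybase.Connection("localhost")
quad = (b"ex:s", b"ex:p", b"ex:o", b"ex:graph")
for index, row_key in index_rows(*quad).items():
    conn.table(index).put(row_key, {b"f:q": b""})  # presence-only cell
```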

I’m still keen on using HBase as the basis for the analysis and reasoning over RDF data. We’re continuing to look into this area. If you have some cool ideas, let us know.

A Scalable RDF Store Based on HBase

Sever Fundatureanu

The exponential growth of the Semantic Web leads to a need for a scalable storage solution for RDF data. In this project, we design a quad store based on HBase, a NoSQL database which has proven to scale out to thousands of nodes. We adopt an Id-based schema and argue why it enables a good trade-off between loading and retrieval performance. We devise a novel bulk loading technique based on HBase coprocessors and we compare it to a traditional Map-Reduce technique. The evaluation shows that our technique does not scale as well as the traditional approach. Instead, with Map-Reduce, we achieve a loading throughput of 32152 quads/second on a cluster of 13 nodes. For retrieval, we obtain a peak throughput of 56447 quads/second.
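As a rough illustration of what an Id-based schema buys you (the thesis's actual encoding may well differ), here is a small self-contained Python sketch: each term is mapped to a fixed-width 8-byte id via a dictionary, so every quad row key has the same predictable length no matter how long the terms are:

```python
import struct

# In-memory stand-ins for two dictionary tables (term -> id, id -> term);
# in an HBase deployment these would themselves be tables.
term2id, id2term = {}, {}

def encode(term: str) -> bytes:
    """Assign (or look up) a fixed-width 8-byte id for a term."""
    if term not in term2id:
        new_id = len(term2id) + 1
        term2id[term] = new_id
        id2term[new_id] = term
    return struct.pack(">Q", term2id[term])

def quad_key(s, p, o, c) -> bytes:
    """A quad row key is just the four ids concatenated."""
    return b"".join(encode(t) for t in (s, p, o, c))

key = quad_key("ex:subject", "ex:predicate", "ex:object", "ex:graph")
print(len(key))  # always 32 bytes, regardless of term lengths
```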


Filed under: linked data Tagged: coprocessors, hbase, rdf

Source: Semantic Web world for you
Here it is: the first fully featured release of SemanticXO! Use it in your activities to store and share any kind of structured information with other XOs. The installation procedure is easy and only requires an XO-1 running the operating system version 12.1.0. Go to the Git repository and download the files “setup.sh” and “semanticxo.tar.gz” […]

Source: Semantic Web world for you
Reblogged from The World Wide Semantic Web: Opening of the festival (see more photos) This year the Open Knowledge Festival took place in Helsinki from September 17 to September 22. Victor de Boer and I (Christophe Guéret) went to this huge conference (1000 participants from 100 nations) to speak about Open Development, one of the 13 […]

Source: Think Links

Today, I was teaching the second class of our Semantic Web course here at the VU University Amsterdam, on RDF and RDFS. After the first half of the class in a very warm lecture room, the students were fading. After a quick poll, we decided to take the course outside. So I had the fun challenge of teaching RDF Schema off-the-cuff, without a chalkboard or slides… and I think it actually worked. The students did a great job of participating, and we managed to demonstrate a bit of rule-based reasoning using a combination of coloured paper, students, and moving about. Here's a photo of the class as we ended:
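For anyone curious what the paper-and-students demo boils down to in code, here's a minimal sketch with rdflib and the owlrl package (owlrl isn't mentioned in the post; it's just one convenient way to apply the RDFS entailment rules):

```python
from rdflib import Graph, URIRef
from rdflib.namespace import RDF
from owlrl import DeductiveClosure, RDFS_Semantics

data = """
@prefix ex: <http://example.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
ex:Student rdfs:subClassOf ex:Person .
ex:anna a ex:Student .
"""
g = Graph()
g.parse(data=data, format="turtle")

# Apply the RDFS entailment rules; this adds, among others,
# the inferred triple: ex:anna a ex:Person .
DeductiveClosure(RDFS_Semantics).expand(g)

EX = "http://example.org/"
print((URIRef(EX + "anna"), RDF.type, URIRef(EX + "Person")) in g)  # True
```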

Filed under: academia Tagged: lecture, outdoors, vrije universiteit amsterdam, vu university amsterdam

Source: Semantic Web world for you
Let’s assume you are the owner of a CSV file with some valuable data. You derive some revenue from it by selling it to consumers that do traditional data integration. They take your file and import it into their own data storage solution (for instance, a relational database) and deploy applications on top of this […]

Source: Semantic Web world for you
Emerging online applications based on the Web of Objects or Linked Open Data typically assume that connectivity to data repositories and entity resolution services are always available. This may not be a valid assumption in many cases. Indeed, there are about 4.5 billion people in the world who have no or limited Internet access. Many […]