News and Updates on the KRR Group

We are glad to announce that the LarKC Platform Release v2.0 is now available in our repository on SourceForge.
The redistributable package can be downloaded via the following URL:

http://sourceforge.net/projects/larkc/files/Release-2.0/larkc-release-2.0.zip/download (OS independent)

The source code belonging to this release is in the SVN repository and can be browsed at:

http://larkc.svn.sourceforge.net/viewvc/larkc/branches/Release_2.0_prototype/platform/

A complete manual for both users and developers can be found at:

http://sourceforge.net/projects/larkc/files/Release-2.0/LarKC_Platform_Manual_2.0.pdf

If [...]

Source: Think Links

One of the nice things about using cloud services is that sometimes you get a feature you didn’t expect. Below is a nice set of stats from WordPress.com about how well Think Links did in 2010. I was actually quite happy with 12 posts, one post a month. I will be trying to increase the rate of posts this year. If you’ve been reading this blog, thanks, and have a great 2011! The stats are below:

Here’s a high-level summary of this blog’s overall health:

Healthy blog!

The Blog-Health-o-Meter™ reads Fresher than ever.

Crunchy numbers

A Boeing 747-400 passenger jet can hold 416 passengers. This blog was viewed about 4,500 times in 2010. That’s about 11 full 747s.

In 2010, there were 12 new posts, growing the total archive of this blog to 46 posts. There were 12 pictures uploaded, taking up a total of 5 MB. That’s about a picture per month.

The busiest day of the year was October 13th with 176 views. The most popular post that day was Data DJ realized….well at least version 0.1.

Where did they come from?

The top referring sites in 2010 were twitter.com, few.vu.nl, litfass.km.opendfki.de, 4store.org, and facebook.com.

Some visitors came searching, mostly for provenance open gov, think links, ready made food, 4store, and thinklinks.

Attractions in 2010

These are the posts and pages that got the most views in 2010.

1. Data DJ realized….well at least version 0.1 (October 2010)
2. 4store Amazon Machine Image and Billion Triple Challenge Data Set (October 2009), 2 comments
3. Linking Slideshare Data (June 2010), 4 comments
4. A First EU Proposal (April 2010), 3 comments
5. Two Themes from WWW 2010 (May 2010)

Filed under: meta

Source: Semantic Web world for you

A few days ago, I posted about SemanticXO and promised to show how to install a triple store on your XO. Here are the steps to compile and install RedStore on the XO, put some triples in it and issue some queries. The following has been tested with an XO-1 running software release 10.1.3 and a MacBook Pro running Arch Linux x64 (it is not easy to compile directly on the XO, which is why you will need a second machine). All the scripts are available here.

Installation of RedStore

RedStore depends on some external libraries that are not yet packaged for Fedora 11, which is used as the base for the XO operating system. The script build_redstore.sh will download and compile everything that is needed. You may, however, need to install external dependencies on your system, such as libxml. The script only takes care of the libraries RedStore directly depends on, namely raptor2, rasqal and redland (all available here). Here is the full list of commands to issue:

mkdir /tmp/xo
cd /tmp/xo
wget --no-check-certificate https://github.com/cgueret/SemanticXO/raw/master/build_redstore.sh
sh build_redstore.sh

Once done, you will have four files to copy to the XO. If the build does not work for you, you can also download this pre-compiled package. These files should all be put together in one directory, for instance “/opt/redstore”. Note that RedStore will also put all of its data into that same directory. In addition to these four files, you will need a wrapper script and an init script, both available in the source code repository. So, here is what to do on the XO, as root (replacing “cgueret@192.168.1.105” with the login and IP address of your build machine):

mkdir /opt/redstore
cd /opt/redstore
# Copy the four files produced by build_redstore.sh from the build machine
scp cgueret@192.168.1.105:/tmp/xo/libraptor2.so.0 .
scp cgueret@192.168.1.105:/tmp/xo/librasqal.so.2 .
scp cgueret@192.168.1.105:/tmp/xo/librdf.so.0 .
scp cgueret@192.168.1.105:/tmp/xo/redstore .
# Wrapper script from the SemanticXO repository
wget --no-check-certificate https://github.com/cgueret/SemanticXO/raw/master/wrapper.sh
chmod +x wrapper.sh
# Init script, registered with chkconfig so the triple store starts at boot
cd /etc/init.d
wget --no-check-certificate https://github.com/cgueret/SemanticXO/raw/master/redstoredaemon
chmod +x redstoredaemon
chkconfig --add redstoredaemon
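Before rebooting, it can be worth confirming that the copy step worked. The check below is a hypothetical helper, not one of the SemanticXO scripts; it assumes the four files are named libraptor2.so.0, librasqal.so.2, librdf.so.0 and redstore, and that they live in /opt/redstore as suggested above.

```shell
#!/bin/sh
# Hypothetical helper (not part of SemanticXO): verify that the four files
# produced by build_redstore.sh were copied into the install directory.
check_redstore_install() {
  dir=${1:-/opt/redstore}
  missing=0
  for f in libraptor2.so.0 librasqal.so.2 librdf.so.0 redstore; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  # The triple store binary itself must also be executable.
  if [ -f "$dir/redstore" ] && [ ! -x "$dir/redstore" ]; then
    echo "not executable: $dir/redstore" >&2
    missing=1
  fi
  return "$missing"
}
```

On the XO, `check_redstore_install /opt/redstore && echo ok` prints ok only when all four files are in place; otherwise it lists what is missing on stderr.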

Then you can reboot your XO and enjoy the triple store through its HTTP front end, available on port 8080 :)

Loading some triples

Now that the triple store is running, it’s time to add some triples. The SP2Bench benchmark comes with a tool (sp2b_gen) to generate any number of triples. To begin with, you can generate 50,000 triples. That should be about the maximum number of triples an XO will have to deal with later on, once activities store data in it. Here is what to do, with “192.168.1.104” being the IP address of the XO:

sp2b_gen -t 50000
rapper -i guess -o rdfxml sp2b.n3 > sp2b.rdf
curl -T sp2b.rdf 'http://192.168.1.104:8080/data/http://example.com/data'

It takes about 43 minutes to upload these 50k triples, which gives an average of 53 milliseconds per triple, or 19 triples per second. That is not fast, but it should be enough for an API that stores a bunch of triples with an acceptable response time. The data takes 4 MB of disk space on the XO, for an initial RDF file of about 9.8 MB.
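The averages above follow from simple arithmetic, recomputed in the sketch below with awk (POSIX sh arithmetic is integer-only). With exactly 43 minutes it gives about 51.6 ms per triple and 19.4 triples per second; the quoted 53 ms corresponds to roughly 44 minutes, so the “about 43 minutes” is presumably rounded. The function name is a hypothetical illustration.

```shell
#!/bin/sh
# Hypothetical helper: average upload cost from a triple count and a
# duration in minutes; awk does the floating-point division.
upload_rate() {
  triples=$1
  minutes=$2
  awk -v n="$triples" -v m="$minutes" 'BEGIN {
    s = m * 60                       # total duration in seconds
    printf "%.1f ms/triple, %.1f triples/s\n", s * 1000 / n, n / s
  }'
}

# Figures from the post: 50,000 SP2Bench triples in about 43 minutes.
upload_rate 50000 43   # prints 51.6 ms/triple, 19.4 triples/s
```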

Issue some queries

The SP2Bench benchmark comes with a generator for the triples and a set of 17 SPARQL queries expressed over this data. The queries are of varying complexity in order to benchmark different triple stores. Unfortunately, 9 of them were too complex for RedStore on the XO with these 50k triples: these queries did not complete, even after running for a full night! The remaining 8 queries are answered without many problems, as long as you have enough time to wait for the answer:

Query file     Execution time (ms)
q1.sparql          14229.4
q2.sparql          44189.2
q3a.sparql         21506.8
q3b.sparql         19498.4
q3c.sparql         19663.9
q10.sparql          3940.6
q11.sparql          4685.2
q12c.sparql         3539.6

The queries were executed with the “sparql-query” command-line client, as follows:

cat q2.sparql | sparql-query http://192.168.1.104:8080/sparql -t -p -n

These long delays may sound like bad news, but note that this was with 50k triples and with queries designed to be tricky in order to test the capabilities of triple stores. With normal usage, fewer triples and more standard queries, we can expect things to go better.
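To put the timing table in perspective, a short awk pass can total the eight timings and pick out the slowest query. This is a sketch with a hypothetical function name; it reads “query-file milliseconds” lines and skips any line whose second field is not a number (such as the table header).

```shell
#!/bin/sh
# Hypothetical helper: read "query-file milliseconds" lines on stdin and
# print the count, total, mean and slowest query.
summarise_timings() {
  awk '
    $2 ~ /^[0-9.]+$/ {                # skip headers and blank lines
      n++
      total += $2
      if ($2 + 0 > max) { max = $2 + 0; slowest = $1 }
    }
    END {
      if (n == 0) { print "no timings"; exit }
      printf "%d queries, total %.1f ms, mean %.1f ms, slowest %s (%.1f ms)\n",
             n, total, total / n, slowest, max
    }'
}

# Execution times reported above (RedStore on an XO-1, 50k triples).
summarise_timings <<'EOF'
q1.sparql 14229.4
q2.sparql 44189.2
q3a.sparql 21506.8
q3b.sparql 19498.4
q3c.sparql 19663.9
q10.sparql 3940.6
q11.sparql 4685.2
q12c.sparql 3539.6
EOF
```

For the eight solved queries this prints `8 queries, total 131253.1 ms, mean 16406.6 ms, slowest q2.sparql (44189.2 ms)`, i.e. a little over two minutes for the whole solvable subset.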

Source: Semantic Web world for you

The three XOs received for the project

The One Laptop Per Child (OLPC) project has provided millions of kids worldwide with a low-cost connected laptop, helping them to enhance their knowledge and develop learning skills. Learning a foreign language, getting an introduction to reading and writing, or preserving or reviving an endangered or extinct language are among the possible uses of these XOs. Such activities could benefit significantly from a storage layer optimised for multilingual and loosely structured data.

One of the building blocks of the Semantic Web, the “triple store”, is such a data storage service. A triple store is like a database engine optimised to store and provide access to triples: atomic statements binding together a subject, a predicate and an object. For instance, <Amsterdam,isLocatedIn,Netherlands>. These two triples would give the country’s name in two different languages: <Amsterdam,isLocatedIn,"Nederland"@nl>, <Amsterdam,isLocatedIn,"Pays-Bas"@fr>.
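To make the notation above concrete, the two language-tagged triples could be written in N-Triples, the line-based RDF syntax, as in the sketch below. The http://example.org/ URIs are placeholders chosen for illustration, and the sed-based lookup is a naive hypothetical helper (one triple per line, no escaped quotes), not a real RDF parser.

```shell
#!/bin/sh
# Write the two example triples in N-Triples syntax; the example.org URIs
# are placeholders, not a published vocabulary.
cat > amsterdam.nt <<'EOF'
<http://example.org/Amsterdam> <http://example.org/isLocatedIn> "Nederland"@nl .
<http://example.org/Amsterdam> <http://example.org/isLocatedIn> "Pays-Bas"@fr .
EOF

# Hypothetical helper: print the literal tagged with language $2 in file $1.
# Naive: assumes one triple per line and literals without escaped quotes.
label_for_lang() {
  sed -n "s/^.*\"\(.*\)\"@$2 \.\$/\1/p" "$1"
}

label_for_lang amsterdam.nt fr   # prints Pays-Bas
```

If Raptor’s rapper utility is installed, `rapper -i ntriples -c amsterdam.nt` should count 2 triples, which is a quick way to check that the file is valid N-Triples.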

SemanticXO is a new project from the contributor program, aimed at adding a triple store and a front-end API to the XO operating system. This triple store will extend the functionality of Sugar, making it possible for all activities to store loosely structured, multilingual data and easily connect information across activities. In addition, the SPARQL protocol will allow easy access to the data stored on any device.

A first goal is to set up RedStore on the XOs allocated to this project. RedStore is a lightweight triple store that should be able to run on low-end hardware and still perform well. Stay tuned for the results! ;-)

Source: Semantic Web world for you

This is the first post on this blog, which aims to give and point to information about the Semantic Web. The Semantic Web (or Web 3.0) is a new technology and research topic aimed at putting more semantics into the Web as we know it. The changes are happening in a not very visible, but very concrete, way for you, the user of the Web. On this blog you will learn more about it and how you can benefit from it, whoever you are.

LarKC announces a new release of mpiJava (1.2.6) – a Message-Passing Interface (MPI) library, allowing a Java application to efficiently run on a distributed, parallel, and high-performance computer architecture. First introduced in the HPJava project and developed by

  • Pervasive Technology Labs, Indiana University,
  • Syracuse University, and
  • CSM, University of Portsmouth,

mpiJava@SourceForge (http://sourceforge.net/projects/mpijava/) is now managed and maintained by the High Performance Computing Center Stuttgart (HLRS) in the framework of LarKC.

The library is easy to deploy and use within application code, in particular for plug-ins. Among the new features are true multi-platform support (thanks to a CMake-based configuration procedure), very high performance (achieved by efficient use of the underlying native MPI C implementation), and support for the most widely used MPI implementations (MPICH, Open MPI, and MS-MPI).

Source: Think Links

This has been a great week if you think that it’s important to know the origins of content on the web. First, Google announced support for explicit metadata describing the origins of news article content, which will be used by Google News. Using two tags, publishers can now indicate whether they are the original source of a piece of news or are syndicating it from some other provider. Second, the New York Times now supports paragraph-level permalinks. (So this is the link to the third paragraph of an article on Starbucks recycling.) So one can link to the exact paragraph when quoting a piece. This was already supported by some other sites, and there’s a WordPress plug-in for it, but having the Times support it is big news. Essentially, with a couple of tweaks, these techniques could make the quote pattern commonly seen in blogs machine readable.

In the W3C Provenance Incubator Group, which is just wrapping up, one of the main scenarios was how to support a news aggregator that can make use of provenance to help determine the quality of the articles it automatically creates. With these developments, we are moving one step closer to making this scenario possible.

To me, this is more evidence that with simple markup and simple link structures, we can achieve the goal of having machines know where content on the web originates. However, as with much of the web, we need to agree on those simple structures so that everyone knows how to properly give credit on the web.
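As an illustration of how such simple markup can be read by a machine, the sketch below pulls the Google News provenance tags mentioned at the start of this post out of a page. It assumes the tag names original-source and syndication-source from Google’s 2010 announcement, a name-before-content attribute order and one tag per line; the function name, sample page and URLs are invented placeholders, and a real crawler would use a proper HTML parser.

```shell
#!/bin/sh
# Hypothetical helper: print "tag-name URL" for each Google News provenance
# meta tag found in an HTML file (naive, line-based matching).
provenance_tags() {
  sed -n -E 's/.*<meta name="(original-source|syndication-source)" content="([^"]*)".*/\1 \2/p' "$1"
}

# Invented sample page that syndicates a story from another outlet.
cat > article.html <<'EOF'
<html><head>
<meta name="syndication-source" content="http://example.com/news/original-story">
<title>Syndicated story</title>
</head><body>...</body></html>
EOF

provenance_tags article.html
```

Here the output is `syndication-source http://example.com/news/original-story`, which an aggregator could use to credit the original publisher.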

Filed under: provenance markup Tagged: google news syndication tags, new york times, permalinks, provenance

The 2010 LarKC PhD symposium was held in Beijing on Nov 14th, 2010. More than 40 participants attended the symposium (most of them were also participants of the 4th LarKC Early Adopters Tutorial).

The proceedings of the symposium can be downloaded here.

The LarKC PhD symposium is an annual event organized by the Large Knowledge Collider (LarKC) Consortium. The main objective of this symposium series is to provide a communication platform for young researchers (especially PhD students) to present their recent progress in the EU FP-7 LarKC project, in Web-scale reasoning and in the Semantic Web in general.

The seminar is open and free to everyone who is interested. The 2010 LarKC PhD Symposium is the 2nd symposium in this series. The 1st symposium was held jointly with the STI PhD Seminar 2009 in Berlin. The participants of that event all agreed that they learned a lot from each other, which is one of the most important reasons why we are holding this event again this year. During year 2 of the LarKC project, the consortium made a lot of progress on Web-scale reasoning and search, ranging from new selection and reasoning strategies to real-world use cases. Much of this work comes from PhD students and young researchers in the consortium. We are very proud to have these researchers report their recent results at the 2nd LarKC PhD symposium.

In addition, we are very pleased to see external plug-in contributions from outside the LarKC consortium, in the form of close collaborations with LarKC members. The speakers for the 2nd LarKC PhD symposium are from China, Germany, Italy, the Netherlands, the UK, etc. We are pleased to have several talks covering a wide range of topics in the Semantic Web, Machine Learning, and AI in general.

The topics include, but are not limited to: Natural Language Interfaces to Ontologies, segmentation strategies for Web-scale data, Machine Learning meets the Semantic Web, selection strategies, parallel and contrastive reasoning for the Semantic Web, and Semantic Web-enabled Recommender Systems. Some of the speakers are still in their PhD programs, hence we are very pleased to have several senior members from inside and outside the LarKC consortium comment on and make suggestions for their future research in the area of Web search and reasoning. More importantly, the speakers will learn from each other through their exchanges at the symposium.

by Yi Zeng
The 4th LarKC Early Adopters Tutorial took place in Beijing, China on Nov 13th, 2010.
LarKC 4th Early Adopters Tutorial

Approximately 90 participants attended the tutorial. Two introductions, four hands-on sessions and use case demos from urban computing were given. The tutorial was bilingual (English and Chinese), with most of the talks translated in real time for the audience.

The participants agreed that LarKC is easy to use as a pluggable platform for Web-scale reasoning. The materials of the 4th LarKC Early Adopters Tutorial can be downloaded from the LarKC SourceForge site and the following addresses: