News and Updates on the KRR Group

The KR&R group investigates modelling and representation of different forms of knowledge and reasoning, as found in a large variety of AI systems. We have an interest in both applications and theory. We study theoretical properties of knowledge representation and reasoning formalisms, but are also involved in developing practical knowledge-based systems. Recently, we have been very active in developments around the Semantic Web.

Posts on this website are continuously aggregated from project and member blogs.

Source: Semantic Web world for you
The WordPress.com stats helper monkeys prepared a 2011 annual report for this blog. Here’s an excerpt: A San Francisco cable car holds 60 people. This blog was viewed about 2,800 times in 2011. If it were a cable car, it would take about 47 trips to carry that many people. Click here to see the [...]

Source: Think Links

The VU University Amsterdam computer science department has been a pioneer in putting structured data and the Semantic Web into the undergraduate curriculum through our Web-based Knowledge Representation course. I’ve had the pleasure of teaching the class for the past 3 years. The class runs in a short block of 8 weeks (7 weeks if you give the students a week for exams). It’s a fairly complicated class for second-year undergraduates, but each year the technology becomes easier to use, which makes it easier for the students to ground the concepts of KR and Web-based data in applications.

The class involves 6 lectures covering the major ground of Semantic Web technologies and KR. We then give them 3 1/2 weeks to design and hopefully build a Semantic Web application in pairs. During this time we give one-on-one support through appointments. For most students, this is the first time they’ve come into contact with Semantic Web technologies.

This year they built applications based on The Times Higher Education 2011 World University rankings. They converted databases to RDF, developed their own ontologies, integrated data from the linked data cloud and visualized data using SPARQL. I was impressed with all the work they did and I wanted to share some of their projects. Here are four screencasts from the applications the students built.

Points of Interest Around Universities

Guess Which University

Find Universities by Location

SPARQL Query Builder for University Info



Filed under: academia, linked data Tagged: education, linked data, semantic web, student, vu university amsterdam, web-based knowledge representation

Source: Think Links

It’s nice to see where I work (VU University Amsterdam) putting out some nifty promotional videos on YouTube. Here are two from the Computer Science department and the Network Institute, both of which I’m happy to be part of.

Filed under: academia Tagged: computer science, network institute, vu university amsterdam

Source: Semantic Web world for you

I’m currently spending some time at Yahoo labs in Barcelona to work with Peter Mika and his team on data analysis. Last week, I was invited to give a seminar on how we perform network-based analysis of Linked Data at the VU. The slides are embedded at the end of this post.

Essentially, we observe that focusing only on the triples (cf., for instance, a BTC snapshot) is not enough to explain some of the patterns observed in the Linked Data ecosystem. In order to understand what’s really going on, one has to take into account the data, its publishers/consumers and the machines that serve it. Time also plays an important role and shouldn’t be neglected. This brings us to studying this ecosystem as a Complex System, and that’s one of the things keeping Paul, Frank, Stefan, Shenghui and myself busy these days ;-)

Exploring Linked Data content through network analysis

Source: Think Links

This past Tuesday, I had the opportunity to give a webinar for Elsevier Labs giving an overview of altmetrics. It was a fun opportunity to talk to people who have a great chance to influence the next generation of academic measurement. The slides are embedded below.

At the VU, we are also working with Elsevier Labs on the Data2Semantics project, where we are trying to enrich data with additional machine-understandable metadata. How does this relate to metrics? I believe that metrics (access, usage, etc.) can be a key piece of additional semantics for datasets. I’m keen to see how metrics can make our data more useful, findable and understandable.

Filed under: altmetrics Tagged: #altmetrics, data2semantics, presentation


We have opened up a Data2Semantics GitHub organisation for publishing all (open source) code produced within the Data2Semantics project. Point your browser (or Git client) to http://github.com/Data2Semantics for the latest and greatest!


The COMMIT programme was officially kicked off by Maxime Verhagen, minister of Economic Affairs, Agriculture and Innovation, at the ICTDelta 2011 event held at the World Forum in The Hague on November 16.

Throughout the day, members of the Data2Semantics project manned a very busy stand in the foyer, featuring prior and current work by the project partners such as the AIDA toolkit, OpenPHACTS, LarKC and the MetaLex Document Server.


Source: Semantic Web world for you

Scaling is often a central question for data-intensive projects, whether they make use of Semantic Web technologies or not, and SemanticXO is no exception. The triple store is used as a back end for the Journal of Sugar, which is a central component recording the usage of the different activities. This short post discusses the results found for two questions: “how many journal entries can the triple store sustain?” and “how much disk space is used to store the journal entries?”

Answering these questions means loading some Journal entries and measuring the read and write performance along with the disk space used. This is done by a script which randomly generates Journal entries and inserts them into the store. A text sampler and the real names of activities are used to make these entries realistic in terms of size. An example of such a generated entry, serialised in HTML, can be seen there. The following graphs show the results obtained for inserting 2000 journal entries. These figures have been averaged over 10 runs, each of them starting with a freshly created store. The triple store used is called “RedStore”; it is run with a hash-based Berkeley DB backend. The test machine is an XO-1 running software release 11.2.0.
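The measurement loop described above can be sketched roughly as follows. This is a minimal illustration and not the actual performance script: the `ListStore` class is a hypothetical in-memory stand-in for the triple store, and the word list, activity names and entry layout are made up for the example. Only the overall shape (generate a random entry, time the insert, time a read, start each run from a fresh store) follows the description in the post.

```python
import random
import time

# Illustrative vocabulary; the real script uses a text sampler and real activity names.
WORDS = "lorem ipsum dolor sit amet consectetur adipiscing elit".split()
ACTIVITIES = ["Write", "Browse", "Paint", "Calculate"]


def make_entry(rng, entry_id):
    """Build one fake journal entry as a list of (subject, predicate, object) triples."""
    subject = f"entry:{entry_id}"
    text = " ".join(rng.choice(WORDS) for _ in range(rng.randint(5, 50)))
    return [
        (subject, "activity", rng.choice(ACTIVITIES)),
        (subject, "title", f"Entry {entry_id}"),
        (subject, "text", text),
    ]


class ListStore:
    """Hypothetical stand-in for a triple store: append-only list, linear-scan reads."""

    def __init__(self):
        self.triples = []

    def insert(self, triples):
        self.triples.extend(triples)

    def get(self, subject):
        return [t for t in self.triples if t[0] == subject]


def benchmark(n_entries, seed=0):
    """Insert n_entries into a fresh store, timing each write and one random read."""
    rng = random.Random(seed)
    store = ListStore()  # every run starts from an empty store
    rows = []
    for i in range(n_entries):
        entry = make_entry(rng, i)

        t0 = time.perf_counter()
        store.insert(entry)
        write_s = time.perf_counter() - t0

        # Read back one entry chosen at random among those stored so far.
        target = f"entry:{rng.randint(0, i)}"
        t0 = time.perf_counter()
        found = store.get(target)
        read_s = time.perf_counter() - t0

        rows.append((i + 1, write_s, read_s, len(found)))
    return rows
```

Calling `benchmark(2000)` ten times with different seeds and averaging the rows per entry count would mirror the averaging over 10 runs mentioned above. Timings from this stand-in say nothing about RedStore itself; swapping `ListStore` for a client of the real store is where the actual measurement would happen.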

The disk space used is minimal for up to 30 entries, grows rapidly between 30 and 70 entries and continues to grow linearly from that number on. The maximum space occupied is a bit less than 100MB, which is a small fraction of the 1GB of storage of the XO-1.

Amount of disk space used by the triple store

The results for the read and write delays are somewhat less good news. Write operations are constant in time and always take around 0.1 s. Getting an entry from the triple store, however, gets linearly slower as the triple store fills up. For up to 600 entries, the retrieval time of an entry stays below a second, which should provide a reasonable response time. However, with 2000 entries stored the retrieval time goes as high as 7 seconds :-(

Read and write access time

The answer to the question we started with (“Does it scale?”) is then “yes, for up to 600 entries”, considering a first-generation device and the current status of the software components (SemanticXO/Redstore/…). This answer also yields new questions, among which: Are 600 entries enough for typical usage of the XO? Is it possible to improve the software to get better results? How do the results look on more recent hardware?

I would appreciate a bit of help in answering all of these, and especially the last one. I only have an XO-1 and thus cannot run my script on an XO-1.5 or XO-1.75. If you have such a device and are willing to help me get the results, please download the package containing the performance script and the triple store and follow the instructions for running it. After a day of execution or so, this script will generate three CSV files that I can then postprocess to get curves similar to the ones shown.


Source: Think Links

The Journal of Web Semantics recently published a special issue on Using Provenance in the Semantic Web, edited by Yolanda Gil and me (Vol 9, No 2 (2011)). All articles are available on the journal’s preprint server.

The issue highlights top research at the intersection of provenance and the Semantic Web. The papers addressed a range of topics including:

  • tracking provenance of DBpedia back to the underlying Wikipedia edits [Orlandi & Passant];
  • how to enable reproducibility using Semantic techniques [Moreau];
  • how to use provenance to effectively reason over large amounts (1 billion triples) of messy data [Bonatti et al.]; and
  • how to begin to capture semantically the intent of scientists [Pignotti et al.].
Our editorial highlights a common thread between the papers and sums them up as follows:

A common thread through these papers is the use of already existing provenance ontologies. As the community comes to an increasing agreement on the commonalities of provenance representations through efforts such as the W3C Provenance Working Group, this will further enable new research on the use of provenance. This continues the fruitful interaction between standardization and research that is one of the hallmarks of the Semantic Web.

Overall, this set of papers demonstrates the latest approaches to enabling a Web that provides rich descriptions of how, when, where and why Web resources are produced and shows the sorts of reasoning and applications that these provenance descriptions make possible.

Finally, it’s important to note that this issue wouldn’t have been possible without the quick and competent reviews done by the anonymous reviewers. This is my public thank you to them.

I hope you take the chance to look at this interesting work.

Filed under: academia, linked data Tagged: journal, linked data, provenance, semantic web

The Botari application from the LarKC project has won the Open Track of the Semantic Web Challenge.

Botari is a LarKC workflow running on servers in Seoul, plus a user frontend that runs on a Galaxy Tab.

The workflow combines open data from the city of Seoul (Open Street Map, POIs) with Twitter traffic, and uses stream processing, machine learning and querying over RDF datasets and streams to give personalised restaurant information and recommendations, presented in an augmented reality interface on the Galaxy Tab.

For more info on Botari, see the website, the demo movie, the slide deck or the paper.
