News and Updates on the KRR Group

Source: Semantic Web world for you
Via ICT 4 Development course final presentations. Filed under: Linked Open Data, SemanticXO

Source: Semantic Web world for you
A few days ago I had the pleasure, and the good fortune, of taking part in the webinar series organized by AIMS. The goal I had set for my presentation (in French), titled “Clarifier le sens de vos données publiques avec le Web de données” (“Clarifying the meaning of your public data with the Web of Data”), was to demonstrate the advantage of using the Web of Data [...]

Source: Think Links

Below is a post-it note summary made with our students in the Web Science course. This is the capstone class for students doing the Web Science minor here at the VU, and the summary highlights the topics they’ve learned about so far in four other courses.

webscience-summary

Filed under: academia Tagged: summary, web science

Source: Think Links

The WordPress.com stats helper monkeys prepared a 2012 annual report for this blog.

Here’s an excerpt:

600 people reached the top of Mt. Everest in 2012. This blog got about 4,900 views in 2012. If every person who reached the top of Mt. Everest viewed this blog, it would have taken 8 years to get that many views.

Click here to see the complete report.

Filed under: Uncategorized

Last Wednesday, Frank van Harmelen appeared on the Dutch science TV program “Labyrint”, where he interviewed George Dyson, Luc Steels and François Pachet about their ideas on the future of computers.

The program can be watched online (in Dutch):

And here’s the discussion session afterwards (in Dutch):

More information at the website of Labyrint.

Source: Data2Semantics

A few months ago Laurens Rietveld was looking for a query interface from which he could easily query any other SPARQL endpoint.

But he couldn’t find any that fit his requirements:

So he decided to make his own!

Give it a try at: http://aers.data2semantics.org/yasgui/

Future work (next year probably):

Comments are appreciated (including feature ideas / bug reports).

Sources are available at https://github.com/LaurensRietveld/yasgui
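
For readers unfamiliar with what "querying a SPARQL endpoint" involves at the HTTP level, here is a minimal sketch. It is not taken from the YASGUI sources. It assumes an endpoint that accepts SPARQL Protocol GET requests with a `query` parameter, and the endpoint URL `http://example.org/sparql` and the helper name `build_sparql_request` are hypothetical. The request is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_sparql_request(endpoint_url, query):
    """Build (but do not send) a SPARQL Protocol GET request.

    Assumes endpoint_url has no query string of its own.
    """
    url = endpoint_url + "?" + urlencode({"query": query})
    # Ask for SPARQL JSON results; an endpoint may support other formats too.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


# Hypothetical endpoint URL, used only for illustration.
request = build_sparql_request(
    "http://example.org/sparql",
    "SELECT * WHERE { ?s ?p ?o } LIMIT 10",
)
print(request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send the query. A graphical client adds conveniences such as editing and result rendering on top of this kind of request.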


In recent days cyberspace has seen some discussion concerning the relationship of the EU FP7 project LDBC (Linked Data Benchmark Council) and sociotechnical considerations. It has been suggested that LDBC, to its own and the community’s detriment, ignores sociotechnical aspects.

LDBC, as research projects go, actually has an unusually large and, as of this early date, successful and thriving sociotechnical aspect, i.e. involvement of users and vendors alike. I will discuss here why, insofar as the technical output of the project goes, sociotechnical metrics are in fact out of scope. Then again, the degree to which the benefits potentially obtainable from LDBC outcomes are in fact realized does depend strongly on community building, which is a social process.

One criticism of big data projects we sometimes encounter is that data without context is not useful. Further, one cannot simply throw several data sets together and expect to get meaning from the result, as similar-looking things may have different semantics; just think of seven different definitions of blood pressure.

LDBC, in its initial user community meeting, focused, in accordance with its charter, mostly on cases where the data already exists and is of sufficient quality for the application at hand.

Michael Brodie, Chief Scientist at Verizon, is a well-known advocate of focusing on the meaning of data, not only on processing performance. There is a piece on this matter by him, Peter Boncz, Chris Bizer and myself in the SIGMOD Record: “The Meaningful Use of Big Data: Four Perspectives”.

I had a conversation with Michael at a DERI meeting a couple of years ago about measuring the total cost of technology adoption, including sociotechnical aspects such as acceptance by users, the learning curves of various stakeholders, and whether one could in fact demonstrate an overall gain in productivity arising from semantic technologies. ‘Can one measure the effectiveness of different approaches to data integration?’ asked I. ‘Of course one can,’ answered Michael, ‘this only involves carrying out the same task with two different technologies and two different teams, and then doing a double-blind test with users. However, this never happens. Nobody does this because doing the task even once in a large organization is enormously costly and nobody will even seriously consider doubling the expense.’ [in my words, paraphrased]

LDBC does in fact intend to address technical aspects of data integration, i.e. schema conversion, entity resolution and the like. The sociotechnical aspects of integration, such as whether one should integrate in the first place, whether the integration result adds value, whether it violates privacy or security concerns, whether users will understand the result, what the learning curves are, etc., are simply too diverse and too domain dependent for a general-purpose metric to be developed, at least not within the time and budget constraints of the project. Further, adding a large human element to the experimental setting, e.g. how skilled the developers are, how well the stakeholders can explain their needs, how often these needs change, etc., will lead to experiments that are so expensive to carry out, and whose results will have so many unquantifiable factors, that these will constitute an insuperable barrier to adoption.

Experience demonstrates that even agreeing on the relative importance of quantifiable metrics of database performance is hard enough. Overreaching would compromise the project’s ability to deliver its core value. Let us next talk about this.

It is only a natural part of the political landscape that the EC’s research funding choices are criticized by some members of the public. Some criticism is about the emphasis on big data. Big data is a fact on the ground, and research and industry need to deal with it. Of course there have been and will be critics of technology in general on moral or philosophical grounds. Instead of opening this topic, I will refer you to an article by Michael Brodie: http://www.michaelbrodie.com/michael_brodie_statement.asp. In a world where big data is a given, lowering the entry threshold for big data applications, thus making them available not only to government agencies and the largest businesses, seems ethical to me, as per Brodie’s checklist. LDBC will contribute to this by driving greater availability, higher performance and lower cost for these technologies.

Once we accept that big data is there and is important, we arrive at the issue of deriving actionable meaning from it. A prerequisite of deriving actionable meaning from big data is the ability to flexibly process this data. LDBC is about creating metrics for this. The prerequisites for flexibly working with data are fairly independent of the specific use case, whereas the criteria of meaning, let alone actionable analysis, are very domain specific. Therefore, in order to provide the greatest service to the broadest constituency, LDBC focuses on measuring that which is most generic, yet will underlie any decision support or other data processing deployment that involves RDF or graph data.

I would say that LDBC is an exceptionally effective use of taxpayer money. LDBC will produce metrics that will drive technology innovation for years to come. The total money spent towards pursuing goals set forth by LDBC is likely to vastly exceed the budget of LDBC. Just think of the person-centuries or even millennia that have gone into optimizing for TPC-C and TPC-H. The vast majority of the money spent on these pursuits is paid by industry, not by research funding. It is spent worldwide, not in Europe alone.

Thus, if LDBC is successful, a limited amount of EC research money will influence how much larger product development budgets are spent in the future. This multiplier effect applies of course to highly successful research outcomes in general but is especially clear with LDBC.

European research funding has played a significant role in creating the foundations of the RDF/linked data scene. LDBC is a continuation of this policy; however, the focus has now shifted to reflect the greater maturity of the technology. LDBC is now about making the RDF and graph database sectors into mature industries whose products can predictably tackle the challenges out there.

Orri Erling
OpenLink Software, Inc.

Tags: 

LDBC project

Posted by admin in ldbc - (0 Comments)

The mission of the LDBC can be compared to that of the Transaction Processing Performance Council (TPC) that Jim Gray founded in the area of relational database technology (www.tpc.org). LDBC will create a body in which vendors of RDF and graph database systems agree on relevant benchmarks and benchmark practices, and which will publish official benchmark results. The objective of the project is to highlight the functional and performance characteristics of Graph and RDF systems, vis-à-vis each other and established relational data management technology. The motivation for this is to help IT practitioners understand and select Graph and RDF data management products, and thus help make the emerging Graph and RDF data management industry more mature. Additionally, we hope that LDBC will spur competition and thereby accelerate technical progress.

In detail:

  • “agreeing on benchmark practices” means agreeing on the exact rules and metrics with which products can be compared. Without such rules, which include having benchmark results checked by independent auditors, it is very easy to skew any benchmark result in one’s favor, e.g. by precomputing (partial) answers; by implementing benchmark-special functionalities; by not being open about hot or cold runs; or by comparing results on wholly different hardware (with wholly different price tags). There are many ways in which one can game a result.
  • “agreeing on metrics” is important as, without balanced metrics, it is easy to pick the benchmark observations or statistics that favor one algorithm/system/product, conveniently forgetting about other metrics relevant for the benchmark on which the performance may be less favorable. Systems often must make trade-offs, so a win on one metric can become a loss on another; see e.g. the difference between OLTP and OLAP workloads. This will include a notion of score-per-EURO (or $), taking into account hardware+software+maintenance cost aspects in the results.
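
To make the score-per-EURO idea concrete, here is a minimal sketch. It is not an LDBC metric definition: the class `BenchmarkResult`, the choice of queries per second as the score, the normalization per 1,000 EUR, and all figures are hypothetical and invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkResult:
    system: str
    throughput_qps: float        # hypothetical score: queries per second
    hardware_cost_eur: float
    software_cost_eur: float
    maintenance_cost_eur: float

    def total_cost_eur(self):
        # Hardware + software + maintenance, as named in the text above.
        return (self.hardware_cost_eur
                + self.software_cost_eur
                + self.maintenance_cost_eur)

    def score_per_keur(self):
        """Throughput per 1,000 EUR of total cost."""
        return self.throughput_qps / (self.total_cost_eur() / 1000.0)


# Invented figures, for illustration only.
results = [
    BenchmarkResult("system_a", 1200.0, 40000.0, 20000.0, 10000.0),
    BenchmarkResult("system_b", 800.0, 10000.0, 5000.0, 5000.0),
]

# Rank by cost-normalized score rather than by raw throughput.
for r in sorted(results, key=BenchmarkResult.score_per_keur, reverse=True):
    print(f"{r.system}: {r.throughput_qps:.0f} qps, "
          f"{r.score_per_keur():.1f} qps per 1000 EUR")
```

With these made-up numbers, system_a wins on raw throughput while system_b wins on the cost-normalized score, which is the kind of trade-off a balanced set of metrics is meant to expose.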

These points underline the industrial nature of the project, since such elements are not usually present in academic benchmark work. The industry participants in LDBC include Ontotext, OpenLink and Neo Technology (Neo4j), which are European industrial leaders in this emerging technological space. The council itself is international, so other companies will be able to join the non-profit body of LDBC as well. More than ten such companies have approached LDBC already: effectively, the great majority of RDF and Graph database companies are interested. We expect the council to start growing by March 2013, when a non-profit legal entity for it will have been formed and membership will become formally possible.

The LDBC EU project also has research participation in the form of UPC Barcelona, VUA Amsterdam, Technical University Munich, FORTH and STI Innsbruck. The research task is to kick-start the LDBC by helping to select and define an initial set of benchmarks. Even though benchmarks already exist for RDF and graph databases, aspects like cost metrics, rules for running the benchmark, and benchmark audits are generally underdeveloped, so LDBC will extend existing benchmark components where possible and create new ones where necessary. The academic partners have been selected to include groups that have technical expertise in data management (e.g. RDF-3X in Munich; MonetDB and VectorWise in Amsterdam; Sparsity in Barcelona), so benchmarks will stress systems in relevant areas “where it hurts” in order to maximize the potential for progress.

In order to ensure that benchmarks represent usage scenarios that matter for technology users, LDBC has a Technical User Community (TUC). The TUC had its first meeting last week, on November 19 and 20 in Barcelona; it was well attended and quite productive. A digital record can be found at: ldbc.eu:8090/display/TUC/First+TUC+meeting+Nov+2012

We see it as a sign of relevance for LDBC that these users spent two days talking in depth about their technical challenges with Graph and RDF software, several of them flying in from the US at their own cost. The TUC includes participants from the publishing, life sciences, security and marketing domains. The outcomes of the first TUC meeting have been used to determine the direction in establishing the first LDBC benchmark task forces, and the TUC will remain continuously involved in providing information on relevant datasets and workloads, and feedback on benchmark specifications as they evolve.

In case this description got you interested, and specifically if you are a user of RDF, graph or relational technology, we would like to invite you to take a short survey: http://goo.gl/PwGtK

More about the project, its activities and its benchmarks will be available at www.ldbc.eu. We are also on Twitter as @LDBCproject.
You can contact me via: larri “at” ac.upc.edu

Yours,
Josep Lluis Larriba Pey
LDBC coordinator
