News and Updates on the KRR Group

Author Archives: admin

Read more about Big Data RDF Store Benchmarking Experiences at:
http://lod2.eu/BlogPost/1584-big-data-rdf-store-benchmarking-experiences.html

Benchmarking can stimulate technological progress. Check the Latest Berlin Benchmark Report for RDF & SPARQL compliant DBMS engines: http://bit.ly/Yf5etP and http://bit.ly/12UsFbu


Benchmarks

Posted by admin in ldbc - (0 Comments)

This page will provide vendors and users with reference information about the RDF and graph database benchmarks developed by LDBC once they are completed. One can track the development of the benchmarks at: http://www.ldbc.eu:8090/display/TUC/Benchmark+Task+Forces

In recent days cyberspace has seen some discussion concerning the relationship of the EU FP7 project LDBC (Linked Data Benchmark Council) and sociotechnical considerations. It has been suggested that LDBC, to its own and the community’s detriment, ignores sociotechnical aspects.

LDBC, as research projects go, actually has an unusually large and, as of this early date, successful and thriving sociotechnical aspect, i.e. involvement of users and vendors alike. I will here discuss why, insofar as the technical output of the project goes, sociotechnical metrics are in fact out of scope. Then again, the degree to which the benefits potentially obtained from the use of LDBC outcomes are in fact realized does depend strongly on community building, which is a social process.

One criticism of big data projects we sometimes encounter is the point that data without context is not useful. Further, one cannot just assume that one can throw several data sets together and get meaning from this, as there may be different semantics for similar-looking things; just think of seven different definitions of blood pressure.

In its initial user community meeting, LDBC focused, in line with its charter, mostly on cases where the data is already in existence and of sufficient quality for the application at hand.

Michael Brodie, Chief Scientist at Verizon, is a well-known advocate of focusing on the meaning of data, not only on processing performance. There is a piece on this matter by him, Peter Boncz, Chris Bizer and myself in SIGMOD Record: “The Meaningful Use of Big Data: Four Perspectives”.

I had a conversation with Michael at a DERI meeting a couple of years ago about measuring the total cost of technology adoption, thus including sociotechnical aspects such as acceptance by users, learning curves of various stakeholders, and whether in fact one could demonstrate an overall gain in productivity arising from semantic technologies. ‘Can one measure the effectiveness of different approaches to data integration?’ asked I. ‘Of course one can,’ answered Michael, ‘this only involves carrying out the same task with two different technologies, two different teams and then doing a double blind test with users. However, this never happens. Nobody does this because doing the task even once in a large organization is enormously costly and nobody will even seriously consider doubling the expense.’ [in my words, paraphrased]

LDBC does in fact intend to address technical aspects of data integration, i.e. schema conversion, entity resolution and the like. The sociotechnical aspects of integration, such as whether one should integrate in the first place, whether the integration result adds value, whether it violates privacy or security concerns, whether users will understand the result, what the learning curves are etc., are simply too diverse and too domain dependent for a general purpose metric to be developed, at least not within the time and budget constraints of the project. Further, adding a large human element to the experimental setting, e.g. how skilled the developers are, how well the stakeholders can explain their needs, how often these needs change, etc., will lead to experiments that are so expensive to carry out and whose results will have so many unquantifiable factors that these will constitute an insuperable barrier to adoption.

Experience demonstrates that even agreeing on the relative importance of quantifiable metrics of database performance is hard enough. Overreaching would compromise the project’s ability to deliver its core value. Let us next talk about this.

It is only a natural part of the political landscape that the EC’s research funding choices are criticized by some members of the public. Some criticism is about the emphasis on big data. Big data is a fact on the ground and research and industry need to deal with it. Of course there have been and will be critics of technology in general on moral or philosophical grounds. Instead of opening this topic, I will refer you to an article by Michael Brodie: http://www.michaelbrodie.com/michael_brodie_statement.asp. In a world where big data is a given, lowering the entry threshold for big data applications, thus making them available not only to government agencies and the largest businesses, seems ethical to me, as per Brodie’s checklist. LDBC will contribute to this by driving greater availability, higher performance and lower cost for these technologies.

Once we accept that big data is there and is important, we arrive at the issue of deriving actionable meaning from it. A prerequisite of deriving actionable meaning from big data is the ability to flexibly process this data. LDBC is about creating metrics for this. The prerequisites for flexibly working with  data are fairly independent of the specific use case whereas the criteria of meaning, let alone actionable analysis, are very domain specific. Therefore in order to provide the greatest service to the broadest constituency, LDBC focuses on measuring that which is most generic, yet will underlie any decision support or other data processing deployment that involves RDF or graph data.

I would say that LDBC is an exceptionally effective use of taxpayer money. LDBC will produce metrics that will drive technology innovation for years to come. The total money spent towards pursuing goals set forth by LDBC is likely to vastly exceed the budget of LDBC. Just think of the person-centuries or even millennia that have gone into optimizing for TPC-C and TPC-H. The vast majority of the money spent on these pursuits is paid by industry, not by research funding. It is spent worldwide, not in Europe alone.

Thus, if LDBC is successful, a limited amount of EC research money will influence how much larger product development budgets are spent in the future. This multiplier effect applies of course to highly successful research outcomes in general but is especially clear with LDBC.

European research funding has played a significant role in creating the foundations of the RDF/linked data scene. LDBC is a continuation of this policy; however, the focus has now shifted to reflect the greater maturity of the technology. LDBC is now about making the RDF and graph database sectors into mature industries whose products can predictably tackle the challenges out there.

Orri Erling
OpenLink Software, Inc.


LDBC project

Posted by admin in ldbc - (0 Comments)

The mission of the LDBC can be compared to that of the Transaction Processing Performance Council (TPC) that Jim Gray founded in the area of relational database technology (www.tpc.org). LDBC will create a body in which vendors of RDF and graph database systems agree on relevant benchmarks and benchmark practices, and will publish official benchmark results. The objective of the project is to highlight the functional and performance characteristics of Graph and RDF systems, vis-à-vis each other and established relational data management technology. The motivation for this is to help IT practitioners understand and select Graph and RDF data management products, and thus help make the emerging Graph and RDF data management industry more mature. Additionally, we hope that LDBC will spur competition and thereby accelerate technical progress.

In detail:

  • “agreeing on benchmark practices” means agreeing on the exact rules and metrics with which products can be compared. Without such rules, which include having benchmark results checked by independent auditors, it is very easy to skew any benchmark result in one’s favor, e.g. by precomputing (partial) answers; by implementing benchmark-special functionalities; by not being open about hot or cold runs; by comparing results on wholly different hardware (with wholly different price-tags). There are many ways in which one can game a result.
  • “agreeing on metrics” is important as, without balanced metrics, it is easy to pick the benchmark observations or statistics that favor one algorithm/system/product, conveniently forgetting about other metrics relevant for the benchmark on which its performance may be less favorable. Systems must often make trade-offs, so a win on one metric can become a loss on another; see e.g. the difference between OLTP and OLAP workloads. The metrics will include a notion of score-per-EURO (or $), taking into account hardware, software and maintenance cost aspects in the results.
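
To make the score-per-EURO idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not an LDBC-defined metric: the `BenchmarkResult` class, its field names, the two systems and all figures are hypothetical, and the cost is simply taken as the sum of hardware, software and maintenance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkResult:
    """Hypothetical record of one audited benchmark run (all names assumed)."""
    system: str
    score: float             # raw benchmark score, e.g. queries per second
    hardware_cost: float     # EUR
    software_cost: float     # EUR
    maintenance_cost: float  # EUR over the costed period

    @property
    def total_cost(self) -> float:
        # Assumption: total cost is the plain sum of the three cost aspects.
        return self.hardware_cost + self.software_cost + self.maintenance_cost

    @property
    def score_per_euro(self) -> float:
        if self.total_cost <= 0:
            raise ValueError("total cost must be positive")
        return self.score / self.total_cost


# Made-up figures, chosen only to show that the two rankings can differ.
results = [
    BenchmarkResult("system-a", 12000.0, 40000.0, 15000.0, 5000.0),
    BenchmarkResult("system-b", 9000.0, 15000.0, 10000.0, 5000.0),
]

best_raw = max(results, key=lambda r: r.score)
best_value = max(results, key=lambda r: r.score_per_euro)
print(best_raw.system, best_value.system)
```

With these invented numbers the system with the highest raw score is not the one with the best score per euro, which is the kind of trade-off a balanced set of metrics is meant to expose.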

These points underline the industrial nature of the project, since such elements are not usually present in academic benchmark work. The industry participation in LDBC includes Ontotext, OpenLink and Neo Technology (Neo4j), which are European industrial leaders in this emerging technological space. The council itself is international, so other companies will be able to join the non-profit body of LDBC as well. More than ten such companies have approached LDBC already: effectively the great majority of RDF and Graph database companies are interested. We expect the council to start growing by March 2013, when a non-profit legal entity for it will have been formed and membership will become formally possible.

The LDBC EU project also has a research participation in the form of UPC Barcelona, VUA Amsterdam, Technical University Munich, FORTH and STI Innsbruck. The research task is to kick-start the LDBC by helping in selecting/defining an initial set of benchmarks. Even though benchmarks already exist for RDF and graph databases, aspects like cost metrics, rules for running the benchmark, and benchmark audits are generally underdeveloped; so LDBC will extend existing benchmark components where possible and create new ones where necessary. The academic partners have been selected to include groups that have technical expertise in data management (e.g. RDF-3X in Munich; MonetDB and VectorWise in Amsterdam; Sparsity in Barcelona) so benchmarks will stress systems in relevant areas “where it hurts” in order to maximize the potential for progress.

In order to ensure that benchmarks represent usage scenarios that matter for technology users, LDBC has a Technical User Community (TUC). The TUC had its first meeting last week, on November 19/20 in Barcelona; it was well attended and quite productive. A digital record can be found at: ldbc.eu:8090/display/TUC/First+TUC+meeting+Nov+2012

We see it as a sign of relevance for LDBC that these users spent two days talking in depth about their technical challenges with Graph and RDF software, several of them flying in from the US at their own expense. The TUC includes participants from the publishing, life sciences, security and marketing domains. The outcomes of the first TUC meeting have been used to determine the direction in establishing the first LDBC benchmark task forces; and the TUC will remain continuously involved in providing information on relevant datasets and workloads, and feedback on benchmark specifications as they evolve.

In case this description got you interested, and specifically if you are a user of RDF, graph or relational technology, we would like to invite you to take a short survey: http://goo.gl/PwGtK

More about the project, its activities and its benchmarks will be available at: www.ldbc.eu. We are also on twitter @LDBCproject.
You can contact me via: larri “at” ac.upc.edu

Yours,
Josep Lluis Larriba Pey
LDBC coordinator


The TUC will kick off its activities on November 19/20, 2012 at the first scheduled meeting in Barcelona. An online questionnaire is available at http://goo.gl/PwGtK so that interested parties can contribute their experiences and needs for consideration in the LDBC benchmarks.

This week we received notification from the EU that the LDBC project has been granted. We think this is great news. The LDBC project is a STREP and will run until Q2 2015. LDBC stands for Linked Data Benchmark Council, and linked data here of course comprises RDF data management, but also includes the emerging class of graph database systems.

The mission of the LDBC project is to establish a long-term independent association among RDF and Graph database companies that defines benchmarks, specifies benchmarking practices and publishes officially vetted benchmark results. Beyond the project partners, many commercial vendors of RDF and Graph database systems have already expressed their interest in joining this council (once we have founded the legal entity, which will still take a few months).

The motivation behind the project is to show the strengths (and weaknesses) of RDF and Graph database technologies to the wider IT community pondering the adoption of these technologies, by enabling comparisons between the various products but also with established relational database technologies. Also, by establishing competition on these benchmarks, LDBC aims to foster technical progress in RDF and Graph database systems.

The LDBC project partners include, from the RDF database community, Ontotext and OpenLink; from the graph database side there is Neo Technology (of Neo4j fame), and Sparsity is indirectly involved through academic project partner UPC (Barcelona). Other project partners are University of Innsbruck, FORTH, VU University Amsterdam and Technical University Munich (TUM). The academic partners will help to provide the council with an initial set of benchmarks.

The technical topics of interest for benchmarking are:

  • complex analytical queries for both graph and RDF
  • graph analysis algorithms and traversals
  • large-scale reasoning on RDF data
  • transaction performance
  • systems support for data integration and provenance

The use-case scenarios for these are:

  • social networking (e.g. marketing companies)
  • dynamic publishing (e.g. BBC)
  • telecommunication network analysis
  • bioinformatics data integration (e.g. OpenPhacts)

LDBC interacts with users of Graph and RDF technologies through its Technical User Community (TUC), and the TUC is having its first users workshop in Barcelona next week, on November 19 and 20 (http://www.ldbc.eu:8090/display/TUC/First+TUC+meeting+Nov+2012), on the premises of UPC. The main reason for users to engage with the TUC is the chance to influence the benchmarking agenda of the LDBC. Talk to us, and RDF vendors might start competing in how to best solve your problems! Even if the Barcelona meeting is too short notice, please drop a note if you want to be involved in the TUC or know people who should be.

Finally, please fill in the questionnaire (http://goo.gl/PwGtK) to tell us about your usage (problems) with RDF (or graph) database technologies. We will be looking at the questionnaire results that we have received by Friday November 16 to help set the agenda in the users meeting, so if you want to contribute already this week, that would be highly appreciated.

Thanks for your time, also on behalf of the full LDBC consortium,

Peter Boncz (scientific director LDBC)
Paul Groth
Frank van Harmelen


Neo Technology and LDBC

Posted by admin in ldbc - (0 Comments)

“Graphs are everywhere. Organizations of all sizes, from large enterprise to new startups, are embracing graph databases as the fastest way to query and store graph data. The EU has recognized this, and has funded the Linked Data Benchmark Council to promote and further the research in graph databases. We are grateful to the EU for recognizing the leading role of Neo4j in graph database adoption worldwide and have accepted its invitation to join the research team, where we will be working closely with graph researchers to set the next generation of industry standards and benchmarks.”
Emil Eifrém, CEO of Neo Technology


On November 9, 2012, the EU confirmed the start of the new FP7 project called Linked Data Benchmark Council (LDBC). The main objective of LDBC is the development of benchmarks for the emerging field of RDF and graph data management systems, as well as to spur industry cooperation around such benchmarks. This new council of database software vendors and academics will establish benchmarks and publish benchmark results that will give insight into the properties of RDF and graph data management systems.
The LDBC audience includes IT professionals interested in using these emerging technologies, researchers in both the database and semantic web research communities, and data management technology vendors.
The outcomes of the LDBC project will be

  • (i) a set of benchmarks that will span four technical expertise areas: complex query execution, transactionality in graphs, RDF inference and RDF support for ETL/data integration, and
  • (ii) the creation of an industry-supported LDBC organization that will outlive the EU project, and which ultimately aims to include the entire set of RDF and graph database vendors.

The LDBC also engages users of graph and RDF data management technology in its Technical User Community (TUC), where users have the opportunity to interact with the LDBC in order to make sure their experiences and needs find their way into LDBC benchmarks. The TUC will kick off its activities on November 19/20, 2012 at the first scheduled meeting in Barcelona. Alternatively, an online questionnaire is available at http://goo.gl/PwGtK so that interested parties can contribute their experiences and needs for consideration in the LDBC benchmarks.
Please visit http://ldbc.eu/tuc to engage with its technical user community.

 


LIMES Webinar

Posted by admin in Uncategorized - (0 Comments)

On 27.03.2012, 04.00pm CET the LOD2 project (http://lod2.eu) will offer the next free one hour webinar on LIMES. LIMES is a tool providing time-efficient and lossless discovery of links across knowledge bases. LIMES is an extensible declarative framework that encapsulates manifold algorithms dedicated to the processing of structured data of any sort. Built with extensibility and easy integration in mind, LIMES allows implementing applications that integrate, consume and/or generate Linked Data. Within LOD2, it will be used for discovering links between knowledge bases.

This webinar will be presented by the LOD2 Partner: University of Leipzig (ULEI), Germany

The LOD2 webinar series is powered by the LOD2 project and is organised and produced by the Semantic Web Company (Austria). If you are interested in Linked (Open) Data principles and mechanisms, LOD tools & services and concrete use cases that can be realised using LOD, then join us in the LOD2 webinar series!

When: 27.03.2012, 04.00pm – 05.00pm CET
Information & Registration: https://www2.gotomeeting.com/register/369667514

The LOD2 team is looking forward to meeting you at the webinar! All the best and have a nice day!
