News and Updates on the KRR Group

Source: Think Links

I wrote a post a while back about the idea of Data DJs: how do we make it as easy to mix data as it is to mix music? This notion requires advances on several fronts, from data and knowledge integration to user interfaces, along with data provenance and semantics. Most of the research I do relates to this Data DJ idea in some form or another.

However, I always thought it would be fun to push the analogy as far as I could. Last Christmas, I got a DJ deck (specifically a Numark Stealth Control; fantastic name, right?) with the idea of actually using it to mix data sets. For a host of reasons, including time but also the lack of a clear vision of what an integration interface should look like, I never got past just toying around with it. However, over the past couple of weekends I found time to revisit it and develop a super alpha version of a data integration system using the deck. Here’s a video showing what I’ve done; read on for more details.

What really got me going was the notion that events (or who, what, when, where and why) are a perfect substrate for data integration. This is not my idea; I’ve been hearing it from a number of sources, including people in the VU’s Web and Media Group down the hall and Raphaël Troncy, and it is probably best summed up by Mor Naaman. With this as inspiration, I developed a preliminary interface for integrating and summarizing events (well, actually tweets, but hopefully this will expand to other event sources) that you saw in the video above. The components of the interface (shown in the picture below) are as follows:

  • On the top is a list of the search terms that were used to retrieve the tweets. The tweets for each search term can be hidden and unhidden.
  • On the left is a list of the users (i.e. sources) who made the tweets. Each source can be filtered in and out, which affects the term summary graph.
  • In the middle are all the tweets on the same timeline.
  • On the right is a bar graph that summarizes the most common terms across the tweets.
  • Below the bar graph is the time span of the tweets and the current time of the selected tweet.
  • On the far right are hashtags that are selected by the user.
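
To make the term summary concrete, here is a minimal Python sketch of how such a bar graph could be fed: count the most common terms across the visible tweets, skipping hidden search terms and filtered-out sources. This is an illustration only, not the code behind the interface; the tweet fields (`user`, `search_term`, `text`), the stop-word list, and the function name are all assumptions.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real interface would likely use a larger one.
STOP_WORDS = {"a", "an", "and", "at", "the", "of", "to", "in", "is", "rt"}


def term_summary(tweets, hidden_terms=(), excluded_users=(), top_n=10):
    """Return the top_n (term, count) pairs across the visible tweets.

    A tweet is visible when its search term is not hidden and its
    source (user) has not been filtered out.
    """
    counts = Counter()
    for tweet in tweets:
        if tweet["search_term"] in hidden_terms:
            continue
        if tweet["user"] in excluded_users:
            continue
        # Lowercase the text and split it into word-like tokens.
        for token in re.findall(r"\w+", tweet["text"].lower()):
            if token not in STOP_WORDS:
                counts[token] += 1
    return counts.most_common(top_n)


# Made-up example data, for illustration only.
tweets = [
    {"user": "alice", "search_term": "iswc", "text": "Linked data talk at ISWC"},
    {"user": "bob", "search_term": "iswc", "text": "Great linked data demo"},
    {"user": "carol", "search_term": "eswc", "text": "Data provenance workshop"},
]

print(term_summary(tweets, top_n=2))
print(term_summary(tweets, excluded_users={"bob"}, top_n=1))
```

Filtering a source or hiding a search term simply means recomputing the counts over the remaining tweets, which is cheap enough to do on every turn of a dial for a few hundred tweets.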

As you saw in the video, it’s pretty fast to scroll through both sources and tweets. With a quick flick it’s easy to apply a filter, and it’s pretty natural to select and deselect search terms. Furthermore, we can easily delete tweets and data sources with the push of a button. There’s still much, much more to be done to make this a viable user interface for the kind of data mixing task we want to support. But standing in front of the projector today, scrolling through tweets, eliminating sources and seeing an overview fly up convinced me that this type of interaction is really suited to the data integration task. That being said, any advice or comments on the interface would be greatly appreciated. In particular, I’d welcome suggestions for good infographics pertaining to events.

Technical Details:

The interface was implemented entirely in HTML5. In particular, I used the nice Protovis framework along with jQuery and jQuery Tools. To get fast updates from the deck, I use WebSockets. I have a small Java program that reads MIDI off the deck, acts as a WebSocket server, and pipes the MIDI signals (after translation to JSON) to the connected sockets. I’ve been using Google Chrome for development, so I don’t know how it works in other browsers. To get data, I use Twitter’s search interface and JSONP. In general, I was very impressed with what you can do in the browser. I felt like I wasn’t even pushing its capabilities, especially since I don’t do web programming every day.
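
As a rough illustration of the MIDI-to-JSON translation step, here is a minimal Python sketch (the actual program is written in Java, and its message format is not described in this post). It decodes a standard three-byte MIDI channel message into a JSON string that could then be pushed to the browser. The JSON field names and the function names are assumptions made for this example.

```python
import json

# Standard MIDI channel-message types, keyed by the high nibble of the status byte.
MESSAGE_TYPES = {
    0x80: "note_off",
    0x90: "note_on",
    0xB0: "control_change",
}


def midi_to_event(message):
    """Decode a three-byte MIDI channel message into a dictionary.

    The keys ("type", "channel", "data1", "data2") are hypothetical;
    they are not taken from the original program.
    """
    if len(message) != 3:
        raise ValueError("expected a three-byte MIDI message")
    status, data1, data2 = message
    kind = MESSAGE_TYPES.get(status & 0xF0)
    if kind is None:
        raise ValueError("unsupported MIDI status byte: %#x" % status)
    return {
        "type": kind,
        "channel": status & 0x0F,  # zero-based channel number
        "data1": data1,            # note number or controller number
        "data2": data2,            # velocity or controller value
    }


def midi_to_json(message):
    """Serialize the decoded event as the JSON text sent to the browser."""
    return json.dumps(midi_to_event(message), sort_keys=True)


# A control change on the first channel: controller 16 set to 127.
print(midi_to_json(bytes([0xB0, 0x10, 0x7F])))
```

On the browser side, each JSON message would be parsed in the WebSocket `onmessage` handler and mapped to an interface action such as scrolling or toggling a filter.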

What’s next?

Lots! This was really just a proof of concept. There are a bunch of directions to go in: improved graphics, better use of the decks, social interaction around integration (two DJs at once!), more data sources beyond Twitter, experiments on task performance, live mixing of an event… If you have any ideas, suggestions, or comments, I’d love to hear them.

How do you want to data DJ?

Filed under: data dj Tagged: data dj, decks, infographics, mixing data

Source: Think Links

As a computer scientist, I’ve always found it inspirational talking to people from other disciplines. There are always interesting problems where computational techniques could be applied, and also questions about what we would have to improve in order to use technology in these disciplines. I also know from talking to a range of people (biologists, communication scientists, etc.) that they often feel excited about the opportunity to work with cutting-edge computer science.

But even with excitement on both sides, it is hard to engage in interdisciplinary work. We are often pulled to our own communities for a variety of reasons (incentives, social structure, vocabulary…) and even when we do engage, it is often only for the length of one project. Afterwards, the collaboration dwindles.

The VU (Vrije Universiteit Amsterdam), through the Network Institute, has been putting effort into increasing and extending interdisciplinary engagement. In June, Iina Hellsten and I organized a half-day symposium to discuss collaborations between social science and computer science. It was successful in two respects:

  1. It generated excitement.
  2. It identified a set of challenges and opportunities for collaboration.
We followed up this symposium two months later (Aug. 28, 2010) with a second meeting, this time focused on turning this excitement into concrete initiatives. We had 13 participants this time, again with attendees from both computer science and social science.

The meeting started by breaking into three groups where we spent about 40 minutes generating concrete collaboration ideas in the context of the 4 challenges and 4 opportunities identified at the last meeting. We ensured that each group had members from computer science and social science. After that session each group presented their top 3 ideas. Groups were good at using the “technology”:

After this session, the group selected three areas of interest and then discussed how these could be concretely acted upon.

Here are the results:

1. Advertising collaborations

One issue that came up was the difficulty in knowing what the other discipline was doing and whether collaboration would be helpful.

  • Announcement of talks on a central site. Simply put, if the agent simulation group in CS is having a talk, perhaps the organization architectures social science group would want to know about it. We thought we could use the Network Institute LinkedIn group for this.
  • Consulting. I thought this was a fun idea… Here, one could advertise their willingness to spend half a day, one day, or two days with a person from the other discipline, advising and helping them out with no expectations on either side. For example, if a social scientist wanted help running a large-scale analysis, a computer scientist could help for a day without being expected to continue helping. Likewise, if a computer scientist wanted a social scientist to check whether their paper on analyzing Twitter was theoretically sound, the social scientist could spend half a day with them. It was proposed that the Network Institute could offer incentives for this.

2. Interdisciplinary master’s and PhD student projects
Collaborating through students can provide a way to build longer lasting collaborations.

  • One initiative would be to advertise co-supervised master’s projects, hopefully as soon as this November.
  • Since PhD students usually require funding, it was felt there needs to be more collaboration on obtaining research funding between faculties. One challenge here is knowing what calls could be targeted. To attack this problem, we thought the subsidy desk at the VU could start a special email list for interdisciplinary calls.

3. Processing large-scale data
Large-scale data (from the web or otherwise) was of interest to a big chunk of the people in the room. There was a feeling that it would be nice to know what sorts of data sets people have or what data sets they are looking for.

  • As a first step, we imagine a structured event sometime in 2011 where participants would present the data sets they have or what data sets they are looking for, and what analysis they aim to do. The aim of the event would be to try and build one-to-one connections across disciplines.

I think the group as a whole felt that these ideas could be straightforwardly put into practice and would lead to deeper and lasting collaborations between social and computer science. It would be great to hear your ideas along with comments and questions below.

Filed under: academia Tagged: collaboration, computer science, network institute, social science, vu

Source: Think Links

One of the things that I think is great about the VU (Vrije Universiteit Amsterdam), where I work, is the promotion of interdisciplinary work through organizations like the Network Institute. Computer Science is often known for interacting with biology, physics, and economics, but we are now seeing the application of computing to Social Science problems. This is great for CS because application domains often introduce new fundamental CS problems.

To talk about the overlap and potential opportunities for greater Social Science and Computer Science collaboration at the VU, Iina Hellsten (from Organization Science) and I organized a half-day symposium on Tuesday, June 29, 2010. We had a great environment for the discussion in the Intertain Lab (a space for investigating new interactive environments).

We had 17 participants, about half from the Social Sciences (covering organization science, communication science, and psychology) and half from Computer Science.

We started off with scene-setting talks from me (on the CS side) and Peter Groenewegen, and then moved to a series of shorter talks giving us a glimpse of the different focuses of some of the attendees. Even during these talks, there was clearly excitement about the possibilities for collaboration, and there were several interesting conversations about the work itself.

The last part of the symposium was a session where we identified challenges and opportunities. We ran this as a post-it note session where each participant wrote two challenges and two opportunities on post-it notes. (I got this idea from Katy Börner at her NSF Workshop on Mapping of Science and the Semantic Web. Thanks, Katy!) Amazingly, these post-it notes always cluster together. Below is an image of the results of the session:

The group identified 8 different groupings of the 60 challenges and opportunities listed by the participants. They were:

  1. How do we bridge the vocabulary gap between social science and computer science?
  2. We have the opportunity to build new applications using insights from social science.
  3. Writing new proposals and fundraising.
  4. Knowing who in the other discipline is working on a particular subject and maintaining connections between the disciplines.
  5. Being able to answer new research questions.
  6. Having an opportunity to apply research results in the “real world”.
  7. Automating parts of social science analysis (think network extraction from data sets).
  8. Overcoming the differing research styles of the two disciplines especially in terms of publication cycles.

Below we list the actual text of the post-it notes grouped into the 8 areas.

The outcome of the symposium is that now that we’ve identified clusters of challenges and opportunities, we need to focus on concrete collaborations to address these areas. We will hold another session in September to discuss concrete actions.

Overall, this event showed me that at the VU, we have both the right structures and the right people to engage in this sort of interdisciplinary research.

Results of Post-it Note Session:
Post-it content | Challenge or opportunity (c/o) | Category
More user centered/friendly systems. Not only usability, but also privacy strong communication ties | o | no category
convince peers (e.g. reviewers) | c | no category
learn to give data (LOD) the right interpretation | o | no category
use the methodological rigor (of social science?) to scope your results | o | no category
exploring/studying area for “design” of techno-social systems | o | vocab
seduce social scientists to think technical and computer scientists to think social | c | vocab
mix technical (cs) and social theories and modes to advance understanding | c | vocab
deal with some fuzziness of social science models | c | vocab
time consuming coordination or alternatively miscommunication | c | vocab
different mindsets conceptualizations | c | vocab
it is difficult to develop shared understanding of theory | c | vocab
it is difficult to find common levels of abstraction | c | vocab
integrate low level network analysis with higher level models from social sciences | c | vocab
different sorts of thinking in cs and social science | c | vocab
combining conceptual work to “bridge” the gap | c | vocab
very different outlook on research | c | vocab
speaking/interacting using the “same” vocabulary | c | vocab
finding common language between computer & social sciences | c | vocab
talk similar language | c | vocab
new applications of technology | o | new apps
teaching each other concepts/methods | o | new apps
developing new technology bundles together (e.g. pda-based surveys) | o | new apps
processing huge bulks of data | o | new apps
fundraising opportunities | o | funds
socio-technical support for agile social networks in organizations | o | funds
cross-pollination & cross-fertilization for developing meaningful insights | o | funds
keeping the connections across existing projects | c | who’s who
knowing who is doing what | c | who’s who
give overview of who is doing what in this field at the VU (via webpage?) | o | who’s who
identify the true webscience problems in the convergence of cs & ss | o | answering new questions
find relevant problems that are now solvable because of ICT solutions | o | answering new questions
generating new ideas | o | answering new questions
seeing research problems from new perspectives | o | answering new questions
provide overview of available methods, etc. | o | answering new questions
if we work together we can integrate our knowledge and get a better idea about the big picture | o | answering new questions
make technical & interpretive knowledge come together | o | answering new questions
designing studies that have a greater chance of producing real insights | o | real results
understand the social web phenomena like wikipedia, facebook (motivation/quality) | o | real results
share (experience) tools for network visualization & analysis | o | real results
linking concepts that wouldn’t have been associated earlier (underlying frames) | o | real results
applying the results of the detailed tracking of people | o | real results
ending up with a lot of manual work to compensate for technical errors | c | automated analysis
combining social networks and content networks | o | automated analysis
automating social and content analysis | o | automated analysis
losing valuable information that might be essential to understanding phenomena | c | automated analysis
automated analysis & interpretations of social phenomena | c | automated analysis
thinking that one side (your side) always does things “the right way”. | c | research styles
interests are divergent | c | research styles
research timeframes are divergent | c | research styles
cs need short-term “help” -> publication cycle | c | research styles
different scientific approaches and styles (e.g. publication) | c | research styles

Filed under: academia Tagged: computational social science, post-it notes, symposium, vu university amsterdam