Archive for the ‘Proposal’ Category

Visualisation Proposal: Accessing Data & Available Visualisation Frameworks

November 27, 2009

To create the visualisation I proposed, I need to access a number of external services such as Delicious and Twitter. In this post I’ll take a first look at the possibilities for accessing data from a few of these services. I’ll end with a conclusion explaining, based on the data that is available, which visualisation I’d like to start prototyping.

Accessing Data

The first question I asked myself was what information I need to start searching on these services. The starting point, of course, is the metadata of the paper stored by the open repository. Which parts of it can be used to search on other services? There is currently no such thing as a unique identifier for a paper that is shared by all services, but there is still enough information to start exploring. When publishing a paper on Lirias, a user is asked to fill in at least one of the following identifiers:

  • URI – Uniform Resource Identifier – e.g. http://www.Lirias.org/help/submit.html
  • ISSN – International Standard Serial Number – e.g. 1234-5678
  • ISBN – International Standard Book Number – e.g. 0-1234-5678-9
  • DOI – Digital Object Identifier – e.g. 10.1000/182
  • Other – A unique identifier assigned to the item using a system different from the specified ones.

Next to these identifiers, the DSpace software also allows the use of Handles. Handles are a way of assigning globally unique identifiers to objects within the DSpace system. These identifiers are persistent and make it possible for users to bookmark files from Lirias. A Handle can be formatted as a URL and is thus an ideal way for a user to create a bookmark that never turns into a broken link. When searching external services for a Lirias document, these Handles will often be found as the reference.
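To make the identifiers above a bit more concrete, here is a minimal sketch of how the search terms for one paper could be collected. The field names and the example record are made up for illustration and are not the actual Lirias or DSpace schema; the only outside fact used is that a Handle can be written as a URL through the public hdl.handle.net proxy.

```python
def candidate_search_keys(metadata):
    """Collect the identifiers of one paper that could be tried as search terms."""
    keys = []
    # Hypothetical field names, not the real Lirias/DSpace metadata schema.
    for field in ("uri", "issn", "isbn", "doi", "other"):
        value = metadata.get(field)
        if value:
            keys.append((field, value))
    handle = metadata.get("handle")
    if handle:
        # Write the Handle as a URL so it can be searched like any other link.
        keys.append(("handle", "http://hdl.handle.net/" + handle))
    return keys


# Made-up record: only a DOI and a Handle were filled in.
paper = {"doi": "10.1000/182", "handle": "123456789/1234"}
for kind, value in candidate_search_keys(paper):
    print(kind, value)
```

Each service would then be queried only with the kinds of identifiers it actually supports.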

These different identifiers offer more ways to search for the presence of a paper on a certain service. But they also make the search process a bit more complicated, e.g. different identifiers for the same paper may be present on the same service or spread over several services. In the next paragraphs I’ll look at which types of identifiers each service allows you to search on through its API.

Delicious

The Delicious API is limited to performing actions on the profile of a user and doesn’t allow any global search, so it is only useful for personal bookmarks. The Delicious website itself does offer the possibility to search for a URL.

Bibsonomy

The API of Bibsonomy offers no explicit way to search for ISBN, DOI, … but it does offer a full text search. The search covers all available metadata for a post (e.g. title, authors, ISBN, DOI, …) as well as associated tags. This technique might produce undesirable results:

Special characters in search terms: Please note that all special (i.e. non-alphanumeric) characters occurring in search terms apart from “_” and “‘” are treated as search term separators – if you search e.g. for an ISBN like /posts?search=978-0-387-71000-6 , then an entry with ISBN 387-0-978-71000-6 will also be matched, because the number blocks are treated as distinct search terms, which is not what you want.
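The effect described in this note can be imitated with a few lines of code. This is only a sketch of the behaviour as I read it, not Bibsonomy’s actual implementation: it assumes the separator rule is "everything except letters, digits, underscore and apostrophe", and the stricter `isbn_matches` check is my own idea for filtering the results afterwards on the client side.

```python
import re


def search_tokens(term):
    # Assumed rule: split on anything that is not a letter, digit, "_" or "'".
    return [t for t in re.split(r"[^0-9A-Za-z_']+", term) if t]


def same_tokens(a, b):
    # True when two terms consist of the same blocks, in any order.
    return sorted(search_tokens(a)) == sorted(search_tokens(b))


def normalise_isbn(isbn):
    # Keep only the digits (and a possible X check character), in order.
    return re.sub(r"[^0-9Xx]", "", isbn).upper()


def isbn_matches(wanted, found):
    # Stricter check: the digits must be identical and in the same order.
    return normalise_isbn(wanted) == normalise_isbn(found)


wanted = "978-0-387-71000-6"
other = "387-0-978-71000-6"
print(same_tokens(wanted, other))   # True: same number blocks, other order
print(isbn_matches(wanted, other))  # False: different ISBN
```

Running a check like `isbn_matches` over the returned posts would drop the false hits that the full text search lets through.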

Connotea

The Connotea API allows searching by tag and by URI/URL. For example, http://www.connotea.org/data/tags/uri/http://www.google.com/ returns a list of tags for the Google site.
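Following the pattern of the example URL above, the request for any other bookmarked address could be built as in the sketch below. The function name is my own, and I am assuming that the address is simply appended unencoded, as in the example; I have not checked whether the API expects it to be escaped.

```python
# Prefix taken from the example request above.
CONNOTEA_TAGS_BY_URI = "http://www.connotea.org/data/tags/uri/"


def connotea_tags_url(uri):
    # Assumption: the bookmarked URI is appended as-is, like in the example.
    return CONNOTEA_TAGS_BY_URI + uri


print(connotea_tags_url("http://www.google.com/"))
```

The same function could be called with the Handle URL of a Lirias paper instead of the Google address.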

Other bookmarking sites

Another bookmarking tool I’ve been looking at is CiteULike. This site does not seem to have a fully developed API: I only found an API for developing plugins, which at first sight doesn’t seem to have been updated since 2006. The bookmarking service Zotero also offers an API, but it is limited to development within Firefox and is meant for extensions that access Zotero.

Twitter

When searching for activity on Twitter, the biggest problem is knowing what to look for. The first search I thought of was the URL of a paper. Twitter has a well documented API that allows searching for a specific URL. But the 140 character limit on Twitter forces people to use URL shortening services like TinyURL, so a search for a particular URL will not return all related posts. A possible solution is Backtweets, which offers a Twitter search that includes these posts and also offers an API.
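The shortening problem can be sketched as follows: before comparing, every link in a tweet is expanded to the address it points to. The `expand` function here is a stand-in backed by a made-up lookup table; a real version would have to follow the redirect of the short link or ask a service such as Backtweets. The tweets and URLs are invented for the example, and links followed directly by punctuation are not handled.

```python
def links_in(tweet):
    # Very rough link detection: whitespace-separated words starting with http.
    return [w for w in tweet.split() if w.startswith(("http://", "https://"))]


def tweets_mentioning(paper_url, tweets, expand):
    """Return the tweets that link to paper_url once short links are expanded."""
    return [t for t in tweets
            if any(expand(link) == paper_url for link in links_in(t))]


# Made-up lookup table standing in for a real short-link resolver.
KNOWN_SHORT_LINKS = {"http://tinyurl.com/abc123": "http://example.org/paper/42"}


def expand(link):
    # Unknown links are returned unchanged.
    return KNOWN_SHORT_LINKS.get(link, link)


tweets = [
    "Nice paper on visualisation http://tinyurl.com/abc123",
    "Direct link: http://example.org/paper/42",
    "Something unrelated http://example.org/other",
]
for tweet in tweets_mentioning("http://example.org/paper/42", tweets, expand):
    print(tweet)
```

A plain search for the full URL would only find the second tweet; with expansion the first one is found as well.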

Another way to find interesting tweets is to search for the hashtag used during a conference where a paper was presented. But how can I find out which hashtags were used at which conferences? This question still needs to be answered. There might also be other tweets related to a paper that scientists are interested in and that I don’t know of; I hope to discover these during the evaluation of the visualisation.

Blog posts

The hardest information to retrieve is blog posts related to a paper. There are a lot of blog services like WordPress, Blogger, … The APIs offered by these services are either limited to simple retrieval, without a way to search for mentions of a specific URL, or they only offer an API for plugin development. Another issue is that scientists also blog on social networks like Nature Network. Blogs are so widespread that looking at just one or a few blog services might not give a realistic view of the blogging activity.

A possible solution to this problem might be Google Alerts, which lets you receive an alert whenever a search term is mentioned on a blog. The issue is that these alerts are only accessible via RSS feed or e-mail and there is no API. So for every URL I want to search for, a Google Alert needs to be created and verified, which is time-consuming. How I will retrieve blog posts that mention an article needs some more research.

Available Visualisation Frameworks

As I want to start developing and evaluating a prototype within the next weeks, I need to choose which visualisation to start with. In this blog post I’ve taken a look at what information I can get from these external services. It seems that at this moment I’m only able to find information related to a URL with the API offered by Connotea. It is also not clear what information I’m going to show on the timeline: which tweets are useful? How do I get blog posts mentioning a URL? These are some of the questions that are still unanswered.

Another aspect that I might need to take into account is which visualisation frameworks are already available for the development of these visualisations. The timeline visualisation could be developed using the SIMILE Timeline Widget, which only needs some data as input. For the tag and library visualisation I might be able to use the relation browser from Moritz Stefaner. Another possibility for the tag browsing would be the cluster map used in the visualisation of social bookmarks. With a lot of bookmarks the relation browser wouldn’t give a good overview, but a cluster map would.

Given the lack of information, I think starting with the tag visualisation is the best way to go, because the timeline activities are still uncertain and the library visualisation requires user data from Mendeley. It seems that there are enough frameworks to start from, although I haven’t yet looked at how difficult it would be to develop with them.

If you have any suggestions on how to easily get tag and paper information, or an answer to any of my questions (where to get related blog posts, which Twitter messages are useful, …), please let me know. Comments are always welcome!

Visualisation Proposal: Scenarios & Data

November 26, 2009

After telling what I’d like to visualise I would like to explain some more about the visualisation proposal. For evaluating my visualisation I was asked to think of some scenarios users of the visualisation could perform. Also it was not clear in my previous post which data I would like to use.

Scenarios

Important in the evaluation of the visualisation are scenarios that I could ask users to perform. By performing these scenarios users get an idea of the visualisation and can give feedback. Here are the scenarios I’ve come up with. When I think of other scenarios or receive feedback suggesting them, I’ll be sure to add them.

Timeline Visualisation

  • Search for all bookmarks or tweets or other type of activity made with a service within a specific period.
  • Look for the period with the most activity on the paper.
  • Search for an activity with a specified text, tag, …
  • Retrieve the day or week of an activity.
  • Look at the detail of an activity.

Tag Visualisation

  • Look up a paper indirectly related to the current paper.
  • Find all papers with 2 or more tags in common with the current paper.
  • Find all papers with one specific tag.
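The last two tag scenarios boil down to simple set operations once the tags of each paper are known. The sketch below assumes the tags have already been collected into a dictionary; the paper names and tags are invented for the example.

```python
def papers_sharing_tags(current, tags_by_paper, minimum=2):
    """Map each other paper to the tags it shares with the current paper."""
    current_tags = tags_by_paper[current]
    shared_by_paper = {}
    for paper, tags in tags_by_paper.items():
        if paper == current:
            continue
        shared = current_tags & tags  # set intersection
        if len(shared) >= minimum:
            shared_by_paper[paper] = shared
    return shared_by_paper


def papers_with_tag(tag, tags_by_paper):
    """List all papers bookmarked with one specific tag."""
    return sorted(p for p, tags in tags_by_paper.items() if tag in tags)


# Invented example data: paper name -> set of tags.
tags_by_paper = {
    "A": {"visualisation", "tags", "repository"},
    "B": {"visualisation", "tags"},
    "C": {"tags"},
    "D": {"twitter"},
}
print(papers_sharing_tags("A", tags_by_paper))
print(papers_with_tag("tags", tags_by_paper))
```

Lowering `minimum` to 1 gives all papers that share at least one tag with the current paper.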

Mendeley Library Visualisation

  • Find the paper most related to the current paper from a library.
  • Select all papers with one or more tags from a library.
  • Find all libraries that contain at least the current paper.

Data

In my previous post about the visualisation proposal I explained that I’d like to visualise tags. These tags would be retrieved from other services, possibly through the APIs these services offer. The activities on the timeline should also be gathered from external services. I’ve been looking at some of these services to determine the feasibility of these visualisations and will write a separate post about it. As always, comments are welcome!

Visualisation Proposal

November 25, 2009

In this post I’ll explain a proposal I’ve worked out: a visualisation of the online impact of a paper. I’ll start off by explaining what I’m visualising and why. Next I’ll show some drawings of the concepts of the visualisation I want to make.

Problem

The figure above shows an example of a page with information about a paper. You immediately notice that this page only contains information stored in this open repository. There is no link with other online services scientists use, like Delicious, Twitter, … A user might have added the same paper to his library in Mendeley or bookmarked it using CiteULike. Showing information about the impact of the paper on other services would give the user better insight into the importance of, or activity surrounding, this paper. I’d like to create a visualisation that solves this problem and that could be added as a widget to an open repository like Lirias.

Requirements of the visualisation

Before creating a visualisation it should be clear what I would like to visualise so I made a list of requirements.

  • The user should be offered an overview of the impact of the publication he’s consulting. This overview should be adjustable to the user’s needs, from superficial (like the number of references or tags) to a more detailed view of the most used tags.
  • Next to impact the user should also be able to discover new trends or related papers. There are several ways to discover new interesting material like for example using tags, twitter messages, … The user should be offered the choice between these possibilities.

Visualisation

I’ve made some drawings on paper of what I’d like to visualise. The result consists of 3 separate visualisations. The first visualisation is a timeline showing all the activities related to a paper. The second offers the user an overview of all the papers that have one or more of the tags used with the current paper. The last visualisation shows a graph of related Mendeley libraries, which a user can explore using a treemap. I’ll explain these visualisations in more depth in the next paragraphs.

Timeline

To give an overview of the impact of a paper, a user might be interested in whether there are any activities on other services related to the paper. An example could be that another user bookmarked the paper. To visualise this I used a timeline, showing a time based overview of all the activities. As can be seen in the above figure, an activity is shown as a round icon, and the icon should identify the service. Important in this visualisation is to let the user filter the shown information, which can be done in several ways. First, the user is able to select either the year or the month view of the timeline, simply by clicking a year or month at the bottom of the timeline. Another filter option is a double slider that allows the user to select the time span. The user is shown information about an activity by clicking its icon or by looking at the result list beneath the timeline.
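The two filters described above can be sketched on plain data. The activity records below are invented, and the dictionary layout (a date and a service per activity) is just an assumption about what the timeline would receive: `in_span` mimics the double slider and `per_month` gives the counts needed for a month view.

```python
from collections import Counter
from datetime import date


def in_span(activities, start, end):
    """Keep the activities between the two slider positions (inclusive)."""
    return [a for a in activities if start <= a["date"] <= end]


def per_month(activities):
    """Count activities per (year, month), e.g. for the month view."""
    return Counter((a["date"].year, a["date"].month) for a in activities)


# Invented activities for one paper.
activities = [
    {"date": date(2009, 10, 3), "service": "Delicious"},
    {"date": date(2009, 11, 12), "service": "Twitter"},
    {"date": date(2009, 11, 20), "service": "Connotea"},
    {"date": date(2009, 12, 1), "service": "Twitter"},
]

november = in_span(activities, date(2009, 11, 1), date(2009, 11, 30))
print([a["service"] for a in november])
print(per_month(activities).most_common(1))
```

The month with the highest count is also the answer to the question of when a paper received the most attention.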

Tags

Knowing which tags people use when bookmarking a paper can help to discover new papers. This visualisation is a graph showing both papers and tags. In the middle is the paper the user starts from, surrounded by the tags used with this paper. These tags are connected to other papers that have also been bookmarked with them. The size of the circles and tag icons shows the popularity of the tag or paper. Again the user can filter which tags are shown, the depth of the related papers, … Beneath this visualisation is a list of the papers that appear in it. This list can be filtered; for example, after clicking on a tag node only papers with that tag are shown.

Mendeley libraries

In this next visualisation I’m trying to take advantage of Mendeley user data. It would be interesting to know which libraries of other users are of high interest to a user. Knowing which library has similar or related content would enable a user to discover new papers. In the figure above the visualisation is a graph showing related libraries that have characteristics in common with a paper. These characteristics could be:

  • Has current paper in the library
  • Same author, co-author in library
  • The library has papers with the same tags

These are just a couple of characteristics that come to mind. Again, a user will be allowed to filter on these characteristics, putting the user in control of the data. By clicking on a library icon a user can choose between a treemap or a tag based view of the library. The treemap is based upon the characteristics mentioned above, and these determine the size of the squares, which represent papers. Beneath the treemap a list of the papers present in the library is shown; clicking on a square in the treemap or on an entry in the list shows a detailed view of the paper. The tag based view shows a list of tags in rectangles, which increase in size with the relevance or use of the tags. The tags can be clicked to filter the list of resulting papers beneath the tags.
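To make the three characteristics listed earlier a bit more tangible, here is a rough sketch of how a library could be checked against the current paper. The record layout (id, authors, tags) and the example data are invented, since I don’t yet know what Mendeley user data looks like.

```python
def shared_characteristics(paper, library):
    """Return which of the three characteristics a library has for a paper."""
    found = set()
    # Compare with the other papers only, so the paper does not match itself.
    others = [p for p in library if p["id"] != paper["id"]]
    if len(others) < len(library):
        found.add("has current paper")
    if any(set(paper["authors"]) & set(p["authors"]) for p in others):
        found.add("same author")
    if any(set(paper["tags"]) & set(p["tags"]) for p in others):
        found.add("same tags")
    return found


# Invented example data.
current = {"id": 1, "authors": ["Smith"], "tags": ["visualisation"]}
library = [
    {"id": 1, "authors": ["Smith"], "tags": ["visualisation"]},
    {"id": 2, "authors": ["Smith", "Jones"], "tags": ["timeline"]},
    {"id": 3, "authors": ["Brown"], "tags": ["treemap"]},
]
print(sorted(shared_characteristics(current, library)))
```

Filtering on a characteristic then means keeping only the libraries whose result contains it.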

Conclusion

This proposal is of course still in progress, meaning that I will be looking for feedback on it and still need to look at the technical possibilities for these visualisations. In the next blog posts I’ll show some more detailed screens of the visualisation and check whether it is technically possible. Any comments on this post are welcome.