Choosing Colors for Data Visualization

March 2, 2010

This article explains that good use of color can enhance and clarify a presentation, while poor use has a negative effect. The use of color is all about function: what information are you trying to convey, and how can color enhance it? The author uses a lot of examples to help us understand the effects of colors. In this summary I’ll pick out the conclusions from those examples that I think could be important for my thesis.

One of the functions of color is to distinguish one element from another, but one should not forget that all visible parts of a presentation must be some color, and taken together they must be effective. Effective in this case means making it easy for the viewer to understand the roles of the elements and the relationships between them. To do this one can define categories of information, then group and order the information. Color can then group related items and command attention in proportion to importance.

A next step is choosing an effective set of colors. To understand this, the author explains the principles of color design. Contrasting colors are different, analogous colors are similar. Contrast draws attention, analogy groups. In color design, color is specified by three dimensions. The first, hue, is the color’s name; hues are typically drawn as a hue circle. Analogous hues are close together and contrasting hues are on opposite sides of the hue circle. Next is the value of a color, which is its perceived lightness or darkness. Contrast in value defines legibility and has a powerful effect on attention. Last is the chroma, which indicates how bright, saturated, vivid or colorful a color is. High chroma colors are vivid and bright. Using darker and grayer colors has many benefits: the result looks less garish, more sophisticated, …

The different dimensions have different applications to information display. Making related items the same color (analogous hue) is a powerful way to label and group. Hue contrast is easy to overuse to the point of visual clutter; a better approach is to use a few high chroma colors as color contrast in a presentation consisting primarily of grays and muted colors.
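
The hue circle and the idea of muting colors can be illustrated with a small sketch. This is not taken from the article: it assumes Python's standard colorsys module, uses the HLS model as a rough stand-in for hue, value and chroma, and the function names and the 30-degree offset for analogous hues are my own assumptions.

```python
import colorsys

def rotate_hue(rgb, degrees):
    """Return rgb (floats in 0..1) with its hue moved around the hue circle."""
    h, l, s = colorsys.rgb_to_hls(*rgb)
    h = (h + degrees / 360.0) % 1.0
    return colorsys.hls_to_rgb(h, l, s)

def analogous(rgb, offset=30):
    """Neighbouring hues: useful for grouping related items (offset is an assumption)."""
    return [rotate_hue(rgb, -offset), rgb, rotate_hue(rgb, offset)]

def contrasting(rgb):
    """The hue on the opposite side of the hue circle: useful for highlighting."""
    return rotate_hue(rgb, 180)

def mute(rgb, factor=0.4):
    """Lower the saturation (stand-in for chroma) for supporting information."""
    h, l, s = colorsys.rgb_to_hls(*rgb)
    return colorsys.hls_to_rgb(h, l, s * factor)
```

With these helpers, a palette could consist of one or two base colors, their muted variants for context, and a single contrasting color for the item that needs attention.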

Legibility

Legibility means being able to read, decipher and discover information, so that it can be understood. The difference in value between a symbol and its background is important for legibility. The higher the luminance contrast (difference in value), the easier it is to see the edge between one shape and another. Variation in luminance can also be used to separate overlaid values into layers, where low contrast layers can sit behind high contrast ones without causing visual clutter. A primary rule in many forms of design is “get it right in black and white”, meaning that important information should remain legible even if chroma were reduced to zero.
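
Luminance contrast can be checked numerically. The article does not prescribe a formula; the sketch below assumes the relative luminance and contrast ratio definitions from the WCAG 2 accessibility guidelines as one possible measure, with colors given as 8-bit sRGB triples.

```python
def _linear(channel):
    """Convert one 8-bit sRGB channel to linear light (WCAG 2 definition)."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(rgb):
    """Relative luminance of an (r, g, b) color: 0.0 for black, 1.0 for white."""
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(foreground, background):
    """WCAG 2 contrast ratio, from 1 (identical) to 21 (black on white)."""
    lum_a = relative_luminance(foreground)
    lum_b = relative_luminance(background)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)
```

Because the ratio depends only on luminance, it is also a quick test of the “black and white” rule: two colors with very different hues but a ratio close to 1 will be hard to tell apart once chroma is removed.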

Summarized, the previous statements tell us to “assign color according to function”:

  • use contrast to highlight
  • use analogy to group
  • control value contrast for legibility

In most design situations, the best results are achieved by limiting hue to a palette of two or three colors, and using hue and chroma variations within these hues to create distinguishably different colors. The article gives some examples that make this clearer. ColorBrewer, a website that helps choose colors for data display, is referred to by the paper. The examples in the paper always use a white background, with the contextual information in shades of gray. A general rule is to make the background white and the supporting information shades of gray; this provides the most effective foundation for your color palette.

The paper ends with a few notes on background color, noting that most color palettes are designed to be printed on white paper. White as a background color has the advantage that the human visual system is designed to adapt its color perception relative to the local definition of white. A white background gives a stable definition of white, and a stable “surface” to focus on.

Thesis

This paper helped me realize that colors are very important in making a visualization easy to understand. I already applied the contrast rule to all of the tags within my graph. I will most likely change the background of my application to white and give the supporting information appropriate colors.

Toward Measuring Visualization Insight

March 2, 2010

This paper starts by telling us that one of the purposes of visualization is gaining insight. It is hard to define insight when it comes to visualizations, so the article identifies some essential characteristics of insight. Insight is complex, deep, qualitative, unexpected and relevant. An insight is more interesting if it has more of these characteristics. Visualizations are often evaluated using controlled experiments. When benchmark tasks are used in these controlled experiments, they are not proper tools for measuring insight, because the method depends on the assumption that the benchmark tasks and metrics represent insight. According to the author, benchmark tasks have four fundamental problems when compared to the previously mentioned characteristics:

  • they must be predefined by test administrators, leaving little room for unexpected insight and even forcing users into a line of thought that they might not otherwise take.
  • they need definitive completion times
  • they must have definitive answers that measure accuracy
  • they require simple answers

This forces the experimenter towards search-like tasks that don’t represent insight well. These benchmark tasks are far too simplistic and constrained to indicate the insight a visualization provides. A claim often made to generalize the results of simple benchmark tasks is that complex tasks are built from simple tasks. The author counters this. First of all, efficiency on simple benchmark tasks is often due to specific visualization interface features that don’t generalize to more complex tasks. Second, a clear decomposition doesn’t exist yet. Another problem that often arises in the interpretation of benchmark results is the tradeoff between performance and accuracy. Often users are forced to continue until they complete a task correctly, which leads to a trial-and-error approach by users and a misrepresentation of accuracy. It is concluded that controlled experiments on benchmark tasks are not the right method to evaluate insight.

First of all the author suggests including more complex benchmark tasks. This still involves some uncertainty because these tasks generally support visualization overviews rather than detail views. Another method could be to let users interpret the visualization in a textual answer, but this is difficult to score. Allowing multiple-choice answers could again bias the user. These methods also lead to longer task times and require a larger group of participants to get statistically significant results.

A second suggestion is to eliminate benchmark tasks and let researchers observe what insights users gain on their own. One possible method is an open-ended protocol, in which users are instructed to explore the data and report their insights. A qualitative insight analysis, for example using the think-aloud protocol, is another possible solution. For each insight, a coding method quantifies various metrics (insight category, complexity, …); the categories can be assigned to common clusters like usability, … The coding converts qualitative data to quantitative data. It is still subjective, but it supports the qualitative nature of insight. The advantage of eliminating benchmark tasks is that it reveals what insights visualization users gained. The measures are closely related to the fundamental characteristics of insight mentioned previously. These insights can also be compared to the insights a researcher expected users to gain.

The author concludes by pointing out that both types of controlled experiments are needed: benchmark tasks for low-level effects, and experiments without benchmark tasks for broader insight. It is noted that when both approaches are combined in a single experiment, the benchmark tasks should not precede the open-ended portion, because this could constrain the user.

Thesis

This article helped me to understand that I need to pay more attention to the open-ended portion of the evaluation of my visualizations. I will combine both methods to gain more information. In my previous evaluations I allowed the user to explore the visualisation for only a very short time; this should be extended. I’ll also need to note what kind of insights I’d like users to gain from my visualization and compare these to the insights gained during the evaluation. In my previous evaluation I also noticed how hard it is to find a good benchmark to test the visualization. This article confirms my thoughts that these are often too simple and force the user in a certain direction. I’ll also need to pay more attention to how I formulate my questions so that I don’t bias a user.

Web API and start of the first visualisation

February 12, 2010

After a closer look at the services available to access data from social bookmarking sites (see earlier blog post), the next step was to create a Web API and to start creating a first visualisation. The last few weeks I’ve been working on both; this blog post is a short description of the work I’ve done.

Web API

The services I chose to start implementing were Delicious and Bibsonomy, after a small investigation of a number of services as explained in my presentation of the 17th of December. Delicious offers an API to access your own bookmarks but not other users’ bookmarks, so I adapted an existing web scraper in Java to retrieve bookmarks from Delicious. Bibsonomy offers a Java API, which made retrieving bookmarks from the site easy. After retrieving the bookmarks using Java, I started working on the web service. I chose REST over SOAP because development for REST is easy and the messages are easy to understand. I used Restlet, a RESTful web framework for Java, which made it easy to develop a web service. The web service will support both XML and JSON as response formats to make it more flexible to use.

The URL of the request to the web service currently looks like this:

http://domain:port/bookmarks/titlePaper
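
A client has to percent-encode the paper title before putting it in the path. The following is a minimal sketch of building such a request URL; the default host and port are hypothetical placeholders, and only the /bookmarks/ path comes from the URL above.

```python
from urllib.parse import quote

def bookmarks_url(title, host="localhost", port=8182):
    """Build the request URL for a paper title.

    host and port are placeholder assumptions; safe="" makes sure that
    slashes inside a title are encoded too and don't split the path.
    """
    return f"http://{host}:{port}/bookmarks/{quote(title, safe='')}"
```

Spaces in a title become %20, so a title with several words still forms a single path segment.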

The only argument is the title of the paper. Of course I will add other parameters like type of web service, response format, author, … I’ll also offer more requests, like a request for all tags used for a certain paper on a social bookmarking service, … The response to the request previously mentioned will look like this.

Visualisation

My first visualisation of tags from a bookmark was based upon the Cluster Map of Aduna. But since this isn’t available any more without a license, I started searching for environments offering visualisation frameworks. One framework I found was Flare. A demo of the framework showed a nice circle packing visualisation (check: demo -> layouts -> circle pack). I thought it would be easy to recreate the Cluster Map visualisation of Aduna, so I started recreating the nice edges of this visualisation. After creating my own edge object I was able to create the following type of edge:

But after trying to apply the edge to a circle packing layout I ran into several problems: I’m unable to create multiple circle packs in one visualisation, when creating multiple visualisations I’m not able to control where a circle pack shows up on the screen, … These problems made me take a step back and only use simple circles. Instead of showing circles within another circle (= circle pack), I’m going to use circle size to show the number of bookmarks with a certain tag. To get an idea of how I would want my visualisation to look I created the following mockup:

This mockup was created in Flex by manually drawing circles on the screen and using my own edge object. The grey circle in the middle shows the paper which the user starts from. Smaller coloured nodes are connected to the center circle and show the tags used to bookmark the paper. The larger circles show groups of bookmarks. These circles are connected to the tag circles; a group of bookmarks can be tagged by multiple words like Science and Social bookmarking.
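
Using circle size to show the number of bookmarks needs a mapping from a count to a radius. The sketch below is one possible mapping, not what the mockup uses: it assumes the circle's area (rather than its radius) should be proportional to the count, and the function name and pixel limits are hypothetical.

```python
import math

def circle_radius(count, max_count, max_radius=60.0, min_radius=6.0):
    """Radius in pixels for a circle representing `count` bookmarks.

    The area grows linearly with the count, so the radius grows with its
    square root. min_radius keeps small groups visible and clickable.
    """
    if max_count <= 0:
        return min_radius
    radius = max_radius * math.sqrt(count / max_count)
    return max(min_radius, radius)
```

Scaling the radius directly would make a group with twice the bookmarks look four times as large, which is why the square root is used here.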

Next week

Within the next week I’ll create my own visualisation object to build my own type of visualisation, as the ones offered by Flare don’t seem able to create the kind of visualisation I need. I will also search for other visualisation frameworks for Flex that could help me in building my visualisation. Any comments or ideas are welcome!

Thesis Presentation December 17th

December 19, 2009

I did a presentation with a status update of my work during the last month. I uploaded the slides of my presentation (these slides are in Dutch).

Visualisation Proposal: Accessing Data & Available Visualisation Frameworks

November 27, 2009

To create the visualisation I proposed I need to access a lot of external services like Delicious, Twitter, … In this post I’ll take a first look at the possibilities for accessing data from a few of these services. I’ll end with a conclusion explaining, based on the data availability, which visualisation I’d like to start prototyping.

Accessing Data

The first question I asked myself was what information I need to start searching on these services. The starting point of course is the metadata of the paper stored by the open repository. What information could be used to search on other services? There is currently no such thing as a unique identifier for a paper across all services. But there is of course still enough information to start exploring other services. When publishing a paper on Lirias a user is asked to fill in at least one of the following identifiers:

  • URI – Uniform Resource Identifier – e.g. http://www.Lirias.org/help/submit.html
  • ISSN – International Standard Serial Number – e.g. 1234-5678
  • ISBN – International Standard Book Number – e.g. 0-1234-5678-9
  • DOI – Digital Object Identifier – e.g. 10.1000/182
  • Other – A unique identifier assigned to the item using a system different from the specified ones.

Next to these identifiers the DSpace software also allows the use of Handles. Handles are a way of assigning globally unique identifiers to objects within the DSpace system. These identifiers are persistent and make it possible for users to bookmark files from Lirias. A Handle can be formatted as a URL and is thus the ideal way for a user to create a bookmark without ever having a broken link. When searching other external services for a Lirias document, these Handles will often be found as a reference.

These different identifiers of course offer more ways to search for the presence of a paper in a certain service. But they also make the process of searching a bit more complicated, e.g. different identifiers may be present on the same service or on several services. In the next paragraphs I’ll be looking at which types of identifiers the services allow searching on with their API.
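
Since each service accepts different identifiers, a first step could be to recognise which kind of identifier a paper's metadata contains. The sketch below is only an illustration: the regular expressions check the rough shape of each identifier (no checksum validation), and the function name and patterns are my own assumptions, not part of DSpace or Lirias.

```python
import re

# Shape checks only, tried in this order. ISSN comes before ISBN because
# both consist of digits and hyphens.
PATTERNS = [
    ("URI", re.compile(r"^https?://\S+$", re.IGNORECASE)),
    ("DOI", re.compile(r"^10\.\d{4,9}/\S+$")),
    ("ISSN", re.compile(r"^\d{4}-\d{3}[\dXx]$")),
    ("ISBN", re.compile(r"^(?:\d[- ]?){9}[\dXx]$|^(?:\d[- ]?){12}\d$")),
]

def identifier_type(value):
    """Return "URI", "DOI", "ISSN", "ISBN" or "Other" for an identifier string."""
    value = value.strip()
    for name, pattern in PATTERNS:
        if pattern.match(value):
            return name
    return "Other"
```

The result could then decide which service to query: a URI for services that search by URL, and an ISBN or DOI for services that only offer full text search.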

Delicious

The Delicious API is limited to performing actions on the profile of a user and doesn’t allow any global search, so it’s only usable for personal bookmarks. On the Delicious site itself it is possible to search for a URL.

Bibsonomy

The API of Bibsonomy offers no explicit way to search for ISBN, DOI, … but it does offer a full text search. The search covers all available metadata for a post (e.g. title, authors, ISBN, DOI, …) as well as associated tags. This technique might give undesirable results:

Special characters in search terms: Please note that all special (i.e. non-alphanumeric) characters occurring in search terms apart from “_” and “‘” are treated as search term separators – if you search e.g. for an ISBN like /posts?search=978-0-387-71000-6 , then also an entry with ISBN 387-0-978-71000-6 will be matched, because the number blocks are treated as distinct search terms, which is not what you want.
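
The effect described in this note can be reproduced with a toy model. The sketch below is not Bibsonomy's implementation: it only assumes that a query is split on every character that is not alphanumeric, "_" or an apostrophe, and that an entry matches when it contains all resulting terms.

```python
import re

def search_terms(text):
    """Split text into terms on every character except letters, digits, "_" and "'"."""
    return {term for term in re.split(r"[^0-9A-Za-z_']+", text) if term}

def matches(query, field):
    """True when every term of the query occurs in the field (assumed AND semantics)."""
    return search_terms(query) <= search_terms(field)
```

In this model the ISBN 978-0-387-71000-6 becomes the terms 978, 0, 387, 71000 and 6, so an entry with ISBN 387-0-978-71000-6 contains exactly the same terms and also matches. Comparing the full identifier string of each result afterwards would filter out such false positives.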

Connotea

The Connotea API allows searching for tags and for a URI/URL. This example http://www.connotea.org/data/tags/uri/http://www.google.com/ returns a list of tags for the Google site.

Other bookmarking sites

Another bookmarking tool I’ve been looking at is CiteULike; this site does not seem to have a fully developed API. I only found an API for developing plugins, which at first sight doesn’t seem to have been updated since 2006. The bookmarking service Zotero also offers an API, but it’s limited to development within Firefox and is meant for developing extensions that access Zotero.

Twitter

When searching for activity on Twitter, the biggest problem is knowing what we are looking for. The first search I thought of was the URL of a paper. Twitter has a well documented API that allows searching for a specific URL. But the 140-character limit on Twitter forces people to use URL shortening services like TinyURL. So a search for a particular URL will not return all related posts, because of the use of these URL shortening services. A solution to this problem is Backtweets, which offers a Twitter search that includes these posts and also offers an API.

Another way to find interesting tweets is by searching for a hash tag used during a conference where a paper was presented. But how can I get to know the used hash tags of conferences? This is a question that still needs to be answered. Also there might be some other tweets related to a paper scientists might be interested in that I don’t know of, I hope to discover these during the evaluation of the visualisation.

Blog posts

The hardest information to retrieve is blog posts related to a paper. There are a lot of blog services like WordPress, Blogger, … The APIs offered by these services are either limited to simple retrieval, not allowing a search for mentions of a specific URL, or they only offer an API for plugin development. Another issue is that scientists also blog on social networks like Nature Network, so blogs are widespread and just looking at one or more blog services might not give a realistic view of the blogging activity.

A possible solution to this problem might be Google Alerts, which allows you to receive an alert whenever a search term is mentioned on a blog. An issue that arises is that these alerts are only accessible via RSS feed or e-mail and there is no API. So for every URL to search for, a Google Alert needs to be created and verified, which is time-consuming. How I will retrieve blog posts that mention an article needs some more research.

Available Visualisation Frameworks

As I want to start evaluating and developing a prototype within the next weeks, I need to choose which visualisation to start with. In this blog post I’ve taken a look at what information I need to get from these external services. It seems that at this moment I’m only able to find information related to a URL with the API offered by Connotea. It is also not clear what information I’m going to show on the timeline: what tweets are useful? how do I get blog posts mentioning a URL? These are some of the questions still unanswered.

Another aspect that I might need to take into account is what kind of visualisation frameworks are already available for the development of these visualisations. The timeline visualisation could be developed using the SIMILE Timeline Widget, which only needs some data as input. For the tag and library visualisation I might be able to use the relation browser from Moritz Stefaner. Another possibility for the tag browsing would be the cluster map used in the visualisation of social bookmarks. With a lot of bookmarks the relation browser wouldn’t give a good overview, but a cluster map would.

Given the lack of information I think starting with the tag visualisation is the best way to go, because the timeline activities are still uncertain and for the library visualisation I need user data from Mendeley. It seems that there are enough frameworks to start from, although I haven’t looked at how difficult it would be to start developing with them.

If you have any suggestions on how to easily get tag and paper information, or an answer to any of my questions (where to get related blog posts, what Twitter messages are useful, …), please let me know. Comments are always welcome!

Visualisation Proposal: Scenarios & Data

November 26, 2009

After telling what I’d like to visualise I would like to explain some more about the visualisation proposal. For evaluating my visualisation I was asked to think of some scenarios users of the visualisation could perform. Also it was not clear in my previous post which data I would like to use.

Scenarios

Important in the evaluation of the visualisation are scenarios that I could ask users to perform. By performing these scenarios users can get an idea of the visualisation and give some feedback. Here are the scenarios I’ve come up with; when I think of or receive feedback on other scenarios I’ll be sure to add them.

Timeline Visualisation

  • Search for all bookmarks or tweets or other type of activity made with a service within a specific period.
  • Look for the period with the most activity on the paper.
  • Search for an activity with a specified text, tag, …
  • Retrieve the day or week of an activity.
  • Look at the detail of an activity.
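
The first two timeline scenarios come down to simple operations on a list of activities. The sketch below assumes a hypothetical Activity record with a service name, a date and a text; the real data model will depend on what the external services return.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date

@dataclass
class Activity:
    service: str  # e.g. "Delicious" or "Twitter"
    day: date     # when the bookmark, tweet, ... was made
    text: str     # title, tag or message text

def in_period(activities, start, end, service=None):
    """Activities between start and end (inclusive), optionally for one service."""
    return [a for a in activities
            if start <= a.day <= end and (service is None or a.service == service)]

def busiest_month(activities):
    """The (year, month) with the most activities, or None when there are none."""
    counts = Counter((a.day.year, a.day.month) for a in activities)
    return counts.most_common(1)[0][0] if counts else None
```

A double slider on the timeline would then only need to call in_period with the two selected dates.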

Tag Visualisation

  • Look up a paper indirectly related to the current paper.
  • Find all papers with 2 or more tags in common with the current paper.
  • Find all papers with one specific tag.
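
The scenario of finding papers with 2 or more tags in common is a set intersection. A minimal sketch, assuming the tags are available as a hypothetical mapping from a paper identifier to its set of tags:

```python
def papers_sharing_tags(current, tags_by_paper, minimum=2):
    """Map each other paper to the tags it shares with `current`.

    Only papers that share at least `minimum` tags are returned.
    """
    current_tags = tags_by_paper[current]
    result = {}
    for paper, tags in tags_by_paper.items():
        shared = current_tags & tags
        if paper != current and len(shared) >= minimum:
            result[paper] = sorted(shared)
    return result
```

Setting minimum to 1 and restricting the current paper's tags to a single tag covers the scenario of finding all papers with one specific tag.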

Mendeley Library Visualisation

  • Find the paper most related to the current paper from a library.
  • Select all papers with one or more tags from a library.
  • Find all libraries at least containing the current paper.

Data

In my previous post about the visualisation proposal I explained that I’d like to visualise tags. These tags would be retrieved from other services; a possible way to get them is through an API offered by these services. The activities from the timeline should also be gathered from other external services. I’ve been looking at some of these services to determine the feasibility of these visualisations; I will make a separate post about it. As always comments are welcome!

Thesis Presentation November 17th

November 25, 2009

Last week I did a presentation about the progress of my work, I uploaded the slides of my presentation (these slides are in Dutch).

Visualisation Proposal

November 25, 2009

In this post I’ll be explaining a proposal I’ve worked out. This proposal contains a visualisation of the online impact of a paper. I’ll start off by explaining what I’m visualising and why. Next I’ll show some drawings of the concepts of the visualisation I want to make.

Problem

The figure above shows an example of a page with information about a paper. You immediately notice that this page only contains information stored in this open repository. There is no link with other online services scientists use like Delicious, Twitter, … A user might have added the same paper to his library in Mendeley or bookmarked it using CiteULike. Showing information about the impact of the paper in other services would offer the user a better insight into the importance of or activity surrounding this paper. I’d like to create a visualisation to solve this problem, so that this visualisation could be added as a widget to an open repository like Lirias.

Requirements of the visualisation

Before creating a visualisation it should be clear what I would like to visualise so I made a list of requirements.

  • The user should be offered an overview of the impact of the publication he’s consulting. This overview should be adjustable to the user’s needs, from superficial (like the number of references or tags) to a more detailed view of the most used tags.
  • Next to impact the user should also be able to discover new trends or related papers. There are several ways to discover new interesting material like for example using tags, twitter messages, … The user should be offered the choice between these possibilities.

Visualisation

I’ve made some drawings on paper of what I’d like to visualise. The visualisation consists of 3 separate visualisations. The first visualisation is a timeline showing all the activities related to a paper. Another visualisation offers the user an overview of all the papers having one or more of the tags used with the current paper. The last visualisation shows a graph with related Mendeley libraries which a user can explore using a treemap. I’ll explain these visualisations in more depth in the next paragraphs.

Timeline

To give an overview of the impact of a paper, a user might be interested in whether there are any activities on other services related to the paper. An example could be that another user bookmarked the paper. To visualise this I used a timeline, showing a time-based overview of all the activities. As can be seen in the above figure, an activity is shown by a round icon; the icon should identify the service. It is important in this visualisation to let the user filter the shown information, which can be done in several ways. First, the user is able to select either a year or a month view of the timeline, simply by clicking a year or month at the bottom of the timeline. Another filter option is a double slider that allows selecting the time span. The user is shown information about an activity by clicking its icon or by looking at the result list beneath the timeline.

Tags

Knowing which tags people use when bookmarking a paper can help to discover new papers. This visualisation is a graph showing both papers and tags. In the middle is the paper the user starts from, surrounded by the tags used with this paper. These tags are connected to papers that have also been bookmarked with this tag. The size of the circles and tag icons shows the popularity of the tag or paper. Again the user is able to filter which tags are shown, the depth of the related papers, … Beneath this visualisation is a list of the papers shown in the visualisation. This list can be filtered, for example by clicking on a tag node so that only papers with that tag are shown.

Mendeley libraries

In this next visualisation I’m trying to take advantage of Mendeley user data. It would be interesting to know which libraries of other users are of high interest to a user. Knowing which library has similar or related content would enable a user to discover new papers. In the figure above the visualisation is a graph showing related libraries that have characteristics in common with a paper. These characteristics could be:

  • Has current paper in the library
  • Same author, co-author in library
  • The library has papers with the same tags

These are just a couple of characteristics that come to mind. Again a user will be allowed to filter on these characteristics, putting the user in control of the data. By clicking on a library icon a user can choose between a treemap or a tag-based view of the library. The treemap is based upon the characteristics mentioned above, which determine the size of the squares that represent papers. Beneath the treemap a list of the papers present in the library is shown; clicking on a square in the treemap or on an item in the list shows a detailed view of the paper. The tag-based view shows a list of tags in rectangles, whose size increases with the relevance or use of the tags. The tags can be clicked to filter the list of resulting papers beneath the tags.
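
To decide which libraries count as related, the three characteristics listed earlier could be combined into one relevance score. The sketch below is only one way to do this: the dictionary layout of a paper, the function name and the weights are all hypothetical and would need tuning against real Mendeley data.

```python
def library_score(library, paper, w_paper=3.0, w_author=1.0, w_tag=0.5):
    """Score how related a library (a list of papers) is to the current paper.

    Each paper is a dict with "id", "authors" (set) and "tags" (set).
    The weights are assumptions, not measured values.
    """
    score = 0.0
    for other in library:
        if other["id"] == paper["id"]:
            score += w_paper            # the library contains the current paper
            continue
        if other["authors"] & paper["authors"]:
            score += w_author           # same author or co-author
        score += w_tag * len(other["tags"] & paper["tags"])  # shared tags
    return score
```

Filtering on a characteristic then simply means setting the weight of the other characteristics to zero.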

Conclusion

The proposal for this visualisation is of course still in progress, meaning that I will be looking for feedback on it and still need to look at the technical possibilities of these visualisations. In the next blog posts I’ll show some more detailed screens of the visualisation and check whether it is technically possible. Any comments on this post are welcome.

Should Scientists Be Tweeting?

November 2, 2009

The article tells us there’s a growing number of science Twitterers. They consider Twitter a useful tool to share their insights about recently published papers and science presentations or discussions, as well as information about grants, careers, science policy, …

Why scientists use Twitter

A few scientists explain why they use Twitter. For scientists, Twitter is a single source where you can go to scan news and papers. Another reason mentioned is that Twitter is a source of tips on papers: often people whom a scientist follows recommend papers that the scientist hadn’t come across. This way the scientist feels more up to date on the scientific literature.
Twitter is also often used to report on interesting (or sometimes dreadful) presentations heard at a scientific conference. Because of some controversy, people must sometimes obtain permission from the presenting author to use Twitter during a presentation. The main reason behind this is that, unlike regular blogs or news articles, tweets have the potential to spread like wildfire.

Disseminating scientific information is a driving mission for many Twitter users.  Twitter gives scientists a way to communicate their work to non-scientists and allows anyone to see science in a way that is more accessible.

Twitter also offers other people a window into the life of a scientist. Scientists can write stories that educate and publicize science, and more accurately explain what scientists do to lay people. Twitter and regular blogging are an effective way of telling people about your work.

Potential

The article also explains that there is still a small number of science Twitterers compared to the number of scientists who could join. This is not just the case for Twitter but also for online social networking in general. Part of the problem is that Twitter has a reputation for being a social venue for friends to tell each other about their daily activities. But as explained above, Twitter is more than this.

Too short?

The 140-character limit forces people to be concise and creative and makes others more likely to read the messages. But it also has some limitations, like the fact that you cannot have a decent, full-blown, high-level scientific debate via Twitter messages. This might be the reason why some scientists aren’t joining Twitter; a possible alternative could be FriendFeed, which allows users to post longer messages.

Thesis

Twitter is a great service to spread new papers and I’m sure the number of scientists using it will grow. The reasons why scientists use Twitter are interesting: not only do they use Twitter to find news and scientific papers, they also use it as a window into their everyday research activities. Twitter also allows them to communicate with non-scientists. Twitter offers search, recommendation, sharing, …

In my opinion Twitter can enable people to spread the word about a paper quickly which is a great advantage compared to the publishing in journals which might only reach a limited group of people. Discussion on Twitter about a paper might be considered an indicator of a positive or negative impact.

Reference tools

October 22, 2009
Online reference tools
Researchers used to keep track of their references manually, with the arival of computers, software tools where developed for acadamic publications. Some of these tools are desktop applications but most of them are inspired by Delicious and offer online services. A good comparison can be found on Wikipedia. When looking at the tools that excist one could say there is a devision between tools that make citing papers easy (Endnot, RefWork, …) and tools that make sharing easy (CiteULike,Connotea, …). In the next few paragraphs I’ll explain some of the features offered by the 2 kind of tools. (Ref’s)
Desktop applications
Tools like EndNote allow users to manage their publications on the desktop, like mentioned before the main task of this tool is to make citing easier. So these tools don’t enable easy ways of sharing, most of the times the way to share your references with others is by exporting the database and mail it to your friend or colleague. Often these tools also allow users to manage their PDF’s of publcitations on their computer like Mendeley. Another advantage is that most of the desktop tools also have word processor integration although some online bookmarkings also offer this functionality.
Social bookmarking tools
A lot of social bookmarking tools exist nowadays, mostly inspired by Delicious. They differ from Delicious in that they have an academic audience. The tools are mostly a meld of existing reference management conventions and new social bookmarking concepts (ref’s).
By moving the reference management online, the tools are able to offer a lot of social features like:
  • commenting
  • tagging which enables discovery
  • sharing of references
  • recommending
Besides the social features, there is also the advantage of being able to access your references from everywhere. Bookmarks are added to your library with bookmarklets, which lets the user add them quickly while doing another task or some research. The social aspect is strengthened by allowing groups to be created. In this way groups of researchers can collaborate and share references, which is a lot harder with desktop applications.
privacy (private bookmarks) problem
Conclusion
It is a good thing to see so many reference tools out there. In the beginning I mentioned the division between tools that make citing easier and those that make sharing easier. This division exists because a desktop application often offers faster response, local management of files and closer integration with word processors. The last of these can easily be offered by online tools as well. A good example of a combination of a strong desktop application and an online profile is Mendeley. The web-based profile is very limited and doesn’t come close to the functionality of discovery and tagging that tools like Zotero, … offer. A social bookmarking tool can offer almost the same functionality as the desktop applications, except for managing your publications on your local disk.
In my opinion, the social features these social bookmarking tools offer make them superior to the desktop applications. I think some of these social bookmarking sites should also realise that they should offer word processor integration, to make it easier for users of desktop applications to switch.
A strong social bookmarking tool should offer:
  • word processor integration
  • import from most of the academic databases and search engines
  • a wide range of import and export file formats
BibSonomy
Zotero

Researchers used to keep track of their references manually. With the arrival of computers, software tools were developed for academic publications. Some of these tools are desktop applications, but most of them are inspired by Delicious and offer online services. A good comparison can be found on Wikipedia. Looking at the tools that exist, one could say there is a division between tools that make citing papers easy (EndNote, RefWorks, …) and tools that make sharing easy (CiteULike, Connotea, …). In the next few paragraphs I’ll explain some of the features offered by the two kinds of tools.

Desktop applications

Tools like EndNote allow users to manage their publications on the desktop. As mentioned before, the main task of these tools is to make citing easier, so they don’t offer easy ways of sharing: most of the time, the way to share your references with others is to export the database and mail it to a friend or colleague. Often these tools, like Mendeley, also allow users to manage the PDFs of their publications on their computer. Another advantage is that most desktop tools have word processor integration, although some online bookmarking tools also offer this functionality. Another way of achieving a desktop-like feeling is with a Firefox extension, as Zotero does, or an integration in a word processor.

Social bookmarking

A lot of social bookmarking tools exist nowadays, mostly inspired by Delicious. They differ from Delicious in that they have an academic audience. While Delicious deals with simple URLs, citations are a bit more complex and contain metadata like authors, journals, … The tools are mostly a meld of existing reference management conventions and new social bookmarking concepts.
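
To make the difference between a plain bookmark and a citation concrete, here is a minimal sketch in Python. The class names, field names and the find_by_tag helper are my own assumptions for illustration; they are not taken from Delicious, CiteULike or any other tool mentioned here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Bookmark:
    """A plain social bookmark: a URL plus free-form tags."""
    url: str
    tags: set[str] = field(default_factory=set)


@dataclass
class Citation(Bookmark):
    """A bookmark extended with the bibliographic metadata a citation needs."""
    title: str = ""
    authors: list[str] = field(default_factory=list)
    journal: str = ""
    year: Optional[int] = None


def find_by_tag(library, tag):
    """Return every entry in the library that carries the given tag."""
    return [entry for entry in library if tag in entry.tags]
```

Because a citation is still a bookmark with tags, the same tag lookup works for both kinds of entries; only the citation carries the extra fields a reference list needs.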

By moving the reference management online, the tools are able to offer a lot of social features like:

  • commenting
  • tagging which enables discovery
  • sharing of references
  • recommending

Besides the social features, there is also the advantage of being able to access your references from everywhere. Bookmarks are added to your library with bookmarklets, which lets the user add them quickly while doing another task or some research. The social aspect is strengthened by allowing groups to be created. In this way groups of researchers can collaborate and share references, which is a lot harder with desktop applications. The online tools also offer RSS feeds to follow certain tags, users, … As in many science tools, privacy is still an issue: some users might not want to make their bookmarks public, but most tools offer the option to keep them private.

Conclusion

It is a good thing to see so many reference tools out there. In the beginning I mentioned the division between tools that make citing easier and those that make sharing easier. This division exists because a desktop application often offers faster response, local management of files and closer integration with word processors. The last of these can easily be offered by online tools as well.

A good example of a combination of a strong desktop application and an online profile is Mendeley. The web-based profile is very limited and doesn’t come close to the functionality of discovery and tagging that tools like Zotero, … offer. A social bookmarking tool can offer almost the same functionality as the desktop applications, except for managing your publications on your local disk.

In my opinion, the social features these social bookmarking tools offer make them superior to the desktop applications. I think some of these social bookmarking sites should also realise that they should offer word processor integration, to make it easier for users of desktop applications to switch.

CiteULike offers some features that might convince people to join and so increase its popularity. It offers the possibility to view all tags related to a journal; this provides another way to discover content, and journals might be eager to link to this service from their official websites. CiteULike also has recommendations for its users and allows users to upload PDFs (and access them from anywhere). BibSonomy also has a feature to upload and share a PDF with a group. A useful feature BibSonomy offers is viewing relations between tags. Between BibSonomy and CiteULike, BibSonomy has the better import and export features (for example, it offers export in BibTeX format) and seems to be the better tool for the moment.
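
As a rough illustration of what export in BibTeX format means, the sketch below turns a handful of citation fields into one BibTeX entry. The to_bibtex function and the example key are hypothetical and written only for this post; this is not how BibSonomy implements its export, and it does not escape special characters.

```python
def to_bibtex(key, fields, entry_type="article"):
    """Format a dict of citation fields as a single BibTeX entry."""
    lines = [f"@{entry_type}{{{key},"]
    for name, value in fields.items():
        # Each field becomes one "name = {value}," line.
        lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_bibtex("doe2009", {"author": "Jane Doe",
                                "title": "An Example Paper",
                                "year": 2009}))
```

A file of such entries is what a word processor plugin or a LaTeX document can read, which is why a good export format matters for switching between tools.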

Did I miss a feature or a tool, or do you want to give feedback? Please let me know in the comments.