There are tons of social tools for scientists online, and their somewhat lukewarm adoption is a subject of occasional discussion on FriendFeed. The general consensus is that the online social tools which have seen explosive growth are the ones that immediately add value to an existing collection; good examples are Flickr for pictures and YouTube for video. I think there’s an opportunity to similarly add value to scientists’ existing collections of papers, without requiring any work from them to tag their collections or anything like that. The application I’m talking about is a curated discovery engine.
There are two basic ways to find information on the web – searching via search engines and discovering content via recommendation engines. Recommendation engines become increasingly important where the volume of information is high, and there are two basic types: human-curated and algorithmic. Last.fm is an example of an algorithmic recommendation system, where artists or tracks are recommended to you based on correlations in “people who like the same things as you also like this” data. Pandora.com is an example of the other kind, where human experts have scored artists and tracks on various attributes, and this data feeds an algorithm which recommends tracks that score similarly. Having used both, I find Pandora does a much better job with recommendations, and the reason is that it’s useful immediately. You can give it one song, and it will immediately use what’s known about that song to queue up similar songs, based on the experts’ back-end scoring. Even the most technology-averse person can type a song in the box and get good music played back to them, with no need to install anything.
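To make the distinction concrete, here’s a minimal sketch (in Python, with made-up songs and attribute names) of how a Pandora-style, expert-scored recommender could work: every item carries a vector of attribute scores assigned by human analysts, and recommendations are simply the items whose scores are most similar to the seed.

```python
# Minimal sketch of a Pandora-style, expert-scored recommender.
# Each item carries a vector of attribute scores assigned by human analysts;
# recommendations are just the nearest neighbours of a seed item.
# All names and scores here are hypothetical.
from math import sqrt

# Expert-assigned attribute scores (attribute -> 0..1), one dict per item.
scores = {
    "song_a": {"tempo": 0.8, "distortion": 0.9, "vocals": 0.2},
    "song_b": {"tempo": 0.7, "distortion": 0.8, "vocals": 0.3},
    "song_c": {"tempo": 0.2, "distortion": 0.1, "vocals": 0.9},
}

def cosine(a, b):
    """Cosine similarity between two sparse attribute vectors."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in keys)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def recommend(seed, n=5):
    """Return the n items most similar to the seed, based only on expert scores."""
    others = {item: cosine(scores[seed], vec) for item, vec in scores.items() if item != seed}
    return sorted(others, key=others.get, reverse=True)[:n]

print(recommend("song_a"))  # -> ['song_b', 'song_c']
```

Note there’s no “critical mass of users” anywhere in that loop – the value comes entirely from the back-end annotation, which is exactly what makes it useful to the very first person who tries it.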
Since the variable success of online social tools for scientists is largely attributed to a lack of participation, I think a great way to pull in participation from scientists would be to offer that kind of value up-front. You give it a paper or set of papers, and it tells you the ones you need to read next, or perhaps the ones you’ve missed. My crazy idea was that a recommendation system for the scientific literature, using expert-scored papers to find relevant related work, could do for papers what Flickr has done for photos. It would also be exactly the kind of thing one could do without necessarily having to hire a stable of employees. Just look at what Euan did with PLoS comments and results.
Science social bookmarking services such as Mendeley, or perhaps search engines such as NextBio, are perfectly positioned to do something like this for papers, and I think it would truly be the killer app in this space.
I agree that, in principle, journal article recommenders could be invaluable for researchers. I’ve developed a prototype on CISTI Lab (a link to the demo can be found at http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/Synthese_Recommender). Future implementations will do better than what it currently does (collaborative filtering on citations and user collections); I will be adding content-based filtering as well.
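To illustrate what I mean by collaborative filtering on citations and user collections, here is a rough sketch (the data and paper identifiers are invented for illustration): papers that co-occur in many collections or reference lists are treated as related, and the most frequent co-occurrences are served up as recommendations.

```python
# Rough sketch of item-item collaborative filtering over collections:
# papers that co-occur in many user collections (or reference lists) are
# treated as related. Data and identifiers are invented for illustration.
from collections import Counter
from itertools import combinations

# Each user's collection (or each paper's reference list) as a set of paper IDs.
collections = [
    {"paper1", "paper2", "paper3"},
    {"paper1", "paper2"},
    {"paper2", "paper4"},
]

# Count how often each pair of papers appears together.
co_occurrence = Counter()
for coll in collections:
    for a, b in combinations(sorted(coll), 2):
        co_occurrence[(a, b)] += 1

def related(paper, n=5):
    """Papers most often collected alongside the given paper."""
    counts = Counter()
    for (a, b), c in co_occurrence.items():
        if a == paper:
            counts[b] += c
        elif b == paper:
            counts[a] += c
    return [p for p, _ in counts.most_common(n)]

print(related("paper1"))  # -> ['paper2', 'paper3']
```

The weakness of this approach, of course, is the cold start: with few users or few collections, the co-occurrence counts are too sparse to say much, which is why the critical mass of users matters so much here.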
However, for recommenders to be successful in practice there needs to be a perfect storm of (a) a critical mass of users, (b) a large full-text collection from multiple publisher sources with good metadata, and (c) an “explaining” interface that allows the user to understand how a recommendation came about. The literature on recommender systems shows pretty clearly that the user’s “trust” in a recommender is critical to its long-term success (repeated use).
I think this will ultimately happen, but there’s going to be an evolution process for recommenders before they become a staple of scientific search engine portals.
All good points, Andre, and the Synthese demo looks pretty good. The one place where I think improvement would drive things forward is your first point: recommendations requiring a critical mass of users.
Pandora doesn’t require a critical mass of users, which is why I bring it up. It required a back-end annotation/analysis of the corpus from which recommendations are made. I guess this is kinda similar to MeSH, but I’ve always found MeSH to be somewhat unhelpful, perhaps because it’s more of a taxonomy than a set of attributes.
I wanted to try it on a paper I know to be fairly “nodal” in the field, but searching for the whole title didn’t bring up that paper. Maybe taking a DOI as a query would help?
Also, sorting/grouping by author, date, and times cited are important features for any system that presents a list of citations, because those are the ways you make sense of the results returned by queries of the scientific literature.
Thanks for the suggestions, William. I didn’t spend enough time on the overall DL experience because I wanted to focus on the “rate the recommender” part of the experiment. But yes, sorting by author/date/journal etc. would be nice, and DOIs would be nice too.
I should also have made it clear that the collection I’m using is rather an ad hoc list of journals that ends in 2005 and is by no means complete.
What I need are more experts like yourself to give me just this kind of feedback. Many thanks again.
Pingback: Mendeley Blog » Blog Archive » A human-scored research paper recommendation engine?
Doesn’t Faculty of 1000 already fill this role in some ways?
http://www.facultyof1000.com/
It never really achieved much traction as far as I know.
I would think that getting the “expert” reviewers that people are really willing to listen to may be the biggest non-technical stumbling block here. Those whose opinions would be most useful are the least likely to participate.
You’re right, David, that expert participation and attention is a difficult thing to get, and the real experts tend to be the busiest ones. I wasn’t thinking about reviews, though; I was thinking of something more along the lines of annotation. You don’t necessarily need the big dog in the field to score a paper along dimensions like techniques used, studied under, is a response to, etc. I was really thinking more along the lines of putting the supposed PhD excess to use. Maybe one of the science social networking companies could use analysts the way Pandora uses music analysts?
I always wonder if annotations are worth the time required, at least when compared with a good search engine. I know from my experience with Connotea and CiteULike and such things that tagging is a major pain. Each paper takes time to tag, then you add another paper with a new tag, and you realize some of the older papers could use that tag as well, so you have to go back, etc., etc. And even if you tag a paper for something, that tag might not be relevant to what you later want from the paper. I think that’s the big reason you see so much effort from Google, Microsoft and Apple on desktop search, with the idea that it’s easier to search than it is to organize.
Although I would say that social networking approaches can make searching vastly easier; GoPubMed (http://www.gopubmed.org), with its curated hierarchy, is one example.
Pingback: Bench Marks » Blog Archive » Why article tagging doesn’t work
Pingback: Science Spotlight - February 24th, 2009 | Next Generation Science
Pingback: Not such a killer, perhaps
I believe the development of this software would be a breakthrough for the scientific community, with professors, students, and researchers alike making full use of this medium for their own specific reasons. I think Andre’s beta searcher is a step in the right direction, as are Faculty of 1000 and GoPubMed.com. There needs to be collaboration and compromise to create an engine that is as powerful as Pandora or Last.fm for the scientific community.
Funny you say that, Gary, because Mendeley proposes to be exactly that, and the lead investor in last.fm is doing the same for Mendeley.
They just released a new version, so you may want to give them a try.
Pingback: William Gunn joins Mendeley as Community Liaison | Mendeley Blog
Pingback: Social Networking, Semantic Searching and Science « Dennis’ Blog
Pingback: The problem of linking and referencing | Israeli Software
I bet Google is currently working on something like this; they aren’t far off right now. They have Google Scholar, and clicking the Related Articles link seems to produce results fairly relevant to the initial document. They don’t seem to serve ads in Google Scholar, and I wonder why they don’t just serve documents on the right side of the page based on a user’s interaction with their results. They have a lot of patents related to refining results based on a series of searches.
Other applications that would help a great deal are:
– Simply rating the relationship between two documents. Most systems rate individual pieces of content, but I have yet to see anything that offers the ability to rate a relationship (see the sketch after this list).
– When it comes to tagging, Calais looks promising (http://www.opencalais.com/). It can crawl your content and programmatically tag it based on relationships found in the Calais RDF database. Drupal currently offers some plug-and-play modules that make this a lot easier (http://drupal.org/project/opencalais). Other good modules for this:
– http://drupal.org/project/calais_marmoset
– http://drupal.org/project/morelikethis
– http://drupal.org/project/topichubs
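To make the first idea above concrete, here is a toy sketch of what a rated-relationship data model could look like: instead of scoring a single document, you store a typed, scored edge between two documents. The field names are hypothetical, and a real system would persist these edges in a database rather than a list.

```python
# Toy data model for rating the *relationship* between two documents rather
# than the documents themselves. Field names are hypothetical; a real system
# would persist these edges in a database.
from dataclasses import dataclass, field

@dataclass
class RelationshipRating:
    source_doi: str   # the citing / annotated paper
    target_doi: str   # the paper it is related to
    relation: str     # e.g. "extends", "refutes", "uses method of"
    score: float      # strength of the relationship, 0..1
    rated_by: str     # user or analyst who rated it

@dataclass
class RelationshipStore:
    ratings: list = field(default_factory=list)

    def add(self, rating: RelationshipRating):
        self.ratings.append(rating)

    def related_to(self, doi: str, min_score: float = 0.5):
        """All documents linked to `doi` by a rated relationship above a threshold."""
        return [r for r in self.ratings
                if r.score >= min_score and doi in (r.source_doi, r.target_doi)]

store = RelationshipStore()
store.add(RelationshipRating("10.1000/a", "10.1000/b", "extends", 0.9, "analyst_1"))
print(store.related_to("10.1000/a"))
```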
The entire concept of decision engines is amazing!
Mr Gunn, awesome analysis! I just want to add that Facebook has recently been widely used as a collaboration tool in various communities, and I foresee its value within the science crowd. The social aspect aside, I think Facebook’s grouping functions give it an element of fun, not to mention the hundreds of apps that can be used to enhance collaboration and communication.
Thanks, Ed. I remain skeptical about the value of Facebook, as it’s a walled community, which is antithetical to collaboration and sharing. In fact, I don’t even think it would be appropriate to use publicly funded research to add value to a commercial enterprise that has shown such low regard for privacy and whose whole business model is based on selling user information. FriendFeed served this role much better, which is probably why Facebook bought and buried them.
This could indeed help scientists get relevant data faster, ranked both algorithmically and with user input. I wish I could express my thoughts more clearly, given the language barrier I have, but this post opened my eyes a bit. How about that new scientific search engine Wolfram Alpha? Would it be a good source or platform for this kind of experiment?
This doesn’t surprise me at all. There are social networks for everybody, not just ones that cater to a general audience but also ones serving specific niches like this. They don’t need to be tweaked much in order to provide the right functionality.