Reference managers and I have a long history. All the way back in 2004, when I was writing my first paper, my workflow went something like this:
“I need to cite Drs. A, B, and C here. Now, where did I put that paper from Dr. A?” I’d search through various folders of PDFs, organized according to a series of evolving categorization schemes, and rifle through ambiguously labeled folders in my desk drawers, pulling out things I knew I’d need handy later. If I found the exact paper I was looking for, I’d then open Reference Manager (v6, I think) and enter the citation details, each in its respective field. If it found the article, I’d then select it and add it to the group of papers I was accumulating. If it didn’t find it, I’d then go to Pubmed and search for the paper, again entering each citation detail in its field, and then do the required clicking to get the .ris file, download that, then import that into Reference Manager. Then I’d move the reference from the “imported files” library to my library, clicking away the 4 or 5 confirmation dialogs that occurred during this process. On to the next one, which I wouldn’t be able to find a copy of, and would have to search Pubmed for, whereupon I’d find more recent papers from that author, if I was searching by author, or other relevant papers from other authors, if I was searching by subject. Not wanting to cite outdated info, I’d click through from Pubmed to my school’s online catalog, re-enter the search details to find the article in my library’s system, browse through the results until I found a link to the paper online, download the PDF and .ris file (if available), or actually get off my ass and go to the library to make a copy of the paper. As I was reading the new paper from Dr. B, I’d find some interesting new assertion, follow that trail for a bit to see how good the evidence was, get distracted by a new idea relevant to an experiment I wanted to do, and emerge a couple hours later with an experiment partially planned and wanting to re-structure the outline for my introduction to incorporate the new perspective I had achieved.
Of course, I’d want to check that I wouldn’t be raising the ire of a likely reviewer of the paper by not citing the person who first came up with the idea, so I’d have some background reading to do on a couple of likely reviewers. The whole process, from the endless clicking away of confirmation prompts to the fairly specific Pubmed searches which nonetheless pulled up thousands of results, many of which I wasn’t yet aware of, made for extraordinarily slow going. It was XKCD’s Wikipedia problem writ large.
Needless to say, the more I tried to do it right, the further and further away I got from having the manuscript completed. I ended up with an enormous Reference Manager library, only some of which was relevant to the paper or the section I was currently composing in Word. At some point in my Googling for solutions to the myriad little annoyances I had during this process of knowledge (mis)management, I stumbled upon Alf Eaton’s Hubmed. Not only did it free me from the propellerhead search interface of Pubmed at that time and reduce the amount of clicking required to get a reference, but it also turned me on to his blog, to his parallel but far more sophisticated approach to the same problems I was having, to other people who were also trying to do it right, and eventually to Connotea. In short, they showed me that there was a better way. Now I could build a library, tagged by keyword, and thus download only the particular set of papers I needed at the time into a new Reference Manager library, which was still necessary for the actual writing and formatting of the bibliography in Word. Now things were looking up. I had found something that allowed me to leverage one of my strengths, lateral thinking, and still achieve the order necessary to find a specific item when I needed it.
The idea that there is a better way is an insidious one, though, and once I realized that it might actually be possible to use open source software for the whole process, I no longer had that excuse for using Reference Manager/MS Word. I learned about LaTeX, Zotero, metadata, and the semantic web. After more work requested by the reviewers, the paper eventually came out in 2006, and was submitted from my brother’s computer while I was evacuated from New Orleans for Hurricane Katrina. It was entirely written using MS Word and Reference Manager, because the journal required Word, and because Zotero couldn’t pick up cleanly from where Reference Manager had begun. The different version of Word and the lack of Reference Manager on my brother’s computer caused much difficulty in the edits needed to incorporate the additional experiments asked for by the reviewers. Knowing there was almost a better way that would actually work for me and the publishers made it all the more painful.
Subsequent papers and grant proposals were written using Connotea + Zotero, but as Zotero became more usable, Connotea became less so. I attempted to share what I was learning about these new tools for doing and communicating science with my colleagues in the lab who were having the same issues and solving them in their own ways, but the activation energy to get them going with Connotea was too high for something that was still missing that bit of UI polish a mass appeal app needs. Citeulike just had too embarrassing a name for me to recommend. 2collab was an interesting development, but wasn’t quite good enough for me to switch from Connotea. Fewer people used it and it wasn’t becoming part of the ecosystem like Connotea and Citeulike. The missing link was still integration with Word, because while Zotero’s Word integration became better, the moving of information from the online services to Zotero was fraught with difficulty. A particular problem was that although these programs had been designed with tagging specifically in mind, the data exchange formats (.ris) were from the pre-tagging era and there wasn’t agreement on which field tags should be stored in or read from. Where URLs should be stored was another issue. Connotea had an API, but none of the other citation managers used it. My efforts dealing with this can be seen in the comments below my post on Connotea. Because by this point I had drunk deeply of the Semantic Web Big Data Open Access Collaborative Filtering Kool-Aid, served in large glasses by the likes of Deepak, Peter Suber, Cameron Neylon, and JC Bradley, I was no longer content with online bookmarking of stuff; I wanted a dataset that I could do something with. I wanted recommendations and discovery and cool visualizations. I wanted serendipity.
Mendeley may not be open source, and it might not get everything just right, right now, but I think they’re heading the right way. I think Victor and Jan get it, and I asked them, subsequent to the discussion around my last post on a killer app to drive adoption of social networking among scientists, how I could help drive the adoption of their approach. I see this as taking market share away from Endnote, and promoting use of tools which help generate Big Data among people for whom social bookmarking is a total non-starter. I’m excited about the pace of developments in this space and more optimistic than ever that we’re finally starting to get some traction, moving towards a new way of working that will finally start to transform the way collaboration, research, and publishing are done for the whole field.
Please remember that this is still my blog, and my opinions are still my own. Not only that, but my opinions are attained through reading and interacting with the group of early adopters that I’ve come to know and depend on ever since that first irritation-induced foray into this space, so long ago. Then read this short and fantastic story by Hari Kunzru.