Citing journal articles in blog posts and blog posts in journal articles

I’ve written before about what seems to be the most persistent and error-proof way to handle citing journal articles in blog posts and blog posts in journal articles (1,2), because it seems like some people have gone to quite extensive efforts to address this problem, apparently without looking to see whether someone else had already gotten started on a solution. I’m glad to see that people are starting to talk to one another about how to handle things, as opposed to each creating their own version of the wheel.

Recent developments:

  • The people at WebCite are talking to CrossRef:
    But what if we provided a different service for more informal content? Recently we have been talking with Gunther Eysenbach, the creator of the very cool WebCite service, about whether CrossRef could/should operate a citation caching service for ephemera.

    As I said, I think WebCite is wonderful, but I do see a few problems with it in its current incarnation.

    The first is that, the way it works now, it seems to effectively leech usage statistics away from the source of the content. If I have a blog entry that gets cited frequently, I certainly don’t want all the links (and their associated Google-juice) redirected away from my blog. As long as my blog is working, I want traffic coming to my copy of the content, not some cached copy of the content (gee, the same problem publishers face, no?). I would also, ideally, like that traffic to continue to come to my blog if I move hosting providers, platforms (WordPress, Movable Type), blog conglomerates (Gawker, Weblogs, Inc.), etc.

    The second issue I have with WebCite is simpler. I don’t really fancy having to actually recreate and run a web-caching infrastructure when there is already a formidable one in existence.

    The people at CrossRef know about Purl.org and Archive.org, and they share my rather dim opinion of the NLM’s recommendations for citing websites. However, the people at WebCite.org apparently didn’t know that you can deposit things into Archive.org upon request. If CrossRef goes forward with their idea, perhaps working with Purl.org as they did with DOI, it would pretty much make WebCite irrelevant, and I wouldn’t have to be frustrated by seeing http://webcitation.org/f973p4y in a paper and never knowing whether it’s worth following the link or not (at least there’s a Greasemonkey fix for YouTube links).
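    Neither CrossRef nor WebCite has published how such a caching service would work, but the Archive.org “deposit upon request” point above is easy to illustrate. Below is a minimal sketch assuming the Internet Archive’s current Save Page Now endpoint (https://web.archive.org/save/<url>); the endpoint name and response headers are assumptions about today’s interface, not anything CrossRef or WebCite has specified.

```python
# Sketch only: ask the Internet Archive to capture a page on demand.
# Assumes the "Save Page Now" endpoint (https://web.archive.org/save/<url>);
# the exact interface and headers may differ over time.
import urllib.request

def archive_on_demand(url: str) -> str:
    """Request a fresh capture of `url` and return where the snapshot lives."""
    save_url = "https://web.archive.org/save/" + url
    req = urllib.request.Request(save_url, headers={"User-Agent": "citation-archiver/0.1"})
    with urllib.request.urlopen(req) as resp:
        # The Wayback Machine usually reports the snapshot path in Content-Location;
        # failing that, the final redirected URL is a reasonable fallback.
        return resp.headers.get("Content-Location") or resp.geturl()

if __name__ == "__main__":
    print(archive_on_demand("http://example.com/some/cited/post"))
```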

  • The bpr3.org people are talking to the people at Postgenomic:
    “Bloggers for Peer-Reviewed Research Reporting strives to identify serious academic blog posts about peer-reviewed research by offering an icon and an aggregation site where others can look to find the best academic blogging on the Net.”

    This is all great, except that it already exists, and has for a long time before BPR3. You can go to the papers section in Postgenomic and select papers by the date they were published, the date they were blogged about, or how many bloggers mentioned the paper, or limit the search to a particular journal. I even used this earlier this year to suggest that the number of citations increases with the number of blog posts mentioning a paper.

    See comments here and at Hublog.

    I have reservations about WebCite

    Via BBGM, I hear of WebCite, an on-demand Wayback Machine for web content cited within academic publications. It’s important that links to web content in academic publications keep resolving to their intended content over time, but how valuable is such a service, and whose responsibility is it?

    If the citing author feels it’s important, they should make a local copy; they have the same right to make a local copy as a repository does. If the cited author feels the link is important, they should take steps to keep their content accessible. If neither of these things happens, it raises the question of whether the value of the potentially inaccessible content is greater than the cost of a high-availability mirror of the web whose funding will come from as-yet-unspecified publishing partners.
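    A minimal sketch of the “make a local copy” approach argued for above, using only Python’s standard library; the directory layout and filename scheme are illustrative choices, not part of any existing workflow.

```python
# Sketch only: file away a dated copy of a cited page alongside the manuscript,
# so the version that was cited survives even if the live URL later changes.
import urllib.request
from datetime import date
from pathlib import Path

def snapshot_cited_page(url: str, out_dir: str = "cited_pages") -> Path:
    """Save a dated local copy of `url` and return the path it was written to."""
    req = urllib.request.Request(url, headers={"User-Agent": "local-citation-cache/0.1"})
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
    safe_name = url.replace("://", "_").replace("/", "_")
    path = Path(out_dir) / f"{date.today().isoformat()}_{safe_name}.html"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(body)
    return path

if __name__ == "__main__":
    print(snapshot_cited_page("http://example.com/some/cited/post"))
```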

    These things aside, there are some important technical flaws with the project:

  • The URL scheme removes any trace of human readable information. It’s another one of those damn http://site.com/GSYgh4SD63 URL schemes.
  • All sites have downtime. Is the likelihood of any given article being available made greater by putting it all under one roof?
  • What about robots.txt-excluded material? A search engine isn’t allowed to archive it, and many publishers have somewhat restrictive search engine policies (see the sketch after this list).
  • Of course, it’s much easier to find flaws in a solution than to come up with one in the first place, but it seems to me that a DOI-like system of semantic permalinks, which would always point to the content wherever it moved around the web, would work better, lead to a more complete index, and be much cheaper to run as well. I know they chose archiving rather than redirecting because they wanted to link to the version of the page as it was on the day it was cited, and that’s a good idea, but if having a copy of the page as it was is important, the author needs to make a local copy rather than hope some third party will do it for him.
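    On the robots.txt point above, here is a small sketch of the check any archiving service would presumably have to perform before caching a page, using Python’s standard urllib.robotparser; the user-agent string is a placeholder, not WebCite’s real one.

```python
# Sketch only: ask a site's robots.txt whether a given user-agent may fetch a URL.
# "WebCiteArchiver" is a hypothetical user-agent used for illustration.
import urllib.robotparser
from urllib.parse import urlparse

def may_archive(url: str, user_agent: str = "WebCiteArchiver") -> bool:
    """Return True if the target site's robots.txt allows fetching `url`."""
    parts = urlparse(url)
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
    rp.read()
    return rp.can_fetch(user_agent, url)

if __name__ == "__main__":
    print(may_archive("http://www.example.com/full-text/article.html"))
```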

    Jon Udell likes it, but I feel like it needs a little work.