Connotea, you’ve been good to me

Posted on June 16, 2008 by Mr. Gunn

Despite all that, you’ve been a little slow lately, and I found my thoughts wandering. Earlier today, I noticed a 2collab link, and, without thinking about it, clicked over to see what had changed since the last time we met. Before I knew it, I was pulling in my publications via Scopus ID, tagging papers, and joining interest groups. I’m sorry, Connotea, but the speed was just so intoxicating that I went and exported my whole library from you and tried to import it into 2collab. It seems my affections weren’t returned, however, as I was slapped with the following message:

Unable to import bookmarks: org.hibernate.exception.ConstraintViolationException: Could not execute JDBC batch update

Like a bucket of cold water, that returned me to my senses, and I came back to you, ol’ Buggotea.

15 thoughts on “Connotea, you’ve been good to me”

Brant Emery on June 17, 2008 at 4:19 am said:

Dear Mr Gunn

I just read your post – and I’ve asked our technical team to investigate, but in the meantime, could I just check a few assumptions: Was this importing bookmarks via the Connotea option? Are there any other details you remember? Are you using a Mac?
I will get back to you when we’ve fixed it. Thanks for pointing it out. But I do hope you will change your mind and give us another try! You’ll see we are very different from Connotea in blending together a reference management, collaboration and networking tool in one space.
With best regards
-Brant

Brant Emery, development manager, 2collab

Brant Emery on June 17, 2008 at 9:55 am said:

Mr Gunn – for someone so prolific, it’s actually very hard to find your email address so I have to post this on the blog post. Hope you don’t mind too much. We solved your problem re the import error for 2collab:

The reason the import does not work is because of this reference, specifically invalid URL format UR – isi:000221143200012, see below:

TY – JOUR
TI – Mesenchymal-epithelial interactions in the skin: increased expression of dickkopf1 by palmoplantar fibroblasts inhibits melanocyte growth and differentiation
AU – Yamaguchi, Y.
AU – Itami, S.
AU – Watabe, H.
AU – Yasumoto, K.
AU – Abdel-Malek, Z.A.
AU – Kubo, T.
AU – Rouzaud, F.
AU – Tanemura, A.
AU – Yoshikawa, K.
AU – Hearing, V.J.
PY – 2004/xx/xx
N2 – J
KW – MICROPHTHALMIA
KW – beta-catenin
KW – MITF
KW – transcription factor
KW – neural crest
KW – differentiation
KW – Wnt
KW – dkk1
JF – Journal of Cell Biology
JO – J.Cell Biol.
VL – 165
IS – 2
SP – 275
EP – 285
UR – isi:000221143200012

To help other cases of this – we are now investigating developing a feature that will catch-pool these when you import, so it won’t stop the whole importing process. You’ll be able to review these exclusions later.
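The idea of setting bad records aside instead of aborting the whole import can be sketched roughly as follows. This is only an illustration, not 2collab's actual importer: the function names (`parse_ris`, `has_valid_urls`, `partition_records`) are hypothetical, the rule that a UR value must be an http or https URL is an assumption, and the file is assumed to use the usual `TAG  - value` line layout rather than the dashes as rendered on this page.

```python
import re
from urllib.parse import urlparse

# One RIS field per line: two-character tag, spaces, hyphen, value.
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])[ \t]+-[ \t]*(.*)$")


def parse_ris(text):
    """Split RIS text into records, each a list of (tag, value) pairs."""
    records, current = [], None
    for line in text.splitlines():
        match = TAG_LINE.match(line)
        if not match:
            continue  # this sketch ignores blank and continuation lines
        tag, value = match.group(1), match.group(2).strip()
        if tag == "TY":          # TY opens a new record
            current = [(tag, value)]
            records.append(current)
        elif tag == "ER":        # ER closes the current record
            current = None
        elif current is not None:
            current.append((tag, value))
    return records


def has_valid_urls(record):
    """Assumed rule: every UR value must be an http(s) URL with a host."""
    for tag, value in record:
        if tag == "UR":
            parts = urlparse(value)
            if parts.scheme not in ("http", "https") or not parts.netloc:
                return False
    return True


def partition_records(records):
    """Return (accepted, excluded) so one bad record cannot stop the rest."""
    accepted = [r for r in records if has_valid_urls(r)]
    excluded = [r for r in records if not has_valid_urls(r)]
    return accepted, excluded
```

Under these assumptions, a record whose UR field is `isi:000221143200012` ends up in the excluded list for later review, while the remaining records are imported as usual.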

Hope this helps.
ATB
-Brant, 2collab

Mr. Gunn on June 17, 2008 at 10:06 am said:

I’ve gotten so used to never hearing back from web service providers to whom I’m not paying money that I didn’t even bother to contact anyone about the problem.

Yes, it was importing via the Connotea option, which consists of .RIS export from Connotea and import of the .RIS file, and I’m using Firefox 3 on Vista.

I think the last time I tried, I could import the .RIS file from Connotea to Zotero, then export that again as .RIS, and import that file to 2collab, but not only did I get an error, the file uploaded anyways, with missing tags, and left me with no batch delete option to go back and try the upload again.

All the social bookmarking services bring a little something different, and it really is nice how 2collab leverages the existing data they have to add contextual information to your bookmarks. I hope these remaining issues get ironed out soon.

Mr. Gunn on June 17, 2008 at 10:15 am said:

Brant, I just checked my library, and it looks like things did get imported, and the tags are there, but now the titles are missing.

Compare the bookmarks tagged connotea with the one I imported via Scopus ID.

Posting here works just as well as email, probably better, in fact, since I get an email when a comment comes in. What I’m going to do next is to delete everything I’ve tagged connotea (if possible), remove the UR:isi record from the .RIS file, and try again.

Mr. Gunn on June 17, 2008 at 10:39 am said:

I removed anything after “UR – ” in the .RIS file (regex ^UR - .*$) and tried the upload again. I tried both the Connotea import and the .RIS import options. Both resulted in the import of zero citations, despite the full citation details (and DOI where available) being present. Since you’ve got the file, I’ll let you get back to me on why the title field wasn’t picked up in the file with the UR – records and why nothing got imported when I removed the URL from the UR – records, leaving everything else intact.
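For anyone wanting to repeat this substitution outside a text editor, here is a minimal Python sketch. `blank_all_ur` mirrors the regex above, which empties every UR field. `blank_non_http_ur` is a narrower variant added purely for illustration; it only empties UR values that do not start with http:// or https://. Both function names are hypothetical, and the patterns assume the file uses the `UR  - value` layout with spaces or tabs around the hyphen.

```python
import re

# Any UR line; group 1 keeps the tag and hyphen so only the value is dropped.
UR_ANY = re.compile(r"^(UR[ \t]+-).*$", re.MULTILINE)

# Only UR lines whose value does not begin with http:// or https://.
UR_NON_HTTP = re.compile(r"^(UR[ \t]+-)(?![ \t]*https?://).*$", re.MULTILINE)


def blank_all_ur(ris_text):
    """Empty the value of every UR field (the approach described above)."""
    return UR_ANY.sub(r"\1 ", ris_text)


def blank_non_http_ur(ris_text):
    """Empty only UR values that are not web links (assumed narrower rule)."""
    return UR_NON_HTTP.sub(r"\1 ", ris_text)


if __name__ == "__main__":
    sample = (
        "TY  - JOUR\nUR  - isi:000221143200012\nER  - \n"
        "TY  - JOUR\nUR  - http://example.org/a\nER  - \n"
    )
    print(blank_non_http_ur(sample))
```

The narrower variant leaves working links in place, which matters if the importing service relies on the UR field to know where each bookmark points.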

Brant Emery on June 19, 2008 at 4:13 am said:

Hi – Sorry, just to clarify, essentially it was just this record that was stopping the import, not all UR fields. The content of the UR field should not be removed for all entries, otherwise 2collab will not know the URLs anymore. Recommendation: only the entry UR – isi:000221143200012 should either be changed to an online link – or the whole reference removed.

And for the titles problem: it appears that we and Connotea use a different tag for the primary title: we use T1, they use TI. If all occurrences of the TI tag in the RIS file are replaced with T1, the titles will be imported properly (a quick “find and replace” of TI with T1 will do). We have actually done this for you – and can send you a fixed RIS file if you let me know your email address.
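A slightly safer version of that find and replace renames TI only where it appears as the field tag at the start of a line, so the letters "TI" inside a title or keyword are left alone. This is a sketch under the assumption that the file uses the `TAG  - value` layout; `rename_title_tag` is a hypothetical helper name.

```python
import re

# TI as a tag: at the start of a line, followed by spaces and a hyphen.
TITLE_TAG = re.compile(r"^TI([ \t]+-)", re.MULTILINE)


def rename_title_tag(ris_text):
    """Rename the TI tag to T1 without touching 'TI' inside field values."""
    return TITLE_TAG.sub(r"T1\1", ris_text)


if __name__ == "__main__":
    sample = "TY  - JOUR\nTI  - A TITLE in capitals\nKW  - TI\nER  - \n"
    print(rename_title_tag(sample))
```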

But this is good feedback, and we will enhance the RIS import to accept other possible title tags. So we’ll include not only T1 and TI, but also BT, CT, etc.

Also, I do recommend first deleting everything imported before trying again: go to ‘my bookmarks’, filter everything by the tag ‘connotea’ (if this is what was used at import) and click on the delete button.

Hope this helps – and glad to surprise you with an info provider reply! 😉
ATB
-Brant

Mr. Gunn on June 19, 2008 at 11:03 am said:

Replacing TI with T1 almost works, Brant. The next problem is that some journals don’t play well with bookmarking services (I’m looking at you, sciencedirect) so in order to get the citation info scraped, you have to bookmark the pubmed page. In that case, my UR fields contain the pubmed page I was on when I bookmarked the article, and the N1 field and the M3 fields contain the DOI. Some records also only contain a JO field, the abbreviated journal name, but 2collab needs a JF field. This means that instead of displaying “Journal of Biological Chemistry 280, 3, 2309-2323, 21 January 2005.”, 2collab simply displays ncbi.nlm.nih.gov. For entries in the .RIS file added to Connotea via pubmed, the UR field contains the link to the pubmed abstract, but not the link to the actual article.

What I expect to happen is that the citation information is displayed beneath the title, and the DOI, if present, is used as the link for the title, falling back to using whatever is in the UR field only if there’s no DOI. Would this be too hard to implement, perhaps just for the case of the Connotea import, or should I just get busy with the find and replace again? I’m thinking that that still won’t solve the problem of the citation information not being displayed.
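The expected behaviour can be written down as a small sketch. It assumes each record has already been parsed into a dict mapping RIS tags to values, that the DOI lives in the M3 field, and that the citation line should fall back to the abbreviated JO name when JF is missing. The names `choose_link` and `citation_line` are hypothetical; nothing here reflects how 2collab actually builds its display.

```python
DOI_RESOLVER = "http://dx.doi.org/"


def choose_link(record):
    """Prefer the DOI as the title link; fall back to the UR field."""
    doi = record.get("M3", "")
    if doi.startswith("10."):      # DOIs begin with the "10." prefix
        return DOI_RESOLVER + doi
    return record.get("UR") or None


def citation_line(record):
    """Build 'Journal volume, issue, pages', using JO when JF is absent."""
    journal = record.get("JF") or record.get("JO")
    if not journal:
        return None
    pages = "-".join(p for p in (record.get("SP"), record.get("EP")) if p)
    details = ", ".join(p for p in (record.get("VL"), record.get("IS"), pages) if p)
    return " ".join([journal, details]).strip()


if __name__ == "__main__":
    record = {"JO": "J.Cell Biol.", "VL": "165", "IS": "2",
              "SP": "275", "EP": "285",
              "UR": "http://www.ncbi.nlm.nih.gov/pubmed/"}
    print(choose_link(record))
    print(citation_line(record))
```

Falling back to JO gives an abbreviated journal name rather than the full one, but that is still more informative than showing only the host name of the bookmarked page.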

Edited to remove potentially confusing incorrect assumptions, see next post.

Mr. Gunn on June 19, 2008 at 11:14 am said:

Actually, it looks like it might, but it’s not as simple as deleting the UR field and renaming the M3 field as UR. You gotta insert http://dx.doi.org/ in front of the DOI if you’re putting it in the UR field, and only do that for records which have a DOI. I’ll post the regex for this when I get it figured out.

Mr. Gunn on June 19, 2008 at 12:36 pm said:

OK, so I renamed the M3 field to UR, added http://dx.doi.org/ to the beginning of M3, and deleted the UR field containing a link to pubmed, leaving the non-pubmed UR fields intact. The regex I used to find pubmed links in the UR field was ^UR - http\:\/\/www\.ncbi.*$ and I replaced it with UR - . This properly links those records which previously just went to pubmed, and seems to be displaying the citation info as well, but it still has problems where entries contain a JO field and no JF field. From what I can tell, the JO field is the abbreviated form of the journal name, whereas JF is the long form. I have 180 entries with JO fields, so it’s beyond my abilities at the moment to fix this, as it would require some sort of lookup of the full name given the short version, so I’m just going to leave it for now.
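The same edits can be scripted instead of done by hand. The sketch below is one possible reading of the steps above, not a tested converter for real Connotea exports. It assumes the `TAG  - value` layout and that a DOI in M3 starts with "10.". It also differs in one respect: a PubMed UR line is dropped only in records that have a DOI to replace it, so records without a DOI keep their PubMed link. The function names are hypothetical.

```python
import re

TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])[ \t]+-[ \t]*(.*)$")
DOI_RESOLVER = "http://dx.doi.org/"


def _rewrite_record(lines):
    """Turn a DOI-bearing M3 field into a UR link and drop PubMed URs."""
    parsed = [TAG_LINE.match(line) for line in lines]
    has_doi = any(
        m and m.group(1) == "M3" and m.group(2).startswith("10.")
        for m in parsed
    )
    if not has_doi:
        return list(lines)  # nothing to link to, leave the record untouched
    rewritten = []
    for line, m in zip(lines, parsed):
        tag = m.group(1) if m else None
        value = m.group(2).strip() if m else ""
        if tag == "M3" and value.startswith("10."):
            rewritten.append("UR  - " + DOI_RESOLVER + value)
        elif tag == "UR" and "ncbi.nlm.nih.gov" in value:
            continue  # replaced by the DOI link
        else:
            rewritten.append(line)
    return rewritten


def prefer_doi_links(ris_text):
    """Apply the rewrite record by record; each record starts at a TY line."""
    output, record = [], []
    for line in ris_text.splitlines():
        m = TAG_LINE.match(line)
        if m and m.group(1) == "TY" and record:
            output.extend(_rewrite_record(record))
            record = []
        record.append(line)
    output.extend(_rewrite_record(record))
    return "\n".join(output)
```

Working record by record is what makes the "only for records which have a DOI" condition easy to express, which a single line-based regex cannot do.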

What my experiments here suggest is that 2collab just uses whatever it finds in the .RIS file straight, instead of matching entries in the .RIS with journals in their database and actually using the citation information from the database. That’s the kind of input validation needed to fix problems with differently named fields and missing information, but I understand how it’s much more of a computational load than just parsing whatever the .RIS file says and assuming it’s correct and complete.

Would you agree, Brant?

Mr. Gunn on June 19, 2008 at 1:15 pm said:

I’m also wondering if it wouldn’t be easier to just use the API.

Brant Emery on June 23, 2008 at 2:28 pm said:

Hi William

After looking through the work you’ve done here and also at the RIS format, it breaks down into two propositions – refining the RIS importing feature to better handle cases like this, and better handling of Connotea data sets for people transferring their references or accounts.

I can report that we are now investigating both – the further inspection of the complexities of RIS came at the behest of your post, and I thank you for pointing this out. We are now scheduling investigation time for developing and implementing better RIS handling.

On the second count – of course 2collab is not a walled garden and we want to make it as easy as possible for people to join our platform, and leave it should they wish to do so. Using other sites’ APIs to ease this transition may be the most user-friendly way of doing it, and we will definitely take up this analysis.

Thanks for the personal time and effort on this.

ATB
-Brant

Mr. Gunn on June 23, 2008 at 2:40 pm said:

I’m glad to help, Brant. I guess what I’m proposing is skipping the .RIS altogether whenever possible, such as when you can pull citation info using another site’s API.

One final bug to report: Using Firefox 3 on Vista, the keyboard arrow keys don’t work in the tag text area.

Pingback: William Gunn joins Mendeley as Community Liaison | Mendeley Blog
Garden tool supply on December 18, 2009 at 6:28 pm said:

I enjoyed reading it. I need to read more on this issue…I am admiring the time and effort you put in your blog, because it is apparently one great place where I can find lot of reusable info..

- Mr. Gunn on December 18, 2009 at 6:32 pm said:
  
  Oh please tell me more about how you use academic reference management in your garden tool supply business.
  
  /tool
  

Synthesis

A synthesis of ideas about open science and social technology.
