Clearing up the confusion around citations of internet sources

Since I wrote that last post, it has become apparent that there’s a lot of confusion regarding citing material on the internet, which isn’t surprising given that there’s a lot of confusion surrounding the internet itself. Put your mind at ease, gentle reader, for clarity awaits.

Here’s the main idea you need to understand: there are two kinds of information on the web, static content and dynamic content.

Static content refers to files on a computer somewhere that are made available on the web by means of a web server. All static content has a unique URL that points to it and only it. This used to be the only kind of personal content on the web, and consisted largely of HTML files instructing whichever browser was unlucky enough to request them to show a flashing GIF or an “under construction” banner. Because each URL led to one and only one file, there was never any uncertainty about which URL referenced which content. However, the big disadvantage of this approach is that whenever you wanted to restructure your site, say from a site organized by department to one organized by discipline or location, you had to go through and manually change the links in all the files so they pointed to the new locations. Needless to say, this got out of hand in a hurry.

To improve this situation, dynamic sites were invented. With this type of site, all your content sat in a database, and instead of displaying an HTML file when someone visited your site, you had a script that dynamically generated the HTML based upon the information in the database. All you needed then was a template to tell it which kind of information to put where and then, to totally rearrange the structure of the whole site, you only had to change that one template file. Instead of linking to a file that contained the Biochemistry dept. information, and a separate one that contained the Genetics dept. information, and so on, you just had your template specify “department links here” and it would spit out the links to all the departments in the database. Unfortunately, this meant that one URL no longer unambiguously referenced one file, because requesting a page from a dynamically generated site amounts to running a database query, and not every query returns exactly one result, or the same result every time.
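To make the template idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `DEPARTMENTS` list stands in for the database, and the template strings, the `/departments/` path, and the function name are hypothetical rather than taken from any particular site.

```python
from html import escape

# Hypothetical stand-in for the site's database: one row per department.
DEPARTMENTS = [
    {"slug": "biochemistry", "name": "Biochemistry"},
    {"slug": "genetics", "name": "Genetics"},
]

# The one template that decides where the department links go.
# Restructuring the site means editing these strings, not every page.
PAGE_TEMPLATE = "<h1>Departments</h1>\n<ul>\n{links}\n</ul>"
LINK_TEMPLATE = '<li><a href="/departments/{slug}">{name}</a></li>'


def render_department_page(rows):
    """Generate the HTML for the department list from database rows."""
    links = "\n".join(
        LINK_TEMPLATE.format(slug=escape(row["slug"]), name=escape(row["name"]))
        for row in rows
    )
    return PAGE_TEMPLATE.format(links=links)


print(render_department_page(DEPARTMENTS))
```

Add a row to the list and the same URL starts returning a different page, which is exactly why such a URL no longer identifies one fixed piece of content.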

With this distinction firmly in mind, we can properly divide up the types of content and specify appropriate rules for each type. Obviously, if you want to reference a static file, include the URL to the static file. A link to a static file should end in the filename, not in a forward slash. If you want to reference dynamically generated content, you need to reference a URL that will cause only the one item of content you want to be generated. Such a URL is called a permalink, and it’s just a special kind of database query. A permalink doesn’t reference a file, but rather contains the set of search parameters needed to retrieve a single piece of content, and therefore does not end in a filename. All static files can be referenced in the same way (with a link to the file), and all dynamically generated content can be referenced in the same way (by the database query).
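As a rough illustration of the difference, the sketch below inspects a URL and reports whether it looks like a link to a file or a query. It is only a heuristic under the assumptions stated in the comments (a real server can map any URL to anything), and the function name and example.org addresses are made up for the example.

```python
from posixpath import basename, splitext
from urllib.parse import parse_qs, urlsplit


def describe_url(url):
    """Guess, from its shape alone, what kind of content a URL points to."""
    parts = urlsplit(url)
    filename = basename(parts.path)  # empty when the path ends in "/"
    # Assumption: a last path segment with an extension is a file name.
    has_extension = bool(splitext(filename)[1])
    query = parse_qs(parts.query)

    if has_extension and not query:
        return "static file: " + filename
    if query:
        return "query with parameters: " + ", ".join(sorted(query))
    return "no filename; likely generated by a script"


print(describe_url("http://example.org/docs/report.pdf"))
print(describe_url("http://example.org/?p=123"))
print(describe_url("http://example.org/2008/05/some-post/"))
```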

I don’t mind how the relevant style guides want to structure things, but the one piece of information that must be included is either the static link to the file or the permalink.

There are a couple of special cases of dynamically generated content that bear discussing.

Blogs

Since blogs are organized chronologically, you should reference the date the post was published, and not any other date. The date you read the post is meaningless, because there’s no way to reference the version of the post on the day you read it. Likewise, the hosting provider or country of origin is pretty much meaningless, as things can and do move around the globe without changing in content. The content of a specific post is expected to remain more or less static once published, though this is entirely up to the discretion of the blog owner, and not something that can be addressed through citation style guidelines.

Wikis

Wikis, on the other hand, are expected not to remain static. To compensate for the dynamic nature of the article, every entry has a revision history. In this case, the revision of the page at the date of access should be given, and the easiest way to do this with most wikis is to cite the URL of the “diff”. Here’s an example of a diff for the page on Mesenchymal stem cells at Wikipedia.
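On MediaWiki sites such as Wikipedia, these URLs are themselves database queries: `index.php` takes a `title`, an `oldid` naming a specific revision, and optionally a `diff` naming the revision to compare it against. The sketch below builds both kinds of link. The page title spelling and the revision numbers are placeholders, not real revision IDs, and the helper names are hypothetical.

```python
from urllib.parse import urlencode

# Wikipedia's script entry point; other wikis will use a different base.
WIKI_BASE = "https://en.wikipedia.org/w/index.php"


def revision_url(title, oldid, base=WIKI_BASE):
    """Link to one fixed revision of a page."""
    return base + "?" + urlencode({"title": title, "oldid": oldid})


def diff_url(title, old_id, new_id, base=WIKI_BASE):
    """Link to the comparison between two revisions of a page."""
    return base + "?" + urlencode({"title": title, "diff": new_id, "oldid": old_id})


# Placeholder revision numbers, for illustration only.
print(revision_url("Mesenchymal_stem_cell", 100))
print(diff_url("Mesenchymal_stem_cell", 100, 200))
```

Either link keeps pointing at the same text however many times the article is edited afterwards.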

Content generated by POST forms

When you enter search terms into a form on a page, as for a database search, and the URL of the results page doesn’t show the search parameters, you might think there’s no way to cite those results. However, there’s usually a way to put the search parameters in the URL, so that you can link directly to a page of search results or other dynamically generated content. Remember, a permalink is just a special kind of database query, so this shouldn’t sound too strange. Here’s an example of a link to all the clinical trials being conducted on Multiple Myeloma. This post tells you how to make database query parameters show up in the URL.
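The idea can be sketched as follows: take the fields the form would have sent and encode them into the query string of the results URL. This only works when the server accepts the same parameters in a GET request, which you have to check for each site. The address and field names below are hypothetical and do not describe any real search service.

```python
from urllib.parse import urlencode


def search_permalink(results_url, form_fields):
    """Turn filled-in form fields into a URL that repeats the search."""
    # Skip fields the user left blank so the link stays short.
    filled = {name: value for name, value in form_fields.items() if value}
    return results_url + "?" + urlencode(filled)


# Hypothetical form: a condition box that was filled in and a status box left empty.
link = search_permalink(
    "http://example.org/search",
    {"condition": "Multiple Myeloma", "status": ""},
)
print(link)  # http://example.org/search?condition=Multiple+Myeloma
```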

And it’s as simple as that. To cite content on the web, you either cite a file or you cite a database query. I hope you now have a better understanding of the kinds of content on the web, and how to cite them so that future readers can actually find your content.

There’s a round-up of some of the posts here so you can see the confusion for yourself.

Synthesis

A synthesis of ideas about open science and social technology.
