Helicos Biosciences is reporting the sequencing of M13 in Science.

Helicos Biosciences is reporting the re-sequencing of the M13 viral genome.
What’s really cool about this is that it’s sequencing-by-synthesis, requiring no amplification before sequence reading, so there’s no biasing of the populations of DNA fragments. Here’s a short overview of how their sequencing-by-synthesis approach works:

So it’s still a shotgun-style technique, and the problem has been getting read lengths long enough to assemble the pieces into a whole sequence. Think of it like a jigsaw puzzle: the smaller the pieces, the more of them look identical, and the harder it is to figure out where each one goes. Given the 3 × 10^9 bases in the human genome, you’d need a read length of about 17 base pairs to be reasonably certain of ending up with all different pieces. They report average read lengths of 23 bp, with each individual sequence chunk represented about 150 times and every part of the genome covered. This keeps the error rate low enough that they could sequence different strains of the virus and reliably pick up the genomic differences between the two. The paper doesn’t say how long this takes, but their marketing material claims the process takes only one day.
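As a rough sanity check on that read-length figure: if you treat a read as a random k-mer, you want 4^k to exceed the genome size so that a typical read is more likely than not to be unique. This is only a back-of-the-envelope sketch, not real assembly math (repeats and coverage make the true requirement higher):

```python
import math

genome_size = 3e9  # approximate size of the human genome, in bases

# Smallest k such that the number of possible k-mers (4^k)
# exceeds the genome size.
k = math.ceil(math.log(genome_size, 4))
print(k)  # 16
```

That lands at 16 bp, in the same ballpark as the ~17 bp quoted above, and comfortably below the 23 bp average reads Helicos reports.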

The technique as practiced by Helicos has one major drawback, however, that will limit how far it can drive down the cost per base: it requires very expensive optics, since you’re essentially doing FRET on an array of targets. The list price for their instrument is $1.3 million, with a consumables cost of $18,000 per sample, placing it out of reach of many institutions.

Illumina, another company with a sequencing-by-synthesis platform, gets around the expensive-optics issue with a clever solid-phase amplification of the target strands, essentially growing little colonies of identical strands in situ. A sales rep told me the instrument sells for around $500K, but didn’t have details on the consumables cost.

Long-term, I’d expect the Polonator to be the most widely adopted platform, given its developers’ aggressive pursuit of royalty-free technology and its low instrument cost. There’s a good thread to follow here, if you’re interested.

23andme and Navigenics take note: they’re not offering a direct-to-consumer service yet, but there’s no way a SNP scan can compete with a full sequence once the cost comes down.

via HotCites, more on this story at in-sequence.

Science Debate 2008

I’ve been a big supporter of the idea of having all the candidates get together for a roundtable on science-related issues, and I’m subscribed to the mailing list for Science Debate 2008, which has been trying to organize such a debate. The organizer recently pointed out that the candidates are attending the Compassion Forum, a debate on “moral issues”. I don’t know if there’s more going on behind the scenes regarding why the candidates would choose to attend one forum and not the other, but it seems to me that the Democratic candidates, at least, would want to seize the advantage offered by the recent weakening of the religious right, and attending a forum on moral issues might not be the best way to do that. Maybe they’re trying to catch some swing votes, but they certainly shouldn’t give the impression that moral issues remain the most important issues of this election, because that’s exactly what led to their losses the last two times. You know Obama and Clinton would both kill McCain in a science debate.

Here’s the email:

I am sorry to send two emails in such short succession, but I thought you should know that after declining our invitation to debate science in Pennsylvania, Barack Obama and Hillary Clinton yesterday agreed to attend “The Compassion Forum,” a forum of “wide-ranging and probing discussions of policies related to moral issues.” CNN will serve as the exclusive broadcaster of the “presidential-candidate forum on faith, values and other current issues” at Messiah College near Harrisburg, Pa., April 13 at 8 p.m. You can read more here.

Perhaps among the moral issues discussed should be whether they have a moral obligation to more fully engage on science issues, since the future viability of the planet may hang in the balance, for starters. Is there a larger moral imperative? How about the future economic health of the United States and the prosperity of its families? Science & engineering have driven half our economic growth since WWII, yet by 2010, if trends hold, 90% of all scientists and engineers will live in Asia. Then there are the moral questions surrounding the health of our families with stem cell research, genomics, health insurance policy, and medical research. There’s biodiversity loss and the health of the oceans and the morality of balancing destruction of species against human needs and expenses, there’s population and development and clean energy research, there’s food supply and GMO crops and educating children to compete in the new global economy and securing competitive jobs. Science issues are moral issues.

I would encourage you to write letters to the editor, emails to the campaigns, and blog postings pointing this out. And if you can, support our ongoing effort to turn this country around.

Shawn Lawrence Otto

ScienceDebate2008.com

Distributed data and distributed analysis

Two smart people I read, who probably don’t know each other and work in disparate fields, both posted today about using distributed collections of data and analysts to tackle problems we’re only now in a position to address.

Here’s Jon Udell talking about distributed communities of climate scientists studying CO2 fluctuations.

Here’s Steve Hsu talking about the quantitative finance firm Horton Point, and how they’re assembling teams of academics from diverse fields such as psychology and data mining, to come up with dynamic models of how markets work.

The problems with current genomic association studies.

There’s a nice roundup of the problems faced by current chip-based techniques to find disease-gene associations. This is the technique practiced by 23andme.

Current chip-based technologies for genome-wide analysis, while having some success in identifying the lowest-hanging genetic fruit for many common diseases, seem to have already started to run up against barriers that are unlikely to be overcome by simply increasing sample sizes. These technologies should really be regarded as little more than a place-holder for whole-genome sequencing, which should become affordable enough to use for large-scale association studies within 3-5 years.

Nothing beats a large, diverse library of full sequences. There will be plenty of positions available for the enterprising bioinformaticians needed to compile and interpret this data store.

The G.I.N.A. could be a bad thing for healthcare.

The DNA Network batted this issue around several months ago, but it’s coming back in the form of a Letter to Nature.

The letter’s argument is that there’s no guarantee that keeping insurance companies ignorant of a patient’s genetic risks will prevent discrimination against those with the unlucky combination of poor economic status and genetic risk factors.

I think he’s right. For one thing, the government would have to bar people from voluntarily disclosing their information to insurers, perhaps in exchange for a premium reduction, and that would be a little heavy-handed, wouldn’t it? Currently, the system helps those with higher risks obtain insurance by essentially letting the people who pay in without making claims subsidize the rates of those who claim more than they pay in. Why couldn’t it work the same way if insurers were able to take genetic risks into consideration? It might even lower the total cost of insuring everyone, allowing insurers to cover more people for the same cost, or the same number of people for much less, with the high-risk people picked up by programs similar to those that cover homes in flood-prone areas. (As a New Orleanian, I’m not one to sing the praises of FEMA, believe me, so I’m not holding the NFIP up as a model; I just think the actuarial considerations are similar.) Maybe we could even make enrolling a certain number of high-risk people a condition of being allowed to set rates based on genetic profiling.
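To make the cross-subsidy arithmetic concrete, here’s a toy sketch. Every number is made up purely for illustration and has nothing to do with real actuarial pricing; it just shows how a single pooled premium implicitly transfers money from low-risk to high-risk members, a transfer that could instead be made explicit via a subsidy program:

```python
# Hypothetical population: 900 low-risk and 100 high-risk members.
low_risk_n, high_risk_n = 900, 100
# Hypothetical expected annual claims per person in each group.
low_risk_cost, high_risk_cost = 1_000, 10_000

total_cost = low_risk_n * low_risk_cost + high_risk_n * high_risk_cost

# Community rating: everyone pays the same premium, so low-risk
# members implicitly subsidize high-risk members.
pooled_premium = total_cost / (low_risk_n + high_risk_n)
print(pooled_premium)  # 1900.0

# Risk rating with an explicit subsidy: premiums track expected cost,
# and a separate program covers the gap for the high-risk group.
subsidy_per_high_risk = high_risk_cost - pooled_premium
print(subsidy_per_high_risk)  # 8100.0
```

Either way the same total gets collected; the question the post raises is only whether the transfer happens invisibly inside one premium or visibly through something like a high-risk pool.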

This is all jumping the gun a little, because rock-solid, high-confidence correlations between a genetic feature and a disease are still rather rare, but one thing’s for sure: the better you see what’s ahead, the better you can plan for it, whether you’re an insurance company or an individual, and having a good plan leads to better outcomes for everyone. Everyone’s worried about enabling social injustice, but our current insurance system, in which many are so under-served, isn’t all that great to begin with, so let that temper your thoughts as well.

Genetic-Future via Genome Technology

OK, people. It’s time to define some terms.

I’ve increasingly begun to hear something that is the conversational equivalent of fingernails scratching a chalkboard.

A blog, short for web log, is a website consisting of timestamped posts in reverse chronological order. A log, in essence, of things you see and react to on the web.

The individual entries that make up the timestamped list are posts, or entries. Think post in terms of “posting a notice”. DO NOT use the word blog to refer to a post.

Blogs often feature comments, which are short pieces of commentary written by readers of the blog, appearing below each individual post. DO NOT use the word blog to refer to a comment.

The n00b influx is a sure sign something has truly entered the public awareness, and not just the web awareness.