Everything Wrong With: "Illumina: The Measurement Monopoly" Part 6/8
Recently a couple of people have congratulated me on this substack. Congratulations are gratefully received of course. However, I should note that of the ~2000 subscribers I have, fewer than 5% convert to paid subscriptions.
I spend a significant amount on broken sequencers and lab equipment, which I think might be of interest to the substack audience[1].
This isn’t some kind of plea for more paid subscriptions[2]. But I remain under a certain amount of financial stress. And I’m going to continue to moan about the Century of Biology having an order of magnitude more subscriptions than I do. So here is the next part of my critical review of Elliot’s article on sequencing!
In a shocking twist, this time it isn’t Elliot who is wrong. It’s Solexa’s John Milton[3]!
“Solexa’s Milton described this advantage, saying ‘That’s one of the few reasons Illumina dominates the SOLiD system—they got there first. Once you’re a genome center and everybody’s trained, you stick with [the technology].’”
John, you’re wrong!
The SOLiD was an AWFUL sequencer. I personally wouldn’t even call it a DNA sequencer: it didn’t output DNA sequences but “color space” reads[4]. These reads couldn’t easily be converted into DNA sequences without comparison against a known reference sequence.
The SOLiD probed for 2 bases at a time, and provided a 4 color readout.
But because the sequencer mapped 16 states (every possible pair of bases) to 4 colors, you didn’t get a DNA sequence readout, but a convolved “color space” read. After each probe, the template was extended by one base. As such, your 2bp probes overlapped.
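To make the 16-to-4 mapping concrete, here is a minimal Python sketch. It assumes the commonly described 2-base encoding, where each base gets a 2-bit value and the color is the XOR of the two values. The names `BASE_BITS`, `color` and `encode` are mine, purely for illustration, and the color numbers are not tied to any particular dye.

```python
from itertools import product

# Assumption: the widely described XOR scheme for 2-base encoding.
# Each base gets a 2-bit value; the color is the XOR of the pair.
BASE_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}


def color(first: str, second: str) -> int:
    """Return the color (0-3) observed for a pair of adjacent bases."""
    return BASE_BITS[first] ^ BASE_BITS[second]


def encode(seq: str) -> list[int]:
    """Turn a base sequence into its overlapping 2-base color calls."""
    return [color(a, b) for a, b in zip(seq, seq[1:])]


# Group all 16 dinucleotides by the color they produce.
by_color: dict[int, list[str]] = {c: [] for c in range(4)}
for a, b in product("ACGT", repeat=2):
    by_color[color(a, b)].append(a + b)

for c, pairs in by_color.items():
    print(c, pairs)

# A 6-base sequence yields 5 overlapping color calls.
print(encode("ATGGCA"))
```

Each color stands for four different base pairs, which is why a color on its own tells you nothing about the underlying bases.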
It was therefore possible to convert “color space” into regular base calls by starting from a known first base and mapping your way through the overlaps between these 2bp probes.
In theory, if there are no errors, this works reasonably well. The problem is that any “color space” error is going to result in a completely different path in base space in all downstream positions.
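A rough sketch of that decoding walk, and of what one bad color does to it. Again this assumes the XOR-style encoding and a known first base; `decode` and the toy sequence are illustrative, not anything taken from ABI's software.

```python
# Assumption: XOR-style 2-base encoding, with the first base known up front.
BASE_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_BASE = {bits: base for base, bits in BASE_BITS.items()}


def encode(seq):
    """Base sequence -> overlapping color calls."""
    return [BASE_BITS[a] ^ BASE_BITS[b] for a, b in zip(seq, seq[1:])]


def decode(first_base, colors):
    """Walk the colors, deriving each base from the previous one."""
    bases = [first_base]
    for c in colors:
        bases.append(BITS_BASE[BASE_BITS[bases[-1]] ^ c])
    return "".join(bases)


truth = "ATGGCATTAC"  # toy sequence, not real data
colors = encode(truth)

# Error-free colors decode back to the original sequence.
clean = decode(truth[0], colors)

# Flip a single color call and decode again.
bad_colors = list(colors)
bad_colors[3] = (bad_colors[3] + 1) % 4
decoded = decode(truth[0], bad_colors)

print("truth  ", truth)
print("clean  ", clean)
print("1 error", decoded)
```

Because each base is derived from the one before it, the single wrong color at index 3 corrupts the base at index 4 and every base after it.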
But… I imagine the thinking at ABI went something like this. Why not spin this massive and obvious problem as an advantage!
Because a single color space error results in a vastly different basecall, you can filter out errors if you know the expected call.
You can do this if you have a reference (like a human genome). Convert the genome to color space. Align your color space reads, then filter out single color change errors.
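The reference-based filtering could look something like the sketch below. It is a simplification under stated assumptions: XOR-style encoding, a read already aligned to the reference at offset 0, and the rule of thumb that a real single-base change alters two adjacent colors while an isolated color mismatch is treated as a sequencing error. `filter_single_color_errors` is a hypothetical helper, not a function from any real pipeline.

```python
# Assumption: XOR-style 2-base encoding; read aligned at offset 0.
BASE_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(seq):
    """Base sequence -> overlapping color calls."""
    return [BASE_BITS[a] ^ BASE_BITS[b] for a, b in zip(seq, seq[1:])]


def filter_single_color_errors(read_colors, ref_colors):
    """Replace isolated color mismatches with the reference color.

    Mismatches that sit next to another mismatch are left untouched,
    since a real base change alters two adjacent colors.
    """
    mismatches = [
        i for i, (r, g) in enumerate(zip(read_colors, ref_colors)) if r != g
    ]
    mismatch_set = set(mismatches)
    cleaned = list(read_colors)
    corrected = []
    for i in mismatches:
        if (i - 1) not in mismatch_set and (i + 1) not in mismatch_set:
            cleaned[i] = ref_colors[i]
            corrected.append(i)
    return cleaned, corrected


reference = "ATGGCATTAC"  # toy reference
sample = "ATGGCTTTAC"     # same sequence with one SNP (A -> T)

ref_colors = encode(reference)
read_colors = encode(sample)
read_colors[1] = (read_colors[1] + 1) % 4  # add one sequencing error

cleaned, corrected = filter_single_color_errors(read_colors, ref_colors)
print("corrected positions:", corrected)
print("cleaned colors:     ", cleaned)
```

In this toy case the isolated error is removed while the two adjacent mismatches caused by the SNP survive. Note that none of this works without the reference.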
Problems…
In practice this means the SOLiD had a number of issues, including:
It was only really useful when you had a good reference. This meant that sequencing new genomes was not viable.
It was only ever really going to be useful for SNPs, as indels and other variants would be harder to model.
You had to use a custom pipeline, with all your references converted to color space.
It had a large underlying error rate (>6%) which was being hidden by the use of color space[5].
Beyond this the SOLiD used emPCR, which by all accounts is a massive pain.
I’m sure a lot of people did a lot of great work on the SOLiD, but unfortunately the final result was an awful instrument which was not in any way competitive at the time.
In contrast to what John suggests, there were no lock-in effects with Illumina sequencing at work here.
The ABI SOLiD was just a bad sequencer.
[1] I quite obviously also just like buying broken things on eBay and pulling them apart.
[2] Though, you know that would be nice. :)
[3] Who moved on to Oxford Nanopore as their CSO… but I can’t find a reference to him on the company website any more.
[4] I wrote up some notes on the SOLiD here back in 2008: https://41j.com/blog/wp-content/uploads/2012/04/primer.pdf
[5] To be fair, the Genome Analyzer error rate was also ~6%, but they much more reasonably filtered out “low purity” reads and never delivered them to the customer.