More Subread Simulations

May 10, 2024

In my last post on PacBio read simulations, I laid out the basic simulation framework and compared the simulated and real error profiles. To review, the simulation creates subreads and then uses PacBio's CCS tool to build a corrected consensus read. These consensus reads are then evaluated with BEST.
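To make the first step concrete, here is a minimal sketch of how subreads might be generated from a template with separate mismatch, insertion and deletion rates. The error model (an optional random insertion before each template base, then a deletion, mismatch or faithful copy) and the function names `simulate_subread`/`simulate_subreads` are hypothetical. They are not the post's actual simulator. Real subreads also alternate strands, which this sketch ignores, and writing the output in the format CCS expects is not shown.

```python
import random

BASES = "ACGT"


def simulate_subread(template: str, mis: float, ins: float, dels: float,
                     rng: random.Random) -> str:
    """Copy `template` base by base, injecting errors (hypothetical model)."""
    out = []
    for base in template:
        # Possibly insert a random base before this template position.
        if rng.random() < ins:
            out.append(rng.choice(BASES))
        r = rng.random()
        if r < dels:
            continue  # deletion: drop this template base
        elif r < dels + mis:
            # mismatch: emit a base different from the template base
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)  # correct copy
    return "".join(out)


def simulate_subreads(template: str, n_passes: int, mis: float, ins: float,
                      dels: float, seed: int = 0) -> list[str]:
    """Generate `n_passes` independent noisy copies of `template`."""
    rng = random.Random(seed)
    return [simulate_subread(template, mis, ins, dels, rng)
            for _ in range(n_passes)]


# Example using the baseline rates quoted later in the post.
template = "".join(random.Random(42).choice(BASES) for _ in range(200))
subreads = simulate_subreads(template, 10, mis=0.0249, ins=0.049, dels=0.0296)
print([len(s) for s in subreads])
```

Seeding a dedicated `random.Random` instance keeps each simulated well reproducible without touching global state.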

With this framework in place, we can play around with various other simulations. For example, we can compare the published Revio read quality at each subread coverage against simulated reads across a range of subread coverages.

Simulated error rates: mismatches = 0.0249, deletions = 0.0296, insertions = 0.049
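For scale, these per-subread rates can be converted to a Phred quality with Q = -10 log10(p). The sketch below assumes the three rates can simply be summed into one per-base error probability, which is a rough approximation rather than how BEST reports errors. Under that assumption a single simulated subread sits at roughly Q10.

```python
import math


def phred(error_rate: float) -> float:
    """Phred-scaled quality: Q = -10 * log10(error rate)."""
    return -10.0 * math.log10(error_rate)


# Baseline simulation rates from the post.
rates = {"mismatch": 0.0249, "deletion": 0.0296, "insertion": 0.049}

# Assumption: treat the three rates as additive per-base error events.
total = sum(rates.values())
print(f"total error ~ {total:.4f}, raw subread Q ~ {phred(total):.1f}")
```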

The Revio results are 3 to 4 Q higher. This is expected, as the Revio reads also go through DeepConsensus for correction, which gives them an extra boost.
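Because Q is logarithmic, a gap of ΔQ corresponds to a 10^(ΔQ/10)-fold difference in error rate. This small helper (its name is just for illustration) shows that 3 to 4 Q works out to roughly 2 to 2.5 times fewer errors.

```python
def error_fold_reduction(delta_q: float) -> float:
    """Fold reduction in error rate implied by a quality gain of delta_q.

    From Q = -10 * log10(p): p_low / p_high = 10 ** (delta_q / 10).
    """
    return 10 ** (delta_q / 10)


for dq in (3, 4):
    print(dq, round(error_fold_reduction(dq), 2))
```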

However, the graphs do suggest that increasing the subread count is one way to improve quality. That might be possible by moving to shorter subreads (shorter inserts give more passes), or by reducing yield (throwing out wells with lower subread counts). Neither option is particularly attractive…
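A rough sketch of both trade-offs follows. It assumes each trip around the circular SMRTbell template costs one insert plus one adapter, so the pass count is the polymerase read length divided by that sum, with partial passes discarded. The read length, adapter length and per-well pass counts are hypothetical illustrative numbers, not Revio specifications, and the function names are made up for this example.

```python
def full_passes(polymerase_read_len: int, insert_len: int, adapter_len: int) -> int:
    """Approximate count of complete subreads from one polymerase read.

    Assumption: each pass costs one insert plus one adapter, and partial
    passes are discarded.
    """
    return polymerase_read_len // (insert_len + adapter_len)


def filter_by_passes(pass_counts: list[int], min_passes: int) -> tuple[list[int], float]:
    """Drop wells below `min_passes`; return kept counts and fraction of wells kept."""
    kept = [p for p in pass_counts if p >= min_passes]
    return kept, len(kept) / len(pass_counts)


# Illustrative numbers only (hypothetical, not instrument specifications).
POLYMERASE_LEN = 150_000
ADAPTER_LEN = 50
for insert in (20_000, 15_000, 10_000):
    print(insert, full_passes(POLYMERASE_LEN, insert, ADAPTER_LEN))

wells = [3, 5, 8, 10, 12]  # hypothetical per-well pass counts
print(filter_by_passes(wells, min_passes=8))
```

Halving the insert roughly doubles the passes but also halves the bases per read, while a pass-count filter raises average coverage only by discarding wells.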

Let's see what happens if we increase the error rate, scaling it relative to the baseline (so a scale of 2 means twice the baseline error rate).
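The scaling is just a multiplication of every baseline rate by the same factor. This sketch uses an illustrative set of factors (not necessarily the ones plotted) and reuses the additive-error assumption to report a rough raw subread Q. At a factor of 2 it reproduces the roughly 10% insertion, 6% deletion and 5% mismatch rates discussed next. `scale_rates` is a hypothetical helper name.

```python
import math

# Baseline per-base rates used in the simulations.
BASELINE = {"mismatch": 0.0249, "deletion": 0.0296, "insertion": 0.049}


def scale_rates(baseline: dict[str, float], factor: float) -> dict[str, float]:
    """Multiply every error rate by `factor` (2.0 = twice the baseline)."""
    scaled = {kind: rate * factor for kind, rate in baseline.items()}
    # Each individual rate must remain a valid probability.
    if any(rate >= 1.0 for rate in scaled.values()):
        raise ValueError("scaled error rate must stay below 1")
    return scaled


def phred(error_rate: float) -> float:
    """Phred-scaled quality: Q = -10 * log10(error rate)."""
    return -10.0 * math.log10(error_rate)


for factor in (1.0, 1.5, 2.0):  # example factors only
    rates = scale_rates(BASELINE, factor)
    total = sum(rates.values())  # assumption: rates are additive
    print(factor, {k: round(v, 4) for k, v in rates.items()},
          f"raw Q ~ {phred(total):.1f}")
```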

The CCS process seems to work well even at relatively high error rates: with ~10% insertions, ~6% deletions and ~5% mismatches, we still get ~Q20 reads at a subread coverage of 10. It would be interesting to see how this performs in the context of nanopore sequencing!

ASeq Newsletter
