Oxford Nanopore Inter-Template Duration

Jun 06, 2024

Someone was asking on the Discord about the time taken to load a new template on Oxford Nanopore's platform. There was a fair bit of back and forth about this and no clear consensus, so I decided to have a quick poke around.

This is a first pass. Comments and suggestions are most welcome.

I grabbed a random dataset here (Pangenome). The directory contains a text file with some summary statistics (GM24385_1.sequencing_summary.txt.gz). The file seems to contain both passing and failing reads.

Looking at the summary data it’s clear that there are a huge number of short reads:

In the above I calculated the events/s from num_events and template_duration. This is generally ~2000/s, which is more than 4 times higher than I'd expect from Oxford's quoted bases/s. This could be down to event overcalling or other artifacts.
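As a rough sketch of that calculation: the column names num_events and template_duration come from the summary file, but the helper name events_per_second and the toy rows are made up for illustration, and the zero-duration guard is my own addition rather than something the original analysis did.

```python
import pandas as pd


def events_per_second(df: pd.DataFrame) -> pd.Series:
    """Per-read event rate: num_events / template_duration."""
    # Mask zero-length templates so they give NaN instead of inf.
    duration = df["template_duration"].where(df["template_duration"] > 0)
    return df["num_events"] / duration


# Toy rows for illustration only, not taken from the real run.
toy = pd.DataFrame({
    "num_events": [20000, 4100, 0],
    "template_duration": [10.0, 2.0, 0.0],
})
rates = events_per_second(toy)
print(rates.median())  # the median skips the NaN from the empty read
```

On the real file you would pass the full DataFrame and look at the median or a histogram of the result rather than a single number.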

I then calculated the “inter-template duration”. That is the duration between strands when there’s nothing in the pore.
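The script at the end of the post does this with a row-by-row loop. A vectorized sketch of the same idea is below. It assumes the summary columns channel, start_time and duration, the function name inter_template_durations and the toy data are hypothetical, and the 100 s cutoff mirrors the filter in the script.

```python
import pandas as pd


def inter_template_durations(df: pd.DataFrame, max_gap: float = 100.0) -> pd.Series:
    """Gap between the end of one read and the start of the next, per channel."""
    ordered = df.sort_values(["channel", "start_time"])
    end = ordered["start_time"] + ordered["duration"]
    # End time of the previous read on the same channel (NaN for the first read).
    prev_end = end.groupby(ordered["channel"]).shift(1)
    gaps = ordered["start_time"] - prev_end
    # Keep positive gaps below the cutoff. Comparisons with NaN are False,
    # so the first read on each channel drops out here.
    return gaps[(gaps > 0) & (gaps < max_gap)]


# Toy data: channel 1 has gaps of 2 s and 10 s, channel 2 has one 475 s gap.
toy = pd.DataFrame({
    "channel": [1, 1, 1, 2, 2],
    "start_time": [0.0, 12.0, 30.0, 5.0, 500.0],
    "duration": [10.0, 8.0, 5.0, 20.0, 1.0],
})
gaps = inter_template_durations(toy)
print(gaps.tolist())
```

The long gap on channel 2 is discarded by the cutoff, on the assumption that gaps that long reflect something other than ordinary template loading.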

I get the following distribution:

I make no strong claims on the accuracy of this (it’s a first pass, and I welcome comments and suggestions). The code is attached below.

But it’s interesting!

I assume this is an exponential distribution, which is what you’d expect for the times between random events. The average is ~10.5 seconds (assuming the times here are in seconds!), taken over the gaps between 0 and 100 that the code keeps.
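One cheap way to test the exponential assumption: for an exponential distribution the standard deviation equals the mean (coefficient of variation of 1), and a fraction exp(-1), about 37%, of the values lie above the mean. The sketch below runs those checks on synthetic data drawn with a 10.5 s mean. The function name exponential_check is hypothetical, and I have not run this on the real gaps. Bear in mind that the script drops gaps of 100 s or more, which would pull both numbers down slightly on the real data.

```python
import numpy as np


def exponential_check(gaps):
    """Return (mean, coefficient of variation, fraction of gaps above the mean)."""
    gaps = np.asarray(gaps, dtype=float)
    mean = gaps.mean()
    cv = gaps.std(ddof=1) / mean  # close to 1 for an exponential
    frac_above_mean = float((gaps > mean).mean())  # close to exp(-1) for an exponential
    return mean, cv, frac_above_mean


# Synthetic stand-in for the measured gaps, seeded so the run is repeatable.
rng = np.random.default_rng(0)
synthetic = rng.exponential(scale=10.5, size=100_000)
mean, cv, frac = exponential_check(synthetic)
print(f"mean={mean:.2f} s  cv={cv:.2f}  above mean={frac:.3f}  exp(-1)={np.exp(-1):.3f}")
```

If the real gaps give a coefficient of variation well away from 1, a single exponential is probably not the whole story.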

This likely varies significantly between runs and with sample prep, but it probably has a significant impact on how throughput is affected by template length and other factors.
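To get a feel for the size of that effect, here is a back-of-envelope model, a sketch rather than a measurement. It assumes a pore simply alternates between reading a template and waiting for the next one, so its average output is length / (length / speed + gap). The 400 bases/s translocation speed is an assumed round number, the 10.5 s gap is the average from above, and the function name is hypothetical. It ignores pore blocking, failed reads and everything else that happens on a real flow cell.

```python
def effective_bases_per_second(read_length, speed=400.0, gap=10.5):
    """Mean output of one pore that alternates between reading and waiting.

    read_length: template length in bases
    speed: assumed translocation speed in bases/s (an assumption, not measured here)
    gap: mean inter-template duration in seconds
    """
    time_reading = read_length / speed
    return read_length / (time_reading + gap)


for length in (1_000, 10_000, 100_000):
    rate = effective_bases_per_second(length)
    share = rate / 400.0
    print(f"{length:>7} bases: {rate:6.1f} bases/s ({share:.0%} of the assumed speed)")
```

Under these assumptions short templates spend most of their time waiting, while long ones approach the raw translocation speed, which is the intuition behind the throughput comment above.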

I will continue to play around with this, but thoughts are most welcome!

Code

import pandas as pd
import matplotlib.pyplot as plt

# Load the sequencing summary; the timing columns are parsed as floats on read,
# so no further conversion is needed.
df = pd.read_csv('GM24385_1.sequencing_summary.txt.gz', sep='\t', compression='gzip', dtype={'start_time': float, 'duration': float, 'template_duration': float})


unique_channels = df['channel'].unique()


all_all_itd = []

for channel in unique_channels:

    channel_data = df[df['channel'] == channel][['run_id','channel', 'start_time', 'duration', 'template_duration','num_events']]

    # Sort data by 'start_time'
    channel_data = channel_data.sort_values(by='start_time')

    channel_data['events/s'] = (channel_data['num_events'] / channel_data['template_duration']) 
    channel_data['end'] = (channel_data['start_time'] + channel_data['duration']) 

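    # End time of the previous read on this channel (none yet for the first read)
    le = None

    all_itd = []
    for index, row in channel_data.iterrows():
        st = row['start_time']
        if le is not None:
            itd = st - le
            # Keep only positive gaps shorter than 100
            if 0 < itd < 100:
                all_itd.append(itd)
        le = row['end']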

    all_all_itd.extend(all_itd)


    print(f"Channel {channel}")


# Plot histogram
plt.hist(all_all_itd, bins=100, edgecolor='black') 
plt.xlabel('Inter-template duration')
plt.ylabel('Frequency')
plt.grid(True)

plt.savefig('histogram.png')

ASeq Newsletter
