Oxford Nanopore Inter-Template Duration
Someone on the Discord was asking about the time taken to load a new template on Oxford Nanopore's platform. There was a fair bit of back and forth about this and no clear consensus, so I decided to have a quick poke around.
This is a first pass. Comments and suggestions are most welcome.
I grabbed a random dataset here (Pangenome). The directory contains a text file with some summary statistics (GM24385_1.sequencing_summary.txt.gz). The file seems to contain both passing and failing reads.
Looking at the summary data it’s clear that there are a huge number of short reads:
In the above I calculated the events/s from num_events and template_duration. This is generally ~2000/s, which is more than 4 times higher than I’d expect from Oxford’s quoted bases/s. This could be down to event overcalling or other artifacts.
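One way to see whether the high event rate is overcalling would be to put events/s next to bases/s for the same reads. A minimal sketch, assuming the summary file also has a `sequence_length_template` column (I haven't confirmed that for this dataset). The toy numbers are made up purely to show the arithmetic; `rate_summary` is just an illustrative helper.

```python
import pandas as pd

def rate_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Per-read event and base rates.

    Assumes columns num_events, template_duration (seconds) and
    sequence_length_template (an assumed column name).
    """
    out = pd.DataFrame(index=df.index)
    out["events_per_s"] = df["num_events"] / df["template_duration"]
    out["bases_per_s"] = df["sequence_length_template"] / df["template_duration"]
    # Events per called base: values well above 1 would point at overcalling
    out["events_per_base"] = df["num_events"] / df["sequence_length_template"]
    return out

# Made-up toy reads, not taken from the real file
toy = pd.DataFrame({
    "num_events": [20000, 4000],
    "template_duration": [10.0, 2.0],
    "sequence_length_template": [4000, 1000],
})
rates = rate_summary(toy)
print(rates)
```

If events_per_base on the real data sits around 4 or 5, that would account for most of the gap between events/s and the quoted bases/s.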
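I then calculated the “inter-template duration”. That is the gap between consecutive strands on the same channel, when there’s nothing in the pore: the start time of one read minus the end time (start_time + duration) of the read before it.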
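The same calculation can be written without an explicit loop. This is a sketch rather than the code I actually ran: it groups by channel, shifts each read's end time down one row, and subtracts. The toy table is made up, and `inter_template_durations` is a name I've invented for illustration.

```python
import pandas as pd

def inter_template_durations(df: pd.DataFrame) -> pd.Series:
    """Gap between the end of one read and the start of the next, per channel."""
    df = df.sort_values(["channel", "start_time"])
    end = df["start_time"] + df["duration"]
    # End time of the previous read on the same channel (NaN for the first read)
    prev_end = end.groupby(df["channel"]).shift(1)
    gaps = df["start_time"] - prev_end
    return gaps.dropna()

# Made-up toy reads on two channels
toy = pd.DataFrame({
    "channel": [1, 1, 1, 2, 2],
    "start_time": [0.0, 12.0, 30.0, 5.0, 9.0],
    "duration": [10.0, 8.0, 4.0, 3.0, 2.0],
})
gaps = inter_template_durations(toy)
print(gaps.tolist())
```

The first read on each channel has no predecessor, so it contributes no gap. Any filtering (for example dropping negative or very long gaps) would be applied to `gaps` afterwards.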
I get the following distribution:
I make no strong claims on the accuracy of this (it’s a first pass, and I welcome comments and suggestions). The code is attached below.
But it’s interesting!
I assume this is an exponential distribution, which is what you’d expect for the times between random events. The average is ~10.5 seconds (assuming the times here are in seconds!). Note that the code only keeps gaps between 0 and 100, so this is the average of a truncated sample.
This likely varies significantly between runs and with sample prep, but it probably has a significant impact on how throughput is affected by template length and other factors.
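A quick sanity check on the exponential assumption: for an exponential distribution the standard deviation equals the mean, and the fraction of gaps longer than t is exp(-t/mean). The sketch below runs on simulated gaps with a mean of 10.5, not on the real data, just to show what the check looks like. `exponential_check` is an illustrative helper.

```python
import numpy as np

def exponential_check(gaps, t):
    """Compare simple sample statistics with what an exponential predicts."""
    gaps = np.asarray(gaps, dtype=float)
    mean = gaps.mean()  # maximum-likelihood estimate of the exponential scale
    return {
        "mean": mean,
        "std": gaps.std(),                   # should be close to the mean
        "observed_tail": (gaps > t).mean(),  # fraction of gaps longer than t
        "expected_tail": np.exp(-t / mean),  # exponential survival function
    }

# Simulated gaps (assumption: mean 10.5), standing in for the real values
rng = np.random.default_rng(0)
simulated = rng.exponential(scale=10.5, size=100_000)
result = exponential_check(simulated, t=20.0)
print(result)
```

On real gaps, a standard deviation far from the mean, or an observed tail far from the expected one, would suggest the distribution isn't a simple exponential.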
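To get a rough feel for the throughput effect: if a pore spends L/v seconds reading a template of length L at v bases/s and then waits an average gap g for the next one, the fraction of time spent sequencing is (L/v) / (L/v + g). A back-of-the-envelope sketch, using g = 10.5 s from above and an assumed speed of 400 bases/s (a placeholder, not something measured here). It ignores blocked pores and anything else that takes a channel out of action.

```python
def pore_occupancy(read_length, speed=400.0, gap=10.5):
    """Fraction of time a pore spends sequencing.

    read_length in bases, speed in bases/s (assumed value), gap in seconds.
    """
    read_time = read_length / speed
    return read_time / (read_time + gap)

for length in (1_000, 10_000, 100_000):
    print(length, round(pore_occupancy(length), 3))
```

Under these assumptions a 1 kb template keeps the pore busy about 19% of the time, and a 10 kb template about 70%.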
I will continue to play around with this, but thoughts are most welcome!
Code
import pandas as pd
import matplotlib.pyplot as plt

# Load the summary file; dtype already forces the timing columns to float
df = pd.read_csv('GM24385_1.sequencing_summary.txt.gz', sep='\t', compression='gzip',
                 dtype={'start_time': float, 'duration': float, 'template_duration': float})

unique_channels = df['channel'].unique()
all_all_itd = []
for channel in unique_channels:
    channel_data = df[df['channel'] == channel][['run_id', 'channel', 'start_time', 'duration', 'template_duration', 'num_events']]
    # Sort data by 'start_time'
    channel_data = channel_data.sort_values(by='start_time')
    channel_data['events/s'] = channel_data['num_events'] / channel_data['template_duration']
    channel_data['end'] = channel_data['start_time'] + channel_data['duration']
    le = None  # end time of the previous read on this channel
    all_itd = []
    for index, row in channel_data.iterrows():
        st = row['start_time']
        # The first read on a channel has no previous read, so there is no gap
        if le is not None:
            itd = st - le
            # Keep only gaps between 0 and 100
            if 0 < itd < 100:
                all_itd.append(itd)
        le = row['end']
    all_all_itd.extend(all_itd)
    print(f"Channel {channel}")

# Plot histogram of the gaps pooled across all channels
plt.hist(all_all_itd, bins=100, edgecolor='black')
plt.xlabel('Inter-template duration')
plt.ylabel('Frequency')
plt.grid(True)
plt.savefig('histogram.png')