Biomemory - Enzymatic Data Storage
A company I’ve never heard of has just raised $18M for enzymatic data storage, according to GenomeWeb. That’s kind of shocking, as most other data storage companies and many other enzymatic DNA synthesis companies seem to be having a hard time.
Biomemory is a French startup, and they’ve raised from the following folks: Crédit Mutuel Innovation (lead), French Tech Seed fund, Blast, Deep Tech 2030 fund, Paris Business Angels, Sorbonne Venture, Adnexus, Prunay, Next Sequence, and Accelerem.
They’d previously raised 2M Euros in 2022.
Their first product appears to be this 1000 Euro card (actually a set of two), which seems to encode 1Kb of data[1] and includes sequencing.
In the video, they show the following message:
Hello World!!😊🧬
Encoded as:
AGACTCACAGTCAGAGAGTCTGACAGTCTGACAGTCTGTGACTCACACAGTGCCGAAGTCTGTGAGTGACTCAGTCTGACAGTCAGACACTCACAGACTCACAGTGTGACACTCAGTGTGTCAGTCACCCGATCTCTGTGACACTCAGTGTGTCTCAGTGTCTCTGAC
This is something like 21 bytes (13 ASCII characters plus two 4-byte UTF-8 emoji) encoded into 168 bases, or 8 bases per byte (1 bit per base). The most efficient encoding would be 4 bases per byte (2 bits per base), so there’s clearly some redundancy here.
There’s a lot of bias in the base selection here. For the most part, only the following 2mers appear to be used:
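A quick check of that arithmetic, as a sketch. It assumes the message is stored as UTF-8 text (the card’s actual text encoding isn’t stated), in which case each emoji takes 4 bytes.

```python
# Byte/base arithmetic for the demo message, assuming UTF-8 text encoding.
MESSAGE = "Hello World!!\U0001F60A\U0001F9EC"  # the two emoji are U+1F60A and U+1F9EC
N_BASES = 168  # length of the encoded sequence shown above

n_bytes = len(MESSAGE.encode("utf-8"))  # 13 ASCII bytes + 4 + 4 for the emoji
bases_per_byte = N_BASES / n_bytes      # observed density
# 168 bases give 167 transitions; at 1 bit per transition that is ~21 bytes
bytes_at_one_bit_per_transition = (N_BASES - 1) / 8
optimal_bases = n_bytes * 4             # 2 bits per base -> 4 bases per byte

print(f"{n_bytes} bytes, {bases_per_byte} bases per byte")
print(f"1 bit per transition holds {bytes_at_one_bit_per_transition:.2f} bytes")
print(f"2 bits per base would need only {optimal_bases} bases")
```

The 167 transitions hold just under 21 bytes at one bit each, which fits the transition-based reading of the sequence that follows.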
AG, GA, AC, CT, TC, CA, GT and TG
That is 8 out of a possible 16, with these transitions being represented:
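To put a number on that bias, here’s a small sketch that counts every overlapping 2mer in the sequence and splits the counts into the eight common pairs and everything else. The set name `OBSERVED_2MERS` is just the list above; nothing here is Biomemory’s code.

```python
from collections import Counter

# The 168-base sequence shown above.
SEQUENCE = "AGACTCACAGTCAGAGAGTCTGACAGTCTGACAGTCTGTGACTCACACAGTGCCGAAGTCTGTGAGTGACTCAGTCTGACAGTCAGACACTCACAGACTCACAGTGTGACACTCAGTGTGTCAGTCACCCGATCTCTGTGACACTCAGTGTGTCTCAGTGTCTCTGAC"

OBSERVED_2MERS = {"AG", "GA", "AC", "CT", "TC", "CA", "GT", "TG"}


def count_2mers(seq: str) -> Counter:
    """Count every overlapping dinucleotide (there are len(seq) - 1 of them)."""
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


counts = count_2mers(SEQUENCE)
on_table = sum(n for pair, n in counts.items() if pair in OBSERVED_2MERS)
off_table = {pair: n for pair, n in counts.items() if pair not in OBSERVED_2MERS}

print(f"{on_table} of {sum(counts.values())} transitions use the 8 common 2mers")
print("others:", sorted(off_table.items()))
```

On this sequence, 159 of the 167 transitions fall in the eight common 2mers. The remaining eight (AA, AT, CC, CG, GC) all sit in the GCCGAAG and CACCCGATC stretches.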
Using the following transition table, we can partially decode the message[2]:
transition_to_bit = {"AG": 0, "AC": 0, "GA": 1, "GT": 1,"CT": 0, "CA": 0, "TC": 1, "TG": 1}
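A compact sketch of how the table gets applied (the same logic as the full code at the end of the post, without the debug output): 2mers that aren’t in the table are skipped, and the remaining bits are packed eight at a time, most significant bit first. One thing worth noticing is that the table only depends on the first base of each pair: pairs starting with G or T give 1, and pairs starting with A or C give 0. The name `decode_transitions` is purely illustrative.

```python
# Sketch: apply the transition table to the 168-base sequence.
SEQUENCE = "AGACTCACAGTCAGAGAGTCTGACAGTCTGACAGTCTGTGACTCACACAGTGCCGAAGTCTGTGAGTGACTCAGTCTGACAGTCAGACACTCACAGACTCACAGTGTGACACTCAGTGTGTCAGTCACCCGATCTCTGTGACACTCAGTGTGTCTCAGTGTCTCTGAC"

TRANSITION_TO_BIT = {"AG": 0, "AC": 0, "GA": 1, "GT": 1,
                     "CT": 0, "CA": 0, "TC": 1, "TG": 1}

# The table reduces to a rule on the first base: G or T -> 1, A or C -> 0.
print(all(bit == int(pair[0] in "GT") for pair, bit in TRANSITION_TO_BIT.items()))


def decode_transitions(seq: str) -> str:
    """Map known 2mers to bits (skipping unknown ones) and pack 8 bits, MSB first."""
    bits = [TRANSITION_TO_BIT[seq[i:i + 2]]
            for i in range(len(seq) - 1)
            if seq[i:i + 2] in TRANSITION_TO_BIT]
    n_full = len(bits) - len(bits) % 8  # drop the trailing partial byte
    return "".join(chr(int("".join(map(str, bits[i:i + 8])), 2))
                   for i in range(0, n_full, 8))


print(repr(decode_transitions(SEQUENCE)[:7]))
```

It recovers the first six characters, "Hello ", and then goes wrong at the seventh ("v" instead of "W").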
Hello v÷&ÆB^R^_ ù<9a>ð<9f>§V
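The decoding fails from that point on. The seventh character comes out as "v" instead of "W", right where the first homopolymers (the CC and AA in GCCGAAG) appear, and there’s a CCC later on (in CACCCG), so these could be throwing things off. If anyone can completely decode the message, let me know!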
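A minimal sketch for locating those homopolymer runs, using a regular-expression backreference to find any base repeated two or more times. The function name `find_homopolymers` is hypothetical.

```python
import re

# The 168-base sequence shown above.
SEQUENCE = "AGACTCACAGTCAGAGAGTCTGACAGTCTGACAGTCTGTGACTCACACAGTGCCGAAGTCTGTGAGTGACTCAGTCTGACAGTCAGACACTCACAGACTCACAGTGTGACACTCAGTGTGTCAGTCACCCGATCTCTGTGACACTCAGTGTGTCTCAGTGTCTCTGAC"


def find_homopolymers(seq: str, min_len: int = 2) -> list:
    """Return (0-based start, run) for each run of a single repeated base."""
    # (.)\1+ matches one character followed by one or more copies of itself
    return [(m.start(), m.group(0))
            for m in re.finditer(r"(.)\1+", seq)
            if len(m.group(0)) >= min_len]


print(find_homopolymers(SEQUENCE))
```

There are only three runs in the whole sequence: CC at position 52, AA at 55 and CCC at 127.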
The sequence itself suggests that they have trouble with homopolymers (which I assume is why they use a transition-based encoding). That wouldn’t be too surprising for an enzymatic technology. But the whole sequence is pretty short, and it feels like it could be synthesized cost-effectively using traditional methods anyway.
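For illustration only, here’s a generic transition-style code that cannot produce homopolymers by construction. Each base has two allowed successors, neither equal to itself, and each bit picks one of them. The successor pairs are borrowed from the eight observed 2mers, but which successor means 0 or 1, and the start base, are assumptions. This is not Biomemory’s scheme and won’t reproduce their sequence.

```python
# Hypothetical successor table (an assumption, not Biomemory's published scheme):
# every base has two allowed successors, neither equal to itself.
SUCCESSORS = {
    "A": ("G", "C"),
    "G": ("A", "T"),
    "C": ("T", "A"),
    "T": ("C", "G"),
}


def encode(data: bytes, start: str = "A") -> str:
    """Emit one base per bit (MSB first); the bit picks one of two successors."""
    bases = [start]
    for byte in data:
        for shift in range(7, -1, -1):
            bit = (byte >> shift) & 1
            bases.append(SUCCESSORS[bases[-1]][bit])
    return "".join(bases)


def decode(seq: str) -> bytes:
    """Invert encode(): each bit is the index of the successor that was used."""
    bits = [SUCCESSORS[prev].index(cur) for prev, cur in zip(seq, seq[1:])]
    out = bytearray()
    for i in range(0, len(bits) - len(bits) % 8, 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


seq = encode("Hello".encode("utf-8"))
print(seq)
print(decode(seq))
```

Decoding needs the first base as a reference, and the density is the same 1 bit per base seen in the demo. The trade-off is giving up half the raw capacity in exchange for never emitting a repeated base.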
Perhaps in a future post I’ll poke around for any patents or publications on the basic technology.
As to other information about the company, there are a couple of videos on their YouTube channel, including this one showing an image being encoded:
And I’d feel remiss if I didn’t point out that it seems Lena would rather you didn’t use her image for testing out your encoding strategies…
You can watch a documentary on the history of the Lena image here.
[1] Based on this YouTube video.
[2] Here’s the full code:
```python
def create_ascii_from_transitions(sequence):
    # Define the mapping of transitions to bits
    transition_to_bit = {
        "AG": 0, "AC": 0, "GA": 1, "GT": 1,
        "CT": 0, "CA": 0, "TC": 1, "TG": 1
    }
    # List to store the bits
    bits = []
    # Iterate through the sequence and convert transitions to bits;
    # 2mers not in the table (CC, AA, CG, ...) are skipped entirely
    for i in range(len(sequence) - 1):
        transition = sequence[i:i+2]
        if transition in transition_to_bit:
            # Debug trace: each transition and the bit it maps to
            print(transition, ' ', end='')
            print(transition_to_bit[transition], ',', end='')
            bits.append(transition_to_bit[transition])
    print()
    # Convert bits to characters, 8 bits (MSB first) per character;
    # a final group shorter than 8 bits is converted as-is
    ascii_chars = []
    bit_patterns = []
    for i in range(0, len(bits), 8):
        byte_bits = bits[i:i+8]
        byte_value = 0
        for bit in byte_bits:
            byte_value = (byte_value << 1) | bit
        bit_patterns.append(''.join(map(str, byte_bits)))
        ascii_chars.append(chr(byte_value))
    return ''.join(ascii_chars), bit_patterns


def string_to_bit_pattern(input_string):
    # Binary pattern of each character's code point, zero-padded to 8 bits
    return ['{:08b}'.format(ord(char)) for char in input_string]


# Input sequence
# Variant with each homopolymer run shortened by one base (CC -> C, AA -> A, CCC -> CC):
#sequence = "AGACTCACAGTCAGAGAGTCTGACAGTCTGACAGTCTGTGACTCACACAGTGCGAGTCTGTGAGTGACTCAGTCTGACAGTCAGACACTCACAGACTCACAGTGTGACACTCAGTGTGTCAGTCACCGATCTCTGTGACACTCAGTGTGTCTCAGTGTCTCTGAC"
sequence = "AGACTCACAGTCAGAGAGTCTGACAGTCTGACAGTCTGTGACTCACACAGTGCCGAAGTCTGTGAGTGACTCAGTCTGACAGTCAGACACTCACAGACTCACAGTGTGACACTCAGTGTGTCAGTCACCCGATCTCTGTGACACTCAGTGTGTCTCAGTGTCTCTGAC"
# Generate the ASCII text and bit patterns from the transitions
ascii_output, bit_patterns = create_ascii_from_transitions(sequence)
# Print the resulting ASCII text
print("Generated ASCII Text:", ascii_output)
# Print the bit patterns
print("Generated Bit Patterns:")
for pattern in bit_patterns:
    print(pattern)
# Example for "Hello World!!"
example_string = "Hello World!!"
example_bit_patterns = string_to_bit_pattern(example_string)
print("\nBit Patterns for 'Hello World!!':")
for char, pattern in zip(example_string, example_bit_patterns):
    print(f"{char}: {pattern}")
```