Everyone is aware of steadily falling costs in high-throughput short-read DNA sequencing – a theme that is expected to continue as new players such as BGI, Apton Bio, Singular Genomics, Element Biosciences, Genapsys, and others still in “stealth mode” enter the market following the recent invalidation of the ‘444 patent.
We are also seeing real improvements in the accuracy of long-read sequencing technologies: Pacific Biosciences, for example, report 25 kb reads with 99.999% consensus accuracy using the SMRTbell™ kit, while Oxford Nanopore Technologies regularly achieve read lengths of several hundred kilobases – with theoretically no limit to the length of a single nanopore read.
But as end users increasingly appreciate, and as signalled in several recent industry announcements, many applications of DNA sequencing are now running into challenges because techniques for extracting nucleic acids and preparing samples have not yet been updated to take full advantage of progress in sequencing.
Too often, users still have to work with nucleic acid extraction methods that are difficult to automate, limit the ability to detect rare DNA molecules, or add unnecessary cost to commercial assays.
Additionally, many commercially available extraction techniques do not serve the long-read sequencing technologies that are increasingly being adopted in the market.
Challenging applications of sequencing technologies
In diagnostics, short-read DNA sequencing is rapidly becoming an essential tool in applications that require the detection of rare DNA molecules, for example in infectious disease and cancer.
Minimal Residual Disease (MRD) assays now need to detect cancer-derived DNA molecules as rare as 1 in 1 million. Similarly, in sepsis, DNA sequencing-based approaches must detect DNA from infectious organisms present at concentrations as low as 1 CFU/mL. Shortcomings in DNA extraction methods can limit the sensitivity of such assays and increase the risk of false negatives.
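To put that 1-in-1-million figure in context, here is a back-of-the-envelope sketch (illustrative assumptions, not figures from any particular assay: roughly 3.3 pg per haploid human genome and a 95% capture target) of how many genome copies must survive extraction before sequencing depth even becomes relevant:

```python
import math

def copies_needed(variant_fraction: float, detect_prob: float) -> float:
    """Genome copies required so that the chance of capturing at least one
    variant molecule reaches detect_prob, assuming independent sampling."""
    return math.log(1.0 - detect_prob) / math.log(1.0 - variant_fraction)

HAPLOID_GENOME_PG = 3.3   # approximate mass of one haploid human genome, in picograms

f = 1e-6   # variant present at 1 in 1 million molecules, as in demanding MRD assays
p = 0.95   # illustrative target probability of capturing at least one variant molecule

n = copies_needed(f, p)
mass_ug = n * HAPLOID_GENOME_PG / 1e6   # picograms -> micrograms

print(f"~{n:,.0f} genome copies (~{mass_ug:.0f} µg of DNA) must survive extraction")
print("before sequencing depth can make any difference.")
```

On these illustrative numbers, roughly three million genome copies (around 10 µg of DNA) need to make it through to the library, which shows why every molecule lost during extraction directly erodes assay sensitivity.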
In research, the potential for long-read sequencing technologies to drive scientific discovery is profound. It is now possible to resolve highly complex repetitive regions of DNA and to characterise structural variation in regions of the genome that would be significantly more challenging to characterise with short-read technologies. In addition, long-read sequencing technologies are enabling improvements in the accuracy of existing genome assemblies and can also be used to resolve chromosome rearrangements in disease.
Classical approaches to sample preparation
Despite this promise, advanced sequencing technologies are still too often forced to work in concert with sample preparation techniques that have their roots in previous generations of sequencing.
Nucleic acid extraction methods were developed with a view to extracting enough DNA from a sample to meet the input requirements of the sequencing platforms of the day, not to capture rare variants – the hay, not the proverbial needles in the stack. Many of these techniques achieve only limited DNA yields that are often not wholly representative of the input sample, which means that very rare sequences may be missed.
To tackle this problem, scientists and clinicians typically take advantage of progress in short-read sequencing technologies and sequence deeper. But this comes at significant cost, and only helps if the rare variant DNA molecule that is the harbinger of disease has been captured in the first place.
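A minimal sketch of this saturation effect (the sample size, variant fraction and extraction efficiencies below are illustrative assumptions, not figures from any particular assay): however deep you sequence, the probability of detecting the variant can never exceed the probability that at least one variant molecule survived extraction.

```python
import math

def capture_ceiling(sample_molecules: float, vaf: float, extraction_eff: float) -> float:
    """Probability that at least one variant molecule survives extraction
    (Poisson approximation). Sequencing depth cannot push sensitivity above this."""
    expected_variants = sample_molecules * vaf * extraction_eff
    return 1.0 - math.exp(-expected_variants)

vaf = 1e-6                 # variant allele fraction: 1 in a million
sample_molecules = 2e6     # illustrative number of genome copies in the raw sample

for eff in (0.2, 0.5, 0.9):   # illustrative extraction efficiencies
    ceiling = capture_ceiling(sample_molecules, vaf, eff)
    print(f"extraction efficiency {eff:.0%}: detection probability is capped at "
          f"{ceiling:.0%}, however deep you sequence")
```

In other words, improving extraction efficiency raises the ceiling on sensitivity in a way that additional sequencing depth cannot.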
Commercially established nucleic acid extraction methods are also often intrinsically biased towards short fragments of DNA. This is not surprising considering the combination of chemical, mechanical and thermal insult often thrown at the DNA. Even traditional automation solutions, which rely on seemingly gentle steps such as pipetting, can damage the integrity of DNA.
Previously, this could be seen as advantageous because sequencing technologies could only handle short fragments of DNA. But as long-read sequencing is increasingly taken up in the market, this general approach is no longer fit for purpose.
There are non-commercial workarounds to achieve long DNA fragments. But these are based on more hands-on classical DNA extraction protocols accessible only to specialist laboratories. As a consequence, long-read sequencing technologies in particular are yet to be used to their full potential in both research and diagnostic applications.
Inventing next-generation sample preparation
These challenges are ripe to be solved by inventing next-generation techniques that offer improved biochemical performance, throughput, ease of automation and speed of sample processing.
In principle, more efficient sample preparation chemistries and alternative precipitation substrates will deliver higher nucleic acid yields, which can translate directly into cost reductions and may aid the capture of rare targets.
In addition, an up-and-coming approach for capturing rare DNA targets is to move beyond the working volumes of standard liquid handling formats – generally, 96-well microplates – and develop extraction techniques that process the whole biological sample.
Longer DNA fragments, on the other hand, can be achieved by using gentler chemical methods – or ideally eliminating harsh chemicals such as chaotropic salts from extraction protocols altogether.
Inventive biochemistry in combination with new liquid handling approaches that apply less mechanical stress to the sample will be key to yielding higher molecular weight DNA.
Additionally, careful choice of DNA precipitation substrate can be crucial in minimising mechanical insult, and some substrates have features which themselves protect DNA from shearing. This is core to the Nanobind technology developed by Circulomics Inc (recently acquired by Pacific Biosciences).
And next-generation sample preparation need not be limited to nucleic acids. Other approaches, such as multiphase extraction and electrophoresis, not only help balance DNA yield, integrity and additional requirements around DNA purity, but also offer the potential to extract other biomolecular targets together with DNA.
It is true that data analytics such as AI and mature bioinformatics can enable new capabilities and ‘new’ biology. For example, virtual long reads can be delivered on existing short-read sequencing platforms, as with the Infinity Long-read Assay recently announced by Illumina Inc. However, novel techniques like this will also benefit from unbiased sample preparation, so that the analytics can work in concert with better biochemistry.
To transform clinical outcomes and meet the demands of DNA sequencing technology users in 2022 and beyond, we need elegant sample preparation solutions that combine biology, engineering and automation to deliver nucleic acid outputs which are unbiased, of adequate integrity, and able to match the speed promised by novel rapid sequencing techniques.
Ultimately, the true value of cheaper short-read sequencing and increasingly capable long-read sequencing in diagnostics and research will only be realised if we tackle the limitations in many of the commercially available sample preparation methods.
At TTP, we are exploring the next generation of sample preparation, including new automation- and integration-friendly techniques that achieve higher DNA yields and longer DNA fragments.