What is optical mapping?
Optical mapping is a technique that captures the labelling patterns on long DNA molecules. The technology has gained popularity because of its applications in genome scaffolding and the detection of large structural variations. Instead of reading the exact nucleotide sequence, optical mapping provides structural information (the labelling pattern) about individual long DNA molecules. Each label usually marks a short but specific sequence, such as an enzyme recognition site. The product of optical mapping is called an optical map, or optical mapping molecule, and comprises a DNA backbone and a set of labels.
Although the resolution is lower, because only specific sites are captured, optical maps attract attention because an optical map is far longer than a sequencing read. The technology can therefore complement sequencing in regions where the global genomic structure matters more than base-level detail. Optical mapping can also be adapted to specific genomic studies by using alternative labelling methods.
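To make the "backbone plus labels" description concrete, here is a minimal sketch of how one optical map could be represented in memory. The class name `OpticalMap`, its fields, and the `segments` helper are illustrative assumptions, not part of any vendor file format or library.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OpticalMap:
    """One optical mapping molecule (hypothetical representation).

    length: backbone length in base pairs.
    labels: label positions in base pairs, measured from the molecule start.
    """
    molecule_id: str
    length: float
    labels: List[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Keep labels ordered along the backbone and check they lie on it.
        self.labels = sorted(self.labels)
        if self.labels and (self.labels[0] < 0 or self.labels[-1] > self.length):
            raise ValueError("label positions must lie on the backbone")

    def segments(self) -> List[float]:
        """Distances between consecutive landmarks: start, each label, end."""
        points = [0.0] + self.labels + [self.length]
        return [b - a for a, b in zip(points, points[1:])]
```

For example, `OpticalMap("m1", 100.0, [70.0, 20.0]).segments()` gives the three inter-landmark distances of a 100 bp toy molecule with two labels. Alignment methods typically compare such distance patterns rather than nucleotide sequences.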
How is optical mapping data generated?
The key to data generation is to create sites on the DNA molecule that can be captured by imaging. The most commonly used optical mapping technologies have been commercialized by Bionano Genomics, Inc. and OpGen, Inc. The concepts behind data generation in the two technologies are similar. The former uses a Direct Label and Stain (DLS) enzyme or a nicking enzyme to create signals on DNA molecules and passes the molecules through nano-channels for imaging, while the latter uses a restriction enzyme to create double-stranded breaks in DNA molecules and immobilizes the molecules for imaging. Here we introduce only DLS enzyme-based data generation, as this is the latest technology, allows labelling without breaking the molecule, and can capture the longest DNA molecules achieved so far (> 2 Mbp).
First, long DNA molecules are extracted from the samples. Unlike in short-read sequencing, the DNA molecules need to be as intact as possible. Next, the DNA molecules are treated with DLS enzymes that target specific DNA sequences. These enzymes tag the target sites with fluorescent labels without nicking, which preserves the integrity of the DNA. Finally, the DNA molecules are passed through nano-channels for imaging.
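The labelling step can be mimicked in silico by scanning a known sequence for the enzyme's recognition motif on both strands, which yields the label positions an ideal, error-free experiment would produce. The sketch below is an illustration only: the function names are hypothetical, and the motif is a caller-supplied placeholder rather than the recognition sequence of any particular enzyme.

```python
from typing import List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_label_sites(sequence: str, motif: str) -> List[int]:
    """Return sorted 0-based start positions where the motif occurs on
    either strand of the sequence (overlapping matches included)."""
    sequence = sequence.upper()
    motif = motif.upper()
    # A motif on the reverse strand appears as its reverse complement
    # when reading the forward strand.
    targets = {motif, reverse_complement(motif)}
    sites = set()
    for target in targets:
        start = sequence.find(target)
        while start != -1:
            sites.add(start)
            start = sequence.find(target, start + 1)
    return sorted(sites)
```

Such an in silico map of a reference sequence is what observed molecules are usually compared against during alignment.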
Data Properties
There are five major types of error in optical mapping data. At the enzyme labelling step, incomplete recognition causes some target sites to remain unlabelled. No fluorescent signals are tagged at these sites, so they are called missing signals (or false negative signals). In contrast, non-specific enzyme recognition leads to extra signals (or false positive signals) being tagged on the DNA molecule. When DNA molecules pass through the nano-channels, different flow rates stretch the molecules to different extents, leading to scaling error. During imaging, two signals that are too close together may not be resolved accurately because of the limit of optical resolution, leading to resolution error. Finally, measurement error may occur when the distance between two signals is measured.
When a nicking enzyme is used, systematic double-stranded breaks are another important factor in data generation, in addition to the errors above. When two nicking sites on opposite strands are very close together, the DNA molecule is prone to a double-stranded break at that location. Such breaks do not appear directly in the raw optical maps, but they can be revealed by aligning the molecules to a reference. Systematic double-stranded breaks are not rare and should not be neglected, because they significantly affect assembly quality: contigs cannot be extended beyond the breaks. The development of DLS has largely overcome this shortcoming.
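The five error types can be illustrated with a toy simulator that applies them, in order, to a list of true label positions. This is a rough sketch under stated assumptions, not a calibrated error model: the function name, the Gaussian noise, the per-kilobase extra-signal model, and every default rate are hypothetical values chosen only for illustration.

```python
import random
from typing import List, Optional, Tuple


def simulate_errors(
    labels: List[float],
    length: float,
    miss_rate: float = 0.1,            # assumed chance a true site is unlabelled
    extra_rate_per_kb: float = 0.005,  # assumed chance of a spurious label per kb
    scale_sd: float = 0.02,            # assumed spread of the stretch factor
    resolution: float = 1000.0,        # assumed minimum resolvable gap (bp)
    measure_sd: float = 50.0,          # assumed position noise (bp)
    rng: Optional[random.Random] = None,
) -> Tuple[List[float], float]:
    """Return (observed label positions, observed molecule length)."""
    rng = rng or random.Random()
    # 1. Missing signals: drop each true label with probability miss_rate.
    observed = [p for p in labels if rng.random() >= miss_rate]
    # 2. Extra signals: each 1 kb window may gain one spurious label.
    for kb in range(int(length // 1000)):
        if rng.random() < extra_rate_per_kb:
            observed.append(kb * 1000 + rng.uniform(0.0, 1000.0))
    # 3. Scaling error: one stretch factor for the whole molecule.
    scale = rng.gauss(1.0, scale_sd)
    observed = sorted(p * scale for p in observed)
    scaled_length = length * scale
    # 4. Resolution error: labels closer than `resolution` merge into one.
    clusters: List[List[float]] = []
    for p in observed:
        if clusters and p - clusters[-1][-1] < resolution:
            clusters[-1].append(p)
        else:
            clusters.append([p])
    merged = [sum(c) / len(c) for c in clusters]
    # 5. Measurement error: jitter each position, clamped to the molecule.
    noisy = sorted(
        min(max(p + rng.gauss(0.0, measure_sd), 0.0), scaled_length)
        for p in merged
    )
    return noisy, scaled_length
```

Setting every rate to zero returns the input unchanged, which is a convenient sanity check. Passing a seeded `random.Random` makes a simulated data set reproducible.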
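Candidate locations for such systematic breaks can be predicted from a reference sequence by looking for a recognition motif in forward orientation close to one in reverse orientation, since those two occurrences are nicked on opposite strands. The following is a minimal sketch: the function names, the `max_gap` threshold, and the motif are illustrative assumptions, and the motif is assumed to be non-palindromic.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def _find_all(sequence: str, pattern: str) -> List[int]:
    """All 0-based start positions of pattern, overlapping matches included."""
    hits = []
    start = sequence.find(pattern)
    while start != -1:
        hits.append(start)
        start = sequence.find(pattern, start + 1)
    return hits


def find_fragile_sites(
    sequence: str, motif: str, max_gap: int = 1000
) -> List[Tuple[int, int]]:
    """Return (forward_site, reverse_site) pairs whose start positions are
    at most max_gap bp apart. max_gap is a hypothetical threshold."""
    sequence = sequence.upper()
    motif = motif.upper()
    forward = _find_all(sequence, motif)                      # nick on one strand
    reverse = _find_all(sequence, reverse_complement(motif))  # nick on the other
    return [(f, r) for f in forward for r in reverse if abs(f - r) <= max_gap]
```

The pairwise comparison is quadratic in the number of sites. It is adequate for a short illustration, but a sorted two-pointer sweep would be the natural choice for a whole genome.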
Applications of Optical Mapping
With structural information captured along very long DNA molecules, optical mapping is well known as a promising technology for improving sequence assemblies, including sequence scaffolding and mis-assembly detection. Other major applications include various genomic studies, and recent research has extended optical mapping to the study of transcription factor binding and the epigenome. Applications of optical mapping include:
- Assisted sequence scaffolding
- Structural variation detection
- Complex genome variation characterization
- Strain typing
- Transcription factor binding site detection
- Methylation site detection
- DNA damage site detection