General information and Q&A
Experimental design is crucial to every project: you need to generate sufficient data with the right choice of enzyme. Here are some quick tips on experimental design.
What are the general considerations for the experimental design of optical mapping?
You should consider two factors: (1) data coverage and (2) enzyme selection.
How much data coverage do I need?
Usually, 100x data should be enough for most tasks, even for large genomes (such as human). If you are detecting structural variations, higher coverage will further improve accuracy. For smaller genomes, you should use at least 50x data for assembly, and at least 30x for structural variation detection.
Because a raw data set consists mostly of short DNA fragments that are not useful, we recommend calculating coverage from the “usable data” that remains after all filtering steps.
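As a rough illustration of the point above, usable coverage can be estimated by summing the lengths of the molecules that survive filtering and dividing by the genome size. The sketch below assumes a single minimum-length cutoff as the only filter; the function name and the 150 kbp default are illustrative assumptions, not values taken from any particular pipeline.

```python
def usable_coverage(molecule_lengths_bp, genome_size_bp, min_length_bp=150_000):
    """Estimate coverage from molecules that pass a minimum-length filter.

    min_length_bp is an assumed, illustrative cutoff; use the threshold
    that your own filtering steps actually apply.
    """
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    # Only molecules at least min_length_bp long count as usable data.
    usable_bp = sum(n for n in molecule_lengths_bp if n >= min_length_bp)
    return usable_bp / genome_size_bp


# Toy example: four molecules and a 5 Mbp genome.
lengths = [50_000, 200_000, 300_000, 120_000]
print(usable_coverage(lengths, 5_000_000))
```

In this toy input only the 200 kbp and 300 kbp molecules pass the cutoff, so the usable coverage is 0.1x even though the raw total is higher.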
Which enzyme should I choose?
We recommend Direct Label and Stain (DLS) over nicking enzymes. Currently, DLE-1 is the first and only enzyme developed in the DLS family; it targets the CTTAAG sequence in the DNA molecule. This enzyme can be used in most organisms and has increased the detection resolution of optical mapping to 500 bp. DLE-1 labels the DNA without breaking the molecule and can capture the longest DNA molecules achieved so far. The usable label density for DLE-1 is between 9 and 25 labels per 100 kbp. However, if this target sequence happens to be rare in your organism of interest, you may choose a nicking enzyme instead, as described below.
How do I choose a good nicking enzyme for my experiment?
A good nicking enzyme should create unique patterns with optimal signal density (roughly 10 signals per 100 kbp) along your target genome. In our experience, a signal density of 8-18 signals per 100 kbp is acceptable. A data set with 5-8 signals per 100 kbp can still be used, but performance will be suboptimal and you will need higher data coverage (sometimes 2-fold or even 3-fold) for downstream applications. A data set with more than 25 signals per 100 kbp is not recommended. Using in-silico digested E. coli genomes as an example, BspQI and BssSI are the best enzymes for the experiment. BbvCI has a lower signal density and can be used with suboptimal performance. The use of other enzymes is not recommended. Commands for generating simulated mapping data and statistics are described on the Data Simulation page.
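To check whether the CTTAAG target is rare in a given genome, the label density can be estimated directly from a reference sequence. The following is a minimal sketch, not the method used by any specific tool: the function names are hypothetical, the genome is assumed to be a plain string already loaded into memory, and the motif start is reported as the label position, which ignores the exact labeling offset within the site.

```python
def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def label_positions(genome, motif):
    """Return sorted 0-based start positions of motif on either strand."""
    genome = genome.upper()
    motif = motif.upper()
    # A palindromic motif such as CTTAAG equals its own reverse complement,
    # so the set below then holds a single target.
    targets = {motif, reverse_complement(motif)}
    positions = set()
    for target in targets:
        start = genome.find(target)
        while start != -1:
            positions.add(start)
            start = genome.find(target, start + 1)
    return sorted(positions)


def labels_per_100kbp(genome, motif):
    """Return the label density in labels per 100 kbp."""
    if not genome:
        raise ValueError("genome must not be empty")
    return len(label_positions(genome, motif)) * 100_000 / len(genome)


# Toy sequence with two CTTAAG sites; a real check would use a full reference.
toy = "AACTTAAGGGCTTAAGTT"
print(label_positions(toy, "CTTAAG"))
```

Searching both strands matters for non-palindromic recognition sequences, where a site on the reverse strand appears as the reverse complement in the reference.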
Table 1. Signal density of in-silico digested human genome using different enzymes
Enzyme | Signal density (per 100 kbp) |
---|---|
DLE-1 | 20.8 |
BbvCI | 45.5 |
BspQI | 12.5 |
BssSI | 13.6 |
BsrDI | 58.7 |
BsmI | 65.6 |
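The density ranges quoted in the text can be applied to the values in Table 1 with a small helper. This is only a sketch: the function name and the category labels are illustrative, and densities that fall outside the ranges the text spells out (below 5, or between 18 and 25) are reported as having no explicit guidance rather than guessed.

```python
def classify_nicking_density(signals_per_100kbp):
    """Map a nicking-enzyme signal density to the guidance given in the text."""
    if signals_per_100kbp > 25:
        return "not recommended"
    if 8 <= signals_per_100kbp <= 18:
        return "acceptable"
    if 5 <= signals_per_100kbp < 8:
        return "suboptimal"
    # Below 5 or between 18 and 25: the text gives no explicit guidance.
    return "no explicit guidance"


# Nicking-enzyme values copied from Table 1 (human genome, in silico).
table1_nicking = {
    "BbvCI": 45.5,
    "BspQI": 12.5,
    "BssSI": 13.6,
    "BsrDI": 58.7,
    "BsmI": 65.6,
}

for enzyme, density in table1_nicking.items():
    print(enzyme, density, classify_nicking_density(density))

# DLE-1 has its own usable range of 9 to 25 labels per 100 kbp.
dle1_density = 20.8
print("DLE-1 usable:", 9 <= dle1_density <= 25)
```

For the human genome values in Table 1, only BspQI and BssSI fall in the acceptable range for nicking enzymes, and DLE-1 falls within its own usable range.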
How does signal density affect my analysis?
Currently, the acceptable range of signal density is quite limited. On the one hand, the signal density cannot be too low, because the resulting optical map may not carry enough information for further analysis. For a fixed DNA length, an optical map with very low signal density presents fewer signals, so its information content is lower. Improved experimental protocols that generate longer DNA molecules for imaging would extend the lower limit of signal density.
On the other hand, the signal density cannot be too high, mainly because the rates of resolution error and measurement error increase. With a high error rate, signal patterns become hard to distinguish from one another. Super-resolution microscopy and other imaging techniques are still being developed to extend the upper limit of signal density.
Should I use two or more enzymes with a single color?
Yes. This strategy is usually adopted when no single nicking enzyme produces optimal signal density, but two or more of them each produce a signal density below the optimal value. Combining these enzymes may raise the overall signal density to the optimal range.
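The effect of resolution error at high density can be illustrated with a simplified model in which neighbouring sites closer than the resolution limit are imaged as a single signal. The sketch below is an assumption-laden toy: the function name is hypothetical, sites are chained into one cluster whenever the gap to the previous site is below the limit, and each cluster is reported at its mean position, which is not necessarily how a real imaging system behaves.

```python
def observed_signals(site_positions_bp, resolution_bp):
    """Collapse sites whose gap to the previous site is below resolution_bp.

    Returns one position per cluster (the mean of its member sites).
    """
    clusters = []
    for pos in sorted(site_positions_bp):
        if clusters and pos - clusters[-1][-1] < resolution_bp:
            # Too close to the previous site: imaged as the same signal.
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return [sum(c) / len(c) for c in clusters]


# Six true sites, but several gaps are below an assumed 500 bp limit.
true_sites = [1_000, 1_300, 5_000, 9_000, 9_400, 9_700]
signals = observed_signals(true_sites, resolution_bp=500)
print(len(true_sites), "sites ->", len(signals), "observed signals")
```

The denser the sites, the more of them fall into shared clusters, so the observed pattern loses signals and becomes harder to match against a reference.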
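A quick way to check a candidate combination is to pool the predicted sites of the enzymes and recompute the density. The sketch below assumes the site positions for each enzyme have already been predicted from a reference; the function name and the toy positions are illustrative, and the calculation ignores sites from different enzymes that fall too close together to be resolved.

```python
def combined_density(sites_a_bp, sites_b_bp, genome_length_bp):
    """Signals per 100 kbp when two enzymes are labeled with the same color."""
    if genome_length_bp <= 0:
        raise ValueError("genome_length_bp must be positive")
    # With a single color the two site sets are indistinguishable,
    # so they are simply pooled (identical positions count once).
    merged = sorted(set(sites_a_bp) | set(sites_b_bp))
    return len(merged) * 100_000 / genome_length_bp


# Toy 100 kbp region: 6 and 5 signals individually, both below optimal.
sites_a = [5_000, 22_000, 41_000, 60_000, 78_000, 95_000]
sites_b = [12_000, 33_000, 50_000, 69_000, 88_000]
print(combined_density(sites_a, sites_b, 100_000))
```

Here each enzyme alone gives 5 to 6 signals per 100 kbp, while the pooled set gives 11, close to the roughly 10 signals per 100 kbp described as optimal.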
Should I use two enzymes to label the DNA molecules with two colors?
Yes. Two-color optical mapping is superior in terms of information content, but the rate of extra and missing signals also increases. You need to confirm that your instrument supports two-color imaging. Also, most analysis pipelines do not directly support multi-color optical maps. Since the two-color technology is not commonly used, you should consult the instrument provider directly for more information and support.
What should I do if I want to target a specific repetitive region?
First, your target repetitive region should not be too small (smaller than the resolution / measurement error) or too large (longer than the length of captured optical maps).
Next, you need to check the size of the repeat unit. If a repeat unit is small (again, smaller than the resolution / measurement error), you should consider using a nicking enzyme without target sites within the repeat unit, but with target sites just next to the repetitive sequence. This way, you can estimate the number of copies based on the size of the repetitive region.
If the repeat unit is large enough, use any nicking enzyme that creates sites within the repeat unit. There is no requirement on the number of sites within each repeat unit, but the signal density should not be too high. If no nicking enzyme meets these criteria, consider creating one based on the Cas9 system (see 26481349).
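For the small-repeat-unit case described above, the copy number follows from simple arithmetic on the distance between the two flanking sites. The sketch below is a minimal illustration with hypothetical names; `flank_offset_bp` is an assumed parameter for the total non-repeat sequence lying between the flanking sites and the edges of the repeat array, and the result inherits the measurement error of the optical map.

```python
def estimate_copy_number(left_site_bp, right_site_bp, repeat_unit_bp,
                         flank_offset_bp=0):
    """Estimate repeat copies from the distance between two flanking sites."""
    if repeat_unit_bp <= 0:
        raise ValueError("repeat_unit_bp must be positive")
    # Subtract non-repeat sequence between the sites and the array edges.
    array_bp = right_site_bp - left_site_bp - flank_offset_bp
    if array_bp < 0:
        raise ValueError("flanking sites are closer than the flank offset")
    return array_bp / repeat_unit_bp


# Toy example: flanking sites 66 kbp apart around a 3.3 kbp repeat unit.
print(estimate_copy_number(100_000, 166_000, 3_300))
```

In the toy example the 66 kbp interval divided by the 3.3 kbp unit gives 20 copies; in practice the estimate should be rounded with the sizing error of the measured interval in mind.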
I don’t have the sequence information of my sample. Is there a general enzyme?
You can use a reference sequence of the same or a closely related species for experimental design. We do not recommend starting an optical mapping experiment without knowing the signal density. There is no “general enzyme”, but if you really need to choose one without any sequence information, start with DLE-1. For nicking enzymes, in our experience, you can try BbvCI, BspQI or BssSI.
To what areas other than DNA can optical mapping be applied?
In theory, reverse-transcribed RNA could be labeled. However, RNA is too short to be studied given the current limits on signal density. There are also studies on epigenomics and transcription factor binding sites, but these techniques are not widely used yet.