How do I know if I have enough coverage for assembly?

Usually, 100x coverage is enough. If you are unsure, you can subsample the data to 10x, 20x, 30x, and so on, run de novo assembly on each subset, and check at which coverage the contig N50 stops improving (saturation analysis).
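As a rough illustration of the saturation check, the sketch below computes contig N50 from a list of contig lengths and reports the first coverage level at which adding more data no longer improves N50 by much. The function names, the 5% tolerance, and the example numbers are assumptions made for illustration; they are not part of Bionano Solve.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (0 for empty input)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0


def saturation_point(n50_by_coverage, tolerance=0.05):
    """Return the first coverage whose next step improves N50 by <= tolerance.

    n50_by_coverage maps a coverage level (e.g. 10, 20, 30) to the contig N50
    of the assembly built at that coverage. tolerance is a hypothetical
    relative-gain threshold; pick one that suits your project.
    """
    coverages = sorted(n50_by_coverage)
    for prev, curr in zip(coverages, coverages[1:]):
        base = n50_by_coverage[prev]
        if base == 0:
            continue  # no usable assembly at this coverage yet
        gain = (n50_by_coverage[curr] - base) / base
        if gain <= tolerance:
            return prev
    return None  # N50 is still rising at the highest coverage tested


# Made-up N50 values (in Mb) for assemblies at 10x, 20x, 30x and 40x.
example = {10: 1.0, 20: 1.5, 30: 1.55, 40: 1.56}
print(saturation_point(example))
```

If the function returns None, N50 was still increasing at the highest coverage tested, so more data may be worth trying.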

Could I perform de novo assembly with ultra-high (e.g. 1000x) data coverage?

Not in practice. In theory it should work, but the Bionano Solve assembly pipeline becomes very slow at ultra-high coverage, even for de novo assembly of a small bacterial genome. If your coverage is too high for the assembly to complete in a reasonable time, you can subsample the data to 100x before running de novo assembly.
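One way to think about the subsampling step is sketched below: molecules are picked at random until their combined length reaches the target coverage. This is only a sketch under assumptions. It works on a plain list of molecule lengths, the function name and the fixed random seed are hypothetical, and reading or writing the BNX file itself is left out.

```python
import random


def subsample_to_coverage(molecule_lengths, genome_size, target_coverage=100, seed=0):
    """Return sorted indices of randomly chosen molecules totalling
    roughly target_coverage x genome_size.

    molecule_lengths and genome_size must use the same unit (e.g. bp).
    If the data set is smaller than the target, every molecule is returned.
    """
    target_total = target_coverage * genome_size
    order = list(range(len(molecule_lengths)))
    random.Random(seed).shuffle(order)  # fixed seed keeps the sample reproducible

    picked, total = [], 0
    for idx in order:
        if total >= target_total:
            break
        picked.append(idx)
        total += molecule_lengths[idx]
    return sorted(picked)


# Toy data: 1000 molecules of 200 kbp on a 1 Mbp genome (200x in total).
lengths = [200_000] * 1000
kept = subsample_to_coverage(lengths, genome_size=1_000_000, target_coverage=100)
print(len(kept))
```

Random selection is used so that the subset keeps roughly the same length distribution as the full data set.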

Quick commands

De novo assembly using Bionano Solve.

python pipelineCL.py -T 8 -j 8 -t BionanoRefAlignerDir -b Sample.bnx -a optArguments_nonhaplotype_irys.xml -l OutputDir

This command performs de novo assembly on the optical maps in Sample.bnx and writes the output to the directory OutputDir, using the arguments listed in the file optArguments_nonhaplotype_irys.xml and 8 threads.

How much depth of coverage is needed for assembly?

Bionano requires a total effective molecule length of 150x the genome size for assembly. Empirically, assembly may still finish at lower coverage, provided there is over 100x of data.
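The coverage figure can be estimated as total molecule length divided by genome size. The sketch below assumes that "effective" means molecules passing a minimum length filter; that interpretation, the function name, and the 150 kbp cut-off in the example are assumptions, not values taken from the text above.

```python
def coverage(molecule_lengths_bp, genome_size_bp, min_length_bp=0):
    """Return depth of coverage: summed length of molecules at or above
    min_length_bp, divided by the genome size (both in bp)."""
    kept = [length for length in molecule_lengths_bp if length >= min_length_bp]
    return sum(kept) / genome_size_bp


# Toy data set on a 1 Mbp genome: 600 long molecules and 100 short ones.
lengths = [300_000] * 600 + [100_000] * 100
raw = coverage(lengths, 1_000_000)
effective = coverage(lengths, 1_000_000, min_length_bp=150_000)  # hypothetical cut-off

print(raw, effective)
print("meets 150x:", effective >= 150)
print("above 100x:", effective > 100)
```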

How to configure an assembly run?

It is essential to use an appropriate argument file as input. Bionano provides a few templates as defaults. To select the right template, consider the following:

  • Platform of Bionano system (Irys/Saphyr)
  • Is haplotype resolution needed?

While all parameters in the XML file are adjustable, many have been optimized by the manufacturer, for example to suit the different optics and data characteristics of the two platforms. The parameters that users most commonly adjust are:

  • Enzyme: the enzyme used in the mapping experiment, usually selected to optimize signal density
  • P-value cut-off threshold (-T): used to estimate the molecule coverage support needed for trimming or extension; 1e-10 by default; the recommended value is 1e-5/genome size (Mb). This is the -T argument inside the argument file, not the -T option passed to pipelineCL.py
  • Max memory (-maxmem): set according to the available computing resources
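The recommended P-value threshold is simple arithmetic on the genome size. A minimal sketch follows; the function name and the example genome sizes are illustrative only.

```python
def recommended_pvalue(genome_size_mb):
    """Return the recommended -T cut-off: 1e-5 divided by genome size in Mb."""
    if genome_size_mb <= 0:
        raise ValueError("genome size must be positive")
    return 1e-5 / genome_size_mb


# Illustrative genome sizes in Mb.
for name, size_mb in [("5 Mb bacterium", 5), ("3000 Mb genome", 3000)]:
    print(f"{name}: -T {recommended_pvalue(size_mb):.2e}")
```

Larger genomes get a stricter (smaller) threshold, because the number of chance alignments grows with genome size.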

Advanced users may experiment with arguments such as minimum molecule length (-minlen), maximum number of nicking sites (-maxsites), and mapping rate (-MapRate) to suit their genome and mapping data.