General Information and Q&A

How to set the minimum number of signals and minimum length of molecules in filtering?

In general, the common practice is to filter out molecules with fewer than 10 signals or shorter than 150 kbp. If coverage is shallow and you want to keep more data, you can lower the filtering cut-offs. However, molecules with fewer than 5 signals and smaller than 100 kbp should not be used for analysis, because most of them cannot be aligned confidently, either pairwise or to a reference.

Quick commands

Basic filtering

Filter out molecules that are short or have few signals.

java -jar OMTools.jar DataTools --optmapin Sample.bnx --optmapout FilteredSample.bnx --minsig 10 --minsize 150000

Detailed commands

Filtering removes optical maps that do not fulfill the given criteria. It is usually used to exclude low-quality raw molecules from further analysis.

java -jar OMTools.jar DataTools --minsize 150000 --minsig 10 --optmapin RawMolecules.data --optmapout Molecules.data

This command removes molecules that are smaller than 150000 bp or have fewer than 10 signals.
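The filtering rule described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the OMTools implementation: the Molecule record, the function names and the example data are hypothetical, and the sketch assumes that a molecule sitting exactly on a cut-off is kept.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Molecule:
    """Hypothetical in-memory record for one optical map (not an OMTools type)."""
    name: str
    size: int            # molecule length in bp
    signals: List[int]   # signal positions in bp


def passes_filter(mol: Molecule, min_size: int = 150_000, min_sig: int = 10) -> bool:
    """Keep a molecule only if it meets both the size and the signal-count cut-off."""
    return mol.size >= min_size and len(mol.signals) >= min_sig


def filter_molecules(mols, min_size=150_000, min_sig=10):
    """Return the molecules that survive the two cut-offs."""
    return [m for m in mols if passes_filter(m, min_size, min_sig)]


molecules = [
    Molecule("Molecule1", 200_000, list(range(5_000, 185_000, 15_000))),  # 12 signals
    Molecule("Molecule2", 120_000, list(range(5_000, 115_000, 10_000))),  # 11 signals, too short
    Molecule("Molecule3", 300_000, [10_000, 90_000, 250_000]),            # long, only 3 signals
]
kept = filter_molecules(molecules)
print([m.name for m in kept])  # ['Molecule1']
```

Calling filter_molecules with lower values for min_size and min_sig shows how many extra molecules a relaxed cut-off would retain, which may help when coverage is shallow.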

java -jar OMTools.jar DataTools --lowcom 1 --optmapin RawMolecules.data --optmapout Molecules.data

This command removes molecules with low complexity.

java -jar OMTools.jar DataTools --optmapin Molecules.data --optmapout RenamedMolecules.data --idprefix Sample1_

This command adds the prefix “Sample1_” to each data entry.

Data selection allows quick extraction of selected data entries, or sampling from data entries.

java -jar OMTools.jar DataTools --optmapin Molecules.data --optmapout SelectedMolecules.data --dataid Molecule1 Molecule2 Molecule3

This command extracts the molecules named “Molecule1”, “Molecule2” and “Molecule3” to the output.

java -jar OMTools.jar DataTools --optmapin Molecules.data --optmapout RandomMolecules.data --randdata 1000 --seed 123456

This command randomly selects and extracts 1000 molecules using the random seed 123456.
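The role of the seed can be illustrated with a short Python sketch: the same seed always yields the same subset, which makes a sampling run repeatable. The function name and the molecule IDs are hypothetical, and sampling without replacement is an assumption. Python's generator is not the one used by OMTools, so the same seed will not pick the same molecules as the command above.

```python
import random


def sample_ids(ids, n, seed):
    """Draw up to n distinct IDs, reproducibly for a given seed."""
    ids = list(ids)
    rng = random.Random(seed)  # private generator, global random state is untouched
    return rng.sample(ids, min(n, len(ids)))


ids = [f"Molecule{i}" for i in range(1, 5001)]
first = sample_ids(ids, 1000, 123456)
second = sample_ids(ids, 1000, 123456)
print(len(first), first == second)  # 1000 True
```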

Format conversion allows users to convert optical mapping data from one format to another. Note that some information may be lost during conversion.

java -jar OMTools.jar DataTools --optmapin BNXMolecules.bnx --optmapout CMAPMolecules.cmap

This command converts data in bnx format to cmap format.

Closely spaced signals on an in-silico digested reference are usually merged before downstream analysis.

java -jar OMTools.jar DataTools --optmapin Reference.ref --optmapout Reference_1000.ref --condense 1000

This command merges signals closer than 1000 bp into one signal.
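One way to picture the merging step is the Python sketch below. It is a minimal illustration under stated assumptions, not the OMTools algorithm: consecutive signals whose gap is below the threshold are grouped, and each group is replaced by its mean position (the choice of the mean, the rounding, and the function name are all assumptions).

```python
def condense(signals, min_gap=1000):
    """Merge runs of signals whose neighbour-to-neighbour gap is below min_gap.

    Each run is replaced by the rounded mean of its positions (an assumption).
    """
    merged = []
    cluster = []
    for pos in sorted(signals):
        # A gap of at least min_gap closes the current run.
        if cluster and pos - cluster[-1] >= min_gap:
            merged.append(round(sum(cluster) / len(cluster)))
            cluster = []
        cluster.append(pos)
    if cluster:
        merged.append(round(sum(cluster) / len(cluster)))
    return merged


print(condense([5000, 5400, 9000, 20000, 20600, 21100]))  # [5200, 9000, 20567]
```

Because the grouping is based on neighbouring gaps, a chain of closely spaced signals can form a single group that spans more than the threshold. A different grouping rule would give different merged positions.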