General Information and Q&A
How should I set the minimum number of signals and the minimum molecule length for filtering?
In general, the common practice is to filter out molecules with fewer than 10 signals or shorter than 150 kbp. If you want to keep more data because the data coverage is not deep, you may lower the filtering cut-offs. However, molecules with fewer than 5 signals or shorter than 100 kbp should not be used for analysis, because most of them cannot be aligned confidently, either pairwise or to a reference.
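To weigh strict against relaxed cut-offs, it can help to estimate how much depth of coverage survives each choice (total retained length divided by genome size). The sketch below is an illustration only: the Molecule record, the coverage_after_filter function and the genome size are hypothetical and are not part of OMTools. It keeps a molecule only if it meets both cut-offs and refuses cut-offs below the 5-signal / 100 kbp floor described above.

```python
from dataclasses import dataclass


@dataclass
class Molecule:
    # Hypothetical in-memory record for illustration; not an OMTools class.
    id: str
    length: int       # molecule length in bp
    num_signals: int  # number of signals (labels) on the molecule


# Lower limits from the guideline above; going below them is not recommended.
FLOOR_MINSIG = 5
FLOOR_MINSIZE = 100_000


def coverage_after_filter(molecules, minsig, minsize, genome_size):
    """Estimate depth of coverage (retained bp / genome size) for given cut-offs."""
    if minsig < FLOOR_MINSIG or minsize < FLOOR_MINSIZE:
        raise ValueError("cut-offs below 5 signals or 100 kbp are not recommended")
    kept_bp = sum(
        m.length
        for m in molecules
        if m.num_signals >= minsig and m.length >= minsize
    )
    return kept_bp / genome_size


mols = [
    Molecule("m1", 200_000, 15),
    Molecule("m2", 120_000, 7),
    Molecule("m3", 90_000, 12),
]
# 1 Mbp is an arbitrary example genome size.
strict = coverage_after_filter(mols, 10, 150_000, 1_000_000)   # keeps m1 only
relaxed = coverage_after_filter(mols, 5, 100_000, 1_000_000)   # keeps m1 and m2
print(strict, relaxed)
```

Here relaxing the cut-offs raises the estimated coverage from 0.2x to 0.32x; m3 is dropped in both cases because it is shorter than 100 kbp.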
Quick commands
Basic filtering
Filter out molecules that are short or have few signals.
java -jar OMTools.jar DataTools --optmapin Sample.bnx --optmapout FilteredSample.bnx --minsig 10 --minsize 150000
Detailed commands
Filtering
Filtering removes optical maps that do not fulfill the specified criteria. It is usually used to exclude low-quality raw molecules from further analysis.
java -jar OMTools.jar DataTools --minsize 150000 --minsig 10 --optmapin RawMolecules.data --optmapout Molecules.data
This command removes molecules that are smaller than 150000 bp or have fewer than 10 signals.
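The size and signal-count criteria act together: a molecule that fails either one is removed. The following is a minimal sketch of that rule, assuming molecules are represented as hypothetical (id, length_bp, num_signals) tuples; it illustrates the criterion only and is not how OMTools reads or stores data.

```python
def passes_filter(length, num_signals, minsize=150_000, minsig=10):
    # Keep a molecule only if it meets BOTH cut-offs; failing either removes it.
    return length >= minsize and num_signals >= minsig


def filter_molecules(molecules, minsize=150_000, minsig=10):
    """molecules: iterable of hypothetical (id, length_bp, num_signals) tuples."""
    return [m for m in molecules if passes_filter(m[1], m[2], minsize, minsig)]


raw = [
    ("a", 200_000, 12),  # long enough, enough signals: kept
    ("b", 200_000, 8),   # too few signals: removed
    ("c", 100_000, 20),  # too short: removed
]
print(filter_molecules(raw))
```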
java -jar OMTools.jar DataTools --lowcom 1 --optmapin RawMolecules.data --optmapout Molecules.data
This command removes molecules with low complexity.
Renaming
java -jar OMTools.jar DataTools --optmapin Molecules.data --optmapout RenamedMolecules.data --idprefix Sample1_
This command adds the prefix “Sample1_” to the ID of each data entry.
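One plausible reason to add a prefix is to keep IDs distinct when data from several samples are later combined, since different samples may reuse the same molecule IDs. The sketch below only illustrates that idea on plain ID strings; add_prefix is a hypothetical helper, not part of OMTools.

```python
def add_prefix(ids, prefix):
    """Return new IDs with the prefix prepended to each one."""
    return [prefix + mol_id for mol_id in ids]


# Two samples that happen to reuse the same molecule IDs.
sample1 = add_prefix(["1", "2", "3"], "Sample1_")
sample2 = add_prefix(["1", "2"], "Sample2_")
merged = sample1 + sample2
print(merged)  # every ID is now unique across both samples
```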
Data selection
Data selection allows quick extraction of specific data entries, or random sampling of data entries.
java -jar OMTools.jar DataTools --optmapin Molecules.data --optmapout SelectedMolecules.data --dataid Molecule1 Molecule2 Molecule3
This command extracts the molecules with IDs “Molecule1”, “Molecule2” and “Molecule3” to the output.
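Selection by ID amounts to a set-membership test on each entry's ID. In this sketch, entries are hypothetical (id, payload) pairs; keeping the input order and silently skipping requested IDs that are absent are assumptions made for illustration, not documented OMTools behavior.

```python
def select_by_id(molecules, wanted_ids):
    """Return entries whose ID is in wanted_ids, preserving input order."""
    wanted = set(wanted_ids)  # set lookup keeps this linear in the data size
    return [m for m in molecules if m[0] in wanted]


data = [("Molecule1", 1), ("Molecule4", 2), ("Molecule3", 3)]
# "Molecule2" is requested but absent, so it is simply not returned.
print(select_by_id(data, ["Molecule1", "Molecule2", "Molecule3"]))
```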
java -jar OMTools.jar DataTools --optmapin Molecules.data --optmapout RandomMolecules.data --randdata 1000 --seed 123456
This command randomly selects and extracts 1000 molecules using the random seed 123456.
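Fixing the seed makes the random selection reproducible: the same seed on the same input yields the same subset. The sketch below shows seeded sampling without replacement using Python's random.Random; it will not reproduce the exact molecules OMTools picks, and returning all entries when fewer than n exist is an assumption.

```python
import random


def sample_molecules(molecules, n, seed):
    """Draw n entries without replacement, reproducibly for a given seed."""
    molecules = list(molecules)
    if n >= len(molecules):
        # Assumption: return everything when fewer than n entries exist.
        return molecules
    rng = random.Random(seed)  # private generator; global random state untouched
    return rng.sample(molecules, n)


ids = [f"Molecule{i}" for i in range(5000)]
first = sample_molecules(ids, 1000, 123456)
second = sample_molecules(ids, 1000, 123456)
print(first == second)  # same seed, same selection
```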
Data format conversion
Format conversion converts optical mapping data from one format to another. Note that some information may be lost during conversion if the target format cannot store it.
java -jar OMTools.jar DataTools --optmapin BNXMolecules.bnx --optmapout CMAPMolecules.cmap
This command converts data in bnx format to cmap format.
Merging close signals
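Signals that lie close together on an in-silico digested reference are usually merged into one before downstream analysis, since closely spaced labels generally cannot be resolved as separate signals on real molecules.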
java -jar OMTools.jar DataTools --optmapin Reference.ref --optmapout Reference_1000.ref --condense 1000
This command merges signals closer than 1000 bp into one signal.
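Merging close signals can be pictured as grouping sorted positions whose consecutive gaps are below the threshold and replacing each group with a single position. The sketch below makes two assumptions that OMTools may not share: groups are chained through consecutive gaps (so 0, 800 and 1600 would merge at a 1000 bp threshold), and each group is replaced by its integer mean position. The condense function name is hypothetical.

```python
def condense(signals, min_gap):
    """Merge runs of signals with consecutive gaps < min_gap into their mean."""
    if not signals:
        return []
    positions = sorted(signals)
    merged = []
    group = [positions[0]]
    for pos in positions[1:]:
        if pos - group[-1] < min_gap:
            group.append(pos)  # too close to the previous signal: same group
        else:
            merged.append(sum(group) // len(group))  # close the current group
            group = [pos]
    merged.append(sum(group) // len(group))  # close the last group
    return merged


# Positions in bp on a hypothetical in-silico digested reference.
print(condense([1000, 1500, 5000, 5800, 20000], 1000))
```

With a 1000 bp threshold, the pairs 1000/1500 and 5000/5800 each collapse into one signal, while 20000 stays on its own.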