Dedup
The dedup command removes duplicate groups, those that share very high similarity, from a pairwise file.
Usage: DBRetina dedup [OPTIONS]
Deduplicate the pairwise distance file using ochiai similarity
Options:
-i, --index-prefix TEXT Index file prefix [required]
-p, --pairwise PATH the pairwise TSV file [required]
-c, --cutoff FLOAT RANGE ochiai similarity cutoff [0<=x<=100; required]
-o, --output TEXT output file prefix [required]
--help Show this message and exit.
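As a minimal sketch of how the options above fit together, the following Python helper assembles a dedup invocation and checks the cutoff against the documented 0 to 100 range before anything is run. The helper name build_dedup_command and the file names idx, idx_pairwise.tsv, and idx_dedup are hypothetical placeholders, not names defined by DBRetina.

```python
from typing import List


def build_dedup_command(index_prefix: str, pairwise: str,
                        cutoff: float, output_prefix: str) -> List[str]:
    """Build the argument list for `DBRetina dedup` (hypothetical helper)."""
    # The help text documents the cutoff range as 0<=x<=100.
    if not 0 <= cutoff <= 100:
        raise ValueError(f"cutoff must be between 0 and 100, got {cutoff}")
    return [
        "DBRetina", "dedup",
        "-i", index_prefix,    # prefix used in the indexing step
        "-p", pairwise,        # original or filtered pairwise TSV file
        "-c", str(cutoff),     # Ochiai similarity cutoff
        "-o", output_prefix,   # prefix for the output files
    ]


# Placeholder file names, for illustration only.
cmd = build_dedup_command("idx", "idx_pairwise.tsv", 90, "idx_dedup")
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a machine where DBRetina is installed.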
Command arguments
-i, --index-prefix TEXT Index file prefix [required]
This is the user-defined prefix that was used in the indexing step as an output prefix.
-p, --pairwise PATH the pairwise TSV file [required]
The original or a filtered pairwise TSV file.
-c, --cutoff FLOAT RANGE ochiai similarity cutoff [0<=x<=100; required]
The -c, --cutoff argument is a threshold on the Ochiai similarity metric, in the range 0 to 100.
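For readers unfamiliar with the metric, here is a small sketch of the standard Ochiai similarity between two sets: the size of their intersection divided by the square root of the product of their sizes. It assumes each group can be treated as a set of items and that the score is scaled to 0 to 100 to match the cutoff range; both are assumptions for illustration, as are the example item names, and this is not DBRetina's own implementation.

```python
import math


def ochiai_similarity(a: set, b: set) -> float:
    """Ochiai similarity of two sets, scaled to 0-100 (scaling is an assumption)."""
    if not a or not b:
        return 0.0  # define the similarity with an empty set as zero
    shared = len(a & b)
    return 100.0 * shared / math.sqrt(len(a) * len(b))


# Hypothetical groups sharing 3 of their 4 items each.
group_a = {"TP53", "BRCA1", "EGFR", "MYC"}
group_b = {"TP53", "BRCA1", "EGFR", "KRAS"}
score = ochiai_similarity(group_a, group_b)
print(round(score, 1))  # 3 / sqrt(4 * 4) = 0.75, i.e. 75.0 on the 0-100 scale
```

Under these assumptions, the two example groups score 75, so they would fall below a cutoff of 90 and at or above a cutoff of 75.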
-o, --output TEXT output file prefix [required]
The prefix for the output files.
Output files format
{output_prefix}_deduplicated_groups.txt
The final deduplicated groups file. This file can be passed to the DBRetina filter command to filter out groups that are not present in it.
Created: July 8, 2023