Dedup

The dedup command removes duplicate groups, that is, groups that share very high similarity, from a pairwise file.

Usage: DBRetina dedup [OPTIONS]

  Deduplicate the pairwise distance file using ochiai similarity

Options:
  -i, --index-prefix TEXT   Index file prefix  [required]
  -p, --pairwise PATH       the pairwise TSV file  [required]
  -c, --cutoff FLOAT RANGE  ochiai similarity cutoff  [0<=x<=100; required]
  -o, --output TEXT         output file prefix  [required]
  --help                    Show this message and exit.

Command arguments

-i, --index-prefix TEXT Index file prefix [required]

This is the user-defined prefix that was used in the indexing step as an output prefix.

-p, --pairwise PATH the pairwise TSV file [required]

The original or a filtered pairwise TSV file.

-c, --cutoff FLOAT RANGE ochiai similarity cutoff [0<=x<=100; required]

The -c, --cutoff argument is an Ochiai similarity value in the range 0 to 100.
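
For reference, the Ochiai similarity of two sets A and B is |A ∩ B| / sqrt(|A| × |B|). The following is a minimal Python sketch of that formula, not DBRetina's own implementation. It assumes each group is a plain set of item names and that the cutoff uses a 0 to 100 percentage scale, as the option range suggests; the function name `ochiai_percent` and the example groups are hypothetical.

```python
import math


def ochiai_percent(a: set, b: set) -> float:
    """Ochiai similarity of two sets, scaled to the 0-100 range (assumed scale)."""
    if not a or not b:
        # An empty group shares nothing with any other group.
        return 0.0
    shared = len(a & b)
    return 100.0 * shared / math.sqrt(len(a) * len(b))


# Hypothetical groups: three shared items, four items each.
group_a = {"g1", "g2", "g3", "g4"}
group_b = {"g1", "g2", "g3", "g5"}

similarity = ochiai_percent(group_a, group_b)  # 100 * 3 / sqrt(4 * 4) = 75.0
print(similarity)
```

Under these assumptions, the two example groups score 75, so whether they count as duplicates depends on where the cutoff sits relative to that value.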

-o, --output TEXT output prefix [required]

The prefix for the output files.
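
The four required options combine into a single invocation. The sketch below assembles that command line from Python using only the flags listed in the help text above. The helper names (`build_dedup_command`, `run_dedup`) and all file names and prefixes are hypothetical placeholders, and running the command assumes the `DBRetina` executable is on the PATH.

```python
import subprocess


def build_dedup_command(index_prefix: str, pairwise: str,
                        cutoff: float, output_prefix: str) -> list:
    """Build the argument list for `DBRetina dedup` (hypothetical helper)."""
    # The help text restricts the cutoff to 0 <= x <= 100.
    if not 0 <= cutoff <= 100:
        raise ValueError("cutoff must be between 0 and 100")
    return [
        "DBRetina", "dedup",
        "-i", index_prefix,    # prefix used as output of the indexing step
        "-p", pairwise,        # original or filtered pairwise TSV file
        "-c", str(cutoff),     # Ochiai similarity cutoff
        "-o", output_prefix,   # prefix for the output files
    ]


def run_dedup(index_prefix: str, pairwise: str,
              cutoff: float, output_prefix: str) -> None:
    """Run the command; requires DBRetina to be installed."""
    cmd = build_dedup_command(index_prefix, pairwise, cutoff, output_prefix)
    subprocess.run(cmd, check=True)


# Placeholder names for illustration only.
cmd = build_dedup_command("idx_example", "idx_example_pairwise.tsv",
                          95.0, "dedup_example")
print(" ".join(cmd))
```

Building the argument list separately from running it keeps the cutoff range check testable without DBRetina installed.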


Output files format

{output_prefix}_deduplicated_groups.txt

The final deduplicated groups file. It can be passed to the DBRetina filter command to remove every group that is not listed in it.


Last update: July 8, 2023
Created: July 8, 2023
Authors: mr-eyes