3. Filter
The filter command in DBRetina filters the rows of a pairwise TSV file by a distance cutoff and/or a set of supergroups. The command requires the full path of the pairwise TSV file (-p, --pairwise).
❯ DBRetina filter --help
Usage: DBRetina filter [OPTIONS]
Filter a pairwise file.
Detailed description:
Filter a pairwise file by distance cutoff and/or a set of groups (provided as a single-column file or cluster IDs in a DBRetina cluster file).
Examples:
1- distance cutoff only | dbretina filter -p pairwise.tsv -d ochiai -c 60 -o filtered.tsv
2- distance cutoff and groups file | dbretina filter -p pairwise.tsv -d min_cont -c 97 -g groups.tsv -o filtered.tsv
3- distance cutoff and a cluster IDs | dbretina filter -p pairwise.tsv -d max_cont -c 77 --clusters-file clusters.tsv --clusters-id 8 -o filtered.tsv
4- groups file only | dbretina filter -p pairwise.tsv -g groups.tsv -o filtered.tsv
5- cluster file with cluster IDs | dbretina filter -p pairwise.tsv --clusters-file clusters.tsv --clusters-id 8 -o filtered.tsv
Options:
-p, --pairwise PATH the pairwise TSV file [required]
-g, --groups-file PATH single-column supergroups file
--clusters-file PATH DBRetina clusters file
--cluster-ids TEXT comma-separated list of cluster IDs
-d, --dist-type TEXT select from ['min_cont', 'avg_cont', 'max_cont', 'ochiai', 'jaccard'] [default: NA]
-c, --cutoff FLOAT RANGE filter out distances < cutoff [default: 0.0; 0<=x<=100]
--extend include all supergroups that are linked to the given supergroups.
-o, --output TEXT output file prefix [required]
--help Show this message and exit.
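When the filter step is scripted, the option combinations above can be assembled programmatically. The following Python sketch is a hypothetical helper (build_filter_cmd is not part of DBRetina) that builds the argument list for DBRetina filter using the flag spellings from the Options list. Its checks are assumptions made for illustration: it enforces the documented 0 to 100 cutoff range, requires a metric from the listed choices whenever a cutoff is given, and requires --clusters-file whenever cluster IDs are passed.

```python
import shlex

# Distance metrics accepted by -d/--dist-type, as listed in the help text.
DIST_TYPES = ("min_cont", "avg_cont", "max_cont", "ochiai", "jaccard")


def build_filter_cmd(pairwise, output, dist_type=None, cutoff=None,
                     groups_file=None, clusters_file=None, cluster_ids=None,
                     extend=False):
    """Hypothetical helper: assemble an argv list for `DBRetina filter`."""
    if cutoff is not None:
        # Documented range for -c/--cutoff is 0<=x<=100.
        if not 0 <= cutoff <= 100:
            raise ValueError("cutoff must be within 0..100")
        # Assumption: a cutoff is only meaningful together with a metric.
        if dist_type not in DIST_TYPES:
            raise ValueError(f"dist_type must be one of {DIST_TYPES}")
    # Cluster IDs are selected from a clusters file, so both must be given.
    if cluster_ids and not clusters_file:
        raise ValueError("--cluster-ids requires --clusters-file")

    cmd = ["DBRetina", "filter", "-p", pairwise]
    if dist_type is not None:
        cmd += ["-d", dist_type]
    if cutoff is not None:
        cmd += ["-c", str(cutoff)]
    if groups_file:
        cmd += ["-g", groups_file]
    if clusters_file:
        cmd += ["--clusters-file", clusters_file]
    if cluster_ids:
        # --cluster-ids takes a comma-separated list.
        cmd += ["--cluster-ids", ",".join(str(i) for i in cluster_ids)]
    if extend:
        cmd.append("--extend")
    cmd += ["-o", output]
    return cmd


cmd = build_filter_cmd("pairwise.tsv", "filtered", dist_type="ochiai", cutoff=60)
print(shlex.join(cmd))  # DBRetina filter -p pairwise.tsv -d ochiai -c 60 -o filtered
```

Once DBRetina is installed, the returned list could be passed directly to subprocess.run.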
3.1 Command arguments
3.1.1 Filtering by distance cutoff
-c, --cutoff FLOAT RANGE filter out distances < cutoff [default: 0.0; 0<=x<=100]
This removes every pairwise distance that is below the cutoff value, measured with the metric selected by -d. The cutoff must lie between 0 and 100.
-d, --dist-type TEXT select from ['min_cont', 'avg_cont', 'max_cont', 'ochiai', 'jaccard'] [default: NA]
The distance metric to which the cutoff is applied.
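The cutoff rule ("filter out distances < cutoff") can be illustrated with a minimal Python sketch: rows whose value in the chosen metric column is below the cutoff are dropped, and rows equal to the cutoff are kept. The column names (group_1, group_2, and one column per metric) are hypothetical and only stand in for the real pairwise TSV header; filter_by_cutoff is likewise an illustrative name, not DBRetina code.

```python
import csv
import io


def filter_by_cutoff(rows, dist_type, cutoff):
    """Keep rows whose `dist_type` value is >= cutoff; rows below it are dropped."""
    return [row for row in rows if float(row[dist_type]) >= cutoff]


# Hypothetical pairwise TSV: column names are illustrative, not DBRetina's actual header.
pairwise_tsv = (
    "group_1\tgroup_2\tochiai\tjaccard\n"
    "A\tB\t72.5\t55.0\n"
    "A\tC\t40.0\t25.0\n"
    "B\tC\t60.0\t43.0\n"
)
rows = list(csv.DictReader(io.StringIO(pairwise_tsv), delimiter="\t"))
kept = filter_by_cutoff(rows, "ochiai", 60)
print([(r["group_1"], r["group_2"]) for r in kept])  # [('A', 'B'), ('B', 'C')]
```

Note that the B/C pair, whose ochiai value equals the cutoff exactly, survives, because only values strictly below the cutoff are removed.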
3.1.2 Filtering by supergroups
-g, --groups-file PATH single-column supergroups file
This removes all pairwise distances between supergroups that are not listed in the provided groups file. The groups file is a single-column file containing the names of the supergroups to keep.
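A minimal Python sketch of supergroup filtering follows. It assumes the groups file has one name per line and no header. It also assumes that a pair is kept only when both of its supergroups are listed; the text above does not spell out this rule, so treat it as an assumption. The function names are hypothetical.

```python
import io


def read_groups(handle):
    """Read a single-column groups file: one supergroup name per line (no header assumed)."""
    return {line.strip() for line in handle if line.strip()}


def filter_by_groups(pairs, groups):
    """Keep only pairs whose two supergroups are both in `groups` (assumed rule)."""
    return [(a, b, d) for a, b, d in pairs if a in groups and b in groups]


groups = read_groups(io.StringIO("A\nB\n"))
pairs = [("A", "B", 72.5), ("A", "C", 40.0), ("B", "C", 60.0)]
print(filter_by_groups(pairs, groups))  # [('A', 'B', 72.5)]
```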
3.1.3 Filtering by clusters
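This works like filtering by supergroups, except that the set of supergroups is taken from selected cluster IDs in a DBRetina clusters file. Pairwise distances between supergroups outside the selected clusters are removed.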
--clusters-file PATH DBRetina clusters file
A DBRetina clusters file containing the clusters from which the cluster IDs are selected.
--cluster-ids TEXT comma-separated list of cluster IDs
A comma-separated list of the cluster IDs to select from the clusters file. This argument is only used when the clusters file is provided with --clusters-file.
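The sketch below shows how a comma-separated --cluster-ids value can be turned into a set of supergroups. The clusters file format is not documented in this section. The sketch therefore assumes integer cluster IDs and uses a hypothetical in-memory mapping from cluster ID to member supergroups, in place of parsing the real file. Unknown IDs are silently ignored here, which may differ from DBRetina's actual behavior.

```python
def parse_cluster_ids(text):
    """Parse a comma-separated --cluster-ids value, e.g. "8,12" -> [8, 12] (integer IDs assumed)."""
    return [int(part) for part in text.split(",") if part.strip()]


def groups_from_clusters(clusters, cluster_ids):
    """Union of the supergroups belonging to the selected clusters; unknown IDs contribute nothing."""
    selected = set()
    for cid in cluster_ids:
        selected |= clusters.get(cid, set())
    return selected


# Hypothetical in-memory view of a clusters file: cluster ID -> member supergroups.
clusters = {8: {"A", "B"}, 12: {"C"}, 15: {"D", "E"}}
print(sorted(groups_from_clusters(clusters, parse_cluster_ids("8,12"))))  # ['A', 'B', 'C']
```

The resulting set can then be applied exactly like the contents of a groups file.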
3.1.4 Extending the filtering
--extend include all supergroups that are linked to the given supergroups.
This also includes all supergroups that are linked to the given supergroups. It only takes effect when supergroups are provided, either through the groups file (-g) or through the clusters file together with cluster IDs (--clusters-file, --cluster-ids).
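One possible reading of "linked" is sketched below in Python. Under this reading, two supergroups are linked when they appear together in a row of the pairwise file, and extension adds only direct neighbours (one hop, not transitive). Both points are assumptions, and extend_groups is a hypothetical name rather than DBRetina's implementation.

```python
def extend_groups(pairs, groups):
    """Add every supergroup that shares at least one pair with a given supergroup (one hop, assumed)."""
    extended = set(groups)
    for a, b, _ in pairs:
        # Check membership against the original set, so neighbours of neighbours are not added.
        if a in groups:
            extended.add(b)
        if b in groups:
            extended.add(a)
    return extended


pairs = [("A", "B", 72.5), ("B", "C", 60.0), ("C", "D", 90.0)]
given = {"A"}
extended = extend_groups(pairs, given)
print(sorted(extended))          # ['A', 'B']
print(sorted(extended - given))  # ['B']
```

Because C is linked to A only through B, it is not added here; a transitive variant would repeat the loop until the set stops growing.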
3.2 Output files format
{output_prefix}.tsv
Filtered version of the pairwise TSV file.
{output_prefix}_extended_supergroups.txt
If the --extend argument is used, this file contains the names of the extended supergroups.
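The file names above follow directly from the -o/--output prefix. The short Python sketch below spells out that naming. The helper output_paths is hypothetical and only restates the two templates listed in this section.

```python
def output_paths(prefix, extend=False):
    """Return the files the filter step writes for a given -o/--output prefix."""
    paths = [f"{prefix}.tsv"]
    if extend:
        # Only written when --extend is used.
        paths.append(f"{prefix}_extended_supergroups.txt")
    return paths


print(output_paths("filtered", extend=True))
# ['filtered.tsv', 'filtered_extended_supergroups.txt']
```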