Filtering pairwise similarities
The filter command in DBRetina filters a pairwise TSV file. It requires the full path to that file.
Usage: DBRetina filter [OPTIONS]

  Filter a pairwise file.

  Detailed description:

      Filter a pairwise file by similarity cutoff and/or a set of groups
      (provided as a single-column file or cluster IDs in a DBRetina cluster
      file).

Options:
  -p, --pairwise PATH       the pairwise TSV file  [required]
  -g, --groups-file PATH    single-column supergroups file
  --clusters-file PATH      DBRetina clusters file
  --cluster-ids TEXT        comma-separated list of cluster IDs
  -m, --metric TEXT         select from ['containment', 'ochiai', 'jaccard',
                            'pvalue']
  -c, --cutoff FLOAT RANGE  filter out similarities < cutoff  [0<=x<=100]
  --extend                  include all supergroups that are linked to the
                            given supergroups.
  -o, --output TEXT         output file prefix  [required]
  --help                    Show this message and exit.
Command arguments
-p, --pairwise PATH the pairwise TSV file [required]
The pairwise file to be filtered. This file can be generated by the DBRetina pairwise or DBRetina filter commands.
-g, --groups-file PATH single-column supergroups file
The groups file is a single-column text file that contains the names of the supergroups to include in the filtering.
Note
The group names must exist in the {index_prefix}_raw.json file. They are automatically converted to lowercase.
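As a rough illustration of the lowercase conversion described in the note, the following sketch reads a single-column groups file into a set of normalized names. The function name read_groups is hypothetical, and skipping blank lines and trimming whitespace are assumptions made for the sketch, not documented DBRetina behavior.

```python
from pathlib import Path


def read_groups(path):
    """Read a single-column groups file into a set of lowercase names.

    Hypothetical helper: blank-line skipping and whitespace trimming
    are assumptions, not documented DBRetina behavior.
    """
    names = set()
    for line in Path(path).read_text().splitlines():
        name = line.strip()
        if name:  # ignore empty lines
            names.add(name.lower())
    return names
```

Because names are lowercased, entries such as "Asthma" and "asthma" in the groups file would refer to the same supergroup.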
--clusters-file PATH DBRetina clusters file
The clusters file is generated by the DBRetina cluster command. It is used together with the --cluster-ids argument to filter out all pairwise similarities that are not in the provided clusters.
--cluster-ids TEXT comma-separated list of cluster IDs
The cluster IDs to select from the clusters file. This argument is only used if --clusters-file is provided.
-m, --metric TEXT select from ['containment', 'ochiai', 'jaccard', 'pvalue']
The similarity metric whose values are compared against the cutoff. Pairwise comparisons below the cutoff are filtered out.
-c, --cutoff FLOAT RANGE filter out similarities < cutoff [default: 0.0; 0<=x<=100]
The -c, --cutoff argument is used together with the -m, --metric argument to define the cutoff.
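The interaction between the metric and the cutoff can be sketched as a simple row filter: rows whose value for the chosen metric is below the cutoff are dropped. This is a minimal sketch, not DBRetina's implementation. The column names group_1 and group_2 and the sample values are assumptions, and the sketch only covers the similarity metrics (how the cutoff applies to pvalue is not described here).

```python
import csv
import io


def filter_by_cutoff(rows, metric, cutoff):
    """Keep rows whose value in the `metric` column is >= cutoff."""
    return [row for row in rows if float(row[metric]) >= cutoff]


# Hypothetical pairwise table; the column names are assumptions.
sample = io.StringIO(
    "group_1\tgroup_2\tochiai\n"
    "a\tb\t75.0\n"
    "a\tc\t42.5\n"
    "b\tc\t60.0\n"
)
rows = list(csv.DictReader(sample, delimiter="\t"))
kept = filter_by_cutoff(rows, metric="ochiai", cutoff=60)
print([(r["group_1"], r["group_2"]) for r in kept])  # [('a', 'b'), ('b', 'c')]
```

A pair exactly at the cutoff is kept, since only similarities strictly below the cutoff are filtered out.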
--extend include all supergroups that are linked to the given supergroups.
In pairwise comparisons, imagine a graph with groups as nodes and their relationships as edges. By default, only edges between user-defined groups are considered. The --extend option enlarges this scope to include all nodes directly linked to the user-defined groups, whether those groups come from the --groups-file input or from --clusters-file and --cluster-ids. This option can only be used when group information is provided.
Here's an example of the effect of --extend on the pairwise graph. The user-defined groups are nodes (1,2,5).
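The same idea can be sketched in code on a small hypothetical graph. The edge list below is invented for illustration, and the function select_edges is not part of DBRetina. The sketch assumes that, with --extend, an edge is kept when at least one endpoint is a user-defined group, which is one reading of "directly linked"; the tool's exact rule may differ.

```python
def select_edges(edges, groups, extend=False):
    """Return (kept_edges, nodes) for a set of user-defined groups.

    Assumption: without extend, both endpoints must be user-defined;
    with extend, one user-defined endpoint is enough.
    """
    groups = set(groups)
    if extend:
        kept = [(u, v) for u, v in edges if u in groups or v in groups]
    else:
        kept = [(u, v) for u, v in edges if u in groups and v in groups]
    nodes = {node for edge in kept for node in edge}
    return kept, nodes


# Hypothetical pairwise graph; user-defined groups are nodes 1, 2 and 5.
edges = [(1, 2), (2, 3), (3, 4), (5, 6), (1, 5), (6, 7)]
user_groups = {1, 2, 5}

strict_edges, _ = select_edges(edges, user_groups)
extended_edges, extended_nodes = select_edges(edges, user_groups, extend=True)

print(strict_edges)                          # [(1, 2), (1, 5)]
print(extended_edges)                        # [(1, 2), (2, 3), (5, 6), (1, 5)]
print(sorted(extended_nodes - user_groups))  # [3, 6]
```

In this sketch, nodes 3 and 6 are the newly included groups, while nodes 4 and 7 stay out because they are not directly linked to any user-defined group.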
-o, --output TEXT output file prefix [required]
The prefix used for the output files.
Usage examples
Example 1: filter out all pairwise similarities that are below 60% Ochiai similarity.
Example 2: keep only pairwise similarities of at least 97% containment between the groups listed in the groups.tsv file.
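The two examples could be assembled from the options documented above, for instance when calling DBRetina from a script. This is a sketch only: build_filter_command is a hypothetical helper, and the file name pairwise.tsv and the output prefixes are placeholders, not names from the DBRetina documentation. Only flags listed in the help text are used.

```python
import shlex


def build_filter_command(pairwise, output, metric=None, cutoff=None,
                         groups_file=None, extend=False):
    """Build the argument list for `DBRetina filter` (hypothetical helper)."""
    cmd = ["DBRetina", "filter", "-p", pairwise]
    if groups_file is not None:
        cmd += ["-g", groups_file]
    if metric is not None:
        cmd += ["-m", metric]
    if cutoff is not None:
        cmd += ["-c", str(cutoff)]
    if extend:
        cmd.append("--extend")
    cmd += ["-o", output]
    return cmd


# Example 1: Ochiai similarity of at least 60% (placeholder file names).
ochiai_cmd = build_filter_command(
    "pairwise.tsv", "filtered_ochiai", metric="ochiai", cutoff=60)

# Example 2: containment of at least 97%, restricted to groups.tsv.
containment_cmd = build_filter_command(
    "pairwise.tsv", "filtered_groups", metric="containment", cutoff=97,
    groups_file="groups.tsv")

print(shlex.join(ochiai_cmd))
print(shlex.join(containment_cmd))
```

The printed lines are the equivalent shell commands. If DBRetina is installed, either list could be passed to subprocess.run(cmd, check=True) to run the filter.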
Output files
{output_prefix}.tsv
Filtered version of the pairwise TSV file.
{output_prefix}_extended_supergroups.txt
If the --extend argument is used, this file contains the names of the extended supergroups.
Created: July 5, 2023