Filtering pairwise similarities

The filter command in DBRetina filters a pairwise TSV file by a similarity cutoff, by a set of groups, or by both. The command requires the full path of the pairwise TSV file.

Usage: DBRetina filter [OPTIONS]

  Filter a pairwise file.

  Detailed description:

      Filter a pairwise file by similarity cutoff and/or a set of groups
      (provided as a single-column file or cluster IDs in a DBRetina cluster

  -p, --pairwise PATH       the pairwise TSV file  [required]
  -g, --groups-file PATH    single-column supergroups file
  --clusters-file PATH      DBRetina clusters file
  --cluster-ids TEXT        comma-separated list of cluster IDs
  -m, --metric TEXT         select from ['containment', 'ochiai', 'jaccard',
                            'pvalue']
  -c, --cutoff FLOAT RANGE  filter out similarities < cutoff  [0<=x<=100]
  --extend                  include all supergroups that are linked to the
                            given supergroups.
  -o, --output TEXT         output file prefix  [required]
  --help                    Show this message and exit.

Command arguments

-p, --pairwise PATH the pairwise TSV file [required]

The pairwise file to be filtered. This file can be generated by the DBRetina pairwise or DBRetina filter commands.

-g, --groups-file PATH single-column supergroups file

The groups file is a single-column text file that contains the names of the supergroups to be included in the filtering.

The group names must exist in the {index_prefix}_raw.json file. They are automatically converted to lowercase.
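As a rough illustration of the groups file format, the sketch below reads a single-column list of names and lowercases them, mirroring the automatic conversion described above. The function name `read_groups` and the sample names are hypothetical and are not part of DBRetina.

```python
import io

def read_groups(handle):
    """Read a single-column groups file and return the names in lowercase."""
    groups = set()
    for line in handle:
        name = line.strip()
        if name:  # skip blank lines
            groups.add(name.lower())
    return groups

# Hypothetical file contents, used here in place of a real groups file.
example = io.StringIO("Pathway_A\npathway_b\n\nPATHWAY_C\n")
groups = read_groups(example)
print(sorted(groups))
```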

--clusters-file PATH DBRetina clusters file

The clusters file is generated from the command DBRetina cluster. This is used alongside the --cluster-ids argument to filter out all pairwise similarities that are not in the provided clusters.

--cluster-ids TEXT comma-separated list of cluster IDs

The cluster IDs selected from the clusters file. This argument is only used if the --clusters-file is provided.
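The selection by cluster ID can be pictured with the minimal sketch below. It assumes cluster IDs are integers and represents cluster membership as an in-memory dictionary, because the layout of the clusters file is not described on this page. The helper names are hypothetical.

```python
def parse_cluster_ids(text):
    """Turn a comma-separated string such as '1,2,8' into a set of integer IDs."""
    return {int(part) for part in text.split(",") if part.strip()}

def groups_in_clusters(clusters, cluster_ids):
    """Collect the member groups of the selected clusters."""
    selected = set()
    for cid in cluster_ids:
        selected.update(clusters.get(cid, ()))
    return selected

# Hypothetical cluster membership (cluster ID -> group names).
clusters = {1: {"a", "b"}, 2: {"c"}, 3: {"d", "e"}, 8: {"f"}}
ids = parse_cluster_ids("1,2,8")
selected_groups = groups_in_clusters(clusters, ids)
print(sorted(selected_groups))
```

Only pairwise similarities between the selected groups would then be kept, in the same way as with a groups file.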

-m, --metric TEXT select from ['containment', 'ochiai', 'jaccard', 'pvalue']

The similarity metric that the cutoff is applied to. Pairwise comparisons whose value for this metric is below the cutoff are filtered out.

-c, --cutoff FLOAT RANGE filter out similarities < cutoff [default: 0.0; 0<=x<=100]

The cutoff is applied to the metric selected with -m, --metric: pairwise similarities with a value below the cutoff are removed. The cutoff must be between 0 and 100.
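The interaction between the metric and the cutoff can be sketched as follows. The column names (`group_1`, `group_2`, `ochiai`) and the sample values are hypothetical, since the actual layout of the pairwise TSV file is not shown on this page. This is an illustration of the rule "filter out similarities < cutoff", not DBRetina's implementation.

```python
import csv
import io

def filter_by_cutoff(rows, metric, cutoff):
    """Keep rows whose value for `metric` is at least `cutoff`."""
    return [row for row in rows if float(row[metric]) >= cutoff]

# Hypothetical pairwise table; real DBRetina column names may differ.
tsv = io.StringIO(
    "group_1\tgroup_2\tochiai\n"
    "a\tb\t75.0\n"
    "a\tc\t59.9\n"
    "b\tc\t60.0\n"
)
rows = list(csv.DictReader(tsv, delimiter="\t"))
kept = filter_by_cutoff(rows, "ochiai", 60)
print([(r["group_1"], r["group_2"]) for r in kept])
```

Note that a similarity exactly equal to the cutoff is kept, because only values strictly below the cutoff are filtered out.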

--extend include all supergroups that are linked to the given supergroups.

Think of the pairwise comparisons as a graph with groups as nodes and their similarities as edges. By default, only edges between user-defined groups are kept. The --extend option enlarges this scope to include all nodes directly linked to the user-defined groups, whether those groups come from --groups-file or from --clusters-file and --cluster-ids. This option can only be used when groups information is provided.

Here's an example of the effect of --extend on the pairwise graph. The user-defined groups are nodes (1,2,5).

[Figure: effect of --extend on the pairwise graph]
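The same idea can be sketched in code. The edge list below is hypothetical and is not the graph from the figure; only the user-defined groups (1, 2, 5) are taken from the example. The sketch assumes that extending keeps every edge with at least one endpoint in the user-defined groups. Whether DBRetina also keeps edges between two newly added neighbours is not stated on this page.

```python
def filter_edges(edges, groups, extend=False):
    """Return the kept edges and the set of nodes they touch.

    Without extend, both endpoints must be user-defined groups.
    With extend (assumed behaviour), one endpoint is enough.
    """
    groups = set(groups)
    if extend:
        kept = [(a, b) for a, b in edges if a in groups or b in groups]
    else:
        kept = [(a, b) for a, b in edges if a in groups and b in groups]
    nodes = {node for edge in kept for node in edge}
    return kept, nodes

# Hypothetical pairwise graph.
edges = [(1, 2), (2, 3), (3, 4), (5, 6), (1, 5), (4, 7)]
user_groups = {1, 2, 5}

strict_edges, strict_nodes = filter_edges(edges, user_groups)
ext_edges, ext_nodes = filter_edges(edges, user_groups, extend=True)

print(strict_edges)
print(ext_edges)
print(sorted(ext_nodes - user_groups))  # groups added by extending
```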

-o, --output TEXT output file prefix [required]

The prefix used for the names of the output files.

Usage examples

  DBRetina filter -p pairwise.tsv -m ochiai -c 60 -o filtered_pairwise

This will filter out all pairwise similarities that are below 60% Ochiai similarity.

  DBRetina filter -p pairwise.tsv -m containment -c 97 -g groups.tsv -o filtered_pairwise

This will keep only the pairwise similarities that have at least 97% containment similarity AND are between groups listed in the groups.tsv file.

  DBRetina filter -p pairwise.tsv --clusters-file clusters.tsv --cluster-ids 1,2,8 -o filtered_pairwise

This will filter out all pairwise similarities that are not in the clusters with IDs 1, 2, and 8.

  DBRetina filter -p pairwise.tsv -g groups.tsv -o filtered_pairwise

This will filter out all pairwise similarities that are between groups that are not in the groups.tsv file. In other words, it will only keep the similarities between groups that are in the groups.tsv file.

Output files


Filtered version of the pairwise TSV file.


If the --extend argument is used, a second output file will contain the names of the extended supergroups.

Last update: July 8, 2023
Created: July 5, 2023
Authors: mr-eyes