
2. Pairwise

The pairwise command in DBRetina performs pairwise comparisons between supergroups based on their shared features. The index prefix is the only required input; the number of cores, the distance metric, and the cutoff are optional.

Usage: DBRetina pairwise [OPTIONS]

  Calculate pairwise distances.

Options:
  -i, --index-prefix TEXT   Index file prefix  [required]
  -t, --threads INTEGER     number of cores
  -d, --dist-type TEXT      select from ['min_cont', 'avg_cont', 'max_cont',
                            'ochiai', 'jaccard']  [default: max_cont]
  -c, --cutoff FLOAT RANGE  filter out distances < cutoff  [default: 0.0;
                            0<=x<=100]
  --help                    Show this message and exit.
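When the command is called from a script, it can help to assemble and validate the arguments before launching it. The following is a minimal Python sketch, not part of DBRetina: the helper name `build_pairwise_command` and the prefix `my_index` are hypothetical, and only the flags listed in the help text above are used.

```python
# Values accepted by -d/--dist-type, copied from the help text above.
DIST_TYPES = ("min_cont", "avg_cont", "max_cont", "ochiai", "jaccard")


def build_pairwise_command(index_prefix, threads=None, dist_type="max_cont", cutoff=0.0):
    """Return the argv list for `DBRetina pairwise` (hypothetical helper)."""
    if dist_type not in DIST_TYPES:
        raise ValueError(f"dist_type must be one of {DIST_TYPES}")
    if not 0 <= cutoff <= 100:
        raise ValueError("cutoff must satisfy 0 <= cutoff <= 100")
    cmd = ["DBRetina", "pairwise", "-i", index_prefix]
    if threads is not None:
        # -t is optional; it is only passed when a value is given.
        cmd += ["-t", str(threads)]
    cmd += ["-d", dist_type, "-c", str(cutoff)]
    return cmd


cmd = build_pairwise_command("my_index", threads=4, dist_type="jaccard", cutoff=10)
print(" ".join(cmd))

# To actually run it (requires DBRetina to be installed and the index to exist):
# import subprocess
# subprocess.run(cmd, check=True)
```

Validating the metric name and cutoff range up front mirrors the constraints shown in the help text, so a typo fails before any computation starts.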

2.1 Command arguments

-i, --index-prefix TEXT Index file prefix [required]

This is the user-defined prefix that was used in the indexing step.

-t, --threads INTEGER number of cores

The number of processing cores to be used for parallel computation during the pairwise comparisons.

-d, --dist-type TEXT select from ['min_cont', 'avg_cont', 'max_cont', 'ochiai', 'jaccard'] [default: max_cont]

-c, --cutoff FLOAT RANGE filter out distances < cutoff [default: 0.0; 0<=x<=100]

The -d and -c options select the distance metric and the cutoff applied to it. Pairwise comparisons whose value for the selected metric is lower than the cutoff are filtered out.
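To make the metric names concrete, here is a minimal Python sketch of how such values could be computed for two feature sets and then filtered by a cutoff. The formulas are the usual textbook definitions (containment as the shared fraction of one set, Ochiai as the shared count over the geometric mean of the set sizes, Jaccard as intersection over union), expressed as percentages to match the 0 to 100 cutoff range. That DBRetina computes them in exactly this way is an assumption, and the function name and sample feature sets are hypothetical.

```python
import math


def pairwise_metrics(a, b):
    """Metrics (in percent) for two non-empty feature sets; textbook formulas."""
    shared = len(a & b)
    cont_a = 100.0 * shared / len(a)  # share of a's features also found in b
    cont_b = 100.0 * shared / len(b)  # share of b's features also found in a
    return {
        "shared_features": shared,
        "min_cont": min(cont_a, cont_b),
        "avg_cont": (cont_a + cont_b) / 2,
        "max_cont": max(cont_a, cont_b),
        "ochiai": 100.0 * shared / math.sqrt(len(a) * len(b)),
        "jaccard": 100.0 * shared / len(a | b),
    }


# Invented feature sets for two supergroups.
group_1 = {"f1", "f2", "f3", "f4"}
group_2 = {"f3", "f4"}
m = pairwise_metrics(group_1, group_2)

# Mirror -d/-c: keep the pair only if the chosen metric is not below the cutoff.
dist_type, cutoff = "max_cont", 60.0
keep = m[dist_type] >= cutoff
print(m, keep)
```

In this example the same pair would be dropped under `min_cont` or `jaccard` at the same cutoff, which shows why the choice of metric and the cutoff need to be considered together.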


2.2 Output files format

{index_prefix}_DBRetina_pairwise.tsv

A TSV file that provides information about shared features between each pair of supergroups. The TSV columns are defined as follows:

Column            Description
group_1_ID        ID of the first supergroup in a pair
group_2_ID        ID of the second supergroup in a pair
group_1_name      name of the first supergroup in a pair
group_2_name      name of the second supergroup in a pair
shared_features   number of features shared between the two supergroups
min_containment   minimum containment between the two supergroups
avg_containment   average containment between the two supergroups
max_containment   maximum containment between the two supergroups
ochiai            Ochiai distance between the two supergroups
jaccard           Jaccard distance between the two supergroups
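The pairwise TSV can be post-processed with the Python standard library. The sketch below assumes the file is tab-separated with a header row whose names match the columns listed above; the helper `load_pairs` is hypothetical, and the two sample rows are invented for illustration and are not DBRetina output.

```python
import csv
import io

# Invented sample standing in for the pairwise TSV file.
sample = (
    "group_1_ID\tgroup_2_ID\tgroup_1_name\tgroup_2_name\tshared_features\t"
    "min_containment\tavg_containment\tmax_containment\tochiai\tjaccard\n"
    "1\t2\tsetA\tsetB\t2\t50.0\t75.0\t100.0\t70.7\t50.0\n"
    "1\t3\tsetA\tsetC\t1\t10.0\t17.5\t25.0\t15.8\t7.7\n"
)


def load_pairs(handle, metric="jaccard", min_value=0.0):
    """Yield rows whose `metric` column is at least `min_value`."""
    for row in csv.DictReader(handle, delimiter="\t"):
        if float(row[metric]) >= min_value:
            yield row


# With a real file, pass open(path, newline="") instead of io.StringIO(sample).
strong = list(load_pairs(io.StringIO(sample), metric="jaccard", min_value=20.0))
for row in strong:
    print(row["group_1_name"], row["group_2_name"], row["jaccard"])
```

Because every metric is stored as its own column, a stricter threshold can be applied after the fact without rerunning the pairwise step.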

Output PNG files: histograms of pairwise distances

{index_prefix}_DBRetina_distance_metrics_plot_log.png

A clustered bar chart that illustrates the frequency distribution of the five distance metrics (min_cont, avg_cont, max_cont, ochiai, and jaccard) across distance ranges. The y-axis uses a logarithmic scale to accommodate the wide range of frequencies observed in the data.

{index_prefix}_DBRetina_distance_metrics_plot_linear.png

Same as above, but the y-axis is displayed on a linear scale.
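The difference between the two plots can be illustrated with a small Python sketch that counts values per distance range and then log-transforms the counts. The bin width of 10, the helper name `bin_counts`, and the sample values are assumptions made for illustration; the source does not state how DBRetina bins the distances.

```python
import math
from collections import Counter


def bin_counts(values, width=10):
    """Count values per [start, start + width) range; 100 falls in the last bin."""
    counts = Counter()
    for v in values:
        start = min(int(v // width) * width, 100 - width)
        counts[start] += 1
    return dict(sorted(counts.items()))


jaccard_values = [3.0, 7.5, 12.0, 50.0, 55.0, 58.0, 100.0]  # invented values

linear = bin_counts(jaccard_values)  # bar heights on a linear y-axis
log_scaled = {start: math.log10(n) for start, n in linear.items()}  # log y-axis

print(linear)
print(log_scaled)
```

On real data, where a few ranges can hold orders of magnitude more pairs than others, the log-transformed heights keep the sparse ranges visible.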