Pairwise scatter plots

Author

Charlotte Soneson, Michael Stadler

Summary

This document illustrates how to use GGally (a companion package to ggplot2) to:

  • make pairwise scatter plots of columns from a data set
  • customize the default panels

Prepare data

Run the following code to load the packages and data used in this document:

suppressPackageStartupMessages({
    library(tibble)
    library(ggplot2)
    library(GGally)
    library(swissknife)
})

loadExampleData("mycars")
`mycars`: re-encoded version of `datasets::mtcars`
tibble(mycars)
# A tibble: 32 × 13
     mpg cyl    disp    hp  drat    wt  qsec    vs    am  gear  carb
   <dbl> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1  21   6      160    110  3.9   2.62  16.5     0     1     4     4
 2  21   6      160    110  3.9   2.88  17.0     0     1     4     4
 3  22.8 4      108     93  3.85  2.32  18.6     1     1     4     1
 4  21.4 6      258    110  3.08  3.22  19.4     1     0     3     1
 5  18.7 8      360    175  3.15  3.44  17.0     0     0     3     2
 6  18.1 6      225    105  2.76  3.46  20.2     1     0     3     1
 7  14.3 8      360    245  3.21  3.57  15.8     0     0     3     4
 8  24.4 4      147.    62  3.69  3.19  20       1     0     4     2
 9  22.8 4      141.    95  3.92  3.15  22.9     1     0     4     2
10  19.2 6      168.   123  3.92  3.44  18.3     1     0     4     4
# ℹ 22 more rows
# ℹ 2 more variables: engine_shape <fct>, transmission <fct>
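
The swissknife package is installed from GitHub (see the session info below). If it is not available, a rough stand-in for `mycars` can be built directly from `datasets::mtcars`. This is a sketch under assumptions: we assume that `engine_shape` and `transmission` are factor versions of `vs` and `am` with the level labels documented in `?mtcars`, and that `cyl` is simply converted to a factor. The exact encoding used by `loadExampleData()` may differ.

```r
## Approximate stand-in for swissknife's `mycars`, built from datasets::mtcars.
## ASSUMPTION: factor levels follow ?mtcars and may not match swissknife exactly.
mycars <- datasets::mtcars
rownames(mycars) <- NULL                      # the tibble above shows no row names
mycars$cyl <- factor(mycars$cyl)              # number of cylinders as a factor
mycars$engine_shape <- factor(mycars$vs, levels = c(0, 1),
                              labels = c("V-shaped", "straight"))
mycars$transmission <- factor(mycars$am, levels = c(0, 1),
                              labels = c("automatic", "manual"))
str(mycars)
```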

Create basic pairs plot

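We start by using the GGally::ggpairs() function to create a pairs plot with mostly default settings, displaying a subset of the columns from mycars. By default, pairs of continuous variables are shown as scatter plots below the diagonal and as correlation coefficients above it, while the diagonal shows a density estimate for each continuous variable. Panels that combine a continuous variable with the categorical transmission column show faceted histograms below the diagonal and box plots above it; each of these histograms triggers one of the `stat_bin()` messages printed below.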

ggpairs(mycars, 
        columns = c("mpg", "disp", "hp", "wt", "transmission")) + 
    theme_bw(13)
`stat_bin()` using `bins = 30`. Pick better value `binwidth`.
`stat_bin()` using `bins = 30`. Pick better value `binwidth`.
`stat_bin()` using `bins = 30`. Pick better value `binwidth`.
`stat_bin()` using `bins = 30`. Pick better value `binwidth`.
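
The messages above can be avoided by choosing the number of bins explicitly. A minimal sketch: GGally's wrap() fixes extra arguments for a built-in panel function, so `wrap("facethist", bins = 10)` forwards `bins` to the histograms in the lower "combo" panels (the value 10 is an arbitrary choice). To keep the sketch self-contained, it derives a transmission factor from `datasets::mtcars` instead of using mycars.

```r
suppressPackageStartupMessages({
    library(ggplot2)
    library(GGally)
})

## Self-contained stand-in for `mycars` (transmission derived from `am`, see ?mtcars)
cars <- datasets::mtcars
cars$transmission <- factor(cars$am, levels = c(0, 1),
                            labels = c("automatic", "manual"))

## Only the "combo" entry of `lower` is set; all other panel types keep
## their defaults. `bins` is passed on to the histogram layer.
p <- ggpairs(cars,
             columns = c("mpg", "disp", "hp", "wt", "transmission"),
             lower = list(combo = wrap("facethist", bins = 10)),
             progress = FALSE) +
    theme_bw(13)
p
```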

Add aesthetics to the panels

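Next, we add a mapping (aesthetics) that is applied to all panels, here coloring by transmission type. The mapped variable does not need to be among the displayed columns. Note that with a color mapping, the correlation panels above the diagonal report the correlation within each group in addition to the overall correlation.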

ggpairs(mycars, 
        columns = c("mpg", "disp", "hp", "wt"),
        mapping = aes(colour = transmission)) + 
    theme_bw(13)

Customize individual panels

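Specifying the mapping as above modifies all panels in the pairs plot. We can also customize panels separately, by passing our own plotting function for a given region of the matrix (lower, upper or diag) and data type. The function receives the data and the mapping for a panel and must return a ggplot object. Here we illustrate this by coloring the points in the panels below the diagonal and adding a linear regression line for each group, while the panels above the diagonal still display only the overall correlation.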

## Define plot function for the scatter plots below the diagonal.
## ggpairs() calls it once per panel, with the full data and a mapping
## that sets the x and y variables of that panel
plotpoints <- function(data, mapping, ...) {
    ggplot(data = data, mapping = mapping) +
        aes(color = transmission) + 
        geom_smooth(method = "lm", se = FALSE, 
                    formula = y ~ x,
                    linetype = "dashed") + 
        geom_point(alpha = 0.5, size = 3) +
        theme_bw(13)
}

## Use this function in the ggpairs call
ggpairs(mycars, 
        lower = list(continuous = plotpoints),
        columns = c("mpg", "disp", "hp", "wt"),
        progress = FALSE) + 
    theme_bw(13) + 
    labs(title = "A subset of the mtcars data set",
         subtitle = "Lines represent linear regression")
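
Besides setting functions per region, GGally lets us replace a single panel after the matrix has been built. A minimal sketch using GGally's subsetting methods for ggmatrix objects: `pm[i, j]` extracts the panel in row i and column j as a ggplot object, and `pm[i, j] <- value` puts a modified plot back (equivalent to getPlot() and putPlot()). The added regression line is an arbitrary example, and `datasets::mtcars` stands in for mycars to keep the sketch self-contained.

```r
suppressPackageStartupMessages({
    library(ggplot2)
    library(GGally)
})

cars <- datasets::mtcars  # stand-in for `mycars`

pm <- ggpairs(cars, columns = c("mpg", "disp", "hp", "wt"),
              progress = FALSE)

## Extract the panel in row 2, column 1 (disp versus mpg) ...
panel <- pm[2, 1]
## ... modify it like any other ggplot object, and put it back
pm[2, 1] <- panel + geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
pm
```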

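We can also replace the default correlation panels above the diagonal with a custom function. The version below still displays the overall Pearson correlation, but scales the font size with the absolute correlation and colors the panel background according to its strength and sign (red for positive, blue for negative correlations).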

## Define correlation function for the panels above the diagonal
cor_fcn <- function(data, mapping, ...) {
    ## Get data
    xData <- GGally::eval_data_col(data, mapping$x)
    yData <- GGally::eval_data_col(data, mapping$y)
    
    ## Calculate correlation
    corr <- cor(xData, yData, method = "pearson")
    
    ## Define background color: positive correlations are mapped from the
    ## neutral middle of the "RdBu" palette towards red, negative ones
    ## towards blue. The colorRamp() function creates a function that maps
    ## the interval [0, 1] to colors (returned as an RGB matrix)
    pal <- hcl.colors(n = 11, palette = "RdBu")
    cols <- if (corr >= 0) pal[6:2] else pal[6:10]
    col <- rgb(colorRamp(cols)(abs(corr)), maxColorValue = 255)

    ## Construct plot
    ggplot(data = data, mapping = mapping) +
        annotate(x = 0.5, y = 0.5, 
                 label = paste0("Corr: ", round(corr, digits = 3)),
                 geom = "text",
                 size = abs(corr) * 5 + 1) +
        theme_bw(13) + xlim(c(0, 1)) + ylim(c(0, 1)) +
        theme(panel.background = element_rect(fill = col),
              panel.grid.major = element_blank(),
              panel.grid.minor = element_blank())
}

ggpairs(mycars, 
        lower = list(continuous = plotpoints),
        upper = list(continuous = cor_fcn),
        columns = c("mpg", "disp", "hp", "wt"),
        progress = FALSE) + 
    labs(title = "A subset of the mtcars data set",
         subtitle = "Lines represent linear regression")
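
To check the background colors of cor_fcn() in isolation, the same color logic can be pulled out into a small helper. `corr_to_col()` is a hypothetical name introduced here, not part of GGally; it reproduces the palette indexing and the colorRamp() call, so a correlation of 0 should give the neutral middle color of the palette, while correlations of +1 and -1 give the second (red end) and tenth (blue end) palette colors.

```r
## Hypothetical helper: map a correlation in [-1, 1] to a background color
corr_to_col <- function(corr) {
    pal <- hcl.colors(n = 11, palette = "RdBu")
    ## positive: middle (6) towards red (2); negative: middle towards blue (10)
    cols <- if (corr >= 0) pal[6:2] else pal[6:10]
    ## colorRamp() returns a function mapping [0, 1] to RGB values in 0-255
    rgb(colorRamp(cols)(abs(corr)), maxColorValue = 255)
}

pal <- hcl.colors(n = 11, palette = "RdBu")
corr_to_col(0)     # neutral middle color of the palette
corr_to_col(0.9)   # close to the red end
corr_to_col(-0.9)  # close to the blue end
```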

Remarks

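  • It is possible to define customized panels separately for continuous, discrete and ‘combo’ (one continuous and one discrete variable) data types, and separately for the panels below, above and on the diagonal; see the ggpairs() documentation (?ggpairs) for more details.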
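
As a sketch of such per-type settings, the call below passes one list per region, with one entry per data type. It uses names of GGally's built-in panel functions ("smooth", "facetdensity", "facetbar", "cor", "box_no_facet", "count", "densityDiag", "barDiag"); the particular choice of panels is arbitrary, and `datasets::mtcars` with factor versions of `cyl` and `am` stands in for mycars.

```r
suppressPackageStartupMessages({
    library(ggplot2)
    library(GGally)
})

## Self-contained stand-in for `mycars` (factor encodings follow ?mtcars)
cars <- datasets::mtcars
cars$cyl <- factor(cars$cyl)
cars$transmission <- factor(cars$am, levels = c(0, 1),
                            labels = c("automatic", "manual"))

## continuous = two numeric variables, discrete = two factors,
## combo = one numeric and one factor variable
pm <- ggpairs(cars,
              columns = c("mpg", "wt", "cyl", "transmission"),
              lower = list(continuous = wrap("smooth", alpha = 0.5),
                           combo = "facetdensity",
                           discrete = "facetbar"),
              upper = list(continuous = wrap("cor", stars = FALSE),
                           combo = "box_no_facet",
                           discrete = "count"),
              diag = list(continuous = "densityDiag",
                          discrete = "barDiag"),
              progress = FALSE)
pm
```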

Session info

sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.5.2 (2025-10-31)
 os       macOS Sequoia 15.7.1
 system   aarch64, darwin20
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       Europe/Zurich
 date     2025-11-03
 pandoc   3.6.4 @ /usr/local/bin/ (via rmarkdown)
 quarto   1.7.28 @ /usr/local/bin/quarto

─ Packages ───────────────────────────────────────────────────────────────────
 package              * version date (UTC) lib source
 abind                  1.4-8   2024-09-12 [1] CRAN (R 4.5.0)
 Biobase                2.70.0  2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 BiocGenerics           0.56.0  2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 BiocParallel           1.44.0  2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 cli                    3.6.5   2025-04-23 [1] CRAN (R 4.5.0)
 codetools              0.2-20  2024-03-31 [2] CRAN (R 4.5.2)
 DelayedArray           0.36.0  2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 dichromat              2.0-0.1 2022-05-02 [1] CRAN (R 4.5.0)
 digest                 0.6.37  2024-08-19 [1] CRAN (R 4.5.0)
 dplyr                  1.1.4   2023-11-17 [1] CRAN (R 4.5.0)
 evaluate               1.0.5   2025-08-27 [1] CRAN (R 4.5.1)
 farver                 2.1.2   2024-05-13 [1] CRAN (R 4.5.0)
 fastmap                1.2.0   2024-05-15 [1] CRAN (R 4.5.0)
 fs                     1.6.6   2025-04-12 [1] CRAN (R 4.5.0)
 generics               0.1.4   2025-05-09 [1] CRAN (R 4.5.0)
 GenomeInfoDb           1.46.0  2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 GenomicRanges          1.62.0  2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 GGally               * 2.4.0   2025-08-23 [1] CRAN (R 4.5.0)
 ggplot2              * 4.0.0   2025-09-11 [1] CRAN (R 4.5.0)
 ggstats                0.11.0  2025-09-15 [1] CRAN (R 4.5.0)
 glue                   1.8.0   2024-09-30 [1] CRAN (R 4.5.0)
 gtable                 0.3.6   2024-10-25 [1] CRAN (R 4.5.0)
 htmltools              0.5.8.1 2024-04-04 [1] CRAN (R 4.5.0)
 htmlwidgets            1.6.4   2023-12-06 [1] CRAN (R 4.5.0)
 httr                   1.4.7   2023-08-15 [1] CRAN (R 4.5.0)
 IRanges                2.44.0  2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 jsonlite               2.0.0   2025-03-27 [1] CRAN (R 4.5.0)
 KernSmooth             2.23-26 2025-01-01 [2] CRAN (R 4.5.2)
 knitr                  1.50    2025-03-16 [1] CRAN (R 4.5.0)
 labeling               0.4.3   2023-08-29 [1] CRAN (R 4.5.0)
 lattice                0.22-7  2025-04-02 [1] CRAN (R 4.5.0)
 lifecycle              1.0.4   2023-11-07 [1] CRAN (R 4.5.0)
 magrittr               2.0.4   2025-09-12 [1] CRAN (R 4.5.1)
 Matrix                 1.7-4   2025-08-28 [2] CRAN (R 4.5.2)
 MatrixGenerics         1.22.0  2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 matrixStats            1.5.0   2025-01-07 [1] CRAN (R 4.5.0)
 mgcv                   1.9-3   2025-04-04 [1] CRAN (R 4.5.0)
 nlme                   3.1-168 2025-03-31 [2] CRAN (R 4.5.2)
 pillar                 1.11.1  2025-09-17 [1] CRAN (R 4.5.0)
 pkgconfig              2.0.3   2019-09-22 [1] CRAN (R 4.5.0)
 png                    0.1-8   2022-11-29 [1] CRAN (R 4.5.0)
 purrr                  1.1.0   2025-07-10 [1] CRAN (R 4.5.1)
 R6                     2.6.1   2025-02-15 [1] CRAN (R 4.5.0)
 RColorBrewer           1.1-3   2022-04-03 [1] CRAN (R 4.5.0)
 Rcpp                   1.1.0   2025-07-02 [1] CRAN (R 4.5.0)
 rlang                  1.1.6   2025-04-11 [1] CRAN (R 4.5.0)
 rmarkdown              2.30    2025-09-28 [1] CRAN (R 4.5.0)
 rstudioapi             0.17.1  2024-10-22 [1] CRAN (R 4.5.0)
 S4Arrays               1.10.0  2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 S4Vectors              0.48.0  2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 S7                     0.2.0   2024-11-07 [1] CRAN (R 4.5.0)
 scales                 1.4.0   2025-04-24 [1] CRAN (R 4.5.0)
 Seqinfo                1.0.0   2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 sessioninfo            1.2.3   2025-02-05 [1] CRAN (R 4.5.0)
 SparseArray            1.10.1  2025-10-31 [1] Bioconductor 3.22 (R 4.5.1)
 SummarizedExperiment   1.40.0  2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 swissknife           * 0.44    2025-11-03 [1] Github (fmicompbio/swissknife@c69e512)
 tibble               * 3.3.0   2025-06-08 [1] CRAN (R 4.5.0)
 tidyr                  1.3.1   2024-01-24 [1] CRAN (R 4.5.0)
 tidyselect             1.2.1   2024-03-11 [1] CRAN (R 4.5.0)
 UCSC.utils             1.6.0   2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 usethis                3.2.1   2025-09-06 [1] CRAN (R 4.5.0)
 utf8                   1.2.6   2025-06-08 [1] CRAN (R 4.5.0)
 vctrs                  0.6.5   2023-12-01 [1] CRAN (R 4.5.0)
 withr                  3.0.2   2024-10-28 [1] CRAN (R 4.5.0)
 xfun                   0.54    2025-10-30 [1] CRAN (R 4.5.0)
 XVector                0.50.0  2025-10-29 [1] Bioconductor 3.22 (R 4.5.1)
 yaml                   2.3.10  2024-07-26 [1] CRAN (R 4.5.0)

 [1] /Users/stadler/Library/R/arm64/4.5/library/__bioc322
 [2] /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library
 * ── Packages attached to the search path.

──────────────────────────────────────────────────────────────────────────────