Scatter plots with many points

Author

Michael Stadler, Charlotte Soneson

Summary

This document illustrates how to use ggplot2 and scattermore to make a scatter plot with many data points that:

  • shows the density of points without saturation
  • is fast to view and arrange (for example when assembling figure panels)
  • can be stored in a compact file without loss of quality

Prepare data

Run the following code to prepare the data used in this document:

suppressPackageStartupMessages({
    library(ggplot2)
    library(tibble)
})

# built-in `diamonds` dataset from the `ggplot2` package (see ?ggplot2::diamonds)
tibble(diamonds)
# A tibble: 53,940 × 10
   carat cut       color clarity depth table price     x     y     z
   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
 1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
 2  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
 3  0.23 Good      E     VS1      56.9    65   327  4.05  4.07  2.31
 4  0.29 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
 5  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75
 6  0.24 Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
 7  0.24 Very Good I     VVS1     62.3    57   336  3.95  3.98  2.47
 8  0.26 Very Good H     SI1      61.9    55   337  4.07  4.11  2.53
 9  0.22 Fair      E     VS2      65.1    61   337  3.87  3.78  2.49
10  0.23 Very Good H     VS1      59.4    61   338  4     4.05  2.39
# ℹ 53,930 more rows

Create figure

Load packages

library(ggplot2)
library(scattermore)

Plot

Let’s first create a simple scatter plot to illustrate the problems. The diamonds dataset has 53,940 observations. If the plot is saved with a vector graphics device such as pdf() or svg(), each observation is stored as an individual graphic element, so the file will be large and slow to open or arrange. Furthermore, the high number of overlapping data points leads to saturation, and we cannot see the underlying density of the data.

# create base plot
gg <- ggplot(data = diamonds, mapping = aes(x = carat, y = price)) +
    labs(x = "Weight of the diamond (carat)", y = "Price (US dollars)") +
    theme_bw(20) +
    theme(panel.grid = element_blank(),
          legend.position = "bottom")
gg + geom_point()

A simple way to reduce the saturation issue is to use transparency, so that overlapping observations lead to darker colors. However, this does not yet solve the “many points” problem: every observation is still drawn as an individual element.

# ... with transparency
gg + geom_point(color = alpha("black", 0.02))
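
To get a feeling for how quickly transparency saturates, the following sketch computes the combined opacity of several points stacked on the same spot. It assumes that the graphics device uses standard "over" alpha compositing, which is an assumption about the device and not something controlled by ggplot2. The variable names are illustrative.

```r
# opacity of a single point, as used in the plot above
alpha_single <- 0.02

# number of points stacked on the same pixel
n_overlap <- c(1, 10, 50, 100, 250)

# assuming "over" compositing, each added point covers a fraction
# alpha_single of whatever is still visible of the background
opacity <- 1 - (1 - alpha_single)^n_overlap

data.frame(n_overlap, opacity = round(opacity, 3))
```

Under this assumption, a little over 110 overlapping points are needed to reach 90% opacity, so differences between very dense regions become hard to see even with a low alpha value.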

A simple way to also solve the “many points” problem is to avoid showing individual observations and instead show the local density of points using a color scale.

# ... with a 2D density estimate shown as filled contours (linear levels)
gg + geom_density_2d_filled(bins = 48) +
    coord_cartesian(expand = FALSE) +
    theme(legend.position = "none")
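
To see how unevenly the density is distributed over the plotting area, one can look at the estimated density directly. The sketch below uses MASS::kde2d(), which the ggplot2 documentation names as the estimator behind its 2D density layers. The 100 x 100 grid and the 1% threshold are arbitrary choices for illustration, and the values are not guaranteed to match the plotted layer exactly.

```r
library(ggplot2)  # provides the diamonds dataset

# 2D kernel density estimate on a regular 100 x 100 grid
dens <- MASS::kde2d(diamonds$carat, diamonds$price, n = 100)

# scale to [0, 1] by dividing by the maximum
z_norm <- dens$z / max(dens$z)

# fraction of grid cells with less than 1% of the peak density
mean(z_norm < 0.01)
```

A large fraction here means that most of the plotting area falls into the lowest few of a set of linearly spaced color intervals.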

Linearly spaced contour levels (controlled by bins or breaks) may not work well in a case like ours: the density is very high in a few regions, which then occupy almost the complete color scale, and we lose resolution in low-density regions. You can use breaks to create non-linear intervals instead. Here, breaks is combined with contour_var = "ndensity", which scales the density to the known range [0, 1], and with theme(panel.background) to give the area below the lowest break (density close to zero) a dark background color.

# ... with a 2D density estimate using log-spaced breaks on the normalized density
gg + geom_density_2d_filled(contour_var = "ndensity",
                            breaks = exp(seq(log(1e-4), log(1), length.out = 64))) +
    coord_cartesian(expand = FALSE) +
    theme(legend.position = "none",
          panel.background = element_rect(fill = hcl.colors(64)[1]))
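
The effect of the log-spaced breaks can be checked without plotting. The following sketch compares them with linearly spaced breaks over the same [0, 1] range and counts how many fall into the low-density part of the scale. The cutoff of 0.05 is an arbitrary value chosen for illustration.

```r
# 64 linearly spaced breaks over the normalized density range
linear_breaks <- seq(0, 1, length.out = 64)

# 64 log-spaced breaks between 1e-4 and 1, as used in the plot above
log_breaks <- exp(seq(log(1e-4), log(1), length.out = 64))

# number of breaks below 5% of the peak density
c(linear = sum(linear_breaks < 0.05),
  log    = sum(log_breaks < 0.05))
```

The log-spaced breaks place many more levels in the low-density range, which is where the linear levels lose resolution.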

Finally, if you prefer to show individual observations and need something that will scale to millions of points without getting slow or hard to use, scattermore provides a solution for that too.

# ... with individual points rasterized by scattermore
gg + geom_scattermore(pointsize = 2, alpha = 0.02)
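
To check the effect on file size, the same data can be saved once with geom_point() and once with geom_scattermore() and the resulting PDF files compared. This is a minimal sketch: the temporary file names and the 6 x 5 inch figure size are illustrative choices, and the exact sizes depend on the R version and PDF device settings.

```r
library(ggplot2)
library(scattermore)

p <- ggplot(diamonds, aes(x = carat, y = price))

# temporary output files (hypothetical names, removed with the R session)
file_vector <- tempfile(fileext = ".pdf")
file_raster <- tempfile(fileext = ".pdf")

# every point stored as an individual vector element
ggsave(file_vector, p + geom_point(), width = 6, height = 5)

# points stored as a single bitmap, axes and labels remain vector graphics
ggsave(file_raster, p + geom_scattermore(pointsize = 2), width = 6, height = 5)

# file sizes in bytes
sizes <- file.size(c(file_vector, file_raster))
names(sizes) <- c("geom_point", "geom_scattermore")
sizes
```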

Remarks

  • scattermore does its magic by rendering only the point layer into a bitmap while keeping all other layers as they are. This allows you to create pdf() files in which the axes and labels remain readable at any magnification.
  • scattermore provides two ggplot2 layers: geom_scattermore(), which behaves mostly like geom_point(), and geom_scattermost(), which avoids much of the overhead of a normal ggplot2 layer and is thus even more efficient. The price is a slightly different interface: geom_scattermost() needs to get the data directly as the xy argument.
  • An alternative, maybe even more convenient drop-in replacement for geom_point() that follows a similar strategy is geom_point_rast() from the ggrastr package, which also offers rasterized versions of other ggplot2 geoms. This package also provides the rasterize() function, which takes an existing ggplot2 plot object and rasterizes the suitable layers in it. Compared to scattermore, however, ggrastr does not seem to be as fast or to scale as well.
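
The different interface of geom_scattermost() mentioned above can be sketched as follows. The coordinates are passed as a two-column matrix instead of being mapped with aes(). The point size, transparency and axis labels are illustrative choices that mirror the earlier plots.

```r
library(ggplot2)
library(scattermore)

# geom_scattermost() takes the coordinates directly as a two-column matrix
xy <- cbind(diamonds$carat, diamonds$price)

p_most <- ggplot() +
    geom_scattermost(xy,
                     color = alpha("black", 0.02),  # transparency encoded in the color
                     pointsize = 2) +
    labs(x = "Weight of the diamond (carat)", y = "Price (US dollars)") +
    theme_bw(20)
p_most
```

Because no data or aesthetic mapping is given to ggplot(), the axis labels have to be set explicitly with labs().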

Session info

sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.5.2 (2025-10-31)
 os       macOS Sequoia 15.7.1
 system   aarch64, darwin20
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       Europe/Zurich
 date     2025-11-03
 pandoc   3.6.4 @ /usr/local/bin/ (via rmarkdown)
 quarto   1.7.28 @ /usr/local/bin/quarto

─ Packages ───────────────────────────────────────────────────────────────────
 package      * version date (UTC) lib source
 cli            3.6.5   2025-04-23 [1] CRAN (R 4.5.0)
 dichromat      2.0-0.1 2022-05-02 [1] CRAN (R 4.5.0)
 digest         0.6.37  2024-08-19 [1] CRAN (R 4.5.0)
 dplyr          1.1.4   2023-11-17 [1] CRAN (R 4.5.0)
 evaluate       1.0.5   2025-08-27 [1] CRAN (R 4.5.1)
 farver         2.1.2   2024-05-13 [1] CRAN (R 4.5.0)
 fastmap        1.2.0   2024-05-15 [1] CRAN (R 4.5.0)
 generics       0.1.4   2025-05-09 [1] CRAN (R 4.5.0)
 ggplot2      * 4.0.0   2025-09-11 [1] CRAN (R 4.5.0)
 glue           1.8.0   2024-09-30 [1] CRAN (R 4.5.0)
 gtable         0.3.6   2024-10-25 [1] CRAN (R 4.5.0)
 htmltools      0.5.8.1 2024-04-04 [1] CRAN (R 4.5.0)
 htmlwidgets    1.6.4   2023-12-06 [1] CRAN (R 4.5.0)
 isoband        0.2.7   2022-12-20 [1] CRAN (R 4.5.0)
 jsonlite       2.0.0   2025-03-27 [1] CRAN (R 4.5.0)
 knitr          1.50    2025-03-16 [1] CRAN (R 4.5.0)
 labeling       0.4.3   2023-08-29 [1] CRAN (R 4.5.0)
 lifecycle      1.0.4   2023-11-07 [1] CRAN (R 4.5.0)
 magrittr       2.0.4   2025-09-12 [1] CRAN (R 4.5.1)
 MASS           7.3-65  2025-02-28 [2] CRAN (R 4.5.2)
 pillar         1.11.1  2025-09-17 [1] CRAN (R 4.5.0)
 pkgconfig      2.0.3   2019-09-22 [1] CRAN (R 4.5.0)
 R6             2.6.1   2025-02-15 [1] CRAN (R 4.5.0)
 RColorBrewer   1.1-3   2022-04-03 [1] CRAN (R 4.5.0)
 rlang          1.1.6   2025-04-11 [1] CRAN (R 4.5.0)
 rmarkdown      2.30    2025-09-28 [1] CRAN (R 4.5.0)
 rstudioapi     0.17.1  2024-10-22 [1] CRAN (R 4.5.0)
 S7             0.2.0   2024-11-07 [1] CRAN (R 4.5.0)
 scales         1.4.0   2025-04-24 [1] CRAN (R 4.5.0)
 scattermore  * 1.2     2023-06-12 [1] CRAN (R 4.5.0)
 sessioninfo    1.2.3   2025-02-05 [1] CRAN (R 4.5.0)
 tibble       * 3.3.0   2025-06-08 [1] CRAN (R 4.5.0)
 tidyselect     1.2.1   2024-03-11 [1] CRAN (R 4.5.0)
 utf8           1.2.6   2025-06-08 [1] CRAN (R 4.5.0)
 vctrs          0.6.5   2023-12-01 [1] CRAN (R 4.5.0)
 viridisLite    0.4.2   2023-05-02 [1] CRAN (R 4.5.0)
 withr          3.0.2   2024-10-28 [1] CRAN (R 4.5.0)
 xfun           0.54    2025-10-30 [1] CRAN (R 4.5.0)
 yaml           2.3.10  2024-07-26 [1] CRAN (R 4.5.0)

 [1] /Users/stadler/Library/R/arm64/4.5/library/__bioc322
 [2] /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library
 * ── Packages attached to the search path.

──────────────────────────────────────────────────────────────────────────────