Rarefy localities within latitudinal bands

bandit() subsamples spatial point data to a specified number of sites within latitudinal bands of equal width.

Usage

bandit(
  dat,
  xy,
  iter,
  nSite,
  bin,
  centr = FALSE,
  absLat = FALSE,
  maxN = 90,
  maxS = -90,
  crs = "epsg:4326",
  output = "locs"
)

Arguments

dat: A data.frame or matrix containing the coordinate columns xy and any associated variables, e.g. taxon names.
xy: A vector of two elements, specifying the name or numeric position of columns in dat containing coordinates, e.g. longitude and latitude. Coordinates for any shared sampling sites should be identical, and where sites are raster cells, coordinates are usually expected to be cell centroids.
iter: The number of times to subsample localities within each latitudinal band.
nSite: The quota of unique locations to include in each subsample.
bin: A positive numeric value for latitudinal band width, in degrees.
centr: Logical: should a bin center on and cover the equator (TRUE) or should the equator mark the boundary between the lowest-latitude northern and southern bins (FALSE, default)? Ignored if absLat = TRUE.
absLat: Logical: should only the absolute values of latitude be evaluated? If absLat = TRUE, centr argument is ignored.
maxN: Optional argument to specify the northernmost limit for subsampling, if less than 90 degrees.
maxS: Optional argument to specify the southernmost limit for subsampling, if greater than -90 degrees. Should be a negative value if in the southern hemisphere.
crs: Coordinate reference system as a GDAL text string, EPSG code, or object of class crs. Default is latitude-longitude (EPSG:4326).
output: Whether the returned data should be two columns of subsample site coordinates (output = 'locs') or the subset of rows from dat associated with those coordinates (output = 'full').
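
The way bin, centr and absLat interact can be pictured with base R's cut(). The sketch below is an illustration only and is not the package's internal code: the helper name latBreaks() is hypothetical, and the placement of band edges is inferred from the interval labels shown in the Examples (left-closed, right-open, e.g. "[10,30)" when bin = 20 and centr = TRUE). It also assumes that bin divides the latitudinal range evenly.

```r
# Hypothetical helper (not part of divvy): band edges implied by the arguments
latBreaks <- function(bin, centr = FALSE, absLat = FALSE, maxN = 90, maxS = -90) {
  if (absLat) {
    return(seq(0, maxN, by = bin))       # bands of absolute latitude; centr ignored
  }
  start <- if (centr) bin / 2 else 0     # offset so that one band straddles the equator
  north <- seq(start, maxN, by = bin)
  south <- seq(-start, maxS, by = -bin)
  sort(unique(c(south, north)))
}

lat <- c(-42, -5, 3, 12, 48)             # toy latitudes, in degrees
bands <- cut(lat, breaks = latBreaks(20, centr = TRUE), right = FALSE)
as.character(bands)
```

With centr = FALSE the same helper places an edge at 0, so the equator separates the lowest-latitude northern and southern bands.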

Value

A list of subsamples, each a data.frame containing coordinates of subsampled localities (if output = 'locs') or the subset of occurrences from dat associated with those coordinates (if output = 'full'). The latitudinal bounds of each subsample are specified by its name in the list. If there are too few localities in a given interval to draw a subsample, that interval is omitted from output.
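
Because the list names repeat once per iteration within each band, they can serve as a grouping variable when summarising subsamples. The following is a minimal sketch on a toy list that mimics the structure described above; the object subs and its contents are invented for illustration and are not real bandit() output.

```r
# Toy stand-in for a list returned with output = 'full':
# one element per subsample, named by its latitudinal interval
subs <- list(
  "[10,20)" = data.frame(genus = c("Anadara", "Corbula", "Anadara")),
  "[10,20)" = data.frame(genus = c("Placamen", "Corbula")),
  "[20,30)" = data.frame(genus = "Anadara")
)

# genus richness of each subsample, then the mean richness per band
rich <- sapply(subs, function(s) length(unique(s$genus)))
meanRich <- tapply(rich, names(rich), mean)
meanRich
```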

Details

bandit() rarefies the number of spatial sites within latitudinal ranges of specified bin width. (Compare with cookies() and clustr(), which spatially subsample to a specified extent without regard to latitudinal position.) Cases where it could be appropriate to control for latitudinal spread of localities include characterisations of latitudinal diversity gradients (e.g. Marcot 2016) or comparisons of ecosystem parameters that covary strongly with latitude (e.g. diversity in reefal vs. non-reefal habitats). Note that the total surface area of the Earth within equal-latitudinal increments decreases from the equator towards the poles; bandit() standardises only the number of sites (and hence the area) encompassed by each subsample, not the total area that could have been available for species to inhabit.
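
To put numbers on the decrease in area towards the poles, the area of the zone between two latitudes on a sphere is 2 * pi * R^2 * |sin(lat2) - sin(lat1)|. The sketch below applies that formula to 10-degree bands of one hemisphere. It assumes a spherical Earth with a mean radius of 6371 km, and the helper name bandArea() is hypothetical rather than part of divvy.

```r
# Hypothetical helper: surface area (km^2) between two latitudes on a sphere
bandArea <- function(latLo, latHi, radius = 6371) {
  rad <- pi / 180                        # degrees to radians
  2 * pi * radius^2 * abs(sin(latHi * rad) - sin(latLo * rad))
}

edges <- seq(0, 90, by = 10)
area <- bandArea(head(edges, -1), tail(edges, -1))
names(area) <- paste0("[", head(edges, -1), ",", tail(edges, -1), ")")

# area of each band relative to the band adjacent to the equator
round(area / area[1], 2)
```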

As with all divvy subsampling functions, sites within a given regional/latitudinal subsample are selected without replacement.
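
Selection without replacement within one band can be sketched in base R as follows. This illustrates the idea on toy data and is not the package's implementation: the data frame occs and the object names are invented, with picked loosely corresponding to output = 'locs' and full to output = 'full'.

```r
# Toy occurrences within a single band; several records may share one site
occs <- data.frame(
  x = c(1, 1, 2, 3, 3, 4),
  y = c(5, 5, 6, 7, 7, 8),
  genus = c("A", "B", "A", "C", "A", "B")
)
nSite <- 3

set.seed(1)
sites <- unique(occs[, c("x", "y")])            # one row per unique site
picked <- sites[sample(nrow(sites), nSite), ]   # sample() draws without replacement by default
full <- merge(occs, picked)                     # all occurrences at the chosen sites
```

Since each site can be drawn at most once, a band holding fewer than nSite unique sites cannot yield a subsample, which is why such bands are omitted from the output.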

To calculate an integer number of degrees into which a given latitudinal range divides evenly, the palaeoverse package (v 1.2.1) provides the palaeoverse::lat_bins() function with argument fit = TRUE.
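
For readers without palaeoverse installed, the candidate widths can also be listed in base R. The sketch below is not how palaeoverse::lat_bins() works internally: evenWidths() is a hypothetical helper that returns the integer widths dividing an integer-valued latitudinal range evenly, optionally keeping only widths that give an odd number of bands (as needed for a band centred on the equator when centr = TRUE).

```r
# Hypothetical helper: integer bin widths that divide a latitudinal range evenly
# (assumes minLat and maxLat are whole numbers of degrees)
evenWidths <- function(minLat = -90, maxLat = 90, oddOnly = FALSE) {
  span <- maxLat - minLat
  w <- seq_len(span)
  w <- w[span %% w == 0]                 # keep exact divisors of the range
  if (oddOnly) {
    w <- w[(span / w) %% 2 == 1]         # keep widths giving an odd band count
  }
  w
}

allWidths <- evenWidths()                # every even division of 180 degrees
oddWidths <- evenWidths(oddOnly = TRUE)  # candidates for use with centr = TRUE
oddWidths
```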

References

Allen2020divvy

Marcot2016divvy

Examples

# load bivalve occurrences to rasterise
library(terra)
#> terra 1.8.42
data(bivalves)

# initialise Equal Earth projected coordinates
rWorld <- rast()
prj <- 'EPSG:8857'
rPrj <- project(rWorld, prj, res = 200000) # 200,000m is approximately 2 degrees

# coordinate column names for the current and target coordinate reference system
xyCartes <- c('paleolng','paleolat')
xyCell   <- c('centroidX','centroidY')

# project occurrences and retrieve cell centroids in new coordinate system
llOccs <- vect(bivalves, geom = xyCartes, crs = 'epsg:4326')
prjOccs <- project(llOccs, prj)
cellIds <- cells(rPrj, prjOccs)[,'cell']
bivalves[, xyCell] <- xyFromCell(rPrj, cellIds)

# subsample 20 equal-area sites within 10-degree bands of absolute latitude
n <- 20
reps <- 100
set.seed(11)
bandAbs <- bandit(dat = bivalves, xy = xyCell,
                  iter = reps, nSite = n, output = 'full',
                  bin = 10, absLat = TRUE,
                  crs = prj
)
head(bandAbs[[1]]) # inspect first subsample
#>       genus paleolng paleolat collection_no reference_no    environment max_ma
#> 61 Placamen   122.84    12.21         41525        11125 coastal indet.  5.333
#> 62  Anadara   122.84    12.21         41524        11125 coastal indet.  5.333
#> 63  Anadara   122.84    12.21         41524        11125 coastal indet.  5.333
#> 64  Anadara   122.84    12.21         41524        11125 coastal indet.  5.333
#> 65  Corbula   122.84    12.21         41524        11125 coastal indet.  5.333
#> 66  Corbula   122.84    12.21         41524        11125 coastal indet.  5.333
#>    min_ma                      accepted_name centroidX centroidY
#> 61  2.588                           Placamen  11656041   1539440
#> 62  2.588        Anadara (Anadara) antiquata  11656041   1539440
#> 63  2.588         Anadara (Anadara) biformis  11656041   1539440
#> 64  2.588                            Anadara  11656041   1539440
#> 65  2.588 Corbula (Notocorbula) fortisulcata  11656041   1539440
#> 66  2.588                            Corbula  11656041   1539440
names(bandAbs)[1] # degree interval (absolute value) of first subsample
#> [1] "[10,20)"
unique(names(bandAbs)) # all intervals containing sufficient data
#> [1] "[10,20)" "[20,30)" "[30,40)" "[40,50)"
# note insufficient coverage to subsample at equator or above 50 degrees

# subsample 20-degree bands, where central band spans the equator
# (10 S to 10 N latitude), as in Allen et al. (2020)
# (An alternative, finer-grain way to divide 180 degrees evenly into an
# odd number of bands would be to set 'bin' = 4.)
bandCent <- bandit(dat = bivalves, xy = xyCell,
                   iter = reps, nSite = n, output = 'full',
                   bin = 20, centr = TRUE, absLat = FALSE,
                   crs = prj
)
unique(names(bandCent)) # all intervals containing sufficient data
#> [1] "[-50,-30)" "[10,30)"   "[30,50)"  