bandit
Subsample spatial point data to a specified number of sites within bins of equal latitude
Usage
bandit(
dat,
xy,
iter,
nSite,
bin,
centr = FALSE,
absLat = FALSE,
maxN = 90,
maxS = -90,
crs = "epsg:4326",
output = "locs"
)
Arguments
- dat
A data.frame or matrix containing the coordinate columns xy and any associated variables, e.g. taxon names.
- xy
A vector of two elements, specifying the name or numeric position of columns in dat containing coordinates, e.g. longitude and latitude. Coordinates for any shared sampling sites should be identical, and where sites are raster cells, coordinates are usually expected to be cell centroids.
- iter
The number of times to subsample localities within each latitudinal band.
- nSite
The quota of unique locations to include in each subsample.
- bin
A positive numeric value for latitudinal band width, in degrees.
- centr
Logical: should a bin center on and cover the equator (TRUE), or should the equator mark the boundary between the lowest-latitude northern and southern bins (FALSE, default)? Ignored if absLat = TRUE.
- absLat
Logical: should only the absolute values of latitude be evaluated? If absLat = TRUE, the centr argument is ignored.
- maxN
Optional argument to specify the northernmost limit for subsampling, if less than 90 degrees.
- maxS
Optional argument to specify the southernmost limit for subsampling, if not -90 degrees. Should be a negative value if in the southern hemisphere.
- crs
Coordinate reference system as a GDAL text string, EPSG code, or object of class crs. Default is latitude-longitude (EPSG:4326).
- output
Whether the returned data should be two columns of subsample site coordinates (output = 'locs') or the subset of rows from dat associated with those coordinates (output = 'full').
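The bin layouts described for centr and absLat can be illustrated with a short base-R sketch. The helper bandBreaks() is hypothetical and not part of divvy; it assumes bands are left-closed intervals built outward from the equator (matching the "[lower,upper)" names in the examples), so it may differ from the package's internal code in edge cases.

```r
# Hypothetical helper (not part of divvy): latitudinal break points for
# bands of width `bin`, assuming left-closed bands built from the equator.
bandBreaks <- function(bin, centr = FALSE, absLat = FALSE,
                       maxN = 90, maxS = -90) {
  if (absLat) {
    # absolute latitude: bands start at 0 degrees and centr is ignored
    top <- max(abs(c(maxN, maxS)))
    return(seq(0, top, by = bin))
  }
  if (centr) {
    # one band straddles the equator, from -bin/2 to +bin/2
    north <- seq(bin / 2, maxN, by = bin)
    south <- seq(-bin / 2, maxS, by = -bin)
  } else {
    # the equator is the boundary between the lowest northern and southern bands
    north <- seq(0, maxN, by = bin)
    south <- seq(0, maxS, by = -bin)
  }
  sort(unique(c(south, north)))
}

# usage: assign latitudes to 20-degree bands with a central equatorial band
brk <- bandBreaks(bin = 20, centr = TRUE)
as.character(cut(c(-45, 0, 15, 35), brk, right = FALSE))
# "[-50,-30)" "[-10,10)" "[10,30)" "[30,50)"
```

With bin = 20 and centr = TRUE this yields nine bands, the middle one being [-10,10), which is consistent with the interval names in the second example.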
Value
A list of subsamples, each a data.frame containing coordinates of subsampled localities (if output = 'locs') or the subset of occurrences from dat associated with those coordinates (if output = 'full'). The latitudinal bounds of each subsample are specified by its name in the list. If there are too few localities in a given interval to draw a subsample, that interval is omitted from output.
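Assuming each interval contributes several subsamples (so names repeat, as the use of unique(names(...)) in the examples suggests), results can be summarised per band by grouping on the list names. The sketch below uses a small hand-made list as a hypothetical stand-in for bandit() output with output = 'full'; the genus column and the richness summary are illustrative choices.

```r
# Hypothetical stand-in for bandit(..., output = 'full'): a list of
# data.frames whose names give each subsample's latitudinal interval.
subs <- list(
  "[10,20)" = data.frame(genus = c("Anadara", "Corbula")),
  "[10,20)" = data.frame(genus = c("Anadara", "Placamen", "Corbula")),
  "[20,30)" = data.frame(genus = c("Ostrea", "Ostrea"))
)

# number of subsamples per interval
nPerBand <- table(names(subs))

# genus richness of each subsample, then averaged within each interval
rich <- vapply(subs, function(d) length(unique(d$genus)), integer(1))
meanRich <- tapply(rich, names(rich), mean)
meanRich
# [10,20): 2.5, [20,30): 1
```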
Details
bandit() rarefies the number of spatial sites within latitudinal ranges of specified bin width. (Compare with cookies() and clustr(), which spatially subsample to a specified extent without regard to latitudinal position.)
Cases where it could be appropriate to control for the latitudinal spread of localities include characterisations of latitudinal diversity gradients (e.g. Marcot et al. 2016) or comparisons of ecosystem parameters that covary strongly with latitude (e.g. diversity in reefal vs. non-reefal habitats). Note that the total surface area of the Earth within equal-latitude increments decreases from the equator towards the poles; bandit() standardises only the number of sites (or area) encompassed by each subsample, not the total area that could have been available for species to inhabit.
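The decline in area toward the poles follows from the area of a spherical zone, 2 * pi * R^2 * (sin(phi2) - sin(phi1)) between latitudes phi1 and phi2. Treating the Earth as a sphere (an approximation), the hypothetical helper below computes the share of the total surface lying in each 10-degree band of one hemisphere.

```r
# Hypothetical helper: fraction of a sphere's surface between two latitudes
# (degrees), from the zone area 2*pi*R^2*(sin(upper) - sin(lower)) divided
# by the total area 4*pi*R^2.
bandAreaFrac <- function(lower, upper) {
  (sin(upper * pi / 180) - sin(lower * pi / 180)) / 2
}

brk <- seq(0, 90, by = 10)
frac <- bandAreaFrac(head(brk, -1), tail(brk, -1))
round(frac, 4)
frac[1] / frac[9]  # equatorial vs. polar 10-degree band
```

The [0,10) band holds roughly 11 times the area of the [80,90) band, even though both span 10 degrees of latitude.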
As with all divvy subsampling functions, sites within a given regional/latitudinal subsample are selected without replacement.
To calculate an integer number of degrees into which a given latitudinal range divides evenly, the palaeoverse package (v 1.2.1) provides the palaeoverse::lat_bins() function with argument fit = TRUE.
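For a quick check without extra packages, integer widths that divide a range evenly can be found with a plain divisor search. The helper evenWidths() is hypothetical and is not the algorithm behind palaeoverse::lat_bins(). Filtering for an odd number of bands is useful when centr = TRUE, since the equatorial band then needs an equal number of bands on each side.

```r
# Hypothetical helper: integer band widths (degrees) that divide the range
# maxS..maxN with no remainder; assumes integer-valued limits.
evenWidths <- function(maxN = 90, maxS = -90) {
  span <- maxN - maxS
  w <- seq_len(span)
  w[span %% w == 0]
}

w <- evenWidths()
# widths that split 180 degrees into an odd number of bands
oddW <- w[(180 / w) %% 2 == 1]
oddW
# 4 12 20 36 60 180
```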
References
Allen BJ, Wignall PB, Hill DJ, Saupe EE, Dunhill AM (2020). “The latitudinal diversity gradient of tetrapods across the Permo–Triassic mass extinction and recovery interval.” Proceedings of the Royal Society B, 287(1929), 20201125. doi:10.1098/rspb.2020.1125.
Marcot JD, Fox DL, Niebuhr SR (2016). “Late Cenozoic onset of the latitudinal diversity gradient of North American mammals.” Proceedings of the National Academy of Sciences, 113(26), 7189–7194. doi:10.1073/pnas.1524750113.
Examples
# load bivalve occurrences to rasterise
library(divvy)
library(terra)
#> terra 1.7.55
data(bivalves)
# initialise a raster in Equal Earth projected coordinates
rWorld <- rast()
prj <- 'EPSG:8857'
rPrj <- project(rWorld, prj, res = 200000) # 200,000m is approximately 2 degrees
# coordinate column names for the current and target coordinate reference systems
xyCartes <- c('paleolng','paleolat')
xyCell <- c('centroidX','centroidY')
# project occurrences and retrieve cell centroids in new coordinate system
llOccs <- vect(bivalves, geom = xyCartes, crs = 'epsg:4326')
prjOccs <- project(llOccs, prj)
cellIds <- cells(rPrj, prjOccs)[,'cell']
bivalves[, xyCell] <- xyFromCell(rPrj, cellIds)
# subsample 20 equal-area sites within 10-degree bands of absolute latitude
n <- 20
reps <- 100
set.seed(11)
bandAbs <- bandit(dat = bivalves, xy = xyCell,
iter = reps, nSite = n, output = 'full',
bin = 10, absLat = TRUE,
crs = prj
)
head(bandAbs[[1]]) # inspect first subsample
#> genus paleolng paleolat collection_no reference_no environment max_ma
#> 61 Placamen 122.84 12.21 41525 11125 coastal indet. 5.333
#> 62 Anadara 122.84 12.21 41524 11125 coastal indet. 5.333
#> 63 Anadara 122.84 12.21 41524 11125 coastal indet. 5.333
#> 64 Anadara 122.84 12.21 41524 11125 coastal indet. 5.333
#> 65 Corbula 122.84 12.21 41524 11125 coastal indet. 5.333
#> 66 Corbula 122.84 12.21 41524 11125 coastal indet. 5.333
#> min_ma accepted_name centroidX centroidY
#> 61 2.588 Placamen 11656041 1539440
#> 62 2.588 Anadara (Anadara) antiquata 11656041 1539440
#> 63 2.588 Anadara (Anadara) biformis 11656041 1539440
#> 64 2.588 Anadara 11656041 1539440
#> 65 2.588 Corbula (Notocorbula) fortisulcata 11656041 1539440
#> 66 2.588 Corbula 11656041 1539440
names(bandAbs)[1] # degree interval (absolute value) of first subsample
#> [1] "[10,20)"
unique(names(bandAbs)) # all intervals containing sufficient data
#> [1] "[10,20)" "[20,30)" "[30,40)" "[40,50)"
# note insufficient coverage to subsample at equator or above 50 degrees
# subsample 20-degree bands, where central band spans the equator
# (10 S to 10 N latitude), as in Allen et al. (2020)
# (An alternative, finer-grain way to divide 180 degrees evenly into an
# odd number of bands would be to set 'bin' = 4.)
bandCent <- bandit(dat = bivalves, xy = xyCell,
iter = reps, nSite = n, output = 'full',
bin = 20, centr = TRUE, absLat = FALSE,
crs = prj
)
unique(names(bandCent)) # all intervals containing sufficient data
#> [1] "[-50,-30)" "[10,30)" "[30,50)"