Calculate basic spatial coverage and diversity metrics

Summarise the geographic scope and position of occurrence data, and optionally estimate diversity and evenness.

Usage

sdSumry(
  dat,
  xy,
  taxVar,
  crs = "epsg:4326",
  collections = NULL,
  quotaQ = NULL,
  quotaN = NULL,
  omitDom = FALSE
)

Arguments

dat: A data.frame or matrix containing taxon names, coordinates, and any associated variables; or a list of such structures.
xy: A vector of two elements, specifying the name or numeric position of columns in dat containing coordinates, e.g. longitude and latitude. Coordinates for any shared sampling sites should be identical, and where sites are raster cells, coordinates are usually expected to be cell centroids.
taxVar: The name or numeric position of the column containing taxonomic identifications. taxVar must be of the same class as xy, e.g. a numeric column position if xy is given as a vector of numeric positions.
crs: Coordinate reference system as a GDAL text string, EPSG code, or object of class crs. Default is latitude-longitude (EPSG:4326).
collections: The name or numeric position of the column containing unique collection IDs, e.g. 'collection_no' in PBDB data downloads.
quotaQ: A numeric value for the coverage (quorum) level at which to perform coverage-based rarefaction (shareholder quorum subsampling).
quotaN: A numeric value for the quota of taxon occurrences to subsample in classical rarefaction.
omitDom: If omitDom = TRUE and quotaQ or quotaN is supplied, remove the most common taxon prior to rarefaction. The nTax and evenness returned are unaffected.

Value

A matrix of spatial and optional diversity metrics. If dat is a list of data.frame objects, output rows correspond to input elements.

Details

sdSumry() compiles metadata about a sample or list of samples, before or after spatial subsampling. The function counts the number of collections (if requested), taxon presences (excluding repeat incidences of a taxon at a given site), and unique spatial sites; it also calculates site centroid coordinates, latitudinal range (degrees), great circle distance (km), mean pairwise distance (km), and summed minimum spanning tree length (km). Coordinates and their distances are computed with respect to the original coordinate reference system if supplied, except in calculation of latitudinal range, for which projected coordinates are transformed to geodetic ones. If crs is unspecified, by default points are assumed to be given in latitude-longitude and distances are calculated with spherical geometry.
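The spatial metrics listed above can be illustrated with a minimal base-R sketch. This is not the package's internal code: the `sites` data, the `haversine()` helper, the 6371 km Earth radius, and the use of Prim's algorithm are all assumptions made for illustration, and reporting the largest pairwise distance as the great circle distance is likewise an assumed interpretation.

```r
# Hypothetical site coordinates (longitude, latitude) in decimal degrees
sites <- data.frame(lon = c(0, 1, 0, 1), lat = c(0, 0, 1, 1))

# Haversine great circle distance in km (assumes a sphere of radius 6371 km)
haversine <- function(lon1, lat1, lon2, lat2, r = 6371) {
  rad <- pi / 180
  dLat <- (lat2 - lat1) * rad
  dLon <- (lon2 - lon1) * rad
  a <- sin(dLat / 2)^2 + cos(lat1 * rad) * cos(lat2 * rad) * sin(dLon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Full pairwise distance matrix between sites
n <- nrow(sites)
d <- outer(seq_len(n), seq_len(n), function(i, j) {
  haversine(sites$lon[i], sites$lat[i], sites$lon[j], sites$lat[j])
})

latRange     <- diff(range(sites$lat))    # degrees
meanPairDist <- mean(d[upper.tri(d)])     # km, each pair counted once
maxDist      <- max(d)                    # km, largest pairwise distance

# Minimum spanning tree length via Prim's algorithm:
# start from site 1 and repeatedly attach the nearest unattached site
inTree <- c(TRUE, rep(FALSE, n - 1))
mstLength <- 0
while (!all(inTree)) {
  cand <- d[inTree, !inTree, drop = FALSE]
  colMin <- apply(cand, 2, min)   # shortest link to each unattached site
  j <- which.min(colMin)
  mstLength <- mstLength + colMin[j]
  inTree[which(!inTree)[j]] <- TRUE
}
```

For projected coordinates, Euclidean distances in the units of the projection would take the place of `haversine()`.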

The first two diversity variables returned are the raw count of observed taxa and the Summed Common species/taxon Occurrence Rate (SCOR). SCOR reflects the degree to which taxa are common/widespread and is decoupled from richness or abundance (Hannisdal et al. 2012). SCOR is calculated as the sum across taxa of the log probability of incidence, $\lambda$. For a given taxon, $\lambda = -\ln(1 - p)$, where $p$ is estimated as the fraction of occupied sites. Very widespread taxa make a large contribution to an assemblage SCOR, while rare taxa have relatively little influence.
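The SCOR formula above can be worked through by hand on a small example. The sketch below uses base R only; the `occ` data frame and its column names are hypothetical, and it is not meant to reproduce the package's internal code.

```r
# Hypothetical toy data: one row per taxon occurrence at a site
occ <- data.frame(
  site  = c("s1", "s1", "s2", "s2", "s3", "s4"),
  taxon = c("a",  "b",  "a",  "c",  "a",  "b")
)

# Count each taxon at most once per site
occ <- unique(occ)
nSite <- length(unique(occ$site))

# p: fraction of sites occupied by each taxon
p <- as.numeric(table(occ$taxon)) / nSite

# lambda = -ln(1 - p) for each taxon; SCOR is the sum across taxa
lambda <- -log(1 - p)
scor <- sum(lambda)
```

Here taxon `a` occupies 3 of 4 sites and contributes $\ln 4 \approx 1.39$, whereas taxon `c` occupies 1 of 4 sites and contributes only $\ln(4/3) \approx 0.29$. Note that a taxon present at every site has $p = 1$ and an infinite $\lambda$ under this formula; how the function treats that case is not stated here.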

If quotaQ is supplied, sdSumry() rarefies richness at the given coverage value and returns the point estimate of richness (Hill number 0) and its 95% confidence interval, as well as estimates of evenness (Pielou's J) and frequency-distribution sample coverage (given by iNEXT$DataInfo). If quotaN is supplied, sdSumry() rarefies richness to the given number of occurrence counts and returns the point estimate of richness and its 95% confidence interval. Coverage-based and classical rarefaction are both calculated with iNEXT::estimateD() internally. For details, such as how diversity is extrapolated if sample coverage is insufficient to achieve a specified rarefaction level, consult Chao and Jost (2012) and Hsieh et al. (2016).
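As a rough illustration of two quantities named above, the following base-R sketch computes Pielou's J and an analytical expectation of richness at a fixed quota. It deliberately swaps in the classical hypergeometric rarefaction formula rather than calling iNEXT::estimateD(), so it yields no confidence interval and no extrapolation. The helper names `pielouJ()` and `rarefyN()` and the `counts` vector are hypothetical, and treating per-taxon occurrence counts as the input frequencies is an assumption.

```r
# Hypothetical occurrence counts per taxon (all counts must be > 0)
counts <- c(a = 10, b = 5, c = 3, d = 1, e = 1)

# Pielou's J: Shannon entropy divided by its maximum, ln(S)
pielouJ <- function(counts) {
  p <- counts / sum(counts)
  H <- -sum(p * log(p))
  H / log(length(counts))
}

# Expected richness in a random subsample of n occurrences
# (analytical rarefaction; valid for n <= sum(counts))
rarefyN <- function(counts, n) {
  N <- sum(counts)
  sum(1 - choose(N - counts, n) / choose(N, n))
}

J <- pielouJ(counts)
expectedRichness <- rarefyN(counts, n = 10)
```

J equals 1 when all taxa are equally frequent and falls towards 0 as one taxon dominates, which is the situation in which omitDom = TRUE may be useful.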

References

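Chao, A. and Jost, L. (2012). Coverage-based rarefaction and extrapolation: standardizing samples by completeness rather than size. Ecology, 93(12).

Hannisdal, B., Henderiks, J. and Liow, L. H. (2012). Long-term evolutionary and ecological responses of calcifying phytoplankton to changes in atmospheric CO2. Global Change Biology, 18(12).

Hsieh, T. C., Ma, K. H. and Chao, A. (2016). iNEXT: an R package for rarefaction and extrapolation of species diversity (Hill numbers). Methods in Ecology and Evolution, 7(12).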

Examples

# generate occurrences
set.seed(9)
x  <- sample(rep(1:5, 10))
y  <- sample(rep(1:5, 10))
# make some species 2x or 4x as common
abund <- c(rep(4, 5), rep(2, 5), rep(1, 10))
sp <- sample(letters[1:20], 50, replace = TRUE, prob = abund)
obs <- data.frame(x, y, sp)

# minimum sample data returned
sdSumry(obs, c('x','y'), 'sp')
#>      nOcc nLoc centroidX centroidY latRange greatCircDist meanPairDist
#> [1,]   45   22  3.045289  2.909986        4      628.5192     297.7363
#>      minSpanTree     SCOR nTax
#> [1,]    2332.473 2.234449   17

# also calculate evenness and coverage-based rarefaction diversity estimates
sdSumry(obs, xy = c('x','y'), taxVar = 'sp', quotaQ = 0.7)
#>   nOcc nLoc centroidX centroidY latRange greatCircDist meanPairDist minSpanTree
#> 1   45   22  3.045289  2.909986        4      628.5192     297.7363    2332.473
#>       SCOR nTax  evenness coverage   SQSdiv SQSlow95 SQSupr95
#> 1 2.234449   17 0.9405151   0.8708 12.17405 8.323042 16.02506