Code and synthetic data for analysis of relation between species sizes and local abundances trends

This resource contains the R code and core results of a study that tested whether, at a global scale, larger-bodied and smaller-bodied species show different population trends within communities. The study combined the global BioTIME database (community time series from hundreds of studies, mostly covering the last 20-50 years) with several large trait databases to obtain a very large sample (12,956 assemblage time series from 144 studies, incorporating 2,109,593 observations of 10,286 species, of which 7,234 could be linked to at least one size trait). Data resources in this deposit include matched trait values for each species, several population trends for each species, and community-level correlations between population trends and body-size trait values. HTML files documenting the R Markdown code used to produce the data resources are also included. This resource does not contain the raw population and trait data, which are openly available from the sources listed in the supporting documentation. Matching between the global databases required extensive initial cleaning and filtering. Although the data were subject to a number of checks, with tens of thousands of species the dataset was too large for every alignment to be checked manually, and trait matches assume a single body-size value for each species across its range. The matched values were originally intended to generate a relative rank within a community; caution is needed for more fine-grained analyses using this approach. Full details about this application can be found at https://doi.org/10.5285/1f7687de-d68e-4349-b4d3-b7d3a127a7df

Date (Publication): 2024-05-01

Identifier: https://catalogue.ceh.ac.uk/id/1f7687de-d68e-4349-b4d3-b7d3a127a7df

Identifier: doi:10.5285/1f7687de-d68e-4349-b4d3-b7d3a127a7df

Other citation details: Terry, J.C.D., O'Sullivan, J.D., Rossberg, A.G. (2024). Code and synthetic data for analysis of relation between species sizes and local abundances trends. NERC EDS Environmental Information Data Centre https://doi.org/10.5285/1f7687de-d68e-4349-b4d3-b7d3a127a7df

Author

Queen Mary University of London - Terry, J.C.D.

https://orcid.org/0000-0002-0626-9938

Point of contact

Queen Mary University of London - Terry, J.C.D.

https://orcid.org/0000-0002-0626-9938

Author

Queen Mary University of London - O'Sullivan, J.D.

https://orcid.org/0000-0003-2386-6635

Author

Queen Mary University of London - Rossberg, A.G.

Publisher

NERC EDS Environmental Information Data Centre

Custodian

NERC EDS Environmental Information Data Centre

Owner

Queen Mary University of London

Access constraints: Other restrictions

Other constraints: no limitations

Use constraints: Other restrictions

Other constraints: This resource is available under the terms of the Open Government Licence

Use constraints: Other restrictions

Other constraints: If you reuse this data, you should cite: Terry, J.C.D., O'Sullivan, J.D., Rossberg, A.G. (2024). Code and synthetic data for analysis of relation between species sizes and local abundances trends. NERC EDS Environmental Information Data Centre https://doi.org/10.5285/1f7687de-d68e-4349-b4d3-b7d3a127a7df

Metadata language: English

Character set: UTF-8

Topic category

Biota

Distribution format

Comma-separated values (CSV)
html
md

Distributor

NERC EDS Environmental Information Data Centre

OnLine resource: Download the data
Download a copy of this data

OnLine resource: Supporting information
Supporting information available to assist in re-use of this dataset

Hierarchy level: application

Other: application

Conformance result

Date (Publication): 2010-12-08

Statement: To generate the assemblage time series, we downloaded all studies available in the ‘open’ component of the BioTIME database of community time series from Zenodo. We identified studies as ‘multi-site’ or ‘single-site’ based on the number of coordinates in the BioTIME database. Single-site studies were treated as one combined assemblage, whilst widely dispersed ‘multi-site’ studies were partitioned into assemblages based on a global hexagonal grid of 96 km² cells using the dggridR package in R. We retained records from assemblages with abundance or biomass data for at least 10 distinct species and at least 5 years between the first and last record. We used four separate trait databases that include some measure of organism size, but we did not mix information between databases. For amniotes, an amniote life history database was downloaded. For plants, we used the TRY database. For fish, we downloaded a curated database of fish traits, which in turn is largely based on data from the FishBase database. It is focused on the North Atlantic and Pacific continental shelf, but this represents the majority of the relevant BioTIME studies. For marine species, we downloaded size data from the WoRMS database. We cleaned the data in order to match species names between the trait databases and the BioTIME database. We assessed each assemblage–trait combination where at least 40% of the species, and at least 5 species, had data for that trait and >80% of year samples contained at least 5 species. We excluded transitory species within each assemblage by including only those species that were seen in over half of the year samples. Where this filtering left data from fewer than 1% of the cells in the original study, we removed the whole study. Where a study included both ‘abundance’ and ‘biomass’ data, we preferentially used the abundance data. Studies with only presence–absence data were not used.
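The retention rules above can be sketched in code. The following is a minimal illustration in Python, not the authors' R scripts: the function names, the record layout (year, species) and the order in which the rules are applied are all assumptions made for this example, and the rule that >80% of year samples contain at least 5 species is not implemented.

```python
from collections import defaultdict


def retained_species(records, min_species=10, min_span_years=5):
    """Apply the assemblage-level rules to one assemblage (hypothetical helper).

    records: list of (year, species) observations.
    Returns the sorted non-transitory species, or None if the assemblage
    fails the species-count or time-span threshold.
    """
    years = sorted({year for year, _ in records})
    species = {sp for _, sp in records}
    # At least 10 distinct species.
    if len(species) < min_species:
        return None
    # At least 5 years between the first and last record.
    if years[-1] - years[0] < min_span_years:
        return None
    # Keep only species seen in over half of the year samples.
    years_seen = defaultdict(set)
    for year, sp in records:
        years_seen[sp].add(year)
    return sorted(sp for sp, ys in years_seen.items() if len(ys) > len(years) / 2)


def trait_coverage_ok(species, traits, min_fraction=0.4, min_count=5):
    """Check that >=40% of species, and >=5 species, have a value for one trait.

    traits: dict mapping species name to a size value (missing or None = no data).
    """
    with_trait = [sp for sp in species if traits.get(sp) is not None]
    return len(with_trait) >= min_count and len(with_trait) >= min_fraction * len(species)
```

Whether trait coverage is assessed before or after transitory species are dropped is not stated here, so `trait_coverage_ok` simply accepts whichever species list it is given.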
To calculate the relative change in abundance of each species, we fitted the square-root transformed and scaled species totals as a function of year for each assemblage using ordinary least-squares regression models and calculated the slope β for each species. The main response variable τ for each assemblage was then computed as Kendall’s rank correlation coefficient between size trait values and the set of βs. Species with missing trait values were excluded from the calculation of τ. Where there were multiple assemblages per study, study-level τ was taken as a simple arithmetic mean of all assemblage-level τ values. We also tested two alternative transformations of the population data. To examine study-level determinants of τ within each size trait, for each study we calculated: (1) the mean total species richness of each assemblage over the time frame, (2) the mean assemblage-level trait data completeness, (3) the mean number of years from which there were data, (4) the mean span of years from which there were data, (5) the log10-transformed number of assemblages within the study (that is, the spatial extent), (6) the absolute latitude of the centre of the study and (7) the range of traits in the assemblage (log10(max) − log10(min)). We fitted a set of linear models to assess whether these factors could predict either τ or τ². All analysis used the R language, and scripts are included in the KnittedScripts folder. More information is provided in the supporting information.
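The calculation of β and τ can be illustrated with a short sketch. This is written in Python for illustration only (the deposited scripts are in R) and rests on assumptions not confirmed by the text: "scaled" is taken to mean centring and dividing by the sample standard deviation, Kendall's coefficient is computed as tau-b, and all function names and data layouts are hypothetical.

```python
import math


def ols_slope(x, y):
    """Slope of an ordinary least-squares fit of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def trend_slope(years, totals):
    """Beta for one species: OLS slope of sqrt-transformed, scaled totals on year.

    Assumption: scaling = z-scoring with the sample standard deviation.
    """
    root = [math.sqrt(t) for t in totals]
    mean = sum(root) / len(root)
    sd = math.sqrt(sum((r - mean) ** 2 for r in root) / (len(root) - 1))
    if sd == 0:
        return 0.0  # constant series: no trend
    return ols_slope(years, [(r - mean) / sd for r in root])


def kendall_tau_b(a, b):
    """Kendall's rank correlation (tau-b variant, which adjusts for ties)."""
    conc = disc = only_a = only_b = 0
    for i in range(len(a)):
        for j in range(i + 1, len(a)):
            da, db = a[i] - a[j], b[i] - b[j]
            if da == 0 and db == 0:
                continue  # tied in both variables
            if da == 0:
                only_a += 1
            elif db == 0:
                only_b += 1
            elif da * db > 0:
                conc += 1
            else:
                disc += 1
    denom = math.sqrt((conc + disc + only_a) * (conc + disc + only_b))
    return (conc - disc) / denom if denom else float("nan")


def assemblage_tau(sizes, series):
    """Tau for one assemblage; species with missing size values are excluded.

    sizes: dict species -> size value or None; series: dict species -> (years, totals).
    """
    kept = [sp for sp in series if sizes.get(sp) is not None]
    betas = [trend_slope(*series[sp]) for sp in kept]
    return kendall_tau_b([sizes[sp] for sp in kept], betas)


def study_tau(assemblage_taus):
    """Study-level tau: simple arithmetic mean of assemblage-level values."""
    return sum(assemblage_taus) / len(assemblage_taus)
```

Under this scaling assumption every series is standardised before fitting, so β reflects the direction and consistency of a trend rather than its absolute magnitude; a different scaling choice would change the β values and could change their ranks.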

File identifier: 1f7687de-d68e-4349-b4d3-b7d3a127a7df

Metadata language: English

Character set: ISO/IEC 8859-1 (also known as Latin 1)

Hierarchy level: application

Hierarchy level name: application

Date stamp: 2024-11-14T13:45:22

Metadata standard name: UK GEMINI

Metadata standard version: 2.3

Point of contact

NERC EDS Environmental Information Data Centre

Lancaster Environment Centre, Library Avenue, Bailrigg, Lancaster, LA1 4AP, UK

https://eidc.ac.uk/
