AmP estimation procedure

AmP estimation
Concepts

Data and completeness
Parameter estimation
Goodness-of-fit: SMSE / MRE
AmP Literature

Practice - essentials

Starting an estimation for a new species
Setting initial parameter values
Setting weight coefficients
Computing implied properties
Submitting to the collection
Obtaining parameter confidence intervals

Practice - extra modules
Code specification
User-defined files: run, mydata, pars_init, predict
Data: Zero-variate, Univariate, Pseudo-data

Typified models
Estimation options

The purpose of this Wiki-style AmP (Add-my-pet) estimation procedure manual is to explain:

  • The concepts (this page): a description of the methodology used in the AmP project and for parameter estimation;
  • Practice and getting started (see table on the right for quick-access links): technical aspects (code specifications), how to use the scripts, and how to arrive at parameter estimates.

AmP is a project based on Dynamic Energy Budget theory for metabolic organization. For a short intro to DEB theory visit the DEB Wikipedia page, and for more about DEB visit our main DEBwiki portal. An introduction to modelling and statistics is given in the document Basic methods for Theoretical Biology.

Notation

This manual and the DEBtool software (download from GitHub) follow the DEB notation.

Data and data types

The data consist of:

  • a set of zero-variate data (i.e. a set of numbers)
  • and, possibly, one or more sets of uni-variate data (each consisting of a list of values for the independent variable and the associated dependent variable).

Data sources are referenced in the mydata file.

Zero-variate data comprise real data and pseudo-data points. Real data relate to actual observations on the species of interest at specified temperatures and food conditions. Pseudo-data relate to the generalised animal at the reference temperature. Increasing the number of types of real data (and thus the information they carry) decreases the role of the pseudo-data in the parameter estimation. The impact of pseudo-data on the resulting parameter estimates is controlled by the weight coefficients.

The real data should at least contain the maximum adult weight. However, it is preferable to also include weight and age at birth and puberty as well as the maximum reproduction rate. This combination already fixes the growth curve in a crude way, and specifies kap and the maturity thresholds at birth and puberty. Notice that times and rates are meaningless without the temperature to which they apply. The weight can be dry, ash-free dry or wet weight, but the type of weight relates to the specific densities and chemical indices. Assuming that the specific density of wet mass is close to 1 g/cm3, check the values for d_V and d_E, which refer to dry weight.

Pseudo-data are parameter values corresponding to a generalized animal, i.e. typical values for a wide variety of animals (Lika et al 2011). These values may change as the AmP collection increases. Pseudo-data serve to fill possible gaps in the information that is contained in the real data. Only intensive parameters can play the role of pseudo-data points. Species-specific parameters should not be included in the pseudo-data, especially the zoom factor, the shape coefficient and the maturity levels at birth and puberty. Since the value for the specific cost for structure (E_G) is sensitive to the water content of tissue, which differs between jellyfish and vertebrates, it is replaced by the growth efficiency kap_G. Pseudo-data, if used properly, can play several roles. They increase the identifiability of parameters and thus prevent the ambiguous determination of parameter values. They can also be used to define the area of the parameter space where the parameter values are reasonable.

Generally the use of statistics derived from observations, such as the von Bertalanffy growth rate or the half saturation coefficient, as data from which DEB parameters are estimated, is discouraged. It is far better to base the parameter estimation directly on the measurements, avoiding manipulation or interpretation. For instance, if wet weights were measured, use wet weights as data and do not convert them first to dry weights (or vice versa).

Data quality and availability

The quality and availability of data vary enormously over species, which has consequences for the entries. For comparative purposes, it helps to judge the completeness of the data using a marking system from 0 (low) to 10 (high) (see Table), published in the Lika et al. 2011 paper.

Data from field conditions suffer from the problem that temperature and feeding profiles are generally unknown. To a lesser extent, this also applies to laboratory conditions. Only a few species can be cultured successfully and detailed (chemical) knowledge about nutritional requirements hardly exists for any species. The idea that 'some prediction is better than no prediction' fueled the collection (e.g. for management purposes), but the mydata-files clearly indicate where data are guessed. The hope is that such weak entries will improve over time by supplementing data and re-estimating parameters. Predictions might help to prioritize further research.

Another motivation to include weak entries is that predictions for situations that have not yet been studied empirically can be used to test the theory rigorously. It is encouraging to see how few data already allow for an estimation of parameters. That results are not fully random is supported by the observation that similar species (in terms of body size, habitat and taxonomy) have similar parameter values, despite the lack of advanced data. See, for instance, the different species of tardigrades. The reliability of the resulting estimates and predictions should always be evaluated in the context of the data on which they are based. Generally, the more types of data, the more reliable the results.

Where many different data sources are used, however, conditions can vary to the extent that variations cannot be ignored. In some mydata-files this is taken into account by assigning different feeding conditions to different data sets. Notice that the scaled functional response only takes differences in food density into account, not differences in food quality. If food qualities differ, the scaled functional response is no longer less than or equal to 1, but might be larger. If feeding densities and qualities are not specified with the data, this "repair" is far from ideal, however.

The variation not only concerns environmental conditions, but also differences in parameter values among individuals that have been used. Parameter values tend to vary across the geographical range of a species, a problem that applies to many fish entries. Although parameter values are better fixed with a growing number of data types, the inherent variability works in the opposite direction. This is why marks have been given for both completeness of data and goodness of fit.

Although DEB theory concerns all organisms, the collection is only about animals. The reason is that animals can live off a single (chemically complex) resource, so they can be modeled with a single reserve, and resource availability is relatively simple to characterize. Within the animals, we made an effort to maximize coverage, given limitations imposed by data availability.

Typified models

Different models of DEB theory have been applied to different organisms. Some of the most used models have been formalized and are called typified models. There is a set of instructions to go from model std to abj.

Species-specific details that are not included in the computation of implied properties:

  • Acanthocephalans live in the micro-aerobic environment of the gut of their host. They don't use dioxygen, but ferment. It is possible to model this (see Section 4.9.1, Kooijman 2010), but this is not yet implemented in the code behind the calculation of the statistics. These particular respiration predictions should, therefore, be ignored.
  • Cephalopods are typically semelparous (death at first spawning) and die well before approaching ultimate body size. For practical purposes, this early death is included as an effect of ageing, although ageing probably has nothing to do with it. The asymptotic size is calculated in the pars-file and, as a consequence, some of the listed properties are not realistic.
  • The toadlets Crinia lower their allocation fraction to soma between hatch and birth (Mueller et al 2012).
  • Mammals take milk during their baby-stage. Weaning is included in all stx models for mammals as a maturity threshold, but the change in diet is not taken into account.
  • Many birds first reproduce in their second year under (seasonal) field conditions. They apparently have a relatively long juvenile period, during most of which they are fully grown. This trait leads to high values for maturity at puberty and low values for maturity maintenance. Husbandry data indicate that birds potentially reproduce much earlier, which questions the realism of these two parameters.


Parameter estimation

Methodology of parameter estimation

We here discuss the estimation of all DEB parameters in context: the AmP method; for details see Marques et al, 2018a and 2018b. Van der Meer 2006 and Kooijman et al, 2008 show which particular compound parameters can be estimated from a few simple observations and how an increasing number of parameters can be estimated if more quantities are observed at several food densities. A natural sequence exists in which parameters can be known in principle. The methodology evolved from the covariation method (Lika et al 2011).

In the AmP collection, parameter values are estimated from a set of data sets by minimizing a parameter-free loss function (see Marques et al 2018 and 2019). This loss function takes the different dimensions of the various data sets into account, penalizes over-estimation as heavily as under-estimation, and uses all data sets simultaneously. The minimum is found using a Nelder-Mead simplex method. A simplex is a set of parameter sets with a number of elements that is one more than the number of free parameters. One of the elements in the set is the specified initial parameter set, the seed; the others are generated automatically in its "neighbourhood". The simplex method tries to replace the worst parameter set by one that is better than the best one, i.e. one that gives a smaller value of the loss function. During the procedure the parameters are (optionally, but by default) filtered to avoid combinations of values that are outside their logical domain (Lika et al 2014).
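As a rough illustration of these ingredients, the sketch below evaluates a loss of the symmetric bounded type for two small data sets and builds an initial simplex around a seed. The formula (squared deviations scaled by the sum of the squared data mean and the squared prediction mean per data set) is an assumption about the form of such a loss function; consult Marques et al 2019 for the definitive expression. The function names, the numbers and the 5% perturbation used to generate the neighbours are hypothetical and are not taken from DEBtool.

```python
from statistics import mean

def loss_sb(data_sets, pred_sets, weights):
    """Assumed symmetric bounded form:
    sum_i sum_j w_ij (d_ij - p_ij)^2 / (mean(d_i)^2 + mean(p_i)^2)."""
    total = 0.0
    for d, p, w in zip(data_sets, pred_sets, weights):
        scale = mean(d) ** 2 + mean(p) ** 2  # makes the term dimensionless
        total += sum(wj * (dj - pj) ** 2 for dj, pj, wj in zip(d, p, w)) / scale
    return total

def initial_simplex(seed, rel_step=0.05):
    """Return n + 1 parameter sets: the seed plus one neighbour per free parameter.
    Each neighbour perturbs one parameter by rel_step (a hypothetical choice;
    a parameter equal to 0 would need an absolute step instead)."""
    simplex = [list(seed)]
    for k in range(len(seed)):
        neighbour = list(seed)
        neighbour[k] *= 1.0 + rel_step
        simplex.append(neighbour)
    return simplex

# One zero-variate and one uni-variate data set (made-up numbers).
data = [[10.0], [1.0, 2.0, 4.0]]
weights = [[1.0], [1 / 3, 1 / 3, 1 / 3]]
over = [[20.0], [1.0, 2.0, 4.0]]   # first data set over-predicted by a factor 2
under = [[5.0], [1.0, 2.0, 4.0]]   # first data set under-predicted by a factor 2

loss_over = loss_sb(data, over, weights)
loss_under = loss_sb(data, under, weights)
simplex = initial_simplex([0.8, 0.02, 20.0])  # 3 free parameters -> 4 parameter sets
```

With this assumed form, over- and under-prediction by the same factor give the same loss value, and a seed with three free parameters yields a simplex of four parameter sets.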

The procedure starts from a set of initial values. Provided that a global minimum has been found, the result does not depend on the initial value.

Obtaining parameter estimates

Estimation of some 15 parameters simultaneously from a variety of data cannot be routine work. You can only expect useful results if your initial estimates are not too far from the resulting estimates. It is best to either use a time-length-energy framework (as done here) or a time-length-mass framework in the selection of primary parameters and not mix them. Both frameworks can be used to predict energies and masses, using conversion factors.

To obtain the estimates, you have to prepare a script-file run_my_pet and three function files: mydata_my_pet, pars_init_my_pet and predict_my_pet.

You can follow the instructions to start an Add-my-pet estimation for a single species. The DEBtool also enables you to estimate parameters for two or more species simultaneously. This can be interesting in the case that different species share particular parameter values, and/or parameter values have particular assumed relationships. The general idea is that the total number of parameters to be estimated for the group is (considerably) smaller than the sum of the parameters to be estimated for each species.

Weight coefficients

The weight coefficients serve to (subjectively) quantify the confidence of the user in the data sets as well as in specific data points. The AmP procedure distinguishes between real and pseudo-data. For real data, the weight coefficients are automatically set to w_ij = 1/n_i, where i designates the data set, j the point in data set i, and n_i the number of points in data set i. The motivation is to ensure that each data set contributes equally to the loss function (instead of each data point contributing equally). The default weight coefficients for pseudo-data are handled differently.

The user can overwrite the default weight values, either for a whole data set or for particular points. This is done in the mydata file, by multiplying the default value by a dimensionless factor. See Setting weight coefficients.
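The weighting scheme can be sketched as follows, assuming the default w_ij = 1/n_i for real data, which is what makes each data set contribute equally. The data set names, the numbers and the function name are illustrative only; the actual mydata file is not written in Python and its syntax is not shown here.

```python
def default_weights(data_sets):
    """Give every point in data set i the weight 1/n_i, so each set sums to 1."""
    return {name: [1.0 / len(values)] * len(values)
            for name, values in data_sets.items()}

# Hypothetical data: one zero-variate value and one uni-variate set
# (only the dependent values are needed to count the points).
data_sets = {
    "Wwi": [1200.0],
    "tL": [0.5, 1.1, 1.6, 2.0, 2.3],
}

weights = default_weights(data_sets)

# More confidence in a whole data set: multiply its default weights by a factor.
weights["Wwi"] = [5 * w for w in weights["Wwi"]]

# Exclude one doubtful point by multiplying its default weight by 0.
weights["tL"][-1] *= 0
```

Because the override is a multiplication of the default value, a factor of 1 leaves a data set unchanged, and a factor of 0 removes a point (or a whole set) from the loss function.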

Estimation options

The AmP estimation procedure includes several loss functions. The user defines which loss function to use in the estimation options; the default weight coefficients for pseudo-data depend on which loss function is being used. 'sb' stands for the symmetric bounded loss function and 'su' stands for the symmetric unbounded loss function. Please refer to the Estimation options page for the default options.

Your best option is to use a series of short iteration runs, setting 'max_step_number' at 500, say, rather than a single long run, and to use continuation: continue with the previously obtained results. You can do this by first selecting 'pars_init_method' 1, meaning that you start from the values as specified in the pars_init file, in combination with 'results_output' 2, meaning that a .mat file is saved. Then select 'pars_init_method' 2, meaning that you continue with the values as specified in the .mat file that was previously written.

The significance of a series of short runs is that with each restart the simplex has a relatively large volume, which shrinks during iteration. Valleys in the surface of the loss function are then more easily detected, and the risk of arriving at a local minimum that is not the global minimum is reduced. For this reason, it is always a good idea to restart from the result, even in the case of successful convergence.

Many predictions, as specified in the predict file, are the results of numerical procedures, which involve small numerical errors. These errors give the loss function a rough surface, so it is not always possible to arrive at a successful convergence. As long as the resulting fit is good, and the parameter values seem acceptable, this does not need to be a problem. When you think that the result is better than the values in the pars_init file, use mat2pars_init (this function does not need further input) to copy the values of the .mat file to the pars_init file.
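The effect of restarting can be illustrated with a toy example. The sketch below is not DEBtool code: it uses a simplified Nelder-Mead routine (reflection, expansion, inside contraction and shrink only) and a hypothetical loss function with a curved valley. Each short run starts from the result of the previous one with a freshly built, relatively large simplex, loosely mimicking the continuation described above. All names and settings (`nelder_mead`, `toy_loss`, the 5% neighbourhood, 50 steps per run) are assumptions made for illustration.

```python
def nelder_mead(f, seed, max_steps=50, rel_step=0.05, tol=1e-10):
    """Simplified Nelder-Mead; returns (best_point, best_value, converged)."""
    n = len(seed)
    # Fresh simplex: the seed plus one neighbour per parameter.
    simplex = [list(seed)] + [
        [x * (1 + rel_step) if i == k else x for i, x in enumerate(seed)]
        for k in range(n)
    ]
    for _ in range(max_steps):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        if abs(f(worst) - f(best)) < tol:
            return best, f(best), True
        centroid = [sum(p[i] for p in simplex[:-1]) / n for i in range(n)]

        def along(t):
            # Point on the line through the centroid, away from the worst point.
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(1.0)
        if f(reflected) < f(best):
            expanded = along(2.0)
            simplex[-1] = expanded if f(expanded) < f(reflected) else reflected
        elif f(reflected) < f(simplex[-2]):
            simplex[-1] = reflected
        else:
            contracted = along(-0.5)
            if f(contracted) < f(worst):
                simplex[-1] = contracted
            else:
                # Shrink all points halfway towards the best one.
                simplex = [best] + [
                    [(b + x) / 2 for b, x in zip(best, p)] for p in simplex[1:]
                ]
    simplex.sort(key=f)
    return simplex[0], f(simplex[0]), False

def toy_loss(p):
    # Hypothetical loss with a curved valley (Rosenbrock function); minimum at (1, 1).
    return (1 - p[0]) ** 2 + 100 * (p[1] - p[0] ** 2) ** 2

pars = [-1.2, 1.0]          # plays the role of the values in pars_init
start_value = toy_loss(pars)
history = []
for run in range(20):       # a series of short runs, each continuing from the last
    pars, value, converged = nelder_mead(toy_loss, pars, max_steps=50)
    history.append(value)   # restart even if this run reported convergence
```

Since the seed is always part of the new simplex, the best loss value cannot get worse from one run to the next; inspecting `history` shows how much each restart still gains.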

In many cases, convergence will be smooth and easy, but sometimes convergence is more reluctant. In such cases it helps to first fix parameters in the pars_init that turn out to run to unrealistic values, and release them again if predictions are closer to data. The free/fix setting is always taken from the pars_init file, even with 'pars_init_method' 2, when the parameter values are taken from the .mat file.

To judge parameter values, you can study the implied properties by setting 'results_output' at 3; an html-page is then automatically opened in your system browser at the end of an iteration (or directly if 'method' 'no' is specified). If all looks OK, you can specify 'results_output' 4, and implied properties of related species in the collection are included in the table in the html-page.

Goodness of fit criterion

The match between data and predictions is quantified by the goodness of fit using the mean relative error (MRE) and the symmetric mean squared error (SMSE). MRE can have values from 0 to infinity, while SMSE has values from 0 to 1. In both cases, 0 means that predictions match data exactly. MRE assesses the differences between data and predictions additively, judging an overestimation and an underestimation of the same relative size equally (e.g. +20% or -20% will give the same contribution), while SMSE assesses the difference multiplicatively, judging overestimation and underestimation by the same factor equally (e.g. x2 or x/2 will give the same contribution). Notice that the result of the minimization of loss functions does not, generally, correspond with the minimum of MRE or SMSE (unless the fit is perfect).
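The difference between the additive and the multiplicative view can be checked numerically. The sketch below assumes a per-point relative error |p - d| / |d| and a per-point symmetric squared error (p - d)^2 / (d^2 + p^2), averaged without weights; these forms are assumptions chosen to reproduce the properties stated above, and the actual implementation may scale by data set means and use the weight coefficients.

```python
def relative_error(data, pred):
    """Mean of |p - d| / |d| over the points (assumed form, unweighted)."""
    return sum(abs(p - d) / abs(d) for d, p in zip(data, pred)) / len(data)

def symmetric_squared_error(data, pred):
    """Mean of (p - d)^2 / (d^2 + p^2) over the points (assumed form, unweighted)."""
    return sum((p - d) ** 2 / (d ** 2 + p ** 2) for d, p in zip(data, pred)) / len(data)

d = [10.0]

# Additive symmetry of the relative error: +20% and -20% contribute equally.
re_plus = relative_error(d, [12.0])
re_minus = relative_error(d, [8.0])

# Multiplicative symmetry of the symmetric squared error: x2 and x/2 contribute equally.
sse_double = symmetric_squared_error(d, [20.0])
sse_half = symmetric_squared_error(d, [5.0])

# The relative error is not symmetric for factors: x2 gives 1.0, x/2 gives 0.5.
re_double = relative_error(d, [20.0])
re_half = relative_error(d, [5.0])
```

For positive data and predictions the symmetric squared error stays between 0 and 1, whereas the relative error grows without bound as the prediction increases.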

Relative errors in a univariate data set are summarized to that of a single data-point by taking the MRE for all data-points. Only real data, not pseudo-data, are included in the assessment. If all weight coefficients of a data set are zero, it is not included in the computation of the MRE. The best situation is, of course, that of a small MRE. It is likely that the marks for completeness and goodness of fit will be negatively correlated.

The problem of a good fit for the wrong reasons is always present. It is, therefore, important to judge the realism of parameter values as well. Remember that parameters might be poorly fixed by data, and very different values can, sometimes, result in a tiny difference in goodness of fit.


Obtaining parameter confidence intervals

Uncertainty of the point estimates of parameter values can be assessed by computing the marginal confidence intervals using the profile method, as described in Marques et al 2019 (development) and Stavrakidis-Zachou et al. 2018 (application). The profile method is a two-step procedure. In the first step, the profile (of the loss function) for a parameter is obtained. In the second, which is the calibration step, the level of the loss function that corresponds to uncertainty is computed. See a more detailed tutorial here.
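The first step, computing a profile, can be sketched on a toy problem. In the example below the loss function, the grid, the search bounds and the threshold are all hypothetical. The threshold in particular is a made-up number: in the actual method it results from the calibration step, which is not reproduced here. For each fixed value of the parameter of interest, the loss is minimized over the remaining parameter with a golden-section search.

```python
def golden_min(f, lo, hi, tol=1e-8):
    """Golden-section search: minimum value of a unimodal function on [lo, hi]."""
    inv_phi = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d   # the minimum lies in [a, d]
        else:
            a = c   # the minimum lies in [c, b]
    return f((a + b) / 2)

def toy_loss(p, q):
    # Hypothetical loss with two correlated parameters; minimum 0 at (p, q) = (2, 3).
    return (p - 2) ** 2 + (q - 3) ** 2 + (p - 2) * (q - 3)

def profile(loss, grid, lo, hi):
    """For each fixed p on the grid, minimize the loss over the other parameter q."""
    return [(p, golden_min(lambda q: loss(p, q), lo, hi)) for p in grid]

grid = [1.0, 1.5, 2.0, 2.5, 3.0]
prof = profile(toy_loss, grid, 0.0, 6.0)

threshold = 0.5  # hypothetical level; the real one comes from the calibration step
interval = [p for p, value in prof if value <= threshold]
```

The grid values whose profile stays at or below the threshold delimit the interval for the parameter of interest; a finer grid gives sharper interval boundaries.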


References

Editorials of DEB special issues

DEB model introductions

  • Kearney 2020: What is the status of metabolic theory one century after Pütter invented the von Bertalanffy growth curve?
  • Kooijman 2020: The standard Dynamic Energy Budget model has no plausible alternatives
  • Jager 2020: Revisiting simplified DEBtox models for analysing ecotoxicity data
  • Muller et al 2019: Regulation of reproductive processes with dynamic energy budgets
  • Jusup et al 2017: Physics of metabolic organization
  • Sara et al 2014: Thinking beyond organism energy use: a trait-based bioenergetic mechanistic approach for predictions of life history traits in marine organisms
  • Ledder 2014: The basic Dynamic Energy Budget model and some implications
  • Kooijman 2012: Energy budgets
  • Lika and Kooijman 2011: The comparative topology of energy allocation in budget models
  • Sousa et al 2006: The thermodynamics of organisms in the context of Dynamic Energy Budget theory
  • Kooijman 2001: Quantitative aspects of metabolic organization; a discussion of concepts
  • Kooijman 1998: The Dynamic Energy Budget (DEB) model

DEB in evolutionary context

Parameter estimation

  • Lika et al 2020: The use of augmented loss functions for estimating Dynamic Energy Budget parameters
  • Augustine et al 2020: Comparing loss functions and interval estimates for survival data
  • Marques et al 2019: Fitting Multiple Models to Multiple Data Sets
  • Marques et al 2018: The AmP project: Comparing Species on the Basis of Dynamic Energy Budget Parameters
  • Morais et al 2018: Calibration of parameters in Dynamic Energy Budget models using Direct-Search methods
  • Lika et al 2011: The 'covariation method' for estimating the parameters of the standard Dynamic Energy Budget model II: Properties of the estimation method and some patterns
  • Lika et al 2011: The 'covariation method' for estimating the parameters of the standard Dynamic Energy Budget model I: philosophy and approach
  • Kooijman et al 2008: From food-dependent statistics to metabolic parameters, a practical guide to the use of Dynamic Energy Budget theory
  • Sousa et al 2008: From empirical patterns to theory: A formal metabolic theory of life
  • van der Meer 2006: An introduction to Dynamic Energy Budget (DEB) models with special emphasis on parameter estimation

Patterns in parameter values

  • van der Meer 2020: Production efficiency differences between poikilotherms and homeotherms have little to do with metabolic rate
  • Kooijman et al 2020: The energetic basis of population growth in animal kingdom
  • Kooijman 2020: The comparative energetics of petrels and penguins
  • Augustine et al 2019: Altricial-precocial spectra in animal kingdom
  • Augustine et al 2019: Why big-bodied animal species cannot evolve a waste-to-hurry strategy
  • Lika et al 2019: Body size as emergent property
  • Baas and Kooijman 2015: Sensitivity of animals to chemical compounds links to metabolic rate
  • Kooijman and Lika 2014: Comparative energetics of the 5 fish classes on the basis of Dynamic Energy Budgets
  • Kooijman 2014: Metabolic acceleration in animal ontogeny: an evolutionary perspective
  • Kooijman and Lika 2014: Resource allocation to reproduction in animals
  • Lika et al 2014: Bijection between data and parameter space quantifies the supply-demand spectrum
  • Kooijman 2013: "Waste-to-hurry" Dynamic Energy Budgets explain the need of wasting to fully exploit blooming resources
  • Kooijman et al 2011: Scenarios for acceleration in fish development and the role of metamorphosis
  • Kooijman et al 2007: Scaling relationships based on partition coefficients and body sizes have similarities and interactions
  • Cardoso et al 2006: Body size scaling relationships in bivalves: a comparison of field data with predictions by Dynamic Energy Budgets (DEB theory)
  • van der Veer et al 2006: The estimation of DEB parameters for various Northeast Atlantic bivalve species
  • van der Veer et al 2003: Body size scaling relationships in flatfish as predicted by Dynamic Energy Budgets (DEB theory): implications for recruitment
  • Kooijman 1986: Energy budgets can explain body size relations

Bibliography of DEB papers