Add-my-pet Introduction

The purpose of this introduction is to explain:

  • what the different terms in the collection mean;
  • how the collection is organised;
  • how to edit the templates for a species and submit the entry to the collection.


Short intro to the DEB theory

A 16-page introduction to the DEB theory is presented in Kooijman 2012. The concepts of DEB theory are presented in the summary of the DEB book. Sousa et al 2008 and Sousa et al 2010 give a more formal introduction to the standard DEB model.

The general methodology of estimating DEB parameters from data is described in van der Meer 2006; Kooijman et al 2008 shows which particular compound parameters can be estimated from a few simple observations at a single food density, and how an increasing number of parameters can be estimated if more quantities are observed at several food densities. A natural sequence exists in which the parameters can, in principle, be determined.

We here discuss the estimation of all parameters, using circumstantial evidence: the covariation method, as presented in Lika et al 2011, 2011a. This manual and the software follow the DEB notation.

An introduction to modelling and statistics is given in the document Basic methods for Theoretical Biology.

Taxa in the collection

Although DEB theory concerns all organisms, the collection is only about animals, because they can live off a single (chemically complex) resource, and thus can be modeled with a single reserve, and because resource availability is relatively simple to characterize. Within the animals, we made an effort to maximize coverage, given limitations imposed by data availability.

Typified models

Different models of DEB theory have been applied to different organisms. Some of the most used models have been formalized and are called typified models.

There is a set of instructions to go from model std to abj.

Species-specific details that are not included in the computation of implied properties

  • Acanthocephalans live in the micro-aerobic environment of the gut of their host. They don't use dioxygen, but ferment. It is possible to model this (see Section 4.9.1, Kooijman 2010), but this is not yet implemented in the code behind the calculation of the statistics. These particular respiration predictions should, therefore, be ignored.
  • Cephalopods are typically semelparous (death at first spawning) and die well before approaching ultimate body size. For practical purposes, this early death is included as an effect of ageing, but ageing has probably nothing to do with this. The asymptotic size is calculated in the pars-file and some of the listed properties are not realistic as a consequence.
  • The toadlets Crinia lower their allocation fraction to soma between hatch and birth (Mueller et al 2012).
  • Mammals take milk during their baby stage; weaning is included in all stx models for mammals as a maturity threshold, but the change in diet is not taken into account.
  • Many birds first reproduce in their second year under (seasonal) field conditions. They apparently have a relatively long juvenile period during most of which they are fully grown. This trait leads to high values for maturity at puberty and low values for maturity maintenance. Husbandry data indicate that birds can potentially reproduce much earlier, which calls the realism of these two parameters into question.

Parameter estimation

Estimating parameter values from data can be done on the basis of the minimisation of the weighted sum of squared deviations between predicted and observed values, the WLS criterion (see Lika et al 2011).
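Written out in generic symbols (our notation for illustration, not that of the cited papers), with d_i the observed values, p_i(θ) the corresponding predictions for parameter vector θ and w_i the weight coefficients, the quantity that is minimised is

  F(\theta) = \sum_i w_i \, \big(d_i - p_i(\theta)\big)^2

i.e. the weighted sum of squared deviations referred to above.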

The method (as used here) assumes that the deviations are independently normally distributed with a constant variation coefficient. The minima can be found using a Nelder-Mead (= simplex) method. During the procedure the parameter combinations are filtered to constrain them within boundaries (Lika et al 2014).

DEBtool has two regression routines: petregr and petregr_f (f for filter). These methods start from a set of initial values. Provided that a global minimum has been found, the result does not depend on the initial values.

To obtain the estimates, we have to prepare a script-file run_my_pet and three function files mydata_my_pet, pars_init_my_pet and predict_my_pet. We can follow the instructions to start an Add-my-pet estimation.
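As an illustration, a minimal run_my_pet script follows the pattern sketched below (based on the current template; the numerical option values are user choices, not prescribed settings):

  close all;
  global pets

  pets = {'my_pet'};  % entry name: expects mydata_my_pet.m, pars_init_my_pet.m and predict_my_pet.m

  estim_options('default');               % reset the estimation options to their default values
  estim_options('max_step_number', 500);  % maximum number of simplex steps (illustrative value)
  estim_options('max_fun_evals', 5000);   % maximum number of function evaluations (illustrative value)
  estim_options('pars_init_method', 2);   % 2: read initial values from pars_init_my_pet
  estim_options('results_output', 2);     % save the results to results_my_pet.mat
  estim_options('method', 'nm');          % Nelder-Mead simplex method

  estim_pars;                             % run the parameter estimation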

Initial Parameters

Estimation of some 15 parameters simultaneously from a variety of data cannot be routine work. You can only expect useful results if your initial estimates are not too far from the resulting estimates. It is best to either use a time-length-energy framework (as done here) or a time-length-mass framework in the selection of primary parameters and not mix them. Both frameworks can be used to predict energies and masses, using conversion factors.

The add-my-pet estimation procedure offers three methods for setting initial parameter values. These are described in the following subsections.

Automatized setting of initial values

You can make use of an automatized method to find initial values based on the bijection between available zero-variate data and the DEB model parameters. You can directly study the theory in Lika et al 2014 (please send any questions or feedback to the authors).

In practice you need to open run_my_pet and set:

estim_options('method', 'no')
estim_options('pars_init_method', 0)

You can access code and explanations in the DEBtool manual Toolbox: lib/pet (please scroll down to subsection 'automatized initial estimates').

Use values saved from previous runs

If you saved values from previous runs (estim_options('results_output', 2)), then you will have a results_my_pet.mat file in your working directory.

You can read initial parameter values from that file by setting, in run_my_pet:

estim_options('pars_init_method', 1)

Manual setting of initial values

In run_my_pet set:

  • estim_options('method', 'no')
  • estim_options('pars_init_method', 2)

Then you can manually find initial estimates by following the nine-step procedure. Please note that this works well for the 'std' model, but needs to be adjusted if using 'abj'.

Data

The data consists of:

  • a set of zero-variate data (i.e. a set of numbers)
  • and, possibly, one or more sets of uni-variate data (each consisting of a list of values for the independent variable and the associated dependent variable); a sketch of how both are coded in the mydata-file follows below.
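As an illustration, the two types of data are coded in the mydata-file roughly as in the sketch below (values, temperatures and bibkeys are placeholders, not data for a real species):

  % zero-variate data: a single number per quantity, with units, label, bibkey
  % and, for times and rates, the temperature of observation
  data.ab = 21;   units.ab = 'd';  label.ab = 'age at birth';         bibkey.ab = 'Anon2020';
    temp.ab = C2K(20); units.temp.ab = 'K'; label.temp.ab = 'temperature';
  data.Wwi = 350; units.Wwi = 'g'; label.Wwi = 'ultimate wet weight'; bibkey.Wwi = 'Anon2020';

  % uni-variate data: a matrix with the independent variable in the first column
  % and the dependent variable in the second
  data.tL = [ 0 1.1; 10 2.0; 20 2.6; 30 3.0];  % d, cm: time since birth, length
  units.tL = {'d', 'cm'}; label.tL = {'time since birth', 'length'}; bibkey.tL = 'Anon2020';
    temp.tL = C2K(20); units.temp.tL = 'K'; label.temp.tL = 'temperature';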

Zero-variate data comprise real and pseudo data points. Pseudo-data are parameter values corresponding to a generalised animal, i.e. typical values for a wide variety of animals (Lika et al 2011). Pseudo-data serve to fill possible gaps in the information contained in the real data. Only intensive parameters can play the role of pseudo-data points. Species-specific parameters should not be included in the pseudo-data, especially the zoom factor, the shape coefficient and the maturity levels at birth and puberty. Since the value of the specific cost for structure (E_G) is sensitive to the water content of tissue, which differs between jellyfish and vertebrates, it is replaced by the growth efficiency kap_G.

Real data relate to actual observations on the species of interest at specified temperatures and food conditions. Pseudo-data relate to the generalised animal at the reference temperature.

Generally the use of statistics derived from observations, such as the von Bertalanffy growth rate or the half saturation coefficient, as data from which DEB parameters are estimated is discouraged. It is far better to base the parameter estimation directly on the measurements, avoiding manipulation or interpretation. For instance, if wet weights were measured, use wet weights as data and do not convert them first to dry weights (or vice versa).

The real data should at least contain the maximum adult weight. However, it is preferable to also include weight and age at birth and puberty, as well as the maximum reproduction rate. Notice that times and rates without temperature are meaningless. This combination already fixes the growth curve in a crude way and specifies kap and the maturity thresholds at birth and puberty. The weights can be dry, ash-free dry or wet, but the type of weight relates to the specific densities and chemical indices. Assuming that the specific density of wet mass is close to 1 g/cm3, check the values for d_V and d_E, which refer to dry weight.

Data quality and availability

The quality and availability of data varies enormously over species, which has consequences for the entries. Data from field conditions suffer from the problem that temperature and feeding profiles are generally unknown. To a lesser extent, this also applies to laboratory conditions.

Only a few species can be cultured successfully and detailed (chemical) knowledge about nutritional requirements hardly exists for any species. The idea that 'some prediction is better than no prediction' fuelled the collection (e.g. for management purposes), but where data are guessed this is clearly indicated in the mydata-files. The hope is that such weak entries will improve over time by supplementing data and re-estimating parameters. Predictions might help to prioritize further research.

Another motivation to include weak entries is that predictions for situations that have not yet been studied empirically can be used to test the theory rigorously. It is encouraging to see how little data already allows for an estimation of parameters. That the results are not fully random is supported by the observation that similar species (in terms of body size, habitat and taxonomy) have similar parameter values, despite a lack of advanced data. See, for instance, the different species of tardigrades. The reliability of the resulting estimates and predictions should always be evaluated in the context of the data on which they are based. Generally, the more types of data, the more reliable the results.

Where many different data sources are used, however, conditions can vary to the extent that the variations cannot be ignored. In some mydata-files this is taken into account by assigning different feeding conditions to different data sets, as sketched below. Notice that the scaled functional response only takes differences in food density into account, not differences in food quality. If food qualities differ, the scaled functional response is no longer less than or equal to 1, but might be larger. If feeding densities and qualities are not specified with the data, this "repair" is far from ideal, however.
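In practice this is often done by giving such a data set its own scaled functional response as an extra parameter in the pars_init-file, which the predict-file then uses for that data set. A hedged sketch (the name f_tL and its value are illustrative):

  % in pars_init_my_pet: extra scaled functional response for the tL data set
  par.f_tL = 0.8;  free.f_tL = 1;  units.f_tL = '-';  label.f_tL = 'scaled functional response for tL data';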

The variation not only concerns environmental conditions, but also differences in parameter values among individuals that have been used. Parameter values tend to vary across the geographical range of a species, a problem that applies to many fish entries. Although parameter values are better fixed with a growing number of data types, the inherent variability works in the opposite direction. This is why marks have been given for both completeness of data and goodness of fit.

Weight coefficients

By default, weights are set automatically (read more here). As a general rule, the weight coefficients should quantify how certain you are about the facts. Thus, depending on your data, you have the possibility to overwrite the automatically set weight coefficients. How to do this is specified in the mydata_my_pet template. We provide here guidance for setting weight coefficients when using the WLS criterion:

  • Choose weight coefficients inversely proportional to the squared data value, to avoid effects of the choice of units. To also avoid too high weights for low values and too much impact for data sets with many points, the weight coefficients of the different points in a single data set are all chosen inversely proportional to the product of the number of points in that data set and the squared mean value of the data points.
  • Weigh pseudo-data less than real data in the zero-variate data by a factor 10 (or so).
  • Weigh points that you don't trust with weight coefficient zero.
  • Apply factors to increase or decrease the importance of data in determining the estimated parameter values, accounting for presumed accuracy.
  • First reach convergence with the above-mentioned rules and then re-weigh particular data points to avoid odd results, using the previous estimates as starting values if the result looks promising.

A typical reason to re-weigh particular data points is to avoid unrealistic parameter estimates that do not pass the filters; we can't accept negative k_J-values or kap-values outside the (0,1)-range, for instance.
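In the mydata-file the default weight coefficients are first set automatically and can then be overwritten; pseudo-data and their (lower) weights are added by a helper function. A sketch based on the current template, with illustrative factors:

  % set the default weight coefficients for all real data
  weights = setweights(data, []);

  % overwrite particular weights, e.g. give the length-at-time data more impact
  % and exclude an untrusted zero-variate point (illustrative choices)
  weights.tL  = 5 * weights.tL;
  weights.Wwb = 0 * weights.Wwb;

  % add the pseudo-data of the generalised animal, with their weights
  [data, units, label, weights] = addpseudodata(data, units, label, weights);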

Goodness of fit criterion

For comparative purposes, e.g. to find patterns in parameter values among species, it helps to judge the goodness of fit using the mean relative error (MRE) and the symmetric mean squared error (SMSE). MRE can have values from 0 to infinity, while SMSE has values from 0 to 1. In both cases, 0 means that predictions match the data exactly. MRE assesses the differences between data and predictions additively, judging an overestimation and an underestimation of the same relative size equally (e.g. +20% or -20% give the same contribution), while SMSE assesses the differences multiplicatively, judging an overestimation and an underestimation by the same factor equally (e.g. x2 or /2 give the same contribution). Notice that the result of the minimization of loss functions does not, generally, correspond with the minimum of MRE or SMSE (unless the fit is perfect).
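With d_i a real data point and p_i its prediction, the per-point contributions can be written as (our formalisation of the verbal description above; the code may group contributions per data set and apply the weight coefficients)

  \mathrm{RE}_i = \frac{|p_i - d_i|}{|d_i|}, \qquad \mathrm{SE}_i = \frac{(p_i - d_i)^2}{p_i^2 + d_i^2},

so that MRE, a mean of the RE_i, is unbounded above, while each SE_i, and hence SMSE, lies between 0 and 1.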

Relative errors within a uni-variate data set are summarised to that of a single data point by taking the mean relative error over all its data points. Only real data, not pseudo-data, are included in the assessment. If all weight coefficients of a data set are zero, it is not included in the computation of the MRE. The best situation is, of course, that of a small MRE. It is likely that the marks for completeness and goodness of fit will be negatively correlated.

Add-my-pet papers