
Quality Assurance of Exposure Models for Environmental Risk Assessment of Substances

Doctoral Thesis / Dissertation 2000 198 Pages

Mathematics - Applied Mathematics

Excerpt

Contents

Tables

Figures

Summary

Keywords

Acknowledgements

Abbreviations

Preface

1 Introduction

2 Evaluation of models
2.1 Assuring the quality of models
2.1.1 The validation problem
2.1.2 External and internal validation and software evaluation
2.1.3 The importance of the model’s purpose
2.2 Model validation methodology
2.2.1 Internal validation
2.2.2 External validation
2.2.3 Both aspects of validation
2.3 Software evaluation methodology
2.3.1 Quality testing of software
2.3.2 Quality requirements regarding ISO/IEC 12119
2.3.3 Quality requirements for risk assessment programmes
2.4 Discussion
2.5 Conclusions
2.6 Summary

3 Handling Uncertainties
3.1 Types of uncertainty
3.1.1 Uncertainties in exposure assessment
3.1.2 True parameter uncertainty and parameter variability
3.2 Sensitivity analyses
3.2.1 Background and benefit
3.2.2 Methodology
3.3 Scenario analyses
3.3.1 Point estimates
3.3.2 Limitations of the approach
3.4 Probabilistic analyses
3.4.1 Background
3.4.2 Methodological survey
3.4.3 Benefits
3.4.4 Monte-Carlo analyses
3.4.5 Probability distributions
3.5 Standards for exposure assessments
3.5.1 View on the US situation
3.5.2 View on the EU and German situation
3.6 Discussion and conclusions
3.6.1 Methodology for handling the different types of uncertainties
3.6.2 Sensitivity analysis methodology
3.6.3 Probabilistic analysis methodology
3.6.4 The methodology in the context of model validation
3.7 Summary

4 Exposure models
4.1 Terminology
4.2 Types of models
4.3 Description of the models’ structure and equations
4.3.1 Overall system
4.3.2 Fish
4.3.3 Meat and milk
4.3.4 Plants
4.3.5 Drinking water
4.3.6 Human exposure
4.4 Purpose of the models and software
4.5 Probabilistic extension of the models
4.6 Discussion and conclusions
4.7 Summary

5 Substances and parameters
5.1 Selected substances
5.1.1 Polychlorinated dibenzo-p-dioxins (PCDD)
5.1.2 Polychlorinated biphenyls (PCB)
5.1.3 Di-(2-ethylhexyl)phthalate (DEHP)
5.1.4 Hexahydro-hexamethyl-cyclopenta-[g]-2-benzopyrane (HHCB)
5.1.5 Linear alkyl benzene sulfonates (LAS)
5.1.6 Ethylenediaminetetraacetic acid (EDTA)
5.1.7 1,2-Dichloroethane (EDC)
5.1.8 Benzene (BENZ)
5.2 Input parameters
5.2.1 Parameters for the regional distribution model and its respective scenarios
5.2.2 Parameters of the exposure module
5.2.3 Concentrations
5.3 Evaluative terms for the external validation
5.3.1 Accuracy and uncertainty in effect assessment
5.3.2 Definition of evaluative terms
5.4 Summary

6 Inspection of theory
6.1 Verification
6.2 Underlying assumptions
6.2.1 Fish
6.2.2 Meat and milk
6.2.3 Plants
6.2.4 Drinking water
6.2.5 Human exposure
6.3 Conclusions
6.4 Summary

7 Sensitivity analyses
7.1 Analytic approach
7.2 Substance-based approach (overall system)
7.3 Substance-based approach (exposure module only)
7.4 Conclusions
7.5 Summary

8 Scenario analyses and comparison with measured data
8.1 Bioconcentration model fish
8.1.1 Comparison with experimental data
8.1.2 Comparison to the monitoring data
8.2 Biotransfer into milk and meat
8.3 Uptake by plants
8.4 Human exposure
8.4.1 Predicted doses
8.4.2 Contribution of the exposure pathways
8.5 Concluding evaluation
8.6 Summary

9 Probabilistic uncertainty analyses
9.1 Uncertainty impact analyses of individual parameters
9.2 Cumulative distribution functions of the total daily dose
9.2.1 Comparison with point estimates
9.2.2 Comparison with alternative assessments
9.2.3 Impact of ignoring correlations
9.2.4 Impact of unknown degradation rates
9.2.5 Impact of other age-specific intake rates
9.3 Uncertainty impact analyses of parameter groups
9.4 Conclusions
9.5 Summary

10 Comparison with alternative models
10.1 Alternatives to the bioconcentration model for fish
10.2 Alternatives to the biotransfer model for meat and milk
10.3 Alternatives to the plant model
10.4 Alternative human exposure pathways
10.5 Conclusions
10.6 Summary

11 Software evaluation
11.1 Product description
11.2 Documentation
11.2.1 Printed documentation
11.2.2 Online documentation
11.3 Technical requirements
11.3.1 Installation and system requirements
11.3.2 Stability and reliability
11.3.3 State-of-the-art
11.3.4 Network-support
11.3.5 Miscellaneous
11.4 Correctness of calculations (verification)
11.5 User interface and operability
11.6 Transparency
11.7 Features
11.8 Cooperation with other programmes
11.9 Uncertainty analyses capability
11.10 Support
11.11 Conclusions and proposals
11.12 Summary

12 Conclusions
12.1 Applicability of the models
12.2 Database
12.3 Classes of chemicals posing problems
12.4 General remarks regarding applicability
12.5 Conceptual suggestions
12.6 Concluding remarks

Bibliography

Appendix

Tables

Tab. 3.1 Probability distributions used in this study

Tab. 4.1 Parameters of the plant model

Tab. 5.1 Distribution types used

Tab. 5.2 Characteristics of the investigated regions

Tab. 6.1 Assumptions for estimating bioconcentration for fish

Tab. 6.2 Assumptions for estimating biotransfer into meat and milk

Tab. 6.3 Assumptions for estimating concentrations in plants

Tab. 6.4 Assumptions for estimating the total daily intake

Tab. 7.1 Substance-independent statements on parameter sensitivities of the submodels

Tab. 7.2 Impact of all parameters from the overall system on the daily intake (DOSEtotal)

Tab. 7.3 Impact of all parameters from the exposure module on the daily intake (DOSEtotal)

Tab. 7.4 Impact of the exposure pathways on the daily intake (DOSEtotal)

Tab. 7.5 Sensitivity analysis of the plant model (CRroot)

Tab. 7.6 Sensitivity analysis of the plant model (CLeaf)

Tab. 7.7 Sensitive parameters in relation to substance properties (impact on DOSEtotal)

Tab. 8.1 Estimation of the molecular dissolved fraction

Tab. 8.2 Number of values below detection limit

Tab. 8.3 Estimated averaged fractions of the environmental media for PCDD and PCB intake

Tab. 8.4 Number of values below detection limit in beef

Tab. 8.5 Measured, used and calculated particulate fractions

Tab. 8.6 Overview of KOC estimation functions for non-dissociating organic chemicals

Tab. 8.7 Overview of TSCF estimation functions for non-dissociating organic chemicals

Tab. 8.8 Contributions of the individual exposure pathways to the total

Tab. 9.1 Results of the ranked correlation between DOSEtotal and sensitive input parameters

Tab. 9.2 Survey of the five simulations carried out for each chemical

Tab. 9.3 Results of the goodness-of-fit test

Tab. 10.1 Carryover rates for fodder/milk from six sources

Tab. 10.2 Comparison of exposure pathways used by EUSES and CalTOX™

Tab. 11.1 Examples of errors

Tab. 11.2 Measures to improve the software’s operability

Tab. 11.3 Proposals for improvements concerning the programme’s features

Tab. 12.1 Evaluation of the applicability of problematic substances

Tab. 12.2 Expected deviations (for both the standard and the realistic scenario) in orders of magnitude (OoM)

Tab. 12.3 Effort of conceptual changes

Figures

Fig. 2.1 Model validation and software evaluation as parts of the quality assurance

Fig. 3.1 Classification of uncertainty and associated sources

Fig. 3.2 Procedure to select appropriate probability distributions

Fig. 4.1 Successive refinement of the overall system

Fig. 4.2 From plant to model: Simplification and incorporated processes

Fig. 6.1 Regression ranges in the exposure module

Fig. 6.2 Fraction of the non-dissociated form for both acids and bases

Fig. 7.1 Parameters and their connectivity in the plant model

Fig. 7.2 Non-linearity in the plant model for substance PCB52

Fig. 7.3 Sensitivity of the fraction of air in plants for a fictive substance

Fig. 8.1 Calculated and experimental BCF for the rather hydrophilic compounds

Fig. 8.2 Calculated and experimental BCF for the lipophilic compounds

Fig. 8.3 Comparison of the median values of measured and calculated PCB conc. in fish

Fig. 8.4 Comparison of monitoring data for various PCB congeners with calculated conc.

Fig. 8.5 Comparison of measured with predicted conc. in beef for PCDD and PCB

Fig. 8.6 Comparison of measured conc. in milk with predictions for PCDD and PCB

Fig. 8.7 Comparison of measured and predicted PCDD and PCB concentrations for grass

Fig. 8.8 PCB in lettuce with and without considering deposition

Fig. 8.9 Measured and predicted concentrations for DEHP and 14C-marked LAS

Fig. 8.10 Calculated conc. in plants using measured and estimated particulate fractions

Fig. 8.11 Comparison of KOW-based KOC estimation functions

Fig. 8.12 Comparison of KOW-based TSCF estimation functions

Fig. 8.13 Results of the scenario analyses of the total daily intake

Fig. 9.1 Cumulative distribution of the total daily dose for TCDD

Fig. 9.2 Cumulative distribution of the total daily dose for PeCDD

Fig. 9.3 Cumulative distribution of the total daily dose for HxCDD

Fig. 9.4 Cumulative distribution of the total daily dose for HpCDD

Fig. 9.5 Cumulative distribution of the total daily dose for OCDD

Fig. 9.6 Cumulative distribution of the total daily dose for DEHP

Fig. 9.7 Cumulative distribution of the total daily dose for HHCB

Fig. 9.8 Cumulative distribution of the total daily dose for 1,2-Dichloroethane (EDC)

Fig. 9.9 Cumulative distribution of the total daily dose for benzene (BENZ)

Fig. 9.10 Cumulative distribution of the total daily dose for EDTA

Fig. 9.11 Cumulative distribution of the total daily dose for LAS

Fig. 9.12 Range of uncertainties of the total daily dose

Fig. 9.13 Maximal deviation between two percentiles using intake rates for children

Fig. 9.14 Contribution of the three parameter groups to the variance of the total daily dose

Fig. 9.15 Refined view of the contribution of the parameter groups

Fig. 10.1 Probability density functions for PCB 52 conc. in milk

Fig. 10.2 Comparison of measured PCDD and PCB conc. in milk

Fig. 10.3 Comparison of calculated milk concentrations with monitoring data

Fig. 10.4 Comparison of calculated and measured concentrations in plants

Fig. 11.1 The status line

Fig. 11.2 Window Study/Defaults/Release Estimation

Fig. 12.1 Illustration of the impact of parameter and scenario uncertainty

Summary

Environmental risk assessment of chemical substances in the European Union is based on a harmonised scheme. The required models and parameters are laid down in the Technical Guidance Document (TGD) and are implemented in the EUSES software. Although the results may have a considerable ecological and economic impact, little guidance is given on the applicability of the framework. To fill this gap, an evaluation study of the TGD exposure models was carried out. In particular, the models for estimating chemical intake by humans were investigated. These models, which are a key component of risk assessment, quantify human contact with environmental contamination in various media through various exposure pathways. The objective of this study was twofold: first, to develop an evaluation methodology, since no appropriate approach is available in the scientific literature; second, to elaborate the applicability and limitations of the models and to provide proposals for their improvement.

The principles of model evaluation in terms of quality assurance, model validation and software evaluation were elaborated, and a suitable evaluation protocol for chemical risk assessment models was developed. Since scientific theories and the mathematical models embedded therein cannot be proved true, a pragmatic meaning of validation is required, whose primary purpose is to increase the level of confidence placed in the model. The accuracy of the model outcome is a necessary but not a sufficient criterion for the quality assurance of models. A wider approach is required which examines the scientific inferences that can be made about models with regard to their intended purpose. A review of the literature on the validation problem showed that all facets of validation can be assigned to generic (internal) and task-specific (external) properties of a model. In this context, sensitivity and uncertainty analyses are essential for tackling the issue of uncertainty. Sensitivity analysis aims to ascertain how a given model depends upon the information fed into it. Uncertainty analysis aims to quantify the uncertainty regarding what comes out of the model. It was argued that targeted uncertainty analysis, with sensitivity analysis as a part of it, is capable of reducing critical uncertainties and represents an essential contribution to assuring the quality of a model. Appropriate and detailed quality criteria for fate and exposure assessment software were developed. These are based on common software standards supplemented by specific requirements for application in risk assessment. Altogether, quality assurance of a model includes internal and external validation as well as the evaluation of the respective software. It should focus not only on the predictive capability of a model, but also on the strength of its theoretical underpinnings, the evidence supporting the model’s conceptualisation, the database and the software.
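To make the idea of sensitivity analysis concrete, the following Python sketch computes a normalised (relative) sensitivity coefficient, S = (ΔY/Y)/(ΔX/X), by central finite differences around a point estimate. It is a minimal illustration under stated assumptions, not the procedure used in this study: the toy model `dose_fish` (fish concentration from a log-linear BCF regression on log KOW, multiplied by a fish intake rate and divided by body weight) and all of its coefficients and parameter values are hypothetical placeholders.

```python
def dose_fish(c_water, log_kow, intake_fish, body_weight):
    """Hypothetical toy model of the daily dose via fish (mg/kg bw/d).

    The log-linear BCF regression uses illustrative coefficients only.
    """
    bcf = 10 ** (0.85 * log_kow - 0.70)   # L/kg, illustrative regression
    c_fish = bcf * c_water                # mg/kg fish from mg/L water
    return c_fish * intake_fish / body_weight


def normalised_sensitivity(model, params, name, rel_step=0.01):
    """Relative sensitivity S = (dY/Y) / (dX/X) of `model` to parameter `name`,
    estimated by a central finite difference of +/- rel_step around params."""
    base = model(**params)
    x = params[name]
    y_up = model(**{**params, name: x * (1 + rel_step)})
    y_down = model(**{**params, name: x * (1 - rel_step)})
    return (y_up - y_down) / base / (2 * rel_step)


# Illustrative point estimate (hypothetical values, not TGD defaults).
base_params = {"c_water": 1e-6, "log_kow": 4.0,
               "intake_fish": 0.115, "body_weight": 70.0}

sensitivities = {k: normalised_sensitivity(dose_fish, base_params, k)
                 for k in base_params}
for k, s in sensitivities.items():
    print(f"{k:12s} S = {s:+.2f}")
```

Under these assumptions, the linear factors give S ≈ +1, body weight gives S ≈ -1, and log KOW gives S ≈ +7.8 because it enters the exponent. This is one way a strong dependence on lipophilicity shows up in a local analysis. A local coefficient says nothing about how uncertain a parameter is; that is what the uncertainty analysis adds.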

The evaluation protocol was subsequently applied to the TGD human exposure models. External validation was performed using a set of reference substances with different physico-chemical properties and use patterns: PCDD, PCB, DEHP, HHCB, LAS, EDTA, benzene and 1,2-dichloroethane. Model calculations were carried out for different scenarios, and the results were compared with monitoring data and experimentally determined values. The comparison was carried out both for single submodels and for the entire system. For the latter, two scenarios were applied: one using the default parameter set of EUSES and one using a parameter set representing the German State of North Rhine-Westphalia.
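Comparisons of calculated and measured concentrations are conveniently expressed as deviations in orders of magnitude (OoM), the unit also used for the expected deviations in Tab. 12.2. The sketch below assumes the common definition log10(predicted/measured), with positive values indicating overestimation; the evaluative terms actually used in this study are defined in Section 5.3.2 and may differ. The function names and the example pairs are hypothetical.

```python
import math


def deviation_oom(predicted, measured):
    """Deviation in orders of magnitude: log10(predicted / measured).

    > 0: the model overestimates; < 0: the model underestimates.
    """
    if predicted <= 0 or measured <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(predicted / measured)


def within_factor(predicted, measured, factor=10.0):
    """True if prediction and measurement agree within the given factor."""
    return abs(deviation_oom(predicted, measured)) <= math.log10(factor)


# Hypothetical predicted/measured pairs in the same units (e.g. ng/kg).
pairs = [(250.0, 2.5), (30.0, 5.0), (0.4, 8.0)]
for pred, meas in pairs:
    print(f"{pred:>7} vs {meas:>5}: {deviation_oom(pred, meas):+.2f} OoM, "
          f"within factor 10: {within_factor(pred, meas)}")
```

Working on a log scale treats a tenfold overestimate and a tenfold underestimate symmetrically, which a plain ratio does not.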

From a theoretical point of view, it was shown that the models depend strongly on the lipophilicity of the substance, that the underlying assumptions drastically limit their applicability, and that realistic concentrations can seldom be expected. If the models are applied without adjustment, high uncertainties must inevitably be expected. In several cases, considerable (explicable) deviations from the measured values were found, in particular for extremely lipophilic substances and for substances that are degraded. Altogether, the comparison with measured field data shows that for the test chemicals an accuracy within a factor of ten is rarely achieved. For the aquatic environment, concentrations were overestimated by up to two orders of magnitude. For superlipophilic and persistent chemicals, higher uncertainties emerge and measured concentrations may also be underestimated. The deviations are caused by unrealistic bioconcentration factors or unaccounted metabolism on the one hand and by neglecting biomagnification on the other. The biotransfer model for meat and milk provides a conservative estimate. The overestimation is most pronounced, exceeding two orders of magnitude, for non-persistent or superlipophilic substances. A lack of steady state, metabolism and/or reduced resorption are presumed to be the reasons. The model describing uptake by plants often underestimates the measured concentrations because it considers chemical uptake from air only via gas exchange. The calculated total daily dose was compared with alternative estimates available from the literature. For several chemicals with an adequate database, it agrees with these estimates within two orders of magnitude when more realistic intake values are applied. Low deviations were found to be sometimes caused by overestimations and underestimations in the submodels cancelling each other out.

The sensitivity analysis revealed that, when a variety of chemicals is investigated, the total daily dose is sensitive to the majority of parameters. However, there is a set of parameters with negligible impact. A few of the sensitive parameters show extremely high sensitivities and should be treated with caution. In order to assign sensitive parameters to substance classes, it is sufficient to distinguish between lipophilic, waterborne and airborne substances. When the distributions of the input parameters are taken into account, the result depends only on a relatively small subset of parameters: depending on the substance, up to a quarter of all parameters are important. The uncertainties are high for chemicals ingested via the food chain and lower for those ingested directly via air or drinking water. For the former, only the parameters of the exposure module are important; for the latter, parameters from all submodels are important.
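Taking input distributions into account can be illustrated with a Monte Carlo simulation followed by a rank (Spearman) correlation between each sampled input and the output, the kind of measure reported in Tab. 9.1. The Python sketch below is a minimal, standard-library-only illustration; the toy model and all distributions are hypothetical and are not those of Tab. 3.1 or Tab. 5.1. Ties are ignored in the ranking, which is acceptable for continuous samples.

```python
import math
import random
import statistics


def ranks(values):
    """Rank positions 0..n-1 (ties ignored; fine for continuous samples)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    r = [0.0] * len(values)
    for rank, idx in enumerate(order):
        r[idx] = float(rank)
    return r


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)


def dose_fish(c_water, log_kow, intake_fish, body_weight):
    """Hypothetical toy dose model (illustrative coefficients only)."""
    bcf = 10 ** (0.85 * log_kow - 0.70)
    return c_water * bcf * intake_fish / body_weight


def monte_carlo(n=5000, seed=1):
    """Sample hypothetical input distributions and propagate them."""
    rng = random.Random(seed)
    samples = {"c_water": [], "log_kow": [], "intake_fish": [],
               "body_weight": []}
    doses = []
    for _ in range(n):
        p = {
            "c_water": rng.lognormvariate(math.log(1e-6), 1.0),
            "log_kow": rng.gauss(4.0, 0.3),
            "intake_fish": rng.lognormvariate(math.log(0.115), 0.5),
            "body_weight": rng.lognormvariate(math.log(70.0), 0.15),
        }
        for k, v in p.items():
            samples[k].append(v)
        doses.append(dose_fish(**p))
    return samples, doses


samples, doses = monte_carlo()
q = statistics.quantiles(doses, n=20)          # q[18] is the 95th percentile
print(f"median {statistics.median(doses):.3g}, P95 {q[18]:.3g} mg/kg bw/d")
rho = {k: spearman(v, doses) for k, v in samples.items()}
for k, r in sorted(rho.items(), key=lambda kv: -abs(kv[1])):
    print(f"{k:12s} rho = {r:+.2f}")
```

With these assumed spreads, the water concentration dominates the rank correlations and body weight shows a weak negative one. In a real assessment the ranking depends on the chosen distributions, which is why their selection matters as much as the model itself.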

Regarding the software, it was found that EUSES basically fulfils the postulated quality criteria. Particularly with regard to correctness and stability, (almost) no errors were found. EUSES contains some innovative features. However, numerous alterations are necessary. High complexity, low modularity and incomplete documentation result in a lack of transparency and are emphasised as major points of criticism. To overcome these inadequacies, a more modular design is proposed.

All in all, the overall system was classified as a good compromise between complexity and practicability. However, restrictions were revealed for several chemicals and classes of chemicals: the investigated models used to assess indirect exposure of humans are, in parts, currently not applicable to dissociating compounds, very polar compounds, very lipophilic compounds, ions, some surfactants, compounds whose metabolites cause the problems, and mixtures. In a strict sense, the method is only applicable to persistent, non-dissociating chemicals of intermediate lipophilicity. Further limitations may exist. Finally, recommendations for improvement and maintenance of the risk assessment methodology were presented. Relevant processes that are not yet included should be considered, several new and simpler concepts should be added, and the relevance of certain exposure pathways urgently needs to be refined.

Keywords

Risk assessment, TGD, EUSES, quality assurance, model validation, software evaluation, fate and exposure models, uncertainty analysis, sensitivity analysis, scenario analysis, assumptions, limitations.

Acknowledgements

Sponsorship by the Umweltbundesamt (FKZ 26967075) is gratefully acknowledged. In particular, the support of Bernd Scharenberg as the initiator and contact person of the project is acknowledged. My sincere thanks are also extended to Michael Matthies, Stefan Trapp and my co-worker, Volker Berding, for many inspiring ideas. I am also grateful to the following persons and institutions for their help and for making data available: Perkons (Landesumweltamt of North Rhine-Westphalia), H.-D. Eschke (Ruhrverband), H. Hecht (Bundesanstalt für Fleischforschung), H. Geyer (GSF-Research Centre for the Protection of Man and the Environment GmbH), T. Jager and J. Bakker (RIVM, National Institute of Public Health and the Environment, the Netherlands). My thanks also go to my colleagues Nadja Rüger, Markus Brune and Frank Voss and to all other people who contributed to this paper. Last but not least, I would like to thank Elke Altekruse and Teresa Gehrs for their administrative and linguistic support.

Abbreviations


Preface

Risk assessment of chemicals requires the application of mathematical models. The European Union risk assessment scheme provides a framework, including software, in the form of the Technical Guidance Document (TGD) and the European Union System for the Evaluation of Substances (EUSES). Nevertheless, an evaluation of the entire system regarding its applicability and limitations is lacking.

Neither a standard nor a consensus on how to evaluate such models exists in the scientific literature. Thus, an appropriate methodology had to be developed; it is presented in Chapters 2 and 3. The methodology is also useful for evaluating similar models in the context of chemical fate and exposure assessment. Chapters 4 and 5 present the models and the underlying database. The results are presented in Chapters 6 to 11. An evaluation of the entire system, including proposals for improving it, is given in a concluding chapter. It is intended to contribute to a forthcoming update of the models and software.

Both the entire system and individual models are investigated. The paper is also intended as a reference book to support the user. The nomenclature corresponds to that of the TGD (EC 1996A) and the EUSES documentation (EC 1996B). For clarity, names of variables were sometimes abbreviated.

This paper is one of two parts of a larger validation study. It focuses on the food chain part of the TGD and on the software evaluation. The regional distribution model was validated by BERDING (2000).

Osnabrück, June 2000

1 Introduction

The risk posed by existing and newly notified chemical substances to humans and the environment is to be evaluated within the framework of European chemicals legislation. The EU member states laid down procedures for a harmonised risk assessment of chemicals in the Technical Guidance Document (TGD) on Directive 93/67/EEC and Regulation (EC) 1488/94 (EC 1996A). With EUSES (European Union System for the Evaluation of Substances), a computer programme was developed that contains the mathematical models and calculation procedures described in the TGD (EC 1996B).

The risk assessment methodology is based on a four-step procedure (NRC 1983) consisting of hazard identification, exposure assessment and dose-response assessment as key components. The risk characterisation as the last step culminates in a so-called PEC/PNEC approach for ecosystems: Predicted Environmental Concentrations (PECs) and Predicted No Effect Concentrations (PNECs) are determined to characterise risk by computing the ratio of both concentrations. For human populations, the total daily intake of a chemical is compared to the No- or Lowest-Observed-Adverse-Effect-Level (N(L)OAEL) to specify a Margin of Safety (MOS).
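The two risk characterisation measures just described reduce to simple ratios. The sketch below is illustrative only: the function names and input values are hypothetical, PEC and PNEC (and likewise N(L)OAEL and intake) are assumed to be expressed in the same units, and the common reading that a PEC/PNEC ratio above 1 indicates a potential risk is encoded as a flag. No threshold for the MOS is assumed, since its adequacy is judged case by case.

```python
def pec_pnec_ratio(pec, pnec):
    """Risk characterisation ratio for an environmental compartment."""
    if pnec <= 0:
        raise ValueError("PNEC must be positive")
    return pec / pnec


def margin_of_safety(nloael, total_daily_intake):
    """MOS = N(L)OAEL / total daily intake, both in mg/kg bw/d."""
    if total_daily_intake <= 0:
        raise ValueError("intake must be positive")
    return nloael / total_daily_intake


# Hypothetical inputs.
ratio = pec_pnec_ratio(pec=0.2, pnec=1.0)          # e.g. both in ug/L
mos = margin_of_safety(nloael=10.0, total_daily_intake=1e-3)

print(f"PEC/PNEC = {ratio:.2f}, potential concern: {ratio > 1}")
print(f"MOS = {mos:.0f}")
```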

The PECs and the total daily intake are estimated by a combination of mathematical models, leading to a relatively large and complex system. The models have been developed to estimate emissions, environmental distribution, fate and exposure, and to guide the assessment of potential human and ecological risks in situations where measurements have not been made or would be impossible or impractical to make. To use them effectively, however, the magnitude and sources of uncertainty associated with model predictions need to be established, in order to achieve a better understanding of environmental systems, to increase the reliability of model predictions, and to define realistic values for use in subsequent risk assessment. The need for this task was strikingly pointed out by GLAZE (1998):

To solve this problem, studies intending to improve the validation status and to elucidate model limitations are needed. GOBAS ET AL. (1998) stated several reasons limiting the applicability of fate and exposure models and emphasised the lack of validation studies and the poorly characterised uncertainty inherent to models.

For the European Union risk assessment approach, only a few papers deal with this problem. Some provide statements on the validity of individual exposure models, while others provide statements on the general approach: JAGER (1995) pointed out where validation efforts are required. DIDERICH (1997) and TRAPP AND SCHWARTZ (2000) discussed future needs and made proposals for the European Union risk assessment methodology in general. They emphasised the need for a detailed evaluation of the models and parameters. In the course of developing USES (Uniform System for the Evaluation of Substances), the precursor of EUSES, TOET ET AL. (1991) compared measured concentrations with modelling results. The regression equations for estimating bioaccumulation in fish were evaluated by JAGER AND HAMERS (1997) and ECETOC (1998). Validation studies of the biotransfer model for cattle are presented in DOUBEN ET AL. (1997). The generic one-compartment model for plants was validated by TRAPP ET AL. (1994), TRAPP AND MATTHIES (1995), JAGER AND HAMERS (1997) and POLDER ET AL. (1998).

SCHWARTZ (1997) investigated the food chain part and provided a framework for a comprehensive validation study. An inventory of experiences and validation activities concerning EUSES can be found in JAGER (1998), who concluded: “...the user should be aware of the degree of accuracy and precision to facilitate interpretation of the model results.” All in all, the need for more reliable and certain model calculations was emphasised. Despite all the efforts dealing with individual models, a holistic validation study of the overall system, including the interaction between the models, is still lacking. Furthermore, most of the investigations considered single processes or partition coefficients under laboratory conditions rather than real-life situations.

To summarise, the current EU risk assessment scheme tends toward a large, complex system that has not been rigorously validated and that lacks comprehensive uncertainty analyses, based principally on the belief that holistic correctness justifies predictive extrapolation. In particular, validation studies leading to statements on the accuracy and applicability of exposure models are scarce. But since model predictions are used as a basis for important decisions, it is essential to evaluate their reliability. Expressing the state (or lack) of knowledge about uncertain model predictions is also necessary for both public and scientific credibility and can identify the most important areas for further research. Furthermore, over the last several years it has become obvious that our increased understanding of chemicals’ behaviour requires both the improvement of current methods and the implementation of new approaches.

To fill these gaps, the overall objective of this paper is to carry out a comprehensive evaluation of the models for indirect exposure of humans via the environment (i.e. the models used to calculate the total daily dose). To accomplish this task, a central question first has to be answered: how can the models of the TGD be evaluated? Often, a procedure termed validation, realised by comparing model outcomes with observed data, is applied. But how can one evaluate models that will be applied to new chemicals on the market, i.e. chemicals for which, by definition, no observed data are available? An appropriate methodology has not yet been developed and is therefore derived in the first chapters. This includes a collation of techniques capable of evaluating exposure models. As a preparatory task, the investigated models have to be presented and, with the intention of covering a wide range of chemical properties, various substances have to be selected. Before dealing with model calculations, the general applicability of the models and their agreement with scientific theory have to be examined. This requires an elaboration of scientific knowledge about the relevant physical and chemical processes. The accuracy of the models is addressed by two distinct but complementary approaches: (1) analysis of the uncertainty associated with the predictions and (2) tests of model predictions against measurements. A comparison of the models laid down in the TGD with alternative models concludes the study. Finally, the results are combined to evaluate the applicability of the models and the accuracy of their predictions.
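Since the object of evaluation is the total daily dose, it may help to see how such a quantity is typically assembled: pathway-specific doses (concentration in a medium × daily intake of that medium ÷ body weight, optionally scaled by a bioavailability factor) are summed, and each pathway's share of the total can then be reported (cf. Tab. 7.4 and Tab. 8.8). The Python sketch below is a simplified illustration under these assumptions; the pathway names, concentrations and intake rates are hypothetical placeholders, not TGD defaults or results of this study.

```python
def pathway_doses(conc, intake, body_weight, bioavailability=None):
    """Dose per pathway in mg/kg bw/d.

    conc: concentration per medium (e.g. mg/kg food, mg/L water, mg/m3 air)
    intake: daily intake of the same medium (kg/d, L/d, m3/d)
    bioavailability: optional factor per medium, defaulting to 1.0
    """
    bioavailability = bioavailability or {}
    return {m: conc[m] * intake[m] * bioavailability.get(m, 1.0) / body_weight
            for m in conc}


# Hypothetical placeholder values for a single substance.
conc = {"drinking_water": 1e-4, "fish": 2e-3, "leaf_crops": 5e-4,
        "root_crops": 3e-4, "meat": 1e-3, "milk": 4e-4, "air": 1e-5}
intake = {"drinking_water": 2.0, "fish": 0.115, "leaf_crops": 1.2,
          "root_crops": 0.384, "meat": 0.301, "milk": 0.561, "air": 20.0}

doses = pathway_doses(conc, intake, body_weight=70.0)
total = sum(doses.values())
shares = {m: d / total for m, d in doses.items()}

print(f"total daily dose: {total:.3g} mg/kg bw/d")
for m, s in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"  {m:15s} {s:6.1%}")
```

Because deviations in individual submodels can partly cancel out, the total alone may look accurate while the pathway shares are wrong; reporting the shares alongside the total makes such effects visible.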

Again, the general goal of this paper is to evaluate the models. Breaking this goal down reveals several more specific questions that have to be answered, for instance:

- What are the underlying assumptions of the models?
- What are the limitations of the models?
- Are the models formally correct?
- What is the quality of the software?
- Which classes of chemicals will cause problems?
- Do the models correspond to monitoring and experimental data (for chemicals already on the market) and what is the accuracy of the predictions?
- What is the effect of changing an exposure scenario?
- How can the default parameters be evaluated?
- What are the sensitive parameters?
- What are the uncertain parameters?
- What is the impact of a certain parameter or a group of parameters on the result?
- What is the ratio of important to unimportant parameters?
- Do the models offer the best compromise between simplicity and complexity?

Altogether, this paper should provide an overview of the context in which the TGD exposure models may be employed and of the degree of accuracy that may be expected. It aims to contribute to elaborating the scientific basis and underlying model theory and to provide recommendations for improving the current methods. By answering questions on general applicability, accuracy of the results, sensitive parameters and all other relevant aspects of exposure assessments, it is also intended as an aid for users of the models and the software.

2 Evaluation of models

“We do indeed have a problem with validation”, BECK AND CHEN (2000) stated, pointing out a profound problem that arises when one is given the task of evaluating a model. In this chapter, the background of model evaluation is investigated. The objective is to understand the meaning of validation, to compile a methodology, and finally to derive a protocol for evaluating the environmental exposure models laid down in the TGD.

2.1 Assuring the quality of models

2.1.1 The validation problem

The construction and use of mathematical models for exposure assessment are crucial in the context of environmental risk assessment for chemical substances (VAN LEEUWEN AND HERMENS 1995). After the development (or synthesis) of a model, questions concerning its applicability emerge: Is my model applicable to the class of chemicals under consideration? Can I justify a transfer of the model from one chemical to another? How accurate are the predicted results? Does the conceptual structure of the model reflect that of the real phenomena? Given a certain task, is my model better than another one? In short: should I use the model?

In any case, a concept termed validation (from Latin validus) is used to answer these questions. But in the scientific community the concept of validation is debated; it is defined inconsistently and has led to an intellectual impasse (BECK AND CHEN 2000). Confusion arises from the philosophical question of to what extent, if at all, models or, more generally, scientific theories can be validated. The commonly accepted fundamental works of POPPER (1963, 1959), among others, show that the truth of a scientific theory cannot be proved; at best, a theory can only be invalidated. Despite this, the public has its own understanding of what the word validation implies and is misled by this expression (BREDEHOEFT AND KONIKOW 1993). Even among modellers, who regard validation as a kind of confirmation, there is no clear and uniform concept, and many expressions circulate. Confusion surrounds such concepts as validation, verification, credibility, capability, adequacy and reliability, to name just a few. Despite their plethora and variety, all of these terms emphasise the applicability of a model to perform a designated task. Against this background, papers have been written to place all encountered terms into an ordered context and to resolve the disagreements on validation (GAYLER 1999, BECK ET AL. 1997, RYKIEL 1995, ORESKES ET AL. 1994, SARGENT 1993). Nevertheless, the debate continues.

2.1.2 External and internal validation and software evaluation

Predicting the concentrations of chemicals in a strict sense poses problems: since the ideal of achieving, or even approximating, truth in predicting novel behaviour of natural systems is unattainable (BECK ET AL. 1997), a more practicable understanding of the concept of validation is required. It has been proposed to abandon the word validation and replace it with evaluation (KONIKOW AND BREDEHOEFT 1992), or to broaden the discussion of validation into one of quality assurance (BECK AND CHEN 2000). For this reason, the meaning of validation has to be specified precisely.

The traditional, and still widely accepted, understanding of validation is a comparison of model results with numerical data derived independently from experience or observations of the environment. This understanding is insufficient for environmental exposure models, as their application in the risk assessment of new notified chemical substances shows: for such substances, observed data are not yet available. Pragmatically, the validation of a (mathematical) model can be regarded as a basic part of the quality assurance of the entire model. The relevance of software quality was stressed by GAYLER (1999), who discussed the evaluation of a computer-based model in terms of adequacy, reliability, accuracy and software quality. The entire model then includes not only the mathematical model but also the software (Fig. 2.1).


Fig. 2.1 Model validation and software evaluation as parts of the quality assurance.

In the literature, validation mainly consists of two aspects. The first is referred to as conceptual (SARGENT 1993, ROBINSON 1999), conceptual & functional (JAGER 1995), or compositional or internal (BECK ET AL. 1997) validation, and addresses the behaviour, structure and principal application of the model under consideration. Questions of concern are: Do the underlying assumptions allow an application? Are all obviously relevant processes considered? Does the model conform to expert judgement? What are the most critical parameters in the design of the model?

The second aspect is described by terms like empirical (SARGENT 1993), operational & numerical (JAGER 1995), experimentation & solution & white/black-box (ROBINSON 1999), or performance or external (BECK ET AL. 1997) validation, and focuses on task-specific properties. This aspect aims to answer questions such as: What are the most critical parameters in the design of the model with respect to successful achievement of the particular task? Are there alternative models providing more accurate results by comparison with observed data? How large are the deviations from the data of a given monitoring study?

It is crucial to distinguish between the task-specific properties of a model and its task-independent, or generic, properties. Following BECK ET AL. (1997), it is proposed to divide the validation of a mathematical model into an internal and an external part. The internal part addresses all generic properties of the model, while the external part addresses all task-specific properties. An external validation is possible before calibrating the model, i.e. before fitting the generic model to a given task, or after its calibration. These two possibilities are termed prior and posterior external validation.

External validation also comprises the evaluation of the data used, because statements on external validity are primarily limited by the nature, amount and quality of the available data. These characteristics can differ considerably from the circumstances under investigation and define the boundaries of what the validation can achieve. It is therefore important that considerable effort is made to ensure that the data are as accurate and representative as possible.

2.1.3 The importance of the model’s purpose

It follows from this applied view of validation that a judgement about the validity of a model must be based on the (previously defined) purpose of the model, including statements on undesirable outcomes. Indeed, CASWELL (1976) also argued that the validity of a model cannot be judged in the absence of its purpose. He identified gaining insight into the system's structure and predicting its future behaviour as the two basic purposes of a model. Taking this as a framework, the purposes (or design tasks) of exposure models can be itemised. Exposure models used in a regulatory context are less a tool for gaining insight into a system's structure than a tool for the risk assessment of new notified substances and of existing substances whose exposure is still relatively unknown; they therefore have a predictive character (EC 1996A). Examples of design tasks of fate and exposure models are the estimation of median partition coefficients (e.g. by using regression equations) or of mean or worst-case exposure concentrations. Identifying the need for more detailed information is also a conceivable purpose. None of these purposes implies providing a model result that is as faithful as possible to the “true” behaviour of the substance. The goal of validation is rather to understand the realism of the model relative to its intended purpose. Or, in the sense of the well-known saying “All models are wrong, but some are useful”, the validation of models for exposure assessment means providing a confirmation of the underlying theory and statements on the degree of accuracy with which the model fulfils a given task.

2.2 Model validation methodology

The question remains as to how the two aspects of model validation can be dealt with. This section presents the essential methods and derives from them a suitable protocol as a contribution to assuring the quality of environmental exposure models.

2.2.1 Internal validation

Model formalism: To deal with the generic properties of the model, its formal correctness has to be checked. The formalism of the mathematical model must be mechanically and logically correct, i.e. it has to be checked whether all equations have been adopted correctly from the original literature and whether all mechanisms (e.g. the techniques used to solve an equation) are free of errors. Together with the check of the formal correctness of the computer programme, this method is usually referred to as verification (RYKIEL 1995).

Model concept: There are no formal methods for validating the conceptual model (ROBINSON 1999), i.e. the underlying theory. However, specifying the relevant processes and comparing them with the underlying model assumptions is a useful device. Visualising the model's complexity by depicting the parameters and their interdependencies helps us to understand its behaviour, provides transparency and therefore greatly facilitates the validation study. It is also necessary to acquire an in-depth understanding of the environmental processes and chemical properties involved. With risk assessment models one often has to extrapolate beyond current conditions, which renders a purely data-oriented approach inadequate. As a consequence, implicit model assumptions and the relevance of the implemented processes must be evaluated to justify the extrapolations.

Additionally, the time and cost of running the model and analysing its results should be considered. All these methods contribute to internal validation and may also be termed an inspection of the underlying theory.

2.2.2 External validation

Parameter behaviour: Exposure modelling needs to extrapolate from the knowledge gained for some chemicals to chemicals with no or very limited field measurements. The release patterns and environmental conditions that apply to some substances are often substantially different for other chemicals. When predicting the fate of novel substances released into the environment, no monitoring data are, by definition, available to be matched with the model results. Nevertheless, comparing measured and predicted concentrations for surrogate chemicals may be helpful by analogy. However, this inference is only appropriate if all critical parameters, i.e. those whose differences would lead to a completely different model response, are known at the same time. In recent work, BECK AND CHEN (2000) introduced the distinction between key parameters in the model and those that are redundant to the task as a suitable method for external model validation. They pointed out that a valid model is maximally relevant to its task. In this context, “relevance” is defined as the ratio of key to redundant parameters, a property notably independent of the size of the model. A model is of poor relevance for a given task if it contains many input factors whose values do not drive variation in the output sought for the task. They introduced these terms for models whose task is defined by constraints (e.g. a predicted concentration must be below a maximum permissible level). However, even if the task is merely to predict “most realistic” concentrations (without further constraints), the proportion of key and redundant parameters is a valuable indicator.
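The classification into key and redundant parameters can be sketched in Python. The sketch assumes a hypothetical river model and uses a one-at-a-time screening with normalised sensitivity coefficients S_i = (ΔY/Y)/(ΔX_i/X_i), counting a parameter as key if |S_i| ≥ 0.1. Both the screening method and the threshold are illustrative assumptions; they are not the procedure of BECK AND CHEN (2000), who defined key and redundant parameters with respect to task constraints.

```python
"""One-at-a-time screening of key vs. redundant parameters (illustrative sketch)."""


def toy_concentration(p):
    # Hypothetical dissolved concentration: emission divided by river flow,
    # reduced by sorption to suspended solids (not a specific published model).
    return p["E"] / (p["Q"] * (1.0 + p["Kd"] * p["SS"]))


def normalised_sensitivities(model, params, rel_step=0.01):
    """Return S_i = (dY/Y) / (dX_i/X_i) for a one-at-a-time perturbation."""
    y0 = model(params)
    result = {}
    for name, value in params.items():
        perturbed = dict(params)
        perturbed[name] = value * (1.0 + rel_step)
        result[name] = ((model(perturbed) - y0) / y0) / rel_step
    return result


def relevance(sensitivities, threshold=0.1):
    """Classify parameters and return (key, redundant, key/redundant ratio)."""
    key = [n for n, s in sensitivities.items() if abs(s) >= threshold]
    redundant = [n for n, s in sensitivities.items() if abs(s) < threshold]
    ratio = len(key) / len(redundant) if redundant else float("inf")
    return key, redundant, ratio


# Hypothetical base values (units omitted for brevity).
base = {"E": 10.0, "Q": 50.0, "Kd": 0.001, "SS": 10.0}
S = normalised_sensitivities(toy_concentration, base)
key, redundant, ratio = relevance(S)
print(key, redundant, ratio)  # → ['E', 'Q'] ['Kd', 'SS'] 1.0
```

With these hypothetical values, sorption changes the dissolved concentration by only about 1 %, so Kd and SS fall below the threshold and the relevance ratio is 2/2. A different threshold or step size may change the classification, so such choices should be documented.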

Accuracy of the results: When observed data are compared with predicted data, the degree of accuracy becomes important. Validity and accuracy are related but separate concepts. As illustrated by ROBINSON (1999), a model can be valid but inaccurate. Agreement between simulated and observed data in accordance with some predefined criteria is considered to be the accuracy of the model. It can be assessed by using statistical measures or visual techniques; a compilation of both can be found in GAYLER (1999). Although the application of statistical methods may often seem obvious, they focus on a purely quantitative comparison of calculated versus observed data and, as demonstrated in GAYLER (1999), different statistical measures may lead to differing results. Using statistics in this case is not, as it may seem, an objective method of determining accuracy, because the outcome is constrained by (1) the quality of the monitoring data, (2) the selection of the statistical measure and (3) the subjectivity of the predefined criteria. In addition, since a quantitative agreement of generic exposure models with monitoring data cannot be expected, we avoid using statistics for the evaluation of generic results.
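That different statistical measures may rank models differently can be illustrated with a small Python sketch. The two measures used here, the fraction of predictions within a factor of two of the observations and the mean absolute error on a log10 scale, are chosen for illustration only and are not necessarily among those compiled by GAYLER (1999); all data are hypothetical.

```python
import math


def fac2(predicted, observed):
    """Fraction of predictions within a factor of two of the observations."""
    hits = sum(1 for p, o in zip(predicted, observed) if max(p / o, o / p) <= 2.0)
    return hits / len(observed)


def mean_abs_log_error(predicted, observed):
    """Mean absolute deviation on a log10 scale (0 = perfect agreement)."""
    return sum(abs(math.log10(p / o)) for p, o in zip(predicted, observed)) / len(observed)


# Hypothetical observed concentrations and output of two hypothetical models.
observed = [1.0, 1.0, 1.0, 1.0]
model_a = [2.5, 2.5, 2.5, 2.5]    # consistently off by a factor of 2.5
model_b = [1.0, 1.0, 1.0, 100.0]  # mostly exact, one gross outlier

for name, pred in [("A", model_a), ("B", model_b)]:
    print(name, fac2(pred, observed), round(mean_abs_log_error(pred, observed), 3))
# A 0.0 0.398
# B 0.75 0.5
```

Model A is better according to the logarithmic error, whereas model B is better according to the factor-of-two criterion, so the verdict depends on the measure chosen.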

Furthermore, default input values and other data provided together with the model and software have to be investigated. Where possible, actual values should replace default values selected for input.

2.2.3 Both aspects of validation

Uncertainties: The uncertainty inherent in all model calculations should be investigated; depending on how the results are used, this serves both aspects of validation. For example, if the proportion of key and redundant parameters is taken as a measure of model performance, sensitivity analysis becomes a cornerstone of external model validation. Because of their central role, sensitivity and uncertainty analyses are elaborated further in Chapter 3.

Alternative models: As an alternative to comparing predicted with observed data, the model's results can be compared with those of both simpler and more complex models. A comparison with simpler models can reveal that the model is unnecessarily complex, and a comparison with a more complex model can indicate where the investigated model can be improved. One way to obtain an impression of the model's behaviour in a certain situation despite the lack of field data is to apply models with different structures to identical problems and to compare the results (RAGAS ET AL. 1999). The range of results, which stems from the different model assumptions and structures, can be used as a measure for both aspects of validation.
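A minimal Python sketch of summarising the range of results obtained from structurally different models for one scenario; the model labels and concentrations are hypothetical.

```python
def result_range(predictions):
    """Summarise the spread of predictions from structurally different models."""
    low = min(predictions.values())
    high = max(predictions.values())
    return {"min": low, "max": high, "factor": high / low}


# Hypothetical predicted concentrations (mg/l) for one scenario.
predictions = {
    "simpler model": 0.5,
    "investigated model": 1.0,
    "more complex model": 4.0,
}
print(result_range(predictions))  # → {'min': 0.5, 'max': 4.0, 'factor': 8.0}
```

A factor close to one suggests that the structural differences matter little for the scenario; a large factor points to assumptions that deserve closer inspection.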

Expert judgement: This qualitative method can also be used to extrapolate into an area of uncertainty. An expert's opinion draws on knowledge from earlier internal and external validation efforts.

2.3 Software evaluation methodology

The following section describes the aim of software evaluation and gives a brief overview of general quality requirements for software products. Such requirements are not a novelty, but they need to be specified in more detail for software dealing with the risk assessment of chemicals. The international quality standard for software products is also taken into consideration.

2.3.1 Quality testing of software

Software testing is a process in which compliance with quality criteria is monitored. These quality criteria are formulated in the software specifications and are realised by a defined development process. Software quality can thus be achieved directly by a systematic development process (KNÖLL ET AL. 1996). The aim of software testing is to discover the errors and weaknesses of the programme under consideration and hence to assist software developers in improving the software. Announcing that software will be tested immediately after its development may also encourage developers to produce faultless software, thus influencing the quality of the development (an indirect influence on quality).

Two methods are basically available to test software. Firstly, a dynamic test using the programme can be undertaken (test). Errors can be recognised by running the programme and simultaneously recording the results. These errors are limited to certain mistakes in the software's properties (e.g. the acceptance of nonsensical input data).

Secondly, the source code and documentation can be reviewed (review). This entails checking them against the targets and applicable guidelines with the aim of bringing errors and weaknesses to light, but it also serves to acknowledge positive features. Unlike tests, reviews represent a static process. Both methods were used to evaluate EUSES; however, since the source code was not available, only the documentation could be reviewed.

2.3.2 Quality requirements regarding ISO/IEC 12119

The certification of software products according to international standards is a current issue: in 1994 the international standard ISO/IEC 12119 “Information technology - Software packages - Quality requirements and testing” was published. This standard describes quality requirements and testing conditions for user programmes, in particular in the field of science and technology. With software products, the accompanying documentation and product description are almost as important as the software itself. The standard demands the fulfilment of certain quality requirements for the following three components of a software product (KNORR 1997):

According to the standard, products need to be described. The aim of a product description is to provide details about the supplier, the task of the product, the hardware and software requirements, and the form and extent of delivery. Information about whether maintenance is offered, and the scope of such maintenance, is also required. Details concerning the specific knowledge required to operate the programme (e.g. specialist knowledge) are also significant. All details provided must be correct and verifiable.

Quality requirements are also given for the user documentation, which must contain all necessary details for the use of the programme and must describe all functions that can be called up in a complete and apt manner. Furthermore, general documentation guidelines (layout, construction, etc.) also have to be complied with.

The third component is the programme itself and the accompanying data. All functions listed in the documentation must be executable, and all other details given in the documentation must correspond completely to the programme. The functions also have to operate correctly. The system must not get into an uncontrollable state and must be prevented from falsifying or deleting data, even when used incorrectly. No demands are made regarding efficiency, alterability or transferability.

2.3.3 Quality requirements for risk assessment programmes

Good Laboratory Practice (GLP) deals with the organisational process and the conditions under which laboratory studies are planned, carried out and monitored, as well as with the recording and reporting of the tests (KAYSER AND SCHLOTTMANN 1991). A similar approach is desirable for the development of computer programmes for risk assessment, for which a Good Modelling Practice (GMoP) should also be developed and established. Its basis is formed by quality criteria for exposure and risk assessment software, which so far can only be found in WAGNER AND MATTHIES (1996), VEERKAMP AND WOLFF (1996) and TRAPP AND MATTHIES (1998). On the basis of these criteria and the general quality requirements for software products, the following ten aspects were found to be essential for software evaluation:

(1) Product description: For the software tested here, the product description is not as important as for standard software. However, it should still be available in order to clarify technical queries and areas of application before purchase. Particularly important for software products dealing with chemical risk assessment are an exact indication of the version, changes with respect to previous versions, system requirements, the scope of built-in evaluation functions, support and possible interfaces with other products.

(2) Documentation: The documentation should contain both technical references (installation, operation, etc.) and specialist references (description of the models and theory). It is advisable to provide these details both in printed and in online form. The documentation should have the following features:
- Correctness: Are all of the equations in the documentation identical to those in the original literature and to those implemented in the programme? Were the targets complied with (e.g. TGD equations)?
- Completeness: All details required for the use of the software product must be included. All functions need to be completely described and all error messages need to be explained. If the software is to be installed by the user, complete and correct installation instructions need to be provided. If users are intended to maintain the software, a maintenance manual is required. Tutorials often complement the documentation of programmes.
- Consistency between the various user documents and the product description must be guaranteed (also with respect to the programme).
- Comprehensibility: Comprehensible choice of terms and graphics according to the user group. The use of such terms must be consistent throughout.
- Clarity: Logical structure of the user documentation, in which connections can be recognised (including a list of contents and key words).
- Applicability: List of the ranges and quality of regressions, the basic substance classes, list of the validation studies undertaken to date, etc.

(3) Technical requirements:
- Installation and system requirements: It must be possible to install the programme according to the directions and without previous knowledge. The hardware and software requirements should not be more extensive than necessary for the type of problem. It should be possible to uninstall the programme without difficulty whenever required.
- Stability and reliability: The programme should be stable and controllable at all times. In practice, however, errors can occur, especially in rather complex programmes, and may cause the programme to “crash”. It needs to be examined when such errors occur (e.g. on input of extreme parameter values) and what effects they have. Under no circumstances may data be falsified or deleted.
- State of the art: Current programming standards should be used. Furthermore, functions provided by the operating system should be used by the software rather than newly developed. Examples of requirements for programmes based on Windows 95/NT are (1) a programme to install and uninstall the software, (2) saving of configuration settings in the system's database (registry), (3) use of the standard dialog windows provided by Windows and (4) support for long file names.
- Network support: Due to the increased networking of computers, it would be appropriate to install the software on a network server, which saves costs and administrative time. The software examined here should also offer this feature. Even with locally installed programmes, a minimal amount of network support would be sensible, at least to enable the resulting data to be stored in a central database.

(4) Correctness of calculations: The programme must compute correctly. All of the functions contained in the product description and user documentation must operate as described.
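One way to test correctness is to compare programme output with reference values worked out by hand from the documented equations. The Python sketch below uses the solids-water partition coefficient Kp = foc · Koc purely as an example; the function kp_programme is a hypothetical stand-in for the programme under test, and the reference cases are illustrative.

```python
import math


def kp_programme(foc, koc):
    # Hypothetical stand-in for the programme's implementation of
    # Kp = foc * Koc (l/kg); in a real test this value would be read
    # from the software's output.
    return foc * koc


# Reference cases worked out by hand from the documented equation:
# (fraction of organic carbon, Koc in l/kg, expected Kp in l/kg).
reference_cases = [(0.1, 1000.0, 100.0), (0.02, 500.0, 10.0), (0.05, 20.0, 1.0)]


def check_against_reference(func, cases, rel_tol=1e-9):
    """Return all cases in which the implementation deviates from the hand calculation."""
    return [
        (foc, koc, expected, func(foc, koc))
        for foc, koc, expected in cases
        if not math.isclose(func(foc, koc), expected, rel_tol=rel_tol)
    ]


print(check_against_reference(kp_programme, reference_cases))  # → []
```

An empty list means that all cases agree within the tolerance; each entry otherwise lists the inputs, the hand-calculated value and the deviating programme result.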

(5) User interface and operability: For the models discussed here, the most important aspects are the correctness of calculations and applicability. Nevertheless, software should also be tested to see whether its “external appearance” lives up to the current state of technology and to examine what it requires of its users. Is the interface ergonomically arranged? Is redundant information given?
- Programme control: The control of the programme by the user and the reactions of the programme (messages, masks, lists, etc.) should be uniformly constructed. It must be apparent to users at all times which function is currently being carried out.
- Flexibility: Programme settings, and especially the entry of parameters, should not be subject to unnecessary limitations. This gives the programme a wide range of applications and makes it usable for different substances and environmental compartments. In the literature this criterion is also denoted by the term “generic” (MEYER 1988).
- Output: Queries, messages and results of programme calculations should be comprehensible (clear choice of terms, graphic representations, background information, help function). Output should be easily perceptible and easy to read. When a message appears on the screen, users should be able to recognise immediately whether it is an acknowledgement, an inquiry, a warning or an error message.
- Error messages should contain sufficient information about the cause of the error and how to correct it (or at least refer users to the manual/documentation).

(6) Transparency: It must be clear to users at all times which calculations are being carried out and how individual models are linked together. This transparency is achieved by free insight into the equations and the logical structure of the models. The transparency of the models is a basic requirement for the acceptance of the software.

- Free insight: If users are uncertain about the computational steps taken by the programme, they should be able to reproduce the model calculations “by hand”. Besides disclosing all model calculations, an exact description is also required. In particular, all variables, including the units used, must be explained, and the relationships between the individual models must be comprehensible. Complete transparency requires insight into the source code, which is not the norm with commercial programmes. However, this is the only way to verify the result of a model calculation, since the documentation represents a further potential source of error.
- Modularity: A significant concept of software engineering is modularity, which promotes stability and reliability and also enables programme parts to be reused and freely exchanged (MEYER 1988). For the programmes tested here, this means modularity of the individual models as well as of the purely technical functions. This is particularly useful for users of the software, since they can then recognise the connections between the various models. Data exchange between the individual modules occurs via clearly defined and disclosed interfaces. REYNOLDS AND ACOCK (1997) explicitly elucidate modularity and consider it one of the substantial quality criteria.
- Complexity: The programme should not be more complex than necessary. If the number of parameters used, their relationships and other conditions are kept low, the whole programme is easier to understand, which contributes considerably to its transparency. Low complexity does not necessarily contradict the demand for flexibility: even a model of low complexity may offer high flexibility. A comprehensive discussion on complexity can be found in BROOKS AND TOBIAS (1996).

(7) Features: Because these programmes serve as a decision support system (DSS) for experts, a certain amount of specialist knowledge is required to operate them. But even experts can make typing errors or may not know the valid range of every parameter. For this reason, it is also important for a programme such as EUSES to support users when entering data and applying the models. Important operational requirements within the framework of quality control are:
- Messages: If implausible data are entered or if the applicability range of a regression model is left, the programme should deliver an appropriate message. A two-step process is suitable to test plausibility: first, it is tested whether an entered value is realistic (e.g. molar mass < 1000 g/mol?). If not, the value is accepted, but a warning may appear. In the second step, it is tested whether the value is physically possible at all (e.g. concentration > 0 mg/l?). If this second test fails, the value has to be rejected by the programme.
- Relationships and dependencies between parameters should be monitored (e.g. is a melting point above the boiling point possible?). Dependencies also arise from estimated values: if the original value is changed, the estimated value must be updated automatically.
- Variable units: Errors often occur with the conversion of units (e.g. kg/kg to mg/g). The programme should be able to accept different units and convert them internally. If this is not possible, standards should at least be complied with (e.g. SI units).
- Comments on input data enable information on data sources and descriptions to be saved. Details on the user and date of input can be automatically recorded. It is important to ensure that comments are updated after input values have been modified. This could occur with the automatic appearance of a comment window after modification of a value.
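The two-step plausibility test, a parameter dependency check and an internal unit conversion might be sketched in Python as follows. The function names and messages are hypothetical; the 1000 g/mol limit and the melting point/boiling point relation are taken from the examples above, and the step-2 criterion “value > 0” is applied here to molar mass for illustration.

```python
from dataclasses import dataclass, field

# Step-1 limit taken from the example in the text.
REALISTIC_MAX_MOLAR_MASS = 1000.0  # g/mol


@dataclass
class CheckResult:
    accepted: bool
    warnings: list = field(default_factory=list)


def check_molar_mass(value):
    """Two-step plausibility check for a molar mass in g/mol."""
    # Step 2 (physical possibility) is evaluated first, because a failure
    # means the value is rejected outright.
    if value <= 0.0:
        return CheckResult(False, ["molar mass must be > 0 g/mol: value rejected"])
    # Step 1 (realism): unusual but possible values are accepted with a warning.
    if value >= REALISTIC_MAX_MOLAR_MASS:
        return CheckResult(True, ["molar mass >= 1000 g/mol: please verify"])
    return CheckResult(True)


def check_melting_vs_boiling(melting_point_c, boiling_point_c):
    """Dependency check: a melting point above the boiling point is rejected."""
    if melting_point_c > boiling_point_c:
        return CheckResult(False, ["melting point exceeds boiling point"])
    return CheckResult(True)


def kg_per_kg_to_mg_per_g(value):
    """Internal unit conversion: 1 kg/kg = 10^6 mg / 10^3 g = 1000 mg/g."""
    return value * 1000.0


print(check_molar_mass(250.0))    # accepted, no warning
print(check_molar_mass(1500.0))   # accepted with a warning
print(check_molar_mass(-5.0))     # rejected
print(check_melting_vs_boiling(120.0, 80.0).accepted)  # → False
print(kg_per_kg_to_mg_per_g(0.002))  # 0.002 kg/kg expressed in mg/g
```

Returning a result object rather than raising an exception lets the user interface distinguish between an accepted value with a warning and a rejected value, as the two-step process requires.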

(8) Cooperation with other programmes: Exposure models usually require a multitude of physico-chemical and emission data, among others, which are often saved in the programme's own database. The results produced (e.g. the concentration of a substance in a river as a function of time and place) may be further processed with visualisation or statistics programmes or, increasingly, with geographic information systems (GIS) (MATTHIES ET AL. 1997). Such cooperation ensures the flexibility of the programme with regard to data input and output.

In order to cooperate with other programmes, appropriate interfaces must be developed. With large data stocks, an interface to an (external) relational database system (e.g. Access®, Oracle®, etc.) would be suitable.

The problem here is that, with the definition of these import and export interfaces, often only “raw data” are transmitted. But the transmission of all information contained in the programme (e.g. dependencies between parameters, estimation functions used, comments on data, etc.) is also important.

(9) Uncertainty analyses capability: It is unreasonable to expect that no uncertainty will attach to a model and the predictions it generates. Users may often ask themselves how reliable or uncertain the computed results are. Facilitating an uncertainty analysis hence represents a way to ensure quality. It needs to be tested to what extent the programme supports uncertainty analysis or cooperates with special programmes such as Crystal Ball®, @Risk®, MCSim, etc.
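The principle of a Monte-Carlo uncertainty analysis, as offered by tools such as those named above, can be sketched with the Python standard library. The exposure equation (dose = concentration × intake rate / body weight) and all input distributions are hypothetical assumptions chosen for illustration.

```python
import math
import random
import statistics


def daily_dose(c_water, intake, body_weight):
    """Hypothetical exposure model: dose (mg/kg/d) = C (mg/l) * IR (l/d) / BW (kg)."""
    return c_water * intake / body_weight


def monte_carlo(n=10_000, seed=1):
    """Propagate hypothetical input distributions through the model."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    doses = []
    for _ in range(n):
        c = rng.lognormvariate(math.log(0.01), 0.8)  # median 0.01 mg/l
        ir = rng.uniform(1.0, 3.0)                   # drinking-water intake, l/d
        bw = max(rng.gauss(70.0, 12.0), 30.0)        # body weight, clamped at 30 kg
        doses.append(daily_dose(c, ir, bw))
    return doses


doses = monte_carlo()
q = statistics.quantiles(doses, n=20)  # 19 cut points: 5th, 10th, ..., 95th percentile
print(f"5th {q[0]:.2e}  median {statistics.median(doses):.2e}  95th {q[-1]:.2e}")
```

Reporting percentiles rather than a single value shows users how uncertain the computed result is.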

(10) Support: The use of software often leads to technical problems (e.g. Why can't I install the software on my computer as described in the documentation?), to questions of a scientific nature (e.g. Is model X applicable to chemical Y?) or to questions about the state of programme development (e.g. Is the programme version at hand the most up-to-date one? Do updates exist?). For this reason, technical and scientific support is important for users. Furthermore, there should be an information source which informs users about the present status of programme development.

In order to realise such support, further information sources are required alongside the documentation. Examples are (a) contact with the developers and/or contact persons by post, phone, fax or e-mail or (b) information on the programme through Internet services (e.g. the World Wide Web).

2.4 Discussion

The objective of this chapter was neither to elucidate all published concepts of validation nor to develop a new one. Rather, the aim was to compile some of the major and most accepted concepts in order to establish a terminology for use in the field of predictive exposure modelling and assessment.

The concept of validation applied here focuses on the quality of the model. Here, model validation and software evaluation form the basis of the overarching quality assurance task. Against the background of the many published papers on validation, the concept responds to the “modern” view of validation, which broadens the validation task into a quality assurance procedure and which is closely related to the purpose of the model. Considering validation as a foundation of quality assurance seems pertinent, because a validation study assures quality in the sense that the model conforms to the user's requirements and the results are sufficiently accurate. What it does not determine is the degree of accuracy actually required by the user. Indeed, ROBINSON (1999) stressed that the manner in which a validation study is performed is more important in forming a user's perception of quality than the quality (or validity) of the model and its results. Subdividing validation into an internal and an external aspect is simple but precise. This terminology is expected to be pragmatic and to minimise misunderstandings, since each of the circulating terms of validity can be allocated to one of the two aspects.

Additionally, this meaning of validation implies that the validation task is not necessarily cast in terms of predicted concentrations versus monitoring data. If models cannot be validated in a traditional sense, i.e. by comparing predicted with measured values, as is the case for novel substances, it becomes a major task to obtain a picture of the behaviour of the parameters involved. Following this approach, validation has an objective and a subjective element. Whenever possible, statements on, for instance, the extent of uncertainty propagation have to be made in an objective sense. In contrast, problems that arise from the assessment of new notified substances in complex environmental systems must be handled in a more subjective manner, i.e. by evaluating the model performance on the basis of expert knowledge.

The papers of RYKIEL (1995) and ROBINSON (1999) explicitly stress the validation of data. In this study data validation plays an important role, too, but as part of external validation, in which the provided model parameters and monitoring data flow into the quality assurance task. It is noteworthy that observed data as well as model results should be considered an approximation of reality and not reality itself, owing to the averaging and generic character of exposure models.

The presented methodology should be considered a selection which can be supplemented if more appropriate methods become apparent. Especially for the validation of the mathematical model, methods cannot be prescribed, since validation depends on the purpose of the model. More precise instructions can be derived for the evaluation of the software, because here the meaning of high quality is internationally standardised. The compilation of methods is a contribution to establishing a Good Modelling Practice in the field of environmental risk assessment models and serves as a guide for assuring the quality of models.

2.5 Conclusions

After reviewing the literature it became obvious that there is no general validity, i.e. validity is only meaningful with respect to the purpose of a model. Furthermore, the term validation is misleading, because it implies an affirmative result. One should rather speak of quality assurance, interpreted in a pragmatic manner. Since representative observed data for comparison with the model results are often unavailable, validation has to be more than comparing model results with monitoring data. The concept of validation rather underlines that the validity of the (mathematical) model is a necessary but insufficient condition for the acceptability of the whole system, which encompasses the mathematical model and the software. Furthermore, a valid model represents the existing processes and successfully passes the other methods of internal validation.

There is rarely enough time to validate the model and evaluate the software exhaustively, and the effort required for quality assurance increases with the model’s complexity, but the general rule is: the more the better. To cover the essential needs, the following protocol is recommended:

(A) Prerequisites, i.e. presentation of the
1. model’s structure and its equations,
2. model’s purpose,
3. substances and database.

(B) Model validation by
1. inspection of the underlying theory (particularly, model verification and evaluation of implicit assumptions),
2. sensitivity analyses,
3. scenario analyses and comparison with observed data,
4. uncertainty analyses,
5. comparison with alternative models,
6. evaluation of the data used.

(C) Software evaluation with respect to
1. product description,
2. documentation,
3. technical requirements,
4. correctness of calculations,
5. user interface and operability,
6. transparency,
7. features,
8. cooperation with other programmes,
9. uncertainty analyses capability,
10. support.

(D) Concluding statements on model and software and recommendations.

2.6 Summary

The principles of model evaluation in terms of quality assurance, model validation and software evaluation were elaborated and discussed with the intention of developing a suitable evaluation protocol.

Since scientific theories, and the mathematical models for exposure assessment embedded therein, cannot be proven true, a pragmatic meaning of validation is required whose primary purpose is to increase the level of confidence placed in the model. The accuracy of the model outcome is a necessary but insufficient criterion for the quality assurance of models. A wider approach is required which examines the scientific inferences that can be drawn about models relative to their intended purpose. Reviewing the literature on the validation problem showed that all facets of validation can be assigned to generic (internal) and task-specific (external) properties of a model. Appropriate and detailed quality criteria for environmental risk assessment software were not found in the scientific literature and were therefore developed. They are based on common standards, on available publications, and on newly established standards. The result is a compilation of quality criteria which can serve as a basis for the development and evaluation of environmental risk assessment software.

Altogether, quality assurance of a model includes internal and external validation as well as the evaluation of the respective software. It should focus not only on the predictive capability of a model, but also on the strength of the theoretical underpinnings, the evidence supporting the model conceptualisation, the database and the software.

3 Handling Uncertainties

Heterogeneity in human behaviour and environmental characteristics, as well as inadequate model structure and measurement errors, lead to uncertainties that are inherent in the model’s outcome. In the preceding chapter the assessment and analysis of these uncertainties were introduced as crucial parts of evaluating exposure models. The common approach to handling uncertainties is to investigate diverse exposure scenarios and to represent them in terms of probability distributions (probabilistic exposure assessment). This chapter reviews the underlying theory of uncertainty analyses and develops a methodology as a framework for the TGD evaluation. The database used is presented in a later chapter.

3.1 Types of uncertainty

3.1.1 Uncertainties in exposure assessment

To obtain an impression of the range of possible contributing sources, the overall uncertainty in exposure or risk can be split into several parts. As depicted in Fig. 3.1, the US EPA (EPA 1997C) classified the sources of uncertainty in exposure assessment into (1) uncertainty regarding parameters (parameter uncertainty), (2) uncertainty regarding missing or incomplete information needed to fully define exposure and dose (scenario uncertainty) and (3) uncertainty regarding gaps in the scientific theory required to make predictions on the basis of causal inferences (model uncertainty).

Fig. 3.1 Classification of uncertainty and associated sources.

Input parameters are uncertain for several reasons: variability, or errors in the measurement, sampling or processing of data. Scenario uncertainty includes uncertainties resulting from false or incomplete information, such as description, aggregation or judgement errors, or an incomplete analysis. Finally, owing to lack of knowledge or errors in modelling the integrated relationships, the structure of the model (i.e. the mathematical expressions of its hypothesised relationships) can also be uncertain.

Alternative terms exist, although the underlying classification scheme is the same. Noteworthy are the terms operational and fundamental uncertainty as used by RAGAS ET AL. (1999), because they correspond to the internal and external aspect of validation. Operational uncertainty results from quantifiable uncertainties in the input propagated through the model equations into the output parameters (parameter plus scenario uncertainty) and can be assessed by quantifying the uncertainties in the input. Fundamental uncertainty stems from the assumptions underlying the model structure and equations (model uncertainty) and can be assessed by expert judgement. Comparing operational and fundamental uncertainty for the TGD regional distribution model, RAGAS ET AL. (1999) found that the fundamental uncertainty perceived by experts exceeds the operational uncertainty calculated by means of Monte-Carlo simulations. This finding emphasises the importance of considering fundamental uncertainty within a model validation.

The identification of the sources of uncertainty in an exposure assessment is important, because it represents the first step in determining how to reduce uncertainty (EPA 1997C). Once identified, the uncertainties can be dealt with using appropriate methods.

3.1.2 True parameter uncertainty and parameter variability

In the context of an uncertainty analysis, a distinction is commonly drawn between true uncertainty and variability. True uncertainty (also called type B uncertainty) represents a lack of knowledge or partial ignorance about factors affecting exposure or risk, whereas variability (also called type A uncertainty) arises from true heterogeneity across people, places or time (EPA 1997C). Both together contribute to the overall parameter uncertainty; true uncertainty is the part of it that reflects a gap in one’s knowledge. For uncertain model parameters it is useful to distinguish between the two types for two reasons. First, for interpreting the model’s result it is valuable to know the contribution of inevitable variability on the one hand and of reducible true uncertainty on the other. Second, with respect to the consequences of an exposure assessment, the nature of variability makes it difficult to set acceptable concentrations (e.g. 90%-ile vs. 95%-ile); reducing uncertainty may help to set such values. If the difference between the two types is ignored, it becomes difficult to draw useful insights. The fact that a given parameter can be both uncertain and variable complicates the analysis. Nevertheless, the overall uncertainty in the parameters can be described with the same formalism (e.g. probability distribution functions), although uncertainty and variability are conceptually distinct. If uncertainty dominates an exposure assessment, research needs to be intensified in order to obtain better parameter values. If variability dominates, it may be possible to stratify the variability for sensitive cases. When both true uncertainty and variability are negligible, the result is truly deterministic. If true uncertainty is negligible relative to variability, a variability analysis simply represents the expected statistical variation in the outcome.
If neither variability nor uncertainty is negligible, the distribution function representing variability cannot, for practical reasons, be given precisely. Methods for analysing uncertain variability distributions are the subject of current research (PRICE ET AL. 1996).
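The separation of the two types is commonly put into practice as a nested, or two-dimensional, Monte-Carlo analysis. The Python sketch below is illustrative only: the model dose = concentration * intake, the lognormal parameters and the function names are hypothetical assumptions, not values from the TGD. The outer loop samples the truly uncertain quantity, the inner loop samples inter-individual variability, and the spread of the inner-loop 95th percentile across outer runs therefore reflects true uncertainty alone.

```python
import random
import statistics


def percentile(values, q):
    """Empirical q-quantile (0 <= q <= 1), linear interpolation between order statistics."""
    data = sorted(values)
    pos = q * (len(data) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (pos - lo) * (data[hi] - data[lo])


def two_dimensional_mc(n_outer=200, n_inner=500, seed=1):
    """Nested Monte Carlo for the hypothetical model dose = concentration * intake.

    Outer loop: the concentration is truly uncertain (one value per outer run).
    Inner loop: the intake rate varies between individuals (many values per run).
    Returns the 95th-percentile dose of the variability distribution for each outer run.
    """
    rng = random.Random(seed)
    p95_per_outer_run = []
    for _ in range(n_outer):
        concentration = rng.lognormvariate(0.0, 0.5)  # type B: lack of knowledge
        doses = [concentration * rng.lognormvariate(-1.0, 0.4)  # type A: variability
                 for _ in range(n_inner)]
        p95_per_outer_run.append(percentile(doses, 0.95))
    return p95_per_outer_run


p95s = two_dimensional_mc()
# Uncertainty about the 95th-percentile dose itself (5th, 50th, 95th percentile)
print(percentile(p95s, 0.05), statistics.median(p95s), percentile(p95s, 0.95))
```

Because each outer run fixes the concentration, the width of the printed interval stems from true uncertainty only, whereas the spread of doses within one run reflects variability. In this toy setup better knowledge of the concentration would narrow the interval, while the inter-individual spread would remain.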

3.2 Sensitivity analyses

3.2.1 Background and benefit

While analysis of the overall uncertainty involves determining the variation in an output function based on the collective variability and true uncertainty of the model inputs, sensitivity analysis involves determining the changes in model response that result from changes in individual model parameters. A sensitivity analysis may be carried out before or after an uncertainty analysis. Doing it beforehand helps to identify influential parameters with the intention of reducing costs and effort, since parameters without impact may be left deterministic. Applying sensitivity analysis after working with uncertainties may confirm the reliability of the previous work or reveal a further need for research (FINLEY AND PAUSTENBACH 1994, HAIMES ET AL. 1994).

However, the usual approach is to carry out sensitivity studies to assess the effect of varying inputs on the overall output (COX AND BAYBUTT 1981). In this work, too, the objective of a sensitivity analysis is to identify the parameters with the strongest impact on the models’ results.

3.2.2 Methodology

Different approaches for conducting sensitivity analyses exist, including methods which operate on one variable at a time (e.g. differential sensitivity analysis, HAMBY 1994) and those which handle many variables simultaneously (e.g. Spearman rank order correlation, as implemented in Crystal Ball®, DECISIONEERING 1999). No consensus exists as to the best approach. However, the differential sensitivity approach (COX AND BAYBUTT 1981, MORGAN AND HENRION 1990, HAMBY 1994) always yields the same sensitivity indices, irrespective of the number of investigated variables, and is easily reproducible without further software. For that reason, this method is applied here. It defines a sensitivity function S(Xi) of the output Y with respect to the input parameter Xi from the partial derivative of Y with respect to Xi.

$$S(X_i) = \frac{\partial Y}{\partial X_i} \cdot \frac{X_i}{Y}$$

The quotient Xi / Y is introduced to normalise the coefficient by removing the effect of units. Evaluating the derivatives analytically can be laborious; instead, they can be approximated as a finite difference by perturbing Xi by ± n % and dividing the resulting change in Y by the corresponding change in Xi.
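As a sketch of the finite-difference approximation, the following Python function perturbs one parameter by ± n % around its nominal value (a central difference) and multiplies the resulting slope by Xi / Y, giving the normalised index S(Xi) = (∂Y/∂Xi)·(Xi/Y). The toy model `toy_water_conc` and all its parameter values are hypothetical; it merely mimics a dilution-type expression with one non-linear sorption term.

```python
def normalised_sensitivity(model, params, name, rel_step=0.01):
    """Central finite-difference estimate of S(X_i) = (dY/dX_i) * (X_i / Y).

    model    -- callable mapping a dict of parameter values to the output Y
    params   -- dict of nominal (base-case) parameter values
    name     -- key of the parameter X_i to perturb
    rel_step -- relative perturbation n % as a fraction (0.01 = 1 %)
    """
    x0 = params[name]
    y0 = model(params)
    up = {**params, name: x0 * (1.0 + rel_step)}
    down = {**params, name: x0 * (1.0 - rel_step)}
    slope = (model(up) - model(down)) / (2.0 * rel_step * x0)
    return slope * x0 / y0


def toy_water_conc(p):
    # Hypothetical dilution-type model: emission diluted by flow and dilution
    # factor and reduced by sorption to suspended matter (kp * susp).
    return p["emission"] / (p["flow"] * p["dilution"] * (1.0 + p["kp"] * p["susp"]))


nominal = {"emission": 1.0, "flow": 2000.0, "dilution": 10.0, "kp": 1000.0, "susp": 0.001}
for key in nominal:
    print(key, round(normalised_sensitivity(toy_water_conc, nominal, key), 3))
```

For purely multiplicative terms the index equals the exponent (1 or -1), whereas the sorption parameters give about -0.5 here because kp·susp = 1 at the nominal point. Since the indices are dimensionless, parameters with different units can be ranked directly.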

3.3 Scenario analyses

3.3.1 Point estimates

It remains to determine by which means uncertainty can be reflected once a model has been developed. One way is to take a deterministic model and to carry out point estimates for various exposure scenarios and assumptions (scenario analysis). Each scenario is a hypothetical construct, based on a set of facts, assumptions and inferences about how exposure takes place, which assists in the estimation of uncertain exposures. For example, the EU risk assessment scheme deliberately creates a standard scenario in the TGD which is a conservative point estimate, i.e. it is intended to protect public health. According to FINLEY AND PAUSTENBACH (1994), this approach is most useful as a screening approach which approximates a remote, yet plausible, worst-case situation for some subpopulation of potentially exposed persons. In addition, the calculation of point estimates is a desirable first step which should subsequently be followed by a probabilistic risk assessment (BURMASTER AND ANDERSON 1994).

The deviations between different scenarios can then be characterised in orders of magnitude. This approach, which is presented in the form of range/confidence estimates and uncertainty indices by RICHARDS AND ROWE (1999), is useful for certain classes of problems: (1) as a means to provide screening for uncertainties, (2) when data are insufficient for a more comprehensive treatment, (3) when the data come from widely different sources with different degrees of precision, and (4) when safety factors are used to provide margins of safety (i.e. the ratio of the effect assessment results to the total daily dose).

3.3.2 Limitations of the approach

The intention of a scenario analysis is to cover a broad range of possible outcomes. However, averaged values are used within each scenario, and such an estimate is then interpreted as a reasonable case. To be on the “safe side” for the protection of human health and the environment, worst-case assumptions are commonly applied. But the more parameters are described by worst-case assumptions, the more unrealistic the result is likely to be. For example, applying worst-case assumptions (e.g. the 99th percentile) to both parameters of the simple multiplicative model f(x,y) = x·y describes, for independent parameters, a combination whose probability according to probability theory is P = (1 − 0.99)² = 0.0001.
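The arithmetic of this cumulative worst case can be checked with a few lines of Python. The first function reproduces the joint probability (1 − q)^k for k independent inputs all set to their q-quantile. The second function adds an explicit assumption that goes beyond the text: all k inputs are independent lognormals with equal log-scale standard deviation, so the product is again lognormal, and one can compute which percentile of the output the worst-case point estimate actually corresponds to. The lognormal assumption and the function names are illustrative, not part of the TGD.

```python
import math
from statistics import NormalDist


def joint_exceedance(q, k):
    """Probability that k independent inputs all lie above their q-quantile."""
    return (1.0 - q) ** k


def worst_case_output_percentile(q, k, sigma=1.0):
    """Percentile of Y = X_1 * ... * X_k reached by plugging in every input's q-quantile.

    Assumes ln X_i ~ Normal(mu_i, sigma) independently, so ln Y is normal with
    standard deviation sigma * sqrt(k). The medians mu_i cancel out, so mu_i = 0 here.
    """
    z = NormalDist().inv_cdf(q)             # standard-normal quantile of each input
    ln_point_estimate = k * sigma * z        # sum of the k worst-case log values
    return NormalDist(0.0, sigma * math.sqrt(k)).cdf(ln_point_estimate)


print(joint_exceedance(0.99, 2))              # about 1e-4
print(joint_exceedance(0.90, 20))             # about 1e-20
print(worst_case_output_percentile(0.99, 2))  # about 0.9995
```

Under these assumptions, two 99th-percentile inputs already place the point estimate at roughly the 99.95th percentile of the output, and the result does not depend on sigma. This illustrates why multiplying worst-case inputs overstates the upper percentiles of the result.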

Using 90th percentiles as input for the at least 20 multiplicatively connected parameters of the TGD plant model (EC 1996A) would lead to P = (1 − 0.9)²⁰ = 1E-20 or even less. The effect grows the more parameters and models are combined and can thus be termed a cumulative worst case. For example, COPELAND ET AL. (1994) showed in a case study that the Californian point estimate method yields estimates greater than the 99.99th percentile. PRICE ET AL. (1996) calculated lifetime average daily dose rates for individuals exposed to 2,3,7,8-TCDD and found that predicting exposures from indirect exposure pathways may considerably overestimate the intakes for typical and high-end individuals. For these reasons, the point estimate should not be misconstrued as realistic. As stated in the guidelines for exposure assessment (EPA 1992), a point estimate cannot be used to determine that a pathway is significant, and it certainly cannot be used to estimate actual exposure. In a scenario analysis, information on uncertainty is restricted to a qualitative statement of confidence in the results, for instance that the uncertainty in the point estimate is less than one order of magnitude. Unfortunately, such qualitative statements are difficult to assess, particularly when the assessment involves potential exposure to several contaminants transferred via a number of different pathways (HOFFMAN AND HAMMONDS 1994).

3.4 Probabilistic analyses

One way to overcome the limitations described in the previous section is a quantitative analysis of uncertainty, using probabilistic techniques to propagate the uncertainty in models into an assessment of the uncertainty in exposure. The aim of the probabilistic assessment is then to quantify the probability distribution of the model’s outcome and to rank the input parameters by their contribution to the overall uncertainty.

3.4.1 Background

To assess uncertainty, one can think of a model as producing an output Y, such as a PEC, that is a function of several input variables Xi, i.e. Y = f(X1,...,Xk). Describing the uncertainty in a predicted dose or concentration involves quantifying the range of Y, e.g. by the arithmetic mean and standard deviation of Y and by lower and upper percentiles such as the 10th percentile as lower bound and the 90th percentile as upper bound. To characterise the uncertainty of a parameter with a measure independent of the parameter’s magnitude, the coefficient of variation (standard deviation divided by mean) is stated whenever possible. It typically ranges between 0 and 1 and may exceed unity where the standard deviation is very high. Convenient tools for presenting such information are the probability density function (PDF) and the cumulative distribution function (CDF) of Y. However, a meaningful PDF or CDF of Y can often be obtained only when meaningful estimates of the probability distributions of the input parameters Xi are available. If this information is missing or incomplete, the PDF or CDF of Y can still be constructed, but it should be characterised as a screening distribution for parameter uncertainty rather than a realistic representation of the uncertainty (MCKONE AND BOGEN 1991).
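The summary measures listed above can be computed from any sample of model outputs, for instance the shots of a Monte-Carlo run. The Python sketch below uses only the standard library. The lognormal output sample is a hypothetical stand-in for a PEC distribution, and `statistics.quantiles` with its default (exclusive) method is one of several common percentile conventions.

```python
import bisect
import random
import statistics


def summarise(samples):
    """Mean, standard deviation, coefficient of variation, 10th and 90th percentiles."""
    mean = statistics.fmean(samples)
    sd = statistics.stdev(samples)
    deciles = statistics.quantiles(samples, n=10)  # 9 cut points: 10th ... 90th percentile
    return {"mean": mean, "sd": sd, "cv": sd / mean, "p10": deciles[0], "p90": deciles[-1]}


def empirical_cdf(samples):
    """Return F(x) = fraction of samples less than or equal to x."""
    ordered = sorted(samples)
    n = len(ordered)
    return lambda x: bisect.bisect_right(ordered, x) / n


rng = random.Random(42)
pec = [rng.lognormvariate(0.0, 0.5) for _ in range(10_000)]  # hypothetical output sample
stats = summarise(pec)
F = empirical_cdf(pec)
print(stats)
print(F(stats["p90"]))  # close to 0.9
```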

Several papers have identified, compared and evaluated probabilistic approaches for assessing uncertainty in exposure models. The subject of uncertainty analysis as a whole was discussed in the fundamental work of MORGAN AND HENRION (1990), who stressed that the probabilistic approach is a suitable tool for evaluating the uncertainty in the parameters, but not for handling model or scenario uncertainties. MCKONE AND RYAN (1989) investigated the sources and impact of uncertainty in simple compartment models for human exposure assessment. Case studies for organic chemicals include, for instance, the estimation of the tetrachloroethylene cancer potency from the uptake of water to characterise uncertainty in human exposure models (MCKONE AND BOGEN 1991), human exposure assessments for hexachlorobenzene and benzo(a)pyrene through home-grown food to determine the relative contribution of uncertainty and variability (MCKONE 1994), and the oral uptake of PAH via drinking water and other sources (IHME AND WICHMANN 1996). The majority of publications deal with relatively simple multiplicative models for human health risk assessment, but some papers also exist for regional mass balance models (MACKAY AND PATERSON 1984, SCOTT ET AL. 1998). RAGAS ET AL. (1999) also estimated uncertainties in the multi-media fate model SimpleBox by comparing the model calculations with independently derived environmental quality objectives for air and water. Using the CalTOX™ system (DTSC 1993) for a set of diverse organic chemicals, HERTWICH ET AL. (1999) evaluated the variance in the calculated dose that can be attributed to the uncertainty in chemical-specific parameters as well as to the variability in exposure factors and landscape properties for the state of California.

Using two chemicals as examples, JAGER ET AL. (2000) have carried out the only probabilistic risk assessment with an EUSES-equivalent system so far. Like many other scientists, they emphasised the gain in information.

3.4.2 Methodological survey

Probabilistic exposure assessments can be carried out by means of different methods. HELTON (1994) and KLEPPER (1997) dealt analytically with methods for handling uncertainty in complex systems. COX AND BAYBUTT (1981) as well as IMAN AND HELTON (1988) considered analytical and numerical techniques, including Monte-Carlo simulations, response surface approaches, differential sensitivity techniques and evaluation by means of classical statistical confidence bounds. They concluded that some approaches are sufficiently general and flexible for use as overall methods of uncertainty analysis, while others may be very useful for particular problems. Recently, decision trees were used to characterise uncertainty, and probability distributions to incorporate variability, in a human exposure dose (SIELKEN AND VALDEZ-FLORES 1999). Using the TGD calculation of the local PEC in water as an example, SLOB (1994) showed that analytical methods can be a mathematically elegant way of quantifying uncertainty for multiplicative models with lognormally distributed parameters. However, they constrain the assessment (e.g. by requirements regarding the type of distribution functions). Furthermore, the models laid down in the TGD contain a large number of parameters, are non-linear and show discontinuities in their behaviour. Numerical methods for uncertainty analyses have proved useful for such large and complex models. One of these methods is the well-established Monte-Carlo analysis.
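The analytical route for multiplicative models rests on a standard property of the lognormal distribution: if Y is a product of independent lognormal factors, ln Y is a sum of independent normal variables, so Y is again lognormal, with a geometric mean equal to the product of the geometric means and (ln GSD_Y)² equal to the sum of the (ln GSD_i)². The Python sketch below compares this closed form with a plain Monte-Carlo estimate. The factor values are hypothetical, and this is the generic textbook result rather than a reproduction of SLOB’s (1994) derivation.

```python
import math
import random
import statistics


def lognormal_product(gms, gsds):
    """Closed-form geometric mean and geometric SD of a product of independent lognormals."""
    mu = sum(math.log(g) for g in gms)                      # means of ln X_i add up
    sigma = math.sqrt(sum(math.log(s) ** 2 for s in gsds))  # variances of ln X_i add up
    return math.exp(mu), math.exp(sigma)


def monte_carlo_product(gms, gsds, n=50_000, seed=7):
    """Same quantities estimated by simple random sampling of every factor."""
    rng = random.Random(seed)
    log_y = []
    for _ in range(n):
        y = 1.0
        for g, s in zip(gms, gsds):
            y *= rng.lognormvariate(math.log(g), math.log(s))  # one factor of the model
        log_y.append(math.log(y))
    return math.exp(statistics.fmean(log_y)), math.exp(statistics.stdev(log_y))


# Hypothetical multiplicative chain of three factors (geometric means, geometric SDs)
gms, gsds = [2.0, 0.5, 30.0], [1.5, 2.0, 3.0]
print(lognormal_product(gms, gsds))
print(monte_carlo_product(gms, gsds))
```

The closed form is exact only for purely multiplicative models with independent lognormal inputs. For non-linear models with discontinuities only the sampling route remains.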

3.4.3 Benefits

Probabilistic approaches, particularly due to their versatile applicability, have been identified as a valuable contribution to handling uncertainties in risk assessment. The benefit of the probabilistic approach has been elaborated by several authors: FINKEL (1994) emphasised in his didactic work the gain in information and perspective, which is not available from any less complete description. THOMPSON ET AL. (1992), COPELAND ET AL. (1994) and FINLEY AND PAUSTENBACH (1994) have shown that the outcome of the probabilistic approach is considerably lower than the point estimates of a deterministic worst-case approach. Even for simple exposure scenarios, the point estimates overstate the upper percentiles by a factor of 3 to 5. For more complex assessments, deviations of up to 2 log units may occur. For example, after comparing case studies for dioxins and volatile chemicals, FINLEY AND PAUSTENBACH (1994) pointed out that as the number of exposure pathways and variables grows, the difference between the point estimate and the 95th percentile of exposure increases and almost always becomes significant when secondary exposure pathways are considered: the 95th percentile of a probabilistic assessment which requires the consideration of multiple direct pathways is usually 3- to 5-fold lower than the point estimate. When indirect pathways of exposure are considered, the percentile is often as much as an order of magnitude lower. Altogether, these studies reveal that a probabilistic approach to uncertainty has three basic advantages:

- More realism: the complete distribution is considered instead of a few single values. This extends the information on, and perspective of, the exposure.
- More scientific, due to the separation of risk assessment and risk management: it is no longer necessary to set criteria for the different endpoints (e.g. the 99th percentile as worst case) within the scientific part of the risk analysis.
- More robust: the probabilistic approach has been shown to be more robust to changes in a single exposure variable.

3.4.4 Monte-Carlo analyses

In a Monte-Carlo analysis, one of two sampling schemes is generally employed (EPA 1997A): simple random sampling or Latin hypercube sampling. In the basic form of a Monte-Carlo analysis the model’s outcome is calculated directly from empirical probability distributions of the input parameters. Each input parameter is expressed by a probability distribution that defines both the range of values and the likelihood of each value in the range. Simple random sampling is used to select each member of the input parameter set. By the law of large numbers, the distribution of the sampled outcomes approaches the true output distribution as the number of samples increases, so that a sufficiently large sample provides a good representation of it. Latin hypercube sampling may be viewed as a stratified sampling scheme designed to ensure that the upper and lower ends of the distributions used in the analysis are well represented. It is considered to be more efficient than simple random sampling, i.e. it requires fewer simulations to produce the same level of precision. Latin hypercube sampling is generally recommended over simple random sampling when the model is complex or when time and resource constraints are an issue. An advantage is that the inputs do not necessarily have to be stochastically independent (COX AND BAYBUTT 1981). Furthermore, there is no restriction on the form of the joint input distribution or on the nature of the relationship between input and output. A further advantage of this method is that the model can be used in its original form; error-prone reformulations of the model, as needed for analytical methods, are not necessary. In addition, confidence intervals for calculated quantities can easily be developed. Several methods exist for ranking the inputs by their contribution to uncertainty, such as correlation coefficients and rank correlations (DECISIONEERING 1999). The disadvantage of the Monte-Carlo method is the considerable computational effort.
Reliable results require a certain number of simulations, so-called shots. According to MORGAN AND HENRION [illustration not visible in this excerpt] ried out following the principles of good practice for Monte-Carlo techniques proposed by BURMASTER AND ANDERSON (1994).
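The difference between the two sampling schemes can be made concrete with a small Python sketch. For each input, Latin hypercube sampling divides the unit interval into n equally probable strata, draws one point in each, shuffles the points independently per input (which pairs the inputs at random) and maps them through the inverse CDF. The lognormal input, the sample size and the number of repetitions below are hypothetical choices; the comparison only shows that, for this simple case, the spread of a mean estimate over repeated runs is smaller with Latin hypercube sampling.

```python
import math
import random
import statistics
from statistics import NormalDist

EPS = 1e-12  # keeps probabilities strictly inside (0, 1) for inv_cdf


def srs_unit(n, dims, rng):
    """Simple random sampling: n independent points in the unit hypercube."""
    return [[rng.random() for _ in range(dims)] for _ in range(n)]


def lhs_unit(n, dims, rng):
    """Latin hypercube sampling: exactly one point per stratum [k/n, (k+1)/n) per input."""
    columns = []
    for _ in range(dims):
        col = [(k + rng.random()) / n for k in range(n)]  # one draw inside each stratum
        rng.shuffle(col)                                   # random pairing across inputs
        columns.append(col)
    return [list(point) for point in zip(*columns)]


def to_lognormal(u, mu, sigma):
    """Map a unit-interval value to a lognormal variate via the inverse CDF."""
    u = min(max(u, EPS), 1.0 - EPS)
    return math.exp(NormalDist(mu, sigma).inv_cdf(u))


def spread_of_mean(sampler, n=100, repeats=200, seed=3):
    """Standard deviation of the estimated output mean across repeated runs."""
    rng = random.Random(seed)
    means = []
    for _ in range(repeats):
        pts = sampler(n, 1, rng)
        means.append(statistics.fmean(to_lognormal(p[0], 0.0, 0.5) for p in pts))
    return statistics.stdev(means)


print("SRS:", spread_of_mean(srs_unit))
print("LHS:", spread_of_mean(lhs_unit))
```

The random shuffle deliberately imposes no correlation between inputs; inducing specified rank correlations would require an additional step that is not shown here.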

During a Monte-Carlo analysis a rank correlation is easy to generate: the input and output parameter values calculated in each shot are saved in lists. Each list is sorted and its values are replaced by their rank, starting at 1 for the lowest value and ending with n (the number of shots) for the highest. A correlation coefficient is then computed between the ranks of each input list and those of the output list, which yields the strength of the relationship between each varied parameter and the result. An advantage is that after a normalisation according to

$$v_i = \frac{r_i^2}{\sum_{j=1}^{k} r_j^2}$$

a correlation coefficient (ri) can be expressed as the contribution to the result’s variance (vi) relative to all other parameters.
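The ranking procedure can be sketched in plain Python. Tied values receive the average of their positions, and the Spearman coefficient is computed as the Pearson correlation of the ranks. For the normalisation, the sketch assumes the common convention of squaring each rank correlation and dividing by the sum of the squares, which is how Crystal Ball’s contribution-to-variance view is usually described. The two-input model and its distributions are hypothetical.

```python
import math
import random
import statistics


def ranks(values):
    """Ranks 1..n (1 = lowest); tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1  # average rank of the tie group
        i = j + 1
    return result


def pearson(x, y):
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    return pearson(ranks(x), ranks(y))


def contribution_to_variance(rank_corrs):
    """Assumed normalisation v_i = r_i**2 / sum_j r_j**2 (shares add up to 1)."""
    total = sum(r * r for r in rank_corrs.values())
    return {name: r * r / total for name, r in rank_corrs.items()}


# Hypothetical model Y = X1 * X2 with a wide X1 and a narrow X2 distribution
rng = random.Random(11)
shots = 5000
x1 = [rng.lognormvariate(0.0, 1.0) for _ in range(shots)]
x2 = [rng.lognormvariate(0.0, 0.3) for _ in range(shots)]
y = [a * b for a, b in zip(x1, x2)]
r = {"X1": spearman(x1, y), "X2": spearman(x2, y)}
print(r)
print(contribution_to_variance(r))
```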

3.4.5 Probability distributions

The validity of any analysis is contingent upon the validity of its inputs. Characterising the type of distribution for the input parameters is a major task, because Monte-Carlo simulations transmit the input information directly to the final result, which makes the result’s distribution correspondingly sensitive to badly chosen distribution functions. But what is decisive for the assignment of distributions? HAIMES ET AL. (1994) pointed out that distributions should represent the state of knowledge. FINLEY ET AL. (1994) stressed the importance of physically meaningful distribution functions over mathematically elegant ones. Additionally, they concluded from various case studies that the type of distribution is often less important than the validity and applicability of the database. If the uncertainty in the model’s outcome is assumed to result from many multiplicative factors, it follows from the Central Limit Theorem, applied to the logarithms of these factors, that the result tends to be lognormally distributed. Since most exposure model parameters are themselves the result of multiplicative factors, most parameters in the literature are also represented by a lognormal distribution. For most physico-chemical parameters, too, there are strong theoretical and empirical arguments for assuming lognormal uncertainty distributions (SLOB 1994, SEILER AND ALVAREZ 1996).
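The Central Limit Theorem argument can be illustrated numerically: the product of many positive, independent factors is strongly right-skewed, while its logarithm, being a sum, is nearly symmetric and approximately normal. In the Python sketch below, the 30 uniform factors on [0.5, 1.5] are a hypothetical choice, and skewness is computed with the simple moment formula.

```python
import math
import random


def skewness(values):
    """Moment coefficient of skewness m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(5)
products = []
for _ in range(5000):
    y = 1.0
    for _ in range(30):               # 30 independent multiplicative factors
        y *= rng.uniform(0.5, 1.5)
    products.append(y)

logs = [math.log(y) for y in products]
print(round(skewness(products), 2))   # large and positive: Y itself is far from normal
print(round(skewness(logs), 2))       # near 0: ln Y is approximately normal
```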

Tab. 3.1 Probability distributions used in this study.

In addition, as a contribution to quality assurance, the Kolmogorov-Smirnov goodness-of-fit test was applied to check whether the assumption of lognormally distributed results can be justified. The test statistic is the largest vertical distance between two cumulative distributions. Generally, a value below 0.03 indicates a good fit (DECISIONEERING 1999). Parameters which have a physical limit in value are modelled as truncated lognormals. Such parameters include fractions that cannot exceed unity and partition factors that, by theory and measurement, cannot exceed certain values. However, HAMED AND BEDIENT (1997) showed, albeit only for the example of a multiplicative lifetime cancer model, that the choice of distribution does not alter the order of importance of the basic uncertain variables.
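Both steps can be sketched in Python. `ks_statistic` computes the largest vertical distance between the empirical CDF of a sample and a reference CDF. `truncated_lognormal` draws from a lognormal restricted to values at or below an upper bound by simple rejection, which is exact but becomes slow if most of the probability mass lies above the bound. All parameter values are hypothetical. Note that when the reference distribution is fitted to the same sample, the usual KS critical values no longer apply exactly, so a fixed threshold such as 0.03 is best read as a rule of thumb.

```python
import math
import random
from statistics import NormalDist


def ks_statistic(samples, cdf):
    """Largest vertical distance between the empirical CDF of samples and cdf."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # the empirical CDF jumps from (i-1)/n to i/n at x, so check both sides
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def lognormal_cdf(mu, sigma):
    nd = NormalDist(mu, sigma)
    return lambda x: nd.cdf(math.log(x)) if x > 0 else 0.0


def truncated_lognormal(mu, sigma, upper, rng):
    """Rejection sampling from a lognormal conditioned on X <= upper."""
    while True:
        x = rng.lognormvariate(mu, sigma)
        if x <= upper:
            return x


rng = random.Random(2024)
sample = [rng.lognormvariate(0.0, 0.8) for _ in range(5000)]
print(ks_statistic(sample, lognormal_cdf(0.0, 0.8)))  # small: correct model
print(ks_statistic(sample, lognormal_cdf(0.5, 0.8)))  # larger: shifted model

# Hypothetical fraction parameter with median 0.3 that cannot exceed unity
fractions = [truncated_lognormal(math.log(0.3), 0.6, 1.0, rng) for _ in range(1000)]
print(max(fractions) <= 1.0)  # True
```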

Depending on whether a distribution is known a priori or not, a procedure for selecting appropriate distributions can be derived (Fig. 3.2). For each parameter, (1) check, based on the sensitivity analysis, whether the parameter can be ignored; (2) if not, check whether a distribution is known or whether there are theoretical reasons to assign a certain distribution; (3) if this is not the case, check whether adequate data are available to fit a distribution. If none of these three steps succeeds, surrogate data combined with expert judgement have to be used. In this way the probability distribution is assigned on the basis of the available data, combined with the judgement of experts.
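The selection procedure can be written as a short decision function. The sketch below is a hypothetical encoding: the record type `ParameterInfo`, its fields, the example parameter name and the minimum of 20 data points standing in for “adequate data” are illustrative assumptions, not criteria stated in the text.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

MIN_POINTS_FOR_FIT = 20  # hypothetical stand-in for "adequate data"


@dataclass
class ParameterInfo:
    name: str
    influential: bool                          # result of the sensitivity analysis
    known_distribution: Optional[str] = None   # distribution justified by knowledge or theory
    data: Sequence[float] = ()                 # measured values, if any


def choose_distribution(p: ParameterInfo) -> str:
    # Step 1: parameters without noticeable influence stay deterministic
    if not p.influential:
        return "point value"
    # Step 2: a known or theoretically justified distribution is used directly
    if p.known_distribution is not None:
        return p.known_distribution
    # Step 3: fit a distribution when enough data are available
    if len(p.data) >= MIN_POINTS_FOR_FIT:
        return "fitted to data"
    # Otherwise fall back on surrogate data and expert judgement
    return "surrogate data + expert judgement"


print(choose_distribution(ParameterInfo("Kow", True, known_distribution="lognormal")))
```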

Fig. 3.2 Procedure to select appropriate probability distributions.

In assigning probability distributions, the choice of the underlying database, the reliability of the extreme margins and correlations between parameters may cause problems and require special attention:

Underlying database: distributions can be fitted to empirical data. But ANDERSON AND HATTIS (1999) stressed that distribution fitting is an overused and often pointless exercise, particularly if only a few data are available. A distribution fitted to a non-representative dataset is itself non-representative and may therefore be irrelevant to the assessment. In addition, problems may occur when data originating from different studies are mixed (FINLEY AND PAUSTENBACH 1994) and when study designs and study methods are not comparable. However, non-representative data are the rule rather than the exception, and the common problem in creating parameter distributions is the poor database for nearly all parameters. Thus, in practice the creation of distributions is less a question of statistical methods than of expert judgement (ANDERSON AND HATTIS

[...]

Details

Pages: 198
Year: 2000
ISBN (eBook): 9783638103749
File size: 3.1 MB
Language: English
Catalog Number: v547
Institution / College: University of Osnabrück – Institute for Environmental System Research, Mathematics/Computer Science
Grade: 1,0 (A)
Tags: Risk assessment TGD EUSES quality assurance model validation software evaluation fate and exposure models uncertainty analysis sensitivity analysis scenario analysis assumptions limitations


Title: Quality Assurance of Exposure Models for Environmental Risk Assessment of Substances