SETAC Globe Learned Discourses: Timely Scientific Opinions
SETAC Globe is published by the Society of Environmental Toxicology and Chemistry (SETAC)
Robert P. Eganhouse and James Pontolillo
U.S. Geological Survey, Reston, Virginia, USA
Editor's Note: This is an expanded version of the article that appeared in the January-February 2003 issue of the SETAC Globe and appears here with permission from SETAC.
We appreciate the opportunity to respond to comments made by Burkhard and Mount (2002) concerning our Learned Discourse (LD) article. It is gratifying to learn that they agree that "the development of standards for selecting high quality values, and a quality-assured database of consensus KOW values would be of great benefit." However, we believe their criticisms reflect a lack of understanding of the purpose of our original study and its findings. The LD article was abstracted from a report (Pontolillo and Eganhouse 2001; downloadable as a PDF file) that discusses our results in much greater detail. Some misunderstanding may have resulted from the distillation of information from our 2001 report into the LD.
As indicated in our 2001 report, this study was not planned. Like many data users, we were seeking reliable physico-chemical property data for use in interpreting results of a field study. As the data search proceeded, we encountered numerous problems that made the process much more difficult and time-consuming. Once we could find no more published data (after nearly 2 years of sustained, if intermittent, effort), we faced the challenge of determining the true KOW and SW values for the two compounds of interest (DDT and DDE). Unfortunately, guidance from the literature was conflicting, and the measurements that had been made with widely accepted techniques (e.g., slow-stir for KOW, generator column for SW) were few and not always in agreement. Because all KOW and SW data for these compounds had been identified, we decided to carry out a comprehensive analysis of the original data sources, commonly used compilations and databases, and trends in data production. Our analysis revealed two fundamental problems: 1) errors in reporting data and references, and 2) poor data quality and/or inadequate documentation of procedures. While the first problem has the effect of making data retrieval difficult, the second problem places limits on the use of data. Because of the potential significance of these issues to the environmental science and regulatory communities, we felt compelled to share our findings. This ultimately led to publication of the aforementioned report.
In their critique of the LD article, Burkhard and Mount (2002) raise five main points:
Point 1: Burkhard and Mount complain that we assembled data without censoring it, giving a false impression of the uncertainty in KOW and SW values. In brief, the process we undertook was as follows: assemble all data sources, examine them, rate them, and comment on any trends. In presenting a complete, rather than censored, data set, we wished to avoid the common practice of rejecting data without documenting the procedures by which this was done. We presented all of the data as they exist in the literature. We never stated that the uncertainty in these property/compound pairs is 2-4 orders of magnitude. However, we acknowledge that the wording used in our LD article, which stated that "there is little indication that the uncertainty in these properties has declined over the last 5 decades," may have left the wrong impression. It is our opinion that the existing data for these compounds (DDT, DDE) and properties (KOW, SW), which happen to range over 2-4 orders of magnitude, do not support the estimation of uncertainty. This is due to a paucity of reliable data and/or the inadequate documentation of procedures of determination. As seen in Figure 1 of the LD article, the published data do not converge to a narrow range in more recent years, and as noted in our 2001 report, the precision of individual data points has only rarely been reported (DDT/DDE: SW-20%, KOW-10%). These patterns have persisted for the last 5 decades.
It is claimed that we "treat all methods as equal" in the LD article (Eganhouse and Pontolillo 2002). Although space limitations did not permit a discussion of methodology in this brief article, anyone who reads our 2001 report will find that differences among the methods and the implications of these differences are discussed (cf. Effect of Methodology section, p. 20-23). In addition, we presented (and documented) an assessment of the quality of all sources of log KOW and SW data for DDT and DDE (p. 19-20, Appendix B, p. 41-51). This assessment included consideration of the acceptability of the methodology and documentation of the measures taken to ensure the quality of property determination (e.g., purity of solvents, minimization of emulsions). Clearly, we did not treat all methods as equal.
Concerning the data ranges for methods of choice as discussed by Burkhard and Mount, even if one assumes that the data are of equivalent reliability, we do not believe that: 1) a factor of 3x (n=5) is acceptable for all environmental purposes (SW/DDT), 2) two measurements are sufficient to assess the true uncertainty (SW/DDE), or 3) a range of 15x (n=8) should be viewed as satisfactory (KOW/DDT) under any circumstances. The authors state that censoring of the KOW/DDT data (using the data source quality assessment criteria described in our report) reduces the range from 15x to 4x. However, this leaves only two data points, which, as stated above, are insufficient to assess uncertainty. In the case of KOW/DDE, a range of 2x (n=2) was presented, but one of the measurements is unpublished (see U.S. EPA 1995) and is actually for DDT, not DDE. Even if the datum had been for DDE, its reliability is indeterminate. In any case, n=1. No range, let alone uncertainty, can be established with a single data point. Although data produced with widely accepted methods do exist, we believe these data are too few, and in some cases too disparate (or inadequately documented), to assess the level of uncertainty. More importantly, we feel the task of identifying reliable data and reaching conclusions about parameter uncertainty is problematic for the average user.
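The ranges quoted above (3x, 15x, 4x, 2x) are ratios of the largest to the smallest reported value. The following Python fragment is a minimal sketch of that calculation, and of why a single datum yields no range at all. The function names and the log KOW values are hypothetical and are not taken from our report or from Burkhard and Mount (2002).

```python
def fold_range(values):
    """Return max/min for positive measurements given in linear units (e.g. SW)."""
    if len(values) < 2:
        raise ValueError("a range needs at least two data points")
    return max(values) / min(values)


def fold_range_from_logs(log_values):
    """Return the same max/min ratio when the data are log10 values (e.g. log KOW)."""
    if len(log_values) < 2:
        raise ValueError("a range needs at least two data points")
    return 10 ** (max(log_values) - min(log_values))


# Hypothetical log KOW values, invented for illustration only.
hypothetical_log_kow = [6.0, 6.3, 6.9]
print(f"fold range: {fold_range_from_logs(hypothetical_log_kow):.1f}x")
```

Note that such a ratio describes only the spread of the values supplied; it says nothing about their reliability, and with n=2 it is set entirely by the two points that survive screening.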
Point 2: Burkhard and Mount state that we do not acknowledge the fact that measurement difficulties increase with increasing KOW and decreasing SW of the compound. Although confusion about this issue seems unlikely to us, we appreciate the efforts of Burkhard and Mount to clarify the situation. While the LD article does not discuss this issue (owing to space limitations), our 2001 report, for which a link was supplied, does (cf. Introduction, p. 2). Throughout the report we are careful to note that we are speaking of hydrophobic organic compounds (HOCs), not "all nonionic organic chemicals." The focus on HOCs is also reflected in the titles of both the report and the LD article. The issue of comparability of methodologies for lower KOW, higher SW compounds is thus not germane to our report or the LD article.
Point 3: Burkhard and Mount advance arguments on behalf of current estimation models such as SPARC and ClogP and suggest that we appear to view assessments that rely on predicted, rather than measured, values with trepidation. In support of these estimation models they cite the "excellent correspondence between model predictions and data from high quality measurements for most chemicals." A fundamental problem with this type of analysis is that if the uncertainty of the measured value is not known, the accuracy of the model prediction cannot be assessed.
While predictive methods have seen continuous refinement in recent years, a number of well-documented problems with these approaches remain. Among them are the following: 1) data used to train algorithms are sometimes also used to judge the predictive capabilities of models, 2) although high regression coefficients are found when comparing predicted versus measured values for large numbers of compounds on log-log plots, significant errors (> 2 log units) can be observed for individual compounds (e.g., Schüürmann et al. 1995; Duban et al. 2001), 3) different models usually produce different predictions for a given compound/property pair and the differences can be significant (see Figure 1, Burkhard and Mount 2002; Duban et al. 2001), 4) the data sets used for training algorithms, as well as the criteria used in selecting the training data, are not documented, preventing users from assessing the reliability of the underlying data, 5) estimation techniques are known to overestimate KOW at high values where fewer reliable measurements are available (log KOW > 6; Kühne et al. 1994), and 6) methods vary significantly in their ability to predict parameter values for different compound classes (Kühne et al. 1994; Schüürmann et al. 1995). Without going into these and other issues in detail, it should be apparent that models are only as good as the data they are based on, and, like experimental methods, they are not created equal. For these reasons, especially in the case of HOCs, it is difficult for a user to judge which model is best suited to a given purpose.
Burkhard and Mount state that the results obtained with SPARC predictions are particularly significant because they are developed from "first principles only" and that "there is no empirical training of the algorithm." We find these assertions puzzling in light of statements by Karickhoff et al. (1991) [see also Long et al. 1999] that the basic philosophy behind SPARC is "not to compute any chemical property from 'first principles'" and that "the program takes initial 'guesstimates' (and appropriate boundary constraints) together with a set of designated training data and provides an optimized set of model parameters." Either SPARC has changed or Burkhard and Mount are not familiar with its underpinnings. Because a description of the solvation models used by SPARC to predict SW and KOW has not been published, we are in no position to determine which of these alternatives is more likely.
Finally, we are surprised that the authors feel that the level of agreement between ClogP and SPARC predictions and the slow-stir data of de Bruijn et al. (1989) and de Maagd et al. (1998) [not de Haagd et al.] is good. The datasets appear to have been chosen to demonstrate the predictive capabilities of these models under optimal conditions, yet for compounds in the log KOW range considered by our report (i.e., log KOW > 6), the results are less than impressive and certainly are no substitute for reliable experimental data. [We say this not knowing whether some or all of the de Bruijn et al. (1989) and de Maagd et al. (1998) data were used in training the ClogP algorithm.] One wonders what the slope and correlation coefficient would be if the data range were restricted to log KOW > 6.
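The effect of restricting the data range on slope and correlation coefficient can be sketched as follows. This is an illustration only: the (measured, predicted) pairs are synthetic values invented to show the arithmetic, not the de Bruijn et al. (1989) or de Maagd et al. (1998) data, and the helper names are hypothetical.

```python
import math


def slope_and_r(x, y):
    """Least-squares slope of y on x and the Pearson correlation coefficient."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


def restrict(pairs, threshold):
    """Keep only pairs whose measured log KOW exceeds the threshold."""
    return [(m, p) for m, p in pairs if m > threshold]


# Synthetic (measured, predicted) log KOW pairs, invented for illustration:
# small prediction errors at low log KOW, larger errors above log KOW 6.
pairs = [
    (1.0, 1.1), (2.0, 1.9), (3.0, 3.1), (4.0, 3.9), (5.0, 5.1),
    (6.0, 6.8), (6.5, 6.0), (7.0, 7.9), (7.5, 6.9), (8.0, 8.6),
]

full_slope, full_r = slope_and_r(*zip(*pairs))
high_slope, high_r = slope_and_r(*zip(*restrict(pairs, 6.0)))
print(f"all data:    slope={full_slope:.2f}, r={full_r:.2f}")
print(f"log KOW > 6: slope={high_slope:.2f}, r={high_r:.2f}")
```

With these synthetic values the correlation over the full range is high, while the subset above log KOW 6 shows a visibly lower correlation and a slope further from unity. Whether the same holds for real predictions depends entirely on the data used.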
Point 4: Burkhard and Mount indicate that our statement of the need for creation of a mechanism for archiving reliable data is not a new idea and that there already are groups that warehouse and evaluate such data. We did not portray this as a new idea, but the fact remains that there is no centralized system for carrying out such a function at the present time. Our study has shown that the databases mentioned by Burkhard and Mount are not free of erroneous data. Moreover, because the criteria and methods used in data screening for these databases are not transparent, users cannot determine for themselves which data were considered, which were excluded, and how screening criteria were applied. For the user, it thus becomes a matter of faith. We maintain that there continues to be a need for a properly documented, centralized system that can facilitate the availability of certifiably reliable data.
Point 5: Burkhard and Mount consider our conclusion that "estimation of critical environmental parameters on the basis of SW and KOW is inadvisable because it will likely lead to incorrect environmental risk assessments" to be neither helpful nor appropriate. In fact, they consider it alarmist. We suggest that the appropriate metric here is whether our conclusion is supported by the evidence. While we do not deny or minimize the advances made in environmental science, the need for reliable data has never been greater. Models predicting contaminant transport and fate are increasing in number and complexity. More and more they are linked to regulatory decision-making. Uncertainties in the input parameters are undeniably reflected in the uncertainties in model output. The extent to which uncertainties in input parameters can be determined (which varies from not at all to quite well) and the effects of parameter uncertainty on model output (i.e., model sensitivity) are among a number of factors that determine whether model predictions are valid (Bennett et al. 2001). It seems obvious to us that the deficiencies of the existing database for DDT and DDE (and likely other HOCs), both in terms of the literature and the paucity of high quality measurements, are a matter of concern. This deficiency has been recognized by the U.S. EPA, which stated as recently as 1998 that "there remains a serious shortage of reliable measured data for compounds with higher log KOW values (log KOW > 5)" and "For chemicals with log KOW > 5, it is highly unlikely to find multiple 'high quality' measurements" (Appendix F of U.S. EPA 1998). It also was the basis for excluding DDT as a suitable test compound in the recently published OECD-sponsored slow stir validation study (Tolls 2002).
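How a spread in an input parameter carries through to model output can be sketched with a deliberately simple case. The fragment below assumes a hypothetical log-linear relation between log KOW and a predicted quantity; the slope and intercept are placeholders chosen for illustration and do not represent any published regression or any of the models discussed here.

```python
def predict_log_output(log_kow, slope=1.0, intercept=-1.0):
    """Hypothetical log-linear model: log10(output) = slope * log KOW + intercept."""
    return slope * log_kow + intercept


def output_fold_spread(log_kow_low, log_kow_high, slope=1.0, intercept=-1.0):
    """Fold difference in predicted output across a spread of input log KOW values."""
    low = predict_log_output(log_kow_low, slope, intercept)
    high = predict_log_output(log_kow_high, slope, intercept)
    return 10 ** abs(high - low)


# A one log-unit spread in the input, with a placeholder slope of 1.0,
# gives a tenfold spread in the predicted quantity.
print(f"{output_fold_spread(6.0, 7.0):.1f}x")
```

In this simple case the model sensitivity is just the slope: a slope of 1 passes the input spread through unchanged, while a smaller slope damps it. Multimedia fate models are far more complex, but the same question applies to each of their input parameters.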
Finally, we object to the characterization by Burkhard and Mount that the 2 to 4 order of magnitude range in SW and KOW data for DDT and DDE represents the uncertainty in these properties. We have never stated that. The true uncertainty remains unknown.
The authors wish to thank Ike Winograd and Ed Furlong for providing reviews of an earlier draft of this response. Their comments significantly improved the manuscript.
Bennett DH, Scheringer M, McKone TE, Hungerbühler K. 2001. Predicting long-range transport: A systematic evaluation of two multimedia transport models. Environ Sci Technol 35:1181-1189.
Burkhard LP, Mount DR. 2002. Comments on Eganhouse and Pontolillo: Assessing the Reliability of Physico-Chemical Property Data (KOW, SW) for Hydrophobic Organic Compounds: DDT and DDE as a Case Study. SETAC Globe: Learned Discourse November-December: 40-42.
de Bruijn J, Busser F, Seinen W, Hermens J. 1989. Determination of octanol/water partition coefficients for hydrophobic organic chemicals with the "slow-stirring" method. Environ Toxicol Chem 8:499-512.
de Maagd PG-J, ten Hulscher DThEM, van den Heuvel H, Opperhuizen A, Sijm DTHM. 1998. Physicochemical properties of polycyclic aromatic hydrocarbons: aqueous solubilities, n-octanol/water partition coefficients, and Henry's Law constants. Environ Toxicol Chem 17:251-257.
Duban ME, Bures MG, DeLazzer J, Martin YC. 2001. Virtual screening of molecular properties: a comparison of log P calculators. In: Pharmacokinetic Optimization in Drug Research - Biological, Physicochemical and Computational Strategies, B. Testa, H. van der Waterbeemd, G. Folkers and R. Guy (Eds.), Wiley, Zurich, p. 485-497.
Eganhouse RP, Pontolillo J. 2002. Assessing the reliability of physico-chemical property data (KOW, SW) for hydrophobic organic compounds: DDT and DDE as a case study. SETAC Globe: Learned Discourse July-August: 34-35.
Karickhoff SW, McDaniel VK, Melton C, Vellino AN, Nute DE, Carreira LA. 1991. Predicting chemical reactivity by computer. Environ Toxicol Chem 10:1405-1416.
Kühne R, Rothenbacher C, Herth P, Schüürmann G. 1994. Group contribution methods for physicochemical properties of compounds. In: Software-Development in Chemistry 8, C. Jochum (Ed.), Gesellschaft Deutscher Chemiker, Frankfurt am Main, Germany, p. 207-224.
Long JM, Karickhoff SW, Jones WJ, Weber EJ. 1999. Chemical data base for HWIR99. In: U.S. Environmental Protection Agency 1999 Hazardous Waste Identification Rule Notice Risk Assessment Report (http://www.epa.gov/osw/hazard/wastetypes/wasteid/hwirwste/risk.htm).
Pontolillo J, Eganhouse RP. 2001. The search for reliable aqueous solubility (SW) and octanol-water partition coefficient (KOW) data for hydrophobic organic compounds: DDT and DDE as a case study. U.S. Geological Survey Water-Resources Investigations Report 01-4201, Reston, VA, USA.
Schüürmann G, Kühne R, Ebert R-U, Kleint F. 1995. Multivariate error analysis of increment methods for calculating the octanol/water-partition coefficient. Fresenius Environ Bull 4:13-18.
Tolls J. 2002. Ring test proposal of the Netherlands for a new OECD guideline: Partition coefficient n-octanol/water (Pow) slow-stirring method for highly hydrophobic chemicals-Validation report. IRAS Environmental Toxicology and Chemistry, Utrecht University, 65p.
U.S. Environmental Protection Agency. 1995. Great Lakes water quality initiative technical support document for the procedure to determine bioaccumulation factors. EPA-820-B-95-005. Office of Water, Washington, DC.
U.S. Environmental Protection Agency. 1998. Ambient water quality criteria derivation methodology human health: technical support document, final draft. EPA-822-B-98-005. Office of Science and Technology, Washington, DC.