In the first two parts of this essay, I discussed the veil protecting private inquiry from public scrutiny and the meaning of stolen e-mails made public. In this final part, I discuss my thoughts on the handling of conflicting data.
Quotes extracted from the stolen e-mails suggested that data which should have entered into the famous “hockey stick plot” was deliberately ignored [1]. I’ve certainly commented on the hockey stick plot before [2], and I will be the first to admit that I felt somewhat concerned when I heard that conflicting data had been excluded from the plot. On the other hand, the scientist in me knows that excluding data happens all the time; when you do it, you must say so and state why.
Before going further on that specific subject, let’s take a moment to think about a situation that is quite common in science. In fact, I would say that most scientists face this problem the very first time they are asked to conduct a controlled experiment for a high school or college laboratory class.
You are asked to conduct an experiment; for instance, to attach a piece of ticker-tape paper to a weight, raise the weight to the ceiling, thread the paper between two electrodes, and then drop the weight. The electrodes fire at fast, regular intervals, marking the paper every time they fire. From the spacing of the marks on the paper and the known time between sparks, you can determine the acceleration of the weight. Your task is to measure the acceleration due to gravity this way.
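To make the procedure concrete, here is a minimal sketch of the analysis, assuming constant acceleration and a 50 Hz sparker; the mark positions below are invented numbers for illustration, not real data. Under constant acceleration, the second difference of successive mark positions, divided by the square of the spark interval, estimates the acceleration.

```python
# A minimal sketch of the ticker-tape analysis for a single drop.
# All numbers here are invented for illustration, not real data.

marks = [0.0000, 0.0100, 0.0229, 0.0388, 0.0578, 0.0797]  # mark positions (m)
dt = 0.02  # time between sparks (s), assuming a 50 Hz sparker

# Under constant acceleration, the second difference of position gives
#   a = (x[i+1] - 2*x[i] + x[i-1]) / dt**2
# at each interior mark.
accels = [(marks[i + 1] - 2 * marks[i] + marks[i - 1]) / dt**2
          for i in range(1, len(marks) - 1)]

a_drop = sum(accels) / len(accels)  # average over this single drop
print(f"estimated acceleration: {a_drop:.2f} m/s^2")  # ~7.44, well below 9.8
```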
You find the acceleration to be LESS than 9.8 m/s^2. You repeat the experiment several times to build up a sample; averaging the repetitions reduces the statistical uncertainty on the result, and each repetition confirms the original finding. You bring your conclusions to the instructor, who asks you simply to discuss your findings and hand in your report. What do you do?
You’ve learned in class that all objects on earth fall at 9.8 m/s^2. Yet here is an experiment that measures something much less, say (7.4 +/- 0.4) m/s^2, six standard deviations below the accepted value. This is significantly different from what you learned in class. What do you do? Do you throw out the result and simply say that, since you were told the acceleration on earth is 9.8 m/s^2, your result must be wrong? Do you fudge the data to get closer to the stated value? Do you repeat the experiment with somebody else’s sparker? Do you study how your own sparker functions (e.g. maybe the time between sparks is NOT what the manufacturer reported)? You have limited time; the report is due by the end of lab. What do you do?
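To see why that counts as “significantly different,” here is a companion sketch, again with invented per-drop numbers, showing how the repeated drops combine: the standard error of the mean shrinks as 1/sqrt(N), and the gap from 9.8 m/s^2 can then be expressed in standard deviations.

```python
import statistics

# Hypothetical per-drop estimates (m/s^2) from repeated runs; the numbers
# are invented to reproduce the (7.4 +/- 0.4) m/s^2 result quoted above.
drops = [6.4, 9.0, 6.9, 8.3, 6.1, 8.6, 6.7, 7.2]

mean = statistics.mean(drops)
# Standard error of the mean: sample standard deviation / sqrt(N).
sem = statistics.stdev(drops) / len(drops) ** 0.5

# Express the discrepancy from the accepted value in standard errors.
z = abs(9.8 - mean) / sem
print(f"a = ({mean:.1f} +/- {sem:.1f}) m/s^2, about {z:.0f} sigma from 9.8")
```

At six sigma, a statistical fluke is essentially ruled out: either the apparatus is misunderstood, or something genuinely disagrees with expectation. That is exactly the dilemma described next.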
This is the same situation that all scientists face in their work. You have a deadline: a meeting, a conference, a review, the threat of a competitor scooping you. You have a result that doesn’t agree with other measurements. It could signal something new; it could signal that other results are wrong; it could be that you didn’t understand your experiment. What do you do?
This is not an easy dilemma. The overwhelming majority of climate measurements point to global-scale change, centered on average temperature increases since the mid-1800s that are more rapid than in any previously known period. Yet here is data that contradicts that observation. You badly want your detractors to go away. What do you do?
There are few right answers to this question, and lots of wrong ones. Of course, few of us know enough about the climate data to give an intelligent answer. The safe answer would be to report the results from the raw data, record your hypotheses about why the data behave as they do, and make the result public (perhaps as a pre-print, since such an incomplete analysis is unlikely to withstand peer review).
I’ll close with a final thought on all of this. The veil that protects the messy process of science is important for the promotion of free thought, but it cannot be relied upon to hide irresponsible action from public scrutiny. The release of electronic documents pierces that veil, but it unethically captures only a snapshot of the overall scientific process (and, from my own experience, a part of that process where people speak all too freely and easily, in ways incongruent with their final decisions). Confronted with inconsistent data, the scientist faces an unpleasant set of choices in which the best answer brings the least glory or recognition. But what of the data itself? Does the public have a right to it?
There is no easy answer to this, either. Many collaborations own their data, while many others are required to release it to the public after a period of closed access. Even publicly released data has already been corrected and filtered, so public data is rarely “complete” in the sense of being the same data that came fresh off the instrumentation. The public is ill-equipped, whether by its own disinterest in scientific methods or by the poor state of science education, to deal with raw data. Yet the demand for the raw climate data behind the “Climate-gate” e-mails is real, and it must be considered. Certainly, there can be real value in making data available to other scientists, but personally I think the data should come with strings attached.
Here are the strings: publications based on the data are not official unless they are blessed by the original person or collaboration that collected it, so journals should not accept papers that lack the affiliation of the original owners. In cases where the data has outlived the experiment or the original researchers, journals should review the paper by panel rather than by just a few peer referees; a diversity of experts in the field should sign off on the analysis. The data must be released with all of the applied filters and calibrations clearly documented, and the data source must also be thoroughly documented. If the original instrumentation is still available, it should be made available to the analysts; this last point is critical because instrumental effects often shape the data in unknown or unpredictable ways, and they cannot be understood without access to the instrument.
The responsible release of data is a partnership between the original researchers, the public entity using the data, and the journals reviewing the work. Without this partnership, nobody should believe what comes from the new research.
“Climate-gate” is not just a window on the scientific process, but also a means by which scientists themselves can think about the way they conduct their work. The ultimate question, whether this changes how we think about the climate research itself, still needs to be answered. But I won’t answer it, because I am not a climate researcher. I will say that I am taking personal responsibility for my carbon emissions, because there is no reason I need to produce as much carbon as the world has made me capable of producing. I am conserving water, I am changing my eating habits, and I am using habit and technology to throttle my electricity usage.
Check out this analysis of the most frequently cited “conspiracy” e-mails; it’s fun, and it shows what you can learn by thinking deeply about these matters and digging even a little below the surface (thanks to Randy Scalise for bringing this to my attention):