March 5, 2009
This is the second in a series of articles about GISTEMP, a software system created by the NASA Goddard Institute for Space Studies (GISS) that is used to produce the GISS Surface Temperature Analysis. In the previous article, we gave you an overview of what to expect if you want to install the package on your system and get it operational. This article explains how we verified that our installation was functioning properly before moving on to our real objective: evaluating how changes to some of the assumptions, decisions, judgement calls, parameters, and functionality built into the software affect the temperature analysis. Those modifications and their effects will be the topics of future articles.
After spending a considerable amount of time gathering the input data files, getting the system running, and walking through each step to understand what it was doing and how, we finally did a full end-to-end run, executing all of the steps sequentially, and produced a set of output files. A successful run leaves a set of four files in the STEP3/results directory that represent the land-only temperature anomalies for the globe, the Northern Hemisphere, the Southern Hemisphere, and specific latitudinal zones. It also leaves another set of four files in the STEP4_5/results directory that represent the land plus ocean temperature anomalies for the same areas. Other files are produced as well, but they are not relevant for the purpose of this article.
To verify that the system was functioning as intended, we compared our output to the output available on the GISS web site (the four land-only and four land plus ocean text files are near the bottom of the page). We ran the "diff" command on each of our land-only files against the corresponding GISS file and inspected every line identified as different by hand to see which values did not match. Fewer than 4% of the monthly and annual temperature anomaly values generated showed any variation, and every variation was a minor one of .01C. In the majority of cases our value was .01C lower than the GISS value, although there were several places where our value was .01C higher. Performing the same check on the land plus ocean files produced significantly more output, too much to inspect by hand as we had done for the land-only files.
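Inspecting diff output by hand does not scale to the land plus ocean files. The following is a minimal Python sketch of a value-by-value comparison, not the authors' compare_output script. It assumes each data row begins with a four-digit year followed by whitespace-separated integer anomalies in units of .01C, with asterisk tokens such as "****" marking missing values; the names parse_anomalies and compare_values are hypothetical.

```python
import re

# Assumed row layout: a four-digit year, then whitespace-separated integer
# anomalies in hundredths of a degree C. Tokens such as "****" mark missing
# values. Lines that do not start with a year (headers, notes) are skipped.
ROW = re.compile(r"^\s*(\d{4})\s+(.*)$")


def parse_anomalies(lines):
    """Map (year, column_index) -> anomaly in hundredths of a degree C."""
    values = {}
    for line in lines:
        match = ROW.match(line)
        if not match:
            continue
        year = int(match.group(1))
        for col, token in enumerate(match.group(2).split()):
            # Missing-value tokens keep their column position but are not stored.
            if token.lstrip("-").isdigit():
                values[(year, col)] = int(token)
    return values


def compare_values(ours, baseline):
    """Return (year, column, ours - baseline) for every value that differs."""
    return [
        (year, col, ours[(year, col)] - baseline[(year, col)])
        for (year, col) in sorted(ours.keys() & baseline.keys())
        if ours[(year, col)] != baseline[(year, col)]
    ]
```

For example, compare_values(parse_anomalies(open(ours_path)), parse_anomalies(open(giss_path))) lists each mismatch, where ours_path and giss_path are placeholders for the two files being compared. What a column index means depends on the layout of the particular file.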
The original temperature anomaly files from the GISS web site that were used for the comparison with our output were saved as part of the baseline and are available for review:
To eliminate the tedium and automate the process, we developed some scripts. The first script simply does a full run of GISTEMP. The information displayed on the screen during the initial run was captured and stored as part of the baseline. Each of the eight temperature anomaly files produced is also stored in our baseline to be used for comparison during future testing. Those files are available for review:
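A script that performs a full run and captures the screen output could look roughly like the following Python sketch. The article does not list the commands used to run each GISTEMP step, so the command list passed to the hypothetical run_and_log function is a placeholder to be filled in for a particular installation.

```python
import subprocess
import sys


def run_and_log(commands, log_path):
    """Run each command in order, echoing its output and saving it to log_path.

    commands: list of argument lists, one per step. The actual GISTEMP step
    commands are installation-specific and are NOT given here.
    Returns 0 on success, or the exit status of the first failing command.
    """
    with open(log_path, "w") as log:
        for cmd in commands:
            # Merge stderr into stdout so error messages are kept in the log
            # alongside normal output.
            result = subprocess.run(
                cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
            )
            record = f"$ {' '.join(cmd)}\n{result.stdout}"
            sys.stdout.write(record)
            log.write(record)
            if result.returncode != 0:
                return result.returncode  # stop at the first failing step
    return 0
```

Capturing each step's output before echoing it keeps the sketch simple, at the cost of seeing a step's messages only after that step finishes.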
The next script we wrote, compare_output, compares two temperature anomaly files and prints out details about each difference detected. Another script, high_low_count, reads the discrepancies identified by compare_output and counts how often our values are higher and lower than the GISS values. The last script we wrote, run_compare, invokes compare_output and high_low_count for each of the eight files we are interested in to produce a summary or detail report of the differences between our output and the corresponding files from GISS. The detail report was captured and stored as part of the baseline. The summary output from run_compare, using the GISS files as the baseline for comparison, is as follows:
STEP3 Global anomalies - Found 69 total differences, 0 >.01C
    Higher than baseline 56 times, lower 13 times
STEP3 NH anomalies - Found 142 total differences, 0 >.01C
    Higher than baseline 116 times, lower 26 times
STEP3 SH anomalies - Found 33 total differences, 0 >.01C
    Higher than baseline 12 times, lower 21 times
STEP3 Zonal anomalies - Found 72 total differences, 0 >.01C
    Higher than baseline 45 times, lower 27 times
STEP4_5 Global anomalies - Found 510 total differences, 28 >.01C
    Higher than baseline 179 times, lower 331 times
    Big differences higher 14 times, lower 14 times
STEP4_5 NH anomalies - Found 755 total differences, 126 >.01C
    Higher than baseline 330 times, lower 425 times
    Big differences higher 53 times, lower 73 times
STEP4_5 SH anomalies - Found 342 total differences, 38 >.01C
    Higher than baseline 91 times, lower 251 times
    Big differences higher 6 times, lower 32 times
STEP4_5 Zonal anomalies - Found 396 total differences, 49 >.01C
    Higher than baseline 164 times, lower 232 times
    Big differences higher 26 times, lower 23 times
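The summary lines above can be reproduced from a list of per-value differences. The sketch below is a hypothetical stand-in for the high_low_count and run_compare logic, not the authors' code. It assumes each difference is expressed as our value minus the baseline value in units of .01C, so "higher" means our value exceeds the baseline value.

```python
def summarize(label, diffs):
    """Build run_compare-style summary lines for one file.

    diffs: (ours - baseline) differences in hundredths of a degree C,
    one entry per value that did not match.
    """
    higher = sum(1 for d in diffs if d > 0)
    lower = sum(1 for d in diffs if d < 0)
    big = [d for d in diffs if abs(d) > 1]  # off by more than .01C
    lines = [
        f"{label} - Found {len(diffs)} total differences, {len(big)} >.01C",
        f"Higher than baseline {higher} times, lower {lower} times",
    ]
    if big:
        # Only files with differences larger than .01C get this third line.
        big_higher = sum(1 for d in big if d > 0)
        lines.append(
            f"Big differences higher {big_higher} times, "
            f"lower {len(big) - big_higher} times"
        )
    return lines
```

Keeping the counting separate from the comparison mirrors the split between high_low_count and compare_output described above.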
There are a total of 17,504 temperature anomaly values in the eight files. The comparison identifies 2,319 of them (13.25% of the total) as different between our files and the GISS files. Of those differences, 2,078 (89.6% of the values that did not match) were off by .01C and 241 (1.38% of all temperature anomaly values) were off by .02C or more. The land-only files were the closest match, with just 316 values that differed from the temperature anomalies computed by GISS, each by only .01C. The land plus ocean files contained 2,003 values that differed from those computed by GISS, including all 241 values that varied by .02C or more. A further analysis of the land plus ocean discrepancies found 187 values off by .02C, 30 off by .03C, 15 off by .04C, 5 off by .05C, 1 off by .06C, 1 off by .07C, and 2 off by .08C.
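The percentages quoted above follow directly from the per-file counts in the run_compare summary. A short Python check of that arithmetic, using only those counts and the stated total of 17,504 values:

```python
# Counts taken from the run_compare summary output above.
land_only = {"Global": 69, "NH": 142, "SH": 33, "Zonal": 72}
land_ocean = {"Global": 510, "NH": 755, "SH": 342, "Zonal": 396}
big_land_ocean = {"Global": 28, "NH": 126, "SH": 38, "Zonal": 49}
# Land plus ocean discrepancies by size, in hundredths of a degree C.
by_size = {2: 187, 3: 30, 4: 15, 5: 5, 6: 1, 7: 1, 8: 2}
TOTAL_VALUES = 17504

total_diffs = sum(land_only.values()) + sum(land_ocean.values())
big = sum(big_land_ocean.values())
off_by_one = total_diffs - big

# The size breakdown must account for every difference of .02C or more.
assert sum(by_size.values()) == big

# prints: differences: 2319 (13.25% of all values)
print(f"differences: {total_diffs} ({100 * total_diffs / TOTAL_VALUES:.2f}% of all values)")
# prints: off by .01C: 2078 (89.6% of differences)
print(f"off by .01C: {off_by_one} ({100 * off_by_one / total_diffs:.1f}% of differences)")
# prints: off by .02C or more: 241 (1.38% of all values)
print(f"off by .02C or more: {big} ({100 * big / TOTAL_VALUES:.2f}% of all values)")
```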
We determined that all of the discrepancies could be attributed to rounding errors, most likely caused by differences between the floating point hardware in the system we used and the system used by GISS. It is also possible that GISS used a slightly different version of the SBBX.HadR2 input data file than the one we used, which may have caused some of the larger discrepancies. In any event, we were satisfied that our installation was operating properly and that we had established a suitable baseline environment for further research into GISTEMP. The original output files from the GISS web site are archived in our baseline directory for historical purposes, along with the original input data files used. All future research will use the same input data files, and the output files produced during our initial run, as described and made available above, will serve as the baseline for comparison after modifications are made to the software or other files.
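The rounding explanation can be illustrated without GISTEMP itself. The Python sketch below shows only the general mechanism and does not reproduce GISTEMP's arithmetic: the same computation carried out in a different order can give a slightly different binary result, and a value sitting near a half-hundredth boundary can round to results .01 apart depending on tiny representation differences. The helper to_hundredths is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_hundredths(value):
    """Round the exact value held by a float (or Decimal) to 0.01, half up."""
    return Decimal(value).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Floating point addition is not associative: the same three numbers summed
# in a different order produce different binary results.
left_to_right = (0.1 + 0.2) + 0.3   # 0.6000000000000001
right_to_left = 0.1 + (0.2 + 0.3)   # 0.6
print(left_to_right == right_to_left)  # False

# 2.675 cannot be stored exactly in binary; the nearest double is slightly
# below 2.675, so it rounds down, while the exact decimal value rounds up.
print(to_hundredths(2.675))             # 2.67
print(to_hundredths(Decimal("2.675")))  # 2.68
```

One-unit differences in the last reported digit are the signature of this kind of effect, although the example by itself does not prove that floating point hardware caused the discrepancies observed here.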
Update
After we published our article on base period selection and wrote the scripts to extract the data and import it into spreadsheets for charting, we received a request to produce charts displaying the differences between our baseline and the original GISS data. The following global charts graphically portray the difference between our baseline and the files originally downloaded from the GISS web site. As the global land-only chart shows, the differences are barely blips. The global land and sea chart likewise demonstrates that the differences between our baseline and the GISS data are merely inconsequential blips. An Excel file containing the data and all of the charts was created and incorporated into our baseline.
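The difference series behind charts like these is a per-year subtraction. A minimal Python sketch that produces CSV text a spreadsheet such as Excel can import, assuming annual anomalies have already been read into dictionaries keyed by year in units of .01C (the function name difference_csv and its inputs are hypothetical, not part of the authors' scripts):

```python
import csv
import io


def difference_csv(ours, giss):
    """Return CSV text with year, our value, GISS value, and ours - GISS.

    ours, giss: dicts mapping year -> annual anomaly in hundredths of a
    degree C (assumed input format). Years missing from either side are skipped.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Year", "Ours", "GISS", "Difference (C)"])
    for year in sorted(ours.keys() & giss.keys()):
        diff = (ours[year] - giss[year]) / 100  # convert to degrees C
        writer.writerow([year, ours[year] / 100, giss[year] / 100, f"{diff:.2f}"])
    return buf.getvalue()
```

Formatting the difference with two decimals keeps one-hundredth steps visible as-is in the spreadsheet instead of showing binary floating point noise.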