March 6, 2009
Previous articles gave an overview of GISTEMP and described how the baseline
was established and validated. That baseline is used for comparison purposes
when testing changes to the GISTEMP software and/or data files.
During our initial research, we walked through the main shell scripts
(do_comb_step*.sh) in each top-level directory, which control the flow of the
system, and examined them in detail to better understand how the overall
system works and what processing steps are involved. While doing that, we
thought we had found a small coding error in the do_comb_step0.sh shell
script. Instead, we discovered that explaining the unexpected results we
encountered will require examining all of the algorithms that affect the data
in significantly more detail than was originally planned.
The bug we thought we had found concerned how data is extracted from an input
file containing monthly, seasonal, and annual average temperature values from
1781 through 2003 for Hohenpeissenberg, a rural location in Germany. According
to the comments, the code is intended to remove data prior to 1880 from the
file and then use the data from 1880 - 2002 to replace records from the Global
Historical Climatology Network (GHCN) dataset, creating a more complete
temperature record for that location. The lines from do_comb_step0.sh that are
relevant to Hohenpeissenberg are as follows:
echo "replacing Hohenspeissenberg data in $1 by more complete data (priv.comm.)"
echo "disregard pre-1880 data:"
tail +100 input_files/t_hohenpeissenberg_200306.txt_as_received_July17_2003 > t_hohenpeissenberg
${fortran_compile} hohp_to_v2.f -o hohp_to_v2.exe ; hohp_to_v2.exe
${fortran_compile} cmb.hohenp.v2.f -o cmb.hohenp.v2.exe
cmb.hohenp.v2.exe
The "tail +100" command also extracts the data for 1879, so we changed it to
"tail +101" to get the data starting with 1880 and then ran our run_gistemp
and run_compare scripts to see what effect, if any, the change would have. The
results were surprising, to say the least. The modification caused a total of
41 anomaly values to change by .01C across six of the output data files. Only
one of the differences was for the year 1880, and only two others, both for
1881, were readily explainable.
Further investigation revealed that the original "tail +100" command was not
actually wrong. The comment says the intent is to remove the data prior to
1880, which would make "tail +100" incorrect, but the first line of its output
file is ignored. The Fortran program hohp_to_v2.f, which is compiled and
executed on the next line of the do_comb_step0.sh script, reads and discards
the first line of the file. The 1879 data was therefore never being used, and
our change had instead removed the 1880 data for the Hohenpeissenberg station.
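The interaction between the two steps can be sketched in a few lines of shell. This is only an illustration under stated assumptions: the real input file is replaced by a synthetic one with a single header line followed by one line per year from 1781 through 2003 (a layout chosen so that line 100 holds 1879, as described above), the helper names make_sample and first_year_used are hypothetical, and "tail -n +2" stands in for the Fortran program discarding its first input line. "tail -n +N" is the POSIX spelling of the older "tail +N" form used in the script.

```shell
#!/bin/sh
# Synthetic stand-in for the input file (assumed layout, for illustration
# only): one header line, then one line per year from 1781 through 2003.
make_sample() {
    echo "header"
    year=1781
    while [ "$year" -le 2003 ]; do
        echo "$year"
        year=$((year + 1))
    done
}

# first_year_used N: the first year still present after "tail -n +N" and
# after the reader throws away one more line (modeled with "tail -n +2").
first_year_used() {
    make_sample | tail -n +"$1" | tail -n +2 | head -n 1
}

echo "tail +100 -> first year used: $(first_year_used 100)"
echo "tail +101 -> first year used: $(first_year_used 101)"
```

Under these assumptions the first call reports 1880 and the second reports 1881, which matches what we observed: the original command already started the usable data at 1880, and our change pushed the start to 1881.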
However, it is not readily apparent why removing a single record for 1880
would affect 38 temperature anomaly values in years other than 1880, beyond
the two values in 1881 that are computed using December of 1880. It was
certainly not expected that values in two other centuries would be affected.
The complete detailed output from run_compare is as follows:
STEP3 Global anomalies - Found 7 total differences, 0 >.01C
Higher than baseline 5 times, lower 2 times
(Column 13) Dec 1884 -82 -81
(Column 12) Nov 1887 -74 -73
(Column 10) Sep 1892 -37 -38
(Column 16) DJF 1893 -94 -95
(Column 03) Feb 1895 -89 -90
(Column 16) DJF 1896 -8 -9
(Column 15) D-N 2007 74 73
STEP3 NH anomalies - Found 15 total differences, 0 >.01C
Higher than baseline 3 times, lower 12 times
(Column 06) May 1881 9 10
(Column 15) D-N 1881 -33 -32
(Column 16) DJF 1881 -67 -66
(Column 18) JJA 1881 -33 -32
(Column 07) Jun 1882 -39 -38
(Column 13) Dec 1884 -56 -55
(Column 18) JJA 1884 -56 -55
(Column 02) Jan 1890 -30 -29
(Column 06) May 1901 -11 -10
(Column 12) Nov 1935 -28 -29
(Column 03) Feb 1939 16 17
(Column 03) Feb 1940 27 28
(Column 08) Jul 1948 4 5
(Column 03) Feb 1982 21 20
(Column 12) Nov 2004 106 105
STEP3 SH anomalies - Found 0 total differences, 0 >.01C
STEP3 Zonal anomalies - Found 2 total differences, 0 >.01C
Higher than baseline 2 times, lower 0 times
(Column 05) -90N 1981 67 66
(Column 09) -64N 2002 111 110
STEP4_5 Global anomalies - Found 4 total differences, 0 >.01C
Higher than baseline 1 times, lower 3 times
(Column 13) Dec 1880 -22 -21
(Column 15) D-N 1883 -26 -25
(Column 08) Jul 1919 -18 -17
(Column 10) Sep 2004 48 47
STEP4_5 NH anomalies - Found 6 total differences, 0 >.01C
Higher than baseline 3 times, lower 3 times
(Column 14) J-D 1881 -23 -22
(Column 08) Jul 1887 -3 -2
(Column 05) Apr 1895 -26 -25
(Column 15) D-N 2003 65 64
(Column 14) J-D 2007 78 77
(Column 17) MAM 2007 83 82
STEP4_5 SH anomalies - Found 0 total differences, 0 >.01C
STEP4_5 Zonal anomalies - Found 7 total differences, 0 >.01C
Higher than baseline 2 times, lower 5 times
(Column 09) -64N 1880 -40 -39
(Column 03) NHem 1881 -23 -22
(Column 09) -64N 1881 -39 -38
(Column 09) -64N 1886 -35 -34
(Column 09) -64N 1903 -24 -23
(Column 09) -64N 1907 -63 -64
(Column 03) NHem 2007 78 77
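As a quick cross-check of the totals quoted earlier, the per-file counts in the listing above can be added up mechanically. The sketch below is not part of run_compare; the function names summary_lines and tally are hypothetical, and the only input is the eight summary lines copied from the output.

```shell
#!/bin/sh
# Summary lines copied verbatim from the run_compare output above.
summary_lines() {
    cat <<'EOF'
STEP3 Global anomalies - Found 7 total differences, 0 >.01C
STEP3 NH anomalies - Found 15 total differences, 0 >.01C
STEP3 SH anomalies - Found 0 total differences, 0 >.01C
STEP3 Zonal anomalies - Found 2 total differences, 0 >.01C
STEP4_5 Global anomalies - Found 4 total differences, 0 >.01C
STEP4_5 NH anomalies - Found 6 total differences, 0 >.01C
STEP4_5 SH anomalies - Found 0 total differences, 0 >.01C
STEP4_5 Zonal anomalies - Found 7 total differences, 0 >.01C
EOF
}

# tally: read summary lines on stdin and print two numbers, the sum of the
# per-file counts and the number of files with at least one difference.
tally() {
    awk '/ - Found / {
             n = 0
             for (i = 1; i < NF; i++)
                 if ($i == "Found") { n = $(i + 1) + 0; break }
             total += n
             if (n > 0) files++
         }
         END { print total + 0, files + 0 }'
}

summary_lines | tally
```

This prints "41 6": 41 differences spread over the six files that changed, with the two Southern Hemisphere files untouched.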
Admittedly, the anomaly differences are very small and most likely due to
rounding error. But in a basic computation of monthly, seasonal, and annual
means, removing a single record for 1880 should only be expected to affect
rounding in 1880 and possibly in the December to November (D-N) and December
January February (DJF) values for 1881. In our results, the Northern
Hemisphere land-only values for D-N and DJF in 1881 and the December global
land and sea value for 1880 were the only differences that are easily
explainable.
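To make that expectation concrete, the sketch below lists which months feed each seasonal or annual mean for a given year and then asks which 1881 aggregates include December 1880. It models only the plain mean-of-months picture described above, not GISTEMP's actual algorithm, and the function names months_in and uses_month are hypothetical.

```shell
#!/bin/sh
# months_in LABEL YEAR: print the year-month pairs behind one aggregate.
# DJF and D-N for a year begin with December of the previous year.
months_in() {
    label=$1
    year=$2
    prev=$((year - 1))
    case "$label" in
        DJF) echo "$prev-12 $year-01 $year-02" ;;
        MAM) echo "$year-03 $year-04 $year-05" ;;
        JJA) echo "$year-06 $year-07 $year-08" ;;
        SON) echo "$year-09 $year-10 $year-11" ;;
        D-N) out="$prev-12"
             for m in 01 02 03 04 05 06 07 08 09 10 11; do
                 out="$out $year-$m"
             done
             echo "$out" ;;
        J-D) out=""
             for m in 01 02 03 04 05 06 07 08 09 10 11 12; do
                 out="$out $year-$m"
             done
             echo "${out# }" ;;
    esac
}

# uses_month LABEL YEAR YYYY-MM: succeed if the aggregate includes that month.
uses_month() {
    case " $(months_in "$1" "$2") " in
        *" $3 "*) return 0 ;;
        *) return 1 ;;
    esac
}

for label in DJF MAM JJA SON D-N J-D; do
    if uses_month "$label" 1881 1880-12; then
        echo "$label 1881 includes December 1880"
    fi
done
```

Only DJF 1881 and D-N 1881 are reported, which is why those two values are the ones in 1881 that a simple model can account for.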
Had the removed record fallen within the base period of 1951 - 1980, many more
values in any number of years could be expected to change, due to either
rounding errors or other influences. But the removed record is from 1880, and
values more than one hundred years later were affected. That indicates that,
even if rounding accounts for the final .01C step, something else is carrying
the change into those other years. At this time, our best guess is that
changes made to the data by other parts of the software, most likely
homogenization and/or urban adjustments, were influenced by one year at a
rural station in Germany. Determining the exact reason will require much more
detailed analysis and testing, which will be performed and documented in
future articles.