GISTEMP: Hohenpeissenberg

March 6, 2009
Previous articles gave an overview of GISTEMP and described how the baseline was established and validated. That baseline is the reference against which we compare results when testing changes to the GISTEMP software or data files.

During our initial research, we walked through the main shell scripts (do_comb_step*.sh) that control the flow of the system, one in each top-level directory, to better understand how the overall system works and which processing steps are involved. Along the way, we thought we had found a small coding error in the do_comb_step0.sh shell script. It turned out not to be an error, but the unexpected results we encountered while testing it showed that we will need to examine the algorithms that affect the data in significantly more detail than originally planned.

The bug we thought we had found concerns how data is extracted from an input file containing monthly, seasonal, and annual average temperature values from 1781 through 2003 for Hohenpeissenberg, a rural location in Germany. According to the comments, the code is meant to remove data prior to 1880 from the file and then use the data from 1880 - 2002 to replace records from the Global Historical Climatology Network (GHCN) dataset, creating a more complete temperature record for that location. The lines from do_comb_step0.sh that are relevant to Hohenpeissenberg are as follows:

echo "replacing Hohenspeissenberg data in $1 by more complete data (priv.comm.)"
echo "disregard pre-1880 data:"
tail +100 input_files/t_hohenpeissenberg_200306.txt_as_received_July17_2003 > t_hohenpeissenberg
${fortran_compile} hohp_to_v2.f -o hohp_to_v2.exe  ; hohp_to_v2.exe
${fortran_compile} cmb.hohenp.v2.f -o cmb.hohenp.v2.exe
cmb.hohenp.v2.exe

The "tail +100" command also extracts the data for 1879, so we changed it to "tail +101" to start with 1880, then ran our run_gistemp and run_compare scripts to see what effect, if any, the change would have. The results were surprising, to say the least. The modification caused a total of 41 anomaly values to change by 0.01C in six of the output data files. Only one of the differences was for the year 1880, and only two others, both for 1881, were readily explainable.

Further investigation revealed that the original command was correct after all. The comment says the intent is to remove data prior to 1880, which makes "tail +100" look wrong, but the first line of its output file is ignored. The Fortran program hohp_to_v2.f, which is compiled and executed on the next line of the do_comb_step0.sh script, reads and discards the first line of the file. The 1879 data was therefore never used, and our change had instead removed the 1880 data for the Hohenpeissenberg station.
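
The interaction between the two steps can be sketched in a few lines of Python. This is only an illustration, not the GISTEMP code: the helper names are hypothetical, and the file layout (one line ahead of the 1781 record, then one line per year) is an assumption chosen because it agrees with the line numbers described above.

```python
def tail_from(lines, start):
    """Mimic `tail +N` (spelled `tail -n +N` on current systems):
    keep everything from 1-based line N onward."""
    return lines[start - 1:]


def discard_first_line(lines):
    """Mimic a reader that reads and throws away the first line."""
    return lines[1:]


# Assumed layout: one leading line, then one line per year, 1781 through 2003.
source = ["header"] + [str(year) for year in range(1781, 2004)]

# First year that actually reaches the later processing for each tail argument.
first_year_used = {
    n: discard_first_line(tail_from(source, n))[0] for n in (100, 101)
}
print(first_year_used)  # {100: '1880', 101: '1881'}
```

Under that assumed layout, the original "tail +100" yields a first usable year of 1880, while "tail +101" silently drops 1880 as well.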

However, it is not readily apparent why removing a single record for 1880 would affect 38 temperature anomaly values in addition to the one in 1880 and the two in 1881 that are computed from December of 1880. It was certainly not expected that values in two other centuries would be affected. The complete detailed output from run_compare is as follows:

		STEP3 Global anomalies - Found 7 total differences, 0 >.01C
		Higher than baseline 5 times, lower 2 times
		(Column 13) Dec 1884    -82     -81
		(Column 12) Nov 1887    -74     -73
		(Column 10) Sep 1892    -37     -38
		(Column 16) DJF 1893    -94     -95
		(Column 03) Feb 1895    -89     -90
		(Column 16) DJF 1896    -8      -9
		(Column 15) D-N 2007    74      73

		STEP3 NH anomalies - Found 15 total differences, 0 >.01C
		Higher than baseline 3 times, lower 12 times
		(Column 06) May 1881    9       10
		(Column 15) D-N 1881    -33     -32
		(Column 16) DJF 1881    -67     -66
		(Column 18) JJA 1881    -33     -32
		(Column 07) Jun 1882    -39     -38
		(Column 13) Dec 1884    -56     -55
		(Column 18) JJA 1884    -56     -55
		(Column 02) Jan 1890    -30     -29
		(Column 06) May 1901    -11     -10
		(Column 12) Nov 1935    -28     -29
		(Column 03) Feb 1939    16      17
		(Column 03) Feb 1940    27      28
		(Column 08) Jul 1948    4       5
		(Column 03) Feb 1982    21      20
		(Column 12) Nov 2004    106     105
            
		STEP3 SH anomalies - Found 0 total differences, 0 >.01C
            
		STEP3 Zonal anomalies - Found 2 total differences, 0 >.01C
		Higher than baseline 2 times, lower 0 times
		(Column 05) -90N 1981   67      66
		(Column 09) -64N 2002   111     110
            
		STEP4_5 Global anomalies - Found 4 total differences, 0 >.01C
		Higher than baseline 1 times, lower 3 times
		(Column 13) Dec 1880    -22     -21
		(Column 15) D-N 1883    -26     -25
		(Column 08) Jul 1919    -18     -17
		(Column 10) Sep 2004    48      47

		STEP4_5 NH anomalies - Found 6 total differences, 0 >.01C
		Higher than baseline 3 times, lower 3 times
		(Column 14) J-D 1881    -23     -22
		(Column 08) Jul 1887    -3      -2
		(Column 05) Apr 1895    -26     -25
		(Column 15) D-N 2003    65      64
		(Column 14) J-D 2007    78      77
		(Column 17) MAM 2007    83      82

		STEP4_5 SH anomalies - Found 0 total differences, 0 >.01C

		STEP4_5 Zonal anomalies - Found 7 total differences, 0 >.01C
		Higher than baseline 2 times, lower 5 times
		(Column 09) -64N 1880   -40     -39
		(Column 03) NHem 1881   -23     -22
		(Column 09) -64N 1881   -39     -38
		(Column 09) -64N 1886   -35     -34
		(Column 09) -64N 1903   -24     -23
		(Column 09) -64N 1907   -63     -64
		(Column 03) NHem 2007   78      77
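
The summary lines in this output can be reproduced with a short Python sketch. It is not the run_compare script itself. It assumes the two value columns are in hundredths of a degree C and that the first is the modified run and the second the baseline, a reading inferred from the fact that it matches the "higher" and "lower" counts printed for each file. The function name `tally` is hypothetical.

```python
def tally(pairs):
    """Summarize (modified, baseline) pairs given in hundredths of a degree C."""
    diffs = [(m, b) for m, b in pairs if m != b]
    return {
        "total": len(diffs),
        "over_0.01C": sum(1 for m, b in diffs if abs(m - b) > 1),
        "higher": sum(1 for m, b in diffs if m > b),
        "lower": sum(1 for m, b in diffs if m < b),
    }


# Values copied from the STEP3 Global anomalies listing above.
step3_global = [(-82, -81), (-74, -73), (-37, -38), (-94, -95),
                (-89, -90), (-8, -9), (74, 73)]
print(tally(step3_global))
# {'total': 7, 'over_0.01C': 0, 'higher': 5, 'lower': 2}

# Differences reported per output file, in the order listed above.
section_totals = [7, 15, 0, 2, 4, 6, 0, 7]
print(sum(section_totals))  # 41
```

The per-file totals add up to the 41 changed values, spread over the six files that report any difference.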

Admittedly, the anomaly differences are very small and most likely due to rounding error. But in a basic computation of monthly, seasonal, and annual means, removing a single record for 1880 should be expected to affect rounding only in 1880 and possibly in the December to November (D-N) and December, January, February (DJF) values for 1881, both of which include December of 1880. In our results, the Northern Hemisphere land only values for D-N and DJF in 1881 and the December global land and sea value for 1880 were the only differences that are easily explainable. Had the removed record fallen within the base period of 1951 - 1980, many more values in any number of years could be expected to change, due to either rounding errors or other influences.

The fact that values more than one hundred years later are affected by the removal of a single record indicates that, even if rounding is what finally tips each value by 0.01C, something else must be carrying the change from 1880 into those later years. At this time, our best guess is that changes to the data by other parts of the software, most likely homogenization and/or urban adjustments, were influenced by one year at a rural station in Germany. Determining the exact reason will require much more detailed analysis and testing, which will be performed and documented in future articles.
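
The expectation that a plain mean would confine the effect to 1880 and to the 1881 DJF and D-N values can be checked with a small Python sketch. The station values are made up. The period definitions, and the rule that a mean is missing whenever any of its months is missing, are simplifying assumptions and not the rules GISTEMP actually applies.

```python
from statistics import mean


def period_mean(data, keys):
    """Mean over (year, month) keys, rounded to 0.01; None if any month is missing."""
    if any(k not in data for k in keys):
        return None
    return round(mean(data[k] for k in keys), 2)


def djf(data, y):
    # December of the previous year plus January and February of year y.
    return period_mean(data, [(y - 1, 12), (y, 1), (y, 2)])


def dec_nov(data, y):
    # December of the previous year through November of year y.
    return period_mean(data, [(y - 1, 12)] + [(y, m) for m in range(1, 12)])


def jan_dec(data, y):
    # Calendar year.
    return period_mean(data, [(y, m) for m in range(1, 13)])


# Made-up monthly temperatures for 1879 through 1883 (not real station data).
full = {(y, m): 5.0 + 0.1 * (y - 1879) + 0.5 * m
        for y in range(1879, 1884) for m in range(1, 13)}
without_1880 = {k: v for k, v in full.items() if k[0] != 1880}

periods = (("DJF", djf), ("D-N", dec_nov), ("J-D", jan_dec))
changed = [(name, y) for y in range(1880, 1884) for name, fn in periods
           if fn(full, y) != fn(without_1880, y)]
print(changed)
# [('DJF', 1880), ('D-N', 1880), ('J-D', 1880), ('DJF', 1881), ('D-N', 1881)]
```

In this simplified model nothing changes from 1882 onward, so any later change must come from a step that links years or stations together.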