AUDIT TRAIL
The audit trail for verification of MBH98 errors is set up here in 3 formats:
1. Errors and defects which can be verified through inspection of the MBH98 dataset
2. Updates which can be verified through comparison of MBH98 and NGDC data
3. Errors in proxy principal component calculation, which require re-collation of NGDC data and comparison of explained variance.
1. Errors which can be verified through inspection
This is the proxy dataset (1 MB) as received. This can be opened in an Excel sheet. The following extracts show collation errors and fills. The files are all tab-separated and will be tidied.
a) Series 72-80, row 1980. All Texas-Mexico principal components have the same 1980 value. Data
b) Series 81 to 83, row 1980. All Vaganov principal components have the same 1980 value. Data
c) Series 84 and 90-92, row 1980. Four ITRDB US principal components have the same 1980 value. Data
d) series 73, year 1499 start Data
e) series 74-75, year 1599 start Data
f) series 76-80, year 1699 start Data
g) series 81, year 1449 start Data
h) series 82, year 1599 start Data
i) series 83, year 1749 start Data
j) series 86-89, year 1499 start Data
k) series 90, year 1599 start Data
l) series 91-92, year 1749 start Data
m) series 3, year 1907-1909 fills Data
n) series 3, year 1953-1964 fills (3 fills in 1962-64 overlaying source, see below) Data
o) series 6, year 1980 fill Data
p) series 45, year 1979-1982 fills Data
q) series 46, year 1975-1980 fills Data
r) series 50, year 1962-1982 copied from series 49 values in adjacent column Data
s) series 51, year 1977-1980 fill Data
t) series 52, year 1974-1980 fill Data
u) series 53, year 1400-1404 fill Data
v) series 54, year 1975-1980 fill Data
w) series 55, year 1979-1980 fill Data
x) series 56, 1975-1980 fill Data
y) series 58, 1977-1980 fill Data
z) series 93-99, 1976-1980 fill Data
aa) series 102, 1975-1980 missing data Data
ab) series 103, 1975-1980 missing data Data
ac) series 104, 1974-1980 missing data Data
ad) series 106, 1972-1980 missing data Data
ae) series 112, 1973-1980 missing data Data
af) series 10, 1980 missing data Data
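The inspection checks listed above can also be done programmatically. The following is a minimal sketch in Python (the scripts referred to on this page are in R), assuming that a fill shows up as a run of identical consecutive values and that missing data is represented as None. The function names and the sample values are hypothetical and are not taken from the MBH98 dataset.

```python
def constant_runs(values, min_len=2):
    """Return (start_index, end_index, value) for every run of identical
    consecutive non-missing values that is at least min_len long."""
    runs = []
    start = 0
    for i in range(1, len(values) + 1):
        # A run ends at the end of the series or when the value changes.
        if i == len(values) or values[i] != values[start]:
            if values[start] is not None and i - start >= min_len:
                runs.append((start, i - 1, values[start]))
            start = i
    return runs


def first_year(years, values):
    """Return the first year with a non-missing value, or None."""
    for year, value in zip(years, values):
        if value is not None:
            return year
    return None


def all_same(row):
    """True if every series has the same value in a given year (row)."""
    return len(set(row)) == 1


# Hypothetical series: made-up values, with the last value repeated to 1980.
years = list(range(1970, 1981))
series = [0.31, 0.12, -0.40, 0.22, 0.05, 0.18, 0.18, 0.18, 0.18, 0.18, 0.18]

for start, end, value in constant_runs(series, min_len=3):
    print(years[start], years[end], value)
```

Note that a run found this way includes the last genuine value if the fill repeats it, so the first year of a reported run is not necessarily the first filled year.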
2. Truncations and updates which can be verified at NGDC
The audit trail here is set up as a series of short scripts in R, which read the corresponding data and output a result (usually a correlation). The scripts are available here as hyperlinks in the series-by-series annotation. Users of R (after loading the MBH98 proxy table using the commands below) can simply copy a script into R, and the correlation or other index should result. Tweaks for users of Matlab should be apparent to such users.
The FTP sources of the various MBH98 series have been located by trial and error, as MBH98 provides no such disclosure. All FTP sources here (except the Central England series from the Hadley Centre) are from NGDC, which maintains an excellent collection. Bruce Bauer, the manager of the paleo program, has been unfailingly co-operative with even the most minute inquiry. While enough FTP sources have been located to make this section of interest, many others have not. Indeed, it is a pet peeve of mine that some of the more vociferous advocates of aggressive public policy have failed to archive their data with NGDC. Some examples of this are here.
The steps taken with each series were to calculate correlations, to examine the start and finish for truncations or additions, and to plot the series. The scripts hyperlinked below have been condensed to show the material point referred to.
load("c:/climate/data/mann/proxy.tab")
load("c:/climate/data/mann/prname.tab")
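The steps described above (a correlation over the common period, plus a comparison of start and end years) can be sketched as follows. This is a Python illustration rather than one of the R scripts, and it assumes each series is held as a mapping from year to value; the function names and the sample numbers are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def compare(mbh, ngdc):
    """Compare two {year: value} series over their common years."""
    common = sorted(set(mbh) & set(ngdc))
    r = pearson([mbh[y] for y in common], [ngdc[y] for y in common])
    return {
        "r": r,
        "n_common": len(common),
        "mbh_span": (min(mbh), max(mbh)),     # first and last year in MBH98
        "ngdc_span": (min(ngdc), max(ngdc)),  # first and last year at NGDC
    }


# Made-up archive series covering 1970-1980.
ngdc = {1970 + i: v for i, v in enumerate(
    [1.0, 1.4, 0.9, 1.7, 1.2, 1.5, 1.1, 0.8, 1.3, 1.6, 1.0])}
# Made-up "MBH" version: truncated, rescaled and reversed in sign.
mbh = {y: -2.0 * ngdc[y] + 0.5 for y in range(1972, 1979)}

result = compare(mbh, ngdc)
print(result)
```

Because the Pearson correlation is unchanged in magnitude by any linear rescaling, a correlation near 1 or -1 identifies a linearly transformed copy of a source series, while differing spans point to truncation or extension.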
a) Use of summer data in series 10 (Central England). This is shown first through a correlation of >0.99 with the JJA series and only 0.62 with annual data, and secondly through direct inspection of the 3 series taken together. Graph Script Data URL
b) Truncation of data from 1659 to 1730 in series 10 (Central England). This is shown through examination of the two series together. The cold temperatures so deleted are shown by plotting the series together. Graph Script Data URL
c) High MBH98 data in the 1980s, and especially 1987, in series 10 (Central England). This is shown through direct inspection. Graph Script Data URL
d) Use of summer data in series 11 (Central Europe). Graph Script Data URL
e) Truncation of data from 1525 to 1550 in series 11 (Central Europe). This is shown through examination of the two series together. The high early temperatures so deleted are shown by plotting the series together. Graph Script Data URL
The NGDC identifications of MBH98 series 51-61 (Jacoby northern treeline series) are not shown in MBH98. These identifications are straightforward, as shown here. I have downloaded all the NGDC data (decadal format) and converted it to R time series for easier data handling. I've tried to annotate below to show the main issues without requiring this overhead.
f) MBH98 data for series 51 (Four Twelve AK) has a correlation of 0.86 with NGDC. Comparison of end values shows that NGDC continues to 1990, as compared to the MBH end in 1976 (with plugs to 1980). Plotting shows that MBH98 has pervasive and increasing over-statement in 20th century values and peaks in the 1920s. Graph Script Data URL
g) MBH98 data for series 52 (Fort Chimo PQ) has a correlation of 0.93 with NGDC. Comparison of end values shows that NGDC continues to 1990, as compared to the MBH end in 1976 (with plugs to 1980). Plotting shows that MBH98 has pervasive and increasing over-statement in 20th century values. Series peaks in 1960s.
Graph Script Data URL
h) MBH98 data for series 54 (Arrigetch AK) has a correlation of 0.96 with NGDC. Comparison of end values shows that NGDC continues to 1990, as compared to the MBH end in 1976 (with plugs to 1980). Plot shows series peak in early 1980s with downturn to series end in 1990. Graph Script Data URL
i) MBH98 data for series 55 (Sheenjek River AK) has a correlation of 0.70 with NGDC. Comparison of end values shows that both NGDC and unplugged MBH98 end in 1979. Comparison of start values (and plot) shows NGDC starts much earlier. Considerable overstatement of values in MBH98 in the 1940s and in the 18th century. Graph Script Data URL
j) MBH98 data for series 56 (Twisted Tree, Heartrot Hill (TTHH), Canada) has a correlation of 0.699 with NGDC. Comparison of end values shows that NGDC continues to 1990, while unplugged MBH98 ends in 1976. Comparison of start values (and plot) shows NGDC starts much earlier. NGDC values peak in the 1960s and reduce sharply thereafter. Increasing MBH overstatement in the 20th century. Graph Script Data URL
j) MBH98 data for series 58 (Coppermine River, Canada) has a correlation of 0.99 with NGDC. MBH plug three years (1978-1980), but otherwise the coverage period is the same. Values are nearly identical at the beginning, but there are pervasive changes later in the series. Graph Script Data URL
k) MBH98 data for series 1 (Burdekin River, Australia coral fluorescence) has a correlation of 0.42 with the NGDC series. Lough (pers. comm., Oct. 2003) confirms validity of the NGDC series over earlier data. Plot shows visual coherence, but considerable shifting. Graph Script Data URL
l) MBH98 data for series 2 (Great Barrier Reef) is coral calcification, not coral thickness (Lough, pers. comm., Oct. 2003). There is a correlation of 0.99 between series 2 and the average calcification at NGDC of the following 5 corals for the period 1615-1982: Abraham Reef, Britomart Reef, Havannah Island, Lodestone Reef and Sanctuary Reef.
The MBH data seems to be Z-transformed, although the basis of the Z-transform is not clear. Graph Script Data URL-Havannah Island
m) MBH98 data for series 3 (Urvina Bay, Galapagos coral δO18) has a correlation of -0.9992951 with the NGDC series, which is reversed in sign during transformation. MBH overwrite actual data in 1962-64 and fill for 1907-1909 and 1953-61 as noted above. The missing data results from a splice between two corals, which are spliced by adjusting the readings of the second coral. Graph Script Data URL
m) MBH98 data for series 6 (Vanuatu coral δO18) has a correlation of 0.93 with the NGDC series. MBH have one plugged year in 1980. Graph Script Data URL
n) MBH98 data for series 7 (New Caledonia δO18) has a correlation of 0.618 with NGDC data. Graph Script Data URL
o) MBH98 data for series 8 (Secas, Panama δO18) has a correlation of 0.983 with the NGDC annualized series (annual data calculated from NGDC 10 per year data). Graph Script Data URL
p) MBH98 data for series 9 (Secas, Panama δC13) has a correlation of 0.991 with the NGDC annualized series (annual data calculated from NGDC 10 per year data). Graph Script Data URL
q) MBH98 data for series 21, grid-box 42.5N, 92.5W has a correlation of 0.889 with JB92 Minnesota (adjacent grid box) annual data. There are many differences in the plotted series. Graph Script Data URL
r) MBH98 data for series 23, grid-box 47.5N, 7.5E has a correlation of 0.81 with JB92 Geneva annual data, which has identical start date (1753) and location. There are many differences in the plotted series, including a notable downspike in the MBH data in the early 19th century not present in JB92 data. Graph Script Data URL
s) MBH series 26, grid-box 52.5N, 17.5E has no counterpart location in JB92 Table 13.1.
t) MBH98 data for series 27, grid-box 57.5N, 17.5E has a correlation of >0.99 with JB92 Stockholm annual data, which has identical start date (1756) and location. The MBH series is linearly transformed from the JB92 series.
Graph Script Data URL
u) MBH98 data for series 28, grid-box 57.5N, 37.5E has a correlation of 0.96 with JB92 Leningrad annual data, which has identical start date (1752) and location. The MBH series is transformed from the JB92 series. Graph Script Data URL
v) MBH series 29, grid-box 62.5N, 7.5E has no counterpart location in JB92 Table 13.1.
w) MBH98 data for series 30, grid-box 62.5N, 12.5E has a correlation of 0.998 with JB92 Trondheim annual data, which has identical start date (1761) and location. The MBH series is transformed from the JB92 series. Graph Script Data URL
x) JB92 series for Central England, Berlin, Sverdlovsk and Toronto (all digitally available at NGDC) are compared to MBH series 21-31 and no correlations are found to permit identification.
y) MBH98 data for series 35, grid-box precipitation 42.5N, 2.5E has a correlation of 0.95 with JB92 Marseilles (43.3N, 5.4E) annual data, which has identical start date (1749) and is one grid-box to the east. Both the JB92 series at NGDC and the MBH series are transformed, but the transformations are different. Graph Script Data URL
z) MBH98 data for series 37, precipitation 42.5N, 72.5W has a correlation of 0.92 with JB92 Paris annual data, which has identical start date (1770). Both the JB92 series at NGDC and the MBH series are transformed, but the transformations are different. Graph Script Data URL
aa) JB92 series
ab) MBH98 data for series 43, Tasmania T-reconstruction has a correlation of 0.82 with the updated NGDC series. Plot shows visual coherence, but considerable shifting. Graph Script Data URL
ac) MBH98 data for series 65, Tarvagatny Pass, Mongolia has a correlation of 0.94 with the updated NGDC series. MBH data shows increasing over-estimate in the 20th century. Graph Script Data URL
ad) MBH98 data for series 105, INDI008X is an incorrect label for NGDC indi002x. Correlation is 0.83. Graph Script Data URL
ae) MBH98 data for series 112, SWED002B is NGDC swed002. Correlation is 0.977.
Graph Script Data URL
Series which were successfully located in digital form in the MBH form are noted here; comments on digitally unavailable series are here.
3. Tree Ring Principal Components
Five separate principal component regions were identified within the MBH98 database: Texas-Oklahoma (#69-71), Texas-Mexico (#72-80), ITRDB US (#84-92), South America (#93-95) and Australia-(New Zealand) (#96-99). The sites for each region are identified at MBH Supplementary Information. No sites for Texas-Oklahoma or Texas-Mexico were given there, but NGDC identifications for the sites in these regions were easily located and are listed here. All sites in the other three regions were located at NGDC, except, immaterially, one of 232 US sites (AR045). Digital site lists are as follows: Texas-Oklahoma, Texas-Mexico, ITRDB US, South America and Australia-(New Zealand), as well as for MBH99 ITRDB US.
The site chronology (*.crn) data from NGDC was collated into time series for each region, truncating the US data at 1400. Digital collations are as follows: Texas-Oklahoma, Texas-Mexico, ITRDB US, South America and Australia-(New Zealand). A collation is also done for the MBH99 ITRDB US data.
Conventional principal component calculations, which MBH98 claim to use, require that there be no missing data. There is little relationship between the periods in which MBH principal components are calculated and the period during which all selected sites in the region are available, as shown here.
Tree ring data is conventionally standardized to a mean of 1000 (with no negative values). Although it is not disclosed by MBH, they carry out a Z-transformation on the collated data. This is established both by the range of values and by a very close replication of the MBH99 principal components. MBH99 PC1 MBH99 PC2 MBH99 PC3. Accordingly, prior to carrying out a principal component calculation, the collated data is scaled.
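As an illustration of the scaling step, the sketch below Z-transforms one site chronology to zero mean and unit standard deviation. It is written in Python rather than R, and it assumes the sample standard deviation (divisor n-1, the convention used by R's scale function); the basis actually used by MBH is not disclosed. The function name and the chronology values are hypothetical.

```python
import math


def z_transform(column):
    """Scale one series to mean 0 and sample standard deviation 1.

    Assumption: divisor n-1, matching R's scale(); MBH's basis is unknown.
    """
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / sd for v in column]


# Made-up chronology values centred near the conventional mean of 1000.
chronology = [1000, 1100, 900, 1050, 950]
z = z_transform(chronology)
print([round(v, 3) for v in z])
```

Applied column by column to a regional collation, this puts every site on a common scale before the principal component calculation, so that no site dominates simply because of its variance.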
A principal components analysis is carried out for each region, and the same number of principal components is retained as in the MBH98 collection. The explained variance is calculated. MBH do not disclose the eigenvectors corresponding to their principal components. Given the MBH98 PCs, the eigenvectors which maximize explained variance are therefore calculated, and the explained variance is then computed from the MBH PCs and these calculated eigenvectors. The script to carry out the calculations in this section is here. Graphics comparing the MBH and recalculated PCs are here.
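The explained-variance calculation for a given set of PCs can be sketched as a least-squares projection: for fixed PC series, the loadings that maximize explained variance are the least-squares coefficients, and the explained fraction is the share of the data's sum of squares lying in the span of the PCs. The Python sketch below is an illustration under that assumption and is not the R script referred to above; the function names and data are hypothetical, and the data columns are assumed to be already centred and scaled.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def orthonormalize(pcs):
    """Gram-Schmidt: an orthonormal basis for the span of the PC series."""
    basis = []
    for pc in pcs:
        v = list(pc)
        for q in basis:
            c = dot(q, v)
            v = [vi - c * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(dot(v, v))
        if norm > 1e-12:  # skip PCs that add nothing to the span
            basis.append([vi / norm for vi in v])
    return basis


def explained_variance(columns, pcs):
    """Fraction of total sum of squares of the (centred) data columns
    captured by the best least-squares fit on the given PC series."""
    basis = orthonormalize(pcs)
    total = sum(dot(c, c) for c in columns)
    explained = sum(dot(q, c) ** 2 for c in columns for q in basis)
    return explained / total


# Made-up centred data: three site series over four years.
columns = [[1, -1, 1, -1], [2, -2, 2, -2], [1, 1, -1, -1]]
pc1 = [1, -1, 1, -1]
pc2 = [1, 1, -1, -1]

print(explained_variance(columns, [pc1]))
print(explained_variance(columns, [pc1, pc2]))
```

Comparing this fraction for the archived PCs with the fraction obtained from PCs recalculated on the collated data gives a direct measure of how well each set of PCs represents the regional network.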
A summary of explained variance is here. A summary of correlations is here.