Computations

To say that the description within MBH98 of their methodology is terse is an understatement. The algorithms described here were successful in replicating the MBH98 reconstruction in the 20th century to a high degree of precision; the correlations declined in earlier centuries, but still captured most MBH98 features. I (SM) notified Prof. Mann of this in a private email and requested clarification, perhaps in the form of a less terse description of the MBH98 methodology. Prof. Mann stated that he was unable to deal with this request because of numerous other time commitments. The algorithms described here are accordingly based upon a careful consideration of the public data and considerable experimentation. 

(a) Selection and scaling of 1082 MBH Cells

MBH98 purports to establish relationships between the proxies and 16 temperature principal components calculated from the Climate Research Unit (CRU) instrumental temperature database, using a subset of 1,082 out of 2,592 cells and the 79-year period from 1902-1980 as a calibration period. 

The CRU dataset was downloaded from CRU in July 2003 and collated into an R-table of dimension 1769 ( months from 1856 Jan to 2003 May) x 72 (longitude groups W to E) x 36 (latitude groups N to S).  The Jones marker of -9999 was changed to NA. The dataset was truncated to the 1082 MBH cells, the location of which was obtained from the MBH98 FTP site. (See below for collation of MBH98 data.) Following MBH98, the dataset was then truncated to 1902-95 and then a Z-transformation (subtracting mean and dividing by standard deviation) was carried out cellwise. The scaling factor for each cell was saved as "mann.scale.tab" in c:/climate/data/mann.  Four MBH cells have no observations [sic].  A tracking vector (of length 1082) is saved as "nil.tab" in c:/climate/data/mann. For each cell, the cosine of the latitude is used as an area-weighting factor. The scaled dataset obtained above is then multiplied by this area-weighting factor to obtain a scaled latitude-weighted temperature dataset, which is saved for further use.  This dataset is not used further in this paper, but was used in calculations regarding the MBH98 calculations of temperature principal components, which will be discussed in a later paper. The script for the above is here

(b) Downloading MBH Data

MBH98 reported that they carried out a principal component analysis of the above temperature data, using "conventional" principal component analysis.  There is considerable missing data. Since "conventional" methods fail in the presence of missing data, the precise methodology used by MBH98 remains undisclosed. MBH have published a commendable amount of supplementary information at several websites ftp://eclogite.geo.umass.edu/pub/mann/MANNETAL98 also ftp://ftp.ngdc.noaa.gov/paleo/paleocean/by_contributor/mann1998  including important results of their principal components analysis:  16 temperature principal components, 16 EOFs (eigenvectors) and a table of eigenvalues.  A script for downloading, collating and saving these and other MBH98 datasets is here.

The dataset of 112 proxies used in MBH98 was emailed to me by an associate of Prof. Mann.  The text file is here. It was saved as an R-object (time series) proxy in c:/climate/data/mann/proxy.tab.

The updated and corrected dataset of 112 proxies used in MBH98 was created by editing proxy.tab on a series-by-series basis as corrections were identified. The updated and corrected dataset was saved as an R (time series) object proxy in c:/climate/data/mann/proxy4.tab. 

(c) Reconstruction

The reconstruction is done through 4 functions.  The linear algebra involved in the MBH98 methodology is discussed here.

Gcalc   (see after function RPC )  calculates a matrix G of regression coefficients given a roster of proxies (as a logical vector of length 112) and a selection of eigenvectors (as a logical vector of length 16) for the relationship between the proxies and TPCs for the calibration period 1902-1980. (All calculations are done after Z-transforming both proxies and TPCs to period 1902-1980, but this is not done in this function.)  This calculation is formally equivalent to MBH98's "least-squares solution to the overdetermined matrix equation.

RPC  for a dataset of proxies proxy calculates the reconstructed TPCs.  The periodization defined in MBH98 is parameterized in the R-object period.m and the selections of eigenvectors in each period is parameterized in the R list select of length12. The proxies are Z-transformed basis 1902-1980. For all periods prior to 1970, the roster of proxies used in the period is defined as the set of proxies available in the first year of the period.  After 1970, the roster is the year-by-year available proxies.  For each period, using the roster of proxies and selection of eigenvectors, a matrix of coefficients G is calculated using Gcalc.  Then for each year in the period, the proxies are regressed against the selection of coefficients (using the proxy weighting factors as sent by  Prof. Mann's associate) thereby yielding the "reconstructed" TPCs for the selected eigenvectors.

 The script for generating Figures 7 and 8 is here. The MBH98 NH reconstruction (nhmann) is loaded, as are the MBH98 TPCs (pc) , eigenvectors (eof) , eigenvalues (lambda), weights (weights1), proxies (proxy), together with the above functions.  Figure 7a is column 2 of nhmann; Figure 7b is column 1 of rpc;  by applying the function RPC to proxy, a matrix MBH of reconstructed TPCs is obtained; Figure 7c is column 1 of MBH;  Then the corrected and updated proxy table proxy4 is loaded; a matrix new of reconstructed TPCs is obtained by applying the function RPC to the new table proxy. Figure 7d is column 1 of new. Figure 7 is then plotted.

The NH average using the reconstructed TPCs is now calculated. The reconstructed TPCs are expanded to 1082 cells using the eigenvectors eof, eigenvalues lambda, and the selection of eigenvectors for each period in list select. The MBH98 periodization is included as data within the function.  The location of the Mann cells is loaded within the function as is the vector nil of cells with no data.  This dataset is re-scaled using the scaling factors calculated in the original Z-transform of the CRU temperature data. The average is then taken of the dataset truncated to NH cells (using information on location of Mann cells). Areal weighting was already allowed for in the transformation described above. This gives Figure 8b; Figure 8a is the same as Figure 7a above.