Computations
To say that the
description within MBH98 of their methodology is terse is an understatement. The
algorithms described here were successful in replicating the MBH98
reconstruction in the 20th century to a high degree of precision; the
correlations declined in earlier centuries, but still captured most MBH98
features. I (SM) notified Prof. Mann of this in a private email and requested
clarification, perhaps in the form of a less terse description of the MBH98
methodology. Prof. Mann stated that he was unable to deal with this request
because of numerous other time commitments. The algorithms described here are
accordingly based upon a careful consideration of the public data and
considerable experimentation. (a) Selection
and scaling of 1082 MBH Cells MBH98 purports
to establish relationships between the proxies and 16 temperature principal
components calculated from the Climate Research Unit (CRU) instrumental
temperature database, using a subset of 1,082 out of 2,592 cells and the 79-year
period from 1902-1980 as a calibration period. The CRU dataset
was downloaded from CRU in July 2003 and collated into an R-table of dimension
1769 ( months from 1856 Jan to 2003 May) x 72 (longitude groups W to E) x 36
(latitude groups N to S). The Jones marker of -9999 was changed to NA. The
dataset was truncated to the 1082 MBH cells, the location of which was obtained
from the MBH98 FTP site. (See below for collation of MBH98 data.) Following
MBH98, the dataset was then truncated to 1902-95 and then a Z-transformation
(subtracting mean and dividing by standard deviation) was carried out cellwise.
The scaling factor for each cell was saved as "mann.scale.tab" in
c:/climate/data/mann. Four MBH cells have no observations [sic]. A
tracking vector (of length 1082) is saved as "nil.tab" in
c:/climate/data/mann. For each cell, the cosine of the latitude is used as an
area-weighting factor. The scaled dataset obtained above is then multiplied by
this area-weighting factor to obtain a scaled latitude-weighted temperature
dataset, which is saved for further use. This dataset is not used further
in this paper, but was used in calculations regarding the MBH98 calculations of
temperature principal components, which will be discussed in a later paper. The
script for the above is here. (b)
Downloading MBH Data MBH98 reported
that they carried out a principal component analysis of the above temperature
data, using "conventional" principal component analysis. There
is considerable missing data. Since "conventional" methods fail in the
presence of missing data, the precise methodology used by MBH98 remains
undisclosed. MBH have published a commendable amount of supplementary
information at several websites ftp://eclogite.geo.umass.edu/pub/mann/MANNETAL98
also ftp://ftp.ngdc.noaa.gov/paleo/paleocean/by_contributor/mann1998
including important results of their principal components analysis: 16
temperature principal components, 16 EOFs (eigenvectors) and a table of
eigenvalues. A script for downloading, collating and saving these and
other MBH98 datasets is here. The dataset of
112 proxies used in MBH98 was emailed to me by an associate of Prof. Mann.
The text file is here. It was saved as an
R-object (time series) proxy in c:/climate/data/mann/proxy.tab. The updated and
corrected dataset of 112 proxies used in MBH98 was created by editing proxy.tab
on a series-by-series basis as corrections were identified. The updated and
corrected dataset was saved as an R (time series) object proxy in
c:/climate/data/mann/proxy4.tab. (c)
Reconstruction The
reconstruction is done through 4 functions. The linear algebra involved in
the MBH98 methodology is discussed here. Gcalc
(see
after function RPC ) calculates a matrix G of regression
coefficients given a roster of proxies (as a logical vector of length
112) and a selection of eigenvectors (as a logical vector of length 16)
for the relationship between the proxies and TPCs for the calibration period
1902-1980. (All calculations are done after Z-transforming both proxies and TPCs
to period 1902-1980, but this is not done in this function.) This
calculation is formally equivalent to MBH98's "least-squares solution to
the overdetermined matrix equation. RPC
for a dataset of
proxies proxy calculates the reconstructed TPCs. The periodization
defined in MBH98 is parameterized in the R-object period.m and the
selections of eigenvectors in each period is parameterized in the R list select
of length12. The proxies are Z-transformed basis 1902-1980. For all periods
prior to 1970, the roster of proxies used in the period is defined as the set of
proxies available in the first year of the period. After 1970, the roster
is the year-by-year available proxies. For each period, using the roster
of proxies and selection of eigenvectors, a matrix of coefficients G is
calculated using Gcalc. Then for each year in the period, the
proxies are regressed against the selection of coefficients (using the proxy
weighting factors as sent by Prof. Mann's associate) thereby yielding the
"reconstructed" TPCs for the selected eigenvectors. The script
for generating Figures 7 and 8 is here. The
MBH98 NH reconstruction (nhmann) is loaded, as are the MBH98 TPCs (pc)
, eigenvectors (eof) , eigenvalues (lambda), weights (weights1),
proxies (proxy), together with the above functions. Figure 7a is
column 2 of nhmann; Figure 7b is column 1 of rpc; by
applying the function RPC to proxy, a matrix MBH of
reconstructed TPCs is obtained; Figure 7c is column 1 of MBH; Then
the corrected and updated proxy table proxy4 is loaded; a matrix new
of reconstructed TPCs is obtained by applying the function RPC to the new
table proxy. Figure 7d is column 1 of new. Figure 7 is then
plotted. The NH average
using the reconstructed TPCs is now calculated. The reconstructed TPCs are
expanded to 1082 cells using the eigenvectors eof, eigenvalues lambda,
and the selection of eigenvectors for each period in list select. The
MBH98 periodization is included as data within the function. The location
of the Mann cells is loaded within the function as is the vector nil of
cells with no data. This dataset is re-scaled using the scaling factors
calculated in the original Z-transform of the CRU temperature data. The average
is then taken of the dataset truncated to NH cells (using information on
location of Mann cells). Areal weighting was already allowed for in the
transformation described above. This gives Figure 8b; Figure 8a is the same as
Figure 7a above. |