Application of Preisendorfer Criteria to MBH98 Tree Ring Networks

Was Preisendorfer's Rule N used in MBH98 tree ring PC calculations?

Mann et al. have recently argued (http://www.realclimate.org/index.php?p=8) that they can salvage MBH98-type results using correct PC calculations under "the standard selection rule (Preisendorfer's Rule N) used by MBH98". They say that this method permits them to retain 5 PCs in the North American network. Since the bristlecones are in the PC4, this expanded roster still permits the bristlecones to imprint the NH temperature reconstruction. We have discussed elsewhere many issues regarding the robustness and statistical significance of this calculation. Here I consider the narrow issue of whether this method was actually "used by MBH98" for tree ring networks. I have been able to closely replicate the diagram published at realclimate.org on Nov. 22, 2004, said to be an example of the selection method used in MBH98. I have tested the 19 network/calculation step combinations used in MBH98 and, in 18 of 19 cases, the selections from the Preisendorfer-type calculation are inconsistent with the selections reported in the Corrigendum SI. In some cases, the calculated results are higher; in others, lower. In three calculations, different selections are taken from the same network in different calculation steps - a result inconsistent with the stated policy. We remain puzzled why Mann et al. continue to refuse to provide source code for MBH98 calculations and why climate scientists do not expect them to do so.

Statements in MBH98

First, there is no mention in MBH98 or the MBH98 SI that Preisendorfer's Rule N was used to determine the number of retained PC series for tree ring networks. The only pertinent reference in MBH98 was as follows:

Certain densely sampled regional dendroclimatic data sets have been represented in the network by a smaller number of leading principal components (typically 3–11 depending on the spatial extent and size of the data set). This form of representation ensures a reasonably homogeneous spatial sampling in the multiproxy network (112 indicators back to 1820). [our bolds]

This statement contains no reference to the use of Preisendorfer's Rule N.

In connection with the calculation of temperature principal component series, a different calculation, MBH98 does refer to the use of Preisendorfer's Rule N as follows:

a conventional Principal Component Analysis (PCA) is performed... An objective criterion was used to determine the particular set of eigenvectors which should be used in the calibration as follows. Preisendorfer’s selection rule ‘rule N’ was applied to the multiproxy network to determine the approximate number N_eofs of significant independent climate patterns that are resolved by the network, taking into account the spatial correlation within the multiproxy data set.

Before trying to interpret these two statements from a text analytic point of view, I will make four quick points about rules for deciding the number of PCs to retain:

The briefest survey of PC literature will show that there are many approaches to selecting the number of PC series to retain, and Preisendorfer's Rule N is far from being a "standard selection rule".
In fact, Urban, in a presentation about PCs cited by Mann at realclimate on Jan. 6, 2005, stated that the choice was subjective, as follows:

It should be noted that because the goal of PCA is essentially utilitarian, the choice of how many axes to retain is ultimately subjective. In practice, either 2 or 3 axes are retained, simply because it is difficult to project more than this onto a printed page.
Overland and Preisendorfer [1982] themselves argued that passing Rule N was only a necessary condition for significance; they did not argue that it was sufficient.
The real test for retaining a PC series is not whether it is significant under Preisendorfer's Rule N (or some other such rule), but whether it is scientifically significant. For example, Franklin et al. [1995] stated:

In the final analysis, the retained components must make good scientific sense (Frane & Hill 1976; Legendre & Legendre 1983; Pielou 1984; Zwick & Velicer 1986; Ludwig & Reynolds 1988; Palmer 1993).

Now, from a text analytic perspective, a reasonable reader might conclude that the difference in description of the PC retention policy in the two cases - tree rings and temperatures - pointed to the use of different procedures in the two calculations. In fact, the form of PC calculation in the two calculations differed: we have determined that the temperature PC calculations were centered calculations, while, as we've pointed out in our recent articles (and earlier), the tree ring PC calculations were not conventional centered calculations. Mann et al. have recently (Jan. 6, 2005) acknowledged that they did not use a "standard centered method" so their use of an uncentered method is no longer in dispute.
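The practical effect of the centering choice can be illustrated with a small synthetic sketch. It assumes, purely for illustration, that the non-centered convention subtracts the mean of a late sub-period instead of the full-period mean; this section does not spell out that detail. The function name `variance_shares`, the parameter `center_rows`, the network size and the size of the late shift are all hypothetical, and no scaling step is modeled.

```python
import numpy as np

def variance_shares(data, center_rows=None):
    """Share of the total sum of squares carried by each PC.

    center_rows=None subtracts the full-period column means (conventional PCA).
    Otherwise only the mean over `center_rows` is subtracted, an assumed
    stand-in for a non-centered convention (hypothetical parameter).
    """
    ref = data if center_rows is None else data[center_rows]
    shifted = data - ref.mean(axis=0)
    s = np.linalg.svd(shifted, compute_uv=False)
    return s**2 / np.sum(s**2)

# Synthetic network: white noise, with three series given a late upward shift.
rng = np.random.default_rng(0)
n_years, n_series, late = 500, 20, 80
network = rng.standard_normal((n_years, n_series))
network[-late:, :3] += 2.0

full = variance_shares(network)                                    # full-period centering
short = variance_shares(network, center_rows=slice(-late, None))   # late sub-period centering

print("PC1 share, full-period centering:", round(float(full[0]), 3))
print("PC1 share, sub-period centering: ", round(float(short[0]), 3))
```

In this toy setup the sub-period centering loads much more of the total onto the first PC, because the shifted series sit far from zero over most of the record. The sketch says nothing about the actual MBH98 networks.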

Applying Preisendorfer's Rule N to Tree Ring Networks

The real test for whether Preisendorfer's Rule N was used in MBH98 is whether the actual number of selected PCs can be replicated using this method.

The actual retentions for each calculation step/network combination were not provided in MBH98, its SI or at Mann's FTP site. The first complete listing of actual retentions came in the Corrigendum SI (July 2004). Even the Corrigendum SI contains no summarized listing: the following table was collated from the Corrigendum SI and shows the number of retained PCs by calculation step-network combination. (It was impossible to deduce this table earlier, given the additional disinformation in Mann et al. [2003] that 159 distinct series were used, when only 139 distinct series were actually used. Any such deduction attempts were further blocked by erroneous listings of the number of series used in the AD1450 step and the erroneous non-use of 6 available series in the AD1500 step. These do not affect early 15th century results, but frustrate attempts at replication.)

	1400	1450	1500	1600	1700	1730	1750	1760	1780	1800	1820
Stahle/OK	0	0	0	0	3	3	3	3	3	3	3
Stahle/SWM	1	1	2	4	7	7	9	9	9	9	9
NOAMER	2	2	6	7	7	7	9	9	9	9	9
SOAMER	0	0	0	2	2	2	3	3	3	3	3
AUSTRAL	0	0	0	3	3	3	4	4	4	4	4
Vaganov	0	1	1	2	2	2	3	3	3	3	3
PC series	3	4	9	18	24	24	31	31	31	31	31
Direct proxy	19	21	19*	39	50	55	58	62	66	71	81
Total series	22	25*	28*	57	74	79	89	93	97	102	112

Table 1. Proxy series used in MBH98 (collated from Corrigendum SI, July 2004), showing the number of retained PC series by network-calculation step combination. *: The total number of series used in the AD1450 step is incorrectly stated in MBH98 as 24 (an error that has not yet been reported). Six series available in the AD1500 network are not used.
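The arithmetic in Table 1 can be checked with a short script. The values are transcribed from the table; the variable names are illustrative. The last step counts a new combination each time a network's nonzero retention changes, which is only one possible way of counting combinations and is an assumption of this sketch.

```python
# Retained PC counts per network and calculation step, transcribed from Table 1.
steps = [1400, 1450, 1500, 1600, 1700, 1730, 1750, 1760, 1780, 1800, 1820]
retained = {
    "Stahle/OK":  [0, 0, 0, 0, 3, 3, 3, 3, 3, 3, 3],
    "Stahle/SWM": [1, 1, 2, 4, 7, 7, 9, 9, 9, 9, 9],
    "NOAMER":     [2, 2, 6, 7, 7, 7, 9, 9, 9, 9, 9],
    "SOAMER":     [0, 0, 0, 2, 2, 2, 3, 3, 3, 3, 3],
    "AUSTRAL":    [0, 0, 0, 3, 3, 3, 4, 4, 4, 4, 4],
    "Vaganov":    [0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3],
}
direct_proxy = [19, 21, 19, 39, 50, 55, 58, 62, 66, 71, 81]
pc_series_row = [3, 4, 9, 18, 24, 24, 31, 31, 31, 31, 31]
total_row = [22, 25, 28, 57, 74, 79, 89, 93, 97, 102, 112]

# Column sums of retained PCs, and totals including direct proxies.
pc_sums = [sum(column) for column in zip(*retained.values())]
totals = [pcs + direct for pcs, direct in zip(pc_sums, direct_proxy)]

def distinct_retentions(counts):
    """Successive distinct nonzero retention counts for one network."""
    runs = []
    for count in counts:
        if count > 0 and (not runs or runs[-1] != count):
            runs.append(count)
    return runs

n_combinations = sum(len(distinct_retentions(c)) for c in retained.values())

print("PC series row matches:", pc_sums == pc_series_row)
print("Total series row matches:", totals == total_row)
print("Distinct nonzero retentions:", n_combinations)
```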

The first hints that a Preisendorfer-type policy had supposedly been used in MBH98 came in our Nature correspondence. In response to our observation of the error in their PC methods, Mann et al. [Revised Nature Reply] had noticed that, under correct PC calculations, the bristlecone pine pattern was demoted from the PC1 to the PC4.

precisely the same 'hockey stick' PC pattern appears using their convention, albeit lower down in the eigenvalue spectrum (PC#4) (Figure 1a). If the correct 5 PC indicators are used, rather than incorrectly truncating at 2 PCs (as MM04 have done), a reconstruction similar to MBH98 is obtained.

They argued that they could still salvage a hockey-stick shaped series using a Preisendorfer-type calculation on the AD1400 North American network. The calculation published on Nov. 22, 2004 at realclimate showing the implementation of a Preisendorfer-type calculation on the AD1400 North American network was originally submitted in our Nature correspondence. We had seen this diagram and calculation in August 2004 and had fully considered it in our GRL submission - in fact, it contributed to the approach taken in our GRL submission, which differs substantially from our previous Nature submission.

Realclimate Nov 2004 Figure 1

Figure 1 below, http://www.realclimate.org/index.php?p=9 (Nov. 22, 2004), and the two tables are all taken from realclimate, illustrating the application of the supposed Preisendorfer-type calculation. (The original section from Preisendorfer is re-typed here for reference.) The blue and red lines show the simulation results (using AR1 models of the AD1400 North American network) under MBH98 and centered PC calculations respectively; the blue and red points show actual results from the MBH98 and centered PC methods respectively. Preisendorfer's Rule N selects PC series as long as the actual eigenvalue exceeds the simulation. For the MBH98 method, 3 eigenvalues are clearly separated under Rule N and perhaps 7 in a centered calculation. This result is strangely described by Mann et al. as follows:

"In the former case, 2 (or perhaps 3) eigenvalues are distinct from the noise eigenvalue continuum. In the latter case, 5 (or perhaps 6) eigenvalues are distinct from the noise eigenvalue continuum."

It seems obvious that the selection of 2 (rather than 3) eigenvalues in MBH98 cannot be directly justified on this diagram without appeal to some still unstated method.
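The selection logic described above (retain PCs for as long as the observed eigenvalue exceeds the simulated red-noise eigenvalue) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the MBH98 or realclimate code: the function names (`ar1_noise`, `eigen_fractions`, `lag1`, `rule_n`), the 95% quantile, the number of simulations, the use of standardized fully centered data and the synthetic test network are all assumptions introduced here.

```python
import numpy as np

def ar1_noise(rng, n_years, phi):
    """Independent AR(1) series, one column per entry of phi."""
    phi = np.asarray(phi, dtype=float)
    x = np.zeros((n_years, phi.size))
    x[0] = rng.standard_normal(phi.size) / np.sqrt(1.0 - phi**2)  # stationary start
    for t in range(1, n_years):
        x[t] = phi * x[t - 1] + rng.standard_normal(phi.size)
    return x

def eigen_fractions(data):
    """Fraction of variance per PC for column-centered, unit-variance data."""
    z = (data - data.mean(axis=0)) / data.std(axis=0)
    s = np.linalg.svd(z, compute_uv=False)
    return s**2 / np.sum(s**2)

def lag1(data):
    """Lag-one autocorrelation of each column."""
    z = data - data.mean(axis=0)
    return np.sum(z[1:] * z[:-1], axis=0) / np.sum(z**2, axis=0)

def rule_n(data, n_sim=200, quantile=0.95, seed=0):
    """Count leading PCs whose variance fraction exceeds the noise quantile."""
    rng = np.random.default_rng(seed)
    observed = eigen_fractions(data)
    phi = np.clip(lag1(data), -0.99, 0.99)
    sims = np.array([eigen_fractions(ar1_noise(rng, data.shape[0], phi))
                     for _ in range(n_sim)])
    threshold = np.quantile(sims, quantile, axis=0)
    exceeds = observed > threshold
    # Stop at the first PC that fails the test.
    n_keep = len(exceeds) if exceeds.all() else int(np.argmin(exceeds))
    return n_keep, observed, threshold

# Synthetic network (hypothetical): two strong common patterns plus white noise.
rng = np.random.default_rng(1)
n_years, n_series = 300, 10
common = rng.standard_normal((n_years, 2))
loadings = np.vstack([np.full(n_series, 2.0),
                      np.r_[np.full(5, 1.5), np.full(5, -1.5)]])
network = common @ loadings + rng.standard_normal((n_years, n_series))

n_keep, observed, threshold = rule_n(network, n_sim=100)
print("PCs retained under this sketch of Rule N:", n_keep)
```

Stopping at the first failing eigenvalue is one reading of "as long as"; a variant that retains every eigenvalue above its threshold regardless of rank would need only a change to the last step of `rule_n`.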

Eigen #	% Variance	Cum % Variance
1	0.3818	0.3818
2	0.0976	0.4795
3	0.0491	0.5286
4	0.0354	0.564
Table 1, from realclimate - MBH98 method. Bold - retained PCs

Eigen #	% Variance	Cum % Variance
1	0.1946	0.1946
2	0.0905	0.2851
3	0.0783	0.3634
4	0.0663	0.4297
5	0.0549	0.4846
6	0.0373	0.5219
Table 2, from realclimate - centered "MM" method. bold - retained PCs
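As a quick consistency check on the two realclimate tables, the cumulative column can be recomputed from the per-eigenvalue column. The figures are transcribed from the tables above; the variable and function names are illustrative, and the tolerance only allows for rounding in the fourth decimal place.

```python
import numpy as np

# Realclimate Table 1 (MBH98 method): variance fraction and cumulative fraction.
mbh98_var = np.array([0.3818, 0.0976, 0.0491, 0.0354])
mbh98_cum = np.array([0.3818, 0.4795, 0.5286, 0.564])

# Realclimate Table 2 (centered "MM" method).
centered_var = np.array([0.1946, 0.0905, 0.0783, 0.0663, 0.0549, 0.0373])
centered_cum = np.array([0.1946, 0.2851, 0.3634, 0.4297, 0.4846, 0.5219])

def max_cum_gap(var, cum):
    """Largest absolute gap between the recomputed and listed cumulative values."""
    return float(np.max(np.abs(np.cumsum(var) - cum)))

print("Table 1 max gap:", max_cum_gap(mbh98_var, mbh98_cum))
print("Table 2 max gap:", max_cum_gap(centered_var, centered_cum))
```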

FIGURE 1. "Comparison of eigenvalue spectrum resulting from a Principal Components Analysis (PCA) of the 70 North American ITRDB data used by Mann et al (1998) back to AD 1400 based on Mann et al (1998) centering/normalization convention (blue circles) and MM centering/normalization convention (red crosses). Shown also is the null distribution based on Monte Carlo simulations with 70 independent red noise series of the same length and same lag-one autocorrelation structure as the actual ITRDB data using the respective centering and normalization conventions (blue curve for MBH98 convention, red curve for MM convention). In the former case, 2 (or perhaps 3) eigenvalues are distinct from the noise eigenvalue continuum. In the latter case, 5 (or perhaps 6) eigenvalues are distinct from the noise eigenvalue continuum." Original legend from: Mann, http://www.realclimate.org/index.php?p=9

Replication of Realclimate Nov 22, 2004 Figure 1

Figure 2 below shows my replication of the above calculations. The left panel repeats realclimate Nov 22, 2004 Figure 1 (as above), while the right panel shows my emulation, using the script here. The salient features of the methods are obviously captured.

FIGURE 2. AD1400 North American network - Preisendorfer-type calculations. Left panel: Mann et al. [realclimate]. Points - NOAMER network; lines - simulations. Blue - MBH98 decentered; red - centered. Right panel: Emulation of calculation in left panel.

Testing Other Network/Timestep Combinations

MBH98 has 6 networks with erratic changes of PC retention by timestep, yielding a total of 17 network/timestep combinations, all of which are examined below.

Stahle/OK

MBH98 only used one network/directory combination here, retaining three PCs. The observed retention is inconsistent with Rule N. The PC3 is insignificant under Rule N, but is retained anyway. There is little difference between MBH98 (blue) and centered (red) results - presumably because Stahle pre-whitens site chronologies.