Replication #13: The 159 Series

Does anyone remember the 159 series said to have been used in MBH98? Mann et al. [2003] stated:

MBH98 calculated PCs of proxy sub-networks separately for each interval in their stepwise reconstruction. This is the only sensible approach, as it allows all data available over each sub-interval to be used. This requires 159 independent time series to represent all indicators required for reconstructions of all possible sub-intervals, even though the maximum number ever used for a particular sub-interval is 112. By not following this protocol, MM appear to have eliminated in the range of 100 proxy series used by MBH98 over the interval 1400-1600.

The Corrigendum SI re-states that PCs are calculated separately for each interval, but does not mention 159 series (for a reason that will become clear). Previously, as is well known, there had been no mention of 159 series in MBH98, which, with respect to tree ring PC series, had stated only:

Certain densely sampled regional dendroclimatic data sets have been represented in the network by a smaller number of leading principal components (typically 3–11 depending on the spatial extent and size of the data set). This form of representation ensures a reasonably homogeneous spatial sampling in the multiproxy network (112 indicators back to 1820).

Zorita had never heard of 159 series, only 112 [Crok, 2005], although Zorita et al. [2003] is cited on several occasions in the Corrigendum as evidence that sufficient information was available to permit replication.

Let's first examine the claim that PCs are calculated separately for each interval. The University of Virginia FTP site contained directories for only some interval-region combinations. From the UVA site and the Corrigendum SI, we compiled the following table showing which PC subdirectory was used in each calculation interval. We have been unable to discern any pattern in these selections and have simply adopted this table as a parameter of the stepwise method. Obviously this pattern is inconsistent with any methodological description provided to date.

   Table 1. PC Subdirectory Used in Calculation Step by Network

Network      1400  1450  1500  1600  1700  1730  1750+
Stahle/OK    NA    NA    NA    NA    1700  1700  1700
Stahle/SWM   1400  1450  1500  1600  1700  1700  1700
NOAMER       1400  1450  1450  1600  1600  1600  1750
SOAMER       NA    NA    NA    1600  1600  1600  1600
AUSTRAL      NA    NA    NA    1600  1600  1600  1750
Vaganov      NA    1450  1450  1600  1600  1600  1750
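To see how far Table 1 departs from "PCs calculated separately for each interval", one can list, for each network, the steps in which a subdirectory from an earlier step was re-used. The following is only a sketch: the values are hand-transcribed from Table 1, the "1750+" column is treated as step 1750, and the names `TABLE1`, `reused_steps` and `distinct_subdirectories` are ours, not anything from the MBH98 archive.

```python
# Calculation steps (columns of Table 1); "1750+" is represented as 1750.
STEPS = [1400, 1450, 1500, 1600, 1700, 1730, 1750]

# PC subdirectory used in each step, hand-transcribed from Table 1 (None = NA).
TABLE1 = {
    "Stahle/OK":  [None, None, None, None, 1700, 1700, 1700],
    "Stahle/SWM": [1400, 1450, 1500, 1600, 1700, 1700, 1700],
    "NOAMER":     [1400, 1450, 1450, 1600, 1600, 1600, 1750],
    "SOAMER":     [None, None, None, 1600, 1600, 1600, 1600],
    "AUSTRAL":    [None, None, None, 1600, 1600, 1600, 1750],
    "Vaganov":    [None, 1450, 1450, 1600, 1600, 1600, 1750],
}


def reused_steps(table, steps):
    """Map each network to the (step, subdirectory) pairs where the
    subdirectory does not match the calculation step."""
    result = {}
    for network, subdirs in table.items():
        mismatches = [
            (step, subdir)
            for step, subdir in zip(steps, subdirs)
            if subdir is not None and subdir != step
        ]
        if mismatches:
            result[network] = mismatches
    return result


def distinct_subdirectories(table):
    """Map each network to the sorted list of distinct subdirectories it uses."""
    return {
        network: sorted({s for s in subdirs if s is not None})
        for network, subdirs in table.items()
    }


reused = reused_steps(TABLE1, STEPS)
distinct = distinct_subdirectories(TABLE1)
for network in TABLE1:
    print(network, "subdirectories:", distinct[network],
          "re-used in:", reused.get(network, []))
```

Every one of the six networks re-uses an earlier subdirectory in at least one step, which is the point of the paragraph above.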

The Corrigendum SI also listed the number of PC series used in each calculation step by network. Table 2 shows our collation of this information. For reasons shown below, we were unable to deduce this structure from previous information (including the famous 159 series) and we do not believe that it is possible to deduce it. In November 2004 at realclimate, Mann published a template for applying a Preisendorfer-type criterion to retain 2 PC series in the AD1400 North American network. However, as I showed in "Was Preisendorfer's Rule N Used?", it is impossible to replicate the retentions in other calculation step/network combinations. In particular, it is impossible under a Preisendorfer method to explain the increase from 2 to 6 PCs retained from the same set of proxies between the AD1450 and AD1500 steps for the North American network (see Table 1 for the subdirectory used).

    Table 2. PC Series by Network and Calculation Step

 

Network      1400  1450  1500  1600  1700  1730  1750+
Stahle/OK       0     0     0     0     3     3     3
Stahle/SWM      1     1     2     4     7     7     9
NOAMER          2     2     6     7     7     7     9
SOAMER          0     0     0     2     2     3     3
AUSTRAL         0     0     0     3     3     3     4
Vaganov         0     1     1     2     2     2     3
Total           3     4     9    18    24    25    31
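The Total row of Table 2, and the overall count of 114 PC uses across the seven steps, can be checked with a few lines of Python. This is a sketch only: the values are hand-transcribed from Table 2, and the names `TABLE2`, `column_totals` and `total_uses` are ours.

```python
# Number of PC series used per calculation step, hand-transcribed from Table 2.
# Columns: 1400, 1450, 1500, 1600, 1700, 1730, 1750+
TABLE2 = {
    "Stahle/OK":  [0, 0, 0, 0, 3, 3, 3],
    "Stahle/SWM": [1, 1, 2, 4, 7, 7, 9],
    "NOAMER":     [2, 2, 6, 7, 7, 7, 9],
    "SOAMER":     [0, 0, 0, 2, 2, 3, 3],
    "AUSTRAL":    [0, 0, 0, 3, 3, 3, 4],
    "Vaganov":    [0, 1, 1, 2, 2, 2, 3],
}

# Sum down each column to reproduce the Total row.
column_totals = [sum(column) for column in zip(*TABLE2.values())]

# Sum across all steps to get the total number of PC "uses".
total_uses = sum(column_totals)

print("Column totals:", column_totals)
print("Total uses:", total_uses)
```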

To reconcile to 159 series, one needs to determine how many different PC series exist among the 114 uses listed above. Since the Corrigendum SI lists all the series in digital form, this exercise can be carried out directly; it yields the following compilation:

    Table 3. Number of Different PC Series by Originating Subdirectory and Network

Network      1400    1450    1500    1600    1700    1730    1750+   TOTAL
Stahle/OK    0       0       0       0       3       0       0       3
Stahle/SWM   1       1       2       4       9 (2)   0       0       17
NOAMER       2       6 (1)   0       7       0       0       9       24
SOAMER       0       0       0       3 (3)   0       0       0       3
AUSTRAL      0       0       0       3       0       0       4       7
Vaganov      0       1       0       2       0       0       3       6
Total        3       8       2       19      12      0       16      60

Notes: (1) only 2 used in the AD1450 step, 6 in the AD1500 step; (2) 7 used in the AD1700 step, 9 from the AD1750 step on; (3) 2 used in the AD1600 step, 3 from the AD1730 step on.

Combining this total of 60 different PC series with the 81 non-PC proxy series, we arrive at a total of 141 different series used in MBH98 rather than the figure of 159 series. No wonder this was impossible to reconcile.
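The reconciliation arithmetic can be laid out explicitly. Again this is only a sketch: the counts are hand-transcribed from Table 3, the figure of 81 is taken from the text, and the names `TABLE3`, `distinct_series` and `shortfall` are ours.

```python
# Distinct PC series by originating subdirectory, hand-transcribed from Table 3.
# Columns: 1400, 1450, 1500, 1600, 1700, 1730, 1750+
TABLE3 = {
    "Stahle/OK":  [0, 0, 0, 0, 3, 0, 0],
    "Stahle/SWM": [1, 1, 2, 4, 9, 0, 0],
    "NOAMER":     [2, 6, 0, 7, 0, 0, 9],
    "SOAMER":     [0, 0, 0, 3, 0, 0, 0],
    "AUSTRAL":    [0, 0, 0, 3, 0, 0, 4],
    "Vaganov":    [0, 1, 0, 2, 0, 0, 3],
}

NON_PC_SERIES = 81    # series not represented by PCs, as stated in the text
CLAIMED_TOTAL = 159   # figure given by Mann et al. [2003]

# Row totals reproduce the TOTAL column; their sum is the number of distinct PC series.
row_totals = {network: sum(counts) for network, counts in TABLE3.items()}
distinct_pcs = sum(row_totals.values())

# Distinct series overall, and the gap to the claimed figure.
distinct_series = distinct_pcs + NON_PC_SERIES
shortfall = CLAIMED_TOTAL - distinct_series

print("Distinct PC series:", distinct_pcs)
print("Distinct series in total:", distinct_series)
print("Unaccounted for relative to 159:", shortfall)
```

On these counts, 18 of the claimed 159 series cannot be accounted for.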

After the Corrigendum SI, in addition to our re-iterated request to Nature for source code, we asked Nature to provide a listing of the supposed 159 series. They said that the figure of 159 series had nothing to do with them and was not their responsibility. However, the figure remains uncorrected.

Two other very irritating discrepancies turned up in this process, in the AD1450 and AD1500 columns of the following table.

Table 4. Number of Proxies by Calculation Step

 

Step                        1400  1450  1500  1600  1700  1730  1750  1820
PCs Used                       3     4     9    18    24    25    31    31
Non-PC Proxies Available      19    21    25    39    50    55    58    81
Theoretical Total             22    25    34    57    74    80    89   112
Number in Corrigendum SI      22    25    28    57    74    80    89   112
Nature                        22    24    28    57    74    80    89   112

First, in the AD1450 step, the sum of the number of PC series (4) and the number of available non-PC proxies (21) is 25, which is the number shown in the Corrigendum SI. The number shown in Nature (24) is incorrect, but has not been corrected. It's only relevant if there is no listing of retentions (as was the case up to July 2004) and you're trying to balance discrepant information.

The second inconsistency is more bizarre. In the AD1500 step, the number of series shown as used in the Corrigendum SI (28) matches the number in Nature. However, neither number matches the sum of PCs used and available non-PC proxies (34): 6 available non-PC proxies are not used. Of these 6 proxies, 5 were used in the AD1450 step, but missed in the AD1500 step. It's hard to figure how they did this. You can also imagine the impact of this nonsense on trying to then guess how to deploy 159 series in total.
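Both discrepancies fall out mechanically once the theoretical total is recomputed for each step and compared with the two published counts. A sketch, with values hand-transcribed from Table 4 and variable names of our own choosing:

```python
# Rows of Table 4, hand-transcribed; one entry per calculation step.
STEPS            = [1400, 1450, 1500, 1600, 1700, 1730, 1750, 1820]
PCS_USED         = [3, 4, 9, 18, 24, 25, 31, 31]
NON_PC_AVAILABLE = [19, 21, 25, 39, 50, 55, 58, 81]
CORRIGENDUM_SI   = [22, 25, 28, 57, 74, 80, 89, 112]
NATURE           = [22, 24, 28, 57, 74, 80, 89, 112]

# Theoretical total = PCs used + non-PC proxies available in that step.
theoretical = [pcs + other for pcs, other in zip(PCS_USED, NON_PC_AVAILABLE)]

# Keep only the steps where either published count disagrees with the theoretical total.
discrepancies = [
    (step, total, si, nature)
    for step, total, si, nature in zip(STEPS, theoretical, CORRIGENDUM_SI, NATURE)
    if si != total or nature != total
]

for step, total, si, nature in discrepancies:
    print(f"AD{step}: theoretical {total}, Corrigendum SI {si}, Nature {nature}")
```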