March 21, 2004

To:       Nature Publishing Group

Re:             Revision of 2004-01-14277

Enclosed please find a revision of 2004-01-14277, a comment on the paper by Mann, Bradley and Hughes published in Nature in 1998 (MBH98). The revision responds to the referee comments enclosed in an email of March 9, 2004 from Rosalind Cotter. We appreciate the consideration extended to us by Nature.

Sincerely,

Stephen McIntyre

Ross McKitrick

 

March 21, 2004

To:       Nature Editors and Reviewers

From:    Stephen McIntyre and Ross McKitrick

Re:             Manuscript 2004-01-14277

Reply to Referees and MBH

Our submission analyzed in detail the effect of “key” indicators on the temperature reconstruction carried out in Mann et al. (1998) (MBH98), with particular attention to the principal component analysis of their North American tree ring network. We discuss the comments of the referees first, then the response by Professor Mann and colleagues (MBH). With respect to perhaps the most contentious issue, we categorically confirm that our early 15th century calculations included the NOAMER PC1 and PC2. We have provided the software used in our calculations in the SI (to be submitted a few days after this letter). The principal changes to the manuscript relate to analyzing the top-weighted sites in the MBH98 NOAMER PC1 and to reporting on and discussing verification statistics.

Referee #1.

1.       We agree with the unease of referee #1 in respect to “standardisation based on a small segment of the series to the whole series”, as this is exactly what is done in MBH98 and is our principal objection. In order to clarify this objection, we have adopted the referee’s phrase. We note that Mann et al. did not dispute our description of their standardization procedures. Our studies indicate that the principal impact of their procedure results from the step involving the calculation of the mean over the small segment.

2.       We agree that the results are very sensitive to the presence/absence of early data and that the selection criteria need to be carefully examined. We hope that recognition of this sensitivity will be an outcome of our paper. We note that Mann et al. do not dispute our analysis of the quality issues pertaining to the Stahle/SWM data and now assert that this series does not have a material impact (a conclusion with which we agree). Similarly, they do not dispute our observation that the Twisted Tree series does not affect the early 15th century reconstruction. We have moved some footnote comments regarding the Gaspé series to the main text and amplified these comments, in response to a specific criticism by MBH (see below).

3.       Like referee #1, we do not believe that RE statistics dispose of the matter. We believe that RE statistics are useful, in combination with other statistics (e.g. R2), but the more basic issues here concern data quality and procedures. Goodness-of-fit statistics cannot be invoked to defend use of data known to be faulty on other grounds. We have added some text to deal with this issue.

Referee #2

We agree with referee #2 that examination of software may be of interest to readers. We have included scripts to produce all figures and statistics in our SI (to follow in a few days).

1.       We categorically assert that all our early 15th century calculations include two NOAMER PCs. This is demonstrated by the software which we have included. As referee #2 points out, MBH do not address our criticism of the way the NOAMER series were constructed; instead, they carry out a “simulation” in which these indicators are deleted.

2.       This point would seem to be directed at Mann et al. For our part, it appears to be a straw man argument for Mann et al. to criticize their “simulation” of our procedure (excluding the NOAMER principal component series) while ignoring what we actually did. The shape of the early 15th century portion of the temperature index is similar in the two cases (the “simulation” MM04c, which excludes the NOAMER PCs, and MM04, which uses a conventional algorithm) because, in both cases, the Graybill sites do not dominate the NH temperature index, a point discussed in the revised draft.

3.       We have carried out calculations of RE and R2 statistics for various reconstructions and, as noted in our comments for referee #1, have added text on this issue.

4.   This comment is directed to MBH.

Reply to MBH

We respond here to what we believe to be the most significant points.

1.       MBH argued that many of the criticisms of the quality of the MBH98 database presented in MM03 and of the Stahle/SWM series in MM04, even if correct, do not affect the calculations. We have focused in this submission on “key indicators” rather than on all possible data quality issues. MBH offer no defense of the early portion of the Stahle-SWM series, but instead argue that its effect on early 15th century values is inconsequential. We agree that the Stahle/SWM PC1 does not affect early 15th century values and have noted this in our revision. (However, it was Mann et al. who argued that this was a “key” indicator, not us.) Although they apparently object to our use of the updated and corrected edition of the Twisted Tree series, they offer no substantive defense of the obsolete edition that they used.

2.       In their Appendix they objected to our focus on the Sheep Mountain series, noting that other series contributed to the NOAMER PC1. So we have broadened the point by analyzing the 16 series with weights at least 25% of that given to Sheep Mountain, the most heavily weighted series. They are:

| ID | Name | Species | Elevation (m) | Author | Graybill-Idso (1993) # | Exclusion in MBH98 Censored |
| --- | --- | --- | --- | --- | --- | --- |
| ca528 | Flower Lake | PIBA | 3291 | D.A. Graybill | 13 | TRUE |
| ca529 | Timber Gap Upper | PIBA | 3261 | D.A. Graybill | 14 | TRUE |
| ca530 | Cirque Peak | PIBA | 3505 | D.A. Graybill | 12 | TRUE |
| ca533 | Campito Mountain | PILO | 3400 | D.A. Graybill and V.C. Lamarche | 5 | TRUE |
| ca534 | Sheep Mountain | PILO | 3475 | D.A. Graybill | 11 | TRUE |
| ca555 | Yolla Bolly | PIBA | 2460 | B. Buckley | | FALSE |
| co523 | Windy Ridge | PIAR | 3570 | D.A. Graybill | 4 | TRUE |
| co524 | Almagre Mountain | PIAR | 3536 | D.A. Graybill | 1 | TRUE |
| co525 | Hermit Lake | PIAR | 3660 | D.A. Graybill | 3 | TRUE |
| co535 | Frosty Park | PIFL | 3218 | D.A. Graybill | | TRUE |
| co545 | Niwot Ridge | PIFL | 3169 | D.A. Graybill | | TRUE |
| nv510 | Charleston Peak | PILO | 3425 | D.A. Graybill | 6 | TRUE |
| nv511 | Mount Jefferson | PIFL | 3300 | D.A. Graybill | 7 | TRUE |
| nv512 | Pearl Peak | PILO | 3170 | D.A. Graybill | 9 | TRUE |
| nv513 | Mount Washington | PILO | 3415 | D.A. Graybill | 8 | TRUE |
| nv514 | Spruce Mountain | PILO | 3110 | D.A. Graybill | | TRUE |

It immediately stands out that 15 of the 16 sites are high-altitude sites due to Donald Graybill. We identified a presentation of 12 of these sites in Graybill and Idso (1993), which was also discussed in Mann et al. (1999). Graybill and Idso specifically stated that the 20th century growth at these sites was not accounted for by local or regional temperature and hypothesized that these trees (selected for cambial dieback) contained signals of direct 20th century CO2 fertilization. Indeed, Mann et al. (1999) explicitly acknowledged this and introduced a nonlinear transformation as an adjustment for CO2 fertilization effects. We also analyzed the “censored” version of the NOAMER PC calculations at the MBH98 FTP site to determine which sites were excluded (finding out in the process that the PC1 was virtually identical to our own calculations). We found that 19 of the 20 sites so “censored” were Graybill sites and that all of the above 16 sites were so censored. We believe that this analysis sheds a great deal of light on the critical NOAMER PC1 and also on how much significance can be placed on RE and other verification statistics, and have added a discussion of this effect.

3.       Since the overall fit was improved by using data known to be unsuitable as temperature proxies (selected by a flawed calculation algorithm), any improvement in reconstructive skill (as exemplified by an RE statistic) is spurious. We have also reported on the R2 statistic (which is low) on the basis that a good model should meet several statistical tests, not merely an RE test.

4.       MBH attempt to rebut our argument about the inappropriateness of their principal component methodology (short-segment standardization - again borrowing a phrase from referee #1) by showing that recalculation of their index with all proxies uniformly weighted gives similar results in the early 15th century to MBH98. In this case, the 20 Graybill-Idso proxies go from being dominant members in the NOAMER PCs, which account for 2 out of 22 proxies in the MBH98 AD1400 roster, to being 20 out of 95 proxies in a re-constituted AD1400 roster, and thus still drive the behaviour of the AD1400 model. In any event, even if similar results could be obtained by a different method, that has no bearing on the validity of the methods used in MBH98. Introduction of a new methodology would require full presentation and justification of the new methodology.

5.       Even now, MBH may feel they had good reason to use short-segment standardization (by pre-scaling their proxies to the 1901-1980 mean and standard deviation, and then to detrended standard deviations, and so forth) in order to emphasize the contribution of the Graybill-Idso series, although they do not say so in their response. However, it is more significant that they did not state and defend what they were doing in MBH98; neither did they alert their readers that their unconventional principal component methods were so crucial to their final results or that their results depended so heavily on a highly unusual set of tree ring sites with cambial dieback. Had they said all this in their original write-up, their original referees would undoubtedly have asked that they qualify their conclusions accordingly and show the effects of these methodological decisions. It may even have influenced the publication decision. It is certainly information that readers ought to have been given at the time.

6.       MBH also state that our analysis of the effect of the extrapolation of the early portion of the Gaspé series is a mere “technicality”. This is hardly correct. Our interest in this series was drawn by the fact that the series as used did not coincide with the archived version at WDCP and that it was the only one of the over 400 series used by MBH98 in which there was an extrapolation of the early part of the series. It was hard to avoid noticing that this extrapolation permitted insertion of this series into the AD1400 roster, and, once there, it strongly affected the early 15th century values (which otherwise seemed to want to go up). When we carried out before and after calculations, we found quite a dramatic impact from this one series, which seems to us inconsistent with robustness requirements. Further examination showed that the early portion of the series consisted of only one tree; that the 20th century portion was highly non-linear; and that the site was located in a forested area nowhere near the northern treeline (of which it is supposed to be a member). We have added a brief discussion of this proxy, in response to MBH’s criticism on this point, to explain the substantive reasons for excluding it from the AD1400 roster. While we have provided such a justification, it is our strong belief that there is a reverse onus for justification: it is up to MBH to disclose and justify the unique extrapolation and the inclusion of this series in the AD1400 roster, which would require them to address and resolve the substantive issues relating to this proxy.

Changes to the manuscript

We have added some text to the introduction and conclusion to emphasize that our critique centers on undisclosed, questionable mathematical procedures for principal component calculation that inappropriately weight the least suitable proxies. We have responded to Professor Mann’s criticism that sites other than Sheep Mountain contribute to their NOAMER PC1 by discussing the full list of high altitude sites that make up the dominating subset of the NOAMER data. We are assisted in this analysis by the fact that the MBH98 FTP site actually contains an unreported principal component calculation with certain series “censored”; this PC1 turns out to correspond very closely with the PC1 we calculated without short-segment standardization. We explain, as briefly as possible, why these influential censored series are problematic as temperature proxies, not only in the view of the originating scientists but in the view of MBH themselves one year later (Mann et al. (1999)).
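The effect of short-segment standardization on a first principal component can be illustrated with a minimal sketch. It is not the MBH98 code or the software in the SI: the data are synthetic white noise, the function name `pc1_loadings` and all parameter values are hypothetical, and only the centering and scaling step is modelled (the detrended rescaling is omitted). One of 20 series is given an artificial shift in its last 80 values, and PC1 loadings are compared when each series is scaled to its full-period mean and standard deviation versus those of the last 80 values only.

```python
import numpy as np

def pc1_loadings(data, reference_rows):
    """Scale each column by the mean/std of `reference_rows`, then return PC1 loadings.

    Hypothetical helper for illustration only; no further centering is applied
    before the decomposition.
    """
    ref = data[reference_rows]
    scaled = (data - ref.mean(axis=0)) / ref.std(axis=0, ddof=1)
    # Rows of vt are the loading vectors, ordered by singular value.
    _, _, vt = np.linalg.svd(scaled, full_matrices=False)
    return vt[0]

rng = np.random.default_rng(0)
n_years, n_series, segment = 581, 20, 80   # assumed sizes, e.g. 1400-1980 and 1901-1980
proxies = rng.standard_normal((n_years, n_series))
proxies[-segment:, 0] += 2.0               # synthetic series with a late-period shift

full = pc1_loadings(proxies, slice(None))             # full-period standardization
short = pc1_loadings(proxies, slice(-segment, None))  # short-segment standardization

full_weight = abs(full[0])
short_weight = abs(short[0])
print(f"PC1 weight on shifted series, full period:   {full_weight:.2f}")
print(f"PC1 weight on shifted series, short segment: {short_weight:.2f}")
```

Under short-segment scaling, the shifted series sits far from zero over most of its length, so it carries much more of the sum of squares than the other series and takes most of the PC1 weight. Under full-period scaling all 20 series have equal variance and no series is singled out in this way.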

We include RE statistics, as requested. It is important in this context to note that the largest improvement in RE fit comes from using the flawed NOAMER PC1 and from the pre-1450 Gaspé series, both of which are problematic for the reasons discussed in our paper. We also report on R2 statistics in the verification period, which are rather low. Obviously, a good model has to meet a number of statistical tests, not merely an RE test.
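A short sketch may help show why an RE test alone is not sufficient. It assumes the usual definitions: RE is one minus the ratio of the squared verification errors to the squared departures of the observations from the calibration-period mean, and R2 is the squared Pearson correlation over the verification period. The numbers are synthetic, chosen only for illustration, and are not taken from MBH98 or from the manuscript; the function names are hypothetical.

```python
import numpy as np

def reduction_of_error(observed, predicted, calibration_mean):
    """RE = 1 - SSE / sum of squared departures from the calibration-period mean."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    sse = np.sum((observed - predicted) ** 2)
    ss_ref = np.sum((observed - calibration_mean) ** 2)
    return 1.0 - sse / ss_ref

def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    r = np.corrcoef(observed, predicted)[0, 1]
    return r ** 2

# Synthetic verification period whose mean lies well below the calibration mean.
calibration_mean = 0.0
observed = [-0.5, -0.3, -0.4, -0.6, -0.2]
predicted = [-0.4, -0.5, -0.3, -0.4, -0.4]  # right mean level, wrong year-to-year pattern

re_value = reduction_of_error(observed, predicted, calibration_mean)
r2_value = r_squared(observed, predicted)
print(f"RE = {re_value:.2f}, R2 = {r2_value:.2f}")
```

In this constructed case RE is about 0.84 while R2 is 0.05: the prediction is rewarded for matching the shift in mean level between the calibration and verification periods even though it tracks almost none of the year-to-year variation.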

We wish to express our thanks to the referees and editors of Nature, and to Professors Mann, Bradley and Hughes, for considering our work.

Sincerely,

Stephen McIntyre

Ross McKitrick