Shifting Standards: Experiments in Particle Physics in the Twentieth Century
About this ebook
Franklin develops a framework for his analysis, viewing each example according to exclusion and selection of data; possible experimenter bias; details of the experimental apparatus; size of the data set, apparatus, and number of authors; rates of data taking along with analysis and reduction; distinction between ideal and actual experiments; historical accounts of previous experiments; and personal comments and style.
From Millikan's tabletop oil-drop experiment to the Compact Muon Solenoid apparatus measuring approximately 4,000 cubic meters (not including accelerators) and employing over 2,000 authors, Franklin's study follows the decade-by-decade evolution of scale and standards in particle physics experimentation. As he shows, where once there were only one or two collaborators, now it literally takes a village. Similar changes are seen in data collection: in 1909 Millikan's data set comprised 175 oil drops, of which he used 23 to determine the value of e, the charge of the electron; in contrast, the 1988-1992 E791 experiment using the Collider Detector at Fermilab, investigating the hadroproduction of charm quarks, recorded 20 billion events. As we also see, data collection took a quantum leap in the 1950s with the use of computers. Events are now recorded at rates of a few hundred per second, and analysis rates have progressed similarly.
Employing his epistemology of experimentation, Franklin deconstructs each example to view the arguments offered and the correctness of the results. Overall, he finds that despite the metamorphosis of the process, the role of experimentation has remained remarkably consistent through the years: to test theories and provide a factual basis for scientific knowledge, to encourage new theories, and to reveal new phenomena.
SHIFTING STANDARDS
Experiments in Particle Physics in the Twentieth Century
ALLAN FRANKLIN
UNIVERSITY OF PITTSBURGH PRESS
Published by the University of Pittsburgh Press, Pittsburgh, Pa., 15260
Copyright © 2013, University of Pittsburgh Press
All rights reserved
Manufactured in the United States of America
Printed on acid-free paper
10 9 8 7 6 5 4 3 2 1
Cataloging-in-Publication data is available from the Library of Congress
ISBN 10: 0-8229-4430-8
ISBN 13: 978-0-8229-4430-0
ISBN-13: 978-0-8229-7919-7 (electronic)
Contents
Preface
Prologue: The Rise of the Sigmas
Introduction
Chapter 1. Some Measurements of the Temperature Variation in the Electrical Resistance of a Sample of Copper
Chapter 2. Do Falling Bodies Move South?
Chapter 3. The Isolation of an Ion, a Precision Measurement of Its Charge, and the Correction of Stokes’s Law
Chapter 4. Directed Quanta of Scattered X-rays
Chapter 5. A Determination of e/m for an Electron by a New Deflection Method
Chapter 6. An Uncertain Interlude
Chapter 7. Electron Polarization
Chapter 8. Mean Lifetime of V-Particles and Heavy Mesons
Chapter 9. Detection of the Free Antineutrino
Chapter 10. Measurement of the Ke2+ Branching Ratio
Chapter 11. Determination of Kl3 Form Factors from Measurements of Decay Correlations and Muon Polarizations
Chapter 12. Bad Data: An Interlude
Chapter 13. Measurement of the Antineutron-Proton Cross Section at Low Energy
Chapter 14. New Measurements of Properties of the Ω− Hyperon
Chapter 15. The Coherent Scattering of Neutrinos
Chapter 16. Search for Neutral Weakly Interacting Massive Particles in the Fermilab Tevatron Wideband Neutrino Beam
Chapter 17. Measurement of the B+ Total Cross Section and B+ Differential Cross Section dσ/dpT in pp̄ Collisions at √s = 1.8 TeV
Chapter 18. B Meson Decays to Charmless Meson Pairs Containing η or η’ Mesons
Chapter 19. The Case of the Disappearing Sigmas
Conclusion
Notes
References
Index
Preface
In his one-paragraph short story "On Exactitude in Science," the great Argentinian writer Jorge Luis Borges presents a literary forgery, where the following quotation is fictitiously attributed to Suárez Miranda: ". . . In that Empire, the Art of Cartography attained such Perfection that the map of a single Province occupied the entirety of a City, and the map of the Empire, the entirety of a Province. In time, those Unconscionable Maps no longer satisfied, and the Cartographers Guilds struck a Map of the Empire whose size was that of the Empire, and which coincided point for point with it" (Borges 1971, 123). A writer who wishes to discuss experiment, even if restricted only to physics, would be faced with a similar problem. Experimental practice is so varied that in order to discuss it fully one might very well have to summarize all experimental papers. I suspect that there are no typical experiments that could be used as exemplars. In this book I will discuss the changes in experimental practice in the twentieth century, restricting myself to experiments in particle physics. There are several reasons for this restriction. First, it makes the discussion tractable. Second, some of the changes discussed, particularly those of scale, are best illustrated in particle physics. Third, particle physics is the field of physics I know best. I strongly believe that knowledge of the science is essential for historical and philosophical study of any science. Although I believe that some of the changes discussed are valid for other fields of experimental physics, and possibly for other scientific disciplines, I do not have sufficient knowledge of those fields to make any useful comments.
The reader will notice, particularly in the prologue, that numerous elementary particles, such as the π, Λ, μ, ω, and ρ, are mentioned in the text. None of the discussion demands knowledge of these particles and their properties. The symbols should be read as the names of characters in the studies. They do, however, serve to show that the practices discussed are used in a wide variety of experiments in particle physics. In addition, including the particle names is required for historical accuracy.
Similarly, various mathematical and statistical techniques are mentioned. Some, like the standard deviation (sigma) and χ², are discussed briefly in the text and in more detail in the notes. Others, such as boosted decision trees, neural network discriminants, and matrix element discriminants, are merely mentioned. These are advanced techniques and certainly exceed my own knowledge. The important point is that several of these methods are used in a single experiment and their agreement argues for the robustness and correctness of the results. In addition, these results are checked against actual data, providing more confidence in the results.
This project would not have started without a question, combined with later valuable discussions, from Harry Collins, who, to use his felicitous expression, is my "onetime bitter academic enemy and now valued colleague." Harry asked me about the changing standards for discovery in high-energy physics, which led to the study documented in the prologue. The discussions, along with Harry’s fine book Gravity’s Ghost, were essential in this work and also helped in my investigation of other aspects of experiment that have changed with time.
This book could not have been completed without the assistance of my colleagues in the experimental high-energy physics group at the University of Colorado: John Cumalat, Brian Drell, Bill Ford, Eduardo Luiggi, Jim Smith, Kevin Stenson, Keith Ulmer, and Steve Wagner. They took time out from actually doing physics to answer my questions and to provide very helpful discussions and material. Thanks also to Josh Ellenbogen for discussions on experiment in general and in particular on the philosophy of Duhem. I am also grateful to Jim Bogen and Giora Hon for carefully reading the manuscript and offering constructive and helpful suggestions. Thanks are also due to Alex Wolfe for his careful and thoughtful editing of the manuscript.
None of this work would have been possible without the support of my wife, and best friend, Cynthia Betts.
Prologue. The Rise of the Sigmas
Before beginning the discussion about changes in the experimental practices of particle physics in the twentieth century, it is worthwhile to introduce readers to changes in the field’s most notable experimental standard, namely, its use of standard deviation to measure the accuracy and credibility of results.
The discovery of the top quark was announced in three papers published between 1994 and 1995 by the Collider Detector at Fermilab (CDF) collaboration. The first two papers, one in the letters journal Physical Review Letters (Abe et al. 1994a) and the other in the more archival journal Physical Review D (Abe et al. 1994b),¹ were both titled "Evidence for Top Quark Production in p̄p Collisions at √s = 1.8 TeV."² The third paper, also published in Physical Review Letters, was titled "Observation of Top Quark Production in p̄p Collisions with the Collider Detector at Fermilab" (Abe et al. 1995). The difference between the papers was that in the interval between when the first two papers were written and when the third paper was written the CDF group had acquired more data and had obtained a statistically more significant result. The "Evidence for" papers presented a result that had a statistical significance of 2.8 standard deviations (or sigmas [σ]), whereas the "Observation of" paper reported a five-standard-deviation effect.
This was an early indication of what has now become a general policy for papers on high-energy physics.³ Editors for Physical Review Letters and Physical Review have remarked that papers that are titled "Observation of" must report at least a five-standard-deviation effect (see the discussion below and tables P.4 and P.5 for statistical evidence on this policy). Anything less statistically significant can be titled only "Evidence for" or other similar phrases.⁴ According to several colleagues in high-energy physics, this unwritten policy is strongly adhered to and enforced by research groups themselves in the writing of papers, even before submission to a journal. It is also enforced by referees for the journals. This has become a significant issue within the high-energy physics community (see the discussion of the discovery of the Higgs boson below). Groups prefer to publish "Observation of" papers, which have become the gold standard for a discovery.⁵ In fact, it is reasonable to say that "Observation of" is synonymous with "discovery of," at least for the high-energy physics community. Thus, for example, in recent work on single-top-quark production, Fermilab issued a press release, "Fermilab collider experiments discover rare single top quark," announcing the discovery only after both the CDF (Aaltonen et al. 2009) and D0 (Abazov et al. 2009) groups at the laboratory had reported five-standard-deviation effects, meaning that both papers used "Observation of" in their respective titles.⁶
Not everyone within the high-energy physics community is happy about the strict enforcement of the five-sigma rule. I have been told in private communication that there have been, on occasion, rather heated discussions between authors and editors about that enforcement. As it now stands, if you want to publish an "Observation of" paper in either Physical Review Letters or Physical Review D, it must report a five-sigma effect.⁷
The Beginning
The use of standard deviations as a measure of the significance or credibility of a high-energy physics result seems to have started in the early 1960s.⁸ This is not to say that experimental physicists had not previously quoted an experimental uncertainty with their results, usually as a probable error or a standard deviation, but rather that the use of the number of standard deviations as a measure of credibility began at this time. Initially, there was no agreed-upon criterion for the number of standard deviations that signaled a significant result. Papers reported effects as low as 1.5 standard deviations (Connolly et al. 1963). It should be noted that although no significant claim was made for this result, it was mentioned: "For events satisfying these criteria, their 3π effective mass spectrum . . . is examined for evidence of a peak at the φ mass. There is a deviation at M(3π) = 1020 MeV of about 1.5 standard deviations above background" (375). This result was consistent with the mass of the known φ particle. An interesting question, and one that will be discussed later, is whether the criteria for the confirmation of an already reported result or for a discovery claim should be the same or different.
On occasion, significant claims were made on the basis of what we would now consider rather weak evidence. A case in point is the claim made by Baltay and collaborators (1966): "On the basis of our result we conclude that C [charge conjugation or particle-antiparticle] invariance is violated in η-meson decay into three pions" (1224). Their conclusion was based on an asymmetry between those η decays in which the positive pion had more energy and those in which the negative pion had more energy. They found that the asymmetry A, which is determined by (N+ − N−) / (N+ + N−), where N+ is the number of decays in which the positive pion had more energy and N− the number of decays that had more energetic negative pions, was equal to 0.072 ± 0.028, a 2.57-sigma effect: "The probability of obtaining a result |A| ≥ 0.072 because of random fluctuations in the data of this experiment, if there were no C-invariance, is 1.08 × 10⁻²" (Baltay et al. 1966, 1226). The group then added to their result two other, less significant results, which gave asymmetries A = 0.058 ± 0.034 and A = 0.087 ± 0.053, each less than a 2-sigma effect and thus rather weak evidence: "Combining our data with the [earlier data], we obtain A = 0.068 ± 0.020. The probability, in the same sense as before, of obtaining this result, there being no C-invariance violation, is 8 × 10⁻⁴" (Baltay et al. 1966, 1226–27). That probability corresponds to a 3.3-sigma effect, which would have justified a discovery claim at the time. As Samuel Goudsmit, the editor of Physical Review at the time, remarked, "an effect of less than three standard deviations is quite insufficient in such an important and subtle experiment" (Goudsmit 1971, 137). The reported asymmetry was later shown to be a statistical fluctuation and not a real effect.
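The combined asymmetry quoted above can be approximately reproduced with an inverse-variance weighted average of the three measurements. This is only a sketch: the text does not say how Baltay and collaborators actually combined their data, so the weighting scheme and the function name `weighted_mean` are assumptions made for illustration.

```python
import math

def weighted_mean(measurements):
    """Inverse-variance weighted mean of (value, sigma) pairs.

    Assumption: the measurements are independent and Gaussian.
    Returns the combined value and its standard deviation.
    """
    weights = [1.0 / sigma ** 2 for _, sigma in measurements]
    total_weight = sum(weights)
    mean = sum(w * value for w, (value, _) in zip(weights, measurements)) / total_weight
    return mean, 1.0 / math.sqrt(total_weight)

# The three asymmetries quoted in the text (value, one-sigma uncertainty).
asymmetries = [(0.072, 0.028), (0.058, 0.034), (0.087, 0.053)]

mean, sigma = weighted_mean(asymmetries)
n_sigma = mean / sigma  # number of standard deviations away from zero asymmetry

print(f"A = {mean:.3f} +/- {sigma:.3f}, about {n_sigma:.1f} standard deviations")
```

Under this assumption the combined value lands within rounding distance of the published A = 0.068 ± 0.020, which illustrates how two individually weak results can push a combined result past three standard deviations.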
In a paper reporting the existence of possible resonances in the Ξπ and KK̄ systems Bertanza and coworkers remarked that "the deviations from the phase-space predictions are 3 standard deviations for the Ξπ and 2.5 standard deviations for the KK̄ effective-mass-squared distribution. This estimate of error is based on the square root of the total number of events in the bins containing the peaks in the Ξπ and KK̄ Dalitz plots" (Bertanza et al. 1962, 180).⁹ The method used here to estimate the number of standard deviations differs slightly from that which is used currently. The experimenters used the total number of events, equal to the signal plus the background (S + B), to calculate the standard deviation, which in this method of calculation is equal to the square root of the total number of events, √(S + B). (See figure P.1, where the plotted xs indicate the calculated background, and the signal is the number of events above the background curve.) They then calculated the number of standard deviations of the signal by dividing the signal by that standard deviation, S/√(S + B). This results in a lower value for the number of standard deviations of the signal than does current practice and lowers the significance of the results. Current practice calculates the number of standard deviations as S/√B, where S is the signal above background and B is the background. The standard deviation is √B, which gives a larger number of standard deviations to the signal and a greater statistical significance. The older method calculates the probability that the total number of events might fluctuate downward and give only the background. The current method calculates the probability that the background will fluctuate upward and give the total number of events. This is a small but significant difference in method. The issue is still, at least on occasion, a subject for discussion. In a paper published in 2004 the COSY-TOF collaboration made the following statement about their calculation of the statistical significance of their result:
The first alternative is the naive estimation NS/√NB where NS is the number of events corresponding to the signal on top of the fitted background and NB is the number of background events in the chosen area. . . . In the present case this leads to a significance of 5.9 σ. . . . This estimator however neglects the statistical uncertainty of the background and therefore usually overestimates the significance of the peak. A more conservative method which is reliable for cases where the background is smooth and well-fixed in its shape uses the estimator NS/√(NS + NB). In our case this method leads to a significance of 4.7 σ. The third expression taking into account the full uncertainty of a statistically independent background which should underestimate the significance of the signal is given by NS/√((NS + NB) + NB). This leads to a value of 3.7 σ. (Abdel-Barry et al. 2004, 132).
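The three estimators named in the COSY-TOF passage can be compared side by side. The sketch below is illustrative only: the function name is hypothetical, and the counts used (a signal of 24 events over a background of 12) are the ones quoted in this prologue for the η meson, not the COSY-TOF data, whose event counts are not given in the text.

```python
import math

def significance_estimates(n_signal, n_background):
    """Return the three significance estimators discussed in the text.

    n_signal:     events above the fitted background (NS)
    n_background: background events in the chosen region (NB)
    """
    s, b = float(n_signal), float(n_background)
    return {
        # Current practice: background fluctuating upward.
        "S/sqrt(B)": s / math.sqrt(b),
        # Older practice: total fluctuating downward.
        "S/sqrt(S+B)": s / math.sqrt(s + b),
        # Most conservative: also counts the background's own uncertainty.
        "S/sqrt((S+B)+B)": s / math.sqrt((s + b) + b),
    }

estimates = significance_estimates(24, 12)
for name, value in estimates.items():
    print(f"{name:>16}: {value:.2f} standard deviations")
```

For any positive signal and background the ordering is the same as in the quoted passage: S/√B gives the largest number of standard deviations and the estimator that includes the background uncertainty gives the smallest.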
Often, other methods were used in addition to standard deviations. Not all data are suitable for analysis with standard deviations. If one wants to investigate which of two mathematical formulas better fits a set of data, then χ² analysis may be more appropriate. Bertanza and collaborators also used the χ² test: "An analysis has been made of the effective-mass-squared distributions of the Ξπ and KK̄ systems by means of the χ² test. The probabilities that the observed distributions originate from their corresponding phase-space distributions are < 0.0001 for the Ξπ system and < 0.01 for the KK̄ system. The large χ² arise mainly from the single peaks appearing in each curve" (Bertanza et al. 1962, 182–83). In order to calculate the χ² value one must know the standard deviation: $\chi^2 = \sum_{i=1}^{N} \left([\text{Observed events}]_i - [\text{Expected events}]_i\right)^2 / \sigma_i^2$. For randomly distributed events, σ² is equal to the expected number of events. If the data are a good fit to the proposed curve or hypothesis, the χ² per degree of freedom should be approximately one.¹⁰ It is also possible to translate χ² into a probability and then into standard deviations, although this is not always done. A large χ² indicates that the results do not fit the hypothesis, whereas a smaller χ² shows a good fit.
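A minimal sketch of the χ² sum described above, taking σ² in each bin equal to the expected number of events as the text states. The bin contents are invented for illustration and are not taken from any of the experiments discussed; the function name is likewise hypothetical.

```python
def chi_squared_terms(observed, expected):
    """Per-bin contributions (O_i - E_i)^2 / sigma_i^2 with sigma_i^2 = E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of bins")
    return [(o - e) ** 2 / e for o, e in zip(observed, expected)]

# Hypothetical histogram: six bins, one of which holds a pronounced peak.
observed = [12, 18, 25, 41, 22, 15]
expected = [14.0, 19.0, 24.0, 23.0, 21.0, 16.0]

terms = chi_squared_terms(observed, expected)
chi2 = sum(terms)
peak_bin = terms.index(max(terms))

print(f"chi-squared = {chi2:.2f} for {len(observed)} bins")
print(f"largest contribution: bin {peak_bin} with {terms[peak_bin]:.2f}")
```

In this made-up histogram nearly all of the χ² comes from the single bin containing the peak, which is the situation Bertanza and collaborators describe when they say the large χ² values arise mainly from the single peaks.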
Sometimes the number of standard deviations is not mentioned at all. The paper that announced the discovery of the η meson by Pevsner and collaborators (1961) noted that there were 36 events in the appropriate region with an overestimated background of 12 events, a signal of 24 events, with no further comment.¹¹ The experimenters did, however, also present a graph of their data (figure P.1).¹² The smaller peak on the left, the η meson, is clearly visible.¹³ The combination of the two was deemed sufficient to establish their claim both by the authors and by the physics community.
A similar method was used by Alston and collaborators (1961) in reporting a new resonance in the K-π system. They noted that they had a signal of 22 events above a background of 3 events, a total of 25 events, and presented a graph of their data (figure P.2). No number of standard deviations for the signal was given, but the peak was clearly visible in the graph. In this case the authors did use standard deviations to determine the spin of the new particle: Experimentally
(Alston et al. 1961, 301). A three-sigma signal seemed to be sufficient.
The trend of using standard deviations as a measure of significance continued in the early 1970s.¹⁴ A survey of papers in Physical Review Letters in 1971 reveals that experimenters were now tending to cite only results with four standard deviations or more, although, once again, the method is not used exclusively. The origin of this change was attributed to Arthur Rosenfeld, a member of the Particle Data Group, which produced the standard reference guide of particle properties, Review of Particle Physics. According to the story, which received wide circulation within the high-energy physics community, Rosenfeld pointed out that given the large number of graphs that were plotted each year by experimenters, one would expect to see a significant number of three-standard-deviation effects even if the data were distributed randomly and no particles or resonances were present. Rosenfeld (1975) discussed the issue in print. In discussing the existence of the κ(725), a Kπ resonance that had been reported five times, but subsequently disappeared, Rosenfeld stated the following: "We compiled and histogrammed (by computer) 60,000 new Kπ events, and found no substantial further evidence, and went on to ask how frequently such striking statistical fluctuations should be expected at some given mass in the Kπ system. (At the time about 2 million bubble chamber events were being measured annually, and about a thousand physicists were hunting through 10,000 to 20,000 mass histograms each year, in search of striking features, real or imagined.) We concluded that the five κ claims were just about what we should expect" (564–65).
The probability of a three-sigma effect in a single bin is 0.27 percent.¹⁵ In a 1,000 bin graph, however, the probability of observing a three-standard-deviation effect, if the data are distributed randomly, is 93 percent. The criterion was then changed to four sigma, which has a probability in a single bin of 0.0064 percent and a probability of 6 percent in a 1,000 bin graph. The question of what sample space to use for calculating such probabilities is important and will be discussed in more detail below.
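The percentages in the preceding paragraph can be checked with a few lines of arithmetic. The sketch assumes Gaussian fluctuations, a two-sided tail probability, and statistically independent bins; these assumptions and the function names are introduced here for illustration.

```python
import math

def two_sided_tail(n_sigma):
    """Probability that a Gaussian variable deviates by at least n_sigma
    standard deviations from its mean, in either direction."""
    return math.erfc(n_sigma / math.sqrt(2.0))

def prob_in_any_bin(n_sigma, n_bins):
    """Probability that at least one of n_bins independent bins shows a
    fluctuation of n_sigma or more."""
    return 1.0 - (1.0 - two_sided_tail(n_sigma)) ** n_bins

for n_sigma in (3, 4):
    single = two_sided_tail(n_sigma)
    anywhere = prob_in_any_bin(n_sigma, 1000)
    print(f"{n_sigma} sigma: {100 * single:.4f}% per bin, "
          f"{100 * anywhere:.0f}% somewhere in 1,000 bins")
```

Raising the threshold by one standard deviation lowers the per-bin probability by a factor of roughly forty, which is what turns a near-certain spurious bump in a 1,000-bin graph into an occasional one.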
Carmony and collaborators (1971), in reporting an "Observation of a New KN(1760) in the Kπ and Kππ Systems," stated that "decays into K*(890) π and Kρ with the same mass and width are observed with 4-standard-deviation significance" (1160). There were, however, some problems with their analysis. The proposed KN(1420) shows up clearly in the K⁰π⁰π− spectrum and both the known K*(890) and the KN(1420) appear in the K+π− spectrum (figure P.3). The experimenters fitted the data with a third resonance at a mass of 1760 MeV.¹⁶ As the authors themselves admitted, "the fitted curve still has a very low probability for describing the data between 1.9 and 2.2 GeV [in the K+π− spectrum]. It appears possible that other structures may exist in this region, although with the current data it is not possible to extract their parameters. The KN(1760) is not clearly separated from these structures" (Carmony et al. 1971, 1161, emphasis added). One may get a good fit to the proposed hypothesis in the area of interest, but it may fail elsewhere. This analysis raises the question of whether one is fitting the data only to what one is looking for without considering alternative hypotheses. In this case the experimenters might have considered fitting a smooth curve to the region between 1.6 and 2.2 GeV. The problem of what hypotheses to use for fitting data is one that we will see again.
The use of χ² analysis is further illustrated in the work of Foley and collaborators (1971). At this time there was a controversy as to whether the A2(1320) meson consisted of a single peak or of two slightly separated peaks (the "split A2" controversy).¹⁷ The experimenters fitted their observed mass spectrum with both a single-peak distribution and a double-peak distribution. They obtained a χ² fit of 35.1 for 37 degrees of freedom, a 55 percent probability for the former, whereas the double-peak fit yielded a χ² of 149 for 39 degrees of freedom, a probability of very close to zero,¹⁸ which they regarded as "unacceptably bad" (Foley et al. 1971, 419). The superiority of the one-peak fit is seen clearly in figure P.4. The group also remarked that "within the region 1.2 to 1.4 GeV, the observed distribution about the fitted curve is consistent with statistical fluctuations. However the bins 1.415 to 1.435 contain 37 events where 18 are predicted by the fit. The probability that this is a statistical fluctuation is < 10⁻⁴. . . . Hence we conclude that there is probably a narrow peak in the K⁰K− mass at ~1.425 GeV" (Foley et al. 1971, 415).¹⁹ The observed effect is more than four standard deviations, although this is not explicitly mentioned.
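The fit probabilities quoted for the single-peak and double-peak hypotheses can be recovered from the χ² values and degrees of freedom. The sketch below uses only the Python standard library and the exact finite-series form of the χ² tail probability for an integer number of degrees of freedom; the function name is an assumption, and the last lines estimate the bin excess with the √B convention, which may not be how the original authors computed it.

```python
import math

def chi2_tail_probability(chi2, dof):
    """P(X >= chi2) for a chi-squared variable with integer dof >= 1."""
    if dof < 1 or chi2 < 0:
        raise ValueError("need dof >= 1 and chi2 >= 0")
    if chi2 == 0:
        return 1.0
    x = chi2 / 2.0
    if dof % 2 == 0:
        a, prob = 1.0, math.exp(-x)              # tail for 2 degrees of freedom
    else:
        a, prob = 0.5, math.erfc(math.sqrt(x))   # tail for 1 degree of freedom
    # Recurrence Q(a + 1, x) = Q(a, x) + x**a * exp(-x) / Gamma(a + 1),
    # evaluated in log space to avoid overflow; each step adds 2 to dof.
    while a < dof / 2.0:
        prob += math.exp(a * math.log(x) - x - math.lgamma(a + 1.0))
        a += 1.0
    return prob

single_peak = chi2_tail_probability(35.1, 37)
double_peak = chi2_tail_probability(149.0, 39)
print(f"single-peak fit: probability {single_peak:.2f}")
print(f"double-peak fit: probability {double_peak:.1e}")

# Excess in the 1.415 to 1.435 GeV bins: 37 events observed, 18 predicted.
excess_sigma = (37 - 18) / math.sqrt(18)
print(f"bin excess: {excess_sigma:.1f} standard deviations")
```

The single-peak probability comes out close to the 55 percent quoted in the text, the double-peak probability is vanishingly small, and the bin excess exceeds four standard deviations under the √B convention.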
Another problem that arises in this type of analysis, and in later experiments, is how one calculates the appropriate number of standard deviations when selection criteria, or cuts, are applied to the data.²⁰ Consider the claim made by Ming Ma and Colton (1971) that they had found "Evidence for a Narrow N*(1470)." They studied the reactions 1) pp → p(nπ+) and 2) pp → p(pπ⁰). "Although the pπ⁰ mass spectrum shows no significant enhancements, an enhancement near 1.46 GeV/c² can be seen in reaction 1)" (Ming Ma and Colton 1971, 334) (figure P.5). In order to enhance the observed effect, a cut was made requiring that Mπ+p be greater than 2.4 GeV/c², which would reduce a known source of background. When this cut was made, peaks stood out clearly at approximately 1.47 and 1.65 GeV/c²: "The combined signal in the mass region between 1.425 and 1.500 GeV/c² stands at the 6-standard deviation level [this was for the combined spectra for reaction 1) and 2) shown in figure P.6]" (335). If one looks at the "Review of Particle Physics, 2010" (Nakamura et al. 2010), however, one finds no mention of a nucleon resonance at 1470 MeV/c².²¹ A six-standard-deviation effect had disappeared.
The possible overreliance on standard deviation estimates as a measure of significance suggested above is clearly illustrated in the work of Maglich and collaborators (1971). In this paper a claim was made that a new neutral boson with mass $953^{+1}_{-2.5}$ MeV had been discovered on the basis of a "20-standard deviation peak" (Maglich et al. 1971, 1479; emphasis added). The group explicitly rejected the previously discovered η’(958) as an explanation for their result: "If the mass value of the η’, which is 957.7 ± 0.8 is fixed in the program, it is rejected with a confidence level P(χ²) < 10⁻⁴" (1479). Their results are shown in figure P.7. As one can see, they were able to observe peaks at the masses of the known π⁰, η⁰, and ω⁰ particles at their previously measured masses. The insert in figure P.7 shows the details of the spectrum in the region of 900 MeV. The claimed 20-standard-deviation effect has a probability of approximately 10⁻⁸⁶, or about the same probability of getting 288 consecutive heads in the toss of a fair coin. Yet if one looks at Nakamura et al. (2010), only the η’(958) is listed. There are other instances in which results that have large reported statistical significance have disappeared. This, of course, raises the question of whether the standard deviations are being calculated correctly or whether the result is an artifact caused by the application of selection criteria.²² In this case one might suggest that the experimenters underestimated the uncertainty in their mass determination and had mistaken a slightly displaced η’(958) for a new particle.
It is clear that the four-standard-deviation criterion for a significant effect was well established by the mid-1970s. In a "Search for Heavy Narrow Resonances produced by Photons with Energies up to 11.8 GeV," Theodosiou et al. (1976) stated that "within the sensitivity of the experiment no evidence for any narrow new resonances was found" (126).²³ They noted that "the limits quoted . . . are based on the number of events that one would have detected for a 4-standard-deviation signal on top of the background" (128; emphasis added). In order to demonstrate that their experimental apparatus and analysis procedure would have detected such an effect, they plotted both the observed e+e− mass-squared spectrum and that same spectrum with a four-standard-deviation bump added to it, which was accomplished by adding 160 events from the decay of a resonance with mass equal to 2.15 GeV² and a width of 0.29 GeV² (figure P.8).²⁴ The added bump is quite clearly visible. The four-standard-deviation criterion was also used by Abolins and collaborators (1976) in their "Search for Narrow Two-body Enhancements at Fermilab." Their method was to compare the observed invariant-mass spectra of the two-particle states produced in neutron-beryllium interactions, π+π−, π+K−, and p , with smoothed distributions generated with randomized tracks from different events. This was intended "to search for ≥ 4-standard-deviation enhancements. One such enhancement is observed in the π+K− mass distribution at mπK = 2.29 ± 0.03 GeV. . . . We note that this ‘4σ’ enhancement has a purely statistical probability of ~3%" (Abolins et al. 1976, 418). The reader may be somewhat surprised at this estimate because the probability of a four-σ deviation in a single bin is 0.0064 percent. The experimenters obtained this estimate by multiplying that probability by five hundred, the total number of bins.²⁵ In a footnote they remarked that "for comparison two ~3 standard deviation peaks are observed: p at 2.66 GeV and K−π+ at 2.42 GeV" (420). These were not noted in the text. A three-sigma effect was insufficient.
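The arithmetic behind the ~3 percent estimate can be checked with a short calculation. The sketch below assumes a Gaussian distribution, a two-sided tail, and five hundred statistically independent bins; the function names are illustrative and do not come from the paper. It compares the simple multiplication described above with the probability of at least one such fluctuation anywhere in the spectrum.

```python
import math

def two_sided_tail(n_sigma):
    """Probability that a Gaussian variable lies at least n_sigma
    standard deviations from its mean, in either direction."""
    return math.erfc(n_sigma / math.sqrt(2.0))

def trials_corrected(p_single, n_bins):
    """Simple trials-factor estimate: single-bin probability times bin count."""
    return p_single * n_bins

def at_least_one(p_single, n_bins):
    """Probability of one or more such fluctuations among n_bins
    independent bins (an assumption; real bins may be correlated)."""
    return 1.0 - (1.0 - p_single) ** n_bins

p_bin = two_sided_tail(4.0)               # single-bin probability of a 4-sigma deviation
p_linear = trials_corrected(p_bin, 500)   # the multiplication described in the text
p_exact = at_least_one(p_bin, 500)        # slightly smaller than the linear estimate

print(f"single bin: {100 * p_bin:.4f} percent")
print(f"500 bins (multiplication): {100 * p_linear:.1f} percent")
print(f"500 bins (at least one): {100 * p_exact:.1f} percent")
```

Under these assumptions both estimates come out near 3 percent, which is why a "4σ" bump found by scanning many bins carries far less weight than the single-bin probability suggests.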
The increasing importance of standard deviations as a criterion of significance is shown in "Search for Charmed Hadrons Using a Direct-Muon Trigger" (Bunnell et al. 1976). The experimenters noted that "figure [P.9] shows the distribution of standard deviations from the smoothed background curve for a representative data sample. While we see no significant deviations from Poisson statistics, there are three bins with probability less than one over the number of bins studied, i.e., ≤ 1.3 × 10⁻⁵" (Bunnell et al. 1976, 87). A probability of ≤ 1.3 × 10⁻⁵ corresponds to slightly more than four standard deviations. The experimenters found three such peaks but did not regard any of them as establishing the existence of a new particle, although they did note that, "because of its quantum numbers, the enhancements at m(K+K−) = 1984 MeV/c² is the most promising one for further investigation" (87).
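The correspondence between a probability of 1.3 × 10⁻⁵ and "slightly more than four standard deviations" can be illustrated by inverting the Gaussian tail. This is a minimal sketch that assumes Gaussian statistics; the paper does not say whether a one-sided or two-sided convention was intended, so both are shown, and the helper name is hypothetical.

```python
from statistics import NormalDist

def sigma_equivalent(p, two_sided=False):
    """Number of Gaussian standard deviations whose tail probability equals p.
    With two_sided=True, p is split equally between the two tails."""
    tail = p / 2.0 if two_sided else p
    return -NormalDist().inv_cdf(tail)

p_threshold = 1.3e-5                    # "one over the number of bins studied"
n_bins_implied = 1.0 / p_threshold      # roughly 77,000 bins
z_one_sided = sigma_equivalent(p_threshold)
z_two_sided = sigma_equivalent(p_threshold, two_sided=True)

print(f"implied number of bins: {n_bins_implied:.0f}")
print(f"one-sided: {z_one_sided:.2f} sigma, two-sided: {z_two_sided:.2f} sigma")
```

Either convention gives a value a little above four, in line with the statement in the text.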
There were, however, also papers that made no use of standard deviations. In a search for π-μ bound states, Coombes et al. (1976) presented a graph of their results for α, the parameter of interest (figure P.10), claiming that "the data show a clear peak at the predicted point containing a total of 21 events with an estimated background of 3 events. . . . We conclude that we have observed Coulomb bound states of pions and muons" (251). The combination of the graph, the size of the signal, and the agreement with theory was sufficient to establish the existence of the effect.
The Reign of Four Standard Deviations
The reign of four standard deviations continued into the 1980s, albeit without the rigor that the statistical significance of a result would assume in the 2000s.²⁶ Although the four-standard-deviation criterion was acknowledged, papers were published with lower statistical significance. In some cases the number of events above background and the statistical uncertainty were given, but the number of standard deviations was neither calculated nor presented. This was left as an exercise for the reader. In addition, there was no correlation between the statistical significance of a result, the number of standard deviations of that result, and the title of the paper. Thus, we have papers titled "Evidence for" and "Observation of" that report results having the same statistical significance.
Consider "Evidence for Two Narrow pp̄ Resonances at 2020 MeV and 2200 MeV" (Benkheiri et al. 1977). The experimenters presented graphs of their results (figure P.11) and noted that "for the 2020 MeV peak the total sample gives a signal/noise = 153/409 (~7.6 s.d.). For the 2200 peak the sample of events with a Δ⁰(1232) selection gives a signal/noise = 58/82 (~6.5 s.d.)" (485). They concluded that "we observe two narrow pp̄ peaks at 2020 and 2200 MeV, with a significance of more than 6 s.d." (485). In contemporary work this would qualify for an "Observation of" title. In examining the decay angular distributions they remarked that "the compatibility between the 2200 MeV resonance and the background exists only at 3 s.d." (485; emphasis added). The group also remarked on a suggestive peak at approximately 1930 MeV but presented only a figure (figure P.12, the small peak on the left) and no number of events or standard deviations.
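The quoted signal/noise figures appear to be consistent with dividing the number of signal events by the square root of the number of background events, the expected Poisson fluctuation of the background. That reading is an assumption, since the paper's exact procedure is not described here. A minimal sketch of the calculation follows.

```python
import math

def naive_significance(signal_events, background_events):
    """Signal divided by the Poisson fluctuation of the background,
    sqrt(B). This ignores uncertainty in the background estimate itself."""
    return signal_events / math.sqrt(background_events)

z_2020 = naive_significance(153, 409)   # 2020 MeV peak
z_2200 = naive_significance(58, 82)     # 2200 MeV peak

print(f"2020 MeV peak: {z_2020:.1f} s.d.")
print(f"2200 MeV peak: {z_2200:.1f} s.d.")
```

Under this assumption the 2020 MeV peak reproduces the quoted ~7.6 standard deviations, while the 2200 MeV peak gives about 6.4, close to the quoted ~6.5.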
That same resonance at 1930 MeV was reported in "Observation of a Narrow pp̄ Enhancement at 1940 MeV" (Daum et al. 1980). The experimenters remarked that the situation with the "narrow baryonium states" was quite confusing. Only the 1936 MeV resonance had been seen in more than one experiment, but it had not been seen in all experiments. The group took data with beams of protons and of positive and negative pions. The resonance was seen only in the proton data. They reported a peak of 36 ± 9 events after background subtraction and observed "a 4-standard-deviation enhancement in pp̄ states produced inclusively in 93 GeV proton interactions, while there is no enhancement in the pion-induced reactions" (478). Four standard deviations was, at the time, sufficient for an observation.
A somewhat different case was presented by Russell and collaborators (1981). In this paper the group was investigating a particle that was already regarded as well established. They were attempting to better determine its mass. The experimenters presented a graph of the combined data for pK⁰S and K⁰S final states (figure P.13), and they stated that "a clear signal is seen at the charmed baryon mass. A fit of the mass distribution to a single resonance and a smooth quadratic background gives a signal of 55 ± 10 events above a background of 85 events" (Russell et al. 1981, 800). The group also commented that "for the π−π−π+ mode, where a signal has been previously observed in photoproduction, an enhancement of marginal significance is present in the inclusive mass distribution" (801; emphasis added). The number of signal events given was 140 ± 48 events, a 2.9-standard-deviation effect. This was regarded as of "marginal significance."
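When a paper reports a fitted number of signal events together with its uncertainty, the number of standard deviations is simply the ratio of the two. The sketch below applies that ratio to the event counts quoted above; the dictionary labels are illustrative only.

```python
def sigma_from_fit(n_signal, uncertainty):
    """Number of standard deviations by which a fitted signal
    differs from zero, given its one-standard-deviation uncertainty."""
    return n_signal / uncertainty

fits = {
    "Daum et al. 1980, peak after background subtraction": (36, 9),
    "Russell et al. 1981, combined final states": (55, 10),
    "Russell et al. 1981, three-pion mode": (140, 48),
}

results = {label: sigma_from_fit(n, err) for label, (n, err) in fits.items()}
for label, z in results.items():
    print(f"{label}: {z:.1f} standard deviations")
```

The ratios come out at 4.0, 5.5, and about 2.9, matching the 4-standard-deviation enhancement of Daum et al. and the 2.9-standard-deviation effect that Russell et al. regarded as marginal.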
Roos et al.’s (1982) "Review of Particle Properties," the Particle Data Group’s discussion of particle resonances, stated that "in general we accept such peaks if they are experimentally reliable, of high statistical significance or observed in several different production experiments" (xi). No explicit criterion was given, however, for high statistical significance.
Thus, we see that in the early 1980s the four-standard-deviation criterion seems to have been in effect, although it was not always explicitly used or mentioned. This continued through the 1980s. In 1986, for example, we find four papers whose titles begin "Observation of" in volume 56 of Physical Review Letters, and in three of these four papers, the significance of the results is expressed in standard deviations, although, as usual, graphs of the data and numbers of events are also included. The fourth paper, however, presented no standard deviations but showed the result in a graph.
In the first of these papers, "Observation of a Narrow KK̄ State in J/Ψ Radiative Decays," Baltrusaitis et al. (1986) looked for such a state, named the ξ, in the decay of the J/Ψ particle. For the K+K− final decay state they reported a statistical significance of ~4.5 standard deviations, whereas for the K⁰S K⁰S final state the significance was ~3.6 standard deviations. The results for both decay states were combined and are shown in figure P.14. "The significance of the signal," Baltrusaitis et al. report, "was determined by comparing the likelihood for the fit described above [this included the new K+K− state] with that obtained for a fit containing no signal, is found to be 4.5 standard deviations (s.d.). Other choices of background parameterization lead to fits in which the statistical significance of the ξ signal varies from 3.9 to 5.8 s.d." (109). A similar procedure for the K⁰S K⁰S final state yielded results that varied from 3 to 4.7 standard deviations. Baltrusaitis and collaborators varied the background parameters used to guard against the possibility that the observed effect was an artifact produced by the particular choice of background. This is, as discussed later, an important safeguard that is not always employed.
In "Observation of a New Charmed Meson" (Albrecht et al. 1986), the group reported a new state, the D*⁰(2420), which decayed into D*±(2010)π∓. For one set of selection criteria, they noted that "a prominent peak is seen around 410 MeV. A Breit-Wigner form for the signal, plus a threshold factor times a second-order polynomial for the background were fitted to the mass difference distribution. . . . The statistical significance of the enhancement is 3.9 standard deviations" (550). They combined the data for the two final states and remarked that "the combined significance of the effect is 4.9 standard deviations" (551) (figure P.15).
Similarly, in "Observation of a Narrow Enhancement in ΦKK and Φππ Final States Produced in 400-GeV p-N Interactions" (Green et al. 1986), the experimenters used both χ² analysis and standard deviation analysis. For the ΦKK final state they state that "there is a clear enhancement in ΦKK. . . . The chi squared per degree of freedom (χ²/DF) is 51/36 [note that this has a probability of 5 percent of getting a worse fit to their data] with a 4.3 standard-deviation (σ) excess of 222 ± 52 events above background" (1640). For Φππ they found a (χ²/DF) = 35/39 (note that this has a probability of 65 percent of getting a worse fit to their data) and a 3.9 standard-deviation excess of 213 ± 55 events in their peak. Graphs of their data were also presented.
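The probabilities of getting a worse fit attached to the two χ²/DF values can be reproduced from the χ² distribution. The sketch below is one way to do so using only the Python standard library: it builds the upper-tail probability from the standard recurrence for the regularized incomplete gamma function, and it assumes an integer number of degrees of freedom. The function name is illustrative.

```python
import math

def chi2_survival(chi2, dof):
    """Probability of a chi-squared value at least as large as chi2
    for an integer number of degrees of freedom dof >= 1."""
    if chi2 <= 0.0:
        return 1.0
    y = chi2 / 2.0
    # Closed-form starting points: Q(1, y) for even dof, Q(1/2, y) for odd dof.
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # Recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1),
    # evaluated in logarithms to avoid overflow.
    while a < dof / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q

p_phi_kk = chi2_survival(51.0, 36)     # the 51/36 fit for the PhiKK final state
p_phi_pipi = chi2_survival(35.0, 39)   # the 35/39 fit for the Phi-pi-pi final state

print(f"chi2/DF = 51/36: {100 * p_phi_kk:.0f} percent")
print(f"chi2/DF = 35/39: {100 * p_phi_pipi:.0f} percent")
```

The two values come out close to the 5 percent and 65 percent noted in the text, so the first fit is acceptable but not good, while the second is entirely unremarkable.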
In "Observation of the Decay B → FX" (Haas et al. 1986), however, neither the statistical significance of the result nor the number of events in their peak was given. Clear peaks shown in figure P.16 were deemed sufficient to establish the existence of the