“You may speculate from the day that days were created,
but you may not speculate on what was before that.”
—Talmud, Tractate Hagigah 11b, 450 A.D.
To go back to the beginning, if there was a beginning, means testing the dominant theory of cosmogenesis, the model known as inflation. Inflation, first proposed in the early 1980s, was a bandage applied to treat the seemingly grave wounds cosmologists had found in the Big Bang model as originally conceived. To call inflation bold is an understatement; it implied that our universe began by expanding at the incomprehensible speed of light … or even faster! Luckily, the bandage of inflation was only needed for an astonishingly minuscule fraction of a second. In that most microscopic flash of time, the very die of the cosmos was cast. All that was and ever would be, on a cosmic scale at least (vast assemblies of galaxies, and the geometry of the space between them), was forged.
For more than 30 years, inflation remained frustratingly unproven. Some said it couldn’t be proven. But everyone agreed on one thing: If cosmologists could detect a unique pattern in the cosmos’s earliest light, light known as the cosmic microwave background (CMB), a ticket to Stockholm was inevitable.
Suddenly, in March 2014, humanity’s vision of the cosmos was shaken. The team of which I had been a founding member had answered the eternal question in the affirmative: Time did have a single beginning. We had proof. It was an amazing time indeed.
For weeks I had known it was coming. Our entire team was furiously working to finalize the results we would soon make public. We had relentlessly reviewed the data, diligently debating the strength of the findings, discussing what could be one of the greatest scientific discoveries in history. In the intensely competitive world of modern cosmology, the stakes couldn’t have been higher. If we were right, our detection would lift the veil on the birth of the universe. Careers would skyrocket, and we would be forever immortalized in the scientific canon. Detecting inflation equaled Nobel gold, plain and simple.
But what if we were wrong? It would be a disaster, not only for us as individual scientists but for science itself. Funding for our work would evaporate, tenure tracks would be derailed, professional reputations ruined. Once gleaming Nobel gold would be tarnished. Glory would be replaced by disappointment, embarrassment, perhaps even humiliation.
The juggernaut rolled on. The team’s leaders, confident in the quality of our results, held a press conference at Harvard University on March 17, 2014, and announced that our experiment, BICEP2, had detected the first direct evidence of inflation—evidence, albeit indirect, of the very birth pangs of the universe.
BICEP2 was a small telescope, the second in a series of telescopes located in Antarctica. I had co-invented the first telescope (BICEP) more than a decade earlier, when I was just a lowly postdoc at Caltech. BICEP sprang out of a deep obsession I had long had with making the invisible birth of the universe visible. And it wasn’t lost on me that, if we succeeded, the Nobel Prize would be the most tangible reward for the discovery.
BICEP’s design was simple. It was a small refracting telescope—a spyglass like Galileo’s, with two lenses that bent incoming light and directed it not to the human eye but to modern, ultrasensitive detectors. The telescope needed to be at an exquisitely pristine location, and we found one: the South Pole. Our goal was to capture the aftershocks of cosmic inflation, a signal imprinted on the afterglow of the Big Bang—the CMB, which permeates all of space.
For years BICEP2 looked for a swirling, twisting pattern (called a B-mode polarization pattern) in the CMB that cosmologists believed could only have been caused by gravitational waves squeezing and stretching space-time as they rippled through the infant universe. What could have caused these waves? Inflation and inflation alone. BICEP2’s detection of this pattern would be evidence of primordial gravitational waves generated during inflation, all but proving that inflation happened.
Then we saw it. There was no going back.
The broadcast from Harvard’s Center for Astrophysics captivated media around the world. Nearly 10 million people watched the press conference online that day. Every major news outlet, from The New York Times to the Economist to obscure gazettes deep within the Indian subcontinent, covered the announcement “above the fold.” My kids’ teachers had heard about it. My mother’s mahjong partners were kvelling about it.
Watching the live video, I could see MIT cosmologist Max Tegmark reporting the event. He wrote, “I’m writing this from the Harvard press conference announcing what I consider to be one of the most important scientific discoveries of all time. Within the hour, it will be all over the web, and before long, it will lead to at least one Nobel Prize.”
Finally, we’d seen what we, and the whole world apparently, had wanted to see. The BICEP2 team’s announcement was that we had read the very prologue of the universe—which, after all, is the only story that doesn’t begin in medias res.
Still, doubts plagued me. It sure seemed to be a discovery for the ages. But was it? No one is immune from confirmation bias. And scientists, despite what you may think, are rarely mere gatherers of facts, dispassionately following data wherever it may lead. Scientists are human, often all too human. When desire and data are in collision, evidence sometimes loses out to emotion. It was impossible to rule out every possible contaminant. Had we fretted enough?
The most worrisome aspect of BICEP2’s signal was how huge it was. It was shockingly big, more like finding a crowbar in a haystack than a needle, as one team member phrased it. At the time of our announcement, we were worried about being beaten by our chief competitor, a $1 billion space telescope called the Planck satellite with the perfect heavenly perch from which to scoop us. Prior to BICEP2’s press conference, Planck had already ruled out a B-mode signal half as big as the one we claimed to have observed. Cosmologists were expecting a whisper. We claimed BICEP2 had heard a roar.
Planck represented serious competition: It had a heavenly vantage point 1 million miles above Earth, free from gravity and atmospheric contamination alike. Planck possessed the perfect perch from which to scoop us. Worse yet, the BICEP2 telescope had been disassembled two years earlier. We couldn’t exactly go back and check to see if we had taken the lens cap off. But we could make use of our most powerful weapon: data, and lots of it.
We began by testing it for consistency by dividing the massive data set in half and making two maps, one from BICEP2’s first 18 months of observations and one from the second 18 months. The two maps showed the same signal, albeit with lower signal-to-noise ratio (because each map had only half the amount of data as the two maps put together).
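The drop in signal-to-noise for each half-map can be put in rough numbers. The sketch below assumes the noise is uncorrelated and averages down as one over the square root of the amount of data, which is a textbook idealization rather than a description of BICEP2's actual noise; the function name is purely illustrative.

```python
import math


def snr_relative_to_full(fraction_of_data: float) -> float:
    """Signal-to-noise of a map made from a fraction of the data,
    relative to the map made from all of it.

    Assumes uncorrelated noise that averages down as 1/sqrt(N),
    so the signal-to-noise grows as sqrt(N). This is an idealization.
    """
    if not 0.0 < fraction_of_data <= 1.0:
        raise ValueError("fraction_of_data must be in the interval (0, 1]")
    return math.sqrt(fraction_of_data)


half_map_snr = snr_relative_to_full(0.5)
print(round(half_map_snr, 3))  # 0.707
```

Under that assumption, each 18-month map keeps about 71 percent of the full map's signal-to-noise, so a real signal should still show up in both halves.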
To prevent mistakes, carpenters say, “Measure twice, cut once.” Well, BICEP2 astronomers cut the data dozens of ways, looking for discrepancies in data from one set of detectors versus another, or differences between when the telescope was scanning to the right versus to the left. We tortured the data in every conceivable way, each scientist on the team trying to concoct ever more outlandish scenarios that we had overlooked. Even if extraterrestrials had created our signal, the implications might have been less astonishing!
When I speak in public and am introduced as a cosmologist, I like to joke that you sure don’t want me doing your hair and nails. Many people don’t know that the similarity between cosmology and cosmetology is more than skin deep. They both have the prefix cosm, which is the Greek word for “adornment,” as in the beautiful face the universe shows us. When I saw the BICEP2 data arranged into a map, the pattern of whorls and swirls took my breath away. It was exactly what inflation predicted we’d see, and it was love at first sight. The cosmos wasn’t just beautiful. It was showing off.
Our exhilaration was mixed with a sense of foreboding. After a yearlong inquisition, it became clear: The signal was not coming from the South Pole, the atmosphere, nor BICEP2 itself. Where else could it be coming from, if not inflation?
One possible answer was that we’d seen the same material that had bedeviled so many astronomical discoveries since Galileo’s time: dust.
Everyone knew that B-modes could come from interstellar dust in the Milky Way: Microwaves scattering off dust within our own galaxy could generate the pattern we saw. Might it make up the entire signal we were now seeing? How could we prove it was not dust, but the imprint of gravitational waves on the cosmic microwave background?
Though we had selected the Southern Hole—the patch of sky where BICEP2 hunted for B-modes—based on the low level of dust predicted by the best available models, we didn’t know for sure if it was as free of contamination as we’d expected. What we really needed were high-frequency data.
Earlier I mentioned that the amount of polarization produced by dust increases very steeply with frequency. BICEP2 worked at 150 GHz only, corresponding to wavelengths of approximately 2 millimeters. Doubling the frequency would more than triple the dust signal. If dust were producing our B-modes, it would be obvious at 300 GHz … if only we had data at such high frequencies.
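The numbers in this paragraph can be checked with a few lines of arithmetic. The sketch below converts frequency to wavelength and then asks how steep the dust spectrum must be for doubling the frequency to "more than triple" the signal. Treating the dust signal as a simple power law in frequency is an illustrative assumption, not something stated here, and the function names are hypothetical.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # exact, by SI definition


def wavelength_mm(frequency_ghz: float) -> float:
    """Free-space wavelength in millimeters for a frequency given in GHz."""
    return SPEED_OF_LIGHT_M_PER_S / (frequency_ghz * 1e9) * 1e3


def min_power_law_index(frequency_ratio: float, signal_ratio: float) -> float:
    """Smallest index b such that signal ~ frequency**b grows by signal_ratio
    when the frequency grows by frequency_ratio (power law is an assumption)."""
    return math.log(signal_ratio) / math.log(frequency_ratio)


print(round(wavelength_mm(150.0), 2))            # 2.0
print(round(wavelength_mm(300.0), 2))            # 1.0
print(round(min_power_law_index(2.0, 3.0), 2))   # 1.58
```

So 150 GHz does correspond to roughly 2 millimeters, and under the power-law assumption "more than triple on doubling" means an index steeper than about 1.58.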
In truth, such a map did exist, one with the exact high-frequency data we needed. There was only one catch: It belonged to our competitor, the Planck satellite. And in early 2014, the Planck team hadn’t yet released their B-mode polarization data. We were scared Planck might not only hold the key to proving our measurement right, but might have already glimpsed the inflationary B-mode signal before we did. If it really was as large as we thought it was, it was well within Planck’s grasp.
We desperately tried to work with the Planck team, while being careful not to tip them off as to what we’d found. It was a perilous line to walk. Science teams that sometimes collaborate can be in competition at other times, particularly when there is a well-known goal or target signal both are looking for. This is a troublesome aspect of science; many of us treat the data as if it’s “ours” when, in fact, it belongs to the people paying the bills: the taxpayers.
BICEP2 had much more sensitive data, but Planck’s was broader, covering the whole sky and at many more frequencies than BICEP2 had. After everything else was ruled out, frequency coverage held the key to our fate.
The Planck team wouldn’t cooperate. Either they didn’t have the data we wanted, or they did have it and they were going to scoop us. We had to go it alone. What BICEP2 lacked in frequency quality, we compensated for with quantity. We made five different models for the dust, each based on old data—the same data that we’d used to choose BICEP’s observing region nearly a decade before.
Each of the five models predicted the total emission—the total heat produced by dust—at a particular region in the galaxy, but none of them could predict how much polarization we could expect in the Southern Hole. So, from these data, we extrapolated what galactic dust emission would look like in our patch if it were also slightly polarized. We played the guessing game, trying to be conservative, and eventually settled on a level of about 5 percent for our simulations.
Then came a revelation: We noticed that a Planck team member, Jean-Philippe Bernard, an expert on the Milky Way’s polarization, had given a talk earlier that year which was posted online. Bernard showed an actual picture of Planck’s dust measurements: a map of the sky as seen by our competition. It was a treasure map, with polarized “X”s marking the spot of sure Nobel gold.
As soon as we discovered it, one of our team members digitized Bernard’s slide, revealing by extrapolation the formerly forbidden Planck data. We knew it was an unorthodox approach. In fact, it didn’t sit well with many of us. We took unpublished data, a single qualitative image, and digitized it, turning it into quantitative information. By doing so, we obtained a new model, one unavailable when we began taking data with BICEP, with exactly the information we craved.
Planck had not published this map and they likely had their own systematic errors to worry about. But the slide was public and freely available, giving us the green light to use it if we explained our methodology. But, if we went public, how much weight should this contraband slide carry? At first it was a curiosity, a digital trick to make us feel more confident. Then, a few months later, it snowballed, becoming a major link in the chain of reasoning assuring us that galactic dust was safely ignorable … and confirming something beyond our wildest hopes when we started: We had discovered B-modes from inflation.
Using the slide made me uncomfortable. On conference calls and in emails I complained to BICEP2’s leaders. I wanted clarification: Were we sure we had accurate measurements of dust? I was concerned that BICEP2’s results had already been ruled out by Planck. Polarization of dust was the most obvious explanation for a signal we could see that Planck couldn’t.
“How can we use slides that were shown in a talk but not intended for any quantitative purpose?” I asked in an email to the whole team. The leadership replied to my email, saying that it was fine to use the slide if we stated the assumptions we’d made.
Plus, the Planck slide merely confirmed the results of the other five models we had, all of which showed that dust wasn’t a plausible explanation for the bright B-modes we saw. Planck’s slide would be but one piece of evidence, and not the most definitive piece of evidence at that. That distinction belonged to my precious BICEP, which had been renamed BICEP1. Unlike BICEP2, which observed the sky at a single frequency—150 GHz, where the CMB is brightest—BICEP1 had three frequency channels, at 90, 150, and 220 GHz. With the benefit of these other frequency channels we could exclude, to some extent, the impact of dust above a certain level.
We could use Planck’s slide, because it wasn’t the main line of evidence. That most convincing evidence came courtesy of BICEP1, which said dust wasn’t the cause of our signal, and we were 95 percent confident about that. In other words, dust had only 1 chance in 20. Would you enter a lottery, the biggest one in cosmic history, if you had “only” a 95 percent chance of winning? Of course you would!
John Kovac made one last plea to the Planck team for their actual data, but again was denied. I figured Planck was about to scoop us. Waiting wasn’t going to help. The Planck slide combined with BICEP1’s data convinced all 49 of us, including me. I got off of my high horse. It was time: Publish, or else our Nobel dreams might perish.
Within three weeks of the press conference, 250 scientific papers had been written about our results. That was astonishing; a paper is considered “famous” if it has 250 citations over the course of decades! Then, in early April, I got an email from the physicist Matias Zaldarriaga. How many times can he be congratulating me, I wondered?
“When the dust is low, but spread over a wide area, it betokens the approach of infantry.” —Sun Tzu, The Art of War
Matias’s April email was no “attaboy.” He was disturbed. He wanted to talk details. What did I know and when did I know it? It was the beginning of a trial I had long feared. Rumors were swirling at Princeton about the way we had used the infamous Planck slide. “People here in Princeton are very concerned about dust,” he said, ominously adding, “In fact they have managed to convince me that there is not a very good reason for me to believe it is not just dust. Have you looked into the foregrounds yourself?” Of course I had looked at the foregrounds—potential sources of contamination such as polarized emission from the Milky Way’s dust. The whole team had been worried about our galaxy producing spurious B-mode polarization that would masquerade as primordial gravitational wave B-modes. But data at low frequencies from BICEP1 and at high frequencies from Planck’s scrubbed PowerPoint slide convinced us we were okay.
A few days later, I got wind of a colloquium that Princeton University’s David Spergel had given just after the Harvard press conference. David said he had spotted a blunder in our results, that our data were contaminated by dust within the Milky Way galaxy. Soon, I found out there were others at Princeton laser-focused on the way we modeled dust. The BICEP2 leadership had anticipated an onslaught, perhaps even a backlash, from the Princeton folks, who were working on several competing B-mode experiments. Maybe they were just frustrated after being scooped on another major CMB discovery.
I asked Matias if it was David Spergel alone causing his concerns. Ominously, Matias said, “I think there is nothing else people here talk about.” My heart stopped. Princeton’s cosmology program is the top-ranked in the country—cosmology’s own Holy See, comprised of the world’s best experimentalists and theorists, among them multiple members of the National Academies of Sciences. It felt like an inflationary Inquisition, one that could put the BICEP2 results on a modern-day Index of banned pre-prints.
Imagine finding out the entire IRS is obsessed with your tax return. Not just one rogue auditor, but everyone, from the Secretary of the Treasury on down, fixated on your Form 1040! It was petrifying.
Matias told me that an outstanding young physicist named Raphael Flauger was leading a paper with Spergel and Spergel’s graduate student J. Colin Hill. Flauger had convinced Matias that the Milky Way’s dust polarization was higher than what the BICEP2 scientists had assumed. We were vulnerable to the same sort of tactics we had employed in utilizing the unpublished Planck slide; they could digitize our results before we released them. Live by the slide, die by the slide.
Matias added, “Don’t get me wrong. Obviously, there is nothing more I would want than the result to be correct. But the discussions here have shaken my confidence and thus I hope you guys respond to the skeptics with a detailed explanation of exactly what you did with those Planck slides.”
By early May, Flauger and his collaborators had finished their analysis, and it didn’t look good for BICEP2. According to Flauger, we had used an incorrect estimate of the level of dust polarization in the Planck slide, a value four times lower than we should have used. If true, BICEP2 would go down as the most celebrated dust detector in history—tricked, like so many before us, by a dirty mirage.
But Flauger’s analysis wasn’t conclusive. He himself remained dispassionate, saying, “I hope there still is a signal. I’m not trying to pick a fight; this is how science works, that someone presents a result and someone else checks that. But it doesn’t usually happen in public like this.” He and his colleagues, as well as Uroš Seljak and Michael Mortonson, claimed our interpretation of Planck’s results was suspicious; but this didn’t mean we were wrong. Only new data, data unavailable to either BICEP2 or the groups doing the reanalysis, could tell us if the ax would eventually fall. The jury was still out.
Flauger’s analysis was thorough, and it took several weeks for the cosmology community to digest it. A tense atmosphere settled over the CMB community; this was a cosmic cliffhanger, slowplaying us all.
The beginning of the summer found the BICEP2 team in full panic mode, analyzing and reanalyzing data, responding to referee reports and putting out fires in the media and at scientific conferences. Paralleling our scientific battles was a battle in the media about the media. In particular, the propriety of the Harvard press conference became one of the hottest topics in all of science. The criticism we received about the way BICEP2 sought publicity was almost as intense as the heat we took for using the scrubbed Planck PowerPoint slide.
Scientists, pundits, and journalists alike questioned the decision to announce our findings at a press conference before peer review had been completed. While it’s impossible to know whether holding a press conference was good or bad for us specifically, the issue of if, and when, press conferences should be held is an important question. Such decisions are always stressful. For a physicist, a press conference is likely a once-in-a-lifetime event. If your results are correct, a press conference-worthy discovery might result in a Nobel Prize. If your result is erroneous, it might be the end of your research … and its press coverage.
For BICEP2, the standard practice (a months-long peer review process, which would then be followed by a press release) had many disadvantages, any one of which was worrisome on its own. In total, they were completely unpalatable. First off, during the peer review, rework, and resubmission cycle we could have been scooped by the competition. Second, we feared that sending the paper to a journal would be unfair, giving a particular group (referees and their friends) a head start on proposal submission. My field is so competitive that the only people who weren’t on BICEP2 who could have reviewed the highly technical aspects of the paper were competitors. Our first priority was to make a scientific presentation to communicate our results to all our peers in the cosmology community. By releasing BICEP2’s papers and data online, we allowed the entire community, not just two referees, to immediately begin a technical review. While some scientists praised our decision to go public first, analogizing our decision to the announcement of a blockbuster new drug, the criticism of BICEP2’s crowdsourced approach was, at times, brutal. New York Times reporter Dennis Overbye noted that this approach to the scientific sausage-making process wasn’t pretty, calling it a “dissection … a rare example of the scientific process—sharp elbows, egos and all.”
Three months after the press conference, in June 2014, the peer-reviewed version of the paper was published in Physical Review Letters. Taking the advice of two anonymous referees, we removed all trace of the dust data we took from Planck’s PowerPoint slide. Its deletion, we said, was due to the unquantifiable uncertainties involved in its analysis. But we were clear: BICEP2’s data were unimpeachable. It was only the interpretation which was up for debate. Planck promised to resolve the situation soon, because its newest data was set to be released in the next few months.
Planck had previously shown that the Milky Way’s dust emitted microwaves with a blackbody spectrum, just like the CMB. But the dust emission had a temperature of 20 Kelvin, instead of 3 Kelvin. Since the total energy of a blackbody increases as the fourth power of its temperature, the Milky Way’s emission was nearly 2,000 times brighter than the CMB’s emission.
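The "nearly 2,000 times" figure follows directly from the fourth-power scaling. A minimal sketch of the arithmetic, using the rounded temperatures quoted above and comparing two ideal blackbodies of equal emitting area (a simplification); the function name is only illustrative.

```python
def blackbody_power_ratio(t_hot_kelvin: float, t_cold_kelvin: float) -> float:
    """Ratio of total power radiated per unit area by two ideal blackbodies.

    Total blackbody power scales as temperature to the fourth power,
    so the constant of proportionality cancels in the ratio.
    """
    if t_hot_kelvin <= 0.0 or t_cold_kelvin <= 0.0:
        raise ValueError("temperatures must be positive (in kelvins)")
    return (t_hot_kelvin / t_cold_kelvin) ** 4


# Rounded values from the text: dust at 20 K, CMB at 3 K.
ratio = blackbody_power_ratio(20.0, 3.0)
print(f"{ratio:.0f}")  # 1975
```

With those rounded inputs the ratio comes out just under 2,000, matching the figure in the text.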
One of Planck’s channels, its frequency band at 353 GHz, was nearly insensitive to anything besides dust; it was a kind of sacrificial channel dedicated not to the cosmological gold we sought, but to the cosmic schmutz that might be obscuring it. We all held out hope that Planck’s 353 GHz channel would be the salvation, quantifying the qualitative PowerPoint slide and allowing an unaltered conclusion. It was going to be a long, hot summer.
With the appearance of the Planck 353 GHz paper came the beginning of the end of the BICEP2 team’s inflation elation. Although the Planck team was careful to release no data for the Southern Hole, the field where BICEP2 observed (perhaps out of fear we would digitize it), they made a blunt assessment of the potential amount of dust polarization contamination in the Southern Hole, saying it was of “the same magnitude as reported by BICEP2.” This meant dust was as likely a culprit for our B-modes as were inflationary gravitational waves.
Later, the Planck team produced an image of the Milky Way’s dust polarization, finally including our patch of sky, the Southern Hole. It was mesmerizing; large swaths of sky festooned with azure streamers, whorls of ocher, and swaths of amber garland. Dust was showing off in all its Van Gogh vainglory. “Visible certainty,” Galileo likely would opine, as he had with his Pleiades hypothesis. But this time he’d be devastatingly right. It was over. Eden had sunk to grief. Our Nobel gold couldn’t stay.
BICEP2 turned out to be a very precise dust detector. It also showed the public how science works: You put out a result, and other scientists work to test the result. You put your cards on the table, and leave it all out there for your critics. If and when they attack, you defend until you can defend no longer and the attacks subside. Only then, when both critic and supporter collapse, exhausted, can science be said to be settled.
For the BICEP2 retraction, there was neither press conference nor viral YouTube video. And while Planck, the fearful enemy fighter on our tail, came clean about the amount of dusty B-modes that our galaxy produced, they never did say anything about cosmic B-modes produced by inflation. It was BICEP2’s vision which was clouded: a bit by fear, a bit by greed, and mostly by bits of dust.
Brian Keating is a professor of physics at the University of California, San Diego; a Fellow of the American Physical Society; and the director of the Simons Observatory. He received the 2007 Presidential Early Career Award for Scientists and Engineers for his work on BICEP. Follow him on Twitter @DrBrianKeating
Adapted from Losing the Nobel Prize: A Story of Cosmology, Ambition, and the Perils of Science’s Highest Honor by Brian Keating. Copyright © 2018 by Brian Keating. Used with permission of the publisher, W.W. Norton & Company, Inc. All rights reserved.