The Astrophysicists Who Faked It

The inside story of the gravitational wave signal injection.

By Jonah Kanner & Alan Weinstein

At 2:40 a.m., my phone woke me up. At least one of us was always on shift, and that night in September of 2010, I had volunteered to respond to automated text messages from our alert system.

As a graduate student at the time, I (Jonah) had helped build the first quick-response alert software pipeline for two gravitational-wave observatories, called LIGO (Laser Interferometer Gravitational-wave Observatory) and Virgo. This system was designed to search for astrophysical signals in data as it arrived, to alert people who could check if a signal seemed valid, and share the message with astronomers around the world if needed. Every alert carried the possibility of a positive detection—humanity’s first direct observation of waves traveling through the fabric of spacetime, predicted by Einstein in 1916.

I got out of bed and made a sleepy-eyed walk to the small workstation we kept in our apartment. I didn’t know it, but the alert was the beginning of a professional and emotional rollercoaster. I logged into our event database and started browsing plots. I didn’t stay sleepy-eyed for very long. The plots showed an unusually loud signal. More dramatically, the waveform showed the “chirp” pattern that we were all hoping to see, something characteristic of the gravitational-wave emission from a pair of black holes spiraling together and then merging. The chirp was familiar to me from simulations, but nobody had ever seen one appear naturally. I plugged in headphones and jumped on to a conference call.

Jackie Ferrentino

Nine of us—spread across the United States and Italy—began to talk the results over, wrestling with something too good to be true. Our hearts were racing. We needed to make a fast decision. If this dramatic signal was some kind of mistake, then there was no need for it to go further. After about 30 minutes of discussion, we agreed that the signal seemed valid, and pushed a button that spurred a collection of robotic telescopes to swing their gaze to the source location. Our log notes, usually dry, captured what we were all thinking that night: “Exciting!!!!! Very strong significant event …”

Einstein’s prediction that gravitational waves exist was debated by theoretical physicists for decades. It took until the 1960s for it to finally be accepted. Half a century later, there still hadn’t been a single direct detection of them. Tonight seemed like that could all change, and that I would be part of the discovery.

Or maybe not.

Do you remember your last fire drill? Was there a tense moment when you asked yourself, what if this is the real thing? That was the thought running through all our minds that September night.

About a year beforehand, in the fall of 2009, the LIGO and Virgo collaborations had agreed to set up a mechanism by which our own data could be faked. We created a small team that had the authority to secretly add a simulated signal to our gravitational-wave detectors, and then hide that fact from the rest of us.

This might seem, at first, like a needlessly masochistic thing to do. Measuring ripples that distort spacetime by less than the diameter of a proton is hard enough. Why, on top of this, would we try to trick ourselves?

To understand the answer, consider what it’s like being a scientist on the verge of something great, someone for whom playing a role in a new discovery is one of the greatest thrills possible. The opportunity to learn something new about the universe, to observe something that no one has ever seen before, is an incredible motivator. It’s one of the reasons people choose careers in science in the first place, and what keeps us working night shifts and weekends year after year. A major discovery also changes careers and reputations for both researchers and institutions.

The problem is that these high stakes don’t always mix well with the process of discovery and confirmation, which is often detailed, technical, tedious, and subtle. In a modern, complex experiment, distinguishing new science from an instrument artifact or a routine occurrence may be far from obvious. Our collaboration made the decision that the moment of potential discovery—when emotions are running high and reputations are at stake—is not the time to define a procedure for confirming a potential major scientific discovery. That time should come earlier, during a drill.

The ground-rules of the LIGO/Virgo drill were simple: We were told that a small number (perhaps zero) of simulated gravitational-wave signals (hardware injections) would be added to the data during our 2010 observing run. Each experiment searched for gravitational waves by monitoring the distance separating two distant masses. If a wave passed through, those distances would shrink or expand. The hardware injections mimicked a tiny change in the separation distance by gently nudging one mass with a flickering magnetic field, causing it to swing by a millionth-millionth-millionth of a meter. The resulting blip in the main data channel would look like the expected signal from merging pairs of black holes or neutron stars.

We wouldn’t be warned ahead of time or told afterward—not for a while, at least. Only a very small team of “blind injectors,” sworn to secrecy, would know the timing and nature of the injections. The team included about five people—those with the technical know-how to execute the injections. Even a lot of the top management would be kept in the dark. The blind injectors would leave evidence behind in off-limits data channels that the rest of us were instructed not to look at, on our honor.

Those of us outside of the blind injector group only had one choice: Treat the data like the real thing. This really messed with our minds. On the surface, here was the signal we had spent over 20 years trying to find. If the signal was real, our job was to analyze it as quickly as possible, burning the candle at both ends.

But we also knew there was a real possibility it might be a fake. The hardware injection team might be secretly laughing at our enthusiasm. Imagine bidding on a $100 million painting, all the while not being sure if it’s a forgery. We were working as hard as we had ever worked in our lives, and every day we teetered between exhilaration and exhaustion. Would this effort really pay off, or was it all a big joke?

IT CAME FROM SPACE: The source of gravitational waves detected on Sept. 14, 2015 is shown on this sky map of the southern hemisphere. The colored lines represent different confidence levels: purple is 90 percent, yellow is 10 percent. LIGO/Axel Mellinger

We nicknamed the event “the Big Dog” because we (erroneously, it turned out) localized it on the sky in the direction of the constellation Canis Major (and because, at the time, Harry Potter’s Sirius Black was very popular). For the next six months we labored over the data and ran through a suite of hardware checks. We developed new analysis tools and tried to figure out if the event was due to instrument or other terrestrial noise. The data passed every test.

We wrote a discovery paper; one of us (Alan) was a data analysis group chair and editor of the paper. We agonized over the words in the paper title: First Detection? First Observation? Discovery? Evidence for? Could we really claim “first detection,” if there was already a 1993 Nobel Prize? Our collaboration was large, and the spectrum of attitudes was broad. Some wanted to be extremely cautious and only claim “evidence,” not “detection.” Some insisted that we hold off on publishing until we see more events. Others wanted to be bolder; they felt that we had enough confidence to avoid being ambiguous or wishy-washy about our claim.

Hundreds of colleagues weighed in; endless discussions and arguments were applied to every word. Getting 700 skeptical scientists to agree on all the words in a paper, and on the required level of confidence, was a huge task, with complicated sociology. (In fact, one sociologist—Harry Collins—has written two books about the struggle of the gravitational-wave community to accept or reject various claims of detection.) In the end, we settled on the title “Evidence for the Direct Detection of Gravitational Waves from a Black Hole Binary Coalescence.” You can almost hear the compromise in that title.

In March 2011, we gathered in a hotel near Arcadia, California, to review all the evidence and the paper draft, and vote on submitting it to a journal. There were more than 300 people in the room and about a hundred more connected via the Internet. We brought lots of champagne. We discussed. We voted to approve the paper draft. Speeches were made celebrating the long road we had traveled, from building incredible detectors, to finding a signal, to finally executing the entire procedure for claiming a detection. We opened the champagne.

Then Jay Marx, the director of the LIGO Laboratory, who had been carrying a tattered envelope in his pocket for more than six months, took to the stage. He was about to open the envelope and tell us whether the whole thing had been a fraud.

If you came home from work and announced, “today I saw a flock of geese flying across the sky,” it’s unlikely anyone would doubt you. But what if you said, “today, I saw dragons flying across the sky.” Would your family accept your story? How much proof would you need? What if you really had seen dragons, but it was a one-time event?

We were preparing to announce a positive detection of gravitational waves based on a single event. We didn’t know how rare the waves would be—it was perfectly possible for them to be so rare that the next event would not happen in our lifetimes. How much evidence would be enough? A common guideline in physics is that a new discovery requires a “5-sigma” level of evidence, meaning an event is unlikely to be noise fluctuation, with a confidence of more than 99.9999 percent. We calculated that our candidate event was “louder” than the loudest and most rare noise fluctuations that we might encounter in thousands of years of LIGO and Virgo observations.
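As a rough illustration of what a "5-sigma" threshold means numerically, the sketch below converts a sigma level into the chance that pure noise would fluctuate at least that far. It assumes Gaussian noise and a one-sided test, which is a common convention but not necessarily the procedure the collaboration used; the function names are illustrative.

```python
import math


def sigma_to_p_value(n_sigma: float) -> float:
    """One-sided Gaussian tail probability: the chance that pure noise
    lands at least n_sigma standard deviations above its mean."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))


def sigma_to_confidence(n_sigma: float) -> float:
    """Confidence that a fluctuation of this size is not just noise."""
    return 1.0 - sigma_to_p_value(n_sigma)


p = sigma_to_p_value(5.0)
print(f"p-value at 5 sigma: {p:.2e}")
print(f"confidence at 5 sigma: {100 * sigma_to_confidence(5.0):.5f} percent")
```

Under these assumptions the tail probability at 5 sigma is roughly 3 in 10 million, which is where the "more than 99.9999 percent" figure comes from.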

So far so good. But quantifying evidence that the signal was not noise is not the same as quantifying confidence that it was a real signal. In fact, since nobody had ever directly seen a gravitational wave, we had no real way to express our confidence that it was real. Perhaps gravitational waves don’t exist, and there were no astrophysical signals for us to detect. If you believed that, you would reject our signal as due to noise or malfunction, no matter how unlikely that might be.

In science, the question of when to believe is a deep and ancient problem. There is no universal answer, and evaluating the merits of any potential discovery always includes considering the prior beliefs of the people involved. There is no way around this.
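One way to make the role of prior belief concrete is Bayes' rule. The sketch below is only an illustration: the priors and likelihoods are made-up numbers, not values from the LIGO/Virgo analysis, and the function name `posterior_real` is hypothetical.

```python
def posterior_real(prior_real: float,
                   p_data_if_real: float,
                   p_data_if_noise: float) -> float:
    """Bayes' rule: probability the signal is real, given the data.

    prior_real      -- belief, before looking, that a real signal is present
    p_data_if_real  -- chance of seeing data like this if the signal is real
    p_data_if_noise -- chance of seeing data like this from noise alone
    """
    numerator = prior_real * p_data_if_real
    denominator = numerator + (1.0 - prior_real) * p_data_if_noise
    return numerator / denominator


# Hypothetical inputs: noise mimics the event about 3 times in 10 million,
# and a real source would produce data like this half the time.
for prior in (0.5, 1e-9, 0.0):
    print(prior, posterior_real(prior, 0.5, 3e-7))
```

With these invented numbers, an open-minded observer ends up nearly certain, a deep skeptic remains unconvinced, and someone whose prior is exactly zero can never be persuaded, however unlikely the noise explanation is.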

So what were our prior beliefs? By 1975, there was clear and convincing evidence of the existence of gravitational waves from observations by the radio astronomers Russell Hulse and Joseph Taylor (who won the 1993 Nobel Prize in physics for their work). They observed a pair of compact neutron stars orbiting each other and losing orbital energy. Einstein’s theory predicted that gravitational waves would carry orbital energy away, reproducing exactly what Hulse and Taylor were seeing. So, the evidence from radio astronomy made it seem likely that gravitational waves exist.

We were less sure that we had the equipment sensitivity necessary to measure them. In 2010, the LIGO and Virgo detectors were “pathfinders”: We guessed that they would be insufficiently sensitive to detect gravitational waves from merging binary stars, but they would give us invaluable information for designing the next-generation advanced detectors. We also knew that both the observatories and their terrestrial environments had a variety of misbehaviors that could, if only very rarely, produce what would seem like an extraterrestrial gravitational wave signal. Our search pipelines were sufficiently complex that they might also have rare failure modes.

There could be a Nobel Prize waiting for us.

On balance, then, our prior beliefs were mixed. This mix tended to create two problematic approaches to interpreting the newly observed (and possibly fake) event: Trying to kill it (if it can’t be shown to be false, it must be real), and trying to elevate it (if it’s really something new, we can’t be sure of what we are looking for, so let’s keep our eyes wide open and look for anything that smells real). Both approaches are dangerously biased, because both choose to seek one type of evidence and ignore another. Our real goal was to minimize bias, and avoid intentionally killing or elevating anything.

This was the genius of the fake signal injection: Whatever the prior belief of an individual scientist might be, it gave him or her reason to doubt it. A scientist who believed that the current generation of instruments was simply not up to the task would have to allow for the possibility that it was. A scientist tempted to elevate a signal because of the benefits of a real detection would have to temper his or her enthusiasm to avoid making a false claim. The fake injection bugaboo forced us to keep an open mind, apply skepticism and reason, and examine the evidence at face value.

So when Jay Marx opened his envelope in Arcadia in 2011 and told us all that the Big Dog was a Big Fake, and that we had just completed the first successful discovery fire drill in gravitational wave observation history, we still treated it as a moment of celebration. We raised glasses of champagne, and toasted our fake success. It was a strange, hollow feeling. But clearly, Big Dog had motivated a flurry of work, including big steps forward in our ability to measure the masses of the source objects (the neutron star or black hole) using only the gravitational wave signal. Most significantly, our collaboration had agreed for the first time what standards we’d use, and how we’d minimize our biases. For the first time, we had decided that we had enough evidence for a detection.

It’s hard to appreciate how important this is. Some gravitational wave experiments had suffered in the past from overstated claims, and the LIGO/Virgo collaboration had grown in size to the point that a consensus over standards was hard to achieve. It was not clear before 2011 that any level of evidence would be sufficient.

Now—finally—a fake signal had made us feel prepared for real success.

In September 2015, almost exactly five years after the Big Dog, our low-latency pipeline alert went off again. This time, it was first noticed by researchers at the Albert Einstein Institute in Hannover, Germany, who sent an email with the words “very interesting event” in the subject to the whole collaboration. I remember turning on my computer and finding my inbox bursting at the seams. I gave up on the emails and called a colleague. He pointed me to a results page made with computer code I had helped write, which showed an obvious chirp signal. I got chills. I closed my eyes and took a second look. I stood up and stomped around the room. “What’s going on?” my wife asked me. “Well, I’m not sure,” I said. “But I think we found a gravitational wave.”

From 2010 to 2015, the Initial LIGO detectors at both the LIGO Hanford Observatory and the LIGO Livingston Observatory had been dismantled, and the Advanced LIGO detectors were built and installed in their place. These new detectors included a range of upgrades, all intended to improve the sensitivity of the instruments and make detections more likely. The improvements included larger mirrors, more powerful lasers, and active feedback loops for better seismic isolation. Instead of suspending the test masses by wires, they were now hung on thin strands of glass resistant to thermal vibrations. The improvements were designed to increase the reach of LIGO by 10 times, and so increase a thousand-fold the volume of space that could be searched for rare events.
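The jump from a tenfold reach to a thousand-fold volume follows from geometry: the volume of a sphere grows as the cube of its radius. A minimal sketch, assuming sources are spread evenly through ordinary Euclidean space (the function name `volume_gain` is illustrative):

```python
def volume_gain(reach_gain: float) -> float:
    """Factor by which the searchable volume grows when the detection
    distance grows by reach_gain, assuming volume scales as distance cubed."""
    return reach_gain ** 3


print(volume_gain(10.0))  # 1000.0
```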

We started an engineering run (designed to test our instruments, and not to take data) in late August 2015, and planned to officially start our first observing run on Sept. 18. During the engineering run, there would be no blind injections. By Sept. 12, the detectors were already working smoothly and with good sensitivity; so we left them undisturbed to observe the sky.

Good thing. In the early morning of Sept. 14, 2015, our detectors and associated search pipeline software identified an event consistent with a binary black hole merger some 1.3 billion light-years away. It was loud, and clearly inconsistent with detector or terrestrial noise, with a confidence of better than 99.9999 percent. Wow ... so soon after turn-on, and so loud; could nature really have been so kind? We named it GW150914.

But—could it be a blind injection, even though there were not supposed to be any? Within a few hours, many of us (including Alan) were on a telecon to discuss it. Mike Landry, one of the members of the 2010 “blind injection team,” was present, and said that there was no blind injection team convened for the engineering run, and that this wasn’t a blind injection. Had the discovery come a few days later, during our observing run, the blind injection team might have been active, and he might have been sworn to secrecy. But at this time, blind injections were simply not an option. Still, we doubted him. We asked to look at the secret blind injection data channel. The response: “Go right ahead.”

Spot the fake: The fake (left) and real (right) gravitational wave signals from 2010 and 2015, respectively. Data from The LOSC Team and The LIGO Scientific Collaboration

We looked and found nothing. The physicist in charge of hardware injections, Jeff Kissel, wrote so in our electronic logbook: “There were NO Transient Injections during G184098 Candidate Event.” It was his shortest e-log entry ever. Later, it won the award for best e-log entry of the year.

Okay. But could it have been a rogue injection? Perhaps some knowledgeable but disaffected colleague or ex-employee was carrying a grudge, and knew that a false injection would really cause misery among his or her ex-colleagues. This would not be so easy to do—or to rule out. The rogue(s) would have to get a lot of things right. The signal waveform would have to be precisely correct; the injections in our two detectors (separated by almost 2,000 miles) would have to be perfectly coherent in time, amplitude, and phase; and all of the data channels, secret or otherwise, would have to be sanitized to leave no trace. We checked dozens of data channels. Nothing was found that was at all consistent with a hardware injection of any kind.

We let ourselves conclude what we’d hoped for: This was not a drill. So the analysis began. Our starting point? The 2010 detection event. We reused that year’s detection checklist and adapted our old detection committee framework for the new data. We applied what we had learned about parameter estimation (our ability to measure the masses, spins, and other characteristics of a binary merger).

What we remember from the first few days of making the 2015 discovery was a surprising level of calm. In a time of intense pressure, our colleagues followed procedure, worked hard, and produced outstanding results. This was the greatest reward of the 2010 fire drill: We had the experience of working with high-stakes data, while trusting the process, the evidence, and our colleagues. If this were our first time encountering a “real” signal, doubt, worry, anxiety, and philosophical disputes could have kept us from ever saying with certainty, “this is real.” By learning to construct evidence that we had found the fake signal, we taught ourselves how to find a real one. The discovery fire drill had taught us how to use a set of evidence to believe in something extraordinary.

Interestingly, though, the question of “when do we have enough evidence?” reared itself again. Some researchers argued that we should not claim detection until we had seen a second black hole merger! If we never see a second one, the thinking went, how can we know the first one was not some kind of fluke? This discussion was contentious and tied to the fact that the advanced detectors had just started running. Some people argued that we needed to gain more experience with the new detectors to understand the noise and detector quirks. Others pointed out that if we got one real signal with just a few days of observing, the rate was likely high enough that we should expect to see more. Fortunately, nature gave us a break. A second probable detection arrived on Oct. 12, less than a month after the first discovery. The pairing convinced nearly everyone that we had enough evidence to publish. Better yet, in late December, there was another undeniably clear detection, leaving no room for doubt.

Five months after the detection of GW150914, having written more than a dozen papers about it, with the lead paper accepted for publication in Physical Review Letters, we publicly announced the discovery at a press conference at NSF headquarters in February 2016. This time, there was nothing hollow about the celebration.

Jonah Kanner is a research scientist at LIGO Laboratory, Caltech; he has 10 years of experience working with gravitational-wave data.

Alan Weinstein is a professor of physics at Caltech and head of the LIGO Laboratory astrophysical data analysis group at Caltech.
