This post was originally written in September of 2021.
OpenAI’s GPT-3 is the most powerful AI system I’ve ever used. Trained on billions of web pages and tens of thousands of books, the system can generate nearly any kind of text, from news articles to computer code to sea shanties.
The current version of GPT-3, however, was trained only on data gathered through October of 2019. That means GPT-3 has never heard of Covid-19, since the virus only began circulating broadly in December of that year. In testing the system, I wondered what would happen if I taught GPT-3 about Covid-19 and then asked it questions about the pandemic. How would it respond, and would its answers bear any resemblance to how Covid-19 has actually unfolded?
I decided to find out. The results were shocking, and gave me a new appreciation for the role that generative AI might play in guiding decision-makers during future outbreaks. To be clear, this is an experimental use of GPT-3, intended to explore the potential for future AI-assisted decision-making and especially to evaluate embedded biases; it’s not a production-ready use of the system. OpenAI does not say that the system is ready for medical uses, and you should always consult a doctor and follow the advice of public health leaders. Never rely on an experimental system when making medical or public health decisions.
To pose a question to GPT-3, you can’t simply program it, as you would a computer. Instead, you provide the system with a written prompt. GPT-3 then has two jobs: to infer your intent, and to respond by “completing” the prompt you’ve provided. For my experiment, I began by giving GPT-3 general prompts about the pandemic, such as “A novel coronavirus was discovered in Wuhan, China in December of 2019 and began to spread worldwide.”
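Mechanically, each of these experiments boils down to a single API call: hand GPT-3 a prompt, and read back its completion. Here’s a minimal sketch using OpenAI’s Python library as it existed at the time; the engine name and sampling parameters are illustrative stand-ins, not necessarily the exact settings I used:

```python
import openai  # OpenAI's Python client library

openai.api_key = "YOUR_API_KEY"  # placeholder; use your own key

# Hand GPT-3 a prompt and ask it to "complete" the text. The engine and
# sampling parameters here are illustrative, not exact settings.
response = openai.Completion.create(
    engine="davinci",
    prompt=(
        "A novel coronavirus was discovered in Wuhan, China in December "
        "of 2019 and began to spread worldwide."
    ),
    max_tokens=150,   # how long the completion is allowed to run
    temperature=0.7,  # higher values yield more varied completions
)

print(response.choices[0].text)
```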
These general prompts yielded a vibrant, diverse set of responses, all in wildly different formats. GPT-3 generated Socratic dialogs about Covid-19, imaginary scientific papers with fake authors, pretend Instant Messenger exchanges between people discussing the virus, and more. It seemed to struggle with the idea that my prompt represented reality; in many cases, it turned my prompt into a book or movie pitch, as if a sudden global pandemic were so unbelievable that it must be part of a work of fiction.
Eventually, I settled on two approaches that provided more structured and coherent results. For the first approach, I set GPT-3 up as a system designed to answer questions, something it excels at. I gave it a bit of background about the virus and my intent:
In 2020 there has been a global pandemic of a novel coronavirus. I am a highly intelligent question answering bot trained to answer questions about the pandemic.
I also gave the system two generic sample questions and answers provided by OpenAI, which are used to prime the system to act as a question-answering bot. I then started asking it questions about the pandemic.
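Put together, the setup looked roughly like the sketch below. The two sample pairs are adapted from OpenAI’s generic question-answering example; the exact pairs matter less than the fact that they establish the Q-and-A format:

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

# Background about the pandemic plus two generic sample Q&A pairs, which
# prime GPT-3 to behave as a question-answering bot.
PREAMBLE = """In 2020 there has been a global pandemic of a novel coronavirus. I am a highly intelligent question answering bot trained to answer questions about the pandemic.

Q: What is human life expectancy in the United States?
A: Human life expectancy in the United States is 78 years.

Q: Who was president of the United States in 1955?
A: Dwight D. Eisenhower was president of the United States in 1955.
"""

def ask(question: str) -> str:
    """Append a question in the primed format and return GPT-3's answer."""
    response = openai.Completion.create(
        engine="davinci",
        prompt=PREAMBLE + "\nQ: " + question + "\nA:",
        max_tokens=60,
        temperature=0,  # low temperature keeps answers focused
        stop=["\n"],    # stop at the end of the answer line
    )
    return response.choices[0].text.strip()
```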
First, I asked GPT-3 how Covid-19 came to be, with the question “How did the virus jump to humans? What were the intermediary animal hosts?” The system responded that “The virus jumped to humans from bats. The intermediary animal hosts are unknown.” That mostly squares with today’s science — although there are concerns about a potential laboratory leak somewhere in the process, the general consensus is that Covid-19 originated in bats, and any intermediaries are indeed still unknown.
Next, I asked about the virus’ impacts. In response to the question “What populations are most affected by the virus?”, GPT-3 responded, “The populations most affected by the virus are the elderly and those with chronic illnesses.” That’s completely true. According to a study in the journal Nature, age and a variety of underlying conditions are among the strongest predictors of mortality from the virus. It’s also not obvious — in the 1918 Spanish Flu pandemic, society’s closest parallel to Covid-19, young people suffered more than the elderly.
When I dug deeper and asked “What other conditions make the virus worse?”, GPT-3 even correctly predicted the specific conditions most associated with mortality. It responded that “asthma, diabetes, and kidney disease” would lead to worse Covid-19 outcomes. According to the Nature study, “severe asthma”, diabetes and “reduced kidney function” are indeed three of the most significant pre-existing conditions predicting whether a patient is at risk of death from Covid-19. Again, it did all this while knowing nothing about the actual Covid-19 pandemic beyond what I told it in my prompt — all its conclusions were presumably based on its knowledge of coronaviruses and of previous global pandemics.
Although it got age- and health-related risk factors correct, GPT-3 overlooked social ones. According to the Nature study, being male and being a Person of Color also put patients at higher risk of dying from the virus. When I asked GPT-3 about these racial and gender factors, it said that “The virus will impact males and females equally”, and that “Caucasians and Asians” would be most affected by the virus. That’s an interesting blind spot — while the system got the medical aspects of risk essentially spot-on, it missed the more complex social factors that change the virus’ impact.
It faltered in a similar way when I brought up mask-wearing. When I asked “Will people be willing to wear masks in the United States to stop the virus from spreading?”, GPT-3 responded that “People in the United States will be willing to wear masks to stop the virus from spreading. The virus is spread through the air, and masks will help prevent the spread of the virus.”
Again, the system got the science down perfectly — an increasingly strong body of evidence shows that Covid-19 is likely airborne, and face masks are still a major part of Covid-19 prevention efforts. Strikingly, the system got these scientific details right with very little information about the specifics of the virus itself. That’s more than can be said for health authorities in the early months of the actual pandemic.
As with racial and gender-based risk factors, though, the system faltered in predicting more complex societal responses to the virus, overlooking how politically polarizing mask-wearing would become. In the system’s idealized world, everyone would wear a mask, because doing so makes scientific sense. In the real world, that’s unfortunately not the case.
When I asked about governments’ responses, GPT-3’s predictions were more true to life. Asked “How will governments attempt to contain the virus?”, GPT-3 responded that they would “attempt to contain the virus by closing borders, imposing travel restrictions, and isolating infected individuals”. Closing borders was indeed one of the first government responses to the Covid-19 pandemic, and many travel restrictions remain in effect. Quarantine requirements for infected individuals are a common part of governments’ responses, too.
Moving on from short-form questions, I tried my second approach — giving GPT-3 more extensive scientific information about the virus (copied from Covid-19’s Wikipedia page for neutrality), and letting it write narrative responses. These varied in their accuracy, but several contained striking insights.
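Mechanically, this second approach looked much the same as the first; only the prompt changed. Here’s a rough sketch, with a placeholder standing in for the Wikipedia excerpt and one of the seed sentences discussed below:

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

# Placeholder for the scientific background copied from Covid-19's
# Wikipedia page; any neutral description of the virus goes here.
background = "..."

# End the prompt with a seed sentence and let GPT-3 continue the story.
seed = ("As soon as the virus's genome was sequenced, "
        "work started on developing a vaccine")

response = openai.Completion.create(
    engine="davinci",
    prompt=background + "\n\n" + seed,
    max_tokens=250,   # leave room for a longer, narrative completion
    temperature=0.7,
)

print(seed + response.choices[0].text)
```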
When I seeded the system with the text “As soon as the virus’s genome was sequenced, work started on developing a vaccine”, it wrote that “The first vaccine was developed in a record time of 9 months by an international consortium of pharmaceutical companies, and it was ready by September 2020. It was administered to 30 million people in the UK alone.” In another narrative, it predicted that a vaccine would be ready in October.
It’s debatable what “ready” means in this context, but vaccines did indeed enter Phase 3 trials in September of 2020 and were administered to the public by December. As of this writing, 48 million people have been fully vaccinated in the UK. It surprised me that GPT-3 correctly predicted the record-setting speed of Covid-19 vaccine development, and was generally correct about the massive scale of vaccination campaigns, even if it was off by 18 million jabs. As of mid-2020, many prominent experts didn’t believe a vaccine would be ready by the year’s end, much less in millions of arms by mid-2021. In August of 2020, for example, Paul Offit, an advisor to the FDA, told NPR that a year-end timeline was “unrealistic”. Even with limited scientific data, GPT-3 apparently knew better.
The system’s predictions about Covid-19 variants were surprisingly accurate, too. To prepare GPT-3 for scientific questions about variants, I first handed it a detailed scientific description of the virus’ physical structure. I then gave it versions of the prompt “If the virus mutates, expected sites of mutation which would increase virulence include”. The system completed my sentence with the text “erythrocyte binding site and the furin cleavage site.”
That shocked me. According to Nature, both the highly contagious Delta variant of Covid-19 and the Alpha variant “have altered furin cleavage sites”, and this alteration is thought to make the variants “even better at transmitting” than the original virus. GPT-3’s statement about furin cleavage sites appears to line up almost perfectly with the science. Given only a basic description of the virus’ structure, GPT-3 essentially predicted the Delta variant.
Even more interesting is the fact that in implicating the “erythrocyte binding site,” GPT-3 may be dreaming up a totally new kind of Covid-19 variant. Erythrocytes are red blood cells. Although Covid-19 isn’t considered a bloodborne virus, it does have major impacts on blood cells, and some evidence suggests that it infects them directly. If the virus mutated to infect blood cells more efficiently and travel through the blood, GPT-3 seems to suggest, it would become far more virulent than it is today.
GPT-3 certainly isn’t 100% accurate when it comes to making predictions about Covid-19. When I asked it to predict where the virus originated, the system responded with “Saudi Arabia”, which is inaccurate (and perhaps points to biases in the system’s training data). It also predicted that Pakistan and Indonesia would have high death rates, whereas in reality, their rates are relatively low. Overall, however, I was surprised by how well the system performed in my tests. Even with extremely sparse inputs, it gave surprisingly accurate predictions about the virus itself, as well as the course of the pandemic.
Asking a system like GPT-3 about Covid-19 may one day be more than just an academic exercise. AI systems often lack common sense, but they’re fantastic pattern detectors. In many cases, a well-trained deep learning system can spot patterns which humans would miss. If the world again faced a novel virus similar to Covid-19, future decision-makers could train a system like GPT-3 using what’s known about the novel pathogen, and then ask it targeted questions about the virus itself or the future course of the new pandemic.
Such a system could potentially find connections to previous pandemics, similarities to other pathogens, and other useful patterns. It could then use these to guide recommendations, suggest useful public health measures, or even predict the likely course of the novel virus’ spread. Again, while GPT-3’s predictions about Covid-19 weren’t perfect, it got key facts right (like airborne transmission and the importance of mask-wearing), using information that would have been available to public health leaders as early as February of 2020.
No public health authority should rely on an AI system to make recommendations, of course. But as they grow in power and reach, AI systems could become another tool in leaders’ tool belts, allowing them to quickly parse existing scientific knowledge for insights that could help guide in-the-moment decision-making. As the systems become better at citing their sources and explaining their output, their value as tools for guiding decision-making will only grow, because the validity of their predictions can be checked and vetted.
In my testing, GPT-3 got a lot of things about Covid-19 right and made a lot of valuable predictions. Perhaps the most sobering, though, was the system’s prediction about the future of Covid-19’s spread. Towards the end of my testing, I asked GPT-3 “When will the pandemic end?”
Its response? 2023.