The Mapmaker’s Dilemma
When I tell people that I worked as a geospatial analyst at the Centers for Disease Control and Prevention during the Covid-19 pandemic—and that I deployed twice to the pandemic response team—their reactions are often characterised by extremes. To be fair, one might think a CDC deployment during a global pandemic would involve whisking off to far-flung places, donning PPE, and orienting top brass in Emergency Operations Centers buzzing with activity. (For some deployers, such was likely the case.) In reality, my experience was similar to that of many whose jobs were relegated to spare bedrooms, unfinished basements, and desks wedged into the corners of living rooms. I sat, day in and day out, huddled over a laptop at home, with my only professional connections on the other ends of virtual calls. As geospatial lead on the International Task Force for the CDC’s Covid-19 Response, I was responsible for reporting key developments abroad using maps and related infographics.
The work was deceptively mundane. Each day I would comb through various websites—often those of a country’s Ministry of Health—looking for raw, publicly available Covid data to download, wrangle, and visualise. For many countries, this task was easy. For others, less so. Over the course of several weeks, I began to spot patterns in how data were collected and delivered. For instance, I noticed which countries reported cases daily without fail, which countries had testing backlogs that translated into sudden jumps in case numbers, and which countries seemed to be falling short in disease surveillance altogether.
Maps became a visual leitmotif of the pandemic. They received top billing on websites, the evening news, and social media. Often these maps took the form of choropleths, which shade geographic units like states or counties based on a scale of numerical intensity. The archetypal map saw states or counties with high numbers of Covid cases appear in dark, feverish red, with less infected states in a lighter, less threatening pink.
In my reconnaissance process, it did not take long to understand that the numbers of cases and deaths worldwide were collected and reported in startlingly different ways. Disease surveillance varies both between and within countries. Test positivity rates—an indicator of whether or not testing is sufficient—were preposterously high in some places. The data begged some nagging questions as the pandemic unfolded: was it possible that countries dotted with enormous megacities (whose residents could not afford to abide by mitigation measures) could really have lower case rates than locked-down countries in Western Europe? Were fatality rates really higher in Italy and Spain than in India and Mexico? (As it turns out, maybe; the pandemic upended many global health professionals’ assumptions regarding infectious disease vulnerability.)
For someone who tries to capture reality in maps so that the public can understand the data, these questions were disquieting. After all, I knew that while I, as a social scientist, appreciated the grey areas that were inevitable in the process of data collection, the public and the politicians who held sway, by and large, did not. People want certainty as a form of security. Despite being as judicious as possible about what data we accepted as sound, I could not resist the urge to look for cracks.
Yet as time went on, new datasets added layers of truth and clarity. Genomic surveillance began to uncover new variants around the world. Wastewater data revealed viral loads in communities. And in the gravest development, excess death data revealed the extent to which the previous years’ deaths were exceeded in 2020 and 2021. Enormous discrepancies in some countries suggested dramatic shortfalls in surveillance and reporting of actual Covid deaths. Smaller excesses in others highlighted the pandemic’s expanded toll: the heart conditions, cancer diagnoses, and diabetes cases that had been fatally neglected due to both constrained healthcare resources and public reluctance to pursue treatment.
Like many people working in data analytics, I quickly learned one of the defining paradoxes of our time: that despite the sophistication of our tools and the ubiquity of the data we feed into them, our picture of reality remains frustratingly blurred. Data often feel incomplete, and even if they don’t, the methods and assumptions underpinning their collection call into question the conclusions we might draw from data in the first place.
Many might despair and adopt an agnostic stance with respect to knowing anything at all. I think a more constructive approach is to understand the data exploration process as iterative. Investigating the origins and definition of data reveals new truths. And with each additional dataset, light can be shed on others. The layers of the onion can be peeled back one by one.
Which brings me back to the maps, those objects so often lauded for their visual intrigue yet laden with a long history of defining power. People often tell me that they love maps when I tell them I am a geographer. Others are quick to point out that maps have often been used to convey half-truths or even lies by the powerful—a sound argument that has become something of a cliché in academic circles.
There is some truth to these statements, but their monolithic tone obscures a more nuanced reality. Maps are complicated, multifaceted tools whose complexity is often overshadowed by their visual power and implied straightforwardness. They are the result of both conscious and unconscious choices—conventions and whims—on the parts of those who design them and slice the data that go into them. They can clarify and confound, baffle and bluff. When they visualise aggregated data, they often play up one truth at the expense of another. Such is the nature of data visualisation. Choices have to be made.
Crucially, it is that very process of choice that defines sound data science and geospatial analysis these days. The buzzy term “data-driven” seems to resound from every corner of the industry, and it carries with it a certain implication that data—when sufficiently abundant and appropriately presented—can take debate off the table. Such a notion is, I believe, misguided. Debate in the age of big data is needed more than ever.
In a field defined by life-and-death decision making during pandemics, the messy amalgam of analysis, debate, experience, and intuition—that is the recipe for making the most of our ever-growing mountains of data. Artefacts that we create along the way—the maps and infographics that are the currency of the digital data moment—should be seen not as reflective of objective reality, but as tools for interpreters.
Lance Owen [MPhil, 2006] received his PhD in Geography from the University of California, Berkeley. He currently leads the global health portfolio at Esri, the world’s leading geospatial software company.