On the New Nature of Surveillance Images
In the summer of 2024, Paris will host the next Olympic Games. To prepare, the French government has proposed special legislation which contains, among other provisions, an article allowing the use of algorithms to automatically analyse CCTV camera and drone footage. For the first time in France—and in the European Union—artificial intelligence (AI) will be legally used to detect “predetermined events that may threaten the safety of people.” Although facial recognition algorithms are explicitly banned from the project, human behaviours will be systematically analysed in real time, and biometric mass surveillance will become a reality in Europe. Although this legislation is, for the moment—in the words of the law itself—merely “experimental and temporary,” these methods will nonetheless continue to be evaluated after the Olympic Games, for several months and at large scale, in public transport and at recreational events. Such an “experiment,” presented as a test that would improve current AI video surveillance models, will likely pave the way for broader, permanent legislation on algorithmic video surveillance in France.
Of course, video-driven surveillance is not a new practice: the first public CCTV cameras were deployed in France at the beginning of the 1990s. Nor is the automated, algorithmic analysis of surveillance footage new: for two decades, traffic enforcement cameras—such as speed radars combined with automatic number plate recognition—have been deployed at scale on European roads. What is new, though, is the degree of sophistication of current AI models. Detecting complex events—such as precise human gestures or the presence of specific isolated objects—efficiently, in real time, and across a great variety of scenarios and contexts is a far greater technological challenge than measuring car speeds. These developments have been made possible by a family of AI models—“deep learning” models—which are, among other things, particularly well suited to analysing digital images: classifying them, segmenting them, detecting objects in them, and also—potentially—recognising human behaviours in them.
These models are striking because of their apparent understanding of the visual world. They are, nevertheless, only the latest iteration in a long history of image processing algorithms. In fact, the digitisation of images, more than sixty years ago, may have made possible the development of a true “grammar of the visible,” whose laws are written by algorithmic developments. This is the theory of philosopher Bernard Stiegler, developed in Echographies of Television, the 1996 book he co-wrote with Jacques Derrida: the digitisation of images is comparable to the development of written language thousands of years ago, as it constitutes a discretisation of a previously continuous medium. Such a discretisation allows for the development of new laws for structuring and analysing this medium: digital images become quantifiable and interpretable through features shaped by computational methods. Image processing algorithms thus determine new ways of ordering the world and making sense of it. And this is not just theoretical: driven by this new grammar, the nature of surveillance images has recently changed in several ways.
CCTV footage is, increasingly, created by machines—with their own sensors—for machines, which analyse its content, locally or remotely. Such images, called “invisible images” by artist Trevor Paglen, must be translated to be understood by humans, who are no longer in the loop. These images are pure data, which not only represent reality, but indirectly—through algorithmic pipelines—control and act on it. Indeed, they exist to be analysed systematically and in real time. They are no longer passive but are made active by the machines that compute them—and that can trigger processes when specific events, or objects, are detected. The filmmaker and theorist Harun Farocki, who reflected deeply on the nature of digital visuality, called such images “operational.” This is how AI-driven surveillance can be seen: invisible images made operational by a new grammar of the visible, shaped by algorithms and—especially today—deep learning methods. Such concepts, proposed by scholars and artists, help us think critically, and politically, about the current changes in automated surveillance.
"These images are pure data, which not only represent reality, but indirectly—through algorithmic pipelines—control and act on it."
Surveillance, made invisible and operational, seems to be becoming a purely rational, automated process, evolving far from our human world. Yet these changes can be understood in continuity with an older political will, driven by economic interests: the will to monitor and control. By creating an efficient “society of control”—in the words of philosopher Gilles Deleuze—in which statistical and algorithmic methods ensure permanent, ambient monitoring, such systems alienate humans, shape their bodies and minds, and influence their behaviours in public spaces. The use of this technology for the 2024 Paris Olympics is only one example of a broader, global evolution. A growing number of startups are, for instance, currently developing human action recognition methods to detect thefts in supermarkets. The justification for such automated monitoring is, in that case, linked not to public safety but to purely economic gains. This expanding, AI-driven surveillance industry could soon constitute a very large market, and many tech companies want a slice of the pie.
Pushed by a profit-hungry industry—and by supposed security constraints—highly complex new modes of surveillance are being developed and presented as justified and necessary. These opaque, computational ways of controlling human bodies and behaviours constitute a technological answer to a permanent societal feeling of crisis. But the crisis in question is not only one of public health or economics. It is also a crisis of trust in human nature—seen as unpredictable, inefficient, and suspect in a neoliberal society obsessed with quantification and automation. When “states of exception” become the rule, and are driven by uninterpretable algorithms, we risk losing our grip on the present—and on our own agency. If operational images are slowly transforming the world through automated surveillance processes, we should always remember their political nature: their deployment is not inevitable but a conscious choice, one that can be rejected.
Video: Dreamy Cops by Tristan Dot & Mathieu Rita.
The place of the university—as an institution—is ambiguous on the question of automated surveillance. Many research laboratories—particularly in computer science—indirectly contribute to the development of surveillance methods, and some university grants—from technology companies—can be considered a byproduct of the surveillance industry. The video above, titled Dreamy Cops—made by Mathieu Rita and myself—illustrates these questions. A potentially endless stream of images, resembling humans to varying degrees, was generated by training an algorithm on a research dataset for pedestrian detection. The slow, continuous evolution from one image to the next opens the door to reflection—on the transformation of surveillance into a frictionless simulation, and on the underlying role of the university in this evolution.
Tristan Dot is doing a PhD in Digital Art History in the Faculty of English. He is interested in the links between machine learning representations and art historical theories.