Working at scale: what do computational methods mean for research using cases, models and collections?
AI for GLAM, Computational Humanities, History, Machine Learning, scale, STS
The ascent of scale
https://dx.doi.org/10.15180/221805/001
The keyword ‘scale’ has become as ubiquitous in the humanities and social sciences as it has long been polysemous. Beyond the senses of ‘climbing’ and ‘succession’ in its several etymologies, the sense in relation to measurement – a scale as a referent for counting – looms large among its meanings. When fused with the senses of ‘perspective’ and ‘representation’ (stemming from mapping and modelling), ‘scale’ begins to look like an ambiguous super-word, widely used in verb forms as in ‘scale up’ or ‘at scale’; forms now used unproblematically as terms of art in a range of fields. The merit of doing things at scale, we often hear in public discourse, is that bigger means better: conclusions self-evidently become stronger on the basis of more and bigger data. The rise of big data now has its own big and burgeoning historiography. New histories of data and information science have responded to twenty-first-century digital practices and infrastructures associated with Silicon Valley, social media and the cloud to re-interpret earlier episodes through the lens of (but also often with the use of) these new technologies. The big data produced in the nineteenth century, influentially called an ‘avalanche of numbers’ by the historian Ian Hacking (Hacking, 1991), invite (and have begun to receive) re-interpretation in light of twenty-first-century developments (Wilson, 2018).
A sense of incongruity nonetheless persists between what might be called the human scale, on the one hand, and the incomparable scale of information produced by institutions and states, but also scientists, on the other. This tension – as we might expect – is not new and has surfaced, in particular, in relation to the public understanding of science. Writing in 1926 the statistically minded biologist J B S Haldane outlined the difficulties in encouraging citizens to make the leap in perspective from the human scale to that of the millions and the billions needed to make sense of scientific modernity: ‘The average man complains’ claimed Haldane (1927, p 2), ‘that he cannot imagine the eighteen billion miles which is the unit in modern astronomy’ that make up a single parsec (in fact, ‘trillions’, by today’s convention). ‘Beyond those limits space does not have the properties ascribed to it by common sense, and visual imagination does not help us’ (p 4). This breakdown in habitual forms of reasoning explains our reliance on scale models to stand in for phenomena of interest. When common sense deserts us, shifting scales allows elucidation, explication and understanding by mapping ungraspable phenomena onto a canvas rendered visible; not only for pedagogy or popularisation, but also as part of the core ratiocination involved in scientific work, namely: hypothesis generation, research design and the mental manipulation of concepts.
It is in such a marketplace of ideas that ‘scale’ has become a zeitgeist, in particular for the digital humanities, increasingly coupled to the field of data science, its methods, thought-style and knowledge claims. A data-driven approach to research in the humanities, however, remains a minority pursuit, often to the frustration of those who advocate it (and the relief of its critics). The ‘data-fication’ of the natural sciences is a process long underway, with profound implications for scientific career paths, infrastructure and institutions. New national centres for data science (sometimes coupled with so-called ‘Artificial Intelligence’) abound, drawing in substantial funding from hopeful public and private backers. Some scholars concerned about the viability of the humanities tout court have sought to ground claims of significance in their willingness to adopt data-driven approaches: ‘change or die’. Nonetheless, data science institutes remain dominated by the natural sciences, which is unsurprising given the congruence between the statistical methods required for such work and the existing skills and training of the scientific workforce (as opposed, broadly speaking, to that of the humanities, quite apart from the differing research and publication conventions in actual existence). Data-fication in the natural sciences – which has turned on notions of scale as much as on congenial tools and methods in computer science – has nonetheless found awkward kinship with the new field of data science, which has its genealogy in the culture and politics of Silicon Valley, rather than in research laboratories. ‘Data scientist’ is a moniker of a nascent profession rather than of a disciplinary field; however, it is one whose perspective on the existing disciplines is to reduce all other fields of knowledge to the status of ‘domains’ to which its methods in turn might subsequently be applied. 
Applications of data science are therefore as likely to relate to the work of the security services as they are to helping diagnose ovarian cancer. It is this idea of a general-purpose technology which drives much of the interest in data science, whose strengths lie both in operationalising enormous volumes of data at scale and in shifting between the scales involved in its myriad and versatile research (and commercial) applications.
The promise of working at scale introduces a panoptic quality to research questions: it gestures towards the longue durée as well as the exciting possibilities of new forms of breadth or depth. These can be useful claims for disciplines staking out their significance or tussling for the attention of funders. Questions of scale have been the occasion for theoretical and methodological reflections on the part of historians, sometimes contentiously. Is a study focusing on one historic day more or less significant than one on the scale of centuries? And likewise: is the global scale a necessary condition for certain forms of understanding? In which case, are local studies necessarily parochial? Studies at a broad scale often include more information, perhaps requiring different methods and infrastructure to handle seemingly bigger data, which is nonetheless often analysed at a lower resolution of detail.
What do these trends and preoccupations mean for material history and research in museum collections? Scale has the potential to act as a useful prism through which to think about the way that – on the one hand – digital research methods might be brought to bear on collections – and on the other – the way that museums already operate with a sense of the problematics outlined above, with a theory of scale necessarily embedded in the work of curating and displaying objects. Referring to the Science Museum’s Making the Modern World as exemplary in this regard, Thomas Söderqvist remarks how its ‘big hall provides an immediate, almost intuitive, grasp of the longue durée’, and more generally how ‘one can move from the anecdotal features of singular objects to broader cultural and political themes and issues’ (Söderqvist, 2016, p 343). This shifting perspective experienced by museum visitors is, on reflection, a fundamental precept behind choosing to exhibit things in the first place. To see a fine example of a type of artefact can transport the viewer through the looking glass to a different time and/or place, whose significance is established by its curation. Söderqvist continues his paean to the inherently political nature of the museum by pointing to the way exhibits are now ‘accompanied by the visualization of digitalized historical data’. This is a vision of a new and radical practice of contextualisation at scale, made possible by the accessibility of metadata about objects created by librarians, archivists and others, as well as open-access statistics more generally. However, this is not the main aspect of the role currently being imagined for digital methods, machine learning (ML) and ‘AI’ in the GLAM sector (Galleries, Libraries, Archives and Museums).
An authoritative definition of data science lists among its features ‘the processing of large amounts of data in order to provide insights into real-world problems’. The question therefore suggests itself: what are the real-world problems the GLAM sector needs help with? Setting aside the challenges of seemingly indefinite austerity, the depredations of becoming cannon fodder for culture warriors, as well as the murky issue of sponsorship – which even data science cannot address – an equally authoritative recent overview of what we might call ‘AI for GLAM’ highlighted four broad areas where data science ‘could have – or already is having – an impact’ (van Strien, 2022, p 1): ‘Cataloguing and other forms of metadata generation. Enabling search and discovery of collections. Supporting and carrying out research. Public engagement and crowdsourcing.’ These are core areas of what museums do, and they have long incurred substantial costs in the form of human labour. One reason for excitement among museum managers therefore rests in the potential for doing more with less, and doing it by leveraging collections data ‘at scale’. One aspect of the rhetoric surrounding ML is its implicit ability to save money for organisations, a claim made more explicitly in other sectors, but whose implications for the work of curators have not been fully understood. One appeal of new technology has always been as a labour-saving device, which historians of science and technology are well-placed to point out. Another feature of new technology has also been its susceptibility to animal spirits and exaggeration, modelled formally in one instance as the ‘Gartner hype cycle’. A recent piece of research to examine possible impacts of ML in this sector concluded that the mood of information professionals had reached Gartner’s notorious ‘trough of disillusionment’ (Cox, 2021, p 28).
Cox found that there ‘remains scepticism that many of the products being labelled as AI are truly novel or can fully deliver on vendors’ promises. They are often perceived to be familiar technologies rebadged. If they do offer something novel it is more limited than the claim. How proprietary systems work is often a secret.’ The reliance on proprietary information systems creates many potential problems for institutions, especially in the areas of sustainability and access. These problems may nonetheless seem less serious than the alternative challenge of building (and maintaining) bespoke tools and systems in-house. A much greater understanding is needed of where genuine benefits can be obtained from the latest digital methods, and where the appropriate limits lie. Such judgments require a technical skillset often wanting among decision-makers who risk being seduced by ‘shiny tech’, creating ‘the pattern of technical solutionism: technologies in search of a problem’ (Cox, 2021).
GLAM organisations remain in a process of developing an understanding of how data science methods might productively be put to work in furthering their goals in research, curation and exhibition. To do this critically requires a more rounded and realistic sense of the processes and tendencies of data science, its potential, as well as its possible pitfalls. A large and ambitious scheme such as the AHRC’s ‘Towards a National Collection’ (TaNC) signals an intent from research councils to support the use of digital methods. To this end it may be useful to survey some salient experiences of related and neighbouring fields, whose earlier adoption of methods such as ML could offer useful object lessons. The following sections draw on certain of these experiences in turn, paying special attention to the role of data science, its central claims about scale and its increasingly influential thought-style.
English literature has been the site of perhaps the most robust discussion around the uses of ML as a tool, as a sub-set of debates about computational methods within the field more generally. The writings which are the topos of literary studies are, in a sense, found objects which happen to be inherently amenable to computational analysis. This differs (on the whole) from museum objects and is an accident of their material property of being texts which, if not already machine readable directly, can be digitised and, thereby, processed relatively easily into material for what has famously been called ‘distant reading’. The relative availability of early modern textual corpora in digital form has led to the (perhaps incongruous) rise of the eighteenth century and Romanticism as the leading edge of literary digital humanities. Although not caused by the rise of computational approaches, a certain degree of crisis in the humanities (and literary studies especially) has coincided with it. This has left digital scholars especially well placed to consider and introspect on the nature of their discipline, some of whom have raised valuable questions about the very nature of the literary object and its study.
The issue of scale has been crucial in narrating a new disciplinary self-understanding, in a sense, provoked by the rise of digital texts and their ubiquity in the wider culture. For James F English and Ted Underwood (2016) the history of the changing preoccupations of literary studies can be told in terms of scalar contraction and expansion over time. Interwar critics in Cambridge (among other places) ‘sought to establish literary studies as a distinct and legitimate’ practice, in contrast to philology and belles-lettres, by reducing the scale of their object to ever smaller units of analysis and so making it ‘teachable, testable, rigorous’. This scientistic approach, they claim, proved ‘a winning strategy’ as ‘[l]iterary studies massively expanded its institutional footprint and widened its cultural power’ (p 278). Further waves of expansion and contraction have followed with respect to the object of study, culminating in the new historicism of the 1980s which, for all its capacious inclusivity of cultural subject matter, could nonetheless take tiny units of time or text as high-resolution representatives of something bigger. They continue: ‘For all its expansive effect on the texts and topics deemed pertinent to literary studies, New Historicism was in this respect a ‘nanohistoricism’ (Liu, 2008). Today, by contrast, we confront something more like gigahistoricisms’ (English and Underwood, 2016, p 280). This expansion of scale has allegedly precipitated a ‘crisis of largeness’ as a result of the big data by which many scholars apparently feel overwhelmed. English and Underwood, however, reject any opposition between data on the one hand and theory on the other; an idea that might follow if we imagine that ‘big data’ involves a merely technical advance. 
On the contrary, there are new and sophisticated hermeneutical practices developing alongside techniques based on machine learning, for example, in relation to the use and interpretation of how inputs and variables relate to model outputs. These are more visible when deployed by natural and social scientists, whose methods are often more explicitly articulated, but are now in use by humanists as well. In any case, they point out, practically all scholars are now users of algorithms that underlie the search engines on which they depend, though without openly acknowledging their impact.
This issue is explored directly in a recent essay in relation to vector-space models of language, in which Dobson (2022) problematises the use of linguistic tools such as word2vec (widely used to construct arguments about patterns of word use and meaning). Dobson raises questions about the hidden interpretative work performed by such tools, in a discussion which shows scholars coming to terms with computational methods and generating a productive new critical discourse. Nonetheless, there remains a sense of discomfort about critical practice being conducted using black-boxed algorithms – which is the sub-text to much of the anxious commentary around literary studies, at times recalling the tired rhetoric of the ‘two cultures’ war. Sub-fields of literary studies concerned with what might be called more empirical questions – such as literary history – have been happier bedfellows for cutting-edge methods that ‘scale up’. Notable among these is the network analysis used by Ahnert and Ahnert (2019) to make inferences about history from the big metadata gleaned from literary and epistolary archives.
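The scalar logic of such vector-space tools can be sketched in miniature. The toy example below is not word2vec itself (whose vectors are learned from corpora rather than written by hand): it simply shows how ‘meaning’ is operationalised as geometric proximity between word vectors, measured by cosine similarity. The words and numbers are invented for illustration; the interpretative choices Dobson highlights, such as dimensionality and context window, are precisely what fix such a geometry in real models.

```python
# Toy illustration of vector-space semantics: all vectors are invented.
import math

vectors = {
    "king":    [0.9, 0.8, 0.1],
    "queen":   [0.9, 0.7, 0.9],
    "cabbage": [0.1, 0.9, 0.2],
}

def cosine(a, b):
    """Cosine similarity: the standard proximity measure in such models."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# In this invented space, 'king' sits closer to 'queen' than to 'cabbage';
# arguments about word use and meaning rest on comparisons of exactly this kind.
print(cosine(vectors["king"], vectors["queen"]) >
      cosine(vectors["king"], vectors["cabbage"]))  # → True
```

The hidden interpretative work lies upstream of this arithmetic: which corpus trained the vectors, and which parameters shaped the space, determine what ‘proximity’ can be taken to mean.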
However, scaling up in fields whose interest lies adjacent to or, in some sense, beyond texts creates its own challenges. The move from internal or pure hermeneutics to a form of distant reading which uses texts as a gateway to understanding social reality faces questions about how, and in what way, those texts represent the reality that lies beyond. This problem is closer to the one faced by museum curators and researchers aspiring to work digitally: what additional forms of mediation (beyond the traditional act of curation) stand between a collection and some machine-readable representation, or simulacrum, of its contents, to be analysed ‘at scale’? Social historians have been trained in source criticism to interrogate this nexus between text and reality; but how does this translate to the digital realm, with its multiple stages and varieties of processing? For many researchers working with newly accessible big historical data, it has been enough to point to the size – or scale – of their source material. However, this flies in the face of decades of practice in the social sciences around sampling and the question of representativeness. The facility of using tools such as Google’s N-Gram has dazzled too many scholars into overlooking the rather important question: but which books did Google digitise?
The form of historicism in most need of updating relates to this issue of representativeness, which cannot be simply overcome by sheer volumes of data. Critical practices in relation to the assembly and exhibition of collections have not routinely been translated into the digital realm, allowing naïve claims to be made on the basis of hastily assembled corpora. Pioneering work has begun in relation to digitised newspaper collections (Beelen, 2022) that points towards more dependable forms of reasoning at scale and may prove a model for other fields. Introducing the notion of an ‘environmental scan’, Beelen et al argue that it is only by reconstructing a sense of the historical newspaper landscape (by painstaking work with contemporary reference sources) that one can credibly evaluate (one’s own, let alone others’) historical claims based on large digital collections. Without such source criticism at scale we cannot say what it is that our sources represent.
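The logic of such an environmental scan can be shown schematically. The sketch below uses invented regions and counts: the basic move is to set a digitised collection against an independently reconstructed reference landscape (for newspapers, period press directories) and report what share of the known whole the digital corpus actually covers, before any claims are made ‘at scale’.

```python
# Schematic environmental scan: all names and figures are invented.
press_directory = {  # titles known to have existed, e.g. from period directories
    "London":   120,
    "Scotland":  45,
    "Wales":     20,
}
digitised = {        # titles actually present in the digital collection
    "London":   90,
    "Scotland": 10,
    "Wales":     2,
}

# Coverage per region: what fraction of the known landscape the corpus represents
coverage = {region: digitised.get(region, 0) / total
            for region, total in press_directory.items()}

for region, share in sorted(coverage.items()):
    print(f"{region}: {share:.0%} of known titles digitised")
```

A corpus that covers 75 per cent of one region’s known titles but 10 per cent of another’s supports very different claims about each, which is exactly the information that pointing to raw size conceals.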
Shifts in scale are a stock-in-trade for the allied disciplines of Science and Technology Studies and the History and Philosophy of Science (STS-HPS), acting as a useful heuristic for the scientific claims they have been concerned to understand, often based on some relation between micro- and macro-analyses. The classical locus of the laboratory and its controlled environment, away from the world outside, involves scientists in scalar modes of reasoning, whether implicitly or explicitly. Ubiquitous in the sciences, experiments, models and diagrams all involve some theory of scale on the part of their users. The study of metrology – of the literal practices of measuring and counting according to one scale or another – and how these have been indexed and calibrated over time, continues to be a fundamental issue in STS-HPS. Scientific instruments, most visibly in relation to the observational practices dependent on the optics of microscopes and telescopes, are frequently scalar instruments for seeing. More recently, the problem of measuring global heating and climate change has been conceptualised in terms of the very possibility of making observations at the planetary scale.
STS-HPS scholars have made rich use of case studies to represent bigger things, as well as often reflecting on the nature of the work done by case studies. The small scale of the clinical case study was no barrier to wide generalisation, for example, in Sigmund Freud’s time. Despite the widespread understanding of sampling and statistics, certain cases took on an outsized medical and then cultural importance, to become what Monica Krause (2021) has called ‘model cases’. For the purposes of thinking about scale, the uses of case studies – and especially model cases – become pertinent when we think about the way that such cases stand in for other phenomena. Krause makes a useful distinction for us between the material object of research and the formal object of research, or epistemic target. The material research object is a concrete object, accessed through particular traces, or ‘data’, that are produced by specific tools and instruments. It stands in for the epistemic target of the study – what a given study aims to understand better, which is not usually available for direct observation.
This practice will seem as familiar to researchers across many fields as it is unexamined. Some form of logic, according to which cases relate to broader knowledge claims, can be found everywhere from urban studies (in which the cases of ‘Chicago’ or ‘Los Angeles’ become the focus for different schools of thought, depending on their precepts) to biology (where frogs or fruit flies have played canonical roles). When the case in question is at the scale of the whole world – such as in climate science, and studies of its history – the problem of measurement can become acute: as Simon Schaffer once commented, where do you stick the thermometer? This problem explains the use of formal modelling – not to represent a thing, but as a model for a type of thing which can then be manipulated, the results of which ‘can then be compared to the world’ (Krause, 2021, p 27). It is this connection back to the real world of phenomena-to-be-explained that seems especially pertinent to digital and computational research in history. What role do datasets and the idea of scale play in mediating between historical sources and the reality beyond, which, in Krause’s terms, is its epistemic target? This is an issue we find addressed by theoretically sophisticated researchers who have wrestled with the question of ‘representativeness’ and the inevitable process of sampling that happens when using data to stand in for something else. However, too often we see datasets treated uncritically as if simply equivalent to the target of study. For researchers in the digital humanities, ‘getting hold of the dataset’ can itself feel like the Holy Grail, which perhaps explains why so many words are often spilled in describing and interrogating the dataset, whose properties can end up becoming the rather insular target of the inquiry. Datasets that are well-curated and freely available are likely themselves to become model cases, as other researchers are invited to repeat aspects of the research.
The increasing emphasis on reproducibility shows the influence of data science in setting new standards of best practice and is made possible by code sharing platforms such as GitHub, now a ubiquitous part of research infrastructure (and privately owned by Microsoft since 2018).
Historians have sought new metaphors to describe their accommodation to the panoptic possibilities of working across newly enlarged scales. Instead of the microscope or the telescope, we have the ‘macroscope’, hinting at a paradoxical combination of simultaneous depth and breadth (Graham, 2016). More recent work has highlighted the ways that big data can be leveraged to explore not only long-term trends but also short, momentary episodes, and to suggest associated metrics of salience (Guldi, 2022). However, the use of quantitative methods in historical studies is not in itself new: working with large numbers has long been practised within a certain strand of historical work related (principally) to demography and economics, in which scale has been posited as a means of overcoming the cherry-picking of evidence and the problem of case studies altogether. This ‘cliometric’ form of history reached its peak several decades ago but fell from favour (as popular history boomed instead) due to its remote, social-scientific idiom and lack of narrative and engagement. Can a version of cliometrics bound to digital methods reconnect with the public? One form this has already taken is the use of history as a form of forecasting. In a strange inversion of teleological thinking, big data has been used to pursue forms of ‘retrojection’, in which the past itself is modelled to see how well the model ‘predicts’ the outcome we now know to have been true. Such approaches may have a role where data and evidence are in fact lacking, but they do not appear to answer the call for history to become more relevant as articulated, for example, in The History Manifesto (Guldi and Armitage, 2014). A more promising use of history at scale may be in the use of techniques such as data visualisation and the design of front-end interfaces, such as those being developed in heritage institutions and museums to encourage engagement with their collections.
At the level of collections research, the use of big data (at scale) offers the prospect of discovering trends and patterns not observable even by expert curators. One refrain heard among data scientists is to ‘let the data speak’ without preconceptions. Notwithstanding the difficulty of accepting such precepts in the humanities (because the notion of ‘raw data’ is an oxymoron – Gitelman, 2013), we must ask what the nature of such trends and patterns might be, even in principle. The increasingly impressive track record of data science in the natural sciences is based on speculative attempts to detect meaningful (or useful) patterns that could not have been predicted from within disciplinary domains. This is hard to imagine in a discipline such as history; however, we can close our discussion with an example based on a type of source material for which ‘scale’ is an essential keyword combining several of its senses; namely, maps.
Cutting-edge new work in historical geography has involved creating – on the one hand – new methods of working with these source materials – and on the other – wholly new categories of analysis. One example of such a procedure can be found in recent work using maps to generate new perspectives on the historical landscape (Hosseini, 2021; 2022). Thanks to the enormous investment in creating item-level metadata within its vast collection of historical Ordnance Survey maps, the National Library of Scotland (NLS) made it possible for researchers to create a machine-learning pipeline based on computer vision, to ‘see’ the historical landscape in entirely new ways. The NLS not only catalogued each of its hundreds of thousands of individual historical map sheets, it also digitally geo-referenced them (which situates each sheet in space). This has made it possible for members of the public to view the historical landscape in a smooth and well-ordered manner using the NLS’s online mapping platform, in which map sheets are imperceptibly stitched together in both space and time. However, a further affordance of the underlying collection information was to make the map sheets machine-readable in ways that could not have been anticipated before the advent of the neural networks that allow computers, as it were, to ‘see’. After a long process of experimentation, a multidisciplinary team of researchers and curators developed a new typology for the historical landscape (different from anything in the existing literature) in relation – in this case – to industrial development. These (and other) features of research interest can consequently be detected by a neural network and then located ‘at scale’ across the entire collection of thousands of map sheets.
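The general shape of such patch-based map analysis can be sketched schematically (this is an illustration of the approach, not the published pipeline): each geo-referenced sheet is cut into small patches, each patch is labelled by a classifier, and the flagged patches are then located across the collection. The classifier below is a crude stand-in operating on invented pixel values; in the real pipeline a trained neural network plays this role.

```python
# Schematic patch-based map analysis: sheet, pixel values and the
# 'classifier' are all invented stand-ins for illustration.

def patches(sheet, size=2):
    """Yield (row, col, patch) tiles cut from a 2-D grid of pixel values."""
    for r in range(0, len(sheet), size):
        for c in range(0, len(sheet[0]), size):
            yield r, c, [row[c:c + size] for row in sheet[r:r + size]]

def classify(patch):
    """Stand-in classifier: dark patches are labelled as the feature of interest."""
    flat = [value for row in patch for value in row]
    return "feature" if sum(flat) / len(flat) < 0.5 else "other"

# A tiny invented 'map sheet' of greyscale pixel values
sheet = [
    [0.1, 0.2, 0.9, 0.8],
    [0.3, 0.1, 0.9, 0.7],
    [0.9, 0.8, 0.2, 0.1],
    [0.9, 0.9, 0.1, 0.3],
]

# Coordinates of flagged patches; with geo-referencing these map back
# onto real locations, allowing detections to be aggregated collection-wide.
detections = [(r, c) for r, c, p in patches(sheet) if classify(p) == "feature"]
print(detections)  # → [(0, 0), (2, 2)]
```

Because every sheet is geo-referenced, patch coordinates translate into places on the ground, which is what allows features found on individual sheets to be assembled into a dataset spanning the whole collection.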
This new data – the output of a machine-learning model – can subsequently be added back into catalogue and collections systems as an enrichment of metadata, or stand as a dataset in its own right to be used in fresh combination with other datasets. The value of this procedure depends both on the scale of the map (meaning its resolution), insofar as the features must exist at a sufficient level of visual detail, and on the broad scale of coverage offered at the level of the (national) collection.
This brief overview of one case suggests a model of how ML can be brought to bear productively for research on and with collections. The deployment of data science in the GLAM sector should not be restricted to the level of generic tools – such as for processing audience preferences, or algorithmic recommendation systems – which would be thin gruel indeed. Instead, research-led explorations of galleries, libraries, archives and museums which leverage the affordances of scale in ways that are both critical and well-judged point to much richer and more promising avenues of exploration. This flows from the way they combine innovation in tooling with the serendipity of combination and discovery on which computational data science, at its best, is based. GLAM institutions seeking to reap such benefits should put research at the heart of what they do, operating at whatever scale is most appropriate to the material in their possession, rather than at the scale being pushed, regardless, by the purveyors of new techniques and technologies.