
Data campaigning: between empirics and assumptions

Written by Jessica Baldwin-Philippi


The use of big data in political campaigns extends far beyond micro-targeting, and has been singled out by journalists and campaign staffers alike as a powerful force that is integral to electoral victory. Current scholarship on the subject remains more mixed, however. This article provides an overview of what we know (and don’t yet know) about the effects of data campaigning across various goals of political campaigns, alongside more public-facing narratives that present data campaigning as an all-powerful tactic, highlighting the gap between these two views.

Citation & publishing information

This paper is part of Data-driven elections, a special issue of Internet Policy Review guest-edited by Colin J. Bennett and David Lyon.


Campaigns’ use of data to craft and target messages has occurred for decades, and as the amount and availability of a variety of data points has increased exponentially, discussions of cutting edge campaigning tactics have centred on data and the practices enabled by it. Often, these pieces go beyond descriptions of novel tactics and make assertions that campaigns’ use of data is directly and causally linked to how well or poorly candidates are doing, or the overall electoral outcome. Articles following Trump’s surprising presidential victory in 2016 emphasised how integral Facebook data was to the campaign with headlines like “How He Used Facebook to Win” (Halpern, 2017), and “The Data that Turned the World Upside Down” (Grassegger & Krogerus, 2017). In the months leading up to the election, however, news stories were full of how data operations gave Clinton a “crucial tactical advantage” (Goldmacher, 2016) — until the day she lost the race, that is. In 2012, which was dubbed the “first big data election” (Hellweg, 2012), a similar story played out. Following Obama’s victory, journalists homed in on how “Obama trumped Romney with big data” (Thiessen, 2012), and political operatives on both sides of the aisle doubled down on the need to create a “culture of testing” within their ranks. Despite these assertions of data’s power and influence, knowledge of when, how, and if data campaigning actually works is more complicated. This article explores what the fields of political science and communication studies know about the empirical effects of data campaigning and highlights the gap between those realities and how data is so often described as all-powerful.

The uses of data and analytics in political campaigns represent cutting edge practices, but also have a longer history within campaigns. What we understand as “data-driven campaigning” — or using large data sets to either target messages to particular populations or test the efficacy of variations of messages and a variety of goals — rose in prominence first during the Obama 2008 campaign, but gained precision in its application and its public profile during the 2012 campaign, which many news outlets called “the big data election” (Hellweg, 2012). While the practices of 2008 and 2012 were novel in many ways, they were also deeply connected to ongoing practices and strategies. They were clearly linked to findings developed by academics in the field of political science at the turn of the millennium, and have roots in what are now considered the routine and mundane practices of polling that were developed in the 1940s and which campaigns still use to help craft their messages. Yet, the use of data and analytics in making evidence-based decisions about campaign strategy is often held up as radically new and deeply effective. Although attempting to answer important questions with more data is generally a better alternative to using no data, the assumption that just because data is involved the answers must be right is equally dubious.

This article delves into the complexities of data’s power and influence in political campaigning in the US, examining if, when, and how data campaigning has been shown to actually work. I analyse the US case because, due to a combination of enormous financial investment in campaigning and lax regulation around privacy and data use, data campaigning in the US far outpaces that in other countries (Dommett & Power, 2019; Dommett, 2019). Moreover, data use regulations and guidelines such as those in the EU and Canada prohibit targeting based on personal information like race, gender or religion, while political practitioners in the US have embraced such practices and made them central to campaigns at both the national and local levels. If we are to understand the vast terrain of data campaigning, doing so with the US case in mind provides a very full picture of the opportunities at hand, as well as their limits. Moreover, despite data use regulations in some countries, concern about the export of US data practices abounds, and it appears that data-driven practices developed in the US are indeed exported to other contexts by professionals at technology firms like Google or Facebook, off-the-shelf software companies like NationBuilder, or political consultants (McKelvey & Piebiak, 2018; Strömbäck & Kiousis, 2014).

Broadly, this article argues that data-driven practices have been much more productive at mobilising action, like getting out the vote and improving donation rates, than at persuasive goals of getting someone to support a candidate. It provides a comprehensive overview of what we know data-driven practices can and cannot do, and points to places where assumptions about their effects outpace empirical knowledge. It also discusses the ways that what David Beer (2019) has dubbed the “data imaginary”, or “how a faith in data emerges and then becomes embedded or cemented in social structures and practices” (p. 127), is at work in coverage of data campaigning. This happens in cases where the general claim of data’s importance is correct as well as in cases where such claims paper over conflicting evidence. Overall, I argue that narratives about data campaigning draw on and reproduce a view of data as powerful in ways that exceed our empirical knowledge of its role. Regulatory efforts that rely on this assumption of strong effects risk emphasising influence over more ethical and normative concerns about privacy, discrimination, and transparency and disclosure.

Section 1: Data campaigning and its impact across the campaign

Campaign goals and the types of data available to campaigns vary so widely that the term “data-driven campaigning” can mean significantly different things to different people within a campaign, staffers across campaigns, journalists covering campaigns, and certainly members of the public reading insider accounts of campaigns. Moreover, precisely what counts as data-driven campaigning has changed over time — what was cutting edge use of polling data in the 1940s has become routine, as has the use of “lifestyle” consumer data like credit card purchases and magazine subscriptions that was all the rage in the 1990s. Today’s novel data type, social media data that purports to provide insight into our deepest emotional states and analytics that track which messages get more attention and engagement, will likely be tomorrow’s boring data set, regardless of how effective it is currently or becomes over time. As data-driven or analytics-based campaigning has become de rigueur, campaigns at all levels have adopted the language (sometimes even more than the practices) of data, and applied it to whichever level of practices they are operating at. The openness of the term — polling data is in fact data, after all — coupled with the complexity of newer data practices offers this rhetorical flexibility, and a papering-over of the very real distinctions between a huge variety of campaign practices. This article parses those distinctions, providing an account of how many of these tactics have been presented to the public in ways that confuse, conflate, and contradict existing research about what works, and what does not. Repeatedly, what we see are claims that confuse and conflate substantially different approaches to data campaigning, and forgo empirical accounts of their effect in favour of narratives about technological prowess and power.

Targeting and testing

At the most overarching level, data campaigning involves two genres of practice: targeting and testing. They can each be put to use toward a variety of campaign goals, including persuading members of the public to support their candidates, or mobilising people to take some sort of action, most often donating money or getting out the vote (GOTV) (Tufekci, 2014). Any of these goals and genres of practice can be supported by access to a variety of types of data, including public voting records and census block data, campaign or party-supplemented databases, consumer or “lifestyle” data purchased from a third party vendor, and lifestyle-adjacent social media data, which can account for web browsing history, social graph data, and the algorithmic grouping of these data points into categories like emotional state and disposition. Many of these data points can be combined by a campaign or by a third party firm selling data itself (e.g., Catalist, Aristotle, or NationBuilder) or strategy services around how to test and target messages using this data (e.g., TargetSmart, Targeted Victory, etc.). This piece not only breaks down each of these types of data-driven campaigning, but discusses which of them have garnered public attention, which have been touted as revolutionary and powerful, and which have been empirically tested to assess their power and efficiency.

It is targeting, or using data to decide which messages go to what potential voters at what time during the campaign, that has a longer history, as even the earliest opinion polls provided information about what messages voters would respond to. Micro-targeting, or using an increasing number of data points to target smaller and smaller slices of the population, is an extension of these early practices. Targeting tactics have been celebrated before the digital turn in politics, dating back to attempts to segment the American public by ideology and political interests that began in the 1950s (Issenberg, 2012), or even, as Daniel Kreiss’ (2016) work has traced, the 1980s, when the Republican party created a set of index cards with detailed information about individual members of the voting public. In the digital arena, the 2012 election saw much discussion of both the Obama and Romney campaigns’ uses of micro-targeting for ads that ran on websites and before YouTube videos, as well as in video games (Otenyo, 2010), and on social media platforms. The ability to target audience segments in more specific and refined ways has only increased, as digital platforms like Facebook and Google as well as ad sales firms have developed ways to reach increasingly specific slices of audiences over time, and can charge a premium for doing so.

While much has been written about the possibilities of such tools and practices (Chester & Montgomery, 2017) and subsequent dangers of these possibilities, much less work testing its efficacy has been produced. Public discussions of targeting programmatic ads often assume that more and smaller data points lead to better outcomes, but studies of consumer outcomes have recently cast doubt on how much more effective micro-targeting is (Marotta et al., 2019). Beyond questions of its efficacy, targeting — and especially increasingly specific micro-targeting — has been particularly criticised for its implications for reducing opportunities for shared public deliberation (Howard, 2006; Kreiss, 2012; Tufekci, 2012), but a recent study from the UK shows that despite the ability to target, even major campaigns end up with messages that largely echo the narratives found in national-level ad campaigns (Anstead et al., 2018), raising the question of how different the content that results from micro-targeting really is. In the highly-publicised case of the Trump campaign’s use of Facebook’s “dark posts” — which allowed the campaign to functionally make ads invisible to non-targeted populations and were used to target Black Americans with messages arguing that Clinton was racist and should not be supported — similar ideas were hardly absent from the larger campaign, with Trump himself tweeting about these ideas and the campaign running a national ad on the topic, too (Hellmann, 2016; Savransky, 2016).

Testing, on the other hand, allows campaigns to empirically measure how well messages perform against one another and use that information to drive content production. While current technologies have made this exponentially easier, testing also has a longer history, as campaigns have long fielded polls, focus groups, and dial tests in which they test multiple messages across audiences. In a modern campaign, analytics-based testing is added to the mix, and campaigns can display messages to a variety of random or sampled audiences and track various reactions and engagement, from clicks to subsequent action like donating, to time spent on a page. For example, campaigns’ email operations, a common site of testing, can measure how message elements like subject header, different content, layouts, or action buttons affect how likely a recipient is to simply open the message, or take a subsequent action like donating money or signing up for an event. A/B testing, or testing versions of a message against one another, is profoundly helpful in figuring out how to best get audiences to take action in a particular, immediate case — for instance, which ad results in the most clicks, donations, or email sign-ups. But its relevance for understanding long-term dispositions or actions is less clear. This has raised some concern among practitioners about the long-term effects of messages that might work in the short term, but potentially have negative long-term consequences, such as using fear- or shame-based messages to drive fundraising or turnout (Brooks, 2018).
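To make the logic of an A/B test concrete, the following is a minimal sketch of how two email subject lines might be compared on open rates, using a pooled two-proportion z-test. The counts are hypothetical and the choice of test is an assumption made for illustration; nothing here describes any particular campaign’s tooling.

```python
from math import erf, sqrt


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Compare two response rates with a pooled two-proportion z-test.

    Returns (rate_a, rate_b, z, two_sided_p_value).
    """
    rate_a = successes_a / n_a
    rate_b = successes_b / n_b
    # Pooled rate under the null hypothesis that both versions perform equally.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / standard_error
    # Standard normal CDF expressed through the error function.
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - normal_cdf)
    return rate_a, rate_b, z, p_value


# Hypothetical counts: emails opened out of emails sent for two subject lines.
rate_a, rate_b, z, p_value = two_proportion_z_test(200, 1000, 250, 1000)
print(f"open rate A={rate_a:.3f}, B={rate_b:.3f}, z={z:.2f}, p={p_value:.4f}")
```

Note that the outcome measured here is an immediate action (opening an email). The test says nothing about longer-term dispositions, which is precisely the limitation discussed above.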

While fundamentally different, targeting and testing can be, and often are, used in tandem. A campaign can target a message, then test it within that targeted audience, or test messages across audiences. Data about both targeting and testing are often provided by those technology firms selling digital and social media ads. Despite the differences, in much coverage of digital campaigning tactics, the lines between targeting and testing are blurred, and success in one arena is often used to define or show evidence of success in another. In 2016, the Trump campaign’s targeting tactics were widely credited for the victory in profiles emphasising the aforementioned “dark posts” and “psychometric targeting” offered by Cambridge Analytica, which used data related to users’ psychological state outside of politics and demographic information to segment audiences to receive different campaign messages (Grassegger & Krogerus, 2017). Fundamentally, the idea of psychographic targeting hinges on targeting users based on how they fit into one of five psychological profiles (openness, conscientiousness, extroversion, agreeableness, neuroticism). And yet, in describing how and why these tactics were productive, the Grassegger and Krogerus article blends the two as it notes that the campaign “tested 175,000 different ad variations for his arguments, […] in order to target the recipients in the optimal psychological way”. While these tests may be useful for seeing which images or text perform better, they are functionally A/B tests, and not particularly tied to the use of the five psychological targeting categories.

Persuasion vs. mobilisation

Both targeting and testing can be used for a variety of campaign goals related to both mobilising audiences to take particular action and persuading them to support a candidate they are not already supportive of. Although data and digital teams often have their hands in both mobilisation and persuasion activities, the two goals are fundamentally different — convincing someone to take an action they are likely to be supportive of, versus convincing or even changing someone’s mind about a candidate or issue. Campaign organisation in the US often reflects this divide, with persuasion-oriented goals typically being the purview of the Communications team, and mobilising GOTV and mobilising donations being run by the Field and Finance teams, respectively (Blodgett, 2008). While digital and data teams support all of these efforts and often hold equal power in campaigns, the separation of these efforts is illustrative of their core differences.

Persuasion is tremendously difficult to measure empirically. While polling, dial tests, and focus groups may get at changes in attitudes or immediate reactions to a message, disentangling those attitudes from more macro-level dispositions and contexts, and from competing narratives in the world, is hard. And yet, claims about the persuasive power of any number of tactics abound. From assessments that a campaign’s message was better so they won the race, to claims that highly targeted ads changed people’s minds, assessments of persuasion oversell certainty and undersell the role of exogenous factors such as party identification, the state of the economy, or the obvious advantages of incumbency. These claims also fundamentally conflate persuasion and mobilisation. When campaigns talk about using A/B testing or randomised controlled experiments to figure out what messages worked, “working” is defined not by a change in belief, which would be all but impossible to measure, but by being mobilised to take a particular action, be it signing up for a newsletter or donating money. Even more commonly, when journalists and pundits discuss who was persuaded in an election, they are necessarily discussing who was mobilised to vote. To say persuasion is tremendously difficult to measure does not mean campaigns should abandon all attempts to cause it — of course campaigns will develop narratives, test them, and target particular populations with them in a fashion that makes logical and contextual sense, and targeting may very well work better than no targeting at all. The point is, rather, that the empirical backing of these practices is neither easily identifiable nor clearly the case, and claims by campaign operatives and political professionals that it is — whether they are about overarching narratives or highly targeted messages — are overly confident.

Claims surrounding the role of Cambridge Analytica’s psychographic targeting in both Brexit and the 2016 US election provide a useful example of these slippages. Although the tool was touted as a “psychological warfare mindfu*k tool” by creator-turned-whistleblower Chris Wylie (Cadwalladr, 2018), there simply is not much empirical evidence of its persuasive capacity. When former Cambridge Analytica CEO Alexander Nix boasted that the company could “predict the personality of every single adult in the United States of America” (Grassegger & Krogerus, 2017), what he literally meant was that the firm could assign a “personality” category to everyone, with conflicting accounts of how precise that designation was and little clarity on whether it made a difference in political beliefs (Sumpter, 2018). Political beliefs, as opposed to consumer behaviours like becoming interested in a new product, are especially difficult to dislodge, and when audiences notice that political content in social media is an advertisement, they react more sceptically toward it than toward content from consumer brands (Boerman & Kruikemeier, 2016). In just the same way that data-oriented practitioners criticise traditional messaging consultants’ “gut-based” belief that a message works (which, it should be noted, does make use of “data” gained from dial tests and focus groups), they should be hesitant to oversell their understandings of what data shows about persuasion. While some journalists have written about the dubious nature of some of these claims, particularly those of Cambridge Analytica (Confessore & Hakim, 2018; Lapowsky, 2016a), many have also been more than willing to parrot campaigns’ claims that they figured out a failsafe way to persuade citizens using a variety of data points. Much of this coverage echoes the earliest digital coverage of the Obama campaign, which often focused on the campaign’s use of social media, overselling the persuasive and mobilising power of novel digital platforms.

We actually do know quite a bit about what data points and practices are important for mobilising a variety of actions, as it is easier to test when a clear outcome happens or doesn’t. As field experiment experts David Nickerson and Todd Rogers (2014) have explained in detail, in order to decide which people are correct targets for any mobilising goals, from mobilising turnout or GOTV to fundraising, campaigns create predictive scores for each individual, which model the likelihood that someone will undertake a specific political behaviour, support the candidate, or respond in any way to a stimulus. These scores are often used in tandem, such as when Field teams need both a clear picture of support and voting behaviour in order to determine who to target with GOTV efforts.

The data that goes into these scores includes publicly available macro-level data like that from voting records and the census, purchased macro-level data that may be more up to date than those sources, purchased lifestyle data, and user-provided data gained from citizens directly reporting that information, or from analytics and cookies that track it through their web use (Nickerson & Rogers, 2014; Dommett, 2019). In practice, behaviour scores rely fundamentally on data that concerns prior behaviour — so, prior voting behaviour is integral to GOTV, while prior donations are integral to fundraising. Support data is similarly heavily dependent on publicly available voting record data, followed by publicly available census data, and direct voter contacts, wherein campaigns ask people how supportive they are or which issues they are interested in. Support scores can also make use of analytics that can determine how someone is interacting with the campaign’s digital messages to gain a better picture of those on the higher end of the scale, but this type of data is less useful for those with lower scores (Nickerson & Rogers, 2014). In lieu of getting input from everyone, these data can also be used to model increasingly specific scores for others who have similar major data points, but haven’t been contacted by the campaign or party. Responsiveness scores are an indicator of whether someone is likely to respond to a campaign’s message or call to action, and are largely based on testing that the campaign does. They make use of a wide variety of data acquired through users’ various digital footprints – from when and which campaign emails they open to whether they sign up for an event – and because they are not based on identity, could be useful in countries with identity-based targeting restrictions.
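The tandem use of support and behaviour scores can be illustrated with a short sketch. Everything in it is hypothetical: the `VoterScores` record, the 0 to 1 scale, the threshold values, and the rule of contacting likely supporters whose turnout is uncertain are assumptions made for illustration, not a description of how any campaign or vendor computes or applies its scores.

```python
from dataclasses import dataclass


@dataclass
class VoterScores:
    """Hypothetical per-voter predictive scores on an assumed 0-1 scale."""
    voter_id: str
    support: float  # modelled likelihood of supporting the candidate
    turnout: float  # modelled likelihood of voting without being contacted


def select_gotv_targets(voters, min_support=0.7, turnout_band=(0.3, 0.7)):
    """Return IDs of likely supporters whose turnout is uncertain.

    The thresholds are illustrative assumptions: near-certain voters need no
    reminder, and turning out likely opponents would be counterproductive.
    """
    low, high = turnout_band
    return [
        v.voter_id
        for v in voters
        if v.support >= min_support and low <= v.turnout <= high
    ]


voters = [
    VoterScores("v1", support=0.90, turnout=0.95),  # supporter, will vote anyway
    VoterScores("v2", support=0.85, turnout=0.50),  # supporter, uncertain voter
    VoterScores("v3", support=0.20, turnout=0.50),  # likely opponent
]
targets = select_gotv_targets(voters)
print(targets)
```

The point of the sketch is only that neither score suffices alone: a turnout score without a support score would direct GOTV contact toward voters who may favour the opponent.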

Some types of data have little value in terms of direct effects on persuasion or mobilisation, yet still have some value to campaigns. Data gained by campaigns through both testing and analytics connected to who read or opened content is particularly helpful in creating the responsiveness score, which assesses how likely an audience member is to respond to any stimulus, but it is much more effective for better understanding those who already support a candidate. That said, in much public discussion of data campaigning, it is the information that is digitally “provided” by voters — assessments of this data that are provided by web browsers, social media platforms, or third parties — that is touted as revolutionary, when in fact it has been shown to have limited predictive power. Additionally, data may not hold predictive value, but can still be a valuable asset to campaigns because they can sell or rent access to it (Tactical Tech, 2019).

When creating and refining these scores, information about known supporters gets increasingly richer, which may lead to mobilising power, but is much less likely to improve persuasive ability about a candidate overall. Laws regulating data collection and use by either corporate or political actors, such as the EU’s GDPR, pose particular constraints on the ability to target for either mobilisation or persuasion (Kruschinski & Haller, 2017), and these regulations often place even greater importance on voting history, as it is less often regulated than data associated with identity such as gender or race, or more “micro” level data such as web use. While privacy and legal scholars have highlighted the possible loopholes in such regulations (Bennett, 2016), and argue that despite restrictions, targeting in particular can be engaged (Dobber et al., 2017), engaging in such workarounds often involves using more and more proxies for intended categories, thus reducing the efficacy of such data, and increasing dependency on known and macro-level data points like voting history.

In a campaign, staffers rely on tests conducted in prior election cycles that have made empirical findings concerning what data points matter when targeting potential voters to get them to the polls. For mobilising turnout, Eitan Hersh’s (2015) work has shown not only that publicly available data found in voter records is key, but that purchased, hyperspecific data — such as “lifestyle data” or consumer histories that tell you what magazines people subscribe to, what kind of car they purchased, or what their spending habits are like — is often redundant rather than additive to this public, macro-level data. Moreover, this data has been shown to have no predictive power on its own, and is largely redundant to that which is publicly available in the US (Hersh, 2015; Nickerson & Rogers, 2014). Even other publicly available data from the census that people broadly consider useful to campaigns, like income or education level, doesn’t hold explanatory power when controlling for voting history. Despite a lack of evidence concerning their importance, purchasable hyper-specific data have a storied history in campaigns, as they were touted by the 1996 Clinton campaign for their ability to produce smaller populations the campaign wanted to target, like “soccer moms” or “pools and patios” (MacFarquhar, 1996). Even if, as the available data and predictive modelling improve over the coming years, this data adds marginal benefit, use of the data is still risky, as it is a real threat to turn out voters who are not firmly in your corner.
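What it means for a data point to be “redundant rather than additive” once voting history is controlled for can be shown with a toy calculation. The counts below are synthetic and were invented purely to illustrate the logic; they are not drawn from Hersh (2015) or any other study, and the magazine-subscription flag is a hypothetical stand-in for lifestyle data.

```python
from collections import defaultdict

# Synthetic records: (voted_last_election, magazine_subscriber, voted_this_election)
records = (
    [(1, 1, 1)] * 48 + [(1, 1, 0)] * 12    # prior voters, subscribers
    + [(1, 0, 1)] * 16 + [(1, 0, 0)] * 4   # prior voters, non-subscribers
    + [(0, 1, 1)] * 4 + [(0, 1, 0)] * 16   # prior non-voters, subscribers
    + [(0, 0, 1)] * 12 + [(0, 0, 0)] * 48  # prior non-voters, non-subscribers
)


def turnout_rates(rows, key):
    """Group rows by key(row) and return the turnout rate within each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[2])
    return {group: sum(votes) / len(votes) for group, votes in groups.items()}


# Without controls, subscribers appear far more likely to vote.
raw = turnout_rates(records, key=lambda r: r[1])

# Within each level of voting history, the subscription flag adds nothing.
stratified = turnout_rates(records, key=lambda r: (r[0], r[1]))

print("by subscription only:", raw)
print("by (prior vote, subscription):", stratified)
```

In this constructed example the apparent gap between subscribers and non-subscribers arises only because subscribers are more often prior voters; within each voting-history group the two turnout rates are identical. Real data would be noisier, and the sketch makes no claim about effect sizes.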

Micro-targeting has greater potential to mobilise other, less zero-sum actions, such as fundraising, or even driving smaller scale actions like email sign-ups, or click-throughs. It can also be supplemented by immediate and short-term A/B testing to increase efficiency, whereas the only test of GOTV comes on election day. So, while micro-targeting is not instrumental to mobilising GOTV efforts, the combination of targeting and testing could be useful for other campaign needs. Yet micro-targeting for GOTV and fundraising are often conflated, and while some news reports of 2016 highlighted the fact that the Trump campaign’s micro-targeted efforts may have had meaningful returns for fundraising in particular, this is a fundamentally different practice than GOTV.

Overall, the data on what works and what doesn’t is far less clear, and the tactics of campaigns are far less obviously productive than either campaigns or journalists imply. What causes mobilisation (and does not) is much clearer than that related to persuasion. Mobilising voters to get to the polls relies most substantively on data that is publicly available in the US, found in voter records, and concerns macro-level information like prior voting record, and more precise micro-targeting data like the consumer choices that make up “lifestyle” data have not been shown to be of great import. While other data such as demographics or even analytics related to campaign email signups and behaviours can contribute marginally to the scores campaigns create about people, those are small nuances to the scores and haven’t been shown to have great predictive power. Research has begun to show that digital or social media ads can help mobilise turnout (Haenschen & Jennings, 2019), but those findings simply involve showing social media ads to a city’s residents, and are not targeted using any more categories than geography. With findings showing that negative ads in general are unlikely to work for persuasion (Lau et al., 2007), but emotions like anger do encourage partisan responses to false or biased information (Weeks, 2015), we should consider the possibility that highly targeted negative ads may be of benefit for mobilising supportive audiences. Micro-targeting and increasingly specific data points are much more likely to yield impressive results for fundraising, and other, less zero-sum actions than effect change in turnout or persuasion. That said, even in these cases where that seems likely, the effects of targeting are difficult to disentangle from the effects of testing that is also undertaken. 
Thus, newer tactics like using Facebook’s “Lookalike Audiences”, which allow political practitioners to find targets who share demographic and lifestyle qualities with those they already have contact information for, or targeting users who “liked content related to X politician”, are much more likely to be of benefit in mobilising donations and email signups than in causing a change in political opinion. Moreover, there is little to no empirical research that shows that something like “psychographic” targeting works to persuade people, or does much other than deepen existing commitment.

In each of these cases, there may be meaningful reasons to use new targeting or testing abilities that lie outside of known empirical outcomes — in cases where no effects have been shown empirically, there is unlikely to be a penalty for engaging in them, and the everyday work of campaigning largely revolves around crafting narratives and reaching out to voters in ways that may not have empirical effects. In highlighting the gap between what is known about data campaigning and how it is discussed in public, this article seeks to decouple the assumption of empiricism and objectivity associated with the very fact of being data-driven.

Section 2: Data-campaigning in the public eye and the “data imaginary”

If this rather inconclusive state of affairs is the backdrop for the past decade’s narratives about what wins elections, how and why are these narratives compelling? More than merely stating their productivity, accounts of data campaigning — both those from political professionals themselves and those created by journalists covering political campaigns and political technologies — build stories about their power. Discussions of the power of data campaigning connect back to discussions of the power of computing more broadly. As Woolgar (2002) has argued, talk of and hopes about digital media are often infused with a sort of “cyberbole” or “exaggerated depiction (hyperbole) of the capacities of cyber-technologies” (p. 9). Similarly, Vincent Mosco (2004) traces the idea of the “digital sublime”, in which the digital world elicits “hymns to progress” that, following a Burkean notion of the sublime, are preoccupied with and astonished by the digital, and unwilling to apply reason to its phenomena (pp. 22-23). Thus, its power is assumed, without need to prove any clear effects. Other scholars have written about how data, as a particular subset of digital practices and tools, has been the site and topic of similar hype. In his book The Data Gaze, David Beer (2019) describes “the veneer of knowing that aims to draw people into a data rationality” as central to the “data imaginary” (p. 4). In Beer’s vision of the data imaginary, six qualities or themes – that data is speedy, accessible, revealing, panoramic, prophetic, and smart – are key.

In the following section, I show how narratives of data campaigning focus especially on the data imaginary’s themes of revealing, panoramic, and prophetic. For Beer, data as revealing refers to the idea that data “are represented as being the means by which ‘hidden’ value might be unearthed or new value might be tapped” (p. 25), and panoramic refers to the idea that “data analytics shine a light on blind spots[…] in which nothing is outside of the knowledge that is produced by the data” (p. 26). In both of these qualities, information or its value is rendered visible by the use of data. Similarly, danah boyd and Kate Crawford (2012) have argued that understandings of “big data” hinge on the “widespread belief that large data sets offer a higher form of intelligence and knowledge that can generate insights that were previously impossible, with the aura of truth, objectivity, and accuracy” (p. 663). Beer’s definition of the data imaginary’s theme of prophetic emphasises that data “open up a world in which it is possible to anticipate what will happen and respond accordingly” (p. 27). The data imaginary is important not because it reveals a lie, but because it demonstrates how data practices that are effective are framed in the same ways as those that are less rigorously tested (or completely untested), resulting in oversold claims about the power and impact of data practices as a whole. What follows is an initial overview of how, rhetorically, public-facing narratives of data campaigning have fallen into these themes.

Themes of data as panoramic and prophetic have long dominated narratives of data campaigning. Sasha Issenberg’s (2012) best-selling book, The Victory Lab, places current data campaigning practices in a historical context, and even his accounts of the earliest uses of “data” in the 1950s emphasise these themes. He details how Simulmatics, one of the earliest consulting firms, touted its ability to provide so much data that it revealed new information about voters and how they could be grouped by interest, as it “included 130,000 respondents… was able to divide the United States into 480 ‘voter types’ […] and take the temperature of each voter type on fifty-two ‘issue-clusters’” (p. 118). As early as 2004, coverage of data campaigning was marked not only by extensive claims of data’s panoramic ability to see all issue positions, but also by claims of its ability to prophesy: that it could “divine [your] likely views on taxes, law enforcement, abortion, and law enforcement” (Gertner, 2004). Such claims are also in turn “proven” by reference to data’s enormity. In this case, such divination was enabled by “the several whirring, refrigerator-size computer servers in the Washington area” (ibid.). New York Times columnist David Carr (2008) echoed the focus on infrastructure, arguing that Obama would have political power as he entered the White House because he would have “not just a political base, but a database”. The digital sublime, via whirring largeness and spreadsheet columns, is valuable and astonishes without any need to describe how or why it will actually work.

Accounts of data campaigning also emphasise how large data sets can reveal new people — or truer versions of people — to campaigns. A 2004 piece, “The Search for the Elusive Swing Voter”, emphasises data’s power to reveal in its title, and ultimately argues that what allows this previously unlocatable type of voter to be rendered legible is the vast amount of data parties have accumulated and crafted into additional analytics, with both parties holding over 150 million voter files (Green, 2004). These claims are renewed year after year, with numbers inching up – nearly 200 million files held by the Obama campaign in 2012 (Pilkington and Michel, 2012) – and commonly argue that new data creates the conditions under which “a campaign can literally know who on a block by block basis is persuadable” (Miller, 2012).

The data imaginary’s frame of data as useful insofar as it reveals new information can also be seen in how the amount and type of data campaigns use is covered and revered. Descriptions of data operations that fundamentally centre scale – the “as many as 306 lifestyle variables” held by the Democrats in 2004 (Green, 2004), the “500 data points on every individual” the Obama 2012 campaign made use of, or Cambridge Analytica’s boasts of having over 5,000 data points on every American (Chon, 2019) – emphasise how the scale of data operations will necessarily reveal new qualities about voters. Coverage of data campaigning also draws on and reproduces the data imaginary when it assumes that all data is equally valuable in producing insights about potential voters, and that larger data sets filled with novel data points are therefore most valuable. For instance, calling data campaigning “the Moneyball of politics” (Miller, 2012) implies that data’s value lies in strange, undervalued data points, not in information like that found in the publicly available voter file. Until about 2004, these novel data points were the lifestyle data discussed above, and in 2016, they were Cambridge Analytica’s “psychographic” profiles. When Cambridge Analytica CEO Alexander Nix boasts that psychographics “are equally important, or probably more important” than demographic categories (Nix, 2016), he is not only empirically wrong, but is relying on and reproducing the data imaginary. Implicitly, claims like this argue that out of 500 data points a campaign has about me, it is the 490 new and strange ones that allow them to see me more clearly, rather than the publicly available ten concerning my voting history, address, gender, age, and so on, that have been empirically shown to have the most explanatory power.

While the predictive models enabled by voter files and campaign-collected data like that gathered from phone banking have made measurable and important differences in turnout (Nickerson & Rogers, 2014) and can play a role in determining strategy across all aspects of a campaign, they are fundamentally questions of probability and prediction. Yet, prediction becomes prophecy in the data imaginary. Data journalists like Nate Silver have written extensively about the difficulty in explaining probability and uncertainty in models (Silver, 2015), and yet campaigns and political journalists often describe data-campaigning strategies in ways that reify their certainty. In 2016, the Trump campaign described how they knew ads aimed at depressing turnout in Black communities would work by merely stating “we know because we’ve modelled this” (Green & Issenberg, 2016). Similarly, when Nix claims Cambridge Analytica can “form a model to predict the personality of every single adult in the United States of America” he is also making a claim about the infallibility of this prediction. This treatment of predictions as prophecies that inevitably reveal the truth occurred in discussions of the Obama campaign’s use of data in 2012 as well, with news stories celebrating their ability to “make the data give up its secrets” (Dickinson, 2012).

Coverage of testing, whether simple A/B testing or more rigorous randomised controlled trials, also falls into the tropes of the data imaginary, particularly the emphasis on revealing. The major data story of the 2008 election was the use of A/B testing in the Obama campaign, and much coverage of the practices and results of testing falls into themes of revealing and panoramic. After Obama won the 2008 election, narratives focused on how testing seemingly small differences in interface design, such as moving a button or adding a splash page, would lead to changes in behaviour that would otherwise be unknowable (Siroker, 2010). A/B testing was also central to the narrative about the power of the Trump campaign’s success with Facebook ads, as they claimed to have run upwards of 100,000 ad variations per day to do “A/B tests on steroids” (Lapowsky, 2016b; Green & Issenberg, 2016). The number of tests provides a panoramic vision of what messages would work, echoing the “test everything” mantra of 2012 Obama campaign manager Jim Messina, and contributing to the idea that testing all possible variables will reveal new information and make the world of digital campaigning entirely knowable. In one of the more widely discussed examples, wherein the Obama campaign tested small differences in the words written on a button (“Learn More” versus “Sign Up”), these analytics-based tests revealed differences that were barely observable (Siroker, 2010). Another lesson from the Obama campaign was that “Sometimes, ugly stuff won” (Engage DC, 2012). In cases like these, data acts as Beer’s “prosthetic eye”, seeing the advantage of choices that traditional experts like user experience designers and advertising creatives saw as poorly designed or aesthetically displeasing.

Within the data imaginary, testing is also framed as more than a way to reveal how to best mobilise actions like signing up for an email list or donating money; discussions of its power slip into assumptions of its persuasive capacity as well. The Cambridge Analytica whistleblower Chris Wylie has repeatedly described their targeting and testing practices as “psychological warfare tools” (Cadwalladr, 2018). In an article headlined “How He Used Facebook to Win”, Sue Halpern (2017) covers how the Trump campaign spent “in the high eight figures just on persuasion [in social media ads]”, although most of those ads were designed to fundraise, according to Trump campaign manager Brad Parscale himself (Lapowsky, 2016b). To use the terms conceptualised by digital security and privacy NGO Tactical Tech (2019), data has value as an asset that can be traded or sold, and as intelligence in better understanding the electorate’s views and behaviour, but these are commonly collapsed into or ignored in favour of understanding data as influence that can manipulate views or votes.

The data imaginary enables coverage of practices known to work to be conflated with coverage of unknown or failed practices, such as psychographic or other micro-targeted persuasive advertising. In 2012, the Obama campaign’s use of Facebook data to leverage social pressure to increase turnout was empirically tested and shown to have a significant effect on turnout. In 2016, the narratives about Cambridge Analytica’s use of different Facebook data to target ads based on psychological profiles of users (a practice that its own designer considers not to be internally valid, that political scientists consider unlikely to be externally valid, and that is fundamentally different from social pressure) were widely hailed as productive in the absence of empirical tests. The two practices were also explicitly linked in news coverage, because they both drew on Facebook data (Page, 2016; Garcia-Navarro, 2016). Similarly, creating turnout models to inform voter mobilisation efforts is fundamentally about using data to predict electoral outcomes, and emphasising its effects is integral to producing meaningful coverage of data campaigning. Yet, the narratives about other data points as necessarily adding predictive power are also presented as truth, though little research has been done in this area. This is the data imaginary at work. Imaginaries sometimes reflect reality, but often reflect a hope (or fear) for how data will work, and emphasise the revealing, panoramic, and prophetic qualities of data to do so. Across all of these themes, practices of data campaigning go without rigorous treatment or attention to how, when, and why they actually result in meaningful change or affect political behaviour. Instead, it is assumed that the new things that are revealed and made visible are of political importance, and that the future predicted by a data set is both necessarily correct and open to manipulation using additional data-driven tactics.


Data campaigning has been heralded as an effective tool for nearly all aspects of campaigning, from GOTV to persuasion to fundraising. This article has attempted to nuance those claims, providing an overview of the known empirical evidence that supports how, when, and what kind of data is most effective for a variety of campaign goals. Despite overarching claims of its importance, the empirical facts are more of a mixed bag. On one hand, data campaigning has optimised field organising and improved voter turnout, using reams of data found in voter files and party databases to figure out how to most effectively get people to the polls by canvassing, calling, or text messaging (Gerber & Green, 2017; Malhotra et al., 2011; Michelson & Nickerson, 2011; Nickerson & Rogers, 2010). On the other, there is little evidence that data-driven targeting works for persuasive outcomes, or changing someone’s vote preference, despite the publicity concerning these practices’ importance (Kalla & Broockman, 2018). What we do know is that basic, publicly available demographic information, not the reams of hyper-specific lifestyle information, is often most effective for improving turnout (Hersh, 2015). New tactics like using models provided by social media platforms to locate and target previously unknown potential voters seem to hold great promise for mobilising fundraising, but considerably less for persuasive outcomes. Moreover, despite the very real ways data campaigning matters, data capabilities are lacking in down-ballot races (Anstead, 2017; Baldwin-Philippi, 2016) and even successful top-of-the-ballot campaigns can lag behind (Baldwin-Philippi, 2017; Kreiss, 2016; Kreiss et al., 2018). And despite the ability to target hyper-specific messages to audience segments, social media ads’ content often mirrors the narratives of broad national-level ads (Anstead et al., 2018).

Despite these limited findings, public discussion of data campaigning and micro-targeting persistently makes claims about its power and effectiveness, often drawing on common tropes of “the data imaginary”. Drawing on scholarship rooted in science and technology studies and internet studies that critically examines the power that “big data” as both an object and method has gained across a variety of fields (Beer, 2014; boyd & Crawford, 2012), this article emphasises the way that these descriptions of data are often decoupled from descriptions of why data is correct or relevant to the case at hand. Overwhelmingly, they come instead of, not alongside, discussions of the known empirical effects of such data.

There are stakes to the publication of these claims and belief in their veracity beyond that of unrigorous journalism. Importantly, the data imaginary is not just a set of tropes that give credibility to data practices, but a productive, reifying process that reinforces the power of the imaginary itself. With that assumption of power, consulting firms and data corporations are more likely to receive investment and earn contracts, thus shoring up the data industry and firms doing that type of work. Not only is this the case in the US, but these myths are exported to other countries as “cutting-edge” tactics. While some regulations prohibiting the use of certain types of data and targeting practices exist, it is likely that data-driven practices will push up to the limit of local laws, or even extend to practices that explicitly and clearly break local laws, as was the case when Cambridge Analytica violated campaign finance laws during their work for the Vote Leave campaign ahead of the 2016 Brexit referendum.

There are political risks to relying on claims about the effects of data campaigning to justify further regulation, too. Concerns about the use of data by political campaigns take many valid forms, including but not limited to those related to privacy, discrimination, and deceptive authorship. But resting these concerns upon assumptions of data’s inherent power and manipulation makes for slippage between these many problems. As a brief example, a 2018 UK Information Commissioner’s Office (ICO) report, “Democracy disrupted? Personal information and political influence”, primarily focuses on the (important) problems of the lack of transparency and users’ inability to control their data, but uses the claim of “influence” as a main reason for exigency in regulating. Regulation of commercial and marketing practices that protects privacy has not generally been concerned with how successful or effective marketing is, and yet, implicit claims of effects and “manipulation” slide into political discussions seamlessly. Although the current environment seems ripe for opportunities to regulate, relying on, or even emphasising, the strong effects of micro-targeting to make persuasive arguments about doing so actually poses risks because of the faulty assumptions it involves. In attending to the particulars of how data is actually used, and what its effects are within the campaigns, this article aims to re-orient conversations away from concern about media effects, and toward the more fundamental ethical and normatively democratic questions that regulation has been, and likely should be, concerned with.


Anstead, N. (2017). Data-Driven Campaigning in the 2015 United Kingdom General Election. The International Journal of Press/Politics, 22(3), 294–313.

Anstead, N., Magalhães, J. C., Stupart, R., & Tambini, D. (2018, August 22). Political Advertising on Facebook: The Case of the 2017 United Kingdom General Election. European Consortium of Political Research Annual General Meeting, Hamburg.

Baldwin-Philippi, J. (2016). The Cult(Ure) of Analytics in 2014. In J. A. Hendricks & D. Schill (Eds.), Communication and Midterm Elections: Media, Message, and Mobilization, (pp. 25–42). New York: Palgrave Macmillan.

Baldwin-Philippi, J. (2017). The Myths of Data-Driven Campaigning. Political Communication, 34(4), 627–633.

Bennett, C. (2016). Voter Databases, Micro-Targeting, and Data Protection Law: Can Political Parties Campaign in Europe as They Do in North America?. International Data Privacy Law, 6(4), 261–275.

Blodgett, J. (2008). Winning Your Election the Wellstone Way: A Comprehensive Guide for Candidates and Campaign Workers. Minneapolis: University of Minnesota Press.

Boerman, S., & Kruikemeier, S. (2016). Consumer responses to promoted tweets sent by brands and political parties. Computers in Human Behavior, 65, 285–294.

boyd, d., & Crawford, K. (2012). Critical Questions for Big Data. Information, Communication & Society, 15(5), 662–679.

Brooks, R. (2018, March 16). This Is B-A-D: Some Democrats Are Sick Of A DIRE Email Strategy. BuzzFeed News.

Cadwalladr, C. (2018, March 18). ‘I Made Steve Bannon’s Psychological Warfare Tool’: Meet the Data War Whistleblower. The Guardian.

Carr, D. (2008, November 9). How Obama Tapped Into Social Networks’ Power. The New York Times.

Chester, J., & Montgomery, K. (2017). The Role of Digital Marketing in Political Campaigns. Internet Policy Review, 6(4).

Chon, G. (2018, July 19). Blaming Big Data is Political Diversion. Reuters.

Collins, K. (2017, October 31). How Trump Won at Facebook to Win the Presidency. CNET.

Confessore, N., & Hakim, D. (2018, January 20). Data Firm Says ‘Secret Sauce’ Aided Trump; Many Scoff. The New York Times.

Dickinson, T. (2012, December 7). The Obama Campaign’s Real Heroes. Rolling Stone.

Dobber, T., Trilling, D., Helberger, N., & de Vreese, C. H. (2017). Two Crates of Beer and 40 Pizzas: The Adoption of Innovative Political Behavioural Targeting Techniques. Internet Policy Review, 6(4).

Dommett, K. (2019). Data-driven campaigns in practice: Understanding and regulating diverse data-driven campaigns. Internet Policy Review, 8(4).

Dommett, K., & Power, S. (2019). The Political Economy of Facebook Advertising: Election Spending, Regulation and Targeting Online. The Political Quarterly, 90(2), 257–265.

Engage DC. (2013). Inside the Cave: An In-Depth Look at the Digital, Technology, and Analytics Operations of Obama for America.

Gerber, A., & Green, D. (2017). Field Experiments on Voter Mobilization: An Overview of a Burgeoning Literature. In A. V. Banerjee and E. Duflo (Eds.), Handbook of Field Experiments (pp. 395–438). Cambridge, MA: Elsevier.

Gertner, J. (2004, February 15). The Very, Very Personal Is the Political. The New York Times.

Goldmacher, S. (2016, September 7). Hillary Clinton’s ‘Invisible Guiding Hand.’ POLITICO Magazine.

Grassegger, H., & Krogerus, M. (2017, January 28). The Data That Turned the World Upside Down. Motherboard (Vice).

Green, J. (2004, February 15). In Search of the Elusive Swing Voter. The Atlantic.

Green, J., & Issenberg, S. (2016, October 27). Inside the Trump Bunker, With Days to Go. Bloomberg.

Haenschen, K., & Jennings, J. (2019). Mobilizing Millennial Voters with Targeted Internet Advertisements: A Field Experiment. Political Communication, 36(3), 357–375.

Halpern, S. (2017, June 8). How He Used Facebook to Win. New York Review of Books.

Hellmann, J. (2016, August 26). Trump: ‘How Quickly People Forget’ Clinton ‘Superpredator’ Remark. The Hill.

Hellweg, E. (2012, November 13). 2012: The First Big Data Election. Harvard Business Review.

Hersh, E. D. (2015). Hacking the Electorate: How Campaigns Perceive Voters. New York: Cambridge University Press.

Howard, P. (2006). New Media and the Managed Citizen. New York: Cambridge University Press.

Issenberg, S. (2012). The Victory Lab: The Secret Science of Winning Campaigns. New York: Crown Books.

Kalla, J. L., & Broockman, D. (2018). The Minimal Persuasive Effects of Campaign Contact in General Elections: Evidence from 49 Field Experiments. American Political Science Review, 112(1), 148–166.

Kreiss, D. (2012). Yes We Can (Profile You): A Brief Primer on Campaigns and Political Data. Stanford Law Review Online, 64, 70–74.

Kreiss, D. (2016). Prototype Politics: Technology-Intensive Campaigning and the Data of Democracy. New York: Oxford University Press.

Kreiss, D., Lawrence, R., & McGregor, S. (2018). In Their Own Words: Political Practitioner Accounts of Candidates, Audiences, Affordances, Genres, and Timing in Strategic Social Media Use. Political Communication, 35(1), 8–31.

Kruschinski, S., & Haller, A. (2017). Restrictions on Data-Driven Political Micro-Targeting in Germany. Internet Policy Review, 6(4).

Lapowsky, I. (2016a, August 15). A Lot of People Are Saying Trump’s New Data Team Is Shady. Wired.

Lapowsky, I. (2016b, November 15). Here’s How Facebook Actually Won Trump the Presidency. Wired.

Lau, R., Sigelman, L., & Rovner, I. B. (2007). The Effects of Negative Political Campaigns: A Meta-Analytic Reassessment. Journal of Politics, 69(4), 1176–1209.

MacFarquhar, N. (1996, October 20). What’s a Soccer Mom Anyway? The New York Times.

Malhotra, N., Michelson, M., Rogers, T., & Valenzuela, A. (2011). Text Messages as Mobilization Tools: The Conditional Effect of Habitual Voting and Election Salience. American Politics Research, 39(4), 664–681.

Marotta, V., Abhishek, V., & Acquisti, A. (2019, June). Online Tracking and Publishers’ Revenues: An Empirical Analysis. Workshop on the Economics of Information Security. Boston, MA.

McKelvey, F., & Piebiak, J. (2016). Porting the Political Campaign: The NationBuilder Platform and the Global Flows of Political Technology. New Media & Society, 20(3), 901–918.

Michelson, M., & Nickerson, D. (2011). Voter Mobilization. In J. Druckman, D. P. Green, J. H. Kuklinski, and A. Lupia (Eds.), Cambridge Handbook of Experimental Political Science, (pp. 228–242). New York: Cambridge University Press.

Miller, R. (2012, October 30). The Digital Campaign. Frontline. Boston, MA: PBS–WGBH.

Mosco, V. (2004). The Digital Sublime: Myth, Power, and Cyberspace. Cambridge, MA: The MIT Press.

Nickerson, D., & Rogers, T. (2010). Do You Have a Voting Plan? Implementation Intentions, Voter Turnout, and Organic Plan Making. Psychological Science, 21(2), 194–199.

Nickerson, D., & Rogers, T. (2014). Political Campaigns and Big Data. The Journal of Economic Perspectives, 28(2), 51–73.

Nix, A. (2016, April 4). Cambridge Analytica: The Power of Big Data and Psychographics [Presentation]. Concordia Summit, New York.

Otenyo, E. (2010). Game ON: Video Games and Obama’s Race to the White House. In J. A. Hendricks & R. E. Denton, Jr. (Eds.), Communicator-in-Chief (pp. 123–138). Lanham, MD: Lexington.

Pilkington, E., & Michel, A. (2012, February 17). Obama, Facebook and the Power of Friendship: The 2012 Data Election. The Guardian.

Republican National Committee. (2013). Growth and Opportunity Project.

Savransky, R. (2016, October 19). Trump Ad Knocks Clinton over ‘superpredator’ Remark. The Hill.

Silver, N. (2015). The Signal and the Noise. New York: Penguin Random House.

Siroker, D. (2010, November 9). How Obama Raised $60 Million By Running a Simple Experiment [Blog post]. Optimizely Blog.

Sumpter, D. (2018). Outnumbered: From Facebook and Google to Fake News and Filter-Bubbles – The Algorithms that Control our Lives. London: Bloomsbury.

Strömbäck, J., & Kiousis, S. (2014). Strategic political communication in election campaigns. In C. Reinemann (Ed.) Political Communication (pp. 109–128). Berlin: DeGruyter Mouton.

Tactical Tech. (2019). Personal Data: Political Persuasion: Inside the Influence Industry. How it Works.

Thiessen, M. (2012, November 12). How Obama Trumped Romney with Big Data. Washington Post.

Tufekci, Z. (2012, November 16). Beware the Big Data Campaign. The New York Times.

Tufekci, Z. (2014). Engineering the public: Big data, surveillance and computational politics. First Monday, 19(7).

UK Information Commissioner’s Office (ICO). (2018, July 11). Democracy Disrupted? Personal information and political influence.

Weeks, B. (2015). Emotions, Partisanship, and Misperceptions: How Anger and Anxiety Moderate the Effect of Partisan Bias on Susceptibility to Political Misinformation. Journal of Communication, 65(4), 699–719.

Woolgar, S. (2002). Virtual Society? Technology, Cyberbole, Reality. New York: Oxford University Press.

This article by Jessica Baldwin-Philippi, originally published in Internet Policy Review, is licensed under a Creative Commons Attribution 3.0 Germany (CC BY 3.0 DE) license.

Written by Internet Policy Review

Internet Policy Review is an open access and peer-reviewed journal on internet regulation. Scholars, regulators, journalists, activists, and other stakeholders publish in the journal in a variety of formats
