The Hidden Risks of AI: How Linguistic Diversity Can Make or Break Collective Intelligence

Diversity is a key ingredient in the recipe for collective intelligence because it brings together a range of perspectives, tools, and abilities, allowing for a more comprehensive approach to problem-solving and decision-making. Gender diversity on corporate boards improves firms’ performance, ethnic diversity produces more impactful scientific research, diverse groups are better at solving crimes, popular juries are less biased than professional judges, and politically diverse editorial teams produce higher-quality Wikipedia articles.

Large language models, like those powering AI systems, rely heavily on datasets or corpora, a significant part of which is English content. This dominance is consequential. Just as diverse groups of people yield richer outcomes, an AI trained on diverse linguistic data offers a broader perspective. Each language encapsulates unique thoughts, metaphors, and wisdom. Without diverse linguistic representation, we risk fostering AI systems with limited collective intelligence. The quality, diversity, and quantity of the data they are trained on directly influence their epistemic outputs. Unsurprisingly, large language models struggle to capture long-tail knowledge.

This comes with two major risks, at least hypothetically: 1) systems that do not fully leverage the knowledge dispersed in the population, and 2) benefits of AI that are more accessible to some groups than to others; for instance, speakers of less-dominant languages might not benefit equally from AI’s advancements. It’s not merely about translation; it’s the nuances and knowledge embedded in languages that might be overlooked.

There are also two additional dimensions that could reinforce biases in AI systems: 1) as future models are trained on content that might have been generated by AI, there may be a reinforcing effect where biases present in the initial training data are amplified over time; and 2) techniques such as guided transfer learning may also increase biases if the source model used in transfer learning is trained on biased data.

This introduces a nuanced dimension to the digital divide. Historically, the digital divide was characterized by access to technology, internet connectivity, digital skills, and the socio-economic variables shaping these factors. Yet, with AI, our understanding of what constitutes the digital divide should expand. It’s a subtler yet crucial divide that policymakers and development practitioners might not yet fully recognize.

Voices in the Code: Citizen Participation for Better Algorithms

Image by mohamed Hassan from Pixabay

Voices in the Code, by David G. Robinson, is finally out. I had the opportunity to read the book prior to its publication, and I could not recommend it enough. David shows how, between 2004 and 2014 in the US, experts and citizens came together to build a new kidney transplant matching algorithm. David’s work is a breath of fresh air for the debate surrounding the impact of algorithms on individuals and societies – a debate typically focused on the negative and sometimes disastrous effects of algorithms. While David conveys these risks at the outset of the book, focusing solely on these threats would add little to a public discourse already saturated with concerns. 

One of the major missing pieces in the “algorithmic literature” is precisely how citizens, experts and decision-makers can make their interactions more successful, working towards algorithmic solutions that better serve societal goals. The book offers a detailed and compelling case where a long and participatory process leads to the crafting of an algorithm that delivers a public good. This, despite the technical complexities, moral dilemmas, and difficult trade-offs involved in decisions related to the allocation of kidneys to transplant patients. Such a feat would not be possible without another contribution of the book: a didactic demystification of what algorithms are, a subject normally treated as the reserved domain of a few experts.

As David conducts his analysis, one also finds an interesting reversal of the assumed relationship between technology and participatory democracy. This relationship has mostly been examined from a civic tech angle, focusing on how technologies can support democratic participation through practices such as e-petitions, online citizens’ assemblies, and digital participatory budgeting. Thus, another original contribution of this book is to look at this relationship from the opposite angle: how participatory processes can better support technological deployments. While technology for participation (civic tech) remains an important topic, we should probably start paying more attention to how participation can support technological solutions (civic for tech).

Continuing on through the book, other interesting insights emerge. For instance, technology and participatory democracy pundits normally subscribe to the virtues of decentralized systems, both from a technological and institutional perspective. Yet David depicts precisely the virtues of a decision-making system centralized at the national level. Should organ transplant issues be decided at the local level in the US, the results would probably not be as successful. Against intuition, David presents a clear case where centralized (although participatory) systems might offer better collective outcomes. Surfacing this counterintuitive finding is a welcome contribution to debates on the trade-offs between centralization and decentralization, both from a technological and institutional standpoint. 

But a few paragraphs here cannot do the book justice. Voices in the Code is certainly a must-read for anybody working on issues ranging from institutional design and participatory democracy, all the way to algorithmic accountability and decision support systems.

***

P.S. As an intro to the book, here’s a nice 10-minute conversation with David on the Marketplace podcast.

A Charter for How to Build Effective Data (and Mapping) Commons

Among those trying to build a new economy, there is growing interest in developing online maps as tools for helping people understand and engage with the rich possibilities.  One of the earliest such maps was TransforMap, a project with origins in Austria and Germany that is using OpenStreetMap as a platform for helping people identify and connect with alternative economic projects. In the US, CommonSpark assembled a collection of “maps in the spirit of the commons” such as

the Great Lakes Commons Map (a bioregional map of healing and harm), World of Commons (innovative forms of citizen-led governance of public property and services in Italy), Falling Fruit (a global map identifying 786,000 locations of forageable food), a map of Free Little Libraries (free books available in neighborhoods around the world), a global Hackerspace map, a global Seed Map, a map of all Transition communities, and several Community Land Trust directory maps.

As the varieties of maps proliferate, there is growing concern that the mapping projects truly function as commons and be capable of sharing data and growing together. But meeting this challenge entails some knotty technical, social and legal issues.  

A group of mappers met at the Commons Space sessions of the World Social Forum in Montreal last year to try to make progress on the challenge. The dialogues continued at an “Intermapping” workshop in Florence, Italy, last month. After days of deep debate and collaboration, the mappers came up with a document that outlines twelve key principles for developing effective data and mapping commons. The Charter for Building a Data Commons for a Free, Fair and Sustainable Future is the fruit of those dialogues.


New Papers Published: FixMyStreet and the World’s Largest Participatory Budgeting


Voting in Rio Grande do Sul’s Participatory Budgeting  (picture by Anderson Lopes)

Here are two newly published papers that my colleagues Jon Mellon, Fredrik Sjoberg and I have been working on.

The first, The Effect of Bureaucratic Responsiveness on Citizen Participation, published in Public Administration Review, is – to our knowledge – the first study to quantitatively assess at the individual level the often-assumed effect of government responsiveness on citizen engagement. It also describes an example of how the data provided through digital platforms may be leveraged to better understand participatory behavior. This is the fruit of a research collaboration with MySociety, to whom we are extremely thankful.

Below is the abstract:

What effect does bureaucratic responsiveness have on citizen participation? Since the 1940s, attitudinal measures of perceived efficacy have been used to explain participation. The authors develop a “calculus of participation” that incorporates objective efficacy—the extent to which an individual’s participation actually has an impact—and test the model against behavioral data from the online application Fix My Street (n = 399,364). A successful first experience using Fix My Street is associated with a 57 percent increase in the probability of an individual submitting a second report, and the experience of bureaucratic responsiveness to the first report submitted has predictive power over all future report submissions. The findings highlight the importance of responsiveness for fostering an active citizenry while demonstrating the value of incidentally collected data to examine participatory behavior at the individual level.

An earlier, ungated version of the paper can be found here.

The second paper, Does Online Voting Change the Outcome? Evidence from a Multi-mode Public Policy Referendum, has just been published in Electoral Studies. In an earlier JITP paper (ungated here) looking at Rio Grande do Sul State’s Participatory Budgeting – the world’s largest – we show that, when compared to offline voting, online voting tends to attract participants who are younger, male, of higher income and educational attainment, and more frequent social media users. Yet, one question remained: does the inclusion of new participants in the process with a different profile change the outcomes of the process (i.e. which projects are selected)? Below is the abstract of the paper.

Do online and offline voters differ in terms of policy preferences? The growth of Internet voting in recent years has opened up new channels of participation. Whether or not political outcomes change as a consequence of new modes of voting is an open question. Here we analyze all the votes cast both offline (n = 5.7 million) and online (n = 1.3 million) and compare the actual vote choices in a public policy referendum, the world’s largest participatory budgeting process, in Rio Grande do Sul in June 2014. In addition to examining aggregate outcomes, we also conducted two surveys to better understand the demographic profiles of who chooses to vote online and offline. We find that policy preferences of online and offline voters are no different, even though our data suggest important demographic differences between offline and online voters.

The extent to which these findings are transferable to other PB processes that combine online and offline voting remains an empirical question. In the meantime, nonetheless, these findings suggest a more nuanced view of the potential effects of digital channels as a supplementary means of engagement in participatory processes. I hope to share an ungated version of the paper in the coming days.

Bit by Bit: Social Research in the Digital Age


Pic by Jim Kaskade (flickr creative commons)

Matthew Salganik, Professor of Sociology at Princeton University, has recently put his forthcoming book on social research and big data online for an open review. Matthew is the author of many of my favorite academic works, including this experiment in which he and Duncan Watts test social influence by artificially inverting the popularity of songs in an online music market. He is also the brains behind All Our Ideas, an amazing tool that I have used in much of the work that I have been doing, including “The Governor Asks” in Brazil.

In Matthew’s words, this is a book “for social scientists that want to do more data science, and it is for data scientists that want to do more social science.” Even though I have not read the entire book, one of the things that has already impressed me is the simplicity with which Matthew explains complex topics, such as human computation, distributed data collection and digital experiments. For each topic, he highlights opportunities and provides experienced advice for those working with big data and social sciences. His stance on social research in the digital age is brilliant and refreshing, and is a wake-up call for lots of people working in that domain. Below is an excerpt from his preface:

From data scientists, I’ve seen two common misunderstandings. The first is thinking that more data automatically solves problems. But, for social research that has not been my experience. In fact, for social research new types of data, as opposed to more of the same data, seems to be most helpful. The second misunderstanding that I’ve seen from data scientists is thinking that social science is just a bunch of fancy-talk wrapped around common sense. Of course, as a social scientist, and more specifically as a sociologist, I don’t agree with that; I think that social science has a lot to offer. Smart people have been working hard to understand human behavior for a long time, and it seems unwise to ignore the wisdom that has accumulated from this effort. My hope is that this book will offer you some of that wisdom in a way that is easy to understand.

From social scientists, I’ve also seen two common misunderstandings. First, I’ve seen some people write off the entire idea of social research using the tools of the digital age based on a few bad papers. If you are reading this book, you have probably already read a bunch of papers that use social media data in ways that are banal or wrong (or both). I have too. However, it would be a serious mistake to conclude from these examples that all digital age social research is bad. In fact, you’ve probably also read a bunch of papers that use survey data in ways that are banal or wrong, but you don’t write off all research using surveys. That’s because you know that there is great research done with survey data, and in this book, I’m going to show you that there is also great research done with the tools of the digital age.

The second common misunderstanding that I’ve seen from social scientists is to confuse the present with the future. When assessing social research in the digital age (the research that I’m going to describe in this book), it is important to ask two distinct questions:

How well does this style of research work now?

How well will this style of research work in the future as the data landscape changes and as researchers devote more attention to these problems?

I have only gone through parts of the book (and yes, I did go beyond the preface). But from what I can see, it is a must read for those who are interested in digital technologies and the new frontiers of social research. And while reading it, why not respond to Matthew’s generous act by providing some comments? You can access the book here.

 

New IDS Journal – 9 Papers in Open Government


The new IDS Bulletin is out. Edited by Rosemary McGee and Duncan Edwards, this is the first open access version of the well-known journal by the Institute of Development Studies. It brings eight new studies looking at a variety of open government issues, ranging from uptake in digital platforms to government responsiveness in civic tech initiatives. Below is a brief presentation of this issue:

Open government and open data are new areas of research, advocacy and activism that have entered the governance field alongside the more established areas of transparency and accountability. In this IDS Bulletin, articles review recent scholarship to pinpoint contributions to more open, transparent, accountable and responsive governance via improved practice, projects and programmes in the context of the ideas, relationships, processes, behaviours, policy frameworks and aid funding practices of the last five years. They also discuss questions and weaknesses that limit the effectiveness and impact of this work, offer a series of definitions to help overcome conceptual ambiguities, and identify hype and euphemism. The contributions – by researchers and practitioners – approach contemporary challenges of achieving transparency, accountability and openness from a wide range of subject positions and professional and disciplinary angles. Together these articles give a sense of what has changed in this fast-moving field, and what has not – this IDS Bulletin is an invitation to all stakeholders to take stock and reflect.

The ambiguity around the ‘open’ in governance today might be helpful in that its very breadth brings in actors who would otherwise be unlikely adherents. But if the fuzzier idea of ‘open government’ or the allure of ‘open data’ displace the task of clear transparency, hard accountability and fairer distribution of power as what this is all about, then what started as an inspired movement of governance visionaries may end up merely putting a more open face on an unjust and unaccountable status quo.

Among others, the journal presents an abridged version of a paper by Jonathan Fox and me on digital technologies and government responsiveness (for the full version, download here).

Below is a list of all the papers:

Rosie McGee, Duncan Edwards
Tiago Peixoto, Jonathan Fox
Katharina Welle, Jennifer Williams, Joseph Pearce
Miguel Loureiro, Aalia Cassim, Terence Darko, Lucas Katera, Nyambura Salome
Elizabeth Mills
Laura Neuman
David Calleb Otieno, Nathaniel Kabala, Patta Scott-Villiers, Gacheke Gachihi, Diana Muthoni Ndung’u
Christopher Wilson, Indra de Lanerolle
Emiliano Treré

 

World Development Report 2016: Digital Dividends


The World Development Report 2016, the main annual publication of the World Bank, is out. This year’s theme is Digital Dividends, examining the role of digital technologies in the promotion of development outcomes. The findings of the WDR are simultaneously encouraging and sobering. Those skeptical of the role of digital technologies in development might be surprised by some of the results presented in the report. Technology advocates from across the spectrum (civic tech, open data, ICT4D) will inevitably come across some facts that should temper their enthusiasm.

While some may disagree with the findings, this Report is an impressive piece of work, spread across six chapters covering different aspects of digital technologies in development: 1) accelerating growth, 2) expanding opportunities, 3) delivering services, 4) sectoral policies, 5) national priorities, 6) global cooperation. My opinion may be biased, as somebody who made some modest contributions to the Report, but I believe that, to date, this is the most thorough effort to examine the effects of digital technologies on development outcomes. The full report can be downloaded here.

The report draws, among other things, from 14 background papers that were prepared by international experts and World Bank staff. These background papers serve as additional reading for those who would like to examine certain issues more closely, such as social media, net neutrality, and the cybersecurity agenda.

For those interested in citizen participation and civic tech, one of the papers, written by Prof. Jonathan Fox and me – When Does ICT-Enabled Citizen Voice Lead to Government Responsiveness? – might be of particular interest. Below is the abstract:

This paper reviews evidence on the use of 23 information and communication technology (ICT) platforms to project citizen voice to improve public service delivery. This meta-analysis focuses on empirical studies of initiatives in the global South, highlighting both citizen uptake (‘yelp’) and the degree to which public service providers respond to expressions of citizen voice (‘teeth’). The conceptual framework further distinguishes between two trajectories for ICT-enabled citizen voice: Upwards accountability occurs when users provide feedback directly to decision-makers in real time, allowing policy-makers and program managers to identify and address service delivery problems – but at their discretion. Downwards accountability, in contrast, occurs either through real time user feedback or less immediate forms of collective civic action that publicly call on service providers to become more accountable and depends less exclusively on decision-makers’ discretion about whether or not to act on the information provided. This distinction between the ways in which ICT platforms mediate the relationship between citizens and service providers allows for a precise analytical focus on how different dimensions of such platforms contribute to public sector responsiveness. These cases suggest that while ICT platforms have been relevant in increasing policymakers’ and senior managers’ capacity to respond, most of them have yet to influence their willingness to do so.

You can download the paper here.

Any feedback on our paper or models proposed (see below, for instance) would be extremely welcome.


unpacking user feedback and civic action: difference and overlap

I also list below the links to all the background papers and their titles

Enjoy the reading.


What do we want in a social studies teacher?

Recently, the National Council for the Social Studies released its National Standards for the Preparation of Social Studies Teachers guidance document for public review. If you are a parent, pre-service social studies teacher or teacher educator, current or former teacher, or, honestly, simply an engaged citizen with a concern for the future, I encourage you to check it out and provide them with feedback. The document, which runs about 23 pages, provides an overview of five core competencies that are important in social studies teacher education.

Each section of the report dives deeper into each of the core competencies. No doubt, there will be some comments raised about the inclusion of ‘social justice’, knowing that that particular term was once removed from standards of the National Council for Accreditation of Teacher Education (now the Council for the Accreditation of Educator Preparation), for good or ill. I do appreciate the emphasis on inquiry, skills, and knowledge as having an equal role in teacher preparation, though the references to C3 could be an issue in states where that program is (unfortunately) not adopted.

As a civic educator, I cannot help but notice that there is a HUGE amount of attention paid to varying elements of civic education and competency. Obviously, if we consider that social studies is at the heart of instruction in good citizenship, this is important, but it’s ultimately necessary that these standards make clear that civics must necessarily connect to other content areas in our field, and I think they do that adequately.

You can review the standards here, and I encourage you to do so and reflect on them before you complete the survey here.


Civic Tech and Government Responsiveness

For those interested in tech-based citizen reporting tools (such as FixMyStreet, SeeClickFix), here’s a recent interview of mine with Jeffrey Peel (Citizen 2015) in which I discuss some of our recent research in the area.

 


Praising and Shaming in Civic Tech (or Reversed Nudging for Government Responsiveness) 

The other day during a talk with researcher Tanya Lokot I heard an interesting story from Russia. Disgusted with the state of their streets, activists started painting caricatures of government officials over potholes.

 

In the case of a central street in Saratov, the immediate response to one of these graffiti was this:  

 

Later on, following increased media attention – and some unexpected turnarounds – the pothole got fixed.

That reminded me of a recurrent theme in some conversations I have, namely whether praising and shaming matters to civic tech and, if so, to what extent. To stay with two classic examples, think of solutions such as FixMyStreet and SeeClickFix, through which citizens publicly report problems to the authorities.

Assuming governments take action, what prompts them to do so? At a very basic level, three hypotheses are possible:

1) Governments take action based on their access to distributed information about problems (which they supposedly are not aware of)

2) Governments take action due to the “naming and shaming” effect, seeking to avoid being publicly perceived as unresponsive (and seeking praise for their actions)

3) Governments take action for both of the reasons above

Some could argue that hypothesis 3 is the most likely to be true, with some governments leaning more towards one reason to respond than others. Yet, the problem is that we know very little about these hypotheses, if anything. In other words – to my knowledge – we do not know whether making reports through these platforms public makes any difference whatsoever when it comes to governments’ responsiveness. Some might consider this a useless academic exercise: as long as these tools work, who cares? But I would argue that the answer to that question matters a lot when it comes to the design of similar civic tech initiatives that aim to prompt government to action.


Let’s suppose we find that, all else equal, governments are significantly more responsive to citizen reports when these are publicly displayed. This would matter both in terms of process and technological design. In terms of process, for instance, civic tech initiatives would probably be more successful if they devoted part of their resources to amplifying the visibility of government action and inaction (e.g. through local media). Similarly, from a technological standpoint, designers should devote substantially more effort to interfaces that maximize praising and shaming of governments based on their performance (e.g. rankings, highlighting pending reports). Conversely, we might find that publicizing reports has very little effect in terms of responsiveness. In that case, more work would be needed to figure out which other factors – beyond will and capacity – play a role in government responsiveness (e.g. quality of reports).

Most likely, the effect of praising and shaming would depend on a number of factors such as political competition, bureaucratic autonomy, and internal performance routines. A finer understanding of these factors would have an impact not only on the civic tech field, but across the whole accountability landscape. To date, we know very little about it. Yet, one of the untapped potentials of civic technology is precisely that of conducting experiments at lower cost. For instance, conducting randomized controlled trials on the effects of publicizing government responsiveness should not be so complicated (e.g. effects of rankings, amplifying visibility of unfixed problems). Add to that analysis of existing data from civic tech platforms, and some good qualitative work, and we might get a lot closer to figuring out what makes politicians and civil servants “tick”.
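To make the idea of such a trial a bit more concrete, here is a minimal sketch in Python of how one might randomly assign citizen reports to a “public” or “private” display and then compare fix rates. Everything in it is hypothetical: the function names, the 50/50 split and the made-up outcome data are assumptions for illustration only, not the design of any existing platform or study.

```python
import random
from math import sqrt


def assign_arms(report_ids, seed=0):
    """Randomly split reports into a 'public' arm and a 'private' arm (hypothetical design)."""
    rng = random.Random(seed)
    shuffled = list(report_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"public": shuffled[:half], "private": shuffled[half:]}


def fix_rate(outcomes):
    """Share of reports that were fixed; outcomes is a list of booleans."""
    return sum(outcomes) / len(outcomes)


def difference_in_fix_rates(public_outcomes, private_outcomes):
    """Return (difference, standard error) for two independent proportions."""
    p_pub = fix_rate(public_outcomes)
    p_priv = fix_rate(private_outcomes)
    se = sqrt(
        p_pub * (1 - p_pub) / len(public_outcomes)
        + p_priv * (1 - p_priv) / len(private_outcomes)
    )
    return p_pub - p_priv, se


# Made-up data: 1,000 reports, half displayed publicly, half kept private.
arms = assign_arms(range(1000), seed=42)
public_outcomes = [True] * 300 + [False] * 200   # invented: 300 of 500 fixed
private_outcomes = [True] * 250 + [False] * 250  # invented: 250 of 500 fixed

diff, se = difference_in_fix_rates(public_outcomes, private_outcomes)
print(f"Difference in fix rates: {diff:.3f} (standard error {se:.3f})")
```

A real experiment would of course need more care (for instance, randomizing at the level of neighborhoods or agencies rather than individual reports, since officials may notice which reports are public), but the basic comparison is no more complicated than this.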

Until now, behavioral economics in public policy has been mainly about nudging citizens toward preferred choices. Yet it may be time to start also working in the opposite direction, nudging governments to be more responsive to citizens. Understanding whether praising and shaming works (and if so, how and to what extent) would be an important step in that direction.

***

Also re-posted on Civicist.