Demographic bias in social media language analysis

Before the break, I had the opportunity to hear Brendan O’Connor talk about his recent paper with Su Lin Blodgett and Lisa Green: Demographic Dialectal Variation in Social Media: A Case Study of African-American English.

Imagine an algorithm designed to classify sentences. Perhaps it identifies the topic of the sentence or perhaps it classifies the sentiment of the sentence. These algorithms can be really accurate – but they are only as good as the corpus they are trained on.

If you train an algorithm on the New York Times and then try to classify tweets, for example, you may not have the kind of success you’d like, because the language and writing style of the Times differ so much from those of a typical tweet.

There’s a lot of interesting stuff in the Blodgett et al. paper, but perhaps most notable to me is their comparison of the quality of existing language identification tools on tweets by race. They find that these tools perform poorly on text associated with African Americans while performing better on text associated with white speakers.

In other words, if you collected a large set of Twitter data and used one of these tools to filter out non-English tweets, it would disproportionately label tweets from black authors as not being in English, and those tweets would then be removed from the dataset.
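To make the mechanism concrete, here is a minimal Python sketch, assuming you already have, for each tweet, a demographic group label and the language code some language-identification tool assigned to it. The function name `removal_rates`, the group names, and the toy labels are all hypothetical, invented for illustration; they are not from the Blodgett et al. paper and say nothing about its actual results.

```python
from collections import Counter


def removal_rates(records):
    """Return, for each group, the share of tweets an English-only filter drops.

    `records` is an iterable of (group, predicted_language) pairs, where
    predicted_language is whatever code a language-ID tool assigned
    (hypothetical input format, assumed for this sketch).
    """
    total = Counter()
    removed = Counter()
    for group, predicted in records:
        total[group] += 1
        # An English-only filter keeps a tweet only if it was tagged "en".
        if predicted != "en":
            removed[group] += 1
    return {group: removed[group] / total[group] for group in total}


# Hypothetical toy labels, invented for illustration only (not data from the paper).
# Every tweet here is assumed to actually be English, so any non-"en" label is an error.
toy = [
    ("group_a", "en"), ("group_a", "en"), ("group_a", "en"), ("group_a", "fr"),
    ("group_b", "en"), ("group_b", "da"), ("group_b", "en"), ("group_b", "et"),
]

rates = removal_rates(toy)
for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.0%} of English tweets removed")
```

Because every toy tweet is English, any gap between the groups’ removal rates is pure error, and it is the kind of disparity the paper describes: the filter silently drops one group’s tweets more often than another’s.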

Such an algorithm, trained on white language, has the unintentional effect of literally removing voices of color.

Their paper presents a classifier to eliminate that disparity, but the finding itself is eye-opening – a cautionary tale for anyone undertaking language analysis. If you’re not thoughtful and careful in your approach, even the most thoroughly validated classifier may bias your data sample.


the hollowing out of US democracy

How could a celebrity with no governing experience and no grassroots infrastructure alienate and offend an outright majority of Americans, adopt positions far from the mainstream, and yet become our president?* I argue that the underlying reason is a hollowed-out democracy in which many citizens no longer expect to be represented by accountable organizations and are no longer invited to share in governing. A celebrity who offers symbolic politics has the number of followers and the level of attention that professional politicians strive to buy with their cash. In an environment of isolated citizens, he wins.

We still have plenty of voluntary associations and networks concerned with politics. But politics is a minority taste, so these groups draw a small proportion of the population. And because most of them attract members by offering a political message or agenda, they produce ideologically homogeneous groups.

We also still have very large numbers of professional advocacy organizations, but many of them are accountable to donors rather than members, and their capacity and vision come from their highly skilled professional staff, not from citizens.

We also have some large movements that look accountable but aren’t. The Koch Brothers network, for example, employs 1,200 full-time, year-round staffers in 107 offices nationwide, more than the Republican Party. The Koch Brothers own it.

What we lack now are the kinds of organizations that I believe have been core to US civil society since the era of de Tocqueville. They offer benefits other than politics to attract members. They draw a range of people–not representative samples of the US population, but diverse groups. They give their members reasons to think politically and aggregate their political power. They create pathways to political leadership for those who become most interested. And they depend on their members’ support for survival.

In short, they offer what I’d call SPUD–Scale, Pluralism, Unity, Depth–which is the magic recipe for civic engagement.

Four traditional types of organizations that offered SPUD were unions, political parties dependent on local voluntary labor, religious congregations, and metropolitan daily newspapers. All four were imperfect, but each was much better than nothing. And they are all in bad shape today.

I’ve previously shown that newspapers have lost readership precipitously and parties have become loose networks of entrepreneurial politicians and donors instead of actual organizations. Unions and religious congregations have also shrunk. To illustrate those two trends, here is a new graph that shows the rates of union membership and weekly religious attendance. The top line is the proportion of adult Americans who either attend weekly services or belong to a union, or both. That proportion has dropped by twenty points, from a majority of 54.6 percent in 1970 to a minority of 34.3 percent in 2012. (By the way, I am skeptical that union membership really rose in 2012; I suspect that’s random noise.) I would look no further than this 20-point drop for the underlying conditions that yielded The Donald.
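Because the top line counts people who attend weekly services, belong to a union, or both, it is the size of a union of two groups rather than a simple sum. Writing U for union members and R for weekly attenders (notation introduced here for illustration), inclusion-exclusion gives the combined share, and the drop follows from the two endpoint figures quoted above. The separate U and R shares for each year come from the graph and are not reproduced here.

```latex
P(U \cup R) = P(U) + P(R) - P(U \cap R)

\Delta = 54.6\% - 34.3\% = 20.3 \text{ percentage points (1970 to 2012)}
```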

*The echo of Hamilton was inadvertent but seemed apt once I noticed it.

NCDD’s Year in Numbers: 2016

2016 was an important year for the dialogue & deliberation community, and a very active one for NCDD!

The following end-of-year infographic highlights NCDD’s impact and activity during the past twelve months, including activity at our national conference, on our website, listservs, events, and new and exciting efforts. Our sincere thanks to NCDD co-founder Andy Fluke for his fabulous design work!

Thanks to our fabulous community for engaging with us in so many ways in 2016! We look forward to continuing this important work with you all in 2017.

Please share this post with others you think should know there’s an amazing community of innovators in public engagement and group process work they can tap into or join!

In addition to sharing this post and/or just the image above, feel free to download the print-quality PDF.

Conto Partecipo Scelgo: The Participatory Budget of the City of Milan

The goal of Milan’s Participatory Budget is "to design, together with citizens, public-interest projects to be carried out in the territory of the 9 Zones. A total of 9 million euros is available for these projects, 1 million euros for each Zone, to be used for investments. Participation is open to all who live, study...

The Knowledge Economy and (Ab)use of Symbols

I’m taking a Network Economics class this semester, and we’ve reasonably begun by reading The Use of Knowledge in Society – in which Hayek addresses the economic problem of information scarcity.

The economic problem faced by society, Hayek argues, is that “the knowledge of the circumstances of which we must make use never exists in concentrated or integrated form, but solely as dispersed bits of incomplete and frequently contradictory knowledge which all the separate individuals possess.” That is, the problem is “how to secure the best use of resources known to any of the members of society, for ends whose relative importance only these individuals know.”

Hayek, of course, sees this problem as one which is best solved by the free market – by decentralization of economic decisions. On its face, his argument makes a lot of sense: “If we can agree that the economic problem of society is mainly one of rapid adaptation to changes in the particular circumstances of time and place, it would seem to follow that the ultimate decisions must be left to the people who are familiar with these circumstances, who know directly of the relevant changes and of the resources immediately available to meet them. We cannot expect that this problem will be solved by first communicating all this knowledge to a central board which, after integrating all knowledge, issues its orders. We must solve it by some process of decentralization.”

There is much in Hayek’s argument that I agree with. In the civic space, we often talk about the danger of expertise – technical knowledge is valuable and important, but reducing a community problem to a technocratic solution overlooks the expertise of the people themselves. No expert, no matter how well educated, can parachute into a community they know nothing about and successfully solve its problems without engaging with community solutions.

But I don’t follow Hayek’s jump – just because a purely technocratic solution is clearly bad, it does not necessarily follow that a purely populist solution is therefore good.

Hayek praises the pricing system of the open market as a mechanistic marvel – as an emergent behavior which continually tends towards the equilibrium of an instantaneous time and context. In other words, pricing becomes a tool for coordination, a “mechanism for communicating information.” It operates as “a kind of symbol” ensuring that “only the most essential information is passed on and only to those concerned.”

This is an inspiring description of market pricing, but it obscures the problems with such an approach – namely, it is unclear just how much people know and how much of that information is accurate.

Hayek’s invocation of ‘symbols’ immediately makes me think of Lippmann’s work – symbols can be powerful tools for coordination, but they are also props for propaganda and manipulation.

John Dewey describes the positive impact of symbols, writing, “Events cannot be passed from one to another, but meanings may be shared by means of signs. Wants and impulses are then attached to common meanings. They are thereby transformed into desires and purposes, which, since they implicate a common or mutually understood meaning, present new ties, converting a joint activity into a community of interest and endeavor. Thus there is generated what, metaphorically, may be termed a general will and social consciousness: desire and choice on the part of individuals in behalf of activities that, by means of symbols, are communicable and shared by all concerned.”

The problem, as Lippmann points out, is that elites are too easily able to manipulate those signs and symbols – to manufacture shared experiences and expectations that come not from the knowledge individuals actually possess but from myths designed solely to serve elites’ goals.
