Linguistic Clues and Evidence Relationships

Earlier in the week, I wrote about the theory of coherent argument structure introduced by Robin Cohen in 1987. Her model also included two other elements: a theory of linguistic clue interpretation and a theory of evidence relationships. These theories, the focus of today’s post, are closely connected both to each other and to the theory of argument structure.

Theory of Linguistic Clue Interpretation
Cohen’s theory of linguistic clue interpretation argues for the existence of clue words: “those words and phrases used by the speaker to directly indicate the structure of the argument to the hearer.” Capable of being identified through simple n-gram models as well as more sophisticated means, these linguistic cues, or discourse markers, are a common feature of argument mining. Cohen outlines several common clue types, such as redirection, which re-directs the hearer to an earlier part of the argument (“returning to…”), and connection, a broad category encompassing clues of inference (“as a result of…”), clues of contrast (“on the other hand…”), and clues of detail (“specifically…”). Most notably, though, Cohen argues that clues are necessary for arguments whose structure is more complex than those covered by the coherent structure theory. That is, the function of linguistic cues is not “to merely add detail on the interpretation of the contained proposition, but to allow that proposition an interpretation that would otherwise be denied” (Cohen 1987).
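To make the idea concrete, here is a minimal sketch of how such clue words might be spotted automatically with simple string matching; the cue inventory below is illustrative and only loosely follows Cohen’s categories rather than reproducing her actual word lists.

```python
import re

# Illustrative cue inventory loosely following Cohen's categories
# (redirection, inference, contrast, detail); not her actual word lists.
CLUE_WORDS = {
    "redirection": ["returning to", "going back to"],
    "inference":   ["as a result of", "therefore", "thus"],
    "contrast":    ["on the other hand", "however"],
    "detail":      ["specifically", "in particular"],
}

def find_clues(sentence: str) -> list[tuple[str, str]]:
    """Return (clue type, matched phrase) pairs found in a sentence."""
    hits = []
    lowered = sentence.lower()
    for clue_type, phrases in CLUE_WORDS.items():
        for phrase in phrases:
            if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
                hits.append((clue_type, phrase))
    return hits

print(find_clues("On the other hand, returning to the budget, costs rose."))
# [('redirection', 'returning to'), ('contrast', 'on the other hand')]
```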

Discourse markers are also “strongly associated with particular pragmatic functions,” (Abbott, Walker et al. 2011) making them valuable for sentiment analysis tasks such as determining agreement or disagreement. This is the approach Abbott, Walker et al. used in classifying types of arguments within a corpus of 109,553 annotated posts from an online forum. Since the forum allowed for explicitly quoting another post, Abbott, Walker et al. identified 8,242 quote-response pairs, where a user quoted another post and then added a comment of their own.

In addition to the classification task of determining whether the response agrees or disagrees with the preceding quote, the team analyzed the pairs on a number of sentiment spectrums: respect/insult, fact/emotion, nice/nasty, and sarcasm. Twenty discourse markers identified through manual inspection of a subset of the corpus, as well as the use of “repeated punctuation,” served as key features in the analysis.

Using a JRip classifier built on unigram and bigram discourse marker features, as well as a handful of other features such as post meta-data and topic annotations, Abbott, Walker et al. found the best performance (0.682 accuracy, compared to a 0.626 unigram baseline) using local features from both the quote and response. This indicates that the contextual features do matter, and, in the words of the authors, vindicates their “interest in discourse markers as cues to argument structure” (Abbott, Walker et al. 2011).
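As a rough illustration of this kind of setup, the sketch below builds unigram and bigram features over toy quote-response pairs. It uses scikit-learn, with logistic regression standing in for the JRip rule learner; the example pairs and labels are invented.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy quote-response pairs (hypothetical); label 1 = disagreement, 0 = agreement.
pairs = [
    ("Taxes should rise.", "Really? I actually think that is wrong."),
    ("Taxes should rise.", "I agree, and here is why..."),
    ("The earth is warming.", "Oh please, that is nonsense."),
    ("The earth is warming.", "Exactly, the data backs this up."),
]
labels = [1, 0, 1, 0]

# Concatenate quote and response so the model sees features from both sides,
# mirroring the finding that context from the quote matters.
texts = [q + " [RESP] " + r for q, r in pairs]

model = make_pipeline(
    CountVectorizer(ngram_range=(1, 2)),  # unigram + bigram features
    LogisticRegression(max_iter=1000),    # stand-in for the JRip rule learner
)
model.fit(texts, labels)
print(model.predict(["Taxes should rise. [RESP] Oh please, I think that is wrong."]))
```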

While these discourse markers can provide vital clues to a hearer trying to reconstruct an argument, relying on them in a model requires that a speaker not only try to be understood, but be capable of expressing themselves clearly. Stab and Gurevych, who are interested in argument mining as a tool for automating feedback on student essays, argue that discourse markers make a poor feature, since, in their corpora, these markers are often missing or even misleadingly used (Stab and Gurevych 2014). Their approach to this challenge will be further discussed in the state of the art section of this paper.

Theory of Evidence Relationships
The final piece of Cohen’s model is evidence relationships, which explicitly connect one argument to another and govern the verification of evidence relations between propositions (Cohen 1987). While the coherent structure principle lays out the different forms an argument may take, evidence relationships are the logical underpinnings that tie an argument’s structure together. As Cohen explains, the pragmatic analysis of evidence relationships is necessary for the model because the hearer needs to be able to “recognize beliefs of the speaker, not currently held by the hearer.” That is, whether or not the hearer agrees with the speaker’s argument, the hearer needs to be able to identify the elements of the speaker’s argument as well as the logic which holds that argument together.

To better understand the role of evidence relationships, it is helpful to first develop a definition of an “argument.” In its most general form, an argument can be understood as a representation of a fact as conveying some other fact. In this way, a complete argument requires three elements: a conveying fact, a warrant providing an appropriate relation of conveyance, and a conveyed fact (Katzav and Reed 2008). However, one or more of these elements is often left unstated by the speaker as an enthymeme. For this reason, some studies model arguments in their simplest form as a single proposition, though humans generally require at least two of the elements to accurately distinguish arguments from statements (Mochales and Moens 2011).
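This three-part view can be captured in a simple data structure. The sketch below is my own framing (the field names are not Katzav and Reed’s), with missing elements standing in for enthymemes.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Argument:
    """Katzav and Reed's three elements; any element may be an
    unstated enthymeme, represented here as None."""
    conveying_fact: Optional[str]   # e.g. "Joey is a shark."
    warrant: Optional[str]          # the relation of conveyance, often implicit
    conveyed_fact: Optional[str]    # e.g. "Joey is dangerous."

    def enthymemes(self) -> list[str]:
        """Names of the elements left implicit by the speaker."""
        return [name for name, value in vars(self).items() if value is None]

arg = Argument(conveying_fact="Joey is a shark.",
               warrant=None,  # "All sharks are dangerous" is left unstated
               conveyed_fact="Joey is dangerous.")
print(arg.enthymemes())  # ['warrant']
```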

The ubiquity of enthymemes in both formal and informal dialogue has proved to be a significant challenge for argument mining. Left for the hearer to infer, these implicit elements are often “highly context-dependent and cannot be easily labeled in distant data using predefined patterns” (Habernal and Gurevych 2015). It is important to note that the existence of these enthymemes does not violate the initial assumption that a speaker argues with the intent of being understood by a hearer. Rather, enthymemes, like other human heuristics, provide a computational savings to a speaker/hearer pair with a sufficiently shared context. Thus, enthymemes indicate the elements of an argument that a speaker assumes a hearer can easily infer, a particular challenge when a speaker is a poor judge of the hearer’s knowledge or when the hearer is an AI model.

To complicate matters further, there are no definitive rules for the roles enthymemes may take. Any of an argument’s elements may appear as enthymemes, though psycholinguistic evidence indicates that the relationship of conveyance between two facts, the argument’s warrant, is most commonly left implicit (Katzav and Reed 2008). Similarly, the discourse markers which might otherwise serve as valuable clues for argument reconstruction “need not appear in arguments and thus cannot be relied upon” (Katzav and Reed 2008). All of this poses a significant challenge.

In her work, Cohen bypasses this problem by relying on an evidence oracle which takes two propositions, A and B, and responds ‘yes’ or ‘no’ as to whether A is evidence for B (Cohen 1987). In determining argument relations, Cohen’s oracle identifies missing premises, verifies the plausibility of these enthymemes, and ultimately concludes that an evidence relation holds if the missing premise is deemed plausible. In order to be found plausible, the inferred premise must be both plausibly intended by the speaker and plausibly believed by the hearer. In this way, the evidence oracle determines the structure of the argument while also overcoming the presence of enthymemes.
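A minimal sketch of the oracle interface Cohen assumes might look like the following; the toy knowledge base here stands in for whatever reasoning about missing premises the oracle is taken to perform.

```python
def evidence_oracle(a: str, b: str) -> bool:
    """Cohen-style oracle: answer whether proposition `a` is evidence for `b`.
    The toy knowledge base below stands in for the unspecified reasoning the
    oracle performs (inferring a missing premise and judging its plausibility)."""
    known_evidence = {
        ("Joey is a shark.", "Joey is dangerous."),
        ("Costs exceeded revenue.", "The project lost money."),
    }
    return (a, b) in known_evidence

# Building an argument tree, a system would query the oracle pairwise:
print(evidence_oracle("Joey is a shark.", "Joey is dangerous."))  # True
print(evidence_oracle("Joey is dangerous.", "Joey is a shark."))  # False
```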

Decreasing reliance on such an oracle, Katzav and Reed develop a model to automatically determine the evidence relations between any pair of propositions within a text. Their model allows for two possible relations between an observed pair of argumentative statements: a) one of the propositions represents a fact which supposedly necessitates the other proposition (e.g., missing conveyance), or b) one proposition represents a conveyance which, together with a fact represented by the other proposition, supposedly necessitates some missing proposition (e.g., missing claim) (Katzav and Reed 2008). The task then is to determine the type of relationship between the two statements and use that relationship to reconstruct the missing element.

Their notable contribution to argumentative theory is to observe that arguments can be classified by type (e.g., “causal argument”), and that this type constrains the possible evidence relations of an argument. Central to their model is the identification of an argument’s warrant: the conveying element which defines the relationship between fact A and fact B. Since this is the element which is most often an enthymeme, Katzav and Reed devote significant attention to reconstructing an argument’s warrant from two observed argumentative statements. If, on the other hand, the observed pair fall into type b) above, with the final proposition missing, then the process is trivial: “the explicit statement of the conveying fact, along with the warrant, allows the immediate deduction of the implicit conveyed fact” (Katzav and Reed 2008).

This framework cleverly redefines the enthymeme reconstruction challenge. Katzav and Reed argue that no relation of conveyance can reasonably be thought to relate just any type of fact to any other type of fact. Therefore, given two observed propositions, A and B, a system can narrow the class of possible relations to warrants which can reasonably be thought to relate facts of the type A to facts of the type B. Katzav and Reed find this to be a “substantial constraint” which allows a system to deduce a missing warrant by leveraging a theory of “which relations of conveyance there are and of which types each such relation can reasonably be thought to relate” (Katzav and Reed 2008).
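A toy sketch of that constraint: index candidate warrant relations by the types of facts they could reasonably connect, then look up the pair of observed fact types. The type labels and relations below are invented for illustration, not Katzav and Reed’s actual theory of conveyance.

```python
# Illustrative mapping from (type of fact A, type of fact B) to warrant
# relations that could plausibly connect them; the taxonomy is made up
# for illustration only.
WARRANTS_BY_TYPE = {
    ("event", "event"):               ["causal"],
    ("class_membership", "property"): ["classification"],
    ("action", "outcome"):            ["consequence", "practical_reasoning"],
}

def candidate_warrants(type_a: str, type_b: str) -> list[str]:
    """Narrow the search for a missing warrant to relations that can
    reasonably connect facts of these two types."""
    return WARRANTS_BY_TYPE.get((type_a, type_b), [])

# "Joey is a shark" (class membership) conveying "Joey is dangerous" (property):
print(candidate_warrants("class_membership", "property"))  # ['classification']
```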

While this approach does represent an advancement over Cohen’s entirely oracle-dependent model, it is not without its own limitations. For successful warrant recovery, Katzav and Reed require a corpus with statements annotated with the types of facts they represent and a system with relevant background information similarly marked up. Furthermore, it requires a robust theory of warrants and relations, a subject only loosely outlined in their 2008 paper. Reed has advanced such a theory elsewhere, however, through his collaborations with Walton. This line of work is picked up by Feng and Hirst in a slightly different approach to enthymeme reconstruction.

Before inferring an argument’s enthymemes, Feng and Hirst argue, one must first classify an argument’s scheme. While a warrant defines the relation between two propositions, a scheme is a template which may incorporate more than two propositions. Unlike Cohen’s argument structures, the order in which statements occur does not affect an argument’s scheme. A scheme, then, is a flexible model which incorporates elements of Cohen’s coherent structure theory with elements of her evidence relations theory.

Drawing on the 65 argument schemes developed by Walton, et al. in 2008, Feng and Hirst seek to classify arguments under the five most common schemes. While their ultimate goal is to infer enthymemes, their current work takes this challenge to primarily be a classification task – once an argument’s scheme is properly classified, reconstruction can proceed as a simpler task. Under their model, an argument mining pipeline would reconstruct an argument’s scheme, fit the stated propositions into the scheme, and then use this template to infer enthymemes (Feng and Hirst 2011).
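A schematic sketch of that pipeline order might look like the following; the scheme classifier and the templates are placeholders for illustration, not Feng and Hirst’s system.

```python
# Sketch of the pipeline order Feng and Hirst describe; the scheme
# classifier and templates below are placeholders.
SCHEME_TEMPLATES = {
    "argument_from_example": ["specific case", "generalization"],
    "practical_reasoning":   ["goal", "action that achieves goal"],
}

def classify_scheme(propositions: list[str]) -> str:
    """Stand-in for a trained scheme classifier."""
    return "argument_from_example"

def reconstruct(propositions: list[str]) -> dict:
    scheme = classify_scheme(propositions)           # step 1: classify the scheme
    slots = SCHEME_TEMPLATES[scheme]                 # step 2: fit stated propositions to slots
    filled = dict(zip(slots, propositions))
    missing = [s for s in slots if s not in filled]  # step 3: remaining slots are enthymemes
    return {"scheme": scheme, "filled": filled, "enthymeme_slots": missing}

print(reconstruct(["Joey the shark attacked a swimmer."]))
# {'scheme': 'argument_from_example',
#  'filled': {'specific case': 'Joey the shark attacked a swimmer.'},
#  'enthymeme_slots': ['generalization']}
```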

Working with 393 arguments from the Araucaria dataset, Feng and Hirst achieved over 90% best average accuracies for two of their schemes, with three other schemes rating in the 60s and 70s. They did this using a range of sentence- and token-based features, as well as a “type” feature, annotated in their dataset, which indicates whether the premises contribute to the conclusion in linked or convergent order (Feng and Hirst 2011).

A “linked” argument has two or more interdependent propositions which are all necessary to make the conclusion valid. In contrast, exactly one premise is sufficient to establish a valid conclusion in a “convergent” argument (Feng and Hirst 2011). They found this type feature to improve classification accuracy in most cases, though that improvement varied from 2.6 points for one scheme to 22.3 points for another. Unfortunately, automatically identifying an argument’s type is not an easy task in itself and therefore may not ultimately represent a net gain in enthymeme reconstruction. As future work, Feng and Hirst propose attempting automatic type classification through rules such as defining one premise to be linked to another if either would become an enthymeme if deleted.

While their efforts showed promising results in scheme classification, it is worth noting that best average accuracies varied significantly by scheme. Their classifier achieved remarkable results for an “argument from example” scheme (90.6%) and a “practical reasoning” scheme (90.8%). However, the schemes of “argument from consequences” and “argument from classification” were not nearly as successful – achieving only 62.9% and 63.2% best average accuracy respectively.

Feng and Hirst attribute this disparity to the low-performing schemes not having “such obvious cue phrases or patterns as the other three schemes which therefore may require more world knowledge encoded” (Feng and Hirst 2011). Thus, while the scheme classification approach cleverly merges the challenges introduced by Cohen’s coherent structure and evidence relationship theories, this work also highlights the need to not neglect the challenges of linguistic cues.


State of the Art Techniques in Argument Mining

As part of a paper I’m working on, here’s a review of select recent state of the art efforts in the area of argument mining.

In their work on automatic, topic-independent argument extraction, Swanson, Ecker et al. introduce an Implicit Markup hypothesis which harkens back to the earlier work of Cohen. This hypothesis comprises four elements: discourse relation, dialogue structure, syntactic properties, and semantic density (Swanson, Ecker et al. 2015). In their model, discourse relations can be determined by any two observed arguments. The second argument, or claim, is defined as the one to which a warrant – if observed – is syntactically bound. Dialogue structure considers the position of an argumentative statement within a post. Notably, with its focus on relative position, this is more similar to Cohen’s model of coherent structure than to the concept of schemes introduced by Walton. A sophisticated version of Cohen’s linguistic cues, syntactic properties are a clever way to leverage observed discourse markers in order to infer missing discourse markers. For example, observing sentences such as “I agree that <x>” can help identify other argumentative content of the more general form “I <verb> that <x>.”
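A rough sketch of that generalization step, using a regular expression in place of the syntactic parse a real system would rely on:

```python
import re

# Observed pattern "I agree that <x>" generalized to "I <verb> that <x>".
# A real system would use a syntactic parser; a simple regex keeps the idea visible.
PATTERN = re.compile(r"\bI\s+(\w+)\s+that\s+(.+)", re.IGNORECASE)

def extract_claim(sentence: str):
    """Return (verb, embedded claim) if the sentence matches 'I <verb> that <x>'."""
    match = PATTERN.search(sentence)
    if match:
        verb, claim = match.group(1), match.group(2)
        return verb, claim.rstrip(".")
    return None

print(extract_claim("I agree that the policy failed."))    # ('agree', 'the policy failed')
print(extract_claim("I suspect that turnout will drop."))  # ('suspect', 'turnout will drop')
```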

The final element, semantic density, is a notable advancement in processing noisy data. Composed of a number of features, such as sentence length, word length, deictic pronouns, and specificity, semantic density filters out those sentences which are not relevant to a post’s core argument. When dealing with noisy forum post data, this process of filtering out sentences which are harder to interpret provides valuable computational savings without losing an argument’s core claim. Furthermore, this filtering can help with the enthymeme challenge – in fact, Swanson, Ecker et al. filter out most enthymemes, focusing instead on claims which are syntactically bound to an explicit warrant.

With this model, Swanson, Ecker et al. take on the interesting task of trying to automatically predict argument quality – a particularly timely challenge given the ubiquity of argumentative data from noisy online forums. With a corpus of over 100,000 posts on four political topics, Swanson, Ecker et al. compare the predictions of their model to human annotations of argument quality. Testing their model on three regression algorithms, they found that a support vector machine (SVM) performed best, explaining nearly 50% of the variance for some topics (R² = 0.466 and 0.441 for gun control and gay marriage, respectively).
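A minimal sketch of that regression setup, with invented features and scores standing in for their much richer feature set and annotated corpus:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics import r2_score

# Hypothetical per-sentence features (e.g. length, cue-word count, a specificity
# proxy) and annotator quality scores; the real feature set is far richer.
X = np.array([[12, 1, 0.4], [5, 0, 0.1], [20, 2, 0.7], [8, 0, 0.2], [15, 1, 0.5]])
y = np.array([0.8, 0.2, 0.9, 0.3, 0.7])

# Support vector regression over the quality annotations.
model = SVR(kernel="rbf")
model.fit(X, y)
print("R^2 on training data:", r2_score(y, model.predict(X)))
```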

Stab, Gurevych, and Habernal, all of the Ubiquitous Knowledge Processing Lab, have also made important contributions to the state of the art in argument mining. As noted above, Stab and Gurevych were among the first to expressly tackle the challenge of poorly structured arguments in their work identifying discourse structures in persuasive essays (Stab and Gurevych 2014).

In seeking to identify an argument’s structure and the relationship between its elements, this work has clear ties back to earlier argumentative theory. Indeed, while unfortunately prone to containing poorly-formed arguments, student essays are a model setting for Cohen’s theory: a single speaker does their best to form a coherent and compelling argument while a weary reader is tasked with trying to understand their meaning.

A notable contribution of Stab and Gurevych was to break this effort into a two-step classification task. The first step uses a multiclass identifier to classify the components of an argument, while the second step is a simpler binary classification of a pair of argument components as either support or non-support. As future work, they propose developing this as a joint inference problem, since the two pieces of information are indicators of each other. However, they found current accuracy in identifying argument components to be “not sufficient for increasing the performance of argumentative relation identification” (Stab and Gurevych 2014). Their best performing relation identification classifier, an SVM built with structural, lexical, and syntactic features, achieved “almost human performance” with an 86.3% accuracy, compared to a human accuracy of 95.4%. Emphasizing the challenges of linguistic cues in noisy text, a model using discourse markers in student essays yielded an F1-score of only 0.265.
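The two-step decomposition might be sketched as follows, with generic scikit-learn classifiers and toy data standing in for their structural, lexical, and syntactic features:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

# Step 1: multiclass component identification (toy data; the labels echo
# Stab and Gurevych's component types).
components = ["We should ban plastic bags.", "They pollute the oceans.",
              "Cities that banned them saw less litter."]
component_labels = ["claim", "premise", "premise"]
component_clf = make_pipeline(TfidfVectorizer(), LinearSVC())
component_clf.fit(components, component_labels)

# Step 2: binary support / non-support over ordered component pairs,
# represented here by simply concatenating each pair's text.
pairs = ["They pollute the oceans. [SEP] We should ban plastic bags.",
         "Cities that banned them saw less litter. [SEP] They pollute the oceans."]
pair_labels = ["support", "non-support"]
relation_clf = make_pipeline(TfidfVectorizer(), LinearSVC())
relation_clf.fit(pairs, pair_labels)

print(component_clf.predict(["Plastic harms wildlife."]))
print(relation_clf.predict(["Plastic harms wildlife. [SEP] We should ban plastic bags."]))
```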

Finally, in what may be the most promising line of current argument mining work, Habernal and Gurevych build a classifier for their labeled data using features derived in an unsupervised manner from noisy, unlabeled data. Using text from online debate portals, they derive features by “projecting data from debate portals into a latent argument space using unsupervised word embeddings and clustering” (Habernal and Gurevych 2015).

While this debate portal data contains “noisy texts of questionable quality,” Habernal and Gurevych are able to leverage this large, unlabeled dataset to build a successful classifier for their labeled data using a sentence-level SVM-Hidden Markov Model. To do this, they employ “argument space” features: composing vectors containing the weighted average of all word embeddings in a phrase, and then projecting those vectors into a latent vector space. The centroids found by clustering sentences from the debate portal in this way represent “a prototypical argument” – implied by the clustering but not actually observed. Labeled data can then be projected into this latent vector space and the computed distances to the centroids encoded as features. In order to test cross-domain performance, the model was trained on five domains and tested on a sixth.
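A small sketch of the argument space idea, with toy two-dimensional embeddings in place of pretrained word vectors and an unweighted (rather than weighted) average:

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy word embeddings (in practice, pretrained vectors such as word2vec).
EMB = {"ban": [0.9, 0.1], "plastic": [0.8, 0.2], "pollutes": [0.7, 0.3],
       "tax": [0.1, 0.9], "unfair": [0.2, 0.8], "oceans": [0.6, 0.4]}

def sentence_vector(sentence: str) -> np.ndarray:
    """Average the embeddings of known words (unweighted here; Habernal and
    Gurevych use a weighted average)."""
    vecs = [EMB[w] for w in sentence.lower().split() if w in EMB]
    return np.mean(vecs, axis=0) if vecs else np.zeros(2)

# 1. Cluster unlabeled debate-portal sentences; centroids ~ "prototypical arguments".
unlabeled = ["ban plastic pollutes oceans", "plastic pollutes", "tax unfair", "unfair tax"]
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans.fit(np.array([sentence_vector(s) for s in unlabeled]))

# 2. For labeled data, distances to each centroid become features.
def argument_space_features(sentence: str) -> np.ndarray:
    return np.linalg.norm(kmeans.cluster_centers_ - sentence_vector(sentence), axis=1)

print(argument_space_features("plastic pollutes oceans"))
```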

While this continues to be a challenging task, the argument space features consistently increased the model’s performance in classifying an argument’s component type. The best classification of claims (F1-score: 0.256) came from combining the argument feature space with semantic and discourse features. This compares to a human-baseline F1-score of 0.739 and a random assignment F1-score of 0.167.

Importantly, Habernal and Gurevych ground their approach in argumentative theory. Building off the work of Toulmin, they take each document of their corpora to contain a single argument composed of five components: a claim to be established, premises which give reason for the claim, backing which provides additional information, a rebuttal attacking the claim, and finally a refutation attacking the rebuttal. Each type of component is classified in their vector space, allowing them to assess which elements are more successfully classified as well as to gain insights into which argument structures prove particularly problematic.


Argument Structure

In her 1987 paper on “Analyzing the structure of argumentative discourse,” Robin Cohen laid out a theory of argument understanding comprising three core components: coherent structure, linguistic clue interpretation, and evidence relationships. As the title suggests, this post focuses on the first of those elements: argument structure.

Expecting a coherent structure minimizes the computational requirements of argument mining tasks by limiting the possible forms of input. The coherent structure theory parses arguments as a tree of related statements, with every statement providing evidence for some other statement, and one root statement serving as the core claim of the argument. The theory posits that argument structures may vary, but there are a finite number of unique structures, and those structures are discoverable. Cohen herself introduces two such structures: pre-order, “where the speaker presents a claim and then states evidence,” and post-order, “where the speaker consistently presents evidence and then states the claim” (Cohen 1987).
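The tree view can be sketched directly; pre-order and post-order then differ only in whether a claim is presented before or after its evidence. The example argument below is invented.

```python
from dataclasses import dataclass, field

@dataclass
class Statement:
    text: str
    evidence: list["Statement"] = field(default_factory=list)

# An invented argument: one root claim supported by evidence, some of which
# is itself supported by further evidence.
argument = Statement("We should ban plastic bags.", [
    Statement("They pollute the oceans.", [
        Statement("Marine animals ingest plastic fragments."),
    ]),
    Statement("Bans have reduced litter elsewhere."),
])

def pre_order(node: Statement, depth: int = 0) -> None:
    """Claim before its evidence (Cohen's pre-order presentation)."""
    print("  " * depth + node.text)
    for child in node.evidence:
        pre_order(child, depth + 1)

def post_order(node: Statement, depth: int = 0) -> None:
    """Evidence before the claim it supports (Cohen's post-order presentation)."""
    for child in node.evidence:
        post_order(child, depth + 1)
    print("  " * depth + node.text)

pre_order(argument)
post_order(argument)
```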

Argument structure is a particularly notable and challenging element of argument mining. Identifying argument structures is essential for evaluating the quality of an argument (Stab and Gurevych 2014), but it is a difficult task which has gone largely unexplored. A key challenge is the lack of argument delimiters; one argument may span multiple sentences and multiple premises may be contained in a single sentence. In the resulting segmentation problem, we are able to determine which information belongs to the arguments, but not how this information is split into the different arguments (Mochales and Moens 2011).

To address this challenge, Mochales and Moens have sought to expand models of argument structure, parsing texts “by means of manually derived rules that are grouped into a context-free grammar (CFG)” (Mochales and Moens 2011). Restricting their focus to the legal domain – where arguments are consistently well-formed – Mochales and Moens manually built a context-free grammar in which a document has a tree structure (T) formed by an argument (A) and a decision (D). Further rules elucidated what elements may form the argument and what elements may form the decision. By maintaining a tree structure for identified arguments, Mochales and Moens broadened the range of possible argument structures without sacrificing too much computational tractability.
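A toy grammar in the spirit of their approach might look like the following, using NLTK; the production rules here are invented for illustration (theirs were manually derived from legal texts), and the terminals are pre-labeled sentence tags rather than raw text.

```python
import nltk

# Toy grammar in the spirit of Mochales and Moens' manually built CFG:
# a document (T) is an argument (A) followed by a decision (D). The
# production rules are illustrative only.
grammar = nltk.CFG.fromstring("""
    T -> A D
    A -> A P | P
    P -> 'premise' | 'premise_conclusion'
    D -> 'decision'
""")

parser = nltk.ChartParser(grammar)
document = ['premise', 'premise', 'premise_conclusion', 'decision']
for tree in parser.parse(document):
    tree.pretty_print()
```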

Using this approach, Mochales and Moens were able to obtain 60% accuracy when detecting argument structures, measured manually by comparing the structures given by the CFG and the structures given by human annotators. This is a notable advancement over the simple structures introduced by Cohen, but there is still more work to be done in this area. Specifically, as Mochales and Moens point out, future work includes broadening the corpora studied to include additional types of argumentation structure, developing techniques which can identify and computationally handle structures more complex than trees, and incorporating feedback from those who generate the arguments being parsed. The limitation of this model to legal texts is particularly notable, as “it is likely it will not achieve acceptable accuracy when applied to more general texts in which discourse markers are missing or even misleadingly used (e.g. student texts)” (Stab and Gurevych 2014).


Argument Mining

In 1987, computer scientist Robin Cohen outlined a theory of argument structure which laid the groundwork for modern argument mining tasks. Taking argument to be a process in which a speaker intentionally tries to convince a hearer, her approach focused on understanding the structure arguments can take.

This structure is generally tree-like: the speaker’s primary claim is the root, and supporting arguments appear as branches. Secondary arguments may further expand the tree, as the speaker makes claims to reinforce a supporting argument. That is, a simple argument can take the form A and B, therefore C, or could take the form A therefore B therefore C.

In this way a complex argument can be modeled as a tree, with all the various supporting and secondary arguments pointing back up to the core argument root.

The problem that Cohen noted, which has continued to be a challenge in more recent argument mining techniques, is that core premises often go unsaid.

Take, for example, the simple argument structure of “P therefore Q.” In many contexts, a speaker will state P and Q, but leave out the connecting claim that P implies Q. As human interpreters, filling this gap is often a trivial task. Consider the simple argument:

Joey is dangerous.
Joey is a shark.

It is left to the reader to infer that Joey is dangerous because he is a shark…and that all sharks are dangerous. (This, of course, could be debated…)

While there are no doubt instances where this lack of clarity causes confusion for a human reader, in general, this is a challenge which is easy for people with their broad array of contextual knowledge – and terribly difficult for machines.

Joel Katzav and Chris Reed formalize this missing argument (enthymeme) challenge. Defining an argument as “a representation of a fact as conveying some other fact,” a complete argument then has three elements: a conveying fact, the appropriate relation of conveyance, and the conveyed fact.

In parsing content, then, an algorithm could work to define a sentence or otherwise defined element as either a “non-argument” or as one of the argument types above. This makes the computer’s job a little easier: it only has to recognize pieces of an argument and can flag which arguments are incomplete.

Furthermore, syntactic clues often give both humans and machines some insight into the structure of an implied argument: because X, therefore Y. Annotated debate texts can then help machines learn the relevant syntactic clues, allowing them to better parse arguments.

This is still somewhat unsatisfying, though, as annotating texts is difficult, expensive…and may still be inaccurate. In one study of online debate, Rob Abbott et al. employed 5-7 annotators per post and still found not insignificant disagreement on some measures. Most notably, it seems, people themselves are not particularly good at recognizing sarcasm.

Furthermore, arguments are not always…formal.

In legal texts or a public debate, it might be reasonable to assume that a given speaker makes the best possible argument as clearly as possible for a general human audience. This assumption cannot be extended to many online forums or other domains, such as student essays. In colloquial settings, syntactic clues may be missing…or may even be misused.

The latest work in argument mining has focused on overcoming these challenges.

A 2015 paper by Ivan Habernal and Iryna Gurevych, for example, aimed to build an argument mining system that could work across domains, on unlabeled data. An earlier paper by Christian Stab and Iryna Gurevych focused on trying to parse (often poorly formatted) student essays.

By projecting argument elements into a vector space – or argument space – researchers can use unsupervised techniques to cluster arguments and identify argument centroids, which represent “prototypical arguments” not actually observed in the text.

There’s still more work to do, but these recent approaches have been reasonably successful and show a lot of promise.


Coding The English Language

I have been quite busy this week trying to capture all the rules of the English language.

As you might suspect, this is a non-trivial task.

Having benefited from being a native English speaker and having studied far more regular languages (Latin and Japanese), I always knew that English was a crazy mishmash of rules – but I find I am getting a whole new appreciation for its complexity.

As it stands, my grammar – which has a tiny vocabulary and only rudimentary sentences – has nearly 500 rules. Every time I try to generalize, I find those nagging English exceptions which create a cascade of special case rules.
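As a toy illustration (nothing like the actual grammar), even one irregular verb forces separate productions where a single conjugation rule would otherwise do:

```python
import nltk

# A toy fragment showing how one irregular verb already forces extra rules:
# "goes"/"went" cannot be derived from a single regular conjugation rule,
# so each form needs its own production.
grammar = nltk.CFG.fromstring("""
    S -> NP VP
    NP -> 'she' | Det N
    Det -> 'the'
    N -> 'dog' | 'child'
    VP -> V_PRES | V_PAST
    V_PRES -> 'walks' | 'goes'
    V_PAST -> 'walked' | 'went'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse(['she', 'went']):
    print(tree)
```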

All this highlights how impressive the advances of Natural Language Processing are – correcting spelling and grammar is hardly easy, much less building an assistant such as Siri which can understand what you say.

It also seems to highlight the concerns of the natural language philosophers – when constructing a thought as an expressible sentence is so hard, how can we be confident our meanings are understood?

Of course, our meanings are very often not understood, which leads to no end of drama and miscommunication. But, putting basic miscommunications aside, what does it really mean to communicate or to understand another person?

Ludwig Wittgenstein poses this question frequently throughout his work. In Philosophical Investigations he tests numerous thought experiments. If I say I am in pain and you have experienced pain, do you understand my pain?

For practical purposes, we generally have to act as if we understand each other, whether or not some deeper philosophical measure of True understanding has been met.

Wittgenstein also uses a lovely metaphor to describe the complex architecture of human language:

“Our language can be regarded as an ancient city: a maze of little streets and squares, of old and new houses, of houses with extensions from various periods, and all this surrounded by a multitude of new suburbs with straight and regular streets and uniform houses.”


The Nature of Technology

I recently finished reading W. Brian Arthur’s The Nature of Technology, which explores what technology is and how it evolves.

Evolves is an intentional word here; the concept is at the core of Arthur’s argument. Technology is not a passive thing which only grows in spurts of genius inspiration – it is a complex system which is continuously growing, changing, and – indeed – evolving.

Arthur writes that he means the term evolution literally – technology builds itself from itself, growing and improving through the novel combination of existing tools – but he is clear that the process of evolution does not imply that technology is alive.

“…To say that technology creates itself does not imply it has any consciousness, or that it uses humans somehow in some sinister way for its own purposes,” he writes. “The collective of technology builds itself from itself with the agency of human inventors and developers much as a coral reef builds itself from the activities of small organisms.”

Borrowing from Humberto Maturana and Francisco Varela, Arthur describes this process as autopoiesis, self-creating.

This is a bold claim.

To consider technology as self-creating changes our relationship with the phenomenon. It is not some disparate set of tools which occasionally benefits from the contributions of our best thinkers; it is a  growing body of interconnected skills and knowledge which can be infinitely combined and recombined into increasingly complex approaches.

The idea may also be surprising. An iPhone 6 may clearly have evolved from an earlier model, which in turn may owe its heritage to previous computer technology – but what relationship does a modern cell phone have with our earliest tools of rocks and fire?

In Arthur’s reckoning, with a complete inventory of technological innovations one could fully reconstruct a technological evolutionary tree – showing just how each innovation emerged by connecting its predecessors.

This concept may seem odd, but Arthur makes a compelling case for it – outlining several examples of engineering problem solving which essentially boil down to applying existing solutions to novel problems.

Furthermore, Arthur explains that this technological innovation doesn’t occur in a vacuum – not only does it require the constant input of human agency, it grows from humanity’s continual “capturing” of physical phenomena.

“At the very start of technological time, we directly picked up and used phenomena: the heat of fire, the sharpness of flaked obsidian, the momentum of a stone in motion. All that we have achieved since comes from harnessing these and other phenomena, and combining the pieces that result,” Arthur argues.

Through this process of exploring our environment and iteratively using the tools we discover to further explore our environment, technology evolves and builds on itself.

Arthur concludes that “this account of the self-creation of technology should give us a different feeling about technology.” He explains:

“We begin to get a feeling of ancestry, of a vast body of things that give rise to things, of things that add to the collection and disappear from it. The process by which this happens is neither uniform nor smooth; it shows bursts of accretion and avalanches of replacement. It continually explores into the unknown, continually uncovers novel phenomena, continually creates novelty. And it is organic: the new layers form on top of the old, and creations and replacements overlap in time. In its collective sense, technology is not merely a catalog of individual parts. It is a metabolic chemistry, an almost limitless collective of entities that interact to produce new entities – and further needs. And we should not forget that needs drive the evolution of technology every bit as much as the possibilities for fresh combination and the unearthing of phenomena. Without the presence of unmet needs, nothing novel would appear in technology.”

In the end, I suppose we should not be surprised by the idea of technology’s evolution. It is a human-generated system; as complex and dynamic as any social system. It is vast, ever-changing, and at times unpredictable – but ultimately, at its core, technology is very human.


Natural Language Processing

I’ve been taking a great class this semester in Natural Language Processing – a computer science field which deals, as you may have guessed, with the processing of “natural” language. NLP is the foundation of technologies like spellcheck, automatic translation (a work in progress!), and Siri.

Essentially, you feed a bunch of human-generated text into a computer and it gives you something in response, with the “something” varying greatly based on what you’re trying to do.

A few weeks ago I deleted all the vowels from the Declaration of Independence.

(And then nondeterministically put them back in).

But at more sophisticated levels, you can analyze the sentiment of a text, mimic human dialogue, or generate new text in the style of a given author. Eventually, I hope to use NLP techniques to process transcripts of political and civic dialogue, but for now I’m enjoying learning the basics of the field.

The fundamentals of NLP are fascinating – in our native language, we each easily construct our own sentences and relatively easily interpret the sentiment and meaning of others’ sentences. We’re generally familiar with the basic syntax and parts of speech in our native language, but we don’t give these much thought as we communicate with those around us.

And, as spoken languages are living languages, in casual conversation we effortlessly change the rules and adapt to new words and styles.

One might think that teaching a computer all the rules of grammar as well as the flexibility of our unspoken rules would be quite complicated. And that’s true to some extent, but more generally the challenge of computer-interfaced language is just different.

ELIZA, one of the early successful NLP programs, is relatively simple. Programmed to respond to human-typed input as a Rogerian psychotherapist, ELIZA is based on an algorithm of pattern matching. You say, “I am sad,” and ELIZA responds, “I’m sorry you are sad.”
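A few-line sketch of that pattern-matching idea; the real program’s script of keywords and reassembly rules was far larger.

```python
import re

# A few ELIZA-style rules: match a pattern, reuse the captured text in a
# canned response. The real program's script was far more elaborate.
RULES = [
    (re.compile(r"I am (.*)", re.IGNORECASE), "I'm sorry you are {0}."),
    (re.compile(r"I feel (.*)", re.IGNORECASE), "Why do you feel {0}?"),
    (re.compile(r"(.*) mother(.*)", re.IGNORECASE), "Tell me more about your family."),
]

def eliza_reply(utterance: str) -> str:
    for pattern, template in RULES:
        match = pattern.match(utterance)
        if match:
            return template.format(*match.groups())
    return "Please go on."

print(eliza_reply("I am sad"))     # I'm sorry you are sad.
print(eliza_reply("I feel lost"))  # Why do you feel lost?
```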

On the other hand, satire and sarcasm continue to elude NLP programs…such humor is just too subtle to capture in rules, I suppose.

The rules for a given NLP program can become quite elaborate, and yet the underlying theory is relatively simple: you start at the beginning of a sentence and then expand a set of rules, with each rule applied with a certain probability. When you reach an end symbol (e.g., a period), you are done.
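A minimal sketch of that generation idea, with an invented toy grammar:

```python
import random

# Each nonterminal maps to possible expansions with probabilities;
# lowercase strings are terminals. The grammar is a toy.
RULES = {
    "S":  ([["NP", "VP", "."]], [1.0]),
    "NP": ([["the", "N"], ["a", "N"]], [0.7, 0.3]),
    "N":  ([["dog"], ["senator"]], [0.5, 0.5]),
    "VP": ([["sleeps"], ["V", "NP"]], [0.6, 0.4]),
    "V":  ([["sees"], ["questions"]], [0.5, 0.5]),
}

def generate(symbol: str = "S") -> list[str]:
    """Recursively expand symbols until only terminals remain."""
    if symbol not in RULES:          # terminal: emit the word (or the end symbol)
        return [symbol]
    expansions, weights = RULES[symbol]
    expansion = random.choices(expansions, weights=weights)[0]
    return [word for sym in expansion for word in generate(sym)]

print(" ".join(generate()))  # e.g. "the dog sees a senator ."
```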


Dynamics of Online Social Interactions

I had the opportunity today to hear from Chenhao Tan, a Ph.D. Candidate in Computer Science at Cornell University who is looking at the dynamics of online social interactions.

In particular, Tan has done a great deal of work around predicting retweet rates for Twitter messages. That is, given two tweets by the same author on the same topic, can you predict which one will be retweeted more?

Interestingly, such pairs of tweets naturally occur frequently on Twitter. For one 2014 study, Tan was able to identify 11,000 pairs of author and topic controlled tweets with different retweet rates.

Comparing the words used as well as a number of custom features, such as the “informativeness” of a given tweet, Tan was able to build a computational model which could correctly identify which tweet was more popular.

He even created a fun tool that allows you to input your own tweet text and see which version is more likely to be retweeted.

From all this Twitter data, Tan was also able to compare the language of “successful” tweets to the tweets drawn from Twitter as a whole; as well as compare how these tweets fit into a given poster’s tone.

Interestingly, Tan found that the best strategy is to “be like the community, be like yourself.” That is – the most successful tweets were not notably divergent from Twitter norms and tended to be in line with the personal style of the original poster.

Tan interpreted this as a positive finding, indicating that a user doesn’t need to do something special in order to “stand out.” But such a result could also point to Twitter as an insular community – unable to amplify messages which don’t fit the dominant norm.

And this leads to one of Tan’s broader research questions. Studies like his work around Twitter look at micro-level data, examining words and exploring how individuals’ minds are changed. But, as Tan pointed out, the work of studying online communities can also be explored from a broader, macro level: what do healthy, online environments look like and how are they maintained?

There is more work to be done on both of these questions, but Tan’s work is an intriguing start.


Mimicking Deliberation

In 1950, pioneering computer scientist Alan Turing described an “imitation game” which has since come to be known as the Turing Test. The test is a game played between three agents: two humans and a computer. Human 1 asks a series of questions; human 2 and the computer respond.

The game: human 1 seeks to correctly identify the human respondent while human 2 and the computer both try to be identified as human.

Turing describes this test in order to answer the question: can machines think?

The game, he argues, can empirically replace the posed philosophical question. A computer which could regularly be identified as human based on its command of language would indeed “think” in all practical meanings of the word.

Turing goes on to address the many philosophical, theological, and mathematical objections to his argument – but that is beyond the scope of what I want to write about today.

Regardless of the test’s indication for sentience, it quickly became a sort of gold standard in natural language processing – could we, in fact, build a computer clever enough to win this game?

Winning the game, of course, requires a detailed and nuanced grasp of language. What word orders are appropriate? What elements of a question ought a respondent repeat? How do you introduce new topics or casually refer to past topics? How do you interact naturally, gracefully engaging with your interlocutor?

Let’s not pretend that I’ve fully mastered such social skills.

In this way, designing a Turing-successful machine can be seen as a mirror of ideal speaking. The winner of the Turing game, human or machine, will ultimately be the player who responds most properly – accepting a nuanced definition of “proper” which incorporates human imperfection.

This makes me wonder – what would a Turing Test look like specifically in the context of political deliberation? That is, how would you program ideal dialogue?

Of course, the definition of ideal dialogue itself is much contested – should each speaker have an exactly measured amount of time? Should turn-taking be intentionally delineated or occur naturally? Must a group come to consensus and make a collective decision? Must there be an absence of conflict or is disagreement a positive signal that differing views are being justly considered?

These questions are richly considered in the deliberation literature, but they take on a different aspect somehow in the context of the Turing Test.

Part of what makes deliberative norms so tricky is that people are, indeed, so different. A positive, safe, productive environment for one person may make another feel silenced. There are intersecting layers of power and privilege which are impossible to disambiguate.

But programming a computer to deliberate is different. A machine enters a dialogue naively – it has no history, no sense of power nor experience of oppression. It is the perfect blank slate upon which an idealized dialogue model could be placed.

This question is important because, in trying to conceive of ideal dialogue, we run the risk of making a dangerous misstep. In the days when educated white men were the only ones allowed to participate in political dialogue, ideal dialogue was easier. People may have held different views, but they came to the conversation with generally equal levels of power and with similar experiences.

In trying to broaden the definition of ideal dialogue to incorporate the experiences of others who do not fit that mold, we run the risk of considering this “other” as a problematizing force. If we could just make women more like men; if we could make people of color “act white,” then the challenges of diverse deliberation would disappear.

No one would intentionally articulate this view, of course, but there’s a certain subversive stickiness to it which has a way of creeping into certain models of dialogue. A quiet, underlying assumption that “white” is the norm and all else must change to accommodate that.

Setting out to program a computer changes all that. It’s a dramatic shift of context which belies all norms.

Frankly, I hardly know what an ideal dialogue machine might look like, but – it seems a question worth considering.
