Earlier in the week, I wrote about the theory of coherent argument structure introduced by Robin Cohen in 1987. Her model also included two other elements: a theory of linguistic clue interpretation and a theory of evidence relationships. These theories, the focus of today’s post, are both closely connected to each other as well as to the theory of argument structure.
Theory of Linguistic Clue Interpretation
Cohen’s theory of linguistic clue interpretation argues for the existence of clue words; “those words and phrases used by the speaker to directly indicate the structure of the argument to the hearer.” Capable of being identified through simple n-gram models as well as more sophisticated means, these linguistic cues, or discourse markers, are a common feature of argument mining. Cohen outlines several common clue types such as redirection, which re-direct the hearer to an earlier part of the argument (“returning to…”) and connection, a broad category encompassing clues of inference (“as a result of…”), clues of contrast (“on the other hand…”), and clues of detail (“specifically…”). Most notably, though, Cohen argues that clues are necessary for arguments whose structure is more complex than those covered by the coherent structure theory. That is, the function of linguistic cues is not “to merely add detail on the interpretation of the contained proposition, but to allow that proposition an interpretation that would otherwise be denied” (Cohen 1987).
Discourse markers are also “strongly associated with particular pragmatic functions,” (Abbott, Walker et al. 2011) making them valuable for sentiment analysis tasks such as determining agreement or disagreement. This is the approach Abbott, Walker et al. used in classifying types of arguments within a corpus of 109,553 annotated posts from an online forum. Since the forum allowed for explicitly quoting another post, Abbott, Walker et al. identified 8,242 quote-response pairs, where a user quoted another post and then added a comment of their own.
In addition to the classification task of determining whether the response agrees or disagrees with the preceeding quote, the team analysed the pairs on a number of sentiment spectrums: respect/insult, fact/emotion, nice/nasty, and sarcasm. Twenty discourse markers identified through manual inspection of a subset of the corpus, as well as the use of “repeated punctuation” served as key features in the analysis.
Using a JRip classifier built on n-gram and bi-gram discourse markers, as well as a handful of other features such as post meta-data and topic annotations, Abbott, Walker et al. found the best performance (0.682 accuracy, compared to a 0.626 unigram baseline) using local features from both the quote and response. This indicates that the contextual features do matter, and, in the words of the authors, vindicates their “interest in discourse markers as cues to argument structure” (Abbott, Walker et al. 2011).
While these discourse markers can provide vital clues to a hearer trying to reconstruct an argument, relying on them in a model requires that a speaker not only try to be understood, but be capable of expressing themselves clearly. Stab and Gurevych, who are interested in argument mining as a tool for automating feedback on student essays, argue that discourse markers make a poor feature, since, in their corpora, these markers are often missing or even misleadingly used (Stab and Gurevych 2014). Their approach to this challenge will be further discussed in the state of the art section of this paper.
Theory of Evidence Relationships
The final piece of Cohen’s model is evidence relationships, which explicitly connect one argument to another and govern the verification of evidence relations between propositions (Cohen 1987). While the coherent structure principle lays out the different forms an argument may take, evidence relationships are the logical underpinnings that tie an argument’s structure together. As Cohen explains, the pragmatic analysis of evidence relationships is necessary for the model because the hearer needs to be able to “recognize beliefs of the speaker, not currently held by the hearer.” That is, whether or not the hearer agrees with the speaker’s argument, the hearer needs to be able to identify the elements of the speaker’s argument as well as the logic which holds that argument together.
To better understand the role of evidence relationships, it is helpful to first develop a definition of an “argument.” In its most general form, an argument can be understood as a representation of a fact as conveying some other fact. In this way, a complete argument requires three elements: a conveying fact, a warrant providing an appropriate relation of conveyance, and a conveyed fact (Katzav and Reed 2008). However, one or more of these elements often takes the form of an implicit enthymeme and is left unstated by the speaker. For this reason, some studies model arguments in their simplest form as a single proposition, though humans generally require at least two of the elements to accurately distinguish arguments from statements (Mochales and Moens 2011).
The ubiquity of enthymemes in both formal and informal dialogue has proved to be a significant challenge for argument mining. Left for the hearer to infer, these implicit elements are often “highly context-dependent and cannot be easily labeled in distant data using predefined patterns” (Habernal and Gurevych 2015). It is important to note that the existence of these enthymemes does not violate the initial assumption that a speaker argues with the intent of being understood by a hearer. Rather, enthymemes, like other human heuristics, provide a computational savings to a hearer/listener pair with a sufficiently shared context. Thus, enthymemes indicate the elements of an argument that a speaker assumes a listener can easily infer, a particular challenge when a speaker is a poor judge of the listener’s knowledge or when the listener is an AI model.
To complicate matters further, there are no definitive rules for the roles enthymemes may take. Any of an argument’s elements may appear as enthymemes, though psycholinguistic evidence indicates that the relationship of conveyance between two facts, the argument’s warrant, is most commonly left implicit (Katzav and Reed 2008). Similarly, the discourse markers which might otherwise serve as valuable clues for argument reconstruction “need not appear in arguments and thus cannot be relied upon” (Katzav and Reed 2008). All of this poses a significant challenge.
In her work, Cohen bypasses this problem by relying on an evidence oracle which takes two propositions, A and B, and responds ‘yes’ or ‘no’ as to whether A is evidence for B (Cohen 1987). In determining argument relations, Cohen’s oracle identifies missing premises, verifies the plausibility of these enthymemes, and ultimately concludes that an evidence relation holds if the missing premise is deemed plausible. In order to be found plausible, the inferred premise must be both plausibly intended by the speaker and plausibly believed by the hearer. In this way, the evidence oracle determines the structure of the argument while also overcoming the presence of enthymemes.
Decreasing reliance on such an oracle, Katzav and Reed develop a model to automatically determine the evidence relations between any pair of propositions within a text. Their model allows for two possible relations between an observed pair of argumentative statements: a) one of the propositions represents a fact which supposedly necessitates the other proposition (eg, missing conveyance), or b) one proposition represents a conveyance which, together with a fact represented by the other proposition, supposedly necessities some missing proposition (eg, missing claim) (Katzav and Reed 2008). The task then is to determine the type of relationship between the two statements and use that relationship to reconstruct the missing element.
Their notable contribution to argumentative theory is to observe that arguments can be classified by type (eg, “causal argument”), and that this type constrains the possible evidence relations of an argument. Central to their model is the identification of an argument’s warrant; the conveying element which defines the relationship between fact A and fact B. Since this is the element which is most often an enthymeme, Katzav and Reed devote significant attention to reconstructing an argument’s warrant from two observed argumentative statements. If, on the other hand, the observed pair fall into type b) above, with the final proposition missing, then the process is trivial: “the explicit statement of the conveying fact, along with the warrant, allows the immediate deduction of the implicit conveyed fact” (Katzav and Reed 2008).
This framework cleverly redefines the enthymeme reconstruction challenge. Katzav and Reed argue that no relation of conveyance can reasonably be thought to relate just any type of fact to any other type of fact. Therefore, given two observed propositions, A and B, a system can narrow the class of possible relations to warrants which can reasonably be thought to relate facts of the type A to facts of the type B. Katzav and Reed find this to be a “substantial constraint” which allows a system to deduce a missing warrant by leveraging a theory of “which relations of conveyance there are and of which types each such relation can reasonably be thought to relate” (Katzav and Reed 2008).
While this approach does represent an advancement over Cohen’s entirely oracle-dependent model, it is not without its own limitations. For successful warrant recovery, Katzov and Reed require a corpus with statements annotated with the types of facts they represent and a system with relevant background information similarly marked up. Furthermore, it requires a robust theory of warrants and relations, a subject only loosely outlined in their 2008 paper. Reed has advanced such a theory elsewhere, however, through his collaborations with Walton. This line of work is picked up by Feng and Hirst in a slightly different approach to enthymeme reconstruction.
Before inferring an argument’s enthymemes, Feng and Hirst argue, one must first classify an argument’s scheme. While a warrant defines the relation between two propositions, a scheme is a template which may incorporate more than two propositions. Unlike Cohen’s argument structures, the order in which statements occur does not affect an argument’s scheme. A scheme, then, is a flexible model which incorporates elements of Cohen’s coherent structure theory with elements of her evidence relations theory.
Drawing on the 65 argument schemes developed by Walton, et al. in 2008, Feng and Hirst seek to classify arguments under the five most common schemes. While their ultimate goal is to infer enthymemes, their current work takes this challenge to primarily be a classification task – once an argument’s scheme is properly classified, reconstruction can proceed as a simpler task. Under their model, an argument mining pipeline would reconstruct an argument’s scheme, fit the stated propositions into the scheme, and then use this template to infer enthymemes (Feng and Hirst 2011).
Working with 393 arguments from the Araucaria dataset, Feng and Hirst achieved over 90% best average accuracies for two of their schemes, with three other schemes rating in the 60s and 70s. They did this using a range of sentence and token based features, as well as a “type” feature, annotated in their dataset, which indicates whether the premises contribute to the conclusion in linked or convergent order (Feng and Hirst 2011).
A “linked” argument has two or more interdependent propositions which are all necessary to make the conclusion valid. In contrast, exactly one premise is sufficient to establish a valid conclusion in a “convergent” argument (Feng and Hirst 2011). They found this type feature to improve classification accuracy in most cases, though that improvement varied from 2.6 points for one scheme to 22.3 points for another. Unfortunately, automatically identifying an argument’s type is not an easy task in itself and therefore may not ultimately represent a net gain in enthymeme reconstruction. As future work, Feng and Hirst propose attempting automatic type classification through rules such as defining one premise to be linked to another if either would become an enthymeme if deleted.
While their efforts showed promising results in scheme classification, it is worth noting that best average accuracies varied significantly by scheme. Their classifier achieved remarkable results for an “argument from example” scheme (90.6%) and a “practical reasoning” scheme (90.8%). However, the schemes of “argument from consequences” and “argument from classification” were not nearly as successful – achieving only 62.9% and 63.2% best average accuracy respectively.
Feng and Hirst attribute this disparity to the low-performing schemes not having “such obvious cue phrases or patterns as the other three schemes which therefore may require more world knowledge encoded” (Feng and Hirst 2011). Thus, while the scheme classification approach cleverly merges the challenges introduced by Cohen’s coherent structure and evidence relationship theories, this work also highlights the need to not neglect the challenges of linguistic cues.





