As part of a paper I’m working on, here’s a review of selected recent state-of-the-art efforts in the area of argument mining.
In their work on automatic, topic-independent argument extraction, Swanson, Ecker et al. introduce an Implicit Markup hypothesis which harkens back to the earlier work of Cohen. This hypothesis is built of four elements: discourse relation, dialogue structure, syntactic properties, and semantic density (Swanson, Ecker et al. 2015). In their model, discourse relations can be determined by any two observed arguments. The second argument, or claim, is defined as the one to which a warrant – if observed – is syntactically bound. Dialogue structure considers the position of an argumentative statement within a post. Notably, with its focus on relative position, this is more similar to Cohen’s model of coherent structure than to the concept of schemes introduced by Walton. A sophisticated version of Cohen’s linguistic cues, syntactic properties are a clever way to leverage observed discourse markers in order to infer missing discourse markers. For example, observing sentences such as “I agree that <x>” can help identify other argumentative content of the more general form “I <verb> that <x>.”
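To make the “I <verb> that <x>” idea concrete, here is a minimal sketch of how one observed marker might be generalized into a template. This is only an illustration, not the method Swanson, Ecker et al. actually use: the regular expression, the `TEMPLATE` name, and the `match_marker_template` function are all hypothetical, and a real system would rely on syntactic parses rather than surface patterns.

```python
import re

# Hypothetical template generalizing the observed marker "I agree that <x>"
# to the more general form "I <verb> that <x>".
TEMPLATE = re.compile(
    r"^I\s+(?P<verb>[A-Za-z]+)\s+that\s+(?P<x>.+)$", re.IGNORECASE
)


def match_marker_template(sentence):
    """Return (verb, x) if the sentence fits "I <verb> that <x>", else None."""
    m = TEMPLATE.match(sentence.strip())
    if m is None:
        return None
    # Lowercase the verb and drop any trailing periods from the content span.
    return m.group("verb").lower(), m.group("x").rstrip(".")
```

Under these assumptions, a sentence such as “I believe that taxes are too high.” is picked up even though only “I agree that” was observed as a marker, while a sentence without the pattern returns `None`.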
The final element, semantic density, is a notable advancement in processing noisy data. Composed of a number of features, such as sentence length, word length, deictic pronouns, and specificity, semantic density filters out those sentences which are not relevant to a post’s core argument. When dealing with noisy forum post data, this process of filtering out sentences which are harder to interpret provides valuable computational savings without losing an argument’s core claim. Furthermore, this filtering can help with the enthymeme challenge – in fact, Swanson, Ecker et al. filter out most enthymemes, focusing instead on claims which are syntactically bound to an explicit warrant.
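As a rough sketch of what a few semantic density features could look like in practice, the snippet below computes sentence length, mean word length, and a count of deictic pronouns. The pronoun list, the whitespace tokenization, and the `density_features` function are my own assumptions for illustration; specificity is left out because it requires a separate model, and none of this is taken from the authors’ implementation.

```python
# Assumed, illustrative list of deictic pronouns; the paper's list may differ.
DEICTIC_PRONOUNS = {"this", "that", "these", "those", "it", "here", "there"}


def density_features(sentence):
    """Compute three simple surface features for one sentence."""
    # Naive tokenization: split on whitespace, strip punctuation, lowercase.
    tokens = [t.strip(".,;:!?\"'()").lower() for t in sentence.split()]
    tokens = [t for t in tokens if t]
    n = len(tokens)
    return {
        "sentence_length": n,
        "mean_word_length": sum(len(t) for t in tokens) / n if n else 0.0,
        "deictic_count": sum(t in DEICTIC_PRONOUNS for t in tokens),
    }
```

A short sentence dominated by pronouns, such as “That is it.”, scores low on length and high on deictic count, which is the kind of sentence a density filter would plausibly discard.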
With this model, Swanson, Ecker et al. take on the interesting task of trying to automatically predict argument quality – a particularly timely challenge given the ubiquity of argumentative data from noisy online forums. With a corpus of over 100,000 posts on four political topics, Swanson, Ecker et al. compare the predictions of their model to human annotations of argument quality. Testing their model with three regression algorithms, they found that a support vector machine (SVM) performed best, explaining nearly 50% of the variance for some topics (R² = 0.466 and 0.441 for gun control and gay marriage, respectively).
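For readers less familiar with the metric, the sketch below shows the standard coefficient of determination, which is what “explaining the variance” usually refers to. I am assuming this common definition (one minus the ratio of residual to total sum of squares); the paper may compute its R² differently, and the `r_squared` function name is my own.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after prediction.
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    # Total sum of squares: variance around the mean of the annotations.
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot
```

Under this definition, a model that always predicts the mean annotation scores 0 and a perfect model scores 1, so values around 0.45 mean the model accounts for a bit less than half of the variation in the human quality ratings.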
Stab, Gurevych, and Habernal, all of the Ubiquitous Knowledge Processing Lab, have also made important contributions to the state of the art in argument mining. As noted above, Stab and Gurevych were among the first to expressly tackle the challenge of poorly structured arguments in their work identifying discourse structures in persuasive essays (Stab and Gurevych 2014).
In seeking to identify an argument’s structure and the relationship between its elements, this work has clear ties back to earlier argumentative theory. Indeed, while unfortunately prone to containing poorly formed arguments, student essays are a model setting for Cohen’s theory: a single speaker does their best to form a coherent and compelling argument while a weary reader is tasked with trying to understand their meaning.
A notable contribution of Stab and Gurevych was to break this effort into a two-step classification task. The first step uses a multiclass identifier to classify the components of an argument, while the second step is a simpler binary classification of a pair of argument components as either support or non-support. As future work, they propose developing this as a joint inference problem, since the two pieces of information are indicators of each other. However, they found current accuracy in identifying argument components to be “not sufficient for increasing the performance of argumentative relation identification” (Stab and Gurevych 2014). Their best performing relation identification classifier, an SVM built with structural, lexical, and syntactic features, achieved “almost human performance” with an 86.3% accuracy, compared to a human accuracy of 95.4%. Emphasizing the challenges of linguistic cues in noisy text, a model using discourse markers in student essays yielded an F1-score of only 0.265.
Finally, in what may be the most promising line of current argument mining work, Habernal and Gurevych build a classifier for their labeled data using features derived in an unsupervised manner from noisy, unlabeled data. Using text from online debate portals, they derive features by “projecting data from debate portals into a latent argument space using unsupervised word embeddings and clustering” (Habernal and Gurevych 2015).
While this debate portal data contains “noisy texts of questionable quality,” Habernal and Gurevych are able to leverage this large, unlabeled dataset to build a successful classifier for their labeled data using a sentence-level SVM-Hidden Markov Model. To do this, they employ “argument space” features: they compose vectors containing the weighted average of all word embeddings in a phrase, and then project those vectors into a latent vector space. The centroids found by clustering sentences from the debate portal in this way represent “a prototypical argument” – implied by the clustering but not actually observed. Labeled data can then be projected into this latent vector space, and the computed distances to the centroids are encoded as features. In order to test cross-domain performance, the model was trained on five domains and tested on a sixth.
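The two core steps described above, averaging word embeddings and measuring distances to cluster centroids, can be sketched as follows. This is a simplified illustration under several assumptions of mine: the embeddings are a plain dictionary of toy vectors, the centroids are taken as given rather than learned by clustering, the distance is Euclidean, and the function names `sentence_vector` and `centroid_distance_features` are hypothetical. The paper’s actual weighting scheme and distance measure may differ.

```python
import math


def sentence_vector(tokens, embeddings, weights=None):
    """Weighted average of the embeddings of all known tokens."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok in tokens:
        if tok not in embeddings:
            continue  # skip out-of-vocabulary tokens
        w = 1.0 if weights is None else weights.get(tok, 1.0)
        for i, v in enumerate(embeddings[tok]):
            total[i] += w * v
        weight_sum += w
    if weight_sum == 0.0:
        return total  # no known tokens: return the zero vector
    return [v / weight_sum for v in total]


def centroid_distance_features(vector, centroids):
    """One feature per centroid: Euclidean distance from the vector."""
    return [math.dist(vector, c) for c in centroids]
```

In this sketch, each labeled sentence would be turned into a vector with `sentence_vector`, and the list returned by `centroid_distance_features` would be appended to its other features before training the classifier.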
While this continues to be a challenging task, the argument space features consistently increased the model’s performance in classifying an argument’s component type. The best classification of claims (F1-score: 0.256) came from combining the argument space features with semantic and discourse features. This compares to a human-baseline F1-score of 0.739 and a random assignment F1-score of 0.167.
Importantly, Habernal and Gurevych ground their approach in argumentative theory. Building on the work of Toulmin, they take each document of their corpora to contain a single argument composed of five components: a claim to be established, premises which give reason for the claim, backing which provides additional information, a rebuttal attacking the claim, and finally a refutation attacking the rebuttal. Each type of component is classified in their vector space, allowing them to assess which elements are more successfully classified as well as to gain insights into which argument structures prove particularly problematic.