Conversational flow in Oxford-style debates
Public debates are a common platform for presenting and juxtaposing diverging views on important issues. In this work we propose a methodology for tracking how ideas flow between participants throughout a debate. We use this approach in a case study …
Authors: Justine Zhang, Ravi Kumar, Sujith Ravi, Cristian Danescu-Niculescu-Mizil
Justine Zhang (1), Ravi Kumar (2), Sujith Ravi (2), Cristian Danescu-Niculescu-Mizil (1)
(1) Cornell University, (2) Google
jz727@cornell.edu, ravi.k53@gmail.com, ravi.sujith@gmail.com, cristian@cs.cornell.edu

Abstract

Public debates are a common platform for presenting and juxtaposing diverging views on important issues. In this work we propose a methodology for tracking how ideas flow between participants throughout a debate. We use this approach in a case study of Oxford-style debates (a competitive format where the winner is determined by audience votes) and show how the outcome of a debate depends on aspects of conversational flow. In particular, we find that winners tend to make better use of a debate's interactive component than losers, by actively pursuing their opponents' points rather than promoting their own ideas over the course of the conversation.

1 Introduction

Public debates are a common platform for presenting and juxtaposing diverging viewpoints. As opposed to monologues where speakers are limited to expressing their own beliefs, debates allow for participants to interactively attack their opponents' points while defending their own. The resulting flow of ideas is a key feature of this conversation genre.

In this work we introduce a computational framework for characterizing debates in terms of conversational flow. This framework captures two main debating strategies (promoting one's own points and attacking the opponents' points) and tracks their relative usage throughout the debate. By applying this methodology to a setting where debate winners are known, we show that conversational flow patterns are predictive of which debater is more likely to persuade an audience.

Case study: Oxford-style debates. Oxford-style debates provide a setting that is particularly convenient for studying the effects of conversational flow.
In this competitive debate format, two teams argue for or against a preset motion in order to persuade a live audience to take their position. The audience votes before and after the debate, and the winning team is the one that sways more of the audience towards its view. This setup allows us to focus on the effects of conversational flow since it disentangles them from the audience's prior leaning. [1]

[1] Other potential confounding factors are mitigated by the tight format and topic enforced by the debate's moderator.

The debate format involves an opening statement from the two sides, which presents an overview of their arguments before the discussion begins. This allows us to easily identify talking points held by the participants prior to the interaction, and consider them separately from points introduced spontaneously to serve the discussion.

This work is taking steps towards better modeling of conversational dynamics, by: (i) introducing a debate dataset with rich metadata (Section 2), (ii) proposing a framework for tracking the flow of ideas (Section 3), and (iii) showing its effectiveness in a predictive setting (Section 4).

2 Debate Dataset: Intelligence Squared

In this study we use transcripts and results of Oxford-style debates from the public debate series "Intelligence Squared Debates" (IQ2 for short). [2] These debates are recorded live, and contain motions covering a diversity of topics ranging from foreign policy issues to the benefits of organic food. Each debate consists of two opposing teams (one for the motion and one against) of two or three experts in the topic of the particular motion, along with a moderator.

[2] http://www.intelligencesquaredus.org

Each debate follows the Oxford-style format and consists of three rounds. In the introduction, each debater is given 7 minutes to lay out their main points.
During the discussion, debaters take questions from the moderator and audience, and respond to attacks from the other team. This round lasts around 30 minutes and is highly interactive; teams frequently engage in direct conversation with each other. Finally, in the conclusion, each debater is given 2 minutes to make final remarks.

Our dataset consists of the transcripts of all debates held by IQ2 in the US from September 2006 up to September 2015; in total, there are 108 debates. [3] Each debate is quite extensive: on average, 12801 words are uttered in 117 turns by members of either side per debate. [4]

[3] We omitted one debate due to PDF parsing errors.
[4] The processed data is available at http://www.cs.cornell.edu/~cristian/debates/.

Winning side labels. We follow IQ2's criteria for deciding who wins a debate, as follows. Before the debate, the live audience votes on whether they are for, against, or undecided on the motion. A second round of voting occurs after the debate. A side wins the debate if the difference between the percentage of votes they receive post- and pre-debate (the "delta") is greater than that of the other side's. Often the debates are quite tight: for 30% of the debates, the difference between the winning and losing sides' deltas is less than 10%.

Audience feedback. We check that the voting results are meaningful by verifying that audience reactions to the debaters are related to debate outcome. Using laughter and applause received by each side in each round [5] as markers of positive reactions, we note that differences in audience reception of the two sides emerge over the course of the debate. While both sides get similar levels of reaction during the introduction, winning teams tend to receive more laughter during the discussion (p < 0.001) [6] and more applause during the conclusion (p = 0.05).

[5] Laughter and applause are indicated in the transcripts.
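The winning-side rule described under "Winning side labels" can be sketched in a few lines. This is only an illustration: the function names and the dictionary layout are assumptions, and the pre- and post-debate percentages in the usage lines are made up so that the deltas match the 20 and 5 points reported for the running example.

```python
def delta(pre_pct, post_pct):
    """Change in a side's share of the audience vote, in percentage points."""
    return post_pct - pre_pct


def winning_side(votes):
    """Return "for", "against", or "tie" by comparing the two deltas.

    `votes` maps each side to a (pre_pct, post_pct) pair; this layout is a
    hypothetical choice for the sketch, not the format of the released data.
    """
    delta_for = delta(*votes["for"])
    delta_against = delta(*votes["against"])
    if delta_for > delta_against:
        return "for"
    if delta_against > delta_for:
        return "against"
    return "tie"


# Made-up vote shares chosen only to give deltas of 20 and 5 points.
example = {"for": (18, 38), "against": (47, 52)}
result = winning_side(example)
```

Ties are returned explicitly because debates ending in a tie are removed before the prediction experiments.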
[6] Unless otherwise indicated, all reported p-values are calculated using the Wilcoxon signed-rank test.

Example debate. We will use a debate over the motion "Millennials don't stand a chance" (henceforth Millennials) as a running example. [7] The For side won the debate with a delta of 20% of the votes, compared to the Against side which only gained 5%.

[7] http://www.intelligencesquaredus.org/debates/past-debates/item/1019-millennials-dont-stand-a-chance

3 Modeling Idea Flow

Promoting one's own points and addressing the opponent's points are two primary debating strategies. Here we introduce a methodology to identify these strategies, and use it to investigate their usage and effect on a debate's outcome. [8]

[8] In the subsequent discussion, we treat all utterances of a particular side as coming from a single speaker and defer modeling interactions within teams to future work.

Identifying talking points. We first focus on ideas which form the basis of a side's stance on the motion. We identify such talking points by considering words whose frequency of usage differs significantly between the two teams during the introduction, before any interaction takes place. To find these words, we use the method introduced by Monroe et al. (2008) in the context of U.S. Senate speeches. In particular, we estimate the divergence between the two sides' word-usage in the introduction, where word-usage is modeled as multinomial distributions smoothed with a uniform Dirichlet prior, and divergence is given by log-odds ratio. The most discriminating words are those with the highest and lowest z-scores of divergence estimates. For a side X, we define the set of talking points W_X to be the k words with the highest or lowest z-scores. [9] We distinguish between X's own talking points W_X, and the opposing talking points W_Y belonging to its opponent Y. These are examples of talking points for the "Millennials" debate:

Side      Talking points
For       debt, boomer, college, reality
Against   economy, volunteer, home, engage

[9] In order to focus on concepts central to the sides' arguments, we discard stopwords, perform stemming on the text, and take k = 20. We set these parameters by examining one subsequently discarded debate.

The flow of talking points. A side can either promote its own talking points, address its opponent's points, or steer away from these initially salient ideas altogether. We quantify the use of these strategies by comparing the airtime debaters devote to talking points. For a side X, let the self-coverage f_r(X, X) be the fraction of content words uttered by X in round r that are among their own talking points W_X; and the opponent-coverage f_r(X, Y) be the fraction of its content words covering opposing talking points W_Y.

Not surprisingly, we find that self-coverage dominates during the discussion (f_Disc(X, X) > f_Disc(X, Y), p < 0.001). However, this does not mean debaters are simply giving monologues and ignoring each other: the effect of the interaction is reflected in a sharp drop in self-coverage and a rise in opponent-coverage once the discussion round begins. Respectively, f_Disc(X, X) < f_Intro(X, X) and f_Disc(X, Y) > f_Intro(X, Y), both p < 0.001. Examples of self- and opponent-coverage of two talking points in the "Millennials" debate from the introduction and discussion are given in Table 1.

Talking point: volunteer
  Introduction. AGAINST: [millennials] volunteer more than any generation. 73 percent of millennials volunteered for a nonprofit in 2012. And the percentage of [students] believing that it's [...] important to help people in need is [at the highest level] in 40 years.
  Discussion. FOR: I'd make the argument [that] volunteering [is done] for exntrinsic [sic] reasons. So, it's done for college applications, or it's done because it's a requirement in high school.

Talking point: boomer
  Introduction. FOR: [referring to college completion rate] the boomer generation is now [at] 32 percent. [Millennials] are currently at [...] 33 percent. So this notion that [millennials] have more education at this point in time than anybody else is not actually true.
  Discussion. FOR: It stinks to be young, having gone through what your generation [referring to millennials] has gone through. But keep in mind that [...] have gone through the same.

Table 1: Example talking points used throughout the "Millennials" debate. Each talking point belongs to the side uttering the first excerpt, taken from the introduction; the second excerpt is from the discussion section. In the first example, the For side addresses the opposing talking point volunteer during the discussion; in the second example the For side refers to their own talking point boomer and recalls it later in the discussion.

Figure 1: The start of the debate's interactive stage triggers a drop in self-coverage (> 0, indicated by leftmost two bars) and a rise in opponent-coverage (< 0, indicated by rightmost bars), with eventual winners showing a more pronounced drop in self-coverage (comparing the two bars on the left).

Does the change in focus translate to any strategic advantages? Figure 1 suggests this is the case: the drop in self-coverage is slightly larger for the side that eventually wins the debate (p = 0.08). The drop in the sum of self- and opponent-coverage is also larger for winning teams, suggesting that they are more likely to steer away from discussing any talking points from either side (p = 0.05).

Identifying discussion points. Having seen that debaters can benefit by shifting away from talking points that were salient during the introduction, we now examine the ideas that spontaneously arise to serve the discussion.
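The talking-point and coverage definitions given earlier in this section can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it uses the usual log-odds formulation with a uniform Dirichlet prior from Monroe et al. (2008), the pseudo-count `alpha` is an assumed value (the text does not state one), all function names are hypothetical, and tokens are assumed to be already stemmed with stopwords removed. The toy token lists are invented.

```python
import math
from collections import Counter


def log_odds_z(tokens_x, tokens_y, alpha=0.01):
    """z-scored log-odds ratio of each word for side X versus side Y.

    Counts are smoothed with a uniform Dirichlet prior; `alpha` is an
    assumed pseudo-count. Needs at least two distinct words overall.
    """
    count_x, count_y = Counter(tokens_x), Counter(tokens_y)
    vocab = set(count_x) | set(count_y)
    n_x, n_y = len(tokens_x), len(tokens_y)
    alpha_total = alpha * len(vocab)
    scores = {}
    for word in vocab:
        y_x = count_x[word] + alpha
        y_y = count_y[word] + alpha
        log_odds_x = math.log(y_x / (n_x + alpha_total - y_x))
        log_odds_y = math.log(y_y / (n_y + alpha_total - y_y))
        variance = 1.0 / y_x + 1.0 / y_y  # approximate variance of the difference
        scores[word] = (log_odds_x - log_odds_y) / math.sqrt(variance)
    return scores


def talking_points(intro_x, intro_y, k=20, alpha=0.01):
    """Return (W_X, W_Y): the k highest- and k lowest-scoring words."""
    scores = log_odds_z(intro_x, intro_y, alpha)
    # Ties are broken alphabetically so the result is deterministic.
    highest = sorted(scores, key=lambda w: (-scores[w], w))[:k]
    lowest = sorted(scores, key=lambda w: (scores[w], w))[:k]
    return set(highest), set(lowest)


def coverage(tokens, points):
    """Fraction of a side's content words in a round that fall in `points`."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in points) / len(tokens)


# Invented toy introductions, not taken from the transcripts.
intro_for = "debt debt boomer college debt reality".split()
intro_against = "economy volunteer home volunteer engage economy".split()
w_for, w_against = talking_points(intro_for, intro_against, k=2)
self_cov = coverage("debt is real debt".split(), w_for)
```

Calling `coverage` with a side's own set gives the self-coverage f_r(X, X); calling it with the opponent's set gives the opponent-coverage f_r(X, Y).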
We model such discussion points as words introduced to the debate during the discussion by a debater and adopted by his opponents at least twice. [10] This allows us to focus on words that become relevant to the conversation; only 3% of all newly introduced words qualify, amounting to about 10 discussion points per debate.

[10] Ignoring single repetitions discards simple echoing of words used by the previous speaker.

The flow of discussion points. The adoption of discussion points plays an important role in persuading the audience: during the discussion, eventual winners adopt more discussion points introduced by their opponents than eventual losers (p < 0.01). Two possible strategic interpretations emerge. From a topic control angle (Nguyen et al., 2014), perhaps losers are more successful at imposing their discussion points to gain control of the discussion. This view appears counterintuitive given work linking topic control to influence in other settings (Planalp and Tracy, 1980; Rienks et al., 2006).

First point
  AGAINST: I would say [millennials] are effectively moving towards goals [...] it might seem like immaturity if you don't actually talk to millennials and look at the statistics.
  FOR: –actually, the numbers are showing [...] that it's worsening [...] Same statistics, dreadful statistics.

Second point
  AGAINST: [...] there's a incredible [sic] advantage that millennials have when it comes to social media [...] because we have an understanding of that landscape as digital natives [...]
  FOR: Generation X [...] is also known as the digital generation. The companies [...] that make you digital natives were all founded by [...] people in generation X. It's simply inaccurate every time somebody says that the millennial generation is the only generation [...]

Table 2: Example discussion points introduced by the Against side in the "Millennials" debate. For each point, the first excerpt is the context in which the point was first mentioned by the Against side in the discussion, and the second excerpt shows the For side challenging the point later on.

An alternative interpretation could be that winners are more active than losers in contesting their opponents' points, a strategy that might play out favorably to the audience. A post-hoc manual examination supports this interpretation: 78% of the valid discussion points are picked up by the opposing side in order to be challenged; [11] this strategy is exemplified in Table 2. Overall, these observations tying the flow of discussion points to the debate's outcome suggest that winners are more successful at using the interaction to engage with their opponents' ideas.

[11] Three annotators (including one author) informally annotated a random sample of 50 discussion points in the context of all dialogue excerpts where the point was used. According to a majority vote, in 26 cases the opponents challenged the point, in 7 cases the point was supported, 4 cases were unclear, and in 13 cases the annotators deemed the discussion point invalid. We discuss the last category in Section 6.

4 Predictive Power

We evaluate the predictive power of our flow features in a binary classification setting: predict whether the For or Against side wins the debate. [12] This is a challenging task even for humans, hence the dramatic reveal at the end of each IQ2 debate, which partly explains the popularity of the show. Our goal here is limited to understanding which of the flow features that we developed carry predictive power.

[12] The task is balanced: after removing three debates ending in a tie, we have 52 debates won by For and 53 by Against.

Conversation flow features. We use all conversational features discussed above. For each side X we include f_Disc(X, X), f_Disc(X, Y), and their sum.
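The discussion-point definition from Section 3 (a word introduced during the discussion by one side and adopted by the opposing side at least twice) can be sketched as below. Several details are assumptions made for illustration: the turn representation, the function names, and the choice to count every opposing use of a word as one adoption. The source does not specify how adoptions are counted.

```python
from collections import Counter


def discussion_points(intro_vocab, discussion_turns, min_adoptions=2):
    """Map each discussion point to the side that introduced it.

    `discussion_turns` is a chronological list of (side, tokens) pairs with
    side in {"for", "against"}; `intro_vocab` holds words already used in
    the introduction. Both layouts are hypothetical.
    """
    introducer = {}
    adoptions = Counter()
    for side, tokens in discussion_turns:
        for word in tokens:
            if word in intro_vocab:
                continue  # not new to the debate
            if word not in introducer:
                introducer[word] = side  # first use during the discussion
            elif introducer[word] != side:
                adoptions[word] += 1  # the opposing side picks the word up
    return {w: introducer[w] for w, n in adoptions.items() if n >= min_adoptions}


def adopted_counts(points):
    """Number of discussion points each side adopted from its opponent."""
    counts = Counter()
    for introduced_by in points.values():
        adopter = "against" if introduced_by == "for" else "for"
        counts[adopter] += 1
    return counts


# Invented toy turns, loosely modeled on the excerpts in Table 2.
turns = [
    ("against", ["look", "statistics", "digital"]),
    ("for", ["statistics", "dreadful", "statistics"]),
    ("for", ["digital"]),
]
points = discussion_points({"look"}, turns)
counts = adopted_counts(points)
```

Requiring at least two opposing uses mirrors the footnoted rule that a single repetition is treated as simple echoing.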
We also use the drop in self-coverage given by subtracting corresponding values for f_Intro(·, ·), and the number of discussion points adopted by each side. We call these the Flow features.

Baseline features. To discard the possibility that our results are simply explained by debater verbosity, we use the number of words uttered and number of turns taken by each side (length) as baselines. We also compare to a unigram baseline (BOW).

Audience features. We use the counts of applause and laughter received by each side (described in Section 2) as rough indicators of how well the audience can foresee a debate's outcome.

Prediction accuracy is evaluated using a leave-one-out (LOO) approach. We use logistic regression; model parameters for each LOO train-test split are selected via 3-fold cross-validation on the training set. To find particularly predictive flow features, we also try using univariate feature selection on the flow features before the model is fitted in each split; we refer to this setting as Flow*. [13]

We find that conversation flow features obtain the best accuracy among all listed feature types (Flow: 63%; Flow*: 65%), performing significantly higher than a 50% random baseline (binomial test p < 0.05), and comparable to audience features (60%). In contrast, the length and BOW baselines do not perform better than chance. We note that Flow features perform competitively despite being the only ones that do not factor in the concluding round.

The features selected most often in the Flow* task are: the number of discussion points adopted (with positive regression coefficients), the recall of talking points during the discussion round (negative coefficients), and the drop in usage of own talking points from introduction to discussion (positive coefficients).
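The comparison against a 50% random baseline can be illustrated with an exact binomial tail probability. This sketch assumes a one-sided test, which the text does not specify, and the count of 68 correct predictions is a hypothetical value chosen to be roughly 65% of the 105 non-tied debates; the exact number of correct predictions is not reported.

```python
from math import comb


def binomial_tail(successes, trials, p=0.5):
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical count: about 65% of 105 debates predicted correctly.
p_value = binomial_tail(68, 105)
significant = p_value < 0.05
```

With only 105 debates, an accuracy a few points above 50% would not clear this threshold, which is consistent with the length and BOW baselines being reported as no better than chance.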
The relative importance of these features, which focus on the interaction between teams, suggests that audiences tend to favor debating strategies which emphasize the discussion.

[13] We optimize the regularizer (ℓ1 or ℓ2), and the value of the regularization parameter C (between 10^-5 and 10^5). For Flow* we also optimize the number of features selected.

5 Further Related Work

Previous work on conversational structure has proposed approaches to model dialogue acts (Samuel et al., 1998; Ritter et al., 2010; Ferschke et al., 2012) or disentangle interleaved conversations (Elsner and Charniak, 2010; Elsner and Charniak, 2011). Other research has considered the problem of detecting conversation-level traits such as the presence of disagreements (Allen et al., 2014; Wang and Cardie, 2014) or the likelihood of relation dissolution (Niculae et al., 2015). At the participant level, several studies present approaches to identify ideological stances (Somasundaran and Wiebe, 2010; Rosenthal and McKeown, 2015), using features based on participant interactions (Thomas et al., 2006; Sridhar et al., 2015), or extracting words and reasons characterizing a stance (Monroe et al., 2008; Nguyen et al., 2010; Hasan and Ng, 2014). In our setting, both the stances and the turn structure of a debate are known, allowing us to instead focus on the debate's outcome.

Existing research on argumentation strategies has largely focused on exploiting the structure of monologic arguments (Mochales and Moens, 2011), like those of persuasive essays (Feng and Hirst, 2011; Stab and Gurevych, 2014). In addition, Tan et al. (2016) have examined the effectiveness of arguments in the context of a forum where people invite others to challenge their opinions. We complement this line of work by looking at the relative persuasiveness of participants in extended conversations as they exchange arguments over multiple turns.
Previous studies of influence in extended conversations have largely dealt with the political domain, examining moderated but relatively unstructured settings such as talk shows or presidential debates, and suggesting features like topic control (Nguyen et al., 2014), linguistic style matching (Romero et al., 2015) and turn-taking (Prabhakaran et al., 2013). With persuasion in mind, our work extends these studies to explore a new dynamic, the flow of ideas between speakers, in a highly structured setting that controls for confounding factors.

6 Limitations and Future Work

This study opens several avenues for future research. One could explore more complex representations of talking points and discussion points, for instance using topic models or word embeddings. Furthermore, augmenting the flow of content in a conversation with the speakers' linguistic choices could better capture their intentions. In addition, it would be interesting to study the interplay between our conversational flow features and relatively monologic features that consider the argumentative and rhetorical traits of each side separately. More explicitly comparing and contrasting monologic and interactive dynamics could lead to better models of conversations. Such approaches could also help clarify some of the intuitions about conversations explored in this work, particularly that engaging in dialogue carries different strategic implications from self-promotion.

Our focus in this paper is on capturing and understanding conversational flow. We hence make some simplifying assumptions that could be refined in future work. For instance, by using a basic unigram-based definition of discussion points, we do not account for the context or semantic sense in which these points occur.
In particular, our annotators found that a significant proportion of the discussion points under our definition actually referred to differing ideas in the various contexts in which they appeared. We expect that improving our retrieval model will also improve the robustness of our idea flow analysis. A better model of discussion points could also provide more insight into the role of these points in persuading the audience.

While Oxford-style debates are a particularly convenient setting for studying the effects of conversational flow, our dataset is limited in terms of size. It would be worthwhile to examine the flow features we developed in the context of settings with richer incentives beyond persuading an audience, such as in the semi-cooperative environment of Wikipedia talk pages. Finally, our methodology could point to applications in areas such as education and cooperative work, where it is key to establish the link between conversation features and an interlocutor's ability to convey their point (Niculae and Danescu-Niculescu-Mizil, 2016).

Acknowledgements. We thank the reviewers and V. Niculae for their helpful comments, and I. Arawjo and D. Sedra for annotations. This work was supported in part by a Google Faculty Research Award.

References

Kelsey Allen, Giuseppe Carenini, and Raymond T Ng. 2014. Detecting disagreement in conversations using pseudo-monologic rhetorical structure. In Proceedings of EMNLP.
Micha Elsner and Eugene Charniak. 2010. Disentangling chat. Computational Linguistics, 36(3):389–409.
Micha Elsner and Eugene Charniak. 2011. Disentangling chat with local coherence models. In Proceedings of ACL.
Vanessa Wei Feng and Graeme Hirst. 2011. Classifying arguments by scheme. In Proceedings of ACL.
Oliver Ferschke, Iryna Gurevych, and Yevgen Chebotar. 2012. Behind the article: Recognizing dialog acts in Wikipedia talk pages. In Proceedings of EACL.
Kazi Saidul Hasan and Vincent Ng. 2014. Why are you taking this stance? Identifying and classifying reasons in ideological debates. In Proceedings of EMNLP.
Raquel Mochales and Marie-Francine Moens. 2011. Argumentation mining. Artificial Intelligence and Law, 19(1):1–22.
Burt L. Monroe, Michael P. Colaresi, and Kevin M. Quinn. 2008. Fightin' words: Lexical feature selection and evaluation for identifying the content of political conflict. Political Analysis, 16(4):372–403.
Dong Nguyen, Elijah Mayfield, and Carolyn P Rosé. 2010. An analysis of perspectives in interactive settings. In Proceedings of the KDD 2010 Workshop on Social Media Analytics.
Viet-An Nguyen, Jordan Boyd-Graber, Philip Resnik, Deborah A Cai, Jennifer E Midberry, and Yuanxin Wang. 2014. Modeling topic control to detect influence in conversations using nonparametric topic models. Machine Learning, 95(3):381–421.
Vlad Niculae and Cristian Danescu-Niculescu-Mizil. 2016. Conversational markers of constructive discussions. In Proceedings of NAACL.
Vlad Niculae, Srijan Kumar, Jordan Boyd-Graber, and Cristian Danescu-Niculescu-Mizil. 2015. Linguistic harbingers of betrayal: A case study on an online strategy game. In Proceedings of ACL.
Sally Planalp and Karen Tracy. 1980. Not to change the subject but: A cognitive approach to the management of conversation. Communication Yearbook, 4:680–690.
Vinodkumar Prabhakaran, Ajita John, and Dorée D. Seligmann. 2013. Who had the upper hand? Ranking participants of interactions based on their relative power. In Proceedings of IJCNLP.
Rutger Rienks, Dong Zhang, Daniel Gatica-Perez, and Wilfried Post. 2006. Detection and application of influence rankings in small group meetings. In Proceedings of ICMI.
Alan Ritter, Colin Cherry, and Bill Dolan. 2010. Unsupervised modeling of twitter conversations. In Proceedings of NAACL.
Daniel M Romero, Roderick I Swaab, Brian Uzzi, and Adam D Galinsky. 2015. Mimicry is presidential: Linguistic style matching in presidential debates and improved polling numbers. Personality and Social Psychology Bulletin, 41(10):1311–1319.
Sara Rosenthal and Kathleen McKeown. 2015. I couldn't agree more: The role of conversational structure in agreement and disagreement detection in online discussions. In Proceedings of SIGDIAL.
Ken Samuel, Sandra Carberry, and K. Vijay-Shanker. 1998. Dialogue act tagging with transformation-based learning. In Proceedings of ACL.
Swapna Somasundaran and Janyce Wiebe. 2010. Recognizing stances in ideological on-line debates. In Proceedings of the NAACL HLT 2010 Workshop on Computational Approaches to Analysis and Generation of Emotion in Text.
Dhanya Sridhar, James Foulds, Bert Huang, Lise Getoor, and Marilyn Walker. 2015. Joint models of disagreement and stance in online debate. In Proceedings of ACL.
Christian Stab and Iryna Gurevych. 2014. Identifying argumentative discourse structures in persuasive essays. In Proceedings of EMNLP.
Chenhao Tan, Vlad Niculae, Cristian Danescu-Niculescu-Mizil, and Lillian Lee. 2016. Winning arguments: Interaction dynamics and persuasion strategies in good-faith online discussions. In Proceedings of WWW.
Matt Thomas, Bo Pang, and Lillian Lee. 2006. Get out the vote: Determining support or opposition from congressional floor-debate transcripts. In Proceedings of EMNLP.
Lu Wang and Claire Cardie. 2014. A piece of my mind: A sentiment analysis approach for online dispute detection. In Proceedings of ACL.