Statistical Modality Tagging from Rule-based Annotations and Crowdsourcing

Vinodkumar Prabhakaran, Columbia University (vinod@cs.columbia.edu)
Michael Bloodgood, University of Maryland (meb@umd.edu)
Mona Diab, Columbia University (mdiab@ccls.columbia.edu)
Bonnie Dorr, University of Maryland (bonnie@umiacs.umd.edu)
Lori Levin, Carnegie Mellon University (lsl@cs.cmu.edu)
Christine D. Piatko, Johns Hopkins University (christine.piatko@jhuapl.edu)
Owen Rambow, Columbia University (rambow@ccls.columbia.edu)
Benjamin Van Durme, Johns Hopkins University (vandurme@cs.jhu.edu)

Published in the Proceedings of the ACL-2012 Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics (ExProM-2012), pages 57-64, Jeju, Republic of Korea, 13 July 2012. © 2012 Association for Computational Linguistics.
Abstract

We explore training an automatic modality tagger. Modality is the attitude that a speaker might have toward an event or state. One of the main hurdles for training a linguistic tagger is gathering training data. This is particularly problematic for training a tagger for modality because modality triggers are sparse for the overwhelming majority of sentences. We investigate an approach to automatically training a modality tagger where we first gathered sentences based on a high-recall simple rule-based modality tagger and then provided these sentences to Mechanical Turk annotators for further annotation. We used the resulting set of training data to train a precise modality tagger using a multi-class SVM that delivers good performance.

1 Introduction

Modality is an extra-propositional component of meaning. In "John may go to NY", the basic proposition is "John go to NY" and the word "may" indicates modality. Van Der Auwera and Ammann (2005) define core cases of modality: "John must go to NY" (epistemic necessity), "John might go to NY" (epistemic possibility), "John has to leave now" (deontic necessity) and "John may leave now" (deontic possibility). Many semanticists (e.g. Kratzer (1981), Kratzer (1991), Kaufmann et al. (2006)) define modality as quantification over possible worlds. "John might go" means that there exist some possible worlds in which John goes. Another view of modality relates more to a speaker's attitude toward a proposition (e.g. McShane et al. (2004)).

Modality might be construed broadly to include several types of attitudes that a speaker wants to express toward an event, state or proposition. Modality might indicate factivity, evidentiality, or sentiment (McShane et al., 2004). Factivity is related to whether the speaker wishes to convey his or her belief that the propositional content is true or not, i.e., whether it actually obtains in this world or not. It distinguishes things that (the speaker believes) happened from things that he or she desires, plans, or considers merely probable. Evidentiality deals with the source of information and may provide clues to the reliability of the information. Did the speaker have firsthand knowledge of what he or she is reporting, or was it hearsay or inferred from indirect evidence? Sentiment deals with a speaker's positive or negative feelings toward an event, state, or proposition.
In this paper, we focus on the following five modalities; we have investigated the belief/factivity modality previously (Diab et al., 2009b; Prabhakaran et al., 2010), and we leave other modalities to future work.

• Ability: can H do P?
• Effort: does H try to do P?
• Intention: does H intend P?
• Success: does H succeed in P?
• Want: does H want P?

We investigate automatically training a modality tagger by using multi-class Support Vector Machines (SVMs). One of the main hurdles for training a linguistic tagger is gathering training data. This is particularly problematic for training a modality tagger because modality triggers are sparse for the overwhelming majority of sentences. Baker et al. (2010) created a modality tagger by using a semi-automatic approach for creating rules for a rule-based tagger. A pilot study revealed that it can boost recall well above the naturally occurring proportion of modality without annotated data, but with only 60% precision. We investigated an approach where we first gathered sentences based on a simple modality tagger and then provided these sentences to annotators for further annotation. The resulting annotated data also preserved the level of inter-annotator agreement for each example so that learning algorithms could take that into account during training. Finally, the resulting set of annotations was used for training a modality tagger using SVMs, which gave a high precision, indicating the success of this approach.

Section 2 discusses related work. Section 3 discusses our procedure for gathering training data. Section 4 discusses the machine learning setup and features used to train our modality tagger and presents experiments and results. Section 5 concludes and discusses future work.

2 Related Work

Previous related work includes TimeML (Sauri et al., 2006), which involves modality annotation on events, and FactBank (Sauri and Pustejovsky, 2009), where event mentions are marked with degree of factuality. Modality is also important in the detection of uncertainty and hedging. The CoNLL shared task in 2010 (Farkas et al., 2010) deals with automatic detection of uncertainty and hedging in Wikipedia and biomedical sentences.

Baker et al. (2010) and Baker et al. (2012) analyze a set of eight modalities which include belief, require and permit, in addition to the five modalities we focus on in this paper. They built a rule-based modality tagger using a semi-automatic approach to create rules. This earlier work differs from the work described in this paper in that our emphasis is on the creation of an automatic modality tagger using machine learning techniques. Note that the annotation and automatic tagging of the belief modality (i.e., factivity) is described in more detail in (Diab et al., 2009b; Prabhakaran et al., 2010).

There has been a considerable amount of interest in modality in the biomedical domain. Negation, uncertainty, and hedging are annotated in the BioScope corpus (Vincze et al., 2008), along with information about which words are in the scope of negation/uncertainty. The i2b2 NLP Shared Task in 2010 included a track for detecting assertion status (e.g. present, absent, possible, conditional, hypothetical etc.) of medical problems in clinical records.[1] Apostolova et al. (2011) present a rule-based system for the detection of negation and speculation scopes using the BioScope corpus.

[1] https://www.i2b2.org/NLP/Relations/
Other studies emphasize the importance of detecting uncertainty in medical text summarization (Morante and Daelemans, 2009; Aramaki et al., 2009).

Modality has also received some attention in the context of certain applications. Earlier work describing the difficulty of correctly translating modality using machine translation includes (Sigurd and Gawrónska, 1994) and (Murata et al., 2005). Sigurd and Gawrónska (1994) write about rule-based frameworks and how using alternate grammatical constructions such as the passive can improve the rendering of the modal in the target language. Murata et al. (2005) analyze the translation of Japanese into English by several systems, showing they often render the present incorrectly as the progressive. The authors trained a support vector machine to specifically handle modal constructions, while our modal annotation approach is a part of a full translation system.

The textual entailment literature includes modality annotation schemes. Identifying modalities is important to determine whether a text entails a hypothesis. Bar-Haim et al. (2007) include polarity-based rules and negation and modality annotation rules. The polarity rules are based on an independent polarity lexicon (Nairn et al., 2006). The annotation rules for negation and modality of predicates are based on identifying modal verbs, as well as conditional sentences and modal adverbials. The authors read the modality off parse trees directly using simple structural rules for modifiers.

3 Constructing Modality Training Data

In this section, we discuss the procedure we followed to construct the training data for building the automatic modality tagger. In a pilot study, we obtained and ran the modality tagger described in (Baker et al., 2010) on the English side of the Urdu-English LDC language pack.[2] We randomly selected 1997 sentences that the tagger had labeled as not having the Want modality and posted them on Amazon Mechanical Turk (MTurk). Three different Turkers (MTurk annotators) marked, for each of the sentences, whether it contained the Want modality. Using majority rules as the Turker judgment, 95 (i.e., 4.76%) of these sentences were marked as having a Want modality. We also posted 1993 sentences that the tagger had labeled as having a Want modality, and only 1238 of them (62.1%) were marked by the Turkers as having a Want modality. Therefore, the estimated precision of this type of approach is only around 60%. Hence, we will not be able to use the (Baker et al., 2010) tagger to gather training data.

Instead, our approach was to apply a simple tagger as a first pass, with positive examples subsequently hand-annotated using MTurk. We made use of sentence data from the Enron email corpus,[3] derived from the version owing to Fiore and Heer,[4] further processed as described by Roark (2009).[5]

To construct the simple tagger (the first pass), we used a lexicon of modality trigger words (e.g., try, plan, aim, wish, want) constructed by Baker et al. (2010). The tagger essentially tags each sentence that has a word in the lexicon with the corresponding modality. We wrote a few simple, obvious filters for a handful of exceptional cases that arise due to the fact that our sentences are from e-mail. For example, we filtered out "best wishes" expressions, which otherwise would have been tagged as Want because of the word "wishes".

[2] LDC Catalog No.: LDC2006E110.
[3] http://www-2.cs.cmu.edu/~enron/
[4] http://bailando.sims.berkeley.edu/enron/enron.sql.gz
[5] Data received through personal communication.
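To make the first pass concrete, here is a minimal sketch of lexicon-based sentence tagging with an e-mail exception filter. The lexicon entries shown are hypothetical stand-ins for the Baker et al. (2010) trigger lexicon, and only "best wishes" among the filtered phrases is named in the text.

```python
# Minimal sketch of the first-pass, high-recall tagger (illustrative).
# MODALITY_LEXICON stands in for the Baker et al. (2010) trigger lexicon;
# only a few hypothetical entries are shown here.
MODALITY_LEXICON = {
    "try": "Effort", "attempt": "Effort",
    "plan": "Intention", "aim": "Intention",
    "want": "Want", "wish": "Want", "wishes": "Want",
    "succeed": "Success", "manage": "Success",
    "can": "Ability", "able": "Ability",
}

# Exceptional e-mail phrases that contain a trigger word but carry no
# modality; "best wishes" is the example given in the text.
FILTERED_PHRASES = ("best wishes",)

def tag_sentence(sentence):
    """Return the set of modalities triggered by any lexicon word."""
    lowered = sentence.lower()
    if any(phrase in lowered for phrase in FILTERED_PHRASES):
        return set()
    return {MODALITY_LEXICON[tok] for tok in lowered.split()
            if tok in MODALITY_LEXICON}

print(tag_sentence("We plan to try a new approach."))  # {'Intention', 'Effort'}
```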
The words that trigger modality occur with very different frequencies. If one is not careful, the training data may be dominated by only the commonly occurring trigger words, and the learned tagger would then be biased towards these words. In order to ensure that our training data had a diverse set of examples containing many lexical triggers, and not just a lot of examples with the same lexical trigger, for each modality we capped the number of sentences from a single trigger at 50. After we had the set of sentences selected by the simple tagger, we posted them on MTurk for annotation.

The Turkers were asked to check a box indicating that the modality was not present in the sentence if the given modality was not expressed. If they did not check that box, then they were asked to highlight the target of the modality. Table 1 shows the number of sentences we posted on MTurk for each modality.[6] Three Turkers annotated each sentence. We restricted the task to Turkers who were adults, had greater than a 95% approval rating, and had completed at least 50 HITs (Human Intelligence Tasks) on MTurk. We paid US$0.10 for each set of ten sentences.

Modality    Count
Ability      190
Effort      1350
Intention   1320
Success     1160
Want        1390

Table 1: For each modality, the number of sentences returned by the simple tagger that we posted on MTurk.

Since our data was annotated by three Turkers, for training data we used only those examples for which at least two Turkers agreed on the modality and the target of the modality. This resulted in 1,008 examples: 674 examples had two Turkers agreeing and 334 had unanimous agreement. We kept track of the level of agreement for each example so that our learner could weight the examples differently depending on the level of inter-annotator agreement (a sketch of this selection procedure follows below).

[6] More detailed statistics on MTurk annotations are available at http://hltcoe.jhu.edu/datasets/.
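The two selection filters just described, the per-trigger cap and the agreement threshold, can be sketched as follows. In the actual procedure the cap was applied before posting to MTurk and the agreement filter after annotation; the field names and data layout below are hypothetical.

```python
from collections import Counter

MAX_PER_TRIGGER = 50  # per-modality cap on sentences from one trigger word

def cap_by_trigger(tagged_sentences):
    """Keep at most 50 sentences per lexical trigger (applied pre-MTurk)."""
    seen = Counter()
    for sent in tagged_sentences:  # each: {"text": ..., "trigger": ...}
        if seen[sent["trigger"]] < MAX_PER_TRIGGER:
            seen[sent["trigger"]] += 1
            yield sent

def filter_by_agreement(annotated_items):
    """Keep items where at least 2 of 3 Turkers agreed on the target;
    record the agreement level (2 -> Agr2, 3 -> Agr3) for later weighting."""
    for item in annotated_items:  # item["targets"]: 3 judgments, None = absent
        (target, count), = Counter(item["targets"]).most_common(1)
        if count >= 2 and target is not None:
            yield {"text": item["text"], "target": target, "agreement": count}
```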
4 Multiclass SVM for Modality

In this section, we describe the automatic modality tagger we built using the MTurk annotations described in Section 3 as the training data. Section 4.1 describes the training and evaluation data. In Section 4.2, we present the machinery, and Section 4.3 describes the features we used to train the tagger. In Section 4.4, we present various experiments and discuss results. Section 4.5 presents additional experiments using annotator confidence.

4.1 Data

For training, we used the data presented in Section 3. We refer to it as MTurk data in the rest of this paper. For evaluation, we selected a part of the LU Corpus (Diab et al., 2009a) (1228 sentences), and our expert annotated it with modality tags. We first used the high-recall simple modality tagger described in Section 3 to select the sentences with modalities. Out of the 235 sentences returned by the simple modality tagger, our expert removed the ones which did not in fact have a modality. In the remaining sentences (94 sentences), our expert annotated the target predicate. We refer to this as the Gold dataset in this paper. The MTurk and Gold datasets differ in terms of genres as well as annotators (Turker vs. expert). The distribution of modalities in both MTurk and Gold annotations is given in Table 2.

Modality    MTurk  Gold
Ability       6%   48%
Effort       25%   10%
Intention    30%   11%
Success      24%    9%
Want         15%   23%

Table 2: Frequency of modalities

4.2 Approach

We applied a supervised learning framework using multi-class SVMs to automatically learn to tag modalities in context. For tagging, we used the Yamcha (Kudo and Matsumoto, 2003) sequence labeling system, which uses the SVMlight (Joachims, 1999) package for classification. We used the One-versus-All method for multi-class classification on a quadratic kernel with a C value of 1. We report recall and precision on word tokens in our corpus for each modality. We also report the F_{β=1} (F) measure, the harmonic mean of (P)recision and (R)ecall: F = 2PR / (P + R).

4.3 Features

We used lexical features at the token level which can be extracted without any parsing, with relatively high accuracy. We use the term "context width" to denote the window of tokens whose features are considered for predicting the tag for a given token. For example, a context width of 2 means that the feature vector of any given token includes, in addition to its own features, those of 2 tokens before and after it, as well as the tag predictions for the 2 tokens before it. We performed experiments varying the context width from 1 to 5 and found that a context width of 2 gives the optimal performance. All results reported in this paper are obtained with a context width of 2. For each token, we performed experiments using the following lexical features:

• wordStem - word stem.
• wordLemma - word lemma.
• POS - the word's POS tag.
• isNumeric - is the word numeric?
• verbType - Modal/Auxiliary/Regular/Nil.
• whichModal - if the word is a modal verb, which modal?

We used the Porter stemmer (Porter, 1997) to obtain the stem of a word token. To determine the word lemma, we used an in-house lemmatizer using a dictionary and morphological analysis to obtain the dictionary form of a word. We obtained POS tags from the Stanford POS tagger and used those tags to determine the verbType and whichModal features. The verbType feature is assigned the value 'Nil' if the word is not a verb, and the whichModal feature is assigned the value 'Nil' if the word is not a modal verb. The isNumeric feature is a binary feature denoting whether the token contains only digits or not.
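As an illustration of the best-performing feature subset reported below (wordStem, POS, whichModal), the sketch here emits one tab-separated row per token in the column format Yamcha consumes, with the tag to predict as the last column; Yamcha itself applies the context window of 2 at training time. NLTK's Porter stemmer and POS tagger are used purely as convenient stand-ins for the paper's Porter stemmer and Stanford tagger.

```python
# Sketch of token-level feature extraction for Yamcha (assumes the NLTK
# data packages needed for POS tagging have been downloaded).
import nltk
from nltk.stem import PorterStemmer

MODAL_VERBS = {"can", "could", "may", "might", "must",
               "shall", "should", "will", "would"}
stemmer = PorterStemmer()

def feature_rows(tokens, labels):
    """One 'token<TAB>stem<TAB>POS<TAB>whichModal<TAB>label' row per token."""
    rows = []
    for (word, pos), label in zip(nltk.pos_tag(tokens), labels):
        which_modal = word.lower() if word.lower() in MODAL_VERBS else "Nil"
        rows.append("\t".join([word, stemmer.stem(word.lower()), pos,
                               which_modal, label]))
    return rows

tokens = ["John", "may", "go", "to", "NY"]
labels = ["O", "O", "O", "O", "O"]  # hypothetical tag column
print("\n".join(feature_rows(tokens, labels)))
```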
4.4 Experiments and Results

In this section, we present experiments that consider all the MTurk annotations where two annotators agreed and all the MTurk annotations where all three annotators agreed to be equally correct annotations. We present experiments applying differential weights for these annotations in Section 4.5. We performed 4-fold cross validation (4FCV) on the MTurk data in order to select the best feature set configuration φ. The best feature set obtained was {wordStem, POS, whichModal} with a context width of 2. To find the best-performing feature set and context width configuration, we did an exhaustive search on the feature space, pruning away features which were proven not useful by results at intermediate stages. Table 3 presents results obtained for each modality on 4-fold cross validation.

Modality    Precision  Recall  F Measure
Ability     82.4       55.5    65.5
Effort      95.1       82.8    88.5
Intention   84.3       61.3    70.7
Success     93.2       76.6    83.8
Want        88.4       64.3    74.3
Overall     90.1       70.6    79.1

Table 3: Per-modality results for the best feature set φ on 4-fold cross validation on MTurk data

We also trained a model on the entire MTurk data using the best feature set φ and evaluated it against the Gold data. The results obtained for each modality on the Gold evaluation are given in Table 4. We attribute the lower performance on the Gold dataset to its difference from the MTurk data. MTurk data is entirely from email threads, whereas Gold data contained sentences from newswire, letters and blogs in addition to emails. Furthermore, the annotation is different (Turkers vs. expert). Finally, the distribution of modalities in the two datasets is very different. For example, the Ability modality was merely 6% of the MTurk data compared to 48% of the Gold data (see Table 2).

Modality    Precision  Recall  F Measure
Ability     78.6       22.0    34.4
Effort      85.7       60.0    70.6
Intention   66.7       16.7    26.7
Success     NA          0.0    NA
Want        92.3       50.0    64.9
Overall     72.1       29.5    41.9

Table 4: Per-modality results for the best feature set φ evaluated on the Gold dataset

We obtained reasonable performance for the Effort and Want modalities, while the performance for the other modalities was rather low. Also, the Gold dataset contained only 8 instances of Success, none of which was recognized by the tagger, resulting in a recall of 0%. Precision (and, accordingly, F measure) for Success was considered "not applicable" (NA), as no such tag was assigned.

4.5 Annotation Confidence Experiments

Our MTurk data contains sentences for which at least two of the three Turkers agreed on the modality and the target of the modality. In this section, we investigate the role of annotation confidence in training an automatic tagger. The annotation confidence is denoted by whether an annotation was agreed on by only two annotators or was unanimous. We denote the set of sentences for which only two annotators agreed as Agr2 and that for which all three annotators agreed as Agr3.

We present four training setups. The first setup is Tr23, where we train a model using both Agr2 and Agr3 with equal weights. This is the setup we used for the results presented in Section 4.4. Then, we have Tr2 and Tr3, where we train using only Agr2 and Agr3 respectively. Finally, for Tr23W, we train a model giving different cost values to Agr2 and Agr3 examples. The SVMlight package allows users to input a cost value c_i for each training instance separately.[7] We tuned this cost value for Agr2 and Agr3 examples and found the best values to be 20 and 30 respectively.

[7] This can be done by specifying 'cost:<value>' after the label in each training instance. This feature has not yet been documented on the SVMlight website.

For all four setups, we used feature set φ. We performed 4-fold cross validation on the MTurk data in two ways — we tested against a combination of Agr2 and Agr3, and we tested against only Agr3. Results of these experiments are presented in Table 5. We also present the results of evaluating a tagger trained on the whole MTurk data for each setup against the Gold annotation in Table 6. The Tr23 setup tested on both Agr2 and Agr3 in Table 5 and the Tr23 setup tested on Gold data in Table 6 correspond to the results presented in Table 3 and Table 4 respectively.

TrainingSetup   Tested on Agr2 and Agr3         Tested on Agr3 only
                Precision  Recall  F Measure    Precision  Recall  F Measure
Tr23            90.1       70.6*   79.1*        95.9       86.8*   91.1*
Tr2             91.0*      66.1    76.5         95.6       81.8    88.2
Tr3             88.1       52.3    65.6         96.8*      71.7    82.3
Tr23W           89.9       70.5    79.0         95.8       86.5    90.9

Table 5: Annotator confidence experiment results, 4-fold cross validation on MTurk data; the best result per column is marked with *.

TrainingSetup   Precision  Recall  F Measure
Tr23            72.1       29.5    41.9
Tr2             67.4       27.6    39.2
Tr3             74.1*      19.1    30.3
Tr23W           73.3       31.4*   44.0*

Table 6: Annotator confidence experiment results, evaluation against Gold; the best result per column is marked with *.
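A minimal sketch of generating a Tr23W training file follows, assuming feature vectors have already been computed as sparse {index: value} dicts (a hypothetical layout). The per-instance 'cost:<value>' syntax after the label is the undocumented SVMlight feature noted in footnote [7], with the tuned costs of 20 (Agr2) and 30 (Agr3) from above.

```python
COSTS = {2: 20.0, 3: 30.0}  # agreement level -> tuned per-instance cost

def write_svmlight(examples, path):
    """Write one SVMlight line per example: label, cost, sparse features."""
    with open(path, "w") as f:
        for ex in examples:  # ex: {"label": "+1"/"-1", "agreement": 2 or 3,
                             #      "features": {int: float}}
            feats = " ".join(f"{i}:{v}"
                             for i, v in sorted(ex["features"].items()))
            f.write(f'{ex["label"]} cost:{COSTS[ex["agreement"]]} {feats}\n')

write_svmlight(
    [{"label": "+1", "agreement": 3, "features": {1: 0.5, 7: 1.0}}],
    "train.dat",
)  # writes: +1 cost:30.0 1:0.5 7:1.0
```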
One main observation is that including annotations of lower agreement, but still above a threshold (in our case, 66.7%), is definitely helpful. Tr23 outperformed both Tr2 and Tr3 in both recall and F-measure in all evaluations. Also, even when evaluating against only the high-confidence Agr3 cases, Tr2 gave a large gain in recall (10.1 percentage points) over Tr3, with only a 1.2 percentage point loss in precision. We conjecture that this is because there are far more training instances in Tr2 than in Tr3 (674 vs. 334), and that quantity beats quality.

Another important observation is the increase in performance obtained by using varied costs for Agr2 and Agr3 examples (the Tr23W condition). Although it dropped the performance by 0.1 to 0.2 points in cross-validation F measure on the Enron corpora, it gained 2.1 points in Gold evaluation F measure. These results seem to indicate that differential weighting based on annotator agreement might have more beneficial impact when training a model that will be applied to a wide range of genres than when training a model with genre-specific data for application to data from the same genre. Put differently, using varied costs prevents genre over-fitting. We don't have a full explanation for this difference in behavior yet. We plan to explore this in future work.

5 Conclusion

We have presented an innovative way of combining a high-recall simple tagger with Mechanical Turk annotations to produce training data for a modality tagger. We show that we obtain good performance on the same genre as this training corpus (annotated in the same manner), and reasonable performance across genres (annotated by an independent expert). We also present experiments utilizing the number of agreeing Turkers to choose cost values for training examples for the SVM. As future work, we plan to extend this approach to other modalities which are not covered in this study.

6 Acknowledgments

This work is supported, in part, by the Johns Hopkins Human Language Technology Center of Excellence. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the sponsor. We thank several anonymous reviewers for their constructive feedback.

References

Emilia Apostolova, Noriko Tomuro, and Dina Demner-Fushman. 2011. Automatic extraction of lexico-syntactic patterns for detection of negation and speculation scopes. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: Short Papers - Volume 2, HLT '11, pages 283-287, Portland, Oregon.

Eiji Aramaki, Yasuhide Miura, Masatsugu Tonoike, Tomoko Ohkuma, Hiroshi Mashuichi, and Kazuhiko Ohe. 2009. Text2Table: Medical text summarization system based on named entity recognition and modality identification. In Proceedings of the BioNLP 2009 Workshop, pages 185-192, Boulder, Colorado, June. Association for Computational Linguistics.

Kathryn Baker, Michael Bloodgood, Bonnie J. Dorr, Nathaniel W. Filardo, Lori S. Levin, and Christine D. Piatko. 2010. A modality lexicon and its use in automatic tagging. In LREC.
Kathryn Baker, Michael Bloodgood, Bonnie J. Dorr, Chris Callison-Burch, Nathaniel W. Filardo, Christine Piatko, Lori Levin, and Scott Miller. 2012. Use of modality and negation in semantically-informed syntactic MT. Computational Linguistics, 38(2).

Roy Bar-Haim, Ido Dagan, Iddo Greental, and Eyal Shnarch. 2007. Semantic inference at the lexical-syntactic level. In Proceedings of the 22nd National Conference on Artificial Intelligence - Volume 1, pages 871-876, Vancouver, British Columbia, Canada. AAAI Press.

Mona Diab, Bonnie Dorr, Lori Levin, Teruko Mitamura, Rebecca Passonneau, Owen Rambow, and Lance Ramshaw. 2009a. Language Understanding Annotation Corpus. Linguistic Data Consortium (LDC), USA.

Mona Diab, Lori Levin, Teruko Mitamura, Owen Rambow, Vinodkumar Prabhakaran, and Weiwei Guo. 2009b. Committed belief annotation and tagging. In Proceedings of the Third Linguistic Annotation Workshop, pages 68-73, Suntec, Singapore, August. Association for Computational Linguistics.

Richárd Farkas, Veronika Vincze, György Szarvas, György Móra, and János Csirik, editors. 2010. Proceedings of the Fourteenth Conference on Computational Natural Language Learning. Association for Computational Linguistics, Uppsala, Sweden, July.

Thorsten Joachims. 1999. Making large-scale support vector machine learning practical, pages 169-184. MIT Press, Cambridge, MA, USA.

Stefan Kaufmann, Cleo Condoravdi, and Valentina Harizanov. 2006. Formal Approaches to Modality, pages 72-106. Mouton de Gruyter.

Angelika Kratzer. 1981. The notional category of modality. In H. J. Eikmeyer and H. Rieser, editors, Words, Worlds, and Contexts, pages 38-74. de Gruyter, Berlin.

Angelika Kratzer. 1991. Modality. In Arnim von Stechow and Dieter Wunderlich, editors, Semantics: An International Handbook of Contemporary Research. de Gruyter.

Taku Kudo and Yuji Matsumoto. 2003. Fast methods for kernel-based text analysis. In 41st Meeting of the Association for Computational Linguistics (ACL '03), Sapporo, Japan.

Marjorie McShane, Sergei Nirenburg, and Ron Zacharski. 2004. Mood and modality: Out of the theory and into the fray. Natural Language Engineering, 19(1):57-89.

Roser Morante and Walter Daelemans. 2009. Learning the scope of hedge cues in biomedical texts. In Proceedings of the BioNLP 2009 Workshop, pages 28-36, Boulder, Colorado, June. Association for Computational Linguistics.

Masaki Murata, Kiyotaka Uchimoto, Qing Ma, Toshiyuki Kanamaru, and Hitoshi Isahara. 2005. Analysis of machine translation systems' errors in tense, aspect, and modality. In Proceedings of the 19th Asia-Pacific Conference on Language, Information and Computation (PACLIC), Taipei.

Rowan Nairn, Cleo Condoravdi, and Lauri Karttunen. 2006. Computing relative polarity for textual inference. In Proceedings of the International Workshop on Inference in Computational Semantics, ICoS-5, pages 66-76, Buxton, England.

M. F. Porter. 1997. An algorithm for suffix stripping, pages 313-316. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.

Vinodkumar Prabhakaran, Owen Rambow, and Mona Diab. 2010. Automatic committed belief tagging. In Coling 2010: Posters, pages 1014-1022, Beijing, China, August. Coling 2010 Organizing Committee.

Brian Roark. 2009. Open vocabulary language modeling for binary response typing interfaces. Technical report, Oregon Health and Science University.
Roser Sauri and James Pustejovsky. 2009. FactBank: a corpus annotated with event factuality. Language Resources and Evaluation, 43(3):227-268.

Roser Sauri, Marc Verhagen, and James Pustejovsky. 2006. Annotating and recognizing event modality in text. In FLAIRS Conference, pages 333-339.

Bengt Sigurd and Barbara Gawrónska. 1994. Modals as a problem for MT. In Proceedings of the 15th International Conference on Computational Linguistics (COLING) Volume 1, COLING '94, pages 120-124, Kyoto, Japan.

Johan Van Der Auwera and Andreas Ammann. 2005. Overlap between situational and epistemic modal marking, chapter 76, pages 310-313. Oxford University Press.

Veronika Vincze, György Szarvas, Richárd Farkas, György Móra, and János Csirik. 2008. The BioScope corpus: biomedical texts annotated for uncertainty, negation and their scopes. BMC Bioinformatics, 9(Suppl 11):S9+.