C. Paul Cook

2023

Raghuraman Swaminathan and Paul Cook. 2023. Token-level Identification of Multiword Expressions using Pre-trained Multilingual Language Models. In Proceedings of the 19th Workshop on Multiword Expressions (MWE 2023), pages 1–6. Dubrovnik, Croatia.

2022

Diego Bear and Paul Cook. 2022. Leveraging a Bilingual Dictionary to Learn Wolastoqey Word Representations. In Proceedings of the 13th Language Resources and Evaluation Conference (LREC 2022), pages 1159–1166. Marseille, France.

Diego Bear and Paul Cook. 2022. Evaluating Unsupervised Approaches to Morphological Segmentation for Wolastoqey. In Proceedings of the LREC 2022 Workshop of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages (SIGUL 2022), pages 155–160. Marseille, France.

Archna Bhatia, Paul Cook, Marcos Garcia, Shiva Taslimipoor, Carlos Ramisch (Editors). 2022. Proceedings of the 18th Workshop on Multiword Expressions (MWE 2022). Marseille, France.

2021

Diego Bear and Paul Cook. 2021. Cross-Lingual Wolastoqey-English Definition Modelling. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), pages 138–146. Online.

Milton King and Paul Cook. 2021. Now, It’s Personal: The Need for Personalized Word Sense Disambiguation. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), pages 692–700. Online.

Ali Hakimi Parizi and Paul Cook. 2021. Evaluating a Joint Training Approach for Learning Cross-lingual Embeddings with Sub-word Information without Parallel Corpora on Lower-resource Languages. In Proceedings of the 10th Joint Conference on Lexical and Computational Semantics (*SEM 2021), pages 302–307. Online.

Samin Fakharian and Paul Cook. 2021. Contextualized Embeddings Encode Monolingual and Cross-lingual Knowledge of Idiomaticity. In Proceedings of the 17th Workshop on Multiword Expressions (MWE 2021), pages 23–32. Online.

Paul Cook, Jelena Mitrović, Carla Parra Escartín, Ashwini Vaidya, Petya Osenova, Shiva Taslimipoor, Carlos Ramisch (Editors). 2021. Proceedings of the 17th Workshop on Multiword Expressions (MWE 2021). Online.

Milton King, Ali Hakimi Parizi, Samin Fakharian and Paul Cook. 2021. UNBNLP at SemEval-2021 Task 1: Predicting lexical complexity with masked language models and character-level encoders. In Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021), pages 650–654. Online.

2020

Ali Hakimi Parizi and Paul Cook. 2020. Joint Training for Learning Cross-lingual Embeddings with Sub-word Information without Parallel Corpora. In Proceedings of The 9th Joint Conference on Lexical and Computational Semantics (*SEM 2020), pages 39–49. Barcelona, Spain (online).

Paul Cook. 2020. Natural Language Processing in Lexicography. In S. Ogilvie (ed.), The Cambridge Companion to English Dictionaries (Cambridge Companions to Literature, pp. 240–252). Cambridge: Cambridge University Press.

Arman Kabiri and Paul Cook. 2020. Evaluating a Multi-sense Definition Generation Model for Multiple Languages. In: Sojka P., Kopeček I., Pala K., Horák A. (eds) Text, Speech, and Dialogue. TSD 2020. Lecture Notes in Computer Science, vol 12284, pages 153–161. Springer, Cham. arXiv pre-print available.

Milton King and Paul Cook. 2020. Authorship Verification with Personalized Language Models. In: Sojka P., Kopeček I., Pala K., Horák A. (eds) Text, Speech, and Dialogue. TSD 2020. Lecture Notes in Computer Science, vol 12284, pages 248–256. Springer, Cham.

Ali Hakimi Parizi and Paul Cook. 2020. Evaluating Sub-word Embeddings in Cross-lingual Models. In Proceedings of The 12th Language Resources and Evaluation Conference, pages 2712–2719. Marseille, France.

Jeremie Boudreau, Akankshya Patra, Ashima Suvarna and Paul Cook. 2020. Evaluating the Impact of Sub-word Information and Cross-lingual Word Embeddings on Mi'kmaq Language Modelling. In Proceedings of The 12th Language Resources and Evaluation Conference, pages 2736–2745. Marseille, France.

Milton King and Paul Cook. 2020. Evaluating Approaches to Personalizing Language Models. In Proceedings of The 12th Language Resources and Evaluation Conference, pages 2461–2469. Marseille, France.

2019

Ali Hakimi Parizi, Milton King and Paul Cook. 2019. UNBNLP at SemEval-2019 Task 5 and 6: Using Language Models to Detect Hate Speech and Offensive Language. In Proceedings of the 13th International Workshop on Semantic Evaluation (SemEval-2019), pages 514–518. Minneapolis, USA.

Milton King and Paul Cook. 2019. Building Personalized Language Models through Language Model Interpolation. In Proceedings of the 20th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing 2019). La Rochelle, France.

2018

Bahar Salehi, Paul Cook and Timothy Baldwin. 2018. Exploiting multilingual lexical resources to predict MWE compositionality. In Stella Markantonatou, Carlos Ramisch, Agata Savary and Veronika Vincze (eds.), Multiword expressions at length and in depth: Extended papers from the MWE 2017 workshop, pages 343–373. Berlin: Language Science Press.

Ali Hakimi Parizi and Paul Cook. 2018. Do Character-Level Neural Network Language Models Capture Knowledge of Multiword Expression Compositionality? In Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018), pages 185–192. Santa Fe, New Mexico.

Vaibhavi Kalgutkar, Natalia Stakhanova, Paul Cook and Alina Matyukhina. 2018. Android authorship attribution through string analysis. In Proceedings of the 13th International Conference on Availability, Reliability and Security (ARES 2018), pages 4:1–4:10. Hamburg, Germany. Runner-up for best paper award.

Milton King and Paul Cook. 2018. Leveraging distributed representations and lexico-syntactic fixedness for token-level prediction of the idiomaticity of English verb-noun combinations. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 345–350. Melbourne, Australia.

Milton King, Ali Hakimi Parizi and Paul Cook. 2018. UNBNLP at SemEval-2018 Task 10: Evaluating unsupervised approaches to capturing discriminative attributes. In Proceedings of The 12th International Workshop on Semantic Evaluation (SemEval 2018), pages 1013–1016. New Orleans, Louisiana.

Milton King, Dima Alhadidi and Paul Cook. 2018. Text-based Detection of Unauthorized Users of Social Media Accounts. In Bagheri E., Cheung J. (eds) Advances in Artificial Intelligence. Canadian AI 2018. Lecture Notes in Computer Science, vol 10832. Springer.

Anant Maheshwari, Léo Bouscarrat and Paul Cook. 2018. Towards Language Technology for Mi'kmaq. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), pages 4139–4143. Miyazaki, Japan.

2017

Waseem Gharbieh, Virendra Bhavsar and Paul Cook. 2017. Deep Learning Models For Multiword Expression Identification. In Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017), pages 54–64. Vancouver, Canada.

Paul Cook and Laurel J. Brinton. 2017. Building and evaluating web corpora representing national varieties of English. Language Resources and Evaluation, 51:643–662.

Milton King and Paul Cook. 2017. Supervised and unsupervised approaches to measuring usage similarity. In Proceedings of the 1st Workshop on Sense, Concept and Entity Representations and their Applications, pages 47–52. Valencia, Spain.

2016

Bahar Salehi, Paul Cook and Timothy Baldwin. 2016. Determining the Multiword Expression Inventory of a Surprise Language. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 471–481. Osaka, Japan.

Paul Cook, Stefan Evert, Roland Schäfer and Egon Stemle, editors. 2016. Proceedings of the 10th Web as Corpus Workshop (WAC-X) and the EmpiriST Shared Task. Berlin, Germany.

Waseem Gharbieh, Virendra Bhavsar and Paul Cook. 2016. A Word Embedding Approach to Identifying Verb-Noun Idiomatic Combinations. In Proceedings of the 12th Workshop on Multiword Expressions (MWE 2016), pages 112–118. Berlin, Germany.

Milton King, Waseem Gharbieh, SoHyun Park and Paul Cook. 2016. UNBNLP at SemEval-2016 Task 1: Semantic Textual Similarity: A Unified Framework for Semantic Processing and Evaluation. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), pages 732–735. San Diego, United States.

Richard Killam, Paul Cook and Natalia Stakhanova. 2016. Android Malware Classification through Analysis of String Literals. In First Workshop on Text Analytics for Cybersecurity and Online Safety (TA-COS 2016). Portorož, Slovenia.

Richard Fothergill, Paul Cook and Timothy Baldwin. 2016. Evaluating a Topic Modelling Approach to Measuring Corpus Similarity. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), pages 273–279. Portorož, Slovenia.

SoHyun Park, Afsaneh Fazly, Annie Lee, Brandon Seibel, Wenjie Zi and Paul Cook. 2016. Automatically Classifying Out-of-vocabulary Terms in a Domain-Specific Social Media Corpus. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), pages 2971–2975. Portorož, Slovenia.

2015

Long Duong, Trevor Cohn, Steven Bird and Paul Cook. 2015. A Neural Network Model for Low-Resource Universal Dependency Parsing. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP 2015), pages 339–348. Lisbon, Portugal.

Long Duong, Trevor Cohn, Steven Bird and Paul Cook. 2015. Cross-lingual Transfer for Unsupervised Dependency Parsing Without Parallel Data. In Proceedings of the Nineteenth Conference on Computational Natural Language Learning, pages 113–122. Beijing, China.

Long Duong, Trevor Cohn, Steven Bird and Paul Cook. 2015. Low Resource Dependency Parsing: Cross-lingual Parameter Sharing in a Neural Network Parser. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 845–850. Beijing, China.

Bahar Salehi, Nitika Mathur, Paul Cook and Timothy Baldwin. 2015. The Impact of Multiword Expression Compositionality on Machine Translation Evaluation. In Proceedings of the 11th Workshop on Multiword Expressions (MWE 2015), pages 54–59. Denver, Colorado.

Bahar Salehi, Paul Cook and Timothy Baldwin. 2015. A Word Embedding Approach to Predicting the Compositionality of Multiword Expressions. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL HLT 2015), pages 977–983. Denver, Colorado.

2014

Paul Cook, Bo Han and Timothy Baldwin. 2014. Statistical Methods for Identifying Local Dialectal Terms from GPS-tagged Documents. Dictionaries, 35:248–271.

Bahar Salehi, Paul Cook and Timothy Baldwin. 2014. Detecting Non-compositional MWE Components using Wiktionary. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP 2014), pages 1792–1797. Doha, Qatar.

Long Duong, Trevor Cohn, Karin Verspoor, Steven Bird and Paul Cook. 2014. What Can We Get From 1k Tokens? A Case Study of Multilingual POS Tagging For Resource-Poor Languages. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP 2014), pages 886–897. Doha, Qatar.

Paul Cook, Jey Han Lau, Diana McCarthy and Timothy Baldwin. 2014. Novel word-sense identification. In Proceedings of the 25th International Conference on Computational Linguistics (COLING 2014), pages 1624–1635. Dublin, Ireland. Dataset available.

Marco Lui, Ned Letcher, Oliver Adams, Long Duong, Paul Cook and Timothy Baldwin. 2014. Exploring Methods and Resources for Discriminating Similar Languages. In Proceedings of the 1st Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects (VarDial), pages 129–138. Dublin, Ireland.

Paul Cook, Michael Rundell, Jey Han Lau and Timothy Baldwin. 2014. Applying a Word-sense Induction System to the Automatic Extraction of Diverse Dictionary Examples. In Proceedings of the XVI EURALEX International Congress (EURALEX 2014), pages 15–19. Bolzano, Italy.

Jey Han Lau, Paul Cook, Diana McCarthy, Spandana Gella and Timothy Baldwin. 2014. Learning Word Sense Distributions, Detecting Unattested Senses and Identifying Novel Senses Using Topic Models. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL 2014), pages 259–270. Baltimore, Maryland.

Spandana Gella, Paul Cook and Timothy Baldwin. 2014. One Sense per Tweeter ... and Other Lexical Semantic Tales of Twitter. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2014), volume 2: Short Papers, pages 215–220. Gothenburg, Sweden. Dataset available.

Bahar Salehi, Paul Cook and Timothy Baldwin. 2014. Using Distributional Similarity of Multi-way Translations to Predict Multiword Expression Compositionality. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2014), pages 472–481. Gothenburg, Sweden.

Bo Han, Paul Cook and Timothy Baldwin. 2014. Text-based Twitter User Geolocation Prediction. Journal of Artificial Intelligence Research, 49:451–500.

2013

Marco Lui and Paul Cook. 2013. Classifying English Documents by National Dialect. In Proceedings of the Australasian Language Technology Association Workshop 2013 (ALTA 2013), pages 5–15. Brisbane, Australia.

Paul Cook, Jey Han Lau, Michael Rundell, Diana McCarthy and Timothy Baldwin. 2013. A lexicographic appraisal of an automatic approach for detecting new word-senses. In Electronic lexicography in the 21st century: thinking outside the paper. Proceedings of the eLex 2013 conference, pages 49–65. Tallinn, Estonia.

Timothy Baldwin, Paul Cook, Marco Lui, Andrew MacKinlay and Li Wang. 2013. How Noisy Social Media Text, How Diffrnt Social Media Sources? In Proceedings of the 6th International Joint Conference on Natural Language Processing (IJCNLP 2013), pages 356–364. Nagoya, Japan.

Long Duong, Paul Cook, Steven Bird and Pavel Pecina. 2013. Increasing the quality and quantity of source language data for unsupervised cross-lingual POS tagging. In Proceedings of the 6th International Joint Conference on Natural Language Processing (IJCNLP 2013), pages 1243–1249. Nagoya, Japan.

Bo Han, Paul Cook and Timothy Baldwin. 2013. unimelb: Spanish Text Normalisation. In Proceedings of the Tweet Normalization Workshop at SEPLN 2013 (Tweet-norm), pages 67–71. Madrid, Spain.

Long Duong, Paul Cook, Steven Bird and Pavel Pecina. 2013. Simpler unsupervised POS tagging with bilingual projections. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 634–639. Sofia, Bulgaria.

Bo Han, Paul Cook and Timothy Baldwin. 2013. A Stacking-based Approach to Twitter User Geolocation Prediction. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 7–12. Sofia, Bulgaria.

Bahar Salehi and Paul Cook. 2013. Predicting the Compositionality of Multiword Expressions Using Translations in Multiple Languages. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity, pages 266–275. Atlanta, Georgia.

Spandana Gella, Paul Cook and Bo Han. 2013. Unsupervised Word Usage Similarity in Social Media Texts. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity, pages 248–253. Atlanta, Georgia. Best short paper award. Dataset available.

Spandana Gella, Bahar Salehi, Marco Lui, Karl Grieser, Paul Cook and Timothy Baldwin. 2013. UniMelb_NLP-CORE: Integrating predictions from multiple domains and feature sets for estimating semantic textual similarity. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity, pages 207–215. Atlanta, Georgia.

Jey Han Lau, Paul Cook and Timothy Baldwin. 2013. unimelb: Topic Modelling-based Word Sense Induction for Web Snippet Clustering. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013), pages 217–221. Atlanta, Georgia.

Jey Han Lau, Paul Cook and Timothy Baldwin. 2013. unimelb: Topic Modelling-based Word Sense Induction. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013), pages 307–311. Atlanta, Georgia.

Paul Cook and Graeme Hirst. 2013. Automatically Assessing Whether a Text is Clichéd, with Applications to Literary Analysis. In Proceedings of the 9th Workshop on Multiword Expressions (MWE 2013), pages 52–57. Atlanta, Georgia.

Bo Han, Paul Cook and Timothy Baldwin. 2013. Lexical Normalisation for Social Media Text. ACM Transactions on Intelligent Systems and Technology 4(1), pages 5:1–5:27.

2012

Bo Han, Paul Cook and Timothy Baldwin. 2012. Geolocation Prediction in Social Media Data by Finding Location Indicative Words. In Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012), pages 1045–1062. Mumbai, India.

Paul Cook and Marco Lui. 2012. langid.py for better language modelling. In Proceedings of the Australasian Language Technology Association Workshop 2012 (ALTA 2012), pages 107–112. Dunedin, New Zealand. Code available.

Paul Cook and Scott Nowson, editors. 2012. Proceedings of the Australasian Language Technology Association Workshop 2012 (ALTA 2012). Dunedin, New Zealand.

Paul Cook. 2012. Using social media to find English lexical blends. In Proceedings of the 15th EURALEX International Congress (EURALEX 2012), pages 846–854. Oslo, Norway. Code and sample output available. (There's lots of obscenity in the data. You've been warned.)
I wrote a post for the Macmillan Dictionary Blog on this work.

Bo Han, Paul Cook and Timothy Baldwin. 2012. Automatically Constructing a Normalisation Dictionary for Microblogs. In Proceedings of the Conference on Empirical Methods in Natural Language Processing and Natural Language Learning (EMNLP-CoNLL 2012), pages 421–432. Jeju, Korea.
The normalisation dictionary we produced in this paper is available.

Paul Cook and Graeme Hirst. 2012. Do Web corpora from top-level domains represent national varieties of English? In Actes des 11es Journées Internationales d'Analyse Statistique des Données Textuelles / Proceedings of the 11th International Conference on Textual Data Statistical Analysis, pages 281–293. Liège, Belgium.

Jey Han Lau, Paul Cook, Diana McCarthy, David Newman and Timothy Baldwin. 2012. Word Sense Induction for Novel Sense Detection. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2012), pages 591–601. Avignon, France.

Timothy Baldwin, Paul Cook, Bo Han, Aaron Harwood, Shanika Karunasekera and Masud Moshtaghi. 2012. A Support Platform for Event Detection using Social Intelligence. In Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics, pages 69–72. Avignon, France.

2011

Paul Cook and Graeme Hirst. 2011. Automatic identification of words with novel but infrequent senses. In Proceedings of the 25th Pacific Asia Conference on Language Information and Computation (PACLIC 25), pages 265–274. Singapore.

Paul Cook. 2011. Book review of “A Way with Words: Recent advances in lexical theory and analysis: A Festschrift for Patrick Hanks.” Gilles-Maurice de Schryver (editor). Computational Linguistics 37(2):403–406.

2010

Paul Cook. 2010. Exploiting linguistic knowledge to infer properties of neologisms. Ph.D. thesis, University of Toronto, November.

Paul Cook and Suzanne Stevenson. 2010. No sentence is too confusing to ignore. In Proceedings of the ACL 2010 Workshop on NLP and Linguistics: Finding the Common Ground, pages 61–69. Uppsala, Sweden.

Paul Cook and Anna Feldman, editors. 2010. Proceedings of the NAACL HLT 2010 Second Workshop on Computational Approaches to Linguistic Creativity (CALC-10). Los Angeles, California.

Paul Cook and Suzanne Stevenson. 2010. Automatically identifying changes in the semantic orientation of words. In Proceedings of the 7th International Conference on Language Resources and Evaluation, pages 28–34. Valletta, Malta.

Paul Cook and Suzanne Stevenson. 2010. Automatically identifying the source words of lexical blends in English. Computational Linguistics 36(1):129–149.
The dataset of blends used in this study is available. Please contact me if you're interested.

2009

Paul Cook and Suzanne Stevenson. 2009. An unsupervised model for text message normalization. In Proceedings of the NAACL HLT 2009 Workshop on Computational Approaches to Linguistic Creativity, pages 71–78. Boulder, Colorado.

Afsaneh Fazly, Paul Cook and Suzanne Stevenson. 2009. Unsupervised type and token identification of idiomatic expressions. Computational Linguistics 35(1):61–103.

2008

Paul Cook, Afsaneh Fazly and Suzanne Stevenson. 2008. The VNC-Tokens Dataset. In Proceedings of the LREC Workshop: Towards a Shared Task for Multiword Expressions (MWE 2008), pages 19–22. Marrakech, Morocco.
The VNC-Tokens dataset is available (also from the Multiword Expressions Web).

2007

Paul Cook and Suzanne Stevenson. 2007. Automagically inferring the source words of lexical blends. In Proceedings of the 10th Conference of the Pacific Association for Computational Linguistics (PACLING 2007), pages 289–297. Melbourne, Australia.

Paul Cook, Afsaneh Fazly and Suzanne Stevenson. 2007. Pulling their weight: Exploiting syntactic forms for the automatic identification of idiomatic expressions in context. In Proceedings of the ACL Workshop on A Broader Perspective on Multiword Expressions (MWE 2007), pages 41–48. Prague, Czech Republic.

2006

Paul Cook. 2006. Automatically Classifying English Verb-Particle Constructions by Particle Semantics. M.Sc. thesis, University of Toronto, August.

Paul Cook and Suzanne Stevenson. 2006. Classifying particle semantics in English verb-particle constructions. In Proceedings of the ACL/COLING Workshop on Multiword Expressions: Identifying and Exploiting Underlying Properties (MWE 2006), pages 45–53. Sydney, Australia.