Natural Language Processing

Jose Camacho Collados

about me

Background. I am currently a Professor at the School of Computer Science and Informatics at Cardiff University. I have also been a UKRI Future Leaders Fellow since February 2021, and I lead the Cardiff NLP group. Before that I was a postdoc for a year on the FLEXILOG ERC project with Steven Schockaert. Previously I was a Google Doctoral Fellow and PhD student at the Linguistic Computing Laboratory (LCL) of Sapienza University of Rome. My educational background includes an Erasmus Mundus Master in Natural Language Processing and Human Language Technology and a 5-year BSc degree in Mathematics. I also worked for a year as a research engineer at ATILF-CNRS in Nancy (France).

Research. I work on various topics in Natural Language Processing (NLP), mainly in the areas of lexical and distributional semantics. On this topic I recently wrote a book (titled Embeddings in NLP) with Taher Pilehvar, which gives an overview of recent trends in distributional semantics and NLP. In recent years I have been particularly interested in how relational knowledge is captured in current NLP models (embeddings/language models - check out our latest RelBERT model!), and how this plays a role in applications. During my PhD I also worked on integrating explicit knowledge (mainly from lexical resources) into downstream NLP applications, with a special focus on multilinguality and ambiguity. To this end, I have been collaborating on the BabelNet project and developing knowledge-based sense vector representations (e.g. NASARI and SW2V) to be used as a bridge between lexical resources and text-based applications.

Open data. I strongly believe that well-curated datasets and resources, as well as shared tasks, are key for advancing science. In 2019 we organized the WiC challenge on evaluating context-sensitive representations. This competition was part of a shared task in the IJCAI workshop SemDeep, and is featured in the SuperGLUE language understanding benchmark. We recently extended this effort with the WiC-TSV benchmark. In the past I also helped organize several SemEval tasks on Word Similarity, Hypernym Discovery and Emoji Prediction. Check them out, as all datasets are openly available and in various languages!
Finally, I have also been working on social media applications recently: check out the TweetNLP platform! We also have open datasets (TweetEval), time-specific models (TimeLMs), multilingual language models (XLM-T) and cross-lingual word embeddings!

Other. NLP aside, I love travelling and sports. I was raised in Granada, a wonderful city in the south of Spain where I spent the first 20 years of my life. Since then, I have lived in large European cities like Paris, Barcelona and Rome, and spent long periods of time in Seoul. I have also lived in other smaller (but equally charming) cities: Nancy and Besançon (France) and Wolverhampton (UK). I like practising all kinds of sports: football, swimming, tennis, padel, ping pong... and chess (yes, it is also a sport!). I hold the International Master chess title and am currently the Welsh chess champion!

Teaching and Supervision: You can find more about my current teaching and PhD supervision here.
Note: If you are interested in doing a PhD with me, please read this note to prospective PhD students.


Dimosthenis Antypas, Alun Preece and Jose Camacho-Collados.
Negativity spreads faster: A large-scale multilingual twitter analysis on the role of sentiment in political communication. [paper] [data&code]
Online Social Networks and Media Journal (2023).
Jose Camacho-Collados, Kiamehr Rezaee, Talayeh Riahi, Asahi Ushio, Daniel Loureiro, Dimosthenis Antypas, Joanne Boisson, Luis Espinosa-Anke, Fangyu Liu, Eugenio Martínez-Cámara, Gonzalo Medina, Thomas Buhrmann, Leonardo Neves and Francesco Barbieri.
TweetNLP: Cutting-Edge Natural Language Processing for Social Media. [paper] [code] [demo]
EMNLP 2022 (Demo), Abu Dhabi (United Arab Emirates).
Dimosthenis Antypas*, Asahi Ushio*, Jose Camacho-Collados, Leonardo Neves, Vítor Silva and Francesco Barbieri.
Twitter Topic Classification. [paper] [data]
COLING 2022, Gyeongju (Republic of Korea).
Mark Anderson and Jose Camacho-Collados.
Assessing the Limits of the Distributional Hypothesis in Semantic Spaces: Trait-based Relational Knowledge and the Impact of Co-occurrences. [paper] [data]
*SEM 2022, Seattle (USA).
Daniel Loureiro, Alípio Mário Jorge and Jose Camacho-Collados.
LMMS Reloaded: Transformer-based Sense Embeddings for Disambiguation and Beyond. [paper] [data&code]
Artificial Intelligence Journal (2022).
Asahi Ushio, Jose Camacho-Collados and Steven Schockaert.
Distilling Relation Embeddings from Pretrained Language Models. [paper] [data&code]
EMNLP 2021.


January 2024. I'll be the General Chair of *SEM 2024.

August 2023. We won the AIJ 2023 Prominent Paper Award!

August 2023. I've been promoted to Full Professor!

July 2023. Attending ACL 2023 and organising *SEM 2023 as a program co-chair.

May 2023. I'm giving an invited talk at the Natural Language Symposium at the University of Copenhagen.

January 2023. I'll give one of the keynote talks at the Global WordNet Conference.

July 2022. I'm happy to announce the first release of TweetNLP, a platform for cutting-edge NLP specialized in social media!

March 2022. We are organising the first EvoNLP workshop (Workshop on Ever Evolving NLP), co-located with EMNLP. EvoNLP also features a meaning shift detection shared task framed as Word-in-Context - trial data available!

February 2022. We have launched our TimeLMs, with the commitment to release a new language model every three months!

August 2021. Taher Pilehvar and I taught a week-long course on "Embeddings in NLP" at the ESSLLI summer school.



code / resources

- TweetNLP, an all-round platform for NLP in social media, including a Python library.

- TimeLMs, language models trained for various time periods.

- XLM-T, multilingual language models for Twitter for sentiment analysis and beyond.

- T-NER, a Python Named Entity Recognition library based on transformers.

- TweetEval, a unified evaluation benchmark for Twitter and language models.

- WiC-TSV, a new benchmark extending WiC to retrieve senses in context.

- Meemi, an open-source implementation to learn cross-lingual embeddings, including pre-trained models.

- WiC, the Word-in-Context dataset for evaluating context-sensitive representations.

- NASARI vector representations for BabelNet synsets and Wikipedia pages (English, French, German, Italian, Spanish).

- SW2V (Senses and Words to Vectors): Word and Sense embeddings in the same vector space (code+pre-trained models).

- Unified Evaluation Framework for Word Sense Disambiguation.

- BabelNet, a very large multilingual encyclopedic dictionary and semantic network.

- BabelDomains: Lexical items (synsets, Wikipedia pages) annotated with domains of knowledge.

- EuroSense: Multilingual sense annotations for Europarl.

- Supervised distributional framework (including Python API) for hypernym discovery.

- Find the word that does not belong: an evaluation benchmark for the outlier detection task.

- Large multilingual corpus of sense-annotated textual definitions.

- Word similarity datasets in several languages (also cross-lingual!).


Looking forward to hearing from you!

invited talks / tutorials

NLP and social media: Language Modelling, Benchmarking and Temporal Challenges, 6 February 2024, Manchester, UK. [slides]

Language Models for Social Media: Challenges and Applications, 25 March 2022, Sheffield, UK. [slides]

COLING 2020 Tutorial on Embeddings in NLP, 13 December 2020. [slides+video]

The Power of Natural Language Processing for Social Media Analysis, 13 October 2020, AI Tech North, UK. [slides]

Natural Language Processing at Cardiff University, 14 July 2020, Wales Tech Week, UK.

Word, Sense and Contextualized Embeddings: Vector Representations of Meaning in NLP, 6 February 2020, AI Wales, UK. [slides]

Word, Sense and Contextualized Embeddings: Vector Representations of Meaning in NLP, 8 January 2020, Universidad de Granada, Spain.

Word, Sense and Contextualized Embeddings: Vector Representations of Meaning in Natural Language Processing, 18 March 2019, Data and Knowledge Engineering Seminar, Cardiff University, UK. [slides]

NAACL 2018 Tutorial on The Interplay between Lexical Resources and NLP, 1 June 2018, New Orleans, USA. [slides][website]

Semantic (Vector) Representations of Word Senses, Concepts and Entities and their Applications, 20 April 2017, University of Cambridge, UK. [slides]

ACL 2016 Tutorial on Semantic Representation of Word Senses and Concepts, 7 August 2016, Berlin, Germany. [slides]

Semantic Representations of Word Senses, Concepts and Entities and their Applications, 19 October 2016, Pompeu Fabra University, Barcelona, Spain. [slides]

Computational Semantic Representation, 5 June 2015, Department of Applied Mathematics, University of California (UCLA), Los Angeles, USA.

Using NLP tools to identify people and places in a collection of 19th century emigrant letters, 'Digitising Experience of Migration' Workshop, 15 March 2014, Mellon Centre for Migration Studies, Omagh, Northern Ireland.