Natural Language Processing

Jose Camacho Collados

about me

I am currently a Lecturer at the School of Computer Science and Informatics at Cardiff University, after having worked for a year as a postdoc on the FLEXILOG ERC project. Previously I was a Google Doctoral Fellow and PhD student at the Linguistic Computing Laboratory (LCL) of Sapienza University of Rome. My educational background includes an Erasmus Mundus Master in Natural Language Processing and Human Language Technology and a 5-year BSc degree in Mathematics. I also worked for a year as a research engineer at ATILF-CNRS in Nancy (France).

I work on various topics in Natural Language Processing (NLP), mainly in the areas of lexical and distributional semantics. Currently I'm working on integrating explicit knowledge (mainly from lexical resources) into downstream NLP applications, with a special focus on multilinguality and ambiguity. To this end, I have been collaborating on the BabelNet project and developing knowledge-based sense vector representations (e.g. NASARI and SW2V) to be used as a bridge between lexical resources and text-based applications. We organized a tutorial at ACL 2016 and a workshop at EACL 2017 on this topic, and a tutorial at NAACL 2018 on the interplay between lexical resources and NLP.

I strongly believe that well-curated datasets and resources, as well as shared tasks, are key for advancing science. This year we are organizing a CodaLab challenge on evaluating context-sensitive representations on the WiC dataset. This competition was part of a shared task in the IJCAI workshop SemDeep, and is currently featured in the SuperGLUE language understanding benchmark. Last year we organized two SemEval 2018 tasks, on Hypernym Discovery and Emoji Prediction. Check them out!

NLP aside, I love travelling and sports. I was raised in Granada, a wonderful city in the south of Spain where I spent the first 20 years of my life. Since then, I have lived in large European cities such as Paris, Barcelona and Rome, and have spent long periods in Seoul. I have also lived in smaller (but equally charming) cities: Nancy and Besançon (France) and Wolverhampton (UK). I like practising all kinds of sports: football, swimming, tennis, padel, ping pong... and chess (yes, it is also a sport!). I hold the International Master chess title and am currently the top-rated chess player of South Korea.

Note: If you are interested in doing a PhD with me, please read this note to prospective PhD students.


Francesco Barbieri, Jose Camacho-Collados, Leonardo Neves and Luis Espinosa-Anke.
TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification. [paper] [data]
Findings of EMNLP 2020.
Alessandro Raganato*, Tommaso Pasini*, Jose Camacho-Collados and Mohammad Taher Pilehvar.
XL-WiC: A Multilingual Benchmark for Evaluating Semantic Contextualization. [paper] [data] [competition]
EMNLP 2020.
Daniel Loureiro and Jose Camacho-Collados.
Don't Neglect the Obvious: On the Role of Unambiguous Words in Word Sense Disambiguation. [paper] [data&code]
EMNLP 2020.
Zied Bouraoui, Jose Camacho-Collados and Steven Schockaert.
Inducing Relational Knowledge from BERT. [paper]
AAAI 2020, New York, USA.
Jose Camacho-Collados*, Yerai Doval*, Eugenio Martínez-Cámara, Luis Espinosa-Anke, Francesco Barbieri and Steven Schockaert.
Learning Cross-lingual Embeddings from Twitter via Distant Supervision. [paper] [data]
ICWSM 2020, Atlanta, USA, to appear.


November 2020. Hiring! Looking for a 3-year postdoc starting early/mid 2021. All the details on how to apply here.

October 2020. Excited and honoured to be awarded a UKRI Future Leaders Fellowship!

September 2020. Recently received a research grant from Snap Research (w/ Luis Espinosa-Anke and Daniel Loureiro) and started a collaboration to investigate meaning shift in social media.

July 2020. We have created a Cardiff NLP group, which now has a Twitter account!

July 2020. Gave a talk with Luis Espinosa-Anke about our research on NLP at Cardiff University as part of the Wales Tech Week.

April 2020. Given the current situation, Taher Pilehvar and I have decided to openly release the first draft of our book Embeddings in Natural Language Processing. We also thank Morgan and Claypool for agreeing to this early release.



code / resources

- WiC-TSV, a new benchmark extending WiC to target sense verification of words in context.

- Meemi, an open-source implementation to learn cross-lingual embeddings, including pre-trained models.

- WiC, the Word-in-Context dataset for evaluating context-sensitive representations.

- NASARI vector representations for BabelNet synsets and Wikipedia pages (English, French, German, Italian, Spanish).

- SW2V (Senses and Words to Vectors): Word and Sense embeddings in the same vector space (code+pre-trained models).

- Unified Evaluation Framework for Word Sense Disambiguation.

- BabelNet, a very large multilingual encyclopedic dictionary and semantic network.

- BabelDomains: Lexical items (synsets, Wikipedia pages) annotated with domains of knowledge.

- EuroSense: Multilingual sense annotations for Europarl.

- Supervised distributional framework (including Python API) for hypernym discovery.

- Find the word that does not belong: an evaluation benchmark for the outlier detection task.

- Large multilingual corpus of sense-annotated textual definitions.

- Word similarity datasets in several languages (also cross-lingual!).


Looking forward to hearing from you!

invited talks / tutorials

The Power of Natural Language Processing for Social Media Analysis, 13 October 2020, AI Tech North, UK. [slides]

Natural Language Processing at Cardiff University, 14 July 2020, Wales Tech Week, UK.

Word, Sense and Contextualized Embeddings: Vector Representations of Meaning in NLP, 6 February 2020, AI Wales, UK. [slides]

Word, Sense and Contextualized Embeddings: Vector Representations of Meaning in NLP, 8 January 2020, Universidad de Granada, Spain.

Word, Sense and Contextualized Embeddings: Vector Representations of Meaning in Natural Language Processing, 18 March 2019, Data and Knowledge Engineering Seminar, Cardiff University, UK. [slides]

NAACL 2018 Tutorial on The Interplay between Lexical Resources and NLP, 1 June 2018, New Orleans, USA. [slides][website]

Semantic (Vector) Representations of Word Senses, Concepts and Entities and their Applications, 20 April 2017, University of Cambridge, UK. [slides]

ACL 2016 Tutorial on Semantic Representation of Word Senses and Concepts, 7 August 2016, Berlin, Germany. [slides]

Semantic Representations of Word Senses, Concepts and Entities and their Applications, 19 October 2016, Pompeu Fabra University, Barcelona, Spain. [slides]

Computational Semantic Representation, 5 June 2015, Department of Applied Mathematics, University of California (UCLA), Los Angeles, USA.

Using NLP tools to identify people and places in a collection of 19th century emigrant letters, 'Digitising Experience of Migration' Workshop, 15 March 2014, Mellon Centre for Migration Studies, Omagh, Northern Ireland.