I am currently a Lecturer at the School of Computer Science and Informatics at Cardiff University. I have also been a UKRI Future Leaders Fellow since February 2021, and I run the Cardiff NLP group.
Before that, I was a postdoc for a year on the FLEXILOG ERC project with Steven Schockaert. Previously, I was a Google Doctoral Fellow and PhD student at the Linguistic Computing Laboratory (LCL) of Sapienza University of Rome.
My education includes an Erasmus Mundus Master in Natural Language Processing and Human Language Technology and a 5-year BSc degree in Mathematics.
I also worked for a year as a research engineer at ATILF-CNRS in Nancy (France).
Research. I work on various topics in Natural Language Processing (NLP), mainly in lexical and distributional semantics. In this area I recently wrote a book (titled Embeddings in NLP) with Taher Pilehvar, which gives an overview of recent trends in distributional semantics and NLP. In recent years I have been particularly interested in how relational knowledge is captured in current NLP models (embeddings/language models), and how this plays a role in applications. During my PhD I also worked on integrating explicit knowledge (mainly from lexical resources) into downstream NLP applications, with a special focus on multilinguality and ambiguity. To this end, I collaborated on the BabelNet project and developed knowledge-based sense vector representations (e.g. NASARI and SW2V) to serve as a bridge between lexical resources and text-based applications. We organized a tutorial at ACL 2016 and a workshop at EACL 2017 on this topic, and a tutorial at NAACL 2018 on the interplay between lexical resources and NLP.
Open data. I strongly believe that well-curated datasets and resources, as well as shared tasks, are key to advancing science. In 2019 we organized the WiC challenge on evaluating context-sensitive representations. This competition was run as a shared task at the IJCAI workshop SemDeep, and WiC is featured in the SuperGLUE language understanding benchmark. We recently extended this effort with the WiC-TSV benchmark. In the past I also helped organize several SemEval tasks on Word Similarity, Hypernym Discovery and Emoji Prediction. Check them out: all datasets are openly available, in various languages!
Finally, I have recently been working on social media applications, for which we also provide open datasets (TweetEval), multilingual language models (XLM-T) and cross-lingual word embeddings!
Other. NLP aside, I love travelling and sports. I was raised in Granada, a wonderful city in the south of Spain where I spent the first 20 years of my life. Since then, I have lived in large European cities like Paris, Barcelona and Rome, and spent long periods of time in Seoul. I have also lived in smaller (but equally charming) cities: Nancy and Besançon (France) and Wolverhampton (UK). I like practising all kinds of sports: football, swimming, tennis, padel, ping pong... and chess (yes, it is also a sport!). I hold the International Master chess title and am currently the top-rated chess player in South Korea.
Teaching and Supervision. You can find more about my current teaching and PhD supervision commitments here.
Note: If you are interested in doing a PhD with me, please read this note for prospective PhD students.
Daniel Loureiro, Alípio Mário Jorge and Jose Camacho-Collados.
LMMS Reloaded: Transformer-based Sense Embeddings for Disambiguation and Beyond. [paper] [data&code]
arXiv preprint arXiv:2105.12449 (2021).
Francesco Barbieri, Luis Espinosa-Anke and Jose Camacho-Collados.
XLM-T: A Multilingual Language Model Toolkit for Twitter. [paper] [data&code]
arXiv preprint arXiv:2104.12250 (2021).
Asahi Ushio, Luis Espinosa-Anke, Steven Schockaert and Jose Camacho-Collados.
BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies? [paper] [data&code]
Daniel Loureiro*, Kiamehr Rezaee*, Mohammad Taher Pilehvar and Jose Camacho-Collados.
Language Models and Word Sense Disambiguation: An Overview and Analysis. [paper] [data&code]
Computational Linguistics (2021).
Francesco Barbieri, Jose Camacho-Collados, Leonardo Neves and Luis Espinosa-Anke.
TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification. [paper] [data]
Findings of EMNLP 2020.
Zied Bouraoui, Jose Camacho-Collados and Steven Schockaert.
Inducing Relational Knowledge from BERT. [paper]
AAAI 2020, New York, USA.
August 2021. Taher Pilehvar and I taught a week-long course on "Embeddings in NLP" at the ESSLLI summer school.
February 2021. Just got started with my UKRI Future Leaders Fellowship!
December 2020. Our "Embeddings in NLP" book has been published! All information, including a short video tutorial, here.
December 2020. Join us at Cardiff University! Call for five open-ended Lecturer positions at the School of Computer Science.
November 2020. Hiring! Looking for a 3-year postdoc starting early/mid 2021. All the details on how to apply here.
October 2020. Excited and honoured to be awarded a UKRI Future Leaders Fellowship!
September 2020. Recently received a research grant from Snap Research (w/ Luis Espinosa-Anke and Daniel Loureiro) and started a collaboration to investigate meaning shift in social media.