Natural Language Processing

Jose Camacho Collados

about me

I am a Google Doctoral Fellow and third-year PhD student at the Linguistic Computing Laboratory (LCL) of Sapienza University of Rome. Previously I worked as a research engineer at ATILF-CNRS in France. My educational background includes an Erasmus Mundus Master in Natural Language Processing and Human Language Technology and a 5-year BSc degree in Mathematics.

I work on various topics in Natural Language Processing (NLP), mainly in lexical and distributional semantics. Currently I'm working on integrating explicit knowledge (mainly from lexical resources) into downstream NLP applications, with a special focus on multilinguality and ambiguity. To this end, I have been collaborating on the BabelNet project and developing knowledge-based sense vector representations (e.g. NASARI and SW2V) to be used as a bridge between lexical resources and text-based applications. We organized a tutorial at ACL 2016 and a workshop at EACL 2017 on this topic.

I strongly believe that well-curated datasets and resources, as well as shared tasks, are key for advancing science. This year we are organizing two SemEval 2018 tasks, on Hypernym Discovery and Emoji Prediction. Check them out!

NLP aside, I love travelling and sports. I was raised in Granada, a wonderful city in the south of Spain where I spent the first 20 years of my life. Since then, I have lived in large European cities such as Paris, Barcelona and Rome, and have spent long periods in Seoul. I have also lived in smaller (but equally charming) cities: Nancy and Besançon (France) and Wolverhampton (UK). I like practising all kinds of sports: football, swimming, tennis, padel, ping pong... and chess (yes, it is also a sport!). I hold the International Master chess title and am currently the top-rated chess player of South Korea.


Jose Camacho-Collados.
Semantic Vector Representations of Word Senses, Concepts and Entities and their Applications in Natural Language Processing. [thesis]
PhD Thesis (2018).
Jose Camacho-Collados and Mohammad Taher Pilehvar.
On the Role of Text Preprocessing in Neural Network Architectures: An Evaluation Study on Text Categorization and Sentiment Analysis. [paper]
arXiv preprint arXiv:1707.01780 (2017).
Massimiliano Mancini*, Jose Camacho-Collados*, Ignacio Iacobacci and Roberto Navigli.
Embedding Words and Senses Together via Joint Knowledge-Enhanced Training. [paper] [data&code]
CoNLL 2017, Vancouver, Canada.
Mohammad Taher Pilehvar, Jose Camacho-Collados, Roberto Navigli and Nigel Collier.
Towards a Seamless Integration of Word Senses into Downstream NLP Applications. [paper] [bib] [data&code]
ACL 2017, Vancouver, Canada.
Claudio Delli Bovi, Jose Camacho-Collados, Alessandro Raganato and Roberto Navigli.
EuroSense: Automatic Harvesting of Multilingual Sense Annotations from Parallel Text. [paper] [bib] [data]
ACL 2017 (short), Vancouver, Canada.


December 2017. Area chair for COLING 2018.

December 2017. Won the research prize of the Spanish police foundation for developing VeriPol, a system for automatically detecting the falsehood of police reports directly from their text content.

September 2017. To attend Google's Natural Language Processing Summit in Zurich (25-27 September).

August 2017. Co-organizing two SemEval 2018 tasks: Hypernym Discovery and Emoji Prediction.

August 2017. To attend ACL, CoNLL and SemEval in Vancouver, Canada.

June 2017. To attend Google's Machine Learning Summit in Zurich (12-14 June).

April 2017. I will give a talk on "Semantic Representations of Word Senses, Concepts and Entities and their Applications" at the University of Cambridge, UK. [Slides]

April 2017. Two papers (one short and one long) accepted at ACL 2017. Joint work with Claudio Delli Bovi, Alessandro Raganato and Roberto Navigli, and with Taher Pilehvar and Nigel Collier.



data / resources / software

- NASARI vector representations for BabelNet synsets and Wikipedia pages (English, French, German, Italian, Spanish).

- SW2V (Senses and Words to Vectors): Word and Sense embeddings in the same vector space (code+pre-trained models).

- Unified Evaluation Framework for Word Sense Disambiguation.

- BabelNet, a very large multilingual encyclopedic dictionary and semantic network.

- BabelDomains: Lexical items (synsets, Wikipedia pages) annotated with domains of knowledge.

- EuroSense: Multilingual sense annotations for Europarl.

- Supervised distributional framework (including Python API) for hypernym discovery.

- Find the word that does not belong: an evaluation benchmark for the outlier detection task.

- Large multilingual corpus of sense-annotated textual definitions.

- Word similarity datasets in several languages (also cross-lingual!).


Looking forward to hearing from you!

invited talks

Semantic (Vector) Representations of Word Senses, Concepts and Entities and their Applications, 20 April 2017, University of Cambridge, UK. [slides]

Semantic Representations of Word Senses, Concepts and Entities and their Applications, 19 October 2016, Pompeu Fabra University, Barcelona, Spain. [slides]

Computational Semantic Representation, 5 June 2015, Department of Applied Mathematics, University of California, Los Angeles (UCLA), USA.

Using NLP tools to identify people and places in a collection of 19th century emigrant letters, 'Digitising Experience of Migration' Workshop, 15 March 2014, Mellon Centre for Migration Studies, Omagh, Northern Ireland.