Publications

Why Should This Article Be Deleted? Transparent Stance Detection in Multilingual Wikipedia Editor Discussions

Published in EMNLP 2023, 2023

This paper tackles the transparency issue in online content moderation, using Wikipedia as a case study where moderation decisions are publicly discussed. The study introduces a multilingual dataset of Wikipedia editor discussions and demonstrates that an editor's stance and the policy given as its reason can be jointly predicted with high accuracy, contributing to increased transparency in content moderation.

Recommended citation: Kaffee, L. A., Arora, A., & Augenstein, I. (2023). Why Should This Article Be Deleted? Transparent Stance Detection in Multilingual Wikipedia Editor Discussions. In EMNLP 2023. https://arxiv.org/pdf/2310.05779.pdf

Thorny Roses: Investigating the Dual Use Dilemma in Natural Language Processing

Published in EMNLP 2023 (Findings of the Association for Computational Linguistics), 2023

This paper explores the growing concern of dual use in Natural Language Processing (NLP) as technologies become more advanced and opaque. Surveying NLP researchers, it reveals widespread concerns about potential misuse with limited proactive measures. The paper proposes a tailored definition of dual use for NLP, discusses the current state of the issue, and suggests mitigation strategies, including a checklist that can be integrated into existing ethics frameworks such as the ACL checklist.

Recommended citation: Kaffee, L. A., Arora, A., Talat, Z., & Augenstein, I. (2023). Thorny Roses: Investigating the Dual Use Dilemma in Natural Language Processing. Findings of the Association for Computational Linguistics: EMNLP 2023. https://arxiv.org/abs/2304.08315

Probing Pre-Trained Language Models for Cross-Cultural Differences in Values

Published in First Workshop on Cross-Cultural Considerations in NLP (C3NLP) at EACL 2023, 2023

This paper explores the social, cultural, and political values encoded in Pre-Trained Language Models (PTLMs) and investigates how these values vary across cultures. Introducing probes for systematic study, the research reveals that PTLMs capture cultural differences in values, although alignment with established cross-cultural value surveys is weak.

Recommended citation: Arora, A., Kaffee, L. A., & Augenstein, I. (2023). Probing Pre-Trained Language Models for Cross-Cultural Differences in Values. In Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP) at EACL 2023. https://arxiv.org/abs/2203.13722

TempEL: Linking Dynamically Evolving and Newly Emerging Entities

Published in NeurIPS 2022, 2022

The paper presents TempEL, a new dataset for entity linking that captures the impact of evolving entities over time. Using time-stratified English Wikipedia snapshots from 2013 to 2022, TempEL reveals a decrease in entity linking accuracy for both continual entities (up to 3.1%) and newly emerging entities (up to 17.9%), highlighting the challenge of time-evolving entity disambiguation and suggesting new directions for research in this field.

Recommended citation: Zaporojets, K., Kaffee, L. A., Deleu, J., Demeester, T., Develder, C., & Augenstein, I. (2022). TempEL: Linking Dynamically Evolving and Newly Emerging Entities. In Advances in Neural Information Processing Systems (NeurIPS) 35 (2022): 1850-1866. https://arxiv.org/abs/2302.02500

Ranking Knowledge Graphs By Capturing Knowledge about Languages and Labels

Published in Tenth International Conference on Knowledge Capture, 2019

Ranking knowledge graphs using class-based label measures that capture multilinguality at the class level.

Recommended citation: Kaffee, L.-A., Endris, K.M., Simperl, E. and Vidal, M.-E., 2019. Ranking Knowledge Graphs By Capturing Knowledge about Languages and Labels. In Proceedings of the Tenth International Conference on Knowledge Capture.

When Humans and Machines Collaborate: Cross-lingual Label Editing in Wikidata

Published in 15th International Symposium on Open Collaboration, 2019

Analysing how humans and bots work on multilingual data in Wikidata

Recommended citation: Kaffee, L.-A., Endris, K.M. and Simperl, E., 2019. When Humans and Machines Collaborate: Cross-lingual Label Editing in Wikidata. In Proceedings of the 15th International Symposium on Open Collaboration. https://opensym.org/wp-content/uploads/2019/08/os19-paper-A16-kaffee.pdf

Learning to Generate Wikipedia Summaries for Underserved Languages from Wikidata

Published in North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2018, 2018

Neural Generation of Summaries from Wikidata for Wikipedia in Underserved Languages

Recommended citation: Kaffee, L.A., Elsahar, H., Vougiouklis, P., Gravier, C., Laforest, F., Hare, J. and Simperl, E., 2018. Learning to Generate Wikipedia Summaries for Underserved Languages from Wikidata. In Proceedings of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2018. https://arxiv.org/abs/1803.07116

Mind the (Language) Gap: Generation of Multilingual Wikipedia Summaries from Wikidata for ArticlePlaceholders

Published in Extended Semantic Web Conference 2018, 2018

Generation of Multilingual Summaries from Wikidata for Wikipedia’s ArticlePlaceholder

Recommended citation: Kaffee, L.A., Elsahar, H., Vougiouklis, P., Gravier, C., Laforest, F., Hare, J. and Simperl, E., 2018. Mind the (Language) Gap: Generation of Multilingual Wikipedia Summaries from Wikidata for ArticlePlaceholders. In Proceedings of the Extended Semantic Web Conference 2018. https://2018.eswc-conferences.org/wp-content/uploads/2018/02/ESWC2018_paper_131.pdf

Property Label Stability in Wikidata: Evolution and Convergence of Schemas in Collaborative Knowledge Bases

Published in Wiki Workshop at the Web Conference, 2018

Analysis of stability of schema labels in Wikidata

Recommended citation: Pellissier Tanon, T., & Kaffee, L. A. (2018, April). Property Label Stability in Wikidata: Evolution and Convergence of Schemas in Collaborative Knowledge Bases. In Companion Proceedings of The Web Conference 2018 (pp. 1801-1803). International World Wide Web Conferences Steering Committee. http://wikiworkshop.org/2018/papers/wikiworkshop2018_paper_10.pdf

Provenance Information in a Collaborative Knowledge Graph: an Evaluation of Wikidata External References

Published in International Semantic Web Conference, 2017

Evaluating Wikidata external references

Recommended citation: Piscopo, A., Kaffee, L. A., Phethean, C., & Simperl, E. (2017, October). Provenance Information in a Collaborative Knowledge Graph: an Evaluation of Wikidata External References. In International Semantic Web Conference (pp. 542-558). Springer, Cham. https://iswc2017.semanticweb.org/wp-content/uploads/papers/MainProceedings/71.pdf

What do Wikidata and Wikipedia Have in Common?: An Analysis of their Use of External References

Published in International Symposium on Open Collaboration, 2017

Analysis of external references in Wikidata and Wikipedia

Recommended citation: Piscopo, A., Vougiouklis, P., Kaffee, L. A., Phethean, C., Hare, J., & Simperl, E. (2017, August). What do Wikidata and Wikipedia Have in Common?: An Analysis of their Use of External References. In Proceedings of the 13th International Symposium on Open Collaboration (p. 1). ACM. https://eprints.soton.ac.uk/412922/1/opensym_wd_vs_wp_2_.pdf