Sitemap

A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.

Page Not Found

Page not found. Your pixels are in another canvas.

Jupyter notebook markdown generator

Posts

Wikimedia Hackathon 2019 (Prague, Czechia)

3 minute read

Published: May 19, 2019

The Wikimedia Hackathon is an event that brings together the technical community working on the MediaWiki code and on technical projects around Wikipedia, Wikidata, Wikisource, and all the other Wikimedia projects. The Wikimedia Hackathon 2019 took place in Prague. It was my fourth time at the hackathon, and it is still one of my favorite Wikimedia events and certainly my favorite hackathon ever. As someone who was there for the first time said: “It is not a typical hackathon, it is more about the people.”

ESWC 2018 (Crete, Greece)

1 minute read

Published: June 07, 2018

For the second time, I attended the ESWC conference, this time in Crete. It was great to see familiar faces again and to meet researchers in the field of the Semantic Web.

NAACL 2018 (New Orleans, USA)

3 minute read

Published: June 04, 2018

NAACL was the first conference in the area of computational linguistics that I was able to attend. It was very interesting, as many of the topics were very relatable to me. It is an interesting mix of students and professors working on linguistics and machine learning problems and combining the two. It was also the biggest academic conference I have attended thus far, and it was interesting to see how such a big conference is organized: for example, most papers, even full research papers, were presented as posters. This made interaction with authors much easier and allowed for individual discussion of the topics. Many topics fascinated me, as many authors tried to solve the challenges posed by under-resourced languages. Especially in machine learning, these languages bring many challenges, which I was happy to be able to discuss. Further, I had interesting discussions on code-switching, where users speak one language and use words from a different language within it (e.g. Hindi speakers using English words), a topic I have been following with interest.

Wikimedia Hackathon 2018 (Barcelona, Spain)

1 minute read

Published: May 20, 2018

The Wikimedia Hackathon is an annual event for the technical Wikimedia community to come together and hack on common projects.

The Web Conference 2018 (Lyon, France)

3 minute read

Published: April 27, 2018

The Web Conference is one of the most established conferences in the field of web research. I had the opportunity to attend this year’s edition. Below is a summary of some of my personal highlights.

Wikimedia Developer Summit 2018 (San Francisco)

3 minute read

Published: February 08, 2018

The Wikimedia Developer Summit is an annual meeting of software developers working in the context of Wikimedia.

WikidataCon 2017 (Berlin)

1 minute read

Published: December 12, 2017

For the first time ever, Wikidata’s community had its own conference. Everyone interested in, contributing to, working on, or researching Wikidata gathered in Berlin to watch presentations, talk, and discuss the knowledge base.

projects

Ladies That FOSS Meetup

Published: May 09, 2025

Organization of an event to encourage more women to contribute to Open Source projects

Phones don’t grow on trees

Published: May 09, 2025

Data visualization aiming to illustrate the problems of hardware production

Scribe - Helping editors of under-resourced languages to create new high-quality Wikipedia articles

Published: May 09, 2025

Scribe is an editing tool designed for underserved language Wikipedias, addressing the lack of information and participation in non-English languages. The tool facilitates article creation by providing section planning, reference collection, and key points for editors in under-resourced communities. By leveraging techniques from Information Retrieval and Natural Language Processing, Scribe aims to improve the editing experience, allowing editors to contribute based on community interests and notability criteria, independent of existing content in other Wikipedias.

publications

A Glimpse into Babel: An Analysis of Multilinguality in Wikidata

Published in International Symposium on Open Collaboration, 2017

Analysis of the language coverage of labels in Wikidata

Recommended citation: Kaffee, Lucie-Aimée, et al. "A Glimpse into Babel: An Analysis of Multilinguality in Wikidata." Proceedings of the 13th International Symposium on Open Collaboration. ACM, 2017. https://eprints.soton.ac.uk/413433/1/Open_Sym_Short_Paper_Wikidata_Multilingual.pdf

What do Wikidata and Wikipedia Have in Common?: An Analysis of their Use of External References

Published in International Symposium on Open Collaboration, 2017

Analysis of external references in Wikidata and Wikipedia

Recommended citation: Piscopo, A., Vougiouklis, P., Kaffee, L. A., Phethean, C., Hare, J., & Simperl, E. (2017, August). What do Wikidata and Wikipedia Have in Common?: An Analysis of their Use of External References. In Proceedings of the 13th International Symposium on Open Collaboration (p. 1). ACM. https://eprints.soton.ac.uk/412922/1/opensym_wd_vs_wp_2_.pdf

Provenance Information in a Collaborative Knowledge Graph: an Evaluation of Wikidata External References

Published in International Semantic Web Conference, 2017

Evaluating Wikidata external references

Recommended citation: Piscopo, A., Kaffee, L. A., Phethean, C., & Simperl, E. (2017, October). Provenance Information in a Collaborative Knowledge Graph: an Evaluation of Wikidata External References. In International Semantic Web Conference (pp. 542-558). Springer, Cham. https://iswc2017.semanticweb.org/wp-content/uploads/papers/MainProceedings/71.pdf

Property Label Stability in Wikidata: Evolution and Convergence of Schemas in Collaborative Knowledge Bases

Published in Wiki Workshop at the Web Conference, 2018

Analysis of stability of schema labels in Wikidata

Recommended citation: Pellissier Tanon, T., & Kaffee, L. A. (2018, April). Property Label Stability in Wikidata: Evolution and Convergence of Schemas in Collaborative Knowledge Bases. In Companion of the The Web Conference 2018 on The Web Conference 2018 (pp. 1801-1803). International World Wide Web Conferences Steering Committee. http://wikiworkshop.org/2018/papers/wikiworkshop2018_paper_10.pdf

Mind the (Language) Gap: Generation of Multilingual Wikipedia Summaries from Wikidata for ArticlePlaceholders

Published in Extended Semantic Web Conference, 2018

Generation of Multilingual Summaries from Wikidata for Wikipedia’s ArticlePlaceholder

Recommended citation: Kaffee, L.A., Elsahar, H., Vougiouklis, P., Gravier, C., Laforest, F., Hare, J. and Simperl, E., Mind the (Language) Gap: Generation of Multilingual Wikipedia Summaries from Wikidata for ArticlePlaceholders. In Proceedings of the Extended Semantic Web Conference 2018. https://2018.eswc-conferences.org/wp-content/uploads/2018/02/ESWC2018_paper_131.pdf

Learning to Generate Wikipedia Summaries for Underserved Languages from Wikidata

Published in North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018

Neural Generation of Summaries from Wikidata for Wikipedia in Underserved Languages

Recommended citation: Kaffee, L.A., Elsahar, H., Vougiouklis, P., Gravier, C., Laforest, F., Hare, J. and Simperl, E., 2018. Learning to Generate Wikipedia Summaries for Underserved Languages from Wikidata. In Proceedings of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2018. https://arxiv.org/abs/1803.07116

When Humans and Machines Collaborate: Cross-lingual Label Editing in Wikidata

Published in 15th International Symposium on Open Collaboration, 2019

Analysing how humans and bots work on multilingual data in Wikidata

Recommended citation: Kaffee, L.-A., Endris, K.M. and Simperl, E., 2019. When Humans and Machines Collaborate: Cross-lingual Label Editing in Wikidata. In Proceedings of the 15th International Symposium on Open Collaboration https://opensym.org/wp-content/uploads/2019/08/os19-paper-A16-kaffee.pdf

Ranking Knowledge Graphs By Capturing Knowledge about Languages and Labels

Published in Tenth International Conference on Knowledge Capture, 2019

Ranking knowledge graphs based on class-based label captures, which capture multilinguality at the class level.

Recommended citation: Kaffee, L.-A., Endris, K.M., Simperl, E. and Vidal, M.-E., 2019. Ranking Knowledge Graphs By Capturing Knowledge about Languages and Labels. In Proceedings of the Tenth International Conference on Knowledge Capture.

References in Wikipedia: The Editors’ Perspective

Published in Wiki Workshop at the Web Conference, 2021

Analysis of how Wikipedia editors create references in articles

Recommended citation: Kaffee, L. A., & Elsahar, H. (2021, April). References in Wikipedia: The Editors’ Perspective. In Companion Proceedings of the Web Conference 2021 (pp. 535-538). https://dl.acm.org/doi/abs/10.1145/3442442.3452337

Assessing the quality of sources in Wikidata across languages: a hybrid approach

Published in Journal of Data and Information Quality, 2021

Evaluating Wikidata external references

Recommended citation: Amaral, G., Piscopo, A., Kaffee, L. A., Rodrigues, O., & Simperl, E. (2021). Assessing the quality of sources in Wikidata across languages: a hybrid approach. Journal of Data and Information Quality (JDIQ), 13(4), 1-35. https://dl.acm.org/doi/abs/10.1145/3484828

Using natural language generation to bootstrap missing Wikipedia articles: A human-centric perspective

Published in Semantic Web Journal, 2022

Human-centric perspective on natural language generation for Wikipedia

Recommended citation: Kaffee, L. A., Vougiouklis, P., & Simperl, E. (2021). Using natural language generation to bootstrap missing Wikipedia articles: A human-centric perspective. Semantic Web, (Preprint), 1-30. https://content.iospress.com/articles/semantic-web/sw210431

TempEL: Linking Dynamically Evolving and Newly Emerging Entities

Published in NeurIPS, 2022

The paper presents TempEL, a new dataset for entity linking that captures the impact of evolving entities over time. Using time-stratified English Wikipedia snapshots from 2013 to 2022, TempEL reveals a decrease in entity linking accuracy for both continual entities (up to 3.1%) and newly emerging entities (up to 17.9%), highlighting the challenge of time-evolving entity disambiguation and suggesting new directions for research in this field.

Recommended citation: Zaporojets, K., Kaffee, L. A., Deleu, J., Demeester, T., Develder, C., & Augenstein, I. (2022). TempEL: Linking Dynamically Evolving and Newly Emerging Entities. In Advances in Neural Information Processing Systems (NeurIPS) 35 (2022): 1850-1866. https://arxiv.org/abs/2302.02500

Probing Pre-Trained Language Models for Cross-Cultural Differences in Values

Published in Workshop on Cross-Cultural Considerations in NLP (C3NLP) at EACL, 2023

This paper explores the social, cultural, and political values encoded in Pre-Trained Language Models (PTLMs) and investigates how these values vary across cultures. Introducing probes for systematic study, the research reveals that PTLMs capture cultural differences in values, although their alignment with established cross-cultural value surveys is weak.

Recommended citation: Arora, A., Kaffee, L. A., & Augenstein, I. (2023). Probing Pre-Trained Language Models for Cross-Cultural Differences in Values. In Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP) at EACL 2023. https://arxiv.org/abs/2203.13722

Thorny Roses: Investigating the Dual Use Dilemma in Natural Language Processing

Published in Findings of the Association for Computational Linguistics: EMNLP, 2023

This paper explores the growing concern of dual use in Natural Language Processing (NLP) as technologies become more advanced and opaque. Surveying NLP researchers, it reveals widespread concerns about potential misuse with limited proactive measures. The paper proposes a tailored definition of dual use for NLP, discusses the current state of the issue, and suggests mitigation strategies, including a checklist for ethics frameworks like the ACL checklist.

Recommended citation: Kaffee, L. A., Arora, A., Talat, Z., & Augenstein, I. (2023). Thorny Roses: Investigating the Dual Use Dilemma in Natural Language Processing. Findings of the Association for Computational Linguistics: EMNLP 2023. https://arxiv.org/abs/2304.08315

Why Should This Article Be Deleted? Transparent Stance Detection in Multilingual Wikipedia Editor Discussions

Published in EMNLP, 2023

This paper tackles the lack of transparency in online content moderation, using Wikipedia as a case study, since moderation decisions there are discussed publicly. The study introduces a multilingual dataset of Wikipedia editor discussions and demonstrates that an editor's stance and the policy underlying their reasoning can be predicted jointly and accurately, contributing to increased transparency in content moderation.

Recommended citation: Kaffee, L. A., Arora, A., & Augenstein, I. (2023). Why Should This Article Be Deleted? Transparent Stance Detection in Multilingual Wikipedia Editor Discussions. In EMNLP 2023. https://arxiv.org/pdf/2310.05779.pdf

talks

Wikidata’s SPARQL introduction

Published: May 22, 2015

Introduction to the new SPARQL endpoint of Wikidata and how to write simple queries.

Wikidata Query Service - The quest of finding the pope with the most children (German)

Published: May 27, 2015

Introduction to querying Wikidata’s SPARQL endpoint, in German

Wikimedia APIs

Published: October 23, 2015

Comprehensive overview of how to access data from the Wikimedia projects via APIs.

What to do with all this open data?

Published: December 04, 2015

There is so much you can do with open data! Lucie Kaffee shows three totally different projects she worked on over the last months. Learn about:

“Tree Of Life” built with data from Wikidata
“Markets-Berlin Project” based on data from Berlin Open Data
“Phones Don’t Grow on Trees Project”

Lucie puts special emphasis on the different possibilities open data offers, the different sources data can come from, and the struggles and advantages of using data from different sources.

Wikidata Introduction

Published: January 16, 2016

Introduction to Wikidata for Wikimedia contributors

Increasing access to free and open knowledge for speakers of underserved languages on Wikipedia

Published: January 31, 2016

One of the biggest barriers to accessing knowledge on the Internet is language. We tend to provide information in one or at most a few languages, which makes it hard for speakers of all other languages to access that same information. This is also an issue on Wikipedia, a project used widely and internationally by all kinds of people. Many topics are covered in only a few languages on Wikipedia, so people who speak none of these languages lack access to information that may be vital to them. This is a huge issue we need to address.

Digital Language Inequality

Published: February 25, 2016

In an era where digital spaces are essential to almost every area of our lives, one thing that many of us take for granted is the simple but essential ability to type and read in our own native languages. Yet today, as ever since the internet boomed for all things work and play, millions upon millions of people speak languages that are not part of this equation.

ArticlePlaceholder - Increasing access to free and open knowledge for speakers of underserved languages on Wikipedia

Published: March 30, 2016

Overview of the ArticlePlaceholder project (invited speaker)

Wikidata - the free open knowledge base

Published: March 30, 2016

Introduction to querying Wikidata’s SPARQL endpoint, in German

SPARQL: How I Learned to Stop Worrying and Love the Triple

Published: June 22, 2016

Introduction to writing queries for Wikidata’s SPARQL endpoint

Ladies That FOSS

Published: November 03, 2016

On the last Sunday of October 2016, a group of curious and inspired people got together at Wikimedia Deutschland in Berlin for Ladies That FOSS, an open source hack event aimed primarily at women who want to join a free and open source software (FOSS) project but don’t know where to start. Source Code Berlin was there to listen and observe, a unique experience that we’re excited to share with you in podcast form. So sit back, press play, and listen to participants talking about what they’re passionate about in the world of software and programming, as well as their experiences and wishes regarding the gender gap in the tech industry.

Actions to increase the diversity of our community

Published: January 10, 2017

An ideation-style workshop on the following question in the context of the Wikimedia technical community:

Increasing access to free and open knowledge for speakers of underserved languages on Wikipedia

Published: October 28, 2017

ArticlePlaceholder is a MediaWiki extension that pulls information from Wikidata into small-language Wikipedias when a Wikidata item exists but there is no Wikipedia article yet. This enables small-language Wikipedias to serve much more information to many more users. I would like to give an introduction to the topic and discuss with the audience how to make editors and communities aware of the advantages this extension can offer, and what to improve so that it meets their needs.

Languages in Wikidata

Published: October 29, 2017

Wikidata has the ability to serve the needs of many language communities and could completely change the way computers interact with language online; that all depends on the community, though.

A Glimpse into Babel - An Analysis of Multilinguality in Wikidata

Published: November 20, 2017

Multilinguality is an important topic for knowledge bases, especially Wikidata, which was built to serve the multilingual requirements of an international community. Its labels are the way humans interact with the data. In this talk, we explore the current state of languages in Wikidata, especially with regard to its ontology, and its relationship to Wikipedia. Furthermore, we set the multilinguality of Wikidata in a real-world context by comparing it to the distribution of native speakers. We find an existing maldistribution of languages, which is less pronounced in the ontology, and promising results for future improvements. Finally, we give an outlook on how users interact with languages on Wikidata.

Multilinguality in Wikidata

Published: May 17, 2018

Presentation of my work during my visit to the Ontology Engineering Group (OEG) at Universidad Politécnica de Madrid (UPM).

Published: May 09, 2025

Künstliche Intelligenz - Die Macht, die Möglichkeiten & die moralischen Fragen (AI - Power, Possibilities & moral questions)

Published: November 24, 2023

General-audience talk about the impact of AI on society, with a focus on online communities such as Wikipedia.

teaching

Teaching experience 1

Undergraduate course, University 1, Department, 2014

This is a description of a teaching experience. You can use markdown like any other post.

Teaching experience 2

Workshop, University 1, Department, 2015

This is a description of a teaching experience. You can use markdown like any other post.

Lucie-Aimée Kaffee
