Page Not Found
Page not found. Your pixels are in another canvas.
A list of all the posts and pages found on the site. For you robots out there, an XML version is available for digesting as well.
About me
This is a page not in the main menu
Published:
The Wikimedia Hackathon is an event that brings together the technical community working on the MediaWiki code and on technical projects around Wikipedia, Wikidata, Wikisource, and all the other Wikimedia projects. The Wikimedia Hackathon 2019 took place in Prague. It was my fourth time at the hackathon, and it is still one of my favorite Wikimedia events and certainly my favorite hackathon ever. As someone who was there for the first time said: “It is not a typical hackathon, it is more about the people.”
Published:
For the second time, I attended the ESWC conference, this time in Crete. It was great to see familiar faces again and meet researchers in the field of the semantic web.
Published:
NAACL was the first conference in the area of computational linguistics that I was able to attend. It was very interesting, as many of the topics are very relatable. It brings together students and professors who work on linguistics and machine learning problems and combine the two. It was also the biggest academic conference that I have attended thus far. It was interesting to see how such a big conference is organized: for example, most papers were presented as posters, even full research papers. This made interaction with authors much easier and allowed for individual discussion of the topics. Many topics were fascinating to me, as many authors tried to solve challenges that arise when working on under-resourced languages. Especially in the area of machine learning, this brings a lot of challenges that I was happy to be able to discuss. Further, I had interesting discussions on the topic of code-switching, where users speak one language and use words from a different language in it (e.g. Hindi speakers using English words), a topic I have been following with interest.
Published:
The Wikimedia Hackathon is an annual event for the technical Wikimedia community to come together and hack on common projects.
Published:
The Web Conference is one of the most established conferences in the field of research around the web. I had the opportunity to attend this year’s edition. Below is a summary of my personal highlights.
Published:
The Wikimedia Developer Summit is an annual meeting of software developers in the context of Wikimedia.
Published:
For the first time ever, Wikidata’s community had its own conference. Everyone interested in, contributing to, working on, or researching Wikidata gathered in Berlin to watch presentations, talk, and discuss the knowledge base.
Published:
Organization of an event to encourage more women to contribute to Open Source projects
Published:
Data visualization illustrating the problems of hardware production
Published:
Scribe is an editing tool designed for underserved language Wikipedias, addressing the lack of information and participation in non-English languages. The tool facilitates article creation by providing section planning, reference collection, and key points for editors in under-resourced communities. By leveraging techniques from Information Retrieval and Natural Language Processing, Scribe aims to improve the editing experience, allowing editors to contribute based on community interests and notability criteria, independent of existing content in other Wikipedias.
Published in International Symposium on Open Collaboration, 2017
Analysis of labels in Wikidata towards coverage of languages
Recommended citation: Kaffee, Lucie-Aimée, et al. "A Glimpse into Babel: An Analysis of Multilinguality in Wikidata." Proceedings of the 13th International Symposium on Open Collaboration. ACM, 2017. https://eprints.soton.ac.uk/413433/1/Open_Sym_Short_Paper_Wikidata_Multilingual.pdf
Published in International Symposium on Open Collaboration, 2017
Analysis of external references in Wikidata and Wikipedia
Recommended citation: Piscopo, A., Vougiouklis, P., Kaffee, L. A., Phethean, C., Hare, J., & Simperl, E. (2017, August). What do Wikidata and Wikipedia Have in Common?: An Analysis of their Use of External References. In Proceedings of the 13th International Symposium on Open Collaboration (p. 1). ACM. https://eprints.soton.ac.uk/412922/1/opensym_wd_vs_wp_2_.pdf
Published in International Semantic Web Conference, 2017
Evaluating Wikidata external references
Recommended citation: Piscopo, A., Kaffee, L. A., Phethean, C., & Simperl, E. (2017, October). Provenance Information in a Collaborative Knowledge Graph: an Evaluation of Wikidata External References. In International Semantic Web Conference (pp. 542-558). Springer, Cham. https://iswc2017.semanticweb.org/wp-content/uploads/papers/MainProceedings/71.pdf
Published in Wiki Workshop at the Web Conference, 2018
Analysis of stability of schema labels in Wikidata
Recommended citation: Pellissier Tanon, T., & Kaffee, L. A. (2018, April). Property Label Stability in Wikidata: Evolution and Convergence of Schemas in Collaborative Knowledge Bases. In Companion of the The Web Conference 2018 on The Web Conference 2018 (pp. 1801-1803). International World Wide Web Conferences Steering Committee. http://wikiworkshop.org/2018/papers/wikiworkshop2018_paper_10.pdf
Published in Extended Semantic Web Conference 2018, 2018
Generation of Multilingual Summaries from Wikidata for Wikipedia’s ArticlePlaceholder
Recommended citation: Kaffee, L.A., Elsahar, H., Vougiouklis, P., Gravier, C., Laforest, F., Hare, J. and Simperl, E., Mind the (Language) Gap: Generation of Multilingual Wikipedia Summaries from Wikidata for ArticlePlaceholders. In Proceedings of the Extended Semantic Web Conference 2018. https://2018.eswc-conferences.org/wp-content/uploads/2018/02/ESWC2018_paper_131.pdf
Published in North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2018, 2018
Neural Generation of Summaries from Wikidata for Wikipedia in Underserved Languages
Recommended citation: Kaffee, L.A., Elsahar, H., Vougiouklis, P., Gravier, C., Laforest, F., Hare, J. and Simperl, E., 2018. Learning to Generate Wikipedia Summaries for Underserved Languages from Wikidata. In Proceedings of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2018. https://arxiv.org/abs/1803.07116
Published in 15th International Symposium on Open Collaboration, 2019
Analysing how humans and bots work on multilingual data in Wikidata
Recommended citation: Kaffee, L.-A., Endris, K.M. and Simperl, E., 2019. When Humans and Machines Collaborate: Cross-lingual Label Editing in Wikidata. In Proceedings of the 15th International Symposium on Open Collaboration https://opensym.org/wp-content/uploads/2019/08/os19-paper-A16-kaffee.pdf
Published in Tenth International Conference on Knowledge Capture, 2019
Ranking knowledge graphs using class-based label captures, which capture multilinguality at the class level.
Recommended citation: Kaffee, L.-A., Endris, K.M., Simperl, E. and Vidal, M.-E., 2019. Ranking Knowledge Graphs By Capturing Knowledge about Languages and Labels
Published in Wiki Workshop at the Web Conference, 2021
Analysis of how Wikipedia editors create references in articles
Recommended citation: Kaffee, L. A., & Elsahar, H. (2021, April). References in Wikipedia: The Editors’ Perspective. In Companion Proceedings of the Web Conference 2021 (pp. 535-538). https://dl.acm.org/doi/abs/10.1145/3442442.3452337
Published in Journal of Data and Information Quality, 2021
Evaluating Wikidata external references
Recommended citation: Amaral, G., Piscopo, A., Kaffee, L. A., Rodrigues, O., & Simperl, E. (2021). Assessing the quality of sources in Wikidata across languages: a hybrid approach. Journal of Data and Information Quality (JDIQ), 13(4), 1-35. https://dl.acm.org/doi/abs/10.1145/3484828
Published in Semantic Web Journal, 2022
Human-centric perspective on natural language generation for Wikipedia
Recommended citation: Kaffee, L. A., Vougiouklis, P., & Simperl, E. (2021). Using natural language generation to bootstrap missing Wikipedia articles: A human-centric perspective. Semantic Web, (Preprint), 1-30. https://content.iospress.com/articles/semantic-web/sw210431
Published in NeurIPS 2022, 2022
The paper presents TempEL, a new dataset for entity linking that captures the impact of evolving entities over time. Using time-stratified English Wikipedia snapshots from 2013 to 2022, TempEL reveals a decrease in entity linking accuracy for both continual entities (up to 3.1%) and newly emerging entities (up to 17.9%), highlighting the challenge of time-evolving entity disambiguation and suggesting new directions for research in this field.
Recommended citation: Zaporojets, K., Kaffee, L. A., Deleu, J., Demeester, T., Develder, C., & Augenstein, I. (2022). TempEL: Linking Dynamically Evolving and Newly Emerging Entities. In Advances in Neural Information Processing Systems (NeurIPS) 35 (2022): 1850-1866. https://arxiv.org/abs/2302.02500
Published in Wiki Workshop at the Web Conference, 2023
This paper delves into the exploration of social, cultural, and political values encoded in Pre-Trained Language Models (PTLMs) and investigates how these values vary across cultures. Introducing probes for systematic study, the research reveals that PTLMs capture cultural differences in values, although alignment with established cross-cultural value surveys is weak.
Recommended citation: Arora, A., Kaffee, L. A., & Augenstein, I. (2023). Probing Pre-Trained Language Models for Cross-Cultural Differences in Values. In Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP) at EACL 2023. https://arxiv.org/abs/2203.13722
Published in EMNLP 2023 (Findings of the Association for Computational Linguistics), 2023
This paper explores the growing concern of dual use in Natural Language Processing (NLP) as technologies become more advanced and opaque. Surveying NLP researchers, it reveals widespread concerns about potential misuse with limited proactive measures. The paper proposes a tailored definition of dual use for NLP, discusses the current state of the issue, and suggests mitigation strategies, including a checklist for ethics frameworks like the ACL checklist.
Recommended citation: Kaffee, L. A., Arora, A., Talat, Z., & Augenstein, I. (2023). Thorny Roses: Investigating the Dual Use Dilemma in Natural Language Processing. Findings of the Association for Computational Linguistics: EMNLP 2023. https://arxiv.org/abs/2304.08315
Published in EMNLP 2023, 2023
This paper tackles the transparency issue in online content moderation, using Wikipedia as a case study where moderation decisions are publicly discussed. The study introduces a multilingual dataset from Wikipedia editor discussions and demonstrates that editor stance and the accompanying policy reasoning can be jointly predicted accurately, contributing to increased transparency in content moderation.
Recommended citation: Kaffee, L. A., Arora, A., & Augenstein, I. (2023). Why Should This Article Be Deleted? Transparent Stance Detection in Multilingual Wikipedia Editor Discussions. In EMNLP 2023. https://arxiv.org/pdf/2310.05779.pdf
Published:
Introduction to the new SPARQL endpoint of Wikidata and how to write simple queries.
Published:
Introduction to querying Wikidata’s SPARQL endpoint, in German
Published:
Comprehensive overview of how to access data of the Wikimedia projects via APIs.
Published:
There is so much you can do with open data! Lucie Kaffee shows three totally different projects she worked on over the last months. Learn about:
Published:
Introduction to Wikidata for Wikimedia contributors
Published:
One of the biggest barriers to accessing knowledge on the Internet is language. We tend to provide information in one or at most a few languages, which makes it hard for speakers of all the other languages to access that same information. This is also an issue on Wikipedia, a project used widely and internationally by all kinds of people. But there are many topics that are covered in only a few languages on Wikipedia. People who don’t speak any of these languages cannot access all the available information, some of which is potentially vital to them. This is a huge issue we need to address.
Published:
In an era where digital spaces are essential to almost every area of our lives, one thing that many of us take for granted is the simple but essential ability to type and read in our own native languages. Yet at this moment and since the very beginning of the boom of the internet for all things work and play, there are millions upon millions of people whose languages are not part of this equation.
Published:
Overview of the ArticlePlaceholder project (invited speaker)
Published:
Introduction to querying Wikidata’s SPARQL endpoint, in German
Published:
Introduction to writing queries for Wikidata’s SPARQL endpoint
Published:
On the last Sunday of October 2016 a group of curious and inspired people got together at Wikimedia Deutschland in Berlin for Ladies That FOSS; an open source hack event aimed primarily at women who want to join a free and open source software (FOSS) project but don’t know where to start. Source Code Berlin was there to listen and observe, a unique experience that we’re excited to share with you in podcast form. So sit back, press play, and listen to participants talking about what they’re passionate about in the world of software and programming as well as their experience and wishes when it comes to the gender gap in the tech industry.
Published:
An ideation style workshop to work on the following question in the context of the Wikimedia technical community:
Published:
ArticlePlaceholder is a MediaWiki extension that pulls information from Wikidata into small language Wikipedias when there is a Wikidata item but no Wikipedia article yet. This enables small language Wikipedias to serve a lot more information to many more users. I would like to give an introduction to the topic and discuss with the audience what needs to be improved to make editors and communities aware of the advantages this extension can have, and what to change to meet their needs.
Published:
Wikidata has the ability to serve the needs of many language communities and could change the way computers interact with language online completely - that all depends on the community though.
Published:
Multilinguality is an important topic for knowledge bases, especially Wikidata, which was built to serve the multilingual requirements of an international community. Its labels are the way for humans to interact with the data. In this talk, we explore the current state of languages in Wikidata, especially with regard to its ontology, and the relationship to Wikipedia. Furthermore, we set the multilinguality of Wikidata in the context of the real world by comparing it to the distribution of native speakers. We find an existing language maldistribution, which is less urgent in the ontology, and promising results for future improvements. An outlook on how users interact with languages on Wikidata is given.
Published:
Presentation of my work as part of my visit to the Ontology Engineering Group (OEG) at Universidad Politécnica de Madrid (UPM).
Published:
Invited panelist for a panel on “Computers as content consumers: Are publishers ready for the new readers?”
Published:
Invited Speaker at the Credibility Coalition & NewsQ Workshop to talk about misinformation and referencing on Wikipedia and to lead efforts to understand how a community of users can work together to tackle misinformation.
Published:
Invited Speaker at Qurator Conference to talk about the Scribe project.
Published:
Giving an insight into my journey to a PhD, winning the Semantic Web Distinguished Dissertation Award.
Published:
General audience talk about the impact of AI in society, with a focus on online communities such as Wikipedia.