Sitemap

A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.

Pages

Posts

Wikimedia Hackathon 2019 (Prague, Czechia)

3 minute read

Published:

The Wikimedia Hackathon is an event that brings together the technical community that works on the MediaWiki code and on technical projects around Wikipedia, Wikidata, Wikisource, and all the other Wikimedia projects. The Wikimedia Hackathon 2019 took place in Prague. It was my fourth time at the hackathon, and it is still one of my favorite Wikimedia events and certainly my favorite hackathon ever. As someone who was there for the first time said: “It is not a typical hackathon, it is more about the people.”

ESWC 2018 (Crete, Greece)

1 minute read

Published:

For the second time, I attended the ESWC conference, this time in Crete. It was great to see familiar faces again and to meet researchers in the field of the Semantic Web.

NAACL 2018 (New Orleans, USA)

3 minute read

Published:

NAACL was the first conference in the area of computational linguistics that I was able to attend. It was very interesting, as many of the topics were relatable. The conference brings together students and professors who work on linguistics and machine learning problems and on combining the two. It was also the biggest academic conference I have attended thus far. It was interesting to see how such a big conference is organized: for example, most papers were presented as posters, even full research papers. This made interaction with authors much easier and allowed for individual discussion of the topics. Many topics were fascinating to me, as many authors tried to solve challenges that arise when working on under-resourced languages. Especially in the area of machine learning, this brings a lot of challenges that I was happy to be able to discuss. Further, I had interesting discussions on the topic of code-switching, where users speak one language and use words from a different language in it (e.g. Hindi speakers using English words), a topic I have been following with interest.

The Web Conference 2018 (Lyon, France)

3 minute read

Published:

The Web Conference is one of the most established conferences in the field of research around the web. I had the opportunity to attend this year’s edition. Below is a summary of some of my personal highlights.

WikidataCon 2017 (Berlin)

1 minute read

Published:

For the first time ever, Wikidata’s community had its own conference. Everyone interested in, contributing to, working on, or researching Wikidata gathered in Berlin to watch presentations, talk, and discuss the knowledge base.

projects

Ladies That FOSS Meetup

Published:

Organization of an event to encourage more women to contribute to Open Source projects

Scribe - Helping editors of under-resourced languages to create new high-quality Wikipedia articles

Published:

Scribe is an editing tool designed for underserved language Wikipedias, addressing the lack of information and participation in non-English languages. The tool facilitates article creation by providing section planning, reference collection, and key points for editors in under-resourced communities. By leveraging techniques from Information Retrieval and Natural Language Processing, Scribe aims to improve the editing experience, allowing editors to contribute based on community interests and notability criteria, independent of existing content in other Wikipedias.

publications

What do Wikidata and Wikipedia Have in Common?: An Analysis of their Use of External References

Published in International Symposium on Open Collaboration, 2017

Analysis of external references in Wikidata and Wikipedia

Recommended citation: Piscopo, A., Vougiouklis, P., Kaffee, L. A., Phethean, C., Hare, J., & Simperl, E. (2017, August). What do Wikidata and Wikipedia Have in Common?: An Analysis of their Use of External References. In Proceedings of the 13th International Symposium on Open Collaboration (p. 1). ACM. https://eprints.soton.ac.uk/412922/1/opensym_wd_vs_wp_2_.pdf

Provenance Information in a Collaborative Knowledge Graph: an Evaluation of Wikidata External References

Published in International Semantic Web Conference, 2017

Evaluating Wikidata external references

Recommended citation: Piscopo, A., Kaffee, L. A., Phethean, C., & Simperl, E. (2017, October). Provenance Information in a Collaborative Knowledge Graph: an Evaluation of Wikidata External References. In International Semantic Web Conference (pp. 542-558). Springer, Cham. https://iswc2017.semanticweb.org/wp-content/uploads/papers/MainProceedings/71.pdf

Property Label Stability in Wikidata: Evolution and Convergence of Schemas in Collaborative Knowledge Bases

Published in Wiki Workshop at the Web Conference, 2018

Analysis of stability of schema labels in Wikidata

Recommended citation: Pellissier Tanon, T., & Kaffee, L. A. (2018, April). Property Label Stability in Wikidata: Evolution and Convergence of Schemas in Collaborative Knowledge Bases. In Companion of the The Web Conference 2018 on The Web Conference 2018 (pp. 1801-1803). International World Wide Web Conferences Steering Committee. http://wikiworkshop.org/2018/papers/wikiworkshop2018_paper_10.pdf

Mind the (Language) Gap: Generation of Multilingual Wikipedia Summaries from Wikidata for ArticlePlaceholders

Published in Extended Semantic Web Conference 2018, 2018

Generation of Multilingual Summaries from Wikidata for Wikipedia’s ArticlePlaceholder

Recommended citation: Kaffee, L.A., Elsahar, H., Vougiouklis, P., Gravier, C., Laforest, F., Hare, J. and Simperl, E., Mind the (Language) Gap: Generation of Multilingual Wikipedia Summaries from Wikidata for ArticlePlaceholders. In Proceedings of the Extended Semantic Web Conference 2018. https://2018.eswc-conferences.org/wp-content/uploads/2018/02/ESWC2018_paper_131.pdf

Learning to Generate Wikipedia Summaries for Underserved Languages from Wikidata

Published in North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2018, 2018

Neural Generation of Summaries from Wikidata for Wikipedia in Underserved Languages

Recommended citation: Kaffee, L.A., Elsahar, H., Vougiouklis, P., Gravier, C., Laforest, F., Hare, J. and Simperl, E., 2018. Learning to Generate Wikipedia Summaries for Underserved Languages from Wikidata. In Proceedings of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2018. https://arxiv.org/abs/1803.07116

When Humans and Machines Collaborate: Cross-lingual Label Editing in Wikidata

Published in 15th International Symposium on Open Collaboration, 2019

Analysing how humans and bots work on multilingual data in Wikidata

Recommended citation: Kaffee, L.-A., Endris, K.M. and Simperl, E., 2019. When Humans and Machines Collaborate: Cross-lingual Label Editing in Wikidata. In Proceedings of the 15th International Symposium on Open Collaboration. https://opensym.org/wp-content/uploads/2019/08/os19-paper-A16-kaffee.pdf

Ranking Knowledge Graphs By Capturing Knowledge about Languages and Labels

Published in Tenth International Conference on Knowledge Capture, 2019

Ranking knowledge graphs based on class-based label captures, which capture multilinguality at the class level.

Recommended citation: Kaffee, L.-A., Endris, K.M., Simperl, E. and Vidal, M.-E., 2019. Ranking Knowledge Graphs By Capturing Knowledge about Languages and Labels

TempEL: Linking Dynamically Evolving and Newly Emerging Entities

Published in NeurIPS 2022, 2022

The paper presents TempEL, a new dataset for entity linking that captures the impact of evolving entities over time. Using time-stratified English Wikipedia snapshots from 2013 to 2022, TempEL reveals a decrease in entity linking accuracy for both continual entities (up to 3.1%) and newly emerging entities (up to 17.9%), highlighting the challenge of time-evolving entity disambiguation and suggesting new directions for research in this field.

Recommended citation: Zaporojets, K., Kaffee, L. A., Deleu, J., Demeester, T., Develder, C., & Augenstein, I. (2022). TempEL: Linking Dynamically Evolving and Newly Emerging Entities. In Advances in Neural Information Processing Systems (NeurIPS) 35 (2022): 1850-1866. https://arxiv.org/abs/2302.02500

Probing Pre-Trained Language Models for Cross-Cultural Differences in Values

Published in First Workshop on Cross-Cultural Considerations in NLP (C3NLP) at EACL, 2023

This paper delves into the exploration of social, cultural, and political values encoded in Pre-Trained Language Models (PTLMs) and investigates how these values vary across cultures. Introducing probes for systematic study, the research reveals that PTLMs capture cultural differences in values, although alignment with established cross-cultural value surveys is weak.

Recommended citation: Arora, A., Kaffee, L. A., & Augenstein, I. (2023). Probing Pre-Trained Language Models for Cross-Cultural Differences in Values. In Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP) at EACL 2023. https://arxiv.org/abs/2203.13722

Thorny Roses: Investigating the Dual Use Dilemma in Natural Language Processing

Published in EMNLP 2023 (Findings of the Association for Computational Linguistics), 2023

This paper explores the growing concern of dual use in Natural Language Processing (NLP) as technologies become more advanced and opaque. Surveying NLP researchers, it reveals widespread concerns about potential misuse with limited proactive measures. The paper proposes a tailored definition of dual use for NLP, discusses the current state of the issue, and suggests mitigation strategies, including a checklist for ethics frameworks like the ACL checklist.

Recommended citation: Kaffee, L. A., Arora, A., Talat, Z., & Augenstein, I. (2023). Thorny Roses: Investigating the Dual Use Dilemma in Natural Language Processing. Findings of the Association for Computational Linguistics: EMNLP 2023. https://arxiv.org/abs/2304.08315

Why Should This Article Be Deleted? Transparent Stance Detection in Multilingual Wikipedia Editor Discussions

Published in EMNLP 2023, 2023

This paper tackles the transparency issue in online content moderation, using Wikipedia as a case study where moderation decisions are publicly discussed. The study introduces a multilingual dataset of Wikipedia editor discussions and demonstrates that an editor's stance and the policy given as its reason can be predicted jointly with high accuracy, contributing to increased transparency in content moderation.

Recommended citation: Kaffee, L. A., Arora, A., & Augenstein, I. (2023). Why Should This Article Be Deleted? Transparent Stance Detection in Multilingual Wikipedia Editor Discussions. In EMNLP 2023. https://arxiv.org/pdf/2310.05779.pdf

talks

Wikimedia APIs

Published:

Comprehensive overview of how to access data from the Wikimedia projects via APIs.
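
As a rough illustration of what such API access can look like, here is a minimal Python sketch against the MediaWiki Action API. It is not taken from the talk. The helper names `build_extract_url` and `parse_extract` are made up for this example, the choice of English Wikipedia as the endpoint is an assumption, and the sample response is hand-written in the shape the API returns with `formatversion=2`, so the snippet runs without network access.

```python
import json
from urllib.parse import urlencode

# Action API endpoint of English Wikipedia (an assumption for this sketch);
# other Wikimedia wikis expose the same API under their own domain.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_extract_url(title: str, endpoint: str = API_ENDPOINT) -> str:
    """Build an Action API URL that requests the plain-text intro of a page."""
    params = {
        "action": "query",
        "prop": "extracts",   # provided by the TextExtracts extension
        "exintro": 1,         # only the section before the first heading
        "explaintext": 1,     # plain text instead of HTML
        "titles": title,
        "format": "json",
        "formatversion": 2,   # "pages" is returned as a list
    }
    return f"{endpoint}?{urlencode(params)}"


def parse_extract(response_text: str) -> str:
    """Return the extract of the first page in a formatversion=2 response."""
    pages = json.loads(response_text)["query"]["pages"]
    return pages[0].get("extract", "")


# Hand-written sample in the shape of a response, not real API output.
SAMPLE_RESPONSE = json.dumps(
    {"query": {"pages": [{"pageid": 1, "title": "Example",
                          "extract": "An example page."}]}}
)

if __name__ == "__main__":
    print(build_extract_url("Wikidata"))
    print(parse_extract(SAMPLE_RESPONSE))
```

To query the live API, the URL from `build_extract_url` can be fetched with any HTTP client, ideally with a descriptive User-Agent header, and the response body passed to `parse_extract`.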

What to do with all this open data?

Published:

There is so much you can do with open data! Lucie Kaffee shows three totally different projects she worked on over the last months. Learn about:

  • “Tree Of Life” built with the data of Wikidata
  • “Markets-Berlin Project” based on data from Berlin Open Data
  • “Phones Don’t Grow on Trees Project”

Lucie puts special emphasis on the different possibilities we have with open data, the different sources data can come from, and the struggles and advantages that come with using data from different sources.

Increasing access to free and open knowledge for speakers of underserved languages on Wikipedia

Published:

One of the biggest barriers to accessing knowledge on the Internet is language. We tend to provide information in one or at most a few languages, which makes it hard for speakers of all the other languages to access that same information. This is also an issue on Wikipedia, a project widely and internationally used by all kinds of people. Many topics are covered in only a few languages on Wikipedia. People who don’t speak any of these languages don’t have access to all the available information, which is potentially vital to them. This is a huge issue we need to address.

Digital Language Inequality

Published:

In an era where digital spaces are essential to almost every area of our lives, one thing that many of us take for granted is the simple but essential ability to type and read in our own native languages. Yet at this moment and since the very beginning of the boom of the internet for all things work and play, there are millions upon millions of people whose languages are not part of this equation.

Ladies That FOSS

Published:

On the last Sunday of October 2016, a group of curious and inspired people got together at Wikimedia Deutschland in Berlin for Ladies That FOSS, an open source hack event aimed primarily at women who want to join a free and open source software (FOSS) project but don’t know where to start. Source Code Berlin was there to listen and observe, a unique experience that we’re excited to share with you in podcast form. So sit back, press play, and listen to participants talking about what they’re passionate about in the world of software and programming as well as their experiences and wishes when it comes to the gender gap in the tech industry.

Increasing access to free and open knowledge for speakers of underserved languages on Wikipedia

Published:

ArticlePlaceholder is a MediaWiki extension that pulls information from Wikidata into small-language Wikipedias when there is a Wikidata item but no Wikipedia article yet. That enables small-language Wikipedias to serve a lot more information to many more users. I would like to give an introduction to the topic and discuss with the audience what needs to be improved to make editors and communities aware of the advantages this extension can have, and what to improve to meet their needs.
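
The condition described here, a Wikidata item that exists but has no article on a given Wikipedia, can be checked from outside the wiki through an item's sitelinks. The following Python sketch is only an illustration of that check and not how the extension itself is implemented. The helper names `build_sitelinks_url` and `needs_placeholder` are invented for this example, and the sample response is hand-written in the shape returned by the `wbgetentities` API module.

```python
import json
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_sitelinks_url(item_id: str) -> str:
    """Build a wbgetentities URL that requests only the sitelinks of an item."""
    params = {
        "action": "wbgetentities",
        "ids": item_id,
        "props": "sitelinks",
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


def needs_placeholder(response_text: str, item_id: str, wiki_id: str) -> bool:
    """True if the item has no sitelink to the given wiki (e.g. 'eowiki')."""
    entity = json.loads(response_text)["entities"][item_id]
    return wiki_id not in entity.get("sitelinks", {})


# Hand-written sample in the shape of a wbgetentities response, not real data.
SAMPLE_RESPONSE = json.dumps({
    "entities": {
        "Q42": {
            "type": "item",
            "id": "Q42",
            "sitelinks": {
                "enwiki": {"site": "enwiki", "title": "Douglas Adams", "badges": []},
                "dewiki": {"site": "dewiki", "title": "Douglas Adams", "badges": []},
            },
        }
    },
    "success": 1,
})

if __name__ == "__main__":
    print(build_sitelinks_url("Q42"))
    print(needs_placeholder(SAMPLE_RESPONSE, "Q42", "eowiki"))
    print(needs_placeholder(SAMPLE_RESPONSE, "Q42", "enwiki"))
```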

Languages in Wikidata

Published:

Wikidata has the ability to serve the needs of many language communities and could change the way computers interact with language online completely - that all depends on the community though.

A Glimpse into Babel - An Analysis of Multilinguality in Wikidata

Published:

Multilinguality is an important topic for knowledge bases, especially Wikidata, which was built to serve the multilingual requirements of an international community. Its labels are the way for humans to interact with the data. In this talk, we explore the current state of languages in Wikidata, especially with regard to its ontology, and the relationship to Wikipedia. Furthermore, we set the multilinguality of Wikidata in the context of the real world by comparing it to the distribution of native speakers. We find an existing language maldistribution, which is less urgent in the ontology, and promising results for future improvements. An outlook on how users interact with languages on Wikidata is given.
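
One simple way to look at label coverage per language is to count, for each language code, the share of entities that carry a label in that language. The Python sketch below illustrates this idea on a tiny hand-made sample; it is an assumption about how such a measure could be computed, not the method used in the talk. The function name `label_coverage` and the sample entities are invented, and the `labels` structure follows the Wikidata entity JSON format.

```python
from collections import Counter
from typing import Dict, Iterable


def label_coverage(entities: Iterable[dict]) -> Dict[str, float]:
    """Return, per language code, the fraction of entities with a label in it."""
    entity_list = list(entities)
    if not entity_list:
        return {}
    counts = Counter()
    for entity in entity_list:
        # In Wikidata entity JSON, "labels" maps a language code to a label.
        counts.update(entity.get("labels", {}).keys())
    return {lang: n / len(entity_list) for lang, n in counts.items()}


# Tiny hand-made sample in the shape of Wikidata entity JSON, not real data.
SAMPLE_ENTITIES = [
    {"id": "Q1", "labels": {"en": {"language": "en", "value": "a"},
                            "de": {"language": "de", "value": "a"}}},
    {"id": "Q2", "labels": {"en": {"language": "en", "value": "b"},
                            "sw": {"language": "sw", "value": "b"}}},
    {"id": "Q3", "labels": {"en": {"language": "en", "value": "c"},
                            "de": {"language": "de", "value": "c"}}},
    {"id": "Q4", "labels": {"en": {"language": "en", "value": "d"}}},
]

if __name__ == "__main__":
    for lang, share in sorted(label_coverage(SAMPLE_ENTITIES).items()):
        print(f"{lang}: {share:.2f}")
```

Comparing such shares with the number of native speakers per language, as the talk does, would need an external speaker dataset, which is left out of this sketch.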

Multilinguality in Wikidata

Published:

Presentation of my work as part of my visit to the Ontology Engineering Group (OEG) at Universidad Politécnica de Madrid (UPM).

Invited Speaker at Credibility Coalition & NewsQ Workshop

Published:

Invited Speaker at the Credibility Coalition & NewsQ Workshop to talk about misinformation and referencing on Wikipedia and to lead efforts to understand how a community of users can work together towards tackling topics of misinformation.
