Challenge 1

Interactive station for recording and crowdsourcing a multilingual speech dataset. Conducted by ZKM (Center for Art and Media, Karlsruhe).  

Background:
Existing speech datasets often lack data from less common languages and from speakers with protected attributes such as dialect or gender. As a result, computer audition algorithms struggle to understand, or cannot understand, underrepresented groups.

Description: 
The aim of this challenge is to develop a digital interface prototype with the tools necessary for crowdsourcing a speech dataset, one that lets users record and validate spoken words.
The long-term goal (not part of this challenge) is to embed this prototype into an interactive station and deploy it in the museum and in other public spaces around the world, so that we can collect a highly diverse speech dataset. The dataset we aim for will contain examples of people saying the word for their language in that same language, e.g., "english", "deutsch", "español", "français". The individual examples are small, so the dataset can capture a large variety of speakers and languages. The resulting dataset may be used to overcome language barriers by training an AI to identify the desired language while keeping bias to a minimum.

Considerations: 
Due to the pandemic, touchless interfaces are needed for interacting with the station. To validate an audio sample, visitors listen to a recording and either accept or reject it. The resulting tools and dataset will be released under an open-source license.
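As a starting point for such a station, the Voice Activity Detection example linked under Resources below could be used to automatically reject empty or silent submissions before they are queued for human validation. The sketch below follows the usage documented in the silero-vad repository; the file name and the 16 kHz sampling rate are illustrative assumptions, not part of the challenge.

    # Minimal sketch: check whether an uploaded clip contains speech before
    # queuing it for validation, following the usage documented in the
    # snakers4/silero-vad repository. 'recording.wav' is a placeholder file name.
    import torch

    model, utils = torch.hub.load('snakers4/silero-vad', 'silero_vad')
    get_speech_timestamps, _, read_audio, *_ = utils

    wav = read_audio('recording.wav', sampling_rate=16000)
    speech_segments = get_speech_timestamps(wav, model, sampling_rate=16000)

    if speech_segments:
        print('Speech detected: queue the recording for human validation.')
    else:
        print('No speech detected: ask the visitor to try again.')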

Resources:  
  • Example code for Voice Activity Detection in Python (https://github.com/snakers4/silero-vad) 
  • Example collection for touchless interfacing technologies (keyword spotting, gesture recognition, etc.) 
  • Hosting capabilities 

Skills: 
  • Audio Processing 
  • Expertise in using Deep Learning technologies 
  • Frontend Development 
  • Backend Development 
  • Databases 

Partner:
ZKM - Center for Art and Media

Challenge 2

Gender-fair post-editing of Machine Translation. Conducted by the Universities of Graz and Vienna. In this challenge, you will find strategies for post-editing and improving biased MT outputs to achieve gender-fair translations between English and German.

Organizers: 
Manuel Lardelli (University of Graz), Dagmar Gromann (University of Vienna), Waltraud Kolb (University of Vienna) and Katharina Schuhmann (University of Vienna).  

Background: 
Machine translation (MT) systems have been identified as inherently gender-biased in recent years, and several approaches to debiasing MT have been proposed, such as gender tagging and embedding debiasing. However, debiasing methods and discussions of bias in MT have predominantly focused on a binary conception of gender (male/female), disregarding queer and non-binary communities. Since language is an important vehicle for social reality, an inclusive society requires gender-fair language, and MT should not facilitate inequality. As a first step, we propose to tackle this issue with a challenge on gender-fair post-editing of MT outputs, which can in the future provide inspiration for automated debiasing methods.

Description:  
Post-editing has traditionally focused on reducing MT errors, making stylistic improvements, and/or resolving terminological issues. The challenge here is instead to improve gender-biased MT outputs so as to achieve gender-fair translations between English and German. For each source text, outputs of several commonly available, user-friendly MT systems will be provided for the convenience of the participating teams. However, if preferred, teams can generate those outputs themselves on the day of the challenge to obtain the most recent translation from each system. In the first step, teams choose one of the translation options under consideration. In the second step, they post-edit the chosen translation with a particular focus on gender-fair language.
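For teams that choose to generate the outputs themselves, a minimal sketch is shown below; the challenge does not prescribe any particular system, and the openly available Opus-MT English-to-German model used here is only an illustrative choice.

    # Minimal sketch: produce an English-to-German baseline translation that can
    # then be post-edited for gender-fair language. The Opus-MT model is only an
    # illustrative choice; any commonly available MT system can be used instead.
    from transformers import pipeline

    translator = pipeline('translation', model='Helsinki-NLP/opus-mt-en-de')
    source = 'They are a chairperson.'
    print(translator(source)[0]['translation_text'])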
 
Different strategies to achieve gender-fair language for English and German will be introduced and exemplified. Participating teams are then asked to track, document and briefly explain their considerations for choosing a specific strategy. For instance, should it be “He is a chairman”, “She is a chairwoman”, “They are a chairperson” or “Ze is a chairperson” and why? These informally documented decisions will be submitted with the post-edited translations at the end of the challenge and compared across teams in an anonymized format, as part of a final challenge outcome shared with all participants.  

Resources: 
A dataset of examples and a handout with strategies for achieving gender-fair language in English and German will be provided as part of the challenge, including machine-translated content to be subjected to gender-fair post-editing and examples of gender-fair language.

Articles and websites: 
Gender Bias in Machine Translation: https://arxiv.org/abs/2104.06001
On, ona, ono: Translating Gender Neutral Pronouns into Croatian: https://repozitorij.svkst.unist.hr/en/islandora/object/ffst%3A2887 
Neural Machine Translation Doesn't Translate Gender Coreference Right Unless You Make It: https://arxiv.org/abs/2010.05332 
Nibi-Space (German): https://nibi.space/geschlechtsneutrale_sprache
APA Style (English): https://apastyle.apa.org/style-grammar-guidelines/bias-free-language/gender

Skills: 
For this challenge, participants should have a good knowledge of English and ideally some knowledge of German. Additionally, participants should be interested in, if not passionate about, language, translation, and social equality.

Partner:
GenderFairMT


Challenge 3

Creating Datasets and Resources against Societal Biases in AI. Conducted by IfM (Institut für Medien- und Kommunikationspolitik) and FCAI (Finnish Centre for Artificial Intelligence) special interest group in language, speech and cognition.

Background: 
AI might often seem neutral with regard to humans, since it is, after all, a machine. The reality, however, is quite the contrary. These systems learn from datasets that contain the same biases found in our language and society. When an AI learns from the very language we use that typical doctors are men and nurses are usually women, AI-generated texts will contain these kinds of biased stereotypes. These faulty texts then get copied and spread further across the web.
  
Description: 
Groups will define and analyse societal biases in machine translation systems. Guided by such a definition, participants can then create a dataset or design a platform where users discuss and bring up examples of bias. The dataset can, for instance, be used to mitigate bias in a machine translation system, while the platform might raise awareness and help users, decision makers and machine translation developers understand what bias is. Through the collaborative effort of a community, we can in this way make machine translation systems unlearn these biases, educate and inform decision makers, and in the future find other machine learning based uses for the datasets.

Considerations: 
What exactly is bias in Machine Translation? Is it a translation mistake, a metric or value produced by an algorithm, or something else? Should we mitigate biases in the machine translation systems themselves, or instead alert users and decision makers to the fact that the systems are biased? The answers to these questions should ideally motivate the work on dataset or platform creation.
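If a team opts for the dataset route, one hypothetical starting point is to agree on a record format for crowdsourced bias examples before collecting them. The field names, the toy example and its German translations below are illustrative assumptions, not a prescribed schema.

    # Hypothetical record format for one entry in a crowdsourced bias-example
    # dataset; all field names and the example content are illustrative only.
    from dataclasses import dataclass, asdict
    import json

    @dataclass
    class BiasExample:
        source_text: str       # sentence fed to the MT system
        source_lang: str       # language code of the source
        target_lang: str       # language code of the translation
        mt_output: str         # translation produced by the MT system
        mt_system: str         # which system produced the output
        bias_description: str  # contributor's explanation of the perceived bias
        suggested_fix: str     # an unbiased alternative translation, if any

    entry = BiasExample(
        source_text='The doctor finished her shift.',
        source_lang='en',
        target_lang='de',
        mt_output='Der Arzt beendete seine Schicht.',
        mt_system='example-mt',
        bias_description='A female doctor is rendered with the male form.',
        suggested_fix='Die Ärztin beendete ihre Schicht.',
    )
    print(json.dumps(asdict(entry), ensure_ascii=False, indent=2))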

Resources:  
Language (Technology) is Power: A Critical Survey of "Bias" in NLP
Gender Bias in Machine Translation
Collecting a Large-Scale Gender Bias Dataset for Coreference Resolution and Machine Translation

Examples of Data used to mitigate bias from AI/ML systems: 
Equity Evaluation Corpus (EEC corpus)
Gendered Ambiguous Pronouns (GAP corpus)

Resources for other purposes: 
macht.sprache. (machtsprache.de)

Multilingual Datasets: 
Opus.eu

Partners: 
IfM
FCAI

Challenge 4

Identifying sentences susceptible to machine translation bias. Conducted by Danielle Saunders. During this challenge, you will automatically identify bias-susceptible sentences, ideally in a way that generalises to languages other than English.

Background: 
Errors can occur when translating any sentence. However, some mistranslations also exhibit bias. This is more common for sentences about people, whether groups or individuals, where a mistranslation might perpetuate harmful social stereotypes.

Description: 
Current ways to identify these sentences often revolve around hand-curated word lists and typically focus on English. The challenge is therefore to instead automatically identify bias-susceptible sentences, ideally in a way that generalises to languages other than English.
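For illustration, the sketch below shows the kind of English-only, word-list baseline the challenge aims to move beyond: it flags a sentence as bias-susceptible if it mentions a person or contains a gendered pronoun. The pronoun list and the use of spaCy's small English model are assumptions made for this example.

    # Sketch of a word-list baseline (the approach the challenge wants to improve
    # on): flag sentences that mention people or use gendered pronouns.
    import spacy

    nlp = spacy.load('en_core_web_sm')
    GENDERED_PRONOUNS = {'he', 'she', 'him', 'her', 'his', 'hers'}

    def is_bias_susceptible(sentence: str) -> bool:
        doc = nlp(sentence)
        mentions_person = any(ent.label_ == 'PERSON' for ent in doc.ents)
        gendered_pronoun = any(tok.lower_ in GENDERED_PRONOUNS for tok in doc)
        return mentions_person or gendered_pronoun

    print(is_bias_susceptible('The doctor said she would arrive soon.'))  # True
    print(is_bias_susceptible('The bridge was closed for repairs.'))      # False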

Resources: 
An example dataset  

Considerations: 
A "toy" test set for this challenge would be a mix of sentences from existing bias datasets and sentences from other sources that are not about people. However, participants should ideally demonstrate their ideas on other datasets that have not been labelled for bias-susceptibility. 

Skills: 
  • Multilingualism 
  • Text processing (e.g. using nltk, spacy, or similar tools) 
  • Language modelling 

Challenge 5

Does Bias in Collections and Archives Survive Translation and Multilingualism? Provided by the Cultural AI Lab (NL).

Background:  In collaboration with the Dutch National Museum of World Cultures (NMvW), the Cultural AI lab has been developing SABIO (the SociAl BIas Observatory): a tool and visual interface for exploratory analysis of bias and patterns of bias in museum collections and heritage archives. At the core of bias exploration in SABIO is an extensible suite of language processing and machine learning algorithms that score and reorganise a given collection. For a more detailed description of the challenge see the GitHub page.

Description:  So far, SABIO has been tested and developed on the NMvW's collection, a monolingual Dutch archive of around 800k colonial artefacts. Even though SABIO has been designed to be language independent, we would like to test this and invite participants to experiment with and extend SABIO on cross- and multilingual collections. As potential outcomes, we hope for guidelines on how to highlight and analyse bias across languages in cultural archives, and for extensions to SABIO, or at least to the algorithms it consists of, tailored to multilingual datasets.
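As one hypothetical direction, independent of SABIO's actual code base: scoring algorithms could be made language independent by embedding catalogue descriptions with a multilingual sentence-embedding model, so that records in different languages can be compared against the same set of sensitive concepts. The model name, concept list and catalogue entries below are illustrative assumptions.

    # Hypothetical sketch (not part of SABIO): score catalogue entries in any
    # language against bias-laden concepts using multilingual sentence embeddings.
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')

    concepts = ['primitive culture', 'exotic curiosity']   # example concepts
    records = [
        'Houten masker van een primitieve stam',           # Dutch catalogue entry
        'Ceremonial mask collected during an expedition',  # English catalogue entry
    ]

    concept_emb = model.encode(concepts, convert_to_tensor=True)
    record_emb = model.encode(records, convert_to_tensor=True)

    # Cosine similarity between every record and every concept as a rough score.
    print(util.cos_sim(record_emb, concept_emb))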

Considerations:  Many collections, even if fully digitised, are subject to copyright or are only available in idiosyncratic formats. Furthermore, SABIO expects certain properties to be present, and this will likely restrict the choice of collections for the hackathon. 

Resources:  We provide SABIO and its backend; that is, we will facilitate loading datasets selected by participants into the SABIO interface on our servers. In addition, we have published the machine learning algorithms and data processing scripts, all written in Python, in our GitHub repository for open access.

Skills: 
  • machine learning and statistical algorithms, especially for language processing
  • databases and digital archives, data engineering
  • identifying and analysing bias, especially in contexts of cultural heritage.
Partner:
Cultural AI


Challenge 6

Measuring the effects of representational bias. Conducted by EQUITBL and WASP-WARA-Media and Language. The goal of this challenge is to find a way to automatically test whether the amount of unbalanced gender representation in training data affects the quality of the resulting tools, for example with regard to bias.

Organizers:  
EQUITBL project at Umeå and Uppsala Universities, and WASP-WARA-Media and Language. 

Background:  
We know that unbalanced representation of population groups in the language data used to train NLP tools leads to biased tools. However, relatively little is known about how strong this effect is, and this would be important knowledge for NLP practitioners.

Description: 
Develop a method for measuring how biased an NLP tool (e.g., a language model, a sentiment analysis tool or a co-reference resolution tool) is towards a certain population group. Investigate how bias, as measured by the method, is affected by the amount of representation that group has in the training data.
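One hypothetical way to operationalise such a measurement, sketched below, is to compare a tool's scores on sentence pairs that differ only in the group mentioned (here, gendered pronouns) and report the mean score gap. The templates, the toy scorer and the metric itself are illustrative assumptions, not part of the provided material.

    # Hypothetical bias probe: mean score gap on sentence pairs that differ only
    # in gendered pronouns. Plug in any model trained on the provided corpora by
    # wrapping it as a function `score(text) -> float`.
    from statistics import mean
    from typing import Callable

    TEMPLATES = [
        ('He is a nurse.', 'She is a nurse.'),
        ('He is a brilliant engineer.', 'She is a brilliant engineer.'),
        ('He stayed home with the children.', 'She stayed home with the children.'),
    ]

    def score_gap(score: Callable[[str], float]) -> float:
        # Positive values mean the "he" sentences receive higher scores on average.
        return mean(score(he) - score(she) for he, she in TEMPLATES)

    def toy_score(text: str) -> float:
        # Stand-in scorer for demonstration only; replace with a trained model.
        return 1.0 if 'brilliant' in text else 0.0

    print(f'Mean score gap (he vs. she): {score_gap(toy_score):+.3f}')  # +0.000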

Resources:  
Text corpora in Swedish and English will be provided with different amounts of representation for different genders. Source code in Python for training a simple NLP tool on these corpora will also be provided.

Required skills:  
Participants in the task will need a basic understanding of natural language processing and some proficiency in Python programming and statistics.

Partners: 
EQUITBL project at Umeå and Uppsala Universities
WASP-WARA-Media and Language
