Feminist Data Set: How To Inspire Equitable AI & Media Literacy


Have you ever clicked ‘accept’ without reading a website’s terms and conditions? In this article, we ask: what if all software had a duty of care to explain how it worked and to truly centre consent around data usage? Imagine if all AI technology were made with this ethos.

By Caroline Sinders

When the Goethe-Institut invited me to write this article, it gave me two topics, but I believe they intersect here: what Feminist Data Set can teach us about media literacy, and how it can be used to tackle deep fakes and take back agency. Feminist Data Set is an art project of mine that uses intersectional feminism as a framework for creating new kinds of AI systems and methodologies.

Inherently, I see these two questions as both about literacy, both about the problems with technology, and both focused on a key issue on the internet: returning agency, consent and transparency to users. If Web 1.0 was about users building websites and content, and about an openness of the internet, Web 2.0 has been about dynamic content and platforms, but also about the rise of ‘walled gardens’: platforms creating very specific, non-adaptable products where users must adjust to the technology, when technology should instead be made to adjust to us, the users.

This is where, as an artist and researcher, one of my larger projects comes into play: Feminist Data Set.

Feminist Data Set is a multi-year art project using intersectional feminism as a framework for investigating machine learning. It’s trans-inclusive and focuses on racial justice. It’s extremely process-driven, so the ‘outputs’, or artifacts, of the project are workshops on data and machine learning, essays, printed matter, and documentation.

How we run it

But what does this mean, and what does it look like in practice? Feminist Data Set is theoretical, practice-based, and a bit of a dare. It's about taking a methodology and applying it to the messiest of things: making technology. It's not enough to have a manifesto or a framework. That framework needs to be put into practice and tested to see where it falls apart, where it works well, and where it has to be adjusted. Thus, the outputs of Feminist Data Set are research, a data set, handmade software, or the usage of software we deem to be 'feminist'. Feminist, in our view, means intersectional, and the particular requirements are defined by our communities. This can mean 'accessible' software focusing on disability or usability, software that is easier to use, or open source software. Each community picks these 'building blocks' or 'scaffolds' to help further define the specific 'Feminist Data Set' methodology in each of our workshops.

Currently, there are two ‘nodes’ or steps of Feminist Data Set, though there will be more as we move through the machine learning pipeline. The first node has been ‘data collection’ and the second is ‘data cleaning’. The ‘data’ in Feminist Data Set is written text of any kind.

Feminist Data Set is about looking at each step and offering an intervention, or testing other interventions, that are often explicitly feminist in nature, co-created, and coming not just from technologists but from communities, artists and human rights activists. Data collection, for example, is a slow process. Its slowness is key and stands against the Silicon Valley ethos of “move fast and break things.” By avoiding scale and instantaneousness, the slowness gives the project space to breathe and to ask, “what are the rights of our data set?” We set parameters that are flexible and can change. The purpose is also to explore what would be needed to make a data set feminist, and to test those suggestions and intentions.
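
To make the ‘data collection’ node concrete, here is a minimal sketch in Python of what a single consent-centred entry in such a data set could look like. The field names and rules (creator, region, consent_given and so on) are my illustrative assumptions for this article, not the project’s actual schema, which each workshop defines for itself.

from dataclasses import dataclass

# Hypothetical sketch of one entry in a consent-centred text data set.
# Field names and rules are illustrative assumptions, not the project's schema.
@dataclass
class DataEntry:
    text: str            # the written text being collected
    creator: str         # who wrote it; credit is part of the record
    source: str          # where it came from (zine, essay, book, flyer...)
    region: str          # the physical area or region the text relates to
    consent_given: bool  # did the creator consent to inclusion?

def collect(entry: DataEntry, dataset: list[DataEntry]) -> bool:
    """Add an entry only if it passes the workshop's flexible parameters."""
    if not entry.consent_given:
        return False     # no consent, no collection
    dataset.append(entry)
    return True

Because the parameters are set by each community and can change, a check like collect() would be rewritten workshop by workshop rather than fixed once for all.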

Failures and frictions

But this testing, prototyping, and community-driven environment is also the project’s ‘failure’ as much as its process. I am a queer nonbinary white person from North America, and our data set has a lot of white creators in it. From the beginning, we decided to have no citation requirements, for political and equitable reasons: BIPOC folks, trans folks and femmes are under-published and under-cited. Even so, white women are published more than their counterparts, and the data set, despite some of our efforts, does reflect that. So we added other rules, like: no more white women unless that ‘data’ (which is text) directly relates to the physical area or region where we are holding the workshop.

I consider this a failure within the project, but that’s okay. The project is about surfacing failures and frictions and figuring out how those frictions came to be. I believe the ‘drawbacks’ of the project are worth talking about. One is that, even with institutional buy-in, workshops are only held in community spaces, or in spaces that can demonstrate strong ties to their community, because our focus is on the general public. Another is that, even with all of that care, foresight and planning, the data set has more white women and more cis women than other kinds of creators and writers.

The goal of the project is to take a lot of really outstanding work and methodologies rooted in feminist technology studies, which explore the coded social and historical implications of science and technology for the development of society, including how identity constructs, and is constructed by, these technologies.

We put them to the test by saying, “let’s build some commercial software about machine learning, but let’s do it by hand.” It’s a process of seeing what we can learn from slowness, and from a small intervention that confronts the inadequacies of Big Tech. What can we learn from open source and collaboration? I think a lot about how the project stands in opposition to the opacity of big technology and million-dollar budgets.

Can we have small interventions, and can they ‘work’? This question is part of our goal. It’s an art and research project that uses participatory methods to ask what feminist machine learning is, and then to ask: can we create it? By nature, this project is about process; it’s about greeting friction, documenting it, and acknowledging that it is there, because applying a theory to reality will not be perfect. It will be bumpy.

Equally unlike big tech, our project acknowledges failure, as mentioned above. Other aspects of failure come down to technical reasons: I own a MacBook Pro; is that a piece of feminist technology? No, not really. So part of this entire process is greeting failures and frictions when we can, and mitigating them. This is about harm reduction as much as testing hypotheses, and about sharing our learnings with community members and participants and then asking, ‘where do we go from here?’

Media literacy and digital agency

Communities often drive the outputs of our workshops, and the findings those workshops produce are specific and contextual to that group of people. I see this as a form of consent, and of returning voice and agency to communities within software development. At present, I’m attempting to write up the findings overall, which means we might not have one total ‘Feminist Data Set’ framework but instead a series of nodal findings specific to each workshop, grouped under themes within machine learning like data collection, data cleaning and model generation.

How does this relate to media literacy and agency? The ethos of Feminist Data Set is about centring each user and each community. In the age of modern software, with everything fast-paced and outsourced to AI automation, Feminist Data Set offers a necessary, imaginative alternative. Feminist Data Set is about user input and community-driven technology. I’m interested in what we can learn about technology if we take an approach that’s anti-scale and anti-capitalist, and primarily centred on harm reduction.

This is what Feminist Data Set is attempting to create and teach: What if all AI technology were made with this ethos? What if all software had a duty of care to explain how it worked, to truly centre consent around data usage? I do believe this would create more forms of user literacy and agency within the software itself. This is the Feminist Data Set ethos.

What can big software learn from this? That it needs to be reshaped, remade and restructured: to slow down, to allow for more human employees and more human oversight, and to invite more involvement, including consent and agency, from actual users.

The reality of deep fakes

And what of the issue of deep fakes, whose creation is instantaneous? What would a Feminist Data Set approach to deep fakes be? It would question the tool’s own existence: why is it here, how does it harm, and what does it add to society? It would also demand that the tool stop using real people in its output. It would interrogate its own data: did the data collection process centre consent from the data creators? Was there compensation for the original creators of that data? Does the tool credit and name who created the data that has been vacuumed up? With a Feminist Data Set framework, we might not even have deep fakes, at least not deep fakes made without a person’s consent, and I see that as an inherently good thing.
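
As a thought experiment, here is a minimal sketch, again in Python and again with invented field names, of the kind of provenance audit this framework would demand before a generative tool could train on a record at all. It illustrates the three questions above; it is not an implementation of any real tool.

# Hypothetical provenance audit, asking the three questions above of each
# training-data record. The dictionary keys are illustrative assumptions.
def audit_entry(entry: dict) -> list[str]:
    """Return the list of reasons a single record fails the audit."""
    failures = []
    if not entry.get("consent_given"):
        failures.append("no consent from the data creator")
    if not entry.get("compensated"):
        failures.append("no compensation to the original creator")
    if not entry.get("creator"):
        failures.append("the creator is not credited or named")
    return failures

def audit_dataset(dataset: list[dict]) -> bool:
    """A data set passes only if every single record passes."""
    return all(not audit_entry(entry) for entry in dataset)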
 
