Language and data: 4 endeavors to watch

Image by Gerd Altmann from Pixabay

- By Karen Borchgrevink, LA Tech4Good founder and executive director

Our recent article on UInclude's Inclusive Writing Tool reminded me how often language and data come together in interesting and unexpected ways. In my mind they sometimes seem like opposites: on one side, the creativity of living human language, intrinsic to and deeply embedded in human culture and experience; on the other, the technical computation inherent in data.

Nevertheless, it’s no surprise that at a time when data is penetrating every fiber of our lives, language and data intersect in countless ways. I see a thread running through some significant work that I love, and I want to share a few super cool projects in my line of sight.

Te Hiku Media nurtures Māori language through data

In 2018, Te Hiku Media recorded over 300 hours of audio of native Māori speakers across New Zealand. It was enough data to build language tech for te reo Māori, the Māori language – including automatic speech recognition and speech-to-text. Te Hiku Media considers this a treasure in sustaining their language, and in resisting corporate attempts to basically buy their language.

Follow Te Hiku on Twitter, browse their website (an awesome bilingual site with much of its content in te reo Māori), and read more in Wired's article “Māori are trying to save their language from Big Tech.”

EGAL’s Equity Fluent Leadership Playbooks

The Berkeley Haas Center for Equity, Gender, and Leadership (EGAL) offers Equity Fluent Leadership Playbooks on topics including language for racial equity and inclusion and responsible language in AI, with strategies and tools to advance diversity, equity, and inclusion. Get your playbook from the EGAL website!

“Language impacts people and workplaces every day. Language can make people feel like they belong, or be used to discriminate and advance divisiveness and inequity. Simply put, language matters.”

Image from Pixabay

Parrots, language and fixing AI

Google fired Dr. Timnit Gebru from her position as co-lead of its ethical artificial intelligence (AI) team for “language violations,” you might say. The offending paper compared AI “large language models,” which are built on ever-larger data sets, to “parrots” that are simply very good at repeating combinations of words from their training data. “This means that white supremacist and misogynistic, ageist, etc., views are overrepresented,” wrote Gebru and her colleagues in their paper “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?”

An easy-to-understand article on all this came out recently: “Why Timnit Gebru Isn’t Waiting for Big Tech to Fix AI's Problems.” Share it with your friends!

Conscious Style Guide: words as tools

Karen Yin’s Conscious Style Guide is the first website devoted to conscious language and a valuable resource for thinking critically about using language to empower rather than limit. It doesn’t really count as a data project, but it earns an honorable mention.

PS

If you work in this space, I’d love to hear from you! - Karen