3 Ways to be a Responsible Data Practitioner
If you work in or with data, you're in good company! There are hundreds of thousands of data professionals in the United States alone (U.S. Bureau of Labor Statistics), and if we include those who work indirectly with data to make decisions, the number is in the millions. The field is rapidly growing and new roles are emerging: technical roles like analytics engineer and deep learning scientist, plus new and evolving capacities like data storyteller and data steward, among others.
We believe there is a ton of value in working to implement better data practices at the grassroots level: starting with the individuals who already work with data. Each of these individuals has a choice about whether to prioritize better decision-making practices, from whether the use case is appropriate for machine learning to what should appear on a dropdown menu.
There are many terms to describe the growing community addressing these challenges, and one that we like is responsible data practitioners. Given the current patchwork of standards that apply to data in some industries but not others, getting the broader data community talking about these ideas and implementing new norms will set the stage for broader positive change.
Whether you already identify as a responsible data practitioner or if you're new to the idea, here are…
Three things you can do to advocate for equity in your data work
1. Establish norms
Roles in the data space range from highly technical to leadership, from entry level to CEO, and everything in between, forming a diverse data community with many different skills, interests, and goals. Data roles can also be found across nearly every industry and sector. This variety is one of the main reasons that the data profession does not have a standardized code of ethics or a set of professional responsibility norms as clearly defined as those of other professions. In the UK, industry groups are already actively working to address this, with an objective “to uphold an ethical use of the public's data, and ultimately, make data scientists trusted professionals – as trusted as doctors, lawyers or architects.” (Data scientists are used to making up the rules. Now they're getting some of their own to follow).
Without a unifying set of principles to guide this work here in the US, individual data practitioners have the opportunity to start building these practices from the ground up, to form consensus about how responsible data practices should be defined, and to build a movement around bringing a set of professional standards into the mainstream. By thinking through how data responsibility norms should be defined, individual data practitioners can help shape the future of data industry standards. Making these principles both general enough to apply across the data profession and specific enough to be tangibly meaningful will be a challenge, but it's also not one that anyone needs to manage on their own. As a professional community, we can each start implementing data equity frameworks into our work, learning from those experiences, and doing the important work of collectively formulating a set of principles to build an ethical foundation for our profession.
2. Ask tough questions
No matter how you engage with data day-to-day, there are some foundational questions you can ask to guide projects with a lens of equity, ethics, and justice. One starting point is to ask the questions that Sasha Costanza-Chock, author of Design Justice, poses (To Truly Be Just, ‘Design Challenges’ Need to Listen to the Communities they Claim to Serve):
“What story is told? How is the problem framed? Who decides the scope? What values are built into the designed objects and processes? Who benefits? Who loses?”
Asking these questions early in a data project invites curiosity and learning, and can uncover issues of inequity: for example, the power dynamics at play in who defines the scope of the data project; who is and is not represented in the dataset, and why; how privacy and transparency are practiced, and to whose benefit; and whose values are guiding the creation, analysis, and sharing of the data. Sometimes these questions have straightforward answers; in these cases, it is an equity best practice to document the answers. In other instances, these questions might unearth unclear motivations or uncomfortable truths.
Let’s embrace the difficult questions, and bring our curiosity to dig deeper. We can do this by asking further questions like “why” and “how might we”: Why is this project being done? Why was this decision made? Why isn’t this group represented? How might we re-frame this project with equity as a core principle? How might we incorporate additional data to provide better context or representation in our project? Asking and answering these questions, especially when it’s difficult or diverges from how your organization has operated in the past, is a crucial step to practicing data responsibility.
3. Abandon your bias
Neil deGrasse Tyson provides an approach to bias (Our Systems of Belief):
“Everybody has bias… Just be ready to get your stuff checked, and be ready to just abandon your cherished thoughts and ideas in the face of conflicting evidence.”
For data practitioners, being open to recognizing and examining their biases, and to taking corrective action to mitigate them, is the only way to prevent those biases from causing unintended consequences. By approaching each project with curiosity and a learning mindset, data practitioners can uncover what assumptions are being brought into a project, what biases may be reflected in the data, and what unintentional frames may be showing up in the interpretation phase. Bias awareness is a lifelong practice, not a destination. As with many things, it gets easier with repetition.
Apply these practices throughout the data project lifecycle
Each stage in the lifecycle of a data project represents its own set of opportunities to apply these three practices as a responsible data practitioner. In simple terms, the stages of a project can be framed as:
Problem definition → Data collection → Analysis, delivery & presentation
Here are some tips for applying these practices at each stage of a data project.
First: Problem definition
Appropriate problem definition is the most significant part of any data project, as it sets guideposts for all subsequent steps.
How problems are defined, and by whom, inevitably incorporates assumptions that should be examined by the project team.
In one famous case of failed problem definition, the Gates Foundation spent hundreds of millions over several years developing solar-powered toilets for communities without access to more advanced sanitation. It was only after they delivered toilets to their pilot communities that they discovered that the far more pressing problems were the safety of female community members when using them and a lack of familiarity or training in the communities for how to maintain the units. Until these concerns were addressed, the toilets sat unused. (Costanza-Chock)
Directly involve community members to mitigate bias in defining a problem, and balance any competing concepts. Move beyond abstractions and push for concrete problems and solutions. Ask lots of questions and be open to what is learned. Focus on what success would look like, from the right perspective. It is nearly impossible for outsiders to understand a community's needs without assistance from the community itself.
Second: Data collection
How and what data is collected impacts how it can be used and what information it will be equipped to communicate.
For example, if a project wants to understand how outcomes for a new medical treatment differ for people aged 65+ vs those aged 21-30, but the data does not include age, the dataset will be unable to inform safe guidelines for dosage by age.
Corrections for missing or incomplete data include thinking ahead about data scope and structure as early in a project as possible, making appropriate updates to data collection practices when a gap is detected, and giving people impacted by a project some control over how they are represented.
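One way to detect such a gap early is to check a dataset against the fields and groups the project question depends on before any analysis begins. The following is a minimal sketch for the treatment example above, not a definitive method: the field names (`age`, `dose_mg`, `outcome`), the age bands, the `find_gaps` helper, and the sample records are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical requirements for the treatment-outcome example; the field
# names and age bands are illustrative assumptions, not from a real study.
REQUIRED_FIELDS = {"age", "dose_mg", "outcome"}
AGE_BANDS = {"21-30": (21, 30), "65+": (65, 200)}


def find_gaps(records, min_per_band=1):
    """Return a list of human-readable gaps found in the records."""
    gaps = []
    # Report each required field that is absent or None in any record.
    for field in sorted(REQUIRED_FIELDS):
        missing = sum(1 for r in records if r.get(field) is None)
        if missing:
            gaps.append(f"{field}: missing in {missing} of {len(records)} records")
    # Count how many records fall into each age band of interest.
    counts = Counter()
    for r in records:
        age = r.get("age")
        if age is None:
            continue
        for band, (low, high) in AGE_BANDS.items():
            if low <= age <= high:
                counts[band] += 1
    # Flag any band with fewer records than the chosen minimum.
    for band in AGE_BANDS:
        if counts[band] < min_per_band:
            gaps.append(f"age band {band}: only {counts[band]} records")
    return gaps


# Made-up sample records, for demonstration only.
sample = [
    {"age": 24, "dose_mg": 10, "outcome": "improved"},
    {"age": None, "dose_mg": 10, "outcome": "no change"},
    {"dose_mg": 20, "outcome": "improved"},
]
for gap in find_gaps(sample):
    print(gap)
```

A check like this only surfaces gaps. Deciding what counts as adequate representation, and how people are described in the data, still belongs in conversation with the people the project affects.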
Third: Analysis, delivery & presentation
Even once a dataset contains what we need to answer a question, those pesky biases can still enter during the analysis, delivery, and presentation phases.
Again, awareness is critical, and it can be helpful to assemble a team with varying backgrounds, perspectives, and experiences to help stop individual biases from taking over. The best case is to continue involving the communities in question in every phase of a project.
Conclusion
Being a responsible data practitioner involves intentional work beyond the already complex endeavor of turning raw data into actionable information. The good news is it also adds real human value by strengthening the final results with the human context. Bias awareness will help make the data interpretation as meaningful and equitable as possible.
Responsible data practices begin by asking: What is one small action you can take today to focus on equity in a current data project?
Read more on applying best practices through the lifecycle of a data project: Applying an Intersectionality Lens in Data Science