Thoughts on Ethics for Data Science
Recently, I have been taking an online deep learning course on fast.ai. Lesson 3 of the course focuses on data ethics. The lecturer, Rachel Thomas, touches on a few points that resonate with me. Around 1:52:50, there is a short discussion:
Q: Maybe the best way to incentivize ethical behavior is to tie financial or reputational risk to good behavior. In some ways, similar to how companies are now investing in cybersecurity because they don’t want to be the next Equifax. Can grassroots campaigns help in better ethical behaviors with regards to the use of AI? Rachel Thomas: …I think it’s hard for people to make the case to their bosses of why they should be investing in cybersecurity. Particularly because cybersecurity is something that when it’s working well, you don’t notice it. …
For many things like cybersecurity, people lack the incentive and motivation to invest before something really bad happens to them. In my experience, when designing a product (or running a service), many companies are eager to launch as early as possible with little testing, calling it “test it in production; move fast; quicker iteration beats fine-tuning”. In some cases this only leads to an acceptable level of bad user experience, while in others it can be devastating and end up completely shutting down the business. Is there a good strategy to invest early, before everything is too late? As mentioned above, it’s really hard to make the case unless it’s tied to financial or reputational risk, which is closer to a postmortem analysis after something bad has already happened. This effort is easier to start if the leadership team possesses a mindset of prioritizing the quality, privacy, and security traits of their product or service, and builds a culture around them.
Another point that struck me comes around 2:05:52. I found the full version of the interview and will quote it here:
WHAT’S WRONG WITH AI Julia Angwin: I strongly believe that in order to solve a problem, you have to diagnose it, and that we’re still in the diagnosis phase of this. If you think about the turn of the century and industrialization, we had, I don’t know, 30 years of child labor, unlimited work hours, terrible working conditions, and it took a lot of journalist muckraking and advocacy to diagnose the problem and have some understanding of what it was, and then the activism to get laws changed. I feel like we’re in a second industrialization of data information. I think some call it the second machine age. We’re in the phase of just waking up from the heady excitement and euphoria of having access to technology at our fingertips at all times. That’s really been our last 20 years, that euphoria, and now we’re like, “Whoa, looks like there’re some downsides.” I see my role as trying to make as clear as possible what the downsides are, and diagnosing them really accurately so that they can be solvable. That’s hard work, and lots more people need to be doing it. It’s increasingly becoming a field, but I don’t think we’re all the way there.
Diagnosing and understanding the full picture of a problem is not easy. It takes non-trivial time to identify the exact problem, which serves as the starting point for crafting a potential solution. On the one hand, if we don’t fully understand the problem, it may cost far more time and resources to pivot later. On the other hand, we can’t spend endless time diagnosing a problem; that’s not practical in reality, and we would never make any progress toward exercising potential solutions to test our understanding. It’s always an evolving and dynamic process: we invest enough time up front to understand the problem, then design a solution, test it, and measure the results to create a feedback loop that adjusts our initial understanding and leads to a better solution.