Date: 2023-05-24

As business leaders strive to get the most out of their analytics investments, democratized data science often seems to offer the perfect solution. Using analytics software with no-code and low-code tools can put data science techniques into the hands of virtually anyone. In the best scenarios, this leads to better decision-making and greater autonomy and self-service in data analysis, especially as the demand for data scientists far exceeds their supply. Add to that reduced talent costs (with fewer high-cost data specialists) and more scalable customization to tailor analytics to a particular business need and context.

However, amid all the talk about whether and how to democratize data science and analytics, a crucial point has been overlooked. The conversation should focus on when to democratize data and analytics, even to the point of redefining what democratization should mean.

Fully democratized data science and analytics pose many risks. As Reid Blackman and Tamara Sipes wrote in a recent article, data science is difficult, and an untrained "expert" cannot necessarily solve difficult problems, even with good software. The ease of clicking a button that produces results provides no assurance that the answer is in fact correct. It could be badly flawed, and only a trained data scientist would know.

It’s just a matter of time

Even with these caveats, however, the democratization of data science is here to stay, as evidenced by the proliferation of analytics software and tools. Thomas Redman and Thomas Davenport are among those advocating for the development of citizen data scientists, even going so far as to screen for basic data science skills and aptitude in every position hired.

The democratization of data science should not be taken to extremes, however. Analytics does not need to be available to everyone for an organization to thrive. How many incredibly talented people would not be hired simply because they lack basic data science skills? Such a requirement is unrealistic and too limiting.

As business leaders look to democratize data and analytics within their organizations, the real question they should be asking is when it makes the most sense. It starts with recognizing that not every "citizen" of an organization is equally qualified to be a citizen data scientist. As Nick Elprin, CEO and co-founder of Domino Data Lab, which provides data science and machine learning tools to organizations, told me in a recent conversation, "As soon as you get into modeling, more complex statistical problems often lurk beneath the surface."

The challenge of data democratization

Consider a grocery chain that recently used advanced predictive methods to right-size its demand planning, with the goal of avoiding having too much inventory (resulting in spoilage) or too little (resulting in lost sales). Losses due to spoilage and stock-outs were not huge, but the problem of reducing them was very difficult to solve given all the variables of demand, seasonality, and consumer behavior. The complexity of the problem meant that the grocery chain could not leave it to citizen data scientists to solve, but instead had to rely on a team of genuine, well-trained data scientists.

Data citizenship requires representative democracy, as Elprin and I discussed. Just as American citizens elect politicians to represent them in Congress (presumably to act in their best interest in legislative matters), organizations also need good representation by data scientists and analysts to weigh in on issues that others simply don’t have the expertise to resolve.

In short, the challenge is knowing when and to what extent to democratize data. I propose the following five criteria:

Consider the skill level of citizens: The citizen data scientist, in one form or another, is here to stay. As noted earlier, there simply aren’t enough data scientists to go around, and using this scarce talent to solve every data problem isn’t sustainable. More to the point, democratizing data is key to instilling analytical thinking throughout the organization. A well-known example is Coca-Cola, which has rolled out a digital academy to train managers and team leaders. Graduates of the program are credited with around 20 digital, automation, and analytics initiatives across multiple sites in the company’s manufacturing operations.

However, when it comes to engaging in predictive modeling and advanced data analytics that could fundamentally change a company’s operations, it is crucial to consider the skill level of the citizen. A sophisticated tool in the hands of a data scientist is additive and valuable; the same tool in the hands of someone just playing with the data can lead to errors, incorrect assumptions, questionable results, and misinterpretation of results and conclusions.

Measure the importance of the problem: The more important a problem is for the company, the more imperative it is to have an expert in charge of data analysis. For example, generating a simple graph of historical buying trends can probably be accomplished by someone with a dashboard that displays the data in a visually appealing form. But a strategic decision that has a significant impact on a company’s operations requires expertise and reliable precision. For example, how much an insurance company should charge for a policy is so deeply rooted in the business model itself that it would be unwise to leave this task to a non-expert.

Determine the complexity of the problems: Solving complex problems is beyond the capabilities of the typical citizen data scientist. Consider the difference between comparing customer satisfaction scores across customer segments (simple, well-defined, and low-risk metrics) and using deep learning to detect cancer in a patient (complex and high-risk). Such complexity cannot be left to a non-expert making cavalier, and potentially wrong, decisions. When the complexity and the stakes are low, democratizing data makes sense.

An example is a Fortune 500 company I work with that uses data throughout its operations. A few years ago I led a training program in which more than 4,500 managers were divided into small teams, each of which was asked to articulate an important business problem that could be solved with analytics. Teams were empowered to solve simple problems with available software tools, but most problems surfaced precisely because they were difficult to solve. Importantly, these managers were not responsible for actually solving these difficult problems, but rather for collaborating with the data science team. Notably, these 1,000 teams identified no fewer than 1,000 business opportunities and 1,000 ways that analytics could help the organization.

Empower those with domain expertise: If a company is looking for directional information (for example, that customer X is more likely to buy a product than customer Y), data democratization and lower-level citizen data science will probably suffice. In fact, tackling these kinds of lower-level analytics can be a great way to put simplified data tools in the hands of those with domain expertise (i.e., those closest to customers). Greater precision (e.g., for complex, high-stakes problems) requires expertise.

The most compelling case for accuracy is when there are high-stakes decisions to be made based on a certain threshold. If an aggressive cancer treatment plan with significant side effects were to be undertaken at, say, more than a 30% chance of cancer, it would be important to differentiate between 29.9% and 30.1%. Accuracy is especially important in medicine, clinical operations, technical operations, and for financial institutions that navigate markets and risk, often to capture very small margins at scale.
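To make the threshold point concrete, here is a minimal Python sketch. The 30% cutoff comes from the example above; the function name, the half-point error margin, and the sample probabilities are hypothetical and chosen only for illustration. The sketch flags estimates that sit so close to the cutoff that a small modeling error could flip the decision, which is exactly where an expert needs to be involved.

```python
# Illustrative only: the margin and names are assumptions, not taken from any real clinical model.
TREATMENT_THRESHOLD = 0.30  # cutoff from the example in the text


def decide(probability: float, margin: float = 0.005) -> str:
    """Return a decision, flagging estimates whose error margin straddles the cutoff."""
    if abs(probability - TREATMENT_THRESHOLD) <= margin:
        # Too close to call: a small estimation error would change the outcome.
        return "expert review"
    return "treat" if probability > TREATMENT_THRESHOLD else "do not treat"


for p in (0.10, 0.299, 0.301, 0.60):
    print(f"{p:.1%} -> {decide(p)}")
```

An estimate far from the cutoff yields the same decision even if the model is somewhat off; an estimate near the cutoff does not, so the quality of the model matters most there.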

Challenge the experts to detect biases: Advanced analytics and AI can easily lead to decisions that are considered biased. This is difficult in part because the purpose of analysis is to discriminate, that is, to base choices and decisions on certain variables. (Send this offer to that older man, but not that younger woman, because we think they’ll exhibit different buying behaviors in response.) So the big question is when such discrimination is actually acceptable and even good, and when it is inherently problematic, unfair, and dangerous to a company’s reputation.

Take the example of Goldman Sachs, which was accused of discrimination for offering less credit on an Apple credit card to women than to men. In response, Goldman Sachs said it did not use gender in its model, only factors such as credit history and income. However, one could argue that credit history and income are correlated with gender and that the use of these variables penalizes women, who tend to earn less money on average and who have historically had fewer opportunities to build credit. When using outputs that discriminate, decision makers and data professionals need to understand how the data was generated and how the variables are interrelated, as well as how to measure things like differential treatment and more. A company should never put its reputation on the line by asking a citizen data scientist to determine alone whether a model is biased.
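The proxy effect described above can be illustrated with a small Python sketch. Everything in it is hypothetical: the applicant records are made up, and the 20%-of-income rule is a toy stand-in, not any real lender's model. It shows how a rule that never sees gender can still produce different average outcomes by gender when an input such as income is correlated with it.

```python
# Hypothetical applicants: the toy "model" below never sees gender, only income.
applicants = [
    {"gender": "F", "income": 48_000},
    {"gender": "F", "income": 55_000},
    {"gender": "F", "income": 90_000},
    {"gender": "M", "income": 60_000},
    {"gender": "M", "income": 85_000},
    {"gender": "M", "income": 95_000},
]


def credit_limit(income: int) -> float:
    """Toy rule (an assumption for illustration): limit is 20% of income."""
    return 0.2 * income


def mean_limit_by_group(rows):
    """Average the assigned credit limit within each gender group."""
    totals, counts = {}, {}
    for row in rows:
        group = row["gender"]
        totals[group] = totals.get(group, 0.0) + credit_limit(row["income"])
        counts[group] = counts.get(group, 0) + 1
    return {group: totals[group] / counts[group] for group in totals}


means = mean_limit_by_group(applicants)
ratio = means["F"] / means["M"]  # one simple measure of differential outcomes
print(means, f"ratio={ratio:.2f}")
```

Comparing group averages is only a first check. Deciding whether a gap like this is justified or unfair requires understanding how the data was generated, which is the judgment the text argues should stay with experts.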

The democratization of data has its merits, but it comes with challenges. Giving everyone the keys doesn’t make them experts, and drawing the wrong conclusions from data can be catastrophic. New software tools can allow everyone to use data, but don’t confuse this widespread access with real expertise.