Date : 2016-11-28 | From : hbr.org

“AI,” “big data,” and “machine learning” are all trending buzzwords, and you might be curious about how they apply to your domain. You might even have startups beating down your door, pitching you their new “AI-powered” product. So how can you know which problems in your business are amenable to machine learning? To decide, you need to think about the problem to be solved and the available data, and ask questions about feasibility, intuition, and expectations.

Start by distinguishing between automation problems and learning problems. Machine learning can help automate your processes, but not all automation problems require learning.

Automation without learning is appropriate when the problem is relatively straightforward. These are the kinds of tasks where you have a clear, predefined sequence of steps that is currently being executed by a human, but that could conceivably be transitioned to a machine. This sort of automation has been happening in businesses for decades. Screening incoming data from an outside data provider for well-defined potential errors is an example of a problem ready for automation. (For example, hedge funds automatically filter out bad data such as a negative value for trading volume, which can’t be negative.) On the other hand, encoding human language into a structured dataset is just a tad too ambitious for a straightforward set of rules.
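To make the trading-volume example concrete, here is a minimal sketch of automation without learning. The `Quote` record layout and the `screen_quotes` function are hypothetical names invented for illustration; the only rule applied is the one mentioned above, that trading volume can't be negative.

```python
from dataclasses import dataclass


@dataclass
class Quote:
    # Hypothetical record layout for one row from an outside data provider.
    ticker: str
    volume: int


def screen_quotes(quotes):
    """Split records into (clean, rejected) using a fixed, predefined rule."""
    clean, rejected = [], []
    for quote in quotes:
        # Trading volume can't be negative, so a negative value is bad data.
        if quote.volume < 0:
            rejected.append(quote)
        else:
            clean.append(quote)
    return clean, rejected


# Made-up sample rows, for illustration only.
sample = [Quote("AAA", 1200), Quote("BBB", -50), Quote("CCC", 0)]
clean, rejected = screen_quotes(sample)
print([q.ticker for q in clean])     # tickers that passed the rule
print([q.ticker for q in rejected])  # tickers flagged as bad data
```

Nothing here is learned from data: the rule is written down in advance, which is what makes the task a candidate for plain automation.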

For the second type of problem, standard automation is not enough – it requires learning from data. And we now venture into the arena of machine learning. Machine learning, at its core, is a set of statistical methods meant to find patterns of predictability in datasets. These methods are great at determining how certain features of the data are related to the outcomes you are interested in. What these methods cannot do is access any knowledge outside of the data you provide. For example, researchers at the University of Pittsburgh in the late 1990s evaluated machine learning algorithms for predicting mortality rates from pneumonia. The algorithms recommended that hospitals send home pneumonia patients who were also asthma sufferers, estimating their risk of death from pneumonia to be lower. It turned out that the dataset fed into the algorithms did not account for the fact that asthma sufferers had been immediately sent to intensive care, and had fared better only due to the additional attention.
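The pneumonia story can be illustrated with a toy calculation. The counts below are made up for illustration and are not the study's data, and the simple per-group death rate stands in for whatever algorithms the researchers actually evaluated. The sketch only shows how a method that never looks at the intensive-care column ends up rating asthma sufferers as lower risk.

```python
# Made-up counts for illustration only; these are NOT the study's data.
# Each row: (has_asthma, sent_to_intensive_care, patients, deaths)
hypothetical_records = [
    (True, True, 100, 5),     # asthma sufferers, all sent to intensive care
    (False, False, 900, 90),  # other patients, standard care
]


def mortality_by_asthma(records):
    """Death rate per asthma group, ignoring the intensive-care column."""
    totals = {}
    for has_asthma, _icu, patients, deaths in records:
        seen, died = totals.get(has_asthma, (0, 0))
        totals[has_asthma] = (seen + patients, died + deaths)
    return {group: died / seen for group, (seen, died) in totals.items()}


rates = mortality_by_asthma(hypothetical_records)
print(f"asthma: {rates[True]:.0%}, no asthma: {rates[False]:.0%}")
```

In this toy setup the asthma group shows the lower death rate, but only because every one of those patients received intensive care. The calculation has no way to know that unless the information is part of what it is given.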

So what are good business problems for machine learning methods? Essentially, any problems that: (1) require prediction rather than causal inference; and (2) are sufficiently self-contained, or relatively insulated from outside influences. The first means that you are interested in understanding how, on average, certain aspects of the data relate to each other, and not in the causal channels of their relationship. Keep in mind that the statistical methods do not bring to the table the intuition, theory, or domain knowledge of human analysts. The second means that you are relatively certain that the data you feed to your learning algorithm includes more or less all there is to the problem. If, in the future, the thing you’re trying to predict changes unexpectedly – and no longer matches prior patterns in the data – the algorithm will not know what to make of it.

Examples of good machine learning problems include predicting the likelihood that a certain type of user will click on a certain kind of ad, or evaluating the extent to which a piece of text is similar to previous texts you have seen.

Bad examples include predicting profits from the introduction of a completely new and revolutionary product line, or extrapolating next year’s sales from past data, when an important new competitor just entered the market.

Once you verify that your problem is suitable for machine learning, the next step is to evaluate whether you have the right data to solve it. The data might come from you, or from an external provider. In the latter case, make sure to ask enough questions to get a good feel for the data’s scope and whether it is likely to be a good fit for your problem.

Say you have determined that your problem is a classic machine learning problem and you have the data to fit that problem. The last step of the process is your intuition check. Yes, intuition: machine learning methods, however proprietary and seemingly magical, are statistics. And statistics can be explained in intuitive terms. Instead of trusting that the brilliant proposed method will seamlessly work, ask lots of questions.

Get yourself comfortable with how the method works. Does the intuition of the method roughly make sense? Does it fit, conceptually, in your framework of the particular setting or problem you are dealing with? What makes this method especially well suited to your problem? If you are encoding a set of steps, perhaps sequential models or decision trees are a good choice. If you need to separate two classes of outcome, perhaps a binary support vector machine would be best aligned with your needs.

With understanding come more realistic expectations. Once you ask enough questions and receive enough answers to have an intuitive understanding of how the methodology works, you will see that it is far from magical. Every human makes mistakes, and every algorithm is error prone, too. For all but the simplest of problems, there will be times when things go wrong. The machine learning prediction engine will get things right on average but will reliably make mistakes. Mistakes will happen, and they will happen most often in ways that you cannot anticipate.

So the last step is to evaluate the extent to which you can allow for exceptions or statistical errors in your process. Is your problem the kind of problem where getting things right 80% of the time is enough? Can you deal with a 10% error rate? 5%? 1%? Are there certain kinds of errors that should never be allowed? Be clear and upfront about your needs and expectations, both with yourself and with your solution-provider. And once both of you are comfortably on the same page, go ahead. Armed with knowledge, understanding, and reasonable expectations, you are set to reap the benefits of machine learning. Just please be patient.
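One way to make these questions concrete is to write the tolerance down and check it against predictions on examples the method has not seen. The sketch below is only an illustration: the function names, the "ok"/"fraud" labels, and the sample values are all hypothetical, and a real evaluation would use far more examples.

```python
def error_report(actual, predicted, forbidden=()):
    """Return (error_rate, forbidden_hits) for paired outcomes.

    forbidden holds (actual, predicted) pairs that should never be allowed.
    """
    pairs = list(zip(actual, predicted))
    if not pairs:
        raise ValueError("need at least one example to evaluate")
    forbidden_pairs = set(forbidden)
    errors = [(a, p) for a, p in pairs if a != p]
    forbidden_hits = [pair for pair in errors if pair in forbidden_pairs]
    return len(errors) / len(pairs), forbidden_hits


def acceptable(actual, predicted, max_error_rate, forbidden=()):
    """True if the error rate is within tolerance and no forbidden error occurred."""
    rate, hits = error_report(actual, predicted, forbidden)
    return rate <= max_error_rate and not hits


# Made-up outcomes for ten held-out cases, for illustration only.
actual = ["ok", "ok", "fraud", "ok", "fraud", "ok", "ok", "ok", "ok", "ok"]
predicted = ["ok", "ok", "fraud", "fraud", "fraud", "ok", "ok", "ok", "ok", "ok"]

# Assumed business rule: a real fraud case labeled "ok" is never allowed.
never_allowed = [("fraud", "ok")]

print(acceptable(actual, predicted, 0.20, never_allowed))  # tolerance: 20% errors
print(acceptable(actual, predicted, 0.05, never_allowed))  # tolerance: 5% errors
```

The same predictions can pass under one tolerance and fail under another, which is why the acceptable error rate and any never-allowed errors need to be agreed on with the solution provider before the work starts.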