Do we need AutoML… or AutoDM (Automated Data Management)?

August 2, 2021 Steve

Instead of focusing on “Automated Machine Learning” or AutoML, maybe we should focus on “Automated Data Management” or AutoDM?

You probably know that feeling. You start a blog with some ideas to share, but everything changes once you get started. That’s what happened with this blog. I discussed the promise and potential of Automated Machine Learning (AutoML) in my blog “What Movies Can Teach Us About Prospering in an AI World”. It sounds pretty impressive.

So, I decided to conduct a LinkedIn poll to garner insights from real-world practitioners, folks who can see through the hype and BS (and that’s not Bill Schmarzo…or maybe it should be) regarding the potential ramifications of AutoML. It was these conversations that led to my epiphany. But before I dive into my epiphany, let’s cover a bit more about AutoML.

What is AutoML?

“Automated machine learning (AutoML) automates the application of machine learning to real-world problems. AutoML covers the complete pipeline from the raw data to the deployable machine learning model. The high degree of automation in AutoML allows non-experts to create ML models without being experts in machine learning. AutoML offers the advantages of producing simpler solutions, faster creation of those solutions, and models that often outperform hand-designed models.[1]” See Figure 1.

Figure 1: Image sourced from: “A Review of Azure Automated Machine Learning (AutoML)”

Man, that is quite a promise. But here’s the AutoML gotcha: to make AutoML work, data experts need to carry out significant data management work before getting to the algorithm selection and hyperparameter optimization benefits of AutoML. This includes:

Data Pre-processing, which includes data cleansing (detecting and correcting corrupt or inaccurate records), data editing (detecting and handling errors in the data), and data reduction (elimination of redundant data elements).
Data Wrangling, which transforms and maps data from one “raw” data format into a format that is usable by the AI/ML models.
Feature Engineering, which is the process of leveraging domain knowledge to identify and extract features (characteristics, properties, attributes) from raw data that are relevant to the problem being addressed.
Feature Extraction, which involves reducing the number of features or variables required to describe a large set of data. This likely requires domain knowledge to identify those features most relevant to the problem being addressed.
Feature Selection, which is the process of selecting a subset of relevant features (data variables) for use in model development. Again, this likely requires domain knowledge to identify those features most relevant to the problem being addressed.
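
To make a few of these steps concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions, not how any AutoML product does it: the column names (visits, spend, store_id, churn), the median fill rule, and the use of absolute Pearson correlation as the selection score are all hypothetical choices made for this example.

```python
from statistics import median


def clean_rows(rows):
    """Data cleansing: drop exact duplicates, fill missing values with the column median."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))  # keys are unique, so values are never compared
        if key not in seen:
            seen.add(key)
            unique.append(dict(row))
    for col in (unique[0] if unique else {}):
        observed = [r[col] for r in unique if r[col] is not None]
        if not observed:
            continue  # nothing to impute from
        fill = median(observed)
        for r in unique:
            if r[col] is None:
                r[col] = fill
    return unique


def drop_constant_columns(rows):
    """Data reduction: remove columns that carry a single value and therefore no signal."""
    constant = [c for c in rows[0] if len({r[c] for r in rows}) == 1]
    return [{c: v for c, v in r.items() if c not in constant} for r in rows]


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series has no variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def select_features(rows, target, k):
    """Feature selection: keep the k columns most correlated (absolute value) with the target."""
    ys = [r[target] for r in rows]
    scores = {
        col: abs(pearson([r[col] for r in rows], ys))
        for col in rows[0]
        if col != target
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical raw records: one duplicate row, one missing value, one constant column.
raw = [
    {"visits": 1, "spend": 10.0, "store_id": 7, "churn": 1},
    {"visits": 1, "spend": 10.0, "store_id": 7, "churn": 1},
    {"visits": 4, "spend": None, "store_id": 7, "churn": 0},
    {"visits": 5, "spend": 50.0, "store_id": 7, "churn": 0},
    {"visits": 2, "spend": 30.0, "store_id": 7, "churn": 1},
]

clean = clean_rows(raw)
reduced = drop_constant_columns(clean)
top_features = select_features(reduced, target="churn", k=2)
print(top_features)
```

Even in this toy form, each step embeds a judgment call (is the median the right fill value? is a constant column really redundant?), which is exactly the domain knowledge that is hard to automate.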

That’s a whole lot of work to do before even getting into the AutoML space. But us old data dogs already knew that 80% of the analytics work was in data preparation. It’s just that today’s AI/ML generation needs to hear that, and who better to deliver that message than one of the industry’s AI/ML spiritual leaders: Andrew Ng.

Andrew Ng: Become More Data-Focused and Less Algorithm-Focused

Here is a must-watch video by Andrew titled “Big Data to Good Data: Andrew Ng Urges ML Community to Be More Data-Centric And Less Model-Centric”. There are a lot of great insights in the video, but what struck me was Andrew’s own epiphany on the critical importance of spending less time tweaking the AI/ML models (algorithms) and investing more time in improving the quality and completeness of the data that feeds the AI/ML models. Andrew’s message is quite clear: while tweaking the AI/ML algorithms will help, bigger improvements in overall AI/ML model performance and accuracy can be achieved by improving the quality and completeness of the data that feeds the AI/ML algorithms (see Figure 2).

Figure 2: Transitioning from Algorithm-centric to Data-centric AI/ML Model Development

And note that these improvements in the quality and completeness of the data that feeds the AI/ML models will benefit all AI/ML models that use that same data! Sounds a lot like the Schmarzo Economic Digital Asset Valuation Theorem, the economic theorem on sharing, reusing, and refining the organization’s data and analytic assets.

In the video, Andrew shared hard data with respect to the improvement in results from tweaking the model (algorithm) versus improving data quality and completeness (see Figure 3).

Figure 3: Improving the Code versus Improving the Data

In the three use cases in Figure 3, there was virtually no improvement in AI/ML model accuracy and effectiveness from tweaking the AI/ML models. However, efforts applied towards improving the data yielded quantifiable improvements, and in one case, significant improvements!

LinkedIn Poll: What Is True About AutoML?

Figure 4 shows the LinkedIn poll results where I asked participants to select the option they felt was most true about AutoML (sorry, only four options are available on LinkedIn).

Figure 4: LinkedIn AutoML Poll

If we factor the “All of the Above” option into the top two options, we get the following results:

62% of respondents feel AutoML will help automate data science model development
56% of respondents feel AutoML will enable business users to build their own ML models

Unfortunately, not having a “None of the Above” option was unfair because the results of the poll differ from the poll comments. Here is my summary of those comments:

AutoML will not be replacing data scientists anytime soon. However, AutoML can help jumpstart the Data Science process in ML model exploration, model selection, and hyperparameter tuning.
AutoML will not suddenly turn business analysts into data scientists. That’s because ~80% of the ML model development effort is still focused on data preparation. To quote one person, “AutoML by untrained users would be like giving an elite athlete’s training plan and diet plan to average people and expecting elite results.”
AutoML will be even more lacking as data scientists’ data preparation work evolves to semi-structured data (log files) and unstructured data (text, images, audio, video, smell, waves).
Realizing the AutoML promise will require a robust metadata strategy and plan.
AutoML could help in AI/ML product management as the number of production ML models grows into the hundreds and thousands. But AutoML would need an automated set-up to monitor and correct for ML data drift while in production.
Automating the ML process is just a small step. AutoML results need to be explainable to aid in the evaluation of the analytic results using techniques such as SHAP or CDD.
AutoML is a commodification of the loops and utilities that ML folks use to run through various ML algorithms, tune hyper-parameters, create features, and calculate metrics of all kinds.
AutoML could be a useful tool to align teams around an organization’s ML aspirations. A field only prospers when everyone from every discipline can use it to try different ideas.
For AutoML to be successful, it is critically important to scope, validate, and plan the operationalization of the problem that one is trying to solve (e.g., Is the target variable here *really* what you want to model? Are all of the inputs available in a production environment? What decisions will this model support? How will you monitor the ongoing accuracy and usage of the model? How will you govern changes to the model, including commissioning and decommissioning it?). Hint: see the Hypothesis Development Canvas.
Finally, is AutoML a marketing ploy by cloud vendors to broaden their appeal to include enabling business users to build their own ML models?
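
The comment above about monitoring for data drift in production can be sketched with a simple check. This is a minimal illustration, assuming the Population Stability Index (PSI) as the drift measure and a 0.25 alert threshold as a rule of thumb; neither choice comes from the poll comments, and the sample values are hypothetical.

```python
import math


def psi(expected, actual, bins=4):
    """Population Stability Index between a training sample and a production sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the training data is constant

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values to edge bins
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical feature values seen at training time and later in production.
training = [10, 12, 14, 16, 18, 20, 22, 24]
production = [20, 22, 24, 26, 28, 30, 32, 34]

DRIFT_THRESHOLD = 0.25  # assumed rule-of-thumb cut-off, tune per use case
drift_score = psi(training, production)
needs_review = drift_score > DRIFT_THRESHOLD
print(f"PSI={drift_score:.2f}, needs_review={needs_review}")
```

A check like this only flags that the input data has shifted; deciding whether to retrain, repair the upstream data, or retire the model is still a data management decision.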

I recommend that you check out the chat stream. The comments were very enlightening.

AutoML or AutoDM Summary

My takeaway is that the concept of AutoML is good, but the scope of the AutoML vision is missing 80% of AI/ML model development and operationalization: providing the high-quality and complete data that feeds the AI/ML models. Figure 5 from “Big Data to Good Data: Andrew Ng Urges ML Community to Be More Data-Centric And Less Model-Centric” nicely summarizes the broader AutoML challenge with respect to data management.

Figure 5: Scope of What AutoML Needs to Address

Instead of focusing on “Automated Machine Learning” or AutoML, maybe we should focus on “Automated Data Management” or AutoDM?

Now that’s a thought…

[1] Wikipedia, AutoML https://en.wikipedia.org/wiki/Automated_machine_learning
