From model to market: Scaling data science with MLOps

Quick summary: Machine Learning Operations (MLOps) ensures quality of machine learning systems over time and reduces lead time for moving a model into production.

Your data science team has gathered thorough requirements for a pressing business problem, collected and cleaned all the required data, and fit an effective, validated model that is ready to provide real value to real business users.

Mission accomplished, right?

Not so fast. First, you need to figure out how to get the model into the hands of the business users and how to provide quality assurance for model performance over the long term. Your team needs to consider the following:

“How will my model perform on new, unseen data?”
“Is my model training process reproducible?”
“Can my model scale to the needs of the business?”
“If changes are required, how long will it take to make those changes in production?”

It turns out this is a difficult problem, and one that has not improved over time. Rexer Analytics, which regularly surveys the data science industry, reported in its 2017 survey that only 13 percent of data scientists say their models always get deployed, a figure that has not improved since 2009, when the question first appeared on the survey.

One contributor to this issue is that machine learning production systems have a variety of moving parts outside of pure modeling code, including data collection and processing code, environment configuration, process management code, and monitoring code, to name a few. This complexity creates opportunity for technical debt and can increase lead time for both changes and deployment. ML practitioners also cite scaling up and versioning/reproducibility of their models as the two largest challenges their organizations face.

How can this be improved? By borrowing aspects of DevOps. There has been a recent rise in Machine Learning Operations (MLOps), a set of guiding principles focused on automation, collaboration, reproducibility, monitoring, and effective model scaling. The overall goal is ensuring quality of machine learning systems over time and reducing lead time for moving a model into production.

Automation (CI/CD)

Training and deploying a model is a multi-step process that often falls into the trap of being treated as a one-off task. Training and deployment should instead be contained in an automated pipeline that can be triggered not only after code changes but also to retrain on new data, either periodically or when a drift metric computed on recent data exceeds an established threshold (more on drift metrics below).
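The three triggers described above (code changes, a periodic schedule, and drift) can be sketched as a small decision function. This is only an illustration: the `RetrainPolicy` class, its field names, and the default values of 30 days and 0.2 are hypothetical, and a real pipeline would usually express this logic in its CI/CD or orchestration tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RetrainPolicy:
    """Hypothetical policy; field names and defaults are illustrative only."""
    max_model_age: timedelta = timedelta(days=30)  # periodic retraining interval
    drift_threshold: float = 0.2                   # assumed limit for the drift metric


def retrain_reasons(policy, code_changed, last_trained, now, drift_score):
    """Return every reason the training pipeline should be triggered."""
    reasons = []
    if code_changed:
        reasons.append("code_change")
    if now - last_trained >= policy.max_model_age:
        reasons.append("scheduled")
    if drift_score > policy.drift_threshold:
        reasons.append("drift")
    return reasons
```

An empty list means the current production model stays in place; any non-empty list would start a pipeline run, and the reasons can be logged alongside the resulting model.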

By introducing CI/CD and operationalizing model training and monitoring, MLOps provides repeatable, consistent mechanisms for moving models to the target environment, and it opens additional opportunities to incorporate automated integration testing and parity across multiple environments.

Reproducibility—Data and Model Versioning

It’s generally an expectation in software development environments to use a version control system such as Git or SVN for tracking code and configuration artifacts. In data science workflows, there’s the added complexity of being able to create reproducible results (it is data science, after all). This includes being able to retrieve a previous version of the target training/test datasets to reproduce model results.

Data versioning

With the advent of cheap, persistent storage (such as AWS S3), it becomes trivial to version small to medium datasets with modern tools such as Data Version Control (DVC), Metaflow, and SageMaker. In some workflows, such as SageMaker, this is done automatically alongside model training. For big data problems, repeatable splitting using numeric hashes is a common design pattern, preferred over random value generation because the same record always lands in the same split.
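As a minimal sketch of hash-based splitting, the function below assigns each record to a split from a hash of a stable key. The function name, the choice of SHA-256, and the 100-bucket scheme are assumptions made for illustration; any stable hash of a stable record identifier would serve the same purpose.

```python
import hashlib


def assign_split(record_key: str, test_fraction: float = 0.2) -> str:
    """Deterministically assign a record to 'train' or 'test' from its key."""
    digest = hashlib.sha256(record_key.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in [0, 100)
    return "test" if bucket < round(test_fraction * 100) else "train"
```

Because the assignment depends only on the key, rerunning the split on a later, larger snapshot of the data keeps every existing record in its original split, which a random draw without a stored seed and ordering cannot guarantee.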

The cultural impact of making the data assets used for model generation shareable and discoverable across a team is invaluable. It encourages an emphasis on reproducibility and lets the team reuse existing work rather than reinvent the wheel when developing new models.

Model versioning

Models trained through the ML pipeline should be saved into a model registry where experiments can be tracked over time. This metadata store should contain the pipeline parameter configuration and high-level metrics on experimentation results. A common workflow for storing model artifacts and metadata makes it easier to decide which models should be moved into production environments.
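To make the idea concrete, here is a deliberately small in-memory sketch of a registry that stores an artifact location, pipeline parameters, and metrics for each trained model. The class and method names are hypothetical and do not correspond to any particular registry product; a production registry would persist this metadata rather than hold it in memory.

```python
from dataclasses import dataclass, field


@dataclass
class ModelRecord:
    version: int
    artifact_path: str  # where the serialized model is stored
    params: dict        # pipeline parameter configuration
    metrics: dict       # high-level experiment results


@dataclass
class ModelRegistry:
    """Hypothetical in-memory registry, for illustration only."""
    records: list = field(default_factory=list)

    def register(self, artifact_path, params, metrics):
        """Store a new model version and return its record."""
        record = ModelRecord(len(self.records) + 1, artifact_path,
                             dict(params), dict(metrics))
        self.records.append(record)
        return record

    def best(self, metric, higher_is_better=True):
        """Return the record with the best value for the given metric."""
        candidates = [r for r in self.records if metric in r.metrics]
        if not candidates:
            raise ValueError(f"no registered model reports metric {metric!r}")
        pick = max if higher_is_better else min
        return pick(candidates, key=lambda r: r.metrics[metric])
```

Keeping parameters and metrics next to each artifact is what lets a team compare candidates on equal terms before promoting one to production.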

Data drift

The cross-validation workflow in machine learning is effective for getting an idea of how a model will generalize on unseen data. The complication with hosting a model in production is that it is impossible to know future and emerging trends in the data that can affect how the model behaves.

Data drift metrics such as the population stability index can give insight into how recent periods of data inflow compare to the historical data used to train the original models. In addition to data drift, model drift is worth watching. Retraining on new data and comparing feature importance measures, such as permutation importance, against those of the original model can help the team understand how the model reacts to new data, whether the key drivers in the original model are still relevant, and whether there is some indication of a systematic change in the data. Drift metrics like these can be used to trigger a model retraining workflow to produce candidates for new production models.
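As a rough sketch of the population stability index (PSI) for a single numeric feature, the function below bins the training data, measures the share of training and recent values in each bin, and sums (actual - expected) * ln(actual / expected) over the bins. The equal-width binning, the default of 10 bins, and the small floor `eps` used to avoid taking the log of zero are illustrative choices rather than a standard.

```python
import math


def population_stability_index(expected, actual, n_bins=10, eps=1e-6):
    """PSI between a training sample (expected) and a recent sample (actual)."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all values are equal

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the training range are clamped into the edge bins.
            counts[min(max(idx, 0), n_bins - 1)] += 1
        return [max(c / len(values), eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))
```

A frequently quoted rule of thumb treats values below about 0.1 as stable and values above about 0.25 as a significant shift, but the threshold that should trigger retraining is a judgment call for each model and feature.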

Scalability

Pipeline design and deployment architecture need to include proper scaling considerations. Scaling issues can occur in data collection, preprocessing, model training, or deployment, and the tool sets available to address them vary across IT organizations and cloud platforms. Modern cloud providers have solutions that enable distributed training of compute-intensive deep neural nets, as well as scalable deployment mechanisms in the form of either an API call or a batch process. There is a tradeoff between opting for a managed solution and building one internally for hosting, and the many considerations involved in designing the architecture that works best for your solution are beyond the scope of this article.

The path forward

Adoption of machine learning comes with its challenges, and these challenges can make it difficult to implement ML solutions in a production environment. Fitting effective models is important, but by investing in an MLOps strategy, data scientists can be empowered to experiment and move models into production faster, knowing that the models they create will be able to scale to meet the demand required by the business.

Alex Johnson is a Machine Learning Architect in Logic20/20’s Advanced Analytics practice. Specializing in wildfire risk modeling for utilities, Alex combines expertise in advanced analytics, cloud engineering, and machine learning to develop impactful solutions. With a background in environmental science and GIS, he brings a unique perspective to wildfire mitigation planning, leveraging data-driven approaches to enhance utility operations and manage risk effectively.