In recent times, Machine Learning has gained importance due to its ability to guide businesses toward accurate, data-driven decisions. Under the hood, Machine Learning is an iterative process: a series of training jobs is run to optimize a model’s predictive performance.
Without the right methods, it is easy to lose track of experiments across training datasets, hyperparameters, evaluation metrics, and model artifacts. In the long run, this becomes a problem when you need to reproduce an experiment.
In this article, I will discuss my top six model versioning tools that can greatly improve your workflow. The outline of this article is as follows:
- What is model versioning and why is it so important?
- Types of model versioning tools
- Model versioning vs data versioning
- What tools can be used for model versioning?
- How do these tools compare with each other?
What is model versioning and why is it so important?
Model versioning involves tracking the changes made to an ML model that has already been built. Put differently, it is the process of recording each change to the configuration of an ML model. From another perspective, model versioning is a feature that helps Machine Learning Engineers, Data Scientists, and related personnel create and keep multiple versions of the same model.
Think of it as a way of taking notes on the changes you make to the model, such as tweaking hyperparameters or retraining the model with more data.
In model versioning, a number of things need to be versioned to help us keep track of important changes. I’ll list and explain them below:
- Implementation code: From the early days of model building to the optimization stages, the source code of the model plays an important role. This code changes significantly during optimization, and those changes can easily be lost if they are not tracked. For this reason, code is one of the things taken into consideration during the model versioning process.
- Data: In some cases, training data changes significantly from its initial state during model optimization, for example when new features are engineered from existing ones. There is also metadata (data about your training data and model) to consider versioning. Metadata can change many times without the training data itself changing. We need to be able to track these changes through versioning.
- Model: The model is a product of the two previous entities. As stated in their explanations, an ML model changes at different points of the optimization phase through hyperparameter settings, model artifacts, and learned coefficients. Versioning keeps a record of the different versions of a Machine Learning model.
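To make the three entities concrete, here is a minimal sketch that fingerprints the code, the data, and the model configuration, then shows which of them changed between two versions. It uses only the Python standard library; the function names, the training snippet, and the sample rows are hypothetical and are not taken from any of the tools in this article.

```python
import hashlib
import json


def fingerprint(payload: bytes) -> str:
    """Return a short, stable content hash for a blob of bytes."""
    return hashlib.sha256(payload).hexdigest()[:12]


def build_version_record(code: str, data_rows: list, hyperparams: dict) -> dict:
    """Bundle fingerprints of the versioned entities into one record."""
    # sort_keys keeps the serialization (and so the hash) independent of dict order
    data_blob = json.dumps(data_rows, sort_keys=True).encode("utf-8")
    params_blob = json.dumps(hyperparams, sort_keys=True).encode("utf-8")
    return {
        "code": fingerprint(code.encode("utf-8")),
        "data": fingerprint(data_blob),
        "hyperparams": fingerprint(params_blob),
    }


# Hypothetical inputs, for illustration only
train_code = "model = LogisticRegression(C=1.0)"
rows = [[5.1, 3.5, 0], [6.2, 2.9, 1]]

v1 = build_version_record(train_code, rows, {"C": 1.0, "max_iter": 100})
v2 = build_version_record(train_code, rows, {"C": 0.5, "max_iter": 100})

# Only the hyperparameters differ between v1 and v2
changed = [key for key in v1 if v1[key] != v2[key]]
print(changed)
```

Storing a record like this next to each trained model makes it possible to tell at a glance whether a change in performance came from the code, the data, or the configuration.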
Now, we have defined model versioning and the entities that need to be versioned. But what is the fuss about the concept? How can it help us improve predictive modelling?
Advantages of model versioning
- Model versioning helps us keep track of the implementation code and the models we build, so we can follow the development cycle properly (very important when collaborating on a project).
- Each model version can have its corresponding development code and a description of its performance (with evaluation metrics). This lets us identify the dependencies that improved or reduced the performance of a model.
- Model versioning eases the process of model development and aids AI governance and accountability (a gap many companies in the space have sought to fill in recent times). This is particularly important for neural network based models used in self-driving cars, AI-powered health applications, or stock trading applications.
Model versioning vs data versioning
In some cases, the differences between model and data versioning are quite clear. At other times, data practitioners may confuse the two and use the terms interchangeably.
As explained above, model versioning refers to tracking the changes, and in some cases improvements, made to a model. These changes can occur due to optimization efforts, changes in training data, and so on. See the image below for a pictorial representation of model versioning. In this image, we see that different model versions have different F1 scores. The ML Engineer or Data Scientist has presumably experimented with different hyperparameters to improve the metric.
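As a rough illustration of this idea, the sketch below keeps model versions in memory together with their hyperparameters and F1 scores, and picks the best one. The class names, hyperparameters, and scores are made up for this example and do not come from any of the tools discussed later.

```python
from dataclasses import dataclass, field


@dataclass
class ModelVersion:
    version: int
    hyperparams: dict
    f1_score: float


@dataclass
class ModelRegistry:
    versions: list = field(default_factory=list)

    def register(self, hyperparams: dict, f1_score: float) -> ModelVersion:
        """Store a new version; version numbers start at 1 and only increase."""
        entry = ModelVersion(len(self.versions) + 1, dict(hyperparams), f1_score)
        self.versions.append(entry)
        return entry

    def best(self) -> ModelVersion:
        """Return the version with the highest F1 score."""
        return max(self.versions, key=lambda v: v.f1_score)


# Hypothetical experiments: same model, different tree depth
registry = ModelRegistry()
registry.register({"max_depth": 3}, f1_score=0.71)
registry.register({"max_depth": 5}, f1_score=0.78)
registry.register({"max_depth": 9}, f1_score=0.74)

best = registry.best()
print(best.version, best.hyperparams, best.f1_score)
```

A real registry would also persist the model artifact and link each version to the code and data that produced it, but the comparison step looks much the same.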
Data versioning, on the other hand, also involves tracking changes, but this time to a dataset. The data you work with tends to change over time due to feature engineering efforts, seasonality, and so on. This can happen, for instance, when the original dataset is reprocessed, corrected, or extended with additional data. So, it is important that you track these changes. See the image below for a better explanation.
In the image above, we see how data goes through changes. Each of these changes produces a new version of the data, which must be stored.
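Data versioning can be sketched in a few lines by keying every snapshot of a dataset on a hash of its content. The example below is a simplified, in-memory illustration under that assumption; the DatasetStore class and the sample CSV bytes are hypothetical, and real tools add remote storage and deduplication on top of the same principle.

```python
import hashlib


class DatasetStore:
    """Keeps every distinct snapshot of a dataset, keyed by content hash."""

    def __init__(self):
        self.snapshots = {}  # content hash -> raw bytes
        self.history = []    # content hashes in commit order

    def commit(self, content: bytes) -> str:
        digest = hashlib.sha256(content).hexdigest()
        # Identical content maps to the same hash, so it is stored only once
        self.snapshots.setdefault(digest, content)
        if not self.history or self.history[-1] != digest:
            self.history.append(digest)
        return digest

    def checkout(self, digest: str) -> bytes:
        """Return the exact bytes that were committed under this version."""
        return self.snapshots[digest]


store = DatasetStore()
raw = b"sepal_length,species\n5.1,setosa\n"
extended = raw + b"6.2,virginica\n"

v1 = store.commit(raw)
store.commit(raw)            # unchanged data: no new version is recorded
v2 = store.commit(extended)  # appended rows: a new version

print(len(store.history))
```

Because a version is identified by its content, training on version v1 months later yields exactly the same bytes, which is what makes an experiment reproducible.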
Key Takeaways: Model versioning is an important aspect of MLOps that involves tracking the changes made to your model. The implementation code, the training data, and the model are the entities that should be considered in the versioning process. Lastly, model versioning is not the same as data versioning; they mean quite different things.
Best tools for model versioning
Here I will discuss six different tools for model versioning, outlining steps to get started and their different capabilities.
Neptune AI is an MLOps tool that allows you to track and compare model versions. It is primarily built for ML and data science teams to run and track experiments. The platform allows you to store, version, query, and organize models and associated metadata. Specifically, you can store and version things like your training data, code, environment configuration versions, parameters, hyperparameters and evaluation metrics, testset predictions, etc.
Below is the screenshot of Neptune’s model registry UI where you can store metadata about your model.
In addition to the model registry and metadata storage, Neptune AI also has an easy-to-use interface for comparing your models in terms of performance. In the image below, we see how different Keras models are compared in terms of training and validation accuracy, as well as creation and modification time.
An added advantage to using Neptune AI for model versioning is the ability to re-run a training job. Neptune AI allows you to rerun a training job from the past by just rerunning your implementation code (which is part of the things you should version). See below for an image that explains how this is possible:
Key takeaways: Neptune AI is an MLOps tool that lets you experiment with hyperparameters, compare model versions based on evaluation metrics, and store model artifacts, metadata, versions of training data, and implementation code. It has the added advantage of letting you re-run a training job, since the desired implementation code has been versioned.
ModelDB is an open-source MLOps tool that allows you to version your implementation code, data, and model artifacts. It lets you manage models and pipelines built in different programming languages (Python, C, Java, and so on) in their native environments and configurations. Model versioning with ModelDB is easy through your IDE or development environment.
The first step to using ModelDB is to make sure it is running in Docker. You can easily do this by cloning the repository to your local machine and running this:
After this, all you need to do is instantiate or set up a ModelDB project with:
Version your training data:
After which you can run experiments and store metrics, model artifacts, hyperparameters, and visit the web interface to see the items you have versioned. Here is a snapshot of the web UI:
Key takeaways: ModelDB is an open-source ML tool (which can mean broader community support) that allows you to version implementation code, datasets, and model versions. It is language-agnostic, allowing you to use a variety of programming languages in their native environments.
DVC, otherwise known as Data Version Control, is an open-source MLOps tool that allows you to do version control regardless of your choice of programming language. With a git-like experience, DVC allows you to version your training sets, model artifacts, and metadata in a simple, fast, and efficient way. Storage of large files in DVC is possible with connections to Amazon S3, Google Drive, Google Cloud Storage, and more.
DVC’s features and functionalities include metric tracking through commands that list all model version branches and their associated metric values, an ML pipeline framework for related steps, a language-agnostic design (whatever language your implementation code is written in, you can still work with it), the ability to track failures, and much more.
In using DVC, the first step is to ensure it is installed. You can achieve this by doing the following:
The next step is to have your implementation code ready, create a virtual environment with the venv module, install dependencies and requirements, and then start training your model. After your model has been trained, you can version it. With DVC, versioning is as easy as running the dvc add command on your data, model, and related files, and then committing the small .dvc files it generates with git. Here is a sample code that shows how to use this command:
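The original snippet is not available here, so the following is a hedged Python sketch of the usual sequence: dvc add on the file, then git add and git commit on the pointer file DVC writes. It assumes DVC and git are installed and that the project has already been initialized with git init and dvc init. The helper names, the file path models/model.pkl, and the commit message are hypothetical. By default the helper only prints the commands instead of running them.

```python
import os
import subprocess


def dvc_version_commands(path: str, message: str) -> list:
    """Build the commands that put one file under DVC and record it in git."""
    gitignore = os.path.join(os.path.dirname(path), ".gitignore")
    return [
        ["dvc", "add", path],                      # writes <path>.dvc and updates .gitignore
        ["git", "add", path + ".dvc", gitignore],  # git tracks the small pointer file
        ["git", "commit", "-m", message],
    ]


def version_file(path: str, message: str, dry_run: bool = True) -> list:
    """Print the commands (dry run) or execute them one after another."""
    commands = dvc_version_commands(path, message)
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)
    return commands


# Hypothetical model file; dry_run=True only prints the commands
commands = version_file("models/model.pkl", "Track model v1 with DVC")
```

Calling version_file with dry_run=False inside an initialized repository runs the same three commands. The large file itself stays out of git; only the .dvc pointer is committed.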
Key takeaways: DVC offers a git-like experience for versioning ML models. With this tool, you can track evaluation metrics, build a pipeline for the necessary preprocessing and training steps, and track failures.
MLflow is an open-source project that allows Machine Learning Engineers to manage the ML lifecycle. Like other platforms, MLflow allows you to version data and models and to repackage code for reproducible runs. The platform integrates well with a number of ML libraries and tools such as TensorFlow, PyTorch, and XGBoost, as well as Apache Spark.
MLflow offers four distinct capabilities. These include:
- MLflow Tracking: Track experiments by logging parameters, metrics, versions of code, and output files. Log and query experiments through Python, Java APIs, and so on.
- MLflow Projects: Organize implementation code in a reproducible way, following coding conventions. With this, you can rerun your code.
- MLflow Models: Package ML models in a standardized way. With this, your models can be used or interacted with through a REST API. Batch prediction is also possible with Apache Spark.
- Model Registry: Here you can version your models and have a model lineage that depicts the model’s development life cycle.
Versioning your model is quite easy with MLflow. The only requirement is that you have registered the first version of the model. See below for the associated UI.
Here you can register the name of your model and upload related metadata and documentation of how the model works. Registering a model version is similarly simple and possible on the same page. When you click Register model, you can indicate which model your new version belongs to through a drop-down menu. See the UI below for more explanation.
Fetching a model that you previously versioned takes only a few lines of code. See an example below:
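A minimal sketch of fetching a registered version, assuming MLflow is installed and a tracking server with a model registry is configured. Registered versions are addressed with a models:/ URI and loaded through mlflow.pyfunc.load_model. The helper names and the model name iris-classifier are hypothetical placeholders.

```python
def build_model_uri(name: str, version: int) -> str:
    """Return the registry URI MLflow uses for a specific model version."""
    return f"models:/{name}/{version}"


def load_registered_model(name: str, version: int):
    """Load one registered version as a generic Python function model."""
    # Imported lazily so the URI helper works without MLflow installed
    import mlflow.pyfunc

    return mlflow.pyfunc.load_model(model_uri=build_model_uri(name, version))


# Hypothetical model name; replace it with a model registered on your server
uri = build_model_uri("iris-classifier", 1)
print(uri)
```

With a registry available, load_registered_model("iris-classifier", 1) returns an object whose predict method can be called on new data, regardless of which ML library produced the model.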
Key takeaways: MLflow is one of the top MLOps tools for model versioning. With MLflow, you can log experiments, organize implementation code in a reproducible way, and develop a model lineage (model development history) through model version registrations.
Pachyderm is a data and model versioning platform that helps data scientists and Machine Learning engineers store different versions of training data in an orderly fashion, offering traceability through the different changes your data goes through. This tool works on four checkpoints in the ML workflow: data preparation, experimentation (training your model with different versions of the data, setting different hyperparameters, and ascertaining suitable metrics), training, and deployment into production.
Data preparation with Pachyderm basically involves ingestion from data sources, processing and transformations, model training, and serving. With Pachyderm, you can have all your data in a single location, organize updated versions of your data, run data transformation jobs (which must run in a Docker container), and keep versions of your data.
Pachyderm runs on top of Kubernetes clusters and stores data and artifacts on Amazon S3. Installing and initializing Pachyderm starts with some dependencies and requirements that need to be satisfied. The first is to add the Homebrew tap, which gives you access to additional package repositories. You can do this in your terminal with the following lines of code:
After this, you install components locally and deploy Pachyderm over a Kubernetes cluster:
You can create repositories in Pachyderm to store code and model artifacts with the following lines of code: pachctl create-repo iris. Committing files into this repository is as simple as this: pachctl put-file iris master /raw/iris_1.csv -f data/raw/iris_1.csv
Key takeaways: Pachyderm allows you to store different versions of your training data and models in an orderly fashion. You are also able to run experiments and store artifacts on Amazon S3.
Polyaxon is a platform that provides machine learning packages and algorithms for scalable and reproducible workflows. Polyaxon supports running major machine learning and deep learning libraries such as TensorFlow and scikit-learn, allowing you to push ideas efficiently into production. With regard to model versioning, Polyaxon offers experimentation, model registration and management, and automation capabilities. The first step to using Polyaxon for model versioning is installation. This is possible with this line of code: $ pip install -U polyaxon.
Experimenting with Polyaxon allows you to pre-process training data, train your models, run performance analytics to visualize metrics and performance, and run notebooks and TensorBoard instances. With an easy-to-use interface, Polyaxon allows you to visualize evaluation and performance metrics like so:
Key Takeaways: Polyaxon is one of the MLOps tools you should have in your arsenal. It can run major ML libraries and packages such as TensorFlow and scikit-learn. You can also track and visualize evaluation metrics.
How do these tools compare with each other?
In this section, I will outline some characteristics and functionalities to look out for when choosing the right MLOps tool for you.
Number one on the list is pricing. Neptune AI, Pachyderm, and Polyaxon have special pricing plans. Although relatively cheap, they do not compare to MLflow, ModelDB, and DVC, which are free to use. These free tools are open source, but they do have indirect costs, such as setting them up and maintaining them on your own. So, when choosing a tool, you should decide which option is better for you.
Another thing to look out for is comparative functionality: the ability to compare evaluation metrics of different model versions. All the tools listed above offer this. You may also check which type of version control system suits you best.
See the table below, which explains how these model versioning tools compare against each other.
[Comparison table not recoverable. Surviving header fragments: Type of Version, Support for Large Files and Artifacts, Metrics & Model. Surviving cell fragments: on the plan, (not open source).]
Model versioning tools comparison | Source: Author
What can go wrong if you don’t do model versioning?
Model versioning is an important part of the MLOps process. Previously, we talked about the importance of model versioning. Here we will look at the consequences of not versioning your ML models, and what could really go wrong:
- Misplaced implementation code: Implementation code is one of the entities to version in the model versioning process. Without versioning your implementation code, you risk losing valuable work. A major consequence of this is not being able to reproduce experiments.
- Pushing half-baked models to production: Like it or not, model versioning serves as a checkpoint in the ML process between model building and production. When we version models, we prepare ourselves to compare them in terms of performance and identify which performs best. Without versioning, we risk pushing weak models to production. This can be costly for a business or customer.
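The second risk can be reduced with a simple promotion gate: a new version is only pushed to production if it beats the current production version by some margin on the chosen metric. The sketch below assumes F1 score as the metric; the function name, the margin of 0.01, and the scores are illustrative assumptions, not recommendations.

```python
def should_promote(candidate_f1: float, production_f1: float,
                   min_gain: float = 0.01) -> bool:
    """Promote only if the candidate beats production by at least min_gain."""
    return candidate_f1 >= production_f1 + min_gain


# Hypothetical scores recorded for two candidate versions
production_f1 = 0.78
clear_win = should_promote(0.80, production_f1)
marginal = should_promote(0.785, production_f1)

print(clear_win, marginal)
```

A gate like this only works when every version and its metrics have been recorded, which is exactly what model versioning provides.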
Model versioning is an important aspect of the MLOps workflow. It allows you to retain and organize important metadata about your model, encourages experiments with different versions of training data and hyperparameters, and in a way, points you to the model with the right metrics to solve your business challenge.
Model versioning is made possible and easy with the tools I have explained above. These tools offer a range of capabilities including reproducing experiments, model monitoring and tracking. You can experiment with each of these tools to find what suits you or go through the table above to make your choice.