How to Track and Analyze Experiments in Machine Learning: A Beginner's Guide
As a machine learning (ML) practitioner, you must work simultaneously with code, data, and models. With the rapid evolution of these factors during development, keeping up with their interaction becomes even more demanding.
This article introduces the concept of experiment tracking and explains why it matters.
Practices in ML are becoming more streamlined, structured, and defined with the recent adoption of machine learning operations (MLOps). MLOps workflows have different phases, and this article focuses on data and model experiment tracking in the model management phase. MLOps provides better solutions for ML-driven development, but those solutions always come with trade-offs.
Model management is a crucial part of the MLOps lifecycle because this is where many rapid changes happen, from running a series of experiments to making small tweaks to programs in a “needle in a haystack” fashion to quickly figure out what works best.
Why experiment tracking?
The early stages of developing ML solutions feel like a series of experiments with no direction. The critical question then becomes, “How can you conduct experimentation in a fast, reproducible, and scalable way while also providing business value?”
The idea is to combine science (experimentation) and engineering workflows, since you will work primarily with code, data, and models. Code is usually static and is considered the engineering aspect of the workflow, whereas data and models are dynamic and scientific. This gap between engineering and science is where the concept of experiment tracking comes into play.
Experiment tracking in research
As with any research process, the scientist runs a series of experiments under certain conditions and variables before, hopefully, arriving at an optimal result for a particular problem. In the case of ML, each experiment is one model training run. The conditions and variables of an experiment are the model’s hyperparameters, data features, and so on.
Also, let’s not forget that by the end of the research, there is always this little book where the scientist records the final observations, findings, and discoveries, which engineers and other researchers use in the future. This is precisely what happens at the model development stage of ML development.
The focus is on recording everything that happens during the training process by logging the training metrics, parameters, and artifacts of each experiment, so that the model’s performance can be analyzed and understood later. With this, you can gain insights from experimentation. That knowledge equips you to investigate the entire process and make better deductions about what step to take next.
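To make the idea of logging parameters, metrics, and artifacts concrete, here is a minimal sketch that uses only the Python standard library. The function names (log_run, load_run), the folder layout, and the example values are assumptions made for illustration; dedicated tracking tools store far more than this.

```python
import json
import tempfile
import time
import uuid
from pathlib import Path


def log_run(base_dir, params, metrics, artifacts=None):
    """Write one experiment run to <base_dir>/<run_id>/run.json and return the run id."""
    run_id = uuid.uuid4().hex[:8]  # short random identifier for this run
    run_dir = Path(base_dir) / run_id
    run_dir.mkdir(parents=True)
    record = {
        "run_id": run_id,
        "logged_at": time.time(),            # when the record was written (Unix seconds)
        "params": params,                    # hyperparameters, e.g. learning rate
        "metrics": metrics,                  # training or validation results
        "artifacts": list(artifacts or []),  # paths to saved models, plots, etc.
    }
    (run_dir / "run.json").write_text(json.dumps(record, indent=2))
    return run_id


def load_run(base_dir, run_id):
    """Read a previously logged run back for analysis."""
    return json.loads((Path(base_dir) / run_id / "run.json").read_text())


# Example with made-up values; a real run would log numbers from actual training.
with tempfile.TemporaryDirectory() as tmp:
    rid = log_run(tmp, {"learning_rate": 0.01, "epochs": 5}, {"val_accuracy": 0.91})
    restored = load_run(tmp, rid)
    print(restored["params"], restored["metrics"])
```

Because every run ends up in its own folder with the same structure, later analysis can loop over the folders and compare runs without relying on memory or notebook output.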
The need to capture your data in ML experiments
Data tracking is an extension of the research process in ML; it entails keeping a proper record of the data that is used for training. It captures data in the form of versions in a feature store or database for further investigation.
The data versions are driven mainly by the data preprocessing steps. These steps are the tasks that structure the data for the type of model you are building, transforming data in its different raw forms into a form the model accepts as input.
Data tracking also involves feature engineering to a large extent because, as the famous data science saying goes, “the model is only as good as the data.” Versioning makes whatever preprocessing steps you have taken on the data easily traceable across preprocessing pipelines, just as code versions are tracked using Git.
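One simple way to picture a data version is as a fingerprint of the data together with the preprocessing steps applied to it. The sketch below is an illustration under that assumption, not how any particular feature store works; dataset_version, the sample rows, and the step name are all hypothetical.

```python
import hashlib
import json


def dataset_version(rows, preprocessing_steps):
    """Return a short hash identifying the data plus the steps applied to it."""
    payload = json.dumps(
        {"rows": rows, "steps": preprocessing_steps},
        sort_keys=True,  # stable key order so equal content hashes equally
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


raw = [{"age": 31, "income": 52000}, {"age": 45, "income": 61000}]
# Hypothetical preprocessing step: express income in thousands.
scaled = [{"age": r["age"], "income": r["income"] / 1000} for r in raw]

v_raw = dataset_version(raw, [])
v_scaled = dataset_version(scaled, ["scale_income_to_thousands"])
print(v_raw, v_scaled)  # two different version IDs
```

Storing the version ID next to each training run makes it possible to tell later exactly which form of the data a model was trained on.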
Who conducts experiment tracking in ML?
Keeping track of the experiments carried out when developing an ML model is the responsibility of whoever directly trains the model. Typically, this is a data scientist (DS) or an ML engineer (MLE).
DSs/MLEs curate all the information gathered during model training to easily compare differences across experiments. They create a high-level summary of all the training information in a way that makes sense to them and other technical personnel. This represents a form of documentation that allows communication and understanding of what goes on in their development environment (usually a Jupyter notebook) on a higher level.
By drawing hypotheses from previous experiments, DSs/MLEs can deduce the next promising strategy to apply to the data, model, or code when trying to arrive at an optimal ML solution. It also makes it easy to share experiments with other technical people or stakeholders in a way they can understand.
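Comparing experiments can be as simple as ranking recorded runs by one metric. The sketch below assumes each run is a dictionary with "name", "params", and "metrics" keys; that structure, the best_run helper, and the numbers are invented for illustration.

```python
def best_run(runs, metric, higher_is_better=True):
    """Return the run with the best value for `metric`; runs lacking it are skipped."""
    candidates = [r for r in runs if metric in r["metrics"]]
    if not candidates:
        raise ValueError(f"no run reports metric {metric!r}")
    pick = max if higher_is_better else min
    return pick(candidates, key=lambda r: r["metrics"][metric])


# Made-up runs standing in for records collected during training.
runs = [
    {"name": "run_a", "params": {"learning_rate": 0.1}, "metrics": {"val_accuracy": 0.84}},
    {"name": "run_b", "params": {"learning_rate": 0.01}, "metrics": {"val_accuracy": 0.91}},
    {"name": "run_c", "params": {"learning_rate": 0.001}, "metrics": {"val_accuracy": 0.88}},
]

# A high-level summary that is easy to share with other people.
for run in runs:
    print(run["name"], run["params"], run["metrics"])

winner = best_run(runs, "val_accuracy")
print("best:", winner["name"])  # best: run_b
```

Seeing that the middle learning rate did best suggests the next hypothesis to test, for example trying values close to 0.01.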
The tools for experiment tracking
Many tools have been developed for tracking experiments, such as MLflow, Weights & Biases, and Neptune.ai. The core purpose of these tools is to provide a central point of reporting during the model training process. Managing model development workflows becomes easier because all the findings from the experimentation phase are collected at a single point, irrespective of where you run the experiment (i.e., either locally or virtually, on different machines) or who ran the experiment (i.e., having other engineers running experiments in parallel).
However, there are trade-offs between the numerous experiment tracking tools that exist. These trade-offs influence your decision when picking the perfect tool for your ML development stack and workflows based on the components, features, and integration they support.
How should experiment tracking be carried out?
In a nutshell, the processes involved in tracking, recording, and carrying out experiments depend on the type of data, the use case, and the recording method of the tool. Data varies from structured to unstructured, and use cases could be computer vision, natural language processing (NLP), reinforcement learning, etc. The recording method usually falls into one of three options: a spreadsheet (like MS Excel), version control (like Git), or a software package/platform (like MLflow).
Manual methods like spreadsheets and version control tend to be more tedious, time-consuming, and less intuitive because you have to record your metrics, logs, hyperparameters, etc. by hand. Software packages/platforms, on the other hand, have a robust structure that enables traceability and reproducibility when running experiments. They integrate with frameworks like TensorFlow, scikit-learn, and PyTorch, so they can record experiment runs for you and even store models and artifacts in remote locations.
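For a sense of what the manual spreadsheet method looks like in practice, here is a small sketch that appends one row per run to a CSV file using Python's standard csv module. The column names, the append_run and read_runs helpers, and the values are assumptions for illustration.

```python
import csv
import tempfile
from pathlib import Path

# Columns you decide on up front; every new metric means editing this list by hand.
FIELDS = ["run_name", "learning_rate", "epochs", "val_accuracy"]


def append_run(csv_path, row):
    """Append one run to the CSV file, writing the header if the file is new."""
    path = Path(csv_path)
    is_new = not path.exists()
    with path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(row)


def read_runs(csv_path):
    """Load all recorded runs; the csv module returns every value as a string."""
    with Path(csv_path).open(newline="") as f:
        return list(csv.DictReader(f))


with tempfile.TemporaryDirectory() as tmp:
    sheet = Path(tmp) / "experiments.csv"
    # Made-up results typed in by hand after each training run.
    append_run(sheet, {"run_name": "baseline", "learning_rate": 0.1, "epochs": 5, "val_accuracy": 0.84})
    append_run(sheet, {"run_name": "lower_lr", "learning_rate": 0.01, "epochs": 5, "val_accuracy": 0.91})
    recorded = read_runs(sheet)
    print(len(recorded), "runs recorded")
```

Every value here has to be copied into the row by the person running the experiment, and nothing links a row to the model file or the data it used. That is the gap tracking platforms are built to close.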
This article provides a good overview of what it entails to implement experiment tracking in ML. To learn more about experiment tracking, check out the following resources: