MLflow in Action: Experiment, Compare & Version ML Models | Sagar Kakkala’s World
How does MLflow help?
In this session, we will discuss MLflow, a tool worth knowing in MLOps. Before we dive into the concepts, let us understand how this tool can help, using an example.
Let us say you are running a dairy farm that makes different products such as curd, cheese and milk, and you have a data scientist who uses statistics to figure out which model works best for you.
Let us say he has sales data for the last 3 years and now wants to use the ML model best suited to learn from that data and predict next year's sales, so that you know exactly how much of each product to manufacture.
To achieve this, your data scientist needs to train different models, design new ones, train the same model on different versions of the data, and compare the results of each training run. Think of it as a playground for training models and picking the best one. This is where MLflow comes into the picture.
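Comparing models needs a common yardstick. Below is a minimal sketch, using NumPy only, of the four regression metrics the demo later logs (MSE, RMSE, MAE and R²), computed from their textbook definitions on a few made-up numbers. The helper name `regression_metrics` and the sample values are hypothetical.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Hypothetical helper: return MSE, RMSE, MAE and R^2 for two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred
    mse = np.mean(errors ** 2)                      # mean squared error
    rmse = np.sqrt(mse)                             # same units as the sales figures
    mae = np.mean(np.abs(errors))                   # mean absolute error
    ss_res = np.sum(errors ** 2)                    # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                      # 1.0 means a perfect fit
    return {"mse": mse, "rmse": rmse, "mae": mae, "r2": r2}

# Made-up actual vs predicted monthly sales for two candidate models
actual = [100, 120, 140, 160]
model_a = [102, 118, 141, 159]
model_b = [110, 110, 150, 150]

for name, preds in [("model_a", model_a), ("model_b", model_b)]:
    metrics = regression_metrics(actual, preds)
    print(name, {k: round(float(v), 3) for k, v in metrics.items()})
```

Lower MSE, RMSE and MAE and a higher R² indicate the better model; these definitions match scikit-learn's `mean_squared_error`, `mean_absolute_error` and `r2_score`.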
MLflow helps data scientists track, manage and evaluate different ML models.
Let us walk through the above scenario with an example.
MLflow Demo
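pip install mlflow scikit-learn pandas matplotlib
# Start the tracking UI in a separate terminal and keep it open (in a notebook, !mlflow ui blocks the cell); it serves on http://localhost:5000 by default
mlflow ui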
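The demo logs to `http://localhost:5000`, which only works while `mlflow ui` is running. A minimal sketch, using only the Python standard library, of checking that address first and falling back to a local folder otherwise. The helper `pick_tracking_uri` is hypothetical, and it assumes the server answers `GET /health` with HTTP 200 when it is up; the returned URI would then be passed to `mlflow.set_tracking_uri`.

```python
import tempfile
import urllib.error
import urllib.request
from pathlib import Path

def pick_tracking_uri(server_url="http://localhost:5000", fallback_dir=None, timeout=2.0):
    """Hypothetical helper: use the server if it responds, else a local file store."""
    try:
        # Assumption: the MLflow server answers GET /health with HTTP 200 when running
        with urllib.request.urlopen(f"{server_url}/health", timeout=timeout) as resp:
            if resp.status == 200:
                return server_url
    except (urllib.error.URLError, OSError):
        pass  # not reachable (or not an MLflow server): use the local store instead
    folder = Path(fallback_dir or tempfile.mkdtemp()).resolve()
    folder.mkdir(parents=True, exist_ok=True)
    return folder.as_uri()

uri = pick_tracking_uri()
print("Tracking URI:", uri)
# mlflow.set_tracking_uri(uri)  # then continue as in the demo
```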
import mlflow
import mlflow.sklearn
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
# ---------------------------
# 1. Generate Synthetic Sales Data
# ---------------------------
np.random.seed(42)
months = np.arange(1, 37) # 36 months (3 years)
products = ["Milk", "Curd", "Cheese"]
sales_data = {}
for product in products:
    trend = months * np.random.randint(3, 8)  # random growth trend
    seasonality = 200 * np.sin(2 * np.pi * months / 12)  # seasonal pattern
    noise = np.random.normal(0, 50, 36)  # noise
    sales = 1000 + trend + seasonality + noise
    sales_data[product] = sales
df = pd.DataFrame(sales_data)
df["month_number"] = months
# ---------------------------
# 2. MLflow Setup
# ---------------------------
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("Multi-Product Dairy Sales Forecasting")
# ---------------------------
# 3. Function to Train & Log Models
# ---------------------------
def log_model(X_train, X_test, y_train, y_test, model, model_name, product_name):
    with mlflow.start_run(run_name=f"{product_name} - {model_name}"):
        # Train model
        model.fit(X_train, y_train)
        preds = model.predict(X_test)
        # Metrics
        mse = mean_squared_error(y_test, preds)
        rmse = np.sqrt(mse)
        mae = mean_absolute_error(y_test, preds)
        r2 = r2_score(y_test, preds)
        mlflow.log_metric("mse", mse)
        mlflow.log_metric("rmse", rmse)
        mlflow.log_metric("mae", mae)
        mlflow.log_metric("r2", r2)
        # Log parameters (if available)
        try:
            mlflow.log_params(model.get_params())
        except Exception:
            pass
        # Plot Actual vs Predicted
        plt.figure(figsize=(8, 5))
        plt.plot(y_test.values, label="Actual Sales", marker='o')
        plt.plot(preds, label="Predicted Sales", marker='x')
        plt.title(f"{product_name} - {model_name} Actual vs Predicted")
        plt.xlabel("Sample")
        plt.ylabel("Sales")
        plt.legend()
        plt.tight_layout()
        plot_file = f"{product_name}_{model_name}_plot.png"
        plt.savefig(plot_file)
        mlflow.log_artifact(plot_file)
        plt.close()
        # Log model (MLflow 3.x: `name` replaces the deprecated `artifact_path`)
        mlflow.sklearn.log_model(model, name=f"{product_name}_{model_name}")
        print(f"{product_name} - {model_name} → R2: {r2:.3f}, RMSE: {rmse:.2f}, MAE: {mae:.2f}")
# ---------------------------
# 4. Train & Log Models for Each Product
# ---------------------------
for product in products:
    X = df[["month_number"]]
    y = df[product]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    log_model(X_train, X_test, y_train, y_test, LinearRegression(), "Linear Regression", product)
    log_model(X_train, X_test, y_train, y_test, DecisionTreeRegressor(), "Decision Tree", product)
    log_model(X_train, X_test, y_train, y_test, RandomForestRegressor(), "Random Forest", product)
print("\nAll experiments completed! Check MLflow UI for results at http://localhost:5000")
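Besides browsing the UI, the finished runs can be compared in code. A minimal sketch, assuming pandas is available and that the input comes from `mlflow.search_runs(experiment_names=["Multi-Product Dairy Sales Forecasting"])`, which returns one row per run with `tags.mlflow.runName` and `metrics.rmse` columns. The helper `best_model_per_product` is hypothetical: it splits the demo's `"<product> - <model>"` run names and keeps the lowest-RMSE run for each product. The small DataFrame below is made-up input for illustration, not real results.

```python
import pandas as pd

def best_model_per_product(runs: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: lowest-RMSE run per product from a search_runs DataFrame."""
    df = runs[["tags.mlflow.runName", "metrics.rmse"]].copy()
    # Run names follow the demo's f"{product_name} - {model_name}" pattern
    df[["product", "model"]] = df["tags.mlflow.runName"].str.split(" - ", n=1, expand=True)
    best_idx = df.groupby("product")["metrics.rmse"].idxmin()
    return df.loc[best_idx, ["product", "model", "metrics.rmse"]].reset_index(drop=True)

# Made-up input shaped like mlflow.search_runs output (not real results)
runs = pd.DataFrame({
    "tags.mlflow.runName": ["Milk - Linear Regression", "Milk - Random Forest",
                            "Curd - Linear Regression", "Curd - Decision Tree"],
    "metrics.rmse": [120.0, 95.0, 80.0, 110.0],
})
print(best_model_per_product(runs))

# Against the live tracking server from the demo (it must be running):
# import mlflow
# mlflow.set_tracking_uri("http://localhost:5000")
# runs = mlflow.search_runs(experiment_names=["Multi-Product Dairy Sales Forecasting"])
# print(best_model_per_product(runs))
```

Sorting by RMSE keeps the comparison in sales units; ranking by R² instead would only require swapping the column and using `idxmax`.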