May 8, 2025 - 17:52
Integrating MLflow with KubeFlow (Revised Edition)

MLflow—a robust open-source platform that simplifies the management of the machine learning lifecycle, including experimentation, reproducibility, and deployment. By integrating MLflow into Kubeflow, users can leverage MLflow’s intuitive UI and comprehensive model registry capabilities to enhance their machine learning workflows.

In the modern enterprise landscape, the demand for streamlined and scalable Machine Learning Operations (MLOps) frameworks has never been greater. With increasing complexities in model development, tracking, deployment, and monitoring, organizations need tools that seamlessly integrate to ensure efficiency and reliability. MLflow and Kubeflow are two such tools that, when integrated, provide a robust end-to-end solution for managing machine learning workflows. MLflow excels in tracking experiments, managing model lifecycle, and maintaining a centralized model registry. On the other hand, Kubeflow offers scalable pipelines, distributed training capabilities, hyperparameter optimization, and production-grade model serving on Kubernetes. Together, these tools form a comprehensive framework for MLOps that supports continuous integration and deployment (CI/CD), enabling enterprises to automate workflows, improve collaboration between data science and engineering teams, and ensure models are delivered to production faster and with fewer errors. This tutorial will guide you through the detailed process of integrating MLflow and Kubeflow into an enterprise-level MLOps framework, focusing on scalability, reproducibility, and automation.

This framework ensures:

  1. Scalability for high-demand ML workflows.
  2. Automation of CI/CD pipelines.
  3. Centralized tracking and monitoring.

Part 1

The first step is setting up a database: to use MLflow's tracking functionality with a relational database backend, you will need a PostgreSQL (or another supported database) instance. Here’s a breakdown of why and how to set it up:

Why Use PostgreSQL with MLflow?

  • Experiment Tracking: MLflow uses a backend store to log experiments, runs, parameters, metrics, and artifacts. A relational database like PostgreSQL is a robust option for this purpose.
  • Scalability: Using a database allows you to efficiently manage and query large amounts of experiment data.
  • Persistence: A database ensures that your experiment data is stored persistently, even if the MLflow server is restarted.

Setting Up PostgreSQL for MLflow

Step 1: Deploy PostgreSQL in Your Kubernetes Cluster

You can deploy PostgreSQL using a Helm chart or a custom YAML configuration. Here’s a basic example using a custom YAML configuration:

  1. Create the MLflow namespace:
    kubectl create namespace mlflow
  2. Base64-encode the Postgres password:
echo -n 'MyPostgresPass.!QAZ' | base64
  3. Create a YAML file (use your own encoded password as the postgresql-password value):
apiVersion: v1
kind: Secret
metadata:
  name: postgres-secret
  namespace: mlflow
data:
  postgresql-password: TUxQbGF0Zm9ybTEyMzQuIVFBWg==
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-pvc
  namespace: mlflow
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mlflow-postgres
  namespace: mlflow
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mlflow-postgres
  template:
    metadata:
      labels:
        app: mlflow-postgres
    spec:
      containers:
      - name: postgres
        image: postgres:16
        env:
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: postgresql-password
        - name: POSTGRES_DB
          value: mlflow
        ports:
        - containerPort: 5432
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
        volumeMounts:
        - name: postgres-storage
          mountPath: /var/lib/postgresql/data
          subPath: pgdata
      volumes:
      - name: postgres-storage
        persistentVolumeClaim:
          claimName: postgres-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: mlflow-postgres
  namespace: mlflow
spec:
  type: ClusterIP
  ports:
  - port: 5432
    targetPort: 5432
  selector:
    app: mlflow-postgres

  4. Apply:
kubectl apply -f postgresql-deployment.yaml
  5. Create a user and database on Postgres for MLflow:

To set up a PostgreSQL database for MLflow, you'll need to create a user, set a password, create a database, and grant the necessary permissions. Here’s how you can do it step by step in the PostgreSQL shell (psql):

Step-by-Step Commands

  1. Log into PostgreSQL:
    First, log into your PostgreSQL server as a superuser (e.g., postgres):

    psql -U postgres
    
  2. Create a User:
    Replace mlflow and your_password with your desired username and password.

    CREATE USER mlflow WITH PASSWORD 'your_password';
    
    
  3. Create a Database:
    Replace mlflow_db with your desired database name.

    CREATE DATABASE mlflow_db;
    
    
  4. Grant Permissions:
    Grant the necessary permissions to the user for the database:

    GRANT ALL PRIVILEGES ON DATABASE mlflow_db TO mlflow;
    
    
  5. Exit the PostgreSQL Shell:
    After executing the commands, you can exit the psql shell:

    \q
    
    

Summary of Commands

Putting it all together, here are the commands you would run in the PostgreSQL shell:

CREATE USER mlflow WITH PASSWORD 'your_password';
CREATE DATABASE mlflow_db;
GRANT ALL PRIVILEGES ON DATABASE mlflow_db TO mlflow;

Additional Considerations

  • Password Security: Make sure to use a strong password for your database user.
  • Database Connection: When configuring MLflow, use the following connection string format:

    postgresql://mlflow:your_password@<host>:<port>/mlflow_db
    
    

Replace <host> and <port> with your PostgreSQL server's address and port (default is 5432).

With these steps, you should have a PostgreSQL user and database set up for MLflow, ready for use!
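As a quick sanity check, the connection string can also be built programmatically. Passwords containing special characters (like the `.!` in the earlier example password) must be URL-encoded or the URI will not parse correctly. A minimal sketch, assuming the in-cluster DNS name `mlflow-postgres.mlflow.svc.cluster.local` from the Service above:

```python
from urllib.parse import quote_plus

def build_pg_uri(user: str, password: str, host: str, port: int, db: str) -> str:
    """Build a PostgreSQL connection URI, URL-encoding the password so
    characters such as '!' or '@' do not break URI parsing."""
    return f"postgresql://{user}:{quote_plus(password)}@{host}:{port}/{db}"

print(build_pg_uri("mlflow", "MyPostgresPass.!QAZ",
                   "mlflow-postgres.mlflow.svc.cluster.local", 5432, "mlflow_db"))
```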

Storage backend

When considering security for your MLflow setup, both Ceph and MinIO can be configured to be secure, but they have different security features and considerations. Here’s a comparison to help you decide which might be more appropriate for your use case:

Using Ceph

Pros:

  1. Robust Security Features: Ceph supports various security mechanisms, including:
    • Authentication: Ceph can use CephX for authentication, ensuring that only authorized clients can access the storage.
    • Encryption: Data can be encrypted both in transit (using TLS) and at rest.
    • Access Control: You can set fine-grained access control policies to restrict who can access specific buckets or objects.
  2. Scalability: Ceph is designed for scalability, making it suitable for large datasets and high availability.

Cons:

  1. Complexity: Setting up and managing Ceph can be more complex compared to simpler object storage solutions.
  2. Configuration Overhead: You may need to invest time in properly configuring security settings to ensure that your Ceph deployment is secure.

Using MinIO

Pros:

  1. S3 Compatibility: MinIO is compatible with the S3 API, making it easy to integrate with applications designed for S3 storage.
  2. Simplicity: MinIO is easier to set up and manage compared to Ceph, especially for smaller deployments.
  3. Built-in Security Features: MinIO provides:
    • Server-Side Encryption: You can enable server-side encryption for data at rest.
    • TLS Support: MinIO supports TLS for secure data transmission.
    • Access Policies: You can define bucket policies and user access controls.

Cons:

  1. Less Feature-Rich: While MinIO is secure and robust, it may not have the same level of advanced features and scalability as Ceph for very large deployments.

Security Recommendations

For Ceph:

  • Enable CephX Authentication: Ensure that you are using CephX for authentication.
  • Use TLS: Configure TLS for secure data transmission.
  • Regular Audits: Regularly audit your Ceph configuration and access logs to detect any unauthorized access.

For MinIO:

  • Enable TLS: Always use TLS to encrypt data in transit.
  • Use Strong Access Keys: Generate strong access and secret keys for your MinIO instance.
  • Set Bucket Policies: Define strict bucket policies to control access to your data.

Conclusion

Both Ceph and MinIO can be configured to be secure, but your choice may depend on your specific needs:

  • Choose Ceph if you need a highly scalable, feature-rich solution and are willing to manage its complexity.
  • Choose MinIO if you prefer a simpler, S3-compatible solution that is easy to set up and manage while still providing solid security features.

For this configuration, we prefer MinIO over Ceph due to its simplicity and efficient resource allocation.

Step 1: Deploy MinIO in Your MLflow Namespace

In this scenario, we will utilize MinIO as the storage backend for MLflow to manage and store artifacts. When considering MinIO, we have two options:

  • Using Standalone MinIO
  • Using MinIO which comes with Kubeflow installation

Using Standalone MinIO (skip this step if you want to use Kubeflow's MinIO)

  • Pros:

    Isolation: Keeps MLflow and its storage independent, simplifying management.
    Customization: Allows for tailored configurations specific to MLflow needs.
    Version Control: Easier to manage updates and changes without affecting other components.

  • Cons:

    Resource Duplication: Requires additional resources and management overhead.
    Complexity: May complicate the deployment if not properly managed.
To install standalone MinIO, follow the steps below:

Step 1: Deploy MinIO in Your MLflow Namespace

Base64 Encode Your Keys:
The values for MINIO_ACCESS_KEY and MINIO_SECRET_KEY need to be base64 encoded. You can use the following command in your terminal:

echo -n 'myaccesskey' | base64
echo -n 'mysecretkey' | base64
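If the `base64` shell utility is unavailable, the same encoding can be produced with a few lines of Python (stdlib only; the key values are the placeholders from above):

```python
import base64

def encode_for_secret(value: str) -> str:
    """Base64-encode a string for the 'data' field of a Kubernetes Secret."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")

for key in ("myaccesskey", "mysecretkey"):
    print(key, "->", encode_for_secret(key))
```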

Insert your Base64-encoded strings into the secret's data section as the values of the MINIO_ACCESS_KEY and MINIO_SECRET_KEY entries in the minio-deploy.yaml file. This file includes a Deployment, a Service, a PersistentVolumeClaim (PVC), and a Secret. The image uses the latest version of MinIO, which reads MINIO_ROOT_USER and MINIO_ROOT_PASSWORD as the environment variables for the admin user and password of the MinIO installation. Additionally, a separate port is configured for the console UI in this file, allowing access to the dashboard independently of the API port.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: minio
  namespace: mlflow
spec:
  replicas: 1
  selector:
    matchLabels:
      app: minio
  template:
    metadata:
      labels:
        app: minio
    spec:
      containers:
      - name: minio
        image: minio/minio
        args:
          - server
          - /data
          - --console-address # set console ui a dedicated port
          - ":9001"
        ports:
        - containerPort: 9000
        - containerPort: 9001
        env:
        - name: MINIO_ROOT_USER
          valueFrom:
            secretKeyRef:
              name: minio-credentials
              key: MINIO_ACCESS_KEY
        - name: MINIO_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: minio-credentials
              key: MINIO_SECRET_KEY
        - name: MINIO_CONSOLE_PORT
          value: "9001"
        resources:
          requests:
            memory: "512Mi"
            cpu: "250m"
          limits:
            memory: "1Gi"
            cpu: "500m"
        volumeMounts:
        - name: minio-storage
          mountPath: /data # make minio storage persistent
      volumes:
      - name: minio-storage
        persistentVolumeClaim:
          claimName: minio-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: minio
  namespace: mlflow
spec:
  type: NodePort
  ports:
  - name: api
    port: 9000
    targetPort: 9000
  - name: ui
    port: 9001
    targetPort: 9001
  selector:
    app: minio
---
apiVersion: v1
kind: Secret
metadata:
  name: minio-credentials
  namespace: mlflow
type: Opaque
data:
  MINIO_ACCESS_KEY: EyMzQuIV
  MINIO_SECRET_KEY: TUxQbGF0Zm9yb
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: minio-pvc
  namespace: mlflow
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

Step 2: Access MinIO

Get the MinIO Service URL:

You can access MinIO using the service name within the Kubernetes cluster. If you are using port-forwarding for local access, you can do:
kubectl port-forward svc/minio -n mlflow 9001:9001

Step 3: Create a Bucket in MinIO

1. **Using the MinIO Console**:

    After logging in, you can create a bucket via the web interface.

2. **Using `mc` (MinIO Client)**:

    If you prefer the command line, you can [install](https://min.io/docs/minio/linux/reference/minio-mc.html#install-mc) `mc`, then port-forward the API port:
    ```bash
    kubectl port-forward svc/minio -n mlflow 9000:9000
    ```
    then create a bucket (supply your access and secret keys to the alias):
    ```bash
    mc alias set mlflow-minio http://localhost:9000 <access-key> <secret-key>
    mc mb mlflow-minio/mlflow-bucket
    ```

Step 4: Create a User and a Policy, Then Assign the Policy to the User

In the MinIO console (or with the `mc admin` commands), create a user for MLflow and attach a policy like the following:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "s3:GetBucketLocation",
        "s3:ListBucket"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::mlflow-bucket"
    },
    {
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::mlflow-bucket/*"
    }
  ]
}

This policy enables basic read and write operations on the S3 bucket named mlflow-bucket and its contents.

  • The first statement allows the actions s3:GetBucketLocation and s3:ListBucket on the bucket itself, enabling the user to retrieve the bucket's location and list its contents.

  • The second statement permits the actions s3:PutObject, s3:GetObject, and s3:DeleteObject on all objects within the mlflow-bucket. This allows the user to upload, download, and delete objects stored in the bucket.
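To see why the two statements are split, here is a toy evaluator for this policy. It is a simplified sketch of Allow-only, wildcard-style matching (not the real S3 authorization engine), showing that object-level actions only match the `/*` ARN:

```python
import fnmatch

# The policy from above, as a Python dict.
POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": ["s3:GetBucketLocation", "s3:ListBucket"],
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::mlflow-bucket",
        },
        {
            "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::mlflow-bucket/*",
        },
    ],
}

def is_allowed(action: str, resource: str, policy: dict) -> bool:
    """True if any Allow statement covers the action/resource pair.
    Simplified: no Deny statements; '*' handled via fnmatch."""
    return any(
        stmt["Effect"] == "Allow"
        and action in stmt["Action"]
        and fnmatch.fnmatchcase(resource, stmt["Resource"])
        for stmt in policy["Statement"]
    )

print(is_allowed("s3:ListBucket", "arn:aws:s3:::mlflow-bucket", POLICY))           # bucket-level: True
print(is_allowed("s3:PutObject", "arn:aws:s3:::mlflow-bucket/model.pkl", POLICY))  # object-level: True
print(is_allowed("s3:PutObject", "arn:aws:s3:::mlflow-bucket", POLICY))            # no statement matches: False
```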

Step 5: Configure Istio

When using Istio in your Kubernetes cluster, you may need to consider Istio configurations for MinIO and MLflow to ensure proper traffic management, security, and observability.

apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: minio-gateway
  namespace: mlflow
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 9000  # For MinIO API
      name: minio-api
      protocol: HTTP
    hosts:
    - "*"
  - port:
      number: 9001  # For MinIO Web UI
      name: minio-ui
      protocol: HTTP
    hosts:
    - "*"
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: minio
  namespace: mlflow
spec:
  hosts:
  - "*"
  gateways:
  - minio-gateway
  http:
  - match:
    - port: 9000  # Match for API requests
      uri:
        prefix: /
    route:
    - destination:
        host: minio
        port:
          number: 9000
  - match:
    - port: 9001  # Match for UI requests
      uri:
        prefix: /
    route:
    - destination:
        host: minio
        port:
          number: 9001

Apply the configuration:

kubectl apply -f minio/minio-istio.yaml

Using the MinIO That Comes with the Kubeflow Installation

Step 1: Configure the Network Policy

If you decide to use Kubeflow's MinIO as your MLflow storage backend, point your MLflow configuration at the minio-service in the kubeflow namespace.

Also, there is a NetworkPolicy in the kubeflow namespace which only allows traffic to MinIO from two namespaces:

kubectl describe networkpolicy -n kubeflow minio
Name:         minio
Namespace:    kubeflow
Created on:   2025-04-28 14:20:07 +0330 +0330
Labels:       <none>
Annotations:  <none>
Spec:
  PodSelector:     app in (minio)
  Allowing ingress traffic:
    To Port:  <any> (traffic allowed to all ports)
    From:
      NamespaceSelector: app.kubernetes.io/part-of in (kubeflow-profile)
    From:
      NamespaceSelector: kubernetes.io/metadata.name in (istio-system)
    From:
      PodSelector: 
  Not affecting egress traffic
  Policy Types: Ingress

Because we will deploy MLflow in the mlflow namespace in this scenario, it does not match any of those From: sources, so its TCP connections to MinIO are dropped. We need to modify this network policy to allow traffic between MLflow and MinIO.
We can apply the change using a full YAML manifest or a patch:

Option 1:

kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: minio
  namespace: kubeflow
spec:
  podSelector:
    matchLabels:
      app: minio
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              app.kubernetes.io/part-of: kubeflow-profile
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: istio-system
        - namespaceSelector:      # NEW: allow mlflow namespace
            matchLabels:
              kubernetes.io/metadata.name: mlflow
      ports:
        - protocol: TCP
          port: 9000           # adjust if your MinIO listens on a different port
EOF

Option 2: Patch command

kubectl patch networkpolicy minio -n kubeflow --type='json' -p='[
  {
    "op": "add",
    "path": "/spec/ingress/0/from/-",
    "value": {
      "namespaceSelector": {
        "matchLabels": {
          "kubernetes.io/metadata.name": "mlflow"
        }
      }
    }
  }
]'
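The patch uses the JSON Patch `add` op with a trailing `/-`, which appends to the end of the array at `/spec/ingress/0/from`. The effect can be sketched in plain Python (an illustration of the semantics, not a JSON Patch implementation):

```python
# The relevant slice of the existing NetworkPolicy spec:
spec = {
    "ingress": [
        {"from": [
            {"namespaceSelector": {"matchLabels": {"app.kubernetes.io/part-of": "kubeflow-profile"}}},
            {"namespaceSelector": {"matchLabels": {"kubernetes.io/metadata.name": "istio-system"}}},
        ]}
    ]
}

# "op": "add" with path "/spec/ingress/0/from/-" appends one more allowed source:
spec["ingress"][0]["from"].append(
    {"namespaceSelector": {"matchLabels": {"kubernetes.io/metadata.name": "mlflow"}}}
)

print(len(spec["ingress"][0]["from"]))  # 3
```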

Step 2: Create a Bucket in MinIO

    **Using `mc` (MinIO Client)**:

    [install](https://min.io/docs/minio/linux/reference/minio-mc.html#install-mc) `mc` and port-forward the MinIO service:
    ```bash
    kubectl port-forward svc/minio-service -n kubeflow 9000:9000
    ```
    then create a bucket (supply the instance's access and secret keys to the alias):
    ```bash
    mc alias set minio-kf http://localhost:9000 <access-key> <secret-key>
    mc mb minio-kf/mlflow-bucket
    ```

MLflow

Does MLflow Need MinIO?

MLflow does not strictly require MinIO; however, it does need a storage backend to store artifacts and models. Here are some options:

  1. Local File Storage: You can use local paths to store artifacts, but this is not recommended for production environments due to scalability and persistence issues.
  2. Object Storage:
    • MinIO: If you prefer using an S3-compatible object storage service, MinIO is a popular choice for Kubernetes environments. It’s lightweight and easy to deploy.
    • Amazon S3: If you have access to AWS, you can use S3 directly.
    • Ceph Object Storage: Since you have a Ceph cluster, you can use it as an object storage backend. Ceph provides an S3-compatible interface, allowing you to use it similarly to MinIO or AWS S3.
  3. Database Storage: MLflow can also log to a relational database (e.g., PostgreSQL, MySQL) for tracking experiments.

Setting Up MLflow

We will start by creating a Dockerfile. This step is essential because the default MLflow image lacks the boto3 and psycopg2-binary packages, which are necessary for connecting MLflow to MinIO and PostgreSQL:

FROM ghcr.io/mlflow/mlflow:latest

RUN pip install psycopg2-binary boto3

CMD ["mlflow", "server"]

Then build:

docker build -t prezaei/mlflow-custom:v1.0 .

And deploy MLflow on Kubernetes by creating your own deployment YAML files.

Note that Kubernetes only expands $(VAR) references inside value: fields when VAR is defined earlier in the same container's env list; otherwise the reference is left as-is. Because POSTGRES_PASSWORD is defined after BACKEND_STORE_URI here, $(POSTGRES_PASSWORD) would be interpreted literally as the string "$(POSTGRES_PASSWORD)", not the actual password. So we cannot use an env value like this:

name: BACKEND_STORE_URI
value: "postgresql+psycopg2://mlflow:$(POSTGRES_PASSWORD)@mlflow-postgres:5432/mlflow_db"

To fix this, construct the full URI inside the container using shell variable expansion.
Change your command and args to build the URI at startup, like this:

command: ["sh", "-c"]
args:
  - |
    mlflow server \
      --host=0.0.0.0 \
      --port=5000 \
      --backend-store-uri=postgresql+psycopg2://mlflow:${POSTGRES_PASSWORD}@mlflow-postgres:5432/mlflow_db \
      --default-artifact-root=s3://mlflow-bucket

Here’s a basic example using a deployment:

apiVersion: v1
kind: Service
metadata:
  name: mlflow-service
  namespace: mlflow
spec:
  selector:
    app: mlflow
  ports:
    - protocol: TCP
      port: 5000
      targetPort: 5000
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: mlflow-sa
  namespace: mlflow
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mlflow
  namespace: mlflow
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mlflow
  template:
    metadata:
      labels:
        app: mlflow
    spec:
      serviceAccountName: mlflow-sa
      containers:
      - name: mlflow
        image: prezaei/mlflow-custom:v1.0
        ports:
          - containerPort: 5000
        env:
          - name: BACKEND_STORE_URI
            value: "postgresql+psycopg2://mlflow@mlflow-postgres:5432/mlflow_db"
          - name: POSTGRES_PASSWORD
            valueFrom:
              secretKeyRef:
                name: mlflow-secret
                key: POSTGRES_MLFLOW_PASS
          - name: MLFLOW_S3_ENDPOINT_URL
            value: "http://minio.mlflow.svc.cluster.local:9000"
          - name: AWS_S3_ADDRESSING_STYLE
            value: "path"
          - name: AWS_ACCESS_KEY_ID
            valueFrom:
              secretKeyRef:
                name: mlflow-secret
                key: AWS_ACCESS_KEY_ID
          - name: AWS_SECRET_ACCESS_KEY
            valueFrom:
              secretKeyRef:
                name: mlflow-secret
                key: AWS_SECRET_ACCESS_KEY
        command: ["sh", "-c"]
        args:
          - |
            mlflow server \
              --host=0.0.0.0 \
              --port=5000 \
              --backend-store-uri=postgresql+psycopg2://mlflow:${POSTGRES_PASSWORD}@mlflow-postgres:5432/mlflow_db \
              --default-artifact-root=s3://mlflow-bucket \
              --artifacts-destination s3://mlflow-bucket
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "4Gi"
            cpu: "2"

And a secret:

apiVersion: v1
kind: Secret
metadata:
  name: mlflow-secret
  namespace: mlflow
type: Opaque
data:
  AWS_ACCESS_KEY_ID: bWxmbG93
  AWS_SECRET_ACCESS_KEY: VGsvUEFJa1I5fkxZbVp
  POSTGRES_MLFLOW_PASS: QXliRmoxVFdhMW

Istio

When using Istio in your Kubernetes cluster, you may need to consider Istio configurations for MinIO and MLflow to ensure proper traffic management, security, and observability. Here’s a breakdown of what you might need:

Configure MLflow with Istio

If you are also exposing MLflow outside the cluster or want to manage traffic to it, you should similarly set up an Istio Virtual Service for MLflow.

Example Configuration for MLflow

  1. Create a Virtual Service for MLflow:
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: mlflow
  namespace: mlflow
spec:
  gateways:
    - kubeflow/kubeflow-gateway
  hosts:
    - '*'
  http:
    - match:
        - uri:
            prefix: /mlflow/ # match any request with a URI that starts with /mlflow/
      rewrite:
        uri: / #requests matching /mlflow/ are rewritten to /, routing them to the root of the mlflow service
      route:
        - destination:
            host: mlflow-service.mlflow.svc.cluster.local
            port:
              number: 5000
    - match:
        - uri:
            prefix: /graphql
      rewrite:
        uri: /graphql
      route:
        - destination:
            host: mlflow-service.mlflow.svc.cluster.local
            port:
              number: 5000

We configured these settings to allow access to the MLflow UI at kubeflow.mydomain.com/mlflow/. However, when opening run details in the MLflow UI, a 404 HTTP error was encountered because requests to /graphql were not being routed. The /graphql route handles MLflow's backend GraphQL API requests, which the MLflow UI uses to fetch run details.

  2. Apply the Configurations:
kubectl apply -f mlflow-virtualservice.yaml

Next, we need to integrate an MLflow tab into the central dashboard of Kubeflow. So we will modify the ConfigMap for Kubeflow's dashboard to make MLflow visible:

kubectl edit cm centraldashboard-config -n kubeflow

and add this entry to the menuLinks section:

            { 
                "type": "item",
                "link": "/mlflow/",
                "text": "MlFlow",
                "icon": "icons:cached"
            },

Restarting the central dashboard deployment will result in the tab being added.

kubectl rollout restart deploy centraldashboard -n kubeflow

Part 2

Nice work getting MLflow into Kubeflow! Now let’s walk through a detailed guide on how to test the integration. The goal is to verify that MLflow is working smoothly within the Kubeflow environment—logging experiments, models, parameters, and metrics. Here's how you can do it step by step:

✅ 1. Decide Where to Run the Code

To best test the integration, you should run the MLflow code inside Kubeflow Notebooks (e.g., a Jupyter Notebook in a Kubeflow workspace). This ensures that:

  • You're using the same Kubernetes network.
  • MLflow client talks directly to the MLflow tracking server you integrated.
  • Any paths (e.g., artifact store, model registry) resolve correctly within the cluster.
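A minimal smoke test from a Kubeflow notebook can be sketched as follows. The service DNS names follow the manifests above, but the endpoint URLs and credentials are assumptions to adjust for your deployment; the actual `mlflow` calls are shown as comments so you can paste them into a notebook where MLflow is installed:

```python
import os

def configure_mlflow_env(tracking_uri: str, s3_endpoint: str,
                         access_key: str, secret_key: str) -> None:
    """Set the environment variables the MLflow client reads before logging:
    the tracking server URI and the MinIO (S3-compatible) artifact endpoint."""
    os.environ["MLFLOW_TRACKING_URI"] = tracking_uri
    os.environ["MLFLOW_S3_ENDPOINT_URL"] = s3_endpoint
    os.environ["AWS_ACCESS_KEY_ID"] = access_key
    os.environ["AWS_SECRET_ACCESS_KEY"] = secret_key

configure_mlflow_env(
    "http://mlflow-service.mlflow.svc.cluster.local:5000",  # assumed service name
    "http://minio.mlflow.svc.cluster.local:9000",           # assumed MinIO endpoint
    "myaccesskey", "mysecretkey",                            # placeholder credentials
)

# Then, in the notebook:
# import mlflow
# mlflow.set_experiment("integration-smoke-test")
# with mlflow.start_run():
#     mlflow.log_param("alpha", 0.1)
#     mlflow.log_metric("rmse", 0.42)
```

If the run, its parameter, and its metric appear in the MLflow UI (and the artifact bucket receives data when you log a model), the integration is working end to end.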