<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>DevOps on Thomas</title>
        <link>https://blog.thomasplantain.fr/tags/devops/</link>
        <description>Recent content in DevOps on Thomas</description>
        <generator>Hugo -- gohugo.io</generator>
        <language>fr-fr</language>
        <lastBuildDate>Fri, 13 Feb 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://blog.thomasplantain.fr/tags/devops/index.xml" rel="self" type="application/rss+xml" /><item>
        <title>MLOps: Bridging Machine Learning and Operations</title>
        <link>https://blog.thomasplantain.fr/post/mlops/</link>
        <pubDate>Fri, 13 Feb 2026 00:00:00 +0000</pubDate>
        
        <guid>https://blog.thomasplantain.fr/post/mlops/</guid>
        <description>&lt;img src="https://blog.thomasplantain.fr/img/mlops.png" alt="Featured image of post MLOps: Bridging Machine Learning and Operations" /&gt;&lt;h2 id=&#34;what-is-mlops&#34;&gt;What is MLOps?
&lt;/h2&gt;&lt;p&gt;MLOps (Machine Learning Operations) is a set of practices that combines machine learning, DevOps, and data engineering to streamline the development, deployment, and maintenance of machine learning systems in production.&lt;/p&gt;
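&lt;p&gt;Think of it as applying software engineering best practices to machine learning workflows, with extra complexity on top: a model&amp;rsquo;s behavior depends on its training data as well as its code, models must be versioned together with that data, and they need retraining as real-world conditions change.&lt;/p&gt;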
&lt;hr&gt;
&lt;h2 id=&#34;the-mlops-landscape&#34;&gt;The MLOps Landscape
&lt;/h2&gt;&lt;h3 id=&#34;core-pillars&#34;&gt;Core Pillars
&lt;/h3&gt;&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;┌─────────────────────────────────────────┐
│          ML Model Development           │
│  (Experimentation, Training, Tuning)    │
└────────────────┬────────────────────────┘
                 │
┌────────────────┴────────────────────────┐
│     Model Validation &amp;amp; Testing          │
│  (Metrics, A/B Testing, Performance)    │
└────────────────┬────────────────────────┘
                 │
┌────────────────┴────────────────────────┐
│      Model Deployment &amp;amp; Serving         │
│  (Containerization, APIs, Scaling)      │
└────────────────┬────────────────────────┘
                 │
┌────────────────┴────────────────────────┐
│   Monitoring, Retraining &amp;amp; Governance   │
│  (Data Drift, Performance, Compliance)  │
└─────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;&lt;hr&gt;
&lt;h2 id=&#34;key-components&#34;&gt;Key Components
&lt;/h2&gt;&lt;h3 id=&#34;1-data-management&#34;&gt;1. &lt;strong&gt;Data Management&lt;/strong&gt;
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Data Pipeline&lt;/strong&gt;: Automated extraction, transformation, and loading (ETL)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Data Versioning&lt;/strong&gt;: Track datasets like code (DVC, Delta Lake)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Data Quality&lt;/strong&gt;: Validation, schema enforcement, anomaly detection&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Feature Store&lt;/strong&gt;: Centralized feature management and serving&lt;/li&gt;
&lt;/ul&gt;
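&lt;p&gt;Tools such as DVC identify a dataset version by a hash of its content rather than by its file name, so any change to the data produces a new version ID while unchanged data keeps the same one. The plain-Python sketch below illustrates that principle; &lt;code&gt;fingerprint_rows&lt;/code&gt;, &lt;code&gt;record_version&lt;/code&gt;, and the in-memory registry are hypothetical and are not part of the DVC or Delta Lake APIs.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;import hashlib
import json


def fingerprint_rows(rows):
    # Hypothetical helper: serialize each record with sorted keys so identical
    # data always yields identical bytes, then hash the whole stream.
    digest = hashlib.sha256()
    for row in rows:
        digest.update(json.dumps(row, sort_keys=True).encode('utf-8'))
        digest.update(b'\n')
    return digest.hexdigest()


def record_version(registry, name, rows):
    # Hypothetical in-memory registry: dataset name mapped to a list of
    # version hashes. A new entry is appended only when the content changed.
    version = fingerprint_rows(rows)
    history = registry.setdefault(name, [])
    if not history or history[-1] != version:
        history.append(version)
    return version


registry = {}
v1 = record_version(registry, 'train', [{'x': 1, 'y': 0}])
v2 = record_version(registry, 'train', [{'y': 0, 'x': 1}])  # same data, keys reordered
v3 = record_version(registry, 'train', [{'x': 2, 'y': 0}])  # one value changed
print(v1 == v2, len(registry['train']))  # True 2
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Sorting the keys before hashing matters: Python dictionaries preserve insertion order, so without &lt;code&gt;sort_keys=True&lt;/code&gt; the two equivalent records above would serialize, and therefore hash, differently.&lt;/p&gt;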
&lt;h3 id=&#34;2-model-development&#34;&gt;2. &lt;strong&gt;Model Development&lt;/strong&gt;
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Experimentation Tracking&lt;/strong&gt;: Log hyperparameters, metrics, artifacts (MLflow, Weights &amp;amp; Biases)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Model Registry&lt;/strong&gt;: Version control for trained models&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Reproducibility&lt;/strong&gt;: Environment specifications, random seeds, documentation&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Collaboration&lt;/strong&gt;: Shared resources for data scientists and engineers&lt;/li&gt;
&lt;/ul&gt;
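&lt;p&gt;Experiment trackers such as MLflow expose calls like &lt;code&gt;mlflow.log_param&lt;/code&gt; and &lt;code&gt;mlflow.log_metric&lt;/code&gt; to record each run. The standard-library sketch below imitates the idea without a tracking server: each run appends one JSON line holding its parameters, random seed, and metrics. &lt;code&gt;train_and_log&lt;/code&gt;, &lt;code&gt;best_run&lt;/code&gt;, and the toy score are hypothetical placeholders for a real training job.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;import json
import os
import random
import tempfile
import uuid


def train_and_log(params, log_path):
    # Hypothetical stand-in for a tracked training run: one JSON line per run
    # with everything needed to compare it with others or to re-run it.
    run_id = uuid.uuid4().hex
    rng = random.Random(params['seed'])  # seeded RNG makes the run repeatable
    # Toy training step: the score depends only on the params and the seed.
    score = round(params['lr'] * 10 + rng.random(), 6)
    record = {'run_id': run_id, 'params': params, 'metrics': {'score': score}}
    with open(log_path, 'a', encoding='utf-8') as log_file:
        log_file.write(json.dumps(record) + '\n')
    return record


def best_run(log_path, metric):
    # Hypothetical helper: reload all logged runs and pick the best by metric.
    with open(log_path, encoding='utf-8') as log_file:
        runs = [json.loads(line) for line in log_file]
    return max(runs, key=lambda run: run['metrics'][metric])


log_path = os.path.join(tempfile.mkdtemp(), 'runs.jsonl')
for lr in (0.01, 0.1, 0.5):
    train_and_log({'lr': lr, 'seed': 42}, log_path)
print(best_run(log_path, 'score')['params'])  # {'lr': 0.5, 'seed': 42}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Seeding makes this toy run repeatable; real training may additionally need deterministic framework and GPU settings to reproduce results exactly.&lt;/p&gt;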
&lt;h3 id=&#34;3-model-validation&#34;&gt;3. &lt;strong&gt;Model Validation&lt;/strong&gt;
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Unit Testing&lt;/strong&gt;: Data and model logic validation&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Integration Testing&lt;/strong&gt;: End-to-end pipeline testing&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Model Evaluation&lt;/strong&gt;: Performance metrics on holdout sets&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;A/B Testing&lt;/strong&gt;: Compare models in production with real user traffic&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Fairness &amp;amp; Bias Detection&lt;/strong&gt;: Check that model performance and outcomes are comparable across demographic groups&lt;/li&gt;
&lt;/ul&gt;
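&lt;p&gt;A common starting point for bias detection is to compute the same evaluation metric separately for each demographic group on the holdout set and compare the results. The sketch below does this for accuracy; the helper names are hypothetical, and a real fairness review would usually examine several metrics (for example, false positive and false negative rates) rather than accuracy alone.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    # Hypothetical helper: accuracy computed separately for each group.
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {g: correct[g] / total[g] for g in total}


def max_accuracy_gap(per_group):
    # Gap between the best- and worst-served groups; the acceptable gap is a
    # policy choice that the team has to agree on.
    return max(per_group.values()) - min(per_group.values())


y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 1]
groups = ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b']
per_group = accuracy_by_group(y_true, y_pred, groups)
print(per_group, max_accuracy_gap(per_group))  # {'a': 0.75, 'b': 0.5} 0.25
&lt;/code&gt;&lt;/pre&gt;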
&lt;h3 id=&#34;4-deployment--serving&#34;&gt;4. &lt;strong&gt;Deployment &amp;amp; Serving&lt;/strong&gt;
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Containerization&lt;/strong&gt;: Docker for reproducible environments&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Orchestration&lt;/strong&gt;: Kubernetes for managing deployments at scale&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Model Serving&lt;/strong&gt;: REST APIs, gRPC, batch inference, real-time serving&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Version Management&lt;/strong&gt;: Blue-green deployments, canary releases, rollback capabilities&lt;/li&gt;
&lt;/ul&gt;
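&lt;p&gt;Blue-green deployment and rollback both come down to keeping the previous model version loaded and switching which version receives traffic. The class below is a hypothetical, in-process sketch of that switch; in practice the switch usually happens in a load balancer, service mesh, or serving platform such as Seldon Core rather than in application code.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;class BlueGreenRouter:
    # Hypothetical router with two model slots; only the live slot serves.

    def __init__(self, blue_model):
        self.slots = {'blue': blue_model, 'green': None}
        self.live = 'blue'

    def idle_slot(self):
        return 'green' if self.live == 'blue' else 'blue'

    def stage(self, model):
        # Load the new version into the idle slot without touching traffic.
        self.slots[self.idle_slot()] = model

    def switch(self):
        # Cut all traffic over to the other slot in a single step.
        if self.slots[self.idle_slot()] is None:
            raise RuntimeError('no model staged in the idle slot')
        self.live = self.idle_slot()

    def rollback(self):
        # The previous version is still loaded, so rollback is another switch.
        self.switch()

    def predict(self, features):
        return self.slots[self.live](features)


router = BlueGreenRouter(lambda features: 'v1')
router.stage(lambda features: 'v2')
router.switch()
print(router.predict({}))  # v2
router.rollback()
print(router.predict({}))  # v1
&lt;/code&gt;&lt;/pre&gt;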
&lt;h3 id=&#34;5-monitoring--governance&#34;&gt;5. &lt;strong&gt;Monitoring &amp;amp; Governance&lt;/strong&gt;
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Performance Monitoring&lt;/strong&gt;: Track model predictions, latency, throughput&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Data Drift Detection&lt;/strong&gt;: Alert when input distributions change&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Model Decay&lt;/strong&gt;: Monitor accuracy degradation over time&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Audit Trails&lt;/strong&gt;: Compliance logging for regulated industries&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cost Optimization&lt;/strong&gt;: Track infrastructure and compute costs&lt;/li&gt;
&lt;/ul&gt;
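&lt;p&gt;One widely used drift score is the Population Stability Index (PSI). Bin a feature using the training (reference) data, compute the share of values in each bin for the reference sample (e&lt;sub&gt;i&lt;/sub&gt;) and for recent production data (a&lt;sub&gt;i&lt;/sub&gt;), then sum (a&lt;sub&gt;i&lt;/sub&gt; - e&lt;sub&gt;i&lt;/sub&gt;) × ln(a&lt;sub&gt;i&lt;/sub&gt; / e&lt;sub&gt;i&lt;/sub&gt;) over all bins. A commonly quoted rule of thumb treats values above about 0.25 as a significant shift, but thresholds should be tuned per feature. The standard-library sketch below is one way to compute it; the &lt;code&gt;psi&lt;/code&gt; helper, the quantile binning, and the small &lt;code&gt;eps&lt;/code&gt; floor for empty bins are implementation choices, not a fixed standard.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;import bisect
import math
import statistics


def psi(reference, current, bins=10, eps=1e-6):
    # Hypothetical helper: Population Stability Index of one numeric feature.
    # Bin edges are quantiles of the reference (training) sample, so each bin
    # holds roughly the same share of reference values.
    cuts = statistics.quantiles(reference, n=bins)

    def shares(values):
        counts = [0] * bins
        for v in values:
            counts[bisect.bisect_right(cuts, v)] += 1
        # eps keeps empty bins from producing log(0) or a division by zero.
        return [max(c / len(values), eps) for c in counts]

    expected, actual = shares(reference), shares(current)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


train = [i / 100 for i in range(1000)]          # reference sample
same = [i / 100 + 0.005 for i in range(1000)]   # almost unchanged
shifted = [i / 100 + 3 for i in range(1000)]    # every value moved up by 3
print(round(psi(train, same), 4), round(psi(train, shifted), 2))
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;A check like this is typically run per feature on a schedule, with scores above the chosen threshold raising an alert or a retraining trigger.&lt;/p&gt;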
&lt;hr&gt;
&lt;h2 id=&#34;mlops-workflow&#34;&gt;MLOps Workflow
&lt;/h2&gt;&lt;h3 id=&#34;from-experimentation-to-production&#34;&gt;From Experimentation to Production
&lt;/h3&gt;&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;Notebook Experimentation
        ↓
Feature Engineering &amp;amp; Versioning
        ↓
Model Training Pipeline (Automated)
        ↓
Model Validation &amp;amp; Testing
        ↓
Model Registry &amp;amp; Packaging
        ↓
Containerization &amp;amp; Artifact Storage
        ↓
Deployment to Staging Environment
        ↓
A/B Testing &amp;amp; Validation
        ↓
Production Deployment
        ↓
Monitoring &amp;amp; Alerting
        ↓
Retraining Triggers (Data/Model Drift)
        ↓
Loop Back to Training
&lt;/code&gt;&lt;/pre&gt;&lt;hr&gt;
&lt;h2 id=&#34;popular-mlops-tools&#34;&gt;Popular MLOps Tools
&lt;/h2&gt;&lt;h3 id=&#34;experiment-tracking--model-management&#34;&gt;Experiment Tracking &amp;amp; Model Management
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;MLflow&lt;/strong&gt;: Open-source platform for managing the ML lifecycle&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Weights &amp;amp; Biases&lt;/strong&gt;: Cloud-based experiment tracking and hyperparameter optimization&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Neptune.ai&lt;/strong&gt;: Lightweight experiment tracking for teams&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;data-management&#34;&gt;Data Management
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;DVC&lt;/strong&gt; (Data Version Control): Version control for data and pipelines&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Apache Airflow&lt;/strong&gt;: Workflow orchestration and DAG scheduling&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Prefect&lt;/strong&gt;: Python-native workflow orchestration platform&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Delta Lake&lt;/strong&gt;: ACID transactions for data lakes&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;model-serving&#34;&gt;Model Serving
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;TensorFlow Serving&lt;/strong&gt;: Specialized serving for TensorFlow models&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Seldon Core&lt;/strong&gt;: Open-source model serving platform on Kubernetes&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;BentoML&lt;/strong&gt;: Framework for packaging and deploying ML models&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Ray Serve&lt;/strong&gt;: Distributed model serving framework&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;monitoring--observability&#34;&gt;Monitoring &amp;amp; Observability
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Datadog&lt;/strong&gt;: Infrastructure and APM monitoring&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Prometheus + Grafana&lt;/strong&gt;: Metrics collection and visualization&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;WhyLabs&lt;/strong&gt;: ML model monitoring and data quality&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Arize&lt;/strong&gt;: ML model monitoring and explainability&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;infrastructure--operations&#34;&gt;Infrastructure &amp;amp; Operations
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Kubernetes&lt;/strong&gt;: Container orchestration&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Docker&lt;/strong&gt;: Containerization&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Terraform&lt;/strong&gt;: Infrastructure as Code&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Jenkins / GitLab CI / GitHub Actions&lt;/strong&gt;: CI/CD pipelines&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id=&#34;mlops-best-practices&#34;&gt;MLOps Best Practices
&lt;/h2&gt;&lt;h3 id=&#34;development&#34;&gt;Development
&lt;/h3&gt;&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Treat data like code&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Version datasets and transformations&lt;/li&gt;
&lt;li&gt;Automate data pipeline testing&lt;/li&gt;
&lt;li&gt;Document data lineage&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Reproducibility First&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Pin dependency versions (e.g., a fully pinned requirements.txt or a conda environment.yml)&lt;/li&gt;
&lt;li&gt;Document model training steps&lt;/li&gt;
&lt;li&gt;Store random seeds and hyperparameters&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Modular Design&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Separate feature engineering from model training&lt;/li&gt;
&lt;li&gt;Use configuration files (YAML, JSON) for hyperparameters&lt;/li&gt;
&lt;li&gt;Build reusable components&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
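&lt;p&gt;Keeping hyperparameters in a configuration file and loading them into a typed object makes every run explicit and easy to diff in version control. A minimal standard-library sketch follows. It uses JSON because YAML parsing needs a third-party package such as PyYAML, and the &lt;code&gt;TrainConfig&lt;/code&gt; fields and &lt;code&gt;load_config&lt;/code&gt; helper are hypothetical. In practice, the text would be read from a versioned file.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TrainConfig:
    # Hypothetical hyperparameters; a default applies when a key is omitted.
    learning_rate: float = 0.01
    batch_size: int = 32
    epochs: int = 10
    seed: int = 42


def load_config(text):
    # Hypothetical loader: parse JSON and reject keys the dataclass lacks.
    raw = json.loads(text)
    unknown = set(raw) - {f.name for f in fields(TrainConfig)}
    if unknown:
        # Fail fast on typos such as learning_rte instead of silently
        # training with the default value.
        raise ValueError('unknown config keys: %s' % sorted(unknown))
    return TrainConfig(**raw)


config = load_config('{"learning_rate": 0.001, "epochs": 5}')
print(config)
# TrainConfig(learning_rate=0.001, batch_size=32, epochs=5, seed=42)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Dataclass type annotations are not checked at runtime, so type validation, if needed, has to be added explicitly.&lt;/p&gt;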
&lt;h3 id=&#34;testing&#34;&gt;Testing
&lt;/h3&gt;&lt;ol start=&#34;4&#34;&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Comprehensive Testing Strategy&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Unit tests for data transformations&lt;/li&gt;
&lt;li&gt;Integration tests for full pipelines&lt;/li&gt;
&lt;li&gt;Model performance tests against baselines&lt;/li&gt;
&lt;li&gt;Test edge cases and adversarial inputs&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Automated Validation&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Schema validation for inputs and outputs&lt;/li&gt;
&lt;li&gt;Range checks for features&lt;/li&gt;
&lt;li&gt;Bias and fairness audits&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
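&lt;p&gt;Schema and range checks are cheap to automate and can run both in CI and in front of the model at serving time. The sketch below expresses them in plain Python; &lt;code&gt;EXPECTED_SCHEMA&lt;/code&gt; and &lt;code&gt;validate_row&lt;/code&gt; are hypothetical, and libraries such as Great Expectations or pandera provide fuller versions of the same idea.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;# Hypothetical feature schema: name mapped to (type, minimum, maximum).
EXPECTED_SCHEMA = {
    'age': (int, 0, 120),
    'income': (float, 0.0, 1e7),
}


def validate_row(row, schema=EXPECTED_SCHEMA):
    # Hypothetical validator: returns a list of problems, empty if valid.
    errors = []
    missing = set(schema) - set(row)
    extra = set(row) - set(schema)
    if missing:
        errors.append('missing: %s' % sorted(missing))
    if extra:
        errors.append('unexpected: %s' % sorted(extra))
    for name, (kind, low, high) in schema.items():
        if name not in row:
            continue
        value = row[name]
        if not isinstance(value, kind):
            errors.append('%s: expected %s' % (name, kind.__name__))
        elif value > high or low > value:
            errors.append('%s: %r outside [%r, %r]' % (name, value, low, high))
    return errors


print(validate_row({'age': 34, 'income': 52000.0}))  # []
print(validate_row({'age': 150, 'zip': '75001'}))
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Returning a list of messages instead of raising on the first problem lets a pipeline report every violation in a batch at once.&lt;/p&gt;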
&lt;h3 id=&#34;deployment&#34;&gt;Deployment
&lt;/h3&gt;&lt;ol start=&#34;6&#34;&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Infrastructure as Code&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Define environments declaratively&lt;/li&gt;
&lt;li&gt;Version control infrastructure changes&lt;/li&gt;
&lt;li&gt;Automate provisioning&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Progressive Rollouts&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Start with a canary release on a small share of traffic (e.g., 1-5%)&lt;/li&gt;
&lt;li&gt;Monitor metrics closely during rollout&lt;/li&gt;
&lt;li&gt;Have automated rollback mechanisms&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
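&lt;p&gt;A common way to send a fixed share of traffic to a canary is to hash a stable identifier into buckets, so the same user keeps seeing the same version throughout the rollout. The sketch below pairs that with a simple automated rollback rule; &lt;code&gt;in_canary&lt;/code&gt;, &lt;code&gt;should_roll_back&lt;/code&gt;, the 100-bucket scheme, and the 2-point tolerance are assumptions for illustration, and in production this logic typically lives in a service mesh, load balancer, or serving platform.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;import hashlib


def in_canary(user_id, percent):
    # Hypothetical router helper: hash a stable ID into one of 100 buckets so
    # each user sees the same model version for the whole rollout.
    bucket = int(hashlib.sha256(user_id.encode('utf-8')).hexdigest(), 16) % 100
    return percent > bucket


def should_roll_back(canary_error_rate, stable_error_rate, tolerance=0.02):
    # Hypothetical policy: roll back when the canary error rate exceeds the
    # stable version by more than the tolerance (absolute difference).
    return canary_error_rate - stable_error_rate > tolerance


users = ['user-%d' % i for i in range(10000)]
share = sum(in_canary(u, 5) for u in users) / len(users)
print(round(share, 3))  # about 0.05; the exact value depends on the hashes
print(should_roll_back(0.071, 0.040))  # True
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Hashing the ID, rather than choosing a random version per request, keeps each user on one version, which makes per-user metrics comparable between the canary and stable groups.&lt;/p&gt;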
&lt;h3 id=&#34;monitoring&#34;&gt;Monitoring
&lt;/h3&gt;&lt;ol start=&#34;8&#34;&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Observability&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Log predictions and features&lt;/li&gt;
&lt;li&gt;Monitor model quality continuously: accuracy once labels arrive, and proxy signals such as prediction distributions before then&lt;/li&gt;
&lt;li&gt;Track data distributions for drift detection&lt;/li&gt;
&lt;li&gt;Set up alerts for anomalies&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Feedback Loops&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Capture true labels as they become available&lt;/li&gt;
&lt;li&gt;Use the observed outcomes (actuals) to retrain models&lt;/li&gt;
&lt;li&gt;Monitor feedback quality&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
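&lt;p&gt;Because true labels often arrive hours or weeks after a prediction, accuracy monitoring and feedback loops depend on logging each prediction with an ID and joining the label to it later. A minimal in-memory sketch of that join, with a rolling-accuracy alert, follows; &lt;code&gt;PredictionMonitor&lt;/code&gt;, the window size, and the alert threshold are hypothetical, and a real system would persist both streams (for example, in a database or event log) rather than hold them in memory.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;from collections import deque


class PredictionMonitor:
    # Hypothetical monitor: logs predictions by request ID, joins the true
    # label when it arrives later, and tracks accuracy over recent labels.

    def __init__(self, window=100, alert_below=0.8):
        self.pending = {}                   # request ID mapped to prediction
        self.recent = deque(maxlen=window)  # 1 if correct, 0 otherwise
        self.alert_below = alert_below

    def log_prediction(self, request_id, prediction):
        self.pending[request_id] = prediction

    def log_label(self, request_id, label):
        # Delayed ground truth: labels for unknown or already-joined IDs
        # are ignored.
        prediction = self.pending.pop(request_id, None)
        if prediction is not None:
            self.recent.append(int(prediction == label))

    def accuracy(self):
        # None until at least one label has been joined.
        return sum(self.recent) / len(self.recent) if self.recent else None

    def should_alert(self):
        acc = self.accuracy()
        return acc is not None and self.alert_below > acc


monitor = PredictionMonitor(window=4, alert_below=0.75)
for request_id, prediction in [('r1', 1), ('r2', 0), ('r3', 1), ('r4', 1)]:
    monitor.log_prediction(request_id, prediction)
for request_id, label in [('r1', 1), ('r2', 1), ('r3', 0), ('r4', 1)]:
    monitor.log_label(request_id, label)
print(monitor.accuracy(), monitor.should_alert())  # 0.5 True
&lt;/code&gt;&lt;/pre&gt;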
&lt;h3 id=&#34;organization&#34;&gt;Organization
&lt;/h3&gt;&lt;ol start=&#34;10&#34;&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Clear Ownership&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Define roles: data engineer, ML engineer, MLOps engineer&lt;/li&gt;
&lt;li&gt;Establish SLOs (Service Level Objectives) for models&lt;/li&gt;
&lt;li&gt;Document runbooks for common issues&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Governance &amp;amp; Compliance&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Audit trail for model decisions&lt;/li&gt;
&lt;li&gt;Explainability/interpretability requirements&lt;/li&gt;
&lt;li&gt;Data privacy and regulatory compliance&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
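&lt;p&gt;An SLO becomes actionable when it is stated as a measurable target with an error budget. As a hypothetical example, take the objective that 99% of prediction requests complete within 200 ms: the remaining 1% is the error budget, and exhausting it is a signal to prioritize reliability work over new model releases. The sketch below computes compliance and budget consumption from a window of request latencies; &lt;code&gt;slo_report&lt;/code&gt; and its default thresholds are illustrative assumptions.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;def slo_report(latencies_ms, threshold_ms=200.0, objective=0.99):
    # Hypothetical SLO check: at least `objective` of requests must finish
    # within threshold_ms. The error budget is the share allowed to miss;
    # budget_used above 1.0 means the budget is exhausted. Assumes the
    # objective is below 1.0 so the budget is non-zero.
    total = len(latencies_ms)
    misses = sum(1 for t in latencies_ms if t > threshold_ms)
    compliance = (total - misses) / total
    return {
        'compliance': compliance,
        'met': compliance >= objective,
        'budget_used': misses / ((1 - objective) * total),
    }


latencies = [120.0] * 995 + [450.0] * 5  # 5 slow requests out of 1000
report = slo_report(latencies)
print(report['compliance'], report['met'], round(report['budget_used'], 2))
# 0.995 True 0.5
&lt;/code&gt;&lt;/pre&gt;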
&lt;hr&gt;
&lt;h2 id=&#34;real-world-challenges&#34;&gt;Real-World Challenges
&lt;/h2&gt;&lt;h3 id=&#34;technical-challenges&#34;&gt;Technical Challenges
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Data Quality&lt;/strong&gt;: Garbage in, garbage out&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Model Complexity&lt;/strong&gt;: Balancing accuracy vs. interpretability vs. latency&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scalability&lt;/strong&gt;: Serving high request volumes, up to millions of predictions per second in the largest systems, within latency budgets&lt;/li&gt;
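&lt;li&gt;&lt;strong&gt;Reproducibility&lt;/strong&gt;: Training involves randomness (weight initialization, data shuffling, nondeterministic GPU kernels), so runs are not reproducible unless seeds, data, and environments are pinned&lt;/li&gt;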
&lt;/ul&gt;
&lt;h3 id=&#34;organizational-challenges&#34;&gt;Organizational Challenges
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Silos&lt;/strong&gt;: Data science isolated from engineering&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Skills Gap&lt;/strong&gt;: Few engineers understand both ML and infrastructure&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Time to Market&lt;/strong&gt;: Experimentation cycles are long&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cost Control&lt;/strong&gt;: Compute resources can quickly become expensive&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id=&#34;getting-started-with-mlops&#34;&gt;Getting Started with MLOps
&lt;/h2&gt;&lt;h3 id=&#34;level-1-manual-processes&#34;&gt;Level 1: Manual Processes
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;Jupyter notebooks for experimentation&lt;/li&gt;
&lt;li&gt;Manual model files and version tracking&lt;/li&gt;
&lt;li&gt;Basic monitoring with logs&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;level-2-automated-pipelines&#34;&gt;Level 2: Automated Pipelines
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;Automated training pipelines with cron jobs&lt;/li&gt;
&lt;li&gt;Version control for code and models&lt;/li&gt;
&lt;li&gt;Basic monitoring dashboards&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;level-3-continuous-integration&#34;&gt;Level 3: Continuous Integration
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;Automated testing on code changes&lt;/li&gt;
&lt;li&gt;CI/CD pipelines for model training and deployment&lt;/li&gt;
&lt;li&gt;Experiment tracking and model registries&lt;/li&gt;
&lt;/ul&gt;
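&lt;p&gt;At this level, the CI/CD pipeline itself decides whether a newly trained model is registered. Evaluation on a fixed holdout set acts as a gate, and a candidate is promoted only if it beats the current production model. The sketch below is a hypothetical, dependency-free illustration of that gate. Here, &lt;code&gt;make_model&lt;/code&gt; stands in for real training, and the &lt;code&gt;min_gain&lt;/code&gt; margin and dictionary-based registry are assumptions rather than the API of MLflow or any other registry.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;def make_model(threshold):
    # Hypothetical placeholder for training: predicts 1 above a threshold.
    return lambda x: int(x > threshold)


def accuracy(model, holdout):
    return sum(model(x) == y for x, y in holdout) / len(holdout)


def run_pipeline(holdout, registry, candidate_threshold, min_gain=0.01):
    # Hypothetical CI job: train, evaluate on a fixed holdout set, and only
    # then register. The comparison with production is the promotion gate.
    candidate = make_model(candidate_threshold)
    candidate_acc = accuracy(candidate, holdout)
    production = registry.get('production')
    production_acc = accuracy(production['model'], holdout) if production else 0.0
    if candidate_acc - production_acc >= min_gain:
        version = registry.get('latest_version', 0) + 1
        registry['latest_version'] = version
        registry['production'] = {'model': candidate, 'version': version}
        return 'promoted v%d' % version
    return 'rejected'


holdout = [(0.2, 0), (0.4, 0), (0.6, 1), (0.8, 1), (0.55, 1), (0.45, 0)]
model_registry = {}
print(run_pipeline(holdout, model_registry, 0.7))  # promoted v1
print(run_pipeline(holdout, model_registry, 0.9))  # rejected
print(run_pipeline(holdout, model_registry, 0.5))  # promoted v2
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Requiring a minimum gain, rather than any improvement at all, keeps the pipeline from churning versions on noise in the holdout score.&lt;/p&gt;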
&lt;h3 id=&#34;level-4-full-mlops-maturity&#34;&gt;Level 4: Full MLOps Maturity
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;End-to-end automation and orchestration&lt;/li&gt;
&lt;li&gt;Advanced monitoring with drift detection&lt;/li&gt;
&lt;li&gt;Automated retraining triggers&lt;/li&gt;
&lt;li&gt;Multi-model experimentation and A/B testing&lt;/li&gt;
&lt;li&gt;Governance and audit trails&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id=&#34;conclusion&#34;&gt;Conclusion
&lt;/h2&gt;&lt;p&gt;MLOps is essential for scaling machine learning from experimentation to reliable production systems. It bridges the gap between data science innovation and operational stability.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Key Takeaways:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;MLOps combines ML, DevOps, and data engineering practices&lt;/li&gt;
&lt;li&gt;Success requires automation at every stage&lt;/li&gt;
&lt;li&gt;Monitoring and feedback loops are critical&lt;/li&gt;
&lt;li&gt;Start simple and mature gradually&lt;/li&gt;
&lt;li&gt;Team collaboration and clear processes matter as much as tools&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The goal isn&amp;rsquo;t perfect tooling—it&amp;rsquo;s sustainable, scalable ML systems that deliver business value.&lt;/p&gt;
</description>
        </item>
        
    </channel>
</rss>
