Posts

Optimizing a Slow ML Inference API: Lessons Learned

When I started working with inference, I quickly realized there was a gap between theory and what actually happens in practice. This post is about how I optimized a slow ML inference API. I'll walk you through what I learned, what tripped me up, and the lessons that stuck with me. No fluff, just honest notes from someone who went through it.

Introduction to Optimizing a Slow ML Inference API

I still recall the frustration of dealing with a slow ML inference API. The latency was unbearable, and it seemed like no matter what I did, I just couldn't get the performance I needed. But after weeks of trial and error, I finally managed to optimize the API and achieve significant improvements. In this article, I'll share my experience, the mistakes I made, and the lessons I learned along the way.

The Initial Challenges

When I first started working on the ML inference API, I was excited to see it in action. However, my enthusiasm was short-lived. The API was slow, and the laten...

Mastering Model Versioning with DVC and Git: Lessons from the Trenches

When I started working with DVC, I quickly realized there was a gap between theory and what actually happens in practice. This post is about my experience with model versioning using DVC and Git. I'll walk you through what I learned, what tripped me up, and the lessons that stuck with me. No fluff, just honest notes from someone who went through it.

Introduction to Model Versioning

As I delved into the world of machine learning operations (MLOps), I quickly realized the importance of model versioning. Keeping track of changes to models, datasets, and training pipelines is crucial for reproducibility and collaboration. In this article, I'll share my experience with using DVC (Data Version Control) and Git for model versioning, highlighting the lessons I learned, the mistakes I made, and the best practices I discovered.

What is DVC and How Does it Work?

DVC is a tool that helps track large files, such as datasets and model artifacts, outside of Git. This is essential becaus...

Tackling Imbalanced Datasets in Classification Problems

When I started working with imbalanced data, I quickly realized there was a gap between theory and what actually happens in practice. This post is about how I handle imbalanced datasets in classification problems. I'll walk you through what I learned, what tripped me up, and the lessons that stuck with me. No fluff, just honest notes from someone who went through it.

Introduction to Imbalanced Datasets

I still remember the first time I encountered an imbalanced dataset in a classification problem. I was working on a fraud detection model, and my initial results showed a whopping 99 percent accuracy. Sounds great, right? But as I dug deeper, I realized that my model was predicting every single instance as non-fraud. The model was essentially useless, as it was unable to detect any fraudulent cases. This experience taught me a valuable lesson: accuracy is not always the best metric, especially when dealing with imbalanced datasets.

The Problem with Imbalanced Datasets

Imbalance...
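The accuracy trap described in this excerpt is easy to reproduce. What follows is a minimal sketch, not the post's own code: the dataset (990 non-fraud labels, 10 fraud labels) is made up for illustration, and the helper functions `accuracy` and `recall` are hypothetical names written in plain Python rather than taken from any library.

```python
# Hypothetical labels (assumption): 1 = fraud, 0 = non-fraud.
# 990 legitimate transactions and 10 fraudulent ones.
y_true = [0] * 990 + [1] * 10

# A degenerate "model" that predicts non-fraud for every instance.
y_pred = [0] * len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the model found."""
    true_pos = sum(t == positive and p == positive
                   for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return true_pos / actual_pos if actual_pos else 0.0


acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
print(f"accuracy     = {acc:.2%}")  # high, because the majority class dominates
print(f"fraud recall = {rec:.2%}")  # zero, because no fraud case is ever flagged
```

Under these assumed numbers the always-non-fraud predictor scores 99 percent accuracy while catching none of the fraud cases, which is why a per-class metric such as recall is the more informative check here.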

Building End-to-End ML Pipelines with Kubeflow: Lessons Learned

When I started working with Kubeflow, I quickly realized there was a gap between theory and what actually happens in practice. This post is about building an end-to-end ML pipeline with Kubeflow. I'll walk you through what I learned, what tripped me up, and the lessons that stuck with me. No fluff, just honest notes from someone who went through it.

Introduction to Kubeflow Pipelines

As I delved into the world of Machine Learning Operations (MLOps), I discovered the power of Kubeflow Pipelines in building end-to-end ML workflows. My journey was not without its challenges, but the lessons I learned along the way have been invaluable. In this article, I'll share my experiences, mistakes, and key takeaways from building ML pipelines with Kubeflow.

What is Kubeflow Pipelines?

Kubeflow Pipelines is a platform that allows you to define, deploy, and manage complex ML workflows. It turns each ML step into a containerized component, making it easy to manage and reuse pipeline comp...

Demystifying Model Predictions with SHAP Values

When I started working with SHAP, I quickly realized there was a gap between theory and what actually happens in practice. This post is about how I used SHAP values to understand what my model was actually doing. I'll walk you through what I learned, what tripped me up, and the lessons that stuck with me. No fluff, just honest notes from someone who went through it.

Introduction to SHAP Values

As a machine learning engineer, I've often found myself wondering what's driving my model's predictions. Are the features I've carefully selected truly influencing the outcomes, or is something else at play? I discovered the answer to this question when I started using SHAP values, a technique that has revolutionized the way I understand and debug my models. In this article, I'll share my experience with SHAP values, the lessons I learned, and the mistakes I made along the way.

What are SHAP Values?

SHAP (SHapley Additive exPlanations) values are a technique used to ...

Mastering PostgreSQL for Machine Learning: Lessons from the Trenches

When I started working with PostgreSQL, I quickly realized there was a gap between theory and what actually happens in practice. This post is about PostgreSQL for ML engineers: storing features, predictions, and logs. I'll walk you through what I learned, what tripped me up, and the lessons that stuck with me. No fluff, just honest notes from someone who went through it.

Introduction to PostgreSQL for ML Engineers

As I delved into the world of machine learning (ML) engineering, I quickly realized the importance of a robust database management system. PostgreSQL, with its powerful features and flexibility, became my go-to choice for storing and managing ML-related data. In this article, I'll share my experiences, mistakes, and lessons learned from using PostgreSQL in ML projects, highlighting the benefits of using this database system for storing features, predictions, and logs.

The Importance of Auditing and Debugging

One of the most significant advantages of using a dat...

Navigating Model Drift: Lessons from the Trenches

When I started working with drift, I quickly realized there was a gap between theory and what actually happens in practice. This post is about understanding model drift and setting up automated retraining. I'll walk you through what I learned, what tripped me up, and the lessons that stuck with me. No fluff, just honest notes from someone who went through it.

Introduction to Model Drift

As I delved into the world of machine learning operations (MLOps), I encountered a critical challenge that can make or break the performance of a model in production: model drift. It's a phenomenon where the underlying relationships between the input data and the predicted outputs change over time, causing the model's accuracy to degrade. My experience with model drift has been a journey of discovery, filled with mistakes, lessons learned, and a deeper understanding of how to navigate this complex issue.

Concept Drift and Data Drift

There are two primary types of model drift: concept d...