Learn Data Science With Mukeshram

Posts

Showing posts from June, 2026

Optimizing a Slow ML Inference API: Lessons Learned

June 29, 2026

When I started working on ML inference, I quickly realized there was a gap between theory and what actually happens in practice. This post is about how I optimized a slow ML inference API. I'll walk you through what I learned, what tripped me up, and the lessons that stuck with me. No fluff, just honest notes from someone who went through it.

Introduction to Optimizing a Slow ML Inference API

I still recall the frustration of dealing with a slow ML inference API. The latency was unbearable, and it seemed like no matter what I did, I just couldn't get the performance I needed. But after weeks of trial and error, I finally managed to optimize the API and achieve significant improvements. In this article, I'll share my experience, the mistakes I made, and the lessons I learned along the way.

The Initial Challenges

When I first started working on the ML inference API, I was excited to see it in action. However, my enthusiasm was short-lived. The API was slow, and the laten...

Mastering Model Versioning with DVC and Git: Lessons from the Trenches

June 29, 2026

When I started working with DVC, I quickly realized there was a gap between theory and what actually happens in practice. This post is about my experience with model versioning using DVC and Git. I'll walk you through what I learned, what tripped me up, and the lessons that stuck with me. No fluff, just honest notes from someone who went through it.

Introduction to Model Versioning

As I delved into the world of machine learning operations (MLOps), I quickly realized the importance of model versioning. Keeping track of changes to models, datasets, and training pipelines is crucial for reproducibility and collaboration. In this article, I'll share my experience with using DVC (Data Version Control) and Git for model versioning, highlighting the lessons I learned, the mistakes I made, and the best practices I discovered.

What is DVC and How Does it Work?

DVC is a tool that helps track large files, such as datasets and model artifacts, outside of Git. This is essential becaus...

Tackling Imbalanced Datasets in Classification Problems

June 28, 2026

When I started working with imbalanced data, I quickly realized there was a gap between theory and what actually happens in practice. This post is about how I handle imbalanced datasets in classification problems. I'll walk you through what I learned, what tripped me up, and the lessons that stuck with me. No fluff, just honest notes from someone who went through it.

Introduction to Imbalanced Datasets

I still remember the first time I encountered an imbalanced dataset in a classification problem. I was working on a fraud detection model, and my initial results showed a whopping 99 percent accuracy. Sounds great, right? But as I dug deeper, I realized that my model was predicting every single instance as non-fraud. The model was essentially useless, as it was unable to detect any fraudulent cases. This experience taught me a valuable lesson: accuracy is not always the best metric, especially when dealing with imbalanced datasets.

The Problem with Imbalanced Datasets

Imbalance...
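To make the accuracy trap described in this excerpt concrete, here is a minimal sketch in plain Python. The 990/10 class split and the always-predict-non-fraud "model" are illustrative assumptions, not data from the original project; the helper names `accuracy` and `recall` are hypothetical as well.

```python
# Hypothetical toy labels: 990 legitimate transactions (0) and 10 fraudulent ones (1).
y_true = [0] * 990 + [1] * 10

# A degenerate "model" that predicts non-fraud for every instance.
y_pred = [0] * len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the model found."""
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return true_pos / actual_pos if actual_pos else 0.0


print(f"accuracy:     {accuracy(y_true, y_pred):.2f}")  # 0.99
print(f"fraud recall: {recall(y_true, y_pred):.2f}")    # 0.00
```

With this split, accuracy looks excellent while recall on the fraud class is zero, which is why a per-class metric such as recall is usually worth checking alongside accuracy on imbalanced data.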

Building End-to-End ML Pipelines with Kubeflow: Lessons Learned

June 28, 2026

When I started working with Kubeflow, I quickly realized there was a gap between theory and what actually happens in practice. This post is about building an end-to-end ML pipeline with Kubeflow. I'll walk you through what I learned, what tripped me up, and the lessons that stuck with me. No fluff, just honest notes from someone who went through it.

Introduction to Kubeflow Pipelines

As I delved into the world of Machine Learning Operations (MLOps), I discovered the power of Kubeflow Pipelines in building end-to-end ML workflows. My journey was not without its challenges, but the lessons I learned along the way have been invaluable. In this article, I'll share my experiences, mistakes, and key takeaways from building ML pipelines with Kubeflow.

What is Kubeflow Pipelines?

Kubeflow Pipelines is a platform that allows you to define, deploy, and manage complex ML workflows. It turns each ML step into a containerized component, making it easy to manage and reuse pipeline comp...

Demystifying Model Predictions with SHAP Values

June 27, 2026

When I started working with SHAP, I quickly realized there was a gap between theory and what actually happens in practice. This post is about how I used SHAP values to understand what my model was actually doing. I'll walk you through what I learned, what tripped me up, and the lessons that stuck with me. No fluff, just honest notes from someone who went through it.

Introduction to SHAP Values

As a machine learning engineer, I've often found myself wondering what's driving my model's predictions. Are the features I've carefully selected truly influencing the outcomes, or is something else at play? I discovered the answer to this question when I started using SHAP values, a technique that has revolutionized the way I understand and debug my models. In this article, I'll share my experience with SHAP values, the lessons I learned, and the mistakes I made along the way.

What are SHAP Values?

SHAP (SHapley Additive exPlanations) values are a technique used to ...

Mastering PostgreSQL for Machine Learning: Lessons from the Trenches

June 27, 2026

When I started working with PostgreSQL, I quickly realized there was a gap between theory and what actually happens in practice. This post is about PostgreSQL for ML engineers: storing features, predictions, and logs. I'll walk you through what I learned, what tripped me up, and the lessons that stuck with me. No fluff, just honest notes from someone who went through it.

Introduction to PostgreSQL for ML Engineers

As I delved into the world of machine learning (ML) engineering, I quickly realized the importance of a robust database management system. PostgreSQL, with its powerful features and flexibility, became my go-to choice for storing and managing ML-related data. In this article, I'll share my experiences, mistakes, and lessons learned from using PostgreSQL in ML projects, highlighting the benefits of using this database system for storing features, predictions, and logs.

The Importance of Auditing and Debugging

One of the most significant advantages of using a dat...

Navigating Model Drift: Lessons from the Trenches

June 26, 2026

When I started working with model drift, I quickly realized there was a gap between theory and what actually happens in practice. This post is about understanding model drift and setting up automated retraining. I'll walk you through what I learned, what tripped me up, and the lessons that stuck with me. No fluff, just honest notes from someone who went through it.

Introduction to Model Drift

As I delved into the world of machine learning operations (MLOps), I encountered a critical challenge that can make or break the performance of a model in production: model drift. It's a phenomenon where the underlying relationships between the input data and the predicted outputs change over time, causing the model's accuracy to degrade. My experience with model drift has been a journey of discovery, filled with mistakes, lessons learned, and a deeper understanding of how to navigate this complex issue.

Concept Drift and Data Drift

There are two primary types of model drift: concept d...

Structuring a Real-World Machine Learning Project from Scratch

June 26, 2026

When I started working on project structure, I quickly realized there was a gap between theory and what actually happens in practice. This post is about how I structured a real ML project from scratch. I'll walk you through what I learned, what tripped me up, and the lessons that stuck with me. No fluff, just honest notes from someone who went through it.

Introduction to Machine Learning Project Structure

I'll never forget my first machine learning project. I was excited to dive in and start building, but I made a critical mistake: I put all my code in a single, massive script. It wasn't long before I realized that this approach wouldn't scale, especially when I added a second teammate to the project. The script was cumbersome, difficult to navigate, and prone to errors. I learned the hard way that a well-structured project is essential for success in machine learning. As I worked through the challenges of building a machine learning project from scratch, I disco...

Mastering Background Tasks in Python with Celery and Redis

June 25, 2026

When I started working with Celery, I quickly realized there was a gap between theory and what actually happens in practice. This post is about running background tasks in Python with Celery and Redis. I'll walk you through what I learned, what tripped me up, and the lessons that stuck with me. No fluff, just honest notes from someone who went through it.

Introduction to Background Tasks

As a developer, I've often found myself dealing with tasks that are too heavy to be handled within the request cycle of my web application. Whether it's sending emails, processing large datasets, or making API calls, these tasks can significantly slow down my application's response time. That's where Celery comes in: a distributed task queue that allows me to run background tasks asynchronously. In this article, I'll share my experience with Celery and Redis, highlighting the lessons I've learned and the challenges I've faced.

Why Celery and Redis?

I chose Celery ...

The Unspoken Truths of Feature Engineering: Lessons from the Trenches

June 25, 2026

When I started working on feature engineering, I quickly realized there was a gap between theory and what actually happens in practice. This post is about what I learned about feature engineering that no tutorial tells you. I'll walk you through what I learned, what tripped me up, and the lessons that stuck with me. No fluff, just honest notes from someone who went through it.

Introduction to Feature Engineering

As I reflect on my journey in machine learning, I've come to realize that feature engineering is often the unsung hero of a successful model. It's easy to get caught up in the latest algorithms and techniques, but at the end of the day, good features matter more than a fancy model. I've learned this the hard way, through trial and error, and I'm excited to share my experiences with you. One of the most important lessons I've learned is that domain knowledge beats any automated feature selection algorithm. There's no substitute for understandin...

Streamlining MLOps with GitHub Actions: My Journey to a Seamless CI/CD Pipeline

June 24, 2026

When I started working with GitHub Actions, I quickly realized there was a gap between theory and what actually happens in practice. This post is about how I set up a CI/CD pipeline for ML models using GitHub Actions. I'll walk you through what I learned, what tripped me up, and the lessons that stuck with me. No fluff, just honest notes from someone who went through it.

Introduction to CI/CD Pipelines for ML Models

As I delved into the world of Machine Learning Operations (MLOps), I quickly realized the importance of implementing a robust Continuous Integration/Continuous Deployment (CI/CD) pipeline. This wasn't just about automating repetitive tasks; it was about ensuring the reliability and consistency of our ML models. In this article, I'll share my experience of setting up a CI/CD pipeline using GitHub Actions, highlighting the lessons I learned, the challenges I faced, and the solutions I discovered.

Getting Started with GitHub Actions

One of the primary reasons...