Gagandeep Singh
Staff Data Scientist @ Walmart
🚀 Exciting News for Data Practitioners! 🚀 Are you tired of the hassle and expense of annotating production data to monitor your machine learning models? Allow me to introduce NannyML, an open-source post-deployment model monitoring framework in Python that's about to make your life a whole lot easier. NannyML comes packed with some seriously clever features. The best part? It doesn't rely on labeled data. Yes, you heard that right – all the magic happens with the features you're already capturing while your model is in production. Let's dive into the essence of NannyML:
🔍 Feature Drift Detection: NannyML identifies univariate drift by comparing feature distributions across different chunks of data. Think of these 'chunks' as snapshots of your data – they can be based on time, size, or number of observations.
📊 Advanced Analysis: For more complex scenarios involving multivariate drift, NannyML tracks the Principal Component Analysis (PCA) data reconstruction error across chunks. In simple terms, it checks that the key components of your data stay stable over time, so you can spot deviations.
But wait, there's more:
📈 Model Performance Estimation: NannyML doesn't stop at drift detection; it also estimates your model's performance, helping you maintain top-notch results.
💰 Business Value Estimation: It also helps you assess the business value of your models, ensuring they continue to deliver the desired outcomes.
🔍 Data Quality Monitoring: Keeping an eye on data quality? NannyML has your back, ensuring your data remains reliable and consistent.
While NannyML offers a plethora of functionality, let's focus on feature drift detection, which doesn't require labeled data. It won't catch concept drift, but that's a small trade-off for the convenience it provides.
📊 Univariate Drift: Check whether individual feature distributions have shifted between the reference and analysis periods – a game-changer for maintaining model accuracy.
📉 Output Drift: Track shifts in the distribution of predicted classes over time, ensuring your model's predictions remain on point.
📊 Multivariate Drift: NannyML goes a step further by looking at the overall shift in the joint feature distribution, validating the univariate results.
In conclusion, NannyML is a powerful tool that simplifies drift detection for production models. With its intuitive interface and open-source nature, there's no excuse not to use it. Don't let your production models lose their business value due to neglect. Embrace NannyML and keep your models on the right track! 🚀
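The univariate check described above boils down to comparing a feature's distribution on a reference chunk against a production chunk. A minimal pure-Python sketch of the underlying idea (this is a conceptual illustration, not NannyML's actual API) using the Jensen-Shannon distance between two binned distributions:

```python
import math

def js_distance(p, q):
    """Jensen-Shannon distance between two discrete distributions (binned histograms)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(a, b):
        # Kullback-Leibler divergence in bits; zero-probability bins contribute nothing.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return math.sqrt((kl(p, m) + kl(q, m)) / 2)

reference = [0.25, 0.25, 0.25, 0.25]   # feature histogram on the reference period
analysis  = [0.10, 0.20, 0.30, 0.40]   # same feature on a production chunk
score = js_distance(reference, analysis)  # ≈ 0.2
drifted = score > 0.1                     # illustrative alert threshold
```

The distance is 0 for identical distributions and bounded by 1, which makes a fixed alert threshold workable across features.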
Oct 29, 2023
Marios Kadriu
Data Scientist | Software Engineer | DeepMind Scholar
91% is an incredible percentage. It just goes to show how crucial it is to keep an eye on model performance in production. Furthermore, there may be factors beyond data drift that cause degradation. I am very happy to learn that NannyML, an open-source Python module, exists to help us start a performance monitoring project. Have you used NannyML or a different model performance monitoring tool? #datascience #ai #machinelearning
Jul 27, 2023
Anderson Chaves
Lead Data Scientist at EssilorLuxottica
Another cool thing for the day! 💡 Thanks, Wojtek Kuberski and NannyML! #python #machinelearning #datascience
Nov 10, 2022
Pascal Biese
AI/ML Engineer | LLMs | NLP
Does the pattern below look familiar to you? If not, you can consider yourself incredibly lucky! For everyone else, check out NannyML: NannyML is an open-source Python library that allows you to estimate post-deployment model performance (without access to targets), detect data drift, and intelligently link data drift alerts back to changes in model performance. Built for data scientists, NannyML has an easy-to-use interface and interactive visualizations, is completely model-agnostic, and currently supports all tabular binary classification use cases. (Source: lnkd.in/e2RjQsQg) #DataScience #MachineLearning #AI #DeepLearning
May 16, 2022
Kishan Savant
Software Engineer @VCollab | Open Source Enthusiast
Thank you Hakim Elakhrass, Niels Nuyttens and NannyML team for sending this #contributions #swag all the way from Belgium. Really appreciate the note, Hakim. Looking forward to more #opensource #contributions to the wonderful NannyML #modelmonitoring tool.
Apr 8, 2023
🚀 Mikkel Jensen
Data Scientist | Developer | ML & AI
Great resources and libraries I have come across recently: NannyML: An open-source tool for monitoring models post deployment. I find the ability to estimate performance without labels especially helpful, as we have a one-year lag on our labels. It is also possible to detect data/classifier drift. 𝐎𝐩𝐭𝐁𝐢𝐧𝐧𝐢𝐧𝐠: The go-to data binning library. Super useful for discovering interesting intervals in variables, and for binning and preprocessing before applying a logistic regression on top. 𝐓𝐚𝐛𝐃𝐃𝐏𝐌: A new method for modelling tabular data. While I have only briefly browsed the paper, it seems promising for generating tabular data, outperforming both SMOTE and GANs in most of the tested cases! The paper was published only a week ago, but I'm probably already late to the party talking about this one. What are your favorite new tools? Links in the comments👇 ------------------------------------------------------------------ Talking data science, credit risk, cryptocurrency, software development - Connect & start the conversation! #machinelearning #python #datascience #opensource
Oct 6, 2022
Olivier Binette
Data Science Research, ML Evaluation & Entity Resolution // PhD Candidate at Duke
🙅 Stop monitoring data drift. Here's what to do instead. After deploying machine learning models, it's essential to ensure that they keep functioning as intended. One common way to do this is by monitoring data drift, or verifying that the data used for predictions is similar to the data the model was trained on. If there is a significant difference, retraining the model on updated data can help. However, monitoring data drift does not indicate if the drift is affecting the model's ability to solve the task it was trained for. An alternative and more focused approach is to continuously monitor the model's generalization performance. This can be done without using any labeled data through clever statistical techniques, such as confidence-based performance estimation and direct loss estimation. 🚀 NannyML implements these methods, allowing you to estimate the model's performance on drifted data and focus on what is most important for your task. Does it mean you should really stop monitoring data drift? No. But keep in mind that data drift does not always equate to performance drift, and it's usually more beneficial to focus on the latter. Code below is from lnkd.in/e95kJv-H #machinelearning #ml #ai #drift #performance #evaluation #statistics #datascience #nannyml #datadrift #MLOps
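The confidence-based performance estimation mentioned above has a simple core: if the model's probability scores are well calibrated, a hard prediction at threshold 0.5 is correct with probability max(p, 1 − p), so averaging that quantity over a chunk estimates accuracy without a single label. A toy sketch of this idea (a conceptual illustration, not NannyML's actual CBPE implementation, which also handles metrics like ROC AUC):

```python
def estimated_accuracy(probs):
    """CBPE-style expected accuracy from calibrated positive-class scores.

    Assumes scores are calibrated P(y=1) and there is no concept drift:
    each 0.5-threshold prediction is correct with probability max(p, 1 - p).
    """
    return sum(max(p, 1 - p) for p in probs) / len(probs)

confident = [0.95, 0.05, 0.90, 0.10]   # model is sure of itself
uncertain = [0.55, 0.45, 0.60, 0.40]   # drifted inputs push scores toward 0.5
estimated_accuracy(confident)  # 0.925
estimated_accuracy(uncertain)  # 0.575
```

This is also why drift that pushes scores toward 0.5 shows up directly as an estimated performance drop, which is exactly the signal the post argues you should watch.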
Feb 2, 2023
Roger Kamena, M.Sc.
Senior Data Scientist, AI-Powered Analytics at UKG
Even though sheer experience taught me that data drift alone is not enough to monitor models in production, and that regularly monitoring performance through randomized control trials on out-of-sample data is a needed practice, it’s nice to read a paper that confirms it with empirical evidence. Moreover, NannyML is a nice discovery for me. Can’t wait to test it!
Feb 8, 2023
Louis Owen
AI & Data Science | Yellow.ai
[Estimating Accuracy Without Ground Truth] Getting your ML model to production is not an easy task. Monitoring your deployed ML model is even harder. If you have a business metric that directly correlates with your ML model's performance, good for you. But what if multiple factors influence your business metric and your ML model is just one of them? What if you want to know exactly how your ML model is performing in production, measured by technical ML metrics (accuracy, precision, recall, etc.)? ✨ Introducing CBPE (Confidence-Based Performance Estimation), developed by the amazing team behind NannyML. With CBPE you can estimate the performance of any classification model without any ground truth! How is that even possible? It relies on the model's prediction confidence scores, under the assumptions that the model is well-calibrated and that there is no concept drift. Furthermore, we can also estimate the performance of a regression model with a similar algorithm developed by the NannyML team, called DLE (Direct Loss Estimation)! Curious to learn more? You can refer to the following articles: 📌 lnkd.in/gJuPBQcb 📌 https://lnkd.in/gsbG3HgE 📌 https://lnkd.in/gh3A6iUJ #artificialintelligence #machinelearning #datascience #sharingiscaring
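For the regression case, the DLE idea is to train a second "nanny" model on reference data (where targets are known) to predict the main model's per-row loss from the features; averaging its predictions on unlabeled production rows then estimates the metric. A deliberately tiny sketch with a 1-nearest-neighbor stand-in for the nanny model (conceptual only; the data and the nearest-neighbor choice are illustrative, not NannyML's implementation):

```python
def fit_loss_model(features, losses):
    """Toy nanny model: predicted loss = loss of the nearest reference row."""
    def predict(x):
        nearest = min(range(len(features)), key=lambda i: abs(features[i] - x))
        return losses[nearest]
    return predict

# Reference period: targets available, so per-row squared errors are computable.
ref_x = [1.0, 2.0, 3.0, 4.0]
ref_loss = [0.1, 0.2, 0.4, 0.8]           # (y - y_hat)^2 for each reference row
nanny = fit_loss_model(ref_x, ref_loss)

# Analysis period: no targets, but we can still estimate the main model's MSE.
analysis_x = [1.1, 3.9, 2.2]
est_mse = sum(nanny(x) for x in analysis_x) / len(analysis_x)  # ≈ 0.367
```

In practice the nanny model is a full gradient-boosted regressor over all features, but the estimate is formed the same way: average the predicted losses over the chunk.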
Feb 7, 2023
Smriti Mishra
Data Science & Engineering at Adage
How can you know if your ML models did not fail silently after deployment? NannyML was trending on GitHub (it was also #3 on Product Hunt). It's a fantastic open-source Python library for detecting silent ML model failure! Key aspects include: 🔹Estimate the performance of a deployed ML model in the absence of target data 🔹Detect multivariate and univariate data drift robustly 🔹Link drops in performance to drift in specific features 🔹Compatible with all classification models 🔹Built-in performance and data drift visualisations. pip install nannyml. Check out the open-source project and give it a star to keep up with future updates like forthcoming regression support! lnkd.in/ej_VtyeP #technology #artificialintelligence #python #data #programming #machinelearning
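The multivariate detection mentioned above rests on PCA reconstruction error: fit principal components on the reference period, then measure how poorly new chunks are reconstructed from those components. When the correlation structure breaks, the error rises even if each feature's marginal distribution looks fine. A pure-Python 2-D sketch of the concept (illustrative only, not NannyML's implementation, which uses a full PCA over all features):

```python
import math

def leading_axis(xs, ys):
    """Unit vector along the first principal component of 2-D data (closed form)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) ** 2 for x in xs) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    c = sum((y - my) ** 2 for y in ys) / n
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)  # top eigenvalue
    vx, vy = b, lam - a                                         # its eigenvector
    norm = math.hypot(vx, vy) or 1.0
    return mx, my, vx / norm, vy / norm

def reconstruction_error(reference, batch):
    """Mean distance of batch points from the principal axis fitted on reference."""
    mx, my, ux, uy = leading_axis(*zip(*reference))
    errs = []
    for x, y in batch:
        t = (x - mx) * ux + (y - my) * uy      # project onto the reference axis
        rx, ry = mx + t * ux, my + t * uy      # reconstruct from the projection
        errs.append(math.hypot(x - rx, y - ry))
    return sum(errs) / len(errs)

reference = [(float(i), float(i)) for i in range(10)]  # perfectly correlated pair
aligned = [(2.0, 2.1), (5.0, 4.9)]   # same structure: small reconstruction error
broken  = [(2.0, 8.0), (7.0, 1.0)]   # correlation broken: large error, drift alert
```

Here `reconstruction_error(reference, broken)` is far larger than `reconstruction_error(reference, aligned)`, even though both batches lie inside the reference range of each individual feature.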
Jun 20, 2022
👋 Simon Stiebellehner
Building ML Platforms | Engineering Manager | 👨‍🏫 University Lecturer | Advisor
Deployed a #MachineLearning model to production? Now you're having sleepless nights due to 𝗶𝗺𝗺𝗶𝗻𝗲𝗻𝘁 𝗽𝗲𝗿𝗳𝗼𝗿𝗺𝗮𝗻𝗰𝗲 𝗱𝗲𝗴𝗿𝗮𝗱𝗮𝘁𝗶𝗼𝗻? 😰 𝗚𝗲𝘁 𝗮 𝗡𝗮𝗻𝗻𝘆 𝗳𝗼𝗿 𝘆𝗼𝘂𝗿 𝗺𝗼𝗱𝗲𝗹! 𝗡𝗮𝗻𝗻𝘆𝗠𝗟 (by NannyML) is an 𝗢𝗽𝗲𝗻 𝗦𝗼𝘂𝗿𝗰𝗲 𝗣𝘆𝘁𝗵𝗼𝗻 𝗽𝗮𝗰𝗸𝗮𝗴𝗲 that helps you estimate model performance in production without access to targets! 𝚙𝚒𝚙  𝚒𝚗𝚜𝚝𝚊𝚕𝚕  𝚗𝚊𝚗𝚗𝚢𝚖𝚕 ➡️ 𝗗𝗲𝘁𝗲𝗰𝘁 𝗱𝗮𝘁𝗮 𝗱𝗿𝗶𝗳𝘁 of deployed models w/o targets(!) ➡️ Configure 𝗮𝗹𝗲𝗿𝘁𝘀 ➡️ Link data drift back to 𝗺𝗼𝗱𝗲𝗹 𝗰𝗵𝗮𝗻𝗴𝗲𝘀 ➡️ 𝗠𝗼𝗱𝗲𝗹-𝗮𝗴𝗻𝗼𝘀𝘁𝗶𝗰 ➡️ Comes with a neat set of 𝘃𝗶𝘀𝘂𝗮𝗹𝗶𝘇𝗮𝘁𝗶𝗼𝗻𝘀 Check it out and star the repo to stay up-to-date! ⭐ 𝗚𝗶𝘁𝗛𝘂𝗯: lnkd.in/e-_YSDsh --- Follow me for curated, high-quality content on productionizing #MachineLearning and #MLOps . Let’s take #DataScience from Notebook to Production!
May 14, 2022
João Maia
Data Scientist | Machine Learning | Python | Keras
Thanks Wojtek, I recently implemented your solution in my personal project, and it's really, really useful for identifying bad features, even after going through other variable selection methods, and it helps me prevent some failures.
Jul 27, 2023
NLP Logix
3,527 followers
Great read alert: this article from NannyML discusses a recent study by MIT, Harvard, The University of Monterrey, and other top institutions on how models degrade over time. lnkd.in/g49zhe5a #ml #ai #mlmodels #datascienceisateamsport
Apr 18, 2023