cmauck10's comments | Hacker News

Do you want to reduce the error rate of responses from OpenAI’s o1 LLM by over 20% and also catch incorrect responses in real time?

These 3 benchmarks demonstrate this can be achieved with the Trustworthy Language Model (TLM) framework.

TLM wraps any base LLM to automatically score the trustworthiness of its responses and produce more accurate responses. As of today, o1-preview is supported as a new base model within TLM. The linked benchmarks show that TLM consistently outperforms o1-preview across all 3 datasets.

TLM helps you build more trustworthy AI applications than existing LLMs, even the latest Frontier models.
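In code, this looks roughly like the following (a minimal sketch using the cleanlab_studio client; the API key and prompt are placeholders, and exact option names may differ in your version):

    from cleanlab_studio import Studio

    studio = Studio("<your API key>")
    tlm = studio.TLM(options={"model": "o1-preview"})  # wrap o1-preview as the base model

    out = tlm.prompt("Which constitutional amendment abolished slavery in the US?")
    print(out["response"])               # the base LLM's answer
    print(out["trustworthiness_score"])  # score in [0, 1]; low values flag likely-incorrect answers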


This article demonstrates an agentic system that produces reliable answers in Retrieval-Augmented Generation, while keeping latency and compute costs no higher than the processing actually needed to accurately answer complex queries. Our system relies on trustworthiness scores for LLM outputs to dynamically adjust the retrieval strategy until sufficient context has been retrieved to generate a trustworthy RAG answer.
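A rough sketch of the core loop (not the article's exact implementation; retrieve, tlm, and the threshold are hypothetical placeholders):

    TRUST_THRESHOLD = 0.8   # assumed cutoff; tune for your application
    MAX_K = 32              # cap on how much context we are willing to retrieve

    def answer(query, retrieve, tlm, k=4):
        while True:
            context = retrieve(query, k=k)   # fetch k passages from your vector store
            out = tlm.prompt(f"Context:\n{context}\n\nQuestion: {query}")
            if out["trustworthiness_score"] >= TRUST_THRESHOLD or k >= MAX_K:
                return out["response"]       # trustworthy enough (or retrieval budget exhausted)
            k *= 2                           # otherwise widen retrieval and try again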


Large language models are famous for their ability to make things up—in fact, it’s what they’re best at. But their inability to tell fact from fiction has left many businesses wondering if using them is worth the risk.

A new tool created by Cleanlab, an AI startup spun out of a quantum computing lab at MIT, is designed to give high-stakes users a clearer sense of how trustworthy these models really are. Called the Trustworthy Language Model, it gives any output generated by a large language model a score between 0 and 1, according to its reliability. This lets people choose which responses to trust and which to throw out. In other words: a BS-o-meter for chatbots.


We open sourced cleanlab as a Python library to quickly identify dataset problems in any Machine Learning project. While manual issue detection is often done during data prep prior to model training, your trained ML model captures a lot of information about its dataset that can reveal critical issues if the right algorithms are applied. The cleanlab package offers a data-centric AI platform to run many such algorithms and detect common problems in ML datasets like: mislabeling, outliers, (near) duplicates, drift, etc.
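A minimal sketch of how an audit looks with the library's Datalab interface (assuming you already have out-of-sample predicted probabilities and/or feature embeddings from a trained model):

    from cleanlab import Datalab

    lab = Datalab(data=df, label_name="label")   # df: your dataset as a DataFrame
    lab.find_issues(
        pred_probs=pred_probs,         # out-of-sample predicted probabilities from your model
        features=feature_embeddings,   # optional embeddings used for outlier/duplicate checks
    )
    lab.report()   # summarizes label errors, outliers, (near) duplicates, drift, etc.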


Would you trust medical AI that’s been trained on pathology/radiology images where tumors/injuries were overlooked by data annotators or otherwise mislabeled? Most image segmentation datasets today contain tons of errors because it is painstaking to annotate every pixel.

We have added support for semantic segmentation to automatically catch annotation errors in image segmentation datasets before they harm your models! Quickly use the open-source cleanlab library to detect bad data and fix it before training/evaluating your segmentation models. This is the easiest way to increase the reliability of your data & AI!

We've freely open-sourced our new method for improving segmentation data and published a paper on the research behind it: https://arxiv.org/abs/2307.05080
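Open-source usage looks roughly like this (the array shapes are my reading of the expected format; see the docs for exact details):

    from cleanlab.segmentation.filter import find_label_issues

    # labels: annotated class id per pixel, shape (N, H, W)
    # pred_probs: predicted class probabilities from your segmentation model, shape (N, K, H, W)
    issues = find_label_issues(labels, pred_probs)   # boolean mask of likely-mislabeled pixels
    print(f"{issues.sum()} pixels flagged as probable annotation errors")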


Multi-label classification uses data where each example can belong to multiple (or none) of K classes. For example, an image of a face might be labeled with both wearing_glasses and wearing_necklace, unlike standard multi-class classification where each example has exactly one label.

Ensuring high quality labels in multi-label classification datasets is really hard, as they often contain tons of tagging errors because annotating such data requires many decisions per example.

This article explores the challenges of multi-label data quality and demonstrates how to automatically identify and rectify problems with an enterprise no-code AI data correction tool.
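While the article focuses on the no-code tool, a similar check can be run programmatically with the open-source library (a minimal sketch; labels and pred_probs are placeholders for your own annotations and model outputs):

    from cleanlab.multilabel_classification import get_label_quality_scores

    # labels: list of lists of applicable class indices per example, e.g. [[0, 3], [], [2]]
    # pred_probs: model-predicted probability of each of the K classes, shape (N, K)
    scores = get_label_quality_scores(labels, pred_probs)
    worst_first = scores.argsort()   # examples most likely to be mis-tagged come first; review these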


When generating synthetic data with LLMs (GPT4, Claude, …) or diffusion models (DALLE 3, Stable Diffusion, Midjourney, …), how do you evaluate how good it is?

Introducing: quality scores to systematically evaluate a synthetic dataset with just one line of code! Use Cleanlab’s synthetic dataset scores to rigorously guide your prompt engineering (much better signal than just manually inspecting samples). These scores also help you tune the settings of any synthetic data generator (e.g., GAN or probabilistic model hyperparameters) and compare different synthetic data providers.

Cleanlab scores comprehensively evaluate a synthetic dataset for different shortcomings including: unrealistic examples, low diversity, overfitting/memorization of real data, and underrepresentation of certain real scenarios. These scores are universally applicable to image, text, and structured/tabular data!
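To give a flavor of what such scores capture, here is a toy illustration (NOT Cleanlab's actual implementation) of the memorization and unrealistic-example checks, using nearest-neighbor distances in an embedding space:

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    # real_emb, synth_emb: embeddings of real and synthetic examples, shape (n, d)
    nn = NearestNeighbors(n_neighbors=1).fit(real_emb)
    dist, _ = nn.kneighbors(synth_emb)   # distance from each synthetic point to its nearest real point
    dist = dist.ravel()

    memorized = dist < np.quantile(dist, 0.01)     # suspiciously close to real data: possible memorization
    unrealistic = dist > np.quantile(dist, 0.99)   # far from all real data: possibly unrealistic examples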

Check out the blog for more details or the tutorial notebook: https://help.cleanlab.ai/tutorials/synthetic_data/


Would you deploy a self-driving car model that was trained on images for which data annotators accidentally forgot to highlight some pedestrians?

Annotators of real-world object detection datasets often make such errors and many other mistakes. To avoid training models on erroneous data and save QA teams significant time, you can now use automated algorithms invented by our scientists.

Our newest paper introduces Cleanlab Object Detection: a novel algorithm to assess label quality in any object detection dataset and catch errors (named ObjectLab for short). Extensive benchmarks show Cleanlab Object Detection identifies mislabeled images with better precision/recall than other approaches. When applied to the famous COCO dataset, Cleanlab Object Detection automatically discovers hundreds of mislabeled images, including errors where annotators mistakenly: overlooked an object that should’ve had a bounding box, sloppily drew a box in a poor location, or chose the wrong class label for an annotated object.

We’ve open-sourced Cleanlab Object Detection, so you can find errors in any object detection dataset with one line of code, utilizing any existing object detection model you’ve trained.
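Usage looks roughly like this (the exact expected format of labels and predictions is documented in the library; treat this as a sketch):

    from cleanlab.object_detection.filter import find_label_issues
    from cleanlab.object_detection.rank import get_label_quality_scores

    # labels: per-image ground-truth boxes + class ids
    # predictions: per-image predicted boxes/classes/confidences from your trained detector
    is_issue = find_label_issues(labels, predictions)        # boolean flag per image
    scores = get_label_quality_scores(labels, predictions)   # lower score = image more likely mislabeled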


Years ago, we showed the world it was possible to automatically detect label errors in classification datasets via machine learning. Since then, folks have asked whether the same is possible for regression datasets.

Answering this question required extensive research, since properly accounting for uncertainty (critical for deciding when to trust machine learning predictions over the data itself) poses unique challenges in the regression setting.

Today we have published a new paper introducing an effective method for “Detecting Errors in Numerical Data via any Regression Model”. Our method can find likely incorrect values in any numerical column of a dataset by utilizing a regression model trained to predict this column based on the other data features.

We’ve added our new algorithm to our open-source cleanlab library for you to algorithmically audit your own datasets for errors. Use this code for applications like detecting: data entry errors, sensor noise, incorrect invoices/prices in your company’s or clients’ records, and mis-estimated counts (e.g., of cells in biological experiments).

Extensive benchmarks reveal cleanlab’s algorithm detects erroneous values in real numeric datasets better than alternative methods like RANSAC and conformal inference.
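A minimal sketch with the open-source API (wrapping any scikit-learn-compatible regressor; X and y are placeholders for your features and the numeric column being audited, and output column names may differ slightly in your version):

    from sklearn.linear_model import LinearRegression
    from cleanlab.regression.learn import CleanLearning

    cl = CleanLearning(LinearRegression())
    label_issues = cl.find_label_issues(X, y)   # per-row quality scores + flags for likely-erroneous values
    print(label_issues.sort_values("label_quality").head(10))   # most suspicious values first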

If you'd like to learn more, check out the research paper (https://arxiv.org/abs/2305.16583), the code (https://github.com/cleanlab/cleanlab), and the tutorial for running this on your own data (https://docs.cleanlab.ai/stable/tutorials/regression.html).


Hey guys!

I'm excited to share our newest release of cleanlab, which helps you clean data and labels by automatically detecting issues in any ML dataset. To facilitate machine learning with messy, real-world data, this data-centric AI package uses your existing models to estimate dataset problems that can be fixed to train even better models.

With this release, it now supports:

- Regression (NEW)
- Object detection (NEW)
- Image segmentation (NEW)
- Outlier detection
- Binary, multi-class, and multi-label classification
- Token classification
- Classification with data labeled by multiple annotators
- Active learning with multiple annotators
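For example, the classification workflow wraps any scikit-learn-compatible model (a minimal sketch; X and labels are your own features and noisy labels):

    from sklearn.linear_model import LogisticRegression
    from cleanlab.classification import CleanLearning

    cl = CleanLearning(clf=LogisticRegression())
    label_issues = cl.find_label_issues(X, labels)   # cross-validates your model to flag likely label errors
    cl.fit(X, labels)                                # then trains a more robust model that accounts for them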

Let’s make AI more data-centric :)

