Introducing 3LC
3LC is a game-changer in machine learning, offering real-time, data-centric iterations for model training and data quality. It’s a unique debugger that focuses on the nuances of how models learn from your data. 3LC users can delve into model performance beyond labeling errors, exploring aspects like non-trivial False Positives, embedding space movements, and interactive metrics analysis. 3LC enables data scientists to make informed decisions, enhancing model accuracy and performance.
Model Building with True Data-Centric Iterations
Welcome to the first blog post for 3LC! After more than 20 months in stealth mode, and with our first release now out, we’re excited to introduce a game-changer in the world of machine learning and data science and to share how 3LC is transforming the approach to model training and data quality. It is a first-of-its-kind debugger for training or fine-tuning your models!
Beyond Label Errors
Imagine a scenario where a model’s performance isn’t just about correcting mislabeled data but about understanding the nuances of how the model learns from the data.
What about all those high-confidence False Positive bounding boxes that have zero intersection with your labels? What if you could interactively filter them to have a closer look? Is what’s inside those bounding boxes visually very similar to what you labeled and so it’s no surprise that the model is confused? Would adding those newly filtered bounding boxes to the training set as a new class be beneficial, so the model can separate this more easily? Could that reduce your number of False Positives?
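To make the first question concrete, here is a minimal sketch in plain Python of what "high-confidence False Positives with zero intersection with your labels" means as a filter. It is not 3LC code: the function names, the (x1, y1, x2, y2) box format, and the 0.8 confidence cutoff are all assumptions chosen for illustration.

```python
def intersection_area(a, b):
    """Overlap area of two boxes given as (x1, y1, x2, y2)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, width) * max(0.0, height)


def isolated_false_positives(predictions, labels, min_confidence=0.8):
    """Keep confident predictions that touch no label box at all.

    `min_confidence` is a hypothetical cutoff, not a 3LC default.
    """
    return [
        p for p in predictions
        if p["confidence"] >= min_confidence
        and all(intersection_area(p["box"], lab) == 0.0 for lab in labels)
    ]


labels = [(10, 10, 50, 50)]
predictions = [
    {"box": (12, 12, 48, 48), "confidence": 0.95},      # overlaps the label
    {"box": (100, 100, 140, 140), "confidence": 0.91},  # confident, no overlap
    {"box": (200, 200, 220, 220), "confidence": 0.30},  # no overlap, low confidence
]

candidates = isolated_false_positives(predictions, labels)
```

Only the second prediction survives the filter. Boxes like it are the ones worth inspecting visually and, if they turn out to be a consistent look-alike, considering as a new class.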
Or how far did your training samples travel in embedding space between epochs, and are some more important to train on than others? Visualize them in 3LC, see the movement between epochs, and filter on distance traveled! Inspect those prompts or images: are they difficult but correct samples? What would happen if you weighted them based on distance traveled so that the hard samples occur more often in the data stream?
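The weighting idea can be sketched in a few lines of plain Python. This is an illustration under assumptions, not how 3LC computes anything: the helper names, the Euclidean metric, and the `floor` parameter are hypothetical choices.

```python
import math
import random


def distance_traveled(prev_embeddings, curr_embeddings):
    """Euclidean distance each sample moved between two epochs."""
    return [math.dist(p, c) for p, c in zip(prev_embeddings, curr_embeddings)]


def sampling_weights(distances, floor=0.1):
    """Turn distances into sampling probabilities.

    `floor` (a hypothetical knob) keeps samples that did not move
    in the data stream instead of dropping them entirely.
    """
    n = len(distances)
    mean = sum(distances) / n
    if mean == 0:
        return [1.0 / n] * n  # nothing moved: sample uniformly
    raw = [floor + d / mean for d in distances]
    total = sum(raw)
    return [r / total for r in raw]


epoch_1 = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
epoch_2 = [(0.0, 0.0), (1.0, 1.0), (5.0, 6.0)]

distances = distance_traveled(epoch_1, epoch_2)
weights = sampling_weights(distances)

# Draw a small stream of sample indices; the far-traveling sample dominates.
stream = random.Random(0).choices(range(len(weights)), weights=weights, k=8)
```

In a PyTorch pipeline, weights like these could be handed to `torch.utils.data.WeightedRandomSampler`. Whether up-weighting far-traveling samples actually helps is exactly the kind of question the paragraph above suggests testing.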
What if you, in an instant, could create plots of any metrics 3LC recorded per sample? Can a 3D plot of the chosen_reward, rejected_reward, and loss on prompt fine-tuning tell you something? What if you lasso those outliers? Are they appropriate samples for your LLM? No? Did a “troll” create the label sets, or are the labels simply poor? With 3LC, just turn off the outliers and train again immediately. There is no need to export a new dataset!
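In 3LC the outlier selection is interactive, done with a lasso on a plot. As a rough programmatic stand-in, the sketch below flags outliers in a per-sample metric with a modified z-score (median and median absolute deviation) and produces an include/exclude mask rather than a new dataset. The rule, the 3.5 threshold, and the function name are assumptions, not 3LC behavior.

```python
import statistics


def keep_mask(values, threshold=3.5):
    """True for samples to keep, False for outliers.

    Uses the modified z-score 0.6745 * |v - median| / MAD, a swapped-in
    automatic rule standing in for an interactive lasso selection.
    """
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return [True] * len(values)  # no spread: nothing to flag
    return [abs(0.6745 * (v - median) / mad) <= threshold for v in values]


# Hypothetical per-sample losses recorded during fine-tuning.
losses = [0.52, 0.48, 0.55, 0.50, 0.47, 4.90]
mask = keep_mask(losses)

# The next training pass iterates over the same data, skipping masked samples.
active_indices = [i for i, keep in enumerate(mask) if keep]
```

The point of the mask is that the underlying data never changes. Turning a sample back on is just flipping one entry.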
Are your labels “correct” but inconsistent? Are the True Positive predictions for object detection more consistent than the labelers? 3LC lets you filter in the samples with True Positives, filter out any False Positives, and replace the labels with the predictions in one click. Those predictions are the intersection where human and model agree, so the result is a higher quality and more consistent training set.
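As a sketch of what that one click does conceptually, the plain-Python example below matches predictions to labels by IoU and adopts the predicted box wherever the two agree. The 0.5 IoU threshold, the dictionary layout, and the function names are assumptions for illustration. 3LC's actual matching rules are not described here.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def adopt_true_positives(labels, predictions, iou_threshold=0.5):
    """Replace each label with its best-matching same-class prediction.

    Labels without a match are kept as they are. Predictions that match
    no label (False Positives) are ignored.
    """
    revised, used = [], set()
    for label in labels:
        best_index, best_score = None, iou_threshold
        for i, pred in enumerate(predictions):
            if i in used or pred["cls"] != label["cls"]:
                continue
            score = iou(label["box"], pred["box"])
            if score >= best_score:
                best_index, best_score = i, score
        if best_index is None:
            revised.append(label)
        else:
            used.add(best_index)
            revised.append({"cls": label["cls"], "box": predictions[best_index]["box"]})
    return revised


labels = [
    {"cls": "car", "box": (10, 10, 50, 50)},
    {"cls": "car", "box": (200, 200, 240, 240)},
]
predictions = [
    {"cls": "car", "box": (12, 11, 51, 49)},      # True Positive for the first label
    {"cls": "car", "box": (100, 100, 140, 140)},  # False Positive, ignored
]

revised_labels = adopt_true_positives(labels, predictions)
```

The first label takes on the predicted box, the second is left untouched, and the False Positive never enters the label set.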
With 3LC, you can delve deep into these aspects and many more, making informed decisions and iterative improvements that dramatically enhance model accuracy and performance.
Understand why your model struggles and immediately do something about it!
Interactive Data Quality and Model Improvement
3LC distinguishes itself by providing a real-time, data-centric iterative workflow. It positions your model at the core, offering deep insights into how the data used for fine-tuning or training affects your model’s performance. We empower data scientists to harness their expertise fully. With 3LC, there’s no rigid functionality or “fixed” workflow; you can record and use your own metrics to investigate why a model may not be learning effectively. This method emphasizes improving the quality and utility of training data, a task that often involves more complex challenges than merely rectifying labeling issues and human error, though it also lets you identify and correct those immediately!
Live Data Editing: The Heart of 3LC
At its core, 3LC is about live data editing and visualization. It’s a powerful tool that lets you modify your data on the fly in large batches. This feature is crucial because, let’s face it, no matter how much you tweak your model’s hyperparameters or adjust the neural network, if your data isn’t right and isn’t what your model needs, your model won’t perform as expected.
Seamless Integration and Visualization
One of the most appealing aspects of 3LC is its seamless integration into existing training scripts. You don’t have to upload data to a SaaS or radically change your workflow. Instead, you add 3LC to your current processes and immediately gain access to stunning visualizations of your training runs, detailed per-sample per-epoch granular insights, and the opportunity for immediate data editing and modifications.
Dataset Revisions: Sparse and Non-Intrusive
We understand the importance of data integrity. We don’t modify, duplicate, or move your underlying data. 3LC’s dataset revisions are lightweight, sparse, and only where you want them. Your data can reside wherever it resides now, and 3LC can even operate in an air-gapped environment, ensuring security and compliance with regulations.
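One way to picture a sparse, non-intrusive revision is as an overlay that stores only the changed fields and leaves the base data untouched. The class below is a hypothetical illustration of that idea in plain Python. It is not a description of 3LC's internals or its API.

```python
class RevisionView:
    """Read-only view of `base` with a sparse set of per-sample edits.

    Hypothetical illustration: only changed fields are stored, and
    `base` itself is never modified or copied.
    """

    def __init__(self, base, edits=None):
        self.base = base
        self.edits = edits or {}  # {sample index: {field: new value}}

    def __len__(self):
        return len(self.base)

    def __getitem__(self, index):
        # Merge the stored edits over the original sample at read time.
        return {**self.base[index], **self.edits.get(index, {})}

    def revise(self, index, **changes):
        """Return a new revision; the current one stays valid."""
        merged = {**self.edits.get(index, {}), **changes}
        return RevisionView(self.base, {**self.edits, index: merged})


base = [
    {"path": "img_0.png", "label": "cat"},
    {"path": "img_1.png", "label": "cat"},
]

rev_1 = RevisionView(base)
rev_2 = rev_1.revise(1, label="dog")  # stores one field for one sample
```

Here `rev_2[1]` reads as a dog while `base` and `rev_1` still say cat, and the whole revision consists of a single stored field.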
A New Approach to Machine Learning
What sets 3LC apart is our focus on per-sample, per-epoch metrics, combined with real-time, interactive visual diagnostics and data editing capabilities. This approach offers a granular insight missing in traditional machine learning processes. The beauty of 3LC is that it doesn’t care what network you are using, whether you are working on Object Detection, LLM fine-tuning, Multi-Modal improvements with Stable Diffusion, or something else. The same goes for data: anything from time series or text to millions of images with bounding boxes.
Conclusion: The 3LC Way
In conclusion, 3LC is not just another tool; it’s a paradigm shift in machine learning. By emphasizing data quality, rapid iterative improvement, and interactive editing combined with real-time & high-end 2D/3D visualization, we’re paving the way for more efficient, accurate, and user-friendly machine learning iterations. There is no reason to NOT have 3LC attached while training! Join us in this exciting journey and see how 3LC can transform your approach to machine learning.
And you only need to add 3 lines of code to get started!