
Introducing 3LC

Paul Endresen
November 13, 2023

3LC is a game-changer in machine learning, offering real-time, data-centric iterations on model training and data quality. It is a unique debugger that focuses on the nuances of how models learn from your data. 3LC users can dig into model performance beyond labeling errors, exploring aspects like non-trivial False Positives, movement in embedding space, and interactive metrics analysis. 3LC helps data scientists make informed decisions that improve model accuracy and performance.

Model Building with True Data-Centric Iterations

Welcome to the first blog post for 3LC! After more than 20 months in stealth mode and with our first release out, we're excited to introduce a game-changer in the world of machine learning and data science, and to share how 3LC is transforming the approach to model training and data quality. It is a first-of-its-kind debugger for training or fine-tuning your models!

Live view of a run right after training, showing every data point for every epoch and different plots.

Beyond Label Errors

Imagine a scenario where a model’s performance isn’t just about correcting mislabeled data but about understanding the nuances of how the model learns from the data.

What about all those high-confidence False Positive bounding boxes that have zero intersection with your labels? What if you could interactively filter them to take a closer look? Is the content of those bounding boxes visually very similar to what you labeled, so that it's no surprise the model is confused? Would it help to add these newly filtered bounding boxes to the training set as a new class, so the model can separate them more easily? Could that reduce your number of False Positives?
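The filter described above boils down to two conditions per predicted box: its confidence is high, and its intersection with every labeled box is zero. The sketch below shows that logic in plain Python. It is not 3LC's API; the box format, the function names (`iou`, `confident_orphans`), and the 0.8 confidence threshold are all hypothetical choices for illustration.

```python
from typing import List, Tuple

# Assumption: boxes are (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def confident_orphans(
    predictions: List[Tuple[Box, float]],
    labels: List[Box],
    min_confidence: float = 0.8,  # hypothetical threshold; tune per model
) -> List[Tuple[Box, float]]:
    """High-confidence predictions that intersect no labeled box at all."""
    return [
        (box, score)
        for box, score in predictions
        if score >= min_confidence and all(iou(box, gt) == 0.0 for gt in labels)
    ]


labels = [(10, 10, 50, 50)]
predictions = [
    ((12, 12, 48, 48), 0.95),      # overlaps the label: not an orphan
    ((100, 100, 140, 140), 0.91),  # confident and no overlap: flagged
    ((200, 200, 220, 220), 0.30),  # no overlap, but low confidence
]
print(confident_orphans(predictions, labels))  # [((100, 100, 140, 140), 0.91)]
```

The flagged boxes are the candidates worth inspecting by eye, for example to decide whether they deserve a new class of their own.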

Or how far did your training samples travel in embedding space between epochs, and are some more important to train on than others? Visualize them in 3LC, see the movement between epochs, and filter on distance traveled! Inspect those prompts or images: are they difficult but correct samples? What would happen if you weighted them by distance traveled, so that the hard samples occur more often in the data stream?

The 5% of training samples that traveled furthest in embedding space during training, filtered in. Add weight and simply restart the training.
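The "distance traveled, then re-weight" idea above can be sketched in a few lines, assuming you already have one embedding vector per sample at two consecutive epochs. Everything here is a hypothetical illustration rather than 3LC's implementation: the function names, the Euclidean distance (`math.dist`), the 2x boost factor, and the use of `random.choices` for weighted sampling with replacement are our own choices.

```python
import math
import random
from typing import Dict, List, Sequence


def top_movers(
    prev_epoch: Dict[int, Sequence[float]],
    curr_epoch: Dict[int, Sequence[float]],
    fraction: float = 0.05,
) -> List[int]:
    """IDs of the `fraction` of samples whose embedding moved furthest between two epochs."""
    moved = {i: math.dist(prev_epoch[i], curr_epoch[i]) for i in curr_epoch}
    k = max(1, round(len(moved) * fraction))
    return sorted(moved, key=moved.get, reverse=True)[:k]


def sampling_weights(ids: List[int], hard: List[int], boost: float = 2.0) -> List[float]:
    """Give the selected 'hard' samples `boost` times the weight of the others."""
    hard_set = set(hard)
    return [boost if i in hard_set else 1.0 for i in ids]


def draw_epoch(ids: List[int], weights: List[float], seed: int = 0) -> List[int]:
    """Weighted sampling with replacement: boosted samples appear more often."""
    return random.Random(seed).choices(ids, weights=weights, k=len(ids))


# Toy data: sample i moves i units along the x-axis between epochs.
prev = {i: (0.0, 0.0) for i in range(40)}
curr = {i: (float(i), 0.0) for i in range(40)}
hard = top_movers(prev, curr)  # 5% of 40 samples -> [39, 38]
weights = sampling_weights(list(curr), hard)
stream = draw_epoch(list(curr), weights)
```

Weighting the sampler, rather than duplicating rows, keeps the underlying dataset unchanged; in PyTorch the same weights could feed a weighted sampler instead of `random.choices`.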

What if you could instantly plot any metric 3LC recorded per sample? What can a 3D plot of chosen_reward, rejected_reward, and loss from prompt fine-tuning tell you? What if you lasso in on the outliers? Are they appropriate samples for your LLM? No? Did a "troll" create the label sets, or are they simply poor? With 3LC, just turn off the outliers and train again immediately. There is no need to export a new dataset!

Not a very good answer for the LLM to learn from, like all the other outliers above.
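The post describes selecting outliers by hand with a lasso. As a rough, non-interactive stand-in, the sketch below flags a sample when any of the three per-sample metrics lies more than three standard deviations from its mean, then "turns it off" by giving it a sampling weight of 0. The z-score rule, the threshold, the row layout, and the function names are assumptions for illustration only; a visual lasso will often catch outliers that a simple per-metric rule misses.

```python
import statistics
from typing import Dict, List

# Per-sample metrics named in the post; the dict-per-row layout is an assumption.
METRICS = ("chosen_reward", "rejected_reward", "loss")


def outlier_ids(rows: List[Dict[str, float]], threshold: float = 3.0) -> List[int]:
    """Rows where any metric lies more than `threshold` standard deviations from its mean."""
    flagged = set()
    for name in METRICS:
        values = [row[name] for row in rows]
        mean = statistics.mean(values)
        std = statistics.pstdev(values)
        if std == 0:
            continue  # constant metric: nothing stands out
        flagged.update(i for i, v in enumerate(values) if abs(v - mean) / std > threshold)
    return sorted(flagged)


def turn_off(n_rows: int, excluded: List[int]) -> List[float]:
    """Per-sample weights with the excluded rows set to 0, so no new dataset export is needed."""
    excluded_set = set(excluded)
    return [0.0 if i in excluded_set else 1.0 for i in range(n_rows)]


rows = [{"chosen_reward": 1.0, "rejected_reward": 0.0, "loss": 0.5} for _ in range(20)]
rows.append({"chosen_reward": 1.0, "rejected_reward": 0.0, "loss": 50.0})  # suspicious pair
weights = turn_off(len(rows), outlier_ids(rows))
```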

Are your labels "correct" but inconsistent? Are the True Positive predictions for object detection more consistent than the labelers? 3LC lets you filter in the samples with True Positives, filter out any False Positives, and, in one click, replace the labels with the predictions (the intersection where human and model agree), giving you a higher-quality, more consistent training set.
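A minimal sketch of the "replace labels with agreeing predictions" step follows. It assumes single-class boxes and counts a prediction as a True Positive when its IoU with a label is at least 0.5, matched greedily. Both choices, and the function names, are hypothetical. Labels with no matching prediction are kept as-is here, which is our own design choice so that missed objects do not silently disappear from the training set.

```python
from typing import List, Optional, Set, Tuple

# Assumptions: single-class boxes as (x_min, y_min, x_max, y_max); IoU >= 0.5 counts as a match.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def relabel_with_true_positives(
    labels: List[Box], predictions: List[Box], min_iou: float = 0.5
) -> Optional[List[Box]]:
    """Swap each label for its matching prediction; return None if nothing matched."""
    used: Set[int] = set()
    new_labels: List[Box] = []
    for gt in labels:
        # Greedily pick the best still-unused prediction that overlaps this label enough.
        best, best_iou = None, min_iou
        for j, pred in enumerate(predictions):
            score = iou(gt, pred)
            if j not in used and score >= best_iou:
                best, best_iou = j, score
        if best is None:
            new_labels.append(gt)  # model disagrees: keep the human label
        else:
            used.add(best)
            new_labels.append(predictions[best])  # human and model agree: use the prediction
    # Unmatched predictions (False Positives) are never copied into the labels.
    return new_labels if used else None


labels = [(10, 10, 50, 50), (100, 100, 120, 120)]
predictions = [(11, 11, 51, 51), (300, 300, 310, 310)]
print(relabel_with_true_positives(labels, predictions))
# [(11, 11, 51, 51), (100, 100, 120, 120)]
```

Samples that return `None` have no True Positives and would simply be left out of this relabeling pass.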

With 3LC, you can delve deep into these aspects and many more, making informed decisions and iterative improvements that dramatically enhance model accuracy and performance.

Understand why your model struggles and immediately do something about it!

Interactive Data Quality and Model Improvement

3LC stands apart by providing a real-time, data-centric, iterative workflow. It puts your model at the center, offering deep insight into how the data used for training or fine-tuning affects your model's performance. We want data scientists to be able to use their expertise fully. With 3LC, there's no rigid functionality or "fixed" workflow; you can record and use your own metrics to investigate why a model may not be learning effectively. This approach emphasizes improving the quality and usefulness of training data, a task that often involves harder challenges than fixing labeling issues and human error, although 3LC also lets you quickly find and correct those!

Live Data Editing: The Heart of 3LC

At its core, 3LC is about live data editing and visualization. It’s a powerful tool that lets you modify your data on the fly in large batches. This feature is crucial because, let’s face it, no matter how much you tweak your model’s hyperparameters or adjust the neural network, if your data isn’t right and isn’t what your model needs, your model won’t perform as expected.

Here we have filtered in images with high-confidence False Positive bounding boxes, which makes it easy to find missing labels across many training samples. In this image, a bottle was not labeled, so we add that bounding box to the training set. The same assignment can of course be done in batch on multiple selected bounding boxes, even across rows.

Seamless Integration and Visualization

One of the most appealing aspects of 3LC is its seamless integration into existing training scripts. You don't have to upload data to a SaaS or radically change your workflow. Instead, you add 3LC to your current processes and immediately gain access to stunning visualizations of your training runs, granular per-sample, per-epoch insights, and the ability to edit and modify data right away.
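To make "per-sample, per-epoch" concrete, the sketch below shows the kind of record such an integration collects: every metric keyed by epoch and sample ID, so any sample's history can be pulled out later. The `PerSampleMetrics` class and its methods are purely hypothetical and are not 3LC's API. In a real loop the logged values would be unreduced per-sample losses (for example, PyTorch loss functions called with `reduction="none"`).

```python
from collections import defaultdict
from typing import Dict, List


class PerSampleMetrics:
    """Hypothetical recorder (not 3LC's API): metrics keyed by epoch, then sample ID."""

    def __init__(self) -> None:
        self._data: Dict[int, Dict[int, Dict[str, float]]] = defaultdict(dict)

    def log(self, epoch: int, sample_ids: List[int], **metrics: List[float]) -> None:
        """Record one batch; each metric is a list aligned with sample_ids."""
        for k, sid in enumerate(sample_ids):
            row = self._data[epoch].setdefault(sid, {})
            row.update({name: float(values[k]) for name, values in metrics.items()})

    def trajectory(self, sample_id: int, metric: str) -> List[float]:
        """One sample's value of `metric` across every recorded epoch, in order."""
        return [
            self._data[e][sample_id][metric]
            for e in sorted(self._data)
            if metric in self._data[e].get(sample_id, {})
        ]


recorder = PerSampleMetrics()
for epoch, batch_losses in enumerate([[0.9, 0.4], [0.5, 0.3]]):
    # In a real loop these would be unreduced per-sample losses from the batch.
    recorder.log(epoch, sample_ids=[7, 8], loss=batch_losses)
print(recorder.trajectory(7, "loss"))  # [0.9, 0.5]
```

Because the recorder only sees sample IDs and numbers, it does not depend on the model architecture or data type.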

Dataset Revisions: Sparse and Non-Intrusive

We understand the importance of data integrity. We don't modify, duplicate, or move your underlying data. 3LC's dataset revisions are lightweight, sparse, and only where you want them. Your data can stay wherever it lives now, and 3LC can even operate in an air-gapped environment, ensuring security and compliance with regulations.
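One way to picture a sparse, non-intrusive revision is as a small overlay of edits stacked on top of untouched base data: reading a row merges the base values with any edits, and each new edit creates a child revision instead of rewriting anything. The toy `Revision` class below illustrates that general idea under our own assumptions; it is not 3LC's storage format or API.

```python
from typing import Any, Dict, List, Optional, Union

Row = Dict[str, Any]


class Revision:
    """Sparse edits layered over a parent revision or over the untouched base rows."""

    def __init__(self, parent: Union["Revision", List[Row]], edits: Optional[Dict[int, Row]] = None):
        self.parent = parent
        # Only edited rows are stored: row index -> {field: new value}.
        self.edits: Dict[int, Row] = edits or {}

    def __len__(self) -> int:
        return len(self.parent)

    def row(self, i: int) -> Row:
        base = self.parent.row(i) if isinstance(self.parent, Revision) else self.parent[i]
        # Build a new dict so reading through a revision never mutates the base data.
        return {**base, **self.edits.get(i, {})}

    def edit(self, changes: Dict[int, Row]) -> "Revision":
        """Return a new child revision; the current one is left unchanged."""
        return Revision(self, changes)


base = [{"label": "cup", "weight": 1.0}, {"label": "bottle", "weight": 1.0}]
r1 = Revision(base).edit({1: {"weight": 0.0}})  # turn off sample 1
r2 = r1.edit({0: {"label": "mug"}})             # fix a label on top of that
print(r2.row(0), r2.row(1))
```

Each revision stores only the rows it changes, so the cost of an edit scales with the number of edited samples rather than the size of the dataset.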

A New Approach to Machine Learning

What sets 3LC apart is our focus on per-sample, per-epoch metrics, combined with real-time, interactive visual diagnostics and data editing. This gives a granular insight that is missing from traditional machine learning workflows. The beauty of 3LC is that it doesn't care which network you are using, whether you are working on object detection, LLM fine-tuning, multi-modal improvements with Stable Diffusion, or something else. The same goes for data: anything from time series and text to millions of images with bounding boxes.

Conclusion: The 3LC Way

In conclusion, 3LC is not just another tool; it’s a paradigm shift in machine learning. By emphasizing data quality, rapid iterative improvement, and interactive editing combined with real-time & high-end 2D/3D visualization, we’re paving the way for more efficient, accurate, and user-friendly machine learning iterations. There is no reason to NOT have 3LC attached while training! Join us in this exciting journey and see how 3LC can transform your approach to machine learning.

And you only need to add 3 lines of code to get started!
