Data Scientist’s Secret Weapon

Paul Endresen
December 21, 2023

Fixing model prediction errors: why 3LC is a data scientist’s secret weapon

A relative of mine speaks two languages. We are a close-knit family and he often looks after some of the children. As part of that responsibility, he is teaching them another language.

Great, right? Except he has taught them some relatively naughty words for things instead of the correct terms. While amusing for others, it’s a nightmare for the parents.

And this is precisely what is wrong with a lot of ML model work today. If you teach a model to ‘say’ one thing in a certain scenario, it doesn’t know what NOT to say in another.

This isn’t a new insight, but it’s a big challenge when you are working with large datasets and the training results come back with a big discrepancy.

Where are things going wrong and how can we fix them?

Which words do we need to teach it and how do we teach it new words quickly?

Which words is it miscategorizing? Is it human labeling error or something else?

The big problem with a model-centric approach

Most tools today only get you so far in finding where the labeling problems are. Typically you see about 10-15% of the data incorrectly labeled. But what do you do after you fix that and your results are still wide of the mark?

You can of course try other ways to assess the data and improve it. You can remove blurry images or images which are too small or too large. But that won’t move the needle much.
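
As a rough illustration of what this kind of basic filtering looks like, here is a minimal Python sketch. It assumes each image is already loaded as a grid of grayscale values, and it uses the variance of a simple Laplacian as the blur score, which is one common heuristic and not something taken from 3LC. The function names and every threshold are hypothetical.

```python
def laplacian_variance(gray):
    """Sharpness score: variance of a 4-neighbour Laplacian over a 2D grid
    of grayscale values (a list of rows). Low values suggest a blurry image."""
    height, width = len(gray), len(gray[0])
    if height < 3 or width < 3:
        return 0.0  # too small to have any interior pixels
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, height - 1)
        for x in range(1, width - 1)
    ]
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def keep_image(gray, min_side=32, max_side=4096, min_sharpness=50.0):
    """Return True if the image passes the size and blur checks.
    All thresholds are illustrative assumptions, not recommended values."""
    height, width = len(gray), len(gray[0])
    if min(height, width) < min_side or max(height, width) > max_side:
        return False
    return laplacian_variance(gray) >= min_sharpness


# Synthetic examples: a flat grey image (no edges) and a checkerboard (all edges).
flat = [[128] * 64 for _ in range(64)]
sharp = [[255 if (x + y) % 2 else 0 for x in range(64)] for y in range(64)]
tiny = [[255 if (x + y) % 2 else 0 for x in range(8)] for y in range(8)]

kept = [name for name, img in [("flat", flat), ("sharp", sharp), ("tiny", tiny)]
        if keep_image(img)]
```

Filters like this are cheap to run, but they only look at the image itself and say nothing about whether the label attached to it is right.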

As it stands today, you have to extract slices of data and plot them on a graph. Not before you’ve done some coding work, though.

You painstakingly analyze, calculate and try to decode what’s going on. But it’s a fruitless endeavor since it takes far too long with no clear answers.

Well, this is precisely where 3LC comes in.

3LC gives data scientists a chance to do REAL data science and work with the data directly inside the platform.

No more graph plotting, no more coding, no more months of trial and error to find the problem.

Why wouldn’t you work this way?

How 3LC helps you identify where the model is wrong and fix it quickly

Let’s take an example from a customer we are working with. They are tasked with removing objects from buildings that could cause major problems if they stay in place.

Every day they scan and receive images from the buildings to ensure these objects are not causing an issue. They were using an ML model to scan these images for anomalies and flag issues.

It was working, but there was room for improvement.

The client attached 3LC and ran through their dataset of images, painstakingly labeled by hand, and worked with us to figure out how they could get better.

Human error had been eradicated from the labels, so the hunt was on for the discrepancy.

Going into hyper-granular detail on each image, 3LC found two distinct issues:

Non-matching labels:
Labels drawn by the diligent client team didn’t completely match what the model predicted. Some of the people categorizing the objects drew lines around every aspect of the body they wanted the model to recognize. Others would capture the main essence but not everything. This meant that the model was slightly confused, as there was margin for error. This would never have been picked up through traditional model tinkering and adjustments, or at least not for months. Using 3LC, they not only found the inconsistencies, but with some small tweaks to the dataset and some alignment, they were ready to re-run the model training in hours. The result? Significant improvements on baseline performance and prediction.
Wrongly identified objects:
With such a vast dataset and some inconsistencies in labeling, the model was additionally picking out similar-looking objects in its predictions. In the real world, this could result in hours of additional work for teams on the ground, sent out to find similar but wrongly identified objects. Again, this would’ve been extremely difficult to spot without the use of 3LC. But by filtering across these mislabeled objects and creating a new category in the dataset, forcing the model to learn the difference between similar-looking objects, the client eradicated the problem. They retrained the model on the new data, which helped it separate the objects much better. The result? Further improvements to the model performance.
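
To make the first issue concrete, here is a minimal Python sketch of how you might measure how well a hand-drawn label agrees with a model prediction. It is not 3LC’s implementation. It assumes labels and predictions are axis-aligned boxes given as (x1, y1, x2, y2), which may not match the client’s actual annotations, and it uses intersection over union (IoU), a standard overlap measure. The function names, sample IDs and the 0.5 threshold are all hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlapping region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def flag_disagreements(pairs, threshold=0.5):
    """Return the IDs of samples whose label and prediction overlap poorly.

    `pairs` is an iterable of (sample_id, label_box, predicted_box).
    The threshold is an illustrative assumption; tune it for your data.
    """
    return [sample_id for sample_id, label_box, pred_box in pairs
            if iou(label_box, pred_box) < threshold]


# Made-up example: one tight match, one label that only captures part of the object.
pairs = [
    ("img_001", (0, 0, 10, 10), (0, 0, 10, 10)),
    ("img_002", (0, 0, 10, 10), (5, 0, 15, 10)),
]
to_review = flag_disagreements(pairs)
```

Samples flagged this way are candidates for review, not automatic errors: a low overlap can mean the label was drawn loosely, the prediction is off, or both.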

How 3LC accelerates your model training and improves outcomes

In each of the examples outlined above, the AI model was being fed better, more accurate information that it needed to make better predictions.

The result was a significant reduction in errors. And even tiny margins can save significant time and money or improve customer and worker safety.

The data scientists were able to effectively review the performance and tell the model what was good and what was bad and re-run the training the same day.

Rather than requiring data science and ML Ops teams to run through the painful task of trying to decipher what is going on, 3LC gives them detailed information to review right then and there.

No more sending large datasets back and forth, editing, reviewing, analyzing and updating in painstaking detail. It is a huge step forward.

3LC users are able to iterate on their data in minutes. And it provides the tools and visibility for users to actively scour the results in a single view.

Why wouldn’t you do it this way if you could?

Join our Beta program

If you’re a frustrated data scientist looking for ways to make meaningful improvements to your predictions and workflows, try 3LC.

Designed by data scientists for data scientists, it puts you in control and gives you the tools you need to get to the answers you want, fast.

Sign Up Here →
