3LC Changed How I Perceive Data Science

Frederik Mellbye
January 24, 2024

After facing redundancy at the start of 2023, Frederik considered his options when looking for a new Data Science job. Looking to work on something cutting-edge and deliver meaningful work, he found 3LC. This is his story and why he picked 3LC.

3LC Changed How I Perceive Data Science:
Here’s Why You Should Try It Too

A little over a year ago, during my job search, I was seeking a role in a more intimate, smaller-scale environment where my work would be both exciting and impactful. I was eager to be in the thick of things, to work with and learn from experienced developers on a large, ambitious software project. At 3LC, alongside my remarkable colleagues, I found exactly that.

When I first saw the demo, I was immediately impressed. Improving AI models solely through improving the data seemed strange at first, as data scientists often leave the data alone once training has started.

Still, the approach made sense. Cleaner and more consistent data should lead to faster training, saving cost, time, and carbon emissions in the process.

My only question was how much dataset quality could affect model performance. I have been surprised, time and time again, by the magnitude of the improvements we have seen for a wide range of AI modalities and tasks across all our client projects.

Past Data Science Experiences

In past data science projects, my relationship to my dataset would consist of simple statistical analysis in a Python notebook. In other cases, I would study individual samples, but this quickly becomes infeasible for larger datasets.

Then, I would leave the dataset alone and turn my attention to the model. I would try out different model architectures and perform hyperparameter tuning, relying on aggregate metrics to suggest whether I was making progress. It felt like fumbling around in the dark, with no way to understand how my process was affecting my model or what to do next.

The Impact of 3LC on How I Analyze Data

Working with 3LC has opened my eyes to new ways of working with datasets.

Normally what you see after an experiment is an aggregate of metrics across classes (or just the entire dataset). Then you make a hypothesis on what to do next – which takes time and doesn’t necessarily mean you make improvements.

With 3LC, however, you get a granular view – row by row – of model predictions and metrics for each individual sample in your dataset. You can use this information to take action on your dataset immediately and make improvements to it right then and there.
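
To make the difference between an aggregate metric and a row-by-row view concrete, here is a minimal sketch in plain Python. It does not use the 3LC API; the `per_sample_rows` helper, the field names, and the toy predictions are all hypothetical and only meant to illustrate the idea.

```python
import math


def per_sample_rows(probabilities, labels):
    """Return one row per sample with its prediction, confidence, and loss."""
    rows = []
    for index, (probs, label) in enumerate(zip(probabilities, labels)):
        # The predicted class is the one with the highest probability.
        predicted = max(range(len(probs)), key=probs.__getitem__)
        rows.append({
            "index": index,
            "label": label,
            "predicted": predicted,
            "confidence": probs[predicted],
            # Cross-entropy loss of this single sample.
            "loss": -math.log(max(probs[label], 1e-12)),
        })
    return rows


# Toy data, made up for illustration: class probabilities for four samples.
probabilities = [
    [0.9, 0.1],
    [0.2, 0.8],
    [0.95, 0.05],
    [0.6, 0.4],
]
labels = [0, 1, 1, 0]

rows = per_sample_rows(probabilities, labels)

# The aggregate view: a single number for the whole dataset.
accuracy = sum(row["predicted"] == row["label"] for row in rows) / len(rows)

# The granular view: the same rows, sorted so the hardest samples come first.
hardest_first = sorted(rows, key=lambda row: row["loss"], reverse=True)

print(f"accuracy: {accuracy:.2f}")
for row in hardest_first:
    print(row["index"], row["label"], row["predicted"], f"{row['loss']:.3f}")
```

The accuracy alone says that one sample in four is wrong. The sorted rows say which one, and how confidently the model got it wrong, which is the kind of information you can act on.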

The real kicker here is that you don’t move your data anywhere or into a new tool or platform. There are no security risks or additional challenges – just pure data science. There is also tight integration with the ML code, so there is no need to export or import bulk data. Plus, each version of your dataset is saved, allowing you to easily revert any changes or reproduce previous results. You identify what you want to change, make the changes, and either restart or continue training to measure their effectiveness.

What used to be a set of cumbersome and time-consuming tasks (editing, managing, sharing, and versioning AI datasets) now takes me minutes.

How I Use 3LC Every Day

Since joining 3LC, I have spent my time on many different tasks and projects. Like all true startups, it’s fast and frenetic, but working with some of the world’s biggest companies on exciting and groundbreaking projects makes it all worth it.

I regularly work on improving AI models, particularly in computer vision, where 3LC helps me tackle many challenges.

Our client projects include damage detection on rental cars, detecting problems with power lines, and finding anomalies on the ocean surface. We also use 3LC to find problematic samples in large text datasets, for example, those used in RLHF.

3LC shows me how the model performs on each training sample, and based on that, allows me to:

- Make rapid iterations to improve the data itself and rerun training.
- Find anomalous samples which shouldn’t even be a part of the dataset.
- Understand and identify which samples the model struggles with, and make sure these samples appear more frequently in training.
- Rebalance the data fed to the model during training to reduce unwanted bias present in the dataset.
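
The last two points, showing difficult samples more often and rebalancing classes, can be sketched with weighted sampling. This is a minimal illustration using only the Python standard library, not how 3LC implements it; the `sampling_weights` helper, the `floor` parameter, and the toy losses are assumptions made for the example.

```python
import random
from collections import Counter, defaultdict


def sampling_weights(losses, labels, floor=0.05):
    """Weight samples by loss, scaled so every class has the same total weight."""
    # A small floor keeps easy samples from disappearing entirely.
    raw = [max(loss, floor) for loss in losses]
    class_totals = defaultdict(float)
    for weight, label in zip(raw, labels):
        class_totals[label] += weight
    # Dividing by the class total makes each class sum to 1.0.
    return [weight / class_totals[label] for weight, label in zip(raw, labels)]


# Toy data, made up for illustration: per-sample losses and class labels.
# Class 0 has four samples, class 1 only two.
losses = [0.1, 0.2, 3.0, 0.5, 0.1, 0.1]
labels = [0, 1, 1, 0, 0, 0]

weights = sampling_weights(losses, labels)

# Draw one "epoch" of 1000 sample indices with replacement.
rng = random.Random(0)
epoch = rng.choices(range(len(losses)), weights=weights, k=1000)
draws = Counter(epoch)

for index in range(len(losses)):
    print(index, labels[index], f"{weights[index]:.4f}", draws[index])
```

Within a class, high-loss samples are drawn more often, and across classes the total probability is equal even though class 1 has fewer samples. In a real training loop the same weights could feed whatever weighted sampler your framework provides.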

In client projects, 3LC has enabled us to:

- Drastically reduce training time and cost by sampling difficult and interesting samples more frequently and by training on cleaner data. Good for the user, good for the planet.
- Improve the model performance by improving the data (fixing incorrect labels, adding missing labels). Finding these issues is easy because 3LC allows you to filter the relevant data efficiently.
- Gain intuition and understanding about how the model performs beyond the aggregate metrics – a valuable addition when evaluating a model. The model is easiest to understand in the context of the data it is learning from.
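
As an example of the kind of filter that surfaces label problems, the sketch below keeps only the samples where a confident prediction disagrees with the stored label. It is plain Python rather than the 3LC API; the `suspected_label_errors` helper, the 0.9 threshold, and the damage-class rows are hypothetical.

```python
def suspected_label_errors(rows, min_confidence=0.9):
    """Return rows where a confident prediction disagrees with the stored label."""
    return [
        row for row in rows
        if row["predicted"] != row["label"] and row["confidence"] >= min_confidence
    ]


# Toy per-sample rows, made up for illustration.
rows = [
    {"index": 0, "label": "scratch", "predicted": "scratch", "confidence": 0.97},
    {"index": 1, "label": "dent", "predicted": "scratch", "confidence": 0.95},
    {"index": 2, "label": "dent", "predicted": "scratch", "confidence": 0.55},
    {"index": 3, "label": "scratch", "predicted": "dent", "confidence": 0.92},
]

candidates = suspected_label_errors(rows)
for row in candidates:
    print(row["index"], row["label"], row["predicted"], row["confidence"])
```

A confident disagreement is only a hint that a label may be wrong. The filter narrows thousands of samples down to a short list that a person can review and correct.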

3LC is a very flexible tool, and it has taught us ways of filtering and looking at data that we didn’t know about before.

3LC offers a new way of engaging with your dataset and model predictions. It gives you an overview of your dataset, and simultaneously lets you work on the finer details. This versatility is what makes it the Swiss Army Knife of data science.

Join our Beta Program

Our singular focus now is preparing to open up the platform for our Beta launch. The technology works and is being used actively by a few select companies around the world. However, we need to stress test the tool.

3LC will be free for non-commercial use so anyone can play around with it. We have also made it enterprise-ready, so the world’s biggest companies can throw massive datasets at it.

What we don’t have, yet, is an army of fellow data scientists putting it through its paces and telling us what it can do.

Sign up for the program at 3LC.ai
