3LC Changed How I Perceive Data Science
After facing redundancy at the start of 2023, Frederik considered his options when looking for a new Data Science job. Looking to work on something cutting-edge and deliver meaningful work, he found 3LC. This is his story and why he picked 3LC.
Here’s Why You Should Try It Too
A little over a year ago, during my job search, I was seeking a role in a more intimate, smaller-scale environment where my work would be both exciting and impactful. I was eager to be in the thick of things, to work with and learn from experienced developers on a large, ambitious software project. At 3LC, alongside my remarkable colleagues, I found exactly that.
When I first saw the demo, I was immediately impressed. Improving AI models solely through improving the data seemed strange at first, as data scientists often leave the data alone once training has started.
But the approach made sense: cleaner and more consistent data should lead to faster training, saving cost, time, and carbon emissions in the process.
My only question was how much dataset quality could really affect model performance. I have been surprised, time and time again, by the magnitude of the improvements we have seen for a wide range of AI modalities and tasks across all our client projects.
Past Data Science Experiences
In past data science projects, my relationship with my dataset consisted of simple statistical analysis in a Python notebook. In other cases, I would study individual samples, but this quickly became infeasible for larger datasets.
Then, I would leave the dataset alone and turn my attention to the model. I would try out different model architectures and perform hyperparameter tuning, relying on aggregate metrics to suggest whether I was making progress. It felt like fumbling around in the dark, with no way to understand how my process was affecting my model or what to do next.
The Impact of 3LC on How I Analyze Data
Working with 3LC has opened my eyes to new ways of working with datasets.
Normally, what you see after an experiment is a set of metrics aggregated per class, or over the entire dataset. From those numbers you form a hypothesis about what to do next, which takes time and doesn’t guarantee an improvement.
With 3LC, however, you get a granular view – row by row – of model predictions and metrics for each individual sample in your dataset. You can use this information to take action on your dataset immediately and make improvements to it right then and there.
The real kicker here is that you don’t move your data anywhere or into a new tool or platform. There are no security risks or additional challenges – just pure data science. There is also tight integration with the ML code, so there is no need to export or import bulk data. Plus, each version of your dataset is saved, allowing you to easily revert any changes or reproduce previous results. You identify what you want to change, make the changes, and either restart or continue training to measure their effectiveness.
What used to be a set of cumbersome and time-consuming tasks (editing, managing, sharing, and versioning AI datasets) now takes me minutes.
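To make the difference between an aggregate metric and a row-by-row view concrete, here is a minimal sketch in plain Python. It does not use 3LC or its API; the function name, the predictions, and the labels are all made up for illustration. It computes one cross-entropy loss per sample and ranks the samples, which is roughly the kind of information a per-sample view puts in front of you.

```python
import math

def per_sample_cross_entropy(probabilities, labels):
    """Return one cross-entropy loss per sample instead of a single mean.

    Hypothetical helper for illustration; not part of 3LC.
    """
    eps = 1e-12  # guards against log(0)
    return [-math.log(max(p[y], eps)) for p, y in zip(probabilities, labels)]

# Made-up predicted class probabilities for four samples over three classes.
probabilities = [
    [0.90, 0.05, 0.05],
    [0.10, 0.80, 0.10],
    [0.70, 0.20, 0.10],  # labeled as class 2, but the model favors class 0
    [0.20, 0.20, 0.60],
]
labels = [0, 1, 2, 2]

losses = per_sample_cross_entropy(probabilities, labels)

# The aggregate view: one number for the whole dataset.
mean_loss = sum(losses) / len(losses)

# The granular view: sample indices ranked from highest to lowest loss.
ranked = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)

print(f"mean loss: {mean_loss:.3f}")
for i in ranked:
    print(f"sample {i}: loss {losses[i]:.3f}, label {labels[i]}")
```

The mean alone hides the fact that one sample contributes most of the loss. The ranked list points straight at it, and that sample is the first place I would look for a wrong label.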
How I Use 3LC Every Day
Since joining 3LC, my time has been spent on many different tasks and projects. Like all true startups, it’s fast and frenetic, but working with some of the world’s biggest companies on exciting and groundbreaking projects makes it all worth it.
I regularly work on improving AI models, particularly in computer vision, where 3LC helps me tackle many challenges.
Our client projects include damage detection on rental cars, detecting problems with power lines, and finding anomalies on the ocean surface. We also use 3LC to find problematic samples in large text datasets, for example, those used in RLHF.
3LC shows me how the model performs on each training sample, and based on that, allows me to:
- Make rapid iterations to improve the data itself and rerun training.
- Find anomalous samples which shouldn’t even be a part of the dataset.
- Understand and identify which samples the model struggles with, and make sure these samples appear more frequently in training.
- Rebalance the data fed to the model during training to reduce unwanted bias present in the dataset.
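The idea of showing the model its difficult samples more often can be sketched in a few lines. This is only an illustration using Python's standard library, not how 3LC implements it; the function names, the `floor` parameter, and the loss values are assumptions made for the example. Each sample gets a sampling probability proportional to its loss, with a small floor so that easy samples are still seen occasionally.

```python
import random

def sampling_weights(losses, floor=0.05):
    """Turn per-sample losses into sampling probabilities that sum to 1.

    Hypothetical helper: `floor` keeps easy samples from vanishing entirely.
    """
    raised = [max(loss, floor) for loss in losses]
    total = sum(raised)
    return [value / total for value in raised]

def draw_epoch(weights, num_draws, seed=0):
    """Draw sample indices with replacement according to the weights."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return rng.choices(range(len(weights)), weights=weights, k=num_draws)

# Made-up per-sample losses; sample 2 is the one the model struggles with.
losses = [0.1, 0.2, 2.3, 0.5]

weights = sampling_weights(losses)
epoch_indices = draw_epoch(weights, num_draws=1000)

for i, w in enumerate(weights):
    print(f"sample {i}: weight {w:.3f}, drawn {epoch_indices.count(i)} times")
```

The same weighting trick works for rebalancing: swap the loss for the inverse class frequency and underrepresented classes are drawn more often. In a PyTorch training loop, weights like these would typically be passed to a weighted sampler rather than drawn by hand.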
In client projects, 3LC has enabled us to:
- Reduce training time and cost drastically by sampling difficult and interesting samples more frequently and by training on cleaner data. Good for the user, good for the planet.
- Improve the model performance by improving the data (fixing incorrect labels, adding missing labels). Finding these issues is easy because 3LC allows you to filter the relevant data efficiently.
- Gain intuition and understanding about how the model performs beyond the aggregate metrics – a valuable addition when evaluating a model. The model is easiest to understand in the context of the data it is learning from.
3LC is a very flexible tool, and it has led us to discover ways of filtering and looking at the data that we didn’t know about before.
3LC offers a new way of engaging with your dataset and model predictions. It gives you an overview of your dataset, and simultaneously lets you work on the finer details. This versatility is what makes it the Swiss Army Knife of data science.
Join our Beta Program
Our singular focus now is preparing to open up the platform for our Beta launch. The technology works and is being used actively by a few select companies around the world. However, we need to stress test the tool.
3LC will be free for non-commercial use, so anyone can play around with it. We have also made it enterprise-ready, so the world’s biggest companies can throw massive datasets at it.
What we don’t have, yet, is an army of fellow data scientists putting it through its paces and telling us what it can do.
Sign up for the program at 3LC.ai