The Nexar Dashcam Crash Prediction Challenge presented a compelling task: predict whether a collision or near-collision event is imminent based on dashcam video footage. The training dataset comprised 1,500 real-world driving videos, each approximately 40 seconds long, annotated with precise timestamps for both the event (collision or near-collision) and the earliest moment it could be predicted – the alert time. The test set included 1,344 videos, half of which ended either 0.5, 1.0, or 1.5 seconds before an event.

Rather than focusing on finding the best model architecture, doing hyperparameter tuning, adding optical flow data, or incorporating a secondary transformer/LSTM network on top of predicted time series, I wanted to keep it simple! I adopted a 100% data-centric approach. My goal was to see how far performance could be pushed using only the feedback from a standard model – mvit_v2_s (a Multiscale Vision Transformer from PyTorch’s Torchvision library) – and the 3LC framework to iteratively refine the training data based on the model’s feedback. This included inspecting predictions to build understanding, removing ambiguous data, weighting valuable examples, and debugging via the embedding space.

This was also the first time I used 3LC on video models – usually I have used it for object detection or instance segmentation, so that was interesting in itself.

This article walks through the process, decisions, and insights that ultimately elevated my score from 0.71 to 0.898 on the leaderboard – and won the competition – without altering the model architecture or playing with model parameters.

👉 Code repository available here

Registering the Videos in 3LC Tables

The first step was converting the videos to individual 256×256 frames and registering them as a table in 3LC. Early testing confirmed my suspicion: training on frames after the event/crash was detrimental. The footage after a crash was quite chaotic, and the goal was to predict when an alert should go off, so I trimmed those frames out during the registration step.
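
As a rough illustration, the extraction-with-trimming step could look something like the sketch below (using OpenCV; the file layout, helper name, and the convention that videos without an event have no event timestamp are my assumptions, not the competition code):

```python
import cv2
from pathlib import Path

def extract_frames(video_path: Path, out_dir: Path, event_time: float | None) -> list[Path]:
    """Extract 256x256 frames, dropping everything after the event (if any)."""
    cap = cv2.VideoCapture(str(video_path))
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    out_dir.mkdir(parents=True, exist_ok=True)
    frame_paths, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Trim everything after the crash/near-miss: those frames are chaotic and
        # irrelevant to predicting when an alert should go off.
        if event_time is not None and idx / fps > event_time:
            break
        frame = cv2.resize(frame, (256, 256))
        out_path = out_dir / f"{idx:05d}.jpg"
        cv2.imwrite(str(out_path), frame)
        frame_paths.append(out_path)
        idx += 1
    cap.release()
    return frame_paths
```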

Each registered frame included the following metadata:

  • Time to Event
  • Time to Alert
  • Event Occurs (if the frame is within the alert-to-event window)
  • Has Event (if the frame belongs to a video that contains an event)

 

This setup allowed me to filter and inspect the dataset intelligently even before training.
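
A minimal sketch of how those per-frame columns can be derived from the annotation timestamps is shown below; the column names and the -1.0 sentinel for videos without an event are illustrative assumptions, and the actual registration into a 3LC table happens afterwards via the 3lc Python package.

```python
def frame_metadata(frame_idx: int, fps: float,
                   event_time: float | None, alert_time: float | None) -> dict:
    """Per-frame metadata mirroring the columns registered in the 3LC table."""
    t = frame_idx / fps
    has_event = event_time is not None
    event_occurs = (has_event and alert_time is not None
                    and alert_time <= t <= event_time)   # inside the alert-to-event window
    return {
        "time_to_event": (event_time - t) if has_event else -1.0,   # -1.0 = no event (assumed sentinel)
        "time_to_alert": (alert_time - t) if alert_time is not None else -1.0,
        "event_occurs": int(event_occurs),
        "has_event": int(has_event),
    }
```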

Plot of video vs. frame number, colored by Event Occurs in green

Exploring Alert-to-Event Frames

Using 3LC’s filtering tools, I focused on frames between the alert and the event. Exploring those, I judged that the model needed around 2 seconds of context to make a good prediction – subjectively, that was “enough” change in the situation. So even though each training sample was anchored on a single frame, I loaded the 15 preceding frames (step size = 4) in the PyTorch dataset for each sample during training. The mvit_v2_s model accepts 16-frame inputs, so that worked well, and at 30 fps it amounts to about 2 seconds of context per training sample.
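
Here is a sketch of that clip assembly as a PyTorch dataset, assuming each registered row knows its video’s ordered frame paths and its own index within the video (field names are hypothetical; Kinetics-style normalization is omitted for brevity):

```python
import torch
from torch.utils.data import Dataset
from torchvision.io import read_image

class ClipDataset(Dataset):
    """One anchor frame per sample, plus its 15 preceding frames at step 4 (~2 s at 30 fps)."""

    def __init__(self, rows, num_frames: int = 16, step: int = 4):
        self.rows = rows              # one dict per registered frame (paths, labels, metadata)
        self.num_frames = num_frames
        self.step = step

    def __len__(self) -> int:
        return len(self.rows)

    def __getitem__(self, i: int):
        row = self.rows[i]
        paths = row["video_frames"]   # ordered frame paths of this video (hypothetical field)
        anchor = row["frame_index"]   # anchor frame's index within the video
        idxs = [max(anchor - k * self.step, 0)        # clamp at the start of the video
                for k in range(self.num_frames - 1, -1, -1)]
        frames = [read_image(str(paths[j])).float() / 255.0 for j in idxs]
        clip = torch.stack(frames, dim=1)             # (C, T, H, W), as mvit_v2_s expects
        return clip, float(row["event_occurs"])
```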

Filtering in only those frames that are between alert and event

I trained on 6,000 samples per epoch, using PyTorch’s WeightedRandomSampler. Since each sample includes 16 frames, this meant loading 96,000 frames per epoch. I reserved 20% of the videos for validation early on, later reducing that to 2%, and I ensured that frames from the same video were never split between the training and validation sets.
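
Below is a sketch of the video-level split plus the weighted sampling, under the assumption that each row carries a video_id and a sample_weight column (the actual weight values are discussed later); batch size and worker counts are guesses:

```python
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

# Hold out whole videos so frames from one video never cross the train/val boundary.
video_ids = sorted({r["video_id"] for r in rows})
val_videos = set(video_ids[: max(1, len(video_ids) // 50)])   # ~2% of videos for validation
train_rows = [r for r in rows if r["video_id"] not in val_videos]

# 6,000 weighted draws per epoch; the per-row weights come from a column in the table.
weights = torch.tensor([r["sample_weight"] for r in train_rows], dtype=torch.double)
sampler = WeightedRandomSampler(weights, num_samples=6000, replacement=True)

train_loader = DataLoader(ClipDataset(train_rows), batch_size=8,
                          sampler=sampler, num_workers=8, pin_memory=True)
```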

Each epoch took ~3 minutes on an NVIDIA A100. For validation, I sampled sparsely (every 64th frame for non-events, every 4th for events), keeping validation under a minute. I observed best results after 5–20 epochs of training.
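
The sparse validation selection is simple to express; the modulo-on-frame-index approach below (continuing the previous sketch) is my assumption about how “every 64th / every 4th frame” was realised:

```python
# Sparse validation: every 4th frame inside the alert-to-event window,
# every 64th frame elsewhere, keeping a full validation pass under a minute.
val_rows = [r for r in rows if r["video_id"] in val_videos]
val_subset = [r for r in val_rows
              if (r["event_occurs"] and r["frame_index"] % 4 == 0)
              or (not r["event_occurs"] and r["frame_index"] % 64 == 0)]
val_loader = DataLoader(ClipDataset(val_subset), batch_size=8, num_workers=8)
```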

After the initial training run, I selected the best-performing epoch (on the validation set) and ran inference on the training set, saving the results as a 3LC run. Inference on the full training set alone took ~4 hours, so I ran it separately after training. Usually I run inference and add metrics to the 3LC run for each epoch, but here that would simply take too long.
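
A hedged sketch of that post-training inference pass follows – the checkpoint path and batch size are placeholders, and attaching the resulting per-frame probabilities (and embeddings) to a 3LC run is done with the 3lc package afterwards:

```python
import torch
from torch.utils.data import DataLoader
from torchvision.models.video import mvit_v2_s

# Post-training inference over the full training set; the resulting per-frame
# probabilities are what get analyzed alongside the table metadata in the UI.
model = mvit_v2_s(weights=None, num_classes=1)
model.load_state_dict(torch.load("best_epoch.pt"))   # placeholder checkpoint path
model.eval().cuda()

probabilities = []
with torch.no_grad():
    for clips, _ in DataLoader(ClipDataset(train_rows), batch_size=16, num_workers=8):
        probs = torch.sigmoid(model(clips.cuda())).squeeze(1)
        probabilities.extend(probs.cpu().tolist())
```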

Opening the training run in 3LC, I could plot and analyze the original data together with the training metrics captured for each frame, and start making the necessary changes to the training data!

Middle view shows where events occur in each video; the right view shows predictions at a threshold of 0.95

Refining the Training Data

Using the 3LC UI, I began by inspecting high-confidence “event” predictions that were incorrect. In many of these cases (subjectively speaking), I believe an alert would actually have been useful in real life! However, since I wasn’t confident enough to re-label these frames, I chose to delete them from the training table instead.

I found many frames where the model was quite certain a collision was about to happen, even though the samples were not labelled as such

To do this, I first isolated the cases by using text filtering on the left-hand panel – targeting the specific videoIDs I’d identified – and then used the lasso tool to select the sequences of frames I wanted to remove.

Selecting sequences around false positives for deletion

After selecting, I deleted all frames currently filtered in. Deleting samples in a 3LC table doesn’t modify the underlying data; it simply creates a sparse revision. However, when you load this revision in Python and use it as a PyTorch dataset, the changes take effect immediately. This makes it possible to rapidly experiment with dataset versions without copying or rewriting files.
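
In code, picking up such a revision is just a matter of loading it by URL; the snippet below assumes the 3lc package’s Table.from_url and map-style row access, so treat the exact calls and the placeholder URL as approximations rather than the definitive API.

```python
import tlc

# Assumed usage: load a specific table revision by URL and treat it as a dataset.
table = tlc.Table.from_url("<url-of-the-revision-with-deletions>")
print(len(table))    # the deleted rows are simply absent in this revision
row = table[0]       # rows expose the registered columns (paths, labels, weights, ...)
```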

Final deletion step in UI

Analyzing Prediction Errors

I created a derived column in 3LC to highlight mismatches between Event Occurs and prediction > 0.95. Errors were minimal in non-event videos and were mostly concentrated just before the alert in videos that had an event – was the model predicting events earlier than expected?
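
An offline equivalent of that derived column (the 3LC UI lets you define it directly) might look like this with pandas; the input arrays are illustrative stand-ins for the registered label column and the inference probabilities:

```python
import pandas as pd

# Flag frames where the Event Occurs label and a 0.95-thresholded prediction disagree.
df = pd.DataFrame({"event_occurs": event_occurs_flags,   # illustrative inputs
                   "prediction": probabilities})
df["error"] = (df["prediction"] > 0.95) != df["event_occurs"].astype(bool)
print(df["error"].mean())   # overall error rate at this threshold
```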

Most errors occur around the transition from pre-alert to alert; there are few errors on the right side for videos with no alert/event – errors in green

To investigate further, I plotted the error frames in the model’s embedding space – they clustered tightly in one region. Most other frames were correctly predicted.
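
For reference, per-sample embeddings like the ones plotted here can be captured with a forward hook on the classification head; the `head` attribute name follows torchvision’s MViT implementation, but treat the details as an assumption rather than the exact code used:

```python
import torch
from torchvision.models.video import mvit_v2_s

model = mvit_v2_s(weights=None, num_classes=1).eval()

embeddings = []
def grab_head_input(_module, inputs, _output):
    embeddings.append(inputs[0].detach().cpu())   # features entering the final classifier

handle = model.head.register_forward_hook(grab_head_input)
with torch.no_grad():
    model(torch.randn(2, 3, 16, 224, 224))        # dummy clip batch (B, C, T, H, W)
handle.remove()
print(embeddings[0].shape)                        # (batch, embedding_dim)
```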

(Side note: in general, when capturing metrics on training data, I have found it is better to take an early epoch than a later one, to avoid results that have overfitted.)

Embedding space showing error clusters in green

Zooming in on the frames in embedding space where the model struggles.

Green frames are incorrect predictions, shown in embedding space

Further Training Data Edits

Based on all these insights, I made three key changes:

  1. Deleted the last 0.25 seconds before a crash, but only if an alert had been active for at least 0.5 seconds. My hypothesis: frames just before impact may reduce the model’s ability to learn early signals.
  2. Weighted up the alert-to-event frames so they appeared as frequently as all other frames. I added a weight column in 3LC and passed it to the WeightedRandomSampler.
  3. Doubled the weight of samples from videos with events (but outside the alert zone) to encourage the model to learn distinctions within the same video. Based on the analysis above, I saw that there were almost no errors on no-event videos. (A code sketch of all three edits follows below.)
Lassoing these in embedding space shows that errors are mostly on frames just before or after the alert
Deleting frames from 0.25 s to 0 s before the event, as long as the alert has been active for more than 0.5 seconds
Weighting the remaining samples in the alert zone by 30x so they show up as often in training as the other frames
Weighting event-video frames outside the alert-to-event window by 2x
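
Expressed as plain Python over the registered frame rows, the three edits look roughly like the sketch below (in practice they were made as 3LC revisions – deletions plus a weight column fed to the WeightedRandomSampler; field names follow the earlier sketches):

```python
kept_rows, sample_weights = [], []
for r in rows:
    in_alert_zone = bool(r["event_occurs"])

    # 1. Drop the last 0.25 s before impact, but only if the alert has already been
    #    active for at least 0.5 s (time_to_alert <= -0.5 means it started >= 0.5 s ago).
    if in_alert_zone and 0.0 <= r["time_to_event"] <= 0.25 and r["time_to_alert"] <= -0.5:
        continue

    if in_alert_zone:
        weight = 30.0      # 2. alert-to-event frames: sampled about as often as the rest
    elif r["has_event"]:
        weight = 2.0       # 3. event videos outside the alert window: doubled
    else:
        weight = 1.0

    kept_rows.append(r)
    sample_weights.append(weight)
```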

Things That Didn’t Work

One of the experiments I spent a lot of time on was weighting up the region in embedding space where the model consistently struggled to learn the correct outcome. My hope was that if the model got to see those samples more often, it would eventually learn them better.

Unfortunately, it didn’t help. Those difficult frames seemed inherently ambiguous. That said, I believe this approach could work in a larger-scale setting with unlabeled data. You could run inference with a trained model, capture embeddings, and use the hard-to-learn region as a filter for active learning – targeting those samples for human annotation.

It’s a direction worth exploring further.

Key Insights

A few takeaways from the final phase of the project:

  • The final training table scored 0.898 on 50% of the test set.
  • It took 61 epochs to reach that, which amounts to roughly 366,000 training samples out of a pool of ~1.2 million (effectively far fewer unique samples, since statistically about half of each epoch consisted of the roughly 30,000 event samples, and there was also overlap between epochs due to weighted random resampling).
  • However, I achieved almost the same score with only 6–15 epochs, indicating that perhaps just ~2-5% of the data contained the most impactful training signals.
  • This reinforces the importance of targeted data selection. Simply throwing more data at the model wouldn’t help unless that data actually contributes to the model’s understanding.
  • An unsupervised pretraining approach on a larger chunk of unlabelled data, followed by supervised fine-tuning on intelligently curated data, could likely yield even stronger results.

Results

I was very happy with the improvement from 0.71 → 0.898 on the competition leaderboard, especially since my only changes were to the training data – no model architecture tweaks, no parameter sweeps!

This experiment underlined for me how powerful it can be to understand where your model struggles and to treat the data as the primary lever for performance.

The biggest challenge? Designing a fast but accurate validation strategy. The model converged quickly, so it was crucial that validation was fast and precise as well. I spent a lot of time tuning validation subsampling to balance speed and precision, and while I never fully perfected it, it was good enough to guide my decisions.

Below is a view of the final revisions derived from the initial table (I did, however, experiment a lot with other revisions before this).

Different table revisions

Final Thoughts

This was an extremely fun challenge and I feel I learned a lot! Huge thanks to Nexar Inc. and Daniel Moura for hosting it!

Dataset provided by Nexar Inc.

Original article written by Paul Endresen and posted on LinkedIn on May 9, 2025: https://www.linkedin.com/pulse/nexar-dashcam-kaggle-challenge-paul-endresen-ayd2c/?trackingId=htKIkR7lQAYGrWHdFRolUA%3D%3D

Moura, Daniel C., and Zvitia, Orly. “Nexar Collision Dataset.” Hugging Face, 2025, https://huggingface.co/datasets/nexar-ai/nexar_collision_prediction.
