Your smart car can tattle on you. It knows not only where you have been, but also how fast you were driving and how hard you braked. It can tell if you were the driver or if someone else was behind the wheel. Due to the connected nature of smart cars, it can share this data over web-based platforms.
Over the last few decades, automobiles evolved from modes of transport into sensor-laden mobile computing platforms. While the sensor-generated data has enabled breakthroughs in safety features and performance, it also creates privacy concerns for drivers.
But your car doesn’t have to be smart to be a privacy concern. Cars that aren’t connected can also create trackable driver passports.
Even a 20-year-old car can share your driving data
Event data recorders (EDRs) have been built into cars for years. The data from these in-vehicle black boxes and other car sensor data is accessible to mechanics via on-board diagnostic systems (OBDs), which have been required in cars sold in the United States since 1996 and in the EU since 2001. For cars that aren’t considered ‘smart’, the data is not wirelessly transmitted and is only accessible from inside the car itself.
In recent years, these devices have allowed police to reconstruct crash scenes with actual speed and braking data. Insurance companies have utilized data from EDRs to assign accident blame. According to the Insurance Institute for Highway Safety, “insurers may be able to access the EDRs in their policyholders’ vehicles based on provisions in the insurance contract.”
According to police, nothing that would identify driver
Police have acknowledged that some people are concerned about privacy when it comes to EDRs, and that many worry that law enforcement agencies and insurance providers can use the data to spy on drivers. They have maintained that nothing in the data would identify what is going on in the cab or who is sitting in the driver’s seat.
That last statement is no longer true. In a study published last month (see below), researchers from the University of Washington and the University of California, San Diego showed that the data from your car can be used to positively identify the driver with 100% accuracy after just minutes of driving.
Machine learning and the sensors in your car can ID you in minutes
A WIRED report, “A Car’s Computer Can ‘Fingerprint’ You in Minutes Based on How You Drive,” explained that the driver of a car can be accurately identified from just a few minutes’ worth of the information the car is already recording. In the study, which the researchers plan to present at the Privacy Enhancing Technologies Symposium this summer, the team used MATLAB and machine learning to identify drivers from data collected on the car’s internal computer.
“With very limited amounts of driving data we can enable very powerful and accurate inferences about the driver’s identity,” says Miro Enev, one of the authors of the study.
To ensure the focus of the project was on differences in driving habits, the same car was used by all 15 test participants. The participants all drove the same route at the same time of day, and were even required to listen to the same radio station.
The researchers collected data from the car’s internal computer and analyzed each driver’s data with a machine learning algorithm. They trained the algorithm on 90 percent of the driving data and tested it on the remaining 10 percent.
The trained machine learning algorithm was able to distinguish the drivers in the study with 100% accuracy with only 15 minutes of the driving data.
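The 90/10 split described above can be illustrated with a short Python sketch. This is not the researchers' MATLAB code. The helper name `split_train_test`, the random shuffle, and the toy samples are all assumptions made for illustration, and the study may have partitioned its data differently.

```python
import random

def split_train_test(samples, train_fraction=0.9, seed=0):
    """Shuffle the samples and split them into training and test lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical data: 100 (feature_vector, driver_id) pairs for 15 drivers.
samples = [([i * 0.1, i % 7], i % 15) for i in range(100)]
train, test = split_train_test(samples)
print(len(train), len(test))  # 90 10
```

Holding out data the algorithm never saw during training is what makes the reported accuracy meaningful: it measures how well the model recognizes a driver from new segments rather than from segments it has memorized.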
Comparing machine learning algorithms with MATLAB
The researchers used various MATLAB toolboxes for the processing pipeline, from sensor signal processing to comparing performance of various machine learning algorithms. First, signal preprocessing was performed using the Signal Processing Toolbox and Wavelet Toolbox:
After the data is uniformly sampled, each sensor stream was smoothed by applying wavelet denoising to remove high frequency artifacts. This operation involved multi-level stationary wavelet decomposition and subsequent reconstruction using the Haar wavelet with the default denoising threshold of the MATLAB iswt command.
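To give a feel for what Haar wavelet denoising does, here is a simplified Python sketch. It is a stand-in, not the study's pipeline: it uses the ordinary (decimated) Haar transform rather than the stationary transform behind MATLAB's iswt, and the soft threshold value of 0.5 is an arbitrary assumption rather than MATLAB's default. All function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_forward(signal):
    """One level of the decimated Haar transform: (approximation, detail)."""
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approx = [(a + b) / SQRT2 for a, b in pairs]
    detail = [(a - b) / SQRT2 for a, b in pairs]
    return approx, detail

def haar_inverse(approx, detail):
    """Invert one level of the Haar transform."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out

def soft_threshold(values, t):
    """Shrink each coefficient toward zero by t; small ones become zero."""
    return [math.copysign(max(abs(v) - t, 0.0), v) for v in values]

def haar_denoise(signal, levels=2, threshold=0.5):
    """Decompose, threshold the detail coefficients, then reconstruct."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_forward(approx)
        details.append(soft_threshold(detail, threshold))
    for detail in reversed(details):
        approx = haar_inverse(approx, detail)
    return approx

# Hypothetical noisy sensor stream with a step from about 1 to about 5.
noisy = [1.0, 1.2, 0.9, 1.1, 5.0, 5.1, 4.9, 5.2]
print([round(v, 3) for v in haar_denoise(noisy)])
```

The detail coefficients capture rapid sample-to-sample changes, so shrinking them removes high-frequency jitter while the approximation coefficients preserve the overall shape of the signal.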
The researchers then performed feature extraction and classification using methods available in the Statistics and Machine Learning Toolbox. After preprocessing the sensor signals, they derived features that capture the statistical and morphological characteristics of the signals. In total, 48 features were derived for each sensor and time segment.
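As a rough illustration of per-segment feature extraction, the Python sketch below cuts a sensor stream into fixed-length segments and computes a handful of summary statistics for each. The six features, the segment length, and the function names are assumptions chosen for illustration; they are not the 48 features used in the study.

```python
import statistics

def sliding_segments(stream, size, step):
    """Cut a stream into fixed-length segments, advancing by `step` samples."""
    return [stream[i:i + size] for i in range(0, len(stream) - size + 1, step)]

def segment_features(segment):
    """Compute a few simple statistical features for one segment."""
    diffs = [abs(b - a) for a, b in zip(segment, segment[1:])]
    return {
        "mean": statistics.fmean(segment),
        "std": statistics.pstdev(segment),
        "min": min(segment),
        "max": max(segment),
        "range": max(segment) - min(segment),
        # Average absolute change between consecutive samples.
        "mean_abs_change": statistics.fmean(diffs) if diffs else 0.0,
    }

# Hypothetical brake-pedal readings, sampled at a uniform rate.
brake = [0.0, 0.1, 0.4, 0.9, 1.0, 0.7, 0.3, 0.1, 0.0, 0.0]
for seg in sliding_segments(brake, size=4, step=2):
    print(segment_features(seg))
```

Each segment becomes one row of numbers, and a classifier then learns which rows belong to which driver.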
The researchers trained and evaluated four classification algorithms: Random Forest, Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Naive Bayes. Of the four, Random Forest yielded the best results.
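Of the four classifiers, K-Nearest Neighbor is the simplest to show in a few lines. The Python sketch below is a minimal version under stated assumptions: the feature vectors, the driver labels "A" and "B", and the helper names are invented for illustration, and it is not the toolbox implementation the researchers used.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Label a query vector by majority vote among its k nearest neighbors."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def accuracy(train, test, k=3):
    """Fraction of test samples whose predicted label matches the true one."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)

# Hypothetical feature vectors for two drivers with distinct habits.
train = [
    ([1.0, 1.0], "A"), ([1.2, 0.9], "A"), ([0.8, 1.1], "A"),
    ([5.0, 5.0], "B"), ([5.2, 4.9], "B"), ([4.8, 5.1], "B"),
]
test = [([1.1, 1.0], "A"), ([5.1, 5.0], "B")]
print(accuracy(train, test))  # 1.0
```

Running the same `accuracy` measurement for each candidate algorithm on the same held-out data is the basic idea behind comparing classifiers.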
The privacy implications
According to the article in WIRED, the ability to use this data could have unexpected privacy implications: everything from letting insurance companies punish drivers who loan their cars to their teenage kids, to identifying a driver who violated traffic laws or caused a collision.
“The same data that tells their insurance company when they’ve let their 16-year-old kid take their car to prom might just as easily be used to identify drunk driving or a medical condition that’s altered someone’s driving ability, tests Enev claims would actually be simpler than trying to distinguish a driver’s identity,” says Andy Greenberg, senior writer for WIRED.
On March 15, the U.S. Senate Committee on Commerce, Science and Transportation met with leaders in the autonomous vehicle industry to discuss cybersecurity and privacy. According to Government Technology Magazine, “The questions that arose during the committee hearing were to what extent automakers could use that data. Could they collect it and sell it to third parties for marketing purposes? Could they tie information to specific people?” We now know the answer to that last question is a resounding yes.