Smart cars vs. privacy: a driverless car could generate 100 GB of data per second

Fully autonomous vehicles might never come to much, but many of the privacy issues they raise are as salient today as they would be in any high-tech driverless future.

Assuming that a recent article in New Scientist is wrong and driverless cars won't be “going the way of the jetpack” anytime soon, it nonetheless seems probable that fully autonomous driverless vehicles will not be an everyday reality, in the developed world at least, before 2030. This flies in the face of predictions that by 2020, a human behind the wheel would be a thing of the past, and can largely be attributed to developments in AI not keeping up with the blue-sky thinking of many tech evangelists during the past decade.

That is not to say, though, that significant strides are not being taken towards this possibly unattainable goal. Since October last year, for instance, Google's sister firm Waymo has been operating a commercial driverless taxi, or robotaxi, service in certain parts of Phoenix, Arizona.

Meanwhile, numerous car manufacturers, most vociferously Tesla, have unveiled multiple advanced driver assistance systems. These systems, while a long way from providing fully automated driving, are nonetheless trumpeted by some, most notably Tesla's “technoking” and CEO Elon Musk, as heralding the arrival of such tech into the popular sphere.

Real-world applications

Certainly, away from the glare of the popular press, driverless vehicles are increasingly the norm in warehouses, ports and industrial facilities alike. Germany's BASF, for example, is using 32-wheeled automated guided vehicles (AGVs) to transport chemicals in volumes of up to 73,000 litres and at speeds of 30 km/h around its city-sized home site of Ludwigshafen.

Moreover, while the notion of pilotless drones killing people on the other side of the planet is now a reality of war, there are many in the maritime industry who have an essentially sailorless shipping sector firmly in their sights.

And if that seems hard to fathom, bear in mind that in Japan, a country heavily committed to robotisation, there are already industry-led plans afoot to make 50% of the country's domestic fleet unmanned by 2040. Japan, by the way, is far from alone in harbouring such ambitions.

What's more, while the need for ongoing engine maintenance will likely thwart full autonomy in the realm of deep sea commercial shipping for some time to come, a host of unmanned surface vehicles (USVs) are already performing a widening range of real-life duties, from scientific research to port security.

Indeed, at the end of July, US-based AI firm Buffalo Automation and alternative transport operator Future Mobility Network unveiled what they describe as “Europe's first fully autonomous robotaxi service” in the Netherlands. Rather than running on the road as is Waymo's mode of choice, this system instead provides a solar-powered river ferry service in the municipality of Teylingen in South Holland that, hailed via a ridesharing app, “opens the door for cities across the EU to adopt this ground-breaking alternative form of transportation.”

A ton of data

But whether by land, sea or air, level 4 (high automation) and level 5 (full automation) vehicles still have many technological barriers to overcome before they transcend the pages of sci-fi, not least in terms of the amount of data they would have to amass, transmit and process. Putting a figure on this, Charles Sevior, Dell's chief technical officer for unstructured data solutions, told Verdict in August that when it comes to autonomous cars, test vehicles alone typically generate between 20 and 40 TB of data per day. Out in the wild, though, the true figure could well prove much higher.

For instance, Robert Bielby, senior director of system architecture and product planning for automotive at Micron Technology's Embedded Business Unit, estimated in February last year that while the average self-driving car in the US might generate between 1 and 15 TB per day, the daily amount for a road-based robotaxi could well leap to 450 TB.

Meanwhile, US law firm Baum Hedlund asserted in 2019 that a driverless car could ultimately generate around 100 GB of data per second.

If this is correct, a vehicle operating non-stop for 24 hours would generate a gargantuan 8,437.5 TB a day (taking 1 TB as 1,024 GB; the figure is 8,640 TB in decimal units). At present, it is still too early to say which of these figures, if any, is most likely to be accurate.
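The arithmetic behind the daily total can be checked with a short sketch. The Python below is illustrative only: the function name `daily_terabytes` and the two unit conventions are assumptions made for this example, not anything stated by the sources quoted. It suggests that 100 GB per second sustained for 24 hours comes to 8,640 TB in decimal units, or 8,437.5 TB if 1 TB is taken as 1,024 GB, which appears to be the convention behind the figure above.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a non-stop day


def daily_terabytes(gb_per_second, gb_per_tb=1000):
    """Convert a sustained rate in GB/s into TB generated over 24 hours.

    gb_per_tb is 1000 for decimal units or 1024 for binary units
    (an assumption about which convention a given estimate uses).
    """
    return gb_per_second * SECONDS_PER_DAY / gb_per_tb


if __name__ == "__main__":
    rate = 100  # GB per second, the Baum Hedlund estimate quoted above
    print(f"Decimal (1 TB = 1,000 GB): {daily_terabytes(rate):,.1f} TB/day")
    print(f"Binary  (1 TB = 1,024 GB): {daily_terabytes(rate, 1024):,.1f} TB/day")
    # For comparison: the 450 TB/day robotaxi estimate as an average rate
    print(f"450 TB/day is about {450 * 1000 / SECONDS_PER_DAY:.1f} GB/s")
```

Run the other way, the 450 TB a day robotaxi estimate corresponds to an average of only about 5 GB per second, which gives a sense of how far apart the published estimates are.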

Nevertheless, it seems fair to say that were level 4 and 5 vehicles to become an everyday reality, they would clearly generate a significant amount of data. And while consumers probably need not worry too much about how Big Tech boffins intend to actually store and handle all those digital titbits, they should perhaps wonder what all those ones and noughts are saying about them.

95% connected cars by 2030?

Not that chatty components are a new automotive phenomenon. For instance, in June 2017, the US Federal Trade Commission (FTC) and the National Highway Traffic Safety Administration (NHTSA) ran a workshop to discuss such matters as they pertained to Internet-enabled connected cars.

PYMNTS.com expects connected cars to account for some 95% of global new car sales by 2030 as opposed to around 50% at present.

While workshop participants noted that “many companies throughout the connected car ecosystem will collect data from vehicles”, the FTC says in a subsequent report, much of this would consist of aggregate and non-sensitive data that could be used quite benignly for, inter alia, managing and monitoring congestion or vehicle performance.

Nonetheless, other types of data generated by connected cars would be much more sensitive and personally identifiable, potentially including “a fingerprint or iris pattern for authentication purposes” or navigational and other information about the vehicle's, and ergo the occupants', real-time location.

“Given all of this data collection, consumers may be concerned about secondary, unexpected uses of such data.” - FTC report

“For example,” it continues, “personal information about vehicle occupants using the vehicle's infotainment system, such as information about their browsing habits or app usage, could be sold to third parties, who may use the information to target products to consumers.” Although some people might find this helpful, “others may have concerns about recommendations based on, for example, [the] tracking of their usage of apps.”

The thing is, the amount of data garnered by connected cars pales when compared to what level 4 and 5 vehicles would need to accrue, such as that amassed via an array of external cameras and other sensors to provide computer vision and to ascertain their surroundings, other road users, potential hazards and even potential car thieves. Of course, exactly how much of a privacy risk might be posed by a passing vehicle's possible use of radar, lidar or thermal imaging systems is open to debate. Nevertheless, many concerns have understandably been raised by the notion of a vehicle's cameras capturing things other people would much rather they did not.

Huge data set

But are these fears justified? Well, when it comes to the cameras on his company's waterborne robotaxis identifying members of the public, Buffalo Automation's CEO Thiru Vikram thinks not.

“We do have a huge data set that we collect and annotate but that's primarily to see what types of other vehicles might be there in the frame that could cause problems for our navigation system,” he tells CyberNews. “We're not actively trying to do facial recognition, so we have no idea who any people that might find themselves in our data are.”

Accepting the possibility that facial recognition software could be applied to such data, he also notes that it would equally be quite simple to anonymise images if so required. “Our AI is pretty good at knowing when it sees a human face,” he says. “So an easy solution would be to ask us to blur out the face of the individuals in the image.”

This is highly reminiscent of the privacy issues encountered when Google launched Street View and which led to the company blurring out any human faces it captured. Certainly, there seem to be relatively few technological barriers to applying this, with Austria's Celantur, for example, operating a cloud-based platform that it claims is capable of anonymising around 200,000 panoramas a day and 90,000 video frames per hour.
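To put those claimed throughput figures on a common footing, the sketch below converts them into per-second rates. It is a rough illustration only: the helper `per_second` is a made-up name for this example, and the camera frame rate of 25 frames per second is an assumption chosen for comparison, not a figure from Celantur or any vehicle maker.

```python
def per_second(count, period_seconds):
    """Average number of items processed per second over a period."""
    return count / period_seconds


HOUR = 60 * 60
DAY = 24 * HOUR

# Claimed anonymisation throughput, as quoted above
frames_per_second = per_second(90_000, HOUR)
panoramas_per_second = per_second(200_000, DAY)

# Hypothetical camera frame rate, assumed purely for illustration
ASSUMED_CAMERA_FPS = 25
realtime_streams = frames_per_second / ASSUMED_CAMERA_FPS

if __name__ == "__main__":
    print(f"Video frames: {frames_per_second:.1f} per second")
    print(f"Panoramas:    {panoramas_per_second:.2f} per second")
    print(f"Equivalent real-time streams at {ASSUMED_CAMERA_FPS} fps: "
          f"{realtime_streams:.1f}")
```

On these numbers, 90,000 frames per hour is 25 frames per second, or roughly one camera stream processed in real time under the assumed frame rate. A vehicle carrying several cameras would need proportionally more capacity, or could anonymise footage after the fact rather than live.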

But facial recognition is arguably not the only potential issue at stake here, as seemingly evidenced by China's recent decision to ban Tesla vehicles from military and other sensitive sites. While this move may have been partially motivated by politics and/or economics, the main reason cited in a Bloomberg report this past March concerned fears that the surround cameras employed by these vehicles' Autopilot driver assistance systems presented an unacceptable security risk to the Chinese military. The thinking was that they could harvest information about facilities rather than faces.

Whatever the case, when it comes to autonomous vehicles, privacy concerns are by no means limited to external matters.

While Thiru reports that the only passenger information that Buffalo Automation's robotaxis collect “is to process payments,” something “that's all done through a third party,” other autonomous vehicles might not stop there.

In addition to monitoring the state of the vehicle and whether, for instance, it needs maintenance or servicing, it is likely that a significant battery of cameras, microphones and other sensors will be trained on the occupants to analyse everything from their preferences and habits to how they are sitting or indeed lying down. Or, as Sam Abuelsamid, Guidehouse Insights' principal research analyst, e-mobility, put it in 2019: “The outside of [autonomous vehicles] will bristle with cameras, radar, lidar and other sensors and so will the passenger cabins.”

Moreover, as Polish tech firm Summa Linguae notes, Level 4 and 5 vehicles will also rely heavily on advanced speech recognition software to make decisions based not only on specific commands, but also more subtle inferences, such as “Oh no! I've left item X at home.” Thus, it seems not inconceivable that such vehicles will be designed to listen in on every spoken word and not just the range of apps an occupant might choose to select on the in-car infotainment system.

While some people might love the idea of being the centre of attention, others who have watched 2001: A Space Odyssey or read Orwell's 1984 might find this constant surveillance a tad unsettling to say the least. However, for David Navetta, partner at US international law firm Cooley and vice-chair of its cyber/data/privacy practice, the biggest privacy issue concerns location-based information and the ensuing ability to ascertain where people are, where they have been, who they are visiting and when they are and are not at home.

Equating this to someone tracking your phone, he says such a situation would not be unique to driverless vehicles, but it would be “specific to cars” and could give rise to serious physical issues. “If someone leaves their house and it's known that they're gone, that could make their house potentially exposed,” Navetta states, adding that in general when companies or other third parties know a person's location by whatever means, they can track them and serve them ads. Perhaps unsurprisingly, “some people might take issue with that type of activity.”

The time is now

So how concerned should people be with the various privacy challenges presented by driverless vehicles? “I don't think consumers should be significantly more concerned about level 4 and level 5 vehicles than they should be with cars that exist now because [those cars] are already collecting, generating and sharing massive amounts of data,” says Chelsey Colbert, policy counsel at the Future of Privacy Forum and leader of the Washington, DC-based think tank and advocacy group's portfolio on mobility and location data.

That, though, is certainly not to say that they shouldn't be concerned. Rather, addressing these issues is something that should not be put off until the day that the Johnny Cabs of Total Recall are actually up and running and Minority Report deemed a documentary. “We should be thinking about it now,” she says.

That said, the big challenge facing consumers, Colbert believes, is “to keep up with all the technology that is in cars” and to stay abreast of what data is being shared, with whom, and what is then being done with that data. On that point, Navetta would appear to agree.

More than just reading the privacy notices that come with a vehicle and its increasing number of interfaces, he urges people to try to understand the technology at stake. “More and more companies are building functionality into their products and services,” he says. “So understanding the technology you are working with is becoming more and more important if people are concerned about privacy.”

Drawing parallels with current mobile phones and social media, Navetta advises people to dig deeper into the settings and options that may not be readily apparent to the casual user.

“If you explore a little bit, you actually have a lot more control than you might think,” he says. However, to do this, the individual first needs to locate those options and settings while also learning how to exercise them. Thus the bigger picture for people wishing to protect their privacy is not just reading the terms and conditions, “but understanding how the technology actually works.”

“As consumers, we really need to shift our mindset from a car being something that we buy [and which] doesn't really change over time,” Colbert says. Moreover, as ever more layers of connectivity and automation are added to vehicles, they essentially become big robots that blend hardware with continually evolving software. “Modern cars are already getting over-the-air software updates,” she states, explaining that this could not only impact data flows, but also “trigger new privacy obligations.”

Location and use

Of course, many issues pertaining to privacy will depend on where the vehicles in question operate, Colbert observes. For instance, the EU already maintains quite stringent privacy and data protection laws, most notably in the guise of the General Data Protection Regulation (GDPR).

Meanwhile, the situation is very different in the US, where, lacking any federal-level GDPR equivalent, it is instead up to individual states to determine what is and what is not acceptable, resulting in broad variations across state lines. Similarly, privacy implications will also likely depend on how any level 4 and 5 vehicles are actually used.

Noting that many people enjoy driving and would not want to stop doing so regardless of how technology develops, Thiru sees private spaces, such as campuses and the like, being particularly suited to the deployment of automated vehicles. This itself might also blur some lines as to who or what owns any generated data.

Meanwhile, citing the high cost of these vehicles, Colbert, along with other commentators, sees level 4 and 5 vehicles as most likely operating as robotaxis or performing automated trucking and delivery services, such as those forming the focus of Waymo's Via project.

This would mean that rather than having a direct relationship with the vehicle and being able to learn about privacy issues from a dealer, the average consumer would most likely encounter these vehicles in situations similar to how they might use a ride hailing app today.

In such a scenario, consumers, Colbert suggests, would be wise to first check out the privacy policy of the applicable robotaxi service provider and also pay full attention to any notices within the vehicle stating what data may be collected.

Certainly, consumers should not take it for granted that any such vehicles won't be collecting, processing and sharing significant amounts of information. Instead, it might be better to assume that they will. Indeed, this may be something that is already happening if, for instance, the ride hailing app driver, or indeed a conventional taxi driver, is operating a dash cam or similar recording device.

How safe is your data?

Such usage scenarios could also have implications in terms of cybersecurity. While connected cars have been proven to be hackable, when it comes to level 4 and 5 vehicles, the biggest threats may not actually concern black-hat hackers seeking to take over individual vehicles as they might a phone or laptop.

Instead, Navetta identifies ransomware and other attacks capable of bringing down entire networks of vehicles while simultaneously compromising enormous amounts of data as potentially more worrisome.

That said, whether or not level 4 or 5 vehicles, either individually owned or as part of a fleet, would be more attractive, and indeed susceptible, to cybercriminals compared to other potential targets remains moot.

After all, as Thiru notes, the autonomy industry in general follows the standards of the broader cybersecurity industry, outsourcing its security needs to established players as opposed to developing its own specific security software. As such, the sector will likely use the common infrastructure employed by other industries that regularly handle sensitive data, such as banking and retail.

“I like to joke that [an automated vehicle is] as secure as your bank account,” Thiru says, asserting that there is nothing about driverless vehicles that makes them any riskier than other digitally-enabled systems. “The risk profile is not unique to us,” he says.

Privacy by design

This may well be so, and from a security point of view, the multiple terabytes of data generated by a level 4 or 5 vehicle in any given day may well be deemed largely safe from the attentions of unauthorised actors.

However, in this post-Snowden world of seemingly endless snooping, surveillance and data mining, there remains the question in some minds at least as to what those actors with authorised access to that data might choose to do with it.

Furthermore, there are also question marks regarding exactly what data is being collected in the first place and whether it is strictly necessary from a functional and/or safety perspective.

Indeed, Colbert believes that now is perhaps the ideal time for manufacturers to ask themselves just how much information their cameras and other sensors actually need to collect and just how much of that information actually needs to be identifiable. “Having those questions right up front at the design stage is crucial,” she says, expressing a cautious optimism that industry will embrace the privacy-by-design principle and build in suitable protections from the get-go.

What's more, were this to happen, it wouldn't just be consumers that would arguably stand to benefit. For instance, a 2020 survey commissioned by the Partners for Automated Vehicle Education (PAVE) unearthed considerable mistrust among US citizens when it came to the subject of driverless vehicles, with roughly three quarters of those polled viewing such technology as “not ready for primetime.”

Admittedly, the survey focussed more on perceptions of safety as opposed to privacy. However, anything that the sector might do to allay public fears would only help to advance its cause. After all, who wants to hail a cab that might then fleece you of every bit of data it can?

About the author: Brian Dixon is a freelance journalist and video editor with more than 20 years' experience covering business, tech and industrial beats for print and online publications in the UK, Poland and Sweden. A keen traveller, he has so far visited 75 countries on six continents. Pandemics permitting, he divides his time between the UK, Poland and Japan.