Privacy expert: enterprises don't know where their sensitive data is

For years, companies tapped the bottomless well of data almost for free. But times change, and the "the more data, the merrier" concept is dying away as data shifts from being a commodity to being a serious liability for companies.

When I speak to startups and other relatively new companies, I often hear that they do not collect user data or that they collect only the minimum information they need to run their business. However, with GDPR (the General Data Protection Regulation) and similar regulations worldwide, customers now realize that their data belongs to them, and that they even deserve to get paid if someone uses it.

With legacy businesses, it is a bit different. For many years, they have collected as much data as possible just in case they can monetize it somewhere along the way.

Rick Hedeman, who has more than 20 years of experience in cybersecurity, believes that every company will eventually have to minimize the data it collects and uses for business purposes. If companies do not do that willingly, politicians and regulators will step in with new restrictions.

The latest guidance from the UK's data protection authority on data anonymization, pseudonymization, and privacy-enhancing technologies seems like one of those measures that should reduce companies' thirst for unnecessary data.

Data anonymization and pseudonymization

The UK’s Information Commissioner’s Office (ICO) published the first chapter of its anonymization, pseudonymization, and privacy-enhancing technologies guidance. It introduces the concepts mentioned above and explores when data can be anonymized to reduce risks.

“We understand the benefits that data sharing can bring to organizations, individuals, and society as a whole, but there are risks too. However, effective anonymization techniques provide a privacy-friendly alternative to sharing personal data,” the ICO argues, asking for feedback on the draft.

Anonymization means that the data can’t be re-attributed to an individual. Pseudonymization is a technique that replaces or removes information that identifies an individual. For example, it may involve replacing names or other identifiers with a reference number.
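To make the reference-number idea concrete, here is a minimal Python sketch. It is an illustration only and is not taken from the ICO guidance; the function name `pseudonymize`, the field names, and the sample records are all hypothetical.

```python
import secrets

def pseudonymize(records, key_field="name"):
    """Replace the identifying field with a random reference number.

    Returns the pseudonymized records plus a lookup table. The lookup
    table allows re-identification, so in practice it would have to be
    stored separately and protected (an assumption of this sketch).
    """
    lookup = {}   # original identifier -> reference number
    output = []
    for record in records:
        identifier = record[key_field]
        if identifier not in lookup:
            lookup[identifier] = "REF-" + secrets.token_hex(8)
        # Copy every field except the identifier, then attach the reference.
        pseudonymized = {k: v for k, v in record.items() if k != key_field}
        pseudonymized["ref"] = lookup[identifier]
        output.append(pseudonymized)
    return output, lookup

# Hypothetical sample data.
records = [
    {"name": "Alice Example", "postcode": "AB1 2CD", "test_result": "negative"},
    {"name": "Bob Example", "postcode": "EF3 4GH", "test_result": "positive"},
    {"name": "Alice Example", "postcode": "AB1 2CD", "test_result": "positive"},
]
pseudonymized, lookup = pseudonymize(records)
```

Note that the same person always maps to the same reference, so records stay linkable for analysis. That is what keeps the data useful, and it is also why pseudonymized data can still carry re-identification risk.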

"This is an interesting and tricky area. Anonymization is more secure but less useful. Pseudonymization is more useful but less secure (there is a chance of identifying the individual). Both of these items are relatively straightforward when it comes to data elements but get tricky when you combine multiple data elements into an associated record," Rick Hedeman, senior director of business development at personal data and network analytics company 1touch.io, told CyberNews.

According to him, two completely pseudonymous sets of records, when combined, may still be used to identify individuals.

"We see this all the time with sets of data like gender, age, and postal code combined with other data sets (like sentiment data, consumer data, or telemetry data). We don't think much about each of these sets of data on their own, but when combined, they can reveal much more than intended or realized," Hedeman explained.
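The combination risk Hedeman describes can be sketched in a few lines of Python. This is a simplified illustration, not his method: the two datasets, their field names, and the helper `link_unique` are hypothetical. The sketch pairs up records whose gender, age, and postal code combination appears exactly once in each data set.

```python
from collections import Counter

# Fields that are harmless alone but identifying in combination.
QUASI_IDENTIFIERS = ("gender", "age", "postcode")

def quasi_key(record):
    """Build the combined quasi-identifier tuple for one record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)

def link_unique(dataset_a, dataset_b):
    """Pair records whose quasi-identifier combination is unique in both sets."""
    counts_a = Counter(quasi_key(r) for r in dataset_a)
    counts_b = Counter(quasi_key(r) for r in dataset_b)
    index_b = {quasi_key(r): r for r in dataset_b}
    links = []
    for record in dataset_a:
        key = quasi_key(record)
        if counts_a[key] == 1 and counts_b[key] == 1:
            links.append((record, index_b[key]))
    return links

# Hypothetical pseudonymous health data: no names, only reference codes.
health = [
    {"ref": "H1", "gender": "F", "age": 34, "postcode": "AB1", "test_result": "positive"},
    {"ref": "H2", "gender": "M", "age": 51, "postcode": "CD2", "test_result": "negative"},
    {"ref": "H3", "gender": "M", "age": 51, "postcode": "CD2", "test_result": "positive"},
]
# Hypothetical pseudonymous consumer data from a different system.
consumer = [
    {"ref": "C7", "gender": "F", "age": 34, "postcode": "AB1", "segment": "frequent buyer"},
    {"ref": "C8", "gender": "M", "age": 51, "postcode": "CD2", "segment": "occasional buyer"},
]

links = link_unique(health, consumer)
for health_record, consumer_record in links:
    print(health_record["ref"], "matches", consumer_record["ref"])
```

In this toy data, only the record with a unique combination is linked; the two health records that share the same gender, age, and postal code cannot be told apart. Counting how many records share each combination is one simple way to gauge how exposed a data set is before it is shared.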

These arguments are relatively academic and are most relevant to large data sets. However, based on his discussions with customers and partners, Hedeman thinks the biggest exposure is still pretty simple: most enterprises don't know where their sensitive data is.

"Most of these organizations know where the data originates (for the most part) and what would be considered the source of record, but they have a very hard time understanding how the data is transformed, copied, and used throughout the organization. They most often will have controls around the source of record but much less likely around the copies and derivatives of that data downstream," he said.

For example, companies can point to their data sources, such as information originating in HR (human resources) systems. But once that information gets copied to a report or a spreadsheet, it becomes much harder to safeguard.

"It is this area that I think the regulators are going to address - where can the effort have the best risk/benefit return. It is more about companies doing the basics well rather than particularly esoteric or complicated approaches," he said.

Cyberattacks do not go down well with the general public, especially when they painfully affect consumers. The Colonial Pipeline attack, which left people facing gas shortages, is only one of numerous examples.

"If you can't drive someplace, the politicians and the regulators are going to hear that pretty loud and clear. So I'm pretty sure you are going to see some impact," Hedeman said.

The sudden burst of sensitive data

The less data you have, the less you have to lose. But data has been a commodity for many years, and only recently have companies started to learn that it is also a huge liability. Regulators made sure they treat it as a liability, and cybercriminals showed just how expensive it might be to collect vast amounts of data without paying enough attention to its protection.

COVID introduced a whole new challenge. Not only did cyberattacks skyrocket, but the amount of data on people that needed, and still needs, to be collected also grew. Just imagine: your employer already knew your home address and your kid's birthday. Now, they are also tracking your health, and each COVID test you take gets noted somewhere in the system.

"It has been interesting this past year with COVID because it introduced a lot of additional sensitive information that organizations have to figure out how to deal with, whether that's test results, vaccine statuses, or other things that normally you would think of as health-related data but ultimately may be operational data. For example, is this person allowed to come to work today? How do you capture that? How do you minimize the surface area and manage the privacy and security of that data," Hedeman explained.

Not only is there more information on people, but data that used to be somewhat contained in office settings is now dispersed, with people working at least partly remotely.

Storing became costlier

Data used to be just like anything else you were reluctant to throw out and stored in the attic in case you might need it sometime. "The more data, the merrier" was companies' motto for many years.

"Organizations were able to collect and store a lot more information starting, let's say, 10-15 years ago. There were minimal costs to doing that. The privacy regulations have raised the costs for organizations for just collecting that data," Hedeman said.

It is much easier for new companies to make the right choice than for legacy businesses that used to collect information even if it was not part of their core business.

"It is much more difficult for them to go back, find the data, delete it without breaking something," he explained.

A subject access request (SAR) can reveal just how much data some companies hold and how little they understand it. Individuals in the UK can approach a particular company with a request to know whether the company processes any of their information and what that data includes, and can even ask for copies of the data.

Hedeman sent several SARs to organizations he'd been involved with. He found that younger companies could provide the data he asked for, while legacy businesses struggled.

"In one case with a very large organization, one of the largest information technology companies in the world of which I had been a partner, an employee, and a customer, basically asked me for more information than they provided to me. I was a little shocked. Interestingly, they came back to me and said they are not required by law to provide this information so they were not going to. And I know this organization is taking this pretty seriously. However, they still won't do it until they are required by mandate to do it," he said.

On the positive side, enterprises realize that data is a liability and are looking to collect only the data that is essential for their businesses.

The situation varies across the world. For example, in Europe, you have to opt in to share your information. Meanwhile, in the US, traditionally, the default setting has been that you have to opt out if you do not want to share personal data.

"The default makes a huge difference because most people don't change the default. We've opted in to sharing our data for years, maybe not with such great effect," Hedeman said.