Dr. Alon Kaufman, Duality: “the world should collaborate on sensitive data while preserving privacy”
With cybercriminals lurking in many corners of cyberspace, users are becoming more aware of the value of their data. And when the world unexpectedly had to face the COVID-19 pandemic, it became clear that data collaboration can also be beneficial, for instance for spotting emerging trends.
As Dr. Alon Kaufman states: “The data driven world we live in essentially requires us to analyze and collaborate on data coming from multiple sources and organizations, but the privacy world is limiting it.”
Cybernews invited Dr. Alon Kaufman, the CEO and Co-Founder of Duality, a company that specializes in empowering organizations to securely collaborate on sensitive data with their business ecosystem, to discuss data collaboration and privacy issues.
How did the idea of Duality come to life? What was the journey like since your launch?
My career began as a data scientist (actually way before the term data science existed, when I did my PhD in computational neuroscience) and I was fortunate enough to work with several companies within the data science space. I later joined RSA, where I was responsible for building an online learning fraud detection engine. For this engine to be successful, we needed to collect data from multiple sources, in this case banks. By combining data on fraudulent transactions from multiple banks, we could build machine learning models that detected fraud far better than anything that could be achieved on a single bank’s data set. This demonstrated the huge value and impact that data collaboration, within the context of data science, can have in solving real world problems.
Based on the success of the fraud detection project, I was given responsibility for data science across all of RSA as well as RSA Labs (the division of RSA that works on encryption). This is how I was first introduced to the world of encryption. It brought together two dramatically different worlds: data science, where your goal is to find the needle in the haystack, and encryption, where the goal is to hide the needle in the haystack.
At the same time, the issue of data privacy was rising in importance. A seemingly unsolvable conflict arose between the need to protect data privacy and the need to gather and analyze as much data as possible to solve real world problems. The success we had with approaching fraud detection in this manner was a result of being able to gather sensitive data from several different sources (banks) into a single data set that we could run models on.
The big question then became, “Can the world collaborate on sensitive data while preserving privacy?” It turns out the answer is ‘yes’, with secure computing technology. Secure computing brings together data science and cryptography to solve the conflict between data privacy and data collaboration. During my time at RSA, we were dealing with data collaboration challenges in the cybersecurity world. Fortunately, in one of my periodic meetings with Rina Shainski, whom I had known for many years, we discussed these challenges and Rina shared that she had teamed up with world-renowned cryptographers who had achieved excellent results on amazing technology that allows users to analyze data while it remains encrypted. Rina was so fascinated by the new opportunity and its transformative potential that she had decided to leave her role as a leading VC and start a company. The match was ideal: I was exposed to the actual business problem, and Prof. Shafi Goldwasser, Prof. Vinod Vaikuntanathan, and Dr. Kurt Rohloff, together with Rina, were developing the technology into a market-ready solution, and so Duality was born.
We founded Duality to enable organizations to bring together data science, cryptography, and other Privacy-Enhancing Technologies (PETs) to solve the business conflict between data privacy and the need to analyze data collaboratively. Today, leading industry and government organizations partner with Duality to maximize the value of their data, including DARPA, Intel, Scotiabank, Oracle, IBM, World Economic Forum (WEF), and more.
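Editor's illustration: the idea of analyzing data while it stays encrypted can be sketched with a toy additively homomorphic scheme. The example below uses textbook Paillier encryption, which is an assumption chosen for brevity and is not stated in the interview to be the technology Duality uses. The tiny primes, the function names, and the two bank counts are all hypothetical, and the code is not secure for real use. It assumes Python 3.8 or later for the modular inverse via `pow`.

```python
import math
import secrets

def keygen(p=1789, q=1861):
    """Toy Paillier key generation (real keys use primes of 1024+ bits)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid shortcut because the generator is g = n + 1
    return n, (lam, mu)   # public key n, private key (lam, mu)

def encrypt(n, m):
    """Encrypt an integer 0 <= m < n under public key n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2

def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return c1 * c2 % (n * n)

def decrypt(n, private_key, c):
    lam, mu = private_key
    x = pow(c, lam, n * n)
    return (x - 1) // n * mu % n

# Hypothetical scenario: two banks each encrypt a fraud count; a third
# party sums the ciphertexts without ever seeing either number.
public_key, private_key = keygen()
bank_a = encrypt(public_key, 120)
bank_b = encrypt(public_key, 75)
encrypted_total = add_encrypted(public_key, bank_a, bank_b)
total = decrypt(public_key, private_key, encrypted_total)
print(total)
```

Only the holder of the private key learns the combined total; the party doing the addition works on ciphertexts alone. Schemes that support richer analytics than addition are considerably more involved than this sketch.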
Can you tell us a little bit about what you do? How is Duality changing data science?
I believe the next frontier of data science is collaboration. The more data we have, the better the insights we can extract from it. At the moment, concerns about data privacy, whether they involve the personal data of patients or customers, proprietary information, or intellectual property, limit our ability to share data. The data exists, but we are not able to get the full value from it.
The past two years of the COVID-19 pandemic have made the importance of data collaboration clear to the world. We want to quickly detect emerging trends, track the effectiveness of interventions and vaccines, and so on. This requires multiple medical institutions and countries to work on all the data together in a collaborative manner, but nobody wants their personal health information to be shared with third parties. On the one hand, we want researchers to quickly learn as much as possible about the pandemic, but on the other hand, we are not willing to share our personal medical information.
The data driven world we live in essentially requires us to analyze and collaborate on data coming from multiple sources and organizations, but the privacy world is limiting it.
This is a critical conflict that Duality has solved. Making privacy preserving collaborative data science possible will unleash the exponential value of sensitive data sets.
In your opinion, which industries will benefit the most from data collaboration?
Data is critical to every industry and sector, without exception. The first movers are sectors that are driven by data but also must comply with strict privacy regulations: finance and financial crime, health sciences, insurance, government, and telecommunications. I think marketing, adtech, and automotive will soon follow because these are also very data driven industries that are experiencing more and more regulation and limitations on the use and sharing of data. These limitations are a real threat to advertising, so adtech will need solutions that allow them to continue gaining insights from consumer data without ever sharing it with third parties.
Has the pandemic affected public opinion and attitudes toward data collaboration?
The pandemic was an excellent demonstration that data sharing and collaborative data science can expedite health treatments and government decisions, but at the same time it also highlighted the need for privacy.
The pandemic turned everyone into a data scientist. We all constantly sought out data about what was going on in order to make better decisions to protect ourselves and our families. Prior to the pandemic, most people only thought about keeping all their health data a secret, but COVID-19 made us realize the potential importance of making health data available for analysis. Now that everyone is an armchair data analyst, people are beginning to understand that privacy versus utility is not a one-way equation.
Prior to the pandemic, everyone thought of privacy as holy. Anyone promoting tougher privacy protections was automatically hailed as the ‘good guy’, as a ‘data privacy hero’. The tougher the restriction the better, no questions asked. On the opposing side, the data giants like Google and Facebook/Meta were vilified as the ‘bad guys.’ The pandemic helped people understand that the issue of data privacy is not that simple. Yes, privacy is a very important principle. It is a personal right and absolutely must be upheld. At the same time, public health is important as well. As such, the need to combine the two and find a way to analyze data while maintaining privacy and confidentiality has become critical.
In your opinion, what are the biggest mistakes companies make in their data strategy?
Many companies are focused solely on analyzing their own data because they assume they could never gain access to other data sets due to privacy requirements and confidentiality constraints. Once companies realize it is possible to collaborate on data while preserving privacy and staying compliant, it will usher in a new era of data collaboration. When organizations start really collaborating with each other, they will finally be able to mine all the precious ‘gold’ out of their data.
What would you consider the biggest data privacy issues prominent today?
Data destruction. Today, the most popular approach to working on private data is to de-identify it using various techniques, such as deleting personal identifiers or masking them with hashing, and suppressing or generalizing quasi-identifiers. These de-identifying techniques are limiting and prone to re-identification. First, they don’t really protect privacy. For example, in the case of the Netflix Recommendation Contest, the company de-identified viewer data before sharing it publicly with contest participants, but was hit with a $1M lawsuit and forced to cancel the contest after hackers managed to use another publicly available data source to re-identify the customers. There is a false sense of security in using these techniques.
Second, any time you change or delete data, you degrade the quality of that data and reduce its value. Unfortunately, many regulatory schemes require the outright destruction of precious data. For example, HIPAA explicitly requires deletion of 18 data fields. This regulatory requirement is meant to prevent any possibility of identifying the individual, but deleting so much information degrades the value that can be extracted from the data. Today, in the name of privacy, organizations are being forced to damage the value of the data. Duality enables companies to realize the full value of data, while still protecting privacy.
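Editor's illustration: the two weaknesses described in this answer can be sketched in a few lines. The example hashes a direct identifier, generalizes two quasi-identifiers, and then shows how a second data set can be joined on those generalized fields to recover identities. All records, field names, the salt, and the function names are hypothetical and invented for this sketch; it is not a description of the Netflix case or of any specific regulation.

```python
import hashlib

# Hypothetical private records held by one organization.
records = [
    {"name": "Alice Example", "zip": "02139", "birth_year": 1984, "diagnosis": "asthma"},
    {"name": "Bob Example", "zip": "02141", "birth_year": 1990, "diagnosis": "diabetes"},
    {"name": "Carol Example", "zip": "94110", "birth_year": 1975, "diagnosis": "migraine"},
]

def deidentify(record, salt="demo-salt"):
    """Mask the name with a salted hash and generalize the quasi-identifiers."""
    token = hashlib.sha256((salt + record["name"]).encode()).hexdigest()[:12]
    return {
        "token": token,
        "zip3": record["zip"][:3],                      # ZIP code cut to 3 digits
        "birth_decade": record["birth_year"] // 10 * 10,  # year rounded to decade
        "diagnosis": record["diagnosis"],
    }

released = [deidentify(r) for r in records]

# Hypothetical publicly available data set with names attached.
public = [
    {"name": "Alice Example", "zip": "02139", "birth_year": 1984},
    {"name": "Bob Example", "zip": "02141", "birth_year": 1990},
    {"name": "Dan Example", "zip": "10001", "birth_year": 1984},
]

def reidentify(released_rows, public_rows):
    """Link the two data sets on the generalized quasi-identifiers."""
    matches = {}
    for person in public_rows:
        key = (person["zip"][:3], person["birth_year"] // 10 * 10)
        candidates = [r for r in released_rows
                      if (r["zip3"], r["birth_decade"]) == key]
        if len(candidates) == 1:  # a unique match reveals the person
            matches[person["name"]] = candidates[0]["diagnosis"]
    return matches

print(reidentify(released, public))
```

In this toy data, two of the three released rows are uniquely linkable even though no names were published, while the released rows have already lost the exact ZIP code and birth year that an analyst might have needed.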
Would you like to share what’s next for Duality?
Duality will continue to push forward into the new data collaboration frontier. We are very fortunate to have the very best minds in data science, privacy-enhancing technologies, and software engineering gathered together in one team. In the coming months and years, we will help our customers answer questions that up until now they were unable to even ask. We will help them gain access to highly valuable, and often sensitive, multiparty data sets and create a network effect of extracting exponential value by using collaborative data science.