Maker Pro

Securing Consumer Privacy While Still Ensuring the Accurate Quality of the Data Collected

April 27, 2020 by Kelcie Moseley

Research highlights the importance of preventing data breaches through accidental exposure or employee malpractice and proposes a new method of scrambling survey datasets while keeping the integrity of the data.

Consumer Data Collection is Vital But Risky If Not Secured

Consumer data has become one of the most valuable commodities of the modern world. A survey from PwC showed 94% of respondents consider data on customer and client preferences and needs to be critical or important. It helps shape the direction of companies, drives sales and marketing, and serves a host of other purposes.

The valuable information that can be gleaned from surveys and other data collection efforts is vital to the success of many organisations, but it carries a real risk of exposing personal data, and breach after breach in the headlines reinforces that risk.

In fact, one occurred just this week at the Small Business Administration: approximately 8,000 individuals had their data exposed on the federal agency's website before the agency fixed the privacy error in its application portal.

While this happened to a federal agency, the consequences can be far more dire for a technology company or other brand. A 2019 survey showed 81% of respondents would stop engaging with a brand online if a data breach had occurred.


A Recurring Problem

Data privacy laws do exist and require encryption or the transformation of original data into protected data, but compliance doesn't guarantee that breaches won't occur. More than 2,000 confirmed data breaches took place in 2019, and worse, 34% of them resulted from the actions of employees or other internal actors.

To ensure people keep responding to important data collection efforts, they need to be reassured that their information will be secure. Data privacy breaches not only harm the reputation of a company and erode trust with consumers, but they are often costly as well — something that can be detrimental to even the largest businesses.

The data breach at Target in 2013 cost the company millions of dollars in profits. And for small businesses, the average cost of $200,000 for a cyberattack can mean the difference between keeping the doors open and shuttering for good.



Ensuring better security for the users of electronics takes more than encrypting data.

Scrambling Survey Datasets for Security

Research by Drexel University's Matthew Schneider and Vanderbilt University's Dawn Iacobucci, published in the Journal of Marketing Analytics, proposes a new methodology, at least for survey datasets, that could help protect consumer privacy while preserving the integrity and accuracy of the data. They note that encryption alone isn't enough to protect a company from a breach.

“Encryption definitely helps, but it does not prevent a data breach," Schneider said. "It's similar to safeguarding your email password. An internal actor with access to the encryption key could easily cause a data breach. It's more conservative from a risk perspective to assume that all data will eventually get out and should be transformed prior to sharing anywhere within the organisation".

The researchers examined a survey by the city of Austin that asked 2,614 Asian American respondents for their ethnic origin, zip code, and gender; together with the respondents' identification as Asian American, that makes four identifying variables.

“Nearly everyone is identifiable with these four variables — some more so than others," said Schneider. "Once you identify them, this survey revealed other sensitive responses such as employment status, religious affiliation, household income, housing affordability and many attitudinal questions”.
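The re-identification risk Schneider describes can be illustrated with a toy calculation: even a handful of demographic variables makes a large share of records unique. The sketch below uses entirely synthetic data (hypothetical origin labels and zip codes, not the Austin survey) to count how many respondents are the only person in the dataset with their exact attribute combination.

```python
import random
from collections import Counter

# Entirely synthetic data (hypothetical labels, NOT the Austin survey):
# 2,614 respondents, each with three demographic attributes.
rng = random.Random(0)
origins = [f"origin_{i}" for i in range(20)]        # 20 hypothetical origins
zip_codes = [f"787{i:02d}" for i in range(100)]     # 100 hypothetical zip codes

respondents = [
    (rng.choice(origins), rng.choice(zip_codes), rng.choice(["F", "M"]))
    for _ in range(2614)
]

# Count how many respondents share each exact attribute combination;
# a count of 1 means that respondent is uniquely identifiable.
combo_counts = Counter(respondents)
unique = sum(1 for count in combo_counts.values() if count == 1)
print(f"{unique} of {len(respondents)} respondents are uniquely "
      "identified by (origin, zip, gender) alone")
```

Even with only 4,000 possible attribute combinations here, a large fraction of the synthetic respondents end up uniquely identifiable; real demographic variables, which are far less uniformly distributed, tend to make the problem worse.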


A New Method That "Shuffles" Data 

The new methodology proposed by the authors was modeled after a technique from genomic sequencing applications that can disguise a consumer's identity while maintaining the accuracy of the data to within 5%.

“Our method would essentially 'shuffle' the demographic data in a survey dataset," said Schneider. "But, unlike previous methods, ours only shuffles data when it maintains the correlations between important variables that are essential to analysts. The protected data is simulated on a consumer level, but still valuable to the end user. If this dataset got out, then only the organisation's insights would be known".
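The paper's algorithm isn't reproduced here, but the general idea of correlation-preserving shuffling can be sketched as follows: randomly swap demographic values between respondents, keeping only the swaps that leave the correlation with a key analytic variable close to its original value. The data, function names, and tolerance below are illustrative assumptions, not the authors' method.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def shuffle_with_correlation(demo, target, tolerance=0.05, attempts=2000, seed=1):
    """Randomly swap pairs of demographic values, undoing any swap that
    moves the correlation with `target` more than `tolerance` away from
    its original value. A loose sketch of correlation-preserving
    shuffling, not the published algorithm."""
    rng = random.Random(seed)
    shuffled = list(demo)
    original_r = pearson(demo, target)
    for _ in range(attempts):
        i, j = rng.randrange(len(shuffled)), rng.randrange(len(shuffled))
        shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
        if abs(pearson(shuffled, target) - original_r) > tolerance:
            shuffled[i], shuffled[j] = shuffled[j], shuffled[i]  # undo the swap
    return shuffled

# Hypothetical data: age (a demographic) correlated with income (the
# analytic variable an end user would study).
rng = random.Random(0)
age = [rng.randint(20, 70) for _ in range(500)]
income = [1000 * a + rng.randint(-5000, 5000) for a in age]

scrambled_age = shuffle_with_correlation(age, income)
print("correlation drift after shuffling:",
      abs(pearson(scrambled_age, income) - pearson(age, income)))
```

After shuffling, individual rows no longer report any one respondent's true age, yet the age–income correlation an analyst would compute stays within the chosen tolerance, which is the trade-off the researchers describe.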
