The Census Bureau is Making Big Changes to Keep Our Data Safe

By: Whitney Tucker | November 2019

I spend a lot of my time trying to get people excited about participating in the 2020 Census. And with good reason. The Census count is critical to our democracy. It determines each state’s number of congressional seats. It also determines billions of dollars in federal funding to support state and local programs that every family depends on. But every time I make that pitch, people ask me about privacy.

How can I be sure my data will be kept safe? 

It’s an important question. The census is the richest data source in the country for understanding the U.S. population. Unfortunately, that comprehensiveness is precisely what makes census data a high-profile target for cyberattacks by bad actors.

Differential privacy

Nerd out with me for a minute. A fundamental shift in decennial census data protection is underway right now. Data releases from the U.S. Census Bureau are about to become a lot more secure, but also less technically accurate, thanks to a method known as “differential privacy.” The Census Bureau argues that a little inaccuracy is a price we must pay to maintain census data confidentiality and public trust.

In the late 1990s and early 2000s, computer scientists began testing the limits of the Census Bureau’s ability to keep open data truly anonymous. With newly available computing power, researchers were able to efficiently reconstruct individual information from statistical tables. They found that bad actors could potentially reconstruct enough personal census data to link it to other publicly available data (like Facebook and Twitter accounts, credit reports, and property records). By linking records this way, they were able to identify individuals, a process known as “reidentification.”
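
For the curious, here is a toy sketch of how reconstruction can work. Everything in it is made up for illustration: the three-person block, the published figures, and the function name `reconstruct_block` are assumptions, not real census tables or the researchers' actual method. The idea is simply to try every possible set of ages and keep the ones that match all the published statistics.

```python
from itertools import combinations_with_replacement

# Hypothetical published statistics for a made-up block of three residents.
PUBLISHED = {
    "count": 3,
    "mean_age": 40,
    "median_age": 30,
    "minors": 1,           # residents under 18
    "mean_adult_age": 54,
}

def reconstruct_block(published, max_age=115):
    """Return every sorted tuple of ages consistent with the published figures.

    Assumes an odd number of residents so the median is the middle value.
    """
    matches = []
    # combinations_with_replacement yields ages in sorted order.
    for ages in combinations_with_replacement(range(max_age + 1), published["count"]):
        adults = [a for a in ages if a >= 18]
        minors = len(ages) - len(adults)
        if sum(ages) != published["mean_age"] * len(ages):
            continue
        if ages[len(ages) // 2] != published["median_age"]:
            continue
        if minors != published["minors"]:
            continue
        if sum(adults) != published["mean_adult_age"] * len(adults):
            continue
        matches.append(ages)
    return matches

print(reconstruct_block(PUBLISHED))  # [(12, 30, 78)]
```

In this toy case only one set of ages fits, so five harmless-looking summary numbers give away every resident's exact age.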

Census researchers knew more would need to be done to keep pace with rapidly advancing technology. That’s where differential privacy comes in. Differential privacy is a mathematical framework that can be applied to data. It allows a user to quantify the risk of revealing specific information in any statistical tables that might be released. The user can then decide how much “noise,” or mathematically random variation, should be introduced to the data to reduce that risk.
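
Here is a minimal sketch of what adding noise can look like, assuming the simplest textbook approach (Laplace noise added to a count), which is not necessarily what the Census Bureau uses. The names `noisy_count`, `average_error`, and `epsilon`, and the block count of 120, are illustrative. A smaller `epsilon` means more noise and more privacy; a larger one means less of both.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centered on zero with that scale.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def noisy_count(true_count, epsilon, rng):
    # Adding or removing one person changes a count by at most 1,
    # so the sensitivity of a count is 1.
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)

def average_error(epsilon, trials, rng):
    # Average distance between the noisy count and the true count.
    return sum(abs(noisy_count(0, epsilon, rng)) for _ in range(trials)) / trials

rng = random.Random(2020)   # fixed seed so the demo is repeatable
true_count = 120            # hypothetical number of people in a block
for epsilon in (0.1, 1.0, 10.0):
    draws = [round(noisy_count(true_count, epsilon, rng), 1) for _ in range(5)]
    print(f"epsilon={epsilon}: {draws}")
```

Running it shows the tradeoff directly: at a small `epsilon` the released counts wander far from 120, and at a large one they barely move.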

Randomness creates security

Differential privacy is fundamentally different from other techniques that the Census Bureau has used because it adds noise to the calculations used to produce data products, not just the products themselves. Stay with me. What that means is that while you could still hypothetically reconstruct records from a database that’s being protected with differential privacy, those individual records would be useless and unreliable. The database itself is protected with enough randomness to ensure confidentiality. 

This method gives a guaranteed level of privacy to individual data. It also means that anyone who wants to analyze census data in the future will need a much more sophisticated understanding of the margins of error. 
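
To make "margin of error" concrete, here is a small sketch that assumes the noise on a count comes from a Laplace distribution, a common textbook choice and not necessarily the Census Bureau's. The function name `laplace_margin` and the `epsilon` values are illustrative. It covers only the error from privacy noise, not other sources of error.

```python
import math

def laplace_margin(epsilon, confidence=0.95, sensitivity=1.0):
    scale = sensitivity / epsilon
    # For Laplace noise, P(|noise| > t) = exp(-t / scale); solve for t.
    return scale * math.log(1.0 / (1.0 - confidence))

for epsilon in (0.1, 1.0, 10.0):
    margin = laplace_margin(epsilon)
    print(f"epsilon={epsilon}: noise stays within +/-{margin:.1f} about 95% of the time")
```

A margin of a few people hardly matters for a state total, but it can swamp the count for a small neighborhood or a small population group.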

Critique from data scientists

The Census Bureau announced last September that it will apply differential privacy to the release of 2020 census data. As you can imagine, there are some researchers who don’t like it. Critics have argued that differential privacy goes above and beyond the necessary data protections under census law and precedent. They point out several conspicuous disadvantages of making this change: 

  • Some types of census data products may no longer be produced because they simply cannot be adequately protected; 
  • Some data products may be changed to ensure confidentiality, while others will not; and
  • Researchers are being asked to tell the Census Bureau in advance what sort of information they may want to know for future projects.

A number of data scientists feel that the move to differential privacy is simply too big a change, made too fast. The Census Bureau is forging ahead with differential privacy for Census 2020 despite these concerns. 

Decennial census data are fundamental to justice, democracy, and research in the United States. If those data are even potentially under threat, Census Bureau officials feel it is necessary to take preemptive measures to keep them safe. Differential privacy is the most comprehensive method currently available to do that. The change is still contentious, but at the very least it’s getting people talking about what can and should be meaningfully done with census data. And that warms my geeky, data-filled heart.

Whitney Tucker is NC Child’s Research Director. She reminds you to #MakeNCCount by completing your census form for every household member on April 1, 2020. 
