Data Needs To Be Anonymized For Dev And Test
November 14, 2018 Scott Heinlein
(Sponsored Content) Every company needs to create applications and then test those applications against a database that has real data in it. And therefore, most companies will take their test data from production databases and make a copy of it, or a subset of it, to do such testing, usually by doing a bulk copy. And at that moment, all of the sensitive data in that database is exposed to the programmers and testers.
Given that in excess of 80 percent of information leaks start from an internal source, this is a potential – and possibly great – security risk for any company.
It is good business to protect your customer data, but what is really driving the need for data anonymization tools such as DOT-Anonymizer from ARCAD Software is the European General Data Protection Regulation (GDPR), which went into effect at the end of May 2018. While this is a European regulation, it affects any company that does business in Europe and any company located outside of Europe that has European customers.
The law allows for very steep fines to be imposed on companies that are found not to be compliant with GDPR, and Twitter, for instance, is one of the first companies to be investigated by European regulators to make sure it is compliant. Among its provisions, the law requires the active consent of users for their personal information to be stored, so companies have to have mechanisms in place to ensure this. GDPR also requires the secure processing of data, and some kind of anonymization if real data is used in testing.
Right now, only 12 percent of companies in the United States are compliant with GDPR, and we get the impression that most companies here don’t really know about it. The Europeans are already starting their investigations and fines, so the storm is coming. As best we can figure, about half of European companies are not GDPR compliant, so even Europeans have a lot of work to do.
DOT-Anonymizer was developed as the GDPR was going from concept into law, but it has uses beyond GDPR in terms of securing vital customer data. It can secure data in such a way that it is still useful for testing applications but is utterly useless to someone trying to gain illegal access to the personal data stored in that database. As the name suggests, DOT-Anonymizer is a database anonymization tool. It connects to practically any database through a JDBC connector, including the various versions of IBM Db2, MySQL, Microsoft SQL Server, and Oracle’s eponymous databases. If it has a driver, DOT-Anonymizer can connect to it.
DOT-Anonymizer hooks into existing development and test processes and scrambles the data in such a way that it is still readable and useful for the purposes of testing applications, but is anonymized. So, for instance, if you want to scramble first and last names in the database, we can have it replace the real data with homonyms, meaning different spellings of something that sounds the same. They are still readable and sound like names, but they are different from the original values. The tool can also scramble social security numbers, and we have algorithms for all different kinds of data.
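To make the idea of readable but anonymized values concrete, here is a minimal sketch in Python. It is not ARCAD's algorithm, which is not described here. It swaps in a different, generic technique: a keyed one-way hash (HMAC) picks a substitute name from a small pool and replaces the digits of a social security number while keeping its layout. The key, the name pools, and the function names are all assumptions made up for illustration.

```python
import hashlib
import hmac

# Hypothetical secret; it would have to be kept away from dev and test users.
SECRET_KEY = b"demo-key-not-for-production"

# Small illustrative pools of replacement names (assumptions, not product data).
FIRST_NAMES = ["Alice", "Bruno", "Carla", "Dmitri", "Elena", "Farid", "Greta", "Hugo"]
LAST_NAMES = ["Martin", "Nguyen", "Okafor", "Petrov", "Quinn", "Rossi", "Silva", "Tanaka"]


def _digest(value: str, purpose: str) -> bytes:
    """Keyed hash: the same input always yields the same 32-byte output."""
    message = f"{purpose}:{value}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).digest()


def scramble_name(name: str, pool: list[str], purpose: str) -> str:
    """Replace a real name with a readable substitute drawn from `pool`."""
    index = int.from_bytes(_digest(name, purpose)[:4], "big") % len(pool)
    return pool[index]


def scramble_ssn(ssn: str) -> str:
    """Replace every digit but keep the layout (dashes, length) intact."""
    stream = _digest(ssn, "ssn")
    out = []
    used = 0
    for ch in ssn:
        if ch.isdigit():
            # Derive a replacement digit from the next byte of the keyed hash.
            out.append(str(stream[used % len(stream)] % 10))
            used += 1
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(scramble_name("Heinlein", LAST_NAMES, "last"))
    print(scramble_ssn("123-45-6789"))
```

Because the mapping is deterministic, the same input always gets the same substitute, which keeps joins and repeated test runs consistent. A substitute can occasionally coincide with the original value, and a scheme like this is only as private as its key, so treat it as a sketch rather than a recommendation.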
There are two ways to anonymize data: one is to do it in bulk and actually convert a copy of the production database to a scrambled format, and the other is to do it in real time. ARCAD is fully aware that in regression testing you have to be careful not to change your data structures or screen fields, or the regression tests will fail. So it is not surprising that we have taken the interactive anonymization approach rather than a static one.
To be precise, DOT-Anonymizer works in real time, and it replaces the developers’ and testers’ link to the database. This is a simplified view of the way it works:
And this is a more detailed view of what is going on:
In this scenario, developers and testers make their requests for data to DOT-Anonymizer, and each request is processed in real time against the real database. The tool is not creating a jumbled-up copy of the database, but rather scrambling the data in various ways on the fly. And importantly, this scrambling is not reversible. There is no way to take the scrambled output and somehow work it back to the original data. System admins have to tell the anonymization engine at the heart of DOT-Anonymizer how they want specific kinds of data scrambled, and from that point forward, it will do the scrambling one record at a time as each record is retrieved from the database.
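The on-the-fly approach described above can be sketched as a thin wrapper that sits between the tester and the database and masks configured columns as each row is fetched. This is a toy illustration in Python using the standard `sqlite3` module, not DOT-Anonymizer's engine or API. The `AnonymizingCursor` class, the `RULES` table, and the `mask_text` function are hypothetical names invented for this example.

```python
import hashlib
import sqlite3


def mask_text(value):
    """Deterministic one-way replacement for a value (illustrative only).

    An unkeyed hash is used to keep the sketch short; a real deployment
    would need a keyed or otherwise stronger scheme.
    """
    return "X" + hashlib.sha256(str(value).encode("utf-8")).hexdigest()[:8]


# Hypothetical rule table an administrator might maintain:
# column name -> masking function.
RULES = {"last_name": mask_text, "ssn": mask_text}


class AnonymizingCursor:
    """Wraps a DB-API cursor and masks configured columns as rows are fetched."""

    def __init__(self, cursor, rules):
        self._cursor = cursor
        self._rules = rules

    def execute(self, sql, params=()):
        self._cursor.execute(sql, params)
        return self

    def __iter__(self):
        # Column names of the current result set, in select order.
        columns = [d[0] for d in self._cursor.description]
        for row in self._cursor:
            # Mask one record at a time; the stored data is never modified.
            yield tuple(
                self._rules[col](val) if col in self._rules and val is not None else val
                for col, val in zip(columns, row)
            )


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, last_name TEXT, ssn TEXT)")
    conn.execute("INSERT INTO customers VALUES (1, 'Smith', '123-45-6789')")
    cur = AnonymizingCursor(conn.cursor(), RULES)
    rows = list(cur.execute("SELECT id, last_name, ssn FROM customers"))
    # Read the table directly to show the underlying row is unchanged.
    stored = conn.execute("SELECT last_name FROM customers").fetchone()[0]
    conn.close()
    return rows, stored


if __name__ == "__main__":
    print(demo())
```

The design point is that the table itself is never rewritten: the tester sees masked values while the source row stays as it was. The sketch matches rules by result column name only, so an aliased column would slip past it. A real tool has to handle that and much more.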
DOT-Anonymizer is multi-platform, so it can run on servers with IBM i, Windows Server, or Linux operating systems, and since it is written in Java, it could be ported to other platforms as needed. It snaps into an Eclipse integrated development environment, like most of the tools at ARCAD. It requires Java V1.5 or higher on the server and the current Java Development Kit.
The tool has an API, so it can integrate with existing testing tools programmatically and integrate with existing development and test and quality assurance processes.
To run DOT-Anonymizer, the system needs a minimum of 1 GB of main memory and a minimum of a 2 GHz Pentium processor, so that is not really that much of a machine at all. Our laptops have more computing power than this, and in fact I have run DOT-Anonymizer on a Windows laptop and it can take a really heavy database query load. Obviously, it will take a heavier machine to handle a heavier database query load. For IBM i shops, it doesn’t really matter where it is installed so long as it can connect to the Db2 for i database, and that can be on a distinct machine or in a logical partition that is associated with test and development.
Right now, all of the customers using DOT-Anonymizer are located in Europe, but there are customers in the United States that are doing proofs of concept on the tool. The risk of data breaches is too great to ignore, and the penalties are too high.