Several surveys of IT professionals indicate that many of them use live data for application testing and software development. So if your organization is in the practice of using real data as part of your software testing strategy,
Dangers of using real data
As far as India is concerned, I have observed that many organizations are not aware of the dangers of using real data in a testing environment. Real data contains confidential information such as personal (or business) information, customer (or employee) records, credit card numbers, and payment transactions.
The testing and development phases are typically less secure than production, so data used there is more likely to be accessed without authorization, stolen, or mishandled (for example, lost along with a laptop or removable drive, or mishandled by third parties). A data breach is the biggest danger of using real data as part of such software testing strategies.
This being the case, why do enterprises use real data in a testing and development environment? The reason is that non-production environments have mostly been created by copying production data, because policies and procedures are absent, inadequate, or poorly followed. Also, creating data (data simulation) is complex and time-consuming, and the result may not represent all the possible permutations and eccentricities of real data. Hence, most organizations clone the real data as part of their software testing strategy, and place the test servers in a physically secure area.
Recently, there have been cases where regulatory bodies have imposed large fines on organizations for the loss of sensitive information. Such a loss can also damage the corporate brand and reputation.
Measures to secure real data
I have observed that organizations implement a few compensatory controls (some levels of database security) as part of their software testing strategy so that only certain authorized individuals can access particular tables and data. They also deploy two-factor authentication, place the test server in a separate network segment, and restrict physical entry or network access.
While these measures provide a high degree of data protection in the production environment, they are not necessarily the best approach for a non-production environment. So here are four tips to guide your software testing strategy if you use real data.
Software testing strategy 1: Create policies around real data
Organizations should create or revisit policies around test data to specify that production data should not be used in non-production environments such as testing, development, or training. At the same time, these policies should state the type of technology and the recommended data size for preparing test data, so that the test data simulates the production environment and the test results are correct. The policies should also clearly state who is responsible for creating test data or protecting data during development and testing.
Software testing strategy 2: Ensure proper access rights
Organizations try to prevent data breaches in production environments by heavily protecting physical access and by encrypting disks and data against threats from external hackers. All this effort is in vain if a breach can happen easily in internal test environments. Also, a tester, being a technically skilled user, can bypass inadequate security restrictions more easily. Hence, the enterprise needs to ensure tight security and a role-based access control mechanism for internal users and developers as part of its strategies for testing software.
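To make the idea of role-based access control concrete, here is a minimal sketch in Python. The role names, the permission names, and the `is_allowed` helper are all hypothetical and chosen only for illustration; a real deployment would normally rely on the access control features of the database or platform in use.

```python
from enum import Enum, auto


class Permission(Enum):
    READ_MASKED = auto()     # read de-identified test data
    READ_RAW = auto()        # read unmasked production copies
    LOAD_TEST_DATA = auto()  # load data into the test environment


# Hypothetical mapping of roles to permissions (an assumption, not a standard).
ROLE_PERMISSIONS = {
    "tester": {Permission.READ_MASKED},
    "developer": {Permission.READ_MASKED},
    "test_data_admin": {
        Permission.READ_MASKED,
        Permission.READ_RAW,
        Permission.LOAD_TEST_DATA,
    },
}


def is_allowed(role: str, permission: Permission) -> bool:
    """Deny by default: a role that is not listed gets no permissions."""
    return permission in ROLE_PERMISSIONS.get(role, set())
```

In this sketch a tester can read masked data but not raw data, and any role missing from the mapping is refused everything, which is usually the safer default for an internal test environment.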
Software testing strategy 3: Use appropriate techniques to protect real data
A de-identifying technique (also known as de-sensitizing, masking, or sanitizing data) takes data from a production system and converts it to non-sensitive data suitable for testing or analysis. This technique, along with a reduction technique (subsetting to right-sized data), can be used to secure real data used in development and testing. There are many ways to de-identify and mask data as part of your software testing strategy; the right one depends on the type of data used by the application or system.
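As a rough illustration of combining reduction with de-identification, the Python sketch below takes a reproducible random subset of records and applies a masking function to each sensitive field. The function name `subset_and_mask`, the record layout (a list of dictionaries), and the per-field `maskers` mapping are assumptions made for this example, not part of any particular tool.

```python
import random


def subset_and_mask(rows, maskers, sample_size, seed=0):
    """Return a reduced, de-identified copy of rows.

    rows        -- list of dicts, one per record (assumed layout)
    maskers     -- dict mapping a field name to a function that masks its value
    sample_size -- number of records to keep (capped at len(rows))
    seed        -- fixed seed so the same subset is drawn on every run
    """
    rng = random.Random(seed)
    sample = rng.sample(rows, min(sample_size, len(rows)))
    # Build new dicts so the original records are never modified.
    return [
        {
            field: maskers[field](value) if field in maskers else value
            for field, value in row.items()
        }
        for row in sample
    ]
```

Fields without an entry in `maskers` pass through unchanged, so the choice of which fields count as sensitive is made explicitly by whoever prepares the test data.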
The results of these techniques have to be appropriate in the application context, and they must make sense to developers and testers performing their tasks. That is, a field that mandates alphanumeric characters, a data range, and a number of characters should be filled with other alphanumeric characters that fall within the set data range and respect the character limit.
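One simple way to meet these constraints is character-by-character substitution, sketched below in Python. The helper names `mask_preserving_format` and `mask_within_range` are hypothetical. The sketch assumes ASCII letters and digits; any other character, such as a hyphen or space, is left in place so that the field still passes the application's format checks.

```python
import random
import string


def mask_preserving_format(value: str, rng: random.Random) -> str:
    """Replace each digit with a random digit and each letter with a random
    letter of the same case, keeping length and punctuation unchanged."""
    masked = []
    for ch in value:
        if ch in string.digits:
            masked.append(rng.choice(string.digits))
        elif ch in string.ascii_uppercase:
            masked.append(rng.choice(string.ascii_uppercase))
        elif ch in string.ascii_lowercase:
            masked.append(rng.choice(string.ascii_lowercase))
        else:
            masked.append(ch)  # separators and other symbols stay as they are
    return "".join(masked)


def mask_within_range(low: int, high: int, rng: random.Random) -> int:
    """Replace a numeric value with a random one inside the allowed range."""
    return rng.randint(low, high)
```

Because the replacement is random, an individual character can occasionally come out unchanged, and the same input will not map to the same output across tables. Where related tables must stay consistent, a deterministic scheme would be needed instead.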
Software testing strategy 4: Use data masking
Data masking generally means generating test data from a production environment. Industry and regulatory standards (such as PCI) now mandate the protection of real data. Organizations that need to meet these compliance requirements as part of their software testing strategy have two options: create test data manually or use data masking. Data masking is easier. Masking technologies are effective, scalable, and easy to use if applied properly. For example, only sensitive data must be masked, the masked data must not be reversible, and the masked data must represent real data.
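The three conditions above can be sketched in Python using a keyed hash (HMAC-SHA256 from the standard library). This is only one possible approach, not a description of any specific masking product. The field name `card_number`, the `SENSITIVE_FIELDS` set, and both function names are assumptions for illustration, and the sketch assumes the sensitive value is a string of digits.

```python
import hashlib
import hmac

# Hypothetical list of fields to mask; everything else is left untouched.
SENSITIVE_FIELDS = {"card_number"}


def mask_digits(value: str, key: bytes) -> str:
    """Map a digit string to another digit string of the same length.

    The output is derived from a keyed one-way hash, so it cannot simply be
    decoded back to the input, and the same input always gives the same output.
    """
    if not value:
        return value
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    number = int.from_bytes(digest, "big") % (10 ** len(value))
    return str(number).zfill(len(value))


def mask_record(record: dict, key: bytes) -> dict:
    """Mask only the sensitive fields of a record and return a new dict."""
    return {
        field: mask_digits(value, key) if field in SENSITIVE_FIELDS else value
        for field, value in record.items()
    }
```

The key must be kept out of the test environment: anyone who holds it could hash candidate values and compare the results, which matters for small value spaces. The masked number also keeps only the length of the original, so fields with checksum rules would need extra handling to stay realistic.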
About the author: Shalini Gupta is the senior project manager and head of quality at Paladion Testing Services. She has co-authored "Handbook of Security Testing for Banking Application", presented at the OWASP Mumbai Chapter.
(As told to Dhwani Pandya.)
This was first published in October 2010