Title: GRIDTOOLS LIMITED
1 GRID-TOOLS LIMITED
Protecting Sensitive Data Records 2009
On the seventh day God created test data... and
man asked, but, how did you do that?
Grid Tools Ltd - Test Data Management solutions
2 GT-DATAMAKER
Data Obfuscation
3The data protection issue in non production
environments
- Data protection law is designed to protect the
privacy of - Individuals. This is achieved by placing
important - restrictions on the ways in which an
organisations can use - peoples Personally Identifiable Information
(PII). - Companies spend time and money to secure systems
from - external attacks but approx 70 of data breaches
occur from - within your organisation.
- Sensitive data can be found across an
organisation.. - Testing, Training, Development, back-up and QA .
- Application testing and development often
involves the - use of significant quantities of PII, usually
relating to your - customers or staff. It is common for such
personal data to be - used in a way that breaches data protection
requirements.
PROTECT SENSITIVE DATA RECORDS
4Defining sensitive data
- This can be an important task in itself
- High level - As a guide
- Personally identifiable Information (PII)
- Sensitive Personal Data relating to
- (a ) The racial or ethnic origin of the data
subject, - (b) Political opinions,
- (c ) Religious beliefs or other beliefs of a
similar nature, - (d) Trade union affiliations,
- (e ) Physical, mental health or condition,
- ( f ) Sexual life,
- (g ) The commission or alleged commission by him
of any offence. - Where and how people, store and use this data is
just as
PROTECT SENSITIVE DATA RECORDS
5Data protection legislation
- This is a Global Issue with one common theme
- Companies struggle to understand current
compliance issues and best - practice
- EU - Data Protection Directive 95/46/EC
- UK DATA PROTECTION ACT
- Primary concern Accuracy and Security of data
- Breaching a principle of the DPA is a criminal
offence enforceable with a prison sentence - Industry specific Standards
-
- Health Insurance Portability and Accountability
Act (HIPAA) - Payment card Industry Data Security Standard (PCI
)
PROTECT SENSITIVE DATA RECORDS
6DPA Compliance
- The DPA has 8 governing principles
- The 4 most relevant for Application testing
- and development are..
- Fair and lawful processing
- Excessive data
- Security
- Foreign transfers
- Addressing these areas will ensure
- both compliance and best practice
PROTECT SENSITIVE DATA RECORDS
7DPA Compliance
- Fair and lawful processing
- There are issues over gaining consent a
virtually impossible task - In the absence of consent what are the
- legitimate expectations of your customers.
- As individuals do not expect you to use their
- details in testing it is usually considered a
- legal requirement to treat sensitive data
PROTECT SENSITIVE DATA RECORDS
8DPA Compliance
- 2. Excessive Data
- Process no more personal data than is
- necessary for the purpose of processing
- In other words.. this principle requires
- organisations to process only the minimum data
- required for the relevant objective.
- The quantity of PII data must therefore be
- Reviewed as a legal requirement.
PROTECT SENSITIVE DATA RECORDS
9DPA Compliance
- 3. Data Security
- Use appropriate precautions to ensure the
- integrity and safety of sensitive data.
- The same measures safeguarding data in
- production environments such as firewalls,
- encryption and network security are not
- appropriate for Dev Test.
- Copying production is non compliant and
- widely considered poor practice
PROTECT SENSITIVE DATA RECORDS
10DPA Compliance
- 4. Foreign transfers
- Outlaws the transfer outside of the EU
- Unless adequate levels of protection for the
rights and freedoms of individual data protection
exist within that geographical territory - You need to answer these two questions..
- Are the individuals within that territory
protected by the same DP legislation that you
are? - Can you impose the necessary security controls
over who has access to sensitive data and can you
ensure these individuals treat this data
appropriately?
PROTECT SENSITIVE DATA RECORDS
11PCI Compliance
- The PCI DSS The Payment card Industrys
- response to data beaches and security threats
- The standard for protecting cardholder data that
is stored, transmitted or processed. - Relevant for payment card processors, point of
sale vendors financial institutions - Regulations for implementing security management
policies, procedures, network architecture,
design and other critical measures to improve
electronic payments. - Compliance involves establishing strict security
policies processes and procedures for all
companies administering payment card information.
PROTECT SENSITIVE DATA RECORDS
12Calculating the cost of a data breach
- The cost of a Data breach
- The Nationwide Building Society (UK) was fined
980,000 - after an employees laptop was stolen from his
home. 11 million - records were exposed to the risk of financial
crimes. - A large fine from the governing body is just
- the tip of the iceberg?
- Other financial consequences include..
- Loss of market share,
- brand damage,
- negative impact on customer retention
- Decreased revenue
- All the associated costs to rectify the problem
e.g. managing the front line response and
increasing security in the aftermath of the
breach.
PROTECT SENSITIVE DATA RECORDS
13Examples of ICO Enforcement orders
- www.ico.gov.uk/what_we_cover/data_protection/enfor
cement.aspx - 30 September 2008
- A formal undertaking has been signed by Virgin
Media Limited, agreeing to comply with the
seventh data protection principle. This follows
the loss of an unencrypted compact disc
containing the personal data of more than 3000
Virgin Media customers. - 25 September 2008
- The Information Commissioners Office is today
serving an Enforcement Notice against the
Department of Communities and Local Government
for contravening the Data Protection Act 1998 in
relation to their response to a subject access
request received by them. - 15 July 2008
- The Information Commissioners Office (ICO) is
today serving enforcement notices against the
Ministry of Defence following recent high profile
data breaches. - 21 February 2008
- The Information Commissioner's Office has found
Skipton Financial Services in breach of the Data
Protection Act. This follows the theft of an
unencrypted laptop which contained the personal
information of 14,000 SFS customers.
PROTECT SENSITIVE DATA RECORDS
UK Information Commissioner's Office
http//www.ico.gov.uk/
14Data ObfuscationThe Data Protection Act
- You cannot identify an original customer, account
or secure entity from the masked data - Overall data trends cannot be easily identified
- You use the minimum amount of data to accomplish
your needs - You use Best Efforts?
-
15Data ObfuscationCopying production data
- Copied data is usually out of date by the time it
is used for testing, making time specific tests
irrelevant. - New functionality will not have any pertinent
data. - Multiple users will set up specific test
scenarios which will be destroyed every time
production is re-copied to testing. - Large copies of production data on less powerful
testing hardware make queries and searches run
slowly and take up lots of expensive disk. -
16Data ObfuscationOther Data
- A small development database in which users
create data by hand, this usually contains a
large amount of invalid data. - Extract a subset of production data for use in
development using tools such as GT Subset. - Using capture playback tools such as QTP,
Forecast Studio etc to populate transactions
using the online applications. - Using data generation tools such as Datamaker to
build accurate test data. -
17Where and how to scramble
- The live data lives in a development environment
for a while unscrambled. - The scripts to scramble the data tend to get
forgotten, are not kept updated and tend to be
built by a single DBA who may move on. - Scripts tend to fall outside normal programming
control and are written in SQL scripts and non
standard languages such as PERL. These scripts
may well be perfect but tend not to be
documented, not incorporated in source control
systems and are not subject to testing by the
test department. - Database structures tend to change regularly and
the scrambling functionality needs to be upgraded
with each release. After a while the scrambling
routines tend to be forgotten. -
18Know your data
- Foreign Keys. How are tables related in the
database? - Documentation. This is usually held in a variety
of formats and applications, however, they are
rarely current. - User knowledge. What is the users understanding
of how and where key data is held and displayed? - Naming standards. A surprisingly good source of
information, column names in tables can give a
strong hint to their use and relationship to
other columns. - Data versions
-
19Know your data
- Data columns being used for multiple purposes.
It is quite common for limitations in an
application to be overcome by creative use of
fields. Thus a field used for one purpose
contains data to identify data for other uses.
Examples of this type of usage are comment fields
being used to hold structured information, these
comments may contain data that is sensitive for
example a temporary address or phone number. - Invalid Data. As applications and databases
evolve and merge with other systems data may be
created that is invalid. Users usually have an
idea that this invalid data exists however have
made the decision to ignore the data problems as
there is no critical problem that would justify
the time to clean up the data. -
20Documentation and traceability
- Which columns are sensitive and need scrambling?
- Who has access to any scrambling functions, i.e.
the code that scrambles should be protected as
well. - A before and after report of what the data has
been changed from and to. You can use database
compare tools such as Datamaker for this or
generate triggers to update audit tables. - Who has access to any working schemas or files
used in the scrambling process?
21Documentation and traceability
22Scrambling Methods
- Simple independent functions to put in random
text, dates and numbers. - Multi table column values, for example, an
account number is used in lots of tables and as
an identifier in other applications. - Offset values, for example, if a date is adjusted
then other related dates must shift in line with
the original date if a post code is changed then
corresponding address lines must also shift. - Database functions Every RDBMS comes with a
vast library of built in functions many of which
can be built up to scramble data quite easily. - Toolsets Tools such as Datamaker come with many
pre built functions. - Your own code Some of the scrambling you need
will be very specific to your systems, for
example, customer numbers can be built up of
combinations of locations, dates of birth and
partial names. There will be code in your system
already that builds these numbers so use the same
function as part of your scrambling strategy. - The internet Provides a vast array of free code
snippets which can be easily used.
23Scrambling MethodsSeed Tables
24Scrambling MethodsDynamic Tables
25Scrambling Methods
- Adding a small decimal increment to transaction
values can mask individual transactions, for
example, SELECT TRANSACTION_AMOUNT
TO_CHAR(SYSDATE,'DD') / 100 will add from 0.01 to
0.31 to a number dependant on the date. - Adding a number of days to all dates. A very
simple transformation to implement, assuming all
your dates are identified as date data type.
This also has the obvious advantage of allowing
time dependant process testing to be more
accurate, an example would beSELECT ORDER_DATE
7 FROM ORDERSBear in mind end of month
processing can be affected by this. You may be
better off using a cross reference table to match
up periods, for example
26Scrambling Methods
- A simple lookup to a value in the seed table, for
exampleselect seed_value from (select
seed_value, rownum rn
from seed_data
order by seed_value)
where rn mod((in_rownum - 1),wk_count)
1Will bring back a random value from a seed
table. - A simple substation, for example SELECT
DECODE(BANK_TYPE,C,S,L,M,P) will
reassign the code values C to S, L to M otherwise
P. - Top and bottom Coding. Setting a maximum and
minimum value for a column, for exampleSELECT
least(least(holiday_days,10) -1,
least(holiday_days -1,-4)) -1 holiday_days
FROM PeopleThe above will set the minimum
holidays days to 4 and the maximum to 10. - The above techniques are sometimes know as
Swapping or multidimensional Transformations.
27Scrambling Methods
- Use a hash function using date, time and rownum
as input to create random text or number values,
for example translate(to_char(ora_hash(in_rownu
m 1,4294967295,in_rownum)),'0123456789','
ABCDEFGH') - A simple text replace for a phone number is a
perfectly simple way of cleaning data, for
exampleSELECT 212-555-2121 PHONE_NUMBER, .
FROM
28Scrambling MethodsMulti Table Cross Reference
29Scrambling MethodsMulti Table Cross Reference
A simple character by character replacement is an
effective technique, basically shift character 5
and 6 in a string identifier to one more less and
one more respectively. An example would
beSELECT substr(card_number,1,6)
translate(substr(card_number,7,1),'0123456789','1
234567890') substr(card_number,8) CARD_NUMBER
FROM CREDIT_CARDSIn this example the 7th
character is being shifted up by one. As long as
you apply the same function to all of the
occurrences of this CARD_NUMBER then the system
will retain integrity. Bear in mind that
sometimes the column may be used for other
purposes. For example the column could contain
NO CARD NO YET, the scrambling function would
then fail.
30Scrambling MethodsMulti Table Cross Reference
Hashing is a key component in multi column
replacement. It allows values to be transformed
to the same value every time dependant on a hash
key. For example, 1 would be transformed to 7, 2
to 6, 3 to 1 etc. Each value has a unique
hashed value and is repeatable based on the hash
key. An example of this would besubstr(transla
te(to_char(ora_hash(in_value,4294967295,in_parm1))
,'0123456789','1234567890'),1,9)This would
build a hashed Social Security number. Using
dynamic seed tables will build an exact
replacement value for each identifier. You need
to protect the seed table as it contains the key
to crack the code and you must also protect the
offset algorithm, as this can be used to identify
data.
31Scrambling MethodsOffset Values
Micro aggregation The values of prior rows in a
set of transactions refer to each other. For
example a TRANSACTION_BALANCE is dependant of the
TRANSACTION_AMOUNT and the TRANSACTION_BALANCE of
the prior transaction.
Application process driven aggregations Many
systems have application components that
calculate balances based on transaction
throughput. These tend to be separate processes
which can be run stand alone. For example, if
you are adjusting transaction amounts the
customer balances may not match. Running the
balance adjustment process may be required to
reset these values.
32Scrambling MethodsOffset Values
Dates of birth Adjusting this by a few days,
will alter the age and also the age bracket so a
person may move into a different insurance
premium.An Order Date The Ship Date is after
the Order Date, thus adjusting one date means the
other must move by a similar amount.
33Scrambling MethodsLibrary of Functions
34Gathering Information
35Scrambling MethodsSynthetic Data
- Create a standard data object, for example a
Customer. - Tokenize or parameterise those objects such that
you can vary them when you create them. - Inherit objects such that you can make your own
edits without affecting the original data
objects.
36Synthetic Data -Data Objects
37Synthetic Data -Data Objects
38Scrambling Architecture
Production
Obfuscated
Clean Extract
Staging Production
Obfuscated
Production
Application
Generated
Production