Title: Ethic Core August 19th, 2004
1 Ethic Core August 19th, 2004
Jan-Eric Litton, Karolinska Institutet, Stockholm, Sweden
2 Karolinska Institutet Dept. of Medical Epidemiology and Biostatistics and KI-Biobank
3 Sharing Data
ID MURA_BACSU STANDARD PRT 429 AA.
DE PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
DE (EC 2.5.1.7) (ENOYLPYRUVATE TRANSFERASE) (UDP-N-ACETYLGLUCOSAMINE
DE ENOLPYRUVYL TRANSFERASE) (EPT).
GN MURA OR MURZ.
OS BACILLUS SUBTILIS.
OC BACTERIA FIRMICUTES BACILLUS/CLOSTRIDIUM GROUP BACILLACEAE
OC BACILLUS.
KW PEPTIDOGLYCAN SYNTHESIS CELL WALL TRANSFERASE.
FT ACT_SITE 116 116 BINDS PEP (BY SIMILARITY).
FT CONFLICT 374 374 S -> A (IN REF. 3).
SQ SEQUENCE 429 AA 46016 MW 02018C5C CRC32
MEKLNIAGGD SLNGTVHISG AKNSAVALIP ATILANSEVT IEGLPEISDI ETLRDLLKEI
GGNVHFENGE MVVDPTSMIS MPLPNGKVKK LRASYYLMGA MLGRFKQAVI GLPGGCHLGP
RPIDQHIKGF EALGAEVTNE QGAIYLRAER LRGARIYLDV VSVGATINIM LAAVLAEGKT
IIENAAKEPE IIDVATLLTS MGAKIKGAGT NVIRIDGVKE LHGCKHTIIP DRIEAGTFMI
4 Do we need standards?
- There is no need for standards unless/until we start to integrate
- In other words
- I don't care about the electrical plug socket until I travel and wish to integrate my laptop into the plug socket in my hotel
5 A historical essay: The Machine Screw
- Principle discovered around 400 BC
- Limited use until machine tools made mass production possible (18th cent.)
- Every machine shop and foundry made unique sizes and thread dimensions
- 1841 Joseph Whitworth presented The Uniform System of Screw-Threads to Britain's Institute of Civil Engineers
- 1864 William Sellers proposes On a Uniform System of Screw Threads to the Franklin Institute, Philadelphia
- Enabled interchangeable parts and tooling for mechanization and mass production
- 1945 British and American standards merged
6 Point-to-point integration of data
- Application includes a subprogram for each different data source
- Operations on data must be processed by an application
- Lots of coding effort
- Fully dependent on data resources
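The point-to-point pattern can be sketched as follows. This is only an illustration: the two sources (a CSV export and a SQLite table), the column names, and the values are all hypothetical and not taken from the project. It shows one reader subprogram per source and the combining step done in application code.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a CSV export (made-up columns and values).
CSV_EXPORT = "twin_id,height_cm\nA1,172\nA2,169\n"


def read_from_csv(text):
    """Subprogram written specifically for the CSV source."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["twin_id"]: float(row["height_cm"]) for row in reader}


def make_sqlite_source():
    """Hypothetical source 2: a relational table, built in memory for the demo."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE weights (twin_id TEXT, weight_kg REAL)")
    conn.executemany("INSERT INTO weights VALUES (?, ?)",
                     [("A1", 70.0), ("A2", 65.5)])
    return conn


def read_from_sqlite(conn):
    """Subprogram written specifically for the SQLite source."""
    cursor = conn.execute("SELECT twin_id, weight_kg FROM weights")
    return {twin_id: weight for twin_id, weight in cursor}


def combine(heights, weights):
    """The join is done by the application, not by a database."""
    return {tid: (heights[tid], weights[tid])
            for tid in heights if tid in weights}


combined = combine(read_from_csv(CSV_EXPORT),
                   read_from_sqlite(make_sqlite_source()))
print(combined)
```

Every new source needs another reader function, and any change in a source's layout breaks its reader, which is where the coding effort and the dependency on the data resources come from.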
7 The load-in-database approach of integrating data
- Data are loaded in the database
- Data need filtering, cleaning, transformation
- Data must be refreshed
- Scripts must be written
- Time consuming to refresh data
- Up-to-date data cannot be guaranteed
ODBC - JDBC
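A load script of the kind listed above might look like the following minimal sketch. The raw records, field names, and the `subjects` table are hypothetical, and an in-memory SQLite database stands in for the real target database.

```python
import sqlite3

# Hypothetical raw records from an external source (illustrative fields only).
RAW_RECORDS = [
    {"id": "A1", "sex": " F ", "height": "172"},
    {"id": "A2", "sex": "m", "height": ""},   # missing height: filtered out
    {"id": "A3", "sex": "M", "height": "181"},
]


def clean(record):
    """Clean and transform one raw record; return None to filter it out."""
    if not record["height"]:
        return None
    return (record["id"], record["sex"].strip().upper(), float(record["height"]))


def refresh(conn, raw_records):
    """Rebuild the local copy from scratch; must be re-run when the source changes."""
    conn.execute("DROP TABLE IF EXISTS subjects")
    conn.execute(
        "CREATE TABLE subjects (id TEXT PRIMARY KEY, sex TEXT, height_cm REAL)")
    rows = [r for r in map(clean, raw_records) if r is not None]
    conn.executemany("INSERT INTO subjects VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
loaded = refresh(conn, RAW_RECORDS)
rows = conn.execute(
    "SELECT id, sex, height_cm FROM subjects ORDER BY id").fetchall()
print(loaded, rows)
```

The local copy is only as current as the last call to `refresh`, which is why up-to-date data cannot be guaranteed with this approach.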
8 Federated data approach
- Data stay untouched
- Integrates heterogeneous local or remote data sources through wrappers
- Just need to know what data should be available to whom and how to access them
- It makes all data look like it's one virtual database, hiding the data layer complexity
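The wrapper idea can be illustrated with a toy sketch. It is not how DB2 or Oracle implement federation; the classes `SqliteWrapper`, `CsvWrapper`, and `Federation`, and all table names and values, are hypothetical. The point is only that each source stays where it is and is reached through a wrapper with a common interface.

```python
import csv
import io
import sqlite3


class SqliteWrapper:
    """Wrapper for a relational source; the data stay in the source database."""

    def __init__(self, conn, table):
        self.conn = conn
        self.table = table  # table name comes from trusted configuration

    def rows(self):
        cursor = self.conn.execute(f"SELECT * FROM {self.table}")
        columns = [d[0] for d in cursor.description]
        return [dict(zip(columns, row)) for row in cursor]


class CsvWrapper:
    """Wrapper for a flat-file (CSV) source, parsed on each request."""

    def __init__(self, text):
        self.text = text

    def rows(self):
        return [dict(r) for r in csv.DictReader(io.StringIO(self.text))]


class Federation:
    """Presents all registered sources as tables of one virtual database."""

    def __init__(self):
        self.sources = {}

    def register(self, name, wrapper):
        self.sources[name] = wrapper

    def select(self, name, **equals):
        """Return rows of the named virtual table matching all column=value filters."""
        return [row for row in self.sources[name].rows()
                if all(str(row[k]) == str(v) for k, v in equals.items())]


# Made-up demo data for both sources.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genotypes (subject TEXT, marker TEXT, allele TEXT)")
conn.executemany("INSERT INTO genotypes VALUES (?, ?, ?)",
                 [("S1", "rs1", "AG"), ("S2", "rs1", "GG")])

fed = Federation()
fed.register("genotypes", SqliteWrapper(conn, "genotypes"))
fed.register("samples", CsvWrapper("subject,site\nS1,Stockholm\nS2,Helsinki\n"))

print(fed.select("genotypes", subject="S1"))
print(fed.select("samples", site="Helsinki"))
```

The caller uses the same `select` call for both sources and never sees whether the rows came from a database or a flat file.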
9 Federated data approach
- Federated data technology
- DiscoveryLink (DB2)
- Distributed SQL (Oracle)
- Major problems
- Remote connection
- Speed
- Security
10 Federated data approach
- Relational wrappers list
- DB2 family
- Informix (Informix Client SDK)
- Oracle (SQL*Net or Net8 client)
- MS SQL Server (ODBC driver version 3.0 or later)
- Sybase (Sybase open client)
- Teradata (Teradata CLI)
- ODBC (ODBC driver version 3.x)
- Non-relational wrappers list
- Lotus Extended Search
- Excel
- Flatfile (CSV format)
- XML (1.0 specifications)
- OLE DB
- BLAST
- Documentum (Documentum client API)
- Entrez (version 1.0)
11 Web vs. File Exchange
- File Exchange
- Files pushed - duplicated
- Multiple data management systems
- Configuration control issues
- Sporadic communication
- Web
- Data pulled as needed - when and how much
- Access via single data management source
- Continuous communication
12 Ontologies
- Controlled vocabulary means only one controlled term is used for a given concept
- Data Model
- Data structuring mechanism in which an ontology is expressed
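As a small illustration of a controlled vocabulary, the sketch below maps free-text synonyms onto a single controlled term per concept. The vocabulary itself and the function name `to_controlled_term` are made up for the example and do not come from the project.

```python
# Hypothetical vocabulary: every synonym maps to exactly one controlled term.
SYNONYMS = {
    "myocardial infarction": "myocardial infarction",
    "heart attack": "myocardial infarction",
    "mi": "myocardial infarction",
    "hypertension": "hypertension",
    "high blood pressure": "hypertension",
}


def to_controlled_term(raw):
    """Normalize case and whitespace, then look up the single controlled term."""
    key = " ".join(raw.lower().split())
    if key not in SYNONYMS:
        raise KeyError(f"no controlled term for {raw!r}")
    return SYNONYMS[key]


print(to_controlled_term("Heart  Attack"))
print(to_controlled_term("High blood pressure"))
```

Rejecting unknown terms instead of storing them as typed is a design choice of this sketch; it keeps uncontrolled spellings out of the shared data.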
13
- Database completion
- A common, secure database established in Europe for all relevant scientific information in GenomEUtwin
- First ten months
- A database structure established
- Shared approaches to
- Infrastructure
- Core facilities
- Bio-informatics
- Harmonized methods
14
- EUid number (EUIDNUM) 752000021210
- The EUid number consists of four parts
- Country code 3 digits ISO 3166
- Randomized number 7 digits
- Identification number 1 digit
- Check sum 1 digit
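The four parts can be split out of the example number as in the sketch below. It assumes the parts are concatenated in the order listed, without separators, which matches the 12 digits of the example. The check sum algorithm is not given here, so the sketch extracts the digit but does not verify it. The names `EUid` and `split_euid` are hypothetical.

```python
from typing import NamedTuple


class EUid(NamedTuple):
    country_code: str           # 3 digits, ISO 3166 numeric code
    randomized_number: str      # 7 digits
    identification_number: str  # 1 digit
    check_sum: str              # 1 digit (extracted only, not verified)


def split_euid(euid: str) -> EUid:
    """Split a 12-digit EUid number into its four parts (assumed field order)."""
    if len(euid) != 12 or not (euid.isascii() and euid.isdigit()):
        raise ValueError("an EUid number must be exactly 12 digits")
    return EUid(euid[0:3], euid[3:10], euid[10], euid[11])


parts = split_euid("752000021210")
print(parts)
```

For the example number the country code comes out as 752, the ISO 3166 numeric code for Sweden.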
18 Key requirements
- 1. Genotype and phenotype data are kept in separate databases
- 2. Phenotype data must be in full control of national centers
- No common data repository for the phenotype data
- Study units can access the data only upon agreement
- 3. Anonymous genotype data is less sensitive and can be collected into one repository
- Only EUTWINIDs can be used (no local person, sample and family identifiers)
- Limited access
- National centers can see their own data
- Study units can access the data as a whole
- 4. Secure database infrastructure
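The limited-access rule for the genotype repository (requirement 3) could be expressed roughly as below. This is a sketch under assumptions: the role names `study_unit` and `national_center`, the record layout, and the sample values are hypothetical, and a real deployment would enforce the rule inside the database rather than in application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenotypeRecord:
    eutwinid: str   # only EUTWINIDs, no local person/sample/family identifiers
    center: str     # national center that contributed the record
    genotype: str


def visible_genotypes(records, role, center=None):
    """Return the genotype records a caller is allowed to see."""
    if role == "study_unit":
        # Study units can access the repository as a whole.
        return list(records)
    if role == "national_center":
        # National centers can see only their own data.
        return [r for r in records if r.center == center]
    # Any other role gets nothing.
    return []


# Made-up demo records.
records = [
    GenotypeRecord("752000021210", "Stockholm", "AG"),
    GenotypeRecord("246000000000", "Helsinki", "GG"),
]

print(visible_genotypes(records, "national_center", center="Stockholm"))
print(len(visible_genotypes(records, "study_unit")))
```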
19 [Diagram of the distributed SQL setup. Labels: Phenotypes; Genotypes; Distributed SQL; National centers (Stockholm, Helsinki, Uppsala); Genotypes/institution-specific slice; GT; LIMS / Instrument databases; Access control; Tracking info; Samples; Samples and data; Samples and sample data]
20 GenomEUtwin NET
SQL
VIRTUAL PRIVATE NETWORK
22 What we learned, so far
- Database core and epi-core and stat-core and ethic-core must work hand in hand
- Too many involved don't know what a database is and why we are using it
- Too many are still thinking flat-file
- Federated database
- Must apply from scholar program from IBM, else
- A full-time skilled DBA at each center working close to epi- and stat-core
- Security
- VPN vs SSH
23 jan-eric.litton_at_meb.ki.se