Title: milkER a milk informatics resource
1milkER a milk informatics resource
- Stephen Edwards BSc.
- University of Edinburgh
- BioNLP meeting 6th June 2005
2Overview
- Aims of milkER
- milkER database
- Text-mining
- Potential targets
3milkER aims
- To amalgamate dispersed milk information into one
resource, allowing more focused analysis of milk
proteins in relation to dairy issues, health and
disease.
4A milk database
- Knowledge on milk affects many industries
- UniProt, GenBank excellent resources
- Marsupial genomics database (New Zealand)
- Glasgow genomics data
- Chinese database
- Polish bioactive peptide database
- Food property database (commercial)
5Milk components
- Fat, carbohydrates, proteins, minerals
- Growth factors, enzymes, enzyme inhibitors,
immunoglobulins, allergens, disease factors,
anti-bacterial proteins, opioids
- 1. Deliberate
- 2. Leakage from blood
- 3. Result of disease conditions
- 4. Engineered
- 5. Bacterial origin
6milkER database
- Database using BioSQL which allows incorporation
of UniProt, EMBL, GenBank entries
7
LOCUS       NM_173929                790 bp    mRNA    linear   MAM 27-OCT-2004
DEFINITION  Bos taurus lactoglobulin, beta (LGB), mRNA.
ACCESSION   NM_173929
VERSION     NM_173929.2  GI:31343239
KEYWORDS    .
SOURCE      Bos taurus (cow)
  ORGANISM  Bos taurus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Cetartiodactyla; Ruminantia; Pecora; Bovidae;
            Bovinae; Bos.
REFERENCE   1  (bases 1 to 790)
  AUTHORS   Jayat,D., Gaudin,J.C., Chobert,J.M., Burova,T.V., Holt,C.,
            McNae,I., Sawyer,L. and Haertle,T.
  TITLE     A recombinant C121S mutant of bovine beta-lactoglobulin is more
            susceptible to peptic digestion and to denaturation by reducing
            agents and heating
  JOURNAL   Biochemistry 43 (20), 6312-6321 (2004)
   PUBMED   15147215
  REMARK    GeneRIF: Results suggest that the stability of
            beta-lactoglobulin arising from the hydrophobic effect is reduced
            by the C121S mutation so that unfolded or partially unfolded
            states are more favored.
ORIGIN
        1 actccactcc ctgcagagct cagaagcgtg atcccggctg cagccatgaa gtgcctcctg
       61 cttgccctgg ccctcacctg tggcgcccag gccctcatcg tcacccagac catgaagggc
..
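As a rough illustration of what incorporating such an entry involves, the sketch below pulls a few header fields out of a shortened copy of the record above. The function name `parse_header` and the regular expression are assumptions made for this example, not part of milkER or BioSQL. A real loader would more likely rely on an existing GenBank parser.

```python
import re

# Shortened copy of the record shown above, laid out as a GenBank flat file.
RECORD = """\
LOCUS       NM_173929                790 bp    mRNA    linear   MAM 27-OCT-2004
DEFINITION  Bos taurus lactoglobulin, beta (LGB), mRNA.
ACCESSION   NM_173929
VERSION     NM_173929.2
SOURCE      Bos taurus (cow)
  JOURNAL   Biochemistry 43 (20), 6312-6321 (2004)
   PUBMED   15147215
"""

def parse_header(text):
    """Map each keyword (indented by at most 3 spaces) to its first value.

    Hypothetical helper: continuation lines and the sequence are ignored.
    """
    fields = {}
    for line in text.splitlines():
        match = re.match(r"\s{0,3}([A-Z]+)\s+(.*)", line)
        if match:
            fields.setdefault(match.group(1), match.group(2).strip())
    return fields

fields = parse_header(RECORD)
print(fields["ACCESSION"], fields["PUBMED"])
print(fields["DEFINITION"])
```

The PUBMED field is the piece that ties a sequence entry to the literature that text-mining works on.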
8 [Diagram]
- Information retrieval: Other Databases (EMBL, UniProt)
- Information extraction: Other Sources (e.g. published tables)
- milkER population
- milkER
- Web Query
9milkER database
- Database using BioSQL which allows incorporation
of UniProt, EMBL, GenBank entries
- Library of literature on milk
- User interface (www.milker.org.uk)
13Text-mining
- Machine reading of text
- Many techniques involved
- Tokenisation
- Stemming (Activation → Activat)
- POS tagging (Protein → noun)
- Abbreviation expansion (CN → Casein)
- Entity identification (Casein → protein)
- Dictionary
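A toy version of some of these steps, using only the examples given on the slide, might look like the sketch below. The suffix list and lookup tables are assumptions for illustration. Real systems use proper stemmers, taggers and much larger dictionaries, and POS tagging is left out here.

```python
import re

# Toy lookup tables; the entries come from the examples on the slide.
ABBREVIATIONS = {"CN": "Casein"}
ENTITY_DICTIONARY = {"casein": "protein"}
SUFFIXES = ("ion", "ing", "ed", "s")  # assumed; far cruder than a real stemmer

def tokenise(text):
    """Split text into word tokens, keeping hyphenated names such as B-LG whole."""
    return re.findall(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*", text)

def stem(token):
    """Strip the first matching suffix, e.g. Activation -> Activat."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def expand(token):
    """Replace a known abbreviation with its long form."""
    return ABBREVIATIONS.get(token, token)

def entity_type(token):
    """Look the token up in the entity dictionary (None if unknown)."""
    return ENTITY_DICTIONARY.get(token.lower())

tokens = [expand(t) for t in tokenise("Activation of CN was measured")]
print(tokens)
print([stem(t) for t in tokens])
print([(t, entity_type(t)) for t in tokens if entity_type(t)])
```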
14Increased levels of IgA antibodies to B-LG were
found and were shown to be an independent risk
marker for type 1 diabetes.
- Tokeniser / POS tagger: Increased → past participle, levels → plural noun, of → preposition
- Entity identification: IgA → antibody, B-LG → protein, Diabetes → disease
- Parser: IgA antibodies to B-LG MARKER type 1 diabetes
15Information extraction
- Rule based
- interact, bind, activate
- protein (0-5 words) verbs (0-5 words) protein
- (Blaschke and Valencia, 2002)
- Machine-learning
- Statistical methods, Hidden Markov Models
- Learn interfillers, the text lying between tagged
entities (Bunescu et al, 2003)
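The rule-based pattern, protein (0-5 words) verb (0-5 words) protein, can be sketched as a simple scan over tokens. This is only an illustration of the pattern as written on the slide, not the cited authors' implementation. The protein names `ProtA` and `ProtB` are placeholders and the verb list is an assumed toy list.

```python
PROTEINS = {"prota", "protb"}  # placeholder names, not real proteins
VERBS = {"interact", "interacts", "bind", "binds", "activate", "activates"}
MAX_GAP = 5  # at most five words allowed in each gap

def extract_interactions(sentence):
    """Return (protein, verb, protein) triples that fit the pattern."""
    tokens = [t.strip(".,;()").lower() for t in sentence.split()]
    triples = []
    for i, first in enumerate(tokens):
        if first not in PROTEINS:
            continue
        # Look for a verb with 0-5 words between it and the first protein.
        for j in range(i + 1, min(i + MAX_GAP + 2, len(tokens))):
            if tokens[j] not in VERBS:
                continue
            # Look for a second protein with 0-5 words after the verb.
            for k in range(j + 1, min(j + MAX_GAP + 2, len(tokens))):
                if tokens[k] in PROTEINS:
                    triples.append((first, tokens[j], tokens[k]))
    return triples

print(extract_interactions("ProtA binds tightly to ProtB."))
```

A rule like this is easy to read and adjust, but it misses negation and passive constructions, which is part of the motivation for the machine-learning approaches listed above.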
16Difficulties
- Synonyms
- Proteins and genes with same name
- Funny names e.g. ERK-1/2, and gene!
- Variability of natural language
- Compounded names
- Co-ordination, negatives, speeling errors
17Evaluation
- Precision (P) - how much of the output is correct
- Recall (R) - how much of what should be found is picked up
- F-measure - combines P and R
- IE systems can achieve high results, but not
enough to populate databases automatically
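As a worked example of these measures, the sketch below scores a set of extracted pairs against a hand-made gold standard. It assumes the balanced F-measure (the harmonic mean of P and R), since the slide only says that the F-measure combines the two. The pairs are made-up labels.

```python
def precision_recall_f(extracted, gold):
    """Return (precision, recall, balanced F-measure) for two collections."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Made-up labels: four true pairs, three extracted, two of them correct.
gold = {("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")}
extracted = {("A", "B"), ("A", "C"), ("A", "D")}

p, r, f = precision_recall_f(extracted, gold)
print(f"P={p:.2f} R={r:.2f} F={f:.2f}")
```

Here two of three extracted pairs are correct (P = 2/3) and two of four true pairs are found (R = 1/2), giving F = 4/7.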
18Text-mining uses
- Aim to extract interactions and diseases
- Swanson (Fish oil)
- Srinivasan (Turmeric)
19General model for discovering implicit links between topics
- Starting topic: Turmeric
- (inhibits)
- Intermediate topic: Nuclear factor-kappa B
- (involved in)
- Terminal topic: Crohn's disease
Diagram taken from Srinivasan et al, 2004
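The model can be sketched as chaining two explicit relations through a shared intermediate topic. The code below is a minimal illustration using only the example on this slide. The function name `implicit_links` and the triple layout are assumptions, and this is not the method of the cited paper.

```python
from collections import defaultdict

# (subject, relation, object) triples; these two are the slide's example.
RELATIONS = [
    ("Turmeric", "inhibits", "Nuclear factor-kappa B"),
    ("Nuclear factor-kappa B", "involved in", "Crohn's disease"),
]

def implicit_links(relations, start):
    """Find start -> intermediate -> terminal chains with no direct link."""
    outgoing = defaultdict(list)
    for subject, relation, obj in relations:
        outgoing[subject].append((relation, obj))
    directly_linked = {obj for _, obj in outgoing[start]}
    chains = []
    for rel1, middle in outgoing[start]:
        for rel2, end in outgoing[middle]:
            if end != start and end not in directly_linked:
                chains.append((start, rel1, middle, rel2, end))
    return chains

for chain in implicit_links(RELATIONS, "Turmeric"):
    print(" -> ".join(chain))
```

Each chain found this way is a candidate hypothesis to check, not an established link.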
20Targets for text mining
- Many milk relationships still require further
investigation
- Positive reasons
  - nutritional benefits
  - neonatal growth
  - antimicrobial activity
  - bioactive peptides
21Targets for text mining (cont.)
- Negative reasons
  - recent link with Alzheimer's
  - diabetes link
  - asthma
  - human reactions to cow hormones (e.g. acne, Danby 2005)
  - drug transfer to milk and effects
  - allergic reactions/intolerance
  - toxic contaminants
22milkER process
- 897 proteins, 772 DNA, 1232 RNA
- Analyze references (1465 MEDLINE refs)
- MeSH terms, GO terms etc
- POS tag
- UMLS standardisation
- Gene/protein dictionary
- Extract relations
23Milk literature
24milkER interactions
- Table of interacting proteins
- Store as queryable XML strings?
- Discover links between proteins and disease
- Create hypotheses
- Confirm experimentally
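One way the "queryable XML strings" idea could look is sketched below with Python's standard library. The element and attribute names, the placeholder proteins and the placeholder PubMed ID are all hypothetical. The slides do not define a schema.

```python
import xml.etree.ElementTree as ET

def interaction_to_xml(protein_a, protein_b, relation, pubmed_id):
    """Serialise one interaction as an XML string (hypothetical layout)."""
    element = ET.Element("interaction", relation=relation, pubmed=pubmed_id)
    ET.SubElement(element, "protein").text = protein_a
    ET.SubElement(element, "protein").text = protein_b
    return ET.tostring(element, encoding="unicode")

def proteins_in(xml_string):
    """Query a stored string for the proteins it mentions."""
    root = ET.fromstring(xml_string)
    return [protein.text for protein in root.findall("protein")]

# Placeholder values only; "00000000" is not a real PubMed ID.
xml_string = interaction_to_xml("ProtA", "ProtB", "binds", "00000000")
print(xml_string)
print(proteins_in(xml_string))
```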
25Diabetes
- Pancreas secretes hormones
- Glucagon: increases conversion of glycogen → glucose
- Insulin: increases conversion of glucose → glycogen.
Allows glucose into cells.
- Condition where the amount of glucose in the
blood is abnormally high as the body cannot use
it adequately as fuel
26Diabetes
- Affects 3-5% of industrialised populations
- Type 1 (10%)
  - Genetic and environmental factors (e.g. diet)
  - Decreased insulin production
  - Mostly develops < age 20
- Type 2 (90%)
  - Resistance of body to insulin
  - Normally develops > age 40
  - Often associated with high blood pressure, cholesterol and
arterial disease
27Milk and diabetes
28Selected quotes
- "More research is needed on all aspects of
lactation in women with diabetes."
  - Reader D. et al, Curr Diab Rep. 2004
- "The effect of high protein intakes from
different sources on glucose-insulin metabolism
needs further study"
  - Hoppe et al, European Journal of Clinical Nutrition 2005
- "American children also tend to be heavier than
those from European countries, skewing the
growth charts further."
  - The Scotsman, Sat 5 Feb 2005
- The government currently recommends that babies
should be fed breast milk alone for the first six
months; the WHO recommends two years.
29Conclusions
- Knowledge of milk vital in many areas
- milkER aims to bring disparate milk data together
- Text-mining can wade through large amounts of
data to retrieve and discover vital information
30Future work
- Relation extraction of milk literature
- Extend content of milkER to include interaction
data
- Create hypotheses for experimental work
31Acknowledgements
- Prof. Lindsay Sawyer
- Dr. Carl Holt (Hannah Research Institute, Ayr)
- Prof. Bonnie Webber (Informatics)
- Dr. Alistair Kerr and Dr. Douglas Armstrong for
technical support
32References
- Acne/milk
  - Acne and milk, the diet myth, and beyond (Danby, 2005)
- Diabetes/milk
  - Milk and diabetes (Schrezenmeir et al, 2000) REVIEW
  - The role of β-casein variants in the induction of insulin-dependent diabetes (Elliott et al, 1997)
- Text-mining
  - Natural language processing and systems biology (Cohen et al, 2004) REVIEW
  - Mining MEDLINE for implicit links between dietary substances and diseases (Srinivasan et al, 2004)
  - Learning to extract proteins and their interactions from MEDLINE abstracts (Bunescu et al, 2003)