Title: Human Genome Structure and Organization
- Bert Gold, Ph.D., F.A.C.M.G.
Genetic Variation
- Phenotype
- Expression of the genotype (modified by the environment): the structural or functional nature of an individual. Includes:
- appearance, physical features, organ structure
- biochemical and physiologic nature
- Genotype
- Genetic status: the alleles an individual carries.
Learning Objectives
- Recap and Update Public and Private Human Genome Project Status
- Provide Reminders of Necessary Background for Genetic Disease Association and Linkage Studies
Definitions
- Penetrance - The probability that an individual who is at risk for the disorder (i.e., carries the gene) develops (expresses) the condition. May be age dependent.
- Expression - The characteristics of a trait or disease that are outwardly expressed. E.g., myotonic dystrophy: myotonia, cataracts, narcolepsy, frontal balding, infertility.
- Ascertainment - The method used in gathering genetic data. Study conclusions differ depending on how affected individuals entered the study.
- Phenocopy - Individuals whose phenotype, under the influence of non-genetic agents, has come to resemble the one normally caused by a specific genotype in the absence of those agents.
- Pleiotropy - The capacity of an allele to produce more than one effect, i.e., to manifest its expression in the structure and/or function of more than one organ system or tissue.
- Recurrence Risk - The likelihood that a relative of a proband for a rare disease will have the same disease.
Penetrance and Expressivity
- Penetrance: proportion of individuals with the genotype who express the trait
- Complete: P = 1.0, or 100%
- Incomplete (reduced): P < 1.0, or < 100%
- Expressivity: severity of the phenotype
- Expressivity may vary
- Between families (interfamilial) or
- Within families (intrafamilial)
- TRY NOT TO CONFUSE VARIABLE EXPRESSIVITY WITH
INCOMPLETE PENETRANCE
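Penetrance is simply a proportion, so it can be estimated from counts of carriers. The sketch below is a minimal illustration, assuming you already know which individuals carry the at-risk genotype; the function names and the counts are hypothetical. It captures only whether carriers are affected, not how severely, which is exactly the distinction between penetrance and expressivity.

```python
def penetrance(affected_carriers: int, total_carriers: int) -> float:
    """Fraction of carriers of the at-risk genotype who express the condition."""
    if total_carriers <= 0:
        raise ValueError("need at least one carrier")
    if not 0 <= affected_carriers <= total_carriers:
        raise ValueError("affected carriers must lie between 0 and total carriers")
    return affected_carriers / total_carriers


def classify_penetrance(p: float) -> str:
    """Complete penetrance: P = 1.0 (100%); incomplete (reduced): P < 1.0."""
    return "complete" if p == 1.0 else "incomplete (reduced)"


# Hypothetical counts: 42 of 60 genotyped carriers are affected.
p = penetrance(affected_carriers=42, total_carriers=60)
print(f"P = {p:.2f}, {classify_penetrance(p)}")
```

For age-dependent penetrance, the same ratio would be computed separately within each age band.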
Chromosomes, Genes and Proteins
- Genes are on Chromosomes
- Genes may encode proteins or RNA
Non-coding RNA genes
- tRNAs (497 counted; 821 when genes and pseudogenes are counted together)
- The tRNAs found are consistent with wobble base pairing
- Codon bias is only roughly correlated with tRNA distribution
- rRNAs
- small nucleolar RNAs (snoRNAs)
- snRNAs (spliceosome constituents)
- 7SL RNA
- telomerase RNA
- Xist transcript
- Vault RNA
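Wobble pairing is why a few hundred tRNA genes (with far fewer distinct anticodons) can decode all 61 sense codons: the first anticodon base (position 34) pairs loosely with the third codon base. The sketch below encodes Crick's wobble rules as a lookup table; it is an illustration that ignores base modifications other than inosine (I), and the function name is our own.

```python
# Crick's wobble rules: base at anticodon position 34 (its 5' end) ->
# set of bases it can pair with at codon position 3.
WOBBLE = {
    "G": {"C", "U"},
    "C": {"G"},
    "A": {"U"},
    "U": {"A", "G"},
    "I": {"A", "C", "U"},  # inosine
}
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codons_read(anticodon: str) -> set[str]:
    """Codons (5'->3') decoded by an anticodon written 5'->3' (positions 34, 35, 36)."""
    pos34, pos35, pos36 = anticodon
    codon_1 = COMPLEMENT[pos36]  # strict Watson-Crick pairing with codon position 1
    codon_2 = COMPLEMENT[pos35]  # strict Watson-Crick pairing with codon position 2
    return {codon_1 + codon_2 + third for third in WOBBLE[pos34]}


print(sorted(codons_read("IGC")))  # an alanine anticodon with inosine at the wobble position
print(sorted(codons_read("GAA")))  # a phenylalanine anticodon
```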
tRNAs
Some chromosomes are richer in genes than others
(Figure: Number of Nucleotides in Exons)
HOXA, HOXB, HOXC and HOXD are in regions with a particularly low density of repeats. This is believed to result from the presence of cis-acting regulatory elements in these regions.
Proteins demonstrate patterns and similarity of function
Functionally and Structurally similar proteins are organized into families
e.g., E.C. (Enzyme Commission), SWISS-PROT, TrEMBL
In silico approaches to characterize genes include:
- Pfam, searchable via HMMER
- Other in silico collections include
- PRINTS
- PROSITE
- SMART
- BLOCKS
- Creation of an Integrated Protein Index (IPI)
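Of the collections listed, PROSITE is the easiest to illustrate, because many of its entries are short sequence patterns that map onto regular expressions (Pfam and HMMER instead use profile hidden Markov models, which this does not attempt). The sketch below is a partial translator for the basic PROSITE pattern syntax; the zinc-finger-like pattern and the protein sequence are illustrative assumptions, not a verbatim database entry or real protein.

```python
import re

_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """
    Translate a simple PROSITE-style pattern to a Python regex.
    Handles: x (any residue), [..] (allowed), {..} (forbidden), (n) / (n,m) repeats,
    and < / > terminal anchors. Does not handle '>' inside brackets.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        start_anchor = element.startswith("<")
        end_anchor = element.endswith(">")
        element = element.strip("<>")
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            rx = "."
        elif core.startswith("{"):
            rx = "[^" + core[1:-1] + "]"
        else:
            rx = core  # a single residue or an [..] class reads the same in regex syntax
        if low is not None:
            rx += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if start_anchor else "") + rx + ("$" if end_anchor else ""))
    return "".join(parts)


# C2H2 zinc-finger-like pattern, used here only as an illustration
pattern = "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H"
regex = prosite_to_regex(pattern)
seq = "MKCAACGGGLSSSSSSSSHTTTHQQ"  # hypothetical sequence
hit = re.search(regex, seq)
print(regex, hit.start() if hit else None)
```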
How many genes are there?
- Estimates from the Public Program
- RefSeq
- Exons
- Introns
- Average Sizes
- Coding Sequences (CDS)
- Alternative splice products (about 3)
- Creation of an Integrated Gene Index (IGI)
- Genscan to Ensembl to Pfam via GeneWise (31,778)
- Could be as low as 24,500 using overprediction
corrections.
Estimates from Celera: 25,086 in Assembly 3
Pre-existing estimates
- W. Gilbert's back-of-the-envelope calculation
- Reassociation Kinetics
- Estimates from Double Twist using Promoter Inspector
- Unpublished estimates from Human Genome Sciences
Size of Genes
- Largest: Dystrophin, 2.4 Mb
- Titin
- 80,780 bp coding
- 178 exons
- largest single exon 17,106 bp
GENE HOMOLOGS, ORTHOLOGS, PARALOGS
- Vacuolar sorting machinery in yeast
- ABC gene superfamily
- Ig gene superfamily
- FGF superfamily
- Intermediate filament superfamily
- PROTEIN FAMILY EXPANSION APPEARS TO BE A PRIMARY
EVOLUTIONARY MECHANISM
The proteome
- Functional categories
- PRINTS
- Prosite
- Pfam
- InterPro (http://www.ebi.ac.uk/interpro/)
GENE ONTOLOGY
- Standard Vocabulary
- Hierarchy of terms (Directed ACYCLIC Graph)
- Ashburner et al., Nature Genetics 25:25-29 (2000)
- Bushy model
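A directed acyclic graph differs from a tree in that a term may have several parents. GO's "true path rule" means a gene annotated to a term is implicitly annotated to all of that term's ancestors. The sketch below shows both ideas on a toy hierarchy; the term names are hypothetical placeholders, not real GO terms or IDs, and the acyclicity check uses Kahn's topological-sort algorithm.

```python
from collections import deque

# Toy "is_a" hierarchy (child -> parents). Placeholder names, not real GO content.
PARENTS: dict[str, list[str]] = {
    "process": [],
    "metabolism": ["process"],
    "transport": ["process"],
    "sugar metabolism": ["metabolism"],
    "sugar transport": ["transport"],
    "sugar uptake": ["sugar metabolism", "sugar transport"],  # two parents: a DAG, not a tree
}


def ancestors(term: str, parents: dict[str, list[str]]) -> set[str]:
    """All terms reachable by following parent links upward (breadth-first)."""
    seen: set[str] = set()
    queue = deque(parents[term])
    while queue:
        t = queue.popleft()
        if t not in seen:
            seen.add(t)
            queue.extend(parents[t])
    return seen


def is_acyclic(parents: dict[str, list[str]]) -> bool:
    """Kahn's algorithm: repeatedly remove terms that no remaining term lists as a parent."""
    children_left = {t: 0 for t in parents}
    for ps in parents.values():
        for p in ps:
            children_left[p] += 1
    queue = deque(t for t, n in children_left.items() if n == 0)
    removed = 0
    while queue:
        t = queue.popleft()
        removed += 1
        for p in parents[t]:
            children_left[p] -= 1
            if children_left[p] == 0:
                queue.append(p)
    return removed == len(parents)


print(sorted(ancestors("sugar uptake", PARENTS)))
print(is_acyclic(PARENTS))
```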
Horizontal Transfer controversy
- One of the major conclusions of the public genome effort, published in Nature on Feb. 15, 2001, was:
- Hundreds of human genes appear likely to have resulted from horizontal transfer from bacteria at some point in the vertebrate lineage. Dozens of genes appear to have been derived from transposable elements.
- This has now been widely disputed and is believed to result from:
- Microbial contaminants in the sequence
- Bacterial gene integration into pre-vertebrates
- And:
- The more probable explanation for the existence of genes shared by humans and prokaryotes, but missing in nonvertebrates, is a combination of evolutionary rate variation, the small sample of nonvertebrate genomes, and gene loss in the nonvertebrate lineages. (Salzberg et al., Science)
Splice Pattern: 98% GT-AG
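The "GT-AG" figure refers to the first and last two bases of each intron (the donor and acceptor dinucleotides). The sketch below tallies introns by those terminal dinucleotides. The two minor classes it also recognizes (GC-AG and AT-AC) are our addition to the slide, and the toy intron sequences are hypothetical.

```python
from collections import Counter


def classify_intron(intron: str) -> str:
    """Label an intron (sense strand, 5'->3') by its first and last dinucleotides."""
    seq = intron.upper()
    if len(seq) < 4:
        raise ValueError("intron too short to classify")
    ends = (seq[:2], seq[-2:])
    if ends == ("GT", "AG"):
        return "GT-AG"
    if ends == ("GC", "AG"):
        return "GC-AG"
    if ends == ("AT", "AC"):
        return "AT-AC"
    return "non-canonical"


# Hypothetical toy introns (real introns are far longer)
introns = ["GTAAGTCTTTTCAG", "gtgagtttcctag", "GCAAGTTTTCAG", "ATATCCTTAAC", "CCAAGGTT"]
tally = Counter(classify_intron(i) for i in introns)
print(tally)
```

On real annotation, the GT-AG share of the tally is the number the slide summarizes.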
Chromatin Structure
- Euchromatin
- Heterochromatin
- Nucleosomes
Chromosome Facts
- Chromosomes replicate during S phase
- Chromosomes recombine during pachytene
- Recombination is an obligate activity
- Sex chromosomes recombine with each other (within the pseudoautosomal regions)
Cytogenetics is done by Karyotyping
- Chromosomes are chemically frozen in metaphase
- Must be carried out on dividing cells
- Microfilament inhibitors
- Microtubule inhibitors
- Membrane lysis
- Pronase, trypsin digest
- Giemsa stain
- G-bands correspond to regions of relatively low GC content
- http://genome.ucsc.edu/goldenPath/mapPlots/
- http://genome.ucsc.edu/goldenPath/hgTracks.html
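Because G-bands track relatively low GC content, a sliding-window GC profile is a rough sequence-level proxy for banding. The sketch below computes one; the window size, step, and threshold are assumptions you would tune (bands are megabase-scale features, so real windows would be hundreds of kb or more), and it is not a substitute for actual cytogenetic banding.

```python
def gc_windows(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """GC fraction in sliding windows; ambiguous bases (e.g. N) are left out of the denominator."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    seq = seq.upper()
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        acgt = sum(1 for base in chunk if base in "ACGT")
        if acgt == 0:
            continue  # window is all ambiguous bases; no estimate
        gc = sum(1 for base in chunk if base in "GC")
        results.append((start, gc / acgt))
    return results


def low_gc_windows(seq: str, window: int, step: int, threshold: float) -> list[int]:
    """Start positions of windows whose GC fraction falls below a chosen threshold."""
    return [start for start, frac in gc_windows(seq, window, step) if frac < threshold]


toy = "ATATATATAT" + "GCGCGCGCGC" + "NNNNNNNNNN"
print(gc_windows(toy, window=10, step=10))                     # [(0, 0.0), (10, 1.0)]
print(low_gc_windows(toy, window=10, step=10, threshold=0.4))  # [0]
```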
Cell Division: Meiosis
- Segregation
- Defined: alleles are paired; gametes receive one of each.
- Exceptions: trisomy and uniparental disomy
- Independent Assortment
- Gene pairs segregate independently
- Exception: linkage
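Segregation and independent assortment together predict gamete frequencies for unlinked loci: each gamete receives one allele per locus, and the choices at different loci are independent. The sketch below enumerates those gametes exactly. The function name is hypothetical, and it deliberately models only unlinked loci, so linkage (the listed exception) is outside its scope.

```python
from fractions import Fraction
from itertools import product


def gametes(genotype: list[str]) -> dict[str, Fraction]:
    """
    Gamete frequencies for unlinked loci.
    genotype: one two-letter string per locus, e.g. ["Aa", "Bb"].
    Segregation: each gamete gets one allele per locus.
    Independent assortment: alleles at different loci combine independently.
    """
    combos = ["".join(alleles) for alleles in product(*genotype)]
    freqs: dict[str, Fraction] = {}
    for g in combos:
        freqs[g] = freqs.get(g, Fraction(0)) + Fraction(1, len(combos))
    return freqs


print(gametes(["Aa", "Bb"]))  # four gamete types, each with frequency 1/4
```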
Meiosis Creates Gametes
- And provides a basis for genetic recombination!
Genetic Recombination
- Crossing Over
- Resolution
- Recombinant Chromosomes
- OBLIGATE ACTIVITY
- FEMALE RECOMB. RATES HIGHER THAN MALE
- INCREASED RATES AT TELOMERES
- PARADOX: SHORT ARMS SHOW MORE THAN LONG ARMS
- 1 cM is about 1 Mb on long arms, but short arms average 2 cM per Mb, and the Yp-Xp pseudoautosomal region about 20 cM per Mb.
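The rates above convert physical distance into genetic distance. The sketch below applies the slide's average rates and then, as a swapped-in standard model not taken from the slides, Haldane's map function (which assumes no crossover interference) to turn centimorgans into a recombination fraction. Real rates vary locally and between the sexes, so treat the outputs as rough.

```python
import math

# Approximate average rates quoted on the slide, in cM per Mb.
RATE_CM_PER_MB = {
    "long arm": 1.0,
    "short arm": 2.0,
    "Yp-Xp pseudoautosomal": 20.0,
}


def genetic_distance_cm(physical_mb: float, region: str) -> float:
    """Convert a physical distance (Mb) to an approximate genetic distance (cM)."""
    return physical_mb * RATE_CM_PER_MB[region]


def haldane_r(distance_cm: float) -> float:
    """Haldane map function: r = 0.5 * (1 - exp(-2d)), with d in Morgans."""
    d = distance_cm / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * d))


for region in RATE_CM_PER_MB:
    cm = genetic_distance_cm(2.5, region)
    print(f"2.5 Mb on {region}: ~{cm:g} cM, recombination fraction ~{haldane_r(cm):.3f}")
```

The recombination fraction saturates at 0.5 for long distances, which is why a large cM value does not translate into r > 0.5.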
INCREASED RATES AT TELOMERES
PARADOX: SHORT ARMS SHOW MORE THAN LONG ARMS
Genes
- Units of heredity
- Encode proteins (and some RNAs)
- Human genetics is the study of gene variation in humans
- "Gene" is used ambiguously to refer both to the locus and to the allele; i.e., there is only one locus but two alleles in a given individual.
- Sequencing in both genome projects was carried out on DNA containing multiple alleles; this has led to some assembly confusion.
- Ultimately, we want a haploid genome map.
The Human Genome Project
- International public effort commencing in 1990 to sequence the entire human genome by 2005
- STS approach chosen in 1991
- Private effort launched in 1998 by Celera using shotgun cloning
BAC clones, sequenced into BAC end reads, and assembled into contigs
Markerless contigs in the Celera assembly are called Scaffolds
Markers are BAC ends in the shotgun
Mate pair reads provided the core of Celera sequence
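Mate pairs are two reads from the two ends of one clone, so when they land in different contigs they imply those contigs are neighbors. The sketch below groups contigs into scaffolds with a union-find structure. It is a toy, not Celera's assembler: it ignores orientation, order and gap sizes, and the requirement of at least two supporting mate pairs (`min_links=2`) is our assumption. All names are hypothetical.

```python
from collections import Counter


def scaffolds(contigs: list[str], mate_pairs: list[tuple[str, str]],
              min_links: int = 2) -> list[list[str]]:
    """
    Group contigs joined by at least `min_links` mate pairs (union-find).
    mate_pairs: (contig of read 1, contig of read 2) for each pair.
    """
    parent = {c: c for c in contigs}

    def find(c: str) -> str:
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving keeps trees shallow
            c = parent[c]
        return c

    # Count links per unordered contig pair; pairs within one contig carry no linking information
    links = Counter(tuple(sorted(pair)) for pair in mate_pairs if pair[0] != pair[1])
    for (a, b), n in links.items():
        if n >= min_links:
            parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for c in contigs:
        groups.setdefault(find(c), []).append(c)
    return sorted(sorted(g) for g in groups.values())


pairs = [("c1", "c2"), ("c2", "c1"), ("c3", "c4")]
print(scaffolds(["c1", "c2", "c3", "c4"], pairs))  # c3-c4 has only one supporting link
```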
Draft human genome sequences were complete by February 2001
- Published simultaneously in Feb. 2001
- Public Sequence in NATURE (409:745-964)
- Celera Sequence in SCIENCE (291:1145-1434)
More than 50% of the sequence is repetitive
45% of the human genome is derived from transposable elements
- Long Interspersed Elements: LINEs (21% of genome)
- LINE1: some still active, autonomous; consists of two ORFs (one is pol-like)
- LINE2
- LINE3
- Short Interspersed Elements: SINEs (13% of genome)
- ALU: some still active; use L1 enzymes to replicate
- MIR
- Ther2/MIR3
- LTR Retroposons
- Consist of gag and pol
- Protease, reverse transcriptase (RT), RNase H, integrase all encoded
- Reverse transcription occurs in the cytoplasm, using a tRNA to prime replication
- DNA Transposons
98.5% of sequence is non-coding.
- Approximately 1/3 of the human genome is
transcribed (public guess).
Allelism
- Alternate forms of a gene
- e.g., Sickle Cell, CFTR
- Recessive disease
- e.g., Achondroplasia, Tuberous Sclerosis
- Dominant Disease
Heterozygote or Homozygote
- Genotype 1,2 (heterozygote) or 1,1 (homozygote)
- Refers to the homogeneity of alleles at a locus
Genetic Markers
- RFLPs
- VNTRs (STRs)
- Microsatellites
- STSs
- SNPs
- Tools used to find disease genes
- Flags with locations throughout the genome
Polymorphism Information Content versus Heterozygosity (PIC vs. het)
- Determining heterozygosity from SNP rare allele frequency
- Information Content in SNPs versus STRs
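Heterozygosity and PIC both follow from allele frequencies: H = 1 - Σ p_i², and PIC (Botstein et al., 1980) subtracts a further Σ_{i<j} 2 p_i² p_j² for matings that are uninformative. For a biallelic SNP with rare allele frequency q, H reduces to 2q(1 - q) and can never exceed 0.5, which is why a multi-allelic STR is individually more informative. The sketch below compares them; the allele frequencies are hypothetical.

```python
from itertools import combinations


def _check(freqs: list[float]) -> None:
    if any(p < 0 for p in freqs) or abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must be non-negative and sum to 1")


def heterozygosity(freqs: list[float]) -> float:
    """Expected heterozygosity: H = 1 - sum(p_i^2)."""
    _check(freqs)
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs: list[float]) -> float:
    """PIC = H - sum over i<j of 2 * p_i^2 * p_j^2."""
    return heterozygosity(freqs) - sum(2 * (p * q) ** 2 for p, q in combinations(freqs, 2))


for label, freqs in [("SNP, rare allele 0.1", [0.9, 0.1]),
                     ("SNP, rare allele 0.5", [0.5, 0.5]),
                     ("STR, 4 equal alleles", [0.25] * 4)]:
    print(f"{label}: H = {heterozygosity(freqs):.3f}, PIC = {pic(freqs):.3f}")
```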
Typology of SNPs
- Type I- Coding, non-synonymous, non-conservative
- Type II- Coding, non-synonymous, conservative
- Type III- Coding, synonymous
- Type IV- Non-coding, 5' UTR
- Type V- Non-coding, 3' UTR
- Type VI- Other non-coding
- Type I and Type II SNPs have lower heterozygosity than other SNPs, presumably as a result of selective pressure.
- About 25% of type I and type II SNPs have minor allele frequencies > 15%
- About 60% have minor allele frequencies < 5%
Mutation
- Occurs more often during male meiosis
- Occurs more often in long genes
- More easily detected in Dominant Diseases
- Achondroplasia
- Duchenne Muscular Dystrophy
- May often involve CpG mutating to TpG
Autosomal Recessive Inheritance
- Two mutant copies of the gene are required to be affected
- Carriers have one copy of the mutation and are unaffected
- On average, 25% of offspring of two carriers will be affected
- Males and females affected in equal number
- e.g., Sickle Cell, beta-thalassemia, CF
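The 25% figure comes from a Punnett square: each carrier passes on either allele with probability 1/2. The sketch below enumerates one autosomal locus exactly with fractions. Function names and allele letters are hypothetical; by our convention, the letter case marks dominance.

```python
from fractions import Fraction
from itertools import product
from typing import Callable


def offspring_genotypes(parent1: str, parent2: str) -> dict[str, Fraction]:
    """One autosomal locus: each parent passes on one of two alleles with equal probability."""
    if len(parent1) != 2 or len(parent2) != 2:
        raise ValueError("each parent needs exactly two alleles")
    freqs: dict[str, Fraction] = {}
    for a, b in product(parent1, parent2):
        genotype = "".join(sorted(a + b))  # "aA" and "Aa" are the same genotype
        freqs[genotype] = freqs.get(genotype, Fraction(0)) + Fraction(1, 4)
    return freqs


def risk(freqs: dict[str, Fraction], is_affected: Callable[[str], bool]) -> Fraction:
    """Total probability of the genotypes that meet the affection rule."""
    return sum((p for g, p in freqs.items() if is_affected(g)), Fraction(0))


carriers = offspring_genotypes("Aa", "Aa")      # two carriers of recessive allele "a"
print(risk(carriers, lambda g: g == "aa"))      # affected
print(risk(carriers, lambda g: g == "Aa"))      # carriers
```

The same functions give the 50% risk for a dominant allele: with "D" as the disease allele, `risk(offspring_genotypes("Dd", "dd"), lambda g: "D" in g)` is 1/2.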
X-Linked Recessive (Sex-Linked)
- Females rarely affected
- No male-to-male transmission
- Affected males transmit the gene to all daughters
- e.g., Duchenne Muscular Dystrophy, Hemophilia A
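The X-linked rules fall out of how the X is passed on: a daughter receives one maternal X plus the father's X, while a son receives one maternal X plus the father's Y. The sketch below computes offspring risks from that alone. The allele labels ("N" normal, "m" mutant) and the function name are hypothetical, and the `dominant` flag covers the X-linked dominant case as well.

```python
from fractions import Fraction


def x_linked_offspring(mother: str, father: str, dominant: bool = False) -> dict[str, Fraction]:
    """
    mother: her two X alleles, e.g. "Nm"; father: his single X allele, "N" or "m".
    "m" marks the mutant allele, "N" the normal one.
    """
    def affected(x_alleles: str) -> bool:
        mutant = x_alleles.count("m")
        if len(x_alleles) == 1:  # son: hemizygous, one X
            return mutant == 1
        return mutant >= 1 if dominant else mutant == 2

    daughters = [x + father for x in mother]  # maternal X + paternal X
    sons = list(mother)                       # maternal X + paternal Y
    return {
        "daughters affected": Fraction(sum(affected(d) for d in daughters), len(daughters)),
        "daughters carrying m": Fraction(sum("m" in d for d in daughters), len(daughters)),
        "sons affected": Fraction(sum(affected(s) for s in sons), len(sons)),
    }


print(x_linked_offspring("NN", "m"))                 # affected father, recessive
print(x_linked_offspring("Nm", "N"))                 # carrier mother, recessive
print(x_linked_offspring("NN", "m", dominant=True))  # affected father, dominant
```

With an affected father, every daughter inherits the allele and no son does, which is the "no male-to-male transmission" rule in both the recessive and dominant cases.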
Autosomal Dominant Inheritance
- Each child of an affected heterozygote is at 50% risk
- Does not skip generations
- Often lethal in double dose (homozygous)
- Large genetic load
X-linked Dominant Pedigree
- Example: Hypophosphatemic, Vitamin D-Resistant Rickets
- Distinguished from Autosomal Dominant by
- No male-to-male transmission
- All daughters of affected fathers are affected
IMPORTANT NOTE
- Dominant and Recessive refer to the phenotypic
expression of alleles, NOT to intrinsic
characteristics of gene loci.
Inheritance Pattern Complexities
- Pseudodominant Transmission of a Recessive
- Pseudorecessive Transmission of a Dominant
- Misassigned paternity, causal heterogeneity, incomplete penetrance, germline mosaicism
- Mosaicism
- Mitochondrial Inheritance
- Penetrance and Expressivity
- Semi-dominant, gender-influenced, age-related, transmission-related, imprinting
- Uniparental Disomy (UPD)
- Environmental effects, phenocopies
Preview of linkage analysis
- Characterizing Human Genetics
- Long generation time
- Inability to control matings
- Inability to control study population
- Inability to control exposures to environmental conditions
- It is possible to define phenotypes well!
- Can study genetic structures through family history
- Link phenotypes and genetic structures through statistical methods
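The standard statistic for linking phenotypes to marker loci in families is the LOD score: the log10 ratio of the likelihood of the data at recombination fraction θ to the likelihood under no linkage (θ = 0.5). The sketch below is a minimal version that assumes phase-known, fully informative meioses, a simplification that real pedigree analysis does not get to make. The counts are hypothetical, and a LOD of 3 or more is the conventional threshold for declaring linkage.

```python
import math


def _log10_term(count: int, p: float) -> float:
    """count * log10(p), treating 0 * log10(0) as 0."""
    if count == 0:
        return 0.0
    if p == 0.0:
        return -math.inf
    return count * math.log10(p)


def lod(theta: float, recombinants: int, nonrecombinants: int) -> float:
    """LOD(theta) = log10[ theta^R (1 - theta)^NR / 0.5^(R + NR) ] for phase-known meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    n = recombinants + nonrecombinants
    return (_log10_term(recombinants, theta)
            + _log10_term(nonrecombinants, 1.0 - theta)
            - n * math.log10(0.5))


# Hypothetical data: 2 recombinants among 20 scored meioses
grid = [i / 100 for i in range(51)]
best = max(grid, key=lambda t: lod(t, 2, 18))
print(f"max LOD = {lod(best, 2, 18):.2f} at theta = {best}")
```

The maximum falls at θ = R / (R + NR), the observed recombination fraction.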