ANALYSIS AND VISUALIZATION OF SINGLE COPY ORTHOLOGS IN ARABIDOPSIS, LETTUCE, SUNFLOWER AND OTHER PLANT SPECIES
Alexander Kozik and Richard W. Michelmore
University of California, Davis, Dept. of Vegetable Crops, Davis, CA 95616, USA

Approximately 3,700 of the genes in the Arabidopsis Col-0 genome are single copy. These genes were used to identify conserved orthologs in several other plant species. Using computational approaches, we identified 1104 lettuce, 686 sunflower, 1704 tomato, 2016 soybean, 1701 maize and 1290 rice ESTs that are conserved orthologs of these Arabidopsis genes. Each EST sequence in these sets has an unambiguous single strong BLAST hit to the Arabidopsis genome. Reciprocal BLAST searches (Arabidopsis single copy genes versus EST assemblies) showed that more than 80% of the searches with BLAST hits returned only a single strong hit. This indicates that the majority of these conserved orthologs are represented by single genes in multiple plant species. The total number of Arabidopsis genes with similarity (BLAST E-value of 1e-20 or better) to at least one of the selected ESTs is 2205, which is about 60% of the single copy genes in Arabidopsis. Only 248 sequences were common to the EST collections of all the species and the Arabidopsis single copy genes. This can be partially explained by incomplete representation within each EST collection. Analysis and visualization of the single copy genes along the Arabidopsis chromosomes (http://cgpdb.ucdavis.edu/COS_Arabidopsis/arabidopsis_single_copy_genes_2003.html) revealed that these genes are distributed throughout the genome regardless of large-scale chromosomal duplications. This indicates that deduction of the order of genes in common ancestors is required for informative analyses of synteny.

SINGLE COPY ORTHOLOGS SUMMARY
source                                  number of single copy orthologs
lettuce                                 1104
sunflower                               686
tomato                                  1704
soybean                                 2016
maize                                   1701
rice                                    1290
common between all                      248
common between lettuce and sunflower    431
Arabidopsis (total)                     2205 (out of 3,714 single copy genes)

PIPELINE TO IDENTIFY SINGLE COPY ORTHOLOGS
Input data sets:
Arabidopsis predicted proteins (27,169 seqs)
lettuce ESTs (68,197 seqs)
sunflower ESTs (67,180 seqs)
tomato ESTs (113,932 seqs)
soybean ESTs (341,564 seqs)
maize ESTs (362,510 seqs)
rice ESTs (107,329 seqs)
Step 1: BLAST search of Arabidopsis proteins against themselves and selection of Arabidopsis single copy genes. Result: Arabidopsis single copy genes (3,714 seqs)
Step 2: BLAST search of Arabidopsis single copy genes versus full sets of ESTs; selection of ESTs with BLAST hits to the Arabidopsis single copy subset
Step 3: BLAST search of selected ESTs versus all Arabidopsis predicted proteins and selection of ESTs with a single strong hit to the Arabidopsis genome (Exp cutoff 1e-20)
Raw data and a detailed description of the sequence extraction pipeline are available at http://cgpdb.ucdavis.edu/COS_Arabidopsis/

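
The selection of ESTs with a single strong hit (step 3) can be illustrated with a minimal Python sketch. It is not the authors' script. It assumes the BLAST report has already been parsed into (query, subject, E-value) tuples, and it assumes "single strong hit" means exactly one subject at or below the 1e-20 cutoff; the poster does not spell out the exact rule. The function name and the example identifiers are hypothetical.

```python
from collections import defaultdict

E_VALUE_CUTOFF = 1e-20  # "Exp cutoff" quoted in the pipeline description


def single_strong_hit_queries(rows, cutoff=E_VALUE_CUTOFF):
    """Return {query: subject} for queries with exactly one strong subject.

    `rows` is an iterable of (query_id, subject_id, e_value) tuples,
    e.g. parsed from a tab-delimited BLAST report (assumed layout).
    """
    strong = defaultdict(set)
    for query_id, subject_id, e_value in rows:
        if e_value <= cutoff:
            strong[query_id].add(subject_id)
    # Keep only queries whose strong hits all point to one subject.
    return {q: next(iter(s)) for q, s in strong.items() if len(s) == 1}


# Illustrative, made-up hits (not data from the poster).
hits = [
    ("EST_A", "At1g01010", 1e-45),  # one strong hit ...
    ("EST_A", "At3g50000", 2e-5),   # ... plus a weak one: kept
    ("EST_B", "At2g20000", 1e-30),  # two strong hits: rejected
    ("EST_B", "At4g10000", 1e-25),
    ("EST_C", "At5g30000", 1e-3),   # no strong hit: rejected
]
selected = single_strong_hit_queries(hits)
print(selected)
```

The same filter, run on a self-BLAST of the Arabidopsis proteins, would be one way to approximate the single copy selection of step 1, although the criteria actually used there may differ.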
PIPELINE TO EXTRACT ALIGNMENTS AT NUCLEOTIDE LEVEL
Step 1: GenBank Parser. Input: GenBank files of Arabidopsis genome (DNA sequences of entire chromosomes and corresponding annotation). Output: spliced DNA sequences corresponding to ORFs
Step 2: translation. Output: translated (protein) sequences (subject)
Step 3: BLASTX search of ESTs vs proteins. Query: ESTs (unigene) set
Step 4: BLAST parser (Tcl/Tk script). Output: tab-delimited file with info about BLAST alignments (start points and end points for each sequence in BLAST report)
Step 5 (final step of the pipeline): SeqsExtractorFromBlastX (Python script). Extraction of DNA sequences corresponding to BLAST alignments from spliced DNA (subject) and EST (query) files. Script automatically counts codon usage. Output: spreadsheet with info about codon usage
http://cgpdb.ucdavis.edu/COS_Arabidopsis/Codon_Usage_Pipeline.html

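
The codon usage count in the final step can be illustrated with a short Python sketch. This is not SeqsExtractorFromBlastX itself. It assumes the extracted DNA fragment is already in frame, and the function names and the example sequence are hypothetical.

```python
from collections import Counter


def count_codons(cds):
    """Count in-frame codons; a trailing partial codon is ignored."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def usage_rows(counts):
    """Return (codon, count, fraction) rows sorted by codon."""
    total = sum(counts.values())
    return [(codon, n, round(n / total, 3))
            for codon, n in sorted(counts.items())]


# Made-up in-frame fragment: atg gct gct aaa gct tga
counts = count_codons("atggctgctaaagcttga")
for codon, n, fraction in usage_rows(counts):
    # Tab-delimited rows open directly in a spreadsheet.
    print(f"{codon}\t{n}\t{fraction}")
```

Writing tab-delimited rows keeps the output loadable as a spreadsheet, which matches the kind of output the pipeline describes.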
MULTIPLE ALIGNMENT VISUALIZED WITH TkLife (http://www.atgc.org/TkLife/)
Graphical representation of BLAST search of
lettuce, sunflower, tomato, soybean, maize and
rice ESTs against Arabidopsis genome. The picture
displays potential conserved orthologs (single
copy genes in Arabidopsis). Each box (element) is
a single copy Arabidopsis gene having homology to
selected sets of plant ESTs. Genes are plotted
along five Arabidopsis chromosomes according to
their physical positions.
Legend for the multiple alignment:
codon mismatch and amino acid mismatch (non-synonymous substitutions)
codon match (and amino acid match)
codon mismatch and amino acid match (synonymous substitutions)
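
The three alignment categories can be made concrete with a small Python sketch. It assumes two gap-free, in-frame nucleotide sequences of equal length, unambiguous bases (A, C, G, T) and the standard genetic code. The function names and example sequences are hypothetical and are not taken from TkLife.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_pair(codon_a, codon_b):
    """Label one aligned codon pair."""
    a, b = codon_a.upper(), codon_b.upper()
    if a == b:
        return "match"
    if CODON_TABLE[a] == CODON_TABLE[b]:
        return "synonymous"      # codon mismatch, amino acid match
    return "non-synonymous"      # codon mismatch, amino acid mismatch


def classify_alignment(seq_a, seq_b):
    """Label every codon position of two aligned coding sequences."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("sequences must be equal length and in frame")
    return [classify_codon_pair(seq_a[i:i + 3], seq_b[i:i + 3])
            for i in range(0, len(seq_a), 3)]


# ATG/ATG match, GCT/GCC both Ala, AAA (Lys) vs AGA (Arg)
labels = classify_alignment("ATGGCTAAA", "ATGGCCAGA")
print(labels)
```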
Segmental duplication between Arabidopsis chromosomes 4 and 5 (panels labeled CHRM 4 and CHRM 5)
Color scheme:
Black - single copy genes
Purple - kinases
Green - cytochrome
Red - resistance genes
Yellow - ribosomal proteins
Gray lines connect genes with sequence identity of 40% or greater.
Note: Single copy genes are distributed evenly through both segments of the duplicated region. The image was generated by GenomePixelizer using the locus zoomer function. Additional information is available at http://www.atgc.org/GP_Ref/presentation/

Credits: This work was funded by the USDA IFAFS Plant Genome Program to the Compositae Genome Project.
Questions and comments to Alexander Kozik, email akozik_at_atgc.org

Putative scenario of gene loss after segmental duplication
Because of extensive gene loss after duplication, deduction of gene order in ancestral genomes is required for informative synteny analysis between different genomes.

Patterns of segmental duplications in the Arabidopsis genome (generated by GenomePixelizer, http://www.atgc.org/). Regions marked with white boxes are shown at larger scale above.