Introduction to EMBOSS

1
Introduction to EMBOSS

Christine Ho
chrisho@cc.hku.hk

2
Web page of EMBOSS

The programs of EMBOSS are available at
http://bioinfo.hku.hk/EMBOSS/
The files required for this lecture are available at
http://bioinfo.hku.hk/tutorial/
Users are required to apply for a BIOINFO account to use the tools on the web and off-line, and to download the databases.
Registration for a BIOINFO account is open to the public free of charge; usage of BIOINFO is restricted to academic and research purposes only.
How to apply for a BIOINFO account
HKU members: submit the HKUESD application form (Cfe-139)
Non-HKU members: submit the application form at
http://www.hku.hk/ccoffice/forms/cf139.pdf
Questions and comments: biosupport@bioinfo.hku.hk

3
What is EMBOSS?

EMBOSS (The European Molecular Biology Open Software Suite) is a free, Open Source software analysis package that provides a comprehensive set of sequence analysis programs, specially developed for the needs of the molecular biology user community.
Within EMBOSS you will find around 100 programs (applications).
More information about EMBOSS can be found at
http://www.uk.embnet.org/Software/EMBOSS/

4
Main Programs in EMBOSS

Retrieve sequences from databases
Sequence alignment
Nucleic acid gene finding and translation
Protein secondary structure prediction
Rapid database searching with sequence patterns
Protein motif identification, including domain analysis
Nucleotide sequence pattern analysis, for example to identify CpG islands or repeats
Codon usage analysis for small genomes
Rapid identification of sequence patterns in large-scale sequence sets
Presentation tools for publication

5
Starting EMBOSS

EMBOSS can be started in the following ways:
Command line, after logging in to bioinfo.hku.hk
Web interface (EMBOSS-GUI)

6
Command line of EMBOSS

Inside HKU campus
telnet bioinfo.hku.hk
Outside HKU campus
Windows machine
Use PuTTY, see http://bioinfo.hku.hk FAQ Q13
Linux or UNIX machine
ssh <username>@bioinfo.hku.hk

7
Web interface of EMBOSS

Directly access the web page at
http://bioinfo.hku.hk/EMBOSS/
Or browse the BIOSUPPORT homepage
http://bioinfo.hku.hk/ and select the Tools option

8
Web interface of EMBOSS

Click on the link EMBOSS - GUI

9
Programs in EMBOSS

Parameters in EMBOSS
Input can be
Uniform Sequence Addresses (USAs), in the formats
database
database:entry_name or database:accession_number
(e.g. embl:xlrhodop or embl:L07770)
database:wildcard (e.g. sw:opsd_a*)
filename
filename:entry
format::filename
@list (a file listing USAs)
Sequence data pasted into the text area.
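To make the USA forms above concrete, here is a small Python sketch that splits a USA string into its parts. It is a simplified illustration under our own assumptions, not EMBOSS's parser: the name parse_usa and the known_databases argument are hypothetical, and EMBOSS accepts more USA forms than are handled here. Because database:entry and filename:entry look identical, the sketch decides by checking the prefix against a set of known database names.

```python
from typing import Optional


def parse_usa(usa: str, known_databases: set) -> dict:
    """Split a Uniform Sequence Address into its parts (simplified sketch)."""
    if usa.startswith("@"):
        # @list: a file that lists further USAs
        return {"kind": "list", "file": usa[1:]}
    fmt: Optional[str] = None
    if "::" in usa:
        # format::filename[:entry]
        fmt, usa = usa.split("::", 1)
    source, _, entry = usa.partition(":")
    if fmt is None and source in known_databases:
        # database[:entry, accession or wildcard]; no entry means all entries
        return {"kind": "database", "db": source, "entry": entry or "*"}
    # filename[:entry], optionally with an explicit format
    return {"kind": "file", "format": fmt, "file": source, "entry": entry or None}


DBS = {"embl", "sw"}  # hypothetical set of configured database names
print(parse_usa("embl:xlrhodop", DBS))
print(parse_usa("fasta::myseqs.fa", DBS))
```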

10
Programs in EMBOSS

Output will be
a textual and/or graphical representation of the data.
The output can be saved as a text file or, in some cases, as an image file in PNG or PS format.

11
EMBOSS online help

The documentation for EMBOSS is available at
http://bioinfo.hku.hk/emboss/

12
Difference between GCG and EMBOSS
13
Replacement of GCG programs

Exchanging sequences between packages

14
Replacement of GCG programs

Sequence editing, manipulation and display

15
Replacement of GCG programs

Translation

Sequence comparison and alignment

16
Replacement of GCG programs

Patterns and gene finding

17
Replacement of GCG programs

Phylogeny

Mapping

18
Replacement of GCG programs

Protein analysis

Primer selection

19
Replacement of GCG programs

Keyword-based databank searching

20
Running EMBOSS program

EMBOSS programs are run by typing them at the
Unix prompt, or by using an interface.
The EMBOSS command syntax follows normal Unix
command conventions.
programname -help
to get help on the options.
programname -opt
to make the program prompt you for the optional parameters as well.
tfm programname
to get the full help on a program.

21
Login bioinfo

Log in to bioinfo with telnet bioinfo.hku.hk
If you are using a temp account, please create a directory named after your username at hkusua
bioinfo mkdir <username>
E.g. bioinfo mkdir chantaiman
Change to the directory you created
bioinfo cd <username>
E.g. bioinfo cd chantaiman

22
wossname

It is easy to forget the name of a program.
To find EMBOSS programs, use wossname
wossname finds programs by looking for keywords
in the description or the name of the program.

23
wossname

Type wossname at the Unix prompt
bioinfo wossname
It displays a one-line description and then prompts you for information:
Finds programs by keywords in their one-line documentation
Keyword to search for: restrict
SEARCH FOR 'RESTRICT'
recode    Remove restriction sites but maintain the same translation
remap     Display a sequence with restriction cut sites, translation etc..
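The search wossname performs can be pictured as a case-insensitive substring match over program names and their one-line descriptions. A minimal Python sketch; the three descriptions are copied from this tutorial, while the function name is our own and the real program searches every installed application.

```python
# One-line descriptions taken from the examples in this tutorial
PROGRAMS = {
    "recode": "Remove restriction sites but maintain the same translation",
    "remap": "Display a sequence with restriction cut sites, translation etc..",
    "seqret": "Reads and writes (returns) a sequence",
}


def find_programs(keyword: str, programs: dict) -> list:
    """Return sorted program names whose name or description contains keyword."""
    kw = keyword.lower()
    return sorted(name for name, desc in programs.items()
                  if kw in name.lower() or kw in desc.lower())


print(find_programs("restrict", PROGRAMS))  # ['recode', 'remap']
```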

24
Optional parameters

To be prompted for all the optional parameters, type the following
bioinfo wossname -opt
Finds programs by keywords in their one-line documentation
Keyword to search for: protein
Output program details to a file [stdout]: myfile
Format the output for HTML [N]:
String to form the first half of an HTML link:
String to form the second half of an HTML link:
Output only the group names [N]:
Output an alphabetic list of programs [N]:
Use the expanded group name [N]:

25
help

bioinfo wossname -help
Mandatory qualifiers:
[-search]     string    Enter a word or words here.
Optional qualifiers (* if not always prompted):
-outfile      outfile   This program will write the program names
Advanced qualifiers:
-[no]emboss   boolean   EMBOSS program documentation will be searched.
Mandatory: required; these are often parameters (qualifier name shown in [ ], so it can be left out).
Optional: use -opt to be prompted for these.
Advanced: things that are not often used!

26
Writing to the screen

Note that the default output file for wossname was stdout (standard output).
Enter stdout whenever you are prompted for an output file and want to see the result directly.
This is a magic file name:
it displays the output on the screen, not in a file.

27
Working with sequences

EMBOSS reads sequences from files or databases.
It automatically recognizes the input sequence
format.
You can easily specify many output formats.

28
Getting sequences from the databases

Database single entry (ID)
database:entry
For example embl:hsfau
Wildcarded entries (Query)
database:hs*
For example sw:fos_*
All entries
database
Most databases support all 3 methods; some may not.
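A wildcarded query is essentially shell-style pattern matching on entry names. The sketch below mimics it with Python's fnmatch; the entry list is made up for illustration, and treating the match as case-insensitive is our assumption rather than something taken from the EMBOSS documentation.

```python
from fnmatch import fnmatchcase


def query_entries(entry_ids, pattern):
    """Return the IDs that match a wildcard such as 'fos_*' (case-insensitive)."""
    pat = pattern.lower()
    return [eid for eid in entry_ids if fnmatchcase(eid.lower(), pat)]


# Hypothetical list of entry names in a protein database
ids = ["FOS_HUMAN", "FOS_RAT", "FOSB_HUMAN", "JUN_HUMAN"]
print(query_entries(ids, "fos_*"))  # ['FOS_HUMAN', 'FOS_RAT']
```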

29
showdb

bioinfo showdb
Displays information on the currently available
databases
Name    Type  ID  Qry  All  Comment
domo    P     OK  OK   OK   DOMO sequences
enspep  P     OK  OK   OK   ENSEMBL PEP sequences
gp      P     OK  OK   OK   GENPEPT sequences
gpnew   P     OK  OK   OK   New GENPEPT sequences
kabatp  P     OK  OK   OK   KABAT Protein sequences
nrl     P     OK  OK   OK   NRL_3d
pdb     P     OK  OK   OK   PDB sequences
pir     P     OK  OK   OK   PIR using NBRF access for 4 files
rem     P     OK  OK   OK   REMTREMBL sequences

30
seqret

Reads in a sequence, and writes it out.
bioinfo seqret
Reads and writes (returns) a sequence
Input sequence: embl:xlrhodop
Output sequence: xlrhodop.fasta
bioinfo more xlrhodop.fasta
>XLRHODOP L07770 Xenopus laevis rhodopsin
ggtagaacagcttcagttgggatcacaggcttctagggatcctttgggcaaaaaagaaac
acagaaggcattctttctatacaagaaaggactttatagagctgctaccatgaacggaac
.
.
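The FASTA layout shown above (a '>' line with the entry name and description, then the sequence in fixed-width lines) is simple to produce. A minimal Python sketch, assuming 60 residues per line as in the output shown; the function name is our own.

```python
def to_fasta(seq_id: str, description: str, sequence: str, width: int = 60) -> str:
    """Format one record as FASTA: a '>' header line, then fixed-width sequence lines."""
    header = f">{seq_id} {description}".rstrip()
    body = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header] + body) + "\n"


print(to_fasta("XLRHODOP", "L07770 Xenopus laevis rhodopsin",
               "ggtagaacagcttcagttgggatcacaggcttctagggatcctttgggcaaaaaagaaacacagaaggca"))
```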

31
seqret from the command line

Give seqret all of its data on the command line.
It doesn't need to prompt for anything else.
bioinfo seqret embl:xlrhodop -outseq xlrhodop.fasta
The -outseq qualifier can be abbreviated to -out.
Any abbreviation must be unique.
Even shorter: leave out the qualifier
bioinfo seqret embl:xlrhodop xlrhodop.fasta

32
Changing output formats (reformatting)

seqret can reformat sequences by specifying the output format
bioinfo seqret embl:xlrhodop xlrhodop.gcg -osformat gcg
bioinfo more xlrhodop.gcg
!!NA_SEQUENCE 1.0
Xenopus laevis rhodopsin mRNA, complete cds.
XLRHODOP  Length: 1684  Type: N  Check: 9453  ..
 1  ggtagaacag cttcagttgg gatcacaggc ttctagggat cctttgggca
51  aaaaagaaac acagaaggca ttctttctat acaagaaagg actttataga
.
.
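The Check: value in the GCG header is a checksum over the sequence. The sketch below follows the widely used GCG checksum definition (as implemented, for example, in Biopython's Bio.SeqUtils.CheckSum.gcg): position weights cycle from 1 to 57, each is multiplied by the character code of the upper-cased residue, and the total is taken modulo 10000. It is offered as an illustration; we have not checked it against EMBOSS's own source.

```python
def gcg_checksum(sequence: str) -> int:
    """GCG-style checksum: weights 1..57 repeat along the sequence."""
    total = 0
    for i, residue in enumerate(sequence):
        weight = (i % 57) + 1             # 1, 2, ..., 57, 1, 2, ...
        total += weight * ord(residue.upper())
    return total % 10000


print(gcg_checksum("ACGT"))  # 748
```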

33
Multiple sequences, single files

You can use seqret to retrieve multiple sequences into a single file
bioinfo seqret "sw:opsd_a*" opsd_a.seqs
This retrieves all the sequences whose identifiers start with opsd_a into a file called opsd_a.seqs.

34
Multiple sequences, many files

If you wish to write one sequence per file, use
bioinfo seqret "sw:opsd_a*" -ossingle
The output filenames will be based on the
sequence entry names.
The program seqretsplit will split an existing
multiple sequence file into many files.
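What seqretsplit does can be pictured as cutting multi-FASTA text at each '>' header. The Python sketch below is a hypothetical stand-in: it handles FASTA input only and returns the records instead of writing files, whereas the real program reads any format EMBOSS understands and names the output files after the entries. The entry names in the example are made up.

```python
def split_fasta(text: str) -> dict:
    """Map each entry name to its own single-record FASTA text."""
    records = {}
    name = None
    for line in text.splitlines():
        if line.startswith(">"):
            # Entry name = first word after '>'
            name = line[1:].split()[0]
            records[name] = [line]
        elif name is not None and line.strip():
            records[name].append(line)
    return {n: "\n".join(lines) + "\n" for n, lines in records.items()}


multi = ">OPSD_A1 demo one\nACDE\nFGHI\n>OPSD_A2 demo two\nKLMN\n"
for entry, record in split_fasta(multi).items():
    print(entry, repr(record))
```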

35
Asterisk on the command line

You can't use an unquoted * on the UNIX command line.
UNIX tries to match it to filenames.
Quote it, either with quotes or with a backslash
"embl:*"
embl:\*
For example
bioinfo seqret "embl:hsf*" hsf.seq
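If you build EMBOSS command lines from a script, the same quoting rule applies. A small Python sketch using the standard library's shlex.quote to protect the wildcard; the command is only printed here, not run. Passing the arguments as a list to subprocess.run (without shell=True) would avoid shell expansion altogether.

```python
import shlex

usa = "embl:hsf*"      # wildcard USA: must reach seqret unexpanded
outfile = "hsf.seq"
# shlex.quote wraps the argument in single quotes because it contains '*'
cmd = f"seqret {shlex.quote(usa)} {shlex.quote(outfile)}"
print(cmd)  # seqret 'embl:hsf*' hsf.seq
```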

36
EMBOSS web interface

On the left, you can choose the program to run.
You can also see all the programs sorted alphabetically, instead of by group, by clicking on the link.

37
Getting help in EMBOSS

Help on the program is available by clicking on
the question mark.

38
Input to EMBOSS

If you know the entry name or accession number, enter the sequence in Uniform Sequence Address (USA) format
E.g. embl:xlrhodop

39
Input to EMBOSS

If you have your own sequence file, upload the
sequence by clicking the browse button.

40
Input to EMBOSS

You can also copy and paste your own sequence
into the text area.

41
seqret web interface

E.g. seqret: retrieving a single sequence
Input
USA path: embl:xlrhodop
Output file format: GCG 9.x/10.x
Output
The sequence retrieved in GCG format

42
seqret
43
seqret
44
seqret

seqret retrieving multiple sequences
Input: sw:ops2_*  Output file format: Pearson FASTA
Output: multiple sequences with identifiers starting with ops2_.
Save the file as ops2.fasta by right-clicking on the link

45
coderet

Extract CDS, mRNA and translations from feature
tables. If any sequences are in other entries of
that database, they are automatically fetched and
incorporated correctly into the final sequence.
Input: embl:X03487

46
coderet

Output

47
dottup

dottup: comparison between 2 sequences using dot plots.
Input
1st sequence: embl:xl23808 (Xenopus laevis rhodopsin gene)
2nd sequence: embl:xlrhodop (Xenopus laevis rhodopsin cDNA from complement of mRNA)
Output
A dot plot in PNG format, with diagonal lines marking areas where the two sequences align well.
The image can be saved to your computer.

48
dottup
49
dottup

The 5 diagonal lines represent areas where the
two sequences align well.
Since this is aligning genomic and cDNA, the five
diagonals represent the five exons of the gene.
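The idea behind dottup can be sketched in a few lines: record every position where the two sequences share an identical word of a fixed length. Hits with the same offset (i - j) fall on one diagonal, and an intron shifts that offset between genomic and cDNA coordinates, which is why each exon gives a separate diagonal. This Python sketch is our illustration, not dottup's code; the word parameter plays the role of dottup's word size setting.

```python
def dot_plot(seq_a: str, seq_b: str, word: int = 10) -> list:
    """Return (i, j) start positions where seq_a and seq_b share an exact word."""
    # Index every word of seq_b by its start position(s)
    index = {}
    for j in range(len(seq_b) - word + 1):
        index.setdefault(seq_b[j:j + word], []).append(j)
    hits = []
    for i in range(len(seq_a) - word + 1):
        for j in index.get(seq_a[i:i + word], []):
            hits.append((i, j))  # one dot; same i - j means same diagonal
    return hits


print(dot_plot("ACGTAC", "TTACGT", word=4))  # [(0, 2)]
```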

50
Pairwise Sequence Alignment

An alignment is an arrangement of two sequences
which shows where the two sequences are similar,
and where they differ.
There is no unique, precise, or universally
applicable notion of similarity.

51
Global Alignment

A global alignment is one that compares the two
sequences over their entire lengths, and is
appropriate for comparing sequences that are
expected to share similarity over the whole
length.
The alignment maximizes regions of similarity and
minimizes gaps using the scoring matrices and gap
parameters provided to the program.

52
needle

Function
Needleman-Wunsch global alignment
Description
This program uses the Needleman-Wunsch global
alignment algorithm to find the optimum alignment
(including gaps) of two sequences when
considering their entire length.
The computation is rigorous.
It can be time consuming to run if the sequences
are long.
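The dynamic-programming core of Needleman-Wunsch fits in a short function. This Python sketch scores a global alignment using a simple match/mismatch score and a linear gap penalty; these scoring values are our assumptions for illustration. needle itself uses a substitution matrix with separate gap opening and extension penalties, and reports the alignment as well as the score. The table has (m+1) x (n+1) cells, which is why run time grows with sequence length.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Optimal global alignment score (linear gap penalty, simplified)."""
    rows, cols = len(a) + 1, len(b) + 1
    F = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        F[i][0] = i * gap          # a[:i] aligned against leading gaps
    for j in range(1, cols):
        F[0][j] = j * gap          # b[:j] aligned against leading gaps
    for i in range(1, rows):
        for j in range(1, cols):
            diag = F[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            F[i][j] = max(diag, F[i - 1][j] + gap, F[i][j - 1] + gap)
    return F[-1][-1]               # both sequences used end to end


print(needleman_wunsch("ACGT", "AGT"))  # 1  (A-C-G-T vs A-(gap)-G-T)
```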

53
Input sequence for needle
54
needle

needle: Needleman-Wunsch global alignment
Input: 1st sequence embl:xlrhodop, 2nd sequence embl:xl23808
Output: global alignment showing the 5 aligned regions.

55
Local alignment

Local alignment searches for regions of local
similarity and need not include the entire length
of the sequences.
Local alignment methods are very useful for
scanning databases or other circumstances when
you wish to find matches between small regions of
sequences, for example, between protein domains.

56
water

Function
Smith-Waterman local alignment.
Description
Water uses the Smith-Waterman algorithm (modified
for speed enhancements) to calculate the local
alignment.
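Smith-Waterman differs from Needleman-Wunsch in two places: every cell is floored at zero, so an alignment may start anywhere, and the answer is the best cell anywhere in the table rather than the bottom-right corner. A Python sketch with simple, assumed scores; water uses a substitution matrix and affine gaps, plus its speed modifications.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score (linear gap penalty, simplified)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Floor at 0: a poor prefix is dropped and a new local alignment starts
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best


print(smith_waterman("TTTTACGTTTTT", "GGACGTGG"))  # 8  (ACGT matched)
```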

57
water

water: Smith-Waterman local alignment.
Input: 1st sequence embl:xlrhodop, 2nd sequence embl:xl23808
Output: local alignment showing the 5 aligned regions.

58
Multiple Sequence Analysis

Multiple sequence alignments are used
To find patterns to characterize protein
families.
To detect or demonstrate homology between new
sequence and existing families of sequences.
To help predict the secondary and tertiary
structures of the new sequences.
As an essential prelude to molecular
evolutionary analysis.

59
emma

Function
Multiple alignment program - interface to
ClustalW program
Description
EMMA calculates the multiple alignment of nucleic
acid or protein sequences according to the method
of Thompson, J.D., Higgins, D.G. and Gibson, T.J.
(1994). This is an interface to the ClustalW
distribution.

60
Upload file to emma

Input: the output from seqret (ops2.fasta), i.e. all SwissProt sequences whose identifiers begin with ops2_
Click on the browse button to upload the file ops2.fasta

61
Input sequence to emma

ops2.fasta

62
emma

emma: interface to the ClustalW program
Output: multiple alignment saved as file ops2.aln.

63
prettyplot

prettyplot displays aligned sequences, with colouring and boxing.
Input: output from program emma (ops2.aln)
Output: graphic display of aligned sequences. Identical residues are shown in red, similar residues in green.

64
prophecy

Function
Creates matrices/profiles from multiple
alignments
Description
This creates a profile matrix file from a nucleic
acid or a protein sequence alignment.
The profile matrix file can then be used by
program profit or prophet.

65
prophecy

Input
Sequence: output from program emma (ops2.aln)
Select type: Gribskov

66
prophecy

Output: a profile, to be saved as ops2.prophecy.
This profile allows a new sequence to be aligned optimally to a family of similar sequences in the program prophet.

67
prophet

prophet: gapped alignment for profiles
Input
Input sequence: the file xlrhodop.pep, output from transeq of the sequence embl:xlrhodop, region 110-1171.
Profile or matrix file: ops2.prophecy
Output file: ops2.prophet
Output: the gapped alignment to the profile. The vertical bars (|) represent residues that are identical between the ops2 consensus and our rhodopsin, while the colons (:) represent conservative substitutions. Aligning members of a family can reveal conserved regions that may be important for structure and/or function.

68
prophet

Output

69
plotorf

plotorf plots potential open reading frames
Input sequence: embl:xlrhodop
Output: graphical output showing the potential open reading frames in all six frames.
The longest protein is in the second frame, which is the correct open reading frame.

70
getorf

getorf: finds and extracts open reading frames (ORFs)
Input
Sequence: embl:xlrhodop
Type of sequence to output: nucleic sequence between START and STOP codons
Output: textual information on each region and the sequence of that region.
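A search for ORFs between START and STOP codons can be sketched as a scan of the three forward frames and the three frames of the reverse complement. In this Python sketch the standard stop codons are used, an ATG inside an already open ORF is ignored, and min_codons is a hypothetical length cut-off; the frame numbering and 0-based coordinates are our own conventions, not getorf's output format.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 30) -> list:
    """Return (strand, frame 1-3, start, end) of ATG...stop ORFs.

    start/end are 0-based on the strand searched; end is exclusive and
    includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for pos in range(frame, len(s) - 2, 3):
                codon = s[pos:pos + 3]
                if start is None and codon == "ATG":
                    start = pos                      # open a new ORF
                elif start is not None and codon in STOPS:
                    if (pos + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame + 1, start, pos + 3))
                    start = None                     # close it at the stop
    return orfs


print(find_orfs("CCATGAAATTTTAGCC", min_codons=1))  # [('+', 3, 2, 14)]
```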

71
transeq

transeq: translate nucleic acid sequences
Input
Sequence: embl:xlrhodop
Regions to translate: 110-1171 (from the getorf output)
Output: translated sequence of the given region.
Save the file as xlrhodop.pep
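Translation with the standard genetic code can be sketched as a codon lookup table. In this Python sketch, region coordinates are 1-based and inclusive, as in the 110-1171 example above; codons containing ambiguous bases become X, and '*' marks a stop. It is an illustration, not transeq's implementation, which also supports other genetic codes and frames.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ('*' = stop)
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def translate(seq: str, start: int = 1, end=None) -> str:
    """Translate seq[start..end] (1-based, inclusive) in frame 1 of that region."""
    region = seq.upper().replace("U", "T")[start - 1:end]
    return "".join(CODON_TABLE.get(region[i:i + 3], "X")
                   for i in range(0, len(region) - 2, 3))


print(translate("ATGGCCTAA"))  # MA*
```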

72
Exercise 1 Q1

Align HER2 _ERB2_HUMAN and UNKNOWN_AAL39899.1 with needle and water. What is the main difference between the two types of alignment in these two cases? (The files HER2-fasta.prt and ALL39899_1.prt are at http://bioinfo.hku.hk/tutorial/.)
Repeat the Smith-Waterman alignment of HER2-fasta.prt and ALL39899_1.prt with different parameters. What happens if the gap penalties are changed to 30 and 2 instead of the defaults 10 and 0.5?
BLOSUM62 is the default. What happens to the local alignment (using program water) when using other matrices, e.g. EPAM10?

73
Exercise 1 Q2

Type gb:A7120FTSZ in the text box and run seqret.
Run entret with the same sequence USA and examine the entry. What is the difference between the two outputs?

74
Exercise 1 Q3

With the program infoseq, display information on all sequences whose names start with 10 in the SwissProt database. (Hint: the sequence is sw:10*; choose the information you want to display by changing it to yes.)

75
Exercise 1 answer (A1)

Needle output

76
Exercise 1 answer (A1)

Water output

77
Exercise 1 answer (A1)

Water output with a gap opening penalty of 30 and a gap extension penalty of 2.

78
Exercise 1 answer (A1)

Water output with matrix of EPAM10

79
Exercise 1 answer (A1)

The global alignment (needle) requires the whole sequences to be aligned, so its identity and similarity are much lower than those of the local alignment (water).
If the gap penalties are changed to 30 and 2, no gap appears in the alignment.
If EPAM10 is used, the score and alignment length drop. Since PAM is derived from global alignments, it gives worse results for the local alignment program water. EPAM10 is more suitable for very similar proteins with no more than 10% evolutionary divergence.

80
Exercise 1 answer (A1)

Amino Acid substitution matrices
PAM (point accepted mutation) matrices list the likelihood of change from one amino acid to another in homologous sequences during evolution.
One PAM is a unit of evolutionary divergence in which 1% of the amino acids have been changed.
Some amino acid substitutions occur more readily than others, probably because they do not have a great effect on the structure and function of a protein.

81
Exercise 1 answer (A1)

Amino Acid substitution matrices (cont)
BLOSUM matrix values are based on a large set of about 2000 conserved amino acid patterns called blocks. Blocks come from a database of protein sequences representing more than 500 families of related proteins.
PAM is derived from global alignments of whole proteins, while BLOSUM comes from alignments of shorter, conserved sequence blocks.
The matrix built from blocks with no more than X% identity is called BLOSUM X (e.g. BLOSUM62).
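Each BLOSUM entry is a log-odds score: how often a pair of residues is seen aligned in the blocks, compared with how often the pair would occur by chance given the residues' background frequencies, expressed in half-bit units and rounded. A Python sketch of that formula as it is commonly described; the frequencies in the example are made-up numbers, not real BLOSUM data.

```python
import math


def blosum_score(q_ij: float, p_i: float, p_j: float, identical: bool) -> int:
    """Half-bit log-odds score for an observed pair frequency q_ij (sketch).

    For two different residues the chance frequency is 2 * p_i * p_j,
    because the pair can occur in either order.
    """
    expected = p_i * p_j if identical else 2 * p_i * p_j
    return round(2 * math.log2(q_ij / expected))


print(blosum_score(0.01, 0.05, 0.05, identical=True))    # 4: seen 4x more than chance
print(blosum_score(0.001, 0.05, 0.05, identical=False))  # -5: seen less than chance
```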

82
Exercise 1 answer (A1)

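PAM100 corresponds roughly to Blosum90
PAM120 corresponds roughly to Blosum80
PAM160 corresponds roughly to Blosum62
PAM200 corresponds roughly to Blosum52
PAM250 corresponds roughly to Blosum45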
The Blosum matrices are best for detecting local
alignments.
The Blosum62 matrix is the best for detecting the
majority of weak protein similarities.
The Blosum45 matrix is the best for detecting
long and weak alignments.

83
Exercise 1 answer (A1)

If the BLOSUM62 matrix is compared to PAM160 then
it is found that the BLOSUM matrix is less
tolerant of substitutions to or from hydrophilic
amino acids, while more tolerant of hydrophobic
changes and of cysteine and tryptophan mismatches.

84
Exercise 1 answer (A2)

seqret output

85
Exercise 1 answer (A2)

entret output

86
Exercise 1 answer (A2)

You will see the sequence for the Anabaena 7120
ftsZ and gsh-III genes.
EMBOSS is also capable of extracting more
information than just the sequence from a
database entry. The program entret will return
the entire entry as a text file.

87
Exercise 1 answer (A3)

Output

88
garnier

garnier: predicts protein secondary structure using the Garnier-Osguthorpe-Robson (GOR) method.
Secondary structure prediction is notoriously difficult to do accurately. The GOR I algorithm is one of the first semi-successful methods.
The Garnier method is not regarded as the most accurate prediction, but it is simple to calculate on most workstations.
Input: translated sequence (xlrhodop.pep) of embl:xlrhodop region 110-1171, from program transeq.
Output: predicted protein secondary structure

89
garnier

Output

90
pepinfo

pepinfo: plots simple amino acid properties in parallel.
Input sequence: translated sequence (xlrhodop.pep) of embl:xlrhodop region 110-1171, from program transeq.
Output: a textual and graphical representation of amino acid properties (size, polarity, aromaticity, charge, etc.). Hydrophobicity profiles are useful for locating turns, potential antigenic peptides and transmembrane helices.

91
pepinfo

Showing the distribution of residues

92
pepinfo

Hydrophobicity profiles are useful for locating turns, potential antigenic peptides and transmembrane helices.
Positive score -> hydrophobic region.
Negative score -> hydrophilic region.
The profile shows seven highly hydrophobic regions.
Use the program tmap to investigate further.
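A hydrophobicity profile is a sliding-window average of per-residue hydrophobicity values. This Python sketch assumes the Kyte-Doolittle scale and a 9-residue window for illustration; pepinfo's own scale and window size are set by its qualifiers and may differ. Long runs of positive windows (roughly 20 residues) are the stretches that suggest transmembrane helices.

```python
# Kyte-Doolittle hydropathy values (assumed scale for this sketch)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 9) -> list:
    """Mean hydropathy of each full window; > 0 suggests a hydrophobic stretch."""
    # Unknown residue codes are counted as 0.0
    values = [KYTE_DOOLITTLE.get(aa, 0.0) for aa in protein.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


profile = hydropathy_profile("DDDDDDDDDIIIIIIIII", window=9)
print(profile[0], profile[-1])  # -3.5 4.5
```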

93
patmatmotifs

patmatmotifs searches a PROSITE motif database with a protein sequence. It can identify which known protein family (if any) the new sequence belongs to.
PROSITE currently contains patterns and profiles specific for more than a thousand protein families or domains.
PROSITE patterns: biologically significant amino acid patterns summarized in the form of regular expressions.
PROSITE profiles: techniques based on weight matrices, which allow the detection of extremely divergent protein families and functional/structural domains.

94
patmatmotifs

Input sequence: the file xlrhodop.pep, output from transeq of the sequence embl:xlrhodop, region 110-1171.
Output: a textual representation showing where the sequence matches a motif.

95
pscan

pscan: scans proteins using PRINTS
PRINTS is a database of diagnostic protein
signatures, or fingerprints.
Fingerprints are groups of conserved motifs or
elements that together form a diagnostic
signature for particular protein families.
An uncharacterised sequence matching all motifs
or elements can then be readily diagnosed as a
true match to a particular family fingerprint.
Input sequence: the file xlrhodop.pep, output from transeq of the sequence embl:xlrhodop, region 110-1171.

96
pscan

Output: a textual representation showing where short sequences match entries in the PRINTS database, which defines functional protein families.

97
fuzznuc

fuzznuc uses PROSITE-style patterns to search nucleotide sequences.
Letter codes for patterns
[ACG] stands for A or C or G.
{AG} stands for any nucleotide except A and G.
N(3) corresponds to N-N-N; N(2,4) corresponds to N-N or N-N-N or N-N-N-N.
Example: [CG](5)TG{A}N(1,5)C
Input
Sequence: embl:hhtetra
Pattern: AAGCTT
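Because these patterns are a compact notation for regular expressions, they can be translated mechanically. A Python sketch of that translation, written for illustration: the name prosite_to_regex is ours, and only the syntax described above plus PROSITE's < and > end anchors is handled. The wildcard letter is a parameter, since N means "any base" in a nucleotide pattern but asparagine in a protein pattern, where x is the wildcard.

```python
import re

CLOSERS = {"[": "]", "{": "}", "(": ")"}


def prosite_to_regex(pattern: str, wildcard: str = "N") -> str:
    """Convert a PROSITE-style pattern into a Python regular expression."""
    p = pattern.upper().replace("-", "")      # '-' only separates elements
    wildcard = wildcard.upper()
    out, i = [], 0
    while i < len(p):
        c = p[i]
        if c in CLOSERS:
            j = p.index(CLOSERS[c], i)
            body = p[i + 1:j]
            if c == "[":
                out.append(f"[{body}]")       # any of these
            elif c == "{":
                out.append(f"[^{body}]")      # none of these
            else:
                out.append("{" + body + "}")  # repeat count, e.g. (5) or (1,5)
            i = j + 1
            continue
        if c == "<":
            out.append("^")                   # anchored at the start
        elif c == ">":
            out.append("$")                   # anchored at the end
        elif c == wildcard:
            out.append(".")                   # any residue/base
        else:
            out.append(c)
        i += 1
    return "".join(out)


print(prosite_to_regex("[CG](5)TG{A}N(1,5)C"))  # [CG]{5}TG[^A].{1,5}C
print(re.findall(prosite_to_regex("CXXXXC", wildcard="X"), "AACQFPGCAA"))  # ['CQFPGC']
```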

98
fuzznuc

Output

99
Exercise 2 Q1

Use tmap to display membrane-spanning regions with the input sequence xlrhodop.pep (translated with program transeq from embl:xlrhodop, region 110-1171). Does the result agree with pepinfo?

100
Exercise 2 Q2

Use fuzzpro to search the sequence CREAp_m.txt with the pattern CXXXXC (the file CREAp_m.txt is at http://bioinfo.hku.hk/tutorial/).

101
Exercise 2 Q3

Use patmatmotifs to find patterns in the SwissProt sequences fos_human or fos_rat, and use these patterns with fuzzpro to search the fos genes of other organisms. (Hint: use sw:fos_human for the input. Other organisms: bovin, chick, mouse, sheep.)

102
Exercise 2 Q4

Sometimes it is better to run the program fuzznuc from the command line, because more parameters can be given.
In the BIOINFO terminal, type the following (you must put the command on one line at the UNIX prompt)
bioinfo fuzznuc -sequence=embl:hhtetra -pattern=AAGCTT -mismatch=1 -complement -outf=outf.out
How is the result different from the previous run in the web interface?

103
Exercise 2 answer (A1)

Bars are displayed in the plot above the regions predicted as most likely to form transmembrane regions.
There may be seven transmembrane helices in this protein.
The result agrees with pepinfo.

104
Exercise 2 answer (A2)

The symbol x is used for a position where any amino acid is accepted.
Therefore, the pattern CXXXXC matches CQFPGC and CMFPGC.

105
Exercise 2 answer (A3)

Patmatmotifs output using sw:FOS_HUMAN

106
Exercise 2 answer (A3)

When run with patmatmotifs, the sequences sw:FOS_HUMAN and sw:FOS_RAT return the same motifs: AMIDATION, LEUCINE_ZIPPER, and BZIP_BASIC.
When fuzzpro is run with one of these patterns, the start and end positions agree with patmatmotifs.

107
Exercise 2 answer (A3)

Fuzzpro output with pattern GRAQSIGRRGKVEQ and sequence sw:fos_human

108
Exercise 2 answer (A4)

On the command line you can give the number of mismatches as an input parameter, so the result with 1 mismatch is now shown.
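Mismatch searching is easy to picture as sliding the pattern along the sequence and counting differing positions; searching the complementary strand is the same as searching for the pattern's reverse complement on the given strand. A brute-force Python sketch, our own illustration of the idea behind -mismatch and -complement rather than fuzznuc's algorithm, limited to plain A/C/G/T patterns. AAGCTT (the HindIII site) is its own reverse complement, so both strands give the same hits for it.

```python
COMP = str.maketrans("ACGT", "TGCA")


def fuzzy_find(seq: str, pattern: str, max_mismatch: int = 1, complement: bool = True) -> list:
    """Return (strand, start, mismatches); start is 0-based on the given strand."""
    seq, pattern = seq.upper(), pattern.upper()
    targets = [("+", pattern)]
    if complement:
        rc = pattern.translate(COMP)[::-1]
        if rc != pattern:                 # palindromic sites would be reported twice
            targets.append(("-", rc))
    hits = []
    for strand, pat in targets:
        for i in range(len(seq) - len(pat) + 1):
            mm = sum(1 for a, b in zip(seq[i:i + len(pat)], pat) if a != b)
            if mm <= max_mismatch:
                hits.append((strand, i, mm))
    return hits


print(fuzzy_find("GGAAGCTAGG", "AAGCTT", max_mismatch=1))  # [('+', 2, 1)]
```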

109
cpgplot

cpgplot: plot the CpG-rich areas
CpG refers to a C nucleotide immediately followed
by a G. The 'p' in 'CpG' refers to the phosphate
group linking the two bases.
By default, this program defines a CpG island as
a region where
over an average of 10 windows, the calculated C+G percentage composition is over 50%
and the calculated Obs/Exp (i.e.
Observed/Expected) ratio is over 0.6
and the conditions hold for a minimum of 200
bases.
These conditions can be modified by setting the
values of the appropriate parameters.

110
cpgplot

The Observed number of CpG patterns in a window
is simply the count of the number of times a 'C'
is found followed immediately by a 'G'.
The Expected frequency of CpG's in a window is
calculated as the number of 'C's in the window
multiplied by the number of 'G's in the window,
divided by the window length.
Expected = (number of C's × number of G's) / window length
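The two window statistics defined above can be computed directly. A simplified Python sketch: it evaluates a single window against the thresholds from the previous slide and leaves out cpgplot's averaging over 10 windows and its 200-base minimum length. The function names are our own.

```python
def cpg_stats(window: str):
    """Return (observed/expected CpG ratio, C+G percentage) for one window."""
    w = window.upper()
    n = len(w)
    c, g = w.count("C"), w.count("G")
    observed = sum(1 for i in range(n - 1) if w[i:i + 2] == "CG")  # C followed by G
    expected = c * g / n if n else 0.0
    obs_exp = observed / expected if expected else 0.0
    gc_percent = 100.0 * (c + g) / n if n else 0.0
    return obs_exp, gc_percent


def looks_like_island(window: str, min_obs_exp: float = 0.6, min_gc: float = 50.0) -> bool:
    """Apply the Obs/Exp and C+G thresholds to a single window (simplified)."""
    obs_exp, gc = cpg_stats(window)
    return obs_exp > min_obs_exp and gc > min_gc


print(cpg_stats("CGCG"))                 # (2.0, 100.0)
print(looks_like_island("CGCGCGCGAT"))   # True
```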

111
cpgplot

Input: embl:rnu68037
Output

112
cpgplot

Output

113
cusp

CUSP reads one or more coding sequences (CDS
sequence only) and calculates a codon frequency
table.
It is important to use a codon frequency table
that is appropriate for the species that your
protein comes from.
Input
Sequence: embl:paamir
Codon usage table: Default (Ehum.cut)

114
cusp

Output
Fract: the fraction of the codons for this amino acid that are this codon triplet.
/1000: the number of times this codon occurs per 1000 codons.
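The Fract and /1000 columns can be reproduced for a single in-frame coding sequence. A Python sketch under our own assumptions: the standard genetic code, one CDS, and no attempt to match the exact layout of the file cusp writes.

```python
from collections import Counter

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ('*' = stop)
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def codon_usage(cds: str) -> dict:
    """Return {codon: (amino acid, Fract, per-1000 frequency, count)}."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    counts = Counter(codons)
    total = len(codons)
    per_aa = Counter()                     # codons seen for each amino acid
    for codon, n in counts.items():
        per_aa[CODON_TABLE.get(codon, "X")] += n
    table = {}
    for codon, n in counts.items():
        aa = CODON_TABLE.get(codon, "X")
        table[codon] = (aa, n / per_aa[aa], 1000.0 * n / total, n)
    return table


usage = codon_usage("ATGGCTGCCGCTTAA")
print(usage["GCT"])  # ('A', 0.6666666666666666, 400.0, 2)
```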

115
cusp

Running the program from the command line allows you to specify the sequence begin and end positions
bioinfo cusp -sbeg 135 -send 1292
Create a codon usage table
Input sequence(s): embl:paamir
Output file: paamir.cusp

116
cusp

bioinfo more paamir.cusp

117
hmoment

hmoment plots or writes out the hydrophobic moment. The hydrophobic moment is the hydrophobicity of a peptide measured for a specified angle of rotation per residue.
Assumption: the angle of rotation (bonds of the backbone and amino acid side-chains) per residue in alpha helices is 100 degrees; the angle of rotation per residue in beta sheets is 160 degrees.
Input
Sequence: sw:hbb_human
Produce graph: yes
Plot two graphs: yes
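The hydrophobic moment treats each residue's hydrophobicity as a vector whose direction advances by a fixed angle per residue (100 degrees for an alpha helix, 160 for a beta strand, as stated above) and takes the length of the vector sum. A Python sketch of that calculation (the Eisenberg formulation as we understand it): it takes per-residue hydrophobicity values directly, so the choice of scale is left to you, and it returns the unnormalized sum, whereas a tool may divide by the window length.

```python
import math


def hydrophobic_moment(values, angle_deg: float = 100.0) -> float:
    """Length of the vector sum of hydrophobicities rotated angle_deg per residue."""
    theta = math.radians(angle_deg)
    s = sum(h * math.sin(i * theta) for i, h in enumerate(values))
    c = sum(h * math.cos(i * theta) for i, h in enumerate(values))
    return math.hypot(s, c)


# A perfectly amphipathic toy segment scores high at the matching angle
print(hydrophobic_moment([1.0, -1.0, 1.0, -1.0], angle_deg=180.0))
```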

118
hmoment