Title: Inside the Gene Sorter
1. Inside the Gene Sorter
- A moderately complex CGI application
2. Built on top of library modules
- cheapcgi - creates widgets
- cart - gathers input
- web - UCSC look and feel
- axtAffine - align two sequences
- hash - in memory table keyed by string
- linefile - fast line oriented parsing
- ra - file full of var/value records, used to
configure columns etc.
3. Gene Sorter Main Page
- Controls up top followed by big table.
- One of about a dozen pages produced by hgNear CGI
script.
4. hgNear controls CGI vars
    Variable            Tag     Type    Value         Options
    hgsid               INPUT   HIDDEN  97309948
    org                 SELECT  n/a     Human         Human, Mouse, Rat, C. elegans, D. melanogaster, S. cerevisiae
    db                  SELECT  n/a     hg18          hg18, hg17, hg16
    near_search         INPUT   TEXT
    submit              INPUT   SUBMIT  n/a
    near_order          SELECT  n/a     expGnfAtlas2  expGnfAtlas2, blastp, pfamSimilarity, geneDistance, genomePos, nameSimilarity
    near.do.configure   INPUT   SUBMIT  n/a
    near.do.advFilter   INPUT   SUBMIT  n/a
    near.count          SELECT  n/a     50            25, 50, 100, 200, 500, 1000, all
    near.do.getSeqPage  INPUT   SUBMIT  n/a
- Note the near prefix of CGI-specific vars.
- Note the near.do prefix of buttons.
- Presence of a button var in the CGI is used to figure out which page to draw.
- All near.do vars are ripped out of the cart so the page is only drawn once.
5. Snippet of C for big dispatch
    void doMiddle(struct cart *theCart)
    /* Write the middle parts of the HTML page. This routine sets
     * up some globals and then dispatches to the appropriate
     * page maker. */
    {
    /* ... */
    else if (cartVarExists(cart, customPageDoName))
        doCustomPage(conn, colList);
    else if (cartVarExists(cart, customSubmitDoName))
        doCustomSubmit(conn, colList);
    else if (cartVarExists(cart, customClearDoName))
        doCustomClear(conn, colList);
    else if (cartVarExists(cart, customPasteDoName))
        doCustomPaste(conn, colList);
    else if (cartVarExists(cart, customUploadDoName))
        doCustomUpload(conn, colList);
    else if (cartVarExists(cart, customFromUrlDoName))
        doCustomFromUrl(conn, colList);
    else if (cartVarExists(cart, orderInfoDoName))
        doOrderInfo(conn);
    else if (cartVarExists(cart, affineAliVarName))
        doAffineAlignment(conn);
    else if (cartNonemptyString(cart, searchVarName))
        doSearch(conn, colList);
    else if (gotAdvFilter())
        displayData(conn, colList, knownPosFirst(conn));
    else
        doExamples(conn, colList);
    cartRemovePrefix(cart, "near.do.");
    }
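The dispatch-then-clean-up pattern can be tried in isolation. The sketch below is not the hgNear code: `toyCart`, `toyCartVarExists`, `toyCartRemovePrefix`, and `pickPage` are hypothetical stand-ins for the cart library, and the cart is reduced to a fixed array of variable names.

```c
#include <string.h>

#define TOY_CART_MAX 16

/* Hypothetical stand-in for the real cart: just a list of variable names. */
struct toyCart
    {
    const char *names[TOY_CART_MAX];
    int count;
    };

int toyCartVarExists(struct toyCart *cart, const char *name)
/* Return 1 if name is in cart, 0 otherwise. */
{
int i;
for (i = 0; i < cart->count; ++i)
    if (strcmp(cart->names[i], name) == 0)
        return 1;
return 0;
}

int toyCartRemovePrefix(struct toyCart *cart, const char *prefix)
/* Remove all variables whose name starts with prefix. Return number removed. */
{
int i, kept = 0, removed;
size_t prefixLen = strlen(prefix);
for (i = 0; i < cart->count; ++i)
    if (strncmp(cart->names[i], prefix, prefixLen) != 0)
        cart->names[kept++] = cart->names[i];
removed = cart->count - kept;
cart->count = kept;
return removed;
}

const char *pickPage(struct toyCart *cart)
/* Choose a page from whichever button variable is present, then strip
 * the button variables so the same page is not chosen again next time. */
{
const char *page;
if (toyCartVarExists(cart, "near.do.configure"))
    page = "configure";
else if (toyCartVarExists(cart, "near.do.advFilter"))
    page = "advFilter";
else
    page = "main";
toyCartRemovePrefix(cart, "near.do.");
return page;
}
```

Calling `pickPage` twice on a cart that holds `near.do.configure` picks the configure page the first time and the main page the second time, because the button variable is gone after the first call.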
6. Gene Sorter Columns
7. columnDb.ra example
- name num
- shortLabel #
- longLabel Item Number in Displayed List/Select Gene
- priority 1
- visibility on
- type num

- name name
- shortLabel Name
- longLabel Gene Name/Select Gene
- priority 2
- visibility on
- type knownName kgXref kgID geneSymbol
- search fuzzy
- searchLabel Known Gene Names

- name proteinName
- shortLabel UniProt
- longLabel UniProt (SwissProt/TrEMBL) Protein Display ID
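Each line of a record is a tag followed by its value. A minimal sketch of splitting one such line is given here, assuming the first word is the tag, the rest of the line is the value, and a blank line separates records. `splitRaLine` is a hypothetical helper, not the ra module's API, and it does not handle comments or line continuation.

```c
#include <ctype.h>
#include <string.h>

int splitRaLine(char *line, char **retTag, char **retVal)
/* Split line in place into its first word (tag) and the rest (value).
 * Return 0 for a blank line, which separates records, else 1.
 * A tag with nothing after it gets an empty value. */
{
char *s = line, *end;
while (isspace((unsigned char)*s))
    ++s;
if (*s == 0)
    return 0;
*retTag = s;
while (*s != 0 && !isspace((unsigned char)*s))
    ++s;
if (*s != 0)
    {
    *s++ = 0;                       /* Terminate the tag. */
    while (isspace((unsigned char)*s))
        ++s;
    }
*retVal = s;
/* Trim trailing white space (such as the newline) from the value. */
end = s + strlen(s);
while (end > s && isspace((unsigned char)end[-1]))
    --end;
*end = 0;
return 1;
}
```

For the line `type knownName kgXref kgID geneSymbol` this yields the tag `type` and the value `knownName kgXref kgID geneSymbol`, which a column type would then split further.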
8. hgNearData
- columnDb.ra lives in the hgNearData directory.
- There are three levels of columnDb.ra files in three levels of directory hierarchy:
  - root - applicable to all organisms
  - organism - overrides root for an organism
  - database - overrides for a specific assembly
- Can override specific fields as well as an entire record. Always need at least the name field.
- genome.ra and orderDb.ra also live in hgNearData, as well as column.html files that describe the columns.
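A minimal sketch of the field-level override follows, assuming a record from a more specific level is matched to the more general one by its name field and then replaces or adds individual fields. The `toyRecord` type and the `toyFind`, `toySet`, and `toyOverride` functions are hypothetical; the slides do not show how hgNear performs the merge, and replacing an entire record is not covered here.

```c
#include <string.h>

#define MAX_FIELDS 16

struct toyField
    {
    const char *tag;
    const char *val;
    };

struct toyRecord
    {
    struct toyField fields[MAX_FIELDS];
    int count;
    };

const char *toyFind(const struct toyRecord *rec, const char *tag)
/* Return value for tag, or NULL if not present. */
{
int i;
for (i = 0; i < rec->count; ++i)
    if (strcmp(rec->fields[i].tag, tag) == 0)
        return rec->fields[i].val;
return NULL;
}

void toySet(struct toyRecord *rec, const char *tag, const char *val)
/* Replace value for tag if present, else append (dropped if record is full). */
{
int i;
for (i = 0; i < rec->count; ++i)
    {
    if (strcmp(rec->fields[i].tag, tag) == 0)
        {
        rec->fields[i].val = val;
        return;
        }
    }
if (rec->count < MAX_FIELDS)
    {
    rec->fields[rec->count].tag = tag;
    rec->fields[rec->count].val = val;
    rec->count += 1;
    }
}

int toyOverride(struct toyRecord *base, const struct toyRecord *over)
/* Apply over on top of base if both have the same name field.
 * Return 1 if applied, 0 if the names differ or are missing. */
{
const char *baseName = toyFind(base, "name");
const char *overName = toyFind(over, "name");
int i;
if (baseName == NULL || overName == NULL || strcmp(baseName, overName) != 0)
    return 0;
for (i = 0; i < over->count; ++i)
    toySet(base, over->fields[i].tag, over->fields[i].val);
return 1;
}
```

Applying root, then organism, then database records in that order lets the most specific level win, while fields it does not mention keep their root values.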
9. Routine to get active columns
    struct column *getColumns(struct sqlConnection *conn)
    /* Return list of columns for big table. */
    {
    char *raName = "columnDb.ra";
    struct column *col, *next, *customList, *colList = NULL;
    struct hash *raList = readRa(raName), *raHash = NULL;

    /* Create built-in columns. */
    if (raList == NULL)
        errAbort("Couldn't find anything from %s", raName);
    for (raHash = raList; raHash != NULL; raHash = raHash->next)
        {
        AllocVar(col);
        col->settings = raHash;
        columnVarsFromSettings(col, raName);
        if (!hashFindVal(raHash, "hide"))
            {
            setupColumnType(col);
            if (col->exists(col, conn))
                slAddHead(&colList, col);
            }
        }

    /* Create custom columns. */
    customList = customColumnsRead(conn, genome, database);
    for (col = customList; col != NULL; col = next)
        {
        next = col->next;
        setupColumnType(col);
        if (col->exists(col, conn))
            slAddHead(&colList, col);
        }
    /* ... */
    }
10. Column structure
    struct column
    /* A column in the big table. The central data structure for hgNear. */
        {
        /* Data set during initialization that is guaranteed to be in each column. */
        struct column *next;    /* Next column. */
        char *name;             /* Column name, not allocated here. */
        char *shortLabel;       /* Column label. */
        char *longLabel;        /* Column description. */
        boolean on;             /* True if turned on. */
        char *type;             /* Type - encodes which methods to use etc. */

        boolean (*exists)(struct column *col, struct sqlConnection *conn);
        /* Return TRUE if column exists in database. */

        char *(*cellVal)(struct column *col, struct genePos *gp,
                         struct sqlConnection *conn);
        /* Get value of one cell as string. FreeMem this when done.
         * Note that gp->chrom may be NULL legitimately. */

        void (*cellPrint)(struct column *col, struct genePos *gp,
                          struct sqlConnection *conn);
        /* Print one cell of this column in HTML.
         * Note that gp->chrom may be NULL legitimately. */

        void (*labelPrint)(struct column *col);
        /* Print the label in the label row. */

        void (*configControls)(struct column *col);
        /* Print out configuration controls. */
        /* ... */
        };
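The method pointers let each column type supply its own behavior behind one interface. A minimal sketch of that idea follows, with hypothetical names throughout (`toyColumn`, `toyGene`, `setupToyColumnType`). Unlike the real `cellVal`, which returns a string the caller frees, this version writes into a caller-supplied buffer to stay self-contained.

```c
#include <stdio.h>
#include <string.h>

struct toyGene
    {
    const char *name;   /* Gene identifier. */
    int rowIx;          /* Zero-based row in the displayed list. */
    };

struct toyColumn
    {
    const char *name;   /* Column name. */
    void (*cellVal)(struct toyColumn *col, struct toyGene *gene,
                    char *buf, size_t bufSize);
    /* Method: write cell text for gene into buf. */
    };

static void numCellVal(struct toyColumn *col, struct toyGene *gene,
                       char *buf, size_t bufSize)
/* Item number in the displayed list, counting from 1. */
{
(void)col;
snprintf(buf, bufSize, "%d", gene->rowIx + 1);
}

static void nameCellVal(struct toyColumn *col, struct toyGene *gene,
                        char *buf, size_t bufSize)
/* Gene name as is. */
{
(void)col;
snprintf(buf, bufSize, "%s", gene->name);
}

int setupToyColumnType(struct toyColumn *col, const char *type)
/* Fill in methods according to type. Return 0 on unknown type, else 1. */
{
if (strcmp(type, "num") == 0)
    col->cellVal = numCellVal;
else if (strcmp(type, "name") == 0)
    col->cellVal = nameCellVal;
else
    return 0;
return 1;
}
```

Code that draws the table only ever calls `col->cellVal(...)`, so adding a column type means writing new methods and one more branch in the setup routine, not touching the drawing loop.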
11. Drawing big table
    hPrintf("<TABLE BORDER=1 CELLSPACING=0 CELLPADDING=1 COLS=%d BGCOLOR=\"#"HG_COL_INSIDE"\">\n",
            totalHtmlColumns(colList));

    /* Print label row. */
    hPrintf("<TR BGCOLOR=\"#"HG_COL_HEADER"\">");
    for (col = colList; col != NULL; col = col->next)
        {
        if (col->on)
            col->labelPrint(col);
        }
    hPrintf("</TR>\n");

    /* Print other rows. */
    for (gene = geneList; gene != NULL; gene = gene->next)
        {
        if (sameString(gene->name, curGeneId->name))
            hPrintf("<TR BGCOLOR=\"#D0FFD0\">");
        else
            hPrintf("<TR>");
        for (col = colList; col != NULL; col = col->next)
            {
            if (col->on)
                col->cellPrint(col, gene, conn);
            }
        if (ferror(stdout))
            errAbort("Write error to stdout");
        }
    hPrintf("</TABLE>");
12. Common column types
- Lookup - works with database table keyed by gene name. Only a single string value allowed for each gene.
- Association - allows arbitrary SQL query including gene name. Multiple values per gene ok.
- Float - like lookup but with numerical values.
- Distance - reports a value associated with two genes - selected gene and gene on current row.
- expMulti - expression microarray data
- See hgNearData.doc for more details.
13. Adding kgTxInfo columns
- name geneCategory
- shortLabel Gene Category
- longLabel High Level Gene Category - Coding, Antisense, etc.
- priority 2.6001
- visibility off
- type lookup kgTxInfo name category

- name cdsScore
- shortLabel CDS Score
- longLabel Coding potential score from txCdsPredict
- priority 2.6002
- visibility off
- type float kgTxInfo name cdsScore
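The type line `lookup kgTxInfo name category` reads naturally as a table, a key field, and a value field. A minimal sketch of turning such a line into a query follows, assuming that reading is right. `lookupQueryFromType` is a hypothetical helper, the exact SQL hgNear issues is not shown on the slides, and the gene ID is assumed to be trusted (real code would need to escape it).

```c
#include <stdio.h>
#include <string.h>

int lookupQueryFromType(const char *type, const char *geneId,
                        char *query, size_t querySize)
/* Parse "lookup table keyField valField" and write the matching select
 * statement into query. Return 1 on success, 0 if the type line does
 * not fit or the query does not fit in the buffer.
 * Assumption: geneId is trusted input. */
{
char kind[32], table[64], keyField[64], valField[64];
int n;
if (sscanf(type, "%31s %63s %63s %63s", kind, table, keyField, valField) != 4)
    return 0;
if (strcmp(kind, "lookup") != 0)
    return 0;
n = snprintf(query, querySize, "select %s from %s where %s = '%s'",
             valField, table, keyField, geneId);
return n >= 0 && (size_t)n < querySize;
}
```

With a made-up gene ID such as `uc001abc.1`, the geneCategory type line yields `select category from kgTxInfo where name = 'uc001abc.1'`. A float column could reuse the same query and parse the result as a number.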
14. Updating coding SNPs Column
- name codingSnps
- shortLabel Coding SNPs
- longLabel Simple Nucleotide Polymorphisms in Coding Regions
- priority 7.5
- visibility off
- type association knownToCdsSnp
- queryFull select name,value from knownToCdsSnp
- queryOne select value,value from knownToCdsSnp where name = '%s'
- invQueryOne select name from knownToCdsSnp where value = '%s'
- itemUrl http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=%s
15. knownToCdsSnp from hg17.txt
- Make knownToCdsSnp table (DONE Nov 11, 2004, Heather)
- ssh hgwdev
- hgMapToGene hg17 snp knownGene knownToCdsSnp -all -cds
- row count 165728
- unique 34013
- approx. 5 minutes running time
But there is no snp table in hg18. Try instead:
    hgMapToGene hg18 snp126 knownGene knownToCdsSnp -all -cds
16. Making intronSize column
- A new column type
- copy colTemplate.c to intronSize.c
- Edit code
- Add intronSize type to hgNear.c/.h
- Add intronSize to columnDb
- Make/test