Title: Information Infrastructure II
1Information Infrastructure II
- I211 Week 12
- Rajarshi Guha
2Outline
- The Python DB-API
- Working with Postgres
- Working with sqlite3
- Base64 encoding
3The Python DB-API
- A consistent way to access different databases
- If your program uses this, your application can
use any database supported by the API - Just need to change 2 lines
- Hides much of the database internals
- Allows you to interact in a Pythonic way
- A row is a Python tuple
4The Python DB-API
- You need to construct SQL statements as strings
- The API will allow you to
- Send a query
- Get the results of a query
- One at a time
- All together
- Commit transactions
5Connecting to Postgres
- Well use the psycopg2 module
- Remember to import it
- To connect we use the connect method
- Returns a connection object
- We then get a cursor object
- Now were set to use the database
con psycopg2.connect('dbnamepython211
useri211 passwordI211pwd') cursor con.cursor()
6Working with rguha_pc
Column Type Modifiers
--------------------------------------------
cid bigint not null
iupac_openeye_name text
iupac_cas_name text iupac_name
text iupac_systematic_name
text iupac_traditional_name text
nist_inchi text xlogp
real exact_mass
real mw real
can_smi text
iso_smi text tpsa
real charge
smallint hatm_count integer
gfp bit(166) gfpbcnt
integer
7Performing Queries
- The cursor object has a method call execute takes
one argument - Create an SQL statement in a string
- Send this to execute
- This function has no return value
sql select cid from rguha_pc where mw gt 100
and mw lt 150 cursor.execute(sql)
8How To Handle Errors?
- If your SQL statement has an error an exception
will be thrown - Catch it and report what happened
- If the statement was successful, there is no
response
9How to Handle Errors
sql selet from rguha_pc where mw gt 100 and
mw lt 150 try cursor.execute(sql) except
Exception, e print e gtgtgt syntax error at or
near "selet" at character 1 sql select from
rguha_pctable where mw gt 100 and mw lt 150 try
cursor.execute(sql) except Exception, e
print e gtgtgt relation "rguha_pctable" does not
exist
10How to Handle Errors
- In general, syntax errors in SQL, wrong table
names etc will be fixed during testing - Other errors might occur at runtime
- Tables may be locked
- Loose connection
- Wrapping stuff in try/except is a good idea
11Getting Results
- Get the results from the cursor object
- fetchall() - returns all the results as a list of
tuples - fetchone() - returns a single tuple
- In both cases, the individual tuples have N
elements - N is the number of columns in the SELECT statement
12Getting All The Results
- If the result set is very big, your program will
be very slow or crash - If there were no records then ret will be zero
length
c.execute('select cid,mw from rguha_pc where mw gt
120 and mw lt 150') ret c.fetchall() print Got
back d records (length(ret)) for row in ret
print ret
13A Cursor As a List
- After executing the SQL, the cursor has the
records - We can also directly loop over the cursor
- Cant get the length of the cursor since its not
a real list
c.execute('select cid,mw from rguha_pc where mw gt
120 and mw lt 150') for row in c print row
14Results are Temporary
- After one call to fetchall(), doing it again will
not get any results - Also, if you loop over the cursor. Works just one
time - To get the results again, must execute the
original query - If you need to keep the rows for later use, copy
them into a list!
15Results are Temporary
gtgtgt c.execute('select cid from rguha_pc where mw
gt 120 and mw lt 150') gtgtgt ret c.fetchall() gtgtgt
print 'Got d results' (len(ret)) Got 62
results gtgtgt ret c.fetchall() gtgtgt print 'Got d
results' (len(ret)) Got 0 results
16Results One at a Time
- Rather than get a whole list of records, its
better to get one at a time - Useful for large result sets
c.execute('select cid from rguha_pc where mw gt
120 and mw lt 150') while True ret
c.fetchone() if not ret break print
ret
17Results One at a Time
- As with fetchall(), once the loop is over, you
cant repeat it - You can combine fetchone() with fetchall()
- After calling fetchone() a few times, fetchall()
will return the remainder
18Combining fetchall() fetchone()
- The result of execute() is like a pipe
- fetchone() picks one item from the pipe at a time
- fetchall() gets all the items (or whatever is
left) - In both cases, those items are no longer available
c.execute('select cid from rguha_pc where mw gt
120 and mw lt 150') print c.fetchone() print
c.fetchone() print len(c.fetchall())
19Update/Delete from Python
- Very similar to SELECT queries
- Construct the SQL
- Execute it
sql "update rguha_pc set tpsa 1234 where cid
'12997'" c.execute(sql)
20Committing Transactions
- psycopg2 uses transactions
- Not important for SELECT
- If you do a series of UPDATE, INSERT or DELETE
you should commit - If you dont the changes may or may not be seen
in a subequent query
con psycopgq( ) cursor.execute(insert into x
values (1)) con.commit
21A Postgres Alternative?
- Uptil now, weve been working with Postgres
- Lot of overhead involved
- Someone has to maintain the database
- Involves a distinct process on the machine
- Overkill for small projects
- Go to a smaller database
22sqlite3
- A small database
- Doesnt require a separate process
- Minimal (no) maintenance
- Can even run in memory
- Uses a reduced form of SQL
- Uses DB-API
23Sqlite3 - Why Use it?
- Comes with Python 2.5
- Useful for smallish projects
- MP3 library manager (Amarok)
- Manage email (Mail.app)
- Good for prototyping
- Easily shift to Postgres/Mysql/Oracle later on
with minimal change
24Using Sqlite3
- Wont work on Sulu
- The code creates a DB file (if it didnt exist)
- After that itsthe same asfor Postgres
import sqlite3 con sqlite3.connect(/tmp/example
) cursor con.cursor()
25Creating Tables
- If you had sqlite installed separately, you could
create tables from the command line - No need to bother with that in Python!
- Just do an execute
- Should be donejust once
c.execute('' create table stocks (date text,
trans text, symbol text, qty real, price
real)''')
26Insert/Update/Delete
- Exactly the same as previous examples
- Create SQL statement
- Execute it
27Handling Binary Data
- Many sources of binary data
- Images (PNG, JPEG)
- Music (MP3, WAV)
- Data files (Word, Excel)
- Cant handle binary files like text
- No line endings
- Some characters might print weirdly
28Handling Binary Data
- But certain applications are text based
- HTTP
- Email
- How can we pass binary data in these types of
applications - Example How does you email client send an MP3
attachment?
29Base64 Encoding
- The trick is to convert the binary data to text
data - No special characters involved
- Many ways to do this
- Well consider Base 64 encoding
- Why 64?
- Uses numbers in base 64
30Base 64
- Base 2 is binary
- Digits are 1 and 0
- Base 10 is our use base
- Digits are 0, , 9
- Base 16 is hexadecimal
- Digits are 0, , 9, A, , F
- Base 64 uses many more digits
- 0, , 9, A, , Z, a, , z, , /
- The is used as a padding symbol
31How Does it Work?
- Consider Hello
- Take the ASCII code for each letter
- Convert the code to binary
- Join the whole thing (40 bits)
- Then take 6 bits at a time and convert it to base
64 - SGVsbG8
32Working With Base64
- Python has a module, base64
- Handles encoding and decoding
- Makes life very simple
- Two functions of interest
- encodestring - converts a list of bytes to Base64
- decodestring - converts Base64 to a list of bytes
- With larger data (say MP3 files) youll want to
use encode() and decode() which work directly
with files
33Working with Base64
- import base64
- s My name is Jack Bauer
- enc base64.encodestring(s)
- dec base64.decodestring(enc)
- print enc
- print dec
34Working with Base64
- To quickly encode a binary file, just read in all
the data - Call encodestring
- Same procedure for a Base64 encoded file
- Read the wholefile
- Dont read lineby line
import base64 f open(binary.exe, r) data
f.read() f.close() enc base64.encodestring(dat)