Title: Why my iPhone sucks: Screen Scraping the O2 Website
1Why my iPhone sucks: Screen Scraping the O2 Website
- Simon Lewis
- http://lewis.li - simon@lewis.li
2Overview
- Observations about the iPhone
- Why Screen Scrape?
- MMS solution
- Conclusions
3Good Points (Before the Bashing)
- Web Browser
- SSL IMAP Email Client
- Alarm Clock
- Timer
4Bad Points
(Camera, Phone Interface, Text Message Interface)
A wet piece of string is better connected
5MMS?
(maybe it's not that popular)
ERROR
6Mocking from O2
7Message List Window
8Beautiful Interface?
9There must be a better Solution...
Problems
- No web service available
- http://o2.co.uk/m is not optimised for iPhone
Opportunities
- HTTP interface can be exploited
- iPhone Mail app is a very nice SSL IMAP client
10Screen Scraping?
- Programmatically accessing webpage content
- Parsing the contents of the page to extract the information you want
- Need a language suited to text parsing
Laziness - one of the principal virtues of a programmer
What tools are available?
Perl
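A minimal sketch of the "parse the page for the information you want" step, using only core Perl. The HTML fragment, the phone number and the helper name extract_sender_and_subject are hypothetical stand-ins; the real O2 markup is not reproduced here and may well differ.

```perl
use strict;
use warnings;

# Hypothetical fragment standing in for a message page (assumed layout).
my $html = <<'HTML';
<div>From</div>
<p>07700900123</p>
<div>Subject</div>
<p>Holiday photo</p>
HTML

# Return (sender, subject), or an empty list when the page does not match.
sub extract_sender_and_subject {
    my ($page) = @_;
    # /s lets "." match newlines; ".*?" is non-greedy, so each capture
    # stops at the first closing </p> it meets.
    if ( $page =~ m{From.*?<p>(.*?)</p>.*?Subject.*?<p>(.*?)</p>}s ) {
        return ( $1, $2 );
    }
    return;
}

my ( $sender, $subject ) = extract_sender_and_subject($html);
print "From: $sender\nSubject: $subject\n";
```

A regular expression is enough when the page layout is simple and stable; a change in the markup would silently break the match, which is why the helper returns an empty list rather than guessing.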
11CPAN (Comprehensive Perl Archive Network)
use Config::IniFiles;
use WWW::Mechanize;
use Getopt::Long;
use MIME::Lite;
use YAML qw(LoadFile DumpFile);
use Data::Dumper;
use CGI qw(:standard);
12System Overview
[Diagram: numbered steps 1 to 5 connecting the iPhone, lewis.li and o2.co.uk]
13Authentication
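# autocheck makes every failed request die immediately
my $mech = WWW::Mechanize->new( autocheck => 1 );
$mech->get( $site_url );
# Log in through the form named 'fm'
$mech->submit_form(
    form_name => 'fm',
    fields    => { msisdn => $user_name, pin => $user_password },
);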
14Inbox Listing
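# After a successful match, $1 holds the sender and $2 the subject
$mech->content =~ /From.*?<p>(.*?)<\/p>.*?Subject.*?<p>(.*?)<\/p>/s;
# Collect every link that opens a message
my @mms_message_links = $mech->find_all_links( url_regex => qr/showMessage/ );
# Use hash keys to drop duplicate URLs
my %unique_mms_links = map { $_, 1 } map { $_->url_abs } @mms_message_links;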
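The de-duplication step on this slide relies on hash keys being unique. A small sketch with plain strings, assuming hypothetical URLs (the example.invalid addresses are placeholders, not real O2 links):

```perl
use strict;
use warnings;

# Hypothetical link URLs, with one deliberate duplicate.
my @urls = (
    'http://example.invalid/showMessage?id=1',
    'http://example.invalid/showMessage?id=2',
    'http://example.invalid/showMessage?id=1',
);

# Hash keys are unique, so mapping each URL to a dummy value drops duplicates.
my %unique = map { $_ => 1 } @urls;

# keys returns the entries in no particular order, so sort for stable output.
my @unique_urls = sort keys %unique;

print "$_\n" for @unique_urls;
```

The hash does not preserve the order in which links appeared on the page, which is fine when each message is simply fetched once.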
15Iterating through MMS
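for my $option (@other_options) {
    # Select the next message part via the radio button, then fetch it
    $mech->current_form->value( 'selectedItem', $option );
    $mech->submit();
    my $filename = "/tmp/mmsmessage_part";
    open my $fh, '>', $filename or die "$filename: $!";
    binmode $fh;    # parts may be binary (e.g. images)
    print $fh $mech->content;
    close $fh;
    # Remember the content type of each saved file
    $mms_details->{$filename} = $mech->ct();
    $mech->back();
}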
16Creating the Email
my $msg = MIME::Lite->new(
    From    => "$sender",
    To      => "$mail_recipient",
    Subject => "$subject",
    Type    => 'multipart/mixed',
);
for my $part ( keys %$mms_details ) {
    # Strip the directory to get a friendlier attachment name
    ( my $nice_name = $part ) =~ s!/tmp/!!;
    if ( $mms_details->{$part} eq 'image/jpeg' ) {
        $msg->attach(
            Type        => 'image/jpeg',
            Path        => $part,
            Filename    => "$nice_name.jpeg",
            Disposition => 'attachment',
        );
    }
}
$msg->send;
17Receiving MMS via Email
18Conclusions
- CPAN had all the modules needed to automate interaction
- Messages on the O2 website expire. Email provides a good way to archive content
- WWW::Mechanize is very useful
- Radio buttons were the biggest challenge