Title: Algorithms
1Algorithms
2Administrivia
- Homework Assignment 6
- If you forgot to put your name on it, let me know
- Homework Assignment 7
- Due next Tuesday
- Lab 6 (Visual Basic Part 2)
- This week due Friday
3The big picture
- We built a computer
- We built an operating system to control the computer
- We attached the computer to a network
- We wrote a compiler to make programming the computer easier
- We share CPU and disk across the network
- Need to talk about algorithms
4Algorithms
- Recipes for doing computations
- The underpinnings of programming
- Think out your algorithm
- Show that it works
- Determine its efficiency
- Write it as a program
5What is an algorithm
- Algorithm is a recipe
- Has
- Inputs
- Rules
- Evaluation Criteria
- Output
6When do we use algorithms?
- Always!
- Assignment 5
- Step 1 -- Create a message of between 150 and 200 characters that you wish to transmit.
- Step 2 -- Give an encoding of the alphabet
- Step 3 -- Use the compression ideas we discussed to compress your message.
- Step 4 -- Write your compressed message as a sequence of hexadecimal digits in this encoding.
- Step 5 -- Now you are ready to create the message to be hidden. Your message will
- Step 6 -- We now consider a picture that could be displayed on your web page.
7Examples of problems
- Baking cookies
- Putting things in alphabetical order
- Being a web search engine
8Chocolate chip cookies
9Chocolate chip cookies
- Input
- flour (2 ¼ c)
- baking soda (1t)
- salt (1t)
- butter (1c)
- granulated sugar (3/4 c)
- brown sugar (3/4 c)
- vanilla (1t)
- eggs (2)
- chocolate chip morsels (2c)
- chopped nuts (1c)
- Output
- 5 dozen cookies
10Chocolate chip cookies
- Steps in the algorithm
- Combine flour, baking soda, and salt in small bowl
- Beat butter, granulated sugar, brown sugar and vanilla in large bowl
- Add eggs one at a time, beating after adding each egg
- Gradually beat in flour mixture
- Stir in morsels and nuts
- Drop by rounded tablespoons onto ungreased baking sheets
- Bake 9-11 minutes
- Let stand for 2 minutes
11Chocolate chip cookie algorithm
- Primitives
- Inputs
- Flour, baking soda, salt, butter, brown sugar, granulated sugar, vanilla, egg, morsels, nuts
- Alternatively, chocolate chip cookie mix
- Alternatively, wheat, sugar cane, hen, ...
- Operators
- Combine, Beat, Gradually beat, Stir, Drop, Bake, Let stand
12Chocolate chip cookie algorithm
- Execution
- First 2 steps can be done in parallel?
- Parbegin (Combine(),Beat()) Parend
- Machine dependencies
- Ovens vary (Bake 9-11 minutes)
- Ingredients vary and so need to be handled differently
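The Parbegin/Parend idea above can be sketched in Python. This is only an illustration: the function names (`combine_dry`, `beat_wet`, `start_dough`) are hypothetical, and a thread pool stands in for whatever "two cooks working at once" means on a real machine.

```python
from concurrent.futures import ThreadPoolExecutor


def combine_dry():
    """Step 1: combine flour, baking soda, and salt in the small bowl."""
    return ["flour", "baking soda", "salt"]


def beat_wet():
    """Step 2: beat butter, both sugars, and vanilla in the large bowl."""
    return ["butter", "granulated sugar", "brown sugar", "vanilla"]


def start_dough():
    # Parbegin: submit both independent steps so they may run at the same time.
    with ThreadPoolExecutor(max_workers=2) as pool:
        small_bowl = pool.submit(combine_dry)
        large_bowl = pool.submit(beat_wet)
        # Parend: wait for both steps before any later step may start.
        return small_bowl.result(), large_bowl.result()
```

The two steps can run in parallel only because neither needs the other's output; the later "Gradually beat in flour mixture" step needs both bowls and so has to wait.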
13Chocolate chip cookie algorithm
- Algorithm testing
- Proof of the pudding is in the eating
- How do we mechanize this?
14Chocolate chip cookie algorithm
- Comparing different algorithms
- Quality of input/output map
- User time
- Machine (oven) time
15Putting things in alphabetical order
- Data set sizes
- Course list for COS 111: 40 students
- PU directory assistance: 10,000 people
- Manhattan phone book: 1 million people
- Social Security database: 1 billion records
- Long distance call billing records: 100 billion/year
- Different methods for different tasks
- Fast for large
- Simple for small
16A simple method for sorting
- Find smallest value -- put it first in list
- Find second smallest value -- put it second
- ...
- Find next smallest value -- put it next
- ...
- When no more values, you're done
17How it works
18How it works
Find smallest value -- put it first in list
19How it works
Find second smallest value -- put it second
20How it works
Finish the sorting
21A simple method for sorting
- To sort array x = x1, x2, ..., xn
- For I = 1 to n
- For J = I+1 to n
- If (xI > xJ) Then swap their values
- next
- next
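The simple sorting method above can be sketched in Python as follows. The function name `simple_sort` is an assumption, and the indices are 0-based where the slide counts from 1.

```python
def simple_sort(values):
    """Sort by repeatedly moving the smallest remaining value into place."""
    x = list(values)  # work on a copy so the caller's list is unchanged
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            # If a later value is smaller, swap it into position i.
            if x[i] > x[j]:
                x[i], x[j] = x[j], x[i]
    return x
```

After the pass for position `i` finishes, `x[i]` holds the smallest of the values that were still unsorted, which is why the result is sorted once the outer loop ends.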
22Another sorting algorithm
- Sorting by Merging
- Key idea: it's easy to merge 2 sorted lists
- Sort larger lists by
- Sort smaller lists
- Merge the results
- How do we sort smaller lists?
23Merging 2 sorted lists
24Merging 2 sorted lists
Start at the top of each list
25Merging 2 sorted lists
190 is bigger than 155
26Merging 2 sorted lists
Record 155 and move the arrow
27Merging 2 sorted lists
190 is less than 255
28Merging 2 sorted lists
Finished when at the end of each list
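The merging steps shown on these slides can be sketched as below. The name `merge` and the use of two index "arrows" `i` and `j` are illustrative choices, not taken from the slides.

```python
def merge(left, right):
    """Merge two already-sorted lists into one sorted list."""
    merged = []
    i = j = 0  # the "arrows": start at the top of each list
    while i < len(left) and j < len(right):
        # Record the smaller of the two current values and move its arrow.
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One list is used up; whatever remains in the other is already sorted.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```

For instance, merging a list that starts with 190 and a list that starts with 155 records 155 first, matching the "190 is bigger than 155" step.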
29Sort then merge
Subdivide
30Sort then merge
Sort pieces By merging
Subdivide
31Sort then merge
Sort pieces By merging
Merge
Subdivide
32SortMerge algorithm
- Function SortMerge(x, lo, n)
- If lo >= n then
- Return
- End if
- Mid = (lo + n) \ 2
- SortMerge(x, lo, Mid)
- SortMerge(x, Mid + 1, n)
- Merge(x, lo, Mid, Mid + 1, n)
- End Function
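A minimal Python sketch of the SortMerge idea follows. It assumes 0-based indices and an explicit lower bound so that the recursive call on the upper half is well defined, and it borrows `heapq.merge` from the standard library for the merge step rather than a hand-written Merge.

```python
from heapq import merge


def sort_merge(x, lo, hi):
    """Sort x[lo..hi] (inclusive) in place by sorting halves and merging."""
    if lo >= hi:
        return  # zero or one element: nothing to do, so the recursion stops
    mid = (lo + hi) // 2  # integer division keeps mid strictly below hi
    sort_merge(x, lo, mid)
    sort_merge(x, mid + 1, hi)
    # Merge the two sorted halves back into the same slice of x.
    x[lo:hi + 1] = list(merge(x[lo:mid + 1], x[mid + 1:hi + 1]))
```

A call such as `sort_merge(data, 0, len(data) - 1)` sorts the whole list. The stopping test and the integer division are the two places where it is easiest to get an endless recursion.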
33Does it work?
- Have to be careful about stopping
- There are always a lot of things going on
34Divide and conquer
- Use recursion
- reduce solving for problem of size n to solving two problems of size n/2
- then combine the solutions
- S(n) = 2 S(n/2) + M(n/2, n/2)
- Solving a sorting problem of size n requires solving 2 sorting problems of size n/2 and doing a merge of 2 sets of size n/2
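The recurrence can be unrolled to see where the running time comes from. The derivation below is a sketch under two assumptions that are not on the slide: merging two lists of size n/2 takes at most cn steps for some constant c, and n is a power of two.

```latex
\begin{align*}
S(n) &= 2\,S(n/2) + M(n/2,\,n/2) \\
     &\le 2\,S(n/2) + cn \\
     &\le 4\,S(n/4) + 2cn \\
     &\le \dots \\
     &\le 2^{k}\,S(n/2^{k}) + k\,cn \\
     &= n\,S(1) + cn\log_2 n \qquad (k = \log_2 n)
\end{align*}
```

Under these assumptions the cost grows like n log n, compared with roughly n^2/2 comparisons for the simple double-loop method.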
35Comparing running times
36Comparing running times
Reducing 20 hours to 3 seconds
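To get a feel for the size of the gap, the sketch below counts comparisons rather than seconds. It assumes about n(n-1)/2 comparisons for the simple method and about n log2 n for sorting by merging; the timings on the slide depend on hardware that is not described here, so the code does not try to reproduce them.

```python
import math


def simple_sort_comparisons(n):
    """Comparisons made by the double-loop method on n items."""
    return n * (n - 1) // 2


def merge_sort_comparisons(n):
    """Rough estimate of comparisons when sorting n items by merging."""
    return n * math.log2(n)


def speedup(n):
    """How many times fewer comparisons sorting by merging needs."""
    return simple_sort_comparisons(n) / merge_sort_comparisons(n)


# Data set sizes taken from the "Putting things in alphabetical order" slide.
for name, n in [("COS 111 course list", 40),
                ("PU directory", 10_000),
                ("Manhattan phone book", 1_000_000)]:
    print(f"{name}: about {speedup(n):,.0f}x fewer comparisons")
```

For the small course list the two methods are close, which is why a simple method is fine for small inputs.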
37Searching
- Once a list is in alphabetical order, how do you find things in it?
- For example, is COS 111 on the list of courses that satisfy the (EC) Epistemology and Cognition requirement?
38EC courses
AAS 391 ANT 201 COS 302 FRS 135 FRS 137 GER 306 HUM 365 LIN 213 LIN 302 LIN 306 LIN 315 PHI 200
PHI 201 PHI 204 PHI 301 PHI 304 PHI 312 PHI 321 PHI 333 PHI 338 PSY 255 PSY 306 PSY 307 PSY 316
39Searching for COS 111
AAS 391 ANT 201 COS 302 FRS 135 FRS 137 GER 306 HUM 365 LIN 213 LIN 302 LIN 306 LIN 315 PHI 200
PHI 201 PHI 204 PHI 301 PHI 304 PHI 312 PHI 321 PHI 333 PHI 338 PSY 255 PSY 306 PSY 307 PSY 316
COS 111
Compare to the middle
40Searching
AAS 391 ANT 201 COS 302 FRS 135 FRS 137 GER 306 HUM 365 LIN 213 LIN 302 LIN 306 LIN 315 PHI 200
PHI 201 PHI 204 PHI 301 PHI 304 PHI 312 PHI 321 PHI 333 PHI 338 PSY 255 PSY 306 PSY 307 PSY 316
If larger search second half
If smaller search first half
COS 111
Compare to the middle
41Repeat
If smaller search first half
AAS 391 ANT 201 COS 302 FRS 135 FRS 137 GER 306 HUM 365 LIN 213 LIN 302 LIN 306 LIN 315 PHI 200
If larger search second half
COS 111
Compare to the middle
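The "compare to the middle, then repeat on one half" search can be sketched as follows. The course list is the EC list from the slides; the function name `binary_search` and the convention of returning -1 for "not found" are assumptions.

```python
EC_COURSES = [
    "AAS 391", "ANT 201", "COS 302", "FRS 135", "FRS 137", "GER 306",
    "HUM 365", "LIN 213", "LIN 302", "LIN 306", "LIN 315", "PHI 200",
    "PHI 201", "PHI 204", "PHI 301", "PHI 304", "PHI 312", "PHI 321",
    "PHI 333", "PHI 338", "PSY 255", "PSY 306", "PSY 307", "PSY 316",
]


def binary_search(sorted_items, target):
    """Return the position of target in sorted_items, or -1 if absent."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        if target > sorted_items[mid]:
            lo = mid + 1  # larger: search the second half
        else:
            hi = mid - 1  # smaller: search the first half
    return -1
```

Each comparison halves the part of the list still under consideration, so a list of 24 courses needs at most 5 comparisons. Plain string comparison works here because every entry has the same "DEPT NNN" shape.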
42Building indices
AAS 391 ANT 201 COS 302 FRS 135 FRS 137 GER 306 HUM 365 LIN 213 LIN 302 LIN 306 LIN 315 PHI 200
PHI 201 PHI 204 PHI 301 PHI 304 PHI 312 PHI 321 PHI 333 PHI 338 PSY 255 PSY 306 PSY 307 PSY 316
43Search indices then data
AAS 391 ANT 201 COS 302 FRS 135 FRS 137 GER 306 HUM 365 LIN 213 LIN 302 LIN 306 LIN 315 PHI 200
PHI 201 PHI 204 PHI 301 PHI 304 PHI 312 PHI 321 PHI 333 PHI 338 PSY 255 PSY 306 PSY 307 PSY 316
COS 111
44How do we describe algorithms?
- Pseudocode
- Combines English, Visual Basic constructs
- Works with various types of primitives
- Could be +, -, *, /
- Could be more complex things
- Describes how data is organized
- Describes operations on the data
- Is meant to be higher level than programming
45Searching with indices (pseudocode)
- Build the indices
- Do this by going through the list and determining where department names change
- Store the results in an array called Indices
- Search the indices
- Do a binary search on the array Indices
- Do this by comparing to the middle element
- Then use binary search to compare to the upper half
- Or use binary search to compare to the lower half
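One way to turn this pseudocode into a program is sketched below. It assumes each index entry is a (department, starting position) pair and uses `bisect_left` from the standard library for the binary searches; the function names `build_indices` and `find_course` are hypothetical.

```python
from bisect import bisect_left

COURSES = [
    "AAS 391", "ANT 201", "COS 302", "FRS 135", "FRS 137", "GER 306",
    "HUM 365", "LIN 213", "LIN 302", "LIN 306", "LIN 315", "PHI 200",
    "PHI 201", "PHI 204", "PHI 301", "PHI 304", "PHI 312", "PHI 321",
    "PHI 333", "PHI 338", "PSY 255", "PSY 306", "PSY 307", "PSY 316",
]


def build_indices(courses):
    """Record (department, position) wherever the department name changes."""
    indices = []
    for pos, course in enumerate(courses):
        dept = course.split()[0]
        if not indices or indices[-1][0] != dept:
            indices.append((dept, pos))
    return indices


def find_course(courses, indices, target):
    """Search the indices first, then only that department's courses."""
    dept = target.split()[0]
    depts = [d for d, _ in indices]
    i = bisect_left(depts, dept)  # binary search on the index array
    if i == len(depts) or depts[i] != dept:
        return -1  # no such department on the list
    start = indices[i][1]
    end = indices[i + 1][1] if i + 1 < len(indices) else len(courses)
    j = bisect_left(courses, target, start, end)  # binary search in the data
    return j if j < end and courses[j] == target else -1
```

With `indices = build_indices(COURSES)`, a lookup for COS 111 finds the COS department in the index and then checks only the single COS entry in the data.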
46Building a web search engine
- Crawl the web
- Organize the results for fast query processing
- Process queries
47Crawl the web
- Every month use TCP/IP to go to all reachable web pages
- 1.5B pages, 10 Kbytes/page, so 15 terabytes
- Can compress an average page to 3Kbytes
- Numeracy
- Crawl 1.5B pages in 14 days so
- Crawl 100M pages per day
- Crawl 4M pages per hour
- Crawl 1,000 pages per second
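The numeracy on this slide can be checked with a few lines of arithmetic. The sketch assumes decimal units (1 Kbyte = 1,000 bytes, 1 terabyte = 10^12 bytes); the constant and function names are illustrative.

```python
PAGES = 1.5e9                      # pages to crawl
BYTES_PER_PAGE = 10_000            # 10 Kbytes per page
COMPRESSED_BYTES_PER_PAGE = 3_000  # 3 Kbytes per compressed page
CRAWL_DAYS = 14


def crawl_budget():
    """Return storage (terabytes) and crawl rates implied by the figures."""
    per_day = PAGES / CRAWL_DAYS
    per_hour = per_day / 24
    per_second = per_hour / 3600
    return {
        "raw_terabytes": PAGES * BYTES_PER_PAGE / 1e12,
        "compressed_terabytes": PAGES * COMPRESSED_BYTES_PER_PAGE / 1e12,
        "pages_per_day": per_day,
        "pages_per_hour": per_hour,
        "pages_per_second": per_second,
    }
```

The exact rates come out slightly above the slide's round numbers (about 107M pages per day and a little over 1,200 per second), so the slide's figures are order-of-magnitude estimates.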
48Organize the results
- Put into alphabetical order
- Build indices
- Make multiple copies so that searching can proceed in parallel.
- When you update, you rebuild the indices
49Process queries
- Look up indices
- Look up words/phrases
- Advertiser can buy a word or phrase
- This search gives you internal addresses of web pages
- Look them up to build results page
50Searching time
- Want to answer a query in less than ½ second
- Use PageRank to get good results
51Page Rank
- The web is a collection of links
- A document's importance is determined by
- How many pages point to it
- How important those pages are
- This is its PageRank
- Used for determining
- How often to crawl a page
- How to order pages presented.
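The "importance flows along links" idea can be sketched as a small iterative computation. This is a simplified textbook-style version, not the production algorithm: the damping value 0.85, the fixed number of iterations, and the four-page example web are all assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it points to.

    Every linked-to page must also appear as a key of links.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal importance
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            # A page with no outgoing links shares its rank with everyone.
            targets = outgoing if outgoing else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: C is pointed to by A, B, and D.
EXAMPLE_WEB = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
```

In the example, C ends up with the highest rank because three pages point to it, and A ranks second because the important page C points to it. D, which nothing points to, ranks last.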
52Remaining subtask
- Matching strings
- Is this the word computer?
- Comparing strings
- Did the word computer occur before or after?
53How does string matching work?
- State machines
- Move along states as long as you keep matching
- Back off when you miss a match
54State machine looking for abcd
[State diagram: Sa --Read a--> Sb --Read b--> Sc --Read c--> Sd --Read d--> OK; Other input returns to Sa]
55State machine looking for abcd
What happens if input is abccadbacabcd?
Sa Sb Sc Sd Sa Sb Sa Sa Sb Sa Sb Sc Sd OK
56State machine looking for abcd
What happens if input is abcabcd?
Sa Sb Sc Sd Sa Sa Sa Sa (never reaches OK, so the match is missed)
57State machine looking for abcd
[State diagram: Sa --Read a--> Sb --Read b--> Sc --Read c--> Sd --Read d--> OK; Read a in Sb, Sc, or Sd goes to Sb; Other input returns to Sa]
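Both state machines for abcd can be sketched in a few lines. Here the state is a number (0 for Sa, 1 for Sb, and so on); the function names are hypothetical, and the back-off rule in the second function is only valid for patterns such as abcd whose first letter does not occur again later in the pattern.

```python
def naive_match(text, pattern="abcd"):
    """First machine: any unexpected character sends us back to the start."""
    state = 0
    for ch in text:
        if ch == pattern[state]:
            state += 1
            if state == len(pattern):
                return True  # reached OK
        else:
            state = 0
    return False


def backoff_match(text, pattern="abcd"):
    """Second machine: an unexpected 'a' counts as the start of a new match."""
    state = 0
    for ch in text:
        if ch == pattern[state]:
            state += 1
            if state == len(pattern):
                return True  # reached OK
        elif ch == pattern[0]:
            state = 1  # "Read a" goes to Sb instead of back to Sa
        else:
            state = 0
    return False
```

On the input abcabcd the first machine throws away the second a and never reaches OK, while the second machine treats that a as a fresh start and finds the match.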
58Larger search challenges
- Allow strings to have don't cares
- Starts with a and ends with e
- Has some number of copies of the substring ab
- Finding strings close to your string
- For spelling correction
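These challenges map onto tools in Python's standard library. The sketch below uses `re` for the "don't care" patterns and `difflib.get_close_matches` for finding nearby strings. Reading "some number of copies of ab" as "one or more copies and nothing else" is one possible interpretation, and the word list is made up for illustration.

```python
import re
from difflib import get_close_matches


def starts_a_ends_e(word):
    """True if word starts with a and ends with e; the middle is don't care."""
    return re.fullmatch(r"a.*e", word) is not None


def copies_of_ab(word):
    """True if word is one or more copies of the substring ab."""
    return re.fullmatch(r"(ab)+", word) is not None


def suggest(word, dictionary):
    """Return the dictionary word closest to word, or None if none is close."""
    matches = get_close_matches(word, dictionary, n=1)
    return matches[0] if matches else None


# Hypothetical dictionary for the spelling-correction example.
WORDS = ["correction", "collection", "creation"]
```

With these definitions, `starts_a_ends_e("apple")` and `copies_of_ab("ababab")` both hold, and `suggest("corection", WORDS)` picks "correction".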
59Algorithms -- summary
- Methods of modeling processes
- Understand at a high level
- Make sure your reasoning is correct
- Worry about efficiency in situations where that matters
- Write as pseudocode
60What's next
- Problems for which there are no algorithms
- Problems for which all algorithms run slowly
- Applications of problems where algorithms run slowly