Title: What is Computer Science Really About?
Main Points
- There is more to computer science than just programming.
- Computer science is about algorithms and computational thinking.
- We try to formalize activities into repeatable procedures and concrete decisions.
- Generalizing a procedure into an abstract algorithm helps us recognize whether there are known solutions, and how complex the problem is.
- Programming is just translating an algorithm into a specific syntax.
Some definitions...
- Computational thinking
  - translating processes/procedures into step-by-step activities with well-defined choice points and decision criteria
- Design and analysis of algorithms
  - expression of a procedure in terms of operations on abstract data structures like graphs, lists, strings, and trees
  - finite number of steps (clear termination conditions; it has to halt)
  - is the algorithm correct?
    - are all cases handled, or might it fail on certain inputs?
  - how much time will it take? how much space (memory)?
- Programming
  - translating algorithms into a specific language
- Software engineering
  - managing the development and life-cycle of a system, including design/specification, documentation, testing, use of components/libraries, release of new/updated versions
  - usually a team effort
Computational Thinking
- has spread into all kinds of fields, from cooking and gardening to transportation, medical diagnosis, and particle physics
- many intelligent activities are ill-defined, and CT is about formalizing them into concrete decisions and repeatable procedures
  - think about how to find a good place for dinner in a new town
  - think about how to choose a book to read (interest area? author or type of lit.? next in series? recommendations? available in library?)
  - finding Waldo (how do you search for shapes in images?)
  - think about how you recognize misspelled words, or words in a foreign language, or proper names
- mechanical analogies
  - think about how a thermostat does temperature control
  - think about the pattern of traffic signals at an intersection
  - think about how a soda machine works
- ultimately, we formalize these things into
  - flowcharts and pseudocode
  - abstractions like finite-state machines
[Flowchart: soda machine. Decisions: has $1.00 been inserted? is choice available? Actions: accept new coin, get user's soda choice, dispense soda can, dispense change, display current total, display "make another choice"]
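The soda machine can be sketched as a small finite-state machine. This is only an illustration under stated assumptions: the state names, the event format, and the function name `run_soda_machine` are hypothetical, and the price is taken to be $1.00 (100 cents) as in the flowchart.

```python
# Hypothetical soda machine modeled as a finite-state machine.
# Assumptions: price is 100 cents; events are ("coin", cents) or ("choice", name).
PRICE = 100

def run_soda_machine(events, in_stock):
    """Process a list of events and return the list of actions taken."""
    state = "COLLECTING"          # two states: COLLECTING and CHOOSING
    total = 0
    actions = []
    for kind, value in events:
        if state == "COLLECTING" and kind == "coin":
            total += value                        # accept new coin
            actions.append(f"display total {total}")
            if total >= PRICE:                    # has $1.00 been inserted?
                state = "CHOOSING"
        elif state == "CHOOSING" and kind == "choice":
            if value in in_stock:                 # is choice available?
                actions.append(f"dispense {value}")
                if total > PRICE:
                    actions.append(f"dispense change {total - PRICE}")
                total = 0
                state = "COLLECTING"
            else:
                actions.append("display make another choice")
        # any other event is ignored in the current state
    return actions

events = [("coin", 50), ("coin", 25), ("coin", 50),
          ("choice", "grape"), ("choice", "cola")]
print(run_soda_machine(events, in_stock={"cola"}))
```

Each branch corresponds to one box of the flowchart, which is the point of the exercise: every decision is explicit and every event has a defined outcome.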
Google's ideas on Computational Thinking
http://www.google.com/edu/computational-thinking/what-is-ct.html
Four components (example: baking a cake)
- DECOMPOSITION: breaking a problem into (decoupled) sub-problems
  - example: mixing dry ingredients, then wet ingredients
  - computationally: divide-and-conquer
- PATTERN RECOGNITION: identifying repeatable operations
  - example: crack egg1, crack egg2, crack egg3...
  - computationally: for/while-loops, sub-routines
- GENERALIZATION and ABSTRACTION
  - example: baking chocolate cake or carrot cake or pound cake is similar, except add/substitute a few different ingredients
  - computationally: adding parameters to code; also, can we apply the same procedure to other data like vectors, arrays, lists, trees, graphs?
- ALGORITHM DESIGN: formalize procedure into recipe others can use
  - example: define things like how you know when it is done (bake 30 min at 350? or until crust is golden...)
  - computationally: step-by-step procedure with clear initialization, decision and termination conditions
The earliest algorithm
- Euclid's algorithm for determining the GCD (greatest common divisor); the Chinese Remainder Theorem is a related but distinct result
- problem: given two integers m and n, find the largest integer d that divides each of them
- example: 4 divides both 112 and 40; is it the GCD? (no, the GCD is 8)
- repeatedly divide the smaller into the larger number and replace the larger with the remainder
- GCD(a,b)
  - if a < b, swap a and b
  - while b > 0
    - let r be the remainder of a/b
    - a ← b, b ← r
  - return a
- questions a Computer Scientist would ask
  - Does it halt? (note how b always shrinks with each pass, since the remainder is smaller than b). Is it correct? Is there a more efficient way to do it (that uses fewer steps)? Relationship to factoring and testing for prime numbers.
- trace for GCD(112,40)
  - a=112, b=40, a/b=2 with rem. 32
  - a=40, b=32, a/b=1 with rem. 8
  - a=32, b=8, a/b=4 with rem. 0
  - a=8, b=0, return 8
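As a minimal sketch, the GCD(a,b) steps above translate almost line for line into Python. The function name `gcd` and the assumption of non-negative integer inputs are choices made for this illustration.

```python
def gcd(a, b):
    """Euclid's algorithm, assuming a and b are non-negative integers."""
    if a < b:
        a, b = b, a          # make sure a is the larger number
    while b > 0:
        r = a % b            # remainder of a / b
        a, b = b, r          # b strictly shrinks, so the loop must halt
    return a

print(gcd(112, 40))
```

Running it on 112 and 40 follows the same four passes as the trace, ending with a=8 and b=0.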
...UCLA 92 Stanford 80 OklaSt 55 Iowa 61 Indiana 83 MichSt 82 ...
- while monitoring a stream of basketball scores, keep track of the 3 highest scores
- impractical to just save them all and sort
- how would you do it?
- algorithm design often starts with representation
  - imagine keeping 3 slots, for the highest scores seen so far
  - define the semantics or an invariant to maintain
    - A > B > C > all other scores
- with each new game score (p,q) (e.g. Aggies 118, Longhorns 50)
  - if p > A, then C ← B, B ← A, A ← p
  - else if p > B, then C ← B, B ← p
  - else if p > C, then C ← p
  - repeat this shifting with q
- questions to consider
  - what happens on first pass, before A, B, and C are defined?
  - what happens with ties?
  - should A, B, and C represent distinct games, or could 2 of them come from the same game?
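One possible way to write the three-slot update is sketched below. It makes some assumptions the slides leave open: the slots start at negative infinity so the first pass needs no special case, a tied score simply takes the next lower slot, and two slots may come from the same game. The function name `top_three` is hypothetical.

```python
def top_three(scores):
    """Return the 3 highest scores seen in a stream, in decreasing order."""
    a = b = c = float("-inf")     # assumption: empty slots rank below any real score
    for p in scores:
        if p > a:
            c, b, a = b, a, p     # shift everything down, new top score
        elif p > b:
            c, b = b, p
        elif p > c:
            c = p
    return a, b, c

# The stream from the slide, flattened to individual team scores.
stream = [92, 80, 55, 61, 83, 82]
print(top_three(stream))
```

Only three values are stored no matter how long the stream is, which is why there is no need to save and sort every score.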
Spell-checking
- given a document as a list of words, w_i, identify misspelled words and suggest corrections
- simple approach: use a dictionary
  - for each w_i, scan dictionary in sorted order
- can you do it faster? (the simple approach takes doc size N x dict size D steps)
  - suppose we sort both lists
  - sorting algs usually take N log2 N time
    - example: if doc has 10,000 words, sort in about 133,000 steps
  - assume you can call a sort sub-routine (reuse of code)
  - note: you will learn about different sorting algorithms (and related data structures like trees and hash tables) and analyze their computational efficiency in CSCE 211
  - can scan both lists in parallel (takes about D steps, since the dictionary is the longer list)
  - D + N log N < N x D
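The parallel scan of the two sorted lists could look roughly like this. It is a sketch, not a full spell-checker: the function name `find_misspelled` is hypothetical, words are compared exactly (no case folding or punctuation handling), and Python's built-in `sorted` stands in for the sort sub-routine.

```python
def find_misspelled(doc_words, dictionary):
    """Return document words that do not appear in the dictionary."""
    doc = sorted(set(doc_words))      # sort the document words (about N log N steps)
    dic = sorted(dictionary)          # the dictionary is assumed sortable the same way
    misspelled = []
    j = 0                             # position in the dictionary; never moves backward
    for w in doc:
        while j < len(dic) and dic[j] < w:
            j += 1                    # skip dictionary words that come before w
        if j == len(dic) or dic[j] != w:
            misspelled.append(w)
    return misspelled

words = ["acquiesce", "abolish", "occupashun", "act"]
dictionary = ["abolish", "acquiesce", "acquire", "act"]
print(find_misspelled(words, dictionary))
```

Because `j` only ever advances, the scan touches each dictionary entry at most once, instead of rescanning the dictionary for every word.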
- Words in a document like the US Declaration of Independence
- abdicated
- abolish
- abolishing
- absolute
- absolved
- abuses
- accommodation
- accordingly
- accustomed
- acquiesce
- act
- acts
- administration
- affected
- after
- against
- ages
- Words in the English Dictionary
- ...
- achieve
- achromatic
- acid
- acidic
- acknowledge
- acorn
- acoustic
- acquaint
- acquaintance
- acquiesce
- acquiescent
- acquire
- acquisition
- acquisitive
- acquit
- acquittal
- acquitting
note that this list is denser
[Figure: the misspelling occupashun aligned letter by letter with candidate corrections: occupation (d=3), occurrence (d=6), occl-usion (d=5)]
- suggesting spelling corrections
  - requires defining closest match
    - fewest different letters? same length?
  - how to efficiently find all words in dictionary with minimal distance?
- does context matter?
  - used as noun or verb (affect vs. effect)
  - nearby words in same subject area?
- what about dialects (Olde English) or words in a foreign language (Japanese)?
  - The sensei showed the student how to catch a flish.
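One common way to define "closest match" is edit distance (Levenshtein distance): the fewest single-letter insertions, deletions, and substitutions needed to turn one word into another. The slides do not say which distance they use, so this is a swapped-in technique, and the function names below are hypothetical.

```python
def edit_distance(s, t):
    """Levenshtein distance between strings s and t (dynamic programming)."""
    prev = list(range(len(t) + 1))            # distances from "" to prefixes of t
    for i, cs in enumerate(s, start=1):
        curr = [i]                            # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def closest_words(word, dictionary):
    """Return all dictionary words at minimal edit distance from word."""
    best = min(edit_distance(word, w) for w in dictionary)
    return [w for w in dictionary if edit_distance(word, w) == best]

print(closest_words("occupashun", ["occupation", "occurrence", "occlusion"]))
```

This brute-force search compares the word against every dictionary entry, so it does not answer the efficiency question raised above; it only pins down what "minimal distance" could mean.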
Summary about Computational Thinking
- I didn't actually tell you how to do it; you will learn it in your classes.
- CT is about transforming (ill-defined) activities into concrete, well-defined procedures.
  - a finite sequence of steps anybody could follow, with well-defined decision criteria and termination conditions
- take-home message: the following components are important to CT
  - decomposition (breaking problem into pieces)
  - identifying patterns and repetition
  - abstraction and generalization
Design and Analysis of Algorithms
- Why we study algorithms
  - many tasks can be reduced to abstract problems
  - if we can recognize them, we can use known solutions
- example: graph algorithms
  - graphs could represent friendships among people, or adjacency of states on a map, or links between web pages...
  - determining connected components
    - MapQuest: can't reach city A from city B if they are on different islands/continents
  - finding shortest path between two points
  - finding cliques
    - completely connected sub-graphs
  - uniquely matching up pairs of nodes
    - e.g. a buddy system based on friendships
  - determining whether 2 graphs have same connectivity (isomorphism)
    - useful for visual shape recognition
  - finding an acyclic tree that spans all nodes
    - minimal-cost communication networks
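To illustrate the first item, connected components can be found with a breadth-first search. This sketch assumes an undirected graph stored as a dictionary from each node to a list of its neighbors, with every node present as a key; the city names are made up.

```python
from collections import deque

def connected_components(graph):
    """Return a list of components, each a sorted list of nodes."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue                      # already placed in an earlier component
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        components.append(sorted(component))
    return components

# Hypothetical road map: A, B, C are on one island, D and E on another.
roads = {"A": ["B"], "B": ["A", "C"], "C": ["B"], "D": ["E"], "E": ["D"]}
print(connected_components(roads))
```

Two cities are reachable from each other exactly when they land in the same component, which is the check a route planner needs before searching for a path.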
- characterize algorithms in terms of efficiency
  - note: we count number of steps, rather than seconds
    - wall-clock time is dependent on machine, compiler, load, etc...
    - however, optimizations are important for real-time sys., games
  - are there faster ways to sort a list? invert a matrix? find a completely connected sub-graph?
  - scalability for larger inputs (think human genome): how much more time/memory does the algorithm take?
  - polynomial vs. exponential run-time (in the worst case)
  - depends a lot on the data structure (representation)
    - hash tables, binary trees, etc. can help a lot
- proofs of correctness
  - can you prove Euclid's algorithm is correct?
  - can you prove an algorithm is guaranteed to output the longest palindrome in a string?
  - is the code for billing long-distance calls correct?
- Why do we care so much about polynomial run-time?
  - consider 2 programs that take an input of size n (e.g. length of a string, number of nodes in a graph, etc.)
  - run-time of one scales up as n^2 (polynomial), and the other as 2^n (exponential)
  - the latter are effectively unsolvable for n > 16, even if we used computers that were 10 times as fast!
a computational cliff
n n^2 2^n
1 1 2
2 4 4
3 9 8
4 16 16
5 25 32
6 36 64
7 49 128
8 64 256
9 81 512
10 100 1024
11 121 2048
12 144 4096
13 169 8192
14 196 16384
15 225 32768
16 256 65536
17 289 131072
18 324 262144
19 361 524288
20 400 1048576
21 441 2097152
22 484 4194304
23 529 8388608
24 576 16777216
25 625 33554432
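The effect of a computer that is 10 times as fast can be checked with a little arithmetic. The sketch below assumes a hypothetical budget of one million steps and asks for the largest input size n that fits within it, before and after a 10x speedup; the function names are made up for this illustration.

```python
def largest_n(cost, budget):
    """Largest n such that cost(n) steps fit within the step budget."""
    n = 0
    while cost(n + 1) <= budget:
        n += 1
    return n

def quadratic(n):
    return n ** 2

def exponential(n):
    return 2 ** n

budget = 10 ** 6                      # assumed budget, chosen only for illustration
for name, cost in [("n^2", quadratic), ("2^n", exponential)]:
    before = largest_n(cost, budget)
    after = largest_n(cost, 10 * budget)
    print(name, before, after)
```

With these numbers the quadratic program goes from n = 1000 to n = 3162, while the exponential one only moves from n = 19 to n = 23: a tenfold speedup adds about log2(10), or roughly 3.3, to the input size an exponential algorithm can handle.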
Moore's Law (named after Gordon Moore, co-founder of Intel)
- Computers keep getting faster
- Number of transistors on CPU chips appears to double about once every 18 months
- Similar statements hold for CPU speed, disk capacity, network bandwidth, etc.
(but waiting a couple of years for computers to get faster is not an effective solution to NP-hard problems)
source: Wikipedia
P vs. NP (CSCE 411)
- some rough definitions
  - problems in P: solvable in polynomial time with a deterministic algorithm
    - examples: sorting a list, inverting a matrix...
  - problems in NP: solvable in polynomial time with a non-deterministic algorithm
    - given a guess, can check if it is a solution in polynomial time
    - example: given a set of k vertices in a graph, can check if they form a completely connected clique, but there are exponentially many possible sets to choose from
- most computer scientists believe P ≠ NP, though it has yet to be rigorously proved
[Figure: diagram of complexity classes labeled P, NP, and even harder problems (complexity classes)]
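The clique example can be made concrete: checking a guessed set of vertices is cheap, while searching over all possible sets is not. This is a sketch under assumptions: the graph is undirected and stored as a dictionary from each vertex to the set of its neighbors, and the function names are hypothetical.

```python
from itertools import combinations

def is_clique(graph, vertices):
    """Check a guess: every pair of the given vertices must be adjacent.

    For k vertices this tests k*(k-1)/2 pairs, which is polynomial in k.
    """
    return all(v in graph[u] for u, v in combinations(vertices, 2))

def find_clique(graph, k):
    """Brute-force search: try every k-subset of vertices until one passes."""
    for subset in combinations(graph, k):
        if is_clique(graph, subset):
            return subset
    return None

graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(is_clique(graph, [1, 2, 3]))
print(find_clique(graph, 3))
```

The search in `find_clique` may have to try every k-subset of the n vertices, and the number of such subsets can grow exponentially with n, which is where the difficulty lies.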
- Being able to recognize whether a problem is in P or is NP-hard is fundamentally important to a computer scientist
- Many combinatorial problems are NP-hard
  - knapsack problem (given n items with size w_i and value v_i, choose items that fit into a knapsack with a limited capacity of L so as to maximize total value)
  - clique (largest completely connected sub-graph)
  - traveling salesman problem (shortest circuit visiting every city)
- Finding the shortest path in a graph between 2 nodes is in P
  - there is an algorithm that scales up polynomially with size of graph: Dijkstra's algorithm
  - however, finding the longest path is NP-hard! (hence we do not expect there are complete and efficient solutions for all cases)
- Applications to logistics, VLSI circuit design...
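For the shortest-path case, a compact version of Dijkstra's algorithm using a priority queue is sketched here. It assumes non-negative edge weights and a graph stored as a dictionary from each node to a list of (neighbor, weight) pairs; the function name and the example graph are made up.

```python
import heapq

def shortest_path_length(graph, source, target):
    """Length of the shortest path from source to target, or None if unreachable."""
    dist = {source: 0}
    heap = [(0, source)]                      # (distance so far, node)
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return d                          # first time target is popped, d is optimal
        if d > dist.get(node, float("inf")):
            continue                          # stale queue entry, a shorter route was found
        for nbr, weight in graph[node]:
            nd = d + weight
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return None

graph = {
    "A": [("B", 4), ("C", 1)],
    "B": [("D", 1)],
    "C": [("B", 2), ("D", 5)],
    "D": [],
}
print(shortest_path_length(graph, "A", "D"))
```

Each edge is examined at most once per improvement and each queue operation is logarithmic, so the running time stays polynomial in the size of the graph.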
- If a problem is NP-hard, there might be an approximation algorithm to solve it efficiently (in polynomial time)
  - However, it is important to determine the error bounds.
  - For example, it might find a path that is no more than twice the optimal length
- A simple greedy algorithm for the knapsack problem
  - put in item with largest value-to-weight ratio first, then next largest, and so on...
  - can show that this, together with a check against the single most valuable item, fills the knapsack to within a factor of 2 of the optimal value
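A sketch of the greedy knapsack heuristic follows. It assumes items are (value, weight) pairs with positive weights, and it includes the standard safeguard of comparing the greedy result against the single most valuable item that fits, since the factor-of-2 bound relies on that comparison. The function name is hypothetical.

```python
def greedy_knapsack(items, capacity):
    """Approximate 0/1 knapsack. items is a list of (value, weight) pairs.

    Returns (chosen items, total value).
    """
    # Consider items in order of decreasing value-to-weight ratio.
    ranked = sorted(items, key=lambda it: it[0] / it[1], reverse=True)
    chosen, total_value, remaining = [], 0, capacity
    for value, weight in ranked:
        if weight <= remaining:
            chosen.append((value, weight))
            total_value += value
            remaining -= weight
    # Safeguard: a single valuable item may beat the whole greedy selection.
    fitting = [it for it in items if it[1] <= capacity]
    if fitting:
        best = max(fitting, key=lambda it: it[0])
        if best[0] > total_value:
            return [best], best[0]
    return chosen, total_value

print(greedy_knapsack([(60, 10), (100, 20), (120, 30)], capacity=50))
print(greedy_knapsack([(2, 1), (100, 100)], capacity=100))
```

In the first call the greedy choice yields 160 while the optimum is 220, within the factor of 2. The second call shows why the safeguard matters: the ratio rule alone would pack only the small item worth 2 and leave no room for the item worth 100.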