Title: Python Data Structures
1Python Data Structures
- LING 5200
- Computational Corpus Linguistics
- Martha Palmer
2An Overview of Python
3Basic Datatypes
- Integers (default for numbers)
- z = 5 / 2    # Answer is 2, integer division.
- Floats
- x = 3.456
- Strings
- Can use "" or '' to specify. "abc" 'abc' (Same thing.)
- Unmatched ones can occur within the string. "matt's"
- Use triple double-quotes for multi-line strings or strings that contain both ' and " inside of them: """a'b"c"""
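The datatypes above can be tried out directly. This is only a sketch written for Python 3, which is an assumption: the slides use Python 2, where `5 / 2` gives 2. In Python 3 the `//` operator gives that integer result, while `/` gives 2.5.

```python
# Sketch for Python 3; the slides themselves use Python 2 semantics.
z = 5 // 2          # floor division: the integer result the slide describes
x = 3.456           # a float

# Single and double quotes build the same string.
same = ("abc" == 'abc')

# An unmatched quote of the other kind can sit inside the string.
name = "matt's"

# Triple quotes allow both quote characters inside one string.
both = """a'b"c"""
```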
4Whitespace
- Whitespace is meaningful in Python: especially indentation and placement of newlines.
- Use a newline to end a line of code. (Not a semicolon like in C or Java.) (Use \ when you must go to the next line prematurely.)
- No braces { } to mark blocks of code in Python. Use consistent indentation instead. The first line with less indentation is considered outside of the block.
- Often a colon appears at the start of a new block. (We'll see this later for function and class definitions.)
5Comments
- Start comments with #: the rest of the line is ignored.
- Can include a "documentation string" as the first line of any new function or class that you define.
- The development environment, debugger, and other tools use it; it's good style to include one.
- def my_function(x, y):
-     """This is the docstring. This function does blah blah blah."""
-     # The code would go here...
6Defining Functions
- No header file or declaration of types of function or arguments.
- def get_final_answer(filename):
-     """Documentation String"""
-     line1
-     line2
-     return total_counter
7Python and Types
- Python determines the data types in a program automatically. "Dynamic Typing"
- But Python's not casual about types, it enforces them after it figures them out. "Strong Typing"
- So, for example, you can't just append an integer to a string. You must first convert the integer to a string itself.
- x = "the answer is "    # Decides x is string.
- y = 23    # Decides y is integer.
- print x + y    # Python will complain about this.
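The complaint mentioned above can be seen in a small sketch like the following. It is written for Python 3 (an assumption; the slides use the Python 2 print statement), and the names combined, error_message, and fixed are introduced here only for illustration.

```python
x = "the answer is "   # x refers to a string
y = 23                 # y refers to an integer

try:
    combined = x + y   # mixing str and int is rejected
except TypeError as err:
    combined = None
    error_message = str(err)

# Converting the integer first makes the concatenation legal.
fixed = x + str(y)
```

The conversion has to be explicit; Python does not guess which of the two types was intended.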
8Calling a Function
- The syntax for a function call is:
- >>> def myfun(x, y):
-         return x * y
- >>> myfun(3, 4)
- 12
- Parameters in Python are "Call by Assignment."
- Sometimes acts like "call by reference" and sometimes like "call by value" in C++. Depends on the data type.
- We'll discuss mutability of data types later: this will specify more precisely how function calls behave.
9Functions without returns
- All functions in Python have a return value, even ones without a specific "return" line inside the code.
- Functions without a "return" will give the special value None as their return value.
- None is a special constant in the language.
- None is used like NULL, void, or nil in other languages.
- None is also logically equivalent to False (it tests as false in a condition).
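A short sketch of the points above, assuming a hypothetical function greet that builds a value but never returns it:

```python
def greet(name):
    """Build a greeting but forget to return it."""
    message = "Hello, " + name   # no return statement follows


result = greet("world")

is_none = result is None            # the implicit return value is None
is_falsy = not result               # None counts as false in a condition
equals_false = (result == False)    # but None is not equal to False itself
```

So "logically equivalent to False" is best read as "treated as false by if and while", not as "the same value as False".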
10Names and References 1
- Python has no pointers like C or C++. Instead, it has "names" and "references". (Works a lot like Lisp or Java.)
- You create a name the first time it appears on the left side of an assignment expression: x = 3
- Names store "references" which are like pointers to locations in memory that store a constant or some object.
- Python determines the type of the reference automatically based on what data is assigned to it.
- It also decides when to delete it via garbage collection after any names for the reference have passed out of scope.
11Names and References 2
- There is a lot going on when we type: x = 3
- First, an integer 3 is created and stored in memory.
- A name x is created.
- A reference to the memory location storing the 3 is then assigned to the name x.
[Diagram: the name list holds "Name: x, Ref: <address1>"; memory holds "Type: Integer, Data: 3" at <address1>.]
12Names and References 3
- The data 3 we created is of type integer. In Python, the basic datatypes integer, float, and string are "immutable."
- This doesn't mean we can't change the value of x. For example, we could increment x.
- >>> x = 3
- >>> x = x + 1
- >>> print x
- 4
13Names and References 4
- If we increment x, then what's really happening is:
  - The reference of name x is looked up.
  - The value at that reference is retrieved.
  - The 3+1 calculation occurs, producing a new data element 4 which is assigned to a fresh memory location with a new reference.
  - The name x is changed to point to this new reference.
  - The old data 3 is garbage collected if no name still refers to it.
[Diagram, shown in four steps: x refers to <address1> (Type: Integer, Data: 3); a new "Type: Integer, Data: 4" is created; x is changed to refer to <address2> (the 4); the old 3 is removed.]
17Assignment 1
- So, for simple built-in datatypes (integers, floats, strings), assignment behaves as you would expect:
- >>> x = 3    # Creates 3, name x refers to 3
- >>> y = x    # Creates name y, refers to 3.
- >>> y = 4    # Creates ref for 4. Changes y.
- >>> print x    # No effect on x, still ref 3.
- 3
[Diagram, shown step by step: x refers to <address1> (Type: Integer, Data: 3); y is created and also refers to <address1>; a new "Type: Integer, Data: 4" is created; y is changed to refer to <address2> (the 4), while x still refers to <address1>.]
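The sequence on this slide can be checked with the `is` operator, which tests whether two names refer to the same object. This is a minimal sketch; the names shared_before and shared_after are introduced here only for illustration.

```python
x = 3            # name x refers to the integer 3
y = x            # name y refers to the same object
shared_before = (x is y)

y = 4            # y is rebound to a different object; x is untouched
shared_after = (x is y)
```

Rebinding y never touches the object 3 itself, which is why x is unaffected.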
23Assignment 2
- But we'll see that for other more complex data types assignment seems to work differently.
- We're talking about lists, dictionaries, user-defined classes.
- We will learn details about all of these types later.
- The important thing is that they are "mutable."
- This means we can make changes to their data without having to copy it into a new memory reference address each time.
- Immutable:
  - >>> x = 3
  - >>> y = x
  - >>> y = 4
  - >>> print x
  - 3
- Mutable:
  - x = some mutable object
  - y = x
  - make a change to y
  - look at x
  - x will be changed as well
24Assignment 3
- Assume we have a name x that refers to a mutable object of some user-defined class. This class has a "set" and a "get" function for some value.
- >>> x.getSomeValue()
- 4
- We now create a new name y and set y = x.
- >>> y = x
- This creates a new name y which points to the same memory reference as the name x. Now, if we make some change to y, then x will be affected as well.
- >>> y.setSomeValue(3)
- >>> y.getSomeValue()
- 3
- >>> x.getSomeValue()
- 3
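The slide does not show the class itself, so the sketch below assumes a hypothetical class Holder with the two methods the slide names. It is an illustration of the behavior, not the class used in the course.

```python
class Holder:
    """Hypothetical stand-in for the slide's user-defined class."""

    def __init__(self, value):
        self._value = value

    def getSomeValue(self):
        return self._value

    def setSomeValue(self, value):
        self._value = value


x = Holder(4)
y = x                               # y is a second name for the same object
y.setSomeValue(3)                   # a change made through y ...
seen_through_x = x.getSomeValue()   # ... is visible through x
```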
25Assignment 4
- Because mutable data types can be changed in place without producing a new reference every time there is a modification, changes to one name for a reference will seem to affect all those names for that same reference. This leads to the behavior on the previous slide.
- Passing Parameters to Functions
- When passing parameters, immutable data types appear to be "call by value" while mutable data types are "call by reference."
- (Mutable data can be changed inside a function to which they are passed as a parameter. Immutable data seems unaffected when passed to functions.)
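A small sketch of the parameter-passing behavior described above. The three functions are hypothetical and exist only to contrast rebinding a name with changing an object in place.

```python
def rebind(number):
    number = number + 1        # rebinds the local name only


def mutate(items):
    items.append(99)           # changes the caller's list in place


def rebind_list(items):
    items = [0]                # rebinding a mutable argument is also local


n = 3
rebind(n)                      # n is still 3 afterwards

data = [1, 2]
mutate(data)                   # data gains the element 99
rebind_list(data)              # data is not replaced by [0]
```

The last call shows why "call by assignment" is the more precise description: even a list is unaffected when the function only assigns a new object to its parameter name.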
26Naming and Assignment Details
27Naming Rules
- Names are case sensitive and cannot start with a number. They can contain letters, numbers, and underscores.
- bob Bob _bob _2_bob_ bob_2 BoB
- There are some reserved words:
- and, assert, break, class, continue, def, del, elif, else, except, exec, finally, for, from, global, if, import, in, is, lambda, not, or, pass, print, raise, return, try, while
28Accessing Non-existent Name
- If you try to access a name before it's been properly created (by placing it on the left side of an assignment), you'll get an error.
- >>> y
- Traceback (most recent call last):
-   File "<pyshell#16>", line 1, in -toplevel-
-     y
- NameError: name 'y' is not defined
- >>> y = 3
- >>> y
- 3
29Multiple Assignment
- You can also assign to multiple names at the same time.
- >>> x, y = 2, 3
- >>> x
- 2
- >>> y
- 3
30String Operations
31String Operations
- We can use some methods built-in to the string data type to perform some formatting operations on strings:
- >>> "hello".upper()
- 'HELLO'
- There are many other handy string operations available. Check the Python documentation for more.
32String Formatting Operator: %
- The operator % allows us to build a string out of many data items in a "fill in the blanks" fashion.
- Also allows us to control how the final string output will appear.
- For example, we could force a number to display with a specific number of digits after the decimal point.
- It is very similar to the sprintf command of C.
33Formatting Strings with %
- >>> x = "abc"
- >>> y = 34
- >>> "%s xyz %d" % (x, y)
- 'abc xyz 34'
- The tuple following the % operator is used to fill in the blanks in the original string marked with %s or %d.
- Check Python documentation for whether to use %s, %d, or some other formatting code inside the string.
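The formatting operator can be sketched as follows. The `%.2f` code is not on the slide; it is added here as one example of fixing the number of digits after the decimal point, which the previous slide mentions.

```python
x = "abc"
y = 34
s = "%s xyz %d" % (x, y)       # %s takes a string, %d an integer

# Two digits after the decimal point.
rounded = "%.2f" % 3.14159
```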
34Printing with Python
- You can print a string to the screen using "print."
- Using the % string operator in combination with the print command, we can format our output text.
- >>> print "%s xyz %d" % ("abc", 34)
- abc xyz 34
- "Print" automatically adds a newline to the end of the string. If you include a list of strings, it will concatenate them with a space between them.
- >>> print "abc"
- abc
- >>> print "abc", "def"
- abc def
35Container Types in Python
36Container Types
- Last time, we saw the basic data types in Python: integers, floats, and strings.
- Containers are other built-in data types in Python.
- Can hold objects of any type (including their own type).
- There are three kinds of containers:
- Tuples
  - A simple immutable ordered sequence of items.
- Lists
  - Sequence with more powerful manipulations possible.
- Dictionaries
  - A look-up table of key-value pairs.
37Tuples, Lists, and Strings Similarities
38Similar Syntax
- Tuples and lists are sequential containers that share much of the same syntax and functionality.
- For conciseness, they will be introduced together.
- The operations shown in this section can be applied to both tuples and lists, but most examples will just show the operation performed on one or the other.
- While strings aren't exactly a container data type, they also happen to share a lot of their syntax with lists and tuples; so, the operations you see in this section can apply to them as well.
39Tuples, Lists, and Strings 1
- Tuples are defined using parentheses (and commas).
- >>> tu = (23, 'abc', 4.56, (2,3), 'def')
- Lists are defined using square brackets (and commas).
- >>> li = ["abc", 34, 4.34, 23]
- Strings are defined using quotes (", ', or """).
- >>> st = "Hello World"
- >>> st = 'Hello World'
- >>> st = """This is a multi-line
- string that uses triple quotes."""
40Tuples, Lists, and Strings 2
- We can access individual members of a tuple, list, or string using square bracket "array" notation.
- >>> tu[1]    # Second item in the tuple.
- 'abc'
- >>> li[1]    # Second item in the list.
- 34
- >>> st[1]    # Second character in string.
- 'e'
41Looking up an Item
- >>> t = (23, 'abc', 4.56, (2,3), 'def')
- Positive index: count from the left, starting with 0.
- >>> t[1]
- 'abc'
- Negative lookup: count from right, starting with -1.
- >>> t[-3]
- 4.56
42Slicing: Return Copy of a Subset 1
- >>> t = (23, 'abc', 4.56, (2,3), 'def')
- Return a copy of the container with a subset of the original members. Start copying at the first index, and stop copying before the second index.
- >>> t[1:4]
- ('abc', 4.56, (2,3))
- You can also use negative indices when slicing.
- >>> t[1:-1]
- ('abc', 4.56, (2,3))
43Slicing: Return Copy of a Subset 2
- >>> t = (23, 'abc', 4.56, (2,3), 'def')
- Omit the first index to make a copy starting from the beginning of the container.
- >>> t[:2]
- (23, 'abc')
- Omit the second index to make a copy starting at the first index and going to the end of the container.
- >>> t[2:]
- (4.56, (2,3), 'def')
44Copying the Whole Container
- You can make a copy of the whole tuple using [:].
- >>> t[:]
- (23, 'abc', 4.56, (2,3), 'def')
- So, there's a difference between these two lines:
- >>> list2 = list1    # 2 names refer to 1 ref
-                      # Changing one affects both
- >>> list2 = list1[:]    # Two copies, two refs
-                         # They're independent
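The slicing rules from the last three slides, and the difference between a second name and a copy, can be sketched together. The names alias and clone are introduced here only for illustration.

```python
t = (23, 'abc', 4.56, (2, 3), 'def')
middle = t[1:4]        # from index 1 up to, but not including, index 4
middle_neg = t[1:-1]   # same slice, written with a negative end index
head = t[:2]           # first index omitted: start from the beginning
tail = t[2:]           # second index omitted: run to the end

list1 = [1, 2, 3]
alias = list1          # second name, same list
clone = list1[:]       # new list with the same members
list1.append(4)        # visible through alias, not through clone
```

Note that `[:]` copies only the outer container; any mutable members are still shared between the original and the copy.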
45The in Operator
- Boolean test whether a value is inside a container:
- >>> t = [1, 2, 4, 5]
- >>> 3 in t
- False
- >>> 4 in t
- True
- >>> 4 not in t
- False
- Be careful: the "in" keyword is also used in the syntax of other unrelated Python constructions: "for loops" and "list comprehensions."
46The + Operator
- The + operator produces a new tuple, list, or string whose value is the concatenation of its arguments.
- >>> (1, 2, 3) + (4, 5, 6)
- (1, 2, 3, 4, 5, 6)
- >>> [1, 2, 3] + [4, 5, 6]
- [1, 2, 3, 4, 5, 6]
- >>> "Hello" + " " + "World"
- 'Hello World'
47The * Operator
- The * operator produces a new tuple, list, or string that "repeats" the original content.
- >>> (1, 2, 3) * 3
- (1, 2, 3, 1, 2, 3, 1, 2, 3)
- >>> [1, 2, 3] * 3
- [1, 2, 3, 1, 2, 3, 1, 2, 3]
- >>> "Hello" * 3
- 'HelloHelloHello'
48Mutability: Tuples vs. Lists
49Tuples: Immutable
- >>> t = (23, 'abc', 4.56, (2,3), 'def')
- >>> t[2] = 3.14
- Traceback (most recent call last):
-   File "<pyshell#75>", line 1, in -toplevel-
-     t[2] = 3.14
- TypeError: object doesn't support item assignment
- You're not allowed to change a tuple in place in memory; so, you can't just change one element of it.
- But it's always OK to make a fresh tuple and assign its reference to a previously used name.
- >>> t = (1, 2, 3, 4, 5)
50Lists: Mutable
- >>> li = ['abc', 23, 4.34, 23]
- >>> li[1] = 45
- >>> li
- ['abc', 45, 4.34, 23]
- We can change lists in place. So, it's OK to change just one element of a list. Name li still points to the same memory reference when we're done.
51Slicing with mutable lists
- >>> L = ['spam', 'Spam', 'SPAM']
- >>> L[1] = 'eggs'
- >>> L
- ['spam', 'eggs', 'SPAM']
- >>> L[0:2] = ['eat', 'more']
- >>> L
- ['eat', 'more', 'SPAM']
52Operations on Lists Only 1
- Since lists are mutable (they can be changed in place in memory), there are many more operations we can perform on lists than on tuples.
- The mutability of lists also makes managing them in memory more complicated. So, they aren't as fast as tuples. It's a tradeoff.
53Operations on Lists Only 2
- >>> li = [1, 2, 3, 4, 5]
- >>> li.append('a')
- >>> li
- [1, 2, 3, 4, 5, 'a']
- >>> li.insert(2, 'i')
- >>> li
- [1, 2, 'i', 3, 4, 5, 'a']
- NOTE: li = li.insert(2, 'i') loses the list! (insert works in place and returns None.)
54Operations on Lists Only 3
- The "extend" operation is similar to concatenation with the + operator. But while the + creates a fresh list (with a new memory reference) containing copies of the members from the two inputs, the extend operates on list li in place.
- >>> li.extend([9, 8, 7])
- >>> li
- [1, 2, 'i', 3, 4, 5, 'a', 9, 8, 7]
- Extend takes a list as an argument. Append takes a singleton.
- >>> li.append([9, 8, 7])
- >>> li
- [1, 2, 'i', 3, 4, 5, 'a', 9, 8, 7, [9, 8, 7]]
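The contrast between extend, append, and + can be sketched on a shorter list. The snapshot names (after_extend, after_append, fresh, returned) are introduced here only for illustration.

```python
li = [1, 2, 3]
same_list = li                 # second name, to show li is changed in place

li.extend([9, 8, 7])           # grows the existing list by three items
after_extend = list(li)

li.append([9, 8, 7])           # adds the whole list as one nested element
after_append = list(li)

fresh = li + [0]               # + builds a new list; li is unchanged
returned = li.insert(0, 'i')   # in-place methods return None
```

Because insert returns None, assigning its result back to li would replace the list with None, which is the mistake the earlier NOTE warns about.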
55Operations on Lists Only 4
- >>> li = ['a', 'b', 'c', 'b']
- >>> li.index('b')    # index of first occurrence
- 1
- >>> li.count('b')    # number of occurrences
- 2
- >>> li.remove('b')    # remove first occurrence
- >>> li
- ['a', 'c', 'b']
56Operations on Lists Only 5
- >>> li = [5, 2, 6, 8]
- >>> li.reverse()    # reverse the list *in place*
- >>> li
- [8, 6, 2, 5]
- >>> li.sort()    # sort the list *in place*
- >>> li
- [2, 5, 6, 8]
- >>> li.sort(some_function)
-     # sort in place using user-defined comparison
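The slide passes a comparison function directly to sort, which is Python 2 usage. As a sketch under the assumption of Python 3, the same idea can be written with functools.cmp_to_key, which wraps a comparison function for the key argument. The function by_length_then_alpha is hypothetical and stands in for the slide's some_function.

```python
from functools import cmp_to_key


def by_length_then_alpha(a, b):
    """Hypothetical comparison: negative, zero, or positive result."""
    if len(a) != len(b):
        return len(a) - len(b)
    return (a > b) - (a < b)


words = ["pear", "fig", "apple", "kiwi"]
words.sort(key=cmp_to_key(by_length_then_alpha))   # sorts in place

numbers = [5, 2, 6, 8]
numbers.reverse()                  # reverses in place
reversed_numbers = list(numbers)   # snapshot before sorting
numbers.sort()                     # sorts in place
```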
57Tuples vs. Lists
- Lists are slower but more powerful than tuples.
- Lists can be modified, and they have lots of handy operations we can perform on them.
- Tuples are immutable and have fewer features.
- We can always convert between tuples and lists using the list() and tuple() functions.
- li = list(tu)
- tu = tuple(li)
58String Conversions
59String to List to String
- Join turns a list of strings into one string.
- <separator_string>.join( <some_list> )
- >>> ";".join( ["abc", "def", "ghi"] )
- 'abc;def;ghi'
- Split turns one string into a list of strings.
- <some_string>.split( <separator_string> )
- >>> "abc;def;ghi".split( ";" )
- ['abc', 'def', 'ghi']
- >>> "I love New York".split()
- ['I', 'love', 'New', 'York']
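Join and split are inverses when the same separator is used, which a short sketch can confirm. The ";" separator is an assumption made for this example.

```python
parts = ["abc", "def", "ghi"]
joined = ";".join(parts)              # the separator string calls join
split_back = joined.split(";")        # same separator recovers the list
words = "I love New York".split()     # no argument: split on whitespace
```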
60Convert Anything to a String
- The built-in str() function can convert an instance of any data type into a string.
- You can define how this function behaves for user-created data types. You can also redefine the behavior of this function for many types.
- >>> "Hello " + str(2)
- 'Hello 2'
61Dictionaries
62Basic Syntax for Dictionaries 1
- Dictionaries store a mapping between a set of keys and a set of values.
- Keys can be any immutable type.
- Values can be any type, and you can have different types of values in the same dictionary.
- You can define, modify, view, lookup, and delete the key-value pairs in the dictionary.
63Basic Syntax for Dictionaries 2
- >>> d = {'user':'bozo', 'pswd':1234}
- >>> d['user']
- 'bozo'
- >>> d['pswd']
- 1234
- >>> d['bozo']
- Traceback (innermost last):
-   File '<interactive input>' line 1, in ?
- KeyError: bozo
64Basic Syntax for Dictionaries 3
- >>> d = {'user':'bozo', 'pswd':1234}
- >>> d['user'] = 'clown'
- >>> d
- {'user':'clown', 'pswd':1234}
- Note: Keys are unique. Assigning to an existing key just replaces its value.
- >>> d['id'] = 45
- >>> d
- {'user':'clown', 'id':45, 'pswd':1234}
- Note: Dictionaries are unordered. New entry might appear anywhere in the output.
65Basic Syntax for Dictionaries 4
- >>> d = {'user':'bozo', 'p':1234, 'i':34}
- >>> del d['user']    # Remove one.
- >>> d
- {'p':1234, 'i':34}
- >>> d.clear()    # Remove all.
- >>> d
- {}
66Basic Syntax for Dictionaries 5
- >>> d = {'user':'bozo', 'p':1234, 'i':34}
- >>> d.keys()    # List of keys.
- ['user', 'p', 'i']
- >>> d.values()    # List of values.
- ['bozo', 1234, 34]
- >>> d.items()    # List of item tuples.
- [('user','bozo'), ('p',1234), ('i',34)]
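The dictionary operations from these slides can be combined in one sketch. It assumes Python 3, where keys() and items() return views rather than lists, so sorted() is used here to get plain lists in a predictable order.

```python
d = {'user': 'bozo', 'pswd': 1234}
d['user'] = 'clown'        # existing key: the value is replaced
d['id'] = 45               # new key: a pair is added

try:
    d['bozo']              # 'bozo' was a value, never a key
    missing_key_raised = False
except KeyError:
    missing_key_raised = True

del d['user']              # remove one pair
keys = sorted(d.keys())
items = sorted(d.items())

d.clear()                  # remove all pairs
```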
67Assignment and Containers
68Multiple Assignment with Container Classes
- We've seen multiple assignment before:
- >>> x, y = 2, 3
- But you can also do it with containers.
- The type and shape just has to match.
- >>> (x, y, (w, z)) = (2, 3, (4, 5))
- >>> [x, y] = [4, 5]
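A sketch of what "shape has to match" means in practice: nested targets are filled from nested values, and a mismatch in length raises a ValueError. The names a, b, p, q, and mismatch are introduced here only for illustration.

```python
(x, y, (w, z)) = (2, 3, (4, 5))    # nested shape on both sides
[a, b] = [4, 5]

try:
    (p, q) = (1, 2, 3)             # three values for two names
    mismatch = None
except ValueError as err:
    mismatch = str(err)
```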
69Empty Containers 1
- We know that assignment is how to create a name.
- x = 3    # Creates name x of type integer.
- Assignment is also what creates named references to containers.
- >>> d = {'a':3, 'b':4}
- We can also create empty containers:
- >>> li = []
- >>> tu = ()
- >>> di = {}
- Note: an empty container is logically equivalent to False. (Just like None.)
70Empty Containers 2
- Why create a named reference to an empty container? You might want to use append or some other list operation before you really have any data in your list. This could cause an unknown name error if you don't properly create your named reference first.
- >>> g.append(3)
- Python complains here about the unknown name 'g'!
- >>> g = []
- >>> g.append(3)
- >>> g
- [3]
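The unknown-name error and the fix can be sketched as follows. The flag names name_error and empties_are_false are introduced here only for illustration.

```python
try:
    g.append(3)            # g has never been assigned
    name_error = False
except NameError:
    name_error = True

g = []                     # create the name with an empty list first
g.append(3)                # now append works

# Empty containers count as false in a condition, like None.
empties_are_false = not [] and not () and not {}
```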