Contents | Previous (2.1 Datatypes) | Next (2.3 Formatting)

2.2 Containers

This section discusses lists, dictionaries, and sets.

Overview

Programs often have to work with many objects.

There are three main choices: lists, dictionaries, and sets.

Lists as a Container

Use a list when the order of the data matters. Remember that lists can hold any kind of object. For example, a list of tuples:

portfolio = [
    ('GOOG', 100, 490.1),
    ('IBM', 50, 91.3),
    ('CAT', 150, 83.44)
]

portfolio[0]            # ('GOOG', 100, 490.1)
portfolio[2]            # ('CAT', 150, 83.44)

List construction

Building a list from scratch.

records = []  # Initial empty list

# Use .append() to add more items
records.append(('GOOG', 100, 490.10))
records.append(('IBM', 50, 91.3))
...

An example when reading records from a file.

records = []  # Initial empty list

with open('Data/portfolio.csv', 'rt') as f:
    next(f) # Skip header
    for line in f:
        row = line.split(',')
        records.append((row[0], int(row[1]), float(row[2])))
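
To try the same loop without the data file, here is a minimal sketch that reads from an in-memory string instead. The use of io.StringIO and the sample text (built from the rows shown earlier in this section) are assumptions for illustration, not the actual contents of Data/portfolio.csv.

```python
import io

# In-memory stand-in for the data file (contents are illustrative only)
text = '''name,shares,price
GOOG,100,490.1
IBM,50,91.3
CAT,150,83.44
'''

records = []  # Initial empty list

with io.StringIO(text) as f:
    next(f)                      # Skip header
    for line in f:
        row = line.split(',')    # e.g. ['GOOG', '100', '490.1\n']
        # int() and float() ignore surrounding whitespace such as '\n'
        records.append((row[0], int(row[1]), float(row[2])))

print(records[0])    # ('GOOG', 100, 490.1)
```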

Dicts as a Container

Dictionaries are useful if you want fast random lookups (by key name). For example, a dictionary of stock prices:

prices = {
   'GOOG': 513.25,
   'CAT': 87.22,
   'IBM': 93.37,
   'MSFT': 44.12
}

Here are some simple lookups:

>>> prices['IBM']
93.37
>>> prices['GOOG']
513.25
>>>

Dict Construction

Example of building a dict from scratch.

prices = {} # Initial empty dict

# Insert new items
prices['GOOG'] = 513.25
prices['CAT'] = 87.22
prices['IBM'] = 93.37

An example populating the dict from the contents of a file.

prices = {} # Initial empty dict

with open('Data/prices.csv', 'rt') as f:
    for line in f:
        row = line.split(',')
        prices[row[0]] = float(row[1])

Note: If you try this on the Data/prices.csv file, you'll find that it almost works--there's a blank line at the end that causes it to crash. You'll need to figure out some way to modify the code to account for that (see Exercise 2.6).

Dictionary Lookups

You can test the existence of a key.

if key in d:
    pass    # YES: the key is present
else:
    pass    # NO: the key is absent

You can look up a value that might not exist and provide a default value in case it doesn't.

name = d.get(key, default)

An example:

>>> prices.get('IBM', 0.0)
93.37
>>> prices.get('SCOX', 0.0)
0.0
>>>
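
One common use of get() with a default is tabulation. The following sketch totals the shares held under each name; the holdings data and the total_shares name are made up for illustration.

```python
# Hypothetical sample data: (name, shares) pairs with repeated names
holdings = [('IBM', 50), ('MSFT', 200), ('IBM', 100), ('MSFT', 50)]

total_shares = {}
for name, shares in holdings:
    # get() returns 0 the first time a name is seen, so no key test is needed
    total_shares[name] = total_shares.get(name, 0) + shares

print(total_shares)    # {'IBM': 150, 'MSFT': 250}
```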

Composite keys

Almost any type of value can be used as a dictionary key in Python. The key must be hashable, which in practice means it must be of an immutable type. For example, tuples:

holidays = {
  (1, 1) : 'New Years',
  (3, 14) : 'Pi day',
  (9, 13) : "Programmer's day",
}

Then to access:

>>> holidays[3, 14]
'Pi day'
>>>

Neither a list, a set, nor another dictionary can serve as a dictionary key, because lists, sets, and dictionaries are all mutable.
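
A short sketch of this rule: looking up a list as a key raises a TypeError, while converting the list to a tuple first works. The key variable is an illustrative name.

```python
holidays = {
    (1, 1): 'New Years',
    (3, 14): 'Pi day',
}

key = [3, 14]    # A list is mutable, so it cannot be hashed

try:
    holidays[key]
except TypeError as e:
    print('Failed:', e)    # A list cannot be used as a key

# Converting the list to a tuple gives a usable key
print(holidays[tuple(key)])    # Pi day
```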

Sets

A set is an unordered collection of unique items.

tech_stocks = { 'IBM','AAPL','MSFT' }
# Alternative syntax
tech_stocks = set(['IBM', 'AAPL', 'MSFT'])

Sets are useful for membership tests.

>>> tech_stocks
{'AAPL', 'IBM', 'MSFT'}
>>> 'IBM' in tech_stocks
True
>>> 'FB' in tech_stocks
False
>>>

Sets are also useful for duplicate elimination.

names = ['IBM', 'AAPL', 'GOOG', 'IBM', 'GOOG', 'YHOO']

unique = set(names)
# unique = {'IBM', 'AAPL', 'GOOG', 'YHOO'}
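
Because a set is unordered, set(names) does not keep the names in their original order. If the order matters, one possible approach (a sketch, not the only way) is to pair a list with a set; seen and ordered are illustrative names.

```python
names = ['IBM', 'AAPL', 'GOOG', 'IBM', 'GOOG', 'YHOO']

seen = set()     # Names encountered so far (fast membership test)
ordered = []     # Unique names in first-seen order
for name in names:
    if name not in seen:
        seen.add(name)
        ordered.append(name)

print(ordered)   # ['IBM', 'AAPL', 'GOOG', 'YHOO']
```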

Additional set operations:

unique.add('CAT')        # Add an item
unique.remove('YHOO')    # Remove an item

s1 = { 'a', 'b', 'c'}
s2 = { 'c', 'd' }
s1 | s2                 # Set union { 'a', 'b', 'c', 'd' }
s1 & s2                 # Set intersection { 'c' }
s1 - s2                 # Set difference { 'a', 'b' }
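
As a sketch of how these operators combine with the other containers in this section, the following compares the names held in a portfolio with the names that have a known price. The sample data and the names held, priced, missing, and common are made up for illustration.

```python
# Hypothetical sample data
portfolio = [('GOOG', 100, 490.1), ('IBM', 50, 91.3), ('CAT', 150, 83.44)]
prices = {'GOOG': 513.25, 'IBM': 93.37, 'MSFT': 44.12}

held = set()
for name, shares, price in portfolio:
    held.add(name)

priced = set(prices)       # Iterating over a dict yields its keys

missing = held - priced    # Held, but no price available
common = held & priced     # Held and priced

print(missing)             # {'CAT'}
print(sorted(common))      # ['GOOG', 'IBM']
```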

Exercises

In these exercises, you start building one of the major programs used for the rest of this course. Do your work in the file Work/report.py.

Exercise 2.4: A list of tuples

The file Data/portfolio.csv contains a list of stocks in a portfolio. In Exercise 1.30, you wrote a function portfolio_cost(filename) that read this file and performed a simple calculation.

Your code should have looked something like this:

# pcost.py

import csv

def portfolio_cost(filename):
    '''Computes the total cost (shares*price) of a portfolio file'''
    total_cost = 0.0

    with open(filename, 'rt') as f:
        rows = csv.reader(f)
        headers = next(rows)
        for row in rows:
            nshares = int(row[1])
            price = float(row[2])
            total_cost += nshares * price
    return total_cost

Using this code as a rough guide, create a new file report.py. In that file, define a function read_portfolio(filename) that opens a given portfolio file and reads it into a list of tuples. To do this, you’re going to make a few minor modifications to the above code.

First, instead of defining total_cost = 0.0, you’ll make a variable that’s initially set to an empty list. For example:

portfolio = []

Next, instead of totaling up the cost, you’ll turn each row into a tuple exactly as you just did in the last exercise and append it to this list. For example:

for row in rows:
    holding = (row[0], int(row[1]), float(row[2]))
    portfolio.append(holding)

Finally, you’ll return the resulting portfolio list.

Experiment with your function interactively (just a reminder that in order to do this, you first have to run the report.py program in the interpreter):

Hint: Use the -i option when running the file from the terminal, for example python3 -i report.py.

>>> portfolio = read_portfolio('Data/portfolio.csv')
>>> portfolio
[('AA', 100, 32.2), ('IBM', 50, 91.1), ('CAT', 150, 83.44), ('MSFT', 200, 51.23),
    ('GE', 95, 40.37), ('MSFT', 50, 65.1), ('IBM', 100, 70.44)]
>>>
>>> portfolio[0]
('AA', 100, 32.2)
>>> portfolio[1]
('IBM', 50, 91.1)
>>> portfolio[1][1]
50
>>> total = 0.0
>>> for s in portfolio:
        total += s[1] * s[2]

>>> print(total)
44671.15
>>>

This list of tuples that you have created is very similar to a 2-D array. For example, you can access a specific column and row using a lookup such as portfolio[row][column] where row and column are integers.

That said, you can also rewrite the last for-loop using a statement like this:

>>> total = 0.0
>>> for name, shares, price in portfolio:
            total += shares*price

>>> print(total)
44671.15
>>>

Exercise 2.5: List of Dictionaries

Take the function you wrote in Exercise 2.4 and modify it to represent each stock in the portfolio with a dictionary instead of a tuple. In this dictionary, use the field names "name", "shares", and "price" to represent the different columns in the input file.

Experiment with this new function in the same manner as you did in Exercise 2.4.

>>> portfolio = read_portfolio('Data/portfolio.csv')
>>> portfolio
[{'name': 'AA', 'shares': 100, 'price': 32.2}, {'name': 'IBM', 'shares': 50, 'price': 91.1},
    {'name': 'CAT', 'shares': 150, 'price': 83.44}, {'name': 'MSFT', 'shares': 200, 'price': 51.23},
    {'name': 'GE', 'shares': 95, 'price': 40.37}, {'name': 'MSFT', 'shares': 50, 'price': 65.1},
    {'name': 'IBM', 'shares': 100, 'price': 70.44}]
>>> portfolio[0]
{'name': 'AA', 'shares': 100, 'price': 32.2}
>>> portfolio[1]
{'name': 'IBM', 'shares': 50, 'price': 91.1}
>>> portfolio[1]['shares']
50
>>> total = 0.0
>>> for s in portfolio:
        total += s['shares']*s['price']

>>> print(total)
44671.15
>>>

Here, you will notice that the different fields for each entry are accessed by key names instead of numeric column numbers. This is often preferred because the resulting code is easier to read later.

Viewing large dictionaries and lists can be messy. To clean up the output for debugging, consider using the pprint function.

>>> from pprint import pprint
>>> pprint(portfolio)
[{'name': 'AA', 'price': 32.2, 'shares': 100},
    {'name': 'IBM', 'price': 91.1, 'shares': 50},
    {'name': 'CAT', 'price': 83.44, 'shares': 150},
    {'name': 'MSFT', 'price': 51.23, 'shares': 200},
    {'name': 'GE', 'price': 40.37, 'shares': 95},
    {'name': 'MSFT', 'price': 65.1, 'shares': 50},
    {'name': 'IBM', 'price': 70.44, 'shares': 100}]
>>>

Exercise 2.6: Dictionaries as a container

A dictionary is a useful way to keep track of items where you want to look up items using an index other than an integer. In the Python shell, try playing with a dictionary:

>>> prices = { }
>>> prices['IBM'] = 92.45
>>> prices['MSFT'] = 45.12
>>> prices
... look at the result ...
>>> prices['IBM']
92.45
>>> prices['AAPL']
... look at the result ...
>>> 'AAPL' in prices
False
>>>

The file Data/prices.csv contains a series of lines with stock prices. The file looks something like this:

"AA",9.22
"AXP",24.85
"BA",44.85
"BAC",11.27
"C",3.72
...

Write a function read_prices(filename) that reads a set of prices such as this into a dictionary where the keys of the dictionary are the stock names and the values in the dictionary are the stock prices.

To do this, start with an empty dictionary and start inserting values into it just as you did above. However, you are reading the values from a file now.

We’ll use this data structure to quickly look up the price of a given stock name.

A few little tips that you’ll need for this part. First, make sure you use the csv module just as you did before—there’s no need to reinvent the wheel here.

>>> import csv
>>> f = open('Data/prices.csv', 'r')
>>> rows = csv.reader(f)
>>> for row in rows:
        print(row)


['AA', '9.22']
['AXP', '24.85']
...
[]
>>>

The other little complication is that the Data/prices.csv file may have some blank lines in it. Notice how the last row of data above is an empty list—meaning no data was present on that line.

There’s a possibility that this could cause your program to die with an exception. Use the try and except statements to catch this as appropriate. Thought: would it be better to guard against bad data with an if-statement instead?

Once you have written your read_prices() function, test it interactively to make sure it works:

>>> prices = read_prices('Data/prices.csv')
>>> prices['IBM']
106.28
>>> prices['MSFT']
20.89
>>>

Exercise 2.7: Finding out if you can retire

Tie all of this work together by adding a few statements to your report.py program that compute the gain/loss. These statements should take the list of stocks from Exercise 2.5 and the dictionary of prices from Exercise 2.6 and compute the current value of the portfolio along with the gain/loss.