Contents | Previous (2.4 Sequences) | Next (2.6 List Comprehensions)
2.5 collections module
The collections
module provides a number of useful objects for data handling.
This part briefly introduces some of these features.
Example: Counting Things
Let's say you want to tabulate the total shares of each stock.
portfolio = [
('GOOG', 100, 490.1),
('IBM', 50, 91.1),
('CAT', 150, 83.44),
('IBM', 100, 45.23),
('GOOG', 75, 572.45),
('AA', 50, 23.15)
]
There are two IBM
entries and two GOOG
entries in this list. The shares need to be combined together somehow.
Counters
Solution: Use a Counter
.
from collections import Counter
total_shares = Counter()
for name, shares, price in portfolio:
total_shares[name] += shares
total_shares['IBM'] # 150
Example: One-Many Mappings
Problem: You want to map a key to multiple values.
portfolio = [
('GOOG', 100, 490.1),
('IBM', 50, 91.1),
('CAT', 150, 83.44),
('IBM', 100, 45.23),
('GOOG', 75, 572.45),
('AA', 50, 23.15)
]
Like in the previous example, the key IBM
should have two different tuples instead.
Solution: Use a defaultdict
.
from collections import defaultdict
holdings = defaultdict(list)
for name, shares, price in portfolio:
holdings[name].append((shares, price))
holdings['IBM'] # [ (50, 91.1), (100, 45.23) ]
The defaultdict
ensures that every time you access a key you get a default value.
Example: Keeping a History
Problem: We want a history of the last N things.
Solution: Use a deque
.
from collections import deque
history = deque(maxlen=N)
with open(filename) as f:
for line in f:
history.append(line)
...
Exercises
The collections
module might be one of the most useful library
modules for dealing with special purpose kinds of data handling
problems such as tabulating and indexing.
In this exercise, we’ll look at a few simple examples. Start by
running your report.py
program so that you have the portfolio of
stocks loaded in the interactive mode.
bash % python3 -i report.py
Exercise 2.18: Tabulating with Counters
Suppose you wanted to tabulate the total number of shares of each stock.
This is easy using Counter
objects. Try it:
>>> portfolio = read_portfolio('Data/portfolio.csv')
>>> from collections import Counter
>>> holdings = Counter()
>>> for s in portfolio:
holdings[s['name']] += s['shares']
>>> holdings
Counter({'MSFT': 250, 'IBM': 150, 'CAT': 150, 'AA': 100, 'GE': 95})
>>>
Carefully observe how the multiple entries for MSFT
and IBM
in portfolio
get combined into a single entry here.
You can use a Counter just like a dictionary to retrieve individual values:
>>> holdings['IBM']
150
>>> holdings['MSFT']
250
>>>
If you want to rank the values, do this:
>>> # Get three most held stocks
>>> holdings.most_common(3)
[('MSFT', 250), ('IBM', 150), ('CAT', 150)]
>>>
Let’s grab another portfolio of stocks and make a new Counter:
>>> portfolio2 = read_portfolio('Data/portfolio2.csv')
>>> holdings2 = Counter()
>>> for s in portfolio2:
holdings2[s['name']] += s['shares']
>>> holdings2
Counter({'HPQ': 250, 'GE': 125, 'AA': 50, 'MSFT': 25})
>>>
Finally, let’s combine all of the holdings doing one simple operation:
>>> holdings
Counter({'MSFT': 250, 'IBM': 150, 'CAT': 150, 'AA': 100, 'GE': 95})
>>> holdings2
Counter({'HPQ': 250, 'GE': 125, 'AA': 50, 'MSFT': 25})
>>> combined = holdings + holdings2
>>> combined
Counter({'MSFT': 275, 'HPQ': 250, 'GE': 220, 'AA': 150, 'IBM': 150, 'CAT': 150})
>>>
This is only a small taste of what counters provide. However, if you ever find yourself needing to tabulate values, you should consider using one.
Commentary: collections module
The collections
module is one of the most useful library modules
in all of Python. In fact, we could do an extended tutorial on just
that. However, doing so now would also be a distraction. For now,
put collections
on your list of bedtime reading for later.
Contents | Previous (2.4 Sequences) | Next (2.6 List Comprehensions)