Path: blob/master/python/python3_cookbook/1_data_structure.ipynb
1480 views
Table of Contents
- 1 Data Structure
- 1.1 Simple Assignments to Unpack Iterables into Separate Variables
- 1.2 Use "Star Expressions" to Unpack Iterables of Arbitrary Length
- 1.3 Keeping the Last N Items Using deque
- 1.4 Finding the Largest or Smallest N Items Using heapq
- 1.5 PriorityQueue
- 1.6 Keeping Dictionaries in Order Using OrderedDict
- 1.7 Calculating with Dictionaries Using zip
- 1.8 Finding Commonalities in Two Dictionaries Using set operations
- 1.9 Removing Duplicates from a Sequence while Maintaining Order
- 1.10 Naming Slices
- 1.11 Find the Most Frequently Items in a Sequence Using Counter
- 1.12 Sorting with itemgetter and attrgetter
- 1.13 Named Tuples
- 1.14 DefaultDict
- 1.15 Filtering Sequence Elements
- 1.16 Extracting a Subset of a Dictionary
- 1.17 Transforming and Reducing Data Using Generation Expression
- 1.18 For else
- 2 Reference
Data Structure
Some of the materials are a condensed reimplementation from the resource: Python3 Cookbook Chapter 1. Data Structures and Algorithms, which originally was freely available online.
Simple Assignments to Unpack Iterables into Separate Variables
Example1: Unpacking a tuple.
This works for all types of sequences (iterables), including tuples, list, strings, generators, files.
Example2: If you want to discard some of the elements, simply use _
to represent it (You can use anything you want, this is just convention).
Use "Star Expressions" to Unpack Iterables of Arbitrary Length
Example1: Data that has name, e-mail and arbitrary number of telephone numbers.
The star expression will always unpack a list (including none).
Example2: Performing different actions when looping over different "tagged" tuples. "Tagged" simply means they have some known pattern or structure. e.g. The first element is a tag indicating what other information does this tag contain.
Example3: String manipulation and throwing away variables.
Keeping the Last N Items Using deque
Example1: A fix-sized queue that removes the oldest item when a new item is added and the queue is full.
Example2: A unbounded queue. You can pop and add item from both end with O(1).
Finding the Largest or Smallest N Items Using heapq
Example1: nlargest()
and nsmallest()
.
Example2: nlargest()
and nsmallest()
with more complex data structure.
When to use what:
Use
nlargest()
andnsmallest()
if you are trying to find a relatively small number of items.Use
min()
andmax()
if you are simply trying to find the single smallest or largest item (N=1).Use
sorted(items)[:N]
orsorted(items)[-N:]
if N is about the same size as the input.
PriorityQueue
We can think of priority queue as a modified queue: instead of retrieving the next element by insertion time, it retrieves the highest-priority element. The priority of individual elements is decided by the ordering applied to their keys.
Priority queues are commonly used for dealing with scheduling problems. High-priority tasks on the system should take precedence over lower-priority tasks. By organizing pending tasks in a priority queue that uses the task urgency as the key, the task scheduler can allow the highest-priority tasks to run first.
Let’s take a look at how we can implement priority queues in Python using built-in data structures or data structures that ship with Python’s standard library.
As we can infer from the output, prioriy queue stores the elements by its priority, in this case the first element in the tuple. After from sorting primitive types such as integers, we can also sort objects that we've defined. To perform sorting on custom objects we need to implement the dunder methods for all 6 comparisons.
Operator | Method |
---|---|
== | __eq__ |
!= | __ne__ |
< | __le__ |
<= | __le__ |
> | __gt__ |
>= | __ge__ |
Keeping Dictionaries in Order Using OrderedDict
Be aware that the size of an OrderedDict is more than twice as large as a normal dictionary!!
Calculating with Dictionaries Using zip
Performing calculations (e.g., min, max, sorting, etc.) on a dictionary by convert the it into tuples of ( value, key ).
Finding Commonalities in Two Dictionaries Using set operations
We can perform common set operations with dictionary .keys()
and .items()
without having to convert them into a set as they are unique.
Removing Duplicates from a Sequence while Maintaining Order
Naming Slices
Find the Most Frequently Items in a Sequence Using Counter
The most_common
functionality from Counter.
Counter is simply a dictionary and we can manually increment the count.
Sorting with itemgetter and attrgetter
By using the key
argument with the sorted
function we can accomplish a bit more complex operations when it comes to sorting. Note that key only accepts functions as its parameters, thus in the following we use a lambda to create an anonymous function and sort by the second element of the tuples.
Instead of doing that we can use itemgetter
for convenience.
The same notion also applies to sorting by dictionary key.
The method is faster then key = lambda r: r['fname']
. And we can assign multiple values inside the itemgetter().
There's also attrgetter
for performing sorting according to class object's attribute.
Named Tuples
Create a tuple with names for clearity (compared with tuples) and also has the immutable feature.
namedtuple can be used as a replacement for a dictionary, which requires more space to store. However, be aware that a it's immutable.
DefaultDict
Whenever you need a dictionary, and each value of the dictionary has to start with the default value, use defaultdict. A defaultdict will never raise a KeyError. Any key that does not exist gets the value returned by the default factory.
Filtering Sequence Elements
Example1: Replacing the values that don’t meet the criteria with a new value.
Example2: Use filter
when the filtering criteria can't be easily expressed.
Example3: itertools.compress()
for filtering a sequence with an accompanying boolean sequence. Suppose you want to make a list of all addresses where the corresponding count value was greater than 5.
Extracting a Subset of a Dictionary
Transforming and Reducing Data Using Generation Expression
Example1: Notice that we do not need to write sum((x * x for x in nums))
to first create the generator.
Example2: Determine if any .py files exist in a directory.
For else
For else let's us remove extraneous flag variables. Examples with and without for ... else ...
Note that Python also has a try, except, else expression. In this scenario, else will evaluate only if there is no exception from the try block, allowing us to simplify the more complicated code below:
We can re-write the flow above using else:
The reason why this type of flow control exists is because A try block allows us to handle an "expected" error. The except block should only catch exceptions we are prepared to handle. If we handle an unexpected error, our code may do the wrong thing and hide bugs.