Python: Functions and Mutable Defaults

April 22, 2017
11 min. read

This post is part of the Python Tips series.

Default arguments in Python allow you to make complex functions very easy to use. They can be called with mostly defaults or with as much configuration as needed. This comes with a gotcha for those just getting started with Python. If you use mutable data structures as defaults, things don’t behave exactly as you would expect.

However, before we start, let’s cover some things you may not know that I will use in the code, along with a few others that are just interesting and partially related.

Short detour into *args and **kwargs

Once we get done with all the detours, you will see *items as a parameter in the code below. Prefixing a parameter with a single * means that it collects all remaining positional arguments into a tuple. If you have individual arguments you wish to capture outside of this, they just need to appear before the catch-all parameter. This is most commonly written as *args, which is fine when it represents generic or varied data. However, I prefer a clearer name when possible, so here I used *items, as we are sending items for a list.
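As a minimal sketch of that collection behavior (the function name collect and its parameters are made up for illustration), a regular parameter takes the first positional argument and everything after it ends up in the starred parameter:

```python
def collect(first, *items):
    # `first` takes the first positional argument; the rest land in `items`,
    # which is always a tuple (empty if nothing else was passed)
    return first, items


print(collect('a', 'b', 'c'))  # ('a', ('b', 'c'))
print(collect('a'))            # ('a', ())
```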

We are not using **kwargs, or the ** style, but this is used to collect keyword arguments. You can think of it as a dictionary being passed in. Here is a function that takes keyword arguments and just prints them out. Notice how we are using .items(), which is the same way you get (key, value) pairs out of a dictionary.

def my_function(**kwargs):
    for (key, value) in kwargs.items():
        print('{} - {}'.format(key, value))

We can call this with two arguments and look at the output:

my_function(joe='corny', amy='amazing')
joe - corny
amy - amazing

Let’s go a little crazier:

my_function(a=1, b=2, c=3, ddddddd=4, i='am', getting='tired', of='arguments')
a - 1
b - 2
c - 3
ddddddd - 4
i - am
getting - tired
of - arguments

More detour: Forcing named arguments

New with Python 3 is the ability to force named arguments. This means they must be entered with the name. This is easiest to show with a few examples.

def print_label(label_text, inverted, mirrored, flipped):
    """ function for printing label with many boolean args """
    pass

print_label('My Text', False, True, False)

While I would define the print_label function with good defaults for the last three arguments, I didn’t want to here as I don’t want you to think that forcing named arguments requires arguments with defaults.

If you saw that print_label call somewhere in code, how easy would it be to figure out what is going on? Sure, you knew at the time you were coding as you just saw the definition or PyCharm popped up hints.

For someone just reading through the source code, that call is nearly impossible to understand. This is bad.

def print_label(label_text, *, inverted, mirrored, flipped):
    """ function for printing label with many boolean args """
    pass

print_label('My Text', False, True, False)  # Errors

print_label('My Text', inverted=False, mirrored=True, flipped=False)

I have added a bare * that marks the split between the positional arguments and the named arguments. If you are using *args, you get this already: any parameter after *args can only be passed by name. The change in Python 3 is allowing this functionality without also allowing unlimited positional arguments, as *args would.

In the first call to print_label above, we would receive a TypeError that tells us print_label() takes 1 positional argument but 4 were given. The second call will work properly.
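To make the comparison concrete, here is a small sketch with two hypothetical functions (with_varargs and with_bare_star are names invented for this example). Both force flag to be passed by name, but only the first accepts extra positional arguments:

```python
def with_varargs(*args, flag):
    # `flag` comes after *args, so it can only be passed by name
    return args, flag


def with_bare_star(text, *, flag):
    # a bare * ends the positional parameters without collecting extras
    return text, flag


print(with_varargs(1, 2, flag=True))     # ((1, 2), True)
print(with_bare_star('hi', flag=False))  # ('hi', False)

try:
    with_bare_star('hi', False)  # second argument passed positionally
except TypeError as error:
    print('TypeError:', error)
```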

Notice how requiring named parameters, combined with good names, makes this call self-documenting. It is a little extra typing that is completely worth the effort in readability. Source code is read many more times than it is written. All you are doing in the function definition is forcing programmers to be good source code citizens.

Last detour, I promise

Since we talked about * and ** in receiving arguments to a function, I wanted to cover the idea of expanding lists and dictionaries. The syntax is the same. If you have a tuple with three items and wish to send those arguments into a function, you would just use *tuple_var. This expands them into separate arguments. Without this, you would be sending the tuple as the first argument. This works the same for a list as a tuple. For a dict, just use **dict_var to send named arguments in.
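Here is a short sketch of that expansion, using a made-up volume function and sample data chosen only for illustration:

```python
def volume(length, width, height):
    return length * width * height


dimensions = (2, 3, 4)
named = {'length': 2, 'width': 3, 'height': 4}

print(volume(*dimensions))        # 24, tuple expanded into three arguments
print(volume(*list(dimensions)))  # 24, a list expands the same way
print(volume(**named))            # 24, dict expanded into named arguments

try:
    volume(dimensions)  # without *, the whole tuple is the first argument
except TypeError as error:
    print('TypeError:', error)
```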

Back to default mutability

That was a deeper rabbit hole than I planned, but it was a good time to cover some of those points. Let’s build a function that allows you to add elements to a list, but creates a fresh list if you don’t provide one. (Just act like the list.extend() method does not exist.)

Our first pass, if we are new to Python, might look like this:

def append_values(*items, to_list=[]):
    for item in items:
        to_list.append(item)
    return to_list

list_a = append_values(1, 2)
list_b = append_values(3, 4, to_list=[])
list_c = append_values(5, 6, to_list=[9, 10])
list_d = append_values(7, 8)

Even people that don’t know the dangers of mutable defaults in parameters will be able to determine the values of lists a through c. However, many will not know what happens for list_d.

list_a passes in two values, but no list to append onto. So we start with the default empty list. The output is what we would expect as [1, 2].

list_b passes in both values and an empty starting list. So we would expect the output to be what it is: [3, 4].

list_c passes in a prepopulated list with values, so these are added at the end and we get [9, 10, 5, 6] as our output.

list_d seems like it is exactly the same as list_a, but with different values. We would expect to have an empty list as default and receive [7, 8] back. However, we receive [1, 2, 7, 8] from append_values.

How is this possible, with the default values of []? Here is the issue with using mutable types as defaults.

When a function is created, Python keeps data related to it. The default arguments are created and stored at that point. Each time the function is called, Python doesn’t recreate them, as that would be wasteful. It just looks up the stored values and uses them. So this default list starts empty. However, in the list_a call, we are using it and appending 1 and 2 to it. list_b and list_c are not using the default, so the stored default stays dormant. list_d again uses the default list, which is no longer empty but [1, 2] from the list_a call. This is why we get the extra values instead of starting from an empty list.
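One way to convince yourself that a single list is being reused is to compare object identity. This is only a sketch; it relies on the function's __kwdefaults__ attribute, which is where Python stores defaults for keyword-only parameters:

```python
def append_values(*items, to_list=[]):
    for item in items:
        to_list.append(item)
    return to_list


list_a = append_values(1, 2)
list_d = append_values(7, 8)

# Both names point at the one default list stored on the function
print(list_a is list_d)                                   # True
print(append_values.__kwdefaults__['to_list'] is list_a)  # True

# So list_a changed too, even though we never touched it directly
print(list_a)  # [1, 2, 7, 8]
```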

Using inspect

Before we show how to keep this from happening, I want to show what is happening rather than forcing you to take my word for it. For this I’ll be using the inspect module that allows us to peek under the covers.

This is the same code as above, with a little more in it. I’m printing the returned lists and displaying the default values stored on the function.

import inspect

def append_values(*items, to_list=[]):
    for item in items:
        to_list.append(item)
    return to_list

print(inspect.getfullargspec(append_values))
FullArgSpec(args=[], varargs='items', varkw=None, defaults=None, kwonlyargs=['to_list'], kwonlydefaults={'to_list': []}, annotations={})

Before we start calling this function, I used inspect to print out the full argument specification just after the function was created.

  • args is empty, because we don’t have any positional arguments that are not handled via the *items variable. This is a list that would contain them otherwise.
  • varargs is a string, as it can only hold one value (or be None). There is only one variable that collects all left over positional arguments.
  • varkw is None, as we don’t have any ** style named argument catchalls.
  • defaults for positional arguments is None as we don’t have any.
  • kwonlyargs shows us our to_list keyword argument.
  • kwonlydefaults is a dictionary that holds our default values for keywords. Notice how we have to_list with a default of [].
  • annotations are function annotations, new in Python 3. These are defined in PEP 3107. I’m not going to talk about them, but feel free to take a look.
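For contrast, here is a sketch of a hypothetical function (sample, with invented parameter names) that populates most of those fields, so you can see what each one holds when it is not empty:

```python
import inspect


def sample(a, b=2, *rest, flag=True, **options):
    pass


spec = inspect.getfullargspec(sample)
print(spec.args)            # ['a', 'b']
print(spec.varargs)         # rest
print(spec.varkw)           # options
print(spec.defaults)        # (2,)
print(spec.kwonlyargs)      # ['flag']
print(spec.kwonlydefaults)  # {'flag': True}
print(spec.annotations)     # {}
```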

So with the initial state after the function is created, our to_list is []. Let’s follow this through the executions.

list_a = append_values(1, 2)
print(list_a)
print(inspect.getfullargspec(append_values))
[1, 2]
FullArgSpec(args=[], varargs='items', varkw=None, defaults=None, kwonlyargs=['to_list'], kwonlydefaults={'to_list': [1, 2]}, annotations={})

We received [1, 2] as I discussed above for list_a. However, look at kwonlydefaults in the argspec. It is now [1, 2] instead of []. We have poisoned our pristine default.

list_b = append_values(3, 4, to_list=[])
print(list_b)
print(inspect.getfullargspec(append_values))
[3, 4]
FullArgSpec(args=[], varargs='items', varkw=None, defaults=None, kwonlyargs=['to_list'], kwonlydefaults={'to_list': [1, 2]}, annotations={})

list_c = append_values(5, 6, to_list=[9, 10])
print(list_c)
print(inspect.getfullargspec(append_values))
[9, 10, 5, 6]
FullArgSpec(args=[], varargs='items', varkw=None, defaults=None, kwonlyargs=['to_list'], kwonlydefaults={'to_list': [1, 2]}, annotations={})

Since we passed in a list for both list_b and list_c, the default was not touched. However, we are still carrying around the mutated default to sting us in list_d.

list_d = append_values(7, 8)
print(list_d)
print(inspect.getfullargspec(append_values))
[1, 2, 7, 8]
FullArgSpec(args=[], varargs='items', varkw=None, defaults=None, kwonlyargs=['to_list'], kwonlydefaults={'to_list': [1, 2, 7, 8]}, annotations={})

Not only did we pollute the output of list_d, but we have left more surprises for any future callers. This is the reason you should never unintentionally use mutable values (lists, dicts, objects) as default arguments. They are created once and then continually mutated across calls.

So we have a major bug. We need to fix it. How will we know when we did?

Test Driven Development

There are two ways to write code. Code, then try to see if it works. Or define what working means with tests, and code until it does work.

Yes, you read about how tests are good, but it is more work and you wind up just calling the function a few times with different data and calling it good. The problem is you delete that sample code and move on. How do we tell that it still works with changes?

Or worse, you just write the function and assume it works as we did above. Now we have to figure out why we are getting weird bugs in production as almost everyone passes in a list, but occasionally they don’t.

For all but the smallest programs, testing is lower overall effort and improves code quality. Yes, it feels like more work, especially when deadlines are looming. And it may be very hard with existing software. However, untestable software is usually not modular enough to be good code to begin with. Just the act of testing makes you structure code into smaller and more predictable functions with less interrelated ‘magic’. Getting in the habit early will make you a better developer.

Lecture over. Let’s write some tests.

For many modules that are built into Python, there is a better version of that functionality available on PyPI. If you find an example using urllib2, look for a better one using requests. unittest is built into Python. Do yourself a favor and install pytest instead.

pip3 install pytest

See, that was painless.

Let’s use our examples and put those into tests. On a larger project, I would have a separate testing directory with tests. For simplicity, I’ll put everything in one file here. Below is what my default-mutable.py file looks like.

def append_values(*items, to_list=[]):
    for item in items:
        to_list.append(item)
    return to_list


def test_with_default():
    list_a = append_values(1, 2)
    assert list_a == [1, 2]
    list_d = append_values(7, 8)
    assert list_d == [7, 8]


def test_with_list():
    list_b = append_values(3, 4, to_list=[])
    assert list_b == [3, 4]
    list_c = append_values(5, 6, to_list=[9, 10])
    assert list_c == [9, 10, 5, 6]

If I run this from the folder containing the file, I would use pytest default-mutable.py. Here is the output:

C:\Users\micro\repositories\scratchpad>pytest default-mutable.py
============================= test session starts =============================
platform win32 -- Python 3.6.3, pytest-3.2.5, py-1.5.2, pluggy-0.4.0
rootdir: C:\Users\micro\repositories\scratchpad, inifile:
collected 2 items

default-mutable.py F.

================================== FAILURES ===================================
______________________________ test_with_default ______________________________

    def test_with_default():
        list_a = append_values(1, 2)
        assert list_a == [1, 2]
        list_d = append_values(7, 8)
>       assert list_d == [7, 8]
E       assert [1, 2, 7, 8] == [7, 8]
E         At index 0 diff: 1 != 7
E         Left contains more items, first extra item: 7
E         Use -v to get the full diff

default-mutable.py:11: AssertionError
===================== 1 failed, 1 passed in 0.08 seconds ======================

We have an output of F. so we failed the first test and passed the second. Notice the code is shown for the failing test and we see [1, 2, 7, 8] == [7, 8] as the failed assert. This is what we saw with our manual testing. Now we have a failing test that covers our problem. Let’s fix the code and verify with passing tests.

Fixing the default mutable problem

The best default value is one that makes it easy to tell the difference between a list being passed in (even an empty one) and no list at all. If we use None as the default, that difference is easy to detect. However, it forces us to handle the situation inside the function.

We will update our function to be the following:

def append_values(*items, to_list=None):
    if to_list is None:
        to_list = []
    for item in items:
        to_list.append(item)
    return to_list

If we have None, we just create a new empty list on each call. None is immutable, so it will not be polluted. Let’s rerun our tests.

============================= test session starts =============================
platform win32 -- Python 3.6.3, pytest-3.2.5, py-1.5.2, pluggy-0.4.0
collected 2 items

default-mutable.py ..

========================== 2 passed in 0.03 seconds ===========================

Well, that certainly looks better.
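The same inspect check from earlier can also confirm that the stored default no longer changes between calls. This is just a quick sketch repeating the list_a and list_d calls against the fixed function:

```python
import inspect


def append_values(*items, to_list=None):
    if to_list is None:
        to_list = []  # a fresh list is created on every call
    for item in items:
        to_list.append(item)
    return to_list


list_a = append_values(1, 2)
list_d = append_values(7, 8)

print(list_a)  # [1, 2]
print(list_d)  # [7, 8]
print(inspect.getfullargspec(append_values).kwonlydefaults)  # {'to_list': None}
```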

Using the default mutable for good

Once you understand what is going on and how it works, this default mutable behavior can be put to work for you.

Since the post has gotten fairly long, I’m going to keep you hanging on this one until my next Python post. For most people, who will have arrived after the next post is published, that just means clicking the next link below.


Part 3 of 9 in the Python Tips series.

Series Start | Python: Strings | Python: Mutable Defaults and Decorators
