Skip to content

Current Refactorings

Hellebore edited this page Mar 20, 2021 · 14 revisions

These are the refactorings currently available in Sourcery.

If you hover over a Sourcery refactoring proposal in your IDE or look at a Sourcery comment in GitHub you can see the Sourcery Refactoring ID for all of the refactorings being proposed in that change.

If you want to disable any of the refactorings in a specific function add in the comment # sourcery skip followed by the refactoring id. To skip it across your whole project add it into the refactor skips section of your Sourcery config file .

If you have any questions about any of these refactorings or have suggestions for new refactorings please reach out to us.

Augmented assign

Sourcery refactoring id: aug-assign

Description:

Replaces assignments with augmented assignments

Before:

count = count + other_value

After:

count += other_value

Explanation:

Reading the changed code is a bit shorter and clearer - we don't need to think about the count variable twice. Other operators that can be used include -=, *=, /= and **=.

The type you're assigning to has to have the appropriate operator defined, so Sourcery will only suggestion this change where it can determine the type and it is suitable. For instance numpy arrays do not support the /= operation.

Assign if expression

Sourcery refactoring id: assign-if-exp

Description:

Replaces conditional assignment to a variable with an if expression

Before:

if condition:
    x = 1
else:
    x = 2

After:

x = 1 if condition else 2

Explanation:

Python's conditional expression syntax is its version of the ternary operator. Using this syntax is definitely more concise, but it is one of the more controversial refactorings (along with list comprehensions). Some coders dislike these expressions and find them slightly harder to parse than writing them out fully.

Our view is that this is a definite improvement where the conditional expression is:

  • short and fits on one line
  • is not complex (no nested expressions or long chains of booleans)

Once the change is made there's only one statement where x is defined as opposed to having to read two statements plus the if-else lines. Similarly to the comprehension example, when we're scanning the code we usually won't need to know the details of how x gets assigned, and can just see that it's being assigned and move on.

Binary operator identity

Sourcery refactoring id: bin-op-identity

Description:

Replaces binary operationss between a value and itself with known identities:

Changes:

  • x | x => x
  • x & x => x
  • x ^ x => 0
  • x - x => 0
  • x / x => 1
  • x // x => 1
  • x % x => 0

Explanation:

These changes simplify the code, making it easier to understand what is going on.

Boolean if expression identity

Sourcery refactoring id: boolean-if-exp-identity

Description:

Simplifies boolean if expressions by removing unnecessary explicit references to True or False states

Before:

some_var = True if some_boolean_expression else False

After:

some_var = some_boolean_expression

Explanation:

Rather than using an if-expression to evaluate a boolean, you can just use it directly. This is shorter and clearer - if expressions of this form take a while to mentally parse.

Where Sourcery cannot determine if the expression is a boolean it will wrap it in a call tobool().

Collection to comprehension

Sourcery refactoring id: collection-builtin-to-comprehension

Description:

Use list, set or dictionary comprehensions directly instead of calling list(), dict() or set()

Before:

squares = list(x*x for x in y)
squares = set(x*x for x in y)
squares = dict((x, x*x) for x in xs)

After:

squares = [x*x for x in y]
squares = {x*x for x in y}
squares = {x: x*x for x in xs}

Explanation:

The Pythonic way to create a list, set or dictionary from a generator is to use comprehensions.

Using the comprehensions rather than the methods is slightly shorter, and the dictionary comprehension in particular is easier to read in comprehension form.

Comprehension to generator

Sourcery refactoring id: comprehension-to-generator

Description:

Replace unneeded comprehension with generator

Before:

hat_found = any([is_hat(item) for item in wardrobe])

After:

hat_found = any(is_hat(item) for item in wardrobe)

Explanation:

Functions like any, all and sum allow you to pass in a generator rather than a collection. Doing so removes a pair of brackets, making the intent slightly clearer. It will also return immediately if a hat is found, rather than having to build the whole list. This lazy evaluation can lead to performance improvements.

Note that we are actually passing a generator into any() so strictly speaking the code would look like this:

hat_found = any((is_hat(item) for item in wardrobe))

but Python allows you to omit this pair of brackets.

The standard library functions that accept generators are:

'all', 'any', 'enumerate', 'frozenset', 'list', 'max', 'min', 'set', 'sum', 'tuple'

Convert any to in

Sourcery refactoring id: convert-any-to-in

Description:

Converts any() functions to simpler in statements

Before:

if any(hat == 'bowler' for hat in hats):
    shout("I have a bowler hat!")

After:

if 'bowler' in hats:
    shout("I have a bowler hat!")

Explanation:

Using Python's in operator simplifies the code and makes it much easier to tell at a glance that you are checking if something is present in the sequence.

Convert to enumerate

Sourcery refactoring id: convert-to-enumerate

Description:

Replaces manual loop counter with call to enumerate

Before:

i = 0
for currency in currencies:
    print(i, currency)
    i += 1

After:

for i, currency in enumerate(currencies):
    print(i, currency)

Explanation:

When iterating over a list you sometimes need access to a loop counter that will let you know the index of the element you are utilising.

Using the built-in Python function, enumerate, lets you generate an index directly, removing two unneeded lines of code.

When reading this we don't have to worry about the book-keeping of the i variable, letting us focus in on the code that really matters.

Default get

Sourcery refactoring id: default-get

Description:

Simplify dictionary access by using the default get method

Before:

def pick_hat(available_hats: Dict[Label, Hat]):
    if self.favourite_hat in available_hats:
        hat_to_wear = available_hats[self.favourite_hat]
    else:
        hat_to_wear = NO_HAT
    return hat_to_wear

After:

def pick_hat(available_hats: Dict[Label, Hat]):
    hat_to_wear = available_hats.get(self.favourite_hat, NO_HAT)
    return hat_to_wear

Explanation:

We often want to pick something from a dictionary if the key is present, or use a default value if it isn't.

A useful shortcut is that Python dictionaries have a get() method which lets you set a default value using the second parameter.

This has slimmed the code down and removed some duplication. A point to note is that if you don't pass in a default value to get() it will use None.

Del comprehension

Sourcery refactoring id: del-comprehension

Description:

Replaces cases where deletions are made via for loops with comprehensions

Before:

x1 = {'a': 1, 'b': 2, 'c': 3}
for key in x1.copy():  # can't iterate over a variable that changes size
    if key not in x0:
        del x1[key]

After:

x1 = {'a': 1, 'b': 2, 'c': 3}
x1 = {key: value for key, value in x1.items() if key in x0}

Explanation:

When creating a filtered list in Python it is shorter and more readable to use a comprehension than to copy a list and delete the unneeded keys.

De Morgan Identity

Sourcery refactoring id: de-morgan

Description:

Simplifies conditional logic using De Morgan's Identity

Before:

if not not p:
    a = 1
if not a is b:
    a = 1
if not (p == 1 and q == 2):
    a = 1

After:

if p:
    a = 1
if a is not b:
    a = 1
if p != 1 or q != 2:
    a = 1

Explanation:

When reading logical statements it is important for them to be as simple as possible. This proposal applies De Morgan's identity to simplify code by removing double negatives or pushing negations deeper into statements.

Dictionary comprehension

Sourcery refactoring id: dict-comprehension

Description:

Replaces dictionaries created with for loops with dictionary comprehensions

Before:

cubes = {}
for i in range(100):
    cubes[i] = i**3

After:

cubes = {x: x**3 for x in range(100)}

Explanation:

A dictionary comprehension can create the dictionary on one line, cutting out the clutter of declaring an empty dict and then adding items.

Turning three lines of code into one is a definite win - it means less scrolling back and forth when reading methods and helps keep things manageable.

Squeezing code onto one line can make it more difficult to read, but for comprehensions this isn't the case. All of the elements that you need are nicely presented, and once you are used to the syntax it is actually more readable than the for loop version.

Another point is that the assignment is now more of an atomic operation - we're declaring what cubes is rather than giving instructions on how to build it. This makes the code read like more of a narrative, since going forward we will care more about what cubes is than the details of its construction.

Finally comprehensions will usually execute more quickly than building the collection in a loop, which is another factor if performance is a consideration.

Dictionary literal

Sourcery refactoring id: dict-literal

Description:

Replaces dictionaries created with dict() with {}

Before:

x = dict()

After:

x = {}

Explanation:

The most concise and Pythonic way to create a dictionary is to use the {} notation.

This fits in with the way we create dictionaries with items, saving a bit of mental energy that might be taken up with thinking about two different ways of creating dicts.

x = {'first':"thing"}

Doing things this way has the added advantage of being a nice little performance improvement.

Here are the timings before and after the change:

$ python3 -m timeit "x = dict()"
5000000 loops, best of 5: 69.8 nsec per loop
$ python3 -m timeit "x = {}"
20000000 loops, best of 5: 29.4 nsec per loop

Similar reasoning and performance results hold for replacing list() with [].

Ensure file closed

Sourcery refactoring id: ensure-file-closed

Description:

Use with when opening file to ensure closure

Before:

file = open("welcome.txt")
data = file.read()
print(data)
file.close()

After:

with open("welcome.txt") as file:
    data = file.read()
    print(data)

Explanation:

Opening files in this way is more concise, and also ensures that the call to file.close() is not skipped if there is an exception.

It makes use of Python's with context manager - under the hood the changed code behaves like this:

file = open("welcome.txt")
try:
    data = file.read()
    print(data)
finally:
    file.close()

The file is closed for you as soon as the block is exited, even where an exception has been thrown.

Extract duplicate method

Sourcery refactoring id: extract-duplicate-method

Description:

Identifies duplicate sections of code in a function and extracts these into their own method

Requires Sourcery Pro

Before:

def extraction_example():
  self.speed_slider = Scale(
    self.master, from_=1, to=10, orient=HORIZONTAL, label='Speed')
  self.speed_slider.pack()
  self.speed_slider.set(DEFAULT_SPEED)
  self.speed_slider.configure(background='white')

  self.force_slider = Scale(
    self.master, from_=1, to=10, orient=HORIZONTAL, label='Force')
  self.force_slider.pack()
  self.force_slider.set(DEFAULT_FORCE)
  self.force_slider.configure(background='white')

After:

def extraction_example():
  self.speed_slider = _extracted_from_extraction_example_2(
      self, 'Speed', DEFAULT_SPEED)
  self.force_slider = _extracted_from_extraction_example_2(
      self, 'Force', DEFAULT_FORCE)

def _extracted_from_extraction_example_2(self, label, arg2):
  result = Scale(self.master, from_=1, to=10, orient=HORIZONTAL, label=label)
  result.pack()
  result.set(arg2)
  result.configure(background='white')

  return result

Explanation:

Do not Repeat Yourself (DRY) is an important tenet of writing clean, maintainable code. Duplicated code bloats the code base, making it harder to read and understand. It often also leads to bugs. Where changes are made in only some of the duplicated areas unintended behaviour will often arise.

One of the main ways to remove duplication is to extract the common areas into another method and call that. Sourcery can detect areas of duplicate code that are in the same function and extract them. It is recommended that you then rename the extracted function and any arguments that have not been automatically named. In the above example a suitable method name would be create_slider, and arg2 would be default_value.

Extract method

Sourcery refactoring id: extract-method

Description:

Extracts complex pieces of functions into their own methods

Requires Sourcery Pro

Before:

def func(self):
  if typed and self.current_char == '(':
    pos = self.pos
    try:
      pass
    except DefinitionError as exParamQual:
      self.pos = pos
      try:
        pass
      except DefinitionError as exNoPtrParen:
        self.pos = pos
        msg = "If parenthesis in noptr-declarator"
        if paramMode == 'function':
          msg += " (e.g., 'void (*f(int arg))(double)')"
        prevErrors.append((exNoPtrParen, msg))
        header = "Error in declarator"
        raise self._make_multi_error(prevErrors, header)

After:

def func(self):
  if typed and self.current_char == '(':
    pos = self.pos
    try:
      pass
    except DefinitionError as exParamQual:
      self.pos = pos
      try:
        pass
      except DefinitionError as exNoPtrParen:
        self._extracted_from_func_11(pos, exNoPtrParen)

def _extracted_from_func_11(self, pos, exNoPtrParen):
  self.pos = pos
  msg = "If parenthesis in noptr-declarator"
  if paramMode == 'function':
    msg += " (e.g., 'void (*f(int arg))(double)')"
  prevErrors.append((exNoPtrParen, msg))
  header = "Error in declarator"
  raise self._make_multi_error(prevErrors, header)

Explanation:

Methods are much more readable if they are short, and if every line of code is at the same rough level of abstraction.

This refactoring takes long blocks from functions and extracts them into their own method. It will only trigger for large blocks that are possible to extract, and where the remainder of the method is sufficiently long.

It is recommended that you then rename the extracted function and any arguments that have not been automatically named. Here a suitable method name would be handle_noptr_exception.

Flip comparison

Sourcery refactoring id: flip-comparison

Description:

Moves variables from the right side to the left side of comparisons

Before:

if 1 == x:
    do_x()

After:

if x == 1:
    do_x()

Explanation:

It is a coding convention that when doing comparisons to a numeric constant, the number should be on the right and the variable on the left. Having things be consistent in this way makes the code easier to read.

Guard

Sourcery refactoring id: guard

Description:

Adds in guard clause to a conditional

Before:

def set_hat_details(hat):
    if hat.stylish and is_summer_hat(hat):
        hat.tag = 'SUMMER'
    elif not hat:
        pass
    elif hat.stylish and is_winter_hat(hat):
        hat.tag = 'WINTER'

After:

def set_hat_details(hat):
    if not hat:
        return

    if hat.stylish and is_summer_hat(hat):
        hat.tag = 'SUMMER'
    elif hat.stylish and is_winter_hat(hat):
        hat.tag = 'WINTER'

Explanation:

Often one option within an if..elif conditional will have no effect, and hence have only a pass within it. Where there is no further code within the block afterwards it is possible to extract this branch out as a guard condition. This can then sometimes allow further refinements to be made to the if..elif statements. Sourcery only suggests this refactoring if it unlocks such further changes - for example in the above case it would be combined with lift-duplicated-conditional to give the following output:

def set_hat_details(hat):
    if not hat:
        return
    if hat.stylish:
        if is_summer_hat(hat):
            hat.tag = 'SUMMER'
        elif is_winter_hat(hat):
            hat.tag = 'WINTER'

Hoist if out of if

Sourcery refactoring id: hoist-if-from-if

Description:

Moves if statements that match a conditional out of that conditional

Before:

if hat.quality < DESIRED_QUALITY:
    happiness -= 1
    if not hat.is_stylish() and hat.quality < DESIRED_QUALITY:
        happiness -= 1

After:

if hat.quality < DESIRED_QUALITY:
    happiness -= 1
if not hat.is_stylish() and hat.quality < DESIRED_QUALITY:
    happiness -= 1

Explanation:

Where a nested conditional repeats the conditions of the outer one, it is logically equivalent to reduce the level of nesting as shown above. Doing this can make the meaning of the code slightly clearer.

Hoist loops out of if

Sourcery refactoring id: hoist-loop-from-if

Description:

Moves loops that occur in all cases of an if statement outside of the conditional

Before:

def sing_song(self):
    if style == 1:
        while thing_true():
            do_x()
    elif style == 2:
        while thing_true():
            do_y()
    elif style == 3:
        while thing_true():
            do_z()

After:

def sing_song(self):
    while thing_true():
        if style == 1:
            do_x()
        elif style == 2:
            do_y()
        elif style == 3:
            do_z()

Explanation

Where the same for or while loop is defined in all branches of a conditional, the code can be considerably shortened and clarified by hoisting. By moving the loop outside, duplication is removed and the code becomes much clearer.

Hoist statements out of if

Sourcery refactoring id: hoist-statement-from-if

Description:

Moves statements that occur in all cases of an if statement outside of the conditional

Before:

if sold > DISCOUNT_AMOUNT:
    total = sold * DISCOUNT_PRICE
    label = f'Total: {total}'
else:
    total = sold * PRICE
    label = f'Total: {total}'

After:

if sold > DISCOUNT_AMOUNT:
    total = sold * DISCOUNT_PRICE
else:
    total = sold * PRICE
label = f'Total: {total}'

Explanation:

We should always be on the lookout for ways to remove duplicated code. An opportunity for code hoisting is a nice way of doing so.

Sometimes code is repeated on both branches of a conditional. This means that the code will always execute. The duplicate lines can be hoisted out of the conditional and replaced with a single line.

By taking the assignment to label outside of the conditional we have removed a duplicate line of code, and made it clearer what the conditional is actually controlling, which is the total.

Hoist statement out of loop

Sourcery refactoring id: hoist-statement-from-loop

Description:

Moves statements that are constant across all cases of a loop outside of the loop

Before:

for building in buildings:
    city = 'London'
    addresses.append(building.street_address, city)

After:

city = 'London'
for building in buildings:
    addresses.append(building.street_address, city)

Explanation:

Another type of hoisting is pulling invariant statements out of loops. If a statement just sets up some variables for use in the loop, it doesn't need to be inside it. Loops are inherently complex, so making them shorter and easier to understand should be on your mind while writing them.

In this example the city variable gets assigned inside the loop, but it is only read and not altered.

It's therefore safe to hoist it out, and this makes it clearer that the same city value will apply to every building.

This also improves performance - any statement in a loop is going to be executed every time the loop runs. The time spent on these multiple executions is being wasted, since it only needs to be executed once. This saving can be significant if the statements involve calls to databases or other time-consuming tasks.

Inline immediately returned variables

Sourcery refactoring id: inline-immediately-returned-variable

Description:

Inlines a variable to a return in the case when the variable being declared is immediately returned

Before:

def state_attributes(self):
    """Return the state attributes."""
    state_attr = {
        ATTR_CODE_FORMAT: self.code_format,
        ATTR_CHANGED_BY: self.changed_by,
    }
    return state_attr

After:

def state_attributes(self):
    """Return the state attributes."""
    return {
        ATTR_CODE_FORMAT: self.code_format,
        ATTR_CHANGED_BY: self.changed_by,
    }

Explanation:

Something that we often see in people's code is assigning to a result variable and then immediately returning it.

Returning the result directly shortens the code and removes an unnecessary variable, reducing the mental load of reading the function.

Where intermediate variables can be useful is if they then get used as a parameter or a condition, and the name can act like a comment on what the variable represents. In the case where you're returning it from a function, the function name is there to tell you what the result is - in the example above it's the state attributes, and the state_attr name wasn't providing any extra information.

Index range replacement

Sourcery refactoring id: for-index-replacement

Description:

Replace item lookups in loops using the index with direct reference to the items

Before:

for i in range(len(currencies)):
    print(currencies[i])

After:

for currency in currencies:
    print(currency)

Explanation:

A pattern that is often used in Python for loops is to use range(len(list)) to generate a range of numbers that can be iterated over.

If the index i is only used to do list access this code can be improved by iterating over the list directly, as in the above example.

This code is easier to understand, and a lot less cluttered. In particular being able to use a meaningful name for currency greatly improves readability.

Note that this refactoring will only trigger where Sourcery can determine that the collection (currencies in this case) is a list.

Invert any & all

Sourcery refactoring id: invert-any-all

Description:

Switches not any or not all statements to all or any statements respectively

Before:

b = not all(a > b for a in things)

After:

b = any(a <= b for a in things)

Explanation:

A negation is more difficult to read than a straightforward statement. This refactoring inverts the any or all to remove the negation.

Invert any & all in body

Sourcery refactoring id: invert-any-all-body

Description:

Simplifies any and all statements that can be inverted

Before:

b = any(not a for a in things)

After:

b = all(things)

Explanation:

A negation is more difficult to read than a straightforward statement. This refactoring inverts the any or all to remove the negation.

Last if statement guard

Sourcery refactoring id: last-if-guard

Description:

Convert the final conditional into a guard clause

Before:

def f(a=None):
    if a is None:
        return 42
    else:
        # some long calculations
        var = (i 2 for i in range(a))
        return sum(var)

After:

def f(a=None):
    if a is None:
        return 42

    # some long calculations
    var = (i 2 for i in range(a))
    return sum(var)

Explanation:

A common code pattern is to have some clauses at the start of a function, to check whether certain conditions have been fulfilled and return early or raise an exception if not.

While this is perfectly valid code, it can run into problems with excessive nesting, particularly if the rest of the function is fairly long.

Here we can take advantage of the fact that we don't need the else if the main body of the if breaks the control flow by ending with return or raise. Rewriting the function as shown here is logically equivalent.

Using a guard condition, or multiple guard conditions, in this way now doesn't cause the rest of the function to be indented. In general the less we have to deal with indents the easier the code is to understand.

Lift duplicated conditional

Sourcery refactoring id: lift-duplicated-conditional

Description:

Lift repeated conditional into its own if statement

Before:

if isinstance(hat, Sombrero) and hat.colour == 'green':
    wear(hat)
elif isinstance(hat, Sombrero) and hat.colour == 'red':
    destroy(hat)

After:

if isinstance(hat, Sombrero):
    if hat.colour == 'green':
        wear(hat)
    elif hat.colour == 'red':
        destroy(hat)

Explanation:

Duplicating conditions in an if statement makes things more difficult to read, and carries with it all the usual problems of code duplication. While normally we try to avoid adding nesting to the code, in this case it makes sense to lift the duplicated conditional into its own if statement.

It is now clearer at a glance that the whole if..elif chain relates only to sombreros and not other types of hat.

Lift return into if

Sourcery refactoring id: lift-return-into-if

Description:

Lift return into if

Before:

def f():
    if condition:
        val = 42
    else:
        val = 0
    return val

After:

def f():
    if condition:
        return 42
    else:
        return 0

Explanation:

This is a quick way to streamline code slightly. Where a value is set on each branch of an if and then immediately returned, instead return it directly from each branch.

This has removed an unnecessary intermediate variable which we had to mentally track.

List comprehension

Sourcery refactoring id: list-comprehension

Description:

Converts a for loop into a list comprehension

Before:

cubes = []
for i in range(20):
    cubes.append(i**3)

After:

cubes = [i**3 for i in range(20)]

Explanation:

A list comprehension can create the list on one line, cutting out the clutter of declaring an empty list and then appending values.

Turning three lines of code into one is a definite win - it means less scrolling back and forth when reading methods and helps keep things manageable.

Squeezing code onto one line can make it more difficult to read, but for comprehensions this isn't the case. All of the elements that you need are nicely presented, and once you are used to the syntax it is actually more readable than the for loop version.

Another point is that the assignment is now more of an atomic operation - we're declaring what cubes is rather than giving instructions on how to build it. This makes the code read like more of a narrative, since going forward we will care more about what cubes is than the details of its construction.

Finally comprehensions will usually execute more quickly than building the collection in a loop, which is another factor if performance is a consideration.

List literal

Sourcery refactoring id: list-literal

Description:

Replaces lists created with list() with []

Before:

x = list()

After:

x = []

Explanation:

The most concise and Pythonic way to create a list is to use the [] notation.

This fits in with the way we create lists with elements, saving a bit of mental energy that might be taken up with thinking about two different ways of creating lists.

x = ['first', 'second']

Doing things this way has the added advantage of being a nice little performance improvement.

Here are the timings before and after the change:

$ python3 -m timeit "x = list()"
5000000 loops, best of 5: 63.3 nsec per loop
$ python3 -m timeit "x = []"
20000000 loops, best of 5: 15.8 nsec per loop

Similar reasoning and performance results hold for replacing dict() with {}.

Merge comparisons

Sourcery refactoring id: merge-comparisons

Description:

Consolidates multiple comparisons into a single comparison

Before:

def process_payment(payment):
    if payment.currency == 'USD' or payment.currency == 'EUR':
        process_standard_payment(payment)
    else:
        process_international_payment(payment)

After:

def process_payment(payment):
    if payment.currency in ['USD', 'EUR']:
        process_standard_payment(payment)
    else:
        process_international_payment(payment)

Explanation:

We often have to compare a value to one of several possible others. When written out like the 'Before' example we have to look through each comparison to understand it, as well as mentally processing the boolean operator.

By using the in operator and moving the values we are comparing to into a collection we can simplify things.

This has avoided a little bit of duplication, and the conditional can now be taken in and understood with one glance.

Merge dictionary assign

Sourcery refactoring id: merge-dict-assign

Description:

Declare the dictionary with values rather than creating an empty one and assigning to it

Before:

hats_i_own = {}
hats_i_own['panama'] = 1
hats_i_own['baseball_cap'] = 2
hats_i_own['bowler'] = 23

After:

hats_i_own = {'panama': 1, 'baseball_cap': 2, 'bowler': 23}

Explanation:

When declaring a dictionary and filling it up with values one way that can come naturally is to declare it as empty and then add entries to it.

This can be done in place, shortening the code and making the intent more explicit. Now I just need to glance at one line to see that I'm filling a variable with hats, rather than four.

The same holds true for filling up other collection types like sets and lists.

Merge duplicate blocks

Sourcery refactoring id: merge-duplicate-blocks

Description:

Restructure conditional to merge duplicate branches together

Before:

def process_payment(payment):
    if payment.currency == 'USD':
        process_standard_payment(payment)
    elif payment.currency == 'EUR':
        process_standard_payment(payment)
    else:
        process_international_payment(payment)

After:

def process_payment(payment):
    if payment.currency == 'USD' or payment.currency == 'EUR':
        process_standard_payment(payment)
    else:
        process_international_payment(payment)

Explanation:

We should always be searching out opportunities to remove duplicated code. A good place to do so is where there are multiple identical blocks inside an if..elif chain. This refactoring combines such blocks.

Now if we need to change the process_standard_payment(payment) line we can do it in one place instead of two. This becomes even more important if these blocks involve multiple lines.

Merge is instance

Sourcery refactoring id: merge-is-instance

Description:

Combines together multiple isinstance functions

Before:

if isinstance(hat, Bowler) or isinstance(hat, Fedora):
    wear(hat)

After:

if isinstance(hat, (Bowler, Fedora)):
    wear(hat)

Explanation:

When you want to check whether something is one of multiple different types, you can merge the two isinstance checks into a single call.

This is shorter while staying nice and easy to read.

Merge list append

Sourcery refactoring id: merge-list-append

Description:

Create the list with values instead of creating an empty list and appending to it

Before:

hats_i_own = []
hats_i_own.append('panama')
hats_i_own.append('baseball_cap')
hats_i_own.append('bowler')

After:

hats_i_own = ['panama', 'baseball_cap', 'bowler']

Explanation:

When declaring a list and filling it up with values one way that can come naturally is to declare it as empty and then append to it.

This can be done in place, shortening the code and making the intent more explicit. Now I just need to glance at one line to see that I'm filling a variable with hats, rather than four.

Doing it this way is also slightly more performant since it avoids the function calls to append. The same holds true for filling up other collection types like sets and dictionaries.

Merge list extend

Sourcery refactoring id: merge-list-extend

Description:

Create the list with values instead of creating an empty list and extending it with another list

Before:

hats_i_own = []
hats_i_own.extend(['panama, baseball_cap, bowler'])

After:

hats_i_own = ['panama', 'baseball_cap', 'bowler']

Explanation:

When declaring a list with values it is much clearer to do so on one line rather than declaring an empty list and then extending it with values.

Merge nested if statements

Sourcery refactoring id: merge-nested-ifs

Description:

Merges together multiple nested if conditions into one

Before:

if a:
    if b:
        return c

After:

if a and b:
    return c

Explanation:

Too much nesting can make code difficult to understand, and this is especially true in Python, where there are no brackets to help out with the delineation of different nesting levels.

Reading deeply nested code is confusing, since you have to keep track of which conditions relate to which levels. We therefore strive to reduce nesting where possible, and the situation where two if conditions can be combined using and is an easy win.

Merge repeated ifs

Sourcery refactoring id: merge-repeated-ifs

Description:

Merges together the interior contents of if statements with identical conditions

Before:

if wardrobe.hats:
  self.happiness += 1
else:
  self.happiness -= 1

if wardrobe.hats:
  self.stylishness += 1
else:
  self.stylishness -= 1

After:

if wardrobe.hats:
  self.happiness += 1
  self.stylishness += 1
else:
  self.happiness -= 1
  self.stylishness -= 1

Explanation:

Where subsequent if statements have identical conditions it is logically equivalent to merge them together. This shortens the code and makes it easier to read and understand.

Note that this can only be done if the statements inside the first if condition have no effect on the condition itself.

Merge set add

Sourcery refactoring id: merge-set-add

Description:

Create the set with values instead of declaring an empty set and adding to it

Before:

hats_i_own = set()
hats_i_own.add('panama')
hats_i_own.add('baseball_cap')
hats_i_own.add('bowler')

After:

hats_i_own = {'panama', 'baseball_cap', 'bowler'}

Explanation:

When declaring a set and filling it up with values one way that can come naturally is to declare it as empty and then add to it.

This can be done in place, shortening the code and making the intent more explicit. Now I just need to glance at one line to see that I'm filling a variable with hats, rather than four.

Doing it this way is also slightly more performant since it avoids the function calls to add. The same holds true for filling up other collection types like lists and dictionaries.

Min/max identity

Sourcery refactoring id: min-max-identity

Description:

Replaces duplicate conditionals looking for the minimum or maximum value of multiple variables with a min or max function

Before:

if first_hat.price < second_hat.price:
    cheapest_hat_price = first_hat.price
else:
    cheapest_hat_price = second_hat.price
if sale_price >= 10:
    sale_price = 10

After:

cheapest_hat_price = min(first_hat.price, second_hat.price)
sale_price = min(sale_price, 10)

Explanation:

We often need to work out the smallest or largest of two values, and the quickest way to do this in Python is to use the built-in min and max functions. This results in a shorter and clearer way to achieve the same result. The same functions also offer a shortcut for when we want to put a cap or a floor on the value of a variable.

Move assign

Sourcery refactoring id: move-assign

Description:

Moves assignment of variables closer to their usage

Before:

def should_i_wear_this_hat(self, hat):
    if not isinstance(hat, Hat):
        return False

    weather_outside = self.look_out_of_window()
    is_stylish = isinstance(hat, StylishHat)
    if weather_outside.is_raining:
        print("Damn.")
        return True
    else:
        print("Great.")
        return is_stylish

After:

def should_i_wear_this_hat(self, hat):
    if not isinstance(hat, Hat):
        return False

    weather_outside = self.look_out_of_window()
    if weather_outside.is_raining:
        print("Damn.")
        return True
    else:
        print("Great.")
        is_stylish = isinstance(hat, StylishHat)
        return is_stylish

Explanation:

The scope of local variables should always be as tightly defined as possible and practicable.

This means that:

  • You don't have to keep the variable in your working memory through the parts of the function where it's not needed. This cuts down on the cognitive load of understanding your code.

  • If code is in coherent blocks where variables are declared and used together, it makes it easier to split functions apart, which can lead to shorter, easier to understand methods.

  • If variables are declared far from their usage, they can become stranded. If the code where they are used is later changed or removed unused variables can be left sitting around, complicating the code unnecessarily.

None compare

Sourcery refactoring id: none-compare

Description:

Replaces == with is when comparing to None

Before:

if hat == None:
    raise NoHatException

After:

if hat is None:
    raise NoHatException

Explanation:

In Python is refers to reference equality - where you want to check if something is the same object. Equals == checks value equality, to check that the objects are equal to each other. In Python the None object is a singleton, so it is correct to use is when comparing to it.

Or if experession identity

Sourcery refactoring id: or-if-exp-identity

Description:

Replaces if expressions that compare to the target with or

Before:

currency = input_currency if input_currency else DEFAULT_CURRENCY

After:

currency = input_currency or DEFAULT_CURRENCY

Explanation:

Here we find ourselves setting a value if it evaluates to True, and otherwise using a default.

The 'After' case is a bit easier to read and avoids the duplication of input_currency.

It works because the left-hand side is evaluated first. If it evaluates to True then currency will be set to this and the right-hand side will not be evaluated. If it evaluates to False the right-hand side will be evaluated and currency will be set to DEFAULT_CURRENCY.

Remove dictionary keys

Sourcery refactoring id: remove-dict-keys

Description:

Removes unnecessary call to keys() when iterating over a dictionary

Before:

for currency in currencies.keys():
    process(currency)

After:

for currency in currencies:
    process(currency)

Explanation:

Sometimes when iterating over a dictionary you only need to use the dictionary keys.

In this case the call to keys() is unnecessary, since the default behaviour when iterating over a dictionary is to iterate over the keys.

The code is now slightly cleaner and easier to read, and avoiding a function call will yield a (small) performance improvment.

Remove pass from body

Sourcery refactoring id: remove-pass-body

Description:

Removes a pass from the body of a conditional by inverting it

Before:

if not a:
    pass
else:
    return c

After:

if a:
    return c

Explanation:

It is much clearer for a conditional to do something in its main body than solely in its else clause. In the latter case anyone reading the code has to mentally invert the condition in order to determine the meaning.

Remove pass from elif

Sourcery refactoring id: remove-pass-elif

Description:

Removes a pass from the elif section of a conditional

Before:

if a:
    x()
elif b:
    pass
else:
    return c

After:

if a:
    x()
elif not b:
    return c

Explanation:

Reducing the number of branches in a conditional reduces the mental load of reading and understanding it considerably. This simplification can be made where an empty elif branch is immediately followed by the else.

Remove redundant exception handlers

Sourcery refactoring id: remove-redundant-except-handler

Description:

Removes exception handlers that can never trigger (as the exceptions have already been caught)

Before:

try:
    pass
except Exception:
    do_x()
except ValueError:
    do_y()
finally:
    do_t()

After:

try:
    pass
except Exception:
    do_x()
finally:
    do_t()

Explanation:

Currently only one except handler can be triggered for any exception raised in Python. This means that if an exception is already caught, subsequent handlers can never have an effect on it. Having them present is confusing, since the reader may think that this exception handling code is behaving as expected, when in fact it is redundant.

Remove redundant if statements

Sourcery refactoring id: remove-redundant-if

Description:

Removes conditional tests where the conditional is always True or False

Before:

def hello(name : str):
    if name.startswith("L"):
        self.sing("Hip Hip Horray! For " + name)
    elif not name.startswith("L"):
        self.sing("Hello " + name + ", it's nice to meet you.")

After:

def hello(name : str):
    if name.startswith("L"):
        self.sing("Hip Hip Horray! For " + name)
    else:
        self.sing("Hello " + name + ", it's nice to meet you.")

Explanation:

This refactoring will re-structure conditionals where Sourcery determines that one of the tests is always True or False. This reveals the actual logic of the conditional to the reader, making the code easier to understand.

Remove redundant pass

Sourcery refactoring id: remove-redundant-pass

Description:

Removes unnecessary pass statements

Before:

if a:
    x()
else:
    pass
do_something()
pass

After:

if a:
    x()
do_something()

Explanation:

A pass statement is unnecessary if a block has other statements in it, and an else which only contains pass can be safely removed. Making these changes shortens the code, and the reader doesn't have to consider and deal with the unnecessary pass statements.

Remove unnecessary else

Sourcery refactoring id: remove-unnecessary-else

Description:

Remove unnecessary else after guard condition

Before:

def f(a=None):
    if a is None:
        return 42
    else:
        # some long calculations
        var = (i 2 for i in range(a))
        return sum(var)

After:

def f(a=None):
    if a is None:
        return 42

    # some long calculations
    var = (i 2 for i in range(a))
    return sum(var)

Explanation:

A common code pattern is to have some clauses at the start of a function, to check whether certain conditions have been fulfilled and return early or raise an exception if not.

While this is perfectly valid code, it can run into problems with excessive nesting, particularly if the rest of the function is fairly long.

Here we can take advantage of the fact that we don't need the else if the main body of the if breaks the control flow by ending with return or raise. Rewriting the function as shown here is logically equivalent.

Using a guard condition, or multiple guard conditions, in this way now doesn't cause the rest of the function to be indented. In general the less we have to deal with indents the easier the code is to understand.

Remove unreachable code

Sourcery refactoring id: remove-unreachable-code

Description:

Removes code that will never be executed

Before:

for a in b:
    do_a()
    continue
    do_x()
    do_y()

After:

for a in b:
    do_a()
    continue

Explanation:

Statements after a continue, return or raise will never be executed. Leaving them in the code confuses the reader, who may believe that these statements have some effect. They should therefore be removed.

Remove zero from range

Sourcery refactoring id: remove-zero-from-range

Description:

Removes unnecessary explicit 0 argument from range

Before:

for i in range(0, len(x)):
    do_t()

After:

for i in range(len(x)):
    do_t()

Explanation:

The default starting value for a call to range() is 0, so it is unnecessary to explicitly define it. This refactoring removes such zeros, slightly shortening the code.

Replace index with underscore

Sourcery refactoring id: for-index-underscore

Description:

Replaces an unused index in a for loop with an underscore

Before:

for hat in my_wardrobe.hats:
    shout("Hurrah!")

After:

for _ in my_wardrobe.hats:
    shout("Hurrah!")

Explanation:

Sometimes in a for loop we just want some code to run a certain number of times, and don't actually make use of the index variable.

In the above example we have introduced a new variable, hat, which we have to note when reading the code, but actually we don't need it and could replace it with _:

It is a convention in Python to use _ as a throwaway name for unused variables. This means your brain can learn to safely ignore these, reducing the overhead to understand the code. Where you see this in a for loop it it immediately clear that the loop is just used to repeat a block of code and we don't care about the value being iterated over.

Replace while with for

Sourcery refactoring id: while-to-for

Description:

Replaces a while loop with a counter by a for loop

Before:

i = 0
while i < 10:
    print("I love hats")
    i += 1

After:

for i in range(10):
    print("I love hats")

Explanation:

We often need to iterate over the same bit of code a certain number of times. Developers who are unfamiliar with Python may consider using a while loop for this, but it's almost certainly better to use for.

Converting from while to for means the explicit handling of the counter can be removed, shortening the code and making it esier to read.

Return identity

Sourcery refactoring id: return-identity

Description:

Return booleans directly rather than wrapping them in a conditional that returns True or False

Before:

def function(a, b):
    if isinstance(a, b) or issubclass(b, a):
        return True
    return False

After:

def function(a, b):
    return isinstance(a, b) or issubclass(b, a)

Explanation:

Where we reach the end of a method and want to return True or False, a common way of doing so to explicity return them.

However, it's neater just to return the result directly.

This can only be done if the expression evaluates to a boolean. In this example:

def any_hats(self, wardrobe):
    hats = [item for item in wardrobe if is_hat(item)]
    if hats or self.wearing_hat():
        return True
    return False

We can't do this exact refactoring, since now we could be returning the list of hats rather than True or False. To make sure we're returning a boolean, we can wrap the return in a call to bool():

def any_hats(self, wardrobe):
    hats = [item for item in wardrobe if is_hat(item)]
    return bool(hats or self.wearing_hat())

Set comprehension

Sourcery refactoring id: set-comprehension

Description:

Replaces sets created with for loops with set comprehensions

Before:

cubes = set()
for i in range(20):
    cubes.add(i**3)

After:

cubes = {i**3 for i in range(20)}

Explanation:

A set comprehension can create the set on one line, cutting out the clutter of declaring an empty set and then adding values.

Turning three lines of code into one is a definite win - it means less scrolling back and forth when reading methods and helps keep things manageable.

Squeezing code onto one line can make it more difficult to read, but for comprehensions this isn't the case. All of the elements that you need are nicely presented, and once you are used to the syntax it is actually more readable than the for loop version.

Another point is that the assignment is now more of an atomic operation - we're declaring what cubes is rather than giving instructions on how to build it. This makes the code read like more of a narrative, since going forward we will care more about what cubes is than the details of its construction.

Finally comprehensions will usually execute more quickly than building the collection in a loop, which is another factor if performance is a consideration.

Simplify boolean comparison

Sourcery refactoring id: simplify-boolean-comparison

Description:

Removes unnecessarily verbose boolean comparisons

Before:

need_hat = not is_raining
if need_hat == True:
    put_on_hat()

After:

need_hat = not is_raining
if need_hat:
    put_on_hat()

Explanation:

It is unnecessary to compare boolean values to True or False in the test of an if condition. Removing these unnecessary checks makes the code slightly shorter and easier to parse.

Simplify generator

Sourcery refactoring id: simplify-generator

Description:

An identity generator (a for a in coll) can be replaced directly with the collection coll

Before:

if any(hat for hat in wardrobe.hats):
  print("I have a hat!")

After:

if any(wardrobe.hats):
  print("I have a hat!")

Explanation:

The expression (x for x in y) is a generator that returns all of the elements of y. If being passed into a function like any or all that takes a generator or sequence, it can simply be replaced by y which is much clearer.

Simplify length comparison

Sourcery refactoring id: simplify-len-comparison

Description:

Removes unnecessarily verbose length comparisons

Before:

if len(list_of_hats) > 0:
    hat_to_wear = choose_hat(list_of_hats)

After:

if list_of_hats:
    hat_to_wear = choose_hat(list_of_hats)

Explanation:

Something we often do is check whether a list or sequence has elements before we try and do something with it.

A Pythonic way of doing this is just to use the fact that Python lists and sequences evaluate to True if they have elements, and False otherwise.

Doing it this way is a convention, set out in Python's PEP8 style guide. Once you've gotten used to doing it this way it does make the code slightly easier to read and a bit less cluttered.

Simplify negative index

Sourcery refactoring id: simplify-negative-index

Description:

Replaces a[len(a)-1] with negative index lookup a[-1]

Before:

a = [1, 2, 3]
last_element = a[len(a) - 1]

After:

a = [1, 2, 3]
last_element = a[-1]

Explanation:

In Python you can access the end of a list by using negative indices. So my_list[-1] gets the final element, my_list[-2] gets the penultimate one and so on. Once you know the trick, reading code that uses it is much easier than the alternative.

Simplify numeric comparison

Sourcery refactoring id: simplify-numeric-comparison

Description:

Consolidates any mathematical operations in a numeric comparison so there is a direct comparison of variable to number

Before:

if a + 1 < 2:
    do_x()

After:

if a < 1:
    do_x()

Explanation:

This refactoring removes unnecessary clutter from the code and makes it easier to determine what a variable is being compared to.

Split or ifs

Sourcery refactoring id: split-or-ifs

Description:

Splits out conditions combined with an or in an if statement into their own if statement.

Before:

if a or b:
    c()

After:

if a:
    c()
elif b:
    c()

Explanation:

It is logically equivalent to transform an or condition into an if branch and an elif branch of a conditional. This is not prima facie an improvement to the code, but it can unlock further improvements, and Sourcery will only make this change if such improvements are possible.

Square identity

Sourcery refactoring id: square-identity

Description:

Replaces cases of a variable being multiplied by itself with squaring that variable

Before:

def square(variable: int):
    return variable*variable

After:

def square(variable: int):
    return variable**2

Explanation:

This identity can make the code slightly shorter and easier to read.

Sum comprehension

Sourcery refactoring id: sum-comprehension

Description:

Replaces summed values created with for loops with sum comprehensions

Before:

total = 0
for hat in hats:
    total += hat.price

After:

total = sum(hat.price for hat in hats)

Explanation:

Much of programming is about adding up lists of things, and Python has the built-in sum() function to help with this.

This is much shorter, which is a definite bonus. The code also now explicitly tells you what it is trying to do - sum the price of all the hats.

Swap if & else branches

Sourcery refactoring id: swap-if-else-branches

Description:

Swaps if and else branches of conditionals

Before:

if location == OUTSIDE:
    pass
else:
    take_off_hat()

After:

if location != OUTSIDE:
   take_off_hat()
else:
   pass

Explanation:

One pattern we sometimes see is a conditional where nothing happens in the main body, and all of the action is in the else clause.

In this case we can make the code shorter and more concise by swapping the main body and the else around. We have to make sure to invert the conditional, then the logic from the else clause moves into the main body.

We then have an else clause which does nothing, so we can remove it.

if location != OUTSIDE:
    take_off_hat()

This is easier to read, and the intent of the conditional is clearer. When reading the code I don't have to mentally invert it to understand it, since that has been done for me.

Sourcery will also make this change if the else can be dropped since the body of the if is a guard condition.

Swap nested ifs

Sourcery refactoring id: swap-nested-ifs

Description:

Swaps the order of nested if statements

Before:

if a:
    if b:
        return c

After:

if b:
    if a:
        return c

Explanation:

This refactoring identity will not improve code quality on its own, but can be part of a series of steps that will improve quality. Sourcery will only suggest this refactoring where it unlocks further improvements.

Switch

Sourcery refactoring id: switch

Description:

Simplify conditionals into a form more like a switch statement

Before:

if item.a == 1:
    q += 2
elif item.a == 2:
    q += 2
elif item.a == 4:
    q = 0

After:

if item.a in [1, 2]:
    q += 2
elif item.a == 4:
    q = 0

Explanation:

This refactoring examines complex conditionals where a variable is being compared to various different values, and tries to put them into the simplest form possible, eliminating duplicated blocks.

Use assigned variable

Sourcery refactoring id: use-assigned-variable

Description:

Uses a variable that was previously defined in the function instead of repeating what was defined in the variable

Before:

wardrobe = {'blue_hat': 1, 'red_hat':3}
for item in wardrobe:
    count = wardrobe[item]
    add_to_total(wardrobe[item])

After:

wardrobe = {'blue_hat': 1, 'red_hat':3}
for item in wardrobe:
    count = wardrobe[item]
    add_to_total(count)

Explanation:

Where possible, it is preferable to re-use local variables that have been assigned to. These variables will often have a more descriptive name that can aid in code comprehension, and additionally re-use can reduce duplication.

Use any

Sourcery refactoring id: use-any

Description:

Use any rather than a for loop

Before:

found = False
for hat in hats:
    if hat == SOMBRERO:
        found = True
        break

After:

found = any(hat == SOMBRERO for hat in hats)

Explanation:

A common pattern is that we need to find if some condition holds for one or all of the items in a collection.

Using Python's any() and all() built in functions is a more concise way of doing this than using a for loop.

any() will return True when at least one of the elements evaluates to True, all() will return True only when all the elements evaluate to True.

These will also short-circuit execution where possible. If the call to any() finds an element that evalutes to True it can return immediately. This can lead to performance improvements if the code wasn't already short-circuiting.

Use count

Sourcery refactoring id: use-count

Description:

Replaces sum() with count() where appropriate

Before:

return sum(hat == "bowler" for hat in hats)

After:

return hats.count("bowler")

Explanation:

This simplification can be made where a call to sum() is counting the number of elements that match a condition.

Use dictionary items

Sourcery refactoring id: use-dict-items

Description:

Use dictionary.items() in for loops to access both key and value at same time

Before:

hats_by_colour = {'blue': ['panama', 'baseball_cap']}
for hat_colour in hats_by_colour:
    hats = hats_by_colour[hat_colour]
    if hat_colour in FAVOURITE_COLOURS:
        think_about_wearing(hats)

After:

hats_by_colour = {'blue': ['panama', 'baseball_cap']}
for hat_colour, hats in hats_by_colour.items():
    if hat_colour in FAVOURITE_COLOURS:
        think_about_wearing(hats)

Explanation:

When iterating over a dictionary a good tip is that you can use items() to let you access the keys and values at the same time.

This saves us the line that we used to assign to hats, incorporating it into the for loop. The code now reads more naturally, with a touch less duplication.

Use str.join()

Sourcery refactoring id: use-join

Description:

Use str.join() instead of for loop

Before:

all_hats = ""
for hat in my_wardrobe.hats:
    all_hats += hat.description

After:

all_hats = "".join(hat.description for hat in my_wardrobe.hats)

Explanation:

The most straightforward way to concatenate strings in Python is to just use the + operator:

hat_description = hat.colour + hat.type

This is perfectly fine when you are joining together small numbers of strings (though f-strings are the best choice for doing more complicated string handling).

The problem with using + or += comes when they are used to concatenate large lists of strings. For example you might use them in a for loop like the 'Before' example. This is cumbersome to read, and also isn't very performant. A new string has to be created for every iteration in the for loop, which slows things down.

Luckily Python strings come with the join method to solve this problem.

This accomplishes the same task in one line and is quite a lot faster to boot. You can also add a separator in between each string easily, without having to worry about extra separators being added at the beginning or end of the result.

all_hats = ", ".join(hat.description for hat in my_wardrobe.hats)

Use len()

Sourcery refactoring id: use-len

Description:

Replaces sum() with len where appropriate

Before:

if sum(1 for hat in hats) > 0:
    self.shout("I have hats")

After:

if len(hats) > 0:
    self.shout("I have hats")

Explanation:

This simplification can be made where the sum() call is effectively just determining the length of a sized object. It can only be made where Sourcery can determine that hats supports the __len__() method.

Yield from

Sourcery refactoring id: yield-from

Description:

Replaces yield as part of for loops with yield from

Before:

def get_content(entry):
    for block in entry.get_blocks():
        yield block

After:

def get_content(entry):
    yield from entry.get_blocks()

Explanation:

One little trick that often gets missed is that Python's yield keyword has a corresponding yield from for collections, so there's no need to iterate over a collection with a for loop. This makes the code slightly shorter and removes the mental overhead and extra variable used by the for loop. Eliminating the for loop also makes the yield from version about 15% faster.

Note that this cannot be done on async functions.

Clone this wiki locally