-
Notifications
You must be signed in to change notification settings - Fork 74
Current Refactorings
These are the refactorings currently available in Sourcery.
If you hover over a Sourcery refactoring proposal in your IDE or look at a Sourcery comment in GitHub you can see the Sourcery Refactoring ID for all of the refactorings being proposed in that change.
If you want to disable any of the refactorings in a
specific function add in the comment # sourcery skip
followed by the refactoring id.
To skip it across your whole project add it into the refactor skips section of your Sourcery config file
.
If you have any questions about any of these refactorings or have suggestions for new refactorings please reach out to us.
Replaces assignments with augmented assignments
count = count + other_value
count += other_value
Reading the changed code is a bit shorter and clearer - we don't need to think about the count
variable
twice. Other operators that can be used include -=
, *=
, /=
and **=
.
The type you're assigning to has to have the appropriate
operator defined, so Sourcery will only suggestion this change where it can
determine the type and it is suitable.
For instance numpy
arrays do not support the /=
operation.
Replaces conditional assignment to a variable with an if expression
if condition:
x = 1
else:
x = 2
x = 1 if condition else 2
Python's conditional expression syntax is its version of the ternary operator. Using this syntax is definitely more concise, but it is one of the more controversial refactorings (along with list comprehensions). Some coders dislike these expressions and find them slightly harder to parse than writing them out fully.
Our view is that this is a definite improvement where the conditional expression is:
- short and fits on one line
- is not complex (no nested expressions or long chains of booleans)
Once the change is made there's only one statement where x
is defined as opposed to having to
read two statements plus the if-else lines. Similarly to the comprehension example, when
we're scanning the code we usually won't need to know the details of how x
gets assigned,
and can just see that it's being assigned and move on.
Replaces binary operationss between a value and itself with known identities:
-
x | x
=>x
-
x & x
=>x
-
x ^ x
=>0
-
x - x
=>0
-
x / x
=>1
-
x // x
=>1
-
x % x
=>0
These changes simplify the code, making it easier to understand what is going on.
Simplifies boolean if expressions by removing unnecessary explicit references to True
or False
states
some_var = True if some_boolean_expression else False
some_var = some_boolean_expression
Rather than using an if-expression to evaluate a boolean, you can just use it directly. This is shorter and clearer - if expressions of this form take a while to mentally parse.
Where Sourcery cannot determine if the expression is a boolean it will wrap it in a call tobool()
.
Use list, set or dictionary comprehensions directly instead of calling list(), dict() or set()
squares = list(x*x for x in y)
squares = set(x*x for x in y)
squares = dict((x, x*x) for x in xs)
squares = [x*x for x in y]
squares = {x*x for x in y}
squares = {x: x*x for x in xs}
The Pythonic way to create a list, set or dictionary from a generator is to use comprehensions.
Using the comprehensions rather than the methods is slightly shorter, and the dictionary comprehension in particular is easier to read in comprehension form.
Replace unneeded comprehension with generator
hat_found = any([is_hat(item) for item in wardrobe])
hat_found = any(is_hat(item) for item in wardrobe)
Functions like any
, all
and sum
allow you to pass in a generator rather
than a collection. Doing so removes a pair of brackets, making the intent slightly clearer.
It will also return immediately if a hat is found, rather than having to build the whole list. This lazy
evaluation can lead to performance improvements.
Note that we are actually passing a generator into any()
so strictly speaking the code
would look like this:
hat_found = any((is_hat(item) for item in wardrobe))
but Python allows you to omit this pair of brackets.
The standard library functions that accept generators are:
'all', 'any', 'enumerate', 'frozenset', 'list', 'max', 'min', 'set', 'sum', 'tuple'
Converts any()
functions to simpler in
statements
if any(hat == 'bowler' for hat in hats):
shout("I have a bowler hat!")
if 'bowler' in hats:
shout("I have a bowler hat!")
Using Python's in
operator simplifies the code and makes it much easier to tell at a
glance that you are checking if something is present in the sequence.
Replaces manual loop counter with call to enumerate
i = 0
for currency in currencies:
print(i, currency)
i += 1
for i, currency in enumerate(currencies):
print(i, currency)
When iterating over a list you sometimes need access to a loop counter that will let you know the index of the element you are utilising.
Using the built-in Python function, enumerate
, lets you generate an index
directly, removing two unneeded lines of code.
When reading this we don't have to worry about the book-keeping of the i
variable,
letting us focus in on the code that really matters.
Simplify dictionary access by using the default get
method
def pick_hat(available_hats: Dict[Label, Hat]):
if self.favourite_hat in available_hats:
hat_to_wear = available_hats[self.favourite_hat]
else:
hat_to_wear = NO_HAT
return hat_to_wear
def pick_hat(available_hats: Dict[Label, Hat]):
hat_to_wear = available_hats.get(self.favourite_hat, NO_HAT)
return hat_to_wear
We often want to pick something from a dictionary if the key is present, or use a default value if it isn't.
A useful shortcut is that Python dictionaries have a get()
method which lets you set a default value
using the second parameter.
This has slimmed the code down and removed some duplication. A point to note is that if you don't pass
in a default value to get()
it will use None
.
Replaces cases where deletions are made via for
loops with comprehensions
x1 = {'a': 1, 'b': 2, 'c': 3}
for key in x1.copy(): # can't iterate over a variable that changes size
if key not in x0:
del x1[key]
x1 = {'a': 1, 'b': 2, 'c': 3}
x1 = {key: value for key, value in x1.items() if key in x0}
When creating a filtered list in Python it is shorter and more readable to use a comprehension than to copy a list and delete the unneeded keys.
Simplifies conditional logic using De Morgan's Identity
if not not p:
a = 1
if not a is b:
a = 1
if not (p == 1 and q == 2):
a = 1
if p:
a = 1
if a is not b:
a = 1
if p != 1 or q != 2:
a = 1
When reading logical statements it is important for them to be as simple as possible. This proposal applies De Morgan's identity to simplify code by removing double negatives or pushing negations deeper into statements.
Replaces dictionaries created with for
loops with dictionary comprehensions
cubes = {}
for i in range(100):
cubes[i] = i**3
cubes = {x: x**3 for x in range(100)}
A dictionary comprehension can create the dictionary on one line, cutting out the clutter of declaring an empty dict and then adding items.
Turning three lines of code into one is a definite win - it means less scrolling back and forth when reading methods and helps keep things manageable.
Squeezing code onto one line can make it more difficult to read, but for comprehensions this isn't the case. All of the elements that you need are nicely presented, and once you are used to the syntax it is actually more readable than the for loop version.
Another point is that the assignment is now more of an atomic operation - we're declaring what cubes
is
rather than giving instructions on how to build it. This makes the code read like more of
a narrative, since going forward we will care more about what cubes
is than the details
of its construction.
Finally comprehensions will usually execute more quickly than building the collection in a loop, which is another factor if performance is a consideration.
Replaces dictionaries created with dict()
with {}
x = dict()
x = {}
The most concise and Pythonic way to create a dictionary is to use the {}
notation.
This fits in with the way we create dictionaries with items, saving a bit of mental energy that might be taken up with thinking about two different ways of creating dicts.
x = {'first':"thing"}
Doing things this way has the added advantage of being a nice little performance improvement.
Here are the timings before and after the change:
$ python3 -m timeit "x = dict()"
5000000 loops, best of 5: 69.8 nsec per loop
$ python3 -m timeit "x = {}"
20000000 loops, best of 5: 29.4 nsec per loop
Similar reasoning and performance results hold for replacing list()
with []
.
Use with
when opening file to ensure closure
file = open("welcome.txt")
data = file.read()
print(data)
file.close()
with open("welcome.txt") as file:
data = file.read()
print(data)
Opening files in this way is more concise, and also ensures that the
call to file.close()
is not skipped if there is an exception.
It makes use of Python's with
context manager - under the hood the changed
code behaves like this:
file = open("welcome.txt")
try:
data = file.read()
print(data)
finally:
file.close()
The file is closed for you as soon as the block is exited, even where an exception has been thrown.
Identifies duplicate sections of code in a function and extracts these into their own method
Requires Sourcery Pro
def extraction_example():
self.speed_slider = Scale(
self.master, from_=1, to=10, orient=HORIZONTAL, label='Speed')
self.speed_slider.pack()
self.speed_slider.set(DEFAULT_SPEED)
self.speed_slider.configure(background='white')
self.force_slider = Scale(
self.master, from_=1, to=10, orient=HORIZONTAL, label='Force')
self.force_slider.pack()
self.force_slider.set(DEFAULT_FORCE)
self.force_slider.configure(background='white')
def extraction_example():
self.speed_slider = _extracted_from_extraction_example_2(
self, 'Speed', DEFAULT_SPEED)
self.force_slider = _extracted_from_extraction_example_2(
self, 'Force', DEFAULT_FORCE)
def _extracted_from_extraction_example_2(self, label, arg2):
result = Scale(self.master, from_=1, to=10, orient=HORIZONTAL, label=label)
result.pack()
result.set(arg2)
result.configure(background='white')
return result
Do not Repeat Yourself (DRY) is an important tenet of writing clean, maintainable code. Duplicated code bloats the code base, making it harder to read and understand. It often also leads to bugs. Where changes are made in only some of the duplicated areas unintended behaviour will often arise.
One of the main ways to remove duplication is to extract the common areas into another
method and call that. Sourcery can detect areas of duplicate code that are in the same
function and extract them. It is recommended that you then rename the extracted function
and any arguments that have not been automatically named. In the above example
a suitable method name would be create_slider
, and arg2
would be default_value
.
Extracts complex pieces of functions into their own methods
Requires Sourcery Pro
def func(self):
if typed and self.current_char == '(':
pos = self.pos
try:
pass
except DefinitionError as exParamQual:
self.pos = pos
try:
pass
except DefinitionError as exNoPtrParen:
self.pos = pos
msg = "If parenthesis in noptr-declarator"
if paramMode == 'function':
msg += " (e.g., 'void (*f(int arg))(double)')"
prevErrors.append((exNoPtrParen, msg))
header = "Error in declarator"
raise self._make_multi_error(prevErrors, header)
def func(self):
if typed and self.current_char == '(':
pos = self.pos
try:
pass
except DefinitionError as exParamQual:
self.pos = pos
try:
pass
except DefinitionError as exNoPtrParen:
self._extracted_from_func_11(pos, exNoPtrParen)
def _extracted_from_func_11(self, pos, exNoPtrParen):
self.pos = pos
msg = "If parenthesis in noptr-declarator"
if paramMode == 'function':
msg += " (e.g., 'void (*f(int arg))(double)')"
prevErrors.append((exNoPtrParen, msg))
header = "Error in declarator"
raise self._make_multi_error(prevErrors, header)
Methods are much more readable if they are short, and if every line of code is at the same rough level of abstraction.
This refactoring takes long blocks from functions and extracts them into their own method. It will only trigger for large blocks that are possible to extract, and where the remainder of the method is sufficiently long.
It is recommended that you then rename the extracted function
and any arguments that have not been automatically named. Here a suitable method
name would be handle_noptr_exception
.
Moves variables from the right side to the left side of comparisons
if 1 == x:
do_x()
if x == 1:
do_x()
It is a coding convention that when doing comparisons to a numeric constant, the number should be on the right and the variable on the left. Having things be consistent in this way makes the code easier to read.
Adds in guard clause to a conditional
def set_hat_details(hat):
if hat.stylish and is_summer_hat(hat):
hat.tag = 'SUMMER'
elif not hat:
pass
elif hat.stylish and is_winter_hat(hat):
hat.tag = 'WINTER'
def set_hat_details(hat):
if not hat:
return
if hat.stylish and is_summer_hat(hat):
hat.tag = 'SUMMER'
elif hat.stylish and is_winter_hat(hat):
hat.tag = 'WINTER'
Often one option within an if..elif
conditional will have no effect, and hence
have only a pass
within it. Where there is no further code within the block
afterwards it is possible to extract this branch out as a guard condition. This
can then sometimes allow further refinements to be made to the if..elif
statements.
Sourcery only suggests this refactoring if it unlocks such further changes - for example
in the above case it would be combined with lift-duplicated-conditional
to give
the following output:
def set_hat_details(hat):
if not hat:
return
if hat.stylish:
if is_summer_hat(hat):
hat.tag = 'SUMMER'
elif is_winter_hat(hat):
hat.tag = 'WINTER'
Moves if
statements that match a conditional out of that conditional
if hat.quality < DESIRED_QUALITY:
happiness -= 1
if not hat.is_stylish() and hat.quality < DESIRED_QUALITY:
happiness -= 1
if hat.quality < DESIRED_QUALITY:
happiness -= 1
if not hat.is_stylish() and hat.quality < DESIRED_QUALITY:
happiness -= 1
Where a nested conditional repeats the conditions of the outer one, it is logically equivalent to reduce the level of nesting as shown above. Doing this can make the meaning of the code slightly clearer.
Moves loops that occur in all cases of an if
statement outside of the conditional
def sing_song(self):
if style == 1:
while thing_true():
do_x()
elif style == 2:
while thing_true():
do_y()
elif style == 3:
while thing_true():
do_z()
def sing_song(self):
while thing_true():
if style == 1:
do_x()
elif style == 2:
do_y()
elif style == 3:
do_z()
Explanation
Where the same for
or while
loop is defined in all branches of a conditional,
the code can be considerably shortened and clarified by hoisting. By moving the
loop outside, duplication is removed and the code becomes much clearer.
Moves statements that occur in all cases of an if
statement outside of the conditional
if sold > DISCOUNT_AMOUNT:
total = sold * DISCOUNT_PRICE
label = f'Total: {total}'
else:
total = sold * PRICE
label = f'Total: {total}'
if sold > DISCOUNT_AMOUNT:
total = sold * DISCOUNT_PRICE
else:
total = sold * PRICE
label = f'Total: {total}'
We should always be on the lookout for ways to remove duplicated code. An opportunity for code hoisting is a nice way of doing so.
Sometimes code is repeated on both branches of a conditional. This means that the code will always execute. The duplicate lines can be hoisted out of the conditional and replaced with a single line.
By taking the assignment to label
outside of the conditional we have removed a
duplicate line of code, and made it clearer what the conditional is actually
controlling, which is the total
.
Moves statements that are constant across all cases of a loop outside of the loop
for building in buildings:
city = 'London'
addresses.append(building.street_address, city)
city = 'London'
for building in buildings:
addresses.append(building.street_address, city)
Another type of hoisting is pulling invariant statements out of loops. If a statement just sets up some variables for use in the loop, it doesn't need to be inside it. Loops are inherently complex, so making them shorter and easier to understand should be on your mind while writing them.
In this example the city
variable gets assigned inside the loop, but it is only
read and not altered.
It's therefore safe to hoist it out, and this makes it clearer that the same city
value will apply to every building
.
This also improves performance - any statement in a loop is going to be executed every time the loop runs. The time spent on these multiple executions is being wasted, since it only needs to be executed once. This saving can be significant if the statements involve calls to databases or other time-consuming tasks.
Inlines a variable to a return
in the case when the variable being declared is immediately returned
def state_attributes(self):
"""Return the state attributes."""
state_attr = {
ATTR_CODE_FORMAT: self.code_format,
ATTR_CHANGED_BY: self.changed_by,
}
return state_attr
def state_attributes(self):
"""Return the state attributes."""
return {
ATTR_CODE_FORMAT: self.code_format,
ATTR_CHANGED_BY: self.changed_by,
}
Something that we often see in people's code is assigning to a result variable and then immediately returning it.
Returning the result directly shortens the code and removes an unnecessary variable, reducing the mental load of reading the function.
Where intermediate variables can be useful is if they then get used as a parameter or a condition, and the name can act like a comment on what the
variable represents. In the case where you're returning it from a function, the function name is
there to tell you what the result is - in the example above it's the state attributes, and the state_attr
name
wasn't providing any extra information.
Replace item lookups in loops using the index with direct reference to the items
for i in range(len(currencies)):
print(currencies[i])
for currency in currencies:
print(currency)
A pattern that is often used in Python for loops is to use range(len(list))
to
generate a range of numbers that can be iterated over.
If the index i
is only used to do list access this code can be improved by iterating
over the list directly, as in the above example.
This code is easier to understand, and a lot less cluttered. In particular being able to use
a meaningful name for currency
greatly improves readability.
Note that this refactoring will only trigger where Sourcery can determine that
the collection (currencies
in this case) is a list.
Switches not any
or not all
statements to all
or any
statements respectively
b = not all(a > b for a in things)
b = any(a <= b for a in things)
A negation is more difficult to read than a straightforward statement. This refactoring inverts the any or all to remove the negation.
Simplifies any
and all
statements that can be inverted
b = any(not a for a in things)
b = all(things)
A negation is more difficult to read than a straightforward statement. This refactoring inverts the any or all to remove the negation.
Convert the final conditional into a guard clause
def f(a=None):
if a is None:
return 42
else:
# some long calculations
var = (i 2 for i in range(a))
return sum(var)
def f(a=None):
if a is None:
return 42
# some long calculations
var = (i 2 for i in range(a))
return sum(var)
A common code pattern is to have some clauses at the start of a function, to check whether certain conditions have been fulfilled and return early or raise an exception if not.
While this is perfectly valid code, it can run into problems with excessive nesting, particularly if the rest of the function is fairly long.
Here we can take advantage of the fact that we don't need the else
if the main body of the if
breaks the control flow by ending with return
or raise
. Rewriting the function as shown here
is logically equivalent.
Using a guard condition, or multiple guard conditions, in this way now doesn't cause the rest of the function to be indented. In general the less we have to deal with indents the easier the code is to understand.
Lift repeated conditional into its own if
statement
if isinstance(hat, Sombrero) and hat.colour == 'green':
wear(hat)
elif isinstance(hat, Sombrero) and hat.colour == 'red':
destroy(hat)
if isinstance(hat, Sombrero):
if hat.colour == 'green':
wear(hat)
elif hat.colour == 'red':
destroy(hat)
Duplicating conditions in an if
statement makes things more difficult to read,
and carries with it all the usual problems of code duplication. While normally
we try to avoid adding nesting to the code, in this case it makes sense to lift
the duplicated conditional into its own if
statement.
It is now clearer at a glance that the whole if..elif
chain relates only to
sombreros and not other types of hat.
Lift return
into if
def f():
if condition:
val = 42
else:
val = 0
return val
def f():
if condition:
return 42
else:
return 0
This is a quick way to streamline code slightly. Where a value is set on each branch of an if and then immediately returned, instead return it directly from each branch.
This has removed an unnecessary intermediate variable which we had to mentally track.
Converts a for
loop into a list comprehension
cubes = []
for i in range(20):
cubes.append(i**3)
cubes = [i**3 for i in range(20)]
A list comprehension can create the list on one line, cutting out the clutter of declaring an empty list and then appending values.
Turning three lines of code into one is a definite win - it means less scrolling back and forth when reading methods and helps keep things manageable.
Squeezing code onto one line can make it more difficult to read, but for comprehensions this isn't the case. All of the elements that you need are nicely presented, and once you are used to the syntax it is actually more readable than the for loop version.
Another point is that the assignment is now more of an atomic operation - we're declaring what cubes
is
rather than giving instructions on how to build it. This makes the code read like more of
a narrative, since going forward we will care more about what cubes
is than the details
of its construction.
Finally comprehensions will usually execute more quickly than building the collection in a loop, which is another factor if performance is a consideration.
Replaces lists created with list()
with []
x = list()
x = []
The most concise and Pythonic way to create a list is to use the []
notation.
This fits in with the way we create lists with elements, saving a bit of mental energy that might be taken up with thinking about two different ways of creating lists.
x = ['first', 'second']
Doing things this way has the added advantage of being a nice little performance improvement.
Here are the timings before and after the change:
$ python3 -m timeit "x = list()"
5000000 loops, best of 5: 63.3 nsec per loop
$ python3 -m timeit "x = []"
20000000 loops, best of 5: 15.8 nsec per loop
Similar reasoning and performance results hold for replacing dict()
with {}
.
Consolidates multiple comparisons into a single comparison
def process_payment(payment):
if payment.currency == 'USD' or payment.currency == 'EUR':
process_standard_payment(payment)
else:
process_international_payment(payment)
def process_payment(payment):
if payment.currency in ['USD', 'EUR']:
process_standard_payment(payment)
else:
process_international_payment(payment)
We often have to compare a value to one of several possible others. When written out like the 'Before' example we have to look through each comparison to understand it, as well as mentally processing the boolean operator.
By using the in
operator and moving the values
we are comparing to into a collection we can simplify things.
This has avoided a little bit of duplication, and the conditional can now be taken in and understood with one glance.
Declare the dictionary with values rather than creating an empty one and assigning to it
hats_i_own = {}
hats_i_own['panama'] = 1
hats_i_own['baseball_cap'] = 2
hats_i_own['bowler'] = 23
hats_i_own = {'panama': 1, 'baseball_cap': 2, 'bowler': 23}
When declaring a dictionary and filling it up with values one way that can come naturally is to declare it as empty and then add entries to it.
This can be done in place, shortening the code and making the intent more explicit. Now I just need to glance at one line to see that I'm filling a variable with hats, rather than four.
The same holds true for filling up other collection types like sets and lists.
Restructure conditional to merge duplicate branches together
def process_payment(payment):
if payment.currency == 'USD':
process_standard_payment(payment)
elif payment.currency == 'EUR':
process_standard_payment(payment)
else:
process_international_payment(payment)
def process_payment(payment):
if payment.currency == 'USD' or payment.currency == 'EUR':
process_standard_payment(payment)
else:
process_international_payment(payment)
We should always be searching out opportunities to remove duplicated code. A good
place to do so is where there are multiple identical blocks inside an if..elif
chain. This refactoring combines such blocks.
Now if we need to change the process_standard_payment(payment)
line we can
do it in one place instead of two. This becomes even more important if these blocks
involve multiple lines.
Combines together multiple isinstance
functions
if isinstance(hat, Bowler) or isinstance(hat, Fedora):
wear(hat)
if isinstance(hat, (Bowler, Fedora)):
wear(hat)
When you want to check whether something is one of multiple different types, you
can merge the two isinstance
checks into a single call.
This is shorter while staying nice and easy to read.
Create the list with values instead of creating an empty list and appending to it
hats_i_own = []
hats_i_own.append('panama')
hats_i_own.append('baseball_cap')
hats_i_own.append('bowler')
hats_i_own = ['panama', 'baseball_cap', 'bowler']
When declaring a list and filling it up with values one way that can come naturally is to declare it as empty and then append to it.
This can be done in place, shortening the code and making the intent more explicit. Now I just need to glance at one line to see that I'm filling a variable with hats, rather than four.
Doing it this way is also slightly more performant since it avoids the function calls to append
.
The same holds true for filling up other collection types like sets and dictionaries.
Create the list with values instead of creating an empty list and extending it with another list
hats_i_own = []
hats_i_own.extend(['panama, baseball_cap, bowler'])
hats_i_own = ['panama', 'baseball_cap', 'bowler']
When declaring a list with values it is much clearer to do so on one line rather than declaring an empty list and then extending it with values.
Merges together multiple nested if
conditions into one
if a:
if b:
return c
if a and b:
return c
Too much nesting can make code difficult to understand, and this is especially true in Python, where there are no brackets to help out with the delineation of different nesting levels.
Reading deeply nested code is confusing, since you have to keep track of which conditions
relate to which levels. We therefore strive to reduce nesting where possible, and the
situation where two if
conditions can be combined using and
is an easy win.
Merges together the interior contents of if
statements with identical conditions
if wardrobe.hats:
self.happiness += 1
else:
self.happiness -= 1
if wardrobe.hats:
self.stylishness += 1
else:
self.stylishness -= 1
if wardrobe.hats:
self.happiness += 1
self.stylishness += 1
else:
self.happiness -= 1
self.stylishness -= 1
Where subsequent if
statements have identical conditions it is logically
equivalent to merge them together. This shortens the code and makes it easier
to read and understand.
Note that this can only be done if the statements inside the first if
condition
have no effect on the condition itself.
Create the set with values instead of declaring an empty set and adding to it
hats_i_own = set()
hats_i_own.add('panama')
hats_i_own.add('baseball_cap')
hats_i_own.add('bowler')
hats_i_own = {'panama', 'baseball_cap', 'bowler'}
When declaring a set and filling it up with values one way that can come naturally is to declare it as empty and then add to it.
This can be done in place, shortening the code and making the intent more explicit. Now I just need to glance at one line to see that I'm filling a variable with hats, rather than four.
Doing it this way is also slightly more performant since it avoids the function calls to add
.
The same holds true for filling up other collection types like lists and dictionaries.
Replaces duplicate conditionals looking for the minimum or maximum value of multiple variables with a min
or max
function
if first_hat.price < second_hat.price:
cheapest_hat_price = first_hat.price
else:
cheapest_hat_price = second_hat.price
if sale_price >= 10:
sale_price = 10
cheapest_hat_price = min(first_hat.price, second_hat.price)
sale_price = min(sale_price, 10)
We often need to work out the smallest or largest of two values, and the
quickest way to do this in Python is to use the built-in min
and max
functions.
This results in a shorter and clearer way to achieve the same result.
The same functions also offer a shortcut for when we want to put a cap or a floor
on the value of a variable.
Moves assignment of variables closer to their usage
def should_i_wear_this_hat(self, hat):
if not isinstance(hat, Hat):
return False
weather_outside = self.look_out_of_window()
is_stylish = isinstance(hat, StylishHat)
if weather_outside.is_raining:
print("Damn.")
return True
else:
print("Great.")
return is_stylish
def should_i_wear_this_hat(self, hat):
if not isinstance(hat, Hat):
return False
weather_outside = self.look_out_of_window()
if weather_outside.is_raining:
print("Damn.")
return True
else:
print("Great.")
is_stylish = isinstance(hat, StylishHat)
return is_stylish
The scope of local variables should always be as tightly defined as possible and practicable.
This means that:
-
You don't have to keep the variable in your working memory through the parts of the function where it's not needed. This cuts down on the cognitive load of understanding your code.
-
If code is in coherent blocks where variables are declared and used together, it makes it easier to split functions apart, which can lead to shorter, easier to understand methods.
-
If variables are declared far from their usage, they can become stranded. If the code where they are used is later changed or removed unused variables can be left sitting around, complicating the code unnecessarily.
Replaces ==
with is
when comparing to None
if hat == None:
raise NoHatException
if hat is None:
raise NoHatException
In Python is
refers to reference equality - where you want to check if something
is the same object. Equals ==
checks value equality, to check that the objects
are equal to each other. In Python the None
object is a singleton, so it is
correct to use is
when comparing to it.
Replaces if
expressions that compare to the target with or
currency = input_currency if input_currency else DEFAULT_CURRENCY
currency = input_currency or DEFAULT_CURRENCY
Here we find ourselves setting a value if it evaluates to True
, and otherwise using a default.
The 'After' case is a bit easier to read and avoids the duplication of input_currency
.
It works because the left-hand side is evaluated first. If it evaluates to True
then currency
will be set to this and the right-hand side will not be evaluated. If it evaluates to False
the
right-hand side will be evaluated and currency
will be set to DEFAULT_CURRENCY
.
Removes unnecessary call to keys()
when iterating over a dictionary
for currency in currencies.keys():
process(currency)
for currency in currencies:
process(currency)
Sometimes when iterating over a dictionary you only need to use the dictionary keys.
In this case the call to keys()
is unnecessary, since the default behaviour when
iterating over a dictionary is to iterate over the keys.
The code is now slightly cleaner and easier to read, and avoiding a function call will yield a (small) performance improvment.
Removes a pass
from the body of a conditional by inverting it
if not a:
pass
else:
return c
if a:
return c
It is much clearer for a conditional to do something in its main body than
solely in its else
clause. In the latter case anyone reading the code has
to mentally invert the condition in order to determine the meaning.
Removes a pass
from the elif section of a conditional
if a:
x()
elif b:
pass
else:
return c
if a:
x()
elif not b:
return c
Reducing the number of branches in a conditional reduces the mental load of
reading and understanding it considerably. This simplification can be made
where an empty elif
branch is immediately followed by the else
.
Removes exception handlers that can never trigger (as the exceptions have already been caught)
try:
pass
except Exception:
do_x()
except ValueError:
do_y()
finally:
do_t()
try:
pass
except Exception:
do_x()
finally:
do_t()
Currently only one except handler can be triggered for any exception raised in Python. This means that if an exception is already caught, subsequent handlers can never have an effect on it. Having them present is confusing, since the reader may think that this exception handling code is behaving as expected, when in fact it is redundant.
Removes conditional tests where the conditional is always True
or False
def hello(name : str):
if name.startswith("L"):
self.sing("Hip Hip Horray! For " + name)
elif not name.startswith("L"):
self.sing("Hello " + name + ", it's nice to meet you.")
def hello(name : str):
if name.startswith("L"):
self.sing("Hip Hip Horray! For " + name)
else:
self.sing("Hello " + name + ", it's nice to meet you.")
This refactoring will re-structure conditionals where Sourcery determines that
one of the tests is always True
or False
. This reveals the actual logic of
the conditional to the reader, making the code easier to understand.
Removes unnecessary pass
statements
if a:
x()
else:
pass
do_something()
pass
if a:
x()
do_something()
A pass
statement is unnecessary if a block has other statements in it, and
an else
which only contains pass
can be safely removed. Making these changes
shortens the code, and the reader doesn't have to consider and deal with the
unnecessary pass
statements.
Remove unnecessary else
after guard condition
def f(a=None):
if a is None:
return 42
else:
# some long calculations
var = (i 2 for i in range(a))
return sum(var)
def f(a=None):
if a is None:
return 42
# some long calculations
var = (i 2 for i in range(a))
return sum(var)
A common code pattern is to have some clauses at the start of a function, to check whether certain conditions have been fulfilled and return early or raise an exception if not.
While this is perfectly valid code, it can run into problems with excessive nesting, particularly if the rest of the function is fairly long.
Here we can take advantage of the fact that we don't need the else
if the main body of the if
breaks the control flow by ending with return
or raise
. Rewriting the function as shown here
is logically equivalent.
Using a guard condition, or multiple guard conditions, in this way now doesn't cause the rest of the function to be indented. In general the less we have to deal with indents the easier the code is to understand.
Removes code that will never be executed
for a in b:
do_a()
continue
do_x()
do_y()
for a in b:
do_a()
continue
Statements after a continue
, return
or raise
will never be executed. Leaving
them in the code confuses the reader, who may believe that these statements have
some effect. They should therefore be removed.
Removes unnecessary explicit 0
argument from range
for i in range(0, len(x)):
do_t()
for i in range(len(x)):
do_t()
The default starting value for a call to range()
is 0, so it is unnecessary
to explicitly define it. This refactoring removes such zeros, slightly
shortening the code.
Replaces an unused index in a for
loop with an underscore
for hat in my_wardrobe.hats:
shout("Hurrah!")
for _ in my_wardrobe.hats:
shout("Hurrah!")
Sometimes in a for loop we just want some code to run a certain number of times, and don't actually make use of the index variable.
In the above example we have introduced a new variable, hat
, which we have to note when reading the code, but
actually we don't need it and could replace it with _
:
It is a convention in Python to use _
as a throwaway name for unused variables. This
means your brain can learn to safely ignore these, reducing the overhead to understand
the code. Where you see this in a for
loop it it immediately clear that the loop
is just used to repeat a block of code and we don't care about the value being
iterated over.
Replaces a while
loop with a counter by a for
loop
i = 0
while i < 10:
print("I love hats")
i += 1
for i in range(10):
print("I love hats")
We often need to iterate over the same bit of code a certain number of times.
Developers who are unfamiliar with Python may consider using a while
loop
for this, but it's almost certainly better to use for
.
Converting from while
to for
means the explicit handling of the counter
can be removed, shortening the code and making it esier to read.
Return booleans directly rather than wrapping them in a conditional that returns True
or False
def function(a, b):
if isinstance(a, b) or issubclass(b, a):
return True
return False
def function(a, b):
return isinstance(a, b) or issubclass(b, a)
Where we reach the end of a method and want to
return True
or False
, a common way of doing so to explicity return them.
However, it's neater just to return the result directly.
This can only be done if the expression evaluates to a boolean. In this example:
def any_hats(self, wardrobe):
hats = [item for item in wardrobe if is_hat(item)]
if hats or self.wearing_hat():
return True
return False
We can't do this exact refactoring, since now we could be returning the list of hats rather than True
or False
.
To make sure we're returning a boolean, we can wrap the return in a call to bool()
:
def any_hats(self, wardrobe):
hats = [item for item in wardrobe if is_hat(item)]
return bool(hats or self.wearing_hat())
Replaces sets created with for
loops with set comprehensions
cubes = set()
for i in range(20):
cubes.add(i**3)
cubes = {i**3 for i in range(20)}
A set comprehension can create the set on one line, cutting out the clutter of declaring an empty set and then adding values.
Turning three lines of code into one is a definite win - it means less scrolling back and forth when reading methods and helps keep things manageable.
Squeezing code onto one line can make it more difficult to read, but for comprehensions this isn't the case. All of the elements that you need are nicely presented, and once you are used to the syntax it is actually more readable than the for loop version.
Another point is that the assignment is now more of an atomic operation - we're declaring what cubes
is
rather than giving instructions on how to build it. This makes the code read like more of
a narrative, since going forward we will care more about what cubes
is than the details
of its construction.
Finally comprehensions will usually execute more quickly than building the collection in a loop, which is another factor if performance is a consideration.
Removes unnecessarily verbose boolean comparisons
need_hat = not is_raining
if need_hat == True:
put_on_hat()
need_hat = not is_raining
if need_hat:
put_on_hat()
It is unnecessary to compare boolean values to True
or False
in the test
of an if
condition. Removing these unnecessary checks makes the code slightly
shorter and easier to parse.
An identity generator (a for a in coll)
can be replaced directly with the collection coll
if any(hat for hat in wardrobe.hats):
print("I have a hat!")
if any(wardrobe.hats):
print("I have a hat!")
The expression (x for x in y)
is a generator that returns all of the elements
of y. If being passed into a function like any
or all
that takes a generator or
sequence, it can simply be replaced by y
which is much clearer.
Removes unnecessarily verbose length comparisons
if len(list_of_hats) > 0:
hat_to_wear = choose_hat(list_of_hats)
if list_of_hats:
hat_to_wear = choose_hat(list_of_hats)
Something we often do is check whether a list or sequence has elements before we try and do something with it.
A Pythonic way of doing this is just to use the fact that Python lists and sequences evaluate
to True
if they have elements, and False
otherwise.
Doing it this way is a convention, set out in Python's PEP8 style guide. Once you've gotten used to doing it this way it does make the code slightly easier to read and a bit less cluttered.
Replaces a[len(a)-1]
with negative index lookup a[-1]
a = [1, 2, 3]
last_element = a[len(a) - 1]
a = [1, 2, 3]
last_element = a[-1]
In Python you can access the end of a list by using negative indices. So my_list[-1]
gets the
final element, my_list[-2]
gets the penultimate one and so on.
Once you know the trick, reading code that uses it is much easier than the
alternative.
Consolidates any mathematical operations in a numeric comparison so there is a direct comparison of variable to number
if a + 1 < 2:
do_x()
if a < 1:
do_x()
This refactoring removes unnecessary clutter from the code and makes it easier to determine what a variable is being compared to.
Splits out conditions combined with an or
in an if
statement into their own if
statement.
if a or b:
c()
if a:
c()
elif b:
c()
It is logically equivalent to transform an or
condition into an if
branch
and an elif
branch of a conditional. This is not prima facie an improvement
to the code, but it can unlock further improvements, and Sourcery will only
make this change if such improvements are possible.
Replaces cases of a variable being multiplied by itself with squaring that variable
def square(variable: int):
return variable*variable
def square(variable: int):
return variable**2
This identity can make the code slightly shorter and easier to read.
Replaces summed values created with for
loops with sum
comprehensions
total = 0
for hat in hats:
total += hat.price
total = sum(hat.price for hat in hats)
Much of programming is about adding up lists of things, and Python has the built-in sum()
function
to help with this.
This is much shorter, which is a definite bonus. The code also now explicitly tells you what it is trying to do - sum the price of all the hats.
Swaps if
and else
branches of conditionals
if location == OUTSIDE:
pass
else:
take_off_hat()
if location != OUTSIDE:
take_off_hat()
else:
pass
One pattern we sometimes see is a conditional where nothing happens in the main body,
and all of the action is in the else
clause.
In this case we can make the code shorter and more concise by swapping the main body
and the else
around. We have to make sure to invert the conditional, then the logic
from the else
clause moves into the main body.
We then have an else
clause which does nothing, so we can remove it.
if location != OUTSIDE:
take_off_hat()
This is easier to read, and the intent of the conditional is clearer. When reading the code I don't have to mentally invert it to understand it, since that has been done for me.
Sourcery will also make this change if the else
can be dropped since the body
of the if
is a guard condition.
Swaps the order of nested if
statements
if a:
if b:
return c
if b:
if a:
return c
This refactoring identity will not improve code quality on its own, but can be part of a series of steps that will improve quality. Sourcery will only suggest this refactoring where it unlocks further improvements.
Simplify conditionals into a form more like a switch
statement
if item.a == 1:
q += 2
elif item.a == 2:
q += 2
elif item.a == 4:
q = 0
if item.a in [1, 2]:
q += 2
elif item.a == 4:
q = 0
This refactoring examines complex conditionals where a variable is being compared to various different values, and tries to put them into the simplest form possible, eliminating duplicated blocks.
Uses a variable that was previously defined in the function instead of repeating what was defined in the variable
wardrobe = {'blue_hat': 1, 'red_hat':3}
for item in wardrobe:
count = wardrobe[item]
add_to_total(wardrobe[item])
wardrobe = {'blue_hat': 1, 'red_hat':3}
for item in wardrobe:
count = wardrobe[item]
add_to_total(count)
Where possible, it is preferable to re-use local variables that have been assigned to. These variables will often have a more descriptive name that can aid in code comprehension, and additionally re-use can reduce duplication.
Use any
rather than a for
loop
found = False
for hat in hats:
if hat == SOMBRERO:
found = True
break
found = any(hat == SOMBRERO for hat in hats)
A common pattern is that we need to find if some condition holds for one or all of the items in a collection.
Using Python's any()
and all()
built in functions is a more concise way
of doing this than using a for
loop.
any()
will return True
when at least one of the elements evaluates to True
,
all()
will return True
only when all the elements evaluate to True
.
These will also short-circuit execution where possible. If the call to any()
finds an element that evalutes to True
it can return immediately. This can lead to
performance improvements if the code wasn't already short-circuiting.
Replaces sum()
with count()
where appropriate
return sum(hat == "bowler" for hat in hats)
return hats.count("bowler")
This simplification can be made where a call to sum()
is counting the number
of elements that match a condition.
Use dictionary.items()
in for
loops to access both key and value at same time
hats_by_colour = {'blue': ['panama', 'baseball_cap']}
for hat_colour in hats_by_colour:
hats = hats_by_colour[hat_colour]
if hat_colour in FAVOURITE_COLOURS:
think_about_wearing(hats)
hats_by_colour = {'blue': ['panama', 'baseball_cap']}
for hat_colour, hats in hats_by_colour.items():
if hat_colour in FAVOURITE_COLOURS:
think_about_wearing(hats)
When iterating over a dictionary a good tip is that you can use items()
to let
you access the keys and values at the same time.
This saves us the line that we used to assign to hats
, incorporating it into the for loop.
The code now reads more naturally, with a touch less duplication.
Use str.join()
instead of for
loop
all_hats = ""
for hat in my_wardrobe.hats:
all_hats += hat.description
all_hats = "".join(hat.description for hat in my_wardrobe.hats)
The most straightforward way to concatenate strings in Python is to just use the +
operator:
hat_description = hat.colour + hat.type
This is perfectly fine when you are joining together small numbers of strings (though f-strings are the best choice for doing more complicated string handling).
The problem with using +
or +=
comes when they are used to concatenate large
lists of strings. For example you might use them in a for loop like the 'Before'
example.
This is cumbersome to read, and also isn't very performant. A new string has to be
created for every iteration in the for loop, which slows things down.
Luckily Python strings come with the join
method to solve this problem.
This accomplishes the same task in one line and is quite a lot faster to boot. You can also add a separator in between each string easily, without having to worry about extra separators being added at the beginning or end of the result.
all_hats = ", ".join(hat.description for hat in my_wardrobe.hats)
Replaces sum()
with len
where appropriate
if sum(1 for hat in hats) > 0:
self.shout("I have hats")
if len(hats) > 0:
self.shout("I have hats")
This simplification can be made where the sum()
call is effectively just
determining the length of a sized object. It can only be made where Sourcery
can determine that hats
supports the __len__()
method.
Replaces yield
as part of for
loops with yield from
def get_content(entry):
for block in entry.get_blocks():
yield block
def get_content(entry):
yield from entry.get_blocks()
One little trick that often gets missed is that Python's yield
keyword has
a corresponding yield from
for collections, so there's no need to iterate
over a collection with a for loop. This makes the code slightly shorter and removes
the mental overhead and extra variable used by the for loop.
Eliminating the for loop also makes the yield from
version about 15% faster.
Note that this cannot be done on async
functions.
Please visit our newer docs at https://docs.sourcery.ai