Suggested reading material: http://www.erlang.se/doc/programming_rules.shtml
Table of Contents:
- Contact Us
- Conventions & Rules
- Source Code Layout
- Maintain existing style
- Spaces over tabs
- Use your spacebar
- No Trailing Whitespace
- 100 column per line
- More, smaller functions over case expressions
- Group functions logically
- Get your types together
- No God modules
- Simple unit tests
- Honor DRY
- Group modules in subdirectories by functionality
- Header files
- Syntax
- Naming
- Strings
- Macros
- Records
- Misc
- Tools
- Source Code Layout
- Suggestions & Great Ideas
- Prefer pattern-matching over testing for equality
- Favor higher-order functions over manual use of recursion
- CamelCase over Under_Score
- Prefer shorter (but still meaningful) variable names
- Comment levels
- Keep functions small
- Use behaviours
- When programming defensively, do so on client side
- Avoid unnecessary calls to length/1
- Move stuff to independent applications
- Use the facade pattern on libraries
- Types in exported functions
- Separate responsibilities in sumo_db
If you find any bugs or have a problem while using this library, please open an issue in this repo (or a pull request :)).
And you can check all of our open-source projects at inaka.github.io
These are "Things that may be used as reason to reject a Pull Request".
When editing a module written by someone else, stick to the style in which it was written. If a project has an overall style, stick to that when writing new modules as well.
Examples: existing_style
Reasoning: It's better to keep a module that just looks ugly to you than to have a module that looks half ugly to you, half ugly to somebody else.
Spaces over tabs, 2 space indentation.
Examples: indent
Reasoning: This is not intended to allow deep nesting levels in the code. 2 spaces are enough if the code is clean enough, and the code looks more concise, allowing more characters in the same line.
Surround operators and commas with spaces.
Examples: spaces
Reasoning: It produces cleaner code that's easier to find / read / etc.
Remove trailing whitespaces from your lines
Examples: trailing_whitespace
Reasoning: It's commit noise. As a reference, check out this long argument.
Stick to 100 chars per line, maximum.
Examples: col_width
Reasoning: Excessively long lines are a pain to deal with: you either have to scroll horizontally while editing, or live with ugly line wrapping at arbitrary points. The 100 character limit also keeps lines short enough that you can comfortably work with two source files side by side on a typical laptop screen, or three on a 1080p display.
Use pattern-maching in clause functions rather than case's. Specially important if the case is:
- the top-level expression of the function
- huge
Examples: smaller_functions
Reasoning: it is usually the case that a case in a function body represents some sort of decision, and functions should be as simple as possible. If each branch of a decision's outcome is implemented as a function clause instead of as a case clause, the decision may be given a meaningful name. In other words, the case is acting as an 'anonymous function', which unless they are being used in the context of a higher-order function, merely obscure meaning.
Try to always separate unexported and exported functions in groups, with the exported ones first, unless it helps readability and code discovery.
Examples: grouping_functions
Reasoning: Well structured code is easier to read/understand/modify.
Place all types at the beginning of the file
Examples: type_placement
Reasoning: Types are used to define data structures that will most likely be used by multiple functions on the module, so their definition can not be tied to just one of them. Besides it's a good practice to place them in code in a similar way as the documentation presents them and edoc puts types at the beginning of each module documentation
Don't design your system using god modules (modules that have a huge number of functions and/or deal with very unrelated things)
Examples: god
Reasoning: God modules, like god objects, are modules that do too much or know too much. God modules usually come into existence by feature accretion. A beautiful, to-the-point module with one job, one responsibility done well, gains a function. Then another, which does the same thing but with different parameters. Then one day, you have a 6000-line module with 500 functions. Having modules (and functions) that do one and only one thing well makes it easy to explore and reason about code, and thus maintain it.
Single responsibility applies to tests as well. When writing unit tests, keep them short and don't put more than 1 or 2 asserts per test
Examples: test_SUITE
Reasoning: Multiple tests can identify multiple errors in one run, if you put all the things you want to test into one test you'll have to fix one thing at a time until the test passes.
Don't write the same code in many places, use functions and variables for that
Examples: dry
Reasoning: This convention is specifically put in this list (instead of treat it as a great idea) so that reviewers can reject PRs that include the same code several times or PRs that re-implement something that they know it's already done somewhere else.
When having lots of modules, use subdirectories for them, named with a nice descriptive name for what that "package" does.
Reasoning: That way it's easier to find what you need and determine what a certain module does.
Note: Remember to properly configure your Emakefile
to handle that, if you use it.
Header files:
- SHOULD NOT include type definitions nor record definitions nor function definitions.
- MAY include macros definitions, although macros should be avoided.
Examples: headers
Reasoning: Type definitions should be located in the modules where the data and its associated functions are defined. In type specs types may be module-prefixed which also makes it clear where the data type is defined, so there is no reason to need to share them via headers.
Including record definitions in header files promotes sharing internal details of those records across modules, increasing coupling and preventing encapsulation, in turn making it more difficult to change and maintain the code. Records should be defined in their own modules which should provide an opaque data type and functions to access and manipulate the record.
Function definitions should most definitely not be included in header files because it leads to code duplication.
Erlang syntax is horrible amirite? So you might as well make the best of it, right? Right?
Don't write spaghetti code (A list comprehension with a case inside, or blocks with begin/end, and nested stuff)
Examples: spaghetti
Reasoning: Spaghetti code is harder to read, understand and edit. The function callgraph for your program should strive to be a directed acyclic graph.
If there is no specific need for it, don't use dynamic function calling.
Examples: dyn_calls
Reasoning: Dynamic calls can't be checked by xref
, one of the most useful tools in the Erlang world. xref
is a cross reference checking/observing tool.
Try not to nest more than 3 levels deep.
Examples: nesting
Reasoning: Nested levels indicate deep logic in a function, too many decisions taken or things done in a single function. This hinders not only readability, but also maintainability (making changes) and debugging, and writing unit tests. See also: More, smaller functions over case expressions.
Don't use
if
.
Examples: no_if
Reasoning: In some circumstances if
introduces static boolean logic in your code, reducing code flexibility. In other cases, a case
or a function call with pattern matching in its clauses is just more declarative. For newcommers (that have learned to use if
in other languages), Erlang's if
can be either hard to understand or easily abused.
Debate:
Don't nest
try…catch
clauses
Examples: nested_try_catch
Reasoning: Nesting try…catch
blocks defeats the whole purpose of them, which is to isolate the code that deals with error scenarios from the nice and shiny code that deals with the expected execution path.
Don't use
throw
andcatch
Examples: catch_and_throw
Reasoning:
On one hand, throw
is not meant to be used to throw exceptions, so it shouldn't be used for that.
On the other hand, throw
is meant to be used for non-local returns. While they might seem to be useful in terms of performance, they tend to produce more complex code that is harder to understand or reason about. Particularly if the result being thrown in one place is caught in another very distant part of the application. You can find more in this conversation at the Erlang Forums:
IMO, it is a really bad style, especially in the context of a functional language. It is returning something through side effects. That may have its uses to quickly break out of deep recursion, but they are rare. I think I never used throw even once. In any case, I think it should not be made “socially acceptable” to return stuff via throw, much less encouraged. (said @Maria-12648430)
I definitely agree. I prefer to use tail-recursive functions instead of non-local returns unless that turns out to be too cumbersome. That is, if the code becomes convoluted written in a tail-recursive way, I will write it in a body-recursive way and use throw to break out of the recursion. I have typically used that in complicated optimization passes in the compiler; that is, the code will do the optimization in recursive functions, but if anything turns up that makes the optimization impossible I will throw a
not_possible
exception to break out of the recursion. (said @bjorng)
Use the same variable name for the same concept everywhere (even in different modules).
Examples: consistency
Reasoning: When trying to figure out all the places where an OrgID
is needed (e.g. if we want to change it from string
to binary
), it's way easier if we can just grep for OrgID
instead of having to check all possible names.
Name your state records
#mod_state
and use-type state():: #mod_state{}
in all your modules that implement OTP behaviors.
Examples: state
Reasoning: OTP behaviours implementations usually require a state, and if it has a recognizable name it makes it more easily identifiable. Defining a type for it, helps dialyzer detect leaks (where an internal type as the state is used outside of the module).
The usage of the module prefix in the record name has the goal of distinguishing different state tuples while debugging: Since records are just tuples when one is dumped into the shell it is easier to read {good_state, att1, attr2}
than {state, attr1, attr2, attr3}
or {state, attr1, att2}
.
At a glance you know that the tuple/record can be found in the good.erl
module.
Variables beginning with _ are still variables, and are matched and bound, the _ just keeps the compiler from warning when you don't use them. If you add the _ to a variable's name, don't use it.
Examples: ignored_vars
Reasoning: They are not supposed to be used.
Don't use boolean parameters (i.e.
true
andfalse
) to control clause selection.
Examples: boolean_params
Reasoning: Clarity of intention and not requiring the reader to check the function definition to understand what it does.
Stick to one convention when naming modules (i.e: ik_something vs iksomething vs something).
Examples: naming_modules
Reasoning: It gives coherence to your system.
Atoms should use only lowercase characters. Words in atom names should be separated with
_
. Special cases are allowed (like'GET'
,'POST'
, etc.) but should be properly justified.
Examples: atoms
Reasoning: Adhering to one convention makes it easier not to have "duplicated" atoms all around the code. Also, not using caps or special characters reduces the need for '
around atoms.
Function names must use only lowercase characters or digits. Words in function names must be separated with
_
.
Examples: function_names
Reasoning: Function names are atoms, they should follow the same rules that apply to them.
CamelCase must be used for variables. Don’t separate words in variables with
_
.
Examples: variable_names
Reasoning: Adhering to one convention makes it easier not to have "duplicated" variables all around the code. Camel-case makes variable names more visually distinguishable from atoms and it matches the OTP standard.
Use iolists instead of string concatenation whenever possible
Examples: iolists
Reasoning: Performance and errors during conversion. iolists are just deeply nested lists of integers and binaries to represent IO data to avoid copying when concatenating strings or binaries.
Don't use macros, except for very specific cases, that include
- Predefined ones:
?MODULE
,?MODULE_STRING
and?LINE
- Literal constants:
?DEFAULT_TIMEOUT
Examples: macros
Reasoning: Macros make code harder to debug. If you're trying to use them to avoid repeating the same block of code over and over, you can use functions for that. See related blog post by @erszcz.
Macros should be named in ALL_UPPER_CASE:
Examples: macro_names
Reasoning: It makes it easier not to duplicate macro names, to find them using grep, etc.
Don't use macros for module or function names
Examples: macro_mod_names
Reasoning: Copying lines of code to the console for debugging (something that happens a lot) becomes a really hard task if we need to manually replace all the macros.
Record names must use only lowercase characters. Words in record names must be separated with
_
. Same rule applies to record field names
Examples: record_names
Reasoning: Record and field names are atoms, they should follow the same rules that apply to them.
Records that are used within a module should be defined before any function bodies.
Examples: record_placement
Reasoning: Records are used to define data types that will most likely be used by multiple functions on the module, so their definition can not be tied to just one. Also, since records will be associated to types, it's a good practice to place them in code in a similar way as the documentation does (and edoc puts types at the beginning of each module documentation)
Records should not be shared among multiple modules. If you need to share objects that are represented as records, use opaque exported types and provide adequate accessor functions in your module.
Examples: record_sharing
Reasoning: Records are used for data structure definitions. Hiding those structures aids encapsulation and abstraction. If a record structure needs to be changed and its definition is written in a .hrl file, the developer should find all the files where that .hrl and verify that his change hasn't broken anything. That's not needed if the record structure is internal to the module that manages it.
Avoid using records in your specs, use types.
Examples: record_spec
Reasoning: Types can be exported, which aids documentation and, using opaque
types it also helps with encapsulation and abstraction.
Always add type definitions to your record fields
Examples: record_types
Reasoning: Records define data structures, and one of the most important parts of that definition is the type of the constituent pieces.
Write the -spec's for your exported fun's, and for unexported fun's when it adds real value for documentation purposes. Define as many types as needed.
Examples: specs
Reasoning: Dialyzer output is complicated as is, and it is improved with good type names. In general, having semantically loaded type names for arguments makes reasoning about possible type failures easier, as well as the function's purpose.
Unless you know your project will be compiled with R14 or lower, use
-callback
instead ofbehavior_info/1
for your behavior definitions.
Examples: callbacks
Reasoning: Avoid deprecated functionality
When sending a message between processes, you should typically either send a single, human-readable atom, or a tuple with a human-readable atom placed in element 1. This includes messages being sent via
gen_server:call
and the like.
Examples: message-formatting
Reasoning: Tagging messages with a distinctive, human-readable atom helps clarify the purpose of a message for anyone reading or debugging the code. Using element 1 of the tuple makes code more consistent and predictable, and improves readability when browsing through multiple clauses of functions like handle_call
.
This pattern also helps avoid bugs where different messages get confused with one another, or where messages get sent to the wrong recipient; it's much easier to find the source of an unexpected message if it looks like {set_foobar_worker_pid, <0.312.0>}
than if you just find a bare pid in your mailbox.
When having many nested "include files", use -ifndef(HEADER_FILE_HRL) .... -endif so they can be included in any order without conflicts.
Examples: nested
Reasoning: -include
directives in included headers may lead to duplication of inclusions and/or other conflicts and it also hides things from the developer view.
No
-type
in hrl files
Examples: types
Reasoning: Defining types in public header files (especially those intended for inclusion via -include_lib()
) might lead to type name clashes between projects and even modules of a single big project.
Instead, types should be defined in modules which they correspond to (with -export_type()
) and this way take advantage of the namespacing offered by module names.
In other words, "no type definitions in header files" rule means that we will always need to use some_mod:some_type()
unless referring to a type from the same module it's defined in.
Following this rule you also get the benefits that -opaque
types provide, for instance, to dialyzer.
Do not use the
-import
directive
Examples: import
Reasoning: Importing functions from other modules makes the code harder to read and debug since you cannot directly distinguish local from external functions. In appropriately named functions, the module is part of the function name, it gives meaning to it.
Do not use the
-compile(export_all)
directive
Examples: export_all
Reasoning: It's generally considered best to only export the specific functions that make up your module's known and documented external API. Keeping this list of functions small and consistent encourages good encapsulation and allows for more aggressive refactoring and internal improvements without altering the experience for those who make use of your module.
Never do raw
gen_server
calls across module boundaries; the call should be encapsulated in an API function in the same module that implements the correspondinghandle_call
function. The same goes for other such OTP constructs (gen_server
casts,gen_fsm
events, etc).
Examples: otp_encapsulation
Reasoning: By sticking to this pattern of encapsulation, we make it much easier to find out where calls/events might originate from.
Instead of having to search through the entire source tree for e.g. gen_server
calls that look like they might send a certain message to a given process, we can just search for calls to the corresponding API function.
This makes it much easier to modify APIs, and also allows us to benefit more from Dialyzer's checks, assuming our API functions have appropriate type specs on them.
We can also change the underlying message format without disturbing any code outside of the module in question, and we can more easily avoid issues with RPC calls when running a mixed cluster.
With good encapsulation, you can even do things like convert a gen_server
to a gen_fsm
without any code changes beyond just the one module.
Unless your project is meant to be run as an escript, there should be no
io:format
norct:pal
calls in your production code (i.e. in the modules inside thesrc
folder). Same rule applies forlager
orerror_logger
calls if they're used just for debugging purposes during test stages.
Examples: debug_calls
Reasoning: Leaving unnecessary logs on production code impacts performance. It increases the processing time for the functions you're debugging and also consumes disk space if the logs are written to a file (as they usually are). Besides, more often than not the log messages are only understood in the context of the test or debugging round in which they were created, therefore the become useless pretty fast.
Don't capture errors with
case catch
, usetry ... of ... catch
instead.
Examples: case-catch
Reasoning: case catch ...
mixes good results with errors which is confusing. By
using try ... of ... catch
the golden path is kept separate from the error
handling.
In your rebar.config or Erlang.mk, specify a tag or commit, but not master.
Examples:
Reasoning: You don't want to be suddenly affected by a change in one of your dependencies. Once you've found the right version for you, stick to it until you need to change.
Don't let errors and exceptions go unlogged. Even when you handle them, write a log line with the stack trace.
Examples: loud_errors
Reasoning: The idea is that somebody watching the logs has enough info to understand what's happening.
When using lager, use the different logging levels with the following meanings:
Meanings:
debug
: Very low-level info, that may cover your screen and don't let you type in it :Pinfo
: The system's life, in some detail. Things that happen usually, but not all the time. You should be able to use the console with acceptable interruptions in this level.notice
: Meaningful things that are worth noticing, like the startup or termination of supervisors or important gen_servers, etc…warning
: Handled errors, the system keeps working as usual, but something out of the ordinary happenederror
: Something bad and unexpected happen, usually an exception or error (DO log the stack trace here)critical
: The system (or a part of it) crashed and somebody should be informed and take action about italert
: There is no rule on when to use this levelemergency
: There is no rule on when to use this level
When specifying dependencies in erlang.mk Makefiles or rebar.config, prefer using the https protocol to download the dependency repository.
Examples:
Reasoning: HTTPS is recommended by GitHub and is easier for CI.
Don't implicitly include all functions from a module when using the mixer library. Explicitly list all mixed-in functions.
Examples: mixer
Reasoning: Knowing all the functions that are included in a module makes it easier to reason about it. If any number of functions are implicitly brought from another module, it introduces an extra level of unnecessary indirection that requires jumping back and forth between files. The less information we have to keep in our heads the better.
Things that should be considered when writing code, but do not cause a PR rejection, or are too vague to consistently enforce.
When you want to write a conditional statement based on a comparison of two values, don't use equality and then switch according to the boolean result value. Use pattern matching instead.
Examples: pattern matching
Reasoning: From a semantic standpoint, boolean switches after equality introduce static boolean logic in your code, reducing its flexibility. Besides, pattern matching is just more declarative. And, specially in the case where there is a function involved, using pattern matching you get a chance to do something with the result of such a function call.
Occasionally recursion is the best way to implement a function, but often a fold or a list comprehension will yield safer, more comprehensible code.
Examples: alternatives to recursion
Reasoning: Manually writing a recursive function is error-prone, and mistakes can be costly. In the wrong circumstances, a buggy recursive function can miss its base case, spiral out of control, and take down an entire node. This tends to counteract one of the main benefits of Erlang, where an error in a single process does not normally cause the entire node to crash.
Additionally, to an experienced Erlang developer, folds and list comprehensions are much easier to understand than complex recursive functions. Such contstructs behave predictably: they always perform an action for each element in a list. A recursive function may work similarly, but it often requires careful scrutiny to verify what path the control flow will actually take through the code in practice.
Symbol naming: Use variables in CamelCase and atoms, function and module names with underscores.
Examples: camel_case
Reasoning: It helps a lot with the next issue in this list ;)
As long as it's easy to read and understand, keep variable names short
Examples: var_names
Reasoning: It helps reducing line lengths, which is also described above
Module comments go with %%%, function comments with %%, and code comments with %.
Examples: comment_levels
Reasoning: It clearly states what the comment is about, also helpful to search for specific comments, like "%% @".
Try to write functions with a small number of expressions, and that do only one thing. 12 expressions per function except for integration tests is a good measure.
Examples: small_funs
Reasoning: From 3 different sources:
- Small functions aid readability and composeability. Readability aids maintainability. This cannot be stressed enough. The smaller your code, the easier it is to fix and change.
- A small function allows one to see its purpose clearly, so that you need to only understand the small subset of operations it performs, which makes it very simple to verify it works correctly.
- These are all compeling reasons:
- a function should do one thing, if it's too large you are likely to be doing work better suited for multiple functions
- clarity, it's easier to see what a function does when it's short and concise
- reuse, keeping them short means you can use them later for something else (specially true for Erlang)
- screen size: you want to be able to see the whole function if you want to connect via ssh to a server for whatever reason
Notes:
This guideline, together with Avoid deep nesting and More, smaller functions over case expressions, can be well followed by structuring your functions as follows:
some_fun(#state{name=foo} = State) ->
do_foo_thing(),
continue_some_fun(State);
some_fun(#state{name=bar} = State) ->
do_bar_thing(),
continue_some_fun(State).
continue_some_fun(State) ->
...,
ok.
Remember:
- There is no cost for a tail call like that.
- This pattern is efficient, compact, clear.
- It "resets" indentation so the code doesn't wander off the right edge of the screen.
Most importantly:
- It's easier to test because the functions delineate the testing hinge points.
- It gives more surface for tracing, so one can get very specific about where the computation goes off the rails. Nested cases are opaque at runtime.
Encapsulate reusable code in behaviors.
Examples: behavior
Reasoning: It's the OTP way ;)
Do validations on the outmost layers of your code.
Examples: validations
Reasoning: One aspect of choosing where want you to crash is how you design your API: A function that checks the input before calling the gen_server behind it will avoid a full roundtrip to the gen_server and maybe even a gen_server crash. do_it(Pid, X) when is_integer(X) -> gen_server:call(Pid, {do_it, X}). If you design this way, the caller crashes if the arg is wrong. If you don't tighten up the function head, the gen_server will crash.
Lots of use cases of length/1 can be replaced by pattern matching, this is specially true when checking if the list has at least one element.
Examples: pattern matching
Reasoning: Pattern matching is one of the core aspects of Erlang and as such it's both performant and readable. Pattern matching is also more flexible so changes to the logic get simpler.
When you identify a block of functionality that is self-contained (it may be several modules or just a big one) and actually independent of the main purpose of your application, place that in a separate application. And consider open-sourcing it.
Reasoning: It's easier to share among apps. If open-sourced, you're sharing it with the community and you get the benefits of the community being involved in it.
Note: Do not create highly specific libraries that are too coupled with the project you're working on. Use this rule for libraries that will likely be reused in other projects.
The facade pattern is great to simplify library usage and serves as a form of self-documentation.
Examples: kafkerl
Reasoning: Having the relevant functions in a single module means that the end user doesn't have a hard time figuring out which functions to call. Note that to avoid making it too complex, you probably want to carefully consider which functionality you wish to support here; exposing fewer functions (the ones that show the basic use of the library) as opposed to just creating a dummy module containing every single exported function in the library is preferred. This greatly reduces the learning curve of the library and therefore makes it more tempting to use.
Custom data types used in exported functions should be defined with Erlang type declarations and exported from the module
Examples: data_types
Reasoning: It helps with function documentation and, when using opaque types, we ensure encapsulation.
When using sumo_db you should separate the responsibilities clearly, creating for each entity:
- one module (usually called MODELs) to describe the entity and allow administrating instances of the model in memory
- one module (usually called MODEL_repo) to handle the various operations that require business logic relating to the entity
Examples: separate responsibilities in sumo_db
Reasoning: By dividing the functions into two different modules we increase understandability of the functionality especially if these are called from external modules. It also allows us to better organize the code and have smaller modules.