Vault7: CIA Hacking Tools Revealed
Owner: User #26968069
Python Coding Conventions
Introduction
This style guide is intended to help standardize the code generated by teams working at Oceans Edge. Code of a consistent format leads to greater legibility.
When extending an existing project that does not already follow this guide, strive for consistency. Speak the local dialect. Aim for a uniformity in approach, and make the job of reading your code easier for the next person (it may be future you).
This guide is not intended to be a book to be thrown at others during code review. Instead, if there is a question about how something should be formatted, this guide will be the objective reference on what code "should" look like. Much of the text here is from the Google Python Style Guide. It has been adapted to suit our needs and preferences. This guide is not holy canon. When in doubt, do what makes the most sense for your particular project or use case.
Language Rules
Imports
Use imports for packages and modules only.
The namespace management convention is simple. The source of each identifier is indicated in a consistent way; x.Obj says that object Obj is defined in module x.
Use import x for importing packages and modules. Use from x import y where x is the package prefix and y is the module name with no prefix. Use from x import y as z if two modules named y are to be imported or if y is an inconveniently long name.
Do not use relative names in imports. Even if the module is in the same package, use the full package name. This helps prevent unintentionally importing a package twice.
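The import forms described above can be sketched as follows. The standard-library modules used here are chosen only for illustration and are not named anywhere in this guide; the sketch is written to run on Python 3.

```python
# Import a module from a package; its contents are reached as utils.<name>.
from email import utils
# Import a top-level module and refer to its contents through the module name.
import os
# Alias a module whose name is inconveniently long or would clash.
from xml.etree import ElementTree as etree

# Every identifier shows where it comes from: os.sep, utils.formataddr, etree.fromstring.
separator_is_string = isinstance(os.sep, str)
address = utils.formataddr(('Ada', 'ada@example.com'))
root = etree.fromstring('<config><debug>1</debug></config>')
debug_flag = root.find('debug').text
```

Note that nothing is imported by bare object name (for example, no from email.utils import formataddr), so a reader can always tell which module defines an identifier.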
Exceptions
Exceptions are the preferred mechanism for reporting and handling error conditions.
Definition
Exceptions are a means of breaking out of the normal flow of control of a code block to handle errors or other exceptional conditions.
Pros
The control flow of normal operation code is not cluttered by error-handling code. It also allows the control flow to skip multiple frames when a certain condition occurs, e.g., returning from N nested functions in one step instead of having to carry-through error codes.
Cons
May cause the control flow to be confusing. Easy to miss error cases when making library calls.
Decision
Exceptions must follow certain conditions:
- Raise exceptions like this: raise MyException('Error message') or raise MyException. Do not use the two-argument form (raise MyException, 'Error message') or deprecated string-based exceptions (raise 'Error message').
- Derive exceptions from Exception rather than BaseException
- Modules or packages should define their own domain-specific base exception class, which should inherit from the built-in Exception class. The base exception for a module should be called Error.
class Error(Exception):
    pass
- Never use catch-all except: statements, or catch Exception or StandardError, unless you are re-raising the exception or in the outermost block in your thread (and printing an error message). Python is very tolerant in this regard and except: will really catch everything including misspelled names, sys.exit() calls, Ctrl+C interrupts, unittest failures and all kinds of other exceptions that you simply don't want to catch.
- Minimize the amount of code in a try/except block. The larger the body of the try, the more likely that an exception will be raised by a line of code that you didn't expect to raise an exception. In those cases, the try/except block hides a real error.
- Use the finally clause to execute code whether or not an exception is raised in the try block. This is often useful for cleanup, e.g., closing a file.
- Exceptions are for handling "exceptional" (uncommon) conditions, and not for typical control flow manipulation.
- Use exception chaining appropriately. In Python 3, "raise X from Y" should be used to indicate explicit replacement without losing the original traceback.
- When capturing an exception, use as rather than a comma. For example:
try:
    raise Error
except Error as error:
    pass
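Several of the rules above can be combined in one short sketch. ConfigError, parse_port and read_port are hypothetical names invented for this illustration, and the raise ... from form is Python 3 only.

```python
class Error(Exception):
    """Base exception for this (hypothetical) module."""


class ConfigError(Error):
    """Raised when a configuration value cannot be parsed."""


def parse_port(text):
    """Converts text to a port number, chaining the original failure."""
    try:
        # Keep the try body minimal: only the call that may fail.
        port = int(text)
    except ValueError as error:
        # Explicit chaining keeps the original exception in __cause__.
        raise ConfigError('invalid port: %r' % (text,)) from error
    return port


def read_port(stream):
    """Reads a port from a file-like object and always closes it."""
    try:
        return parse_port(stream.readline().strip())
    finally:
        # Runs whether or not parse_port raised.
        stream.close()
```

Callers can catch the module's Error base class to handle every failure this module raises without resorting to a catch-all except: clause.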
Global Variables
Avoid global variables.
Definition
Variables that are declared at the module level.
Pros
Occasionally useful.
Cons
Has the potential to change module behavior during the import, because assignments to module-level variables are done when the module is imported.
Decision
Avoid global variables in favor of class variables. Some exceptions are:
- Default options for scripts.
- Module-level constants. For example: PI = 3.14159. Constants should be named using all caps with underscores; see Naming below.
- It is sometimes useful for globals to cache values needed or returned by functions.
- If needed, globals should be made internal to the module and accessed through public module level functions; see Naming below.
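As a minimal sketch of the last two exceptions, the module below keeps a cache in an internal global and exposes it only through public functions. All names here are hypothetical and chosen for illustration.

```python
# Module-level constant: all caps with underscores.
MAX_CACHE_SIZE = 128

# Internal global: the leading underscore marks it as private to the module.
_square_cache = {}


def cached_square(n):
    """Returns n * n, remembering results in the module-internal cache."""
    if n not in _square_cache:
        if len(_square_cache) >= MAX_CACHE_SIZE:
            # Crude eviction: drop everything once the cache is full.
            _square_cache.clear()
        _square_cache[n] = n * n
    return _square_cache[n]


def cache_size():
    """Public accessor so callers never touch _square_cache directly."""
    return len(_square_cache)
```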
Nested/Local/Inner Classes and Functions
Nested/local/inner classes and functions are fine.
A class can be defined inside of a method, function, or class. A function can be defined inside a method or function. Nested functions have read-only access to variables defined in enclosing scopes.
List Comprehensions
Okay to use for simple cases.
List comprehensions and generator expressions provide a concise and efficient way to create lists and iterators without resorting to the use of map(), filter(), or lambda.
Pros
Simple list comprehensions can be clearer and simpler than other list creation techniques. Generator expressions can be very efficient, since they avoid the creation of a list entirely.
Cons
Complicated list comprehensions or generator expressions can be hard to read.
Decision
Okay to use for simple cases. Each portion must fit on one line: mapping expression, for clause, filter expression. Multiple for clauses or filter expressions are not permitted. Use loops instead when things get more complicated.
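A short sketch of where the line is drawn, using made-up data: a single for clause with one filter is fine, while anything that would need two for clauses is written as a loop.

```python
numbers = [1, 2, 3, 4, 5, 6]

# Fine: one mapping expression, one for clause, one filter expression.
even_squares = [n * n for n in numbers if n % 2 == 0]

# A generator expression avoids building an intermediate list.
total = sum(n * n for n in numbers)

# Building pairs would need two for clauses, so use explicit loops instead.
pairs = []
for x in range(3):
    for y in range(x + 1, 3):
        pairs.append((x, y))
```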
Default Iterators and Operators
Use default iterators and operators for types that support them, like lists, dictionaries, and files.
Definition
Container types, like dictionaries and lists, define default iterators and membership test operators ("in" and "not in").
Pros
The default iterators and operators are simple and efficient. They express the operation directly, without extra method calls. A function that uses default operators is generic. It can be used with any type that supports the operation.
Cons
You can't tell the type of objects by reading the method names (e.g. has_key() means a dictionary). This is also an advantage.
Decision
Use default iterators and operators for types that support them, like lists, dictionaries, and files. The built-in types define iterator methods, too. Prefer these methods to methods that return lists, except that you should not mutate a container while iterating over it.
#Yes
for key in adict: ...
if key not in adict: ...
if obj in alist: ...
for line in afile: ...
for k, v in dict.iteritems(): ...
#No
for key in adict.keys(): ...
if not adict.has_key(key): ...
for line in afile.readlines(): ...
Generators
Use generators as needed.
Use "Yields:" rather than "Returns:" in the doc string for generator functions.
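For instance, a generator function might be documented like this. The function itself is hypothetical, and the "Args:" section follows the Google-style layout, which is an assumption since this guide does not spell out the section names.

```python
def countdown(start):
    """Counts down from start to 1.

    Args:
        start: The first number to produce.

    Yields:
        The next integer in the countdown.
    """
    current = start
    while current > 0:
        yield current
        current -= 1
```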
Lambda Functions
Okay to use them for one-liners. If the code inside the lambda function is any longer than 60–80 chars, it's probably better to define it as a regular (nested) function.
For common operations like multiplication, use the functions from the operator module instead of lambda functions. For example, prefer operator.mul to lambda x, y: x * y.
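A small sketch of the difference, with made-up values. functools.reduce is used so that the same code runs on Python 2.6+ and Python 3.

```python
import functools
import operator

factors = [2, 3, 7]

# Preferred: the operator module already provides multiplication.
product = functools.reduce(operator.mul, factors, 1)

# Equivalent, but the lambda only re-implements operator.mul.
product_with_lambda = functools.reduce(lambda x, y: x * y, factors, 1)

# A short one-liner lambda is fine, e.g. as a sort key.
scores = [('a', 2), ('b', 1)]
by_score = sorted(scores, key=lambda pair: pair[1])
```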
Conditional Expressions
Conditional expressions are mechanisms that provide a shorter syntax for if statements. For example: x = 1 if cond else 2.
Okay to use for one-liners. In other cases prefer to use a complete if statement.
Default Argument Values
Okay in most cases.
You can specify values for variables at the end of a function's parameter list, e.g., def foo(a, b=0):. If foo is called with only one argument, b is set to 0. If it is called with two arguments, b has the value of the second argument.
Do not use mutable objects as default values in the function or method definition.
#Yes
def foo(a, b=None):
    if b is None:
        b = []
#No, no and no
def foo(a, b=[]):
    ...
def foo(a, b=time.time()): # The time the module was loaded???
    ...
def foo(a, b=FLAGS.my_thing): # sys.argv has not yet been parsed...
    ...
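The reason mutable defaults are dangerous is that the default object is created once, when the function is defined, and then shared by every call. A minimal sketch with hypothetical function names:

```python
def append_shared(item, bucket=[]):
    """Anti-pattern: the default list is created once, at definition time."""
    bucket.append(item)
    return bucket


def append_fresh(item, bucket=None):
    """Preferred: create a new list on each call when none is supplied."""
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first_shared = list(append_shared(1))   # copy taken before the next call
second_shared = list(append_shared(2))  # the item from the first call is still there
first_fresh = append_fresh(1)
second_fresh = append_fresh(2)
```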
Properties
Use properties for accessing or setting data where you would normally have used simple, lightweight accessor or setter methods.
Use properties in new code to access or set data where you would normally have used simple, lightweight accessor or setter methods. Read-only properties should be created with the @property decorator.
Inheritance with properties can be non-obvious if the property itself is not overridden. Thus one must make sure that accessor methods are called indirectly to ensure methods overridden in subclasses are called by the property (using the Template Method DP).
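One way to read the inheritance advice is sketched below: the property calls a separate accessor method, so a subclass can override the accessor without redefining the property. Shape and Square are hypothetical classes used only for illustration.

```python
class Shape(object):
    """Base class exposing a read-only 'area' property."""

    def _get_area(self):
        raise NotImplementedError

    @property
    def area(self):
        # Call the accessor indirectly so subclass overrides are used.
        return self._get_area()


class Square(Shape):
    """Overrides only the accessor, not the property itself."""

    def __init__(self, side):
        self.side = side

    def _get_area(self):
        return self.side * self.side
```

Because area has no setter, assigning to it raises AttributeError, which is what makes the property read-only.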
True/False Evaluations
Python evaluates certain values as false when in a boolean context. A quick "rule of thumb" is that all "empty" values are considered false so 0, None, [], {}, '' all evaluate as false in a boolean context.
Avoid the "implicit" false. Most of the team comes from a C background, and this is a jarring pattern.
- Never use == or != to compare singletons like None. Use is or is not.
- Beware of writing if x: when you really mean if x is not None:—e.g., when testing whether a variable or argument that defaults to None was set to some other value. The other value might be a value that's false in a boolean context!
- Never compare a boolean variable to False using ==. Use if not x: instead. If you need to distinguish False from None then chain the expressions, such as if not x and x is not None:.
- For sequences (strings, lists, tuples), avoid using the fact that empty sequences are false, so if not len(seq): or if len(seq): is preferable to if seq: or if not seq:.
- Use the is not operator rather than not ... is. While both expressions are functionally identical, the former is more readable and preferred.
#Yes
if foo is not None:
#No
if not foo is None:
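The rules above matter most when a legitimate value happens to be false. In this sketch (hypothetical function names), a timeout of 0 is a real setting that the implicit test silently discards.

```python
def describe_timeout(timeout=None):
    """None means 'not set'; 0 is a valid value meaning 'do not wait'."""
    if timeout is not None:
        return 'timeout=%d' % timeout
    return 'timeout unset'


def describe_timeout_buggy(timeout=None):
    """Anti-pattern: 'if timeout:' also treats 0 as unset."""
    if timeout:
        return 'timeout=%d' % timeout
    return 'timeout unset'


def is_empty(seq):
    """Explicit length check, as this guide prefers for sequences."""
    return not len(seq)
```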
String Comparison
Use ''.startswith() and ''.endswith() instead of string slicing to check for prefixes or suffixes.
if foo.startswith('bar'): # Yes
if foo[:3] == 'bar': # No
Object Type Comparison
Object type comparisons should always use isinstance() instead of comparing types directly.
if isinstance(obj, int): # Yes
if type(obj) is type(1): # No
Lexical Scoping
A nested Python function can refer to variables defined in enclosing functions, but can not assign to them. Variable bindings are resolved using lexical scoping, that is, based on the static program text. Any assignment to a name in a block will cause Python to treat all references to that name as a local variable, even if the use precedes the assignment. If a global declaration occurs, the name is treated as a global variable.
An example of the use of this feature is:
def get_adder(summand1):
    """Returns a function that adds numbers to a given number."""
    def adder(summand2):
        return summand1 + summand2
    return adder
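The assignment rule is the part that most often surprises people, so here is a minimal sketch of it with hypothetical names. Python 3 also offers the nonlocal statement for rebinding an enclosing name; the workaround shown here, mutating a container instead of rebinding, works on Python 2 as well.

```python
def make_counter_broken():
    count = 0

    def increment():
        # The assignment makes 'count' local to increment(), so reading it
        # on the right-hand side raises UnboundLocalError.
        count = count + 1
        return count

    return increment


def make_counter():
    # Mutating a container avoids rebinding the enclosing name.
    state = {'count': 0}

    def increment():
        state['count'] += 1
        return state['count']

    return increment
```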
Function and Method Decorators
Decorators for Functions and Methods (a.k.a "the @ notation"). The most common decorators are @classmethod and @staticmethod, for converting ordinary methods to class or static methods.
Pros
Elegantly specifies some transformation on a method; the transformation might eliminate some repetitive code, enforce invariants, etc.
Cons
Decorators can perform arbitrary operations on a function's arguments or return values, resulting in surprising implicit behavior. Additionally, decorators execute at import time. Failures in decorator code are pretty much impossible to recover from.
Decision
Use decorators judiciously when there is a clear advantage. Decorators should follow the same import and naming guidelines as functions. Decorator pydoc should clearly state that the function is a decorator. Write unit tests for decorators.
Avoid external dependencies in the decorator itself (e.g. don't rely on files, sockets, database connections, etc.), since they might not be available when the decorator runs (at import time, perhaps from pydoc or other tools). A decorator that is called with valid parameters should (as much as possible) be guaranteed to succeed in all cases.
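A decorator that follows these guidelines might look like the sketch below. count_calls and add are hypothetical names; functools.wraps is used so the decorated function keeps its own name and doc string.

```python
import functools


def count_calls(func):
    """Decorator: counts how many times func is called.

    The wrapper touches no files, sockets or other external state, so
    applying it at import time cannot fail.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.call_count += 1
        return func(*args, **kwargs)

    wrapper.call_count = 0
    return wrapper


@count_calls
def add(a, b):
    """Returns the sum of a and b."""
    return a + b
```

Since the decorator is an ordinary function, it can be unit tested by applying it to a small function and checking call_count after a few calls.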
Threading
Do not rely on the atomicity of built-in types.
While Python's built-in data types such as dictionaries appear to have atomic operations, there are corner cases where they aren't atomic (e.g. if __hash__ or __eq__ are implemented as Python methods) and their atomicity should not be relied upon. Neither should you rely on atomic variable assignment (since this in turn depends on dictionaries).
Use the Queue module's Queue data type as the preferred way to communicate data between threads. Otherwise, use the threading module and its locking primitives. Learn about the proper use of condition variables so you can use threading.Condition instead of using lower-level locks.
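A minimal sketch of passing data between threads through queues rather than shared variables. The worker function and the None sentinel are illustrative choices, not something this guide prescribes; the import fallback is there because the module is named Queue on Python 2 and queue on Python 3.

```python
import threading

try:
    import queue  # Python 3 name
except ImportError:
    import Queue as queue  # Python 2 name


def worker(tasks, results):
    """Squares numbers from tasks until it sees the None sentinel."""
    while True:
        item = tasks.get()
        if item is None:
            break
        results.put(item * item)


tasks = queue.Queue()
results = queue.Queue()
thread = threading.Thread(target=worker, args=(tasks, results))
thread.start()

for number in (1, 2, 3):
    tasks.put(number)
tasks.put(None)  # tell the worker to stop
thread.join()

# With a single worker, results come back in the order they were queued.
squares = [results.get() for _ in range(3)]
```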
Best Practices
Power Features
Avoid these features.
Python is an extremely flexible language and gives you many fancy features such as metaclasses, access to bytecode, on-the-fly compilation, dynamic inheritance, object reparenting, import hacks, reflection, modification of system internals, etc.
It's very tempting to use these "cool" features when they're not absolutely necessary. It's harder to read, understand, and debug code that's using unusual features underneath. It doesn't seem that way at first (to the original author), but when revisiting the code, it tends to be more difficult than code that is longer but is straightforward.
Style Rules
Semicolons
Do not terminate your lines with semi-colons and do not use semi-colons to put two commands on the same line.
Line Length
Each line of text in your code should be at most 100 characters long.
Exceptions
- If a comment line contains an example command or a literal URL longer than 100 characters, that line may be longer than 100 characters for ease of cut and paste.
- A raw-string literal may have content that exceeds 100 characters. Except for test code, such literals should appear near the top of a file.
- An import statement with a long path may exceed 100 columns.
Do not use backslash line continuation.
Make use of Python's implicit line joining inside parentheses, brackets and braces. If necessary, you can add an extra pair of parentheses around an expression.
if (width == 0 and height == 0 and
    color == 'red' and emphasis == 'strong'):
When a literal string won't fit on a single line, use parentheses for implicit line joining.
x = ('This will build a very long long '
     'long long long long long long string')
Within comments, put long URLs on their own line if necessary.
Yes: # See details at
     # http://www.example.com/us/developer/documentation/api/content/v2.0/csv_file_name_extension_full_specification.html
No:  # See details at
     # http://www.example.com/us/developer/documentation/api/content/\
     # v2.0/csv_file_name_extension_full_specification.html
Make note of the indentation of the elements in the line continuation examples above; see the indentation section for explanation.
Parentheses
Use parentheses sparingly.
Do not use them in return statements or conditional statements unless using parentheses for implied line continuation. (See above.) It is however fine to use parentheses around tuples.
Yes: if foo:
         bar()
     while x:
         x = bar()
     if x and y:
         bar()
     if not x:
         bar()
     return foo
     for (x, y) in dict.items(): ...
No:  if (x):
         bar()
     if not(x):
         bar()
     return (foo)
Indentation
Indent your code blocks with 4 spaces.
Never use tabs or mix tabs and spaces. In cases of implied line continuation, you should align wrapped elements either vertically, as per the examples in the line length section; or using a hanging indent of 4 spaces, in which case there should be no argument on the first line.
#Yes
#Aligned with opening delimiter
foo = long_function_name(var_one, var_two,
                         var_three, var_four)
#Aligned with opening delimiter in a dictionary
foo = {
    long_dictionary_key: value1 +
                         value2,
    ...
}
#4-space hanging indent in a dictionary
foo = {
    long_dictionary_key:
        long_dictionary_value,
    ...
}
#No
#2-space hanging indent forbidden
foo = long_function_name(
  var_one,
  var_two)
#No hanging indent in a dictionary
foo = {
    long_dictionary_key:
    long_dictionary_value,
    ...
}
Blank Lines
Two blank lines between top-level definitions, one blank line between method definitions.
Two blank lines between top-level definitions, be they function or class definitions. One blank line between method definitions and between the class line and the first method. Use single blank lines as you judge appropriate within functions or methods.
Whitespace
Follow standard typographic rules for the use of spaces around punctuation.
No whitespace inside parentheses, brackets or braces.
spam(ham[1], {eggs: 2}, []) # OK
spam( ham[ 1 ], { eggs: 2 }, [ ] ) # Bad
No whitespace before a comma, semicolon, or colon. Do use whitespace after a comma, semicolon, or colon except at the end of the line.
#Yes
if x == 4:
    print x, y
x, y = y, x
#No
if x == 4 :
    print x , y
x , y = y , x
No whitespace before the open paren/bracket that starts an argument list, indexing or slicing.
spam(1) # OK
spam (1) # Bad
dict['key'] = list[index] # OK
dict ['key'] = list [index] # Bad
Surround binary operators with a single space on either side for assignment (=), comparisons (==, <, >, !=, <>, <=, >=, in, not in, is, is not), and Booleans (and, or, not). Use your better judgment for the insertion of spaces around arithmetic operators but always be consistent about whitespace on either side of a binary operator.
x == 1 # OK
x<1 # Bad
Don't use spaces around the '=' sign when used to indicate a keyword argument or a default parameter value.
def complex(real, imag=0.0): return magic(r=real, i=imag) # OK
def complex(real, imag = 0.0): return magic(r = real, i = imag) # Bad
Shebang Line
Most .py files do not need to start with a #! line. Start the main file of a program with #!/usr/bin/env python with an optional single digit 2 or 3 suffix.
This line is used by the kernel to find the Python interpreter, but is ignored by Python when importing modules. It is only necessary on a file that will be executed directly.
Comments
Be sure to use the right style for module, function, method and in-line comments.
Doc Strings
Python has a unique commenting style using doc strings. A doc string is a string that is the first statement in a package, module, class or function. These strings can be extracted automatically through the __doc__ member of the object and are used by pydoc. (Try running pydoc on your module to see how it looks.) We always use the three double-quote """ format for doc strings (per PEP 257). A doc string should be organized as a summary line (one physical line) terminated by a period, question mark, or exclamation point, followed by a blank line, followed by the rest of the doc string starting at the same cursor position as the first quote of the first line. There are more formatting guidelines for doc strings below.
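The layout described above can be sketched on a small hypothetical function:

```python
def mean(values):
    """Returns the arithmetic mean of a non-empty sequence of numbers.

    The summary line above is one physical line ending in a period. After
    a blank line, the rest of the doc string starts at the same column as
    the opening quotes.
    """
    return sum(values) / float(len(values))


# The doc string is available at run time through the __doc__ member.
summary_line = mean.__doc__.splitlines()[0]
```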
Modules
Comments at the module-level are not mandatory.
Functions and Methods
As used in this section "function" applies to methods, functions, and generators.
A function should have a docstring, unless it meets all of the following criteria:
- not externally visible
- very short
- obvious
Classes
Classes should have a doc string below the class definition describing the class. If your class has public attributes, they should be documented here.
Block and Inline Comments
The final place to have comments is in tricky parts of the code. If you're going to have to explain it at the next code review, you should comment it now. Complicated operations get a few lines of comments before the operations commence. Non-obvious ones get comments at the end of the line.
#We use a weighted dictionary search to find out where i is in
#the array. We extrapolate position based on the largest num
#in the array and the array size and then do binary search to
#get the exact number.
if i & (i-1) == 0:  # true iff i is a power of 2
To improve legibility, these comments should be at least 2 spaces away from the code.
On the other hand, never describe the code. Assume the person reading the code knows Python (though not what you're trying to do) better than you do.
# BAD COMMENT: Now go through the b array and make sure whenever i occurs
# the next element is i+1
Classes
In Python 2.x code, if a class inherits from no other base classes, explicitly inherit from object. This also applies to nested classes.
Inheriting from object is needed to make properties work properly, and it will protect your code from one particular potential incompatibility with Python 3.x. It also defines special methods that implement the default semantics of objects including __new__, __init__, __delattr__, __getattribute__, __setattr__, __hash__, __repr__, and __str__.
Strings
Use the format method or the % operator for formatting strings, even when the parameters are all strings. Use your best judgement to decide between + and % (or format) though.
#Yes
x = a + b
x = '%s, %s!' % (imperative, expletive)
x = '{}, {}!'.format(imperative, expletive)
x = 'name: %s; score: %d' % (name, n)
x = 'name: {}; score: {}'.format(name, n)
#No
x = '%s%s' % (a, b) # use + in this case
x = '{}{}'.format(a, b) # use + in this case
x = imperative + ', ' + expletive + '!'
x = 'name: ' + name + '; score: ' + str(n)
Avoid using the + and += operators to accumulate a string within a loop. Since strings are immutable, this creates unnecessary temporary objects and results in quadratic rather than linear running time. Instead, add each substring to a list and ''.join the list after the loop terminates (or write each substring to an io.BytesIO buffer).
#Yes
items = ['<table>']
for last_name, first_name in employee_list:
    items.append('<tr><td>%s, %s</td></tr>' % (last_name, first_name))
items.append('</table>')
employee_table = ''.join(items)
#No
employee_table = '<table>'
for last_name, first_name in employee_list:
    employee_table += '<tr><td>%s, %s</td></tr>' % (last_name, first_name)
employee_table += '</table>'
Be consistent with your choice of string quote character within a file. Pick ' or " and stick with it. It is okay to use the other quote character on a string to avoid the need to \ escape within the string.
Prefer """ for multi-line strings rather than '''. Note that it is often cleaner to use implicit line joining since multi-line strings do not flow with the indentation of the rest of the program.
Files and Sockets
Explicitly close files and sockets when done with them.
Leaving files, sockets or other file-like objects open unnecessarily has many downsides, including:
- They may consume limited system resources, such as file descriptors. Code that deals with many such objects may exhaust those resources unnecessarily if they're not returned to the system promptly after use.
- Holding files open may prevent other actions being performed on them, such as moves or deletion.
- Files and sockets that are shared throughout a program may inadvertently be read from or written to after logically being closed. If they are actually closed, attempts to read or write from them will throw exceptions, making the problem known sooner.
Furthermore, while files and sockets are automatically closed when the file object is destructed, tying the life-time of the file object to the state of the file is poor practice, for several reasons:
- There are no guarantees as to when the runtime will actually run the file's destructor. Different Python implementations use different memory management techniques, such as delayed Garbage Collection, which may increase the object's lifetime arbitrarily and indefinitely.
- Unexpected references to the file may keep it around longer than intended (e.g. in tracebacks of exceptions, inside globals, etc).
The preferred way to manage files is using the "with" statement:
with open("hello.txt") as hello_file:
    for line in hello_file:
        print line
For file-like objects that do not support the "with" statement, use contextlib.closing():
import contextlib
import urllib
with contextlib.closing(urllib.urlopen("http://www.python.org/")) as front_page:
    for line in front_page:
        print line
TODO Comments
Use TODO comments for code that is temporary, a short-term solution, or good-enough but not perfect.
TODOs should include the string TODO in all caps. The main purpose is to have a consistent TODO that can be searched to find out how to get more details upon request. If code is being reviewed with TODO, expect to answer why it is still present. Most commonly, this is used to denote where some out-of-scope addition or enhancement will be added.
If your TODO is of the form "At a future date do something" this probably belongs in a ticket.
Imports Formatting
Imports should be on separate lines.
#Yes
import os
import sys
#No
import os, sys
Imports are always put at the top of the file, just after any module comments and doc strings and before module globals and constants. Imports should be grouped with the order being most generic to least generic:
- standard library imports
- third-party imports
- application-specific imports
Within each grouping, imports should be sorted lexicographically, ignoring case, according to each module's full package path.
import foo
from foo import bar
from foo.bar import baz
from foo.bar import Quux
from Foob import ar
Statements
Generally only one statement per line.
Access Control
If an accessor function would be trivial you should use public variables instead of accessor functions to avoid the extra cost of function calls in Python. When more functionality is added you can use property to keep the syntax consistent.
On the other hand, if access is more complex, or the cost of accessing the variable is significant, you should use function calls (following the Naming guidelines) such as get_foo() and set_foo(). If the past behavior allowed access through a property, do not bind the new accessor functions to the property. Any code still attempting to access the variable by the old method should break visibly so they are made aware of the change in complexity.
Naming
module_name, package_name, ClassName, method_name, ExceptionName, function_name, GLOBAL_CONSTANT_NAME, global_var_name, instance_var_name, function_parameter_name, local_var_name.
Names to Avoid
- single character names except for counters or iterators
- dashes (-) in any package/module name
- __double_leading_and_trailing_underscore__ names (reserved by Python)
Naming Convention
- "Internal" means internal to a module or protected or private within a class.
- Prepending a single underscore (_) has some support for protecting module variables and functions (not included with from module import *). Prepending a double underscore (__) to an instance variable or method effectively serves to make the variable or method private to its class (using name mangling).
- Place related classes and top-level functions together in a module. Unlike Java, there is no need to limit yourself to one class per module.
- Use CapWords for class names, but lower_with_under.py for module names. Although there are many existing modules named CapWords.py, this is now discouraged because it's confusing when the module happens to be named after a class. ("wait – did I write import StringIO or from StringIO import StringIO?")
- Always use self for the first argument to instance methods.
- Always use cls for the first argument to class methods.
- If a function argument's name clashes with a reserved keyword, it is generally better to append a single trailing underscore rather than use an abbreviation or spelling corruption. Thus class_ is better than clss. (Perhaps better is to avoid such clashes by using a synonym.)
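The conventions in this section can be seen together in one short sketch. ConnectionPool and every name inside it are hypothetical and exist only to show the naming styles.

```python
MAX_RETRIES = 3  # GLOBAL_CONSTANT_NAME


class ConnectionPool(object):  # ClassName in CapWords
    """Hypothetical class used only to show the naming styles."""

    def __init__(self, pool_size):  # function_parameter_name
        self.pool_size = pool_size  # instance_var_name
        self._in_use = 0  # single underscore: internal to the class
        self.__secret = 'token'  # double underscore: name-mangled

    @classmethod
    def with_default_size(cls):  # cls is the first argument of a class method
        return cls(MAX_RETRIES)

    def acquire(self, class_=None):  # trailing underscore avoids the keyword
        self._in_use += 1
        return self._in_use
```

Name mangling stores the double-underscore attribute as _ConnectionPool__secret, which is why it cannot be reached as __secret from outside the class.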
Main
Even a file meant to be used as a script should be importable and a mere import should not have the side effect of executing the script's main functionality. The main functionality should be in a main() function.
In Python, pydoc as well as unit tests require modules to be importable. Your code should always check if __name__ == '__main__' before executing your main program so that the main program is not executed when the module is imported.
def main():
    ...
if __name__ == '__main__':
    main()
All code at the top level will be executed when the module is imported. Be careful not to call functions, create objects, or perform other operations that should not be executed when the file is being pydoced.
Parting Words
Use common sense and BE CONSISTENT.
If you are editing code, take a few minutes to look at the code around you and determine its style. If their comments have little boxes of stars around them, make your comments have little boxes of stars around them too.
The point of having style guidelines is to have a common vocabulary of coding so people can concentrate on what you are saying, rather than on how you are saying it. We present global style rules here so people know the vocabulary. But local style is also important. If code you add to a file looks drastically different from the existing code around it, the discontinuity throws readers out of their rhythm when they go to read it. Try to avoid this.