Effective Python: 59 Specific Ways to Write Better Python

Code samples for the book "Effective Python: 59 Specific Ways to Write Better Python" by Brett Slatkin.

Chapter 1: Pythonic Thinking

    1. There are two major versions of Python still in active use: Python 2 and Python 3.
    1. There are multiple popular runtimes for Python: CPython, Jython, IronPython, PyPy, etc.
    1. Be sure that the command-line executable for running Python on your system is the version you expect it to be.
    1. Prefer Python 3 for your next project because that is the primary focus of the Python community.
    1. Always follow the PEP 8 style guide when writing Python code.
    1. Sharing a common style with the larger Python community facilitates collaboration with others.
    1. Using a consistent style makes it easier to modify your own code later.
    1. In Python 3, bytes contains sequences of 8-bit values, str contains sequences of Unicode characters. bytes and str instances can't be used together with operators (like > or +).
    1. In Python 2, str contains sequences of 8-bit values, unicode contains sequences of Unicode characters. str and unicode can be used together with operators if the str only contains 7-bit ASCII characters.
    1. Use helper functions to ensure that the inputs you operate on are the type of character sequence you expect (8-bit values, UTF-8 encoded characters, Unicode characters, etc.)
    1. If you want to read or write binary data to/from a file, always open the file using a binary mode (like 'rb' or 'wb').
    1. Python's syntax makes it all too easy to write single-line expressions that are overly complicated and difficult to read.
    1. Move complex expressions into helper functions, especially if you need to use the same logic repeatedly.
    1. The if/else expression provides a more readable alternative to using Boolean operators like or and and in expressions.
    1. Avoid being verbose: Don't supply 0 for the start index or the length of the sequence for the end index.
    1. Slicing is forgiving of start or end indexes that are out of bounds, making it easy to express slices on the front or back boundaries of a sequence (like a[:20] or a[-20:]).
    1. Assigning to a list slice will replace that range in the original sequence with what's referenced even if their lengths are different.
    1. Specifying start, end, and stride in a slice can be extremely confusing.
    1. Prefer using positive stride values in slices without start or end indexes. Avoid negative stride values if possible.
    1. Avoid using start, end, and stride together in a single slice. If you need all three parameters, consider doing two assignments (one to slice, another to stride) or using islice from the itertools built-in module.
    1. List comprehensions are clearer than the map and filter built-in functions because they don't require extra lambda expressions.
    1. List comprehensions allow you to easily skip items from the input list, a behavior map doesn't support without help from filter.
    1. Dictionaries and sets also support comprehension expressions.
    1. List comprehensions support multiple levels of loops and multiple conditions per loop level.
    1. List comprehensions with more than two expressions are very difficult to read and should be avoided.
    1. List comprehensions can cause problems for large inputs by using too much memory.
    1. Generator expressions avoid memory issues by producing outputs one at a time as an iterator.
    1. Generator expressions can be composed by passing the iterator from one generator expression into the for subexpression of another.
    1. Generator expressions execute very quickly when chained together.
    1. enumerate provides concise syntax for looping over an iterator and getting the index of each item from the iterator as you go.
    1. Prefer enumerate instead of looping over a range and indexing into a sequence.
    1. You can supply a second parameter to enumerate to specify the number from which to begin counting (zero is default).
    1. The zip built-in function can be used to iterate over multiple iterators in parallel.
    1. In Python 3, zip is a lazy generator that produces tuples. In Python 2, zip returns the full result as a list of tuples.
    1. zip truncates its outputs silently if you supply it with iterators of different lengths.
    1. The zip_longest function from the itertools built-in module lets you iterate over multiple iterators in parallel regardless of their lengths (see Item 46: Use built-in algorithms and data structures).
    1. Python has special syntax that allows else blocks to immediately follow for and while loop interior blocks.
    1. The else block after a loop only runs if the loop body did not encounter a break statement.
    1. Avoid using else blocks after loops because their behavior isn't intuitive and can be confusing.
    1. The try/finally compound statement lets you run cleanup code regardless of whether exceptions were raised in the try block.
    1. The else block helps you minimize the amount of code in try blocks and visually distinguish the success case from the try/except blocks.
    1. An else block can be used to perform additional actions after a successful try block but before common cleanup in a finally block.
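
Several of the points above can be sketched in a few lines. This is a minimal illustration, not the book's own sample code; the variable names, the sample data, and the trace_parse helper are made up for the example.

```python
from itertools import zip_longest

# Slicing is forgiving of out-of-bounds indexes.
a = ['a', 'b', 'c', 'd', 'e']
first_twenty = a[:20]   # the whole list, no IndexError
last_two = a[-2:]

# A list comprehension filters and transforms in one expression,
# without the lambdas that map and filter would need.
even_squares = [x ** 2 for x in range(10) if x % 2 == 0]

# Generator expressions produce one item at a time and can be chained.
lengths = (len(word) for word in ['alpha', 'be', 'gamma'])
doubled = (n * 2 for n in lengths)
doubled_list = list(doubled)

# enumerate accepts a second parameter for the starting count.
numbered = list(enumerate(['vanilla', 'chocolate'], 1))

# zip stops at the shortest input; zip_longest pads with None.
names = ['Cecilia', 'Lise', 'Marie']
counts = [7, 4]
truncated = list(zip(names, counts))
padded = list(zip_longest(names, counts))


def trace_parse(text):
    """Record which blocks of try/except/else/finally run (hypothetical helper)."""
    log = []
    try:
        int(text)
    except ValueError:
        log.append('except')
    else:
        log.append('else')      # runs only when the try block succeeded
    finally:
        log.append('finally')   # always runs
    return log
```

Calling trace_parse('42') records the else block and then the finally block, while trace_parse('x') records the except block and then the finally block.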

Chapter 2: Functions

    1. Functions that return None to indicate special meaning are error prone because None and other values (e.g., zero, the empty string) all evaluate to False in conditional expressions.
    1. Raise exceptions to indicate special situations instead of returning None. Expect the calling code to handle exceptions properly when they are documented.
    1. Closure functions can refer to variables from any of the scopes in which they were defined.
    1. By default, closures can't affect enclosing scopes by assigning variables.
    1. In Python 3, use the nonlocal statement to indicate when a closure can modify a variable in its enclosing scopes.
    1. In Python 2, use a mutable value (like a single-item list) to work around the lack of the nonlocal statement.
    1. Avoid using nonlocal statements for anything beyond simple functions.
    1. Using generators can be clearer than the alternative of returning lists of accumulated results.
    1. The iterator returned by a generator produces the set of values passed to yield expressions within the generator function's body.
    1. Generators can produce a sequence of outputs for arbitrarily large inputs because their working memory doesn't include all inputs and outputs.
    1. Beware of functions that iterate over input arguments multiple times. If these arguments are iterators, you may see strange behavior and missing values.
    1. Python's iterator protocol defines how containers and iterators interact with the iter and next built-in functions, for loops, and related expressions.
    1. You can easily define your own iterable container type by implementing the __iter__ method as a generator.
    1. You can detect that a value is an iterator (instead of a container) if calling iter on it twice produces the same result, which can then be progressed with the next built-in function.
    1. Functions can accept a variable number of positional arguments by using *args in the def statement.
    1. You can use the items from a sequence as the positional arguments for a function with the * operator.
    1. Using the * operator with a generator may cause your program to run out of memory and crash.
    1. Adding new positional parameters to functions that accept *args can introduce hard-to-find bugs.
    1. Function arguments can be specified by position or by keyword.
    1. Keywords make it clear what the purpose of each argument is when it would be confusing with only positional arguments.
    1. Keyword arguments with default values make it easy to add new behaviors to a function, especially when the function has existing callers.
    1. Optional keyword arguments should always be passed by keyword instead of by position.
    1. Keyword arguments make the intention of a function call more clear.
    1. Use keyword-only arguments to force callers to supply keyword arguments for potentially confusing functions, especially those that accept multiple Boolean flags.
    1. Python 3 supports explicit syntax for keyword-only arguments in functions.
    1. Python 2 can emulate keyword-only arguments for functions by using **kwargs and manually raising TypeError exceptions.
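
The function techniques above can be sketched as follows. This is a minimal Python 3 illustration under stated assumptions: every function and class name here (divide, make_counter, index_words, Numbers, normalize, safe_division) and all sample values are illustrative choices, not an authoritative implementation.

```python
def divide(a, b):
    """Raise an exception instead of returning None for a special case."""
    try:
        return a / b
    except ZeroDivisionError as e:
        raise ValueError('Invalid inputs') from e


def make_counter():
    """Closure that modifies a variable in its enclosing scope."""
    count = 0

    def increment():
        nonlocal count   # without this, the assignment would create a new local
        count += 1
        return count

    return increment


def index_words(text):
    """Generator that yields the index of each word instead of building a list."""
    if text:
        yield 0
    for index, letter in enumerate(text):
        if letter == ' ':
            yield index + 1


class Numbers:
    """Iterable container: __iter__ is a generator, so each loop starts fresh."""

    def __init__(self, values):
        self.values = values

    def __iter__(self):
        for value in self.values:
            yield value


def normalize(numbers):
    """Iterates over its input twice, so it rejects plain iterators."""
    if iter(numbers) is iter(numbers):   # an iterator returns itself from iter
        raise TypeError('Must supply a container')
    total = sum(numbers)
    return [100 * value / total for value in numbers]


def safe_division(number, divisor, *, ignore_zero_division=False):
    """The bare * makes ignore_zero_division a keyword-only argument."""
    try:
        return number / divisor
    except ZeroDivisionError:
        if ignore_zero_division:
            return float('inf')
        raise
```

For example, normalize(Numbers([1, 1, 2])) works because the container can be iterated twice, while normalize(iter([1, 1, 2])) raises TypeError. Calling safe_division(1, 0, True) also raises TypeError because the flag can only be passed by keyword.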

Chapter 3: Classes and Inheritance

    1. Avoid making dictionaries with values that are other dictionaries or long tuples.
    1. Use namedtuple for lightweight, immutable data containers before you need the flexibility of a full class.
    1. Move your bookkeeping code to use multiple helper classes when your internal state dictionaries get complicated.
    1. Instead of defining and instantiating classes, functions are often all you need for simple interfaces between components in Python.
    1. References to functions and methods in Python are first class, meaning they can be used in expressions like any other type.
    1. The __call__ special method enables instances of a class to be called like plain Python functions.
    1. When you need a function to maintain state, consider defining a class that provides the __call__ method instead of defining a stateful closure (see Item 15: "Know how closures interact with variable scope").
    1. Python only supports a single constructor per class, the __init__ method.
    1. Use @classmethod to define alternative constructors for your classes.
    1. Use class method polymorphism to provide generic ways to build and connect concrete subclasses.
    1. Python's standard method resolution order (MRO) solves the problems of superclass initialization order and diamond inheritance.
    1. Always use the super built-in function to initialize parent classes.
    1. Avoid using multiple inheritance if mix-in classes can achieve the same outcome.
    1. Use pluggable behaviors at the instance level to provide per-class customization when mix-in classes may require it.
    1. Compose mix-ins to create complex functionality from simple behaviors.
    1. Private attributes aren't rigorously enforced by the Python compiler.
    1. Plan from the beginning to allow subclasses to do more with your internal APIs and attributes instead of locking them out by default.
    1. Use documentation of protected fields to guide subclasses instead of trying to force access control with private attributes.
    1. Only consider using private attributes to avoid naming conflicts with subclasses that are out of your control.
    1. Inherit directly from Python's container types (like list or dict) for simple use cases.
    1. Beware of the large number of methods required to implement custom container types correctly.
    1. Have your custom container types inherit from the interface defined in collections.abc to ensure that your classes match required interfaces and behaviors.
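
A few of the class-related points above, sketched briefly. The class names (Grade, CountMissing, Temperature, ReadOnlyStack) and the sample values are hypothetical and chosen only for illustration.

```python
from collections import defaultdict, namedtuple
from collections.abc import Sequence

# A namedtuple is a lightweight, immutable data container.
Grade = namedtuple('Grade', ('score', 'weight'))


class CountMissing:
    """Stateful hook: __call__ lets an instance be used like a function."""

    def __init__(self):
        self.added = 0

    def __call__(self):
        self.added += 1
        return 0


class Temperature:
    """__init__ is the only constructor; @classmethod adds an alternative."""

    def __init__(self, celsius):
        self.celsius = celsius

    @classmethod
    def from_fahrenheit(cls, fahrenheit):
        return cls((fahrenheit - 32) * 5 / 9)


class ReadOnlyStack(Sequence):
    """Custom container: Sequence supplies index, count, iteration, and more
    once __getitem__ and __len__ are defined."""

    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, index):
        return self._items[index]

    def __len__(self):
        return len(self._items)


# The CountMissing instance is passed wherever a plain function is expected.
counter = CountMissing()
colors = defaultdict(counter, {'green': 12})
for key, amount in [('red', 5), ('green', 3)]:
    colors[key] += amount
```

After the loop, counter.added is 1 because only the 'red' key was missing. Forgetting __len__ or __getitem__ in ReadOnlyStack would make instantiation fail with TypeError, which is how collections.abc helps ensure the required interface is present.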

Chapter 4: Metaclasses and Attributes

    1. Define new class interfaces using simple public attributes, and avoid set and get methods.
    1. Use @property to define special behavior when attributes are accessed on your objects, if necessary.
    1. Follow the rule of least surprise and avoid weird side effects in your @property methods.
    1. Ensure that @property methods are fast; do slow or complex work using normal methods.
    1. Use @property to give existing instance attributes new functionality.
    1. Make incremental progress toward better data models by using @property.
    1. Consider refactoring a class and all call sites when you find yourself using @property too heavily.
    1. Reuse the behavior and validation of @property methods by defining your own descriptor classes.
    1. Use WeakKeyDictionary to ensure that your descriptor classes don't cause memory leaks.
    1. Don't get bogged down trying to understand exactly how __getattribute__ uses the descriptor protocol for getting and setting attributes.
    1. Use __getattr__ and __setattr__ to lazily load and save attributes for an object.
    1. Understand that __getattr__ only gets called once when accessing a missing attribute, whereas __getattribute__ gets called every time an attribute is accessed.
    1. Avoid infinite recursion in __getattribute__ and __setattr__ by using methods from super() (i.e., the object class) to access instance attributes directly.
    1. Use metaclasses to ensure that subclasses are well formed at the time they are defined, before objects of their type are constructed.
    1. Metaclasses have slightly different syntax in Python 2 vs. Python 3.
    1. The __new__ method of metaclasses is run after the class statement's entire body has been processed.
    1. Class registration is a helpful pattern for building modular Python programs.
    1. Metaclasses let you run registration code automatically each time your base class is subclassed in a program.
    1. Using metaclasses for class registration avoids errors by ensuring that you never miss a registration call.
    1. Metaclasses enable you to modify a class's attributes before the class is fully defined.
    1. Descriptors and metaclasses make a powerful combination for declarative behavior and runtime introspection.
    1. You can avoid both memory leaks and the weakref module by using metaclasses along with descriptors.
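
The attribute and metaclass points above can be sketched as below, using Python 3 syntax. All class names (Resistor, Grade, Exam, LazyRecord, ValidatePolygon, Polygon, Triangle) and the validation rules are assumptions made for the example.

```python
from weakref import WeakKeyDictionary


class Resistor:
    """Plain attribute access on the outside, validation via @property."""

    def __init__(self, ohms):
        self.ohms = ohms   # goes through the setter below

    @property
    def ohms(self):
        return self._ohms

    @ohms.setter
    def ohms(self, value):
        if value <= 0:
            raise ValueError('ohms must be > 0')
        self._ohms = value


class Grade:
    """Reusable descriptor; WeakKeyDictionary keeps per-instance state
    without holding the instances alive."""

    def __init__(self):
        self._values = WeakKeyDictionary()

    def __get__(self, instance, instance_type):
        if instance is None:
            return self
        return self._values.get(instance, 0)

    def __set__(self, instance, value):
        if not (0 <= value <= 100):
            raise ValueError('Grade must be between 0 and 100')
        self._values[instance] = value


class Exam:
    math_grade = Grade()
    writing_grade = Grade()


class LazyRecord:
    """__getattr__ runs only for missing attributes, so each name loads once."""

    def __init__(self):
        self.loaded = []

    def __getattr__(self, name):
        value = 'Value for ' + name
        self.loaded.append(name)
        setattr(self, name, value)   # later lookups find it in __dict__
        return value


class ValidatePolygon(type):
    """Metaclass that checks subclasses when they are defined."""

    def __new__(meta, name, bases, class_dict):
        cls = super().__new__(meta, name, bases, class_dict)
        if bases and cls.sides < 3:   # skip the abstract base class itself
            raise ValueError('Polygons need 3+ sides')
        return cls


class Polygon(metaclass=ValidatePolygon):
    sides = None


class Triangle(Polygon):
    sides = 3
```

Defining a subclass of Polygon with sides = 1 raises ValueError at class definition time, before any object of that type is constructed.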

Chapter 5: Concurrency and Parallelism

    1. Use the subprocess module to run child processes and manage their input and output streams.
    1. Child processes run in parallel with the Python interpreter, enabling you to maximize your CPU usage.
    1. Use the timeout parameter with communicate to avoid deadlocks and hanging child processes.
    1. Python threads can't run bytecode in parallel on multiple CPU cores because of the global interpreter lock (GIL).
    1. Python threads are still useful despite the GIL because they provide an easy way to do multiple things at seemingly the same time.
    1. Use Python threads to make multiple system calls in parallel. This allows you to do blocking I/O at the same time as computation.
    1. Even though Python has a global interpreter lock, you're still responsible for protecting against data races between the threads in your programs.
    1. Your programs will corrupt their data structures if you allow multiple threads to modify the same objects without locks.
    1. The Lock class in the threading built-in module is Python's standard mutual exclusion lock implementation.
    1. Pipelines are a great way to organize sequences of work that run concurrently using multiple Python threads.
    1. Be aware of the many problems in building concurrent pipelines: busy waiting, stopping workers, and memory explosion.
    1. The Queue class has all of the facilities you need to build robust pipelines: blocking operations, buffer sizes, and joining.
    1. Coroutines provide an efficient way to run tens of thousands of functions seemingly at the same time.
    1. Within a generator, the value of the yield expression will be whatever value was passed to the generator's send method from the exterior code.
    1. Coroutines give you a powerful tool for separating the core logic of your program from its interaction with the surrounding environment.
    1. Python 2 doesn't support yield from or returning values from generators.
    1. Moving CPU bottlenecks to C-extension modules can be an effective way to improve performance while maximizing your investment in Python code. However, the cost of doing so is high and may introduce bugs.
    1. The multiprocessing module provides powerful tools that can parallelize certain types of Python computation with minimal effort.
    1. The power of multiprocessing is best accessed through the concurrent.futures built-in module and its simple ProcessPoolExecutor class.
    1. The advanced parts of the multiprocessing module should be avoided because they are so complex.
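
The following sketch touches on child processes, locks, queues, and coroutines from the list above. The helper names (run_child, LockingCounter, count_in_threads, worker, run_pipeline, running_minimum) and the None sentinel convention are assumptions for illustration, and the child process is assumed to be another Python interpreter found via sys.executable.

```python
import subprocess
import sys
from queue import Queue
from threading import Lock, Thread


def run_child(code, timeout=10):
    """Run Python code in a child process; the timeout avoids hanging forever."""
    proc = subprocess.Popen([sys.executable, '-c', code],
                            stdout=subprocess.PIPE)
    try:
        out, _ = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        out, _ = proc.communicate()
    return out.decode('utf-8').strip()


class LockingCounter:
    """The GIL does not make += atomic, so a Lock guards the update."""

    def __init__(self):
        self.lock = Lock()
        self.count = 0

    def increment(self, offset):
        with self.lock:
            self.count += offset


def count_in_threads(num_threads, per_thread):
    counter = LockingCounter()

    def work():
        for _ in range(per_thread):
            counter.increment(1)

    threads = [Thread(target=work) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.count


def worker(func, in_queue, out_queue):
    """Pipeline stage: blocks on get (no busy waiting), stops on None."""
    while True:
        item = in_queue.get()
        try:
            if item is None:
                return
            out_queue.put(func(item))
        finally:
            in_queue.task_done()


def run_pipeline(func, items):
    in_queue = Queue(maxsize=2)   # bounded buffer limits memory use
    out_queue = Queue()
    thread = Thread(target=worker, args=(func, in_queue, out_queue))
    thread.start()
    for item in items:
        in_queue.put(item)
    in_queue.put(None)            # sentinel tells the worker to stop
    in_queue.join()               # wait until every item is processed
    thread.join()
    results = []
    while not out_queue.empty():
        results.append(out_queue.get())
    return results


def running_minimum():
    """Coroutine: each yield expression evaluates to the value passed to send."""
    current = yield
    while True:
        value = yield current
        current = min(value, current)
```

A coroutine must be advanced to its first yield with next before send can deliver a value; after that, sending 10, 4, and 22 to running_minimum() produces 10, 4, and 4.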

Chapter 6: Built-in Modules

    1. Decorators are Python syntax for allowing one function to modify another function at runtime.
    1. Using decorators can cause strange behaviors in tools that do introspection, such as debuggers.
    1. Use the wraps decorator from the functools built-in module when you define your own decorators to avoid any issues.
    1. The with statement allows you to reuse logic from try/finally blocks and reduce visual noise.
    1. The contextlib built-in module provides a contextmanager decorator that makes it easy to use your own functions in with statements.
    1. The value yielded by context managers is supplied to the as part of the with statement. It's useful for letting your code directly access the cause of the special context.
    1. The pickle built-in module is only useful for serializing and de-serializing objects between trusted programs.
    1. The pickle module may break down when used for more than trivial use cases.
    1. Use the copyreg built-in module with pickle to add missing attribute values, allow versioning of classes, and provide stable import paths.
    1. Avoid using the time module for translating between different time zones.
    1. Use the datetime built-in module along with the pytz module to reliably convert between times in different time zones.
    1. Always represent time in UTC and do conversions to local time as the final step before presentation.
    1. Use Python's built-in modules for algorithms and data structures.
    1. Don't re-implement this functionality yourself. It's hard to get right.
    1. Python has built-in types and classes in modules that can represent practically every type of numerical value.
    1. The Decimal class is ideal for situations that require high precision and exact rounding behavior, such as computations of monetary values.
    1. The Python Package Index (PyPI) contains a wealth of common packages that are built and maintained by the Python community.
    1. pip is the command-line tool to use for installing packages from PyPI.
    1. pip is installed by default in Python 3.4 and above; you must install it yourself for older versions.
    1. The majority of PyPI modules are free and open source software.
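
Some of the built-in modules mentioned above, in a short sketch. The names trace, fibonacci, and log_level and the sample values are illustrative. The standard-library timezone class with a fixed offset stands in for pytz here, which is an assumption made to keep the example dependency-free; a fixed offset does not handle daylight saving time.

```python
import logging
from contextlib import contextmanager
from datetime import datetime, timedelta, timezone
from decimal import Decimal, ROUND_UP
from functools import wraps


def trace(func):
    """Decorator that records calls; wraps preserves the name and docstring."""

    @wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        wrapper.calls.append((args, kwargs, result))
        return result

    wrapper.calls = []
    return wrapper


@trace
def fibonacci(n):
    """Return the n-th Fibonacci number."""
    if n in (0, 1):
        return n
    return fibonacci(n - 2) + fibonacci(n - 1)


@contextmanager
def log_level(level, name):
    """Temporarily change a logger's level; the yielded logger is the as target."""
    logger = logging.getLogger(name)
    old_level = logger.level
    logger.setLevel(level)
    try:
        yield logger
    finally:
        logger.setLevel(old_level)


# Decimal gives exact arithmetic and explicit rounding for monetary values.
rate = Decimal('1.45')
seconds = Decimal('222')
cost = (rate * seconds / Decimal('60')).quantize(
    Decimal('0.01'), rounding=ROUND_UP)

# Keep times in UTC and convert to local time only for presentation.
utc_time = datetime(2014, 8, 10, 18, 18, 30, tzinfo=timezone.utc)
fixed_offset = timezone(timedelta(hours=-7))   # assumed offset, not a real tz database entry
local_time = utc_time.astimezone(fixed_offset)
```

Without wraps, fibonacci.__name__ would report 'wrapper', which is the kind of introspection problem the list above warns about.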

Chapter 7: Collaboration

    1. Write documentation for every module, class and function using docstrings. Keep them up to date as your code changes.
    1. For modules: introduce the contents of the module and any important classes or functions all users should know about.
    1. For classes: document behavior, important attributes, and subclass behavior in the docstring following the class statement.
    1. For functions and methods: document every argument, returned value, raised exception, and other behaviors in the docstring following the def statement.
    1. Packages in Python are modules that contain other modules. Packages allow you to organize your code into separate, non-conflicting namespaces with unique absolute module names.
    1. Simple packages are defined by adding an __init__.py file to a directory that contains other source files. These files become the child modules of the directory's package. Package directories may also contain other packages.
    1. You can provide an explicit API for a module by listing its publicly visible names in its __all__ special attribute.
    1. You can hide a package's internal implementation by only importing public names in the package's __init__.py file or by naming internal-only members with a leading underscore.
    1. When collaborating within a single team or on a single codebase, using __all__ for explicit APIs is probably unnecessary.
    1. Defining root exceptions for your modules allows API consumers to insulate themselves from your API.
    1. Catching root exceptions can help you find bugs in code that consumes an API.
    1. Catching the Python Exception base class can help you find bugs in API implementations.
    1. Intermediate root exceptions let you add more specific types of exceptions in the future without breaking your API consumers.
    1. Circular dependencies happen when two modules must call into each other at import time. They can cause your program to crash at startup.
    1. The best way to break a circular dependency is refactoring mutual dependencies into a separate module at the bottom of the dependency tree.
    1. Dynamic imports are the simplest solution for breaking a circular dependency between modules while minimizing refactoring and complexity.
    1. Virtual environments allow you to use pip to install many different versions of the same package on the same machine without conflicts.
    1. Virtual environments are created with pyvenv, enabled with source bin/activate, and disabled with deactivate.
    1. You can dump all of the requirements of an environment with pip freeze. You can reproduce the environment by supplying the requirements.txt file to pip install -r.
    1. In versions of Python before 3.4, the pyvenv tool must be downloaded and installed separately. The command-line tool is called virtualenv instead of pyvenv.

Chapter 8: Production

    1. Programs often need to run in multiple deployment environments that each have unique assumptions and configurations.
    1. You can tailor a module's contents to different deployment environments by using normal Python statements in module scope.
    1. Module contents can be the product of any external condition, including host introspection through the sys and os modules.
    1. Calling print on built-in Python types will produce the human-readable string version of a value, which hides type information.
    1. Calling repr on built-in Python types will produce the printable string version of a value. These repr strings could be passed to the eval built-in function to get back the original value.
    1. %s in format strings will produce human-readable strings like str. %r will produce printable strings like repr.
    1. You can define the __repr__ method to customize the printable representation of a class and provide more detailed debugging information.
    1. You can reach into any object's __dict__ attribute to view its internals.
    1. The only way to have confidence in a Python program is to write tests.
    1. The unittest built-in module provides most of the facilities you'll need to write good tests.
    1. You can define tests by subclassing TestCase and defining one method per behavior you'd like to test. Test methods on TestCase classes must start with the word test.
    1. It's important to write both unit tests (for isolated functionality) and integration tests (for modules that interact).
    1. You can initiate the Python interactive debugger at a point of interest directly in your program with the import pdb; pdb.set_trace() statements.
    1. The Python debugger prompt is a full Python shell that lets you inspect and modify the state of a running program.
    1. pdb shell commands let you precisely control program execution, allowing you to alternate between inspecting program state and progressing program execution.
    1. It's important to profile Python programs before optimizing because the source of slowdowns is often obscure.
    1. Use the cProfile module instead of the profile module because it provides more accurate profiling information.
    1. The Profile object's runcall method provides everything you need to profile a tree of function calls in isolation.
    1. The Stats object lets you select and print the subset of profiling information you need to see to understand your program's performance.
    1. It can be difficult to understand how Python programs use and leak memory.
    1. The gc module can help you understand which objects exist, but it has no information about how they were allocated.
    1. The tracemalloc built-in module provides powerful tools for understanding the source of memory usage.
    1. tracemalloc is only available in Python 3.4 and above.
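
The debugging, testing, and profiling tools above can be combined in a small sketch. The Point class, insertion_sort, and the helper functions run_tests, profile_sort, and top_allocations are hypothetical names introduced for the example.

```python
import io
import tracemalloc
import unittest
from cProfile import Profile
from pstats import Stats


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # %r keeps type information (quotes around strings, for instance).
        return 'Point(%r, %r)' % (self.x, self.y)


def insertion_sort(data):
    result = []
    for value in data:
        index = 0
        while index < len(result) and result[index] <= value:
            index += 1
        result.insert(index, value)
    return result


class SortTestCase(unittest.TestCase):
    def test_sorts_values(self):   # test methods must start with "test"
        self.assertEqual([1, 2, 3], insertion_sort([3, 1, 2]))


def run_tests():
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SortTestCase)
    return unittest.TextTestRunner(stream=io.StringIO()).run(suite)


def profile_sort(data):
    """Profile one call tree in isolation and return the printed report."""
    profiler = Profile()
    profiler.runcall(insertion_sort, data)
    stream = io.StringIO()
    stats = Stats(profiler, stream=stream)
    stats.strip_dirs().sort_stats('cumulative').print_stats()
    return stream.getvalue()


def top_allocations(func, limit=3):
    """Compare memory snapshots taken before and after calling func."""
    tracemalloc.start(10)
    before = tracemalloc.take_snapshot()
    keep = func()   # hold a reference so the allocations stay alive
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    return after.compare_to(before, 'lineno')[:limit]
```

Each entry returned by top_allocations reports the source line responsible for the allocation, which is the information the gc module alone cannot provide.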