Object creation patterns in Python: Static factory methods

Posted on Tue 21 May 2024 in Python

The basic way to create and initialize an object in Python is using the default constructor, defined as the __init__() method:

class MyClass:
    def __init__(self, x: float) -> None:
        self.x: float = x
        print(f"created an object with x = {x}")


obj = MyClass(5)

Usually, initial values of instance variables are specified using arguments to __init__(), and assigned to the variables. The body of __init__() may also contain other steps, such as precomputing other other variables based on these instance attributes, or, as in this example, logging the creation of an object. (Resource allocation such as opening files should generally not be done in an initializer, but in a context manager, so that client code doesn't have to take care of deallocation.)

Dataclasses and kwargs

When a class has many instance variables, it is common to use keyword arguments when initializing them, which makes the meaning of each argument more explicit and allows arguments to be reordered later. Furthermore, such classes are often good candidates for making into dataclasses, since the @dataclass decorator automatically creates an __init__() method with the necessary arguments and that assigns them to the instance variables:

@dataclass
class Person:
    # types are not enforced at runtime, but are checked by type
    # checkers
    name: str
    age: int

    # More complex types, including recursive types, are allowed. Use
    # `field`
    occupation: Optional[str] = field(default=None)
    best_friend: "Person" = field(default=None)

    # an __init___ method is automatically generated, but you can still
    # create a __post_init__ method for additional behaviour like
    # logging

    def __post_init__(self):
        print(f"Created new Person: {self}")


# Use kwargs when creating instances
p = Person(name="Sir Robin", age=32, occupation="knight")

As well as automatically providing the __init__() method, dataclasses also automatically provide methods such as __repr__() and __eq__(), further reducing the amount of boilerplate code and allowing you to use a more declarative style of programming that focuses on the structure of the class (and makes it easier to add or remove fields witout needing to worry about updating all these methods).

While using dataclasses makes it easy to create an initializer even for objects with many fields, there are nonetheless some limitations of creating objects using the basic syntax Person(...). To provide more expressive ways to create objects, various object creation patterns have been developed over the years, and are now standard fare in large projects, including the standard library itself.

The rest of this post will discuss the static factory methods pattern, which allows you to provide multiple, distinct ways to instantiate a class without having a complicated __init__() method. In a future post I shall discuss the builder pattern, which lets you gradually build up a complex object by passing around the information needed to create it rather than requiring all the information in one go.

The static factory method

Goal: Provide multiple, distinct ways of instantiating a class.

Explicit is better than implicit. The Zen of Python

In many cases you want to provide multiple ways to obtain instances of a class. To do that, you can provide static factory methods in your class: these are class methods that create and return instances. (The name comes from Java, which does not distinguish between class methods and static methods.)

A classic example is a Point2D class that represents a point in the plane. It is desired to be able to create instances of Point2D by specifying either the point's Cartesian coordinates or its polar coordinates. To achieve this, we define static factory methods from_cartesians() and from_polars() that return instances of Point2D:

import math


@dataclass
class Point2D:
    x: float
    y: float

    # __init__(self, x, y) is automatically generated

    @classmethod
    def from_cartesians(cls, x: float, y: float) -> "Point2D":
        return cls(x=x, y=y)

    @classmethod
    def from_polars(cls, r: float, theta: float) -> "Point2D":
        if r < 0:
            raise ValueError("Radial coordinate must be nonnegative")
        return cls(x=r * math.cos(theta), y=r * math.sin(theta))

    
p: Point2D = Point2D.from_polars(r=4, theta=0.5 * math.pi)

Although the __init__() method (automatically created by @dataclass) already gives us a way of instantiating a Point2D through its Cartesian coordinates, we nonetheless provide a from_cartesians() method. There are two major benefits to using this method oer the basic initializer:

It is more explicit about what the input numbers mean.
If it is desired to store the polar coordinates, rather than the Cartesian coordinates, as fields, the class methods can be updated without affecting client code; whereas instantiations with Point2D(...) would need to be updated. (This is referred to as "invariance under refactoring" and allows developers to work on different parts of a codebase without hugely affecting each other.)

For contrast, this is what the class might look like if we wanted to allow the user to specify either Cartesian or polar coordinates when creating a Point2D, all in the __init__() method:

@dataclass(init=False)  # we provide our own __init__() method
class Point2D:
    x: float
    y: float

    def __init__(
        self, 
        x: Optional[float] = None, 
        y: Optional[float] = None, 
        r: Optional[float] = None, 
        theta: Optional[float] = None
    ):
        using_cartesians = x is not None and y is not None
        using_polars = r is not None and theta is not None
        if using_cartesians and not using_polars:
            self.x, self.y = x, y
            return
        if using_polars and not using_cartesians:
            if r < 0:
              raise ValueError("Radial coordinate must be nonnegative")
            self.x = r * math.cos(theta)
            self.y = r * math.sin(theta)
            return
        
        raise ValueError("Either specify x and y or specify r and theta, but not both.")


p: Point2D = Point2D(r=4, theta=0.5 * math.pi)

This is clearly more complex and less expressive.

Static factory methods are a common pattern in object oriented languages. Python doesn't have method overloading (unlike C/++ and Java) and so it is not possible to define multiple __init__() methods with different signatures. But even in languages like Java it is often recommended to use static factory methods instead of overloaded constructors, for the expressiveness. (Indeed, this is the very first item in Joshua Bloch's Effective Java.)

Abstract base classes

The real power of static factory methods comes when working with abstract classes. While (by definition) you cannot directly instantiate an abstract class, you can emulate such an instantiation providing a static factory method on the abstract base class that chooses an appropriate subclass and produces an instance of that class.

A typical use case is to provide the user with a simplified interface while internally having several underlying implementations. Consider the following interface, which declares a single method for calculating the integral of a function between two points:

import abc


class Integrator(abc.ABC):
    @abc.abstractmethod
    def integrate(self, f: Callable[[float], float], a: float, b: float) -> float:
        """Calculate the integral of the function f between a and b.""" 
        raise NotImplementedError

The subclasses implement different methods of integration:

class EulerIntegrator(Integrator):
    def __init__(self, n: int = 10):
        self.n = n
        
    def integrate(self, f, a, b):
        # Even for such an inaccurate scheme, this is a terrible 
        # implementation! It is horrendously inefficient and doesn't
        # properly handle floating point errors. This is just for
        # illustration purposes.
        dx = (b - a) / self.n
        x = a
        s = 0
        while x < b:
            s += f(x) * dx
            x += dx
        return s
    
class AdamsBashforthIntegrator(Integrator): ...  # similarly
class TrapeziumIntegrator(Integrator): ...  # similarly
class RK4Integrator(Integrator): ...  # similarly

The appropriate choice of integration method depends on the function. For example, the Adams–Bashorth integrator has a lower order of accuracy than RK4, but performs better against stiff equations.

While the client may wish to choose an integrator for themself, as the developer of the Integrator class one may wish to provide the user a shortcut for common cases without requiring them to know the details of the particular implementation. To achieve this, the Integrator class could provide a static factory method that returns an appropriate instance:

class Integrator(abc.ABC):
    @classmethod
    def create(cls, stiff: bool = False) -> "Integrator":
        if stiff:
            return AdamsBashforthIntegrator(...)  # with suitable params
        else:
              return RK4Integrator(...)  # ditto

with example usage:

integrator = Integrator.create(stiff=True)
print(integrator.integrate(lambda x: x ** 2, 1, 2))

Other numerical operations that can be calculated using different algorithms are similarly candidates for such a structure.

Caveat and conclusion

The examples above are smallish classes that nonetheless benefit from having multiple ways to create them. With that said, a class that has very different behaviours depending on how it is initialized may be too complicated, and be a candidate for refactoring into smaller classes each with fewer responsibilities.

As mentioned above, the static factory method pattern is commonly used not only in Python but also in other languages. The benefits are the same: using static factory methods (a) allows the user to more explicitly state how an object should be produced from input data (Point2D.from_cartesians(3, 4) versus Point2D(3, 4)), and (b) allows the class to rework what data it stores internally or which implementation of an abstract class to provide.