The price of dot

Posted on Wed 15 April 2026 in Software

Here are a few different ways to do the same thing.

x = [1, 2, 3, 4]

# Option 1
x[0]

# Option 2
x.__getitem__(0)

# Option 3
from operator import getitem
getitem(x, 0)

# Option 4
import operator
operator.getitem(x, 0)

Functionally, they are meant to be equivalent: given a sequence x that supports __getitem__, retrieve the 0th item (and do nothing with it). In this case x is a list, a built-in type, so one would expect all of these operations to be very fast.

However, an experiment with %timeit shows that they are not at all equivalent in practice. Here's a run on Python 3.14:

ipython
Python 3.14.2 (v3.14.2:df793163d58, Dec  5 2025, 12:18:06) [Clang 16.0.0 (clang-1600.0.26.6)]
Type 'copyright', 'credits' or 'license' for more information
IPython 9.9.0 -- An enhanced Interactive Python. Type '?' for help.
Tip: You can use `files = !ls *.png`

[ins] In [1]: x = [1, 2, 3, 4]

[ins] In [2]: %timeit x[0]
15.1 ns ± 0.122 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)

[ins] In [3]: %timeit x.__getitem__(0)
20.9 ns ± 0.0328 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

[ins] In [4]: from operator import getitem

[ins] In [5]: %timeit getitem(x, 0)
21.1 ns ± 0.145 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

[ins] In [6]: %timeit getitem(x, 0)
22.2 ns ± 1.01 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

[ins] In [7]: import operator

[ins] In [8]: %timeit operator.getitem(x, 0)
24.7 ns ± 0.136 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

[ins] In [9]: %timeit operator.getitem(x, 0)
24.5 ns ± 0.145 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

Subscripting directly with x[0] (Option 1) is the fastest of all; in this run the other options take roughly 40% to 65% longer.¹
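For anyone who wants to repeat the comparison outside IPython, here is a minimal sketch using the standard library's timeit module. The helper name best_ns_per_loop and the loop counts are my own choices for illustration. It reports the fastest of several runs rather than the mean ± std. dev. that %timeit prints, so the figures will not match the transcript exactly, and they will vary by machine and Python version.

```python
import operator
import timeit
from operator import getitem

x = [1, 2, 3, 4]

# The four spellings from the top of the post, as source strings for timeit.
STATEMENTS = [
    "x[0]",
    "x.__getitem__(0)",
    "getitem(x, 0)",
    "operator.getitem(x, 0)",
]


def best_ns_per_loop(stmt, number=1_000_000, repeat=5):
    """Return the fastest of `repeat` timing runs, in nanoseconds per loop."""
    # timeit evaluates the statement with these names available as globals.
    namespace = {"x": x, "getitem": getitem, "operator": operator}
    runs = timeit.repeat(stmt, globals=namespace, number=number, repeat=repeat)
    return min(runs) / number * 1e9


if __name__ == "__main__":
    for stmt in STATEMENTS:
        print(f"{stmt:<24} {best_ns_per_loop(stmt):6.1f} ns")
```

Note that timeit wraps each statement in a function, so the names are looked up as globals of that function rather than at module level; the absolute numbers may therefore differ a little from an interactive session.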

The bytecode

A look at the bytecode (thanks Matt Godbolt) illustrates the difference between the four options.²

Option 1

x[0]

LOAD_NAME       2 (x)
LOAD_SMALL_INT  0
BINARY_OP      26 ([])
POP_TOP

This is a single BINARY_OP instruction, with no attribute lookup and no call. As such it is the fastest of them all.

Option 2

x.__getitem__(0)

LOAD_NAME       2 (x)
LOAD_ATTR       7 (__getitem__ + NULL|self)
LOAD_SMALL_INT  0
CALL            1
POP_TOP

This requires the additional step of LOAD_ATTR to dereference the .__getitem__ attribute (method), and a CALL. The actual work is hidden inside this CALL.

Option 3

# from operator import getitem
getitem(x, 0)

LOAD_NAME       1 (getitem)
PUSH_NULL
LOAD_NAME       2 (x)
LOAD_SMALL_INT  0
CALL            2
POP_TOP

Option 4

# import operator
operator.getitem(x, 0)

LOAD_NAME       0 (operator)
LOAD_ATTR       2 (getitem)
PUSH_NULL
LOAD_NAME       2 (x)
LOAD_SMALL_INT  0
CALL            2
POP_TOP

Compared to Option 3, Option 4 loads the operator module with LOAD_NAME (instead of loading getitem directly) and then needs an extra LOAD_ATTR to look up the .getitem attribute. This additional step adds a few nanoseconds.
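The listings above can also be reproduced locally with the standard library's dis module instead of Compiler Explorer. This is a minimal sketch; the helper name opnames is hypothetical. Opcode names differ between Python versions, so expect the output to match the listings above only on 3.14.

```python
import dis

SNIPPETS = [
    "x[0]",
    "x.__getitem__(0)",
    "getitem(x, 0)",
    "operator.getitem(x, 0)",
]


def opnames(source):
    """Compile `source` as module-level code and return its opcode names."""
    # Compiling does not execute anything, so x, getitem and operator
    # do not need to exist here.
    code = compile(source, "<snippet>", "exec")
    return [ins.opname for ins in dis.get_instructions(code)]


if __name__ == "__main__":
    for source in SNIPPETS:
        print(f"{source}: {' '.join(opnames(source))}")
```

The output includes a few instructions that are not part of the expression itself (such as the implicit return of None at the end of the module), which can be ignored when comparing the options.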

It was worse in Python 3.10

In Python 3.10, the bytecode has the same overall shape (although some opcodes have different names), but Option 4 (operator.getitem) took about 60% longer than Options 2 and 3.

Python 3.10.9 (main, Dec 15 2022, 10:44:50) [Clang 14.0.0 (clang-1400.0.29.202)]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.10.0 -- An enhanced Interactive Python. Type '?' for help.

[ins] In [1]: x = [1, 2, 3, 4]

[ins] In [2]: %timeit x[0]
15.5 ns ± 0.0276 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)

[ins] In [3]: %timeit x.__getitem__(0)
24.8 ns ± 0.149 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

[ins] In [4]: from operator import getitem

[ins] In [5]: %timeit getitem(x, 0)
25.1 ns ± 0.0492 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

[ins] In [6]: import operator

[ins] In [7]: %timeit operator.getitem(x, 0)
39.6 ns ± 0.157 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

The lessons

Dot lookups have a hidden cost.

Use primitive operations when possible.

Imports

"From" imports (from ... import ...) have a small advantage over importing a module and then referencing the names of members of the module, for example:

[ins] In [1]: import numpy as np

[ins] In [2]: x = np.array([3, 4])

[ins] In [3]: %timeit np.linalg.norm(x)
1.19 μs ± 3.81 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

[ins] In [4]: from numpy.linalg import norm

[ins] In [5]: %timeit norm(x)
1.15 μs ± 3.57 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

The same saving can be had by binding the function to a shorter name:

[ins] In [6]: n = np.linalg.norm

[ins] In [7]: %timeit n(x)
1.16 μs ± 3.63 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
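The alias trick is most commonly seen inside hot loops, where a dotted name would otherwise be looked up on every iteration. Here is a small sketch of the pattern using only the standard library; the function names norms_dotted and norms_aliased are made up for illustration, and math.sqrt stands in for np.linalg.norm so that the example needs no third-party packages.

```python
import math


def norms_dotted(points):
    """Look up math.sqrt and out.append through a dot on every iteration."""
    out = []
    for px, py in points:
        out.append(math.sqrt(px * px + py * py))
    return out


def norms_aliased(points):
    """Bind the dotted names to local names once, before the loop."""
    sqrt = math.sqrt
    out = []
    append = out.append
    for px, py in points:
        append(sqrt(px * px + py * py))
    return out


if __name__ == "__main__":
    points = [(3, 4), (5, 12), (8, 15)]
    # Both versions compute the same values; only the name lookups differ.
    assert norms_dotted(points) == norms_aliased(points)
    print(norms_aliased(points))  # [5.0, 13.0, 17.0]
```

Whether the aliased version is measurably faster depends on the Python version and on how much work each iteration does, so it is worth timing before trading away readability.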

However, the difference is small, and it becomes negligible as the cost of the operation itself grows:

[ins] In [15]: x = np.arange(1000)

[ins] In [16]: %timeit np.linalg.norm(x)
3.12 μs ± 231 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

[ins] In [17]: %timeit norm(x)
3.08 μs ± 28.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

[ins] In [18]: x = np.arange(1_000_000)

[ins] In [19]: %timeit np.linalg.norm(x)
726 μs ± 30 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

[ins] In [20]: %timeit norm(x)
708 μs ± 16.5 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

¹ I ran Options 3 and 4 twice to make sure that %timeit is measuring only the time taken for the operation, and not including any time for performing the import or for using an imported name for the first time. Lazy imports are planned for Python 3.15, but are not available yet.
² Note that the last two lines of bytecode in the Compiler Explorer output can be ignored; they are there because Compiler Explorer treats the code as the body of a function, which implicitly returns None.