# New(er) Patterns in (Scientific) Python

Author: Nathaniel Starkman (MIT, starkman@mit.edu)

This isn't a Python tutorial. It is, however, a time of exciting change in the Python ecosystem, and you might not be familiar with some (relatively) recent developments.

## Setup

If you already have an isolated Python environment you'd like to work in, that's fine.
If not, here's how to set up a `uv` environment, install dependencies, and connect the environment to your Jupyter.

1. Install `uv`: https://docs.astral.sh/uv/#installation

2. Create the environment and install the dependencies:
```bash
uv sync
```

## Type Annotations

Type annotations record the types a function expects for its arguments and the type it returns.

They are useful for:

1. Understanding a function
2. Static analysis of that function
3. Hooking into IDEs
4. Catching errors
5. Compiling your Python for speed! See [mypyc](https://mypyc.readthedocs.io/en/latest/index.html), [codon](https://docs.exaloop.io/codon), [nuitka](https://nuitka.net/pages/overview.html#now-vs-future-or-the-plan)

In [1]:
def func(x: int) -> int:
    return x + 1
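
As a minimal sketch of the points above (the function name `add_one` is illustrative, not from the original notebook): annotations are stored on the function and can be inspected, but Python does not enforce them at runtime. Catching the mismatch is the job of a static checker such as mypy.

```python
from typing import get_type_hints


def add_one(x: int) -> int:
    """Return x plus one."""
    return x + 1


# The annotations are stored on the function and can be inspected at runtime.
hints = get_type_hints(add_one)
print(hints)

# Python itself does not check them: a float is accepted without complaint.
# A static type checker would flag this call instead.
result = add_one(1.5)
print(result)
```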

Type annotations are now important in many places in Python.
For example...

## The Dataclass

Source modified from https://docs.python.org/3/library/dataclasses.html

In [2]:
from dataclasses import dataclass, KW_ONLY, field
import astropy.units as u

When should you use a dataclass? Pretty much any time you have a class
that is defined by its attributes.

For example...

In [3]:
@dataclass
class StarCluster:
    """Class for average properties of a star cluster."""
    num_stars: int
    radius: u.Quantity
    velocity_dispersion: u.Quantity
    age: u.Quantity

    _: KW_ONLY
    name: str | None = field(
        default=None,  # name is optional, default to None
        compare=False,  # don't use `name` in `__eq__` checks
        hash=False  # don't use `name` in hash calculations
    )

    @property
    def number_density(self):
        return self.num_stars / (4/3 * 3.14159 * self.radius**3)

The `@dataclass` decorator generates the special methods for you: `__init__`, `__repr__`, `__eq__`, etc.

In [4]:
StarCluster?

Init signature:
StarCluster(
    num_stars: int,
    radius: astropy.units.quantity.Quantity,
    velocity_dispersion: astropy.units.quantity.Quantity,
    age: astropy.units.quantity.Quantity,
    *,
    name: str | None = None,
) -> None
Docstring:      Class for average properties of a star cluster.
Type:           type
Subclasses:

In [5]:
StarCluster.__repr__?

Signature: StarCluster.__repr__(self)
Docstring: Return repr(self).
File:      Dynamically generated function. No source code available.
Type:      function

In [6]:
StarCluster.__eq__?

Signature: StarCluster.__eq__(self, other)
Docstring: Return self==value.
File:      Dynamically generated function. No source code available.
Type:      function

In [7]:
pal5 = StarCluster(
    num_stars=500,
    radius=u.Quantity(100, "pc"),
    velocity_dispersion=u.Quantity(5, "km/s"),
    age=u.Quantity(12, "Gyr"),
    name="Palomar 5",
)
pal5

StarCluster(num_stars=500, radius=<Quantity 100. pc>, velocity_dispersion=<Quantity 5. km / s>, age=<Quantity 12. Gyr>, name='Palomar 5')

In [8]:
pal5.number_density

<Quantity 0.00011937 1 / pc3>

In [9]:
tuc47 = StarCluster(
    1000, u.Quantity(50, "pc"), u.Quantity(3, "km/s"), u.Quantity(10, "Gyr"),
    name="47 Tucanae",
)

In [10]:
tuc47 == pal5

False

Besides convenience, dataclasses offer a few advantages:

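1. Consistently structured. This means that tools can easily understand and manipulate dataclasses. We don't need bespoke tooling.
2. Statically defined. The fields are declared in the class body, so type checkers and IDEs know them without running the code.
3. Easily introspected. Functions such as `fields`, `asdict`, and `replace` work on any dataclass.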

In [11]:
from dataclasses import asdict, replace, fields

In [12]:
fields(pal5)

(Field(name='num_stars',type=<class 'int'>,default=<dataclasses._MISSING_TYPE object at 0x1015ccce0>,default_factory=<dataclasses._MISSING_TYPE object at 0x1015ccce0>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD),
 Field(name='radius',type=<class 'astropy.units.quantity.Quantity'>,default=<dataclasses._MISSING_TYPE object at 0x1015ccce0>,default_factory=<dataclasses._MISSING_TYPE object at 0x1015ccce0>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD),
 Field(name='velocity_dispersion',type=<class 'astropy.units.quantity.Quantity'>,default=<dataclasses._MISSING_TYPE object at 0x1015ccce0>,default_factory=<dataclasses._MISSING_TYPE object at 0x1015ccce0>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),kw_only=False,_field_type=_FIELD),
 Field(name='age',type=<class 'astropy.units.quantity.Quantity'>,default=<dataclasses._MISSING_TYPE object at 0x1015ccce0>,d

In [13]:
# With asdict, we can convert the dataclass to a dictionary!
asdict(pal5)

{'num_stars': 500,
 'radius': <Quantity 100. pc>,
 'velocity_dispersion': <Quantity 5. km / s>,
 'age': <Quantity 12. Gyr>,
 'name': 'Palomar 5'}

In [14]:
# With replace, we can create a new instance of the dataclass with ANY fields
# changed! Here, we change the number of stars to 100,000.
replace(pal5, num_stars=int(1e5))

StarCluster(num_stars=100000, radius=<Quantity 100. pc>, velocity_dispersion=<Quantity 5. km / s>, age=<Quantity 12. Gyr>, name='Palomar 5')

In [15]:
# Quick test of equality under replacement
pal5 == replace(pal5), pal5 == replace(pal5, num_stars=1_000)

(True, False)
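
Because every dataclass has the same structure, one generic helper can serve all of them. The sketch below is illustrative only: `field_summary` and `Point` are hypothetical names, not part of the notebook or of any library.

```python
from dataclasses import dataclass, fields, is_dataclass


def field_summary(obj) -> dict[str, str]:
    """Map each field name of a dataclass instance to the type name of its value."""
    # `is_dataclass` is also true for dataclass *classes*, so exclude those.
    if not is_dataclass(obj) or isinstance(obj, type):
        raise TypeError("expected a dataclass instance")
    return {f.name: type(getattr(obj, f.name)).__name__ for f in fields(obj)}


@dataclass
class Point:
    """Hypothetical example class."""

    x: int
    y: float


summary = field_summary(Point(1, 2.0))
print(summary)
```

The helper never mentions `Point`, so it works unchanged on any other dataclass instance, `StarCluster` included.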

## Abstract/Final Pattern

Adapted from https://docs.kidger.site/equinox/

This pattern is enforced by the language in Julia and has been gaining popularity in the Python ML community.

It produces clean, readable code with class hierarchies that are easy to understand.


<div class="alert alert-block alert-info">
Every class must be either: <br>

(a) abstract (it can be subclassed, but not instantiated); or <br>
(b) final (it can be instantiated, but not subclassed).
</div>

![](figures/abstract_final.jpg)

In [16]:
from abc import ABCMeta, abstractmethod
from typing import final

In [17]:
# This class is abstract and cannot be instantiated

class AbstractClass(metaclass=ABCMeta):
    """Abstract base class."""

    @abstractmethod
    def method(self):
        pass

In [18]:
# This class is concrete and can be instantiated

@final
class ConcreteClass(AbstractClass):
    """A concrete subclass of AbstractClass."""

    def method(self):
        return "Beautiful is better than ugly."


In [19]:
# This class is abstract and cannot be instantiated.
# But its subclasses are concrete and can be instantiated.

class AbstractSubClass(AbstractClass):
    """An abstract subclass of AbstractClass."""

    @abstractmethod
    def another_method(self):
        pass


@final
class ConcreteSubClass1(AbstractSubClass):
    """A concrete subclass of AbstractSubClass."""

    def method(self):
        return "Explicit is better than implicit."

    def another_method(self):
        return "Simple is better than complex."


@final
class ConcreteSubClass2(AbstractSubClass):
    """Another concrete subclass of AbstractSubClass."""

    def method(self):
        return "Complex is better than complicated."

    def another_method(self):
        return "Flat is better than nested."

In [20]:
ConcreteSubClass1().another_method()

'Simple is better than complex.'
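
The two halves of the rule are enforced differently in Python. The sketch below, with hypothetical `AbstractShape` and `Square` classes, shows that instantiating an abstract class raises `TypeError` at runtime, whereas `@final` is a hint for static type checkers: subclassing a final class is reported by the checker but not prevented by the interpreter.

```python
from abc import ABCMeta, abstractmethod
from typing import final


class AbstractShape(metaclass=ABCMeta):
    """Hypothetical abstract base class."""

    @abstractmethod
    def area(self) -> float:
        """Return the area of the shape."""


@final
class Square(AbstractShape):
    """Hypothetical final (concrete) class."""

    def __init__(self, side: float) -> None:
        self.side = side

    def area(self) -> float:
        return self.side**2


# Abstract classes are rejected at runtime.
try:
    AbstractShape()
    abstract_rejected = False
except TypeError:
    abstract_rejected = True
print(abstract_rejected)


# A type checker flags this subclass, but the interpreter accepts it.
class SubSquare(Square):
    pass


area = SubSquare(2.0).area()
print(area)
```

In practice this means the pattern relies on running a type checker (or on a library that enforces it) to catch subclasses of final classes.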

## Array API

See https://data-apis.org/array-api/

There's a lot to say about the Array API.

Here are two points:

1. It's unifying the numerical Python API
2. It enables a lot of interoperability

<img src="figures/array_api.png" width="600">

<img src="figures/array_api_scipy_support.png" width="600">

Unifying the API does mean some function names are changing! For example, NumPy 2.0 added `np.concat`, the Array API name, as an alias of `np.concatenate`.

In [21]:
import numpy as np

np.concat

<function concatenate at 0x108007d70>
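
As a rough sketch of the interoperability point, a function can ask its input for the array namespace instead of hard-coding NumPy. The helper `rms` below is hypothetical. It assumes the input either implements `__array_namespace__` (as NumPy 2.0+ arrays do) or is a NumPy array, in which case it falls back to `numpy` itself.

```python
import numpy as np


def rms(x):
    """Root mean square, written against the Array API namespace of `x`."""
    # Assumption: fall back to NumPy when the array lacks `__array_namespace__`
    # (e.g. NumPy 1.x arrays).
    xp = x.__array_namespace__() if hasattr(x, "__array_namespace__") else np
    return xp.sqrt(xp.mean(x * x))


value = float(rms(np.asarray([3.0, 4.0])))
print(value)
```

Only `sqrt` and `mean` are used from the namespace, and both are part of the Array API standard, so the same function should accept arrays from other conforming libraries.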

## Recap

1. Type Annotations
2. Dataclasses
3. Abstract / Final
4. Array API