All about dataclasses in Python
The `@dataclass` decorator appeared in Python 3.7 to address a clear need: structuring data simply, without writing repetitive code.
It's a native and elegant way to create readable, concise, annotated and powerful classes, while significantly reducing the amount of code to write (more on that right after). Beginners and experienced developers alike appreciate it.
When applied to a class, Python automatically generates several special methods, such as:
- `__init__()` for attribute initialization;
- `__repr__()` for a readable representation;
- `__eq__()` for object comparison;
- and, if requested (with `order=True`), ordering methods such as `__lt__()`, `__le__()`, etc.
This allows you to focus on the data, while having a fully functional object.
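Here's a minimal sketch showing the three generated methods at work; the `Point` class is a hypothetical example, not one from the article:

```python
from dataclasses import dataclass

@dataclass
class Point:
    # Hypothetical example class: both annotated fields become __init__() parameters
    x: int
    y: int

p1 = Point(1, 2)  # generated __init__()
p2 = Point(1, 2)

print(repr(p1))   # generated __repr__(): Point(x=1, y=2)
print(p1 == p2)   # generated __eq__() compares field values: True
print(p1 is p2)   # still two distinct objects: False
```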
Compared with `namedtuple`, dataclasses are more flexible: they are mutable by default (or immutable with `frozen=True`), rely natively on type annotations, and are ordinary classes that can hold any custom method.
Basic syntax and example
To use a dataclass, simply add the `@dataclass` decorator above a class definition to activate its features.
Here's a small example to illustrate its power:
```python
from dataclasses import dataclass

@dataclass
class Product:
    name: str
    price: float

item = Product("Keyboard", 49.99)
print(item.name)   # Keyboard
print(item.price)  # 49.99
print(item)        # Product(name='Keyboard', price=49.99)
```
In just four lines, we get an annotated class with an automatic constructor and a readable representation. Magical, isn't it? 😋
No need to write `__init__()` or `__repr__()` by hand: `@dataclass` generates them from the annotated fields.
Key parameters of the @dataclass decorator
The `@dataclass` decorator accepts optional parameters to adjust its behavior:

| Parameter | Description |
|---|---|
| `init=True` | Generates the `__init__()` method |
| `repr=True` | Generates `__repr__()` for a readable representation |
| `eq=True` | Generates `__eq__()` to compare instances field by field |
| `order=False` | If `True`, generates `__lt__()`, `__le__()`, `__gt__()` and `__ge__()` |
| `frozen=False` | If `True`, makes instances immutable (assigning to a field raises an error) |
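To illustrate one of these switches, here's a short sketch (with hypothetical `Sensor` and `Tag` classes) of what `eq=False` changes: without a generated `__eq__()`, `==` falls back to the default identity comparison inherited from `object`.

```python
from dataclasses import dataclass

@dataclass(eq=False)
class Sensor:
    # Hypothetical class: eq=False, so no __eq__() is generated
    name: str

@dataclass
class Tag:
    # Hypothetical class: default eq=True
    name: str

# Identity comparison: two distinct objects are never equal
print(Sensor("t1") == Sensor("t1"))  # False
# Generated __eq__(): field values are compared
print(Tag("t1") == Tag("t1"))        # True
```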
Default values and optional fields
With dataclasses, you can easily give certain fields a default value, just like with function parameters:
```python
from dataclasses import dataclass

@dataclass
class Article:
    name: str
    price: float = 0.0
```
In this example, `Article("USB Cable")` creates an article whose price is automatically `0.0`. As with function parameters, fields with a default value must come after fields without one.
The `dataclasses` module also provides the `field()` function to handle more advanced cases:
```python
from dataclasses import dataclass, field

@dataclass
class Order:
    items: list = field(default_factory=list)
```
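Here, `default_factory` calls `list()` for each new instance, which avoids the classic pitfall of a mutable value (list, dictionary, set) shared between all instances. In fact, `@dataclass` refuses a bare `[]`, `{}` or `set()` as a default value and raises a `ValueError`.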
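The sketch below, reusing the `Order` class, shows that each instance gets its own list, and that a bare `[]` default is rejected when the class is created (`BadOrder` is a hypothetical name chosen for the demonstration):

```python
from dataclasses import dataclass, field

@dataclass
class Order:
    # default_factory=list calls list() for each new instance
    items: list = field(default_factory=list)

a = Order()
b = Order()
a.items.append("Keyboard")
print(a.items)  # ['Keyboard']
print(b.items)  # []: b has its own, separate list

rejected = False
try:
    @dataclass
    class BadOrder:  # hypothetical class with a bare mutable default
        items: list = []
except ValueError as exc:
    # Raised by @dataclass while building the class
    rejected = True
    print(f"ValueError: {exc}")
```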
Making a dataclass immutable (frozen=True)
When we add `frozen=True`, the object becomes immutable: its attributes can no longer be modified after creation.
```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Client:
    name: str
    age: int

c = Client("John", 30)
# c.age = 31  ❌ raises dataclasses.FrozenInstanceError: cannot assign to field 'age'
```
Moreover, with the default `eq=True`, frozen dataclasses get a generated `__hash__()`, so they are hashable as long as their field values are. You can therefore use them as dictionary keys or store them in a `set`.
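A small sketch reusing the `Client` class: assignment raises `FrozenInstanceError` (from the `dataclasses` module), while equal instances can be used interchangeably as dictionary keys or set members. The `visits` and `unique` names are illustrative only.

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class Client:
    name: str
    age: int

c = Client("John", 30)

blocked = False
try:
    c.age = 31  # blocked by the __setattr__() that frozen=True generates
except FrozenInstanceError as exc:
    blocked = True
    print(f"FrozenInstanceError: {exc}")

# frozen=True with eq=True: equal instances have equal hashes
visits = {c: 3}                      # illustrative dictionary
print(visits[Client("John", 30)])    # 3

unique = {Client("John", 30), Client("John", 30), Client("Ana", 25)}
print(len(unique))                   # 2: duplicates collapse in a set
```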
Comparison and sorting with order=True
By default, dataclass instances cannot be compared with `<`, `>`, `<=` or `>=`, so they cannot be sorted either.
To enable these operations, pass the `order=True` parameter:
```python
from dataclasses import dataclass

@dataclass(order=True)
class Product:
    price: float
    name: str
```
Comparison follows the field declaration order, as if comparing tuples of field values: here, instances are sorted by `price`, then by `name` when prices are equal.
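As a quick check of this tuple-like ordering, here's a sketch that sorts a few `Product` instances (the sample values are made up):

```python
from dataclasses import dataclass

@dataclass(order=True)
class Product:
    price: float
    name: str

products = [
    Product(49.99, "Keyboard"),
    Product(19.90, "Mouse"),
    Product(49.99, "Headset"),
]

# Instances compare like the tuples (price, name)
for p in sorted(products):
    print(p)
# Product(price=19.9, name='Mouse')
# Product(price=49.99, name='Headset')
# Product(price=49.99, name='Keyboard')

print(Product(10.0, "A") < Product(10.0, "B"))  # True: same price, "A" < "B"
```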
The special __post_init__() method
The `__post_init__()` method, if you define it, is called automatically at the end of the `__init__()` generated by `@dataclass`. It lets you perform custom processing or validation on attributes:
```python
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int

    def __post_init__(self):
        # Called automatically at the end of the generated __init__()
        if self.age < 0:
            raise ValueError("Age cannot be negative.")
```
In this example, even though `__init__()` is generated, we add specific business logic without having to rewrite it.
We often use `__post_init__()` to validate, round or transform input data. 😉
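For instance, here's a sketch that extends the `Person` class so that `__post_init__()` also transforms its input; the name normalization (trimming and capitalizing) is an illustrative assumption, not part of the original example:

```python
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int

    def __post_init__(self):
        # Validation (as in the original example)
        if self.age < 0:
            raise ValueError("Age cannot be negative.")
        # Illustrative transformation: trim spaces and capitalize the name
        self.name = self.name.strip().title()

p = Person("  chloe ", 28)
print(p)  # Person(name='Chloe', age=28)

error_message = None
try:
    Person("Max", -1)
except ValueError as exc:
    error_message = str(exc)
    print(error_message)  # Age cannot be negative.
```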
Strong typing and type hints
One of the great advantages of dataclasses is their native integration with type annotations.
Each field is declared with a standard annotation, which offers several benefits:
- Clear documentation;
- Better IDE support (auto-completion, verification);
- Integration with static verification tools (MyPy, Pyright).
Here's how to do it:
```python
from dataclasses import dataclass

@dataclass
class Account:
    identifier: int
    balance: float
    active: bool
```
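Thanks to these annotations, tools can detect a wrong value passed at instantiation. The check is static only, though: Python does not enforce annotations at runtime, so the line below executes without error.
```python
c = Account("abc", 50.0, True)  # 🚫 flagged by MyPy, but accepted at runtime
```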
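Annotations are not optional for fields, however: `@dataclass` only treats annotated class attributes as fields. An attribute without an annotation is ignored by the generated `__init__()`, `__repr__()` and `__eq__()`. If you don't want to commit to a precise type, you can annotate the field with `typing.Any`.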
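The sketch below makes both points concrete with the `Account` class: `fields()` from the `dataclasses` module lists only the annotated attributes, and a wrong type passes silently at runtime. The `currency` attribute is a hypothetical addition for the demonstration.

```python
from dataclasses import dataclass, fields

@dataclass
class Account:
    identifier: int
    balance: float
    active: bool
    currency = "EUR"  # hypothetical: no annotation, so a plain class attribute, NOT a field

# Only annotated attributes become fields (and __init__() parameters)
print([f.name for f in fields(Account)])  # ['identifier', 'balance', 'active']

# Annotations are not checked at runtime: only a static checker would complain
c = Account("abc", 50.0, True)
print(c.identifier)  # abc
```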
dataclass vs classic class: what are the differences?
Let's now compare a classic class and a dataclass directly.
The goal is to highlight the reduction in code and the gain in readability.
Classic class:
```python
class Product:
    def __init__(self, name, price):
        self.name = name
        self.price = price

    def __repr__(self):
        return f"Product(name={self.name!r}, price={self.price!r})"

    def __eq__(self, other):
        return isinstance(other, Product) and self.name == other.name and self.price == other.price
```
dataclass version:
```python
from dataclasses import dataclass

@dataclass
class Product:
    name: str
    price: float
```
Result: the same functionality in roughly half as many lines, with a clearer, cleaner class.
dataclass vs namedtuple
Before dataclasses, we often used `namedtuple` to create lightweight objects with named fields. Let's compare them:
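| Criteria | dataclass | namedtuple |
|---|---|---|
| Requires import | ✅ Yes (`dataclasses`) | ✅ Yes (`collections`, or `typing` for `NamedTuple`) |
| Mutable? | ✅ Yes (unless `frozen=True`) | ❌ No (immutable) |
| Typed? | ✅ Yes | ⚠️ Only with `typing.NamedTuple` |
| Custom methods? | ✅ Yes | ⚠️ Via subclassing or `typing.NamedTuple` |
| Inheritance | ✅ Yes | ❌ Limited |
| Sortable? | ✅ Yes (with `order=True`) | ✅ Yes (by default, like tuples) |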
In summary: `dataclass` is more modern and more flexible, while `namedtuple` remains useful if you want simple immutability and tuple behavior (indexing, unpacking) without overhead.
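To see the difference in practice, here's a sketch comparing a `typing.NamedTuple` with an equivalent dataclass; `PointNT`, `PointDC` and `norm1()` are hypothetical names. Note that a named tuple also compares equal to a plain tuple with the same values, which a dataclass does not.

```python
from dataclasses import dataclass
from typing import NamedTuple

class PointNT(NamedTuple):
    # typing.NamedTuple accepts annotations and methods
    x: int
    y: int

    def norm1(self) -> int:
        return abs(self.x) + abs(self.y)

@dataclass
class PointDC:
    x: int
    y: int

nt = PointNT(1, 2)
dc = PointDC(1, 2)

x, y = nt                   # tuple behavior: unpacking...
print(nt[0], nt.norm1())    # ...and indexing: 1 3
print(nt == (1, 2))         # True: a named tuple is still a tuple

dc.x = 10                   # dataclasses are mutable by default
print(dc)                   # PointDC(x=10, y=2)
print(dc == (10, 2))        # False: no equality with plain tuples

immutable = False
try:
    nt.x = 10               # named tuple fields are read-only
except AttributeError:
    immutable = True
```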
Advanced options with field()
As we've seen, the `dataclasses` module provides the `field()` function to finely customize the behavior of each attribute.
Here are its main parameters:
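| Option | Description |
|---|---|
| `default=` | Default value |
| `default_factory=` | Zero-argument callable that generates a default value for each instance (great for lists and dictionaries) |
| `init=False` | Don't include the field in `__init__()` |
| `repr=False` | Exclude the field from `__repr__()` |
| `compare=False` | Exclude the field from `__eq__()` and the ordering methods (`__lt__()`, etc.) |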
Let's take this example:
```python
from dataclasses import dataclass, field

@dataclass
class Counter:
    name: str
    history: list = field(default_factory=list, repr=False, compare=False)
```
In this example, `history`:
- does not appear in `__repr__()`;
- is not used when comparing two objects;
- receives a new list for each instance, without reference sharing (which avoids pitfalls).
As we can see, `field()` is a very powerful tool to refine the rules of our dataclasses, especially in API-oriented contexts, serialization or business logic.
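Here's a sketch combining several of these options: it keeps the `Counter` example and adds a hypothetical `slug` field declared with `init=False`, computed in `__post_init__()` instead of being passed to the constructor.

```python
from dataclasses import dataclass, field

@dataclass
class Counter:
    name: str
    history: list = field(default_factory=list, repr=False, compare=False)
    # Hypothetical field: init=False keeps it out of __init__()
    slug: str = field(init=False)

    def __post_init__(self):
        # Derive the slug from the name once the instance is initialized
        self.slug = self.name.lower().replace(" ", "-")

a = Counter("Page Views")
b = Counter("Page Views")
a.history.append(1)

print(a)       # Counter(name='Page Views', slug='page-views')
print(a == b)  # True: history is ignored by __eq__()
```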
Using asdict() and astuple()
Dataclasses can easily be converted to a dictionary or a tuple thanks to the `asdict()` and `astuple()` functions from the `dataclasses` module.
```python
from dataclasses import dataclass, asdict, astuple

@dataclass
class Product:
    name: str
    price: float

p = Product("Pen", 2.50)
print(asdict(p))   # {'name': 'Pen', 'price': 2.5}
print(astuple(p))  # ('Pen', 2.5)
```
These conversions are useful:
- for JSON serialization (by passing the dictionary to `json.dumps()`);
- for debug display;
- or to send data through an API.
`asdict()` performs a recursive conversion: if a field contains another dataclass (or a list, tuple or dictionary of dataclasses), it is converted too.
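The sketch below shows this recursion with hypothetical `Address` and `Client` classes, then passes the result to `json.dumps()` from the standard library, since `asdict()` itself only builds a dictionary:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Address:
    city: str
    postal_code: str

@dataclass
class Client:
    name: str
    address: Address

c = Client("Chloe", Address("Paris", "75001"))

data = asdict(c)  # the nested Address also becomes a dict
print(data)       # {'name': 'Chloe', 'address': {'city': 'Paris', 'postal_code': '75001'}}

# asdict() returns plain Python objects, which json.dumps() can serialize
print(json.dumps(data))
```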
Inheritance and nested dataclasses
Dataclass inheritance
Like "classic" Python classes, dataclasses support inheritance. This lets you factor out common attributes or enrich specialized classes.
```python
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int

@dataclass
class Employee(Person):
    position: str
```
Here, `Employee` inherits the fields of `Person` and adds a `position` field; its generated `__init__()` takes `name`, `age` and `position`, in that order.
Nested dataclasses
Nested dataclasses are ideal for representing complex or hierarchical objects.
```python
from dataclasses import dataclass

@dataclass
class Address:
    city: str
    postal_code: str

@dataclass
class Client:
    name: str
    address: Address

c = Client("Chloe", Address("Paris", "75001"))
print(c.address.city)  # Paris
```
Performance and limitations of dataclasses
Performance
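Dataclasses perform about as well as equivalent hand-written classes for most uses: the generated methods are ordinary Python code, and the extra work is done once, when the decorator builds the class.
One exception: `frozen=True` instances are slightly slower to create, because the generated `__init__()` cannot use simple assignment and must go through `object.__setattr__()` to bypass the immutability check.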
Limitations
Dataclasses do not replace a full ORM or business model, and they don't validate types at runtime. They are also poorly suited to very dynamic objects and can get awkward with multiple inheritance.
Finally, they are only available natively from Python 3.7 onwards (a backport exists for Python 3.6).
Practical use cases
Dataclasses are used in many concrete contexts. Let's take a quick tour with a few examples.
With simple data
```python
from dataclasses import dataclass

@dataclass
class User:
    name: str
    email: str
    active: bool = True
```
Ideal for handling user objects in an API.
With configurations
```python
from dataclasses import dataclass

@dataclass
class Config:
    debug: bool
    path: str
    version: float
```
With business data
```python
from dataclasses import dataclass

@dataclass
class Order:
    product: str
    quantity: int
    unit_price: float

    def total(self) -> float:
        return self.quantity * self.unit_price
```
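A short usage sketch for the `Order` class, with made-up orders:

```python
from dataclasses import dataclass

@dataclass
class Order:
    product: str
    quantity: int
    unit_price: float

    def total(self) -> float:
        return self.quantity * self.unit_price

orders = [Order("Pen", 3, 2.5), Order("Notebook", 2, 4.0)]
print(orders[0].total())               # 7.5
print(sum(o.total() for o in orders))  # 15.5
```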
Frequently asked questions about dataclasses
Here's a quick round-up of the most frequently asked questions about dataclasses in Python!
Can I add methods to a dataclass?
Yes! Dataclasses are regular Python classes, so you can add methods as in any other class.
Can I modify the attributes of a dataclass?
Yes, unless you have defined `frozen=True`. In that case, instances become immutable.
Is it compatible with versions < 3.7?
Not natively. The `dataclasses` module is part of the standard library from Python 3.7. For Python 3.6, a backport is available: `pip install dataclasses`.