Going Deeper with Pydantic: Nested Models and Data Structures


May 4, 2025 - 01:19

In Post 1, we explored the basics of Pydantic: creating models, enforcing type validation, and ensuring data integrity with minimal boilerplate. But real-world applications often involve more complex, structured data—like API payloads, configuration files, or nested JSON. How do we handle a blog post with comments, an order with multiple items, or a user profile with nested addresses? This post dives into Pydantic’s powerful support for nested models and smart data structures, showing how to model, validate, and access complex data with ease.

We’ll cover practical examples, including a blog system with authors and comments, and touch on use cases like user profiles or e-commerce orders. Let’s get started!

Nested BaseModels

Pydantic allows you to define models within models, enabling clean, hierarchical data structures. Let’s model a blog system with an Author, Comment, and Blog model.

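from pydantic import BaseModel, EmailStr  # EmailStr requires the email-validator package
from datetime import datetime

class Author(BaseModel):
    name: str
    email: EmailStr  # Validated as an email address, not just any string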

class Comment(BaseModel):
    content: str
    author: Author
    created_at: datetime

class Blog(BaseModel):
    title: str
    content: str
    author: Author
    comments: list[Comment] = []

# Example usage
blog_data = {
    "title": "Nested Models in Pydantic",
    "content": "This is a blog post about Pydantic...",
    "author": {"name": "Jane Doe", "email": "jane@example.com"},
    "comments": [
        {
            "content": "Great post!",
            "author": {"name": "John Smith", "email": "john@example.com"},
            "created_at": "2025-05-04T10:00:00"
        }
    ]
}

blog = Blog(**blog_data)
print(blog.author.name)  # Jane Doe
print(blog.comments[0].author.email)  # john@example.com

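Here, Comment and Blog embed the Author model, and Pydantic validates the nested data automatically: each nested dictionary is converted into a model instance, and the ISO 8601 string in created_at is parsed into a datetime. If any nested value is invalid (for example, author.email is missing or has the wrong type), Pydantic raises a ValidationError and no Blog instance is created. This cascading validation checks every layer of your data.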

Lists, Tuples, and Sets of Models

Nested models often involve collections, like a list of comments on a blog. Pydantic supports List[T], Tuple[T, ...], and Set[T] for collections of models or other types.

Using our Blog model, notice the comments: list[Comment] = []. Pydantic validates each Comment in the list:

invalid_comment_data = {
    "title": "Invalid Comment Example",
    "content": "This blog has a bad comment...",
    "author": {"name": "Jane Doe", "email": "jane@example.com"},
    "comments": [
        {
            "content": "This is fine",
            "author": {"name": "John Smith", "email": "john@example.com"},
            "created_at": "2025-05-04T10:00:00"
        },
        {
            "content": "This is bad",
            "author": {"name": "Bad Author", "email": "not-an-email"},  # Invalid email
            "created_at": "2025-05-04T10:01:00"
        }
    ]
}

from pydantic import ValidationError

try:
    blog = Blog(**invalid_comment_data)
except ValidationError as e:  # ValidationError is a subclass of ValueError
    print(e)

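When email is annotated as EmailStr, Pydantic raises a ValidationError pinpointing the invalid email in the second comment. (A plain str annotation would accept "not-an-email", since it is a valid string.) You can also use Tuple[Comment, ...] for immutable sequences, and validation works the same way. Set[Comment] gives you unique items, but it requires hashable models, which Pydantic models are not by default; marking the model as frozen makes them hashable.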
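
As a rough sketch of the tuple and set variants, the example below uses hypothetical Tag and Post models that are not part of the blog models above. It assumes Pydantic V2, where ConfigDict(frozen=True) makes instances immutable and hashable so they can live in a set.

```python
from typing import Set, Tuple

from pydantic import BaseModel, ConfigDict


class Tag(BaseModel):
    # frozen=True makes instances immutable and hashable (Pydantic V2 config)
    model_config = ConfigDict(frozen=True)

    label: str


class Post(BaseModel):
    title: str
    tags: Set[Tag] = set()
    revisions: Tuple[str, ...] = ()


post = Post(
    title="Collections",
    tags=[{"label": "python"}, {"label": "python"}, {"label": "pydantic"}],
    revisions=["draft", "final"],
)

print(len(post.tags))   # duplicate tags collapse in the set
print(post.revisions)   # the list input is stored as a tuple
```

Each element is still validated as a Tag before it is added to the set, so a malformed tag would be reported with its position in the input.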

Optional Fields and Defaults

Real-world data often includes optional fields or defaults. Pydantic supports Optional[T] from typing and allows default values.

from typing import Optional

class Author(BaseModel):
    name: str
    email: Optional[str] = None  # Email is optional
    bio: str = "No bio provided"  # Default value

class Blog(BaseModel):
    title: str
    content: str
    author: Author

# Example with missing email
blog_data = {
    "title": "Optional Fields",
    "content": "This blog has an author with no email.",
    "author": {"name": "Jane Doe"}
}

blog = Blog(**blog_data)
print(blog.author.email)  # None
print(blog.author.bio)    # No bio provided

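Optional[str] means the value can be a string or None; it is the default of None that makes the field optional in the input. The two ideas are separate: in Pydantic V2, a field annotated Optional[str] with no default is still required (though it accepts None), while in V1 it implicitly defaults to None. Writing email: str = None also makes the field optional with a None default, but in V2 an explicitly passed None fails validation because the declared type is str. Pydantic also tracks which fields were actually present in the input, so you can tell a missing field apart from one explicitly set to None.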
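
One way to see the difference between a missing field and an explicit None is sketched below. It assumes Pydantic V2, where model_fields_set records which fields were passed in (V1 exposes the same information as __fields_set__).

```python
from typing import Optional

from pydantic import BaseModel


class Author(BaseModel):
    name: str
    email: Optional[str] = None


omitted = Author(name="Jane Doe")               # email not supplied
explicit = Author(name="Jane Doe", email=None)  # email supplied as None

# Both instances hold None, but only one received the field as input
print(omitted.email, explicit.email)          # None None
print("email" in omitted.model_fields_set)    # False
print("email" in explicit.model_fields_set)   # True
```

This distinction matters for partial updates, where "not sent" and "clear this value" usually need different handling.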

Dict and Map-Like Structures

Pydantic supports Dict[K, V] for key-value structures, perfect for feature flags, localized content, or other mappings.

from typing import Dict

class Blog(BaseModel):
    title: str
    content: str
    translations: Dict[str, str]  # Language code -> translated title

blog_data = {
    "title": "Pydantic Power",
    "content": "This is a blog post...",
    "translations": {
        "es": "El poder de Pydantic",
        "fr": "La puissance de Pydantic"
    }
}

blog = Blog(**blog_data)
print(blog.translations["es"])  # El poder de Pydantic

You can also nest models in dictionaries, like Dict[str, Author], for more complex mappings. Pydantic validates both keys and values according to their types.
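
A minimal sketch of a dictionary of models follows. The contributors field and its role keys are hypothetical, chosen only to illustrate Dict[str, Author], and the models are redefined so the snippet runs on its own.

```python
from typing import Dict

from pydantic import BaseModel


class Author(BaseModel):
    name: str
    email: str


class Blog(BaseModel):
    title: str
    # Hypothetical field: role name -> author holding that role
    contributors: Dict[str, Author]


blog = Blog(
    title="Pydantic Power",
    contributors={
        "writer": {"name": "Jane Doe", "email": "jane@example.com"},
        "editor": {"name": "John Smith", "email": "john@example.com"},
    },
)

# Each dictionary value has been converted into an Author instance
print(blog.contributors["editor"].name)                  # John Smith
print(isinstance(blog.contributors["writer"], Author))   # True
```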

Accessing Nested Data Safely

Once validated, Pydantic models provide type-safe access to nested attributes. You can access fields like blog.author.name or blog.comments[0].content knowing that every declared field exists and has the declared type. You still need to handle empty lists and Optional fields that are None.

For serialization, use .dict() in Pydantic V1 or .model_dump() in Pydantic V2 (where .dict() still works but is deprecated), with options like exclude_unset, include, or exclude:

# Uses the Author and Blog models from "Optional Fields and Defaults"
blog = Blog(
    title="Test",
    content="Content",
    author=Author(name="Jane")
)

# Serialize only specific fields (a nested include must be a dict, not a set)
print(blog.dict(include={"title": True, "author": {"name"}}))
# Output: {'title': 'Test', 'author': {'name': 'Jane'}}

# Exclude unset fields
print(blog.dict(exclude_unset=True))
# Only includes fields explicitly set, skips defaults like author.bio

This makes it easy to control what data is serialized for APIs or storage.
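
The same options apply to nested exclusion and to JSON output. The sketch below assumes Pydantic V2 (model_dump and model_dump_json) and redefines the models so it runs on its own; the email value is made up for the example.

```python
from typing import Optional

from pydantic import BaseModel


class Author(BaseModel):
    name: str
    email: Optional[str] = None
    bio: str = "No bio provided"


class Blog(BaseModel):
    title: str
    content: str
    author: Author


blog = Blog(
    title="Test",
    content="Content",
    author=Author(name="Jane", email="jane@example.com"),
)

# Drop the nested email before handing the payload to a client
public = blog.model_dump(exclude={"author": {"email"}})
print(public)

# The same include/exclude syntax works when serializing straight to JSON
as_json = blog.model_dump_json(include={"title": True, "author": {"name"}})
print(as_json)
```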

Validation and Error Reporting in Nested Structures

Pydantic’s error reporting is precise, even for nested data. Let’s revisit the invalid comment example:

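from pydantic import ValidationError

# Uses the Blog, Comment, and Author models from the "Nested BaseModels" section
try:
    blog = Blog(**invalid_comment_data)
except ValidationError as e:
    print(e.errors())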

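In Pydantic V1, the output looks something like this (V2 reports the same loc but uses different type identifiers and adds keys such as input and url):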

[
    {
        'loc': ('comments', 1, 'author', 'email'),
        'msg': 'value is not a valid email address',
        'type': 'value_error.email'
    }
]

The loc field shows the exact path to the error (comments[1].author.email), making it easy to debug complex structures. This granularity is invaluable for APIs or user-facing validation.
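
To turn loc tuples into readable paths for logs or API responses, a small helper is enough. The sketch below is one possible approach: format_loc is a hypothetical helper, not part of Pydantic, and the models are simplified versions of the blog models so the snippet runs without the email-validator package.

```python
from datetime import datetime
from typing import List

from pydantic import BaseModel, ValidationError


class Comment(BaseModel):
    content: str
    created_at: datetime


class Blog(BaseModel):
    title: str
    comments: List[Comment] = []


def format_loc(loc) -> str:
    """Turn a loc tuple such as ('comments', 1, 'created_at') into a dotted path."""
    path = ""
    for part in loc:
        if isinstance(part, int):
            path += f"[{part}]"  # list index
        else:
            path += f".{part}" if path else str(part)  # field name
    return path


bad_data = {
    "title": "Error paths",
    "comments": [
        {"content": "ok", "created_at": "2025-05-04T10:00:00"},
        {"content": "broken", "created_at": "not-a-date"},
    ],
}

messages = []
try:
    Blog(**bad_data)
except ValidationError as exc:
    # One "path: message" line per validation error
    messages = [f"{format_loc(e['loc'])}: {e['msg']}" for e in exc.errors()]

for message in messages:
    print(message)
```

The exact message text differs between Pydantic versions, but the path prefix (comments[1].created_at) comes directly from loc.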

Recap and Takeaways

Nested models in Pydantic make it easy to handle complex, structured data with robust validation. Key techniques:

  • Use BaseModel for nested structures like Author in Blog.
  • Leverage List[T], Dict[K, V], and Optional[T] for flexible data shapes.
  • Access nested data safely with dot notation or serialize with .dict() (.model_dump() in Pydantic V2).
  • Rely on Pydantic’s detailed error reporting for debugging.

These tools are perfect for APIs, configuration files, or any scenario with hierarchical data.