Defensive Programming in Python: Part 2: Input Validation

Introduction

Python is dynamically typed, which means many type errors only show up at runtime. Variables don’t need a declared type before use, and that flexibility comes hand in hand with the risk of unexpected runtime errors. These errors often stem from data that was never rigorously checked or validated, so they may surface only once the application is already in use. This is where defensive programming shines: it’s all about writing code that’s ready not just for the happy path but also for the twists and turns.

This blog post aims to explore how Pydantic can help you write defensive code that saves you from the unexpected, ensuring that your applications are both robust and reliable.

If you haven't read my previous article in this series do check it out at: https://www.kubeblogs.com/defensive-programming-in-python-part-1-golden-rules-for-logging/

What are my options?

There are several approaches to input validation, each with its strengths and trade-offs:

  1. Custom Classes with Self-Written Validation Logic: This approach involves defining classes with methods that include manual checks and validation logic. It offers maximum flexibility but requires more code, increasing the potential for errors and maintenance overhead.
  2. Dataclasses: dataclasses are a lightweight way to define classes that primarily store data. While they reduce boilerplate code and provide a clean syntax for data containers, dataclasses themselves don’t offer built-in validation. Validation logic still needs to be implemented manually, often in the __post_init__ method.
  3. Pydantic: Pydantic uses Python type annotations to define data schemas and performs validation automatically. It’s designed to be fast and easy to use, with features like error handling, complex data structures, JSON serialization/deserialization, and more, right out of the box.
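
To make the dataclass option concrete, here is a minimal sketch of manual validation in __post_init__. The Product class, its fields, and the try_build helper are hypothetical names made up for illustration; the point is only to show how much checking you end up writing by hand.

```python
from dataclasses import dataclass


@dataclass
class Product:
    """Hypothetical data container; all validation is written by hand."""
    name: str
    price: float

    def __post_init__(self):
        # Dataclasses do not enforce type hints, so every check is manual.
        if not isinstance(self.name, str) or not self.name.strip():
            raise ValueError("name must be a non-empty string")
        if isinstance(self.price, bool) or not isinstance(self.price, (int, float)):
            raise TypeError("price must be a number")
        if self.price <= 0:
            raise ValueError("price must be positive")


def try_build(name, price):
    """Return the Product, or the error message if validation fails."""
    try:
        return Product(name=name, price=price)
    except (TypeError, ValueError) as exc:
        return str(exc)


print(try_build("Laptop", 1000.0))
print(try_build("Laptop", -5))
print(try_build("Laptop", "cheap"))
```

Every new field means another hand-written check, which is the maintenance overhead mentioned above.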

Why I Recommend Pydantic

Among these options, Pydantic stands out for several reasons, making it my preferred choice for input validation:

  • Ease of Use: Pydantic’s syntax is intuitive, leveraging Python type hints, which many developers are already familiar with. This reduces the learning curve and allows for quick integration into existing projects.
  • Automatic Validation: Pydantic automatically validates data as it’s instantiated, ensuring that all fields meet the specified type annotations and constraints without the need for additional boilerplate validation code.
  • Rich Type Support: Pydantic supports a wide range of types, including standard Python types, complex objects, and even custom types. This makes it versatile for different use cases, from simple configurations to complex nested models.
  • Error Reporting: Pydantic provides detailed error messages when validation fails, which helps in quickly pinpointing and resolving issues.
  • Extensibility and Customization: While Pydantic works well out of the box, it also offers extensive options for customization, including custom validators, type coercion, and model configuration.
  • Performance: Pydantic is designed to be fast; in version 2, the core validation logic is implemented in Rust.
  • Community and Ecosystem Support: Pydantic has a strong and growing community, leading to a wealth of resources, extensions, and integrations with other tools and frameworks.

In conclusion, while custom classes and dataclasses are valuable tools for input validation, Pydantic’s combination of ease of use, automatic and comprehensive validation, detailed error reporting, and extensibility makes it a clear winner!
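
To give a feel for the error reporting mentioned above, here is a small sketch that collects the structured details from a ValidationError. The Order model and the describe_errors helper are hypothetical names used only for this illustration; the exact message text differs between Pydantic versions, so the sketch focuses on the field path.

```python
from pydantic import BaseModel, ValidationError


class Order(BaseModel):
    """Hypothetical model used only for this illustration."""
    item: str
    quantity: int


def describe_errors(payload):
    """Return (field path, message) pairs for a payload, or [] if it is valid."""
    try:
        Order(**payload)
    except ValidationError as exc:
        # errors() returns one dict per problem, with "loc", "msg" and "type" keys.
        return [(err["loc"], err["msg"]) for err in exc.errors()]
    return []


problems = describe_errors({"item": "book", "quantity": "many"})
for loc, msg in problems:
    print(loc, "->", msg)
```

Because each error carries the location of the offending field, you can hand the list straight back to an API client or log it as is.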

Got it — Show me some Examples!

Pydantic makes it super easy to do defensive coding. It helps us enforce type checking and data validation through its intuitive use of type annotations.

Basic Model Validation

Let’s start with a simple example of a user model:

# EmailStr needs the optional email-validator package: pip install "pydantic[email]"
from pydantic import BaseModel, ValidationError, EmailStr

class User(BaseModel):
    name: str
    age: int
    email: EmailStr

# Valid input
user = User(name="John Doe", age=30, email="john.doe@example.com")
print(user)

# Invalid input: raises ValidationError (age is not a number, email is malformed)
try:
    User(name="John Doe", age="thirty", email="not-an-email")
except ValidationError as e:
    print(e)

In this example, User is a Pydantic model with three fields: name, age, and email. Pydantic validates the data upon model instantiation, ensuring age is an integer (by default, numeric strings such as "30" are coerced to integers) and email is a valid email address.

Custom Validators

For more complex validations, you can define custom validators:

# Note: `validator` is the Pydantic v1 API; Pydantic v2 deprecates it in favor of `field_validator`.
from pydantic import BaseModel, ValidationError, validator

class Item(BaseModel):
    name: str
    price: float

    @validator('price')
    def price_must_be_positive(cls, value):
        # Reject zero and negative prices
        if value <= 0:
            raise ValueError('Price must be positive')
        return value

# Valid input
item = Item(name="Laptop", price=1000.00)
print(item)

# Invalid input: Pydantic wraps the ValueError in a ValidationError
try:
    Item(name="Laptop", price=-100.00)
except ValidationError as e:
    print(e)

This Item model ensures that the price is always positive, using a custom validator method.
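
For a simple range check like this one, a declarative constraint is often enough. The sketch below assumes the same rule (price strictly greater than zero) and expresses it with Field(gt=0); the Product model and the is_valid_price helper are hypothetical names for illustration.

```python
from pydantic import BaseModel, Field, ValidationError


class Product(BaseModel):
    """Hypothetical model; mirrors Item but uses a declarative constraint."""
    name: str
    price: float = Field(..., gt=0)  # "..." marks the field as required


def is_valid_price(price):
    """Return True if a Product can be built with this price."""
    try:
        Product(name="Laptop", price=price)
        return True
    except ValidationError:
        return False


print(is_valid_price(1000.00))
print(is_valid_price(-100.00))
```

I would keep custom validators for rules that depend on logic a constraint cannot express, and use Field constraints for plain bounds.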

Nested Models

from pydantic import BaseModel

class Address(BaseModel):
    city: str
    country: str

class Person(BaseModel):
    name: str
    age: int
    address: Address

address = Address(city="New York", country="USA")
person = Person(name="Bob", age=35, address=address)
print(person)

In this example, Pydantic validates complex, nested data structures using models within models.
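
Nested validation also works when the inner data arrives as a plain dictionary, which is the usual case for decoded JSON. The following sketch reuses the Address and Person models from above and shows, as an illustration, how a missing nested field is reported with its full path.

```python
from pydantic import BaseModel, ValidationError


class Address(BaseModel):
    city: str
    country: str


class Person(BaseModel):
    name: str
    age: int
    address: Address


# A plain dict (for example, decoded from JSON) is converted to Address automatically.
person = Person(name="Bob", age=35, address={"city": "New York", "country": "USA"})
print(person.address.city)

# A missing nested field is reported with its full path.
try:
    Person(name="Bob", age=35, address={"city": "New York"})
except ValidationError as exc:
    error_paths = [err["loc"] for err in exc.errors()]
    print(error_paths)
```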

Using Pydantic with JSON

from pydantic import BaseModel

class Event(BaseModel):
    title: str
    location: str

# Imagine this JSON is coming from an HTTP request
json_data = '{"title": "Concert", "location": "City Arena"}'

# Parse JSON to create a Pydantic model
# (parse_raw is the Pydantic v1 API; in Pydantic v2 it is deprecated in favor of model_validate_json)
event = Event.parse_raw(json_data)
print(event)
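
Real-world JSON can be malformed or incomplete, so it helps to handle both failure modes explicitly. This is a minimal sketch, assuming you decode the JSON yourself with the standard library and then hand the result to the model; the load_event helper and its return convention are hypothetical choices for illustration.

```python
import json

from pydantic import BaseModel, ValidationError


class Event(BaseModel):
    title: str
    location: str


def load_event(raw):
    """Return (event, error): exactly one of the two is None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON: {exc.msg}"
    if not isinstance(data, dict):
        return None, "expected a JSON object"
    try:
        return Event(**data), None
    except ValidationError as exc:
        # Join each error location into a dotted field path.
        fields = [".".join(str(part) for part in err["loc"]) for err in exc.errors()]
        return None, "invalid fields: " + ", ".join(fields)


print(load_event('{"title": "Concert", "location": "City Arena"}'))
print(load_event('{"title": "Concert"}'))
print(load_event('not json'))
```

Keeping the syntax error and the validation error separate makes it easier to tell a client whether the payload was unreadable or merely incomplete.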

Usage with Flask

Integrating Pydantic with Flask allows you to validate incoming JSON data in Flask routes using Pydantic models.

from flask import Flask, request, jsonify
from pydantic import BaseModel, ValidationError

app = Flask(__name__)

# Define a Pydantic model
class User(BaseModel):
    name: str
    age: int
    email: str

# Define a Flask route that expects a POST request containing user data
@app.route('/user', methods=['POST'])
def create_user():
    try:
        # Parse and validate the request JSON body using Pydantic
        user_data = User.parse_raw(request.data)
    except ValidationError as e:
        # If validation fails, return the error details
        return jsonify(e.errors()), 400
    
    # If validation succeeds, do something with the valid data (like saving to a database)
    # For this example, we'll just return the validated data as JSON
    return jsonify(user_data.dict()), 201

if __name__ == '__main__':
    app.run(debug=True)

In this Flask app, we have defined a User model with Pydantic, which expects a name, age, and email. The /user route listens for POST requests, where it expects JSON data. It uses User.parse_raw() to parse and validate the incoming JSON data against the Pydantic model. If the validation passes, it returns the user data as a JSON response with a 201 Created status; if not, it returns a 400 Bad Request response with validation error details.

Conclusion

Enforcing input validation improves the quality of your code considerably. It also reduces the number of errors you face in production and improves the overall experience of your end users. For any serious Python application, input validation is a must, and Pydantic makes it super simple. Whether you’re building web APIs, data processing applications, or any Python project that requires data validation, Pydantic can significantly enhance your defensive programming strategies.

Start integrating Pydantic into your projects and notice the difference in code quality and resilience.

Check out some golden rules of logging here: https://python.plainenglish.io/defensive-programming-in-python-part-1-logging-1e365177c5aa