Checkpointing

Checkpointing allows you to save the state of your quantum algorithm optimization at any point and resume from that exact state later. This is essential for long-running optimizations, debugging, and resuming interrupted computations.

Overview

Divi’s checkpointing system saves both the program state (parameters, losses, iteration count) and optimizer state (internal optimizer data) to disk. This allows you to:

Resume interrupted runs - Continue optimization from where it stopped
Debug optimization - Inspect intermediate states and parameters
Manage long runs - Break up very long optimizations into manageable chunks
Adjust iteration targets - Change max_iterations after loading to continue beyond the original target

Checkpointing is supported for all VariationalQuantumAlgorithm subclasses (VQE, QAOA) and works with checkpointing-capable optimizers:

MonteCarloOptimizer
PymooOptimizer (CMAES and DE methods)

Note

ScipyOptimizer does not support checkpointing due to limitations in the underlying scipy optimization methods.

Basic Usage

Saving Checkpoints

To enable checkpointing, pass a CheckpointConfig object to the run() method:

from pathlib import Path
import numpy as np
import pennylane as qml
from divi.qprog import VQE, HartreeFockAnsatz
from divi.qprog.checkpointing import CheckpointConfig
from divi.backends import ParallelSimulator

# Create a molecule
mol = qml.qchem.Molecule(
    symbols=["H", "H"],
    coordinates=np.array([[0.0, 0.0, -0.6614], [0.0, 0.0, 0.6614]])
)

# Create VQE program
vqe = VQE(
    molecule=mol,
    ansatz=HartreeFockAnsatz(),
    n_layers=1,
    max_iterations=10,
    backend=ParallelSimulator(),
)

# Run with checkpointing enabled
checkpoint_dir = Path("my_checkpoints")
vqe.run(checkpoint_config=CheckpointConfig(checkpoint_dir=checkpoint_dir))

By default, checkpoints are saved every iteration. Each checkpoint is stored in a subdirectory named checkpoint_{iteration:03d} (e.g., checkpoint_001, checkpoint_002).
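The subdirectory name is an ordinary zero-padded Python format spec. The sketch below shows how such names can be built and parsed back into iteration numbers. Only the checkpoint_{iteration:03d} pattern comes from this page; checkpoint_name and parse_iteration are hypothetical helpers for illustration, not part of Divi's API.

```python
import re
from typing import Optional

# Hypothetical helpers; only the "checkpoint_{iteration:03d}" pattern is documented.
_CHECKPOINT_RE = re.compile(r"^checkpoint_(\d+)$")


def checkpoint_name(iteration: int) -> str:
    """Build the subdirectory name for a given iteration."""
    return f"checkpoint_{iteration:03d}"


def parse_iteration(name: str) -> Optional[int]:
    """Return the iteration encoded in a subdirectory name, or None if it doesn't match."""
    match = _CHECKPOINT_RE.match(name)
    return int(match.group(1)) if match else None


print(checkpoint_name(1))                 # checkpoint_001
print(checkpoint_name(1000))              # checkpoint_1000
print(parse_iteration("checkpoint_005"))  # 5
# None: a timestamped root directory name, not a per-iteration subdirectory
print(parse_iteration("checkpoint_20250115_143022"))
```

The padding only guarantees three digits, so iteration 1000 produces checkpoint_1000, which sorts before checkpoint_999 as a string. Parsing the number, as above, keeps the ordering correct for very long runs.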

Checkpoint Interval

To save checkpoints less frequently, set the checkpoint_interval parameter:

# Save checkpoint every 5 iterations
vqe.run(
    checkpoint_config=CheckpointConfig(
        checkpoint_dir=checkpoint_dir,
        checkpoint_interval=5
    )
)
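To estimate which iterations end up on disk, here is a small sketch. It assumes iterations are counted from 1 (matching checkpoint_001 as the first subdirectory) and that a checkpoint is written whenever the iteration number is a multiple of checkpoint_interval. Divi's exact rule (for example, whether the final iteration is always saved) is not stated here, and checkpoint_iterations is a hypothetical helper.

```python
from typing import List


def checkpoint_iterations(max_iterations: int, checkpoint_interval: int = 1) -> List[int]:
    """Iterations (counted from 1) that would be checkpointed.

    ASSUMPTION: a save happens whenever the iteration number is a multiple
    of checkpoint_interval; this is not Divi's documented rule.
    """
    if checkpoint_interval < 1:
        raise ValueError("checkpoint_interval must be at least 1")
    return [i for i in range(1, max_iterations + 1) if i % checkpoint_interval == 0]


print(len(checkpoint_iterations(10)))  # 10 (default interval: every iteration)
print(checkpoint_iterations(10, 5))    # [5, 10]
print(checkpoint_iterations(12, 5))    # [5, 10]
```

Under this assumption, a 12-iteration run with checkpoint_interval=5 writes checkpoints at iterations 5 and 10, so an interruption during iteration 12 would resume from iteration 10.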

Auto-Generated Checkpoint Directories

You can automatically generate a timestamped checkpoint directory:

# Creates a directory like "checkpoint_20250115_143022"
config = CheckpointConfig.with_timestamped_dir()
vqe.run(checkpoint_config=config)

# Or with a checkpoint interval
config = CheckpointConfig.with_timestamped_dir(checkpoint_interval=5)
vqe.run(checkpoint_config=config)
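The example directory name suggests a checkpoint_YYYYMMDD_HHMMSS layout. Assuming that format (inferred from the example above, not from Divi's source), an equivalent name can be produced with datetime.strftime; timestamped_checkpoint_dir is a hypothetical helper.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def timestamped_checkpoint_dir(now: Optional[datetime] = None) -> Path:
    """Return a name like checkpoint_20250115_143022.

    ASSUMPTION: format inferred from the documentation example.
    """
    if now is None:
        now = datetime.now()
    return Path(now.strftime("checkpoint_%Y%m%d_%H%M%S"))


print(timestamped_checkpoint_dir(datetime(2025, 1, 15, 14, 30, 22)))  # checkpoint_20250115_143022
print(timestamped_checkpoint_dir())  # depends on the current local time
```

Because the timestamp has one-second resolution, two runs started within the same second would receive the same name under this scheme; when launching runs in parallel, an explicit checkpoint_dir that includes a run identifier avoids the collision.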

Loading and Resuming

To resume from a checkpoint, use the load_state() class method:

from divi.backends import ParallelSimulator
from divi.qprog import VQE, HartreeFockAnsatz

# Load the latest checkpoint
vqe_resumed = VQE.load_state(
    checkpoint_dir="my_checkpoints",
    backend=ParallelSimulator(),
    molecule=mol,  # Must provide original problem configuration
    ansatz=HartreeFockAnsatz(),
    n_layers=1,
)

# Continue optimization
vqe_resumed.max_iterations = 20  # Set new target
vqe_resumed.run()

Important: When loading from a checkpoint, you must provide all the original constructor arguments (problem definition, ansatz, etc.) because checkpoints only store runtime state, not the problem configuration.

Loading Specific Checkpoints

By default, load_state() loads the latest checkpoint. To load a specific checkpoint:

# Load checkpoint from iteration 5
vqe_resumed = VQE.load_state(
    checkpoint_dir="my_checkpoints",
    backend=ParallelSimulator(),
    subdirectory="checkpoint_005",  # Specific checkpoint subdirectory
    molecule=mol,
    ansatz=HartreeFockAnsatz(),
    n_layers=1,
)
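Conceptually, choosing which checkpoint to load is a lookup over the checkpoint_NNN subdirectories. Here is a minimal sketch, assuming that "latest" means the highest iteration number (Divi may use a different rule, such as modification time). resolve_checkpoint is a hypothetical helper, and it raises the built-in FileNotFoundError rather than Divi's CheckpointNotFoundError.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional

_CHECKPOINT_RE = re.compile(r"^checkpoint_(\d+)$")


def resolve_checkpoint(checkpoint_dir: Path, subdirectory: Optional[str] = None) -> Path:
    """Pick the checkpoint subdirectory to load (hypothetical helper)."""
    checkpoint_dir = Path(checkpoint_dir)
    if subdirectory is not None:
        # An explicit subdirectory always wins over "latest".
        target = checkpoint_dir / subdirectory
        if not target.is_dir():
            raise FileNotFoundError(f"checkpoint not found: {target}")
        return target
    numbered = []
    for child in checkpoint_dir.iterdir():
        match = _CHECKPOINT_RE.match(child.name)
        if child.is_dir() and match:
            numbered.append((int(match.group(1)), child))
    if not numbered:
        raise FileNotFoundError(f"no checkpoints in {checkpoint_dir}")
    # Compare iteration numbers, not names: "checkpoint_1000" < "checkpoint_999" as strings.
    return max(numbered, key=lambda item: item[0])[1]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for i in (1, 5, 10):
        (root / f"checkpoint_{i:03d}").mkdir()
    latest = resolve_checkpoint(root).name                      # checkpoint_010
    specific = resolve_checkpoint(root, "checkpoint_005").name  # checkpoint_005

print(latest, specific)  # checkpoint_010 checkpoint_005
```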

Complete Example: QAOA with Checkpointing

Here’s a complete example showing checkpointing with QAOA:

import networkx as nx
from pathlib import Path
from divi.qprog import QAOA, GraphProblem
from divi.qprog.checkpointing import CheckpointConfig
from divi.qprog.optimizers import PymooOptimizer, PymooMethod
from divi.backends import ParallelSimulator

# Create problem
G = nx.bull_graph()
checkpoint_dir = Path("qaoa_checkpoints")

# Initial run - first half
qaoa1 = QAOA(
    problem=G,
    graph_problem=GraphProblem.MAX_CLIQUE,
    n_layers=1,
    optimizer=PymooOptimizer(method=PymooMethod.CMAES, population_size=10),
    max_iterations=5,
    backend=ParallelSimulator(),
)

# Run with checkpointing
qaoa1.run(checkpoint_config=CheckpointConfig(checkpoint_dir=checkpoint_dir))

# Later: Resume from checkpoint
qaoa2 = QAOA.load_state(
    checkpoint_dir=checkpoint_dir,
    backend=ParallelSimulator(),
    problem=G,  # Must provide original problem
    graph_problem=GraphProblem.MAX_CLIQUE,
    n_layers=1,
)

# Continue optimization
qaoa2.max_iterations = 10
qaoa2.run()

# Access results
print(f"Best loss: {qaoa2.best_loss}")
print(f"Solution: {qaoa2.solution}")

Managing Checkpoints

Listing Checkpoints

You can list all checkpoints in a directory:

from pathlib import Path
from divi.qprog.checkpointing import list_checkpoints

checkpoints = list_checkpoints(Path("my_checkpoints"))
for checkpoint in checkpoints:
    print(f"Iteration {checkpoint.iteration}: {checkpoint.path}")
    print(f"  Size: {checkpoint.size_bytes / 1024:.2f} KB")
    print(f"  Valid: {checkpoint.is_valid}")
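The iteration and size_bytes fields printed above can be approximated with the standard library alone. The sketch below is hypothetical (CheckpointSummary and summarize_checkpoints are not Divi names) and assumes a checkpoint's size is the total size of the files inside its subdirectory.

```python
import re
import tempfile
from pathlib import Path
from typing import List, NamedTuple

_CHECKPOINT_RE = re.compile(r"^checkpoint_(\d+)$")


class CheckpointSummary(NamedTuple):
    """Hypothetical record mirroring the fields printed in the example above."""
    iteration: int
    path: Path
    size_bytes: int


def summarize_checkpoints(checkpoint_dir: Path) -> List[CheckpointSummary]:
    """List checkpoint_NNN subdirectories with their total file size, oldest first."""
    summaries = []
    for child in Path(checkpoint_dir).iterdir():
        match = _CHECKPOINT_RE.match(child.name)
        if not (child.is_dir() and match):
            continue  # skip stray files and non-checkpoint directories
        size = sum(f.stat().st_size for f in child.rglob("*") if f.is_file())
        summaries.append(CheckpointSummary(int(match.group(1)), child, size))
    return sorted(summaries, key=lambda s: s.iteration)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for i in (2, 1):
        ckpt = root / f"checkpoint_{i:03d}"
        ckpt.mkdir()
        (ckpt / "program_state.json").write_text("{}")    # 2 bytes
        (ckpt / "optimizer_state.json").write_text("{}")  # 2 bytes
    (root / "notes.txt").write_text("not a checkpoint")
    demo = [(s.iteration, s.size_bytes) for s in summarize_checkpoints(root)]

print(demo)  # [(1, 4), (2, 4)]
```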

Getting Checkpoint Information

Get detailed information about a specific checkpoint:

from pathlib import Path
from divi.qprog.checkpointing import get_checkpoint_info

info = get_checkpoint_info(Path("my_checkpoints/checkpoint_005"))
print(f"Iteration: {info.iteration}")
print(f"Timestamp: {info.timestamp}")
print(f"Size: {info.size_bytes} bytes")
print(f"Valid: {info.is_valid}")

Finding the Latest Checkpoint

Get the path to the latest checkpoint:

from pathlib import Path
from divi.qprog.checkpointing import get_latest_checkpoint

latest = get_latest_checkpoint(Path("my_checkpoints"))
if latest:
    print(f"Latest checkpoint: {latest}")

Cleaning Up Old Checkpoints

Remove old checkpoints, keeping only the most recent N:

from pathlib import Path
from divi.qprog.checkpointing import cleanup_old_checkpoints

# Keep only the 5 most recent checkpoints
cleanup_old_checkpoints(Path("my_checkpoints"), keep_last_n=5)
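A minimal sketch of keep-last-N pruning, assuming "most recent" means the highest iteration numbers. prune_checkpoints is a hypothetical stand-in for cleanup_old_checkpoints(); the real function's ordering rule and argument validation may differ.

```python
import re
import shutil
import tempfile
from pathlib import Path
from typing import List

_CHECKPOINT_RE = re.compile(r"^checkpoint_(\d+)$")


def prune_checkpoints(checkpoint_dir: Path, keep_last_n: int) -> List[Path]:
    """Delete all but the keep_last_n highest-numbered checkpoint subdirectories.

    Returns the removed paths, oldest first.
    """
    if keep_last_n < 1:  # ASSUMPTION: refuse to delete every checkpoint
        raise ValueError("keep_last_n must be at least 1")
    numbered = []
    for child in Path(checkpoint_dir).iterdir():
        match = _CHECKPOINT_RE.match(child.name)
        if child.is_dir() and match:
            numbered.append((int(match.group(1)), child))
    numbered.sort(key=lambda item: item[0])  # oldest (lowest iteration) first
    to_remove = [path for _, path in numbered[:-keep_last_n]]
    for path in to_remove:
        shutil.rmtree(path)  # removes the subdirectory and everything in it
    return to_remove


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for i in range(1, 8):
        (root / f"checkpoint_{i:03d}").mkdir()
    removed = [p.name for p in prune_checkpoints(root, keep_last_n=5)]
    remaining = sorted(p.name for p in root.iterdir())

print(removed)         # ['checkpoint_001', 'checkpoint_002']
print(len(remaining))  # 5
```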

Checkpoint Structure

Each checkpoint is stored in a subdirectory with the following structure:

checkpoint_dir/
├── checkpoint_001/
│   ├── program_state.json    # Program state (parameters, losses, etc.)
│   └── optimizer_state.json  # Optimizer internal state
├── checkpoint_002/
│   ├── program_state.json
│   └── optimizer_state.json
└── ...
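A quick pre-flight check can confirm that both files are present and parse as JSON before resuming. This is a hypothetical sketch; the checks behind Divi's is_valid flag are not documented here and may be stricter (for example, validating specific keys).

```python
import json
import tempfile
from pathlib import Path

# File names taken from the checkpoint layout above.
REQUIRED_FILES = ("program_state.json", "optimizer_state.json")


def looks_valid(checkpoint_path: Path) -> bool:
    """Hypothetical check: every required file exists and holds a JSON object."""
    for name in REQUIRED_FILES:
        try:
            data = json.loads((Path(checkpoint_path) / name).read_text(encoding="utf-8"))
        except (OSError, ValueError):  # missing/unreadable file, bad encoding, or invalid JSON
            return False
        if not isinstance(data, dict):
            return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "checkpoint_001"
    good.mkdir()
    for name in REQUIRED_FILES:
        (good / name).write_text('{"iteration": 1}', encoding="utf-8")
    bad = Path(tmp) / "checkpoint_002"
    bad.mkdir()
    # Simulate a write interrupted mid-file; optimizer_state.json is missing too.
    (bad / "program_state.json").write_text('{"iteration": 2', encoding="utf-8")
    results = {p.name: looks_valid(p) for p in (good, bad)}

print(results)  # {'checkpoint_001': True, 'checkpoint_002': False}
```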

The program_state.json file contains:

Current iteration number
Loss history
Best parameters found so far
Current parameters
Random number generator state
Algorithm-specific state (e.g., eigenstate for VQE, solution nodes for QAOA)

The optimizer_state.json file contains optimizer-specific data:

For MonteCarloOptimizer: Population, evaluated population, losses, RNG state
For PymooOptimizer: Serialized algorithm object and population
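Saving the RNG state alongside the population is what lets a resumed run reproduce an uninterrupted one exactly. The toy optimizer below is hypothetical (it is not Divi's MonteCarloOptimizer): each step samples a population around its best point, and its state, including Python's random.Random state, is serialized to JSON. Resuming from that JSON yields the same loss history as a run that was never interrupted.

```python
import json
import random
from typing import List


def loss(x: float) -> float:
    """Toy objective with its minimum at x = 3."""
    return (x - 3.0) ** 2


class ToyRandomSearch:
    """Hypothetical population-based optimizer (not Divi's MonteCarloOptimizer)."""

    def __init__(self, seed: int, population_size: int = 4) -> None:
        self.rng = random.Random(seed)
        self.population_size = population_size
        self.best_x = 0.0
        self.losses: List[float] = []

    def step(self) -> None:
        # Sample a population around the current best point and keep any improvement.
        population = [self.best_x + self.rng.gauss(0.0, 1.0) for _ in range(self.population_size)]
        candidate = min(population, key=loss)
        if loss(candidate) < loss(self.best_x):
            self.best_x = candidate
        self.losses.append(loss(self.best_x))

    def to_json(self) -> str:
        version, internal, gauss_next = self.rng.getstate()
        return json.dumps({
            "population_size": self.population_size,
            "best_x": self.best_x,
            "losses": self.losses,
            "rng_state": [version, list(internal), gauss_next],
        })

    @classmethod
    def from_json(cls, payload: str) -> "ToyRandomSearch":
        data = json.loads(payload)
        opt = cls(seed=0, population_size=data["population_size"])  # seed is overwritten below
        opt.best_x = data["best_x"]
        opt.losses = data["losses"]
        version, internal, gauss_next = data["rng_state"]
        opt.rng.setstate((version, tuple(internal), gauss_next))  # setstate needs a tuple
        return opt


# Reference: six uninterrupted steps.
uninterrupted = ToyRandomSearch(seed=7)
for _ in range(6):
    uninterrupted.step()

# Interrupted: three steps, checkpoint to JSON, reload, three more steps.
first_half = ToyRandomSearch(seed=7)
for _ in range(3):
    first_half.step()
resumed = ToyRandomSearch.from_json(first_half.to_json())
for _ in range(3):
    resumed.step()

print(resumed.losses == uninterrupted.losses)  # True
```

If the RNG state were left out, the resumed run would draw different samples and its loss history would diverge from the uninterrupted run after the resume point.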

Best Practices

Use meaningful checkpoint directory names - Include experiment identifiers or timestamps
Set appropriate checkpoint intervals - Larger intervals use less disk space but lose more progress if a run is interrupted between checkpoints
Always provide problem configuration when loading - Checkpoints don’t store problem definitions
Clean up old checkpoints - Use cleanup_old_checkpoints() to manage disk space
Verify checkpoint validity - Check is_valid before resuming from a checkpoint
Use auto-generated directories - CheckpointConfig.with_timestamped_dir() prevents accidental overwrites

Error Handling

Checkpointing operations can raise several exceptions:

CheckpointNotFoundError - Checkpoint directory or file not found
CheckpointCorruptedError - Checkpoint file is invalid or corrupted
RuntimeError - Attempting to save a checkpoint before any iteration has completed
ValueError - Invalid checkpoint configuration

When loading a checkpoint, catch the two checkpoint-specific exceptions:

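from divi.backends import ParallelSimulator
from divi.qprog import VQE, HartreeFockAnsatz
from divi.qprog.checkpointing import (
    CheckpointNotFoundError,
    CheckpointCorruptedError,
)

try:
    vqe = VQE.load_state(
        checkpoint_dir="my_checkpoints",
        backend=ParallelSimulator(),
        molecule=mol,  # same problem configuration as the original run
        ansatz=HartreeFockAnsatz(),
        n_layers=1,
    )
except CheckpointNotFoundError as e:
    print(f"Checkpoint not found: {e}")
except CheckpointCorruptedError as e:
    print(f"Checkpoint corrupted: {e}")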
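When the newest checkpoint is corrupted, a common recovery pattern is to fall back to the next-newest one. The sketch below is generic and hypothetical: load_newest_loadable takes candidate names ordered newest first plus a load callable. With Divi, load could wrap VQE.load_state(..., subdirectory=name) and recoverable could be (CheckpointCorruptedError,), but that wiring is an assumption, not documented behavior.

```python
from typing import Callable, List, Sequence, Tuple, Type, TypeVar

T = TypeVar("T")


def load_newest_loadable(
    candidates_newest_first: Sequence[str],
    load: Callable[[str], T],
    recoverable: Tuple[Type[BaseException], ...] = (ValueError,),
) -> Tuple[str, T]:
    """Return (name, loaded_state) for the first candidate that loads cleanly.

    Only exceptions listed in `recoverable` trigger a fallback; anything else propagates.
    """
    failures: List[str] = []
    for name in candidates_newest_first:
        try:
            return name, load(name)
        except recoverable as exc:
            failures.append(f"{name}: {exc}")
    raise RuntimeError("no loadable checkpoint; tried:\n" + "\n".join(failures))


def fake_load(name: str) -> dict:
    """Stand-in loader: pretend the newest checkpoint was truncated mid-write."""
    if name == "checkpoint_003":
        raise ValueError("truncated JSON")
    return {"iteration": int(name.rsplit("_", 1)[1])}


chosen, state = load_newest_loadable(
    ["checkpoint_003", "checkpoint_002", "checkpoint_001"], fake_load
)
print(chosen, state)  # checkpoint_002 {'iteration': 2}
```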

Limitations

ScipyOptimizer does not support checkpointing
Checkpoints are not portable across different Python versions or library versions
Problem configuration must be manually provided when loading (not stored in checkpoint)
Checkpoint files can be large for population-based optimizers (MonteCarlo, Pymoo)