Automatic Differentiation

In Julia, it is often possible to automatically compute derivatives, gradients, Jacobians and Hessians of arbitrary Julia functions with precision matching the machine precision, that is, without the numerical inaccuracies incurred by finite-difference approximations.

Two general methods for automatic differentiation are available: forward and reverse mode. Forward mode is algorithmically more favorable for functions with few inputs but many outputs, while reverse mode is more efficient for functions with many parameters but few outputs (like in deep learning). In Julia, forward-mode AD is provided by the package ForwardDiff.jl, while reverse-mode AD is provided by several different packages, such as Zygote.jl and ReverseDiff.jl. Forward-mode AD generally has a lower overhead than reverse-mode AD, so for functions of a small number of parameters, say, less than about 10 or 100, forward-mode is usually most efficient. ForwardDiff.jl also has support for differentiating most of the Julia language, making the probability of success higher than for other packages, why we generally recommend trying ForwardDiff.jl first.

Linearizing nonlinear dynamics

Nonlinear dynamics on the form

\[\begin{aligned} \dot x &= f(x, u) \\ y &= g(x, u) \end{aligned}\]

is easily linearized in the point $x_0, u_0$ using ForwardDiff.jl:

using ControlSystemsBase, ForwardDiff

"An example of nonlinear dynamics"
function f(x, u)
    x1, x2 = x
    u1, u2 = u
    [x2; u1*x1 + u2*x2]

x0 = [1.0, 0.0] # Operating point to linearize around
u0 = [0.0, 1.0]

A = ForwardDiff.jacobian(x -> f(x, u0), x0)
B = ForwardDiff.jacobian(u -> f(x0, u), u0)

"An example of a nonlinear output (measurement) function"
function g(x, u)
    y = [x[1] + 0.1x[1]*u[2]; x[2]]

C = ForwardDiff.jacobian(x -> g(x, u0), x0)
D = ForwardDiff.jacobian(u -> g(x0, u), u0)

linear_sys = ss(A, B, C, D)
StateSpace{Continuous, Float64}
A = 
 0.0  1.0
 0.0  1.0
B = 
 0.0  0.0
 1.0  0.0
C = 
 1.1  0.0
 0.0  1.0
D = 
 0.0  0.1
 0.0  0.0

Continuous-time state-space model

The example above linearizes f in the point $x_0, u_0$ to obtain the linear statespace matrices $A$ and $B$, and linearizes g to obtain the linear output matrices $C$ and $D$.

Optimization-based tuning–PID controller

This example will demonstrate simple usage of AD using ForwardDiff.jl for optimization-based auto tuning of a PID controller.

The system we will control is a double-mass system, in which two masses (or inertias) are connected by a flexible transmission.

We start by defining the system model and an initial guess for the PID controller parameters

using ControlSystemsBase, ForwardDiff, Plots

P = DemoSystems.double_mass_model()

bodeplot(P, title="Bode plot of Double-mass system \$P(s)\$")
Ω = exp10.(-2:0.03:3)
kp,ki,kd,Tf =  1, 0.1, 0.1, 0.01 # controller parameters

C  = pid(kp, ki, kd; Tf, form=:parallel, state_space=true) # Construct a PID controller with filter
G  = feedback(P*C) # Closed-loop system
S  = 1/(1 + P*C)   # Sensitivity function
Gd = c2d(G, 0.1)   # Discretize the system
res = step(Gd,15)  # Step-response

mag = bodev(S, Ω)[1]
plot(res, title="Time response", layout = (1,3), legend=:bottomright)
plot!(Ω, mag, title="Sensitivity function", xscale=:log10, yscale=:log10, subplot=2, legend=:bottomright, ylims=(3e-2, Inf))
Ms, _ = hinfnorm(S)
hline!([Ms], l=(:black, :dash), subplot=2, lab="\$M_S = \$ $(round(Ms, digits=3))", sp=2)
nyquistplot!(P*C, Ω, sp=3, ylims=(-2.1,1.1), xlims=(-2.1,1.2), size=(1200,400))

The initial controller $C$ achieves a maximum peak of the sensitivity function of $M_S = 1.3$ which implies a rather robust tuning, but the step response is sluggish. We will now try to optimize the controller parameters to achieve a better performance.

We start by defining a helper function plot_optimized that will evaluate the performance of the tuned controller. We then define a function systems that constructs the gang-of-four transfer functions (extended_gangoffour) and performs time-domain simulations of the transfer functions $S(s)$ and $P(s)S(s)$, i.e., the transfer functions from reference $r$ to control error $e$, and the transfer function from an input load disturbance $d$ to the control error $e$. By optimizing these step responses with respect to the PID parameters, we will get a controller that achieves good performance. To promote robustness of the closed loop as well as to limit the amplification of measurement noise in the control signal, we penalize the peak of the sensitivity function $S$ as well as the (approximate) frequency-weighted $H_2$ norm of the transfer function $CS(s)$.

The constraint function constraints enforces the peak of the sensitivity function to be below Msc. Finally, we use Optimization.jl to optimize the cost function and tell it to use ForwardDiff.jl to compute the gradient of the cost function. The optimizer we use in this example is Ipopt.

using Optimization, Statistics, LinearAlgebra
using Ipopt, OptimizationMOI; MOI = OptimizationMOI.MOI

function plot_optimized(P, params, res, systems)
    fig = plot(layout=(1,3), size=(1200,400),
    for (i,params) = enumerate((params, res))
        ls = (i == 1 ? :dash : :solid)
        lab = (i==1 ? "Initial" : "Optimized")
        C, G = systems(params, P)
		r1, r2 = sim(G)
        mag = reshape(bode(G, Ω)[1], 4, :)'[:, [1, 2, 4]]
        plot!([r1, r2]; title="Time response", subplot=1,
            lab = lab .* [" \$r → e\$" " \$d → e\$"], legend=:bottomright, ls,
            fillalpha=0.05, linealpha=0.8, seriestype=:path, c=[1 3])
        plot!(Ω, mag; title="Sensitivity functions \$S(s), CS(s), T(s)\$",
            xscale=:log10, yscale=:log10, subplot=2, lab, ls,
            legend=:bottomright, fillalpha=0.05, linealpha=0.8, c=[1 2 3], linewidth=i)
        nyquistplot!(P*C, Ω; Ms_circles=Msc, sp=3, ylims=(-2.1,1.1), xlims=(-2.1,1.2), lab, seriescolor=i, ls)
    hline!([Msc], l=:dashdot, c=1, subplot=2, lab="Constraint", ylims=(9e-2, Inf))

"A helper function that creates a PID controller and closed-loop transfer functions"
function systemspid(params, P)
    kp,ki,kd,Tf = params # We optimize parameters in
    C    = pid(kp, ki, kd; form=:parallel, Tf, state_space=true)
    G    = extended_gangoffour(P, C) # [S PS; CS T]
    C, G

"A helper function that simulates the closed-loop system"
function sim(G)
    Gd = c2d(G, 0.1, :tustin)   # Discretize the system
    res1 = step(Gd[1, 1], 0:0.1:15) # Simulate S
    res2 = step(Gd[1, 2], 0:0.1:15) # Simulate PS
    res1, res2

"The cost function to optimize"
function cost(params::AbstractVector{T}, (P, systems)) where T
    CSweight = 0.001 # Noise amplification penalty
    C, G = systems(params, P)
    res1, res2 = sim(G)
    R, _ = bodev(G[2, 1], Ω; unwrap=false)
    CS = sum(R .*= Ω)               # frequency-weighted noise sensitivity
    perf = mean(abs2, res1.y .*= res1.t') + mean(abs2, res2.y .*= res2.t')
    return perf + CSweight * CS # Blend all objectives together

"The sensitivity constraint to enforce robustness"
function constraints(res, params::AbstractVector{T}, (P, systems)) where T
    C, G = systems(params, P)
    S, _ = bodev(G[1, 1], Ω; unwrap=false)
    res .= maximum(S) # max sensitivity

Msc = 1.3        # Constraint on Ms

params  = [kp, ki, kd, 0.01] # Initial guess for parameters

solver = Ipopt.Optimizer()
MOI.set(solver, MOI.RawOptimizerAttribute("print_level"), 0)
MOI.set(solver, MOI.RawOptimizerAttribute("max_iter"), 200)
MOI.set(solver, MOI.RawOptimizerAttribute("acceptable_tol"), 1e-1)
MOI.set(solver, MOI.RawOptimizerAttribute("acceptable_constr_viol_tol"), 1e-2)
MOI.set(solver, MOI.RawOptimizerAttribute("acceptable_iter"), 5)
MOI.set(solver, MOI.RawOptimizerAttribute("hessian_approximation"), "limited-memory")

fopt = OptimizationFunction(cost, Optimization.AutoForwardDiff(); cons=constraints)

prob = OptimizationProblem(fopt, params, (P, systemspid);
    lb    = fill(-10.0, length(params)),
    ub    = fill(10.0, length(params)),
    ucons = fill(Msc, 1),
    lcons = fill(-Inf, 1),

res = solve(prob, solver)
plot_optimized(P, params, res.u, systemspid)

The optimized controller achieves more or less the same low peak in the sensitivity function, but does this while both making the step responses significantly faster and using much less controller gain for large frequencies (the orange sensitivity function), an altogether better tuning. The only potentially negative effect of this tuning is that the overshoot in response to a reference step increased slightly, indicated also by the slightly higher peak in the complimentary sensitivity function (green). However, the response to reference steps can (and most often should) be additionally shaped by reference pre-filtering (sometimes referred to as "feedforward" or "reference shaping"), by introducing an additional filter appearing in the feedforward path only, thus allowing elimination of the overshoot without affecting the closed-loop properties.

Optimization-based tuning–LQG controller

We could attempt a similar automatic tuning of an LQG controller. This time, we choose to optimize the weight matrices of the LQR problem and the state covariance matrix of the noise. The synthesis of an LQR controller involves the solution of a Ricatti equation, which in turn involves performing a Schur decomposition. These steps hard hard to differentiate through in a conventional way, but we can make use of implicit differentiation using the implicit function theorem. To do so, we load the package ImplicitDifferentiation, and define the conditions that hold at the solution of the Ricatti equation:

\[A^TX + XA - XBR^{-1}B^T X + Q = 0\]

When ImplicitDifferentiation is loaded, differentiable versions of lqr and kalman that make use of the "implicit function" are automatically loaded.

using ImplicitDifferentiation, ComponentArrays # Both these packages are required to load the implicit differentiation rules

Since this is a SISO system, we do not need to tune the control-input matrix or the measurement covariance matrix, any non-unit weight assigned to those can be associated with the state matrices instead. Since these matrices are supposed to be positive semi-definite, we optimize Cholesky factors rather than the full matrices.

function triangular(x)
    m = length(x)
    n = round(Int, sqrt(2m-1))
    T = zeros(eltype(x), n, n)
    k = 1
    for i = 1:n, j = i:n
        T[i,j] = x[k]
        k += 1
invtriangular(T) = [T[i,j] for i = 1:size(T,1) for j = i:size(T,1)]

function systemslqr(params::AbstractVector{T}, P) where T
    n2 = length(params) ÷ 2
    Qchol = triangular(params[1:n2])
    Rchol = triangular(params[n2+1:2n2])
    Q = Qchol'Qchol
    R = Rchol'Rchol
    L = lqr(P, Q, one(T)*I(1)) # It's important that the last matrix has the correct type
    K = kalman(P, R, one(T)*I(1))
    C = observer_controller(P, L, K)
    G = extended_gangoffour(P, C) # [S PS; CS T]
    C, G

Q0 = diagm([1.0, 1, 1, 1]) # Initial guess LQR state penalty
R0 = diagm([1.0, 1, 1, 1]) # Initial guess Kalman state covariance
params2 = [invtriangular(cholesky(Q0).U); invtriangular(cholesky(R0).U)]

prob2 = OptimizationProblem(fopt, params2, (P, systemslqr);
    lb    = fill(-10.0, length(params2)),
    ub    = fill(10.0, length(params2)),
    ucons = fill(Msc, 1),
    lcons = fill(-Inf, 1),

res2 = solve(prob2, solver)
plot_optimized(P, params2, res2.u, systemslqr)

This controller should perform better than the PID controller, which is known to be incapable of properly damping the resonance in a double-mass system. However, we did not include any integral action in the LQG controller, which has implication for the disturbance response, as indicated by the steady-state error in the green step response in the simulation above.

Robustness analysis

To check the robustness of the designed LQG controller w.r.t. parametric uncertainty in the plant, we load the package MonteCarloMeasurements and recreate the plant model with 20% uncertainty in the spring coefficient.

using MonteCarloMeasurements
Pu = DemoSystems.double_mass_model(k = Particles(32, Uniform(80, 120))) # Create a model with uncertainty in spring stiffness k ~ U(80, 120)
unsafe_comparisons(true) # For the Bode plot to work

C,_ = systemslqr(res2.u, P)             # Get the controller assuming P without uncertainty
Gu = extended_gangoffour(Pu, C)     # Form the gang-of-four with uncertainty
w = exp10.(LinRange(-1.5, 2, 500))
bodeplot(Gu, w, plotphase=false, ri=false, N=32, ylims=(1e-1, 30), layout=1, sp=1, c=[1 2 4 3], lab=["S" "CS" "PS" "T"])
hline!([Msc], l=:dashdot, c=1, lab="Constraint", ylims=(9e-2, Inf))