Automatic Differentiation

In Julia, it is often possible to automatically compute derivatives, gradients, Jacobians and Hessians of arbitrary Julia functions with precision matching the machine precision, that is, without the numerical inaccuracies incurred by finite-difference approximations.

Two general methods for automatic differentiation are available: forward and reverse mode. Forward mode is algorithmically more favorable for functions with few inputs but many outputs, while reverse mode is more efficient for functions with many parameters but few outputs (like in deep learning). In Julia, forward-mode AD is provided by the package ForwardDiff.jl, while reverse-mode AD is provided by several different packages, such as Zygote.jl and ReverseDiff.jl. Forward-mode AD generally has a lower overhead than reverse-mode AD, so for functions of a small number of parameters, say, less than about 10 or 100, forward-mode is usually most efficient. ForwardDiff.jl also has support for differentiating most of the Julia language, making the probability of success higher than for other packages, why we generally recommend trying ForwardDiff.jl first.

Linearizing nonlinear dynamics

Nonlinear dynamics on the form

\[\begin{aligned} \dot x &= f(x, u) \\ y &= g(x, u) \end{aligned}\]

is easily linearized in the point $x_0, u_0$ using ForwardDiff.jl:

using ControlSystemsBase, ForwardDiff

"An example of nonlinear dynamics"
function f(x, u)
    x1, x2 = x
    u1, u2 = u
    [x2; u1*x1 + u2*x2]

x0 = [1.0, 0.0] # Operating point to linearize around
u0 = [0.0, 1.0]

A = ForwardDiff.jacobian(x -> f(x, u0), x0)
B = ForwardDiff.jacobian(u -> f(x0, u), u0)

"An example of a nonlinear output (measurement) function"
function g(x, u)
    y = [x[1] + 0.1x[1]*u[2]; x[2]]

C = ForwardDiff.jacobian(x -> g(x, u0), x0)
D = ForwardDiff.jacobian(u -> g(x0, u), u0)

linear_sys = ss(A, B, C, D)
StateSpace{Continuous, Float64}
A = 
 0.0  1.0
 0.0  1.0
B = 
 0.0  0.0
 1.0  0.0
C = 
 1.1  0.0
 0.0  1.0
D = 
 0.0  0.1
 0.0  0.0

Continuous-time state-space model

The example above linearizes f in the point $x_0, u_0$ to obtain the linear statespace matrices $A$ and $B$, and linearizes g to obtain the linear output matrices $C$ and $D$. Instead of manually calling ForwardDiff.jl to linearize the dynamics, the user may call the function ControlSystemsBase.linearize which includes the necessary calls to ForwardDiff.jl.

Optimization-based tuning–PID controller

This example will demonstrate simple usage of AD using ForwardDiff.jl for optimization-based auto tuning of a PID controller.

The system we will control is a double-mass system, in which two masses (or inertias) are connected by a flexible transmission.

We start by defining the system model and an initial guess for the PID controller parameters

using ControlSystemsBase, ForwardDiff, Plots

P = DemoSystems.double_mass_model()

bodeplot(P, title="Bode plot of Double-mass system \$P(s)\$")
Example block output
Ω = exp10.(-2:0.03:3)
kp,ki,kd,Tf =  1, 0.1, 0.1, 0.01 # controller parameters

C  = pid(kp, ki, kd; Tf, form=:parallel, state_space=true) # Construct a PID controller with filter
G  = feedback(P*C) # Closed-loop system
S  = 1/(1 + P*C)   # Sensitivity function
Gd = c2d(G, 0.1)   # Discretize the system
res = step(Gd,15)  # Step-response

mag = bodev(S, Ω)[1]
plot(res, title="Time response", layout = (1,3), legend=:bottomright)
plot!(Ω, mag, title="Sensitivity function", xscale=:log10, yscale=:log10, subplot=2, legend=:bottomright, ylims=(3e-2, Inf))
Ms, _ = hinfnorm(S)
hline!([Ms], l=(:black, :dash), subplot=2, lab="\$M_S = \$ $(round(Ms, digits=3))", sp=2)
nyquistplot!(P*C, Ω, sp=3, ylims=(-2.1,1.1), xlims=(-2.1,1.2), size=(1200,400))
Example block output

The initial controller $C$ achieves a maximum peak of the sensitivity function of $M_S = 1.3$ which implies a rather robust tuning, but the step response is sluggish. We will now try to optimize the controller parameters to achieve a better performance.

We start by defining a helper function plot_optimized that will evaluate the performance of the tuned controller. We then define a function systems that constructs the gang-of-four transfer functions (extended_gangoffour) and performs time-domain simulations of the transfer functions $S(s)$ and $P(s)S(s)$, i.e., the transfer functions from reference $r$ to control error $e$, and the transfer function from an input load disturbance $d$ to the control error $e$. By optimizing these step responses with respect to the PID parameters, we will get a controller that achieves good performance. To promote robustness of the closed loop as well as to limit the amplification of measurement noise in the control signal, we penalize the peak of the sensitivity function $S$ as well as the (approximate) frequency-weighted $H_2$ norm of the transfer function $CS(s)$.

The constraint function constraints enforces the peak of the sensitivity function to be below Msc. Finally, we use Optimization.jl to optimize the cost function and tell it to use ForwardDiff.jl to compute the gradient of the cost function. The optimizer we use in this example is Ipopt.

using Optimization, Statistics, LinearAlgebra
using Ipopt, OptimizationMOI; MOI = OptimizationMOI.MOI

function plot_optimized(P, params, res, systems)
    fig = plot(layout=(1,3), size=(1200,400),
    for (i,params) = enumerate((params, res))
        ls = (i == 1 ? :dash : :solid)
        lab = (i==1 ? "Initial" : "Optimized")
        C, G = systems(params, P)
		r1, r2 = sim(G)
        mag = reshape(bode(G, Ω)[1], 4, :)'[:, [1, 2, 4]]
        plot!([r1, r2]; title="Time response", subplot=1,
            lab = lab .* [" \$r → e\$" " \$d → e\$"], legend=:bottomright, ls,
            fillalpha=0.05, linealpha=0.8, seriestype=:path, c=[1 3])
        plot!(Ω, mag; title="Sensitivity functions \$S(s), CS(s), T(s)\$",
            xscale=:log10, yscale=:log10, subplot=2, lab, ls,
            legend=:bottomright, fillalpha=0.05, linealpha=0.8, c=[1 2 3], linewidth=i)
        nyquistplot!(P*C, Ω; Ms_circles=Msc, sp=3, ylims=(-2.1,1.1), xlims=(-2.1,1.2), lab, seriescolor=i, ls)
    hline!([Msc], l=:dashdot, c=1, subplot=2, lab="Constraint", ylims=(9e-2, Inf))

"A helper function that creates a PID controller and closed-loop transfer functions"
function systemspid(params, P)
    kp,ki,kd,Tf = params # We optimize parameters in
    C    = pid(kp, ki, kd; form=:parallel, Tf, state_space=true)
    G    = extended_gangoffour(P, C) # [S PS; CS T]
    C, G

"A helper function that simulates the closed-loop system"
function sim(G)
    Gd = c2d(G, 0.1, :tustin)   # Discretize the system
    res1 = step(Gd[1, 1], 0:0.1:15) # Simulate S
    res2 = step(Gd[1, 2], 0:0.1:15) # Simulate PS
    res1, res2

"The cost function to optimize"
function cost(params::AbstractVector{T}, (P, systems)) where T
    CSweight = 0.001 # Noise amplification penalty
    C, G = systems(params, P)
    res1, res2 = sim(G)
    R, _ = bodev(G[2, 1], Ω; unwrap=false)
    CS = sum(R .*= Ω)               # frequency-weighted noise sensitivity
    perf = mean(abs2, res1.y .*= res1.t') + mean(abs2, res2.y .*= res2.t')
    return perf + CSweight * CS # Blend all objectives together

"The sensitivity constraint to enforce robustness"
function constraints(res, params::AbstractVector{T}, (P, systems)) where T
    C, G = systems(params, P)
    S, _ = bodev(G[1, 1], Ω; unwrap=false)
    res .= maximum(S) # max sensitivity

Msc = 1.3        # Constraint on Ms

params  = [kp, ki, kd, 0.01] # Initial guess for parameters

solver = Ipopt.Optimizer()
MOI.set(solver, MOI.RawOptimizerAttribute("print_level"), 0)
MOI.set(solver, MOI.RawOptimizerAttribute("max_iter"), 200)
MOI.set(solver, MOI.RawOptimizerAttribute("acceptable_tol"), 1e-1)
MOI.set(solver, MOI.RawOptimizerAttribute("acceptable_constr_viol_tol"), 1e-2)
MOI.set(solver, MOI.RawOptimizerAttribute("acceptable_iter"), 5)
MOI.set(solver, MOI.RawOptimizerAttribute("hessian_approximation"), "limited-memory")

fopt = OptimizationFunction(cost, Optimization.AutoForwardDiff(); cons=constraints)

prob = OptimizationProblem(fopt, params, (P, systemspid);
    lb    = fill(-10.0, length(params)),
    ub    = fill(10.0, length(params)),
    ucons = fill(Msc, 1),
    lcons = fill(-Inf, 1),

res = solve(prob, solver)
plot_optimized(P, params, res.u, systemspid)
Example block output

The optimized controller achieves more or less the same low peak in the sensitivity function, but does this while both making the step responses significantly faster and using much less controller gain for large frequencies (the orange sensitivity function), an altogether better tuning. The only potentially negative effect of this tuning is that the overshoot in response to a reference step increased slightly, indicated also by the slightly higher peak in the complimentary sensitivity function (green). However, the response to reference steps can (and most often should) be additionally shaped by reference pre-filtering (sometimes referred to as "feedforward" or "reference shaping"), by introducing an additional filter appearing in the feedforward path only, thus allowing elimination of the overshoot without affecting the closed-loop properties.

Optimization-based tuning–LQG controller

We could attempt a similar automatic tuning of an LQG controller. This time, we choose to optimize the weight matrices of the LQR problem and the state covariance matrix of the noise. The synthesis of an LQR controller involves the solution of a Ricatti equation, which in turn involves performing a Schur decomposition. These steps hard hard to differentiate through in a conventional way, but we can make use of implicit differentiation using the implicit function theorem. To do so, we load the package ImplicitDifferentiation, and define the conditions that hold at the solution of the Ricatti equation:

\[A^TX + XA - XBR^{-1}B^T X + Q = 0\]

When ImplicitDifferentiation is loaded, differentiable versions of lqr and kalman that make use of the "implicit function" are automatically loaded.

using ImplicitDifferentiation, ComponentArrays # Both these packages are required to load the implicit differentiation rules

Since this is a SISO system, we do not need to tune the control-input matrix or the measurement covariance matrix, any non-unit weight assigned to those can be associated with the state matrices instead. Since these matrices are supposed to be positive semi-definite, we optimize Cholesky factors rather than the full matrices.

function triangular(x)
    m = length(x)
    n = round(Int, sqrt(2m-1))
    T = zeros(eltype(x), n, n)
    k = 1
    for i = 1:n, j = i:n
        T[i,j] = x[k]
        k += 1
invtriangular(T) = [T[i,j] for i = 1:size(T,1) for j = i:size(T,1)]

function systemslqr(params::AbstractVector{T}, P) where T
    n2 = length(params) ÷ 2
    Qchol = triangular(params[1:n2])
    Rchol = triangular(params[n2+1:2n2])
    Q = Qchol'Qchol
    R = Rchol'Rchol
    L = lqr(P, Q, one(T)*I(1)) # It's important that the last matrix has the correct type
    K = kalman(P, R, one(T)*I(1))
    C = observer_controller(P, L, K)
    G = extended_gangoffour(P, C) # [S PS; CS T]
    C, G

Q0 = diagm([1.0, 1, 1, 1]) # Initial guess LQR state penalty
R0 = diagm([1.0, 1, 1, 1]) # Initial guess Kalman state covariance
params2 = [invtriangular(cholesky(Q0).U); invtriangular(cholesky(R0).U)]

prob2 = OptimizationProblem(fopt, params2, (P, systemslqr);
    lb    = fill(-10.0, length(params2)),
    ub    = fill(10.0, length(params2)),
    ucons = fill(Msc, 1),
    lcons = fill(-Inf, 1),

res2 = solve(prob2, solver)
plot_optimized(P, params2, res2.u, systemslqr)
Example block output

This controller should perform better than the PID controller, which is known to be incapable of properly damping the resonance in a double-mass system. However, we did not include any integral action in the LQG controller, which has implication for the disturbance response, as indicated by the steady-state error in the green step response in the simulation above.

Robustness analysis

To check the robustness of the designed LQG controller w.r.t. parametric uncertainty in the plant, we load the package MonteCarloMeasurements and recreate the plant model with 20% uncertainty in the spring coefficient.

using MonteCarloMeasurements
Pu = DemoSystems.double_mass_model(k = Particles(32, Uniform(80, 120))) # Create a model with uncertainty in spring stiffness k ~ U(80, 120)
unsafe_comparisons(true) # For the Bode plot to work

C,_ = systemslqr(res2.u, P)             # Get the controller assuming P without uncertainty
Gu = extended_gangoffour(Pu, C)     # Form the gang-of-four with uncertainty
w = exp10.(LinRange(-1.5, 2, 500))
bodeplot(Gu, w, plotphase=false, ri=false, N=32, ylims=(1e-1, 30), layout=1, sp=1, c=[1 2 4 3], lab=["S" "CS" "PS" "T"])
hline!([Msc], l=:dashdot, c=1, lab="Constraint", ylims=(9e-2, Inf))
Example block output

The uncertainty in the spring stiffness caused an uncertainty in the resonant peak in the sensitivity functions, it's a good thing that we designed a controller that was conservative with a large margin (small $M_S$) so that all the plausible variations of the plant are expected to behave reasonably well:

Gd   = c2d(Gu, 0.05)   # Discretize the system
r1 = step(Gd[1,1], 0:0.05:15) # Simulate S
r2 = step(Gd[1,2], 0:0.05:15) # Simulate PS
plot([r1, r2]; title="Time response",
            lab = [" \$r → e\$" " \$d → e\$"], legend=:bottomright,
            fillalpha=0.05, linealpha=0.8, seriestype=:path, c=[1 3], ri=false, N=32)
Example block output

Parameterizing the controller using feedback gains

For completeness, lets also parameterize the observer-based state-feedback controller using the gain matrices directly, that is, we search directly over $L$ and $K$. This is typically a harder problem since the search space contains non-stabilizing controllers, and the set of stabilizing gains is non-convex. (For state feedback, a nice theoretical result exists that says that there are no local minima, but the space of stabilizing gains is still non-convex.)

function systems_sf(params::AbstractVector{T}, P) where T
    n2 = length(params) ÷ 2
    L = params[1:n2]'
    K = params[n2+1:2n2, 1:1]
    C = observer_controller(P, L, K)
    G = extended_gangoffour(P, C) # [S PS; CS T]
    C, G

L0 = lqr(P, Q0, I) # Initial guess
K0 = kalman(P, R0, I)
params3 = [vec(L0); vec(K0)]
prob3 = OptimizationProblem(fopt, params3, (P, systems_sf);
    lb    = fill(-15.0, length(params3)),
    ub    = fill(15.0, length(params3)),
    ucons = fill(Msc, 1),
    lcons = fill(-Inf, 1),
res3 = solve(prob3, solver)
plot_optimized(P, params3, res3.u, systems_sf)
Example block output

Known limitations

The following issues are currently known to exist when using AD through ControlSystems.jl:


ForwardDiff.jl works for a lot of workflows without any intervention required from the user. The following known limitations exist:

  • The function c2d with the default :zoh discretization method makes a call to LinearAlgebra.exp!, which is not defined for ForwardDiff.Dual numbers. A forward rule for this function exist in ChainRules, which can be enabled using ForwardDiffChainRules.jl, but this PR must be merged and released before it will work as intended. A workaround is to use the :tustin method instead, or manually defining this method.
  • The function svdvals does not have a forward rule defined. This means that the functions sigma and opnorm will not work for MIMO systems with ForwardDiff. SISO, MISO and SIMO systems will, however, work.
  • hinfnorm requires ImplicitDifferentiation.jl and ComponentArrays.jl to be manually loaded by the user, after which there are implicit differentiation rules defined for hinfnorm. The implicit rule calls opnorm, and is thus affected by the first limitation above for MIMO systems. hinfnorm has a reverse rule defined in RobustAndOptimalControl.jl, which is not affected by this limitation.
  • are, lqr and kalman all require ImplicitDifferentiation.jl and ComponentArrays.jl to be manually loaded by the user, after which there are implicit differentiation rules defined. To invoke the correct method of these functions, it is important that the second matrix (corresponding to input or measurement) has the Dual number type, i.e., the R matrix in lqr(P, Q, R) or lqr(Continuous, A, B, Q, R)
  • The schur factorization has an implicit differentiation rule defined, but the companion function ordschur does not. This is the fundamental reason for requiring ImplicitDifferentiation.jl to differentiate through the Ricatti equation solver. schur is called in several additional places, including balreal and all lyap solvers. Many of these algorithms also call givensAlgorithm which has no rule either.
  • An implicit rule is defined for continuous-time lyap and plyap solvers, but not yet for discrete-time solvers. This means that gram covar and norm ($H_2$-norm) is differentiable for continuous-time systems but not for discrete.

Reverse-mode AD

  • Zygote does not work very well at all, due to
    • Frequent use of mutation for performance
    • Try/catch blocks