
Numerical Solutions of Differential Equations

Whilst analytical methods give exact solutions to differential equations, they can be tedious to apply and very quickly become intractable. When a problem is intractable analytically, we have to use a numerical method to approximate its solution.

Numerical methods introduce sources of error into the solution, which must be controlled. These include truncation and round-off errors, which compound over many iterations.

To counter this, we balance the different sources of error (truncation against round-off) in numerical integration.

Euler Method

Many differential equations cannot be solved exactly, or the analytical solution is very complicated. The idea behind the Euler method is to reverse the derivation we used for the Malthusian model: there, a difference equation became a differential equation in the limit \(\Delta t \to 0\); here, we approximate the differential equation by a difference equation with a small but finite \(\Delta t\).

In numerical schemes, we find difference equations with discrete time update steps that approximate the derivatives appearing in the continuous differential equations.

Initial Value Problem

In general, we have the initial value problem, of the form:

\[ \frac{dy}{dt} = f(y,t), y(0) = y_0 \]

From first principles, we recall that a derivative of the function \(y\) with respect to \(t\) is:

\[ \frac{dy}{dt} = \lim_{\Delta t \to 0} \frac{y(t+\Delta t) - y(t)}{\Delta t} \]

Therefore, for sufficiently small \(\Delta t\), we can say:

\[ \frac{dy}{dt} \approx \frac{y(t+\Delta t) - y(t)}{\Delta t} \]

To then put all the steps together, we can take the function \(f(y,t)\) and give an approximation of the derivative as:

\[ f(y,t) = \frac{dy}{dt} \approx \frac{y(t+\Delta t) - y(t)}{\Delta t} \]

From this, we find that \(y(t + \Delta t) \approx y(t) + \Delta t f (y,t)\).

We can then discretize the interval \([0, T]\) into \(N+1\) sub-intervals of length \(h\), giving a sequence of points \(y_0\) to \(y_{N+1}\) at times \(t_0=0, t_1 = h, t_2 = 2h, \dots, t_i = ih, \dots, t_{N+1} = T\).

We have then derived Euler's formula to approximate the solutions, which is:

\[ y_{n+1} = y_n + hf(y_n,t_n) \]

Example: Malthusian Growth

Recall the equation for Malthusian growth:

\[ \frac{dP}{dt} = rP, P(0) = P_0 \]

This has the analytical solution:

\[ P(t) = P_0 e^{rt} \]

Applying Euler's method gives \(P_{n+1} = P_n + hrP_n\), with \(t_n = hn\) for all \(n \in \{0,1,\dots\}\).

We can then chain this back until we reach the initial condition:

\[ \begin{align} P_{n+1} &= (1+hr)P_n\\ &= (1+hr)^2P_{n-1}\\ &= (1+hr)^3P_{n-2}\\ &= (1+hr)^{n+1}P_{0} \end{align} \]

In the general case, then, we end up with \(P_n = (1+hr)^n P_0\), and the Euler approximation as a function of time:

\[ P^{\text{Euler}}(t) = (1 + hr)^{t/h} P_0 \]

Here, the \(t/h\) comes from the fact that \(t\) is divided into \(n\) steps of size \(h\); as \(h\) gets smaller and smaller, we get closer and closer to the analytical solution.
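As a quick sanity check, here is a minimal C sketch (the parameter values \(r = 0.2\), \(P_0 = 50\), \(h = 0.1\) are chosen purely for illustration) comparing the closed-form Euler expression with the analytical solution:

#include <stdio.h>
#include <math.h>

/* Compare the closed-form Euler expression (1 + h*r)^(t/h) * P0
   with the analytical solution P0 * exp(r*t).
   Parameter values below are illustrative only. */
int main(void) {
  double r = 0.2, P0 = 50.0, h = 0.1;
  for (double t = 0.0; t <= 10.0; t += 2.0) {
    double euler = P0 * pow(1.0 + h * r, t / h);   /* P^Euler(t) */
    double exact = P0 * exp(r * t);                /* P(t)       */
    printf("t=%4.1f  Euler=%9.4f  exact=%9.4f  error=%8.4f\n",
           t, euler, exact, exact - euler);
  }
  return 0;
}

Shrinking \(h\) brings the two columns together, as claimed above.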

Euler's method is only exact in the limit of infinitely small step sizes. For a finite step size, we will always have some error. The larger the step size, the higher the deviation is likely to be.

For simple functions, e.g., Malthusian growth, the Euler method works well up to \(t \approx 5\), at which point it begins to underestimate. For a linear function, the error is zero.

The slides present some graphs for \(\frac{dP}{dt} = 0.2P, P_0 = 50, h=0.1\) on the left, and for Malthusian growth on the right.

Sample Code

We can write some simple C code implementing the Euler scheme for the example \(\frac{dy}{dx} = x + y\), reading in \(x_0,y_0,h,x_n\):

#include <stdio.h>

/* Right-hand side f(x, y) of the ODE dy/dx = x + y. */
float fun(float x, float y) {
  return x + y;
}

int main(void) {
  float a, b, x, y, h, t, k;

  printf("\nEnter x0, y0, h, xn: ");
  scanf("%f%f%f%f", &a, &b, &h, &t);

  x = a; y = b;
  printf("\n x\t y\n");

  while (x <= t) {
    k = h * fun(x, y);   /* Euler increment h*f(x_n, y_n) */
    y = y + k;           /* y_{n+1} = y_n + h*f(x_n, y_n) */
    x = x + h;
    printf("%0.3f\t%0.3f\n", x, y);
  }
  return 0;
}

Error

The Euler scheme has what is known as a truncation error. The local error comes from approximating the solution by the first order difference quotient:

\[ y(t+h) \approx \underbrace{y(t)}_{\text{current value}} + \underbrace{hy'}_{\text{Euler step}} \]

with the neglected higher order terms acting as the error source.

If we look back to the Taylor series, we can see that a local error is introduced at every step, and this error is \(\propto h^2\):

\[ y(t+h) = y(t) + hy' + \frac{h^2}{2!}y'' + \frac{h^3}{3!} y'''+ \cdots \]

This error may not seem that large, but as we compound calculations, we get a bigger global error.

Similar to how the Taylor series could use the central difference to obtain an error an order of magnitude lower, we can ask what order the Euler scheme is. To integrate over a time \(t\), we need \(t/h\) steps, and at each step we accumulate a local error \(\propto h^2\). Multiplying the error per step by the number of steps gives:

\[ h^2 \frac{t}{h} = th \]

as the overall error, which is proportional to \(h\); the Euler scheme is therefore first order. The accuracy of the scheme can be improved by decreasing the step length \(h\).

Truncation and Round-off Errors

So far, we have discussed what are known as truncation errors. These are the errors inherent in approximating an exact mathematical procedure, e.g., truncating the Taylor expansion of a function.

In contrast, a round-off error is one that occurs because numbers are stored with a limited number of significant digits, so exact numbers can only be represented approximately.

Floating Point Representations (IEEE 754)

Computers cannot store numbers to an infinite precision. Therefore, when a computer has to store some arbitrary number, it does so using a floating point representation. This representation is of the form:

\[ N = mb^e \]

where \(N\) is the number to store, \(m\) is the mantissa (the fractional part), \(b\) is the base of the number system, and \(e\) is the exponent.

This system allows us to represent any number to some precision value, yet still store it using an efficient number of bits. Floating point representations do a good job of storing the most significant parts of the number but always round them off to some degree.

If we take, for example, the fraction \(\frac{1}{27}\), we know the solution for this is \(0.037037037\dots\), but at some point the computer cannot store the remaining digits. Therefore, we have to round off the next term in the sequence and store it at the limit of our precision.

If we use 4 digits, we could store this as \(0.0370 \times 10^0\), or if we normalize \(m\) to be in the range \((1/b,1)\) then we could store it as \(0.3703 \times 10^{-1}\).

The round-off error also increases with the magnitude of \(x\): for example, \(0.3516\times10^4\) has a possible \(\Delta x = 1\), whereas \(0.3516\times 10^0\) only has a possible \(\Delta x = 0.0001\).

As an aside: the mantissa is simply the fractional part that we multiply by the base raised to the exponent to recover the actual number.

We can visualize the error of adding the two numbers \(2.365\) and \(0.01234\) together, keeping four significant digits:

\[ \begin{align} & 0.2365 & \times 10^1\\ + \quad& 0.001234 & \times 10^1\\ = \quad& 0.237734 & \times 10^1\\ \approx \quad& 0.2377 &\times 10 ^1 \end{align} \]

As we can see here, the last two digits have now been lost. These errors really matter when we are adding large and small numbers together, as in the example above, or when we subtract two nearly equal numbers.
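A minimal C sketch makes this visible: adding a small number to a large one in single precision (roughly 7 significant digits) loses the smaller addend entirely, while double precision keeps it. The specific values are illustrative.

#include <stdio.h>

/* Adding a small number to a large one: in single precision the small
   addend is rounded away entirely, while double precision keeps it. */
int main(void) {
  float  big_f = 1.0e8f;
  double big_d = 1.0e8;
  printf("float : %.1f\n", (double)(big_f + 1.0f)); /* prints 100000000.0 */
  printf("double: %.1f\n", big_d + 1.0);            /* prints 100000001.0 */
  return 0;
}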

The most important thing in simulation modelling, though, is that the errors compound over time. As simulations typically run for thousands of iterations, the errors really start to add up.

The total error is the sum of the truncation error and all the round off errors.

Mitigations

Whilst errors are annoying to deal with, they can be mitigated somewhat by using a higher precision format, but this comes with a processing and storage penalty.

Sometimes there are mathematical techniques that can be used to reduce round-off error, for example compensated (Kahan) summation, as sketched below.
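As a sketch of the idea, Kahan summation keeps a running compensation term that recovers the low-order digits lost in each addition. The example below (summing \(0.1\) ten million times in single precision, an illustrative choice) shows the naive sum drifting while the compensated sum stays close to the true value:

#include <stdio.h>

/* Kahan (compensated) summation in single precision: the compensation
   term c recovers the low-order digits lost in each addition. */
float kahan_sum(float value, long n) {
  float sum = 0.0f, c = 0.0f;
  for (long i = 0; i < n; i++) {
    float y = value - c;   /* correct the next addend            */
    float t = sum + y;     /* low-order digits of y may be lost  */
    c = (t - sum) - y;     /* recover what was lost              */
    sum = t;
  }
  return sum;
}

int main(void) {
  long n = 10000000;                 /* sum 0.1 ten million times */
  float naive = 0.0f;
  for (long i = 0; i < n; i++) naive += 0.1f;
  printf("naive: %.1f\n", naive);              /* drifts well away from 1e6 */
  printf("kahan: %.1f\n", kahan_sum(0.1f, n)); /* stays close to 1e6        */
  return 0;
}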

Categorising Round-off Errors in the Euler Scheme

We define \(\epsilon\) as the machine precision. At each step \(n\) of the Euler scheme, the rounding error is \(\epsilon y_n\). In \(N\) steps, the error is approximately \(N\epsilon y_0\), and this is true if all errors have the same sign (e.g., all \(<0\) or all \(>0\)).

Round-off errors commonly have different signs, so they partially cancel and the total error usually scales as \(\sqrt{N} \epsilon y_0\).

Numerical Stability

The stability of a numerical solution may differ from the stability of the analytical solution. This can be the case even for Malthusian growth, which we might have expected to be an easy function to approximate numerically.

If we have some \(r<0\), i.e., the population is shrinking, the analytical solution simply decays towards zero for any such \(r\). The numerical method, however, must take some step size \(h\). If this step size is such that \(|hr| < 1\), the solution remains stable, but for \(|hr| > 1\) the numerical behaviour of the system differs from the true solution.

For the equation \(\frac{dx}{dt} = -2.3x, x(0)=1\), we obtain the following plot:

We can see that for sufficiently small \(h\) the solution remains stable, while for \(h=1\) it becomes unstable; for this equation the stability condition \(|1-2.3h| < 1\) requires \(h < 2/2.3 \approx 0.87\).
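A small C sketch (the step sizes are illustrative choices) reproduces this behaviour: with a small step the iterates decay like the true solution, while with \(h = 1\) the factor \(1 + h(-2.3) = -1.3\) has magnitude greater than 1 and the iterates blow up.

#include <stdio.h>

/* Forward Euler for dx/dt = -2.3x, x(0) = 1, with two step sizes. */
static void euler(double h, double t_end) {
  double x = 1.0;
  for (double t = 0.0; t < t_end; t += h)
    x += h * (-2.3 * x);          /* x_{n+1} = (1 - 2.3h) x_n */
  printf("h = %.2f  ->  x(%.0f) ~ %g\n", h, t_end, x);
}

int main(void) {
  euler(0.25, 10.0);  /* stable: decays towards zero   */
  euler(1.0,  10.0);  /* unstable: grows and oscillates */
  return 0;
}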

When is a Solution Stable?

We can categorise a solution as numerically stable if a small deviation from the true solution doesn't tend to grow as the solution is iterated.

If we have a small deviation from the exact value of \(\delta y\) at some time, we want to see how this error will propagate. We start with the Euler scheme:

Not fully understood

I don't fully get how this derivation works.

\[ \begin{align} y_{n+1} &= y_n + hf(y_n,t_n)\\ y_{n+1} + \delta y_{n+1} &= y_n + \delta y_n + hf(y_n + \delta y_n, t_n)\\ y_{n+1} + \delta y_{n+1} &= y_n + \delta y_n + \underbrace{hf(y_n,t_n) + hf_y \delta y_n + \text{higher order terms}}_\text{Taylor series expansion}\\ \delta y_{n+1} &= \delta y_n(1+hf_y) \end{align} \]

Here the second line applies the scheme to the perturbed value \(y_n + \delta y_n\), the third expands \(f(y_n + \delta y_n, t_n)\) as a Taylor series in \(\delta y_n\) (with \(f_y = \partial f/\partial y\)), and subtracting the first line from the third (dropping the higher order terms) gives the last line.

We can say that the Euler scheme is numerically stable if \(|1+hf_y| < 1\). We can monitor this quantity to ensure that the chosen step size is small enough. If \(y\) is complex, the condition instead defines a region of stability in the complex \((\Re,\Im)\) plane, rather than a 1-dimensional bound.

Backwards Euler

So far, we have always taken \(y_n\) and integrated forward to get \(y_{n+1}\). If we instead evaluate the slope not at the first point of the interval but at the end point, we get the backwards scheme:

\[ \begin{align} y_{n+1} &= y_n + hf(y_{n\textcolor{red}{+1}}, t_n + h) \text{ as opposed to}\\ y_{n+1} &= y_n + hf(y_{n}, t_n) \end{align} \]

The backwards scheme is different because \(y_{n+1}\) appears on both the LHS and the RHS of the equation. We therefore call it an implicit method: to compute the next step, we have to solve an implicit equation.

Implicit methods are more computationally expensive to run, as we need to solve the implicit equation at every step, for example with the Newton-Raphson method. Using the forward Euler scheme, we always take the slope at the start of the interval, which for a function with increasing slope (such as exponential growth) will cause us to underestimate it.

In contrast, for the same function, the backwards scheme will always overestimate.

For backwards Euler, we need some trick to calculate the next step, because it is an implicit equation and so can't simply be evaluated. Computationally, it is much more expensive to run, but in return we get a much larger region of stability, as shown by the grey area on the plot:

In contrast to forward Euler, where the stable region is much smaller, the backwards scheme is stable everywhere except for a small area.
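To make "solving an implicit equation" concrete, here is a minimal C sketch of backwards Euler for the scalar test problem \(dy/dt = -2.3y\) (an illustrative choice), solving the implicit equation at each step with a few Newton-Raphson iterations; the function names and iteration count are arbitrary choices for this sketch. With \(h = 1\), forward Euler blows up, but this scheme decays towards zero as the true solution does.

#include <stdio.h>

/* Test problem dy/dt = f(y, t) = -2.3 y, together with df/dy. */
static double f(double y, double t)  { (void)t; return -2.3 * y; }
static double fy(double y, double t) { (void)y; (void)t; return -2.3; }

/* One backwards Euler step: solve g(z) = z - y - h*f(z, t+h) = 0 for
   z = y_{n+1} with a few Newton-Raphson iterations. */
static double backward_euler_step(double y, double t, double h) {
  double z = y + h * f(y, t);               /* forward Euler as initial guess */
  for (int k = 0; k < 10; k++) {
    double g  = z - y - h * f(z, t + h);
    double dg = 1.0 - h * fy(z, t + h);
    z -= g / dg;                            /* Newton update */
  }
  return z;
}

int main(void) {
  double y = 1.0, t = 0.0, h = 1.0;         /* a step size that breaks forward Euler */
  for (int n = 0; n < 10; n++) { y = backward_euler_step(y, t, h); t += h; }
  printf("backwards Euler, h = %.1f: y(%.0f) ~ %g\n", h, t, y);
  return 0;
}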

Explicit vs Implicit

Explicit schemes such as forwards Euler compute the future state based on past and present information only. They are easy to implement, but to avoid instability, small time steps are required.

In contrast, backwards Euler computes the future state based on past, present and future information. Implicit schemes are more difficult to program and require more memory, but they are stable over large time steps and, depending on the problem, may even save CPU time.

Improved Euler Scheme (Heun Scheme)

Instead of taking the estimate at the first interval, we take the derivative using the central difference scheme. Instead of using the point at the start of the interval and taking the derivative there, we first have to calculate the centre point.

To calculate the centre point, we'd normally need the next part of the interval, which we don't yet know (this is what we are trying to calculate). We can, however, make a prediction of where this point will be if we use a guiding line. We can define the two guiding slopes as:

\[ \begin{align} s_1 &= f(y_n, t_n)\\ s_2 &= f(y_n + hf(y_n,t_n), t_n + h) \end{align} \]

We can then take the average slope from the two slopes. This type of method is called a predictor-corrector method, as we first make a prediction, then take the real step based on the prediction slopes.

To average \(s_{1,2}\), we take \(s = 1/2(s_1 + s_2)\):

\[ \begin{align} y_{n+1} &= y_n + h/2(s_1 + s_2)\\ &= y_n + h/2\left(f(y_n,t_n) + f(y_n + hf(y_n,t_n), t_n + h)\right) \end{align} \]

This new method is second order, so it has a local error \(\propto h^3\) and a global error \(\propto h^2\).
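A minimal C sketch of the Heun predictor-corrector step, again using the Malthusian test problem from earlier as an illustrative right-hand side (the parameter values are arbitrary):

#include <stdio.h>
#include <math.h>

/* Right-hand side of the test problem dP/dt = 0.2 P. */
static double f(double y, double t) { (void)t; return 0.2 * y; }

int main(void) {
  double h = 0.1, y = 50.0, t = 0.0;      /* illustrative values */
  while (t < 10.0) {
    double s1 = f(y, t);                  /* slope at the start of the interval   */
    double s2 = f(y + h * s1, t + h);     /* slope at the predicted end point     */
    y += 0.5 * h * (s1 + s2);             /* corrector: average of the two slopes */
    t += h;
  }
  printf("Heun: y(%.1f) ~ %.4f   exact: %.4f\n", t, y, 50.0 * exp(0.2 * t));
  return 0;
}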

Order

On the LHS of the equation, we always have \(y(t+\Delta t)\). On the RHS, we have the terms according to the method that we use.

To compare the order of the methods, we expand both sides as Taylor series in \(\Delta t\) and compare them order by order to see how many orders match.

Missing a bit

The lecture slides show the way to do this, but it's tedious to write out, so I skip it here.

This method is meant to approximate \(y(t_n + h)\), so we also expand the LHS, in addition to the RHS.

General Runge-Kutta Methods

Heun's method is a second order method. We can go much further by parameterising the RHS of the equation: we can do more and more prediction steps within the interval, each contributing its own slope evaluation:

\[ \begin{align*} y_{n+1} &= y_n + w_1 K_1 + w_2 K_2 + \dots + w_m K_m \\ K_1 &= h f(y_n, t_n) \\ K_2 &= h f(y_n + b_2 K_1, t_n + a_2 h) \\ K_3 &= h f(y_n + b_3 K_1 + c_3 K_2, t_n + a_3 h) \\ & \vdots \\ K_m &= h f\left(y_n + \sum_{i=1}^{m-1} K_i \phi_i, t_n + a_m h\right) \end{align*} \]

We can then tune the parameters \(a_i, b_i, c_i, \phi_i, w_i\) to match the Taylor expansion to the desired order. As usual, there is a trade-off, as we have to do a lot more computation to calculate all the supporting points.

Standard Today

The standard for numerical integration of systems of ODEs is the 4th order Runge-Kutta (RK4) method described above. It substantially improves the truncation error without increasing the computational cost by much.
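A C sketch of one step of the classical 4th order Runge-Kutta scheme; the right-hand side is again the Malthusian test problem, purely as an illustration:

#include <stdio.h>
#include <math.h>

/* Right-hand side of the test problem dy/dt = 0.2 y. */
static double f(double y, double t) { (void)t; return 0.2 * y; }

/* One step of the classical 4th order Runge-Kutta scheme. */
static double rk4_step(double y, double t, double h) {
  double k1 = h * f(y,            t);
  double k2 = h * f(y + 0.5 * k1, t + 0.5 * h);
  double k3 = h * f(y + 0.5 * k2, t + 0.5 * h);
  double k4 = h * f(y + k3,       t + h);
  return y + (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0;
}

int main(void) {
  double y = 50.0, t = 0.0, h = 0.1;
  for (int n = 0; n < 100; n++) { y = rk4_step(y, t, h); t += h; }
  printf("RK4: y(%.1f) ~ %.6f   exact: %.6f\n", t, y, 50.0 * exp(0.2 * t));
  return 0;
}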

Working in Higher Orders

For a higher order ODE, we can decompose it into a system of first order ODEs. For systems, we go through the same equations, but interpret the quantities that were scalar in the base case as vectors.
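For example (a standard illustration rather than one from the notes), the second order equation \(y'' + y = 0\) becomes a first order system by setting \(y_1 = y\) and \(y_2 = y'\):

\[ \begin{align} y_1' &= y_2\\ y_2' &= -y_1 \end{align} \]

which can then be integrated with any of the schemes above, treating \((y_1, y_2)\) as a vector.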

Adaptive Step Size Methods

Whilst smaller step sizes reduce truncation errors, we need more steps. More steps then give a larger round-off error and so we're back to square one. Instead of endlessly fighting this, we can choose a small step size only where it's required, and otherwise use a large step size where \(f\) doesn't vary much.

One approach is to tune the step size so that the error is roughly the same for all steps. To estimate the error, we can take one full step and two half steps and compare the results.

If the estimated error is larger than the tolerance, we halve the step size; if it is well below the tolerance, we double it; if it is within tolerance, we leave it as is. The only issue with this is that we waste cycles computing the \(h/2\) steps.
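A rough C sketch of this step-doubling idea using Euler steps; the test problem, tolerance, and the halve/double thresholds are illustrative choices (real codes use higher order schemes, as discussed next):

#include <stdio.h>
#include <math.h>

/* Illustrative test problem dy/dt = y*cos(t), exact solution y = exp(sin(t)). */
static double f(double y, double t) { return y * cos(t); }

static double euler_step(double y, double t, double h) { return y + h * f(y, t); }

int main(void) {
  double y = 1.0, t = 0.0, h = 0.1, tol = 1e-4;
  while (t < 10.0) {
    double y_full = euler_step(y, t, h);                /* one full step      */
    double y_half = euler_step(euler_step(y, t, h / 2), /* two half steps     */
                               t + h / 2, h / 2);
    double err = fabs(y_full - y_half);                 /* error estimate     */
    if (err > tol) { h *= 0.5; continue; }              /* too large: retry   */
    y = y_half; t += h;                                 /* accept the step    */
    if (err < tol / 4) h *= 2.0;                        /* very small: grow h */
  }
  printf("y(%.2f) ~ %.6f  (exact %.6f)\n", t, y, exp(sin(t)));
  return 0;
}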

In Practice

In practice, we use a 4th order RK scheme together with an embedded 5th order evaluation (e.g., Runge-Kutta-Fehlberg type methods). As the 5th order estimate reuses the 4th order evaluations, we don't waste compute. From here, we proceed exactly as before, but instead of wasting CPU cycles on separate half steps, we use only a few extra operations to check the error and adjust the step size.

The 5th order RK is the current state of the art.

Stiff Systems

With numerical integration, a system might be stiff. It is called stiff if the solution has components that vary with very different speed and frequency. The definition of stiff varies a lot from place to place.

Stiff systems can force a really small step size, dictated by the fastest-varying component of the solution even when that component contributes little to the overall behaviour.

Example

Consider a system of two differential equations:

\[ \begin{align} x' &= -20x - 19y & x(0) = 2\\ y' &= -19x - 20y &y(0) = 0 \end{align} \]

As this is a simple set of equations with analytical solutions, we can calculate them for analysis:

\[ \begin{align} x(t) &= e^{-39t} + e^{-t}\\ y(t) &= e^{-39t} - e^{-t} \end{align} \]

We observe that \(x,y \to 0\) as \(t \to \infty\). This system contains two timescales, with components that vary at very different rates. For large \(t\), \(e^{-t}\) is the dominating term, with \(e^{-39t}\) known as the transient term.

To numerically integrate this system, we can make use of the Euler scheme. Solving the Euler recursion gives:

\[ \begin{align} x_n &= (1-39h)^n + (1-h)^n\\ y_n &= (1-39h)^n - (1-h)^n \end{align} \]

For the system to converge to the true solution, we need \(x_n \to 0\) and \(y_n \to 0\), as \(n \to \infty\).

Therefore, we need \(| 1 -39h | < 1\) and \(| 1-h| < 1\), i.e. \(h < 2/39 \approx 0.05\): the rapidly decaying transient dictates the step size, even though it contributes almost nothing to the solution at large \(t\).

The main takeaway is that stiff systems are best solved with an implicit scheme, so that the step size is not constrained by the fast transient.
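As a sketch of why the implicit scheme helps here: writing the system above as \(\mathbf{x}' = A\mathbf{x}\) with \(A = \begin{pmatrix} -20 & -19 \\ -19 & -20 \end{pmatrix}\), backwards Euler gives

\[ \mathbf{x}_{n+1} = \mathbf{x}_n + hA\mathbf{x}_{n+1} \quad\Rightarrow\quad \mathbf{x}_{n+1} = (I - hA)^{-1}\mathbf{x}_n \]

The eigenvalues of \(A\) are \(-1\) and \(-39\), so the amplification factors are \(1/(1+h)\) and \(1/(1+39h)\), both less than 1 in magnitude for every \(h>0\). The step size can then be chosen for the accuracy of the slow \(e^{-t}\) component rather than for the stability of the fast transient.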