A brief introduction to variational calculus

Disclaimer: what follows assumes familiarity with calculus (in particular the chain rule and integration by parts). This is also no substitute for a proper textbook, as we will skip over a lot of details.

The simplest introduction to the calculus of variations is to solve in a slightly roundabout way a very easy geometrical problem: what is the shortest path between 2 points on a plane? (Spoiler: it's a straight line.)
Let's pretend we have no idea, and so we are forced to take into consideration all possible functions passing through 2 given points. What we want to do is to calculate the length of each of them, and select the shortest one. (Spoiler: we are not really going to do that.)
The length of a curve is given by the integral (between the start and end points) of the infinitesimal lengths ds. And, since we are on the plane, ds is given by Pythagoras' theorem: \[\ell=\int_{x_1}^{x_2} ds \quad ; \quad ds^2= dx^2+dy^2 \, .\] With some algebra we can rewrite ds, and thus the length of the curve, in a more convenient way: \[ds^2= dx^2+dy^2 \rightarrow \left( \frac{ds}{dx} \right)^2=1+ \left( \frac{dy}{dx} \right)^2 \rightarrow ds = \sqrt{1+ \left( \frac{dy}{dx} \right)^2} \, dx \, .\] What we want to do now is to look over all possible functions that pass through the two points, and select the one with the shortest length. This is impossible to do via brute force, so we are going to "cheat". Imagine that y(x) is the function we are looking for (i.e. our solution). Any function f(x) that we look at can be written as the (unknown) solution y(x) plus a "perturbation" ε η(x), where η(x) is an arbitrary function that vanishes at both endpoints (so that f still passes through the two given points), and ε is a number that tells us how big the perturbation is: \(f(x) = y(x) + \varepsilon \eta(x)\).
When you look for the minimum of a function (or functional), you differentiate it and find where the derivative is zero. In this case we want to differentiate with respect to ε, and we know that our solution corresponds to ε=0. To take the derivative we use the chain rule, noting that \(f' = y' + \varepsilon \eta'\) and therefore \(d f'/d\varepsilon = \eta'\), and then we set ε=0 (remembering how we defined f(x)): \[\ell = \int_{x_1}^{x_2} \sqrt{1+f'^2} \, dx \rightarrow \frac{d\ell}{d\varepsilon} = \int_{x_1}^{x_2} \frac{f'}{\sqrt{1+f'^2}} \frac{d f'}{d \varepsilon} \, dx \rightarrow \left. \frac{d\ell}{d\varepsilon} \right\rvert_{\varepsilon=0}= \int_{x_1}^{x_2} \frac{y' \eta'}{\sqrt{1+y'^2}} \, dx . \] The next step is (as is often the case) an integration by parts. The boundary term is zero because η(x) is zero at both endpoints. So the remaining integral must also be zero, and it must be zero for every possible η, meaning that the part in square brackets must be zero too: \[ \left. \frac{d\ell}{d\varepsilon} \right\rvert_{\varepsilon=0}= \left[ \frac{y' \eta}{\sqrt{1+y'^2}} \right]_{x_1}^{x_2} - \int_{x_1}^{x_2} \eta \left[ \frac{d}{dx} \frac{y'}{\sqrt{1+y'^2}} \right] dx =0 . \] If the derivative of a function is zero, the function itself is a constant. And if we rearrange it, we find that y' is also a constant, meaning that our solution was a straight line after all! \[\frac{d}{dx} \frac{y'}{\sqrt{1+y'^2}}=0 \rightarrow \frac{y'}{\sqrt{1+y'^2}} = \text{constant} \rightarrow y' = \text{constant} .\] This probably feels like an anticlimax, but in solving a problem whose solution we already knew, we stumbled on a quite general way of solving this kind of problem. So let's look again at what we have done!
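The argument above can be checked numerically. The following is only a minimal sketch under assumed choices that are not in the text: the endpoints (0, 0) and (1, 2), the perturbation η(x) = sin(πx), and the helper names `curve_length` and `perturbed` are all picked for illustration.

```python
import math

def curve_length(f, x1=0.0, x2=1.0, n=2000):
    """Approximate the length of y = f(x) by summing n short straight segments."""
    total = 0.0
    dx = (x2 - x1) / n
    for i in range(n):
        xa = x1 + i * dx
        xb = xa + dx
        total += math.hypot(dx, f(xb) - f(xa))
    return total

def perturbed(eps):
    """f(x) = y(x) + eps * eta(x), with y(x) = 2x and eta(x) = sin(pi x).

    eta vanishes at x = 0 and x = 1, so every f passes through the
    same two endpoints (assumed here to be (0, 0) and (1, 2)).
    """
    return lambda x: 2.0 * x + eps * math.sin(math.pi * x)

# Length of the perturbed curve for a few values of eps
lengths = {eps: curve_length(perturbed(eps)) for eps in (-0.2, -0.1, 0.0, 0.1, 0.2)}
for eps, length in lengths.items():
    print(f"eps = {eps:+.1f}  length = {length:.6f}")

# Central finite difference for d(length)/d(eps) at eps = 0
h = 1e-4
slope_at_zero = (curve_length(perturbed(h)) - curve_length(perturbed(-h))) / (2 * h)
print("d(length)/d(eps) at eps = 0:", slope_at_zero)
```

The printed lengths should be smallest at ε = 0, and the finite-difference derivative there should be zero up to rounding error, which is the numerical counterpart of setting dℓ/dε = 0.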

Let's say we want to find the function f that minimizes (or maximizes) the functional I. We can once again pretend that f is the sum of our unknown solution and a perturbation that vanishes at the endpoints: \[I= \int_{x_1}^{x_2} F(x, f, f') \, dx \quad \text{with} \quad f(x)= y(x) + \varepsilon \eta(x) . \] To find the minimum (or the maximum) we want to differentiate with respect to ε, and then set ε=0 like we did before: \[ \frac{dI}{d\varepsilon} = \int_{x_1}^{x_2} \left( \frac{\partial F}{\partial f} \frac{\partial f}{\partial \varepsilon} + \frac{\partial F}{\partial f'} \frac{\partial f'}{\partial \varepsilon} \right) dx = \int_{x_1}^{x_2} \left( \frac{\partial F}{\partial f} \eta + \frac{\partial F}{\partial f'} \eta' \right) dx . \] We again integrate the second term by parts, and notice that we are left with a boundary term that vanishes because η is zero at the endpoints, and an integral that must be zero whatever η is: \[\left. \frac{dI}{d\varepsilon}\right\rvert_{\varepsilon=0} = \left[ \eta \, \frac{\partial F}{\partial y'} \right]_{x_1}^{x_2} - \int_{x_1}^{x_2} \eta \left[ \frac{d}{dx} \frac{\partial F}{\partial y'} -\frac{\partial F}{\partial y} \right] dx =0 \] \[\rightarrow \frac{d}{dx} \frac{\partial F}{\partial y'} -\frac{\partial F}{\partial y}=0 , \] which is the differential equation our solution needs to satisfy in order to be the function that minimizes (or maximizes) the functional I.
(This differential equation might be familiar to you, as it is exactly the Euler-Lagrange equation that governs the dynamics of classical mechanical systems. We just have to interpret the parameter x as time, and F as the Lagrangian of the system.)
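As a rough numerical illustration of the general result, we can pick a specific F and check that dI/dε vanishes at ε=0 for a solution of the Euler-Lagrange equation, but not for another curve with the same endpoints. Everything specific here is an assumption made for the example: the choice F = (f'² − f²)/2 (for which the Euler-Lagrange equation is y'' + y = 0), the interval [0, π/2], the perturbation η(x) = sin(2x), and the helper names `action` and `first_variation`.

```python
import math

def action(f, df, x1, x2, n=4000):
    """Midpoint-rule estimate of I = integral of F(x, f, f') dx,
    with the illustrative choice F = (f'^2 - f^2) / 2."""
    dx = (x2 - x1) / n
    total = 0.0
    for i in range(n):
        x = x1 + (i + 0.5) * dx
        total += 0.5 * (df(x) ** 2 - f(x) ** 2) * dx
    return total

def first_variation(y, dy, eta, deta, x1, x2, h=1e-4):
    """Central finite difference for dI/d(eps) at eps = 0,
    where f = y + eps * eta and f' = y' + eps * eta'."""
    def I(eps):
        return action(lambda x: y(x) + eps * eta(x),
                      lambda x: dy(x) + eps * deta(x), x1, x2)
    return (I(h) - I(-h)) / (2 * h)

x1, x2 = 0.0, math.pi / 2

# Perturbation that vanishes at both endpoints (assumed for the example)
eta = lambda x: math.sin(2.0 * x)
deta = lambda x: 2.0 * math.cos(2.0 * x)

# y = sin(x) satisfies y'' + y = 0, the Euler-Lagrange equation for this F
var_solution = first_variation(math.sin, math.cos, eta, deta, x1, x2)

# A straight line through the same endpoints (0, 0) and (pi/2, 1) does not
line = lambda x: 2.0 * x / math.pi
dline = lambda x: 2.0 / math.pi
var_line = first_variation(line, dline, eta, deta, x1, x2)

print("dI/d(eps) for y = sin(x):     ", var_solution)
print("dI/d(eps) for a straight line:", var_line)
```

For this F the first value should be zero up to discretization error, while the second is clearly non-zero (integrating by hand gives −1/2).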

A simple exercise: Find the shortest line between two points on a cylinder.
Obviously we want to do it in cylindrical coordinates. We can also assume that the axis of the cylinder is aligned with the z coordinate, and that the cylinder has radius R, so that \(x = R\cos\theta\) and \(y = R\sin\theta\) on its surface, and a curve is described by a function \(z(\theta)\). What is the functional we want to minimize? As we did before we use Pythagoras' theorem to find ds and thus the length of the curve: \[ ds^2=dx^2+dy^2+dz^2 \rightarrow \left( \frac{ds}{d\theta} \right)^2 =\left( \frac{dx}{d\theta} \right)^2+\left( \frac{dy}{d\theta} \right)^2+\left( \frac{dz}{d\theta} \right)^2 \] \[\rightarrow \left( \frac{ds}{d\theta} \right)^2 = \left( R \sin\theta \right)^2 + \left( R \cos\theta \right)^2 + z'^2 \] \[ \rightarrow ds = \sqrt{R^2 +z'^2} \, d\theta \rightarrow \ell = \int_{\theta_1}^{\theta_2} \sqrt{R^2 +z'^2} \, d\theta \, . \] Since we now know F, we apply the Euler-Lagrange equation we found above, and (lo and behold!) we find that the geodesics on a cylinder look suspiciously similar to the geodesics on a plane: \[ F = \sqrt{R^2 +z'^2} \rightarrow\frac{d}{d\theta} \frac{\partial F}{\partial z'} -\frac{\partial F}{\partial z} = 0 \rightarrow \frac{d}{d\theta} \frac{\partial F}{\partial z'} =0 \] \[ \frac{\partial F}{\partial z'} = \text{constant} \rightarrow \frac{z'}{\sqrt{R^2 +z'^2}} = \text{constant} \rightarrow z' = \text{constant} . \] In other words z grows linearly with θ, i.e. the geodesic is a helix (a straight line if we unroll the cylinder onto a plane).
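A quick numerical sanity check of this result, as a sketch with assumed numbers: the radius, the two endpoints, and the function names below are illustrative choices, not part of the exercise. The code compares the length of a curve with constant z' against a perturbed curve through the same endpoints, and against the straight-line distance on the unrolled cylinder.

```python
import math

R = 2.0                            # cylinder radius (illustrative value)
theta1, theta2 = 0.0, math.pi      # angular coordinates of the two points
z1, z2 = 0.0, 3.0                  # heights of the two points

def length_on_cylinder(z, n=2000):
    """Discretized version of the integral of sqrt(R^2 + z'^2) d(theta):
    sum of small segments sqrt((R dtheta)^2 + dz^2)."""
    dtheta = (theta2 - theta1) / n
    total = 0.0
    for i in range(n):
        ta = theta1 + i * dtheta
        tb = ta + dtheta
        total += math.hypot(R * dtheta, z(tb) - z(ta))
    return total

def helix(theta):
    """Curve with constant z' joining the two points."""
    return z1 + (z2 - z1) * (theta - theta1) / (theta2 - theta1)

def wobbly(theta):
    """Same endpoints, plus a perturbation that vanishes at both of them."""
    return helix(theta) + 0.3 * math.sin(2.0 * (theta - theta1))

helix_length = length_on_cylinder(helix)
wobbly_length = length_on_cylinder(wobbly)
# Straight-line distance once the cylinder is unrolled onto a plane
unrolled = math.hypot(R * (theta2 - theta1), z2 - z1)

print("helix length:   ", helix_length)
print("unrolled length:", unrolled)
print("perturbed curve:", wobbly_length)
```

The helix length should match the unrolled straight-line distance, and the perturbed curve should come out longer.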

A more complex exercise: Consider an idealized, perfectly straight shore, where the water depth \(h\) increases linearly with the distance \(x\) from the shoreline. Knowing that the group velocity of a wave in shallow water is given by \(v_g=\sqrt{g\,h}\) (where \(g\) is the gravitational acceleration), at what angle will the waves hit the shore?
By Fermat's principle, the path taken by a wave between two given points is the path that can be traveled in the least time. One point will be on the shoreline, and the other can be anywhere in the water, and we want to minimize the time it takes the wave to go from one to the other. Writing the depth as \(h=\alpha x\) (with \(\alpha\) the constant slope of the sea bed), we want to minimize \[T=\int_{x_1}^{x_2} \frac{ds}{v_g} = \int_{x_1}^{x_2} \frac{\sqrt{1+y'^2} \, dx}{\sqrt{g\, h}}= \frac{1}{\sqrt{g\,\alpha}}\int_{x_1}^{x_2} \sqrt{\frac{1+y'^2}{x}} \, dx.\] The constant prefactor does not affect where the minimum is, so we can take \(F=\sqrt{(1+y'^2)/x}\), apply the Euler-Lagrange equation, and find \[\frac{d}{dx}\frac{\partial}{\partial y'} \sqrt{\frac{1+y'^2}{x}} - \frac{\partial}{\partial y} \sqrt{\frac{1+y'^2}{x}} =0 \Rightarrow \frac{d}{dx}\frac{\partial}{\partial y'} \sqrt{\frac{1+y'^2}{x}}=0 \] \[\Rightarrow \frac{y'}{\sqrt{x (1+y'^2)}}=\text{constant}=c \Rightarrow y' = c \sqrt{x (1+y'^2)} \] \[\Rightarrow y'= \sqrt{\frac{c^2 x}{1-c^2 x}} \Rightarrow y = \int \sqrt{\frac{c^2 x}{1-c^2 x}} \, dx . \] To solve this integral we make the substitution \(x=\frac{1-\cos \theta}{2 c^2}\) \(\rightarrow dx = \frac{\sin \theta}{2 c^2} d\theta \): \[y = \int \sqrt{\frac{c^2 \frac{1-\cos \theta}{2 c^2}}{1-c^2 \frac{1-\cos \theta}{2 c^2}}} \frac{\sin \theta}{2 c^2} d\theta = \frac{1}{2c^2}\int \sqrt{\frac{1-\cos\theta }{1+\cos \theta }} \sin\theta \, d\theta =\] \[= \frac{1}{2c^2}\int \tan \left( \frac{\theta}{2} \right) \sin \theta \, d\theta = \frac{1}{2c^2}\int \left( 1- \cos \theta \right) d\theta = \frac{\theta- \sin\theta}{2 c^2} + c_1 , \] where we used the trigonometric identities \(\frac{1-\cos\theta }{1+\cos \theta} = \tan^2 \left( \frac{\theta}{2} \right) \), and \(\tan \left( \frac{\theta}{2} \right)= \frac{1-\cos\theta}{\sin\theta }\) (and \(c_1\) is an integration constant).
We now have to find the slope of this curve at the shoreline \(x=0\), which corresponds to \(\theta=0\): \[\left.\frac{dy}{dx}\right\rvert_{x=0} = \left. \frac{dy}{d\theta} \left( \frac{dx}{d\theta}\right)^{-1 } \right\rvert_{\theta=0} = \left. \frac{1-\cos\theta}{2c^2} \left( \frac{\sin\theta}{2c^2}\right)^{-1 } \right\rvert_{\theta=0} = \lim_{\theta\rightarrow 0} \frac{1-\cos\theta}{\sin\theta} =0 , \] so all waves (at least in this simplified system) will arrive perpendicular to the shore.
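The parametric solution can also be checked numerically. This is a small sketch, assuming an arbitrary value c = 0.5 and setting the integration constant c_1 to zero; the function names are made up for the example. It verifies that \(y'/\sqrt{x(1+y'^2)}\) stays equal to c along the curve, and that the slope dy/dx shrinks towards zero as θ approaches 0 (i.e. as the wave approaches the shoreline).

```python
import math

c = 0.5  # illustrative value of the constant c (assumption)

def x_of(theta):
    """Distance from the shoreline along the path."""
    return (1.0 - math.cos(theta)) / (2.0 * c * c)

def y_of(theta):
    """Position along the shoreline, with the integration constant c_1 = 0."""
    return (theta - math.sin(theta)) / (2.0 * c * c)

def slope(theta):
    """dy/dx = (dy/dtheta) / (dx/dtheta) = (1 - cos(theta)) / sin(theta)."""
    return (1.0 - math.cos(theta)) / math.sin(theta)

def conserved(theta):
    """y' / sqrt(x (1 + y'^2)), which should equal c along the path."""
    yp = slope(theta)
    return yp / math.sqrt(x_of(theta) * (1.0 + yp * yp))

for theta in (0.5, 1.0, 2.0, 3.0):
    print(f"theta = {theta:.1f}  x = {x_of(theta):.4f}  y = {y_of(theta):.4f}  "
          f"conserved quantity = {conserved(theta):.6f}")

# Slope of the path closer and closer to the shoreline
for theta in (1e-1, 1e-2, 1e-3):
    print(f"theta = {theta:.0e}  dy/dx = {slope(theta):.6e}")
```

The values of θ are kept strictly between 0 and π, where sin θ is non-zero and the substitution used above is valid.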

Contact details :

  • Postal address:
    University of Exeter
    Physics building
    Stocker Road
    EX4 4QL
    United Kingdom
  • E-mail: j.bertolotti@exeter.ac.uk