# Fundamental Ideas

# Abstract and Keywords

Chapter 1 offers a simple introduction to the use of variational principles in physics. This approach to physics plays a key role in the book. The chapter starts with a look at how we might minimize a journey by car, even if this means taking a longer route. Soap films are also discussed. It then turns to geometrical optics and uses Fermat’s principle to explain the reflection and refraction of light. There follows a discussion of the significance of variational principles throughout physics. The chapter also covers some introductory mathematical ideas and techniques that will be used in later chapters. These include the mathematical representation of space and time and the use of vectors; partial differentiation, which is necessary to express all the fundamental equations of physics; and Gaussian integrals, which arise in many physical contexts. These mathematical techniques are illustrated by their application to waves and radioactive decay.

*Keywords:*
Fermat’s principle, geometrical optics, variational principle, Gaussian integral, waves, vectors, radioactive decay, soap films

# 1.1 Variational Principles

Many of our activities in everyday life are directed towards optimizing some quantity. We often try to perform tasks with minimal effort, or as quickly as possible. Here is a simple example: we may plan a road journey to minimize the travel time, taking a longer route in order to go faster along a section of highway. Figure 1.1 is a schematic road map between towns A and B. Speed on the ordinary roads is 50 mph, and on the highway passing through F, G and H it is 70 mph. The minimum journey time is 1 hr 24 mins along route AFGB, even though this is not the shortest route.

Remarkably, many natural processes can similarly be seen as optimizing some quantity. We say that they satisfy a *variational principle*. An elastic band stretched between two points lies along a straight line; this is the shortest path and also minimizes the elastic band’s energy. We can understand why a straight line is the shortest path as follows. First we need to assume that a shortest path *does exist*. In the current situation this is obvious, but there are more complicated optimization problems where there is no optimal solution. Now assume that the shortest path has a curved segment somewhere along it. Any curved segment can be approximated by part of a circle, as shown in Figure 1.2, and using a little trigonometry, we can check that the straight segment CD is shorter than the circular arc CD. In fact, the circular arc has length $2R\alpha$, and the straight segment has length $2R\sin\alpha$, which is shorter. So the assumption that the shortest path is somewhere curved is contradictory. Therefore the shortest path is straight.
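The inequality between chord and arc is easy to check numerically. The following Python sketch (with an arbitrarily chosen radius, purely for illustration) compares the chord length $2R\sin\alpha$ with the arc length $2R\alpha$ for a few half-angles $\alpha$:

```python
import math

def arc_length(R, alpha):
    # Circular arc of radius R subtending half-angle alpha (radians): 2*R*alpha.
    return 2 * R * alpha

def chord_length(R, alpha):
    # Straight segment joining the arc's endpoints: 2*R*sin(alpha).
    return 2 * R * math.sin(alpha)

R = 1.0  # arbitrary radius for illustration
for alpha in (0.1, 0.5, 1.0):
    # sin(alpha) < alpha for alpha > 0, so the chord is always shorter.
    assert chord_length(R, alpha) < arc_length(R, alpha)
```

Since $\sin\alpha < \alpha$ for every $\alpha > 0$, the chord is shorter for any curved segment, however slight the curvature.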

A soap film is another familiar physical example of energy optimization. Although it might initially be vibrating, the soap film will eventually settle into a state in which it is at rest. Its energy is then the product of its constant surface tension and its area, so the energy is minimized when the area is minimized. For any smooth surface in 3-dimensional space, there are two principal radii of curvature, *r*_{1} and *r*_{2}; for a surface of minimal area the two radii of curvature are equal in magnitude, but the surface curves in opposite directions in the two principal planes. Every region of the surface is saddle-like, as shown in Figure 1.3. We can understand physically why the surface tension has this effect. On each small element of the surface, the two curvatures produce forces. If they are equal in magnitude and opposite in direction then they cancel, and the surface element is in equilibrium. We therefore have an intimate connection between the physical ideas of energy and force and the geometrical concept of minimal area. We will discuss the geometry of curved surfaces further in Chapter 5.

## 1.1.1 Geometrical optics—reflection and refraction

*Fermat’s principle* in the field of optics was the first optimization principle to be discovered in physics. It was described by Pierre de Fermat in 1662. Geometrical optics is the study of idealized, infinitesimally thin beams of light, known as light rays. In the real world, narrow beams of light that are close to ideal rays can be obtained using parabolic mirrors or by projecting light through a screen containing narrow slits. Even if the light is not physically restricted like this, it can still be considered as a collection of rays travelling in different directions.

Fermat’s principle says that the path taken by a light ray between two given points, A and B, is the path that minimizes the total travel time. The path may be straight, or it may be bent or even curved as it passes through various media. A fundamental assumption is that in a given medium, a light ray has a definite, finite speed. In a uniform medium, for example air or water or a vacuum, the travel time equals the length of the path divided by the light speed. Since the speed is constant, the path of minimal time is also the shortest path, and this is the straight line path from A to B. So light rays are straight in a uniform medium, as is readily verified. A light ray heading off in the correct direction from a source at A will arrive at B, and even though the source may emit light in all directions, a small obstacle anywhere along the line between A and B will prevent light reaching B, and will cast a shadow there.

Fermat’s principle can be used to understand two basic laws of optics, the laws of *reflection* and *refraction*. First, let’s consider reflection. Suppose we have a long flat mirror in a uniform medium, and a light source at A. Let B be the light receiving point, on the same side of the mirror, as shown in Figure 1.4. Consider all the possible light rays from A to B that bounce off the mirror once. If the time for the light to travel from A to B is to be minimized, the segments before and after reflection must be straight. What we’d like to know is the position of the reflection point X.

The coordinates in the figure show the *x*-axis along the mirror, and the reflection point X is at $x=X$. Consider the various lengths in the figure, and ignore the angles $\vartheta$ and $\phi$ for the moment. Using Pythagoras’ theorem to determine the path lengths, we find that the time for the light to travel from A to B via X is

$$T = \frac{1}{c}\sqrt{a^2 + X^2} + \frac{1}{c}\sqrt{b^2 + (L-X)^2}, \qquad (1.1)$$

where *c* is the speed of the light along both straight segments. The derivative of *T* with respect to *X* is

$$\frac{dT}{dX} = \frac{1}{c}\frac{X}{\sqrt{a^2 + X^2}} - \frac{1}{c}\frac{L-X}{\sqrt{b^2 + (L-X)^2}}, \qquad (1.2)$$

and the travel time is minimized when this derivative vanishes, giving the equation for *X*,

$$\frac{X}{\sqrt{a^2 + X^2}} = \frac{L-X}{\sqrt{b^2 + (L-X)^2}}. \qquad (1.3)$$

Now the angles come in handy, as equation (1.3) is equivalent to

$$\cos\vartheta = \cos\phi, \qquad (1.4)$$

as can be seen from Figure 1.4. Therefore $\vartheta$ and $\phi$ are equal. We haven’t explicitly found *X*, but that doesn’t matter. The important result is that the incoming and outgoing light rays meet the mirror surface at equal angles. This is the fundamental law of reflection. In fact, by simplifying equation (1.3) or by considering the equation $\cot\vartheta = \cot\phi$, we obtain $\frac{X}{a}=\frac{L-X}{b}$, and then *X* is easily found.
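Fermat’s construction for the mirror can be explored numerically: minimize the travel time over the reflection point X and check that the two angles with the mirror come out equal. The Python sketch below uses arbitrarily chosen values of *a*, *b* and *L* (the heights of A and B above the mirror and their horizontal separation):

```python
import math

# Hypothetical geometry, chosen for illustration: source A at height a,
# receiver B at height b, horizontal separation L, light speed c.
a, b, L, c = 1.0, 2.0, 4.0, 1.0

def T(X):
    # Travel time A -> X -> B via a reflection point at x = X on the mirror.
    return (math.sqrt(a**2 + X**2) + math.sqrt(b**2 + (L - X)**2)) / c

# Minimize T on [0, L] by ternary search (T is strictly convex in X).
lo, hi = 0.0, L
for _ in range(200):
    m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
    if T(m1) < T(m2):
        hi = m2
    else:
        lo = m1
X = (lo + hi) / 2

theta = math.atan2(a, X)       # angle of incoming ray with the mirror
phi = math.atan2(b, L - X)     # angle of outgoing ray with the mirror
assert abs(theta - phi) < 1e-6           # law of reflection: equal angles
assert abs(X - a * L / (a + b)) < 1e-6   # equivalently X/a = (L-X)/b
```

The minimizer satisfies $\frac{X}{a} = \frac{L-X}{b}$, so $X = \frac{aL}{a+b}$, as the text indicates.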

Refraction isn’t very different. Here, light rays pass from a medium where the speed is *c*_{1} into another medium where the speed is *c*_{2}. The geometry of refraction is different from that of reflection, but not very much, and we use similar coordinates (see Figure 1.5). By Fermat’s principle, the path of the actual light ray from A to B, or from B to A, is the one that minimizes the time taken. Note that, unless ${c}_{1}={c}_{2}$, this is definitely not the same as the *shortest* path from A to B, which is the straight line between them. The path of minimum time has a kink, just like the route via the highway that we considered earlier.

The rays from A to X and from X to B must be straight, because each of these segments is wholly within a single medium and traced out at a single speed. The total time for the light to travel from A to B is therefore

$$T = \frac{1}{c_1}\sqrt{a^2 + X^2} + \frac{1}{c_2}\sqrt{b^2 + (L-X)^2}. \qquad (1.5)$$

The time *T* is again minimized when the derivative of *T* with respect to *X* vanishes, that is,

$$\frac{dT}{dX} = \frac{1}{c_1}\frac{X}{\sqrt{a^2 + X^2}} - \frac{1}{c_2}\frac{L-X}{\sqrt{b^2 + (L-X)^2}} = 0. \qquad (1.6)$$

This gives the equation for *X*,

$$\frac{1}{c_1}\frac{X}{\sqrt{a^2 + X^2}} = \frac{1}{c_2}\frac{L-X}{\sqrt{b^2 + (L-X)^2}}. \qquad (1.7)$$

We do not really want to solve this, but rather to express it more geometrically. In terms of the angles $\vartheta$ and $\phi$ in Figure 1.5, the equation becomes

$$\frac{1}{c_1}\cos\vartheta = \frac{1}{c_2}\cos\phi, \qquad (1.8)$$

or more usefully

$$\frac{\cos\phi}{\cos\vartheta} = \frac{c_2}{c_1}. \qquad (1.9)$$
This is Willebrord Snell’s law of refraction.^{1} It relates the angles of the light rays to the ratio of the light speeds *c*_{2} and *c*_{1}. Snell’s law can be tested experimentally even if the light speeds are unknown. To do this, the angle at which the light beam hits the surface must be varied, so that A and B are no longer fixed. When $\cos\phi$ is plotted against $\cos\vartheta$, the resulting graph is a straight line through the origin.

Suppose the light passes from air into water. The speed of light in water is less than its speed in air, so *c*_{2} is less than *c*_{1}, and $\cos\phi$ is less than $\cos\vartheta$. Therefore $\phi$ is greater than $\vartheta$. The result, as is easily verified, is that light rays are bent into the water towards the normal to the surface, as shown in Figure 1.5.

Snell’s law has many interesting consequences. It is key to applications such as light focussing and lens systems. It also accounts for the phenomenon of total internal reflection. This occurs when a light ray originating at B, in the medium where the light speed is less, hits the surface at a small angle $\phi$ for which $\cos\phi$ is close to 1 and therefore $\cos\vartheta > 1$. There is then no solution for the angle $\vartheta$, so the ray cannot cross the surface into medium 1, and the entire ray is reflected internally. The critical angle of incidence ${\phi}_{\text{c}}$ for total internal reflection depends on the ratio of the light speeds in the two media. Equation (1.9) shows that $\cos\phi_{\text{c}} = \frac{c_2}{c_1}$. This result is important for applications such as the transmission of light signals down fibre optic cables.
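For air and water, with illustrative speeds $c_1 \approx 3\times 10^8~\mathrm{m\,s^{-1}}$ and $c_2 \approx 2.25\times 10^8~\mathrm{m\,s^{-1}}$ (approximate values, assumed here for the sake of the example), the critical angle can be evaluated directly in a short Python sketch:

```python
import math

c1 = 3.0e8    # approximate speed of light in air (m/s)
c2 = 2.25e8   # approximate speed of light in water (m/s)

# Critical angle, measured from the surface as in the text: cos(phi_c) = c2/c1.
phi_c = math.acos(c2 / c1)
assert abs(math.degrees(phi_c) - 41.41) < 0.01  # about 41.4 degrees

# A shallower ray (phi < phi_c) would need cos(theta) = (c1/c2)cos(phi) > 1
# in the faster medium, which is impossible: total internal reflection.
phi = math.radians(30.0)
assert (c1 / c2) * math.cos(phi) > 1.0
```

A ray inside the water meeting the surface at less than about 41 degrees (from the surface) is therefore entirely reflected back into the water.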

Originally, the law of refraction was expressed in terms of a ratio of refractive indices on the right-hand side of equation (1.9). It was by considering Fermat’s principle that physicists realised that the ratio could be understood as a ratio of light speeds. Later, when the speed of light in various media could be directly measured, it was found that light travels at its maximal speed in a vacuum, and only slightly slower in air. In denser materials such as water or glass, however, its speed is considerably slower, by about 20–40%. The speed of light in a vacuum is an absolute constant, $299{,}792{,}458~\mathrm{m\,s^{-1}}$, which is often approximated as $3\times 10^{8}~\mathrm{m\,s^{-1}}$. In dense media the speed may depend on the wavelength of the light, so in passing from air into glass or water, rays of different colours bend through different angles, which is why a refracted beam of white light splits up when entering a glass prism or water droplet.

## 1.1.2 The scope of variational principles

We have given a brief flavour of how some mathematical laws of nature can be formulated in terms of variational principles. These principles are actually much more general, and occur throughout physics. Be it the motion of particles, the waveforms of fields, quantum states, or even the shape of spacetime itself, we find that natural processes always optimize some physical quantity. Usually this means that the quantity is minimized or maximized, but it may be at a saddle point.^{2} The most important such quantity is known as an *action*, and many laws of physics can be formulated as a *principle of least action*. The appropriate mathematics for analysing these principles is called the *calculus of variations*. It is an extension of ordinary calculus, with its own additional tools that we will explain later.

As long ago as the 18th century, Jean le Rond d’Alembert, Leonhard Euler and Joseph-Louis Lagrange realized that Newton’s laws of motion could be derived from a principle of least action. This approach was perfected by William Rowan Hamilton in the 1830s. We now know that Maxwell’s equations for electric and magnetic fields also arise from an electromagnetic action principle, and in 1915 David Hilbert showed that Einstein’s newly discovered equations describing gravity as curved spacetime arise from an action principle. Even the relationship between classical physics and quantum mechanics is best understood in terms of an action principle. This idea was pioneered by Paul Dirac, and perfected by Richard Feynman. Today, the action principle is seen as the best method of encapsulating the behaviour of particles and fields.

One advantage of formulating physical theories in this way is that the principle of least action is concise and easy to remember. For example, in Maxwell’s original formulation of electromagnetism, there were 20 equations for electromagnetic fields. In modern vector notation, due to Josiah Willard Gibbs, there are four Maxwell equations, supplemented by the Lorentz force law for charged particles. The action, on the other hand, is a single quantity constructed from the electromagnetic fields and the trajectories of charged particles, as we will describe in Chapter 3. This economy is essential when developing the more complicated gauge theories of elementary particles, discussed in Chapter 12, and even more esoteric theories, such as string theory.

(p.10) In Chapter 2 we will return to these ideas and show how Newtonian mechanics can be understood in terms of the principle of least action. By considering all possible infinitesimal variations in the motion of a physical body through space, we will derive Newton’s laws of motion. First, however, we must describe mathematically the arena in which this motion takes place.

# 1.2 Euclidean Space and Time

Familiar 3-dimensional Euclidean space, known as 3-space for short and often denoted by ${\mathbb{R}}^{3}$, is the stage on which the drama of the physical world is played out. This drama takes place in time, but time and space are not unified in non-relativistic physics, so we will not require a geometrical description of time as yet. 3-space has the Euclidean symmetries of rotations and translations, where a translation is a rigid motion without rotation. The most fundamental geometrical concept is the distance between two points, and this is unchanged by translations and rotations. It is natural to express the laws of physics in a way that is independent of position and orientation. Then their form does not change when the entire physical system is translated or rotated. This gives the laws a geometrical significance.

A point in space is most easily described using Cartesian coordinates. For this one needs to pick an origin O, and a set of axes that are mutually orthogonal, meaning at right angles. Every point P is uniquely represented by three real numbers, collectively written as a *vector* $\mathbf{\text{x}}=({x}_{1},{x}_{2},{x}_{3})$. Often, we will not distinguish a point from the vector representing it. To get from O to P one moves a distance *x*_{1} along the 1-axis, to A, then a distance *x*_{2} parallel to the 2-axis, to B, and finally a distance *x*_{3} parallel to the 3-axis, to P, as shown in Figure 1.6. O itself is represented by the vector (0, 0, 0).

The length or magnitude of **x** is the distance from O to P, and is denoted by $|\mathbf{x}|$. This distance can be calculated using Pythagoras’ theorem. OAB is a right-angled triangle, so the distance from O to B is $\sqrt{x_1^2 + x_2^2}$, and since OBP is also a right-angled triangle, the distance from O to P is $\sqrt{(x_1^2 + x_2^2) + x_3^2}$. The square of the distance is therefore

$$|\mathbf{x}|^2 = x_1^2 + x_2^2 + x_3^2, \qquad (1.10)$$

which is the 3-dimensional version of Pythagoras’ theorem. If one performs a rotation about O, the distance $|\mathbf{x}|$ remains the same.

The rotation sending **x** to ${\mathbf{\text{x}}}^{\prime}$ may be an active one, making ${\mathbf{\text{x}}}^{\prime}$ and **x** genuinely different points. Alternatively, the rotation may be a passive one, by which we mean that the axes are rotated, but the point **x** does not actually change. All that happens is that **x** acquires a new set of coordinates ${\mathbf{\text{x}}}^{\prime}=({{x}_{1}}^{\prime},{{x}_{2}}^{\prime},{{x}_{3}}^{\prime})$ relative to the rotated axes. In both cases $|{\mathbf{\text{x}}}^{\prime}|=|\mathbf{\text{x}}|$.

The square of the distance between points **x** and **y** is

$$|\mathbf{x} - \mathbf{y}|^2 = (x_1 - y_1)^2 + (x_2 - y_2)^2 + (x_3 - y_3)^2. \qquad (1.11)$$

This distance is unaffected by both rotations and translations. A translation shifts all points by a fixed vector **c**, so **x** and **y** are shifted to $\mathbf{\text{x}}+\mathbf{\text{c}}$ and $\mathbf{\text{y}}+\mathbf{\text{c}}$. The difference $\mathbf{\text{x}}-\mathbf{\text{y}}$ is unchanged, and so is $|\mathbf{\text{x}}-\mathbf{\text{y}}|$.

When considering a pair of vectors **x** and **y**, it is useful to introduce their *dot product*

$$\mathbf{x} \cdot \mathbf{y} = x_1 y_1 + x_2 y_2 + x_3 y_3. \qquad (1.12)$$

A special case of this is $\mathbf{x}\cdot \mathbf{x}={x}_{1}^{2}+{x}_{2}^{2}+{x}_{3}^{2}=|\mathbf{x}{|}^{2}$, expressing the squared length of **x** as the dot product of **x** with itself. It is not immediately obvious whether $\mathbf{x}\cdot \mathbf{y}$ is affected by a rotation. However, if we expand out the terms on the right-hand side of equation (1.11), we find that

$$\mathbf{x} \cdot \mathbf{y} = \frac{1}{2}\left(|\mathbf{x}|^2 + |\mathbf{y}|^2 - |\mathbf{x} - \mathbf{y}|^2\right), \qquad (1.13)$$

and as $|\mathbf{x}|$, $|\mathbf{y}|$ and $|\mathbf{x}-\mathbf{y}|$ are all unaffected by a rotation, $\mathbf{x}\cdot \mathbf{y}$ must also be unaffected. We can use this result to find a more convenient expression for the dot product of **x** and **y**. When applied to a triangle with edges of length $|\mathbf{x}|$, $|\mathbf{y}|$ and $|\mathbf{x}-\mathbf{y}|$, as shown in Figure 1.7, we can rearrange the expression (1.13), and then use the cosine rule to obtain

$$\mathbf{x} \cdot \mathbf{y} = |\mathbf{x}||\mathbf{y}|\cos\vartheta, \qquad (1.14)$$

where $\vartheta $ is the angle between the vectors **x** and **y**.

It follows that if $\mathbf{x}\cdot \mathbf{y}=0$, and the lengths of the vectors **x** and **y** are non-zero, then $\cos\vartheta = 0$, so the angle between **x** and **y** is $\vartheta =\pm \frac{\pi}{2}$, and the two vectors are orthogonal. For example, the basis vectors along the Cartesian axes, $(1,0,0)$, $(0,1,0)$ and $(0,0,1)$, are all of unit length, and the dot product of any pair of them vanishes, so they are orthogonal.

Critically, in Euclidean 3-space, the lengths of vectors and the angles between them are invariant under any rotation of all the vectors together, and this is why the dot product is a useful construction. Quantities such as $\mathbf{\text{x}}\cdot \mathbf{\text{y}}$ that are unaffected by rotations are called *scalars*.

There is a further, equally useful construction. From two vectors **x** and **y** one may construct a third vector, their *cross product* $\mathbf{x}\times \mathbf{y}$, as shown in Figure 1.8. This has components

$$\mathbf{x} \times \mathbf{y} = (x_2 y_3 - x_3 y_2,\; x_3 y_1 - x_1 y_3,\; x_1 y_2 - x_2 y_1). \qquad (1.15)$$

The cross product is useful, because if both **x** and **y** are rotated around an arbitrary axis, then $\mathbf{\text{x}}\times \mathbf{\text{y}}$ rotates with them. (If one invented another vector product of **x** and **y** with components $({x}_{2}{y}_{3},{x}_{3}{y}_{1},{x}_{1}{y}_{2})$, say, then it would not have this rotational property and would have little geometrical significance.) Unlike the dot product $\mathbf{\text{x}}\cdot \mathbf{\text{y}}$, the cross product $\mathbf{\text{x}}\times \mathbf{\text{y}}$ is not *invariant* under rotations. We say that it transforms *covariantly* with **x** and **y** under rotations. ‘Co-variant’ means ‘varying with’ or ‘transforming in the same way as’, and this is an idea that occurs frequently in physics.

We can check this rotational covariance of $\mathbf{x}\times \mathbf{y}$ by considering the dot product of $\mathbf{x}\times \mathbf{y}$ with a third vector **z**. Using equations (1.15) and (1.12), we find

$$(\mathbf{x} \times \mathbf{y}) \cdot \mathbf{z} = x_2 y_3 z_1 - x_3 y_2 z_1 + x_3 y_1 z_2 - x_1 y_3 z_2 + x_1 y_2 z_3 - x_2 y_1 z_3. \qquad (1.16)$$

This is generally non-zero, but if either $\mathbf{z}=\mathbf{x}$ or $\mathbf{z}=\mathbf{y}$, it is easy to see that the six terms above cancel out in pairs, and the result is zero. This means that $\mathbf{x}\times \mathbf{y}$ is orthogonal to **x** and orthogonal to **y**, as shown in Figure 1.8. When subject to a rotation, the directions of $\mathbf{x}\times \mathbf{y}$, **x** and **y** must therefore all rotate together. Now we just need to check that the length of $\mathbf{x}\times \mathbf{y}$ is invariant under rotations. In terms of its components, the squared length of $\mathbf{x}\times \mathbf{y}$ is

$$|\mathbf{x} \times \mathbf{y}|^2 = (x_2 y_3 - x_3 y_2)^2 + (x_3 y_1 - x_1 y_3)^2 + (x_1 y_2 - x_2 y_1)^2, \qquad (1.17)$$

and this can be reexpressed, after a little algebra, as

$$|\mathbf{x} \times \mathbf{y}|^2 = |\mathbf{x}|^2 |\mathbf{y}|^2 - (\mathbf{x} \cdot \mathbf{y})^2. \qquad (1.18)$$

The right-hand side only includes quantities that are rotationally invariant, so $|\mathbf{x}\times \mathbf{y}|$ is similarly invariant. The right-hand side can be expressed in terms of lengths and angles as $|\mathbf{x}|^2|\mathbf{y}|^2 - |\mathbf{x}|^2|\mathbf{y}|^2\cos^2\vartheta$, which simplifies to $|\mathbf{x}|^2|\mathbf{y}|^2\sin^2\vartheta$. The length of the vector $\mathbf{x}\times \mathbf{y}$ is therefore $|\mathbf{x}||\mathbf{y}|\sin\vartheta$.
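The relation $|\mathbf{x}\times \mathbf{y}|^2 = |\mathbf{x}|^2|\mathbf{y}|^2 - (\mathbf{x}\cdot \mathbf{y})^2$ can be checked numerically for any particular pair of vectors. The following Python sketch uses arbitrarily chosen components:

```python
def dot(u, v):
    # Dot product: sum of componentwise products.
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    # Cross product components, as in the standard definition.
    return (u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0])

x = (1.0, 2.0, 3.0)     # arbitrary example vectors
y = (-2.0, 0.5, 4.0)
lhs = dot(cross(x, y), cross(x, y))          # |x cross y|^2
rhs = dot(x, x) * dot(y, y) - dot(x, y)**2   # |x|^2 |y|^2 - (x.y)^2
assert abs(lhs - rhs) < 1e-9
```

The identity is exact, so any discrepancy here would be pure floating-point rounding.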

Under the exchange of **x** and **y**, the two quantities $\mathbf{\text{x}}\cdot \mathbf{\text{y}}$ and $\mathbf{\text{x}}\times \mathbf{\text{y}}$ have opposite symmetry properties. $\mathbf{\text{x}}\cdot \mathbf{\text{y}}=\mathbf{\text{y}}\cdot \mathbf{\text{x}}$, but $\mathbf{\text{x}}\times \mathbf{\text{y}}=-(\mathbf{\text{y}}\times \mathbf{\text{x}})$, as is clear from equations (1.12) and (1.15). The latter relation implies that $\mathbf{\text{x}}\times \mathbf{\text{x}}=0$ for any **x**.

From three vectors **x**, **y** and **z**, there are two useful geometrical quantities that can be constructed. One is the scalar $(\mathbf{x}\times \mathbf{y})\cdot \mathbf{z}$. This has several nice symmetry properties that can be verified using equation (1.16), in particular

$$(\mathbf{x} \times \mathbf{y}) \cdot \mathbf{z} = (\mathbf{y} \times \mathbf{z}) \cdot \mathbf{x} = (\mathbf{z} \times \mathbf{x}) \cdot \mathbf{y}. \qquad (1.19)$$

The other geometrical quantity is the double cross product $(\mathbf{x}\times \mathbf{y})\times \mathbf{z}$, which is a vector. This can be expressed in terms of dot products through the important identity

$$(\mathbf{x} \times \mathbf{y}) \times \mathbf{z} = (\mathbf{x} \cdot \mathbf{z})\,\mathbf{y} - (\mathbf{y} \cdot \mathbf{z})\,\mathbf{x}. \qquad (1.20)$$

This identity, which is covariant under rotations, is easily checked using the cross product definition (1.15). To gain some intuition into its form, note that $\mathbf{x}\times \mathbf{y}$ is orthogonal to the plane spanned by **x** and **y**, and taking the cross product with **z** gives a vector orthogonal to $\mathbf{x}\times \mathbf{y}$ and therefore back in this plane. $(\mathbf{x}\times \mathbf{y})\times \mathbf{z}$ must therefore be a linear combination of **x** and **y**. This vector must also be orthogonal to **z** and this is clearly true of the right-hand side of the identity, as

$$\left((\mathbf{x} \cdot \mathbf{z})\,\mathbf{y} - (\mathbf{y} \cdot \mathbf{z})\,\mathbf{x}\right) \cdot \mathbf{z} = (\mathbf{x} \cdot \mathbf{z})(\mathbf{y} \cdot \mathbf{z}) - (\mathbf{y} \cdot \mathbf{z})(\mathbf{x} \cdot \mathbf{z}) = 0. \qquad (1.21)$$

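The double cross product identity $(\mathbf{x}\times \mathbf{y})\times \mathbf{z} = (\mathbf{x}\cdot \mathbf{z})\,\mathbf{y} - (\mathbf{y}\cdot \mathbf{z})\,\mathbf{x}$ can likewise be verified componentwise for sample vectors (a Python sketch; the vectors are chosen arbitrarily):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0])

x, y, z = (1.0, 2.0, 3.0), (4.0, -1.0, 0.5), (-2.0, 0.0, 5.0)
lhs = cross(cross(x, y), z)
rhs = tuple(dot(x, z)*yi - dot(y, z)*xi for xi, yi in zip(x, y))
assert all(abs(l - r) < 1e-9 for l, r in zip(lhs, rhs))
assert abs(dot(lhs, z)) < 1e-9  # the result is orthogonal to z
```

The final assertion confirms the geometric remark above: the double cross product lies in the plane of **x** and **y** and is orthogonal to **z**.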
We have gone into these properties of $\mathbf{\text{x}}\cdot \mathbf{\text{y}}$ and $\mathbf{\text{x}}\times \mathbf{\text{y}}$ in some detail, because the laws of physics need to be expressed in a way that doesn’t change when the entire physical system is rotated or translated. Even more importantly, the laws of physics should not change if one passively rotates the axes or shifts the origin. Dot products and cross products therefore occur frequently in physical contexts, for example, in formulae for energy and angular momentum. In the next section we will meet a vector of partial derivatives, denoted by ∇, and should not be surprised that it appears in electromagnetic theory in expressions such as $\mathrm{\nabla}\cdot \mathbf{\text{E}}$ and $\mathrm{\nabla}\times \mathbf{\text{E}}$, where **E** is the electric field vector. We will define and use these quantities in Chapter 3.

Geometrically speaking, there is not much to add concerning time until we discuss relativity in Chapter 4. In non-relativistic physics we use a further Cartesian coordinate *t* to represent time. Given times *t*_{1} and *t*_{2}, it is the interval between them, ${t}_{2}-{t}_{1}$, that is physically meaningful. Physical phenomena are unaffected by a time shift. If a process can start at *t*_{1} and end at *t*_{2} then it can equally well start at ${t}_{1}+c$ and end at ${t}_{2}+c$. Suppose some system starts at $t=0$ and ends in the same state at $t=T$. Then it will repeat, and come back to the same state at $t=2T$, $t=3T$, and so on. This has an application that is very familiar to us in the guise of a clock.

# 1.3 Partial Derivatives

Physics in 3-dimensional space often involves functions of several variables. When a function depends on more than one variable, we need to consider the derivatives with respect to all of these. Suppose $\varphi ({x}_{1},{x}_{2},{x}_{3})$ is a smooth function defined in Euclidean 3-space. The *partial derivative* $\frac{\partial \varphi}{\partial {x}_{1}}$ is just the ordinary derivative with respect to *x*_{1}, with *x*_{2} and *x*_{3} treated as fixed, or constant. It can be evaluated at any point $\mathbf{\text{x}}=({x}_{1},{x}_{2},{x}_{3})$. By taking *x*_{2} and *x*_{3} fixed, one is really just thinking of *ϕ* as a function of *x*_{1} along the line through **x** parallel to the 1-axis, and the partial derivative $\frac{\partial \varphi}{\partial {x}_{1}}$ is the ordinary derivative along this line. The partial derivatives $\frac{\partial \varphi}{\partial {x}_{2}}$ and $\frac{\partial \varphi}{\partial {x}_{3}}$ are similarly defined at **x** by differentiating along the lines through **x** parallel to the 2-axis and 3-axis.

It is easy to calculate the partial derivatives of functions that are known explicitly. For example, if $\varphi ({x}_{1},{x}_{2},{x}_{3})={x}_{1}^{3}{x}_{2}^{4}{x}_{3}$, then $\frac{\partial \varphi}{\partial {x}_{1}}$ is found by differentiating ${x}_{1}^{3}$ and treating ${x}_{2}^{4}{x}_{3}$ as a constant, and similarly for $\frac{\partial \varphi}{\partial {x}_{2}}$ and $\frac{\partial \varphi}{\partial {x}_{3}}$. Therefore

$$\frac{\partial \varphi}{\partial x_1} = 3x_1^2 x_2^4 x_3\,, \qquad \frac{\partial \varphi}{\partial x_2} = 4x_1^3 x_2^3 x_3\,, \qquad \frac{\partial \varphi}{\partial x_3} = x_1^3 x_2^4. \qquad (1.22)$$

Recall that by using the ordinary derivative of a function $f(x)$, denoted by ${f}^{\prime}(x)$, we can obtain an approximate value for $f(x+\delta x)$ when $\delta x$ is small:

$$f(x + \delta x) \simeq f(x) + f'(x)\,\delta x. \qquad (1.23)$$

Similarly, by using the partial derivative $\frac{\partial \varphi}{\partial {x}_{1}}$, we obtain

$$\varphi(x_1 + \delta x_1, x_2, x_3) \simeq \varphi(x_1, x_2, x_3) + \frac{\partial \varphi}{\partial x_1}\delta x_1. \qquad (1.24)$$

By combining the three partial derivatives of *ϕ* at **x**, we obtain the more powerful result

$$\varphi(\mathbf{x} + \delta\mathbf{x}) \simeq \varphi(\mathbf{x}) + \frac{\partial \varphi}{\partial x_1}\delta x_1 + \frac{\partial \varphi}{\partial x_2}\delta x_2 + \frac{\partial \varphi}{\partial x_3}\delta x_3. \qquad (1.25)$$

This provides an approximation for *ϕ* at any point $\mathbf{\text{x}}+\delta \mathbf{\text{x}}$ close to **x**.

There is an implicit assumption here, which is that $\frac{\partial \varphi}{\partial {x}_{2}}$ is essentially the same at the point $({x}_{1}+\delta {x}_{1},{x}_{2},{x}_{3})$ as it is at $({x}_{1},{x}_{2},{x}_{3})$, and similarly for $\frac{\partial \varphi}{\partial {x}_{3}}$. This is why we supposed earlier that *ϕ* was smooth.

The collection of partial derivatives of *ϕ* forms a vector, denoted by $\mathrm{\nabla}\varphi$:

$$\nabla\varphi = \left(\frac{\partial \varphi}{\partial x_1}, \frac{\partial \varphi}{\partial x_2}, \frac{\partial \varphi}{\partial x_3}\right). \qquad (1.26)$$

Similarly $\delta \mathbf{x}=(\delta {x}_{1},\delta {x}_{2},\delta {x}_{3})$ is a vector. Equation (1.25) can be written more concisely as

$$\varphi(\mathbf{x} + \delta\mathbf{x}) \simeq \varphi(\mathbf{x}) + \nabla\varphi \cdot \delta\mathbf{x}, \qquad (1.27)$$

a result we will use repeatedly. On the right-hand side is a genuine dot product that is unchanged if one rotates the axes. $\mathrm{\nabla}\varphi $ is called the *gradient* of *ϕ*.
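The linear approximation through the gradient is easy to test numerically. The Python sketch below uses the example function $\varphi = x_1^3 x_2^4 x_3$ from this section, with an arbitrarily chosen point and small displacement:

```python
def phi(x1, x2, x3):
    # The example function used in the text.
    return x1**3 * x2**4 * x3

def grad_phi(x1, x2, x3):
    # Partial derivatives computed by hand, as in the text's example.
    return (3*x1**2 * x2**4 * x3, 4*x1**3 * x2**3 * x3, x1**3 * x2**4)

x = (1.0, 2.0, 3.0)            # arbitrary base point
dx = (1e-4, -2e-4, 3e-4)       # arbitrary small displacement
exact = phi(*(a + d for a, d in zip(x, dx)))
linear = phi(*x) + sum(g * d for g, d in zip(grad_phi(*x), dx))
# The discrepancy is second order in the displacement, hence tiny here.
assert abs(exact - linear) < 1e-4
```

Halving the displacement should reduce the discrepancy by roughly a factor of four, reflecting its second-order character.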

A good way to think about a function is in terms of its contours. For a function *ϕ* in 3-space, the contours are the surfaces of constant *ϕ*. If $\delta \mathbf{\text{x}}$ is any vector tangent to the contour surface through **x**, then $\varphi (\mathbf{\text{x}}+\delta \mathbf{\text{x}})-\varphi (\mathbf{\text{x}})\simeq 0$ to linear order in $\delta \mathbf{\text{x}}$, so $\mathrm{\nabla}\varphi \cdot \delta \mathbf{\text{x}}=0$. Therefore $\mathrm{\nabla}\varphi $ is orthogonal to $\delta \mathbf{\text{x}}$, implying that $\mathrm{\nabla}\varphi $ is a vector orthogonal to the contour surface, as shown in Figure 1.9. In fact, $\mathrm{\nabla}\varphi $ is in the direction of steepest ascent of *ϕ*, and its magnitude is the rate of increase of $\varphi $ with distance in this direction. This justifies the name ‘gradient’.

There can be points **x** where all three partial derivatives vanish, and $\mathrm{\nabla}\varphi =0$. **x** is then a stationary point of *ϕ*. Whether the stationary point is a minimum, a maximum, or a saddle point depends on the second partial derivatives of *ϕ* at **x**.

There are nine possible second partial derivatives of *ϕ*; these include $\frac{{\partial}^{2}\varphi}{\partial {x}_{1}^{2}}$, $\frac{{\partial}^{2}\varphi}{\partial {x}_{1}\partial {x}_{2}}$, $\frac{{\partial}^{2}\varphi}{\partial {x}_{2}\partial {x}_{1}}$ and $\frac{{\partial}^{2}\varphi}{\partial {x}_{2}^{2}}$. The mixed partial derivative $\frac{{\partial}^{2}\varphi}{\partial {x}_{1}\partial {x}_{2}}$ is obtained by first differentiating with respect to *x*_{2}, and then differentiating the result by *x*_{1}, whereas for $\frac{{\partial}^{2}\varphi}{\partial {x}_{2}\partial {x}_{1}}$ the order of differentiation is reversed.

For example, for the function $\varphi ({x}_{1},{x}_{2},{x}_{3})={x}_{1}^{3}{x}_{2}^{4}{x}_{3}$, one has

$$\frac{\partial^2 \varphi}{\partial x_1^2} = 6x_1 x_2^4 x_3\,, \quad \frac{\partial^2 \varphi}{\partial x_1 \partial x_2} = 12x_1^2 x_2^3 x_3\,, \quad \frac{\partial^2 \varphi}{\partial x_2 \partial x_1} = 12x_1^2 x_2^3 x_3\,, \quad \frac{\partial^2 \varphi}{\partial x_2^2} = 12x_1^3 x_2^2 x_3. \qquad (1.28)$$

Notice that the mixed partial derivatives are actually the same. This is an important and completely general result.

To prove this result, one needs to think about the rectangle of values of *ϕ* shown in Figure 1.10 and estimate in two ways the expression

$$\varphi(x_1 + \delta x_1, x_2 + \delta x_2) - \varphi(x_1 + \delta x_1, x_2) - \varphi(x_1, x_2 + \delta x_2) + \varphi(x_1, x_2), \qquad (1.29)$$

where the dependence on $x_3$ is suppressed. One estimate is the difference of the differences along the vertical edges,

$$\left(\varphi(x_1 + \delta x_1, x_2 + \delta x_2) - \varphi(x_1 + \delta x_1, x_2)\right) - \left(\varphi(x_1, x_2 + \delta x_2) - \varphi(x_1, x_2)\right) \simeq \frac{\partial^2 \varphi}{\partial x_1 \partial x_2}\,\delta x_1 \delta x_2. \qquad (1.30)$$

The other estimate, with the bracketing reorganized, is the difference of the differences along the horizontal edges,

$$\left(\varphi(x_1 + \delta x_1, x_2 + \delta x_2) - \varphi(x_1, x_2 + \delta x_2)\right) - \left(\varphi(x_1 + \delta x_1, x_2) - \varphi(x_1, x_2)\right) \simeq \frac{\partial^2 \varphi}{\partial x_2 \partial x_1}\,\delta x_2 \delta x_1. \qquad (1.31)$$

As the left-hand sides of these two expressions (1.30) and (1.31) are the same, the mixed partial derivatives must be equal. This result is called the symmetry of mixed (second) partial derivatives, because there is a symmetry under exchange of the order of differentiation. We shall make use of this later, for example, when investigating Maxwell’s equations, and when deriving various thermodynamic relationships.
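The rectangle estimate can be tried out numerically. The following Python sketch applies the difference-of-differences to the example function $\varphi = x_1^3 x_2^4 x_3$ with $x_3$ fixed at 1, at an arbitrarily chosen point and step size:

```python
def phi(x1, x2):
    # The text's example function with x3 held fixed at 1.
    return x1**3 * x2**4

def mixed(f, x1, x2, h):
    # Difference of differences over a small square of side h,
    # divided by h**2: an estimate of the mixed second partial derivative.
    return (f(x1 + h, x2 + h) - f(x1 + h, x2)
            - f(x1, x2 + h) + f(x1, x2)) / h**2

x1, x2, h = 1.0, 2.0, 1e-5
# Analytically, the mixed partial derivative is 12 x1^2 x2^3 = 96 at (1, 2),
# whichever order the two differentiations are performed in.
assert abs(mixed(phi, x1, x2, h) - 96.0) < 1e-2
```

Because the difference of differences is symmetric in the two directions, the same number estimates both orders of differentiation, which is the content of the symmetry result.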

There is a particularly important combination of the second partial derivatives of *ϕ*, called the *Laplacian* of *ϕ*, and denoted by ${\mathrm{\nabla}}^{2}\varphi$. This is

$$\nabla^2\varphi = \frac{\partial^2 \varphi}{\partial x_1^2} + \frac{\partial^2 \varphi}{\partial x_2^2} + \frac{\partial^2 \varphi}{\partial x_3^2}, \qquad (1.32)$$

and it is a scalar, unchanged if the axes are rotated. The scalar property is evident if one regards $\left(\frac{\partial}{\partial x_1},\frac{\partial}{\partial x_2},\frac{\partial}{\partial x_3}\right)$ as a vector of derivatives and writes

$$\nabla^2\varphi = \left(\frac{\partial}{\partial x_1}, \frac{\partial}{\partial x_2}, \frac{\partial}{\partial x_3}\right) \cdot \left(\frac{\partial \varphi}{\partial x_1}, \frac{\partial \varphi}{\partial x_2}, \frac{\partial \varphi}{\partial x_3}\right), \qquad (1.33)$$

or more compactly ${\mathrm{\nabla}}^{2}\varphi =\mathrm{\nabla}\cdot \mathrm{\nabla}\varphi$. More formally still, ${\mathrm{\nabla}}^{2}=\mathrm{\nabla}\cdot \mathrm{\nabla}$. For our familiar example $\varphi ={x}_{1}^{3}{x}_{2}^{4}{x}_{3}$,

$$\nabla^2\varphi = 6x_1 x_2^4 x_3 + 12x_1^3 x_2^2 x_3, \qquad (1.34)$$

a typical, non-zero result. However, there are plenty of functions whose Laplacian vanishes, for example, ${x}_{1}^{2}-{x}_{2}^{2}$ and ${x}_{1}{x}_{2}{x}_{3}$.

In 3-space, one often needs to find the gradient or the Laplacian of a function $f(r)$ that depends only on the radial distance *r* from O. Here, ${r}^{2}={x}_{1}^{2}+{x}_{2}^{2}+{x}_{3}^{2}$. These calculations can be a bit fiddly, because *r* involves a square root, but are simpler if one works with *r*^{2}.

Let’s find the gradient first. By the chain rule,

$$\nabla(r^2) = 2r\,\nabla r. \qquad (1.35)$$

On the other hand, by direct partial differentiation of ${x}_{1}^{2}+{x}_{2}^{2}+{x}_{3}^{2}$,

$$\nabla(r^2) = (2x_1, 2x_2, 2x_3) = 2\mathbf{x}. \qquad (1.36)$$

Comparing these expressions we see that

$$\nabla r = \frac{1}{r}\mathbf{x} = \hat{\mathbf{x}}. \qquad (1.37)$$

Here **x** is a vector of magnitude *r*, and $\hat{\mathbf{x}}$ is the unit vector that points radially outwards at every point (except O). One can also understand equation (1.37) by noting that the contours of *r* are spheres centred at O, and the rate of increase of *r* with distance from O is unity everywhere. Equation (1.35) is easily generalized. For a general function $f(r)$, the chain rule gives

$$\nabla(f(r)) = f'(r)\, \nabla r = \frac{1}{r}\, f'(r)\, \mathbf{x}. \tag{1.38}$$
The most important example of this result is

$$\nabla\left(\frac{1}{r}\right) = -\frac{1}{r^2}\,\hat{\mathbf{x}} = -\frac{1}{r^3}\,\mathbf{x}, \tag{1.39}$$
which is useful when considering the inverse square law forces of electrostatics and gravity.
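As a quick sanity check (ours, not the text's), the gradient formula $\nabla\left(\frac{1}{r}\right) = -\frac{1}{r^3}\mathbf{x}$ can be compared against central-difference derivatives at an arbitrary point away from O:

```python
import math

# Central-difference check of grad(1/r) = -x/r^3 at the point (1, 2, 2),
# where r = 3.

def inv_r(x1, x2, x3):
    return 1.0 / math.sqrt(x1*x1 + x2*x2 + x3*x3)

def grad(f, x, h=1e-6):
    """Approximate gradient by symmetric differences in each direction."""
    g = []
    for i in range(3):
        xp = list(x); xp[i] += h
        xm = list(x); xm[i] -= h
        g.append((f(*xp) - f(*xm)) / (2.0 * h))
    return g

p = [1.0, 2.0, 2.0]
r = math.sqrt(sum(c * c for c in p))
expected = [-c / r**3 for c in p]
print(grad(inv_r, p))   # close to expected
print(expected)
```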

Next, let us find the Laplacian of $f(r)$. We have $\nabla(f(r))=\frac{1}{r}\,f'(r)\,\mathbf{x}$, so

$$\nabla^2(f(r)) = \nabla \cdot \left(\frac{1}{r}\, f'(r)\, \mathbf{x}\right). \tag{1.40}$$
By the usual Leibniz rule, there are two contributions to the last expression. In one, ∇ acts on the function $\frac{1}{r}{f}^{\prime}(r)$ to give the contribution

$$\nabla\left(\frac{1}{r}\, f'(r)\right) \cdot \mathbf{x} = \frac{1}{r}\left(\frac{1}{r}\, f'(r)\right)' \mathbf{x} \cdot \mathbf{x} = f''(r) - \frac{1}{r}\, f'(r), \tag{1.41}$$
where we have applied the result (1.38) again. The other is a dot product, in which the components $\left(\frac{\partial}{\partial x_1},\frac{\partial}{\partial x_2},\frac{\partial}{\partial x_3}\right)$ of ∇ act respectively on the three components $({x}_{1},{x}_{2},{x}_{3})$ of **x** to give the number 3, so the second contribution is $\frac{3}{r}{f}^{\prime}(r)$. Adding these, the result is

$$\nabla^2(f(r)) = f''(r) + \frac{2}{r}\, f'(r). \tag{1.42}$$
The most important example is

$$\nabla^2\left(\frac{1}{r}\right) = 0. \tag{1.43}$$
This equation is valid everywhere except at O. The function $\frac{1}{r}$ is infinite at O, so neither its gradient nor its Laplacian is defined there. One says that $\frac{1}{r}$ is singular at O. The most general function of the variable *r* alone whose Laplacian vanishes (except possibly at O) is $\frac{C}{r}+D$, where *C* and *D* are constants.
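The claim that $\frac{C}{r}+D$ is harmonic away from O can be tested numerically. This sketch (our own illustration; the constants $C=5$, $D=-2$ are arbitrary) reuses the central-difference Laplacian idea:

```python
import math

# Check that C/r + D has vanishing Laplacian away from the origin,
# using second central differences in each coordinate direction.

def f(x1, x2, x3, C=5.0, D=-2.0):
    return C / math.sqrt(x1*x1 + x2*x2 + x3*x3) + D

def laplacian(f, x, h=1e-4):
    total = 0.0
    for i in range(3):
        xp = list(x); xp[i] += h
        xm = list(x); xm[i] -= h
        total += (f(*xp) - 2.0 * f(*x) + f(*xm)) / h**2
    return total

print(laplacian(f, [1.0, 2.0, 2.0]))   # close to 0
```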

# 1.4 *e*, *π* and Gaussian Integrals

The transcendental numbers *e* and *π* appear throughout mathematics and physics, and will be used frequently in what follows. The exponential function *e*^{x}, often written as $\exp x$, and its complex counterpart ${e}^{ix}$ will also appear frequently. There are two remarkable relations between *e* and *π*. One is the famous Euler relation

$$e^{i\pi} = -1, \tag{1.44}$$
and the other is the Gaussian integral formula

$$\int_{-\infty}^{\infty} e^{-x^2}\, dx = \sqrt{\pi}. \tag{1.45}$$
We shall explain these in this section and also describe two basic physical applications of the real and complex exponential functions.

The exponential function is defined by the series

$$e^x = 1 + x + \frac{1}{2!}x^2 + \frac{1}{3!}x^3 + \cdots, \tag{1.46}$$
and is positive for all *x*. Obviously ${e}^{0}=1$. Euler’s constant *e* is defined as *e*^{1}, the sum of the series for $x=1$. Numerically, $e = 2.718\ldots$. By expanding out, one can verify that

$$e^x\, e^y = e^{x+y}, \tag{1.47}$$
which is the key property of the exponential function. This property makes it consistent to identify *e*^{x} (as a series) with the *x*-th power of *e*. As an illustration, *e*^{2} (as a series) equals ${e}^{1}\,{e}^{1}$ (the product of two series), so ${e}^{2}=e\times e$. Differentiating the series (1.46) term by term, one easily sees that

$$\frac{d}{dx}\, e^x = e^x. \tag{1.48}$$
The importance of this simple formula is illustrated in section 1.4.1.
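The defining series converges very rapidly. The following sketch (our illustration; `exp_series` is not a standard function) sums the series directly and confirms both the numerical value of *e* and the key property $e^2 = e \times e$:

```python
import math

# Partial sums of the series e^x = sum_n x^n / n!. Thirty terms already
# reproduce Euler's constant e = e^1 to machine precision.

def exp_series(x, terms=30):
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term           # add x^n / n!
        term *= x / (n + 1)     # next term from the previous one
    return total

print(exp_series(1.0))                        # 2.718281828...
print(exp_series(2.0), exp_series(1.0)**2)    # e^2 equals e * e
```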

The extension of the exponential function to imaginary arguments is defined using the same series expansion,

$$e^{ix} = 1 + ix - \frac{1}{2!}x^2 - \frac{i}{3!}x^3 + \frac{1}{4!}x^4 + \cdots, \tag{1.49}$$
where ${i}^{2}=-1$. The real and imaginary parts of this expansion are the well known series expansions of $\cos x$ and $\sin x$,

$$\cos x = 1 - \frac{1}{2!}x^2 + \frac{1}{4!}x^4 - \cdots, \tag{1.50}$$

$$\sin x = x - \frac{1}{3!}x^3 + \frac{1}{5!}x^5 - \cdots, \tag{1.51}$$
so

$$e^{ix} = \cos x + i \sin x. \tag{1.52}$$
Now, $\cos\pi = -1$ and $\sin\pi = 0$, so substituting $x=\pi$ into this expression we obtain the Euler relation, ${e}^{i\pi}=-1$. Raising both sides to the power $2n$, we see that one consequence is that ${e}^{2ni\pi}=1$ for any integer *n*.
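These identities are easy to illustrate with Python's complex exponential (a numerical illustration of ours, not part of the text):

```python
import cmath
import math

# Euler relation: e^{i pi} = -1
z = cmath.exp(1j * math.pi)
print(z)                              # approximately -1 + 0j

# e^{ix} = cos x + i sin x at an arbitrary x
x = 0.7
w = cmath.exp(1j * x)
print(w.real - math.cos(x), w.imag - math.sin(x))   # both essentially 0

# e^{2 n i pi} = 1 for any integer n (here n = 5)
print(cmath.exp(2j * 5 * math.pi))    # approximately 1 + 0j
```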

## 1.4.1 Radioactive decay

Radioactivity was discovered in 1896 by Henri Becquerel. When a radioactive nucleus decays it changes into a different nucleus. The rate of change of the number of radioactive nuclei *N* is described by the following law:

$$\frac{dN}{dt} = -\lambda N, \tag{1.53}$$
where *λ* is known as the decay constant. The radioactivity decays exponentially, as the solution of the differential equation (1.53) is

$$N(t) = N_0\, e^{-\lambda t}, \tag{1.54}$$
where *N*_{0} is the initial number of radioactive nuclei when $t=0$. The solution is shown in Figure 1.11. Taking logarithms, we obtain

$$t = -\frac{1}{\lambda} \ln\left(\frac{N}{N_0}\right). \tag{1.55}$$
The time ${\tau}_{\frac{1}{2}}$ taken for half the nuclei to decay is known as the *half-life* of the radioactive substance. It is given by $\ln\left(\frac{1}{2}\right)=-\lambda\,{\tau}_{\frac{1}{2}}$, so

$$\tau_{\frac{1}{2}} = \frac{\ln 2}{\lambda}. \tag{1.56}$$
We can also work out the mean lifetime $\bar{t}$ of the radioactive nucleus. All *N*_{0} nuclei eventually decay so we can average over the times of decay, finding

$$\begin{aligned}\bar{t} &= \frac{1}{N_0}\int_0^{N_0} t\, dN\\ &= -\frac{1}{\lambda N_0}\int_0^{N_0} \ln\left(\frac{N}{N_0}\right) dN = \frac{1}{\lambda}, \end{aligned} \tag{1.57}$$
where in the second line we have substituted for *t* from equation (1.55).

Radioactivity provides an extremely useful tool for dating artefacts. If we know that a sample material originally contained *N*_{0} radioactive nuclei, and now contains *N* of these, then we can determine the time *t* that has elapsed since the material formed,

$$t = \frac{1}{\lambda}\ln\left(\frac{N_0}{N}\right). \tag{1.58}$$
Different radioactive nuclei may be used depending on the relevant timescales. For instance, uranium-238, with a half-life of about 4.5 billion years, has been used to date meteorites and thereby determine the age of the solar system, and carbon-14, with a half-life of 5730 years, is used to date archaeological remains.
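The dating formula is simple to apply in code. This sketch (our own; the surviving fractions are invented for illustration) uses the carbon-14 half-life of 5730 years quoted above:

```python
import math

# Radiocarbon dating: t = (1/lambda) ln(N0/N), with lambda = ln 2 / half-life.

HALF_LIFE_C14 = 5730.0                  # years
lam = math.log(2.0) / HALF_LIFE_C14     # decay constant in 1/years

def age(fraction_remaining):
    """Elapsed time in years, given the surviving fraction N/N0."""
    return math.log(1.0 / fraction_remaining) / lam

print(age(0.5))     # one half-life: 5730 years
print(age(0.25))    # two half-lives: 11460 years
```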

## 1.4.2 Waves and periodic functions

We can represent a wave varying with position *x* and time *t*, travelling towards the positive *x*-direction, as ${e}^{i(kx-\omega t)}$ with *k* and ω positive, as shown in Figure 1.12. Because of the Euler relation, the wave will be the same at positions for which *kx* differs by an integer multiple of $2\pi $, so the wavelength is $\frac{2\pi}{k}$. Similarly, the wave will be the same at times for which $\omega t$ differs by an integer multiple of $2\pi $, so the period is $\frac{2\pi}{\omega}$. *k* and ω are called, respectively, the *wavenumber* and *angular frequency* of the wave.

The phase of the wave is the quantity $kx-\omega t$, and it is constant at a point *x* that moves along at velocity $\frac{\omega}{k}$; this is the velocity of the wave. If *k* is negative, and ω still positive, the wave moves in the opposite direction.

The real and imaginary parts of the wave are $cos(kx-\omega t)$ and $sin(kx-\omega t)$. These are called sine waves, but one is phase shifted by $\frac{\pi}{2}$ relative to the other. Many types of wave, for example, electromagnetic waves and the waves on the surface of a fluid, are real, but in quantum mechanics the wavefunction of a freely moving particle is a complex wave.
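The statement that the wave travels at the phase velocity $\frac{\omega}{k}$ can be illustrated directly: sampling ${e}^{i(kx-\omega t)}$ at a point moving with $x = \frac{\omega}{k} t$ gives a constant complex value. (A numerical illustration of ours; the values of *k* and ω are arbitrary.)

```python
import cmath

# The wave e^{i(k x - omega t)} is unchanged along x = (omega/k) t.

k, omega = 2.0, 3.0          # arbitrary wavenumber and angular frequency
v = omega / k                # phase velocity

samples = [cmath.exp(1j * (k * (v * t) - omega * t)) for t in (0.0, 0.4, 1.3)]
print(samples)               # all equal to 1 + 0j: the phase never changes
```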

## 1.4.3 The Gaussian integral

The integral of the Gaussian function ${e}^{-{x}^{2}}$, shown in Figure 1.13, cannot be expressed in terms of standard functions, so the integral from $-\infty$ to a finite limit *X* is not elementary. On the other hand, the definite integral from $-\infty$ to ∞ has the value

$$I = \int_{-\infty}^{\infty} e^{-x^2}\, dx = \sqrt{\pi}. \tag{1.59}$$
This is the simplest *Gaussian integral*. It often arises in physics, and we will make use of it later.

*I* can be evaluated using a rather surprising trick. We begin by considering its square,

$$I^2 = \left(\int_{-\infty}^{\infty} e^{-x_1^2}\, dx_1\right) \left(\int_{-\infty}^{\infty} e^{-x_2^2}\, dx_2\right). \tag{1.60}$$
This can be expressed as the 2-dimensional integral

$$I^2 = \int_{\mathbb{R}^2} e^{-(x_1^2 + x_2^2)}\, d^2x, \tag{1.61}$$
where the integral is over the whole plane, ${\mathbb{R}}^{2}$. Now convert to polar coordinates. Let *r* be the radial coordinate and $\vartheta$ the angular coordinate. Then, by Pythagoras’ theorem ${r}^{2}={x}_{1}^{2}+{x}_{2}^{2}$, and the integration measure is $d^2x = r\, dr\, d\vartheta$. So

$$I^2 = \int_0^{2\pi} \int_0^{\infty} e^{-r^2}\, r\, dr\, d\vartheta. \tag{1.62}$$
The range of $\vartheta$ is $2\pi$ because, geometrically, $2\pi$ is the length of the unit circle. The extra factor of *r* makes the integral over *r* elementary, and the result is

$$I^2 = 2\pi \left[-\frac{1}{2}\, e^{-r^2}\right]_0^{\infty} = \pi, \tag{1.63}$$
so $I=\sqrt{\pi}$, as claimed.
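A brute-force numerical check of this result is straightforward. The sketch below (our own, using a simple midpoint rule) exploits the rapid decay of ${e}^{-{x}^{2}}$, which makes truncation at $|x|=10$ harmless:

```python
import math

# Midpoint-rule evaluation of the basic Gaussian integral I = sqrt(pi).
# The integrand at |x| = 10 is e^{-100}, utterly negligible.

def gaussian_integral(a=-10.0, b=10.0, n=200000):
    h = (b - a) / n
    return h * sum(math.exp(-(a + (i + 0.5) * h) ** 2) for i in range(n))

print(gaussian_integral())      # 1.7724538...
print(math.sqrt(math.pi))       # 1.7724538...
```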

A more general Gaussian integral is

$$I(\alpha) = \int_{-\infty}^{\infty} e^{-\alpha x^2}\, dx = \frac{1}{\sqrt{\alpha}} \int_{-\infty}^{\infty} e^{-y^2}\, dy = \sqrt{\frac{\pi}{\alpha}}, \tag{1.64}$$
where we have used the substitution $y=\sqrt{\alpha}\,x$. Another useful trick allows us to evaluate a sequence of integrals where the Gaussian function is multiplied by an even power of *x*. Differentiating the integral $I(\alpha)$ with respect to *α* brings down a factor of $-{x}^{2}$, so

$$\int_{-\infty}^{\infty} x^2\, e^{-\alpha x^2}\, dx = -\frac{dI}{d\alpha} = \frac{1}{2}\sqrt{\pi}\, \alpha^{-\frac{3}{2}}. \tag{1.65}$$
Differentiating a second time with respect to *α*, we obtain

$$\int_{-\infty}^{\infty} x^4\, e^{-\alpha x^2}\, dx = \frac{d^2 I}{d\alpha^2} = \frac{3}{4}\sqrt{\pi}\, \alpha^{-\frac{5}{2}}. \tag{1.66}$$
We can continue differentiating with respect to *α* to evaluate all integrals of the form ${\int}_{-\mathrm{\infty}}^{\mathrm{\infty}}{x}^{2n}{e}^{-\alpha {x}^{2}}\phantom{\rule{thinmathspace}{0ex}}dx$.

If the Gaussian is multiplied by an odd power of *x*, the integrand is an odd function, antisymmetric under $x\to -x$, so ${\int}_{-\infty}^{\infty}{x}^{2n+1}{e}^{-\alpha {x}^{2}}\,dx=0$. When the lower limit is 0, these integrals can be evaluated (taking $\alpha=1$) by substituting $y={x}^{2}$, then integrating by parts, to give

$$\int_0^{\infty} x^{2n+1}\, e^{-x^2}\, dx = \frac{1}{2} \int_0^{\infty} y^n\, e^{-y}\, dy = \frac{n}{2} \int_0^{\infty} y^{n-1}\, e^{-y}\, dy. \tag{1.67}$$
Repeating this procedure *n* times, we find ${\int}_{0}^{\infty}{y}^{n}{e}^{-y}\,dy=n!\,{\int}_{0}^{\infty}{e}^{-y}\,dy=n!$, and therefore

$$\int_0^{\infty} x^{2n+1}\, e^{-x^2}\, dx = \frac{n!}{2}. \tag{1.68}$$
The basic Gaussian integral and these variants are useful in many areas of physics, especially quantum mechanics and quantum field theory.
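The odd-moment formula is easy to verify numerically. This sketch (ours; a midpoint rule truncated at $x=10$, where the integrand is negligible) checks $\int_0^{\infty} x^{2n+1} e^{-x^2}\, dx = \frac{n!}{2}$ for small *n*:

```python
import math

# Numerical check of int_0^inf x^{2n+1} e^{-x^2} dx = n!/2.

def odd_moment(n, b=10.0, steps=200000):
    h = b / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x ** (2 * n + 1) * math.exp(-x * x)
    return h * total

for n in range(4):
    print(n, odd_moment(n), math.factorial(n) / 2.0)
```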

We can also find interesting geometrical results by considering the *n*th power of *I*,

$$I^n = \left(\int_{-\infty}^{\infty} e^{-x^2}\, dx\right)^n, \tag{1.69}$$
which can be re-expressed as an *n*-dimensional integral

$$I^n = \int_{\mathbb{R}^n} e^{-(x_1^2 + x_2^2 + \cdots + x_n^2)}\, d^n x. \tag{1.70}$$
Now convert to spherical polar coordinates $r,\Omega$ in *n* dimensions, where Ω denotes collectively the $n-1$ angular coordinates. By Pythagoras’ theorem in *n* dimensions, ${r}^{2}={x}_{1}^{2}+{x}_{2}^{2}+\cdots +{x}_{n}^{2}$, and the integration measure ${d}^{n}x$ becomes $r^{n-1}\, dr\, d\Omega$, where $d\Omega$ denotes the volume element of the unit sphere in *n* dimensions, the unit $(n-1)$-sphere. So

$$I^n = \int d\Omega \int_0^{\infty} e^{-r^2}\, r^{n-1}\, dr. \tag{1.71}$$
The integral of $d\mathrm{\Omega}$ is the total volume of the unit $(n-1)$-sphere, and the remaining radial integral is one of the Gaussian integrals considered above.

For example, in the case of *I*^{3} the radial integral has the same form as the integral (1.65), but with the lower limit 0 (and $\alpha =1$). It equals $\frac{1}{4}\sqrt{\pi}$, half the full Gaussian integral, so

$$I^3 = \frac{1}{4}\sqrt{\pi}\, A, \tag{1.72}$$
where *A* is the area of the unit 2-sphere, the sphere we are familiar with. We know that $I=\sqrt{\pi}$, so ${I}^{3}=\pi \sqrt{\pi}$, and therefore $A=4\pi $, the well known result for the area of the
sphere. Note that in this calculation, using a Gaussian integral, we have not needed to make an explicit choice of angular coordinates to find *A*.

By a similar calculation we can find a less well known result, the volume of the unit sphere in four dimensions, the 3-sphere. Just as the 2-sphere is a 2-dimensional surface enclosing a 3-dimensional ball, so the 3-sphere is a 3-dimensional volume enclosing a ball of 4-dimensional space. Equation (1.71) becomes

$$I^4 = V \int_0^{\infty} e^{-r^2}\, r^3\, dr, \tag{1.73}$$
where *V* is the volume of the unit 3-sphere. Using ${I}^{4}={\pi}^{2}$ and the integral (1.68) in the case $n=1$, namely ${\int}_{0}^{\mathrm{\infty}}{e}^{-{r}^{2}}\phantom{\rule{thinmathspace}{0ex}}{r}^{3}\phantom{\rule{thinmathspace}{0ex}}dr=\frac{1}{2}$, we find $V=2{\pi}^{2}$.
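The same Gaussian trick works for any *n*. Writing the radial integral in terms of the gamma function (which goes slightly beyond the text: substituting $u = r^2$ gives $\int_0^{\infty} r^{n-1} e^{-r^2}\, dr = \frac{1}{2}\Gamma(\frac{n}{2})$), the volume of the unit $(n-1)$-sphere becomes $\frac{2\pi^{n/2}}{\Gamma(n/2)}$. A sketch (ours) checking the cases in the text:

```python
import math

# Volume of the unit (n-1)-sphere from I^n = Vol * (1/2) Gamma(n/2),
# with I^n = pi^{n/2}. Gamma is evaluated with math.gamma.

def unit_sphere_volume(n):
    """Volume of the unit (n-1)-sphere embedded in n dimensions."""
    return 2.0 * math.pi ** (n / 2.0) / math.gamma(n / 2.0)

print(unit_sphere_volume(2))   # length of the unit circle: 2*pi
print(unit_sphere_volume(3))   # area of the 2-sphere: 4*pi
print(unit_sphere_volume(4))   # volume of the 3-sphere: 2*pi^2
```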

## 1.4.4 The method of steepest descents

In many physics applications we arrive at integrals that cannot be evaluated exactly, where the integrand is a product of a variant of a Gaussian and another function. We will see an example of this in Chapter 11, when considering nuclear fusion. In such cases, the basic Gaussian integral can be used to give estimates of these more complicated integrals. Suppose $g(x)$ has a single maximum between *α* and *β* at *x*_{0}; then, since ${g}^{\prime}({x}_{0})=0$ and ${g}^{\prime\prime}({x}_{0})<0$, we can use the expansion $g(x)\simeq g({x}_{0})-\frac{1}{2}|{g}^{\prime\prime}({x}_{0})|(x-{x}_{0}{)}^{2}$ near *x*_{0}. This means that the integral

$$\int_{\alpha}^{\beta} F(x)\, e^{g(x)}\, dx \tag{1.74}$$
can be approximated by

$$\int_{\alpha}^{\beta} F(x)\, e^{g(x_0) - \frac{1}{2}|g''(x_0)|(x-x_0)^2}\, dx. \tag{1.75}$$
If, further, $F(x)$ varies slowly near *x*_{0}, then it can be treated as a constant $F({x}_{0})$ and pulled out of the integral to give

$$F(x_0)\, e^{g(x_0)} \int_{\alpha}^{\beta} e^{-\frac{1}{2}|g''(x_0)|(x-x_0)^2}\, dx. \tag{1.76}$$
As the integrand is concentrated around the point *x*_{0}, we can extend the limits of integration to $\pm\infty$ without significantly affecting the value of the integral, so

$$\int_{\alpha}^{\beta} F(x)\, e^{g(x)}\, dx \simeq F(x_0)\, e^{g(x_0)} \sqrt{\frac{2\pi}{|g''(x_0)|}}, \tag{1.77}$$
where in the last step we used the Gaussian integral (1.64).

This is known as the steepest descents approximation. It is accurate provided the second derivative ${g}^{\mathrm{\prime}\mathrm{\prime}}({x}_{0})$ has a large magnitude and higher order terms in the Taylor expansions of *g* and *F* around *x*_{0} can be neglected.
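The quality of the approximation can be judged on a concrete case. In the sketch below (our own; the functions $F(x)=x^2$ and $g(x)=-50(x-1)^2$ are invented for illustration), the peak is at $x_0=1$ with $g(x_0)=0$ and $|g''(x_0)|=100$, and the estimate agrees with direct numerical integration to about one per cent:

```python
import math

# Steepest-descents estimate versus direct midpoint-rule integration of
# int_0^2 F(x) e^{g(x)} dx, for F(x) = x^2 and g(x) = -50 (x - 1)^2.

def F(x): return x * x
def g(x): return -50.0 * (x - 1.0) ** 2

# Direct numerical integration
n, a, b = 200000, 0.0, 2.0
h = (b - a) / n
direct = h * sum(F(a + (i + 0.5) * h) * math.exp(g(a + (i + 0.5) * h))
                 for i in range(n))

# Steepest descents: F(x0) e^{g(x0)} sqrt(2 pi / |g''(x0)|)
x0, g2 = 1.0, 100.0
approx = F(x0) * math.exp(g(x0)) * math.sqrt(2.0 * math.pi / g2)

print(direct, approx)   # about 0.2532 versus 0.2507
```

The small discrepancy comes from the slow variation of $F$ across the peak; sharpening the Gaussian (increasing $|g''(x_0)|$) makes the estimate better.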

# 1.5 Further Reading


For a survey of variational principles and their history, see

D.S. Lemons, *Perfect Form: Variational Principles, Methods, and Applications in Elementary Physics*, Princeton: PUP, 1997.

H.H. Goldstine, *A History of the Calculus of Variations: from the 17th through the 19th Century*, New York: Springer, 1980.

For comprehensive coverage of the mathematics used in this book, consult

K.F. Riley, M.P. Hobson and S.J. Bence, *Mathematical Methods for Physics and Engineering (3rd ed.)*, Cambridge: CUP, 2006.

## Notes:

(^{1})
Snell’s law may be more familiar in terms of the angles ${\phi}^{\mathrm{\prime}}=\frac{\pi}{2}-\phi $ and ${\vartheta}^{\mathrm{\prime}}=\frac{\pi}{2}-\vartheta $ between the light ray and the normal (the perpendicular line) to the surface, in which case it takes the form $sin{\phi}^{\mathrm{\prime}}=\frac{{c}_{2}}{{c}_{1}}sin{\vartheta}^{\mathrm{\prime}}$.

(^{2})
A saddle point in a landscape is a stationary point of the height, like a mountain pass, but is neither a maximum nor a minimum.