The Physical World: An Inspirational Tour of Fundamental Physics

Nicholas Manton and Nicholas Mee

Print publication date: 2017

Print ISBN-13: 9780198795933

Published to Oxford Scholarship Online: July 2017

DOI: 10.1093/oso/9780198795933.001.0001


1 Fundamental Ideas

Source: The Physical World
Author(s): Nicholas Manton, Nicholas Mee
Publisher: Oxford University Press
DOI: 10.1093/oso/9780198795933.003.0002

Abstract and Keywords

Chapter 1 offers a simple introduction to the use of variational principles in physics. This approach to physics plays a key role in the book. The chapter starts with a look at how we might minimize a journey by car, even if this means taking a longer route. Soap films are also discussed. It then turns to geometrical optics and uses Fermat’s principle to explain the reflection and refraction of light. There follows a discussion of the significance of variational principles throughout physics. The chapter also covers some introductory mathematical ideas and techniques that will be used in later chapters. These include the mathematical representation of space and time and the use of vectors; partial differentiation, which is necessary to express all the fundamental equations of physics; and Gaussian integrals, which arise in many physical contexts. These mathematical techniques are illustrated by their application to waves and radioactive decay.

Keywords:   Fermat’s principle, geometrical optics, variational principle, Gaussian integral, waves, vectors, radioactive decay, soap films

1.1 Variational Principles

Many of our activities in everyday life are directed towards optimizing some quantity. We often try to perform tasks with minimal effort, or as quickly as possible. Here is a simple example: we may plan a road journey to minimize the travel time, taking a longer route in order to go faster along a section of highway. Figure 1.1 is a schematic road map between towns A and B. Speed on the ordinary roads is 50 mph, and on the highway passing through F, G and H it is 70 mph. The minimum journey time is 1 hr 24 mins along route AFGB, even though this is not the shortest route.
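The route comparison can be sketched in a few lines of Python. The distances below are made up for illustration (the actual mileages of Figure 1.1 are not reproduced here), but they show the same trade-off: a longer route can still be faster.

```python
# Hypothetical road network: a 60-mile direct route on ordinary roads (50 mph)
# versus a 73-mile route that spends most of its length on a 70 mph highway.

def travel_time(segments):
    """Total time in hours for a route given (distance_miles, speed_mph) pairs."""
    return sum(d / v for d, v in segments)

direct = [(60, 50)]                         # ordinary roads only
via_highway = [(5, 50), (63, 70), (5, 50)]  # short links to and from the highway

print(f"direct:      {travel_time(direct) * 60:.0f} min")       # 72 min
print(f"via highway: {travel_time(via_highway) * 60:.0f} min")  # 66 min
```

Minimizing time rather than distance selects the 73-mile route, just as the minimum-time route AFGB in the text is not the shortest one.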


Fig. 1.1 Road map with distances in miles. The speed on ordinary roads is 50 mph, and on the highway 70 mph.

Remarkably, many natural processes can similarly be seen as optimizing some quantity. We say that they satisfy a variational principle. An elastic band stretched between two points lies along a straight line; this is the shortest path and also minimizes the elastic band’s energy. We can understand why a straight line is the shortest path as follows. First we need to assume that a shortest path does exist. In the current situation this is obvious, but there are more complicated optimization problems where there is no optimal solution. Now assume that the shortest path has a curved segment somewhere along it. Any curved segment can be approximated by part of a circle, as shown in Figure 1.2, and using a little trigonometry, we can check that the straight segment CD is shorter than the circular arc CD. In fact, the circular arc has length 2Rα, and the straight segment has length 2R sinα, which is shorter. So the assumption that the shortest path is somewhere curved is contradictory. Therefore the shortest path is straight.


Fig. 1.2 Any curved section of a path can be approximated by part of a circle. A straight chord across this circle is shorter than the curved path.

A soap film is another familiar physical example of energy optimization. Although it might initially be vibrating, the soap film will eventually settle into a state in which it is at rest. Its energy is then the product of its constant surface tension and its area, so the energy is minimized when the area is minimized. For any smooth surface in 3-dimensional space, there are two principal radii of curvature, r₁ and r₂; for a surface of minimal area the two radii of curvature are equal, but the curvatures are in opposite directions. Every region of the surface is saddle-like, as shown in Figure 1.3. We can understand physically why the surface tension has this effect. On each small element of the surface, the two curvatures produce forces. If they are equal in magnitude and opposite in direction then they cancel, and the surface element is in equilibrium. We therefore have an intimate connection between the physical ideas of energy and force and the geometrical concept of minimal area. We will discuss the geometry of curved surfaces further in Chapter 5.


Fig. 1.3 A soap film is a surface of minimal area. The two radii of curvature are equal, but the curvatures are in opposite directions. The force due to the curvature in one direction balances the force due to the curvature in the other direction.

1.1.1 Geometrical optics—reflection and refraction

Fermat’s principle in the field of optics was the first optimization principle to be discovered in physics. It was described by Pierre de Fermat in 1662. Geometrical optics is the study of idealized, infinitesimally thin beams of light, known as light rays. In the real world, narrow beams of light that are close to ideal rays can be obtained using parabolic mirrors or by projecting light through a screen containing narrow slits. Even if the light is not physically restricted like this, it can still be considered as a collection of rays travelling in different directions.

Fermat’s principle says that the path taken by a light ray between two given points, A and B, is the path that minimizes the total travel time. The path may be straight, or it may be bent or even curved as it passes through various media. A fundamental assumption is that in a given medium, a light ray has a definite, finite speed. In a uniform medium, for example air or water or a vacuum, the travel time equals the length of the path divided by the light speed. Since the speed is constant, the path of minimal time is also the shortest path, and this is the straight line path from A to B. So light rays are straight in a uniform medium, as is readily verified. A light ray heading off in the correct direction from a source at A will arrive at B, and even though the source may emit light in all directions, a small obstacle anywhere along the line between A and B will prevent light reaching B, and will cast a shadow there.

Fermat’s principle can be used to understand two basic laws of optics, the laws of reflection and refraction. First, let’s consider reflection. Suppose we have a long flat mirror in a uniform medium, and a light source at A. Let B be the light receiving point, on the same side of the mirror, as shown in Figure 1.4. Consider all the possible light rays from A to B that bounce off the mirror once. If the time for the light to travel from A to B is to be minimized, the segments before and after reflection must be straight. What we’d like to know is the position of the reflection point X.


Fig. 1.4 Reflection of a light ray from a mirror.

The coordinates in the figure show the x-axis along the mirror, and the reflection point X is at x = X. Consider the various lengths in the figure, and ignore the angles ϑ and φ for the moment. Using Pythagoras’ theorem to determine the path lengths, we find that the time for the light to travel from A to B via X is

(1.1)
T = (1/c)(√(a²+X²) + √(b²+(L−X)²)),

where c is the speed of the light along both straight segments. The derivative of T with respect to X is

(1.2)
dT/dX = (1/c)(X/√(a²+X²) − (L−X)/√(b²+(L−X)²)),

and the travel time is minimized when this derivative vanishes, giving the equation for X,

(1.3)
X/√(a²+X²) = (L−X)/√(b²+(L−X)²).

Now the angles come in handy, as equation (1.3) is equivalent to

(1.4)
cosϑ = cosφ,

as can be seen from Figure 1.4. Therefore ϑ and φ are equal. We haven’t explicitly found X, but that doesn’t matter. The important result is that the incoming and outgoing light rays meet the mirror surface at equal angles. This is the fundamental law of reflection. In fact, by simplifying equation (1.3) or by considering the equation cotϑ = cotφ, we obtain X/a = (L−X)/b, and then X is easily found.
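This minimization is easy to check numerically: scanning candidate reflection points for the minimum of T reproduces the closed-form result X = aL/(a + b). The values of a, b, L and c below are arbitrary sample choices, not from the text.

```python
import math

# Travel time (1.1) for reflection off a mirror along the x-axis;
# a, b, L and c are arbitrary sample values.
a, b, L, c = 3.0, 5.0, 10.0, 1.0

def T(X):
    return (math.sqrt(a**2 + X**2) + math.sqrt(b**2 + (L - X)**2)) / c

# Minimize T by scanning a fine grid of candidate reflection points.
X_best = min((i * L / 10**5 for i in range(10**5 + 1)), key=T)

# Equal angles imply X/a = (L - X)/b, i.e. X = aL/(a + b) = 3.75 here.
print(X_best, a * L / (a + b))
```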

Refraction isn’t very different. Here, light rays pass from a medium where the speed is c₁ into another medium where the speed is c₂. The geometry of refraction is different from that of reflection, but not very much, and we use similar coordinates (see Figure 1.5). By Fermat’s principle, the path of the actual light ray from A to B, or from B to A, is the one that minimizes the time taken. Note that, unless c₁ = c₂, this is definitely not the same as the shortest path from A to B, which is the straight line between them. The path of minimum time has a kink, just like the route via the highway that we considered earlier.


Fig. 1.5 The refraction of a light ray. c₂, the light speed in medium 2, is less than c₁, the speed in medium 1.

The rays from A to X and from X to B must be straight, because each of these segments is wholly within a single medium and traced out at a single speed. The total time for the light to travel from A to B is therefore

(1.5)
T = (1/c₁)√(a²+X²) + (1/c₂)√(b²+(L−X)²).

The time T is again minimized when the derivative of T with respect to X vanishes, that is,

(1.6)
dT/dX = (1/c₁)X/√(a²+X²) − (1/c₂)(L−X)/√(b²+(L−X)²) = 0.

This gives the equation for X,

(1.7)
(1/c₁)X/√(a²+X²) = (1/c₂)(L−X)/√(b²+(L−X)²).

We do not really want to solve this, but rather to express it more geometrically. In terms of the angles ϑ and φ in Figure 1.5, the equation becomes

(1.8)
(1/c₁)cosϑ = (1/c₂)cosφ,

or more usefully

(1.9)
cosφ = (c₂/c₁)cosϑ.

This is Willebrord Snell’s law of refraction.1 It relates the angles of the light rays to the ratio of the light speeds c₂ and c₁. Snell’s law can be tested experimentally even if the light speeds are unknown. To do this, the angle at which the light beam hits the surface must be varied, so that A and B are no longer fixed. When cosφ is plotted against cosϑ, the resulting graph is a straight line through the origin.

Suppose the light passes from air into water. The speed of light in water is less than its speed in air, so c₂ is less than c₁, and cosφ is less than cosϑ. Therefore φ is greater than ϑ. The result, as is easily verified, is that light rays are bent into the water towards the normal to the surface, as shown in Figure 1.5.

Snell’s law has many interesting consequences. It is key to applications such as light focussing and lens systems. It also accounts for the phenomenon of total internal reflection. This occurs when a light ray originating at B, in the medium where the light speed is less, hits the surface at a small angle φ for which cosφ is close to 1, and therefore cosϑ would need to be greater than 1. There is then no solution for the angle ϑ, so the ray cannot cross the surface into medium 1, and the entire ray is reflected internally. The critical angle of incidence φc for total internal reflection depends on the ratio of the light speeds in the two media. Equation (1.9) shows that cosφc = c₂/c₁. This result is important for applications such as the transmission of light signals down fibre optic cables.
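As a small numerical illustration (not from the book), the critical angle follows directly from cosφc = c₂/c₁. The speed ratio 0.75 below is an assumed round number, roughly appropriate for light passing between air and water.

```python
import math

ratio = 0.75  # assumed c2/c1, roughly air-to-water

# Snell's law in the book's convention (1.9), with angles measured from the
# surface rather than from the normal: cos(phi) = ratio * cos(theta).
def refracted_angle(theta):
    return math.acos(ratio * math.cos(theta))

# A ray in medium 2 can only escape while cos(phi) <= ratio, so the
# critical angle of incidence is:
phi_c = math.acos(ratio)
print(math.degrees(phi_c))  # about 41.4 degrees
```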

Originally, the law of refraction was expressed in terms of a ratio of refractive indices on the right-hand side of equation (1.9). It was by considering Fermat’s principle that physicists realised that the ratio could be understood as a ratio of light speeds. Later, when the speed of light in various media could be directly measured, it was found that light travels at its maximal speed in a vacuum, and only slightly slower in air. In denser materials such as water or glass, however, its speed is considerably slower, by about 20–40%. The speed of light in a vacuum is an absolute constant, 299,792,458 m s⁻¹, which is often approximated as 3 × 10⁸ m s⁻¹. In dense media the speed may depend on the wavelength of the light, so in passing from air into glass or water, rays of different colours bend through different angles, which is why a refracted beam of white light splits up when entering a glass prism or water droplet.

1.1.2 The scope of variational principles

We have given a brief flavour of how some mathematical laws of nature can be formulated in terms of variational principles. These principles are actually much more general, and occur throughout physics. Be it the motion of particles, the waveforms of fields, quantum states, or even the shape of spacetime itself, we find that natural processes always optimize some physical quantity. Usually this means that the quantity is minimized or maximized, but it may be at a saddle point.2 The most important such quantity is known as an action, and many laws of physics can be formulated as a principle of least action. The appropriate mathematics for analysing these principles is called the calculus of variations. It is an extension of ordinary calculus, with its own additional tools that we will explain later.

As long ago as the 18th century, Jean le Rond D’Alembert, Leonhard Euler and Joseph-Louis Lagrange realized that Newton’s laws of motion could be derived from a principle of least action. This approach was perfected by William Rowan Hamilton in the 1830s. We now know that Maxwell’s equations for electric and magnetic fields also arise from an electromagnetic action principle, and in 1915 David Hilbert showed that Einstein’s newly discovered equations describing gravity as curved spacetime arise from an action principle. Even the relationship between classical physics and quantum mechanics is best understood in terms of an action principle. This idea was pioneered by Paul Dirac, and perfected by Feynman. Today, the action principle is seen as the best method of encapsulating the behaviour of particles and fields.

One advantage of formulating physical theories in this way is that the principle of least action is concise and easy to remember. For example, in Maxwell’s original formulation of electromagnetism, there were 20 equations for electromagnetic fields. In modern vector notation, due to Josiah Willard Gibbs, there are four Maxwell equations, supplemented by the Lorentz force law for charged particles. The action, on the other hand, is a single quantity constructed from the electromagnetic fields and the trajectories of charged particles, as we will describe in Chapter 3. This economy is essential when developing the more complicated gauge theories of elementary particles, discussed in Chapter 12, and even more esoteric theories, such as string theory.

In Chapter 2 we will return to these ideas and show how Newtonian mechanics can be understood in terms of the principle of least action. By considering all possible infinitesimal variations in the motion of a physical body through space, we will derive Newton’s laws of motion. First, however, we must describe mathematically the arena in which this motion takes place.

1.2 Euclidean Space and Time

Familiar 3-dimensional Euclidean space, known as 3-space for short and often denoted by ℝ³, is the stage on which the drama of the physical world is played out. This drama takes place in time, but time and space are not unified in non-relativistic physics, so we will not require a geometrical description of time as yet. 3-space has the Euclidean symmetries of rotations and translations, where a translation is a rigid motion without rotation. The most fundamental geometrical concept is the distance between two points, and this is unchanged by translations and rotations. It is natural to express the laws of physics in a way that is independent of position and orientation. Then their form does not change when the entire physical system is translated or rotated. This gives the laws a geometrical significance.

A point in space is most easily described using Cartesian coordinates. For this one needs to pick an origin O, and a set of axes that are mutually orthogonal, meaning at right angles. Every point P is uniquely represented by three real numbers, collectively written as a vector x = (x₁, x₂, x₃). Often, we will not distinguish a point from the vector representing it. To get from O to P one moves a distance x₁ along the 1-axis, to A, then a distance x₂ parallel to the 2-axis, to B, and finally a distance x₃ parallel to the 3-axis, to P, as shown in Figure 1.6. O itself is represented by the vector (0, 0, 0).


Fig. 1.6 Representation of a point P by a vector x.

The length or magnitude of x is the distance from O to P, and is denoted by |x|. This distance can be calculated using Pythagoras’ theorem. OAB is a right angle triangle, so the distance from O to B is √(x₁²+x₂²), and since OBP is also a right angle triangle, the distance from O to P is √((x₁²+x₂²)+x₃²). The square of the distance is therefore

(1.10)
|x|² = x₁² + x₂² + x₃²,

which is the 3-dimensional version of Pythagoras’ theorem. If one performs a rotation about O, the distance |x| remains the same.

The rotation sending x to x′ may be an active one, making x and x′ genuinely different points. Alternatively, the rotation may be a passive one, by which we mean that the axes are rotated, but the point x does not actually change. All that happens is that x acquires a new set of coordinates x′ = (x₁′, x₂′, x₃′) relative to the rotated axes. In both cases |x′| = |x|.

The square of the distance between points x and y is

(1.11)
|x−y|² = (x₁−y₁)² + (x₂−y₂)² + (x₃−y₃)².

This distance is unaffected by both rotations and translations. A translation shifts all points by a fixed vector c, so x and y are shifted to x+c and y+c. The difference x−y is unchanged, and so is |x−y|.

When considering a pair of vectors x and y, it is useful to introduce their dot product

(1.12)
x·y = x₁y₁ + x₂y₂ + x₃y₃.

A special case of this is x·x = x₁² + x₂² + x₃² = |x|², expressing the squared length of x as the dot product of x with itself. It is not immediately obvious whether x·y is affected by a rotation. However, if we expand out the terms on the right-hand side of equation (1.11), we find that

(1.13)
|x−y|² = |x|² + |y|² − 2x·y,

and as |x|, |y| and |x−y| are all unaffected by a rotation, x·y must also be unaffected. We can use this result to find a more convenient expression for the dot product of x and y. When applied to a triangle with edges of length |x|, |y| and |x−y|, as shown in Figure 1.7, we can rearrange the expression (1.13), and then use the cosine rule to obtain

(1.14)
x·y = ½(|x|² + |y|² − |x−y|²) = |x||y|cosϑ,

where ϑ is the angle between the vectors x and y.


Fig. 1.7 The dot product of two vectors is x·y = |x||y|cosϑ.

It follows that if x·y = 0, and the lengths of the vectors x and y are non-zero, then cosϑ = 0 so the angle between x and y is ϑ = ±π/2, and the two vectors are orthogonal. For example, the basis vectors along the Cartesian axes, (1, 0, 0), (0, 1, 0) and (0, 0, 1), are all of unit length, and the dot product of any pair of them vanishes, so they are orthogonal.

Critically, in Euclidean 3-space, the lengths of vectors and the angles between them are invariant under any rotation of all the vectors together, and this is why the dot product is a useful construction. Quantities such as x·y that are unaffected by rotations are called scalars.

There is a further, equally useful construction. From two vectors x and y one may construct a third vector, their cross product x×y, as shown in Figure 1.8. This has components

(1.15)
x×y = (x₂y₃−x₃y₂, x₃y₁−x₁y₃, x₁y₂−x₂y₁).


Fig. 1.8 The cross product x×y is a vector of length |x||y|sinϑ.

The cross product is useful, because if both x and y are rotated around an arbitrary axis, then x×y rotates with them. (If one invented another vector product of x and y with components (x₂y₃, x₃y₁, x₁y₂), say, then it would not have this rotational property and would have little geometrical significance.) Unlike the dot product x·y, the cross product x×y is not invariant under rotations. We say that it transforms covariantly with x and y under rotations. ‘Co-variant’ means ‘varying with’ or ‘transforming in the same way as’, and this is an idea that occurs frequently in physics.

We can check this rotational covariance of x×y by considering the dot product of x×y with a third vector z. Using equations (1.15) and (1.12), we find

(1.16)
(x×y)·z = x₂y₃z₁ − x₃y₂z₁ + x₃y₁z₂ − x₁y₃z₂ + x₁y₂z₃ − x₂y₁z₃.

This is generally non-zero, but if either z = x or z = y, it is easy to see that the six terms above cancel out in pairs, and the result is zero. This means that x×y is orthogonal to x and orthogonal to y, as shown in Figure 1.8. When subject to a rotation, the directions of x×y, x and y must therefore all rotate together. Now we just need to check that the length of x×y is invariant under rotations. In terms of its components, the squared length of x×y is

(1.17)
|x×y|² = (x₂y₃−x₃y₂)² + (x₃y₁−x₁y₃)² + (x₁y₂−x₂y₁)²,

and this can be reexpressed, after a little algebra, as

(1.18)
|x×y|² = (x·x)(y·y) − (x·y)².

The right-hand side only includes quantities that are rotationally invariant, so |x×y| is similarly invariant. The right-hand side can be expressed in terms of lengths and angles as |x|²|y|² − |x|²|y|²cos²ϑ, which simplifies to |x|²|y|²sin²ϑ. The length of the vector x×y is therefore |x||y|sinϑ.

Under the exchange of x and y, the two quantities x·y and x×y have opposite symmetry properties: x·y = y·x, but x×y = −(y×x), as is clear from equations (1.12) and (1.15). The latter relation implies that x×x = 0 for any x.

From three vectors x, y and z, there are two useful geometrical quantities that can be constructed. One is the scalar (x×y)·z. This has several nice symmetry properties that can be verified using equation (1.16), in particular

(1.19)
(x×y)·z = x·(y×z).

The other geometrical quantity is the double cross product (x×y)×z, which is a vector. This can be expressed in terms of dot products through the important identity

(1.20)
(x×y)×z = (x·z)y − (y·z)x.

This identity, which is covariant under rotations, is easily checked using the cross product definition (1.15). To gain some intuition into its form, note that x×y is orthogonal to the plane spanned by x and y, and taking the cross product with z gives a vector orthogonal to x×y and therefore back in this plane. (x×y)×z must therefore be a linear combination of x and y. This vector must also be orthogonal to z, and this is clearly true of the right-hand side of the identity, as

(1.21)
((x·z)y − (y·z)x)·z = (x·z)(y·z) − (y·z)(x·z) = 0.

We have gone into these properties of x·y and x×y in some detail, because the laws of physics need to be expressed in a way that doesn’t change when the entire physical system is rotated or translated. Even more importantly, the laws of physics should not change if one passively rotates the axes or shifts the origin. Dot products and cross products therefore occur frequently in physical contexts, for example, in formulae for energy and angular momentum. In the next section we will meet a vector of partial derivatives, denoted by ∇, and should not be surprised that it appears in electromagnetic theory in expressions such as ∇·E and ∇×E, where E is the electric field vector. We will define and use these quantities in Chapter 3.

Geometrically speaking, there is not much to add concerning time until we discuss relativity in Chapter 4. In non-relativistic physics we use a further Cartesian coordinate t to represent time. Given times t₁ and t₂, it is the interval between them, t₂ − t₁, that is physically meaningful. Physical phenomena are unaffected by a time shift. If a process can start at t₁ and end at t₂ then it can equally well start at t₁ + c and end at t₂ + c. Suppose some system starts at t = 0 and ends in the same state at t = T. Then it will repeat, and come back to the same state at t = 2T, t = 3T, and so on. This has an application that is very familiar to us in the guise of a clock.

1.3 Partial Derivatives

Physics in 3-dimensional space often involves functions of several variables. When a function depends on more than one variable, we need to consider the derivatives with respect to all of these. Suppose ϕ(x₁, x₂, x₃) is a smooth function defined in Euclidean 3-space. The partial derivative ∂ϕ/∂x₁ is just the ordinary derivative with respect to x₁, with x₂ and x₃ treated as fixed, or constant. It can be evaluated at any point x = (x₁, x₂, x₃). By taking x₂ and x₃ fixed, one is really just thinking of ϕ as a function of x₁ along the line through x parallel to the 1-axis, and the partial derivative ∂ϕ/∂x₁ is the ordinary derivative along this line. The partial derivatives ∂ϕ/∂x₂ and ∂ϕ/∂x₃ are similarly defined at x by differentiating along the lines through x parallel to the 2-axis and 3-axis.

It is easy to calculate the partial derivatives of functions that are known explicitly. For example, if ϕ(x₁, x₂, x₃) = x₁³x₂⁴x₃, then ∂ϕ/∂x₁ is found by differentiating x₁³ and treating x₂⁴x₃ as a constant, and similarly for ∂ϕ/∂x₂ and ∂ϕ/∂x₃. Therefore

(1.22)
∂ϕ/∂x₁ = 3x₁²x₂⁴x₃,  ∂ϕ/∂x₂ = 4x₁³x₂³x₃,  ∂ϕ/∂x₃ = x₁³x₂⁴.

Recall that by using the ordinary derivative of a function f(x), denoted by f′(x), we can obtain an approximate value for f(x + δx) when δx is small:

(1.23)
f(x + δx) ≈ f(x) + f′(x)δx.

Similarly, by using the partial derivative ∂ϕ/∂x₁, we obtain

(1.24)
ϕ(x₁+δx₁, x₂, x₃) ≈ ϕ(x₁, x₂, x₃) + (∂ϕ/∂x₁)δx₁.

By combining the three partial derivatives of ϕ‎ at x, we obtain the more powerful result

(1.25)
ϕ(x₁+δx₁, x₂+δx₂, x₃+δx₃) ≈ ϕ(x₁, x₂, x₃) + (∂ϕ/∂x₁)δx₁ + (∂ϕ/∂x₂)δx₂ + (∂ϕ/∂x₃)δx₃.

This provides an approximation for ϕ‎ at any point x+δx close to x.

There is an implicit assumption here, which is that ∂ϕ/∂x₂ is essentially the same at the point (x₁+δx₁, x₂, x₃) as it is at (x₁, x₂, x₃), and similarly for ∂ϕ/∂x₃. This is why we supposed earlier that ϕ was smooth.

The collection of partial derivatives of ϕ forms a vector, denoted by ∇ϕ:

(1.26)
∇ϕ = (∂ϕ/∂x₁, ∂ϕ/∂x₂, ∂ϕ/∂x₃).

Similarly δx = (δx₁, δx₂, δx₃) is a vector. Equation (1.25) can be written more concisely as

(1.27)
ϕ(x + δx) ≈ ϕ(x) + ∇ϕ·δx,

a result we will use repeatedly. On the right-hand side is a genuine dot product that is unchanged if one rotates the axes. ∇ϕ is called the gradient of ϕ.
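The approximation (1.27) is easy to test numerically; the sample point, displacement and gradient formula below use our running example ϕ = x₁³x₂⁴x₃, with arbitrary numbers.

```python
# Check that phi(x + dx) - phi(x) - grad(phi).dx is second order in dx,
# for phi = x1^3 x2^4 x3.

def phi(x1, x2, x3):
    return x1**3 * x2**4 * x3

def grad_phi(x1, x2, x3):
    # The partial derivatives (1.22).
    return (3*x1**2 * x2**4 * x3, 4*x1**3 * x2**3 * x3, x1**3 * x2**4)

x = (1.5, -0.8, 2.0)
dx = (1e-5, -2e-5, 3e-5)

exact = phi(*(xi + di for xi, di in zip(x, dx)))
linear = phi(*x) + sum(g * d for g, d in zip(grad_phi(*x), dx))
print(exact - linear)  # small, of order |dx|^2
```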

A good way to think about a function is in terms of its contours. For a function ϕ in 3-space, the contours are the surfaces of constant ϕ. If δx is any vector tangent to the contour surface through x, then ϕ(x + δx) − ϕ(x) ≈ 0 to linear order in δx, so ∇ϕ·δx = 0. Therefore ∇ϕ is orthogonal to δx, implying that ∇ϕ is a vector orthogonal to the contour surface, as shown in Figure 1.9. In fact, ∇ϕ is in the direction of steepest ascent of ϕ, and its magnitude is the rate of increase of ϕ with distance in this direction. This justifies the name ‘gradient’.


Fig. 1.9 The curves represent the contours of constant ϕ. The arrows represent the gradient ∇ϕ.

There can be points x where all three partial derivatives vanish, and ∇ϕ = 0. x is then a stationary point of ϕ. Whether the stationary point is a minimum, a maximum, or a saddle point depends on the second partial derivatives of ϕ at x.

There are nine possible second partial derivatives of ϕ; these include ∂²ϕ/∂x₁², ∂²ϕ/∂x₁∂x₂, ∂²ϕ/∂x₂∂x₁ and ∂²ϕ/∂x₂². The mixed partial derivative ∂²ϕ/∂x₁∂x₂ is obtained by first differentiating with respect to x₂, and then differentiating the result by x₁, whereas for ∂²ϕ/∂x₂∂x₁ the order of differentiation is reversed.

For example, for the function ϕ(x₁, x₂, x₃) = x₁³x₂⁴x₃, one has

(1.28)
∂²ϕ/∂x₁² = 6x₁x₂⁴x₃,  ∂²ϕ/∂x₁∂x₂ = 12x₁²x₂³x₃,  ∂²ϕ/∂x₂∂x₁ = 12x₁²x₂³x₃,  ∂²ϕ/∂x₂² = 12x₁³x₂²x₃.

Notice that the mixed partial derivatives are actually the same. This is an important and completely general result.

To prove this result, one needs to think about the rectangle of values of ϕ shown in Figure 1.10 and estimate in two ways the expression

(1.29)
ϕ(x₁+δx₁, x₂+δx₂, x₃) − ϕ(x₁+δx₁, x₂, x₃) − ϕ(x₁, x₂+δx₂, x₃) + ϕ(x₁, x₂, x₃).


Fig. 1.10 Infinitesimal rectangle showing four positions at which the function ϕ‎ can be evaluated.

One estimate is the difference of the differences along the vertical edges,

(1.30)
(ϕ(x₁+δx₁, x₂+δx₂, x₃) − ϕ(x₁+δx₁, x₂, x₃)) − (ϕ(x₁, x₂+δx₂, x₃) − ϕ(x₁, x₂, x₃))
≈ (∂ϕ/∂x₂)(x₁+δx₁, x₂, x₃) δx₂ − (∂ϕ/∂x₂)(x₁, x₂, x₃) δx₂
≈ (∂²ϕ/∂x₁∂x₂)(x₁, x₂, x₃) δx₁δx₂.

The other estimate, with the bracketing reorganized, is the difference of the differences along the horizontal edges,

(1.31)
(ϕ(x₁+δx₁, x₂+δx₂, x₃) − ϕ(x₁, x₂+δx₂, x₃)) − (ϕ(x₁+δx₁, x₂, x₃) − ϕ(x₁, x₂, x₃))
≈ (∂ϕ/∂x₁)(x₁, x₂+δx₂, x₃) δx₁ − (∂ϕ/∂x₁)(x₁, x₂, x₃) δx₁
≈ (∂²ϕ/∂x₂∂x₁)(x₁, x₂, x₃) δx₁δx₂.

As the left-hand sides of these two expressions (1.30) and (1.31) are the same, the mixed partial derivatives must be equal. This result is called the symmetry of mixed (second) partial derivatives, because there is a symmetry under exchange of the order of differentiation. We shall make use of this later, for example, when investigating Maxwell’s equations, and when deriving various thermodynamic relationships.
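The rectangle estimate can itself be checked numerically: divided by δx₁δx₂ it approximates both mixed partials, which (1.28) gives as 12x₁²x₂³x₃ for our example function. The sample point below is arbitrary.

```python
# The double difference (1.29) divided by d1*d2 approximates the mixed
# partial derivative of phi = x1^3 x2^4 x3 at (x1, x2, x3).

def phi(x1, x2, x3):
    return x1**3 * x2**4 * x3

x1, x2, x3 = 1.1, 0.9, 1.3
d1 = d2 = 1e-4

rect = (phi(x1 + d1, x2 + d2, x3) - phi(x1 + d1, x2, x3)
        - phi(x1, x2 + d2, x3) + phi(x1, x2, x3))

print(rect / (d1 * d2), 12 * x1**2 * x2**3 * x3)  # close agreement
```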

There is a particularly important combination of the second partial derivatives of ϕ, called the Laplacian of ϕ, and denoted by ∇²ϕ. This is

(1.32)
∇²ϕ = ∂²ϕ/∂x₁² + ∂²ϕ/∂x₂² + ∂²ϕ/∂x₃²,

and it is a scalar, unchanged if the axes are rotated. The scalar property is evident if one regards (∂/∂x₁, ∂/∂x₂, ∂/∂x₃) as a vector of derivatives and writes

(1.33)
∇²ϕ = (∂/∂x₁, ∂/∂x₂, ∂/∂x₃)·(∂ϕ/∂x₁, ∂ϕ/∂x₂, ∂ϕ/∂x₃),

or more compactly ∇²ϕ = ∇·∇ϕ. More formally still, ∇² = ∇·∇. For our familiar example ϕ = x₁³x₂⁴x₃,

(1.34)
∇²(x₁³x₂⁴x₃) = ∂²(x₁³x₂⁴x₃)/∂x₁² + ∂²(x₁³x₂⁴x₃)/∂x₂² + ∂²(x₁³x₂⁴x₃)/∂x₃² = 6x₁x₂⁴x₃ + 12x₁³x₂²x₃,

a typical, non-zero result. However, there are plenty of functions whose Laplacian vanishes, for example, x₁² − x₂² and x₁x₂x₃.
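A finite-difference Laplacian (an illustrative sketch, not from the text) confirms both the value in (1.34) and the vanishing Laplacian of x₁² − x₂²:

```python
# Second-difference estimate of the Laplacian of f at the point x.

def laplacian(f, x, h=1e-3):
    total = 0.0
    for i in range(3):
        xp = list(x); xp[i] += h
        xm = list(x); xm[i] -= h
        total += (f(xp) + f(xm) - 2 * f(x)) / h**2
    return total

phi = lambda x: x[0]**3 * x[1]**4 * x[2]
p = [1.0, 2.0, 3.0]
# (1.34) predicts 6*1*16*3 + 12*1*4*3 = 288 + 144 = 432 at p.
print(laplacian(phi, p))  # close to 432

harmonic = lambda x: x[0]**2 - x[1]**2
print(laplacian(harmonic, p))  # ~ 0
```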

In 3-space, one often needs to find the gradient or the Laplacian of a function f(r) that depends only on the radial distance r from O. Here, r² = x₁² + x₂² + x₃². These calculations can be a bit fiddly, because r involves a square root, but are simpler if one works with r².

Let’s find the gradient first. By the chain rule,

(1.35)
∇(r²) = 2r(∂r/∂x₁, ∂r/∂x₂, ∂r/∂x₃) = 2r∇r.

On the other hand, by direct partial differentiation of x₁² + x₂² + x₃²,

(1.36)
∇(r²) = (2x₁, 2x₂, 2x₃) = 2x.

Comparing these expressions we see that

(1.37)
∇r = x/r = x̂.

x is a vector of magnitude r, and x̂ is the unit vector that points radially outwards at every point (except O). One can also understand equation (1.37) by noting that the contours of r are spheres centred at O, and the rate of increase of r with distance from O is unity everywhere. Equation (1.35) is easily generalized. For a general function f(r), the chain rule gives

(1.38)
∇(f(r)) = f′(r)∇r = f′(r)x/r = f′(r)x̂.

The most important example of this result is

(1.39)
∇(1/r) = −(1/r²)x̂,

which is useful when considering the inverse square law forces of electrostatics and gravity.

Next, let us find the Laplacian of f(r). We have ∇(f(r)) = (1/r)f′(r)x, so

(1.40)
∇²(f(r)) = ∇·(∇(f(r))) = ∇·((1/r)f′(r)x).

By the usual Leibniz rule, there are two contributions to the last expression. In one, ∇ acts on the function (1/r)f′(r) to give the contribution

(1.41)
((1/r)f″(r) − (1/r²)f′(r)) (x/r)·x = f″(r) − (1/r)f′(r),

where we have applied the result (1.38) again. The other is a dot product, in which the components ∂/∂x₁, ∂/∂x₂, ∂/∂x₃ of ∇ act respectively on the three components (x₁, x₂, x₃) of x to give the number 3, so the second contribution is (3/r)f′(r). Adding these, the result is

(1.42)
∇²(f(r)) = f″(r) + (2/r)f′(r).

The most important example is

(1.43)
∇²(1/r) = 2/r³ + (2/r)(−1/r²) = 0.

This equation is valid everywhere except at O. 1/r is infinite at O, so its gradient is not defined there, and neither is its Laplacian. One says that 1/r is singular at O. The most general function just of the variable r whose Laplacian vanishes (except possibly at O) is C/r + D, where C and D are constants.
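Formula (1.42) and the special case (1.43) can be verified numerically by comparing a Cartesian finite-difference Laplacian with the radial expression; the test function f(r) = e^(−r) and the sample point below are arbitrary choices.

```python
import math

# Second-difference estimate of the Laplacian in Cartesian coordinates.
def laplacian(f, x, h=1e-3):
    total = 0.0
    for i in range(3):
        xp = list(x); xp[i] += h
        xm = list(x); xm[i] -= h
        total += (f(xp) + f(xm) - 2 * f(x)) / h**2
    return total

r = lambda x: math.sqrt(x[0]**2 + x[1]**2 + x[2]**2)
p = [1.0, 2.0, 2.0]  # a point with r = 3

# For f(r) = e^(-r), the radial formula (1.42) gives f'' + (2/r)f' = e^(-r)(1 - 2/r).
print(laplacian(lambda x: math.exp(-r(x)), p), math.exp(-3.0) * (1 - 2/3))

# For f(r) = 1/r the Laplacian vanishes away from the origin, as in (1.43).
print(laplacian(lambda x: 1.0 / r(x), p))  # ~ 0
```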

1.4 e, π‎ and Gaussian Integrals

The transcendental numbers e and π appear throughout mathematics and physics, and will be used frequently in what follows. The exponential function e^x, often written as exp x, and its complex counterpart e^(ix) will also appear frequently. There are two remarkable relations between e and π. One is the famous Euler relation

(1.44)
e^{i\pi} = -1,

and the other is the Gaussian integral formula

(1.45)
\int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}.

We shall explain these in this section and also describe two basic physical applications of the real and complex exponential functions.

The exponential function is defined by the series

(1.46)
e^x = 1 + x + \frac{1}{2}x^2 + \frac{1}{6}x^3 + \cdots + \frac{1}{n!}x^n + \cdots,

and is positive for all $x$. Obviously $e^0 = 1$. Euler's number $e$ is defined as $e^1$, the sum of the series for $x = 1$. Numerically, $e = 2.718\ldots$. By expanding out, one can verify that

(1.47)
e^{x+y} = e^x e^y,

which is the key property of the exponential function. This property makes it consistent to identify $e^x$ (as a series) with the $x$-th power of $e$. As an illustration, $e^2$ (as a series) equals $e^1 e^1$ (the product of two series), so $e^2 = e \times e$. Differentiating the series (1.46) term by term, one easily sees that

(1.48)
\frac{d}{dx}(e^x) = e^x.

The importance of this simple formula is illustrated in section 1.4.1.

The extension of the exponential function to imaginary arguments is defined using the same series expansion,

(1.49)
e^{ix} = 1 + ix - \frac{1}{2}x^2 - \frac{i}{6}x^3 + \cdots + \frac{i^n}{n!}x^n + \cdots,

where $i^2 = -1$. The real and imaginary parts of this expansion are the well known series expansions of $\cos x$ and $\sin x$,

(1.50)
\cos x = 1 - \frac{1}{2}x^2 + \frac{1}{24}x^4 - \cdots,

(1.51)
\sin x = x - \frac{1}{6}x^3 + \cdots,

so

(1.52)
e^{ix} = \cos x + i\sin x.

Now, $\cos\pi = -1$ and $\sin\pi = 0$, so if we substitute the value $x = \pi$ into this expression we obtain the Euler relation, $e^{i\pi} = -1$. Raising it to the power $2n$, we see that one consequence is that $e^{2ni\pi} = 1$ for any integer $n$.
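A one-line numerical check of the Euler relation and its consequence, sketched here with Python's standard cmath module (the choice $n = 5$ is arbitrary):

```python
import cmath
import math

# Euler relation: e^{i*pi} = -1 (up to floating-point rounding)
z = cmath.exp(1j * math.pi)

# Consequence: e^{2*n*i*pi} = 1 for any integer n, e.g. n = 5
w = cmath.exp(2 * 5 * 1j * math.pi)
```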

1.4.1 Radioactive decay

Radioactivity was discovered in 1896 by Henri Becquerel. When a radioactive nucleus decays it changes into a different nucleus. The rate of change of the number of radioactive nuclei N is described by the following law:

(1.53)
\frac{dN}{dt} = -\lambda N,

where λ‎ is known as the decay constant. The radioactivity decays exponentially, as the solution of the differential equation (1.53) is

(1.54)
N = N_0\,e^{-\lambda t},

where N0 is the initial number of radioactive nuclei when t=0. The solution is shown in Figure 1.11. Taking logarithms, we obtain

(1.55)
\ln\frac{N}{N_0} = -\lambda t.


Fig. 1.11 Radioactive decay.

The time $\tau_{1/2}$ taken for half the nuclei to decay is known as the half-life of the radioactive substance. It is given by $\ln\frac{1}{2} = -\lambda\tau_{1/2}$, so

(1.56)
\tau_{1/2} = \frac{\ln 2}{\lambda}.

We can also work out the mean lifetime $\bar{t}$ of the radioactive nucleus. All $N_0$ nuclei eventually decay, so we can average over the times of decay, finding

(1.57)
\bar{t} = \frac{1}{N_0}\int_0^{N_0} t\,dN = -\frac{1}{\lambda N_0}\int_0^{N_0} \ln\frac{N}{N_0}\,dN = -\frac{1}{\lambda N_0}\Big[N\ln N - N - N\ln N_0\Big]_0^{N_0} = \frac{1}{\lambda},

where in the second step we have substituted for $t$ from equation (1.55).

Radioactivity provides an extremely useful tool for dating artefacts. If we know that a sample material originally contained N0 radioactive nuclei, and now contains N of these, then we can determine the time t that has elapsed since the material formed,

(1.58)
t = \frac{1}{\lambda}\ln\frac{N_0}{N} = \frac{\tau_{1/2}}{\ln 2}\ln\frac{N_0}{N}.

Different radioactive nuclei may be used depending on the relevant timescales. For instance, uranium-238, with a half-life of about 4.5 billion years, has been used to date meteorites and thereby determine the age of the solar system, and carbon-14, with a half-life of 5730 years, is used to date archaeological remains.
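As an illustration of equation (1.58), here is a small radiocarbon-dating sketch; the 25% retention figure is a made-up example, not from the text:

```python
import math

half_life = 5730.0                 # carbon-14 half-life in years, as quoted above
lam = math.log(2) / half_life      # decay constant, from tau_half = ln2 / lambda

# Suppose a sample retains 25% of its original carbon-14, so N0/N = 4.
age = (1.0 / lam) * math.log(4.0)  # equation (1.58)
# Two half-lives have elapsed, so the age should be 2 * 5730 = 11460 years.
```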

1.4.2 Waves and periodic functions

We can represent a wave varying with position $x$ and time $t$, travelling in the positive $x$-direction, as $e^{i(kx - \omega t)}$ with $k$ and $\omega$ positive, as shown in Figure 1.12. Because of the Euler relation, the wave will be the same at positions for which $kx$ differs by an integer multiple of $2\pi$, so the wavelength is $\frac{2\pi}{k}$. Similarly, the wave will be the same at times for which $\omega t$ differs by an integer multiple of $2\pi$, so the period is $\frac{2\pi}{\omega}$. $k$ and $\omega$ are called, respectively, the wavenumber and angular frequency of the wave.


Fig. 1.12 The plane wave $e^{i(kx - \omega t)}$ travels in the $x$-direction at velocity $\frac{\omega}{k}$. As time passes, the amplitude of the wave at a fixed position remains constant and the phase of the wave rotates around a circle. The wave is shown decomposed into its real and imaginary components, which are two perpendicular sine waves with a relative phase shift of $\frac{\pi}{2}$.

The phase of the wave remains constant where $kx - \omega t = \mathrm{constant}$. The phase is therefore constant at a point $x$ that moves along at velocity $\frac{\omega}{k}$, and this is the velocity of the wave. If $k$ is negative, and $\omega$ still positive, the wave moves in the opposite direction.

The real and imaginary parts of the wave are $\cos(kx - \omega t)$ and $\sin(kx - \omega t)$. These are called sine waves, but one is phase shifted by $\frac{\pi}{2}$ relative to the other. Many types of wave, for example, electromagnetic waves and the waves on the surface of a fluid, are real, but in quantum mechanics the wavefunction of a freely moving particle is a complex wave.
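These statements are easy to verify numerically. The sketch below (the values of $k$ and $\omega$ are arbitrary illustrative choices) checks that the wave is unchanged at a point moving with velocity $\frac{\omega}{k}$, and that it repeats after one wavelength $\frac{2\pi}{k}$:

```python
import cmath
import math

k, omega = 2.0, 3.0       # illustrative wavenumber and angular frequency
v = omega / k             # phase velocity of the wave

def wave(x, t):
    # the complex plane wave e^{i(kx - omega t)}
    return cmath.exp(1j * (k * x - omega * t))

a = wave(1.0, 0.5)
b = wave(1.0 + v * 0.2, 0.5 + 0.2)      # same phase at the co-moving point
c = wave(1.0 + 2 * math.pi / k, 0.5)    # same value one wavelength away
```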

1.4.3 The Gaussian integral

The integral of the Gaussian function $e^{-x^2}$, shown in Figure 1.13, cannot be expressed in terms of standard functions, so the indefinite integral from $-\infty$ to $X$ is not elementary. On the other hand, the definite integral from $-\infty$ to $\infty$ has the value

(1.59)
I = \int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}.


Fig. 1.13 The Gaussian function.

This is the simplest Gaussian integral. It often arises in physics, and we will make use of it later.

$I$ can be evaluated using a rather surprising trick. We begin by considering its square,

(1.60)
I^2 = \int_{-\infty}^{\infty} e^{-x_1^2}\,dx_1 \int_{-\infty}^{\infty} e^{-x_2^2}\,dx_2.

This can be expressed as the 2-dimensional integral

(1.61)
I^2 = \int_{\mathbb{R}^2} e^{-x_1^2 - x_2^2}\,d^2x,

where the integral is over the whole plane, $\mathbb{R}^2$. Now convert to polar coordinates. Let $r$ be the radial coordinate and $\vartheta$ the angular coordinate. Then, by Pythagoras' theorem, $r^2 = x_1^2 + x_2^2$, and the integration measure is $d^2x = r\,dr\,d\vartheta$. So

(1.62)
I^2 = \int_0^{2\pi}\!\int_0^{\infty} e^{-r^2}\,r\,dr\,d\vartheta = 2\pi\int_0^{\infty} e^{-r^2}\,r\,dr.

The range of ϑ is 2π because, geometrically, 2π is the length of the unit circle. The extra factor of r makes the integral over r elementary, and the result is

(1.63)
I^2 = 2\pi\left[-\frac{1}{2}e^{-r^2}\right]_0^{\infty} = \pi,

so $I = \sqrt{\pi}$, as claimed.
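The value $\sqrt{\pi}$ is easy to confirm numerically. The following midpoint-rule sketch (our own check, not from the text) truncates the integral to $[-10, 10]$, where the neglected tails are of order $e^{-100}$:

```python
import math

a, b, n = -10.0, 10.0, 100_000
h = (b - a) / n

# midpoint-rule estimate of the integral of e^{-x^2} over [-10, 10]
total = h * math.fsum(math.exp(-(a + (j + 0.5) * h) ** 2) for j in range(n))
# total should agree with sqrt(pi) = 1.7724538509... to high accuracy
```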

A more general Gaussian integral is

(1.64)
I(\alpha) = \int_{-\infty}^{\infty} e^{-\alpha x^2}\,dx = \frac{1}{\sqrt{\alpha}}\int_{-\infty}^{\infty} e^{-y^2}\,dy = \sqrt{\frac{\pi}{\alpha}},

where we have used the substitution $y = \sqrt{\alpha}\,x$. Another useful trick allows us to evaluate a sequence of integrals where the Gaussian function is multiplied by an even power of $x$. Differentiating the integral $I(\alpha)$ with respect to $\alpha$ brings down a factor of $-x^2$, so

(1.65)
\int_{-\infty}^{\infty} x^2 e^{-\alpha x^2}\,dx = -\frac{dI(\alpha)}{d\alpha} = -\frac{d}{d\alpha}\sqrt{\frac{\pi}{\alpha}} = \frac{1}{2\alpha}\sqrt{\frac{\pi}{\alpha}}.

Differentiating a second time with respect to α‎, we obtain

(1.66)
\int_{-\infty}^{\infty} x^4 e^{-\alpha x^2}\,dx = -\frac{d}{d\alpha}\left(\frac{\sqrt{\pi}}{2}\,\alpha^{-3/2}\right) = \frac{3}{4\alpha^2}\sqrt{\frac{\pi}{\alpha}}.

We can continue differentiating with respect to $\alpha$ to evaluate all integrals of the form $\int_{-\infty}^{\infty} x^{2n} e^{-\alpha x^2}\,dx$.

If the Gaussian is multiplied by an odd power of $x$, the integrand is an odd function, antisymmetric under $x \to -x$, so $\int_{-\infty}^{\infty} x^{2n+1} e^{-\alpha x^2}\,dx = 0$. When the lower limit is 0, these integrals can be evaluated by substituting $y = x^2$, then integrating by parts, to give

(1.67)
\int_0^{\infty} x^{2n+1} e^{-x^2}\,dx = \frac{1}{2}\int_0^{\infty} y^n e^{-y}\,dy = \frac{1}{2}\Big[-y^n e^{-y}\Big]_0^{\infty} + \frac{n}{2}\int_0^{\infty} y^{n-1} e^{-y}\,dy = \frac{n}{2}\int_0^{\infty} y^{n-1} e^{-y}\,dy.

Repeating this procedure $n$ times, we find $\int_0^{\infty} y^n e^{-y}\,dy = n!\int_0^{\infty} e^{-y}\,dy = n!$, and therefore

(1.68)
\int_0^{\infty} x^{2n+1} e^{-x^2}\,dx = \frac{1}{2}n!.

The basic Gaussian integral and these variants are useful in many areas of physics, especially quantum mechanics and quantum field theory.
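The moment formulas above can all be checked with the same midpoint-rule quadrature; this is a numerical sketch of our own, with arbitrary choices of $\alpha$, cutoff and grid size:

```python
import math

def moment(m, alpha=1.0, lower=-12.0, upper=12.0, n=200_000):
    # midpoint-rule estimate of the integral of x^m e^{-alpha x^2} over [lower, upper]
    h = (upper - lower) / n
    return h * math.fsum((lower + (j + 0.5) * h) ** m
                         * math.exp(-alpha * (lower + (j + 0.5) * h) ** 2)
                         for j in range(n))

alpha = 2.0
even2 = moment(2, alpha)            # should equal (1/(2 alpha)) sqrt(pi/alpha)
even4 = moment(4, alpha)            # should equal (3/(4 alpha^2)) sqrt(pi/alpha)
odd3 = moment(3, alpha)             # odd power over the whole line: zero
half7 = moment(7, 1.0, lower=0.0)   # equation (1.68) with n = 3: (1/2) 3! = 3
```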

We can also find interesting geometrical results by considering the nth power of I,

(1.69)
I^n = \int_{-\infty}^{\infty} e^{-x_1^2}\,dx_1 \int_{-\infty}^{\infty} e^{-x_2^2}\,dx_2 \cdots \int_{-\infty}^{\infty} e^{-x_n^2}\,dx_n,

which can be re-expressed as an n-dimensional integral

(1.70)
I^n = \int_{\mathbb{R}^n} e^{-x_1^2 - x_2^2 - \cdots - x_n^2}\,d^nx.

Now convert to spherical polar coordinates $r, \Omega$ in $n$ dimensions, where $\Omega$ denotes collectively the $n-1$ angular coordinates. By Pythagoras' theorem in $n$ dimensions, $r^2 = x_1^2 + x_2^2 + \cdots + x_n^2$, and the integration measure $d^nx$ becomes $r^{n-1}\,dr\,d\Omega$, where $d\Omega$ denotes the volume element of the unit sphere in $n$ dimensions, the unit $(n-1)$-sphere. So

(1.71)
I^n = \int_0^{\infty}\!\int e^{-r^2}\,r^{n-1}\,dr\,d\Omega.

The integral of $d\Omega$ is the total volume of the unit $(n-1)$-sphere, and the remaining radial integral is one of the Gaussian integrals considered above.

For example, in the case of $I^3$ the radial integral has the same form as the integral (1.65), but with the lower limit 0 (and $\alpha = 1$). It equals $\frac{1}{4}\sqrt{\pi}$, half the full Gaussian integral, so

(1.72)
I^3 = \frac{1}{4}\sqrt{\pi}\,A,

where $A$ is the area of the unit 2-sphere, the sphere we are familiar with. We know that $I = \sqrt{\pi}$, so $I^3 = \pi\sqrt{\pi}$, and therefore $A = 4\pi$, the well known result for the area of the sphere. Note that in this calculation, using a Gaussian integral, we have not needed to make an explicit choice of angular coordinates to find $A$.

By a similar calculation we can find a less well known result, the volume of the unit sphere in four dimensions, the 3-sphere. Just as the 2-sphere is a 2-dimensional surface enclosing a 3-dimensional ball, so the 3-sphere is a 3-dimensional volume enclosing a ball of 4-dimensional space. Equation (1.71) becomes

(1.73)
I^4 = V\int_0^{\infty} e^{-r^2}\,r^3\,dr,

where $V$ is the volume of the unit 3-sphere. Using $I^4 = \pi^2$ and the integral (1.68) in the case $n = 1$, namely $\int_0^{\infty} e^{-r^2}\,r^3\,dr = \frac{1}{2}$, we find $V = 2\pi^2$.
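The same trick works in any dimension: $\int_0^{\infty} e^{-r^2} r^{n-1}\,dr = \frac{1}{2}\Gamma(\frac{n}{2})$, so $I^n = \pi^{n/2}$ gives the general formula $\mathrm{vol}(S^{n-1}) = 2\pi^{n/2}/\Gamma(\frac{n}{2})$. This general Gamma-function formula is a standard result stated here for illustration, not derived in the text; the sketch below recovers the cases computed above:

```python
import math

def unit_sphere_volume(n):
    # "volume" ((n-1)-dimensional measure) of the unit (n-1)-sphere in n dimensions,
    # from I^n = pi^{n/2} = vol * integral_0^inf e^{-r^2} r^{n-1} dr
    #                     = vol * Gamma(n/2) / 2
    return 2 * math.pi ** (n / 2) / math.gamma(n / 2)

circle = unit_sphere_volume(2)   # length of the unit circle: 2 pi
area = unit_sphere_volume(3)     # area of the unit 2-sphere: 4 pi
volume = unit_sphere_volume(4)   # volume of the unit 3-sphere: 2 pi^2
```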

1.4.4 The method of steepest descents

In many physics applications we arrive at integrals that cannot be evaluated exactly, where the integrand is the product of a variant of a Gaussian and another function. We will see an example of this in Chapter 11, when considering nuclear fusion. In such cases, the basic Gaussian integral can be used to give estimates of these more complicated integrals. Suppose $g(x)$ has a single maximum between $\alpha$ and $\beta$, at $x_0$; then, since $g'(x_0) = 0$ and $g''(x_0) < 0$, we can use the expansion $g(x) \approx g(x_0) - \frac{1}{2}|g''(x_0)|(x - x_0)^2$ near $x_0$. This means that the integral

(1.74)
I = \int_{\alpha}^{\beta} F(x)\exp(g(x))\,dx

can be approximated by

(1.75)
I \approx \exp(g(x_0))\int_{\alpha}^{\beta} F(x)\exp\!\left(-\frac{1}{2}|g''(x_0)|(x - x_0)^2\right)dx.

If, further, F(x) varies slowly near x0, then it can be treated as a constant F(x0) and pulled out of the integral to give

(1.76)
I \approx F(x_0)\exp(g(x_0))\int_{\alpha}^{\beta} \exp\!\left(-\frac{1}{2}|g''(x_0)|(x - x_0)^2\right)dx.

As the integrand is concentrated around the point $x_0$, we can extend the limits of integration to $\pm\infty$ without significantly affecting the value of the integral, so

(1.77)
I \approx F(x_0)\exp(g(x_0))\int_{-\infty}^{\infty} \exp\!\left(-\frac{1}{2}|g''(x_0)|(x - x_0)^2\right)dx = F(x_0)\exp(g(x_0))\sqrt{\frac{2\pi}{|g''(x_0)|}},

where in the last step we used the Gaussian integral (1.64).

This is known as the steepest descents approximation. It is accurate provided the second derivative $g''(x_0)$ has a large magnitude and higher order terms in the Taylor expansions of $g$ and $F$ around $x_0$ can be neglected.
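As a concrete test (our own example, not from the text), apply the formula to $n! = \int_0^{\infty} e^{-t}\,t^n\,dt = \int_0^{\infty} \exp(g(t))\,dt$ with $F = 1$ and $g(t) = n\ln t - t$. The maximum is at $t_0 = n$, where $|g''(t_0)| = \frac{1}{n}$, and steepest descents reproduces Stirling's approximation $n! \approx n^n e^{-n}\sqrt{2\pi n}$:

```python
import math

def stirling(n):
    # steepest descents applied to n! = integral_0^inf exp(n ln t - t) dt:
    # g(t) = n ln t - t peaks at t0 = n with |g''(t0)| = 1/n, so
    # n! ~ exp(g(t0)) * sqrt(2 pi / |g''(t0)|) = n^n e^{-n} sqrt(2 pi n)
    return n ** n * math.exp(-n) * math.sqrt(2 * math.pi * n)

exact = math.factorial(20)
approx = stirling(20)
rel_error = abs(approx - exact) / exact   # about 0.4% for n = 20
```

The relative error shrinks as $n$ grows, in line with the remark that the approximation improves as $|g''(x_0)|$ becomes large.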

1.5 Further Reading

Bibliography references:

For a survey of variational principles and their history, see

D.S. Lemons, Perfect Form: Variational Principles, Methods, and Applications in Elementary Physics, Princeton: PUP, 1997.

H.H. Goldstine, A History of the Calculus of Variations: from the 17th through the 19th Century, New York: Springer, 1980.

For comprehensive coverage of the mathematics used in this book, consult

K.F. Riley, M.P. Hobson and S.J. Bence, Mathematical Methods for Physics and Engineering (3rd ed.), Cambridge: CUP, 2006.

Notes:

(1) Snell’s law may be more familiar in terms of the angles φ=π2φ and ϑ=π2ϑ between the light ray and the normal (the perpendicular line) to the surface, in which case it takes the form sinφ=c2c1sinϑ.

(2) A saddle point in a landscape is a stationary point of the height, like a mountain pass, but is neither a maximum nor a minimum.