By Math Vault | Analysis

To the surprise of many math enthusiasts and the like, it seems that we have been putting out an incredible amount of **calculus modules** these days. Of course, if you have had any terrible experience in learning *just* the mechanics of calculus, this seemingly-relentless outpouring of materials could make you want to puke. However, if that header image appeals to you *at first sight*, it could very well be that your brain was, subconsciously or otherwise, operating along the lines of:

## Ah! That lovely **Chain Rule** from the calculus textbooks/classes back in the good old days!

Except that this time, we have decided to venture into an aspect of it most calculus users probably have never seen before (or never bother to see?). That is, the somewhat-obscure side about how it is derived *in theory*, and the intuition behind it. Granted, if you’re coming from a background of applied mathematics, all this might sound a bit gibberish/nerve-wracking, and it may not be very useful to you either. However, if you’re on your way to joining the ranks of mathematicians, but the proofs of the Chain Rule never seem to click, then here is your chance! 🙂

Given a function $g$ defined on $I$, and another function $f$ defined on $g(I)$, we can define a **composite function** $f \circ g$ (i.e., $f$ compose $g$) as follows:

\begin{align*} [f \circ g ](x) & \stackrel{df}{=} f[g(x)] \qquad (\forall x \in I) \end{align*}

In which case, we can refer to $f$ as the **outer function**, and $g$ as the **inner function**. Under this setup, the function $f \circ g$ maps $I$ first to $g(I)$, and then to $f[g(I)]$.

In addition, if $c$ is a point on $I$ such that:

- The inner function $g$ is differentiable at $c$ (with the derivative denoted by $g'(c)$).
- The outer function $f$ is differentiable at $g(c)$ (with the derivative denoted by $f'[g(c)]$).

then it would transpire that the function $f \circ g$ is *also* differentiable at $c$, where:

\begin{align*} (f \circ g)'(c) & = f'[g(c)] \, g'(c) \end{align*}

giving rise to the famous derivative formula commonly known as the **Chain Rule**.

Theorem 1 — The Chain Rule for Derivative

Given an **inner function** $g$ defined on $I$ and an **outer function** $f$ defined on $g(I)$, if $c$ is a point on $I$ such that $g$ is differentiable at $c$ and $f$ differentiable at $g(c)$ (i.e., the **image** of $c$), then we have that:

\begin{align*} (f \circ g)'(c) & = f'[g(c)] \, g'(c) \end{align*}

Or in **Leibniz’s notation**:

\begin{align*} \frac{df}{dx} = \frac{df}{dg} \frac{dg}{dx} \end{align*}

as if we’re going from $f$ to $g$ to $x$.

In English, the **Chain Rule** reads:

## The derivative of a composite function at a point is equal to the derivative of the inner function at that point, times the derivative of the outer function at its image.

As simple as it might be, the fact that the derivative of a composite function can be evaluated in terms of that of its *constituent functions* was hailed as a tremendous breakthrough back in the old days, since it allows for the differentiation of a wide variety of **elementary functions** — ranging from $\displaystyle (x^2+2x+3)^4$ and $\displaystyle e^{\cos x + \sin x}$ to $\ln \left(\frac{3+x}{2^x} \right)$ and $\displaystyle \text{arcsec} (2^x)$.
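As a quick sketch of how this plays out for the first function above, we can take $g(x) = x^2+2x+3$ as the inner function and $f(u) = u^4$ as the outer one (the placeholder variable $u$ is our own choice of name, used only for this illustration):

\begin{align*} \left[ (x^2+2x+3)^4 \right]' & = f'[g(x)] \, g'(x) \\ & = 4(x^2+2x+3)^3 \, (2x+2) \\ & = 8(x+1)(x^2+2x+3)^3 \end{align*}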

More importantly, for a composite function involving *three* functions (say, $f$, $g$ and $h$), applying the Chain Rule *twice* yields that:

\begin{align*} (f \circ g \circ h)'(c) & = f'(g[h(c)]) \, (g \circ h)'(c) \\ & = f'(g[h(c)]) \, g'[h(c)] \, h'(c) \end{align*}

(assuming that $h$ is differentiable at $c$, $g$ differentiable at $h(c)$, and $f$ at $g[h(c)]$ of course!)
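For a concrete feel, here is a small worked example under the assumption that $h(x) = x^2$, $g(x) = e^x$ and $f(x) = \sin x$ (these three functions are chosen purely for illustration, and all of them are differentiable everywhere):

\begin{align*} \left[ \sin \left( e^{x^2} \right) \right]' & = \cos \left( e^{x^2} \right) \, \left[ e^{x^2} \right]' \\ & = \cos \left( e^{x^2} \right) \, e^{x^2} \, 2x \end{align*}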

In fact, extending this same reasoning to an *$n$-layer* composite function of the form $f_1 \circ (f_2 \circ \cdots (f_{n-1} \circ f_n) )$ gives rise to the so-called **Generalized Chain Rule:**

\begin{align*}\frac{d f_1}{dx} = \frac{d f_1}{d f_2} \, \frac{d f_2}{d f_3} \dots \frac{d f_n}{dx} \end{align*}

thereby showing that any composite function involving *any* number of functions — *if differentiable* — can have its derivative evaluated in terms of the derivatives of its *constituent functions* in a *chain-like* manner. Hence the **Chain Rule**.

All right. Let’s see if we can derive the Chain Rule from *first principles* then: given an inner function $g$ defined on $I$ and an outer function $f$ defined on $g(I)$, we are told that $g$ is differentiable at a point $c \in I$ and that $f$ is differentiable at $g(c)$. That is:

\begin{align*} \lim_{x \to c} \frac{g(x) - g(c)}{x - c} & = g'(c) & \lim_{x \to g(c)} \frac{f(x) - f[g(c)]}{x - g(c)} & = f'[g(c)] \end{align*}

Here, the goal is to show that the **composite function** $f \circ g$ indeed *differentiates* to $f'[g(c)] \, g'(c)$ at $c$. That is:

\begin{align*} \lim_{x \to c} \frac{f[g(x)] - f[g(c)]}{x - c} = f'[g(c)] \, g'(c) \end{align*}

As a *thought experiment*, we can kind of see that if we start on the left-hand side by multiplying the fraction by $\dfrac{g(x) - g(c)}{g(x) - g(c)}$, then we would have that:

\begin{align*} \lim_{x \to c} \frac{f[g(x)] - f[g(c)]}{x - c} & = \lim_{x \to c} \left[ \frac{f[g(x)]-f[g(c)]}{g(x) - g(c)} \, \frac{g(x)-g(c)}{x-c} \right] \end{align*}

So that if, for simplicity, we denote the **difference quotient** $\dfrac{f(x) - f[g(c)]}{x - g(c)}$ by $Q(x)$, then we should have that:

\begin{align*} \lim_{x \to c} \frac{f[g(x)] - f[g(c)]}{x - c} & = \lim_{x \to c} \left[ Q[g(x)] \, \frac{g(x)-g(c)}{x-c} \right] \\ & = \lim_{x \to c} Q[g(x)] \lim_{x \to c} \frac{g(x)-g(c)}{x-c} \\ & = f'[g(c)] \, g'(c) \end{align*}

Great! Seems like a home run, right? Well, *not so fast*, for there exist *two* fatal flaws with this line of reasoning…

First, we can only divide by $g(x)-g(c)$ if $g(x) \ne g(c)$. In fact, forcing this division now means that the quotient $\dfrac{f[g(x)]-f[g(c)]}{g(x) - g(c)}$ is no longer necessarily *well-defined* in a **punctured neighborhood** of $c$ (i.e., the set $(c-\epsilon, c+\epsilon) \setminus \{c\}$, where $\epsilon>0$). As a result, it no longer makes sense to talk about its limit as $x$ tends to $c$.
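To see that this is not just a theoretical worry, here is one standard example (our own choice for illustration, with $c = 0$) of a function that is differentiable at $c$ and yet returns to the value $g(c)$ in every punctured neighborhood of $c$:

\begin{align*} g(x) \stackrel{df}{=} \begin{cases} x^2 \sin \left( \dfrac{1}{x} \right) & x \ne 0 \\ 0 & x = 0 \end{cases} \qquad \qquad g'(0) = \lim_{x \to 0} \frac{x^2 \sin (1/x) - 0}{x - 0} = \lim_{x \to 0} x \sin \left( \frac{1}{x} \right) = 0 \end{align*}

Here, $g \left( \frac{1}{n \pi} \right) = 0 = g(0)$ for every positive integer $n$, so that no matter how small $\epsilon$ is, the interval $(-\epsilon, \epsilon)$ always contains points $x \ne 0$ where $g(x) - g(0) = 0$ and the division breaks down.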

And then there’s the second flaw, which is embedded in the reasoning that as $x \to c$, $Q[g(x)] \to f'[g(c)]$. To be sure, while it is true that:

- As $x \to c$, $g(x) \to g(c)$ (since differentiability implies continuity).
- As $x \to g(c)$, $Q(x) \to f'[g(c)]$ (remember, $Q$ is the **difference quotient** of $f$ at $g(c)$).

It still doesn’t follow that as $x \to c$, $Q[g(x)] \to f'[g(c)]$. In fact, it is in general *false* that:

## If $x \to c$ implies that $g(x) \to G$, and $x \to G$ implies that $f(x) \to F$, then $x \to c$ implies that $(f \circ g)(x) \to F$.
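A small counterexample, sketched here with functions of our own choosing (taking $c = 0$, $G = 0$ and $F = 0$), shows what can go wrong:

\begin{align*} g(x) \stackrel{df}{=} 0 \quad (\forall x \in \mathbb{R}) \qquad \qquad f(x) \stackrel{df}{=} \begin{cases} 0 & x \ne 0 \\ 1 & x = 0 \end{cases} \end{align*}

In this case, $g(x) \to 0$ as $x \to 0$, and $f(x) \to 0$ as $x \to 0$ (since the limit ignores the value of $f$ at $0$). However, $(f \circ g)(x) = f(0) = 1$ for all $x$, so that $(f \circ g)(x) \to 1 \ne 0$ as $x \to 0$. The culprit is that $f$ fails to be continuous at $G$.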

Here, what is true instead is *this*:

Theorem 2 — Composition Law for Limits

Given an **inner function** $g$ defined on $I$ (with $c \in I$) and an **outer function** $f$ defined on $g(I)$, if the following two conditions are both met:

- As $x \to c$, $g(x) \to G$.
- $f(x)$ is *continuous* at $G$.

then as $x \to c $, $(f \circ g)(x) \to f(G)$.

In any case, the point is that we have identified the two serious flaws that prevent our sketchy proof from working. Incidentally, this also happens to be the *pseudo-mathematical* approach employed by Lord Salman Khan to derive the Chain Rule. *Sad face*. 🙁

In which case, begging seems like an appropriate future *course of action*. 🙂

Lord Sal @khanacademy, mind reshooting the Chain Rule Proof video with a non-pseudo-math approach plz?

Actually, jokes aside, the important point to be made here is that this faulty proof nevertheless embodies the intuition behind the Chain Rule, which loosely speaking can be summarized as follows:

\begin{align*} \lim_{x \to c} \frac{\Delta f}{\Delta x} & = \lim_{x \to c} \frac{\Delta f}{\Delta g} \, \lim_{x \to c} \frac{\Delta g}{\Delta x} \end{align*}

Now, if you still recall, this is where we got stuck in the proof:

\begin{align*} \lim_{x \to c} \frac{f[g(x)] - f[g(c)]}{x - c} & = \lim_{x \to c} \left[ \frac{f[g(x)]-f[g(c)]}{g(x) - g(c)} \, \frac{g(x)-g(c)}{x-c} \right] \qquad (\text{kind of}) \\ & = \lim_{x \to c} Q[g(x)] \, \lim_{x \to c} \frac{g(x)-g(c)}{x-c} \qquad (\text{kind of})\\ & = \text{(ill-defined)} \, g'(c) \end{align*}

So that if only we can:

- Patch up the *difference quotient* $Q(x)$ to make $Q[g(x)]$ *well-defined* on a **punctured neighborhood** of $c$, so that it now makes sense to define the limit of $Q[g(x)]$ as $x \to c$.
- Tweak $Q(x)$ a bit to make it *continuous* at $g(c)$, so that the **Composition Law for Limits** would ensure that $\displaystyle \lim_{x \to c} Q[g(x)] = f'[g(c)]$.

then there might be a chance that we can turn our failed attempt into something more than fruitful. 🙂

Let’s see… How do we go about amending $Q(x)$, the **difference quotient** of $f$ at $g(c)$? Well, we’ll first have to make $Q(x)$ *continuous* at $g(c)$, and we do know that *by definition*:

\begin{align*} \lim_{x \to g(c)} Q(x) = \lim_{x \to g(c)} \frac{f(x) - f[g(c)]}{x - g(c)} = f'[g(c)] \end{align*}

Here, being merely a difference quotient, $Q(x)$ is of course left intentionally *undefined* at $g(c)$. However, if we *upgrade* our $Q(x)$ to $\mathbf{Q} (x)$ so that:

\begin{align*} \mathbf{Q}(x) \stackrel{df}{=} \begin{cases} Q(x) & x \ne g(c) \\ f'[g(c)] & x = g(c) \end{cases} \end{align*}

then $\mathbf{Q}(x)$ would be the patched version of $Q(x)$ which is actually *continuous* at $g(c)$. One puzzle solved!
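Spelled out, the continuity claim amounts to the following chain of equalities, where the first one holds because $\mathbf{Q}$ and $Q$ agree everywhere except at $g(c)$, and a limit as $x \to g(c)$ never looks at the value *at* $g(c)$:

\begin{align*} \lim_{x \to g(c)} \mathbf{Q}(x) = \lim_{x \to g(c)} Q(x) = f'[g(c)] = \mathbf{Q}[g(c)] \end{align*}

As a toy illustration (with $f(x) = x^2$ and the value $g(c) = a$ assumed purely for the sake of example), we would have $Q(x) = \dfrac{x^2 - a^2}{x - a} = x + a$ for $x \ne a$, while $\mathbf{Q}(x) = x + a$ for *all* $x$, since $f'(a) = 2a$ fills the hole at $x = a$ exactly.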

All right. Moving on, let’s turn our attention now to another problem, which is the fact that the function $Q[g(x)]$, that is:

\begin{align*} \frac{f[g(x)] - f[g(c)]}{g(x) - g(c)} \end{align*}

is not necessarily well-defined on a *punctured neighborhood* of $c$. But then you see, this problem has already been dealt with when we defined $\mathbf{Q}(x)$! In particular, it can be verified that the definition of $\mathbf{Q}(x)$ entails that:

\begin{align*} \mathbf{Q}[g(x)] = \begin{cases} Q[g(x)] & \text{if $x$ is such that $g(x) \ne g(c)$ } \\ f'[g(c)] & \text{if $x$ is such that $g(x)=g(c)$} \end{cases} \end{align*}

Translation? The upgraded $\mathbf{Q}(x)$ ensures that $\mathbf{Q}[g(x)]$ has the *enviable* property of being pretty much *identical* to the plain old $Q[g(x)]$ — with the added bonus that it is actually defined on a *neighborhood* of $c$!

And with the two issues settled, we can now go back to square one — to the **difference quotient** of $f \circ g$ at $c$ that is — and verify that while the equality:

\begin{align*} \frac{f[g(x)] - f[g(c)]}{x - c} = \frac{f[g(x)]-f[g(c)]}{g(x) - g(c)} \, \frac{g(x)-g(c)}{x-c} \end{align*}

only holds for the $x$s in a *punctured neighborhood* of $c$ such that $g(x) \ne g(c)$, we now have that:

\begin{align*} \frac{f[g(x)] - f[g(c)]}{x - c} = \mathbf{Q}[g(x)] \, \frac{g(x)-g(c)}{x-c} \end{align*}

for *all* the $x$s in a *punctured neighborhood* of $c$. With this new-found realisation, we can now quickly finish the proof of the Chain Rule as follows:

\begin{align*} \lim_{x \to c} \frac{f[g(x)] - f[g(c)]}{x - c} & = \lim_{x \to c} \left[ \mathbf{Q}[g(x)] \, \frac{g(x)-g(c)}{x-c} \right] \\ & = \lim_{x \to c} \mathbf{Q}[g(x)] \, \lim_{x \to c} \frac{g(x)-g(c)}{x-c} \\ & = f'[g(c)] \, g'(c) \end{align*}

where $\displaystyle \lim_{x \to c} \mathbf{Q}[g(x)] = f'[g(c)]$ as a result of the **Composition Law for Limits**. 🙂
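For the skeptical reader, the one case in the proof that deserves a second look is when $x \ne c$ but $g(x) = g(c)$, since that is precisely where the original $Q[g(x)]$ was undefined. A quick check suggests that the identity involving $\mathbf{Q}[g(x)]$ still holds there, as both sides simply collapse to zero:

\begin{align*} \frac{f[g(x)] - f[g(c)]}{x - c} = \frac{0}{x-c} = 0 = f'[g(c)] \, \frac{0}{x-c} = \mathbf{Q}[g(x)] \, \frac{g(x)-g(c)}{x-c} \end{align*}

When $g(x) \ne g(c)$ instead, $\mathbf{Q}[g(x)]$ is just $Q[g(x)]$, and the identity reduces to the cancellation of $g(x) - g(c)$ we started with.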

Wow! That was a bit of a *detour*, wasn’t it? You see, while the **Chain Rule** might have been apparently intuitive to understand and apply, it is actually one of the first theorems in differential calculus out there that require a bit of *ingenuity* and *knowledge beyond calculus* to derive.

And if the derivation seems to mess around with the head a bit, then it’s certainly not hard to appreciate the *creative* and *deductive greatness* among the **forefathers of modern calculus** — those who’ve worked hard to establish a *solid*, *rigorous* foundation for calculus, thereby paving the way for its proliferation into various branches of applied sciences all around the world.

And as for you, *kudos* for having made it this far! As a token of appreciation, here’s a *summary* of what we have discovered up to now:

**The Chain Rule**

Given an **inner function** $g$ defined on $I$ and an **outer function** $f$ defined on $g(I)$, if $g$ is differentiable at a point $c \in I$ and $f$ is differentiable at $g(c)$, then we have that:

\begin{align*} (f \circ g)'(c) & = f'[g(c)] \, g'(c) \end{align*}

Or in **Leibniz’s notation**:

\begin{align*} \frac{df}{dx} = \frac{df}{dg} \frac{dg}{dx} \end{align*}

**Composition Law for Limits**

Given an **inner function** $g$ defined on $I$ (with $c \in I$) and an **outer function** $f$ defined on $g(I)$, if the following two conditions are both met:

- As $x \to c$, $g(x) \to G$.
- $f(x)$ is *continuous* at $G$.

then as $x \to c$, $(f \circ g)(x) \to f(G)$.

**Why the naive proof fails**

Since the following equality only holds for the $x$s where $g(x) \ne g(c)$:

\begin{align*} \frac{f[g(x)] - f[g(c)]}{x - c} & = \left[ \frac{f[g(x)]-f[g(c)]}{g(x) - g(c)} \, \frac{g(x)-g(c)}{x-c} \right] \\ & = Q[g(x)] \, \frac{g(x)-g(c)}{x-c} \end{align*}

combined with the fact that $Q[g(x)]$ is not guaranteed to tend to $f'[g(c)]$ as $x \to c$, the argument falls apart.

**How the patched proof works**

Once we upgrade the **difference quotient** $Q(x)$ to $\mathbf{Q}(x)$ as follows:

\begin{align*} \mathbf{Q}(x) \stackrel{df}{=} \begin{cases} Q(x) & x \ne g(c) \\ f'[g(c)] & x = g(c) \end{cases} \end{align*}

we’ll have that:

\begin{align*} \frac{f[g(x)] - f[g(c)]}{x - c} = \mathbf{Q}[g(x)] \, \frac{g(x)-g(c)}{x-c} \end{align*}

for all $x$ in a **punctured neighborhood** of $c$. In which case, the proof of the Chain Rule can be finalized in a few steps through the use of **limit laws**.

All right. That’s it for us as far as a *theoretical* discussion on the Chain Rule is concerned. Still looking for more of this goodness? Oh well, we can certainly arrange to have you inside **the Vault** of course! And until then, we’ll be sticking to our playground on Facebook or Twitter!

**Math Vault and its Redditbots** has the singular goal of advocating for education in higher mathematics through *digital publishing* and the *uncanny* use of technologies. Head to the **Vault** for more math cookies. :)
