In calculus, the Chain Rule is a powerful differentiation rule for handling the derivative of composite functions. While its mechanics appear relatively straightforward, its derivation (and the intuition behind it) remains obscure to most of its users.
In what follows, we will attempt to take a look at both of those. We'll begin by exploring a quasi-proof that is intuitive but falls short of a full-fledged proof, and then gradually find ways to patch it up so that the modern standard of rigor is upheld.
Chain Rule — A Review
Given a function $g$ defined on $I$, and another function $f$ defined on $g(I)$, we can define a composite function $f \circ g$ (i.e., $f$ composed with $g$) as follows:
\begin{align*} [f \circ g ](x) & \stackrel{df}{=} f[g(x)] \qquad (\forall x \in I) \end{align*}
In this case, we refer to $f$ as the outer function and $g$ as the inner function. Under this setup, the function $f \circ g$ maps $I$ first to $g(I)$, and then to $f[g(I)]$.
In addition, if $c$ is a point in $I$ such that:
- The inner function $g$ is differentiable at $c$ (with the derivative denoted by $g'(c)$).
- The outer function $f$ is differentiable at $g(c)$ (with the derivative denoted by $f'[g(c)]$).
then the function $f \circ g$ is also differentiable at $c$, with:
\begin{align*} (f \circ g)'(c) & = f'[g(c)] \, g'(c) \end{align*}
giving rise to the famous derivative formula commonly known as the Chain Rule.
Theorem 1 — The Chain Rule for Derivative
Given an inner function $g$ defined on $I$ and an outer function $f$ defined on $g(I)$, if $c$ is a point in $I$ such that $g$ is differentiable at $c$ and $f$ is differentiable at $g(c)$ (i.e., the image of $c$ under $g$), then we have that:
\begin{align*} (f \circ g)'(c) & = f'[g(c)] \, g'(c) \end{align*}
Or in Leibniz’s notation:
\begin{align*} \frac{df}{dx} = \frac{df}{dg} \frac{dg}{dx} \end{align*}
as if we’re going from $f$ to $g$ to $x$.
In English, the Chain Rule reads:
The derivative of a composite function at a point is equal to the derivative of the inner function at that point, times the derivative of the outer function at the image of that point.
As simple as it might seem, the fact that the derivative of a composite function can be evaluated in terms of the derivatives of its constituent functions was hailed as a tremendous breakthrough back in the day, since it allows for the differentiation of a wide variety of elementary functions, ranging from $\displaystyle (x^2+2x+3)^4$ and $\displaystyle e^{\cos x + \sin x}$ to $\ln \left(\frac{3+x}{2^x} \right)$ and $\operatorname{arcsec} (2^x)$.
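As a quick worked illustration of the first two examples (a sketch of the routine computation, with inner functions $x^2+2x+3$ and $\cos x + \sin x$, and outer functions $u^4$ and $e^u$, respectively), the Chain Rule gives:
\begin{align*} \frac{d}{dx} (x^2+2x+3)^4 & = 4(x^2+2x+3)^3 \, (2x+2) \\ \frac{d}{dx} e^{\cos x + \sin x} & = e^{\cos x + \sin x} \, (\cos x - \sin x) \end{align*}
In each case, the outer function is differentiated at the value of the inner function, and the result is then multiplied by the derivative of the inner function.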
More importantly, for a composite function involving three functions (say, $f$, $g$ and $h$), applying the Chain Rule twice yields that:
\begin{align*} (f \circ g \circ h)'(c) & = f'(g[h(c)]) \, (g \circ h)'(c) \\ & = f'(g[h(c)]) \, g'[h(c)] \, h'(c) \end{align*}
(assuming that $h$ is differentiable at $c$, $g$ differentiable at $h(c)$, and $f$ at $g[h(c)]$ of course!)
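For instance, with the illustrative choice $f = \ln$, $g = \cos$ and $h(x) = x^2$ (our own example, valid wherever $\cos(x^2) > 0$ so that $\ln$ is differentiable at $\cos(x^2)$), applying the formula above layer by layer gives:
\begin{align*} \frac{d}{dx} \ln[\cos(x^2)] & = \frac{1}{\cos(x^2)} \, [-\sin(x^2)] \, (2x) \\ & = -2x \tan(x^2) \end{align*}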
In fact, extending this same reasoning to an $n$-layer composite function of the form $f_1 \circ (f_2 \circ \cdots (f_{n-1} \circ f_n) )$ gives rise to the so-called Generalized Chain Rule:
\begin{align*}\frac{d f_1}{dx} = \frac{d f_1}{d f_2} \, \frac{d f_2}{d f_3} \dots \frac{d f_n}{dx} \end{align*}
thereby showing that any composite function involving any number of functions (provided each layer is differentiable at the appropriate point) can have its derivative evaluated in terms of the derivatives of its constituent functions in a chain-like manner. Hence the name Chain Rule.
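The Generalized Chain Rule is easy to check numerically. The following is a minimal Python sketch, assuming the illustrative three-layer composite $\sin(e^{x^2})$ (our own choice, with $f_1 = \sin$, $f_2 = \exp$ and $f_3(x) = x^2$); the helper names `chain_derivative`, `composite` and `central_difference` are hypothetical. It multiplies the layer derivatives together, each evaluated at the value actually fed into that layer, and compares the result with a finite-difference estimate.

```python
import math

# Illustrative layers (our own choice): f1 = sin, f2 = exp, f3(x) = x**2,
# so the composite is F(x) = sin(exp(x**2)). Each entry is (function, derivative).
layers = [
    (math.sin, math.cos),                  # f1 (outermost)
    (math.exp, math.exp),                  # f2
    (lambda x: x ** 2, lambda x: 2 * x),   # f3 (innermost)
]


def chain_derivative(layers, x):
    """Product of layer derivatives, each evaluated at that layer's own input."""
    value, product = x, 1.0
    for f, df in reversed(layers):   # innermost layer first
        product *= df(value)         # derivative of this layer at its input
        value = f(value)             # output becomes the next layer's input
    return product


def composite(layers, x):
    """Evaluate f1(f2(...fn(x)...))."""
    for f, _ in reversed(layers):
        x = f(x)
    return x


def central_difference(F, x, h=1e-6):
    """Symmetric finite-difference estimate of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)


x0 = 0.5
exact = chain_derivative(layers, x0)
approx = central_difference(lambda t: composite(layers, t), x0)
print(exact, approx)  # the two values should agree to several decimal places
```

Working from the innermost layer outward mirrors the order in which the composite itself is evaluated, so each factor $\dfrac{d f_k}{d f_{k+1}}$ is automatically evaluated at the right point.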
Deriving the Chain Rule — Preliminary Attempt
All right. Let’s see if we can derive the Chain Rule from first principles then: given an inner function $g$ defined on $I$ and an outer function $f$ defined on $g(I)$, we are told that $g$ is differentiable at a point $c \in I$ and that $f$ is differentiable at $g(c)$. That is:
\begin{align*} \lim_{x \to c} \frac{g(x) - g(c)}{x - c} & = g'(c) & \lim_{x \to g(c)} \frac{f(x) - f[g(c)]}{x - g(c)} & = f'[g(c)] \end{align*}
Here, the goal is to show that the composite function $f \circ g$ indeed has derivative $f'[g(c)] \, g'(c)$ at $c$. That is:
\begin{align*} \lim_{x \to c} \frac{f[g(x)] - f[g(c)]}{x - c} = f'[g(c)] \, g'(c) \end{align*}
As a thought experiment, suppose we start on the left-hand side by multiplying the fraction by $\dfrac{g(x) - g(c)}{g(x) - g(c)}$. Then we would have that:
\begin{align*} \lim_{x \to c} \frac{f[g(x)] - f[g(c)]}{x - c} & = \lim_{x \to c} \left[ \frac{f[g(x)] - f[g(c)]}{g(x) - g(c)} \, \frac{g(x) - g(c)}{x - c} \right] \end{align*}
So if, for simplicity, we denote the difference quotient $\dfrac{f(x) - f[g(c)]}{x - g(c)}$ by $Q(x)$, then we should have that:
\begin{align*} \lim_{x \to c} \frac{f[g(x)] - f[g(c)]}{x - c} & = \lim_{x \to c} \left[ Q[g(x)] \, \frac{g(x) - g(c)}{x - c} \right] \\ & = \lim_{x \to c} Q[g(x)] \, \lim_{x \to c} \frac{g(x) - g(c)}{x - c} \\ & = f'[g(c)] \, g'(c) \end{align*}
Great! Seems like a home run, right? Well, not so fast, for there are two fatal flaws in this line of reasoning…
First, we can only divide by $g(x) - g(c)$ if $g(x) \ne g(c)$. In fact, forcing this division means that the quotient $\dfrac{f[g(x)] - f[g(c)]}{g(x) - g(c)}$ is no longer guaranteed to be defined throughout a punctured neighborhood of $c$ (i.e., a set of the form $(c-\epsilon, c+\epsilon) \setminus \{c\}$, where $\epsilon>0$): it is undefined at every $x$ where $g(x) = g(c)$, and such points may occur arbitrarily close to $c$. As a result, it no longer makes sense to talk about its limit as $x$ tends to $c$.
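This is a genuine concern even when $g$ is differentiable at $c$. A standard example (our own choice, not part of the argument above) is the following function at $c = 0$, which is differentiable there by the squeeze theorem:
\begin{align*} g(x) & = \begin{cases} x^2 \sin(1/x) & x \ne 0 \\ 0 & x = 0 \end{cases} & g'(0) & = \lim_{x \to 0} x \sin(1/x) = 0 \end{align*}
Since $g\left(\frac{1}{k\pi}\right) = 0 = g(0)$ for every nonzero integer $k$, every punctured neighborhood of $0$ contains points where $g(x) = g(c)$, so the quotient above is undefined at infinitely many points near $c$.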
And then there’s the second flaw, which is embedded in the reasoning that as $x \to c$, $Q[g(x)] \to f'[g(c)]$. To be sure, while it is true that:
- As $x \to c$, $g(x) \to g(c)$ (since differentiability implies continuity).
- As $x \to g(c)$, $Q(x) \to f'[g(c)]$ (remember, $Q$ is the difference quotient of $f$ at $g(c)$).
it still doesn't follow that as $x \to c$, $Q[g(x)] \to f'[g(c)]$. In fact, the following statement is false in general:
If $x \to c$ implies that $g(x) \to G$, and $x \to G$ implies that $f(x) \to F$, then $x \to c$ implies that $(f \circ g)(x) \to F$.
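A simple counterexample (our own construction) makes this concrete. Let $g(x) = 0$ for all $x$, so that $g(x) \to G = 0$ as $x \to c$, and let $f(x) = 1$ for $x \ne 0$ with $f(0) = 0$, so that $f(x) \to F = 1$ as $x \to 0$. Then:
\begin{align*} (f \circ g)(x) = f(0) = 0 \qquad (\forall x) \end{align*}
so that $(f \circ g)(x) \to 0 \ne F$ as $x \to c$. The trouble is that $g(x)$ actually lands on $G$, where $f$ is not continuous.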
Here, what is true instead is this:
Theorem 2 — Composition Law for Limits
Given an inner function $g$ defined on $I$ (with $c \in I$) and an outer function $f$ defined on $g(I)$, if the following two conditions are both met:
- As $x \to c$, $g(x) \to G$.
- $f$ is continuous at $G$.
then as $x \to c$, $(f \circ g)(x) \to f(G)$.
In any case, we have now identified the two serious flaws that keep our sketchy proof from working. Incidentally, this also happens to be the pseudo-mathematical argument that many have relied on to derive the Chain Rule. Not good.
In which case, begging seems like an appropriate future course of action…
Jokes aside, the important point here is that this faulty proof nevertheless captures the intuition behind the Chain Rule, which, loosely speaking, can be summarized as follows:
\begin{align*} \lim_{x \to c} \frac{\Delta f}{\Delta x} & = \lim_{x \to c} \frac{\Delta f}{\Delta g} \, \lim_{x \to c} \frac{\Delta g}{\Delta x} \end{align*}
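To see this intuition at work in a case where it does hold, here is a small Python sketch with the assumed example $f = \sin$, $g(x) = x^2$ and $c = 1$ (our own choice). Because $g$ is strictly increasing near $1$, $\Delta g \ne 0$ at the sampled points, so the factorization is legitimate there; the code simply tabulates the three ratios as $\Delta x$ shrinks.

```python
import math

# Assumed example (not from the text): f = sin, g(x) = x**2, c = 1.
f = math.sin


def g(x):
    return x ** 2


c = 1.0

for h in (1e-1, 1e-2, 1e-3, 1e-4):
    x = c + h
    delta_x = x - c
    delta_g = g(x) - g(c)          # nonzero: g is strictly increasing near c = 1
    delta_f = f(g(x)) - f(g(c))
    # Columns: step, delta_f/delta_g, delta_g/delta_x, delta_f/delta_x
    print(h, delta_f / delta_g, delta_g / delta_x, delta_f / delta_x)

# Expected limits: f'(g(c)) = cos(1), g'(c) = 2, and their product 2*cos(1).
print(math.cos(1.0), 2.0, 2 * math.cos(1.0))
```

The ratios $\Delta f / \Delta g$ should approach $f'[g(1)] = \cos 1$, the ratios $\Delta g / \Delta x$ should approach $g'(1) = 2$, and their product should approach $2\cos 1$. The sketch says nothing about functions for which $\Delta g$ vanishes near $c$, which is exactly where the argument breaks down.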
Now, recall that this is where we got stuck in the proof:
\begin{align*} \lim_{x \to c} \frac{f[g(x)] - f[g(c)]}{x - c} & = \lim_{x \to c} \left[ \frac{f[g(x)] - f[g(c)]}{g(x) - g(c)} \, \frac{g(x) - g(c)}{x - c} \right] \quad (\text{kind of}) \\ & = \lim_{x \to c} Q[g(x)] \, \lim_{x \to c} \frac{g(x) - g(c)}{x - c} \quad (\text{kind of})\\ & = \text{(ill-defined)} \, g'(c) \end{align*}
So if only we could:
- Patch up the difference quotient $Q(x)$ to make $Q[g(x)]$ well-defined on a punctured neighborhood of $c$, so that it makes sense to talk about the limit of $Q[g(x)]$ as $x \to c$.
- Tweak $Q(x)$ a bit to make it continuous at $g(c)$, so that the Composition Law for Limits would ensure that $\displaystyle \lim_{x \to c} Q[g(x)] = f'[g(c)]$.
then there might be a chance that we can turn our failed attempt into a genuine proof.
Deriving the Chain Rule — Second Attempt
Let’s see… How do we go about amending $Q(x)$, the difference quotient of $f$ at $g(c)$? Well, we’ll first have to make $Q(x)$ continuous at $g(c)$, and we do know that by definition:
\begin{align*} \lim_{x \to g(c)} Q(x) = \lim_{x \to g(c)} \frac{f(x) - f[g(c)]}{x - g(c)} = f'[g(c)] \end{align*}
Here, being merely a difference quotient, $Q(x)$ is of course undefined at $g(c)$, where its denominator vanishes. However, if we upgrade our $Q(x)$ to $\mathbf{Q}(x)$ so that:
\begin{align*} \mathbf{Q}(x) \stackrel{df}{=} \begin{cases} Q(x) & x \ne g(c) \\ f'[g(c)] & x = g(c) \end{cases} \end{align*}
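then $\mathbf{Q}(x)$ would be a patched version of $Q(x)$ that is actually continuous at $g(c)$, since $\displaystyle \lim_{x \to g(c)} \mathbf{Q}(x) = \lim_{x \to g(c)} Q(x) = f'[g(c)] = \mathbf{Q}[g(c)]$. One puzzle solved!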
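In code, the patch amounts to a single special case. The following minimal Python sketch builds $\mathbf{Q}$ from $f$, the point $a = g(c)$ and the value $f'(a)$; the function name `patched_quotient` and the example $f = \exp$ at $a = 0$ are our own assumptions. Note that $f'(a)$ has to be supplied: the patch does not compute it, it only fills in the one missing value, which is why the construction relies on $f$ being differentiable at $g(c)$.

```python
import math


def patched_quotient(f, a, fprime_a):
    """Build the patched difference quotient of f at a (bold Q in the text).

    Away from a it is the ordinary difference quotient; at a itself it returns
    the supplied value f'(a), which makes it continuous at a when f is
    differentiable there.
    """
    def Q(x):
        if x == a:
            return fprime_a
        return (f(x) - f(a)) / (x - a)
    return Q


# Assumed example: f = exp at the point a = g(c) = 0, where f'(0) = 1.
Q = patched_quotient(math.exp, 0.0, 1.0)
print(Q(0.0))                                  # defined at a itself
print([Q(t) for t in (1e-2, 1e-4, 1e-6)])      # values approach f'(0) = 1
```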
All right. Moving on, let's turn our attention to the other problem, namely the fact that the function $Q[g(x)]$, that is:
\begin{align*} \frac{f[g(x)] - f[g(c)]}{g(x) - g(c)} \end{align*}
is not necessarily defined throughout a punctured neighborhood of $c$. But as it turns out, this problem was already dealt with when we defined $\mathbf{Q}(x)$! In particular, plugging $g(x)$ into the definition of $\mathbf{Q}$ shows that:
\begin{align*} \mathbf{Q}[g(x)] = \begin{cases} Q[g(x)] & \text{if $x$ is such that $g(x) \ne g(c)$ } \\ f'[g(c)] & \text{if $x$ is such that $g(x)=g(c)$} \end{cases} \end{align*}
Translation? The upgraded $\mathbf{Q}(x)$ ensures that $\mathbf{Q}[g(x)]$ agrees with the plain old $Q[g(x)]$ wherever the latter is defined, with the added bonus that it is actually defined on a neighborhood of $c$!
And with the two issues settled, we can now go back to square one (that is, to the difference quotient of $f \circ g$ at $c$) and verify that while the equality:
\begin{align*} \frac{f[g(x)] - f[g(c)]}{x - c} = \frac{f[g(x)] - f[g(c)]}{g(x) - g(c)} \, \frac{g(x) - g(c)}{x - c} \end{align*}
holds only for those $x$ in a punctured neighborhood of $c$ with $g(x) \ne g(c)$, we now have that:
\begin{align*} \frac{f[g(x)] - f[g(c)]}{x - c} = \mathbf{Q}[g(x)] \, \frac{g(x) - g(c)}{x - c} \end{align*}
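for all $x$ in a punctured neighborhood of $c$. Indeed, if $g(x) \ne g(c)$, then $\mathbf{Q}[g(x)] = Q[g(x)]$ and this is just the previous equality; if $g(x) = g(c)$, then both sides are equal to $0$. With this realization, we can now quickly finish the proof of the Chain Rule as follows: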
\begin{align*} \lim_{x \to c} \frac{f[g(x)] - f[g(c)]}{x - c} & = \lim_{x \to c} \left[ \mathbf{Q}[g(x)] \, \frac{g(x) - g(c)}{x - c} \right] \\ & = \lim_{x \to c} \mathbf{Q}[g(x)] \, \lim_{x \to c} \frac{g(x) - g(c)}{x - c} \\ & = f'[g(c)] \, g'(c) \end{align*}
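where the second equality follows from the product law for limits, and $\displaystyle \lim_{x \to c} \mathbf{Q}[g(x)] = f'[g(c)]$ follows from the Composition Law for Limits (since $g(x) \to g(c)$ as $x \to c$ by the continuity of $g$ at $c$, and $\mathbf{Q}$ is continuous at $g(c)$).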
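As a sanity check on the key identity, here is a Python sketch using an assumed inner function that is flat to the left of $c = 0$ ($g(x) = x^2$ for $x > 0$ and $g(x) = 0$ otherwise, which is differentiable at $0$ with $g'(0) = 0$), together with $f = \exp$. Since $g(x) = g(c)$ for every $x < 0$, this is precisely the situation that broke the preliminary attempt. The names `Q_bold`, `lhs` and `rhs` are hypothetical. Both sides of the identity should agree at every sampled point (up to floating-point rounding), and the difference quotient of $f \circ g$ should approach $f'[g(0)] \, g'(0) = 1 \cdot 0 = 0$.

```python
import math


# Assumed inner function (our own example): flat to the left of c = 0, so
# g(x) == g(c) for every x < 0. It is differentiable at 0 with g'(0) = 0.
def g(x):
    return x * x if x > 0 else 0.0


f = fprime = math.exp   # outer function f = exp, whose derivative is also exp
c = 0.0
a = g(c)                # the point g(c) at which f is differentiated


def Q_bold(u):
    """Patched difference quotient of f at a = g(c) (hypothetical name)."""
    if u == a:
        return fprime(a)
    return (f(u) - f(a)) / (u - a)


def lhs(x):
    """Difference quotient of f o g at c."""
    return (f(g(x)) - f(g(c))) / (x - c)


def rhs(x):
    """Q_bold[g(x)] times the difference quotient of g at c."""
    return Q_bold(g(x)) * (g(x) - g(c)) / (x - c)


# Both sides agree, including for x < 0 where g(x) == g(c).
for x in (-1e-1, -1e-3, 1e-3, 1e-1):
    print(x, lhs(x), rhs(x))

# Near c, the difference quotient of f o g approaches f'(g(c)) * g'(c) = 0.
print(lhs(1e-6), lhs(-1e-6))
```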
Afterword
Wow! That was a bit of a detour, wasn't it? You see, while the Chain Rule might seem intuitive to understand and apply, it is actually one of the first theorems in differential calculus that require a bit of ingenuity and some knowledge beyond calculus to derive.
And if the derivation messes with your head a bit, then it's certainly not hard to appreciate the creative and deductive brilliance of the forefathers of modern calculus, who worked hard to establish a solid, rigorous foundation for calculus, thereby paving the way for its proliferation into various branches of applied sciences all around the world.
And as for you, kudos for having made it this far! As a token of appreciation, here's a summary of what we have discovered so far:
The Chain Rule
Given an inner function $g$ defined on $I$ and an outer function $f$ defined on $g(I)$, if $g$ is differentiable at a point $c \in I$ and $f$ is differentiable at $g(c)$, then we have that:
\begin{align*} (f \circ g)'(c) & = f'[g(c)] \, g'(c) \end{align*}
Or in Leibniz’s notation:
\begin{align*} \frac{df}{dx} = \frac{df}{dg} \frac{dg}{dx} \end{align*}
The Composition Law for Limits
Given an inner function $g$ defined on $I$ (with $c \in I$) and an outer function $f$ defined on $g(I)$, if the following two conditions are both met:
- As $x \to c$, $g(x) \to G$.
- $f(x)$ is continuous at $G$.
then as $x \to c $, $(f \circ g)(x) \to f(G)$.
Why the Preliminary Attempt Fails
The following equality holds only for those $x$ with $g(x) \ne g(c)$:
\begin{align*} \frac{f[g(x)] - f[g(c)]}{x - c} & = \frac{f[g(x)] - f[g(c)]}{g(x) - g(c)} \, \frac{g(x) - g(c)}{x - c} \\ & = Q[g(x)] \, \frac{g(x) - g(c)}{x - c} \end{align*}
Moreover, since $Q$ is not even defined at $g(c)$ (let alone continuous there), the Composition Law for Limits cannot be used to conclude that $Q[g(x)] \to f'[g(c)]$ as $x \to c$. Together, these two gaps make the argument fall apart.
How the Second Attempt Fixes It
Once we upgrade the difference quotient $Q(x)$ to $\mathbf{Q}(x)$ as follows:
\begin{align*} \mathbf{Q}(x) \stackrel{df}{=} \begin{cases} Q(x) & x \ne g(c) \\ f'[g(c)] & x = g(c) \end{cases} \end{align*}
we’ll have that:
\begin{align*} \frac{f[g(x)] - f[g(c)]}{x - c} = \mathbf{Q}[g(x)] \, \frac{g(x) - g(c)}{x - c} \end{align*}
for all $x$ in a punctured neighborhood of $c$. From there, the proof of the Chain Rule can be finalized in a few steps through the limit laws and the Composition Law for Limits.
And with that, we'll close our little discussion on the theory of the Chain Rule for now. By the way, are you aware of an alternate proof that works equally well? If so, you have good reason to be grateful for the Chain Rule the next time you invoke it to advance your work!
Wow, that really was mind blowing!
I did come across a few hitches in the logic — perhaps due to my own misunderstandings of the topic.
Firstly, why define g'(c) to be the lim (x->c) of [g(x) - g(c)]/[x-c]?
If you were to follow the definition from most textbooks:
f'(x) = lim (h->0) of [f(x+h) - f(x)]/[h]
Then, for g'(c), you would come up with:
g'(c) = lim (h->0) of [g(c+h) - g(c)]/[h]
Perhaps the two are the same, and maybe it’s just my loosey-goosey way of thinking about the limits that is causing this confusion…
Secondly, I don’t understand how bold Q(x) works. I understand the law of composite functions limits part, but it just seems too easy — just defining Q(x) to be f'(x) when g(x) = g(c)… I can’t pin-point why, but it feels a little bit like cheating :P.
Lastly, I just came up with a geometric interpretation of the chain rule — maybe not so fancy :P.
f(g(x)) is simply f(x) with a shifted x-axis [Seems like a big assumption right now, but the derivative of g takes care of instantaneous non-linearity]. g'(x) is simply the transformation scalar — which takes in an x value on the g(x) axis and returns the transformation scalar which, when multiplied with f'(x) gives you the actual value of the derivative of f(g(x)). I like to think of g(x) as an elongated x axis/input domain to visualize it, but since the derivative of g'(x) is instantaneous, it takes care of the fact that g(x) may not be as linear as that — so g(x) could also be an odd-powered polynomial (covering every real value — loved that article, by the way!) but the analogy would still hold (I think).
Once again, thank you very much! 😀
Hi Anitej. For the first question, the derivative of a function at a point can be defined using either the x-c notation or the h notation. In fact, by substituting x = c + h, it can be shown that one of the two limits exists if and only if the other does, in which case they are equal. So the two definitions are equivalent.
For the second question, the bold Q(x) basically patches up Q(x) so that it is actually continuous at g(c). Note that it is defined to be f'[g(c)] when x = g(c), not f'(x) when g(x) = g(c). If we defined the bold Q(x) the latter way, then not only would it fail to take care of the case where the input x is actually equal to g(c), but the desired continuity wouldn't be achieved either.
And as for the geometric interpretation of the Chain Rule, that’s definitely a neat way to think of it!
Well that sorts it out then… err, mostly.
But why resort to f'(c) instead of f'(g(c)), wouldn’t that lead to a very different value of f'(x) at x=c, compared to the rest of the values [That does sort of make sense as the limit as x->c of that derivative doesn’t exist]?
Either way, thank you very much — I certainly didn’t expect such a quick reply! 🙂
Oh. It is f'[g(c)]. Remember, g being the inner function is evaluated at c, whereas f being the outer function is evaluated at g(c). In particular, the focus is not on the derivative of f at c. You might want to go through the Second Attempt Section by now and see if it helps.
Thank you. This is awesome. This is one of the most used topics in calculus. You have explained everything very clearly, but I also expected more practice problems on the chain rule.
Hi Pranjal. For calculus practice problems, you might find the book “Calculus” by James Stewart helpful. It’s under the tag “Applied College Mathematics” in our resource page.
Well Done, nice article, thanks for the post
thank you very good article
Thank you. Chain rule is a bit tricky to explain at the theory level, so hopefully the message comes across safe and sound!