WEBVTT
00:00:04.040 --> 00:00:08.040
Picture yourself as an early calculus student about to begin your first course.
00:00:08.440 --> 00:00:13.280
The months ahead of you hold within them a lot of hard work, some neat examples, some not-so-neat examples.
00:00:13.280 --> 00:00:16.960
Beautiful connections to physics, not-so-beautiful piles of formulas to memorize.
00:00:17.160 --> 00:00:23.280
Plenty of moments of getting stuck and banging your head into a wall, a few nice “Aha” moments sprinkled in as well.
00:00:23.480 --> 00:00:26.920
And some genuinely lovely graphical intuition to help guide you through it all.
00:00:27.640 --> 00:00:34.400
But if the course ahead of you is anything like my first introduction to calculus or any of the first courses that I’ve seen in the years since,
00:00:34.840 --> 00:00:39.320
there’s one topic that you will not see, but which I believe stands to greatly accelerate your learning.
00:00:40.280 --> 00:00:44.680
You see, almost all of the visual intuitions from that first year are based on graphs.
00:00:45.080 --> 00:00:47.200
The derivative is the slope of a graph.
00:00:47.400 --> 00:00:49.640
The integral is a certain area under that graph.
00:00:50.240 --> 00:00:57.960
But as you generalize calculus beyond functions whose inputs and outputs are simply numbers, it’s not always possible to graph the function that you’re analyzing.
00:00:58.280 --> 00:01:00.920
There are all sorts of different ways that you’d be visualizing these things.
00:01:01.480 --> 00:01:06.840
So if all your intuitions for the fundamental ideas, like derivatives, are rooted too rigidly in graphs,
00:01:07.200 --> 00:01:17.600
it can make for a very tall and largely unnecessary conceptual hurdle between you and the more, quote-unquote, advanced topics like multivariable calculus, complex analysis, and differential geometry.
00:01:18.880 --> 00:01:29.160
Now, what I wanna share with you is a way to think about derivatives, which I’ll refer to as the transformational view, that generalizes more seamlessly into some of those more general contexts where calculus comes up.
00:01:29.920 --> 00:01:34.880
And then we’ll use this alternate view to analyze a certain fun puzzle about repeated fractions.
00:01:35.520 --> 00:01:39.640
But first off, I just wanna make sure that we’re all on the same page about what the standard visual is.
00:01:40.120 --> 00:01:44.440
If you were to graph a function which simply takes real numbers as inputs and outputs,
00:01:44.800 --> 00:01:50.120
one of the first things you learn in a calculus course is that the derivative gives you the slope of this graph.
00:01:50.760 --> 00:01:58.240
Where what we mean by that is that the derivative of the function is a new function which for every input 𝑥 returns that slope.
00:01:59.520 --> 00:02:04.440
Now, I’d encourage you not to think of this derivative-as-slope idea as being the definition of a derivative.
00:02:05.000 --> 00:02:10.360
Instead, think of it as being more fundamentally about how sensitive the function is to tiny little nudges around the input.
00:02:11.000 --> 00:02:16.920
And the slope is just one way to think about that sensitivity relevant only to this particular way of viewing functions.
00:02:17.520 --> 00:02:21.760
I have not just another video, but a full series on this topic if it’s something you wanna learn more about.
00:02:22.560 --> 00:02:32.760
Now the basic idea behind the alternate visual for the derivative is to think of this function as mapping all of the input points on the number line to their corresponding outputs on a different number line.
00:02:33.360 --> 00:02:40.240
In this context, what the derivative gives you is a measure of how much the input space gets stretched or squished in various regions.
00:02:42.320 --> 00:02:56.560
That is, if you were to zoom in around a specific input and take a look at some evenly spaced points around it, the derivative of the function at that input is gonna tell you how spread out or contracted those points become after the mapping.
00:02:57.880 --> 00:02:59.440
Here, a specific example helps.
00:02:59.720 --> 00:03:01.040
Take the function 𝑥 squared.
00:03:01.440 --> 00:03:05.440
It maps one to one, two to four, three to nine, and so on.
00:03:06.560 --> 00:03:09.240
And you could also see how it acts on all of the points in between.
00:03:12.800 --> 00:03:22.280
And if you were to zoom in on a little cluster of points around the input one and then see where they land around the relevant output, which for this function also happens to be one,
00:03:22.960 --> 00:03:24.720
you’d notice that they tend to get stretched out.
00:03:25.760 --> 00:03:28.960
In fact, it roughly looks like stretching out by a factor of two.
00:03:29.640 --> 00:03:35.560
And the closer you zoom in, the more this local behavior looks just like multiplying by a factor of two.
00:03:36.360 --> 00:03:41.720
This is what it means for the derivative of 𝑥 squared at the input 𝑥 equals one to be two.
00:03:42.280 --> 00:03:45.520
It’s what that fact looks like in the context of transformations.
00:03:46.400 --> 00:03:52.200
If you looked at a neighborhood of points around the input three, they would get roughly stretched out by a factor of six.
00:03:52.720 --> 00:03:57.360
This is what it means for the derivative of this function at the input three to equal six.
00:03:58.960 --> 00:04:05.120
Around the input one-fourth, a small region actually tends to get contracted, specifically by a factor of one-half.
00:04:05.680 --> 00:04:08.240
And that’s what it looks like for a derivative to be smaller than one.
00:04:10.960 --> 00:04:12.520
Now the input zero is interesting.
00:04:13.040 --> 00:04:17.920
Zooming in by a factor of 10, it doesn’t really look like a constant stretching or squishing.
00:04:18.440 --> 00:04:21.640
For one thing, all of the outputs end up on the right, positive side of things.
00:04:23.280 --> 00:04:33.600
And as you zoom in closer and closer, by 100x or by 1000x, it looks more and more like a small neighborhood of points around zero just gets collapsed into zero itself.
00:04:37.640 --> 00:04:39.960
And this is what it looks like for the derivative to be zero.
00:04:40.480 --> 00:04:45.000
The local behavior looks more and more like multiplying the whole number line by zero.
00:04:45.640 --> 00:04:49.280
It doesn’t have to completely collapse everything to a point at a particular zoom level.
00:04:49.760 --> 00:04:53.840
Instead, it’s a matter of what the limiting behavior is as you zoom in closer and closer.
00:04:55.560 --> 00:04:58.280
It’s also instructive to take a look at the negative inputs here.
00:05:00.880 --> 00:05:05.040
Things start to feel a little cramped since they collide with where all the positive input values go.
00:05:05.600 --> 00:05:08.880
And this is one of the downsides of thinking of functions as transformations.
00:05:09.320 --> 00:05:15.560
But for derivatives, we only really care about the local behavior anyway, what happens in a small range around a given input.
00:05:16.440 --> 00:05:20.840
Here, look at the inputs in a little neighborhood around, say, negative two.
00:05:21.120 --> 00:05:22.200
They don’t just get stretched out.
00:05:22.640 --> 00:05:24.120
They also get flipped around.
00:05:24.600 --> 00:05:31.520
Specifically, the action on such a neighborhood looks more and more like multiplying by negative four, the closer you zoom in.
00:05:32.320 --> 00:05:35.640
This is what it looks like for the derivative of a function to be negative.
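One way to check these stretch factors numerically is to take two inputs very close together around a chosen point and compare how far apart they land. Here is a minimal Python sketch of that idea. The helper name `local_stretch` and the spacing `width` are assumptions made for illustration, not something from the video.

```python
def square(x):
    """The function from the example: x squared."""
    return x * x


def local_stretch(f, x0, width=1e-6):
    """Ratio of output spacing to input spacing for two points straddling x0.

    A value above 1 in magnitude means nearby points spread out,
    below 1 means they contract, and a negative sign means they flip.
    """
    left = x0 - width / 2
    right = x0 + width / 2
    return (f(right) - f(left)) / (right - left)


# The inputs discussed above: 1, 3, one-fourth, 0, and negative two.
for x0 in [1, 3, 0.25, 0, -2]:
    print(f"around x = {x0}: stretch factor is about {local_stretch(square, x0):.3f}")
```

The printed factors should come out close to 2, 6, 0.5, 0, and negative 4, matching the stretching, squishing, collapsing, and flipping described for each input.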
00:05:38.640 --> 00:05:39.800
And I think you get the point.
00:05:39.800 --> 00:05:52.840
This is all well and good, but let’s see how this is actually useful in solving a problem. A friend of mine recently asked me a pretty fun question about the infinite fraction one plus one divided by one plus one divided by one plus one divided by one, on and on and on and on.
00:05:53.560 --> 00:05:55.800
And clearly, you watch math videos online.
00:05:55.800 --> 00:05:57.160
So maybe you’ve seen this before.
00:05:57.520 --> 00:06:04.040
But my friend’s question actually cuts to something that you might not have thought about before, relevant to the view of derivatives that we’re looking at here.
00:06:05.000 --> 00:06:13.400
The typical way that you might evaluate an expression like this is to set it equal to 𝑥 and then notice that there’s a copy of the full fraction inside itself.
00:06:15.080 --> 00:06:18.800
So you can replace that copy with another 𝑥, and then just solve for 𝑥.
00:06:19.360 --> 00:06:24.600
That is, what you want is to find a fixed point of the function one plus one divided by 𝑥.
00:06:27.480 --> 00:06:28.400
But here’s the thing.
00:06:28.640 --> 00:06:36.120
There are actually two solutions for 𝑥, two special numbers where one plus one divided by that number gives you back the same thing.
00:06:37.000 --> 00:06:41.160
One is the golden ratio 𝜑, around 1.618.
00:06:41.160 --> 00:06:46.320
And the other is negative 0.618, which happens to be negative one divided by 𝜑.
00:06:47.000 --> 00:06:49.440
I like to call this other number 𝜑’s little brother,
00:06:49.440 --> 00:06:52.800
since just about any property that 𝜑 has, this number also has.
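For anyone who wants the algebra behind those two numbers written out, here is a short worked version. It assumes nothing beyond the fixed-point equation itself and the quadratic formula.

```latex
\begin{align*}
x &= 1 + \frac{1}{x} \\
x^2 &= x + 1 \\
x^2 - x - 1 &= 0 \\
x &= \frac{1 \pm \sqrt{5}}{2}
\end{align*}
% The two roots:
%   \varphi = (1 + \sqrt{5})/2 \approx 1.618
%   (1 - \sqrt{5})/2 \approx -0.618
% Their product is -1 (the constant term of the quadratic),
% so the second root equals -1/\varphi.
```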
00:06:53.520 --> 00:07:03.600
And this raises the question, would it be valid to say that that infinite fraction that we saw is somehow also equal to 𝜑’s little brother, negative 0.618?
00:07:04.800 --> 00:07:06.760
Maybe you initially say, “Obviously not!
00:07:07.160 --> 00:07:08.760
Everything on the left-hand side is positive.
00:07:08.960 --> 00:07:11.240
So how could it possibly equal a negative number?”
00:07:12.640 --> 00:07:16.800
Well, first we should be clear about what we actually mean by an expression like this.
00:07:17.840 --> 00:07:28.480
One way that you could think about it — and it’s not the only way; there’s freedom for choice here — is to imagine starting with some constant like one and then repeatedly applying the function one plus one divided by 𝑥.
00:07:30.080 --> 00:07:32.920
And then asking what this approaches as you keep going.
00:07:36.040 --> 00:07:39.440
And certainly, symbolically, what you get looks more and more like our infinite fraction.
00:07:39.840 --> 00:07:43.560
So maybe if you want it to equal a number, you should ask what this series of numbers approaches.
00:07:45.160 --> 00:07:48.400
And if that’s your view of things, maybe you start off with a negative number.
00:07:48.400 --> 00:07:51.400
So it’s not so crazy for the whole expression to end up negative.
00:07:53.040 --> 00:08:01.760
After all, if you start with negative one divided by 𝜑, then applying this function one plus one over 𝑥, you get back the same number, negative one divided by 𝜑.
00:08:02.360 --> 00:08:05.320
So no matter how many times you apply it, you’re staying fixed at this value.
00:08:07.800 --> 00:08:13.440
But even then, there is one reason that you should probably view 𝜑 as the favorite brother in this pair.
00:08:14.000 --> 00:08:14.680
Here, try this.
00:08:14.960 --> 00:08:18.320
Pull up a calculator of some kind, then start with any random number.
00:08:18.880 --> 00:08:21.760
And then plug it into this function, one plus one divided by 𝑥.
00:08:22.320 --> 00:08:25.360
And then plug that number into one plus one over 𝑥.
00:08:25.360 --> 00:08:28.000
And then again and again and again and again and again.
00:08:28.800 --> 00:08:33.200
No matter what constant you start with, you eventually end up at 1.618.
00:08:33.880 --> 00:08:38.480
Even if you start with a negative number, even one that’s really, really close to 𝜑’s little brother,
00:08:39.000 --> 00:08:43.280
eventually it shies away from that value and jumps back over to 𝜑.
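If you don’t have a calculator handy, the same experiment can be sketched in a few lines of Python. The seeds and the step count below are arbitrary choices for illustration, and the sketch assumes no iterate lands exactly on zero, where one over 𝑥 is undefined and Python would raise an error.

```python
def f(x):
    """The function being repeated: one plus one divided by x."""
    return 1 + 1 / x


def iterate(seed, steps=60):
    """Apply f to seed over and over, returning the final value."""
    x = seed
    for _ in range(steps):
        x = f(x)
    return x


# A few arbitrary seeds, including one very close to negative 0.618.
for seed in [1.0, 7.3, -5.0, -0.618]:
    print(f"seed {seed}: after 60 steps, x is about {iterate(seed):.6f}")
```

Each of these seeds should settle near 1.618, including the one that starts right next to 𝜑’s little brother.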
00:08:51.080 --> 00:08:52.360
So what’s going on here?
00:08:52.760 --> 00:08:55.800
Why is one of these fixed points favored above the other one?
00:08:56.680 --> 00:09:01.680
Maybe you can already see how the transformational understanding of derivatives is gonna be helpful for understanding this set-up.
00:09:02.240 --> 00:09:07.120
But for the sake of having a point of contrast, I wanna show you how a problem like this is often taught using graphs.
00:09:07.840 --> 00:09:13.960
If you were to plug in some random input to this function, the 𝑦-value tells you the corresponding output, right?
00:09:14.880 --> 00:09:22.720
So to think about plugging that output back into the function, you might first move horizontally until you hit the line 𝑦 equals 𝑥.
00:09:23.120 --> 00:09:28.160
And that’s gonna give you a position where the 𝑥-value corresponds to your previous 𝑦-value, right?
00:09:28.920 --> 00:09:34.440
So then from there, you can move vertically to see what output this new 𝑥-value has.
00:09:35.120 --> 00:09:35.840
And then you repeat.
00:09:36.280 --> 00:09:41.920
You move horizontally to the line 𝑦 equals 𝑥 to find a point whose 𝑥-value is the same as the output that you just got.
00:09:42.400 --> 00:09:44.560
And then you move vertically to apply the function again.
00:09:45.840 --> 00:09:50.760
Now personally, I think this is kind of an awkward way to think about repeatedly applying a function, don’t you?
00:09:51.320 --> 00:09:56.600
I mean it makes sense, but you kinda have to pause and think about it to remember which way to draw the lines.
00:09:57.160 --> 00:10:05.160
And you can, if you want, think through what conditions make this spiderweb process narrow in on a fixed point versus propagating away from it.
00:10:05.840 --> 00:10:06.640
And in fact, go ahead!
00:10:06.640 --> 00:10:09.040
Pause right now and try to think it through as an exercise.
00:10:09.360 --> 00:10:10.440
It has to do with slopes.
00:10:12.240 --> 00:10:19.600
Or if you wanna skip the exercise for something that I think gives a much more satisfying understanding, think about how this function acts as a transformation.
00:10:22.320 --> 00:10:27.840
So I’m gonna go ahead and start here by drawing a whole bunch of arrows to indicate where the various sampled input points will go.
00:10:28.080 --> 00:10:31.440
And side note, don’t you think this gives a really neat emergent pattern?
00:10:31.880 --> 00:10:35.040
I wasn’t expecting this, but it was cool to see it pop up when animating.
00:10:35.560 --> 00:10:38.840
I guess the action of one divided by 𝑥 gives this nice emergent circle.
00:10:39.280 --> 00:10:41.080
And then we’re just shifting things over by one.
00:10:41.800 --> 00:10:48.680
Anyway, I want you to think about what it means to repeatedly apply some function, like one plus one over 𝑥, in this context.
00:10:50.280 --> 00:10:55.920
Well, after letting it map all of the inputs to the outputs, you could consider those as the new inputs.
00:10:55.920 --> 00:10:59.280
And then just apply the same process again, and then again.
00:10:59.760 --> 00:11:01.440
And do it however many times you want.
00:11:02.960 --> 00:11:12.000
Notice, in animating this with a few dots representing the sample points, it doesn’t take many iterations at all before all of those dots kind of clump in around 1.618.
00:11:14.600 --> 00:11:23.840
Now remember, we know that 1.618, and its little brother, negative 0.618 on and on, stay fixed in place during each iteration of this process.
00:11:24.800 --> 00:11:27.440
But zoom in on a neighborhood around 𝜑.
00:11:28.280 --> 00:11:31.840
During the map, points in that region get contracted around 𝜑.
00:11:34.040 --> 00:11:41.040
Meaning that the function one plus one over 𝑥 has a derivative with a magnitude that’s less than one at this input.
00:11:41.880 --> 00:11:45.160
In fact, this derivative works out to be around negative 0.38.
00:11:46.080 --> 00:11:54.160
So what that means is that each repeated application scrunches the neighborhood around this number smaller and smaller, like a gravitational pull towards 𝜑.
00:11:55.240 --> 00:11:58.480
So now, tell me what you think happens in the neighborhood of 𝜑’s little brother.
00:12:01.280 --> 00:12:05.000
Over there, the derivative actually has a magnitude larger than one.
00:12:05.520 --> 00:12:08.400
So points near the fixed point are repelled away from it.
00:12:09.440 --> 00:12:13.760
And when you work it out, you can see that they get stretched by more than a factor of two in each iteration.
00:12:14.400 --> 00:12:17.240
They also get flipped around because the derivative is negative here.
00:12:17.520 --> 00:12:20.880
But the salient fact for the sake of stability is just the magnitude.
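Both of those derivative values can be worked out by hand. Here is the short calculation, using only the power rule and the fact that 𝜑 squared equals 𝜑 plus one, which follows from the fixed-point equation.

```latex
\begin{align*}
f(x) &= 1 + \frac{1}{x}, \qquad f'(x) = -\frac{1}{x^2} \\
f'(\varphi) &= -\frac{1}{\varphi^2} = -\frac{1}{\varphi + 1} \approx -0.382 \\
f'\!\left(-\frac{1}{\varphi}\right) &= -\varphi^2 = -(\varphi + 1) \approx -2.618
\end{align*}
```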
00:12:23.360 --> 00:12:29.240
Mathematicians would call this right value a stable fixed point, and the left one is an unstable fixed point.
00:12:29.920 --> 00:12:36.760
Something is considered stable if when you perturb it just a little bit, it tends to come back towards where it started rather than going away from it.
00:12:38.120 --> 00:12:47.240
So what we’re seeing is a very useful little fact: the stability of a fixed point is determined by whether the magnitude of the derivative at that point is bigger or smaller than one.
00:12:47.880 --> 00:12:55.560
And this explains why 𝜑 always shows up in the numerical play where you’re just hitting enter on your calculator over and over, but 𝜑’s little brother never does.
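To watch the pull and the push directly, you can nudge each fixed point by a tiny amount and track how far the value sits from the fixed point after each application. This is a minimal sketch. The helper name `offsets`, the nudge size, and the number of steps are all assumptions chosen for illustration.

```python
import math

PHI = (1 + math.sqrt(5)) / 2   # the stable fixed point, about 1.618
LITTLE_BROTHER = -1 / PHI      # the unstable fixed point, about -0.618


def f(x):
    """The function being repeated: one plus one divided by x."""
    return 1 + 1 / x


def offsets(fixed_point, nudge=1e-6, steps=5):
    """Distances from fixed_point after each application of f to a nudged start."""
    x = fixed_point + nudge
    history = [abs(x - fixed_point)]
    for _ in range(steps):
        x = f(x)
        history.append(abs(x - fixed_point))
    return history


for name, point in [("phi", PHI), ("little brother", LITTLE_BROTHER)]:
    distances = offsets(point)
    ratios = [later / earlier for earlier, later in zip(distances, distances[1:])]
    print(name, [round(r, 3) for r in ratios])
```

Near 𝜑 each step should shrink the distance by a factor of roughly 0.38, and near the little brother each step should grow it by a factor of roughly 2.6, which is the magnitude of the derivative at each point.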
00:12:56.360 --> 00:13:02.720
Now as to whether or not you wanna consider 𝜑’s little brother a valid value of the infinite fraction, well, that’s really up to you.
00:13:03.240 --> 00:13:08.560
Everything we just showed suggests that if you think of this expression as representing a limiting process,
00:13:08.920 --> 00:13:17.560
then because every possible seed value other than 𝜑’s little brother gives you a series converging to 𝜑, it does feel kinda silly to put them on equal footing with each other.
00:13:18.280 --> 00:13:20.080
But maybe you don’t think of it as a limit.
00:13:20.520 --> 00:13:29.280
Maybe the kind of math you’re doing lends itself to treating this as a purely algebraic object, like the solutions of a polynomial which simply has multiple values.
00:13:30.440 --> 00:13:31.760
Anyway, that’s beside the point.
00:13:32.240 --> 00:13:38.760
And my point here is not that viewing derivatives as this change in density is somehow better than the graphical intuition on the whole.
00:13:39.560 --> 00:13:44.800
In fact, picturing an entire function this way can be kind of clunky and impractical as compared to graphs.
00:13:45.440 --> 00:13:49.760
My point is that it deserves more of a mention in most of the introductory calculus courses.
00:13:50.120 --> 00:13:53.920
Because it can help make a student’s understanding of the derivative a little bit more flexible.
00:13:54.880 --> 00:14:05.480
Like I mentioned, the real reason that I’d recommend you carry this perspective with you as you learn new topics is not so much for what it does with your understanding of single-variable calculus; it’s for what comes after.
00:14:06.120 --> 00:14:10.720
There are many topics typically taught in a college math department which — how shall I put this lightly?
00:14:11.400 --> 00:14:13.840
— don’t exactly have a reputation of being super accessible.
00:14:15.240 --> 00:14:26.360
So in the next video, I’m gonna show you how a few ideas from these subjects with fancy-sounding names, like holomorphic functions and the Jacobian determinant, are really just extensions of the idea shown here.
00:14:26.880 --> 00:14:32.280
They really are some beautiful ideas, which I think can be appreciated from a really wide range of mathematical backgrounds.
00:14:32.640 --> 00:14:36.400
And they’re relevant to a surprising number of seemingly unrelated ideas.
00:14:36.760 --> 00:14:37.880
So stay tuned for that.