WEBVTT
00:00:04.360 --> 00:00:16.800
The goal here is simple: explain what a derivative is.
00:00:16.800 --> 00:00:24.200
The thing is though, there’s some subtlety to this topic and a lot of potential for paradoxes, if you’re not careful.
00:00:24.720 --> 00:00:30.040
So kind of a secondary goal is that you have an appreciation for what those paradoxes are and how to avoid them.
00:00:31.240 --> 00:00:35.960
You see, it’s common for people to say that the derivative measures an instantaneous rate of change.
00:00:36.560 --> 00:00:39.640
But when you think about it, that phrase is actually an oxymoron.
00:00:40.200 --> 00:00:42.960
Change is something that happens between separate points in time.
00:00:43.600 --> 00:00:48.640
And when you blind yourself to all but just a single instant, there’s not really any room for change.
00:00:49.520 --> 00:00:51.160
You’ll see what I mean more as we get into it.
00:00:51.440 --> 00:01:06.080
But when you appreciate that a phrase like “instantaneous rate of change” is actually nonsense, I think it makes you appreciate just how clever the fathers of calculus were in capturing the idea that that phrase is meant to evoke, but with a perfectly sensible piece of math, the derivative.
00:01:07.520 --> 00:01:16.160
As our central example, I want you to imagine a car that starts at some point A, speeds up, and then slows down to a stop at some point B 100 meters away.
00:01:16.680 --> 00:01:19.040
And let’s say it all happens over the course of 10 seconds.
00:01:20.600 --> 00:01:23.880
That’s the set-up to have in mind as we lay out what the derivative is.
00:01:24.600 --> 00:01:30.920
We could graph this motion, letting the vertical axis represent the distance traveled and the horizontal axis represent time.
00:01:30.920 --> 00:01:45.400
So at each time 𝑡, represented with a point on this horizontal axis, the height of the graph tells us how far the car has traveled in total after that amount of time.
00:01:46.720 --> 00:01:51.120
It’s pretty common to name a distance function like this, 𝑠 of 𝑡.
00:01:51.120 --> 00:01:52.800
I would use the letter 𝑑 for distance.
00:01:52.800 --> 00:01:55.480
But that guy already has another full-time job in calculus.
00:01:56.400 --> 00:01:59.680
Initially, this curve is quite shallow since the car is slow to start.
00:02:00.240 --> 00:02:04.280
During that first second, the distance that it travels doesn’t really change that much.
00:02:05.080 --> 00:02:13.000
Then, for the next few seconds, as the car speeds up, the distance traveled in a given second gets larger, which corresponds to a steeper slope in this graph.
00:02:14.000 --> 00:02:16.800
And then towards the end when it slows down, that curve shallows out again.
00:02:16.800 --> 00:02:27.040
And if we were to plot the car’s velocity in meters per second as a function of time, it might look like this bump.
00:02:27.880 --> 00:02:29.960
At early times, the velocity is very small.
00:02:30.480 --> 00:02:36.480
Up to the middle of the journey, the car builds up to some maximum velocity, covering a relatively large distance each second.
00:02:37.760 --> 00:02:39.800
Then, it slows back down towards a speed of zero.
00:02:41.720 --> 00:02:44.440
And these two curves here are definitely related to each other, right?
00:02:44.880 --> 00:02:50.880
If you change the specific distance-versus-time function, you’re gonna have some different velocity-versus-time function.
00:02:51.640 --> 00:02:55.000
And what we wanna understand is the specifics of that relationship.
00:02:55.600 --> 00:02:59.640
How, exactly, does velocity depend on a distance-versus-time function?
00:03:01.840 --> 00:03:07.200
And to do that, it’s worth taking a moment to think critically about what exactly velocity means here.
00:03:07.200 --> 00:03:11.440
Intuitively, we all might know what velocity at a given moment means.
00:03:11.680 --> 00:03:14.400
It’s just whatever the car’s speedometer shows in that moment.
00:03:14.400 --> 00:03:25.480
And intuitively, it might make sense that the car’s velocity should be higher at times when this distance function is steeper, when the car traverses more distance per unit time.
00:03:26.640 --> 00:03:30.800
But the funny thing is, velocity at a single moment makes no sense.
00:03:31.440 --> 00:03:39.880
If I show you a picture of a car, just a snapshot in an instant, and ask you how fast it’s going, you’d have no way of telling me.
00:03:39.880 --> 00:03:42.360
What you’d need are two separate points in time to compare.
00:03:43.200 --> 00:03:49.800
That way you can compute whatever the change in distance across those times is, divided by the change in time, right?
00:03:49.800 --> 00:03:51.320
I mean, that’s what velocity is.
00:03:51.320 --> 00:03:54.080
It’s the distance traveled per unit time.
00:03:55.680 --> 00:04:02.240
So how is it that we’re looking at a function for velocity that only takes in a single value of 𝑡, a single snapshot in time?
00:04:03.160 --> 00:04:04.040
It’s weird, isn’t it?
00:04:04.520 --> 00:04:07.480
We wanna associate individual points in time with the velocity.
00:04:07.920 --> 00:04:12.360
But actually, computing velocity requires comparing two separate points in time.
00:04:15.000 --> 00:04:17.360
If that feels strange and paradoxical, good!
00:04:17.920 --> 00:04:20.720
You’re grappling with the same conflicts that the fathers of calculus did.
00:04:21.360 --> 00:04:29.920
And if you want a deep understanding of rates of change, not just for a moving car but for all sorts of things in science, you’re gonna need to resolve this apparent paradox.
00:04:32.440 --> 00:04:34.520
First, I think it’s best to talk about the real world.
00:04:34.520 --> 00:04:36.760
And then, we’ll go into a purely mathematical one.
00:04:37.480 --> 00:04:40.400
Let’s think about what the car’s speedometer is probably doing.
00:04:41.240 --> 00:04:52.160
At some point, say three seconds into the journey, the speedometer might measure how far the car goes in a very small amount of time, maybe the distance traveled between three seconds and 3.01 seconds.
00:04:52.160 --> 00:05:01.440
Then, it could compute the speed in meters per second as that tiny distance traversed in meters divided by that tiny time, 0.01 seconds.
00:05:01.440 --> 00:05:08.160
That is, a physical car just sidesteps the paradox and doesn’t actually compute speed at a single point in time.
00:05:08.760 --> 00:05:11.480
It computes speed during a very small amount of time.
00:05:13.160 --> 00:05:18.600
So let’s call that difference in time d𝑡, which you might think of in this case as 0.01 seconds.
00:05:19.000 --> 00:05:22.360
And let’s call that resulting difference in distance d𝑠.
00:05:23.400 --> 00:05:30.160
So the velocity at some point in time is d𝑠 divided by d𝑡, the tiny change in distance over the tiny change in time.
00:05:31.520 --> 00:05:37.600
Graphically, you can imagine zooming in on some point of this distance-versus-time graph above 𝑡 equals three.
00:05:38.480 --> 00:05:42.960
That d𝑡 is a small step to the right since time is on the horizontal axis.
00:05:43.640 --> 00:05:50.280
And that d𝑠 is the resulting change in the height of the graph since the vertical axis represents distance traveled.
00:05:50.280 --> 00:05:59.440
So d𝑠 divided by d𝑡 is something you can think of as the rise-over-run slope between two very close points on this graph.
00:06:00.600 --> 00:06:03.360
Of course, there’s nothing special about the value 𝑡 equals three.
00:06:03.920 --> 00:06:06.040
We could apply this to any other point in time.
00:06:06.440 --> 00:06:13.800
So, we consider this expression d𝑠 over d𝑡 to be a function of 𝑡, something where I can give you a time 𝑡.
00:06:13.800 --> 00:06:18.480
And you can give me back the value of this ratio at that time, the velocity as a function of time.
00:06:18.480 --> 00:06:27.120
So, for example, when I had the computer draw this bump curve here, the one representing the velocity function, here’s what I had the computer actually do.
00:06:27.960 --> 00:06:31.240
First, I chose a small value for d𝑡.
00:06:31.240 --> 00:06:32.640
I think in this case it was 0.01.
00:06:33.440 --> 00:06:44.720
Then, I had the computer look at a whole bunch of times 𝑡 between zero and 10 and compute the distance function 𝑠 at 𝑡 plus d𝑡 and then subtract off the value of that function at 𝑡.
00:06:45.600 --> 00:06:53.440
In other words, that’s the difference in the distance traveled between the given time, 𝑡, and the time 0.01 seconds after that.
00:06:54.520 --> 00:06:58.040
Then, you can just divide that difference by the change in time, d𝑡.
00:06:58.520 --> 00:07:02.240
And that gives you the velocity in meters per second around each point in time.
00:07:04.360 --> 00:07:09.800
So with a formula like this, you could give the computer any curve representing any distance function, 𝑠 of 𝑡.
00:07:10.280 --> 00:07:12.880
And it could figure out the curve representing velocity.
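As a rough sketch of that procedure in Python: the transcript never gives a formula for the car's distance function, so the 𝑠 of 𝑡 below is a made-up smooth curve that goes from 0 to 100 meters over 10 seconds. The names s and approx_velocity are hypothetical too, chosen just for illustration.

```python
# Hypothetical distance function: an assumed smooth curve that starts at 0 m,
# ends at 100 m after 10 s, and is flat at both ends. Not from the transcript.
def s(t):
    u = t / 10.0
    return 100.0 * (3 * u**2 - 2 * u**3)


def approx_velocity(s, t, dt=0.01):
    """Change in distance between t and t + dt, divided by dt."""
    return (s(t + dt) - s(t)) / dt


# Sample times 0, 0.5, 1.0, ..., 9.5 seconds and compute the ratio at each one.
times = [i * 0.5 for i in range(20)]
velocities = [approx_velocity(s, t) for t in times]

for t, v in zip(times, velocities):
    print(f"t = {t:4.1f} s   velocity is roughly {v:6.2f} m/s")
```

With this assumed curve, the printed values start near zero, build up to about 15 meters per second around the middle of the journey, and come back down, which is the same kind of bump described earlier.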
00:07:13.920 --> 00:07:21.080
So now would be a good time to pause, reflect, make sure that this idea of relating distance to velocity by looking at tiny changes makes sense.
00:07:21.640 --> 00:07:25.120
Because what we’re gonna do is tackle the paradox of the derivative head-on.
00:07:27.640 --> 00:07:37.840
This idea of d𝑠 over d𝑡, a tiny change in the value of the function 𝑠 divided by the tiny change in the input that caused it, that’s almost what a derivative is.
00:07:38.760 --> 00:07:54.280
And even though a car’s speedometer will actually look at a concrete change in time, like 0.01 seconds, and even though the drawing program here is looking at an actual concrete change in time, in pure math the derivative is not this ratio d𝑠 over d𝑡 for a specific choice of d𝑡.
00:07:54.280 --> 00:08:00.640
Instead, it’s whatever that ratio approaches as your choice for d𝑡 approaches zero.
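In symbols, one standard way to write that statement is the following, using the same 𝑠, 𝑡, and d𝑡 as above:

```latex
\frac{ds}{dt}(t) \;=\; \lim_{dt \to 0} \frac{s(t + dt) - s(t)}{dt}
```

The fraction on the right is the ratio for one specific nudge d𝑡, and the limit is whatever that ratio approaches as the nudge shrinks.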
00:08:02.640 --> 00:08:07.120
Luckily, there is a really nice visual understanding for what it means to ask what this ratio approaches.
00:08:07.120 --> 00:08:16.880
Remember, for any specific choice of d𝑡, this ratio d𝑠 over d𝑡 is the slope of a line passing through two separate points on the graph, right?
00:08:17.760 --> 00:08:29.640
Well, as d𝑡 approaches zero and as those two points approach each other, the slope of the line approaches the slope of a line that’s tangent to the graph at whatever point 𝑡 we’re looking at.
00:08:30.560 --> 00:08:36.480
So, the true, honest-to-goodness, pure math derivative is not the rise-over-run slope between two nearby points on the graph.
00:08:37.120 --> 00:08:40.880
It’s equal to the slope of a line tangent to the graph at a single point.
00:08:42.240 --> 00:08:43.600
Now notice what I’m not saying.
00:08:44.000 --> 00:08:52.240
I’m not saying that the derivative is whatever happens when d𝑡 is infinitely small, whatever that would mean, nor am I saying that you plug in zero for d𝑡.
00:08:53.040 --> 00:08:56.160
This d𝑡 is always a finitely small, nonzero value.
00:08:56.760 --> 00:08:58.720
It’s just that it approaches zero is all.
00:09:04.000 --> 00:09:04.960
I think that’s really clever.
00:09:05.360 --> 00:09:16.280
Even though change in an instant makes no sense, this idea of letting d𝑡 approach zero is a really sneaky backdoor way to talk reasonably about the rate of change at a single point in time.
00:09:16.960 --> 00:09:17.480
Isn’t that neat?!
00:09:18.080 --> 00:09:22.720
It’s kind of flirting with the paradox of change in an instant without ever needing to actually touch it.
00:09:23.320 --> 00:09:28.520
And it comes with such a nice visual intuition too, as the slope of a tangent line to a single point on the graph.
00:09:30.040 --> 00:09:42.200
And because change in an instant still makes no sense, I think it’s healthiest for you to think of this slope not as some instantaneous rate of change, but instead as the best constant approximation for a rate of change around a point.
00:09:42.200 --> 00:09:46.840
By the way, it’s worth saying a couple of words on notation here.
00:09:47.320 --> 00:09:57.560
Throughout this video, I’ve been using d𝑡 to refer to a tiny change in 𝑡 with some actual size and d𝑠 to refer to the resulting tiny change in 𝑠, which again has an actual size.
00:09:58.280 --> 00:10:00.680
And this is because that’s how I want you to think about them.
00:10:01.640 --> 00:10:11.040
But the convention in calculus is that whenever you’re using the letter 𝑑 like this, you’re kind of announcing your intention that eventually you’re gonna see what happens as d𝑡 approaches zero.
00:10:11.920 --> 00:10:25.960
For example, the honest-to-goodness pure math derivative is written as d𝑠 divided by d𝑡, even though it’s technically not a fraction, per se, but whatever that fraction approaches for smaller and smaller nudges in 𝑡.
00:10:25.960 --> 00:10:27.440
I think a specific example should help here.
00:10:28.200 --> 00:10:34.440
You might think that asking about what this ratio approaches for smaller and smaller values would make it much more difficult to compute.
00:10:35.160 --> 00:10:37.520
But, weirdly, it kind of makes things easier.
00:10:38.160 --> 00:10:42.880
Let’s say that you have a given distance-versus-time function that happens to be exactly 𝑡 cubed.
00:10:43.400 --> 00:10:47.280
So after one second, the car has traveled one cubed, or one, meter.
00:10:47.720 --> 00:10:51.920
After two seconds, it’s traveled two cubed, or eight, meters, and so on.
00:10:53.040 --> 00:10:55.400
Now what I’m about to do might seem somewhat complicated.
00:10:55.800 --> 00:10:58.040
But once the dust settles, it really is simpler.
00:10:58.400 --> 00:11:01.520
And more importantly, it’s the kinda thing that you only ever have to do once in calculus.
00:11:01.720 --> 00:11:09.240
Let’s say you wanted to compute the velocity, d𝑠 divided by d𝑡, at some specific time, like 𝑡 equals two.
00:11:10.040 --> 00:11:14.240
And for right now, let’s think of d𝑡 as having an actual size, some concrete nudge.
00:11:14.560 --> 00:11:15.920
We’ll let it go to zero in just a bit.
00:11:17.080 --> 00:11:27.000
The tiny change in distance between two seconds and two plus d𝑡 seconds, well that’s 𝑠 of two plus d𝑡 minus 𝑠 of two, and we divide that by d𝑡.
00:11:27.000 --> 00:11:34.360
Since our function is 𝑡 cubed, that numerator looks like the quantity two plus d𝑡, all cubed, minus two cubed.
00:11:35.240 --> 00:11:37.760
And this, this is something we can work out algebraically.
00:11:37.760 --> 00:11:39.840
Again, bear with me.
00:11:39.840 --> 00:11:41.680
There’s a reason that I’m showing you the details here.
00:11:41.680 --> 00:11:54.320
When you expand that top, what you get is two cubed plus three times two squared d𝑡 plus three times two times d𝑡 squared plus d𝑡 cubed.
00:11:55.000 --> 00:11:56.760
And all of that is minus two cubed.
00:11:58.320 --> 00:11:59.360
Now there’s a lot of terms.
00:11:59.360 --> 00:12:02.920
And I want you to remember that it looks like a mess, but it does simplify.
00:12:03.800 --> 00:12:05.680
Those two cubed terms, they cancel out.
00:12:06.680 --> 00:12:08.800
And then everything remaining here has a d𝑡 in it.
00:12:08.800 --> 00:12:13.480
And since there’s a d𝑡 on the bottom there, many of those cancel out as well.
00:12:14.240 --> 00:12:24.360
What this means is that the ratio, d𝑠 divided by d𝑡, has boiled down into three times two squared plus, well, two different terms that each have a d𝑡 in them.
00:12:25.520 --> 00:12:34.680
So if we ask what happens as d𝑡 approaches 0, representing the idea of looking at a smaller and smaller change in time, we can just completely ignore those other terms.
00:12:36.200 --> 00:12:43.000
By eliminating the need to think about a specific d𝑡, we’ve actually eliminated a lot of the complication in the full expression!
00:12:44.000 --> 00:12:47.240
So what we’re left with is this nice clean three times two squared.
00:12:48.480 --> 00:12:57.080
You can think of that as meaning that the slope of a line tangent to the point at 𝑡 equals two of this graph is exactly three times two squared or 12.
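Written out as a worked equation, the algebra just described for 𝑠 of 𝑡 equals 𝑡 cubed at 𝑡 equals two goes like this:

```latex
\begin{aligned}
\frac{s(2 + dt) - s(2)}{dt}
  &= \frac{(2 + dt)^3 - 2^3}{dt} \\
  &= \frac{2^3 + 3 \cdot 2^2 \, dt + 3 \cdot 2 \, (dt)^2 + (dt)^3 - 2^3}{dt} \\
  &= 3 \cdot 2^2 + 3 \cdot 2 \, dt + (dt)^2 \\
  &\longrightarrow 3 \cdot 2^2 = 12 \quad \text{as } dt \to 0
\end{aligned}
```

The last two terms on the third line are the ones that each still have a d𝑡 in them, so they shrink away as d𝑡 approaches zero.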
00:12:58.320 --> 00:13:01.040
And of course, there’s nothing special about the time 𝑡 equals two.
00:13:01.520 --> 00:13:08.000
We could more generally say that the derivative of 𝑡 cubed as a function of 𝑡, is three times 𝑡 squared.
00:13:11.200 --> 00:13:13.280
Now take a step back because that’s beautiful.
00:13:13.800 --> 00:13:16.280
This derivative is this crazy complicated idea.
00:13:16.560 --> 00:13:19.600
We’ve got tiny changes in distance over tiny changes in time.
00:13:19.920 --> 00:13:24.680
But instead of looking at any specific one of those, we’re talking about what that thing approaches.
00:13:25.120 --> 00:13:26.600
I mean, that’s a lot to think about!
00:13:26.600 --> 00:13:33.080
And yet, what we’ve come out with is such a simple expression, three times 𝑡 squared.
00:13:33.080 --> 00:13:35.720
And in practice, you wouldn’t go through all this algebra each time.
00:13:35.720 --> 00:13:44.440
Knowing that the derivative of 𝑡 cubed is three 𝑡 squared is one of those things that all calculus students learn how to do immediately without having to rederive it each time.
00:13:45.040 --> 00:13:51.760
And in the next video, I’m gonna show you how to think about this and a couple of other derivative formulas in really nice geometric ways.
00:13:52.840 --> 00:14:04.680
But the point I wanna make by showing you all of the algebraic guts here is that when you consider the tiny change in distance caused by a tiny change in time for some specific value of d𝑡, you’d have kind of a mess.
00:14:05.280 --> 00:14:11.200
But when you consider what that ratio approaches as d𝑡 approaches zero, it lets you ignore much of that mess.
00:14:11.200 --> 00:14:12.960
And it really does simplify the problem.
00:14:13.760 --> 00:14:16.680
That right there is kind of the heart of why calculus becomes useful.
00:14:18.320 --> 00:14:28.680
Another reason to show you a concrete derivative like this is that it sets the stage for an example of the kind of paradoxes that come about if you believe too much in the illusion of instantaneous rate of change.
00:14:29.880 --> 00:14:34.000
So think about the actual car traveling according to this 𝑡 cubed distance function.
00:14:34.640 --> 00:14:38.680
And consider its motion at the moment 𝑡 equals zero, right at the start.
00:14:39.640 --> 00:14:46.080
Now ask yourself whether or not the car is moving at that time.
00:14:46.080 --> 00:14:53.680
On the one hand, we can compute its speed at that point using the derivative, three 𝑡 squared, which for time 𝑡 equals zero works out to be zero.
00:14:54.680 --> 00:14:59.280
Visually, this means that the tangent line to the graph at that point is perfectly flat.
00:14:59.800 --> 00:15:03.200
So the car’s, quote, unquote, “instantaneous velocity” is zero.
00:15:03.880 --> 00:15:06.040
And that suggests that obviously it’s not moving.
00:15:07.160 --> 00:15:11.760
But on the other hand, if it doesn’t start moving at time zero, when does it start moving?
00:15:12.840 --> 00:15:14.480
Really, pause and ponder that for a moment.
00:15:15.040 --> 00:15:17.760
Is the car moving at time 𝑡 equals zero?
00:15:22.680 --> 00:15:23.600
Do you see the paradox?
00:15:24.280 --> 00:15:26.080
The issue is that the question makes no sense.
00:15:26.520 --> 00:15:28.720
It references the idea of change in a moment.
00:15:29.040 --> 00:15:30.440
But that doesn’t actually exist.
00:15:30.840 --> 00:15:32.640
That’s just not what the derivative measures.
00:15:33.480 --> 00:15:43.080
What it means for the derivative of a distance function to be zero is that the best constant approximation for the car’s velocity around that point is zero meters per second.
00:15:44.080 --> 00:15:51.080
For example, if you look at an actual change in time, say between time zero and 0.1 seconds, the car does move.
00:15:51.480 --> 00:15:53.920
It moves 0.001 meters.
00:15:54.640 --> 00:15:55.400
That’s very small.
00:15:55.840 --> 00:16:02.800
And importantly, it’s very small compared to the change in time, giving an average speed of only 0.01 meters per second.
00:16:04.080 --> 00:16:13.760
And remember, what it means for the derivative of this motion to be zero is that for smaller and smaller nudges in time, this ratio of meters per second approaches zero.
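Here is a small sketch of that shrinking ratio in Python, for the 𝑡 cubed distance function starting at time zero. The function name average_speed and the particular list of nudges are just illustrative choices.

```python
def s(t):
    """Distance in meters after t seconds, for the t cubed example."""
    return t**3


def average_speed(dt):
    """Distance covered between time 0 and time dt, divided by dt."""
    return (s(0 + dt) - s(0)) / dt


# Try smaller and smaller nudges in time, starting from the 0.1 seconds above.
for dt in [0.1, 0.01, 0.001, 0.0001]:
    print(f"dt = {dt:<7} distance = {s(dt):.3e} m   average speed = {average_speed(dt):.3e} m/s")
```

For every nudge the distance moved is greater than zero, so the car is never standing still over any actual stretch of time, yet the average speed keeps dropping as the nudge gets smaller.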
00:16:14.800 --> 00:16:16.600
But that’s not to say that the car is static.
00:16:17.360 --> 00:16:22.760
Approximating its movement with a constant velocity of zero is, after all, just an approximation.
00:16:24.320 --> 00:16:37.320
So whenever you hear people refer to the derivative as an instantaneous rate of change, a phrase which is intrinsically oxymoronic, I want you to think of that as a conceptual shorthand for the best constant approximation for rate of change.
00:16:37.320 --> 00:16:43.640
In the next couple of videos, I’ll be talking more about the derivative, what it looks like in different contexts.
00:16:43.640 --> 00:16:44.840
How do you actually compute it?
00:16:44.840 --> 00:16:45.680
Why is it useful?
00:16:45.680 --> 00:16:48.880
Things like that, focusing on visual intuition as always.