this post was submitted on 17 Sep 2024
305 points (98.4% liked)

xkcd


A community for a webcomic of romance, sarcasm, math, and language.


Conveniently for everyone, it turns out that dark energy is produced by subterranean parasitoid wasps.

https://explainxkcd.com/2986

[–] affiliate 5 points 3 months ago (1 children)

that’s fair. i remember multivariate being a bit rough back in the day. i feel like a lot of the difficulty with it might be due to how many shortcuts are taken when explaining things in singlevariate analysis, since a fair number of core concepts and tools don’t translate super well into the multivariate case.

i think the worst offender is the idea of the derivative as “the slope”, since that makes it quite hard to guess what the multidimensional derivative should be, and it makes the notions of gradient and partial derivatives a bit suspect. but some of the ways they teach integration in singlevariate analysis also don’t translate super well.

i feel like with calculus a big part of the difficulty is in building up the intuition about how things work and what things mean, but my experience has been that that’s not a huge part of calculus courses. knowing some of the history about differentials and infinitesimals can also help a bit, since that’s how calculus was first done, and it helps to understand the notation as well.

i hope some of this helps, and feel free to ask if you have any questions about some of the concepts

[–] [email protected] 2 points 3 months ago (1 children)

It should be easy, it's just analysis but with an added dimension, basically. How is it so hard? How is it that the more I'm "learning" for that damn math exam the less I know? Why do I need it in the first place? Why have exams at all? I know what I know, and it's not like I'm learning anything by preparing for them. I hate exams so much, it's so stressful.

I doubt you have the answers to that, even if you did, they wouldn't really help. So let's ask something useful, since you're offering.

What the hell is a total derivative, and why is it suddenly the same as a tangential plane?

Why is the gradient just a collection of the first partial derivatives? How is a tuple of them useful at all? Apparently it's showing the direction of steepest ascent or something? I don't get it.

[–] affiliate 2 points 3 months ago (1 children)

How is it so hard?

I think a lot of the reason is that fields (the real numbers in this case) have some pretty lousy categorical properties, and you can't define a nice additive and multiplicative structure on ℝ^n^ that makes it a field for n > 2. (For n = 2 you get the complex numbers, and the quaternions on ℝ^4^ already give up commutativity.) So you end up having to deal with vector spaces instead of fields. i.e., you can't (in general) multiply or divide points in ℝ^n^ by other points in ℝ^n^, so you have way fewer tricks at your disposal. The other thing is that you don't have a natural way to order points in ℝ^n^, so nice things like the mean value theorem no longer hold in their usual form. There are a few other complications as well, but I think those are the big ones. It's a whole different beast from singlevariate analysis.

How is it that the more I’m “learning” for that damn math exam the less I know?

I feel like this is just an unfortunate part of learning math. I'm not really sure that feeling ever goes away, but it usually means you're making progress. My experience has been that the more math I learn, the more comfortable I get with the things I already know, and the more I realize how much is left to learn. So it feels like I only really know the "basic stuff" and continue to struggle with the "hard stuff". My advice would be to try to not be discouraged by it, although it's easier said than done.

Why do I need it in the first place?

Multivariate analysis is super useful in applications, especially for 3d rendering/modeling. It shows up a lot in video game/physics programming, and probably a bunch of other things too. It's also foundational for more advanced things like tensor calculus/differential geometry/special relativity.

Why have exams at all?

I'm going to be real with you, I'm completely on your side on this one. I think exams just cause a bunch of stress and that it would be better to just get rid of them. I never liked exams.

Onto the more technical questions. I'll try to make things handwavey to hopefully make the "big picture" shine through a bit. I think analysis textbooks are a bit guilty of getting too wrapped up in the details and missing the forest for the trees (or however the saying goes).

What the hell is a total derivative, and why is it suddenly the same as a tangential plane?

The total derivative is basically just a way to turn calculus problems into linear algebra problems. I think it's best understood by first looking at the one dimensional case, and then trying to generalize it a bit to higher dimensions. The key idea is this:

The derivative of a function f: ℝ -> ℝ at a point x~0~ is the best way to approximate f with a straight line near the point x~0~. This means that the linear equation y = f'(x~0~) * (x - x~0~) + f(x~0~) is the most accurate straight-line approximation of f near x~0~.

Notice how in the 1-dimensional case this is just a "clever" way to rephrase that f'(x~0~) is the "instantaneous slope" of f at x~0~.
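To make the "best straight-line approximation" idea a bit more concrete, here is a small sketch. The choice of f = sin and x~0~ = 1 is just an assumption for illustration, and `tangent_line` is a made-up helper name. The point is that the error of the tangent line shrinks faster than the step h does, which is what "best" means here.

```python
import math

def tangent_line(f, df, x0):
    """Return the function x -> f(x0) + f'(x0) * (x - x0)."""
    y0, slope = f(x0), df(x0)
    return lambda x: y0 + slope * (x - x0)

# Illustrative choice (an assumption, not from the thread): f = sin, f' = cos.
f = math.sin
df = math.cos
x0 = 1.0
line = tangent_line(f, df, x0)

for h in (0.1, 0.01, 0.001):
    err = abs(f(x0 + h) - line(x0 + h))
    # err / h should keep shrinking as h shrinks
    print(f"h = {h}: error = {err:.2e}, error / h = {err / h:.2e}")
```

Any other line through (x~0~, f(x~0~)) would have an error that only shrinks proportionally to h, so error / h would level off instead of going to 0.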

In higher dimensions, it no longer makes sense to approximate f with a straight line, because lines are 1-dimensional objects, whereas the domain/codomain of the function might not necessarily be 1-dimensional. However, it does still make sense to talk about the best linear approximation of f. A bit of linear algebra knowledge helps to make this idea clearer, but I'll try to do my best to explain it with as little linear algebra as I can. (But let me know if you want a more linear algebra heavy explanation.)

A higher dimensional linear function is (basically) just a matrix, and a matrix is basically just a way to (linearly) turn one vector into another vector. At a high level, you can think of a matrix as turning a copy of ℝ^m^ into a copy of ℝ^n^, possibly rotating/shearing/scaling things in the process (but always keeping the origin fixed). (Compare this to the 1-dimensional case, where a 1 x 1 matrix is just a number, and multiplying by a number "turns one copy of ℝ into another copy of ℝ", provided that number isn't 0.)
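If it helps to see "a matrix turns one vector into another, linearly" in code, here is a minimal sketch in plain Python. The matrix `A` and the vectors are arbitrary values picked for illustration, and `matvec` is a hypothetical helper, not a library function.

```python
def matvec(A, v):
    """Multiply the matrix A (a list of rows) by the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

# A 2 x 3 matrix: it takes vectors in R^3 to vectors in R^2.
A = [[1.0, 2.0, 0.0],
     [0.0, 1.0, -1.0]]

u = [1.0, 0.0, 2.0]
v = [0.0, 3.0, 1.0]
c = 2.5

# Linearity: applying A to c*u + v gives the same result as c*A(u) + A(v).
lhs = matvec(A, [c * a + b for a, b in zip(u, v)])
rhs = [c * a + b for a, b in zip(matvec(A, u), matvec(A, v))]
print(lhs, rhs)

# A matrix always sends the zero vector to the zero vector.
print(matvec(A, [0.0, 0.0, 0.0]))
```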

So, the total derivative is basically just a matrix A that gives the best way to approximate a multivariable function f near a vector x~0~, in the sense that f(x) ≈ f(x~0~) + A(x - x~0~) for x close to x~0~. And as you vary the input vectors, you end up tracing out a copy of ℝ^n^ for some n. i.e., you get an n-dimensional plane that corresponds to the "best" approximation for f. And "best approximation" is just a slightly less fancy way of saying "tangential".
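Here is a rough numerical sketch of that idea. The function `f` below is an arbitrary example I made up, and `jacobian` estimates the total derivative with central differences (a standard numerical trick, not how you would do it on paper). The comparison at the end checks that f(x~0~) plus "matrix times small step" lands close to the true value of f.

```python
import math

def f(p):
    """Example map from R^2 to R^2 (chosen only for illustration)."""
    x, y = p
    return [x * x * y, math.sin(x) + y]

def jacobian(func, p, eps=1e-6):
    """Approximate the total derivative of func at p by central differences."""
    cols = []
    for j in range(len(p)):
        plus = list(p)
        minus = list(p)
        plus[j] += eps
        minus[j] -= eps
        fp, fm = func(plus), func(minus)
        cols.append([(a - b) / (2 * eps) for a, b in zip(fp, fm)])
    # cols[j] is the j-th column, so transpose into a list of rows
    return [list(row) for row in zip(*cols)]

def matvec(A, v):
    """Multiply the matrix A (a list of rows) by the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

x0 = [1.0, 2.0]
h = [0.01, -0.02]          # a small step away from x0
J = jacobian(f, x0)

exact = f([x0[0] + h[0], x0[1] + h[1]])
approx = [a + b for a, b in zip(f(x0), matvec(J, h))]
print("exact :", exact)
print("approx:", approx)
```

The two printed vectors should agree to a few decimal places, and the agreement gets better as the step `h` gets smaller.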

Why is the gradient just a collection of the first partial derivatives?

I always found the gradient to be a bit confusing. But I think it's best understood in terms of what it does, and not in terms of how it's defined. The "purpose" of the gradient is to let you compute the directional derivative. i.e., what is the derivative in the direction of a given vector v. So, let's use the notation

(∇f)(v) to denote the directional derivative of f, in the direction of v.

Let's consider the 3-dimensional case and write v = a~1~e~1~ + a~2~e~2~ + a~3~e~3~ for basis vectors e~i~ and real numbers a~i~.

Since the directional derivative is linear in the direction v (at least when f is differentiable), we would expect to have

(∇f)(v) = (∇f)(a~1~e~1~ + a~2~e~2~ + a~3~e~3~) = a~1~(∇f)(e~1~) + a~2~(∇f)(e~2~) + a~3~(∇f)(e~3~).

In other words, we only need to compute the directional derivatives in the directions of the basis vectors in order to figure out the gradient. That's pretty nice! Also, the derivative of f in the direction of e~i~ is exactly the partial derivative of f with respect to the i-th coordinate. Let's write f~i~ for that partial derivative (just because I don't know how well Lemmy handles double subscripts). Then we can rewrite the above equation as

(∇f)(v) = a~1~f~1~ + a~2~f~2~ + a~3~f~3~.

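Now compare that with the dot product of the vectors (f~1~, f~2~, f~3~) and v = (a~1~, a~2~, a~3~). It's exactly the same. So, the gradient can be defined as the vector of partial derivatives, and you get the directional derivative by taking its dot product with v. This is also where "steepest ascent" comes from: among vectors v of length 1, that dot product is largest when v points in the same direction as (f~1~, f~2~, f~3~). But I think that definition kind of loses a lot of the intuitive meaning of the gradient in the process.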
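If you want to convince yourself numerically, here is a small sketch. The function `f`, the point `p` and the direction `v` are all arbitrary choices for illustration, and both `partials` and `directional` are hypothetical helpers that use central differences. It computes the directional derivative in two ways: as a dot product with the partial derivatives, and directly as the derivative of t -> f(p + t*v) at t = 0.

```python
def f(p):
    """Example function from R^3 to R (chosen only for illustration)."""
    x, y, z = p
    return x * y + z * z

def partials(func, p, eps=1e-6):
    """Central-difference estimate of (f_1, f_2, f_3) at p."""
    grads = []
    for i in range(len(p)):
        plus = list(p)
        minus = list(p)
        plus[i] += eps
        minus[i] -= eps
        grads.append((func(plus) - func(minus)) / (2 * eps))
    return grads

def directional(func, p, v, eps=1e-6):
    """Derivative of t -> func(p + t*v) at t = 0, by central differences."""
    plus = [a + eps * b for a, b in zip(p, v)]
    minus = [a - eps * b for a, b in zip(p, v)]
    return (func(plus) - func(minus)) / (2 * eps)

p = [1.0, 2.0, 3.0]
v = [0.5, -1.0, 2.0]

grad = partials(f, p)
dot = sum(g * a for g, a in zip(grad, v))   # a_1 f_1 + a_2 f_2 + a_3 f_3
print(dot, directional(f, p, v))            # the two numbers should agree
```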

I hope you found some of this helpful, and feel free to ask if you have any more questions/found something I said confusing.

[–] [email protected] 2 points 3 months ago (1 children)

Thanks for answering my frustrated questions, was a long day yesterday. I'll try to understand the deeper truths later, but I can already tell the matrix stuff goes over my head.

[–] affiliate 1 points 3 months ago

anytime. i’ve also had my fair share of long days studying analysis. and i feel like most of my time spent trying to learn analysis was spent fighting with the textbooks. i think the (ε,δ) stuff is to blame for that, but that’s a whole other topic.

anyways, i was thinking a bit more about the matrix stuff and i think i have a better explanation if you’re interested, since my previous one was probably a bit too abstract. i think it should honestly be criminal to teach multivariate analysis before linear algebra, since a lot of the purpose of multivariate analysis is to turn complicated problems into linear problems. but anyways, here’s the big picture:

you don’t really need to understand the ins and outs of matrices and be super familiar with them to get a sense of what the total derivative is, and how it should behave. for that purpose, here are some of the highlights of matrices and the total derivative:

Let A be an m x n matrix. Then:

  • Multiplication with A defines a so-called “linear function” from ℝ^n^ to ℝ^m^. Put simply, this means that if you have a line in ℝ^n^, and you multiply each point in that line with A, then the result is a line in ℝ^m^. (This is because, under the hood, matrix multiplication is just a bunch of scalar multiplication and addition.)
  • There’s a slight catch to what I said above: sometimes you multiply the points in a line with a matrix and they all get sent to a single point (the 0 vector, if the line goes through the origin) instead of to another line. (Compare this to what happens when A is a 1 x 1 matrix, i.e. a number, and multiplying every point in ℝ with A will either give you only the number 0, or it will give you all of ℝ.)
  • Now think about a plane: it’s something spanned by two lines. (The simplest case being ℝ^2^, which is spanned by the x and y axes.) Since matrices send lines to either lines or 0, there are three options for what can happen to a plane: it gets sent to a plane (no spanning lines get sent to 0), or a line (one of the spanning lines gets sent to 0), or 0 (both spanning lines get sent to 0). You can do some fancy math to show that the first case (where a plane gets sent to a plane) is much more likely than the other two cases. So this is where the idea of a tangent plane comes from: approximate a function with a matrix, and the matrix corresponds to a plane that “stays close” to the function.
  • In any case, matrix multiplication is an extremely easy thing for computers to do, because there’s a formula for it. In contrast, evaluating arbitrary functions is not easy, and there’s no formula for that. This is really the main benefit of the total derivative: you can approximate the behavior of a function with matrix multiplication. And we know a whole lot more about dealing with matrices than we do about dealing with random functions.
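Here is a tiny sketch of the "lines go to lines, or collapse to 0" behaviour from the list above. The matrix `A` is a made-up example that was deliberately chosen to squash the y-axis, and `matvec` / `image_of_line` are hypothetical helper names.

```python
def matvec(A, v):
    """Multiply the matrix A (a list of rows) by the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

# A made-up 2 x 2 matrix that ignores the y-coordinate entirely.
A = [[1.0, 0.0],
     [2.0, 0.0]]

def image_of_line(A, direction, ts=(-1.0, 0.0, 1.0, 2.0)):
    """Apply A to a few points t * direction on a line through the origin."""
    return [matvec(A, [t * d for d in direction]) for t in ts]

diagonal = image_of_line(A, [1.0, 1.0])   # images are multiples of (1, 2): still a line
y_axis = image_of_line(A, [0.0, 1.0])     # every point lands on the zero vector
print(diagonal)
print(y_axis)
```

Notice that the whole computation is just multiplications and additions, which is the sense in which matrix multiplication "has a formula".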

So those are two ways to look at the total derivative: you can try to get a geometric understanding of what it does (approximate the function with the best fitting plane), or try to look at why it’s useful (turning harder problems into easier problems). But just to be clear, dealing with matrices is still hard, it’s just comparatively a lot easier than dealing with random functions.