Intelligent Design Mathematics

Is Standard Calculus Notation Wrong?

We usually think of basic mathematics, such as introductory calculus, as fairly solid. However, recent research by UD authors shows that calculus notation needs a revision.

Many people complain about ID by saying, essentially, “ID can’t be true because all of biology depends on evolution.” This is obviously a gross overstatement (biology as a field was just fine even before Darwin), but I understand the sentiment. Evolution is a basic part of biology (taught in intro biology), and therefore it would be surprising to biologists to find that fundamental pieces of it were wrong.

However, the fact is that oftentimes fundamental aspects of various fields are wrong. Surprisingly, this sometimes has little impact on the field itself. If premise A is faulty and leads to faulty conclusions, oftentimes workaround B can be invoked to get around A’s problems. Thus, A can work as long as B is there to get around its problems.

Anyway, I wanted to share my own experience of this with calculus. Some of you know that I published a Calculus book last year. My goal in this was mostly to counteract the dry, boring, and difficult-to-understand textbooks that dominate the field. However, when it came to the second derivative, I realized that not only is the notation unintuitive, there is literally no explanation for it in any textbook I could find.

For those who don’t know, the notation for the first derivative is dy/dx. The first derivative is the ratio of the change in y (dy) compared to the change in x (dx). The notation for the second derivative is d^2y/dx^2. However, there is not a cogent explanation for this notation. I looked through 20 (no kidding!) textbooks to find an explanation for why the notation was the way that it was.

Additionally, I found out that the notation itself is problematic. Although it is written as a fraction, the numerator and denominator cannot be separated without causing math errors. This problem is somewhat more widely known, and has a workaround, known as Faà di Bruno’s formula.

My goal was to present a reason for the notation to my readers/students, so that they could more intuitively grasp the purpose of the notation. So, I decided that since no one else was providing an explanation, I would try to derive the notation myself.

Well, when I tried to derive it directly, it turned out that the notation is simply wrong (footnote – many mathematicians don’t like me using the terminology of “wrong”, but, I would argue that a fraction that can’t be treated like a fraction *is* wrong, especially when there is an alternative that does work like a fraction). Most people forget that dy/dx is, in fact, a quotient. Therefore, the proper rule to apply to this is the quotient rule (a first-year calculus rule). When you do this to the actual first derivative notation, the notation for the second derivative (the derivative of the derivative) is actually d^2y/dx^2 - (dy/dx)(d^2x/dx^2). This notation can be fully treated as a fraction, and requires no secondary formulas to work with.
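
To make the quotient-rule step concrete, here is a minimal sketch of the calculation in LaTeX. It assumes dy and dx are treated as differentials in their own right, writes d(dy) as d^2y and d(dx) as d^2x, and divides by dx at the end because the second derivative is the change in dy/dx relative to the change in x.

```latex
\begin{align*}
d\!\left(\frac{dy}{dx}\right)
  &= \frac{dx\,d(dy) - dy\,d(dx)}{(dx)^2}
  && \text{(quotient rule)} \\[4pt]
\frac{d\!\left(\frac{dy}{dx}\right)}{dx}
  &= \frac{dx\,d^2y - dy\,d^2x}{dx^3}
  && \text{(divide by } dx\text{)} \\[4pt]
  &= \frac{d^2y}{dx^2} - \frac{dy}{dx}\,\frac{d^2x}{dx^2}
\end{align*}
```

Under the usual convention that the second differential of the independent variable is zero, taking x as the independent variable gives d^2x = 0, so the second term drops out and the expression reduces to the familiar d^2y/dx^2.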

What does this have to do with Intelligent Design? Not much directly. However, it does show that, in any discipline, there is the possibility that asking good questions about basic fundamentals may lead to the revising of some of even the most basic aspects of the field. This is precisely what philosophy does, and I recommend the broader application of philosophy to science. Second, it shows that even newbies can make a contribution. In fact, I found this out precisely because I *was* a newbie. Finally, in a more esoteric fashion (but more directly applicable to ID), the forcing of everything into materialistic boxes limits the progress of all fields. The reason why this was not noticed before, I believe, is because, since the 1800s, mathematicians have not wanted to believe that infinitesimals are valid entities. Therefore, they were not concerned when the second derivative did not operate as a fraction – it didn’t need to, because it indeed wasn’t a fraction. Infinities and infinitesimals are the non-materialistic aspects of mathematics, just as teleology, purpose, and desire are the non-materialistic aspects of biology.

Anyway, for those who want to read the paper, it is available here:

Bartlett, Jonathan and Asatur Khurshudyan. 2019. Extending the Algebraic Manipulability of Differentials. Dynamics of Continuous, Discrete and Impulsive Systems, Series A: Mathematical Analysis 26(3):217-230.

I would love any comments, questions, or feedback.

121 Replies to “Is Standard Calculus Notation Wrong?”

  1. 1
    daveS says:

    Most people forget that dy/dx is, in fact, a quotient.

    Are you a physicist? 🙂

    As a math-oriented person, I would strenuously object; it’s not a quotient. Rather, it’s just one fairly suggestive notation for the derivative of a function. You’re not supposed to separate the dy and dx (err…, unless you’re studying differential forms, or something crazy like that).

    But there is a lot of bad (although not really wrong) notation in calculus unfortunately. If I had my way, I would replace the operator d/dx with D or D_x (at the introductory calculus level), but then the physicists would be up in arms. I’m pretty sure we’re stuck with it for the foreseeable future.

  2. 2
    hazel says:

    Some thoughts

    1. I can give what some people might take as a cogent explanation why we use the notation we do for the second derivative, although it will be hard to type it. Also, I’m not saying that the problems johnnyb has identified aren’t legitimate (I just glanced at the paper), but here is, approximately, what I used to explain to my students. (Note: this was an introductory high school course in a small school, not a rigorous college Calc I course.)

    First, consider the units in a simple example: if s = distance in feet, s’ = v = ds/dt is in ft/sec (which agrees with the ds/dt notation). Next s’’ = a = ft/sec/sec, or ft/sec^2. Notice that units of the denominator are squared, but those of the numerator are not.

    Now consider this: dy/dx can be thought of as the rate of change of y with respect to x. The second derivative is the derivative of the derivative, so it can be thought of as d (dy/dx)/dx: that is, the derivative of dy/dx with respect to x, again. The notation suggests that the new numerator is d(d(y)) and the new denominator, as with the unit example above, is (dx)^2.

    Therefore, we write d^2y in the numerator to mean the derivative of the derivative, but just dx^2 (dropping the parentheses) in the denominator.

    2. To Dave: how do you feel about this? Let f = x^2. Therefore f’ = dy/dx = 2x, so dy = 2x*dx. Or how about implicit differentiation: if x^2 + y^2 = 25, can we write 2x*dx + 2y*dy = 0, so dy/dx = -x/y? Is this one of the crazy things you were referring to? 🙂

    3. Also, Dave, you write, “it’s (dy/dx) not a quotient. Rather, it’s just one fairly suggestive notation for the derivative of a function.”

    The way I taught the meaning and derivation of the derivative was to take the slope of a secant line between a fixed point P and another point Q and consider the slope quotient delta y/delta x. Then let Q move towards P and take the limit of the resulting slope as the distance between P and Q becomes infinitesimally small, so that delta y becomes dy and delta x becomes dx, and thus dy/dx represents the “infinitely small” slope triangle at P: that is, the derivative of the function at that point.

    That is, I think it’s a good suggestive notation for the derivative because it does represent the limit of a quotient that the students are very familiar with.

    I know some may think this very unrigorous but I think I had a lot of success building on these ideas to get successful calculus students.

    4. And I agree about D notation, if for no other reason than efficiency: D(x^2) = 2x is nice and clean. Then you can write y’ or dy/dx or whatever is convenient for 2x. Is this what you had in mind?

  3. 3
    daveS says:

    hazel,

    Regarding #2, yes those manipulations are what I was referring to. 🙂 Of course they have some utility, but the meaning of expressions such as an isolated “dx” is not usually accessible to a first-year calculus student. I can see how it’s pragmatic to teach such things, on the other hand.

    #4: Yes, that’s what I had in mind. And it reinforces the notion that d/dx is itself a function which sends functions to their derivatives.

  4. 4
    daveS says:

    johnnyb,

    The notation for the second derivative is d^2y/dx^2 . However, there is not a cogent explanation for this notation. I looked through 20 (no kidding!) textbooks to find an explanation for why the notation was the way that it was.

    FWIW, here’s how I understand the notation.

    d/dx is an operator which sends functions to their derivatives. You can compose this operator with itself.

    The second derivative of y wrt x is d/dx( d/dx( y ) ), which is also (d/dx o d/dx)(y), the “o” meaning composition.

    More briefly, this equals (d/dx)^2(y) or finally, d^2y/dx^2.

    It’s just like how we sometimes denote a linear transformation applied twice to a vector: T(T(v)) = T^2(v).
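
    As a minimal sketch of this operator view, the Python below treats d/dx as a function on polynomials and composes it with itself. Polynomials are stored as coefficient lists, and the names `D`, `compose`, and `D2` are illustrative choices, not anything from the paper.

```python
def D(coeffs):
    """Differentiate c0 + c1*x + c2*x^2 + ... given as [c0, c1, c2, ...]."""
    return [k * c for k, c in enumerate(coeffs)][1:] or [0]

def compose(f, g):
    """Return the operator 'f o g', i.e. apply g first and then f."""
    return lambda p: f(g(p))

# (d/dx o d/dx), written (d/dx)^2 in the comment above.
D2 = compose(D, D)

y = [0, 0, 0, 1]   # y = x^3
print(D(y))        # [0, 0, 3]  i.e. 3x^2
print(D2(y))       # [0, 6]     i.e. 6x
```

    Nothing here behaves like a fraction: the exponent in (d/dx)^2 only counts how many times the operator is applied, just as in T^2(v).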

  5. 5
    hazel says:

    Fun discussion, Dave. So how do you explain the isolated dx in an integral, such as F = integral(2x * dx) = x^2?

    When I was first learning to teach calc so kids could understand, one of my students wanted to know “where did the dx go” when you integrated, and I couldn’t explain.

    Later, when I learned to teach understanding the integral as an area, I explained that 2x was the height of an infinitely skinny rectangle, with base dx, so 2x dx stood for the area of that infinitely skinny rectangle, and then integration just added up all the rectangles to get the area.

    Do you buy that? 🙂

  6. 6
    johnnyb says:

    DaveS –

    As mentioned in the article, the problem that the mathematicians have had with differentials actually has very little to do with mathematics, and a lot to do with philosophy. It was claimed that infinitesimals were not a rigorous conception, but that has been shown to be false, especially with non-standard analysis (though this was just a formalization of things that were already known and done by those working with infinitesimals). Non-standard analysis just gave a more rigorous way of speaking of it.

    It is true that mathematicians prefer limits and physicists prefer infinitesimals. I, however, am neither. I’m a theologian with an interest in mathematics.

    I do agree that Arbogast’s notation is preferable to what we have now, and I mention it in the paper. However, as Hazel points out, in order for integration to work, you already have to be able to manipulate dy’s and dx’s. It is better to do so with a notation that actually has robust support for the operation!

  7. 7
    johnnyb says:

    Hazel –

    Your description of the current notation may be how it is intended, but it is in fact incorrect, even as you present it, which is the purpose of the paper (see the last paragraph of section 3). You said,

    Now consider this: dy/dx can be thought of as the rate of change of y with respect to x. The second derivative is the derivative of the derivative, so it can be thought of as d (dy/dx)/dx: that is, the derivative of dy/dx with respect to x, again. The notation suggests that the new numerator is d(d(y)) and the new denominator, as with the unit example above, is (dx)^2.

    Therefore, we write d^2y in the numerator to mean the derivative of the derivative, but just dx^2 (dropping the parentheses) in the denominator.

    While I’ve seen this justification before, it misses a crucial step. The problem with your suggestion is that if you actually perform “d (dy/dx)/dx”, the differentiation step would have to actually perform the differentiation operation. Since dy/dx is a quotient, the proper way to differentiate a quotient is with the quotient rule. That’s why the d^2y/dx^2 doesn’t work. That is not the application of the quotient rule. When you apply the quotient rule, then you get my new notation. So, you (and pretty much everyone else) were on the right track, but, because you failed to recognize that dy/dx was actually a quotient, you got off at the wrong station.

  8. 8
    kairosfocus says:

    JB, read as a fraction, the notation is indeed puzzling; it is an operator, and it acts on y = f(x), so it is reasonable to think of d^2 rather than dy*dy, more or less as your eqn 2 p. 221. The appearance as a fraction is clearly misleading in a certain sense and the odd notation has done some sweeping under the carpet . . . including bridges to physical considerations starting with Newton. I understand the appeal of the alternative D notation, and that reflects the Laplace transform, which renders it as s^2. That of course lurks behind the auxiliary equation that appears in solving differential equations. (Sometimes, I wonder if Laplace transforms should be snuck in early.) To get into real complexities and oddities, extend to the use of partial differentials and their rules. Similarly, in looking at the integral, the limit-summation approach leads to multiplying rectangular strips of width dx, so in effect it is a sum of multiples of infinitesimals, and the splitting dy/dx = f(x) –> dy = f(x)*dx –> S*dy = Sf(x)*dx and of course S*dy = y, with S standing in for elongated s, a form that has now dropped out of typography except in Mathematics. I only mention the superposed loop for integration across a closed domain such as a loop or surface to study things like fluxes. And yes, that’s another thing that is glossed over too often. It seems to me that, as is so common, the “simple” things used as first educational steps lurk next to deep complexities. Which then leads to glossing over. Try another: spinning a Faraday disk at right angles to a B-field generates an EMF, but spinning the magnet and imagining this spins the field relative to the disk and should yield an EMF fails. There are many sharks swimming in these waters. KF

  9. 9
    kairosfocus says:

    PS: We should identify that an operator transforms a pre-image function into an image function, Mathematically. Physically, we have electronic, mechanical, pneumatic, hydraulic and electronic processes that effect such operations physically. Thus, the operational amplifier’s significance. A major use of such devices is in integrator chains that resolve differential equations physically.

  10. 10
    daveS says:

    hazel,

    So how do you explain the isolated dx in an integral, such as F = integral(2x * dx) = x^2?

    One way is to interpret the expression f(x)dx as a differential form (a 1-form, as described here). Another way would be to think of ∫ __ dx as a unit that cannot be analyzed further. You just place the function in between, perhaps place limits on the integral sign, and go from there.

    Certainly your explanation of 2x * dx as representing the product of the height and width of a rectangle appeals to intuition, and people do find that helpful. I tend to think of such explanations in terms of a Riemann sum, where the dx has been replaced by Δx, which is a real number.
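
    To illustrate the Riemann-sum reading, here is a small Python sketch that adds up rectangles of height 2x and width Δx. The left-endpoint rule, the interval [0, 3], and the name `riemann_sum` are assumptions chosen for the example.

```python
def riemann_sum(f, a, b, n):
    """Left-endpoint Riemann sum of f on [a, b] with n rectangles of width delta_x."""
    delta_x = (b - a) / n
    return sum(f(a + i * delta_x) * delta_x for i in range(n))

def height(x):
    """Height of each thin rectangle: the integrand 2x."""
    return 2 * x

for n in (10, 1_000, 100_000):
    # As delta_x shrinks, the total approaches 3**2 = 9, the value of x^2 at x = 3.
    print(n, riemann_sum(height, 0.0, 3.0, n))
```

    Here Δx is an ordinary real number at every step; the integral is the limit of these sums as n grows.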

  11. 11
    daveS says:

    johnnyb,

    As mentioned in the article, the problem that the mathematicians have had with differentials actually has very little to do with mathematics, and a lot to do with philosophy. It was claimed that infinitesimals were not a rigorous conception, but that has been shown to be false, especially with non-standard analysis (though this was just a formalization of things that were already known and done by those working with infinitesimals). Non-standard analysis just gave a more rigorous way of speaking of it.

    It is a philosophical difference I suppose. I just don’t see there’s much to be gained in insisting that dy/dx is actually a fraction of infinitesimals, even though it is possible to do so rigorously. It’s simpler and ultimately more beneficial to learn basic calculus working within the real number system exclusively (IMHO).

    However, as Hazel points out, in order for integration to work, you already have to be able to manipulate dy’s and dx’s. It is better to do so with a notation that actually has robust support for the operation!

    It’s never necessary to think of dx or dy as infinitesimals, or to think of dy/dx as a fraction, however.

    Edit: Do you see how parsing dy/dx as the operator d/dx applied to the function y is advantageous? (Not to mention relatively simple and consistent with the textbooks).

    Second Edit: I see you do discuss d/dx as an operator in the paper you just linked to—I’ll try and read it later.

  12. 12
    johnnyb says:

    DaveS and Hazel –

    You all might be interested in this paper of mine on the teaching method that I use for calculus, and why I find that infinitesimals allow calculus to be much more natural for students to understand. Since infinitesimals are perfectly legitimate (as non-standard analysis shows), and they are easier for students to learn (as several studies and personal experience have shown), and they allow for improved notation (as the present paper shows), it seems silly not to teach using them.

    Simplifying and Refactoring Introductory Calculus

    The title comes from the computer science term “refactoring”. I had originally intended to show how different common refactoring forms from computer science played into this, but eventually just wanted to finish it so that aspect got much shorter shrift than I had originally intended.

  13. 13
    johnnyb says:

    Kairosfocus –

    To get into real complexities and oddities, extend to the use of partial differentials and their rules.

    Speaking of partial differentials. I actually have a paper that covers very similar ground with partial differentials. I didn’t include it in this paper for a few reasons (one of which is just to limit the scope of the paper, another of which is because I still had a few kinks to work out). Now that this one is published I need to finish the other one and send it out. Unfortunately, based on how long it took to publish this one, it will probably be a year or more before anyone sees it 🙁

  14. 14
    daveS says:

    johnnyb,

    Coming back to this:

    The notation for the second derivative is d^2y/dx^2. However, there is not a cogent explanation for this notation. I looked through 20 (no kidding!) textbooks to find an explanation for why the notation was the way that it was.

    After looking at the above linked paper, I find this confusing, since your analysis of d/dx (or d/dx( )) as an operator is completely consistent with my explanation of the meaning of d^2y/dx^2 in #4 (and hazel’s earlier). Are our analyses not cogent?

    You keep insisting that dy/dx is indeed a quotient, but most of us choose not to view it that way. Therefore we need not sign on to your view that the notation for the second derivative should be obtained from dy/dx using the quotient rule.

    I do agree with your thoughts on motivation of the concept of limit before actually jumping in and calculating them. They need to understand clearly what their purpose is.

  15. 15
    johnnyb says:

    You keep insisting that dy/dx is indeed a quotient, but most of us choose not to view it that way. Therefore we need not sign on to your view that the notation for the second derivative should be obtained from dy/dx using the quotient rule.

    Let me put it this way – if you do actually treat it as a quotient, then all of a sudden things that required special formulas before (such as the second derivative chain rule) can now be done by simple algebra. Additionally, you can derive an inverse function theorem for the second derivative if you treat it as a quotient. So, treating it as a quotient gives you benefits. Not treating it as a quotient gives you no additional benefits, plus, since it is written as a quotient, you run into the very real problem that someone who isn’t aware that people write non-quotients in quotient form might use it that way (I’ve actually seen this happen in engineering).

    So, given that there is no advantage to not actually treating it as a quotient, I don’t see why we should write it as a quotient and then not use it that way.

    However, I do think we are both agreed that, if you aren’t using it as a quotient, Arbogast’s D notation is a great improvement (see the short discussion in Section 5, though its main comparison is with Lagrangian).
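
    As one hedged illustration of the inverse-function claim (a sketch in LaTeX, not necessarily the derivation the paper uses), take x as the independent variable so that d^2x = 0, and apply the quotient-style form with the roles of x and y swapped. The result is the standard formula for the second derivative of an inverse function.

```latex
\begin{align*}
\frac{d^2x}{dy^2} - \frac{dx}{dy}\,\frac{d^2y}{dy^2}
  &= 0 - \frac{dx}{dy}\cdot\frac{d^2y}{dx^2}\cdot\frac{dx^2}{dy^2}
  && (d^2x = 0) \\[4pt]
  &= -\,\frac{d^2y/dx^2}{\left(dy/dx\right)^3}
\end{align*}
% Check with y = x^3: dy/dx = 3x^2 and d^2y/dx^2 = 6x, so the right-hand side is
% -6x / (27 x^6) = -2 / (9 x^5), which matches differentiating x = y^{1/3} twice.
```

    Every step is ordinary cancellation of differentials, which is the sense in which the quotient reading pays for itself.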

  16. 16
    daveS says:

    johnnyb,

    So, given that there is no advantage to not actually treating it as a quotient, I don’t see why we should write it as a quotient and then not use it that way.

    I think there definitely are advantages for students wanting to go further in mathematics. Knowing how to write epsilon-delta proofs and more generally, working strictly in the real number system is essential for success in more advanced analysis classes, for example. For other fields, this background is less important.

  17. 17
    johnnyb says:

    I don’t doubt that, for those going higher up in mathematics, epsilon-delta proofs are helpful. However, even though I developed the notation using ideas from non-standard analysis, that doesn’t actually get in the way of doing things “the old fashioned way”. Since the form of the original second derivative was not an actual quotient anyway, I don’t see how writing it the new way would be problematic. It adds potential. It doesn’t remove anything.

    In other words, if it doesn’t matter how you write it (i.e., it is mere notation, and doesn’t reflect any actual quotients), then it will continue to not matter if you change the notation to a new one. If it does matter how you write it (i.e., someone might actually want to *use* it as a quotient), then my approach is clearly better.

    So, in the case you are referring to, the notations are neutral with respect to each other. The difference is that mine gives you the flexibility to go in other directions if you wanted to, while the old fashioned way limits you to never doing that. So, I guess I’m still not seeing any *benefit* from the old notation, though I can see some circumstances where the old notation doesn’t do any active harm.

  18. 18
    PaV says:

    JohnnyB:

    Then, for the second step, this can be divided by dx, yielding: . . .

    I had trouble deriving your formula and then looked at your paper where you include this ‘second step.’ Can you legitimately divide a derivative by another differential and so build a true derivative form? In other words, when I see dy/dx, I think of the derivative of y with respect to x; not the derivative of y “divided” by a differential form. Likewise, d(dy/dx)/dx would be the derivative of the term in the numerator with respect to the differential in the denominator. It seems that if you actually took this derivative—instead of simply dividing, then you would get the same first term but the second term would be very different.

    What am I missing here?

    Also, Faà di Bruno’s formula involves composite functions, but here we’re dealing with a straightforward function, y(x). Maybe you can elucidate.

  19. 19
    daveS says:

    johnnyb,

    It does take time to teach these “non-standard” methods, however, which could crowd out some of the important concepts math majors will need in future classes.

    There are also foundational issues. A bright freshman can understand how the construction of the real number system proceeds, and therefore how calculus is built from the ground up, so to speak. The set of hyperreal numbers is another matter. It’s really a bizarre set, being neither Archimedean nor a metric space.

  20. 20
    daveS says:

    PS to my #19

    To sum up, to those students for whom this approach works, more power to them.

  21. 21
    PaV says:

    JohnnyB:

    I just saw your footnote #5. So, a distinction is being made. When is this distinction necessary and used by mathematicians?

  22. 22
    johnnyb says:

    PaV –

    Great questions. Most people don’t realize that Leibnizian calculus actually didn’t have derivatives. It just had differentials. So, in my own writing, I separate out “differentiating” from “taking the derivative”. For me, the derivative is a ratio of differentials. dy/dx is the ratio of the change in y to the change in x in the smallest units. However, to *get* dy and dx, you differentiate. So, if you had “sin(x) = y^2”, you can differentiate this into “cos(x) dx = 2y dy”. You could then solve for “dy/dx” algebraically. “dy/dx = cos(x) / 2y”. This is a lot more straightforward for students, as it unifies explicit, implicit, and multivariable derivatives.

    So, the general form of d(u/v) is found using the quotient rule: (v du – u dv) / v^2. So, if we do d(dy/dx) we get “(dx d(dy) – dy d(dx))/dx^2”. Since we are differentiating and *then* dividing by “dx” (because we are finding the change in the derivative vs. the change in x), it becomes “(dx d(dy) – dy d(dx))/dx^3”. If you take d(dy) = d^2y and d(dx) = d^2x, and then simplify, this becomes “d^2y/dx^2 – (dy/dx)(d^2x/dx^2)”, which is my formula.

    As for Faà di Bruno’s formula, there is an example in the paper in Section 4 (see specifically equation 4) which shows how this form allows transformations without the formula. For instance, if we started out with “y = x^3” and “x = t^2”, we could combine their derivatives algebraically without using Faà di Bruno’s formula. It’s all doing basic algebra and cancelling.
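
    The “y = x^3”, “x = t^2” case can be checked numerically. The Python below is a rough sketch, not the paper's equation 4: it takes t as the independent variable (so d^2t = 0), computes the differentials of x and y as polynomials in t, and compares the quotient-rule form with a literal fraction reading of d^2y/dx^2. All helper names are hypothetical.

```python
from fractions import Fraction

# Polynomials in t are coefficient lists [c0, c1, c2, ...]; names are illustrative.
X_OF_T = [0, 0, 1]               # x = t^2
Y_OF_T = [0, 0, 0, 0, 0, 0, 1]   # y = x^3 = t^6

def deriv(p):
    """Coefficient list of the derivative of polynomial p."""
    return [k * c for k, c in enumerate(p)][1:] or [0]

def evaluate(p, t):
    """Value of polynomial p at t."""
    return sum(c * t**k for k, c in enumerate(p))

def differentials(t):
    """dx/dt, d^2x/dt^2, dy/dt, d^2y/dt^2 at t (t independent, so d^2t = 0)."""
    t = Fraction(t)
    dx, d2x = evaluate(deriv(X_OF_T), t), evaluate(deriv(deriv(X_OF_T)), t)
    dy, d2y = evaluate(deriv(Y_OF_T), t), evaluate(deriv(deriv(Y_OF_T)), t)
    return dx, d2x, dy, d2y

def corrected_second_derivative(t):
    """d^2y/dx^2 - (dy/dx)(d^2x/dx^2); the powers of dt cancel in each ratio."""
    dx, d2x, dy, d2y = differentials(t)
    return d2y / dx**2 - (dy / dx) * (d2x / dx**2)

def naive_fraction(t):
    """Reading the standard d^2y/dx^2 literally as a ratio of differentials."""
    dx, _, _, d2y = differentials(t)
    return d2y / dx**2

for t in (1, 2, 3):
    x = t**2
    # Columns: t, corrected form, naive fraction, and the true value 6x.
    print(t, corrected_second_derivative(t), naive_fraction(t), 6 * x)
```

    The corrected form agrees with 6x, the second derivative of x^3, at every sampled t, while the literal fraction reading gives 30t^4/(4t^2) and does not.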

  23. 23
    johnnyb says:

    PaV – Regarding footnote 5, I’m not sure of your point or question. That footnote basically reiterates the point of agreement between me and DaveS expressed in the last paragraph of comment #15 above.

  24. 24
    kairosfocus says:

    JB, what level (and what age) is the Calculus usually first taught in the US system? It seems to me the epsilon-delta, limits approach with sequence, series and limits in required background, comes later after a more intuitive view of such concepts? Yes, that feeds forward to Analysis and a lot of other things, but itself does not seem to me to be conceptually, educationally foundational. I for example used to do a vest pocket demo on water pouring into a cylindrical glass at a variable rate in a gaussian pulse then plotting RATE vs ACCUMULATION of flow, and showing the fundamental theorem intuitively as two inverse operations. The graphs — with time as natural independent variable, then allowed getting rate per chord to tangent and area under through accumulation of strips. Of course a Gaussian pulse then gives us a sigmoid and contrasts a maximum and an inflexion, with connexions. Move to a leaky tank and a negative pulse can be considered, put both together with a controller and we are at control systems . . . my first practical assignment was take the lid off your toilet tank (and if you don’t learn a bit about it, you haven’t really done practical control systems — about the commonest mechanical, process control system). It is also a physically real event. The use of limits then comes out. Such a case study helps to set up a ring of key ideas that can then be drawn on. For example time as underlying variable allows pondering trajectories, growth, saturation etc, a very dynamic view relevant in many fields of thought. The sequences, series, limits, epsilon delta approach then comes out. Nonstandard analysis feeds in on infinitesimals having some substance as dt where [dt]^2 –> 0, which can be seen on how say 10^-6 squared is 10^-12 and this squared is 10^-24, etc, and how the span gets much wider as we go on down. Thoughts? KF

  25. 25
    johnnyb says:

    Kairosfocus –

    A few things. First of all, in every single Calculus course I’ve ever seen, they are taught limits first. This includes high-school students taking Calculus. This focuses on epsilon-delta proofs. This is the first thing that Calculus students hit. It’s a travesty. They get this weird mathematical formalism that has almost no point whatsoever, except to prove something they haven’t gotten to yet. I move limits to the very end of my calculus course (I’ll send you a copy of my book, Calculus from the Ground Up, if you are interested).

    I think your physical example of rate vs accumulation is an excellent one. I’m an equations guy, so I usually start with the equation “y = 2”, and then help the students build up an equation for the area under that line. That gives “y = 2x”. I then help students build up an equation for the area under that line, which is “y = x^2”. I then start plotting slopes along “y = x^2”, and show that they get close to “y = 2x”. Then I ask what the slope of “y = 2x” is, which they should know from the equation of a line. Voilà. It turns out they already know some calculus 🙂
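
    The slope-plotting step can be sketched in a few lines of Python. This is only an illustration of the idea; the step sizes and the names `secant_slope` and `square` are assumptions made for the example.

```python
def secant_slope(f, x, h):
    """Slope of the line through (x, f(x)) and (x + h, f(x + h))."""
    return (f(x + h) - f(x)) / h

def square(x):
    """The curve y = x^2."""
    return x * x

for x in (1.0, 2.0, 3.0):
    for h in (1.0, 0.1, 0.001):
        # For y = x^2 the secant slope is 2x + h, so it closes in on 2x as h shrinks.
        print(x, h, secant_slope(square, x, h))
```

    Tabulating the slopes this way lets students see the values settle on 2x before any limit is mentioned.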

    I’m pretty lazy, though, and I do almost all of my teaching on a whiteboard. No field trips for my students 😉

  26. 26
    daveS says:

    They get this weird mathematical formalism that has almost no point whatsoever, except to prove something they haven’t gotten to yet.

    😮

    It actually is quite important for those students who plan to major in mathematics.

  27. 27
    hazel says:

    JB writes, “First of all, in every single Calculus course I’ve ever seen, they are taught limits first. This includes high-school students taking Calculus.”

    I don’t think this is true, and it is certainly not how I taught calculus at the high school level. I agree with kf and you that we should start with intuitive ideas and simple equations and build understanding of the fundamental meanings of calculus (rate of change and accumulated amounts). We should build solid mechanics of the basic operations, and then add more rigorous theory later, at appropriate times. Also, even at the college level the university that I am familiar with has two calculus tracks: a rigorous one for math and physical science majors, and a more informal one for people like business and social studies majors.

    I remember a former high school student once coming to me because she was lost on a certain point in college Calculus I. I was able to put it back into the context of the meanings I had taught her the previous year, and then she was clear and ready to tackle the mechanics because the overall purpose made sense.

    Eventually I completely abandoned the textbook we had and used all my own materials. One aspect of my situation was that many of my students were not actually going to take calculus in college, but the calculus they took with me strengthened their algebra skills and knowledge of functions considerably so that they were very well prepared for college algebra. I also used lots and lots of real-world examples for them to understand how the ideas of calculus were around them all the time.

    Also, I too often introduced limits right at the end of the year, including at least the idea of epsilon/delta proofs, because I knew that might be the one thing they would run into at the start of their college calc course that they hadn’t seen. By that time they had become so comfortable with thinking informally about infinitesimals and limits that they didn’t have a problem with understanding the formalities.

  28. 28
    johnnyb says:

    It actually is quite important for those students who plan to major in mathematics.

    Just to note, we already had this discussion, and I agreed with you it was important for the future (comment 17). The point of the present post (comment 25) was that, for a student who hasn’t actually done any calculus yet, introducing such at this point is pretty meaningless, because they have no context for why it is important. Likewise, teaching calculus using infinitesimals, I don’t actually teach a formal theory of infinitesimals until the last third of the semester. I mention the idea in passing when we are first getting started, with the promise that, if they want to do this more rigorously, we will return to the topic at the end of the course.

  29. 29
    kairosfocus says:

    H & JB, I’d start with something more basic, a tap and a cylindrical glass, then run water into it. The idea of a flow, a rate, of change and its accumulation as something concrete. With time as independent variable (onward, x(t) and y(t) thus trajectories). Then, a diagram, a leaky tank with inflow and outflow. Then, a gaussian pulse of water in, outflow locked off. Graphs — yes graphs — on parallel time axes: change and accumulation with causal correlations . . . and yes those will speak onwards in economics, marketing, sociotechnical systems, physics, biology, generalised growth, even statistics and more. Yes, ye olde water closet is relevant, and the effect of an outflow pulse counts too. Besides a bell curve or the like is often what a real world acceleration looks like — corner and jump discontinuity headaches. It’s not just Mathematics that gets a say here. Connexions count. From this what the slope of a curve is and what the area under it is can be analysed and will make sense. I have a sneaking feeling this is part of why I see such structures and quantities as part of the fabric of reality, I have been seeing connexions and cases. We can then go to the more conventional stuff. KF

  30. 30
    kairosfocus says:

    JB, thanks, would appreciate. KF

  31. 31
    daveS says:

    johnnyb,

    Referring to equation (6) on page 223 of the paper linked in the OP, you derive a formula for the third derivative. Did you explore even higher derivatives, and perhaps search for a compact way to express these derivatives? For example, I see 1s and 3s, which are the binomial coefficients 3 choose k for k = 0 up to 3. Perhaps there are some interesting connections between your formulas and the binomial theorem or other well-known identities.

    Maybe you could view all this as a system somewhat like the Weyl algebra, which is a ring of differential operators that arises in QM.

  32. 32
    steve_h says:

    It’s been at least 35 years since I did any calculus and I was never any good at it, so this may be a very stupid question.

    If we treat dy/dx as a ratio of two functions of x and apply the quotient rule, doesn’t one of your terms disappear, since dx is effectively an infinitesimal constant and so its derivative is zero? (Or, to put it another way, the derivative of x with respect to x is 1, so the second derivative is zero.)

  33. 33
    hazel says:

    I think “infinitesimal constant” is an oxymoron. An infinitesimal means that it can get arbitrarily close to zero: dx = limit delta x as delta x -> 0. If the infinitesimal were a constant, that would mean there was a smallest number “next to 0”, and none smaller, which is exactly wrong.

    That’s why you can’t evaluate dy/dx by just plugging in numbers: even though both are infinitesimal, they are approaching 0 at different rates so the ratio approaches a certain number (or function, the derivative, which depends on the original function in question.). This is all covered in the basic definition of derivative, where the difference function delta y/delta x approaches a limit as delta x goes to 0.

    This might not be the most precise explanation possible, but it is what occurs to me off the top of my head.

  34. 34
    hazel says:

    To Steve_h: JB addresses your question, or one like it, at the bottom of page 222. He says, “However, in (5), the term d^2x/dx^2 is not itself necessarily zero, since it is not the second derivative of x with respect to x.” However, he doesn’t say what it is, and I don’t understand his point.

  35. 35
    daveS says:

    hazel,

    “I think ‘infinitesimal constant’ is an oxymoron. An infinitesimal means that it can get arbitrarily close to zero: dx = limit delta x as delta x -> 0. If the infinitesimal were a constant, that would mean there was a smallest number ‘next to 0’, and none smaller, which is exactly wrong.”

    I know you’re responding to steve_h here, but I do think johnnyb assumes the existence of “infinitesimal constants”, that is, numbers ε such that ε > 0 and yet ε < 1/n for all positive integers n. I don't know if you have commented here on such things. This strangeness is one more reason I'm skeptical about using these non-standard approaches in beginning calculus.

  36. 36
    kairosfocus says:

    DS, he is viewing infinitesimals as hyperreals, where in effect we can have some K in *N which is greater than 0, 1, 2, . . . in “ordinary” N and gives some 1/K = m, not quite zero by way of catapult. This is then extended to the similar *R. The values of delta-x as they run in towards 0 become infinitesimal in the extremely near neighbourhood. The ratio, dy/dx, of course is a ratio and may approach a limit, where y is a function of x on the given range. The limit of [f(x + h) – f(x)]/[(x +h) – x] or even simply: [f(x + h) – f(x)]/h as h -> 0 first principles approach shows what is happening as the chord tends to the tangent and the slope therefore approaches a limit, providing it exists. The value h is of course trending infinitesimal. Where, clearly, any particular x or f(x) value is surrounded by a near-neighbourhood cloud of hyperreals, using the additivity, imported by way of extending Peano. I have seen a suggestion that m such that m^2 ~ 0, is a good yardstick for what an infinitesimal is. I don’t know, but when I learned Calculus back in 4th form, that was called using first principles, likely tracing to Newton and fluxions. KF

    PS: let f(x) = x^2. Then f(x + h) = x^2 + 2xh + h^2, and f(x) = x^2 so the difference is 2xh + h^2 ~ 2xh. And of course 2xh/h = 2x, the “correct” value; 2xh would be an infinitesimal and we used h^2 ~ 0. Newton et al, of course, loved rendering functions into power series so extensions of polynomials allowed all sorts of results to be found. Robinson’s approach and other roads to the hyperreals, etc suggest themselves.

    PPS: See https://revisionmaths.com/advanced-level-maths-revision/pure-maths/calculus/differentiation-first-principles

  37. 37
    daveS says:

    Correction: I should rephrase a bit. A construction of the hyperreal numbers exists, so it’s not just johnnyb assuming infinitesimals exist.

  38. 38
    hazel says:

    Thanks, Dave. My approach is very standard, so I can’t speak with any knowledge as to whether that is what johnnyb means. Any insight into my comment at 34?

  39. 39
    daveS says:

    KF,

    “Trending infinitesimal”? Using this nonstandard approach, you don’t evaluate a limit, you just choose an infinitesimal ε and evaluate (f(x + ε) – f(x))/ε and finally “round” to the nearest real number.

    Edit:

    “I have seen a suggestion that m such that m^2 ~ 0, is a good yardstick for what an infinitesimal is.”

    Please let me know when you find one. 😛

  40. 40
    daveS says:

    Hazel,

    Regarding #34, I haven’t figured that out yet. 🙂

  41. 41
  42. 42
    hazel says:

    Re 36

    I don’t know why kf left out the denominator in the definition of derivative in his P.S. in 36: if f(x) = x^2, then f’(x) = limit (as h -> 0) of [(x + h)^2 – x^2]/h. The quotient is [x^2 + 2xh + h^2 – x^2]/h = 2x + h, so the limit is 2x as h -> 0. There isn’t any need to consider h^2. This is the very first example I used, with a picture involving the secant as kf described, when teaching students how we find derivatives and what the method means.

    Also, how does h “trending infinitesimal” differ from h -> 0?
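    The algebra in this comment can be checked with exact rational arithmetic. The following is only a minimal sketch: the function names and the sample point x = 3 are illustrative assumptions, not anything from the paper. It shows that for f(x) = x^2 the difference quotient is exactly 2x + h for every real h tried, so the gap from 2x is h itself and shrinks with h without ever reaching zero.

```python
from fractions import Fraction


def difference_quotient(f, x, h):
    """Slope of the secant through (x, f(x)) and (x + h, f(x + h))."""
    return (f(x + h) - f(x)) / h


def square(x):
    return x * x


x = Fraction(3)  # illustrative sample point
results = []
for k in (1, 5, 20, 200):
    h = Fraction(1, 10**k)  # an ordinary (real, rational) number, not an infinitesimal
    q = difference_quotient(square, x, h)
    results.append((h, q))
    # Exact arithmetic: the quotient differs from 2x by exactly h.
    print(f"h = 10^-{k}: quotient - 2x == h is {q - 2 * x == h}")
```

    Because every h here is an ordinary real number, this is the usual limit picture; nothing in it requires the hyperreals.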

  43. 43
    kairosfocus says:

    DS, no. We use the property of infinitesimals that higher order terms are effectively nil (which is close to a definition!). That’s a property not a rounding. There is a difference. And BTW, this is close to the root of exchanges we had three years ago. KF

    PS: The vanishing of simple and mixed higher order terms is a common device in physics and economics, indeed, we can see the marginality revolution of the latter emerging. The scale of a relevant market is such that an increment of one relevant unit is effectively infinitesimal. Then also in Almagest, there is a reference I am told by which the distance to the fixed stars (the surface of the celestial sphere) is such that by comparison that of earth to sun or size of earth is effectively that of a point. In short, infinitesimals including relative ones, have been lurking in the shadows for a long time.

  44. 44
    daveS says:

    KF,

    “DS, no. We use the property of infinitesimals that higher order terms are effectively nil (which is close to a definition!). That’s a property not a rounding. There is a difference. And BTW, this is close to the root of exchanges we had three years ago. KF”

    By “rounding to the nearest real number”, I meant taking the standard part of the hyperreal that you obtain from the difference quotient. This has nothing to do with “higher order” terms.

    Here’s how it goes with f(x) = x^2:

    (f(x + ε) – f(x))/ε = (x^2 + 2x ε + ε^2 – x^2)/ε = 2x + ε

    Now we take the standard part of 2x + ε which is 2x.

    ****

    Edit: From wikipedia:

    In non-standard analysis, the standard part function is a function from the limited (finite) hyperreal numbers to the real numbers. Briefly, the standard part function “rounds off” a finite hyperreal to the nearest real.
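    The calculation above can be mimicked with a toy model. This is a sketch under a loud assumption: the class `Eps` below is a hypothetical stand-in that stores finite sums of powers of a formal symbol eps, and it is not a construction of the hyperreal numbers. It is enough to reproduce (f(x + ε) – f(x))/ε = 2x + ε for f(x) = x^2 and then take the standard part.

```python
from fractions import Fraction


class Eps:
    """Finite sum of c_k * eps**k with integer k (toy model, not the hyperreals)."""

    def __init__(self, terms):
        # Keep only nonzero coefficients, stored exactly.
        self.terms = {k: Fraction(c) for k, c in terms.items() if c != 0}

    def __add__(self, other):
        out = dict(self.terms)
        for k, c in other.terms.items():
            out[k] = out.get(k, 0) + c
        return Eps(out)

    def __sub__(self, other):
        return self + Eps({k: -c for k, c in other.terms.items()})

    def __mul__(self, other):
        out = {}
        for k1, c1 in self.terms.items():
            for k2, c2 in other.terms.items():
                out[k1 + k2] = out.get(k1 + k2, 0) + c1 * c2
        return Eps(out)

    def div_eps(self):
        """Divide by eps: shift every exponent down by one."""
        return Eps({k - 1: c for k, c in self.terms.items()})

    def standard_part(self):
        """Real part of a finite value: discard the positive powers of eps."""
        if any(k < 0 for k in self.terms):
            raise ValueError("not finite: contains a negative power of eps")
        return self.terms.get(0, Fraction(0))


def real(c):
    return Eps({0: c})


EPS = Eps({1: 1})


def derivative_of_square(x):
    """Return the difference quotient of x^2 at x and its standard part."""
    shifted = real(x) + EPS
    quotient = (shifted * shifted - real(x) * real(x)).div_eps()  # 2x + eps
    return quotient, quotient.standard_part()


quotient, slope = derivative_of_square(3)
print(quotient.terms)
print(slope)
```

    Note that the eps term is never set to zero; it is still present in `quotient` and is only dropped by `standard_part`, which is the distinction being drawn in this comment.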

  45. 45
    kairosfocus says:

    H, I was working within the numerator, showing expansion of the binomial (x + h)^2 = x^2 + 2xh + h^2. I then reduced the higher order infinitesimal term to effective nullity per its quasi-defining property: “f(x + h) = x^2 + 2xh + h^2, and f(x) = x^2 so the difference” (i.e. f(x + h) – f(x)) “is 2xh + h^2 ~ 2xh”, yielding the final value of the numerator, an infinitesimal. I then brought in the h from the denominator in the comment “And of course 2xh/h = 2x”, i.e. I cancelled the h’s, popping back up to the conventional number. IIRC, this was the standard first example used in Calculus texts way back. KF

  46. 46
    kairosfocus says:

    DS, I used in effect the language from way back. Looking back, there was a lot that was being used intuitively and by suggestion. I am particularly noting the quasi-defining property that h^2 ~ 0. Similarly for some k of like scale to h, h*k ~ 0. And so forth. KF

    PS: Think of h = 10^-20. h^2 = 10^-40, which vanishes effectively in an addition of terms comparable to 10^-20, being twenty orders of magnitude down. Of course when the subtractions happen to leave a 10^-40 magnitude term as last man standing, you are in hot water.
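    The PS can be illustrated on a computer, with the caveat that this shows finite-precision real arithmetic rather than infinitesimals. A minimal sketch, assuming ordinary IEEE double-precision floats (what Python's `float` uses on typical platforms): the h^2 term is absorbed when added to h, and the “hot water” case appears when the h-sized terms cancel.

```python
from fractions import Fraction

h = 1e-20
# In double precision, adding h**2 (about 1e-40) to h does not change h.
absorbed = (h + h**2 == h)

# Exact rationals keep the term: it is tiny, but it is not zero.
hq = Fraction(1, 10**20)
kept = (hq + hq**2 != hq)

# The "hot water" case: subtract the h-sized terms so only h**2 should survive.
float_leftover = (h + h**2) - h      # the h**2 term was already lost
exact_leftover = (hq + hq**2) - hq   # exactly h**2

print(absorbed, kept, float_leftover, exact_leftover)
```

    In exact arithmetic the second-order term is always there, so “h^2 ~ 0” describes a rule for which terms to neglect, not a fact about any particular real number.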

  47. 47
    daveS says:

    To add a sliver of substance, you’re still working in the real number system here, alluding to 10^-20 for example. The hyperreal numbers are very different.

  48. 48
    kairosfocus says:

    DS, I simply used a description and reported a practice of removing higher order infinitesimal terms, including when that was relative. The same used to happen in error analysis and was justified on pretty much the grounds I just gave. KF

    PS: I add, I am showing a trend. Let h now be 10^-200, making h^2 = 10^-40,000. The second order term is in a runaway race to the floor. We can then proceed to taking h as sufficiently infinitesimal when h^2 ~ 0. We have a quasi-definition, which also implies that H = 1/h will be quasi-infinite. It then is a reasonable next step to formalise into the hyperreals as are now more formally discussed.

  49. 49
    kairosfocus says:

    F/N: I observe, further on in the Wikipedia article:

    The use of the standard part in the definition of the derivative is a rigorous alternative to the traditional practice of neglecting the square of an infinitesimal quantity. Dual numbers are a number system based on this idea. After the third line of the differentiation above, the typical method from Newton through the 19th century would have been simply to discard the dx^2 term. In the hyperreal system, dx^2 ≠ 0, since dx is nonzero, and the transfer principle can be applied to the statement that the square of any nonzero number is nonzero. However, the quantity dx^2 is infinitesimally small compared to dx; that is, the hyperreal system contains a hierarchy of infinitesimal quantities.

    This should help us to see the line of descent in the history of ideas, and why the older practice was workable.

    Standard part reduction in effect works around the problem of how one gets the higher order terms to vanish.

    KF

  50. 50
    daveS says:

    One advantage of the hyperreals is that we don’t have to use the prefix “quasi” so much.

  51. 51
    kairosfocus says:

    DS, I spoke in those descriptive terms to point to trends and reflecting how the whole process was highly informal and in key parts inductive on key case studies. IIRC, Leibniz used an instructive example to develop integration, which in its classical forms, again was largely intuitive. Formalisation came along 50 years ago, and may yet re-open some of the power of the approach of acknowledged giants. KF

  52. 52
    kairosfocus says:

    F/N: I should also note that as the chord tends to the tangent, the deviation from a straight line becomes smaller and smaller, suggesting again that one may set aside higher order terms in the equivalent power series. Where, in the thinking of the Calculus foundation era, such series representation was never far from the surface, as curve fitting per difference terms will show and as the Newton-Raphson method shows. KF

  53. 53
    kairosfocus says:

    PS: Oopsie, that should be 10^-400 in 48, i.e. 200 orders of magnitude smaller than h.

  54. 54
    daveS says:

    KF,

    “DS, I spoke in those descriptive terms to point to trends and reflecting how the whole process was highly informal and in key parts inductive on key case studies.”

    At some point we need to be precise, however.

    When you speak of examining the value of (f(x + h) – f(x))/h for h “trending infinitesimal”, it appears you are considering real numbers h which are really, really small, but not infinitesimal. That is, informally calculating a limit in the usual way. You do not need the hyperreal numbers to accomplish this, so is there a point in bringing them up? Is there any reason to use the word “infinitesimal” when they actually do not exist in the real numbers? My suggestion: you can speak of the limit of the difference quotient as h tends to 0 in the real numbers.

  55. 55
    hazel says:

    re 54: Yes. The hyperreals may have some purpose someplace, but at the level of calculus we have been discussing here (high school and standard college classes), the idea of limit as h -> 0 is standard. While teaching I would use the informal phrase “becomes infinitely small” to refer to the idea of some process approaching a limit, but I think my students understood well what we meant.

  56. 56
    johnnyb says:

    Dang, y’all. I step out for a day, thinking that the conversation is winding down, and it doubles. I’ll try to look through this this weekend.

  57. 57
    kairosfocus says:

    H, I am thinking more, high school. I recently saw a UK 4-5th form GCSE Math text and they are now bringing in Calculus beginnings. That’s 15 – 16+ year olds. For College, I think more formal stuff is on the cards, though I strongly tend to use instructive initial case studies; a structured appreciation of quantity domains seems a part, and hyperreals have a place. As a moderate, Richard Skemp constructivist, I believe in concrete then pictorial then abstract. I also favour a learning spiral approach by which activities “loop through” key themes in building learning, interconnecting and augmenting as topics are built up cumulatively to fulfill adequate understanding and function. KF

    PS: Keisler’s text may be a useful reference: http://www.math.wisc.edu/~keis.....-23-18.pdf

  58. 58
    hazel says:

    kf writes, “I believe in concrete then pictorial then abstract. I also favour a learning spiral approach by which activities “loop through” key themes in building learning, interconnecting and augmenting as topics are built up cumulatively to fulfill adequate understanding and function.”

    Very good – I agree with both the points kf mentions.

  59. 59
    kairosfocus says:

    PPS: I clip from his introductory case:

    The derivative and integral can be described in everyday language in terms of an automobile trip. An automobile instrument panel has a speedometer marked off in miles per hour with a needle indicating the speed. The instrument panel also has an odometer which tallies up the distance travelled in miles (the mileage).

    Both the speedometer reading and the odometer reading change with time; that is, they are both “functions of time.” The speed shown on the speedometer is the rate of change, or derivative, of the distance. Speed is found by taking a very small interval of time and forming the ratio of the change in distance to the change in time. The distance shown on the odometer is the integral of the speed from time zero to the present. Distance is found by adding up the distance travelled from the first use of the car to the present.

    Here, we see a familiar case and the concept that the Calculus is the quantitative study of rates and accumulations of change. It also opens the way to seeing how the two key operations are coupled, mutual inverses, hence for example the approach that the Integral is in practical terms often studied as the anti-derivative.

  60. 60
    steve_h says:

    Hazel@33: I agree that “infinitesimal constant” is probably an oxymoron. I don’t know exactly what dx is (I always thought it was a limit), but I am fairly sure that it is not a function of x. As I understand it, the quotient rule is used to find the derivative with respect to x of the ratio of two functions which are both functions of x. Here, the value of y depends on the value of x; dy depends on the value of x and one small non-zero quantity dx; but as I understand it, dx does not depend on x.

    Hazel@34: The comment you refer to on page 222 seems a bit strange to me. We are looking for a second derivative with respect to x, and as I understand it, we start by finding a first derivative (dy/dx) with respect to x and then differentiate that again, also with respect to x. If the second step involves differentiating a ratio of two functions of x, then we work that out using those functions and their derivatives with respect to x.

    Anyway this is all a bit beyond my pay grade, so I’ll refrain from further participation (helped by the fact that I will now return home to a broken PC)- but thanks for your comments.

  61. 61
    hazel says:

    re 59: that is exactly how I introduced calculus from day 1. Your car has a clock, a speedometer, and an odometer. The clock never breaks. If the speedometer breaks, then finding the speed based on the clock and odometer leads to differentiation. If the odometer breaks, finding the distance based on the clock and the speedometer leads to integration.
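    The broken-instrument picture can be sketched numerically. Everything specific here is a made-up illustration: the trip (an odometer reading of 30 t^2 miles after t hours), the quarter-hour clock ticks, and the function names are assumptions. Average speeds come from difference quotients of the odometer, and adding up speed times elapsed time recovers the distance.

```python
from fractions import Fraction


def speeds_from_odometer(times, miles):
    """Broken speedometer: average speed on each interval (a difference quotient)."""
    return [(miles[i + 1] - miles[i]) / (times[i + 1] - times[i])
            for i in range(len(times) - 1)]


def distance_from_speeds(times, speeds):
    """Broken odometer: add up speed * elapsed time over each interval."""
    return sum(v * (times[i + 1] - times[i]) for i, v in enumerate(speeds))


# Hypothetical trip: clock readings every quarter hour for two hours.
times = [Fraction(i, 4) for i in range(9)]
miles = [30 * t * t for t in times]

speeds = speeds_from_odometer(times, miles)
recovered = distance_from_speeds(times, speeds)
print(recovered == miles[-1] - miles[0])
```

    With interval averages the sum telescopes, so the recovered distance matches the odometer exactly; with instantaneous speedometer readings it would only match in the limit of ever shorter intervals, which is where the integral comes in.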

  62. 62
  63. 63
    hazel says:

    The article at Mind Matters doesn’t allow comments, so it’s hard to say whether people’s reactions have been supportive, questioning (as in this thread), or what, nor whether people have mostly reacted to the article text, or whether they have looked at and evaluated the paper itself.

    I think that perhaps the article is a bit misleading when it writes,

    “that elementary calculus contains a longstanding flaw that has been present for over a century. … The flaw they discovered is one of notation. Now, you may be thinking, how can notation be wrong? Well, notation can be wrong when it implies untrue things, especially when notation exists that implies the correct things.”

    I am not clear what the “untrue things” are. Probably it means that what is wrong is writing the second derivative as a fraction when it can’t be treated algebraically as a fraction. However, as Dave has pointed out, maybe the mistake is taking dy/dx as a fraction that can be manipulated algebraically. Also, as Steve_h pointed out, the quotient rule is, and I quote Wikipedia, “a method of finding the derivative of a function that is the ratio of two differentiable functions.” I’m not sure it is correct to think of dy and dx as differentiable functions, so I’m not sure using the quotient rule is appropriate.

    Also, Dave and I have wondered about the explanation at the bottom of page 222: what exactly does d^2(x)/dx^2 mean if it doesn’t mean the second derivative of x, which would be zero?

    The other possible “untrue thing” that the article might be referring to is related to this: “because no one wanted to give differentials that same ontological status as other numbers …”

    I think maybe this points to the difference between seeing dx as a limit of delta x as delta x goes to zero, as in the standard formulation of calculus, and seeing dx as an “infinitesimal” (is that the same as “differential” in the above quote?) as is done in a non-standard approach to calculus. However, if thinking of dy and dx as infinitesimals includes incorporating the hyperreal system, as the textbook kf linked to does, I think it’s a judgment call as to which is best, but not an issue of which is true or not, nor of any relative difference in “ontological status”, whatever that means.

    So, to help clarify, here are a few questions for Johnny:

    1. Is your approach thinking of dy and dx as infinitesimals in the hyperreal sense?

    2. How do you understand the meaning of d^2(x)/dx^2?

    3. For that matter, how do you understand the meaning of d^2(y)/dx^2, if it doesn’t mean the second derivative?

    Very interesting discussion, by the way.

  64. 64
    daveS says:

    hazel,

    Regarding your second question, I guess the “numerator” is d^2(x) or d(dx).

    I wonder if turnabout is fair play here—should we insist that dx is a product, therefore we need to use the product rule to evaluate d(dx)? In that case, I guess d^2x = d(dx) = d(d)*x + d*d(x) = 2d^2x. Well, maybe not. 🙂

  65. 65
    PaV says:

    Johnnyb @ 23:

    In the paper linking the footnote, you write:

    Therefore, when a compact representation of higher order derivatives is
    needed, this paper will use Arbogast’s notation for its clarity and succinctness.

    This is the distinction I was referring to. Does this distinction, that is, the ‘need’ for a “compact representation of higher order derivatives” germane to ‘analysis’? Is this when Arbogast’s D notation becomes important?

  66. 66
    hazel says:

    re 64: Hi Dave. Yes, johnny himself writes d(dx) as d^2(x) in his derivation using the quotient rule. But, as steve_h points out, we take the derivative of a function, and dx is not a function, so I don’t know what the derivative of dx, taken by itself, could mean. When we write d(dx), the d’s aren’t standing for the same thing, it seems to me, so this is not making sense to me.

    Maybe johnny will have time this weekend to explain and/or discuss.

  67. 67
    PaV says:

    Correction: When I ask “Does this distinction . . . “, I should have started with, “Is this distinction . . . “

  68. 68
    daveS says:

    Hazel,

    Yes, that’s a good point. I understood dx to stand for an infinitesimal in dy/dx, but it also seems to mean an operator applied to x, as in d(x); presumably these two usages end up being consistent (or I missed something). I can’t read the paper now to check the details unfortunately.

  69. 69
    hazel says:

    Dave, you write, “presumably these two usages end up being consistent (or I missed something)”

    Or perhaps, in fact, Johnny’s ideas aren’t really justified because they aren’t consistent, even if his algebraic manipulations work out.

    At Mind Matters, the article says,

    “Correcting the notation will also likely open doors in fundamental calculus research. Better notation will improve the ability of mathematicians to do advanced work within calculus. Some of those fruits are already apparent, as the authors have already been using the new notation in published work with fruitful results.”

    It would be interesting to see these fruitful results, even though there is a reasonable possibility they would be beyond me without more study than I was interested in doing.

    When I taught, I always tried to explain the meaning of formulas and procedures. I think good notation should help support good understanding of the underlying concepts, if possible, as well as efficient manipulation for algebraic purposes. I think that is why I’m puzzled, and perhaps skeptical, of the value of Johnny’s formulation.

    But, again, if it does lead to something new, or a better way of understanding and working with something old, then it has some value.

  70. 70
    johnnyb says:

    Sorry for my delay getting back.

    To start off with, Dave @ 31, I think the formula would be tied pretty closely to Faa di Bruno’s formula, probably using similar components. I have not really dug into this, though. Unfortunately, my time is limited, and I tend to have to pay people to help me 🙁 I found my coauthor through Upwork, and it cost quite a bit of money (for me, anyway) to get him to help me flesh out the idea fully. I continue to go to him when I need mathematical work done (his work is excellent and original), but I’m almost entirely self-funded, so I can’t always do this.

    As for Weyl algebras, I have not gotten to those, but it sounds interesting. I googled it, and didn’t understand it, but I also didn’t spend a lot of time on it.

  71. 71
    daveS says:

    johnnyb,

    Thanks for the reply—I’ll take a look at your derivation again and see if I can understand how the third derivative version would go.

  72. 72
    johnnyb says:

    32-36, mostly for steve_h:

    Most people mistakenly think (as I once did) that (d^2x/dx^2) = 0. This is based on the idea that it would be the derivative of dx/dx, which is 1. However, the actual derivative of dx/dx is “(d^2x/dx^2) – (dx/dx)(d^2x/dx^2)”, which is obviously 0 by inspection.

    Now what d^2x/dx^2 actually *is* is a different story. It actually depends on what x is itself dependent on. There’s actually a really interesting idea from George Montanez on what this can be used for, but I haven’t had the time to look into it or the money to pay someone to.

    The reason why d^2x/dx^2 is thought to be zero is that calculus’ main application point is physics, and, in 19th century physics, the primary independent variable was time, and the conception of time was that it had a constant flow. Therefore, dt was considered constant, and, since it was a constant, its differential was 0, so d^2t/dt^2 *would* be zero. Then the tradition became that the bottom differential was always considered to be a “constant” differential, and therefore its own differential was taken to be zero. I can see the possibility that a truly independent variable’s second differential should always be zero, but I don’t know if I’m fully convinced of that yet. Still trying to decide.

    As for “infinitesimal constants”, it’s not a contradiction in terms. There are infinitely many infinitesimal constants.

    36 – Kairosfocus: “I have seen a suggestion that m such that m^2 ~ 0, is a good yardstick for what an infinitesimal is”. This is probably “smooth infinitesimal analysis”. I don’t really like this version, as they explicitly deny the law of the excluded middle. Non-standard analysis doesn’t require this. Instead, you have a “standard part” function, which yields the closest real number. If “e” is your base infinitesimal unit, you can have 3e/6e, and the “e”s cancel, yielding 1/2. However, if you have “(3e + 4e^2)/(6e + 2e^2)”, the standard part of that is also 1/2, because e^2 is infinitely smaller than e. However, if you just had 5e^2/4e^2 that would yield 5/4 as the e^2’s would cancel (which they wouldn’t if they were equal to zero). I haven’t seen how SIA treats second derivatives, but I am indeed curious.
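    The three ratios in the last paragraph can be checked mechanically. This is a rough sketch that assumes each quantity is a finite sum of powers of the base infinitesimal e, stored as a dictionary from power to coefficient. The helper name `standard_part_of_ratio` is hypothetical and does not come from any library or from the paper.

```python
from fractions import Fraction


def standard_part_of_ratio(num, den):
    """Standard part of num/den, each given as {power of e: coefficient}.

    Only the lowest power of e on each side matters, because every
    higher power is infinitely smaller than it.
    """
    num = {k: Fraction(c) for k, c in num.items() if c != 0}
    den = {k: Fraction(c) for k, c in den.items() if c != 0}
    if not den:
        raise ZeroDivisionError("denominator is zero")
    if not num:
        return Fraction(0)
    kn, kd = min(num), min(den)
    if kn > kd:
        return Fraction(0)  # numerator is infinitely smaller than denominator
    if kn < kd:
        raise ValueError("ratio is infinite, so it has no standard part")
    return num[kn] / den[kd]


print(standard_part_of_ratio({1: 3}, {1: 6}))              # 3e / 6e
print(standard_part_of_ratio({1: 3, 2: 4}, {1: 6, 2: 2}))  # (3e + 4e^2)/(6e + 2e^2)
print(standard_part_of_ratio({2: 5}, {2: 4}))              # 5e^2 / 4e^2
```

    The third case is the one that separates this from the “e^2 is zero” convention: the e^2 terms are nonzero, so they can be divided by one another.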

  73. 73
    johnnyb says:

    Hazel @ 55 – I teach infinitesimals to my high school students, but at the end of the course on calculus. It might be fun, though, to try one year to do it first. I’m not sure which one would be more straightforward. My present approach is to use real-ish numbers at the beginning (I DON’T teach limits at the beginning), and then make it rigorous at the end with infinitesimals.

    It might be interesting to teach infinitesimals first, because the rules are actually really easy and straightforward. I’ll have to think on that approach. Anyway, for anyone wanting an overview of my approach and how it differs from normal, take a look here.

  74. 74
    johnnyb says:

    Hazel @ 63 –

    I think I answered the d^2x just above @ 72, but for the question of whether dy and dx are infinitesimals, (a) yes I treat them that way, but (b) I’m not sure if that is required for my notation to be correct.

    Dave @ 64 –

    Good question. If you look at the paper carefully, notice that all of the “d”s are in roman type, and all of the variables are in italic. This is to visually prevent confusion (though I’m not sure how much it helps). Usually, standard functions are written in roman type and variables in italic. Just like many people don’t put parentheses around sin(x), but instead typeset “sin” in roman font and “x” in italic. I feel that, with dx, it is acting sufficiently like its own variable to warrant being stuck together as a unit, but I do like typesetting the “d” and the “x” differently so that it is clear that it is really “d(x)”.

    It *is* true that, sometimes, in mathematics, you have to double-up symbology. However, in the case of dy/dx, there is literally no reason for the symbology of division, except to make people think you are dividing. That is, we have Arbogast’s D() notation that we could use, but we don’t. Why not? Because we want people to look at this like a fraction. If we didn’t, there are a ton of other ways to write the derivative. That we do it as a fraction is hugely suggestive, especially since, as I mentioned, there exists a correct way to write it as a fraction.

  75. 75
    daveS says:

    johnnyb, et al,

    “Now what d^2x/dx^2 actually *is* is a different story. It actually depends on what x is itself dependent on.”

    So I guess the “value” of d^2x/dx^2 is not clear at this point? I can see that, as d^2x/dx^2 is not the second derivative of x wrt x in your notation.

  76. 76
    johnnyb says:

    Dave @ 68 –

    Yes, d(x) is *both* an operator applied to x (d acts more like an operator than a function) and, in the normal circumstances of calculus (smooth/continuous/etc), an infinitesimal value. The way to think of it is this. Imagine a variable “q” which really is the independent variable. The d() evaluates its interior both at “q” and “q + epsilon” and subtracts them. d(x) then refers to whatever change happens in x between “q” and “q + epsilon”. d(x^2) = 2x dx, which means that the difference of x^2 depends on both where “x” is at the moment, and the output of d(x).
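    One way to see what the extra term does is to pick a concrete “q”. The sketch below rests on several assumptions that are not from the paper: the example x = q^2, y = x^3 is hypothetical, q is taken as the truly independent variable (so d(dq) = 0), and the general formula is read off from the pattern quoted in comment 72 with the top x replaced by y, namely d(dy/dx)/dx = d^2y/dx^2 – (dy/dx)(d^2x/dx^2). Under those assumptions dx = x'(q) dq, d^2x = x''(q) dq^2, and likewise for y, so the powers of dq cancel in every ratio.

```python
from fractions import Fraction


def deriv(coeffs):
    """Derivative of a polynomial given as [c0, c1, c2, ...]."""
    return [k * c for k, c in enumerate(coeffs)][1:]


def value(coeffs, q):
    return sum(c * q**k for k, c in enumerate(coeffs))


# Hypothetical example: x = q^2 and y = x^3 = q^6, with q independent.
x_of_q = [0, 0, 1]
y_of_q = [0, 0, 0, 0, 0, 0, 1]


def second_derivative_terms(q):
    q = Fraction(q)
    x1, x2 = value(deriv(x_of_q), q), value(deriv(deriv(x_of_q)), q)
    y1, y2 = value(deriv(y_of_q), q), value(deriv(deriv(y_of_q)), q)
    naive = y2 / x1**2                     # d^2y/dx^2 read as a plain fraction
    correction = (y1 / x1) * (x2 / x1**2)  # (dy/dx) * (d^2x/dx^2)
    return naive, naive - correction


q = 2
x = value(x_of_q, Fraction(q))
naive, corrected = second_derivative_terms(q)
print(naive, corrected, 6 * x)  # the ordinary second derivative of x^3 is 6x
```

    In this example the plain fraction d^2y/dx^2 does not give 6x when x depends on q, while subtracting the extra term does. If x itself is the independent variable, d^2x is zero and the two readings coincide.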

  77. 77
    johnnyb says:

    Dave @ 75 –

    If you mean that you need to know about other possible involved variables to understand d^2x/dx^2, then, yes, that is what I mean.

  78. 78
    daveS says:

    johnnyb,

    If there are no other variables, so x is _the_ independent variable, I don’t know how to evaluate d(dx) I guess. That is, I don’t know how to plug x + ε and x into dx. On the other hand, if dx were a function, it should be constant, I would assume? If so, this would imply d(dx) = 0. Is this correct?

  79. 79
    hazel says:

    Glad to see the revived discussion, although lots of points are being covered at once. I’d like to start with this one:

    Johnny, you write, “However, in the case of dy/dx, there is literally no reason for the symbology of division, except to make people think you are dividing.”

    Both kf and I have outlined the standard way one introduces the meaning of the definition of derivative f'(x) = limit (as h ->0) (f(x + h) – f(x))/h by thinking of the slope between the fixed point P = (x, f(x)) and Q = (x + h, f(x + h)), and then letting Q approach P by letting h -> 0. Slope is delta y/delta x, a ratio, and so dy/dx means the limit of that ratio.

    We often use mathematical symbols to mean related things: all algebra I teachers know the problems in explaining the three meanings of the negative sign: subtract, negative of, and negative number, although they all revolve around the same idea. Likewise, a/b can be thought of as division but it can also be thought of as a ratio between numbers. Just because we write dy/dx as a ratio using the slash and not, for instance, the colon a:b, doesn’t mean that there is no difference between its meaning as a ratio and its meaning as a quotient.

    So I disagree with your statement above: dy/dx is not meant to indicate a division, but the symbol is appropriate because it is being used to represent the ratio of the instantaneous, infinitely small, changes in x and y at a point.
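
    The limit-of-slopes picture described above can be sketched numerically. The function f(x) = x^2 and the point x = 3 are assumptions picked for illustration only.

```python
from fractions import Fraction

def slope_PQ(f, x, h):
    """Slope of the chord from P = (x, f(x)) to Q = (x + h, f(x + h))."""
    return (f(x + h) - f(x)) / h

f = lambda x: x ** 2
x = Fraction(3)

# Let Q approach P by shrinking h: 1/10, 1/100, ..., 1/100000.
slopes = [slope_PQ(f, x, Fraction(1, 10**k)) for k in range(1, 6)]
for k, s in enumerate(slopes, start=1):
    print(f"h = 10^-{k}: slope = {float(s)}")
```

    For this f each chord slope is exactly 6 + h, so the slopes approach 6, the value of f'(3), as h shrinks.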

  80. 80
    hazel says:

    One more comment, which is similar to Dave’s:

    Johnny, you write, “Now what d^2x/dx^2 actually *is* is a different story. It actually depends on what x is itself dependent on.”

    But x is the independent variable. If you want to say that x is dependent on some other variable, you have pushed the problem one step back, but haven’t made it go away.

    I think at some point you need a better response to the objection that you mention at the bottom of page 222. Saying that d^2x/dx^2 reduces to zero is “not necessarily true”, but not being able to say that it is definitely not zero, and why, is a problem for your approach that needs to be solved, I think.

  81. 81
    johnnyb says:

    DaveS –

    Yes, if x *is* the independent variable, and there is no possibility of x being dependent on something else, then d^2x (i.e., d(d(x))) IS zero.

    Hazel –

    Ratios and quotients have the same rules for manipulation. So, if you are thinking of it as a ratio vs a quotient, we aren’t actually in disagreement.
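
    A small sketch of the claim that d(d(x)) vanishes when x is the independent variable, using finite differences as a stand-in for differentials. The operator d below and the alternative dependence x = q^2 are assumptions for illustration.

```python
from fractions import Fraction

def d(f, eps):
    """Return the function q -> f(q + eps) - f(q)."""
    return lambda q: f(q + eps) - f(q)

eps = Fraction(1, 1000)
q = Fraction(5)

identity = lambda q: q     # x is the independent variable itself
square = lambda q: q ** 2  # hypothetical case: x depends on q as x = q^2

ddx_independent = d(d(identity, eps), eps)(q)
ddx_dependent = d(d(square, eps), eps)(q)
print(ddx_independent, ddx_dependent)
```

    With a uniform step, the first difference of the independent variable is the constant eps, so its second difference is zero. Once x depends on something else, the second difference is 2 eps^2 and no longer vanishes.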

  82. 82
    hazel says:

    But if x is the independent variable, which is what we assume when we write y = f(x) and then y’ = dy/dx, then the last term of your derivation is zero, so your derivation just reduces to the standard notation for the second derivative.

    Are you saying your notation is only different than the standard notation if x is not the independent variable, and some other unnamed variable is?

  83. 83
    daveS says:

    hazel,

    That’s an interesting question. And even if x did depend on t, we’re differentiating with respect to x, so the t dependence would be irrelevant.

  84. 84
    johnnyb says:

    Hazel –

    Yes, that is what I’m saying.

    DaveS –

    The difference matters, because it prevents you from using the differential algebraically. Otherwise, you would need to carry around the variable with which you are differentiating. For instance, instead of dy, you would need d_x(y)/d_x(x), because the d(y) would be a *different* d(y) than the one for a d_t(y)/d_t(t). This is similar to what I’m developing for partial differentials.

  85. 85
    daveS says:

    johnnyb,
    Thanks, this is more involved than I thought. I guess the idea is that if you perform algebraic manipulations such as dy/dx = 2x => dy = 2xdx (separating dy and dx), then the dy has no “memory” of where it came from, that is, whether it was the numerator of dy/dx or dy/dt, e.g. Am I in the right ball park?

    Edit: Referring to #77, aren’t there always infinitely many other variables that could be involved, in principle? y could be a function of x, then x is a function of t1, t1 is a function of t2, etc.

  86. 86
    kairosfocus says:

    Folks,

    Money shot comment by JB:

    JB, 74: we have Arbogast’s D() notation that we could use, but we don’t. Why not? Because we want people to look at this like a fraction. If we didn’t, there are a ton of other ways to write the derivative. That we do it as a fraction is hugely suggestive, especially, as I mentioned, there exists a correct way to write it as a fraction.

    This is pivotal: WHY do we want that ratio, that fraction?

    WHY do we think in terms of a function y = f(x), which is continuous and “smooth” in a relevant domain, then for some h that somehow trends to 0 but never quite gets there — we cannot divide by zero — then evaluate:

    dy/dx is

    lim h –> 0

    of

    [f(x + h) – f(x)]/[(x + h) – x]

    . . . save that, we are looking at the tangent value for the angle the tangent-line of the f(x) curve makes with the horizontal, taken as itself a function of x, f'(x) in Newton’s fluxion notation.

    We may then consider f-prime, f'(x) as itself a function and seek its tangent-angle behaviour, getting to f”(x), the second order flow function. Then onwards.

    But in all of this, we are spewing forth a veritable spate of infinitesimals and higher order infinitesimals, thus we need something that allows us to responsibly and reliably conceive of and handle them.

    I suspect, the epsilon delta limits concept is more of a kludge work-around than we like to admit, a scaffolding that keeps us on safe grounds among the reals. After all, isn’t there no one closest real to any given real, i.e. there is a continuum?

    But then, is that not in turn something that implies infinitesimal, all but zero differences? Thus, numbers that are all but zero different from zero itself considered as a real? Or, should we be going all vector and considering a ring of the close in C?

    In that context, I can see that it makes sense to consider some K that somehow “continues on” from the finite specific reals we can represent, let’s use lower case k, and confine ourselves to the counting numbers as mileposts on the line:

    0 – 1 – 2 . . . k – k+1 – k+2 – . . . . – K – K+1 – K+2 . . .

    {I used the four dot ellipsis to indicate specifically transfinite span}

    We may then postulate a catapult function so 1/K –> m, where m is closer to 0 than ANY finite real or natural we can represent by any k can give.

    Notice, K is preceded by a dash, meaning there is a continuum back to say K/2 and beyond, descending and passing mileposts as we go: K-> K-1 –> K-2 . . . K/2 – [K/2 – 1] etc, but we cannot in finite successive steps bridge down to k thence to 1 and 0.

    Where, of course, we can reflect in the 0 point, through posing additive inverses and we may do the rotation i*[k] to get the complex span.

    Of course, all of this is to be hedged about with the usual non standard restrictions, but here is a rough first pass look at the hyperreals, with catapult between the transfinite and the infinitesimals that are all but zero. Where the latter clearly have a hierarchy such that m^2 is far closer to 0 than m.

    And, this is also very close to the surreals pincer game, where after w steps we can constrict a continuum trough in effect implying that a real is a power series sum that converges to a particular value, pi or e etc. then, go beyond, we are already in the domain of supertasks so just continue the logic to the transfinitely large domain, ending up with that grand class.

    Coming back, DS we are here revisiting issues of three years past was it: step along mile posts back to the singularity as the zeroth stage, then beyond as conceived as a quasi-physical temporal causal domain with prior stages giving rise to successors. We may succeed in finite steps from any finitely remote -k to -1 to 0 and to some now n, but we have no warrant for descent from some hyperreal remote past stage – K as the descent in finite steps, unit steps, from there will never span to -k. That is, there is no warrant for a proposed transfinite quasi-physical, causal-temporal successive past of our observed cosmos and its causal antecedents.

    Going back to the focus, if 0 is surrounded by an infinitesimal cloud closer than any k in R can give by taking 1/k, but which we may attain to by taking 1/K in *R, the hyperreals, then by simple vector transfer along the line, any real, r, will be similarly surrounded by such a cloud. For, (r + m) is in the extended continuum, but is closer than any (r + 1/k) can give where k is in R.

    The concept, continuum is strange indeed, stranger than we can conceive of.

    So, now, we may come back up to ponder the derivative.

    If a valid, all but zero number or quantity exists, then — I am here exploring the logic of structure and quantity, I am not decreeing some imagined absolute conclusion as though I were omniscient and free of possibility of error — we may conceive of taking a ratio of two such quantities, called dy and dx, where this further implies an operation of approach to zero increment. The ratio dy/dx then is much as conceived and h = [(x +h) – x] is numerically dx.

    But dx is at the same time a matter of an operation of difference as difference trends to zero, so it is not conceptually identical.

    Going to the numerator, with f(x), the difference dy is again an operation but is constrained by being bound to x, we must take the increment h in x to identify the increment in f(x), i.e. the functional relationship is thus bound into the expression. This is not a free procedure.

    Going to a yet higher operation, we have now identified that a flow-function f'(x) is bound to the function f(x) and to x, all playing continuum games as we move in and out by some infinitesimal order increment h as h trends to zero. Obviously, f'(x) and f”(x) can and do take definite values as f(x) also does, when x varies. So, we see operations as one aspect and we see functions as another, all bound together.

    And of course the D-notation as extended also allows us to remember that operations accept pre-image functions and yield image functions. Down that road lies a different perspective on arithmetical, algebraic, analytical and many other operations including of course the vector-differential operations and energy-potential operations [Hamiltonian] that are so powerful in electromagnetism, fluid dynamics, q-mech etc.

    Coming back, JB seems to be suggesting, that under x, y and other quasi-spatial variables lies another, tied to the temporal-causal domain, time. Classically, viewed as flowing somehow uniformly at a steady rate accessible all at once everywhere. dt/dt = 1 by definition. From this, we may conceive of a state space trajectory for some entity of interest p, p(x,y,z . . . t). At any given locus in the domain, we have a state and as t varies there is a trajectory. x and y etc are now dependent.

    This brings out the force of JB’s onward remark to H:

    if x *is* the independent variable, and there is no possibility of x being dependent on something else, then d^2x (i.e., d(d(x))) IS zero

    Our simple picture breaks if x is no longer lord of all it surveys.

    Ooooopsie . . .

    Trouble.

    As, going further, we now must reckon with spacetime and with warped spacetime due to presence of massive objects, indeed up to outright tearing the fabric at the event horizon of a black hole. Spacetime is complicated.

    A space variable is now locked into a cluster of very hairy issues, with a classical limiting case.

    Now, in that context, could JB draw out further what he is pondering?

    KF

  87. 87
    kairosfocus says:

    DS, it may be worse than that, we KNOW dy is bound up in the issue that quasi-spacetime is all mutually bound up. A total differential — per the basic expression from partial differentiation — is generally bound to a context of underlying influence variables with their own change behaviours and so we have to reckon with classical limiting cases. Indeed, dy = curly-dy/dx times dx plus a chain of similar factors suggests interesting questions. For example, is it meaningful to reduce that to one factor, x only? And, is x as independent as we thought? Where, too, what does it really mean for x to be independent — what does it cumulatively imply? Also, spatial variables are tied to state space trajectories that are influenced by time or other similar underlying parameters. KF

  88. 88
    kairosfocus says:

    DS, is a total differential, dy really total? What lurks behind it? KF

  89. 89
    hazel says:

    Dave, at 85 you write, “Aren’t there always infinitely many other variables that could be involved, in principle? y could be a function of x, then x is a function of t1, t1 is a function of t2, etc.”

    I think this point is the same as I was making at 80 when I wrote, “But x is the independent variable. If you want to say that x is dependent on some other variable, you have pushed the problem one step back, but haven’t made it go away.”

  90. 90
    kairosfocus says:

    H:

    I see something:

    H, 79: dy/dx is not meant to indicate a division, but the symbol is appropriate because it is being used to represent the ratio of the instantaneous, infinitely small, changes in x and y at a point

    By definition, a point is without scale. So, could it be that there are no changes at a point, though there may be changes around it?*

    That is, we have x, and with it f(x), which has associated a slope that is expressed in the tangent at the point. That tangent-slope itself varies in general as x takes different values so we may identify with f, f ‘(x) a flow function, then higher order flow functions. I put a space to allow clear visibility of the prime.

    In that context, we are looking at changes that occur so close to x that they are closer to x than any value (x + 1/k) once k is finite. That is, we have a hyperreal cloud surrounding x and are looking at values like (x + 1/K), K being beyond any finite k but connected to the reals by some sort of transfinite extension sufficiently close that m = 1/K is a value in the continuum around 0 but closer than any 1/k, k finite can give us.

    Is this part of what we are fishing for?

    KF

    *PS: We then ask, how are we moving around x, which brings in questions of onward influence. I don’t think we can avoid them, though in effect we may stipulate that x is considered to change smoothly similar to a steadily flowing time domain. I used to talk in terms of road cuts, where we can see how the height varies along the road, but that already smuggles in comparing distinct locations along the road. Something is allowing us to hop in location, and we should allow it to surface. Total differential dy reflects its influences. What does dx really mean, especially where x is independent?

  91. 91
    daveS says:

    hazel,

    Yes, it’s exactly the same point.

    The first 2/3 of johnnyb’s post #72 has thrown me for a loop, I confess.

    I am thinking of this situation strictly mathematically, where the independent variable is what it is, period. There is no philosophical musing about whether time is the “primary” independent variable. So if a problem is specified completely, then it should be crystal clear what d^2x/dx^2 is.

    Edit: As an example, suppose y = x^2 and x = 3t + 1. What is d^2x/dx^2?

    And what are d^2x and dx^2 separately?

  92. 92
    hazel says:

    Dave, you write, “I am thinking of this situation strictly mathematically, where the independent variable is what it is, period. There is no philosophical musing about whether time is the “primary” independent variable. So if a problem is specified completely, then it should be crystal clear what d^2x/dx^2 is.”

    I agree completely.

  93. 93
    daveS says:

    hazel,

    I’m just guessing here, but perhaps in johnnyb’s process, you calculate all differentials with respect to the “ultimate” independent variable at the start. And then somehow d^2x/dx^2 turns out to be nonzero in some of those cases where x depends on another variable.

  94. 94
    hazel says:

    But what if it’s turtles all the way down? 🙂

  95. 95
    daveS says:

    hazel,

    I’m back at my computer, so I can view the paper again. I think johnnyb addresses my questions to some extent there, but I still am mystified by exactly what this d^2x/dx^2 is in the simplest of cases. Perhaps johnnyb can show us what it is in my example from post #91 or in his example(s) from the paper.

  96. 96
    kairosfocus says:

    DS,

    In 72, JB makes an ontological point:

    what d^2x/dx^2 actually *is* is a different story. It actually depends on what x is itself dependent on . . . . I can see the possibility that a truly independent variable’s second differential should always be zero, but I don’t know if I’m fully convinced of that yet. Still trying to decide.

    That’s being not quantity or value in the first place, leading to a discussion on what value it should take, if it is always and everywhere nil. He is not sure on genuinely independent variables.

    He suggests, that the nature of the second order differential is something he has not got a direct or expert answer to.

    Now, too, he brings to bear a relevant issue, the nature of the absolute differential dy and how we got there. Context counts and dy from x as independent is not the same in meaning as dy from t as independent variable. Or from a set of variables that leads to the partial differential contribution sum we know from further work on partial differentials. We know y is a dependent, so on what does it depend is relevant.

    All of this then leads to infinitesimals, and thence to constants, i.e. specific infinitesimals of form m = 1/K, K beyond any finite k in R. m is closer to 0 than any finite k in R similarly taken under the reciprocal can give, k > 1. Something like dx or dy is in that range but is also the result of an operation of limits as chords go to tangents.

    KF

  97. 97
    daveS says:

    KF,

    He suggests, that the nature of the second order differential is something he has not got a direct or expert answer to.

    I’ll let johnnyb address this if he chooses, but I would *assume* this has all been hammered out if we’re issuing corrections to the notation of elementary calculus. I know what d^2x/dx^2 means in the “old” system. It should be utterly routine to explain it in the new, improved system.

    A minimal example where d^2x/dx^2 is nonzero might be a good starting point.

  98. 98
    kairosfocus says:

    H & DS,

    The differences between d^2x and dx^2 are interesting. Especially if behaving like a value is part of it.

    d[d(x)] is obviously a second order operation giving an image, repeat of the d operation.

    By contrast, dx times dx seems to be some sort of multiplication of a one stage go to differential operation, if they are to operate algebraically — for the sake of argument.

    That suggests that ontological difference is relevant, though obviously they may hold the same resultant value, maybe for all x. What something is as to essence and in logically accurate description is not the same as what quantitative value it holds.

    If there is an underlying implicit process, that would possibly make a difference: how do we creep h = [(x+h) – x] to not-quite-zero? Operationally, not merely as an idea. What is the product h^2, once it is at its not quite zero target; is that target properly dx?

    Likewise, on d^2 [y] what is the differential on dy = lim h-> 0 of [f(x+h) – f(x)], i.e. d^2[y]? Then, substitute x in this. What, then is d^2[x]? Thence, the ratio d^2[x]:[dx]^2?

    Strange indeed, if valid.

    However, JB is actually doing something more direct, treating dy/dx as itself a function f ‘ (x) where this is u[x]/v[x], with substitution instances dy and dx for u and v. From this he puts forward on that calculation:

    d/dx[dy/dx] = d^2y/dx^2 – [dy/dx]*[d^2x/dx^2]

    Obviously, where x is indeed the final, independent variable the second term on RHS is dy/dx * 0 –> 0, yielding the result that is well known. This, on grounds that dx/dx = 1 and by substitution d/dx [1] = 0. But this is a reduction, not an equivalence. By comparison, in accounting, there is a classic equation that obtains for the balance sheet at all times. But the structure within the sides can make all the difference in the world. And, changing that internal structure ill-advisedly can be ruinous. Where the intra-side structures stand can make a significant difference, so numerical equality LHS = RHS is not the whole story.
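
    For reference, the quotient-rule step behind that expression can be written out as follows. This is a sketch of the algebra as described in this thread, treating dy and dx as quantities that d() may act on, which is exactly the assumption under dispute here.

```latex
\begin{align*}
d\!\left(\frac{dy}{dx}\right)
  &= \frac{dx \, d(dy) - dy \, d(dx)}{dx^{2}}
   = \frac{dx \, d^{2}y - dy \, d^{2}x}{dx^{2}}, \\[4pt]
\frac{d}{dx}\!\left(\frac{dy}{dx}\right)
  &= \frac{1}{dx} \cdot \frac{dx \, d^{2}y - dy \, d^{2}x}{dx^{2}}
   = \frac{d^{2}y}{dx^{2}} - \frac{dy}{dx} \cdot \frac{d^{2}x}{dx^{2}}.
\end{align*}
```

    Setting d^2x = 0 in the last line recovers the standard d^2y/dx^2.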

    Now, what happens when y is now x itself?

    d/dx[dx/dx] = d^2x/dx^2 – [dx/dx]*[d^2x/dx^2]

    If dx/dx is 1, it reduces to 0 on RHS, but that is on substitution. On LHS, the slope of a constant is an operation that yields 0. What is so quantitatively and what is going on ontologically are it seems not quite identical.

    Curious, it seems.

    And, suggestive of why going to the epsilon-delta form would seem clearer conceptually.

    KF

  99. 99
    hazel says:

    kf writes, “However, JB is actually doing something more direct, treating dy/dx as itself a function f ‘ (x) where this is u[x]/v[x], with substitution instances dy and dx for u and v.”

    Steve_h pointed this out, and I seconded the issue: I think this is the heart of the flaw in Johnny’s thinking. As I quoted Wikipedia, the quotient rule is about the quotient of two differentiable functions, here labeled by kf as u and v. But in what way is it legitimate to consider dy and dx as functions u and v? And if x is considered the independent variable, as Johnny has acknowledged, d^2x/dx^2 is 0, so his process just produces the standard notation for the second derivative.

  100. 100
    johnnyb says:

    As for asking for a concrete example of d^2x/dx^2, let me just give the example from the paper, but write it in a different way.

    y = x^3
    dy = 3x^2 dx
    d^2y = 6x dx^2 + 3x^2 d^2x

    x = t^2
    dx = 2t dt
    d^2x = 2 dt^2 + 2t d^2t

    So, d^2x/dx^2 = (2 dt^2 + 2t d^2t)/(4t^2 dt^2)

    FYI – my brain is not firing on all cylinders this morning, and I’ve already had to edit this twice, so apologies if there are any math errors here.
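
    A rough numeric check of the x = t^2 case, under the added assumption that t is the independent variable (so d^2t = 0 and the expression above reduces to 1/(2t^2)). Finite differences with a uniform step in t stand in for the differentials; the operator d and the sample values are illustrative.

```python
from fractions import Fraction

def d(f, eps):
    """Return the function t -> f(t + eps) - f(t)."""
    return lambda t: f(t + eps) - f(t)

x = lambda t: t ** 2
t = Fraction(3)
eps = Fraction(1, 10**6)

dx = d(x, eps)(t)            # 2t*eps + eps^2
d2x = d(d(x, eps), eps)(t)   # 2*eps^2
ratio = d2x / dx ** 2        # finite-difference stand-in for d^2x/dx^2
limit_value = 1 / (2 * t ** 2)

print(float(ratio), float(limit_value))
```

    The ratio is exactly 2/(2t + eps)^2, which tends to 1/(2t^2) as eps shrinks.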

  101. 101
    daveS says:

    johnnyb,

    Thanks, that helps.

    If we assume x = s^3, say, don’t we get yet another expression for d^2x/dx^2? Is it comparable to what you just obtained?

    (6s ds^2 + 3s^2 d^2s)/(9s^4 ds^2) is what I got, which could be reduced to (2 ds^2 + s d^2s)/(3s^3 ds^2), or even:

    2/(3s^3) + d^2s/(3s^2 ds^2) (?!)

  102. 102
    daveS says:

    PS:

    Perhaps if x = s^n, then:

    d^2x/dx^2 = (n – 1)/(n s^n) + d^2s/(n s^(n – 1) ds^2)
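
    The conjectured general formula can be checked as a plain algebraic identity by treating s, ds and d^2s as ordinary numbers. The function names and sample values below are hypothetical; exact fractions avoid rounding.

```python
from fractions import Fraction

def ratio_direct(n, s, ds, d2s):
    """d^2x/dx^2 computed from dx and d^2x for x = s^n."""
    dx = n * s ** (n - 1) * ds
    d2x = n * (n - 1) * s ** (n - 2) * ds ** 2 + n * s ** (n - 1) * d2s
    return d2x / dx ** 2

def ratio_closed(n, s, ds, d2s):
    """The closed form (n - 1)/(n s^n) + d^2s/(n s^(n-1) ds^2)."""
    return Fraction(n - 1) / (n * s ** n) + d2s / (n * s ** (n - 1) * ds ** 2)

# Arbitrary nonzero sample values for s, ds and d^2s.
s, ds, d2s = Fraction(7, 3), Fraction(1, 11), Fraction(-2, 13)
results = [(n, ratio_direct(n, s, ds, d2s), ratio_closed(n, s, ds, d2s))
           for n in range(2, 8)]
for n, direct, closed in results:
    print(n, direct == closed)
```

    Agreement at these sample values is consistent with the formula; for n = 3 it reduces to the expression in the previous comment.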

  103. 103
    hazel says:

    I don’t understand. In the example at 100, in your derivation of d^2x/dx^2, you don’t even use the equation about y. Also, the independent variable of the second equation is t, not x, so it would be d^2t/dt^2 that we would be interested in.

  104. 104
    johnnyb says:

    Hazel –

    That component becomes important *only* when other equations are involved. So, your question about d^2t/dt^2 can only be answered if I know additional things about t. The “appendage”, if you will, tells you how to incorporate new information into the existing formula.

    As a friend of mine who was an early reviewer put it, the “-(dy/dx)(d^2x/dx^2)” is kind of a “hold in reserve” type of a thing. It springs into action when we know something new about x. I’m hedging my bets, saying that, if I find out something new about “x”, this is how it is going to be combined with my existing knowledge.

    DaveS –

    That’s the basic idea, but I don’t have a pen/paper to verify your exact calculation. Yes, different relationships between x and other variables will imply different things about how x changes, which will lead to different formulas.

  105. 105
    hazel says:

    Johnny, you write,

    As a friend of mine who was an early reviewer put it, the “-(dy/dx)(d^2x/dx^2)” is kind of a “hold in reserve” type of a thing. It springs into action when we know something new about x. I’m hedging my bets, saying that, if I find out something new about “x”, this is how it is going to be combined with my existing knowledge.

    It seems we have agreed that if we simply have y= f(x), with x as the independent variable, then d^2x/dx^2 = 0, and so y’’ = d^2y/dx^2, as in standard notation.

    The quote above then says that if you find out “something new about x”, that would be expressed in the “-(dy/dx)(d^2x/dx^2)” part of your notation. Can you give an example? If you knew something new about x, such as some function x = f(t), how would that be incorporated into the second derivation of the original function involving y and x, using your quotient notation?

    Could you illustrate using the two functions at 100, and using the quotient form of your notation?

  106. 106
    johnnyb says:

    Hazel –

    We seem to be going around and around and saying the same thing. I’m not really sure what you are trying to get at. I’ll go over it this time, but after this it seems we are just wasting time talking past each other.

    If you have y = f(x), but also x = g(t), then x *is not* an independent variable. That’s why you need to keep the rest of that in – since you don’t know everything about the world, it may wind up that what you thought was the independent variable wasn’t, so you have to hang on to the extra. Think about integration. If I integrate dy = dx, I will get y = x + C. The “C” is there because there are facts that I don’t know about the original equation that I would need to know to fill out C. It *might* be that C is zero, and therefore discardable. But I would be wrong if I just left it out because, perchance, it might wind up being zero. Likewise, it might turn out that d^2x = 0, and I can throw out the whole right-hand side. But I need to keep it in there, because I don’t really have independent knowledge about the entire state of the universe.

    So, with @100, if I wanted the second derivative of “y” with respect to “t”, that would be “(d^2y/dt^2) – (dy/dt)(d^2t/dt^2)”. Using the original notation (where d^2x/dx^2 = 0), then d^2y/dx^2 = 6x. dx/dt = 2t. So, if we tried to combine these facts, we could square dx/dt and get (dx^2/dt^2), which can then be multiplied by our original second derivative:

    (d^2y/dx^2)(dx^2/dt^2) = d^2y / dt^2

    So, let’s now do it with the other side of the equation:

    6x * (2t)^2 = 24x * t^2

    Since x = t^2, this becomes 24t^4.

    However, that is *NOT* the second derivative of y with respect to t. If we substitute it in at the beginning, we would have y = t^6, and the first derivative would be 6t^5 and the second derivative would be 30t^4. We’ve *lost* 6t^4 somewhere along the way. Where did it go? Well, it was hiding in the -(dy/dx)(d^2x/dx^2) term. Since our original second derivative didn’t take into account the fact that x might not actually be independent, when we tried to combine it with another derivative it failed. Thus, it isn’t algebraically manipulable, because algebraic manipulations can affect which variable is a function of which other variable.
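
    A quick numeric sketch of the discrepancy, assuming t is the independent variable. A central second difference of y = (t^2)^3 is compared with the naive product 24t^4; the helper names and sample point are illustrative.

```python
from fractions import Fraction

def y_of_t(t):
    """y = x^3 with x = t^2 substituted in."""
    x = t ** 2
    return x ** 3

def second_difference(f, t, h):
    """Central second difference, approximating the second derivative."""
    return (f(t + h) - 2 * f(t) + f(t - h)) / h ** 2

t = Fraction(2)
h = Fraction(1, 1000)

numeric = second_difference(y_of_t, t, h)
naive = 6 * t ** 2 * (2 * t) ** 2  # (d^2y/dx^2)(dx/dt)^2 = 24 t^4
exact = 30 * t ** 4                # second derivative of t^6

print(float(numeric), naive, exact, exact - naive)
```

    The numeric value sits next to 30t^4, and the shortfall of the naive product is 6t^4.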

    For instance, the new notation allows you to discover an inverse function theorem for the second derivative by purely algebraic means (note – we originally thought we discovered this new, but, while I haven’t found it in either any textbooks or papers, I have seen it floating on the web on occasion, and most reviewers thought it was new as well). You can determine that:

    x” = -y”(1/y’)^3

    You can do this by pure algebraic manipulation of terms.

    y = f(x)
    y’ = dy/dx
    y” = d^2y/dx^2 – (dy/dx)(d^2x/dx^2)

    So, if we’re trying to establish a formula for x”, it would look like “d^2x/dy^2 – (dx/dy)(d^2y/dy^2)”. So, notice that for y” I have three dx’s on the bottom, and for x” I have three dy’s on the bottom. Therefore, to cover for this, I simply multiply by (dx/dy)^3. That gives me:

    y” (dx/dy)^3 = (dx/dy)(d^2y/dy^2) – (d^2x/dy^2)

    This is close, in fact, it is only off by a negative sign. So, multiply both sides by -1:
    -y” (dx/dy)^3 = (d^2x/dy^2) – (dx/dy)(d^2y/dy^2)

    Now, on the left-hand side, note that (dx/dy) can mean either x’ or (1/y’). So, we can just choose (1/y’), and this becomes our formula:

    -y” (1/y’)^3 = (d^2x/dy^2) – (dx/dy)(d^2y/dy^2)

    But you can only do this if you don’t assume ahead-of-time that x is an independent variable. Note that it doesn’t make sense to find the second derivative of x with respect to y if x is an independent variable. Algebraic manipulation assumes that you are going to put a hold on what is dependent on whatever else.
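
    The resulting formula x” = -y”(1/y’)^3 can be spot-checked numerically. The example y = x^3 (so x = y^(1/3)) and the evaluation point are assumptions for illustration; floating-point finite differences make this approximate.

```python
def inverse(y):
    """x as a function of y when y = x^3 (positive branch)."""
    return y ** (1.0 / 3.0)

def second_difference(f, y, h):
    """Central second difference, approximating the second derivative."""
    return (f(y + h) - 2.0 * f(y) + f(y - h)) / h ** 2

x0 = 2.0
y0 = x0 ** 3           # 8.0

y1 = 3.0 * x0 ** 2     # y'  = 3x^2
y2 = 6.0 * x0          # y'' = 6x

formula = -y2 * (1.0 / y1) ** 3              # -y''(1/y')^3
numeric = second_difference(inverse, y0, 1e-3)

print(formula, numeric)
```

    Both values come out near -1/144, the second derivative of y^(1/3) at y = 8.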

  107. 107
    daveS says:

    I’ll try using johnnyb’s example from the paper to see how this works.

    If y = x^3, and x is the independent variable, then d^2x/dx^2 = 0, so the second derivative of y wrt x is just d^2y/dx^2, and is 6x.

    Now if we know x = t^2, then d^2x/dx^2 = (2 dt^2 + 2t d^2t)/(4t^2 dt^2) = 1/(2t^2) + d^2t/(2t dt^2), so this time the second derivative does not collapse to the “usual” form.

    We have a d^2t/dt^2 in this term, and if t depends on another variable, we have to go through another round of this process (and stop only when we hit the “ultimate” independent variable). If t is _the_ independent variable, then d^2t/dt^2 = 0 and we’re done.

    In both cases, the second derivative of y wrt x is 6x. It’s just that when we have to use johnnyb’s formula (the one on the whiteboard), the terms in that expression change in a coordinated way so as to still equal 6x.
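
    The coordinated cancellation can be spelled out for y = x^3, using the second differential from comment 100. This is a sketch of the algebra only; it holds whatever value d^2x/dx^2 takes.

```latex
\begin{align*}
d^{2}y &= 6x\,dx^{2} + 3x^{2}\,d^{2}x
  \quad\Longrightarrow\quad
  \frac{d^{2}y}{dx^{2}} = 6x + 3x^{2}\,\frac{d^{2}x}{dx^{2}}, \\[4pt]
\frac{d^{2}y}{dx^{2}} - \frac{dy}{dx}\cdot\frac{d^{2}x}{dx^{2}}
  &= 6x + 3x^{2}\,\frac{d^{2}x}{dx^{2}} - 3x^{2}\,\frac{d^{2}x}{dx^{2}}
   = 6x.
\end{align*}
```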

  108. 108
    daveS says:

    johnnyb,

    Is it fair to say that the places where you see the benefits of this approach (so far) involve composition of functions and/or the chain rule?

  109. 109
    johnnyb says:

    Dave –

    I use the chain rule a lot because it is the easiest place to discuss the benefits of this method and how it allows for an algebraic manipulation. However, the practical benefits so far have actually been developing the inverse function theorem, which doesn’t involve the chain rule. Even more so, the bigger benefits I think will come from the exploratory potentials. For instance, is there some way we might be able to put d^2y/d^2x to use (pay close attention to the formula because it is not the one you normally see)? Does it have a meaning that we can use?

    Additionally, I think that knowing the formula will help people recognize when something is displaying “dependency” behavior. As I mentioned earlier, George Montanez has an interesting idea about this which I haven’t had time to follow up on yet. It might also help in numerical applications, though I’m not sure.

    In any case, I think the biggest benefit will come from the exploratory abilities of the new notation to give us real information about the nature of differentials.

  110. 110
    daveS says:

    johnnyb,

    Yes, I think the algebraic proof of the formula for the second derivative of f^-1 is the most compelling demonstration so far. IIRC, the “usual” proof does use the chain rule, and does get a bit messy. Your proof looks slicker.

  111. 111
    hazel says:

    I’ll quit soon, Johnny, as I don’t want you to waste your time.

    Also, it is really hard to follow paragraphs 3-7 of post 16, partially because of having to type in plain text. I’ve tried writing out things in the normal way, but can’t follow some of your skipped steps. I especially don’t know where the part after the sentence “So, let’s now do it with the other side of the equation:” comes from.

    But you conclude that the “missing” 6t^4 was “hiding” in the “-(dy/dx)(d^2x/dx^2)” term. How? What would I substitute where to show that that term was equal to 6t^4?

  112. 112
    hazel says:

    Oops, at 111, I meant post 106, not post 16.

  113. 113
    daveS says:

    johnnyb & hazel,

    This sentence from post #106 illustrates perhaps a philosophical difference which is getting in our way:

    If you have y = f(x), but also x = g(t), then x *is not* an independent variable.

    Hazel pointed out that given a function y of x, you can always reparameterize by introducing a new variable t such that x is a function of t. For example, if y = x^2 (for all real x, say), then for each x you can set t = arctan(x) and now x(t) = tan(t). If x was an independent variable before, that shouldn’t change just because I artificially introduced a “dependence” of x on the new parameter.

    Similarly, even if we think of y as a function of x, we can often differentiate x with respect to y. We can even differentiate sin(x) with respect to sqrt(y), and so on.

    I still don’t have a handle on this, but I don’t think dependent vs independent variables are the key issue.

  114. 114
    johnnyb says:

    DaveS –

    Perhaps a terminology would suffice. “Conditionally Independent” vs. “Unconditionally Independent”. If y = f(x), x is a “conditionally independent variable”. If there is literally no other parameter upon which x depends, then we can call it an “unconditionally independent variable.” My notation is needed on “conditionally independent” variables, but not necessarily on “unconditionally independent” variables, for which the weird part of the term drops to zero.

  115. 115
    johnnyb says:

    Hazel –

    y = x^3
    dy/dx = 3x^2
    (d^2y/dx^2) – (dy/dx)(d^2x/dx^2) = 6x

    x = t^2
    dx/dt = 2t
    (d^2x/dt^2) – (dx/dt)(d^2t/dt^2) = 2

    Let’s start from “(d^2y/dx^2) – (dy/dx)(d^2x/dx^2) = 6x”. Now, we want dt’s on the bottom. Let’s begin by multiplying the whole thing by (dx/dt)^2. Since dx/dt = 2t, this yields:

    d^2y/dt^2 – dy/dx (d^2x)/dt^2 = 6x * (2t)^2

    Since x = t^2, this becomes

    d^2y/dt^2 – dy/dx (d^2x)/dt^2 = 6t^2 * (2t)^2 = 24t^4

    Now, note that our right side is the “wrong” derivative, but the left side isn’t yet in the form we need for the derivative. How do we close the gap? Interestingly, if we multiply y’ and x”, we will get the following term: “(dy/dx)((d^2x/dt^2) – (dx/dt)(d^2t/dt^2))” which, if you distribute, becomes “(dy/dx)(d^2x/dt^2) – (dy/dt)(d^2t/dt^2)” If we add this to both sides, the left-hand side will become the second derivative!

    (d^2y/dt^2) – (dy/dx) (d^2x/dt^2) + (dy/dx)(d^2x/dt^2) – (dy/dt)(d^2t/dt^2) = (d^2y/dt^2) – (dy/dt)(d^2t/dt^2)

    This is perfectly valid, as long as we do it to both sides. So, if we multiply y’ and x” on the right-hand side, that is “3x^2 * 2”. Since x = t^2, this becomes “3t^4 * 2 = 6t^4”. If we add that to the right side (because we added its equivalent to the left side), the right side becomes “24t^4 + 6t^4 = 30t^4”, which is the correct answer.

    All of these are pure algebraic manipulations, involving cancelling differentials, and doing things to both sides of the equation. Knowing Faa di Bruno’s formula helps you to *know* what to do (it would be harder to guess what to do without it), but it doesn’t affect the valid operations on the object.
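
    As a sanity check on the arithmetic only (not on the notation itself), here is a small sketch that represents polynomials in t as coefficient lists and compares the two right-hand-side pieces, 6x * (2t)^2 and 3x^2 * 2, against the second derivative of y = t^6 taken directly. The helper names (poly_mul, poly_add, poly_diff, poly_pow, trim) are made up for this illustration:

```python
# A polynomial in t is a coefficient list: p[k] is the coefficient of t**k.

def poly_mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_add(p, q):
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def poly_diff(p):
    # d/dt of sum(c_k t^k) is sum(k c_k t^(k-1)); a constant maps to [0]
    return [k * c for k, c in enumerate(p)][1:] or [0]

def poly_pow(p, n):
    out = [1]
    for _ in range(n):
        out = poly_mul(out, p)
    return out

def trim(p):
    # drop trailing zero coefficients so equal polynomials compare equal
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p

x = [0, 0, 1]                    # x = t^2
y = poly_pow(x, 3)               # y = x^3 = t^6

dx_dt = poly_diff(x)             # 2t
d2x_dt2 = poly_diff(dx_dt)       # 2

dy_dx = poly_mul([3], poly_pow(x, 2))    # 3x^2 written in t: 3t^4
six_x = poly_mul([6], x)                 # 6x written in t: 6t^2

first_term = trim(poly_mul(six_x, poly_pow(dx_dt, 2)))   # 6x * (2t)^2
second_term = trim(poly_mul(dy_dx, d2x_dt2))             # 3x^2 * 2
total = trim(poly_add(first_term, second_term))

direct = trim(poly_diff(poly_diff(y)))   # second derivative of t^6

print(first_term, second_term, total, direct)
```

    The two pieces come out as 24t^4 and 6t^4, and their sum matches the direct second derivative 30t^4.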

    Hopefully that clears it up.

  116. 116
    kairosfocus says:

    JB, an important concept, the admission of ignorance in the process. x has an identity with its own core characteristics, some of which we may not know. In that light, we should be aware of the possibility that we are not exploring such or may be ignorant of such. Where, too, it is not an adequate response that we may always insert a definition that x is a variable dependent on some imaginary parameter, we are dealing with identity, cause and characteristics here. KF

  117. 117
    hazel says:

    Thanks, Johnny. I know 115 took some time to type up, but I followed what you were doing. Hopefully having an example will help if others ask the same questions that I did.

  118. 118
    daveS says:

    johnnyb,

    Perhaps a terminology would suffice. “Conditionally Independent” vs. “Unconditionally Independent”. If y = f(x), x is a “conditionally independent variable”. If there is literally no other parameter upon which x depends, then we can call it an “unconditionally independent variable.” My notation is needed on “conditionally independent” variables, but not necessarily on “unconditionally independent” variables, for which the weird part of the term drops to zero.

    I believe I understand what you’re saying here, and it does make sense, although it also probably exposes philosophical differences once again.

    I would say that the mathematician (or quasi-mathematician, in my case) makes the decision about what the independent variable is, and that’s that. Maybe more to the point, she decides that she is going to differentiate y with respect to x, e.g., so she treats x as the independent variable.

    Further, if y is a function of x and x is a function of t, one can still differentiate y with respect to x, and the “answer” will not have anything to do with how x varies as a function of t. So the apparent dependence of d^2x/dx^2 on yet another variable (say t, as you illustrated in post #100) is a little puzzling, although presumably not a problem in the end—I guess the t’s must cancel when you evaluate your entire formula for d^2y/dx^2.
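
    A sketch of how that cancellation would go for y = x^3, assuming d^2y is obtained by applying the product rule to dy = 3x^2 dx, with d^2x left as an unevaluated symbol:

```latex
\begin{align*}
dy &= 3x^2\,dx \\
d^2y &= d\!\left(3x^2\,dx\right) = 6x\,dx^2 + 3x^2\,d^2x \\
\frac{d^2y}{dx^2} - \frac{dy}{dx}\,\frac{d^2x}{dx^2}
  &= \left(6x + 3x^2\,\frac{d^2x}{dx^2}\right) - 3x^2\,\frac{d^2x}{dx^2} = 6x
\end{align*}
```

    On this reading, whatever d^2x/dx^2 turns out to be, it appears once with each sign and drops out, so the combined expression does not depend on how x varies with t.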

    That’s where I’m at now. I still have some questions about d^2x/dx^2.

    1) In order to evaluate d^2x/dx^2, is it necessary that we have expressed x as a function of another parameter t? (That is, is there any way other than the method you have demonstrated?)

    2) Is the value of d^2x/dx^2 dependent at all on the way x varies as a function of its parameter? Above I tried to calculate d^2x/dx^2 twice, once where x = t^2 and once where x = s^3, but I don’t know whether the two expressions I obtained are equal. So my question is, can we prove that this equality holds?

    d^2x/dx^2 = (2 dt^2 + 2t d^2t)/(4t^2 dt^2) = (6s ds^2 + 3s^2 d^2s)/(9s^4 ds^2)

    (After typing that equation out, I realize that s = t^(2/3), which we could use to relate the s’s, t’s, ds’s, and dt’s above. I suspect one can then prove the above equation holds, but I’ll have to wait until this evening to check. Perhaps it’s a simple consequence of the chain rule?)
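
    A sketch of how such a check might go, assuming t > 0 so that t^2 = s^3 gives s = t^{2/3}, and treating d^2t as a free symbol that is differentiated by the product rule like everything else:

```latex
\begin{align*}
s &= t^{2/3}, \qquad
ds = \tfrac{2}{3}\,t^{-1/3}\,dt, \qquad
d^2s = -\tfrac{2}{9}\,t^{-4/3}\,dt^2 + \tfrac{2}{3}\,t^{-1/3}\,d^2t \\
6s\,ds^2 &= 6\,t^{2/3}\cdot\tfrac{4}{9}\,t^{-2/3}\,dt^2 = \tfrac{8}{3}\,dt^2 \\
3s^2\,d^2s &= 3\,t^{4/3}\left(-\tfrac{2}{9}\,t^{-4/3}\,dt^2 + \tfrac{2}{3}\,t^{-1/3}\,d^2t\right)
            = -\tfrac{2}{3}\,dt^2 + 2t\,d^2t \\
6s\,ds^2 + 3s^2\,d^2s &= 2\,dt^2 + 2t\,d^2t \\
9s^4\,ds^2 &= 9\,t^{8/3}\cdot\tfrac{4}{9}\,t^{-2/3}\,dt^2 = 4t^2\,dt^2
\end{align*}
```

    Under those assumptions the numerators and denominators match term by term, so the two expressions for d^2x/dx^2 would agree for any value of d^2t.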

  119. 119
    daveS says:

    PS to my #118:

    Regarding my question:

    1) In order to evaluate d^2x/dx^2, is it necessary that we have expressed x as a function of another parameter t? (That is, is there any way other than the method you have demonstrated?)

    I guess you could take your equation:

    d^2x = 2 dt^2 + 2t d^2t

    and solve it for d^2t/dt^2, obtaining:

    d^2t/dt^2 = (d^2x – 2 dt^2)/(2t dt^2)

    which shows that if x is a function of t, then we can solve for d^2t/dt^2 in terms of differentials involving the dependent variable; thus working “up the chain” of dependence. I don’t know if this gives us any new information though.

  120. 120
    daveS says:

    I may be talking to myself now, but FTR, I haven’t come up with anything useful from the equation:

    d^2x/dx^2 = (2 dt^2 + 2t d^2t)/(4t^2 dt^2) = (6s ds^2 + 3s^2 d^2s)/(9s^4 ds^2)

    How d^2t could sometimes be 0 and sometimes not remains a mystery to me.

  121. 121
    kairosfocus says:

    DS, I’m here — just noted on the Notre Dame fire. KF
