In this short post, I want to show a way to find the square root of a positive number without relying on the standard library using Newton's method. This is the method which seems to be used in many implementations of the C standard library. The example code is in the C programming language. Before going to the implementation, I provide the mathematical formulae used.

I want to demonstrate how one could solve it with a relatively small amount of code without sacrificing precision much or at all compared to the sqrt() function from the C standard library.

A Bit of Math

Before going to the code, we need to describe the method which we are going to use. No advanced understanding of math is required.

Finding the square root of a number

\[N = \sqrt{X}\]

using Newton's method requires multiple steps.

To find the value of the desired accuracy

\[N = A_n\]

we will use the following formula:

\[A_{n} = \frac{A_{n-1} + \frac{X}{A_{n-1}}}{2}\]

where accuracy depends on the value of \(n\). The higher the value of \(n\) the more accurate value of square root we could obtain. We can say that \(n\) is the number of approximations (how many times we want to apply this formula).

When choosing \(n\) one needs to keep in mind that, in general, Newton's method has quadratic convergence which intuitively means that the number of correct digits roughly doubles in every step. On a computer, the number of approximations is dictated by the hardware.

From the formula above it is obvious that we need to know a way to find a value of \(A_1\) to start.

A good starting value (or, should I say, a good first approximation) would be integer square root of \(X\):

\[I=\lfloor \sqrt{X} \rfloor\]

so

\[A_{1} = I.\]

We will refer to this formulae during the programming process. Now let's go to the implementation.

Implementation

From the description above it becomes obvious that the first step which we need to do is to calculate the integer square root \(I\). We can calculate this value iteratively increasing some number \(k\) starting from \(1\) until \(k^2 \gt X\). Reaching this condition means that we have found the integer square root \(I\) of the number \(X\):

\[I = k - 1.\]

We can achieve this using the following C code:

unsigned long isqrt(const unsigned long X)
{
    unsigned long k = 1, result = 1;
    while (result <= X)
    {
        k++;
        result = k*k;
    }
    return k--;
}

There are more efficient methods available, but this one is easier to understand.

Then we need to find \(A_n\)1 I have to admit that this sentence was added after some discussion at Reddit and it is quite bold. I tried to find some facts which will prove that I am wrong, but so far I have found none. But I am not a floating-point expert, though, so if you have information on this or, even better, the proof - please feel free to contact me. I am interested in improving this code and always ready to admit my mistakes. iteratively. Here the question arises: how should we peek \(n\)? We can paraphrase this question: how many times should we approximate the value?

In C we can use the following trick: considering that the floating-point numbers have finite precision, we may iterate until the value \(A_n\) and \(A_{n-1}\) will not become equal when compared using == operator in C. That means that we have reached the maximum possible precision on the current hardware (at least for this implementation). That is to say that I expect the method to converge as soon as

\[\left[A_{n-1} - A_n \right] \lt \epsilon\]

where \(\epsilon\) is the machine epsilon for double-precision floating-point values (defined as DBL_EPSILON constant in C).1

Please keep in mind that this approach may not work in some other programming environments. If it is the case, limit \(n\) with some sane number. Also keep in mind, that we are targeting platforms with IEEE 754-compliant floating-point units.

So, finally, here is the code which implements calculation of the square root using Newton's method (with some checks omitted for clarity):

double nsqrt(double X)
{
    /* ... */
    unsigned long I;
    double A_prev, A;
    if (X == 0.0 || X == 1.0 /* .. */)
    {
        return X;
    }
    /* ... */
    I = isqrt((unsigned long)X); /* see above */
    if (X == (double)(I*I)) /* exact root */
    {
        return (double)I;
    }
    A = A_prev = I;
    do {
        A_prev = A;
        A = (A + X/A)/2.0;
    } while (A_prev != A);

    return A;
}

The reader might be interested in how well the code above performs compared against the sqrt() function from the C standard library. According to my tests, precision wise, it produces very similar or even the same results, at least on Windows (x86_64) and Linux (x86_64, ARM). I believe that it would be the case for Plan 9 as well. There is one notable exception though: when calculating \(\sqrt{2}\) the results produced by the function above and the built-in one might differ. It seems that the functions in standard C library implementations may return the precomputed value of \(\sqrt{2}\) (the M_SQRT2 constant), which might have been computed using different, possibly more precise, method.

Let's add the corresponding check to the code (as well as some other checks). Here is the revised version of the function:

double nsqrt(double X)
{
    const double inf = 1.0l/0.0l; /* positive infinity */
    const double ninf = -inf; /* negative infinity */
    const double nan = 0.0l/0.0l; /* not a number */
    unsigned long I;
    double A_prev, A;
    if (X == 0.0 || X == 1.0 || X == inf)
    {
        return X;
    }
    else if (X == 2.0)
    {
        return M_SQRT2;
    }
    else if (X == ninf || X < -0.0)
    {
        return nan;
    }
    /* NaN cannot be compared using == operator */
    else if (memcmp((void *)&X, (void *)&nan, sizeof(double)) == 0)
    {
        return nan;
    }
    /* I =  floor(sqrt(X)) */
    I = isqrt((unsigned long)X);
    if (X == (double)(I*I)) /* exact root */
    {
        return (double)I;
    }
    /* Newton's method: A{n} = (A{n-1} + X/A{n-1})/2 */
    A = A_prev = I; /* A{1} = I, or A{1} = floor(sqrt(X)) */
    do {
        A_prev = A;
        A = (A + X/A)/2.0;
    } while (A_prev != A);
    return A;
}

As far as I can tell, this version should be standard-compliant (at least on my hardware), but I do not want to make such a bold statement, especially considering that developing the standard compliant version of sqrt() was not the main point of this post.

Moreover, I do not want to claim that this function works in the same way as sqrt() from the standard C library. For some input values that can be not true. At very least you can expect that the results would be very similar. Please keep it in mind when comparing the implementations.

The main point of this post is to show that one can solve the problem without sacrificing precision with a small amount of code, a bit of math and some knowledge of how the hardware works.

The Full Example

/* to make assert() work in release builds */
#undef NDEBUG

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <assert.h>

#include <math.h>

/*
Given a number N, the task is to find the square root of N without using sqrt() function.
*/
static unsigned long isqrt(const unsigned long X)
{
    unsigned long k = 1, result = 1;
    while (result <= X)
    {
        k++;
        result = k*k;
    }
    return k--;
}

static double nsqrt(double X)
{
    const double inf = 1.0l/0.0l; /* positive infinity */
    const double ninf = -inf; /* negative infinity */
    const double nan = 0.0l/0.0l; /* not a number */
    unsigned long I;
    double A_prev, A;
    if (X == 0.0 || X == 1.0 || X == inf)
    {
        return X;
    }
    else if (X == 2.0)
    {
        return M_SQRT2;
    }
    else if (X == ninf || X < -0.0)
    {
        return nan;
    }
    /* NaN cannot be compared using == operator */
    else if (memcmp((void *)&X, (void *)&nan, sizeof(double)) == 0)
    {
        return nan;
    }
    /* I =  floor(sqrt(X)) */
    I = isqrt((unsigned long)X);
    if (X == (double)(I*I)) /* exact root */
    {
        return (double)I;
    }
    /* Newton's method: A{n} = (A{n-1} + X/A{n-1})/2 */
    A = A_prev = I; /* A{1} = I, or A{1} = floor(sqrt(X)) */
    do {
        A_prev = A;
        A = (A + X/A)/2.0;
    } while (A_prev != A);
    return A;
}

int main(int argc, char **argv)
{
    const double inf = 1.0l/0.0l; /* positive infinity */
    const double ninf = -inf; /* negative infinity */
    const double nan = 0.0l/0.0l; /* not a number */
    size_t i;
    /* some edge cases */
    assert(isnan(sqrt(nan)));
    assert(isnan(nsqrt(nan)));
    assert(isnan(sqrt(-1.0)));
    assert(isnan(nsqrt(-1.0)));
    assert(sqrt(inf) == nsqrt(inf));
    assert(isnan(sqrt(ninf)));
    assert(isnan(nsqrt(ninf)));
    assert(sqrt(0.0) == nsqrt(0.0));
    assert(sqrt(-0.0) == nsqrt(-0.0));
    assert(sqrt(1.0) == nsqrt(1.0));
    assert(sqrt(2.0) == nsqrt(2.0));

    for (i = 0; i < 100; i++)
    {
        const double r = sqrt(i), nr = nsqrt(i);
        printf("%lu - sqrt(): %.50f, nsqrt(): %.50f. same: %s\n",
                i, r, nr, (r == nr ? "true" : "false"));
    }
    return 0;
}

Footnotes:

1

I have to admit that this sentence was added after some discussion at Reddit and it is quite bold. I tried to find some facts which will prove that I am wrong, but so far I have found none. But I am not a floating-point expert, though, so if you have information on this or, even better, the proof - please feel free to contact me. I am interested in improving this code and always ready to admit my mistakes.