Arrays in C

1. Do array subscripts always start with zero?

Yes. If you have an array a[MAX] (in which MAX is some value known at compile time), the first element is a[0], and the last element is a[MAX-1]. This arrangement is different from what you would find in some other languages. In some languages, such as some versions of BASIC, the elements would be a[1] through a[MAX], and in other languages, such as Pascal, you can have it either way.
This variance can lead to some confusion. The "first element" in non-technical terms is the "zeroth" element according to its array index. If you're speaking rather than writing code, use "first" as the opposite of "last." If that's not precise enough, use pseudo-C. You might say, "The elements a sub one through a sub eight," or, "The second through ninth elements of a."
There's something you can do to try to fake array subscripts that start with one. Don't do it. The technique is described here only so that you'll know why not to use it.
Because pointers and arrays are so closely related in C, you might consider creating a pointer that would refer to the same elements as an array but would use indices that start with one. For example:

/* don't do this!!! */
int     a0[ MAX ];
int     *a1 = a0 - 1;   /* & a0[ -1 ]: undefined behavior */

Thus, the first element of a0 (if this worked, which it might not) would be the same as a1[1]. The last element of a0, a0[MAX-1], would be the same as a1[MAX]. There are two reasons why you shouldn't do this.
The first reason is that it might not work. According to the ANSI/ISO standard, it's undefined (which is a Bad Thing): even computing a pointer to a position before the first element of an array is undefined behavior, whether or not you ever dereference it. The problem is that &a0[-1] might not be a valid address. Your program might work all the time with some compilers, and some of the time with all compilers. Is that good enough?
The second reason not to do this is that it's not C-like. Part of learning C is to learn how array indices work. Part of reading (and maintaining) someone else's C code is being able to recognize common C idioms. If you do weird stuff like this, it'll be harder for people to understand your code. (It'll be harder for you to understand your own code, six months later.)
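The legitimate alternative is to keep the array 0-based and convert a 1-based position at the moment you use it, so no pointer ever points before a0[0]. The following is a minimal sketch; the function name nth and its bounds check are illustrative assumptions, not a standard API.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: return the element at a 1-based position.
 * The conversion to a 0-based index happens here, at the point of
 * use, so no pointer ever refers to the nonexistent a[ -1 ].
 */
int nth( const int a[], size_t count, size_t position )
{
        assert( position >= 1 && position <= count );   /* 1-based bounds */
        return a[ position - 1 ];                        /* ordinary 0-based access */
}
```

For example, with int v[3] = { 10, 20, 30 };, nth( v, 3, 1 ) is 10 and nth( v, 3, 3 ) is 30.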

2. Is it valid to address one element beyond the end of an array?

It's valid to address it, but not to see what's there. (The really short answer is, "Yes, so don't worry about it.") With most compilers, if you say
int i, a[MAX], j;
then either i or j occupies the memory just after the last element of the array. To see whether i or j follows the array, compare their addresses with that of the element following the array. The way to say this in C is that either
& i == & a[ MAX ]
is true or
& a[ MAX ] == & j
is true. This isn't guaranteed; it's just the way it usually works. The point is, if you store something in a[MAX], you'll usually clobber something outside the a array. Even reading the value of a[MAX] is undefined behavior, although on most systems it doesn't cause visible trouble. Why would you ever want to say &a[MAX]? There's a common idiom of going through every member of an array using a pointer. Instead of

for ( i = 0; i < MAX; ++i )
{
        /* do something */;
}

C programmers often write this:

for ( p = a; p < & a[ MAX ]; ++p )
{
        /* do something */;
}

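The kind of loop shown here is so common in existing C code that the C standard says it must work: a pointer to the position just past the last element of an array may be computed and compared with pointers into the array, but it must not be dereferenced.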
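Here is a small sketch of that idiom in a complete function. The name sumWithPointer is hypothetical; the point is that end is computed and compared, but never dereferenced.

```c
#include <stddef.h>

/* Sum n ints with a pointer loop.  end points one past the last
 * element: computing and comparing it is allowed, reading *end is not. */
long sumWithPointer( const int *a, size_t n )
{
        const int       *p;
        const int       *end = a + n;   /* same as & a[ n ] */
        long            total = 0;

        for ( p = a; p < end; ++p )
        {
                total += *p;            /* p < end here, so *p is valid */
        }
        return total;
}
```

With int v[4] = { 1, 2, 3, 4 };, sumWithPointer( v, 4 ) returns 10. An empty range (n of 0) also works, because a + 0 is a valid pointer and the loop body never runs.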

3. Can the sizeof operator be used to tell the size of an array passed to a function?

No. There's no way to tell, at runtime, how many elements are in an array parameter just by looking at the array parameter itself; applied to the parameter, sizeof gives the size of a pointer, not the size of the array. Remember, passing an array to a function is exactly the same as passing a pointer to the first element. This is a Good Thing. It means that passing pointers and arrays to C functions is very efficient.
It also means that the programmer must use some mechanism to tell how big such an array is. There are two common ways to do that. The first method is to pass a count along with the array. This is what memcpy() does, for example:

#include <string.h>     /* for memcpy() */

char    source[ MAX ], dest[ MAX ];
/* ... */
memcpy( dest, source, MAX );    /* count is in bytes; for char, bytes == elements */
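The sketch below puts the two halves of this answer together, assuming a typical hosted compiler. sizeof gives the element count only where the array itself is in scope, so the caller computes the count and passes it along, much as with memcpy() (whose count is in bytes, which equals the element count only for char arrays). ELEMENT_COUNT, sizeofParameter, and largest are hypothetical names chosen for illustration.

```c
#include <stddef.h>

/* Hypothetical macro: element count of a true array.  It gives a
 * wrong answer if x is a pointer, such as an array parameter. */
#define ELEMENT_COUNT( x )  ( sizeof ( x ) / sizeof ( ( x )[ 0 ] ) )

/* Inside a function, an array parameter is really a pointer, so sizeof
 * reports the size of an int *, not of ten ints.  Many compilers warn here. */
size_t sizeofParameter( int arr[ 10 ] )
{
        return sizeof arr;
}

/* Following the memcpy() convention: the caller passes the count.
 * Requires n >= 1. */
int largest( const int a[], size_t n )
{
        size_t  i;
        int     best = a[ 0 ];

        for ( i = 1; i < n; ++i )
        {
                if ( a[ i ] > best )
                {
                        best = a[ i ];
                }
        }
        return best;
}
```

For example, given int v[] = { 3, 9, 4 };, ELEMENT_COUNT( v ) is 3 and largest( v, ELEMENT_COUNT( v ) ) returns 9. Applying ELEMENT_COUNT to a pointer, including an array parameter, silently divides the size of a pointer by the size of an element, which is why the count has to be computed in the caller.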

The second method is to have some convention about when the array ends. For example, a C "string" is just an array of characters reached through a pointer to its first character; the string is terminated by a null character ('\0', the ASCII NUL). This is also commonly done when you have an array of pointers; the last element is a null pointer. Consider the following function, which takes an array of char*s. The last char* in the array is NULL; that's how the function knows when to stop.

#include <stdio.h>      /* for puts() and NULL */

void printMany( char *strings[] )
{
        int     i;

        i = 0;
        while ( strings[ i ] != NULL )
        {
                puts( strings[ i ] );
                ++i;
        }
}

Most C programmers would write this code a little more cryptically:

void printMany( char *strings[] )
{
        while ( *strings )
        {
                puts( *strings++ );
        }
}

C programmers often use pointers rather than indices. You can't change the value of an array name, but because strings is an array parameter, it's really the same as a pointer. That's why you can increment strings. Also,
while ( *strings )
means the same thing as
while ( *strings != NULL )
and the increment can be moved up into the call to puts().
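The same convention works for any function that walks such an array. This is a minimal sketch; countStrings is a hypothetical helper, and it assumes the caller has put a null pointer after the last string.

```c
#include <stddef.h>

/* Hypothetical helper using the same convention as printMany(): the
 * array of pointers ends with NULL, and the loop stops there. */
size_t countStrings( char *strings[] )
{
        size_t  n = 0;

        while ( *strings++ )    /* same as *strings++ != NULL */
        {
                ++n;
        }
        return n;
}
```

Given char *list[] = { "alpha", "beta", NULL };, countStrings( list ) returns 2. If the terminating NULL is missing, the loop runs off the end of the array, so the convention has to be documented.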
If you document a function (if you write comments at the beginning, or if you write a "manual page" or a design document), it's important to describe how the function "knows" the size of the arrays passed to it. This description can be something simple, such as "null terminated," or "elephants has numElephants elements." (Or "arr should have 13 elements," if your code is written that way. Using hard-coded numbers such as 13 or 64 or 1024 is not a great way to write C code, though.)

4. Is it better to use a pointer to navigate an array of values, or is it better to use a subscripted array name?

It's easier for a C compiler to generate good code for pointers than for subscripts.
Say that you have this:

/* X is some type */
X       a[ MAX ];       /* array */
X       *p;     /* pointer */
X       x;      /* element */
int     i;      /* index */

Here's one way to loop through all elements:

/* version (a) */
for ( i = 0; i < MAX; ++i )
{
        x = a[ i ];
        /* do something with x */
}

On the other hand, you could write the loop this way:

/* version (b) */
for ( p = a; p < & a[ MAX ]; ++p )
{
        x = *p;
        /* do something with x */
}

What's different between these two versions? The initialization and increment in the loop cost about the same. The comparison is about the same; more on that in a moment. The difference is between x=a[i] and x=*p. The first has to find the address of a[i]; to do that, it needs to multiply i by the size of an X and add the result to the address of the first element of a. The second just has to dereference the p pointer. Indirection is fast; multiplication is relatively slow.
This is "micro efficiency." It might matter, it might not. If you're adding the elements of an array, or simply moving information from one place to another, much of the time in the loop will be spent just using the array index. If you do any I/O, or even call a function, each time through the loop, the relative cost of indexing will be insignificant.
Some multiplications are less expensive than others. If the size of an X is 1, the multiplication can be optimized away (1 times anything is the original anything). If the size of an X is a power of 2 (and it usually is if X is any of the built-in types), the multiplication can be optimized into a left shift. (Shifting a binary number left by one place doubles it, just as appending a zero to a decimal number multiplies it by 10.)
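To make the address arithmetic concrete, the sketch below spells out what x = a[i] asks the compiler to compute, and the shift substitution it can make for power-of-2 sizes. The struct standing in for X and the three function names are hypothetical; sizeof ( X ) depends on the compiler and is not necessarily a power of 2.

```c
#include <stddef.h>

/* An illustrative element type standing in for X. */
typedef struct { char tag; double value; } X;

/* What x = a[ i ] implies: scale i by sizeof ( X ) and add the result,
 * in bytes, to the address of a[ 0 ].  Written out by hand for illustration. */
const X *addressOfElement( const X *a, size_t i )
{
        const char      *base = ( const char * ) a;

        return ( const X * ) ( base + i * sizeof ( X ) );
}

/* When the element size is a power of 2, the scaling can be a shift:
 * for unsigned values, i * 8 and i << 3 always give the same result. */
size_t scaleByMultiply( size_t i ) { return i * 8; }
size_t scaleByShift( size_t i )    { return i << 3; }   /* 8 == 1 << 3 */
```

For any valid i, addressOfElement( arr, i ) compares equal to & arr[ i ]; scaleByMultiply( 5 ) and scaleByShift( 5 ) both return 40. Because size_t is unsigned, the two scaling functions agree for every input, including ones whose result wraps around.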
What about computing &a[MAX] every time through the loop? That's part of the comparison in the pointer version. Isn't that as expensive as computing a[i] each time? It's not, because &a[MAX] doesn't change during the loop. Any decent compiler will compute that once, at the beginning of the loop, and use the same value each time. It's as if you had written this:

/* how the compiler implements version (b) */
X       *temp = & a[ MAX ];     /* optimization */
for ( p = a; p < temp; ++p )
{
        x = *p;
        /* do something with x */
}

This works only if the compiler can tell that a and MAX can't change in the middle of the loop. There are two other versions; both count down rather than up. That's no help for a task such as printing the elements of an array in order. It's fine for adding the values or something similar. The index version presumes that it's cheaper to compare a value with zero than to compare it with some arbitrary value:

/* version (c) */
for ( i = MAX - 1; i >= 0; --i )
{
        x = a[ i ];
        /* do something with x */
}

The pointer version makes the comparison simpler:

/* version (d) */
for ( p = & a[ MAX - 1 ]; p >= a; --p )
{
        x = *p;
        /* do something with x */
}

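Code similar to that in version (d) is common, but not necessarily right. The loop ends only when p is less than a, which means p must eventually point before the first element of a. As explained in the first answer, computing such a pointer is undefined behavior, so that might not be possible.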
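A well-defined way to count down with a pointer is to start one past the end, which the standard allows, and decrement before each access, so p never has to point before the first element. This is a common idiom rather than one of the four tested versions, and copyReversed is a hypothetical name; treat it as a minimal sketch.

```c
#include <stddef.h>

/* Copy n ints from src to dst in reverse order, counting down with a
 * pointer that never goes below src. */
void copyReversed( const int *src, int *dst, size_t n )
{
        const int       *p = src + n;   /* one past the end: may be formed, not read */

        while ( p > src )
        {
                --p;                    /* decrement first, so p points into src */
                *dst++ = *p;
        }
}
```

With int src[3] = { 1, 2, 3 };, copyReversed( src, dst, 3 ) leaves dst holding 3, 2, 1. Calling it with n equal to 0 does nothing, because p starts equal to src.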
The common wisdom would finish by saying, "Any decent optimizing compiler would generate the same code for all four versions." Unfortunately, there seems to be a lack of decent optimizing compilers in the world. A test program (in which the size of an X was not a power of 2 and in which the "do something" was trivial) was built with four very different compilers. Version (b) always ran much faster than version (a), sometimes twice as fast. Using pointers rather than indices made a big difference. (Evidently, all four compilers moved the computation of &a[MAX] out of the loop.)
How about counting down rather than counting up? With two compilers, versions (c) and (d) were about the same as version (a); version (b) was the clear winner. (Maybe the comparison is cheaper, but decrementing is slower than incrementing?) With the other two compilers, version (c) was about the same as version (a) (indices are slow), but version (d) was slightly faster than version (b).
So if you want to write portable, efficient code to navigate an array of values, using a pointer tends to be faster than using subscripts. Use version (b); version (d) might not work, and even if it does, it might be compiled into slower code.
Most of the time, though, this is micro-optimizing. The "do something" in the loop is where most of the time is spent, usually. Too many C programmers are like half-sloppy carpenters; they sweep up the sawdust but leave a bunch of two-by-fours lying around.