Strings in C

 

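In C, a string is an array of characters terminated by a null character ('\0'). The array can be allocated non-dynamically (as an ordinary declared array) or dynamically (with malloc). It is accessed through the array's name or through a pointer variable that holds its address.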

 

Example #1 – non-dynamic allocation

 

#include <stdlib.h>
#include <stdio.h>

#define N 10

int main(int argc, char *argv[])
{
    char fName[ N ];

    /*
       The memory allocated for the string fName looks like:

       fName: [?][?][?][?][?][?][?][?][?][?]
               0  1  2  3  4  5  6  7  8  9

       The ? means the value in the cell is undefined:
       whatever was last stored there is still there.
    */

    printf("Enter your first name: ");
    scanf( "%s", fName );   /* no width limit: see Question #1 below */

    /*
       Assume the user types:   Timothy

       The memory allocated for the string fName now looks like:

       fName: ['T']['i']['m']['o']['t']['h']['y']['\0'][?][?]
                0    1    2    3    4    5    6    7    8  9

       scanf copies the chars typed at the keyboard (up to the first
       whitespace character) into consecutive elements of the array and
       then adds a null character as a terminator. '\0' is simply the
       char value 0 (zero): a byte whose bits are all zero. The null
       terminator must be present for printf and the string functions
       to work properly. It follows that a character array of length N
       can safely store at most N-1 characters.
    */

    printf("Your first name is %s\n", fName);
    /* printf stops printing chars when it encounters the null character */

    return 0;
} /* END OF MAIN */

 

The printf function starts at the first char of the array and continues printing chars until it sees the null character. The null character is not printed.  
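The stopping rule printf follows for %s can be imitated with a simple loop. This is a minimal sketch for illustration only, not how any particular C library implements printf; the function name print_string is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper (not part of any library): prints the chars of s
   one at a time, stopping at (and not printing) the null terminator,
   the way printf's %s conversion does. Returns the number of chars
   printed. */
size_t print_string(const char *s)
{
    size_t count = 0;
    while (s[count] != '\0') {
        putchar(s[count]);
        count++;
    }
    return count;
}
```

Calling print_string(fName) after the user types Timothy would print the 7 letters and return 7. The '\0' itself is never printed.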

 

A couple of questions arise:

 

Question #1:  What if you enter a string with more than N-1 characters?

 

In this case scanf continues to copy characters into memory past the end of the array and then writes the null terminator there as well.

 

If the user types:   BillyJoeBob

then memory now looks like:

fName: ['B']['i']['l']['l']['y']['J']['o']['e']['B']['o']['b']['\0']
         0    1    2    3    4    5    6    7    8    9   10   11
                                                         ^^^^^^^^^^^
                      OOPS!   We don't (necessarily) own this memory

 

We have just written, via the fName variable, into memory that does not belong to fName! If those last 2 bytes were being used by some other variable in our program, we just trashed its value! In C terms this is undefined behavior, and C does not guarantee to detect such a mistake for us. Your program may crash before the copy finishes or, worse yet, it may keep running with corrupted memory, which can produce unexpected and unpredictable behavior later. These kinds of errors can be very difficult to find, since the behavior may differ from run to run. If the program does crash, you will probably see an error message containing the word segfault (on Linux, typically "Segmentation fault").
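One common way to prevent this overflow is to give %s a maximum field width. Here is a minimal sketch, assuming N is 10 as in Example #1. The function name read_first_name is hypothetical, and it takes a FILE * so that it can read from stdin or from any other stream.

```c
#include <stdio.h>

#define N 10

/* Hypothetical helper: reads one whitespace-delimited word from `in`
   into name. The field width 9 (that is, N - 1) stops fscanf from
   storing more than 9 chars, leaving room for the '\0' it appends.
   The width is a literal, so it must be kept in sync with N by hand.
   Returns 1 if a word was read, 0 otherwise. */
int read_first_name(FILE *in, char name[N])
{
    return fscanf(in, "%9s", name) == 1;
}
```

In the Example #1 program the call would become read_first_name(stdin, fName). If the user types BillyJoeBob, fName receives "BillyJoeB", and the leftover characters ("ob") remain in the input stream for the next read instead of spilling past the array. fgets is another common choice for bounded input.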

 

Question #2:  What if printf is fed a string that does not have a null terminator?

 

int i=0;

char c = 'a';

while (i < N)  /* we hardcode the contents of the string but no null terminator */

         fName[ i++] = c++;

 

and now memory looks like:

 

fName: ['a']['b']['c']['d']['e']['f']['g']['h']['i']['j']

         0    1    2    3    4    5    6    7    8    9

 

printf now has no way of knowing where the string ends. It keeps printing characters past the end of the array until it happens to encounter a zero byte, OR it crashes somewhere after the end of the array. Either way, the behavior is undefined.
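The fix is to stop filling one element early and store the terminator in the last slot. This is a minimal sketch; fill_letters is a hypothetical name. Like the loop above, it assumes the letters 'a' through 'i' have consecutive codes, which is true in ASCII.

```c
#include <stddef.h>

/* Hypothetical helper: fills s (an array of n chars, n >= 1) with the
   letters 'a', 'b', 'c', ... but stops one element early so that the
   last element can hold the null terminator. */
void fill_letters(char *s, size_t n)
{
    size_t i = 0;
    char c = 'a';
    while (i < n - 1)      /* leave room for '\0' */
        s[i++] = c++;
    s[i] = '\0';           /* printf("%s", s) now knows where to stop */
}
```

With a 10-char array this stores "abcdefghi" in elements 0 to 8 and '\0' in element 9, so printf stops exactly at the end of the array.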


Common String Functions

(formal descriptions are found in the man pages)

 

 

STRING LENGTH

 

size_t strlen(const char *s);

 

 

The strlen function returns a value of type size_t (an unsigned integer type) giving the number of characters in the string, not including the terminating null character.
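To make the definition concrete, here is a hand-written equivalent. It is a sketch for illustration only, and my_strlen is a hypothetical name. Real programs should call strlen from <string.h>.

```c
#include <stddef.h>

/* Hypothetical re-implementation of strlen, for illustration only.
   Walks forward until the null terminator and returns how many chars
   were passed, so the '\0' itself is not counted. */
size_t my_strlen(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```

Note the difference from sizeof. For char fName[10] = "Tim";, sizeof fName is 10 (the size of the array), while my_strlen(fName) and strlen(fName) are 3 (the length of the string stored in it).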

 

 

STRING COPYING

 

char* strcpy(char *dest, const char *src);

 

The strcpy function copies the contents of one string into another and tacks on the terminating null character. It returns a pointer to the dest string.
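The copy-then-terminate behavior can be sketched in a few lines. This is a hypothetical re-implementation (my_strcpy) written for illustration, not the C library's code. Like strcpy, it performs no checks.

```c
/* Hypothetical re-implementation of strcpy, for illustration only.
   Copies src into dest one char at a time, including the terminating
   '\0', and returns dest. Like strcpy, it assumes dest is large enough
   and src is null-terminated; it checks neither. */
char *my_strcpy(char *dest, const char *src)
{
    char *d = dest;
    while ((*d++ = *src++) != '\0')
        ;   /* each iteration copies one char; the loop ends after the '\0' */
    return dest;
}
```

The loop condition both copies a char and tests it, so the null character is copied before the loop exits. That is how the terminator ends up in dest.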

 

char foo[10];

char bar[10];

 

strcpy(foo, "Hello");

 

foo: ['H']['e']['l']['l']['o']['\0'][?][?][?][?]

       0    1    2    3    4    5    6  7  8  9

 

Note that strcpy accepts a string literal as its source. The const in the prototype does not mean that src must be a string literal. It means that the code inside strcpy promises not to modify the source string through src. If code tries to assign through a const char * parameter (for example *src = 'x';), the compiler rejects it with an error about assigning to a read-only location. C is more permissive than C++ elsewhere, though: a C compiler typically only warns when a const char * is converted to a plain char * without a cast, and an explicit cast silences even that warning, after which the string can be modified through the new pointer. C++ refuses that implicit conversion outright. Modifying data that really is const (or a string literal) this way is undefined behavior: C, as usual, lets you do something that is inconsistent (and possibly very bad!).
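A const char * parameter works the same way in your own functions. This is a minimal sketch; count_char is a hypothetical name. The function can be passed either a literal or an ordinary char array because it only reads the string.

```c
#include <stddef.h>

/* Hypothetical helper: counts occurrences of ch in s. The const in the
   parameter type promises that the function only reads through s, so
   it accepts string literals and ordinary char arrays alike. */
size_t count_char(const char *s, char ch)
{
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (*s == ch)
            count++;
        /* *s = 'x';   would be rejected: s points to const char */
    }
    return count;
}
```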

 

It looks as though strcpy added a null character to the dest string. What actually happened is that the compiler stored a null character at the end of the literal "Hello", and strcpy copied it onto dest just as it copies the terminator of any other null-terminated source.
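The hidden terminator becomes visible if you let the compiler size an array from a literal. In this small sketch, greeting and greeting_size are illustrative names.

```c
#include <stddef.h>

/* The literal "Hello" is stored as 6 chars: 'H','e','l','l','o','\0'.
   Letting the compiler size the array from the literal exposes the
   terminator through sizeof. */
static const char greeting[] = "Hello";

size_t greeting_size(void)
{
    return sizeof greeting;   /* 6, not 5 */
}
```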

 

Note also that strcpy does not require dynamic (malloc'd) memory for the destination. The char *dest parameter merely specifies that the address of a char (a pointer to char) must be passed in. When we pass the array foo, its name is converted to the address of its first element, which is exactly what dest needs. An array name such as foo behaves like a constant pointer to char: we can never assign a new address to it (foo = bar; does not compile), so in main the name foo is bound for life to the same chunk of memory. Thus we refer to foo as the name of a variable (the array) rather than as a pointer variable.
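Because strcpy only needs the address of enough writable chars, malloc'd memory works as a destination just as well as an array does. In this minimal sketch, copy_to_heap is a hypothetical name. The caller is assumed to free the result.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocates strlen(src) + 1 bytes (the +1 is for
   the '\0') and strcpy's src into them. Returns NULL if malloc fails;
   otherwise the caller must free the result. */
char *copy_to_heap(const char *src)
{
    char *dest = malloc(strlen(src) + 1);
    if (dest != NULL)
        strcpy(dest, src);
    return dest;
}

/* By contrast, an array name cannot be given a new address:
       char foo[10], bar[10];
       foo = bar;      does not compile: an array is not assignable */
```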

 

strcpy(bar, foo);

 

produces:

 

bar: ['H']['e']['l']['l']['o']['\0'][?][?][?][?]

       0    1    2    3    4    5    6  7  8  9

 

It is important to remember that strcpy copies from src into the dest string starting at the address in dest.

 

strcpy(bar, "Tim");

 

produces:

bar: ['T']['i']['m']['\0']['o']['\0'][?][?][?][?]
       0    1    2    3     4    5    6  7  8  9
                          \_____________________/
                           this memory unchanged

 

Note that strcpy does not alter the portion of the dest string after the null character deposited by the copy operation.
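The two strcpy calls above can be reproduced and checked directly. This is a small sketch; overwrite_demo is a hypothetical name. It assumes bar has room for at least 6 chars.

```c
#include <string.h>

/* Hypothetical demo of the diagrams above: after copying "Hello" and
   then "Tim" into bar, the bytes after Tim's '\0' still hold the tail
   of "Hello". bar must have room for at least 6 chars. */
void overwrite_demo(char *bar)
{
    strcpy(bar, "Hello");   /* bar: H e l l o \0 */
    strcpy(bar, "Tim");     /* bar: T i m \0 o \0 */
}
```

Printing bar with %s afterwards shows only Tim, because printf stops at the first '\0'. The leftover 'o' is still in memory but is no longer part of the string.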

 

strcpy does not do any error checking for invalid arguments. If the src string is not null-terminated, then strcpy will keep reading further in memory until it crashes or happens to encounter a null character. If the dest string is not big enough to hold src, then strcpy will copy chars past the end of dest until it crashes or completes the copy.

 

Both of these cases are errors (undefined behavior), even though your program might not actually crash on any particular run.
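If you cannot be sure that dest is big enough, you can check the length before copying. This is a minimal sketch under the assumption that the caller knows the total size of dest. The name checked_copy is hypothetical and is not a standard function.

```c
#include <string.h>

/* Hypothetical helper (not a standard function): copies src into dest
   only if it fits, counting the '\0'. dest_size is the total size of
   the dest array (e.g. sizeof foo). Returns 1 on success, or 0 if src
   is too long, in which case dest is left untouched. src must still
   be null-terminated. */
int checked_copy(char *dest, size_t dest_size, const char *src)
{
    size_t len = strlen(src);
    if (len + 1 > dest_size)
        return 0;
    memcpy(dest, src, len + 1);   /* copies the '\0' too */
    return 1;
}
```

strncpy is sometimes suggested for this purpose, but it is not a drop-in fix. If src has n or more characters, strncpy(dest, src, n) does not write a terminating '\0'.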

 

 

STRING COMPARISON

 

int strcmp(const char *s1, const char *s2);

 

 

The strcmp function behaves much like the compareTo method in Java Strings (or vice versa since C came first).

 

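strcmp compares the two strings one char at a time, starting at the addresses passed in. Typical implementations subtract the value of s2[i] from s1[i]. If the difference is non-zero, or if the end of either string is reached, that difference is returned. The C standard guarantees only the sign of the result: it is negative if s1 sorts before s2, zero if they are equal, and positive if s1 sorts after s2 (the chars are compared as unsigned char). See the man pages for a formal description.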
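The char-by-char comparison can be sketched as follows. This is a hypothetical re-implementation (my_strcmp) for illustration only. A real C library may return different non-zero values with the same sign.

```c
/* Hypothetical re-implementation of strcmp, for illustration only.
   Walks both strings in step. At the first position where they differ
   (or where both end), it returns the difference of the two chars,
   compared as unsigned char. Callers should rely only on the sign. */
int my_strcmp(const char *s1, const char *s2)
{
    const unsigned char *a = (const unsigned char *)s1;
    const unsigned char *b = (const unsigned char *)s2;
    while (*a != '\0' && *a == *b) {
        a++;
        b++;
    }
    return *a - *b;
}
```

Because only the sign is guaranteed, test the result against 0 (strcmp(a, b) < 0, == 0, or > 0) rather than against specific values such as -1 or 1.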

 

As with the other string functions, no error checking is done. Bad arguments (such as null pointers or strings without a null terminator) produce crashes or unpredictable behavior.