Q&A about C
for CS16, CS Dept. UCSB


Introduction

On this web page are some questions that students have asked about programming in C, along with some answers that I hope are helpful.

Phill Conrad, CS Dept. UCSB

About atof(), atoi() and char *

Question: So, when values are stored in a char * are they converted to ASCII? Is that why we have to atof all values we extract from it?

Answer: char * values are indeed ASCII representations of data. What you type in on the command line starts out as ASCII, and then has to be converted into floating-point representation---and that's what atof does.

Here's a slightly longer explanation, and then a very detailed explanation:

Slightly longer explanation

A char * is basically a pointer to an array of char values, each of which holds the bit pattern, in ASCII, of some character. A \0 character is required at the end to mark where the string stops.

Suppose we write this on the command line:

./rectPrism 2.5 3 4<return>

Then, argv[1] is a char *, that is, a "pointer": it contains the address in memory of a sequence of characters, shown here one per line:

'2' Bits in memory are the ASCII value 50, which in binary is: 0011 0010
'.' Bits in memory are the ASCII value 46, which in binary is: 0010 1110
'5' Bits in memory are the ASCII value 53, which in binary is: 0011 0101
'\0' Bits in memory are the ASCII value 0, which in binary is: 0000 0000

So, argv[1] points to this sequence of bits in memory:

0011 0010 0010 1110 0011 0101 0000 0000

But we need the value 2.5 as a double. That value is represented as 64 bits, which happen to be these (for reasons that may be explored in more detail either later in this class, or in CS64):

0100 0000 0000 0100 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000

So, literally, that's what atof does: convert an ASCII bit sequence into a floating-point bit sequence.

(Just like ASCII is the standard for individual characters, there is a standard for floating-point numbers--it's called IEEE-754. But I don't expect you to memorize that.)

An even more detailed explanation

Here's our command line again:

./rectPrism 2.5 3 4<return>

This command starts out life as a sequence of ASCII characters---one ASCII code for each character you type, including the dot, the slash, the spaces, and the final <return>.

When you hit enter on the command line, the Unix system will take this command, and do a little bit of processing on it before it hands it over to the C program. In particular, it will take the single string:

"./rectPrism 2.5 3 4\n"

and turn it into

"./rectPrism" "2.5" "3" "4"

It does this by literally sticking a \0 at the end of each part of the command, in place of the spaces:

./rectPrism\02.5\03\04\0

The character \0 is not two characters---it's just the way we write the ASCII value 0000 0000. It is a special sentinel character that marks the end of a string (you may recall we read about this in an earlier homework assignment, because I wanted you to be familiar with it when we started talking about atoi, atof, argv, etc.)

So here's how to visualize this in memory, one character per line:
. <------ argv[0] points here
/
r
e
c
t
P
r
i
s
m
\0
2 <----- argv[1] points here
.
5
\0
3 <----- argv[2] points here
\0
4 <----- argv[3] points here
\0

And then argc has the value 4, because argv is an array with values argv[0], argv[1], argv[2] and argv[3]---four values in all.

So, what argv[1] is in the above example is the C string "2.5": a pointer to a sequence of bytes in memory holding "2.5" followed by a \0 byte (which is 0000 0000 in binary):

0011 0010 0010 1110 0011 0101 0000 0000

What atof does is to take this sequence of four bytes and turn it into a sequence of eight bytes that represents the number 2.5 as a floating-point number (a double). That bit sequence turns out to be this:

hex: 4004000000000000
binary: 0100 0000 0000 0100 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000

The exact bit sequence is a little bit tough to explain at this point---we will talk about it in class eventually, but I've already put a lot of detail into this answer, and I don't want to overwhelm you---I just want to answer your question completely. But if you want to see a calculator for this value, check out this web page:

http://babbage.cs.qc.cuny.edu/IEEE-754/Decimal.html


Followup question: That makes sense. My follow-up question would be: when would we use atoi and atof if, when we type ./myprogram 2.5, we get argv[1] = 2.5 and the value is automatically converted for you?

Answer: So, I'm not sure what you mean by "the value is automatically converted for you". You talked about "converted to ASCII" in your original question too---so I'm doubly confused.

I'm wondering if my explanation has led you down the wrong path... there is no automatic conversion at all going on here. If you think there is, then you have some wrong view of the situation that we need to understand and fix. :-)

Originally the question was:


So, when values are stored in a char * are they converted to ASCII? Is that why we have to atof all values we extract from it?

Perhaps I should have answered your original question with "NO, That is NOT the reason." :-)

The reason we have to use atof to extract values from a char * is that they were always ASCII in the first place. They are never "converted to ASCII". On the contrary, they were never anything other than ASCII.

Below, I'll try to explain what I mean by "they were never anything other than ASCII".

(I didn't think this was a crucial point before so I sort of "let it slide", but now I'm thinking that by not explaining this, I may have contributed to some misunderstanding on your part.)

A char * is simply a pointer to a sequence of characters--that is the numeric address of those characters inside the computer's memory. Those characters could be "Apple", or they could be "@#$#$" or they could be "2.5". As far as the computer is concerned, there is no difference between these three. They are all just meaningless symbols---just about the only thing we can do is print them on the screen, or store them on the disk. In particular, the computer doesn't know how to do math on "2.5"---it can't multiply it by 2 and get 5.0. It can't divide by 5 and get 0.5.

The "A" character you just saw on the screen in quotes near the start of this sentence was ASCII from the time my finger hit the "A" key on my keyboard until it appeared on the screen in front of you---and it was ASCII the whole way over the internet in between. Same with the number "2.5" that you are now seeing---ASCII all the way from my fingers to the computer in front of you. No conversion---it never was anything other than ASCII.

Something like "2.5" only acquires "meaning" as a number---in the sense that we can multiply it by another number, add another number, divide by another number---after we do the atof conversion to store it in a double variable:

double x;
x = atof("2.5");

or, if "2.5" is pointed to by argv[1], then:

double x;
x = atof(argv[1]);

Once we do this conversion, we can then do math on the variable x. We can compute x*2.0, or x/2.0, or sqrt(x), for example.

But we can't do math on argv[1], at least not in any meaningful way. And this conversion is NOT automatic. We have to specify explicitly that we want the conversion from ASCII to double (i.e. double-precision floating point) to take place, using the atof function (or from ASCII to integer, using the atoi function.)

Returning to your follow-up question:

That makes sense. My follow up question would be: when would we use atoi and atof if, when we type, ./myprogram 2.5 we get argv[1] = 2.5 and the value is automatically converted for you?

So, now we understand that there is no automatic conversion---only ASCII from the time you type 2.5 until it lands in memory, pointed to by argv[1].

If you type this at the Unix prompt:

bash-3.2$  ./myprogram 2.5

Then afterwards:

argv[0] is now pointing to "./myprogram"
argv[1] is now pointing to "2.5"

And what argv[1] points to is a sequence of ASCII characters.

As an ASCII value, the number 2.5 is fairly useless to us. The only thing we can do with it is print it. We can't multiply it by 2, or divide it by 3, or calculate its square root. The ASCII value 2.5 is just three symbols---it might as well be ###, or y.x, or !!!.

We can only do MATH on the value 2.5 after we convert 2.5 to its floating point representation inside a variable like x, like this:

double x,y,z;
x = atof(argv[1]);

NOW, we can do things like:

y = x * 2.0;
y = x / 3.0;
z = sqrt(x); // this requires #include <math.h>

We can't do any of those things with argv[1]. In particular, the following is meaningless:

y = argv[1] * 2.0; // meaningless!

That probably won't even compile---but if it did, what it would mean is "take some address in memory like FFCA3124 and multiply it by 2.0." The address FFCA3124 would be the location in memory of the characters "2.5". After we multiply it by 2.0, we have a very large number that is probably not even a valid address. The value stored in the variable y would have no relationship to the number 2.5, and it almost certainly won't be 5.0.

(As an analogy: multiplying a pointer by 2.0 is sort of like multiplying a zip code by 2. The zip code for UCSB is 93106. Multiply by 2, and you get 186212, but so what? That number doesn't mean anything---it isn't even a valid zip code anymore.)

Turning now to the atoi function:

We use atoi when we want to convert from what is in argv[1] into an integer, instead of into a floating-point number. For example, if we had

./myprogram 12 42 55

we could do:

int x,y,z;
x = atoi(argv[1]);
y = atoi(argv[2]);
z = atoi(argv[3]);

If instead, we have: ./myprogram 12.1 42.3 55.9

we'd use:

double x,y,z;
x = atof(argv[1]);
y = atof(argv[2]);
z = atof(argv[3]);

You may wonder: how do we know in advance which one to use if we don't know what the user is going to type?

The answer goes sort of like this:

(1) If a program needs to be able to take either numbers with decimals or integers, then always use double along with atof. Your users will be able to type either one. atof can convert either one and store the correct value in a double variable.

(2) If only integers make sense for your program (for example, you are specifying something that only makes sense in whole numbers), then use int, along with atoi. Anything your user types after the decimal point will just be ignored when atoi converts (it will not round, but just "chop off" the decimal---we call this "truncation").

I hope this helps to clear things up. If not, let me know. I want to get this explanation right!

Declaring the loop variable in a for loop
e.g. for (int i=0; i<n; i++)
like you do in Java and C++

Question: You taught that in C we have to declare the variable for a loop at the top of the block (i.e. the top of the set of { } that encloses the for loop)---for example:

int main()
{
   int i;
   printf("Santa says: \n");
   
   for (i=0; i<3; i++)
      printf("Ho! ");
   printf("Merry Christmas!\n");
   return 0;
}

But I was taught before that you could write a for loop like this, declaring i in the for loop:


   for (int i=0; i<3; i++)

Why can't we do that in C?

Answer: It turns out that C++ and Java, both of which came later than C, introduced the ability to declare the loop control variable right in the for loop. There are many other improvements that C++ and Java made over plain C, including the new style of comment (//).

Some of these have made their way back into C, as C has continued to evolve. The // style of comment that was introduced into C++ and Java is now available by default on many C compilers, including the gcc C compiler that we use on CSIL.

The for (int i=0; i<3; i++) style of for loop is part of C99, a newer dialect of C. To use it, compile with the special command line option -std=c99, for example:

gcc -std=c99 myprog.c -o myprog

In CS16, for now, we are avoiding these features, and sticking mostly to the older ANSI C 1989 dialect. We stray from ANSI C in a few small ways, for example by allowing the // style of comments.

C++ has inline functions. What about C?

First, see the answer about the for (int i=0; i<n; i++) style of for loop above, for a discussion of C99 vs. the older dialect of C we are using in CS16.

ANSI C does not have inline functions. They are available in C++, C99, and the GNU extensions to C. The Wikipedia page on inline functions (retrieved on 10/19/2009) has a nice discussion of the pros/cons of inlining, including these points:

"Inline expansion is used to eliminate the time overhead when a function is called. It is typically used for functions that execute frequently. It also has a space benefit for very small functions, and is an enabling transformation for other optimizations.

Without inline functions, however, the compiler decides which functions to inline. The programmer has little or no control over which functions are inlined and which are not. Giving this degree of control to the programmer allows her/him to use application-specific knowledge in choosing which functions to inline.

...

Besides the problems associated with in-line expansion in general, inline functions as a language feature may not be as valuable as they appear, for a number of reasons: ..."

What does the error "undefined reference to main" mean?


Question:
I have been working on H04 and have been trying to test the code after I write it. However, when I try to test the code through CSIL, I get an error:


-bash-4.1$ make xCubed
cc xCubed.c -o xCubed
/usr/lib/gcc/i686-redhat-linux/4.4.3/../../../crt1.o:
In function `_start':(.text+0x18): undefined reference to `main'
collect2: ld returned 1 exit status
make: *** [xCubed] Error 1

I got this error after copying the xCubed function into emacs from the example handout. I am also getting this error for my areaOfTriangle and other functions. Any ideas why this is occurring?

Answer: Thanks for your question. This error occurs because every C program you actually run must have a main function---the linker is complaining that it can't find one in your file, which contains only the xCubed function.

You can get past this error by doing one of two things:

  1. To just check the syntax: when you compile, instead of typing:

    make xCubed

    type this:

    cc -c xCubed.c

    This will "compile only" without producing a program you can run. You'll get a file named "xCubed.o" in your account if the program compiles successfully---but you won't be able to run that program---you'll only know that the C code doesn't contain any syntax errors.

  2. If you want to actually test, you'll need to include a main program such as the following in your file. Then you can compile with "make" the normal way, and actually test your function for different values:

    #include <stdio.h>  // needed for printf and scanf

    int main()
    {
      double x;

      // prompt for input
      printf("Enter a value for x: ");
      scanf("%lf",&x);

      // print results
      printf("xCubed(x)=%lf\n",xCubed(x));
      return 0;
    }

    If x is an int instead of a double, be sure to change:

    double to int
    %lf to %i

In lab02, we'll be exploring even better ways to test our code.

Copyright 2009, Phillip T. Conrad, CS Dept, UC Santa Barbara. Permission to copy for non-commercial, non-profit, educational purposes granted, provided appropriate credit is given; all other rights reserved.