String Manipulation in C
Overview
This page teaches you how to work with strings in the C programming language. If you are used to higher-level languages such as Python or JavaScript, you'll quickly discover that C lacks some of the "nice" string manipulation features: there is no string concatenation with + (e.g. str1 + str2), no easy .length() property (although there is a function for that, strlen()), and nothing like a built-in regex engine. It doesn't even really have the concept of a string type, just "pointer to char" (char*), although some would argue that for all intents and purposes that is a string type.
What Are Strings In C?
Strings in C are represented by either a pointer to char (char* myStr) or an array of char (char myStr[10]). What is laid out in memory is a series of byte-sized, ASCII-encoded characters, terminated by the null character ('\0' or 0x00).
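As a minimal sketch of the two forms described above (the function names are made up for illustration), the following shows that the terminator is really stored: the array form of "Hello" occupies six bytes, not five, and the byte after the 'o' in the literal is '\0'.

```c
#include <stddef.h>

/* Hypothetical helper: size in bytes of the array form, terminator included. */
size_t hello_array_size(void)
{
    char as_array[] = "Hello";   /* compiler sizes the array: 5 chars + '\0' */
    return sizeof as_array;      /* sizeof counts the terminator too */
}

/* Hypothetical helper: returns 1 if the byte after the last visible
 * character of the literal is the null character. */
int hello_is_terminated(void)
{
    /* const because the characters of a string literal must not be modified */
    const char *as_pointer = "Hello";
    return as_pointer[5] == '\0';
}
```

Calling hello_array_size() should give 6, and hello_is_terminated() should give 1.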
Say we wanted to store the string "Hello". We could write char* myStr = "Hello";. If the compiler decided to plonk this at memory address 0x01 (I avoided 0x00 since that is invalid; it's the null pointer), then the memory would look like this:
Memory address | Value | ASCII
0x00 | |
0x01 | 0x48 | H
0x02 | 0x65 | e
0x03 | 0x6C | l
0x04 | 0x6C | l
0x05 | 0x6F | o
0x06 | 0x00 | \0
0x07 | |
But this is not all. The memory above is used just to create the literal "Hello". What is also created is the variable myStr, which points to the first character in this string. Let's assume the compiler decides to plonk this at memory address 0x08; then your memory will look like this:
Memory address | Value | ASCII
0x00 | |
0x01 | 0x48 | H
0x02 | 0x65 | e
0x03 | 0x6C | l
0x04 | 0x6C | l
0x05 | 0x6F | o
0x06 | 0x00 | \0
0x07 | |
0x08 | 0x01 | // This is myStr
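To check the Value column of the tables above on a real machine, one option is to format the bytes of the string as hex. This is only a sketch: format_bytes is a hypothetical helper, and it assumes an ASCII-compatible character set. Note also that the tables are simplified, since a real pointer variable such as myStr occupies several bytes (commonly 4 or 8), not one.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: writes every byte of s, terminator included, into
 * out as two-digit uppercase hex separated by single spaces.
 * Returns 0 on success, -1 if out is too small.
 */
int format_bytes(const char *s, char *out, size_t out_size)
{
    size_t count = strlen(s) + 1;   /* +1 so the '\0' terminator is shown */
    size_t needed = count * 3;      /* "XX " per byte; the final slot holds '\0' */

    if (out_size < needed) {
        return -1;
    }

    for (size_t i = 0; i < count; i++) {
        /* Every byte but the last is followed by a space. */
        snprintf(out + i * 3, out_size - i * 3, "%02X%s",
                 (unsigned int)(unsigned char)s[i],
                 (i + 1 < count) ? " " : "");
    }
    return 0;
}
```

With a 32-byte buffer, format_bytes("Hello", buf, sizeof buf) fills buf with "48 65 6C 6C 6F 00", matching the table row by row.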
Whenever you pass your string variable myStr to a function, this 0x01 value is what is passed in. String functions, aware of the type (it's a pointer to char), know to step through memory one byte at a time until they hit the null character when they want to operate on the string.
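A function that receives a string therefore gets only the address of the first character and has to walk forward itself. The sketch below illustrates that pattern with count_char, a hypothetical function that is not part of the standard library.

```c
#include <stddef.h>

/* Hypothetical example: counts how many times target appears in the
 * null-terminated string that s points to. */
size_t count_char(const char *s, char target)
{
    size_t count = 0;

    while (*s != '\0') {        /* stop when the terminator is reached */
        if (*s == target) {
            count++;
        }
        s++;                    /* advance the local copy of the pointer */
    }
    return count;
}
```

Because the pointer is passed by value, incrementing s inside the function does not change the caller's variable. For example, count_char("Hello", 'l') returns 2.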
One thing C does differently from a lot of other languages is its choice of using a null character to mark the end of the string, rather than reserving some bytes at the start of the character array to hold its length. To find the length of a string you have to iterate through the bytes until you find 0x00, which is what strlen() does (more on this below).
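The idea can be sketched in a few lines. my_strlen is a hypothetical stand-in written for illustration; real C libraries typically use more optimized implementations of strlen(), but the observable behaviour is the same.

```c
#include <stddef.h>

/* Hypothetical re-implementation of the idea behind strlen():
 * count the bytes before the first null character. */
size_t my_strlen(const char *s)
{
    const char *p = s;

    while (*p != '\0') {    /* scan forward until the terminator */
        p++;
    }
    return (size_t)(p - s); /* distance travelled; terminator not counted */
}
```

Since every call has to scan the whole string, finding the length takes time proportional to the length itself, so it is worth storing the result rather than calling strlen() repeatedly on the same unchanged string.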