C# Strings

Dec 2011


Overview

Wondering how to use strings in C#? This article aims to be your definitive guide. We'll start with the basics, tour through some common string operations, and then cover some advanced topics, including performance tuning.

Basic Examples

Strings in C# are created by enclosing text in double quotes, like so:

string example = "This is a string.";

We now have a string stored in a variable called "example", and we can perform all manner of string operations on it. For instance, we could find the fourth character in the string:

char fourth = example[3];

The variable called "fourth" now contains the character 's'. This shows that a string behaves like an array of characters: we can read any character by its position. Why did we use the number 3 to get the fourth character, and not 4? Because arrays in C# (and strings, when indexed this way) count their members starting from zero. So example[0] contains 'T', example[1] contains 'h', and so on.
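To make zero-based indexing concrete, here is a small sketch. It is written as C# 9+ top-level statements (a newer language feature than this article assumes; wrap the body in a Main method for older compilers). It reads a few characters by position and shows that index Length is already one past the end.

```csharp
using System;

// A minimal sketch of zero-based indexing into a string.
string example = "This is a string.";

char first = example[0];                  // 'T'
char fourth = example[3];                 // 's'
char last = example[example.Length - 1];  // '.' (the last valid index is Length - 1)

// Walk the string position by position.
for (int i = 0; i < example.Length; i++)
    Console.WriteLine($"{i}: '{example[i]}'");

// Asking for index Length (one past the end) throws instead of returning garbage.
bool threw = false;
try
{
    _ = example[example.Length];
}
catch (IndexOutOfRangeException)
{
    threw = true;
}
Console.WriteLine($"example[{example.Length}] threw: {threw}");
```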

Combining two strings is called concatenation. We do this with the "+" operator, like so:

string firstHalf = "This is ";
string secondHalf = "a string.";
string example = firstHalf + secondHalf;
int length = example.Length;

Once again the example variable contains the string "This is a string." We're also performing another string operation here: finding its length. The Length property returns the number of characters in the string, which in this case is 17. Now that we can combine strings, let's try extracting part of one:

string example = "This is a string.";
string firstHalf = example.Substring(0, 8);
string secondHalf = example.Substring(8);

We've now done the opposite of the last example. firstHalf contains the first eight characters of example, which are "This is " (note that spaces are characters too). secondHalf contains every character of example from the ninth one (index 8) onward, which is "a string." The first number we pass to the Substring method is the zero-based index at which the substring starts; the second (if specified) is how many characters to take. When the second number is omitted, Substring takes all the remaining characters in the string.
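The sketch below (C# 9+ top-level statements) puts concatenation, Length and both Substring overloads side by side. The extra Substring(5, 2) call and the variable names head, middle and tail are illustrative additions, not from the examples above.

```csharp
using System;

// Concatenate two halves, then take them apart again with Substring.
string firstHalf = "This is ";
string secondHalf = "a string.";
string example = firstHalf + secondHalf;   // concatenation with +

// Substring(start, length): start is a zero-based index, length is a character count.
string head = example.Substring(0, 8);     // "This is "
string middle = example.Substring(5, 2);   // "is" (2 characters starting at index 5)
// Substring(start): everything from start to the end of the string.
string tail = example.Substring(8);        // "a string."

Console.WriteLine(example.Length);         // 17
Console.WriteLine($"[{head}] [{middle}] [{tail}]");
Console.WriteLine(head + tail == example); // True
```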

Changing the case (upper or lower) of a string is easy. Here's how:

string example = "This is a string.";
string lower = example.ToLower();
string upper = example.ToUpper();

lower contains "this is a string." and upper contains "THIS IS A STRING." Note that nothing has happened to example itself: it is exactly the same as it was when it was created. In fact, strings can never change; they are immutable. We'll discuss this further in the Performance section.
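A short sketch of what immutability means in practice (C# 9+ top-level statements): calling ToUpper on its own changes nothing; only assigning the result back updates the variable. The expected outputs assume a culture such as en-US, since ToUpper and ToLower follow the current culture (Turkish, for example, uppercases 'i' differently).

```csharp
using System;

// ToUpper/ToLower return new strings; the original is never modified.
string example = "This is a string.";
string upper = example.ToUpper();
string lower = example.ToLower();

Console.WriteLine(example); // This is a string.
Console.WriteLine(upper);   // THIS IS A STRING.
Console.WriteLine(lower);   // this is a string.

// Calling ToUpper and ignoring the result does nothing to example:
example.ToUpper();          // the new string is simply discarded
Console.WriteLine(example); // This is a string.

// To "change" the variable, assign the new string back to it.
example = example.ToUpper();
Console.WriteLine(example); // THIS IS A STRING.
```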

Lastly, we'll demonstrate splitting a string into pieces. For this we'll use the Split method, like so:

string example = "This is a string.";
string[] words = example.Split(' ');

Here we asked for example to be split wherever the space character (' ') appears. Single quotes denote a single character (a char) rather than a string. The result is an array of strings, which we've named words. We can access the strings in this array the same way we accessed the characters in a string:

string firstWord = words[0];
string secondWord = words[1];

firstWord is set to "This" and secondWord is set to "is".
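The sketch below (C# 9+ top-level statements) prints every piece Split returns. It also shows a detail worth knowing: two separators in a row yield an empty entry, which the StringSplitOptions.RemoveEmptyEntries option drops. The "a  b" input is a made-up example.

```csharp
using System;

// Split on the space character and inspect every piece.
string example = "This is a string.";
string[] words = example.Split(' ');

Console.WriteLine(words.Length);       // 4
foreach (string word in words)
    Console.WriteLine($"[{word}]");    // [This] [is] [a] [string.]  (punctuation stays attached)

// Two spaces in a row produce an empty string between them.
string[] withGap = "a  b".Split(' ');
Console.WriteLine(withGap.Length);     // 3: "a", "", "b"

// RemoveEmptyEntries drops those empty pieces.
string[] noGap = "a  b".Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
Console.WriteLine(noGap.Length);       // 2
```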

Searching and Replacing

There are two common ways to look for one string inside another. The first is the Contains method, which tells us whether the substring is present at all:

string example = "This is a string.";
bool containsIs = example.Contains("is");

The second is the IndexOf method, which tells us where the substring is:

string example = "This is a string.";
int startsAt = example.IndexOf("is");

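IndexOf lets us know where the first occurrence of the substring was found: it returns the index of the character at which the substring starts, or -1 if the substring does not appear at all. In the example above startsAt would be set to 2. To search from the end of the string instead, we use LastIndexOf, which returns the index of the last occurrence (5 in this example).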
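Here is a sketch (C# 9+ top-level statements) contrasting the three search methods. The loop uses the IndexOf(string, int startIndex) overload to find every occurrence in turn; that loop is an illustrative addition, not something the article itself uses.

```csharp
using System;

// Contains answers "is it there?"; IndexOf/LastIndexOf answer "where?".
string example = "This is a string.";

bool containsIs = example.Contains("is");   // true
int firstIs = example.IndexOf("is");        // 2  ("Th|is")
int lastIs = example.LastIndexOf("is");     // 5  ("This |is")
int missing = example.IndexOf("xyz");       // -1 means "not found"

// Passing a start index lets us visit every occurrence in turn.
int pos = example.IndexOf("is");
while (pos != -1)
{
    Console.WriteLine($"\"is\" found at index {pos}");   // prints 2, then 5
    pos = example.IndexOf("is", pos + 1);
}
```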

To replace part of a string with something else, we use the Replace method.

string example = "This is a string.";
string replaced = example.Replace("string", "replacement");

The replaced variable will contain the string "This is a replacement." Replace substitutes every occurrence it finds, not just the first. Note again that strings are immutable: the example variable still contains the string "This is a string."
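A quick sketch (C# 9+ top-level statements) showing that Replace changes every match, not just the first, and simply returns the same text when nothing matches. The "is" to "at" replacement is an invented example.

```csharp
using System;

// Replace substitutes every occurrence and returns a new string.
string example = "This is a string.";
string replaced = example.Replace("string", "replacement");
string everyIs = example.Replace("is", "at");     // both "is" sequences change
string noMatch = example.Replace("xyz", "abc");   // nothing to replace

Console.WriteLine(replaced);  // This is a replacement.
Console.WriteLine(everyIs);   // That at a string.
Console.WriteLine(noMatch);   // This is a string.
Console.WriteLine(example);   // This is a string.  (unchanged)
```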

Escaped Characters

When we create a string we can put almost anything we like between the quotes. However, certain characters need to be escaped with a backslash to be understood correctly. The most obvious one is the double-quote character itself. The C# compiler can't make any sense of this:

string example = "Harry says "Hello World".";

The compiler doesn't know where the string ends and will complain accordingly. Instead we need to do this:

string example = "Harry says \"Hello World\".";

The compiler will now treat each \" as a literal double-quote character that is part of the string. Other common escape sequences are \t (tab), \r (carriage return) and \n (newline). The following:

string example = "This is a string.\r\nThis is a new line.\r\n\tThis line is indented.";

Will appear as:

This is a string.
This is a new line.
    This line is indented.
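The sketch below (C# 9+ top-level statements) checks that each escape sequence stands for exactly one character. It also shows \\, the escape for a literal backslash; the Windows-style path is only an illustration.

```csharp
using System;

// Each escape sequence produces exactly one character in the string.
string quoted = "Harry says \"Hello World\".";
string lines = "This is a string.\r\nThis is a new line.\r\n\tThis line is indented.";
string path = "C:\\temp\\file.txt";   // \\ is how a single backslash is written

Console.WriteLine(quoted);            // Harry says "Hello World".
Console.WriteLine(lines);
Console.WriteLine(path);              // C:\temp\file.txt

Console.WriteLine("\"".Length);       // 1
Console.WriteLine("\r\n".Length);     // 2: carriage return + line feed
Console.WriteLine(path.Length);       // 16
```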

Converting Strings

Our next topic is string conversion. We'll demonstrate how to convert strings to other things, like numbers and dates. We'll also demonstrate how to convert other things to a string.

Converting a string to something else is easy. We simply call the Parse method of the type we want to convert to, like so:

int parsedInt = int.Parse("123");
long parsedLong = long.Parse("983423958329493493");
float parsedFloat = float.Parse("1.2345");
double parsedDouble = double.Parse("1.2345678901234");
DateTime parsedDateTime = DateTime.Parse("2011-12-09");

So we've converted a string to an int, a long, a float, a double and a DateTime. Doing the reverse (int to string, double to string, and so on) is equally easy. We use the ToString method:

int i = 123;
string intString = i.ToString();
float f = 123.0F;
string floatString = f.ToString();
DateTime dt = DateTime.Now;
string dateTimeString = dt.ToString();

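Every type in C#/.NET inherits a ToString method from System.Object, so we can call it on any object to get a string representation of it. Types such as int and DateTime override it to produce a readable value; a type that doesn't override it returns its full type name.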
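A sketch (C# 9+ top-level statements) of round-tripping values through strings. It also shows two behaviors worth knowing: the default object.ToString() returns the type name, and Parse throws a FormatException for text that isn't a valid number. The "12abc" input is made up.

```csharp
using System;

// Round-trip a value through a string with ToString and Parse.
int original = 123;
string text = original.ToString();    // "123"
int back = int.Parse(text);           // 123 again

DateTime date = DateTime.Parse("2011-12-09");
Console.WriteLine($"{date.Year}-{date.Month}-{date.Day}");   // 2011-12-9

// A type that doesn't override ToString uses object.ToString(): the type's full name.
object plain = new object();
Console.WriteLine(plain.ToString());  // System.Object

// Parse rejects text that isn't a valid number.
bool rejected = false;
try
{
    int.Parse("12abc");
}
catch (FormatException)
{
    rejected = true;
}
Console.WriteLine($"\"12abc\" rejected: {rejected}");
```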

String Formatting

There are a few ways to handle string formatting in C#. The first thing we'll look at is the string.Format method. Its most common use is to fill the placeholders in a format string with values, like so:

string example = "Hello {0}, you have {1} credits remaining.";
string formatted = string.Format(example, "Bob", 13);

The formatted variable will be set to "Hello Bob, you have 13 credits remaining." Curly braces with a number inside are placeholders, and the number is the zero-based index of the value (after the format string) that replaces them: {0} is "Bob" and {1} is 13. Format takes the string containing the placeholders as its first parameter, followed by any number of values for those placeholders. Another example:

string example = "{0}{1}{2}{1}{3}";
string formatted = string.Format(example, 2011, '-', 12, 14);

In this example formatted will be set to "2011-12-14". Notice that you can repeat a placeholder index: we used "{1}" twice.
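The sketch below (C# 9+ top-level statements) collects the two Format calls above and adds two illustrative cases: placeholders used out of order, and a placeholder index with no matching value, which causes a FormatException.

```csharp
using System;

// {n} refers to the n-th value after the format string, counting from 0.
string greeting = string.Format("Hello {0}, you have {1} credits remaining.", "Bob", 13);
string date = string.Format("{0}{1}{2}{1}{3}", 2011, '-', 12, 14);   // {1} is used twice
string reordered = string.Format("{1}, {0}", "Bob", "Hello");        // indexes need not be in order

Console.WriteLine(greeting);   // Hello Bob, you have 13 credits remaining.
Console.WriteLine(date);       // 2011-12-14
Console.WriteLine(reordered);  // Hello, Bob

// A placeholder index with no matching value causes a FormatException.
bool threw = false;
try
{
    string.Format("{0} {1}", "only one value");
}
catch (FormatException)
{
    threw = true;
}
Console.WriteLine($"Missing value detected: {threw}");
```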

The other common way of adjusting the format of a string in C# is to pass a format string to the ToString method. The built-in numeric types and DateTime have overloaded ToString methods that take a string parameter, and this string controls the formatting. Let's start with an example:

int i = 45;
string formatted = i.ToString("#.00"); // -> "45.00"

formatted will be set to "45.00". Some more examples:

int i = 45000;
string formatted = i.ToString("#,###"); // -> "45,000"
double d = 1.23456;
formatted = d.ToString("#.00"); // -> "1.23"
DateTime dt = DateTime.Parse("2011-12-14");
formatted = dt.ToString("MMM dd, yyyy"); // -> "Dec 14, 2011"

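The outputs shown in the comments assume an English (en-US) culture; the separators and month names depend on the culture in use. For the full list of formatting options, see the MSDN documentation for each type's ToString method.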
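Because those results depend on the machine's culture, here is a sketch (C# 9+ top-level statements) that passes CultureInfo.InvariantCulture to the same ToString calls so the output is identical everywhere. It also shows, as an added illustration, the difference between the "#" and "0" digit placeholders.

```csharp
using System;
using System.Globalization;

// The format strings from above, pinned to the invariant culture.
CultureInfo inv = CultureInfo.InvariantCulture;

int i = 45;
int big = 45000;
double d = 1.23456;
DateTime dt = new DateTime(2011, 12, 14);

string twoDecimals = i.ToString("#.00", inv);       // "45.00"
string grouped = big.ToString("#,###", inv);        // "45,000"
string rounded = d.ToString("#.00", inv);           // "1.23"
string date = dt.ToString("MMM dd, yyyy", inv);     // "Dec 14, 2011"

// "#" is an optional digit, so a zero integer part disappears; "0" always prints a digit.
double half = 0.5;
string optionalDigit = half.ToString("#.00", inv);  // ".50"
string requiredDigit = half.ToString("0.00", inv);  // "0.50"

Console.WriteLine(string.Join(" | ", twoDecimals, grouped, rounded, date, optionalDigit, requiredDigit));
```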

Globalization

Globalization refers to the alphabet, language and cultural conventions that matter when dealing with strings. .NET contains a wealth of classes to help with globalization. To work with them, we should first import the core globalization namespace with a using directive at the top of our code file, like so:

using System.Globalization;

Now let's look at a key globalization class: CultureInfo. This class describes all manner of things that differ between cultures, through properties such as DateTimeFormat and NumberFormat. It also has a handy static property called CurrentCulture, which returns the CultureInfo used by the current thread; by default this comes from the regional settings of the computer the code is running on. So the following code will produce different values of formatted depending on the culture settings of the computer:

string formatted = DateTime.Now.ToString(
    CultureInfo.CurrentCulture.DateTimeFormat.FullDateTimePattern);
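To see the NumberFormat and DateTimeFormat properties at work, here is a sketch (C# 9+ top-level statements). Rather than depend on which cultures are installed, it clones the invariant culture's NumberFormat and swaps the separators by hand to mimic a typical European convention; that "european" object is a hypothetical stand-in, not a real culture. The "F" format specifier is shorthand for a culture's FullDateTimePattern.

```csharp
using System;
using System.Globalization;

double amount = 1234.5;

// Invariant culture: '.' for decimals, ',' for thousands.
string invariantText = amount.ToString("N2", CultureInfo.InvariantCulture);   // "1,234.50"

// Hypothetical European-style settings: a writable copy of the invariant NumberFormat.
var european = (NumberFormatInfo)CultureInfo.InvariantCulture.NumberFormat.Clone();
european.NumberDecimalSeparator = ",";
european.NumberGroupSeparator = ".";
string europeanText = amount.ToString("N2", european);                         // "1.234,50"

// Whatever this machine is configured to use:
Console.WriteLine($"Current culture: '{CultureInfo.CurrentCulture.Name}'");
Console.WriteLine(amount.ToString("N2", CultureInfo.CurrentCulture));

// "F" formats with the culture's DateTimeFormat.FullDateTimePattern.
var dt = new DateTime(2011, 12, 14, 9, 30, 0);
string full = dt.ToString("F", CultureInfo.InvariantCulture);
Console.WriteLine(full);
```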

Performance

There is a lot to be said about string performance in C# and .NET. The primary thing to remember is that strings are immutable in .NET. This means that once created, a string cannot be changed. To understand how this can cause performance problems let's look at some code.

string s = "";
for (int i = 0; i < 10000; i++)
    s = s + i.ToString();

This may look fine but is actually a very bad piece of code, performance-wise. This is what is happening:

  1. A string is created for "".
  2. For every iteration of the for loop:
    1. A new string is created to hold the result of i.ToString()
    2. A new string is created to hold the result of s + i.ToString()

This is bad because of the number of string objects that are created, and because the strings allocated on each iteration keep getting bigger and bigger. By the final iteration (i = 9999), this is what is happening:

  1. We have string s and it is consuming 38,886 characters worth of memory.
  2. .NET creates another string for i.ToString() that holds 4 characters ("9999").
  3. .NET creates another string that can hold 38,890 characters. It copies s into the first 38,886 chars and the i.ToString() string into the last 4 chars.
  4. .NET sets s to the new string in step #3 and the memory for the old s is marked for garbage collection.

The last bit about garbage collection is also interesting. Garbage collection frees objects that we're no longer using, but it doesn't happen instantaneously, so many of the versions of s created on earlier iterations can still be sitting in memory until the garbage collector runs. Across all 10,000 iterations, this simple loop allocates about 189 million characters' worth of intermediate strings just to build a final string of 38,890 characters!
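The figures above can be checked by replaying the naive loop and adding up the length of every new string assigned to s. This sketch (C# 9+ top-level statements) counts characters only, not the bytes or object headers behind them, and ignores the small strings produced by i.ToString().

```csharp
using System;

string s = "";
long charsAllocated = 0;
int lengthBeforeLastIteration = 0;

for (int i = 0; i < 10000; i++)
{
    if (i == 9999)
        lengthBeforeLastIteration = s.Length;   // 38886
    s = s + i.ToString();                       // allocates a brand-new string every time
    charsAllocated += s.Length;                 // characters held by that new string
}

Console.WriteLine($"Final length: {s.Length}");                 // 38890
Console.WriteLine($"Characters allocated for s: {charsAllocated}"); // 189424495
```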

So, how do we avoid this problem? We use the primary string-efficiency tool in the C# programmer's toolbox: the StringBuilder class, found in the System.Text namespace. StringBuilder was written to solve exactly this problem. Here's how we would rewrite our code to be far more efficient:

StringBuilder stringBuilder = new StringBuilder(50000);
for (int i = 0; i < 10000; i++)
    stringBuilder.Append(i);

The first line creates the StringBuilder instance. The parameter value of 50000 is not required, but it is a useful hint that tells the StringBuilder to allocate room for 50,000 characters up front. In the for loop we simply append i to the StringBuilder instance. Note that we don't have to call i.ToString(): the Append method is overloaded to accept all of the built-in types in .NET. Here's what is happening in this code:

  1. .NET allocates a StringBuilder instance with room for 50,000 characters for the string that it will build.
  2. On every iteration of the for loop, .NET converts i to its characters and appends them to the character buffer that StringBuilder has allocated.

So, with this code no copies of the growing string are ever made. Memory is not duplicated and the garbage collector has little work to do. If we time the two versions, we'll see that the StringBuilder code completes in less than 1% of the time the original code takes.
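A sketch (C# 9+ top-level statements) that builds the same string both ways, confirms the results match, and times each with Stopwatch. It prints timings rather than asserting a ratio, since the numbers depend on the machine and the .NET version.

```csharp
using System;
using System.Diagnostics;
using System.Text;

var watch = Stopwatch.StartNew();
string naive = "";
for (int i = 0; i < 10000; i++)
    naive = naive + i.ToString();
TimeSpan naiveTime = watch.Elapsed;

watch.Restart();
var stringBuilder = new StringBuilder(50000);
for (int i = 0; i < 10000; i++)
    stringBuilder.Append(i);
string built = stringBuilder.ToString();   // the single final string is created here
TimeSpan builderTime = watch.Elapsed;

Console.WriteLine(naive == built);         // True: same text either way
Console.WriteLine($"Concatenation: {naiveTime.TotalMilliseconds} ms");
Console.WriteLine($"StringBuilder: {builderTime.TotalMilliseconds} ms");
```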

The trick to truly optimizing the performance of StringBuilder is to always set a large value for the capacity parameter. If we don't set one at all, StringBuilder defaults to a capacity of only 16 characters. This doesn't mean that we can only create a string of 16 characters: when Append is called and the capacity would be exceeded, StringBuilder allocates more memory, roughly doubling its capacity while the string is still small. Memory allocations aren't terribly expensive, but if you do enough of them they add up. It's typically much better to overestimate the capacity by a significant amount than to have StringBuilder reallocate. Go big!
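To watch this happen, the sketch below (C# 9+ top-level statements) counts how often a default StringBuilder's Capacity changes while appending the same 10,000 numbers, and compares it with a builder sized up front. The exact number of resizes depends on the runtime's growth strategy, so it is printed, not predicted.

```csharp
using System;
using System.Text;

var grown = new StringBuilder();          // default capacity: 16 characters
int startCapacity = grown.Capacity;
int lastCapacity = startCapacity;
int resizes = 0;

for (int i = 0; i < 10000; i++)
{
    grown.Append(i);
    if (grown.Capacity != lastCapacity)   // more memory had to be allocated
    {
        resizes++;
        lastCapacity = grown.Capacity;
    }
}

var presized = new StringBuilder(50000);  // room for the whole result up front
for (int i = 0; i < 10000; i++)
    presized.Append(i);

Console.WriteLine($"Default builder started at {startCapacity} and grew {resizes} times");
Console.WriteLine($"Presized builder: capacity {presized.Capacity}, length {presized.Length}");
```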

w3 enterprises