Tuesday, June 22, 2010

Atomized Strings - .NET technique for memory optimization

Via: Atomize your strings to improve memory usage

Atomization is the technique of reusing a single string instance for every occurrence of the same value, as a way of optimizing memory usage. It can also speed up code, because comparing two references is much faster than comparing two strings character by character.

As the post points out, the NameTable class used for processing XML in .NET applies this technique to store element names. You can use the NameTable class in your own code to get the same kind of optimization.

From the post:

The process of taking a string and checking whether you already had one with the same value to reuse it is called atomizing a string. This has two nice properties.

  1. You end up using less memory to hold all the strings.
  2. You can compare the strings faster. The value is the same if and only if they are the same reference, so you can do a by-reference comparison, which is much faster. Even if you eventually mix non-atomized strings, you'll still have cases where the by-reference gives you a "quick yes" on equality comparison.
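To make the idea concrete, here is a minimal hand-rolled sketch of atomizing, written as C# 9+ top-level statements. The `Atomize` helper and the `pool` dictionary are hypothetical names for illustration only, not a .NET API: the first instance seen for each value becomes the canonical one, and every later equal string is swapped for it.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not a .NET API: the first instance seen for each
// value becomes the canonical one and is handed back on every later call.
var pool = new Dictionary<string, string>();
string Atomize(string s)
{
    if (pool.TryGetValue(s, out var existing))
        return existing;   // value seen before: reuse the stored instance
    pool.Add(s, s);        // first occurrence: remember this instance
    return s;
}

// Build two equal strings at run time so they start out as distinct objects.
string a = new StringBuilder("ord").Append("er").ToString();
string b = new StringBuilder("ord").Append("er").ToString();
Console.WriteLine(object.ReferenceEquals(a, b));         // False: same value, two objects

string atomA = Atomize(a);
string atomB = Atomize(b);
Console.WriteLine(object.ReferenceEquals(atomA, atomB)); // True: both now refer to a
```

The dictionary lookup still pays for one value-based hash and comparison per `Atomize` call; the payoff comes afterwards, when the atomized strings can be kept in memory once and compared by reference. NameTable offers the same service ready-made.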

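Sample code that uses the NameTable class to atomize a collection of strings (the details array below is hypothetical sample input; any IEnumerable<string> works):

using System.Collections.Generic;
using System.Xml;

string[] details = { "order", null, "order", "item" };
var nt = new NameTable();
var atomized = new List<string>();
foreach (var detail in details)
{
    // NameTable.Add throws on null, so nulls pass through unchanged;
    // otherwise Add returns the table's single shared instance for that value.
    string value = (detail == null) ? null : nt.Add(detail);
    atomized.Add(value);
}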
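Atomizing only pays off when the comparisons also use references. A minimal sketch of that pattern, assuming you atomize the names you care about up front and then look up incoming strings with XmlNameTable.Get, which returns the stored instance or null and never adds anything. The names "order" and "customer" are arbitrary examples.

```csharp
using System;
using System.Text;
using System.Xml;

var nt = new NameTable();

// Atomize the names you expect to compare against once, up front.
string orderName = nt.Add("order");

// Simulate a name arriving from input as a fresh, non-atomized string.
string incoming = new StringBuilder("ord").Append("er").ToString();

// Get returns the table's instance if the value was added before, or null
// if it was never added; unlike Add, it does not insert anything.
var atom = nt.Get(incoming);

// Because both sides are atomized, a reference comparison is enough.
bool isOrder = object.ReferenceEquals(atom, orderName);
Console.WriteLine(isOrder);                    // True
Console.WriteLine(nt.Get("customer") == null); // True: never added
```

Using Get rather than Add for lookups keeps arbitrary input from growing the table; only the names you registered deliberately are stored.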

NameTable: http://msdn.microsoft.com/en-us/library/system.xml.nametable(v=VS.80).aspx

Hanselman: http://www.hanselman.com/blog/XmlAndTheNametable.aspx

Note:

It is important to remember that the CLR already optimizes the memory used by string literals through the string intern pool. In the following example the intern pool gives s1 and s2 the same reference, but s3, which is built at run time by a StringBuilder, gets a separate reference even though its value is also "hello":

using System;
using System.Text;

string s1 = "hello";
string s2 = "hello";
Console.WriteLine(object.ReferenceEquals(s1, s2)); // True: both literals share one interned instance
StringBuilder sb = new StringBuilder();
sb.Append("hello");
string s3 = sb.ToString();
Console.WriteLine(object.ReferenceEquals(s1, s3)); // False: ToString() allocates a new string
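If you want a run-time string like s3 to share the pooled instance, String.Intern can do that explicitly. A minimal sketch, reusing the variable names from the example above:

```csharp
using System;
using System.Text;

string s1 = "hello";                                        // literal: interned automatically
string s3 = new StringBuilder().Append("hello").ToString(); // built at run time
Console.WriteLine(object.ReferenceEquals(s1, s3));          // False

// String.Intern looks the value up in the intern pool and returns the pooled
// instance (here, the literal s1), adding s3 only if the value was not pooled.
string s4 = string.Intern(s3);
Console.WriteLine(object.ReferenceEquals(s1, s4));          // True
```

One design difference worth weighing: a NameTable is an ordinary object you can discard when you are done with it, while the String.Intern documentation notes that memory for interned strings is not likely to be released until the CLR terminates. For large or short-lived sets of values, a NameTable may therefore be the better fit.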
