Aggregated Intelligence: Writing C# 2.0 Unsafe Code

This article is extracted from Practical .NET2 and C#2 by Patrick Smacchia. http://www.practicaldot.net/en2/Resources/PracticalDotNet2_Ch14.pdf

Introduction

We will see that C# allows the CLR's verification of code to be suspended so that developers can access memory directly through pointers. With C# you can therefore perform, in a standard way, certain optimizations that were previously possible only in unmanaged development environments such as C++. These optimizations concern, for example, the processing of large amounts of in-memory data such as bitmaps.

Pointers and unsafe code

C++ has no notion of managed code. This is one of its advantages: it allows the use of pointers and thus lets developers write optimized code that is closer to the target machine. It is also a disadvantage, since the use of pointers is cumbersome and potentially dangerous, and significantly increases the development and maintenance effort.

Before the .NET platform, virtually all of the code executed on the Windows operating system was unmanaged. This means that the executable contains the code directly as machine instructions compatible with the type of processor (i.e. machine language code). The introduction of the managed execution mode with the .NET platform is revolutionary. The main sources of hard-to-track bugs are detected and resolved by the CLR. Amongst these:

- Array access overflows (now checked dynamically by the CLR).
- Memory leaks (now mostly handled by the garbage collector).
- The use of an invalid pointer. This problem is solved in a radical way, as the manipulation of pointers is forbidden in managed mode.

The CLR knows how to manipulate three kinds of pointers:

- Managed pointers. These pointers can point to data contained in the object heap managed by the garbage collector. They are not used explicitly in C# code. Instead, the C# compiler uses them implicitly when it compiles methods with out and ref arguments.
- Unmanaged function pointers. These pointers are conceptually close to delegates. We will discuss them at the end of this article.
- Unmanaged pointers. These pointers can point to any data in the user address space of the process. The C# language allows this kind of pointer to be used in regions of code that are considered unsafe.

The IL code that the C# compiler emits for regions of code using unmanaged pointers makes use of specialized IL instructions. Their effect on the memory of the process cannot be verified by the JIT compiler of the CLR. Consequently, a malicious user can take advantage of unsafe code regions to carry out malicious actions. To counter this weakness, the CLR only allows this code to execute at run time if it has the SkipVerification CAS meta-permission.

Because it allows the memory of a process to be manipulated directly through unmanaged pointers, unsafe code is particularly useful for optimizing certain kinds of processing on large amounts of data stored in structures.

Compilation options to allow unsafe code

Unsafe code must be used deliberately: you must pass the /unsafe option to the csc.exe compiler to tell it that you are aware that the code you are compiling contains regions that the JIT compiler will see as unverifiable. Visual Studio offers the Build > Allow unsafe code project property to set this compiler option.

Declaring unsafe code in C#

In C#, the unsafe keyword tells the compiler where you will use unsafe code. It can be used in three situations:

- Before the declaration of a class or structure. In this case, all the methods of the type can use pointers.
- Before the declaration of a method. In this case, pointers can be used within the body of this method and in its signature.
- Within the body of a method (static or not). In this case, pointers are only allowed within the marked block of code.
  For example: unsafe { ... }

Note that if a method takes a pointer as an argument or returns one, the method (or its class) must be marked unsafe, and every region of code that calls this method must be marked unsafe as well.

Using pointers in C#

Each object, whether it is an instance of a value type or of a reference type, has a memory address at which it is physically located in the process. This address is not necessarily constant during the lifetime of the object, as the garbage collector can physically move objects stored in the heap.

.NET types that support pointers

For certain types there is a dual type: the unmanaged pointer type that corresponds to the managed type. A pointer variable holds the address of an instance of the type concerned. Pointers can be used only with value types, excluding structures that have at least one reference type field. Consequently, only instances of the following types can be used through pointers:

- primitive types;
- enumerations;
- structures with no reference type fields;
- pointers.

Declaring pointers

A pointer might point to nothing. In this case, it is extremely important that its value be set to null (0). In fact, the majority of bugs due to pointers come from pointers which are not null but which point to invalid data. A pointer to FooType is declared as follows:

    FooType* pointer;

For example:

    long* pAnInteger = null;

Note that the declaration...

    int* p1, p2;

...makes both p1 and p2 pointers to an integer. In C# the * belongs to the type, not to the individual variable name as in C and C++.

Indirection and dereferencing operators

In C#, we can obtain a pointer to a variable by using the address-of operator &. For example:

    long anInteger = 98;
    long* pAnInteger = &anInteger;

We can access the pointed-to object through the indirection operator *. For example:

    long anInteger = 98;
    long* pAnInteger = &anInteger;
    long anotherInteger = *pAnInteger;
    // Here, the value of 'anotherInteger' is 98.
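
The declaration, address-of and indirection rules described above can be put together in one small program. This is only a minimal sketch: the class and method names (PointerBasics, DoubleThroughPointer, SumThroughPointers) are hypothetical and do not come from the book, and the program assumes it is compiled with the /unsafe option.

```csharp
using System;

// Hypothetical demo class, not from the original article; compile with /unsafe.
public static class PointerBasics
{
    // Reads and writes a local variable through a pointer.
    public static unsafe long DoubleThroughPointer(long value)
    {
        long local = value;      // A local value-type variable: no pinning needed.
        long* pLocal = &local;   // Address-of operator.
        *pLocal = *pLocal * 2;   // Indirection operator, used for a read and a write.
        return local;            // The write through the pointer changed 'local'.
    }

    // Shows that 'int* p1, p2' declares two pointers in C#.
    public static unsafe int SumThroughPointers(int a, int b)
    {
        int* p1 = &a, p2 = &b;   // Both p1 and p2 are of type int*.
        return *p1 + *p2;
    }

    public static void Main()
    {
        Console.WriteLine(DoubleThroughPointer(49));   // prints 98
        Console.WriteLine(SumThroughPointers(1, 2));   // prints 3
    }
}
```

Only the two methods that touch pointers are marked unsafe; Main stays verifiable because neither method exposes a pointer in its signature.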

The sizeof operator

The sizeof operator returns the size in bytes of instances of a value type. For the built-in primitive types the result is a compile-time constant and, starting with C# 2.0, the operator can be used outside unsafe code; for other value types, such as user-defined structures, it can only be used in unsafe mode. For example:

    int i = sizeof(int);    // i is equal to 4
    int j = sizeof(double); // j is equal to 8

Pointer arithmetic

A pointer to a type T can be modified with the '++' and '--' unary operators. The '-' operator can also be used with pointers.

- The '++' operator increments the pointer by sizeof(T) bytes.
- The '--' operator decrements the pointer by sizeof(T) bytes.
- The '-' operator, used between two pointers of the same type T, returns a value of type long. This value is equal to the byte offset between the two pointers divided by sizeof(T).

Comparison can also be used on two pointers of the same or of different types. The supported comparison operators are:

    ==  !=  <  >  <=  >=

Pointer casting

Pointers in C# do not derive from the Object class, so boxing and unboxing do not exist for pointers. However, pointers support both implicit and explicit casting. Implicit casts are done from any type of pointer to a pointer of type void*. Explicit casts are done from:

- Any pointer type to any other pointer type.
- Any pointer type to the sbyte, byte, short, ushort, int, uint, long, ulong types (caution, we are not talking about the sbyte*, byte*, short*... types).
- One of the sbyte, byte, short, ushort, int, uint, long, ulong types to any pointer type.

Double pointers

Note that it is possible to use a pointer to a pointer (although this is rarely useful in C#). Here, we talk of a double pointer. For example:

    long aLong = 98;
    long* pALong = &aLong;
    long** ppALong = &pALong;

It is important to have a naming convention for pointers and double pointers. In general, the name of a pointer is prefixed with 'p' while the name of a double pointer is prefixed with 'pp'.

Pinned object

The garbage collector can physically move the objects for which it is responsible.
Objects managed by the garbage collector are generally instances of reference types, while pointed-to objects are instances of value types. If a pointer points to a value type field of an instance of a reference type, there is a potential problem, as the instance of the reference type can be moved at any time by the garbage collector. The compiler forces the developer to use the fixed keyword in order to tell the garbage collector not to move reference type instances which contain a value type field pointed to by a pointer. The syntax of the fixed keyword is the following:

    class Article { public long Price = 0; }

    unsafe class Program {
        unsafe public static void Main() {
            Article article = new Article();
            fixed ( long* pPrice = &article.Price ) {
                // Here, you can use the pointer 'pPrice' and the object
                // referenced by 'article' cannot be moved by the GC.
            }
            // Here, 'pPrice' is not available anymore and the object
            // referenced by 'article' is not pinned anymore.
        }
    }

If we had not used the fixed keyword in this example, the compiler would have produced an error, as it can detect that the object referenced by article may be moved during execution.

We can pin several objects of the same type in the same fixed block. To pin objects of several different types, you need to use nested fixed blocks.

You should pin objects as rarely as possible and for the shortest duration possible. While objects are pinned, the work of the garbage collector is impaired and less efficient.

Variables of a value type declared as local variables in a method do not need to be pinned, since they are not managed by the garbage collector.

Pointers and arrays

In C#, the elements of an array whose element type can be pointed to can be accessed by using pointers. Note that an array is an instance of the System.Array class and is stored on the managed heap by the garbage collector.
Here is an example that shows the syntax and also an overflow of the array (which is detected neither at compilation nor at execution!) caused by the use of pointers:

    using System;

    public class Program {
        unsafe public static void Main() {
            // Create an array of 4 integers.
            int[] array = new int[4];
            for( int i = 0; i < 4; i++ )
                array[i] = i*i;

            Console.WriteLine( "Display 6 items (oops!):" );
            fixed( int* ptr = array )
                for( int j = 0; j < 6; j++ )
                    Console.WriteLine( *(ptr+j) );

            Console.WriteLine( "Display all items:" );
            foreach( int k in array )
                Console.WriteLine(k);
        }
    }

Here is the display (the last two values of the first list are whatever happens to follow the array in memory, so they will vary):

    Display 6 items (oops!):
    0
    1
    4
    9
    0
    2042318948
    Display all items:
    0
    1
    4
    9

Note that only the array needs to be pinned, not each of its elements. This confirms the fact that during execution, the value type elements of an array are stored in contiguous memory.

Fixed arrays

C#2 allows the declaration, within a structure, of an array field composed of a fixed number of primitive elements. For this, you simply need to declare the array using the fixed keyword and the structure using the unsafe keyword. In this case, the field is not of type System.Array but is a pointer to the primitive type (i.e. the FixedArray field is of type int* in the following example):

    unsafe struct Foo {
        public fixed int FixedArray[10];
        public int Overflow;
    }

    unsafe class Program {
        unsafe public static void Main() {
            Foo foo = new Foo();
            foo.Overflow = -1;
            System.Console.WriteLine( foo.Overflow );
            foo.FixedArray[10] = 99999;
            System.Console.WriteLine( foo.Overflow );
        }
    }

This example displays:

    -1
    99999

Understand that FixedArray[10] refers to the eleventh element of the array, since the indexes are zero based. The array only has ten elements, so the write lands in the memory that follows it, and we end up assigning the value 99999 to the Overflow integer.

Allocating memory on the stack with the stackalloc keyword

C# allows you to allocate on the stack an array of elements of a type which can be pointed to.
The stackalloc keyword is used for this, with the following syntax:

    public class Program {
        unsafe public static void Main() {
            int* array = stackalloc int[100];
            for( int i = 0; i < 100; i++ )
                array[i] = i*i;
        }
    }

None of the elements of the array are initialized, which means that it is the responsibility of the developer to initialize them. If there is insufficient memory on the stack, the System.StackOverflowException exception is raised.

The size of the stack is relatively small, so in practice we can only allocate arrays of up to a few thousand elements. The array is freed implicitly when the method returns.

Strings and pointers

The C# compiler allows you to obtain a pointer of type char* from an instance of the System.String class. You can use this feature to circumvent the immutability of managed strings. Recall that this immutability makes strings considerably easier to use. However, it can have a negative impact on performance. The System.StringBuilder class is not always the proper solution, and it can sometimes be useful to modify the characters of a string directly. Be aware that the string is modified in place, so every reference to the same instance, including any other use of the same interned literal, sees the change. The following example shows how to use this feature to write a method which converts a string to uppercase:

    public class Program {
        static unsafe void ToUpper( string str ) {
            fixed ( char* pfixed = str )
                for ( char* p = pfixed; *p != 0; p++ )
                    *p = char.ToUpper(*p);
        }
        static void Main() {
            string str = "Hello";
            System.Console.WriteLine(str);
            ToUpper(str);
            System.Console.WriteLine(str);
        }
    }

Delegates and unmanaged function pointers

You can invoke a function defined in a native DLL through a delegate created from an unmanaged function pointer.
In fact, with the GetDelegateForFunctionPointer() and GetFunctionPointerForDelegate() static methods of the Marshal class, delegates and function pointers become interchangeable:

    using System;
    using System.Runtime.InteropServices;

    class Program {
        // Managed signature of the native Beep() function.
        internal delegate bool DelegBeep(uint iFreq, uint iDuration);

        [DllImport("kernel32.dll")]
        internal static extern IntPtr LoadLibrary(String dllname);

        [DllImport("kernel32.dll")]
        internal static extern IntPtr GetProcAddress(IntPtr hModule, String procName);

        static void Main() {
            // Get the address of Beep() in the native DLL.
            IntPtr kernel32 = LoadLibrary( "Kernel32.dll" );
            IntPtr procBeep = GetProcAddress( kernel32, "Beep" );
            // Wrap the unmanaged function pointer in a delegate and call it.
            DelegBeep delegBeep = Marshal.GetDelegateForFunctionPointer(
                procBeep, typeof( DelegBeep ) ) as DelegBeep;
            delegBeep(100, 100);
        }
    }
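
The example above only goes from a function pointer to a delegate. As a rough sketch of the other direction, the program below turns a managed delegate into a function pointer with GetFunctionPointerForDelegate() and then back into a delegate. The names DelegateRoundTrip, BinaryOp and AddThroughFunctionPointer are hypothetical and chosen for illustration only; nothing here relies on a native DLL.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical demo class, not from the original article.
public static class DelegateRoundTrip
{
    // A non-generic delegate type, as the Marshal methods used here expect.
    public delegate int BinaryOp(int a, int b);

    private static int Add(int a, int b) { return a + b; }

    public static int AddThroughFunctionPointer(int a, int b)
    {
        BinaryOp original = new BinaryOp(Add);

        // Delegate -> function pointer callable from unmanaged code.
        IntPtr functionPointer = Marshal.GetFunctionPointerForDelegate(original);

        // Function pointer -> delegate of the same type.
        BinaryOp roundTripped = (BinaryOp)Marshal.GetDelegateForFunctionPointer(
            functionPointer, typeof(BinaryOp));

        int result = roundTripped(a, b);

        // The function pointer is only valid while the delegate is alive,
        // so keep the original delegate reachable until the call is done.
        GC.KeepAlive(original);
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(AddThroughFunctionPointer(2, 3));
    }
}
```

When a function pointer obtained this way is handed to native code, the delegate it came from must stay referenced for as long as the native side may call it; otherwise the garbage collector can reclaim it.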

Wednesday, April 05, 2006
