You do not need a full knowledge of the C# 3.0 language enhancements to
use Language Integrated Query (LINQ). None of the new language features
requires a modification of the common language runtime (CLR): LINQ relies on
new compilers (C# 3.0 or Microsoft Visual Basic 9.0), and these compilers
generate intermediate code that runs on Microsoft .NET 2.0, provided that the
LINQ libraries are available.
However, in this article, we provide short descriptions of C#
features (ranging from C# 1.x to C# 3.0) that you need
to clearly understand to work with LINQ most effectively. If you decide to skip
this article, you can come back to it later when you want to understand what is
really going on within LINQ syntax.
C# 2.0 Revisited
C# 2.0 improved the original C# language in many ways. For
example, the introduction of generics enabled developers to use C# to define
methods and classes having one or more type parameters. Generics are a
fundamental pillar of LINQ.
In this section, we will describe several C# 2.0 features that are
important to LINQ: generics, anonymous methods (which are the basis of lambda
expressions in C# 3.0), the yield keyword, and the
IEnumerable interface. You need to understand these concepts well to
best understand LINQ.
Generics
Many programming languages handle variables and objects by
defining specific types and strict rules about converting between types. Code
that is written in a strongly typed language lacks something in terms of
generalization, however. Consider the following code:
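A minimal sketch of such code, assuming one Min overload for int and one for double (the class name MinOverloads is hypothetical):

```csharp
public static class MinOverloads {
    // One version of Min per parameter type: the bodies are identical
    // except for the types involved.
    public static int Min( int a, int b ) {
        return (a < b ? a : b);
    }

    public static double Min( double a, double b ) {
        return (a < b ? a : b);
    }
}
```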
To use this code, we need a different version of Min
for each type of parameter we want to compare. Developers who are accustomed to
using objects as placeholders for a generic type (which is common with
collections) might be tempted to write a single Min function
such as this:
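A sketch of such a function follows, assuming the signature shown (the class name MinObject is hypothetical). The method is left commented out so that the snippet still compiles:

```csharp
public static class MinObject {
    // Hypothetical single Min working on plain object references.
    // The compiler rejects the expression (a < b) when both operands
    // are of type object, so the method is commented out here.
    //
    // public static object Min( object a, object b ) {
    //     return (a < b ? a : b);
    // }
}
```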
Unfortunately, the less-than operator
(<) is not defined for the object type. We need to use a common (or
“generic”) interface that provides the comparison:
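A minimal sketch, assuming the nongeneric System.IComparable interface is the common interface meant here (the class name MinComparable is hypothetical):

```csharp
using System;

public static class MinComparable {
    // Works for any type that implements IComparable, but the result is
    // typed as IComparable rather than as the caller's original type.
    public static IComparable Min( IComparable a, IComparable b ) {
        return (a.CompareTo( b ) < 0 ? a : b);
    }
}
```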
However, even if we solve this problem, we are faced with a bigger
issue: the indeterminate result type of the Min function.
A caller of Min that passes two integers has to convert the
result from IComparable to int, and this conversion
might raise an exception and certainly has a CPU cost:
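A sketch of such a call, repeating an assumed IComparable-based Min so that the snippet stands alone (the class, method, and variable names are illustrative):

```csharp
using System;

public static class MinCastDemo {
    static IComparable Min( IComparable a, IComparable b ) {
        return (a.CompareTo( b ) < 0 ? a : b);
    }

    public static int Run() {
        int a = 5, b = 10;
        // a and b are boxed when they are passed as IComparable;
        // the result has to be cast (and unboxed) back to int.
        int c = (int) Min( a, b );
        return c;
    }

    public static void Main() {
        Console.WriteLine( Run() );
    }
}
```

The cast is checked only at run time: casting the result to a type other than int would compile but fail with an InvalidCastException.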
C# 2.0 solved this problem with generics. The basic principle of
generics is that type resolution is moved from the C# compiler to the jitter.
Here is the generic version of the Min function:
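A minimal sketch of a generic Min, assuming a constraint on IComparable<T> so that the comparison is strongly typed (the class name MinGeneric is hypothetical):

```csharp
using System;

public static class MinGeneric {
    // T is a type parameter; the constraint guarantees that every T
    // passed to Min exposes a typed CompareTo method.
    public static T Min<T>( T a, T b ) where T : IComparable<T> {
        return (a.CompareTo( b ) < 0 ? a : b);
    }
}
```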
Note: The jitter is the just-in-time (JIT) compiler that is part of the
.NET runtime. It translates intermediate language (IL) code to machine code.
When you compile .NET source code, the compiler generates an executable image
containing IL code, which the jitter compiles into machine code instructions
at some point before the first execution.
Moving type resolution to the jitter is a good compromise: the
jitter can generate many versions of the same code, one for each type that is
used. This approach is similar to a macro expansion, but it differs in the
optimizations used to avoid code proliferation: all versions of a generic
function that use reference types as generic types share the same compiled
code, while the type differences are still enforced for callers.
With generics, instead of this:
you can write code such as this:
The cast for Min results has disappeared,
and the code will run faster. Moreover, the compiler can infer the generic
T type of the Min function from the parameters,
and we can write this simpler form:
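The three call forms discussed above can be sketched side by side. This is an illustration under assumptions: the two Min helpers are hypothetical stand-ins for the nongeneric and generic versions, and MinComparable is named differently only so that both can live in one class.

```csharp
using System;

public static class MinCallForms {
    // Nongeneric version: parameters and result are typed as IComparable.
    static IComparable MinComparable( IComparable a, IComparable b ) {
        return (a.CompareTo( b ) < 0 ? a : b);
    }

    // Generic version.
    static T Min<T>( T a, T b ) where T : IComparable<T> {
        return (a.CompareTo( b ) < 0 ? a : b);
    }

    public static int[] Run() {
        int a = 5, b = 10;
        int c1 = (int) MinComparable( a, b ); // without generics: cast required
        int c2 = Min<int>( a, b );            // generic call, explicit type argument
        int c3 = Min( a, b );                 // generic call, T inferred as int
        return new int[] { c1, c2, c3 };
    }
}
```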
Type Inference
Type inference is a key feature. It allows you to write more
abstract code, letting the compiler handle details about types. Nevertheless,
the C# implementation of type inference does not remove type safety: the
compiler still catches incorrect code (for example, a call that uses
incompatible types) at compile time.
Generics can also be used in type declarations (such as classes
and interfaces), not only to define generic methods. As we said earlier, a
detailed explanation of generics is not the goal of this book, but we want to
emphasize that you have to be comfortable with generics to work well with LINQ.
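As a small illustration of a generic type declaration, here is a sketch of a hypothetical Pair<T> class (it is not part of the .NET libraries):

```csharp
public class Pair<T> {
    public T First;
    public T Second;

    public Pair( T first, T second ) {
        First = first;
        Second = second;
    }

    // The type parameter can be used anywhere inside the class,
    // including in return types.
    public Pair<T> Swap() {
        return new Pair<T>( Second, First );
    }
}
```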
Delegates
A delegate is a class that encapsulates one or more methods.
Internally, a delegate stores a list of method pointers, each of which can be
paired with a reference to the object on which an instance method is to be
called.
A delegate can contain a list of several methods, but our attention
in this section is on delegates that contain only one method. From an abstract
point of view, a delegate of this type is like a “code container.” The code in
that container is not modifiable, but it can be moved along a call stack or
stored in a variable until its use is no longer necessary. It also stores a
context of execution (the object instance), keeping that object alive for as
long as the delegate is valid.
The syntax evolution of delegates is the foundation for anonymous
methods, which we will cover in the next section. The declaration of a delegate
actually defines a type that will be used to create instances of the delegate
itself. The delegate declaration requires a complete method signature. In the
code below, we declare three different types: each one can be instantiated only
with references to methods with the same signatures.
Listing 2-1: Delegate
declaration
delegate void SimpleDelegate();
delegate int ReturnValueDelegate();
delegate void TwoParamsDelegate( string name, int age );
Delegates are a typed and safe form of old-style C function
pointers. With C# 1.x, a delegate instance can be
created only through an explicit object creation, such as those shown below.
Listing 2-2: Delegate
instantiation (C# 1.x)
public class DemoDelegate {
    void MethodA() { … }
    int MethodB() { … }
    void MethodC( string x, int y ) { … }
    void CreateInstance() {
        SimpleDelegate a = new SimpleDelegate( MethodA );
        ReturnValueDelegate b = new ReturnValueDelegate( MethodB );
        TwoParamsDelegate c = new TwoParamsDelegate( MethodC );
        // …
    }
}
The original syntax needed to create a delegate instance is
tedious: you always have to write the name of the delegate class, even when the
context allows only one type. Precisely because the context does not allow any
other type, however, the delegate type can be safely inferred from the context
of an expression.
C# 2.0 takes advantage of this and allows you to skip part of
the syntax. The previous delegate instances we have shown can be created
without the new keyword. You only need to specify the
method name. The compiler infers the delegate type from the assignment. If you
are assigning to a variable of type SimpleDelegate, the
new SimpleDelegate code is automatically generated by the C# compiler,
and the same is true for any delegate type. The code for C# 2.0 shown below
produces the same compiled IL code as the C# 1.x sample
code.
Listing 2-3: Delegate
instantiation (C# 2.0)
public class DemoDelegate {
    void MethodA() { … }
    int MethodB() { … }
    void MethodC( string x, int y ) { … }
    void CreateInstance() {
        SimpleDelegate a = MethodA;
        ReturnValueDelegate b = MethodB;
        TwoParamsDelegate c = MethodC;
        // …
    }
    // …
}
You can also define a generic delegate type, which is useful when a
delegate is defined in a generic class and is an important capability for many
LINQ features.
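A short sketch of a generic delegate type follows. The Condition<T> delegate and the helper names are hypothetical; the .NET Framework already ships a similar delegate, System.Predicate<T>.

```csharp
using System;

// Hypothetical generic delegate: one declaration covers every T.
public delegate bool Condition<T>( T item );

public static class GenericDelegateDemo {
    static bool IsPositive( int n ) { return n > 0; }
    static bool IsEmpty( string s ) { return s.Length == 0; }

    // Counts the items for which the injected test returns true.
    public static int CountMatches<T>( T[] items, Condition<T> test ) {
        int count = 0;
        foreach (T item in items) {
            if (test( item )) count++;
        }
        return count;
    }

    public static void Main() {
        Condition<int> positive = IsPositive;   // T is int
        Condition<string> empty = IsEmpty;      // T is string
        Console.WriteLine( CountMatches( new int[] { -1, 2, 3 }, positive ) );  // 2
        Console.WriteLine( CountMatches( new string[] { "", "x" }, empty ) );   // 1
    }
}
```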
The common use for a delegate is to inject some code into an
existing method. In Listing 2-4, we assume that Repeat10Times
is an existing method that we do not want to change.
Listing 2-4: Common
use for a delegate
public class Writer {
    public string Text;
    public int Counter;
    public void Dump() {
        Console.WriteLine( Text );
        Counter++;
    }
}
public class DemoDelegate {
    void Repeat10Times( SimpleDelegate someWork ) {
        for (int i = 0; i < 10; i++) someWork();
    }
    void Run1() {
        Writer writer = new Writer();
        writer.Text = "C# article";
        this.Repeat10Times( writer.Dump );
        Console.WriteLine( writer.Counter );
    }
    // …
}
The existing callback is defined as SimpleDelegate,
but we want to pass a string to the injected method and we want to count how
many times the method is called. We define the Writer class,
which contains instance data that acts as a sort of parameter for the
Dump method. As you can see, we need to define a separate class just to
put together code and data that we want to use. A simpler way to code a similar
pattern is to use the anonymous method syntax.
Anonymous Methods
In the previous section, we illustrated a common use for a
delegate. C# 2.0 established a way to write the code shown in Listing 2-4 more
concisely by using an anonymous method. Listing 2-5 shows an example.
Listing 2-5: Using
an anonymous method
public class DemoDelegate {
    void Repeat10Times( SimpleDelegate someWork ) {
        for (int i = 0; i < 10; i++) someWork();
    }
    void Run2() {
        int counter = 0;
        this.Repeat10Times( delegate {
            Console.WriteLine( "C# article" );
            counter++;
        } );
        Console.WriteLine( counter );
    }
    // …
}
In this code, we no longer declare the Writer
class: the compiler generates an equivalent hidden class for us, with an
automatically generated name. Instead, we define a method inside the
Repeat10Times call, which might make it seem as though we are really
passing a piece of code as a parameter. Nevertheless, the compiler converts
this code into a pattern similar to the common delegate example with an
explicit Writer class. The only evidence for this conversion in our source
code is the delegate keyword before the code block. This
syntax is called an anonymous method.
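Only as an illustration, the code the compiler produces for Run2 resembles the hand-written version below. The names Closure and Body are hypothetical (the real generated names are not valid C# identifiers), and Run2 returns the counter here only to make the result easy to check.

```csharp
using System;

public delegate void SimpleDelegate();

public class DemoDelegateExpanded {
    // Stand-in for the hidden class: it holds the captured local
    // variable and the body of the anonymous method.
    private class Closure {
        public int counter;
        public void Body() {
            Console.WriteLine( "C# article" );
            counter++;
        }
    }

    void Repeat10Times( SimpleDelegate someWork ) {
        for (int i = 0; i < 10; i++) someWork();
    }

    public int Run2() {
        Closure closure = new Closure();
        closure.counter = 0;
        // What is passed is a delegate pointing at closure.Body,
        // not the code itself.
        this.Repeat10Times( closure.Body );
        Console.WriteLine( closure.counter );
        return closure.counter;
    }
}
```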
Note: Remember that you cannot pass code into a variable. You can
only pass a pointer to some code. Repeat this to yourself a couple of times
before going on.
The delegate keyword for anonymous methods
precedes the code block. When we have a method signature for a delegate that
contains one or more parameters, this syntax allows us to define the names of
the parameters for the delegate. The code in Listing 2-6 defines an
anonymous method for the TwoParamsDelegate delegate
type.
Listing 2-6: Parameters
for an anonymous method
public class DemoDelegate {
    void Repeat10Times( TwoParamsDelegate callback ) {
        for (int i = 0; i < 10; i++) callback( "Linq book", i );
    }
    void Run3() {
        Repeat10Times( delegate( string text, int age ) {
            Console.WriteLine( "{0} {1}", text, age );
        } );
    }
    // …
}
The Repeat10Times method now passes two arguments to the delegate,
and the anonymous method receives them through its declared parameters. Think
about it: if you were to remove the declaration for the text and
age parameters, the delegate block would generate two errors of
undefined names.
Important: You will (indirectly) use delegates and anonymous methods in
C# 3.0, and for this reason, it is important to understand the concepts behind
them. Only in this way can you master this higher level of abstraction that
hides growing complexity.
Enumerators and Yield
C# 1.x defines two interfaces to
support enumeration. The namespace System.Collections contains
these declarations, shown in Listing 2-7.
Listing 2-7: IEnumerator
and IEnumerable declarations
public interface IEnumerator {
    bool MoveNext();
    object Current { get; }
    void Reset();
}
public interface IEnumerable {
    IEnumerator GetEnumerator();
}
An object that implements IEnumerable can
be enumerated through an object that implements IEnumerator.
The enumeration can be performed by calling the MoveNext
method until it returns false.
The code in Listing 2-8 defines a class that can be enumerated
in this way. As you can see, the CountdownEnumerator class
is more complex, and it implements the enumeration logic in a single place. In
this sample, the enumerator does not really enumerate anything but simply
returns descending numbers, from StartCountdown - 1 down to 0, where
StartCountdown is defined in the Countdown class (which is also
the enumerated class).
Listing 2-8: Enumerable
class
public class Countdown : IEnumerable {
    public int StartCountdown;
    public IEnumerator GetEnumerator() {
        return new CountdownEnumerator( this );
    }
}
public class CountdownEnumerator : IEnumerator {
    private int _counter;
    private Countdown _countdown;
    public CountdownEnumerator( Countdown countdown ) {
        _countdown = countdown;
        Reset();
    }
    public bool MoveNext() {
        if (_counter > 0) {
            _counter--;
            return true;
        }
        else {
            return false;
        }
    }
    public void Reset() {
        _counter = _countdown.StartCountdown;
    }
    public object Current {
        get {
            return _counter;
        }
    }
}
The real enumeration happens only when the CountdownEnumerator
is used by a code block. For example, one possible use is shown in Listing
2-9.
Listing 2-9: Sample
enumeration code
public class DemoEnumerator {
    public static void DemoCountdown() {
        Countdown countdown = new Countdown();
        countdown.StartCountdown = 5;
        IEnumerator i = countdown.GetEnumerator();
        while (i.MoveNext()) {
            int n = (int) i.Current;
            Console.WriteLine( n );
        }
        i.Reset();
        while (i.MoveNext()) {
            int n = (int) i.Current;
            Console.WriteLine( "{0} BIS", n );
        }
    }
    // …
}
The GetEnumerator call provides the
enumerator object. We make two loops on it just to show the use of the
Reset method. We need to cast the Current return
value to int because we are using the nongeneric
version of the enumerator interfaces.
Note: C# 2.0 introduced enumeration support through generics. The
namespace System.Collections.Generic contains generic
IEnumerable<T> and IEnumerator<T> declarations.
These interfaces eliminate the need to convert data to and from the
object type. This capability is important when enumerating value types
because it removes the boxing and unboxing operations that might affect
performance.
Since C# 1.x, enumeration code can be
simplified by using the foreach statement. The code in
Listing 2-10 produces a result equivalent to the previous example.
Listing 2-10: Enumeration
using a foreach statement
public class DemoEnumeration {
    public static void DemoCountdownForeach() {
        Countdown countdown = new Countdown();
        countdown.StartCountdown = 5;
        foreach (int n in countdown) {
            Console.WriteLine( n );
        }
        foreach (int n in countdown) {
            Console.WriteLine( "{0} BIS", n );
        }
    }
    // …
}
Using foreach, the compiler generates an
initial call to GetEnumerator and a call to
MoveNext before each iteration of the loop. The real difference is that
the code generated by foreach never calls the Reset
method: two instances of CountdownEnumerator objects
are created instead of one.
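As a rough sketch (not the exact generated code), a foreach loop over a nongeneric IEnumerable behaves like the hand-written loop in SumExpanded below. The class and method names are hypothetical, and the loops add the values up only so that the two forms can be compared.

```csharp
using System;
using System.Collections;

public static class ForeachExpansion {
    public static int SumWithForeach( IEnumerable source ) {
        int sum = 0;
        foreach (int n in source) sum += n;
        return sum;
    }

    // Approximately what the compiler emits for the loop above.
    public static int SumExpanded( IEnumerable source ) {
        int sum = 0;
        IEnumerator e = source.GetEnumerator();   // one call per foreach statement
        try {
            while (e.MoveNext()) {                // called before every iteration
                int n = (int) e.Current;          // cast inserted by the compiler
                sum += n;
            }
        }
        finally {
            // The enumerator is disposed if it is disposable; Reset is never called.
            IDisposable d = e as IDisposable;
            if (d != null) d.Dispose();
        }
        return sum;
    }
}
```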
Note: The foreach statement can also be
used with classes that do not expose an IEnumerable interface
but that have a public GetEnumerator method.
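A minimal sketch of this pattern-based behavior, using two hypothetical classes (Range3 and Range3Cursor) that implement neither IEnumerable nor IEnumerator:

```csharp
using System;

// foreach only needs a public GetEnumerator method whose result
// exposes MoveNext and Current.
public class Range3 {
    public Range3Cursor GetEnumerator() { return new Range3Cursor(); }
}

public class Range3Cursor {
    private int _value = 0;
    public int Current { get { return _value; } }
    public bool MoveNext() {
        if (_value >= 3) return false;
        _value++;
        return true;
    }
}

public static class PatternForeachDemo {
    public static int Sum() {
        int sum = 0;
        foreach (int n in new Range3()) sum += n;   // visits 1, 2, 3
        return sum;
    }
}
```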
C# 2.0 introduced the yield statement
through which the compiler automatically generates a class that implements the
IEnumerator interface returned by the GetEnumerator
method. The yield statement can be used only
immediately before a return or break
keyword. The code in Listing 2-11 generates a class equivalent to the
previous CountdownEnumerator.
Listing 2-11: Enumeration
using a yield statement
public class CountdownYield : IEnumerable {
    public int StartCountdown;
    public IEnumerator GetEnumerator() {
        for (int i = StartCountdown - 1; i >= 0; i--) {
            yield return i;
        }
    }
}
From a logical point of view, the yield return
statement is equivalent to suspending execution, which is resumed at the next
MoveNext call. Remember that the GetEnumerator method
is called only once for the whole enumeration, and it returns a class that
implements an IEnumerator interface. Only that class
really implements the behavior defined in the method that contains the
yield statement.
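The suspend-and-resume behavior can be observed with a small experiment. The sketch below uses a hypothetical TracedCountdown class that records each step of its iterator in a log. It also shows yield break, which ends the sequence.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public class TracedCountdown : IEnumerable {
    public List<string> Log = new List<string>();

    public IEnumerator GetEnumerator() {
        Log.Add( "start" );
        for (int i = 2; ; i--) {
            if (i < 0) yield break;      // ends the sequence
            Log.Add( "yield " + i );
            yield return i;              // execution is suspended here
        }
    }
}

public static class LazyIteratorDemo {
    public static string Run() {
        TracedCountdown source = new TracedCountdown();
        IEnumerator e = source.GetEnumerator();
        int before = source.Log.Count;       // 0: the iterator body has not started
        e.MoveNext();                        // runs up to the first yield return
        int afterFirst = source.Log.Count;   // 2: "start" and "yield 2"
        while (e.MoveNext()) { }             // drains the remaining values
        return before + " " + afterFirst + " " + source.Log.Count;
    }
}
```

Calling GetEnumerator alone executes none of the method body; the code runs only as MoveNext is called.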
A method that contains yield statements is
called an iterator. An iterator can include many
yield statements. The code in Listing 2-12 is perfectly valid and
is functionally equivalent to the previous CountdownYield
class with a StartCountdown value of 5.
Listing 2-12: Multiple
yield statements
public class CountdownYieldMultiple : IEnumerable {
    public IEnumerator GetEnumerator() {
        yield return 4;
        yield return 3;
        yield return 2;
        yield return 1;
        yield return 0;
    }
}
By using the generic version of IEnumerator,
it is possible to define a strongly typed version of the CountdownYield
class, shown in Listing 2-13.
Listing 2-13: Enumeration
using yield (typed)
public class CountdownYieldTypeSafe : IEnumerable<int> {
    public int StartCountdown;
    IEnumerator IEnumerable.GetEnumerator() {
        return this.GetEnumerator();
    }
    public IEnumerator<int> GetEnumerator() {
        for (int i = StartCountdown - 1; i >= 0; i--) {
            yield return i;
        }
    }
}
The strongly typed version contains two GetEnumerator
methods: one is for compatibility with nongeneric code (returning
IEnumerator), and the other is the strongly typed one (returning
IEnumerator<int>).
The internal implementation of LINQ to Objects makes extensive
use of enumerations and yield. Even if they work under
the covers, keep their behavior in mind while you are debugging code.