Invited signatures
The IEnumerable<T> interface
Published Feb. 20, 2007
1. Introduction
In order to fully understand the new possibilities that LINQ will open to developers when the next version of .NET appears, it is crucial to master several programming language features on which this technology rests; features that have been part of the .NET Framework and the Microsoft languages for the platform (C# and Visual Basic) since .NET 2.0 [2]. These features are:
· Generics, which allow us to define types and methods parameterized with respect to one or more data types.
· Iterators, which make it possible to concisely specify mechanisms for the on-demand or lazy iteration over the elements of a sequence.
· Anonymous methods, which allow us to specify inline the code of a method to be referenced by a delegate. In a previous installment we presented lambda expressions, a further improvement along that path that will be available in C# 3.0 [3].
· Nullable types, which make it possible to apply the traditional null-value semantics of reference types to value types as well.
In this installment and the next we will introduce two generic interfaces that play a very important role in LINQ [4, 5]: IEnumerable<T> and IQueryable<T>.
2. The IEnumerable<T> interface
The generic interface IEnumerable<T> (namespace System.Collections.Generic) was introduced in .NET 2.0 with the main goal of playing for generic types the same role that System.Collections.IEnumerable played in .NET 1.x: offering a mechanism to iterate over the elements of a sequence, generally so that the foreach programming pattern can be applied to it. As for LINQ, the importance of the IEnumerable<T> interface stems from the fact that any data type that implements it can directly serve as a source for query expressions. In particular, .NET 2.0 arrays and generic collections implement it. The definition of IEnumerable<T> is as follows:

    // System.Collections.Generic
    public interface IEnumerable<T> : IEnumerable
    {
        IEnumerator<T> GetEnumerator();
    }

    // System.Collections
    public interface IEnumerable
    {
        IEnumerator GetEnumerator();
    }
As can be seen, the interface declares a single method, GetEnumerator(), which returns an enumerator: an object whose job, as its name suggests, is to produce the elements of the sequence in some defined order. To make it possible to iterate over generic collections from non-generic code, IEnumerable<T> inherits from its non-generic counterpart, IEnumerable, so implementing types must also supply a non-generic version of GetEnumerator(), which can generally reuse the code of the generic version. IEnumerator<T>, in turn, is defined like this:

    // System.Collections.Generic
    public interface IEnumerator<T> : IDisposable, IEnumerator
    {
        T Current { get; }
    }

    // System.Collections
    public interface IEnumerator
    {
        object Current { get; }
        void Reset();
        bool MoveNext();
    }
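Again, this interface relies on its non-generic counterpart. Overall, a type that implements IEnumerator<T> must provide the following members:
· T Current { get; }, the generic property that returns the element at the enumerator's current position.
· object IEnumerator.Current { get; }, the non-generic version of the same property.
· bool MoveNext(), which advances to the next element and returns false once the sequence is exhausted.
· void Reset(), which returns the enumerator to its initial position, before the first element.
· void Dispose(), inherited from IDisposable, which releases any resources held by the enumerator.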
A class that implements IEnumerator<T> must maintain, across iterations, the state needed to guarantee that the interface methods work properly. The reader may ask: why this separation into two levels, with IEnumerable<T> at the higher level and IEnumerator<T> taking care of the “dirty work”? Why not let the collections implement IEnumerator<T> directly? The answer has to do with the need to allow nested iterations over the same sequence. If the sequence implemented the enumerator interface directly, it would support only one “iteration state” at any moment, and it would not be possible to write several nested loops over the sequence, such as those that appear in the typical implementation of a bubble sort. Instead, types should implement IEnumerable<T>, whose GetEnumerator() method should produce a new enumerator object every time it is called.
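The point about independent iteration states can be checked with a short sketch. The names NestedIterationDemo and CountOrderedPairs are hypothetical and are not part of the article's sample code; the sketch only assumes the standard List<T> collection.

```csharp
using System;
using System.Collections.Generic;

public static class NestedIterationDemo
{
    // Counts the pairs (a, b) with a < b using two nested loops over the
    // SAME sequence. Each foreach calls GetEnumerator() and therefore
    // works with its own, independent iteration state.
    public static int CountOrderedPairs(IEnumerable<int> sequence)
    {
        int count = 0;
        foreach (int a in sequence)
            foreach (int b in sequence)
                if (a < b) count++;
        return count;
    }

    public static void Main()
    {
        List<int> numbers = new List<int>() { 1, 3, 5 };

        // Two enumerators obtained by hand advance independently.
        IEnumerator<int> first = numbers.GetEnumerator();
        IEnumerator<int> second = numbers.GetEnumerator();
        first.MoveNext();
        first.MoveNext();    // first is now at the second element
        second.MoveNext();   // second is still at the first element
        Console.WriteLine(first.Current);    // prints 3
        Console.WriteLine(second.Current);   // prints 1
        first.Dispose();
        second.Dispose();

        // Pairs with a < b: (1,3), (1,5), (3,5)
        Console.WriteLine(CountOrderedPairs(numbers));   // prints 3
    }
}
```

If the list itself held the single iteration state, advancing the inner loop would also move the outer one, and the nested count above could not be computed this way.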
3. The foreach semantics for IEnumerable<T>
Given the previous explanation, it should be reasonably clear what the semantics of the foreach loop are when it is applied to objects that implement IEnumerable<T>. Basically, foreach obtains an enumerator by calling GetEnumerator(), and then traverses the sequence using the enumerator's MoveNext() method. If we wanted to apply the same action to all the elements of an enumerable generic sequence, we would use the following code:

    delegate void M<T>(T t);

    static void Iterate<T>(IEnumerable<T> secuencia, M<T> metodo)
    {
        foreach (T t in secuencia)
            metodo(t);
    }

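The equivalent code without using foreach would be essentially the following. Note that foreach never calls Reset() on the enumerator; it uses only MoveNext(), Current and, at the end, Dispose():

    static void Iterate<T>(IEnumerable<T> secuencia, M<T> metodo)
    {
        IEnumerator<T> e = secuencia.GetEnumerator();
        try
        {
            // No call to e.Reset(): foreach never resets the enumerator,
            // and compiler-generated iterators do not support Reset().
            while (e.MoveNext())
            {
                T t = e.Current;
                metodo(t);
            }
        }
        finally
        {
            e.Dispose();
        }
    }
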
This is how we could use either of the previous methods to print the integers in a list to the console. Note the use of a lambda expression [3] as an alternative to instantiating the corresponding delegate:

    static void Main(string[] args)
    {
        List<int> intList = new List<int>() { 1, 3, 5 };
        Iterate<int>(intList, (int x) => Console.WriteLine(x));
    }
4. An example of implementation of IEnumerable<T>
Although this is not something you will frequently need to do, let's program from scratch a class that implements IEnumerable<T> (in two different ways); this will help the reader to better understand the complexity associated with implementing this interface and will allow us to review the concepts behind C# 2.0 iterators. The class we are going to develop will allow us to iterate over the sequence of the natural numbers from 1 to 1000, both inclusive. In our first, more “classical” implementation, we explicitly code all the members of the IEnumerable<int> and IEnumerator<int> interfaces. Note the definition of the enumerator class as a nested class, a frequently used technique in cases like this:

    using System;
    using System.Collections;
    using System.Collections.Generic;

    public class NaturalNumbersSequence : IEnumerable<int>
    {
        public class NaturalEnumerator : IEnumerator<int>
        {
            private int current = 1;
            private bool atStart = true;

            // interface members
            public int Current
            {
                get { return current; }
            }
            object IEnumerator.Current
            {
                get { return current; }
            }
            public void Reset()
            {
                atStart = true;
                current = 1;
            }
            public bool MoveNext()
            {
                if (atStart)
                {
                    // the first call positions the enumerator on 1
                    atStart = false;
                    return true;
                }
                else
                {
                    if (current < 1000)
                    {
                        current++;
                        return true;
                    }
                    else
                        return false;
                }
            }
            public void Dispose()
            {
                // do nothing
            }
        }

        public IEnumerator<int> GetEnumerator()
        {
            return new NaturalEnumerator();
        }
        IEnumerator IEnumerable.GetEnumerator()
        {
            return new NaturalEnumerator();
        }
    }
The second version builds upon the C# 2.0 concept of iterators, and offers an equivalent but much terser implementation:

    public class NaturalNumbersSequence : IEnumerable<int>
    {
        public IEnumerator<int> GetEnumerator()
        {
            for (int i = 1; i <= 1000; i++)
                yield return i;
        }
        IEnumerator IEnumerable.GetEnumerator()
        {
            for (int i = 1; i <= 1000; i++)
                yield return i;
        }
    }
In this version, the compiler takes care of synthesizing a class very similar to the NaturalEnumerator of the first version, and of building and returning an object of that class whenever an iteration is about to start. For either version, the code that iterates over the sequence of numbers looks the same:

    foreach (int i in new NaturalNumbersSequence())
        Console.WriteLine(i);
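One consequence of the compiler-synthesized enumerator is that the body of the iterator does not run when GetEnumerator() is called; it runs piece by piece, driven by the calls to MoveNext(). The following sketch makes this visible. TracingSequence and TracingDemo are hypothetical names introduced only for this illustration, and the Log list is just a way to record the order of events. Here the non-generic GetEnumerator() simply delegates to the generic one, which is one common way of reusing the same code.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical class, not part of the article's sample code.
public class TracingSequence : IEnumerable<int>
{
    // Records what the iterator body does, in order.
    public readonly List<string> Log = new List<string>();

    public IEnumerator<int> GetEnumerator()
    {
        Log.Add("body started");
        for (int i = 1; i <= 2; i++)
        {
            Log.Add("yielding " + i);
            yield return i;
        }
        Log.Add("body finished");
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        // Reuse the generic version.
        return GetEnumerator();
    }
}

public static class TracingDemo
{
    public static void Main()
    {
        TracingSequence sequence = new TracingSequence();
        using (IEnumerator<int> e = sequence.GetEnumerator())
        {
            // Nothing has run yet: the log is still empty.
            Console.WriteLine(sequence.Log.Count);   // prints 0

            e.MoveNext();   // runs the body up to the first yield return
            Console.WriteLine(sequence.Log.Count);   // prints 2
            Console.WriteLine(e.Current);            // prints 1

            e.MoveNext();   // resumes after the first yield return
            Console.WriteLine(e.Current);            // prints 2

            // The third call runs the rest of the body and returns false.
            Console.WriteLine(e.MoveNext());         // prints False
            Console.WriteLine(sequence.Log.Count);   // prints 4
        }
    }
}
```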
5. On-demand generation during iteration
Something to bear in mind regarding IEnumerable<T> is that, unless the task is to iterate over the elements of an array or collection already available in memory, the elements that make up a sequence may well be produced dynamically as they are needed, a technique known in programming as on-demand, deferred or lazy evaluation. For instance, suppose that we have defined an enumerable class named PrimeSequence that produces the sequence of the prime numbers:

    public class PrimeSequence : IEnumerable<int>
    {
        private IEnumerator<int> getEnumerator()
        {
            int i = 2;
            while (true)
            {
                if (i.IsPrime())
                    yield return i;
                i++;
            }
        }
        public IEnumerator<int> GetEnumerator()
        {
            return getEnumerator();
        }
        IEnumerator IEnumerable.GetEnumerator()
        {
            return getEnumerator();
        }
    }
Here (in order to continue showcasing the new features in C# 3.0 :-) we have implemented IsPrime() as an extension method [4]. The set of the prime numbers is (potentially) infinite, so it would be unfeasible to generate all the members of the sequence beforehand; also, it is a well-known fact that prime numbers become more and more scarce as we move along the number line, so it would be very inefficient to generate 10,000 prime numbers just to consume a few of them later. With the enumerable class we have implemented here, those problems do not arise: new prime numbers are calculated only when the client code asks for them. Of course, in this case the client code is responsible for aborting the iteration, as the synthesized MoveNext() will always return true:

    // obtain primes less than 1000
    foreach (int i in new PrimeSequence())
        if (i > 1000)
            break;
        else
            Console.WriteLine(i);
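The text mentions IsPrime() without listing it. The sketch below is an assumption, not necessarily the author's implementation: it uses plain trial division. FirstN is likewise a hypothetical helper, added to show one way for client code to stop consuming an unbounded sequence after a fixed number of elements. PrimeSequence is reproduced in compact form so that the example is self-contained.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class IntExtensions
{
    // Assumed implementation: trial division by 2 and by odd divisors
    // up to the square root of n.
    public static bool IsPrime(this int n)
    {
        if (n < 2) return false;
        if (n % 2 == 0) return n == 2;
        for (int d = 3; d <= n / d; d += 2)
            if (n % d == 0) return false;
        return true;
    }
}

// Same idea as the PrimeSequence class in the text, in compact form.
public class PrimeSequence : IEnumerable<int>
{
    public IEnumerator<int> GetEnumerator()
    {
        int i = 2;
        while (true)
        {
            if (i.IsPrime())
                yield return i;
            i++;
        }
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

public static class SequenceHelpers
{
    // Hypothetical helper: collects at most n elements and then stops
    // iterating, so it is safe to use with an unbounded sequence.
    public static List<T> FirstN<T>(IEnumerable<T> source, int n)
    {
        List<T> result = new List<T>();
        if (n <= 0) return result;
        foreach (T item in source)
        {
            result.Add(item);
            if (result.Count == n) break;
        }
        return result;
    }
}

public static class PrimeDemo
{
    public static void Main()
    {
        // Only the first five primes are ever computed: 2 3 5 7 11
        foreach (int p in SequenceHelpers.FirstN(new PrimeSequence(), 5))
            Console.WriteLine(p);
    }
}
```

Because the helper breaks out of the loop as soon as it has enough elements, the iterator inside PrimeSequence is never asked for a sixth prime.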
6. Conclusion
In this article we have presented the IEnumerable<T> interface, which every .NET programmer should know well, both because of the importance it already has and because of the new relevance it will acquire with the upcoming version 3.0 of C#. In our next installment we will talk about IQueryable<T>, a new interface that will play a crucial role in the implementation of technologies like LINQ To SQL (the extension of LINQ for relational databases). The source code of the examples can be downloaded from this site. In order to run it, the May 2006 LINQ Preview, available at [1], must be downloaded and installed.
7. References
Sample code (ZIP): octavio_IEnumerable.zip - 5.74 KB