Query Operators
The remaining sections of this article describe the main methods
and generic delegates provided by the System.Linq namespace
to query items with LINQ.
The Where Operator
Imagine that you need to list the names and cities of
customers from Italy. To filter a set of items, you can use the
Where operator, which is also called a restriction operator because it
restricts a set of items. Listing 4-3
shows a simple example.
Listing 4-3: A
query with a restriction
var expr =
from c in customers
where c.Country == Countries.Italy
select new { c.Name, c.City };
Here are the signatures of the Where operator:
As you can see, two signatures are available. In
Listing 4-3, we used the first signature, which enumerates items of the
source sequence and yields those that verify the predicate (c.Country
== Countries.Italy). The second signature passes an additional
argument of type int to the predicate. This
argument is the zero-based index of each element within the
source sequence. Keep in mind that if you pass a null source or a null
predicate, an ArgumentNullException will be
thrown. You can use the index parameter to start filtering at a particular
index, as shown in Listing 4-4.
Listing 4-4: A
query with a restriction and an index-based filter
var expr =
customers
.Where((c, index) => (c.Country == Countries.Italy && index >= 1))
.Select(c => c.Name);
Important: In Listing 4-4, we
cannot use the LINQ query syntax because the Where version
that we want to call is not supported by an equivalent LINQ query clause. We
will use both syntaxes from here onward.
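The result of Listing 4-4 will
be the names of the Italian customers, excluding the customer in the first
position (index 0) of the source sequence. Note that the index refers to the
position within the source sequence, not within the filtered result. The capability to
filter items of the source sequence by using their
positional index is useful when you want to extract a specific page of data
from a large sequence of items. Listing 4-5
shows an example.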
Listing 4-5: A
query with a paging restriction
int start = 5;
int end = 10;
var expr =
customers
.Where((c, index) => ((index >= start) && (index < end)))
.Select(c => c.Name);
Keep in mind that it is generally not a good practice to store
large sequences of data loaded from a database persistence layer in memory;
usually, it is better to page data at the persistence layer level. Therefore,
use this paging technique only if you have already loaded data into memory.
Reloading the current page from a persistence layer is less efficient than
directly accessing the sequence already loaded “in memory.”
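To make the index semantics and the paging technique concrete, here is a minimal sketch. The sample data is hypothetical: anonymous objects stand in for the Customer type, and a plain string replaces the Countries enumeration. The Skip and Take calls are shown only as an alternative way to express the same page; they are not part of the listings above.

```csharp
using System;
using System.Linq;

// Hypothetical sample data: anonymous objects stand in for the Customer type.
var customers = new[]
{
    new { Name = "Paolo",  Country = "Italy" },
    new { Name = "Marco",  Country = "Italy" },
    new { Name = "James",  Country = "USA"   },
    new { Name = "Frank",  Country = "USA"   },
    new { Name = "Chiara", Country = "Italy" },
};

// The index is the position in the source sequence, not in the filtered result.
var italiansAfterFirst = customers
    .Where((c, index) => c.Country == "Italy" && index >= 1)
    .Select(c => c.Name)
    .ToList();

// Paging with the indexed Where overload: positions in the range [start, end).
int start = 1;
int end = 3;
var pageWithWhere = customers
    .Where((c, index) => index >= start && index < end)
    .Select(c => c.Name)
    .ToList();

// The same page expressed with Skip and Take.
var pageWithSkipTake = customers
    .Skip(start)
    .Take(end - start)
    .Select(c => c.Name)
    .ToList();

Console.WriteLine(string.Join(", ", italiansAfterFirst)); // Marco, Chiara
Console.WriteLine(string.Join(", ", pageWithWhere));      // Marco, James
Console.WriteLine(pageWithWhere.SequenceEqual(pageWithSkipTake));
```

In this sketch, Paolo is excluded from the first query only because he occupies index 0 of the source sequence, not because he is the first Italian customer.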
Projection Operators
The following sections describe how to use projection
operators. These operators are used to select (or “project”) contents from the
source enumeration into the result.
Select
In Listing 4-3,
you saw an example of defining the result of the query by using the
Select operator. The signatures for the Select operator
are shown here:
The Select operator is one of the
projection operators because it projects the query results, making them
available through an object that implements IEnumerable<T>.
This object will enumerate items identified by the selector
predicate. Like the Where operator, Select
enumerates the source sequence and yields the result of
the selector predicate. Consider the following
predicate: c => c.Name.
This predicate’s result will be a sequence of customer names (IEnumerable<string>).
Now consider this example: c => new { c.Name, c.City }.
This predicate projects a sequence of an anonymous type,
defined as a tuple of Name and City,
for each customer object. The second overload of Select
passes an additional argument of type
int to the predicate: the zero-based
positional index of each item within the source sequence.
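The following sketch exercises both Select overloads on hypothetical sample data (anonymous objects in place of the Customer type), so that you can see the shape of each projection.

```csharp
using System;
using System.Linq;

// Hypothetical sample data standing in for the customers sequence.
var customers = new[]
{
    new { Name = "Paolo", City = "Brescia" },
    new { Name = "Marco", City = "Torino"  },
};

// Selector returning a single property: the result is a sequence of strings.
var names = customers.Select(c => c.Name).ToList();

// Selector returning an anonymous type made of Name and City.
var nameAndCity = customers.Select(c => new { c.Name, c.City }).ToList();

// Indexed overload: the second lambda parameter is the zero-based position
// of the item in the source sequence.
var numbered = customers
    .Select((c, index) => $"{index}: {c.Name}")
    .ToList();

foreach (var line in numbered)
{
    Console.WriteLine(line);
}
```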
SelectMany
Imagine that you want to select all the orders of customers
from Italy. You could write the query shown in
Listing 4-6 using the method syntax.
Listing 4-6: The
list of orders made by Italian customers
var orders =
customers
.Where(c => c.Country == Countries.Italy)
.Select(c => c.Orders);
foreach(var item in orders) { Console.WriteLine(item); }
Because of the behavior of the Select operator,
the resulting type of this query will be IEnumerable<Order[]>,
where each item in the resulting sequence represents the array of orders of a
single customer. In fact, the Orders property of a
Customer instance is of type Order[]. Consequently, the
code in Listing 4-6 writes one line for each Italian customer, showing the
type name of the Order[] array rather than the individual orders.
To have a “flat” IEnumerable<Order> result
type, we need to use the SelectMany operator:
This operator enumerates the source sequence
and merges the resulting items, providing them as a single enumerable sequence.
The second overload available is analogous to the equivalent overload for
Select, which allows a zero-based integer index for indexing purposes.
Listing 4-7 shows an example.
Listing 4-7: The
flattened list of orders made by Italian customers
IEnumerable<Order> orders =
customers
.Where(c => c.Country == Countries.Italy)
.SelectMany(c => c.Orders);
Using the query expression syntax, the query in
Listing 4-7 can be written with the code shown in
Listing 4-8.
Listing 4-8: The
flattened list of orders made by Italian customers, written with a query
expression
IEnumerable<Order> orders =
from c in customers
where c.Country == Countries.Italy
from o in c.Orders
select o;
In query expressions, every from clause after the initial one
is translated to an invocation of SelectMany. In other words, every time
you see a query expression with more than one from clause,
you can apply this rule: the first from clause identifies the
source sequence, each additional from clause is converted to a
SelectMany call, and the final select is either folded into the
result selector of the last SelectMany call or translated to a
separate Select call.
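The translation rule can be checked with a small sketch. The data is hypothetical: each customer carries an array of integers that stands in for its Order[] property, and the country is a plain string.

```csharp
using System;
using System.Linq;

// Hypothetical sample data: each int stands in for an Order instance.
var customers = new[]
{
    new { Name = "Paolo", Country = "Italy", Orders = new[] { 3, 5 } },
    new { Name = "Marco", Country = "Italy", Orders = new[] { 10 }   },
    new { Name = "James", Country = "USA",   Orders = new[] { 20 }   },
};

// Query expression with two from clauses.
var viaQuery =
    (from c in customers
     where c.Country == "Italy"
     from o in c.Orders
     select o).ToList();

// Method syntax: the second from clause becomes a SelectMany call,
// and the final select becomes its result selector.
var viaMethods = customers
    .Where(c => c.Country == "Italy")
    .SelectMany(c => c.Orders, (c, o) => o)
    .ToList();

// Select alone keeps one nested array per customer instead of a flat sequence.
var nested = customers
    .Where(c => c.Country == "Italy")
    .Select(c => c.Orders)
    .ToList();

Console.WriteLine(string.Join(", ", viaQuery)); // 3, 5, 10
Console.WriteLine(viaQuery.SequenceEqual(viaMethods));
Console.WriteLine(nested.Count);                // 2
```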
The third overload of SelectMany is useful
whenever you need to select a custom result from the source set of sequences
instead of simply merging their items, as with the two previous overloads. This
overload invokes the collectionSelector predicate over
the source sequence and returns the result of the
resultSelector predicate, applied to each item in the collections
selected by collectionSelector. In
Listing 4-9, you can see an example of this method, used to extract a
new anonymous type made from the Quantity and
IdProduct of each order of Italian customers.
Listing 4-9: The
list of Quantity and IdProduct of orders made by Italian customers
var items = customers
.Where(c => c.Country == Countries.Italy)
.SelectMany(c => c.Orders,
(c, o) => new {o.Quantity, o.IdProduct});
The query in Listing 4-9 can
be written with the query expression shown in
Listing 4-10.
Listing 4-10: The
list of Quantity and IdProduct of orders made by Italian customers, written
with a query expression
var items =
from c in customers
where c.Country == Countries.Italy
from o in c.Orders
select new {o.Quantity, o.IdProduct};
Ordering Operators
Another useful set of operators is the ordering operators
group. Ordering operators are used to determine the ordering and direction of
elements in output sequences.
OrderBy and OrderByDescending
Sometimes it is helpful to apply an order to the results of a
database query. LINQ can order the results of queries, in ascending or
descending order, by using ordering operators, just as we do in SQL syntax. For
instance, if you need to select the Name and
City of all Italian customers in descending order by Name,
you can write the corresponding query expression shown in
Listing 4-11.
Listing 4-11: A
query expression with a descending orderby clause
var expr =
from c in customers
where c.Country == Countries.Italy
orderby c.Name descending
select new { c.Name, c.City };
The query expression syntax will translate the orderby
keyword into one of the following ordering extension methods:
As you can see, the two main extension methods, OrderBy
and OrderByDescending, both have two overloads. The
methods’ names suggest their objective: OrderBy is for
ascending order, and OrderByDescending is for
descending order. The keySelector argument represents a
function that extracts a key, of type K, from each item
of type T, taken from the source
sequence. The extracted key represents the typed content to be compared by the
comparer while ordering, and the T type describes the
type of each item of the source sequence. Both methods
have an overload that allows you to provide a custom comparer. If no comparer
is provided or the comparer argument is null, a default
comparer is used (Comparer<K>.Default). It is
important to emphasize that these ordering methods return not just
IEnumerable<T> but IOrderedEnumerable<T>,
which is an interface that extends IEnumerable<T>.
The code sample in Listing
4-11 will be translated to the following:
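As a sketch of that translation, the following code compares the query expression with a chain of extension methods. The sample data is hypothetical (anonymous objects and a string country instead of the Customer type and the Countries enumeration).

```csharp
using System;
using System.Linq;

// Hypothetical sample data standing in for the customers sequence.
var customers = new[]
{
    new { Name = "Marco", City = "Torino",  Country = "Italy" },
    new { Name = "James", City = "Dallas",  Country = "USA"   },
    new { Name = "Paolo", City = "Brescia", Country = "Italy" },
};

// Query expression with a descending orderby clause.
var viaQuery =
    (from c in customers
     where c.Country == "Italy"
     orderby c.Name descending
     select new { c.Name, c.City }).ToList();

// Equivalent chain of extension methods.
var viaMethods = customers
    .Where(c => c.Country == "Italy")
    .OrderByDescending(c => c.Name)
    .Select(c => new { c.Name, c.City })
    .ToList();

foreach (var item in viaMethods)
{
    Console.WriteLine(item);
}
```

Because anonymous types compare by value, the two result lists can be compared directly with SequenceEqual.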
ThenBy and ThenByDescending
When you need to order data by many different keys, you can
take advantage of the ThenBy and ThenByDescending
operators. Here are their signatures:
These operators have signatures similar to OrderBy
and OrderByDescending. The difference is that
ThenBy and ThenByDescending can be applied only
to IOrderedEnumerable<T> and not to any
IEnumerable<T>. Therefore, you can use the ThenBy
or ThenByDescending operator just after the first use
of OrderBy or OrderByDescending.
Here is an example:
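A minimal sketch of such a call, on hypothetical sample data, orders customers by Name in descending order and then by City in ascending order:

```csharp
using System;
using System.Linq;

// Hypothetical sample data: two customers share the same name.
var customers = new[]
{
    new { Name = "Marco", City = "Torino"  },
    new { Name = "Paolo", City = "Padova"  },
    new { Name = "Paolo", City = "Brescia" },
};

// ThenBy can be chained only after OrderBy or OrderByDescending.
var ordered = customers
    .OrderByDescending(c => c.Name)
    .ThenBy(c => c.City)
    .Select(c => $"{c.Name} ({c.City})")
    .ToList();

foreach (var line in ordered)
{
    Console.WriteLine(line);
}
// Paolo (Brescia)
// Paolo (Padova)
// Marco (Torino)
```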
In Listing 4-12, you
can see the corresponding query expression.
Listing 4-12: A
query expression with orderby and thenby
var expr =
from c in customers
where c.Country == Countries.Italy
orderby c.Name descending, c.City
select new { c.Name, c.City };
Important: In LINQ to Objects, when the same key occurs multiple times within a
sequence to be ordered, the ordering operators perform a “stable” sort: items
with equal keys keep their original relative order. Other LINQ providers do not
necessarily offer the same guarantee.
A custom comparer might be useful when the items in your
source sequence need to be ordered using custom logic. For instance,
imagine that you want to select all the orders of your customers ordered by
month:
If you apply the default comparer to the Month
property of the orders, you will get a result alphabetically ordered. The
result is wrong because the Month property is just a
string and not a number or a date:
You should use a custom MonthComparer that
correctly compares months:
The newly defined custom MonthComparer could
be passed as a parameter while invoking the OrderBy extension
method, as in Listing 4-13.
Listing 4-13: A
custom comparer used with an OrderBy operator
IEnumerable<Order> orders =
customers
.SelectMany(c => c.Orders)
.OrderBy(o => o.Month, new MonthComparer());
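The idea behind a month comparer can be sketched as follows. This is not the MonthComparer class itself: as an assumption, the sketch builds an equivalent comparer with Comparer<string>.Create and supposes that the Month property holds English month names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumption: months are stored as English month names.
var calendar = new[]
{
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
};

// Compare two month names by their position in the calendar.
// Unknown names get index -1 and therefore sort first.
var monthComparer = Comparer<string>.Create(
    (x, y) => Array.IndexOf(calendar, x).CompareTo(Array.IndexOf(calendar, y)));

var months = new[] { "March", "January", "December", "May" };

// Default comparer: alphabetical order.
var alphabetical = months.OrderBy(m => m).ToList();

// Custom comparer: calendar order.
var chronological = months.OrderBy(m => m, monthComparer).ToList();

Console.WriteLine(string.Join(", ", alphabetical));  // December, January, March, May
Console.WriteLine(string.Join(", ", chronological)); // January, March, May, December
```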
Reverse Operator
Sometimes you need to reverse the result of a query, listing
the last item in the result first. LINQ provides a last-ordering operator,
called Reverse, that allows you to perform this
operation:
The implementation of Reverse is quite
simple. It just yields each item in the source sequence
in reverse order. Listing 4-14 shows
an example of its use.
Listing 4-14: The
Reverse operator applied
var expr =
customers
.Where(c => c.Country == Countries.Italy)
.OrderByDescending(c => c.Name)
.ThenBy(c => c.City)
.Select(c => new { c.Name, c.City } )
.Reverse();
The Reverse operator, like many other
operators, does not have a short “alias” in LINQ query expressions. However, we
can merge query expression syntax with operators, as shown in
Listing 4-15.
Listing 4-15: The
Reverse operator applied to a query expression with orderby and thenby
var expr =
(from c in customers
where c.Country == Countries.Italy
orderby c.Name descending, c.City
select new { c.Name, c.City }
).Reverse();
As you can see, we apply the Reverse operator
to the expression resulting from Listing 4-12.
Under the covers, the inner query expression is first translated to the
resulting list of extension methods, and then the Reverse
method is applied. It is just like Listing
4-14, but easier to write.
Grouping Operators
Now you have seen how to select, filter, and order sequences
of items. Sometimes when querying contents, you also need to group results
based on specific criteria. To realize content groupings, you use a grouping
operator.
The GroupBy operator, also called a
grouping operator, is the only operator of this family and provides the
following overloads:
All of these overloads return IEnumerable<IGrouping<K,
T>>, where the IGrouping<K, T> generic
interface is a specialization of IEnumerable<T>
that also exposes a Key of type
K shared by all the items in the group:
From a practical point of view, a type that implements this generic
interface is simply a typed enumeration with an identifying
Key for the group. All the GroupBy methods work
on a source sequence as usual, and they call the
keySelector function to extract the Key value
from each item to group results based on the different Key
values. The elementSelector argument, if present,
defines a function that maps the source element within the source
sequence to the destination element of the resulting sequence. If you do not
specify the elementSelector, elements are mapped
directly from the source to the destination.
The GroupBy method selects pairs of keys
and items for each item in source, using the
keySelector predicate and, if present, the elementSelector
argument. Then it yields a sequence of IGrouping<K, T>
objects, where each group consists of a sequence of items with a common
Key value. The last optional argument you can pass to the method is a
custom comparer, which is useful when you need to
compare key values and define group membership. If no custom comparer
is provided, the EqualityComparer<K>.Default is
used. The order of keys and items within each group corresponds to their
occurrence within the source.
Listing 4-16 shows an example of using the GroupBy
operator.
Listing 4-16: The
GroupBy operator used to group customers by Country
var expr = customers.GroupBy(c => c.Country);
foreach(IGrouping<Countries, Customer> customerGroup in expr) {
Console.WriteLine("Country: {0}", customerGroup.Key);
foreach(var item in customerGroup) {
Console.WriteLine(item);
}
}
As Listing 4-16 shows, you
enumerate the groups and, for each group, iterate over the items it
contains. Every group is an instance of a type that implements
IGrouping<Countries, Customer>, because we are using the default
elementSelector that directly projects the source Customer
instances into the result. In query expressions, the GroupBy
operator can be defined using the group …
by … syntax, which is shown in
Listing 4-17.
Listing 4-17: A
query expression with a group by syntax
var expr =
from c in customers
group c by c.Country;
foreach(IGrouping<Countries, Customer> customerGroup in expr) {
Console.WriteLine("Country: {0}", customerGroup.Key);
foreach(var item in customerGroup) {
Console.WriteLine(item);
}
}
The code defined in Listing 4-17
is semantically equivalent to the code shown in
Listing 4-16.
Listing 4-18 is another example
of grouping, this time with a custom elementSelector.
Listing 4-18: The
GroupBy operator used to group customer names by Country
var expr =
customers
.GroupBy(c => c.Country, c => c.Name);
foreach(IGrouping<Countries, string> customerGroup in expr) {
Console.WriteLine("Country: {0}", customerGroup.Key);
foreach(var item in customerGroup) {
Console.WriteLine(" {0}", item);
}
}
Here is the result of this code:
In this last example, the result is a class that implements
IGrouping<Countries, string>, because the elementSelector
predicate projects only the customers’ names (of type string)
into the output sequence.
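The optional comparer argument described earlier can be sketched as follows. The data is hypothetical, with countries stored as strings in inconsistent casing, and StringComparer.OrdinalIgnoreCase plays the role of the custom comparer.

```csharp
using System;
using System.Linq;

// Hypothetical sample data: the Country casing is inconsistent on purpose.
var customers = new[]
{
    new { Name = "Paolo", Country = "Italy" },
    new { Name = "James", Country = "USA"   },
    new { Name = "Marco", Country = "italy" },
    new { Name = "Frank", Country = "USA"   },
};

// keySelector, elementSelector, and a comparer that ignores case
// when deciding group membership.
var byCountry = customers
    .GroupBy(c => c.Country, c => c.Name, StringComparer.OrdinalIgnoreCase)
    .ToList();

foreach (var customerGroup in byCountry)
{
    Console.WriteLine("Country: {0}", customerGroup.Key);
    foreach (var name in customerGroup)
    {
        Console.WriteLine("  {0}", name);
    }
}
```

With the default comparer, the same data would produce three groups, because "Italy" and "italy" would be different keys.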
Join Operators
Join operators are used to define relationships within
sequences in LINQ queries. From a SQL and relational point of view, almost
every query requires joining one or more tables. In LINQ, a set of join
operators is defined to implement this behavior.
Join
The first operator of this group is of course the
Join method, which is defined by the following signatures:
Join requires a set of four generic types. The
T type represents the type of the outer source
sequence, and the U type describes the type of the
inner source sequence. The predicates outerKeySelector
and innerKeySelector define how to extract the
identifying keys from the outer and inner
source sequence items, respectively. These keys are both of type
K, and their equivalence defines the join condition. The
resultSelector predicate defines what to project into the result
sequence, which will be an implementation of IEnumerable<V>.
V is the last generic type needed by the operator, and
it defines the type of each single item in the join result sequence. The second
overload of the method has an additional custom equality comparer, used to
compare the keys. If the comparer argument is null or
if the first overload of the method is invoked, a default key comparer (EqualityComparer<K>.Default)
will be used.
Here is an example that will make the use of Join
clearer. Think about our customers, with their orders and products. In
Listing 4-19, a query joins orders with their corresponding products.
Listing 4-19: The
Join operator used to map orders with products
var expr =
customers
.SelectMany(c => c.Orders)
.Join( products,
o => o.IdProduct,
p => p.IdProduct,
(o, p) => new {o.Month, o.Shipped, p.IdProduct, p.Price });
The following is the result of the query:
In this example, orders represents the
outer sequence and products is the inner sequence. The
o and p used in lambda expressions are of type
Order and Product, respectively. Internally,
the operator collects the elements of the inner sequence
into a hash table, using their keys extracted with innerKeySelector.
It then enumerates the outer sequence and maps its
elements, based on the Key value extracted with
outerKeySelector, to the hash table of items. Because of its
implementation, the Join operator result sequence keeps
the order of the outer sequence first, and then uses
the order of the inner sequence for each
outer sequence element.
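The ordering behavior just described can be observed with a small sketch. The orders and products below are hypothetical anonymous objects, not the types used in the listings.

```csharp
using System;
using System.Linq;

// Hypothetical outer sequence: orders.
var orders = new[]
{
    new { IdOrder = 1, IdProduct = 2 },
    new { IdOrder = 2, IdProduct = 1 },
    new { IdOrder = 3, IdProduct = 2 },
    new { IdOrder = 4, IdProduct = 9 }, // no matching product
};

// Hypothetical inner sequence: products.
var products = new[]
{
    new { IdProduct = 1, Price = 10m },
    new { IdProduct = 2, Price = 20m },
    new { IdProduct = 3, Price = 30m }, // never ordered
};

var joined = orders
    .Join(products,
          o => o.IdProduct,
          p => p.IdProduct,
          (o, p) => new { o.IdOrder, p.IdProduct, p.Price })
    .ToList();

// The result follows the order of the outer sequence; unmatched items
// on either side are dropped, as in an inner equijoin.
foreach (var item in joined)
{
    Console.WriteLine(item);
}
```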
From a SQL point of view, the example in
Listing 4-19 can be thought of as an inner equijoin somewhat like the
following SQL query:
If you want to translate the SQL syntax into the Join
operator syntax, you can think about the columns selection in SQL as the
resultSelector predicate, while the equality condition on
IdProduct columns (of orders and products) corresponds to the pair of
innerKeySelector and outerKeySelector predicates.
The Join operator has a corresponding LINQ
syntax, which is shown in Listing 4-20.
Listing 4-20: The
Join operator query expression syntax
var expr =
from c in customers
from o in c.Orders
join p in products
on o.IdProduct equals p.IdProduct
select new {o.Month, o.Shipped, p.IdProduct, p.Price };
Important: The order of items to relate (o.IdProduct
equals p.IdProduct) in LINQ query syntax must have the outer sequence
first and the inner sequence after; otherwise, the LINQ query will not compile.
This requirement is different from standard SQL queries, in which item ordering
does not matter.
GroupJoin
In cases in which you need to define something similar to a
LEFT OUTER JOIN or a RIGHT OUTER JOIN, you need to use the GroupJoin
operator. Its signatures are quite similar to the Join operator:
The only difference is the definition of the resultSelector
predicate. It requires an instance of IEnumerable<U>,
instead of a single object of type U, because it
projects a hierarchical result of type IEnumerable<V>,
made of a selection of each item extracted from the outer
sequence joined with a group of items, of type U,
extracted from the inner sequence.
As a result of this behavior, the output is not a flattened inner
equijoin, which would be produced by using the Join operator,
but a hierarchical sequence of items. Nevertheless, you can define queries
using GroupJoin with results equivalent to the
Join operator, whenever the mapping is a one-to-one relationship. In
case of the absence of a corresponding element group in the inner
sequence, the GroupJoin operator extracts the
outer sequence element paired with an empty sequence (Count
= 0). In Listing 4-21,
you can see an example of this operator.
Listing 4-21: The
GroupJoin operator used to map products with orders, if present
var expr =
products
.GroupJoin(
customers.SelectMany(c => c.Orders),
p => p.IdProduct,
o => o.IdProduct,
(p, orders) => new { p.IdProduct, Orders = orders });
foreach(var item in expr) {
Console.WriteLine("Product: {0}", item.IdProduct);
foreach (var order in item.Orders) {
Console.WriteLine(" {0}", order);
}
}
The following is the result of Listing
4-21:
You can see that products 4 and 6 have no matching orders, but the
query returns them nonetheless. You can think about this operator like a SELECT
… FOR XML AUTO query in Transact-SQL in Microsoft SQL Server 2000
and 2005. In fact, it returns results hierarchically grouped like a set of XML
nodes nested within their parent nodes, similar to the default result of a FOR
XML AUTO query.
In a query expression, the GroupJoin operator
is defined as a join … into
… clause. The query expression shown in
Listing 4-22 is equivalent to Listing
4-21.
Listing 4-22: A
query expression with a join into clause
var customersOrders =
from c in customers
from o in c.Orders
select o;
var expr =
from p in products
join o in customersOrders
on p.IdProduct equals o.IdProduct
into orders
select new { p.IdProduct, Orders = orders };
In this example, we first define an expression called
customersOrders to extract the flat list of orders. (This expression
still uses the SelectMany operator.) We could also
define a single query expression, nesting the customersOrders
expression within the main query. This approach is shown in
Listing 4-23.
Listing 4-23: The
query expression of Listing 4-22
in its compact version
var expr =
from p in products
join o in (
from c in customers
from o in c.Orders
select o
) on p.IdProduct equals o.IdProduct
into orders
select new { p.IdProduct, Orders = orders };
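To connect GroupJoin with the LEFT OUTER JOIN mentioned at the beginning of this section, here is a hedged sketch on hypothetical data. It first shows the hierarchical result and then flattens it with SelectMany and DefaultIfEmpty, a common pattern that is an addition to the listings above rather than part of them.

```csharp
using System;
using System.Linq;

// Hypothetical sample data: product 4 has no orders.
var products = new[]
{
    new { IdProduct = 1 },
    new { IdProduct = 2 },
    new { IdProduct = 4 },
};
var orders = new[]
{
    new { IdProduct = 1, Quantity = 3  },
    new { IdProduct = 1, Quantity = 5  },
    new { IdProduct = 2, Quantity = 10 },
};

// Hierarchical result: one item per product, each with its group of orders.
var grouped = products
    .GroupJoin(orders,
               p => p.IdProduct,
               o => o.IdProduct,
               (p, os) => new { p.IdProduct, Orders = os.ToList() })
    .ToList();

// Flattened result, similar to a LEFT OUTER JOIN: products without orders
// are kept and paired with a null order.
var leftOuter = products
    .GroupJoin(orders,
               p => p.IdProduct,
               o => o.IdProduct,
               (p, os) => new { p.IdProduct, Orders = os })
    .SelectMany(x => x.Orders.DefaultIfEmpty(),
                (x, o) => new { x.IdProduct, Quantity = o?.Quantity })
    .ToList();

foreach (var row in leftOuter)
{
    Console.WriteLine("{0}: {1}", row.IdProduct, row.Quantity?.ToString() ?? "(none)");
}
```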
Set Operators
Our journey through LINQ operators continues with a group of
methods that are used to handle sets of data, applying common set operations (union,
intersect, and except) and
selecting unique occurrences of items (distinct).
Distinct
Imagine that you want to extract all products that are mapped
to orders, avoiding duplicates. This requirement could be solved in standard
SQL using a DISTINCT clause within a JOIN query. LINQ provides a
Distinct operator, a member of the set operators. Its signature is
quite simple. It requires just a source sequence, from
which all the distinct occurrences of items will be yielded. An example of the
operator is shown in Listing 4-24.
Listing 4-24: The
Distinct operator applied to the list of products used in orders
var expr =
customers
.SelectMany(c => c.Orders)
.Join(products,
o => o.IdProduct,
p => p.IdProduct,
(o, p) => p)
.Distinct();
Distinct does not have an equivalent query expression
clause; hence, as we did in Listing 4-15,
we can apply this operator to the result of a query expression, as shown in
Listing 4-25.
Listing 4-25: The
Distinct operator applied to a query expression
var expr =
(from c in customers
from o in c.Orders
join p in products
on o.IdProduct equals p.IdProduct
select p
).Distinct();
By default, Distinct compares and
identifies elements using their GetHashCode and
Equals methods because, internally, it uses a default comparer of type
EqualityComparer<T>.Default. We can, if necessary, override our
type behavior to change the Distinct result, or we can
just use the second overload of the Distinct method.
This last overload accepts a comparer argument,
available to provide a custom comparer for instances of type T.
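The effect of the comparer argument can be sketched with strings. The city names are hypothetical sample data, and StringComparer.OrdinalIgnoreCase stands in for a custom comparer.

```csharp
using System;
using System.Linq;

// Hypothetical sample data with duplicates that differ only by casing.
var cities = new[] { "Brescia", "Torino", "brescia", "Dallas", "TORINO" };

// First overload: default equality, so casing differences count as distinct.
var distinctDefault = cities.Distinct().ToList();

// Second overload: a comparer that ignores case.
var distinctIgnoreCase = cities.Distinct(StringComparer.OrdinalIgnoreCase).ToList();

Console.WriteLine(distinctDefault.Count);    // 5
Console.WriteLine(distinctIgnoreCase.Count); // 3
```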
Note: We will see an example of how to compare reference types in
the Union operator examples in
Listing 4-26.
Union, Intersect, and Except
Within the set operators group, three more operators are
useful for classic set operations. They are Union,
Intersect, and Except, and they share a similar
definition:
The Union operator yields the
first sequence elements and the second sequence
elements, skipping duplicates. For instance, in
Listing 4-26, you can see how to merge the orders of the second
customer with the orders of the third.
Listing 4-26: The
Union operator applied to the second and third customer orders
var expr = customers[1].Orders.Union(customers[2].Orders);
As with the Distinct operator, in
Union, Intersect, and Except,
the elements are compared by using the GetHashCode and
Equals methods in the first overload, or by using a custom
comparer in the second overload. Here is the result of
Listing 4-26:
The result might seem unexpected because we have two rows that
appear to be the same. However, if you look at the initialization code used in
all of our examples, each order is a different instance of the Order
reference type. Even if the second order of the second customer
is semantically equal to the first order of the third
customer, they have two different hash codes. You can see this effect in the
following code, where the two semantically equivalent Order
instances are in bold:
We have not defined a value type semantic for our Order
reference type. To get the expected result, we can implement a value type
semantic by overriding the GetHashCode and
Equals implementations of the type to be compared. In this situation,
it might be useful to do that, as you can see in this new Order
implementation:
Another way to get the correct result is to use the second overload
of the Union method, providing a custom comparer for
the Order type. A final way to get the expected
distinct behavior is to define the Order type as a
value type, using struct instead of class
in its declaration. However, it is
not always possible to define a struct, because
sometimes you need to implement an object-oriented infrastructure using type
inheritance.
Remember that an anonymous type is defined as a reference type with
a value type semantic. In other words, all anonymous types are defined as a
class with an override of GetHashCode and
Equals written by the compiler.
In Listing 4-27, you can
find an example of using Intersect and Except.
Listing 4-27: The
Intersect and Except operators applied to the second and third customer orders
var expr1 = customers[1].Orders.Intersect(customers[2].Orders);
var expr2 = customers[1].Orders.Except(customers[2].Orders);
The Intersect operator yields only the
elements that occur in both sequences, and the Except operator
yields all the elements in the first sequence that are
not present in the second sequence. Once again, there
are no compact clauses to define set operators in query expressions, but we can
apply them to LINQ query results, as in Listing
4-28.
Listing 4-28: Set
operators applied to query expressions
var expr =
(from c in customers
from o in c.Orders
where c.Country == Countries.Italy
select o
).Intersect(
from c in customers
from o in c.Orders
where c.Country == Countries.USA
select o);
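The difference between reference and value semantics in set operators can be sketched without the Order class. As assumptions for illustration only, int arrays stand in for a reference type that does not override Equals and GetHashCode, and anonymous types stand in for a type with value semantics.

```csharp
using System;
using System.Linq;

// Arrays are reference types that do not override Equals or GetHashCode.
// Each inner array is a hypothetical { Quantity, IdProduct } pair.
var first  = new[] { new[] { 3, 1 }, new[] { 5, 2 } };
var second = new[] { new[] { 5, 2 }, new[] { 10, 1 } };

// Compared by reference: the two { 5, 2 } arrays are different instances.
var unionByReference = first.Union(second).ToList();

// Anonymous types get compiler-generated Equals and GetHashCode.
var firstAnon = new[]
{
    new { Quantity = 3, IdProduct = 1 },
    new { Quantity = 5, IdProduct = 2 },
};
var secondAnon = new[]
{
    new { Quantity = 5,  IdProduct = 2 },
    new { Quantity = 10, IdProduct = 1 },
};

// Compared by value: the duplicate order is recognized.
var unionByValue = firstAnon.Union(secondAnon).ToList();
var common       = firstAnon.Intersect(secondAnon).ToList();
var onlyInFirst  = firstAnon.Except(secondAnon).ToList();

Console.WriteLine(unionByReference.Count); // 4
Console.WriteLine(unionByValue.Count);     // 3
Console.WriteLine(common.Count);           // 1
Console.WriteLine(onlyInFirst.Count);      // 1
```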
Aggregate Operators
At times, you need to make some aggregations over sequences
to make calculations on source items. To accomplish this, LINQ provides the
family of aggregate operators that implement the most common aggregate
functions: Count, LongCount,
Sum, Min, Max,
Average, and Aggregate.
Many of these operators are simple to use because their behavior is easy to
understand.
Count and LongCount
Imagine that you want to list all customers, each one
followed by the number of orders the customer has placed. In
Listing 4-29, you can see an equivalent syntax, based on the
Count operator.
Listing 4-29: The
Count operator applied to customer orders
var expr =
from c in customers
select new {c.Name, c.City, c.Country, OrdersCount = c.Orders.Count() };
The Count operator provides a couple of
signatures, as does the LongCount operator:
The signature used in
Listing 4-29 is the common and simpler one that simply counts items in
the source sequence. The second method overload accepts
a predicate, which must not be null and is used to filter the
items to count. LongCount variations simply return a
long instead of an int.
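A short sketch of the overloads, using a hypothetical array of order quantities:

```csharp
using System;
using System.Linq;

// Hypothetical order quantities.
var quantities = new[] { 3, 5, 10, 20 };

// First overload: counts every item.
int all = quantities.Count();

// Second overload: counts only the items that satisfy the predicate.
int large = quantities.Count(q => q >= 10);

// LongCount returns a long instead of an int.
long allLong = quantities.LongCount();
long largeLong = quantities.LongCount(q => q >= 10);

Console.WriteLine("{0} {1} {2} {3}", all, large, allLong, largeLong); // 4 2 4 2
```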
Sum
The Sum operator requires more
attention because it has multiple definitions:
We used Numeric in the syntax to generalize
the return type of the Sum operator. In practice, it
has many definitions, one for each of the main Numeric types:
int, int?, long,
long?, float, float?,
double, double?,
decimal, and decimal?.
Important: As you probably know, in C# 2.0 and later, the question mark
that appears after a value type name (T?) defines a
nullable type (Nullable<T>) of this type. For
instance, int? means Nullable<System.Int32>.
The first implementation sums the source sequence
items, assuming that the items are all the same numeric type, and returns the
result. In the case of an empty source sequence, zero
is returned. In the case of nullable types, the result might be null. This
implementation can be used when the items can be summed directly. For example,
we can sum an array of integers as in this code:
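A minimal sketch of such code follows. The values are hypothetical, and the sketch also covers the empty-sequence case, nullable items, and the overload that takes a selector.

```csharp
using System;
using System.Linq;

// Summing a sequence of simple numeric items.
int[] values = { 1, 3, 9, 29 };
int total = values.Sum();

// An empty sequence sums to zero.
int emptyTotal = Array.Empty<int>().Sum();

// With nullable items, null values are skipped.
int? nullableTotal = new int?[] { 1, null, 2 }.Sum();

// Overload with a selector: extract the value to sum from each item.
// The anonymous objects are hypothetical stand-ins for order rows.
var orderRows = new[]
{
    new { Quantity = 3, Price = 10m  },
    new { Quantity = 2, Price = 4.5m },
};
decimal amount = orderRows.Sum(o => o.Quantity * o.Price);

Console.WriteLine(total);         // 42
Console.WriteLine(emptyTotal);    // 0
Console.WriteLine(nullableTotal); // 3
Console.WriteLine(amount);
```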
When the sequence is not made up of simple Numeric
types, we need to extract values to be summed from each item in the
source sequence. To do that, we can use the second overload, which
accepts a selector argument. You can see an example of
this syntax in Listing 4-30.
Listing 4-30: The
Sum operator applied to customer orders
var customersOrders =
from c in customers
from o in c.Orders
join p in products
on o.IdProduct equals p.IdProduct
select new { c.Name, OrderAmount = o.Quantity * p.Price };
var expr =
from c in customers
join o in customersOrders
on c.Name equals o.Name
into customersWithOrders
select new { c.Name,
TotalAmount = customersWithOrders.Sum(o => o.OrderAmount) };
In Listing 4-30, we join
customers with the customersOrders sequence, returning
for each customer the total amount of the orders, calculated with the
Sum operator. As usual, we can collapse the previous code using nested
queries, which is the approach shown in Listing
4-31.
Listing 4-31: The
Sum operator applied to customer orders, with a nested query
var expr =
from c in customers
join o in (
from c in customers
from o in c.Orders
join p in products
on o.IdProduct equals p.IdProduct
select new { c.Name, OrderAmount = o.Quantity * p.Price }
) on c.Name equals o.Name
into customersWithOrders
select new { c.Name,
TotalAmount = customersWithOrders.Sum(o => o.OrderAmount) };
Min and Max
Within the set of aggregate operators, Min
and Max calculate the minimum and maximum values of the
source sequence, respectively. Both of these extension methods provide a rich
set of overloads:
The first signature, as in the Sum operator,
provides many definitions for the main numeric types (int,
int?, long, long?,
float, float?, double,
double?, decimal, and
decimal?), and it computes the minimum or maximum value on an
arithmetic basis, using the elements of the source sequence.
This signature is useful when the source elements are numbers by themselves, as
in Listing 4-32.
Listing 4-32: The
Min operator applied to order quantities
var expr =
(from c in customers
from o in c.Orders
select o.Quantity
).Min();
The second signature computes the minimum or maximum value of the
source elements regardless of their type. The comparison is made using the
IComparable<T> interface implementation, if supported by the
source elements, or the nongeneric IComparable interface
implementation. If the source type T does not implement
either of these interfaces, an ArgumentException error
will be thrown, with an Exception.Message equal to “At
least one object must implement IComparable.” To
examine this situation, take a look at Listing
4-33, in which the resulting anonymous type does not implement either
of the interfaces required by the Min operator.
Listing 4-33: The
Min operator applied to wrong types (thereby throwing an ArgumentException)
var expr =
(from c in customers
from o in c.Orders
select new { o.Quantity}
).Min();
In the case of an empty source, the result
will be null whenever the element type is a nullable
type or a reference type; otherwise, an InvalidOperationException will be thrown.
A null source argument causes an ArgumentNullException.
The selector predicate, available in the last two
signatures, defines the function with which to extract values from the
source sequence elements. For instance, you can use these overloads to
avoid errors related to missing interface implementations (IComparable<T>/IComparable),
as in Listing 4-34.
Listing 4-34: The
Max operator applied to custom types, with a value selector
var expr =
(from c in customers
from o in c.Orders
select new { o.Quantity}
).Max(o => o.Quantity);
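The behaviors described in this section can be gathered in one short sketch. The orders array and its values are invented for illustration and are not part of the chapter's sample data.

```csharp
using System;
using System.Linq;

// Hypothetical orders, invented for this sketch.
var orders = new[] {
    new { IdProduct = 1, Quantity = 3 },
    new { IdProduct = 2, Quantity = 5 },
    new { IdProduct = 1, Quantity = 10 }
};

// Numeric overload: the elements are already numbers.
int minQuantity = orders.Select(o => o.Quantity).Min();

// Selector overload: extract the value to compare from each element.
int maxQuantity = orders.Max(o => o.Quantity);

// Empty sequence of a nullable numeric type: the result is null.
int? minOfEmptyNullable = Enumerable.Empty<int?>().Min();

// Empty sequence of a non-nullable numeric type: an exception is thrown.
bool emptyThrows = false;
try { Enumerable.Empty<int>().Min(); }
catch (InvalidOperationException) { emptyThrows = true; }

Console.WriteLine($"{minQuantity} {maxQuantity} {minOfEmptyNullable.HasValue} {emptyThrows}");
```

The selector overload works on the anonymous type because the comparison is made on the extracted int values, not on the elements themselves.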
Average
The Average operator calculates the
arithmetic average of a set of values, extracted from a source sequence. Like
the previous operators, this function works with the source elements themselves
or with values extracted using a selector predicate:
The Numeric type can be int,
int?, long, long?,
float, float?, double,
double?, decimal, or
decimal?. The Result type always reflects the
“nullability” of the numeric type. When the Numeric type
is int or long, the
Result type is double. When the
Numeric type is int? or long?,
the Result type is double?.
Otherwise, the Numeric and Result
types are the same.
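As a quick check of these type rules, the following sketch declares each result with its explicit type. All values are hypothetical and chosen only to keep the arithmetic easy to follow.

```csharp
using System;
using System.Linq;

// Hypothetical values, chosen only to show the result types.
int[] quantities = { 3, 5, 10 };
int?[] optionalQuantities = { 3, null, 9 };
decimal[] prices = { 10m, 20m, 60m };

double avgInt = quantities.Average();               // int -> double
double? avgNullable = optionalQuantities.Average(); // int? -> double?, null values are ignored
decimal avgDecimal = prices.Average();              // decimal -> decimal

// Selector overload on non-numeric elements.
var products = new[] { new { Price = 10m }, new { Price = 30m } };
decimal avgPrice = products.Average(p => p.Price);

// An empty nullable sequence yields null instead of throwing.
double? avgEmpty = Enumerable.Empty<int?>().Average();

Console.WriteLine($"{avgInt} {avgNullable} {avgDecimal} {avgPrice} {avgEmpty.HasValue}");
```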
When the sum of the values used to compute the arithmetic average
is too large for the result type, an OverflowException error
is thrown. Because of its definition, the Average operator’s
first signature can be invoked only on a Numeric sequence.
If you want to invoke it on a sequence of non-numeric elements, you need to
provide a selector predicate. In Listing
4-35, you can see an example of both of the overloads.
Listing 4-35: Both
Average operator signatures applied to product prices
var expr =
(from p in products
select p.Price
).Average();
var exprWithSelector =
(from p in products
select new { p.Price }
).Average(p => p.Price);
The second signature is useful when you are defining a query in
which the average is just one of the results to extract. An example is shown in
Listing 4-36, where we extract all customers and their average order
amounts.
Listing 4-36: Customers
and their average order amounts
var expr =
from c in customers
join o in (
from c in customers
from o in c.Orders
join p in products
on o.IdProduct equals p.IdProduct
select new { c.Name, OrderAmount = o.Quantity * p.Price }
) on c.Name equals o.Name
into customersWithOrders
select new { c.Name,
AverageAmount = customersWithOrders.Average(o =>
o.OrderAmount) };
The results will be similar to the following:
Aggregate
The last operator in this set is Aggregate.
Take a look at its definition:
This operator repeatedly invokes the func function,
storing the result in an accumulator. Every step calls the function with the
current accumulator value as the first argument, starting from seed,
and with the current element within the source sequence
as the second argument. At the end of the iteration, the operator returns the
final accumulator value.
The only difference between the first two signatures is that the
second requires an explicit value for the seed of type
U. The first signature uses the first element in the source
sequence as the seed
and infers the seed type from the source
sequence itself. The third signature looks like the second, but it requires a
resultSelector predicate to call when extracting the final result.
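A minimal sketch of the three forms, applied to a hypothetical array of integers, may help to compare them. The variable names are ours and do not come from the chapter's sample code.

```csharp
using System;
using System.Linq;

int[] numbers = { 1, 2, 3, 4 };

// No seed: the first element (1) is the initial accumulator value.
int product = numbers.Aggregate((acc, n) => acc * n);

// Explicit seed: the accumulator type (string) differs from the element type.
string joined = numbers.Aggregate("", (acc, n) => acc + n);

// Seed plus resultSelector: transform the final accumulator value.
double mean = numbers.Aggregate(
    new { Sum = 0, Count = 0 },
    (acc, n) => new { Sum = acc.Sum + n, Count = acc.Count + 1 },
    acc => (double)acc.Sum / acc.Count);

Console.WriteLine($"{product} {joined} {mean}");
```

Note that the form without a seed throws an InvalidOperationException on an empty sequence, because there is no first element to start from.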
In Listing 4-37, we use
the Aggregate operator to extract the most expensive
order for each customer.
Listing 4-37: Customers
and their most expensive orders
var expr =
from c in customers
join o in (
from c in customers
from o in c.Orders
join p in products
on o.IdProduct equals p.IdProduct
select new { c.Name, o.IdProduct,
OrderAmount = o.Quantity * p.Price }
) on c.Name equals o.Name
into orders
select new { c.Name,
MaxOrderAmount =
orders
.Aggregate((t, s) => t.OrderAmount > s.OrderAmount ?
t : s)
.OrderAmount };
As you can see, the function called by the Aggregate
operator compares the OrderAmount property of each
order executed by the current customer and accumulates the more expensive one.
At the end of each customer aggregation, the accumulator will contain the most
expensive order, and its OrderAmount property will be
projected into the final result, coupled with the customer Name
property. The following is the output from this query:
In Listing 4-38,
you can see another sample of aggregation. This example calculates the total
ordered amount for each product.
Listing 4-38: Products
and their ordered amounts
var expr =
from p in products
join o in (
from c in customers
from o in c.Orders
join p in products
on o.IdProduct equals p.IdProduct
select new { p.IdProduct, OrderAmount = o.Quantity * p.Price }
) on p.IdProduct equals o.IdProduct
into orders
select new { p.IdProduct,
TotalOrderedAmount =
orders
.Aggregate(0m, (a, o) => a + o.OrderAmount)};
Here is the output of this query:
In this second sample, the aggregate function uses an accumulator
of Decimal type. It is initialized to zero (seed
= 0m) and accumulates the OrderAmount values
for every step. The result of this function will also be a Decimal
type.
Both of the previous examples could also be defined by invoking the
Max or Sum operators, respectively. They are
shown in this section to help you learn about the Aggregate
operator’s behavior. In general, keep in mind that the Aggregate
operator is useful whenever there are no specific aggregation operators
available; otherwise, you should use an operator such as Min,
Max, Sum, and so on. For
instance, consider the example in Listing
4-39.
Listing 4-39: Customers
and their most expensive orders paired with the month of execution
var expr =
from c in customers
join o in (
from c in customers
from o in c.Orders
join p in products
on o.IdProduct equals p.IdProduct
select new { c.Name, o.IdProduct, o.Month,
OrderAmount = o.Quantity * p.Price }
) on c.Name equals o.Name into orders
select new { c.Name,
MaxOrder =
orders
.Aggregate( new { Amount = 0m, Month = String.Empty },
(t, s) => t.Amount > s.OrderAmount
? t
: new { Amount = s.OrderAmount,
Month = s.Month })};
The result of Listing 4-39
is shown here:
In this example, the Aggregate operator
returns an instance of an anonymous type, projected as MaxOrder: it is a
tuple composed of the amount and month of the most expensive order made by each
customer. The Aggregate operator used here cannot be
replaced by any of the other predefined aggregate operators because of its
specific behavior and result type.
The only way to produce a similar result using standard
aggregate operators is to call two different aggregators. That would require
two scans of the source sequence: one to get the maximum
amount and one to get its month. Be sure to pay attention to the
seed definition, which declares the resulting anonymous type that will
be used by the aggregation function as well.
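The difference between the two approaches can be sketched on the orders of a single customer. The data below is hypothetical, and the two-scan version is only one possible way to combine standard operators.

```csharp
using System;
using System.Linq;

// Hypothetical orders for a single customer, invented for this sketch.
var orders = new[] {
    new { Month = "January", OrderAmount = 30m },
    new { Month = "May", OrderAmount = 120m },
    new { Month = "July", OrderAmount = 45m }
};

// Two scans: one to find the maximum amount, one to find its month.
decimal maxAmount = orders.Max(o => o.OrderAmount);
string maxMonth = orders.First(o => o.OrderAmount == maxAmount).Month;

// One scan: the seed declares the shape of the accumulator.
var maxOrder = orders.Aggregate(
    new { Amount = 0m, Month = String.Empty },
    (t, s) => t.Amount > s.OrderAmount
        ? t
        : new { Amount = s.OrderAmount, Month = s.Month });

Console.WriteLine($"{maxAmount} {maxMonth} / {maxOrder.Amount} {maxOrder.Month}");
```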
Generation Operators
When working with data by applying aggregates, arithmetic
operations, and mathematical functions, sometimes you need to also iterate over
numbers or item collections. For example, think about a query that needs to
extract orders placed for a particular set of years, between 2000 and 2007, or
a query that needs to repeat the same operation over the same data. The
generation operators are useful for operations such as these.
Range
The first operator of this set is Range.
It is a simple static method that yields a sequence of consecutive integer
numbers, selected within a specified range of values, as shown in its
signature:
The code in Listing 4-40 illustrates
a means to filter orders for the years between 2005 and 2007.
|
Important |
Please note that in the following example, a where
condition would be more appropriate because we are iterating orders
many times. The example in Listing 4-40
is provided only for demonstration and is not the best solution for the
specific query.
|
Listing 4-40: A
set of years generated by the Range operator, used to filter orders
var expr =
Enumerable.Range(2005, 3)
.SelectMany(x => (from o in orders
where o.Year == x
select new { o.Year, o.Amount }));
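To make the point of the preceding note concrete, the following sketch puts the Range-based query next to an equivalent where condition. The orders array is hypothetical; we only assume that each order exposes Year and Amount, as in Listing 4-40.

```csharp
using System;
using System.Linq;

// Hypothetical orders with Year and Amount, invented for this sketch.
var orders = new[] {
    new { Year = 2004, Amount = 10m },
    new { Year = 2005, Amount = 20m },
    new { Year = 2007, Amount = 30m },
    new { Year = 2008, Amount = 40m }
};

// Range-based version: scans orders once per generated year.
var viaRange =
    Enumerable.Range(2005, 3)
        .SelectMany(x => from o in orders
                         where o.Year == x
                         select new { o.Year, o.Amount });

// where-based version: scans orders only once.
var viaWhere =
    from o in orders
    where o.Year >= 2005 && o.Year <= 2007
    select new { o.Year, o.Amount };

Console.WriteLine(viaRange.Sum(o => o.Amount) == viaWhere.Sum(o => o.Amount));
```

The two queries select the same orders, although the Range-based version returns them grouped by year rather than in their original order.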
The Range operator can also be used to
implement classical mathematical operations such as square, power, factorial,
and so on. Listing 4-41 shows an
example of using Range and Aggregate
to calculate the factorial of a number.
Listing 4-41: A
factorial of a number using the Range operator
static int Factorial(int number) {
return Enumerable.Range(1, number)
.Aggregate(1, (s, t) => s * t); }
Repeat
Another generation operator is Repeat,
which returns a set of count occurrences of
element. When the element is an instance of a
reference type, each repetition returns a reference to the same instance, not a
copy of it.
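The reference behavior is easy to verify with a small sketch. Here a List of int stands in for any reference type; it is our choice for illustration, not part of the chapter's sample data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A List<int> stands in for any reference type.
var shared = new List<int>();
List<List<int>> copies = Enumerable.Repeat(shared, 3).ToList();

// Mutating one "repetition" is visible through all of them.
copies[0].Add(42);

bool sameInstance = object.ReferenceEquals(copies[0], copies[2]);
Console.WriteLine($"{sameInstance} {copies[2].Count}");
```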
The Repeat operator is useful for
initializing enumerations (using the same element for all instances) or for
repeating the same query many times. In
Listing 4-42, we repeat the customer name selection two times.
Listing 4-42: The
Repeat operator, used to repeat the same query many times
var expr =
Enumerable.Repeat( (from c in customers
select c.Name), 2)
.SelectMany(x => x);
Please note that in this example, Repeat
returns a sequence of sequences, formed by two lists of customer names. For
this reason, we used SelectMany to get a flat list of
names.
Empty
The last of the generation operators is Empty,
which is used to create an empty enumeration of a particular type
T. This operation can be useful to initialize empty sequences.
Listing 4-43 provides an example that
uses Empty to create an empty enumeration of
Customer.
Listing 4-43: The
Empty operator used to initialize an empty set of customers
IEnumerable<Customer> customers = Enumerable.Empty<Customer>();
Quantifier Operators
Imagine that you need to check for the existence of elements
within a sequence, based on conditions or selection rules. You could first select
items with restriction operators, and then use
aggregate operators such as Count to determine whether
any item that verifies the condition exists. There is, however, a set of
operators, called quantifiers, specifically used to check for existence
conditions over sequences.
Any
The first operator we will describe in this group is the
Any method, which evaluates a predicate and
returns a Boolean result:
As you can see from the method’s signatures, the method has an
overload that requires only the source sequence,
without a predicate. This method returns true when at
least one element in the source sequence exists or
false if the source sequence is empty. To
optimize its execution, Any returns as soon as a result
is available. In Listing 4-44, you
can see an example that determines whether there is any order of product one (IdProduct
== 1) within all the customer orders.
Listing 4-44: The
Any operator applied to all customer orders to check orders of IdProduct == 1
bool result =
(from c in customers
from o in c.Orders
select o)
.Any(o => o.IdProduct == 1);
result = Enumerable.Empty<Order>().Any(o => o.IdProduct == 1);
In this example, the operator evaluates items only until the
first order matching the condition (IdProduct == 1) is
found. The second statement in Listing 4-44
shows a trivial use of the Any operator with
a false result, using the Empty
operator described earlier.
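The early exit can be observed by counting how many times the predicate is invoked. The following is a minimal sketch on a hypothetical array of product identifiers, compared with the Count-based alternative.

```csharp
using System;
using System.Linq;

int[] productIds = { 3, 1, 2, 1 };

// Count how many elements the predicate actually inspects.
int evaluated = 0;
bool anyProductOne = productIds.Any(id => { evaluated++; return id == 1; });

// Same answer, but Count scans the whole sequence.
bool viaCount = productIds.Count(id => id == 1) > 0;

// Without a predicate, Any only reports whether the sequence has elements.
bool emptyHasItems = Enumerable.Empty<int>().Any();

Console.WriteLine($"{anyProductOne} after {evaluated} evaluations, {viaCount}, {emptyHasItems}");
```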
All
When you want to determine whether all of the items of a
sequence verify a filtering condition, you can use the All
operator. It returns a true result only if the
condition is verified by all the elements in the source
sequence:
For instance, in Listing 4-45
we determine whether every order has a positive quantity.
Listing 4-45: The
All operator applied to all customer orders to check the quantity
bool result =
(from c in customers
from o in c.Orders
select o)
.All(o => o.Quantity > 0);
result = Enumerable.Empty<Order>().All(o => o.Quantity > 0);
|
Important |
The All predicate applied to an empty
sequence will always return true. The internal operator
implementation in LINQ to Objects enumerates the source
sequence items and returns false as soon as an element
that does not verify the predicate is found. If the
sequence is empty, the predicate is never called and
the true value is returned.
|
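Both behaviors described in the note can be checked with a counter inside the predicate. The quantities below are hypothetical.

```csharp
using System;
using System.Linq;

int[] quantities = { 3, 0, 5, 7 };

// All stops at the first element that fails the predicate (the 0).
int evaluated = 0;
bool allPositive = quantities.All(q => { evaluated++; return q > 0; });

// The predicate is never invoked on an empty sequence.
int evaluatedOnEmpty = 0;
bool allOnEmpty = Enumerable.Empty<int>()
    .All(q => { evaluatedOnEmpty++; return q > 0; });

Console.WriteLine($"{allPositive} ({evaluated}), {allOnEmpty} ({evaluatedOnEmpty})");
```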
Contains
The last quantifier operator is the Contains
extension method, which determines whether a source sequence
contains a specific item value:
In the LINQ to Objects implementation, the method tries to use the
Contains method of ICollection<T> if the
source sequence implements this interface and no comparer is provided. In cases when
ICollection<T> is not implemented, Contains
enumerates the items in source until it finds a match, comparing each one
with the given value of type T. The comparison uses
the custom comparer passed to the second method
overload, if provided, or EqualityComparer<T>.Default otherwise.
In Listing 4-46, you can
see an example of the Contains method as it is used to
check for the existence of a specific order within the collection of orders of
a customer.
Listing 4-46: The
Contains operator applied to the first customer’s orders
var orderOfProductOne = new Order {Quantity = 3, IdProduct = 1,
Shipped = false, Month = "January"};
bool result = customers[0].Orders.Contains(orderOfProductOne);
Because of its behavior, the Contains method
invoked in Listing 4-46 returns
true only if you use the same instance of Order
as the value to compare. Otherwise, you need a custom
comparer or value type semantics for the
Order type (a reference type that overrides the GetHashCode
and Equals methods or a value type, as we have already
seen) to look for an equivalent order in the sequence.
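The three situations can be sketched without defining an Order type. In this hypothetical example, an int array plays the role of a reference type without value semantics, a tuple plays the role of a value type, and StringComparer.OrdinalIgnoreCase is used as the custom comparer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Arrays are reference types that do not override Equals.
int[] first = { 3, 1 };
IEnumerable<int[]> byReference = new List<int[]> { first, new[] { 5, 2 } };
bool sameInstance = byReference.Contains(first);           // true
bool equalContent = byReference.Contains(new[] { 3, 1 });  // false: different instance

// Tuples are value types and compare member by member.
IEnumerable<(int Quantity, int IdProduct)> byValue =
    new List<(int Quantity, int IdProduct)> { (3, 1), (5, 2) };
bool equalTuple = byValue.Contains((3, 1));                // true

// Second overload: a custom comparer decides what "equal" means.
IEnumerable<string> months = new List<string> { "January", "March" };
bool ignoreCase = months.Contains("JANUARY", StringComparer.OrdinalIgnoreCase); // true

Console.WriteLine($"{sameInstance} {equalContent} {equalTuple} {ignoreCase}");
```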
Partitioning Operators
Selection and filtering operations sometimes need to be
applied only to a subset of the elements of the source sequence. For instance,
you might need to extract only the first N elements that verify a condition.
You can use the Where and Select
operators with the zero-based index argument of their predicate, but this
approach is not always useful and intuitive. It is better to have specific
operators for these kinds of operations because they are performed quite
frequently.
A set of partitioning operators is provided to satisfy these needs.
Take and TakeWhile select the first N items or
the first items that verify a predicate, respectively. Skip
and SkipWhile complement the Take
and TakeWhile operators, skipping the first N items or
the first items that verify a predicate.
Take
We will start with the Take and
TakeWhile family:
The Take operator requires a
count argument that represents the number of items to take from the
source sequence. Zero or negative values of count produce
an empty result; values over the sequence size return the full source
sequence. This method is useful for all queries in which you need the top N
items. For instance, you could use this method to select the top N customers
based on their order amount, as shown in Listing
4-47.
Listing 4-47: The
Take operator, applied to extract the two top customers ordered by order amount
var topTwoCustomers =
(from c in customers
join o in (
from c in customers
from o in c.Orders
join p in products
on o.IdProduct equals p.IdProduct
select new { c.Name, OrderAmount = o.Quantity * p.Price }
) on c.Name equals o.Name
into customersWithOrders
let TotalAmount = customersWithOrders.Sum(o => o.OrderAmount)
orderby TotalAmount descending
select new { c.Name, TotalAmount }
).Take(2);
As you can see, the Take operator
clause is quite simple, while the whole query is more elaborate. The query
contains several of the basic elements and operators we have previously
discussed. The let clause, in addition to
Take, is the only clause that we have not already seen in action. The
let keyword is useful to define an alias for a value or for a variable
representing a formula. In this sample, we need to use the sum of all order
amounts on a customer basis as a value to project into the resulting anonymous
type. At the same time, the same value is used as a sorting condition.
Therefore, we defined an alias named TotalAmount to
avoid duplicate formulas.
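Returning to the count argument of Take, its boundary cases can be checked on a small hypothetical array of amounts, already sorted in descending order.

```csharp
using System;
using System.Linq;

int[] amounts = { 50, 40, 30 };

int[] topTwo = amounts.Take(2).ToArray();       // 50, 40
int[] none = amounts.Take(-1).ToArray();        // negative count: empty result
int[] everything = amounts.Take(10).ToArray();  // count over the size: full sequence

Console.WriteLine($"{topTwo.Length} {none.Length} {everything.Length}");
```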
TakeWhile
The TakeWhile operator works like the
Take operator, but it evaluates a predicate to extract items instead of
using a counter. Here are the method’s signatures:
There are two overloads of the method. The first requires a
predicate that will be evaluated on each source
sequence item. The method enumerates the source sequence
and yields items while the predicate is true;
it stops the enumeration when the predicate result
becomes false, or when the end of the source
is reached. The second overload also passes the zero-based index of each item to the
predicate, so that the condition can depend on the position of the item within the
source sequence.
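A short sketch shows both overloads at work on a hypothetical array of amounts. Note that the enumeration stops at the first failing item, even if later items would satisfy the predicate.

```csharp
using System;
using System.Linq;

int[] amounts = { 50, 40, 30, 45, 10 };

// First overload: stops at the first element that fails the predicate,
// so the later 45 is not returned even though it would pass.
int[] large = amounts.TakeWhile(a => a >= 40).ToArray();

// Second overload: the predicate also receives the zero-based index.
int[] firstThreeLarge = amounts
    .TakeWhile((a, index) => a >= 30 && index < 3)
    .ToArray();

Console.WriteLine($"{string.Join(",", large)} | {string.Join(",", firstThreeLarge)}");
```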
Imagine that you want to identify your top customers, generating a
list that makes up a minimum aggregate amount of orders. The problem looks
similar to the one we solved with the Take operator in
Listing 4-47, but we do not know how many customers we need to examine.
TakeWhile can solve the problem by using a predicate that calculates
the aggregate amount and uses that number to stop the enumeration when the
target is reached. The resulting query is shown in
Listing 4-48.
Listing 4-48: The
TakeWhile operator, applied to extract the top customers that form 80 percent
of all orders
// globalAmount is the total amount for all the orders
var limitAmount = globalAmount * 0.8m;
var aggregated = 0m;
var topCustomers =
(from c in customers
join o in (
from c in customers
from o in c.Orders
join p in products
on o.IdProduct equals p.IdProduct
select new { c.Name, OrderAmount = o.Quantity * p.Price }
) on c.Name equals o.Name
into customersWithOrders
let TotalAmount = customersWithOrders.Sum(o => o.OrderAmount)
orderby TotalAmount descending
select new { c.Name, TotalAmount }
)
.TakeWhile( x => {
bool result = aggregated < limitAmount;
aggregated += x.TotalAmount;
return result;
} );
Skip and SkipWhile
The Skip and SkipWhile
signatures are very similar to those for Take and
TakeWhile:
As we mentioned previously, these operators complement the
Take and TakeWhile couple. In fact, the
following code returns the full sequence of customers:
The only point of interest is that SkipWhile
skips the source sequence items while the
predicate evaluates to true and starts yielding
items as soon as the predicate result is
false, without evaluating the predicate on
the remaining items.
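One way to see how these operators complement each other is to recombine the two halves of a sequence. The sketch below uses a hypothetical array of amounts and Concat as the recombining operator, which is our choice for illustration; it also counts the predicate invocations made by SkipWhile.

```csharp
using System;
using System.Linq;

int[] amounts = { 50, 40, 30, 45, 10 };

// Take(n) followed by Skip(n) rebuilds the original sequence.
bool rebuilt = amounts.Take(2).Concat(amounts.Skip(2)).SequenceEqual(amounts);

// SkipWhile stops calling the predicate after its first false result.
int evaluated = 0;
int[] rest = amounts
    .SkipWhile(a => { evaluated++; return a >= 40; })
    .ToArray();

Console.WriteLine($"{rebuilt} {string.Join(",", rest)} ({evaluated} evaluations)");
```

The 45 is part of the result although it satisfies the condition, because the predicate is no longer evaluated once the 30 has been reached.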
Element Operators
Element operators are defined to work with single items of a
sequence, to extract a specific element by position or by using a predicate,
or to return a default value in case of missing elements.
First
We will start with the First method,
which extracts the first element in the sequence by using a predicate or a
positional rule:
The first overload returns the first element in the
source sequence, and the second overload uses a predicate
to identify the first element to return. If there are no elements that verify
the predicate or there are no elements at all in the
source sequence, the operator will throw an InvalidOperationException
error. Listing 4-49 shows an
example of the First operator.
Listing 4-49: The
First operator, used to select the first American customer
var item = customers.First(c => c.Country == Countries.USA);
Of course, this example could be defined by using a
Where and Take operator. However, the
First method better demonstrates the intention of the query, and it
also guarantees a single (partial) scan of the source sequence.
FirstOrDefault
If you need to find the first element only if it exists,
without any exception in case of failure, you can use the FirstOrDefault
method. This method works like First, but if there are
no elements that verify the predicate or if the source sequence
is empty, it returns a default value:
The default returned is default(T) in the
case of an empty source or when no element matches, where default(T)
is null for reference types and nullable types. If
no predicate argument is provided, the method returns
the first element of the source if it exists. An
example is shown in Listing 4-50.
Listing 4-50: Examples
of the FirstOrDefault operator syntax
var item = customers.FirstOrDefault(c => c.City == "Las Vegas");
Console.WriteLine(item == null ? "null" : item.ToString()); // returns null
IEnumerable<Customer> emptyCustomers = Enumerable.Empty<Customer>();
item = emptyCustomers.FirstOrDefault(c => c.City == "Las Vegas");
Console.WriteLine(item == null ? "null" : item.ToString()); // returns null
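The difference between the two operators, and the meaning of default(T) for a value type, can be sketched with plain strings and integers. The city names are hypothetical stand-ins for the customer data.

```csharp
using System;
using System.Linq;

string[] cities = { "Brescia", "Dallas", "Seattle" };
int[] noNumbers = { };

// FirstOrDefault returns default(T) when nothing matches.
var missingCity = cities.FirstOrDefault(c => c == "Las Vegas"); // null for a reference type
int missingNumber = noNumbers.FirstOrDefault();                  // 0 for int

// First throws in the same situation.
bool firstThrows = false;
try { cities.First(c => c == "Las Vegas"); }
catch (InvalidOperationException) { firstThrows = true; }

Console.WriteLine($"{missingCity == null} {missingNumber} {firstThrows}");
```

With value types, keep in mind that the default value can be indistinguishable from a legitimate element, as with the 0 returned here.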
Last and LastOrDefault
The Last and LastOrDefault
operators are complements of First and FirstOrDefault.
The former have signatures and behaviors that mirror the latter:
These methods work like First and
FirstOrDefault. The only difference is that they select the last
element in source instead of the first.
Single
Whenever you need to select a specific and unique item from a
source sequence, you can use the operators Single
or SingleOrDefault:
If no predicate is provided,
Single extracts from the source sequence its
only element. Otherwise, it extracts the single element that verifies
the predicate. If there is no predicate and the source
sequence is empty or contains more than one item, an InvalidOperationException
error will be thrown. If there is a predicate and there
are no matching elements or there is more than one match in the
source, the method will throw an InvalidOperationException
error, too. You can see some examples in
Listing 4-51.
Listing 4-51: Examples
of the Single operator syntax
// returns Product 1
var item = products.Single(p => p.IdProduct == 1);
Console.WriteLine(item == null ? "null" : item.ToString());
// InvalidOperationException
item = products.Single();
Console.WriteLine(item == null ? "null" : item.ToString());
// InvalidOperationException
IEnumerable<Product> emptyProducts = Enumerable.Empty<Product>();
item = emptyProducts.Single(p => p.IdProduct == 1);
Console.WriteLine(item == null ? "null" : item.ToString());
SingleOrDefault
The SingleOrDefault operator provides
a default result value in the case of an empty sequence or no matching elements
in source. Its signatures are like those for
Single:
The default value returned by this method is default(T),
as in the FirstOrDefault and LastOrDefault
extension methods.
|
Important |
The default value is returned only if no elements match the
predicate. An InvalidOperationException error
is thrown when the source sequence contains more than
one matching item.
|
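The following sketch contrasts the two operators on a hypothetical array of product identifiers that contains a duplicate.

```csharp
using System;
using System.Linq;

int[] productIds = { 1, 2, 3, 3 };

int one = productIds.Single(id => id == 1);               // exactly one match
int noMatch = productIds.SingleOrDefault(id => id == 9);  // no match: default(int), that is 0

// More than one match throws, even with SingleOrDefault.
bool duplicateThrows = false;
try { productIds.SingleOrDefault(id => id == 3); }
catch (InvalidOperationException) { duplicateThrows = true; }

// Single without a match throws as well.
bool missingThrows = false;
try { productIds.Single(id => id == 9); }
catch (InvalidOperationException) { missingThrows = true; }

Console.WriteLine($"{one} {noMatch} {duplicateThrows} {missingThrows}");
```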
ElementAt and ElementAtOrDefault
Whenever you need to extract a specific item from a sequence
based on its position, you can use the ElementAt or
ElementAtOrDefault method:
The ElementAt method requires an
index argument that represents the position of the element to extract.
The index is zero based; therefore, you need to provide
a value of 2 to extract the third element. When the value of index
is negative or greater than or equal to the size of the source sequence,
an ArgumentOutOfRangeException error is thrown. The
ElementAtOrDefault method differs from ElementAt
because it returns a default value (default(T), which is null for
reference types and nullable types) in the case of a negative index
or an index greater than or equal to the size of the
source sequence. Listing 4-52
shows some examples of how to use these operators.
Listing 4-52: Examples
of the ElementAt and ElementAtOrDefault operator syntax
// returns the third product (zero-based index 2)
var item = products.ElementAt(2);
Console.WriteLine(item == null ? "null" : item.ToString());
// returns null
item = Enumerable.Empty<Product>().ElementAtOrDefault(6);
Console.WriteLine(item == null ? "null" : item.ToString());
// returns null
item = products.ElementAtOrDefault(6);
Console.WriteLine(item == null ? "null" : item.ToString());
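The boundary cases are easier to see on a small list whose size is known. The product names below are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var names = new List<string> { "Product 1", "Product 2", "Product 3" };

string third = names.ElementAt(2);            // index 2 is the third element
var pastEnd = names.ElementAtOrDefault(3);    // index equal to the size: null
var negative = names.ElementAtOrDefault(-1);  // negative index: null

// ElementAt throws for the same out-of-range index.
bool throws = false;
try { names.ElementAt(3); }
catch (ArgumentOutOfRangeException) { throws = true; }

Console.WriteLine($"{third} {pastEnd == null} {negative == null} {throws}");
```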
DefaultIfEmpty
DefaultIfEmpty returns a default element for an empty
sequence:
Normally, it returns the list of items of a source
sequence. In the case of an empty source, it returns a sequence containing a single default value that is
default(T) in the first overload or defaultValue
if you use the second overload of the method.
Defining a specific default value can be helpful in many
circumstances. For instance, imagine that you have a public static property
named Empty, used to return an empty instance of a
Customer:
Sometimes this is useful, especially when unit testing code. Another
situation is when a query uses GroupJoin to realize a
left outer join. The possible resulting null values can be replaced by a default
value chosen by the query author.
In Listing 4-53, you can
see how to use DefaultIfEmpty, optionally with a custom
default value such as Customer.Empty.
Listing 4-53: Example
of the DefaultIfEmpty operator syntax, both with default(T) and a custom
default value
var expr = customers.DefaultIfEmpty();
var noCustomers = Enumerable.Empty<Customer>(); // Empty sequence
IEnumerable<Customer> customersEmpty =
noCustomers.DefaultIfEmpty(Customer.Empty);
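The left outer join scenario mentioned earlier can be sketched as follows. The customers, the orders, and the noOrder placeholder are all hypothetical; the group join is written with the join ... into query syntax, which is translated into a GroupJoin call.

```csharp
using System;
using System.Linq;

// Hypothetical data: Frank has no orders.
var customers = new[] { new { Name = "Paolo" }, new { Name = "Frank" } };
var orders = new[] {
    new { Name = "Paolo", Quantity = 3 },
    new { Name = "Paolo", Quantity = 5 }
};

// Custom default used in place of a missing order.
var noOrder = new { Name = "", Quantity = 0 };

var report =
    (from c in customers
     join o in orders on c.Name equals o.Name into customerOrders
     from item in customerOrders.DefaultIfEmpty(noOrder)
     select new { c.Name, item.Quantity }).ToList();

foreach (var row in report)
    Console.WriteLine($"{row.Name}: {row.Quantity}");
```

Without the custom default, item would be null for Frank and the projection of item.Quantity would throw a NullReferenceException.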
Other Operators
To complete our coverage of LINQ query operators, we describe
a few final extension methods in this section.
Concat
The first one is the concatenation operator, named
Concat. As its name suggests, it simply appends a sequence to another,
as we can see from its signature:
The only requirement for Concat arguments
is that they enumerate the same type T. We can use this
method to append any IEnumerable<T> sequence to
another of the same type. Listing 4-54
shows an example of customer concatenation.
Listing 4-54: The
Concat operator, used to concatenate Italian customers with customers from the
United States
var italianCustomers =
from c in customers
where c.Country == Countries.Italy
select c;
var americanCustomers =
from c in customers
where c.Country == Countries.USA
select c;
var expr = italianCustomers.Concat(americanCustomers);
SequenceEqual
Another useful operator is the equality operator, which
corresponds to the SequenceEqual extension method:
This method compares each item in the first sequence with each
corresponding item in the second sequence. If the two sequences have exactly
the same number of items with equal items in every position, the two sequences
are considered equal. Remember the possible issues of reference type semantics
in this kind of comparison. You can consider overriding GetHashCode
and Equals to drive the result of this operator, or you
can use the second method overload, providing a custom implementation of
IEqualityComparer<T>.
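A minimal sketch of both overloads follows, using hypothetical lists of country names and StringComparer.OrdinalIgnoreCase as the custom comparer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

IEnumerable<string> first = new List<string> { "Italy", "USA" };
IEnumerable<string> sameOrder = new List<string> { "Italy", "USA" };
IEnumerable<string> reversed = new List<string> { "USA", "Italy" };
IEnumerable<string> upperCase = new List<string> { "ITALY", "USA" };

bool equal = first.SequenceEqual(sameOrder);          // true
bool orderMatters = first.SequenceEqual(reversed);    // false: positions differ
bool caseSensitive = first.SequenceEqual(upperCase);  // false with the default comparer

// Second overload: the custom comparer ignores case.
bool withComparer = first.SequenceEqual(upperCase, StringComparer.OrdinalIgnoreCase); // true

Console.WriteLine($"{equal} {orderMatters} {caseSensitive} {withComparer}");
```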
|
|
|
|