What Is LINQ?
LINQ is a programming model that introduces queries as a
first-class concept into any Microsoft .NET language. However, complete support
for LINQ requires some extensions in the language used. These extensions boost
productivity by providing a shorter, more meaningful, and more expressive syntax
to manipulate data.
Following is a simple LINQ query for a typical software solution
that returns the names of customers in Italy:
Do not worry about syntax and keywords (such as var)
for now. The result of this query is a list of strings. You can enumerate these
values with a foreach loop in C#:
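A minimal sketch of such a query and loop might look like the following. The Customer type, its Country field, and the sample data are assumptions made only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity type, assumed for this sketch.
public class Customer {
    public string Name;
    public string Country;
}

public static class ItalianCustomersSketch {
    // Hypothetical in-memory data source.
    static Customer[] Customers = {
        new Customer { Name = "Paolo", Country = "Italy" },
        new Customer { Name = "James", Country = "USA" },
        new Customer { Name = "Marco", Country = "Italy" }
    };

    public static List<string> GetNames() {
        // Query definition: the names of customers in Italy.
        var query =
            from c in Customers
            where c.Country == "Italy"
            select c.Name;

        // Enumerate the resulting strings with a foreach loop.
        var names = new List<string>();
        foreach (string name in query) {
            Console.WriteLine(name);
            names.Add(name);
        }
        return names;
    }
}
```

Calling ItalianCustomersSketch.GetNames() writes each matching name to the console and returns the same names as a list.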
Both the query definition and the
foreach loop just shown are regular C# 3.0 statements. At this point,
you might wonder what we are querying. What is Customers?
Is this query a new form of embedded SQL? Not at all. The same query (and the
following foreach code) can be applied to an SQL
database, to a DataSet, to an array of objects in
memory, or to many other kinds of data. Customers could
be a collection of objects:
Customers could be a DataTable in
a DataSet:
Customers could be an entity class that describes a
physical table in a relational database:
Finally, Customers could be an entity class
that describes a conceptual model and is mapped to a relational database:
As you will see, the SQL-like syntax used in LINQ is called a
query expression. Languages that implement embedded SQL define only a
simplified syntax to put SQL statements into a different language, but these
statements are not integrated into the language’s native syntax and type
system. For example, you cannot call a function written using the host language
in the middle of an SQL statement, although this is possible in LINQ. Moreover,
LINQ is not limited to querying databases, as embedded SQL is.
How LINQ Works
Assuming that you understand the concept of having syntax to
integrate queries into a language, you may want to see how this works. When you
write the following code:
the compiler generates this code:
From now on, we will skip the Customers declaration
for the sake of brevity. When the query becomes longer, as you see here:
the generated code is longer too:
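As a hedged illustration of this translation, the sketch below writes a short and a longer query twice: once as a query expression and once as the chain of method calls that the compiler produces from it. The Customer type and the sample data are assumptions, not the original listings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity type, assumed for this sketch.
public class Customer {
    public string Name;
    public string Country;
}

public static class QueryTranslationSketch {
    static Customer[] Customers = {
        new Customer { Name = "Paolo", Country = "Italy" },
        new Customer { Name = "James", Country = "USA" },
        new Customer { Name = "Marco", Country = "Italy" }
    };

    // Short query, written as a query expression.
    public static IEnumerable<string> ShortExpression() {
        return from c in Customers
               where c.Country == "Italy"
               select c.Name;
    }

    // The same query as method calls with lambda expressions.
    public static IEnumerable<string> ShortMethods() {
        return Customers
            .Where(c => c.Country == "Italy")
            .Select(c => c.Name);
    }

    // Longer query: an orderby clause is added.
    public static IEnumerable<string> LongExpression() {
        return from c in Customers
               where c.Country == "Italy"
               orderby c.Name
               select c.Name;
    }

    // Each clause becomes one more call in the chain.
    public static IEnumerable<string> LongMethods() {
        return Customers
            .Where(c => c.Country == "Italy")
            .OrderBy(c => c.Name)
            .Select(c => c.Name);
    }
}
```

Each pair of methods yields the same sequence, because the query expression is only another way of writing the method calls.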
As you can see, the code apparently calls instance members on the
object returned from the previous call. You will see that this apparent
behavior is regulated by the extension methods feature
of the host language (C# in this case). The implementation of the
Where, OrderBy, and Select
methods (called by the sample query) depends on the type of Customers
and on the namespaces specified in previous using statements.
Extension methods are a fundamental syntax feature that is used by LINQ to
operate with different data domains using the same syntax.
More Info: An extension method appears to extend a class (the class of
Customers in our examples), but in reality it is an external method that
receives the instance of the class that seems to be extended as its first
argument. The var keyword used to declare query infers the variable type
from the initial assignment, which in this case returns an
IEnumerable<T> type.
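To make the mechanism concrete, here is a small sketch of a user-defined extension method. The Filter method and the FilterExtensions class are hypothetical names invented for this example; they are not part of LINQ, whose own operators are defined in a similar way.

```csharp
using System;
using System.Collections.Generic;

public static class FilterExtensions {
    // The "this" modifier on the first parameter lets callers write
    // source.Filter(...) as if Filter were an instance method.
    public static IEnumerable<T> Filter<T>(
        this IEnumerable<T> source, Func<T, bool> predicate) {
        foreach (T item in source) {
            if (predicate(item)) {
                yield return item;
            }
        }
    }
}

public static class ExtensionMethodSketch {
    static int[] numbers = { 1, 2, 3, 4, 5, 6 };

    public static List<int> ViaExtensionSyntax() {
        // var is inferred as IEnumerable<int> from the assignment.
        var evens = numbers.Filter(n => n % 2 == 0);
        return new List<int>(evens);
    }

    public static List<int> ViaStaticCall() {
        // What the compiler actually emits: an ordinary static call
        // that receives the "extended" instance as its first argument.
        var evens = FilterExtensions.Filter(numbers, n => n % 2 == 0);
        return new List<int>(evens);
    }
}
```

Both methods return the same list, which shows that the instance-style call is only syntax over a static method.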
Another important concept is the timing of operations over data. In
general, a LINQ query is not really executed until there is access to the query
result, because it describes a set of operations that will be performed when
necessary. The access to a query result does the real work. This can be
illustrated in the case of a foreach loop:
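The following sketch, which uses hypothetical data and a counter added only for observation, suggests one way to see this deferral: the filter predicate is not invoked when the query is defined, only when the foreach loop enumerates it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionSketch {
    // Returns { predicate calls after definition,
    //           predicate calls after the loop,
    //           number of matching items }.
    public static int[] Run() {
        int[] numbers = { 1, 2, 3, 4 };
        int predicateCalls = 0;

        // Defining the query only describes the operations.
        var query = numbers.Where(n => { predicateCalls++; return n > 2; });
        int callsAfterDefinition = predicateCalls;

        // Enumerating the query does the real work.
        int matches = 0;
        foreach (int n in query) {
            matches++;
        }
        int callsAfterLoop = predicateCalls;

        return new[] { callsAfterDefinition, callsAfterLoop, matches };
    }
}
```

With this data, Run returns 0, 4, and 2: no predicate calls before the loop, one call per source element during it, and two matching items.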
There are also methods that iterate a LINQ query result, producing
a persistent copy of data in memory. For example, the ToList
method produces a typed List<T> collection:
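A short sketch with hypothetical data can illustrate the difference between the query and its persistent copy: the list produced by ToList is fixed at the moment of the call, while the query itself is evaluated again each time it is accessed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ToListSketch {
    // Returns { item count in the snapshot, item count of the live query }.
    public static int[] Run() {
        List<int> numbers = new List<int> { 1, 2, 3 };

        var query =
            from n in numbers
            where n > 1
            select n;

        // ToList iterates the query now and copies the results.
        List<int> snapshot = query.ToList();

        // A later change to the source...
        numbers.Add(4);

        // ...is seen by the query, but not by the copy.
        return new[] { snapshot.Count, query.Count() };
    }
}
```

Here Run returns 2 and 3: the snapshot still holds the two items found when ToList was called, and the query sees the item added afterward.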
When the LINQ query operates on data that is on a relational
database (such as Microsoft SQL Server), the LINQ query generates an equivalent
SQL statement instead of operating with in-memory copies of data tables. The
query execution on the database is delayed until the first access to the query
result. Therefore, if in the last two examples Customers
was a Table<Customer> type (a physical table in a
relational database) or an ObjectQuery<Customer> type
(a conceptual entity mapped to a relational database), the equivalent SQL query
would not be sent to the database until the foreach loop
was executed or the ToList method was called. The LINQ
query can be manipulated and composed in different ways until that time.
Relational Model vs. Hierarchical/Graph Model
At first sight, LINQ might appear to be just another SQL
dialect. This similarity has its roots in the way a LINQ query can describe a
relationship between entities such as an SQL join:
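A join of this kind could be sketched as follows. The flat Customer and Order types, their CustomerID fields, and the sample data are assumptions chosen to mimic two related tables.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flat types that mimic two relational tables.
public class Customer {
    public int CustomerID;
    public string Name;
}
public class Order {
    public int CustomerID;
    public int Quantity;
}

public static class JoinSketch {
    public static List<string> Run() {
        Customer[] customers = {
            new Customer { CustomerID = 1, Name = "Paolo" },
            new Customer { CustomerID = 2, Name = "Marco" }
        };
        Order[] orders = {
            new Order { CustomerID = 1, Quantity = 3 },
            new Order { CustomerID = 2, Quantity = 5 },
            new Order { CustomerID = 1, Quantity = 10 }
        };

        // The relationship is stated explicitly, as in an SQL join.
        var query =
            from c in customers
            join o in orders on c.CustomerID equals o.CustomerID
            select new { c.Name, o.Quantity };

        return query.Select(r => r.Name + ": " + r.Quantity).ToList();
    }
}
```

With this data the result contains "Paolo: 3", "Paolo: 10", and "Marco: 5", one row for each matching pair.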
This is similar to the regular way of querying data in a relational
model. However, LINQ is not limited to a single data domain like the relational
model is. In a hierarchical model, suppose that each customer has its own set
of orders, and each order has its own list of products. In LINQ, we can get the
list of products ordered by each customer in this way:
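One possible shape for such a query is sketched below. The types follow the simple-relationship declarations given in Listing 1-1; the sample data is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Types shaped like the declarations in Listing 1-1.
public class Customer {
    public string Name;
    public string City;
    public Order[] Orders;
}
public struct Order {
    public int Quantity;
    public Product Product;
}
public class Product {
    public int IdProduct;
    public decimal Price;
    public string ProductName;
}

public static class HierarchySketch {
    public static List<string> Run() {
        // Hypothetical sample data.
        var chai = new Product { IdProduct = 1, Price = 10m, ProductName = "Chai" };
        var tofu = new Product { IdProduct = 3, Price = 20m, ProductName = "Tofu" };
        Customer[] customers = {
            new Customer { Name = "Paolo", City = "Brescia",
                Orders = new[] {
                    new Order { Quantity = 3, Product = chai },
                    new Order { Quantity = 5, Product = tofu } } },
            new Customer { Name = "Marco", City = "Torino",
                Orders = new[] {
                    new Order { Quantity = 10, Product = tofu } } }
        };

        // No join: the second from clause navigates c.Orders.
        var query =
            from c in customers
            from o in c.Orders
            select new { c.Name, o.Quantity, o.Product.ProductName };

        return query
            .Select(r => r.Name + ": " + r.Quantity + " x " + r.ProductName)
            .ToList();
    }
}
```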
The previous query contains no joins. The relationship between
Customers and Orders is expressed by the second
from clause, which uses c.Orders to say “get
all Orders of the c
Customer.” The relationship between Orders and
Products is expressed by the Product member of
the Order instance. The result projects the product
name for each order row using o.Product.ProductName.
Hierarchical relationships are expressed in type definitions
through references to other objects. To support the previous query, we would
have classes similar to the following.
Listing 1-1: Type declarations with simple relationships
public class Customer {
    public string Name;
    public string City;
    public Order[] Orders;
}
public struct Order {
    public int Quantity;
    public Product Product;
}
public class Product {
    public int IdProduct;
    public decimal Price;
    public string ProductName;
}
However, chances are that we want to use the same Product
instance for many different Orders of the same product.
We probably also want to filter Orders or
Products without accessing them through Customer.
A common scenario is the one shown below.
Listing 1-2: Type declarations with two-way relationships
public class Customer {
    public string Name;
    public string City;
    public Order[] Orders;
}
public struct Order {
    public int Quantity;
    public Product Product;
    public Customer Customer;
}
public class Product {
    public int IdProduct;
    public decimal Price;
    public string ProductName;
    public Order[] Orders;
}
By having an array of all products declared as follows:
we can query the graph of objects, asking for the list of orders
for the single product with an ID equal to 3:
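A sketch of such a declaration and query follows. The types repeat the two-way declarations of Listing 1-2; the product and customer data is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Types shaped like the declarations in Listing 1-2.
public class Customer {
    public string Name;
    public string City;
    public Order[] Orders;
}
public struct Order {
    public int Quantity;
    public Product Product;
    public Customer Customer;
}
public class Product {
    public int IdProduct;
    public decimal Price;
    public string ProductName;
    public Order[] Orders;
}

public static class ProductOrdersSketch {
    public static List<string> Run() {
        // Hypothetical sample graph.
        var paolo = new Customer { Name = "Paolo", City = "Brescia" };
        var marco = new Customer { Name = "Marco", City = "Torino" };
        var chai = new Product { IdProduct = 1, Price = 10m, ProductName = "Chai" };
        var tofu = new Product { IdProduct = 3, Price = 20m, ProductName = "Tofu" };
        var o1 = new Order { Quantity = 3, Product = chai, Customer = paolo };
        var o2 = new Order { Quantity = 5, Product = tofu, Customer = paolo };
        var o3 = new Order { Quantity = 10, Product = tofu, Customer = marco };

        // Order is a struct, so each array stores its own copy of the value.
        paolo.Orders = new[] { o1, o2 };
        marco.Orders = new[] { o3 };
        chai.Orders = new[] { o1 };
        tofu.Orders = new[] { o2, o3 };

        Product[] products = { chai, tofu };

        // Orders for the single product with an ID equal to 3.
        var query =
            from p in products
            where p.IdProduct == 3
            from o in p.Orders
            select o;

        return query.Select(o => o.Customer.Name + ": " + o.Quantity).ToList();
    }
}
```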
With the same query language, we are querying different data
models. When you do not have a relationship defined between the entities used
in the query, you can always rely on subqueries and joins that are available in
LINQ syntax just as in an SQL language. However, when your data model already
defines entity relationships, you can leverage them, avoiding replication (with
possible mistakes) of the same information in many places.
If you have entity relationships in your data model, you can still
use explicit relationships in a LINQ query, for example, when you want to force
some condition, or when you simply want to relate entities that do not have
native relationships. For example, imagine that you want to find customers and
suppliers who live in the same city. Your data model might not provide an
explicit relationship between these attributes, but you can always write the
following:
And something like the following will be returned:
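As an illustration, such a query could be sketched as below. The Customer and Supplier types and all names and cities are hypothetical, so the rows listed afterward are simply what this sample data produces.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical types with no declared relationship between them.
public class Customer {
    public string Name;
    public string City;
}
public class Supplier {
    public string Name;
    public string City;
}

public static class SameCitySketch {
    public static List<string> Run() {
        Customer[] customers = {
            new Customer { Name = "Paolo", City = "Brescia" },
            new Customer { Name = "Marco", City = "Torino" },
            new Customer { Name = "James", City = "Dallas" }
        };
        Supplier[] suppliers = {
            new Supplier { Name = "Fast Shipping", City = "Brescia" },
            new Supplier { Name = "North Freight", City = "Torino" },
            new Supplier { Name = "Alpine Trucks", City = "Brescia" }
        };

        // Explicit relationship: customers and suppliers in the same city.
        var query =
            from c in customers
            join s in suppliers on c.City equals s.City
            select new { c.City, CustomerName = c.Name, SupplierName = s.Name };

        return query
            .Select(r => r.City + ": " + r.CustomerName + " / " + r.SupplierName)
            .ToList();
    }
}
```

For this data the rows are "Brescia: Paolo / Fast Shipping", "Brescia: Paolo / Alpine Trucks", and "Torino: Marco / North Freight". The customer in Dallas does not appear, because no supplier shares that city.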
If you have experience using SQL queries, you probably assume that
a query result is always a “rectangular” table, one that repeats the data of
some columns many times in a join like the previous one. However, often a query
contains several entities with one or more one-to-many
relationships. With LINQ, you can write queries that return a hierarchy or
graph of objects like the following one:
The last query returns a row for each customer, each containing a
list of suppliers available in the same city as the customer. This result can
be queried again, just as any other object graph with LINQ. Here is how the
hierarchized results might appear:
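A hierarchical result of this kind can be sketched with a join ... into clause (a group join). The types and data are hypothetical, and the output format is chosen only to make the nesting visible.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical types, assumed for this sketch.
public class Customer {
    public string Name;
    public string City;
}
public class Supplier {
    public string Name;
    public string City;
}

public static class GroupJoinSketch {
    public static List<string> Run() {
        Customer[] customers = {
            new Customer { Name = "Paolo", City = "Brescia" },
            new Customer { Name = "Marco", City = "Torino" },
            new Customer { Name = "James", City = "Dallas" }
        };
        Supplier[] suppliers = {
            new Supplier { Name = "Fast Shipping", City = "Brescia" },
            new Supplier { Name = "North Freight", City = "Torino" },
            new Supplier { Name = "Alpine Trucks", City = "Brescia" }
        };

        // One row per customer, each carrying its own list of suppliers.
        var query =
            from c in customers
            join s in suppliers on c.City equals s.City into customerSuppliers
            select new { c.Name, Suppliers = customerSuppliers };

        return query
            .Select(r => r.Name + " (" + r.Suppliers.Count() + "): " +
                string.Join(", ", r.Suppliers.Select(s => s.Name).ToArray()))
            .ToList();
    }
}
```

Unlike the flat join, this form keeps a row for every customer, including one whose supplier list is empty.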
If you want to get a list of customers and provide each customer
with the list of products that customer has ordered at least once and the list
of suppliers in the same city, you can write a query like this:
You can take a look at the result for a couple of customers to
understand how data is returned from the previous single LINQ query:
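One way such a query might be written is sketched here, combining navigation of the Orders relationship with a group join on City. All types, names, and data are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical types, assumed for this sketch.
public class Product { public string ProductName; }
public class Order { public int Quantity; public Product Product; }
public class Customer {
    public string Name;
    public string City;
    public Order[] Orders;
}
public class Supplier { public string Name; public string City; }

public static class CustomerGraphSketch {
    public static List<string> Run() {
        var chai = new Product { ProductName = "Chai" };
        var tofu = new Product { ProductName = "Tofu" };
        Customer[] customers = {
            new Customer { Name = "Paolo", City = "Brescia", Orders = new[] {
                new Order { Quantity = 3, Product = chai },
                new Order { Quantity = 5, Product = tofu },
                new Order { Quantity = 1, Product = chai } } },
            new Customer { Name = "Marco", City = "Torino", Orders = new[] {
                new Order { Quantity = 10, Product = tofu } } }
        };
        Supplier[] suppliers = {
            new Supplier { Name = "Fast Shipping", City = "Brescia" },
            new Supplier { Name = "North Freight", City = "Torino" }
        };

        var query =
            from c in customers
            join s in suppliers on c.City equals s.City into customerSuppliers
            select new {
                c.Name,
                // Each product appears once, however often it was ordered.
                Products = (from o in c.Orders
                            select o.Product.ProductName).Distinct(),
                Suppliers = customerSuppliers.Select(s => s.Name)
            };

        return query
            .Select(r => r.Name +
                " | products: " + string.Join(", ", r.Products.ToArray()) +
                " | suppliers: " + string.Join(", ", r.Suppliers.ToArray()))
            .ToList();
    }
}
```

With this data the two rows are "Paolo | products: Chai, Tofu | suppliers: Fast Shipping" and "Marco | products: Tofu | suppliers: North Freight".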
This type of result would be hard to obtain with one or more SQL
queries, because it would require an analysis of query results to build the
desired object graph. LINQ offers an easy way to move your data from one model
to another and different ways to get the same results.
LINQ requires you to describe your data in terms of entities that
are also types in the language. When you build a LINQ query, it is always a set
of operations on instances of some classes. These objects might be the real
container of data, or they might be a simple description (in terms of metadata)
of the external entity you are going to manipulate. A query can be sent to a
database through an SQL command only if it is applied to a set of types that
map tables and relationships contained in the database. After you have defined
entity classes, you can use both approaches we described (joins and entity
relationship navigation). The conversion of all these operations into SQL
commands is the responsibility of the LINQ engine.
Note: You can create entity classes by using code-generation tools
such as SQLMetal or the LINQ to SQL Designer in Microsoft Visual Studio.
In Listing 1-3, you can see an example of a Product
class that maps a relational table named Products, with five columns that
correspond to public data members.
Listing 1-3: Class declaration mapped on a database table
[Table(Name="Products")]
public class Product {
    [Column(IsPrimaryKey=true)] public int IdProduct;
    [Column(Name="UnitPrice")] public decimal Price;
    [Column()] public string ProductName;
    [Column()] public bool Taxable;
    [Column()] public decimal Tax;
}
When you work on entities that describe external data (such as
database tables), you can create instances of these kinds of classes and
manipulate in-memory objects just as if data from all tables were loaded in
memory. These changes are submitted to the database through SQL commands when
you call the SubmitChanges method, as you can see in
Listing 1-4.
Listing 1-4: Database update calling the SubmitChanges method
var taxableProducts =
    from p in db.Products
    where p.Taxable == true
    select p;
foreach( Product product in taxableProducts ) {
    RecalculateTaxes( product );
}
db.SubmitChanges();
The Product class in the preceding example
represents a row in the Products table of an external database. When
SubmitChanges is called, all changed objects generate an SQL command to
update the corresponding rows in the table.
XML Manipulation
LINQ has a different set of classes and extensions to support
the manipulation of XML data. We will create some examples using the following
scenario. Imagine that your customers are able to send orders using XML files
like the ORDERS.XML file shown in Listing 1-5.
Listing 1-5: A fragment of an XML file of orders
<?xml version="1.0" encoding="utf-8" ?>
<orders xmlns="http://schemas.devleap.com/Orders">
    <order idCustomer="ALFKI" idProduct="1" quantity="10" price="20.59"/>
    <order idCustomer="ANATR" idProduct="5" quantity="20" price="12.99"/>
    <order idCustomer="KOENE" idProduct="7" quantity="15" price="35.50"/>
</orders>
Using standard Microsoft .NET 2.0 System.Xml
classes, you can load the file using a DOM approach or you can parse its
contents using an XmlReader implementation. Regardless
of the solution you choose, you must always consider nodes, node types, XML
namespaces, and whatever else is related to the XML world. Many developers do
not like working with XML because it requires the knowledge of another domain
of data structures and uses syntax of its own.
If you need to extract all the products ordered with their
quantities, you can parse the orders file using an XmlReader
to accomplish this, as shown below.
Listing 1-6: Reading the XML file of orders using an XmlReader
String nsUri = "http://schemas.devleap.com/Orders";
XmlReader xmlOrders = XmlReader.Create( "Orders.xml" );
List<Order> orders = new List<Order>();
Order order = null;
while (xmlOrders.Read()) {
    switch (xmlOrders.NodeType) {
        case XmlNodeType.Element:
            if ((xmlOrders.Name == "order") &&
                (xmlOrders.NamespaceURI == nsUri)) {
                order = new Order();
                order.CustomerID = xmlOrders.GetAttribute( "idCustomer" );
                order.Product = new Product();
                order.Product.IdProduct =
                    Int32.Parse( xmlOrders.GetAttribute( "idProduct" ) );
                order.Product.Price =
                    Decimal.Parse( xmlOrders.GetAttribute( "price" ) );
                order.Quantity =
                    Int32.Parse( xmlOrders.GetAttribute( "quantity" ) );
                orders.Add( order );
            }
            break;
    }
}
You could also use an XQuery like the following one to select
nodes:
However, XQuery also requires learning another language and syntax.
Moreover, the result of the previous XQuery sample should be converted into a
set of Order instances to be used within our code.
Finally, for many developers it is not very intuitive. As we have already said,
LINQ provides a query engine suitable for any kind of source, even an XML
document. By using LINQ queries, you can achieve the same result with less
effort and with unified programming language syntax.
Listing 1-7 shows a LINQ to XML query made over the orders file.
Listing 1-7: Reading the XML file using LINQ to XML
XDocument xmlOrders = XDocument.Load( "Orders.xml" );
XNamespace ns = "http://schemas.devleap.com/Orders";
var orders = from o in xmlOrders.Root.Elements( ns + "order" )
             select new Order {
                 CustomerID = (String)o.Attribute( "idCustomer" ),
                 Product = new Product {
                     IdProduct = (Int32)o.Attribute("idProduct"),
                     Price = (Decimal)o.Attribute("price") },
                 Quantity = (Int32)o.Attribute("quantity")
             };
Using the new Microsoft Visual Basic 9.0 syntax, you can reference
XML nodes in your code by using an XPath-like syntax, as shown below.
Listing 1-8: Reading the XML file using LINQ to XML and Visual Basic 9.0 syntax
Imports <xmlns:o="http://schemas.devleap.com/Orders">
' ...
Dim xmlOrders As XDocument = XDocument.Load("Orders.xml")
Dim orders = _
    From o In xmlOrders.<o:orders>.<o:order> _
    Select New Order With { _
        .CustomerID = o.@idCustomer, _
        .Product = New Product With { _
            .IdProduct = o.@idProduct, _
            .Price = o.@price}, _
        .Quantity = o.@quantity}
The result of these LINQ to XML queries could be used to
transparently load a list of Order entities into a
customer Orders property, using LINQ to SQL to submit
the changes into the physical database layer:
If you need to generate an ORDERS.XML file starting from your
customer’s orders, you can at least leverage Visual Basic 9.0 XML literals to
define the output’s XML structure. This is shown below.
Listing 1-9: Creating the XML of orders using Visual Basic 9.0 XML literals
Dim xmlOrders = <o:orders>
                    <%= From o In orders _
                        Select <o:order idCustomer=<%= o.CustomerID %>
                                   idProduct=<%= o.Product.IdProduct %>
                                   quantity=<%= o.Quantity %>
                                   price=<%= o.Product.Price %>/> %>
                </o:orders>
You can appreciate the power of this solution, which keeps the XML
syntax without losing the stability of typed code and transforms a set of
entities selected via LINQ to SQL into an XML InfoSet.
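C# has no XML literals, but a comparable result can be sketched with the functional construction style of LINQ to XML, in which a query is passed directly as the content of an XElement. The Order and Product types below are assumptions modeled on the fields used in the XML listings, and the sample orders are built in memory rather than selected via LINQ to SQL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical types modeled on the fields used in the XML examples.
public class Product {
    public int IdProduct;
    public decimal Price;
}
public class Order {
    public string CustomerID;
    public int Quantity;
    public Product Product;
}

public static class OrdersXmlSketch {
    // Builds an <orders> element with one <order> child per entity.
    public static XElement Build(IEnumerable<Order> orders) {
        XNamespace ns = "http://schemas.devleap.com/Orders";
        return new XElement(ns + "orders",
            from o in orders
            select new XElement(ns + "order",
                new XAttribute("idCustomer", o.CustomerID),
                new XAttribute("idProduct", o.Product.IdProduct),
                new XAttribute("quantity", o.Quantity),
                new XAttribute("price", o.Product.Price)));
    }

    // Sample in-memory orders, mirroring two rows of the orders file.
    public static XElement Sample() {
        var orders = new List<Order> {
            new Order { CustomerID = "ALFKI", Quantity = 10,
                Product = new Product { IdProduct = 1, Price = 20.59m } },
            new Order { CustomerID = "ANATR", Quantity = 20,
                Product = new Product { IdProduct = 5, Price = 12.99m } }
        };
        return Build(orders);
    }
}
```

Calling OrdersXmlSketch.Sample().ToString() returns the XML text, and the attribute values remain typed until they are written out.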