LINQ to XML Queries
Now consider XML nodes from a different point of view: every node set is a
sequence of nodes, and it can be queried with LINQ just like any other sequence
of type IEnumerable<T>. It follows that every concept we have already applied
to other sequences (in LINQ to Objects, LINQ to Entities, and so forth) also
applies to XML nodes, because LINQ to XML exposes every collection of nodes as
an IEnumerable<T> instance.
For example, we can use the standard query extension methods, already described
in Chapter 4, to query XML nodes, too. There are also custom extension methods,
declared in the System.Xml.Linq.Extensions class, that apply specifically to
sequences of LINQ to XML types such as IEnumerable<XElement> and
IEnumerable<XNode>. In this section, we will cover all these methods.
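As a minimal sketch of the idea that XML nodes are just another sequence, the
following program applies the standard where, orderby, and select operators to
child elements. The sample data and the class name are hypothetical and are not
taken from the chapter's listings.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

class StandardOperatorsSketch
{
    static void Main()
    {
        // Hypothetical sample data, not one of the chapter's listings.
        XElement xmlCustomers = XElement.Parse(
            "<customers>" +
            "<customer name=\"Paolo\" country=\"Italy\" />" +
            "<customer name=\"James\" country=\"USA\" />" +
            "<customer name=\"Marco\" country=\"Italy\" />" +
            "</customers>");

        // Elements() returns IEnumerable<XElement>, so the standard
        // query operators apply directly.
        var italianNames =
            from c in xmlCustomers.Elements("customer")
            where (string)c.Attribute("country") == "Italy"
            orderby (string)c.Attribute("name")
            select (string)c.Attribute("name");

        Console.WriteLine(string.Join(", ", italianNames)); // Marco, Paolo
    }
}
```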
Attribute, Attributes
Each instance of XElement supports three methods to access its attributes:
Attribute(XName name), Attributes(), and Attributes(XName name).
The first method returns a single XAttribute instance that is retrieved by name
if it exists. If it does not exist, the method returns null. The second method
returns a sequence of type IEnumerable<XAttribute> that contains all the
attributes of an XElement instance, which is useful for LINQ queries. The last
method returns a sequence of type IEnumerable<XAttribute> that contains zero or
one items.
The attributes of an element are uniquely named nodes; therefore, an element
with multiple occurrences of the same attribute name cannot exist.
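The three attribute methods can be sketched as follows. The sample element is
modeled on the customer data used in this chapter, and the class name and the
missing phone attribute are assumptions made only for illustration.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

class AttributeAccessSketch
{
    static void Main()
    {
        // Hypothetical sample element.
        XElement customer = XElement.Parse(
            "<customer name=\"Paolo\" city=\"Brescia\" country=\"Italy\" />");

        // Attribute(XName) returns the matching attribute, or null if absent.
        XAttribute city = customer.Attribute("city");
        XAttribute phone = customer.Attribute("phone");
        Console.WriteLine(city.Value);    // Brescia
        Console.WriteLine(phone == null); // True

        // Attributes() returns every attribute as a queryable sequence.
        var names = customer.Attributes().Select(a => a.Name.LocalName);
        Console.WriteLine(string.Join(", ", names)); // name, city, country

        // Attributes(XName) yields zero or one item.
        Console.WriteLine(customer.Attributes("country").Count()); // 1
        Console.WriteLine(customer.Attributes("phone").Count());   // 0
    }
}
```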
Element, Elements
Every XContainer instance provides methods to select a single child element by
name (of type XName) or to select a sequence of child elements, optionally
filtered by name: Element(XName name), Elements(), and Elements(XName name).
The Element method iterates over the child nodes of the current XContainer and
returns the first XElement whose name matches the XName argument, or null if
there is no such element. Because the argument is an XName, you must include
the XML namespace URI when you are looking for a qualified element, as shown in
Listing 6-22.
Listing 6-22: A sample LINQ to XML query based on the Element method
XNamespace ns = "http://schemas.devleap.com/Customers";
XElement xmlCustomers = new XElement(ns + "customers",
    from c in customers
    where c.Country == Countries.Italy
    select new XElement(ns + "customer",
        new XAttribute("name", c.Name),
        new XAttribute("city", c.City),
        new XAttribute("country", c.Country)));
XElement element = xmlCustomers.Element(ns + "customer");
To get all the customers, we can use the Elements
method, as shown in Listing 6-23.
Listing 6-23: Another sample LINQ to XML query based on the Elements method
var elements = xmlCustomers.Elements();
foreach (XElement e in elements) {
    Console.WriteLine(e);
}
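The loop writes each customer element, serialized as XML with its attributes
and its namespace declaration, to the console.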
The last overload of the Elements method simply filters child elements by name.
When an XContainer has more than one child element, neither Element nor
Elements can return a single XElement child unless you provide a name to filter
on. However, you can use the First extension method of LINQ to Objects to
achieve this goal, for example by calling xmlCustomers.Elements().First().
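A minimal sketch of this technique follows. The sample document and the class
name are assumptions; FirstOrDefault is included only to show a variant that
returns null instead of throwing when there are no child elements.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

class FirstChildSketch
{
    static void Main()
    {
        // Hypothetical sample data.
        XElement xmlCustomers = XElement.Parse(
            "<customers>" +
            "<customer name=\"Paolo\" />" +
            "<customer name=\"Marco\" />" +
            "</customers>");

        // First child element, selected without providing a name.
        XElement first = xmlCustomers.Elements().First();
        Console.WriteLine((string)first.Attribute("name")); // Paolo

        // First() throws on an empty sequence; FirstOrDefault() returns null.
        XElement none = new XElement("customers").Elements().FirstOrDefault();
        Console.WriteLine(none == null); // True
    }
}
```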
Let’s try to leverage what we have just learned with LINQ queries.
Imagine that you need to transform a source document into a new schema.
Listing 6-24 shows the source document.
Listing 6-24: Source XML with a list of customers
<?xml version="1.0" encoding="utf-8"?>
<customers>
  <customer name="Paolo" city="Brescia" country="Italy" />
  <customer name="Marco" city="Torino" country="Italy" />
  <customer name="James" city="Dallas" country="USA" />
  <customer name="Frank" city="Seattle" country="USA" />
</customers>
Listing 6-25 shows the desired output, where we changed the namespace of the
elements, turned the name and city attributes into child elements, and kept
only the customer elements whose country is Italy.
Listing 6-25: Destination XML with a list of customers transformed
<?xml version="1.0" encoding="utf-8"?>
<c:customers xmlns:c="http://schemas.devleap.com/Customers">
  <c:customer>
    <c:name>Paolo</c:name>
    <c:city>Brescia</c:city>
  </c:customer>
  <c:customer>
    <c:name>Marco</c:name>
    <c:city>Torino</c:city>
  </c:customer>
</c:customers>
We could use XSLT to transform the source into the output. Listing 6-26
provides a simple XSLT stylesheet that does so.
Listing 6-26: XSLT to transform XML from Listing 6-24 to Listing 6-25
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:c="http://schemas.devleap.com/Customers">
  <xsl:template match="customers">
    <c:customers>
      <xsl:for-each select="customer[@country = 'Italy']">
        <c:customer>
          <c:name><xsl:value-of select="@name"/></c:name>
          <c:city><xsl:value-of select="@city"/></c:city>
        </c:customer>
      </xsl:for-each>
    </c:customers>
  </xsl:template>
</xsl:stylesheet>
Nevertheless, if we are already in .NET code, we can avoid exiting
from our code context and instead use a simple LINQ query like the one in
Listing 6-27.
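Listing 6-27: A functional construction used to transform XML from Listing 6-24 to Listing 6-25
XNamespace ns = "http://schemas.devleap.com/Customers";
// Assumes sourceXmlCustomers is the root <customers> XElement.
XElement destinationXmlCustomers =
    new XElement(ns + "customers",
        // Declare the "c" prefix once, on the root element.
        new XAttribute(XNamespace.Xmlns + "c", ns),
        from c in sourceXmlCustomers.Elements("customer")
        // The string cast yields null, not an exception, if the attribute is missing.
        where (string)c.Attribute("country") == "Italy"
        select new XElement(ns + "customer",
            // Copy the attribute values: passing the XAttribute itself
            // would add it as an attribute rather than as text content.
            new XElement(ns + "name", (string)c.Attribute("name")),
            new XElement(ns + "city", (string)c.Attribute("city"))));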
We personally like XSLT and its rich syntax, but using it requires learning
another language. Many developers are not familiar with XSLT and will probably
prefer the LINQ solution, which is easier for a .NET developer to write and is
also checked by the compiler. Finally, you can consider the Visual Basic 9.0
version of this code, shown in Listing 6-28.
Listing 6-28: A Visual Basic 9.0 XML literal used to transform XML from Listing 6-24 to Listing 6-25
Dim destinationXmlCustomers = _
    <c:customers xmlns:c="http://schemas.devleap.com/Customers">
        <%= From c In sourceXmlCustomers.<customers>.<customer> _
            Where (c.@country = "Italy") _
            Select _
            <c:customer xmlns:c="http://schemas.devleap.com/Customers">
                <c:name><%= c.@name %></c:name>
                <c:city><%= c.@city %></c:city>
            </c:customer> %>
    </c:customers>
This approach is probably the quickest to write and the easiest to understand
because you can think directly about the output XML. We can make it even easier
by using global XML namespaces. It is important to notice the syntax used to
select elements and attributes from the source XML document. We use a special
Visual Basic 9.0 syntax that you already saw in Chapter 3. The syntax recalls
XPath node selection. As you can see, we select all the element nodes named
customer that are children of the customers element within sourceXmlCustomers
by using the syntax sourceXmlCustomers.<customers>.<customer>.
As it does with XML literals, the Visual Basic 9.0 compiler converts this
syntax into standard calls to the LINQ to XML Elements method. In the same way,
the syntax used to select the attributes named name and city (c.@name and
c.@city) recalls XPath attribute selection rules and is converted into calls to
the Attribute method of the XElement type.
Sometimes XML schemas support optional elements or optional attributes. When we
define transformations using LINQ to XML, we work at a higher level and use
object instances rather than nodes. If we define an XElement using functional
construction and assign it null content, the result is an empty closed element.
For example, new XElement("city", c.City) produces the empty tag <city /> when
c.City is null.
If we need to omit the element when its value is null, we can use the
conditional operator to pass either the new XElement or null as content to the
parent element. Whenever we add null content to an XContainer, it is skipped
without throwing any kind of exception.
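Both behaviors can be sketched as follows. The local variable city and the
class name are hypothetical stand-ins for a real optional value.
SaveOptions.DisableFormatting is used only to keep the output on one line.

```csharp
using System;
using System.Xml.Linq;

class NullContentSketch
{
    static void Main()
    {
        // Hypothetical optional value that happens to be missing.
        string city = null;

        // Null content produces an empty element.
        XElement withEmpty = new XElement("customer",
            new XElement("name", "Paolo"),
            new XElement("city", city));
        Console.WriteLine(withEmpty.ToString(SaveOptions.DisableFormatting));
        // <customer><name>Paolo</name><city /></customer>

        // The conditional operator passes null instead of an element,
        // and the parent container silently skips it.
        XElement withoutEmpty = new XElement("customer",
            new XElement("name", "Paolo"),
            city != null ? new XElement("city", city) : null);
        Console.WriteLine(withoutEmpty.ToString(SaveOptions.DisableFormatting));
        // <customer><name>Paolo</name></customer>
    }
}
```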
XNode Selection Methods
The XNode class provides some methods that are useful for selecting elements
and nodes related to the current node itself. For instance, the
ElementsBeforeSelf and ElementsAfterSelf methods both return a sequence of type
IEnumerable<XElement> that contains the sibling elements before or after the
current node, respectively, in document order. They both provide an overload
with a parameter of type XName to filter elements by name.
In addition, the NodesBeforeSelf and NodesAfterSelf methods return a sequence
of type IEnumerable<XNode> that contains all the sibling nodes, regardless of
their node type, before or after the current one.
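As a rough illustration of these sibling selection methods, the following
sketch starts from one customer element and looks at what precedes and follows
it. The sample document, including its comment node, and the class name are
assumptions made for the example.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

class SiblingSelectionSketch
{
    static void Main()
    {
        // Hypothetical sample data: one comment followed by four elements.
        XElement customers = XElement.Parse(
            "<customers><!-- list -->" +
            "<customer name=\"Paolo\" /><customer name=\"Marco\" />" +
            "<customer name=\"James\" /><customer name=\"Frank\" />" +
            "</customers>");

        XElement james = customers.Elements("customer")
            .First(c => (string)c.Attribute("name") == "James");

        // Sibling elements only, in document order.
        var before = james.ElementsBeforeSelf()
            .Select(c => (string)c.Attribute("name"));
        var after = james.ElementsAfterSelf()
            .Select(c => (string)c.Attribute("name"));
        Console.WriteLine(string.Join(", ", before)); // Paolo, Marco
        Console.WriteLine(string.Join(", ", after));  // Frank

        // Sibling nodes of any type: the comment is counted as well.
        Console.WriteLine(james.NodesBeforeSelf().Count()); // 3
        Console.WriteLine(james.NodesAfterSelf().Count());  // 1
    }
}
```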
Similarities Between XPath Axes and Extension Methods
The System.Xml.Linq.Extensions class defines extension methods that recall the
XPath axes. The first two that we will consider are Ancestors and Descendants,
which return an IEnumerable<XElement> sequence and are also available as
instance methods of XNode and XContainer, respectively. Descendants returns all
the elements contained in the current node, regardless of their depth in the
document graph. Ancestors is complementary to Descendants and returns all the
elements that contain the current node, from its parent up to the root. Both
have an overload that accepts an XName to filter the results by name.
These methods are useful for querying an XML source to find a particular
element below or above the current one, regardless of its depth in the graph.
Consider the XML document in Listing 6-29.
Listing 6-29: An XML instance to search with LINQ to XML
<?xml version="1.0" encoding="ibm850"?>
<customers>
  <customer>
    <name>Paolo</name>
    <city>Brescia</city>
    <country>Italy</country>
  </customer>
  <customer>
    <name>Marco</name>
    <city>Torino</city>
    <country>Italy</country>
  </customer>
</customers>
Calling Descendants().Count() on the root customers element of an XML document
like the one in Listing 6-29 returns 8, the number of its descendant elements.
The descendant elements are as follows: two <customer /> elements, two
<name /> elements, two <city /> elements, and two <country /> elements.
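The count can be checked with a short sketch. The document mirrors the one in
Listing 6-29, while the variable and class names are assumptions. The last call
shows that the same query on an XDocument also counts the root element.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

class DescendantsSketch
{
    static void Main()
    {
        XElement xmlCustomers = XElement.Parse(
            "<customers>" +
            "<customer><name>Paolo</name><city>Brescia</city>" +
            "<country>Italy</country></customer>" +
            "<customer><name>Marco</name><city>Torino</city>" +
            "<country>Italy</country></customer>" +
            "</customers>");

        // All elements below the root, at any depth.
        Console.WriteLine(xmlCustomers.Descendants().Count());       // 8
        // The XName overload filters by element name.
        Console.WriteLine(xmlCustomers.Descendants("city").Count()); // 2

        // Ancestors walks upward, starting from the parent.
        XElement firstCity = xmlCustomers.Descendants("city").First();
        var ancestors = firstCity.Ancestors().Select(e => e.Name.LocalName);
        Console.WriteLine(string.Join(", ", ancestors)); // customer, customers

        // From the document node, the root element is a descendant too.
        XDocument document = new XDocument(xmlCustomers);
        Console.WriteLine(document.Descendants().Count()); // 9
    }
}
```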
Two other extension methods that work like the previous ones are
AncestorsAndSelf and DescendantsAndSelf. They both act like the previously seen
methods but also return the current element. As happens with XPath axes, we can
retrieve the current element together with all the elements above and below it
by taking the union of the results of Ancestors and DescendantsAndSelf, or of
AncestorsAndSelf and Descendants. Siblings of the current element and of its
ancestors are not part of either result.
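The following sketch shows the two methods and their union on a small document.
The sample data and the class name are assumptions for illustration only.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

class AndSelfSketch
{
    static void Main()
    {
        // Hypothetical sample data: 9 elements in total.
        XElement xmlCustomers = XElement.Parse(
            "<customers>" +
            "<customer><name>Paolo</name><city>Brescia</city>" +
            "<country>Italy</country></customer>" +
            "<customer><name>Marco</name><city>Torino</city>" +
            "<country>Italy</country></customer>" +
            "</customers>");

        XElement customer = xmlCustomers.Elements("customer").First();

        // The element itself plus its name, city, and country children.
        Console.WriteLine(customer.DescendantsAndSelf().Count()); // 4

        // The element itself, then its ancestors up to the root.
        var upward = customer.AncestorsAndSelf().Select(e => e.Name.LocalName);
        Console.WriteLine(string.Join(", ", upward)); // customer, customers

        // Union of the two axes: the element, everything above it,
        // and everything below it (the second customer is excluded).
        var union = customer.Ancestors().Concat(customer.DescendantsAndSelf());
        Console.WriteLine(union.Count()); // 5
    }
}
```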
If you need to select all the descendant nodes rather than only the elements,
you can use methods such as DescendantNodes of XContainer or
DescendantNodesAndSelf of XElement, which return all descendant nodes
regardless of their node types, together with the node itself in the case of
DescendantNodesAndSelf. There is also a Nodes method, available both on
XContainer and as an extension method for sequences of containers, which
returns all child nodes, again regardless of their node types.
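The difference between elements and nodes can be seen in a small sketch. The
sample element, with its comment and text content, and the class name are
hypothetical.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

class DescendantNodesSketch
{
    static void Main()
    {
        // Hypothetical sample data: a comment, two elements, two text nodes.
        XElement customer = XElement.Parse(
            "<customer><!-- preferred -->" +
            "<name>Paolo</name><city>Brescia</city></customer>");

        // Elements only: name and city.
        Console.WriteLine(customer.Descendants().Count());            // 2
        // All nodes: comment, name, its text, city, its text.
        Console.WriteLine(customer.DescendantNodes().Count());        // 5
        // The same nodes plus the customer element itself.
        Console.WriteLine(customer.DescendantNodesAndSelf().Count()); // 6
        // Direct children only: comment, name, city.
        Console.WriteLine(customer.Nodes().Count());                  // 3
    }
}
```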
InDocumentOrder
One last extension method that needs to be explained is InDocumentOrder. It
sorts a sequence of nodes (of type IEnumerable<T>, where T derives from XNode)
that belong to the same XDocument by using the previously seen
XNodeDocumentOrderComparer class, which bases its behavior on the
CompareDocumentOrder method. This extension method is very useful whenever you
want to select nodes ordered on the basis of their order of occurrence in a
document.
For example, applying it to the sequence returned by
xmlCustomers.DescendantNodes() yields the full list of nodes declared within
the xmlCustomers document, ordered by declaration.
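To make the effect of the sort visible, the following sketch first reverses a
sequence of elements and then restores it with InDocumentOrder. The sample data
and the class name are assumptions, and Reverse is used only to scramble the
sequence.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

class InDocumentOrderSketch
{
    static void Main()
    {
        // Hypothetical sample data.
        XElement xmlCustomers = XElement.Parse(
            "<customers>" +
            "<customer><name>Paolo</name></customer>" +
            "<customer><name>Marco</name></customer>" +
            "</customers>");

        // Reverse() puts the elements out of document order.
        var reversed = xmlCustomers.Descendants().Reverse().ToList();
        Console.WriteLine(string.Join(", ",
            reversed.Select(e => e.Name.LocalName)));
        // name, customer, name, customer

        // InDocumentOrder() sorts them back by position in the document.
        var ordered = reversed.InDocumentOrder();
        Console.WriteLine(string.Join(", ",
            ordered.Select(e => e.Name.LocalName)));
        // customer, name, customer, name
    }
}
```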