LINQ to XML API
The LINQ to XML API is independent from LINQ to XML queries,
and it allows developers to build and manage XML contents regardless of whether
they will query them with LINQ and extension methods. You can use this API as a
stand-alone utility or in conjunction with LINQ queries. This new API is built
with World Wide Web Consortium (W3C) XML Infoset instances in mind, rather than
just XML 1.0 documents. Therefore, the in-memory tree is the objective of this
API, not the bare XML text file.
Note
W3C defines XML Infoset as a set of
information items that describes the structure of any well-formed XML document.
You can think of an XML Infoset as the in-memory node graph description,
corresponding to an XML document, aside from the physical nature of the
document itself. For further details on XML Infoset, read the W3C
Recommendation:
http://www.w3.org/TR/xml-infoset/.
The goal of the LINQ to XML API is to provide an object-oriented
approach for XML construction and management, avoiding or solving many common
issues related to XML manipulation through W3C DOM. With LINQ to XML, the
approach to XML is no longer document centric, as it is in W3C DOM. Using LINQ
to XML, elements can be created and can exist detached from any document,
namespace usage has been simplified, and traversing the in-memory tree is like
scanning any other object graph. To make all of this possible, the API is based
on a set of classes, all with names prefixed by an X (and
which we will often refer to as X* classes in this
chapter), that correspond to the main common nodes of an XML document. In
Figure 6-1, you can see the object model hierarchy.
To start using this API, you must reference the System.Xml.Linq
assembly and use its classes. The following sections describe the main types
defined in System.Xml.Linq.
XElement
This is one of the main classes of the LINQ to XML API. As
you can see from Figure 6-1, it has
the same hierarchical level as the XDocument class and
is derived from the base XNode class, through
XContainer. As its name suggests, it describes an XML element and can
be used as the container of any XML fragment parented to a tag. It provides
many constructors and static methods, some of which are very useful. For
instance, we can load the content of an XElement from
an existing XmlReader instance to reuse existing code
based on System.Xml classes by
using the static Load method of XElement.
Several of its constructors create XML node graphs using functional
construction. The params Object[] list of parameters accepted by one
of these constructors represents the child nodes, attributes, or
both of the element we are defining. For instance, an XElement
named customer, with a child element named
firstName, can be defined by using the code in
Listing 6-3.
Listing 6-3: A
sample XElement constructed using the LINQ to XML API
XElement tag = new XElement("customer",
new XElement("firstName", "Paolo"));
Using a standard DOM approach, we would have to define an
XmlDocument instance, explicitly create the elements, and append each
child node to its parent. Take a look at the code block in
Listing 6-4 to compare a DOM approach with the functional
construction we have just used.
Listing 6-4: Definition
of an XML element using DOM
XmlDocument doc = new XmlDocument();
XmlElement customerElement = doc.CreateElement("customer");
XmlElement firstNameElement = doc.CreateElement("firstName");
firstNameElement.InnerText = "Paolo";
customerElement.AppendChild(firstNameElement);
doc.AppendChild(customerElement);
As you can see, the DOM approach is verbose and difficult to
understand. Probably the easiest way to define this customer element is to use
Visual Basic 9.0 XML literals, as demonstrated in
Listing 6-5.
Listing 6-5: Definition
of an XML element using Visual Basic 9.0 XML literals
Dim customerName As String = "Paolo"
Dim tag As XElement = _
<customer>
<firstName><%= customerName %></firstName>
</customer>
As you saw in
Chapter 3, this syntax will be translated by the Visual Basic 9.0
compiler into the equivalent functional construction.
XElement instances can also be saved to a file (identified by a
String path), an XmlWriter, or a TextWriter.
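As a minimal sketch of these options, the following snippet serializes the customer element of Listing 6-3 in memory instead of writing a file. It is written as top-level statements (C# 9 or later) for brevity, and the variable names are our own.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

XElement customer = new XElement("customer",
    new XElement("firstName", "Paolo"));

// ToString returns the serialized XML; DisableFormatting omits indentation.
string xmlText = customer.ToString(SaveOptions.DisableFormatting);
Console.WriteLine(xmlText);

// Save accepts any TextWriter, here an in-memory StringWriter.
StringWriter textOutput = new StringWriter();
customer.Save(textOutput);

// Save also accepts an XmlWriter, which gives control over writer settings.
StringBuilder builder = new StringBuilder();
XmlWriterSettings settings = new XmlWriterSettings { OmitXmlDeclaration = true };
using (XmlWriter writer = XmlWriter.Create(builder, settings))
{
    customer.Save(writer);
}
Console.WriteLine(builder.ToString());
```

Passing a file path to Save instead of a writer persists the same content to disk.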
Every XElement allows the reading of its content with
direct casting, using a custom implementation of the
Explicit operator, defined to obtain a typed
version of the Value of the element. Compared to a
classic System.Xml.XmlElement, this is a great
improvement because we can manage XML nodes typed from a .NET point of view
with a value-centric approach. To better understand this concept, consider the
sample code in Listing 6-6.
Listing 6-6: Sample
of explicit type casting using XElement content
XElement order = new XElement("order",
new XElement("quantity", 10),
new XElement("price", 50),
new XAttribute("idProduct", "P01"));
Decimal orderTotalAmount =
(Decimal)order.Element("quantity") *
(Decimal)order.Element("price");
Console.WriteLine("Order total amount: {0}", orderTotalAmount);
Here we use an XElement that describes an
order. Imagine that we received this instance of the order from an order
management system rather than constructing it explicitly by code. As you can
see, we extract the elements named quantity and
price and we convert them to a Decimal type.
The conversion will return the inner Value of each
element node, trying to cast it to Decimal. To handle
the case of invalid content, we need to catch a FormatException,
because the various Explicit operator overloads
internally use XmlConvert from System.Xml
or Parse methods of .NET types.
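The following sketch illustrates both sides of this behavior: a valid cast, a cast that fails with a FormatException, and a cast to a nullable type, which returns null for a missing element instead of throwing. The price content and the discount element name are invented for the example, and the code uses top-level statements (C# 9 or later) for brevity.

```csharp
using System;
using System.Xml.Linq;

XElement order = new XElement("order",
    new XElement("quantity", 10),
    new XElement("price", "not-a-number"));

// A well-formed value converts directly.
decimal quantity = (decimal)order.Element("quantity");

// Invalid content surfaces as a FormatException at the cast.
decimal? price = null;
try
{
    price = (decimal)order.Element("price");
}
catch (FormatException)
{
    Console.WriteLine("The price element does not contain a valid decimal.");
}

// Casting to a nullable type yields null when the element does not exist.
decimal? discount = (decimal?)order.Element("discount");

Console.WriteLine("Quantity: {0}, price known: {1}, discount known: {2}",
    quantity, price.HasValue, discount.HasValue);
```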
Finally, note that the XElement constructor
automatically handles XML encoding of text. Consider
Listing 6-7.
Listing 6-7: Sample
of automatic escaping of XML text
XElement notes = new XElement("notes",
"Some special characters like & > < <div/> etc.");
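The text is encoded automatically when the element is serialized: the
ampersand is written as &amp; and the angle brackets are escaped as well,
so the <div/> inside the string remains plain text rather than becoming a
child element.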
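A short check of this behavior, reusing the text of Listing 6-7, might look like the following sketch (top-level statements, C# 9 or later; the variable names are our own). It shows that the serialized form is escaped while the Value property still returns the original string.

```csharp
using System;
using System.Xml.Linq;

string text = "Some special characters like & > < <div/> etc.";
XElement notes = new XElement("notes", text);

// The serialized form contains entity references instead of raw markup.
string serialized = notes.ToString();
Console.WriteLine(serialized);

// The in-memory value is unchanged, and parsing the output restores it.
XElement parsed = XElement.Parse(serialized);
Console.WriteLine(parsed.Value == text);
```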
Also, node names are checked against XML naming rules and
invalid names are rejected, throwing a System.Xml.XmlException.
(For further details, see XSD types Name and
NMToken on the W3C Web site at:
http://www.w3.org.) This behavior is different
from that of the old XmlWriter, where names were
automatically encoded. Frankly, we think that it is better to make developers
aware of syntactic rules rather than always hide them under the covers. However,
if you want to define "irregular" node names with
LINQ to XML, you can use the XmlConvert class,
invoking its EncodeName or EncodeNmToken method.
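The sketch below shows both halves of this rule with a deliberately invalid name that contains a space. The element name is an example of our own, and the code uses top-level statements (C# 9 or later) for brevity.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

// A name containing a space violates XML naming rules and is rejected.
bool rejected = false;
try
{
    new XElement("first name", "Paolo");
}
catch (XmlException)
{
    rejected = true;
}

// XmlConvert.EncodeName escapes the invalid character as _xHHHH_.
string encodedName = XmlConvert.EncodeName("first name");
XElement valid = new XElement(encodedName, "Paolo");

// DecodeName reverses the encoding when the original name is needed.
string decodedName = XmlConvert.DecodeName(valid.Name.LocalName);

Console.WriteLine("{0} -> {1} -> {2}", rejected, encodedName, decodedName);
```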
XDocument
The XDocument class represents an XML
Infoset document instance. We can create document instances starting from a
params Object[] list of objects of the following types:
XElement, XDeclaration, XProcessingInstruction,
XDocumentType, and XComment.
Surprisingly, XDocument does not have a
constructor with a parameter of type XmlReader,
Stream, or anything else that describes a source file or Uniform Resource
Identifier (URI). Instead, XDocument, like
XElement, provides a set of static Load methods
that can read from a String (a file path or URI), an XmlReader,
or a TextReader.
To persist an XDocument instance, the class
provides a set of Save methods that write to a file path, an XmlWriter,
or a TextWriter. Generally, an
XDocument instance is useful whenever you need to create processing
instructions or document type declarations on top of the XML document;
otherwise, XElement is a better choice and is easier to
use.
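As an illustration, the following sketch builds a small XDocument with a declaration, a comment, and a processing instruction, saves it to an in-memory writer, and loads it back. The comment text, the stylesheet reference, and the customers content are placeholders of our own; the code uses top-level statements (C# 9 or later) for brevity.

```csharp
using System;
using System.IO;
using System.Xml.Linq;

XDocument document = new XDocument(
    new XDeclaration("1.0", "utf-8", "yes"),
    new XComment("Customer list"),
    // Hypothetical stylesheet reference, used only to show a processing instruction.
    new XProcessingInstruction("xml-stylesheet", "type='text/xsl' href='customers.xsl'"),
    new XElement("customers",
        new XElement("customer", new XAttribute("name", "Paolo"))));

// Save writes the whole document, declaration included, to a TextWriter.
StringWriter output = new StringWriter();
document.Save(output);
Console.WriteLine(output.ToString());

// The static Load method rebuilds the document from a TextReader.
XDocument reloaded = XDocument.Load(new StringReader(output.ToString()));
Console.WriteLine(reloaded.Root.Name);
```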
Important
As we have already seen, Visual Basic 9.0 XML literals are
parsed by the Visual Basic 9.0 compiler to generate standard LINQ to XML API
syntax. During this parsing phase, the compiler supports a subset of the
constructors provided by the various LINQ to XML types. For instance, whenever
you need to create an XDocument using Visual Basic 9.0
XML literals, the only constructor supported is the one that requires a first
argument of type XDeclaration (that is, the XML
declaration) on top of the document. Any other XML literal missing
the leading XDeclaration will be assumed to be an
XElement instance.
XAttribute
This class represents an XML attribute instance and can be
added to any XElement by using its constructor and
LINQ to XML functional construction. Notice that the XAttribute
class is independent from XNode and, consequently, from
XElement and XDocument. It has only the base
XObject class in common with all other X* classes. Like the
XElement class, it provides a rich set of conversion operators so that
it can provide its content already typed from a .NET point of view. From a
practical point of view, working with attributes is quite similar to working
with elements. However, from an internal point of view, attributes are handled
as a name/value pair mapped to the container element. Each XAttribute
provides a couple of properties, called NextAttribute and
PreviousAttribute, that are useful for browsing the
sequence of attributes of an element.
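The following sketch shows typed access to attribute values and a walk over the attribute sequence through NextAttribute. The quantity and shipped attributes are invented for the example, and the code uses top-level statements (C# 9 or later) for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

XElement order = new XElement("order",
    new XAttribute("idProduct", "P01"),
    new XAttribute("quantity", 10),
    new XAttribute("shipped", true));

// Explicit conversion operators return the attribute value already typed.
int quantity = (int)order.Attribute("quantity");
bool shipped = (bool)order.Attribute("shipped");

// FirstAttribute and NextAttribute browse the attributes in document order.
List<string> names = new List<string>();
for (var attribute = order.FirstAttribute; attribute != null; attribute = attribute.NextAttribute)
{
    names.Add(attribute.Name.LocalName);
}
string attributeNames = string.Join(",", names);

Console.WriteLine("{0} {1} {2}", quantity, shipped, attributeNames);
```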
XNode
XNode is the base class for many of the X* classes, and
it implements the entire tree-node management infrastructure, providing methods
to add, move, remove, and replace nodes within the XML Infoset. For instance,
the AddAfterSelf and AddBeforeSelf
methods are useful for inserting one or more nodes after or before the current
one. Listing 6-8 provides an
example of these methods; specifically, it shows how to
use them to insert a couple of addresses into the previously seen
customer, just after the first address.
Listing 6-8: Sample
usage of the AddAfterSelf method of XNode
XElement customer = XElement.Load(@"..\..\customer.xml");
XElement firstAddress =
(customer.Descendants("addresses").Elements("address")).First();
firstAddress.AddAfterSelf(
new XElement("address",
new XAttribute("type", "IT-blog"),
"http://blogs.devleap.com/"),
new XElement("address",
new XAttribute("type", "US-blog"),
"http://weblogs.asp.net/PaoloPia/"));
As you can see, we can add a set of nodes because AddAfterSelf and
AddBeforeSelf each provide two overloads. The first overload of each method
requires a single parameter of type Object, while the second
accepts a params Object[] variable list of
parameters. You might be wondering why these methods, like many of the
previously seen constructors, accept the type Object instead
of XNode or any other X* class instance. The answer is
quite simple but very interesting: Whenever we provide an object to methods and
constructors of X* classes, the API checks to determine whether it implements
IEnumerable, in order to handle its contents recursively; if it does not, the
API converts it to a String, calling its
ToString() implementation. Null parameters are simply ignored.
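The rules just described can be observed in a few lines. In the sketch below, the phone and fax elements and their values are made up for illustration; the code uses top-level statements (C# 9 or later) for brevity.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

XElement[] phones =
{
    new XElement("phone", "555-0100"),
    new XElement("phone", "555-0101")
};
string missing = null;

XElement customer = new XElement("customer",
    phones,    // IEnumerable: each item becomes a child node
    missing,   // null: ignored
    42);       // any other object: converted to text

int phoneCount = customer.Elements("phone").Count();
string trailingText = customer.Nodes().OfType<XText>().Single().Value;

// AddBeforeSelf accepts the same kinds of content as the constructors.
customer.Elements("phone").First().AddBeforeSelf(new XElement("fax", "555-0199"));

Console.WriteLine(customer);
```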
We can write LINQ to XML syntax to load a set of nodes, as in the
following code block, based on functional construction and using C# merged with
LINQ queries. In Listing 6-9, we use
the well-known customers sequence (which we used in
Chapter 4, “LINQ
Syntax Fundamentals”) to build an XML document based on those customers.
Listing 6-9: A
LINQ to XML sentence merged with LINQ queries
XElement xmlCustomers = new XElement("customers",
from c in customers
where c.Country == Countries.Italy
select new XElement("customer",
new XAttribute("name", c.Name),
new XAttribute("city", c.City),
new XAttribute("country", c.Country)));
The result is a customers element that contains one customer element, with
name, city, and country attributes, for each customer located in Italy.
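To try the query without the sample data from Chapter 4, a self-contained variant can use an in-memory array. In this sketch the customers array and its use of a plain string for the country (instead of the Countries enumeration) are assumptions made for the example; the code uses top-level statements (C# 9 or later) for brevity.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Stand-in data; the chapter's real customers sequence is defined elsewhere.
var customers = new[]
{
    new { Name = "Paolo", City = "Brescia", Country = "Italy" },
    new { Name = "Marco", City = "Torino", Country = "Italy" },
    new { Name = "James", City = "Dallas", Country = "USA" }
};

// The query result is an IEnumerable, so each XElement becomes a child node.
XElement xmlCustomers = new XElement("customers",
    from c in customers
    where c.Country == "Italy"
    select new XElement("customer",
        new XAttribute("name", c.Name),
        new XAttribute("city", c.City),
        new XAttribute("country", c.Country)));

Console.WriteLine(xmlCustomers);
```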
A similar document can be built by using Visual Basic 9.0 XML
literals with the code shown in Listing 6-10, which emits a firstName child
element for each customer instead of attributes.
Listing 6-10: A
LINQ to XML sentence merged with LINQ queries, using Visual Basic 9.0 XML
literals
Dim xmlCustomers As XElement = _
<customers>
<%= From c In customers _
Where (c.Country = Countries.Italy) _
Select _
<customer>
<firstName><%= c.FirstName %></firstName>
</customer> %>
</customers>
Another interesting method provided by XNode
is DeepEquals. It is a static method, useful to fully
compare a pair of XML nodes for equality, as the name suggests. It works by
comparing nodes using an internal abstract instance method that is also called
DeepEquals. In this way, every type inherited from XNode
implements its own DeepEquals behavior. For example,
XElement compares element names, element content, and element
attributes. The XNodeEqualityComparer class that we
will use later in this chapter, within LINQ to XML queries, is based on
DeepEquals.
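A minimal sketch of this deep comparison follows, using the static XNode.DeepEquals method as it is named in the released System.Xml.Linq assembly. The sample elements are our own, and the code uses top-level statements (C# 9 or later) for brevity.

```csharp
using System;
using System.Xml.Linq;

XElement first = new XElement("customer",
    new XAttribute("id", "C01"),
    new XElement("firstName", "Paolo"));

// A distinct instance with the same name, attributes, and content.
XElement second = XElement.Parse(
    "<customer id=\"C01\"><firstName>Paolo</firstName></customer>");

// A copy whose attribute value differs.
XElement third = new XElement(first);
third.SetAttributeValue("id", "C02");

bool sameReference = ReferenceEquals(first, second);
bool sameContent = XNode.DeepEquals(first, second);
bool changedContent = XNode.DeepEquals(first, third);

// XNodeEqualityComparer exposes the same comparison to LINQ operators.
XNodeEqualityComparer comparer = new XNodeEqualityComparer();
bool viaComparer = comparer.Equals(first, second);

Console.WriteLine("{0} {1} {2} {3}", sameReference, sameContent, changedContent, viaComparer);
```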
XName and XNamespace
When defining XML contents and node graphs, usually you must
also map nodes to their XML namespace. In Listing
6-11, you can see how to define nodes with an XML namespace by using a
classic DOM approach.
Listing 6-11: XML
namespace handling using classic DOM syntax
XmlDocument document = new XmlDocument();
XmlElement customer = document.CreateElement("c", "customer",
"http://schemas.devleap.com/Customer");
document.AppendChild(customer);
XmlElement firstName = document.CreateElement("c", "firstName",
"http://schemas.devleap.com/Customer");
customer.AppendChild(firstName);
As you can see, we use an overload of the CreateElement
method, which requires three parameters: a namespace prefix, a tag local name,
and the full namespace URI. The same can be done for XML attributes, using
CreateAttribute of XmlDocument or
SetAttribute of XmlElement. To tell the truth,
this way of working is not all that difficult to understand and implement.
Nevertheless, developers often get confused when using this approach and
complain that XML namespaces are difficult to manage. The real issue probably
derives from namespace prefixes, which are just aliases for the real XML
namespaces. Theoretically, prefixes are used to simplify namespace references;
in reality, they might cause confusion. To address feedback from developers,
the LINQ to XML API was designed to provide an easier way of working with XML
namespaces, avoiding any explicit use of prefixes. Every node name is an
instance of the XName class, which can be defined by a
String or by a pairing of an XNamespace and a
String. In Listing 6-12, you
can see how to define XML content by using a single default XML namespace.
Listing 6-12: LINQ
to XML namespace declaration
XNamespace ns = "http://schemas.devleap.com/Customer";
XElement customer = new XElement(ns + "customer",
new XAttribute("id", "C01"),
new XElement(ns + "firstName", "Paolo"),
new XElement(ns + "lastName", "Pialorsi"));
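As you can see, the XNamespace definition
looks like a String, but it is not. Internally, every
XNamespace has a more complex behavior. When the preceding code is
serialized, the customer element carries a default namespace declaration
(xmlns="http://schemas.devleap.com/Customer") that its child elements
inherit, and no prefix appears in the output.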
Using Visual Basic 9.0 syntax, we can define the namespace directly
inside the XML content, as Listing 6-13
shows.
Listing 6-13: Visual
Basic 9.0 XML literals used to declare XML content with a default XML namespace
Dim customer As XDocument = _
<?xml version="1.0" encoding="utf-8"?>
<customer id="C01" xmlns="http://schemas.devleap.com/Customer">
<firstName>Paolo</firstName>
<lastName>Pialorsi</lastName>
</customer>
Now consider Listing 6-14,
where we use a couple of XML namespaces.
Listing 6-14: Multiple
XML namespaces within a single XElement declaration
XNamespace nsCustomer = "http://schemas.devleap.com/Customer";
XNamespace nsAddress = "http://schemas.devleap.com/Address";
XElement customer = new XElement(nsCustomer + "customer",
new XAttribute("id", "C01"),
new XElement(nsCustomer + "firstName", "Paolo"),
new XElement(nsCustomer + "lastName", "Pialorsi"),
new XElement(nsAddress + "addresses",
new XElement(nsAddress + "address",
new XAttribute("type", "email"),
"paolo@devleap.it"),
new XElement(nsAddress + "address",
new XAttribute("type", "home"),
"Brescia - Italy")));
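Again, the output is a document in which every XML node is qualified: the
addresses element redeclares the default namespace for itself and its
children, so no prefixes are needed.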
At this point, we have seen that XNamespace
is quite simple to use and that the LINQ to XML API automatically handles
namespace declaration, avoiding the explicit use of prefixes. You are probably
curious about what happens when we define an XName as a
concatenation of an XNamespace instance and a
String to represent the local name of the node. Each XName
instance can be represented as a String, using its
ToString method. For the customer element of Listing 6-12, the result is
{http://schemas.devleap.com/Customer}customer.
We can use this “resolved” text instead of the concatenation
(XNamespace instance plus local name) used previously, because a String in
this form converts implicitly to an XName.
In the System.Xml.Linq API, the resolved
text “{namespace}local-name” is called the “expanded name” and is semantically
equivalent to defining the XNamespace separately. The
concatenation of an XNamespace and a String
produces a new XName equivalent to the expanded name.
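The equivalence between the two forms can be verified directly. The sketch below reuses the namespace of Listing 6-12 and is written as top-level statements (C# 9 or later) for brevity; the variable names are our own.

```csharp
using System;
using System.Xml.Linq;

XNamespace ns = "http://schemas.devleap.com/Customer";

// An XName built by concatenating a namespace and a local name.
XName qualified = ns + "customer";
string expanded = qualified.ToString();

// The same XName obtained from the expanded name through implicit conversion.
XName fromExpanded = "{http://schemas.devleap.com/Customer}customer";
bool sameName = qualified == fromExpanded;

// The expanded name can be passed wherever an XName is expected.
XElement customer = new XElement("{http://schemas.devleap.com/Customer}customer");

Console.WriteLine(expanded);
Console.WriteLine(sameName);
Console.WriteLine(customer.Name.Namespace == ns);
```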
Now we are missing only XML namespace prefixes. We have seen that
this new API handles namespace declaration by itself. However, sometimes we
might need to influence how to serialize nodes and represent namespaces by
overriding the default behavior of LINQ to XML. To achieve this goal, we can
explicitly define the prefixes to use for namespaces by using xmlns
attributes within our elements, as we do in the example in
Listing 6-15.
Listing 6-15: LINQ
to XML declaration of an XML namespace with a custom prefix
XNamespace ns = "http://schemas.devleap.com/Customer";
XElement customer = new XElement(ns + "customer",
new XAttribute(XNamespace.Xmlns + "c", ns),
new XAttribute("id", "C01"),
new XElement(ns + "firstName", "Paolo"),
new XElement(ns + "lastName", "Pialorsi"));
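In the output, every element of the namespace is written with the c prefix
(c:customer, c:firstName, and c:lastName), and the root element carries the
xmlns:c declaration.
As you can see, we defined “c” as the prefix of nodes associated
with the XNamespace instance named ns.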
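As a quick check of how the declared prefix is applied, the following sketch serializes a shortened version of the element from Listing 6-15 and asks the element which prefix it resolves for the namespace. It uses top-level statements (C# 9 or later) for brevity.

```csharp
using System;
using System.Xml.Linq;

XNamespace ns = "http://schemas.devleap.com/Customer";
XElement customer = new XElement(ns + "customer",
    new XAttribute(XNamespace.Xmlns + "c", ns),
    new XAttribute("id", "C01"),
    new XElement(ns + "firstName", "Paolo"));

// The xmlns:c attribute makes the serializer use "c" for this namespace.
string xml = customer.ToString(SaveOptions.DisableFormatting);
Console.WriteLine(xml);

// GetPrefixOfNamespace reports the prefix in scope for a namespace.
string prefix = customer.GetPrefixOfNamespace(ns);
Console.WriteLine(prefix);
```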
Once again, the corresponding, and simpler, Visual Basic 9.0
syntax is shown in Listing 6-16.
Listing 6-16: Visual
Basic 9.0 XML literals used to declare an XML namespace with a custom prefix
Dim customer As XDocument = _
<?xml version="1.0" encoding="utf-8"?>
<c:customer xmlns:c="http://schemas.devleap.com/Customer" id="C01">
<c:firstName>Paolo</c:firstName>
<c:lastName>Pialorsi</c:lastName>
</c:customer>
You might think that, with LINQ to XML, namespaces are
simpler to handle and prefixes are transparently taken out of your control. On
the other hand, you might now have the impression that if you need to influence
prefixes, you need to do a little more work, at least using C# 3.0. In fact,
Visual Basic 9.0 XML literals also simplify namespace declaration, leveraging a
feature called global XML namespaces. This new feature
allows you to globally declare an XML namespace URI with its corresponding
prefix within a Visual Basic 9.0 code file so that you can reuse it many times
in code. In Listing 6-17, you can see
an example.
Listing 6-17: Visual
Basic 9.0 XML literals and global XML namespaces
Imports System.Xml.Linq
Imports System.Linq
Imports <xmlns:c="http://schemas.devleap.com/Customer">
Public Class Program
Private Shared Sub Listing6_17()
Dim xmlCustomers As XDocument = _
<?xml version="1.0" encoding="utf-8"?>
<c:customers>
<c:customer name="Paolo" city="Brescia" country="Italy"/>
<c:customer name="Marco" city="Torino" country="Italy"/>
<c:customer name="James" city="Dallas" country="USA"/>
<c:customer name="Frank" city="Seattle" country="USA"/>
</c:customers>
End Sub
End Class
The key point of this sample is the Imports
statement, which declares the global namespace prefix c
for namespace
http://schemas.devleap.com/Customer. This
particular kind of Imports syntax can be used only to
declare an XML namespace with its prefix. It is not allowed to declare a
default XML namespace without a prefix.
Let’s look at a final example, shown in
Listing 6-18, using C# 3.0 to define a default namespace and a custom
prefixed one.
Listing 6-18: C#
3.0 syntax used to define a default namespace and a custom prefix for one
XNamespace nsCustomer = "http://schemas.devleap.com/Customer";
XNamespace nsAddress = "http://schemas.devleap.com/Address";
XElement customer = new XElement(nsCustomer + "customer",
new XAttribute("id", "C01"),
new XElement(nsCustomer + "firstName", "Paolo"),
new XElement(nsCustomer + "lastName", "Pialorsi"),
new XElement(nsAddress + "address", "Brescia - Italy",
new XAttribute(XNamespace.Xmlns + "a", nsAddress)));
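The code in Listing 6-18 produces
an XML fragment in which customer, firstName, and lastName belong to the
default namespace, while the address element is written as a:address and
carries the xmlns:a declaration itself.
To query the previous XML content for the purpose of extracting the
lastName node, we can pass the qualified name to the Element method, as in
customer.Element(nsCustomer + "lastName").
Using Visual Basic 9.0 and global XML namespaces, we can write the same query
with XML axis properties and the imported prefix.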
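In C#, a query of this kind can be sketched as follows, rebuilding the element of Listing 6-18 so that the snippet is self-contained. It also shows that an unqualified name does not match a namespace-qualified element. The code uses top-level statements (C# 9 or later) for brevity.

```csharp
using System;
using System.Xml.Linq;

XNamespace nsCustomer = "http://schemas.devleap.com/Customer";
XNamespace nsAddress = "http://schemas.devleap.com/Address";
XElement customer = new XElement(nsCustomer + "customer",
    new XAttribute("id", "C01"),
    new XElement(nsCustomer + "firstName", "Paolo"),
    new XElement(nsCustomer + "lastName", "Pialorsi"),
    new XElement(nsAddress + "address", "Brescia - Italy",
        new XAttribute(XNamespace.Xmlns + "a", nsAddress)));

// Element matches on the full XName: namespace plus local name.
XElement lastName = customer.Element(nsCustomer + "lastName");

// The local name alone belongs to no namespace, so nothing is found.
XElement unqualified = customer.Element("lastName");

// Prefixes play no role in queries; only the namespace URI matters.
XElement address = customer.Element(nsAddress + "address");

Console.WriteLine(lastName.Value);
Console.WriteLine(unqualified == null);
Console.WriteLine(address.Value);
```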
Later in this chapter, we will examine in detail how to query
XML contents using LINQ to XML queries with both C# 3.0 and Visual Basic 9.0
syntax.
Other X* Classes
This new API has other available classes that define
processing instructions (XProcessingInstruction),
document types (XDocumentType), comments (XComment),
and text nodes (XText). They are all derived from
XNode and are typically used to build XDocument
instances.
XObject and Annotations
XObject represents the base class of the whole LINQ to
XML API, and it mainly provides methods and properties to work with annotations
on nodes. Annotations are a new mechanism that maps metadata to XML nodes. For
instance, we can add custom user information to our nodes as shown in
Listing 6-19.
Listing 6-19: Annotations
applied to an XElement instance
XElement customer = XElement.Load(@"..\..\customer.xml");
CustomerAnnotation annotation = new CustomerAnnotation();
annotation.Notes = "This is a good customer!";
customer.AddAnnotation(annotation);
CustomerAnnotation is a custom type and can be any .NET
type. We can then retrieve annotations from XML nodes by using one of the two
generic methods, Annotation<T> and
Annotations<T>. These generic methods search for an annotation of
type T or one that is derived from T
in the current node, and if one exists, Annotation<T>
and Annotations<T> return the first one or the
full set of them, respectively.
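The retrieval methods can be sketched as follows. To keep the snippet self-contained, it uses a String and a Uri as annotation objects in place of the CustomerAnnotation type of Listing 6-19; these choices, like the sample element, are assumptions made for the example. The code uses top-level statements (C# 9 or later) for brevity.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

XElement customer = new XElement("customer",
    new XElement("firstName", "Paolo"));

// Any reference type can be attached as an annotation.
customer.AddAnnotation("This is a good customer!");
customer.AddAnnotation(new Uri("http://blogs.devleap.com/"));

// Annotation<T> returns the first annotation of type T, or null if none exists.
string note = customer.Annotation<string>();

// Annotations<T> returns every annotation of type T or of a type derived from T.
int annotationCount = customer.Annotations<object>().Count();

// Annotations are metadata only and never appear in the serialized XML.
string xml = customer.ToString(SaveOptions.DisableFormatting);

// RemoveAnnotations<T> detaches the annotations of the given type.
customer.RemoveAnnotations<string>();
string noteAfterRemoval = customer.Annotation<string>();

Console.WriteLine("{0} ({1} annotations)", note, annotationCount);
```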
Because XObject is the base class of
every kind of X* class that is used to describe an XML node, annotations can be
added to any node. Usually, annotations are used to keep state information,
such as the mapping to source entities or documents used to build XML, while
the code handles real XML content.