XSD is dead, long live XSD!

My little contribution to the F# OSS ecosystem is schema support for the XML Type Provider. It was recently merged into F# Data (and will ship in the upcoming version 3.0) after being available for a while as a standalone project.

It "comes with comprehensible documentation" but I'm going to use this blog to post a few tips covering marginal aspects.

Before introducing the type provider (and today's tip about nillable elements), let me say a few words about schemas.

Validation

Having a schema allows us to validate documents against it. We will use the following handy snippet:

open System.Xml
open System.Xml.Schema

let createSchema (xmlReader: XmlReader) =
    let schemaSet = XmlSchemaSet()
    schemaSet.Add(null, xmlReader) |> ignore
    schemaSet.Compile()
    schemaSet

let parseSchema xsdText =
    use reader = XmlReader.Create(new System.IO.StringReader(xsdText))
    createSchema reader

let loadSchema xsdFile =
    use reader = XmlReader.Create(inputUri = xsdFile)
    createSchema reader

let validator schemaSet xml =
    let settings = XmlReaderSettings(
                    ValidationType = ValidationType.Schema,
                    Schemas = schemaSet)
    use reader = XmlReader.Create(new System.IO.StringReader(xml), settings)
    try
        while reader.Read() do ()
        Result.Ok ()
    with :? XmlSchemaException as e ->
        Result.Error e.Message
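
The validator above stops at the first problem because the reader throws. As a rough sketch of an alternative, the variant below attaches a ValidationEventHandler so that every validation error is collected instead. The names compileSchema, validateAll, demoXsd and demoSchema are made up for this illustration and are not part of the snippet above.

```fsharp
open System.IO
open System.Xml
open System.Xml.Schema

// Hypothetical helper: compiles a schema set from XSD text.
let compileSchema (xsdText: string) =
    let schemaSet = XmlSchemaSet()
    use reader = XmlReader.Create(new StringReader(xsdText))
    schemaSet.Add(null, reader) |> ignore
    schemaSet.Compile()
    schemaSet

// Hypothetical variant of `validator` that reports every validation error found.
let validateAll (schemaSet: XmlSchemaSet) (xml: string) : Result<unit, string list> =
    let errors = ResizeArray<string>()
    let settings =
        XmlReaderSettings(ValidationType = ValidationType.Schema, Schemas = schemaSet)
    // With a handler attached, validation errors are passed to it instead of being thrown.
    // Malformed XML still raises an XmlException, which is not handled here.
    settings.ValidationEventHandler.Add(fun args ->
        errors.Add(sprintf "line %d: %s" args.Exception.LineNumber args.Message))
    use reader = XmlReader.Create(new StringReader(xml), settings)
    while reader.Read() do ()
    if errors.Count = 0 then Ok () else Error (List.ofSeq errors)

// A small schema used only for this demonstration.
let demoXsd = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="author">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string" />
        <xs:element name="born" type="xs:int" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

let demoSchema = compileSchema demoXsd

validateAll demoSchema "<author><name>Karl Popper</name><born>1902</born></author>"
|> printfn "%A"

validateAll demoSchema "<author><name>Karl Popper</name><born>unknown</born></author>"
|> printfn "%A"
```

Whether to stop at the first error or gather all of them is a matter of taste: the first is enough for a quick sanity check, while the second is handier when reporting problems back to whoever produced the document.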

Given a schema (AuthorXsd) and some documents (xml1 and xml2):

[<Literal>]
let AuthorXsd = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  elementFormDefault="qualified" attributeFormDefault="unqualified">
    <xs:element name="author" type="authorType" />
    <xs:complexType name="authorType">
        <xs:sequence>
          <xs:element name="name" type="xs:string" />
          <xs:element name="born" type="xs:int" nillable="true" />
        </xs:sequence>
    </xs:complexType>
</xs:schema>"""

let xml1 = """
<author>
    <name>Karl Popper</name>
    <born>1902</born>
</author>"""

let xml2 = """
<author>
    <born>1902</born>
</author>"""

we can check their validity:

let validateAuthor = AuthorXsd |> parseSchema |> validator

validateAuthor xml1
|> printfn "validation result for xml1: %A"

validateAuthor xml2
|> printfn "validation result for xml2: %A"

and see that xml2 lacks the name element:

validation result for xml1: Ok ()
validation result for xml2: Error
  "The element 'author' has invalid child element 'born'. List of possible elements expected: 'name'."

Type Provider

The XML Type Provider can be used with the Schema parameter, generating a type with Name and Born properties.

open FSharp.Data

type AuthorXsd = XmlProvider<Schema=AuthorXsd>

let author = AuthorXsd.Parse xml1
printfn "%A" (author.Name, author.Born)

Beware that no validation is performed; in fact, xml2 can also be parsed, although accessing the Name property then throws an exception. If you need to validate your input, you have to do it yourself with code like the validation snippet above. That snippet is useful anyway: whenever the type provider behaves unexpectedly, first check whether the input is valid.
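
One way to make "validate first, then parse" hard to forget is to wrap both steps in a single function. The sketch below is only an illustration: parseValidated, demoValidate and demoParseName are hypothetical names, and the parser is a plain XDocument stand-in so the example runs without the type provider.

```fsharp
open System.IO
open System.Xml
open System.Xml.Linq
open System.Xml.Schema

// Hypothetical helper: run the validator and call the parser only on valid input.
let parseValidated (validate: string -> Result<unit, string>)
                   (parse: string -> 'T)
                   (xml: string) : Result<'T, string> =
    validate xml |> Result.map (fun () -> parse xml)

// Stand-ins for this demo: a small schema, a validator and an unchecked parser.
let demoXsd = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="author">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string" />
        <xs:element name="born" type="xs:int" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

let demoValidate (xml: string) : Result<unit, string> =
    let schemaSet = XmlSchemaSet()
    use xsdReader = XmlReader.Create(new StringReader(demoXsd))
    schemaSet.Add(null, xsdReader) |> ignore
    schemaSet.Compile()
    let settings =
        XmlReaderSettings(ValidationType = ValidationType.Schema, Schemas = schemaSet)
    use reader = XmlReader.Create(new StringReader(xml), settings)
    try
        while reader.Read() do ()
        Ok ()
    with :? XmlSchemaException as e ->
        Error e.Message

// Like an unvalidated typed access, this fails at runtime if <name> is missing.
let demoParseName (xml: string) : string =
    XDocument.Parse(xml).Root.Element(XName.Get "name").Value

parseValidated demoValidate demoParseName
    "<author><name>Karl Popper</name><born>1902</born></author>"
|> printfn "%A"

// The parser is never called here, so the missing <name> surfaces as an Error value.
parseValidated demoValidate demoParseName "<author><born>1902</born></author>"
|> printfn "%A"
```

With the definitions from this post, the same wrapper would presumably be called as parseValidated validateAuthor AuthorXsd.Parse xml2.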

You may be surprised, for example, that the following document is invalid:

validateAuthor "<author><name>Karl Popper</name></author>"
Error
  "The element 'author' has incomplete content. List of possible elements expected: 'born'."

Nillable Elements

The validator complains that the born element is missing, even though it was declared nillable.

Declaring a nillable element is a weird way to specify that its value is not mandatory. A much simpler and more common alternative is to rely on minOccurs and maxOccurs to constrain the allowed number of elements. But in case you stumble across a schema with nillable elements, you need to be aware that valid documents look like this:

"""
<author>
    <name>Karl Popper</name>
    <born xsi:nil="true" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" />
</author>"""
|> validateAuthor
Ok ()
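
For comparison, here is a small sketch of the minOccurs alternative mentioned above: the same document without a born element is rejected when born is nillable but accepted when born is declared with minOccurs="0". The helper names validateWith and authorXsd are invented for this example, and the schema is a trimmed-down variant of the one in this post.

```fsharp
open System.IO
open System.Xml
open System.Xml.Schema

// Hypothetical helper: compile the schema text and validate one document against it.
let validateWith (xsdText: string) (xml: string) : Result<unit, string> =
    let schemaSet = XmlSchemaSet()
    use xsdReader = XmlReader.Create(new StringReader(xsdText))
    schemaSet.Add(null, xsdReader) |> ignore
    schemaSet.Compile()
    let settings =
        XmlReaderSettings(ValidationType = ValidationType.Schema, Schemas = schemaSet)
    use reader = XmlReader.Create(new StringReader(xml), settings)
    try
        while reader.Read() do ()
        Ok ()
    with :? XmlSchemaException as e ->
        Error e.Message

// Hypothetical schema builder: only the attributes on the `born` declaration vary.
let authorXsd (bornAttributes: string) =
    sprintf """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="author">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string" />
        <xs:element name="born" type="xs:int" %s />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>""" bornAttributes

let withoutBorn = "<author><name>Karl Popper</name></author>"

// nillable: the born tag is still required, so this is an Error.
validateWith (authorXsd "nillable='true'") withoutBorn |> printfn "nillable:  %A"

// minOccurs='0': the element may simply be left out, so this is Ok ().
validateWith (authorXsd "minOccurs='0'") withoutBorn |> printfn "minOccurs: %A"
```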

You may legitimately wonder what the heck this strange nil attribute is. It belongs to a special W3C namespace, and its purpose is to explicitly signal the absence of a value.

The element tag must always be present for a nillable element! But the element is allowed to have content only when the nil attribute is false (or is simply omitted, as in xml1):

"""
<author>
    <name>Karl Popper</name>
    <born xsi:nil="false" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
        1902
    </born>
</author>"""
|> validateAuthor
Ok ()
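
The flip side of this rule can be checked too. The sketch below, which assumes a condensed validator named validateWith and a trimmed-down copy of the schema (both made up for this example), shows two documents that the validator should reject: one where born is marked nil but still carries content, and one where born is empty without being marked nil (an empty string is not a valid xs:int).

```fsharp
open System.IO
open System.Xml
open System.Xml.Schema

// Hypothetical helper: compile the schema text and validate one document against it.
let validateWith (xsdText: string) (xml: string) : Result<unit, string> =
    let schemaSet = XmlSchemaSet()
    use xsdReader = XmlReader.Create(new StringReader(xsdText))
    schemaSet.Add(null, xsdReader) |> ignore
    schemaSet.Compile()
    let settings =
        XmlReaderSettings(ValidationType = ValidationType.Schema, Schemas = schemaSet)
    use reader = XmlReader.Create(new StringReader(xml), settings)
    try
        while reader.Read() do ()
        Ok ()
    with :? XmlSchemaException as e ->
        Error e.Message

// Trimmed-down copy of the author schema with a nillable born element.
let nillableXsd = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="author">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string" />
        <xs:element name="born" type="xs:int" nillable="true" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

// Marked as nil, yet it has content: invalid.
let nilWithContent = """<author>
  <name>Karl Popper</name>
  <born xsi:nil="true" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">1902</born>
</author>"""

// Empty but not marked as nil: invalid, because "" is not an xs:int.
let emptyWithoutNil = "<author><name>Karl Popper</name><born /></author>"

validateWith nillableXsd nilWithContent |> printfn "%A"
validateWith nillableXsd emptyWithoutNil |> printfn "%A"
```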

For nillable elements the XML Type Provider creates two optional properties (Nil and Value).

printfn "%A" (author.Born.Nil, author.Born.Value)
(None, Some 1902)

For valid elements, if Nil = Some true then Value = None. The converse does not hold in general: for data types such as xs:string that admit empty content, Value = None is possible even when Nil = Some false or Nil = None. In fact, the nil attribute helps disambiguate two kinds of missing value: the value was never entered versus the value NULL was entered (can you smell the billion dollar mistake?).
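
If you do care about the distinction, the two optional properties can be folded into a single value. The following is just a sketch: the NillableContent type and the classify function are hypothetical and not something the type provider generates.

```fsharp
// Hypothetical type naming the three situations described above.
type NillableContent<'T> =
    | ExplicitNil      // xsi:nil="true": the absence of a value was stated explicitly
    | NoContent        // nil is false or omitted, and there is no value either
    | Content of 'T    // a regular value

// Combine the Nil and Value properties of a nillable element.
let classify (nil: bool option) (value: 'T option) : NillableContent<'T> =
    match nil, value with
    | Some true, _ -> ExplicitNil
    | _, Some v -> Content v
    | _, None -> NoContent

classify None (Some 1902) |> printfn "%A"                     // Content 1902
classify (Some true) (None: int option) |> printfn "%A"       // ExplicitNil
classify (Some false) (None: string option) |> printfn "%A"   // NoContent
```

With the types from this post, the call would presumably look like classify author.Born.Nil author.Born.Value.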

In practice, when reading XML, you mostly rely on Value and ignore Nil. When you use the type provider to write XML, on the other hand, you need to pass appropriate values in order to obtain a valid document:

AuthorXsd.Author(name = "Karl Popper",
                 born = AuthorXsd.Born(nil = Some true, value = None))
|> printfn "%A"
<author>
  <name>Karl Popper</name>
  <born p2:nil="true" xmlns:p2="http://www.w3.org/2001/XMLSchema-instance" />
</author>
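
When the value to write comes from an ordinary option, the nil/value pair can be derived mechanically. Here is a minimal sketch; nillableArgs is a hypothetical helper, and the last comment only suggests how it might be combined with the generated constructor.

```fsharp
// Hypothetical helper: turn an optional value into the (nil, value) pair
// expected for a nillable element.
let nillableArgs (value: 'T option) : bool option * 'T option =
    match value with
    | Some v -> None, Some v       // a value is present: omit the nil attribute
    | None -> Some true, None      // no value: emit xsi:nil="true" with no content

nillableArgs (Some 1902) |> printfn "%A"          // (None, Some 1902)
nillableArgs (None: int option) |> printfn "%A"   // (Some true, None)

// Possible use with the generated types (assumes the AuthorXsd type from this post):
//   let nil, value = nillableArgs (Some 1902)
//   AuthorXsd.Author(name = "Karl Popper", born = AuthorXsd.Born(nil = nil, value = value))
```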