XSD is dead, long live XSD!

My little contribution to the F# OSS ecosystem is schema support for the XML Type Provider. After being available for a while as a standalone project, it was recently merged into F# Data and will ship in the upcoming version 3.0.

It "comes with comprehensible documentation" but I'm going to use this blog to post a few tips covering marginal aspects.

Before introducing the type provider (and today's tip about nillable elements), let me say a few words about schemas.

Validation

Having a schema lets us validate documents against it. We will use the following handy snippet:

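open System.Xml
open System.Xml.Schema

// Builds and compiles a schema set from a reader positioned on an XSD document.
let createSchema (xmlReader: XmlReader) =
    let schemaSet = XmlSchemaSet()
    // A null target namespace means: use the one declared in the schema itself.
    schemaSet.Add(null, xmlReader) |> ignore
    schemaSet.Compile()
    schemaSet

// Parses a schema from a string containing the XSD text.
let parseSchema xsdText =
    use reader = XmlReader.Create(new System.IO.StringReader(xsdText))
    createSchema reader

// Loads a schema from a file path or URI.
let loadSchema xsdFile =
    use reader = XmlReader.Create(inputUri = xsdFile)
    createSchema reader

// Validates an XML string against the schema set.
// Returns Ok () or the message of the first validation error.
let validator schemaSet xml =
    let settings = XmlReaderSettings(
                    ValidationType = ValidationType.Schema,
                    Schemas = schemaSet)
    use reader = XmlReader.Create(new System.IO.StringReader(xml), settings)
    try
        // Validation happens as a side effect of reading the whole document.
        while reader.Read() do ()
        Result.Ok ()
    with :? XmlSchemaException as e ->
        Result.Error e.Message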
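
The validator above stops at the first problem, because without an event handler the reader throws on the first validation error. As a minimal sketch of an alternative, the following variant attaches a ValidationEventHandler and collects every message instead. The names authorSchema, compileSchema, collectErrors and authorErrors are made up for this illustration; the schema is the author schema used in this post, repeated so the snippet runs on its own.

```fsharp
open System.IO
open System.Xml
open System.Xml.Schema

// Same author schema as in this post, repeated to keep the sketch self-contained.
let authorSchema = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  elementFormDefault="qualified" attributeFormDefault="unqualified">
    <xs:element name="author" type="authorType" />
    <xs:complexType name="authorType">
        <xs:sequence>
          <xs:element name="name" type="xs:string" />
          <xs:element name="born" type="xs:int" nillable="true" />
        </xs:sequence>
    </xs:complexType>
</xs:schema>"""

// Compiles a schema set from XSD text.
let compileSchema (xsd: string) =
    let schemaSet = XmlSchemaSet()
    use reader = XmlReader.Create(new StringReader(xsd))
    schemaSet.Add(null, reader) |> ignore
    schemaSet.Compile()
    schemaSet

// Hypothetical helper: returns every validation message found in the document.
let collectErrors (schemaSet: XmlSchemaSet) (xml: string) : string list =
    let errors = ResizeArray<string>()
    let settings =
        XmlReaderSettings(ValidationType = ValidationType.Schema, Schemas = schemaSet)
    // With a handler attached, problems are reported through the event
    // and the reader keeps going instead of throwing.
    settings.ValidationEventHandler.Add(fun args -> errors.Add(args.Message))
    use reader = XmlReader.Create(new StringReader(xml), settings)
    while reader.Read() do ()
    List.ofSeq errors

let authorErrors = collectErrors (compileSchema authorSchema)

// Print whatever the validator reports for a document without a name element.
authorErrors "<author><born>1902</born></author>"
|> List.iter (printfn "%s")
```

An empty list means the document is valid. Malformed XML still raises an XmlException, exactly as with the original validator.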

Given a schema (AuthorXsd) and some documents (xml1 and xml2):

[<Literal>]
let AuthorXsd = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" 
  elementFormDefault="qualified" attributeFormDefault="unqualified">
    <xs:element name="author" type="authorType" />
    <xs:complexType name="authorType">
        <xs:sequence>
          <xs:element name="name" type="xs:string" />
          <xs:element name="born" type="xs:int" nillable="true" />
        </xs:sequence>
    </xs:complexType>
</xs:schema>"""

let xml1 = """
<author>
    <name>Karl Popper</name>
    <born>1902</born>
</author>"""

let xml2 = """
<author>
    <born>1902</born>
</author>"""

we can check their validity:

let validateAuthor = AuthorXsd |> parseSchema |> validator

validateAuthor xml1
|> printfn "validation result for xml1: %A"

validateAuthor xml2
|> printfn "validation result for xml2: %A"

and see that xml2 lacks the name element:

validation result for xml1: Ok ()
validation result for xml2: Error
  "The element 'author' has invalid child element 'born'. List of possible elements expected: 'name'."

Type Provider

The XML Type Provider can be used with the Schema parameter, generating a type with Name and Born properties.

open FSharp.Data

type AuthorXsd = XmlProvider<Schema=AuthorXsd>

let author = AuthorXsd.Parse xml1
printfn "%A" (author.Name, author.Born)

Beware that no validation is performed; in fact, xml2 can be parsed too, although accessing the Name property then throws an exception. If you need to validate your input, you have to do it yourself with code like the validation snippet above. That snippet is useful anyway: whenever the type provider behaves unexpectedly, first check whether the input is valid.

You may be surprised, for example, that the following document is invalid:

validateAuthor "<author><name>Karl Popper</name></author>"
Error
  "The element 'author' has incomplete content. List of possible elements expected: 'born'."
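
Since the type provider does not validate, one way to make "validate first, then parse" hard to forget is to wrap both steps in a single function. This is only a sketch: parseValidated and the other names are invented here, and XDocument.Parse stands in for the generated AuthorXsd.Parse so that the snippet runs without FSharp.Data.

```fsharp
open System.IO
open System.Xml
open System.Xml.Linq
open System.Xml.Schema

// Same author schema as in this post, repeated to keep the sketch self-contained.
let authorSchemaText = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  elementFormDefault="qualified" attributeFormDefault="unqualified">
    <xs:element name="author" type="authorType" />
    <xs:complexType name="authorType">
        <xs:sequence>
          <xs:element name="name" type="xs:string" />
          <xs:element name="born" type="xs:int" nillable="true" />
        </xs:sequence>
    </xs:complexType>
</xs:schema>"""

// Validates an XML string against XSD text: Ok () or the first error message.
let validateWith (xsd: string) (xml: string) : Result<unit, string> =
    let schemaSet = XmlSchemaSet()
    use xsdReader = XmlReader.Create(new StringReader(xsd))
    schemaSet.Add(null, xsdReader) |> ignore
    let settings =
        XmlReaderSettings(ValidationType = ValidationType.Schema, Schemas = schemaSet)
    use reader = XmlReader.Create(new StringReader(xml), settings)
    try
        while reader.Read() do ()
        Ok ()
    with :? XmlSchemaException as e ->
        Error e.Message

// Hypothetical combinator: the parser only runs on input that validated.
let parseValidated (validate: string -> Result<unit, string>)
                   (parse: string -> 'T)
                   (xml: string) : Result<'T, string> =
    validate xml |> Result.map (fun () -> parse xml)

// XDocument.Parse is a stand-in; with the type provider one would pass
// AuthorXsd.Parse instead.
let parseAuthor =
    parseValidated (validateWith authorSchemaText) (fun (s: string) -> XDocument.Parse s)

parseAuthor "<author><name>Karl Popper</name></author>"
|> printfn "%A"
```

The caller gets either a parsed value or the validation message, so an invalid document never reaches the property accessors that would throw.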

Nillable Elements

The validator complains that the born element is missing, although it was declared nillable.

Declaring a nillable element is a weird way to specify that its value is not mandatory. A much simpler and more common alternative is to rely on minOccurs and maxOccurs to constrain the allowed number of elements. But in case you stumble across a schema with nillable elements, you need to be aware that valid documents look like this:

"""
<author>
    <name>Karl Popper</name>
    <born xsi:nil="true" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" />
</author>"""
|> validateAuthor
Ok ()
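
For comparison, here is a sketch of the minOccurs alternative mentioned above. The schema is a hypothetical variant of the author schema (not the one used elsewhere in this post) where born is declared with minOccurs="0" instead of nillable="true"; optionalBornXsd and isValid are names made up for the example.

```fsharp
open System.IO
open System.Xml
open System.Xml.Schema

// Hypothetical variant of the author schema: born is optional via minOccurs="0".
let optionalBornXsd = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  elementFormDefault="qualified" attributeFormDefault="unqualified">
    <xs:element name="author" type="authorType" />
    <xs:complexType name="authorType">
        <xs:sequence>
          <xs:element name="name" type="xs:string" />
          <xs:element name="born" type="xs:int" minOccurs="0" />
        </xs:sequence>
    </xs:complexType>
</xs:schema>"""

// True when the XML string validates against the XSD text.
let isValid (xsd: string) (xml: string) : bool =
    let schemaSet = XmlSchemaSet()
    use xsdReader = XmlReader.Create(new StringReader(xsd))
    schemaSet.Add(null, xsdReader) |> ignore
    let settings =
        XmlReaderSettings(ValidationType = ValidationType.Schema, Schemas = schemaSet)
    use reader = XmlReader.Create(new StringReader(xml), settings)
    try
        while reader.Read() do ()
        true
    with :? XmlSchemaException -> false

// With minOccurs="0" the born element can simply be left out.
isValid optionalBornXsd "<author><name>Karl Popper</name></author>"
|> printfn "born omitted: %b"
```

With this variant the document without born validates, and no xsi:nil attribute is needed to express a missing value.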

You may legitimately wonder what the heck this strange nil attribute is. It belongs to a special W3C namespace (XMLSchema-instance) and its purpose is to explicitly signal the absence of a value.

The element tag must always be present for a nillable element! But the element is allowed to have content only when the nil attribute is false (or is simply omitted, as in xml1):

"""
<author>
    <name>Karl Popper</name>
    <born xsi:nil="false" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
        1902
    </born>
</author>"""
|> validateAuthor
Ok ()
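
The rules for nillable elements can be checked side by side. The sketch below runs four variants of the born element through a validator built on the author schema from this post; nillableXsd, isValidAuthor, author and cases are names invented for the illustration.

```fsharp
open System.IO
open System.Xml
open System.Xml.Schema

// Same author schema as in this post, repeated to keep the sketch self-contained.
let nillableXsd = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  elementFormDefault="qualified" attributeFormDefault="unqualified">
    <xs:element name="author" type="authorType" />
    <xs:complexType name="authorType">
        <xs:sequence>
          <xs:element name="name" type="xs:string" />
          <xs:element name="born" type="xs:int" nillable="true" />
        </xs:sequence>
    </xs:complexType>
</xs:schema>"""

// True when the XML string validates against the author schema.
let isValidAuthor (xml: string) : bool =
    let schemaSet = XmlSchemaSet()
    use xsdReader = XmlReader.Create(new StringReader(nillableXsd))
    schemaSet.Add(null, xsdReader) |> ignore
    let settings =
        XmlReaderSettings(ValidationType = ValidationType.Schema, Schemas = schemaSet)
    use reader = XmlReader.Create(new StringReader(xml), settings)
    try
        while reader.Read() do ()
        true
    with :? XmlSchemaException -> false

// Wraps a born fragment in an author element that declares the xsi prefix.
let author (born: string) =
    "<author xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">"
    + "<name>Karl Popper</name>" + born + "</author>"

let cases =
    [ "nil=true, no content (valid)",     author "<born xsi:nil=\"true\" />"
      "nil=true, with content (invalid)", author "<born xsi:nil=\"true\">1902</born>"
      "no nil, with content (valid)",     author "<born>1902</born>"
      "element omitted (invalid)",        author "" ]

for (label, xml) in cases do
    printfn "%s: %b" label (isValidAuthor xml)
```

The second case is the one that tends to surprise: once xsi:nil is true, any content makes the document invalid, even a perfectly good integer.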

For nillable elements the XML Type Provider creates two optional properties (Nil and Value).

printfn "%A" (author.Born.Nil, author.Born.Value)
(None, Some 1902)

For valid elements, if Nil = Some true then Value = None. The converse does not hold in general: for certain data types like xs:string that admit empty content, it is possible to have Value = None even if Nil = Some false or Nil = None. In fact, the nil attribute helps disambiguate two ways a value can be missing: the value was not entered vs. the value NULL was entered (can you smell the billion dollar mistake?).

In practice, when reading XML, you mostly rely on Value and ignore Nil. When you use the type provider to write XML, on the other hand, you need to pass appropriate values in order to obtain a valid document:

AuthorXsd.Author(name = "Karl Popper",
                 born = AuthorXsd.Born(nil = Some true, value = None))
|> printfn "%A"
<author>
  <name>Karl Popper</name>
  <born p2:nil="true" xmlns:p2="http://www.w3.org/2001/XMLSchema-instance" />
</author>