Is there any difference between 'valid xml' and 'well formed xml'?

68,566

Solution 1

There is a difference, yes.

XML that adheres to the XML standard is considered well-formed, while XML that adheres to a DTD is considered valid.

Solution 2

Well-formed vs Valid XML

Well-formed means that a textual object meets the W3C requirements for being XML.

Valid means that well-formed XML meets additional requirements given by a specified schema.


Official Definitions

Per the W3C Recommendation for XML:

[Definition: A data object is an XML document if it is well-formed, as defined in this specification. In addition, the XML document is valid if it meets certain further constraints.]


Observations:

  • A document that is not well-formed is not XML. (The phrase "well-formed XML" is commonly used but technically redundant.)
  • Being valid implies being well-formed.
  • Being well-formed does not imply being valid.
  • Although the W3C Recommendation for XML defines validity to be against a DTD, conventional use allows the term to be applied for conformance to XML schemas specified via XSD, RELAX NG, Schematron, or other methods.

Examples of what causes a document to be...

Not well-formed:

  • An element lacks a closing tag (and is not self-closing).
  • Elements overlap without proper nesting: <a><b></a></b>
  • An attribute value is missing a closing quote that matches the opening quote.
  • < or & is used in content rather than &lt; or &amp;.
  • Multiple root elements exist.
  • Multiple XML declarations exist, or an XML declaration appears other than at the top of the document.
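
As a rough illustration of the list above, here is a minimal Python sketch that feeds one sample per bullet to the standard library parser. The helper name is_well_formed and the sample strings are made up for this example; xml.etree.ElementTree is a non-validating parser, so it only reports well-formedness errors.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical samples, one per bullet in the list above.
BROKEN = {
    "missing closing tag": "<a><b>text</a>",
    "overlapping elements": "<a><b></a></b>",
    "unterminated attribute value": '<a x="1></a>',
    "raw < and & in content": "<a>1 < 2 & 3</a>",
    "multiple root elements": "<a/><b/>",
    "second XML declaration": '<?xml version="1.0"?><?xml version="1.0"?><a/>',
}

# The same content with the special characters escaped parses cleanly.
FIXED = "<a><b/>1 &lt; 2 &amp; 3</a>"

if __name__ == "__main__":
    for label, sample in BROKEN.items():
        print(label, "->", is_well_formed(sample))
    print("escaped content ->", is_well_formed(FIXED))
```

Every entry in BROKEN should be rejected, while FIXED is accepted even though no schema of any kind has been consulted.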

Invalid:

  • An element or attribute is missing but required by the XML schema.
  • An element or attribute is used but undefined by the XML schema.
  • The content of an element does not match the content specified by the XML schema.
  • The value of an attribute does not match the type specified by the XML schema.
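
To make the contrast concrete, the sketch below checks well-formed documents against a tiny hand-written rule set. This is only a stand-in for a real schema: the Python standard library does not validate against DTDs or XSDs, and the book vocabulary, the rule constants, and the validation_errors helper are all assumptions invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules standing in for a real schema (DTD, XSD, RELAX NG, ...).
ROOT_TAG = "book"
REQUIRED_ATTRS = {"isbn"}
REQUIRED_CHILDREN = {"title"}
ALLOWED_CHILDREN = {"title", "year"}


def validation_errors(text):
    """Return a list of rule violations; an empty list means 'valid' here.

    Raises ET.ParseError if the text is not even well-formed.
    """
    root = ET.fromstring(text)
    if root.tag != ROOT_TAG:
        return [f"unexpected root element: {root.tag}"]

    errors = []
    for attr in sorted(REQUIRED_ATTRS - set(root.attrib)):
        errors.append(f"missing required attribute: {attr}")

    child_tags = [child.tag for child in root]
    for tag in sorted(REQUIRED_CHILDREN - set(child_tags)):
        errors.append(f"missing required element: {tag}")
    for tag in child_tags:
        if tag not in ALLOWED_CHILDREN:
            errors.append(f"undefined element: {tag}")

    # Content check: in this toy vocabulary, <year> must hold digits only.
    for year in root.findall("year"):
        if not (year.text or "").strip().isdigit():
            errors.append(f"year is not an integer: {year.text!r}")
    return errors
```

A document such as <book><title>XML</title></book> parses without error, so it is well-formed, yet this rule set reports the missing isbn attribute, so it is not valid against it. A real project would hand the same job to a validating parser and an actual schema.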

Namespace-Well-Formed

Technically, colon characters are permitted in component names in XML. However, colons should only be used in names for namespace purposes:

Note:

The Namespaces in XML Recommendation [XML Names] assigns a meaning to names containing colon characters. Therefore, authors should not use the colon in XML names except for namespace purposes, but XML processors must accept the colon as a name character.

Therefore, another term, namespace-well-formed, is defined in the Namespaces in XML 1.0 W3C Recommendation that implies all of the XML rules for well-formedness plus those governing namespaces and namespace prefixes.

Colloquially, the term well-formed is often used where namespace-well-formed would be more precise. However, this is a minor technical matter of less practical consequence than the distinction between well-formed vs valid XML described in this answer.
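
The gap between the two terms can be seen with Python's expat bindings, which can run with or without namespace processing. This is a minimal sketch; the parses helper and the urn:example URI are made up for the illustration.

```python
import xml.parsers.expat


def parses(doc, namespace_aware):
    """Return True if expat accepts doc, with or without namespace processing."""
    # Passing a namespace_separator turns on expat's namespace processing.
    separator = " " if namespace_aware else None
    parser = xml.parsers.expat.ParserCreate(namespace_separator=separator)
    try:
        parser.Parse(doc, True)
        return True
    except xml.parsers.expat.ExpatError:
        return False


# The prefix "a" is never declared: fine for plain XML, an error with namespaces.
UNDECLARED = "<a:b/>"
# Declaring the prefix makes the document namespace-well-formed as well.
DECLARED = '<a:b xmlns:a="urn:example"/>'

if __name__ == "__main__":
    for doc in (UNDECLARED, DECLARED):
        print(doc, parses(doc, namespace_aware=False), parses(doc, namespace_aware=True))
```

Without namespace processing the colon is just another name character, so the undeclared prefix is accepted; with it, the same document is rejected. Most everyday tools, including xml.etree.ElementTree, parse in namespace-aware mode, which is one reason such documents are commonly described as "not well-formed".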

Solution 3

Valid XML is XML that succeeds validation against a DTD.

Well-formed XML is XML that has all tags closed in the proper order and, if it has a declaration, has it as the very first thing in the file with the proper attributes.

In other words, validity refers to semantics, well-formedness refers to syntax.

So you can have invalid well formed XML.

Solution 4

As others have said, well-formed XML conforms to the XML spec, and valid XML conforms to a given schema.

Another way to put it is that well-formed XML is lexically correct (it can be parsed), while valid XML is grammatically correct (it can be matched to a known vocabulary and grammar).

An XML document cannot be valid until it is well-formed. All XML documents are held to the same standard for well-formedness (the XML Recommendation published by the W3C, not an RFC). One XML document can be valid against some schemas and invalid against others. There are a number of schema languages, many of which are themselves XML-based.

Solution 5

Well-formed XML is XML that meets the syntactic requirements of the language: no missing closing tags, empty elements written as <whatever /> rather than a bare <whatever>, and closing tags in the right order.

Valid XML is XML that uses a DTD and complies with all its requirements. So if you use an attribute improperly, you violate the DTD and the document is not valid.

All valid XML is well-formed, but not all well-formed XML is valid.

Author by

user18931

Updated on July 08, 2022

Comments

  • user18931
    user18931 almost 2 years

    I wasn't aware of a difference, but a coworker says there is, although he can't back it up. What's the difference if any?

  • Quentin
    Quentin almost 14 years
    Probably worth pointing out that well-formedness is a prerequisite for validity.
  • LarsH
    LarsH almost 11 years
    I would disagree with the third paragraph. Neither term says anything about semantics (the meaning of something). DTDs have no way to indicate what a particular element or attribute means. That would be the goal of efforts like Web Ontology Language. Rather, well-formedness refers to a low level of syntax (maybe better referred to as lexical correctness), while validity refers to a higher level of syntax (call it "structural" if you like).
  • LarsH
    LarsH almost 11 years
    @Quentin: that's an important point, and one that recognized XML experts agree on (lists.w3.org/Archives/Public/www-xml-linking-comments/… "The spec explicitly says ..."); but it's not entirely obvious from the XML spec. Do you have a citation for it? Are you basing it on w3.org/TR/REC-xml/#dt-valid ?
  • Kent Pawar
    Kent Pawar over 10 years
    Hello @Rachna. This explains the validation part quite well, but doesn't explain when we can call a XML file "well-formed"...
  • Admin
    Admin over 9 years
    @LarsH By definition, if an XML document isn't well-formed it can't be checked against a DTD or schema.
  • LarsH
    LarsH over 9 years
    @LegoStormtroopr: I agree with you, but my question was, where does the spec say so? Where is the definition you refer to? w3.org/TR/REC-xml/#dt-valid tells what is sufficient - but not what is required - for a document to be "valid". E.g. an XML document checked against an XML Schema can be valid without having a DTD. As such, this definition doesn't exclude the possibility of other ways for a document to be valid.
  • kjhughes
    kjhughes over 9 years
    @LarsH, the spec reference you seek (confirming Quentin's correct assertion that well-formedness is a prerequisite for validity) is: Definition: A data object is an XML document if it is well-formed, as defined in this specification. In addition, the XML document is valid if it meets certain further constraints. See my answer below for further valid XML vs well-formed XML considerations. Thanks.
  • LarsH
    LarsH over 9 years
    @kjhughes: Thanks for answering. That's probably the best spec reference we'll find, but it's not very clear. "In addition" seems to mean that an XML document must be well-formed in order to be valid, but it could be a lot more explicit. Moreover, the link there for "valid" (pointing to w3.org/TR/REC-xml/#dt-valid) makes it clear that the "if" in the spec definitions does not mean "if and only if". (Otherwise, no document could be valid without a DTD.) That further weakens the interpretation that a document can be valid only if it is well-formed.
  • Mathias Müller
    Mathias Müller about 8 years
    This is already a wonderful answer, but perhaps it would help to add a note about namespaces, i.e. about the property of being namespace-well-formed? As you know, namespaces are a common pitfall for beginners and many people would describe a document with namespace problems as "not well-formed".
  • kjhughes
    kjhughes over 7 years
    Thanks, @MathiasMüller. I've added an explanation of namespace-well-formed per your request.