This presentation explains what XML is and covers its main aspects.
It provides insights into elements, attributes, DTDs, schemas, and namespaces.
Size: 320.67 KB
Language: en
Added: Jul 31, 2019
Slides: 57 pages
Slide Content
XML
Jerry Kurian. Over 20 years of experience. Technology innovator and entrepreneur. Started coding on an Intel 486 machine more than 25 years ago and has enjoyed it ever since. Developed using VB, Pascal, C++, Java Enterprise and OSS, Scala, Node JS, and the saga continues. Started using Spring and Hibernate before they became hip. Started using Scala when it was in its infancy. After spending 8 years working in various software companies like Huawei Tech and Quidnunc across the UK, US, China and India, the entrepreneurship bug bit in 2006 (before it was hip!!). Built one of the pioneers in SMS social networks, called CellZapp. I developed the product on my own and sold it to marquee customers like ESPN and Hungama Digital. Recently launched a product in field informatics, www.isense-tech.co.in. Successfully launched across 3 pilot customers and on track to sign up more. A family man with two kids, I am a passionate weekend cricketer and an involved dad. I urge my two sons to follow their dreams, which they do by staying out of the conventional schooling system and exploring their passion at a democratic free school called BeMe. Check it out at http://beme.org.in
Origin of XML
XML (Extensible Markup Language) is derived from SGML (Standard Generalized Markup Language), an earlier ISO-standardized markup language. XML is not a programming language but a set of rules for structuring data in a representational manner. The XML rules are standard and allow easy extension as business needs change.
Why XML
Most applications have domain-specific data that needs to be shared across components. With the service orientation of new-age applications, data has to be shared across different applications too. Applications can share data in a format that can be parsed and understood by a program.
Why XML
Data has traditionally been shared by defining a protocol and arranging the data according to it. Every protocol needs its own parser to understand the protocol and extract data from it. Developing a parser is not an easy undertaking and adds no value to the application in terms of its actual business goals.
Why XML
XML removes the need to create proprietary protocols. By following the XML rules, a new domain-specific language (protocol) can be defined without writing a custom parser for it. Any XML document can be parsed by any conforming XML parser.
Why XML
XML allows application developers to define a business-specific protocol that is easy for humans to read and easy for applications to parse. Parsers for XML are available in most programming languages.
Advantages
XML allows data to be defined in a format understandable to both humans and computers. The standard rules of XML allow a standard parser to be used for parsing any XML document. XML represents data as simple text, allowing easy transfer over any type of communication medium.
XML document
An XML document is made up of tags. A start tag has the form <name> and marks the start of a node; the node ends with the matching end tag </name>. XML nodes are made up of elements, attributes, entities, and comments.
XML usage problem definition
Consider a multi-user gaming platform where each user plays a game on their own machine and makes a move. Data about each move is sent to the other user in the form of XML. The game requires each player to send a challenge question to another player with a choice of at least 3 answers, one of which is correct.
XML usage problem definition
Whenever a move is sent by player 1 to player 2, the details of player 1, along with the player's current points, should also be sent.
XML Definition
In the problem definition, the various elements are: Player, Player Name, Player Address, Player Points, Questions, Question, Answer.
XML Definition
The elements identified in the previous slide can provide almost all the information about a move made by a player. These elements are arranged in an XML document in the following manner.
How would you create an XML document representing the answers from the player who is being questioned?
The answers from the other player can be represented as:
    <game>
      <person>
        <name>Hari</name>
        <address>Bangalore</address>
        <points>50</points>
      </person>
      <questions>
        <question>
          <query>What is XML</query>
          <answer>A markup language</answer>
        </question>
        <question>
          <query>Where is Bangalore located</query>
          <answer>Karnataka</answer>
        </question>
      </questions>
    </game>
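As a rough illustration of how a receiving application might read such a move, the sketch below parses the game document with Python's standard xml.etree.ElementTree module. The function name read_move and the returned dictionary layout are assumptions made for this example, and the document is assumed to be well formed (with the questions element properly closed).

```python
import xml.etree.ElementTree as ET

# The game document from the slide, written as well-formed XML.
GAME_XML = """\
<game>
  <person>
    <name>Hari</name>
    <address>Bangalore</address>
    <points>50</points>
  </person>
  <questions>
    <question>
      <query>What is XML</query>
      <answer>A markup language</answer>
    </question>
    <question>
      <query>Where is Bangalore located</query>
      <answer>Karnataka</answer>
    </question>
  </questions>
</game>
"""


def read_move(xml_text):
    """Return the player details and the (query, answer) pairs of one move."""
    root = ET.fromstring(xml_text)          # root is the <game> element
    person = root.find("person")
    player = {
        "name": person.findtext("name"),
        "address": person.findtext("address"),
        "points": int(person.findtext("points")),
    }
    # iter() walks the whole tree, so nested <question> elements are found.
    answers = [(q.findtext("query"), q.findtext("answer"))
               for q in root.iter("question")]
    return player, answers


player, answers = read_move(GAME_XML)
print(player)
print(answers)
```

No custom parser was written here: the generic XML parser does the work, and the application only names the elements it cares about.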
Element
The element is the basic building block of an XML document. Every aspect of the domain is described through elements. In our example, nodes like <person> and <question> are elements. As seen above, one element can contain one or more elements as its child elements.
Attribute
If an element has some additional characteristic that is not an element in itself, it can be denoted using an attribute. The attribute is placed inside the element's start tag as a name="value" pair. In our example, the list of answers should contain one correct answer. The correctness of an answer can be denoted using an attribute.
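A minimal sketch of the idea, assuming the attribute is called correct and takes the values "true" and "false" (the wrong answers shown are invented for illustration). It uses Python's standard xml.etree.ElementTree to pick out the answer marked as correct.

```python
import xml.etree.ElementTree as ET

# One question with three candidate answers; the attribute marks the right one.
QUESTION_XML = """\
<question>
  <query>What is XML</query>
  <answer correct="false">A programming language</answer>
  <answer correct="true">A markup language</answer>
  <answer correct="false">A database</answer>
</question>
"""

question = ET.fromstring(QUESTION_XML)

# Element.get() returns the attribute value, or None if it is absent.
correct_answers = [a.text for a in question.findall("answer")
                   if a.get("correct") == "true"]
print(correct_answers)
```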
Root element
The elements of an XML document can be represented as a tree. The topmost element of the document is the root element, and each of its children is in turn the root of its own subtree. In our case, the <game> element is the root element of the document.
Empty Elements
Some elements have neither child elements nor text content. Such elements may still carry attributes. They are called empty elements. Empty elements are usually written as <element_name/>. This is the same as <element_name></element_name> with no content in between.
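The equivalence of the two spellings can be checked with a short sketch using Python's standard xml.etree.ElementTree. The element name answer, its attribute, and the helper is_empty are chosen only for illustration.

```python
import xml.etree.ElementTree as ET

short_form = ET.fromstring('<answer correct="true"/>')
long_form = ET.fromstring('<answer correct="true"></answer>')


def is_empty(element):
    """True when the element has no child elements and no text content."""
    return len(element) == 0 and not element.text


# Both spellings produce the same parsed element.
print(is_empty(short_form), is_empty(long_form))
print(short_form.attrib == long_form.attrib)
```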
Comments
Comments can be added to an XML document to give more information about its tags. Comments are ignored by the parser. A comment is written between <!-- and -->:
    <!-- Your comment here -->
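As a small check of this behaviour, the sketch below parses a fragment containing a comment with Python's standard xml.etree.ElementTree, whose default parser drops comments. The fragment itself is invented for illustration.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<person><!-- current score of the player --><points>50</points></person>"
)

# The comment does not appear among the children of <person>.
child_tags = [child.tag for child in doc]
print(child_tags)
```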
Entity
Entities can be used to substitute a value for a data item. Entities behave like macros: they are placeholders for something else. An entity reference starts with & and ends with ;. A predefined entity such as &quot; is replaced by the " character when parsed.
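The following sketch shows the predefined entities being replaced during parsing, and the reverse step of escaping text before placing it in a document. It uses Python's standard library (xml.etree.ElementTree and xml.sax.saxutils.escape); the sample strings are made up for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Parsing: entity references are replaced by the characters they stand for.
query = ET.fromstring("<query>Is 20 &lt; 50 &amp; 50 &gt; 20?</query>")
answer = ET.fromstring("<answer>&quot;Yes&quot;</answer>")
print(query.text)
print(answer.text)

# Writing: escape() turns &, < and > back into entity references.
escaped = escape("points < 20 & rising")
print(escaped)
```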
CDATA
As seen in the example, most elements contain text between their tags, which is the value of the element. The XML parser returns the value of an element by reading the content between its start and end tags. If the content contains special characters such as '<' or '&', it may lead to an error in parsing.
CDATA
Such characters can be escaped using entities, as explained earlier. If you want to avoid entities, a CDATA section can be used instead. When a CDATA section is encountered, the parser does not interpret its content as markup and passes the text through unchanged. A CDATA section has the following format: <![CDATA[ content ]]>.
CDATA Example
    <points><![CDATA[ <20 ]]></points>
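A quick sketch with Python's standard xml.etree.ElementTree shows that the CDATA form and the entity form yield the same element value, including the surrounding spaces.

```python
import xml.etree.ElementTree as ET

with_cdata = ET.fromstring("<points><![CDATA[ <20 ]]></points>")
with_entity = ET.fromstring("<points> &lt;20 </points>")

# The parser hands back the raw text in both cases.
print(repr(with_cdata.text))
print(with_cdata.text == with_entity.text)
```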
DTD
Document types
There are two types of XML documents: well formed, and well formed and valid. A well-formed document is any XML document that follows the general XML rules. The XML documents above are examples of well-formed XML documents.
Document types
Well-formed and valid XML documents not only follow the general XML rules but also conform to a domain-specific grammar. The domain-specific grammar is expressed using a DTD (Document Type Definition). DTDs define the rules for a domain-specific XML document.
DTD
A DTD is made of declarations that define the various nodes allowed in an XML document. The DTD can be used to define aspects of an XML document such as elements, attributes, and entities.
DTD
A document can refer to a DTD using the <!DOCTYPE> declaration. The name given in the declaration must match the root element of the document:
    <!DOCTYPE game [
      <!-- DTD goes here -->
    ]>
    <game>
      <person>
        <name>Jerry</name>
        <address>Bangalore</address>
        <points>10</points>
      </person>
    </game>
DTD
An XML document can also refer to an external DTD file instead of defining the DTD as part of the document itself:
    <!DOCTYPE game SYSTEM "game.dtd">
The SYSTEM keyword specifies this to be a private DTD, located by the file name or URI that follows it.
Public DTDs
DTDs can be published by a public body and referenced by any XML document:
    <!DOCTYPE root_element PUBLIC "FPI" "URI">
The DTD is named using a formal public identifier (FPI), followed by a URI from which the DTD can be retrieved. FPI example: -//W3C//DTD XHTML 1.0 Transitional//EN
FPI Rules
An FPI consists of four fields separated by //. The first field indicates whether the DTD comes from a formal standard. For DTDs you create on your own, this field should be -, meaning the owner is not registered; + indicates a registered owner. For formal standards bodies, this field is a reference to the standard itself (such as ISO/IEC 19775:2003). The second field holds the name of the group or person responsible for the DTD. You should use a name that is unique (for example, W3C just uses W3C). The third field specifies the type of document the DTD is for and should be followed by a unique version number of some kind (such as Version 1.0). The fourth field specifies the language in which the DTD is written (for example, EN for English).
Declaring Element
XML elements are declared in a DTD using the following syntax:
    <!ELEMENT name content_model>
The name is the name of the element. The content_model describes the content that the element is allowed to have as its children. A content model is always required; an element that must have no content is declared with the keyword EMPTY as its content model.
Declaring Element
In our example, the game element can be declared in the following way:
    <!ELEMENT game (person,questions)>
This declaration specifies that the game element has a person element followed by a questions element as its children. If an element gives ANY as its content model, it can contain any declared elements or text as children, effectively telling the parser to skip validation of that element's content:
    <!ELEMENT name ANY>
Child Elements
The DTD can specify the number of children allowed for each element.
    <!ELEMENT game (person)>
specifies that the game element has exactly one person child element.
    <!ELEMENT questions (question)*>
specifies that the questions element can have zero or more question elements as children.
Child Elements
    Notation   Description
    x | y      Element x or y can be present, but not both
    x , y      Element x should be followed by element y
    ?          There can be zero or one occurrence of the element
    +          There can be one or more occurrences of the element
    *          There can be zero or more occurrences of the element
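The notation above closely resembles regular expressions, and that resemblance can be used to sketch what a validating parser checks. Python's standard library does not validate against a DTD, so the example below is only an approximation: the content models are translated by hand into regular expressions over child element names, and the rule for question (a query followed by one or more answers) is an assumption made for this example.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated content models. Each child name is followed by one space
# so that names cannot run into each other when they are joined.
CONTENT_MODELS = {
    "game": r"person questions ",      # <!ELEMENT game (person, questions)>
    "questions": r"(question )*",      # <!ELEMENT questions (question)*>
    "question": r"query (answer )+",   # assumed: (query, answer+)
}


def children_match(element):
    """Check one element's children against its recorded content model."""
    model = CONTENT_MODELS.get(element.tag)
    if model is None:
        return True  # this sketch records no rule for the element
    child_names = "".join(child.tag + " " for child in element)
    return re.fullmatch(model, child_names) is not None


def check_tree(root):
    """True when every element in the tree satisfies its content model."""
    return all(children_match(element) for element in root.iter())


good = ET.fromstring(
    "<game><person><name>Hari</name></person>"
    "<questions><question><query>What is XML</query>"
    "<answer>A markup language</answer></question></questions></game>"
)
bad = ET.fromstring("<game><questions/><person/></game>")  # wrong order

print(check_tree(good), check_tree(bad))
```

A real validating parser reads the rules from the DTD itself; this sketch only shows how the sequence, choice, and occurrence notations constrain the order and number of children.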
Attribute
Attributes provide additional details for an element. Attributes can be defined in a DTD using the following notation:
    <!ATTLIST element_name attribute_name type default_value>
Attribute definition
In our example, the element answer has an attribute correct:
    <!ATTLIST answer correct CDATA #IMPLIED>
The last part of the declaration can take the following forms:
    Default         Description
    value           Specifies a default value for the attribute
    #REQUIRED       Mandates the attribute
    #FIXED value    Sets the attribute's value to value
    #IMPLIED        Attribute is optional
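The effect of a default value can be observed even without a validating parser. The sketch below assumes a variant of the declaration that gives correct a default of "false" in the document's internal DTD; the underlying Expat parser used by Python's xml.etree.ElementTree reads that declaration and supplies the default when the attribute is omitted.

```python
import xml.etree.ElementTree as ET

# Internal DTD with an assumed default value "false" for the correct attribute.
DOC = """\
<!DOCTYPE question [
  <!ELEMENT question (answer+)>
  <!ELEMENT answer (#PCDATA)>
  <!ATTLIST answer correct CDATA "false">
]>
<question>
  <answer correct="true">A markup language</answer>
  <answer>A database</answer>
</question>
"""

question = ET.fromstring(DOC)

# The second answer never mentions the attribute, yet it has a value.
flags = [a.get("correct") for a in question.findall("answer")]
print(flags)
```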
Attribute Types
Attributes can have the following types: CDATA, enumerated type, NMTOKEN, NMTOKENS, ID.
Attribute Types
CDATA allows character data; the characters < and & must still be escaped inside the attribute value. An enumerated type provides a list of allowed values separated by |:
    <!ATTLIST answer correct (true | false) #REQUIRED>
NMTOKEN is any name token that conforms to the XML standard, made of letters, digits, and a few punctuation characters with no white space. NMTOKENS is a list of NMTOKEN values separated by white space. ID is a name that must be unique within the document.
Entity
An entity in XML is just a data item. Entities are usually pieces of text that are used often across the document. Entities can also refer to binary data. Entities are declared as follows:
    <!ENTITY name definition>
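A minimal sketch of a text entity in use, assuming an entity named city declared in the document's internal DTD. Python's xml.etree.ElementTree expands internal entities like this one while parsing.

```python
import xml.etree.ElementTree as ET

# The entity name "city" and its value are chosen for illustration only.
DOC = """\
<!DOCTYPE game [
  <!ENTITY city "Bangalore">
]>
<game>
  <person>
    <name>Hari</name>
    <address>&city;</address>
  </person>
</game>
"""

game = ET.fromstring(DOC)
address = game.findtext("person/address")
print(address)
```

Because entity expansion can be abused by hostile documents, parsing untrusted XML this way is not recommended; the example assumes a trusted document.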
XML Schemas
XML schemas are an alternative way of defining the structure of an XML document. Schemas are a much more comprehensive and detailed way of specifying the grammar of a document than DTDs. Like DTDs, schemas specify the elements and attributes of an XML document.
Schema Example
The game XML document can be defined as:
    <?xml version="1.0" encoding="UTF-8"?>
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
      <xsd:element name="game" type="gameType"/>
      <xsd:complexType name="gameType">
        <xsd:sequence>
          <xsd:element name="person" minOccurs="1"/>
          <xsd:element name="questions" type="questionType" minOccurs="1"/>
        </xsd:sequence>
      </xsd:complexType>
      <xsd:complexType name="questionType">
        <xsd:sequence>
          <xsd:element ref="query"/>
          <xsd:element name="answers" type="answersType"/>
        </xsd:sequence>
      </xsd:complexType>
      <xsd:element name="query" type="xsd:string"/>
      <xsd:complexType name="answersType">
        <xsd:sequence>
          <xsd:element name="answer" type="answerType"/>
        </xsd:sequence>
      </xsd:complexType>
      <xsd:complexType name="answerType">
        <xsd:attribute name="correct" type="xsd:string" use="optional"/>
      </xsd:complexType>
    </xsd:schema>
Schema Element
A schema element can be defined in the following manner:
    <xsd:element name="query" type="xsd:string"/>
Any element that contains child elements or attributes needs to be defined as a complexType. Elements that enclose only simple data such as numbers, strings, or dates are simpleTypes.
Schema Element
There are built-in simple schema types such as xsd:string, xsd:anyURI, xsd:boolean, xsd:date, and xsd:dateTime. The <xsd:sequence> element specifies the order in which the child elements must appear.
Number of elements
The person element has a minOccurs attribute to specify that it will occur at least once. To make an element optional, minOccurs should be 0. To make it appear from 0 to 10 times, use minOccurs="0" and maxOccurs="10". To allow an unlimited number of occurrences, set maxOccurs="unbounded". When omitted, both minOccurs and maxOccurs default to 1.
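The occurrence rules can be summarised in a few lines of Python. This is only a sketch of the counting logic, not a schema validator; the function name occurrences_ok is made up for the example, and the defaults of 1 mirror the schema defaults for minOccurs and maxOccurs.

```python
def occurrences_ok(count, min_occurs=1, max_occurs=1):
    """Check a child count against minOccurs / maxOccurs style limits.

    max_occurs may be an integer or the string "unbounded".
    """
    if count < min_occurs:
        return False
    return max_occurs == "unbounded" or count <= max_occurs


# Exactly once by default.
print(occurrences_ok(1), occurrences_ok(0), occurrences_ok(2))
# Optional element: minOccurs="0".
print(occurrences_ok(0, min_occurs=0))
# From 0 to 10 times.
print(occurrences_ok(10, min_occurs=0, max_occurs=10),
      occurrences_ok(11, min_occurs=0, max_occurs=10))
# Unlimited: maxOccurs="unbounded".
print(occurrences_ok(500, min_occurs=1, max_occurs="unbounded"))
```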
Values of element
An element can be given a default value:
    <xsd:element name="term" type="xsd:integer" default="10"/>
An element can be given a fixed value:
    <xsd:element name="term" type="xsd:integer" fixed="200"/>
Attributes
An attribute of an element can be specified in the following manner:
    <xsd:attribute name="correct" type="xsd:string" use="optional"/>
The value optional specifies that the attribute is optional. The use attribute accepts the values optional, prohibited, and required. Default and fixed values are set through the separate default and fixed attributes.
Namespace
Namespaces are useful when reusing XML tags. One XML document can reuse part of another well-defined XML document. The new XML document may contain elements that have the same name as elements of the document being referred to. Such name clashes can be avoided using a namespace.
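The sketch below shows two elements that share the local name address but live in different namespaces, read with Python's standard xml.etree.ElementTree. Both namespace URIs and the geo vocabulary are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Two vocabularies both define an <address> element; the namespaces keep
# them apart. The URIs are illustrative only.
DOC = """\
<game xmlns="http://xmlpowercorp" xmlns:geo="http://example.org/geo">
  <person>
    <name>Hari</name>
    <address>Bangalore</address>
    <geo:address>12.97,77.59</geo:address>
  </person>
</game>
"""

# Prefixes used in the search paths; they need not match the document's.
NS = {"g": "http://xmlpowercorp", "geo": "http://example.org/geo"}

root = ET.fromstring(DOC)
postal = root.findtext("g:person/g:address", namespaces=NS)
coords = root.findtext("g:person/geo:address", namespaces=NS)

print(root.tag)   # ElementTree stores names as {namespace}localname
print(postal, coords)
```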
Namespace
The namespace for an XML document can be defined using the targetNamespace attribute of the schema element:
    <xsd:schema targetNamespace="http://xmlpowercorp" xmlns="http://xmlpowercorp" xmlns:xsd="http://www.w3.org/2001/XMLSchema" attributeFormDefault="qualified" elementFormDefault
Namespace
The value qualified specifies that the elements and attributes declared in the schema must be qualified with the namespace wherever they appear in a document, typically through a prefix. To avoid this, set the value to unqualified.
Unqualified namespace
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                targetNamespace="http://xmlpowercorp"
                elementFormDefault="unqualified"
                attributeFormDefault="unqualified">
      <xsd:element name="game" type="gameType"/>
      <xsd:complexType name="gameType">
        <xsd:sequence>
          <xsd:element name="person" minOccurs="1"/>
          <xsd:element name="questions" type="questionType" minOccurs="1"/>
        </xsd:sequence>
        <xsd:attribute name="documentDate" type="xsd:date"/>
      </xsd:complexType>
In this case, the parser looks up the unprefixed type names gameType and questionType in the default namespace. No default namespace is declared here, while the types themselves belong to the target namespace, so references such as questionType cannot be resolved.
Unqualified Namespace
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                targetNamespace="http://xmlpowercorp"
                xmlns:xmp="http://xmlpowercorp"
                elementFormDefault="unqualified"
                attributeFormDefault="unqualified">
      <xsd:element name="game" type="xmp:gameType"/>
      <xsd:complexType name="gameType">
        <xsd:sequence>
          <xsd:element name="person" minOccurs="1"/>
          <xsd:element name="questions" type="xmp:questionType" minOccurs="1"/>
        </xsd:sequence>
        <xsd:attribute name="documentDate" type="xsd:date"/>
      </xsd:complexType>
In this case, the parser resolves the xmp prefix on gameType and questionType and knows that these types belong to the http://xmlpowercorp namespace rather than to the default namespace.
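How a prefixed type reference is resolved can be sketched with Python's standard xml.etree.ElementTree by reading a schema as an ordinary XML document. This is not schema processing: the function resolve_type_refs is a made-up helper that records prefix declarations in one flat table (ignoring prefix scoping) and looks up the prefix of every type attribute. The two small schemas are cut-down versions of the ones on the slides.

```python
import io
import xml.etree.ElementTree as ET

UNPREFIXED = """\
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://xmlpowercorp">
  <xsd:element name="game" type="gameType"/>
  <xsd:complexType name="gameType"/>
</xsd:schema>
"""

PREFIXED = """\
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://xmlpowercorp"
            xmlns:xmp="http://xmlpowercorp">
  <xsd:element name="game" type="xmp:gameType"/>
  <xsd:complexType name="gameType"/>
</xsd:schema>
"""


def resolve_type_refs(schema_text):
    """Map each declaration name to the (namespace, local name) of its type."""
    prefixes = {}   # prefix -> namespace URI, "" would be the default namespace
    refs = {}
    events = ET.iterparse(io.StringIO(schema_text), events=("start-ns", "start"))
    for event, payload in events:
        if event == "start-ns":
            prefix, uri = payload
            prefixes[prefix] = uri
        elif "type" in payload.attrib:
            prefix, _, local = payload.attrib["type"].rpartition(":")
            # None means the reference resolves to no namespace at all.
            refs[payload.get("name")] = (prefixes.get(prefix), local)
    return refs


print(resolve_type_refs(UNPREFIXED))
print(resolve_type_refs(PREFIXED))
```

In the first schema the reference to gameType ends up in no namespace, which does not match the target namespace where the type is defined; in the second, the xmp prefix leads to http://xmlpowercorp and the reference can be matched.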