Title: 863??
1??
- ??????????????????????
- ??????????????(?????????????),??????????????????
2XML
3???????XML (eXtensible Markup Language)
- XML?????????????SGML(Standard Generalized Markup Language)????????(HTML)?
- 1996?,?????(W3C)???????????XML(eXtensible Markup Language)????
- 1998?2?10????XML1.0,????????????????????????????????
- ????
- http://www.w3.org/TR/REC-xml/
- W3C Recommendation 04 February 2004
- XML is a family of technologies: XSL, XSLT, XPath, XLink, XPointer, DOM, etc.
5An Example XML Document
<?xml version="1.0" encoding="ISO-8859-1"?>
<tradeBatch>
  <trade account="2520034" action="buy" duration="good-till-canceled">
    <symbol>SUNW</symbol>
    <quantity>1000</quantity>
    <limit>20</limit>
    <date>2001-03-05</date>
  </trade>
  <trade account="9240196" action="sell" duration="day">
    <symbol>CSCO</symbol>
    <quantity>500</quantity>
    <date>2001-03-05</date>
  </trade>
</tradeBatch>
<!-- This is a comment -->
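To make the structure of this example concrete, here is a minimal Python sketch that reads the trade batch with the standard library's `xml.etree.ElementTree`. The names `TRADE_BATCH` and `summarize_trades` are illustrative assumptions, not part of the slides.

```python
import xml.etree.ElementTree as ET

# The slide's trade batch, embedded as bytes so the declared encoding is honoured.
TRADE_BATCH = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<tradeBatch>
  <trade account="2520034" action="buy" duration="good-till-canceled">
    <symbol>SUNW</symbol>
    <quantity>1000</quantity>
    <limit>20</limit>
    <date>2001-03-05</date>
  </trade>
  <trade account="9240196" action="sell" duration="day">
    <symbol>CSCO</symbol>
    <quantity>500</quantity>
    <date>2001-03-05</date>
  </trade>
</tradeBatch>
<!-- This is a comment -->"""


def summarize_trades(xml_bytes):
    """Return (account, action, symbol, quantity) for each <trade> element."""
    root = ET.fromstring(xml_bytes)  # root is the <tradeBatch> document element
    return [
        (
            trade.get("account"),              # attribute value
            trade.get("action"),
            trade.findtext("symbol"),          # text of a child element
            int(trade.findtext("quantity")),
        )
        for trade in root.findall("trade")
    ]


for row in summarize_trades(TRADE_BATCH):
    print(row)
```

Attributes are read with `get`, child element text with `findtext`; the optional `<limit>` element is simply absent from the second trade.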
6XML Document
- The document is composed of declarations, elements, comments, character references, and processing instructions, all of which are indicated in the document by explicit markup
- A data object is an XML document if it is well-formed, as defined in the XML specification
- A well-formed XML document may in addition be valid if it meets certain further constraints
7XML Document Contents
- XML declaration
- Processing Instructions
- Elements
- Tags: Start-Tags, End-Tags
- Attributes
- Empty-Element Tags
- PCDATA
- CDATA
- White Spaces
- Comments
Prolog
Content
8????????
- ???????(<)?????(>)?????????????????????
- ??????????????????????????
- ????????????????-???
- ??????,?????????
- ??????
- ????????????
9XML ?????????????
- XML ????? XML ??????,??????????????????
- <?xml version="1.0" encoding="gb2312"?>
- ??<!-- -->
- ????<?......?>
- ??<!ENTITY dw "developerWorks">
10???
- XML ??????????????????????????,??????????????????
<?xml version="1.0"?>
<!-- A well-formed document -->
<XML??>
  Hello, World!
</XML??>
11XML Namespace - ????
<definitions
    xmlns:xsd="http://www.w3.org/2000/XMLSchema"
    xmlns:xsd1="http://example.com/stockquote.xsd"
    xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
    xmlns="http://schemas.xmlsoap.org/wsdl/"
    targetNamespace="http://example.com/stockquote.wsdl">
  <types></types>
</definitions>
- ???????URI
- ?????????????
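As a rough illustration of how a parser sees these declarations, the following Python sketch parses a trimmed copy of the `definitions` element with `xml.etree.ElementTree`. The variable names and the `wsdl` prefix used for lookup are assumptions made for this example; only the namespace URI matters, not the prefix.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the slide's element; the default namespace applies to
# <definitions> and <types>, while targetNamespace is an ordinary attribute.
WSDL = """<definitions
    xmlns:xsd="http://www.w3.org/2000/XMLSchema"
    xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
    xmlns="http://schemas.xmlsoap.org/wsdl/"
    targetNamespace="http://example.com/stockquote.wsdl">
  <types></types>
</definitions>"""

root = ET.fromstring(WSDL)

# ElementTree expands every element name to "{namespace-URI}local-name".
print(root.tag)

# For lookups we may choose any prefix, as long as it maps to the same URI.
ns = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}
types_element = root.find("wsdl:types", ns)

# Unprefixed attributes are not placed in the default namespace.
target = root.get("targetNamespace")
```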
12XML Namespace(?)
- XML?????????????????,?W3C???????
- ?XML?,????????tag??????????????,?????XML????????,?????????Namespaces????????????
- ?XML Namespace??????Namespace??URI?????,?XML??????????????????????Namespace,?????????????????????????
- ???Namespace?XML 1.0???,??????????????????????????????local names(????)????????????????????,?????????????????,???????????????XML???????,?????????????
13XML??(2)
<?xml version="1.0" encoding="gb2312" ?>
<??>
  <??>???</??>
  <??>???????? ?????</??>
  <?? ????="??">??</??>
  <??>
    <??>????</??>
    <??>web ????</??>
  </??>
  <??>13701068603</??>
  <E-mail>dfma_at_nlsde.buaa.edu.cn</E-mail>
  <??>??????????</??>
</??>
14Well-Formed XML Documents
- A "Well Formed" XML document has correct XML syntax
- A textual object is a well-formed XML document if it has the correct XML syntax
- It contains one or more elements
- There is exactly one element, called the root, or document element
- The name in an element's end-tag must match the element type in the start-tag. Names are case-sensitive
- Each of the parsed entities which is referenced directly or indirectly within the document is well-formed
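These rules can be checked mechanically. Here is a minimal Python sketch, assuming that "parses without error under a non-validating parser" is an acceptable stand-in for well-formedness; the helper name `is_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts `text` as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<a><b/></a>"))  # one root element, properly nested
print(is_well_formed("<a></A>"))      # end-tag name differs in case
print(is_well_formed("<a/><b/>"))     # two top-level elements
print(is_well_formed(""))             # no element at all
```

Only the first call should report a well-formed document; the others each violate one of the rules listed above.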
15DTD
- ??????(Document Type Definition, DTD)?XML 1.0??????
- DTD????????????????,????XML??????
16
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE Students [
  <!ELEMENT Students (Student)>
  <!ATTLIST Students Class CDATA #REQUIRED>
  <!ELEMENT Student (Name, Age?)>
  <!ATTLIST Student SId CDATA #REQUIRED>
  <!ELEMENT Name (#PCDATA)>
  <!ELEMENT Age (#PCDATA)>
]>
<Students Class="SY9061">
  <Student SId="12345">
    <Name>Lin</Name>
    <Age>20</Age>
    <Address>
      <Country>China</Country>
      <City>BeiJing</City>
    </Address>
  </Student>
  <Student SId="12345">
    <Name>Lin</Name>
  </Student>
</Students>
17 DTD???
- ELEMENT ??
- ??????
- ATTLIST
- ???????????????????????
- ENTITY
- ????????
- NOTATION
- ??????????(???????)?????,?????????????????
18DTD??????
- DOCTYPE ??
- ??DTD??
- <!DOCTYPE catalog [ ???? ]>
- ??DTD??
- <!DOCTYPE catalog SYSTEM "http://myserver/decs/pubcatalog.dtd">
19?????
- ??DTD??????????,??????????????????? ?????XML?????DTD,?????????????DTD,???????DTD??????
20DTD???
- DTD?????????,????XML????,????XML???,?????????????
??XML,?????????????????DTD,??????????XML??,??XML?
???? - DTD???????????
- ????,???????DTD???????,??,??XML???????DTD,?????XML
?????????
21XML Information Set and Canonical XML
- ?????XML??????????????
- ???????????
- ???????XML?????
UTF-8
<?xml version="1.0" encoding="gb2312" ?>
<??>
  <??>???</??>
  <??>
    <??>????</??>
    <??>web ????</??>
  </??>
  <E-mail></E-mail>
</??>
White space, CR, CR-LF, and LF line termination
<E-mail/>
22XML Information Set
- ?W3C???????XML Information Set????????????????????????XML??????
- XML Information Set?????????,????(document)???(element)???(attribute)???(character)???(comment)????????????????????XML????????
- ???????,??XML????????XML?????,???XML Information Set???????????XML??????
- W3C?????XML??????????XML?????????XML Information Set?????????????,XML Information Set????????XML??????????????????
23- XML Information Set???????XML Information Set?????????,????????????XML?????,?????????
- XML?????????????(Information Item)??,???????????(properties)?
- XML Information Set????????,????????,??XML Information Set????????????
- Information Set?Information Item????tree,node???????
24- Information Set????11????Information Item?
- The Document Information Item
- Element Information Item
- Attribute Information Item
- Processing Instruction Information Item
- Character Information Item
- Comment Information Item
- The Document Type Declaration Information Item
- Unexpanded Entity Reference Information Item
- Unparsed Entity Information Item
- Notation Information Item
- Namespace Information Item
25- What is not in the Information Set
- The content models of elements, from ELEMENT declarations in the DTD.
- The grouping and ordering of attribute declarations in ATTLIST declarations.
- The document type name.
- White space outside the document element.
- White space immediately following the target name of a PI.
- Whether characters are represented by character references.
- The difference between the two forms of an empty element: <foo/> and <foo></foo>.
- White space within start-tags (other than significant white space in attribute values) and end-tags.
- The difference between CR, CR-LF, and LF line termination.
26- The order of attributes within a start-tag.
- The order of declarations within the DTD.
- The boundaries of conditional sections in the DTD.
- The boundaries of parameter entities in the DTD.
- Comments in the DTD.
- The location of declarations (whether in internal or external subset or parameter entities).
- Any ignored declarations, including those within an IGNORE conditional section, as well as entity and attribute declarations ignored because previous declarations override them.
- The kind of quotation marks (single or double) used to quote attribute values.
- The boundaries of general parsed entities.
- The boundaries of CDATA marked sections.
- The default value of attributes declared in the DTD.
27Canonical XML
- ?W3C???????
- ?XML 1.0??????,??????????,????????????,??????????????,???????????????????????,???XML??????,?????????????????XML??,??Canonical XML???????,???????
- ????????????????,???????,???????(??XML Signature),???????XML???????????????????
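Several of the differences that the Information Set ignores (attribute order, quote style, the two empty-element forms, white space inside tags) are exactly what canonicalization removes. The sketch below uses `xml.etree.ElementTree.canonicalize`, available in Python 3.8 and later, which implements C14N 2.0; treating it as representative of the Canonical XML recommendation discussed on the slide is an assumption.

```python
import xml.etree.ElementTree as ET

# Two serializations that differ only in ways the Information Set ignores:
# attribute order, quote style, white space inside the tag, empty-element form.
doc_a = "<foo b='1'   a=\"2\" />"
doc_b = '<foo a="2" b="1"></foo>'

canonical_a = ET.canonicalize(doc_a)
canonical_b = ET.canonicalize(doc_b)

print(canonical_a)
print(canonical_a == canonical_b)  # the canonical forms are byte-for-byte equal
```

This is why a digital signature is computed over the canonical form: two parties can serialize the same document differently and still obtain the same bytes to sign.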
28The End!
29???????
30ASCII
- ??,Internet????????ANSI?ASCII???(American Standard Code for Information Interchange, ?????????)
- ???7 bits???????,????128???,?????0x00-0x7F?
- ASCII??????????????????????????0x00-0x20?0x7F?33??????
31- ISO-8859-1????????,????ASCII,??????0x00-0xFF,0x00-0x7F?????ASCII??,0x80-0x9F???????,0xA0-0xFF????????
- ISO-8859-1??????ASCII??????,??????????????????????????????????????????,??????ISO-8859-1???
32- Latin1?ISO-8859-1???,???????Latin-1?
- ASCII?????7????,ISO-8859-1?????8?????
- ??ISO-8859-1????????????????,???ISO-8859-1????????
?????????????????????,?????????????ISO-8859-1?????
??????????????,MySQL????????Latin1??????????
33Unicode and UCS
- ????????????????,????????????????,???????????????????
- ??,???????????????????????Unicode???ISO??????????????,???????????????????????????????
- Unicode????Universal Multiple-Octet Coded Character Set,???UCS?
- UCS?????Unicode Character Set????
34- Unicode unicode.org???????, ??????????????.
- ?1.0??16???, ?U+0000?U+FFFF. ??2byte???????
- ?2.0?????16???, ???16????????, ?????16????,
???20???, ????0?0x10FFFF.
35- ISO 10646???????31??????
- ????????(0x0000-0xFFFD)?????????(Basic Multilingual Plane, BMP)
- ?????????????????
- BMP????????????????,??????BMP????????????????
36- ?1991???,?????????????????????
- ?unicode2.0??, unicode????ISO 10646-1????????,
- ISO???ISO10646??????0x10FFFF?UCS-4????, ????????.
- ??Unicode???????????ISO 10646????
- Unicode 3.0???????BMP????
37UCS-2 UTF-16
- UCS????????,?????????????????
- ?????UCS???6C49
- UTF-8?UTF-7?UTF-16???????????
- UTF?UCS Transformation Format????
38- UCS-2?UTF-16?UCS???(????Unicode???)?????????????
- UCS-2??????????,????????????????,?????BMP????????
- UCS-2???GBK?Big5,?????????,???????????,???????????
???????????
39- UTF-16?????,??????BMP??????,?4??????BMP???????????????
- UTF-16?UCS-2???,UTF-16?????????????UCS-2??,?????BMP????UCS-2?????UTF-16?
40- UTF-16??????????,?????UTF-16???,?????????????????????Unicode???594E,??Unicode???4E59???????UTF-16???594E,?????????
- Unicode????????????????BOM?BOM??Bill Of Material?BOM?,??Byte Order Mark?
41- BOM???????????
- ?UCS????????"ZERO WIDTH NO-BREAK SPACE"???,?????FEFF??FFFE?UCS????????,??????????????UCS?????????????,?????"ZERO WIDTH NO-BREAK SPACE"?
- ?????????FEFF,?????????Big-Endian?????FFFE,?????????Little-Endian??????"ZERO WIDTH NO-BREAK SPACE"????BOM?
42(????)
- UCS-2?UTF-16??????????????????,???big endian?little endian(?????)?
- ???(U+554A)?big endian????0x554A,?little endian????0x4A55?
- UCS-2?UTF-16???????big endian???
- ??????????????????????BOM(Byte Order Mark),0xFEFF???big endian,0xFFFE???little endian?
- UCS-2BE?UCS-2LE?????????????,???big endian?little endian,UTF-16BE?UTF-16LE??????????BE???,?????UCS-2???UCS-2BE???
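The byte-order point can be reproduced in a few lines of Python using the slide's example character U+554A. This is only a sketch; the variable names are illustrative.

```python
import codecs

ch = "\u554a"  # the slide's example character, U+554A

# The same code unit serialized in the two byte orders.
big_endian = ch.encode("utf-16-be")
little_endian = ch.encode("utf-16-le")
print(big_endian.hex(), little_endian.hex())  # 554a 4a55

# The BOM is U+FEFF written in the stream's own byte order:
# FE FF for big endian, FF FE for little endian.
with_bom_be = codecs.BOM_UTF16_BE + big_endian
with_bom_le = codecs.BOM_UTF16_LE + little_endian

# The generic "utf-16" decoder reads the BOM, picks the byte order, and drops it.
decoded_be = with_bom_be.decode("utf-16")
decoded_le = with_bom_le.decode("utf-16")
```

Both decoded strings equal the original character, which is the whole purpose of the BOM: the receiver does not need to be told the byte order out of band.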
43UTF-8
- UTF-8?UCS???????????,UTF-16??????????(16?),?UTF-8??????????(8?)?
- UTF-16????????????????,UTF-8?????????????????
- ????UTF-8??????????UCS-2?????,?UCS-2?UTF-8?????????
44UCS-2 UTF-8
U+0000 - U+007F    0xxxxxxx
U+0080 - U+07FF    110xxxxx 10xxxxxx
U+0800 - U+FFFF    1110xxxx 10xxxxxx 10xxxxxx
45- ?????Unicode???6C49?6C49?0x0800-0xFFFF??,??????3?????1110xxxx 10xxxxxx 10xxxxxx??6C49??????0110 110001 001001, ??????????????x,??11100110 10110001 10001001,?E6 B1 89?
- ?????UCS-2???0x554A,???????0101 0101 0100 1010,??UTF-8?????????1110 0101 10 010101 10 001010,????????0xE5958A?
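The bit-filling procedure in the two worked examples can be written out directly. The following Python sketch covers only the three BMP rows of the table above; the function name `utf8_encode_bmp` is hypothetical, and surrogate code points are deliberately not handled.

```python
def utf8_encode_bmp(code_point):
    """Encode one BMP code point (U+0000..U+FFFF) by filling the x bits of the template."""
    if not 0 <= code_point <= 0xFFFF:
        raise ValueError("this sketch only covers U+0000..U+FFFF")
    if code_point <= 0x7F:
        # 0xxxxxxx
        return bytes([code_point])
    if code_point <= 0x7FF:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6), 0x80 | (code_point & 0x3F)])
    # 1110xxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xE0 | (code_point >> 12),
        0x80 | ((code_point >> 6) & 0x3F),
        0x80 | (code_point & 0x3F),
    ])


print(utf8_encode_bmp(0x6C49).hex())  # e6b189
print(utf8_encode_bmp(0x554A).hex())  # e5958a
```

The results match the values derived by hand above, and for these inputs they also agree with Python's built-in `str.encode("utf-8")`.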
46- ??UTF-8??????UCS??????,????UTF-8?????
- UTF-8???ASCII??,????ASCII??????UTF-8??ASCII??????????0x00-0x7F????????ASCII??,?????????????GBK?Big5???????UTF-8???????
- ??U+007F?UCS??,?UTF-8???????????
- UTF-8??????????????0x00-0xFD??(???UCS-4?????,????0x00-0xEF??)????????????????????
- ???????????0x80-0xBF??0xFE?0xFF?UTF-8???????
- GBK??????????UCS-2??????U+0800 - U+FFFF??,????GBK?????????UTF-8????3?????GBK?????????UTF-8???????3????,?GBK???????
- ?UTF-8?????????????????,?????????????????,???????????????????,????????,???????????????????????UTF-8?????????????
47?????????
- ????????????,??????????????????????????????????
- ??????????????????????
- ???????
- ??????
- ?????????
- ???????????????????,???? Charset/encoding,???
EF BB BF       UTF-8
FE FF          UTF-16/UCS-2, big endian
FF FE          UTF-16/UCS-2, little endian
FF FE 00 00    UTF-32/UCS-4, little endian
00 00 FE FF    UTF-32/UCS-4, big endian
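Signature-based detection is a straightforward prefix match. The sketch below uses the BOM constants from Python's `codecs` module; `detect_bom` is a hypothetical helper name. FE FF marks big-endian UTF-16 and FF FE little-endian, and the UTF-32 little-endian signature must be tested before the UTF-16 one because it begins with the same two bytes.

```python
import codecs

# Longer signatures come first: FF FE 00 00 (UTF-32LE) starts with FF FE (UTF-16LE).
_SIGNATURES = [
    (codecs.BOM_UTF8, "UTF-8"),         # EF BB BF
    (codecs.BOM_UTF32_LE, "UTF-32LE"),  # FF FE 00 00
    (codecs.BOM_UTF32_BE, "UTF-32BE"),  # 00 00 FE FF
    (codecs.BOM_UTF16_LE, "UTF-16LE"),  # FF FE
    (codecs.BOM_UTF16_BE, "UTF-16BE"),  # FE FF
]


def detect_bom(data):
    """Return the encoding named by a leading BOM, or None if there is no BOM."""
    for signature, name in _SIGNATURES:
        if data.startswith(signature):
            return name
    return None


print(detect_bom(b"\xef\xbb\xbfhello"))
print(detect_bom(b"\xfe\xff\x55\x4a"))
print(detect_bom(b"plain ascii"))
```

A missing BOM says nothing about the encoding, so a real detector would fall back to other evidence such as a declared charset.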
48- ???????????????,?????????????????????????????,????????????????????HTTP?????????,??????????????????HTTP?????????????
- Content-Type: text/html; charset=utf-8
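A receiver recovers the declared charset by parsing the header's parameters. One way to sketch this in Python is to borrow the standard library's `email.message.Message` class, which parses MIME-style headers; using it for an HTTP header value is a convenience assumed here, not something the slides prescribe.

```python
from email.message import Message

# Build a message carrying just the header from the slide.
msg = Message()
msg["Content-Type"] = "text/html; charset=utf-8"

media_type = msg.get_content_type()    # the type/subtype part
charset = msg.get_content_charset()    # the charset parameter, lower-cased

print(media_type, charset)

# The declared charset is then used to decode the body bytes.
body = "\u554a".encode("utf-8")
text = body.decode(charset)
```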
49- ?????????Html??,??????????????
- <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
- ?????????????????????,???????Charset????
- UCS-2/UTF-16?BOM???????????????,??????????BOM??????
50- Java ?????????????? UTF-16??? Java ??? charset ??? 16 ? UTF-16 ???????????????????
- Java ????????????????? charset?
- US-ASCII, 7 ? ASCII ??,??? ISO646-US?Unicode ?????????
- ISO-8859-1, ISO ????? No.1,??? ISO-LATIN-1
- UTF-8, 8 ? UCS ????
- UTF-16BE, 16 ? UCS ????,Big Endian(??????????)????
- UTF-16LE, 16 ? UCS ????,Little-endian(??????????)????
- UTF-16, 16 ? UCS ????,?????????????????
51??????
52GB2312, 1980
- GB2312????????????????????,?????????????????,?
????????,1981?5?1???,?????????????????? - GB2312???????????????????7445?????,?????6763??
- GB2312????????????????????,?????????????,???????
??????,??????????
53- GB2312?????????,?????????94??,?????94??,??????????
??????????????????? ?10??????,?1601???16?1?,??????
?? - ????01-09????????,16-87?????,10-15?88-94??????????
?????????????????? ?3755?,??16-55?,???????/??????
????????????3008?,??56-87?,???/??????????
?????????,??????????????????????,?????????????????
?????????? - ???????????????,??????????????
54- GB2312???????2121H-777EH,?ASCII???,??????GB??????????1?????
- ???????????????0xA0????GB2312???
- EUC-CN?????GB2312???,?GB2312?????
- ????GB2312???????? Unicode?UTF-8?
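The surviving fragments on these slides (a 94 by 94 grid, the code 1601 read as row 16, position 1, and the offset 0xA0) suggest the usual conversion from a GB2312 row/position code to its EUC-CN bytes. The sketch below is written under that reading; `quwei_to_euc_cn` is a hypothetical helper name.

```python
def quwei_to_euc_cn(row, position):
    """Convert a GB2312 row/position pair (each 1..94) to its two EUC-CN bytes.

    Assumption: each coordinate is offset by 0xA0, as the slide's "0xA0" suggests.
    """
    if not (1 <= row <= 94 and 1 <= position <= 94):
        raise ValueError("row and position must both be in 1..94")
    return bytes([row + 0xA0, position + 0xA0])


# Code 1601: row 16, position 1.
encoded = quwei_to_euc_cn(16, 1)
print(encoded.hex())             # b0a1
print(encoded.decode("gb2312"))  # decoded with Python's built-in GB2312 codec
```

Decoding the two bytes with Python's `gb2312` codec yields U+554A, the same character used in the byte-order examples.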
55GBK, 1995
- GB2312-80????6763?,?????????,??????????????????,??
???????,????????,????????,???GB2312-80,????????
?????(??)?(??)?(????)????,??????,????????????????
????,????????????????????,???????????? - ????????,????UNICODE???,?????????????1995?12?1???
??????? - GBK???GB2312 ????,????ISO 10646????,??????????????
????????
?????
56- 1995????????GBK1.0???21886???,??????????????????21003????
- GBK ????????,???????8140-FEFE??,????81-FE??,????40-FE??,??XX7F????
- GBK ????????, GBK?????????0x8140-0xFEFE,???????0x7F??????????0x81-0xFE,??????0x40-0x7E?0x80-0xFE?
57- ??????0x40-0x7E????(?)?????,???????????????????????? GBK??????,??????????????GB2312?????????????
- ????0x40-0x7E?GBK????????,?????????ASCII????,?????????????
- ??GBK??????0x80???????? ?ASCII????????????0x40?ASCII?????????,?????????,???????????????Big5???????????
58Big5
- Big5??????,????????0x81-0xFE,????????0x40-0x7E?0xA1-0xFE??GBK??,??????0x80-0xA0????0x8140-0xA0FE?????,????????
- Big5????????????,???????,?????????????GBK??????????????Big5????????Big5?????????,??????Big5????????,????????Windows?????????CP950????????Big5???,?Big5???????7?????????Big5?????????GBK??????,????Big5??????GBK????????,???????????
- ??Big5????ASCII?????(???????0x40-0x7E),??Big5???????????GBK???????,???????0x40-0x7E???????????,???????0x5C(\)?0x7C(|)????????GBK???????
59GB18030 , 2000
- GB18030??????GBK?GB2312,????????????,?????????????GB18030?????Unicode3.1????,??????????,GBK??????????,???????????????????????
- GB18030???????,?????????????????
- GBK?GB2312?????????,?????ASCII?????????,??????????????????????
- GB18030
- ????????0x00-0x7F,?????ASCII
- ?????????GBK??,????0x81-0xFE,?????????0x40-0x7E?0x80-0xFE
- ??????????????????0x81-0xFE,??????0x30-0x39
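The byte ranges above are enough to split a GB18030 byte stream into characters without a mapping table. The following Python sketch assumes the ranges exactly as read from the slide (one byte 0x00-0x7F; two bytes with lead 0x81-0xFE and trail 0x40-0x7E or 0x80-0xFE; four bytes with bytes 1 and 3 in 0x81-0xFE and bytes 2 and 4 in 0x30-0x39). Both function names are hypothetical.

```python
def gb18030_sequence_length(data, i=0):
    """Return the length (1, 2 or 4) of the GB18030 sequence starting at data[i]."""
    b1 = data[i]
    if b1 <= 0x7F:
        return 1  # single byte, same as ASCII
    if 0x81 <= b1 <= 0xFE and i + 1 < len(data):
        b2 = data[i + 1]
        if 0x40 <= b2 <= 0xFE and b2 != 0x7F:
            return 2  # GBK-compatible two-byte form
        if (0x30 <= b2 <= 0x39 and i + 3 < len(data)
                and 0x81 <= data[i + 2] <= 0xFE
                and 0x30 <= data[i + 3] <= 0x39):
            return 4  # four-byte form; the second byte is a digit 0x30-0x39
    raise ValueError("invalid or truncated GB18030 sequence at offset %d" % i)


def split_gb18030(data):
    """Split a GB18030 byte string into its per-character byte sequences."""
    pieces = []
    i = 0
    while i < len(data):
        n = gb18030_sequence_length(data, i)
        pieces.append(data[i:i + n])
        i += n
    return pieces


sample = "A\u554a\U00010000".encode("gb18030")
print([len(p) for p in split_gb18030(sample)])  # [1, 2, 4]
```

The second byte decides between the two-byte and four-byte forms: 0x30-0x39 never occurs as a two-byte trail byte, so the ranges do not collide.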
60??????
- ??" ?Google????E9BB91E799BD
- ???????BADAB0D7
- ??????utf8???,???????MBCS?GB2312???????
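The two hex strings quoted above can be checked against each other. In this minimal Python sketch the first is decoded as UTF-8 and the second as GB2312; that they represent the same two-character text is inferred from the slide's wording, and the variable names are illustrative.

```python
# Hex strings quoted on the slide.
utf8_bytes = bytes.fromhex("E9BB91E799BD")
gb_bytes = bytes.fromhex("BADAB0D7")

text_from_utf8 = utf8_bytes.decode("utf-8")
text_from_gb = gb_bytes.decode("gb2312")

# Same text, different byte sequences: 3 bytes per character in UTF-8, 2 in GB2312.
print(text_from_utf8 == text_from_gb)
print(len(text_from_utf8), len(utf8_bytes), len(gb_bytes))
```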