Title: Painless XML Authoring? How DITA Simplifies XML
1Painless XML Authoring? How DITA Simplifies XML
- Bob Doyle
- editor_at_cmsreview.com
- bobdoyle_at_skybuilders.com
- 617-876-5676
- Skype bobdoyle
2A brief poll. Who's heard of
- Structured writing? Information Mapping?
- Task-oriented Documentation? vs. ?
- Minimalism? John Carroll?
- Single-source publishing? vs. Reuse?
- Component Content Management?
- Topic-based authoring?
- Bob Horn, John Brockmann, JoAnn Hackos, Ginny
Redish, Ruth Clark?
3All heard of DITA?
- Information Typing
- Topics: Concept, Task, and Reference
- DITA Maps
- DITA Open Toolkit
- DITA is Simplified XML
- Specialization
4A brief survey of tools
- PTC Arbortext (Epic)
- JustSystems XMetal
- Adobe FrameMaker
- Word to DITA (in.vision, Info Mapping)
- XML Spy, oXygen
5Heard of me?
- Ph.D. Astrophysics, Harvard, 1968
- Collaborative Observing Program, NASA Skylab,
1970-72
- Super8 Sound, 1973-78
- Merlin and 5 other computer games 1977-81
- iXO Telecomputer 1980-87
- MacPublisher 1984-1987
- Digital Video Editor, New Media Magazine,
1993-1999
6Parker Brothers Games
7iXO Telecomputer
- Computer-initiated dialogues (AI)
- Yes, No, Help, Repeat keys
- Operators are standing by
- Stock trades, airline reservations, bill paying.
- Hearing-impaired
- Powered from phone line
- Venture capital 13 million
- Never developed the backend database services
- Huge NOL carry-forward
8MacPublisher
- First Desktop Publishing Program
- 11th Certified Mac Developer
- Shipped in 1984
- Laserwriter in 1985
- First spot color text on Apple Imagewriter
- First rotated text/graphics
- Sold 20,000 copies
- MacIndexer
- Mac-Hyphen
- Sold to Letraset in 1987
9Doing What Recently
- CEO, skyBuilders.com
- Editor, CMS Review
- related websites: CMS Wiki, CMS Forum, CMS News,
CMS Calendar, CMS Glossary, CMSML, CMS Boston,
Open Internet Lexicon, TaxoTips
- Founder, CM Professionals
- Contributing Editor, EContent Magazine
- Founder, DITA Users
- related websites: DITA Infocenter, DITA News,
DITA Newsletter, DITA Blog, DITA Wiki, and DITA
Tutor
10The First Podcast - 2003
- Christopher Lydon (NPR's The Connection)
- Dave Winer
- Adam Curry
- Bloggercon
- BlogAudio.org
- Lydon's Open Source Show
11EContent Magazine
- Contributing Editor
- 6 columns per year
- XML Authoring Tools Review
- 12 online columns per year
- EC100 selection
12Joined OASIS - 2006
- Organization for the Advancement of Structured
Information Standards
- Member DITA Technical Committee
- Member Learning and Content SC
- Member Help SC
- Observer Translation SC
- Member Editorial Board
- Organizer Boston DITA User Group
13DITA Users Launched in March
- DITA Users is an international membership
organization
- 400 members from 21 countries.
- Members learn topic-based structured writing.
- Author DITA with DITA Storm browser-based editor
- Deliverables for web (XHTML), print (PDF), Help
(Eclipse) from single-source documents.
- Members have a personal workspace folder.
- Finished work on web to show colleagues and
clients.
- Member directory has contact information.
- Discounts on major DITA conferences, on tools
(?), on DITA tutorials and workshops, and on the
DITA Report.
14DITA Infocenter Launched April
- DITA Infocenter is Eclipse-based Online Help
- DITA Architectural Specification (1.0 and 1.1)
- DITA Language Specification (1.0 and 1.1)
- Open Toolkit User Guide (1.3.1)
- Full-text search
- Index of keywords
- Table of contents
- Generated from DITA files with Open Toolkit
15DITA News Launched June
- Aggregates blog posts from DITA bloggers.
- Extensive listings of DITA tools from A to Z.
- Events calendar with conference listings,
- Websites, Publications, Webinars.
- Glossary of DITA terms.
- Content syndicated to other websites
- Single-source publishing tools.
16DITA Blog Launched July
- Group blog
- Anyone may join
- RSS feeds syndicate to DITA News
17DITA Wiki Launched July
- Resources with comments and discussions.
- Mediawiki software (Wikipedia)
- Architectural and Language specifications
- Vendors and Products
- Professional Services
- Edited directly by the vendors
- User comments
- People section - major DITA players
- Glossary of terms
18DITA Newsletter Launched September
- Monthly summary of DITA news
- Industry mailing list for press releases.
- DITA Mentor Awards
- Next month's events listings
- Member discount offers
19DITA Tutor Launched September
- Learning management system (Moodle LMS)
- Self-paced online tutorials
- Instructor-led online workshops
- Powerpoint presentations
- Some with audio recording
- Recorded webinars
- Courses in DITA techniques
- Certificates of completion.
20DITA User Groups
- dita-users_at_groups.yahoo.com
- http://dita.xml.org/user-groups
- Encouraging remote attendance
- Recording meeting presentations
- Archiving to DITA Tutor
- Possibly repurpose as eLearning
- What collaboration tools should we use?
21Structured Writing 1960s and 70s
- Structured writing requires an analysis of
content and a reorganization into the smallest
possible coherent topics. Decades of research on
such analysis and organization have been done by
Information Mapping, who identified common
document types, information types, and
information blocks (chunks or topics) in use in
education and commerce.
- The reduction in structured authoring time may be
offset by the increased time needed to analyze
the content and break it into reusable chunks.
There is no doubt that granular content, with
well-defined purposes for each paragraph and
sentence, is easier to author than linear
content. But you may need skilled (i.e., more
expensive) information developers to chunk your
material.
22Task-oriented Documentation 1980s
- Task-oriented docs have replaced system-oriented
or product-oriented docs - the old comprehensive
user manual.
- ROI: The number of calls per month to the help
desk on a product will almost certainly change
when product documentation is task oriented and
minimalist. And task-oriented content can feed
directly into help-desk scripts.
23Minimalism 1990s
- Minimalism aims to provide just what the
impatient user is looking for. Remember, the web
surfer is always just one click away from going
to your competition's website. Your job is to
strip away unnecessary content and get to the
point. You can measure the return by pre-testing
and post-testing content that has been
re-architected along minimalist principles.
- Minimalism appears to promise reduced costs for
the simple reason that there is so much less
content in well-prepared minimalist material. But
it takes talented people to write succinct,
action-oriented procedures that get users to
understand quickly what they need to know and
successfully do it. And minimalist material is
best when it is tested for effectiveness, adding
to costs.
24Single-source Publishing 1990s
- The original definition of single-source
publishing was providing multiple output formats
like Web, Print, and Online Help from the
original documents.
- When you have one source for each piece of
content, you get the astonishing ability to
change it in one place and have the change
propagate everywhere. A product name change
becomes much more manageable. Your
business-critical marketing messages are
standardized everywhere. Some call single source
a "single source of truth" because you are
assured that your customers are not getting mixed
messages that can confuse them, reduce sales, and
increase the need for tech support.
25Single-source plus Reuse
- Reusable content has a single source, of course,
but reuse generally refers to content originally
developed for one context that can be reused in
another. This requires content that is
topic-based and written for reuse by avoiding
explicit references to context.
- The cost savings associated with reuse of content
increase greatly when your content goes through a
workflow with distinct review and approval
stages, for example legal approval. Content that
is reused generally can avoid all or most of the
extra steps in the workflow that involve accuracy
of content. You will still need design approval
of the in-context appearance of the reused
content.
26Component Content Management
- The latest buzzword in CMS is "component." Most
web content management systems (WCMS) segment
content at the level of the web page. While this
may be adequate for simple websites written by
one or a few content contributors, it is not
acceptable for websites whose pages act as
portals to diverse kinds of interactive content.
- Modern corporate pages pull content in from
multiple sources. Each content block is filled
with a content component managed independently of
all the other blocks on the page. A component has
its own versioning and scheduling, its own
writers, reviewers, and approval process.
27Topic-based authoring
- A topic is a unit of information with a title and
some form of content, short enough to be specific
to a single subject or answer a single question,
but long enough to make sense on its own and be
authored as a unit.
- A topic aims to be context-free, so it contains
no links to other topics.
- In DITA, the topic is the basic unit of authoring
and of reuse.
- A topic is a content component
28Why Concept, Task, and Reference?
- Remember Macintosh doc guidelines?
- Learning MacPaint, Using MacPaint, the MacPaint
Reference.
- Today's O'Reilly Books: Learning PHP,
Programming PHP, PHP the Definitive Reference
- Concept: What is it?
- Task: How do I do it?
- Reference: All the details.
29What's a DITA Map?
- The DITA Map provides context for your
context-free topics (the content).
- You can have many maps, each one arranging the
topics for different requirements: a reference
manual, a tutorial, a help desk.
- The map is like a table of contents that rebuilds
the book dynamically.
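A minimal sketch of the map idea, not taken from the slides: two small maps arrange the same topic files for different deliverables. The file names and map titles are hypothetical; map and topicref (with an href attribute) are the standard DITA map elements. Python's standard library is used here only to read the maps back as an outline.

```python
import xml.etree.ElementTree as ET

# Two hypothetical maps that reuse the same topic files.
TUTORIAL_MAP = """
<map title="Tutorial">
  <topicref href="what-is-it.dita">
    <topicref href="first-steps.dita"/>
  </topicref>
</map>
"""

REFERENCE_MAP = """
<map title="Reference Manual">
  <topicref href="all-the-details.dita"/>
  <topicref href="what-is-it.dita"/>
</map>
"""


def outline(map_xml):
    """Return (depth, href) pairs in document order, like a table of contents."""
    root = ET.fromstring(map_xml)
    entries = []

    def walk(element, depth):
        # Only direct topicref children, so nesting becomes indentation.
        for ref in element.findall("topicref"):
            entries.append((depth, ref.get("href")))
            walk(ref, depth + 1)

    walk(root, 0)
    return entries


for name, source in (("Tutorial", TUTORIAL_MAP), ("Reference", REFERENCE_MAP)):
    print(name)
    for depth, href in outline(source):
        print("  " * (depth + 1) + href)
```

The topic what-is-it.dita appears in both maps but is written only once; each map supplies its own order and nesting.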
30What's the DITA Open Toolkit?
- The Open Toolkit is an open-source end-to-end
single-source publishing system.
- It takes your topics and your maps and generates
multiple output format deliverables, like print
(PDF), web (HTML), and Help.
- It is free and has been integrated into leading
DITA editing and CMS tools.
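To make "one source, several outputs" concrete, here is a toy stand-in. It is not the Open Toolkit and uses none of its code. It assumes a bare topic with a title and paragraphs, and the two renderer functions are hypothetical names invented for this sketch.

```python
import xml.etree.ElementTree as ET
from html import escape

# One source topic; both outputs below are generated from it.
TOPIC = """
<topic id="topic1">
  <title>My Topic</title>
  <body>
    <p>Some content</p>
    <p>Some more content</p>
  </body>
</topic>
"""


def to_html(topic):
    """Render the title and paragraphs as a tiny HTML fragment."""
    parts = ["<h1>%s</h1>" % escape(topic.findtext("title"))]
    parts += ["<p>%s</p>" % escape(p.text or "") for p in topic.iter("p")]
    return "\n".join(parts)


def to_text(topic):
    """Render the same content as plain text with an underlined title."""
    title = topic.findtext("title")
    lines = [title, "=" * len(title)]
    lines += [p.text or "" for p in topic.iter("p")]
    return "\n".join(lines)


topic = ET.fromstring(TOPIC)
print(to_html(topic))
print()
print(to_text(topic))
```

Editing a paragraph in TOPIC changes both outputs on the next run, which is the single-source promise in miniature.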
31Why Simplified XML?
- DITA is XML.
- XML is way harder than HTML and most writers want
no part of HTML.
- So how can DITA be easier than XML?
- Because XML separates content from presentation
- And it also separates content from structure
32What Is Content Anyway?
- It's not the Presentation or the Structure!
- Separate Presentation Layer from Content
- Structure the Content
- Tag Content with Meaning (semantics) by Metadata
33Three Kinds of Markup
- The three layers use different markup
- Style - <font>, <b>, <i>
- Structure - <p>, <ol>
- Semantics - <name>, <price>, <product>
34Three Kinds of XML
- The three layers use different technologies
- XSLT Stylesheets (CSS)
- XML Schemas (DTDs)
- XML/DITA Documents
35Three Different Professions
- The three layers are the work of different
professionals
- Designers for Style
- Architects for Structure
- Authors for Content and metadata
36Simplified XML again
- The DITA Open Toolkit is XML with a starter set
of stylesheets (XSLTs) and schemas (DTDs) so your
organization does not have to invest in months or
years of development
- But simplified can be too simple
37DITA is not for writers alone
- Without style designers (XSLTs)
- Without structural architects (DTDs)
- DITA sucks!
- It's like publishing your annual report in
Notepad text!
- Although topics are components, they don't have
the metadata needed to assemble them
intelligently.
38So what's the benefit for writers?
- Your work can feed into the dynamic assembly of
complex information products
- Websites, Help systems, Custom Print
Documentation, Mobile snippets
- You are an assembly-line writer in the age of
information automation!
- Love it or hate it?
39Topics are Content Components
- Even subtopic elements can be reusable components
- Elements just need unique IDs
- Then they can be conref'd (content referenced),
which means you can include them by reference in
other topics.
- Specialized topics have metadata created by the
structure architects.
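A rough sketch of what "include by reference" means. The conref value uses the file#topicid/elementid form; the file name, ids, and the resolve_conrefs helper are hypothetical, and a real DITA processor does considerably more (for example, checking that both elements are of the same type).

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical source topic holding a reusable element with a unique id.
SOURCE = """
<topic id="shared">
  <title>Shared notices</title>
  <body>
    <p id="warning">Unplug the unit before opening the case.</p>
  </body>
</topic>
"""

# Hypothetical topic that pulls that paragraph in by reference.
TARGET = """
<topic id="cleaning">
  <title>Cleaning</title>
  <body>
    <p conref="shared.dita#shared/warning"/>
    <p>Wipe the case with a dry cloth.</p>
  </body>
</topic>
"""


def resolve_conrefs(target, library):
    """Replace each element carrying conref with a copy of the referenced one.

    library maps a file name to its parsed root element. Only the
    file#topicid/elementid form is handled in this sketch.
    """
    for parent in list(target.iter()):
        for index, child in enumerate(list(parent)):
            ref = child.get("conref")
            if ref is None:
                continue
            filename, _, path = ref.partition("#")
            topic_id, _, element_id = path.partition("/")
            root = library[filename]
            if root.get("id") != topic_id:
                raise LookupError("no topic %r in %s" % (topic_id, filename))
            found = next(
                (e for e in root.iter() if e.get("id") == element_id), None
            )
            if found is None:
                raise LookupError("no element %r in %s" % (element_id, filename))
            replacement = copy.deepcopy(found)
            replacement.tail = child.tail  # keep surrounding whitespace
            parent[index] = replacement
    return target


library = {"shared.dita": ET.fromstring(SOURCE)}
resolved = resolve_conrefs(ET.fromstring(TARGET), library)
print([p.text for p in resolved.iter("p")])
```

The warning text lives in one place only; every topic that conrefs it picks up later edits automatically.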
40So what is specialization?
- You can specialize structures
- You can specialize element names
- Then valid topics can be written in
DITA-compliant authoring tools without knowing
anything about the underlying XML
- And they can be assembled automatically using the
metadata implicit in the specialization.
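One place this implicit metadata lives is the class attribute, which records each specialized element's ancestry. The sketch below renames a small task back to its base topic elements. The class strings follow the pattern used in the DITA DTDs, but treat the exact values as an assumption to check against the Language Specification. They are written out by hand here because they are normally DTD defaults and ElementTree does not read DTDs.

```python
import xml.etree.ElementTree as ET

# class values are normally defaulted by the DTD; spelled out here by hand.
TASK = """
<task id="t1" class="- topic/topic task/task ">
  <title class="- topic/title ">Change the battery</title>
  <taskbody class="- topic/body task/taskbody ">
    <steps class="- topic/ol task/steps ">
      <step class="- topic/li task/step ">
        <cmd class="- topic/ph task/cmd ">Open the cover.</cmd>
      </step>
    </steps>
  </taskbody>
</task>
"""


def base_name(element):
    """Return the base topic element name listed first in the class attribute."""
    tokens = element.get("class", "").split()
    # tokens[0] is the "-" marker; tokens[1] looks like "topic/<name>".
    return tokens[1].split("/")[1]


def generalize(element):
    """Rename every element in place to its base topic ancestor."""
    for node in element.iter():
        node.tag = base_name(node)
    return element


general = generalize(ET.fromstring(TASK))
print([node.tag for node in general.iter()])
```

A tool that knows nothing about tasks can still process the result as an ordinary topic with an ordered list, which is why specialized content keeps working with generic stylesheets.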
41Three examples of specialization
- Concepts are specialized topics
- Tasks are specialized topics
- References are specialized topics
- By understanding those specializations, you will
know how specialization works
- But remember that specialization is the work of
document architects and information designers
42A close look at a topic
- A topic needs only three things.
- an id attribute in the main topic tag (for reuse)
- a title
- a body (strictly optional in the DTD, but it is
where the content goes)
43A close look at a topic
- It can have dozens of optional elements, many of
which are very familiar HTML elements, like
paragraphs <p>, lists <ul>, and tables <table>
44A close look at a topic
- Elements are shown schematically as colored boxes
in a hierarchy.
- They are actually XML tag structures, properly
nested and well formed.
    <topic id="topic1">
      <title>My Topic</title>
      <shortdesc>About my topic...</shortdesc>
      <body>
        <p>Some content</p>
        <p>Some more content</p>
      </body>
    </topic>
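A small sketch that parses a topic like the one on this slide and reports which of the parts named on the earlier slide (id, title, body) are absent. The missing_parts helper is a hypothetical name. This checks well-formedness and those three parts only; it is not DTD validation.

```python
import xml.etree.ElementTree as ET

TOPIC = """
<topic id="topic1">
  <title>My Topic</title>
  <shortdesc>About my topic...</shortdesc>
  <body>
    <p>Some content</p>
    <p>Some more content</p>
  </body>
</topic>
"""


def missing_parts(topic_xml):
    """Return the names of the id/title/body parts that are absent."""
    root = ET.fromstring(topic_xml)  # raises ParseError if not well formed
    missing = []
    if not root.get("id"):
        missing.append("id")
    if root.find("title") is None:
        missing.append("title")
    if root.find("body") is None:
        missing.append("body")
    return missing


print(missing_parts(TOPIC))
print(missing_parts("<topic><title>No id, no body</title></topic>"))
```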
45The Concept Type
- The concept type specializes topic element names
and topic structure.
- The root element is renamed concept and the body
element is renamed conbody.
- Any number of paragraphs, lists, tables, etc. may
appear, but none of these are allowed after the
first section or example.
- Sections and examples can then appear in any
order.
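The ordering rule for conbody can be sketched as a simple check. The sample concepts and the follows_concept_rule helper are hypothetical; a validating editor enforces the same rule from the DTD, so writers never run code like this themselves.

```python
import xml.etree.ElementTree as ET

GOOD = """
<concept id="c1">
  <title>What is a map?</title>
  <conbody>
    <p>An opening paragraph.</p>
    <section><title>Details</title></section>
    <example><title>Sample</title></example>
  </conbody>
</concept>
"""

BAD = """
<concept id="c2">
  <title>Out of order</title>
  <conbody>
    <section><title>Details</title></section>
    <p>A paragraph after a section.</p>
  </conbody>
</concept>
"""


def follows_concept_rule(concept_xml):
    """True if nothing but sections and examples follows the first of them."""
    conbody = ET.fromstring(concept_xml).find("conbody")
    seen_block = False
    for child in conbody:
        if child.tag in ("section", "example"):
            seen_block = True
        elif seen_block:
            # A paragraph, list, or table after the first section or example.
            return False
    return True


print(follows_concept_rule(GOOD))
print(follows_concept_rule(BAD))
```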
46The Task Type
- The task type specializes topic element names and
topic structure.
- The root element is renamed task and the body
element is renamed taskbody.
- One task prerequisite and one context (both
specializations of section) are followed by steps
(a specialization of ordered list).
- Each step must have a command, then optional
info, a step example, choices, and a step result.
- The set of steps is followed by the task result,
examples, and any task postrequisite.
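A sketch of a small task laid out in that order, with a check that every step starts with its command. The task content is invented for illustration, and steps_missing_cmd and commands are hypothetical helper names; the element names (prereq, context, steps, step, cmd, info, stepresult, result, postreq) are the DITA task elements.

```python
import xml.etree.ElementTree as ET

TASK = """
<task id="replace_battery">
  <title>Replace the battery</title>
  <taskbody>
    <prereq>Switch the unit off.</prereq>
    <context>Do this when the low-battery light comes on.</context>
    <steps>
      <step>
        <cmd>Open the cover.</cmd>
        <info>The cover slides toward you.</info>
      </step>
      <step>
        <cmd>Swap the battery.</cmd>
        <stepresult>The light goes out.</stepresult>
      </step>
    </steps>
    <result>The unit is ready to use.</result>
    <postreq>Recycle the old battery.</postreq>
  </taskbody>
</task>
"""


def steps_missing_cmd(task_xml):
    """Return the 1-based numbers of steps whose first child is not cmd."""
    root = ET.fromstring(task_xml)
    bad = []
    for number, step in enumerate(root.iter("step"), start=1):
        children = list(step)
        if not children or children[0].tag != "cmd":
            bad.append(number)
    return bad


def commands(task_xml):
    """Return the command text of each step, in order."""
    return [cmd.text for cmd in ET.fromstring(task_xml).iter("cmd")]


print(steps_missing_cmd(TASK))
print(commands(TASK))
```

Because the commands are tagged as such, they can be pulled out on their own, for example to feed a help-desk script or a quick-reference card.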
47The Reference Type
- The reference type specializes topic element names
and topic structure.
- The root element is renamed reference and the
body element is renamed refbody.
- The refbody includes a properties element (a
specialization of simpletable), a three-column
table of property types, values, and
descriptions.
- The element refsyn (reference syntax) is a
specialization of the section element.
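A sketch of a reference topic with a refsyn and a two-row properties table, read back as (type, value, description) rows. The lamp settings are invented sample data and property_rows is a hypothetical helper; property, proptype, propvalue, and propdesc are the DITA element names for the table rows and cells.

```python
import xml.etree.ElementTree as ET

REFERENCE = """
<reference id="lamp_settings">
  <title>Lamp settings</title>
  <refbody>
    <refsyn>lamp --color VALUE --size VALUE</refsyn>
    <properties>
      <property>
        <proptype>color</proptype>
        <propvalue>red</propvalue>
        <propdesc>Used for warnings.</propdesc>
      </property>
      <property>
        <proptype>size</proptype>
        <propvalue>small</propvalue>
        <propdesc>The default size.</propdesc>
      </property>
    </properties>
  </refbody>
</reference>
"""


def property_rows(reference_xml):
    """Return one (type, value, description) tuple per property row."""
    root = ET.fromstring(reference_xml)
    columns = ("proptype", "propvalue", "propdesc")
    return [
        tuple(prop.findtext(name, default="") for name in columns)
        for prop in root.iter("property")
    ]


for row in property_rows(REFERENCE):
    print("%-8s %-8s %s" % row)
```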
48Thank you.
- Contact Bob Doyle
- editor_at_cmsreview.com
- bobdoyle_at_skybuilders.com
- Read my EContent articles
- www.econtentmag.com/About/AboutAuthor.aspx?AuthorID=155
- Please join DITA Users
- www.ditausers.org/membership/how_to_join
- Merlin lives!
- www.theelectronicwizard.com
- This presentation is online at
- www.ditausers.org/users/bobdoyle/DocTrainEast2007.ppt
49DITA Users Network 2007
- DITA Blog
- DITA Infocenter
- DITA News
- DITA Newsletter
- DITA Tutor
- DITA Users
- DITA Wiki
50DITA Report - November
- Coming November 2007
- Based on my XML Editors Review
- Marketplace analysis
- Vendors and Products Evaluated
- Strategies from 1 to 100s of writers
- Online tour of authoring tools
51XML Editors
- Altova XML Spy
- Cladonia Exchanger
- Stylus Studio
- SyncRO Soft <oXygen/>
- Adobe FrameMaker
- Arbortext Editor
- XMetal Author
- Syntext Serna
- Eight top XML Editors were studied
- Chosen from 65 in CMS Review Editor Listings
- Published in the June issue of EContent Magazine
- Extended version: XML Editors Report
52Which Editors Do You Use?
- A quick poll of your experience
53The XML Editors Report
- Personal use license
- Corporate license
- One year of release versions
- Online consulting included
- Screen share to look at interfaces
54CM Pros Best Practices
55CMS Trends
- Open Source (and Open Documents)
- Online (ASPs and Web Services)
- Offshore? (Globalization)
- Enabling technologies (XML, Javascript)
- AJAX, Web 2.0
56Information Architecture and Content Management.
- Two Kinds of Information Architecture
- IA of document sets (books in a library, a
website, the World Wide Web): organization,
cataloging, metadata tagging, accessibility,
findability.
- IA of a single document: page structure, allowed
navigation elements and reusable content
components.
57Defining Content Management
- What is a CM System?
- What Is Content Management?
- What Is Content?
58What is a CM System?
- It is humans using computers and software to
assist in managing content.
- It has two main parts
- The user interface.
- The database (content repository).
- Everything else is magic middleware.
- It helps manage the content lifecycle.
59What Is Content Management?
- Content management is the whole process from
creation and capture of original content to the
delivery of different versions to many publishing
channels:
- Print
- Web
- Cellphone
- Etc.
60The Content Lifecycle
- 7 stages
- Organize
- Rules
- Create
- Storage
- Assembly
- Publish
- Archive
- Context
- Users
- Content
61Brown Television (BTV)
62Hi-8 Users Group
- Funded Videomaker Magazine, Hi-8 Group became
Desktop Video Group in 1992
63HRTV and Quad Sound
- Harvard-Radcliffe Film Workshop was in the
basement of Holmes Hall (North/Pforzheimer House)
where the old Radcliffe Radio Station and Morse
Music Library were located. In the mid-80s it
became HRTV and the radio broadcast booth and
adjoining sound rooms became Quad Sound Studios.
64CMS Review
65Other CMS Review Sites
- CMS Forum
- CMS Wiki
- CMSML
- CMS News
- CMS Calendar
- CMS Glossary
- CMS Boston
- Memography
- Open Internet Lexicon
- TaxoTips
- List-2-Web
66CMS Review Glossary
67Finding a CMS
- The CMSML project at CMS Review and CM Pros
- Select two CMS or enter search terms to find
CMS that match your criteria, then click compare
to see the results.
- The directory is a faceted classification scheme.
68CM Professionals
- Nearly 1000 members in 2006
- Website (7/10 Google PageRank)
- Benefits - Mail, Member Directory
- Glossary, Resource Library, Calendar
- Communities - CMSML, DITA, Global
- News, Blog aggregation
- Globalization, Personalization
69CM Professionals
70CM Pros Member Directory
71CM Pros Calendar
72CM Pros Videos
- Eighty hours of video from Gilbane Conferences,
IA Summit, OSCOM, Bloggercons at Harvard.
Bob Boiko interviews Shino
73CM Pros Communities
- CMS Markup Language
- (and Faceted CMS Directory)
- Globalization website in 10 languages
- (translations by volunteers)
- DITA
- (JoAnn Hackos, Scott Abel, others)
74DITA Island
- Second Life meetings on DITA