Painless XML Authoring? How DITA Simplifies XML

About This Presentation
Title:

Painless XML Authoring? How DITA Simplifies XML

Description:

Merlin and 5 other computer games 1977-81. iXO Telecomputer 1980-87 ... Merlin lives! www.theelectronicwizard.com. This presentation is online at: ... – PowerPoint PPT presentation

Slides: 75
Provided by: bob131
Learn more at: http://www.ditausers.org


Transcript and Presenter's Notes



1
Painless XML Authoring? How DITA Simplifies XML
  • Bob Doyle
  • editor_at_cmsreview.com
  • bobdoyle_at_skybuilders.com
  • 617-876-5676
  • Skype bobdoyle

2
A brief poll. Who's heard of
  • Structured writing? Information Mapping?
  • Task-oriented Documentation? vs. ?
  • Minimalism? John Carroll?
  • Single-source publishing? vs. Reuse?
  • Component Content Management?
  • Topic-based authoring?
  • Bob Horn, John Brockmann, JoAnn Hackos, Ginny
    Redish, Ruth Clark?

3
All heard of DITA?
  • Information Typing
  • Topics: Concept, Task, and Reference
  • DITA Maps
  • DITA Open Toolkit
  • DITA is Simplified XML
  • Specialization

4
A brief survey of tools
  • PTC Arbortext (Epic)
  • JustSystems XMetal
  • Adobe FrameMaker
  • Word to DITA (in.vision, Info Mapping)
  • XML Spy, oXygen

5
Heard of me?
  • Ph.D. Astrophysics, Harvard, 1968
  • Collaborative Observing Program, NASA Skylab, 1970-72
  • Super8 Sound, 1973-78
  • Merlin and 5 other computer games 1977-81
  • iXO Telecomputer 1980-87
  • MacPublisher 1984-1987
  • Digital Video Editor, New Media Magazine, 1993-1999

6
Parker Brothers Games
7
iXO Telecomputer
  • Computer-initiated dialogues (AI)
  • Yes, No, Help, Repeat keys
  • Operators are standing by
  • Stock trades, airline reservations, bill paying.
  • Hearing-impaired
  • Powered from phone line
  • Venture capital 13 million
  • Never developed the backend database services
  • Huge NOL carry-forward

8
MacPublisher
  • First Desktop Publishing Program
  • 11th Certified Mac Developer
  • Shipped in 1984
  • Laserwriter in 1985
  • First spot color text on Apple Imagewriter
  • First rotated text/graphics
  • Sold 20,000 copies
  • MacIndexer
  • Mac-Hyphen
  • Sold to Letraset in 1987

9
Doing What Recently
  • CEO, skyBuilders.com
  • Editor, CMS Review
  • Related websites: CMS Wiki, CMS Forum, CMS News,
    CMS Calendar, CMS Glossary, CMSML, CMS Boston,
    Open Internet Lexicon, TaxoTips
  • Founder, CM Professionals
  • Contributing Editor, EContent Magazine
  • Founder, DITA Users
  • Related websites: DITA Infocenter, DITA News,
    DITA Newsletter, DITA Blog, DITA Wiki, and DITA
    Tutor

10
The First Podcast - 2003
  • Christopher Lydon (NPR's The Connection)
  • Dave Winer
  • Adam Curry
  • Bloggercon
  • BlogAudio.org
  • Lydon's Open Source Show

11
EContent Magazine
  • Contributing Editor
  • 6 columns per year
  • XML Authoring Tools Review
  • 12 online columns per year
  • EC100 selection

12
Joined OASIS - 2006
  • Organization for the Advancement of Structured
    Information Standards
  • Member, DITA Technical Committee
  • Member, Learning and Content SC
  • Member, Help SC
  • Observer, Translation SC
  • Member, Editorial Board
  • Organizer, Boston DITA User Group

13
DITA Users Launched in March
  • DITA Users is an international membership
    organization
  • 400 members from 21 countries.
  • Members learn topic-based structured writing.
  • Author DITA with DITA Storm browser-based editor
  • Deliverables for web (XHTML), print (PDF), Help
    (Eclipse) from single-source documents.
  • Members have a personal workspace folder.
  • Finished work on web to show colleagues and
    clients.
  • Member directory has contact information.
  • Discounts on major DITA conferences, on tools
    (?), on DITA tutorials and workshops, and on the
    DITA Report.

14
DITA Infocenter Launched April
  • DITA Infocenter is Eclipse-based Online Help
  • DITA Architectural Specification (1.0 and 1.1)
  • DITA Language Specification (1.0 and 1.1)
  • Open Toolkit User Guide (1.3.1)
  • Full-text search
  • Index of keywords
  • Table of contents
  • Generated from DITA files with Open Toolkit

15
DITA News Launched June
  • Aggregates blog posts from DITA bloggers.
  • Extensive listings of DITA tools from A to Z.
  • Events calendar with conference listings,
  • Websites, Publications, Webinars.
  • Glossary of DITA terms.
  • Content syndicated to other websites
  • Single-source publishing tools.

16
DITA Blog Launched July
  • Group blog
  • Anyone may join
  • RSS feeds syndicate to DITA News

17
DITA Wiki Launched July
  • Resources with comments and discussions.
  • Mediawiki software (Wikipedia)
  • Architectural and Language specifications
  • Vendors and Products
  • Professional Services
  • Edited directly by the vendors
  • User comments
  • People section - major DITA players
  • Glossary of terms

18
DITA Newsletter Launched September
  • Monthly summary of DITA news
  • Industry mailing list for press releases.
  • DITA Mentor Awards
  • Next month's event listings
  • Member discount offers

19
DITA Tutor Launched September
  • Learning management system (Moodle LMS)
  • Self-paced online tutorials
  • Instructor-led online workshops
  • PowerPoint presentations
  • Some with audio recording
  • Recorded webinars
  • Courses in DITA techniques
  • Certificates of completion.

20
DITA User Groups
  • dita-users_at_groups.yahoo.com
  • http://dita.xml.org/user-groups
  • Encouraging remote attendance
  • Recording meeting presentations
  • Archiving to DITA Tutor
  • Possibly repurpose as eLearning
  • What collaboration tools should we use?

21
Structured Writing 1960s and 70s
  • Structured writing requires an analysis of
    content and a reorganization into the smallest
    possible coherent topics. Decades of research on
    such analysis and organization have been done by
    Information Mapping, who identified common
    document types, information types, and
    information blocks (chunks or topics) in use in
    education and commerce.
  • The reduction in structured authoring time may be
    offset by the increased time needed to analyze
    the content and break it into reusable chunks.
    There is no doubt that granular content, with
    well-defined purposes for each paragraph and
    sentence, is easier to author than linear
    content. But you may need skilled (i.e., more
    expensive) information developers to chunk your
    material.

22
Task-oriented Documentation 1980s
  • Task-oriented docs have replaced system-oriented
    or product-oriented docs - the old comprehensive
    user manual.
  • ROI - The number of calls per month to the help
    desk on a product will almost certainly drop
    when product documentation is task-oriented and
    minimalist. And task-oriented content can feed
    directly into help-desk scripts.

23
Minimalism 1990s
  • Minimalism aims to provide just what the
    impatient user is looking for. Remember, the web
    surfer is always just one click away from going
    to your competition's website. Your job is to
    strip away unnecessary content and get to the
    point. You can measure the return by pre-testing
    and post-testing content that has been
    re-architected along minimalist principles.
  • Minimalism appears to promise reduced costs for
    the simple reason that there is so much less
    content in well-prepared minimalist material. But
    it takes talented people to write succinct,
    action-oriented procedures that get users to
    understand quickly what they need to know and
    successfully do it. And minimalist material is
    best when it is tested for effectiveness, adding
    to costs.

24
Single-source Publishing 1990s
  • The original definition of single-source
    publishing was providing multiple output formats
    like Web, Print, and Online Help from the
    original documents.
  • When you have one source for each piece of
    content, you get the astonishing ability to
    change it in one place and have the change
    propagate everywhere. A product name change
    becomes much more manageable. Your
    business-critical marketing messages are
    standardized everywhere. Some call single source
    a "single source of truth" because you are
    assured that your customers are not getting mixed
    messages that can confuse them, reduce sales, and
    increase the need for tech support.

25
Single-source plus Reuse
  • Reusable content has a single source, of course,
    but reuse generally refers to content originally
    developed for one context that can be reused in
    another. This requires content that is
    topic-based and written for reuse by avoiding
    explicit references to context.
  • The cost savings associated with reuse of content
    increase greatly when your content goes through a
    workflow with distinct review and approval
    stages, for example legal approval. Content that
    is reused generally can avoid all or most of the
    extra steps in the workflow that involve accuracy
    of content. You will still need design approval
    of the in-context appearance of the reused
    content.

26
Component Content Management
  • The latest buzzword in CMS is "component." Most
    web content management systems (WCMS) segment
    content at the level of the web page. While this
    may be adequate for
    simple websites written by one or a few content
    contributors, it is not acceptable for websites
    whose pages act as portals to diverse kinds of
    interactive content.
  • Modern corporate pages pull content in from
    multiple sources. Each content block is filled
    with a content component managed independently of
    all the other blocks on the page. A component has
    its own versioning and scheduling, its own
    writers, reviewers, and approval process.

27
Topic-based authoring
  • A topic is a unit of information with a title and
    some form of content, short enough to be specific
    to a single subject or answer a single question,
    but long enough to make sense on its own and be
    authored as a unit.
  • A topic aims to be context-free, so it contains
    no links to other topics.
  • In DITA, the topic is the basic unit of authoring
    and of reuse.
  • A topic is a content component

28
Why Concept, Task, and Reference?
  • Remember Macintosh doc guidelines?
  • Learning MacPaint, Using MacPaint, the MacPaint
    Reference.
  • Today's O'Reilly books: Learning PHP,
    Programming PHP, PHP the Definitive Reference
  • Concept: What is it?
  • Task: How do I do it?
  • Reference: All the details.

29
What's a DITA Map?
  • The DITA Map provides context for your
    context-free topics (the content).
  • You can have many maps, each one arranging the
    topics for different requirements: a reference
    manual, a tutorial, a help desk.
  • The map is like a table of contents that rebuilds
    the book dynamically.
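
As a rough illustration of the idea on this slide, here is a minimal sketch of a map. The file names and the title are hypothetical and are not taken from the presentation; the nesting of the topicref elements is what gives the context-free topics their order and hierarchy.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">
<!-- Hypothetical map: file names and title are illustrative only -->
<map title="Widget User Guide">
  <!-- The nesting supplies the context the topics themselves leave out -->
  <topicref href="about_widget.dita" type="concept">
    <topicref href="install_widget.dita" type="task"/>
    <topicref href="widget_settings.dita" type="reference"/>
  </topicref>
</map>
```

A second map could point at the same three files in a different order, for example tasks first for a tutorial, without touching the topics.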

30
What's the DITA Open Toolkit?
  • The Open Toolkit is an open-source end-to-end
    single-source publishing system.
  • It takes your topics and your maps and generates
    multiple output format deliverables, like print
    (PDF), web (HTML), and Help.
  • It is free and has been integrated into leading
    DITA editing and CMS tools.

31
Why Simplified XML?
  • DITA is XML.
  • XML is way harder than HTML and most writers want
    no part of HTML.
  • So how can DITA be easier than XML?
  • Because XML separates content from presentation
  • And it also separates content from structure

32
What Is Content Anyway?
  • It's not the Presentation or the Structure!
  • Separate Presentation Layer from Content
  • Structure the Content
  • Tag Content with Meaning (semantics) by Metadata

33
Three Kinds of Markup
  • The three layers use different markup
  • Style - <font>, <b>, <i>
  • Structure - <p>, <ol>
  • Semantics - <name>, <price>, <product>
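
To make the three layers concrete, the same two pieces of information can be marked up three ways. These are three separate fragments, not one document, and the product name, price, and currency attribute are invented for illustration.

```xml
<!-- Style markup: says how the text should look -->
<p><b><font color="red">Deluxe Widget</font></b> <i>19.95</i></p>

<!-- Structure markup: says how the text is organized -->
<ol>
  <li>Deluxe Widget</li>
  <li>19.95</li>
</ol>

<!-- Semantic markup: says what the text means (hypothetical element names) -->
<product>
  <name>Deluxe Widget</name>
  <price currency="USD">19.95</price>
</product>
```

Only the third fragment lets software find "the price" without guessing from position or appearance.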

34
Three Kinds of XML
  • The three layers use different technologies
  • XSLT Stylesheets (CSS)
  • XML Schemas (DTDs)
  • XML/DITA Documents

35
Three Different Professions
  • The three layers are the work of different
    professionals
  • Designers for Style
  • Architects for Structure
  • Authors for Content and metadata

36
Simplified XML again
  • The DITA Open Toolkit is XML with a starter set
    of stylesheets (XSLTs) and schemas (DTDs) so your
    organization does not have to invest in months or
    years of development
  • But simplified can be too simple

37
DITA is not for writers alone.
  • Without style designers (XSLTs)
  • Without structural architects (DTDs)
  • DITA sucks!
  • It's like publishing your annual report in
    Notepad text!
  • Although topics are components, they don't have
    the metadata needed to assemble them
    intelligently.

38
So what's the benefit for writers?
  • Your work can feed into the dynamic assembly of
    complex information products
  • Websites, Help systems, Custom Print
    Documentation, Mobile snippets
  • You are an assembly line writer in the age of
    information automation!
  • Love it or hate it?

39
Topics are Content Components
  • Even subtopic elements can be reusable components
  • Elements just need unique IDs
  • Then they can be conref'd (content referenced)
    which means you can include them by reference in
    other topics.
  • Specialized topics have metadata created by the
    structure architects.
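
A minimal sketch of a content reference, assuming a hypothetical file named shared.dita. The two fragments below would live in different files; the conref value names the file, the topic id, and the element id.

```xml
<!-- File: shared.dita (hypothetical) -->
<topic id="shared">
  <title>Shared content</title>
  <body>
    <!-- The unique id is what makes this paragraph reusable -->
    <p id="backup-note">Back up your data before you begin.</p>
  </body>
</topic>

<!-- In any other topic: include that paragraph by reference -->
<p conref="shared.dita#shared/backup-note"/>
```

When the topic is processed, the empty paragraph is replaced by the referenced one, so the wording is maintained in one place only.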

40
So what is specialization?
  • You can specialize structures
  • You can specialize element names
  • Then valid topics can be written in
    DITA-compliant authoring tools without knowing
    anything about the underlying XML
  • And they can be assembled automatically using the
    metadata implicit in the specialization.
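
The "metadata implicit in the specialization" is, as far as the DITA architectural specification describes it, carried by a class attribute that the DTD supplies as a default value. The sketch below uses a hypothetical topic to contrast what an author types with what a DITA-aware processor sees; authors normally never write the class values themselves.

```xml
<!-- What the author types -->
<concept id="about_maps">
  <title>About DITA maps</title>
  <conbody>
    <p>A map arranges topics for one deliverable.</p>
  </conbody>
</concept>

<!-- What a processor sees after the DTD defaults are applied -->
<concept id="about_maps" class="- topic/topic concept/concept ">
  <title class="- topic/title ">About DITA maps</title>
  <conbody class="- topic/body concept/conbody ">
    <p class="- topic/p ">A map arranges topics for one deliverable.</p>
  </conbody>
</concept>
```

Because each class value lists the element's ancestry, a stylesheet written for topic/body can still handle conbody, or any later specialization of it.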

41
Three examples of specialization
  • Concepts are specialized topics
  • Tasks are specialized topics
  • References are specialized topics
  • By understanding those specializations, you will
    know how specialization works
  • But remember that specialization is the work of
    document architects and information designers

42
A close look at a topic
  • A topic has only three required parts.
  • an id attribute in the main topic tag (for reuse)
  • a title
  • a body

43
A close look at a topic
  • It can have dozens of optional elements, many of
    which are very familiar HTML elements, like
    paragraphs <p>, lists <ul>, and tables <table>

44
A close look at a topic
  • Elements are shown schematically as colored boxes
    in a hierarchy.
  • They are actually XML tag structures, properly
    nested and well formed.
  • <topic id="topic1">
  • <title>My Topic</title>
  • <shortdesc>About my topic...</shortdesc>
  • <body>
  • <p>Some content</p>
  • <p>Some more content</p>
  • </body>
  • </topic>

45
The Concept Type
  • The concept type specializes topic element names
    and topic structure.
  • The root element is renamed concept and the body
    element is renamed conbody.
  • Any number of paragraphs, lists, tables, etc. may
    appear, but none of these are allowed after the
    first section or example.
  • Sections and examples can then appear in any
    order.
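
A minimal sketch of a concept that follows the ordering rule above. The id, title, and wording are hypothetical; the element names come from the DITA concept type.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="single_sourcing">
  <title>Single-source publishing</title>
  <shortdesc>One source, many output formats.</shortdesc>
  <conbody>
    <!-- Paragraphs, lists, and tables come first -->
    <p>Single-source publishing produces several deliverables from the same topics.</p>
    <ul>
      <li>Web (XHTML)</li>
      <li>Print (PDF)</li>
    </ul>
    <!-- After the first section or example, only sections and examples may follow -->
    <section>
      <title>Why it matters</title>
      <p>A change made once shows up in every deliverable.</p>
    </section>
    <example>
      <p>A renamed product is corrected in one topic and republished.</p>
    </example>
  </conbody>
</concept>
```

Moving the first paragraph below the section would make the topic invalid, which is how a validating editor enforces the structure for the author.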

46
The Task Type
  • The task type specializes topic element names and
    topic structure.
  • The root element is renamed task and the body
    element is renamed taskbody.
  • One task prerequisite and one context (both
    specializations of section) are followed by steps
    (a specialization of ordered list).
  • Each step must have a command, then optional
    info, a step example, choices, and a step result.
  • The set of steps is followed by the task result,
    examples, and any task postrequisite.
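
A minimal sketch of a task with the parts in the order listed above. The procedure itself is invented for illustration; the element names (prereq, context, steps, step, cmd, info, stepxmp, choices, stepresult, result, example, postreq) come from the DITA task type.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE task PUBLIC "-//OASIS//DTD DITA Task//EN" "task.dtd">
<!-- Hypothetical task: the procedure is illustrative only -->
<task id="publish_help">
  <title>Publishing a Help system</title>
  <taskbody>
    <prereq>Your topics and map validate.</prereq>
    <context>Do this whenever the documentation changes.</context>
    <steps>
      <step>
        <!-- Every step starts with a command -->
        <cmd>Open the map in your editor.</cmd>
        <info>The map lists every topic in the deliverable.</info>
      </step>
      <step>
        <cmd>Choose an output format.</cmd>
        <stepxmp>For example, choose Eclipse Help.</stepxmp>
        <choices>
          <choice>XHTML for the web</choice>
          <choice>PDF for print</choice>
        </choices>
        <stepresult>The editor shows the build options for that format.</stepresult>
      </step>
    </steps>
    <result>The Help files appear in the output folder.</result>
    <example>A run might produce a folder named out.</example>
    <postreq>Review the output before you distribute it.</postreq>
  </taskbody>
</task>
```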

47
The Reference Type
  • The reference type specializes topic element names and
    topic structure.
  • The root element is renamed reference and the
    body element is renamed refbody.
  • The refbody includes a properties element (a
    specialization of simpletable): a three-column
    table of property types, values, and
    descriptions.
  • The element refsyn (reference syntax) is a
    specialization of the section element.
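
A minimal sketch of a reference topic with a syntax section and a properties table. The command and its options are hypothetical; the element names come from the DITA reference type.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE reference PUBLIC "-//OASIS//DTD DITA Reference//EN" "reference.dtd">
<!-- Hypothetical reference: the widget command does not exist -->
<reference id="widget_settings">
  <title>Widget settings</title>
  <refbody>
    <refsyn>widget --color name --size number</refsyn>
    <properties>
      <!-- Optional header row for the three columns -->
      <prophead>
        <proptypehd>Type</proptypehd>
        <propvaluehd>Value</propvaluehd>
        <propdeschd>Description</propdeschd>
      </prophead>
      <property>
        <proptype>color</proptype>
        <propvalue>red, green, or blue</propvalue>
        <propdesc>Sets the color of the widget.</propdesc>
      </property>
      <property>
        <proptype>size</proptype>
        <propvalue>1 to 10</propvalue>
        <propdesc>Sets the size of the widget.</propdesc>
      </property>
    </properties>
  </refbody>
</reference>
```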

48
Thank you.
  • Contact Bob Doyle
  • editor_at_cmsreview.com
  • bobdoyle_at_skybuilders.com
  • Read my EContent articles
  • www.econtentmag.com/About/AboutAuthor.aspx?AuthorID=155
  • Please join DITA Users
  • www.ditausers.org/membership/how_to_join
  • Merlin lives!
  • www.theelectronicwizard.com
  • This presentation is online at
  • www.ditausers.org/users/bobdoyle/DocTrainEast2007.ppt

49
DITA Users Network 2007
  • DITA Blog
  • DITA Infocenter
  • DITA News
  • DITA Newsletter
  • DITA Tutor
  • DITA Users
  • DITA Wiki

50
DITA Report - November
  • Coming November 2007
  • Based on my XML Editors Review
  • Marketplace analysis
  • Vendors and Products Evaluated
  • Strategies from 1 to 100s of writers
  • Online tour of authoring tools

51
XML Editors
  • Altova XML Spy
  • Cladonia Exchanger
  • Stylus Studio
  • SyncRO Soft <oXygen/>
  • Adobe FrameMaker
  • Arbortext Editor
  • XMetal Author
  • Syntext Serna

Eight top XML editors were studied, chosen from
65 in the CMS Review editor listings. Published
in the June issue of EContent Magazine. Extended
version: the XML Editors Report.
52
Which Editors Do You Use?
  • A quick poll of your experience

53
The XML Editors Report
  • Personal use license
  • Corporate license
  • One year of release versions
  • Online consulting included
  • Screen share to look at interfaces

54
CM Pros Best Practices
55
CMS Trends
  • Open Source (and Open Documents)
  • Online (ASPs and Web Services)
  • Offshore? (Globalization)
  • Enabling technologies (XML, Javascript)
  • AJAX, Web 2.0

56
Information Architecture and Content Management.
  • Two Kinds of Information Architecture
  • IA of document sets, books in a library, a
    website, the World Wide Web: organization,
    cataloging, metadata tagging, accessibility,
    findability.
  • IA of a single document - page structure, allowed
    navigation elements and reusable content
    components.

57
Defining Content Management
  • What is a CM System?
  • What Is Content Management?
  • What Is Content?

58
What is a CM System?
  • It is humans using computers and software to
    assist in managing content.
  • It has two main parts:
  • The user interface.
  • The database (content repository).
  • Everything else is magic middleware.
  • It helps manage the content lifecycle.

59
What Is Content Management?
  • Content management is the whole process from
    creation and capture of original content to the
    delivery of different versions to many publishing
    channels
  • Print
  • Web
  • Cellphone
  • Etc.

60
The Content Lifecycle
  • 7 stages
  • Organize
  • Rules
  • Create
  • Storage
  • Assembly
  • Publish
  • Archive
  • Context
  • Users
  • Content

61
Brown Television (BTV)
  • Doug Liman

62
Hi-8 Users Group
  • Funded Videomaker Magazine, Hi-8 Group became
    Desktop Video Group in 1992

63
HRTV and Quad Sound
  • Harvard-Radcliffe Film Workshop was in the
    basement of Holmes Hall (North/Pforzheimer House)
    where the old Radcliffe Radio Station and Morse
    Music Library were located. In the mid-80s it
    became HRTV and the radio broadcast booth and
    adjoining sound rooms became Quad Sound Studios.

64
CMS Review
65
Other CMS Review Sites
  • CMS Forum
  • CMS Wiki
  • CMSML
  • CMS News
  • CMS Calendar
  • CMS Glossary
  • CMS Boston
  • Memography
  • Open Internet Lexicon
  • TaxoTips
  • List-2-Web

66
CMS Review Glossary
67
Finding a CMS
  • The CMSML project at CMS Review and CM Pros

Click compare to get the results below...
Select two CMS or enter search terms to find
CMS that match your criteria. The directory is a
faceted classification scheme.
68
CM Professionals
  • Nearly 1000 members in 2006
  • Website (7/10 Google PageRank)
  • Benefits - Mail, Member Directory
  • Glossary, Resource Library, Calendar
  • Communities - CMSML, DITA, Global
  • News, Blog aggregation
  • Globalization, Personalization

69
CM Professionals
70
CM Pros Member Directory
71
CM Pros Calendar
72
CM Pros Videos
  • Eighty hours of video from Gilbane Conferences,
    IA Summit, OSCOM, Bloggercons at Harvard.

Bob Boiko interviews Shino
73
CM Pros Communities
  • CMS Markup Language
  • (and Faceted CMS Directory)
  • Globalization website in 10 languages
  • (translations by volunteers)
  • DITA
  • (JoAnn Hackos, Scott Abel, others)

74
DITA Island
  • Second Life meetings on DITA