The Universities - PowerPoint PPT Presentation

About This Presentation
Title:

The Universities

Description:

Defines the 1938 orthography. 32 000 entries. The dictionary is linked to the Meta Dictionary ... different dialects and/or according to changing orthographies ...

Slides: 52
Provided by: edd1

Transcript and Presenter's Notes

Title: The Universities


1
The Universities' Collection Databases
  • The Universities' Collection Databases denotes
    all databases developed by the Unit for Digital
    Documentation at the Arts Faculty, University of
    Oslo.
  • The databases contain data from archaeology,
    anthropology, botany, zoology, numismatics,
    history, history of art and lexicography.
  • The databases are accessible via specially
    developed end user applications and via the WWW.

2
The Universities' Collection databases
  • This presentation gives an overview of:
  • A common user interface
  • Samples from some of the databases

3
Implementation
  • The databases are implemented in Oracle 8.1.7,
    not using any specific object-oriented features
  • The object types (and the table structures) are
    defined in a common meta database
  • All databases are accessed via a common framework
  • The common framework gets design and structure
    information from the meta database. All queries
    are generated automatically on the basis of the
    information in the meta database.
  • Each user is granted access via a user database
  • The user interface program checks the meta
    database for new versions of modules and upgrades
    itself automatically via the net.
  • New databases are added regularly
  • A WWW version is being developed
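
The idea of generating every query from the meta database can be illustrated with a minimal sketch. It is not the framework's actual code: the ObjectType class, the build_select function and the table and column names are all hypothetical, chosen only to show how a parameterised query could be derived from stored structure information.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectType:
    """One object type as it might be described in a meta database (hypothetical)."""
    name: str
    table: str
    columns: tuple[str, ...]


def build_select(obj_type: ObjectType, criteria: dict[str, object]) -> tuple[str, dict[str, object]]:
    """Generate a parameterised SELECT from the metadata, never from raw user text."""
    unknown = set(criteria) - set(obj_type.columns)
    if unknown:
        raise ValueError(f"unknown columns for {obj_type.name}: {sorted(unknown)}")
    sql = f"SELECT {', '.join(obj_type.columns)} FROM {obj_type.table}"
    if criteria:
        # Named bind variables (:name) keep the search values out of the SQL text.
        where = " AND ".join(f"{col} = :{col}" for col in criteria)
        sql += f" WHERE {where}"
    return sql, dict(criteria)


# Hypothetical metadata for an artifact object type.
ARTIFACT = ObjectType("artifact", "artifact", ("id", "artifact_type", "county"))

sql, binds = build_select(ARTIFACT, {"artifact_type": "ring", "county": "Akershus"})
print(sql)
print(binds)
```

Because the column list and table name come from the metadata, adding a new database in this sketch would only require a new ObjectType record, not new query code.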

4
The users have their personal navigator for quick
access to databases of interest.
Each database has an associated object type
5
The users can add their own folders or categories
to the navigator
6
Choose a database (archaeological artifacts and
finds)
Search for the artifact type ring
7
Click on a column title to sort the result grid
8
Drag and drop a column title to group the rows in
the grid
9 rings found in the county 'Akershus'
9
Double click to view detailed information (show
the object viewer)
10
The artifacts found together with the selected
ring (in the same find event)
11
The users can export the data as HTML, Excel or
according to the user's predefined report
templates
12
The result grid exported to Excel
13
The users can define report templates
14
Drag and drop result rows onto a predefined
report template of the corresponding object type
to create a report
15
The report is ready to be printed
16
Click to save pointers to selected rows in the
result grid
A list can hold pointers to a manually selected
set of objects or to a dynamic set (defined by a
query). The pointers can all be of a single object
type or of different types. In the latter case the
type of the list will be the common supertype.
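
How a list of mixed pointers could resolve to a common supertype is sketched below. The type hierarchy and the function names are assumptions made for illustration; the slides do not show how the meta database stores the hierarchy.

```python
# Hypothetical type hierarchy: child type -> parent type (None marks the root).
PARENT = {
    "object": None,
    "artifact": "object",
    "ring": "artifact",
    "coin": "artifact",
    "place_name": "object",
}


def ancestors(type_name):
    """Return the type itself followed by its supertypes, nearest first."""
    chain = []
    while type_name is not None:
        chain.append(type_name)
        type_name = PARENT[type_name]
    return chain


def common_supertype(type_names):
    """Return the most specific type shared by all the given types."""
    type_names = list(type_names)
    if not type_names:
        raise ValueError("an empty list has no type")
    candidates = ancestors(type_names[0])
    for name in type_names[1:]:
        others = set(ancestors(name))
        # Keep only the ancestors shared so far, preserving nearest-first order.
        candidates = [t for t in candidates if t in others]
    return candidates[0]


print(common_supertype(["ring", "coin"]))
print(common_supertype(["ring", "place_name"]))
```

The sketch relies on the hierarchy having a single root, so that any two types share at least one ancestor.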
17
Click on the list icon to see the content of a
list. In the system a stored list is just a (sub)
database and can be queried.
18
Additional pointers can be added to an
existing list
19
Click on the explorer icon to get an overview of
users and data sources (databases and stored
lists)
20
Click to see both the result grid and the object
viewer
Select another database (here place names
excerpts)
21
Click to switch windows
Display the object corresponding to the next row
in the result grid
22
The users can create and store their personal
result grid design
The tree structure reflects the structure of the
object type as defined in the meta database
23
The users can create and store their personal
query form design
24
Linguistic and lexicographic applications
  • Lexicographic archives
  • Lexical databases
  • Dictionary databases
  • Editing tools
  • The Meta Dictionary - a tool for the field
    linguist or lexicographer
  • The Norwegian Dictionary project
  • Text corpus tools

25
Lexical archives
  • The database for the traditional word slip
    collection of the Norwegian Dictionary project
  • Main collection 2 900 000 facsimiles
  • Regional collection 187 000 facsimiles
  • The database is linked to the Meta Dictionary

26
  • Head word
  • Part of speech
  • Literature references
  • Place of utterance
  • Facsimiles

27
Morphological databases
  • Lists with lemmata and inflected forms for the
    two Norwegian written languages (bokmål, nynorsk)
  • Basis for a two-level morpho-syntactic tagger
  • Produced in collaboration with the Text
    Laboratory at the Arts Faculty, Univ. of Oslo
  • Bokmål: 156,000 lemmata, 1.2 million inflected
    forms
  • Nynorsk: 123,000 lemmata, 896,000 inflected forms
  • The databases are linked to the Meta Dictionary

28
Lemma
Paradigm codes and generated inflected forms
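
Generating inflected forms from a lemma and a paradigm code can be sketched as follows. The codes "m1" and "n1" and the table layout are hypothetical, and plain suffixation is a simplification: real paradigms also need to handle stem changes and alternative forms.

```python
# Hypothetical paradigm table: code -> (feature label, suffix) pairs.
PARADIGMS = {
    # Regular masculine noun pattern in bokmål, e.g. bil, bilen, biler, bilene.
    "m1": (("sg indef", ""), ("sg def", "en"), ("pl indef", "er"), ("pl def", "ene")),
    # Monosyllabic neuter noun pattern in bokmål, e.g. hus, huset, hus, husene.
    "n1": (("sg indef", ""), ("sg def", "et"), ("pl indef", ""), ("pl def", "ene")),
}


def inflect(lemma, paradigm_code):
    """Return {feature label: inflected form} for a lemma and a paradigm code."""
    return {label: lemma + suffix for label, suffix in PARADIGMS[paradigm_code]}


forms = inflect("hus", "n1")
for label, form in forms.items():
    print(f"{label:9} {form}")
```

Storing only the lemma and its code, and generating the forms on demand, keeps the lemma list small while still covering the full set of inflected forms.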
29
Dictionary databases
  • Database tools for two major Norwegian
    dictionaries
  • The entire process from editing to camera-ready
    manuscript
  • The tools are integrated in the common framework
  • The manuscripts are linked to the Meta Dictionary

30
The dictionary entry
31
Fields for different information categories
Graphical representation of the definition
structure
The editing tools are, for the time being, not
part of the common framework
A WYSIWYG presentation of the entry
32
The entries can be viewed in their running
context
The program generates the head word part of the
entry based on the lemma and part of speech
marking
Navigation buttons
33
  • A set of entries (or the entire manuscript) can
    be typeset in the PDF format and presented on the
    screen.
  • The entries are exported from the database as XML
    documents, converted via TeX and DVI to PDF, and
    sent back to the user.
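
The first step of that pipeline, turning an exported XML entry into TeX source, might look like the sketch below. The element names (entry, headword, pos, definition) and the output layout are assumptions for illustration only; the actual export schema is not shown in the slides, and the later TeX, DVI and PDF steps are left to external tools.

```python
import xml.etree.ElementTree as ET

# Characters that must be escaped before text is placed in TeX source.
TEX_SPECIALS = {
    "\\": r"\textbackslash{}", "&": r"\&", "%": r"\%", "$": r"\$",
    "#": r"\#", "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def tex_escape(text):
    """Escape TeX special characters one character at a time."""
    return "".join(TEX_SPECIALS.get(ch, ch) for ch in text)


def entry_to_tex(xml_text):
    """Convert one (hypothetical) XML dictionary entry to a line of LaTeX-style markup."""
    entry = ET.fromstring(xml_text)
    head = tex_escape(entry.findtext("headword", default=""))
    pos = tex_escape(entry.findtext("pos", default=""))
    definition = tex_escape(entry.findtext("definition", default=""))
    return f"\\textbf{{{head}}} \\textit{{{pos}}} {definition}\\par"


SAMPLE = (
    "<entry><headword>hus</headword><pos>n.</pos>"
    "<definition>house &amp; home</definition></entry>"
)
print(entry_to_tex(SAMPLE))
```

Escaping character by character avoids re-escaping the braces that the replacement strings themselves introduce.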

34
The Norwegian Dictionary
  • A national dictionary project (nynorsk)
  • To be finished in 12 volumes by the year 2014
  • DOK is developing the software solutions
  • The dictionary manuscript is linked to the Meta
    Dictionary

35
Graphical representation of the entry
The full text based on the structure of the entry
36
Each part of the dictionary entry has its own
data entry form
Data entry form for the head word part
The entry text is updated continuously and
automatically
37
Skard's dictionary
  • Defines the 1938 orthography
  • 32 000 entries
  • The dictionary is linked to the Meta Dictionary

38
(No Transcript)
39
The Meta Dictionary
  • A tool for systematising weakly normalized
    languages and a tool for the development of the
    Norwegian Dictionary (NO2014)
  • Interlinks different lexical databases
  • 521 000 headwords (NO2014)
  • The backbone in the (NO2014) project

40
924 slips about the word hus (house)
41
Word forms/lemmata written in different dialects
and/or according to changing orthographies
42
Word compound analysis
43
Object viewer according to the type of lexical
resource (here slips)
Links to other lexical resources
44
Tool for fast normalization of the head words in
the Meta Dictionary
Each project assistant has to normalize 300
entries a day
All links are manually checked
45
Norwegian (Nynorsk) electronic text corpus
  • Background
  • Editorial requirements for NO2014
  • Design and implementation
  • Unit for digital documentation, DOK
  • Work began in August 2002 and will continue for
    one year, according to the tasks assigned to the
    unit by NO2014
  • daniel.ridings_at_muspro.uio.no

46
Norwegian (Nynorsk) electronic text corpus:
Long-term goals
  • The definitive corpus for New Norwegian for
    lexicography and for other domains using
    electronic resources
  • A corpus access system that can be reused for
    other languages and text collections
  • Incorporation of robust methods from
    computational linguistics with the goal of
    creating a linguistic workbench, over and above a
    corpus workbench

47
Norwegian (Nynorsk) electronic text corpus:
Application area
  • Editorial work within NO2014
  • Headword selection
  • Choice of examples
  • Examples are catalogued in the Meta Dictionary
  • Sense division
  • Firth: knowing a word by the company it keeps.
  • Aided by the refined collection of examples
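
Firth's principle is commonly put into practice by counting the words that occur near a headword in the corpus. The sketch below is a toy illustration, not the corpus system described here: the collocates function, the window size and the English sample text are all assumptions.

```python
import re
from collections import Counter


def collocates(text, node, window=2):
    """Count the words found within `window` tokens of each occurrence of `node`."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            lo = max(0, i - window)
            # Neighbours to the left and right, excluding the node word itself.
            neighbours = tokens[lo:i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts


# Toy text invented for the example; it is not corpus data.
TEXT = "The old house stood by the road. A red house by the sea. The house was old."
counts = collocates(TEXT, "house")
print(counts.most_common(3))
```

Frequent neighbours that differ between groups of examples are one signal an editor could use when dividing a headword into senses.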

48
Norwegian (Nynorsk) electronic text corpus:
Integration with the Meta Dictionary
  • Excerpta refined by
  • Methods from computational linguistics
  • Human interaction
  • Eventually a selection will be made for
    publication, but in the framework of the Meta
    Dictionary, even those that were excluded from
    publication will remain available for other
    application areas
  • Communication with the editing software through
    the Meta Dictionary

49
Norwegian (Nynorsk) electronic text corpus: Design
  • Representative corpus based on specifications
    produced by the EU language resources project,
    LE-PAROLE
  • SGML markup in accordance with PAROLE's
    specifications, based on TEI
  • One-to-one mapping between the PAROLE format and
    a database structure defined in Oracle.

50
Norwegian (Nynorsk) electronic text corpus: Status
  • 25,000,000 words
  • Dag og Tid (newspaper)
  • 21,000,000 words
  • Legacy data
  • approx. 5,000,000 words of literature
  • Existing agreements
  • Weekly deliveries from Dag og Tid
  • Samlaget (publishing house)
  • Syn og Segn (monthly magazine)

51
Norwegian (Nynorsk) electronic text corpus: The
next steps
  • Application for access through the web, last
    quarter of 2002
  • Balancing the domains covered by the corpus:
    continuous
  • Stand-alone Windows application
  • Continuous incorporation of computational
    linguistics methods for phrase identification and
    extraction,