The Universities - PowerPoint PPT Presentation

Provided by: edd1

1
The Universities Collection Databases

The Universities Collection Databases denotes
all databases developed by the Unit for digital
documentation (DOK) at the Arts Faculty,
University of Oslo.
The databases contain data from archaeology,
anthropology, botany, zoology, numismatics,
history, art history and lexicography.
The databases are accessible via specially
developed end-user applications and via the WWW.

2
The Universities' Collection databases

This presentation gives an overview of:
A common user interface
Samples from some of the databases

3
Implementation

The databases are implemented in Oracle 8.1.7,
without using any specific object-oriented features
The object types (and the table structures) are
defined in a common meta database
All databases are accessed via a common framework
The common framework gets design and structure
information from the meta database. All queries
are generated automatically on the basis of the
information in the meta database.
Each user is granted access via a user database
The user interface program checks the meta
database for new versions of modules and upgrades
itself automatically via the net.
New databases are added regularly
A WWW version is being developed
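The slides do not show how queries are derived from the meta database. The following is a minimal sketch under stated assumptions. It uses a hypothetical metadata layout (`META`, which maps an object type to its table and searchable columns) and uses SQLite in place of Oracle. The hypothetical generator `build_query` checks the requested columns against the metadata and emits a parameterized SELECT.

```python
import sqlite3

# Hypothetical meta-database content: for each object type, the table that
# stores it and the columns a query form may search on and display.
META = {
    "artifact": {"table": "artifact", "columns": ["id", "artifact_type", "county"]},
}


def build_query(object_type, criteria):
    """Generate a parameterized SELECT for object_type from the metadata.

    criteria maps column names to required values (combined with AND).
    Returns the SQL text and the list of bind values.
    """
    meta = META[object_type]
    for column in criteria:
        # Only columns declared in the metadata may appear in the SQL text.
        if column not in meta["columns"]:
            raise ValueError(f"unknown column {column!r} for {object_type!r}")
    sql = f"SELECT {', '.join(meta['columns'])} FROM {meta['table']}"
    if criteria:
        sql += " WHERE " + " AND ".join(f"{column} = ?" for column in criteria)
    return sql, list(criteria.values())


if __name__ == "__main__":
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE artifact (id INTEGER, artifact_type TEXT, county TEXT)")
    con.executemany(
        "INSERT INTO artifact VALUES (?, ?, ?)",
        [(1, "ring", "Akershus"), (2, "sword", "Akershus"), (3, "ring", "Hedmark")],
    )
    sql, params = build_query("artifact", {"artifact_type": "ring", "county": "Akershus"})
    print(sql)
    print(con.execute(sql, params).fetchall())  # [(1, 'ring', 'Akershus')]
```

The values are bound as parameters rather than spliced into the SQL string, which keeps user input out of the query text. In Oracle the same idea would use named bind variables such as `:county`.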

4
The users have their personal navigator for quick
access to databases of interest.
Each database has an associated object type
5
The users can add their own folders or categories
to the navigator
6
Choose a database (archaeological artifacts and
finds)
Search for the artifact type 'ring'
7
Click on a column title to sort the result grid
8
Drag and drop a column title to group the rows in
the grid
9 rings found in the county 'Akershus'
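Sorting on a clicked column and grouping on a dragged column can both be sketched over plain rows. This is only an illustration. It assumes result rows are dictionaries keyed by hypothetical column names, because the slides do not describe the actual grid component.

```python
from operator import itemgetter


def sort_rows(rows, column, descending=False):
    """Return the rows sorted on one column (a click on the column title)."""
    return sorted(rows, key=itemgetter(column), reverse=descending)


def group_rows(rows, column):
    """Group rows by their value in `column` (a column title dragged to the group area).

    Groups appear in the order in which their first row appears in `rows`.
    """
    groups = {}
    for row in rows:
        groups.setdefault(row[column], []).append(row)
    return groups


# Hypothetical result rows from a search for rings.
rows = [
    {"id": 1, "artifact_type": "ring", "county": "Akershus"},
    {"id": 2, "artifact_type": "ring", "county": "Hedmark"},
    {"id": 3, "artifact_type": "ring", "county": "Akershus"},
]
print({county: len(group) for county, group in group_rows(rows, "county").items()})
# {'Akershus': 2, 'Hedmark': 1}
```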
9
Double click to view detailed information (show
the object viewer)
10
The artifacts found together with the selected
ring (in the same find event)
11
The users can export the data as HTML, Excel or
according to their own predefined report
templates
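The HTML export can be sketched with the same dictionary-shaped rows. This minimal version renders a table and escapes every cell, so characters such as `<` and `&` in the data cannot break the markup. The function name `rows_to_html` is hypothetical and is not the system's export routine.

```python
from html import escape


def rows_to_html(rows, columns):
    """Render rows (dicts) as a minimal HTML table, escaping headers and cell text."""
    header = "".join(f"<th>{escape(column)}</th>" for column in columns)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(row[column]))}</td>" for column in columns) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{header}</tr>{body}</table>"


print(rows_to_html([{"artifact_type": "ring", "county": "Akershus"}], ["artifact_type", "county"]))
```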
12
The result grid exported to Excel
13
The users can define report templates
14
Drag and drop result rows onto a predefined
report template of the corresponding object type
to create a report
15
The report is ready to be printed
16
Click to save pointers to selected rows in the
result grid
A list can hold pointers to a manually selected
set of objects or to a dynamic (query-defined) set.
The pointers can be of a single object type or
of different types. In the latter case, the type
of the list will be their common supertype
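A mixed list takes the common supertype of its members. That type can be found by walking each member type's ancestor chain and keeping the most specific type shared by all of them. The sketch below assumes single inheritance and a hypothetical hierarchy (`PARENT`). The real object types are defined in the meta database and are not shown in the slides.

```python
# Hypothetical type hierarchy: each object type maps to its parent (None for the root).
PARENT = {
    "object": None,
    "museum_object": "object",
    "artifact": "museum_object",
    "coin": "museum_object",
    "place_name": "object",
}


def ancestors(type_name):
    """Return the chain from type_name up to the root, most specific first."""
    chain = []
    while type_name is not None:
        chain.append(type_name)
        type_name = PARENT[type_name]
    return chain


def common_supertype(types):
    """Return the most specific type that every given type is, or inherits from."""
    types = list(types)
    candidates = ancestors(types[0])
    for other in types[1:]:
        shared = set(ancestors(other))
        # Keep only ancestors of the first type that are also ancestors of this one.
        candidates = [c for c in candidates if c in shared]
    # With a single root, the list is never empty; its first element is the most specific.
    return candidates[0]


print(common_supertype(["artifact", "coin"]))        # museum_object
print(common_supertype(["artifact", "place_name"]))  # object
```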
17
Click on the list icon to see the content of a
list. In the system a stored list is just a (sub)
database and can be queried.
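A stored list can behave like a queryable sub-database if it stores only pointers (object ids) and joins them back to the object table when queried. A query-defined list can simply be a view. The sketch below uses SQLite with hypothetical table and view names. It is not the system's actual Oracle schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE artifact (id INTEGER PRIMARY KEY, artifact_type TEXT, county TEXT);
    INSERT INTO artifact VALUES (1, 'ring', 'Akershus'), (2, 'sword', 'Akershus'),
                                (3, 'ring', 'Hedmark');

    -- A manually selected list stores only pointers (object ids).
    CREATE TABLE stored_list (list_name TEXT, object_id INTEGER REFERENCES artifact(id));
    INSERT INTO stored_list VALUES ('my_rings', 1), ('my_rings', 3);

    -- A dynamic (query-defined) list can be a view over the object table.
    CREATE VIEW akershus_finds AS SELECT * FROM artifact WHERE county = 'Akershus';
""")


def query_list(con, list_name, county):
    """Query a stored list like a sub-database: resolve its pointers, then filter."""
    sql = """
        SELECT a.id, a.artifact_type, a.county
        FROM stored_list AS l JOIN artifact AS a ON a.id = l.object_id
        WHERE l.list_name = ? AND a.county = ?
        ORDER BY a.id
    """
    return con.execute(sql, (list_name, county)).fetchall()


print(query_list(con, "my_rings", "Hedmark"))  # [(3, 'ring', 'Hedmark')]
print(con.execute("SELECT id FROM akershus_finds WHERE artifact_type = 'ring'").fetchall())  # [(1,)]
```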
18
Additional pointers can be added to an
existing list
19
Click on the explorer icon to get an overview of
users and data sources (databases and stored
lists)
20
Click to see both the result grid and the object
viewer
Select another database (here place names
excerpts)
21
Click to switch windows
Display the object corresponding to the next row
in the result grid
22
The users can create and store their personal
result grid design
The tree structure reflects the structure of the
object type as defined in the meta database
23
The users can create and store their personal
query form design
24
Linguistic and lexicographic applications

Lexicographic archives
Lexical databases
Dictionary databases
Editing tools
The Meta Dictionary - a tool for the field
linguist or lexicographer
The Norwegian Dictionary project
Text corpus tools

25
Lexical archives

The database for the traditional word slip
collection of the Norwegian Dictionary project
Main collection: 2,900,000 facsimiles
Regional collection: 187,000 facsimiles
The database is linked to the Meta Dictionary

Head word
Part of speech
Literature references
Place of utterance
Facsimiles

27
Morphological databases

Lists with lemmata and inflected forms for the
two Norwegian written languages (bokmål, nynorsk)
Basis for a two-level morpho-syntactic tagger
Produced in collaboration with the Text
Laboratory at the Arts Faculty, Univ. of Oslo
Bokmål: 156,000 lemmata, 1.2 million inflected
forms
Nynorsk: 123,000 lemmata, 896,000 inflected forms
The databases are linked to the Meta Dictionary

28
Lemma
Paradigm codes and generated inflected forms
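Generating inflected forms from a lemma and a paradigm code can be sketched as a table lookup. The codes `m1` and `n1` and their suffix tables are hypothetical. They cover only regular nouns that inflect by adding suffixes. Real paradigms also need stem changes and many more codes.

```python
# Hypothetical paradigm codes: each maps to the suffixes for
# (indefinite singular, definite singular, indefinite plural, definite plural).
PARADIGMS = {
    "m1": ("", "en", "ar", "ane"),  # e.g. bil: bil, bilen, bilar, bilane
    "n1": ("", "et", "", "a"),      # e.g. hus: hus, huset, hus, husa
}


def inflect(lemma, paradigm_code):
    """Generate the inflected forms of a noun lemma by appending its paradigm's suffixes."""
    return [lemma + suffix for suffix in PARADIGMS[paradigm_code]]


print(inflect("bil", "m1"))  # ['bil', 'bilen', 'bilar', 'bilane']
```

For example, `inflect("hus", "n1")` yields hus, huset, hus, husa.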
29
Dictionary databases

Database tools for two major Norwegian
dictionaries
The entire process from editing to camera-ready
manuscript
The tools are integrated in the common framework
The manuscripts are linked to the Meta Dictionary

30
The dictionary entry
31
Fields for different information categories
Graphical representation of the definition
structure
For the time being, the editing tools are not part
of the common framework
A WYSIWYG presentation of the entry
32
The entries can be viewed in their running
context
The program generates the head word part of the
entry based on the lemma and part of speech
marking
Navigation buttons
33

A set of entries (or the entire manuscript) can
be typeset as PDF and presented on the
screen.

The entries are exported from the database as XML
documents, converted via TeX and DVI to PDF, and sent
back to the user.
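The first two steps of this pipeline, XML export and conversion to TeX, can be sketched with Python's standard ElementTree. The element names (`entry`, `lemma`, `pos`, `def`) and the plain-TeX layout are assumptions. Running TeX and converting DVI to PDF would be done by external tools and is not shown.

```python
import xml.etree.ElementTree as ET

# Characters with a special meaning in TeX that can be escaped with a backslash.
# (Backslash, ~ and ^ need other treatment and are not handled in this sketch.)
TEX_SPECIALS = {c: "\\" + c for c in "&%$#_{}"}


def tex_escape(text):
    return "".join(TEX_SPECIALS.get(c, c) for c in text)


def entry_to_xml(lemma, pos, definitions):
    """Export one dictionary entry as an XML string (hypothetical element names)."""
    entry = ET.Element("entry")
    ET.SubElement(entry, "lemma").text = lemma
    ET.SubElement(entry, "pos").text = pos
    for definition in definitions:
        ET.SubElement(entry, "def").text = definition
    return ET.tostring(entry, encoding="unicode")


def xml_to_tex(xml_text):
    """Convert an exported entry to plain TeX: bold headword, italic part of speech, numbered senses."""
    entry = ET.fromstring(xml_text)
    parts = [r"{\bf %s} {\it %s}" % (tex_escape(entry.findtext("lemma")), tex_escape(entry.findtext("pos")))]
    for number, definition in enumerate(entry.findall("def"), start=1):
        parts.append(r"{\bf %d} %s" % (number, tex_escape(definition.text or "")))
    return " ".join(parts)


xml_text = entry_to_xml("hus", "n", ["building to live in", "household"])
print(xml_text)
print(xml_to_tex(xml_text))
```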

34
The Norwegian Dictionary

A national dictionary project (nynorsk)
To be completed in 12 volumes by 2014
DOK is developing the software solutions
The dictionary manuscript is linked to the Meta
Dictionary

35
Graphical representation of the entry
The full text based on the structure of the entry
36
Each part of the dictionary entry has its own
data entry form
Data entry form for the head word part
The entry text is updated continuously and automatically
37
Skard's dictionary

Defines the 1938 orthography
32,000 entries
The dictionary is linked to the Meta Dictionary

38
(No Transcript)
39
The Meta Dictionary

A tool for systematising weakly normalized
languages and a tool for the development of the
Norwegian Dictionary (NO2014)
Interlinks different lexical databases
521,000 headwords (NO2014)
The backbone of the NO2014 project

40
924 slips about the word hus (house)
41
Word forms/lemmata written in different dialects
and/or according to changing orthographies
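Variant spellings can be linked to one normalized headword, and that headword can in turn point to entries in several lexical resources. This can be sketched with two mappings. The class, the resource names and the entry ids below are hypothetical. The pair mjelk/mjølk is used only as an illustrative variant.

```python
from collections import defaultdict


class MetaDictionary:
    """Hypothetical sketch: normalized headwords that link variant forms and resource entries."""

    def __init__(self):
        self.variant_to_headword = {}   # dialect or older-orthography form -> headword
        self.links = defaultdict(list)  # headword -> [(resource, entry_id), ...]

    def add_variant(self, variant, headword):
        self.variant_to_headword[variant] = headword

    def link(self, headword, resource, entry_id):
        self.links[headword].append((resource, entry_id))

    def lookup(self, form):
        """Resolve a form to its headword and return the headword with its linked entries."""
        headword = self.variant_to_headword.get(form, form)
        # .get() avoids creating empty entries in the defaultdict for unknown words.
        return headword, list(self.links.get(headword, []))


md = MetaDictionary()
md.add_variant("mjelk", "mjølk")
md.link("mjølk", "slips", 101)       # hypothetical entry ids
md.link("mjølk", "skard_1938", 7)
result = md.lookup("mjelk")          # ('mjølk', [('slips', 101), ('skard_1938', 7)])
```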
42
Word compound analysis
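Compound analysis can be sketched as a recursive search for ways to split a word into known words. The parts may be separated by a linking element, as in dagsverk = dag + s + verk. The mini-lexicon and the set of linking elements are assumptions for illustration. A real analyser would also need to rank competing analyses.

```python
LEXICON = {"hus", "dyr", "dag", "verk", "bok", "hylle"}  # hypothetical mini-lexicon
LINKERS = ("", "s", "e")  # assumed linking elements between compound parts


def split_compound(word, lexicon=LEXICON, min_len=2):
    """Return every analysis of `word` as a list of lexicon words, with linkers shown as -s-.

    min_len is the shortest first part tried, and the least material left after it.
    """
    analyses = [[word]] if word in lexicon else []
    for i in range(min_len, len(word) - min_len + 1):
        head, rest = word[:i], word[i:]
        if head not in lexicon:
            continue
        for linker in LINKERS:
            if not rest.startswith(linker):
                continue
            # Analyse whatever follows the linker recursively (it may itself be a compound).
            for tail in split_compound(rest[len(linker):], lexicon, min_len):
                analyses.append([head] + ([f"-{linker}-"] if linker else []) + tail)
    return analyses


print(split_compound("dagsverk"))  # [['dag', '-s-', 'verk']]
```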
43
Object viewer according to the type of lexical
resource (here slips)
Links to other lexical resources
44
Tool for fast normalization of the head words in
the Meta Dictionary
Each project assistant has to normalize 300
entries a day
All links are manually checked
45
Norwegian (Nynorsk) electronic text corpus

Background
Editorial requirements for NO2014
Design and implementation
Unit for digital documentation, DOK
Work began in August 2002 and will continue
for one year, according to the tasks assigned to
the unit by NO2014
daniel.ridings_at_muspro.uio.no

46
Norwegian (Nynorsk) electronic text corpus: Long-term goals

The definitive corpus for New Norwegian for
lexicography and for other domains using
electronic resources
A corpus access system that can be reused for
other languages and text collections
Incorporation of robust methods from
computational linguistics with the goal of
creating a linguistic workbench, over and above a
corpus workbench

47
Norwegian (Nynorsk) electronic text corpus: Application Area

Editorial work within NO2014
Headword selection
Choice of examples
Examples are catalogued in the Meta Dictionary
Sense division
Firth: "You shall know a word by the company it keeps."
Aided by the refined collection of examples
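Studying the company a word keeps usually starts from a keyword-in-context (KWIC) concordance. The following is a minimal sketch. It assumes simple regex tokenization and lower-casing, which is not the corpus system's actual tokenizer. Only exact form matches are hits, so huset is not found when searching for hus. Matching all inflected forms of a lemma would require the morphological databases.

```python
import re


def kwic(text, keyword, width=3):
    """Keyword-in-context lines: up to `width` tokens of context on each side of every hit."""
    tokens = re.findall(r"\w+", text.lower())  # naive tokenizer: runs of word characters
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines


lines = kwic("Dei bygde eit nytt hus ved sjøen. Huset var raudt.", "hus")
# lines == ['bygde eit nytt [hus] ved sjøen huset']
```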

48
Norwegian (Nynorsk) electronic text corpus: Integration with the Meta Dictionary

Excerpta refined by
Methods from computational linguistics
Human interaction
Eventually a selection will be made for
publication, but in the framework of the Meta
Dictionary, even those that were excluded from
publication will remain available for other
application areas
Communication with the editing software through
the Meta Dictionary

49
Norwegian (Nynorsk) electronic text corpus: Design

Representative corpus based on specifications
produced by the EU language resources project,
LE-PAROLE
SGML markup in accordance with PAROLE's
specifications, based on TEI
One-to-one mapping between the PAROLE format and
a database structure defined in Oracle.
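A one-to-one mapping between markup and tables can be illustrated by storing each element as one row (id, parent id, tag, attributes, text, tail) and then rebuilding the tree from the rows. This sketch assumes the PAROLE SGML has first been converted to well-formed XML, which Python's ElementTree requires. The element names in the sample are illustrative. The slides do not describe the real Oracle schema.

```python
import xml.etree.ElementTree as ET


def to_rows(xml_text):
    """Flatten a document into one row per element: (id, parent_id, tag, attrs, text, tail).

    Rows are produced in document order, so every parent row precedes its children.
    """
    rows = []

    def walk(elem, parent_id):
        node_id = len(rows) + 1
        rows.append((node_id, parent_id, elem.tag, dict(elem.attrib), elem.text, elem.tail))
        for child in elem:
            walk(child, node_id)

    walk(ET.fromstring(xml_text), None)
    return rows


def from_rows(rows):
    """Rebuild the document from its rows (the inverse of to_rows)."""
    elems, root = {}, None
    for node_id, parent_id, tag, attrs, text, tail in rows:
        elem = ET.Element(tag, attrs)
        elem.text, elem.tail = text, tail
        elems[node_id] = elem
        if parent_id is None:
            root = elem
        else:
            elems[parent_id].append(elem)  # children arrive in document order
    return ET.tostring(root, encoding="unicode")


doc = '<p id="p1"><s n="1">Dei bygde <w pos="N">hus</w>.</s></p>'
for row in to_rows(doc):
    print(row)
```

Keeping each element's tail text (the text after its closing tag) in the row is what makes the mapping reversible for mixed content such as the full stop after the `w` element.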

50
Norwegian (Nynorsk) electronic text corpus: Status

25,000,000 words
Dag og Tid (newspaper): 21,000,000 words
Legacy data: approx. 5,000,000 words (literature)
Existing agreements
Weekly deliveries from Dag og Tid
Samlaget (publishing house)
Syn og Segn (monthly magazine)

51
Norwegian (Nynorsk) electronic text corpus: The next steps

Application for access through the web, last
quarter of 2002
Balancing the domains covered by the corpus
(continuous)
Stand-alone Windows application
Continuous incorporation of computational
linguistics methods for phrase identification and
extraction
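The slides do not name the phrase-extraction methods. One commonly used technique, swapped in here purely for illustration, ranks adjacent word pairs by pointwise mutual information, PMI(x, y) = log2(P(x, y) / (P(x) P(y))). It keeps only pairs seen at least `min_count` times, because PMI overrates rare pairs. The tokenizer and threshold are assumptions.

```python
import math
import re
from collections import Counter


def collocations(text, min_count=2):
    """Rank adjacent word pairs by PMI, keeping pairs seen at least min_count times."""
    tokens = re.findall(r"\w+", text.lower())
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    scored = []
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / (n - 1)  # n tokens give n - 1 adjacent pairs
        p_w1, p_w2 = unigrams[w1] / n, unigrams[w2] / n
        scored.append(((w1, w2), math.log2(p_pair / (p_w1 * p_w2))))
    # Highest PMI first: pairs whose words rarely occur apart rank above pairs with frequent words.
    return sorted(scored, key=lambda item: item[1], reverse=True)


print(collocations("syn segn og kaffi og kaffi syn segn og og")[0][0])  # ('syn', 'segn')
```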
