Title: The Universities
1. The Universities' Collection Databases
- "The Universities' Collection Databases" denotes all databases developed by the Unit for digital documentation at the Arts Faculty, University of Oslo.
- The databases contain data from archaeology, anthropology, botany, zoology, numismatics, history, art history and lexicography.
- The databases are accessible via specially developed end-user applications and via the WWW.
2. The Universities' Collection Databases
- This presentation gives an overview of
  - a common user interface
  - samples from some of the databases
3. Implementation
- The databases are implemented in Oracle 8.1.7, without using any specific object-oriented features.
- The object types (and the table structures) are defined in a common meta database.
- All databases are accessed via a common framework.
- The common framework gets design and structure information from the meta database. All queries are generated automatically on the basis of the information in the meta database.
- Each user is granted access via a user database.
- The user interface program checks the meta database for new versions of modules and upgrades itself automatically via the net.
- New databases are added regularly.
- A WWW version is being developed.
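The slide states that all queries are generated automatically from the meta database. A minimal sketch of that idea follows, assuming a hypothetical meta-database record (`ObjectType`) that maps an object type to a table and its searchable columns; `build_query` and the artifact column names are illustrative, not the actual DOK schema. Only columns declared in the meta description are accepted, and values go into Oracle-style bind variables rather than into the SQL text.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectType:
    """Hypothetical meta-database description of one object type."""
    name: str
    table: str
    columns: tuple[str, ...]  # columns that may be displayed and searched


def build_query(obj_type: ObjectType, criteria: dict[str, str]) -> tuple[str, list[str]]:
    """Generate a parameterized SELECT from the meta description.

    Only columns declared in the meta database are accepted, so the
    generated SQL never contains identifiers taken from user input.
    """
    unknown = set(criteria) - set(obj_type.columns)
    if unknown:
        raise ValueError(f"unknown columns for {obj_type.name}: {sorted(unknown)}")
    sql = f"SELECT {', '.join(obj_type.columns)} FROM {obj_type.table}"
    params: list[str] = []
    if criteria:
        # Oracle-style numbered bind variables (:1, :2, ...), one per criterion
        conditions = [f"{col} = :{i}" for i, col in enumerate(criteria, start=1)]
        sql += " WHERE " + " AND ".join(conditions)
        params = list(criteria.values())
    return sql, params


# Hypothetical object type for archaeological artifacts
artifact = ObjectType("Artifact", "artifact", ("id", "artifact_type", "county", "find_event_id"))
sql, params = build_query(artifact, {"artifact_type": "ring", "county": "Akershus"})
```

Because the query text is derived entirely from the meta description, adding a column to an object type in the meta database would make it searchable without changing the client code.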
4. The users have their personal navigator for quick access to databases of interest.
Each database has an associated object type.
5. The users can add their own folders or categories to the navigator.
6. Choose a database (archaeological artifacts and finds).
Search for the artifact type "ring".
7. Click on a column title to sort the result grid.
8. Drag and drop a column title to group the rows in the grid.
9 rings found in the county 'Akershus'
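Sorting on a column (slide 7) and grouping on a column (slide 8) can be sketched as below. The row dictionaries and column names are hypothetical sample data; in the real system the grid's columns come from the meta database.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical result rows for a search on the artifact type "ring"
rows = [
    {"id": 3, "county": "Vestfold", "material": "gold"},
    {"id": 1, "county": "Akershus", "material": "silver"},
    {"id": 2, "county": "Akershus", "material": "bronze"},
]


def sort_grid(rows, column, descending=False):
    """Sort rows on one column, as when the user clicks a column title."""
    return sorted(rows, key=itemgetter(column), reverse=descending)


def group_grid(rows, column):
    """Group rows on one column, as when a column title is dragged to the group area.

    itertools.groupby only merges adjacent equal keys, so the rows are sorted first.
    """
    ordered = sort_grid(rows, column)
    return {key: list(group) for key, group in groupby(ordered, key=itemgetter(column))}


groups = group_grid(rows, "county")
counts = {county: len(members) for county, members in groups.items()}
```

Python's sort is stable, so rows within a group keep their previous relative order.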
9. Double-click to view detailed information (shows the object viewer).
10. The artifacts found together with the selected ring (in the same find event)
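Finding the artifacts that belong to the same find event as the selected ring is a self-join on the find-event key. The sketch uses an in-memory SQLite database as a stand-in for Oracle, and the `artifact` table with its `find_event_id` column is a hypothetical schema chosen for illustration.

```python
import sqlite3

# In-memory SQLite stands in for the Oracle database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artifact (id INTEGER PRIMARY KEY, artifact_type TEXT, find_event_id INTEGER)")
conn.executemany(
    "INSERT INTO artifact VALUES (?, ?, ?)",
    [(1, "ring", 10), (2, "brooch", 10), (3, "bead", 10), (4, "ring", 11), (5, "sword", 12)],
)


def found_together(conn, artifact_id):
    """Return the other artifacts registered in the same find event as `artifact_id`."""
    return conn.execute(
        """
        SELECT other.id, other.artifact_type
        FROM artifact AS selected
        JOIN artifact AS other ON other.find_event_id = selected.find_event_id
        WHERE selected.id = ? AND other.id <> selected.id
        ORDER BY other.id
        """,
        (artifact_id,),
    ).fetchall()


companions = found_together(conn, 1)
```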
11. The users can export the data as HTML or Excel, or according to their own predefined report templates.
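An HTML export of the result grid amounts to rendering the selected columns as a table and escaping every cell value. This is a minimal sketch with a hypothetical function name `rows_to_html` and hypothetical column names; it is not the export code used by the application.

```python
from html import escape


def rows_to_html(columns, rows):
    """Render result-grid rows as an HTML table, escaping headers and cells."""
    head = "".join(f"<th>{escape(c)}</th>" for c in columns)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(row[c]))}</td>" for c in columns) + "</tr>"
        for row in rows
    )
    return f"<table><thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>"


html_table = rows_to_html(["id", "county"], [{"id": 1, "county": "Akershus"}])
```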
12. The result grid exported to Excel
13. The users can define report templates
14. Drag and drop result rows onto a predefined report template of the corresponding object type to create a report.
15. The report is ready to be printed.
16. Click to save pointers to selected rows in the result grid.
A list can hold pointers to a manually selected set of objects or to a dynamic (query-defined) set. The pointers can be of a single object type or of different types. In the latter case the type of the list will be the common supertype.
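The rule that a list of mixed pointers takes the common supertype as its type can be sketched with Python classes standing in for object types. The hierarchy (`MuseumObject`, `Artifact`, `Ring`, `Coin`, `Specimen`) and the `PointerList` structure are hypothetical; in the real system the type hierarchy is defined in the meta database.

```python
from dataclasses import dataclass, field


# Hypothetical type hierarchy; in the real system it is defined in the meta database.
class MuseumObject:
    pass


class Artifact(MuseumObject):
    pass


class Ring(Artifact):
    pass


class Coin(Artifact):
    pass


class Specimen(MuseumObject):
    pass


def common_supertype(types):
    """Return the most specific class that every type in `types` inherits from."""
    types = list(types)
    if not types:
        return object
    # Walk the first type's MRO from most to least specific; the first class
    # that all the types inherit from is the common supertype.
    for candidate in types[0].__mro__:
        if all(issubclass(t, candidate) for t in types):
            return candidate
    return object


@dataclass
class PointerList:
    """A stored list of (object type, object id) pointers; a hypothetical structure."""
    pointers: list = field(default_factory=list)

    def add(self, obj_type, obj_id):
        self.pointers.append((obj_type, obj_id))

    @property
    def element_type(self):
        return common_supertype(t for t, _ in self.pointers)


stored = PointerList()
stored.add(Ring, 1)
stored.add(Coin, 7)
```

With a ring and a coin the list type is `Artifact`; adding a botanical or zoological specimen widens it to `MuseumObject`.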
17. Click on the list icon to see the contents of a list. In the system a stored list is just a (sub)database and can be queried.
18. Additional pointers can be added to an existing list.
19. Click on the explorer icon to get an overview of users and data sources (databases and stored lists).
20. Click to see both the result grid and the object viewer.
Select another database (here place-name excerpts).
21. Click to switch windows.
Display the object corresponding to the next row in the result grid.
22. The users can create and store their personal result grid design.
The tree structure reflects the structure of the object type as defined in the meta database.
23. The users can create and store their personal query form design.
24. Linguistic and lexicographic applications
- Lexicographic archives
- Lexical databases
- Dictionary databases
- Editing tools
- The Meta Dictionary: a tool for the field linguist or lexicographer
- The Norwegian Dictionary project
- Text corpus tools
25. Lexical archives
- The database for the traditional word-slip collection of the Norwegian Dictionary project
  - Main collection: 2 900 000 facsimiles
  - Regional collection: 187 000 facsimiles
- The database is linked to the Meta Dictionary
26.
- Headword
- Part of speech
- Literature references
- Place of utterance
- Facsimiles
27. Morphological databases
- Lists of lemmata and inflected forms for the two Norwegian written languages (bokmål, nynorsk)
- Basis for a two-level morpho-syntactic tagger
- Produced in collaboration with the Text Laboratory at the Arts Faculty, University of Oslo
- Bokmål: 156 000 lemmata, 1.2 million inflected forms
- Nynorsk: 123 000 lemmata, 896 000 inflected forms
- The databases are linked to the Meta Dictionary
28. Lemma
Paradigm codes and generated inflected forms
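Slide 28 shows a lemma together with its paradigm code and the inflected forms generated from it. A minimal sketch of suffix-based generation follows, assuming three hypothetical codes (`m1`, `n1`, `f1`) for regular Nynorsk noun classes; the real databases use their own, much richer code set and also handle irregular forms.

```python
# Hypothetical paradigm codes: (number of characters to strip from the lemma,
# suffixes for indefinite singular, definite singular, indefinite plural, definite plural).
PARADIGMS = {
    "m1": (0, ("", "en", "ar", "ane")),  # e.g. bil
    "n1": (0, ("", "et", "", "a")),      # e.g. hus
    "f1": (1, ("e", "a", "er", "ene")),  # e.g. jente (stem jent-)
}
FEATURES = ("sg indef", "sg def", "pl indef", "pl def")


def inflect(lemma, code):
    """Generate the inflected forms of `lemma` according to its paradigm code."""
    strip, suffixes = PARADIGMS[code]
    stem = lemma[:-strip] if strip else lemma
    return {feature: stem + suffix for feature, suffix in zip(FEATURES, suffixes)}


forms = inflect("hus", "n1")
```

Storing only the lemma and a code, and generating the forms, is what lets a list of some 123 000 Nynorsk lemmata expand to hundreds of thousands of inflected forms.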
29. Dictionary databases
- Database tools for two major Norwegian dictionaries
- The entire process from editing to camera-ready manuscript
- The tools are integrated in the common framework
- The manuscripts are linked to the Meta Dictionary
30. The dictionary entry
31. Fields for different information categories
Graphical representation of the definition structure
The editing tools are for the time being not part of the common framework
A WYSIWYG presentation of the entry
32. The entries can be viewed in their running context
The program generates the headword part of the entry based on the lemma and part-of-speech marking
Navigation buttons
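Generating the headword part of an entry from the lemma and its part-of-speech marking can be sketched as a lookup from part-of-speech code to the label printed after the headword. The codes in `POS_LABELS` and the output layout are hypothetical; the dictionary's actual conventions are not given in the slides.

```python
# Hypothetical part-of-speech codes and the labels printed after the headword.
POS_LABELS = {"noun_m": "m", "noun_f": "f", "noun_n": "n", "verb": "v", "adj": "adj"}


def headword_part(lemma, pos):
    """Build the headword part of an entry from the lemma and part-of-speech marking."""
    try:
        label = POS_LABELS[pos]
    except KeyError:
        raise ValueError(f"unknown part of speech: {pos}") from None
    return f"{lemma} {label}"


head = headword_part("hus", "noun_n")
```

Deriving this part rather than typing it keeps the printed headword consistent with the lemma and part-of-speech fields stored in the database.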
33.
- A set of entries (or the entire manuscript) can be typeset in PDF format and presented on the screen.
- The entries are exported from the database as XML documents, converted via TeX and DVI to PDF, and sent back to the user.
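The first two stages of that pipeline, exporting an entry as XML and turning the XML into TeX source, can be sketched with the standard library. The element names (`entry`, `headword`, `pos`, `def`), the LaTeX macros `\textbf`/`\textit`, and the sample definitions are assumptions for illustration; the DVI and PDF steps would be done by external TeX tools and are not shown.

```python
import xml.etree.ElementTree as ET

# Characters with special meaning in (La)TeX and their escaped forms.
TEX_ESCAPES = {
    "\\": r"\textbackslash{}", "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}", "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def tex_escape(text):
    """Escape each character independently, so nothing is escaped twice."""
    return "".join(TEX_ESCAPES.get(ch, ch) for ch in text)


def entry_to_xml(headword, pos, definitions):
    """Export one dictionary entry as an XML element (hypothetical element names)."""
    entry = ET.Element("entry")
    ET.SubElement(entry, "headword").text = headword
    ET.SubElement(entry, "pos").text = pos
    for text in definitions:
        ET.SubElement(entry, "def").text = text
    return entry


def xml_to_tex(entry):
    """Convert an exported entry to a LaTeX paragraph: bold headword, numbered senses."""
    head = tex_escape(entry.findtext("headword"))
    pos = tex_escape(entry.findtext("pos"))
    senses = [f"{i} {tex_escape(d.text)}" for i, d in enumerate(entry.iter("def"), start=1)]
    return f"\\textbf{{{head}}} \\textit{{{pos}}} " + "; ".join(senses)


entry = entry_to_xml("hus", "n", ["building for people to live in", "household"])
tex = xml_to_tex(entry)
```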
34. The Norwegian Dictionary
- A national dictionary project (nynorsk)
- To be finished in 12 volumes by the year 2014
- DOK is developing the software solutions
- The dictionary manuscript is linked to the Meta Dictionary
35. Graphical representation of the entry
The full text based on the structure of the entry
36. Each part of the dictionary entry has its own data entry form
Data entry form for the headword part
The entry text is updated automatically
37. Skard's dictionary
- Defines the 1938 orthography
- 32 000 entries
- The dictionary is linked to the Meta Dictionary
39. The Meta Dictionary
- A tool for systematising weakly normalized languages and a tool for the development of the Norwegian Dictionary (NO2014)
- Interlinks different lexical databases
- 521 000 headwords (NO2014)
- The backbone of the NO2014 project
40. 924 slips about the word hus (house)
41. Word forms/lemmata written in different dialects and/or according to changing orthographies
42. Word compound analysis
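One common approach to compound analysis, sketched here as an illustration of the general technique and not as the Meta Dictionary's own algorithm, is to split a word recursively into known lexicon words, allowing an optional linking element such as -s- or -e- between the parts. `LEXICON`, `LINKERS` and `MIN_PART` are hypothetical.

```python
from functools import lru_cache

# Hypothetical mini-lexicon and linking elements (Norwegian compounds use e.g. -s- and -e-).
LEXICON = frozenset({"hus", "vegg", "tak", "dag", "verk", "arbeid"})
LINKERS = ("", "s", "e")
MIN_PART = 2  # shortest substring accepted as a lexicon word


def split_compound(word):
    """Return every analysis of `word` as lexicon words joined by optional linkers."""

    @lru_cache(maxsize=None)
    def analyses(start):
        results = []
        for end in range(start + MIN_PART, len(word) + 1):
            part = word[start:end]
            if part not in LEXICON:
                continue
            if end == len(word):
                results.append((part,))
                continue
            for linker in LINKERS:
                if word.startswith(linker, end):
                    # A linker must be followed by another lexicon word.
                    for rest in analyses(end + len(linker)):
                        prefix = (part, f"-{linker}-") if linker else (part,)
                        results.append(prefix + rest)
        return tuple(results)

    return [list(analysis) for analysis in analyses(0)]


parts = split_compound("dagsverk")
```

Returning every analysis, instead of the first one found, leaves the choice between competing splits to the editor, which matches a workflow where links are checked manually.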
43. Object viewer according to the type of lexical resource (here slips)
Links to other lexical resources
44. Tool for fast normalization of the headwords in the Meta Dictionary
Each project assistant has to normalize 300 entries a day
All links are manually checked
45. Norwegian (Nynorsk) electronic text corpus
- Background
- Editorial requirements for NO2014
- Design and implementation
- Unit for digital documentation, DOK
- Work began in August 2002 and will continue for one year, according to the tasks assigned to the unit by NO2014
- daniel.ridings_at_muspro.uio.no
46. Norwegian (Nynorsk) electronic text corpus: Long-term goals
- The definitive corpus of New Norwegian for lexicography and for other domains using electronic resources
- A corpus access system that can be reused for other languages and text collections
- Incorporation of robust methods from computational linguistics, with the goal of creating a linguistic workbench over and above a corpus workbench
47. Norwegian (Nynorsk) electronic text corpus: Application area
- Editorial work within NO2014
  - Headword selection
  - Choice of examples
    - Examples are catalogued in the Meta Dictionary
  - Sense division
    - Firth: "You shall know a word by the company it keeps."
    - Aided by the refined collection of examples
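Firth's principle is usually put into practice with keyword-in-context (KWIC) lines and collocate counts around a headword. The sketch below shows both on a two-sentence Nynorsk sample; the function names, the window sizes and the sample text are illustrative assumptions, not part of the corpus tools described here.

```python
import re
from collections import Counter

SAMPLE_TEXT = "Dei bygde eit nytt hus ved sjøen. Eit gammalt hus treng nytt tak."


def tokenize(text):
    # \w is Unicode-aware in Python 3, so æ, ø and å are kept inside words.
    return re.findall(r"\w+", text.lower())


def kwic(text, keyword, width=3):
    """Keyword-in-context lines: up to `width` tokens of left and right context per hit."""
    tokens = tokenize(text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


def collocates(text, keyword, window=2):
    """Count the words occurring within `window` tokens of each hit: the company the word keeps."""
    tokens = tokenize(text)
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword:
            counts.update(tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window])
    return counts


hits = kwic(SAMPLE_TEXT, "hus", width=2)
company = collocates(SAMPLE_TEXT, "hus", window=2)
```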
48. Norwegian (Nynorsk) electronic text corpus: Integration with the Meta Dictionary
- Excerpts refined by
  - methods from computational linguistics
  - human interaction
- Eventually a selection will be made for publication, but within the framework of the Meta Dictionary even the excerpts excluded from publication will remain available for other application areas
- Communication with the editing software through the Meta Dictionary
49. Norwegian (Nynorsk) electronic text corpus: Design
- Representative corpus based on specifications produced by the EU language resources project LE-PAROLE
- SGML markup in accordance with the PAROLE specifications, based on TEI
- One-to-one mapping between the PAROLE format and a database structure defined in Oracle
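One way to make a markup-to-table mapping one-to-one is to store each element as a row holding its parent id, its position among siblings, its tag, attributes and text, so the document can be rebuilt exactly. The sketch assumes the SGML has already been normalized to well-formed XML, uses made-up element names rather than the actual PAROLE DTD, and handles only documents without tail text (mixed content).

```python
import xml.etree.ElementTree as ET

# Assumption: SGML already normalized to XML; element names are illustrative, not PAROLE's.
SAMPLE = '<text id="t1"><p n="1"><s n="1">Dei bygde eit hus.</s></p></text>'


def to_rows(root):
    """Flatten an element tree into (id, parent_id, position, tag, attrs, text) rows.

    Tail text is not stored, so the mapping is one-to-one only for documents
    without mixed content.
    """
    rows = []

    def visit(elem, parent_id, position):
        row_id = len(rows) + 1  # pre-order numbering
        rows.append((row_id, parent_id, position, elem.tag, dict(elem.attrib), elem.text))
        for pos, child in enumerate(elem):
            visit(child, row_id, pos)

    visit(root, None, 0)
    return rows


def from_rows(rows):
    """Rebuild the element tree from the rows, inverting to_rows."""
    elems = {}
    root = None
    # Pre-order ids guarantee that a parent is created before its children,
    # and that siblings are appended in their original order.
    for row_id, parent_id, _position, tag, attrs, text in sorted(rows, key=lambda r: r[0]):
        elem = ET.Element(tag, attrs)
        elem.text = text
        elems[row_id] = elem
        if parent_id is None:
            root = elem
        else:
            elems[parent_id].append(elem)
    return root


rows = to_rows(ET.fromstring(SAMPLE))
```

The `position` column is redundant for the rebuild shown here, but it would let a database query return siblings in document order without relying on id order.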
50. Norwegian (Nynorsk) electronic text corpus: Status
- 25 000 000 words
  - Dag og Tid (newspaper): 21 000 000 words
  - Legacy data: approx. 5 000 000 words of literature
- Existing agreements
  - Weekly deliveries from Dag og Tid
  - Samlaget (publishing house)
  - Syn og Segn (monthly magazine)
51. Norwegian (Nynorsk) electronic text corpus: The next steps
- Application for access through the web, last quarter of 2002
- Balancing the domains covered by the corpus (continuous)
- Stand-alone Windows application
- Continuous incorporation of computational linguistics methods for phrase identification and extraction
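A simple baseline for phrase identification is to collect word n-grams that recur in the corpus and pass the frequent ones on for review. The sketch below is that baseline only; the function name, the `min_count` threshold and the sample text are hypothetical, and it ignores sentence boundaries. Association measures such as mutual information are a common refinement.

```python
import re
from collections import Counter

SAMPLE_TEXT = "Dag og Tid kjem ut kvar veke. Dag og Tid er ei avis."


def candidate_phrases(text, n=2, min_count=2):
    """Return n-grams occurring at least `min_count` times, most frequent first.

    Simplification: n-grams may span sentence boundaries.
    """
    tokens = re.findall(r"\w+", text.lower())
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return [(" ".join(gram), count) for gram, count in grams.most_common() if count >= min_count]


bigrams = candidate_phrases(SAMPLE_TEXT)
trigrams = candidate_phrases(SAMPLE_TEXT, n=3)
```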