1
INIST Machine-Aided Indexing
Abdelmajid Khayari, Stéphane Schneider
INIST/CNRS, France
NFAIS Forum, New York. April 22, 2005
2
  • INIST
  • Institute for Scientific and Technical Information
  • A service of the French CNRS.
  • Activities: collection, analysis and
    dissemination of the results and findings of
    worldwide research.
  • Fields covered: science, technology, medicine,
    humanities and social sciences.
  • Leading scientific and technical document
    supplier in France.
  • Producer of multilingual, multidisciplinary
    bibliographic databases (PASCAL, FRANCIS and
    ISD) covering the core of the worldwide
    scientific literature.

INIST Machine-Aided Indexing. NFAIS forum. New
York. April 22, 2005.
3
  • INIST (continued)
  • Provider of customized services to the
    scientific community (portals, current
    awareness, training, etc.)
  • Partner in open access initiatives.
  • Research partner in the NTIC (information and
    communication technologies) community.

4
  • Aims and scope
  • Introducing a degree of automation into the
    indexing process
  • To what extent can the process be automated?
  • Which approach is suitable?
  • What are the prerequisites?
  • Evaluation of the final result
  • Is there support from the indexers?
  • Does it meet the expectations of the indexers?
  • Are the results acceptable?

5
  • Current indexing practices
  • About 70 internal and external specialized
    indexers.
  • Documents in diverse languages (mainly English,
    French, Spanish, German).
  • Semi-manual assignment of descriptors and
    classification codes.
  • Use of controlled vocabularies and
    classification schemes, supplemented with free
    keywords.
  • Multilingual descriptors.

6
  • Workstation
  • In-house platform, accessed by INIST indexers
    via our intranet since 2000.
  • Setup and fine-tuning of indexing programs.
  • Collaborative work on terminology resources.
  • About 2000 input records processed each night.
  • Fully automated indexing for periodicals that
    are not manually analyzed (May 2004).

7
  • Two indexing programs
  • Lexical method: uses equivalence rules gathered
    in subject terminological resources to assign
    descriptors and classification codes to
    documents.
  • Statistical approach (lexical collocation): a
    two-stage process.
  • Training stage: a corpus of human-indexed
    citations is processed to create association
    dictionaries.
  • Indexing stage: using the association
    dictionaries, controlled vocabulary descriptors
    and classification codes are assigned to the
    incoming documents.
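
The two-stage statistical approach can be illustrated with a minimal sketch. The slides do not say which association measure is used, so the code below assumes a simple conditional frequency (how often a descriptor was assigned when a word appeared in a human-indexed citation). The function names, the threshold and the toy citations are all hypothetical.

```python
from collections import Counter, defaultdict


def train(citations):
    """Training stage: build an association dictionary word -> {descriptor: score}.

    `citations` is a list of (text, descriptors) pairs taken from
    human-indexed records. The score is an assumed measure: the share of
    records containing the word that also carry the descriptor.
    """
    word_counts = Counter()
    pair_counts = defaultdict(Counter)
    for text, descriptors in citations:
        for word in set(text.lower().split()):
            word_counts[word] += 1
            for descriptor in descriptors:
                pair_counts[word][descriptor] += 1
    return {
        word: {d: n / word_counts[word] for d, n in descs.items()}
        for word, descs in pair_counts.items()
    }


def index(text, associations, threshold=0.5):
    """Indexing stage: propose descriptors for an incoming document."""
    words = set(text.lower().split())
    known = [w for w in words if w in associations]
    if not known:
        return []
    scores = Counter()
    for word in known:
        for descriptor, score in associations[word].items():
            scores[descriptor] += score
    # Rank by total score, then alphabetically so the output is deterministic
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    # Keep descriptors whose average score over the known words is high enough
    return [d for d, total in ranked if total / len(known) >= threshold]


# Toy human-indexed corpus (hypothetical data)
CITATIONS = [
    ("chronic pain in rats", ["Pain", "Animal"]),
    ("pain management in adults", ["Pain", "Human"]),
    ("rats and motor control", ["Animal", "Motor control"]),
]

associations = train(CITATIONS)
print(index("pain rats", associations))
```

A production system would lemmatize the text and use a more robust association measure; the point here is only the split between a training pass over validated records and an indexing pass over new ones.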

8
  • Lexical method
  • Text processing of bibliographical records
    (titles, abstracts, author keywords):
  • Parsing text into phrases and phrases into
    words.
  • Lemmatization.
  • Matching with subject terminology resources:
  • Searching for terms that correspond to
    descriptors and classification codes.
  • Searching intervals for compound terms (2 to 6
    words).
  • Ordering of candidate keywords and
    classification codes: semantic categories are
    used to construct the indexing grid.
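
The processing chain above (parsing, lemmatization, matching) can be sketched as follows. This is an illustration under stated assumptions, not the INIST program: the terminology resource and the lemma table are hypothetical toy data, and the "searching interval" is read here as candidate word sequences of up to six consecutive words within a phrase.

```python
import re

# Hypothetical terminology resource: lemmatized term -> descriptor
TERMINOLOGY = {
    ("rat",): "Rat",
    ("pointing", "task"): "Pointing task",
    ("motor", "control"): "Motor control",
}

# Toy lemma table standing in for a real lemmatizer (assumption)
LEMMAS = {"rats": "rat", "tasks": "task", "controls": "control"}


def lemmatize(word):
    """Return the lemma of a word, or the word itself if unknown."""
    return LEMMAS.get(word, word)


def parse(text):
    """Split text into phrases, then each phrase into lemmatized words."""
    phrases = re.split(r"[.;:,!?]+", text.lower())
    return [
        [lemmatize(w) for w in re.findall(r"[a-z0-9-]+", phrase)]
        for phrase in phrases
        if phrase.strip()
    ]


def match(text, terminology=TERMINOLOGY, max_len=6):
    """Return descriptors whose terms occur as sequences of 1..max_len words."""
    found = []
    for words in parse(text):
        for start in range(len(words)):
            for length in range(1, max_len + 1):
                candidate = tuple(words[start:start + length])
                if len(candidate) < length:
                    break  # ran past the end of the phrase
                descriptor = terminology.get(candidate)
                if descriptor and descriptor not in found:
                    found.append(descriptor)
    return found


print(match("Pointing tasks in rats. Motor control was assessed."))
```

Matching is done per phrase so that a compound term is never assembled from words on both sides of a punctuation mark.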

9
  • Lexical method (continued)
  • Generation of additional descriptors and codes:
    each keyword or code may trigger another one
    using association rules, in a cascade-like
    manner:
  • Rat -> Animal / Acropulpitis -> Finger.
  • Pointing task -> Manual task -> Motor control
  • Pointing task + Vision -> Visuomotor integration
  • Ranking of keywords and codes.
  • Filtering of the assigned elements: the number
    of occurrences for each category is used as a
    filter to set the desired number of candidate
    descriptors.
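
The cascade of association rules can be sketched as a fixed-point expansion: rules are applied repeatedly until no new descriptor appears. The rule contents come from the examples on the slide; the data structure, the function name, and the reading of the last rule as requiring both "Pointing task" and "Vision" are assumptions.

```python
# Each rule maps a set of required descriptors to the descriptor it generates.
# The last rule is assumed to need both of its conditions.
RULES = [
    (frozenset({"Rat"}), "Animal"),
    (frozenset({"Acropulpitis"}), "Finger"),
    (frozenset({"Pointing task"}), "Manual task"),
    (frozenset({"Manual task"}), "Motor control"),
    (frozenset({"Pointing task", "Vision"}), "Visuomotor integration"),
]


def expand(descriptors, rules=RULES):
    """Apply the rules until no new descriptor is generated."""
    result = list(descriptors)
    changed = True
    while changed:
        changed = False
        for required, generated in rules:
            if required <= set(result) and generated not in result:
                result.append(generated)
                changed = True
    return result


print(expand(["Pointing task", "Vision"]))
```

Because a generated descriptor can satisfy the condition of another rule, "Pointing task" leads to "Manual task" and then to "Motor control" without the second step being listed explicitly for "Pointing task".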

10
  • Lexical method (continued)
  • Check by human indexers: candidate descriptors
    and codes are validated and completed.
  • Continuous feedback on terminological resources
    is carried out in parallel, by introducing new
    equivalence rules, adding new data (synonyms)
    to existing rules, or deleting noise-producing
    rules.
  • Feedback on the indexing program (changing
    parameters and the combination of terminology
    resources).

11
  • Initial version
  • Deployment of a basic version using the two
    methods.
  • A semi-automated process is used to construct
    the subject terminology resources:
  • Bibliographic databases are used to extract
    corpora dealing with a specific subject (e.g.
    pain).
  • Corpora are processed to extract a ranked list
    of descriptors, which is run against the
    controlled vocabulary to extract synonyms,
    translations, semantic categories, etc.
  • This core thematic dictionary is enriched with
    new concepts and new data.
  • The performance of the dictionary is improved
    through an iterative indexing and re-indexing
    process.
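
The construction of a core thematic dictionary can be sketched as below. The slides do not give the ranking criterion, so this assumes descriptors are ranked by the number of corpus records that carry them; the corpus, the vocabulary entries, the `min_count` cut-off and the function names are hypothetical.

```python
from collections import Counter


def rank_descriptors(corpus):
    """Rank descriptors by the number of corpus records that carry them."""
    counts = Counter(d for record in corpus for d in set(record))
    # Most frequent first, ties broken alphabetically
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def build_core_dictionary(corpus, vocabulary, min_count=2):
    """Keep frequent descriptors and attach their controlled-vocabulary data."""
    core = {}
    for descriptor, count in rank_descriptors(corpus):
        if count >= min_count and descriptor in vocabulary:
            core[descriptor] = vocabulary[descriptor]
    return core


# Descriptors of records extracted for one subject (toy data)
CORPUS = [
    ["Pain", "Analgesic"],
    ["Pain", "Chronic"],
    ["Pain", "Analgesic", "Rat"],
]

# Toy controlled vocabulary with synonyms, translations and categories
VOCABULARY = {
    "Pain": {"synonyms": ["Ache"], "fr": "Douleur", "category": "Symptom"},
    "Analgesic": {"synonyms": ["Painkiller"], "fr": "Analgésique", "category": "Drug"},
    "Rat": {"synonyms": [], "fr": "Rat", "category": "Animal"},
}

core = build_core_dictionary(CORPUS, VOCABULARY)
print(list(core))
```

The resulting dictionary is only a starting point: as the slide notes, it is then enriched by hand and refined over successive indexing and re-indexing passes.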

12
  • Initial version (continued)
  • Sharing of thematic resources ("Someone has the
    dictionary of diseases or of geographical names
    I need").
  • Access to full-text articles (OCR-processed or
    obtained directly from publishers).
  • Direct feedback to administrators/developers
    incorporated.
  • Evaluation of performance on each citation or
    collection of citations: the final indexing is
    compared to the initial one proposed by the
    program.

13
  • Evaluation
  • Indexers' support for MAI was not easy to
    obtain.
  • Strong psychological reluctance at the
    beginning ("the machine will never be able to
    perform a highly intellectual task:
    abstraction").
  • The crucial need to formalize specialist
    knowledge is becoming well understood.
  • Many concerns about fully automated indexing,
    since the reference standard is human-produced
    indexing (i.e. a candidate descriptor that is
    not inaccurate per se will be counted as wrong
    if the human indexer did not include it in the
    final record).

14
  • Evaluation (continued)
  • The lexical method is predominantly used by
    indexers.
  • The statistical one is used mainly for
    determining classification codes.
  • Statistics are obtained by comparing machine
    indexing with the final record after human
    revision.
  • Performance improves with the degree of
    improvement of terminology resources (in pilot
    subject fields, up to 80% accurate candidate
    descriptors can be obtained).
  • Unsuccessful machine indexing triggers feedback
    on computer programs and on terminology
    resources.
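
The comparison between machine indexing and the final record can be sketched as a set comparison. The slides do not define how accuracy is computed; this sketch assumes it is the share of machine candidates kept in the final record (precision), and also reports the share of final descriptors the machine had proposed (recall). Function names and sample descriptors are hypothetical.

```python
def evaluate(machine, final):
    """Compare machine-proposed descriptors with the human-validated record."""
    machine, final = set(machine), set(final)
    kept = machine & final
    return {
        "kept": sorted(kept),                # proposed and validated
        "rejected": sorted(machine - final), # proposed, removed by the indexer
        "added": sorted(final - machine),    # missed, added by the indexer
        "precision": len(kept) / len(machine) if machine else 0.0,
        "recall": len(kept) / len(final) if final else 0.0,
    }


def evaluate_collection(pairs):
    """Share of kept candidates over a collection of (machine, final) pairs."""
    kept = proposed = 0
    for machine, final in pairs:
        kept += len(set(machine) & set(final))
        proposed += len(set(machine))
    return kept / proposed if proposed else 0.0


report = evaluate(
    ["Pain", "Rat", "Animal", "Finger", "Vision"],
    ["Pain", "Rat", "Animal", "Finger", "Analgesic"],
)
print(report)
```

The "rejected" and "added" lists are the ones that would feed back into the terminology resources: rejected candidates point to noise-producing rules, added descriptors to missing ones.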

15
  • Evaluation (continued)
  • During the deployment phase, time is not always
    saved, because feedback on terminology
    resources is time-consuming.
  • Nevertheless, benefits are real in terms of:
  • Indexing consistency (less intra- and
    inter-individual variation).
  • Indexers' expertise and knowledge, acquired
    during the abstracting and indexing processes,
    are integrated into an organizational resource
    (knowledge capitalization and sharing).

16
  • Future trends
  • Improvement of indexing programs:
  • Improvement of textual pattern extraction
    (genes, etc.)
  • Introduction of advanced natural language
    processing:
  • Extraction of concepts.
  • Extraction of relationships between concepts.
  • Improvement of citation pre-classification, in
    order to be able to assign the right
    combination of subject resources.
  • Creation of a unified terminology database.
