Automatic Summarisation for Systematic Reviews using Text Mining - PowerPoint PPT Presentation



1
Automatic Summarisation for Systematic Reviews using Text Mining
  • Brian Rea
  • National Centre for Text Mining
  • School of Computer Science
  • University of Manchester

2
What is NaCTeM
  • The 1st national text mining centre in the world
    www.nactem.ac.uk
  • Location: Manchester Interdisciplinary Biocentre
    (MIB) www.mib.ac.uk
  • Remit: provision of text mining services to
    support UK research
  • Funded by JISC, BBSRC and EPSRC

3
What is ASSERT?
  • Expansion of NaCTeM into the social sciences
  • Development of a component-based framework to
    assist with systematic reviews
  • Provision of an exemplar service for document
    summarisation
  • Customisation of tools for subject-specific
    collections
  • Dissemination of the benefits of text mining
    within the social science disciplines
  • www.nactem.ac.uk/assert

4
Systematic Reviews
  • First, extensive searches are carried out in
    order to locate as much relevant research as
    possible according to a query.
  • Then the mass of data retrieved by this process
    is screened until only the most relevant and
    reliable literature remains to form the focus of
    the review.
  • Finally, the literature is synthesised and
    summary reports are written to inform policy and
    practice by helping users of the research to make
    evidence-informed decisions.

5
Searching
  • A combination of web crawling and information
    retrieval systems, allowing iteratively deeper
    searches
  • Terminology Management to discover key concepts
    for later stages and to improve search criteria
  • Clustering Techniques to categorise and collate
    documents on similar subtopics
  • Query expansion techniques to widen the search
  • Visualisation to allow for improved usability and
    access to documents
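The slides do not say which clustering algorithm ASSERT uses to collate documents on similar subtopics. As a minimal sketch, assume spherical k-means over TF-IDF vectors, with whitespace tokenisation and the first k documents as seeds; the function names here are hypothetical.

```python
import math
from collections import Counter

def normalise(vec):
    """Scale a sparse vector (dict) to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}

def tfidf_vectors(docs):
    """Whitespace-tokenise each document and return unit-length TF-IDF vectors."""
    tokenised = [doc.lower().split() for doc in docs]
    n = len(tokenised)
    df = Counter(term for toks in tokenised for term in set(toks))
    vectors = []
    for toks in tokenised:
        tf = Counter(toks)
        # smoothed IDF, so terms present in every document keep a small weight
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        vectors.append(normalise(vec))
    return vectors

def cosine(a, b):
    # both vectors are unit length, so the dot product is the cosine similarity
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def kmeans(docs, k, iterations=10):
    """Return a cluster label (0..k-1) for each document."""
    vectors = tfidf_vectors(docs)
    centroids = [dict(v) for v in vectors[:k]]  # assumption: seed with the first k documents
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # assignment step: each document joins its most similar centroid
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors]
        # update step: each centroid becomes the normalised sum of its members
        for c in range(k):
            total = Counter()
            for v, label in zip(vectors, labels):
                if label == c:
                    total.update(v)
            if total:
                centroids[c] = normalise(total)
    return labels

docs = [
    "eu treaty member states vote",
    "obesity diabetes diet study",
    "eu treaty referendum britain",
    "diabetes obesity children exercise",
]
print(kmeans(docs, k=2))  # → [0, 1, 0, 1]
```

Seeding from the first k documents keeps the example deterministic; a production system would more likely use randomised or k-means++ seeding and a richer tokeniser.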

6
Screening
  • Multiple document views with term highlighting
    for a quick overview of a text
  • Interactive topic maps for identifying subject
    relations
  • Interactive filtering by metadata queries
  • Customisable exclusion criteria to assist with
    the definition of review scope
  • Semi-automatic filtering by assisted
    classification
  • Summarisation techniques to identify significant
    sections of each document

7
Synthesis
  • Multi-Document Summarisation techniques to assist
    with comprehension of the subtopics.
  • Interactive evidence clusters for quick access to
    important document collection subsets
  • Evidence retrieval and reference through existing
    systems to assist in report generation
  • Commented log files and note facilities to
    integrate with existing methodology

8
Text Mining Solutions
9
Term Extraction
10
Document Clustering
11
Cluster Visualisation
12
Query Expansion
  • Required for the discovery of potentially unknown
    documents as part of a systematic review
  • Leverage the cluster results and term
    significance scores
  • Three levels of expansion:
      • Document
      • Cluster
      • Relevant Collection
  • Bias searches towards documents discussing topics
    with a similar significance
  • Allows the user to explore the wider collection or
    narrow the focus to achieve the overall goal
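The three expansion levels can be sketched by treating each level as a set of document IDs: a single document, the members of one cluster, or the whole relevant collection. The sketch assumes each document already carries a dictionary of term significance scores (for instance C-Value scores); `expand_query` and the data layout are hypothetical, not ASSERT's API.

```python
from collections import defaultdict

def expand_query(query_terms, term_scores, scope, n=3):
    """Append the n most significant new terms found in the documents in `scope`.

    term_scores: {doc_id: {term: significance}}  (hypothetical structure)
    scope: doc IDs at the chosen level (document, cluster or relevant collection)
    """
    totals = defaultdict(float)
    for doc_id in scope:
        for term, score in term_scores.get(doc_id, {}).items():
            totals[term] += score
    existing = {t.lower() for t in query_terms}
    candidates = [t for t in totals if t.lower() not in existing]
    # highest aggregate significance first; alphabetical order breaks ties
    candidates.sort(key=lambda t: (-totals[t], t))
    return list(query_terms) + candidates[:n]

scores = {
    "d1": {"eu treaty": 5.0, "referendum": 3.0, "britain": 1.0},
    "d2": {"eu treaty": 4.0, "member states": 2.5},
    "d3": {"obesity": 6.0},
}
cluster_members = {"CL2-2": ["d1", "d2"]}

print(expand_query(["eu treaty"], scores, ["d1"], n=2))                    # document level
print(expand_query(["eu treaty"], scores, cluster_members["CL2-2"], n=2))  # cluster level
print(expand_query(["eu treaty"], scores, scores.keys(), n=1))             # whole collection
```

Widening the scope from one document to a cluster to the full collection changes which terms win, which is the trade-off between exploring the wider collection and narrowing the focus.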

13
Similarity Scores
[Figure: similarity score components: C-Value, Document Boost, Field Boost, Field Length Normalisation]
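The slide lists C-Value alongside document boost, field boost and field-length normalisation. The last three names match the boost and norm factors of Lucene's classic scoring, although the slide does not say how they were combined. C-Value is the termhood measure of Frantzi, Ananiadou and Mima: for a candidate term a with |a| words and frequency f(a), C-value(a) = log2|a| · f(a) if a is not nested in a longer candidate; otherwise C-value(a) = log2|a| · (f(a) − (1/P(T_a)) Σ_{b∈T_a} f(b)), where T_a is the set of longer candidates containing a and P(T_a) is their number. The sketch below computes only that score; candidate extraction (linguistic filters, stop lists) is assumed to have happened already.

```python
import math

def contains(longer, shorter):
    """True if `shorter` occurs as a contiguous word sequence inside `longer`."""
    n, m = len(longer), len(shorter)
    return m < n and any(longer[i:i + m] == shorter for i in range(n - m + 1))

def c_value(freqs):
    """freqs: {candidate term as a tuple of words: corpus frequency}."""
    scores = {}
    for a, f_a in freqs.items():
        nesting = [b for b in freqs if contains(b, a)]  # T_a: longer terms containing a
        if nesting:
            # discount occurrences that are explained by the longer terms
            f_a = f_a - sum(freqs[b] for b in nesting) / len(nesting)
        scores[a] = math.log2(len(a)) * f_a
    return scores

freqs = {
    ("text", "mining"): 10,
    ("text", "mining", "centre"): 4,
    ("national", "text", "mining", "centre"): 2,
}
for term, score in c_value(freqs).items():
    print(" ".join(term), round(score, 3))
```

Because log2(1) = 0, single-word candidates always score zero; the original measure is designed for multi-word terms.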
14
Document Filtering
  • Similarity does not necessarily equate to
    relevance
  • Exclusion criteria are defined during the review
    design phase to explicitly lay out what is not to
    be included
  • Examples include:
      • Date, country or organisation involved in the
        study
      • Population or research methodology
      • Type of document: review, intermediary report,
        update
      • Related but unimportant topics
  • Powerful system for defining and storing filters
    which can be rerun with each harvest iteration
  • Use metadata or lexical queries in addition to
    document classification techniques
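Stored filters that are rerun on every harvest can be sketched as named predicates over document metadata. The filter codes, metadata fields and example documents below are hypothetical; they only illustrate one metadata query, one document-type criterion and one lexical query.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Metadata = Dict[str, Any]

@dataclass(frozen=True)
class ExclusionFilter:
    code: str                             # hypothetical exclusion code, e.g. "EX1"
    description: str
    excludes: Callable[[Metadata], bool]  # True means the document is excluded

def apply_filters(harvest: Dict[str, Metadata],
                  filters: List[ExclusionFilter]) -> Dict[str, List[str]]:
    """Return, for each document in one harvest round, the codes of the filters it triggers."""
    return {doc_id: [f.code for f in filters if f.excludes(meta)]
            for doc_id, meta in harvest.items()}

# Hypothetical criteria, written once at review-design time and rerun on every harvest.
FILTERS = [
    # metadata query; a missing year counts as excluded
    ExclusionFilter("EX1", "Published before 2000",
                    lambda m: m.get("year", 0) < 2000),
    ExclusionFilter("EX2", "Secondary document (review or update)",
                    lambda m: m.get("type") in {"review", "update"}),
    # lexical query on the title
    ExclusionFilter("EX3", "Related but out-of-scope topic",
                    lambda m: "football" in m.get("title", "").lower()),
]

harvest = {
    "14726544": {"year": 2007, "type": "news", "title": "EU treaty 'in Britain's interest'"},
    "00000001": {"year": 1998, "type": "review", "title": "Football and fitness"},  # made-up record
}
print(apply_filters(harvest, FILTERS))
# → {'14726544': [], '00000001': ['EX1', 'EX2', 'EX3']}
```

Keeping every triggered code, rather than stopping at the first, preserves the reasons for exclusion for the review log.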

15
Document Classification
  • Supervised machine learning, trained on subsets
    of documents from the wider collection
  • Automatically identifies the key features that
    define and distinguish between classes
  • Used in ASSERT to predict relevance to the goal and
    to individual topic clusters
  • Can provide additional information relating to
    why a document should be used
  • Accuracy improves with larger training sets
  • Therefore benefits from iterative retraining
    following manual decisions
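The slides do not name the classifier used in ASSERT. As a stand-in, the sketch below uses multinomial naive Bayes with add-one smoothing: `predict` assigns a class, and `top_features` lists the terms that most favour a class, which is one simple way to report why a document was flagged. The class and method names are hypothetical.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing over whitespace tokens."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.term_counts[label].update(doc.lower().split())
        self.vocab = {t for counts in self.term_counts.values() for t in counts}
        return self

    def _log_prob(self, counts, term):
        # smoothed log P(term | class) for a class with the given term counts
        return math.log((counts[term] + 1) / (sum(counts.values()) + len(self.vocab)))

    def predict(self, doc):
        total = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total)  # class prior
            for term in doc.lower().split():
                if term in self.vocab:        # words never seen in training are ignored
                    score += self._log_prob(self.term_counts[label], term)
            scores[label] = score
        return max(scores, key=scores.get)

    def top_features(self, label, n=3):
        """Terms that most favour `label` over all other classes combined."""
        others = Counter()
        for other_label, counts in self.term_counts.items():
            if other_label != label:
                others.update(counts)
        own = self.term_counts[label]
        ratio = {t: self._log_prob(own, t) - self._log_prob(others, t) for t in self.vocab}
        return sorted(self.vocab, key=lambda t: (-ratio[t], t))[:n]

train = ["eu treaty referendum", "eu treaty member states",
         "obesity diet study", "diabetes obesity exercise"]
labels = ["relevant", "relevant", "not relevant", "not relevant"]
model = NaiveBayes().fit(train, labels)
print(model.predict("treaty vote in britain"))   # → relevant
print(model.predict("obesity in children"))      # → not relevant
print(model.top_features("relevant", n=2))       # → ['eu', 'treaty']
```

Retraining with `fit` after each round of manual screening decisions is the simplest way to get the iterative improvement the slide describes.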

16
Exclusion 1
Cluster A
Exclusion 2
Cluster B
Cluster N
Exclusion X
Other
Other
Relevant
Not Relevant
17
Cycles of Exploration
18
Summarisation
  • Sentence extraction based upon significant terms
    within a document
  • Redundant sentences are removed from a ranked
    list
  • Users can specify the length of the summary
    according to individual requirements
  • Sectioning allows the user to focus on important
    parts of the document: methodology, results or
    conclusions
  • Multi-document summaries can be constructed with
    results presented in chronological order
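The single-document steps above (score sentences by significant terms, walk the ranked list, drop redundant sentences, stop at a user-chosen length) can be sketched as follows. Term significance is approximated here by plain in-document frequency and redundancy by word-set Jaccard overlap; both are assumptions, as are the stop list and function names.

```python
import re
from collections import Counter

# Hypothetical stop list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "by", "for", "on", "was", "it"}

def tokens(text):
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def summarise(text, length=2, max_overlap=0.5):
    """Extract up to `length` sentences, skipping near-duplicates, in document order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    weights = Counter(tokens(text))  # assumption: significance = in-document frequency

    def score(sentence):
        toks = tokens(sentence)
        return sum(weights[t] for t in toks) / len(toks) if toks else 0.0

    # rank sentences by score; sorted() is stable, so ties keep document order
    ranked = sorted(range(len(sentences)), key=lambda i: -score(sentences[i]))
    chosen = []
    for i in ranked:
        if len(chosen) == length:
            break
        words = set(tokens(sentences[i]))
        # skip a sentence that mostly repeats one already chosen
        if all(jaccard(words, set(tokens(sentences[j]))) <= max_overlap for j in chosen):
            chosen.append(i)
    return [sentences[i] for i in sorted(chosen)]

text = ("The EU treaty was agreed by member states. "
        "The EU treaty was agreed by all member states. "
        "Ministers will debate the treaty in Britain. "
        "The weather was mild.")
print(summarise(text, length=2))
# → ['The EU treaty was agreed by member states.', 'Ministers will debate the treaty in Britain.']
```

The second sentence ranks highly but is dropped as redundant, so the next distinct sentence takes its place.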

19
(No Transcript)
20
Decision Logging
  • Document ID: 14726544
  • First Introduced: Harvest Round 3
  • Document Similarity: 14726543, 14761123, 14626567
  • Cluster Similarity: CL1-2: 3, CL2-2: 54, CLO-2: 23
  • Relevant Similarity: 98
  • Excluded: Analysis Round 3
  • Manual Exclusion: EX3 by Reviewer BR
  • Predicted exclusion: 78
  • Reviewer Comments: <blank>
  • Comments: <blank>
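One way to hold the fields shown on this slide is a small record type serialised alongside each harvest round. The sketch assumes JSON storage and uses its own field names; the slide does not state the units of the similarity and prediction figures, so they are stored as plain numbers.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional

@dataclass
class DecisionLogEntry:
    """Hypothetical record mirroring the decision-log fields on the slide."""
    document_id: str
    first_introduced_round: int
    similar_documents: List[str] = field(default_factory=list)
    cluster_similarity: Dict[str, float] = field(default_factory=dict)
    relevant_similarity: Optional[float] = None
    excluded_round: Optional[int] = None
    manual_exclusion: Optional[str] = None   # exclusion code, e.g. "EX3"
    reviewer: Optional[str] = None
    predicted_exclusion: Optional[float] = None
    reviewer_comments: str = ""
    comments: str = ""

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "DecisionLogEntry":
        return cls(**json.loads(text))

entry = DecisionLogEntry(
    document_id="14726544",
    first_introduced_round=3,
    similar_documents=["14726543", "14761123", "14626567"],
    cluster_similarity={"CL1-2": 3, "CL2-2": 54, "CLO-2": 23},
    relevant_similarity=98,
    excluded_round=3,
    manual_exclusion="EX3",
    reviewer="BR",
    predicted_exclusion=78,
)
print(entry.to_json())
```

A plain, text-based record like this keeps every screening decision auditable and easy to export with the rest of the review.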

21
Collection Export
  <collection id="review3-selectedOnly">
    <cluster id="CL2-2" name="EU Treaty">
      <document id="14726544">
        <title>EU treaty 'in Britain's interest'.</title>
        <summary>The treaty agreed by EU member states is ...
        <content>The treaty agreed by EU member states is ...
        <lastUpdated>15th August 2007 13:41 GMT</lastUpdated>
      </document>
      <document id="14626945">
        <title>UK-Portugal talks to focus on EU</title>
        <summary>Portuguese PM Jose Socrates ...
        <content>Prime Minister Jose Socrates ...
        <lastUpdated>17th August 2007 03:29 GMT</lastUpdated>
      </document>
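The export on this slide nests documents inside clusters inside a collection. A minimal sketch of producing that layout with Python's standard `xml.etree.ElementTree` follows; the input dictionary layout and the `export_collection` name are assumptions, not ASSERT's API. ElementTree escapes special characters in text and attributes automatically.

```python
import xml.etree.ElementTree as ET

def export_collection(collection_id, clusters):
    """clusters: list of dicts with 'id', 'name' and 'documents' (hypothetical layout)."""
    root = ET.Element("collection", id=collection_id)
    for cluster in clusters:
        cluster_el = ET.SubElement(root, "cluster", id=cluster["id"], name=cluster["name"])
        for doc in cluster["documents"]:
            doc_el = ET.SubElement(cluster_el, "document", id=doc["id"])
            # child elements in the order shown on the slide; missing fields stay empty
            for tag in ("title", "summary", "content", "lastUpdated"):
                ET.SubElement(doc_el, tag).text = doc.get(tag, "")
    return ET.tostring(root, encoding="unicode")

xml_text = export_collection("review3-selectedOnly", [
    {"id": "CL2-2", "name": "EU Treaty", "documents": [
        {"id": "14726544",
         "title": "EU treaty 'in Britain's interest'.",
         "lastUpdated": "15th August 2007 13:41 GMT"},
    ]},
])
print(xml_text)
```

Generating the file with an XML library rather than string concatenation guarantees well-formed output that other review tools can import.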

22
Development
  • Developed with iterative prototype methodology
  • Examination of Agile development methods
  • Implemented as Java Servlets to ensure a
    component-based system for reuse and portability
  • Presentation separated from the processing model
    to allow multiple user interfaces over a single API
  • To be released as an integrated solution and
    workflow for systematic reviews
  • Aim to provide the core components as web services
    by the end of the project

23
Deployment Domains
  • Mental Health Rehabilitation
  • Walking and Cycling Schemes
  • BBC News Feeds
  • Obesity and Diabetes
  • Open source journals

24
Demonstration
ASSERT News Browser
25
Benefits to Users
  • Provision of a focused search with goal-based
    results
  • Allows expansion beyond known keywords for a more
    complete search
  • Visualisation of a result set creates an overview
    of the research in the domain
  • Integration into existing research workflows
    through import/export capabilities
  • Save time and effort

26
Issues and Barriers
  • Limited availability of large collections of
    cross-disciplinary documents
  • Customisation required for new document types
  • Perceived difficulty of integrating text mining
    into existing work practices
  • A black-box approach does not help build trust
    in the internal program logic
  • Lack of awareness of the benefits of text mining
    as a whole in the social science community

27
What is ASSIST?
  • Additional funding from JISC to organise and
    support a community call for the social sciences.
  • 360K to be shared between two projects to
    support expansion of ASSERT and related tools for
    other social science research.
  • Specific focus on return on investment (RoI) from
    assistive technologies and significant benefit to
    existing research.
  • Chosen projects:
      • Frame Analysis in Media
      • Education Evidence Portal

28
More information
  • www.nactem.ac.uk/assert
  • brian.rea_at_manchester.ac.uk
  • Visit us at booth 35