Machine Learning and the Semantic Web - PowerPoint PPT Presentation

1
Machine Learning and the Semantic Web
  • Hendrik Blockeel
  • Katholieke Universiteit Leuven
  • Department of Computer Science
  • Thanks: Raymond Kosala, Nico Jacobs

2
Overview
  • Machine learning and data mining
  • Relationship with semantic web
  • Synergy between both
  • Some concrete examples
  • Document classification
  • Information integration
  • Conclusions

3
Machine Learning & Data Mining
  • Related technology, different focus
  • Machine learning
  • Programs that improve their performance on
    certain tasks
  • Focus on adaptive behaviour
  • Data mining
  • Discovering implicit knowledge (regularities) in
    large amounts of data
  • Focus on handling large amounts of data
  • Very useful technology in the context of the Web

4
Learning Agents
  • Programs that
  • Learn the user's preferences
  • Make life for the user as simple as possible
  • E.g., intelligent mail reader
  • E.g., adaptive web pages
  • Move links, create direct links, ...
  • Index page synthesis (Perkowitz & Etzioni, IJCAI
    1999)
  • Learn how to find reliable information
  • E.g., learn which other people have similar
    preferences to this user, use their opinions to
    make suggestions
  • (other applications: learning to play games, ...)

6
Mining the Web
  • Analyze data that are available on the Web
  • Distinguish 3 types
  • Web content mining
  • Look in contents of documents (text, ...)
  • Web structure mining
  • Look at links between documents
  • Web usage mining
  • Look at user logs (e.g. who accessed a web page,
    which links often used, ...)

7
Web Content Mining
  • Relies on information extraction
  • E.g., in a text find keywords, ...
  • Techniques from machine learning, statistics, ...
    used to guess from context
  • what a word means
  • what its function in the text is
  • ...
  • Fill a schema with specific slots, based on
    analysis of text
  • Even more complicated: recognise objects in
    pictures, ...
  • Information extraction (IE) is a complex matter

8
Mining for Genes
  • Jenssen et al. (2001), Nature Genetics 28: A
    literature network of human genes
  • Mining MEDLINE database of abstracts
  • Find names of genes occurring together
  • Construct similarity graph
  • Construct a database with this information
  • Database contains knowledge no single individual
    has, or could obtain without data mining
  • Similar techniques could be used on the web
  • One extra problem: uncertainty about reliability
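
The co-occurrence step on this slide can be sketched in Python. This is a minimal illustration and not the method of Jenssen et al.: the function name `cooccurrence_graph`, the naive whitespace tokenisation, and the placeholder gene names and abstracts are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_graph(abstracts, gene_names, min_count=1):
    """Count how often two known gene names appear in the same abstract."""
    known = {g.lower() for g in gene_names}
    edges = Counter()
    for text in abstracts:
        # Naive tokenisation: split on whitespace, strip common punctuation.
        tokens = {tok.strip(".,;:()").lower() for tok in text.split()}
        present = sorted(tokens & known)
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    # Keep only pairs seen together at least `min_count` times.
    return {pair: n for pair, n in edges.items() if n >= min_count}

# Placeholder abstracts and gene names (invented, not MEDLINE data).
abstracts = [
    "geneA interacts with geneB in tumour cells.",
    "Expression of geneB and geneA was reduced.",
    "geneC is regulated independently of geneB.",
]
graph = cooccurrence_graph(abstracts, ["geneA", "geneB", "geneC"])
```

Each key of `graph` is an edge of the similarity graph and each value is its weight. On real text, recognising gene names is itself an information extraction problem, which this sketch sidesteps by assuming a fixed list of names.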

9
Web Structure Mining
  • Analyse structure of the web
  • Which sites have many incoming / outgoing links?
  • Identify hubs
  • Find clusters of sites that are strongly
    interconnected
  • Web communities
  • ...
  • E.g., Google
  • Identifies important pages based on the links that
    point to them (rather than the contents of the page
    itself)
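
Ranking pages by the links that point to them can be sketched with a simplified PageRank-style iteration. This is an illustrative assumption about how such a score might be computed, not a description of Google's actual system; the function name `link_rank`, the damping value and the four-page toy web are invented for the example.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by incoming links with a simplified PageRank-style iteration.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small base score, then shares from its in-links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Toy web: pages a, b and d all link to c; c links back to a.
web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = link_rank(web)
best = max(scores, key=scores.get)
```

Page `c` ends up with the highest score because three pages point to it, regardless of what the page itself contains.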

10
Web Usage Mining
  • Log user behaviour
  • Which links are often followed, in which order,
    how long is a page looked at, ...
  • Possible at several levels
  • General usage statistics
  • User-specific statistics
  • Relating behaviour to properties of the user,
    insofar as available
  • E.g., adaptive web sites
  • Adaplix project
  • automatic index page creation

11
Web Mining As It Currently Is
  • Machine learning / data mining strongly rely on
  • Data quantity
  • Data quality
  • Quantity is usually not a problem on the Web
  • Quality is!
  • Much data not in easily processable format
  • E.g., inside text documents: information
    extraction needed
  • Unstructured, poorly structured, heterogeneously
    structured
  • Lots of noise
  • ...

12
How Is All This Related to the Semantic Web?
  • There can be a synergy
  • Machine learning can help with building the
    Semantic Web
  • The Semantic Web will help mining the Web, making
    Web interfaces and agents more intelligent

13
What Machine Learning Can Do for the Semantic Web
  • Upgrading the current web to a semantic web
    involves a lot of work
  • Can partially be automated!
  • Examples
  • Learning ontologies
  • Automatic document classification
  • Information integration
  • ...

14
Learning Ontologies
  • Maedche & Staab (2001), Ontology learning for
    the semantic web
  • View
  • Manual creation of ontologies is very
    labour-intensive
  • Fully automating the creation of ontologies is
    not feasible
  • Hence: develop a tool that helps to build
    ontologies
  • Basic components
  • Good graphical interface (man-machine
    interaction)
  • Powerful underlying machine learning techniques

15
Text-To-Onto
  • Framework
  • Import / reuse existing ontologies
  • Extract ontology from documents
  • Identify new terms, map onto existing concepts or
    define new ones
  • Identify relationships between concepts
  • ...
  • Many opportunities for general machine learning
    techniques
  • Prune ontology
  • Refine ontology

16
Some Useful Techniques for Learning Ontologies
  • Term extraction from texts
  • Identification of concepts
  • Hierarchical Clustering
  • Clustering: finding groups of similar things
  • Hierarchical clustering: clusters of clusters
  • Taxonomy can be constructed through hierarchical
    clustering of concepts
  • Association rules
  • Find sets of terms that often occur together
  • May indicate important relations
  • E.g., events in texts often co-occur with
    locations
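
How hierarchical clustering can yield a taxonomy is sketched below. This is a minimal agglomerative variant under stated assumptions: each term is described by the set of words it co-occurs with, similarity is Jaccard overlap of those sets, and a merged cluster is described by the union of its members' sets. The names `jaccard` and `agglomerate` and the toy terms are invented for the example and are not taken from Text-To-Onto.

```python
def jaccard(a, b):
    """Overlap between two sets: size of intersection over size of union."""
    return len(a & b) / len(a | b) if a | b else 0.0

def agglomerate(contexts):
    """Build a binary tree over terms by repeatedly merging the two most
    similar clusters. `contexts` maps a term to its co-occurring words."""
    # Each cluster is (tree, context words of all its members).
    clusters = [(term, set(ctx)) for term, ctx in sorted(contexts.items())]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = jaccard(clusters[i][1], clusters[j][1])
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] | clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]

# Toy context sets (invented for illustration).
contexts = {
    "hotel": {"room", "book", "stay"},
    "hostel": {"room", "book", "cheap"},
    "beach": {"sand", "swim", "sun"},
    "lake": {"swim", "sun", "boat"},
}
tree = agglomerate(contexts)
```

The nested tuples in `tree` group `beach` with `lake` and `hostel` with `hotel` before joining the two groups, which is the "clusters of clusters" structure a taxonomy needs. Naming the inner nodes is left to the ontology engineer.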

17
Information Integration
  • Doan, Domingos & Halevy: Reconciling Schemas of
    Disparate Data Sources, ACM SIGMOD 2001
  • Context
  • Given databases with different schemas
  • Find similarities in schemas, guess how concepts
    map onto each other
  • Integrate the schemas
  • Essentially the same as mapping ontologies onto
    each other

18
Automated Document Classification
  • Mitchell et al.
  • Given examples of web pages, each labelled with
    the kind of page it is (course page, student
    page, ...),
  • Learn to classify new pages
  • Can be based on contents of page, links pointing
    to page, typical structure of certain kinds of
    web sites (e.g. universities), ...
  • Note: helps to relate objects to an ontology
  • Problem: how to get labelled examples?
  • Unlimited amount of unlabelled pages available
  • But labelling them manually is labour intensive!

19
Exploiting Unlabelled Data
  • A solution: co-training (Blum & Mitchell 1998)
  • Learn separate (imperfect) classifiers from
    disjoint sets of sufficient information
  • E.g., learn to classify pages from
  • Content of page ("Home page of CS 101")
  • Links pointing to page ("CS 101")
  • Take the classifications that classifier A is
    most certain of, add these labels to the training
    set for B (and vice versa)
  • Repeat multiple times (kind of bootstrapping
    process)
  • Co-training makes it possible to exploit large
    amounts of unlabelled data!
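
The co-training loop described on this slide can be sketched as follows. This is a toy illustration and not the algorithm of Blum & Mitchell as published: the word-vote classifier, the confidence measure, the function names (`train`, `predict`, `co_train`) and the example pages are all assumptions. View A stands for the words on a page, view B for the words in links pointing to it.

```python
from collections import Counter

def train(examples):
    """Count, per label, how often each word occurs in the examples."""
    model = {}
    for words, label in examples:
        model.setdefault(label, Counter()).update(words)
    return model

def predict(model, words):
    """Return (label, confidence); confidence is the winning share of votes."""
    votes = {label: sum(counts[w] for w in words)
             for label, counts in model.items()}
    total = sum(votes.values())
    if total == 0:
        return None, 0.0  # no known word: the classifier abstains
    label = max(sorted(votes), key=votes.get)
    return label, votes[label] / total

def co_train(labelled, unlabelled, rounds=3, per_round=1):
    """labelled: [((view_a, view_b), label)]; unlabelled: [(view_a, view_b)]."""
    train_a = [(a, y) for (a, _), y in labelled]
    train_b = [(b, y) for (_, b), y in labelled]
    pool = list(unlabelled)
    for _ in range(rounds):
        model_a, model_b = train(train_a), train(train_b)
        # Each classifier labels its most confident pool items for the other.
        for view, model, target in ((0, model_a, train_b),
                                    (1, model_b, train_a)):
            scored = []
            for ex in pool:
                label, conf = predict(model, ex[view])
                if label is not None:
                    scored.append((conf, label, ex))
            scored.sort(key=lambda s: s[0], reverse=True)
            for conf, label, ex in scored[:per_round]:
                target.append((ex[1 - view], label))
                pool.remove(ex)
    return train(train_a), train(train_b)

# Invented example pages: (page words, link words).
labelled = [
    ((("course", "syllabus", "homework"), ("cs101",)), "course"),
    ((("research", "publications", "cv"), ("prof", "smith")), "person"),
]
unlabelled = [
    (("syllabus", "homework", "exam"), ("cs202",)),
    (("my", "cv", "publications"), ("prof", "jones")),
]
before = predict(train([(b, y) for (_, b), y in labelled]), ("cs202",))
model_a, model_b = co_train(labelled, unlabelled)
after = predict(model_b, ("cs202",))
```

Before co-training the link classifier has never seen `cs202` and abstains. Afterwards it labels that link as a course page, because the content classifier was confident about the page the link points to.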

20
What the Semantic Web Can Do for Machine Learning
  • Will make mining the web much easier
  • Reason 1: removal of ambiguity
  • More precise knowledge of what is meant by
    certain terms
  • Reason 2: structured vs. unstructured data
  • Learning from structured data is much easier than
    from unstructured data
  • Reason 3: availability of background knowledge
  • Can be used to make better decisions when learning

21
Removal of Ambiguity
  • Example: text document classification
  • E.g., given a text, tell to which newsgroup it
    belongs
  • Typical approach: bag of words
  • Look only at which words occur in the text and
    how often
  • Each time a word occurs that occurs mainly in one
    particular class, increase probability for that
    class
  • But words are ambiguous!
  • Increased classification accuracy can be expected
    by removing ambiguity
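
A bag-of-words classifier of the kind described on this slide can be sketched as below. It is a minimal naive-Bayes-style scorer under stated assumptions: add-one smoothing, equal class priors, and whitespace tokenisation. The function names, class labels and training texts are invented for the example.

```python
import math
from collections import Counter

def train_bag_of_words(docs):
    """docs: list of (text, label). Returns per-class word counts and vocabulary."""
    counts = {}
    for text, label in docs:
        counts.setdefault(label, Counter()).update(text.lower().split())
    vocab = {w for c in counts.values() for w in c}
    return counts, vocab

def classify(text, counts, vocab):
    """Pick the class whose words best match the text (add-one smoothing)."""
    scores = {}
    for label, c in counts.items():
        total = sum(c.values()) + len(vocab)
        # Each known word adds the log-probability of that word in the class.
        scores[label] = sum(math.log((c[w] + 1) / total)
                            for w in text.lower().split() if w in vocab)
    return max(sorted(scores), key=scores.get)

# Invented training texts; "bat" is deliberately ambiguous.
docs = [
    ("pitcher threw ball", "sports"),
    ("bat hit ball", "sports"),
    ("bat sleeps in cave", "animals"),
    ("owl hunts at night", "animals"),
]
counts, vocab = train_bag_of_words(docs)
label_1 = classify("pitcher hit ball", counts, vocab)
label_2 = classify("bat in cave", counts, vocab)
```

The word `bat` occurs once in each class, so on its own it carries almost no evidence; only the surrounding words decide. If the two senses were annotated as distinct terms, each would count towards a single class.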

22
Mining From (Un)structured Data
  • Mining data: intensively querying data
  • Answering a query is
  • Easy in structured data
  • Relational database, XML, ...
  • Harder in semi-structured data (e.g., HTML)
  • Hard in unstructured data
  • Information extraction needed
  • Could do this by learning a wrapper
  • This involves one extra layer of learning
  • Relating this to our text example: taking into
    account the function of words in the text

23
Availability of Background Knowledge
  • Learning: finding relevant patterns in behaviour
  • Important to have the right context to describe
    these patterns
  • Example
  • Making interesting offers to clients
  • "People who bought this book also bought ..."
  • Instance-based learning
  • Estimate profile of user
  • Find users with similar profile
  • Look at behaviour of those users to help current
    user

24
Availability of Background Knowledge
  • Can work better if more background knowledge is
    available, e.g., type of book, author, ...
  • For instance, for books
  • Similar profile: users that up till now bought
    the same books as this user
  • May not be many people
  • Similar: often bought books by the same author
  • Probably many more people, allows for a more
    reasonable guess
  • Similar: often bought books of the same genre
    (fiction, ...)
  • May work even better
  • Ontologies (among others) provide such background
    knowledge
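
The effect of background knowledge on finding similar users can be sketched as follows. The catalogue, user names and helper functions (`similarity`, `profile`, `neighbours`) are hypothetical; the catalogue stands in for whatever an ontology would supply about each book.

```python
def similarity(profile_a, profile_b):
    """Jaccard overlap between two sets of features."""
    union = profile_a | profile_b
    return len(profile_a & profile_b) / len(union) if union else 0.0

def profile(books, catalogue=None, field=None):
    """Describe a user by the books bought, or by a property of those books."""
    if catalogue is None:
        return set(books)
    return {catalogue[b][field] for b in books}

# Hypothetical background knowledge about four books.
catalogue = {
    "book1": {"author": "author_x", "genre": "fiction"},
    "book2": {"author": "author_x", "genre": "fiction"},
    "book3": {"author": "author_y", "genre": "fiction"},
    "book4": {"author": "author_z", "genre": "cookery"},
}
current = ["book1"]
others = {"ann": ["book2"], "bob": ["book3"], "cat": ["book4"]}

def neighbours(field=None):
    """Users whose profile overlaps with the current user's profile."""
    cat = catalogue if field else None
    me = profile(current, cat, field)
    return sorted(user for user, books in others.items()
                  if similarity(me, profile(books, cat, field)) > 0)

by_book = neighbours()
by_author = neighbours("author")
by_genre = neighbours("genre")
```

Comparing raw purchases finds no similar user at all. Comparing by author finds one, and comparing by genre finds two, so an instance-based learner has more neighbours to base a suggestion on.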

25
Web Mining Revisited
  • Semantic Web will change
  • Content mining
  • Clearer view on contents and meaning of documents
  • Structure mining
  • More relevant structure
  • Usage mining
  • More relevant information on actions of user
  • Will in general improve intelligence of systems
  • E.g. mail filter gets a better view of contents
    of mails

26
Promising Learning Techniques
  • Many different learning techniques exist
  • Neural networks, support vector machines,
    instance-based learning, Bayesian learning,
    association rules, ...
  • Not all equally suitable for every task
  • E.g., SVMs work well for document classification
  • E.g., instance-based learning: find other users
    with the same profile as this user to make
    predictions
  • Intelligent agents will use a mix of them
  • Relational learners seem interesting
  • Can handle explicit information on objects and
    relations between them
  • Classic example: inductive logic programming

27
Inductive Logic Programming
  • Induces rules in first order logic from examples
    or other rules
  • Such rules can be used to reason with
  • The reasoning can be explained
  • Cf. example of mail program
  • Can use existing background knowledge
  • Knowledge-intensive learning
  • Currently, good background knowledge has to be
    engineered manually
  • Will become more easily available with the
    Semantic Web
  • Example: mining in chemical domains

29
Mining in chemical domains
  • Example problem: relate the activity of a molecule
    to its properties
  • Useful for, e.g., drug development
  • Which properties are important?
  • Chemically relevant properties: functional
    groups, 3D structure, ...?
  • Has to be encoded manually
  • Ideally: get relevant information from some
    trustworthy data source as and when needed
  • Intelligent agents will exploit (tap) the
    common intelligence of the Web

30
Conclusions
  • Machine learning is a promising tool for the
    Semantic Web
  • For building it
  • For exploiting it
  • Clear synergy between Semantic Web efforts and
    Machine Learning efforts

31
Some References
  • Maedche, A Machine Learning Perspective for the
    Semantic Web, position paper,
    www.semanticweb.org/SWWS/program/position/soi-maedche.pdf
  • Maedche & Staab (2001), Ontology Learning for the
    Semantic Web, IEEE Intelligent Systems 16(2)
  • Jenssen et al., Nature Genetics 28
  • Doan et al. (2001), ACM SIGMOD conf.
  • Kosala & Blockeel (2000), SIGKDD Explorations
    2(1)
  • Mitchell (1996), Machine Learning