Applications of Slow Intelligence Systems - PowerPoint PPT Presentation

About This Presentation
Title:

Applications of Slow Intelligence Systems

Description:

Title: IT Does not matter! Author: 730208 Last modified by: KSI Created Date: 7/20/2004 5:39:39 AM Document presentation format: On-screen Show Company – PowerPoint PPT presentation

Slides: 61
Provided by: 7303

Transcript and Presenter's Notes

Title: Applications of Slow Intelligence Systems


1
Applications of Slow Intelligence Systems
2
Outline
  • Application: Social Influence Analysis
  • Application: Product Service Optimization
  • Application: Topic/Trend Detection
  • Application: High Dimensional Feature Selection
  • Discussion

4
Application to Social Influence Analysis
  • In large social networks, nodes (users,
    entities) are influenced by others for many
    different reasons. How to model diffusion
    processes over a social network, and how to
    predict which nodes will influence which other
    nodes, has been an active research topic
    recently, and many researchers have proposed
    algorithms. Our objective in applying SIS
    technology is to utilize these algorithms and
    evolutionarily select the best one, with the
    most appropriate parameters, for social
    influence analysis.

5
The Social Influence Analysis SIS System
  • The input data stream is first processed by the
    Pre-Processor. The Enumerator then invokes the
    super-component that creates the various social
    influence analysis algorithms, such as Linear
    Threshold LIM, Susceptible-Infective-Susceptible
    (SIS), Susceptible-Infective-Recovered (SIR) and
    Independent Cascade. The Tester collects and
    presents the test results.

6
LIM results for concept 1 and concept 3 with two
parameter combinations on the Plurk dataset
7
LIM results for concept 1 and concept 3 with two
parameter combinations on the Facebook dataset
8
The SIA/SIS System
  • The Timing Controller restarts the social
    influence analysis (SIA) cycle with a different
    SIA super-component, such as the Heat Diffusion
    algorithms, or with a different pre-processor.
    The Eliminator eliminates the inferior SIA
    algorithms, and the Concentrator selects the
    optimal SIA algorithm.
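
The enumerate, test, eliminate and concentrate steps described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `run_cycle`, the `keep` parameter, and the idea that the Tester yields a single score per (algorithm, parameter) combination where higher is better are all assumptions made for the example.

```python
def run_cycle(candidates, param_grid, evaluate, keep=2):
    """One enumerate -> test -> eliminate -> concentrate pass (hypothetical sketch)."""
    # Enumerator: every (algorithm, parameter) combination.
    enumerated = [(name, params)
                  for name in candidates
                  for params in param_grid[name]]
    # Tester: score each combination (assumed: higher is better).
    scored = [(evaluate(name, params), name, params)
              for name, params in enumerated]
    # Eliminator: drop all but the `keep` best-scoring combinations.
    survivors = sorted(scored, key=lambda item: item[0], reverse=True)[:keep]
    # Concentrator: select the single best survivor.
    return survivors[0], survivors


# Toy scores standing in for real test results (made-up numbers).
toy_scores = {("LIM", 0.1): 0.62, ("LIM", 0.5): 0.71,
              ("SIR", 0.1): 0.55, ("SIR", 0.5): 0.40}
best, survivors = run_cycle(
    ["LIM", "SIR"],
    {"LIM": [0.1, 0.5], "SIR": [0.1, 0.5]},
    lambda name, params: toy_scores[(name, params)],
)
```

A timing controller in this sketch would simply call `run_cycle` again with a different candidate list or a different pre-processed input.
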

9
Outline
  • Application Social Influence Analysis
  • Application Product Service Optimization
  • Application Topic/Trend Detection
  • Application High Dimensional Feature Selection
  • Discussion

10
SIS Application to Product Configuration
Production of personalized or custom-tailored
goods or services to meet consumers' diverse and
changing needs
11
Ontological Filter and Slow Intelligence System
12
A Scenario
  • A customer would like to buy a personal computer
    in order to play video games and surf the
    internet.
  • He knows that he needs an operating system, a
    web browser and an antivirus package.
  • In particular, he prefers a Microsoft Windows
    operating system. He lives in the United States
    and prefers a desktop. He also prefers low-cost
    components.
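
One way to picture how such a scenario could be handled is to enumerate configurations, eliminate those that violate the stated preferences, and concentrate on the cheapest remaining one. The sketch below assumes a tiny made-up catalog; the component names, prices and the `configure` function are hypothetical and are not taken from the presentation.

```python
from itertools import product

# Hypothetical catalog: slot -> list of (option name, price).
CATALOG = {
    "os": [("Windows", 120), ("OtherOS", 0)],
    "browser": [("Browser A", 0), ("Browser B", 0)],
    "antivirus": [("Antivirus Basic", 20), ("Antivirus Pro", 50)],
    "form_factor": [("desktop", 400), ("laptop", 700)],
}


def configure(catalog, required):
    """Return (total cost, configuration) of the cheapest valid configuration."""
    slots = list(catalog)
    best = None
    # Enumerate one option per slot.
    for combo in product(*(catalog[slot] for slot in slots)):
        config = dict(zip(slots, combo))
        # Eliminate configurations that violate a stated preference.
        if any(config[slot][0] != name for slot, name in required.items()):
            continue
        cost = sum(price for _, price in combo)
        # Concentrate on the cheapest configuration seen so far.
        if best is None or cost < best[0]:
            best = (cost, {slot: config[slot][0] for slot in slots})
    return best


result = configure(CATALOG, {"os": "Windows", "form_factor": "desktop"})
```
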

13
Ontological Transform for Product Configurator
14
Outline
  • Application Social Influence Analysis
  • Application Product Service Optimization
  • Application Topic/Trend Detection
  • Application High Dimensional Feature Selection
  • Discussion

15
Topic Detection and Tracking (TDT) System Overview
  • Detect current hot topics and predict future hot
    topics based on data collected from the internet
  • The TDT system is composed of
      • Crawler Extractor
          • Collect the latest data from the internet
            for users' needs
          • Restrict the range of data collection from
            web data (focused crawler)
      • Topic Extractor
          • Discover current hot topics from a set of
            text documents
      • Topic Detector
          • Predict hot topics

16
Topic/Trend Detection System
  • Crawler Extractor

[Diagram labels: Social Media; HTML documents; Users' Keywords of Interest; Web Crawler; Text documents; Web data DB; Information Extractor (extracts articles and metadata such as title, author and content from semi-structured web content); Topic Extractor]
17
Focused Crawler Classification

Taxonomy Creation
  • Yahoo!, Open Directory Project

Example Collection
  • URLs
  • Browsing

Taxonomy Selection and Refinement
  • System proposes the most common classes
  • User marks as GOOD
  • User changes trees

Interactive Exploration
  • System proposes URLs found in a small
    neighborhood of examples.
  • User examines and includes some of these
    examples.

Training
  • Integrate refinements into the statistical class
    model (classifier-specific action).

18
Focused Crawler Distillation

Distillation
  • Identify relevant hubs by running a topic
    distillation algorithm.
  • Raise visit priorities of hubs and immediate
    neighbors.

Feedback
  • Report most popular sites and resources.
  • Mark results as useful/useless.
  • Send feedback to classifier and distiller.

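Raising the visit priority of hubs and their immediate neighbors can be pictured as a priority queue over the crawl frontier. The sketch below is only an illustration under that assumption; the `Frontier` class, its method names and the numeric priorities are hypothetical.

```python
import heapq


class Frontier:
    """Crawl frontier in which a higher priority is visited first (hypothetical)."""

    def __init__(self):
        self._heap = []       # entries are (-priority, url)
        self._priority = {}   # current priority of each pending URL

    def add(self, url, priority):
        # Only a higher priority replaces the old one; stale heap
        # entries are skipped later in pop().
        if priority > self._priority.get(url, float("-inf")):
            self._priority[url] = priority
            heapq.heappush(self._heap, (-priority, url))

    def boost(self, url, amount):
        # Raise the visit priority of a hub or one of its neighbors.
        self.add(url, self._priority.get(url, 0.0) + amount)

    def pop(self):
        while self._heap:
            neg_priority, url = heapq.heappop(self._heap)
            if self._priority.get(url) == -neg_priority:
                del self._priority[url]
                return url
        return None


frontier = Frontier()
frontier.add("page", 0.2)
frontier.add("hub", 0.1)
frontier.add("neighbor", 0.1)
frontier.boost("hub", 1.0)        # distiller marks this page as a hub
frontier.boost("neighbor", 0.5)   # immediate neighbor of the hub
visit_order = [frontier.pop() for _ in range(3)]
```
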
19
Extractor
  • Given a Web page
      • Build the HTML tag tree
      • Mine data regions
          • Mining data records directly is hard
      • Identify data records from each data region
      • Learn the structure of a general data record
          • A data record can contain optional fields
      • Extract the data

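The first two steps (building the tag tree and looking for candidate data regions) can be sketched with the Python standard library. This is a simplified illustration rather than the extractor used in the system: treating "a parent whose children repeat the same tag" as a candidate data region is an assumption, and the names `TagTreeBuilder`, `build_tag_tree` and `repeated_child_groups` are hypothetical.

```python
from html.parser import HTMLParser


class TagTreeBuilder(HTMLParser):
    """Builds a nested dict tree of tags from an HTML string."""

    def __init__(self):
        super().__init__()
        self.root = {"tag": "root", "children": [], "text": ""}
        self._stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "children": [], "text": ""}
        self._stack[-1]["children"].append(node)
        self._stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; unmatched end tags are ignored.
        for i in range(len(self._stack) - 1, 0, -1):
            if self._stack[i]["tag"] == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        self._stack[-1]["text"] += data.strip()


def build_tag_tree(html):
    builder = TagTreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def repeated_child_groups(node):
    """Yield (parent tag, child tag, count) wherever sibling tags repeat."""
    counts = {}
    for child in node["children"]:
        counts[child["tag"]] = counts.get(child["tag"], 0) + 1
    for tag, count in counts.items():
        if count > 1:
            yield node["tag"], tag, count
    for child in node["children"]:
        yield from repeated_child_groups(child)


tree = build_tag_tree("<ul><li>A</li><li>B</li></ul>")
regions = list(repeated_child_groups(tree))
```
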
20
TDT Petri Net Simulation
  • Topic Detection and Tracking

21
22
Crawler
23
Initial State
24
Accept user input
25
Validate user input
26
Refine user input
27
Train the system
28
Detect most popular topic
29
Extractor
30
Extractor activated
31
Generate HTML tag trees
32
Detect important data
33
Train the system with record
34
Extract data
35
Save data into knowledge base
36
Topic Detection and Tracking
37
Slow Intelligence steps are shown in blue
  • Accept user request
  • Send request data to TDT
  • Enumerator generates combinations
  • Eliminator selects the best method to fit our
    need
  • Evaluate combinations
  • Use concentrator to highlight the selected
    results
  • Send the result to TDT
  • Generate the instructions to the server
  • Dispatcher gets the instruction
  • Decide where we are going to send the
    instructions
  • Send the instructions to the server
  • End of simulation run

39
Introduction
  • High-dimensional feature selection is a hot topic
    in statistics and machine learning.
  • Model the relationship between one response $y$
    and its associated features $x_1, \dots, x_p$,
    based on a sample of size $n$.

40
Math formulation
  • Let $y = (y_1, \dots, y_n)^T$ be a vector of
    responses and $x_1, \dots, x_n$ be their
    associated covariate vectors, where
    $x_i \in \mathbb{R}^p$.
  • When $y_i$ is binary, as in the classification
    problem, we assume a logistic model.
  • We estimate the regression coefficient $\beta$
    and the bias $\beta_0$ by minimizing the logistic
    loss function $L(\beta_0, \beta)$.

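For reference, a standard way to write the logistic model and its loss is given below. It assumes labels coded as $y_i \in \{0, 1\}$ and takes the loss to be the negative log-likelihood; both are assumptions, and the symbols $\beta_0$, $\beta$ and $L$ are chosen for illustration rather than taken from the slide.

```latex
P(y_i = 1 \mid x_i) = \frac{1}{1 + \exp\{-(\beta_0 + x_i^T \beta)\}},
\qquad
L(\beta_0, \beta) = -\sum_{i=1}^{n} \left[ y_i (\beta_0 + x_i^T \beta)
  - \log\left(1 + \exp(\beta_0 + x_i^T \beta)\right) \right]
```
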
41
Application
  • Supervised learning: the gene selection problem in
    bioinformatics
      • One wants to eliminate irrelevant genes
        (features) to obtain a robust classifier.
      • One wants to know which genes are the most
        critical factors in the disease.

[Figure labels: n samples (patients or healthy subjects); each sample's data contains p gene expression levels; important genes selected]
42
Challenges
  • Dimensionality grows rapidly with interactions of
    the features
      • Portfolio selection and network modeling:
        2000 stocks involve over 2 million unknown
        parameters in the covariance matrix.
      • Protein-protein interaction: the sample size
        may be on the order of thousands, but the
        number of features can be on the order of
        millions.
  • The goal is to construct effective methods to
    learn relationships between features and
    responses in high dimensions for scientific
    purposes.

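The "over 2 million" figure follows from counting the distinct entries of a symmetric covariance matrix: p variances plus p(p - 1)/2 covariances, that is p(p + 1)/2 in total. A quick check (the function name is only illustrative):

```python
def covariance_parameters(p: int) -> int:
    """Number of distinct entries in a symmetric p x p covariance matrix."""
    return p * (p + 1) // 2


print(covariance_parameters(2000))  # 2001000
```
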
43
Feature Selection Approach
  • Main SIS procedure
      • main_Enumerator
      • main_Eliminator
      • main_Adaptator
      • main_Propagator
      • main_Concentrator
      • time controller
  • Sub procedure
      • sub_enumerator
      • sub_concentrator
      • knowledge base

44
Main Enumerator
  • Enumerate all $p$ features.
  • Among these features, some are relevant to the
    responses while others are not.

45
Main Eliminator
  • Compute the Pearson correlation between each
    feature and the response, rank the features by
    this value from high to low, and eliminate the
    lowest-ranked features.
  • The number of features eliminated is fixed by a
    pre-defined constant.
  • The remaining features form the selected top
    feature set.

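A minimal sketch of this correlation-based elimination is shown below. It assumes that features are ranked by the absolute value of the Pearson correlation and that the pre-defined constant is expressed as the number of features to keep; the slide does not confirm either choice, and the names `pearson`, `main_eliminator` and `keep` are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def main_eliminator(features, response, keep):
    """Return the names of the `keep` features with the largest |correlation|."""
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], response)),
        reverse=True,
    )
    return ranked[:keep]


# Toy data: f1 and f3 track the response, f2 does not.
features = {
    "f1": [1, 2, 3, 4],
    "f2": [1, -1, 1, -1],
    "f3": [4, 3, 2, 1.5],
}
selected = main_eliminator(features, [0, 0, 1, 1], keep=2)
```
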
46
Sub Enumerator
  • Enumerate all feature selection algorithms in the
    knowledge base by applying each of them to the
    feature set retained by the Main Eliminator, and
    select a set of top features from it for each
    algorithm.
  • The knowledge base stores the existing candidate
    algorithms.
  • We add L1-regularized regression, elastic-net
    regularized regression and forward stepwise
    regression. In principle, any feature selection
    algorithm can be put into the knowledge base.

47
Sub Concentrator
  • For each selected feature set, we compute the
    loss function and choose the best algorithm, the
    one with the minimum loss.
  • The sub-system then selects the top features
    chosen by that algorithm.
  • This is the feature set output by the sub
    procedure.

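The sub procedure (each algorithm in the knowledge base proposes a feature set, and the proposal with the minimum loss wins) can be sketched as follows. The callables in the toy knowledge base and the toy loss are placeholders, not the L1, elastic-net or stepwise methods themselves, and `sub_select` is a hypothetical name.

```python
def sub_select(knowledge_base, candidates, loss):
    """Return (algorithm name, feature set) with the minimum loss."""
    # Sub enumerator: every algorithm proposes its own top feature set.
    proposals = {name: algorithm(candidates)
                 for name, algorithm in knowledge_base.items()}
    # Sub concentrator: keep the proposal with the smallest loss.
    best = min(proposals, key=lambda name: loss(proposals[name]))
    return best, proposals[best]


# Toy stand-ins for feature selection algorithms (assumptions for the example).
toy_knowledge_base = {
    "first_two": lambda feats: feats[:2],
    "last_two": lambda feats: feats[-2:],
}
relevant = {"g3", "g4"}
toy_loss = lambda feats: len(relevant - set(feats))
winner, chosen = sub_select(toy_knowledge_base, ["g1", "g2", "g3", "g4"], toy_loss)
```
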
48
Main Adaptor
  • For every other feature among the total $p$
    features, we add it individually to the feature
    set produced by the Sub Concentrator and compute
    the loss function.

49
Main Concentrator
  • Rank all candidate features from the Main Adaptor
    by their loss from low to high, and select the
    top features with the smallest loss.

50
Main Propagator
  • Add these top features to the feature set
    produced by the Sub Concentrator to form the new
    feature set.

51
Timing Controller
  • The timing controller controls the termination of
    the whole process. It sets a threshold $K$ on the
    number of cycles.
  • If the threshold has been reached, the process
    stops after the sub-concentration step and
    outputs the selected features.
  • Otherwise, the process continues to main
    adaptation.
  • The larger $K$ is, the more accurate the feature
    selection result, but the more time it takes to
    compute. Thus slow decision cycles can result in
    better performance in the long run.

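The interplay of the adaptor, concentrator, propagator and timing controller can be sketched as a loop that runs for a fixed number of cycles. This is a rough illustration under stated assumptions, not the authors' algorithm: it treats the threshold as a plain cycle count (`cycles`), adds `top` features per cycle, and omits any return to the sub procedure between cycles. All names are hypothetical.

```python
def sis_feature_selection(all_features, initial_set, loss, cycles, top=1):
    """Grow `initial_set` for `cycles` slow cycles, adding `top` features each time."""
    current = list(initial_set)
    for _ in range(cycles):  # timing controller: stop after `cycles` rounds
        # Main adaptor: try adding each remaining feature on its own.
        trial = {feature: loss(current + [feature])
                 for feature in all_features if feature not in current}
        if not trial:
            break
        # Main concentrator: keep the candidates with the smallest loss.
        best = sorted(trial, key=trial.get)[:top]
        # Main propagator: form the new feature set.
        current.extend(best)
    return current


# Toy loss: count missing relevant features, lightly penalize irrelevant ones.
relevant = {"g1", "g2", "g3"}


def toy_loss(selected):
    chosen = set(selected)
    return len(relevant - chosen) + 0.1 * len(chosen - relevant)


genes = ["g1", "g2", "g3", "g4", "g5"]
no_cycles = sis_feature_selection(genes, ["g1"], toy_loss, cycles=0)
two_cycles = sis_feature_selection(genes, ["g1"], toy_loss, cycles=2)
```

With the toy loss, more cycles recover more of the relevant features, which mirrors the time versus accuracy tradeoff described on the slide.
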
52
General algorithm

53
Experimental Results: Dataset description
  • Leukemia dataset
      • Leukemia is a type of cancer of the blood.
        This dataset consists of 72 samples, 47 from
        patients with acute lymphoblastic leukemia and
        25 from patients with acute myeloid leukemia,
        with expression levels of 7129 human genes.
        The data is split into 38 samples for the
        training set and 34 samples for the test set.
  • Colon cancer dataset
      • This dataset consists of 62 samples, 40 from
        tumor colon tissues and 22 from normal colon
        tissues, with expression levels of 2000 human
        genes. The data is split into 32 samples for
        the training set and 30 samples for the test
        set.

54
Experimental protocol
  • We compare our system with the three individual
    feature selection algorithms in the knowledge
    base.
  • We report the number of errors and the balanced
    error rate (BER).

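The balanced error rate is commonly defined as the average of the per-class error rates, which keeps the larger class from dominating the score. The sketch below uses that common definition for binary labels coded 0/1; the exact formula on the slide is not preserved, so this is an assumption, and both classes must be present in `y_true`.

```python
def balanced_error_rate(y_true, y_pred):
    """Mean of the error rates of class 0 and class 1."""
    errors = {0: 0, 1: 0}
    counts = {0: 0, 1: 0}
    for truth, predicted in zip(y_true, y_pred):
        counts[truth] += 1
        if truth != predicted:
            errors[truth] += 1
    return 0.5 * (errors[0] / counts[0] + errors[1] / counts[1])


y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
number_of_errors = sum(t != p for t, p in zip(y_true, y_pred))
print(number_of_errors, balanced_error_rate(y_true, y_pred))  # 2 0.375
```
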
55
Experimental results
  • Our method outperforms each individual
    algorithm.
  • When we increase K, the number of cycles defined
    by the time controller, the accuracy of our
    system improves. There is a tradeoff between the
    running time and the performance.

56
Experimental results
  • From a biological standpoint, these genes are
    critical for leukemia:
  • Zyxin is known to interact with leukemogenic bHLH
    proteins. It is selected by both SIS (K = 5) and
    SIS (K = 10).
  • Cystatin C (CST3) and Cystatin A are two very
    important genes selected by SIS (K = 10) but not
    by SIS (K = 5), which indicates that a larger K
    leads to a more accurate result.

58
Discussions
  • Implemented Social Influence Analysis algorithms
    to find the best model based upon Slow
    Intelligence principles.
  • Applied Slow Intelligence principles to
    ontological filtering for Product and Service
    Selection.
  • Modeled and simulated a Trend and Topic Detection
    system using a Petri net within the framework of
    the Slow Intelligence System.
  • Studied a new feature selection application
    within the framework of the Slow Intelligence
    System. It leads to superior performance and can
    handle high-dimensional data.

59
Further Work
  • Design a mechanism to dynamically update the
    knowledge base by applying the SIS approach to
    itself
  • Design a user-friendly interface to develop and
    manage an application system

60
Q&A