Title: Applications of Slow Intelligence Systems
1 Applications of Slow Intelligence Systems
2 Outline
- Application: Social Influence Analysis
- Application: Product/Service Optimization
- Application: Topic/Trend Detection
- Application: High-Dimensional Feature Selection
- Discussion
3 Outline
- Application: Social Influence Analysis
- Application: Product/Service Optimization
- Application: Topic/Trend Detection
- Application: High-Dimensional Feature Selection
- Discussion
4 Application to Social Influence Analysis
- In large social networks, nodes (users, entities) are influenced by others for many different reasons. How to model diffusion processes over a social network, and how to predict which nodes will influence which other nodes in the network, have recently been active research topics, and many researchers have proposed various algorithms. Our objective in applying the SIS technology is to utilize these algorithms and evolutionarily select the best one, with the most appropriate parameters, for social influence analysis.
5 The Social Influence Analysis SIS System
- The input data stream is first processed by the Pre-Processor. The Enumerator then invokes the super-component that creates the various social influence analysis algorithms, such as Linear Threshold (LIM), Susceptible-Infective-Susceptible (SIS), Susceptible-Infective-Recovered (SIR), and Independent Cascading. The Tester collects and presents the test results.
6 LIM results of concept 1 and concept 3 with two combinations of parameters in the Plurk dataset
7 LIM results of concept 1 and concept 3 with two combinations of parameters in the Facebook dataset
8 The SIA/SIS System
- The Timing Controller restarts the social influence analysis cycle with a different SIA super-component, such as the Heat Diffusion algorithms, or with a different Pre-Processor. The Eliminator eliminates the inferior SIA algorithms, and the Concentrator selects the optimal SIA algorithm. A sketch of this cycle follows.
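Below is a minimal Python sketch of how this enumerate-test-eliminate-concentrate cycle could be organized. The `sis_cycle` function, the candidate model dictionary, and the `score` callable are hypothetical placeholders for illustration, not the actual implementation of the SIA/SIS system.

```python
# Hypothetical sketch of the SIA decision cycle: enumerate candidate
# diffusion models, test them, eliminate the weak ones, and concentrate
# on the best survivor over a number of timed cycles.
from typing import Callable, Dict

def sis_cycle(models: Dict[str, Callable], data, score: Callable, rounds: int = 3) -> str:
    candidates = dict(models)                      # Enumerator: all candidate SIA algorithms
    for _ in range(rounds):                        # Timing Controller: restart the cycle
        results = {name: score(model, data)        # Tester: evaluate each candidate
                   for name, model in candidates.items()}
        cutoff = sorted(results.values())[len(results) // 2]
        candidates = {name: model                  # Eliminator: drop the inferior half
                      for name, model in candidates.items()
                      if results[name] >= cutoff}
        if len(candidates) == 1:
            break
    return max(candidates, key=lambda n: score(candidates[n], data))  # Concentrator
```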
9 Outline
- Application: Social Influence Analysis
- Application: Product/Service Optimization
- Application: Topic/Trend Detection
- Application: High-Dimensional Feature Selection
- Discussion
10 SIS Application to Product Configuration
Production of personalized or custom-tailored
goods or services to meet consumers' diverse and
changing needs
11 Ontological Filter and Slow Intelligence System
12 A Scenario
- A customer would like to buy a Personal Computer in order to play video games and surf the internet.
- He knows that he needs an operating system, a web browser, and an antivirus package.
- In particular, the user prefers a Microsoft Windows operating system. He lives in the United States and prefers to have a desktop. He also prefers low-cost components. A toy filtering sketch follows.
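To make the scenario concrete, here is a small Python sketch of preference-based filtering over a component catalog. The catalog entries, preference fields, and scoring rule are all invented for illustration; they are not the ontological filter used by the actual configurator.

```python
# Toy preference filter for the PC-configuration scenario above.
# Catalog, preferences, and scoring rule are illustrative only.
catalog = [
    {"type": "os", "name": "Windows 11", "family": "Microsoft Windows", "price": 139},
    {"type": "os", "name": "Ubuntu 24.04", "family": "Linux", "price": 0},
    {"type": "form", "name": "Desktop tower", "family": None, "price": 450},
    {"type": "form", "name": "Laptop", "family": None, "price": 900},
]
preferences = {"family": "Microsoft Windows", "form": "Desktop tower"}

def score(item):
    """Lower is better: prefer cheap components that match stated preferences."""
    bonus = -500 if item.get("family") == preferences["family"] else 0
    bonus += -500 if item["name"] == preferences["form"] else 0
    return item["price"] + bonus

best_per_type = {}
for item in catalog:
    kind = item["type"]
    if kind not in best_per_type or score(item) < score(best_per_type[kind]):
        best_per_type[kind] = item
print(best_per_type)   # picks the Windows OS and the desktop form factor
```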
13 Ontological Transform for Product Configurator
14 Outline
- Application: Social Influence Analysis
- Application: Product/Service Optimization
- Application: Topic/Trend Detection
- Application: High-Dimensional Feature Selection
- Discussion
15 Topic Detection and Tracking (TDT) System Overview
- Detect current hot topics and predict future hot topics based on data collected from the internet
- The TDT System is composed of:
  - Crawler Extractor
    - Collects the latest data from the Internet for users' needs
    - Restricts the range of data collection from web data (focused crawler)
  - Topic Extractor
    - Discovers current hot topics from a set of text documents
  - Topic Detector
    - Predicts hot topics
16 Topic/Trend Detection System
- [System diagram: the users' keywords of interest drive a Web Crawler that collects HTML documents from social media into a web data DB; an Information Extractor extracts articles and metadata (title, author, content, etc.) from the semi-structured web content and passes text documents to the Topic Extractor. The Web Crawler and Information Extractor together form the Crawler Extractor.]
17 Focused Crawler: Classification
- Taxonomy Creation: start from an existing taxonomy such as Yahoo! or the Open Directory Project.
- Example Collection
- Taxonomy Selection and Refinement:
  - The system proposes the most common classes.
  - The user marks classes as GOOD.
  - The user changes the trees.
- Interactive Exploration:
  - The system proposes URLs found in a small neighborhood of the examples.
  - The user examines and includes some of these examples.
- Training:
  - Integrate the refinements into the statistical class model (classifier-specific action).
18 Focused Crawler: Distillation
- Distillation:
  - Identify relevant hubs by running a topic distillation algorithm.
  - Raise visit priorities of hubs and immediate neighbors (sketched below).
- Feedback:
  - Report the most popular sites and resources.
  - Mark results as useful/useless.
  - Send feedback to the classifier and distiller.
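The priority-raising step can be pictured with a small sketch: the crawl frontier is a priority queue, and hub pages plus their immediate neighbors get their visit priority boosted. The data structures and the boost value here are assumptions for illustration, not the distiller's actual mechanics.

```python
# Sketch of raising visit priorities of hubs and their immediate neighbors.
# frontier: list of (priority, url); lower priority means visited sooner.
import heapq

def reprioritize(frontier, hubs, neighbors, boost=10.0):
    favored = set(hubs)
    for hub in hubs:
        favored.update(neighbors.get(hub, []))   # include immediate neighbors
    boosted = [(prio - boost if url in favored else prio, url)
               for prio, url in frontier]
    heapq.heapify(boosted)                       # restore the queue invariant
    return boosted
```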
19 Extractor
- Given a Web page:
  - Build the HTML tag tree (sketched below)
  - Mine data regions
    - Mining data records directly is hard
  - Identify data records from each data region
  - Learn the structure of a general data record
    - A data record can contain optional fields
  - Extract the data
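A minimal sketch of the first step, building the HTML tag tree, using only Python's standard-library parser; data-region mining and record extraction are left out.

```python
# Build a simple HTML tag tree with the standard-library parser.
from html.parser import HTMLParser

class TagTreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = {"tag": "root", "children": []}
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "attrs": dict(attrs), "children": []}
        self.stack[-1]["children"].append(node)   # attach to current parent
        self.stack.append(node)                   # descend into the new node

    def handle_endtag(self, tag):
        if len(self.stack) > 1:
            self.stack.pop()                      # climb back up the tree

builder = TagTreeBuilder()
builder.feed("<html><body><table><tr><td>record</td></tr></table></body></html>")
print(builder.root)
```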
20 TDT Petri Net Simulation
- Topic Detection and Tracking
22 Crawler
23 Initial State
24 Accept user input
25 Validate user input
26 Refine user input
27 Train the system
28 Detect most popular topic
29 Extractor
30 Extractor activated
31 Generate HTML tag trees
32 Detect important data
33 Train the system with record
34 Extract data
35 Save data into knowledge base
36 Topic Detection and Tracking
37 Slow Intelligence Steps (in blue color; sketched below)
- Accept user request
- Send request data to TDT
- Enumerator generates combinations
- Eliminator selects the best method to fit our need
- Evaluate combinations
- Use concentrator to highlight the selected results
- Send the result to TDT
- Generate the instructions to the server
- Dispatcher gets the instruction
- Decide where we are going to send the instructions
- Send the instructions to the server
- End of simulation run
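As a toy stand-in for the simulation run, the steps listed above can be fired in order as transitions that pass a single token along a linear chain of places; the actual Petri net model may of course have richer structure (branching, guards, concurrent transitions).

```python
# Toy token-passing run of the listed Slow Intelligence steps. Each
# transition consumes the token from the current place and deposits it
# in the next one, mimicking a linear Petri net firing sequence.
TRANSITIONS = [
    "Accept user request",
    "Send request data to TDT",
    "Enumerator generates combinations",
    "Eliminator selects the best method to fit our need",
    "Evaluate combinations",
    "Use concentrator to highlight the selected results",
    "Send the result to TDT",
    "Generate the instructions to the server",
    "Dispatcher gets the instruction",
    "Decide where we are going to send the instructions",
    "Send the instructions to the server",
    "End of simulation run",
]

place = "start"                      # single token in the start place
for transition in TRANSITIONS:
    print(f"token at '{place}' -> fire '{transition}'")
    place = f"after: {transition}"   # token moves to the next place
```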
38 Outline
- Application: Social Influence Analysis
- Application: Product/Service Optimization
- Application: Topic/Trend Detection
- Application: High-Dimensional Feature Selection
- Discussion
39 Introduction
- High-dimensional feature selection is a hot topic in statistics and machine learning.
- Model the relationship between one response and p associated features, based on a sample of size n.
40 Math Formulation
- Let $y = (y_1, \ldots, y_n)^{\top}$ be a vector of responses and $x_1, \ldots, x_n \in \mathbb{R}^p$ be their associated covariate vectors.
- When $y_i \in \{-1, +1\}$ for the classification problem, we assume a logistic model
  $$\Pr(y_i = 1 \mid x_i) = \frac{1}{1 + \exp\{-(\beta_0 + x_i^{\top}\beta)\}}.$$
- We estimate the regression coefficient $\beta$ and the bias $\beta_0$ by minimizing the loss function
  $$\ell(\beta_0, \beta) = \frac{1}{n} \sum_{i=1}^{n} \log\bigl(1 + \exp\{-y_i(\beta_0 + x_i^{\top}\beta)\}\bigr).$$
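A small NumPy sketch of the loss above, assuming the labels are coded as -1/+1; `X` is the n x p design matrix.

```python
# Logistic loss for labels y_i in {-1, +1}; X has shape (n, p).
import numpy as np

def logistic_loss(beta0: float, beta: np.ndarray, X: np.ndarray, y: np.ndarray) -> float:
    margins = y * (beta0 + X @ beta)
    # log(1 + exp(-m)) per sample, computed stably with logaddexp
    return float(np.mean(np.logaddexp(0.0, -margins)))
```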
41 Application
- Supervised learning: the gene selection problem in bioinformatics
  - One wants to eliminate the irrelevant genes (features) to obtain a robust classifier.
  - One wants to know which genes are the most critical factors for the disease.
- [Figure: a data matrix of n samples (patients or healthy subjects), each with p gene expression levels, from which the important genes are selected.]
42 Challenges
- Dimensionality grows rapidly with interactions of the features:
  - In portfolio selection and network modeling, 2000 stocks involve over 2 million unknown parameters in the covariance matrix (2000 x 2001 / 2 = 2,001,000).
  - In protein-protein interaction studies, the sample size may be on the order of thousands, but the number of features can be on the order of millions.
- The goal is to construct effective methods to learn relationships between features and responses in high dimensions for scientific purposes.
43 Feature Selection Approach
- Main SIS procedure:
  - main_Enumerator
  - main_Eliminator
  - main_Adaptator
  - main_Propagator
  - main_Concentrator
  - time controller
- Sub procedure:
  - sub_enumerator
  - sub_concentrator
  - knowledge base
44 Main Enumerator
- Enumerate the p features.
- Among these features, some are relevant to the responses while others are not.
45 Main Eliminator
- Apply the Pearson correlation between each feature and the response, rank the correlation values from high to low, and eliminate the lowest-ranked features (see the sketch below).
- The number of features to retain is a pre-defined constant; the retained features form the selected top feature set.
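A sketch of this step in NumPy. The number of features to keep, called `d` here, stands in for the pre-defined constant mentioned above, and the ranking uses the absolute correlation, a common variant.

```python
# Main Eliminator sketch: keep the d features most correlated with y.
import numpy as np

def pearson_filter(X: np.ndarray, y: np.ndarray, d: int) -> np.ndarray:
    """Return the column indices of the d features with largest |Pearson corr| to y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc * yc[:, None]).sum(axis=0) / (
        np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12)
    return np.argsort(-np.abs(corr))[:d]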
46 Sub Enumerator
- Enumerate all feature selection algorithms in the knowledge base by applying them to the retained feature set, and select the top features chosen by each algorithm as that algorithm's candidate set (see the sketch below).
- The Knowledge Base stores the existing candidate algorithms.
  - We add L1-regularized regression, elastic-net regularized regression, and forward stepwise regression.
  - In principle, any feature selection algorithm can be put into the knowledge base.
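A sketch of the Sub Enumerator using scikit-learn, assuming it is available: each knowledge-base entry is fit on the pre-filtered features and its top-k features by absolute coefficient are kept. Only the L1 and elastic-net candidates are shown; forward stepwise regression could be added in the same spirit (for example via scikit-learn's SequentialFeatureSelector), and the hyperparameters shown are placeholders.

```python
# Sub Enumerator sketch: run each candidate algorithm on the retained
# feature set and record its top-k features. Hyperparameters are placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

KNOWLEDGE_BASE = {
    "l1": LogisticRegression(penalty="l1", C=0.1, solver="liblinear"),
    "elastic_net": LogisticRegression(penalty="elasticnet", l1_ratio=0.5,
                                      C=0.1, solver="saga", max_iter=5000),
}

def sub_enumerate(X, y, k):
    """Return {algorithm name: indices of its top-k features by |coefficient|}."""
    selections = {}
    for name, model in KNOWLEDGE_BASE.items():
        model.fit(X, y)
        coef = np.abs(model.coef_).ravel()
        selections[name] = np.argsort(-coef)[:k]
    return selections
```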
47 Sub Concentrator
- For each algorithm's selected feature set, we compute the loss function and choose the best algorithm with the minimum loss.
- The sub system then keeps the features selected by that best algorithm; we denote this as the current feature set.
48 Main Adaptor
- For every other feature among the total p features, we add it to the current feature set and compute the loss function.
49 Main Concentrator
- Rank all candidate features by the resulting loss from low to high, and select the top features with the smallest loss.
50 Main Propagator
- Add these top features to the current feature set to form the new feature set.
51 Timing Controller
- The timing controller controls the termination of the whole process. It sets a threshold K on the number of cycles.
- If the cycle count reaches K, the process stops after the sub concentration step and outputs the selected features.
- If the cycle count is below K, the process continues to the main adaptation step.
- The larger K is, the more accurate the feature selection result, but it needs more time to compute. Thus slow decision cycles can yield better performance in the long run.
52 General algorithm
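Since the slide's algorithm figure is not reproduced here, the following Python sketch pieces together the main and sub procedures as described on the preceding slides (eliminator, sub enumerator, sub concentrator, adaptor, concentrator, propagator, timing controller). It reuses the earlier sketches, the parameters d, k, m, and K are illustrative, and the authors' actual control flow may differ in its details.

```python
# Schematic end-to-end loop for the SIS feature selection approach.
# pearson_filter and sub_enumerate are the sketches given earlier;
# loss(X_subset, y) is any model-fit criterion (e.g. the logistic loss).
import numpy as np

def sis_feature_selection(X, y, d, k, m, K, loss):
    survivors = pearson_filter(X, y, d)                    # main eliminator
    selected = np.array([], dtype=int)
    for cycle in range(K):                                 # timing controller
        candidates = sub_enumerate(X[:, survivors], y, k)  # sub enumerator
        best = min(candidates,                             # sub concentrator
                   key=lambda a: loss(X[:, survivors[candidates[a]]], y))
        selected = np.union1d(selected, survivors[candidates[best]])
        if cycle == K - 1:                                 # stop after sub concentration
            break
        rest = np.setdiff1d(np.arange(X.shape[1]), selected)
        scores = [loss(X[:, np.append(selected, j)], y)    # main adaptor
                  for j in rest]
        top = rest[np.argsort(scores)[:m]]                 # main concentrator
        survivors = np.union1d(selected, top)              # main propagator
        selected = survivors
    return selected
```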
53 Experimental Results: Dataset description
- Leukemia dataset
  - Leukemia is a type of cancer of the blood. This dataset consists of 72 samples, including 47 acute lymphoblastic leukemia patients and 25 acute myeloid leukemia patients, with expression levels of 7129 human genes. The data is split into 38 training samples and 34 test samples.
- Colon cancer dataset
  - This dataset consists of 62 samples, including 40 tumor colon tissues and 22 normal colon tissues, with expression levels of 2000 human genes. The data is split into 32 training samples and 30 test samples.
54 Experimental protocol
- We compare our system with the three individual feature selection algorithms in the knowledge base.
- We report the number of errors and the balanced error rate (BER), as sketched below.
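The slide's original formula is not reproduced here; assuming the standard definition, the balanced error rate is the average of the two per-class error rates. A small sketch:

```python
# Number of errors and balanced error rate (BER) for binary labels 0/1.
# BER = (FP/(FP+TN) + FN/(FN+TP)) / 2, the mean of the two class error rates.
import numpy as np

def error_metrics(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n_errors = int(np.sum(y_true != y_pred))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    ber = 0.5 * (fp / (fp + tn) + fn / (fn + tp))
    return n_errors, float(ber)
```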
55 Experimental results
- Our method outperforms the individual algorithms.
- When we increase K, the number of cycles defined by the timing controller, the accuracy of our system improves. It is a tradeoff between running time and performance.
56 Experimental results
- Based on the biological background, these genes are critical for the leukemia disease:
  - Zyxin is known to interact with leukemogenic bHLH proteins. It is selected by both SIS (K=5) and SIS (K=10).
  - Cystatin C (CST3) and Cystatin A are two very important genes selected by SIS (K=10) but not by SIS (K=5), which indicates that a larger K leads to a more accurate result.
57 Outline
- Application: Social Influence Analysis
- Application: Product/Service Optimization
- Application: Topic/Trend Detection
- Application: High-Dimensional Feature Selection
- Discussion
58 Discussion
- Implemented Social Influence Analysis algorithms to find the best model based upon Slow Intelligence principles.
- Applied the Slow Intelligence principle to ontological filtering for Product and Service Selection.
- Modeled and simulated the Trend and Topic Detection system using Petri nets within the framework of the Slow Intelligence System.
- Studied a new feature selection application within the framework of the Slow Intelligence System; it leads to superior performance and can handle high-dimensional data.
59 Further Work
- Design a mechanism to dynamically update the knowledge base by applying the SIS approach to itself.
- Design a user-friendly interface to develop and manage an application system.
60 Q&A