Title: ?? ???? (Data Mining Techniques): An Overview
1 ?? ???? (Data Mining Techniques): An Overview
- ????????
- ? ??
- jkim_at_kordic.re.kr
2 ????? ?? ?? ??
???
????
??
- ????
- Point of Sale
- ATM
- ????
- ????
- ??
- ????
- ????
- ??????
- A?? ???? 80? B??? ????
- ????? ??? ???? 6??? ??
- A??? ?? ??? B??? 2?
- ?? ??? ??? ??
- ????? ?
- ??? ??
- ??? ?? ??? ?
- ????? ????? ?
- ??? ?? ???? ?
- ??? ?
3 Data Mining ?? ?
- ??? ??????
- ??? ??? ????
- ???? ?? ??? ????
- ??? ??????? ????
- ???
- ??? ????? ??, ??, ??, ??,??? ???
4 ? ?
- ???? ??? ?? ??? ??
- ?????? ???? ?? ??
- ??? ??? ???
- ??, ??? ?? (???)
- ??? ??? ??
- ????? ?? ??
- ????(Machine Learning) ??? ??
- Knowledge Discovery, Knowledge Extraction, Machine Learning, Data/Pattern Analysis
5 Data Mining ??
- 1) ??? ??
- - ??? ??? ?? ??
- 2) ??? ??
- - ?? ?? ??? ?? ?? ??
- - ???, ???, ???,
- 3) ?? ??
- 4) ?? ??
- - ??? (??), ?? ??
- - ??, ???
6 Data Mining ??
Select
Transform
Mine
Assimilate
????
????
????
?? ? ??
DATABASE
??? ???
Extracted Data
Selected Data
Assimilated data
Transformed Data
Visualization
???
7 ? ? ? ?
Target Audience
Customers who purchased frozen orange juice in 12 oz cans
- Purchase History
- Point of Sale Data
- Survey data
60? ??? ??? ?? ??
Loyal customers (buy the same brand 80% of the time)
8 Data Mining?? ??
- ??? ??, ??? ??? ???
- ??? ??????? ??? ???
- ??? ?? ??? ???
9 ??? ??? ??? ???
Data Mining?? ??
- Association(??? ??)
- Characterization(????)
- Classification(??)
- Summarization(??)
- Clustering(???)
- Sequential Pattern Discovery(??????)
- Trend(?? ??)
- Deviation Detection(??????)
10 ??? DB? ??? ???
Data Mining????
- Relational DB
- transactional DB
- Object-oriented DB
- Spatial DB
- Temporal DB
- Textual vs Multimedia
- Heterogeneous
11 ?? ??? ???
Data Mining ????
- ????? ????? ???
- ??? ??, rule induction
- ????/??? ??
- Statistical Classification(supervised learning)
- Clustering Techniques(unsupervised learning)
- Time Series Analysis
- ???? ??
- ????? ??? functional mapping? ??
- ??? ?? algorithm? ??
12 ??? ?? ??
- Transaction DB? ????
- RULE ??? ??
- A => B [support, confidence]
- support = (transactions containing A and B) / (total transactions)
- confidence = (transactions containing A and B) / (transactions containing A)
- Example: milk => bread [7%, 70%]
- ?? 1 ??????? ????
- ?? 2 ??? ??????
- ???? ??? ???? ?? ??
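The support and confidence definitions on this slide can be sketched in Python as follows. This is a minimal illustration, not the method of any particular tool: the function names and the market-basket data are made up for the example and do not come from the original slides.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(A and B) / support(A) for the rule A => B."""
    both = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return both / base if base else 0.0


# Hypothetical market-basket data (illustrative only).
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"milk"},
    {"bread"},
    {"butter"},
]

rule_support = support(transactions, {"milk", "bread"})          # 2 of 5 baskets
rule_confidence = confidence(transactions, {"milk"}, {"bread"})  # 2 of 3 milk baskets
print(f"milk => bread [support={rule_support:.0%}, confidence={rule_confidence:.0%}]")
```

A rule is normally reported only when both values clear user-chosen minimum thresholds.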
13 ??? ?? ??
Association Rules with Maximum support of 50
?? ??
14 Data Classification
- ?????? ??? ??? ??
- ????? Class-label ? feature set?? ??
- ????(Supervised Learning)
- ????? ??? ??, ??? ??
- ??? ??? ??? ? ??? ?? ??
- ?? Credit Approval, ?? ??
- ? ??? ????? ????? ?? ???? ?? ?? ??
- Decision Tree, ???, ??? ???
15 Classification Example
Classifier
Class 1 ??? ??
Class 2 ??? ??
Class 3 ??? ??
16 Decision Tree Classifier
- ?????? Decision Tree ???? ??
- ID3 algorithm
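ID3 grows a decision tree by repeatedly splitting on the attribute with the highest information gain. The sketch below shows only that attribute-selection step, under the assumption of categorical attributes. The toy credit records, attribute names, and function names are hypothetical and not taken from the slides.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting `rows` on `attribute`."""
    total = len(rows)
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[attribute] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """Attribute that ID3 would choose for the next split."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


# Hypothetical training records (illustrative only).
rows = [
    {"income": "high", "owns_home": "yes"},
    {"income": "high", "owns_home": "no"},
    {"income": "low", "owns_home": "yes"},
    {"income": "low", "owns_home": "no"},
]
labels = ["approve", "approve", "reject", "reject"]

root = best_attribute(rows, labels, ["income", "owns_home"])
print("split on:", root)
```

A full ID3 implementation would apply the same selection recursively to each resulting subset until the labels in a subset are all the same or no attributes remain.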
17 Neural Network Classifier
- ??? ?????? ??? ???? ??
- ??? Neuron? ????? ???
- ?? ???? ??
- Error-back-propagation ????????
- ??? Functional Mapping? ?? ???
18 Neural Network Classifier
Input layer
Hidden layer
Output layer
19 Sequential Pattern Discovery
- Transaction ????? ??? ?? ??
- ??
- ??????? ?? ?? ??
- ???? ?? ??
- ?? ??? ?? ?? ??, ??
- ??? ??? ?? ??, ??
- ???
- ??? ??? ??
- Hidden Markov Model for doubly stochastic process modeling
20 Sequential Pattern Example
Sequential Pattern in DataBase
21 Similar Time Series
Matching Curve Found
22 Clustering (???)
- ??? ???? ? ???? ??
- ????? ??? ???
- Unsupervised Learning Algorithms
- Symbolic
- Neural Network based (Kohonen Feature Map)
- ??
- ???? ??? ??? - ?? ??? ??
- ??? ???, ????? ?? ?? ????
23 Clustering Example
24 Symbolic Clustering
Similarity 2
Similarity 2
Diff 3
Diff 2.83
Diff 3
Similarity 3
Total score for this cluster partition = average similarity + average difference = 2.33 + 2.94 = 5.27
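The arithmetic behind the total score can be checked with a few lines of Python. The similarity and difference values are the labels from the slide's figure. Rounding each average to two decimals before adding them is an assumption made here to reproduce the slide's 5.27 (the unrounded sum is about 5.28).

```python
# Labels read off the slide's figure.
similarities = [2, 2, 3]
differences = [3, 2.83, 3]

# Round each average to two decimals, as the slide appears to do.
avg_similarity = round(sum(similarities) / len(similarities), 2)
avg_difference = round(sum(differences) / len(differences), 2)
total_score = avg_similarity + avg_difference

print(f"{avg_similarity} + {avg_difference} = {total_score:.2f}")
# prints: 2.33 + 2.94 = 5.27
```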
25 Data Mining Interface
- Interactive Mining
- GUI? ?? Task? ??
- Data Mining Query Language
- find association rules
- related to gpa, birth_place, family_income
- from student
- where major = "CS" and birth_place = "Seoul"
- with support threshold = 0.05
- with confidence threshold = 0.7
26 Kohonen's Feature Map
- ???? ??? ??? ??
- ??? ??? ??? ???? ???? ??
- ??? Feature Map??? ? ??? ??
- ???? ??
- Feature Map ?? ??? ?? Difference
- ????? ?? ??
- 1) ??? ?? X? ?? ? ?? ?? N? ??
- 2) N? ? N? ???? ????? X? ???? ??
- 3) ?? ??? ??? ??? ??? ?? ?? ??
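The three steps on this slide are only partly legible. The sketch below assumes they describe the usual Kohonen self-organizing map loop: find the node N closest to input X, move N and its neighbors toward X, and repeat with a shrinking learning rate. The function names, parameters, one-dimensional map layout, and sample data are all illustrative assumptions.

```python
import random


def best_matching_node(nodes, x):
    """Index of the node whose weight vector is closest to input x."""
    def sq_dist(w):
        return sum((wi - xi) ** 2 for wi, xi in zip(w, x))
    return min(range(len(nodes)), key=lambda i: sq_dist(nodes[i]))


def train_som(data, n_nodes=4, epochs=50, lr=0.5, radius=1, seed=0):
    """Train a 1-D Kohonen map; returns the list of node weight vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    nodes = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    for epoch in range(epochs):
        # Learning rate decays linearly over the epochs.
        rate = lr * (1 - epoch / epochs)
        for x in data:
            # Step 1: find the winning node N for input X.
            winner = best_matching_node(nodes, x)
            # Step 2: pull N and its map neighbors toward X.
            lo = max(0, winner - radius)
            hi = min(n_nodes, winner + radius + 1)
            for i in range(lo, hi):
                nodes[i] = [w + rate * (xi - w) for w, xi in zip(nodes[i], x)]
        # Step 3: repeat with the reduced rate on the next epoch.
    return nodes


# Hypothetical 2-D inputs forming two loose groups (illustrative only).
data = [[0.1, 0.1], [0.15, 0.2], [0.9, 0.85], [0.8, 0.9]]
nodes = train_som(data)
for x in data:
    print(x, "-> node", best_matching_node(nodes, x))
```

Practical implementations usually also shrink the neighborhood radius over time and weight neighbors less than the winner. Both refinements are omitted here to keep the sketch short.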
27 ???? ??? ?? ??
- ?????? ?
- ??? ????? ??
- ? ??? ???? ?
- ?? ??? ??? ???? ??? ????? ?
- ?? ??? ?? ??, ??? ???? ??
- ?? ??? ??? ????? ?
- ?? ??
- ??? ??????? ????? ???? ???????
- ?? ??, ????, ?? ??, ?? ??
28 ??? ?? ??
Scoring???
???? ???? ???? ??? ??
?? ?? ?? ?? ????
????
? ??? ????
29 ??????? ??? Overview
???? ???? ???? ?????
???? DB
Credit ???
Decision Tree
??? ??
???? ??
?? ??? Scoring (Neural Network)
Scoring ???
Credit ?? ? ???? ??
30 ?? ?? ???? ????
- LG?????
- ???? ????? ??? ??
- ?? ???? ???? ???? ?? ?? ??
- ????? ?? ??
- ????, ????, ??? ??, ??? ??
- ??? ???? Fraud Score ??
- 1995? LG???? ???? 14???? ??
- ?? ??? ??
31 IBM Intelligent Miner
32 ?? ??
- Mining Business Databases, Brachman et al., CACM, Vol. 39, No. 11, 1996
- Mining Scientific Data, Fayyad et al., CACM, Vol. 39, No. 11, 1996
- Quest (IBM Almaden)
- http://www.almaden.ibm.com/cs/quest
- DBMiner (Simon Fraser Univ.)
- http://db.cs.sfu.ca/DBMiner
- KDD (GTE)
- http://info.gte.com/kdd/index.html
- International Conference on Knowledge Discovery and Data Mining
- Advances in Knowledge Discovery and Data Mining, MIT Press, 1996
33 ? ?
- ??? ?? ?? => ??, ??? ?? ??
- ??????? ??? ??
- ????, DB ??? ?? ??
- ???? ??? ??? ?? ??? ??
- ?? ?? ??? ?? ?
- ??? ?????? ?? ?? ??
- Hot Research Item