No slide title

Transcript and Presenter's Notes


1
Semantic Web Usage Mining: Overview and Case Studies
Bettina Berendt
Humboldt University Berlin, Institute of Information Systems, www.wiwi.hu-berlin.de/berendt
2
Goals and top-level questions

Make the world's knowledge available to the world
How do people discover knowledge on the Web?
How can more knowledge sources contribute to the Web?

3
Approaches to the current Web's biggest challenges: lots of data, human-understandable
Web Mining extracts implicit knowledge
The Semantic Web makes knowledge machine-understandable
Berendt, Hotho, Stumme, Proc. ISWC 2002; Berendt, Mladenic, et al. (Eds.), From Web to Semantic Web, Springer LNAI 2004; Berendt, Grobelnik, Mladenic et al. (Eds.), Semantics, Web, and Mining, Springer LNAI 2006
4
Agenda
Web Mining
Why?
5
1. What should I buy?
6
2. Where do I find relevant information on ...?
7
3. What do people do there?
8
4. How can a site be made usable for a worldwide audience?
9
5a. Why go to a shop ...

... if everything is available on the Internet?

10
5b. What is my site worth for my business?
11
6. How can we help people become active members of the knowledge society, i.e. help them to contribute content?
12
Agenda
Web Mining
How?
13
Web Mining

Knowledge discovery (aka data mining):
the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data. [1]
Web Mining:
the application of data mining techniques to the content, (hyperlink) structure, and usage of Web resources.

Web mining areas: Web content mining, Web structure mining, Web usage mining
[1] Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., Uthurusamy, R. (Eds.) (1996). Advances in Knowledge Discovery and Data Mining. Boston, MA: AAAI/MIT Press
14
Data analysis: the textbook version

The meaning of attributes is clear
The meaning of attribute values is clear
→ Data modelling can be applied directly (e.g., regression, classification, clustering, association-rule discovery)

(A simplified extract from the adult dataset in
the UCI machine learning repository)
15
Data analysis: the reality → data mining / knowledge discovery process

...
p3ee24304.dip.t-dialin.net - - [19/Mar/2002:12:03:51 +0100] "GET /search.html?t=jane%20austen&SID=023785&ord=asc HTTP/1.0" 200 1759
p3ee24304.dip.t-dialin.net - - [19/Mar/2002:12:05:06 +0100] "GET /search.html?t=jane%20austen&m=video&SID=023785&ord=desc HTTP/1.0" 200 8450
p3ee24304.dip.t-dialin.net - - [19/Mar/2002:12:06:41 +0100] "GET /view.asp?id=3456&SID=023785 HTTP/1.0" 200 3478
...

What is the meaning of the attributes?
What is the meaning of the attribute values?
→ Data modelling is only one part!

CRISP-DM
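The step from raw log lines to attributes can be shown with a minimal Python sketch. It splits one request into fields and decodes its query string, assuming the lines above follow the Common Log Format. The regular expression and the helper name `parse_request` are illustrative and not part of any tool discussed in this talk. Even after parsing, the log does not say what the parameters `t`, `m`, `SID` and `ord` mean.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Common Log Format: host ident authuser [date] "request" status bytes
CLF = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_request(line):
    """Split one log line into fields and decode the URL's query parameters."""
    match = CLF.match(line)
    if match is None:
        raise ValueError(f"not a Common Log Format line: {line!r}")
    record = match.groupdict()
    parts = urlsplit(record["url"])
    record["path"] = parts.path
    # parse_qs returns a list of values per key; keep the first one
    record["params"] = {k: v[0] for k, v in parse_qs(parts.query).items()}
    return record

line = ('p3ee24304.dip.t-dialin.net - - [19/Mar/2002:12:05:06 +0100] '
        '"GET /search.html?t=jane%20austen&m=video&SID=023785&ord=desc HTTP/1.0" 200 8450')
r = parse_request(line)
print(r["path"], r["params"])
# /search.html {'t': 'jane austen', 'm': 'video', 'SID': '023785', 'ord': 'desc'}
```

The parser recovers the syntax. Interpreting `t` as a search term or `ord` as a sort order is an assumption the analyst must add.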
16
Where does semantics come in?
Semantics
17
Agenda
Semantic Web
How?
18
What is an ontology?

Definition: Core ontology with axioms
A structure $\mathcal{O} := (C, \leq_C, R, \sigma, \leq_R, \mathcal{A})$ consisting of
two disjoint sets $C$ (concept identifiers) and $R$ (relation identifiers),
a partial order $\leq_C$ on $C$ (concept hierarchy or taxonomy),
a function $\sigma\colon R \to C^+$ (signature), where $C^+$ is the set of all finite tuples of elements in $C$,
a partial order $\leq_R$ on $R$ (relation hierarchy), where $r_1 \leq_R r_2$ implies $|\sigma(r_1)| = |\sigma(r_2)|$ and $\pi_i(\sigma(r_1)) \leq_C \pi_i(\sigma(r_2))$ for all $1 \leq i \leq |\sigma(r_1)|$, with $\pi_i$ the projection on the $i$-th component,
a set $\mathcal{A}$ of axioms in a logical language $\mathcal{L}$.

an explicit specification of a shared conceptualisation (Gruber, 1993)
Stumme, Hotho, Berendt, Journal of Web
Semantics, 2006, and sources there
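The definition can be turned into a small data structure. The Python sketch below checks the mechanically testable conditions:

- C and R are disjoint.
- The signature maps every relation to a non-empty tuple of concepts.
- The relation hierarchy is compatible with the concept hierarchy.

All concept and relation names are hypothetical and loosely echo the portal ontology used later. The sketch assumes `concept_order` lists generating pairs whose reflexive-transitive closure is the concept hierarchy. Antisymmetry and the axiom set A are not checked.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class CoreOntology:
    """Sketch of O = (C, <=_C, R, sigma, <=_R, A); the axiom set A is left out."""
    concepts: set[str]                     # C
    concept_order: set[tuple[str, str]]    # generating pairs (c1, c2) meaning c1 <=_C c2
    relations: set[str]                    # R
    signature: dict[str, tuple[str, ...]]  # sigma: R -> C^+
    relation_order: set[tuple[str, str]]   # pairs (r1, r2) meaning r1 <=_R r2

    def leq_c(self, c1: str, c2: str) -> bool:
        """c1 <=_C c2 in the reflexive-transitive closure of concept_order."""
        frontier, seen = [c1], {c1}
        while frontier:
            current = frontier.pop()
            if current == c2:
                return True
            for lower, upper in self.concept_order:
                if lower == current and upper not in seen:
                    seen.add(upper)
                    frontier.append(upper)
        return False

    def is_well_formed(self) -> bool:
        """Check the mechanically testable conditions of the definition."""
        if not self.concepts.isdisjoint(self.relations):
            return False  # C and R must be disjoint
        if set(self.signature) != self.relations:
            return False  # sigma must be defined on exactly R
        if not all(sig and set(sig) <= self.concepts for sig in self.signature.values()):
            return False  # sigma maps into C^+, i.e. non-empty tuples over C
        for r1, r2 in self.relation_order:
            if r1 not in self.relations or r2 not in self.relations:
                return False
            s1, s2 = self.signature[r1], self.signature[r2]
            # r1 <=_R r2 requires equal arity and component-wise <=_C
            if len(s1) != len(s2) or not all(self.leq_c(a, b) for a, b in zip(s1, s2)):
                return False
        return True


onto = CoreOntology(
    concepts={"Person", "Researcher", "Organization"},
    concept_order={("Researcher", "Person")},
    relations={"worksAt", "headOf"},
    signature={"worksAt": ("Person", "Organization"),
               "headOf": ("Researcher", "Organization")},
    relation_order={("headOf", "worksAt")},
)
print(onto.is_well_formed())  # True
```

Declaring `worksAt` as a sub-relation of `headOf` instead would fail the check. The reason is that `Person` is not below `Researcher` in the concept hierarchy.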
19
Agenda
20
Semantics of requests, Step 1: Domain ontology

Community portal ka2portal.aifb.uni-karlsruhe.de, ontology-based:
Knowledge base in F-Logic
Static pages: annotations
Dynamic pages: generated from queries
Queries also in F-Logic
Logs contain these queries

Oberle, Berendt, Hotho, Gonzalez, Proc. AWIC 2003
21
Semantics of requests, Step 2: Modelling requests and sessions-as-sets

Figure: excerpt of the domain ontology with the concepts and relations RESEARCHER, PERSON, PROJECT, PUBLICATION, RESEARCHTOPIC, EVENT, ORGANIZATION, RESEARCHINTEREST, LASTNAME, TITLE, ISABOUT, EVENTS, EVENTTITLE, WORKSATPROJECT, AUTHOR, AFFILIATION, ISWORKEDONBY, PROGRAMCOMMITTEE, EMPLOYS

An example query with concepts and relations:
FORALL N,PEOPLE <- PEOPLE:Employee[affiliation->>"http://www.anInstitute.org"] and PEOPLE:Person[lastName->>N].
Query: feature vector of concepts + relations → Session: feature vector of concepts + relations, summed over all queries in the session
Clustering, association rules, classification, ...
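Sessions-as-sets can be sketched in a few lines of Python. The sketch assumes each query has already been reduced to the list of ontology concepts and relations it mentions; extracting them from the F-Logic query is not shown. The vocabulary is hypothetical. A query vector counts occurrences, and a session vector is the component-wise sum over the session's queries. The result can then be passed to clustering or association-rule mining.

```python
from collections import Counter

# Hypothetical vocabulary of ontology concepts and relations (fixes the vector layout).
VOCABULARY = ["Person", "Employee", "Publication", "affiliation", "lastName", "author"]

def query_vector(query_entities):
    """Feature vector of one query: occurrences of each ontology entity."""
    counts = Counter(query_entities)
    return [counts[entity] for entity in VOCABULARY]

def session_vector(session):
    """Session vector: component-wise sum of the vectors of all its queries."""
    total = [0] * len(VOCABULARY)
    for query_entities in session:
        total = [t + q for t, q in zip(total, query_vector(query_entities))]
    return total

session = [
    ["Employee", "affiliation", "Person", "lastName"],  # entities of the example query
    ["Publication", "author", "Person"],                # a second, hypothetical query
]
print(session_vector(session))  # [2, 1, 1, 1, 1, 1]
```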
22
Semantics of sequences, Step 3: Strategy pattern discovery

An ontology of navigation strategies
Define strategy templates as regular expressions
of requests (mapped to ontological entities)
of transitions (between ontological entities)
Example: .* search .* individual
Discover strategies by learning a strategy trie

Figure: excerpt of a learned strategy trie with node counts, e.g. affiliationSearch (629), topicSearch (312), repetition (402), refinement (113), individual (112), repetition (295)
Berendt & Spiliopoulou, VLDB Journal, 2000; Berendt, Data Mining and Knowledge Discovery, 2002
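Both ingredients of this step can be sketched in Python. The sketch assumes each session has already been mapped to a sequence of strategy labels; the labels below are hypothetical.

- A prefix trie whose nodes count how many sessions share a prefix.
- A template written as a regular expression over the label sequence. Here it requires a search step followed, possibly after other steps, by an individual page.

This is a simplified stand-in, not the WUM implementation.

```python
import re

def build_trie(sessions):
    """Prefix trie over sessions; each node counts the sessions sharing its prefix."""
    root = {"count": 0, "children": {}}
    for steps in sessions:
        node = root
        node["count"] += 1
        for step in steps:
            node = node["children"].setdefault(step, {"count": 0, "children": {}})
            node["count"] += 1
    return root

# Hypothetical encoding of a template "search, then (eventually) individual page".
TEMPLATE = re.compile(r"\bsearch\b.*\bindividual\b")

def matches_template(steps):
    """True if the label sequence contains a search step later followed by 'individual'."""
    return TEMPLATE.search(" ".join(steps)) is not None

sessions = [
    ["search", "refinement", "individual"],
    ["search", "repetition"],
    ["search", "individual"],
    ["browse", "individual"],
]
trie = build_trie(sessions)
print(trie["children"]["search"]["count"])         # 3
print(sum(matches_template(s) for s in sessions))  # 2
```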
23
NB: For more exploratory analyses: the Web Usage Miner WUM

select t
from node a b, template a * b as t
where a.url startswith "SEITE1-"
and a.occurrence = 1
and b.url contains "1SCHULE"
and b.occurrence = 1
and (b.support / a.support) > 0.2

Spiliopoulou, 1999; Berendt & Spiliopoulou, VLDB Journal, 2000
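A rough Python approximation makes the query's semantics concrete. It searches for the following pattern:

- A page starting with "SEITE1-" is followed later in the session by a page containing "1SCHULE".
- Only first occurrences count.
- A pair is kept if the ratio of supports exceeds 0.2.

The sketch assumes support is counted in sessions, which simplifies WUM's aggregated-tree semantics. The function name and sample sessions are hypothetical.

```python
from collections import Counter

def mint_like(sessions, a_prefix="SEITE1-", b_part="1SCHULE", min_ratio=0.2):
    """Pairs (a, b) where a page starting with a_prefix is later followed by a page
    containing b_part; keep pairs with support(a, b) / support(a) > min_ratio.
    Only the first occurrence of each page in a session counts (occurrence = 1)."""
    a_support, ab_support = Counter(), Counter()
    for pages in sessions:
        first_seen = {}
        for i, page in enumerate(pages):
            first_seen.setdefault(page, i)  # index of the first occurrence
        for a, i in first_seen.items():
            if not a.startswith(a_prefix):
                continue
            a_support[a] += 1
            for b, j in first_seen.items():
                if j > i and b_part in b:
                    ab_support[(a, b)] += 1
    return {(a, b): n / a_support[a] for (a, b), n in ab_support.items()
            if n / a_support[a] > min_ratio}

sessions = [
    ["SEITE1-start", "x", "1SCHULE-a"],
    ["SEITE1-start", "y"],
    ["SEITE1-start", "1SCHULE-a"],
    ["z", "1SCHULE-a"],
]
print(mint_like(sessions))  # {('SEITE1-start', '1SCHULE-a'): 0.6666666666666666}
```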
24
Semantics of sequences, Step 4: Strategy pattern evaluation

Use strategy pattern statistics to
derive descriptive measures of patterns:
support, confidence
popularity, effectiveness, efficiency
apply inferential statistics to compare patterns

Berendt, Data Mining and Knowledge Discovery, 2002
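One standard option for the inferential step is Pearson's chi-square test on a contingency table of strategy versus outcome. The Python sketch below covers only the 2×2 case, without continuity correction. It uses the closed form of the chi-square survival function for one degree of freedom. The counts are invented for illustration and are not results from the cited study.

```python
import math

def chi_square_2x2(table):
    """Pearson's chi-square test (no continuity correction) for a 2x2 table.

    Rows: two strategies; columns: sessions that reached / did not reach the goal."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # with 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: strategy A reached the goal in 40 of 100 sessions, B in 25 of 100.
chi2, p = chi_square_2x2([[40, 60], [25, 75]])
print(round(chi2, 3), round(p, 4))  # 5.128 0.0235
```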
25
Communication: Visual data mining, Step 5: Mapping an ontological relation over concepts to a linear order and to visual variables
Figure: y-axis "Concreteness" (levels: First search page; More constraints on search; Goal: individual page); transitions labelled Reach goal, Refine search, Remain unspecific, Abandon search; x-axis "Time"
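The data behind such a plot can be sketched in Python. The sketch assumes the linear order of concreteness given by the figure labels and sessions already mapped to those concepts; the sample data are hypothetical.

- Nodes count requests per time step and level.
- Edges count transitions between consecutive steps, which a plotting tool could render as line widths.

A session that simply ends corresponds to abandoning the search.

```python
from collections import Counter

# Assumed linear order of concreteness (0 = least concrete), taken from the figure labels.
CONCRETENESS = {"first search page": 0, "more constraints on search": 1, "individual page": 2}

def stratogram(sessions):
    """Counts a stratogram-style plot could be drawn from.

    nodes[(t, level)]: requests at time step t on that concreteness level
    edges[(t, from_level, to_level)]: transitions from step t to step t + 1"""
    nodes, edges = Counter(), Counter()
    for pages in sessions:
        levels = [CONCRETENESS[p] for p in pages]
        for t, level in enumerate(levels):
            nodes[(t, level)] += 1
        for t in range(len(levels) - 1):
            edges[(t, levels[t], levels[t + 1])] += 1
    return nodes, edges

sessions = [
    ["first search page", "more constraints on search", "individual page"],
    ["first search page", "first search page"],
    ["first search page", "individual page"],
]
nodes, edges = stratogram(sessions)
print(nodes[(0, 0)], edges[(0, 0, 2)])  # 3 1
```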
26
Ad Q.3: What do people do there?
27
Communication: Visual data mining, Step 5: Example
Berendt, Data Mining and Knowledge Discovery, 2002; Berendt, Postproc. WebKDD 2001
28
An online shop with a difference
Berendt, Günther, Spiekermann, Communications of the ACM, 2005
29
Communication: Visual data mining, Step 6: Visual abstraction → new semantic patterns
Figure: y-axis "Closeness to product"; panels "Shopping for cameras" and "Shopping for jackets"
Berendt, Data Mining and Knowledge Discovery, 2002; Berendt, Postproc. WebKDD 2002
30
Ad Q.4: Worldwide usability
31
The impact of language and domain knowledge on search option choice

2 studies on the use of search options in the eHealth site:
Webserver log: 3,928,235 requests / 277,809 sessions from 188 countries
83.2% first-language users, 16.8% second-language users
Webserver log and questionnaire: 165 (106) people from 34 countries
84.9% first-language users, 15.1% second-language users
10.4% physicians, 89.6% patients
Results:
Search engine, alphabetical search: in particular first-language users, physicians
Content-organized search: in particular second-language patients
→ Domain knowledge compensates for limited language knowledge.

Kralisch & Berendt, New Review of Hypermedia and Multimedia, 2005
32
Semantics: Service ontology
33
Results on frequent search patterns
Alphabetical search: hub-and-spoke → only linguistic relations (6.4)
Diagnoses are "hubs" for navigation (5.3, 4)
Localization search: linear / depth-first → search refinement, medical knowledge (5)
Berendt, Postproc. WebKDD 2005
34
Mining with ISOVIS: Semantic drill-down, visualizing detail and context
Berendt, Postproc. WebKDD 2005
35
Ad Q.5: Shopping behaviour and Web site value
36
5. What is my site worth for my business?

A site is often only a part of a distribution
strategy / one channel to reach customers.
What are the conversion rates (how many visitors
become buyers etc.)?
What are the cross-channel effects?

Figure: Internet market shares (BCG 2002)
37
Semantics: The buying process as a service ontology
38
Mining (example): Association rules for investigating preferences in the buying process

A study based on 100K sessions and 13K transactions from 2002 at a leading European retailer of consumer electronics showed, among other things:
Online payment → Direct delivery (s = 0.27, c = 0.97)
< 1/3 traditional online users!
Online payment → In-store pickup (s = 0.02, c = 0.03)
Cash on delivery → Direct delivery (s = 0.02, c = 0.03)
In-store payment → In-store pickup (s = 0.69, c = 0.94)
→ The site is primarily used for information search.

→ Key performance indicators (Web metrics), e.g.
conversion efficiency
offline conversion
effectiveness and efficiency of search options

Berendt & Spiliopoulou, VLDB Journal, 2000; Berendt, Data Mining and Knowl. Discovery, 2002; Teltzrow & Berendt, Proc. WebKDD 2003
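Support s and confidence c, as used in these rules, can be computed as follows. Support is the share of all transactions that contain both sides of the rule. Confidence is the share of antecedent transactions that also contain the consequent. The Python sketch uses hypothetical transactions, not the retailer's data, and single-item rules only.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support and confidence of the association rule antecedent -> consequent."""
    n = len(transactions)
    both = sum(1 for t in transactions if antecedent in t and consequent in t)
    with_antecedent = sum(1 for t in transactions if antecedent in t)
    support = both / n
    confidence = both / with_antecedent if with_antecedent else 0.0
    return support, confidence

# Hypothetical transactions (payment option, delivery option); not the study's data.
transactions = (
    [{"online payment", "direct delivery"}] * 3
    + [{"online payment", "in-store pickup"}] * 1
    + [{"in-store payment", "in-store pickup"}] * 6
)
print(rule_metrics(transactions, "online payment", "direct delivery"))   # (0.3, 0.75)
print(rule_metrics(transactions, "in-store payment", "in-store pickup"))  # (0.6, 1.0)
```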
39
Agenda
Web Mining
(Semantic) Web
40
Step 6: Deployment of results, Example 1: Using results for site improvement

Path analysis, metrics, and χ² analysis showed:
All search criteria were approx. equally effective
Location-based search was most popular
City-based search was most efficient ... but least popular
→ Modify site design to make efficient search more popular

Berendt & Spiliopoulou, VLDB Journal, 2000; Berendt, Data Mining and Knowl. Discovery, 2002; Spiliopoulou & Pohle, DMKD, 2001
41
Step 6: Deployment of results, Example 2: Using results for personalization
Kralisch, Eisend, Berendt, Proc. HCI
International, 2005
42
Step 6: Deployment of results, Example 3: A privacy-preserving Web-metrics analysis service
Teltzrow, Preibusch, Berendt, IEEE EC Conf.
2004
43
Agenda
Web Mining
... <BIBLIOGRAPHY><FLOAT><PAGENUMBER>136</PAGENUMBER></FLOAT> <HEAD>Literaturverzeichnis</HEAD>
<CITATION WORKTYPE="journal" PUBLISHED="PUBLISHED"><CUT ID="bib-15-">1 </CUT><WORKAUTHOR>Agarwal, R. Krueger, B. P. Scholes, G. D. Yang, M. Yom, J. Mets, L. Fleming, G. R.</WORKAUTHOR>U<ARTICLETITLE>ltrafast energy transfer in LHC-II revealed by three-pulse photon echo peak shift measurements</ARTICLETITLE>, <WORKTITLE>J. Phys. Chem. B</WORKTITLE>, <PUBDATE>2000</PUBDATE>, <NUMBER>104</NUMBER>, <PAGES>2908</PAGES>, </CITATION> ...
Semantic Web
44
Data and metadata in the Digital Library EDOC

<BIBLIOGRAPHY><FLOAT><PAGENUMBER>136</PAGENUMBER></FLOAT>
<HEAD>Literaturverzeichnis</HEAD>
...
<CITATION WORKTYPE="journal" PUBLISHED="PUBLISHED">
<CUT ID="bib-45-">2 </CUT><WORKAUTHOR>Albrecht, T. F. Bott, K. Meier, T. Schulze, A. Koch, M. Cundiff, S. T. Feldmann, J. Stolz, W. Thomas, P. Koch, S. W. Göbel E. O.</WORKAUTHOR> <ARTICLETITLE>Disorder mediated biexcitonic beats in semiconductor quantum wells</ARTICLETITLE>, <WORKTITLE>Phys. Rev. B</WORKTITLE>, <PUBDATE>1996</PUBDATE>, <NUMBER>54</NUMBER>, <PAGES>4436</PAGES>, </CITATION> ...
(http://edoc.hu-berlin.de/diml/dtd/xdiml.dtd)

45
Authoring support for document servers

Surveys and a Web usage mining analysis of a digital publishing service showed:
Metadata creation is one of the main barriers to contribution.
Reasons include deficiencies in
information flow
understanding and use of structured search
education in structured writing
HCI aspects

→ Marketing
→ Education
Berendt, Brenstein, Li, Wendland, Proc. ETD 2003; Berendt, Proc. AAAI Spring Symposium KCVC, 2005
46
and this has consequences (problems of the fully manual approach)

<BIBLIOGRAPHY><FLOAT><PAGENUMBER>136</PAGENUMBER></FLOAT>
<HEAD>Literaturverzeichnis</HEAD>
<CITATION WORKTYPE="journal" PUBLISHED="PUBLISHED">
<CUT ID="bib-15-">1 </CUT><WORKAUTHOR>Agarwal, R. Krueger, B. P. Scholes, G. D. Yang, M. Yom, J. Mets, L. Fleming, G. R.</WORKAUTHOR>U<ARTICLETITLE>ltrafast energy transfer in LHC-II revealed by three-pulse photon echo peak shift measurements</ARTICLETITLE>, <WORKTITLE>J. Phys. Chem. B</WORKTITLE>, <PUBDATE>2000</PUBDATE>, <NUMBER>104</NUMBER>, <PAGES>2908</PAGES>, </CITATION>
...

47
The fully automatic approach
48
Why is this a problem?
Cardona & Marx, Physik Journal 2004
Berendt, in Neues Handbuch Hochschullehre, 2003
49

Build a tool that is
user-friendly
intelligent
modular and extensible

50
Berendt, Dingel, Hanser, Proc. ECDL 2006
51
IR-THESIS: System architecture
Figure: components include a VBA macro, Web services, text mining / information extraction tools, databases (local and/or mirrored), and other Web services and information sources
52
(No Transcript)
53
Search and retrieval
54
(No Transcript)
55
(No Transcript)
56
Organisation of the literature / bibliography construction
57
(No Transcript)
58
Discussion
59
(No Transcript)
60
Writing
61
Conclusions and outlook

Semantics are often necessary to do mining at all
Semantics often allow the analyst to make more sense of the results
Semantic Web Mining is semi-automatic → interactive tools!
Standardisation can make the mining process more automatic
Mining can help to generate semantics

To what extent is further user and context modelling useful and/or necessary for valid conclusions (intentions, goals, constraints, ...)?
How can we encourage standards?
When are explicit (formal) semantics better, when implicit semantics?
How can we move beyond the Web (ubiquitous environments)?
How can privacy be protected in a data-rich and mining-rich world? (Are privacy semantics à la P3P a solution?)
What do users want? What about other stakeholders? Whom and what and how to ask?

62
Thank you for your attention!
63
Discussion point 1: Is reference markup ontological / Semantic Web?

DiML (Dissertation Markup Language), used in the case study above, is structured approximately like BibTeX. The difference is that the type of publication is an attribute, so there is only one top-level concept, citation. This also makes it comparable to Dublin Core. In its latest versions, the system also contains mappings to DC and other commonly used schemata.
This indeed makes it an extremely primitive ontology: essentially, a concept hierarchy with one concept, publication, whose attributes (author, title, etc.) have literals as their value range.
Extensions to make this really semantic include (some are part of our current work):
Author, affiliation, etc. as concepts with instances, as in Repec.org → introduces relations like is-author-of
Unique identifiers of publications that allow the detection of duplicates, as in Citeseer
Links to libraries, as in OpenURL
Versioning and other interesting relations between different publications (cf. the Dublin Core element relation)
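A mapping from DiML to Dublin Core might look roughly like the Python sketch below, which uses the standard library's `xml.etree.ElementTree`. The element-to-DC table is an assumption for illustration, not the system's actual mapping. Every value remains a plain literal, which is exactly what makes this ontology so primitive.

```python
import xml.etree.ElementTree as ET

# Hypothetical DiML -> Dublin Core mapping; the system's real mapping table may differ.
DIML_TO_DC = {
    "WORKAUTHOR": "dc:creator",
    "ARTICLETITLE": "dc:title",
    "WORKTITLE": "dc:source",
    "PUBDATE": "dc:date",
}

def citation_to_dc(xml_text):
    """Turn one DiML CITATION element into a flat Dublin-Core-style record."""
    citation = ET.fromstring(xml_text)
    record = {"dc:type": citation.get("WORKTYPE", "")}
    for tag, dc_element in DIML_TO_DC.items():
        child = citation.find(tag)
        if child is not None and child.text:
            record[dc_element] = " ".join(child.text.split())  # normalise whitespace
    return record

example = """<CITATION WORKTYPE="journal" PUBLISHED="PUBLISHED">
  <WORKAUTHOR>Albrecht, T. F.</WORKAUTHOR>
  <ARTICLETITLE>Disorder mediated biexcitonic beats in
  semiconductor quantum wells</ARTICLETITLE>,
  <WORKTITLE>Phys. Rev. B</WORKTITLE>, <PUBDATE>1996</PUBDATE>
</CITATION>"""
print(citation_to_dc(example)["dc:title"])
# Disorder mediated biexcitonic beats in semiconductor quantum wells
```

Turning `dc:creator` into a reference to a Person instance, instead of a string, would be the first step towards the relations listed above.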

64
Discussion point 2: Can folksonomies be used instead of ontologies? (1)

This is a difficult question, not least because it is still unclear what exactly tags are. They could be an object-level summary and thus more content. Or they could be a truly meta-level classification drawn from a set of labels that is categorically different from just more content words.
In the following, I use the second interpretation. I refer to folksonomy tags as "concepts" because a folksonomy can formally be regarded as an extremely simple ontology: a set of concepts with no hierarchical or other relations between them.
The answer to the question in the title of this slide depends on which aspect of folksonomies one is most interested in. It also depends on how important one considers certain properties of ontologies to be.

65
Discussion point 2: Can folksonomies be used instead of ontologies? (2)

The answer tends to be YES when one focuses on
WHO DEFINES THE CONCEPTS
All ontologies used in the case studies shown were based on, or extended, popular models and/or ontologies in the domain of investigation:
search in the educational portal: models of information search from information science
shopping: models of the customer buying process from marketing
shopping with bot assistance: the same, plus our design of questions, developed in conjunction with a major German retailer
search in the medical portal: like search in the educational portal, plus the medical ICD-9, the International Classification of Diseases
DiML/DC.
But in fact, none of the ontologies used in the case studies here was a "standard" in the sense that many people agree on it and many applications use it. There are precious few such standard behaviour models!
In that sense, the ontologies used here are, like much of the Semantic Web work, just one possibility proposed by a number of people (the research group and application partners), rather than the result of a standardisation effort.
IN FOLKSONOMY-STYLE TAGGING, A RESOURCE USUALLY HAS MORE THAN ONE TAG
Any set of concepts that a group agrees on can be used.
In SWUM (Semantic Web Usage Mining), Web pages are mapped to single concepts (e.g., slides 22ff.) or sets of concepts (e.g., slide 21). This set of concepts could also be a tag set as in del.icio.us.

66
Discussion point 2: Can folksonomies be used instead of ontologies? (3)

The answer tends to be MAYBE when one focuses on
DYNAMICS: changing tag sets introduce a non-stability of the mapping. The patterns would then change "depending on how you look at them", which may or may not be desirable.
My opinion: this quickly becomes intractable. An ontology-based treatment of different viewpoints and dynamics (→ ontology evolution) therefore appears to be the better choice.
The answer tends to be NO when one focuses on
FORMAL PROPERTIES
HIERARCHIES: generalization is an important feature of many mining algorithms (unless you abstract, you may not find any pattern).
(Non-hierarchical) RELATIONS: In folksonomies, there are no relations on concepts. Therefore, meaningful visualizations become harder to produce (note that the stratograms shown on slides 27 and 29 require relations that induce a linear order on concepts).
Also, all other inference possibilities are lost.
COMPARABILITY: The results of SWUM can only be compared (e.g., conversion rates in one site with those in another site) if stable and uniform ontologies are used.
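A toy frequent-item count in Python illustrates why hierarchies matter for finding patterns. With a minimum support of 0.6, no individual product page is frequent. After generalizing via a hypothetical taxonomy, however, the concept "camera" is. A flat tag set offers no parent concept to generalize to.

```python
from collections import Counter

# Hypothetical taxonomy: specific page concepts -> more general concept.
PARENT = {"camera-A": "camera", "camera-B": "camera", "camera-C": "camera",
          "jacket-X": "jacket"}

def frequent_items(sessions, min_support, generalize=False):
    """Items occurring in at least min_support of all sessions (counted once per session)."""
    counts = Counter()
    for pages in sessions:
        counts.update({PARENT[p] if generalize else p for p in pages})
    return {item for item, c in counts.items() if c / len(sessions) >= min_support}

sessions = [["camera-A"], ["camera-B"], ["camera-C", "jacket-X"], ["jacket-X"]]
print(frequent_items(sessions, 0.6))                   # set()
print(frequent_items(sessions, 0.6, generalize=True))  # {'camera'}
```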

67
Discussion point 3: Which of the techniques shown in this talk are being used in industry and other real-world sites? (1)

Pre-remark 1: The content of this talk was (recent) research, so it would be surprising to see it already incorporated into industrial practice. However, given that Web usage mining has been around for a number of years, the question is valid.
Pre-remark 2: Web usage mining is used on a large scale by search engines. Google says so, and so does Yahoo!. Both say they rely on latent-semantic-indexing-style semantics rather than on Semantic-Web-style semantics (but they do use lexica and other helpers); the boundaries are fluid. In any case, they do not say much about the details of their algorithms. After all, mining is their business model ...
Nevertheless, we believe that SWUM is applicable to analysing search when the focus is on which services of a site('s interface) are used. It is not aimed at investigating the content of searches (cf. content vs. service conceptual hierarchies in Berendt & Spiliopoulou, VLDB Journal 2000). Thus, search engines are not the intended application area of our techniques; retail, information, e-Government, etc. sites are.
The question should therefore be rephrased as 3 questions:
Do off-the-shelf software packages support Web usage mining, and specifically Web usage mining with semantics? Here we mean packages used by end-user companies either on-site or in ASP mode, i.e. without external consultants to do the analyses.
The answer is: very partly.
Do consultants offer SWUM analyses?
The answer is: partly.
What are the likely reasons?
A tentative answer is: perception problems and lack of incentives.

68
Discussion point 3 (2): Support in off-the-shelf software: basic forms of analysis

Pageview counts and simple OLAP-type analyses (hits by country, by language, etc.) are pretty standard. Even most of the simplest freeware products (e.g., Analog) support them, and their use is very common in industry.
State-of-the-art commercial analysis software like Webtrends allows a certain degree of programming for extracting more attributes that can be subjected to OLAP-type analyses (see below for an example).
State-of-the-art software often also supports the extraction of more information transferred via JavaScript. An example is Google Analytics.
Syntax is generally the only basis. Semantics usually comes in only insofar as the Content Management System used by most sites today provides a certain frame of reference and meaning.
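The OLAP-type counts mentioned above amount to a group-by over pre-processed pageview records. Here is a minimal Python sketch; the records and field names are hypothetical.

```python
from collections import Counter

# Hypothetical pre-processed pageview records (one dict per request).
pageviews = [
    {"page": "/search.html", "country": "DE", "language": "de"},
    {"page": "/view.asp", "country": "DE", "language": "en"},
    {"page": "/search.html", "country": "FR", "language": "fr"},
    {"page": "/search.html", "country": "DE", "language": "de"},
]

def hits_by(records, *dimensions):
    """OLAP-style group-by: pageviews per combination of the given dimensions."""
    return Counter(tuple(r[d] for d in dimensions) for r in records)

print(hits_by(pageviews, "country"))                                 # Counter({('DE',): 3, ('FR',): 1})
print(hits_by(pageviews, "country", "page")[("DE", "/search.html")])  # 2
```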

69
Discussion point 3 (3): Support in off-the-shelf software: conversion rates

Software generally also supports the definition of simple templates from which conversion rates can be computed automatically. For example, a click on page X with referrer Y, or after a sequence of pages that started with referrer Y, counts as a converted customer brought to us by the banner shown on affiliated site S.
Conversion rates are not only extremely simple (divide the number of sessions that reached X and then Y by the number of sessions that reached X), but also quite powerful: every success measure that can be defined via reachability can be cast as a conversion rate.
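The conversion-rate formula translates directly into code. This Python sketch assumes sessions have already been reconstructed as page sequences; the page names are hypothetical. A session counts as converted if Y occurs after the first visit to X.

```python
def conversion_rate(sessions, x, y):
    """Share of sessions reaching X that later also reach Y."""
    reached_x = reached_x_then_y = 0
    for pages in sessions:
        if x in pages:
            reached_x += 1
            if y in pages[pages.index(x) + 1:]:
                reached_x_then_y += 1
    return reached_x_then_y / reached_x if reached_x else 0.0

sessions = [
    ["home", "product", "basket", "checkout"],
    ["home", "product"],
    ["product", "basket"],
    ["home", "search"],
]
print(conversion_rate(sessions, "product", "checkout"))  # 0.3333333333333333
```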
The 3-click rule (every page must be reachable with 3 clicks) is a related and equally simple-to-compute measure. That a page is reachable in 3 clicks can be computed from the site graph; that it is actually reached can be computed from frequent sequences. This only requires that the tool can compute frequent contiguous sequences, which is algorithmically simple and requires little thinking on the part of the analyst.
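The "reachable" half of the 3-click rule is a breadth-first search over the site graph. The "reached" half would use frequent sequences from the logs and is not shown. Here is a Python sketch with a hypothetical site graph and start page `home`.

```python
from collections import deque

def clicks_from(site_graph, start):
    """Breadth-first search: minimal number of clicks from start to each reachable page."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in site_graph.get(page, []):
            if target not in distance:
                distance[target] = distance[page] + 1
                queue.append(target)
    return distance

def violates_three_click_rule(site_graph, start="home", max_clicks=3):
    """Pages that need more than max_clicks clicks from start (or are unreachable)."""
    distance = clicks_from(site_graph, start)
    pages = set(site_graph) | {t for links in site_graph.values() for t in links}
    return sorted(p for p in pages if distance.get(p, float("inf")) > max_clicks)

site = {
    "home": ["catalog", "help"],
    "catalog": ["cameras"],
    "cameras": ["camera-A"],
    "camera-A": ["camera-A-specs"],
}
print(violates_three_click_rule(site))  # ['camera-A-specs']
```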
For conversion-rate computations, semantics occurs in the simple sequence templates offered by the tools; the mapping is gathered from the users via Web forms or scripts.
Conversion rates are also related to pricing models such as GoogleAds.
For a survey of software, see http://www.kdnuggets.com/solutions/web-mining.html

70
Discussion point 3 (4): Support in off-the-shelf software: possibilities and limitations / example: country + language

Language
is usually defined either as the presentation language (in a site with dynamic pages generated by a content management system, this can easily be extracted)
or as the language (assumed to be) preferred by the user (the browser setting, which in most cases is likely to be the default with which the browser is shipped).
Country is inferred from the IP address and an IP → geo-coordinates mapping. Such mappings are provided by software like Maxmind. According to the producers, and to a test we did (publication in preparation), this is relatively reliable.
To obtain the user's native language, we inferred it from the Geo-IP mapping and official data on official languages in countries around the world. In a small experimental sample in which we asked users to specify their native language, we obtained quite high accuracy (Kralisch & Berendt, NRHM 2005).
I do not know of data on the accuracy of the browser setting → native language mapping, or of data comparing it to the Geo-IP approach we used.
But only the combination of presentation language and the user's native language tells us whether a user accesses content in his/her native language or in a foreign language. This knowledge may be much more important for personalization than presentation language or preferred language alone (see Kralisch, Ph.D. dissertation 2006, http://edoc.hu-berlin.de/docviews/abstract.php?id=27410).
Nonetheless, to my knowledge, off-the-shelf software does not utilize even the semantics of presentation language / user language. One reason is that awareness of the importance of language in Internet design has only begun to develop.
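The inference described above can be sketched in Python. The sketch makes two assumptions:

- The country code has already been obtained from a Geo-IP lookup; no specific Geo-IP library API is shown.
- The user's native language is approximated by the country's official languages.

The language table is a tiny hypothetical stand-in for official data. Multilingual countries show why the approximation can only be heuristic.

```python
# Hypothetical country -> official-language table; a real analysis would use official data.
OFFICIAL_LANGUAGES = {"DE": {"de"}, "FR": {"fr"}, "US": {"en"}, "CH": {"de", "fr", "it", "rm"}}

def access_type(country_code, presentation_language):
    """Classify a request as first- or second-language access.

    country_code is assumed to come from a Geo-IP lookup of the client's IP address
    (not shown); native language is approximated by the country's official language(s)."""
    native = OFFICIAL_LANGUAGES.get(country_code)
    if native is None:
        return "unknown"
    return "first-language" if presentation_language in native else "second-language"

print(access_type("DE", "en"))  # second-language
print(access_type("CH", "fr"))  # first-language
print(access_type("XX", "en"))  # unknown
```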

71
Discussion point 3 (5): Consultancy companies

More advanced forms of conversion-rate analysis, which rely on (some) semantics, have been introduced or popularized by consultancy companies.
Examples:
NetGenesis (Cutler & Sterne), E-Metrics White Paper, 2000, http://www.emetrics.org/articles/whitepaper.html
The funnel metrics introduced there are now also offered, for example, by Google Analytics: http://www.google.com/analytics/feature_funnel.html
Accenture (R. Ghani), Mining the Web to add semantics to retail data mining, in Berendt et al., Web Mining: From Web to Semantic Web (2004).
Survey by Anand et al., On the deployment of Web usage mining, ibid.
Unfortunately, publicly available data on Web usage are usually at a very high level of aggregation. Partly for this reason, they build on essentially non-semantic analysis types, e.g. http://www.nielsen-netratings.com/resources.jsp?section=pr_netv&nav=1
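Funnel metrics generalize conversion rates from one step to an ordered chain of steps. Here is a minimal Python sketch, assuming sessions are reconstructed page sequences; the funnel pages are hypothetical. For each step, it counts the sessions that passed through all previous steps in order.

```python
def funnel(sessions, steps):
    """For each funnel step, count sessions that reached all steps so far, in order."""
    reached = [0] * len(steps)
    for pages in sessions:
        position = 0
        for k, step in enumerate(steps):
            try:
                position = pages.index(step, position) + 1
            except ValueError:
                break  # the session left the funnel before this step
            reached[k] += 1
    return reached

sessions = [
    ["home", "product", "basket", "checkout"],
    ["home", "product", "basket"],
    ["product", "home"],
    ["home", "search"],
]
counts = funnel(sessions, ["home", "product", "basket", "checkout"])
print(counts)  # [4, 2, 2, 1]
```

Dividing consecutive counts gives the step-to-step conversion rates (here 2/4, 2/2 and 1/2).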

72
Discussion point 3 (6): Likely reasons

One major problem is a divergence between the (current or definitional?) nature of data mining / knowledge discovery on the one hand, and business expectations on the other:
KD is still more an art than an engineering process, with few standards even for the process.
Business often expects data mining to be a set of fully automatic, pre-packaged black-box solutions.
The CRISP-DM process model shown on slide 16, for example, is a very high-level attempt at standardisation which leaves many details open.
In fact, it can be (and often is) argued that the search for interesting and novel patterns through exploratory data analysis by definition involves hand-crafting. Going back to the original definition of data mining (see slide 13), one could argue that looking for the values of pre-defined pattern templates (e.g., conversion rates) is the antithesis of finding novel patterns. By that argument, it is by definition not data mining.
On the other hand, Web usage mining is essentially market research: a study of user / consumer behaviour. Market research is an established discipline in which it is quite accepted that methods involve human intervention and interpretation rather than the automatic application of pre-packaged procedures (one example is the focus-group method).

73
Discussion point 3 (7): Likely reasons, contd.

Maybe this is a perception problem. It is clear that consumer opinions bear a strong qualitative element, such that focus groups cannot be prepared, administered and interpreted by a machine alone. Data mining, by contrast, carries the image of number crunching, implying that computers are the main actors here.
In line with this, the responsible people often have disjoint qualifications. The market research people have a strong background in the relevant social-science methods. The IT people, who are expected to do the data mining on the side, can use tools but usually have limited knowledge of empirical methods in general or data mining in particular.
This point was discussed at a panel at the WebKDD workshop at SIGKDD 2005. One result was that the job description "Chief Data Officer" (a senior-management person with resources who knows about data mining in the sense of data analysis AND computers) was a really recent invention. In the meantime, data-mining consultancies filled the gap (but had to convince companies they were worth it).
Or it is a problem of lacking standards. Once we have behaviour models of retail sites, of education sites, etc., we can pre-package these behaviour ontologies and even compare sites.
Standards (in behaviour modeling) require an interest in what the behaviour models say, and an interest in being comparable to other sites. Encouraging developments in this direction can currently be observed in the digital libraries community.
... to be continued ...