1Semantic Web Usage Mining: Overview and Case Studies
Bettina Berendt
Humboldt University Berlin, Institute of Information Systems
www.wiwi.hu-berlin.de/berendt
2Goals and top-level questions
- Make the world's knowledge available to the world
- How do people discover knowledge on the Web?
- How can more knowledge sources contribute to the Web?
3Approaches to the current Web's biggest challenges: lots of data, human-understandable
Web Mining extracts implicit knowledge
The Semantic Web makes knowledge machine-understandable
Berendt, Hotho, Stumme, Proc. ISWC 2002
Berendt, Mladenic, et al. (Eds.), From Web to Semantic Web, Springer LNAI 2004
Berendt, Grobelnik, Mladenic et al. (Eds.), Semantics, Web, and Mining, Springer LNAI 2006
4Agenda
Web Mining
Why?
51. What should I buy?
62. Where do I find relevant information on ...?
73. What do people do there?
84. How can a site be made usable for a
worldwide audience?
95a. Why go to a shop ...
- ... if everything is available on the Internet?
105b. What is my site worth for my business?
116. How to help people become active members of the knowledge society: help them to contribute content?
12Agenda
Web Mining
How?
13Web Mining
- Knowledge discovery (aka Data mining)
- "the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data." [1]
- Web Mining
- the application of data mining techniques on the content, (hyperlink) structure, and usage of Web resources.
Web mining areas: Web content mining
[1] Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., Uthurusamy, R. (Eds.) (1996). Advances in Knowledge Discovery and Data Mining. Boston, MA: AAAI/MIT Press
14Data analysis: the textbook version
- The meaning of attributes is clear
- The meaning of attribute values is clear
- → Data modelling can be applied directly (e.g., regression, classification, clustering, association-rule discovery)
(A simplified extract from the Adult dataset in the UCI machine learning repository)
15Data analysis: the reality → data mining / knowledge discovery process
- ...
- p3ee24304.dip.t-dialin.net - - [19/Mar/2002:12:03:51 +0100] "GET /search.html?t=jane%20austen&SID=023785&ord=asc HTTP/1.0" 200 1759
- p3ee24304.dip.t-dialin.net - - [19/Mar/2002:12:05:06 +0100] "GET /search.html?t=jane%20austen&m=video&SID=023785&ord=desc HTTP/1.0" 200 8450
- p3ee24304.dip.t-dialin.net - - [19/Mar/2002:12:06:41 +0100] "GET /view.asp?id=3456&SID=023785 HTTP/1.0" 200 3478
- ...
- What is the meaning of the attributes?
- What is the meaning of the attribute values?
- → Data modelling is only one part!
CRISP-DM
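Before any modelling, log entries like those above first have to be split into attributes. The following is a minimal Python sketch of that preprocessing step, assuming the entries follow the Common Log Format and that the query string uses the parameters `t`, `SID` and `ord` (this reading of the sample lines, and the function name `parse_log_line`, are assumptions made for illustration).

```python
import re
from urllib.parse import urlsplit, parse_qs

# Common Log Format: host ident user [timestamp] "request" status bytes
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_log_line(line):
    """Split one log line into named attributes; return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    parts = urlsplit(record["url"])
    record["path"] = parts.path
    # parse_qs decodes %20 and returns a list per parameter; keep the first value
    record["params"] = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return record

# Assumed reconstruction of the first sample entry
line = ('p3ee24304.dip.t-dialin.net - - [19/Mar/2002:12:03:51 +0100] '
        '"GET /search.html?t=jane%20austen&SID=023785&ord=asc HTTP/1.0" 200 1759')
record = parse_log_line(line)
```

Even after this step, the attributes are only syntactically separated: what `t`, `ord` or `/view.asp` mean still has to come from knowledge about the site.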
16Where does semantics come in?
Semantics
17Agenda
Semantic Web
How?
18What is an ontology?
- Definition: Core ontology with axioms
- a structure $O := (C, \leq_C, R, \sigma, \leq_R, A)$ consisting of
- two disjoint sets $C$ (concept identifiers) and $R$ (relation identifiers)
- a partial order $\leq_C$ on $C$ (concept hierarchy or taxonomy)
- a function $\sigma: R \to C^+$ (signature), where $C^+$ is the set of all finite tuples of elements in $C$
- a partial order $\leq_R$ on $R$ (relation hierarchy), where $r_1 \leq_R r_2$ implies
- $|\sigma(r_1)| = |\sigma(r_2)|$ and
- $\pi_i(\sigma(r_1)) \leq_C \pi_i(\sigma(r_2))$ for all $1 \leq i \leq |\sigma(r_1)|$, with $\pi_i$ the projection on the $i$-th component
- a set $A$ of axioms in a logical language $L$
"an explicit specification of a shared conceptualisation" (Gruber, 1993)
Stumme, Hotho, Berendt, Journal of Web Semantics, 2006, and sources there
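The definition can be made concrete with a small data structure. This is only a sketch under simplifying assumptions: the axiom set A is omitted, the partial orders are given as explicit pairs, and the class name `CoreOntology` and the example relation `WORKSAT` are hypothetical. It checks the two conditions that the relation hierarchy places on signatures.

```python
from dataclasses import dataclass, field

@dataclass
class CoreOntology:
    concepts: set        # C
    relations: set       # R
    concept_order: set   # pairs (c1, c2) meaning c1 <=_C c2
    signature: dict      # sigma: relation -> tuple of concepts
    relation_order: set = field(default_factory=set)  # pairs (r1, r2): r1 <=_R r2

    def leq_concept(self, c1, c2):
        """True if c1 <=_C c2 in the reflexive-transitive closure of concept_order."""
        if c1 == c2:
            return True
        seen, stack = set(), [c1]
        while stack:
            current = stack.pop()
            for lower, upper in self.concept_order:
                if lower == current and upper not in seen:
                    if upper == c2:
                        return True
                    seen.add(upper)
                    stack.append(upper)
        return False

    def is_consistent(self):
        """Check disjointness of C and R and the signature conditions on <=_R."""
        if self.concepts & self.relations:
            return False
        for r1, r2 in self.relation_order:
            s1, s2 = self.signature[r1], self.signature[r2]
            if len(s1) != len(s2):          # |sigma(r1)| = |sigma(r2)|
                return False
            if not all(self.leq_concept(a, b) for a, b in zip(s1, s2)):
                return False                # pi_i(sigma(r1)) <=_C pi_i(sigma(r2))
        return True

ontology = CoreOntology(
    concepts={"PERSON", "RESEARCHER", "ORGANIZATION"},
    relations={"AFFILIATION", "WORKSAT"},
    concept_order={("RESEARCHER", "PERSON")},
    signature={"AFFILIATION": ("PERSON", "ORGANIZATION"),
               "WORKSAT": ("RESEARCHER", "ORGANIZATION")},
    relation_order={("WORKSAT", "AFFILIATION")},
)
```

Here `WORKSAT` may sit below `AFFILIATION` because each of its argument concepts is at or below the corresponding argument of `AFFILIATION`; reversing the pair would violate the condition.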
19Agenda
20Semantics of requests, Step 1: Domain ontology
- community portal ka2portal.aifb.uni-karlsruhe.de
- ontology-based
- Knowledge base in F-Logic
- Static pages: annotations
- Dynamic pages: generated from queries
- Queries also in F-Logic
- Logs contain these queries
Oberle, Berendt, Hotho, Gonzalez, Proc. AWIC 2003
21Semantics of requests, Step 2: Modelling requests and sessions-as-sets
- RESEARCHER
- PERSON
- PROJECT
- PUBLICATION
- RESEARCHTOPIC
- EVENT
- ORGANIZATION
- RESEARCHINTEREST
- LASTNAME
- TITLE
- ISABOUT
- EVENTS
- EVENTTITLE
- WORKSATPROJECT
- AUTHOR
- AFFILIATION
- ISWORKEDONBY
- PROGRAMCOMMITTEE
- EMPLOYS
An example query with concepts and relations:
FORALL N,PEOPLE <- PEOPLE:Employee[affiliation->>"http://www.anInstitute.org"] and PEOPLE:Person[lastName->>N].
Query = feature vector of concepts and relations
→ Session = feature vector of concepts and relations, summed over all queries in the session
Clustering, Association rules, Classification, ...
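The step from queries to session vectors can be sketched as follows. This is an illustrative Python sketch, not the original implementation: it assumes that each query has already been reduced to the list of concepts and relations it mentions, and the helper names `query_vector`, `session_vector` and `to_dense` are hypothetical.

```python
from collections import Counter

def query_vector(query_entities):
    """Feature vector of one query: counts of the concepts and relations it mentions."""
    return Counter(query_entities)

def session_vector(queries):
    """Feature vector of a session: query vectors summed over all queries."""
    total = Counter()
    for entities in queries:
        total.update(query_vector(entities))
    return total

def to_dense(vector, vocabulary):
    """Fixed-order numeric vector, as needed by most clustering implementations."""
    return [vector.get(term, 0) for term in vocabulary]

# Two queries of one session, reduced to ontological entities (illustrative data)
session = [
    ["EMPLOYEE", "AFFILIATION", "PERSON", "LASTNAME"],
    ["PERSON", "LASTNAME"],
]
vocabulary = ["PERSON", "EMPLOYEE", "AFFILIATION", "LASTNAME", "PROJECT"]
vector = session_vector(session)
dense = to_dense(vector, vocabulary)
```

The dense vectors of all sessions can then be passed to a clustering or classification method of choice.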
22Semantics of sequences, Step 3: Strategy pattern discovery
- An ontology of navigation strategies
- Define strategy templates as regular expressions
- Of requests (mapped to ontological entities)
- Of transitions (between ontological entities)
- Ex. .search . individual
- Discover strategies by learning a strategy trie
affiliationSearch, 629
topicSearch, 312
...
...
repetition, 402
refinement, 113
...
individual, 112
repetition, 295
...
Berendt & Spiliopoulou, VLDB Journal, 2000
Berendt, Data Mining and Knowledge Discovery, 2002
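The two ideas on this slide, templates as regular expressions and a trie of strategy steps with counts, can be sketched in Python as follows. The template, the step names in the toy sessions, and the helpers `build_trie` and `prefix_count` are assumptions for illustration; the counts are not those shown on the slide.

```python
import re

class TrieNode:
    def __init__(self):
        self.count = 0       # number of sessions passing through this node
        self.children = {}   # next strategy step -> TrieNode

def build_trie(sequences):
    """Aggregate sequences of strategy steps into a trie with visit counts."""
    root = TrieNode()
    for sequence in sequences:
        node = root
        node.count += 1
        for step in sequence:
            node = node.children.setdefault(step, TrieNode())
            node.count += 1
    return root

def prefix_count(root, prefix):
    """Number of sessions that start with the given sequence of steps."""
    node = root
    for step in prefix:
        node = node.children.get(step)
        if node is None:
            return 0
    return node.count

# Hypothetical template: a search, any number of repetitions or refinements,
# then an individual page.
TEMPLATE = re.compile(r"\w+Search( (?:repetition|refinement))* individual")

def matches_template(sequence):
    return TEMPLATE.fullmatch(" ".join(sequence)) is not None

sessions = [
    ("topicSearch", "refinement", "individual"),
    ("topicSearch", "repetition"),
    ("affiliationSearch", "individual"),
]
trie = build_trie(sessions)
```

Reading counts off the trie gives the support of each strategy prefix, which is the input for the evaluation step that follows.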
23NB: For more exploratory analyses: the Web Usage Miner (WUM)
- select t
- from node a b, template a b as t
- where a.url startswith "SEITE1-"
- and a.occurrence = 1
- and b.url contains "1SCHULE"
- and b.occurrence = 1
- and (b.support / a.support) > 0.2
Spiliopoulou, 1999; Berendt & Spiliopoulou, VLDB Journal, 2000
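The ratio b.support / a.support in the query is a confidence value: of the sessions that contain page a, what share later reach page b? The Python sketch below is a rough analogue over plain lists of URLs. It is a simplification and does not reproduce WUM's query language or its occurrence semantics; the function name and the toy sessions are hypothetical.

```python
def pattern_confidence(sessions, a_prefix, b_substring):
    """Share of sessions containing a page 'a' that later also contain a page 'b'."""
    a_support = 0
    b_support = 0
    for pages in sessions:
        # position of the first page whose URL starts with a_prefix, if any
        a_index = next(
            (i for i, url in enumerate(pages) if url.startswith(a_prefix)), None
        )
        if a_index is None:
            continue
        a_support += 1
        if any(b_substring in url for url in pages[a_index + 1:]):
            b_support += 1
    return b_support / a_support if a_support else 0.0

sessions = [
    ["SEITE1-start.html", "SEITE2-1SCHULE.html"],
    ["SEITE1-start.html", "other.html"],
    ["home.html"],
]
confidence = pattern_confidence(sessions, "SEITE1-", "1SCHULE")
```

A pattern would be reported only if this value exceeds the threshold given in the query.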
24Semantics of sequences, Step 4: Strategy pattern evaluation
- Use strategy patterns and statistics to
- Derive descriptive measures of patterns
- support, confidence
- popularity, effectiveness, efficiency
- Apply inferential statistics to compare patterns
Berendt, Data Mining and Knowledge Discovery, 2002
25Communication / Visual data mining, Step 5: Mapping an ontological relation over concepts to a linear order and to visual variables
Concreteness
Goal Individual page
Reach goal
Refine search
More constraints on search
First search page
Remain unspecific
Abandon search
Time
26Ad Q.3 What do people do there?
27Communication / Visual data mining, Step 5: Example
Berendt, Data Mining and Knowledge Discovery, 2002; Berendt, Postproc. WebKDD 2001
28An online shop with a difference
Berendt, Günther, Spiekermann, Communications of the ACM, 2005
29Communication / Visual data mining, Step 6: Visual abstraction → new semantic patterns
Closeness to product
Shopping for cameras
Shopping for jackets
Berendt, Data Mining and Knowledge Discovery, 2002; Berendt, Postproc. WebKDD 2002
30Ad Q.4 Worldwide usability
31The impact of language and domain knowledge on search option choice
- 2 studies on the use of search options in the eHealth site
- Webserver log: 3 928 235 requests / 277 809 sessions from 188 countries
- 83.2% first-language users, 16.8% second-language users
- Webserver log + questionnaire: 165 (106) people from 34 countries
- 84.9% first-language users, 15.1% second-language users
- 10.4% physicians, 89.6% patients
- Results
- Search engine, alphabetical search: in particular first-language users, physicians
- Content-organized search: in particular second-language patients
- → Domain knowledge compensates for limited language knowledge.
Kralisch & Berendt, New Review of Hypermedia and Multimedia, 2005
32Semantics: Service ontology
33Results on frequent search patterns
Alphabetical search: hub-and-spoke → only linguistic relations (6.4)
Diagnoses are "hubs" for navigation (5.3, 4)
Localization search: linear / depth-first → search refinement, medical knowledge (5)
Berendt, Postproc. WebKDD 2005
34Mining with ISOVIS: Semantic drill-down, visualizing detail and context
Berendt, Postproc. WebKDD 2005
35Ad Q.5 Shopping behaviour and Web site value
365. What is my site worth for my business?
- A site is often only a part of a distribution strategy / one channel to reach customers.
- What are the conversion rates (how many visitors become buyers etc.)?
- What are the cross-channel effects?
Internet market shares (BCG 2002)
37Semantics: The buying process as a service ontology
38Mining (example): Association rules for investigating preferences in the buying process
- Study based on 100K sessions, 13K transactions from 2002 at a leading European retailer of consumer electronics showed, among other things:
- Online payment → Direct delivery (s=0.27, c=0.97); < 1/3 traditional online users!
- Online payment → In-store pickup (s=0.02, c=0.03)
- Cash on delivery → Direct delivery (s=0.02, c=0.03)
- In-store payment → In-store pickup (s=0.69, c=0.94)
- → Site is primarily used for information search.
- → Key performance indicators (Web metrics), e.g.
- conversion efficiency
- offline conversion
- effectiveness and efficiency of search options
Berendt & Spiliopoulou, VLDB Journal, 2000; Berendt, Data Mining and Knowl. Discovery, 2002; Teltzrow & Berendt, Proc. WebKDD 2003
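The s and c values in the rules above are support and confidence. As a reminder of how they are computed, here is a small Python sketch on toy data; the transactions are invented for illustration and do not reproduce the study's figures, and `rule_metrics` is a hypothetical helper name.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    total = len(transactions)
    with_antecedent = sum(1 for items in transactions if antecedent <= set(items))
    with_both = sum(
        1 for items in transactions if (antecedent | consequent) <= set(items)
    )
    support = with_both / total if total else 0.0
    confidence = with_both / with_antecedent if with_antecedent else 0.0
    return support, confidence

# Toy transactions: payment option and delivery option chosen in each purchase
transactions = [
    {"online payment", "direct delivery"},
    {"in-store payment", "in-store pickup"},
    {"in-store payment", "in-store pickup"},
    {"in-store payment", "in-store pickup"},
]
support, confidence = rule_metrics(
    transactions, {"online payment"}, {"direct delivery"}
)
```

Support is measured against all transactions, confidence only against those that contain the antecedent, which is why a rule can have low support and high confidence at the same time.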
39Agenda
Web Mining
(Semantic) Web
40Step 6: Deployment of results. Example 1: Using results for site improvement
Name
City
Name
- Path analysis, metrics, and χ² analysis showed:
- All search criteria were approx. equally effective
- Location-based search was most popular
- City-based search was most efficient ... but least popular
- → Modify site design to make efficient search more popular
Berendt & Spiliopoulou, VLDB Journal, 2000; Berendt, Data Mining and Knowl. Discovery, 2002; Spiliopoulou & Pohle, DMKD, 2001
41Step 6: Deployment of results. Example 2: Using results for personalization
Kralisch, Eisend, Berendt, Proc. HCI International, 2005
42Step 6: Deployment of results. Example 3: A privacy-preserving Web-metrics analysis service
Teltzrow, Preibusch, Berendt, IEEE EC Conf. 2004
43Agenda
Web Mining
... <BIBLIOGRAPHY><FLOAT><PAGENUMBER>136</PAGENUMBER></FLOAT> <HEAD>Literaturverzeichnis</HEAD>
<CITATION WORKTYPE="journal" PUBLISHED="PUBLISHED"><CUT ID="bib-15-">1 </CUT><WORKAUTHOR>Agarwal, R.; Krueger, B. P.; Scholes, G. D.; Yang, M.; Yom, J.; Mets, L.; Fleming, G. R.</WORKAUTHOR>U<ARTICLETITLE>ltrafast energy transfer in LHC-II revealed by three-pulse photon echo peak shift measurements</ARTICLETITLE>, <WORKTITLE>J. Phys. Chem. B</WORKTITLE>, <PUBDATE>2000</PUBDATE>, <NUMBER>104</NUMBER>, <PAGES>2908</PAGES>, </CITATION> ...
Semantic Web
44Data and metadata in the Digital Library EDOC
- <BIBLIOGRAPHY><FLOAT><PAGENUMBER>136</PAGENUMBER></FLOAT>
- <HEAD>Literaturverzeichnis</HEAD>
- ...
- <CITATION WORKTYPE="journal" PUBLISHED="PUBLISHED">
- <CUT ID="bib-45-">2 </CUT><WORKAUTHOR>Albrecht, T. F.; Bott, K.; Meier, T.; Schulze, A.; Koch, M.; Cundiff, S. T.; Feldmann, J.; Stolz, W.; Thomas, P.; Koch, S. W.; G&ouml;bel E. O.</WORKAUTHOR> <ARTICLETITLE>Disorder mediated biexcitonic beats in semiconductor quantum wells</ARTICLETITLE>, <WORKTITLE>Phys. Rev. B</WORKTITLE>, <PUBDATE>1996</PUBDATE>, <NUMBER>54</NUMBER>, <PAGES>4436</PAGES>,
- </CITATION> ...
- (http://edoc.hu-berlin.de/diml/dtd/xdiml.dtd)
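Once references are marked up in this way, their metadata can be read out mechanically. The following Python sketch uses the standard library's `xml.etree.ElementTree` on a shortened, well-formed sample modelled on the citation above; the author list is abbreviated to one name for the sample, and `extract_citations` is a hypothetical helper, not part of the EDOC system.

```python
import xml.etree.ElementTree as ET

# Shortened sample in the style of the markup above (author list abbreviated)
SAMPLE = """<BIBLIOGRAPHY>
<HEAD>Literaturverzeichnis</HEAD>
<CITATION WORKTYPE="journal" PUBLISHED="PUBLISHED">
<CUT ID="bib-45-">2 </CUT>
<WORKAUTHOR>Albrecht, T. F.</WORKAUTHOR>
<ARTICLETITLE>Disorder mediated biexcitonic beats in semiconductor quantum wells</ARTICLETITLE>,
<WORKTITLE>Phys. Rev. B</WORKTITLE>, <PUBDATE>1996</PUBDATE>,
<NUMBER>54</NUMBER>, <PAGES>4436</PAGES>,
</CITATION>
</BIBLIOGRAPHY>"""

FIELDS = ("WORKAUTHOR", "ARTICLETITLE", "WORKTITLE", "PUBDATE", "NUMBER", "PAGES")

def extract_citations(xml_text):
    """Return one dict of metadata fields per CITATION element."""
    root = ET.fromstring(xml_text)
    records = []
    for citation in root.iter("CITATION"):
        record = {"type": citation.get("WORKTYPE")}
        for tag in FIELDS:
            # findtext returns None when the element is missing
            record[tag.lower()] = (citation.findtext(tag) or "").strip()
        records.append(record)
    return records

records = extract_citations(SAMPLE)
```

The extraction is only as good as the markup: a character that ends up outside its element, or a missing element, silently yields incomplete metadata.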
45Authoring support for document servers
- Surveys and Web usage mining analysis of a digital publishing service showed:
- Metadata creation is one of the main barriers for contribution.
- Reasons include deficiencies in
- information flow
- understanding and use of structured search
- education in structured writing
- HCI aspects
→ Marketing
→ Education
Berendt, Brenstein, Li, Wendland, Proc. ETD 2003; Berendt, Proc. AAAI Spring Symposium KCVC, 2005
46... and this has consequences (problems of the fully manual approach)
- <BIBLIOGRAPHY><FLOAT><PAGENUMBER>136</PAGENUMBER></FLOAT>
- <HEAD>Literaturverzeichnis</HEAD>
- <CITATION WORKTYPE="journal" PUBLISHED="PUBLISHED">
- <CUT ID="bib-15-">1 </CUT><WORKAUTHOR>Agarwal, R.; Krueger, B. P.; Scholes, G. D.; Yang, M.; Yom, J.; Mets, L.; Fleming, G. R.</WORKAUTHOR>U<ARTICLETITLE>ltrafast energy transfer in LHC-II revealed by three-pulse photon echo peak shift measurements</ARTICLETITLE>, <WORKTITLE>J. Phys. Chem. B</WORKTITLE>, <PUBDATE>2000</PUBDATE>, <NUMBER>104</NUMBER>, <PAGES>2908</PAGES>,
- </CITATION>
- ...
47The fully automatic approach
48Why is this a problem?
Cardona & Marx, Physik Journal 2004
Berendt, in Neues Handbuch Hochschullehre, 2003
49- Build a tool that is
- user-friendly
- intelligent
- modular and extensible
50Berendt, Dingel, Hanser, Proc. ECDL 2006
51IR-THESIS System architecture
Text mining / Information Extraction tools
Web services
Databases (local a/o mirrored)
Web services
VBA macro
other WS and info. sources
53Search and retrieval
56Organisation of the literature / bibliography construction
58Discussion
60Writing
61Conclusions and outlook
- Semantics are often necessary to do mining at all
- Semantics often allow the analyst to make more sense of the results
- Semantic Web Mining is semi-automatic → interactive tools!
- Standardisation can make the mining process more automatic
- Mining can help to generate semantics
- To what extent are further user and context modelling useful and/or necessary for valid conclusions (intentions, goals, constraints, ...)?
- How can we encourage standards?
- When are explicit (formal) semantics better, when implicit semantics?
- How can we move beyond the Web (ubiquitous environments)?
- How can privacy be protected in a data-rich and mining-rich world? (Are privacy semantics à la P3P a solution?)
- What do users want? What about other stakeholders? Whom and what and how to ask?
62Thank you for your attention!
63Discussion point 1: Is reference markup ontological / Semantic Web?
- DiML (Dissertation Markup Language), used in the case study above, is structured approximately like BibTeX (with the difference that the type of publication is an attribute, so there is only one top-level concept, "citation"). This also makes it comparable to Dublin Core. In its latest versions, the system also contains mappings to DC and other commonly used schemata.
- This makes it indeed an extremely primitive ontology (essentially, a concept hierarchy with one concept, "publication", whose attributes have literals as value range: author, title, etc.).
- Extensions to make this really semantic include (some are part of our current work):
- Author, affiliation, etc. as concepts with instances, as in Repec.org → introduces relations like is-author-of
- Unique identifiers of publications that allow the detection of duplicates, as in Citeseer
- Links to libraries, as in OpenURL
- Versioning and other interesting relations between different publications (cf. the Dublin Core element "relation")
64Discussion point 2: Can folksonomies be used instead of ontologies? (1)
- This is a difficult question, not least because it is still unclear what exactly tags are:
- an object-level summary and thus "more content", or
- a truly meta-level classification which comes from a set of labels that is categorically different from "just more content words"?
- In the following, I use the second interpretation. I refer to folksonomy tags as "concepts" because a folksonomy can formally be regarded as an extremely simple ontology: a set of concepts with no hierarchical or other relations between them.
- The answer to the question in the title of this slide depends on the aspect of folksonomies one is most interested in, and on how important one considers certain properties of ontologies.
65Discussion point 2: Can folksonomies be used instead of ontologies? (2)
- The answer tends to be YES when one focuses on
- WHO DEFINES THE CONCEPTS
- All ontologies used in the case studies shown were based on or extended popular models and/or ontologies in the domain of investigation:
- search in the educational portal: models of information search from information science
- shopping: models of the customer buying process from marketing
- shopping with bot assistance: the same, plus our design of questions, developed in conjunction with a major German retailer
- search in the medical portal: like search in the educational portal, plus the medical ICD-9, the International Classification of Diseases
- ... DiML/DC).
- But in fact, none of the ontologies used in the case studies here was a "standard" in the sense that many people agree on it and many applications use it. In fact, there are precious few such standard behaviour models!
- In that sense, the ontologies used here are, like much of the Semantic Web work, just one possibility proposed by a number of people (the research group and application partners), instead of the result of a standardisation effort.
- IN FOLKSONOMY-STYLE TAGGING, A RESOURCE USUALLY HAS MORE THAN ONE TAG
- Any set of concepts that a group agrees on can be used.
- In SWUM (Semantic Web Usage Mining), Web pages are mapped to single concepts (e.g., slides 22ff.) or sets of concepts (e.g., slide 21). This set of concepts could also be a tag set as in del.icio.us.
66Discussion point 2: Can folksonomies be used instead of ontologies? (3)
- The answer tends to be MAYBE when one focuses on
- DYNAMICS: these introduce a non-stability of the mapping, which means that the patterns would change "depending on how you look at them", which may or may not be desirable
- My opinion: This quickly becomes intractable, thus an ontology-based treatment of different viewpoints and dynamics (→ ontology evolution) appears to be the better choice.
- The answer tends to be NO when one focuses on
- FORMAL PROPERTIES
- HIERARCHIES: generalization is an important feature of many mining algorithms (unless you abstract, you may not find any pattern).
- (Non-hierarchical) RELATIONS
- In folksonomies, there are no relations on concepts. Therefore, meaningful visualizations become harder to produce (note that the stratograms shown on slides 27 and 29 require relations that induce a linear order on concepts).
- Also, all other inference possibilities are lost.
- COMPARABILITY: The results of SWUM can only be compared (e.g., conversion rates in one site with those in another site) if stable and uniform ontologies are used.
67Discussion point 3: Which of the techniques shown in this talk are being used in industry and other real-world sites? (1)
- Pre-remark 1: The content of this talk was (recent) research, thus it would be surprising to see it already incorporated into industrial practice. However, given that Web usage mining has been around for a number of years, the question is valid.
- Pre-remark 2: Web usage mining is used on a large scale by search engines. Google says it, Yahoo! says it. Both say they rely on latent-semantic-indexing-style semantics rather than on Semantic-Web-style semantics (but they do use lexica and other helpers); the boundaries are fluid. Anyway, they don't say too much about the details of their algorithms. After all, mining is their business model ...
- Anyway, we believe that SWUM is applicable to analysing search when the focus is on what services of a site('s interface) are used, not when the content of searches is investigated (cf. content vs. service conceptual hierarchies in Berendt & Spiliopoulou, VLDB Journal 2000). Thus, the intended application areas of our techniques are not search engines, but retail, information, e-Government, etc. sites.
- The question should therefore be rephrased as 3 questions:
- Do off-the-shelf software packages (used by end-user companies either on-site or in ASP mode, i.e. without external consultants to do the analyses) support Web usage mining, and specifically Web usage mining with semantics?
- The answer is: Only very partially.
- Do consultants offer SWUM analyses?
- The answer is: Partially.
- What are the likely reasons?
- A tentative answer is: Perception problems and lack of incentives.
68Discussion point 3 (2): Support in off-the-shelf software: basic forms of analysis
- Pageview counts and simple OLAP-type analyses (hits by country, by language, etc.) are pretty standard and supported even by most of the simplest freeware products (e.g., Analog). Their usage is very common in industry.
- State-of-the-art commercial analysis software like Webtrends allows a certain degree of programming for extracting more attributes that can be subjected to OLAP-type analyses (see below for an example).
- State-of-the-art software often also supports the extraction of more information transferred via JavaScript. An example is Google Analytics.
- Syntax is generally the only basis. Semantics usually comes in only insofar as the Content Management System used by most sites today provides a certain frame of reference and meaning.
69Discussion point 3 (3): Support in off-the-shelf software: conversion rates
- Software generally also supports the definition of simple templates from which conversion rates can be computed automatically (e.g., "a click on page X with referrer Y, or after a sequence of pages that started with referrer Y, is a converted customer brought to us from the banner shown on affiliated site S").
- Conversion rates are not only extremely simple (divide the number of sessions that reached X and then Y by the number of sessions that reached X), but also quite powerful: every success measure that can be defined via reachability can be cast as a conversion rate.
- The "3-click rule" (every page must be reachable with 3 clicks) is a related and equally simple-to-compute measure. That a page is reachable in 3 clicks can be computed from the site graph; that it is reached can be computed from frequent sequences. This only requires that the tool can compute frequent contiguous sequences, which is algorithmically simple and requires little thinking on the part of the analyst.
- For conversion-rate computations, semantics occurs in the simple sequence templates offered by the tools; the mapping is gathered from the users via Web forms or scripts.
- Conversion rates are also related to pricing models such as GoogleAds.
- For a survey of software, see http://www.kdnuggets.com/solutions/web-mining.html
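Both measures discussed on this slide fit in a few lines. The Python sketch below assumes sessions are lists of page identifiers and the site graph is a dictionary from page to linked pages; the function names and the toy data are hypothetical, and the reachability check covers only the "is reachable" half of the 3-click rule, not the "is reached" half.

```python
from collections import deque

def conversion_rate(sessions, x, y):
    """Sessions that reached x and then y, divided by sessions that reached x."""
    reached_x = 0
    reached_x_then_y = 0
    for pages in sessions:
        if x in pages:
            reached_x += 1
            # look for y only after the first visit to x
            if y in pages[pages.index(x) + 1:]:
                reached_x_then_y += 1
    return reached_x_then_y / reached_x if reached_x else 0.0

def within_clicks(site_graph, start, max_clicks=3):
    """Pages reachable from start with at most max_clicks clicks (breadth-first search)."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if distance[page] == max_clicks:
            continue  # do not follow links beyond the click limit
        for target in site_graph.get(page, ()):
            if target not in distance:
                distance[target] = distance[page] + 1
                queue.append(target)
    return set(distance)

sessions = [
    ["home", "product", "checkout"],
    ["home", "product"],
    ["home"],
]
site_graph = {"home": ["a"], "a": ["b"], "b": ["c"], "c": ["d"]}

rate = conversion_rate(sessions, "product", "checkout")
reachable = within_clicks(site_graph, "home")
```

In this toy site, page `d` lies four clicks from `home` and therefore violates the 3-click rule.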
70Discussion point 3 (4): Support in off-the-shelf software: possibilities and limitations / example: country and language
- Language
- is usually defined as either the presentation language (in a site with dynamic pages generated by a content management system, this can easily be extracted)
- or the language (assumed to be) preferred by the user (the browser setting, which in most cases is likely to be the default with which the browser is shipped).
- Country is inferred from the IP address and an IP → geo-coordinates mapping. Such mappings are provided by software like MaxMind. This is relatively reliable according to the producers and according to a test we did (publication in preparation).
- To obtain the user's native language, we inferred it from the Geo-IP mapping and official data on official languages in countries around the world. In a small experimental sample in which we asked users to specify their native language, we obtained quite high accuracy (Kralisch & Berendt, NRHM 2005).
- I do not know of data on the accuracy of the browser setting → native language mapping, or of data comparing it to the Geo-IP approach we used.
- But only the combination of presentation language and user's native language gives information about whether a user accesses content in his/her native language or in a foreign language, and this knowledge may be much more important for personalization than presentation language or preferred language alone (see Kralisch, Ph.D. dissertation 2006, http://edoc.hu-berlin.de/docviews/abstract.php?id=27410)
- Nonetheless, even the semantics of presentation language / user language are, to my knowledge, not utilized in off-the-shelf software. One reason is that awareness of the importance of language in Internet design has only begun to develop.
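The combination of presentation language and inferred native language can be sketched as a simple lookup. In this Python sketch the country code is assumed to come from a Geo-IP lookup that is not shown, the table `OFFICIAL_LANGUAGES` is a tiny hypothetical excerpt, and treating every official language of a country as a possible native language is the simplifying assumption.

```python
# Hypothetical excerpt of a country -> official languages table (ISO codes)
OFFICIAL_LANGUAGES = {
    "DE": {"de"},
    "CH": {"de", "fr", "it", "rm"},
    "GB": {"en"},
}

def language_relation(country_code, presentation_language):
    """Classify a request as served in a native or a foreign language.

    country_code is assumed to be the result of a Geo-IP lookup (not shown).
    """
    languages = OFFICIAL_LANGUAGES.get(country_code)
    if languages is None:
        return "unknown"
    return "native" if presentation_language in languages else "foreign"

relation_german_user = language_relation("DE", "en")
relation_swiss_user = language_relation("CH", "fr")
```

Sessions labelled in this way can then be compared, for example with respect to the search options that native-language and foreign-language users choose.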
71Discussion point 3 (5): Consultancy companies
- More advanced forms of conversion-rate analysis, which rely on (some) semantics, have been introduced or popularized by consultancy companies.
- Examples
- NetGenesis (Cutler & Sterne): E-Metrics White Paper, 2000, http://www.emetrics.org/articles/whitepaper.html
- The "funnel" metrics introduced there are now also offered, for example, by Google Analytics
- http://www.google.com/analytics/feature_funnel.html
- Accenture (R. Ghani), Mining the Web to add semantics to retail data mining, in Berendt et al., Web Mining: From Web to Semantic Web (2004).
- survey by Anand et al., On the deployment of Web usage mining, ibid.
- Unfortunately, publicly available data on Web usage are usually at a very high level of aggregation and (also for this reason) build on essentially non-semantic analysis types, e.g.
- http://www.nielsen-netratings.com/resources.jsp?section=pr_netv&nav=1
72Discussion point 3 (6): Likely reasons
- One major problem is a divergence between the (current or definitional?) nature of data mining / knowledge discovery on the one hand, and business expectations on the other:
- KD is still more an art than an engineering process, with few standards even for the process.
- Business often expects data mining to be a set of fully automatic, pre-packaged black-box solutions.
- The CRISP-DM process model shown on slide 16, for example, is a very high-level attempt at standardisation which leaves many details open.
- In fact, it can be (and often is) argued that the search for "interesting and novel" patterns through exploratory data analysis by definition involves hand-crafting. Going back to the original definition of data mining (see slide 13), one could argue that looking for the values of pre-defined pattern templates (e.g., conversion rates) is the antithesis of "novel patterns" and thus by definition not data mining.
- On the other hand, Web usage mining is essentially market research: a study of user / consumer behaviour. Market research is an established discipline in which it is quite accepted that methods involve human intervention and interpretation rather than the automatic application of pre-packaged procedures (one example is the focus-group method).
73Discussion point 3 (7): Likely reasons, contd.
- Maybe this is a perception problem: While it is clear that consumer opinions bear a strong qualitative element (such that focus groups cannot be prepared, administered and interpreted by a machine only), data mining carries the image of number crunching (implying that computers are the main actors here).
- In line with this, the responsible people often have disjoint qualifications: The market research people have a strong background in the relevant social-science methods; the IT people (who are expected to do the data mining "on the side") can use tools, but usually have limited knowledge about empirical methods in general or data mining in particular.
- This point was discussed at a panel at the WebKDD workshop at SIGKDD 2005; one result was that the job description "Chief Data Officer" (= a senior-management person with resources who knows about data mining in the sense of data analysis AND computers) was a really recent invention. In the meantime, data-mining consultancies filled the gap (but had to convince companies they were worth it).
- Or it is a problem of lacking standards (once we have behaviour models of retail sites, of education sites, etc., we can pre-package these behaviour ontologies and even compare sites).
- Standards (in behaviour modeling) require that there is an interest in what the behaviour models say, and an interest in being comparable to other sites. Encouraging developments in this direction can currently be observed in the digital libraries community.
- ... to be continued ...
74Thank you for your questions!