Title: Technical Developments Related to Quality Issues
1. Technical Developments Related to Quality Issues
- Brian Kelly
- UK Web Focus
- UKOLN University of Bath
- Bath, BA2 7AY
- B.Kelly_at_ukoln.ac.uk
- http://www.ukoln.ac.uk/
- Contents
- Application-based Developments
- Protocol Developments
- Conclusions
UKOLN is funded by the British Library Research
and Innovation Centre, the Joint Information
Systems Committee of the Higher Education Funding
Councils, as well as by project funding from the
JISC's Electronic Libraries Programme and the
European Union. UKOLN also receives support from
the University of Bath where it is based.
2. Application-Based Solutions
- Sophisticated search engines are being developed
- Google
- Large-scale search engine for the research
community (now commercial)
- Clever
- IBM research project
- Direct Hit!
- Records how users make use of search engines
- Alexa
- Allows end users to vote on resources
3. Google
- Google uses a "PageRank" technique: important
resources are pointed to from many sites and
important sites (e.g. Yahoo).
- See <URL: http://www.google.com/>
[Screenshots: search for "Digital Libraries"; following the link to the first hit]
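
The idea on this slide, that a resource is important when many sites and important sites point to it, can be sketched as a small power iteration. This is only an illustration under stated assumptions, not Google's implementation: the function name `pagerank`, the damping factor of 0.85, and the five-page link graph are all invented for the example.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Toy PageRank by power iteration.

    links maps each page name to the list of pages it links to.
    The damping value is an assumption, not taken from the slides.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly over its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph, invented for illustration.
example_links = {
    "yahoo": ["library", "news"],
    "news": ["library"],
    "blog": ["library", "hobby"],
    "library": ["yahoo"],
    "hobby": [],
}
ranks = pagerank(example_links)
best = max(ranks, key=ranks.get)
```

In this made-up graph `library` is pointed to by three pages, one of which (`yahoo`) is itself well linked, so it ends up with the highest score.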
4. Clever
See <URL: http://www.almaden.ibm.com/cs/k53/clever.html>
- Aims to find a small set of documents holding the
most authoritative information on the requested
subject.
- Uses a standard search engine to gather a "root
set" of pages matching the query. Next, adds all
pages pointing to or pointed to by the root set.
Thereafter, it uses only the links between these
pages to distill the best authorities and hubs.
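
The steps above (gather a root set, expand it by one link in each direction, then score pages using only the links between them) can be sketched in the style of Kleinberg's hubs-and-authorities (HITS) iteration. This is a rough sketch, not IBM's code: the function names, the page names, and the link list are hypothetical, and the root set is simply given rather than fetched from a real search engine.

```python
import math


def expand_root_set(root_set, all_links):
    """Add every page that points to, or is pointed to by, a root-set page."""
    base = set(root_set)
    for source, target in all_links:
        if source in root_set or target in root_set:
            base.update((source, target))
    return base


def _normalise(scores):
    # Scale scores to unit length so they do not grow without bound.
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {page: v / norm for page, v in scores.items()}


def hubs_and_authorities(pages, all_links, iterations=50):
    """Score pages using only the links between them."""
    links = [(s, t) for s, t in all_links if s in pages and t in pages]
    hub = {page: 1.0 for page in pages}
    auth = {page: 1.0 for page in pages}
    for _ in range(iterations):
        # A good authority is pointed to by good hubs.
        auth = _normalise(
            {p: sum(hub[s] for s, t in links if t == p) for p in pages}
        )
        # A good hub points to good authorities.
        hub = _normalise(
            {p: sum(auth[t] for s, t in links if s == p) for p in pages}
        )
    return hub, auth


# Hypothetical web of (source, target) links, invented for illustration.
all_links = [
    ("guide", "team-a"),
    ("guide", "team-b"),
    ("fan-links", "team-a"),
    ("fan-links", "team-b"),
    ("fan-links", "stats"),
    ("team-a", "stats"),
    ("unrelated", "elsewhere"),
]
root = {"team-a", "fan-links"}  # stands in for the search engine's hits
base = expand_root_set(root, all_links)
hub, auth = hubs_and_authorities(base, all_links)
```

Pages that are linked to by strong hubs rise as authorities, while pages with no link to or from the root set never enter the calculation.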
AltaVista results include sites selling medical
services.
Distinct pages found using Clever
Clever finds the key Baseball sites.
5. Direct Hit
- Direct Hit
- Integrated with search engines such as Yahoo
- Ranks results based on the click profiles of
other users of the search service
http://www.directhit.com/
Users searching for Dublin Core typically click
on links related to metadata. Therefore put
these at the top of the search results.
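
The Dublin Core example can be sketched as a re-ranking step driven by a click log. This is an assumption about how such a scheme might work, not a description of Direct Hit's actual system: the function name `rerank_by_clicks`, the log format, and the URLs are all hypothetical.

```python
from collections import Counter


def rerank_by_clicks(results, click_log, query):
    """Move results that earlier users clicked for this query to the top.

    results   : URLs in the order returned by the underlying search engine
    click_log : (query, clicked_url) pairs recorded from earlier users
    """
    clicks = Counter(url for past_query, url in click_log if past_query == query)
    # sorted() is stable, so results with equal click counts keep their
    # original search-engine order.
    return sorted(results, key=lambda url: -clicks[url])


# Hypothetical search results and click log, invented for illustration.
results = [
    "http://tourism.example/dublin",
    "http://metadata.example/dublin-core",
    "http://fitness.example/core",
]
click_log = [
    ("dublin core", "http://metadata.example/dublin-core"),
    ("dublin core", "http://metadata.example/dublin-core"),
    ("dublin core", "http://metadata.example/dublin-core"),
    ("dublin core", "http://tourism.example/dublin"),
    ("dublin", "http://tourism.example/dublin"),
    ("dublin", "http://tourism.example/dublin"),
]
reranked = rerank_by_clicks(results, click_log, "dublin core")
```

Only clicks recorded for the same query count, so the metadata page moves to the top for "dublin core" even though the tourism page has more clicks overall.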
6. Alexa
- Alexa
- Enables end users to "rate" sites when surfing
- Includes access to related links
- Based on central archive of the web (see <URL: http://www.archive.org/>)
- See also Netscape's What's Related facility
http://www.alexa.com/
- Possibilities
- Signed votes
- Use Alexa model with UK database of resources
7. Summary
- Good News
- A new generation of experimental search engines
is being developed
- Algorithms include
- Making use of link information
- Making use of end users' input
- Collaborative bookmarks (cf. FireFly: You like
"Sex" and "Drugs". So does he, and he also likes
"Rock'n'Roll")
- But such techniques make use of a "brute strength"
approach
- Is there a more elegant solution?
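
The collaborative bookmarks example can be sketched as a tiny overlap-based recommender. This is a toy illustration, not FireFly's algorithm: the function name `recommend`, the scoring by number of shared likes, and the third user are assumptions made for the example.

```python
def recommend(target, likes):
    """Suggest items liked by users whose tastes overlap with the target's.

    likes maps each user name to the set of items that user likes.
    """
    mine = likes[target]
    scores = {}
    for user, theirs in likes.items():
        if user == target:
            continue
        overlap = len(mine & theirs)
        if overlap == 0:
            continue  # no shared taste, so ignore this user
        for item in theirs - mine:
            # Weight each suggestion by how much taste is shared.
            scores[item] = scores.get(item, 0) + overlap
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))


# The first two users follow the slide; "other" is hypothetical.
likes = {
    "you": {"Sex", "Drugs"},
    "he": {"Sex", "Drugs", "Rock'n'Roll"},
    "other": {"Opera"},
}
suggestions = recommend("you", likes)
```

Every pair of users has to be compared, which is one way to read the "brute strength" remark on the slide.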
8. We Need Metadata!
- The Web was originally based on 3 architectural
components.
- Metadata is the missing component.
The W3C is developing a machine-understandable
metadata framework which can automate a variety
of tasks (resource discovery, content filtering,
etc.)
9. RDF
- RDF (Resource Description Framework)
- Provides a metadata framework ("machine
understandable metadata for the web")
- Based on ideas from content rating (PICS),
resource discovery (Dublin Core), etc.
- Based on a formal data model (directed labelled
graphs)
- Applications include
- cataloging resources
- resource discovery
- intellectual property rights
- content rating
- digital signatures
- privacy
RDF Data Model
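
The data model can be pictured as a directed graph whose edges carry labels: each statement is a (resource, property, value) triple. The sketch below is a minimal in-memory illustration, not an RDF parser or the W3C syntax: the `Graph` class, the `dc:` shorthand for Dublin Core elements, and the property values are assumptions made for the example.

```python
from collections import defaultdict


class Graph:
    """A directed labelled graph stored as (subject, property, value) triples."""

    def __init__(self):
        # subject -> list of (property, value) edges leaving that node
        self._edges = defaultdict(list)

    def add(self, subject, prop, value):
        self._edges[subject].append((prop, value))

    def values(self, subject, prop):
        """Return every value reached from subject along edges labelled prop."""
        return [v for p, v in self._edges.get(subject, []) if p == prop]

    def subjects_with(self, prop, value):
        """Resource discovery: find the resources that carry a given statement."""
        return [
            s for s, edges in self._edges.items() if (prop, value) in edges
        ]


# Illustrative statements; the values are invented for the example.
graph = Graph()
page = "http://www.ukoln.ac.uk/"
graph.add(page, "dc:title", "Example home page")
graph.add(page, "dc:publisher", "UKOLN")
graph.add("http://www.example.org/report", "dc:publisher", "UKOLN")

publishers = graph.values(page, "dc:publisher")
published_by_ukoln = graph.subjects_with("dc:publisher", "UKOLN")
```

Because every statement has the same three-part shape, one store can hold descriptions for different purposes (discovery, rating, rights) and answer queries across them.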
10. Certificates
- Certificates can be provided for
- Services
- Users
- Code (Java, ActiveX)
- Certificate Authorities (CAs) can distribute
certificates
- Global CAs (Verisign, Thawte)
- National CAs (Post Office, central University
body, British Library, etc)
- Government legislation this session related to
digital signatures
11. Certificates Within An Organisation
- Digital signatures will enable publishers (e.g.
Universities) to give an authoritative stamp to
digital resources
Staff and students can be given a certificate
which is used for authentication
Admissions
The CVCP could give certificates to Universities,
who would then be authorised to distribute
certificates within the university
Within the University, the Research Office and PR
Office can allocate legally-binding signatures to
authorised publications
12. Developments for Gateways
- Quality information gateways
- Can make use of signed resources to help
cataloguing
- Can provide input to sophisticated search engines
(similar to Google)
Signed gateway: "this gateway follows xx quality
conventions"
A central organisation could give certificates to
approved information gateways
13. Conclusions
- Automated Indexing
- AltaVista approach
- Comprehensive
- Junk indexed
- Too many hits
- Manual Indexing
- Subject Gateway approach
- Quality
- Value-added services
- Incomplete
- Expensive
- A Third Way
- Combination of automated and manual approaches
- Involvement from SBIG, author and end user
- Exciting possibilities
- Uncertainty of timescales and success
- Coordination required - political issues
(ownership of metadata, selling ads, etc.)