FINDING THINGS THAT ARE HARD TO FIND

Provided by: jgd2
1
FINDING THINGS THAT ARE HARD TO FIND
  • by Jozsef GEGES (Ph.D.)
  • Ovidius Co.Ltd.

2
What is the Invisible WEB?
  • Documents on the WEB that general search engines
    cannot or WILL not include in their indexes
  • You cannot find them with general search engines
  • Contains a vast amount of information
  • much of it authoritative and of high quality
  • much of it specialized
  • Information not structured

3
  • Web Characterization Project
  • Growing number of web sites (IP address, page,
    sub-sub pages)
  • In 2007: 40% public, 24% private, 26% provisional
    sites
  • Public sites (2006):
  • 58% English, 7.5% German, 5-6% Japanese, 3-4%
    each French, Spanish, Chinese, 2% each Italian,
    Dutch, 1% each Korean, Russian, Polish,
    Portuguese
  • Adult sites (2005): 4.1%
  • IP address volatility - all sites (disappearance
    pattern)
  • 6.4% of sites in 2002 were also in 1998; 34% in
    2002 by 2006 end

4
How people search ...
  • Using only one or two search engines
  • Using not more than two or three keywords
  • Focusing on not more than three hit lists
    (pages), approx. the first 50 hits

5
Why should this process be changed ...
  • Contents are very different
  • Location of information is wide
  • Information not structured enough, no relations
    covered up
  • Search engines are not sophisticated & specialized
    enough
  • No filtering on quality

6
Search engine technology
  • Crawlers, spiders: go out to find (7/24/365)
  • new & changed sites; periodic, not for each query
  • Databases, caches
  • gather content; could be submitted, bought
  • Indexing: creating appropriate entries
  • various, mostly proprietary algorithms
  • Retrieval engine: searching on basis of query
  • Interface: gathers query, displays results
  • could be ordered by pay
  • Results: hit list
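
The indexing and retrieval steps listed above can be sketched in a few lines of Python. This is only a minimal illustration, not how any real engine works: the pages are hypothetical stand-ins for what a crawler would have gathered, the names build_index and search are invented for this sketch, and real engines use the proprietary algorithms and ranking mentioned in the slide.

```python
from collections import defaultdict

# Hypothetical pages standing in for what a crawler would have fetched.
PAGES = {
    "http://example.org/a": "deep web databases are hard to find",
    "http://example.org/b": "search engines index web pages",
    "http://example.org/c": "libraries publish subject databases",
}

def build_index(pages):
    """Indexing step: map each word to the set of URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Retrieval step: return the sorted URLs that contain every query word."""
    words = query.lower().split()
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)

INDEX = build_index(PAGES)
print(search(INDEX, "web databases"))
```

The index is built once, ahead of time, and each query only consults the index. This mirrors the point that crawling is periodic and not done for each query, and it also shows why a page the crawler never reached cannot appear in any hit list.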

7
Search engines differ
  • Substantial differences among search engines on
    each aspect
  • No engine covers more than 16% of the WEB
  • Hard to discern & compare coverage
  • National search engines: own coverage
  • Typical search engines: own coverage
  • Many comprehensive sources independent of search
    engines

8
Looking for information
  • Meta search engines
  • Specialized engines
  • Portals and Reference sources
  • Libraries as web sources
  • Subject databases
  • Societies, organizations

9
Meta search engines
  • Search engines that cover search engines
  • Search.com: meta engine of meta engines
  • Dogpile: results from a number of search engines:
    Google, Yahoo! Search, Live Search, Ask.com,
    About, MIVA, LookSmart and more
  • Surfwax: gives statistics and text sources
  • Search Engines Worldwide
  • 211 countries, over 3105 engines
  • Search Engine Guide: categorized by topic
    http://www.searchengineguide.com/pages/Health/
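
What a meta search engine has to do with the hit lists it collects can be sketched as follows. This is an assumption-laden illustration, not the method of Dogpile or any other engine named above: the function name merge_results, the engine names and the URLs are hypothetical, and the scoring rule (summing 1/rank across engines) is just one simple way to combine ranked lists.

```python
def merge_results(result_lists):
    """Merge ranked hit lists from several engines into one deduplicated list.

    Each URL is scored by summing 1/rank over the engines that returned it,
    so a page ranked highly by several engines rises to the top.
    """
    scores = {}
    for hits in result_lists.values():
        for rank, url in enumerate(hits, start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / rank
    # Sort by descending score; break ties alphabetically for stable output.
    return sorted(scores, key=lambda url: (-scores[url], url))

# Hypothetical hit lists; a real meta engine would fetch these from each engine.
RESULTS = {
    "engine_a": ["http://example.org/x", "http://example.org/y"],
    "engine_b": ["http://example.org/y", "http://example.org/z"],
}
merged = merge_results(RESULTS)
print(merged)
```

Because each engine covers only part of the Web, the merged list contains pages that no single engine returned on its own, while pages found by more than one engine appear once and are ranked higher.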

10
  • Vivisimo: clusters results, innovative. They use a
    mathematical algorithm and deep linguistic
    knowledge to find relationships between search
    terms and bring them to light
  • Complete Planet: discover over 70,000 searchable
    databases and specialty search engines (Deep Web
    Directory)
  • IncyWincy: directory of resources
  • Webbrain: results in tree structure, fun to use.
    Shows the Web visually, so you can explore a dynamic
    picture of related information, instead of
    searching through long lists of text

11
Spec. engines & catalogues
  • Cover general & specific areas
  • Open Directory Project: large edited catalogue
    of the web with 4,593,821 sites - 79,154 editors
    - over 590,000 categories (www.dmoz.org)
  • Nat. Acad. of Sciences of Belarus: interesting WWW
    sites about science
  • BUBL LINK: selected Internet resources covering
    all academic subject areas, UK
    (http://bubl.ac.uk/)
  • InfiniSource: search in categories
    (www.infinisource.com)

12
  • Exist in many domains & subjects
  • Psychcrawler: American Psychological Association
  • web index for psychology
  • Entrez PubMed: Nat. Library of Medicine
  • CiteSeer - NEC Research Centre
  • scientific literature, citations index - free
  • KIRKE - Katalog der Internetressourcen für die
    Klassische Philologie aus Erlangen
  • a variety of resources
  • Sch. of Slavonic & East European Studies,
    University College London
  • includes country resources
  • University of Michigan Document Centre
  • official documents from all over the world

13
Reference services
  • Reference services - several models
  • Q&A, directories, email answers etc.
  • Martindale's Reference Desk
  • comprehensive, amazing; also a health desk
  • Ask Jeeves!
  • most popular, commercial
  • Ask ERIC
  • education questions - email answers
  • Information Please: almanac type questions
  • QuestionPoint: Library of Congress & OCLC project
    for a global reference network
  • Virtual Reference Desk: Library of Congress
  • compilation of web reference sites
  • LiveRef - maintained at Iowa State University, a
    registry of real time digital reference services
14
Libraries as web sources
  • Academic libraries providing open collections &
    services; models vary
  • Semmelweis University libraries - big long term
    effort
  • University of California, Berkeley
  • a most elaborate effort together with Sun
    Corporation
  • Bibliothèque Nationale de France
  • includes virtual exhibitions, among others
  • Karolinska University Lib.

15
Virtual libraries on the Web
  • Libraries emerging only on the Web
  • Virtual Library
  • Switzerland, US, UK & other countries; oldest
    virtual library on the Web
  • Internet Public Library: Michigan
  • also a long term effort
  • Librarians' Index to the Internet
  • very popular and comprehensive
  • Academic Info Digital Library
  • many links to digital collections & resources in
    various subjects
  • Gabriel
    http://www.theeuropeanlibrary.org/portal/index.html
  • Gateway to European National Libraries

16
Subject databases
  • Many subject specific sites
  • rich, often unique coverage & services
  • different approaches & requirements
  • Examples in health related domains
  • WebMD Health: WebMD Health, Medscape,
    MedicineNet, eMedicine, eMedicine Health, RxList,
    and The Heart.org; news, medical information
  • Mayo Clinic HealthOasis: health advice

17
Societies, organizations
  • Great many rich sources for searching
  • differences in requirements, depth, richness
  • Examples from variety of organizations
  • Assoc. for Computing Machinery
  • Digital Library: subscription or registration
  • US State Department
  • about the U.S. & other countries
  • FDA, APA, AHA etc.
  • Special Medical Subject web site

18
Language barriers
  • English still the major language
  • but declining, now slightly over 50%
  • Multilingual retrieval: search engines
  • Euroseek
  • searches in a number of languages
  • AlltheWeb: already acquired by YAHOO!
  • results in 45 languages

19
Language barriers: translations
  • A number of translation sites
  • machine aided, i.e. plug in terms, phrases,
    sentences in one language, review in the other
    (much better than nothing)
  • Free Translations: from & to English, 8 other
    languages
  • Google translate
  • Babel Fish: from & to English and 9 languages,
    translates URLs
  • Travlang: great for travellers, but annoying
    commercials

20
Web: keep updated
  • What is going on on the Web? Some major sources of
    news and evaluations
  • Free Pint: newsletter, articles, links
  • Internet Resources Newsletter: UK based
  • ResearchBuzz: daily updates, many aspects
  • About.com Web Search: tools, Web Search Forum
  • Resource Shelf: newsletter with archive

21
Evaluations, ratings
  • Many sources evaluate web sites
  • The Scout Report
  • librarians' BIBLE! Annotations. Comprehensive.
  • Medical Library Assoc.: ten most useful sites
  • MLA user guide for health inf., recommendations
  • Web 100: commercial, user ratings, news
  • Evaluating web pages: UC Berkeley
  • tutorial and guide

22
Archiving the web
  • Internet Archive: a large undertaking
  • includes web archive & lots more; publicly
    available, free
  • 10 billion web pages archived from 1996 to a few
    months ago
  • WaybackMachine: search to look at old versions
    of web pages
  • But there is more, e.g.
  • Million Book Project
  • International Children's Digital Library

23
Needed for Web searching
  • Knowledge & competencies on
  • variety of web sources & their organization
  • search engines
  • web search strategies
  • search dynamics, feedback
  • Keeping up, up, up
  • constant updates, changes, innovations
  • many domain/subject specific

24
Needed for professionals
  • Knowledge of SOURCES in area of interest
  • search engines not enough
  • not too helpful in finding these other sources;
    structure hard to discern
  • Evaluation of sources
  • a key professional skill!
  • standard criteria & Web criteria
  • authority, accuracy, currency (timeliness),
    objectivity, coverage, persistence, usability

25
Needed competencies
  • Knowledge of users & use
  • Knowledge of searching
  • Use of technology
  • Adaptability, flexibility
  • Integration with other resources
  • Teaching others
  • Constant learning & update
  • keeping up, keeping up, keeping up

26
Nancy Clark Recommendations
Florida State University Library
27
Nancy Clark Recommendations II
28
Nancy Clark Recommendations III
29
  • If you keep these in your mind we DO believe
  • Paradise will be Regained