Exploring the Invisible Web - PowerPoint PPT Presentation

1
Exploring the Invisible Web
Kevin R. Morgan, Ed.D., Professor/Designer,
St. Petersburg College
2
Introduction: The Visible and Invisible Web
"I readily believe there are more invisible than
visible things in the universe." (Burnet, 1692),
motto in Coleridge's The Rime of the Ancient
Mariner
There are vast amounts of information available
online, but even more information exists beyond
the grasp of the general search engine.
There is a much larger universe of invisible
information in databases and directories that
cannot be accessed by general purpose search
engines, but it is nevertheless online, free, and
of the highest academic standards.
3
The Information Age and the WWW
  • The Internet: Network to Knowledge
  • A 21st century library of Alexandria
  • A content-rich and interactive information
    network
  • Internet: information resources for teaching
    and learning

Utilizing the WWW for research and learning
The World Wide Web is estimated to contain over
3 billion documents (Barker, 2003).
The Invisible Web is estimated to be 2 to 50
times bigger than the visible web.
4
What is the Invisible Web?
The Invisible Web is a metaphor used to
describe the vast depth or domain of information
that lies beyond the visibility of our tools for
gathering information.
It is not really invisible, just passed over or
missed. The Invisible Web includes the following
and more:
  • content that has been excluded from general
    purpose search engines and Web directories.
  • databases from universities, libraries,
    organizations, businesses, and government
    agencies.
  • a substantial part of the total Internet.

5
How Big is the Invisible Web?
One study conducted by the search company
BrightPlanet estimated that the inaccessible
part of the web is about 500 times larger than
the part that search engines already provide
access to.
They estimated about 500 billion pages of
information available on the web, and only 1/500
of that information could be reached via general
search engines. (Sullivan, 2000)
More conservative estimates place the Invisible
Web at 2-50 times bigger than the visible web.
(Barker, 2003)
Even using the most conservative estimates, the
Invisible Web represents a considerable quantity
of information that lies beneath the surface of
the Web. It is deeper than we thought!
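The estimates above can be checked with a little arithmetic. The short Python sketch below is only an illustration: the constant and function names are invented for this example, and the figures are the ones quoted on the slides (Barker, 2003; Sullivan, 2000), not new measurements.

```python
# Figures quoted on the slides; names are hypothetical, chosen for this sketch.
VISIBLE_WEB_DOCS = 3_000_000_000          # visible web estimate (Barker, 2003)
BRIGHTPLANET_TOTAL_PAGES = 500_000_000_000  # BrightPlanet total (Sullivan, 2000)
BRIGHTPLANET_REACHABLE_FRACTION = 500     # "only 1/500" reachable by search engines


def invisible_range(visible_docs, low_multiple, high_multiple):
    """Return the (low, high) size of the Invisible Web for a range of multiples."""
    return visible_docs * low_multiple, visible_docs * high_multiple


def reachable_pages(total_pages, one_in):
    """Return how many pages are reachable if only 1 in `one_in` is indexed."""
    return total_pages // one_in


if __name__ == "__main__":
    low, high = invisible_range(VISIBLE_WEB_DOCS, 2, 50)
    print(f"Conservative estimate: {low:,} to {high:,} documents")
    reachable = reachable_pages(BRIGHTPLANET_TOTAL_PAGES, BRIGHTPLANET_REACHABLE_FRACTION)
    print(f"BrightPlanet estimate: {reachable:,} reachable pages")
```

With these figures, the conservative range works out to 6 billion to 150 billion documents, and the BrightPlanet estimate implies roughly 1 billion pages reachable through general search engines.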
6
[Diagram: the Visible Web and the Invisible Web]
Visible Web: 3 billion documents, captured in
general purpose search engines
Invisible Web: 100-500 times larger than the
Visible Web (6 billion to 30 billion documents ???)
Sources shown in the diagram:
  • Government Directories
  • Educational Research
  • Library of Congress
  • ERIC
  • Institutional Directories
  • Scientific Research
  • Specialized Search Engines and Directories
  • Colleges and Universities
  • Organizations, Public/Private
7
The Web: Visible and Invisible Information
The visible Web is made up of HTML Web pages
that search engines have chosen to include in
their indices.
Google, AltaVista, LookSmart, and other general
purpose search engines all cover the surface of
the Web but are limited in going into the deeper
reaches of cyberspace.
There is an even greater amount of invisible
information in databases that cannot be directly
accessed by general purpose search engines, but
it is nevertheless online and freely available
to the savvy searcher.
8
Search Engines: Robots, Knowbots, and Spiders
Search engines do not really search the Web
directly.
Computer robot programs, sometimes referred to as
"crawlers," "knowledge-bots," or "knowbots," are
used by search engines to roam the World Wide
Web.
Most large search engines operate several robots
or spiders all the time. Even so, the Web is so
enormous that it can take six months for spiders
to cover it, resulting in a degree of
"out-of-datedness" in results. (Barker 2003)
Spiders or crawlers are programmed to retrieve
general information while avoiding unfriendly or
dangerous URLs that can trap them in endless
loops of links, known as spider traps.
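One common way for a crawler to avoid endless loops is to remember which pages it has already visited and to stop following links past a fixed depth. The Python sketch below illustrates that idea on a tiny made-up link graph; the URLs, the LINKS table, and the crawl function are all hypothetical, and this is not a description of how any particular search engine works.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the URLs it links to.
# The two calendar pages link to each other forever, imitating a spider trap.
LINKS = {
    "http://example.edu/": ["http://example.edu/about", "http://example.edu/cal/1"],
    "http://example.edu/about": ["http://example.edu/"],
    "http://example.edu/cal/1": ["http://example.edu/cal/2"],
    "http://example.edu/cal/2": ["http://example.edu/cal/1"],
}


def crawl(start, links, max_depth=3):
    """Breadth-first crawl that skips visited pages and caps link depth."""
    visited = set()
    queue = deque([(start, 0)])
    order = []
    while queue:
        url, depth = queue.popleft()
        # Skip pages already seen (breaks loops) or too deep (limits traps
        # that keep generating new URLs).
        if url in visited or depth > max_depth:
            continue
        visited.add(url)
        order.append(url)
        for next_url in links.get(url, []):
            queue.append((next_url, depth + 1))
    return order


if __name__ == "__main__":
    for page in crawl("http://example.edu/", LINKS):
        print(page)
```

The visited set is what stops the two calendar pages from bouncing the crawler back and forth; the depth cap is a second safeguard for traps that produce a new URL on every click.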
9
Reasons for Invisibility of Some Pages
There are certain types of pages that search
engine companies routinely exclude by policy to
save time and money.
Some pages present technical barriers to web
crawlers and are passed over by general purpose
search engines for time and efficiency.
For example, a spider or crawler will back off
when encountering a question mark (?) in a URL.
To save time and money, spiders are programmed
to avoid or exclude many sites, including
educational, governmental, and organizational
databases.
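The question mark rule can be pictured as a simple filter. The Python sketch below uses the standard library's urllib.parse to check whether a URL carries a query string (the part after the "?") and keeps only the URLs that do not. The function names and the example.edu addresses are made up for illustration, and real crawlers may apply different or more nuanced rules.

```python
from urllib.parse import urlsplit


def looks_dynamic(url):
    """Return True if the URL has a query string, i.e. text after a "?"."""
    return urlsplit(url).query != ""


def filter_static(urls):
    """Keep only the URLs a question-mark-averse crawler would follow."""
    return [url for url in urls if not looks_dynamic(url)]


if __name__ == "__main__":
    # Hypothetical addresses used only for this example.
    candidates = [
        "http://www.example.edu/about.html",
        "http://www.example.edu/catalog/search?subject=history",
    ]
    for url in filter_static(candidates):
        print("would crawl:", url)
```

A database search page such as the second address would be skipped under this rule, which is one way its contents end up in the Invisible Web.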
10
Visibility and Invisibility
[Diagram: examples placed along the Visible Web
and Invisible Web]
  • Educator's Reference Desk
  • ERIC Database
  • The Library of Congress
  • Special Collections
  • a page has a ? in its URL
  • URLs ending in edu, org, gov
  • Institutions' and organizations' internal
    directories
  • General Search Engines and Subject Gateways
It is very difficult to predict what sites will
or won't be part of the Invisible Web. As search
engines change their policies, what is invisible
today can become visible tomorrow. Many sites are
already hybrid, with both visible and invisible
components.
11
The Value in Using the Invisible Web
Invisible Web resources offer the highest level
of authority as educational institutions and
government organizations maintain a high level
of quality control over their information.
Specialized search interfaces provide more
control over search input and output with
increased precision.
Comprehensive resources allow searchers to
perform exhaustive searches within a specific
subject area and stay current.
The search can yield exhaustive results of timely
content. Because they are updated often,
Invisible Web databases have the most current
information available online.
12
Understanding the Invisible Web
The data found in the Invisible Web cannot be
accessed easily via general purpose search
engines.
The Invisible Web is not the sole solution to all
of one's information needs. It should be used in
conjunction with other informational sources,
including general searches.
Invisible Web resources clearly identify who is
providing the information, making it easy to
judge the authority of the content and its
provider.
Targeted crawlers offer more comprehensive
coverage of their subjects than general purpose
search engines.
13
Finding the Subject Databases and Directories
Much of the Invisible Web is made up of the
contents of thousands of specialized databases
accessible online.
Have a clear subject in mind to find the best
specialized databases for your subject of study
or field of research.
Many databases can be found by adding the word
"database" after a subject term, such as
"humanities database" or "history database."
Another tip is to search using the words "web
directory" followed by your topic. If a directory
web page refers to itself using the words "web
directory," you will locate it.
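The two search tips above amount to simple query patterns. As a small illustration, the Python sketch below builds both kinds of query string from a subject term; the function names are invented for this example, and the resulting text is meant to be typed or pasted into any general purpose search engine.

```python
def database_query(subject):
    """Build a query that appends the word "database" to a subject term."""
    return f"{subject.strip()} database"


def directory_query(topic):
    """Build a query with the exact phrase "web directory" followed by a topic."""
    return f'"web directory" {topic.strip()}'


if __name__ == "__main__":
    for subject in ["humanities", "history"]:
        print(database_query(subject))
        print(directory_query(subject))
```

Putting "web directory" in quotation marks asks most search engines to match the exact phrase, which fits the tip that a directory page describing itself with those words will be found.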
14
Searching Tip: Use Subject Gateways
Searching through subject databases and web
directories may be unfruitful for the novice
searcher or student. Many of these independent
searches can end in blocked access.
Problem: Many of the databases are password
protected.
Solution: An easier and more fruitful method for
finding databases relating to a specific subject
area is to use some of the gateway sites that
have already been organized by subject and
content.
These subject gateways are organized from general
to specific, enabling students, educators, and
researchers to find valuable visible and
invisible sources on the Internet.
15
Educational Gateways
  • Infomine provides a gateway to scholarly
    Internet resource collections
    http://infomine.ucr.edu/
  • Academic Info also provides an educational
    subject directory and subject gateways
    http://www.academicinfo.net/
  • The Educator's Reference Desk has become the
    new access gateway to the ERIC databases
    http://www.eduref.org/
  • The Alliance for Life-Long Learning offers
    online classes from Stanford, Yale, and Oxford
    Universities and provides a library of online
    resources through its Academic Subject
    directories that meet the highest academic
    standards
    http://www.alllearn.org/er/directories.cgi
16
General Purpose Subject Gateways
  • Use the Invisible Web Directory from Sherman and
    Price's companion site to The Invisible Web
    http://www.invisible-web.net/
  • See this multi-subject guide to specialized
    search engines
    http://www.searchability.com/
  • Explore CompletePlanet to link to over 103,000
    searchable databases and specialty search
    engines
    http://www.completeplanet.com/
17
Evaluating Invisible Web Resources
  • The Librarians' Index to the Internet provides
    an annotated directory with cross-reference
    links to both visible and invisible content
    http://lii.org/
  • ResearchBuzz provides daily updates on search
    engines, new software, browser technology, Web
    directories, and databases
    http://researchbuzz.com
  • The Scout Report provides academics,
    researchers, librarians, and the K-12 community
    with valuable online information
    http://scout.cs.wisc.edu/index.php
  • The Internet Resources Newsletter is a monthly
    newsletter for academics, students, scientists,
    and social scientists
    http://www.hw.ac.uk/libWWW/irn/irn.html
18
References
Barker (2003). Recommended Search Engines: Table
of Features. UC Berkeley.
http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/SearchEngines.html
Sherman & Price (2001). The Invisible Web:
Uncovering Information Sources Search Engines
Can't Find. CyberAge.
Sullivan (2000). Invisible Web Gets Deeper. The
Search Engine Report, August 2000.
http://searchenginewatch.com/sereport/article.php/2162871
19
Exploring the Invisible Web
Contact Information
Dr. Kevin R. Morgan, St. Petersburg College
eCampus, Seminole, Florida
morgank_at_spcollege.edu