Title: Exploring the Invisible Web
1. Exploring the Invisible Web
Kevin R. Morgan, Ed.D., Professor/Designer,
St. Petersburg College
2. Introduction: The Visible and Invisible Web
"I readily believe there are more invisible than
visible things in the universe." (Burnet,
1692), motto in Coleridge's The Rime of the
Ancient Mariner
There are vast amounts of information available
online, but even more information exists beyond
the grasp of the general search engine.
There is a much larger universe of invisible
information in databases and directories that
cannot be accessed by general purpose search
engines, but it is nevertheless online, free, and
of the highest academic standards.
3. The Information Age and the WWW
- The Internet: Network to Knowledge
- A 21st century library of Alexandria
- A content-rich and interactive information
network
- The Internet: information resources for teaching
and learning
Utilizing the WWW for research and learning
The World Wide Web is estimated to contain over
3 billion documents. (Barker, 2003)
The Invisible Web is estimated to be 2 to 50
times bigger than the visible web.
4. What is the Invisible Web?
The Invisible Web is a metaphor used to
describe the vast depth or domain of information
that lies beyond the visibility of our tools for
gathering information.
It is not really invisible, just passed over or
missed. The Invisible Web includes the following
and more:
- content that has been excluded from general
purpose search engines and Web directories.
- databases from universities, libraries,
organizations, businesses, and government
agencies.
- a substantial part of the total Internet.
5. How Big is the Invisible Web?
One study, conducted by the search company
BrightPlanet, estimated that the inaccessible
part of the web is about 500 times larger than
what search engines already provide access to.
They estimated that about 500 billion pages of
information are available on the web, and that
only 1/500 of that information could be reached
via general search engines. (Sullivan, 2000)
More conservative estimates place the Invisible
Web at 2-50 times bigger than the visible web.
(Barker, 2003)
Even using the most conservative estimates, the
Invisible Web represents a considerable quantity
of information that lies beneath the surface of
the Web. It is deeper than we thought!
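The estimates on this slide can be compared with a little arithmetic. The sketch below assumes the figure of 3 billion documents (Barker, 2003) as the size of the visible web and applies the 2-50 multipliers to it. The function name and constants are illustrative and do not come from the cited studies. It also works out what the BrightPlanet figures, as quoted above, would imply for the reachable part of the web.

```python
# Rough arithmetic on the estimates quoted above (illustrative only).
VISIBLE_WEB_DOCS = 3_000_000_000  # "over 3 billion documents" (Barker, 2003)


def invisible_web_range(visible_docs, low_multiple, high_multiple):
    """Return (low, high) document counts for a range of multipliers."""
    return visible_docs * low_multiple, visible_docs * high_multiple


# Conservative estimate: 2 to 50 times the visible web.
conservative = invisible_web_range(VISIBLE_WEB_DOCS, 2, 50)

# BrightPlanet figures as quoted: about 500 billion pages, 1/500 reachable.
BRIGHTPLANET_TOTAL = 500_000_000_000
reachable = BRIGHTPLANET_TOTAL // 500

print(f"Conservative range: {conservative[0]:,} to {conservative[1]:,} documents")
print(f"Reachable under the BrightPlanet figures: {reachable:,} pages")
```

The two sources imply different sizes for the reachable web (1 billion pages versus 3 billion documents), so the multipliers are not directly comparable.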
6. [Diagram: the Visible Web and the Invisible Web]
Invisible Web: 100-500 times larger than the
Visible Web (6 billion-30 billion documents ???)
Visible Web: 3 billion documents, captured in
general purpose search engines
Diagram labels: Government Directories;
Educational Research; Library of Congress; ERIC;
Institutional Directories; Scientific Research;
Specialized Search Engines and Directories;
Colleges and Universities; Organizations
(Public/Private)
7. The Web: Visible and Invisible Information
The visible Web is made up of HTML Web pages
that search engines have chosen to include in
their indices.
Google, AltaVista, LookSmart, and other general
purpose search engines all cover the surface of
the Web but are limited in going into the deeper
reaches of cyberspace.
There is an even greater amount of invisible
information in databases that cannot be directly
accessed by general purpose search engines, but
it is nevertheless online and freely available
to the savvy searcher.
8. Search Engines: Robots, Knowbots, and Spiders
Search engines do not really search the Web
directly.
Computer robot programs, sometimes referred to as
"crawlers," "knowledge-bots," or "knowbots," are
used by search engines to roam the World Wide
Web.
Most large search engines operate several robots
or spiders all the time. Even so, the Web is so
enormous that it can take six months for spiders
to cover it, resulting in a degree of
"out-of-datedness" in results. (Barker, 2003)
Spiders or crawlers are programmed to retrieve
general information while avoiding unfriendly or
dangerous URLs that can trap them in endless
loops of information, known as spider traps.
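The behavior described on this slide, following links while avoiding endless loops, can be sketched with a toy crawler. This is a minimal illustration and not how any particular search engine works. The link graph `TOY_WEB`, the function `crawl`, and the `max_pages` limit are all hypothetical, and no network access is involved.

```python
from collections import deque

# Hypothetical link graph standing in for the Web: page -> pages it links to.
# "/calendar/1" and "/calendar/2" link to each other, forming a loop.
TOY_WEB = {
    "/home": ["/about", "/calendar/1"],
    "/about": ["/home"],
    "/calendar/1": ["/calendar/2"],
    "/calendar/2": ["/calendar/1"],
}


def crawl(start, links, max_pages=100):
    """Breadth-first crawl that never visits the same page twice."""
    visited = []      # pages in the order they were fetched
    seen = {start}    # pages already queued, so loops are not re-entered
    queue = deque([start])
    while queue and len(visited) < max_pages:
        page = queue.popleft()
        visited.append(page)
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return visited


pages = crawl("/home", TOY_WEB)
print(pages)
```

The `seen` set handles simple loops. The `max_pages` cap is a crude guard against traps that generate an unlimited number of distinct URLs.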
9. Reasons for Invisibility of Some Pages
There are certain types of pages that search
engine companies routinely exclude by policy to
save time and money.
Some pages present technical barriers to web
crawlers and are passed over by general search
engines for the sake of time and efficiency.
For example, a spider or crawler will back off
when encountering a question mark (?) in a URL,
which typically marks a dynamically generated
page.
To save time and money, spiders are programmed
to avoid or exclude many sites, including
educational, governmental, and organizational
databases.
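The question mark rule on this slide can be illustrated with a simple URL filter. This is a sketch of the policy as the slide describes it, not the code of any real crawler. The function name `is_crawlable` and the example URLs are made up for illustration.

```python
from urllib.parse import urlsplit


def is_crawlable(url):
    """Return False for URLs that carry a query string (text after "?")."""
    # urlsplit separates the query string from the rest of the URL.
    return urlsplit(url).query == ""


# Hypothetical URLs: a static page and a database-driven search result.
candidates = [
    "http://www.example.edu/library/index.html",
    "http://www.example.edu/catalog/search?subject=history",
]
to_crawl = [url for url in candidates if is_crawlable(url)]
print(to_crawl)
```

Under this rule the static page is kept and the search result page, which is generated from a database on request, is skipped.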
10. Visibility and Invisibility
[Diagram contrasting the Visible Web and the
Invisible Web. Labels: Educators Reference Desk;
ERIC Database; The Library of Congress; Special
Collections; a page has a ? in its URL; URLs
ending in edu, org, gov; Institutions and
Organizations Internal directories; General
Search Engines and Subject Gateways]
It is very difficult to predict which sites will
or won't be part of the Invisible Web. As search
engines change their policies, what is invisible
today can become visible tomorrow. Many sites are
already hybrids, with both visible and invisible
components.
11. The Value in Using the Invisible Web
Invisible Web resources offer the highest level
of authority, as educational institutions and
government organizations maintain a high level
of quality control over their information.
Specialized search interfaces provide more
control over search input and output with
increased precision.
Comprehensive resources allow searchers to
perform exhaustive searches within a specific
subject area and to stay up to date.
The search can yield exhaustive results of timely
content. Invisible Web databases have the most
current information available online, as they are
updated often.
12. Understanding the Invisible Web
The data found in the Invisible Web cannot be
accessed easily via general purpose search
engines.
The Invisible Web is not the sole solution to all
one's information needs. It should be used in
conjunction with other informational sources,
including general searches.
Invisible Web resources clearly identify who is
providing the information, making it easy to
judge the authority of the content and its
provider.
Targeted crawlers offer more comprehensive
coverage of their subjects than general purpose
search engines.
13. Finding the Subject Databases and Directories
Much of the Invisible Web is made up of the
contents of thousands of specialized databases
accessible online.
Have a clear subject in mind to find the best
specialized databases for your subject of study
or field of research.
Many databases can be found by adding the word
"database" after a subject term, such as
"humanities database" or "history database."
Another tip is to search using the words "web
directory" followed by your topic. If a directory
web page refers to itself using the words "web
directory," you will locate it.
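The two search tips on this slide amount to simple query patterns. The sketch below builds them as plain strings. The helper names are hypothetical, and the way a quoted phrase is treated depends on the search engine being used.

```python
def database_query(subject):
    """Tip 1: put the word "database" after the subject term."""
    return f"{subject.strip()} database"


def web_directory_query(topic):
    """Tip 2: the phrase "web directory" followed by the topic."""
    return f'"web directory" {topic.strip()}'


for subject in ["humanities", "history"]:
    print(database_query(subject))
print(web_directory_query("art history"))
```

The resulting strings can be pasted into any general purpose search engine.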
14. Searching Tip: Use Subject Gateways
Searching through subject databases and web
directories may be unfruitful for the novice
searcher or student. Many of these independent
searches can end in blocked access.
Problem: Many of the databases are password
protected.
Solution: An easier and more fruitful method for
finding databases relating to a specific subject
area is to use some of the gateway sites that
have already been organized by subject and
content.
These subject gateways are organized from general
to specific, enabling students, educators, and
researchers to find valuable visible and
invisible sources on the Internet.
15. Educational Gateways
- Infomine provides a gateway to scholarly
Internet resource collections:
http://infomine.ucr.edu/
- Academic Info also provides an educational
subject directory and subject gateways:
http://www.academicinfo.net/
- The Educator's Reference Desk has become the new
access gateway to the ERIC databases:
http://www.eduref.org/
- The Alliance for Life-Long Learning offers online
classes from Stanford, Yale, and Oxford
Universities and provides a library of online
resources through its Academic Subject
directories that meet the highest academic
standards:
http://www.alllearn.org/er/directories.cgi
16. General Purpose Subject Gateways
- Use the Invisible Web Directory from Sherman and
Price's companion site to The Invisible Web:
http://www.invisible-web.net/
- See this multi-subject guide to specialized
search engines:
http://www.searchability.com/
- Explore CompletePlanet to link to over 103,000
searchable databases and specialty search
engines:
http://www.completeplanet.com/
17. Evaluating Invisible Web Resources
- The Librarians' Index to the Internet provides an
annotated directory with cross-reference links to
both visible and invisible content:
http://lii.org/
- ResearchBuzz provides daily updates on search
engines, new software, browser technology, Web
directories, and databases:
http://researchbuzz.com
- The Scout Report provides academics, researchers,
librarians, and the K-12 community with valuable
online information:
http://scout.cs.wisc.edu/index.php
- The Internet Resources Newsletter is a monthly
newsletter for academics, students, scientists,
and social scientists:
http://www.hw.ac.uk/libWWW/irn/irn.html
18. References
Barker (2003). Recommended Search Engines: Table
of Features. UC Berkeley.
http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/SearchEngines.html
Sherman & Price (2001). The Invisible Web:
Uncovering Information Sources Search Engines
Can't Find. CyberAge.
Sullivan (2000). Invisible Web Gets Deeper. The
Search Engine Report, August 2000.
http://searchenginewatch.com/sereport/article.php/2162871
19. Exploring the Invisible Web
Contact Information
Dr. Kevin R. Morgan, St. Petersburg College
eCampus, Seminole, Florida
morgank_at_spcollege.edu