Title: Organization of the Web
1Organization of the Web
T.B. Rajashekar
National Centre for Science Information (NCSI)
Indian Institute of Science
Bangalore - 560 012
(E-Mail: raja_at_ncsi.iisc.ernet.in)
2Why study organization of the Web?
- Using the Web is analogous to using libraries and electronic databases
- Libraries
- How the documents are organised: document types, classification system used
- Access tools: catalogues, indexes, automated catalogues, access points
- Our information need (search topic): translated into the terms of the organization scheme employed by the library
- Information Retrieval systems (e.g. electronic databases)
- How the database is organised: record content, fields, search elements
- Indexing and query language: thesaurus, Boolean logic, truncation, etc.
- Our information need: formulated as a search expression using the query language
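The idea of formulating an information need as a search expression can be illustrated with a small sketch. It assumes a hypothetical three-record database and a simple matcher in which a trailing `*` stands for truncation; the records, field names and function names are invented for illustration and are not taken from any particular retrieval system.

```python
# Hypothetical records; the fields and contents are illustrative only.
RECORDS = [
    {"id": 1, "title": "Library classification systems",
     "keywords": ["cataloguing", "classification"]},
    {"id": 2, "title": "Indexing the Web",
     "keywords": ["indexing", "search engines"]},
    {"id": 3, "title": "Thesaurus construction",
     "keywords": ["thesaurus", "indexing languages"]},
]


def words(record):
    """Return the set of lowercase words found in the title and keywords."""
    text = " ".join([record["title"]] + record["keywords"])
    return set(text.lower().split())


def matches(term, record):
    """Match one term; a trailing '*' means truncation (prefix match)."""
    term = term.lower()
    if term.endswith("*"):
        prefix = term[:-1]
        return any(w.startswith(prefix) for w in words(record))
    return term in words(record)


def search_and(terms, records=RECORDS):
    """Boolean AND: every term must match the record."""
    return [r["id"] for r in records if all(matches(t, r) for t in terms)]


def search_or(terms, records=RECORDS):
    """Boolean OR: at least one term must match the record."""
    return [r["id"] for r in records if any(matches(t, r) for t in terms)]


print(search_and(["index*", "web"]))          # records matching both terms
print(search_or(["catalogu*", "thesaurus"]))  # records matching either term
```

Real retrieval systems add field restrictions, NOT operators and thesaurus lookups on top of this basic pattern.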
3Organization of the Web
- Web servers and browsers
- Web servers store a variety of web-compatible documents and provide access to these on the Internet or an intranet
- PCs, RISC-based workstations/servers
- These documents are accessed using Web browsers like Netscape and IE
- Palm tops, laptops, PCs, workstations, etc.
- Web sites and URL
- One or more web servers identified with a unique web site address on the Internet (e.g. www.iisc.ernet.in)
- Documents available on a Web site are uniquely identified using the URL scheme: access-protocol://host.domain:port/path/filename (e.g. http://www.ncsi.iisc.ernet.in/ncsi/database.html)
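The parts of a URL can be seen by splitting one programmatically. The sketch below uses Python's standard `urllib.parse` module and assumes the slide's example address is written in full as `http://www.ncsi.iisc.ernet.in/ncsi/database.html`; the variant with port 8080 is hypothetical and included only to show where a port number would appear.

```python
from urllib.parse import urlsplit

# Example address from the slide, written in full.
url = "http://www.ncsi.iisc.ernet.in/ncsi/database.html"
parts = urlsplit(url)

print(parts.scheme)    # access protocol
print(parts.hostname)  # host.domain
print(parts.port)      # None when no explicit port is given
print(parts.path)      # /path/filename

# Hypothetical variant with an explicit port, for illustration only.
with_port = urlsplit("http://www.ncsi.iisc.ernet.in:8080/ncsi/database.html")
print(with_port.port)
```

When the port is omitted, the browser falls back to the default port of the access protocol (80 for HTTP).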
4Organization of the Web...
- Anatomy of a web site
- Hardware, software (OS, web server, CGI, database, indexing and search, etc.)
- Dedicated Internet connectivity
- Information content: documents stored in a variety of formats (HTML, SGML, PDF, databases, images, audio, video, etc.)
- HTML pages integrate access to this information
- Organised in a hierarchical manner
- Home page (root page) provides links to second-level HTML pages, which in turn link to third-level HTML pages, and so on
- These pages may contain images and provide access to databases through search forms, PDF files, audio and video, etc., or link to documents on other servers
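The hierarchical arrangement of a site, with a home page linking to second-level pages and those to third-level pages, can be sketched as a breadth-first walk over a link map. The site map and page names below are hypothetical and serve only to illustrate the idea.

```python
from collections import deque

# Hypothetical site map: each page lists the pages it links to.
SITE = {
    "index.html": ["about.html", "services.html"],
    "about.html": ["staff.html"],
    "services.html": ["database.html", "training.html"],
    "staff.html": [],
    "database.html": [],
    "training.html": [],
}


def page_levels(site, home="index.html"):
    """Breadth-first walk from the home page; returns {page: level}."""
    levels = {home: 1}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for linked in site.get(page, []):
            if linked not in levels:  # skip pages already reached
                levels[linked] = levels[page] + 1
                queue.append(linked)
    return levels


levels = page_levels(SITE)
for page, level in levels.items():
    print("  " * (level - 1) + page)  # indent each page by its level
```

A page's level here is simply the smallest number of links needed to reach it from the home page.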
5Organization of the Web...
6Organization of the Web...
The World Wide Web is non-linear. There is no
top, there is no bottom. Non-linear means you do
not have to follow a hierarchical path to
information resources.
You can jump from one link (resource) to another.
You can go directly to a resource if you know its address, the Uniform Resource Locator (URL).
You can even jump to specific parts of a document.
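Jumping to a specific part of a document is normally done with a fragment identifier, the part of the address after `#`. The following sketch separates the two with the standard `urllib.parse.urldefrag` function; the `#access` fragment is an assumption made up for illustration.

```python
from urllib.parse import urldefrag

# Hypothetical address: the "#access" fragment is invented for illustration.
url = "http://www.ncsi.iisc.ernet.in/ncsi/database.html#access"

document, fragment = urldefrag(url)
print(document)  # address of the document itself
print(fragment)  # named location inside the document
```

The browser requests only the document address from the server and then scrolls to the named location itself.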
7Organization of the Web...
- Structure of a web page
- Title (viewable in the title bar of the browser)
- Meta tags (not viewable)
- Description
- Keywords
- Author
- Creation/ modification dates
- Body (viewable content)
- Text, links, etc.
- Page attributes: name, date of creation/modification, size
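The parts of a web page listed above can be pulled out with a short parser. This is a minimal sketch using Python's standard `html.parser` module; the sample page, its meta values and the `PageStructure` class are hypothetical and written only for this illustration.

```python
from html.parser import HTMLParser

# Hypothetical page, written only to illustrate the parts listed above.
PAGE = """<html>
<head>
<title>Sample Home Page</title>
<meta name="description" content="An example page">
<meta name="keywords" content="web, organization">
<meta name="author" content="A. N. Author">
</head>
<body><p>Welcome. <a href="second.html">Next page</a></p></body>
</html>"""


class PageStructure(HTMLParser):
    """Collect the title, the named meta tags and the links of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and "name" in attrs:
            self.meta[attrs["name"]] = attrs.get("content", "")
        elif tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


parser = PageStructure()
parser.feed(PAGE)
print(parser.title)  # shown in the browser's title bar
print(parser.meta)   # not shown to the reader
print(parser.links)  # links found in the body
```

Page attributes such as file size and modification date are not part of the HTML itself; they usually come from the web server.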
8Organization of the Web...
- Web is the totality of web pages stored on web servers
- Spectacular growth in web-based information sources and services
- Education and research
- Entertainment
- Business and commerce
- Personal home pages
- Estimated to contain over 1 billion documents
- Doubling each year
- Over 80 million web sites
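To get a feel for what "doubling each year" implies, the figures above can be extrapolated with simple arithmetic. The sketch below takes 1 billion documents as the starting point and assumes the doubling rate stays constant; it is an illustration of the arithmetic, not a forecast, and the function name is made up here.

```python
def projected_documents(start, years, doublings_per_year=1):
    """Extrapolate a count that doubles at a fixed rate."""
    return start * 2 ** (years * doublings_per_year)


START = 1_000_000_000  # "over 1 billion documents", taken as the starting point

for year in range(4):
    print(year, projected_documents(START, year))
```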
9Types of Web Sites
- Shopping sites
- www.amazon.com (book store)
- www.garden.com (gardening)
- www.indiabookshop.com (book store)
- www.shopperstop.com (clothes, etc.)
- Community sites
- www.acs.org (American Chemical Society)
- www.indiatimes.com
- www.gardencityinfo.com (Bangalore)
- www.nse-india.com (stock traders)
10Types of Web Sites
- Entertainment sites
- disney.go.com (Walt Disney)
- www.starwars.com
- www.carnaticmusic.com
- www.indiatalkies.com
- Identity sites
- www.ibm.com
- www.wipro.com
- www.iisc.ernet.in
11Types of Web Sites
- Learning sites
- www.digitalthink.com (web-based training in IT)
- www.netskills.ac.uk (Internet training)
- www.nationalgeographic.com
- Information sites
- www.computers.com
- www.eb.com (Encyclopaedia Britannica)
- www.thomasregister.com
- www.timesofindia.com
12Types of Web Sites
- Govt. web sites
- Combine identity and information functions
- Public domain, unrestricted access
- Disseminate information about plans, policies, projects, people, facilities, performance, technologies, etc.
- Receive feedback from the public
- Bring openness, transparency in its operations
- Constitutional obligation
13Accessing Web-based Information: Key Problems
- Identification of sources (documents)
- No central card catalog
- Most web pages are not indexed using a standard vocabulary, unlike library catalogues or journal article indexes
- Impossible to reach all related pages/sites directly
- Need to use intermediate, resource-finding tools