Proxy Servers - PowerPoint PPT Presentation

About This Presentation
Title:

Proxy Servers

Slides: 43
Provided by: CarolynW157

Transcript and Presenter's Notes

Title: Proxy Servers


1
Proxy Servers
2
What Is a Proxy Server?
  • Intermediary server between clients and the
    actual server
  • Proxy processes request
  • Proxy processes response
  • Intranet proxy may restrict all outbound/inbound
    requests to the intranet server

3
What Does a Proxy Server Do?
  • Between client and server
  • Receives the client request
  • Decides if request will go on to the server
  • May have a cache; may respond from the cache
  • Acts as the client with respect to the server
  • Uses one of its own IP addresses to get page
    from server
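
A minimal sketch of the request flow above, with no real networking. The class name `SimpleProxy`, the `blocked_hosts` parameter, and the `fetch_from_server` callable (which stands in for the proxy's own request to the server) are assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, Optional
from urllib.parse import urlsplit


class SimpleProxy:
    """Toy model of a proxy: decide, check the cache, then fetch as a client."""

    def __init__(self, fetch_from_server: Callable[[str], str],
                 blocked_hosts: Iterable[str] = ()):
        # fetch_from_server is a hypothetical stand-in for the real HTTP
        # request that the proxy makes from one of its own IP addresses.
        self.fetch_from_server = fetch_from_server
        self.blocked_hosts = set(blocked_hosts)
        self.cache: Dict[str, str] = {}

    def handle(self, url: str) -> Optional[str]:
        host = urlsplit(url).hostname
        # Decide whether the request goes on to the server at all.
        if host in self.blocked_hosts:
            return None
        # Respond from the cache when a copy is available.
        if url in self.cache:
            return self.cache[url]
        # Otherwise act as the client with respect to the server.
        body = self.fetch_from_server(url)
        self.cache[url] = body
        return body
```

A second request for the same URL is answered from `self.cache`, so the server is contacted only once.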

4
Usual Uses for Proxies
  • Firewalls
  • Employee web use control (email etc.)
  • Web content filtering (kids)
  • Black lists (sites not allowed)
  • White lists (sites allowed)
  • Keyword filtering of page content

5
User Perspective
  • Proxy is invisible to the client
  • IP address of proxy is the one used or the
    browser is configured to go there
  • Speed up retrieval if using caching
  • Can implement profiles or personalization

6
Main Proxy Functions
  • Caching
  • Firewall
  • Filtering
  • Logging

7
Web Cache Proxy
  • Our concern is not with browser cache!
  • Store frequently used pages at proxy rather than
    request the server to find or create again
  • Why?
  • Reduce latency: faster to get from the proxy, so
    the server seems more responsive
  • Reduce traffic: fewer requests reach the actual server

8
Proxy Caches
  • Proxy cache serves hundreds/thousands of users
  • Corporations and intranets often use them
  • Most popular requests are generated only once
  • Good news
  • Proxy cache hit rates often reach 50%
  • Bad news
  • Stale content (stock quotes)

9
How Does a Web Cache Work?
  • Set of rules in either or both
  • Proxy admin
  • HTTP header

10
Don't Cache Rules
  • HTTP header
  • Cache-Control: max-age=xxx, must-revalidate
  • Expires: date
  • Last-Modified: date
  • Pragma: no-cache (doesn't always work!)
  • Object is authenticated or secure
  • Fails proxy filter rules
  • URL
  • Meta data
  • MIME type
  • Contents
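
The header-based rules above could be checked roughly as follows. This is a sketch under stated assumptions: the function name `may_cache` and its parameters are hypothetical, and the `no-store` and `private` directives are standard HTTP directives added here although the slide does not list them.

```python
def may_cache(headers: dict, is_authenticated: bool = False,
              is_secure: bool = False) -> bool:
    """Return False when any 'don't cache' signal is present."""
    # Authenticated or secure objects are not cached.
    if is_authenticated or is_secure:
        return False
    # Header names are case-insensitive; normalise names and values.
    h = {k.lower(): v.lower() for k, v in headers.items()}
    directives = {d.strip() for d in h.get("cache-control", "").split(",")
                  if d.strip()}
    # Strictly, no-cache means "revalidate before reuse"; this sketch
    # simplifies and treats it as "do not cache".
    if directives & {"no-store", "no-cache", "private", "max-age=0"}:
        return False
    # Older HTTP/1.0 style directive, which not every cache honours.
    if "no-cache" in h.get("pragma", ""):
        return False
    return True
```

The proxy's own filter rules (URL, metadata, MIME type, contents) would be a separate check on top of this one.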

11
Getting From Cache
  • Use cache copy if it is fresh
  • Within date constraint
  • Used recently and modified date is not recent

12
2. Firewalls
  • Proxies for security protection
  • More on this later

13
3. Filtering at the Proxy
  1. URL lists (black and white lists)
  2. Meta data
  3. Content filters

14
Filtering
[Diagram; labels: label base, Web doc, URL lists, keywords, URLs, ratings]
15
The Problem: the Web
  • 1 billion documents (April 2000)
  • Average query is 2 words (e.g., Sara name)
  • Continual growth
  • Balance global indexing and access and
    unintentional access to inappropriate material

16
Filtering Application Types
  • Proxies
  • Black lists
  • White lists
  • Keyword profiles
  • Labels

17
Black and White Lists
  • Black list: URLs the proxy will not access
  • White list: URLs the proxy will allow access to
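
A small sketch of list-based filtering. Matching on the host name rather than the full URL, and the function name `allowed`, are simplifying assumptions made for illustration.

```python
from typing import Iterable, Optional
from urllib.parse import urlsplit


def allowed(url: str, black_list: Iterable[str] = (),
            white_list: Optional[Iterable[str]] = None) -> bool:
    """Return True if the proxy should fetch the URL."""
    host = (urlsplit(url).hostname or "").lower()
    # With a white list, only listed hosts are permitted.
    if white_list is not None:
        return host in set(white_list)
    # With only a black list, everything not listed is permitted.
    return host not in set(black_list)
```

A white list is the stricter policy: anything not named is refused, whereas a black list lets through every site nobody has listed yet.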

18
How Is Filtering/selection Done?
  • Build a profile of preferences
  • Match input against the profile using rules

19
Black and White Lists
  • Black list of URLs
  • No access allowed
  • White list of URLs
  • Access permitted

20
Lists in Action
  • 1 billion documents!
  • Who builds the lists
  • Who updates them
  • Frequency of updates

21
Labels
  • Metadata tags
  • Rule driven: PICS rules, for example
  • Labels are part of document or separate
  • Separate label bureau

22
Labels
  • Metadata (goes with page)
  • Label Bureau (stored separately from page)

23
Meta Data as part of HTML doc
  • <HTML>
  • <HEAD>
  • <META HTTP-EQUIV=keywords CONTENT=federal>
  • <META HTTP-EQUIV=keywords CONTENT=tax>
  • </HEAD>
  • </HTML>
  • Browser and/or proxy interpret the metadata
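
As an illustration of how a proxy might read such metadata, here is a sketch using Python's standard `html.parser` module. The class and function names are hypothetical; accepting `NAME` as well as `HTTP-EQUIV` is an assumption, since keyword tags are often written that way.

```python
from html.parser import HTMLParser
from typing import List


class MetaKeywordParser(HTMLParser):
    """Collect keywords from META tags in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.keywords: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lower case.
        if tag != "meta":
            return
        a = dict(attrs)
        label = (a.get("http-equiv") or a.get("name") or "").lower()
        content = a.get("content")
        if label == "keywords" and content:
            self.keywords.extend(
                k.strip().lower() for k in content.split(",") if k.strip())


def extract_keywords(html: str) -> List[str]:
    parser = MetaKeywordParser()
    parser.feed(html)
    return parser.keywords
```

The resulting keyword list can then be matched against a filter profile.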

24
Metadata Apart From Doc
  • Label bureaus
  • Request for a doc is also a request for labels
    from one or more label bureaus
  • Who makes the labels
  • Text analysis
  • Community of users
  • Creator of document

25
Labels: Collaborative Filtering
[Diagram; labels: Search Engine, Label Bureau A, Label Bureau B, Author Labels, Labels, Web Site, Rating Service]
26
PICS and PICS Rules
  • Tools for communities to use profiles and
    control/direct access
  • Structure designed by the W3C (World Wide Web Consortium)
  • Content designed by communities of users

27
PICS Rating Data
  • (PICS-1.1 http://www.abc.org/r1.5
  • by John Doe
  • labels on 1998.11.05
  • until 2000.11.01
  • for http://www.xyz.com/new.html
  • ratings (violence 2 blood 1 language 4)
  • )
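
To show how the ratings in such a label could be read by software, here is a rough sketch that pulls the `ratings (...)` clause out of the label text. It is not a full PICS parser; the function name `parse_ratings` and the assumption that the clause is a flat list of name/value pairs are illustrative only.

```python
import re
from typing import Dict


def parse_ratings(label: str) -> Dict[str, float]:
    """Extract category/value pairs from the ratings clause of a label."""
    match = re.search(r"ratings\s*\(([^)]*)\)", label)
    if not match:
        return {}
    tokens = match.group(1).split()
    # Tokens alternate: category name, then numeric value.
    return {name: float(value)
            for name, value in zip(tokens[0::2], tokens[1::2])}
```

For the label on this slide the result maps `violence`, `blood`, and `language` to 2, 1, and 4.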

28
Using a URL List for Filtering
  • (PicsRule-1.1
  • (Policy (RejectByURL (http://www.xyz.com/))
  • Policy (AcceptIf otherwise)
  • )
  • )

29
Using the PICS Data
  • (PicsRule-1.1
  • (serviceinfo (
  • http://www.lablist.org/ratings/v1.html
  • shortname PTA
  • bureauURL http://www.lablist.org/ratings
  • UseEmbedded N
  • )
  • Policy (RejectIf ((PTA.violence > 3) or
    (PTA.language > 2)))
  • Policy (AcceptIf otherwise)
  • )
  • )
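
The effect of this rule can be sketched in a few lines. The function name `decide` and the choice to treat a missing rating as 0 are assumptions; the thresholds mirror the RejectIf clause on this slide.

```python
from typing import Dict


def decide(ratings: Dict[str, float], reject_above: Dict[str, float]) -> str:
    """Reject if any rated category exceeds its limit, otherwise accept."""
    for category, limit in reject_above.items():
        # Assumption: an unrated category counts as 0.
        if ratings.get(category, 0) > limit:
            return "reject"
    # Corresponds to the final "AcceptIf otherwise" policy.
    return "accept"


# Limits taken from the rule: violence > 3 or language > 2 is rejected.
PTA_LIMITS = {"violence": 3, "language": 2}
```

A page rated violence 2 and language 4 would be rejected under these limits, because its language rating is above 2.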

30
Example Medical PICS labels
  • Su: UMLS vocab word, 0-9999999
  • Aud: audience, 1-patient, 3-para, 5-GP, etc.
  • Ty: information type, 5-scientist, 3-patient,
    4-prod
  • C: country, 1-Can, 4-Afghan, etc.
  • Etc.
  • Ratings(su 0019186 aud 35 Ty 3 C 1)

31
User Profiles for Labels
  • Rules for interpreting ratings
  • Based on
  • User preferences
  • User access privileges
  • Who keeps these
  • Who updates these
  • How fine is the granularity

32
Labels and Digital Signatures
  • Labels can also be used to carry digital
    signature and authority information

33
Example
  • ("byKey" (("N" "aba21241241")
  • ("E" "abcdefghijklmnop")))
  • ("on" "1996.12.02T22:20-0000")
  • ("SigCrypto" "aba1241241"))
  • ("Signature" "http://www.w3.org/TR/1998/REC-DSig-label/DSS-1_0"
  • ("ByName" "plipp_at_iaik.tu-graz.ac.at")
  • ("on" "1996.12.02T22:20-0000")
  • ("SigCrypto" (("R" "aba124124156")
  • ("S" "casdfkl3r489")))
    ))

34
Proxy level (hidden)
35
Text analysis of Page content
  • Proxy examines text of page before showing it
  • Generally keyword based
  • Profile of black and/or white keywords

36
Profiles for Text analysis
  • Keywords (sometimes with weights)
  • Reflect interest of user or user group
  • May be used to eliminate pages
  • All but
  • May be used to select pages
  • Only those

37
Keyword matching algorithms
  1. Extract keywords
  2. Eliminate noisy words with stop list (1/3)
  3. Stem (computer, compute, computation)
  4. Match to profile
  5. Evaluate value of match
  6. Check against a threshold for match
  7. Show or throw!
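
The seven steps above can be sketched as follows. The suffix-stripping `stem` function is a deliberately crude stand-in for a real stemmer, and the function names, the example weights, and the treatment of the profile as a set of "black" keywords are all assumptions.

```python
import re
from typing import Dict

# The 27-word stop list from the slides.
STOP_WORDS = {
    "the", "for", "of", "on", "and", "is", "to", "with", "in", "by", "a",
    "as", "be", "this", "will", "are", "from", "that", "or", "at", "been",
    "an", "was", "were", "have", "has", "it",
}

# Longest suffixes first so "computation" loses "ation", not just "s".
SUFFIXES = ("ation", "ing", "er", "ed", "es", "e", "s")


def stem(word: str) -> str:
    """Strip one common suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def score(text: str, profile: Dict[str, float]) -> float:
    """Steps 1-5: extract, drop stop words, stem, match, evaluate."""
    words = re.findall(r"[a-z]+", text.lower())
    stems = [stem(w) for w in words if w not in STOP_WORDS]
    return sum(profile.get(s, 0.0) for s in stems)


def show_page(text: str, profile: Dict[str, float], threshold: float) -> bool:
    """Steps 6-7: show the page only if its score stays below the threshold."""
    return score(text, profile) < threshold
```

Profile keys are stems, so a single entry such as `"comput"` matches computer, compute, and computation alike.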

38
Stop List (35%)
  • the for
  • of on
  • and is
  • to with
  • in by
  • a as
  • be this
  • will are
  • from that
  • or at
  • been an
  • was were
  • have has
  • it
  • (27 words)

39
Matching Profile to Page
  • Similarity?
  • How many profile terms occur in doc?
  • How often?
  • How many docs does term occur in?
  • How important is the term to the profile?

40
Cosine Similarity Measurement
  • Profile terms weighted PW in (0,1), by importance
  • Document terms weighted TW in (0,1), based on
  • frequency in doc
  • frequency in whole set
  • Overall closeness of doc to profile
  • $$\frac{\sum_{\text{all profile terms}} TW \cdot PW}{\sqrt{\sum_{\text{all profile terms}} TW^2 \cdot \sum_{\text{all profile terms}} PW^2}}$$
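
A short sketch of the closeness measure as standard cosine similarity, summed over the profile terms. The function name and the dictionary representation of PW and TW are assumptions; how TW is derived from term frequencies is left out.

```python
import math
from typing import Dict


def cosine_similarity(profile: Dict[str, float], doc: Dict[str, float]) -> float:
    """Cosine of the angle between profile weights (PW) and doc weights (TW)."""
    # Numerator: sum of TW * PW over all profile terms.
    dot = sum(pw * doc.get(term, 0.0) for term, pw in profile.items())
    # Denominator: product of the two vector lengths.
    norm_profile = math.sqrt(sum(pw * pw for pw in profile.values()))
    norm_doc = math.sqrt(sum(doc.get(term, 0.0) ** 2 for term in profile))
    if norm_profile == 0 or norm_doc == 0:
        return 0.0
    return dot / (norm_profile * norm_doc)
```

A value near 1 means the document's weights point the same way as the profile's; 0 means none of the profile terms carry weight in the document.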

41
What works well?
Nothing
42
What's the problem?
  • Site Labels
  • Who does them?
  • Are they authentic?
  • Has the source changed?
  • A billion docs?
  • Black and White lists
  • Ditto
  • Text analysis of page contents
  • Poor results