Week 1 You'll always find what you want - PowerPoint PPT Presentation

About This Presentation
Title:

Week 1 You'll always find what you want


Slides: 52
Provided by: johnn54

Transcript and Presenter's Notes

1
Week 1 You'll always find what you want
2
A deep and uncharted webSpace
  • The web is huge.
  • The problem is that time and space have a
    different meaning on the web.
  • Everything that 'happens' is carved forever.
  • Just try to pull something 'off the web'.
  • Everything you write and publish will defy
    eternity, carved in electrons. The very moment
    you put something on the web, someone, somewhere,
    will make a copy of it.
  • It is bound to reappear, somewhere, sometime:
    the indestructible and redoubtable powers of the
    void.

3
A deep and uncharted webTime
  • Time is different too.
  • As anyone who has real web experience knows,
    something you wrote or published remains
    unanswered, and apparently uncared for, for
    months or years... and then, all of a sudden,
    when you have almost forgotten it yourself, a
    dozen people begin contacting you out of the
    void, with an enormous and, for you, inexplicable
    interest in what you wrote so long ago.

4
(No Transcript)
5
Structure
  • Four main areas
  • Core (the "Nucleus")
  • Hidden databases
  • Outside linkers
  • Outside linked
  • Different techniques are used to access these
    different areas.

6
Hidden databases
  • Hidden databases.
  • These are pages that the Nucleus points to.
  • They may (or may not) point back to the Nucleus.
  • Access is restricted: visitors of sites located
    here are supposed to "pay" in order to access
    them. As you may imagine, these pages are NOT
    mutually linked.
  • Fortunately the web was originally built in order
    to share knowledge.
  • The building blocks, the "basic frames" behind
    the structure of the web are still the same
    regardless of the commercial twist.

7
Hidden databases
  • If I may dare a comparison: exactly as it is
    pretty easy to break any software protection
    written in a higher-level language if you know
    (and use) assembly, so it is easy to break any
    server-side barrier to a given database if you
    know (and can outflank) the protocols used by
    browsers and servers.
  • As a result, let's simply say that for some it is
    relatively easy to access all pages in this area
    by reversing the (simple) Perl or JavaScript
    tricks used to keep them "off limits".
  • (You won't even have to resort to common exploits
    à la "politically correct" :-)

8
Outside linked
  • The "outside linked".
  • The sites in this area are linked from the
    Nucleus but do not point back to it.
  • For instance, the elements of a database of
    images, linked from the Nucleus but not
    necessarily pointing back to it.
  • These pages are "outside" the nucleus, yet not
    particularly difficult to find.

9
Outside linkers
  • Like matter and anti-matter, the "outside linked"
    pages have an inverse counterpart elsewhere on
    the web: the "outside linkers" pages.
  • Indeed, all the pages located in this specific
    area of the web do "point" to the Nucleus but are
    not pointed back to from it.
  • Imagine, as an example, the personal links page
    of a scientist: lots of interesting links to the
    Nucleus, yet no need to publicize its existence.
  • A page with information you may need is there,
    somewhere, without any link whatsoever that could
    bring you to it. Indeed, there are no links back
    from the Nucleus to these pages.

10
Outside linkers
  • The "outside linkers" are a part of the web you
    cannot reach using "normal" search techniques,
    since no link whatsoever points to them.
  • Yet they may hoard knowledge you need. There are,
    fortunately, some techniques that you can apply
    in order to find them, the simplest and most
    common one being 'klebing'.
  • Klebing means using the information found inside
    the referrer fields of site logs and statistics
    when your target visits your site. The trick is
    to lure your potential targets to an interesting
    page you create, and to keep it updated until
    they land there and unsuspectingly leave an entry
    in your web log or stat server.
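
As a rough illustration of the referrer-field idea above, here is a minimal sketch that reads your own server log and lists the outside pages visitors came from. It assumes the log uses the common "combined" format; the host names, the page `bait.htm` and the sample lines are invented for the example.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Combined Log Format (assumed):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

def external_referrers(lines, own_host):
    """Count referring URLs that do not belong to own_host."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # line is in some other format
        referer = match.group("referer")
        if referer in ("", "-"):
            continue  # the visitor sent no referrer
        if urlsplit(referer).hostname == own_host:
            continue  # internal navigation, not an outside page
        counts[referer] += 1
    return counts

# Hypothetical log lines, written by hand for this sketch.
sample = [
    '192.0.2.7 - - [10/Oct/2003:13:55:36 +0000] "GET /bait.htm HTTP/1.0" 200 2326 '
    '"http://hidden.example/links.htm" "Mozilla/4.0"',
    '192.0.2.9 - - [10/Oct/2003:14:01:02 +0000] "GET /bait.htm HTTP/1.0" 200 2326 '
    '"-" "Mozilla/4.0"',
    '192.0.2.9 - - [10/Oct/2003:14:02:10 +0000] "GET /other.htm HTTP/1.0" 200 512 '
    '"http://my.example/bait.htm" "Mozilla/4.0"',
]
found = external_referrers(sample, "my.example")
```

Only the first sample line survives the filter: the second carries no referrer and the third is a visit from your own site.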

11
The Triad
12
Why Teoma
  • Because you can refine, refine, refine... and
    that's it!
  • This is NOT just a simple "search within these
    results" thingy
  • Best choice for starting a query on BROAD topics.
  • Teoma is resistant to index spamming (a huge
    problem for Google).
  • Teoma eliminated all advertisement banners and
    interstitials (popups) in January 2003
  • While Google's 'global linking' gives credit to
    every link equally, Teoma (should) instead find
    'the links that count'... and count them.
  • Teoma "creates on the fly clusters of web pages
    into topic specific web communities"

13
Why Google
  • With Google you can forget hyperlinks: just find
    pages by selecting a set of very peculiar words
    that uniquely identify a given page.
  • Google searches inside .pdf, .doc, etc. files.
    Moreover, it locates the text most relevant to
    your specific query and highlights your keywords
    and their context.

14
Why Fast
  • Because it is fast.
  • Because it covers parts of the web that are not
    covered by Google.
  • Because it is less polluted by useless
    commercial sites. Because it mines the "deep" and
    "obscure" web more than Google or Teoma.
  • Because it is less censored than Google. Because
    it is simply the best main engine for complex
    multi-word searches.

15
Searching
  • 1) LOWERCASE: Always enter your search terms in
    lower case (unless you want to limit your
    search). Most search engines will thus find both
    upper and lower case occurrences of your
    search string. "pAris" is NOT the same as "paris".

16
Searching
  • 2) EXACT SEQUENCE "": Enclose terms in double
    quotation marks if you want to retrieve those
    exact terms in that exact sequence. This may be
    very useful in order to find a specific page.
    Thus "saerch engine" will retrieve some (11)
    pages WITH THIS SAME MISSPELLING.

17
Searching
  • 3) NARROW DOWN (AND, +) and ELIMINATE
    MERCILESSLY (AND NOT, -): Narrow your
    searches by linking your search terms with AND,
    or simply use the plus sign (+). The search
    engine will find only those pages that contain
    all of your search terms. Similarly, exclude
    pages that are not relevant to your search by
    preceding the search term with AND NOT, or
    simply use the minus sign (-). "search engines"
    hints tips techniques -tits -sex -"make money"
    is better than the simpler "search engines"
    hints tips techniques.
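
To make the narrow-down and eliminate idea concrete, here is a minimal sketch of how required and excluded terms filter a set of pages. The function `keep_page` and the three sample pages are hypothetical; real engines work on an index, not on raw text.

```python
def keep_page(text, required, excluded):
    """Return True if text has every required term and no excluded term."""
    words = set(text.lower().split())
    has_all = all(term in words for term in required)
    has_banned = any(term in words for term in excluded)
    return has_all and not has_banned

# Invented page contents, for illustration only.
pages = {
    "a.htm": "search hints and tips for advanced techniques",
    "b.htm": "make money fast with search tips hints techniques",
    "c.htm": "gardening tips",
}
hits = [
    name for name, text in pages.items()
    if keep_page(text, ["hints", "tips", "techniques"], ["money"])
]
```

Page `b.htm` contains every required word but is eliminated by the excluded term; page `c.htm` never qualifies because it lacks two required words.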

18
Searching
  • 4) DOWNSIDE OF THE BOOLEAN OPERATORS: It's often
    difficult to specify exactly what you want to
    include or exclude. You can also get unexpected
    results if you are not careful about your use of
    operators and parentheses. For example, the
    search seeking OR searching AND finding is the
    same as the search seeking OR (searching AND
    finding). Both queries will find documents that
    contain both searching and finding, together with
    documents that contain the word seeking. However,
    the query (seeking OR searching) AND finding is
    not the same. It will find documents containing
    the word finding and, in the same document,
    either seeking or searching. Be careful with the
    boolean operators!
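
The precedence trap described above can be checked with a small sketch. Python's `and` also binds more tightly than `or`, so it reproduces the behaviour the slide describes; the four tiny "documents" are invented for the example.

```python
# Each document is reduced to the set of query words it contains.
docs = {
    1: {"seeking"},
    2: {"searching", "finding"},
    3: {"seeking", "finding"},
    4: {"searching"},
}

def hits(predicate):
    """Return the sorted ids of documents for which predicate is true."""
    return sorted(doc_id for doc_id, words in docs.items() if predicate(words))

# seeking OR searching AND finding
unparenthesized = hits(
    lambda w: "seeking" in w or "searching" in w and "finding" in w
)
# seeking OR (searching AND finding)
and_first = hits(
    lambda w: "seeking" in w or ("searching" in w and "finding" in w)
)
# (seeking OR searching) AND finding
or_first = hits(
    lambda w: ("seeking" in w or "searching" in w) and "finding" in w
)
```

The first two queries return documents 1, 2 and 3; the third returns only 2 and 3, because document 1 never mentions "finding".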

19
Searching
  • 5) "PECULIAR" STRINGS: You should always strive to
    use differentiating keywords when searching the
    web. Words that are commonly used will not help
    you much. Extremely common words like articles
    and prepositions are so worthless that they are
    completely ignored. Try to use words which
    underline the peculiarity of your target. Common
    words, when combined with boolean qualifiers, can
    still be very effective. You must identify the
    main concepts in your topic and determine any
    synonyms, alternate spellings, or variant word
    forms for the concepts. Remember that the more
    "peculiar" a word, the more useful it will be in
    sharpening your search. Example:
    title:"search strateg*" hints tips
    In this case we included the "search strateg*"
    string (which already has an elevated PEC) in the
    title keyword.

20
Searching
  • 6) ASTERISK: Note also the use of the asterisk (*)
    in the previous example. It MUST be used
    after at least 3 characters, and it is valid for
    up to 5 characters or as an element of a phrase.
    For Altavista:
  • Asterisk (*): after 3 specified characters, will
    search for matches in up to 5 trailing letters.
  • Question mark (?): after 3 specified characters,
    will match exactly one more character.
  • Double asterisk (**): more flexible, as it will
    search for matches for an unlimited number of
    trailing characters.
  • You also have the ability to use the wildcards
    interchangeably and more than once in the same
    search string.
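
The wildcard rules above can be modelled with regular expressions. This is only a sketch of the stated rules, not the engine's actual implementation: it assumes lowercase letters, handles a single trailing wildcard, and treats "up to 5" as zero to five letters. The function name `wildcard_to_regex` is made up for the example.

```python
import re

def wildcard_to_regex(pattern):
    """Translate one trailing wildcard into a compiled regex (assumed semantics)."""
    match = re.fullmatch(r"([a-z]{3,})(\*\*|\*|\?)", pattern)
    if not match:
        raise ValueError("need at least 3 letters followed by *, ** or ?")
    stem, wildcard = match.groups()
    tail = {
        "*": "[a-z]{0,5}",   # up to 5 trailing letters
        "**": "[a-z]*",      # any number of trailing letters
        "?": "[a-z]",        # exactly one more character
    }[wildcard]
    return re.compile(re.escape(stem) + tail)

single = wildcard_to_regex("strateg*")
double = wildcard_to_regex("strateg**")
one_more = wildcard_to_regex("strateg?")
```

Under these assumptions `strateg*` matches "strategies" but not "strategically" (six trailing letters), `strateg**` matches both, and `strateg?` matches only words such as "strategy".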

21
Searching
  • 7) STOP WORDS: Stop words are words such as "and",
    "the" and "or" which search engines exclude from
    their searches to make them more effective. These
    terms are excluded because they are either
    extremely common or they are used by the search
    engine for performing more specialized searches.
    Just think about how many documents on the Web
    contain the word "the" and you'll understand how
    important a good stop-word list is for all
    search engines.
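
A minimal sketch of what stop-word removal does to a query before it reaches the index. The word list here is a short illustrative one, not the list of any real engine.

```python
# Illustrative stop-word list (an assumption, not a real engine's list).
STOP_WORDS = {"a", "an", "and", "the", "or", "of", "in", "to"}

def strip_stop_words(query):
    """Lowercase the query and drop every stop word."""
    return [word for word in query.lower().split() if word not in STOP_WORDS]

kept = strip_stop_words("The zen of searching and finding")
```

Only the differentiating words survive, which is why peculiar terms carry the whole weight of a query.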

22
Errors you encounter
  • 400 - Bad request. What does it mean? There's
    something wrong with the URL you typed. Maybe the
    server you're contacting doesn't recognize the
    document you're asking for, maybe it doesn't
    exist, or maybe you're not authorized to access
    it. What can you do about it? Check the URL. Pay
    special attention to uppercase and lowercase
    letters, colons, and slashes. Here's a tip: one
    style convention many sites observe is to slap
    initial capital letters on directory names but
    not filenames. If you get this message
    repeatedly, maybe the note you copied the URL
    from mixed up its uppercase and lowercase.
  • 401 - Unauthorized. What does it mean? You're
    probably accessing a site that's protected and
    you're not on the host's preferred guest list, or
    you typed the password incorrectly. Some sites
    also put a block on domain types--if you're not
    from a .gov or .edu domain, for example, you may
    not be able to gain access. What can you do
    about it? If you're sure you're allowed in, try
    again, and this time look at the keyboard when
    you type. Passwords are often case-sensitive, so
    if you've got your Caps Lock on, take it off. If
    you're trying to break in, we don't want to know,
    but the odds are stacked against you.
  • 403 - Forbidden. What does it mean? You may not be
    allowed to access this document, probably because
    it's either blocked to your domain or it's
    password-protected. What can you do about it? If
    you know the password, try again, carefully. If
    you don't know the password but think you're
    eligible for one, contact the site's Webmaster
    and ask for it.
  • 404 - Not found. What does it mean? The server
    that hosts the site can't find the HTML document
    at the end of the URL. It may be a simple case of
    a mistyped URL, but it may also mean that the
    document doesn't exist anymore. What can you do
    about it? Try going one level up (deleting the
    last part of the URL to the nearest slash) to see
    if the site is live. If it is, check if there are
    links to the document you're looking for. Failing
    that, delete the last slash and type .html (or
    .shtml) instead, and see what that gives you.
  • 503 - Service unavailable. What does it
    mean? There are a variety of possibilities: your
    access provider's server may be down, your
    company's gateway (the connection between the LAN
    and the Internet) may be broken, or your own
    system isn't working. What can you do about
    it? This is usually an easy one: wait a minute
    and try again. If the error persists, identify
    the culprit (access provider, gateway, or your
    system) by process of elimination.
  • Bad file request. What does it mean? Your browser
    supports forms complete with data-entry fields
    and drop-down lists, but not the form you're
    trying to access. Perhaps there's an error or
    unsupported feature in the form. What can you do
    about it? Send email to the Webmaster and try the
    form again some other day.
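
The numbered errors above are standard HTTP status codes, and the "go one level up" advice for a 404 is easy to automate. The following is a minimal sketch using only the Python standard library; the helper names `describe` and `one_level_up` and the example.org URL are invented for the illustration.

```python
from http import HTTPStatus
from urllib.parse import urlsplit, urlunsplit

def describe(code):
    """Return the standard reason phrase and the broad class of a status code."""
    status = HTTPStatus(code)
    if 400 <= code < 500:
        kind = "client error"
    elif 500 <= code < 600:
        kind = "server error"
    else:
        kind = "other"
    return f"{code} {status.phrase} ({kind})"

def one_level_up(url):
    """Delete the last part of the URL path, back to the nearest slash."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    parent = path.rsplit("/", 1)[0] + "/"
    return urlunsplit((parts.scheme, parts.netloc, parent, "", ""))

start = "http://www.example.org/Docs/Guides/page.html"
first_try = one_level_up(start)
second_try = one_level_up(first_try)
```

Calling `one_level_up` repeatedly walks from the missing page towards the site root, one directory at a time, until a live page answers.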

23
Errors you encounter
  • Cannot add form submission result to bookmark
    list. What does it mean? You've just entered a
    search request and tried to save the result as a
    bookmark. Though it may appear as a discrete
    address, the result isn't a legitimate URL, so
    you can't add it to your bookmark list. What can
    you do about it? Try saving the result page as an
    HTML page on your hard disk. Use the Save As
    command, then add the saved page to your bookmark
    list. Depending on the CGI script behind the
    query, you may or may not be successful. But it's
    worth a try.
  • Connection refused by host. What does it mean? You
    may not be allowed to access this document,
    probably because it's either blocked to your
    domain or it's password-protected. What can you
    do about it? If you know the password, try again,
    carefully. If you don't know the password but
    think you're eligible for one, contact the site's
    Webmaster and ask for it.
  • Failed DNS lookup. What does it mean? The domain
    name system can't translate the URL to a valid
    Internet address. This is either a harmless blip
    or the result of a mistyped URL (specifically, a
    mistyped host name). What can you do about
    it? Blips in DNS lookup are common, and often you
    can rectify this by clicking the Reload button.
    If that doesn't work, check your typing of the
    URL carefully. If the problem persists, try again
    after an hour or so.
  • File contains no data. What does it mean? The site
    you've accessed is the right one, but there are
    no Web page documents on it. You may have
    stumbled upon this site just as updated versions
    are being uploaded. What can you do about
    it? Try the URL again, carefully. If that doesn't
    help, try again in an hour.
  • Helper application not found. What does it
    mean? Your browser doesn't recognize a file at
    the Web or Net site you're visiting. Most
    browsers can be extended using helper
    applications (or viewers) to read files they
    don't otherwise recognize. These files aren't
    necessarily graphics--they can be sound files,
    movie clips, or ZIP or SIT archive files you're
    trying to download. What can you do about
    it? The dialog box that carries this message will
    usually give a clue about the file type that's
    missing. (You may see some gibberish about octet
    streams, but after that you'll probably see some
    reference to graphic-TIFF, which gives it away.)
    Look at CNET's Survival Kits for your computing
    platform (Mac, PC, or Unix) for viewers for the
    most common file types. Then follow your
    browser's instructions for assigning a viewer for
    each file format you wish to view online.
  • Host unavailable. What does it mean? The machine
    that hosts this site is probably down for
    maintenance. What can you do about it? If at
    first you don't succeed, hit Refresh or Reload
    again and again. But wait a while between
    refreshes.

24
Errors you encounter
  • Host unknown. What does it mean? The server may
    be down for maintenance, or you may have lost the
    connection (your modem disconnected, or your
    company's T1 line is choking). What can you do
    about it? Hit the Reload button first. This is
    often a blip in the Net. Then check the URL for
    typos (and don't forget case-sensitivity). Then
    make sure you're connected by hitting Reload,
    which will re-establish connections in many
    cases.
  • Network connection was refused by the server. What
    does it mean? The server is probably too busy to
    handle one more user, but it's not configured to
    generate its own message, so this generic message
    shows up instead. What can you do about it? As
    always, try and try again. If that doesn't work,
    wait as long as you can. Then try again.
  • NNTP server error. What does it mean? You're
    trying to log on to a Usenet newsgroup, but you
    can't get to it. The Usenet server is something
    that's made available by your Internet service
    provider, so it may be that this newsgroup isn't
    available at all. What can you do about it? Make
    sure you've typed the URL correctly. If that
    doesn't help, try again later. If the problem
    persists, contact your access provider and give
    them a piece of your mind.
  • Permission denied. What does it mean? You're
    trying to upload a file to an ftp site, and the
    site's administrator doesn't want you to.
    Alternatively, you're using the wrong syntax when
    trying to get a file. Or maybe the site is
    currently too busy to handle your upload. What
    can you do about it? First check that you used
    the correct syntax. Then try again later. If the
    problem persists, send email to the Webmaster and
    ask how you can upload a file to that site.
  • Too many connections--try again later. What does
    it mean? This is another variation on the
    rush-hour error message. You've picked the wrong
    time to call, that's all. What can you do about
    it? Do as it says--try again later, or keep
    hitting the Refresh button until you succeed.
  • Too many users. What does it mean? No ftp site
    has unlimited access: physical connections or
    administrator policy allocate a fixed number of
    anonymous users to a given site. When that number
    is exceeded, all who try to log on receive this
    message. What can you do about it? Just keep
    trying until you get lucky. However, on a busy
    site (like Netscape's the week after a big
    announcement) or one with very limited access
    rights, you may be out of luck. If so, check to
    see whether the site has mirrors, and try one of
    those.
  • Unable to locate host. What does it mean? The
    server may be down for maintenance, or you may
    have lost the connection (your modem disconnected
    or your company's T1 line is choking). What can
    you do about it? Hit the Reload button first.
    This is often a blip in the Net. Check the URL
    for typos (and don't forget case-sensitivity),
    then make sure you're connected by hitting
    Reload, which will re-establish connections in
    many cases.

25
Errors you encounter
  • Unable to locate the server. What does it
    mean? You have either mistyped the URL, or the
    server doesn't exist (you may have outdated
    information). What can you do about it? Your
    mission, should you choose to accept it: enter
    the URL again, looking at the keyboard as you
    type. No luck? Check with your source to verify
    that the URL is correct.
  • Viewer not found. What does it mean? Your browser
    doesn't recognize a file at the Web or Net site
    you're visiting. Viewable files aren't
    necessarily graphics--they can be sound files,
    movie clips, ZIP or SIT archive files, and so on.
    If it's not a GIF or JPEG file, your browser may
    not know what it is. What can you do about
    it? The dialog box that carries this message will
    usually give a clue about the file type that's
    missing. (You may see some gibberish about octet
    streams, but after that you'll probably see some
    reference to graphic-TIFF, which gives it away.)
    Look at CNET's Survival Kits for your computing
    platform (Mac, PC, or Unix) for viewers for the
    most common file types. Then follow your
    browser's instructions for assigning a viewer for
    each file format you wish to view online.
  • You can't log on as an anonymous user. What does
    it mean? This message covers a multitude of sins.
    Some ftp sites allow people who aren't members,
    some don't. Others may allow nonmembers, but
    limit the number of visitors. Another possibility
    is that your browser doesn't support anonymous
    ftp access. The way most browsers handle this is
    to submit "anonymous" as the user ID and your
    email address as the password. The America Online
    browser is one of the few that don't do this.
    What can you do about it? Either try again later,
    after the rush hour, or enter your user ID and
    password manually (using ftp software such as
    WS-FTP). Remember: your ID is anonymous and your
    password is your (I hope for you bogus) email
    address.

26
Project 1, Part II
  • Building the Zen of awareness

27
Cat burglars in the museum after dark
  • Our Target
  • http://gallica.bnf.fr/

"70,000 digitized documents, more intuitive
navigation: this new version of Gallica is the
most significant update since this server was
created in October 1997."
28
http://gallica.bnf.fr/
29
Find Me In the Museum
30
Become One With Your Environment
  • Relax
  • If you focus you are trying too hard and you will
    miss what is happening.
  • Try for 15 minutes then take a break and do
    something else while your mind explores the
    problem laterally.
  • What kind of things is this browser loading?

31
An Outside Linker? Can you find it?
32
http://gallica.bnf.fr/Fonds_Mosaiques/
33
Hints to Find Me
  • 07720489
  • Scripts can be used to acquire the target

34
What to Submit
  • Either the photo (.jpg)
  • Or a short (900 word) essay on
  • How you think the museum is organized (diagrams
    would be helpful)
  • Why you were unable to find the picture
  • Your thoughts on the outside linker: where it
    is, why you can't find it, what it is used for
  • Other interesting observations you may have
    collected

35
Research
Research ?
36
The Search
p or photograph
37
The Search
Try a photo..
38
Become One With the Screen
Catalog number
What's this?
39
This is where our picture is hiding
mediator.exe
Catalogue number
40
Let's see what's coming over the wire
41
What is mediator.exe?
42
Lead Technologies Inc.
LEAD Technologies Inc. v1.01?
Mediator.jpeg
Beginning of picture bits
Normal JPEG header information
Beginning of picture bits
Normal JPEG file format
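
The point of this slide is that a response can be a perfectly normal JPEG even when its URL ends in .exe: the first bytes give it away. Here is a minimal sketch that recognizes a few formats from their leading "magic" bytes. The signatures are the well-known ones for each format; the function name `sniff` and the sample byte strings are invented for the example.

```python
# (leading bytes, label) pairs for a few well-known formats.
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG image"),
    (b"%PDF-", "PDF document"),
    (b"FWS", "SWF (uncompressed)"),
    (b"CWS", "SWF (zlib-compressed)"),
    (b"MZ", "DOS/Windows executable"),
]

def sniff(data):
    """Guess a file type from its first bytes, ignoring its name."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"

# Made-up bytes that merely start the way a JPEG does.
fake_response = b"\xff\xd8\xff\xe0" + b"\x00" * 12
guess = sniff(fake_response)
```

A real check would read the first few bytes of whatever the server sent and compare them the same way, regardless of the file name in the URL.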
43
LeadTools
Save the image
44
mediator.exe
The LEADTOOLS Image Server is an ISAPI extension
for IIS 4 and above and is perfect for web
administrators that have images located at a
central location that need to be converted and
processed based on each client's specific needs.
45
HEX Editor
  • The tool of choice for looking at binary files
  • The tool of choice for modifying unprotected
    binary files
  • We can modify the executable so it does what we
    want it to do without recompiling.
  • Common Features
  • Search
  • Diff
  • Modify
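
For readers without a hex editor at hand, the viewing, search and diff features listed above can be approximated in a few lines. This is a minimal sketch, not a replacement for a real tool; the function names and the sample bytes are invented for the example.

```python
def hexdump(data, width=16):
    """Return classic hex-editor lines: offset, hex bytes, printable text."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{byte:02x}" for byte in chunk)
        text = "".join(chr(byte) if 32 <= byte < 127 else "." for byte in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {text}")
    return lines

def diff_offsets(a, b):
    """Return the offsets at which two equally long byte strings differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]

sample = b"MZ\x90\x00"            # made-up four-byte sample
dump = hexdump(sample)
where = sample.find(b"\x90")      # "Search": offset of the first match, or -1
changed = diff_offsets(b"abcd", b"abXd")
```

The built-in `bytes.find` covers the search feature, while `diff_offsets` reports only positions, which is usually what you want when comparing two versions of the same binary.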

46
PE Format
Windows PE file format
47
swf Format
SWF format (flash files)
signature
version
file size
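
The three fields named on this slide occupy the first eight bytes of an SWF file: a three-byte signature, a one-byte version, and a four-byte little-endian file size. A minimal sketch of reading them follows; the function name `parse_swf_header` and the hand-built sample header are assumptions made for the example.

```python
import struct

def parse_swf_header(data):
    """Read signature, version and file size from the first 8 bytes."""
    if len(data) < 8:
        raise ValueError("an SWF header needs at least 8 bytes")
    # "<3sBI": 3-byte string, unsigned byte, little-endian unsigned 32-bit int
    signature, version, file_size = struct.unpack("<3sBI", data[:8])
    if signature not in (b"FWS", b"CWS", b"ZWS"):
        raise ValueError("not an SWF signature")
    return {
        "signature": signature.decode("ascii"),
        "version": version,
        "file_size": file_size,
        "compressed": signature != b"FWS",
    }

# Hand-built header for illustration: uncompressed, version 6, 1234 bytes.
sample_header = b"FWS" + bytes([6]) + struct.pack("<I", 1234)
header = parse_swf_header(sample_header)
```

For compressed files (signatures CWS and ZWS) the size field gives the length after decompression, so it will not match the size on disk.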
48
PDF Format
PDF format
49
Demo: cust.exe
What are the options?
  • Buy
  • Reset the system clock
  • Change the source code
  • Modify the binary
The shareware has expired. We can only order or
exit the program.
50
cust.exe
By changing the binary image we can:
  • Add new functionality
  • Eliminate existing functionality
  • Force changes in code flow
We have relied on the compiler to encrypt the
source. Is our program really secure?
51
cust.exe
By using only a HEX editor we have added new
functionality that allows us to continue.