Title: Electronic Commerce
1. Electronic Commerce
- Fall 2004, Sept. 21
- Guest Lecturer
- Prof. Michael Lesk
2. Our lecturer
- Professor Lesk has worked on search engines since before the field existed, and wrote much of the code of the original SMART search engine, whose ideas evolved into Google and other engines.
- He is the author of Books, Bytes and Bucks, an influential text on digital libraries.
3. Today's topics
- Search engines and how they work
- Markets versus auctions: who benefits?
4. Search engines
- Search engines have three components
- The spider
- Crawls the web and gets all the pages it can find
- Uses technology such as the wget program in Unix
- The indexer/engine
- Digests the pages
- Builds an index
- Finds pages matching your query
- The interface
- Page where you type your queries
- Advertising space for sponsors and clients
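The spider's job ("get all the pages it can find") is a graph traversal: fetch a page, pull out its links, and queue every link not seen before. The sketch below is a minimal illustration under loud assumptions: the web is a hypothetical in-memory dictionary (`WEB`) instead of real HTTP fetches with wget, and breadth-first order is just one possible crawl policy.

```python
from collections import deque

# Hypothetical in-memory "web": each page maps to the pages it links to.
# A real spider would fetch each page over HTTP and parse the links out of it.
WEB = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["c.html"],
    "c.html": ["a.html", "d.html"],
    "d.html": [],
    "e.html": ["a.html"],  # nothing links to e.html
}

def crawl(seed, web, max_pages=1000):
    """Breadth-first crawl: visit every page reachable from seed, once each."""
    seen = {seed}
    queue = deque([seed])
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)              # "fetch" the page
        for link in web.get(url, []):
            if link not in seen:       # skip pages already queued or fetched
                seen.add(link)
                queue.append(link)
    return order

print(crawl("a.html", WEB))
```

Starting from `a.html`, the crawl never finds `e.html`, because no page links to it: a spider can only find pages that something points to.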
5. Economic model
- Value lies in our ability to find information in a library of several billion items
- This is a service that most people will not pay for
- The service is supported by its marketing/advertising function
- It is important to distinguish between honest placement of ads and devious manipulation of search results (buying rank)
- The latter is considered unfair by users of the engines
6. How big is this business?
- Google has a market value of several billion dollars
- The use of the internet to find goods and services is growing
- Assignment: Try to find figures, on the Internet, that indicate how large the search engine business is, and how it is expected to grow.
7. How do search engines work?
- They start with the source of a page (say this is page 347658):
- <h1>Search engines</h1>
- <h2>Word counts</h2>
- A search engine processes the page and looks for each occurrence of each word. It may also take note of where in the page the word occurs. For example, the engine will note that the word "engine" occurs four times in this short page.
- Then they build an index with entries of the form (PageId, number), like
- Engine: (347658, 4), (347658, h1), (97653, 2), ...
- Search: (347658, 2), etc.
- When you enter the query
- Search engines
- the matching engine goes to work in several steps.
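The index described above can be sketched as a map from each word to the pages it appears on and how many times. This is a minimal illustration, assuming a simplified tokenizer (strip tags, lowercase, split on letters) and no stemming, so "engines" and "engine" get separate entries; the page IDs come from the slide, but the page texts are made up, and the sketch does not record where in the page (h1, h2) a word occurs.

```python
import re
from collections import Counter, defaultdict

def words(html):
    """Strip HTML tags and split the remaining text into lowercase words."""
    text = re.sub(r"<[^>]+>", " ", html)
    return re.findall(r"[a-z]+", text.lower())

def build_index(pages):
    """Map each word to {page_id: number of occurrences on that page}."""
    index = defaultdict(dict)
    for page_id, html in pages.items():
        for word, count in Counter(words(html)).items():
            index[word][page_id] = count
    return index

# Hypothetical page contents, keyed by the page IDs used on the slide.
pages = {
    347658: "<h1>Search engines</h1><h2>Word counts</h2>"
            "<p>A search engine counts each engine word.</p>",
    97653: "<p>An engine and another engine.</p>",
}
index = build_index(pages)
print(index["engine"])
```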
8. The matching engine
- Step 1. Boolean AND.
- This step rejects all pages that are missing any one of the words in your query.
- Step 2. Ranking based on frequencies.
- This step gives a page with (engine, 4) and (search, 1) a combined weight something like 4 + 1 = 5.
- Actually, the weight given to a word is increased a little if the word is uncommon, and decreased if it is common. So for the query
- New engines
- the score would give more weight to "engines" than to the more common word "new".
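The two matching steps can be sketched as follows. The slide only says uncommon words get "a little more" weight; the sketch assumes one common way of doing that, multiplying each count by log(total pages / pages containing the word). The index, page IDs, and `N_PAGES` are hypothetical.

```python
import math

# Hypothetical index: word -> {page_id: count}, out of N_PAGES pages in total.
INDEX = {
    "new": {1: 3, 2: 1, 3: 2, 4: 5},   # common word: on 4 pages
    "engines": {1: 1, 3: 4},           # rarer word: on 2 pages
}
N_PAGES = 10

def boolean_and(index, query):
    """Step 1: keep only pages that contain every query word."""
    sets = [set(index.get(w, {})) for w in query]
    return set.intersection(*sets) if sets else set()

def score(index, query, page, n_pages):
    """Step 2: sum of counts, each weighted by how rare the word is (assumed idf)."""
    total = 0.0
    for w in query:
        postings = index[w]
        weight = math.log(n_pages / len(postings))  # rarer word -> larger weight
        total += postings[page] * weight
    return total

query = ["new", "engines"]
hits = boolean_and(INDEX, query)
ranked = sorted(hits, key=lambda p: score(INDEX, query, p, N_PAGES), reverse=True)
print(ranked)
```

Page 1 has more occurrences of "new", but page 3 wins because its extra matches are on the rarer word "engines".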
9. Matching, final step
- The highly successful engine Google also uses a concept called page rank.
- The idea is that, along with the words in the page, the anchor text of links pointing to that page (for example, "An excellent source for search engines") also tells us something about the page. What it tells us is that this page is about search engines.
10. Authoritativeness
- But it also tells us that someone thought the page was good enough to point to it. So that gives it a degree of authority.
- We can think of authority as a kind of money. If I make a page that points to 3 pages, then I give 1/3 of my own page's authority to each of them. If each of them points to 4 other pages, then each passes 1/4 of the authority it has to each of those pages. Because the web has many closed loops in it, the authority races around like an invisible fluid, and eventually it comes to equilibrium, with each page having a certain amount of it.
- This is illustrated in the spreadsheet at http://www.scils.rutgers.edu/iti-ec/ITI410/2004/Week3/pageRank.xls
- Note: to use it, you must download it from the page above. It contains a macro that is not digitally signed. But Prof. Kantor made it, and you can trust it.
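The "authority as money" picture can be written as one update rule. The symbols are introduced here for illustration: $a_i^{(t)}$ is page $i$'s authority after $t$ rounds, $L(i)$ is the number of pages $i$ links to, and $i \to j$ means page $i$ links to page $j$. Each page splits its authority evenly over its outgoing links, so a page with 3 links sends $a_i/3$ to each.

```latex
a_j^{(t+1)} = \sum_{i \,:\, i \to j} \frac{a_i^{(t)}}{L(i)},
\qquad
M_{ij} =
\begin{cases}
  1/L(i) & \text{if } i \to j \\
  0      & \text{otherwise}
\end{cases},
\qquad
a^{(t+1)} = a^{(t)} M .
```

Each row of $M$ sums to 1, and equilibrium is reached when $a = aM$.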
11. Page Rank Small Web
[Figure: a small web of five pages, A, B, C, D, and E, with links between them]
12. Tracking authoritativeness
To use this spreadsheet for computing "authority":
1. Put the matrix in the dark box. Be sure the numbers in each row sum to 1.
2. Put 1 in each of G3 to G8 (the yellow cells).
3. Press Ctrl-Z and watch the numbers change.
4. When the number in the green box hits zero, stop.
5. Read the authority weights in the yellow boxes.
6. Note that as long as you have some number in each yellow box when you start, the result is always the same.
How it works: the "authority" is spread from each page to the ones it links to. The authority keeps flowing around until it stabilizes. The result tells how much authority each page really has.
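The spreadsheet procedure can be sketched as a loop that passes authority along the links until the total change is essentially zero. This is a minimal sketch of the plain "flowing fluid" version described in the lecture; Google's published PageRank formula also adds a damping factor (a random jump to any page, commonly 0.85), which this code omits. The five-page graph in `LINKS` is made up for illustration and is not the graph from slide 11.

```python
def authority(matrix, start=None, tol=1e-12, max_rounds=10_000):
    """Pass authority along links until it stops changing.

    matrix[i][j] is the share of page i's authority sent to page j;
    each row must sum to 1, as the spreadsheet requires.
    """
    n = len(matrix)
    if any(abs(sum(row) - 1.0) > 1e-9 for row in matrix):
        raise ValueError("each row of the matrix must sum to 1")
    a = list(start) if start is not None else [1.0] * n  # a 1 in every cell
    total = sum(a)
    a = [x / total for x in a]            # scale so the shares add up to 1
    for _ in range(max_rounds):
        # each page j collects its share from every page i that links to it
        new = [sum(a[i] * matrix[i][j] for i in range(n)) for j in range(n)]
        change = sum(abs(x - y) for x, y in zip(new, a))
        a = new
        if change < tol:                  # like the green box reaching zero
            break
    return a

# Hypothetical graph: A->B,C  B->C  C->D,E  D->B,C  E->B  (nothing links to A)
PAGES = "ABCDE"
LINKS = [
    [0, 0.5, 0.5, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0.5, 0.5],
    [0, 0.5, 0.5, 0, 0],
    [0, 1, 0, 0, 0],
]
for page, share in zip(PAGES, authority(LINKS)):
    print(page, round(share, 4))
```

In this made-up graph, A ends with no authority because no page links to it, and the others settle at 3/11, 4/11, 2/11, and 2/11. Starting from other positive numbers gives the same shares.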
13. Questions
- Why does page A end up with no authority?
- Why does page C have less than the others?
- Exercise: download this spreadsheet
- Design your own graph with 5 pages
- Put the matrix into the spreadsheet
- Compute the authoritativeness
- Explain the results! Due next Tuesday.
14. Putting it all together
- The engine combines all that it has done:
- Boolean screening to be sure the words are there
- Calculation based on word frequencies to decide how relevant the page is for your query
- Calculation based on page rank to decide if a page is authoritative
- And it presents the results in ranked order.
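The three steps can be combined in a short pipeline. The slide does not say how relevance and authority are merged, so this sketch assumes the simplest rule, multiplying them, and uses raw word counts for relevance. `INDEX`, `AUTHORITY`, and the page names are hypothetical.

```python
# Hypothetical data: word -> {page: count}, plus precomputed authority per page.
INDEX = {
    "search": {"p1": 2, "p2": 1, "p3": 5},
    "engines": {"p1": 4, "p2": 1},
}
AUTHORITY = {"p1": 0.5, "p2": 2.0, "p3": 1.0}

def search(query, index, authority):
    words = query.lower().split()
    if not words:
        return []
    # 1. Boolean screening: keep pages that contain every query word.
    candidates = set.intersection(*(set(index.get(w, {})) for w in words))

    # 2 and 3. Word-frequency relevance times page authority (assumed rule).
    def score(page):
        relevance = sum(index[w][page] for w in words)
        return relevance * authority[page]

    # 4. Present the results in ranked order, best first.
    return sorted(candidates, key=score, reverse=True)

print(search("Search engines", INDEX, AUTHORITY))
```

Here p1 mentions the query words more often, but p2 ranks first because it carries more authority; p3 is screened out because it never mentions "engines".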
15. Summary
- The three parts of the search engine are
- Crawler
- Indexer/matching engine
- Presentation page
- This represents a vital tool in organizing the explosion of information
- It has an uncertain economic model, currently supported in the same way that the media are, that is, by advertising.
- However, some companies also sell the technology, so that you can buy the Google search engine and apply it to your own web site, to provide better organization for internal users.
16. The challenge of genre
- If you take a typical topic such as
- egely vitality
- you find hundreds of pages. If you suspect that this is a hoax or pseudoscientific device, how do you zoom in on the small number of pages that debunk it (rather than trying to sell it to you for $180)?
- "egely vitality hoax" does not work
- "egely vitality randi" does work. Why?
17. Markets Versus Auctions
- A second way in which the internet and World Wide Web have created a new kind of business is automated auctions such as eBay. eBay is actually making money, and has done so from its early days.
- To see why, look at the spreadsheet
- http://www.scils.rutgers.edu/iti-ec/ITI410/2004/Week3/MarketVsAuction_1.xls
- This lets you experiment. The numbers at the left are the prices that various potential buyers are willing to pay for your product.
- The numbers across the tops of the columns are the various prices that you might sell it at. (Why not sell at prices in between the customers' preferred prices?)
18. Auction continued
- In cell B2 you can place the cost of making the product.
- The row called profit shows how much profit you can make at each particular price.
- In each case, the highest profit that you can make from a single market price is lower than what you can make from an auction.
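The spreadsheet's comparison can be sketched in a few lines. Two assumptions, labeled here and in the code: the market sells at one posted price to every buyer willing to pay at least that much, and the auction is idealized so that each sale captures that buyer's full willingness to pay. The buyer valuations and cost are made up.

```python
def market_profit(valuations, cost):
    """Best profit from one posted price; only buyers valuing >= price buy.

    Only the buyers' own valuations need testing: for a price between two
    valuations, raising it to the next valuation keeps the same buyers
    and earns more on each sale.
    """
    best_price, best = None, 0.0
    for price in sorted(set(valuations)):
        buyers = sum(1 for v in valuations if v >= price)
        profit = (price - cost) * buyers
        if profit > best:
            best_price, best = price, profit
    return best_price, best

def auction_profit(valuations, cost):
    """Idealized auction (assumption): each buyer pays their own valuation."""
    return sum(v - cost for v in valuations if v > cost)

valuations = [10, 8, 6, 5, 3]   # hypothetical buyers' maximum prices
cost = 2                        # hypothetical cost of making one unit
print(market_profit(valuations, cost))
print(auction_profit(valuations, cost))
```

With these numbers, the best single price (5) earns 12, while charging each buyer separately earns 22.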
19. Who benefits?
- Both sides!!
- The seller does not have to choose a single best price; a single price prevents some people from buying, even though they could afford more than it costs the producer.
- The seller gets more profit.
- The process does take longer, as the sales are done serially, so that each price is set separately.
- If you offer several widgets for sale at once, you are obligated to clear at a single price, which will bring in less money from the people who really want to have them.
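The last point can be made concrete by comparing a single clearing price with separately set prices for the same units. This sketch assumes the clearing price is the lowest winning bid; that is one common convention, and others (such as the highest losing bid) exist. The bids are made up.

```python
def winning_bids(bids, units):
    """The highest `units` bids win (ties broken arbitrarily)."""
    return sorted(bids, reverse=True)[:units]

def uniform_price_revenue(bids, units):
    """All winners pay one price: assumed to be the lowest winning bid."""
    winners = winning_bids(bids, units)
    if not winners:
        return 0
    clearing_price = winners[-1]
    return clearing_price * len(winners)

def separate_price_revenue(bids, units):
    """Each winner pays their own bid, as when items are sold one at a time."""
    return sum(winning_bids(bids, units))

bids = [50, 40, 30, 20]   # hypothetical bids for 3 identical widgets
print(uniform_price_revenue(bids, 3), separate_price_revenue(bids, 3))
```

Selling 3 widgets at once clears at 30 each (90 in total), while separate sales bring in 50 + 40 + 30 = 120: the single price gives up the extra money the keenest buyers would have paid.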
20. Summary
- The internet has produced a new industry: search engines
- We have explained how it works
- The economic models are either advertising or sales of the engine technology
- Certain kinds of screening, such as telling scientific critique from advertising hype, are hard to do
21. And...
- The internet has given new life to an old industry, the auction
- By permitting people to buy from anywhere in the world
- By automating the management of the auction, so that the cost of running it has dropped very substantially
- Note: auctions are not perfect for everything. Google (the same one) tried a Dutch auction to set a price for its stock that would not have a bubble. The result was that speculators stayed away, and they had to reduce the number of shares they offered, in order to clear at a price that would not kill them.
- We'll talk about that more in a later lecture.