Andrew System Email Architecture at Carnegie Mellon University

1
Andrew System E-mail Architecture at Carnegie Mellon University
  • Rob Siemborski, Walter Wong
  • rjs3@andrew.cmu.edu, wcw@cmu.edu

Computing Services, Carnegie Mellon University
5000 Forbes Ave, Pittsburgh, PA 15213
Last Revision: 02/19/2004 (rjs3)
2
Presentation Overview
  • History & Goals
  • The Big Picture
  • Mail Transfer Agents
  • Mail Processing (Spam & Virus Detection)
  • The Directory
  • The Cyrus IMAP Aggregator
  • Clients and Andrew Webmail
  • Current Andrew Hardware Configuration
  • Future Directions

3
The Early Years
  • Early 80s: The Andrew Project
  • Campus-wide computing
  • Joint IBM/CMU Venture
  • One of the first large scale distributed systems,
    challenging the mainframe mentality
  • The Andrew File System (AFS)
  • The Andrew Message System (AMS)

4
Goals of the Andrew Message System
  • Reliability
  • Machine and Location Independence
  • Integrated Message Database
  • Personal Mail and Bulletin Boards
  • Separation of Interface from Functionality
  • Support for Multi-Media
  • Scalability
  • Easy to Extend, Easy to Use

5
End of AMS
  • AMS was a nonstandard system
  • Avoid becoming a technology island
  • Desire to not maintain our own clients.
  • AMS was showing scalability problems
  • Desire to decouple the file system from the mail
    system

6
Project Cyrus Goals
  • Scalable to tens of thousands of users
  • Support wide use of bulletin boards
  • Use widely accepted standards-based technologies
  • Comprehensive client support on all major
    platforms
  • Supports a disconnected mode of operation for the
    mobile user

7
Project Cyrus Goals (2)
  • Supports Kerberos authentication
  • Allows for easy sharing of private folders with
    select individuals
  • Separation of the mail store from a distributed
    file system
  • Can be independently installed, managed and set
    up for use in small departmental computing
    facilities

8
More CMU Mail System Goals
  • Allow users to have a single @cmu.edu address no
    matter where their preferred mail store is
    located
  • Some departments provide their own email accounts
  • CMU Name Service
  • Ability to detect and act on incoming Spam and
    Virus Messages
  • Provide access to mail over the Web
  • Integration of messaging into the overall
    Computing Experience

9
The Big Picture
[Diagram: The Internet; Users / Mail Clients; LDAP Directory Servers; Cyrus IMAP Aggregator; Mail Transfer Agents (Three Pools)]
10
Mail Transfer Agents
11
Mail Transfer Agents
  • Andrew has 3 Pools of Mail Transfer Agent (MTA)
    Machines
  • Mail exchangers (MX Servers) receive and handle
    mail from the outside world for the
    ANDREW.CMU.EDU domain.
  • The SMTP Servers process user submitted
    messages (SMTP.ANDREW.CMU.EDU)
  • Mail exchangers for the CMU.EDU domain (the
    CMU.EDU MXs)
  • All Andrew MTAs run Sendmail

12
Mail Transfer Agents (2)
  • Why 3 Pools?
  • MX Servers
  • Subject to the ebb and flow of the outside world
  • Significant CPU-intensive processing
  • Typically handle much larger queues (7,000
    messages each)
  • SMTP Servers
  • Speak directly to our clients
  • Need to be very responsive
  • Very small queues (200 messages each)

13
Mail Transfer Agents (3)
  • CMU.EDU MXs
  • Service separation from Andrew MX servers
  • Mostly just forwarding
  • No real need to duplicate processing done on
    Andrew MX servers
  • All Three Pools are Redundant
  • Minimize impact of a machine failure

14
Mail Transfer Agents (4)
  • Separate MTA pools give significant control over
    incoming email.
  • A message may touch multiple pools
  • Example:
    1. User submits a message to foo@CMU.EDU via the SMTP servers
    2. Message processed by a CMU.EDU MX, bound for foo@ANDREW.CMU.EDU
    3. Message processed by an ANDREW MX
    4. Final delivery to the Cyrus Aggregator
15
Mail Processing
  • All mail passing through the system is processed
    to some degree.
  • Audit Logging
  • Cleaning badly-formed messages
  • Blocking restricted senders/recipients/relays
  • More substantial processing done by Andrew MX
    Servers

16
Mail Processing (2)
  • Spam Detection
  • Performed only on Andrew MX servers
  • Uses Heuristic Algorithms to identify Spam
    Messages (SpamAssassin)
  • Tags message with a header and score
  • User initiated filters (SIEVE) can detect the
    header and act upon it (bounce the message or
    file it into an alternate folder)
  • Very computationally expensive on MX
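
The tag-then-filter split described on this slide can be sketched in Python. This is only an illustration, not the configuration used at CMU: the header names `X-Spam-Score` and `X-Spam-Flag`, the threshold of 5.0, and the folder name `INBOX.Spam` are all assumptions, and the score is passed in rather than computed.

```python
from email import message_from_string
from email.message import Message

SPAM_THRESHOLD = 5.0  # hypothetical cutoff, not taken from the slides


def tag_message(msg: Message, score: float) -> Message:
    """MX side: record the score in headers and deliver the message anyway."""
    # Remove any copies of these headers supplied by the sender.
    del msg["X-Spam-Score"]
    del msg["X-Spam-Flag"]
    msg["X-Spam-Score"] = f"{score:.1f}"
    msg["X-Spam-Flag"] = "YES" if score >= SPAM_THRESHOLD else "NO"
    return msg


def choose_folder(msg: Message) -> str:
    """User-filter side: look only at the header, never rescan the body."""
    if msg.get("X-Spam-Flag", "NO").upper() == "YES":
        return "INBOX.Spam"  # hypothetical folder name
    return "INBOX"


raw = "From: a@example.org\nTo: foo@andrew.cmu.edu\nSubject: hi\n\nbody\n"
tagged = tag_message(message_from_string(raw), 7.3)
print(tagged["X-Spam-Score"], choose_folder(tagged))
```

The expensive scoring happens once on the MX, while the per-user decision stays a cheap header check that each user can change.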

17
Mail Processing (3)
  • Virus Detection
  • Performed on all MTA servers
  • Uses signatures to match virus messages (ClamAV)
  • Bounce message immediately at the initial SMTP
    transaction
  • Initial debate between bounce vs. tag

18
The Directory
19
The Directory
  • Mail delivery and routing is assisted by an
    LDAP-accessible database.
  • Every valid destination address has an LDAP
    entity
  • LDAP lookups can do fuzzy matching
  • LDAP queries done against replicated pool

20
The Directory (2)
  • Every account has a mailRoutingAddress (mRA): the
    next hop of the delivery process
  • mRA is not generally user configurable
  • Some accounts have a user-configurable
    mailForwardingAddress (mFA)
  • mFA will override the mRA
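
The override rule above can be sketched as a small lookup. The dictionary below is a hypothetical in-memory stand-in for the LDAP directory; the addresses are made up, and only the attribute names `mailRoutingAddress` and `mailForwardingAddress` come from the slides.

```python
from typing import Optional

# Hypothetical stand-in for the LDAP-accessible directory.
DIRECTORY = {
    "foo@cmu.edu": {"mailRoutingAddress": "foo@andrew.cmu.edu"},
    "bar@cmu.edu": {
        "mailRoutingAddress": "bar@andrew.cmu.edu",
        "mailForwardingAddress": "bar@example.org",
    },
}


def next_hop(address: str) -> Optional[str]:
    """Return the next delivery hop, or None if the address has no entry."""
    entry = DIRECTORY.get(address.lower())
    if entry is None:
        return None  # no directory entry: not a valid destination
    # A user-configured forwarding address takes precedence over the mRA.
    return entry.get("mailForwardingAddress") or entry["mailRoutingAddress"]


print(next_hop("foo@cmu.edu"))
print(next_hop("bar@cmu.edu"))
```

This sketch does exact matching only; the fuzzy matching mentioned on the previous slide is not modeled.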

21
The Cyrus IMAP Aggregator
22
The IMAP Protocol
  • Standard Protocol developed by the IETF
  • Messages Remain on Server
  • MIME (Multipurpose Internet Mail Extensions)
    Aware
  • Support for Disconnected Operation
  • AMS-Like Features (ACLs, Quota, etc)

23
The Cyrus IMAP Server
  • CMU Developed IMAP/POP Server
  • Released to public and maintained as active Open
    Source project under BSD-like License
  • No available servers implemented all of the
    features needed to replace AMS
  • Designed to be a Black Box server
  • Performance and Scalability were key to design

24
Initial Cyrus IMAP Deployment
  • Single monolithic server (1994-2002)
  • Originally deployed alongside AMS
  • Features were implemented incrementally
  • Users were transitioned incrementally
  • Local users provided a good testing pool
  • Scaled surprisingly well

25
Cyrus IMAP Aggregator Design
  • IMAP not well suited to clustering
  • No real concept of mailbox location
  • Clients expect consistent views of the server and
    its mailboxes
  • Significantly varying client implementation
    quality
  • Aggregator was designed to make many servers
    appear to be a single server to allow any user to
    share a folder with any other user

26
Cyrus IMAP Aggregator Design (2)
  • Three Participating Types of Servers
  • IMAP Frontends (dataless Proxies)
  • IMAP Backends (Normal IMAP Servers: your data
    here)
  • MUPDATE (Mailbox Database)

[Diagram: Users / Mail Clients; Frontends proxy requests for clients; Backends hold traditional mailbox data; MUPDATE Server maintains the mailbox list]
27
IMAP Frontends
  • Fully redundant
  • All are identical
  • Maintain local replica of mailbox list
  • Proxies most requests, querying backends as
    needed
  • May also send IMAP referrals to capable clients

[Diagram: Users / Mail Clients; Frontends proxy requests for clients; Backends hold traditional mailbox data; MUPDATE Server propagates mailbox list changes to frontends]
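
A frontend's job can be sketched as a lookup against its local replica of the mailbox list. This is a simplified model under stated assumptions, not Cyrus code: the class, its method names, and the host names are hypothetical, and real frontends speak IMAP to clients and backends.

```python
from typing import Dict, Optional


class Frontend:
    """A dataless proxy holding only a local replica of the mailbox list."""

    def __init__(self) -> None:
        # mailbox name -> backend host (hypothetical host names below)
        self.mailbox_list: Dict[str, str] = {}

    def apply_update(self, mailbox: str, backend: Optional[str]) -> None:
        """Apply one change pushed from the MUPDATE server (None = deleted)."""
        if backend is None:
            self.mailbox_list.pop(mailbox, None)
        else:
            self.mailbox_list[mailbox] = backend

    def route(self, mailbox: str) -> str:
        """Return the backend a request for this mailbox is proxied to."""
        try:
            return self.mailbox_list[mailbox]
        except KeyError:
            raise LookupError(f"no such mailbox: {mailbox}") from None


frontend = Frontend()
frontend.apply_update("user.foo", "backend1.example.org")
frontend.apply_update("user.bar", "backend2.example.org")
print(frontend.route("user.bar"))
```

Because every frontend carries the same replica and no mail data, any of them can serve any client, which is what makes the pool fully redundant.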
28
IMAP Backends
  • Basically Normal IMAP Servers
  • Mailbox Operations are approved and recorded by
    the MUPDATE server
  • Create / Delete
  • Rename
  • ACL Changes

[Diagram: Users / Mail Clients; requests are proxied by Frontends; Backends hold traditional mailbox data; MUPDATE Server approves mailbox operations]
29
MUPDATE Server
  • Specialized Location Server (similar to VLDB in
    AFS)
  • Provides guarantees about replica consistency
  • Simpler than maintaining database consistency
    between all the frontends
  • Protocol published as experimental RFC 3656

[Diagram: Users / Mail Clients; Frontends update local mailbox list replicas; Backends send mailbox list updates; MUPDATE Server approves and replicates updates]
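
The approve-and-replicate role can be sketched with a single authoritative table that pushes each accepted change to subscribed replicas. This is a toy model, assuming in-process dictionaries instead of network connections; the class and method names are hypothetical and it does not model the wire protocol in RFC 3656.

```python
from typing import Dict, List


class MupdateMaster:
    """Single authority for which backend owns which mailbox name."""

    def __init__(self) -> None:
        self.mailboxes: Dict[str, str] = {}      # authoritative list
        self.replicas: List[Dict[str, str]] = []  # frontend copies

    def subscribe(self, replica: Dict[str, str]) -> None:
        """Bring a frontend replica up to date and keep it updated."""
        replica.clear()
        replica.update(self.mailboxes)
        self.replicas.append(replica)

    def create(self, name: str, backend: str) -> bool:
        """Approve a create only if the name is unused on every backend."""
        if name in self.mailboxes:
            return False
        self.mailboxes[name] = backend
        for replica in self.replicas:
            replica[name] = backend
        return True

    def delete(self, name: str, backend: str) -> bool:
        """Approve a delete only when requested by the owning backend."""
        if self.mailboxes.get(name) != backend:
            return False
        del self.mailboxes[name]
        for replica in self.replicas:
            replica.pop(name, None)
        return True


master = MupdateMaster()
frontend_a: Dict[str, str] = {}
frontend_b: Dict[str, str] = {}
master.subscribe(frontend_a)
master.subscribe(frontend_b)
print(master.create("user.foo", "backend1.example.org"))  # approved
print(master.create("user.foo", "backend2.example.org"))  # refused: name taken
print(frontend_a == frontend_b == master.mailboxes)
```

Funnelling every change through one server is what keeps the replicas consistent without the frontends having to coordinate with each other.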
30
Cyrus Aggregator - Data Usage
  • User INBOXes and sub folders
  • Users can share their folders
  • Internet mailing lists as public folders
  • Netnews Newsgroups as public folders
  • Public folders for workflow, general
    discussion, etc.
  • Continued bboard paradigm: 30,000 folders
    visible

31
Cyrus IMAP Aggregator - Advantages
  • Horizontal Scalability
  • Adding new capacity to frontend and/or backend is
    easy to do and can be done with no user visible
    downtime
  • Management possible through single IMAP client
    session
  • Wide client interoperability
  • Simple Client configuration
  • Ability to (mostly) transparently move users from
    one backend to another
  • Failures are partitioned

32
Cyrus IMAP Aggregator - Limitations
  • Backends are NOT redundant
  • MUPDATE is a single point of failure
  • Failure only results in error when trying to
    CREATE/DELETE/RENAME or change ACLs on mailboxes

33
Cyrus IMAP Aggregator - Backups
  • Disk partition backup via Kerberized Amanda
    (http://www.amanda.org)
  • Restores are manual
  • 21 day rotation, no archival
  • Backup to disk (no tapes)

34
Cyrus IMAP Aggregator - Other Protocol Support
  • POP3 support for completeness
  • Possibly creates more problems than not (where
    did all my mail go?)
  • NNTP to populate shared folders (newsgroups)
  • NNTP access to mail store (not yet implemented
    at CMU)
  • Cyrus SASL for flexible authentication methods.
    We use GSSAPI, KERBEROS_V4
  • TLS/SSL for clients that only support password
    based authentication

35
Cyrus Aggregator - SMTP AUTH
  • Allow relaying for clients
  • Restricted posting to shared folders
  • Authentication of user sending intra-site email
  • Unfortunately, message authentication information
    is not exposed in any client

36
Cyrus IMAP Aggregator - LMTP
  • LMTP (RFC2033) is used to transport mail directly
    from MTA to back-end mail store
  • Allows back-ends to serve data and not be
    message queue processors
  • LMTP AUTH used to preserve SMTP AUTH information
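
Handing a message from an MTA to a mail store over LMTP can be sketched with Python's standard `smtplib.LMTP` class. The host name, port, and addresses below are placeholders, the delivery function is defined but not called because it needs a live LMTP server, and LMTP AUTH is not modeled.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Build a minimal RFC 5322 message to hand to the mail store."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def deliver_via_lmtp(msg: EmailMessage, recipient: str,
                     host: str = "backend.example.org", port: int = 2003) -> dict:
    """Deliver directly to the store; returns any refused recipients.

    The host and port are hypothetical defaults. Unlike SMTP, the
    receiving side accepts or rejects the message immediately, so the
    backend never has to manage its own mail queue.
    """
    with smtplib.LMTP(host, port) as lmtp:
        return lmtp.send_message(msg, to_addrs=[recipient])


message = build_message("a@example.org", "foo@andrew.cmu.edu", "hello", "test body")
print(message["To"])
```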

37
Cyrus Aggregator - SIEVE
  • Simple server side filtering language (RFC3028)
  • Locally provided simple web interfaces to
    generate scripts for
  • Spam filtering
  • Vacation announcements
  • Unix command line upload tool (sieveshell) for
    those who want to write their own
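
A script generator of the kind a web interface might use can be sketched as a function that emits Sieve text. This is an illustration only, not the CMU tool: the header name `X-Spam-Flag`, the value `YES`, and the folder `INBOX.Spam` are assumed defaults, while `require`, `header :contains`, and `fileinto` are standard Sieve constructs.

```python
def sieve_quote(text: str) -> str:
    """Quote a string for Sieve, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def spam_filter_script(folder: str = "INBOX.Spam",
                       header: str = "X-Spam-Flag",
                       value: str = "YES") -> str:
    """Return a Sieve script that files tagged messages into a folder."""
    return (
        'require ["fileinto"];\n'
        f"if header :contains {sieve_quote(header)} {sieve_quote(value)} {{\n"
        f"    fileinto {sieve_quote(folder)};\n"
        "}\n"
    )


print(spam_filter_script())
```

The generated text could then be uploaded with a tool such as sieveshell, which the slide mentions for users who write their own scripts.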

38
Clients
39
Clients
  • IMAP has many publicly available clients
  • Varying quality
  • Varying feature sets
  • Central computing recommends Mulberry
  • Roaming Profiles via IMSP
  • Many IMAP extensions supported (e.g. ACL)
  • UI not as popular
  • Other supported clients
  • Outlook
  • Entourage
  • PINE

40
Clients - Webmail
  • Use SquirrelMail as a Webmail Client
  • Local Modifications
  • Interaction with WebISO (pubcookie)
    Authentication
  • Kerberos Authentication to Cyrus
  • Local proxy (using imtest) to reduce connection
    load on server
  • Preferences and session information shared via
    AFS (simple, non-ideal)

41
Clients - Mailing Lists
  • dist for personal mailing lists:
    distuser/foo.dl@andrew.cmu.edu
  • Majordomo for Internet-style mailing lists
  • Prototype web interface for accessing bboards
  • Authenticated (for protected bboards):
    http://bboard.andrew.cmu.edu/bb/org.acs.asg.coverage
  • Unauthenticated (for mailing list archives):
    http://asg.web.cmu.edu/bb/archive.info-cyrus

42
Andrew Mail Statistics
  • Approximately 30,000 Users
  • 15,000 Peak Concurrent IMAP Sessions
  • 10 IMAP Connections / Second
  • 850 Peak Concurrent Webmail Sessions
  • Approximately 3.5 Million Emails/week
  • See also: http://graphs.andrew.cmu.edu
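
For a sense of scale, the figures on this slide can be turned into average rates. This is plain arithmetic on the quoted numbers; the averages say nothing about peak load, which the slides do not give for mail volume.

```python
USERS = 30_000
EMAILS_PER_WEEK = 3_500_000
PEAK_IMAP_SESSIONS = 15_000
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800

# Average message rate across the whole system.
avg_messages_per_second = EMAILS_PER_WEEK / SECONDS_PER_WEEK

# Average messages handled per user per day.
avg_messages_per_user_per_day = EMAILS_PER_WEEK / USERS / 7

# Peak concurrent IMAP sessions relative to the user population.
peak_sessions_per_user = PEAK_IMAP_SESSIONS / USERS

print(f"{avg_messages_per_second:.1f} messages/second on average")
print(f"{avg_messages_per_user_per_day:.1f} messages/user/day on average")
print(f"{peak_sessions_per_user:.2f} peak IMAP sessions per user")
```

That works out to roughly 5.8 messages per second and about 17 messages per user per day, with peak IMAP sessions equal to half the user count.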

43
Andrew Hardware
  • 5 frontends
  • 3 Sun Ultra 80s (2x450 MHz UltraSparc II, 2 GB
    memory, internal 10,000 RPM disk)
  • 2 SunFire 280Rs (2x1 GHz UltraSparc III, 4 GB
    memory, internal 10,000 RPM disk)
  • 5 backends
  • 4 Sun 220R (450 MHz UltraSparc II, 2 GB memory,
    JetStor II-LVD RAID5, 8x36 GB 15,000 RPM disks)
  • 1 SunFire 280R (2x1 GHz UltraSparc III, 4 GB
    memory, JetStor III U160 RAID5, 8x73 GB 15,000 RPM
    disks)
  • 1 mupdate
  • Dell 2450 (Pentium III 733 MHz, 1 GB memory,
    PERC3 RAID5, 4x36 GB 10,000 RPM disks)
  • 3 ANDREW.CMU.EDU MX
  • Dell 2650 (Pentium 4 3 GHz, 2 GB memory, PERC3
    RAID1, 2x73 GB 15,000 RPM disks)
  • 3 SMTP.ANDREW.CMU.EDU
  • Dell 2650 (Pentium 4 3 GHz, 2 GB memory, PERC3
    RAID1, 2x73 GB 15,000 RPM disks)
  • 2 CMU.EDU MX
  • Dell 2650 (Pentium 4 3 GHz, 2 GB memory, PERC3
    RAID1, 2x73 GB 15,000 RPM disks)
  • 1 mailing list
  • Dell 2650 (Pentium 4 2.8 GHz, 1 GB memory, PERC3
    RAID1, 2x73 GB 15,000 RPM disks)
  • 3 webmail
  • Dell Optiplex GX 260 small form factor (Pentium 4
    2.4 GHz, 1 GB memory, 80 GB ATA disk)

44
Current Issues
  • Lack of client support for "check new" on IMAP
    folders (even when the client supports NNTP)
  • Large number of visible folders can be
    problematic for clients (e.g. PocketInbox)

45
Potential Future Work
  • Online/Self-Service Restores (e.g. AFS
    OldFiles, delayed EXPUNGE)
  • Geographically distributed IMAP aggregator
  • Virtual Search Folders
  • Fault tolerance
  • Replicate backends
  • Support multiple redundant MUPDATE master servers
  • Multi-Access Messaging Hub
  • One Mail Store, many APIs
  • IMAP, POP, NNTP, HTTP/DAV/RSS, XML/SOAP
  • Web Bulletin Boards / blog interface
  • Remove Shared Folder / Mailing List Distinction
  • SIEVE enhancements
  • Ability to file across different backends

46
Current Software
  • MTA: Sendmail 8.12.10
  • LDAP: OpenLDAP 2.2
  • Cyrus 2.2.3
  • MIMEDefang 2.28
  • SpamAssassin 2.61
  • ClamAV 0.63
  • Squirrelmail 1.4.2 (w/Local Modifications)

47
References
  • Computing Services at Carnegie Mellon:
    http://www.cmu.edu/computing
  • Project Cyrus: http://asg.web.cmu.edu/cyrus
  • Contacts: Rob Siemborski (rjs3@andrew.cmu.edu),
    Walter Wong (wcw@cmu.edu)