Title: Network Quarantine At Cornell University
1 Network Quarantine at Cornell University
- Steve Schuster
- Director, Information Security Office
2 Overview
- Cornell's incident response strategy
- Introduction to Network Quarantine
- Review of Scan at Registration System (SARS)
- Post Mortem (What we did intelligently)
- Future considerations
3 Organizational Structure
- Contact Center
  - Part of Customer Services and Marketing
  - Address end user support
  - Patch support
  - Virus remediation
- Network Operations Center (NOC)
  - Part of Systems and Operations
  - Initial security triage
  - Incident response
  - Blocks
  - Notifications
- IT Security Office
  - Development of operational procedures
  - Technical solutions
  - Backline support
4 Some Security Challenges at Cornell
- A general openness and decentralization lead to a larger number of incidents
- Responding to incidents can be staff intensive
- Unmanaged (student) systems arrive on our network several times each year
- Incident notification is a challenge
- Wide range of end user support needs
5 Responding to Incidents
- Security Office will react and contain campus systems that are compromised or highly vulnerable
- NOC had a mix of tools and manual processes for opening a case, notifying impacted parties and implementing containment
- Security Office often sent NOC containment requests that were tedious to service with the existing tools
- Response to a wide range of security issues put much strain on the Contact Center
- The existing containment mechanism was not fully effective and didn't work in some environments
6 Network Quarantine
- Objectives
  - Provide better end user communication based upon observed incident
  - Articulate self-remediation information and requirements when appropriate
  - Improve cost effectiveness of security support
    - NOC
    - Contact Center
  - More effective system isolation
  - Better incident tracking and remediation for local support providers
  - Quicker/escalated response for critical systems
7 Network Quarantine (Basic Features)
- The right action is taken depending upon type of system
  - Registration 10 space
  - DMZ blocked
  - Critical system notification
- Response for systems identified as critical is escalated to Security Office and appropriate local support provider
- Incidents can be created, modified and closed via web and socket interfaces
  - Latter allows batch and automation
- NQ interacts with Vantive, creating a new case when an incident is opened
- Modifications to an incident trigger e-mail to user and net admin, and updates to Vantive
- Specific incident remediation information provided for end users
- With appropriate credentials, CIT personnel, including Contact Center, and campus system administrators can search for and review incidents
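The incident lifecycle described on this slide (create, modify, close, with each change fanned out to e-mail and the case system) could be sketched roughly as follows. This is only an illustration: `Incident`, `IncidentTracker`, and the notifier callbacks are hypothetical names, not the actual NQ or Vantive interfaces, and the address used below is a documentation placeholder.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Incident:
    """Hypothetical incident record; field names are assumptions."""
    ip_address: str
    incident_type: str
    status: str = "open"
    notes: List[str] = field(default_factory=list)


# A notifier receives the event name and the incident, e.g. to send
# e-mail or to update a case in a ticketing system.
Notifier = Callable[[str, Incident], None]


class IncidentTracker:
    """Fans every incident change out to all registered notifiers."""

    def __init__(self, notifiers: List[Notifier]) -> None:
        self.notifiers = notifiers
        self.incidents: List[Incident] = []

    def _notify(self, event: str, incident: Incident) -> None:
        for notifier in self.notifiers:
            notifier(event, incident)

    def open(self, ip_address: str, incident_type: str) -> Incident:
        incident = Incident(ip_address, incident_type)
        self.incidents.append(incident)
        self._notify("opened", incident)
        return incident

    def modify(self, incident: Incident, note: str) -> None:
        incident.notes.append(note)
        self._notify("modified", incident)

    def close(self, incident: Incident) -> None:
        incident.status = "closed"
        self._notify("closed", incident)


# Usage: a stand-in notifier that just records what it was told.
events: List[str] = []
tracker = IncidentTracker([lambda event, inc: events.append(event)])
case = tracker.open("192.0.2.10", "virus")
tracker.modify(case, "user notified")
tracker.close(case)
print(events)  # ['opened', 'modified', 'closed']
```

Because the tracker is driven by plain method calls, the same sketch could sit behind either a web form or a batch-oriented socket client.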
8 Network Quarantine (Incident Types)
9 Network Quarantine (Incident Types)
10 Network Quarantine (Incident Messages)
11 Network Quarantine (Incident Containment)
12 Network Quarantine (Incident Remediation)
13 Network Quarantine (User's View)
14 Network Quarantine (User's View)
15 Network Quarantine (User's View)
16 Network Quarantine (Specific Features)
- For each new incident
  - New incident type for tracking
  - Establishment of resolution requirements
  - Incident specific message to users
- Users receive much better communication
- Self-release feature
  - Users are able to correct the issue
  - Saves staff time at the Contact Center
- Process automation, better user communication and self-release have saved money
17 Incident Response Costs
- Virus remediation costs/incident
  - Contact Center: average 10 minutes
  - NOC: average 3 minutes
- System compromise costs/incident
  - Contact Center
    - Simple support: 20 minutes
    - Full rebuild: 1-4 hours
  - NOC: average 5 minutes
18 Network Quarantine (Cost Savings)
- Virus remediation costs/incident
  - Contact Center: same, but many self-release
  - NOC: under 1 minute
- System compromise costs/incident
  - Contact Center
    - Simple support: 20 minutes
    - Full rebuild: 1-4 hours
  - NOC: under 1 minute
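To make the NOC figures on the two cost slides concrete, here is a small worked calculation. It treats "under 1 minute" as an upper bound of 1 minute, so the result is a lower bound on time saved; the batch size of 100 incidents is purely hypothetical and does not come from the slides.

```python
# NOC handling time per incident before NQ, in minutes, as listed on the slides.
NOC_MINUTES_BEFORE = {"virus": 3, "compromise": 5}

# After NQ the slides say "under 1 minute"; 1 is used as an upper bound (assumption).
NOC_MINUTES_AFTER_UPPER_BOUND = 1


def min_noc_minutes_saved(incident_type: str, incident_count: int) -> int:
    """Lower bound on NOC minutes saved for a batch of incidents."""
    per_incident = NOC_MINUTES_BEFORE[incident_type] - NOC_MINUTES_AFTER_UPPER_BOUND
    return per_incident * incident_count


# Hypothetical batch of 100 incidents of each type (not a figure from the slides).
print(min_noc_minutes_saved("virus", 100))       # 200
print(min_noc_minutes_saved("compromise", 100))  # 400
```

Contact Center time per handled incident is unchanged on the slides, so any saving there comes from incidents that never reach staff because the user self-releases.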
19 Scan at Registration System (SARS)
- All on-campus student computers were automatically scanned upon registration
- Objectives
  - Drastically reduce the number of infected or compromised student systems coming to campus
  - Promote better security practices
20 Enabling Features of NQ that Supported SARS
- Automation of containment and remediation
- Redirection to Network Quarantine infrastructure
- Articulated steps to support self-remediation
- Incident tracking
21 Scan at Registration System (SARS)
- Requirements for ResNet registration
  - Each computer system must be registered with a valid NetID
  - Each computer must be configured to a minimum set of security standards
    - No open writable fileshares
    - All administrative accounts must have a password
    - Must be patched
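The three minimum security standards can be read as three independent checks on a scan result. The sketch below assumes a hypothetical `ScanResult` record; the slides do not say how the scanner actually reports its findings.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScanResult:
    """Hypothetical summary of one registration scan."""
    open_writable_fileshares: bool
    admin_account_without_password: bool
    missing_patches: bool


def failed_requirements(result: ScanResult) -> List[str]:
    """Return the standards the system fails; an empty list means it passes."""
    failures = []
    if result.open_writable_fileshares:
        failures.append("open writable fileshare")
    if result.admin_account_without_password:
        failures.append("administrative account without a password")
    if result.missing_patches:
        failures.append("missing patches")
    return failures


# Usage: a system that is patched but exposes a writable share.
print(failed_requirements(ScanResult(True, False, False)))
# ['open writable fileshare']
```

Returning the list of failures, rather than a single pass/fail flag, is what would let the quarantine page tell the user which specific problem to fix.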
22 Student Registration Process
- Every on-campus student went through the following process
  - Plug into network and get redirected to ResNet Registration page
  - Authenticate with NetID and fill in necessary information for registration
  - Wait 90 seconds for registration to complete and system check to occur
  - If the system passed all three tests
    - Registration complete
  - Else
    - Redirected to NQ
    - Informed of the problem and provided directions for remediation
    - Rescan upon completion of remediation
    - Repeat
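The scan, remediate, rescan loop above can be sketched as a small function. The `scan` and `remediate` callables stand in for the real scanner and for the user following the quarantine page's directions; both are hypothetical, and the `max_scans` cap is an added assumption (the slide just says "Repeat").

```python
from typing import Callable, List


def register_system(
    scan: Callable[[], List[str]],
    remediate: Callable[[List[str]], None],
    max_scans: int = 5,
) -> bool:
    """Scan until the system is clean or the scan budget is used up.

    Returns True when registration completes, False if the system is
    still failing after max_scans scans.
    """
    for _ in range(max_scans):
        problems = scan()
        if not problems:
            return True  # passed all tests: registration complete
        # Otherwise the user is sent to quarantine, told what is wrong,
        # and the system is rescanned once they report it fixed.
        remediate(problems)
    return False


# Usage: a simulated system whose owner fixes one problem per round.
outstanding = ["open writable fileshare", "missing patches"]


def simulated_scan() -> List[str]:
    return list(outstanding)


def simulated_fix(found: List[str]) -> None:
    outstanding.remove(found[0])


print(register_system(simulated_scan, simulated_fix))  # True
```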
23 Scan at Registration Statistics
- Approximately 6500 systems scanned over move-in weekend
- Of all systems scanned
  - 65% were probably firewalled
  - 35% were not firewalled
  - 25% were clean
  - 10% had at least one of the three problems
- Close to 12% of the systems had at least one problem (780)
- Around 85% of all quarantined students were able to perform self-remediation
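A quick arithmetic check of these figures, reading the slide's bare numbers as percentages (an assumption, though 780 out of roughly 6500 does work out to 12%):

```python
systems_scanned = 6500  # approximate, from the slide
quarantined = 780       # systems with at least one problem, from the slide

share_with_problem = quarantined / systems_scanned
print(f"{share_with_problem:.0%}")  # 12%

# "Around 85" of quarantined students self-remediated, read as 85%.
self_remediated = quarantined * 85 // 100
needed_help = quarantined - self_remediated
print(self_remediated, needed_help)  # 663 117
```

On that reading, roughly 660 students fixed their own systems and a little over 100 needed staff assistance.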
24 Network Quarantine On-Boarding Metrics
25 Post Mortem
- Gaining early support from Contact Center and NOC was an absolute requirement
- Can't underestimate the stress of move-in weekend (the parent effect)
- Trust is important but bail-out features go further
  - If the scanning or quarantine infrastructure failed, registration would continue as before
  - If the Contact Center could not support the demands of quarantined students, all could be released immediately
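The two bail-out features amount to a fail-open design: a scanner failure never blocks registration, and the whole quarantine can be emptied in one step. A minimal sketch, with hypothetical function names and string outcomes chosen only for illustration:

```python
from typing import Callable, List


def registration_outcome(scan: Callable[[], List[str]]) -> str:
    """Decide a system's fate, failing open if the scanner is unavailable."""
    try:
        problems = scan()
    except Exception:
        # Bail-out 1: scanning infrastructure failed, so registration
        # proceeds as it did before scanning existed.
        return "registered"
    return "quarantined" if problems else "registered"


def release_all(quarantined_hosts: List[str]) -> List[str]:
    """Bail-out 2: empty the quarantine list and return what was released."""
    released = list(quarantined_hosts)
    quarantined_hosts.clear()
    return released


# Usage: a scanner outage does not stop registration.
def broken_scan() -> List[str]:
    raise ConnectionError("scanner unreachable")


print(registration_outcome(broken_scan))  # registered
```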
26 Future Considerations
- Should scanning be expanded to other constituents and infrastructures?
- Should we be more aggressive with our scanning?
  - Scan more frequently
  - Deeper analysis
- Should we limit ourselves to network scanning or install end point components?
- Should we establish minimum expectations for all computers connecting to our network?