Title: MS MUG Best Practices
1MS MUG Best Practices
- The Microsoft Manufacturers Users Group
2What is the MS MUG?
- Microsoft Manufacturing User Group
- User group devoted to addressing issues faced when applying Microsoft technology to industrial applications
- Formed in February 1999
- 70 Members
- Users
- Software suppliers
- Microsoft
3Best Practices
- Information to help users implement systems that are
- More reliable
- More cost effective
- Easier to support
- Target audience is non-IT people
- Address issues specific to manufacturing
4Topics in the Best Practices
- Architecture
- Security
- Redundancy and Reliability
- System Monitoring
- Change Management
- Support
- Backup and Recovery
5Architecture
6Network Architecture
- Most Microsoft-based applications in manufacturing are connected to a network
- 10 Mb/s and 100 Mb/s Ethernet is widely available and cost effective
- Many control systems support Ethernet
- Use Ethernet switches rather than hubs to connect devices to the network
- Routers can be used to
- Isolate mission-critical portions of the LAN from the rest of the LAN
- Connect the LAN to a WAN to provide remote support
7Network Media
- Unshielded twisted pair most widely used
- Fiber optic good for electrical noise immunity
8Wireless
- Wireless technology continuing to evolve
- Security is an issue that needs to be addressed
when using wireless
9Hardware
10Software
- When developing software architecture consider
- Where software applications should be installed
- Where the data should be stored
- Where the OPC server should be installed
11Security
12Fundamentals of Global Standards
- Security is a Process not a Project
- What are standards?
- How are standards enforced?
- Uniform Accounts
- Independent Security Administrators
- Process Control Systems are different from Office Systems
13Unified Global Security Standards
- Know what is on your system
- Control the Easy Stuff
- Who has physical access
- Who has connections
- Who gets information about your system
- Have good Passwords
14Unified Global Security Standards
- Accounts
- Make sure they are well identified
- Only create when definitely needed
- Delete when done
- Process Station Accounts should change passwords regularly
- FTP accounts must be limited
15Ports and Services
- Shut down unused Ports and Services
- Review all necessary services
- Turn on monitoring and alerting
- Do not install sample applications on production systems
- Restrict SNMP and Telnet
- Restrict FTP and HTTP
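One way to check the points above is to probe a station for services that should be shut down. This is a minimal sketch, not a complete audit. The names `is_port_open`, `unexpected_services` and `audit` and the `allowed` set are hypothetical. The port numbers are the well-known TCP ports for FTP, Telnet and HTTP. SNMP normally uses UDP port 161, so a TCP connection test like this one cannot detect it.

```python
import socket

# Well-known TCP ports for services the slide says to restrict.
RESTRICTED = {21: "FTP", 23: "Telnet", 80: "HTTP"}


def is_port_open(host, port, timeout=0.5):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def unexpected_services(open_ports, allowed=frozenset()):
    """Map each open port that is not in the allowed set to a service name."""
    return {
        port: RESTRICTED.get(port, "unknown")
        for port in sorted(open_ports)
        if port not in allowed
    }


def audit(host, ports=tuple(RESTRICTED), allowed=frozenset()):
    """Probe the given TCP ports on host and report the ones that should be closed."""
    open_ports = {port for port in ports if is_port_open(host, port)}
    return unexpected_services(open_ports, allowed)
```

Calling `audit("192.0.2.10")` (a documentation-only address used here as a placeholder) would return a dictionary of restricted services that still accept connections. Only probe machines you are authorized to test.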
16Administration and Maintenance
- Learn to audit the systems
- Keep up-to-date with updates
- Restrict non-standard practices
- Control physical access
- Make backups, back up, and back up again
- Limit remote connections
- Avoid Network Services on Process Control Machines
17Administration and Maintenance
- Do NOT rely on Share Protection
- File/Folder ACLs (Access Control Lists) should be used
- Do maintenance on accounts
- Limit who is an administrator
- Service accounts are special
- Control Domain Trusts
18Attack Response
- Remain calm
- Keep an open mind
- Take good notes
- Notify the right people and get help
- Enforce a "need to know" policy
- Use out-of-band communications
- Contain the problem
- Make backups
- Get rid of the problem
- Get back in business
19Redundancy and Reliability
20Software
- Primary/backup ID status
- Redirection of clients in event of primary or backup failure
- Synchronization of archives
- I/O communication switching
- Synchronization of tags
- Failover time
- Cluster awareness
- Load balancing
- Ability to interact with power management systems
- Data replication across networks
21Data Storage
- RAID drives
- Remote archives
22Power
- UPS
- Surge Protector
- Redundant power supplies
- Line conditioners
- Generator
23Hardware
- Fault tolerant computers
- Clusters
- Industrial computers
- Locate hardware where environmental factors are
controlled
24System Monitoring
25Background
- What is System Monitoring?
- A systematic approach to ensure that your computer system runs as well as it did on the first day it was installed
- Allows you to observe the health of a system in real time
- Under-utilized in most manufacturing applications
- Helps catch problems before a disaster occurs and provides a tool for proactive maintenance
- Questions you can ask
- Does anyone ever look at the event viewer?
- Does anyone monitor key system variables?
- Do you know what the normal load on the system is?
- Is the system load staying the same or trending up or down?
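The last question, whether the load is steady or moving up or down, can be answered from logged samples. The sketch below fits a least-squares line to equally spaced readings and classifies the slope. The function names and the `tolerance` value are hypothetical and would need tuning for a real system.

```python
def slope(samples):
    """Least-squares slope of equally spaced samples, in load units per sample."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    numerator = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def load_trend(samples, tolerance=0.5):
    """Classify the trend as 'up', 'down' or 'steady'.

    Slopes whose magnitude is at most `tolerance` count as steady.
    """
    s = slope(samples)
    if s > tolerance:
        return "up"
    if s < -tolerance:
        return "down"
    return "steady"
```

For example, daily CPU readings of 30, 32, 34, 36 and 38 percent have a slope of 2 percentage points per day, which this sketch reports as "up".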
26Objectives
- High Availability
- Ideally 99.999% uptime (i.e. 5 minutes of downtime per year)
- Average is 99.964% uptime (i.e. 3.2 hours of downtime per year)
- Allocation of Outages
- 20% Infrastructure
- 40% Applications
- 40% People
Infrastructure: power, network, cabling, hardware, operating system
Applications: custom and commercial application maintenance
People: operator error, malicious activities, deliberate outage for maintenance
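The downtime figures above follow directly from the availability figures. The sketch below shows the arithmetic, assuming a 365-day year and reading the 20/40/40 outage allocation as percentages. The function names are hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_minutes_per_year(availability_percent):
    """Minutes of downtime per year implied by an availability percentage."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


def allocate(total_minutes, shares_percent):
    """Split total downtime across causes according to percentage shares."""
    return {
        cause: total_minutes * share / 100
        for cause, share in shares_percent.items()
    }


ideal = downtime_minutes_per_year(99.999)    # about 5.3 minutes
average = downtime_minutes_per_year(99.964)  # about 189 minutes (about 3.2 hours)
by_cause = allocate(
    average, {"Infrastructure": 20, "Applications": 40, "People": 40}
)
```

At the average availability this attributes roughly 38 minutes per year to infrastructure and roughly 76 minutes each to applications and people.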
27Service Level Management Process (SLM)
- Build the team
- Define the System
- Measure relevant variables
- Set service level objectives (SLOs)
- Apply Monitoring tools
- Improve through actions
- For more information on SLAs: www.nextslm.org
- Microsoft also has a tech note on SLM/SLA at
- http://www.microsoft.com/technet/prodtechnol/windows2000serv/maintain/opsguide/slmgmtog.asp
281. Build the team
- One process owner (or Monitoring Manager) for each critical component
- Implement automated monitoring of system components where appropriate
- An internal team is necessary even if IT is outsourced
- Define typical duties for Monitoring Managers or Process Owners, e.g.
- Detect management events and raise alerts across multiple shifts
- Execute documented operational procedures for event escalations
- Follow data security procedures
- Adhere to maintenance contracts
- Provide regular feedback on operational performance
- Ensure detection of alerts from all infrastructure components
- Ensure that system resources are in good working order
- Monitor backup, restore, recovery and verification procedures
292. Define the System
- Data flow
- Identify boundaries
- Data providers and data consumers
- Focus on availability and performance
- E.g. PLCs are data providers and desktops are data consumers; the goal is to deliver timely and accurate process data to the desktop user
303. Measure Relevant Variables
- Select metrics that both team and operations agree are meaningful
- Document baseline data on CPU, RAM, hard disk, web traffic and communication, and monitor these systems for increased burden
- Monitor configuration changes (registry, patches, file permissions, access control, user data)
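A documented baseline is only useful if current readings are compared against it. As a rough sketch, the code below stores the mean and standard deviation of each metric and flags readings well above the norm. The metric names, the sample values and the three-sigma limit are illustrative assumptions, not recommendations from the Best Practices document.

```python
from statistics import mean, stdev


def build_baseline(history):
    """Summarize each metric's history as (mean, standard deviation).

    `history` maps a metric name to a list of at least two samples.
    """
    return {name: (mean(values), stdev(values)) for name, values in history.items()}


def increased_burden(baseline, current, n_sigma=3.0):
    """Return the metrics whose current value exceeds mean + n_sigma * stdev."""
    flagged = []
    for name, value in current.items():
        avg, spread = baseline[name]
        if value > avg + n_sigma * spread:
            flagged.append(name)
    return sorted(flagged)


# Hypothetical readings taken while the system was known to be healthy.
history = {
    "cpu_percent": [20, 22, 21, 19, 23],
    "disk_used_percent": [40, 41, 40, 42, 41],
}
baseline = build_baseline(history)
```

With this baseline, a later reading of 60 percent CPU is flagged, while 41 percent disk usage is within the normal range.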
314. Set Service Level Objectives (SLOs)
- Place rules (e.g. thresholds, filters, trends) to avoid false alarms
- E.g. physical memory drops below 20%
- Also include system-wide deliverables, e.g. 10 minutes response time after an emergency maintenance request
- Monitor and reward team performance
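One simple filter that combines a threshold with false-alarm suppression is to alarm only when the threshold is breached for several consecutive samples. The sketch below assumes the memory example means free physical memory as a percentage. The function name and the choice of three consecutive samples are hypothetical.

```python
def first_sustained_breach(samples, threshold, min_consecutive=3):
    """Return the index at which an alarm would first fire, or None.

    The alarm fires once `min_consecutive` samples in a row are below
    `threshold`, so a single brief dip does not raise a false alarm.
    """
    run = 0
    for index, value in enumerate(samples):
        run = run + 1 if value < threshold else 0
        if run >= min_consecutive:
            return index
    return None


# Hypothetical free-memory readings in percent, oldest first.
free_memory = [35, 18, 30, 19, 17, 15, 28]
alarm_index = first_sustained_breach(free_memory, threshold=20)
```

Here the isolated dip to 18 percent is ignored and the alarm fires at the sixth reading, after three readings in a row below 20 percent.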
325. Apply monitoring tools
- Partly this can be achieved by tools (software and hardware) available on the market today
- Consider using agents (software programs), i.e. background tasks that monitor variables
- E.g. network monitoring tool, email agent
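At its core, an agent of the kind described above is a polling loop: read a variable, test it against a rule, notify someone. The sketch below shows that loop in isolation. `run_agent` and its parameters are hypothetical, and a real agent would run as a service or scheduled task and send email or log an event rather than append to a list.

```python
import time


def run_agent(read_value, should_alert, notify, polls, interval_s=0.0):
    """Poll a variable a fixed number of times and notify on each alert.

    read_value:   callable returning the current value of the variable
    should_alert: callable taking a value and returning True to alert
    notify:       callable taking the offending value
    Returns the number of alerts raised.
    """
    alerts = 0
    for _ in range(polls):
        value = read_value()
        if should_alert(value):
            notify(value)
            alerts += 1
        time.sleep(interval_s)
    return alerts


# Usage with stand-in data: three readings, alert when the value is below 20.
readings = iter([50, 15, 40])
raised = []
count = run_agent(lambda: next(readings), lambda v: v < 20, raised.append, polls=3)
```

Passing the reader, the rule and the notifier as separate functions keeps the loop independent of any particular monitoring tool.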
336. Improve through Action
- Continuous improvement is a way of life.
- E.g. as alarms and anomalies are experienced, the control part of System Monitoring and Control becomes important. At this point a change in hardware or procedure is made to prevent the problem from occurring again.
34Change Management
35Change Management Process
36Risk Levels
37Testing and Validation
38Emergency Change Management
39Support
40Major Process Management Disciplines
- Incident Management
- Problem Management
- Change Management
- Release Management
- Configuration Management
41Support Process Model
[Figure: The ITIL Support Process Model. Diagram labels: Management Tools; Service Desk; Incident Management; Problem Management; Change Management; Release Management; Configuration Management; flows labeled Incidents, Problems, Known Errors, Changes, Releases; Change Management Database.]
42Backup And Recovery
- A critical part of any application.
- Should be included in the design and
implementation.
43Types Of Backup
- Image
- Normal
- Incremental
- Differential
- Copy
- Daily
44Image Backup
- Snapshot of entire disk
- No applications can be running
- Restore requires identical hardware
- Fastest to restore
- Norton Ghost is an example
45Normal Backup
- Selected files are copied
- Marks file as backed up
- Best when many files have changed
46Incremental Backup
- Best when fewer files change between backups
- Marks file as backed up
- Restore from the last normal backup and then add
all the incremental backups
47Differential Backup
- Copies only files new or changed since the last normal or incremental backup
- Does not mark files as backed up
- Restore by restoring the last normal backup and then the most recent differential backup
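The restore rules for incremental and differential backups can be stated as a small function. This is a sketch under one simplifying assumption: the schedule after the last normal backup uses either incrementals or differentials, not a mix. The function name and the backup labels are hypothetical.

```python
def restore_plan(backups):
    """Return the backup labels to restore, in order.

    `backups` is a list of (label, kind) tuples, oldest first, where kind
    is 'normal', 'incremental' or 'differential'.
    """
    normal_positions = [i for i, (_, kind) in enumerate(backups) if kind == "normal"]
    if not normal_positions:
        raise ValueError("no normal backup to start from")
    start = normal_positions[-1]
    later = backups[start + 1:]
    incrementals = [label for label, kind in later if kind == "incremental"]
    differentials = [label for label, kind in later if kind == "differential"]
    if incrementals and differentials:
        raise ValueError("mixed schedules are outside the scope of this sketch")
    plan = [backups[start][0]]
    if incrementals:
        plan += incrementals            # every incremental, oldest first
    elif differentials:
        plan.append(differentials[-1])  # only the most recent differential
    return plan
```

The trade-off is visible in the result: incrementals are smaller to take but every one is needed at restore time, while a differential restore needs only two sets.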
48Copy Backup
- Copies all selected files
- No files marked as backed up
49Daily
- Copies all files that changed on the day the backup is performed
- No files marked as backed up
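The file-level backup types differ in two ways: which files they copy, and whether they mark those files as backed up. The sketch below models the "backed up" mark as an archive flag that is set when a file changes. It assumes the Windows Backup convention, in which only normal and incremental backups clear the flag. Image backup is omitted because it works on the whole disk rather than on files. The function name, file names and dates are hypothetical.

```python
from datetime import date


def run_backup(files, kind, today):
    """Return the sorted names of files copied and update archive flags in place.

    `files` maps a file name to {"archive": bool, "modified": date}, where
    archive=True means the file has changed since it was last marked.
    """
    if kind in ("normal", "copy"):
        copied = list(files)
    elif kind in ("incremental", "differential"):
        copied = [name for name, f in files.items() if f["archive"]]
    elif kind == "daily":
        copied = [name for name, f in files.items() if f["modified"] == today]
    else:
        raise ValueError(f"unknown backup type: {kind}")
    if kind in ("normal", "incremental"):
        for name in copied:
            files[name]["archive"] = False  # mark file as backed up
    return sorted(copied)


today = date(2001, 5, 2)
files = {
    "a.cfg": {"archive": True, "modified": date(2001, 5, 1)},
    "b.log": {"archive": True, "modified": today},
}
first = run_backup(files, "normal", today)         # copies both, clears flags
files["b.log"]["archive"] = True                   # b.log changes afterwards
diff = run_backup(files, "differential", today)    # copies b.log, flag stays set
incr = run_backup(files, "incremental", today)     # copies b.log, clears flag
incr_again = run_backup(files, "incremental", today)  # nothing left to copy
```

Because a differential backup leaves the flag set, each later differential copies the same files again plus anything newer, which is why only the most recent one is needed at restore time.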
50Software License Keys
- Software keys are common in automation software and usually cannot be backed up
- Procedures must be obtained from the manufacturer and documented
51Document The Recovery Process
- Disasters can happen when those who created the backup are unavailable
- Written procedures should be available to guide maintenance personnel through the recovery process
52Thanks!
- The following companies contributed to the development of the MS MUG Best Practices
- 3M
- ARC Advisory Group
- Boeing
- Cisco
- Louisiana Center for Manufacturing Sciences
- Microsoft
- Procter & Gamble
- Siemens
- Wonderware
53For More Information
- MS MUG Best Practices document can be downloaded from the MS MUG website
- http://www.omac.org/wgs/MSMUG/msmug_default.htm