System documentation and disaster
When fire broke out in the computer room at the
headquarters of Britain’s Open University in Milton Keynes in 1987 it
destroyed the VAX system used by the faculty to store their research work,
and all of the back-up tapes that were, some would say foolishly, stored in
the same room. The media reported that “years of research work”
had been lost.
Imagine that this happened to your business and
it was your company’s data that was lost. Would it be a problem if
you were to lose the last two months’ records of your financial transactions?
Of course it would. But that’s not going to happen, is it, because
you’ve got adequate, documented systems in place to cater for such
eventualities, haven’t you? Have you?
Computer systems are vital to the running of all
businesses and organisations these days. From the smallest one-person
operation to the largest multi-national corporations, the data held on their
computer systems is vital to the efficient and profitable running of those
organisations. These can be any kind of organisation: educational
institutions, businesses, government departments, charities, hospitals, etc.
Management of those systems, including maintenance of comprehensive systems
documentation and an accurate and up-to-date disaster recovery plan, is one
of the key elements in ensuring that “mission critical” systems continue
to function efficiently regardless of what disasters may occur.
One major risk factor in particular, systems management
documentation, is often neglected because it can be costly to create and
maintain. It’s only when something goes wrong that people begin to
maintain. It’s only when something goes wrong that people begin to
ask questions like:
- “Where are the procedures for doing this?”
- “Why did this happen, weren’t they following the procedure?”
- “How can I change this system without any documentation on what’s in it?”
Systems management documentation covers a multitude of
different areas. Everything about any computer system, and the people
who use it, can and should be documented so that, when things go wrong (and
something always goes wrong), remedial action can be taken to minimise the
consequences.
System management documentation covers:
- system configuration
- system administration procedures
- data dictionaries
- backup procedures
- disaster recovery plan
- help desk knowledge bases
- document management systems
Each of these areas, and the consequences of neglecting it, is described below.
System configuration covers the details of how all of the
systems are set up, both software and hardware.
Software includes operating systems, system tools, software packages and
software you have developed in-house. Hardware includes not only PCs
and servers but also any peripheral device connected to them. If any
of these is normally used with a non-standard configuration, and it fails
and has to be set up again, you will have difficulty doing so unless the
configuration is documented and that documentation is kept up to date.
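One way to keep configuration documentation honest is to hold it in a structured form and compare it with what is actually running. The Python sketch below is illustrative only: the `ConfigRecord` structure, the `find_drift` helper, the host name and the settings are all hypothetical and are not taken from any particular tool.

```python
from dataclasses import dataclass, field


@dataclass
class ConfigRecord:
    """Documented configuration of one system (hypothetical structure)."""
    name: str
    kind: str                      # "hardware" or "software"
    settings: dict = field(default_factory=dict)
    last_reviewed: str = ""        # ISO date the record was last checked


def find_drift(documented, actual):
    """Return settings whose documented and actual values differ.

    Each entry maps a setting name to a (documented, actual) pair;
    None marks a setting that is missing on one side.
    """
    drift = {}
    for key in sorted(set(documented) | set(actual)):
        doc_value = documented.get(key)
        act_value = actual.get(key)
        if doc_value != act_value:
            drift[key] = (doc_value, act_value)
    return drift


record = ConfigRecord(
    name="print-server-01",        # hypothetical host name
    kind="hardware",
    settings={"ip": "10.0.0.5", "duplex": "on"},
    last_reviewed="2024-01-15",
)
# Settings as found on the live machine (made up for this example).
actual_settings = {"ip": "10.0.0.5", "duplex": "off", "tray": "A4"}
print(find_drift(record.settings, actual_settings))
```

Any setting reported here is either a change that was never written down or a record that was never corrected; either way the documentation can no longer be trusted for a rebuild.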
System administration procedures can include details of
how systems run day to day, how things like new user accounts are to be set
up, how occasional situations (e.g. disks becoming full) are dealt with:
anything to do with the day-to-day running of your IT systems. If
these are not documented, or not kept up to date, dealing with problems as
they arise becomes harder, and whenever an intermittent problem turns up it
has to be solved by re-inventing the wheel
again and again. Good system administration documentation saves time
and therefore money. You might say that experienced system managers
and operators know this stuff and don’t need the documentation, but what
if they are off sick, or on holiday, or (god forbid) encounter the underside
of the proverbial bus?
Data dictionaries detail the way in which your databases
and associated applications work: what is in a table, what is in a field,
how is this data used by the system, what do programs do with the data?
Without adequate and up-to-date data dictionary information, the maintenance
and enhancement of those systems takes much longer, and is less likely to be
effective. Again, time costs money.
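As a rough illustration of what a data dictionary entry might hold, and of how gaps in it could be found automatically, here is a minimal Python sketch. The `FieldEntry` layout, the `undocumented_fields` helper and the example schema are assumptions made for this example, not a standard format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldEntry:
    """One data dictionary entry (hypothetical layout)."""
    table: str
    field_name: str
    data_type: str
    description: str
    used_by: tuple = ()   # programs that read or write the field


def undocumented_fields(schema, entries):
    """List (table, field) pairs present in the schema but not documented.

    schema maps each table name to a list of its field names.
    """
    documented = {(e.table, e.field_name) for e in entries}
    return sorted(
        (table, name)
        for table, names in schema.items()
        for name in names
        if (table, name) not in documented
    )


# Hypothetical schema and dictionary for a small accounts database.
schema = {"invoice": ["id", "customer_id", "total"], "customer": ["id", "name"]}
dictionary = [
    FieldEntry("invoice", "id", "integer", "Unique invoice number", ("billing",)),
    FieldEntry("invoice", "total", "decimal", "Invoice total including tax",
               ("billing", "reports")),
    FieldEntry("customer", "id", "integer", "Unique customer number", ("billing",)),
]
print(undocumented_fields(schema, dictionary))
```

Run against the real schema after each release, a check of this kind would show which fields have been added without anyone describing them.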
Backup procedures include schedules of tapes to be used,
where they go, what goes on them, what to do with them when the backup is
finished, how to check that the backup worked, how to restore things from
backup, where tapes are stored (on-site and off), how to get tapes back from
offsite storage. Most IT departments have their backup procedures
documented, but things change and if the documentation is not kept up to
date, problems can arise. IT department staff come and go. What
happens in the middle of the night when a relatively new system operator
is on his or her own, something they haven’t run across before
goes wrong, and there’s no documentation to help? They can’t
do their job and those vital backups don’t get done.
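The "how to check that the backup worked" step is one that can be written down as a script as well as a procedure. The sketch below assumes, purely for illustration, that the backup is a directory copy that can be read back and compared with the source by checksum. Tape-based backups would need the data restored to a scratch area first. The function names and the demo files are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path relative to root onto its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_backup(source, backup):
    """Return a list of problems; an empty list means the copies match."""
    source_files = build_manifest(source)
    backup_files = build_manifest(backup)
    problems = []
    for name, checksum in source_files.items():
        if name not in backup_files:
            problems.append(f"missing: {name}")
        elif backup_files[name] != checksum:
            problems.append(f"changed: {name}")
    return problems


def demo():
    """Build two throwaway directories and compare them."""
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as bak:
        source, backup = Path(src), Path(bak)
        (source / "ledger.csv").write_text("2024-01-01,100\n")
        (source / "notes.txt").write_text("month end\n")
        (backup / "ledger.csv").write_text("2024-01-01,999\n")  # corrupted copy
        return verify_backup(source, backup)


print(demo())
```

A lone operator on the night shift can run a check like this and act on the list it prints, rather than having to work out from scratch how to tell whether the backup succeeded.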
Disaster recovery plans deal with every
eventuality that may befall a computer system, from minor problems to its
complete destruction, and set out how the system is to be
restored. Most large organisations have long since recognised the need
for a disaster recovery plan and have implemented one. Keeping it up
to date is essential. The details of new hardware and software and
their configurations (see above) must be included whenever they come along.
Hopefully a disaster recovery plan will never be needed, but if it is, and
it hasn’t been kept up to date, you may find that you restore your system
as it was three years ago, instead of last week.
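Keeping documents up to date is easier if each one carries a "last reviewed" date that can be checked mechanically. The following Python sketch flags documents that have not been reviewed within a chosen period. The 180-day limit, the document names and the dates are assumptions for the example, not recommendations.

```python
from datetime import date, timedelta


def stale_documents(last_reviewed, today, max_age_days=180):
    """Return the names of documents not reviewed within max_age_days."""
    limit = timedelta(days=max_age_days)
    return sorted(
        name
        for name, reviewed in last_reviewed.items()
        if today - reviewed > limit
    )


# Hypothetical review log: document name -> date of last review.
reviews = {
    "disaster recovery plan": date(2021, 3, 1),
    "backup procedures": date(2024, 2, 10),
    "system configuration": date(2023, 6, 30),
}
print(stale_documents(reviews, today=date(2024, 3, 1)))
```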
Help desk knowledge bases are immensely useful, for those
who have them, for storing information about how problems have been dealt
with in the past and how to deal with them if they arise again.
Keeping these up to date is vital to ensure that mission-critical systems
keep running.
Document management systems are often used as the
repositories for all of the information upon which a business depends to
keep it running. Ensuring that everyone has access to them, recognises
their importance and keeps them up to date (though this needs to be
controlled) is an essential part of ensuring the continued efficient running
of any organisation.
While the hardware and software that an organisation uses
are important, they are useless without the people who use them and the people who
ensure that they run smoothly and keep on running. Implementing, and
most importantly maintaining a continuing commitment to, a quality management system
such as ISO 9001:2000 is one means of ensuring that those people recognise the
need to maintain things like system management documentation and to
continually review and improve it.
The people who run IT departments are always busy.
There are always more immediate and seemingly more important things to do,
like developing new projects. But when those new projects go in, there
is the need for the accompanying documentation to ensure that those systems
run smoothly and keep on running and that if problems occur, something can
be done about them swiftly. People leave organisations. Problems
occur infrequently. Some things are only done once a year and can be
forgotten. Systems change. Making sure that those systems are
always clearly and accurately documented is vital in ensuring that they are
managed properly so that the entire organisation benefits from their
efficient running.