|
Articles / Estimating a data dictionary
In early 2001 we
were called in to a Sydney-based organisation due to some problems they were
having with maintenance of two existing applications. This article describes
the approach that we use for estimating work of this type, with reference to
this specific project.
Both systems
have long and complex histories. One ('System 1' for the purposes of this
case study) started as a snapshot of another system, and those parts of the
client code for that system which were considered unnecessary had been
removed (although most of the database tables were still in the database
structure). The other ('System 2') had been developed from scratch, but was
not yet complete.
Both were
written in Sybase/PowerBuilder and run under NT.
The client
called us in to prepare a data dictionary (and other maintenance
documentation not dealt with in this case study) in order to:
-
reduce the
ongoing costs of maintenance by allowing flexible outsourcing of the
maintenance activity
-
reduce the
risks associated with having only one or two programmers with adequate
knowledge of the systems
Both systems
already had some form of user documentation or online help, and also
operations documentation detailing how to add new users, etc. However, the
only maintenance documentation available was in the standard headers for the
stored procedures for the applications.
Tools
We first looked
at the tools available: CASE tools are an expensive way of solving the
problem, and provide little additional leverage for existing applications.
In our experience, CASE tools are very good if used from the start of a
development project. But even though they have the capacity to automatically
analyse an existing project, the information that they draw from it is
limited: the only information that a CASE tool can get out of a schema is
what's already in the schema. The information that a developer actually
needs ("What is actually stored in this field?", "Does this
table store data, or is it just used as temporary storage during a
transaction?") needs to be drawn from the knowledge of existing
developers.
Word documents
are easy to maintain, but have a couple of problems: they are not designed
for online viewing and searching during development (although this is
obviously possible) and they are difficult to synchronise with database
changes: when a new field is added to the database, the developer has to
remember to update the Word document to reflect that change.
We have
developed a compromise solution for our clients which consists of a Word
document with additional VBA macros. This works in the following way:
-
an SQL file
representing the database schema is generated from the DBMS
-
the Word
file macros analyse this, and populate the Word file with entries that
represent the tables and fields within the tables (not indexes)
-
a writer (or
later, a developer) edits comments in the Word file against each of the
entities (tables, fields)
-
another VBA
macro writes these comments out to an HTML file
-
when there
are changes to the database structure later, an updated SQL file is
analysed by the VBA macros, and any additional fields are added to the
Word file - the existing comments on unchanged fields are retained
automatically
-
an updated
HTML file is generated
We provide the
use of these macros on our projects free of charge.
Content
For each table,
we planned documentation that covers:
For each field
in the table, the documentation covers:
Note that the
normal SQL field definitions (eg data type, whether Null is allowed) are
echoed automatically from the SQL CREATE script.
Estimate
The following
tables show our estimates of the page count and time to write each of the
parts of the documentation.
Our estimate of
the page count (our estimates are all based on notional A4 'pages' of text,
even though the documentation will probably never be printed) was based on
the following assumptions:
|
Item
|
Count
per page
|
Pages
per day
|
|
Table
|
2
|
5
|
|
Field
|
10
|
5
|
We gained item
counts for the work by asking the DBA to generate an SQL CREATE script for
the database, and then using our macros to analyse it. The resulting item
counts for System 1 were as follows:
At that stage,
we had no way of knowing how much of the database was actually used; much of
it had been copied from the original application without modification. For
budget purposes, we assumed that 75% of the material was actually used; a
pessimistic assumption.
On this
assumption, the amount to actually be documented was:
Based on the
above page count assumptions, System 1 database documentation was estimated
to result in the following:
|
Item
|
Pages
(notional A4)
|
Person.
days
|
|
Tables
|
61
|
12
|
|
Fields
|
106
|
21
|
|
Totals
|
167
|
33
|
The item counts
for the System 2 database (again, generated from a CREATE script) were as
follows:
Initially we had
no way of knowing which of these was actually in use in the current
application, so we (pessimistically) assumed that they were all in use.
Based on the
above page count assumptions, the database documentation for the System 2
should have resulted in the following:
|
Item
|
Pages
(notional A4)
|
Person.
days
|
|
Tables
|
57
|
11
|
|
Fields
|
93
|
19
|
|
Totals
|
150
|
30
|
We allowed a
full 5 days for each writer to get up to speed with the application and the
work environment, and a further 10% of project management time. With two
writers, the total project resource requirements were estimated at:
|
Writers
|
63+2*5
= 73 person.days
|
|
Project
manager
|
10% =
7 person.days
|
|
Total
|
80
person.days
|
The project ran
on time and slightly under budget.
|