realise value from processes and systems  
    
     
     
Articles / Estimating a data dictionary

In early 2001 we were called in to a Sydney-based organisation due to some problems they were having with maintenance of two existing applications. This article describes the approach that we use for estimating work of this type, with reference to this specific project.

Both systems have long and complex histories. One ('System 1' for the purposes of this case study) started as a snapshot of another system, and those parts of the client code for that system which were considered unnecessary had been removed (although most of the database tables were still in the database structure). The other ('System 2') had been developed from scratch, but was not yet complete.

Both were written in Sybase/PowerBuilder and run under NT.

The client called us in to prepare a data dictionary (and other maintenance documentation not dealt with in this case study) in order to:

  • reduce the ongoing costs of maintenance by allowing flexible outsourcing of the maintenance activity

  • reduce the risks associated with having only one or two programmers with adequate knowledge of the systems

Both systems already had some form of user documentation or online help, and also operations documentation detailing how to add new users, etc. However, the only maintenance documentation available was in the standard headers for the stored procedures for the applications.

Tools

We first looked at the tools available: CASE tools are an expensive way of solving the problem, and provide little additional leverage for existing applications. In our experience, CASE tools are very good if used from the start of a development project. But even though they have the capacity to automatically analyse an existing project, the information that they draw from it is limited: the only information that a CASE tool can get out of a schema is what's already in the schema. The information that a developer actually needs ("What is actually stored in this field?", "Does this table store data, or is it just used as temporary storage during a transaction?") needs to be drawn from the knowledge of existing developers.

Word documents are easy to maintain, but have a couple of problems: they are not designed for online viewing and searching during development (although this is obviously possible) and they are difficult to synchronise with database changes: when a new field is added to the database, the developer has to remember to update the Word document to reflect that change.

We have developed a compromise solution for our clients which consists of a Word document with additional VBA macros. This works in the following way:

  • an SQL file representing the database schema is generated from the DBMS

  • the Word file macros analyse this, and populate the Word file with entries that represent the tables and fields within the tables (not indexes)

  • a writer (or later, a developer) edits comments in the Word file against each of the entities (tables, fields)

  • another VBA macro writes these comments out to an HTML file

  • when there are changes to the database structure later, an updated SQL file is analysed by the VBA macros, and any additional fields are added to the Word file - the existing comments on unchanged fields are retained automatically

  • an updated HTML file is generated

We provide the use of these macros on our projects free of charge.

Content

For each table, we planned documentation that covers:

  • whether the table is used in the present implementation and if so:

  • what entity the table represents

  • (broadly) what parts of the system it is used in

For each field in the table, the documentation covers:

  • whether the field is used in the present implementation and if so:

  • what the data in the field represents in business terms

  • whether there are any restrictions to the values that can be stored in the field (eg codes that represent particular values)

Note that the normal SQL field definitions (eg data type, whether Null is allowed) are echoed automatically from the SQL CREATE script.

Estimate

The following tables show our estimates of the page count and time to write each of the parts of the documentation.

Our estimate of the page count (our estimates are all based on notional A4 'pages' of text, even though the documentation will probably never be printed) was based on the following assumptions:

Item

Count per page

Pages per day

Table

2

5

Field

10

5

We gained item counts for the work by asking the DBA to generate an SQL CREATE script for the database, and then using our macros to analyse it. The resulting item counts for System 1 were as follows:

  • Tables: 163

  • Fields: 1413

At that stage, we had no way of knowing how much of the database was actually used; much of it had been copied from the original application without modification. For budget purposes, we assumed that 75% of the material was actually used; a pessimistic assumption.

On this assumption, the amount to actually be documented was:

  • Tables: 122

  • Fields: 1060

Based on the above page count assumptions, System 1 database documentation was estimated to result in the following:

Item

Pages (notional A4)

Person. days

Tables

61

12

Fields

106

21

Totals

167

33

The item counts for the System 2 database (again, generated from a CREATE script) were as follows:

  • Tables: 113

  • Fields: 930

Initially we had no way of knowing which of these was actually in use in the current application, so we (pessimistically) assumed that they were all in use.

Based on the above page count assumptions, the database documentation for the System 2 should have resulted in the following:

Item

Pages (notional A4)

Person. days

Tables

57

11

Fields

93

19

Totals

150

30

We allowed a full 5 days for each writer to get up to speed with the application and the work environment, and a further 10% of project management time. With two writers, the total project resource requirements were estimated at:

Writers

63+2*5 = 73 person.days

Project manager

10% = 7 person.days

Total

80 person.days

The project ran on time and slightly under budget.

 

Popular articles