Estimating Project Size

Much of this page is adapted from chapter 5 of Watts Humphrey's book "A Discipline for Software Engineering". It is from him that I have borrowed the idea that size is measured through the use of proxies. The general idea is that, in order to estimate the size of a project, one should have a more easily constructed representative (a proxy) for it. If one knows the size of the proxy and has good historical data on the relationship between the size of the proxy and the size of the eventual project, one can then estimate the size of the current project.
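As a minimal sketch of the proxy idea, the following Python fragment derives a LOC-per-proxy-unit ratio from past projects and applies it to a new proxy size. The history figures and the function names are invented for illustration, and a plain ratio is only the simplest possible relationship, not necessarily the one Humphrey uses.

```python
# Hypothetical history: (proxy size, actual size in LOC) for past projects.
history = [(50, 400), (80, 700), (120, 900)]

def loc_per_proxy_unit(past_projects):
    """Overall ratio of actual LOC to proxy size across past projects."""
    total_proxy = sum(proxy for proxy, _ in past_projects)
    total_actual = sum(actual for _, actual in past_projects)
    return total_actual / total_proxy

def estimate_loc(proxy_size, past_projects):
    """Scale a new proxy size by the historical ratio."""
    return proxy_size * loc_per_proxy_unit(past_projects)

ratio = loc_per_proxy_unit(history)
print(ratio)                      # 8.0
print(estimate_loc(60, history))  # 480.0
```

The estimate is only as good as the history behind it, which is why consistent measurement of past projects matters.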

Before looking at the process in more detail one should be aware of its shortcomings. First, it provides little guidance when the project is of a completely new type. Second, it must be corrected for the tendency of a given individual to either overestimate or underestimate. Lastly, the process itself takes time, so convincing management that this time is needed may be a problem.

Size Measures and Project Length

The usual measure of program size is lines of source code (LOC), or more usually KLOC, for one thousand lines of source code. It is important to be consistent in how one measures LOC. Ideally the counting should be done automatically. For this purpose one does well to adopt a set of standards for laying out source code, including documentation.

The actual number of LOC is dependent on the language that is used. The following table gives typical figures for the number of LOC required to code the same process in different languages. The measure used is lines of code per function point (FP).

Language     LOC per FP

Assembler               320
C                       150
COBOL                   106
FORTRAN                 106
Pascal                   91
PL/I                     80
Ada                      71
Prolog                   64
APL                      32
Smalltalk                21
Spreadsheet Languages     6
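The table can be used to convert a function point count into an approximate LOC figure for a chosen language. The sketch below copies the figures from the table; the function name and the 100 FP project are assumptions made for illustration.

```python
# LOC per function point, copied from the table above.
LOC_PER_FP = {
    "Assembler": 320,
    "C": 150,
    "COBOL": 106,
    "FORTRAN": 106,
    "Pascal": 91,
    "PL/I": 80,
    "Ada": 71,
    "Prolog": 64,
    "APL": 32,
    "Smalltalk": 21,
    "Spreadsheet Languages": 6,
}

def fp_to_loc(function_points, language):
    """Approximate LOC needed to implement the given FP count in a language."""
    return function_points * LOC_PER_FP[language]

# A hypothetical 100 FP project in two languages.
print(fp_to_loc(100, "C"))          # 15000
print(fp_to_loc(100, "Smalltalk"))  # 2100
```

The same 100 FP project comes out roughly seven times larger in C than in Smalltalk, which is why LOC figures should only be compared within one language.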

Function Points as Proxies

Function points are extensively used as proxies for code size. The idea is due to Albrecht and is based upon the analysis of a large number of actual commercial applications, most of which use screens for data entry and display. Function points are counted in five categories:

Input Items (Inp)
Usually an input screen for adding or changing data.
Output Items (Out)
A display screen.
Inquiries (Inq)
Screens for interrogating an application.
Data Files (Maf)
The files that the system accesses, modifies or updates.
Interfaces (Inf)
Files shared with other applications, including databases and parameter lists.

These categories can be further weighted according to their complexity. An estimate of the size of the project (in FP) is given by

FP = 4 * Inp + 5 * Out + 4 * Inq + 10 * Maf + 7 * Inf
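The formula is a weighted sum and is easy to compute once the items have been counted. In the sketch below the weights come from the formula above, while the function name and the item counts are invented for illustration.

```python
# Weights taken from the formula above.
FP_WEIGHTS = {"Inp": 4, "Out": 5, "Inq": 4, "Maf": 10, "Inf": 7}

def function_points(counts):
    """Weighted sum of item counts; categories not listed count as zero."""
    return sum(weight * counts.get(name, 0)
               for name, weight in FP_WEIGHTS.items())

# Hypothetical application: 10 input screens, 6 output screens,
# 4 inquiries, 3 data files and 2 interfaces.
counts = {"Inp": 10, "Out": 6, "Inq": 4, "Maf": 3, "Inf": 2}
print(function_points(counts))  # 40 + 30 + 16 + 30 + 14 = 130
```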

Objects as Proxies

An alternative proxy is object lines of code (OLC). The idea here is that there is a relationship between the size of the code when written as messages to objects and the final code size. The link between the two is made by estimating the size of the objects. Humphrey shows in his book a good regression line between OLC and LOC.

The size of an object can be estimated from the number of methods required to implement it, together with an estimate of the category and relative size of the object. The following table, taken from Humphrey's book, shows a collection of estimates of object size in LOC per method. Your mileage may vary!

C++ Object Size in LOC per Method

Category     Very                           Very
             Small  Small  Medium   Large   Large
Calculation  2.34   5.13    11.25   24.66   54.04
Data         2.60   4.79     8.84   16.31   30.09
I/O          9.01  12.06    16.15   21.62   30.09
Logic        7.55  10.98    15.98   23.25   33.83
Set-Up       3.88   5.04     6.56    8.53   11.09
Text         3.75   8.00    17.07   36.41   77.66

Note that this process requires that you design the system before you estimate its size. (You cannot estimate how much a house will cost until you have designed it in broad outline!)
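Once a design lists the objects with a category, a relative size and a method count for each, the table gives a size estimate by simple multiplication. The sketch below copies the figures as printed in the table; the function name and the three-object design are invented for illustration.

```python
# LOC per method, copied as printed in the table above.
# Tuple order: Very Small, Small, Medium, Large, Very Large.
SIZES = ("Very Small", "Small", "Medium", "Large", "Very Large")
LOC_PER_METHOD = {
    "Calculation": (2.34, 5.13, 11.25, 24.66, 54.04),
    "Data":        (2.60, 4.79, 8.84, 16.31, 30.09),
    "I/O":         (9.01, 12.06, 16.15, 21.62, 30.09),
    "Logic":       (7.55, 10.98, 15.98, 23.25, 33.83),
    "Set-Up":      (3.88, 5.04, 6.56, 8.53, 11.09),
    "Text":        (3.75, 8.00, 17.07, 36.41, 77.66),
}

def object_loc(category, size, methods):
    """Estimated LOC for one object: method count times LOC per method."""
    return methods * LOC_PER_METHOD[category][SIZES.index(size)]

# Hypothetical design: (category, relative size, number of methods).
design = [
    ("Calculation", "Medium", 4),
    ("I/O", "Small", 3),
    ("Data", "Large", 5),
]
total = sum(object_loc(*obj) for obj in design)
print(round(total, 2))
```

The figures are for C++ and for Humphrey's own data, so the table is best replaced with one built from your own completed projects.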

The process can be considerably improved by using statistical methods to track the accuracy of your estimates.
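One simple way to track accuracy is to fit a least-squares line to past pairs of estimated and actual size, and then use that line to correct a new raw estimate. The sketch below is an illustration of that idea under stated assumptions, not a reproduction of Humphrey's procedure; the function names and the history figures are invented.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def corrected_estimate(raw_estimate, slope, intercept):
    """Adjust a raw estimate using the fitted historical relationship."""
    return intercept + slope * raw_estimate

# Hypothetical history: this estimator tends to underestimate.
estimated = [100, 200, 300, 400]   # estimated LOC for past projects
actual = [130, 250, 370, 490]      # actual LOC for the same projects

slope, intercept = fit_line(estimated, actual)
print(round(slope, 3), round(intercept, 3))
print(round(corrected_estimate(250, slope, intercept), 1))
```

A slope above 1 or a positive intercept indicates a habit of underestimating. With only a handful of past projects the fitted line is unreliable, so it should be refitted as the history grows.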

Jonathan Hodgson
Last Change: 2002 February 10th