Coding style
Bsoft is written to facilitate the rapid development of image and
molecular processing applications. The coding style is kept simple and
designed to avoid ambiguities. There is a certain formalization and
discipline required to code this way, as laid out in the following
guidelines:
- Modularity:
- Keep globals to a minimum: The main code in Bsoft has only two
package-wide globals, "verbose" and "memory", meant mostly for
reporting purposes.
- Separate program front ends from functionality: In Bsoft, each
program is viewed as a user interface that processes options and
drives processing. The Bsoft library can therefore be used in the
scope of any program.
- Encapsulate units of functionality in actual functions: In Bsoft I
attempt to write each function to do only one specific task.
- Every source file in the library has its own header file (don't
merge header files!), e.g., utilities.c and utilities.h.
- Separate I/O from processing: All the functions reading and writing
files and dealing with specific formats feed into a small number of
interface functions:
- Images: read_img and write_img
- Molecules: read_molecules and write_molecules
- Parameters: read_project and write_project
- Generality:
- Bsoft handles only three forms of information, each with associated
objects (structures) in the code encapsulating all the relevant data:
- Images:
- Molecules:
- Parameters:
- Every function is written to deal with all incarnations of the data
form it processes. For images, this means that every function needs to
address all data types. For molecules, both atomic coordinate and
sequence data are encoded in the same structural hierarchy, and each
function needs to take this into account.
- A typical function should be written to provide a general
solution to a problem posed, rather than just returning a specific
result.
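As an illustration of a function that addresses all data types, here is
a minimal sketch in C. The names DataType, Image, image_get and
image_average are hypothetical and are not taken from the Bsoft
library; the sketch only shows one way a single function can serve
every stored type.

```c
#include <stddef.h>

/* Hypothetical data type tags; the actual Bsoft set may differ. */
typedef enum { UChar, Short, Int, Float } DataType;

/* Hypothetical, stripped-down image structure. */
typedef struct {
    DataType datatype;   /* type of the elements in data */
    size_t   n;          /* number of elements */
    void    *data;       /* raw data block */
} Image;

/* Return element i as a double, whatever the stored type. */
static double image_get(const Image *p, size_t i)
{
    switch (p->datatype) {
        case UChar: return ((const unsigned char *) p->data)[i];
        case Short: return ((const short *) p->data)[i];
        case Int:   return ((const int *) p->data)[i];
        case Float: return ((const float *) p->data)[i];
    }
    return 0;
}

/* Calculate the average of an image of any supported data type.
   Returns 0 on success or an error code less than zero. */
int image_average(const Image *p, double *avg)
{
    if (!p || !p->data || p->n == 0 || !avg) return -1;

    double sum = 0;
    for (size_t i = 0; i < p->n; i++) sum += image_get(p, i);

    *avg = sum / (double) p->n;
    return 0;
}
```

The type dispatch is confined to one small accessor, so the processing
function itself stays general.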
- Command line option handling:
- Old model: The original Bsoft option handling was managed in typical
Unix fashion as single-letter tags followed in some cases by an option
value or argument (using the getopt function).
- New model: The use of single-letter tags proved too restrictive as
Bsoft grew, and a new mechanism was introduced allowing the user to
use truncated versions of long option tags, provided they are
unambiguous. This is largely compatible with the old-style options as
long as a space is used between the option tag and value.
- New model and the usage block: This model uses the "usage" block of
strings to determine option mappings, making the design of the usage
strings important, as set out in the following rules:
- Any line starting with '-' is assumed to indicate an option
description.
- The option tag can be at most 15 characters long.
- The option tag must be separated from the example value by
whitespace.
- The presence of an example value indicates that the option requires
a value.
- New model mechanism: The command line argument list is parsed for
options indicated by '-' as the first character. An argument deemed an
option tag is scanned against the usage block to find the full tag and
determine whether it takes a value. The option tag-value pairs are
stored in a linked list and returned. These tag-value pairs are then
evaluated to set command-line parameters.
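The new option model described above can be sketched roughly as
follows. This is not the Bsoft implementation: the usage block, the
Option structure, the function names and the error codes are all
hypothetical. The sketch also assumes that an example value follows the
tag after exactly one whitespace character, while the description is
set off by more whitespace, and that option parsing stops at the first
argument that does not start with '-'.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical tag-value pair stored in a linked list. */
typedef struct Option {
    char tag[16];            /* full tag, at most 15 characters */
    char value[64];          /* empty if the option takes no value */
    struct Option *next;
} Option;

/* Hypothetical usage block: lines starting with '-' describe options. */
static const char *usage[] = {
    "Usage: example [options] input.img output.img",
    "-verbose 7               Verbosity of output.",
    "-sampling 1.5            Sampling of the image.",
    "-select 1                Select an image.",
    "-invert                  Invert density.",
    NULL
};

/* Find the usage tag matching a possibly truncated tag.
   Returns 0 on success, -1 if nothing matches, -2 if ambiguous. */
static int match_tag(const char *arg, char *fulltag, int *takes_value)
{
    size_t len = strlen(arg);
    int matches = 0;
    for (int i = 0; usage[i]; i++) {
        const char *line = usage[i];
        if (line[0] != '-') continue;
        size_t taglen = strcspn(line + 1, " \t");
        if (taglen > 15 || len > taglen || strncmp(arg, line + 1, len) != 0)
            continue;
        const char *rest = line + 1 + taglen;
        memcpy(fulltag, line + 1, taglen);
        fulltag[taglen] = '\0';
        /* An example value directly after the tag means a value is required */
        *takes_value = isspace((unsigned char) rest[0]) && rest[1] != '\0'
                       && !isspace((unsigned char) rest[1]);
        if (len == taglen) return 0;      /* an exact match always wins */
        matches++;
    }
    if (matches == 0) return -1;
    return (matches == 1) ? 0 : -2;
}

/* Parse leading options into a linked list of tag-value pairs.
   Returns the index of the first non-option argument, or a code < 0:
   -1 unknown tag, -2 ambiguous tag, -3 missing value, -4 out of memory. */
int parse_options(int argc, char **argv, Option **list)
{
    Option *tail = NULL;
    int i;
    *list = NULL;
    for (i = 1; i < argc && argv[i][0] == '-'; i++) {
        Option *opt = calloc(1, sizeof(Option));
        if (!opt) return -4;
        int takes_value = 0;
        int err = match_tag(argv[i] + 1, opt->tag, &takes_value);
        if (err == 0 && takes_value) {
            if (i + 1 < argc)
                snprintf(opt->value, sizeof opt->value, "%s", argv[++i]);
            else
                err = -3;
        }
        if (err < 0) { free(opt); return err; }
        if (tail) tail->next = opt; else *list = opt;
        tail = opt;
    }
    return i;
}

/* Release a list returned by parse_options. */
void free_options(Option *list)
{
    while (list) {
        Option *next = list->next;
        free(list);
        list = next;
    }
}
```

With this usage block, "-verb 7" resolves to the full tag "verbose"
with value "7", while "-s 1" is rejected as ambiguous because it
matches both "sampling" and "select".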
- Error handling:
- Function return values: Functions in the Bsoft package return three
types of values, each of which can be used to indicate an error:
- An integer used as error code: Error codes are always less than
zero.
- A calculated value: An error may be indicated as an implausible
value for the return variable.
- A pointer to a structure: A NULL return value indicates an error.
- Handling: To make an error condition as useful as possible, the
point of failure in each function in the calling hierarchy should be
identified by propagating the error condition back to the top level.
This means that an error should not let the program exit in a
low-level function.
- Warnings: A warning is required to indicate an unexpected condition,
or a corrective action that may run counter to what the user expects.
In most cases the condition is non-fatal.
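The three kinds of return value and the propagation of errors to the
top level might look like the following sketch. The Series structure
and all function names are hypothetical and only serve to illustrate
the convention; NaN is used here as one possible implausible value.

```c
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical structure used only for this illustration. */
typedef struct {
    size_t n;
    float *data;
} Series;

/* Pointer return: NULL indicates an error. */
Series *series_create(size_t n)
{
    if (n == 0) {
        fprintf(stderr, "Error in series_create: zero length requested\n");
        return NULL;
    }
    Series *s = malloc(sizeof *s);
    if (!s) return NULL;
    s->data = calloc(n, sizeof *s->data);
    if (!s->data) { free(s); return NULL; }
    s->n = n;
    return s;
}

/* Calculated value: an implausible result (NaN) indicates an error. */
double series_average(const Series *s)
{
    if (!s || s->n == 0) return NAN;
    double sum = 0;
    for (size_t i = 0; i < s->n; i++) sum += s->data[i];
    return sum / (double) s->n;
}

/* Integer error code: less than zero on failure. */
int series_center(Series *s)
{
    double avg = series_average(s);
    if (isnan(avg)) {
        fprintf(stderr, "Error in series_center: average not defined\n");
        return -1;
    }
    for (size_t i = 0; i < s->n; i++) s->data[i] -= (float) avg;
    return 0;
}

/* Higher-level function: reports its own point of failure and passes
   the error up instead of exiting. */
int process_series(size_t n)
{
    Series *s = series_create(n);
    if (!s) {
        fprintf(stderr, "Error in process_series: series not created\n");
        return -1;
    }
    int err = series_center(s);
    if (err < 0)
        fprintf(stderr, "Error in process_series: centering failed\n");
    free(s->data);
    free(s);
    return err;
}
```

Each level adds its own message, so the output traces the failure
through the calling hierarchy, and only the top level decides whether
the program exits.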
- Image processing model:
- An image is read as a whole, processed, and the output written as a
whole. This ensures modularity in the code, avoiding mixing I/O and
processing issues. Due to the possibly prohibitive size of a
multi-image file, a facility has been provided to access individual
images from a multi-image file.
- Functions may process image data in place (i.e., replacing the old
data) to limit memory requirements, or generate new image structures,
depending on the requirements of the algorithm.
- Documentation: Each function should be preceded by a block of
comments written according to a specific syntax allowing automatic
extraction:
- The comment block must precede the function and start with "/**"
and end with "**/", each on its own line.
- All keywords within the block must start with "@" as the first
character on a line.
- The first keyword must be "Function:", followed by a space and the
function name.
- Keywords (in order):
- Function:
- Author:
- Description:
- Algorithm:
- Arguments:
- Returns:
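A comment block following the documentation rules above might look like
this sketch. The function clamp_value is hypothetical, the author field
is a placeholder, and the layout of the text under each keyword is an
assumption; the rules only fix the delimiters, the "@" prefix and the
keyword order.

```c
/**
@Function: clamp_value
@Author: (author name)
@Description:
    Restricts a value to a given range.
@Algorithm:
    A value below the minimum is set to the minimum and a value above
    the maximum is set to the maximum.
@Arguments:
    double value    value to restrict.
    double min      lower limit.
    double max      upper limit.
@Returns:
    double          the restricted value.
**/
double clamp_value(double value, double min, double max)
{
    if (value < min) return min;
    if (value > max) return max;
    return value;
}
```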
Image file formats
The variety of image formats, and the even greater variety of programs
producing files in these formats, causes problems when programmers do
not adhere to a complete and up-to-date specification of a format and
take shortcuts to avoid dealing with all the issues a file format
involves. This generates problems such as poor data type support,
omission of statistical information, and even garbage in some fields,
which makes well-behaved programs crash. Here are some of the policies
in Bsoft dealing with such sloppiness in image format handling:
- Bsoft implements the notion of access to all images, regardless of
format. The notion of an image format converter as a standalone
functionality is therefore considered outdated.
- The Bsoft policy is to adhere as closely as possible to the file
format specification. The priority is therefore to follow published
specifications, and then to try to deal with the I/O of other
packages. In the case of TIFF files, Bsoft provides for many data
types (including short and float) described in the version 6
specification.
- Bsoft tries to clean up image header problems as best it can, and
often such problems experienced with other programs can be resolved by
passing the image through a Bsoft program such as "bimg".
- Endianness is handled on reading images based on the byte order
found in particular header fields. When writing images, the native
byte order of the processor is imposed.
- The data type is preserved as far as possible, changed only on user
request (option) or when the receiving file format does not support
the data type.
- Due to numerous problems encountered with reading date and time
fields in image files, Bsoft programs now only write the date and time
into these fields and do not read them.
- Labels and titles in image headers may contain garbage with control
characters detrimental to program execution. Bsoft programs write
their own strings into these fields.
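The endianness policy above (detect the byte order from header fields
on reading, write in native order) can be illustrated with a small
sketch. The Header structure, the plausibility limit and the function
names are hypothetical; real formats differ in which fields can be
tested and what range is plausible.

```c
#include <stdint.h>

/* Hypothetical image header with three dimensions and a data mode. */
typedef struct {
    uint32_t nx, ny, nz, mode;
} Header;

/* Hypothetical upper limit for a plausible image dimension. */
#define MAX_PLAUSIBLE_SIZE 65535u

/* Reverse the byte order of a 32-bit value. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u)
         | ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Test whether all dimensions lie in a plausible range. */
static int header_plausible(const Header *h)
{
    return h->nx >= 1 && h->nx <= MAX_PLAUSIBLE_SIZE
        && h->ny >= 1 && h->ny <= MAX_PLAUSIBLE_SIZE
        && h->nz >= 1 && h->nz <= MAX_PLAUSIBLE_SIZE;
}

/* Reverse the byte order of every header field. */
static void header_swap(Header *h)
{
    h->nx = swap32(h->nx);
    h->ny = swap32(h->ny);
    h->nz = swap32(h->nz);
    h->mode = swap32(h->mode);
}

/* Convert a header just read from file to native byte order.
   Returns 0 if it was already native, 1 if it was swapped (the image
   data then needs swapping too), or -1 if neither byte order gives
   plausible dimensions. */
int header_fix_byte_order(Header *h)
{
    if (header_plausible(h)) return 0;
    header_swap(h);
    if (header_plausible(h)) return 1;
    header_swap(h);              /* restore the original content */
    return -1;
}
```

Once the header and data are in native order in memory, writing them
out unchanged imposes the native byte order of the processor on the
output file.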