Image and molecular processing requires and produces large amounts of informational and parametric data. The traditional way to deal with this in software packages is to define one or more package-specific parameter file formats. Often, these formats have a strict notion of the placement and order of informational elements, making it difficult or even impossible to ensure backward compatibility as the formats evolve. The alternative is a flexible, tag-based format that is easy to maintain and extend. For Bsoft, I adopted the STAR (Self-defining Text Archiving and Retrieval) format (Hall, 1991). Another option is XML, although as text files, XML files are less readable and take much more space than the equivalent STAR files.
The STAR format is a plain text format (i.e., human-readable) that defines tag-value pairs with few rules on the composition of the file:
The file name must end in one of the extensions: ".star" or ".cif" (case insensitive) (CIF: Crystallographic Information File)
Each file must have one or more data blocks. The start of a data block is defined by the keyword "data_" followed by an optional string for identification (e.g., "data_micrograph_145"). For image processing, each set of parameters for a micrograph is mapped to a data block. Precede the keyword by an empty line to avoid problems in parsing.
Multiple values associated with one or more tags in a data block can be arranged in a table using the keyword "loop_" followed by the list of tags and columns of values. The values are delimited by whitespace (i.e., blanks, tabs, end-of-lines and carriage returns). The loop must be followed by an empty line to indicate its end.
Tag names always start with an underscore ("_"). Each tag name may only be used once within each data block.
Data items or values can be numeric or strings of characters. A string is interpreted as a single item when it doesn't contain spaces, when it follows a tag not in a loop, when it is enclosed in quotes (single or double), or when it is enclosed between two lines that each start with a semicolon. In Bsoft, all values are read as strings. Conversions to numeric values have to be made by the specific program accessing the data.
Comments are strings which can occur in three places:
File comments: All text before the first "data_" keyword
Data block comments: Strings on their own lines starting with "#" or with ";" as the first character in the line. The ";" character is used for multiple lines and must be followed by a closing ";" as the first character of a new line.
Item comments: Strings on the same line as and following tag-value items, also indicated by a leading "#".
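The rules above can be illustrated with a small reader. The following is a minimal Python sketch, not Bsoft's own parser: the names SAMPLE, tokenize and parse_star are hypothetical, as are the tags in the sample file. It assumes a simplified subset of the format: one loop row per line, an empty line ending each loop, and no semicolon-delimited multi-line text. It also does not distinguish a quoted value that happens to start with an underscore from a tag.

```python
import re

# Hypothetical sample file; the tag names are made up for illustration.
SAMPLE = """\
# File comment: everything before the first data_ keyword

data_micrograph_145

_micrograph.id            145
_micrograph.file_name     "mg 145.mrc"   # item comment

loop_
_particle.id
_particle.x
_particle.y
1  102.5  88.0
2  340.0  215.5

"""

# One token is a single-quoted string, a double-quoted string,
# a "#" comment running to the end of the line, or a bare word.
TOKEN = re.compile(r"""'([^']*)'|"([^"]*)"|(#.*)|(\S+)""")


def tokenize(line):
    """Split one line into values, dropping quotes and any trailing comment."""
    tokens = []
    for match in TOKEN.finditer(line):
        if match.lastindex == 3:      # comment: ignore the rest of the line
            break
        tokens.append(match.group(match.lastindex))
    return tokens


def parse_star(text):
    """Return {block name: {tag: string, or list of strings for looped tags}}."""
    blocks = {}
    items = None       # tags of the current data block
    loop_tags = None   # tags of the loop being read, or None outside a loop
    for line in text.splitlines():
        if not line.strip():
            loop_tags = None          # an empty line ends a loop
            continue
        tokens = tokenize(line)
        if not tokens:
            continue                  # comment-only line
        head = tokens[0]
        if head.startswith("data_"):
            items = blocks.setdefault(head[len("data_"):], {})
            loop_tags = None
        elif items is None:
            continue                  # file comment before the first data block
        elif head == "loop_":
            loop_tags = []
        elif head.startswith("_"):
            if head in items:
                raise ValueError(f"tag {head} used twice in one data block")
            if loop_tags is not None and len(tokens) == 1:
                loop_tags.append(head)             # tag in a loop header
                items[head] = []
            else:
                items[head] = " ".join(tokens[1:])  # rest of the line is one item
        elif loop_tags:
            for tag, value in zip(loop_tags, tokens):
                items[tag].append(value)            # one table row
    return blocks


blocks = parse_star(SAMPLE)
# All values are kept as strings; the caller converts them when needed.
x_coords = [float(v) for v in blocks["micrograph_145"]["_particle.x"]]
```

As in Bsoft, every value stays a string until the calling code converts it, which is why the conversion to float happens outside the parser.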
Hint: Make ample use of empty lines to delimit different parts of a parameter file.
Many Bsoft programs add processing information to initial comments in a file, to allow the user to track the history of the file.
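A program that wants to keep such a history could treat everything before the first "data_" keyword as the file comment and append a note to it. The sketch below shows one way to do this in Python. The function names split_file_comment and add_history and the wording of the notes are hypothetical; the actual text written by Bsoft programs is not reproduced here.

```python
def split_file_comment(text):
    """Return (comment, rest): the text before the first data_ line, and the remainder."""
    lines = text.splitlines(keepends=True)
    for index, line in enumerate(lines):
        if line.lstrip().startswith("data_"):
            return "".join(lines[:index]), "".join(lines[index:])
    return text, ""                   # no data block found: all text is comment


def add_history(text, note):
    """Append one "#" line to the file comment, keeping an empty line before data_."""
    comment, rest = split_file_comment(text)
    comment = comment.rstrip("\n")
    header = (comment + "\n" if comment else "") + "# " + note + "\n\n"
    return header + rest


# Hypothetical history notes, for illustration only.
original = "# Written by step one\n\ndata_a\n\n_x 1\n"
updated = add_history(original, "Modified by step two")
```

The empty line written after the new note follows the recommendation to precede each "data_" keyword with an empty line.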