This library is about file formats where data can (mainly) only be appended to files. This is very useful for all kinds of batch processing. The data are nevertheless indexed by a primary key, so the results of the batch runs can immediately be used for online queries.
It is also possible to delete and overwrite data, but there are restrictions. What this library definitely does not support, however, is to automatically reclaim free space. Once a region of data becomes unused, it remains unused forever, and is never allocated again for newly added data.
You may ask what the advantage is compared with fully automatic database systems that exist as database servers (e.g. MySQL) or as libraries (e.g. Sqlite). First, the formats described here are very simple, and the data need to pass fewer abstraction layers when it flows from disk to the application or vice versa. The simplicity makes the formats also quite robust. Second, and this is the key advantage, both reading and writing data is very fast, as the data blocks are contiguously allocated. When only sequential accesses are done, random seeking is practically eliminated. So the domain of this library is high-speed batch processing where data is mainly sequentially accessed.
Kvseq is the most basic file format. Kvseq is simply a sequence of key/value pairs. Keys and values are arbitrary strings. The pairs can optionally have a delete flag to mark pairs as deleted. It is generally not possible to change the size of a key or value string after it is written (there is some support for strings up to 255 characters, but only if space has been preallocated).
The Kvseq format is implemented by the module
Seqdb_containers.Kvseq. Kvseq has a special data safety feature:
These files can be "checkpointed" in the sense that a certain file
length is declared as being a valid one. Later, files can be "rolled
back" from their actual size to the last valid one. Note that these
mechanisms do not have any effect on the contents of the Kvseq files,
but only protect the structure of the files, so the files can always
be read entry by entry, even after a computer crash. Using this
feature is optional.
Hindex is a hashed index format. The file is mainly a big hash table
with a fixed size. Basically, the cells of the index may contain any
data. The Hindex format is, however, specially designed to be used in
conjunction with Kvseq. Usually, the cells contain pointers to the
keys of Kvseq entries. By combining Kvseq and Hindex one gets a format
where data can be efficiently read and written in a sequential way,
and additionally the entries can be looked up by key. Hindex is
implemented by Seqdb_containers.Hindex.
Users of Hindex should take care of not making the indexes too large. As the lookup procedure works by hashing, usually every lookup requires that a disk block is to be read. If the index is not too large, it can be cached by the operating system, and not every read operation reaches the disk drive. Hashed indexes have a special access characteristics - in particular, one cannot hope that by looking up a certain key a similar key is stored close to the first key (as one can assume for b-tree indexes).
Based upon Kvseq and Hindex, this library also provides a higher-level format, the file systems. These are called like this because their functionality is similar to what the operating system calls a file system: A mapping from names to blobs of data. The file systems allow more mutating operations than one would get by simply combining Kvseq with Hindex. In particular, one can change the size of existing files, and there is a timestamp for every file. Finally, the file system has a locking scheme to allow concurrent accesses by several processes.
It is recommended to use file systems in favor of combining Kvseq and
Hindex directly. There are tools to manage file systems (see below),
e.g. for reclaiming space in special compaction passes (garbage
collection). File systems are implemented by Seqdb_fsys_ht.
There is a special variant of file systems, the so-called append-only
file systems. These restrict the possible operations again - in
particular, one can only modify the physically last file of the
system. The consequence is that the files are then implicitly ordered
by timestamp, and the timestamps can be used as a kind of pointer into
the file system. Especially, one can iterate over all files whose
timestamps are bigger than a given value. For append-only file systems,
a timestamp index is maintained to allow this with reasonable speed.
Another advantage of append-only file systems is that strict read-only
accesses require less locking, and it becomes possible that one
process adds data to the system while another one reads files at the
same time. This variant is implemented by Seqdb_fsys_ao.
Note that the file systems do not support subdirectories.
Archives can be used to subdivide the data of a file into smaller parts (like a zip file). The format used here allows the addition of new elements to archives at the end.
The archive format can be used with real OS files, or with the file systems described in the last section.
Kvseq and Hindex files have a common superblock format. Some fields, like the effective size of the file are used by all formats, while other fields are format-specific.
There are two tools:
fct is the "file container tool". The most useful function is to
look at and to mofify superblocks. For Kvseq and Hindex formats
there are further functions, e.g. one can list all keys of a Kvseq
file.filesys is the tool for managing file systems from the command-line.
It allows inspection and (in a limited form) manipulation of file
systems. Most important, however, are the routines to rebuild indexes,
to compact file systems, and to repair damaged file systems.The file formats have numerous variants. If the variant needs a special representation, it is always configured by a setting in the superblock. For example, one can configure the way keys and values are encoded for Kvseq:
The size of the files is limited by 2^63-1 bytes.
Other configuration options can be set at runtime. For example, one can demand that the whole Hindex of a file system is cached in RAM.