{1 Management routines for Seqdb files}

{2 Viewing and changing the superblock}

The [fct] tool allows the user to view and change the superblock (among
other things). To look at the superblock, call it like

{[
fct superblock file.data
]}

which outputs something like

{[
Superblock contents:
DTOTSZ  : 254377116
HAVEDUPS: 1
FORMAT  : 16 (kvseq)
ITOTSZ  : 112136172
HTALGO  : 1 (MD5)
HAVEAO  : 1
ENTRIES : 397645
SBSIZE  : 4096
FILESIZE: 372283222
PURPOSE : 5067492219884229697 (FSYSDATA)
ALIGN   : 0
AENTRIES: 397645
VALREPR : 3 (variable length up to 8 Ebytes (exa bytes))
FILEINCR: 4194304
ISZ     : 512
SYNCTIME: 1200913506
KVDELFL : 1
KEYREPR : 0 (variable length up to 255 bytes)
SYNCSIZE: 372283222
]}

For the interpretation of these fields, see {!Seqdb_containers.Sb_consts}.
Some interpretation hints:

{ul
{- If a superblock is shown at all, the file begins with the Seqdb magic
   number.}
{- [FORMAT] indicates the type of the file (kvseq, hindex, or perm).}
{- [PURPOSE] indicates the role of the file. If it is [FSYSDATA], [FSYSIDX],
   or [FSYSTIME], the file is part of a Seqdb file system. Such file systems
   always consist of a data file, an index file, and an (empty) lock file,
   and optionally a time index file. Other strings may also appear as
   [PURPOSE].}
{- [FILESIZE] (if present) is the logical length of the file. Only the first
   [FILESIZE] bytes of the file are interpreted at all; the rest is space
   that has already been allocated for extensions but is still unused.}
{- [SYNCSIZE] (if present) is the valid length of the file, i.e. its length
   at the time of the last [fsync]. While a file is in use, [SYNCSIZE] may be
   smaller than [FILESIZE]. After the file has been used, however, [SYNCSIZE]
   should equal [FILESIZE]; otherwise there was an unclean shutdown.}
{- [SYNCTIME] is the timestamp of the last [fsync], in seconds since the
   epoch.}
{- The total number of entries is in [ENTRIES], and the number of non-deleted
   entries is in [AENTRIES]. The difference [ENTRIES - AENTRIES] counts the
   unused (deleted) entries whose space is wasted.
   Some further notes on these two fields:
   {ul
   {- In Seqdb file systems, both the data file and the index file have these
      fields, but they show different numbers. Every stored file is
      represented by several entries in the data file and one entry in the
      index file, so [AENTRIES] of the index file is the number of files.}
   {- If [HAVEDUPS=1], deleted files are not marked as deleted in the data
      file, so [ENTRIES=AENTRIES] always holds there. Files are still deleted
      in the index, so the fields of the index file remain meaningful.}
   }}
{- For index files, it is very important how many entries of the hash table
   are already used. The hash table has the fixed size [HTSIZE], and the
   ratio [ENTRIES/HTSIZE] is the number to watch: above 0.5, performance
   decreases, and above 0.8 the slowdown is already dramatic.}
{- [DTOTSZ] tracks the space used by non-deleted entries, and [ITOTSZ] the
   space used by inodes. Both numbers are only approximations. The idea is
   that [FILESIZE - DTOTSZ - ITOTSZ] indicates wasted space; if this number
   is too large, a compaction run might be worthwhile.}
}

With a command like

{[
fct superblock file.data -set ISZ 256
]}

one can change the variables of the superblock. This is seldom a good idea,
but sometimes nevertheless useful:

{ul
{- [ISZ] is the size of newly allocated inodes. This variable can always be
   changed. Values smaller than 256 do not make sense.}
{- [FILEINCR] is the number of bytes by which the file is extended at once
   when it is full. A large [FILEINCR] reduces fragmentation, but also wastes
   space. The default is 4M (4194304). This variable can always be changed.}
{- [HAVEDUPS] can be set to 1 for kvseq files at any time to allow duplicates
   (if the software accessing the files allows it). Once set, it must never
   be reset to 0.}
{- [KEYREPR], [VALREPR], and [KVDELFL] can be changed for empty kvseq files.
}
}

{2 Other functions of [fct]}

With [-help] you can get the list of functions:

{[
usage: fct is one of the following:
  superblock: Show/modify the superblock
  create:     Create new files (kvseq/hindex/perm)
  add:        Add entries to files
  list:       List entries in files
  get:        Get an entry from file(s)
  group:      Group a perm file by keys
and depend on the command you are issuing.
use 'fct -help' to get command-specific help.
]}

The [create] command is particularly useful for creating empty kvseq,
hindex, or perm files.

{2 Doing compaction runs of file systems}

If a Seqdb file system wastes too much space, it might be advisable to do a
compaction run. During such a run, the existing entries are iterated and
transferred from the original file system to a copy. Finally, the copy is
renamed so that it replaces the original. A compaction run is also a chance
to change some fundamental parameters of the file system, e.g. the index
type.

To start a compaction run, just do

{[
filesys compact
]}

where [] is the name of the file system without suffixes (e.g. [foo] if the
files are called [foo.data], [foo.idx], and [foo.lock]). Some notes:

{ul
{- During the run, the original file system is only read-locked, i.e. other
   processes can still read from it, but any writers have to wait until the
   compaction is over.}
{- By default, any error in the original file system aborts the compaction
   run. With the [-fault-tolerant] switch, the tool instead tries to recover
   from errors.}
{- For file systems with the [HAVEDUPS] option, two iteration passes are
   required to find the duplicates. This is done automatically, but takes
   some time. Alternatively, the [-index-iteration] switch selects another
   method for eliminating duplicates.}
}

{2 Reindexing file systems}

If the index of a file system becomes too full, it must be enlarged.
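Whether an index counts as "too full" can be read off the superblock of its index file, using the ratio [ENTRIES/HTSIZE] and the 0.5 and 0.8 thresholds from the superblock hints. The following minimal Python sketch applies these thresholds and proposes a value for the new index size. The function names and the default target load of 50% are illustrative assumptions, not part of [fct] or [filesys]; the input numbers have to be taken from the [fct superblock] output of the index file.

```python
"""Hypothetical helpers for deciding whether to reindex (not part of Seqdb)."""


def index_load(entries: int, htsize: int) -> float:
    """Return the hash-table load factor ENTRIES / HTSIZE of an index file."""
    if htsize <= 0:
        raise ValueError("HTSIZE must be positive")
    return entries / htsize


def classify_load(load: float) -> str:
    """Map a load factor to the thresholds from the superblock hints."""
    if load > 0.8:
        return "severe"    # the slowdown is already dramatic
    if load > 0.5:
        return "degraded"  # performance starts to decrease
    return "ok"


def suggested_index_size(entries: int, target_percent: int = 50) -> int:
    """Smallest hash-table size keeping the load at or below target_percent.

    The 50% default is an assumption chosen to stay at the 0.5 threshold.
    Integer ceiling division avoids floating-point rounding errors.
    """
    if not 0 < target_percent <= 100:
        raise ValueError("target_percent must be in 1..100")
    return -(-entries * 100 // target_percent)


if __name__ == "__main__":
    # Hypothetical values from the superblock of an index file.
    entries, htsize = 450_000, 500_000
    load = index_load(entries, htsize)
    print(f"load={load:.2f} ({classify_load(load)})")
    print("suggested index size:", suggested_index_size(entries))
```

For these sample values, the load of 0.90 is classified as [severe], and a size of 900000 is suggested, which could then be given as the new size of the index.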
Reindexing iterates over the entries of a file system and writes all
indexable information into a new index. This way, the size of the index can
be changed. To start a reindexing run, just do

{[
filesys reindex -index-size
]}

where [] is the name of the file system without suffixes (e.g. [foo] if the
files are called [foo.data], [foo.idx], and [foo.lock]), and [] is the new
size of the index, given as the maximum number of entries of the hash table.
Some notes:

{ul
{- During the run, the original file system is only read-locked, i.e. other
   processes can still read from it, but any writers have to wait until the
   reindexing run is over.}
{- By default, any error in the original file system aborts the reindexing
   run. With the [-fault-tolerant] switch, the tool instead tries to recover
   from errors.}
{- Unless [-index-iteration] is given, the old index is ignored (except for
   its type). Because of this, reindexing can also be used to repair corrupt
   indexes.}
{- For file systems with the [HAVEDUPS] option, two iteration passes are
   required to find the duplicates. This is done automatically, but takes
   some time. Alternatively, the [-index-iteration] switch selects another
   method for eliminating duplicates; in this case, the iteration is driven
   by the old index.}
}

{2 Repairing file systems}

After a system crash, the checkpointing mechanism ensures that the parts of
the file system that were not yet synced are not trusted. This simple
mechanism protects the core structure of the file system, but it is not a
complete protection against errors. In particular, inodes may still contain
bad pointers. The repair procedure iterates over all inodes and deletes the
bad ones. To repair a file system, just do

{[
filesys repair
]}

where [] is the name of the file system without suffixes (e.g. [foo] if the
files are called [foo.data], [foo.idx], and [foo.lock]).
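Whether a file was closed cleanly can be estimated from the [SYNCSIZE], [FILESIZE], and [SYNCTIME] fields of its superblock, as described in the superblock hints. The following minimal Python sketch encodes that rule. The function names are hypothetical, and the check is only meaningful when no process is currently using the file, because [SYNCSIZE] smaller than [FILESIZE] is normal while a file is in use.

```python
"""Hypothetical post-crash check based on superblock fields (not part of Seqdb)."""

from datetime import datetime, timezone


def cleanly_closed(syncsize: int, filesize: int) -> bool:
    """True if the file was synced up to its logical end (SYNCSIZE = FILESIZE).

    Only meaningful while no process has the file open: during use,
    SYNCSIZE < FILESIZE is normal.
    """
    return syncsize == filesize


def last_sync(synctime: int) -> datetime:
    """Convert SYNCTIME (seconds since the epoch) to a UTC datetime."""
    return datetime.fromtimestamp(synctime, tz=timezone.utc)


if __name__ == "__main__":
    # Values taken from the sample superblock output.
    syncsize, filesize, synctime = 372283222, 372283222, 1200913506
    ok = cleanly_closed(syncsize, filesize)
    status = "clean" if ok else "unclean shutdown suspected"
    print(status, "- last fsync:", last_sync(synctime).isoformat())
```

For the sample superblock, this prints [clean - last fsync: 2008-01-21T11:05:06+00:00]. A suspected unclean shutdown may be a reason to consider [filesys rollback] or [filesys repair].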
Some notes on repairing:

{ul
{- Repairing modifies the file system and therefore requires an exclusive
   lock.}
}

{2 Other functions of [filesys]}

The [filesys] tool has a few other commands:

{[
usage: filesys is one of the following:
  create:   Create a filesys
  get:      Get a file from the filesys
  put:      Put a file into the filesys
  list:     List the contents of a filesys
  delete:   Delete a file in the filesys
  rename:   Rename a file in the filesys
  reindex:  Create a new index for a filesys
  compact:  Compact the filesys
  rollback: Rollback filesys to last checkpoint
  repair:   Rollback & repair filesys
and depend on the command you are issuing.
use 'filesys -help' to get command-specific help.
]}
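The superblock hints can also be applied mechanically to saved [fct superblock] output, for example to decide whether a compaction run might be worthwhile. The following minimal Python sketch extracts every [NAME : number] pair and derives the two waste indicators [ENTRIES - AENTRIES] and [FILESIZE - DTOTSZ - ITOTSZ]. It assumes that each field is printed as an upper-case name, a colon, and a decimal number, as in the sample output; the function names are hypothetical, and no threshold for "too much" waste is implied.

```python
"""Hypothetical parser for `fct superblock` output (not part of Seqdb)."""

import re

# "NAME : 123" pairs; trailing descriptions such as "(kvseq)" are ignored.
# Works whether the fields are printed one per line or run together.
FIELD_RE = re.compile(r"\b([A-Z][A-Z0-9]*)\s*:\s*(\d+)")


def parse_superblock(text: str) -> dict:
    """Map each superblock field name to the integer printed after it."""
    return {name: int(value) for name, value in FIELD_RE.findall(text)}


def waste_indicators(sb: dict) -> dict:
    """Compute the waste indicators from the superblock hints."""
    result = {}
    if "ENTRIES" in sb and "AENTRIES" in sb:
        # Unused (deleted) entries whose space is wasted.
        result["deleted_entries"] = sb["ENTRIES"] - sb["AENTRIES"]
    if all(key in sb for key in ("FILESIZE", "DTOTSZ", "ITOTSZ")):
        # Rough estimate only: DTOTSZ and ITOTSZ are approximations.
        result["wasted_bytes"] = sb["FILESIZE"] - sb["DTOTSZ"] - sb["ITOTSZ"]
    return result


SAMPLE = """\
Superblock contents:
DTOTSZ  : 254377116
HAVEDUPS: 1
FORMAT  : 16 (kvseq)
ITOTSZ  : 112136172
HTALGO  : 1 (MD5)
HAVEAO  : 1
ENTRIES : 397645
SBSIZE  : 4096
FILESIZE: 372283222
PURPOSE : 5067492219884229697 (FSYSDATA)
ALIGN   : 0
AENTRIES: 397645
VALREPR : 3 (variable length up to 8 Ebytes (exa bytes))
FILEINCR: 4194304
ISZ     : 512
SYNCTIME: 1200913506
KVDELFL : 1
KEYREPR : 0 (variable length up to 255 bytes)
SYNCSIZE: 372283222
"""

if __name__ == "__main__":
    print(waste_indicators(parse_superblock(SAMPLE)))
```

For the sample superblock, this yields 0 deleted entries (as expected for a data file with [HAVEDUPS=1]) and an estimated 5769934 wasted bytes, roughly 1.5% of [FILESIZE].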