module Kvseq_hp: A version of KVSEQ using pointers with key hashes.
This implementation has to be used when the CELLSZ of the
Hindex file is 2.
type t
type entry
type pointer
type contents = {
  delflag : bool;  (* The delete flag (if supported, false otherwise) *)
  key : string;    (* The keys are arbitrary strings of their repr_class *)
  value : string;  (* The values are arbitrary strings of their repr_class *)
}
type repr_class = [ `Fixed of int | `Int16 | `Int32 | `Int64 | `Int8 | `Lim8 of int ]
- `Int8: Strings up to a length of 255 bytes
- `Int16: Strings up to a length of 65535 bytes
- `Int32: Strings up to a length of 2^32-1 bytes
- `Int64: Strings up to a length of 2^63-1 bytes (signed!)
- `Fixed n: Strings with a length of exactly n bytes, where 0 <= n <= 255.
- `Lim8 n: Strings up to a length of n bytes, where n is at most 255.
  n bytes are always allocated for the string, so its length can be
  changed later.
Both create and access require a Seqdb_rdwr.file_descr object as input
argument. This object allows detailed control over the lifetime
of the Unix file descriptor. This object has the method
file_descr which must simply return the open file descriptor
of the file, and dispose_hint which may optionally be interpreted by
the object. A simple version of this object would be:
let fd = Unix.openfile filename [ Unix.O_RDWR ] 0 in
object
  method file_descr = fd
  method dispose_hint () = ()
end
Generally, one has to call the Seqdb_containers.KVSEQ.flush
function to ensure that all data are written to the file before
closing the descriptor. So it would be legal to do:
flush kvseq;
Unix.close fd
However, one must be careful not to use kvseq anymore if
the file descriptor object is as simple as the one shown, because
the file_descr method would now return the closed descriptor.
By using cleverer file descriptor objects it is possible to
continue to access kvseq after flushing. See
Seqdb_rdwr.file_descr and Seqdb_rdwr.managed_descr for
details.
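As an illustration of what a "cleverer" descriptor object could look like, the following is a minimal sketch that opens the descriptor lazily and closes it when dispose_hint is called, so that a later call of file_descr reopens the file. The helper name lazy_descr and its open_fd and close_fd arguments are hypothetical and not part of Seqdb_rdwr. Treating dispose_hint as permission to close is an assumption, and Seqdb_rdwr.managed_descr may well work differently.

```ocaml
(* Hypothetical helper, not part of Seqdb_rdwr: builds an object with the
   two methods described above. The descriptor is opened on first use and
   closed again when dispose_hint is called. *)
let lazy_descr ~open_fd ~close_fd =
  object
    val mutable current = None

    (* Return the open descriptor, (re)opening it if necessary. *)
    method file_descr =
      match current with
      | Some fd -> fd
      | None ->
          let fd = open_fd () in
          current <- Some fd;
          fd

    (* Assumption: dispose_hint is taken as permission to close. *)
    method dispose_hint () =
      match current with
      | Some fd ->
          close_fd fd;
          current <- None
      | None -> ()
  end
```

With the Unix library, one would presumably pass `(fun () -> Unix.openfile filename [ Unix.O_RDWR ] 0)` as open_fd and `Unix.close` as close_fd. All data still has to be flushed before the descriptor is closed.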
val create : ?buffer_size:int ->
?chunk_size:int ->
?sbsize:int ->
?fileincr:int64 ->
?supports_deletions:bool ->
?keyrepr:repr_class ->
?valrepr:repr_class ->
?alignment:int ->
?have_statistics:bool ->
?suggested_hash_algo:Seqdb_containers.Hash_algo.hash_algo ->
?purpose:string -> Seqdb_rdwr.file_descr -> t
File parameters (see also Seqdb_containers.Sb_consts and Seqdb_formats):
- sbsize is the size of the superblock to write (512 by default)
- fileincr is by how many bytes full files are extended if more
  space is needed (4M by default)
- supports_deletions is whether the entries have the delete flag
  (true by default)
- keyrepr is how the keys are represented (`Int64 by default)
- valrepr is how the values are represented (`Int64 by default)
- alignment is whether an alignment constraint applies (not
  set by default)
- have_statistics says whether statistics about the number
  of entries and their size in bytes are maintained in superblock
  variables (ENTRIES and AENTRIES; true by default)
- suggested_hash_algo is always `MD5
- purpose is a string of up to 8 chars describing the purpose of
  the file
purpose and fileincr can be changed later by modifying the superblock.
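The choice of keyrepr and valrepr bounds the strings that can be stored. The length limits listed for repr_class can be sketched as follows. The type is copied locally so the example is self-contained, and the functions max_length and fits are hypothetical helpers for illustration, not part of the module.

```ocaml
(* Local copy of the representation classes, for illustration only. *)
type repr_class =
  [ `Fixed of int | `Int16 | `Int32 | `Int64 | `Int8 | `Lim8 of int ]

(* Maximum string length in bytes for each representation class. *)
let max_length : repr_class -> int64 = function
  | `Int8 -> 255L
  | `Int16 -> 65535L
  | `Int32 -> 0xFFFF_FFFFL          (* 2^32-1 *)
  | `Int64 -> Int64.max_int         (* 2^63-1, the length is signed *)
  | `Fixed n | `Lim8 n -> Int64.of_int n

(* Whether a string can be stored under the given class.
   `Fixed n requires exactly n bytes; all others are upper bounds. *)
let fits (r : repr_class) (s : string) : bool =
  match r with
  | `Fixed n -> String.length s = n
  | _ -> Int64.compare (Int64.of_int (String.length s)) (max_length r) <= 0
```

For example, `fits (`Lim8 16) "short key"` is true, while a 256-byte string does not fit `` `Int8 ``.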
There are also parameters configuring the access layer. These are only valid as long as the file is accessed, and can be set to different values every time the file is opened:
- buffer_size: the size of the RAM buffer. The buffer is split
  up into chunks of chunk_size, and every chunk may point to
  a different area in the file. A value less than chunk_size
  means ad-hoc buffering of up to chunk_size bytes. (Default: 0)
- chunk_size: the size of an individual buffer chunk.
  (Default: 16K)
val access : ?buffer_size:int ->
?chunk_size:int ->
?conservative:bool -> Seqdb_rdwr.file_descr -> t
- conservative: If true, the logical file length is assumed
  to be only SYNCSIZE and not FILESIZE, i.e. a rollback is
  done if SYNCSIZE < FILESIZE. You can test this special condition
  after access with rollback_flag (below). The actual rollback is
  not done immediately, but at the next good opportunity. You can
  enforce it by calling sync. (Default: conservative=false)
- buffer_size: see Seqdb_containers.KVSEQ.create
- chunk_size: see Seqdb_containers.KVSEQ.create
val superblock : t -> Seqdb_containers.Superblock.t
See also mark_superblock_as_dirty.
val mark_superblock_as_dirty : t -> unit
val rollback_flag : t -> bool
After access, this flag is true if a
rollback to the last synchronized file size is to be done.
See also the description for access. The flag is reset when
the rollback has been carried out.
val configure : ?flush_every:int ->
?auto_sync:int option ->
?auto_fadvise:bool ->
?onsync:(unit -> unit) -> t -> unit
- flush_every: The superblock is written after every flush_every
  adds, deletes, and replaces. (Default: 1)
- auto_sync: If not None, the whole file is synced to disk every
  auto_sync seconds. This also sets the superblock variables
  SYNCSIZE and SYNCTIME. A value of 0 means: sync after every
  modification. (Default: Some 0 - you want to change this)
- auto_fadvise: Advises to remove the file from the page cache
  after every sync (automatic syncs & explicit syncs). The superblock
  is not removed from the cache. (Default: false)
- onsync: This function is called before an automatic or explicit
  sync is done. For example, one can sync the attached
  Hindex at that time. (Default: do nothing)
val get_pointer : entry -> pointer
val get_contents : entry -> contents
val get_key : entry -> string
val has_key : entry -> string -> bool
val get_value : entry -> string
val get_value_length : entry -> int64
val get_total_length : entry -> int64
val get_delflag : entry -> bool
val lookup : t ->
pointer -> entry
val validate_pointer : t -> pointer -> bool
See POINTABLE.
val add : t ->
contents -> entry
Fails if delflag is true, but the file does not support deletions.
val replace : entry -> contents -> unit
Fails if delflag is true, but the file does not support deletions.
val rename : entry -> string -> unit
val delete : entry -> unit
val blit_to_string : entry -> int64 -> string -> int -> int -> unit
blit_to_string e e_pos s s_pos len: Copies the substring of length
len at position e_pos from e's value to s at position s_pos.
val blit_from_string : string -> int -> entry -> int64 -> int -> unit
blit_from_string s s_pos e e_pos len: Copies the substring of length
len at position s_pos from s to e's value at position e_pos.
Generally it is allowed that the value becomes longer by this
operation. However, the same restriction as for replace applies:
Unless the entry is the last, the length of the value must not be
changed.
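The argument order of the blit functions and the length restriction can be illustrated on plain values. This is only a model and does not touch the library: a string stands in for the entry's value, a Bytes.t for the destination, and the names model_blit_to and blit_from_allowed are hypothetical. The reading that a write extending past the current end of the value counts as a length change is an assumption drawn from the paragraph above.

```ocaml
(* Model of blit_to_string e e_pos s s_pos len on plain values:
   [value] stands in for the entry's value, [dst] for the string s. *)
let model_blit_to (value : string) (e_pos : int64) (dst : Bytes.t)
    (s_pos : int) (len : int) : unit =
  Bytes.blit_string value (Int64.to_int e_pos) dst s_pos len

(* Model of the restriction on blit_from_string: writing [len] bytes at
   [e_pos] is unproblematic if it stays within the current value length;
   growing the value is only permitted for the last entry. *)
let blit_from_allowed ~is_last ~value_len ~e_pos ~len : bool =
  let write_end = Int64.add e_pos (Int64.of_int len) in
  is_last || Int64.compare write_end value_len <= 0
```

In this model, overwriting bytes 4..9 of a 10-byte value is always allowed, while writing 6 bytes at position 5 is only allowed when the entry is the last one in the file.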
val flush : t -> unit
val sync : t -> unit
Syncs the file to disk (see also flush).
Also sets the superblock variables SYNCSIZE and SYNCTIME.
val first_entry : t -> entry
Raises End_of_file.
val next_entry : entry -> entry
Raises End_of_file.
val recover_entry : Pcre.regexp ->
t ->
pointer option -> entry
The idea is to call it with the pointer of the last readable entry in order to skip damaged regions in the file and to find the next valid entry after that.
Raises End_of_file if nothing can be found.
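Since first_entry and next_entry signal the end of the file with End_of_file, a sequential scan is naturally written as a loop that stops on that exception. The following is a minimal sketch written as a functor over a signature restating only the three functions it needs, so that it does not depend on the library being installed. The names ITER, Fold and fold_active are hypothetical. That deleted entries are still visited during iteration and can be recognized by get_delflag is an assumption.

```ocaml
(* The part of the interface used by the scan, restated locally. *)
module type ITER = sig
  type t
  type entry
  val first_entry : t -> entry      (* raises End_of_file *)
  val next_entry : entry -> entry   (* raises End_of_file *)
  val get_delflag : entry -> bool
end

module Fold (K : ITER) = struct
  (* Fold [f] over all entries whose delete flag is not set,
     in file order, until End_of_file is raised. *)
  let fold_active f db init =
    let rec loop acc e =
      let acc = if K.get_delflag e then acc else f acc e in
      match K.next_entry e with
      | e' -> loop acc e'
      | exception End_of_file -> acc
    in
    match K.first_entry db with
    | e -> loop init e
    | exception End_of_file -> init
end
```

Applied to this module, one would presumably instantiate the functor as `Fold (Kvseq_hp)` and, for example, collect keys with `fold_active (fun acc e -> Kvseq_hp.get_key e :: acc) db []`.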
val string_of_pointer : pointer -> string
val pointer_of_string : string -> pointer
val int64_of_pointer : pointer -> int64
val pointer_length : int
pointer_length = 8.
val keyrepr : t -> repr_class
val valrepr : t -> repr_class
val supports_deletions : t -> bool
val alignment : t -> int
val have_statistics : t -> bool
val suggested_hash_algo : t -> Seqdb_containers.Hash_algo.hash_algo option
val num_entries : t -> int64
val num_active_entries : t -> int64
Raises Not_found if not available.
val free_mark : string
val del_mark : string
val fadvise_wontneed : t -> unit
val fadvise_iterating : t -> unit
It is sufficient to call this function once before starting the
iteration. This mode is turned off when reaching EOF or when
fadvise_wontneed is called.
val fadvise_willneed : t -> int64 -> int64 -> unit
The library only uses FADV_WILLNEED and FADV_DONTNEED. When looking
up an entry, the pages are read ahead if this looks useful (with
FADV_WILLNEED). This is done anyway without needing any hint from
the caller.
The fadvise_iterating mode is implemented by giving FADV_WILLNEED
and FADV_DONTNEED hints at the right moments. Note that by default
the underlying device also does a read-ahead. This library does
not depend on that read-ahead, however.
Linux allows turning off this implied read-ahead by
fadvising FADV_RANDOM. This library does not do so, but it
may be a good idea when only random lookups are expected.
In this case, the user of this library should do it.