Module Seqdb_containers.Kvseq_hp


module Kvseq_hp: KVSEQ 
A version of KVSEQ using pointers with key hashes. This implementation must be used when the CELLSZ of the Hindex file is 2.

type t 
The kvseq file
type entry 
A live entry in a kvseq file
type pointer 
A pointer to a live entry

type contents = {
   delflag : bool; (*The delete flag (if supported, false otherwise)*)
   key : string; (*The keys are arbitrary strings of their repr_class*)
   value : string; (*The values are arbitrary strings of their repr_class*)
}
type repr_class = [ `Fixed of int | `Int16 | `Int32 | `Int64 | `Int8 | `Lim8 of int ] 
Representation class:

File descriptors. To open a Kvseq file, one has to call create or access. Both functions take a Seqdb_rdwr.file_descr object as input argument, which allows detailed control over the lifetime of the Unix file descriptor. The object has two methods: file_descr, which must simply return the open file descriptor of the file, and dispose_hint, which the object may optionally interpret. A simple version of this object would be:

         let fd = Unix.openfile filename [ Unix.O_RDWR ] 0 in
         object
           method file_descr = fd
           method dispose_hint() = ()
         end
      

Generally, one has to call the Seqdb_containers.KVSEQ.flush function to ensure that all data are written to the file before closing the descriptor. So it would be legal to do:

         flush kvseq;
         Unix.close fd
      

However, with a file descriptor object as simple as the one shown, kvseq must not be used any more after the descriptor has been closed, because the file_descr method would then return the closed descriptor.

By using cleverer file descriptor objects it is possible to continue to access kvseq after flushing. See Seqdb_rdwr.file_descr and Seqdb_rdwr.managed_descr for details.
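As an illustration of what a cleverer object might look like, here is a minimal sketch of an object that reopens the file lazily whenever its descriptor is requested after having been closed. It assumes that only the two methods file_descr and dispose_hint are needed, and it interprets dispose_hint as "the descriptor may be closed now"; the actual requirements are defined by Seqdb_rdwr.file_descr and Seqdb_rdwr.managed_descr. To stay independent of the Unix library, the open and close operations are passed in as the hypothetical arguments open_fd and close_fd.

```ocaml
(* Hypothetical lazily reopening descriptor object (a sketch, not the
   library's managed_descr). [open_fd] and [close_fd] stand in for,
   e.g., Unix.openfile and Unix.close. *)
let reopening_descr ~(open_fd : unit -> 'fd) ~(close_fd : 'fd -> unit) =
  object
    (* The currently open descriptor, if any *)
    val mutable current = None

    (* Return an open descriptor, reopening the file if necessary *)
    method file_descr =
      match current with
      | Some fd -> fd
      | None ->
          let fd = open_fd () in
          current <- Some fd;
          fd

    (* Assumption: the hint means the descriptor may be closed now;
       the next call of file_descr will reopen the file *)
    method dispose_hint () =
      match current with
      | Some fd ->
          close_fd fd;
          current <- None
      | None -> ()
  end
```

With the Unix library, one would pass ~open_fd:(fun () -> Unix.openfile filename [ Unix.O_RDWR ] 0) and ~close_fd:Unix.close. Whether such an object is accepted where a Seqdb_rdwr.file_descr is expected depends on the exact object type declared there.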

val create : ?buffer_size:int ->
?chunk_size:int ->
?sbsize:int ->
?fileincr:int64 ->
?supports_deletions:bool ->
?keyrepr:repr_class ->
?valrepr:repr_class ->
?alignment:int ->
?have_statistics:bool ->
?suggested_hash_algo:Seqdb_containers.Hash_algo.hash_algo ->
?purpose:string -> Seqdb_rdwr.file_descr -> t
Write an empty kvseq structure into the file referenced by the file descriptor. The parameters mean (but see also Seqdb_containers.Sb_consts and Seqdb_formats):

The above parameters are saved in the superblock. Only purpose and fileincr can be changed later by modifying the superblock.

There are also parameters configuring the access layer. These are only valid as long as the file is accessed, and can be set to different values every time the file is opened:


val access : ?buffer_size:int ->
?chunk_size:int ->
?conservative:bool -> Seqdb_rdwr.file_descr -> t
Access the kvseq file referenced by the file descriptor.


val superblock : t -> Seqdb_containers.Superblock.t
Get the superblock. Note that if you modify the superblock, it is not automatically written back unless you also call mark_superblock_as_dirty.
val mark_superblock_as_dirty : t -> unit
The superblock is marked as dirty and will be written out at the next good opportunity.
val rollback_flag : t -> bool
After opening the kvseq with access, this flag is true if a rollback to the last synchronized file size is to be done. See also the description for access. The flag is reset when the rollback has been carried out.
val configure : ?flush_every:int ->
?auto_sync:int option ->
?auto_fadvise:bool ->
?onsync:(unit -> unit) -> t -> unit
Sets some (non-persistent) parameters:
val get_pointer : entry -> pointer
Get the pointer of an entry
val get_contents : entry -> contents
Get the contents of an entry
val get_key : entry -> string
Get only the key of the entry
val has_key : entry -> string -> bool
Checks whether the entry has the key
val get_value : entry -> string
Get only the value of the entry
val get_value_length : entry -> int64
Get the length of the value
val get_total_length : entry -> int64
Get the total length (used space) of the entry
val get_delflag : entry -> bool
Get only the delflag of the entry
val lookup : t ->
pointer -> entry
Get an entry by looking up a pointer
val validate_pointer : t -> pointer -> bool
See POINTABLE
val add : t ->
contents -> entry
Add another value to the file and return the new entry. Fails if delflag is true but the file does not support deletions.
val replace : entry -> contents -> unit
Replace the value stored inside an entry with a new version. The new version must have the same size, unless the entry happens to be the last entry of the file; otherwise the function fails. It also fails if delflag is true but the file does not support deletions.
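The preconditions of replace can be summarized as a small pure check. This is a model of the documented rules only, not the library's code: check_replace is a hypothetical helper, and old_size and new_size stand for the on-disk sizes of the old and new versions of the entry, which in the real file depend on keyrepr, valrepr and alignment.

```ocaml
(* Hypothetical model of the documented preconditions of [replace]. *)
let check_replace ~supports_deletions ~is_last ~old_size ~new_size ~delflag =
  if delflag && not supports_deletions then
    Error "file does not support deletions"
  else if new_size <> old_size && not is_last then
    (* Only the last entry of the file may change its size *)
    Error "size may only change for the last entry"
  else
    Ok ()
```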
val rename : entry -> string -> unit
Rename the entry. The resulting new key must have the same size on disk as the old one.
val delete : entry -> unit
Same as replacing the entry with a deleted entry
val blit_to_string : entry -> int64 -> string -> int -> int -> unit
blit_to_string e e_pos s s_pos len: Copies the substring of length len at position e_pos from e's value to s at position s_pos.
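The argument order of blit_to_string can be illustrated with an in-memory stand-in. In this sketch the entry's value is simply a string, and the destination is a Bytes.t because strings are immutable in current OCaml (the documented signature uses string). The function name blit_value_to_bytes is hypothetical; as in the real signature, e_pos is an int64.

```ocaml
(* Model of [blit_to_string e e_pos s s_pos len]; the string [value]
   stands in for the value of the entry [e]. *)
let blit_value_to_bytes (value : string) (e_pos : int64)
    (s : Bytes.t) (s_pos : int) (len : int) : unit =
  (* Value positions are int64 in the API; convert for in-memory use.
     Bytes.blit_string raises Invalid_argument on out-of-range arguments. *)
  Bytes.blit_string value (Int64.to_int e_pos) s s_pos len
```

For example, copying 3 bytes from position 6 of the value "hello world" into position 1 of the buffer "....." yields ".wor.".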
val blit_from_string : string -> int -> entry -> int64 -> int -> unit
blit_from_string s s_pos e e_pos len: Copies the substring of length len at position s_pos from s to e's value at position e_pos.

In general, this operation may make the value longer. However, the same restriction as for replace applies: unless the entry is the last one of the file, the length of the value must not change.
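The growth rule can be modeled on an in-memory entry. In the sketch below, model_entry and blit_from_string_model are hypothetical names, the is_last field stands for "this is the last entry of the file", and writing past the current end of the value (leaving a hole) is rejected; that last restriction is an assumption of the model, not something stated above.

```ocaml
(* Hypothetical in-memory model of an entry. *)
type model_entry = { mutable value : string; is_last : bool }

(* Model of [blit_from_string s s_pos e e_pos len]: writes [len] bytes of
   [s], starting at [s_pos], into the value of [e] at [e_pos]. The value
   may only grow if the entry is the last one of the file. *)
let blit_from_string_model s s_pos e e_pos len =
  let e_pos = Int64.to_int e_pos in
  let old_len = String.length e.value in
  let new_len = max old_len (e_pos + len) in
  if new_len <> old_len && not e.is_last then
    invalid_arg "blit_from_string_model: value length must not change";
  (* Model assumption: no holes beyond the current end of the value *)
  if e_pos < 0 || e_pos > old_len then
    invalid_arg "blit_from_string_model: bad position";
  let b = Bytes.make new_len '\000' in
  Bytes.blit_string e.value 0 b 0 old_len;
  Bytes.blit_string s s_pos b e_pos len;
  e.value <- Bytes.to_string b
```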

val flush : t -> unit
Ensure that everything is written out (but a sync is not forced). Furthermore, it ensures that all file descriptors are forgotten.
val sync : t -> unit
Ensure that everything is physically written to disk (implies flush). Also sets the superblock variables SYNCSIZE and SYNCTIME.
val first_entry : t -> entry
Returns the first entry, or raises End_of_file
val next_entry : entry -> entry
Returns the next entry of a given entry, or raises End_of_file
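Because first_entry and next_entry signal the end of the file with End_of_file, a full scan is a loop that stops on that exception. The sketch abstracts over the two functions so that it runs without the library; fold_entries is a hypothetical helper. Only the calls to first and next are guarded, so an End_of_file raised by the callback itself is not mistaken for the end of the file.

```ocaml
(* Fold over all entries. [first : unit -> 'e] and [next : 'e -> 'e]
   both raise End_of_file when there are no more entries. *)
let fold_entries ~first ~next f acc =
  let rec loop e acc =
    let acc = f acc e in
    match next e with
    | e' -> loop e' acc            (* tail call: not inside the handler *)
    | exception End_of_file -> acc
  in
  match first () with
  | e -> loop e acc
  | exception End_of_file -> acc   (* empty file *)
```

With this module, one would pass ~first:(fun () -> Kvseq_hp.first_entry kvseq) and ~next:Kvseq_hp.next_entry, for example to count the entries whose get_delflag is false.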
val recover_entry : Pcre.regexp ->
t ->
pointer option -> entry
This is a recovery function for reading damaged files. It tries to find the next valid entry by examining the file byte by byte after the pointer (if no pointer is given, it tries to find the first valid entry of the file). The function only accepts entries whose keys match the passed regular expression.

The idea is to call it with the pointer of the last readable entry in order to skip damaged regions in the file and to find the next valid entry after that.

Raises End_of_file if nothing can be found.
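The byte-level search can be pictured as follows. This is only a sketch of the idea, not the library's algorithm: the file contents are a string, try_parse is a hypothetical decoder that returns the key if a plausible entry starts at a given offset, and the predicate key_ok stands in for the Pcre.regexp. The scan returns an offset rather than an entry.

```ocaml
(* Sketch of a byte-level recovery scan. Returns the offset of the first
   plausible entry at or after [start] whose key satisfies [key_ok];
   raises End_of_file if there is none. *)
let recover_scan ~try_parse ~key_ok (data : string) (start : int) =
  let n = String.length data in
  let rec scan pos =
    if pos >= n then raise End_of_file
    else
      match try_parse data pos with
      | Some key when key_ok key -> pos
      | _ -> scan (pos + 1)   (* damaged or rejected: try the next byte *)
  in
  scan start
```

In the usage pattern described above, start would lie just after the last readable entry, so that the damaged region is skipped.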

val string_of_pointer : pointer -> string
val pointer_of_string : string -> pointer
val int64_of_pointer : pointer -> int64
val pointer_length : int
Convert pointer to/from string, and int64. pointer_length = 8.
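Since pointer_length = 8, a pointer fits into an int64 or into a string of exactly 8 bytes. The sketch below shows such a round trip for a plain int64 using big-endian byte order. The byte order and the bit layout are assumptions (the pointers of this module also carry key hash bits, whose layout is not described here), so real code should use string_of_pointer and pointer_of_string rather than a hand-written encoding.

```ocaml
(* Encode an int64 as exactly 8 bytes (big-endian: an assumption). *)
let string_of_int64_be (v : int64) : string =
  let b = Bytes.create 8 in
  Bytes.set_int64_be b 0 v;
  Bytes.to_string b

(* Decode 8 bytes back into an int64; rejects strings of the wrong length. *)
let int64_of_string_be (s : string) : int64 =
  if String.length s <> 8 then invalid_arg "int64_of_string_be";
  Bytes.get_int64_be (Bytes.of_string s) 0
```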
val keyrepr : t -> repr_class
val valrepr : t -> repr_class
val supports_deletions : t -> bool
val alignment : t -> int
val have_statistics : t -> bool
val suggested_hash_algo : t -> Seqdb_containers.Hash_algo.hash_algo option
Query features of the file
val num_entries : t -> int64
val num_active_entries : t -> int64
Get statistics. Raises Not_found if not available.
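Since num_entries and num_active_entries raise Not_found when statistics are not available, a caller may prefer an option result. A minimal generic wrapper; stat_opt is a hypothetical helper, not part of the module:

```ocaml
(* Turn a statistics getter that raises Not_found into one returning
   an option, e.g. [stat_opt Kvseq_hp.num_entries kvseq]. *)
let stat_opt (get : 't -> int64) (t : 't) : int64 option =
  match get t with
  | n -> Some n
  | exception Not_found -> None
```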
val free_mark : string
val del_mark : string
The pointer values marking free and deleted hash table cells
val fadvise_wontneed : t -> unit
Tell the page cache that we won't need this file any more. Note that when writing, only synced pages are affected. The superblock is excluded from the advice.
val fadvise_iterating : t -> unit
Tell the page cache that we are iterating over the file, and it is a good idea to read ahead pages. Additionally, pages that are no longer useful are removed from the page cache.

It is sufficient to call this function once before starting the iteration. This mode is turned off when reaching EOF or when fadvise_wontneed is called.

val fadvise_willneed : t -> int64 -> int64 -> unit
Advise the page cache to load the size bytes at pointer.

General note about fadvise:

The library only uses FADV_WILLNEED and FADV_DONTNEED. When looking up an entry, the pages are read ahead if this looks useful (with FADV_WILLNEED). This is done anyway without needing any hint from the caller.

The fadvise_iterating mode is implemented by giving FADV_WILLNEED and FADV_DONTNEED hints at the right moments. Note that by default the underlying device also does a read-ahead. This library does not rely on that device read-ahead, however.

Linux allows one to turn off this implied read-ahead by fadvising FADV_RANDOM. This library does not do so, but it may be a good idea when only random lookups are expected. In that case, the user of this library should do it.