libztxt is a simple C library for manipulating zTXT databases. It doesn't do anything exceptional, so it should be very portable, to the point of not having to modify the code at all (maybe some of the headers, though).
Currently, libztxt provides a means for offloading all of the zTXT processing and database creation from your program. You just specify what the data is, a few settings, and then a complete database comes out. You can then write the DB out to a file or stream it off to some place else.
libztxt also has the ability to deconstruct zTXT databases and return the component pieces: text, bookmarks, and annotations. These can then be written out to files on disk.
libztxt is distributed as part of the source of makeztxt. You will find it in the libztxt/ subdirectory. The current version of libztxt is 2.01 distributed with makeztxt 1.62. libztxt, like makeztxt and Weasel, is licensed under the GNU GPL 2.0. If anybody makes use of this library in any program, please send me an email and let me know (just out of curiosity).
libztxt includes the following functions:
    /* Functions to manipulate a zTXT structure */
    ztxt *      ztxt_init(void);
    void        ztxt_free(ztxt *db);
    void        ztxt_add_regex(ztxt *db, char *regex);
    void        ztxt_add_bookmark(ztxt *db, char *title, long offset);
    void        ztxt_add_annotation(ztxt *db, char *title, long offset,
                                    char *annotext);
    int         ztxt_process(ztxt *db, int method, int line_length);
    void        ztxt_generate_db(ztxt *db);
    int         ztxt_disect(ztxt *db);
    int         ztxt_crc32(int crc, const void *buf, int len);

    /* Print a list of bookmarks to stdout */
    void        ztxt_list_bookmarks(ztxt *db);

    /*
     * Use these to change the defaults.
     * You must call ztxt_set_data() to do any work.
     * Calling ztxt_set_title() is also a good idea.
     */
    void        ztxt_set_title(ztxt *db, char *new_title);
    void        ztxt_set_data(ztxt *db, char *new_data, long datasize);
    void        ztxt_set_output(ztxt *db, char *data, long datasize);
    void        ztxt_set_creator(ztxt *db, long new_creator);
    void        ztxt_set_type(ztxt *db, long new_type);
    void        ztxt_set_wbits(ztxt *db, int new_wbits);
    void        ztxt_set_compressiontype(ztxt *db, int new_comptype);
    void        ztxt_set_attribs(ztxt *db, short new_attribs);

    /*
     * Use these to retrieve values from the ztxt structure.
     */
    char *      ztxt_get_output(ztxt *db);
    long        ztxt_get_outputsize(ztxt *db);
    char *      ztxt_get_input(ztxt *db);
    long        ztxt_get_inputsize(ztxt *db);
    short       ztxt_get_num_bookmarks(ztxt *db);
    bmrk_node * ztxt_get_bookmarks(ztxt *db);
    short       ztxt_get_num_annotations(ztxt *db);
    anno_node * ztxt_get_annotations(ztxt *db);

    /*
     * Utility functions provided by libztxt
     */
    char *      ztxt_strip_spaces(char *str);
    int         ztxt_whitespace(char yoda);
    char *      ztxt_sanitize_string(char *str);
Here is the quick and easy way to use libztxt. Everybody's doing it, why aren't you?
    char *bigbuffer;
    long  bufsize;
    ztxt *db;
    char *outdata;

    /* Miscellaneous code */
    /* Read the input into a big buffer */

    db = ztxt_init();
    ztxt_set_data(db, bigbuffer, bufsize);
    ztxt_set_title(db, "My Kool zTXT");
    ztxt_add_regex(db, "Chapter [IVX]+");
    ztxt_add_regex(db, "Appendix [IVX]+");
    ztxt_process(db, 0, 0);
    ztxt_list_bookmarks(db);
    ztxt_generate_db(db);
    outdata = ztxt_get_output(db);

    /* Write outdata to a file */
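The "read the input into a big buffer" step in the example above is left to the caller. One possible way to do it is sketched below. The helper read_all() is hypothetical and not part of libztxt; it only uses standard C I/O.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of libztxt): read everything from fp
 * into a newly malloc'd buffer and store the byte count in *size.
 * Returns NULL on failure. The caller owns the buffer and must free() it.
 */
char *read_all(FILE *fp, long *size)
{
    size_t cap = 4096, len = 0, n;
    char *buf, *tmp;

    assert(fp != NULL && size != NULL);

    buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    while ((n = fread(buf + len, 1, cap - len, fp)) > 0) {
        len += n;
        if (len == cap) {
            /* Buffer is full: double it before reading more */
            cap *= 2;
            tmp = realloc(buf, cap);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
        }
    }
    if (ferror(fp)) {
        free(buf);
        return NULL;
    }

    *size = (long)len;
    return buf;
}
```

The returned pointer and size would then be handed to ztxt_set_data(). Since ztxt_free() does not free the input buffer you provide, the caller would still free() this buffer when done.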
ztxt * ztxt_init(void);
This function should be called before you call any of the others. It allocates an empty ztxt structure and sets the default values.
void ztxt_free(ztxt *db);
When you are finished with a particular ztxt, call this function to free memory used by the ztxt structure. This will free all new buffers AND the ztxt structure itself. This will not, however, free the input buffer you have provided.
void ztxt_add_regex(ztxt *db, char *regex);
Add a regular expression to the linked list stored in the ztxt. Call this function to add each regular expression you want to use. regex is the pattern string to be compiled.
void ztxt_add_bookmark(ztxt *db, char *title, long offset);
Add a bookmark to the linked list of bookmarks stored in the ztxt. This function is for adding bookmarks which will not be generated by libztxt. Those will be added automatically when generated. title is the bookmark's title and has a maximum size of 20 characters (MAX_BMRK_LENGTH) excluding the NULL terminator. offset is the absolute offset into the text data where the bookmark is anchored. The position given here is a position in the processed text used by libztxt.
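The text above gives the 20 character limit but does not say what happens to longer titles, so a cautious caller might clip titles before passing them in. The following is a minimal sketch of that precaution. The names clip_title() and DEMO_MAX_BMRK_LENGTH are made up for this example and are not part of libztxt.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same value the text gives for MAX_BMRK_LENGTH; renamed to avoid clashing */
#define DEMO_MAX_BMRK_LENGTH 20

/*
 * Hypothetical helper: copy title into dst, keeping at most
 * DEMO_MAX_BMRK_LENGTH characters plus the terminator.
 * dst must have room for DEMO_MAX_BMRK_LENGTH + 1 bytes.
 * Returns the number of characters stored (excluding the terminator).
 */
size_t clip_title(char *dst, const char *title)
{
    size_t n;

    assert(dst != NULL && title != NULL);

    n = strlen(title);
    if (n > DEMO_MAX_BMRK_LENGTH)
        n = DEMO_MAX_BMRK_LENGTH;
    memcpy(dst, title, n);
    dst[n] = '\0';
    return n;
}
```

The clipped copy could then be passed as the title argument of ztxt_add_bookmark() or ztxt_add_annotation().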
void ztxt_add_annotation(ztxt *db, char *title, long offset, char *annotext);
Add an annotation to the linked list of annotations stored in the ztxt. title is the annotation's title and has a maximum size of 20 characters (MAX_BMRK_LENGTH) excluding the NULL terminator. offset is the absolute offset into the text data where the annotation is anchored. The position given here is a position in the processed text used by libztxt. annotext is the actual text of the annotation. A copy will be made of the text provided so the provided pointer need not be persistent. annotext has a maximum length of 4096 characters including the NULL terminator.
int ztxt_process(ztxt *db, int method, int line_length);
This function does the bulk of the work that the library performs. Before calling this function, you must have previously called ztxt_set_data(), so that it has some data to process. ztxt_process takes the input buffer and reformats it for the Palm. This involves removing any carriage returns (if from a DOS file) and stripping linefeeds. The reformatted text is then scanned using the supplied regular expressions for possible bookmarks. Lastly, the reformatted text is compressed using zlib. You can have some control over this process by calling ztxt_set_wbits(). method and line_length control how the reformatting affects the text:
void ztxt_generate_db(ztxt *db);
Once the ztxt has been processed, call this function to generate a full database. This will assemble the database header, zTXT record 0, the compressed data, any bookmarks, and any annotations. The result is then stored in the ztxt structure. This is a complete zTXT database and can be written straight to disk or sent off elsewhere without any further processing. The output data may be accessed by calling ztxt_get_output(). The size of the output buffer may be retrieved by calling ztxt_get_outputsize().
int ztxt_disect(ztxt *db);
To deconstruct a zTXT database, load the whole database into memory and store it in the structure using ztxt_set_output(). Calling this function will then populate the rest of the structure with the components of the zTXT. The text data will be decompressed and extracted and all bookmarks and annotations will be extracted. This data may then be retrieved with ztxt_get_input(), ztxt_get_bookmarks(), and ztxt_get_annotations().
An important fact to remember when using this function is that it appears to work backwards compared to the other libztxt functions. The data to be worked on is put into the ztxt structure's "output" data area and this function's result is placed in the "input" data area. The reason is that the result of this function is suitable for reprocessing using ztxt_process(). So, in a way, it is working backwards.
Important: After calling this function, remember to call ztxt_set_output() again, passing in NULL for the data pointer. This matters because ztxt_free() will attempt to free the output buffer unless the pointer is NULL, and the buffer you supplied was not allocated by libztxt.
int ztxt_crc32(int crc, const void *buf, int len);
Used for calculating a CRC32 value over a range of data. crc is the CRC seed to use. When a new CRC32 is being calculated, the CRC seed value should be initialized by calling ztxt_crc32(0, NULL, 0). Currently this function is just a simple wrapper for the crc32 function found in zlib.
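To illustrate the seed-then-update calling pattern described above without depending on zlib, here is a small bit-at-a-time CRC-32 using the same standard polynomial. This is only a sketch: demo_crc32() is a made-up name, it is not the code libztxt uses, and it is far slower than zlib's table-driven version.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a zlib-style CRC-32 (reflected polynomial
 * 0xEDB88320). Pass the previous return value as crc to continue a
 * running checksum over several buffers.
 */
uint32_t demo_crc32(uint32_t crc, const void *buf, size_t len)
{
    const unsigned char *p = buf;
    int bit;

    if (p == NULL) {
        /* A NULL buffer is only meaningful for the zero-length seed call */
        assert(len == 0);
        return 0;
    }

    crc = ~crc;
    while (len-- > 0) {
        crc ^= *p++;
        for (bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}
```

Feeding the data in one call or in several pieces gives the same result, which is why the seed value is threaded through each call.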
void ztxt_list_bookmarks(ztxt *db);
This is a simple utility function to output the names and offsets of all bookmarks generated by the regular expressions when ztxt_process() was called. They are output in a simple table to stdout.
void ztxt_set_title(ztxt *db, char *new_title);
Call this function to set the database title. The title has a maximum length of 32 characters. You really should call this function as the default database name is rather meaningless to users.
void ztxt_set_data(ztxt *db, char *new_data, long datasize);
You must call this function after ztxt_init() and before calling any other function. This sets the input buffer to new_data. Be sure that datasize is no larger than the actual size of new_data or else bad things are likely to occur.
void ztxt_set_output(ztxt *db, char *data, long datasize);
This sets the output buffer pointer to data. Be sure that datasize is no larger than the actual size of data or else bad things are likely to occur.
void ztxt_set_creator(ztxt *db, long new_creator);
The default creator for a zTXT database is 'GPlm'. Normally this is what you want so that the database can be read by Weasel Reader, but you can change it with this function. The creator ID is a 32-bit integer assigned by Palm.
void ztxt_set_type(ztxt *db, long new_type);
The default type for a zTXT database is 'zTXT'. Since you're using libztxt to create zTXT databases, you probably want this. If not, you can change it with this function. The type ID is a 32-bit integer.
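Creator and type IDs are written as four characters but passed as 32-bit integers. In C, the value of a multi-character constant such as 'GPlm' is implementation-defined, so it can be safer to build the integer explicitly. The sketch below assumes the usual Palm OS convention that the first character is the most significant byte. The helper fourcc() is hypothetical and not part of libztxt.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: pack a four-character code into a 32-bit value,
 * first character in the most significant byte (assumed Palm convention).
 */
uint32_t fourcc(const char *code)
{
    assert(code != NULL && strlen(code) == 4);

    return ((uint32_t)(unsigned char)code[0] << 24) |
           ((uint32_t)(unsigned char)code[1] << 16) |
           ((uint32_t)(unsigned char)code[2] << 8)  |
            (uint32_t)(unsigned char)code[3];
}
```

Under that assumption, "GPlm" packs to 0x47506C6D and "zTXT" to 0x7A545854. The result would be cast to long when passed to ztxt_set_creator() or ztxt_set_type().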
void ztxt_set_wbits(ztxt *db, int new_wbits);
This function sets the window bits used by zlib. For a full explanation of what this does, consult the zlib docs (header files). In short, this has an effect on how much memory is used during compression and then again during decompression. Valid range is 8 to 15, with higher numbers using more memory. The default is 15 and should be fine. If you have a Palm with very little memory, this might be useful... maybe.
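To put rough numbers on the memory trade-off, the comments in zlib's zconf.h give approximate formulas: deflate needs about (1 << (windowBits + 2)) + (1 << (memLevel + 9)) bytes, and inflate needs about (1 << windowBits) bytes for its window plus a few kilobytes of small objects. The sketch below just evaluates those formulas. The function names are made up here, and memLevel is a zlib parameter that this document does not otherwise cover (zlib's default is 8).

```c
#include <assert.h>
#include <stddef.h>

/* Approximate deflate (compression) memory, per the zconf.h comments */
size_t deflate_mem_bytes(int wbits, int mem_level)
{
    assert(wbits >= 8 && wbits <= 15);
    assert(mem_level >= 1 && mem_level <= 9);

    return ((size_t)1 << (wbits + 2)) + ((size_t)1 << (mem_level + 9));
}

/* Approximate inflate (decompression) window size, excluding the
 * few extra kilobytes zlib needs for small objects */
size_t inflate_window_bytes(int wbits)
{
    assert(wbits >= 8 && wbits <= 15);

    return (size_t)1 << wbits;
}
```

With the default of 15, that works out to about 256K for compression and a 32K window for decompression. Dropping to 10 shrinks the decompression window to 1K, which is the side that matters on the Palm.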
void ztxt_set_compressiontype(ztxt *db, int new_comptype);
This function controls which type of zTXT libztxt will generate. new_comptype can be either 1 or 2, with 1 being the default. Type 1 is the new "on-demand decompression" zTXT format. This entails compressing 8K chunks of text with Z_FULL_FLUSH thus allowing for random access on the compressed data. Type 2 is the original style which requires the document to be fully decompressed when you start reading it. Type 2 does, however, give about 10% - 15% better compression.
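The random access that type 1 allows comes down to simple arithmetic: a reader can work out which chunk holds a given text position and decompress only that chunk. The sketch below assumes chunks are exactly 8192 bytes of uncompressed text. That exact figure, and the names used here, are assumptions for illustration and are not taken from the libztxt source.

```c
#include <assert.h>

/* "8K chunks" per the text above; the exact value 8192 is an assumption */
#define DEMO_CHUNK_SIZE 8192L

typedef struct {
    long chunk;            /* zero-based index of the chunk */
    long offset_in_chunk;  /* position inside that chunk once decompressed */
} demo_chunk_pos;

/* Map an offset in the uncompressed text to a chunk and inner offset */
demo_chunk_pos locate_chunk(long text_offset)
{
    demo_chunk_pos pos;

    assert(text_offset >= 0);

    pos.chunk = text_offset / DEMO_CHUNK_SIZE;
    pos.offset_in_chunk = text_offset % DEMO_CHUNK_SIZE;
    return pos;
}

/* Number of chunks needed to hold text_size bytes of text */
long chunk_count(long text_size)
{
    assert(text_size >= 0);

    return (text_size + DEMO_CHUNK_SIZE - 1) / DEMO_CHUNK_SIZE;
}
```

For example, under these assumptions a bookmark at offset 20000 would fall in the third chunk (index 2), 3616 bytes in.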
void ztxt_set_attribs(ztxt *db, short new_attribs);
Use this function to change the attributes in the output database header. If this function is not called, the default attributes will be set. The default is to just set the backup bit (dmHdrAttrBackup). The attributes are a 16-bit integer.
char * ztxt_get_output(ztxt *db);
Returns a pointer to the output data buffer in the ztxt structure.
long ztxt_get_outputsize(ztxt *db);
Returns the size of the output data buffer in the ztxt structure.
char * ztxt_get_input(ztxt *db);
Returns a pointer to the input data buffer in the ztxt structure.
long ztxt_get_inputsize(ztxt *db);
Returns the size of the input data buffer in the ztxt structure.
short ztxt_get_num_bookmarks(ztxt *db);
Returns the number of bookmarks in the bookmark linked list in the ztxt structure.
bmrk_node * ztxt_get_bookmarks(ztxt *db);
Returns a pointer to the head of the bookmark linked list. This list may be traversed by going to ptr->next until that value is NULL.
short ztxt_get_num_annotations(ztxt *db);
Returns the number of annotations in the annotation linked list in the ztxt structure.
anno_node * ztxt_get_annotations(ztxt *db);
Returns a pointer to the head of the annotation linked list. This list may be traversed by going to ptr->next until that value is NULL.
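Both list accessors above return a head pointer that is walked through ->next until NULL. The sketch below shows that traversal on a stand-in node type. The layout of bmrk_node and anno_node is not given in this document, so demo_node and every field other than next are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in node type; only the next pointer is confirmed by the text */
typedef struct demo_node {
    const char *title;        /* assumed field */
    long offset;              /* assumed field */
    struct demo_node *next;
} demo_node;

/* Count the nodes by following next until NULL */
int count_nodes(const demo_node *head)
{
    const demo_node *cur;
    int n = 0;

    for (cur = head; cur != NULL; cur = cur->next)
        n++;
    return n;
}

/* Return the node with the greatest offset that is <= pos, or NULL.
 * Makes no assumption about the list being sorted. */
const demo_node *last_before(const demo_node *head, long pos)
{
    const demo_node *cur, *best = NULL;

    assert(pos >= 0);

    for (cur = head; cur != NULL; cur = cur->next) {
        if (cur->offset <= pos && (best == NULL || cur->offset > best->offset))
            best = cur;
    }
    return best;
}
```

A count obtained this way should agree with ztxt_get_num_bookmarks() or ztxt_get_num_annotations() for the corresponding list.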
char * ztxt_strip_spaces(char *str);
Strips leading and trailing whitespace from the input string. This function modifies the data in str in place. Returns str.
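Because the function returns str itself rather than a pointer into the middle of it, the remaining text has to be shifted to the front of the buffer. A minimal sketch of that behaviour follows. demo_strip_spaces() is a made-up name, not the libztxt implementation, and it uses isspace() since this document does not say exactly which characters libztxt counts as whitespace.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Illustrative in-place trim: removes leading and trailing whitespace
 * (as defined by isspace) and returns the same pointer it was given.
 */
char *demo_strip_spaces(char *str)
{
    char *start;
    size_t len;

    assert(str != NULL);

    /* Skip leading whitespace */
    start = str;
    while (*start != '\0' && isspace((unsigned char)*start))
        start++;

    /* Drop trailing whitespace */
    len = strlen(start);
    while (len > 0 && isspace((unsigned char)start[len - 1]))
        len--;

    /* Shift the kept text to the front; the regions may overlap */
    memmove(str, start, len);
    str[len] = '\0';
    return str;
}
```

Note that the argument must be writable memory, so passing a string literal directly would not be safe.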
int ztxt_whitespace(char yoda);
Returns 1 if yoda is a whitespace character, 0 otherwise.
char * ztxt_sanitize_string(char *str);
Removes non-printable characters, linefeeds, and carriage returns from the input string. This function modifies the data in str in place. Returns str.