str_functions.h File Reference

String processing functions.

#include <string.h>
#include <glib.h>
#include <sys/types.h>
#include <regex.h>


Data Structures

struct  StringList
 StringList is a structure that stores a list of constant strings.

Defines

#define CHAR_TO_UNSIGNEDINT(c)   (unsigned int) ((int) c >=0)? c : c+256
 Convert char to unsigned integer.
#define CHAR_TO_UNSIGNEDCHAR(c)   (unsigned char) ((int) c >=0)? c : c+256
 Convert char to unsigned char.

Functions

StringList * stringList_new ()
 Create a new StringList instance.
StringList * stringList_sized_new (size_t chunk_size, size_t element_count)
 Create a new StringList instance with given sizes.
void stringList_clear (StringList *sList)
 Clear all content of the StringList.
int stringList_find_string (StringList *sList, const char *str)
 Return the index of the first matching string in the StringList.
gboolean stringList_has_string (StringList *sList, const gchar *str)
 Whether a string is in StringList.
gchar ** stringList_to_charPointerPointer (StringList *sList)
 Return a char pointer pointer (char **) which points to the list of strings.
const gchar * stringList_index (StringList *sList, guint index)
 Return the string at the given index.
guint stringList_insert (StringList *sList, const gchar *str)
 Adds a copy of string to the StringList.
guint stringList_insert_const (StringList *sList, const gchar *str)
 Adds a copy of string to the StringList, unless the identical string has already been added to the StringList with stringList_insert_const().
void stringList_free (StringList *sList)
 Free the StringList instance.
gchar * string_formatted_combine (const gchar *format, StringList *sList, int *counter_ptr)
 Combine a list of strings into a specified format.
gchar * initString (gchar *str)
 Initialize the string by setting the first char to 0x0.
gboolean isEmptyString (const gchar *str)
 Check whether the string is NULL or has zero length.
void string_trim (gchar *str)
 Trim the leading and trailing whitespace of the string.
gchar * subString (const gchar *str, int beginIndex, int length)
 Return a substring of the given string.
gchar * subString_buffer (gchar *buf, const gchar *str, int beginIndex, int length)
 Return a substring of the given string in given buffer.
gchar * string_append_c (gchar *str, const char ch, size_t length)
 Append a character to a string.
gboolean string_is_decomposed_fast (const gchar *str)
 Whether a string is decomposed (no validation).
gchar * string_padding_left (const gchar *str, const gchar *padding_str, size_t length)
 Pad a string on the left up to a certain length.
gchar * string_padding_right (const gchar *str, const gchar *padding_str, size_t length)
 Pad a string on the right up to a certain length.
char * ucs4_to_utf8 (gunichar ucs4_code)
 Convert UCS-4 to UTF-8 string.
gunichar * utf8_to_ucs4 (const char *utf8_str)
 Convert UTF-8 string to UCS-4 (gunichar).
gchar * utf8_concat_ucs4 (gchar *utf8_str, gunichar ucs4_code)
 Concatenate a UCS-4 (gunichar) to a UTF-8 string.
int strcmp_unsigned_signed (const unsigned char *str1, const gchar *str2)
 Compare between signed and unsigned char arrays.
unsigned char * signedStr_to_unsignedStr (const gchar *str)
 Convert the signed char string to a newly allocated unsigned char string.
unsigned char * signedStr_to_unsignedStr_buffer (unsigned char *resultBuf, const gchar *str)
 Convert the signed char string to the unsigned char string buffer.
char * unsignedStr_to_signedStr (const unsigned char *str)
 Convert the unsigned char string to a newly allocated signed char string.
gchar * unsignedStr_to_signedStr_buffer (gchar *resultBuf, const unsigned char *str)
 Convert the unsigned char string to the signed char string buffer.
Regex manipulating functions.
These functions provide evaluation and search-and-replace operations on regex matches.

They are based on regex.h, so the format of the search pattern and the option flags are the same as those used in regcomp() and regexec().

These functions can handle parenthesized sub-patterns, which are referred to by their pattern id. Id 0 refers to the whole matched pattern, 1 to the first sub-pattern, 2 to the second sub-pattern, and so on.

The matched sub-patterns are extracted and stored in a StringList, then processed by string_formatted_combine().
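The pattern id numbering described above follows the sub-match array filled in by regexec(). The following is a minimal sketch of that numbering using only the POSIX regex.h calls; regex_submatch() and MAX_GROUPS are hypothetical names introduced for illustration and are not part of libUnihan.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

#define MAX_GROUPS 8   /* hypothetical limit: ids 0..7 are supported */

/* Hypothetical helper: returns a newly allocated copy of sub-match `id`
 * (0 = whole match, 1 = first parenthesized sub-pattern, ...) or NULL
 * if the pattern fails to compile, does not match, or the group is unset. */
static char *regex_submatch(const char *str, const char *pattern,
                            int cflags, size_t id)
{
    regex_t preg;
    regmatch_t m[MAX_GROUPS];
    char *out = NULL;

    assert(str != NULL && pattern != NULL && id < MAX_GROUPS);
    if (regcomp(&preg, pattern, cflags) != 0)
        return NULL;
    if (regexec(&preg, str, MAX_GROUPS, m, 0) == 0 && m[id].rm_so != -1) {
        size_t n = (size_t)(m[id].rm_eo - m[id].rm_so);
        out = malloc(n + 1);
        if (out != NULL) {
            memcpy(out, str + m[id].rm_so, n);
            out[n] = '\0';
        }
    }
    regfree(&preg);
    return out;
}
```

The caller releases the result with free(). The library functions listed below additionally format the extracted sub-matches, which this sketch does not attempt.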

gchar * string_regex_formatted_combine_regex_t (const gchar *str, const regex_t *preg, const gchar *format, int eflags, int *counter_ptr)
 Combine sub-matches of a regex search into a specified format, if the regular expression is compiled as a regex_t.
gchar * string_regex_formatted_combine (const gchar *str, const gchar *pattern, const gchar *format, int cflags, int eflags, int *counter_ptr)
 Combine sub-matches of a regex search into a specified format.
gchar * string_regex_replace_regex_t (const gchar *str, const regex_t *preg, const gchar *format, int eflags, int *counter_ptr)
 Replace the substring matching a regular expression (compiled as a regex_t) with new text.
gchar * string_regex_replace (const gchar *str, const gchar *pattern, const gchar *format, int cflags, int eflags, int *counter_ptr)
 Replace the substring matching a regular expression with new text.


Detailed Description

This header file lists some string processing functions, such as subString(), and StringList, which provides a memory-efficient method to store a list of constant strings.

Define Documentation

#define CHAR_TO_UNSIGNEDCHAR ( c )     (unsigned char) ((int) c >=0)? c : c+256

Parameters:
c char to be converted.
Returns:
unsigned char representation of c.

#define CHAR_TO_UNSIGNEDINT ( c )     (unsigned int) ((int) c >=0)? c : c+256

Parameters:
c char to be converted.
Returns:
unsigned integer representation of c.
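Both macros map a possibly negative char to its value in the range 0 to 255. Because the expansions shown here have no outer parentheses, combining them with other operators in one expression may not group as expected. The following sketch expresses the same mapping as a function; char_to_uint() is a hypothetical name, not part of libUnihan.

```c
#include <assert.h>

/* Hypothetical helper, not part of libUnihan: converts a char, which may
 * be negative where char is signed, to its unsigned value. */
static unsigned int char_to_uint(char c)
{
    /* Converting through unsigned char adds 256 to negative values
     * when CHAR_BIT is 8. */
    unsigned int value = (unsigned int)(unsigned char)c;
    assert(value <= 255u);   /* holds when CHAR_BIT is 8 */
    return value;
}
```

A function call groups as a single operand, so an expression such as char_to_uint(c) + 1 adds 1 to the converted value.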


Function Documentation

gchar* initString ( gchar *  str  ) 

If str is NULL, then a char array of size MAX_STRING_BUFFER_SIZE will be allocated.

Parameters:
str String to be initialized, or NULL to allocate a new string.
Returns:
The initialized string.

gboolean isEmptyString ( const gchar *  str  ) 

Parameters:
str String to be checked.
Returns:
False if the string is not empty, true otherwise.

unsigned char* signedStr_to_unsignedStr ( const gchar *  str  ) 

Parameters:
str Signed char string.
Returns:
A newly allocated unsigned char string.
See also:
signedStr_to_unsignedStr_buffer()

unsignedStr_to_signedStr()

unsignedStr_to_signedStr_buffer()

unsigned char* signedStr_to_unsignedStr_buffer ( unsigned char *  resultBuf,
const gchar *  str 
)

Parameters:
resultBuf The buffer that stores the conversion result.
str Signed char string.
See also:
signedStr_to_unsignedStr()

unsignedStr_to_signedStr()

unsignedStr_to_signedStr_buffer()

int strcmp_unsigned_signed ( const unsigned char *  str1,
const gchar *  str2 
)

It behaves like strcmp(), except that the comparison is between an unsigned string (unsigned char array) and a signed string. It is mainly intended for GCC 4.3.

Parameters:
str1 Unsigned string to be compared.
str2 Signed string to be compared.
Returns:
An integer less than, equal to, or greater than zero if str1 is found, respectively, to be less than, to match, or be greater than str2.

gchar* string_append_c ( gchar *  str,
const char  ch,
size_t  length 
)

This function appends a character ch to the end of str if the length of the string, including the trailing '\0', is less than the given length. It returns NULL if ch cannot be appended because of the length limit.

Note that str will be modified if ch is successfully appended.

Parameters:
str The string.
ch The char to be appended to str.
length The maximum length of str ('\0' included).
Returns:
str on success, NULL on failure.
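The length check can be sketched as follows. This is a minimal stand-in that follows the contract documented above, not the library's implementation; append_char_bounded() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for string_append_c(): appends ch only when the
 * string, counted with its trailing '\0', is shorter than `length`. */
static char *append_char_bounded(char *str, char ch, size_t length)
{
    assert(str != NULL);
    size_t used = strlen(str) + 1;   /* bytes in use, '\0' included */
    if (used >= length)
        return NULL;                 /* no room left for ch */
    str[used - 1] = ch;              /* overwrite the old terminator */
    str[used] = '\0';                /* write the new terminator */
    return str;
}
```

With a four-byte buffer holding "ab", one more character fits and the next call returns NULL, leaving the buffer unchanged.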

gchar* string_formatted_combine ( const gchar *  format,
StringList *  sList,
int *  counter_ptr 
)

Flag to indicate that result substrings can be overlapped.

With this flag, matching aaa against a* yields aaa, aa, and a, but not the empty string.

Note:
This flag has no effect if REGEX_RESULT_MATCH_ONCE is also set.

Combine a list of strings into a specified format.
This function combines a list of strings (sList) into a newly allocated string, according to the format string format.

The format string is a character string composed of zero or more directives: ordinary characters (not $), which are copied unchanged to the output string; and format directives, each of which results in fetching zero or more arguments. Each format directive is introduced by the character $, followed (in this order) by an optional flag, a mandatory argument id, and one to three optional options such as substitute strings or padding instructions. Note that at most one flag can be used in a format directive.

The format of a format directive is:

$[flag]<argument id>[{[option1[,option2[,option3]]]}]

If no flag is given, the format directive is substituted by the argument it refers to.

The argument id starts from 0, but should not exceed the number of arguments.

The following flags provide additional output control:

  • N<id>{str1[,str2]}: if argument id is nonempty, str1 is output for this format directive; otherwise str2 is output, or an empty string if str2 is omitted.
  • E<id>{str1[,str2]}: similar to N, but outputs str1 if argument id is empty, i.e. is NULL or has zero length.
  • U<id>: argument id is output as uppercase. This directive is backed by g_utf8_strup(), so it converts non-ASCII Unicode characters as well.
  • L<id>: argument id is output as lowercase. This directive is backed by g_utf8_strdown(), so it converts non-ASCII Unicode characters as well.
  • P<id>{length[,pad_char]}: argument id is padded with pad_char on the left until it reaches length. Space (' ') is used as pad_char if it is not given.
  • p<id>{length[,pad_char]}: argument id is padded with pad_char on the right until it reaches length. Space (' ') is used as pad_char if it is not given.
  • X<id>[{length}]: outputs argument id as hexadecimal if it contains a literal integer. If length is given, the result is padded with 0 on the left. Note that NULL is returned if argument id cannot be converted by strtol().
  • T<id>: outputs argument id as a UTF-8 string if it contains a literal integer. Note that NULL is returned if argument id cannot be converted by strtol() and g_unichar_to_utf8().
  • S<id>{beginIndex[,length]}: outputs the substring of argument id that begins at beginIndex. If length is not given, it outputs up to the end of argument id.
  • I<id>{compare_str,true_substitute[,false_substitute]}: outputs true_substitute if compare_str is identical to argument id; otherwise outputs false_substitute if given.
  • +<id>: if argument id is nonempty, adds 1 to the provided counter and outputs the number. If argument id is empty, outputs an empty string.
  • -<id>: if argument id is nonempty, subtracts 1 from the provided counter and outputs the number. If argument id is empty, outputs an empty string.
  • $: outputs a '$' character.

Character '$' is also an escape character. For example, '$N1{${}' outputs '{' if argument 1 is nonempty.

This function is similar to sprintf(), except:

  • This function accepts a fixed-length StringList.
  • This function provides conditional control.
  • This function supports UTF-8 case changing.
  • This function is capable of using a counter.
  • This function returns a newly allocated result string.
  • Format directives can be nested in this function.

Parameters:
format The format used to produce the output.
sList The StringList that holds the arguments.
counter_ptr A pointer to an integer counter. Can be NULL if the + or - flags are not required.
Returns:
a newly allocated result string; or NULL if error occurs.
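To make the directive syntax concrete, the sketch below expands only the two simplest forms, $<id> and $$, against a plain array of strings. It is a simplified illustration under stated assumptions and not the library's implementation: expand_plain_directives() is a hypothetical name, the StringList is replaced by a C array, and none of the flags, options, nesting, or counter handling are covered.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified expander: handles only "$<id>" and "$$".
 * Returns a newly allocated string, or NULL on error (id out of range
 * or allocation failure). */
static char *expand_plain_directives(const char *format,
                                     const char *const *args, size_t n_args)
{
    assert(format != NULL);
    size_t cap = strlen(format) + 1, len = 0;
    char *out = malloc(cap);
    if (out == NULL)
        return NULL;

    for (const char *p = format; *p != '\0'; ) {
        const char *piece = p;   /* text appended in this step */
        size_t piece_len = 1;

        if (*p == '$' && p[1] == '$') {
            p += 2;                              /* "$$" -> one '$' */
        } else if (*p == '$' && isdigit((unsigned char)p[1])) {
            char *end;
            unsigned long id = strtoul(p + 1, &end, 10);
            if (id >= n_args) {                  /* no such argument */
                free(out);
                return NULL;
            }
            piece = args[id] ? args[id] : "";    /* NULL counts as empty */
            piece_len = strlen(piece);
            p = end;
        } else {
            p += 1;                              /* ordinary character */
        }

        if (len + piece_len + 1 > cap) {         /* grow the buffer */
            cap = (len + piece_len + 1) * 2;
            char *grown = realloc(out, cap);
            if (grown == NULL) {
                free(out);
                return NULL;
            }
            out = grown;
        }
        memcpy(out + len, piece, piece_len);
        len += piece_len;
    }
    out[len] = '\0';
    return out;
}
```

With arguments "whole", "foo", and "bar", the format "$2-$1 costs $$5" expands to "bar-foo costs $5". The caller frees the result with free().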

gboolean string_is_decomposed_fast ( const gchar *  str  ) 

The main purpose of this function is to provide a quick check of whether str is decomposed, so developers can decide whether to leave it as is or recompose it. However, it only compares the lengths before and after normalization, and nothing beyond that. Use this function with care.

Parameters:
str The string to be checked.
Returns:
TRUE if the string is decomposed, FALSE otherwise.

gchar* string_padding_left ( const gchar *  str,
const gchar *  padding_str,
size_t  length 
)

Note that if padding_str is multi-byte, the padding will not exceed the length.

Parameters:
str original string.
padding_str string to be padded on the left.
length length in bytes.
Returns:
A newly allocated string that stores the padded string.
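One way to read the multi-byte note is that only whole copies of padding_str are added, so the result never grows past length bytes. The sketch below implements that reading; it is an assumption about the behavior rather than the library's code, and pad_left() is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for string_padding_left(): prepends as many whole
 * copies of padding_str as fit within `length` bytes. If str already
 * reaches `length`, the result is a plain copy of str. */
static char *pad_left(const char *str, const char *padding_str, size_t length)
{
    assert(str != NULL && padding_str != NULL);
    size_t str_len = strlen(str);
    size_t pad_len = strlen(padding_str);
    size_t copies = 0;

    if (pad_len > 0 && str_len < length)
        copies = (length - str_len) / pad_len;   /* whole copies only */

    char *out = malloc(copies * pad_len + str_len + 1);
    if (out == NULL)
        return NULL;

    char *w = out;
    for (size_t i = 0; i < copies; i++) {
        memcpy(w, padding_str, pad_len);
        w += pad_len;
    }
    memcpy(w, str, str_len + 1);                 /* includes the '\0' */
    return out;
}
```

Padding on the right under the same assumption only changes the order of the two copy steps.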

gchar* string_padding_right ( const gchar *  str,
const gchar *  padding_str,
size_t  length 
)

Note that if padding_str is multi-byte, the padding will not exceed the length.

Parameters:
str original string.
padding_str string to be padded on the right.
length length in bytes.
Returns:
A newly allocated string that stores the padded string.

void string_trim ( gchar *  str  ) 

Note that the content of str might be changed. Use strdup() or g_strdup() to make a backup first.

Parameters:
str String to be trimmed.
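Because the trim happens in place, the caller's buffer is overwritten. A minimal sketch of such an in-place trim, using isspace() to decide what counts as whitespace (an assumption; the library may use a different test), with the hypothetical name trim_in_place():

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical stand-in for string_trim(): removes leading and trailing
 * whitespace by shifting the kept text to the start of the buffer. */
static void trim_in_place(char *str)
{
    assert(str != NULL);
    char *start = str;
    while (isspace((unsigned char)*start))
        start++;                         /* skip leading whitespace */

    size_t n = strlen(start);
    while (n > 0 && isspace((unsigned char)start[n - 1]))
        n--;                             /* drop trailing whitespace */

    memmove(str, start, n);              /* regions may overlap */
    str[n] = '\0';
}
```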

void stringList_clear ( StringList *  sList  ) 

Parameters:
sList The StringList to be processed.

int stringList_find_string ( StringList *  sList,
const char *  str 
)

This function returns the index of the first matching string in the StringList, or -1 if there is no such string.

Parameters:
sList The StringList to be processed.
str The string to be found.
Returns:
The index of the string from 0 if found; -1 otherwise.

void stringList_free ( StringList *  sList  ) 

Note that this function assumes that sList is not NULL. Use if (sList) stringList_free(sList); to tolerate a NULL parameter.

Parameters:
sList The StringList to be processed.

gboolean stringList_has_string ( StringList *  sList,
const gchar *  str 
)

Return TRUE if at least one string in sList is identical with str; otherwise return FALSE.

Parameters:
sList The StringList to be processed.
str The string to be found.
Returns:
TRUE if at least one string in sList is identical with str; FALSE otherwise.

const gchar* stringList_index ( StringList *  sList,
guint  index 
)

Parameters:
sList The StringList to be processed.
index The given index.
Returns:
The string at the given index.

guint stringList_insert ( StringList *  sList,
const gchar *  str 
)

This function inserts a copy of str into the string list and returns the index of the string, which is also the last element in the string list.

It calls g_string_chunk_insert() to insert a copy of the string, then appends the pointer returned by g_string_chunk_insert() to the pointer array. After that, this function returns the index of the appended string pointer.

The difference between this function and stringList_insert_const() is that each inserted identical string has its own space.

Parameters:
sList The StringList to be processed.
str String to be inserted, can be NULL.
Returns:
the index of the newly inserted string.
See also:
stringList_insert_const()

guint stringList_insert_const ( StringList *  sList,
const gchar *  str 
)

This function inserts a copy of str into the string list, unless an identical string has previously been inserted by this function. It is useful if you do not want to spend extra space storing identical strings.

If an identical string was previously inserted by this function, which means the string is in the const hash table, the pointer to the previously inserted string is appended to the pointer array. If no identical string has been inserted, a copy of str is inserted into the string list by g_string_chunk_insert_const(), and the returned pointer is appended to the pointer array. After that, this function returns the index of the appended string pointer.

Note:
It does not check for duplicates of strings inserted by stringList_insert().
Parameters:
sList The StringList to be processed.
str String to be inserted.
Returns:
the index of the newly inserted string.
See also:
stringList_insert()
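The difference between the two insert functions can be illustrated with a small stand-alone model. This sketch does not use GLib: the string chunk and const hash table are replaced by malloc() and a linear scan, and TinyList, tiny_insert(), tiny_insert_const(), tiny_free(), and MAX_ITEMS are all hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ITEMS 16   /* fixed capacity keeps the sketch short */

typedef struct {
    char  *ptrs[MAX_ITEMS];    /* pointer array: one entry per insert */
    int    shared[MAX_ITEMS];  /* 1 if stored through the const path */
    int    owner[MAX_ITEMS];   /* 1 if this slot owns its allocation */
    size_t len;
} TinyList;

static char *dup_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    assert(copy != NULL);      /* the sketch does not recover from OOM */
    memcpy(copy, s, n);
    return copy;
}

/* Always stores a fresh copy, like the documented stringList_insert(). */
static size_t tiny_insert(TinyList *l, const char *str)
{
    assert(l->len < MAX_ITEMS);
    l->ptrs[l->len] = dup_string(str);
    l->shared[l->len] = 0;
    l->owner[l->len] = 1;
    return l->len++;
}

/* Reuses the pointer of an identical string stored earlier through this
 * same function; strings added by tiny_insert() are not considered. */
static size_t tiny_insert_const(TinyList *l, const char *str)
{
    assert(l->len < MAX_ITEMS);
    for (size_t i = 0; i < l->len; i++) {
        if (l->shared[i] && strcmp(l->ptrs[i], str) == 0) {
            l->ptrs[l->len] = l->ptrs[i];   /* share the existing copy */
            l->shared[l->len] = 1;
            l->owner[l->len] = 0;
            return l->len++;
        }
    }
    l->ptrs[l->len] = dup_string(str);
    l->shared[l->len] = 1;
    l->owner[l->len] = 1;
    return l->len++;
}

static void tiny_free(TinyList *l)
{
    for (size_t i = 0; i < l->len; i++)
        if (l->owner[i])
            free(l->ptrs[i]);
    l->len = 0;
}
```

Every insert still gets its own index; only the storage behind the pointer is shared on the const path.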

StringList* stringList_new (  ) 

Returns:
the pointer to the allocated space.

StringList* stringList_sized_new ( size_t  chunk_size,
size_t  element_count 
)

This function allocates space for a StringList with the given sizes, thus avoiding frequent reallocation.

Parameters:
chunk_size Size in bytes required for string storage.
element_count Number of strings.
Returns:
the pointer to the allocated space.

gchar** stringList_to_charPointerPointer ( StringList *  sList  ) 

This function returns a char ** that points to the places where the strings are stored. The pointer points directly to content in the StringList instance, so the returned pointer should not be freed.

Use stringList_free() to free it instead.

Parameters:
sList The StringList to be processed.
Returns:
A char ** that points to the list of strings.

gchar* subString ( const gchar *  str,
int  beginIndex,
int  length 
)

The substring begins at the specified beginIndex and ends after length bytes. The index starts from zero. If length is negative, then a substring from beginIndex to the end of str is returned.

Parameters:
str String to be processed.
beginIndex The beginning index, inclusive.
length Total bytes to copy, excluding the trailing '\0'.
Returns:
A newly allocated string which is a substring of str.
See also:
subString_buffer()
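The index and length rules can be sketched as follows. This is an illustration of the documented semantics, not the library's code: substring_copy() is a hypothetical name, and two behaviors are assumptions of the sketch because the documentation does not state them: a start index past the end returns NULL, and a length that runs past the end is clamped.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for subString(): copies `length` bytes starting at
 * `begin_index`, or everything up to the end when `length` is negative. */
static char *substring_copy(const char *str, int begin_index, int length)
{
    assert(str != NULL && begin_index >= 0);
    size_t total = strlen(str);
    if ((size_t)begin_index > total)
        return NULL;                    /* assumption: start out of range */

    size_t avail = total - (size_t)begin_index;
    size_t n = (length < 0 || (size_t)length > avail)
                   ? avail              /* negative or too long: to the end */
                   : (size_t)length;

    char *out = malloc(n + 1);          /* +1 for the trailing '\0' */
    if (out == NULL)
        return NULL;
    memcpy(out, str + begin_index, n);
    out[n] = '\0';
    return out;
}
```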

gchar* subString_buffer ( gchar *  buf,
const gchar *  str,
int  beginIndex,
int  length 
)

This function is similar to subString(), except that it stores the result in the developer-provided buffer.

Make sure the buffer holds at least length+1 bytes (including the trailing '\0'), or strlen(str)-beginIndex+1 bytes if length is negative.

Parameters:
buf Buffer that stores the result.
str String to be processed.
beginIndex The beginning index, inclusive.
length Total bytes to copy.
Returns:
The buffer that stores the result.
See also:
subString()

char* ucs4_to_utf8 ( gunichar  ucs4_code  ) 

Parameters:
ucs4_code the UCS-4 to be converted.
Returns:
UTF-8 string converted from the UCS-4 code. Free it with g_free() after use.
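For readers unfamiliar with the conversion, the sketch below encodes one code point into UTF-8 bytes by hand. It illustrates the encoding only and is not how the library performs it; encode_utf8() is a hypothetical name, the result is released with free() rather than g_free(), and surrogate code points are not rejected.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated, NUL-terminated UTF-8
 * encoding of one code point (at most 4 bytes plus the terminator). */
static char *encode_utf8(uint32_t cp)
{
    assert(cp <= 0x10FFFF);
    char *out = calloc(5, 1);            /* zero-filled, so already terminated */
    if (out == NULL)
        return NULL;

    if (cp < 0x80) {                     /* 1 byte: 0xxxxxxx */
        out[0] = (char)cp;
    } else if (cp < 0x800) {             /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {           /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
    } else {                             /* 4 bytes: 11110xxx then 3 x 10xxxxxx */
        out[0] = (char)(0xF0 | (cp >> 18));
        out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (char)(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For example, U+4E00 encodes to the three bytes E4 B8 80.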

char* unsignedStr_to_signedStr ( const unsigned char *  str  ) 

Parameters:
str Unsigned char string.
Returns:
A newly allocated signed char string.
See also:
signedStr_to_unsignedStr()

signedStr_to_unsignedStr_buffer()

unsignedStr_to_signedStr_buffer()

gchar* unsignedStr_to_signedStr_buffer ( gchar *  resultBuf,
const unsigned char *  str 
)

Parameters:
resultBuf The buffer that stores the conversion result.
str Unsigned char string.
See also:
signedStr_to_unsignedStr()

signedStr_to_unsignedStr_buffer()

unsignedStr_to_signedStr()

gchar* utf8_concat_ucs4 ( gchar *  utf8_str,
gunichar  ucs4_code 
)

Parameters:
utf8_str the UTF-8 string.
ucs4_code the UCS-4 to be appended.
Returns:
A pointer to utf8_str.

gunichar* utf8_to_ucs4 ( const char *  utf8_str  ) 

Parameters:
utf8_str the UTF-8 string to be converted.
Returns:
UCS-4 representation of the UTF-8 string.


Generated on Tue Jan 13 10:49:25 2009 for libUnihan by  doxygen 1.5.7.1