Re: String encoding questions


Subject: Re: String encoding questions
From: David Mandelin (mandelin@cs.wisc.edu)
Date: Thu Aug 23 2001 - 12:54:31 CDT


Dom Lachowicz wrote:
>
> I will update UT_convert as per how I feel is appropriate. I haven't really
> touched it in a while.

OK. How about this patch for the comment and parameter names?

Index: af/util/xp/ut_iconv.cpp
===================================================================
RCS file: /cvsroot/abi/src/af/util/xp/ut_iconv.cpp,v
retrieving revision 1.13
diff -u -r1.13 ut_iconv.cpp
--- af/util/xp/ut_iconv.cpp 2001/08/20 19:55:58 1.13
+++ af/util/xp/ut_iconv.cpp 2001/08/23 17:51:30
@@ -164,26 +164,31 @@
  * Borrowed from GLib 2.0 and modified
  *
  * str - Pointer to the input string.
- * len - Length of the input string to convert. If len
- * is zero the whole string (strlen(str) ) is used for conversion.
- * to_codeset - The "codeset" of the string pointed to by 'str'.
- * from_codeset - The "codeset" we want for the output.
+ * len - Length of the input string to convert.
+ * from_codeset - The "codeset" of the string pointed to by 'str'.
+ * to_codeset - The "codeset" we want for the output.
  * bytes_read - optional, supply NULL if you don't want this.
  * bytes_written - optional, supply NULL if you don't want this.
- * Returns a freshly allocated output string, which is 0-terminated
- * (though I am not sure that has any significance in the general case).
+ *
+ * Returns a freshly allocated output string, which is terminated by
+ * a zero byte. Note that if the output codeset's terminator is not
+ * a zero byte (e.g., UCS-2, where it is two zero bytes), you can
+ * get correct termination by including the input string's terminator
+ * in the length passed as 'len'. E.g., if 'str' is null-terminated
+ * US-ASCII "foo", give 'len' as 4.
+ *
  * TODO: Check for out-of-memory allocations etc.
  */
 extern "C"
 char * UT_convert(const char* str,
                   UT_uint32 len,
-                  const char* to_codeset,
                   const char* from_codeset,
+                  const char* to_codeset,
                   UT_uint32* bytes_read_arg,
                   UT_uint32* bytes_written_arg)
 {
 
-        if (!str || !to_codeset || !from_codeset)
+        if (!str || !from_codeset || !to_codeset)
         {
                 return NULL;
         }
@@ -198,7 +203,7 @@
 
         UT_TRY
           {
-            auto_iconv cd(to_codeset, from_codeset);
+            auto_iconv cd(from_codeset, to_codeset);
 
             if (len < 0)
               {
Index: af/util/xp/ut_iconv.h
===================================================================
RCS file: /cvsroot/abi/src/af/util/xp/ut_iconv.h,v
retrieving revision 1.8
diff -u -r1.8 ut_iconv.h
--- af/util/xp/ut_iconv.h 2001/08/10 18:32:38 1.8
+++ af/util/xp/ut_iconv.h 2001/08/23 17:51:30
@@ -72,8 +72,8 @@
 
 char * UT_convert (const char *str,
                         UT_uint32 len,
-                        const char *to_codeset,
                         const char *from_codeset,
+                        const char *to_codeset,
                         UT_uint32 *bytes_read,
                         UT_uint32 *bytes_written);
 
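To illustrate the termination note in the comment, here is a minimal standalone sketch that calls POSIX iconv directly rather than UT_convert. The names convert_bytes and demo_termination are hypothetical and are not part of the AbiWord tree. It also assumes the platform's iconv accepts the codeset names "US-ASCII" and "UCS-2LE", and that iconv() takes a char** input pointer (as glibc does). The output codeset is pinned to little-endian so the byte layout is predictable.

```cpp
#include <iconv.h>

#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not UT_convert): converts 'len' bytes of 'str'
// from 'from_codeset' to 'to_codeset'. Returns the converted bytes with
// no terminator appended, or an empty vector on any failure.
std::vector<char> convert_bytes(const char* str, std::size_t len,
                                const char* from_codeset,
                                const char* to_codeset)
{
    std::vector<char> out;
    if (!str || !from_codeset || !to_codeset)
        return out;

    // iconv_open() takes the *target* codeset first, then the source.
    iconv_t cd = iconv_open(to_codeset, from_codeset);
    if (cd == (iconv_t)-1)
        return out;

    // Assumption for this sketch: 4 output bytes per input byte plus a
    // little slack is enough. That holds for ASCII -> UCS-2 but is not
    // a general bound for every codeset pair.
    out.resize(len * 4 + 8);

    char* in_ptr = const_cast<char*>(str);
    std::size_t in_left = len;
    char* out_ptr = out.data();
    std::size_t out_left = out.size();

    std::size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);

    if (rc == (std::size_t)-1)
    {
        out.clear();
        return out;
    }

    // Shrink to the number of bytes actually written.
    out.resize(out.size() - out_left);
    return out;
}

// Shows the effect of including the input terminator in 'len'.
void demo_termination()
{
    // len == 4 covers 'f', 'o', 'o' and the trailing '\0', so the
    // output ends with a two-byte UCS-2 terminator.
    std::vector<char> with_nul = convert_bytes("foo", 4, "US-ASCII", "UCS-2LE");
    assert(with_nul.size() == 8);
    assert(with_nul[6] == 0 && with_nul[7] == 0);

    // len == 3 stops before the '\0', so no terminator is produced by
    // the conversion itself.
    std::vector<char> no_nul = convert_bytes("foo", 3, "US-ASCII", "UCS-2LE");
    assert(no_nul.size() == 6);
}
```

Calling demo_termination() from main() exercises both cases. Unlike UT_convert as described in the comment, this sketch does not append a single zero byte of its own, so any terminator in the output comes only from the converted input.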


