charset: fix mm_charset_get_encoded_len

The while loop in mm_charset_get_encoded_len() iterates through each valid UTF-8 encoded character in the given NULL-terminated UTF-8 string. It uses g_utf8_find_next_char() to find the position of the next character. In case, g_utf8_find_next_char() returns NULL, it tries to find the end (i.e. the NULL character) of the string. This patch fixes the following issues in the while loop: 1. The loop uses both 'next' and 'end' to track the position of the next character in the string. When g_utf8_find_next_char() returns a non-NULL value, 'next' is essentially the same as 'end'. When g_utf8_find_next_char() returns NULL, 'next' remains NULL while 'end' is adjusted to track the end of the string (but is done incorrectly as described in #2). After the 'p = next' assignment, the 'while (*p)' check results in a NULL dereference. 'p' should thus be set to 'end' instead of 'next'. 'next' is thus redundant and can be removed. 2. When g_utf8_find_next_char() returns NULL and 'end' is adjusted to track the end of string, the 'while (*end++)' loop stops when finding the NULL character, but 'end' is advanced past the NULL character. After the 'p = end' assignment, the 'while (*p)' check results in a dereference of out-of-bounds pointer. 'while (*++end)' should be used instead given that 'p' doesn't point to a NULL character when 'end = p' happens. 'end' will be updated to point to the NULL character. After the 'p = end' assignment (fixed in #1), the 'while (*p)' check will properly stop the loop.
author: Ben Chan <benchan@chromium.org> 2017-08-11 15:17:30 -0700
committer: Aleksander Morgado <aleksander@aleksander.es> 2017-08-15 09:37:24 +0200
commit: 7ae32d6c61b7bd611b5b18626204f51d178b4b11 (patch)
tree: 7a883c4bc208973a26b562abbda0c71deeb4f749 /src/mm-charsets.c
parent: d639359320736dd582a2dbb5c50039a8a1b878da (diff)
1 files changed, 5 insertions, 5 deletions
diff --git a/src/mm-charsets.c b/src/mm-charsets.c
index 191789d7..fb47b0ae 100644
--- a/src/mm-charsets.c
+++ b/src/mm-charsets.c
@@ -595,7 +595,7 @@ mm_charset_get_encoded_len (const char *utf8,
                             MMModemCharset charset,
                             guint *out_unsupported)
 {
-    const char *p = utf8, *next;
+    const char *p = utf8;
     guint len = 0, unsupported = 0;
     SubsetEntry *e;
 
@@ -618,17 +618,17 @@ mm_charset_get_encoded_len (const char *utf8,
 
         c = g_utf8_get_char_validated (p, -1);
         g_return_val_if_fail (c != (gunichar) -1, 0);
-        end = next = g_utf8_find_next_char (p, NULL);
+        end = g_utf8_find_next_char (p, NULL);
         if (end == NULL) {
-            /* Find the end... */
+            /* Find the string terminating NULL */
             end = p;
-            while (*end++);
+            while (*++end);
         }
 
         if (!e->func (c, p, (end - p), &clen))
             unsupported++;
         len += clen;
-        p = next;
+        p = end;
     }
 
     if (out_unsupported)
author	Ben Chan <benchan@chromium.org>	2017-08-11 15:17:30 -0700
committer	Aleksander Morgado <aleksander@aleksander.es>	2017-08-15 09:37:24 +0200
commit	7ae32d6c61b7bd611b5b18626204f51d178b4b11 (patch)
tree	7a883c4bc208973a26b562abbda0c71deeb4f749 /src/mm-charsets.c
parent	d639359320736dd582a2dbb5c50039a8a1b878da (diff)