path: root/src/mm-charsets.c
Age  Commit message  Author
2023-05-18charsets: fix read of uninitialized memory in gsm unpacked conversionAleksander Morgado
==1==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x59c6c88a31ef in gsm_ext_char_to_utf8 src/mm-charsets.c:256:13
    #1 0x59c6c88a31ef in charset_gsm_unpacked_to_utf8 src/mm-charsets.c:339:20
    #2 0x59c6c88a31ef in mm_modem_charset_bytearray_to_utf8 src/mm-charsets.c:857:30
    #3 0x59c6c889babd in sms_decode_address src/mm-sms-part-3gpp.c:143:16
    #4 0x59c6c8899d3a in mm_sms_part_3gpp_new_from_binary_pdu src/mm-sms-part-3gpp.c:514:15
2022-09-13sms: fix splitting messages into chunks in gsm7 encodingAndrey Skvortsov
1) Not every allowed GSM7 character takes one byte in UTF-8 encoding. Some (for example, 'à') take several bytes in the input string, but a single byte in GSM7.
2) Extended characters in GSM7 encoding take two bytes.

Otherwise, sending for example the following SMS fails:

```
mmcli -m a --messaging-create-sms="text='[wwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww',number='+XXXXXXXXXXX'"
Successfully created new SMS: /org/freedesktop/ModemManager1/SMS/99
mmcli --send -s 99
error: couldn't send the SMS: 'GDBus.Error:org.freedesktop.libqmi.Error.Protocol.WmsEncoding: Couldn't write SMS part: QMI protocol error (58): 'WmsEncoding''
```

```
mmcli -m a --messaging-create-sms="text='|àààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààà',number='+XXXXXXXXXXX'"
Successfully created new SMS: /org/freedesktop/ModemManager1/SMS/72
mmcli --send -s 72
error: couldn't send the SMS: 'GDBus.Error:org.freedesktop.ModemManager1.Error.Core.InvalidArgs: Couldn't convert UTF-8 to GSM: input UTF-8 validation failed'
```
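The two points in this commit message can be illustrated with a small C sketch that counts how many GSM7 septets a UTF-8 string needs. It is not the ModemManager implementation: the function names are hypothetical, plain C types stand in for GLib ones, and the character table is deliberately incomplete (space, digits, ASCII letters, 'à' and the extension characters only).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence at s into *cp; return the bytes consumed,
 * or 0 on malformed input. Simplified: overlong forms are not rejected. */
static size_t
utf8_decode (const unsigned char *s, uint32_t *cp)
{
    size_t n, i;

    if (s[0] < 0x80)                { *cp = s[0];        n = 1; }
    else if ((s[0] & 0xE0) == 0xC0) { *cp = s[0] & 0x1F; n = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { *cp = s[0] & 0x0F; n = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { *cp = s[0] & 0x07; n = 4; }
    else return 0;

    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }
    return n;
}

/* Septets needed for one code point: 1 for the basic table, 2 for the
 * extension table (escape 0x1B plus one septet), 0 if this sketch does
 * not know the character. */
static unsigned
gsm7_septets_for (uint32_t cp)
{
    switch (cp) {
    case '^': case '{': case '}': case '\\':
    case '[': case '~': case ']': case '|':
    case 0x20AC: /* euro sign */
        return 2;
    case 0x00E0: /* 'à': two bytes in UTF-8, one septet in GSM7 */
        return 1;
    default:
        break;
    }
    /* Incomplete subset of the basic table */
    if (cp == ' ' || (cp >= '0' && cp <= '9') ||
        (cp >= 'A' && cp <= 'Z') || (cp >= 'a' && cp <= 'z'))
        return 1;
    return 0;
}

/* Total septets for a NUL-terminated UTF-8 string, or -1 if any
 * character is malformed or unknown to this sketch. */
long
gsm7_count_septets (const char *utf8)
{
    const unsigned char *p = (const unsigned char *) utf8;
    long total = 0;

    assert (utf8 != NULL);
    while (*p) {
        uint32_t cp;
        size_t n = utf8_decode (p, &cp);
        unsigned septets = n ? gsm7_septets_for (cp) : 0;

        if (!septets)
            return -1;
        total += septets;
        p += n;
    }
    return total;
}
```

With this sketch, "[w" counts as 3 septets although it is 2 bytes of UTF-8, while "à" counts as 1 septet although it is 2 bytes of UTF-8. A splitter that measures chunks in input bytes gets both cases wrong.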
2022-09-13charsets: move mm_sms_part_3gpp_util_split_text to mm_charset_util_split_textAndrey Skvortsov
2022-02-17charsets: set error if UTF-8 validation failsAleksander Morgado
Otherwise, mm_modem_charset_bytearray_to_utf8() may return NULL without error set, and that will trigger a crash in the caller. Fixes https://gitlab.freedesktop.org/mobile-broadband/ModemManager/-/issues/511
2021-02-23charsets: detect iconv() support in runtimeAleksander Morgado
The only purpose of this is to log what we found, nothing else, as a quick way to detect platform support for the charsets we need.
2021-02-23charsets: define common translit fallback characterAleksander Morgado
2021-02-23charsets: remove charset_hex_to_utf8()Aleksander Morgado
No longer used, replaced by the new common conversion methods.
2021-02-23charsets: remove take_and_convert methodsAleksander Morgado
These methods worked in a very strict way for some encodings, and in a very very loose way for others. E.g. when converting from hex-encoded UCS-2, we would attempt to convert as much text as we could even if the input string was truly not even close to UCS-2. This kind of "do our best" could make sense when processing e.g. the operator name reported by the modem, as that is some string to show to the user and there may be no strict requirement to have it perfectly fine. But the kind of loose comparison done for UCS-2 doesn't make sense e.g. when converting USSD responses or SMS messages.
2021-02-23charsets: use new bytearray_to_utf8() instead of byte_array_to_utf8()Aleksander Morgado
2021-02-23charsets: make charset_gsm_unpacked_to_utf8() privateAleksander Morgado
Use the generic mm_modem_charset_bytearray_to_utf8() instead.
2021-02-23charsets: use new bytearray_from_utf8() instead of byte_array_append()Aleksander Morgado
2021-02-23charsets: make charset_utf8_to_unpacked_gsm() privateAleksander Morgado
Use the generic mm_modem_charset_bytearray_from_utf8() instead.
2021-02-23charsets: new common APIs to convert from/to charsets and UTF-8Aleksander Morgado
2021-02-23charsets: avoid //TRANSLIT when converting to/from charsetsAleksander Morgado
The //TRANSLIT extension is not always supported by the different iconv() implementations that we may find out there, so let's completely avoid using it. For some of the charsets it actually didn't make much sense anyway, e.g. as converting to UTF-16 or UTF-8 would always be possible without requiring //TRANSLIT to take effect. The //TRANSLIT extension was also being used sometimes in the source charset identification, which was also not fully correct, as we would only expect it in the target charset identification.
2021-02-23charsets: make translit optional in utf8_to_unpacked_gsm()Aleksander Morgado
If the conversion is not fully compatible, the caller must explicitly request transliteration; otherwise this method returns an error.
2021-02-23charsets: make translit optional in gsm_unpacked_to_utf8()Aleksander Morgado
Until now, this method would automatically apply transliteration; i.e. replacing characters with '?' when no direct translation was available. We can attempt to do that transliteration on strings that are not critical, e.g. the operator name reported by the network. But we should not do that on other types of strings, e.g. on SMS contents that may really have purposes other than just being human-readable. This commit requires the caller to request transliteration explicitly.
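The behaviour described above, where '?' replacement happens only on request, might look roughly like the following C sketch. The names and signature are hypothetical and do not reproduce the ModemManager function. The mapping is limited to the septets whose GSM7 value equals the ASCII code, so anything else is treated as "no direct translation" purely for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mapping: only septets identical to ASCII are known.
 * Returns 0 when this sketch has no direct translation. */
static char
septet_to_ascii (uint8_t s)
{
    if ((s >= 0x20 && s <= 0x23) || (s >= 0x25 && s <= 0x3F) ||
        (s >= 0x41 && s <= 0x5A) || (s >= 0x61 && s <= 0x7A))
        return (char) s;
    return 0;
}

/* Convert len unpacked septets into out (which must hold len + 1 bytes).
 * With translit enabled, unknown septets become '?'. With translit
 * disabled, the first unknown septet makes the conversion fail. */
bool
sketch_gsm_unpacked_to_text (const uint8_t *gsm,
                             size_t         len,
                             bool           translit,
                             char          *out)
{
    size_t i;

    for (i = 0; i < len; i++) {
        char c = septet_to_ascii (gsm[i]);

        if (!c) {
            if (!translit) {
                out[0] = '\0';
                return false;
            }
            c = '?';
        }
        out[i] = c;
    }
    out[len] = '\0';
    return true;
}
```

A caller showing an operator name could pass translit as true and accept the '?' characters, while a caller decoding SMS contents would pass false and handle the failure.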
2021-02-23libmm-glib,common-helpers: make hexstr2bin() return a guint8 arrayAleksander Morgado
It makes much more sense than returning a gchar array, as gchar is signed.
2021-02-23libmm-glib,common-helpers: make hexstr2bin() accept input string lengthAleksander Morgado
Optionally given explicitly, and -1 can be used to assume it's NUL-terminated.
2021-02-23libmm-glib,common-helpers: make hexstr2bin() return a GErrorAleksander Morgado
This util method checks whether the input string is a valid hex string, so make sure we return a GError on failure.
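The three hexstr2bin() changes above (unsigned output, optional explicit length, error reporting) can be sketched together in plain C. This is an illustration only: the function name and signature are hypothetical, uint8_t stands in for guint8, a static message stands in for GError, and rejecting an empty input is a choice made for this sketch.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit */
static int
hex_value (char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Convert a hex string into a newly allocated byte array.
 * len is the number of input characters to use, or -1 if hex is
 * NUL-terminated. On success returns the buffer (to be released with
 * free()) and stores its size in *out_len. On failure returns NULL and
 * stores a static message in *error. */
uint8_t *
sketch_hexstr2bin (const char  *hex,
                   long         len,
                   size_t      *out_len,
                   const char **error)
{
    size_t   n, i;
    uint8_t *buf;

    n = (len < 0) ? strlen (hex) : (size_t) len;
    if (n == 0 || n % 2 != 0) {
        *error = "input length must be a non-zero multiple of 2";
        return NULL;
    }

    buf = malloc (n / 2);
    if (!buf) {
        *error = "out of memory";
        return NULL;
    }

    for (i = 0; i < n; i += 2) {
        int hi = hex_value (hex[i]);
        int lo = hex_value (hex[i + 1]);

        if (hi < 0 || lo < 0) {
            free (buf);
            *error = "invalid hex character";
            return NULL;
        }
        buf[i / 2] = (uint8_t) ((hi << 4) | lo);
    }

    *out_len = n / 2;
    return buf;
}
```

The unsigned element type matters for values such as 0xE9: stored in a signed char it would read back as a negative number, which makes comparisons and table lookups error-prone.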
2021-02-23charsets: remove unused charset_utf8_to_hex() methodAleksander Morgado
2021-02-23charsets: don't allow quoting in byte_array_append()Aleksander Morgado
There's no point in adding a quoting option to this method; if the caller needs the appended string quoted, it should quote it before passing it to this method. It was not used anywhere anyway.
2021-02-23charsets: remove HEX charset typeAleksander Morgado
This is no real charset, it is the fake one we used to represent a UCS2 hex-encoded string.
2020-08-20charsets: refactor coding styleAleksander Morgado
Mostly to use GLib types like gchar or gint, and also to use G_N_ELEMENTS() instead of custom end of array terminating items.
2020-08-20charsets: add UTF-16BE as a possible modem charsetAleksander Morgado
Just as an implementation detail to be taken as an extension of UCS2BE, never really to be used as a real modem charset.
2020-05-26charsets: take_and_convert() methods should support GSM encodingAleksander Morgado
The iconv() operation would fail because we wouldn't give any proper charset string to convert to/from. Use our custom GSM encoding support instead.
2020-05-26charsets: don't warn in unlikely case of needing to convert to HEX from UTF-8Aleksander Morgado
This would really be an implementation detail, not a real use case. Just don't warn in this case, as in the conversion in the opposite direction.
2020-05-26Revert "charsets: don't warn in unlikely case of needing to convert to HEX from UTF-8"Aleksander Morgado
This reverts commit 6a7dd87f30b2cc1b459abab38a0805aa8ba1bfbc. Reverting because the merge request was set to squash all together....
2020-05-26charsets: don't warn in unlikely case of needing to convert to HEX from UTF-8Giacinto Cifelli
This would really be an implementation detail, not a real use case. Just don't warn in this case, as in the conversion in the opposite direction.
2020-04-08charsets: report GError in byte_array_append() failuresAleksander Morgado
2020-01-31charsets: fix warnings with -Wswitch-defaultAleksander Morgado
mm-charsets.c: In function ‘mm_charset_take_and_convert_to_utf8’:
mm-charsets.c:730:5: error: switch missing default case [-Werror=switch-default]
  730 |     switch (charset) {
      |     ^~~~~~
mm-charsets.c: In function ‘mm_utf8_take_and_convert_to_charset’:
mm-charsets.c:852:5: error: switch missing default case [-Werror=switch-default]
  852 |     switch (charset) {
      |     ^~~~~~
2020-01-31charsets: fix warnings with -Wsign-compareAleksander Morgado
mm-charsets.c: In function ‘mm_charset_gsm_unpacked_to_utf8’:
mm-charsets.c:423:19: error: comparison of integer expressions of different signedness: ‘int’ and ‘guint32’ {aka ‘unsigned int’} [-Werror=sign-compare]
  423 |     for (i = 0; i < len; i++) {
      |                   ^
mm-charsets.c: In function ‘pccp437_is_subset’:
mm-charsets.c:544:19: error: comparison of integer expressions of different signedness: ‘int’ and ‘long unsigned int’ [-Werror=sign-compare]
  544 |     for (i = 0; i < G_N_ELEMENTS (t); i++) {
      |                   ^
mm-charsets.c: In function ‘pcdn_is_subset’:
mm-charsets.c:575:19: error: comparison of integer expressions of different signedness: ‘int’ and ‘long unsigned int’ [-Werror=sign-compare]
  575 |     for (i = 0; i < sizeof (t) / sizeof (t[0]); i++) {
      |                   ^
mm-charsets.c: In function ‘mm_charset_gsm_unpack’:
mm-charsets.c:657:19: error: comparison of integer expressions of different signedness: ‘int’ and ‘guint32’ {aka ‘unsigned int’} [-Werror=sign-compare]
  657 |     for (i = 0; i < num_septets; i++) {
      |                   ^
mm-charsets.c: In function ‘mm_charset_gsm_pack’:
mm-charsets.c:701:42: error: comparison of integer expressions of different signedness: ‘int’ and ‘guint32’ {aka ‘unsigned int’} [-Werror=sign-compare]
  701 |     for (i = 0, lshift = start_offset; i < src_len; i++) {
      |                                          ^
2020-01-24charsets: fix handling of 0x00 bytes at the end of GSM encoded stringsAleksander Morgado
When a GSM-7 encoded string is packed, the process of packing the septets into bytes may end up with one last byte holding the last bit of the last septet. When this situation happens, the last byte will end up with the 7 remaining bits set to 0.

When this packed string is unpacked, the logic to unpack will unpack those last 7 bits as an additional septet, with the value 0x00. The 0x00 value encoded in GSM-7 is the '@' character, EXCEPT when this character is found at the end of the string, in which case the value should be considered as NUL and trigger the end of string already.

So, fix the conversion logic between GSM-7 and UTF-8, so that whenever we find the 0x00 character at the end of the string, we ignore it instead of adding a bogus '@' trailing character.

This commit fixes the "/MM/charsets/gsm7/default-chars" unit test after having it updated to perform the full conversion cycle: UTF-8 -> packed GSM7 -> UTF-8
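The situation described in this commit message can be reproduced with a minimal C sketch of septet packing and unpacking. The function names are hypothetical and the code is not taken from mm-charsets.c. It assumes no start offset and the usual least-significant-bit-first packing.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Pack n septets into out, which must hold (n * 7 + 7) / 8 bytes.
 * Returns the number of bytes written. */
size_t
gsm7_pack (const uint8_t *septets, size_t n, uint8_t *out)
{
    size_t out_len = (n * 7 + 7) / 8;
    size_t i;

    memset (out, 0, out_len);
    for (i = 0; i < n; i++) {
        size_t   bit = i * 7;
        unsigned v   = (unsigned) (septets[i] & 0x7F) << (bit % 8);

        out[bit / 8] |= (uint8_t) (v & 0xFF);
        if (v >> 8)
            out[bit / 8 + 1] |= (uint8_t) (v >> 8);
    }
    return out_len;
}

/* Unpack every complete group of 7 bits found in packed into out,
 * which must hold packed_len * 8 / 7 entries. Returns the septet count. */
size_t
gsm7_unpack (const uint8_t *packed, size_t packed_len, uint8_t *out)
{
    size_t n = packed_len * 8 / 7;
    size_t i;

    for (i = 0; i < n; i++) {
        size_t   bit = i * 7;
        unsigned v   = (unsigned) packed[bit / 8] >> (bit % 8);

        if (bit / 8 + 1 < packed_len)
            v |= (unsigned) packed[bit / 8 + 1] << (8 - bit % 8);
        out[i] = (uint8_t) (v & 0x7F);
    }
    return n;
}

/* Drop one trailing 0x00 septet, treating it as padding rather than '@' */
size_t
gsm7_trim_trailing_nul (const uint8_t *septets, size_t n)
{
    return (n > 0 && septets[n - 1] == 0x00) ? n - 1 : n;
}
```

Packing 7 septets produces 49 bits, stored in 7 bytes with 7 zero bits of padding. Unpacking those 7 bytes yields 8 septets, the last being 0x00, which the trim step removes. The trade-off of this rule is that a genuine '@' as the very last character cannot be distinguished from padding.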
2019-08-02charsets: use `G_N_ELEMENTS (t)' instead of `sizeof (t) / sizeof (t[0])'Ben Chan
2018-08-21charsets: new helper to convert binary input data to UTF-8Aleksander Morgado
Most of the other APIs we have expect binary data (e.g. UCS-2 encoded strings) in ASCII hex format, because they were going to be used in text AT commands. For binary protocols allowing binary data, we need to use a more generic API that provides an explicit data size.
2017-08-16charsets: simplify check to see if conversion to charset possibleAleksander Morgado
Instead of having a method that returns the expected length after the conversion and the amount of input UTF-8 characters that couldn't be converted to the given charset, simplify the logic and just define a method that returns a boolean specifying whether the conversion is possible or not. Also, include unit tests.
2017-08-16charsets: ensure all methods are prefixed with 'mm_'Aleksander Morgado
2017-08-15charset: fix mm_charset_get_encoded_lenBen Chan
The while loop in mm_charset_get_encoded_len() iterates through each valid UTF-8 encoded character in the given NULL-terminated UTF-8 string. It uses g_utf8_find_next_char() to find the position of the next character. In case g_utf8_find_next_char() returns NULL, it tries to find the end (i.e. the NULL character) of the string.

This patch fixes the following issues in the while loop:

1. The loop uses both 'next' and 'end' to track the position of the next character in the string. When g_utf8_find_next_char() returns a non-NULL value, 'next' is essentially the same as 'end'. When g_utf8_find_next_char() returns NULL, 'next' remains NULL while 'end' is adjusted to track the end of the string (but is done incorrectly as described in #2). After the 'p = next' assignment, the 'while (*p)' check results in a NULL dereference. 'p' should thus be set to 'end' instead of 'next'. 'next' is thus redundant and can be removed.

2. When g_utf8_find_next_char() returns NULL and 'end' is adjusted to track the end of string, the 'while (*end++)' loop stops when finding the NULL character, but 'end' is advanced past the NULL character. After the 'p = end' assignment, the 'while (*p)' check results in a dereference of an out-of-bounds pointer. 'while (*++end)' should be used instead given that 'p' doesn't point to a NULL character when 'end = p' happens. 'end' will be updated to point to the NULL character. After the 'p = end' assignment (fixed in #1), the 'while (*p)' check will properly stop the loop.
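Issue #2 comes down to the difference between post-increment and pre-increment in the scanning loop. The following standalone C sketch isolates just that difference. The helper names are hypothetical and the surrounding character-counting loop is omitted.

```c
#include <stddef.h>

/* Old form: the loop stops once it has read the terminator, but the
 * post-increment has already moved 'end' one position past it.
 * Precondition: p points into a NUL-terminated string. */
const char *
skip_to_nul_buggy (const char *p)
{
    const char *end = p;

    while (*end++)
        ;
    return end; /* one past the terminator: must not be dereferenced */
}

/* Fixed form: the pre-increment moves first and tests afterwards, so
 * 'end' stops exactly on the terminator.
 * Precondition: p points at a character that is not the terminator. */
const char *
skip_to_nul_fixed (const char *p)
{
    const char *end = p;

    while (*++end)
        ;
    return end; /* points at the terminator */
}
```

For the string "abc" starting at address s, the old form returns s + 4 and the fixed form returns s + 3, so a following 'while (*p)' check only reads valid memory in the second case.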
2016-08-15core: use MM-specific logging methods always instead of the generic GLib onesAleksander Morgado
2014-05-20core: minor coding style fixesBen Chan
2012-10-04libmm-glib: remove the `libmm-common.h' headerAleksander Morgado
Both the ModemManager daemon and the mmcli will now include `libmm-glib.h' only. We also handle two new special `_LIBMM_INSIDE_MM' and `LIBMM_INSIDE_MMCLI' symbols, which if defined before including the `libmm-glib.h' header allow us to:
 * Avoid including the libmm-glib high level API in the ModemManager daemon, as the object names would clash with those in the core.
 * Define some of the methods of helper objects to be included only if compiling ModemManager daemon or the mmcli.
2012-09-14libmm-common: added common utils from coreAleksander Morgado
Moved the utils to play with binary to hex strings into libmm-common.
2012-09-12core: fix a leak in "core: better handling of non-UCS2 conversions that should be UCS2 (bgo #683817)"Dan Williams
2012-09-12core: better handling of non-UCS2 conversions that should be UCS2 (bgo #683817)Dan Williams
Some modems return the +COPS operator name in hex-encoded current character set (as set with +CSCS). Others return the operator name in ASCII when set to UCS2, while yet others return the ASCII name with trash at the end (*cough* Huawei *cough*). Handle that better by not crashing.
2012-03-28charsets: plug memleakAleksander Morgado
2012-03-16charsets: fix compilation with -Werror=maybe-uninitializedAleksander Morgado
2012-03-16charsets: new UTF-8 to given charset converterAleksander Morgado
UCS-2 strings are always hex-converted.
2012-02-18charsets: plug memleakAleksander Morgado
The string passed to utils_bin2hexstr() needs to be freed afterwards.
2012-02-08charsets: don't crash when passing a NULL string to the UTF-8 converterAleksander Morgado
2012-02-07charsets: new method to do our best to convert from current charset to UTF-8Aleksander Morgado
This method will try to convert the input string to UTF-8. The input string is supposed to be in the given charset; or otherwise is supposed to be the hex representation of the string in the given charset.
2012-02-07charsets: don't warn if we couldn't convert from hex to utf8Aleksander Morgado