==1==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x59c6c88a31ef in gsm_ext_char_to_utf8 src/mm-charsets.c:256:13
#1 0x59c6c88a31ef in charset_gsm_unpacked_to_utf8 src/mm-charsets.c:339:20
#2 0x59c6c88a31ef in mm_modem_charset_bytearray_to_utf8 src/mm-charsets.c:857:30
#3 0x59c6c889babd in sms_decode_address src/mm-sms-part-3gpp.c:143:16
#4 0x59c6c8899d3a in mm_sms_part_3gpp_new_from_binary_pdu src/mm-sms-part-3gpp.c:514:15
|
|
1) Not every allowed GSM7 character takes one byte in UTF-8 encoding.
Some (for example, 'à') take several bytes in the input string, but a
single byte in GSM7.
2) Extended characters in GSM7 encoding take two bytes.
Otherwise, for example, sending the following SMS fails:
```
mmcli -m a --messaging-create-sms="text='[wwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww',number='+XXXXXXXXXXX'"
Successfully created new SMS: /org/freedesktop/ModemManager1/SMS/99
mmcli --send -s 99
error: couldn't send the SMS: 'GDBus.Error:org.freedesktop.libqmi.Error.Protocol.WmsEncoding: Couldn't write SMS part: QMI protocol error (58): 'WmsEncoding''
```
```
mmcli -m a --messaging-create-sms="text='|àààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààààà',number='+XXXXXXXXXXX'"
Successfully created new SMS: /org/freedesktop/ModemManager1/SMS/72
mmcli --send -s 72
error: couldn't send the SMS: 'GDBus.Error:org.freedesktop.ModemManager1.Error.Core.InvalidArgs: Couldn't convert UTF-8 to GSM: input UTF-8 validation failed'
```
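To illustrate both points, here is a minimal plain-C sketch that counts the GSM7 septets needed for a UTF-8 string. It assumes every character outside the GSM 03.38 extension table is in the basic table (this is not checked), charges two septets (ESC plus one) for extension characters such as '[' and '|', and only decodes 1- to 3-byte UTF-8 sequences. The function names are hypothetical, not ModemManager's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Characters in the GSM 03.38 extension table are sent as ESC (0x1B)
 * followed by one more septet, so they cost two septets each. */
static int
gsm7_is_extension_char (uint32_t cp)
{
    switch (cp) {
    case 0x000C: /* form feed */
    case '^': case '{': case '}': case '\\':
    case '[': case '~': case ']': case '|':
    case 0x20AC: /* euro sign */
        return 1;
    default:
        return 0;
    }
}

/* Decode one UTF-8 sequence (1 to 3 bytes) at s into *cp. Returns the
 * number of bytes consumed, or 0 if the sequence is malformed or is a
 * 4-byte sequence (not handled in this sketch). */
static size_t
utf8_decode_char (const unsigned char *s, uint32_t *cp)
{
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
        *cp = ((uint32_t) (s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0 && (s[1] & 0xC0) == 0x80 && (s[2] & 0xC0) == 0x80) {
        *cp = ((uint32_t) (s[0] & 0x0F) << 12) |
              ((uint32_t) (s[1] & 0x3F) << 6) |
              (s[2] & 0x3F);
        return 3;
    }
    return 0;
}

/* Number of GSM7 septets needed for a NUL-terminated UTF-8 string, or
 * -1 if this sketch cannot decode it. Basic-table membership is assumed. */
static long
gsm7_septet_count (const char *utf8)
{
    const unsigned char *p = (const unsigned char *) utf8;
    long septets = 0;

    assert (utf8 != NULL);
    while (*p) {
        uint32_t cp;
        size_t n = utf8_decode_char (p, &cp);

        if (n == 0)
            return -1;
        septets += gsm7_is_extension_char (cp) ? 2 : 1;
        p += n;
    }
    return septets;
}
```

For instance, 'à' is two bytes in UTF-8 but a single septet, while '[' is one byte in UTF-8 but two septets.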
|
|
|
|
Otherwise, mm_modem_charset_bytearray_to_utf8() may return NULL
without an error set, and that will trigger a crash in the caller.
Fixes https://gitlab.freedesktop.org/mobile-broadband/ModemManager/-/issues/511
|
|
The only purpose of this is to log what we found, nothing else, as a
quick way to detect platform support for the charsets we need.
|
|
|
|
No longer used, replaced by the new common conversion methods.
|
|
These methods worked in a very strict way for some encodings, and in a
very loose way for others. E.g. when converting from hex-encoded
UCS-2, we would attempt to convert as much text as we could even if
the input string was not even close to valid UCS-2. This kind of "do
our best" approach could make sense when processing e.g. the operator name
reported by the modem, as that is just a string to show to the user and
there may be no strict requirement for it to be perfectly valid. But the
kind of loose comparison done for UCS-2 doesn't make sense e.g. when
converting USSD responses or SMS messages.
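The strict behavior argued for above can be sketched in plain C: a hypothetical decoder that converts hex-encoded UCS-2 (big-endian, four hex digits per code unit) to UTF-8 and rejects the whole input, instead of salvaging a prefix, when the length is not a multiple of four, a non-hex digit appears, or a surrogate code unit (not valid in UCS-2) shows up. The names and the caller-provided output buffer are assumptions of this sketch, not ModemManager's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int
hex_digit_value (char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Strictly convert hex-encoded UCS-2BE to NUL-terminated UTF-8 in out
 * (out_size bytes). Returns 0 on success, -1 on any invalid input or if
 * out is too small; a partial conversion is never reported as success. */
static int
ucs2_hex_to_utf8_strict (const char *hex, char *out, size_t out_size)
{
    size_t len, i, o = 0;

    assert (hex != NULL && out != NULL);
    len = strlen (hex);
    if (len % 4 != 0)
        return -1;

    for (i = 0; i < len; i += 4) {
        uint32_t cu = 0;
        size_t k;

        for (k = 0; k < 4; k++) {
            int v = hex_digit_value (hex[i + k]);
            if (v < 0)
                return -1;
            cu = (cu << 4) | (uint32_t) v;
        }
        /* Surrogates only exist in UTF-16; UCS-2 cannot use them. */
        if (cu >= 0xD800 && cu <= 0xDFFF)
            return -1;

        /* Encode the BMP code point as 1 to 3 UTF-8 bytes, keeping
         * room for the terminating NUL. */
        if (cu < 0x80) {
            if (o + 1 >= out_size) return -1;
            out[o++] = (char) cu;
        } else if (cu < 0x800) {
            if (o + 2 >= out_size) return -1;
            out[o++] = (char) (0xC0 | (cu >> 6));
            out[o++] = (char) (0x80 | (cu & 0x3F));
        } else {
            if (o + 3 >= out_size) return -1;
            out[o++] = (char) (0xE0 | (cu >> 12));
            out[o++] = (char) (0x80 | ((cu >> 6) & 0x3F));
            out[o++] = (char) (0x80 | (cu & 0x3F));
        }
    }
    if (o >= out_size)
        return -1;
    out[o] = '\0';
    return 0;
}
```

A plain ASCII operator name such as "Movistar" has a length that is a multiple of four, yet it is rejected because 'M' is not a hex digit; a loose decoder might instead have produced garbage from it.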
|
|
|
|
Use the generic mm_modem_charset_bytearray_to_utf8() instead.
|
|
|
|
Use the generic mm_modem_charset_bytearray_from_utf8() instead.
|
|
|
|
The //TRANSLIT extension is not supported by every iconv()
implementation that we may find out there, so let's
completely avoid using it.
For some of the charsets it didn't make much sense anyway;
e.g. converting to UTF-16 or UTF-8 is always possible without
requiring //TRANSLIT to take effect.
The //TRANSLIT extension was also sometimes being used in the source
charset identification, which was not correct either, as we would
only expect it in the target charset identification.
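As an illustration of a conversion that never needs //TRANSLIT, here is a minimal POSIX iconv() sketch converting UTF-8 to UTF-16BE with plain charset names on both sides. The wrapper name is hypothetical; on glibc iconv() is part of libc, while other platforms may need to link against libiconv.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Convert a NUL-terminated UTF-8 string to UTF-16BE in out.
 * Returns the number of bytes written, or -1 on failure (invalid
 * input or output buffer too small). No //TRANSLIT suffix is used:
 * every Unicode character is representable in UTF-16, so
 * transliteration would never take effect anyway. */
static long
utf8_to_utf16be (const char *in, unsigned char *out, size_t out_size)
{
    iconv_t cd;
    char *inbuf = (char *) in;
    size_t inleft;
    char *outbuf = (char *) out;
    size_t outleft = out_size;
    size_t rc;

    assert (in != NULL && out != NULL);
    inleft = strlen (in);
    cd = iconv_open ("UTF-16BE", "UTF-8");
    if (cd == (iconv_t) -1)
        return -1;
    rc = iconv (cd, &inbuf, &inleft, &outbuf, &outleft);
    iconv_close (cd);
    if (rc == (size_t) -1)
        return -1;
    return (long) (out_size - outleft);
}
```

Using the explicit "UTF-16BE" name also avoids a byte order mark in the output, which plain "UTF-16" may emit.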
|
|
If the conversion is not fully compatible, the user of the method
needs to explicitly request transliteration in order to avoid this
method returning errors.
|
|
Until now, this method would automatically apply transliteration;
i.e. replacing characters with '?' when no direct translation was
available.
We can attempt to do that transliteration on strings that are not
critical, e.g. the operator name reported by the network. But we
should not do that on other types of strings, e.g. on SMS contents
that may serve purposes other than just being
human-readable.
This commit requires the transliteration option to be explicitly
requested by the caller.
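A plain-C sketch of an explicit transliteration option, using ISO-8859-1 as a stand-in target charset because its mapping from Unicode is trivial (code points up to U+00FF map to themselves). The function and its translit parameter are hypothetical: without translit, an unrepresentable character makes the whole conversion fail; with it, the character is replaced by '?'. Only 1- to 3-byte UTF-8 sequences are decoded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Convert NUL-terminated UTF-8 to ISO-8859-1 in out (out_size bytes,
 * NUL-terminated on success). Characters above U+00FF are replaced by
 * '?' only when translit is true; otherwise the call fails with -1. */
static int
utf8_to_latin1 (const char *utf8, bool translit, char *out, size_t out_size)
{
    const unsigned char *p = (const unsigned char *) utf8;
    size_t o = 0;

    assert (utf8 != NULL && out != NULL);
    while (*p) {
        uint32_t cp;

        if (p[0] < 0x80) {
            cp = p[0];
            p += 1;
        } else if ((p[0] & 0xE0) == 0xC0 && (p[1] & 0xC0) == 0x80) {
            cp = ((uint32_t) (p[0] & 0x1F) << 6) | (p[1] & 0x3F);
            p += 2;
        } else if ((p[0] & 0xF0) == 0xE0 && (p[1] & 0xC0) == 0x80 &&
                   (p[2] & 0xC0) == 0x80) {
            cp = ((uint32_t) (p[0] & 0x0F) << 12) |
                 ((uint32_t) (p[1] & 0x3F) << 6) | (p[2] & 0x3F);
            p += 3;
        } else {
            return -1; /* malformed or 4-byte UTF-8: not handled here */
        }

        if (cp > 0xFF) {
            if (!translit)
                return -1; /* no direct mapping and no permission to guess */
            cp = '?';
        }
        if (o + 1 >= out_size)
            return -1;
        out[o++] = (char) cp;
    }
    if (o >= out_size)
        return -1;
    out[o] = '\0';
    return 0;
}
```

A caller handling an operator name could pass translit as true, while an SMS or USSD path would pass false and handle the error.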
|
|
It makes much more sense than returning a gchar array, as gchar is
plain char, which is signed on many platforms.
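A small illustration of the signedness pitfall, in standard C only: whether plain char is signed is implementation-defined, so a byte such as 0xE0 read through char may compare as negative, while uint8_t (the C counterpart of GLib's guint8) always yields 0 to 255. The helper is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count bytes with the high bit set (e.g. UTF-8 lead/continuation bytes).
 * Reading through uint8_t makes 'b >= 0x80' work regardless of whether
 * plain char is signed on this platform; with a signed char, a byte like
 * 0xE0 would read back as -32 and the comparison would silently fail. */
static size_t
count_high_bytes (const uint8_t *data, size_t len)
{
    size_t i, n = 0;

    assert (data != NULL || len == 0);
    for (i = 0; i < len; i++) {
        if (data[i] >= 0x80)
            n++;
    }
    return n;
}
```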
|
|
The length may optionally be given explicitly, and -1 can be used to
assume the input is NUL-terminated.
|
|
This util method checks whether the input string is a valid hex
string, so make sure we return a GError on failure.
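A plain-C sketch of a hex validation and decoding helper that combines the -1 length convention mentioned above with a failure path that always reports a reason. It uses a string out-parameter as a stand-in for GError; the function name and error texts are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static int
hex_value (char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode a hex string into out. len may be -1 to mean "NUL-terminated";
 * otherwise the caller guarantees hex has at least len bytes.
 * Returns the number of bytes decoded, or -1 on failure, in which case
 * *error is always set (stand-in for GError), never left untouched. */
static long
hex_to_bin (const char *hex, long len, uint8_t *out, size_t out_size,
            const char **error)
{
    size_t n, i;

    assert (hex != NULL && error != NULL);
    n = (len < 0) ? strlen (hex) : (size_t) len;
    if (n % 2 != 0) {
        *error = "odd number of hex digits";
        return -1;
    }
    if (n / 2 > out_size) {
        *error = "output buffer too small";
        return -1;
    }
    for (i = 0; i < n; i += 2) {
        int hi = hex_value (hex[i]);
        int lo = hex_value (hex[i + 1]);

        if (hi < 0 || lo < 0) {
            *error = "invalid hex digit";
            return -1;
        }
        out[i / 2] = (uint8_t) ((hi << 4) | lo);
    }
    return (long) (n / 2);
}
```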
|
|
|
|
There's no point in adding a quoting option to this method; if the
caller needs the appended string quoted, it should quote it before
passing it to this method.
It wasn't used anywhere anyway.
|
|
This is not a real charset; it is the fake one we used to represent a
UCS2 hex-encoded string.
|
|
Mostly to use GLib types like gchar or gint, and also to use
G_N_ELEMENTS() instead of custom end-of-array terminator items.
|
|
Just as an implementation detail to be taken as an extension of
UCS2BE, never really to be used as a real modem charset.
|
|
The iconv() operation would fail because we wouldn't give any proper
charset string to convert to/from. Use our custom GSM encoding support
instead.
|
|
This would really be an implementation detail, not a real use
case. Just don't warn in this case, as we already do for the
conversion in the opposite direction.
|
|
from UTF-8"
This reverts commit 6a7dd87f30b2cc1b459abab38a0805aa8ba1bfbc.
Reverting because the merge request was set to squash all together....
|
|
This would really be an implementation detail, not a real use
case. Just don't warn in this case, as we already do for the
conversion in the opposite direction.
|
|
|
|
mm-charsets.c: In function ‘mm_charset_take_and_convert_to_utf8’:
mm-charsets.c:730:5: error: switch missing default case [-Werror=switch-default]
730 | switch (charset) {
| ^~~~~~
mm-charsets.c: In function ‘mm_utf8_take_and_convert_to_charset’:
mm-charsets.c:852:5: error: switch missing default case [-Werror=switch-default]
852 | switch (charset) {
| ^~~~~~
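A minimal sketch of the usual fix for -Wswitch-default: give every switch an explicit default branch, even when all enumerators are already handled, so that out-of-range values take a defined path. The enum and function are hypothetical and much smaller than ModemManager's charset list.

```c
#include <assert.h>

/* Hypothetical, reduced charset enum for illustration only. */
typedef enum {
    CHARSET_UNKNOWN = 0,
    CHARSET_GSM,
    CHARSET_IRA,
    CHARSET_UCS2,
} Charset;

/* Maximum unpacked bytes one character may take in the given charset.
 * The default branch satisfies -Wswitch-default and also catches
 * values that are not valid Charset enumerators. */
static int
charset_max_char_len (Charset charset)
{
    switch (charset) {
    case CHARSET_GSM:
        return 2; /* extension characters are ESC plus one more septet */
    case CHARSET_IRA:
        return 1;
    case CHARSET_UCS2:
        return 2;
    case CHARSET_UNKNOWN:
    default:
        return -1;
    }
}
```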
|
|
mm-charsets.c: In function ‘mm_charset_gsm_unpacked_to_utf8’:
mm-charsets.c:423:19: error: comparison of integer expressions of different signedness: ‘int’ and ‘guint32’ {aka ‘unsigned int’} [-Werror=sign-compare]
423 | for (i = 0; i < len; i++) {
| ^
mm-charsets.c: In function ‘pccp437_is_subset’:
mm-charsets.c:544:19: error: comparison of integer expressions of different signedness: ‘int’ and ‘long unsigned int’ [-Werror=sign-compare]
544 | for (i = 0; i < G_N_ELEMENTS (t); i++) {
| ^
mm-charsets.c: In function ‘pcdn_is_subset’:
mm-charsets.c:575:19: error: comparison of integer expressions of different signedness: ‘int’ and ‘long unsigned int’ [-Werror=sign-compare]
575 | for (i = 0; i < sizeof (t) / sizeof (t[0]); i++) {
| ^
mm-charsets.c: In function ‘mm_charset_gsm_unpack’:
mm-charsets.c:657:19: error: comparison of integer expressions of different signedness: ‘int’ and ‘guint32’ {aka ‘unsigned int’} [-Werror=sign-compare]
657 | for (i = 0; i < num_septets; i++) {
| ^
mm-charsets.c: In function ‘mm_charset_gsm_pack’:
mm-charsets.c:701:42: error: comparison of integer expressions of different signedness: ‘int’ and ‘guint32’ {aka ‘unsigned int’} [-Werror=sign-compare]
701 | for (i = 0, lshift = start_offset; i < src_len; i++) {
| ^
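The -Wsign-compare errors above come from comparing an int loop index with an unsigned bound (guint32, or the size_t produced by G_N_ELEMENTS() and sizeof). A minimal sketch of the usual fix, in plain C with a local stand-in for G_N_ELEMENTS(): declare the index with an unsigned type matching the bound. The table and names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for GLib's G_N_ELEMENTS(): element count of an array,
 * so no end-of-array terminator item is needed. */
#define N_ELEMENTS(arr) (sizeof (arr) / sizeof ((arr)[0]))

/* Hypothetical table: Unicode values of the first four GSM7
 * basic-table characters ('@', '£', '$', '¥'). */
static const uint16_t table[] = { 0x0040, 0x00A3, 0x0024, 0x00A5 };

/* Iterating with a size_t index keeps the comparison against
 * N_ELEMENTS() unsigned on both sides, as -Wsign-compare requires;
 * an 'int i' here would trigger the warning. */
static int
table_contains (uint16_t value)
{
    size_t i;

    for (i = 0; i < N_ELEMENTS (table); i++) {
        if (table[i] == value)
            return 1;
    }
    return 0;
}
```

For bounds typed guint32, such as num_septets or src_len above, the same idea applies with a guint32 (or uint32_t) index.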
|
|
When a GSM-7 encoded string is packed, the process of packing the
septets into bytes may end up with one last byte holding the last bit
of the last septet. When this situation happens, the last byte will
end up with the 7 remaining bits set to 0.
When this packed string is unpacked, the unpacking logic will decode
those last 7 bits as an additional septet, with the value 0x00.
The 0x00 value encoded in GSM-7 is the '@' character, EXCEPT when this
character is found at the end of the string, in which case the value
should be considered as NUL, terminating the string.
So, fix the conversion logic between GSM-7 and UTF-8, so that whenever
we find the 0x00 character at the end of the string, we ignore it
instead of adding a bogus '@' trailing character.
This commit fixes the "/MM/charsets/gsm7/default-chars" unit test
after having it updated to perform the full conversion cycle:
UTF-8 -> packed GSM7 -> UTF-8
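The padding effect can be reproduced with a small plain-C sketch of LSB-first septet packing, as used for GSM-7: seven septets occupy 49 bits, so they are packed into 7 bytes whose last byte carries one real bit and seven zero bits, and unpacking those 7 bytes yields 8 septets, the last one 0x00. The function names are hypothetical, and this sketch, like the rule described above, cannot tell such padding apart from a genuine trailing '@'.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Pack 7-bit septets LSB-first into bytes. out must hold at least
 * (n * 7 + 7) / 8 bytes. Returns the number of bytes used. */
static size_t
gsm7_pack (const uint8_t *septets, size_t n, uint8_t *out)
{
    size_t packed_len = (n * 7 + 7) / 8;
    size_t i;

    memset (out, 0, packed_len);
    for (i = 0; i < n; i++) {
        size_t bit = i * 7;
        size_t byte = bit / 8;
        unsigned shift = bit % 8;
        uint8_t s = septets[i] & 0x7F;

        out[byte] |= (uint8_t) (s << shift);
        if (shift > 1) /* septet spills into the next byte */
            out[byte + 1] |= (uint8_t) (s >> (8 - shift));
    }
    return packed_len;
}

/* Unpack every complete 7-bit group in the packed buffer, including
 * any zero padding bits at the end. Returns len * 8 / 7 septets. */
static size_t
gsm7_unpack (const uint8_t *packed, size_t len, uint8_t *out)
{
    size_t n = len * 8 / 7;
    size_t i;

    for (i = 0; i < n; i++) {
        size_t bit = i * 7;
        size_t byte = bit / 8;
        unsigned shift = bit % 8;
        unsigned v = packed[byte] >> shift;

        if (shift > 1)
            v |= (unsigned) packed[byte + 1] << (8 - shift);
        out[i] = (uint8_t) (v & 0x7F);
    }
    return n;
}

/* Septet count once a trailing 0x00 is treated as NUL instead of '@'. */
static size_t
gsm7_trim_padding (const uint8_t *septets, size_t n)
{
    if (n > 0 && septets[n - 1] == 0x00)
        return n - 1;
    return n;
}
```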
|
|
|
|
Most of the other APIs we have expect binary data (e.g.
UCS-2 encoded strings) in ASCII hex format, because they were meant
to be used in text AT commands. For binary protocols allowing binary
data, we need to use a more generic API that provides an explicit data
size.
|
|
Instead of having a method that returns the expected length after the
conversion and the amount of input UTF-8 characters that couldn't be
converted to the given charset, simplify the logic and just define a
method that returns a boolean specifying whether the conversion is
possible or not.
Also, include unit tests.
|
|
|
|
The while loop in mm_charset_get_encoded_len() iterates through each
valid UTF-8 encoded character in the given NULL-terminated UTF-8 string.
It uses g_utf8_find_next_char() to find the position of the next
character. If g_utf8_find_next_char() returns NULL, it tries to
find the end (i.e. the NULL character) of the string.
This patch fixes the following issues in the while loop:
1. The loop uses both 'next' and 'end' to track the position of the next
character in the string.
When g_utf8_find_next_char() returns a non-NULL value, 'next' is
essentially the same as 'end'.
When g_utf8_find_next_char() returns NULL, 'next' remains NULL while
'end' is adjusted to track the end of the string (but is done
incorrectly as described in #2). After the 'p = next' assignment, the
'while (*p)' check results in a NULL dereference. 'p' should thus be
set to 'end' instead of 'next'.
'next' is thus redundant and can be removed.
2. When g_utf8_find_next_char() returns NULL and 'end' is adjusted to
track the end of string, the 'while (*end++)' loop stops when finding
the NULL character, but 'end' is advanced past the NULL character.
After the 'p = end' assignment, the 'while (*p)' check results in a
dereference of an out-of-bounds pointer.
'while (*++end)' should be used instead given that 'p' doesn't point
to a NULL character when 'end = p' happens. 'end' will be updated to
point to the NULL character. After the 'p = end' assignment (fixed in
#1), the 'while (*p)' check will properly stop the loop.
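The corrected loop shape can be reproduced in plain C. Since g_utf8_find_next_char() is a GLib function, this sketch uses a hypothetical local stand-in, find_next_char(), that returns NULL when only the terminating NUL follows; counting one unit per character is a placeholder for the real per-character encoded-length logic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for g_utf8_find_next_char(): skips UTF-8
 * continuation bytes and returns the start of the next character, or
 * NULL when only the terminating NUL follows. Requires *p != '\0'. */
static const char *
find_next_char (const char *p)
{
    do {
        p++;
    } while (((unsigned char) *p & 0xC0) == 0x80);
    return *p ? p : NULL;
}

/* Fixed loop: a single 'end' pointer, and pre-increment when scanning
 * for the NUL so that 'end' stops ON it rather than one past it. */
static size_t
count_chars (const char *utf8)
{
    const char *p = utf8;
    size_t count = 0;

    assert (utf8 != NULL);
    while (*p) {
        const char *end = find_next_char (p);

        if (end == NULL) {
            /* *p is known to be non-NUL, so '*++end' starts checking at
             * the byte after p and leaves 'end' pointing at the NUL. */
            end = p;
            while (*++end)
                ;
        }
        count++; /* placeholder for the per-character encoded length */
        p = end;
    }
    return count;
}
```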
|
|
|
|
|
|
Both the ModemManager daemon and the mmcli will now include `libmm-glib.h' only.
We also handle two new special `_LIBMM_INSIDE_MM' and `LIBMM_INSIDE_MMCLI'
symbols which, if defined before including the `libmm-glib.h' library, allow us to:
* Avoid including the libmm-glib high level API in the ModemManager daemon, as
the object names would clash with those in the core.
* Define some of the methods of helper objects only when compiling the
ModemManager daemon or the mmcli.
|
|
Moved the utils to play with binary to hex strings into libmm-common.
|
|
should be UCS2 (bgo #683817)"
|
|
Some modems return the +COPS operator name hex-encoded in the current
character set (as set with +CSCS). Others return the operator name
in ASCII when set to UCS2, while yet others return the ASCII name
with trash at the end (*cough* Huawei *cough*). Handle that better
by not crashing.
|
|
|
|
|
|
UCS-2 strings are always hex-converted.
|
|
The string passed to utils_bin2hexstr() needs to be freed afterwards.
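A plain-C sketch of the ownership convention involved, assuming a helper that returns a newly allocated hex string which the caller must release. The bin2hex() name is hypothetical; the real utils_bin2hexstr() is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Return a newly allocated, NUL-terminated uppercase hex representation
 * of data, or NULL on allocation failure. The caller owns the result
 * and must release it with free() once done. */
static char *
bin2hex (const uint8_t *data, size_t len)
{
    static const char digits[] = "0123456789ABCDEF";
    char *out;
    size_t i;

    assert (data != NULL || len == 0);
    out = malloc (len * 2 + 1);
    if (out == NULL)
        return NULL;
    for (i = 0; i < len; i++) {
        out[2 * i] = digits[data[i] >> 4];
        out[2 * i + 1] = digits[data[i] & 0x0F];
    }
    out[len * 2] = '\0';
    return out;
}
```

Every successful call must therefore be paired with a free() of the returned pointer, including when the result is only passed along to another function.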
|
|
|
|
This method will try to convert the input string to UTF-8. The input string is
expected to be either in the given charset or the hex representation of a
string in the given charset.
|
|
|