Hi, everybody!
Here is the situation:
This is the API https://x.com/index.php?app=ws&u=xxxx&h=c3f2454g543c3b8bfsdfa2311b1&op=pv&to=4444444444&unicode=1&msg=tex+consisting+of+82+Latin+characters
I am sending an SMS through the API. The text is 82 characters long (less than 160), and unicode=1 is specified in the API string. In this case the message is counted as 2 SMS instead of 1, because unicode=1 is specified. If I remove unicode=1 from the API string, it is counted as 1 SMS, which is correct. But there is a catch: without unicode=1, Russian characters are not interpreted correctly and are displayed as “???” in the SMS.
Conclusion:
With unicode specified, all output is displayed correctly, but a Latin-only text is counted as 2 SMS instead of 1. Without unicode, Latin characters are counted correctly (each character counts as 1), but Russian characters are not displayed correctly.
Good day!
Thanks for answering and thanks for the tip.
I will outline the situation a little differently.
P.S. Kannel is used as the gateway.
I noticed that when I composed the SMS, the text was 82 characters long. That should be 1 SMS, since it does not exceed 160 characters. But the unicode=1 option was specified in the API call, and billing counted it as 2 SMS instead of 1. I removed unicode=1 from the API call and billing counted it as 1 SMS.
P.P.S. Only Latin letters and digits were used in the text.
A non-unicode SMS uses a 7-bit alphabet of 128 characters (close to ASCII), which does not include many other characters such as Russian, Arabic etc.
For unicode SMS, we need to submit to Kannel with the type set to unicode, which limits the SMS to 70 chars per SMS.
If you’re submitting a unicode text (an SMS containing at least 1 unicode char) to Kannel (playSMS via Kannel) and you don’t tell Kannel that it is unicode, then Kannel will submit it as non-unicode to the provider and the recipient won’t be able to read it properly. Therefore you need to submit it as unicode.
So unicode text sent as SMS is limited to 70 chars per SMS; anything longer is counted as more than 1 SMS. playSMS follows that rule and adjusts the count accordingly.
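The counting rule above can be sketched in a few lines. This is only an illustration, not playSMS code: the function name `sms_count` is made up here, and the per-part limits for multi-part messages (153 and 67 characters) are the usual concatenated-SMS values, which the thread itself does not mention. It also assumes every character occupies one unit.

```python
import math

# (limit for a single SMS, limit per part once the text is split)
# The multi-part values 153/67 are an assumption not stated in the thread.
LIMITS = {
    False: (160, 153),  # non-unicode (7-bit)
    True: (70, 67),     # unicode
}


def sms_count(length: int, unicode: bool) -> int:
    """Number of SMS a text of `length` characters is counted as."""
    single, multipart = LIMITS[unicode]
    if length <= single:
        return 1
    return math.ceil(length / multipart)


print(sms_count(82, unicode=False))  # 1
print(sms_count(82, unicode=True))   # 2
```

This matches what was observed: the same 82 characters are 1 SMS without unicode=1 and 2 SMS with it.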
Anton, thanks for answering.
On the Kannel side, unicode processing is enabled (smsbox - mo-recode = true)
Imagine this situation: unicode=1 is not specified in the API URL. Everything works fine if the SMS uses only Latin letters, and each character counts as 1. If we use Russian letters, the output is not correct. If we mix Latin and Russian letters, the output is also not correct: it is displayed partially correctly (the Latin letters) and partially not (the Russian letters). That is because unicode=1 is not specified in the API URL. If we pass unicode=1, all characters, even Latin ones, are treated as unicode. The proof is that 82 Latin characters count as 2 SMS.
My question is: would it be possible to add logic so that, with unicode specified in the API URL, Latin characters are not counted as Unicode and only the non-Latin ones are treated as Unicode?
In the playSMS settings there is the option “Enable credit unicode SMS as normal SMS”. With this option enabled, a text of 82 Latin characters sent with unicode=1 in the API URL is counted as 1 SMS. But then other Unicode characters are also counted as 1 character each.
It seems to me that before passing data to Kannel, we should determine which characters are Unicode and which are not; then mixed data, where some characters are Unicode and some are not, would be interpreted accordingly.
Understood that you want to do that, but as far as I know you need to decide whether the SMS contains any unicode char or not, and if it does, you use unicode=1. So unicode or not is decided per SMS, not per character: if the SMS contains unicode chars, you need to submit it with unicode=1 for it to work on the recipient’s side.
When playSMS uses unicode=1, it’s up to the gateway plugin to implement it. In Kannel the unicode option selects the encoding that Kannel uses to process the whole SMS (not per character).
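A per-SMS decision like the one described above could look roughly like this. It is a sketch under assumptions, not what playSMS or Kannel actually do: the helper name `needs_unicode` is hypothetical, and the character table is the GSM 03.38 default alphabet plus its extension table, which the thread does not name explicitly.

```python
# GSM 03.38 default alphabet (basic table, without the ESC code point).
GSM7_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension table characters (each costs two units in a real message).
GSM7_EXTENSION = "^{}\\[~]|€\f"
GSM7_CHARS = frozenset(GSM7_BASIC + GSM7_EXTENSION)


def needs_unicode(text: str) -> bool:
    """True if at least one character is outside the 7-bit alphabet.

    The decision is made for the whole SMS, not per character.
    """
    return any(ch not in GSM7_CHARS for ch in text)


print(needs_unicode("Hello 123"))     # False
print(needs_unicode("Привет"))        # True
print(needs_unicode("Hello Привет"))  # True
```

A single Cyrillic letter is enough to switch the whole message to unicode, which is why a mixed text is counted with the 70-character limit.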
Anton, thank you very much for answering so extensively.
That raises one logical question: what should we do in this situation?
We want to send several SMS; some contain Unicode, some do not.
This is the picture that emerges:
Suppose Unicode is not set in the API URL. In this case the SMS without Unicode characters will be displayed and counted correctly, while the SMS that contain Unicode characters will not be displayed correctly, but will still be counted as Unicode (70 characters per SMS).
Another situation: unicode=1 is specified in the API URL. In this case, no matter which characters are sent (Unicode or not), everything is displayed correctly for the end user, but all characters are counted as Unicode (70 characters per SMS), even though Latin characters are not Unicode characters.
Sending everything as Unicode is also not right. Suppose unicode=1 is specified and we want to send 10000 SMS, of which 9999 use only Latin letters and 1 uses Russian characters. The Russian one will be interpreted correctly, as expected, but all 9999 Latin ones will also be processed as Unicode. So for the sake of 1 SMS, 9999 SMS will be counted as 2x9999.
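To put numbers on the 10000-message case, here is a small worked calculation. It assumes every message is 82 characters long, like the test text earlier in the thread (the post does not state a length for this case), and it uses the usual multi-part limits of 153 and 67 characters, which are also an assumption.

```python
import math


def parts(length: int, unicode: bool) -> int:
    """SMS count for a text of `length` characters (assumed limits)."""
    single, multipart = (70, 67) if unicode else (160, 153)
    return 1 if length <= single else math.ceil(length / multipart)


LENGTH = 82              # assumed length of every message
LATIN, RUSSIAN = 9999, 1

# unicode=1 forced on every message
forced = (LATIN + RUSSIAN) * parts(LENGTH, unicode=True)

# unicode decided per message
detected = (LATIN * parts(LENGTH, unicode=False)
            + RUSSIAN * parts(LENGTH, unicode=True))

print(forced)    # 20000
print(detected)  # 10001
```

Under these assumptions, forcing unicode=1 nearly doubles the billed count compared with deciding per message.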
Using 2 API URLs on the application side, one with Unicode and one without, is also not right. You will agree that logically this is not correct.
It seems to me that the API should be more flexible and have some kind of verification mechanism, perhaps one that drops the unicode=1 option before the data is passed to the plugin if none of the characters are Unicode. I’m not a programmer; I can only judge from the situation and approach it logically. If my ideas do not match the logic of the code, I apologize.
Anton, here is an idea; what do you think about the following:
We add logic that inspects the SMS content before it is passed to the Kannel plugin, i.e. it classifies the SMS as Unicode or non-Unicode and, depending on the result, passes unicode=1 to the plugin or not. In that case we do not need unicode=1 in the API URL.
If you need the webservices API, it means that you are using a script to do custom processing before submitting SMS to Kannel via playSMS. Can you do your own detection of which SMS contain unicode chars and which do not, and then submit a different URL (one with unicode=1, one without)?
It’s just for proof of concept. If you can, then the detection part can be integrated into playSMS; I’ll help add it to playSMS.
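A client-side proof of concept along those lines might look like the sketch below. The parameter names (`app`, `u`, `h`, `op`, `to`, `msg`, `unicode`) and the host are taken from the URL in the first post; the function name `build_send_url` is hypothetical. The detection here is deliberately crude: it treats any non-ASCII character as Unicode, which is a simplification, since the 7-bit SMS alphabet is not identical to ASCII.

```python
from urllib.parse import urlencode

BASE_URL = "https://x.com/index.php"  # placeholder host from the first post


def build_send_url(username: str, token: str, to: str, msg: str) -> str:
    """Build the send URL, adding unicode=1 only when the text needs it."""
    params = {
        "app": "ws",
        "u": username,
        "h": token,
        "op": "pv",
        "to": to,
        "msg": msg,
    }
    # Crude stand-in for real detection: any non-ASCII character
    # makes the whole SMS unicode (assumption, see the note above).
    if not msg.isascii():
        params["unicode"] = "1"
    return f"{BASE_URL}?{urlencode(params)}"


print(build_send_url("xxxx", "token", "4444444444", "hello world"))
print(build_send_url("xxxx", "token", "4444444444", "Привет"))
```

The script calls one function for every message, so the application does not have to keep two separate URLs; the flag is decided per SMS at send time.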
Good day, Anton!
Thanks for answering.
Yes, I would like to say that without a check of the SMS content for Unicode characters before it is passed to the Kannel plugin, you would have to use this approach and, as you noticed, this would be a bad solution.
I think the verification logic should be as follows:
By default, before the data is sent to the Kannel plugin, the SMS content is checked for Unicode. If at least 1 Unicode character is present in the content, all characters in the content are treated as Unicode.
If none of the characters are Unicode but unicode=1 is specified in the API URL, unicode=1 is ignored and all characters are treated as non-Unicode.
Then we will not need the method described above (with two APIs, one with Unicode, the other without).
Or you can remove the ability to specify unicode=1 altogether and implement all the logic in the code. Since the sets of Unicode and non-Unicode characters are fixed, they can be defined once in the code together with the verification logic. In that case we do not need to specify unicode=1 in the API URL. Instead, the code performs the check automatically and, based on the characters in the content, the system decides whether to treat the content as Unicode or not.
Good day, Anton!
Thanks for answering.
Yes, of course I can.
So, after I uncomment the lines, I should test without specifying unicode=1, right?
The situation is as follows:
All Russian characters are now interpreted correctly.
But at the same time, all characters (Latin and Russian) are counted as non-Unicode characters. Two texts of the same length, 82 characters, one Latin and the other Cyrillic (Russian), are both counted as 1 SMS, although the second one should have been 2 SMS instead of 1. Another case: if Cyrillic (Russian) characters follow the 82 Latin characters, i.e. the text mixes Latin and Cyrillic, it is also counted the same way.
P.S. unicode=1 was not specified in the API URL.
Uncommenting that line only makes playSMS detect, before sending the SMS to Kannel, whether that SMS is unicode or not. If it contains unicode characters, playSMS automatically passes it to Kannel as a unicode SMS by adding the charset option (just below that uncommented line).
So was the detection wrong? Or was the processing by Kannel or playSMS wrong for your example SMS text?