Try to kill the playsmsd processes one at a time and observe.
Start the kills with dlrssmsd, then restart playsmsd, then kill recvsmsd manually, restart again, and so on.
See which service is affecting MySQL.
anton
Ok, thank you, will try it.
Regards,
Jamshid Tursunov
Anton, I tried to collect information on each process individually and on all of them together, and this is what happened:
Sequential shutdown and startup:
Before shutting down the processes, the CPU load varied in the range of 105-140%
I stopped them in the following order:
First I stopped ratesmsd; the CPU load dropped to around 30-50% and, while I watched, stayed steady in that range
Second I stopped recvsmsd; the CPU load was in the range of 30-43%
Third I stopped schedule; the CPU load was in the range of 30-42%
Fourth I stopped sendsmsd; the CPU load was in the range of 30-42%
Fifth I stopped dlrssmsd; the CPU load dropped to 0.7%, sometimes jumping to 1%
With all 5 services stopped, the CPU load stayed at 0.7%, sometimes jumping to 1%
When I started dlrssmsd again, the CPU load jumped right up to 30-42%
Next I started sendsmsd; the CPU load remained in the range of 30-42%
After starting schedule, the CPU load was in the range of 30-42%
Then I started recvsmsd; the CPU load was in the range of 30-42%
When I started ratesmsd, the CPU load stayed in the range of 40-45%, but it very often spikes and the load reaches as much as 145%
Individual enable and disable (I enabled and disabled each service on its own to measure how much load a single process adds):
When I started dlrssmsd, the CPU load jumped right up to 42%
Next I started sendsmsd; the CPU load stayed in the range of 1.3-1.7%
After I started schedule, the CPU load was in the range of 1.3-1.7%
Then I started recvsmsd; the CPU load stayed in the range of 1.3-1.7%
When I started ratesmsd, the CPU load started jumping right up above the 100% mark
…
I want to emphasize that even after I turned off all the PlaySMS processes, the memory occupied by MySQL remained at 14%; the memory is not freed after the PlaySMS processes are stopped.
Respectfully,
Jamshid Tursunov
Anton, if possible, please take a look at the MySQL behavior described above. Is it possible to optimize MySQL in order to reduce the load on the processor? These spikes occur even when the system is simply idle.
Respectfully,
Jamshid Tursunov
Anton, good day!
Did you have time to look at the unstable behavior of PlaySMS when working with the database?
Respectfully,
Jamshid Tursunov
Hi, sorry, no, and I think it’s better to ask MySQL professionals/communities about that.
Looks to me it’s all about how to correctly index and/or optimize MySQL configs. Of course there are parts of playSMS where changes could be made to help, but I haven’t got time to focus on that.
anton
Hello dear Anton!
I have to bother you again; your help is needed to solve this issue.
As you can see, I tried to describe the problem in detail for each service separately.
I really need your help with this; apparently no one but you will be able to figure it out.
Respectfully,
Jamshid Tursunov
What’s your server’s specification (CPU, RAM, is the disk SSD or not)?
Have you tried to tweak the MySQL config?
Have you tried to add indexes and test them, or perhaps asked a MySQL pro to help you fix it (or followed the indexes pointed out by @Edilson_Spessoto in previous posts)?
How big is your data? Roughly how many rows are in your tables? (Just an estimate, no need for exact numbers; above a million, above a hundred thousand rows… something like that.)
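If it helps, one rough way to get those estimates (a sketch assuming MySQL/MariaDB and that the playSMS tables live in a database named playsms; table_rows is only an approximation for InnoDB):
SELECT table_name, table_rows
FROM information_schema.tables
WHERE table_schema = 'playsms'   -- assumption: the name of your playSMS database
ORDER BY table_rows DESC;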
anton
Try to shut down your playSMS, then observe your MySQL to see whether it behaves the same. Is this a physical machine or a virtual one? What is the storage config and what are the physical specs? Is there any application that shares the same volume with your MySQL? It would be great if you provided your partition tables and the directories where your components are installed. And you might check whether someone else is connecting to your MySQL server; try netstat.
Anton has a point there, but we need to understand your hardware specs. Indexing is good, but I have a helpdesk system with almost 10 years of data and about 1K users, have never performed indexing, and still don’t have issues with that database server.
Just helping here.
From your post, your physical memory usage is about 60% and your swap isn’t even used, meaning you have pretty good memory utilization.
But we are not aware of your storage specs, how you manage the partitions, or on which volumes your application’s working directories are stored.
You might also check your physical storage performance and your hardware specs.
@anton, if you have the opportunity and resources, please fix this problem as well.
You can see a similar situation on a working playSMS installation.
Note: my detailed per-process observations (the sequential and individual shutdown/startup tests, the CPU load ranges, and the MySQL memory staying at 14%) are in my post above.
Regards,
Jamshid Tursunov
Most likely, some tables are missing indexes.
I’m certainly not a deep DB specialist, but all sources on the Internet point to this.
Regards,
Jamshid Tursunov
This may partially be fixed here:
That change, especially in credit/fn.php, removed the INNER JOIN SQL to tblSMSOutgoing when querying balance. The parent_uid and uid information is also saved in tblBilling now, so there is no need to JOIN 2 tables when querying balance.
For a new installation it won’t be a problem, but for an upgrade you need to alter tblBilling by adding parent_uid and uid columns, and then copy all parent_uid and uid info from each row in tblSMSOutgoing to tblBilling.
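For reference, a rough sketch of that upgrade step, assuming the live tables carry the playsms_ prefix and that tblBilling rows can be matched to tblSMSOutgoing on smslog_id (verify the column types and the join key against your own schema first):
ALTER TABLE playsms_tblBilling
  ADD COLUMN uid INT NOT NULL DEFAULT 0,          -- assumed type; match tblSMSOutgoing.uid
  ADD COLUMN parent_uid INT NOT NULL DEFAULT 0;   -- assumed type; match tblSMSOutgoing.parent_uid

UPDATE playsms_tblBilling b
  JOIN playsms_tblSMSOutgoing o ON o.smslog_id = b.smslog_id   -- assumption: shared smslog_id key
   SET b.uid = o.uid,
       b.parent_uid = o.parent_uid;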
anton
Check soon
Yours faithfully,
Jamshid Tursunov
Thanks in advance
This problem has not gone away; I think it’s mainly caused by inefficient code.
I find the codebase a bit difficult to follow because it contains a fair bit of ‘magic’ like numbers instead of constants, and use of backtrace & function name suffixes instead of explicit attachment to hooks, iteration with for instead of foreach loops, etc., so feel free to correct me on the facts, @anton.
There’s $core_config['isdlrd'] which is true by default in your config.php. This is meant to defer some work to the daemon (that calls dlrd()), otherwise it’s done immediately when dlr() is called. However, even if you set this config value to false, dlrd() is still called and still does the same work. It looks through every record in playsms_tblDLR for flag_processed is 1, even if there are never any that will match. This search is on an un-indexed column, and that table can contain anywhere from thousands to millions of records, depending on how often you send, how many recipients you typically have, and how long you’ve been using PlaySMS. The same daemon then does something similar in getsmsstatus() on playsms_tblSMSOutgoing (another full table scan, no indices). The latter table has one record for every SMS sent, fewer than the other one, but a lot. It then waits 1 second before doing this again, all day every day. The resulting CPU thrashing occurs, regardless. It can have nothing queued, no unprocessed messages, etc., but it will keep performing the checks over and over.
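If you want to confirm this on your own data, a quick check along these lines should do it (a sketch; the flag value 1 comes from the description above, and the exact condition in the code may differ):
EXPLAIN SELECT * FROM playsms_tblDLR WHERE flag_processed = 1;
-- With no index on flag_processed, the plan shows type = ALL (a full table scan)
-- and an estimated row count roughly equal to the size of the table.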
You can increase that default wait by setting DAEMON_SLEEP in etc/playsmsd.conf, but this value is also used for all the other daemons PlaySMS runs. A longer wait may unnecessarily throttle normal sending and receiving; I chose to do this anyway because the reduction in thrashing is quite significant.
Adding a non-unique index to playsms_tblDLR.flag_processed may help, but it’s up to your database to decide at query time whether to use it. It has to see that most of the indexed records don’t match what it’s looking for, so using the index to find the ones that do match will be faster than scanning the whole table. Initially, this wasn’t the case for me, as I had a huge number of records which logged the message as ready to send but unprocessed, yet later records for the same message (matching smslog_id) show them as delivered and/or failed and processed. To me, this looks like the daemon constantly re-examines old delivery statuses, looking for messages where it already has an answer and will never get a new answer because the message files are long gone. If I mark the older records as processed (because the matching later ones make me think they have been), the index becomes very helpful, and looking for unprocessed records is super-fast. I’m not 100% sure this is the right approach, but the alternative is infinite scanning of a huge table, looking for responses that I doubt will arrive again.
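A sketch of both steps under the assumptions above; the flag values here are inferred from my own data, not from the code, so verify them (and back up the table) before running the UPDATE:
-- 1. Add the non-unique index.
ALTER TABLE playsms_tblDLR ADD INDEX idx_flag_processed (flag_processed);

-- 2. Mark stale 'unprocessed' rows (flag_processed = 1 per the description above) as
--    processed wherever another row for the same smslog_id has already been processed.
--    The value 2 used for 'processed' is an assumption; use whatever your data shows.
UPDATE playsms_tblDLR d
  JOIN (SELECT DISTINCT smslog_id
          FROM playsms_tblDLR
         WHERE flag_processed <> 1) done ON done.smslog_id = d.smslog_id
   SET d.flag_processed = 2
 WHERE d.flag_processed = 1;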
For the other table, adding this index massively sped up the queries, because very few records had p_status set to 0, as needed by the daemon:
ALTER TABLE playsms_tblSMSOutgoing ADD INDEX (p_smsc, p_status, flag_deleted);
This part is gammu-specific, but when PlaySMS checks the status of messages to mark as delivered or not, it manually reads every file in the error and sent folders into some lists, then checks those lists until it finds a matching file, instead of using something like glob() to look for a direct match. It also has some odd logic regarding timestamps of old messages, which continued looking for results of old sends many months after they’d completed. I had to patch that out.
I think performance issues stem from tasks using constant, repetitive, and resource-intensive polling of both the database and filesystem, instead of being event-based. It’s probably difficult to avoid some of this because it would require using platform-specific filesystem watching tools like inotify, and considerable revision to the code to accommodate them. However, it needs addressing. One strategy could involve adding indexed(!) timestamp columns to some tables so that tasks where only new and recently-modified records are relevant could reduce the amount of data to examine.
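Purely as an illustration of that last idea (the c_updated column is hypothetical, not an existing playSMS column):
-- Hypothetical indexed timestamp column to narrow what the daemons re-examine.
ALTER TABLE playsms_tblDLR
  ADD COLUMN c_updated TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  ADD INDEX idx_c_updated (c_updated);

-- A polling pass could then limit itself to recently-touched rows:
SELECT * FROM playsms_tblDLR
 WHERE c_updated > NOW() - INTERVAL 10 MINUTE
   AND flag_processed = 1;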
In practice, I have to run lots of queries directly from cron jobs to periodically correct & de-duplicate phonebook & stoplist entries, clean the DLR table, and remove old sends (a rough example follows after the plugin snippet below). This means far less data for the daemons to churn through. I also disabled the credit, simplebilling, and simplerate plugins (because we don’t use them and they consume a lot of processing time). This can be done by prepending a . to their folder names under plugin/feature/. I then had to add a custom one, otherwise every send fails. It’s basically only this:
function mypluginname_hook_rate_cansend() {
return true;
}
You could probably add this to any enabled plugin, changing its prefix to match, YMMV. I could have changed the main sending code instead to assume that a null return from that hook meant ‘no rate plugins, so no send limits’.
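For the cron-job cleanup mentioned above, the queries are along these lines, shown purely as a sketch: the 90-day retention is arbitrary, and the p_datetime column name on playsms_tblSMSOutgoing is an assumption to verify against your schema (back up before deleting anything):
-- Drop DLR rows tied to outgoing messages older than ~90 days.
DELETE d FROM playsms_tblDLR d
  JOIN playsms_tblSMSOutgoing o ON o.smslog_id = d.smslog_id
 WHERE o.p_datetime < NOW() - INTERVAL 90 DAY;

-- Then drop the old outgoing records themselves.
DELETE FROM playsms_tblSMSOutgoing
 WHERE p_datetime < NOW() - INTERVAL 90 DAY;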
Something to make clean-up easier is to add foreign keys, so that deleting a single queue record automatically deletes its related records. Don’t blame me if you break your database, but I use these foreign keys which seem to work for me:
| Table | Relation |
|---|---|
| playsms_tblDLR | FOREIGN KEY (`smslog_id`) REFERENCES `playsms_tblSMSOutgoing_queue_dst` (`id`) ON DELETE CASCADE |
| playsms_tblSMSOutgoing_queue_dst | FOREIGN KEY (`queue_id`) REFERENCES `playsms_tblSMSOutgoing_queue` (`id`) ON DELETE CASCADE |
| playsms_tblSMSOutgoing | FOREIGN KEY (`queue_code`) REFERENCES `playsms_tblSMSOutgoing_queue` (`queue_code`) ON DELETE CASCADE |
Those foreign keys also add indices to your tables, which may speed up other queries that join on those columns. If you cannot add those foreign keys, it’s usually because there are orphaned child records that need to be deleted, e.g.:
DELETE FROM playsms_tblDLR WHERE NOT EXISTS (SELECT * FROM playsms_tblSMSOutgoing_queue_dst WHERE playsms_tblSMSOutgoing_queue_dst.id = playsms_tblDLR.smslog_id);
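For completeness, applying the first constraint from the table above looks roughly like this once the orphans are removed (the constraint name fk_dlr_smslog is my own; the other two follow the same pattern):
ALTER TABLE playsms_tblDLR
  ADD CONSTRAINT fk_dlr_smslog
  FOREIGN KEY (smslog_id)
  REFERENCES playsms_tblSMSOutgoing_queue_dst (id)
  ON DELETE CASCADE;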
I had to make those changes from the mariadb (or mysql) command line because they took a while to complete.
I don’t mean to disparage Anton’s work, it’s a large project that’s clearly had a lot of time and effort put into it, which I doubt I’d have the patience for. It mostly behaves as expected but hasn’t been written in a way that enables it to scale well.
I’m sure this seems like a big rant, but I’m just sharing that improvements are possible with good indices, avoiding processing things that don’t need it, and not keeping old data forever.
If you disable the billing plugins, you’ll also need a plugin function with something like this:
function mypluginname_hook_rate_getcharges() {
return [1, 0, 0];
}