Scheduling tasks, from the web server to the database

Although it sounds simple, trust me: it has always troubled me.

OS-level cron is a good choice, but it has no protection against multiple executions. Imagine you have scheduled a job every 10 seconds, but the logic inside it (e.g. a curl call) can take longer than 10 seconds, so executions pile up. In the worst case, when the job follows up with an ‘insert’ query, the overlapping runs can cause duplicate rows.
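Cron’s finest granularity is actually one minute, but the overlap problem is the same there. Nothing in a plain entry like this (the URL is made up for illustration) stops a slow run from overlapping the next one:

```
# Runs every minute; if an invocation takes longer than a minute,
# cron starts the next one alongside it and the inserts double up.
* * * * * curl -s https://example.com/cron/import-records
```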

The MySQL Event Scheduler, available since version 5.1.6, is a database-layer implementation, and there are ways to keep its events from contending with one another: https://www.percona.com/blog/2015/02/25/using-mysql-event-scheduler-and-how-to-prevent-contention/
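One common guard is a named lock via GET_LOCK(); a minimal sketch, with a hypothetical sessions table and made-up names, where a run simply skips itself if the previous run is still busy:

```sql
-- The scheduler is off by default; turning it on needs the right privilege.
SET GLOBAL event_scheduler = ON;

DELIMITER $$
CREATE EVENT purge_old_sessions
ON SCHEDULE EVERY 10 SECOND
DO
BEGIN
  -- GET_LOCK() returns 1 only if we got the lock; with a timeout of 0,
  -- a run whose predecessor is still holding it skips this execution
  -- instead of piling up behind it.
  IF GET_LOCK('purge_old_sessions', 0) = 1 THEN
    DELETE FROM sessions
    WHERE last_seen < NOW() - INTERVAL 1 HOUR;
    DO RELEASE_LOCK('purge_old_sessions');
  END IF;
END$$
DELIMITER ;
```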

A database-layer scheduler is useful if you only want to modify data, without any ‘external’ action such as sending email or calling another php script. It is also handy for databases hosted on AWS RDS, where you have no OS shell on which to set up cron.

Gearman is a totally different tool, but it is related: http://gearman.org/. From its homepage:

Gearman provides a generic application framework to farm out work to other machines or processes that are better suited to do the work. It allows you to do work in parallel, to load balance processing, and to call functions between languages. It can be used in a variety of applications, from high-availability web sites to the transport of database replication events.

Imagine crawling multiple websites based on records captured today. Gearman helps you distribute the tasks evenly across the available workers, turning a ‘serial’ process into a ‘parallel’ one, as in the sketch below. The speedup helps you avoid the pile-up situation mentioned above. However, it still does not protect against multiple execution.
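A minimal sketch of that fan-out, assuming the python-gearman package and a job server on localhost:4730; the task name fetch_url and the URLs are made up. The worker and the client run as two separate processes:

```python
import gearman

# --- worker.py: run one of these per machine/core you want crawling ---
def fetch_url(gearman_worker, gearman_job):
    # job.data carries the URL the client submitted.
    print('fetching %s' % gearman_job.data)
    # ... fetch the page and store the result here ...
    return 'done'

worker = gearman.GearmanWorker(['localhost:4730'])
worker.register_task('fetch_url', fetch_url)
worker.work()  # blocks, serving jobs forever

# --- client.py: submit one job per captured record ---
client = gearman.GearmanClient(['localhost:4730'])
urls = ['http://example.com/a', 'http://example.com/b']  # hypothetical
jobs = [dict(task='fetch_url', data=u) for u in urls]
# The job server spreads these across every connected worker,
# so the crawl runs in parallel instead of one URL at a time.
client.submit_multiple_jobs(jobs, background=False,
                            wait_until_complete=True)
```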

For now, I find this article very helpful: http://bencane.com/2015/09/22/preventing-duplicate-cron-job-executions/
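The gist of it is file locking. With flock(1), for instance, each run must grab an exclusive lock before the command starts, and -n makes a run that finds the lock already taken exit immediately instead of queueing; the lock file and script path here are hypothetical:

```
# Only one instance can hold /var/run/import.lock at a time;
# -n means the losing run gives up rather than waiting.
* * * * * flock -n /var/run/import.lock php /usr/local/bin/import.php
```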