Antares Trader Blog

The universe at your fingertips

Towards a Better Queue Processing Model

Wednesday

Mar 17, 2010

12:43 pm

My current short-term area of interest is in background job queues such as Resque (Git Hub Repository), Delayed_job and my own contribution Updater. So when I ran across this article about dynamically assigning jobs to queues it got me thinking about why there is a need for different queues, and how jobs get organized. I realized that it is part of the problem I am trying to solve in Updater.

Lets start by stating the obvious: job queues are intended to intelligently apply computational resources to jobs that need them. part of that intelligence is in placing a job in an appropriate queue, but is that really necessary? I do not have GitHubs experience with huge workloads, so take this informed speculation. However, it seems that the primary use of separate queues is to keep long-running or unimportant jobs from blocking the resources needed for high-priority and fast jobs. It seems like a queue processor could be developed the would treat job scheduling as an optimizing problem and more dynamically assign jobs to resources.

Better Queue Control

In Updater, I have tried done two things that allow for better assignment of jobs, both of which it seems might be expanded upon in order to eliminate the need for separately assigned queues with distinct worker processes. The first is to maintain a master process that is partially responsible for deciding when jobs get run. Delayed_job starts a set of independent processes that all try to eat of the same queue. It a model that I picture a bit like feeding ducks in the park. Enqueued jobs are like breadcrumbs tossed at the flock, and the workers are like ducks all scrambling to get the crumbs. Updater and Resque on the other hand make the queue look more like a lunch counter. The person behind the counter simply hands out food to whoever is next in line and does not worry about how the line is formed. This eliminates most collisions, but still gets bogged down if there are long running jobs in the queue that block other faster or more important jobs.

My thought was instead of simply forming separate lines at the lunch counter, what if we could be more intelligent about how the line was managed. Jobs could have values (either computed or assigned by a programmer) that represented the cost of delaying the job (priority) and the cost of running the job. The goal of a process optimizer would be to spend computer resources as efficiently as possible.

These dynamics are important to me because, unlike GitHub, I anticipate Antares Trader will need an unbounded variety of jobs processed. GitHub and many other applications of job queues use a limited set of job types whose characteristics are known ahead of time. This situation allows for some really amazing hand optimization such like what was mentioned in the Kabisa Blog article linked to above. But for applications where the number and variety of jobs is much larger and likely to expand and change with the evolution of the application I can begin to see the need for more intelligent queue handling.

None of this is built yet, but the design is there to allow me to develop it once the basic library is stable.

Fine-Grained Jobs

The other advantage that I am trying very hard to build into Updater is the ability to break Jobs down into smaller pieces. If you think about what a job queue looks like, it is very similar to the dynamics of a web front end: "(Users|Jobs) are trying to access (REST|computational) resources. A finite number of servers are available to handle those transactions and while handling one (User|Job) others are blocked waiting."

Anything that can be off-loaded from the request will increase throughput. Updater has the concept of chained jobs. That is Jobs that are run as part of another job. The idea then it to break bulky jobs into smaller pieces that can be scheduled independently. By making this an easy and obvious solution we give even a relativity dumb queue manager a better chance of working through all the jobs without becoming excessively blocked up.

Final Thoughts

At the 2009 ICFP, Guy L. Steele gave a talk, "Organizing Functional Code for Parallel Execution; or, foldl and foldr Considered Slightly Harmful" (video) about how our data structures (particularly cons lists) force serial processing on us. His suggestion was to use data structures that could be "split" into left and right halves and processed in parallel then reduced. This talk deserves its own post, but I wanted to mention it here sense it plays in so nicely. Priority queues and job queues suffer the same effect of a linear data model effecting how we process the data. There is more work to do here.

edit delete