When a job tracker receives a map reduce job, it will divvy out tasks to several task trackers in order to complete the job. If any of those tasks fails for whatever reason (perhaps they threw an exception), then it’s up to the job tracker to restart the job on another slave. This process can occur up to three times before the job tracker gives up. But what happens if a task doesn’t fail, but it doesn’t succeed either? What if it just hangs? Perhaps that map task received an extra large or extra tough block to work with. Maybe some other application on that task tracker is running and it’s hogging the entire CPU. Maybe the task tracker has entered an infinite loop. Either way, the task tracker continues to check in from time to time, which prevents it from being killed outright, but it just isn’t finishing. The job tracker can’t possibly know why this task tracker is taking longer nor can it know when or if it will finish. What does the job tracker do?
Without shutting down the first task tracker, it goes to another task tracker and gives it the same job. Then it’s a race. Whoever finishes first is the one that gets to submit its results. The other is killed (a most cutthroat race). That’s it.
Speculative execution isn’t always appropriate. In fact, some people recommend that you disable it for reduce jobs entirely. Why? Because while map tasks tend to be very evenly distributed since each task is responsible for one block of data or less (except when a large file is flagged as non-splittable, but that’s another story), reduce tasks can suffer from data skew. If one of the keys produced from the map tasks has a million values associated with it while all the rest have only a thousand, then one reducer is just going to have to deal. It will take longer than the others and spinning up speculative reducer is just going to bog down another task tracker with a million-value reduce task.
There are other times when speculative execution is a bad idea. Take, for instance, Sqoop. When it’s importing a database into hadoop, speculative execution will do nothing more than mess everything up for everyone. If speculative execution is allowed to occur, then whenever a task is taking a comparatively long time importing a table, another, duplicate task will be started up that will import the same set of data. This will result in, well, duplicates. If you have a map reduce task where speculative execution makes no sense, you can disable it through the configuration. The
mapreduce.map.speculative property disables map task speculative execution while
mapreduce.reduce.speculative disables it for reduce tasks (or
mapred.reduce.tasks.speculative.execution for old versions). In the Sqoop example, the speculative execution is hard-coded to be off, even if you turn it on through your configuration or through the command line.
Do you use speculative execution? Can you think of other times in which it’d be a bad (or a wise) idea?