On Tweaking Estimates

As Product Owner for the project, I get to field a lot of interesting questions. I woke up this morning to the following question from one of my teams:

Hello Mike,

In our current sprint, we have planned 8.5 user story points. But after task breakdown we can see for a couple of user stories, actual story points differ from the estimated story points. What is the correct course of action in this case? Do we need to update product backlog for revised user story points? Then we can say the team is working towards achieving 11.5 user story points! Also, since our [project gate commitment] is nearing, should we reexamine the estimates in the product backlog, and revise them for the remaining user stories?

My response is as follows:


As a rule, the reverse engineering of hours to story points is frowned upon in Agile circles. It presupposes that

  1. There is a meaningful correlation of points to hours, and
  2. The conversion factor you’re using for the reverse calculation is accurate.

In order for #2 above to be true,

  1. you have to fill out a “what is done” list that identifies all the tasks performed to deliver a typical one-point story,
  2. then you have to assign size estimates to those tasks,
  3. those estimates have to total up to a certain number of hours (the “What is a One?” value),
  4. that total has to be recorded on the “Resources” tab of the sprint backlog,
  5. AND the story you want to reverse-engineer has to be “typical”.

Statistically speaking, the hour count for sized stories is not consistent. If you call a story a “1” and your “What Is a One” total is 25 hours, you will find the actual hours spent on one-point stories are rarely exactly 25 hours. Most likely you will find a bell-curve distribution from 15 to 35 hours, with more stories clustered toward the middle than toward the endpoints. As the story sizes increase, the size of the deviation increases as well. Thus two-point stories should be expected to cluster around 50 hours, but your deviation probably gives you a 30 to 70 hour spread. Note that this overlaps the upper end of the one-point range. Three-point stories should cluster around 75 hours, but the deviation could go anywhere from 45 to 105 hours, thus extending well into the two-point arena. Using the above example, a five-point story would center at 125 hours with a range from 75 to 175 hours. Wow!
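To make the arithmetic concrete, here is a minimal Python sketch of the ranges quoted above. It assumes the 25-hour "What Is a One" total from the example and a spread of plus or minus 40% around the center, which is simply what the 15 to 35 hour range works out to. The names `HOURS_PER_POINT`, `SPREAD_PERCENT`, and `hour_range` are made up for illustration; your own numbers will differ.

```python
HOURS_PER_POINT = 25   # example "What Is a One" total from the text
SPREAD_PERCENT = 40    # assumption: +/-40% of the center, matching 15 to 35 hours

def hour_range(points):
    """Return (low, center, high) hours for a story of the given size."""
    center = points * HOURS_PER_POINT
    half_width = center * SPREAD_PERCENT / 100
    return center - half_width, center, center + half_width

for points in (1, 2, 3, 5):
    low, center, high = hour_range(points)
    print(f"{points}-point story: about {center} h, range {low:.0f} to {high:.0f} h")

# The top of the one-point range reaches past the bottom of the two-point range,
# so an hour count alone cannot tell you which size a story "really" was.
one_high = hour_range(1)[2]
two_low = hour_range(2)[0]
print("ranges overlap:", one_high > two_low)
```

The point of the sketch is the overlap, not the specific figures: any hour total in the overlapping band is consistent with more than one story size.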

As you can see, the bigger the story, the higher the uncertainty. This is why we like to see smaller stories in the backlog. Smaller stories have less uncertainty, so we can get a better projection of the end date.

You may ask yourself how we can possibly deliver a product on any sort of schedule with such variances in play. The answer lies in the law of averages. Some stories will trend higher and some lower, but on average they will cluster around the expected mean. If you think about it, there have been more than a few 3-point stories in the backlog that tasked out to far less than you guys expected, yet the team received 3 points of credit. It appears you have found two stories from the other side of the distribution curve, but that interpretation is based on a “What is a One” total that may not reflect ‘reality’.
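If you want to see the law of averages at work, the following is a rough simulation, not a model of any real team. It assumes each story's actual hours fall on a bell curve centered on 25 hours per point, with a standard deviation of 20% of that center (my assumption, chosen so most stories land inside the ranges discussed earlier). The function name and the sample backlog are hypothetical.

```python
import random

HOURS_PER_POINT = 25   # example "What Is a One" total from the text
RELATIVE_SD = 0.20     # assumption: standard deviation is 20% of the expected hours

def simulate_actual_hours(story_points, seed=0):
    """Return (expected_total, actual_total) hours for a list of story sizes."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    expected_total = 0.0
    actual_total = 0.0
    for points in story_points:
        expected = points * HOURS_PER_POINT
        # Draw a hypothetical "actual" from a bell curve centered on the expectation.
        actual = max(0.0, rng.gauss(expected, expected * RELATIVE_SD))
        expected_total += expected
        actual_total += actual
    return expected_total, actual_total

backlog = [1, 2, 3, 5] * 25  # 100 hypothetical stories
expected, actual = simulate_actual_hours(backlog)
miss = abs(actual - expected) / expected
print(f"expected {expected:.0f} h, simulated actual {actual:.0f} h, off by {miss:.1%}")
```

Under these assumptions, individual stories can miss by a wide margin while the total across the backlog stays close to the expected figure, which is why a couple of high-side stories in one sprint are not, by themselves, a reason to re-size anything.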

Reverse engineering of hours to story points is fraught with unknowns. Were all of your people there for the entire sprint? Were the same subject matter experts present during the estimation and tasking? Is the story “typical”, that is, free of special tasks that aren’t accounted for in your sample one-point story? For that matter, do these stories let you leave out a task or two that would normally be necessary? Is the work evenly distributed across all the disciplines, or is it “lopsided”, giving more for developers to do than testers, or vice versa? The variables are endless, and therefore defy a simple division problem as the means to achieve an accurate result. If you’re not convinced yet, consider this: the team made an “estimate”, they didn’t make an “actual”. So you’re compounding the entire operation with a guess; an educated guess, but a guess nonetheless.

Should you change the size estimates for these stories?
My general rule of thumb is that if the team wants to change a story point estimate, the time to change it is BEFORE they task it out. Story points are measurements of relative effort: is this story bigger, smaller, or about the same as that other story, and how much bigger or smaller is it? Once you get to hour counting, it’s not an effort estimate anymore, it’s a time estimate.

So, since you already tasked the stories, I’m inclined to recommend that you keep the story estimates as they stand, and take it as 8.5 story points for the sprint. Just assume the stories are on the high side of the bell curve.
Since you didn’t email me and say, “We’ve taken on too much for the sprint”, I assume you feel the amount of work is not beyond the capability of the team to deliver in the sprint. So, if after everything I said above you still want to alter the story totals, be advised of the following:
1) No matter what size you say they are, you still have to complete them within the sprint.
2) The size of the stories will affect your average velocity. If you adjust them up, your average will increase, and expectations will increase with it.

It’s far more likely that you have an invalid number in the “What is a One?” field. The sad fact is that we misnamed that field in the sprint backlog in an attempt to satisfy the wishes of more traditional project leads. Despite repeated statements from the agilists that there is no correlation between hours and points, this one field implies that there is one, and continues to contribute to questions like this.

Should you re-estimate the stories in the Backlog?

That’s an entirely different question. If your inclination is to change the backlog estimates because you think it is full of 3-point stories that are really 5-point stories, then you should absolutely NOT change the estimates. All that matters is that the stories in the backlog are sized consistently relative to one another. If all your 3-pointers are really 5’s, then your average velocity will come down such that you can deliver these stories within the sprints. If the opposite were true, and your 5-pointers are really 3’s, then your average velocity would increase so that you could take on more stories within a sprint. The team, the estimates, and the velocity would normalize to match one another. Playing with the numbers is just an exercise in making realities match someone else’s expectations. If you are receiving pressure to make that sort of thing happen, please tell me who is doing it, so I can have a talk with them!
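Here is a small Python sketch of why the absolute numbers don't matter as long as the sizing is consistent. It uses the simplest possible forecast, backlog points divided by average velocity, which is an assumption on my part rather than a description of any particular tool. The function name and the sample figures are hypothetical.

```python
def sprints_remaining(backlog_points, completed_points_per_sprint):
    """Forecast remaining sprints as total backlog points / average velocity."""
    velocity = sum(completed_points_per_sprint) / len(completed_points_per_sprint)
    return sum(backlog_points) / velocity

# Hypothetical team: ten similar stories left, two finished in each past sprint.
# Case A: every story is called a 3.
as_threes = sprints_remaining([3] * 10, [6, 6, 6])

# Case B: the very same stories are all called a 5 instead.
as_fives = sprints_remaining([5] * 10, [10, 10, 10])

print(as_threes, as_fives)  # the forecast is identical in both cases
```

Because the same label is applied to the finished work and to the remaining work, the scale cancels out. Re-labeling only the backlog, while velocity is still measured in the old sizes, is what would distort the forecast.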

If you’d like to spot-check the stories to see whether any of the estimates are wildly inaccurate based on new information you’ve received since the estimate was made, then please do so, and let us know what you find. However, if you intend to take team members offline to do this activity instead of working on delivering committed work, then I would, again, say absolutely not.


Author: Michael Marchi

Michael Marchi CSM, CSPO, CSP-SM, CSP-PO, RSASP, AHF Management Consultant / Agile Coach & Trainer @ 42 North Unlimited (https://42north.llc) Co-Founder and Board Member @ APLN Chicago (https://aplnchicago.org) Co-Host [here's this agile thing] podcast (https://htat.show)