Getting Started with Process Behavior Charts
Process Behavior Charts (PBCs), also known as Control Charts, are simple statistical charts that let us interpret what any process is telling us about itself, making it easier to distinguish between special cause and common cause variation. "Knowledge of Variation" is one of the 4 components of Dr. Deming's System of Profound Knowledge. This post is simply a collection of background information on creating and using PBCs that I hope will be handy as we get to future posts discussing specific examples.
A PBC consists of 2 charts, together referred to as an XmR chart. The top chart (the X chart) is a "run chart", which is simply a plot of time series data from any process (e.g. deployments per week over 6 months), overlaid with 3 horizontal lines: the mean (i.e. the average of the points) and the upper and lower "natural process limits". These lines come from very simple algebraic equations that can be easily calculated and charted in Excel. The second chart, on the bottom (labeled the mR chart), plots the moving ranges, i.e. the absolute differences between sequential points from the run chart. It is overlaid with its own mean line and an upper limit for the moving range.
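For an XmR chart, the "very simple algebraic equations" are the ones Wheeler uses: the natural process limits sit at the average ± 2.66 × the average moving range, and the mR chart's upper range limit sits at 3.268 × the average moving range. The Python sketch below computes those lines. The function name `xmr_limits`, its returned keys, and the deployment counts are hypothetical, chosen only for illustration.

```python
from statistics import mean

# Scaling constants for an XmR (individuals and moving range) chart.
NPL_SCALE = 2.66   # natural process limits: average +/- 2.66 x average moving range
URL_SCALE = 3.268  # upper range limit: 3.268 x average moving range


def xmr_limits(values):
    """Compute the X chart and mR chart lines for a time-ordered series."""
    if len(values) < 2:
        raise ValueError("need at least two points to compute a moving range")
    # Moving ranges: absolute differences between successive points.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    x_bar = mean(values)
    mr_bar = mean(moving_ranges)
    return {
        "x_bar": x_bar,                      # X chart centerline
        "unpl": x_bar + NPL_SCALE * mr_bar,  # upper natural process limit
        "lnpl": x_bar - NPL_SCALE * mr_bar,  # lower natural process limit
        "mr_bar": mr_bar,                    # mR chart centerline
        "url": URL_SCALE * mr_bar,           # mR chart upper range limit
        "moving_ranges": moving_ranges,
    }


if __name__ == "__main__":
    # Hypothetical deployments per week, in time order.
    deployments = [5, 7, 6, 8, 5, 6, 7, 9, 6, 5]
    limits = xmr_limits(deployments)
    print({k: round(v, 2) for k, v in limits.items() if k != "moving_ranges"})
```

Plotting the values with `x_bar`, `unpl`, and `lnpl` gives the top chart, and plotting `moving_ranges` with `mr_bar` and `url` gives the bottom one. The same arithmetic maps directly onto spreadsheet formulas.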
Below are some links to further specifics on constructing and interpreting PBCs.
"Process Behavior Chart" is the more modern name for a Control Chart. Control charts were originally created by Dr. Walter Shewhart in the 1920s.
See A Beginner's Guide to Control Charts for another high level explanation.
The best book with very approachable explanations (including math) that I have seen is Understanding Variation by Donald J. Wheeler.
- Or there is an even more digestible PDF by Dr. Wheeler found here: “Individual Charts Done Right and Wrong”
With respect to software delivery, another excellent book I've found specifically oriented to Kanban and measuring metrics such as throughput and cycle times is Actionable Agile Metrics (AAM) by Daniel Vacanti.
- AAM goes into additional background on PBCs and uses them throughout.
What Are Process Behavior Charts?
Ultimately, it is all about identifying variation in a system, whether that is sprint metrics or DORA metrics. The 2 types of variation to distinguish are "special cause" and "common cause" variation.
- Common cause describes points within the "control limits" on a PBC.
- Special cause describes points outside those limits.
- This is essentially what Detection Rule One below describes.
- "Common cause" (Deming) can also be referred to as "Chance cause" (Shewhart) or "Routine cause" (Wheeler).
- "Special cause" (Deming) can also be referred to as "Assignable cause" (Shewhart) or "Exceptional cause" (Wheeler).
At a high level, we want to examine special causes to see if we can reduce their frequency, since they are evidence of where our process is not in control. However, we also need to take care not to overreact: a special cause signal may instead indicate that we now have a new process, or it might even be a sign of improvement. You must examine the context to know.
This lies at the heart of continual improvement and the Plan-Do-Study-Act (PDSA) loop. We need a system to be "under control" in order to truly detect whether an experimental change has improved the system rather than thrown it out of control. Improvement could be characterized by narrowing the range between the control limits, or by shifting the overall process up or down, depending on whether higher or lower values are better in your context.
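As a rough illustration of what "narrowing the range" and "shifting" look like numerically, the sketch below computes XmR limits (average ± 2.66 × average moving range) for a hypothetical before and after period and reports the centerline shift and the width between the limits. The helper names and data are hypothetical. The comparison is only meaningful if each period is itself predictable, i.e. shows no signals, on its own chart.

```python
from statistics import mean


def natural_process_limits(values):
    """Return (lower limit, average, upper limit) for an XmR chart of `values`."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    x_bar, mr_bar = mean(values), mean(moving_ranges)
    return x_bar - 2.66 * mr_bar, x_bar, x_bar + 2.66 * mr_bar


def compare_periods(before, after):
    """Summarize how the centerline and the limit width changed between two periods."""
    lo_b, mid_b, hi_b = natural_process_limits(before)
    lo_a, mid_a, hi_a = natural_process_limits(after)
    return {
        "centerline_shift": mid_a - mid_b,  # positive means the process moved up
        "width_before": hi_b - lo_b,        # distance between the limits before
        "width_after": hi_a - lo_a,         # a smaller width means less routine variation
    }


if __name__ == "__main__":
    # Hypothetical cycle times (days) before and after an experimental change.
    print(compare_periods([8, 12, 9, 13, 8, 12], [9, 10, 9, 10, 9, 10]))
```

Whether a downward shift counts as improvement depends on the metric: lower is better for cycle time, higher is better for throughput.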
On Presenting Data...
Dr. Shewhart's 2 rules for presenting data:
- Data should always be presented in such a way that preserves the evidence in the data for the predictions that may be made from the data.
- Whenever an average, range, or histogram is used to summarize data, the summary should not mislead the user into taking any action that the user would not take if the data were presented in a time series.
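Here is a small sketch of Shewhart's second rule, using hypothetical cycle times. Both series contain exactly the same 16 values, so their average, range, and histogram are identical. Only the time-ordered view shows that the sorted series has a run of eight successive values on the same side of the average, which is the pattern Detection Rule Four below looks for. The helper name `longest_run_one_side` is hypothetical.

```python
from statistics import mean


def longest_run_one_side(values):
    """Length of the longest run of successive points on the same side of the average."""
    center = mean(values)
    longest = current = 0
    previous_side = 0
    for v in values:
        side = (v > center) - (v < center)  # +1 above, -1 below, 0 exactly on the average
        if side != 0 and side == previous_side:
            current += 1
        else:
            current = 1 if side != 0 else 0  # points on the average break a run
        previous_side = side
        longest = max(longest, current)
    return longest


# Hypothetical weekly cycle times: the same 16 values in two different orders.
shuffled = [4, 7, 5, 6, 8, 3, 6, 5, 7, 4, 6, 5, 8, 3, 7, 4]
trending = sorted(shuffled)

for name, series in [("shuffled", shuffled), ("trending", trending)]:
    print(name, "average:", mean(series), "range:", max(series) - min(series),
          "longest one-side run:", longest_run_one_side(series))
```

A summary table with only the average and range would report these two series as identical, which is exactly the kind of misleading summary the rule warns against.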
There is a ton of information and knowledge out there about this type of data visualization, especially for process control. In fact, several patterns recur no matter what type of time series data is being observed, and they have been codified as detection rules that make it easier to see (detect) the signals your system is showing you.
- Detection Rule One: A single point outside the computed natural process limits on either the X Chart or the mR chart should be interpreted as an indication of the presence of a special cause that has a dominant effect.
- Detection Rule Two: Two out of three successive values that are beyond one of the two sigma lines (both on the same side of the average) are likely to signal a moderate process change.
- Detection Rule Three: Four out of five successive values that are beyond one of the one sigma lines (all four on the same side of the average) are likely to signal a moderate, sustained shift in the process.
- Detection Rule Four: Eight successive values on the same side of the average are likely to signal a small, sustained shift in the process.
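- When you see "3 sigma from the mean" or the natural process limits in this context, take care to note that "sigma" != "standard deviation": sigma here is estimated from the average moving range (the average mR divided by 1.128), not computed as the global standard deviation of all the points.
- This is not to imply that there is anything bad or inherently wrong about using standard deviation in general. It is simply not the correct measure of dispersion for computing the limits of a PBC.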
- Further, what Jira labels "control charts" are not actually control charts.
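Here is a minimal Python sketch of the four detection rules, assuming an XmR chart. It estimates sigma as the average moving range divided by 1.128 (the bias-correction constant for two-point moving ranges), so the ±3 sigma lines land at the average ± 2.66 × the average moving range (3 / 1.128 ≈ 2.66). The function name, the dictionary keys, and the throughput numbers are hypothetical.

```python
from statistics import mean, stdev

D2 = 1.128         # bias-correction constant for two-point moving ranges
URL_SCALE = 3.268  # mR chart upper range limit = 3.268 x average moving range


def detection_signals(values):
    """Return, per detection rule, the indices of points that complete a signal."""
    if len(values) < 2:
        raise ValueError("need at least two points")
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = mean(values)
    mr_bar = mean(moving_ranges)
    if mr_bar == 0:
        raise ValueError("all moving ranges are zero; limits are undefined")
    sigma = mr_bar / D2  # sigma from the average moving range, NOT stdev(values)
    z = [(v - center) / sigma for v in values]  # distance from the average in sigma units

    signals = {
        # Rule 1 (X chart): a point outside center +/- 3 sigma (the natural process limits).
        "rule1_x": [i for i, d in enumerate(z) if abs(d) > 3],
        # Rule 1 (mR chart): a moving range above the upper range limit.
        # Moving range k sits between points k and k + 1, so report point k + 1.
        "rule1_mr": [k + 1 for k, r in enumerate(moving_ranges) if r > URL_SCALE * mr_bar],
        "rule2": [], "rule3": [], "rule4": [],
    }
    # Rules 2-4 as (name, window size, points required, threshold in sigma units).
    for name, window, needed, threshold in [("rule2", 3, 2, 2.0), ("rule3", 5, 4, 1.0), ("rule4", 8, 8, 0.0)]:
        for end in range(window - 1, len(z)):
            recent = z[end - window + 1 : end + 1]
            for side in (1, -1):  # check above the average, then below it
                if sum(1 for d in recent if side * d > threshold) >= needed:
                    signals[name].append(end)  # report the point that completes the window
                    break
    return signals


if __name__ == "__main__":
    # Hypothetical weekly throughput that appears to jump after week 8.
    throughput = [10, 12, 11, 9, 10, 11, 12, 10, 16, 17, 15, 16, 17, 16, 15, 17]
    print(detection_signals(throughput))
    mr_bar = mean(abs(b - a) for a, b in zip(throughput, throughput[1:]))
    print("mR-based sigma:", round(mr_bar / D2, 2), "global stdev:", round(stdev(throughput), 2))
```

In this hypothetical series no point falls outside the X chart limits, yet the jump exceeds the mR chart's upper range limit, and Rules 3 and 4 fire both before and after it. The global standard deviation comes out roughly twice the moving-range sigma because the shift itself inflates it, so limits built from it would be about twice as wide and far less sensitive to exactly the kind of change we want to detect.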
Nelson Rules for Interpreting Control Charts
Nelson Rules were published by Lloyd Nelson in 1984, building on the Western Electric Rules from the 1950s, and can be used with any control chart. They include the four Western Electric Rules (1-4) plus four more (5-8):
- One point above UCL or below LCL
- Two out of three points above/below 2 sigma (the two points must be on the same side of the centerline)
- Four out of five points above/below 1 sigma
- Eight points in a row above/below the center line
- Six points in a row ascending or descending (trend)
- Fifteen points in a row "hugging" the center line (between -1 and +1 sigma)
- Fourteen points in a row alternating up and down
- Eight points in a row above 1 sigma or below -1 sigma
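The four additional rules (5-8) can be sketched in the same spirit. This hypothetical helper takes the center line and sigma as inputs. For an XmR chart those would be the average and the average moving range divided by 1.128. How the windows are counted is my interpretation: six points in a trend means five consecutive rises or falls, fourteen alternating points means thirteen steps that each flip direction, and the eight-point rule follows the "either side" wording of the list above. The function name, dictionary keys, and data are hypothetical.

```python
def nelson_rules_5_to_8(values, center, sigma):
    """Return the end index of every window that triggers one of the four additional rules."""
    z = [(v - center) / sigma for v in values]             # distance from center in sigma units
    steps = [b - a for a, b in zip(values, values[1:])]    # step k goes from point k to k + 1
    signals = {"trend": [], "hugging": [], "alternating": [], "beyond_1_sigma": []}
    for end in range(len(values)):
        # Six points in a row steadily ascending or descending (five same-direction steps).
        if end >= 5:
            window_steps = steps[end - 5 : end]
            if all(s > 0 for s in window_steps) or all(s < 0 for s in window_steps):
                signals["trend"].append(end)
        # Fifteen points in a row within one sigma of the center line.
        if end >= 14 and all(abs(d) < 1 for d in z[end - 14 : end + 1]):
            signals["hugging"].append(end)
        # Fourteen points in a row alternating up and down (each step flips sign).
        if end >= 13:
            window_steps = steps[end - 13 : end]
            if all(a * b < 0 for a, b in zip(window_steps, window_steps[1:])):
                signals["alternating"].append(end)
        # Eight points in a row more than one sigma from the center line (either side).
        if end >= 7 and all(abs(d) > 1 for d in z[end - 7 : end + 1]):
            signals["beyond_1_sigma"].append(end)
    return signals


if __name__ == "__main__":
    # Hypothetical cycle times (days); center and sigma are also hypothetical here.
    cycle_times = [3, 4, 5, 6, 7, 8, 6, 5, 6, 5]
    print(nelson_rules_5_to_8(cycle_times, center=5.5, sigma=1.2))
```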
It is very useful to represent your process time series this way so you can easily see and detect special cause variation and then try to attribute it to changes or other occurrences in the environment. This is critically important for deciding how to properly react to those occurrences. Without this type of analysis, you are essentially "shooting from the hip" and acting irresponsibly.
How Many Points Do I Need?
The subtle art here, which feels very open to interpretation, is deciding when a signal marks the point at which to reset the chart and recalculate its limits because we believe we now have a new process, versus extending the existing chart because we believe we are still observing the same process.
You need a minimum of 4-6 points to even draw a chart, but 10-20 points is what we want to aim for to be confident in our interpretation of our current process.
- There is a good discussion of the number of points to use in AAM Chapter 5, "How Much Data?"
- "Most Shewhart Charts need 12 data points to establish trial limits and 20 to set a baseline."
- How many data points do I need? by David M. Williams, Ph.D.
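One way to make the "extend vs. recalculate" decision concrete is to freeze limits from a baseline (12 points by default here, following the trial-limit guidance quoted above) and then check new points against those fixed limits instead of recomputing with every point. Persistent signals against the frozen limits are the cue to ask whether you now have a new process and should start a new baseline. The function names, the default, and the data are hypothetical.

```python
from statistics import mean


def baseline_limits(baseline):
    """Natural process limits (lower, average, upper) computed from the baseline period only."""
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    x_bar, mr_bar = mean(baseline), mean(moving_ranges)
    return x_bar - 2.66 * mr_bar, x_bar, x_bar + 2.66 * mr_bar


def extend_chart(values, baseline_size=12):
    """Freeze limits from the first `baseline_size` points; flag later points outside them."""
    if baseline_size < 2 or len(values) < baseline_size:
        raise ValueError("not enough points for the requested baseline")
    lnpl, x_bar, unpl = baseline_limits(values[:baseline_size])
    outside = [i for i in range(baseline_size, len(values))
               if not (lnpl <= values[i] <= unpl)]
    return (lnpl, x_bar, unpl), outside


if __name__ == "__main__":
    # Hypothetical weekly throughput: 12 baseline weeks followed by 4 new weeks.
    weekly = [10, 12, 11, 9, 10, 11, 12, 10, 11, 10, 12, 11, 11, 13, 17, 12]
    print(extend_chart(weekly))
```

A single point outside the frozen limits calls for investigating its context; a sustained run of signals is stronger evidence that the old limits no longer describe the process.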
This reflects my current understanding both of how powerful these concepts and techniques can be in the realm of software delivery and of how to go about applying them. I'm looking forward to continuing the discussion, especially with examples as I come across them.
If you have questions or a data set you'd like to explore further, please reach out and connect.
- v1.0.0 - Jan 30, 2024