Redox logo

The art of the small pull request

Aug 16, 2021

[Written in collaboration with TC Tobin-Campbell]

Most software engineers assume it's better to make many small changes rather than a few large ones. Often, in modern organizations, this means making lots of small pull requests (PRs), rather than big ones. But, what qualifies as a “small” change? And, how should developers work effectively with many small changes?  We assumed there would be benefits, but we weren't sure what to expect.

With the goal of understanding this phenomenon better, we decided to run an experiment where we focused on making our changes as narrowly focused as possible.  Here we’ll walk through what we did and relay some of the lessons learned along the way. 

Background

Coming out of one of our sprint retrospectives (a meeting where we discuss the good, the bad and the ugly of our previous work cycle), we noticed that some of our PRs sat idly in our team Slack channel for a day or two without a reviewer. While this wasn't a huge deal, we felt that there was room for improvement. We also noticed that some of those idle PRs were a bit on the larger side, which got us questioning whether PR length was causing potential reviewers to procrastinate.

For those who are not familiar, a pull request (PR) is a git based workflow tool used to view how a developer's work will change the existing code.  At Redox, we require that every code change that makes its way to production to be reviewed by at least one other engineer that is not the primary author. If it's approved, the code can be deployed. Otherwise, the PR can serve as a platform for comments and revisions. This process is known as a PR review. Here we’ll define a ‘Small PR’ in somewhat abstract terms - as a change-set with a singular purpose.  Later we will break down the small PR into a few distinct categories.

To get a better sense of how we will measure efficiency gains, we should probably define velocity. Velocity at Redox is a team metric, defined by the number of story points completed in a sprint.  We work on two-week cycles, and estimate the size of tasks using Fibonnaci story points (1, 2, 3, 5, 8, etc.). For the purposes of this experiment, we divided our team velocity by the number of developers to get a baseline “individual velocity".  This came out to 0.6 story points per day.

Hypothesis

Our hypothesis was simple. We felt that if we broke our changes into small PRs, we could

  • Increase our velocity.

  • Improve our overall satisfaction with our work. This of course is a much more subjective measure, but we thought it would be worthwhile to share our qualitative findings.

In order to test this, two engineers (TC and I) focused on breaking our work into very small change-sets. In some cases, these PRs made *no change* to system behavior, but moved the technical implementation of our project forward nonetheless. 

Observations

For 6 business days, we worked together on 1 service. We added some new functionality, as well as refactored substantially to alleviate some existing tech debt.  The high level summary is that over these 6 days we

  • Merged / deployed 46 PRs to production

  • Consistently increased the number of PRs we were creating and merging per day

  • Consistently reduced the amount of time any given PR was open

  • Beat the expected story point velocity by a factor of 1.5

Outcomes

Recall that there were two main objectives we set to test when conducting this experiment: (1) Increase our velocity (2) Improve our overall satisfaction with our work.

Small Pull Request Velocity

Our baseline “individual velocity" was measured at 0.6 points per day. For 6 business days, the two of us would have been expected to produce roughly 6 x 0.6 x 2 = 7.2 points. 

During this experiment, we completed 19 story points.  The improved velocity is a tangible result from the experiment. However, we acknowledge this contains small sample size and a sampling bias - we were pretty excited about this idea from the get-go!

The more interesting benefit came because of the interaction we had from the frequent reviews. These seemed to fundamentally improve our communication about the code.

The Joy of Pushing: Psychological Benefits of Small Pull Requests

1 - The art of the - boZvjzGCF5B1QwKoKXTcPNJou-4jQFStf0RPpE46dstl87ngeIBspTAFJ17QQki-lnvgBHLrI1gx4m2_ckKQjvzAaU--fTSyg3PK29XQTaFMqZFRms1Pj6CA5XLNOx-1IvbSL8Yd
A PR with passing checks, just waiting to be merged

One of the first things we noticed was that it felt good to get any PR - no matter how small - out to production.  It felt like a dopamine rush to hit that beautiful green button once all the green check-marks were in place.  We had PRs for single functions, and their associated tests. Heck, we had PRs to enable environment variables.

That may sound silly, but pushing chunks of code this small has practical benefits.  On the surface, some of these are obvious. Tiny PRs are very easy to review and don’t take much time. They are easy to test, and easy to reason about.  Conversely, large PRs are daunting to open, and they often sit in a channel without anyone claiming a review (recall this was the initial impetus for this experiment).  

More importantly, tiny PRs also allow for feedback on a micro-scale.  Things that would sometimes get glossed over as an implementation detail in a larger PR become a point of emphasis in a smaller one.  Whether it’s a tip on mocking in a unit test, or finding the most elegant way to map-reduce a data-set, we found these exchanges enjoyable and instructive.

On a more structural note, we noticed that once we were in the groove, PR reviews became more forward-looking than before.  Since any given PR being reviewed was likely going to be immediately followed by another, many code reviews became previews for what the next one would entail.  We found ourselves saying things like -

“This column will be used for X”.  

“This function will also handle Y once Z is in place.”

We were planting seeds in the mind of our reviewer for what was around the corner, which led to discussions on higher level strategy.  These “transitory” conversations between PRs became an important take-away from this exercise. They also created a new type of flow.

Dual Developer Flow: Small Pull Requests as a type of Pair Programming

Typically, when you think of the concept of flow, you think of someone in an individual zone producing at their max.  The “small-PR flow” is different because you have a partner. Once you know exactly what your partner is working on, and vice versa, you create a strong shared context about what’s going on in each other’s code.  In this state, you can move quickly because you are in sync at a high level, and can focus on simply implementing the small, task that is the immediate function in front of you.

For example, if one of us created a scaffold PR (defined in the flavors section below), and the other bought in, for the next few PRs, both developers could work on one of the functions within the scaffold with little primer.  The initial scaffold may warrant a conversation, but once that’s in place, the rest of the day could sail by rather smoothly without further voice interaction. In some ways, this process felt like pair programming, though we were seldom in situations where one of us had to watch the other write code on a call.

Challenges

The Yin and Yang of having a coding partner

Our dual-flow workflow also presented some challenges.  It worked well for us because we were both available for large portions of the day, at the same time.  In order to make forward progress, we essentially needed our pairing partner to be available to perform reviews, ask questions, etc.  When this wasn’t the case, a backlog of PRs would accumulate until the person working was forced to switch off to something else. This would have become especially pronounced with timezone differences between teammates.

Conflicts and Build Issues

Another (expected) challenge we ran into was the number of conflicts we experienced when working in the same area of the code.  However, we found a few tricks to avoid some of these.

Git doesn’t always do a great job discerning whether two new segments of code in the same area are a conflict, or if they should be added sequentially.  To combat this, we learned to scaffold our modules to create clear boundaries (opening / ending braces, as well as function names) that prevent conflicts.  

The “imports” sections of commonly used files were another common source for git conflicts. This was especially true when VSCode’s linting rules force you to alphabetize your imports, causing frequent shuffling.  To reduce these types of conflicts, it sometimes helps to try to import whatever libraries you foresee needing, and add these to scaffold PR. This is trickier to eliminate, since you don’t always know what dependencies you will need until you write the code.  

The final challenge we ran into was generating a ton of builds in the same service. With our continuous integration system checking tests on every commit, we sometimes caused a number of builds to queue up. These issues were certainly manageable, but something consider when adopting this pattern for an entire team or development organization. 

Pull Request Flavors

46 PRs is a lot of PRs on one service in a small amount of time. Looking back, we found that we were able to group our PRs into 1 of 4 categories.  This list is surely not comprehensive for all potential PRs that could exist, just the ones we observed.

  1. Migrations

Creating separate PRs for all database migrations is generally good practice, even outside of this experiment. Deployments are not necessarily atomic (especially in multi-container environments), so if you mix code changes with database changes, it can be very easy to accidentally get into a state where the running code and the database state are not in sync.  We found that a migration PR could be tested and reviewed in just a few minutes.

  1. Refactors

Refactor PRs made up a large chunk of the work we needed to do. We found a couple of strategies that helped us minimize risk and PR size.

Get your tests right first

Before we changed any production code, we first made sure that we had some good tests. Specifically, we wanted tests that would stay the same throughout the refactor. Ideally, if we broke behavior at any point during the process of refactoring, the tests would start to fail. This required something more than unit tests, so we created some integration level tests at the boundaries of the affected modules.  Once these guard-rails were in place, we felt confident modifying production code. 

Make incremental changes

It can be tempting to inadvertently make a massive PR when doing a refactor. Changing one thing leads to having to change another thing, and so on. In the spirit of this experiment, we did our best to fight this temptation.  When a series of functions are linked together in a cascading fashion, we tried to modify the ‘leaf’ function first.  In other words, we tried to only change one function signature at a time, if possible.  This was easier said than done in practice, but got easier with practice.

  1. Scaffolding PRs

One practice we really started to enjoy was creating code scaffolding in a PR. One way to think of this is as a module outline.  Towards the beginning of this experiment, scaffolding originally manifested as a pairing session. One of us would sketch out pseudo-code, and the other would ask questions on a screen-share.  However, we quickly realized we could replace this process with a PR.

We touched on this a bit earlier, but this can be done by listing all of the potential functions in a module, and stubbing out their signatures without implementation.  This kind of PR allowed us to get on the same page on the overall structure of the new functionality, before getting bogged down in the details of any given function.  It also provided an opportunity for function implementation to be worked out in parallel.

2 - The art of the - c1FOqN7OUauIRh5ewxD6m8ITD5AJF0rT97jCI3JYYUvkvy1dOEpDBumGMLzGu9mPfd5T9fP5RfjujpNOKmPo3mPctuH6rH_9stel9r3msrnWekjZccFysyNEuEioKZHpmIwnyAcQ
Scaffold PR example

Within the scaffold, we might have some commented-out pseudo-code within the braces, but not much else. As the project went on, our ideas for the implementation sometimes fluctuated from this original pseudo-code. Despite this fluctuation, the underlying skeleton of the code remained pretty consistent.

  1. Application Behavior Changes

These are the changes that actually affect the product. The big upshot from all the prior work is that the application behavior changes became much less scary.  After getting our database right, completing any necessary module refactors, and setting up our new code structures, we were able to make behavior changing code PRs with minimal lines, and at a high level of abstraction.  These final application changes would entail invoking one of the new or re-factored modules, or calling the function(s) with newly supported parameters.

Conclusion

At Redox, we really value iterative process adjustments that aim to improve efficiency and enjoyability of our day-to-day work. Sometimes the seeds for these changes start as a simple retrospective discussion question, and evolve into experiments that can shape the nature of our work in future sprints.

Despite the challenges in this experiment, we both felt that working with small PRs was fun and more efficient than the status quo. We learned that using this technique can increase the turn-around time for PR review, which was one of the original motivations for this experiment. We also learned techniques to use PRs to enhance asynchronous communication flow across a pair of developers.

Of course, there are still open questions. Would a small PR workflow beneficial on projects that don’t necessarily need more than one person’s focus?  Also, is this flow possible with more than 2 people, or is 2 really the optimal number?  There’s certainly more room for further experimentation in this realm.