Running a tight ship

05 May 2024 - Jedd Campbell

In the past year our software development team has grown from two developers to six. The system we’re building has increased in scope, scale, and complexity. Our clients are larger and more vocal as they lean more and more weight on our platform. We need to adapt our thoughts and processes to match.

The problem space

Things aren’t scaling linearly. Our team, client base, and product are all growing at the same time. A larger team means more hands on deck, but also more communication complexity. A bigger client base means more revenue, but places a higher demand on support. Our product has more features than ever, but its surface area is huge and it requires way more maintenance.

Communication and Team Dynamics

It wasn’t that long ago that we had only three people involved with development communication: myself, Chris, and Riaan (two developers and the CEO). With three people in the loop we had only three unique communication channels (Jedd/Riaan, Jedd/Chris, Chris/Riaan). Now, with seven people in the loop we have 21 unique communication channels (not going to list them out). Communication is way more complex now. Chris, Riaan and I know the system quite well, but we now have four additions to the team who are still learning the ropes. We’re spending more time communicating than ever before. The team now has three times the amount of code to review. Chris and I each used to write about 50% of the code; now we each write closer to 16.67% (for now we write proportionally more, but this should trend to 16.67%). If everyone produces a similar amount, the code written by someone other than me grows from one developer’s output to five developers’ output. That means I have to spend time familiarizing myself with roughly five times as much unfamiliar code as before, or else progressively lose sight of more of the system.

We need to keep more people in the loop, and keep track of more work.
There are more people to train and get up to speed.
There’s way more code to review.
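
The channel counts above come from counting unique pairs: n people in the loop have n(n-1)/2 one-to-one channels. Here is a minimal Python sketch of that arithmetic and of the code-share figures. The function names are made up for illustration, and the second function assumes every developer writes roughly the same amount of code.

```python
def communication_channels(people: int) -> int:
    """Number of unique one-to-one channels between `people` participants."""
    return people * (people - 1) // 2


def share_written_by_others(developers: int) -> float:
    """Fraction of the code written by someone other than a given developer.

    Assumption (not a measurement): every developer contributes equally.
    """
    return (developers - 1) / developers


print(communication_channels(3))  # 3
print(communication_channels(7))  # 21
print(round(share_written_by_others(2) * 100, 2))  # 50.0
print(round(share_written_by_others(6) * 100, 2))  # 83.33
```

Each person who joins adds as many new channels as there are people already in the loop, which is why the count grows faster than the team does.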

System Scope and Complexity

Our system has also increased in scope, and work has slowed down significantly. Our features have become interlinked. Every change we make impacts a larger part of the system. For example, when making a single change to our checklists feature we now have to ask how that change affects reports, tasks, notifications, contractors, forms, and the mobile application. If we make a mistake, it has the potential to affect every other feature that it’s linked to. Each time we take a shortcut, name something poorly, rush a function, or fail to properly think through a database design, we make it harder to maintain for the next person who comes along.

Our product’s surface area is increasing rapidly, and with it the maintenance cost.
Making changes is more complex and impactful.
Our technical debt is piling up and increasing in cost.
One unit of work is taking more units of time to complete.

Clients and Support

Our client base has also grown, and each client has different needs and priorities. These needs sometimes emerge in a seasonal sort of way. For example, there might be a deadline for safety file submissions, and so for a few weeks companies are focusing on safety files. Naturally, there will be an uptick in urgency and frequency of safety file tickets. So all of a sudden we need to devote more attention and resources to the safety file feature. Some companies do tons of checklists, others safety files, or risk assessments, and so we get more requests from them on those specific features. As our clients and their needs grow, we’re forced to adapt and improve our system with them.

We’re spending more time on client requests and support.
The increased activity is exposing poorly optimized areas of our system.
Every bit of additional integration significantly increases system complexity.

Project Management

Writing technical documentation is a pain. But you know what else is a pain? Not having technical documentation. A few months ago we didn’t really need it, so we didn’t cultivate the habit or allocate any time to writing documentation. Now we need to play catch-up. A year ago two developers could handle the workload. Project management was an afterthought. Now we need a way to prioritize work and allocate resources. We need to formalize how we do things.

Project Manager

We need a project manager. We need someone who can focus on the project as a whole, from code quality to customer satisfaction. Someone who is responsible for planning and monitoring the project, and charting its trajectory. This feels like a very corporate thing to implement, but at some point we need to do it. We can’t all keep tabs on the whole project, and it’s only going to get more difficult to manage.

Who is it going to be?
What are their responsibilities?

Architecture and design

Currently, we do most of the design work in our heads. As time goes by, we forget why we made certain decisions, or implemented something a certain way. The original vision for a feature fades, and we’re left with an approximate version of the original plan. How close was it to the original plan? No idea. This also makes knowledge transfer a lot more difficult. Each developer needs to rediscover how a feature works, and why it works that way. We start to lose track of our system as a whole. We need to spend more time on the design process. Without a design process, each person has a different idea of how a feature should work. The developer doing the work might have the wrong idea, and spend a week doing work that needs to be refactored immediately. Spending a day on design might feel like a waste of time, but in the long run it’ll speed us up. It’s way easier to catch issues in the design phase and change a diagram than it is to rework code. It’s about improving the way we communicate ideas so that there is more clarity between everyone involved. It’s not about creating rigid specifications that we stick to no matter what.

We need to formalize the design process.
We need to map out our current architecture.
We need to map out each feature of the system.

Time management

Software development requires focus. As our team grows, there’s more code to review, more discussions, more rubber ducking, more questions to ask and answer. If we don’t properly manage our time, a day can become too fragmented to get any meaningful work done. We need to structure some of the day so that we can plan better, and have more uninterrupted time to focus. We also need to formalize our development cycle and when we review code, test/demo that code, and deploy to production.

We need to allocate specific time for questions.
We need to allocate specific days/times for code review.
Implement code demos after code review.

Feature evaluation

We have more than 30 features on our system. Each feature could be its own standalone product. Sometimes we ship a feature, and then don’t really follow up on it for months (or years). Are people using it? Does it work well? Does it solve the problem as intended? How much friction does it add? As developers, we aren’t always capable of sliding into the user’s shoes and seeing the system from their perspective. And we also don’t do a good job of seeing the feature from the perspective of our fellow developers, who will have to help maintain the code we just wrote.

We need to evaluate each feature from a user perspective, and a developer perspective.
We need to review the workflows and iron out kinks.
We need to standardize features so that they don’t handle like completely different systems.
Let’s evaluate consistency, complexity, importance, user rating, dev rating.
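
One way to make that evaluation concrete is a simple scorecard per feature. The sketch below is only a starting point. The five criteria come from the list above, but the 1 to 5 scale, the `health` average, the sort order, and every score in the example are placeholders invented for illustration, not real ratings of our features.

```python
from dataclasses import dataclass


@dataclass
class FeatureScore:
    """One row of a hypothetical scorecard; every score is 1 (poor) to 5 (good)."""
    name: str
    consistency: int
    complexity: int  # 5 = simple to maintain, 1 = very complex
    importance: int
    user_rating: int
    dev_rating: int

    def health(self) -> float:
        """Unweighted average of the quality scores (importance is excluded)."""
        return (self.consistency + self.complexity
                + self.user_rating + self.dev_rating) / 4


def review_order(features: list[FeatureScore]) -> list[FeatureScore]:
    """Most important features first; within a tie, least healthy first."""
    return sorted(features, key=lambda f: (-f.importance, f.health()))


# Placeholder scores, for illustration only.
features = [
    FeatureScore("checklists", consistency=3, complexity=2,
                 importance=5, user_rating=4, dev_rating=2),
    FeatureScore("safety files", consistency=2, complexity=2,
                 importance=5, user_rating=3, dev_rating=2),
    FeatureScore("notifications", consistency=4, complexity=4,
                 importance=3, user_rating=4, dev_rating=4),
]

for feature in review_order(features):
    print(feature.name, feature.health())
```

Even a rough scorecard like this gives us a shared order in which to review features, instead of whichever one happens to be generating tickets this week.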

Big picture

Where are we going? How are we doing? We ask those questions infrequently, if at all. We’re bombarded with bugs and requests, so sometimes we get the idea that it’s not going well. We get tunnel vision on the specific part of the system that we’re engrossed in. Then we move on to the next bit. We need to take a step back and ask some of the bigger questions. Are we where we want to be? If not, how do we get there?

Ethos

What are the guiding principles that we can embody in our role as developers on this specific project? We want our software to be safe, secure, performant, and valuable to our users. We want our software to solve real-world problems, not exacerbate them. We want our codebase to be clean, maintainable, and well-written. To borrow from TigerBeetle’s TIGER_STYLE.md, we want our project to be an intersection of engineering and art.

We need to formalize our developer ethos.
Our priorities should be: Safety > Performance > User XP > Dev XP.
We should write code that we’re proud of.
Ego should not get in the way of code quality.

Technical debt

In software development, a project can accrue technical debt every time a short term solution is implemented instead of a long term one. Sometimes this choice is deliberate, but at some point that debt needs to be paid off. Our project has a fair amount of technical debt to pay off. The longer we leave it, the more costly it gets (somehow, it comes with interest). This is something we need to be aggressive with, as it’s only going to get worse. Our productivity is already starting to decrease as more time is spent paying off interest rather than paying down the principal of our technical debt. To get out of this cycle we need to take an even larger productivity hit so that we can make some bulk payments. We need to slow down so that we can speed up.

Lack of documentation

Writing documentation adds a lot of overhead to software development, at least in the short term. Developers are notorious for not wanting to write documentation, or writing it poorly. It’s a skill that takes time to develop and integrate into your day of coding. It takes time to map out the requirements, draw up the architecture and design, and document the code for your fellow developers. And keeping documentation up to date is tough. It’s way more fun to dive into the code and emerge with a finished feature, no documentation required!

We’ve put thousands of hours into this project, and all the knowledge and effort buried within is wasted if it can’t be transferred to other developers. Imagine if other systems like Stripe, or Laravel, or GitHub had no documentation. They’d be inaccessible to us. We’re clipping our own wings by failing to write good, readable, maintainable documentation.

We need to craft a style guide and template for our documentation.
We need to pay off the debt and document existing features, starting with the most important parts of our system.
We need to include documentation in the workflow of new features, fixes, and patches.
Documentation needs to become part of our developer culture.

No automated testing

Like documentation, writing automated tests adds overhead to the development process. It forces us to think more clearly about our code. We’ll need to be succinct and follow good coding practices. Code can be so poorly written and designed that it’s basically untestable. The technical debt we’ve accrued here is two-fold. First, we need to rewire our brains and learn how to write testable code. Second, we need to trudge through existing code, some of which may be nigh untestable. We’ll need to add tests where we can, refactor where we cannot, and write tests for all new code going forward. In the long run, we’ll have cleaner code, and an automated way of testing it.

We need to learn what it means to write useful tests. From this we can create a test guide.
We should start by writing tests for our most critical code.
We need to write tests for all new code going forward.
We need to pay off debt by writing tests for the rest of our code.
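
To give a feel for what "testable code" means, here is a small Python sketch. The function and its due-date rule are hypothetical, not taken from our codebase. The point is that the current date is passed in as an argument instead of being read from the system clock inside the function, so a test can pin it to any value.

```python
from datetime import date


def is_overdue(due: date, today: date) -> bool:
    """Pure function: the same inputs always give the same answer.

    `today` is a parameter rather than a call to the system clock,
    which is what makes the function easy to test.
    """
    return today > due


def test_is_overdue() -> None:
    due = date(2024, 5, 10)
    assert is_overdue(due, today=date(2024, 5, 11))      # one day late
    assert not is_overdue(due, today=date(2024, 5, 10))  # due today
    assert not is_overdue(due, today=date(2024, 5, 9))   # still early


test_is_overdue()
```

A version that read the clock internally would give different answers on different days, and the only way to test it would be to fake the clock. Small design choices like this are most of what separates testable code from untestable code.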

Refactoring

This is the housekeeping part of software development. It usually involves changing the underlying code in some way without changing the functionality, usually to improve its maintainability or flexibility, or to remove code smells. A code smell is not usually a bug, but rather a sign of poor craftsmanship. It’s like a poorly formed sentence: it might be grammatically correct, but it is difficult for the reader to understand. Easy reading is hard writing. This is true for both linguistics and code.

We have a list of refactoring that needs to be done. We need to chip away at it.
We need to set the standards for code quality, and not be afraid to call each other out when a piece is sub-par.
Sometimes we need to ship code ASAP. If refactoring debt is deliberately incurred, that needs to be added to the list and dealt with.
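
As an illustration of changing the code without changing the functionality, here is a hypothetical before and after in Python. The ticket fields and priority names are invented for the example. Both versions return the same result for every input, and the loop at the end checks that, which is the kind of safety net automated tests give us while refactoring.

```python
from itertools import product


def ticket_priority_before(ticket: dict) -> str:
    # Original shape: nested conditionals hide the three possible outcomes.
    if ticket["is_bug"]:
        if ticket["blocks_client"]:
            return "urgent"
        else:
            return "high"
    else:
        if ticket["blocks_client"]:
            return "high"
        else:
            return "normal"


def ticket_priority_after(ticket: dict) -> str:
    # Refactored: flat checks make each outcome visible at a glance.
    if ticket["is_bug"] and ticket["blocks_client"]:
        return "urgent"
    if ticket["is_bug"] or ticket["blocks_client"]:
        return "high"
    return "normal"


# Behaviour must be identical for every combination of inputs.
for is_bug, blocks_client in product([True, False], repeat=2):
    ticket = {"is_bug": is_bug, "blocks_client": blocks_client}
    assert ticket_priority_before(ticket) == ticket_priority_after(ticket)
```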

Further reading

Technical debt is an interesting subject in software development. It’s one of the reasons software projects often miss deadlines and run over budget. The Wikipedia entry on technical debt is worth reading for an overview of how it’s incurred, the consequences, and how to pay it off.

Questions we need to answer

We’ve got tunnel vision. The biggest concern in a given day is the code we need to write to meet the next deadline. There’s nothing asking us to step back, look at the bigger picture, and see how it all fits into the plan. When we enter crisis mode, it becomes painfully obvious that we didn’t have the bigger picture in sight. Without our long term goals in mind, we lose sight of what’s important and each problem becomes more urgent than the last. These are just a few questions we should be asking regularly, and we need to continuously refine the list:

Questions about our system

How much technical debt have we accrued?
What are our baselines (safety, performance, experience) and how far off are we?

Questions about our clients

How do our clients feel about our support?
How do our clients feel about our system?
What can we do to improve our reputation and client trust?
What’s our filter for accepting suggestions and requests from clients, so that we’re not saying yes to absolutely everything?

Questions about our team

Is our team working on the highest impact items on the list?
How can we reduce our team’s stress?
What’s hindering our team’s performance the most right now, if anything?
How do our implementors feel? They’re the ones on the ground. We can get feedback from them and see what’s impacting us the most.

Possible Solutions

It seems we have several mountains of work ahead of us. Where do we even start? We can’t implement this all at once, or fix things overnight. This is a long term problem and it requires a long term solution. Let’s start with the low-hanging fruit, and then move on to things we’ll need to implement over time.

Low-hanging fruit

Appoint a project manager	We need someone to constantly think about these issues and drive these changes. They can get started on the long term planning and help keep the team on track.
Reduce our error logs to zero	Our error log contains tons of relevant and irrelevant entries. Browsing through it is overwhelming, so it’s become quite meaningless. We should get our system to a place where it generates no error logs, because we’ve fixed everything. Any new entry should notify us and we should be aggressive in fixing it. This is a proactive way to improve our system’s quality and health.
Implement code demos	One of the problems is that we push a lot of features to production that were only seen by the developer who wrote them. Basic issues slip past us because we only focus on reviewing the code, and not the feature itself.
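
For the error log item, "any new log should notify us" could look something like the sketch below. It uses Python’s standard `logging` module purely as an illustration, since our own stack and notification channel may differ. The `NotifyOnErrorHandler` class and the `sent` list are hypothetical stand-ins for a real email or chat integration.

```python
import logging


class NotifyOnErrorHandler(logging.Handler):
    """Calls `notify` for every record at ERROR level or above."""

    def __init__(self, notify):
        super().__init__(level=logging.ERROR)
        self.notify = notify

    def emit(self, record: logging.LogRecord) -> None:
        self.notify(self.format(record))


sent = []  # stand-in for a real channel (email, chat webhook, ...)

logger = logging.getLogger("app")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the example's output self-contained
logger.addHandler(NotifyOnErrorHandler(sent.append))

logger.info("checklist saved")            # below ERROR: not forwarded
logger.error("report generation failed")  # forwarded to notify

print(sent)  # ['report generation failed']
```

This only works once the log is quiet. If errors fire all day, the notifications become noise just like the log itself, which is why getting to zero comes first.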