Engineering leaders strive to deliver value and positive impact. Sometimes that means building new products, and sometimes it means improving existing engineering systems. The latter could take the form of modernizing a legacy codebase, upgrading technologies, restructuring the engineering organization, or elevating the engineering culture. In most cases, it is a combination of these. I collectively call these efforts “engineering transformation”.

But how do engineering leaders promote transformation? It starts by understanding the current landscape and identifying opportunities and challenges.

1. Understand the opportunities and challenges

Working with an existing product has unique opportunities and challenges. Understanding both helps put engineering leaders in the right mindset for approaching the transformation.

Opportunities:

More scale. Working with an established user base can be rewarding for engineers who want to see the impact of their work on other people’s work and life.
Clear usage history. There is usually a longer history of real-world feedback and usage patterns with established products. This could help with understanding the needs of the user base.
Fewer unknown unknowns. Existing products have gone through more stress testing and have surfaced more corner cases and rare failure modes. In other words, problem areas (both internal and external to the app) are better known.
More stability. Compared to nascent startups, established products with a reliable user base are less likely to go under. That means less uncertainty about whether the company will “make it”.

Challenges:

Uptime requirements. Working with live products means less flexibility for downtime or user-facing regression. All changes need to happen while keeping the product live.
Culture is sticky. Many cultural patterns are hard to change once they have set in.
Internal buy-in. It is sometimes hard to justify to the executive team that engineering needs to refactor a “working” codebase in order to keep expanding the product.

The mandate for engineering transformation usually comes from business executives who are frustrated by the slow pace of innovation or by frequent platform issues that hurt user experience or worker productivity. But this feedback is usually not enough to come up with actionable items for the engineering team. That kind of actionable information is usually collected from the bottom up.

2. Collect information

Starting with a survey of the engineers can be very informative. Use open-ended questions, and avoid loaded questions. Some questions you might want to consider include:

What are the biggest bottlenecks in the software delivery flow?
What is stopping you from collaborating with your teammates?
Are you clear on your weekly, monthly and yearly objectives?
Do you have visibility into what other engineers outside your group are working on?
Are you clear on the product roadmap?
Who is in charge of the product specs of what you are working on? Do you hear from them or talk to them regularly?
If you were the VP of Engineering, what would you change?
Do you think we are being held back because of the technologies we use in our engineering stack?
Do you think code change requests take a long time to process?
Do you think we spend more time than normal on fixing issues caused by new releases?

Perform 1-1’s with key stakeholders at all levels of the organization. Make sure to lead these conversations with empathy. The goal is not to find the “culprit” or blame a person or team or department. Think of it as a blameless retrospective: assume that everyone involved in an incident had good intentions and did the right thing with the information they had.

Usually, the information collection phase results in identifying gaps in one or more of the following areas:

Technology
Codebase and software delivery pipeline
Organization and culture

The next steps are to dig deeper into each area, and come up with actionable plans for addressing them.

3. Plan for technology upgrades

The major symptoms of outdated technology are frequent outages and slow systems that do not scale.

Many technologies such as databases, caching layers, messaging frameworks, and compute resources get frequent upgrades to keep up with technological advances and the demands of new products. However, technology upgrades come at a cost. Teams need to take time away from other priorities and feature development work to refactor existing systems. A cost-benefit framework like RICE (Reach, Impact, Confidence, Effort) helps prioritize technology upgrades.
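To make the prioritization concrete, here is a minimal sketch of RICE scoring, using the common formula score = (Reach × Impact × Confidence) / Effort. The candidate names, the units (people affected per quarter, person-months), and all the numbers are made up for illustration; your team would substitute its own estimates.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UpgradeCandidate:
    name: str
    reach: float       # assumed unit: users or engineers affected per quarter
    impact: float      # relative impact, e.g. 0.25 (minimal) up to 3 (massive)
    confidence: float  # 0.0 to 1.0, how sure we are about the estimates
    effort: float      # assumed unit: person-months

    def rice_score(self) -> float:
        if self.effort <= 0:
            raise ValueError("effort must be positive")
        return self.reach * self.impact * self.confidence / self.effort


def prioritize(candidates: List[UpgradeCandidate]) -> List[UpgradeCandidate]:
    """Return candidates ordered from highest to lowest RICE score."""
    return sorted(candidates, key=lambda c: c.rice_score(), reverse=True)


# Hypothetical upgrade projects with illustrative estimates.
candidates = [
    UpgradeCandidate("Replace caching layer", reach=2000, impact=1, confidence=0.5, effort=2),
    UpgradeCandidate("Migrate message queue", reach=800, impact=3, confidence=0.5, effort=6),
    UpgradeCandidate("Upgrade database major version", reach=5000, impact=2, confidence=0.8, effort=4),
]

for candidate in prioritize(candidates):
    print(f"{candidate.name}: {candidate.rice_score():.0f}")
```

The scores are only as good as the estimates behind them, so the ranking is a starting point for discussion rather than a final decision.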

Always keep product-centric value delivery in mind, even when refactoring existing code. The focus should be on the product, and how to maintain or improve the user experience, and that focus should be reiterated at every step of the software development lifecycle.

4. Improve codebase and software development lifecycle

The major symptoms of a problematic codebase are slow development and failed deployments. Other symptoms include a prolonged onboarding period for new engineers, a slow review cycle for change requests, and uneven contribution to the codebase (across individuals, tenure, and seniority). These symptoms suggest that the codebase is confusing, convoluted, or lacking in documentation and test coverage.

The end goal is to increase development velocity and iteration stability. Start by establishing metrics that communicate and reinforce desired objectives. Then look at the entire software delivery pipeline and find improvement opportunities. Finally, promote a DevOps transformation to support the desired processes.

Use metrics to monitor your progress

Metrics are critical because most teams end up optimizing their practices towards the metrics used for measuring progress. There are many metrics that engineering teams can choose to measure. Some of the more common and battle-tested metrics are:

DORA Metrics

DORA metrics come from the DevOps Research and Assessment (DORA) team, now part of Google Cloud, and are based on six years of research into DevOps practices. The team identified four key metrics that indicate the performance of a software development team:

Deployment Frequency: How often an organization successfully releases to production.
Lead Time for Changes: The amount of time it takes a commit to get into production.
Change Failure Rate: The percentage of deployments causing a failure in production.
Time to Restore Service: How long it takes an organization to recover from a failure in production.
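As a rough illustration of how the four metrics above could be computed, the sketch below derives them from a list of deployment records. The `Deployment` record, its field names, and the sample data are assumptions for this example; in practice the data would come from your CI/CD and incident tooling. Time to restore is measured here from the deployment time, which is a simplification.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List, Optional


@dataclass
class Deployment:
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def dora_metrics(deployments: List[Deployment], period_days: int) -> dict:
    """Compute the four DORA metrics over a reporting period."""
    if not deployments or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")
    lead_times = [hours(d.deployed_at - d.committed_at) for d in deployments]
    failures = [d for d in deployments if d.failed]
    # Simplifying assumption: the failure starts at deployment time.
    restore_times = [
        hours(d.restored_at - d.deployed_at)
        for d in failures
        if d.restored_at is not None
    ]
    return {
        "deployments_per_day": len(deployments) / period_days,
        "median_lead_time_hours": median(lead_times),
        "change_failure_rate": len(failures) / len(deployments),
        "median_time_to_restore_hours": median(restore_times) if restore_times else None,
    }


# Illustrative data: four deployments in one week, one of which failed.
start = datetime(2024, 1, 1, 9, 0)
deployments = [
    Deployment(start, start + timedelta(hours=4)),
    Deployment(
        start + timedelta(days=1),
        start + timedelta(days=1, hours=10),
        failed=True,
        restored_at=start + timedelta(days=1, hours=12),
    ),
    Deployment(start + timedelta(days=3), start + timedelta(days=3, hours=6)),
    Deployment(start + timedelta(days=5), start + timedelta(days=5, hours=8)),
]

metrics = dora_metrics(deployments, period_days=7)
for name, value in metrics.items():
    print(f"{name}: {value}")
```

Medians are used instead of means so that a single unusually slow change or long outage does not dominate the reported number.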

Flow Metrics

Flow metrics are concerned with the entire end-to-end flow of a software value stream. They include the following:

Velocity: Measures how many items or units of work are completed within a specific period of time.
Time: Captures the time that a flow item takes from start to completion, which includes both active times and wait times.
Load: Refers to the number of flow items (or units of work) that are part of the value stream (i.e., in the “To Do” or “In Progress” or similar modes).
Efficiency: The ratio of active time to total flow time (active time plus wait time). It is calculated as the percentage of time that flow items are actively worked on out of the total time they spend in the value stream. It shows whether waste is increasing or decreasing within the delivery value stream.
Distribution: The proportion of each of the four flow item types (features, defects, risks, and debt) in the value stream. It helps decision-makers prioritize the work that matters the most.
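The definitions above can be sketched in a few lines of code. The `FlowItem` record, its fields, and the sample numbers below are assumptions made for illustration; real data would typically be exported from your issue tracker. Velocity, time, efficiency, and distribution are computed over completed items, and load counts the items that are not yet done.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class FlowItem:
    kind: str           # "feature", "defect", "risk", or "debt"
    active_days: float  # time someone was actually working on the item
    wait_days: float    # time the item sat idle (queues, reviews, handoffs)
    done: bool          # completed within the reporting period?

    @property
    def flow_time(self) -> float:
        return self.active_days + self.wait_days


def flow_metrics(items: List[FlowItem]) -> dict:
    """Compute the five flow metrics for one reporting period."""
    done = [i for i in items if i.done]
    total_active = sum(i.active_days for i in done)
    total_time = sum(i.flow_time for i in done)
    kinds = Counter(i.kind for i in done)
    return {
        "velocity": len(done),
        "average_flow_time_days": total_time / len(done) if done else None,
        "load": len(items) - len(done),
        "efficiency": total_active / total_time if total_time else None,
        "distribution": {kind: count / len(done) for kind, count in kinds.items()},
    }


# Illustrative data for a single reporting period.
items = [
    FlowItem("feature", active_days=3, wait_days=5, done=True),
    FlowItem("defect", active_days=1, wait_days=1, done=True),
    FlowItem("debt", active_days=2, wait_days=6, done=True),
    FlowItem("feature", active_days=4, wait_days=2, done=True),
    FlowItem("risk", active_days=1, wait_days=0, done=False),
    FlowItem("feature", active_days=2, wait_days=3, done=False),
]

result = flow_metrics(items)
for name, value in result.items():
    print(f"{name}: {value}")
```

In this made-up data set, completed items were actively worked on for 10 of the 24 days they spent in the value stream, so efficiency is about 42%, which points at wait time as the main source of waste.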

I recommend starting with a simple set of metrics to ease the team into metric collection. It is more important to be consistent than to be comprehensive. Lastly, measuring and reporting these metrics requires some upfront time investment, so make sure to budget enough time for that.

DevOps transformation

The health of a company’s DevOps practices directly contributes to its ability to churn out good software quickly. The tenets of modern DevOps are:

Continuous Integration: develop code in small increments, and commit to the trunk often.
Continuous Delivery: ship small, incremental updates to the application, deploying changes to production as they are committed to the trunk.
Microservices: break down the application into individual components that are developed, deployed, scaled, and managed independently and communicate with each other via a contract.
Infrastructure as Code: define the infrastructure that runs the application in code, and manage and deploy it automatically.
Monitoring and Logging: monitor the application with actionable metrics and alerts. Log what is needed to debug the application and to extract the metrics.
Communication and Collaboration: promote collaboration and communication between different teams.

DevOps is a big topic, and it can be overwhelming to approach all aspects of it at once. The goal should be to first do a comprehensive analysis of the existing DevOps practices and find the gaps. Then, through prioritization, plan for addressing those gaps.

Improve test coverage

If you are planning to make significant changes to the codebase, having good test coverage and a rigorous QA regime is essential, as it will:

make releases more stable, which reduces regressions (lower change failure rate),
enable junior and new engineers to contribute code more quickly and confidently,
increase feature delivery rate by helping teams spend more time on feature development and less on bug fixes.

Start by creating a formal function within the engineering team to be in charge of setting up and enforcing QA practices. For more on software testing, read my other post here.

5. Promote an organizational and cultural shift

On the organization side, the focus should be on aligning the communication lines with the product requirements. Different organizational structures such as product pods, a matrix org, and a singular org can suit different products. Start from the product requirements and work your way back: do we need to split up the engineering team into siloed product pods? Will the team benefit from being part of one big organization? Do we have enough overlap between different product components to use a matrix structure?

Cultural shifts are hard, messy, and often very disruptive. Rarely can a company achieve cultural transformation without losing some people (voluntarily or involuntarily). So, the decision to shift or transform the culture should be taken seriously and with a lot of thought and planning. Cultural transformation stems from the belief that people are more important than processes, and processes are more important than technology.

Start by understanding the existing culture. Does the company have a well-defined culture? Are the company’s purpose, mission and values clearly specified and communicated? Where does it stem from? Has it changed through the life of the company? Was it set by the founders? If the company does not have a well-defined culture, how are decisions normally made? Who are the influential decision makers in the company?

Through conversations with the executives and influential decision makers, explore whether a cultural shift or transformation is needed to achieve the engineering goals, and secure buy-in to pursue such changes. There are different strategies to pursue cultural changes, ranging from commissioning external consultants to forming internal working groups to starting small with one item at a time.

Summary

Joining a mature engineering team has opportunities and challenges. On one hand, it is exciting to work with a product that has user traction and is a “known quantity”. On the other hand, making changes on a live product requires significantly more planning and coordination. Furthermore, cultural and legacy issues can be hard to navigate.

Dimension	Core tenets	Actionable items	Checks & balances
Codebase & Software Development Lifecycle	Optimize development velocity; Stabilize deployment; Continuous Integration; Continuous Delivery; Microservices; Infrastructure as Code; Monitoring and Logging; Communication and Collaboration	Improve test coverage; Automated builds; Break up monolith services into smaller components; Build dashboards and alerting tools to stay on top of important metrics	DORA metrics; Flow metrics
Technology	Upgrade cycles; Build vs. buy; SLAs, SLOs, SLIs; Breakdown of engineering efforts into new features, improvement of existing features, etc.	Budget for technology improvements in the overall engineering work distribution; Perform RICE analysis to prioritize projects	Follow a product-centric approach; User satisfaction surveys
Organization and Culture	Align org structure with product needs; People are more important than processes, and processes are more important than technology	Understand current culture; Identify areas for cultural improvement; Emphasize 1-1’s, performance reviews, career mobility	Engagement metrics: PTO days, retention, attrition; Employee surveys; Participation in company events; External review sites such as Glassdoor and Indeed

Engineering Transformation happens along three dimensions: codebase, technology, org / culture.
