Probably the most consistent and accurate definition is that a monorepo is a repository containing multiple distinct projects with clearly defined relationships between them. Without those clearly defined relationships and boundaries, a monorepo is reduced to mere code collocation. Code should be separated and encapsulated according to logical units defined in advance, during design. Despite what the name may at first suggest, a monorepo is the exact opposite of a monolithic repository.
The monorepo style of development is an approach where:
However, a monorepo is not a one-size-fits-all solution; no solution is. In this blog post we present the benefits and challenges of the monorepo approach to development, both for an organization and for the application itself.
Myths are mostly the result of using tools whose primary purpose is code collocation. Such attempts usually end in a pile of problems and the conclusion that this is not a sustainable approach to development in a multi-project, multi-team environment. These tools do not represent the actual concept of the monorepo approach (as practiced by, e.g., Google, Microsoft, Meta, or Uber).
Perhaps the most common myth, which stems from conflating the repository with the deployment artifact, is that we have to release all projects at the same time. With a little more thought, it is not hard to see that where code is developed and what or when we release are two orthogonal concerns; neither needs any insight into the other.
This is supported by the good practice of building and saving artifacts on every CI run. Deploying them to target environments should happen in a separate deployment phase. In other words, deploying an application to an environment should not require any access to the repository at all, whether there is one repository or many.
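As a sketch of this separation, a CI pipeline can publish build artifacts that a later deploy stage consumes without ever touching the repository. The following GitHub Actions fragment is illustrative only; the job layout, artifact name, paths, and deploy script are assumptions, not a prescribed setup:

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm run build
      - uses: actions/upload-artifact@v4   # save the artifact on every CI run
        with:
          name: app-dist
          path: dist/
  deploy:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/download-artifact@v4 # note: no checkout step here
        with:
          name: app-dist
      - run: ./deploy.sh app-dist          # hypothetical deploy script
```

The deploy job deliberately has no checkout step: it works purely with the stored artifact, which is exactly the property the text describes.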
The conclusion is that a monorepo is not the same as a monolithic repository; quite the opposite. A monorepo facilitates code sharing and cross-project refactoring, which greatly reduces the cost of developing shared libraries, microservices, and microfrontends.
This myth stems from using only repository-level settings to manage access and permissions. In fact, most tools allow setting ownership on a per-folder basis. GitHub, for example, has CODEOWNERS: through a simple configuration file, the owner or admin of a repository can define all the necessary settings.
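For illustration, a hypothetical CODEOWNERS file might assign per-folder ownership like this (the paths and team names are invented):

```
# .github/CODEOWNERS
/apps/checkout/        @acme/checkout-team
/libs/design-system/   @acme/design-team
/libs/shared/          @acme/platform-team
```

Each team then reviews and approves changes only to the folders it owns, even though everything lives in one repository.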
This architecture actually allows (and encourages) a much more granular approach to permissions management, when it comes to code-sharing. For example, team-A develops library-A for its internal needs but does not want team-B to be able to use it. One of the main reasons may be that teams become closely related and dependent on each other. In a multi-repo setup, nothing prevents team-B from using the specified library, and team A will know nothing about it. In fact, most monorepo tools allow defining library visibility in a very precise way.
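With Nx, for example, visibility can be expressed through project tags and the enforce-module-boundaries lint rule. The fragment below is a sketch; the tag names are conventions you would define yourself. Under this assumed setup, team-B's projects may only depend on team-B's own libraries and explicitly shared ones, so team-A's internal library stays off-limits:

```jsonc
// .eslintrc fragment (illustrative)
{
  "rules": {
    "@nx/enforce-module-boundaries": [
      "error",
      {
        "depConstraints": [
          {
            "sourceTag": "scope:team-b",
            "onlyDependOnLibsWithTags": ["scope:team-b", "scope:shared"]
          }
        ]
      }
    ]
  }
}
```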
This myth stems from the premise mentioned under myth #2: in a classic multi-repo environment there are no obstacles to using code from any other repository, and dependency graphs are usually outdated the day after they are created.
In a monorepo, creating a new library takes a few seconds, so developers tend to create many more of them (which is a best practice). In addition, if we insist that every library have a well-defined public API and that it be used only through that API, we avoid a complicated tangle of dependencies.
Some tools, such as Nx, automate the creation of dependency graphs, which further simplifies keeping track of the repository. By adding metadata, it is also possible to define restrictions that, for example, guarantee that presentation components cannot depend on state-management code. All of this differs from how most people imagine monorepositories work.
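The presentation-versus-state restriction can be expressed as another tag constraint. This is a hypothetical sketch: `type:ui` and `type:util` are naming conventions, not built-ins, and the fragment would live inside the same lint-rule configuration:

```jsonc
// Illustrative depConstraints entry: UI libraries may depend on other UI
// code and plain utilities, but never on state-management libraries.
{
  "sourceTag": "type:ui",
  "onlyDependOnLibsWithTags": ["type:ui", "type:util"]
}
```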
As already mentioned, only what is affected by a change is rebuilt and retested. Of course, this cannot scale endlessly without some intervention on the CI side: if the repository is large and a change touches many things, CI will take a while to finish.
Will Git struggle with a very large monorepo? This is a legitimate cause for concern. If your repo has millions of files, most of the tools we work with, including Git, will stop working. But this is really an edge case, and for such cases Microsoft created GVFS, which works with huge repositories.
Feature-branch development and monorepo architecture do not mix well. This style of development is less common today anyway, but it should definitely be taken into account.
Monorepo architecture is not mainstream, so problems can be expected with services that expect a single artifact as the build product, or a coverage report scoped to the whole repository. In most cases this can be worked around.
Moving to a monorepo requires a change in the approach to CI, because we no longer build one application but all applications affected by a change. Although popular services such as Azure Pipelines, CircleCI, and Jenkins are flexible enough to work with, say, Nx, it still takes some time to set up.
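As a rough sketch, the CI steps for such a setup might invoke Nx's `affected` command so that only projects touched by the change are built and tested; the base and head refs below are assumptions that vary per CI provider:

```yaml
# Illustrative CI steps: build and test only projects affected by this change
- run: npx nx affected --target=build --base=origin/main --head=HEAD
- run: npx nx affected --target=test --base=origin/main --head=HEAD
```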
Monorepositories make changes that cut across a wider code base easier. However, it is necessary to think things through, make breaking changes in a backward-compatible way, and help everyone affected transition from the old version to the new one.
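One common pattern for keeping a breaking change backward-compatible is to ship the new API alongside a deprecated wrapper that delegates to it, so existing callers keep working during the migration. A minimal TypeScript sketch, with all names hypothetical:

```typescript
// Hypothetical library migration: a positional API is replaced by an
// options-object API, but the old entry point survives as a thin wrapper.

interface GreetOptions {
  name: string;
  punctuation?: string;
}

// New API
function greet(options: GreetOptions): string {
  return `Hello, ${options.name}${options.punctuation ?? "!"}`;
}

/** @deprecated Use greet({ name }) instead; remove after all callers migrate. */
function greetLegacy(name: string): string {
  // Delegates to the new API so behavior stays identical during the transition.
  return greet({ name });
}

console.log(greet({ name: "Ada", punctuation: "?" })); // Hello, Ada?
console.log(greetLegacy("Ada"));                       // Hello, Ada!
```

The `@deprecated` JSDoc tag makes editors flag remaining call sites, which is exactly the kind of assisted, gradual transition the text argues for.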
The key question, in this case, is what tools should provide to make working with a monorepo as pleasant and consistent as possible. The features a tool should have can be divided into three groups:
1. those that enable faster work (speed)
2. those that increase comprehensibility (understanding)
3. those that enable ease of management (manageability)
• local computation cache - the ability to store and reproduce files and processes that are the result of some task. This means that you will never build and/or test the same thing twice on the same device.
• distributed computation cache - the possibility of sharing artifacts from the cache between different environments. This means that no one within the organization, including CI agents, will build and/or test the same thing twice.
• transparent remote execution - the ability to perform any task on multiple remote computers, while still developing locally
• local task orchestration - the ability to execute tasks in a given order and in parallel
• distributed task execution - the ability to distribute tasks to multiple computers, while keeping the experience as if everything is happening on the same computer as much as possible
• detecting affected projects/packages - the ability to determine exactly what is affected by the change in order to build/test only that
• workspace analysis - the ability to simply create a project schema without additional configuration
• dependency graph visualization - visualization of relationships between projects and/or tasks. It should be interactive, so that you can search, filter, hide, query, etc. its elements
• source code sharing - encourages the sharing of separate parts of the code
• consistent tooling - the tool should treat all projects in the same way, regardless of the language in which they are written
• code generation - native support for code generation
• project constraints and visibility - support the definition of rules limiting interdependencies within the repository.
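Several of the features above, most obviously detecting affected projects, boil down to graph traversal. Below is a minimal TypeScript sketch of affected-project detection, assuming the dependency graph and the set of changed projects are already known (real tools such as Nx derive both from the workspace and from version control; all names here are illustrative):

```typescript
type DepGraph = Record<string, string[]>; // project -> projects it depends on

function affectedProjects(graph: DepGraph, changed: string[]): Set<string> {
  // Invert the graph: project -> projects that depend on it.
  const dependents: Record<string, string[]> = {};
  for (const [proj, deps] of Object.entries(graph)) {
    for (const dep of deps) {
      (dependents[dep] ??= []).push(proj);
    }
  }
  // Breadth-first search from the changed projects along reverse edges.
  const affected = new Set(changed);
  const queue = [...changed];
  while (queue.length > 0) {
    const current = queue.shift()!;
    for (const dependent of dependents[current] ?? []) {
      if (!affected.has(dependent)) {
        affected.add(dependent);
        queue.push(dependent);
      }
    }
  }
  return affected;
}

// Example: app-a -> lib-ui -> lib-utils; app-b -> lib-utils
const graph: DepGraph = {
  "app-a": ["lib-ui"],
  "app-b": ["lib-utils"],
  "lib-ui": ["lib-utils"],
  "lib-utils": [],
};
// A change in lib-ui affects lib-ui itself and app-a, but not app-b.
console.log([...affectedProjects(graph, ["lib-ui"])].sort()); // [ 'app-a', 'lib-ui' ]
```

Only the projects returned by such a traversal need to be rebuilt and retested, which is why CI cost scales with the size of the change rather than the size of the repository.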
There are several main advantages of monorepositories worth emphasizing: easy division of code into modules that can be combined, easier dependency management, a single toolchain setup, workspace-aware code editors and IDEs, and the same development experience for all developers.
Despite the misconceptions, the monorepo architecture provides greater flexibility in deploying, allows precise ownership, gives more structure to the code, and scales well using familiar tools.
Working with monorepositories also brings certain challenges: trunk-based development is much more important, some services do not work well with monorepositories, a more sophisticated CI setup is required, and all large-scale changes need to be well defined in advance.