Lean Build Bootstrapping

Since version 4, Lean is a partially bootstrapped program: most parts of the frontend and compiler are written in Lean itself and thus need to be built before building Lean itself - which is needed to again build those parts. This cycle is broken by using pre-built C files checked into the repository (which ultimately go back to a point where the Lean compiler was not written in Lean) in place of these Lean inputs and then compiling everything in multiple stages up to a fixed point. The build directory is organized in these stages:

stage0/
  # Bootstrap binary built from stage0/src/.
  # We do not use any other files from this directory in further stages.
  bin/lean
stage1/
  include/
    config.h  # config variables used to build `lean` such as used allocator
    runtime/lean.h  # runtime header, used by extracted C code, uses `config.h`
  share/lean/
    lean.mk  # used by `leanmake`
  lib/
    lean/**/*.olean  # the Lean library (incl. the compiler) compiled by the previous stage's `lean`
    temp/**/*.{c,o}  # the library extracted to C and compiled by `leanc`
    libInit.a libLean.a  # static libraries of the Lean library
    libleancpp.a  # a static library of the C++ sources of Lean
    libleanshared.so  # a dynamic library including the static libraries above
  bin/
    lean  # the Lean compiler & server, a small executable that calls directly into libleanshared.so
    leanc  # a wrapper around a C compiler supplying search paths etc
    leanmake  # a wrapper around `make` supplying the Makefile above
stage2/...
stage3/...

Stage 0 can be viewed as a blackbox since it does not depend on any local changes and is equivalent to downloading a bootstrapping binary as done in other compilers. The build for any other stage starts by building the runtime and standard library from src/, using the lean binary from the previous stage in the latter case, which are then assembled into a new bin/lean binary.

Each stage can be built by calling make stageN in the root build folder. Running just make will default to stage 1, which is usually sufficient for testing changes on the test suite or other files outside of the stdlib. However, it might happen that the stage 1 compiler is not able to load its own stdlib, e.g. when changing the .olean format: the stage 1 stdlib will use the format generated by the stage 0 compiler, but the stage 1 compiler will expect the new format. In this case, we should continue with building and testing stage 2 instead, which will both build and expect the new format. Note that this is only possible because when building a stage's stdlib, we use the previous compiler but never load the previous stdlib (since everything is prelude). We can also use stage 2 to test changes in the compiler or other "meta" parts, i.e. changes that affect the produced (.olean or .c) code, on the stdlib and compiler itself. We are not aware of any "meta-meta" parts that influence more than two stages of compilation, so stage 3 should always be identical to stage 2 and only exists as a sanity check.

In summary, doing a standard build via make internally involves these steps:

  1. compile the stage0/src archived sources into stage0/bin/lean
  2. use it to compile the current library (including your changes) into stage1/lib
  3. link that and the current C++ code from src/ into stage1/bin/lean

You now have a Lean binary and library that include your changes, though their own compilation was not influenced by them, that you can use to test your changes on test programs whose compilation will be influenced by the changes.

Updating stage0

Finally, when we want to use new language features in the library, we need to update the archived C source code of the stage 0 compiler in stage0/src.

The github repository will automatically update stage0 on master once src/stdlib_flags.h and stage0/src/stdlib_flags.h are out of sync.

If you have write access to the lean4 repository, you can also manually trigger that process, for example to be able to use new features in the compiler itself. You can do that on https://github.com/leanprover/lean4/actions/workflows/update-stage0.yml or using Github CLI with

gh workflow run update-stage0.yml

Leaving stage0 updates to the CI automation is preferable, but should you need to do it locally, you can use make update-stage0-commit in build/release to update stage0 from stage1 or make -C stageN update-stage0-commit to update from another stage. This command will automatically stage the updated files and introduce a commit,so make sure to commit your work before that.

If you rebased the branch (either onto a newer version of master, or fixing up some commits prior to the stage0 update, recreate the stage0 update commits. The script script/rebase-stage0.sh can be used for that.

The CI should prevent PRs with changes to stage0 (besides stdlib_flags.h) from entering master through the (squashing!) merge queue, and label such PRs with the changes-stage0 label. Such PRs should have a cleaned up history, with separate stage0 update commits; then coordinate with the admins to merge your PR using rebase merge, bypassing the merge queue.

Further Bootstrapping Complications

As written above, changes in meta code in the current stage usually will only affect later stages. This is an issue in two specific cases.

  • For the special case of quotations, it is desirable to have changes in builtin parsers affect them immediately: when the changes in the parser become active in the next stage, builtin macros implemented via quotations should generate syntax trees compatible with the new parser, and quotation patterns in builtin macros and elaborators should be able to match syntax created by the new parser and macros. Since quotations capture the syntax tree structure during execution of the current stage and turn it into code for the next stage, we need to run the current stage's builtin parsers in quotations via the interpreter for this to work. Caveats:

    • We activate this behavior by default when building stage 1 by setting -Dinternal.parseQuotWithCurrentStage=true. We force-disable it inside macro/macro_rules/elab/elab_rules via suppressInsideQuot as they are guaranteed not to run in the next stage and may need to be run in the current one, so the stage 0 parser is the correct one to use for them. It may be necessary to extend this disabling to functions that contain quotations and are (exclusively) used by one of the mentioned commands. A function using quotations should never be used by both builtin and non-builtin macros/elaborators. Example: https://github.com/leanprover/lean4/blob/f70b7e5722da6101572869d87832494e2f8534b7/src/Lean/Elab/Tactic/Config.lean#L118-L122
    • The parser needs to be reachable via an import statement, otherwise the version of the previous stage will silently be used.
    • Only the parser code (Parser.fn) is affected; all metadata such as leading tokens is taken from the previous stage.

    For an example, see https://github.com/leanprover/lean4/commit/f9dcbbddc48ccab22c7674ba20c5f409823b4cc1#diff-371387aed38bb02bf7761084fd9460e4168ae16d1ffe5de041b47d3ad2d22422R13

  • For non-builtin meta code such as notations or macros in Notation.lean, we expect changes to affect the current file and all later files of the same stage immediately, just like outside the stdlib. To ensure this, we build stage 1 using -Dinterpreter.prefer_native=false - otherwise, when executing a macro, the interpreter would notice that there is already a native symbol available for this function and run it instead of the new IR, but the symbol is from the previous stage!

    To make matters more complicated, while false is a reasonable default incurring only minor overhead (ParserDescrs and simple macros are cheap to interpret), there are situations where we need to set the option to true: when the interpreter is executed from the native code of the previous stage, the type of the value it computes must be identical to/ABI-compatible with the type in the previous stage. For example, if we add a new parameter to Macro or reorder constructors in ParserDescr, building the stage with the interpreter will likely fail. Thus we need to set interpreter.prefer_native to true in such cases to "freeze" meta code at their versions in the previous stage; no new meta code should be introduced in this stage. Any further stages (e.g. after an update-stage0) will then need to be compiled with the flag set to false again since they will expect the new signature.

    When enabling prefer_native, we usually want to disable parseQuotWithCurrentStage as it would otherwise make quotations use the interpreter after all. However, there is a specific case where we want to set both options to true: when we make changes to a non-builtin parser like simp that has a builtin elaborator, we cannot have the new parser be active outside of quotations in stage 1 as the builtin elaborator from stage 0 would not understand them; on the other hand, we need quotations in e.g. the builtin simp elaborator to produce the new syntax in the next stage. As this issue usually affects only tactics, enabling debug.byAsSorry instead of prefer_native can be a simpler solution.

    For a prefer_native example, see https://github.com/leanprover/lean4/commit/da4c46370d85add64ef7ca5e7cc4638b62823fbb.

To modify either of these flags both for building and editing the stdlib, adjust the code in stage0/src/stdlib_flags.h. The flags will automatically be reset on the next update-stage0 when the file is overwritten with the original version in src/.