Lean Build Bootstrapping
Since version 4, Lean is a partially bootstrapped program: most parts of the frontend and compiler are written in Lean itself and thus need to be built before building Lean itself - which is needed to again build those parts. This cycle is broken by using pre-built C files checked into the repository (which ultimately go back to a point where the Lean compiler was not written in Lean) in place of these Lean inputs and then compiling everything in multiple stages up to a fixed point. The build directory is organized in these stages:
stage0/
# Bootstrap binary built from stage0/src/.
# We do not use any other files from this directory in further stages.
bin/lean
stage1/
include/
config.h # config variables used to build `lean` such as used allocator
runtime/lean.h # runtime header, used by extracted C code, uses `config.h`
share/lean/
lean.mk # used by `leanmake`
lib/
lean/**/*.olean # the Lean library (incl. the compiler) compiled by the previous stage's `lean`
temp/**/*.{c,o} # the library extracted to C and compiled by `leanc`
libInit.a libLean.a # static libraries of the Lean library
libleancpp.a # a static library of the C++ sources of Lean
libleanshared.so # a dynamic library including the static libraries above
bin/
lean # the Lean compiler & server, a small executable that calls directly into libleanshared.so
leanc # a wrapper around a C compiler supplying search paths etc
leanmake # a wrapper around `make` supplying the Makefile above
stage2/...
stage3/...
Stage 0 can be viewed as a blackbox since it does not depend on any local
changes and is equivalent to downloading a bootstrapping binary as done in other
compilers. The build for any other stage starts by building the runtime and
standard library from src/
, using the lean
binary from the previous stage in
the latter case, which are then assembled into a new bin/lean
binary.
Each stage can be built by calling make stageN
in the root build folder.
Running just make
will default to stage 1, which is usually sufficient for
testing changes on the test suite or other files outside of the stdlib. However,
it might happen that the stage 1 compiler is not able to load its own stdlib,
e.g. when changing the .olean format: the stage 1 stdlib will use the format
generated by the stage 0 compiler, but the stage 1 compiler will expect the new
format. In this case, we should continue with building and testing stage 2
instead, which will both build and expect the new format. Note that this is only
possible because when building a stage's stdlib, we use the previous compiler
but never load the previous stdlib (since everything is prelude
). We can also
use stage 2 to test changes in the compiler or other "meta" parts, i.e. changes
that affect the produced (.olean or .c) code, on the stdlib and compiler itself.
We are not aware of any "meta-meta" parts that influence more than two stages of
compilation, so stage 3 should always be identical to stage 2 and only exists as
a sanity check.
In summary, doing a standard build via make
internally involves these steps:
- compile the
stage0/src
archived sources intostage0/bin/lean
- use it to compile the current library (including your changes) into
stage1/lib
- link that and the current C++ code from
src/
intostage1/bin/lean
You now have a Lean binary and library that include your changes, though their own compilation was not influenced by them, that you can use to test your changes on test programs whose compilation will be influenced by the changes.
Updating stage0
Finally, when we want to use new language features in the library, we need to
update the archived C source code of the stage 0 compiler in stage0/src
.
The github repository will automatically update stage0 on master
once
src/stdlib_flags.h
and stage0/src/stdlib_flags.h
are out of sync.
If you have write access to the lean4 repository, you can also manually trigger that process, for example to be able to use new features in the compiler itself. You can do that on https://github.com/leanprover/lean4/actions/workflows/update-stage0.yml or using Github CLI with
gh workflow run update-stage0.yml
Leaving stage0 updates to the CI automation is preferable, but should you need
to do it locally, you can use make update-stage0-commit
in build/release
to
update stage0
from stage1
or make -C stageN update-stage0-commit
to
update from another stage. This command will automatically stage the updated files
and introduce a commit,so make sure to commit your work before that.
If you rebased the branch (either onto a newer version of master
, or fixing
up some commits prior to the stage0 update, recreate the stage0 update commits.
The script script/rebase-stage0.sh
can be used for that.
The CI should prevent PRs with changes to stage0 (besides stdlib_flags.h
)
from entering master
through the (squashing!) merge queue, and label such PRs
with the changes-stage0
label. Such PRs should have a cleaned up history,
with separate stage0 update commits; then coordinate with the admins to merge
your PR using rebase merge, bypassing the merge queue.
Further Bootstrapping Complications
As written above, changes in meta code in the current stage usually will only affect later stages. This is an issue in two specific cases.
-
For the special case of quotations, it is desirable to have changes in builtin parsers affect them immediately: when the changes in the parser become active in the next stage, builtin macros implemented via quotations should generate syntax trees compatible with the new parser, and quotation patterns in builtin macros and elaborators should be able to match syntax created by the new parser and macros. Since quotations capture the syntax tree structure during execution of the current stage and turn it into code for the next stage, we need to run the current stage's builtin parsers in quotations via the interpreter for this to work. Caveats:
- We activate this behavior by default when building stage 1 by setting
-Dinternal.parseQuotWithCurrentStage=true
. We force-disable it insidemacro/macro_rules/elab/elab_rules
viasuppressInsideQuot
as they are guaranteed not to run in the next stage and may need to be run in the current one, so the stage 0 parser is the correct one to use for them. It may be necessary to extend this disabling to functions that contain quotations and are (exclusively) used by one of the mentioned commands. A function using quotations should never be used by both builtin and non-builtin macros/elaborators. Example: https://github.com/leanprover/lean4/blob/f70b7e5722da6101572869d87832494e2f8534b7/src/Lean/Elab/Tactic/Config.lean#L118-L122 - The parser needs to be reachable via an
import
statement, otherwise the version of the previous stage will silently be used. - Only the parser code (
Parser.fn
) is affected; all metadata such as leading tokens is taken from the previous stage.
For an example, see https://github.com/leanprover/lean4/commit/f9dcbbddc48ccab22c7674ba20c5f409823b4cc1#diff-371387aed38bb02bf7761084fd9460e4168ae16d1ffe5de041b47d3ad2d22422R13
- We activate this behavior by default when building stage 1 by setting
-
For non-builtin meta code such as
notation
s ormacro
s inNotation.lean
, we expect changes to affect the current file and all later files of the same stage immediately, just like outside the stdlib. To ensure this, we build stage 1 using-Dinterpreter.prefer_native=false
- otherwise, when executing a macro, the interpreter would notice that there is already a native symbol available for this function and run it instead of the new IR, but the symbol is from the previous stage!To make matters more complicated, while
false
is a reasonable default incurring only minor overhead (ParserDescr
s and simple macros are cheap to interpret), there are situations where we need to set the option totrue
: when the interpreter is executed from the native code of the previous stage, the type of the value it computes must be identical to/ABI-compatible with the type in the previous stage. For example, if we add a new parameter toMacro
or reorder constructors inParserDescr
, building the stage with the interpreter will likely fail. Thus we need to setinterpreter.prefer_native
totrue
in such cases to "freeze" meta code at their versions in the previous stage; no new meta code should be introduced in this stage. Any further stages (e.g. after anupdate-stage0
) will then need to be compiled with the flag set tofalse
again since they will expect the new signature.When enabling
prefer_native
, we usually want to disableparseQuotWithCurrentStage
as it would otherwise make quotations use the interpreter after all. However, there is a specific case where we want to set both options totrue
: when we make changes to a non-builtin parser likesimp
that has a builtin elaborator, we cannot have the new parser be active outside of quotations in stage 1 as the builtin elaborator from stage 0 would not understand them; on the other hand, we need quotations in e.g. the builtinsimp
elaborator to produce the new syntax in the next stage. As this issue usually affects only tactics, enablingdebug.byAsSorry
instead ofprefer_native
can be a simpler solution.For a
prefer_native
example, see https://github.com/leanprover/lean4/commit/da4c46370d85add64ef7ca5e7cc4638b62823fbb.
To modify either of these flags both for building and editing the stdlib, adjust
the code in stage0/src/stdlib_flags.h
. The flags will automatically be reset
on the next update-stage0
when the file is overwritten with the original
version in src/
.