Source Mappings
As part of the AST output, the compiler provides the range of the source code that is represented by the respective node in the AST. This can be used for various purposes ranging from static analysis tools that report errors based on the AST and debugging tools that highlight local variables and their uses.
Furthermore, the compiler can also generate a mapping from the bytecode to the range in the source code that generated the instruction. This is again important for static analysis tools that operate on bytecode level and for displaying the current position in the source code inside a debugger or for breakpoint handling. This mapping also contains other information, like the jump type and the modifier depth (see below).
Both kinds of source mappings use integer identifiers to refer to source files.
The identifier of a source file is stored in
output['sources'][sourceName]['id']
where output
is the output of the
standard-json compiler interface parsed as JSON.
For some utility routines, the compiler generates “internal” source files
that are not part of the original input but are referenced from the source
mappings. These source files together with their identifiers can be
obtained via output['contracts'][sourceName][contractName]['evm']['bytecode']['generatedSources']
.
Note
In the case of instructions that are not associated with any particular source file,
the source mapping assigns an integer identifier of -1
. This may happen for
bytecode sections stemming from compiler-generated inline assembly statements.
The source mappings inside the AST use the following notation:
s:l:f
Where s
is the byte-offset to the start of the range in the source file,
l
is the length of the source range in bytes and f
is the source
index mentioned above.
The encoding in the source mapping for the bytecode is more complicated:
It is a list of s:l:f:j:m
separated by ;
. Each of these
elements corresponds to an instruction, i.e. you cannot use the byte offset
but have to use the instruction offset (push instructions are longer than a single byte).
The fields s
, l
and f
are as above. j
can be either
i
, o
or -
signifying whether a jump instruction goes into a
function, returns from a function or is a regular jump as part of e.g. a loop.
The last field, m
, is an integer that denotes the “modifier depth”. This depth
is increased whenever the placeholder statement (_
) is entered in a modifier
and decreased when it is left again. This allows debuggers to track tricky cases
like the same modifier being used twice or multiple placeholder statements being
used in a single modifier.
In order to compress these source mappings especially for bytecode, the following rules are used:
If a field is empty, the value of the preceding element is used.
If a
:
is missing, all following fields are considered empty.
This means the following source mappings represent the same information:
1:2:1;1:9:1;2:1:2;2:1:2;2:1:2
1:2:1;:9;2:1:2;;
Important to note is that when the verbatim builtin is used, the source mappings will be invalid: The builtin is considered a single instruction instead of potentially multiple.