From f13dfd85ce1af52c1e39cbfcd1d87cfd9e63347b Mon Sep 17 00:00:00 2001
From: Stefan O'Rear <sorear@fastmail.com>
Date: Mon, 26 Feb 2024 19:29:26 -0500
Subject: [PATCH 1/2] FDPIC/ePIC draft specification

---
 riscv-abi.adoc        |   2 +
 riscv-fdpic-epic.adoc | 549 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 551 insertions(+)
 create mode 100644 riscv-fdpic-epic.adoc

diff --git a/riscv-abi.adoc b/riscv-abi.adoc
index f4070c91..1a7587e3 100644
--- a/riscv-abi.adoc
+++ b/riscv-abi.adoc
@@ -14,3 +14,5 @@ include::riscv-dwarf.adoc[]
 include::riscv-rtabi.adoc[]
 
 include::riscv-atomic.adoc[]
+
+include::riscv-fdpic-epic.adoc[]
diff --git a/riscv-fdpic-epic.adoc b/riscv-fdpic-epic.adoc
new file mode 100644
index 00000000..4b73b12f
--- /dev/null
+++ b/riscv-fdpic-epic.adoc
@@ -0,0 +1,549 @@
+[[riscv-fdpic-epic]]
+= RISC-V FDPIC and ePIC ABI supplement
+ifeval::["{docname}" == "riscv-cc"]
+include::prelude.adoc[]
+endif::[]
+
+== Purpose and need
+
+The RISC-V ELF psABI defines PIC code models which can be used to change the
+load address of an object or combine several independently linked objects
+without modifying executable memory. This supports loading of shared libraries,
+as well as dynamic memory management in systems with a single address space.
+However, the existing PIC mechanisms assume a constant displacement between the
+code and data of a single object, which precludes sharing code between multiple
+instances of a single object in an environment without address translation, as
+well as using code located in read-only memory if the location of the data is
+variable.
+
+The FDPIC and ePIC supplement permits dynamic memory management in such cases
+by defining new code models where the code and data have independently varying
+addresses for each object in a process image. These models may also be useful
+as an alternative to the non-PIC large model in cases where the code and data
+occupy fixed addresses, but at a larger separation than is accommodated by the
+range limits of the existing code models.
+
+== High level alternatives
+
+Not providing code models supporting independent relocation of code and data
+requires RISC-V systems without address translation to provide a separate
+instance of the code for each instance of the data, and in writable memory if
+the number or location of the data instances can change.
+
+This document proposes an implementation strategy called **FDPIC**.  Under
+FDPIC the `gp` register holds a per-function base address, called the **GOT
+address**, for the data of the object containing the currently executing
+function. All control transfers that potentially cross between FDPIC objects
+must load `gp` for the new function to establish the ABI environment. All data
+accesses are either relative to `pc`, for read-only data local to the object
+containing the function, or are indirectly derived from `gp`. A simplified
+version of FDPIC where there is only one object in each process image and all
+functions use the same value of `gp` is defined under the name **ePIC**.
+
+An alternative, not studied in detail, would be to associate identifiers with
+each object, unique within a process image, and use `gp` as a directory of data
+addresses for each object in the process image. Such an **ID shared library**
+ABI would have advantages and disadvantages, most notably the need to globally
+assign the identifiers, for which no standard tooling exists.
+
+== Code models
+
+The default code model for FDPIC/ePIC is **large**. This model provides an
+unlimited size for both code and data, but limits the size of the GOT to 4GB,
+providing access to 256-512 million unique symbols on RV64. A single GOT
+address is used for all functions in an object. Linker relaxation is used to
+generate the most efficient access sequence for any symbol.
+
+The name **huge** is reserved for a code model which relaxes the GOT size
+limit. Two approaches are possible: defining multiple GOTs in an object, or
+using different access sequences to increase the size of a single GOT. Detailed
+design will not be done until there is a clear need.
+
+== Data representation
+
+FDPIC redefines a function pointer as a pointer to a **function descriptor**,
+the ABI's namesake: a struct containing two address-sized values. The first
+value in the descriptor is the address of the first instruction of the
+function's code. The second value is the `gp` value for the function.
+
+NOTE: This style of function descriptor is used in specialized FDPIC ABIs for
+Blackfin, FR-V, SuperH, and Arm, and is part of the default ABI for PA-RISC,
+POWER (v1 only, has a third "static chain" field), and IA-64.
+
+C++ vtables have the same layout as the base ABI, but the method pointers are
+replaced with pointers to function descriptors.
+
+NOTE: This matches all function descriptor ABIs except IA-64, where the vtable
+slots are 16 bytes in size and contain inline copies of the function
+descriptors.
+
+Every function whose address is taken has a **canonical function descriptor**
+somewhere in memory used for the taken address, which is constant within the
+process image.
+
+ePIC does not use function descriptors; the representation of function pointers
+and vtables is identical between ePIC and the base ABI.
+
+== Register and calling convention
+
+In FDPIC, `gp` is an argument register. It is valid on entry to a function and
+contains that function's GOT address. It is not valid at any other time and
+may be allocated within functions. If a function performs multiple calls, the
+caller is responsible for saving `gp` across calls other than the last and
+restoring it before subsequent calls. Calls through a function descriptor load
+`gp` from the descriptor; all other calls use the value of `gp` the caller was
+entered with.
+
+FDPIC `gp` rules apply orthogonally to all standard calling convention
+variants and do not affect the setting of `STO_RISCV_VARIANT_CC`.
+
+In ePIC, `gp` is invariant and holds the GOT address for the process image at
+all instruction boundaries.
+
+== Range extension thunks
+
+If a direct call is performed across a distance exceeding that possible with a
+call pseudoinstruction, the linker is expected to insert a range extension
+thunk, which can use the `t1` and `t2` registers.
+
+== ELF file header
+
+Two bits in `e_flags` are allocated for FDPIC/ePIC. Bit 5, `EF_RISCV_FUNCDESC`,
+is set on objects which contain code using the FDPIC calling convention.
+`EF_RISCV_FUNCDESC` is clear for objects where code uses the base calling
+convention. Bit 6, `EF_RISCV_NONCONSTDISP`, is set on executables or shared
+libraries when each segment in the program header table can be loaded at an
+independent address, and clear when the relative addresses of segments must be
+maintained.
+
+NOTE: All four combinations are meaningful, although `EF_RISCV_FUNCDESC`
+without `EF_RISCV_NONCONSTDISP` generally represents a misuse of relocations.
+
+TODO: The scheme above matches SuperH; Arm and Xtensa instead use special
+EI_OSABI values. Evaluate pros and cons. Linux requires both flags to be
+available from the ELF header alone so a note is not an option.
+
+== Dynamic section
+
+DT_PLTGOT holds the GOT address used for all functions in the object.
+
+== Tag_RISCV_x3_reg_usage
+
+Value 4 indicates FDPIC, and can merge with itself or value 0. Value 5
+indicates ePIC, and can merge with itself or value 0.
+
+== New relocations and relaxation
+
+TODO: Edit this into a form consistent with the existing relocation
+descriptions and separate the relaxation information.
+
+Non-TLSDESC global dynamic TLS is not supported.  No special provision is made
+to distinguish rematerializable from non-rematerializable addressing sequences,
+although compilers may treat addressing sequences as rematerializable if they
+are known to not be in the code segment.  Omitting `R_RISCV_RELAX` allows
+length-preserving rewrites.  This sketch minimizes the number of relocation
+types at the expense, in some cases, of the number of relocation entries.
+
+I've gone back and forth several times over exactly which transformations
+should be permitted without RELAX.  The current rules allow us to express the
+"use PCREL or GPREL but never use a GOT" property of ePIC and also allow the
+use of code models in non-relaxed FDPIC, but in the default FDPIC model do not
+allow rematerialization or omission of `R_RISCV_PIC_ADD` relocations.
+
+Requiring `R_RISCV_INTERMEDIATE_LOAD` to be explicitly marked even when it is
+optimized out at compilation or assembly time is a wart on the design and the
+only place we're quantitatively worse than ePIC.  To fix it, use the 11-type
+scheme.
+
+A full FDPIC proposal would include, in addition to the relocations and
+relaxations described here, a precise definition of the calling convention, ELF
+flags and attributes, the list of code models, and sibling PRs to asm-manual
+and c-api.
+
+* `R_RISCV_FUNCDESC` (Static/Dynamic, FDPIC ABI only)
+
+  Populates a 32/64 bit location with a pointer to a canonical function
+descriptor created by the dynamic linker for globally visible symbols and the
+static linker otherwise.
+
+  NOTE: PPC64 ELFv1 points symbol values directly at function descriptors but
+consistency with FR-V/Blackfin/SuperH/Arm favors this approach.
+
+* `R_RISCV_FUNCDESC_VALUE` (Static/Dynamic, FDPIC ABI only)
+
+  Populates a 64/128 bit location with a copy of the canonical function
+descriptor.
+
+  This is the relocation type used to support lazy binding if present in the
+relocation table pointed to by DT_JMPREL.
+
+  NOTE: This could be used as a static relocation to populate an ia64-style
+vtable containing inline descriptors, however all function descriptor ABIs for
+architectures supported in LLVM use pointers to canonical descriptors in the
+vtable.  This relocation type may also be used for lazy binding when referenced
+from DT_JMPREL.
+
+* `R_RISCV_GOTGPREL_HI` (Static, all GP ABIs)
+
+  Nondeterministically pick an _access method_, which is one of GOT entry,
+GP-relative, PC-relative, or absolute.  Absolute, GP-relative, and PC-relative
+can only be used for symbols which are resolved to a definition at static link
+time.  Absolute requires that the symbol be absolute and within signed ±2GiB of
+zero.  GP-relative requires that the symbol be within ±2GiB of
+`__global_pointer$` in the data segment.  PC-relative requires that the symbol
+be within ±2GiB of the relocation's offset in the code segment.  PC-relative
+and absolute access methods can only be used if the relocation offset is even
+and points at a `lui`.
+
+  If the access method is GOT entry, find or add an entry to the GOT which
+will, at runtime, contain the address of the relocation target.  When
+generating a dynamically linked executable or shared library this will
+typically involve creating a `R_RISCV_32` or `R_RISCV_64` dynamic relocation.
+
+  The offset of the relocation must be either odd, or even and pointing at a
+`c.lui` instruction, or even and pointing at a `lui` instruction.  Other cases
+are reserved for future standard use.
+
+  For the GOT entry access method and the GP-relative access method, the byte
+displacement from `__global_pointer$` to the GOT entry or the target is divided
+by 4096, rounding to nearest with ties rounded up.  The divided displacement
+is inserted in the immediate field of the `lui` or `c.lui` instruction.  If the
+divided displacement cannot be represented in the immediate field or if the
+relocation offset is odd and the divided displacement is not zero, relocation fails.
+
+  For the absolute access method, the absolute address of the target is divided
+and inserted into the instruction immediate as described in the previous
+paragraph.
+
+  For the PC-relative access method, the displacement from the relocation
+offset to the target is divided and inserted into the immediate of the
+instruction, which also has its opcode rewritten from `lui` to `auipc`.
+
+  The relocation may be paired with `R_RISCV_RELAX`.  In this case, if the
+`lui` instruction is not replaced with an `auipc` it may be replaced with a
+`c.lui` (if RVC is available for relaxation), and a `lui` or `c.lui` may be
+deleted outright if it receives an immediate of 0.
+
+* `R_RISCV_FUNCDESC_GOTGPREL_HI` (Static, FDPIC ABI only)
+
+  Find or create a GOT entry which will receive a canonical function descriptor
+for the target, which must be a function symbol with zero addend.  Perform
+relocation and relaxation as for `R_RISCV_GOTGPREL_HI` with a forced access
+method of the chosen GOT entry.
+
+* `R_RISCV_FUNCDESC_VALUE_GPREL_HI` (Static, FDPIC ABI only)
+
+  Find or create an aligned pair of GOT entries which will receive a function
+descriptor for the target, which must be a function symbol with zero addend.
+If the target lacks global visibility, the aligned pair will be the canonical
+function descriptor for the symbol.  Perform relocation and relaxation as for
+`R_RISCV_GOTGPREL_HI` with a forced GP relative access method and target of the
+first chosen GOT entry.
+
+* `R_RISCV_TLSDESC_GPREL_HI` (Static, all GP ABIs but only useful when dynamic
+linking)
+
+  Find or create a pair of GOT entries which will receive a TLS descriptor for
+the target, which must be a symbol in a `SHF_TLS` section, typically through
+creation of a `R_RISCV_TLSDESC` dynamic relocation.  Perform relocation and
+relaxation as for `R_RISCV_GOTGPREL_HI` with an access method of GP relative
+and a target of the first GOT entry.  May also be relaxed into an initial-exec
+or local-exec form as described elsewhere for TLSDESC (except for the presence
+of an add instruction).
+
+  The calling convention of the TLS descriptor **does not change**; even if an
+object uses the FDPIC calling convention, the descriptor must ignore but
+preserve the `gp` value it is called with.
+
+* `R_RISCV_TLS_GOTGPREL_HI` (Static, all GP ABIs but only useful when dynamic
+linking)
+
+  Find or create a GOT entry containing the TP offset for the target, which
+must be a symbol in a `SHF_TLS` section, typically through creation of a
+`R_RISCV_TLS_TPREL32` or `R_RISCV_TLS_TPREL64` dynamic relocation.  Perform
+relocation and relaxation as for `R_RISCV_GOTGPREL_HI` with the GOT entry.  May
+also be relaxed into local-exec as described elsewhere (except for the presence
+of an add instruction).
+
+* `R_RISCV_PIC_ADD` (Static, all GP ABIs; replaces `R_RISCV_EPIC_BASE_ADD`)
+
+  The target of the relocation is used to locate another ("parent") relocation
+which must have the basic behavior of `R_RISCV_GOTGPREL_HI`.  The offset of the
+relocation must be even and point to an `add` or `c.add` instruction with `gp`
+as one argument; all other cases are reserved for standard use.
+
+  If the parent relocation deleted its `lui` instruction (only possible if the
+parent relocation is paired with `R_RISCV_RELAX`), delete the `add` or `c.add`
+instruction.
+
+  If the parent relocation did not delete its `lui` instruction and its access
+method is GOT entry or GP-relative, no action is taken.
+
+  If the parent relocation did not delete its `lui` instruction and its access
+method is absolute or PC-relative, rewrite the instruction into a `c.mv` or
+canonical `mv` instruction which copies the non-`gp` argument of the add to its
+result.  If the resulting instruction would move a register to itself and the
+parent relocation is paired with `R_RISCV_RELAX`, the instruction may
+optionally be deleted instead.
+
+  NOTE: `R_RISCV_PIC_ADD` relocations have no effect and can be omitted when
+the parent relocation is not paired with `R_RISCV_RELAX` and either points to a
+`c.lui` or has odd offset.
+
+* `R_RISCV_TLSDESC_LOAD_LO12` (Existing relocation)
+
+  Extended to allow using the low 12 bits of the computed displacement of a
+parent relocation of type `R_RISCV_TLSDESC_GPREL_HI`.  Replace `rs1` with
+`gp` if the parent relocation deleted its instruction.
+
+* `R_RISCV_TLSDESC_ADD_LO12` (Existing relocation)
+
+  Extended to allow using the low 12 bits of the computed displacement of a
+parent relocation of type `R_RISCV_TLSDESC_GPREL_HI`.  Replace `rs1` with
+`gp` if the parent relocation deleted its instruction.
+
+* `R_RISCV_CALL` (Existing relocation)
+
+  Becomes reserved for standard use in the GP-relative ABIs.
+
+* `R_RISCV_CALL_PLT` (Existing relocation)
+
+  In addition to the `auipc` `jalr` sequence supported for PLT calls, we also
+recognize `lui` `add/c.add` `lx` `lx` `jalr/c.jr` sequences for no-PLT calls.
+
+  All `R_RISCV_CALL_PLT` relocations may pass control through a
+linker-generated stub which clobbers registers equivalent to an eagerly bound
+PLT stub (`t1` - `t6`).
+
+* `R_RISCV_GPREL_HI` (New; Static, all GP ABIs)
+
+  Acts exactly as `R_RISCV_GOTGPREL_HI` except that the GOT entry access method
+will not be used.  Relocation shall fail if no other access method is possible.
+
+* `R_RISCV_INTERMEDIATE_LOAD` (Redefined; Static, all GP ABIs)
+
+  The target of the relocation is used to locate another ("parent") relocation
+which must have the basic behavior of `R_RISCV_GOTGPREL_HI`.  The offset of the
+relocation must be even and point to a `lw` (for ELFCLASS32) or `ld` (for
+ELFCLASS64) instruction; all other cases are reserved for standard use.
+
+  If the parent relocation access method is not GOT entry, replace the
+instruction with an instruction that moves `rs1` to `rd`.
+
+  If the parent relocation access method is GOT entry, write the low 12 bits of
+the parent relocation computed displacement into the I-type immediate of the
+instruction.  If the parent relocation has odd offset or deleted its `lui`
+instruction, replace the `rs1` register specifier with `gp`.
+
+  If the parent relocation is paired with `R_RISCV_RELAX` and RVC is available
+for relaxation, optionally replace the instruction with an equivalent
+compressed instruction or delete it if it has no effect.
+
+* `R_RISCV_PIC_ADDR_LO12_I` (New; Static, all GP ABIs)
+
+  The target of the relocation is used to locate another ("parent") relocation
+which must have the basic behavior of `R_RISCV_GOTGPREL_HI`.  The offset of the
+relocation must be even and point to a `lw` (for ELFCLASS32) or `ld` (for
+ELFCLASS64) instruction; all other cases are reserved for standard use.
+
+  For all access methods, write the low 12 bits of the parent relocation
+computed displacement into the I-type immediate of the instruction.  If the
+parent relocation has odd offset or deleted its `lui` instruction, replace the
+`rs1` register specifier with `gp`.
+
+  If the access method is not GOT entry, replace opcode and funct3 to convert
+the instruction into an `addi`.
+
+  If the parent relocation is paired with `R_RISCV_RELAX` and RVC is available
+for relaxation, optionally replace the instruction with an equivalent
+compressed instruction or delete it if it has no effect.
+
+* `R_RISCV_PIC_LO12_I` (Rename of `R_RISCV_PCREL_LO12_I`)
+
+  The target of the relocation is used to locate another ("parent") relocation.
+If the parent relocation has an existing type (only `R_RISCV_PCREL_HI20`
+remains valid in GP ABIs), perform relocation as described currently.  The
+following applies if the parent relocation has the basic behavior of
+`R_RISCV_GOTGPREL_HI`; all other new cases are reserved.
+
+  If the parent relocation access method is not GOT entry, add the low 12 bits
+of the parent relocation computed displacement to the 12-bit I-type immediate
+of the instruction at the relocation offset.  Relocation fails if addition
+overflows and may fail if the addends have any bits in common.  If the parent
+relocation has odd offset or deleted its instruction, replace the `rs1`
+register specifier with `gp` (for the GP relative access method) or `zero` (for
+the absolute access method).
+
+  For all access methods, if the parent relocation is paired with
+`R_RISCV_RELAX` and RVC is available for relaxation, optionally replace the
+instruction with an equivalent compressed instruction or delete it if it has no
+effect.
+
+* `R_RISCV_PIC_LO12_S` (Rename of `R_RISCV_PCREL_LO12_S`)
+
+  The target of the relocation is used to locate another ("parent") relocation.
+If the parent relocation has an existing type (no defined cases as of writing
+remain valid in GP ABIs), perform relocation as described currently.  The
+following applies if the parent relocation has the basic behavior of
+`R_RISCV_GOTGPREL_HI`; all other new cases are reserved.
+
+  If the parent relocation access method is not GOT entry, add the low 12 bits
+of the parent relocation computed displacement to the 12-bit S-type immediate
+of the instruction at the relocation offset.  Relocation fails if addition
+overflows and may fail if the addends have any bits in common.  If the parent
+relocation has odd offset or deleted its instruction, replace the `rs1`
+register specifier with `gp` (for the GP relative access method) or `zero` (for
+the absolute access method).
+
+  For all access methods, if the parent relocation is paired with
+`R_RISCV_RELAX` and RVC is available for relaxation, optionally replace the
+instruction with an equivalent compressed instruction or delete it if it has no
+effect.
+
+== Access sequences
+
+```
+lb, sb, la, lla,  Pseudoinstructions documented in riscv-asm-manual
+la.tls.ie
+la.fd, lla.fd     Materializes a pointer to a function descriptor (i.e. a C
+                  function pointer) for a global or local symbol
+llb, lsb          Like lb/sb but for local symbols
+tlsdesc_call      Materializes tp-relative offset to a global dynamic TLS symbol
+call_noplt        Like call but inlines PLT entry
+
+### lb a0, symbol ###                  ### llb a0, symbol ###
+0  lui a0, 0                           0  lui a0, 0
+0     R_RISCV_GOTGPREL_HI symbol       0     R_RISCV_GPREL_HI symbol
+0     R_RISCV_RELAX                    0     R_RISCV_RELAX
+4  c.add a0, gp                        4  c.add a0, gp
+4     R_RISCV_PIC_ADD 0                4     R_RISCV_PIC_ADD 0
+6  ld a0, 0(a0)                        6  lb a0, 0(a0)
+6     R_RISCV_INTERMEDIATE_LOAD 0      6     R_RISCV_PIC_LO12_I 0
+a  lb a0, 0(a0)
+a     R_RISCV_PIC_LO12_I 0
+
+### sb a1, symbol, a0 ###              ### lsb a1, symbol, a0 ###
+0  lui a0, 0                           0  lui a0, 0
+0     R_RISCV_GOTGPREL_HI symbol       0     R_RISCV_GPREL_HI symbol
+0     R_RISCV_RELAX                    0     R_RISCV_RELAX
+4  c.add a0, gp                        4  c.add a0, gp
+4     R_RISCV_PIC_ADD 0                4     R_RISCV_PIC_ADD 0
+6  ld a0, 0(a0)                        6  sb a1, 0(a0)
+6     R_RISCV_INTERMEDIATE_LOAD 0      6     R_RISCV_PIC_LO12_S 0
+a  sb a1, 0(a0)
+a     R_RISCV_PIC_LO12_S 0
+
+### lla a0, symbol ###
+0  lui a0, 0
+0     R_RISCV_GPREL_HI symbol
+0     R_RISCV_RELAX
+4  c.add a0, gp
+4     R_RISCV_PIC_ADD 0
+6  ld a0, 0(a0)
+6     R_RISCV_PIC_ADDR_LO12_I 0
+
+### la a0, symbol ###                  ### la.fd a0, symbol ###
+0  lui a0, 0                           0  lui a0, 0
+0     R_RISCV_GOTGPREL_HI symbol       0     R_RISCV_FUNCDESC_GOTGPREL_HI symbol
+0     R_RISCV_RELAX                    0     R_RISCV_RELAX
+4  c.add a0, gp                        4  c.add a0, gp
+4     R_RISCV_PIC_ADD 0                4     R_RISCV_PIC_ADD 0
+6  ld a0, 0(a0)                        6  ld a0, 0(a0)
+6     R_RISCV_PIC_ADDR_LO12_I 0        6     R_RISCV_PIC_ADDR_LO12_I 0
+
+### la.tls.ie a0, symbol ###           ### lla.fd a0, symbol ###
+0  lui a0, 0                           0  lui a0, 0
+0     R_RISCV_TLS_GOTGPREL_HI symbol   0     R_RISCV_FUNCDESC_VALUE_GPREL_HI symbol
+0     R_RISCV_RELAX                    0     R_RISCV_RELAX
+4  c.add a0, gp                        4  c.add a0, gp
+4     R_RISCV_PIC_ADD 0                4     R_RISCV_PIC_ADD 0
+6  ld a0, 0(a0)                        6  ld a0, 0(a0)
+6     R_RISCV_PIC_ADDR_LO12_I 0        6     R_RISCV_PIC_ADDR_LO12_I 0
+
+### call_noplt symbol, t2 ###          ### tlsdesc_call symbol, t2 ###
+0  lui t2, 0                           0  lui a0, 0
+0     R_RISCV_CALL_PLT symbol          0     R_RISCV_TLSDESC_GPREL_HI symbol
+0     R_RISCV_RELAX                    0     R_RISCV_RELAX
+4  c.add t2, gp                        4  c.add a0, gp
+6  ld gp, 8(t2)                        4     R_RISCV_PIC_ADD 0
+a  ld t2, 0(t2)                        6  ld t2, 0(a0)
+e  c.jr t2                             6     R_RISCV_TLSDESC_LOAD_LO12 0
+                                       a  addi a0, a0, 0
+                                       a     R_RISCV_TLSDESC_ADD_LO12 0
+                                       e  jalr t0, t2
+                                       e     R_RISCV_TLSDESC_CALL 0
+```
+
+== Changes to other specifications
+
+=== riscv-toolchain-conventions
+
+Add -mfdpic option for FDPIC and -mepic option for ePIC.
+
+=== riscv-c-api-doc
+
+Command line options, preprocessor defines
+
+=== riscv-asm-manual
+
+Document new pseudoinstructions and changes to pseudoinstructions.
+
+Document new relocation syntax.
+
+== Changes to toolchain and runtime software
+
+=== gcc
+
+=== binutils
+
+=== llvm
+
+=== Linux
+
+There is an unfixable mistake in the RISC-V Linux syscall ABI where struct
+sigaction doesn't have space for a restorer field. In the short term, signal
+handling on FDPIC should use the existing approach of generating code on the
+stack. This should likely be changed to using a global restorer allocated as
+part of the kernel. Future (s)PMP support, in both options, will require
+handling instruction access fault traps and treating them as rt_sigreturn
+if they occur at the restorer address.
+
+Make sure that we're currently zeroing integer registers for new processes. It
+isn't happening in flush_thread, start_thread, or ELF_PLAT_INIT where other
+architectures do it; if it isn't happening at all it's a minor security hole
+(userspace ASLR bypass from child processes). It is needed for FDPIC binaries
+to be able to detect and operate properly under binfmt_elf.
+
+The two new flags correspond to elf_check_fdpic and
+elf_check_const_displacement.
+
+Signal handling with an FDPIC personality sets pc and gp from a function
+descriptor used in sa_handler or sa_sigaction.
+
+TODO: Research core dumps, ptrace, and compat ptrace and establish
+requirements.
+
+=== musl
+
+The dynamic linker code is highly arch-agnostic and should work with the
+proposed relocation scheme with no changes.  The startup assembly in crt_arch
+will need to be translated.
+
+None of the assembly code performs user callbacks or accesses global data in a
+way that would be affected by FDPIC or ePIC.
+
+The name of the dynamic linker is `/lib/ld-musl-riscv{32,64}{,-sp,-sf}-fdpic`.
+
+=== uclibc
+
+TODO:
+
+== Expected testing
+
+llvm internal codegen tests
+
+libc-test, LTP for fdpic
+
+ePIC static PIE smoke tests
+
+TODO: More ideas. Is anything suitable for ePIC?

From c11897a83a14c859be14d6e695c9a5f852c6a03f Mon Sep 17 00:00:00 2001
From: Stefan O'Rear <sorear@fastmail.com>
Date: Sun, 24 Mar 2024 00:00:29 -0400
Subject: [PATCH 2/2] Deficiencies of FLAT support, action plan

---
 riscv-fdpic-epic.adoc | 74 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 74 insertions(+)

diff --git a/riscv-fdpic-epic.adoc b/riscv-fdpic-epic.adoc
index 4b73b12f..56e4d190 100644
--- a/riscv-fdpic-epic.adoc
+++ b/riscv-fdpic-epic.adoc
@@ -490,6 +490,80 @@ Document new pseudoinstructions and changes to pseudoinstructions.
 
 Document new relocation syntax.
 
+=== FLAT
+
+Static PIE ELF binaries provide all functionality of FLAT binaries except for
+the deprecated and poorly supported ID-based shared library mechanism in a more
+consistent and flexible, but equally simple, fashion. This document recommends
+use of static PIE ELF in preference to FLAT in all new systems.
+
+The following bugs and design flaws are known to exist in FLAT binary support
+for RISC-V as of 2024-03-23:
+
+1. On non-RISC-V architectures, FLAT provides a single type of relocation,
+which produces a pointer-sized value. 64-bit RISC-V FLAT binaries relocate
+32-bit fields, causing address corruption if the load address is greater than
+4GB.
+
+2. FLAT provides a "RISC-V specific GOT header" which must be skipped by the
+kernel during the relocation process. This is a misinterpretation; the data
+structure which the kernel is interpreting is part of the ELF lazy binding
+definition. It is not architecture specific in any way, but the kernel provides
+an architecture-specific workaround for an elf2flt behavior which in turn means
+that the elf2flt bug must be retained in an architecture-specific fashion to
+match kernel expectations.
+
+3. FLAT binaries on other architectures have a variable length area immediately
+before the data segment used to store pointers to other data segments to
+support the ID-based shared library ABI on MC68000. Since no attempt was made
+to define a GP-relative ABI before adding RISC-V to FLAT, all RISC-V FLAT
+binaries are effectively CONSTDISP. Rather than setting a flag in the FLAT
+header to require data immediately adjacent to text, this was made into a
+Kconfig option, CONFIG_BINFMT_FLAT_NO_DATA_START_OFFSET.
+
+4. Unlike other FLAT architectures, neither FLAT_PLAT_INIT nor start_thread
+passes the data segment location to the new process; it is impossible to find
+the process data segment without setting FLAT_FLAG_RAM on the image and relying
+on the implicit layout effects of CONFIG_BINFMT_FLAT_NO_DATA_START_OFFSET.
+
+5. Insufficient alignment is provided for `__init_array` sections on RV64; see
+https://github.com/uclinux-dev/elf2flt/pull/34[].
+
+6. FLAT_PLAT_INIT does not zero registers, which allows information to leak
+from the parent process in the MMU case and makes forward compatible
+changes to FLAT_PLAT_INIT difficult.
+
+7. Shared library pointers are still written with
+CONFIG_BINFMT_FLAT_NO_DATA_START_OFFSET; this corrupts the last pointer's
+worth of the text segment. A fix exists.
+
+The last three are ABI-independent bugs, and fixes are being pursued
+independently of the FDPIC/ePIC effort.
+
+ePIC provides exactly the type of ABI that FLAT was designed to use, and adding
+ePIC support to FLAT will maximize similarity between RISC-V FLAT and other
+architectures. The proposal is to set gp in FLAT_PLAT_INIT to
+`current->mm->start_data + 0x800`; this will be usable as-is, but a constant
+can be added to gp in `_start` code if `__global_pointer$` is somewhere else.
+There is an obvious and well defined mapping of the ID-based FLAT shared
+library ABI to RISC-V; no attempt will be made to implement compiler or linker
+support for it, as software support for the linkage model is minimal to
+nonexistent.
+
+Constant-displacement FLAT binaries are useful, at least to the extent that it
+is ever useful to use FLAT instead of ELF static-PIE. The proposal is to add an
+architecture neutral FLAT_FLAG_CONSTDISP flag, which implies the current
+effects of FLAT_FLAG_RAM and CONFIG_BINFMT_FLAT_NO_DATA_START_OFFSET, but with
+the effects on data layout explicit.
+
+Fixing the first three issues requires breaking compatibility between old
+kernels and new elf2flt tools. The proposal is to add
+CONFIG_BINFMT_FLAT_BROKEN_RISCV, replacing
+CONFIG_BINFMT_FLAT_NO_DATA_START_OFFSET. CONFIG_BINFMT_FLAT_BROKEN_RISCV
+implies FLAT_FLAG_CONSTDISP for all binaries, forces relocations to apply to
+32-bit values instead of pointer-sized, and modifies the relocation logic to
+skip the ELF-specific PLTGOT header.
+
 == Changes to toolchain and runtime software
 
 === gcc