diff --git a/reference/src/layout/arrays-and-slices.md b/reference/src/layout/arrays-and-slices.md index e6c55f16..42b85a2f 100644 --- a/reference/src/layout/arrays-and-slices.md +++ b/reference/src/layout/arrays-and-slices.md @@ -1,43 +1,7 @@ # Layout of Rust array types and slices -## Layout of Rust array types +**This page has been archived** -Array types, `[T; N]`, store `N` values of type `T` with a _stride_ that is -equal to the size of `T`. Here, _stride_ is the distance between each pair of -consecutive values within the array. +It did not actually reflect current layout guarantees and caused frequent confusion. -The _offset_ of the first array element is `0`, that is, a pointer to the array -and a pointer to its first element both point to the same memory address. - -The _alignment_ of array types is greater or equal to the alignment of its -element type. If the element type is `repr(C)` the layout of the array is -guaranteed to be the same as the layout of a C array with the same element type. - -> **Note**: the type of array arguments in C function signatures, e.g., `void -> foo(T x[N])`, decays to a pointer. That is, these functions do not take arrays -> as an arguments, they take a pointer to the first element of the array -> instead. Array types are therefore _improper C types_ (not C FFI safe) in Rust -> foreign function declarations, e.g., `extern { fn foo(x: [T; N]) -> [U; M]; -> }`. Pointers to arrays are fine: `extern { fn foo(x: *const [T; N]) -> *const -> [U; M]; }`, and `struct`s and `union`s containing arrays are also fine. - -### Arrays of zero-size - -Arrays `[T; N]` have zero size if and only if their count `N` is zero or their -element type `T` is zero-sized. - -### Layout compatibility with packed SIMD vectors - -The [layout of packed SIMD vector types][Vector] [^2] requires the _size_ and -_alignment_ of the vector elements to match. 
That is, types with [packed SIMD -vector][Vector] layout are layout compatible with arrays having the same element -type and the same number of elements as the vector. - -[^2]: The [packed SIMD vector][Vector] layout is the layout of `repr(simd)` types like [`__m128`]. - -[Vector]: packed-simd-vectors.md -[`__m128`]: https://doc.rust-lang.org/core/arch/x86_64/struct.__m128.html - -## Layout of Rust slices - -The layout of a slice `[T]` of length `N` is the same as that of a `[T; N]` array. +The old content can be accessed [on GitHub](https://github.com/rust-lang/unsafe-code-guidelines/blob/c138499c1de03b908dfe719a41193c84f8146883/reference/src/layout/arrays-and-slices.md). diff --git a/reference/src/layout/enums.md b/reference/src/layout/enums.md index 91f466d2..c455f1d2 100644 --- a/reference/src/layout/enums.md +++ b/reference/src/layout/enums.md @@ -1,406 +1,7 @@ # Layout of Rust `enum` types -**Disclaimer:** Some parts of this section were decided in RFCs, but -others represent the consensus from issue [#10]. The text will attempt -to clarify which parts are "guaranteed" (owing to the RFC decision) -and which parts are still in a "preliminary" state, at least until we -start to open RFCs ratifying parts of the Unsafe Code Guidelines -effort. +**This page has been archived** -**Note:** This document has not yet been updated to -[RFC 2645](https://github.com/rust-lang/rfcs/blob/master/text/2645-transparent-unions.md). +It did not actually reflect current layout guarantees and caused frequent confusion. -[#10]: https://github.com/rust-rfcs/unsafe-code-guidelines/issues/10 - -## Categories of enums - -**Empty enums.** Enums with no variants can never be instantiated and -are equivalent to the `!` type. They do not accept any `#[repr]` -annotations. 
- -**Fieldless enums.** The simplest form of enum is one where none of -the variants have any fields: - -```rust -enum SomeEnum { - Variant1, - Variant2, - Variant3, -} -``` - -Such enums correspond quite closely with enums in the C language -(though there are important differences as well). Presuming that they -have more than one variant, these sorts of enums are always -represented as a simple integer, though the size will vary. - -Fieldless enums may also specify the value of their discriminants -explicitly: - -```rust -enum SomeEnum { - Variant22 = 22, - Variant44 = 44, - Variant45, -} -``` - -As in C, discriminant values that are not specified are defined as -either 0 (for the first variant) or as one more than the prior -variant. - -**Data-carrying enums.** Enums with at least one variant with fields are called -"data-carrying" enums. Note that for the purposes of this definition, it is not -relevant whether the variant fields are zero-sized. Therefore this enum is -considered "data-carrying": - -```rust -enum Foo { - Bar(()), - Baz, -} -``` - -## repr annotations accepted on enums - -In general, enums may be annotated using the following `#[repr]` tags: - -- A specific integer type (called `Int` as a shorthand below): - - `#[repr(u8)]` - - `#[repr(u16)]` - - `#[repr(u32)]` - - `#[repr(u64)]` - - `#[repr(i8)]` - - `#[repr(i16)]` - - `#[repr(i32)]` - - `#[repr(i64)]` -- C-compatible layout: - - `#[repr(C)]` -- C-compatible layout with a specified discriminant size: - - `#[repr(C, u8)]` - - `#[repr(C, u16)]` - - etc - -Note that manually specifying the alignment using `#[repr(align)]` is -not permitted on an enum. - -The set of repr annotations accepted by an enum depends on its category, -as defined above: - -- Empty enums: no repr annotations are permitted. -- Fieldless enums: `#[repr(Int)]`-style and `#[repr(C)]` annotations are permitted, but `#[repr(C, Int)]` annotations are not. -- Data-carrying enums: all repr annotations are permitted. 
- -## Enum layout rules - -The rules for enum layout vary depending on the category. - -### Layout of an empty enum - -An **empty enum** is an enum with no variants; empty enums can never -be instantiated and are logically equivalent to the "never type" -`!`. `#[repr]` annotations are not accepted on empty enums. Empty -enums are guaranteed to have the same layout as `!` (zero size and -alignment 1). - -### Layout of a fieldless enum - -If there is no `#[repr]` attached to a fieldless enum, the compiler -will represent it using an integer of sufficient size to store the -discriminants for all possible variants -- note that if there is only -one variant, then 0 bits are required, so it is possible that the enum -may have zero size. In the absence of a `#[repr]` annotation, the -number of bits used by the compiler are not defined and are subject to -change. - -When a `#[repr(Int)]`-style annotation is attached to a fieldless enum -(one without any data for its variants), it will cause the enum to be -represented as a simple integer of the specified size `Int`. This must -be sufficient to store all the required discriminant values. - -The `#[repr(C)]` annotation is equivalent, but it selects the same -size as the C compiler would use for the given target for an -equivalent C-enum declaration. - -Combining a `C` and `Int` `repr` (e.g., `#[repr(C, u8)]`) is -not permitted on a fieldless enum. - -The values used for the discriminant will match up with what is -specified (or automatically assigned) in the enum definition. For -example, the following enum defines the discriminants for its variants -as 22 and 23 respectively: - -```rust -enum Foo { - // Specificy discriminant of this variant as 22: - Variant22 = 22, - - // Default discriminant is one more than the previous, - // so 23 will be assigned. 
- Variant23 -} -``` - -**Note:** some C compilers offer flags (e.g., `-fshort-enums`) that -change the layout of enums from the default settings that are standard -for the platform. The integer size selected by `#[repr(C)]` is defined -to match the **default** settings for a given target, when no such -flags are supplied. If interop with code that uses other flags is -desired, then one should either specify the sizes of enums manually or -else use an alternate target definition that is tailored to the -compiler flags in use. - -### Layout of a data-carrying enums with an explicit repr annotation - -This section concerns data-carrying enums **with an explicit repr -annotation of some form**. The memory layout of such cases was -specified in [RFC 2195][] and is therefore normative. - -[RFC 2195]: https://rust-lang.github.io/rfcs/2195-really-tagged-unions.html - -The layout of data-carrying enums that do **not** have an explicit -repr annotation is generally undefined, but with certain specific -exceptions: see the next section for details. - -#### Explicit repr annotation without C compatibility - -When an enum is tagged with `#[repr(Int)]` for some integral type -`Int` (e.g., `#[repr(u8)]`), it will be represented as a C-union of a -series of `#[repr(C)]` structs, one per variant. Each of these structs -begins with an integral field containing the **discriminant**, which -specifies which variant is active. They then contain the remaining -fields associated with that variant. 
- -**Example.** The following enum uses an `repr(u8)` annotation: - -```rust -#[repr(u8)] -enum TwoCases { - A(u8, u16), - B(u16), -} -``` - -This will be laid out equivalently to the following more -complex Rust types: - -```rust -#[repr(C)] -union TwoCasesRepr { - A: TwoCasesVariantA, - B: TwoCasesVariantB, -} - -# #[derive(Copy, Clone)] -#[repr(u8)] -enum TwoCasesTag { A, B } - -# #[derive(Copy, Clone)] -#[repr(C)] -struct TwoCasesVariantA(TwoCasesTag, u8, u16); - -# #[derive(Copy, Clone)] -#[repr(C)] -struct TwoCasesVariantB(TwoCasesTag, u16); -``` - -Note that the `TwoCasesVariantA` and `TwoCasesVariantB` structs are -`#[repr(C)]`; this is needed to ensure that the `TwoCasesTag` value -appears at offset 0 in both cases, so that we can read it to determine -the current variant. - -#### Explicit repr annotation with C compatibility - -When the `#[repr]` tag includes `C`, e.g., `#[repr(C)]` or `#[repr(C, -u8)]`, the layout of enums is changed to better match C++ enums. In -this mode, the data is laid out as a tuple of `(discriminant, union)`, -where `union` represents a C union of all the possible variants. The -type of the discriminant will be the integral type specified (`u8`, -etc) -- if no type is specified, then the compiler will select one -based on what a size a fieldless enum would have with the same number of -variants. - -This layout, while more compatible and arguably more obvious, is also -less efficient than the non-C compatible layout in some cases in terms -of total size. For example, the `TwoCases` example given in the -previous section only occupies 4 bytes with `#[repr(u8)]`, but would -occupy 6 bytes with `#[repr(C, u8)]`, as more padding is required. 
- -**Example.** The following enum: - -```rust,ignore -#[repr(C, Int)] -enum MyEnum { - A(u32), - B(f32, u64), - C { x: u32, y: u8 }, - D, -} -``` - -is equivalent to the following Rust definition: - -```rust,ignore -#[repr(C)] -struct MyEnumRepr { - tag: MyEnumTag, - payload: MyEnumPayload, -} - -#[repr(Int)] -enum MyEnumTag { A, B, C, D } - -#[repr(C)] -union MyEnumPayload { - A: u32, - B: MyEnumPayloadB, - C: MyEnumPayloadC, - D: (), -} - -#[repr(C)] -struct MyEnumPayloadB(f32, u64); - -#[repr(C)] -struct MyEnumPayloadC { x: u32, y: u8 } -``` - -This enum can also be represented in C++ as follows: - -```c++ -#include - -enum class MyEnumTag: CppEquivalentOfInt { A, B, C, D }; -struct MyEnumPayloadB { float _0; uint64_t _1; }; -struct MyEnumPayloadC { uint32_t x; uint8_t y; }; - -union MyEnumPayload { - uint32_t A; - MyEnumPayloadB B; - MyEnumPayloadC C; -}; - -struct MyEnum { - MyEnumTag tag; - MyEnumPayload payload; -}; -``` - -### Layout of a data-carrying enums without a repr annotation - -If no explicit `#[repr]` attribute is used, then the layout of a -data-carrying enum is typically **not specified**. However, in certain -select cases, there are **guaranteed layout optimizations** that may -apply, as described below. - -#### Discriminant elision on Option-like enums - -(Meta-note: The content in this section have been turned into stable guarantees -[via this -FCP](https://github.com/rust-lang/rust/pull/130628#issuecomment-2402761599).). - -**Definition.** The fully monomorphized form of a 2-variant `enum` is called an **option-like enum** -if all of the following are satisfied: - -- the `enum` has no explicit `#[repr(...)]`, and -- one variant has a single field with a type that guarantees discriminant - elision (to be defined below), and -- the other variant has only 1-ZST fields (the "unit variant"). - -The simplest example is `Option` itself, where the `Some` variant -has a single field (of type `T`), and the `None` variant has no -fields. 
But other enums that fit that same template also qualify, e.g. -`Result<T, ()>` or `Result<(), T>`. - -**Definition.** The **payload** of an option-like enum is the single -field which it contains; in the case of `Option<T>`, the payload has -type `T`. - -**Definition.** The following payload types have guaranteed discriminant -elision: - -* `&T` -* `&mut T` -* `Box<T>` -* `extern "ABI" fn` (for arbitrary "ABI") -* `core::num::NonZero*` -* `core::ptr::NonNull<T>` -* `#[repr(transparent)] struct` around one of the types in this list. - -(Meta-note: all these types have at least one bit pattern that is guaranteed be -invalid, and can therefore be used as a "[niche]" when computing the enum layout. -More types have this property, but the *guarantee* described here only applies -to the types listed here.) - -**Option-like enums where the payload has guaranteed discriminant elision -are guaranteed to be represented using the same memory layout as their -payload.** This is called **discriminant elision**, as there is no -explicit discriminant value stored anywhere. Instead, [niche] values are -used to represent the unit variant. - -The most common example is that `Option<&u8>` can be represented as an -nullable `&u8` reference -- the `None` variant is then represented -using the [niche] value zero. This is because a valid `&u8` value can -never be zero, so if we see a zero value, we know that this must be -`None` variant. - -**Example.** The type `Option<&u32>` will be represented at runtime as -a nullable pointer. FFI interop often depends on this property. - -**Example.** As `fn` types are non-nullable, the type `Option` will be represented at runtime as a nullable function -pointer (which is therefore equivalent to a C function pointer) . FFI -interop often depends on this property. 
- -**Example.** The following enum definition is **not** option-like, -as it has two unit variants: - -```rust -enum Enum1<T> { - Present(T), - Absent1, - Absent2, -} -``` - -**Example.** The following enum definition is **not** option-like, -as it has an explicit `repr` attribute. - -```rust -#[repr(u8)] -enum Enum2<T> { - Present(T), - Absent1, -} -``` - -[niche]: ../glossary.md#niche - -### Layout of enums with a single variant - -> **NOTE**: the guarantees in this section have not been approved by an RFC process. - -**Data-carrying** enums with a single variant without a `repr()` annotation have -the same layout as the variant field. **Fieldless** enums with a single variant -have the same layout as a unit struct. - -For example, here: - -```rust -struct UnitStruct; -enum FieldlessSingleVariant { FieldlessVariant } - -struct SomeStruct { x: u32 } -enum DataCarryingSingleVariant { - DataCarryingVariant(SomeStruct), -} -``` - -* `FieldSingleVariant` has the same layout as `UnitStruct`, -* `DataCarryingSingleVariant` has the same layout as `SomeStruct`. - -## Unresolved questions - -See [Issue #79.](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/79): - -* Layout of multi-variant enums where only one variant is inhabited. +The old content can be accessed [on GitHub](https://github.com/rust-lang/unsafe-code-guidelines/blob/c138499c1de03b908dfe719a41193c84f8146883/reference/src/layout/enums.md). diff --git a/reference/src/layout/function-pointers.md b/reference/src/layout/function-pointers.md index 08fd549c..64b4d704 100644 --- a/reference/src/layout/function-pointers.md +++ b/reference/src/layout/function-pointers.md @@ -1,129 +1,7 @@ # Representation of Function Pointers -### Terminology +**This page has been archived** -In Rust, a function pointer type, is either `fn(Args...) -> Ret`, -`extern "ABI" fn(Args...) -> Ret`, `unsafe fn(Args...) -> Ret`, or -`unsafe extern "ABI" fn(Args...) -> Ret`. 
-A function pointer is the address of a function, -and has function pointer type. -The pointer is implicit in the `fn` type, -and they have no lifetime of their own; -therefore, function pointers are assumed to point to -a block of code with static lifetime. -This is not necessarily always true, -since, for example, you can unload a dynamic library. -Therefore, this is _only_ a safety invariant, -not a validity invariant; -as long as one doesn't call a function pointer which points to freed memory, -it is not undefined behavior. +It did not actually reflect current layout guarantees and caused frequent confusion. - -In C, a function pointer type is `Ret (*)(Args...)`, or `Ret ABI (*)(Args...)`, -and values of function pointer type are either a null pointer value, -or the address of a function. - -### Representation - -The ABI and layout of `(unsafe)? (extern "ABI")? fn(Args...) -> Ret` -is exactly that of the corresponding C type -- -the lack of a null value does not change this. -On common platforms, this means that `*const ()` and `fn(Args...) -> Ret` have -the same ABI and layout. This is, in fact, guaranteed by POSIX and Windows. -This means that for the vast majority of platforms, - -```rust -fn go_through_pointer(x: fn()) -> fn() { - let ptr = x as *const (); - unsafe { std::mem::transmute::<*const (), fn()>(ptr) } -} -``` - -is both perfectly safe, and, in fact, required for some APIs -- notably, -`GetProcAddress` on Windows requires you to convert from `void (*)()` to -`void*`, to get the address of a variable; -and the opposite is true of `dlsym`, which requires you to convert from -`void*` to `void (*)()` in order to get the address of functions. -This conversion is _not_ guaranteed by Rust itself, however; -simply the implementation. If the underlying platform allows this conversion, -so will Rust. 
- -However, null values are not supported by the Rust function pointer types -- -just like references, the expectation is that you use `Option` to create -nullable pointers. `Option<fn(Args...) -> Ret>` will have the exact same ABI -as `fn(Args...) -> Ret`, but additionally allows null pointer values. - - -### Use - -Function pointers are mostly useful for talking to C -- in Rust, you would -mostly use `T: Fn()` instead of `fn()`. If talking to a C API, -the same caveats as apply to other FFI code should be followed. -As an example, we shall implement the following C interface in Rust: - -```c -struct Cons { - int data; - struct Cons *next; -}; - -struct Cons *cons(struct Cons *self, int data); - -/* - notes: - - func must be non-null - - thunk may be null, and shall be passed unchanged to func - - self may be null, in which case no iteration is done -*/ - -void iterate(struct Cons const *self, void (*func)(int, void *), void *thunk); -bool for_all(struct Cons const *self, bool (*func)(int, void *), void *thunk); -``` - -```rust -# use std::{ -# ffi::c_void, -# os::raw::c_int, -# }; -# - -#[repr(C)] -pub struct Cons { - data: c_int, - next: Option<Box<Cons>>, -} - -#[no_mangle] -pub extern "C" fn cons(node: Option<Box<Cons>>, data: c_int) -> Box<Cons> { - Box::new(Cons { data, next: node }) -} - -#[no_mangle] -pub unsafe extern "C" fn iterate( - node: Option<&Cons>, - func: unsafe extern "C" fn(i32, *mut c_void), // note - non-nullable - thunk: *mut c_void, // note - this is a thunk, so it's just passed raw -) { - let mut it = node; - while let Some(node) = it { - func(node.data, thunk); - it = node.next.as_ref().map(|x| &**x); - } -} - -#[no_mangle] -pub unsafe extern "C" fn for_all( - node: Option<&Cons>, - func: unsafe extern "C" fn(i32, *mut c_void) -> bool, - thunk: *mut c_void, -) -> bool { - let mut it = node; - while let Some(node) = node { - if !func(node.data, thunk) { - return false; - } - it = node.next.as_ref().map(|x| &**x); - } - true -} -``` +The old content can be accessed [on 
GitHub](https://github.com/rust-lang/unsafe-code-guidelines/blob/c138499c1de03b908dfe719a41193c84f8146883/reference/src/layout/function-pointers.md). diff --git a/reference/src/layout/packed-simd-vectors.md b/reference/src/layout/packed-simd-vectors.md index ed7b871b..755f24a1 100644 --- a/reference/src/layout/packed-simd-vectors.md +++ b/reference/src/layout/packed-simd-vectors.md @@ -1,98 +1,7 @@ # Layout of packed SIMD vectors -**Disclaimer:** This chapter represents the consensus from issue -[#38]. The statements in here are not (yet) "guaranteed" -not to change until an RFC ratifies them. +**This page has been archived** -[#38]: https://github.com/rust-rfcs/unsafe-code-guidelines/issues/38 +It did not actually reflect current layout guarantees and caused frequent confusion. -Rust currently exposes packed[^1] SIMD vector types like `__m128` to users, but it -does not expose a way for users to construct their own vector types. - -The set of currently-exposed packed SIMD vector types is -_implementation-defined_ and it is currently different for each architecture. - -[^1]: _packed_ denotes that these SIMD vectors have a compile-time fixed size, - distinguishing these from SIMD vector types whose size is only known at - run-time. Rust currently only supports _packed_ SIMD vector types. This is - elaborated further in [RFC2366]. - -[RFC2366]: https://github.com/gnzlbg/rfcs/blob/ppv/text/0000-ppv.md#interaction-with-cray-vectors - -## Packed SIMD vector types - -Packed SIMD vector types are `repr(simd)` homogeneous tuple-structs containing -`N` elements of type `T` where `N` is a power-of-two and the size and alignment -requirements of `T` are equal: - -```rust,ignore -#[repr(simd)] -struct Vector<T, N>(T_0, ..., T_(N - 1)); -``` - -The set of supported values of `T` and `N` is _implementation-defined_. - -The size of `Vector` is `N * size_of::<T>()` and its alignment is an -_implementation-defined_ function of `T` and `N` greater than or equal to -`align_of::<T>()`. 
That is: - -```rust,ignore -assert_eq!(size_of::<Vector<T, N>>(), size_of::<T>() * N); -assert!(align_of::<Vector<T, N>>() >= align_of::<T>()); -``` - -That is, two distinct `repr(simd)` vector types that have the same `T` and the -same `N` have the same size and alignment. - -Vector elements are laid out in source field order, enabling random access to -vector elements by reinterpreting the vector as an array: - -```rust,ignore -union U { - vec: Vector<T, N>, - arr: [T; N] -} - -assert_eq!(size_of::<Vector<T, N>>(), size_of::<[T; N]>()); -assert!(align_of::<Vector<T, N>>() >= align_of::<[T; N]>()); - -unsafe { - let u = U { vec: Vector(t_0, ..., t_(N - 1)) }; - - assert_eq!(u.vec.0, u.arr[0]); - // ... - assert_eq!(u.vec.(N - 1), u.arr[N - 1]); -} -``` - -### Unresolved questions - -* **Blocked**: Should the layout of packed SIMD vectors be the same as that of - homogeneous tuples ? Such that: - - ```rust,ignore - union U { - vec: Vector<T, N>, - tup: (T_0, ..., T_(N-1)), - } - - assert_eq!(size_of::<Vector<T, N>>(), size_of::<(T_0, ..., T_(N-1))>()); - assert!(align_of::<Vector<T, N>>() >= align_of::<(T_0, ..., T_(N-1))>()); - - unsafe { - let u = U { vec: Vector(t_0, ..., t_(N - 1)) }; - - assert_eq!(u.vec.0, u.tup.0); - // ... - assert_eq!(u.vec.(N - 1), u.tup.(N - 1)); - } - ``` - - This is blocked on the resolution of issue [#36] about the layout of - homogeneous structs and tuples. - - [#36]: https://github.com/rust-rfcs/unsafe-code-guidelines/issues/36 - -* `MaybeUninit<T>` does not have the same `repr` as `T`, so - `MaybeUninit<Vector<T, N>>` are not `repr(simd)`, which has performance - consequences and means that `MaybeUninit<Vector<T, N>>` is not C-FFI safe. +The old content can be accessed [on GitHub](https://github.com/rust-lang/unsafe-code-guidelines/blob/c138499c1de03b908dfe719a41193c84f8146883/reference/src/layout/packed-simd-vectors.md). 
diff --git a/reference/src/layout/pointers.md b/reference/src/layout/pointers.md index b8324ad1..c2b0c02d 100644 --- a/reference/src/layout/pointers.md +++ b/reference/src/layout/pointers.md @@ -1,74 +1,7 @@ # Layout of reference and pointer types -**Disclaimer:** Everything this section says about pointers to dynamically sized -types represents the consensus from issue [#16], but has not been stabilized -through an RFC. As such, this is preliminary information. +**This page has been archived** -[#16]: https://github.com/rust-rfcs/unsafe-code-guidelines/issues/16 +It did not actually reflect current layout guarantees and caused frequent confusion. -### Terminology - -Reference types are types of the form `&T`, `&mut T`. - -Raw pointer types are types of the form `*const T` or `*mut T`. - -### Representation - -The alignment of `&T`, `&mut T`, `*const T` and `*mut T` are the same, -and are at least the word size. - -* If `T` is a sized type then the alignment of `&T` is the word size. -* The alignment of `&dyn Trait` is the word size. -* The alignment of `&[T]` is the word size. -* The alignment of `&str` is the word size. -* Alignment in other cases may be more than the word size (e.g., for other dynamically sized types). - -The sizes of `&T`, `&mut T`, `*const T` and `*mut T` are the same, -and are at least one word. - -* If `T` is a sized type then the size of `&T` is one word. -* The size of `&dyn Trait` is two words. -* The size of `&[T]` is two words. -* The size of `&str` is two words. -* Size in other cases may be more than one word (e.g., for other dynamically sized types). - -### Notes - -The layouts of `&T`, `&mut T`, `*const T` and `*mut T` are the same. - -If `T` is sized, references and pointers to `T` have a size and alignment of one -word and have therefore the same layout as C pointers. 
- -> **warning**: while the layout of references and pointers is compatible with -> the layout of C pointers, references come with a _validity_ invariant that -> does not allow them to be used when they could be `NULL`, unaligned, dangling, -> or, in the case of `&mut T`, aliasing. - -We do not make any guarantees about the layout of -multi-trait objects `&(dyn Trait1 + Trait2)` or references to other dynamically sized types, -other than that they are at least word-aligned, and have size at least one word. - -The layout of `&dyn Trait` when `Trait` is a trait is the same as that of: -```rust -#[repr(C)] -struct DynObject { - data: *const u8, - vtable: *const u8, -} -``` - -> **note**: In the layout of `&mut dyn Trait` the field `data` is of the type `*mut u8`. - -The layout of `&[T]` is the same as that of: -```rust -#[repr(C)] -struct Slice { - ptr: *const T, - len: usize, -} -``` - -> **note**: In the layout of `&mut [T]` the field `ptr` is of the type `*mut T`. - -The layout of `&str` is the same as that of `&[u8]`, and the layout of `&mut str` is -the same as that of `&mut [u8]`. +The old content can be accessed [on GitHub](https://github.com/rust-lang/unsafe-code-guidelines/blob/c138499c1de03b908dfe719a41193c84f8146883/reference/src/layout/pointers.md). diff --git a/reference/src/layout/scalars.md b/reference/src/layout/scalars.md index babb4f46..95e34c54 100644 --- a/reference/src/layout/scalars.md +++ b/reference/src/layout/scalars.md @@ -1,129 +1,7 @@ # Layout of scalar types -**Disclaimer:** This chapter represents the consensus from issue -[#9]. The statements in here are not (yet) "guaranteed" -not to change until an RFC ratifies them. +**This page has been archived** -This documents the memory layout and considerations for `bool`, `char`, floating -point types (`f{32, 64}`), and integral types (`{i,u}{8,16,32,64,128,size}`). -These types are all scalar types, representing a single value, and have no -layout `#[repr()]` flags. 
+It did not actually reflect current layout guarantees and caused frequent confusion. -[#9]: https://github.com/rust-rfcs/unsafe-code-guidelines/issues/9 - -## `bool` - -Rust's `bool` has the same layout as C17's` _Bool`, that is, its size -and alignment are [implementation-defined][data-layout]. Any `bool` can be -cast into an integer, taking on the values 1 (`true`) or 0 (`false`). - -> **Note**: on all platforms that Rust's currently supports, its size and -> alignment are 1, and its ABI class is `INTEGER` - see [Rust Layout and ABIs]. - -[Rust Layout and ABIs]: https://gankro.github.io/blah/rust-layouts-and-abis/#the-layoutsabis-of-builtins - -## `char` - -Rust char is 32-bit wide and represents an [unicode scalar value]. The alignment -of `char` is [implementation-defined][data-layout]. - -[unicode scalar value]: http://www.unicode.org/glossary/#unicode_scalar_value - -> **Note**: Rust `char` type is not layout compatible with C / C++ `char` types. -> The C / C++ `char` types correspond to either Rust's `i8` or `u8` types on all -> currently supported platforms, depending on their signedness. Rust does not -> support C platforms in which C `char` is not 8-bit wide. - -## `isize` and `usize` - -The `isize` and `usize` types are pointer-sized signed and unsigned integers. -They have the same layout as the [pointer types] for which the pointee is -`Sized`, and are layout compatible with C's `uintptr_t` and `intptr_t` types. - -> **Note**: C99 [7.18.2.4](https://port70.net/~nsz/c/c99/n1256.html#7.18.2.4) -> requires `uintptr_t` and `intptr_t` to be at least 16-bit wide. All -> platforms we currently support have a C platform, and as a consequence, -> `isize`/`usize` are at least 16-bit wide for all of them. - -> **Note**: Rust's `usize` and C's `unsigned` types are **not** equivalent. C's -> `unsigned` is at least as large as a short, allowed to have padding bits, etc. -> but it is not necessarily pointer-sized. 
- -> **Note**: in the current Rust implementation, the layouts of `isize` and -> `usize` determine the following: -> -> * the maximum size of Rust _allocations_ is limited to `isize::MAX`. -> The LLVM `getelementptr` instruction uses signed-integer field offsets. Rust -> calls `getelementptr` with the `inbounds` flag which assumes that field -> offsets do not overflow, -> -> * the maximum number of elements in an array is `usize::MAX` (`[T; N: usize]`). -> Only ZST arrays can probably be this large in practice, non-ZST arrays -> are bound by the maximum size of Rust values, -> -> * the maximum value in bytes by which a pointer can be offseted using -> `ptr.add` or `ptr.offset` is `isize::MAX`. -> -> These limits have not gone through the RFC process and are not guaranteed to -> hold. - -[pointer types]: ./pointers.md - -## Fixed-width integer types - -For all Rust's fixed-width integer types `{i,u}{8,16,32,64,128}` it holds that: - -* these types have no padding bits, -* their size exactly matches their bit-width, -* negative values of signed integer types are represented using 2's complement. - -Furthermore, Rust's signed and unsigned fixed-width integer types -`{i,u}{8,16,32,64}` have the same layout as the C fixed-width integer types from -the `` header `{u,}int{8,16,32,64}_t`. These fixed-width integer types -are therefore safe to use directly in C FFI where the corresponding C -fixed-width integer types are expected. - -The alignment of Rust's `{i,u}128` is _unspecified_ and allowed to change. - -> **Note**: While the C standard does not define fixed-width 128-bit wide -> integer types, many C compilers provide non-standard `__int128` types as a -> language extension. The layout of `{i,u}128` in the current Rust -> implementation does **not** match that of these C types, see -> [rust-lang/#54341](https://github.com/rust-lang/rust/issues/54341). 
- -### Layout compatibility with C native integer types - -The specification of native C integer types, `char`, `short`, `int`, `long`, -... as well as their `unsigned` variants, guarantees a lower bound on their size, -e.g., `short` is _at least_ 16-bit wide and _at least_ as wide as `char`. - -Their exact sizes are _implementation-defined_. - -Libraries like `libc` use knowledge of this _implementation-defined_ behavior on -each platform to select a layout-compatible Rust fixed-width integer type when -interfacing with native C integer types (e.g. `libc::c_int`). - -> **Note**: Rust does not support C platforms on which the C native integer type -> are not compatible with any of Rust's fixed-width integer type (e.g. because -> of padding-bits, lack of 2's complement, etc.). - -## Fixed-width floating point types - -Rust's `f32` and `f64` single (32-bit) and double (64-bit) precision -floating-point types have [IEEE-754] `binary32` and `binary64` floating-point -layouts, respectively. - -When the platforms' `"math.h"` header defines the `__STDC_IEC_559__` macro, -Rust's floating-point types are safe to use directly in C FFI where the -appropriate C types are expected (`f32` for `float`, `f64` for `double`). - -If the C platform's `"math.h"` header does not define the `__STDC_IEC_559__` -macro, whether using `f32` and `f64` in C FFI is safe or not for which C type is -_implementation-defined_. - -> **Note**: the `libc` crate uses knowledge of each platform's -> _implementation-defined_ behavior to provide portable `libc::c_float` and -> `libc::c_double` types that can be used to safely interface with C via FFI. - -[IEEE-754]: https://en.wikipedia.org/wiki/IEEE_754 -[data-layout]: https://doc.rust-lang.org/nightly/reference/type-layout.html#primitive-data-layout +The old content can be accessed [on GitHub](https://github.com/rust-lang/unsafe-code-guidelines/blob/c138499c1de03b908dfe719a41193c84f8146883/reference/src/layout/scalars.md). 
diff --git a/reference/src/layout/structs-and-tuples.md b/reference/src/layout/structs-and-tuples.md index e94d2c6e..bfaf7356 100644 --- a/reference/src/layout/structs-and-tuples.md +++ b/reference/src/layout/structs-and-tuples.md @@ -1,452 +1,7 @@ # Layout of structs and tuples -**Disclaimer:** This chapter represents the consensus from issues -[#11] and [#12]. The statements in here are not (yet) "guaranteed" -not to change until an RFC ratifies them. +**This page has been archived** -[#11]: https://github.com/rust-rfcs/unsafe-code-guidelines/issues/11 -[#12]: https://github.com/rust-rfcs/unsafe-code-guidelines/issues/12 +It did not actually reflect current layout guarantees and caused frequent confusion. -## Tuple types - -In general, an anonymous tuple type `(T1..Tn)` of arity N is laid out -"as if" there were a corresponding tuple struct declared in libcore: - -```rust,ignore -#[repr(Rust)] -struct TupleN(P1..Pn); -``` - -In this case, `(T1..Tn)` would be compatible with `TupleN`. -As discussed below, this generally means that the compiler is **free -to re-order field layout** as it wishes. Thus, if you would like a -guaranteed layout from a tuple, you are generally advised to create a -named struct with a `#[repr(C)]` annotation (see [the section on -structs for more details](#structs)). - -Note that the final element of a tuple (`Pn`) is marked as `?Sized` to -permit unsized tuple coercion -- this is implemented on nightly but is -currently unstable ([tracking issue][#42877]). In the future, we may -extend unsizing to other elements of tuples as well. - -[#42877]: https://github.com/rust-lang/rust/issues/42877 - -### Other notes on tuples - -Some related discussion: - -- [RFC #1582](https://github.com/rust-lang/rfcs/pull/1582) proposed - that tuple structs should have a "nested layout", where - e.g. `(T1, T2, T3)` would in fact be laid out as `(T1, (T2, - T3))`. 
The purpose of this was to permit variadic matching and so - forth against some suffix of the struct. This RFC was not accepted, - however. This layout requires extra padding and seems somewhat - surprising: it means that the layout of tuples and tuple structs - would diverge significantly from structs with named fields. - - - -## Struct types - -Structs come in two principle varieties: - -```rust,ignore -// Structs with named fields -struct Foo { f1: T1, .., fn: Tn } - -// Tuple structs -struct Foo(T1, .., Tn); -``` - -In terms of their layout, tuple structs can be understood as -equivalent to a named struct with fields named `0..n-1`: - -```rust,ignore -struct Foo { - 0: T1, - ... - n-1: Tn -} -``` - -(In fact, one may use such field names in patterns or in accessor -expressions like `foo.0`.) - -The degrees of freedom the compiler has when computing the layout of an -*inhabited* struct or tuple is to determine the order of the fields, and the -"gaps" (often called *padding*) before, between, and after the fields. The -layout of these fields themselves is already entirely determined by their types, -and since we intend to allow creating references to fields (`&s.f1`), structs do -not have any wiggle-room there. - -This can be visualized as follows: -```text -[ <--> [field 3] <-----> [field 1] <-> [ field 2 ] <--> ] -``` -**Figure 1** (struct-field layout): The `<-...->` and `[ ... ]` denote the differently-sized gaps and fields, respectively. - -Here, the individual fields are blocks of fixed size (determined by the field's -layout). The compiler freely picks an order for the fields to be in (this does -not have to be the order of declaration in the source), and it picks the gaps -between the fields (under some constraints, such as alignment). - -For *uninhabited* structs or tuples like `(i32, !)` that do not have a valid -inhabitant, the compiler has more freedom. After all, no references to fields -can ever be taken. 
For example, such structs might be zero-sized. - -How exactly the compiler picks order and gaps, as well as other aspects of -layout beyond size and field offset, can be controlled by a `#[repr]` attribute: - -- `#[repr(Rust)]` -- the default. -- `#[repr(C)]` -- request C compatibility -- `#[repr(align(N))]` -- specify the alignment -- `#[repr(packed)]` -- request packed layout where fields are not internally aligned -- `#[repr(transparent)]` -- request that a "wrapper struct" be treated - "as if" it were an instance of its field type when passed as an - argument - -### Default layout ("repr rust") - -With the exception of the guarantees provided below, **the default layout of -structs is not specified.** - -As of this writing, we have not reached a full consensus on what limitations -should exist on possible field struct layouts, so effectively one must assume -that the compiler can select any layout it likes for each struct on each -compilation, and it is not required to select the same layout across two -compilations. This implies that (among other things) two structs with the same -field types may not be laid out in the same way (for example, the hypothetical -struct representing tuples may be laid out differently from user-declared -structs). - -Known things that can influence layout (non-exhaustive): - -- the type of the struct fields and the layout of those types -- compiler settings, including esoteric choices like optimization fuel - -**A note on determinism.** The definition above does not guarantee -determinism between executions of the compiler -- two executions may -select different layouts, even if all inputs are identical. Naturally, -in practice, the compiler aims to produce deterministic output for a -given set of inputs. However, it is difficult to produce a -comprehensive summary of the various factors that may affect the -layout of structs, and so for the time being we have opted for a -conservative definition. 
- -**Compiler's current behavior.** As of the time of this writing, the -compiler will reorder struct fields to minimize the overall size of -the struct (and in particular to eliminate padding due to alignment -restrictions). - -Layout is presently defined not in terms of a "fully monomorphized" -struct definition but rather in terms of its generic definition along -with a set of substitutions (values for each type parameter; lifetime -parameters do not affect layout). This distinction is important -because of *unsizing* -- if the final field has generic type, the -compiler will not reorder it, to allow for the possibility of -unsizing. E.g., `struct Foo { x: u16, y: u32 }` and `struct Foo<T> { -x: u16, y: T }` where `T = u32` are not guaranteed to be identical. - -#### Zero-sized structs -[zero-sized structs]: #zero-sized-structs - -For `repr(Rust)`, `repr(packed(N))`, `repr(align(N))`, and `repr(C)` structs: if -all fields of a struct have size 0, then the struct has size 0. - -For example, all these types are zero-sized: - -```rust -# use std::mem::size_of; -#[repr(align(32))] struct Zst0; -#[repr(C)] struct Zst1(Zst0); -struct Zst2(Zst1, Zst0); -# fn main() { -# assert_eq!(size_of::<Zst0>(), 0); -# assert_eq!(size_of::<Zst1>(), 0); -# assert_eq!(size_of::<Zst2>(), 0); -# } -``` - -In particular, a struct with no fields is a ZST, and if it has no repr attribute -it is moreover a 1-ZST as it also has no alignment requirements. - -#### Single-field structs -[single-field structs]: #single-field-structs - -A struct with only one field has the same layout as that field. - -#### Structs with 1-ZST fields - -For the purposes of struct layout [1-ZST] fields are ignored. - -In particular, if all but one field are 1-ZST, then the struct is equivalent to -a [single-field struct][single-field structs]. In other words, if all but one -field is a 1-ZST, then the entire struct has the same layout as that one field.
- -Similarly, if all fields are 1-ZST, then the struct has the same layout as a -[struct with no fields][zero-sized structs], and is itself a 1-ZST. - -For example: - -```rust -type Zst1 = (); -struct S1(i32, Zst1); // same layout as i32 - -type Zst2 = [u16; 0]; -struct S2(Zst2, Zst1); // same layout as Zst2 - -struct S3(Zst1); // same layout as Zst1 -``` - -#### Unresolved questions - -During the course of the discussion in [#11] and [#12], various -suggestions arose to limit the compiler's flexibility. These questions -are currently considering **unresolved** and -- for each of them -- an -issue has been opened for further discussion on the repository. This -section documents the questions and gives a few light details, but the -reader is referred to the issues for further discussion. - -**Homogeneous structs ([#36]).** If you have homogeneous structs, where all -the `N` fields are of a single type `T`, can we guarantee a mapping to -the memory layout of `[T; N]`? How do we map between the field names -and the indices? What about zero-sized types? - -[#36]: https://github.com/rust-rfcs/unsafe-code-guidelines/issues/36 - -**Deterministic layout ([#35]).** Can we say that layout is some deterministic -function of a certain, fixed set of inputs? This would allow you to be -sure that if you do not alter those inputs, your struct layout would -not change, even if it meant that you can't predict precisely what it -will be. For example, we might say that struct layout is a function of -the struct's generic types and its substitutions, full stop -- this -would imply that any two structs with the same definition are laid out -the same. This might interfere with our ability to do profile-guided -layout or to analyze how a struct is used and optimize based on -that. Some would call that a feature. 
- -[#35]: https://github.com/rust-rfcs/unsafe-code-guidelines/issues/35 - -### C-compatible layout ("repr C") - -For structs tagged `#[repr(C)]`, the compiler will apply a C-like -layout scheme. See section 6.7.2.1 of the [C17 specification][C17] for -a detailed write-up of what such rules entail (as well as the relevant -specs for your platform). For most platforms, however, this means the -following: - -[C17]: https://web.archive.org/web/20181230041359if_/http://www.open-std.org/jtc1/sc22/wg14/www/abq/c17_updated_proposed_fdis.pdf - -- Field order is preserved. -- The first field begins at offset 0. -- Assuming the struct is not packed, each field's offset is aligned[^aligned] to - the ABI-mandated alignment for that field's type, possibly creating - unused padding bits. -- The total size of the struct is rounded up to its overall alignment. - -[^aligned]: Aligning an offset O to an alignment A means to round up the offset O until it is a multiple of the alignment A. - -The intention is that if one has a set of C struct declarations and a -corresponding set of Rust struct declarations, all of which are tagged -with `#[repr(C)]`, then the layout of those structs will all be -identical. Note that this setup implies that none of the structs in -question can contain any `#[repr(Rust)]` structs (or Rust tuples), as -those would have no corresponding C struct declaration -- as -`#[repr(Rust)]` types have undefined layout, you cannot safely declare -their layout in a C program. - -See also the notes on [ABI compatibility](#fnabi) under the section on `#[repr(transparent)]`. - -**Structs with no fields.** One area where Rust layout can deviate -from C/C++ -- even with `#[repr(C)]` -- comes about with "empty -structs" that have no fields. In C, an empty struct declaration like -`struct Foo { }` is illegal. However, both gcc and clang support -options to enable such structs, and [assign them size -zero](https://godbolt.org/z/AS2gdC). 
Rust behaves the same way -- -empty structs have size 0 and alignment 1 (unless an explicit -`#[repr(align)]` is present). C++, in contrast, gives empty structs a -size of 1, unless they are inherited from or they are fields that have -the `[[no_unique_address]]` attribute, in which case they do not -increase the overall size of the struct. - -**Structs of zero-size.** It is also possible to have structs that -have fields but still have zero size. In this case, the size of the -struct would be zero, but its alignment may be greater. For example, -`#[repr(C)] struct Foo { x: [u16; 0] }` would have an alignment of 2 -bytes by default. ([This matches the behavior in gcc and -clang](https://godbolt.org/z/5w0gkq).) - -**Structs with fields of zero-size.** If a `#[repr(C)]` struct -containing a field of zero-size, that field does not occupy space in -the struct; it can affect the offsets of subsequent fields if it -induces padding due to the alignment on its type. ([This matches the -behavior in gcc and clang](https://godbolt.org/z/5w0gkq).) - -**C++ compatibility hazard.** As noted above when discussing structs -with no fields, C++ treats empty structs like `struct Foo { }` -differently from C and Rust. This can introduce subtle compatibility -hazards. If you have an empty struct in your C++ code and you make the -"naive" translation into Rust, even tagging with `#[repr(C)]` will not -produce layout- or ABI-compatible results. - -### Fixed alignment - -The `#[repr(align(N))]` attribute may be used to raise the alignment -of a struct, as described in [The Rust Reference][TRR-align]. - -[TRR-align]: https://doc.rust-lang.org/stable/reference/type-layout.html#the-align-representation - -### Packed layout - -The `#[repr(packed(N))]` attribute may be used to impose a maximum -limit on the alignments for individual fields. It is most commonly -used with an alignment of 1, which makes the struct as small as -possible. 
For example, in a `#[repr(packed(2))]` struct, a `u8` or -`u16` would be aligned at 1- or 2-bytes respectively (as normal), but -a `u32` would be aligned at only 2 bytes instead of 4. In the absence -of an explicit `#[repr(align)]` directive, `#[repr(packed(N))]` also -sets the alignment for the struct as a whole to N bytes. - -The resulting fields may not fall at properly aligned boundaries in -memory. This makes it unsafe to create a Rust reference (`&T` or `&mut -T`) to those fields, as the compiler requires that all reference -values must always be aligned (so that it can use more efficient -load/store instructions at runtime). See the [Rust reference for more -details][TRR-packed]. - -[TRR-packed]: https://doc.rust-lang.org/stable/reference/type-layout.html#the-packed-representation - - - -### Function call ABI compatibility - -In general, when invoking functions that use the C ABI, `#[repr(C)]` -structs are guaranteed to be passed in the same way as their -corresponding C counterpart (presuming one exists). `#[repr(Rust)]` -structs have no such guarantee. This means that if you have an `extern -"C"` function, you cannot pass a `#[repr(Rust)]` struct as one of its -arguments. Instead, one would typically pass `#[repr(C)]` structs (or -possibly pointers to Rust-structs, if those structs are opaque on the -other side, or the callee is defined in Rust). - -However, there is a subtle point about C ABIs: in some C ABIs, passing -a struct with one field of type `T` as an argument is **not** -equivalent to just passing a value of type `T`. So e.g. if you have a -C function that is defined to take a `uint32_t`: - -```C -void some_function(uint32_t value) { .. } -``` - -It is **incorrect** to pass in a struct as that value, even if that -struct is `#[repr(C)`] and has only one field: - -```rust,ignore -#[repr(C)] -struct Foo { x: u32 } - -extern "C" some_function(Foo); - -some_function(Foo { x: 22 }); // Bad! 
-``` - -Instead, you should declare the struct with `#[repr(transparent)]`, -which specifies that `Foo` should use the ABI rules for its field -type, `u32`. This is useful when using "wrapper structs" in Rust to -give stronger typing guarantees. - -`#[repr(transparent)]` can only be applied to structs with a single -field whose type `T` has non-zero size, along with some number of -other fields whose types are all zero-sized (typically -`std::marker::PhantomData` fields). The struct then takes on the "ABI -behavior" of the type `T` that has non-zero size. - -(Note further that the Rust ABI is undefined and theoretically may -vary from compiler revision to compiler revision.) - -## Unresolved question: Guaranteeing compatible layouts? - -One key unresolved question was whether we would want to guarantee -that two `#[repr(Rust)]` structs whose fields have the same types are -laid out in a "compatible" way, such that one could be transmuted to -the other. @rkruppe laid out a [number of -examples](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/11#issuecomment-419956939) -where this might be a reasonable thing to expect. As currently -written, and in an effort to be conservative, we make no such -guarantee, though we do not firmly rule out doing such a thing in the future. - -It seems like it may well be desirable to -- at minimum -- guarantee -that `#[repr(Rust)]` layout is "some deterministic function of the -struct declaration and the monomorphized types of its fields". Note -that it is not sufficient to consider the monomorphized type of a -struct's fields: due to unsizing coercions, it matters whether the -struct is declared in a generic way or not, since the "unsized" field -must presently be [laid out last in the -structure](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/12#issuecomment-417843595). (Note -that tuples are always coercible (see [#42877] for more information), -and are always declared as generics.) 
This implies that our -"deterministic function" also takes as input the form in which the -fields are declared in the struct. - -However, that rule is not true today. For example, the compiler -includes an option (called "optimization fuel") that will enable us to -alter the layout of only the "first N" structs declared in the -source. When one is accidentally relying on the layout of a structure, -this can be used to track down the struct that is causing the problem. - -[#42877]: https://github.com/rust-lang/rust/issues/42877 -[pg-unsized-tuple]: https://play.rust-lang.org/?gist=46399bb68ac685f23beffefc014203ce&version=nightly&mode=debug&edition=2015 - -There are also benefits to having fewer guarantees. For example: - -- Code hardening tools can be used to randomize the layout of individual structs. -- Profile-guided optimization might analyze how instances of a -particular struct are used and tweak the layout (e.g., to insert -padding and reduce false sharing). - - However, there aren't many tools that do this sort of thing -([1](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/11#issuecomment-420650851), -[2](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/11#issuecomment-420681763)). Moreover, -it would probably be better for the tools to merely recommend -annotations that could be added -([1](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/11#issuecomment-420077105), -[2](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/11#issuecomment-420077105)), -such that the knowledge of the improved layouts can be recorded in the -source. - -As a more declarative alternative, @alercah [proposed a possible -extension](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/12#issuecomment-420165155) -that would permit one to declare that the layout of two structs or -types are compatible (e.g., `#[repr(as(Foo))] struct Bar { .. }`), -thus permitting safe transmutes (and also ABI compatibility). 
One -might also use some weaker form of `#[repr(C)]` to specify a "more -deterministic" layout. These areas need future exploration. - -## Counteropinions and other notes - -@joshtripplet [argued against reordering struct -fields](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/11#issuecomment-417953576), -suggesting instead it would be better if users reordering fields -themselves. However, there are a number of downsides to such a -proposal (and -- further -- it does not match our existing behavior): - -- In a generic struct, the [best ordering of fields may not be known - ahead of - time](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/11#issuecomment-420659840), - so the user cannot do it manually. -- If layout is defined, and a library exposes a struct with all public - fields, then clients may be more likely to assume that the layout of - that struct is stable. If they were to write unsafe code that relied - on this assumption, that would break if fields were reordered. But - libraries may well expect the freedom to reorder fields. This case - is weakened because of the requirement to write unsafe code (after - all, one can always write unsafe code that relies on virtually any - implementation detail); if we were to permit **safe** casts that - rely on the layout, then reordering fields would clearly be a - breaking change (see also [this - comment](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/11#issuecomment-420117856) - and [this - thread](https://github.com/rust-rfcs/unsafe-code-guidelines/pull/31#discussion_r224955817)). -- Many people would prefer the name ordering to be chosen for - "readability" and not optimal layout. - -[1-ZST]: ../glossary.md#zero-sized-type--zst +The old content can be accessed [on GitHub](https://github.com/rust-lang/unsafe-code-guidelines/blob/c138499c1de03b908dfe719a41193c84f8146883/reference/src/layout/structs-and-tuples.md). 
diff --git a/reference/src/layout/unions.md b/reference/src/layout/unions.md index 42d55ee7..f4d74150 100644 --- a/reference/src/layout/unions.md +++ b/reference/src/layout/unions.md @@ -1,178 +1,7 @@ # Layout of unions -**Disclaimer:** This chapter represents the consensus from issue -[#13]. The statements in here are not (yet) "guaranteed" -not to change until an RFC ratifies them. +**This page has been archived** -**Note:** This document has not yet been updated to -[RFC 2645](https://github.com/rust-lang/rfcs/blob/master/text/2645-transparent-unions.md). +It did not actually reflect current layout guarantees and caused frequent confusion. -[#13]: https://github.com/rust-rfcs/unsafe-code-guidelines/issues/13 - -### Layout of individual union fields - -A union consists of several variants, one for each field. All variants have the -same size and start at the same memory address, such that in memory the variants -overlap. This can be visualized as follows: - -```text -[ <--> [field0_ty] <----> ] -[ <----> [field1_ty] <--> ] -[ <---> [field2_ty] <---> ] -``` -**Figure 1** (union-field layout): Each row in the picture shows the layout of -the union for each of its variants. The `<-...->` and `[ ... ]` denote the -differently-sized gaps and fields, respectively. - -The individual fields (`[field{i}_ty_]`) are blocks of fixed size determined by -the field's [layout]. Since we allow creating references to union fields -(`&u.i`), the only degrees of freedom the compiler has when computing the layout -of a union are the size of the union, which can be larger than the size of its -largest field, and the offset of each union field within its variant. How these -are picked depends on certain constraints like, for example, the alignment -requirements of the fields, the `#[repr]` attribute of the `union`, etc. 
- -[padding]: ../glossary.md#padding -[layout]: ../glossary.md#layout - -### Unions with default layout ("`repr(Rust)`") - -Except for the guarantees provided below for some specific cases, the default -layout of Rust unions is, _in general_, **unspecified**. - -That is, there are no _general_ guarantees about the offset of the fields, -whether all fields have the same offset, what the call ABI of the union is, etc. - -
Rationale - -As of this writing, we want to keep the option of using non-zero offsets open -for the future; whether this is useful depends on what exactly the -compiler-assumed invariants about union contents are. This might become clearer -after the [validity of unions][#73] is settled. - -Even if the offsets happen to be all 0, there might still be differences in the -function call ABI. If you need to pass unions by-value across an FFI boundary, -you have to use `#[repr(C)]`. - -[#73]: https://github.com/rust-lang/unsafe-code-guidelines/issues/73 - -
- -#### Layout of unions with a single non-zero-sized field - -The layout of unions with a single non-[1-ZST]-field" is the same as the -layout of that field if it has no [padding] bytes. - -For example, here: - -```rust -# use std::mem::{size_of, align_of}; -# #[derive(Copy, Clone)] -#[repr(transparent)] -struct SomeStruct(i32); -# #[derive(Copy, Clone)] -struct Zst; -union U0 { - f0: SomeStruct, - f1: Zst, -} -# fn main() { -# assert_eq!(size_of::<U0>(), size_of::<SomeStruct>()); -# assert_eq!(align_of::<U0>(), align_of::<SomeStruct>()); -# } -``` - -the union `U0` has the same layout as `SomeStruct`, because `SomeStruct` has no -padding bits - it is equivalent to an `i32` due to `repr(transparent)` - and -because `Zst` is a [1-ZST]. - -On the other hand, here: - -```rust -# use std::mem::{size_of, align_of}; -# #[derive(Copy, Clone)] -struct SomeOtherStruct(i32); -# #[derive(Copy, Clone)] -#[repr(align(16))] struct Zst2; -union U1 { - f0: SomeOtherStruct, - f1: Zst2, -} -# fn main() { -# assert_eq!(size_of::<U1>(), align_of::<Zst2>()); -# assert_eq!(align_of::<U1>(), align_of::<Zst2>()); -assert_eq!(align_of::<U1>(), 16); -# } -``` - -the layout of `U1` is **unspecified** because: - -* `Zst2` is not a [1-ZST], and -* `SomeOtherStruct` has an unspecified layout and could contain padding bytes. - -### C-compatible layout ("repr C") - -The layout of `repr(C)` unions follows the C layout scheme. Per sections -[6.5.8.5] and [6.7.2.1.16] of the C11 specification, this means that the offset -of every field is 0. Unsafe code can cast a pointer to the union to a field type -to obtain a pointer to any field, and vice versa. - -[6.5.8.5]: http://port70.net/~nsz/c/c11/n1570.html#6.5.8p5 -[6.7.2.1.16]: http://port70.net/~nsz/c/c11/n1570.html#6.7.2.1p16 - -#### Padding - -Since all fields are at offset 0, `repr(C)` unions do not have padding before -their fields. They can, however, have padding in each union variant *after* the -field, to make all variants have the same size.
- -Moreover, the entire union can have trailing padding, to make sure the size is a -multiple of the alignment: - -```rust -# use std::mem::{size_of, align_of}; -#[repr(C, align(2))] -union U { x: u8 } -# fn main() { -// The repr(align) attribute raises the alignment requirement of U to 2 -assert_eq!(align_of::<U>(), 2); -// This introduces trailing padding, raising the union size to 2 -assert_eq!(size_of::<U>(), 2); -# } -``` - -> **Note**: Fields are overlapped instead of laid out sequentially, so -> unlike structs there is no "between the fields" that could be filled -> with padding. - -#### Zero-sized fields - -`repr(C)` union fields of zero-size are handled in the same way as in struct -fields, matching the behavior of GCC and Clang for unions in C when zero-sized -types are allowed via their language extensions. - -That is, these fields occupy zero-size and participate in the layout computation -of the union as usual: - -```rust -# use std::mem::{size_of, align_of}; -#[repr(C)] -union U { - x: u8, - y: [u16; 0], -} -# fn main() { -// The zero-sized type [u16; 0] raises the alignment requirement to 2 -assert_eq!(align_of::<U>(), 2); -// This in turn introduces trailing padding, raising the union size to 2 -assert_eq!(size_of::<U>(), 2); -# } -``` - -**C++ compatibility hazard**: C++ does, in general, give a size of 1 to types -with no fields. When such types are used as a union field in C++, a "naive" -translation of that code into Rust will not produce a compatible result. Refer -to the [struct chapter](structs-and-tuples.md#c-compatible-layout-repr-c) for -further details. - -[1-ZST]: ../glossary.md#zero-sized-type--zst +The old content can be accessed [on GitHub](https://github.com/rust-lang/unsafe-code-guidelines/blob/c138499c1de03b908dfe719a41193c84f8146883/reference/src/layout/unions.md).
diff --git a/reference/src/optimizations/return_value_optimization.md b/reference/src/optimizations/return_value_optimization.md index 4f5b11ec..de3e0562 100644 --- a/reference/src/optimizations/return_value_optimization.md +++ b/reference/src/optimizations/return_value_optimization.md @@ -1,20 +1,5 @@ -We should turn +**This page has been archived** -```rust,ignore -// y unused -let mut x = f(); -g(&mut x); -y = x; -// x unused -``` +It did not actually reflect current language guarantees and caused frequent confusion. -into - -```rust,ignore -y = f(); -g(&mut y); -``` - -to avoid a copy. - -The potential issue here is `g` storing the pointer it got as an argument elsewhere. +The old content can be accessed [on GitHub](https://github.com/rust-lang/unsafe-code-guidelines/blob/c138499c1de03b908dfe719a41193c84f8146883/reference/src/optimizations/return_value_optimization.md). diff --git a/reference/src/validity/function-pointers.md b/reference/src/validity/function-pointers.md index 11389b15..a156249c 100644 --- a/reference/src/validity/function-pointers.md +++ b/reference/src/validity/function-pointers.md @@ -1,25 +1,7 @@ # Validity of function pointers -**Disclaimer**: This chapter is a work-in-progress. What's contained here -represents the consensus from issue [#72]. The statements in here are not (yet) -"guaranteed" not to change until an RFC ratifies them. +**This page has been archived** -A function pointer is "valid" (in the sense that it can be produced without causing immediate UB) if and only if it is non-null. +It did not actually reflect current language guarantees and caused frequent confusion. -That makes this code UB: - -```rust -fn bad() { - let x: fn() = unsafe { std::mem::transmute(0usize) }; // This is UB! -} -``` - -However, any integer value other than NULL is allowed for function pointers: - -```rust -fn good() { - let x: fn() = unsafe { std::mem::transmute(1usize) }; // This is not UB. 
-} -``` - -[#72]: https://github.com/rust-lang/unsafe-code-guidelines/issues/72 +The old content can be accessed [on GitHub](https://github.com/rust-lang/unsafe-code-guidelines/blob/c138499c1de03b908dfe719a41193c84f8146883/reference/src/validity/function-pointers.md). diff --git a/reference/src/validity/unions.md b/reference/src/validity/unions.md index bc732ff0..ec610c74 100644 --- a/reference/src/validity/unions.md +++ b/reference/src/validity/unions.md @@ -1,13 +1,7 @@ # Validity of unions -**Disclaimer**: This chapter is a work-in-progress. What's contained here -represents the consensus from issue [#73]. The statements in here are not (yet) -"guaranteed" not to change until an RFC ratifies them. +**This page has been archived** -## Validity of unions with zero-sized fields +It did not actually reflect current language guarantees and caused frequent confusion. -A union containing a zero-sized field can contain any bit pattern. An example of such -a union is [`MaybeUninit`]. - -[#73]: https://github.com/rust-lang/unsafe-code-guidelines/issues/73 -[`MaybeUninit`]: https://doc.rust-lang.org/std/mem/union.MaybeUninit.html +The old content can be accessed [on GitHub](https://github.com/rust-lang/unsafe-code-guidelines/blob/c138499c1de03b908dfe719a41193c84f8146883/reference/src/validity/unions.md).