Puya Arrays

Status: Decided
Owner: Daniel McGregor (MakerX)
Deciders: Neil Campbell (MakerX), Daniel McGregor (MakerX), Tristan Menzel (MakerX), Larkin Young (Algorand Foundation), David Rojas (Algorand Foundation), Brian Whippo (Algorand Foundation), Joe Polny (Algorand Foundation), Giorgio Ciotti (Algorand Foundation)
Date created: 2024-12-10
Date decided: 2024-12-19
Date updated: 2024-12-19

Context

Arrays are a fundamental primitive in a high level languages and as such broad support is desirable, even in a constrained environment like the AVM. Currently Algorand Python only supports ARC-4 encoded arrays that have been designed to be efficient for reading, letting clients interacting with smart contracts manage the additional work required to ARC-4 encode the values. Given they are not efficient for creating or manipulating values, we have room to improve and introduce a more general purpose array.

In addition to how an array is encoded, the other facet to consider is where the array is stored in memory. The AVM supports two different types of memory which can be leveraged when writing a program: the stack and scratch space. These concepts are akin to a stack and heap found in more traditional virtual machine environments like the JVM.

The AVM memory types have some constraints that need to be considered. The stack can hold up to 1000 values, but only 256 are addressable at any point. Scratch slots support up to 256 addressable values. Both support values types of either a single uint64 or a byte slice of up to 4096 bytes. The stack is addressed using relative static offsets whereas scratch slots are addressed using dynamic absolute indexes. The stack is a good fit for value based semantics as each value is a copy, whereas scratch slots are a good fit for reference based semantics as each value can be uniquely addressed.

In our target high level languages (Python and JavaScript), the builtin arrays (or lists) types have reference based semantics and mutable API’s. Meaning that when an array is modified, all references to that array see the modification.

The developer experience with Puya Arrays will depend on the API design (mutable vs immutable) and where those arrays reside in memory (stack vs scratch) this ADR will outline the pros and cons of some different approaches with these facets in mind.

Requirements

Provide a more efficient array implementation for general purpose use
Provide an API that communicates clearly the effect of operations and is semantically correct
Support a broad range of types (including non ARC-4 types)

Principles

AlgoKit Guiding Principles - specifically Seamless onramp, Meet developers where they are, Leverage existing ecosystem
Algorand Python Principles - specifically Least surprise and Leverage existing ecosystem by maintaining semantic compatibility.
Algorand TypeScript Guiding Principles - specifically TEALScript compatibility and minimise Algorand Python disparity.

Options

Option 1. Support mutable array and element API with a stack based implementation (e.g. ARC4 style)

This option introduces a mutable array API, which is familiar and would closely align with native collection types in all currently supported languages (arrays in TypeScript, lists in Python). A stack based implementation is relatively simple to implement in the AVM as it is a stack based virtual machine. Additionally the existing arc4 arrays are stack based and we’d likely repurpose a large portion of the existing functionality.

However a mutable API implies reference semantics, but a stack based implementation is semantically incompatible with this, so various restrictions are required to make the API semantically compatible.

Supported Operations

const arr = new Array<uint64>();
// add elements
arr.push(42);
arr.push(123);
// replace element
arr[0] = 123;
// remove element
arr.pop();
// read element
const value = arr[0];
// pass to subroutine without .copy() - additional complexity in compiler to support this
someFunction(arr);

// awkward consequences
// forced copies - enforcing this is complex and has been a source of bugs
const arr2 = arr.copy();
structs.push(someStruct.copy());

Unsupported Operations

const structs = new Array<MutableStruct>();

//unsupported - introducing additional references without a .copy()
const arr2 = arr;
structs.push(someStruct);
structs[0] = someStruct;

// passing an array multiple times
someFunction(arr, arr);

Pros

A mutable API supports both mutable and immutable elements and arrays have an API that is similar to the native collection API
Can repurpose the existing ARC-4 arrays and extend to support non ARC-4 values

Cons

There is a risk of subtle unknown bugs due to the incompatibility of the API and the implementation
Requires .copy() whenever introducing additional references to a mutable array or struct
Requires additional checks in the compiler to detect when .copy() is needed
To reduce awkwardness, we’d want to reduce the need for .copy() where possible i.e. implicitly updating array values when passed to a subroutine
The total array memory usage is limited to 4096 bytes

Option 2. Support immutable array and element API with a stack based implementation

This option introduces an immutable array API, which may be less familiar to developers, but is not uncommon (e.g. Redux), however it is a more natural fit for a stack based implementation. Arrays typically have reference semantics, however with an immutable API there isn’t an observable difference between an immutable reference based array and an immutable value based array.

Supported Operations

let arr = new Array<uint64>();

// add elements
arr = arr.concat(42, 123);
// replace element
arr = arr.with(0, 123);
// remove elements
arr = arr.slice(0, 1);
// read element
const value = arr[index];
// pass to subroutine, does not require additional complexity to support this
someReadonlyFunction(arr);
// for a subroutine to return a modified array, it must be an explicitly returned value
arr = someFunctionThatMutates(arr);

let arr2 = new Array<ImmutableStruct>();
arr2 = arr2.concat(someStruct);
// mutating elements requires creating a copy of the new element and the array
arr2 = arr2.with(0, someStruct.evolve((someProp = "someValue")));

Unsupported Operations

// unsupported operations
arr.push(...)
arr.pop(...)
arr.splice(...)
arr2[0].someProp = "someValue"

// mutatating an array without returning the modified copy
someFunctionThatMutates(arr)

Pros

The API is semantically compatible with the implementation making it easy to reason about
It won’t require .copy() or other “tricks” to ensure semantic compatibility

Cons

Will result in more verbose code as each modification requires reassignment
Is less familiar and more verbose for those used to mutable array and element API’s
The total array memory usage is limited to 4096 bytes

Option 3. Support mutable array with immutable element API with a scratch based implementation

This option introduces a scratch based array, which is a better fit for a mutable API as it has reference semantics, which makes for a more familiar developer experience. Because of the constraints that exist with scratch slots, supporting mutable elements could result in exhausting available resources in unpredictable ways. As a result only immutable elements shoud be supported.

Supported Operations

const arr = new Array<uint64>();
// add elements
arr.push(42);
arr.push(123);
// replace element
arr[0] = 123;
// remove element
arr.pop();
// read element
const value = arr[0];

const structs = new Array<ImmutableStruct>();
// reassigning still points to the same array
const sameStructs = structs;
// can modify array
structs.push(someStruct);
// can replace elements
structs[0] = someStruct.evolve((someProp = "someValue"));

Unsupported Operations

//unsupported mutable types
const structs = new Array<MutableStruct>();
//modifications to elements
structs[0].someProp = "someValue";

Pros

A familiar API
Won’t require .copy() or other “tricks” to ensure semantic compatibility
Has predicable scratch slot usage

Cons

Is restricted to immutable elements
Still has potential to exhaust scratch slot allocations when a large number of arrays are used
The total array memory usage is limited to 4096 bytes

Option 4. Support mutable array and element API with a scratch based implementation

This option is similar to option 3, but expands support to include mutable elements. In this implementation each mutable element would reside in it’s own slot so that multiple references can all address it independently.

Supported Operations

const arr = new Array<uint64>()
// add elements
arr.push(42)
arr.push(123)
// replace element
arr[0] = 123
// remove element
arr.pop()
// read element
const value = arr[0]

const structs = new Array<MutableStruct>()
structs.push(someStruct)
// reassigning still points to the same array
const structs2 = structs
// modifying an element updates all referencies
structs2[0].someProp = "someValue"
assert structs[0].someProp == structs2[0].someProp
// can pass the same reference multiple times
// any mutations to either reference will be seen in calling function
someFunctionThatMutates(structs, structs2)

Pros

A familiar and fully supported API
An array of mutable elements is not limited to 4096 bytes, however each element would still have this limit

Cons

Easily possible to exhaust scratch slots quickly when working with an array of mutable elements as each element would require it’s own slot
It’s the most complex option to fully implement

Preferred option

In an ideal world being able to fully support option 4 would be great, however due to the constrained nature of the AVM it’s not really feasible. The other options on their own will work, however there is some differences to what developers would be used to. As a result we feel that a combination of both option 2 and option 3 would give developers the familiarity and control they would expect. If the front-end language authors choose to leverage a native collection type, having both options available also allows them to choose which implementation makes the most sense.

Selected option

We have chosen to implement option 2 and option 3 in the form of ImmutableArray and MutableArray variants. The exact naming and API design will be fleshed out as support is built into the Algorand Python and Algorand TypeScript languages.