Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for symbolic storage #148

Closed
karmacoma-eth opened this issue Jul 26, 2023 · 15 comments
Closed

Add support for symbolic storage #148

karmacoma-eth opened this issue Jul 26, 2023 · 15 comments
Assignees
Labels
bug Something isn't working

Comments

@karmacoma-eth
Copy link
Collaborator

Describe the bug

External call encountered an issue at SSTORE: symbolic storage base slot: Concat(0, Extract(159, 0, p_alice_address))

To Reproduce

// SPDX-License-Identifier: MIT
pragma solidity ^0.8.15;

import "forge-std/Test.sol";
import {SymTest} from "halmos-cheatcodes/SymTest.sol";

contract Dummy {}

/// @notice don't use, this is very buggy on purpose
contract SmolWETH {
    function deposit() external payable {
        // bug: we stomp any existing balance
        assembly {
            sstore(caller(), callvalue())
        }
    }

    function withdraw(uint256) external {
        assembly {
            // revert if msg.value > 0
            if gt(callvalue(), 0) {
                revert(0, 0)
            }

            let amount := sload(caller())
            let success := call(gas(), caller(), amount, 0, 0, 0, 0)

            // bug: we always erase the balance, regardless of transfer success
            // bug: we should erase the balance before making the call
            sstore(caller(), 0)
        }
    }

    function balanceOf(address account) external view returns (uint256) {
        assembly {
            mstore(0, sload(account))
            return(0, 0x20)
        }
    }
}

contract SmolWETHTest is Test, SymTest {
    SmolWETH weth;

    function setUp() public {
        weth = new SmolWETH();
    }

    function test_deposit_once(address alice, uint256 amount) external {
        // fund alice
        vm.deal(alice, amount);

        // alice deposits
        vm.prank(alice);
        weth.deposit{value: amount}();

        // alice's balance is updated
        assertEq(weth.balanceOf(alice), amount);
    }
}
halmos --contract SmolWETHTest --function test_deposit_once  
@karmacoma-eth karmacoma-eth added the bug Something isn't working label Jul 26, 2023
@plotchy
Copy link
Contributor

plotchy commented Aug 16, 2023

Note that sloads and sstores at keccak(..) is heavily used across Solady. Runs into this same issue when testing.

[FAIL] check_balanceOfBeforeMints(address) (paths: 0/4, time: 0.02s, bounds: [])
WARNING:Halmos:STATICCALL failed: External call encountered an issue at SLOAD: symbolic storage base slot: sha3_224(Concat(Extract(159, 0, p_to_address), 745959789804126208))

Potentially can we treat sload's at sha based on the arguments instead of the hash output?

https://github.com/Vectorized/solady/blob/0586b13309275f95cc368ef7da65fc17a0dbd7b3/src/tokens/ERC721.sol#L167-L169
https://github.com/Vectorized/solady/blob/0586b13309275f95cc368ef7da65fc17a0dbd7b3/src/tokens/ERC721.sol#L180-L182
https://github.com/Vectorized/solady/blob/0586b13309275f95cc368ef7da65fc17a0dbd7b3/src/tokens/ERC721.sol#L212-L215

@daejunpark daejunpark self-assigned this Aug 16, 2023
@daejunpark
Copy link
Collaborator

Potentially can we treat sload's at sha based on the arguments instead of the hash output?

Could you elaborate more on it? What do you mean by "arguments" vs "output"?

@plotchy
Copy link
Contributor

plotchy commented Aug 17, 2023

Sorry, I think I misinterpreted the results I was getting. This issue is moreso symbolic storage values and not related to storage locations from the result of hashing.

This is an example depicting what I thought the issue would be related to (hashing a symbolic value and using the result for an sload), but this runs fine! Nice!

function check_keccak_matching_3(uint c) public {
    bytes32 a;
    bytes32 b;
    assembly {
        mstore(0x00, c)
        a := sload(keccak256(0x00, 0x20))
        mstore(0x20, c)
        b := sload(keccak256(0x20, 0x20))
    }
    assert(a == b);
}

@plotchy
Copy link
Contributor

plotchy commented Aug 17, 2023

What would it take to hack together storage support for arbitrary sha3_sizes?

referring to this part of the codebase:

halmos/src/halmos/sevm.py

Lines 915 to 923 in e3aea14

if loc.decl().name() == "sha3_512": # m[k] : hash(k.m)
args = loc.arg(0)
offset, base = simplify(Extract(511, 256, args)), simplify(
Extract(255, 0, args)
)
return self.decode_storage_loc(base) + (offset, con(0))
elif loc.decl().name() == "sha3_256": # a[i] : hash(a)+i
base = loc.arg(0)
return self.decode_storage_loc(base) + (con(0),)

Solady uses custom storage slot calculations. In the case of balanceOf it appears as
sha3_224(Concat(halmos_to_address_01, 745959789804126208)).
Currently looks like sha_size of 512 and 256 are supported by halmos.

My goal is to just check updates against the same storage slot:

function check_MintBalanceUpdate() public {
    address to = svm.createAddress('to');
    uint tokenId = svm.createUint256('tokenId');
    uint oldBalanceTo = token.balanceOf(to);
    token.mint(to, tokenId);

    uint newBalanceTo = token.balanceOf(to);

    assert(token.ownerOf(tokenId) == to);
    assert(newBalanceTo >= oldBalanceTo); // ensuring no overflow
    assert(newBalanceTo == oldBalanceTo + 1);
}

For this usecase I would think the sha_size can be any arbitrary value.
Any general pointers would be appreciated! :) @karmacoma-eth @daejunpark

here's a repro gist. One big flattened file.
https://gist.github.com/plotchy/9f378a435c557322f1800547a01821b9

@karmacoma-eth
Copy link
Collaborator Author

thanks for the repro! I also noticed the same issue when looking at SoladyERC20. I don't have a direct answer for how to solve this but I'll surface my high level intuition: the current storage model is tailored to solidity storage conventions, for which it works very well. OTOH it doesn't support alternative storage conventions like the ones in Solady, Vyper, Huff (cc @Philogy), etc. So it would be awesome to not just go for a short term fix but rather think about how we want to deal with storage in general. Trying to think of a couple options:

  1. dedicated storage "plugins" -- we already have a Solidity plugin, do we want to make a Vyper plugin, a Solady plugin, etc? It would then be possible to configure them on demand with something like halmos --storage-model=solady
  2. a generic (but inefficient) storage model with fully symbolic locations. Instead of aborting like today, we could fallback to this slower model
  3. tweak the current model to make it a "mega-model" that tries to accommodate more use cases?
  4. something else...?

I am a bit biased against option (3.) because the current model is already very complicated IMO, but curious to hear what other people think

@daejunpark
Copy link
Collaborator

We previously discussed this internally, and our current plan revolves around the first option in Karma's answer.

Specifically, we can provide preset storage layouts for known use cases and allow users to specify their own custom storage layout if it is not already provided.

We also considered more general approaches that don't rely on specific storage layouts. Indeed, we came up with a scheme for that, but we found some usability issues, such as difficulties in displaying counterexamples around complex storage mappings, and in implementing cheatcodes that allow for creating a symbol for an entire storage mapping.

@plotchy
Copy link
Contributor

plotchy commented Aug 17, 2023

I'm onboard with the preset+custom storage layouts. Sounds like a good mix of providing support for arbitrary projects but not accruing mega-model complexity or inefficiencies with a general model.

Considering that vyper/huff/solidity are equals under the halmos view, do you see the definition of the storage layout using something like json/yaml?

Spitballing here. With halmos as the consumer of this layout, how do you see it integrating together? User provides something like a Layout.yaml and halmos parses this during runtime? Or upon aborting due to this issue on a first pass, halmos could generate a layout file where it needs a user to supply extra information to work correctly on a second pass?

What kinds of information does halmos need from this layout to properly perform this portion?:

difficulties in displaying counterexamples around complex storage mappings, and in implementing cheatcodes that allow for creating a symbol for an entire storage mapping.

vyper: keccak256(uint256(slot) . uint256(key))
solidity: keccak256(uint256(key) . uint256(slot))
solady: keccak256(address . uint32(0x7d882553)), ... among others
#154

@daejunpark
Copy link
Collaborator

do you see the definition of the storage layout using something like json/yaml?

A language-independent medium would be preferable, but a specific format for describing the potentially recursive definition of layouts is not yet determined. I expect that we can gather more data as we implement various preset layouts.

User provides something like a Layout.yaml and halmos parses this during runtime? Or upon aborting due to this issue on a first pass, halmos could generate a layout file where it needs a user to supply extra information to work correctly on a second pass?

I was thinking of the former, but the latter also seems like a good idea / feature that can be added too.

What kinds of information does halmos need from this layout

We need to reconstruct source-level expressions from the recursive hash arguments to better display counterexample values in the storage. For that, we also need to know how to associate slot numbers with storage variable names.

For the cheatcode that creates a symbol for an entire storage mapping, we need to maintain separate SMT arrays for different mappings. This requirement makes the general approach less viable. But, that said, I'm not sure if we really need such a cheatcode in practice.

@plotchy
Copy link
Contributor

plotchy commented Aug 17, 2023

I hesitate to hold the solidity layout as the best starting place since the other evm langs have had to deal with solidity becoming the first mover in regards to abi-encoding, fn selectors, etc. This seems like a good opportunity to take nice inspiration from vyper with storage layout.

that being said, bringing an instructive example of a solidity layout here.

contract InstructiveStorage {
    struct Struct {
        bool a;
        address b;
    }
    bool flag;
    uint8 eight;
    uint16 sixteen;
    address addr;
    mapping(address => bool) private foo;
    mapping(address => mapping(address => uint256)) private bar;
    mapping(bool => mapping(address => Struct)) private baz;
    string private _name;
}

Taking a peek at the storageLayout JSON and a first pass at converting to yaml. snipping a bunch of entries to keep it brief.

"storageLayout": {
    "storage": [
      {
        "astId": 7,
        "contract": "src/storage.sol:Storage",
        "label": "flag",
        "offset": 0,
        "slot": "0",
        "type": "t_bool"
      },
      {
        "astId": 23,
        "contract": "src/storage.sol:Storage",
        "label": "bar",
        "offset": 0,
        "slot": "2",
        "type": "t_mapping(t_address,t_mapping(t_address,t_uint256))"
      },
      {
        "astId": 30,
        "contract": "src/storage.sol:Storage",
        "label": "baz",
        "offset": 0,
        "slot": "3",
        "type": "t_mapping(t_bool,t_mapping(t_address,t_struct(Struct)5_storage))"
      },
   "types": {
      "t_mapping(t_address,t_mapping(t_address,t_uint256))": {
        "encoding": "mapping",
        "key": "t_address",
        "label": "mapping(address => mapping(address => uint256))",
        "numberOfBytes": "32",
        "value": "t_mapping(t_address,t_uint256)"
      },
      "t_mapping(t_address,t_struct(Struct)5_storage)": {
        "encoding": "mapping",
        "key": "t_address",
        "label": "mapping(address => struct Storage.Struct)",
        "numberOfBytes": "32",
        "value": "t_struct(Struct)5_storage"
      },

for the yaml, I trimmed out "contract" and "astId".

InstructiveStorage:
  storage:
    - label: "flag"
      offset: 0
      slot: "0"
      type: "t_bool"
    - label: "eight"
      offset: 1
      slot: "0"
      type: "t_uint8"
    - label: "bar"
      offset: 0
      slot: "2"
      type: "t_mapping(t_address,t_mapping(t_address,t_uint256))"
    - label: "baz"
      offset: 0
      slot: "3"
      type: "t_mapping(t_bool,t_mapping(t_address,t_struct(Struct)5_storage))"
    - label: "_name"
      offset: 0
      slot: "4"
      type: "t_string_storage"
  types:
    t_address:
      encoding: "inplace"
      label: "address"
      numberOfBytes: "20"
    t_mapping(t_address,t_mapping(t_address,t_uint256)):
      encoding: "mapping"
      key: "t_address"
      label: "mapping(address => mapping(address => uint256))"
      numberOfBytes: "32"
      value: "t_mapping(t_address,t_uint256)"
    t_mapping(t_bool,t_mapping(t_address,t_struct(Struct)5_storage)):
      encoding: "mapping"
      key: "t_bool"
      label: "mapping(bool => mapping(address => struct Storage.Struct))"
      numberOfBytes: "32"
      value: "t_mapping(t_address,t_struct(Struct)5_storage)"
    t_struct(Struct)5_storage:
      encoding: "inplace"
      label: "struct Storage.Struct"
      numberOfBytes: "32"
      members:
        - label: "a"
          offset: 0
          slot: "0"
          type: "t_bool"
        - label: "b"
          offset: 1
          slot: "0"
          type: "t_address"
    # snip

overall this tiny exercise has shown me that defining a custom layout would feel like pure pain. I can definitely see myself wanting to use halmos to verify contracts in different languages. Would like your perspectives on what fields (if any) can be trimmed from this, and what ones would need to be added.

ie: more information on "encoding" is necessary. "mapping" is a solidity definition of keccak(a . b)

@daejunpark
Copy link
Collaborator

Perhaps I used confusing terminology. I'm not referring to the Solidity storage layout format itself, but to something that can be used to decode hash values for storage access.

There are two different types of information needed for different purposes:

  1. How to encode the slot for compound type storage entries, such as mappings, arrays, and their nested forms. This is necessary for symbolic execution.

  2. How to associate the raw slots with their source variables. This is more about usability.

The Solidity storage layout is more about (2), not (1). Also, (1) is specific to languages, not programs.

But again, we can get more data for a discussion on the specific format for either (1) or (2), once we implement preset versions. Will keep you updated!

@daejunpark
Copy link
Collaborator

BTW, if you have some representative example Huff programs for commonly used storage layouts, that would be really helpful.

@plotchy
Copy link
Contributor

plotchy commented Aug 17, 2023

Very nice explainer! Appreciate the breakdown

Huffmate is the go-to for huff examples. Mostly huff is used as a learning tool or for CTFs. Rare to see it used outside of those contexts.

Huffmate ERC20:
https://github.com/huff-language/huffmate/blob/3211a84d9d98bc13e2f0d912f35ce1b46f6c869c/src/tokens/ERC20.huff#L33-L41
Huffmate lib for mapping logic (imported for most of huffmate):
https://github.com/huff-language/huffmate/blob/main/src/data-structures/Hashmap.huff

@Philogy 's METH implementation:
https://github.com/Philogy/meth-weth/blob/abb2ca7187942d2e5bf1921d7162e1c2ee33d047/src/METH_WETH.huff#L346-L362

Mostly these huff contracts still use layouts that follow solidity convention. FWIW, I only know of Solady as far as popular contracts that deviate from standards.

@Philogy
Copy link

Philogy commented Aug 18, 2023

@Philogy 's METH implementation:
https://github.com/Philogy/meth-weth/blob/abb2ca7187942d2e5bf1921d7162e1c2ee33d047/src/METH_WETH.huff#L346-L362

To clarify METH does not follow solidity's conventions, you can look at the readme for the deets

@daejunpark
Copy link
Collaborator

daejunpark commented Sep 15, 2023

@plotchy @Philogy This issue should be fixed by #192, included in the v0.1.5 release. Please add --custom-storage-layout option to enable general storage layouts. (No need to provide the definition of a custom layout.) Please let us know if you encounter any further issues!

@daejunpark
Copy link
Collaborator

FYI, the option name is changed to: --storage-layout=generic, by #194 of the v0.1.6 release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants