Using hypothesis to generate custom, complex dataframes can be frustrating: the documentation is sparse and short on examples. Much of what I've done has come from trial and error and working things out by hand.
Here, I will try to maintain some examples of simple and complex dataframe generation which can be used as a reference in the future.
I have not figured out how to generate unique strings directly. As a workaround, I generate UUIDs and cast them to strings. Using pandas, this might look like:
import hypothesis.strategies as st
import hypothesis.extra.pandas as hpd
from hypothesis import given
import pandas as pd
from pandas.api.types import is_string_dtype


@given(
    hpd.dataframes(
        columns=[
            # st.uuids() is documented to return distinct values within a test case.
            hpd.column("unique_strs", elements=st.uuids()),
        ]
    )
)
def test_unique_cols(df: pd.DataFrame) -> None:
    # Cast the UUID objects to plain strings.
    df["unique_strs"] = df["unique_strs"].astype(str)
    assert is_string_dtype(df["unique_strs"])
    # Every row should hold a different string.
    assert len(df) == df["unique_strs"].nunique()