Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Probably just me having issues with peptide.py #92

Open
Zitzeronion opened this issue Sep 6, 2024 · 5 comments
Open

Probably just me having issues with peptide.py #92

Zitzeronion opened this issue Sep 6, 2024 · 5 comments
Assignees
Labels
bug Something isn't working
Milestone

Comments

@Zitzeronion
Copy link
Contributor

Zitzeronion commented Sep 6, 2024

The peptide.py script produces an error for me:

Traceback (most recent call last):
  File ".../samples/peptide.py", line 195, in <module>
    print("Net charge analysis")
                                 
  File ".../lib/analysis.py", line 102, in block_analyze
    dt, n_warnings = get_dt(full_data, time_col, verbose = verbose) # check that the data was stored with the same time interval dt
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../lib/analysis.py", line 187, in get_dt
    if time_col in data.columns.to_list():
                   ^^^^^^^^^^^^
AttributeError: 'list' object has no attribute 'columns'

As far as I understand Z_pH does not have time intervals saved. I tried to find a fix using the tests folder but wasn't yet able to do so. I am using espresso v4.2 and pyMBE v0.8.

@kosovan
Copy link
Collaborator

kosovan commented Sep 6, 2024

I suspect that the actual issue is with the format of your observable file. It seems that it has been read improperly and the error has propagated up to this function. The data variable should be a pandas data frame but in your case it is a list.

@Zitzeronion
Copy link
Contributor Author

Z_pH is a an empty list same as Z_sim to which an column of a dataframe is appended. When I do this operation for the minimal example:

import numpy as np
import pandas as pd
S = []
df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns = ['a', 'b', 'c'])
S.append(df["b"])
S

Than S is a list. I tried to extract charge_dict["mean"]into a seperate dataframe but still got issues with block_analyze and get_dt.

@pm-blanco
Copy link
Collaborator

pm-blanco commented Sep 6, 2024

You are actually both right. The error raises because block_analyze expects a PandasDataframe object as an input argument but it gets list input instead. However, this is happening because when we refactored block_analyze we did not upgrade peptide.py to comply with the new expected PandasDataframe input argument for that function instead than the deprecated list one. To solve this issue we need to:

  • upgrade peptide.py to comply with the new architechture of block_analyze, as we did in the other sample scripts for example in branched_polyampholyte.py
  • do a functional test that triggers peptide.py and checks that it works to ensure that new changes in the API do not break again the sample script without triggering our CI testing.
    Thank you @Zitzeronion for reporting this issue.

@pm-blanco pm-blanco added the bug Something isn't working label Sep 6, 2024
@pm-blanco pm-blanco added this to the pyMBE 1.0.0 milestone Sep 6, 2024
Zitzeronion added a commit to Zitzeronion/pyMBE that referenced this issue Oct 17, 2024
Trying to get the peptide script running see [issue](pyMBE-dev#92)
@Zitzeronion Zitzeronion mentioned this issue Oct 17, 2024
@Zitzeronion
Copy link
Contributor Author

Hello, I've been trying what @pm-blanco suggested and took inspiration from branched_polyampholyte.py. While it solves the dataframe issue it creates a new one.

In the analysis.py script the function get_dt() starts to divide by 0 and dt_init is set to 0. Below is the error message from running peptide.py with branched_polyampholyte.py changes:

ValueError: 
get_dt: Row 2 dt = 1.0000000000047748 = 173.000000000462 - 172.00000000045722 not equal to dt_init = 0.0

Printing the time (from time_series['time']) I saw multiple entries where no MD step was performed. In the notes of get_dt() this is mentioned, should dt_init just be hardcoded to 1?

@pm-blanco
Copy link
Collaborator

@Zitzeronion thank you for reporting this and for the typo fix in your PR.

The issue actually comes from another deprecated logic in how to run the simulation presented in peptide.py:

for step in range(Samples_per_pH+steps_eq):
        if np.random.random() > probability_reaction:
            espresso_system.integrator.run(steps=MD_steps_per_sample)        
        else:
            cpH.reaction( reaction_steps = total_ionisible_groups)

in here in one simulation step one either does MC moves or MD moves. We decided to remove this logic in our current scripts because when estimating the autocorrelation times of the observables sampled the sampling time became ambiguous. Instead, now we run both MC and MD in every simulation step:

for step in range(Samples_per_pH+steps_eq):
        espresso_system.integrator.run(steps=MD_steps_per_sample)        
        cpH.reaction( reaction_steps = total_ionisible_groups)

we want to enforce this logic when using our analysis framework, that is why get_dt raises a ValueError if it finds inconsistent time steps in the time series. We are currently working on updating peptide.py in #95, I will review that PR during this week and solve this issue as well.

pm-blanco pushed a commit that referenced this issue Oct 17, 2024
Trying to get the peptide script running see [issue](#92)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants