
Error while calculating standard errors #13

Open
achinmay17 opened this issue Dec 20, 2023 · 3 comments


@achinmay17

Hi, I am trying to run a doubly robust staggered DID (S-DID) with an unbalanced panel and a varying base period; the control group is 'not_yet_treated'.
My code is the following:

    att_gt = ATTgt(data=diddata, cohort_name="course_month_end_date", base_period="varying", freq="M")
    att_gt.fit(formula=formula, est_method="dr", control_group=control_group, progress_bar=True)

However, I am getting the following error, which I am not able to understand:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[241], line 7
      4 diddata = diddata.reset_index().set_index(keys=['id','month_end_date'])
      6 att_gt = ATTgt(data=diddata, cohort_name="course_month_end_date", base_period='varying', freq='M')
----> 7 att_gt.fit(formula = formula, est_method='dr',control_group=control_group, progress_bar = False)

File ~/anaconda3/lib/python3.11/site-packages/differences/attgt/attgt.py:718, in ATTgt.fit(self, formula, weights_name, control_group, base_delta, est_method, as_repeated_cross_section, boot_iterations, random_state, alpha, cluster_var, split_sample_by, n_jobs, backend, progress_bar)
    688     res = get_att_gt(
    689         data=(
    690             self._data_matrix
   (...)
    714         ),
    715     )
    717     # standard errors & ci/cbands
--> 718     res = get_standard_errors(
    719         ntl=res,
    720         cluster_groups=cluster_groups,
    721         alpha=alpha,
    722         boot_iterations=boot_iterations,
    723         random_state=random_state,
    724         n_jobs_boot=n_jobs,
    725         backend_boot=backend,
    726         progress_bar=progress_bar,
    727         sample_name=s if s != "full_sample" else None,
    728         release_workers=s_idx == n_sample_names,
    729     )
    731     self._result_dict[s]["ATTgt_ntl"] = res
    733 self._fit_res = output_dict_to_dataframe(
    734     extract_dict_ntl(self._result_dict),
    735     stratum=bool(self._strata),
    736     date_map=self._map_datetime,
    737 )

File ~/anaconda3/lib/python3.11/site-packages/differences/attgt/attgt_cal.py:442, in get_standard_errors(ntl, cluster_groups, alpha, boot_iterations, random_state, backend_boot, n_jobs_boot, progress_bar, sample_name, release_workers)
    436     raise ValueError(
    437         "'boot_iterations' must be >= 0. "
    438         "If boot_iterations=0, analytic standard errors are computed"
    439     )
    441 # influence funcs + idx for not nan cols
--> 442 inf_funcs, not_nan_idx = stack_influence_funcs(ntl, return_idx=True)
    444 # create an empty array to populate with the standard errors
    445 se_array = np.empty(len(ntl))

File ~/anaconda3/lib/python3.11/site-packages/differences/attgt/utility.py:382, in stack_influence_funcs(ntl, return_idx)
    380     inf_funcs = inf_funcs.toarray()  # faster mboot if dense matrix
    381 else:
--> 382     inf_funcs = np.stack(
    383         [r.influence_func for r in ntl if r.influence_func is not None], axis=1
    384     )
    386 if return_idx:
    387     # indexes for the non-missing influence_func
    388     not_nan_idx = np.array(
    389         [i for i, r in enumerate(ntl) if r.influence_func is not None]
    390     )

File <__array_function__ internals>:200, in stack(*args, **kwargs)

File ~/anaconda3/lib/python3.11/site-packages/numpy/core/shape_base.py:460, in stack(arrays, axis, out, dtype, casting)
    458 arrays = [asanyarray(arr) for arr in arrays]
    459 if not arrays:
--> 460     raise ValueError('need at least one array to stack')
    462 shapes = {arr.shape for arr in arrays}
    463 if len(shapes) != 1:

ValueError: need at least one array to stack

From what I could understand of the package, it is failing while calculating the standard errors. It would be great if you could help with debugging. Thanks.

@bernardodionisi
Owner

Are all your cohorts very small? How unbalanced is the data? Would you be able to share some data to reproduce this error? A simulated dataset that contains the same entity-time structure and cohort composition should do. Thanks!
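One way to answer both questions before refitting is to count entities per cohort and measure how complete the panel is. A minimal pandas sketch on a toy panel (the column names `id`, `month_end_date`, and `course_month_end_date` are stand-ins matching the snippet above, not part of any API):

```python
import pandas as pd

# toy unbalanced panel: 3 entities, monthly dates, two one-entity cohorts
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3],
    "month_end_date": pd.to_datetime([
        "2023-01-31", "2023-02-28", "2023-03-31",
        "2023-01-31", "2023-03-31", "2023-02-28",
    ]),
    "course_month_end_date": pd.to_datetime(
        ["2023-02-28"] * 3 + ["2023-03-31"] * 2 + [pd.NaT]
    ),
})

# entities per cohort (NaT = never treated): tiny cohorts can leave no
# valid group-time comparisons, so every influence function comes back None
cohort_sizes = (
    df.drop_duplicates("id")
      .groupby("course_month_end_date", dropna=False)["id"]
      .nunique()
)
print(cohort_sizes)

# share of entities observed in every period (1.0 = fully balanced panel)
obs_per_id = df.groupby("id")["month_end_date"].nunique()
n_periods = df["month_end_date"].nunique()
balance_share = (obs_per_id == n_periods).mean()
print(balance_share)
```

If every cohort has only a handful of entities, or the balance share is far below 1, that would be consistent with the empty-stack error above.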

@jonahnieuwenhuijzen

@bernardodionisi

> Are all your cohorts very small? How unbalanced is the data? Would you be able to share some data to reproduce this error? A simulated dataset that contains the same entity-time structure and cohort composition should do. Thanks!

I have the same problem (`ValueError: need at least one array to stack`); my data is also very unbalanced and the cohorts are small. Do you have a solution?

@bernardodionisi
Owner

Hi @jonahnieuwenhuijzen,

Have you tried different estimation methods via the est_method parameter? The default is dr-mle; you could try dr-ipt, which changes how the propensity scores are calculated. Could you also experiment with the other methods? Let me know if it helps.
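A minimal sketch of that experiment, looping over candidate methods until one fits without raising the stacking error. Here `first_working_method` and `fake_fit` are hypothetical scaffolding standing in for `att_gt.fit`, not part of the differences API:

```python
def first_working_method(fit_fn, candidates=("dr-mle", "dr-ipt")):
    """Try each est_method in turn; return (method, result) for the first
    one that fits, collecting the error message from each one that fails."""
    errors = {}
    for method in candidates:
        try:
            return method, fit_fn(est_method=method)
        except ValueError as exc:  # e.g. "need at least one array to stack"
            errors[method] = str(exc)
    raise RuntimeError(f"all estimation methods failed: {errors}")

# stand-in for att_gt.fit that only succeeds with 'dr-ipt'
def fake_fit(est_method):
    if est_method != "dr-ipt":
        raise ValueError("need at least one array to stack")
    return {"est_method": est_method}

method, result = first_working_method(fake_fit)
print(method)  # 'dr-ipt'
```

In practice `fit_fn` would wrap the real `att_gt.fit(...)` call with the remaining arguments fixed, so only `est_method` varies across attempts.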
