NameError: name 'data' is not defined for Windows aggmap #20

Open
shenwanxiang opened this issue Sep 26, 2023 · 1 comment

shenwanxiang commented Sep 26, 2023

RemoteTraceback Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
  File "C:\Users\AdminCOOP\anaconda3\envs\aggmap\lib\multiprocessing\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\AdminCOOP\anaconda3\envs\aggmap\lib\site-packages\aggmap\utils\calculator.py", line 41, in _fuc
    return _calculate(i1, i2)
  File "C:\Users\AdminCOOP\anaconda3\envs\aggmap\lib\site-packages\aggmap\utils\calculator.py", line 23, in _calculate
    x1 = data[:, i1]
NameError: name 'data' is not defined
"""

The above exception was the direct cause of the following exception:

NameError Traceback (most recent call last)
Cell In[1], line 11
8 dfy = pd.get_dummies(pd.Series(data.target))
10 # AggMap object definition, fitting, and saving
---> 11 mp = AggMap(dfx, metric = 'correlation')
12 mp.fit(cluster_channels=5, emb_method = 'umap', verbose=0)
13 mp.save('agg.mp')

File ~\anaconda3\envs\aggmap\lib\site-packages\aggmap\map.py:176, in AggMap.__init__(self, dfx, metric, by_scipy, n_cpus, info_distance)
174 self.info_distance = D.clip(0, np.inf)
175 else:
--> 176 D = calculator.pairwise_distance(dfx.values, n_cpus=n_cpus, method=metric)
177 D = np.nan_to_num(D,copy=False)
178 D_ = squareform(D)

File ~\anaconda3\envs\aggmap\lib\site-packages\aggmap\utils\calculator.py:67, in pairwise_distance(npydata, n_cpus, method)
65 N = data.shape[1]
66 lst = list(_yield_combinations(N))
---> 67 res = MultiProcessUnorderedBarRun(_fuc, lst, n_cpus=n_cpus)
68 dist_matrix = np.zeros(shape = (N,N))
69 for x,y,v in tqdm(res,ascii=True):

File ~\anaconda3\envs\aggmap\lib\site-packages\aggmap\utils\multiproc.py:111, in MultiProcessUnorderedBarRun(func, deal_list, n_cpus)
109 res_list = []
110 with pbar(total = len(deal_list), ascii=True) as pb:
--> 111 for res in p.imap_unordered(func, deal_list):
112 pb.update(1)
113 res_list.append(res)

File ~\anaconda3\envs\aggmap\lib\multiprocessing\pool.py:868, in IMapIterator.__next__(self, timeout)
866 if success:
867 return value
--> 868 raise value

NameError: name 'data' is not defined

shenwanxiang (Owner, Author) commented

You shouldn't expect the values of global variables that you set in the parent process to be automatically propagated to the child processes.

Your code happens to work on Linux and other Unix-like platforms where multiprocessing uses fork() by default. There, every child process gets a copy of the parent process's address space, including all global variables.

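This isn't the case on Windows, where multiprocessing starts each child as a fresh interpreter that re-imports the module. A global assigned at runtime in the parent, such as the `data` that `_calculate` reads in the traceback above, does not exist in the child. Every variable from the parent process that needs to be accessed by the child has to be explicitly passed down or placed in shared memory.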

Once you do this, your code will work on both Unix and Windows.

Ref: https://stackoverflow.com/questions/6596617/python-multiprocess-diff-between-windows-and-linux

@shenwanxiang shenwanxiang self-assigned this Sep 26, 2023
@shenwanxiang shenwanxiang added the enhancement New feature or request label Sep 26, 2023