Hi, the MEGAnno paper mentions "Active suggestions" and "coverage" suggestions (Project.suggest_coverage).
I have looked for these functionalities both in the code and in the documentation, but I could not locate them.
Any pointers would be most helpful. I'm using v1.5.4 of the code. Thanks!
Thanks for your interest.
MEGAnno doesn't directly support active learning. Instead, it lets you select the data to annotate in an active manner, for example by pairing it with an external active-learning library.
Below is an old script that applies active learning with a random forest classifier to the Tweet dataset.
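```python
# Load the initial training data and the held-out test data
import numpy as np
from modAL.models import ActiveLearner
from modAL.uncertainty import uncertainty_sampling
from sklearn.ensemble import RandomForestClassifier
from meganno_client import subset

# `demo` is an already-connected MEGAnno service object for the Tweet project;
# how it is created is not shown in this script.

npzfile = np.load('tweet_init_test.npz')
X_init, y_init = npzfile['X_init'], npzfile['y_init']
X_test, y_test = npzfile['X_test'], npzfile['y_test']

# Initialize the learner: random forest + uncertainty sampling
acc_list = []
learner = ActiveLearner(
    estimator=RandomForestClassifier(),
    query_strategy=uncertainty_sampling,
    X_training=X_init, y_training=y_init,
)

# Build the unlabeled pool from the stored BERT embeddings
s_pool = demo.search(keyword='', meta_names=['bert-embedding'], limit=100, start=0)
uuid_pool, X_pool = [], []
for item in s_pool.value():
    uuid_pool.append(item['uuid'])
    X_pool.append(item['metadata'][0]['value'])
# Arrays (not lists) so that fancy indexing and np.delete work below
X_pool, uuid_pool = np.array(X_pool), np.array(uuid_pool)

# Active selection: let the model pick the next batch
# (repeat this step through the accuracy update for each round)
query_idx, query_inst = learner.query(X_pool, n_instances=3)
batch_uuids = uuid_pool[query_idx].tolist()
next_batch = subset.Subset(service=demo, data_uuids=batch_uuids)
next_batch.show()  # annotate in the widget, then run the cells below

# Collect the new 'sentiment' labels keyed by uuid, so y_new follows
# the same order as query_inst regardless of the order value() returns
label_by_uuid = {}
for item in next_batch.value():
    labels = item['annotation_list'][0]['labels_record']
    for l in labels:
        if l['label_name'] == 'sentiment':
            label_by_uuid[item['uuid']] = l['label_value'][0]
y_new = np.array([label_by_uuid[u] for u in batch_uuids])

# Remove the queried items from the pool
X_pool = np.delete(X_pool, query_idx, axis=0)
uuid_pool = np.delete(uuid_pool, query_idx, axis=0)

# Update the learner and record test accuracy
learner.teach(query_inst, y_new)
acc_list.append(learner.score(X_test, y_test))
```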
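For context on the selection step: modAL's `uncertainty_sampling` scores each pool item by least-confidence uncertainty, i.e. 1 minus the largest predicted class probability, and returns the `n_instances` highest-scoring indices. The numpy-only sketch below reproduces that scoring on a hand-made probability matrix. `uncertainty_query` is a hypothetical helper for illustration, not a modAL function, and modAL's own implementation may order results or break ties differently.

```python
import numpy as np

def uncertainty_query(proba, n_instances=3):
    """Hypothetical helper (not modAL API): least-confidence selection.

    proba is an (n_samples, n_classes) array of predicted class probabilities,
    e.g. the output of a classifier's predict_proba.
    """
    uncertainty = 1.0 - proba.max(axis=1)
    # Indices of the n_instances most uncertain rows, most uncertain first
    return np.argsort(-uncertainty)[:n_instances]

proba = np.array([
    [0.90, 0.10],
    [0.55, 0.45],
    [0.60, 0.40],
    [0.99, 0.01],
    [0.50, 0.50],
])
print(uncertainty_query(proba, n_instances=2))  # [4 1]
```

Rows 4 and 1 are closest to a 50/50 split, so they are the ones a model would ask a human to label next.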
As for coverage suggestions, they could be implemented in various ways. Our previous implementation clustered all labeled data points and then sampled unlabeled data points with a large average distance from the cluster centroids. It was excluded from the release due to efficiency issues.
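Since that implementation is not in the release, here is a minimal sketch of the idea as described, assuming k-means for the clustering step and Euclidean distance on the embeddings. The names `kmeans_centroids` and `suggest_coverage` are hypothetical and not part of the MEGAnno client API; in practice you might swap in scikit-learn's `KMeans` for the hand-rolled clustering.

```python
import numpy as np

# Hypothetical helpers, not part of meganno_client.

def kmeans_centroids(X, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns a (k, d) array of centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n, k)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = X[assign == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids

def suggest_coverage(X_labeled, X_unlabeled, k=3, n_suggestions=5, seed=0):
    """Indices into X_unlabeled with the largest average distance to the
    centroids of the clustered labeled data, farthest first."""
    centroids = kmeans_centroids(X_labeled, k, seed=seed)
    dists = np.linalg.norm(X_unlabeled[:, None, :] - centroids[None, :, :], axis=2)
    avg_dist = dists.mean(axis=1)
    return np.argsort(-avg_dist)[:n_suggestions]

# Toy data: labeled points near the origin; the unlabeled pool has
# 20 similar points plus 3 outliers (indices 20-22) far away.
rng = np.random.default_rng(42)
X_labeled = rng.normal(0.0, 1.0, size=(50, 2))
X_unlabeled = np.vstack([
    rng.normal(0.0, 1.0, size=(20, 2)),
    rng.normal(8.0, 0.5, size=(3, 2)),
])
idx = suggest_coverage(X_labeled, X_unlabeled, k=3, n_suggestions=3)
print(sorted(idx.tolist()))
```

The three outlying points are the ones least covered by the existing labels, so they are suggested first. Averaging over all centroids, rather than using the distance to the nearest one, follows the description above.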
Let me know if you need further clarification or are interested in contributing. Thanks!