Training csv files and extracted features don't match? #21
Hi, when you extract the log-mel feature files, please make sure the csv file includes all of the files you want to use for training and testing. https://github.com/MihawkHu/DCASE2020_task1/blob/master/task1a/3class/extr_feat_2020_nodelta_scaled.py#L12 Besides, the data set might have changed, so some old audio files might have been removed. #16 (comment)
I just checked the file you mentioned, logmel128_scaled/airport-paris-206-6247-b.logmel, which is the first audio of the evaluation set. So you may try changing the csv file name in https://github.com/MihawkHu/DCASE2020_task1/blob/master/task1a/3class/extr_feat_2020_nodelta_scaled.py#L12 so that it points to https://github.com/MihawkHu/DCASE2020_task1/blob/master/task1a/3class/evaluation_setup/fold1_evaluate_3class.csv, and then run the script extr_feat_2020_nodelta_scaled.py again. That way the acoustic features for the evaluation set will be extracted.
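For reference, the suggested change is only to the 'csv_file' assignment near the top of extr_feat_2020_nodelta_scaled.py. A minimal sketch of what that part could look like (the surrounding lines in your copy of the script may differ slightly):

```python
# Top of extr_feat_2020_nodelta_scaled.py (around line 12).
# file_path points to the local copy of the development set,
# csv_file to the file list whose features you want to extract.
file_path = 'nouman/datasets/d20/TAU-urban-acoustic-scene-2020-3class-development/'

# Extract features for the evaluation split:
csv_file = 'evaluation_setup/fold1_evaluate_3class.csv'
# Then change it to the training split and run the script again:
# csv_file = 'evaluation_setup/fold1_train_3class.csv'
```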
I am following the task from the start. I opened https://github.com/MihawkHu/DCASE2020_task1/tree/master/task1a/3class and extracted features using the script extr_feat_2020_nodelta_scaled.py, giving it the path to the DCASE 2020 3-class dataset: file_path='nouman/datasets/d20/TAU-urban-acoustic-scene-2020-3class-development/'
The most confusing thing is that the TAU-urban-acoustic-scene-2020-3class-development dataset only contains audio from device A, but the csv files in https://github.com/MihawkHu/DCASE2020_task1/tree/master/task1a/3class/evaluation_setup contain data from other devices as well.
For this task I first have to download the 'TAU-urban-acoustic-scene-2020-3class-development' dataset and the 'TAU-urban-acoustic-scene-2020-10class-development' dataset and unzip both into one folder, 'data/dcase_audio/'.
I don't know how to create fold1_all.csv. Should it be created manually?
I guess now I understand the problem. There are two sub-tasks of 2020 task 1. Task 1a refers to the 10-class data set, whereas task 1b refers to the 3-class data set. You can go to the official DCASE website for more details, https://dcase.community/challenge2020/index; task 1a focuses on device robustness, so its dataset includes audio from different devices, while task 1b focuses on model complexity. The folder you mentioned here, https://github.com/MihawkHu/DCASE2020_task1/tree/master/task1a/, contains our system for task 1a. The 3-class and 10-class parts here form our proposed two-stage system; you can refer to our ICASSP paper for more details, https://arxiv.org/abs/2011.01447. It does not mean the final target is 3-class classification, that is task 1b. So the csv file you used here is from the task 1b data set, which is why there are some differences. Please use the 10-class data set, so that the csv files and everything else match.
Task 1a also has two sub-systems, 3-class and 10-class, and then a fusion of those two. So please have a look at the csv files: how should I arrange the paths of the csv files in my experiments? In the 3-class data there is only one device, device A, but its training csv contains files from other devices.
Hi, for task 1a, the 3-class and 10-class share the same data; the only difference is the labels. I checked the 3-class csv files. Both the training set, https://github.com/MihawkHu/DCASE2020_task1/blob/master/task1a/3class/evaluation_setup/fold1_train_3class.csv, and the testing set, https://github.com/MihawkHu/DCASE2020_task1/blob/master/task1a/3class/evaluation_setup/fold1_evaluate_3class.csv, include all the devices. Please point it out if I misunderstood anything.
Thanks a lot for the reply.
After concatenating https://github.com/MihawkHu/DCASE2020_task1/blob/master/task1a/3class/evaluation_setup/fold1_train_3class.csv and https://github.com/MihawkHu/DCASE2020_task1/blob/master/task1a/3class/evaluation_setup/fold1_evaluate_3class.csv to get fold1_all.csv, I am getting another error, a file name error. During handling of the above exception, another exception occurred: Traceback (most recent call last):
Hi, you may need to remove the header of the second csv.
By concatenating I got this fold1_all.csv.
If you did it that way, you may need to remove line 16937: "filename scene_label". You can find more information about NumPy and pandas at https://numpy.org/doc/ and https://pandas.pydata.org/docs/index.html.
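If you prefer to fix the already-concatenated file rather than redoing the concatenation, a small script like the sketch below would drop any stray header rows. It assumes the header row literally starts with "filename", as in the csv files in evaluation_setup; the script itself is not part of this repo:

```python
# Drop any repeated "filename  scene_label" header rows that ended up in the
# middle of a concatenated csv (e.g. line 16937 of fold1_all.csv).
path = 'evaluation_setup/fold1_all.csv'

with open(path) as f:
    lines = f.readlines()

header, body = lines[0], lines[1:]
body = [ln for ln in body if not ln.startswith('filename')]  # remove stray headers

with open(path, 'w') as f:
    f.writelines([header] + body)
```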
Dear sir, I have read your technical report and tried every way to reproduce the results and run this code, but unfortunately I did not succeed. Can you please give me a short overview, from the start, of how to get this code running? Please point out the csv files, and if they are not in the directory, upload them here.
Hi, I will definitely help you with your problems and questions. Please let me know whether the previous solution solves your problem.
I am trying now to mix both the 3-class and 10-class development datasets, but which csv file should I use for the feature extraction, because fold1_all.csv is giving the filename error?
Did you try this? I'm not at my PC right now, so I cannot help you generate it. If you find it difficult to handle csv files or pandas, please try setting 'csv_file' at https://github.com/MihawkHu/DCASE2020_task1/blob/master/task1a/3class/extr_feat_2020_nodelta_scaled.py#L12 to 'evaluation_setup/fold1_evaluate_3class.csv' and 'evaluation_setup/fold1_train_3class.csv', then run the script for each of them. The script, https://github.com/MihawkHu/DCASE2020_task1/blob/master/task1a/3class/extr_feat_2020_nodelta_scaled.py, extracts and saves acoustic features, so point it at whichever file list (csv file) you want to use and you will get the extracted features. In our paper we described using different setups (with/without different data augmentation strategies), so in the code we did not fix a specific csv file, to avoid confusion. You can use any csv file that you want for training and testing in your experiments. For your case, I would suggest skipping the data augmentation part at first, for simplicity, so you can just extract features for the training and testing sets as suggested by the original data set. Please try the solutions I mentioned here. Besides, please note that the 10-class and 3-class classifiers use the same audio data; the only difference is the labels. So you don't need to extract features twice for them.
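To make the procedure concrete, here is a rough sketch of what the extraction loop amounts to when run over both csv files. It assumes the DCASE csv layout (tab-separated, columns filename and scene_label) and plain 128-bin log-mel features; the actual repo script additionally scales the features and may store a different pickle structure, so treat this only as an illustration:

```python
import os
import pickle

import librosa
import pandas as pd

file_path = 'nouman/datasets/d20/TAU-urban-acoustic-scene-2020-3class-development/'
output_path = 'features/logmel128_scaled/'
os.makedirs(output_path, exist_ok=True)

for csv_file in ['evaluation_setup/fold1_train_3class.csv',
                 'evaluation_setup/fold1_evaluate_3class.csv']:
    file_list = pd.read_csv(csv_file, sep='\t')
    for wav_name in file_list['filename']:
        # Load audio and compute a 128-bin log-mel spectrogram.
        y, sr = librosa.load(os.path.join(file_path, wav_name), sr=None)
        mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
        logmel = librosa.power_to_db(mel)

        # Save one .logmel file per audio clip, e.g.
        # features/logmel128_scaled/airport-paris-206-6247-b.logmel
        out_name = os.path.splitext(os.path.basename(wav_name))[0] + '.logmel'
        with open(os.path.join(output_path, out_name), 'wb') as f:
            pickle.dump(logmel, f)
```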
You mean one 10-class dataset is enough to extract features from?
Yes, for task 1a, the 3-class and 10-class share the same data, where the only difference is the labels. |
I think I made a mistake: I extracted features from the original 3-class dataset based on its original meta.csv file, which involves only device A data.
The 3-class csv contains the labels indoor, outdoor, and transportation, so how can I extract features from the 10-class audio files based on the 3-class csv labels?
Yes, for task 1a, the 3-class and 10-class share the same data; the only difference is the labels. You don't need to extract the 10-class acoustic features again because they are the same. You can refer to our paper for more details about our proposed procedure: https://arxiv.org/abs/2011.01447
But whenever we extract the features, the audio file names and their labels in the csv should match so the features can be extracted accordingly. I mean, we can't extract the features through the csv files included in the evaluation setup for 3-class... am I right?
The 3-class data set you mentioned before is for task 1b of DCASE 2020; it has only device A data and its focus is model complexity. For task 1a, the goal is device-robust 10-class classification. You can find more information about the two tasks here: https://dcase.community/challenge2020/task-acoustic-scene-classification#subtask-a. We proposed a two-stage system for task 1a, as described in our paper, https://arxiv.org/abs/2011.01447. You can certainly use our task 1a code to build a system for the task 1b dataset (the 3-class data set on the official website, device A only); just change the corresponding csv file and paths.
First I want to implement task 1a. Please clarify this for me and upload the csv files used in subtask 1a.
Did you try this? |
The main thing is that the feature extraction csv, "fold1_all.csv", is not available anywhere in this GitHub repository. When you get it, please upload it.
Hi, did you read my sentences here? "In our paper we described using different setups (with/without different data augmentation strategies), so in the code we did not fix a specific csv file, to avoid confusion. You can use any csv file that you want for training and testing in your experiments. For your case, I would suggest skipping the data augmentation part at first, for simplicity, so you can just extract features for the training and testing sets as suggested by the original data set. Please try the solutions I mentioned here." You may generate your fold1_all.csv by simply concatenating https://github.com/MihawkHu/DCASE2020_task1/blob/master/task1a/3class/evaluation_setup/fold1_train_3class.csv and https://github.com/MihawkHu/DCASE2020_task1/blob/master/task1a/3class/evaluation_setup/fold1_evaluate_3class.csv. I think you did this before, but you need to remove the header of the second file (its first line), otherwise the pandas csv reader will not work (see my answers above). Alternatively, you can run the script for training and testing separately.
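If you would rather build fold1_all.csv with pandas instead of editing the text files by hand, something along these lines should work (again assuming tab-separated files with a filename and scene_label header; this snippet is just a sketch, not part of the repo):

```python
import pandas as pd

train = pd.read_csv('evaluation_setup/fold1_train_3class.csv', sep='\t')
evaluate = pd.read_csv('evaluation_setup/fold1_evaluate_3class.csv', sep='\t')

# pd.concat keeps a single header row, so the duplicated-header problem
# from a plain file concatenation does not come up here.
pd.concat([train, evaluate], ignore_index=True).to_csv(
    'evaluation_setup/fold1_all.csv', sep='\t', index=False)
```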
Thank you so much. It is very late here in my area; I am trying it and will let you know if a problem occurs again. I am really grateful to you.
Hi, I am trying to run your code. I have successfully extracted the features from the 3-class DCASE 2020 dataset through the script "extr_feat_2020_nodelta_scaled.py". After I run the command "python train_fcnn.py", the error below occurs. Please help me resolve it.
2022-09-13 18:33:39.709410: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55f9c6bffc90 executing computations on platform CUDA. Devices:
2022-09-13 18:33:39.709454: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): GeForce GTX 1080 Ti, Compute Capability 6.1
Traceback (most recent call last):
File "/data/nouman/Task1_lcasc/DCASE2020_task1-master/DCASE2020_task1-master/task1a/3class/fcnn/train_fcnn.py", line 61, in
data_val, y_val = load_data_2020(feat_path, val_csv, num_freq_bin, 'logmel')
File "/data/nouman/Task1_lcasc/DCASE2020_task1-master/DCASE2020_task1-master/task1a/3class/fcnn/utils.py", line 27, in load_data_2020
with open(filepath,'rb') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/data/nouman/Task1_lcasc/DCASE2020_task1-master/DCASE2020_task1-master/task1a/3class/features/logmel128_scaled/airport-paris-206-6247-b.logmel'
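The error means train_fcnn.py is looking for feature files for the evaluation split (val_csv) that were never extracted. A quick diagnostic sketch like the one below can list which entries of the validation csv have no .logmel file yet, assuming the same csv layout and feature naming as above (the names feat_path and val_csv mirror the ones in the traceback, but the script itself is hypothetical):

```python
import os

import pandas as pd

feat_path = 'features/logmel128_scaled/'
val_csv = 'evaluation_setup/fold1_evaluate_3class.csv'

file_list = pd.read_csv(val_csv, sep='\t')
missing = [
    wav for wav in file_list['filename']
    if not os.path.exists(os.path.join(
        feat_path, os.path.splitext(os.path.basename(wav))[0] + '.logmel'))
]
print(f'{len(missing)} of {len(file_list)} feature files are missing')
for wav in missing[:10]:
    print(' ', wav)
```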