Replies: 1 comment
-
This basically happens when too many of the input audio files are rejected for being invalid. The dataset sampler is a bit janky: when the current item is invalid it simply calls itself recursively to fetch a new one, which blows past the recursion limit if it hits dozens of bad files in a row.
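If that is the cause, a quick pre-scan of the training clips will show how many files the sampler is going to reject. A minimal sketch, assuming the clips sit under a single directory and can be opened with torchaudio (both assumptions, not something stated in this thread):

```python
# Hypothetical pre-scan: count audio files that cannot be decoded, since a
# large number of them is what triggers the recursive resampling.
from pathlib import Path
import torchaudio

dataset_dir = Path("training/clips")  # placeholder path; point at your own clips
bad_files = []
for path in dataset_dir.rglob("*"):
    if path.suffix.lower() not in {".wav", ".mp3", ".flac"}:
        continue
    try:
        waveform, sample_rate = torchaudio.load(str(path))
        if waveform.numel() == 0:  # decodes but contains no samples
            bad_files.append(path)
    except Exception:
        bad_files.append(path)  # unreadable or unsupported file

print(f"{len(bad_files)} unreadable or empty files found")
for path in bad_files[:20]:
    print(path)
```

If the count is large, cleaning or removing those files should make the error go away regardless of how the sampler is patched.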
-
I got a RecursionError from running the fine-tuning.
The GPT-4 response is:
The error you're encountering is a RecursionError: maximum recursion depth exceeded. It seems to be occurring in the `__getitem__` method of your paired_voice_audio_dataset.py file. This error usually occurs when a function calls itself too many times, leading to a stack overflow.
Here's what's happening in your code:
When the `__getitem__` method encounters an exception (in this case, the RecursionError from the tokenizer), it attempts to recover by calling itself with a new index (return self[rv]).
This recovery process is repeated recursively until the maximum recursion depth is reached.
To fix this issue, you should refactor the `__getitem__` method to avoid recursion. One possible solution is to use a loop that retries a fixed number of times before raising an error. Here's an example of how you could modify the `__getitem__` method:
Then it suggests this fix:
```python
def __getitem__(self, index):
    retry_count = 0
    max_retries = 10  # You can set a different value depending on your needs.
```
I don't really trust GPT to generate a fix for this. Has anyone encountered a similar error?
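For what it's worth, a complete version of the loop-based retry that GPT-4 is gesturing at could look like the sketch below. The class name and the `load_item` helper are placeholders standing in for the existing loading logic in paired_voice_audio_dataset.py, not the repository's actual code:

```python
import random
from torch.utils.data import Dataset

class PairedVoiceAudioDataset(Dataset):
    # __init__, __len__ and the real loading code are omitted; load_item is a
    # hypothetical helper wrapping whatever __getitem__ currently does before
    # the recursive retry kicks in.

    def __getitem__(self, index):
        max_retries = 10  # bounded retry budget instead of unbounded recursion
        for _ in range(max_retries):
            try:
                return self.load_item(index)
            except Exception:
                # pick a different random index rather than recursing into self[rv]
                index = random.randint(0, len(self) - 1)
        raise RuntimeError(
            f"Failed to load a valid item after {max_retries} attempts; "
            "the dataset probably contains many corrupt or unsupported audio files."
        )
```

The key difference from the current code is that a bad file costs one loop iteration instead of one stack frame, so a long run of invalid files ends in a clear RuntimeError rather than hitting the recursion limit.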