Align audio array shape #45

Open
wants to merge 11 commits into
base: dev

Conversation

emanuele-moscato
Collaborator

@emanuele-moscato emanuele-moscato commented Jul 11, 2024

Various fixes:

  • Align the audio array reshaping with what librosa needs for resampling (when working with mono audio, librosa requires the audio array to have shape (n_samples,), not (n_samples, 1) as we have).
  • Fix a bug in how input features are passed to speech models (the feature extractor sometimes returns the input tensors under the input_values key and sometimes under input_features). Note: should this be done for all the model helpers?
  • In utils_removal.py, fix the case in which subtracting the offset from the word start timestamp gave a negative start, which caused the audio to be read from the end.
  • In utils_removal.py, fix the paths for the (local) mp3 files with white and pink noise.
  • Add tests for text and speech.
  • Update the README and create Testing.md.
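
For reviewers, here is a minimal sketch of the first three fixes. It is not the code in this PR: the helper names (to_mono_1d, get_model_inputs, clamped_start) are hypothetical, and the clamping logic assumes the problem in utils_removal.py came from a negative start value being interpreted as counting from the end of the audio.

```python
import numpy as np


def to_mono_1d(audio: np.ndarray) -> np.ndarray:
    """Collapse a mono array of shape (n_samples, 1) to (n_samples,).

    Arrays that are already 1-D (or genuinely multi-channel) are returned as-is.
    """
    if audio.ndim == 2 and audio.shape[1] == 1:
        return audio[:, 0]
    return audio


def get_model_inputs(features: dict):
    """Return the input tensor under whichever key the feature extractor used."""
    for key in ("input_values", "input_features"):
        if key in features:
            return features[key]
    raise KeyError("expected 'input_values' or 'input_features' in features")


def clamped_start(word_start: float, offset: float) -> float:
    """Subtract the offset from the word start without going below zero."""
    return max(0.0, word_start - offset)
```

The output of to_mono_1d has the (n_samples,) shape described above for mono resampling with librosa. If the same key lookup is adopted for all the model helpers, a single shared function like get_model_inputs would keep the behaviour consistent.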

To do after merging the testing_environment branch:

  • Make sure that tests still work.
  • Add pytest among the requirements (pyproject.toml).
  • Try creating a new environment from scratch, install all the requirements (watch out for the manual ones!) and make sure tests still run.

@emanuele-moscato emanuele-moscato requested a review from g8a9 July 11, 2024 12:59
@emanuele-moscato emanuele-moscato self-assigned this Jul 11, 2024
@emanuele-moscato emanuele-moscato changed the base branch from main to dev July 11, 2024 12:59
emanuele-moscato and others added 6 commits July 11, 2024 17:12
…so added an audio file that I use to test the benchmark_speech functions

bug: fix gradient_speech_explainer.py by commenting out the transcription part, which no longer works now that FerretAudio and its functions have changed

fix: ctranslate2 has been updated, so I added code that takes that update into account
2 participants