Podcastfy offers a range of customization options to tailor your AI-generated podcasts. This document outlines how you can adjust parameters such as conversation style, word count, and dialogue structure to suit your specific needs.
Podcastfy uses the default conversation configuration stored in podcastfy/conversation_config.yaml.
Parameter | Default Value | Type | Description |
---|---|---|---|
conversation_style | ["engaging", "fast-paced", "enthusiastic"] | list[str] | Styles to apply to the conversation |
roles_person1 | "main summarizer" | str | Role of the first speaker |
roles_person2 | "questioner/clarifier" | str | Role of the second speaker |
dialogue_structure | ["Introduction", "Main Content Summary", "Conclusion"] | list[str] | Structure of the dialogue |
podcast_name | "PODCASTIFY" | str | Name of the podcast |
podcast_tagline | "Your Personal Generative AI Podcast" | str | Tagline for the podcast |
output_language | "English" | str | Language of the output |
engagement_techniques | ["rhetorical questions", "anecdotes", "analogies", "humor"] | list[str] | Techniques to engage the audience |
creativity | 1 | float | Level of creativity/temperature (0-1) |
user_instructions | "" | str | Custom instructions to guide the conversation focus and topics |
max_num_chunks | 7 | int | Maximum number of discussion rounds in longform |
min_chunk_size | 600 | int | Minimum number of characters needed to generate a discussion round in longform |
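As a rough illustration of how overrides relate to these defaults, the sketch below copies the default values from the table into a plain dictionary and layers user overrides on top. The `DEFAULT_CONVERSATION_CONFIG` name and the `merge_conversation_config` helper are hypothetical and written only for this example; this is not a description of how Podcastfy loads its configuration internally.

```python
from typing import Any

# Defaults copied from the table above (illustrative only; the YAML file
# shipped with Podcastfy remains the source of truth).
DEFAULT_CONVERSATION_CONFIG: dict[str, Any] = {
    "conversation_style": ["engaging", "fast-paced", "enthusiastic"],
    "roles_person1": "main summarizer",
    "roles_person2": "questioner/clarifier",
    "dialogue_structure": ["Introduction", "Main Content Summary", "Conclusion"],
    "podcast_name": "PODCASTIFY",
    "podcast_tagline": "Your Personal Generative AI Podcast",
    "output_language": "English",
    "engagement_techniques": ["rhetorical questions", "anecdotes", "analogies", "humor"],
    "creativity": 1,
    "user_instructions": "",
    "max_num_chunks": 7,
    "min_chunk_size": 600,
}


def merge_conversation_config(overrides: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: return the defaults with `overrides` applied on top."""
    merged = dict(DEFAULT_CONVERSATION_CONFIG)  # shallow copy, defaults stay untouched
    merged.update(overrides)
    creativity = merged["creativity"]
    # The table documents creativity as a value in the range 0-1.
    if not 0 <= creativity <= 1:
        raise ValueError(f"creativity must be between 0 and 1, got {creativity}")
    return merged


config = merge_conversation_config({"podcast_name": "Tech Chuckles", "creativity": 0.7})
print(config["podcast_name"], config["creativity"], config["output_language"])
```

Keys that are not overridden, such as `output_language`, keep the default values listed in the table.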
Podcastfy uses the default TTS configuration stored in podcastfy/conversation_config.yaml.
- ElevenLabs:
  - `default_voices`:
    - `question`: "Chris" - Default voice for questions in the podcast.
    - `answer`: "Jessica" - Default voice for answers in the podcast.
  - `model`: "eleven_multilingual_v2" - The ElevenLabs TTS model to use.
- OpenAI:
  - `default_voices`:
    - `question`: "echo" - Default voice for questions using OpenAI TTS.
    - `answer`: "shimmer" - Default voice for answers using OpenAI TTS.
  - `model`: "tts-1-hd" - The OpenAI TTS model to use.
- Gemini Multi-Speaker:
  - `default_voices`:
    - `question`: "R" - Default voice for questions using Gemini Multi-Speaker TTS.
    - `answer`: "S" - Default voice for answers using Gemini Multi-Speaker TTS.
  - `model`: "en-US-Studio-MultiSpeaker" - Model to use for Gemini Multi-Speaker TTS.
  - `language`: "en-US" - Language of the voices.
- Gemini:
  - `default_voices`:
    - `question`: "en-US-Journey-D" - Default voice for questions using Gemini TTS.
    - `answer`: "en-US-Journey-O" - Default voice for answers using Gemini TTS.
- Edge:
  - `default_voices`:
    - `question`: "en-US-JennyNeural" - Default voice for questions using Edge TTS.
    - `answer`: "en-US-EricNeural" - Default voice for answers using Edge TTS.
- `default_tts_model`: "openai" - Default text-to-speech model to use.
- `output_directories`:
  - `transcripts`: "./data/transcripts" - Directory for storing generated transcripts.
  - `audio`: "./data/audio" - Directory for storing generated audio files.
- `audio_format`: "mp3" - Format of the generated audio files.
- `temp_audio_dir`: "data/audio/tmp/" - Temporary directory for audio processing.
- `ending_message`: "Bye Bye!" - Message to be appended at the end of the podcast.
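To make the relationship between `default_tts_model` and the per-provider `default_voices` concrete, here is a minimal sketch of a voice lookup. The values are copied from the defaults listed above; the dictionary layout and the `resolve_voices` helper are assumptions made for this example and are not part of the Podcastfy API.

```python
from typing import Optional

# Default voices per TTS model, copied from the listing above.
# The flat dictionary layout is an assumption made for this sketch.
DEFAULT_VOICES: dict[str, dict[str, str]] = {
    "elevenlabs": {"question": "Chris", "answer": "Jessica"},
    "openai": {"question": "echo", "answer": "shimmer"},
    "geminimulti": {"question": "R", "answer": "S"},
    "gemini": {"question": "en-US-Journey-D", "answer": "en-US-Journey-O"},
    "edge": {"question": "en-US-JennyNeural", "answer": "en-US-EricNeural"},
}
DEFAULT_TTS_MODEL = "openai"


def resolve_voices(
    tts_model: Optional[str] = None,
    overrides: Optional[dict[str, str]] = None,
) -> dict[str, str]:
    """Hypothetical helper: pick the question/answer voices for a TTS model."""
    model = tts_model or DEFAULT_TTS_MODEL
    if model not in DEFAULT_VOICES:
        raise KeyError(f"unknown TTS model: {model!r}")
    voices = dict(DEFAULT_VOICES[model])  # copy so the defaults are not mutated
    voices.update(overrides or {})
    return voices


default_voices = resolve_voices()
edge_voices = resolve_voices("edge", {"answer": "en-US-JennyNeural"})
print(default_voices)
print(edge_voices)
```

With no arguments the lookup falls back to the `openai` voices, mirroring the `default_tts_model` setting; passing `overrides` replaces only the voices that are named.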
These examples demonstrate how conversations can be altered to suit different purposes, from academic rigor to creative storytelling. The comments explain the rationale behind each choice, helping users understand how to tailor the configuration to their specific needs.
This configuration transforms the podcast into a formal academic debate, encouraging deep analysis and critical thinking. It's designed for educational content or in-depth discussions on complex topics.
{
    "word_count": 3000,  # Longer to allow for detailed arguments
    "conversation_style": ["formal", "analytical", "critical"],  # Appropriate for academic discourse
    "roles_person1": "thesis presenter",  # Presents the main argument
    "roles_person2": "counterargument provider",  # Challenges the thesis
    "dialogue_structure": [
        "Opening Statements",
        "Thesis Presentation",
        "Counterarguments",
        "Rebuttals",
        "Closing Remarks"
    ],  # Mimics a structured debate format
    "podcast_name": "Scholarly Showdown",
    "podcast_tagline": "Where Ideas Clash and Knowledge Emerges",
    "engagement_techniques": [
        "socratic questioning",
        "historical references",
        "thought experiments"
    ],  # Techniques to stimulate critical thinking
    "creativity": 0  # Low creativity to maintain focus on facts and logic
}
This configuration turns the podcast into an interactive storytelling experience, engaging the audience in a narrative journey. It's ideal for fiction podcasts or creative content marketing.
word_count: 1000  # Shorter to maintain pace and suspense
conversation_style:
  - narrative
  - suspenseful
  - descriptive  # Creates an immersive story experience
roles_person1: storyteller
roles_person2: audience participator  # Allows for interactive elements
dialogue_structure:
  - Scene Setting
  - Character Introduction
  - Rising Action
  - Climax
  - Resolution  # Follows classic storytelling structure
podcast_name: Tale Spinners
podcast_tagline: Where Every Episode is an Adventure
engagement_techniques:
  - cliffhangers
  - vivid imagery
  - audience prompts  # Keeps the audience engaged and coming back
creativity: 0.9  # High creativity for unique and captivating stories
When using the Podcastfy Python package, you can customize the conversation by passing a dictionary to the `conversation_config` parameter:
from podcastfy.client import generate_podcast

custom_config = {
    "word_count": 200,
    "conversation_style": ["casual", "humorous"],
    "podcast_name": "Tech Chuckles",
    "creativity": 0.7
}

generate_podcast(
    urls=["https://example.com/tech-news"],
    conversation_config=custom_config
)
When using the Podcastfy CLI, you can specify a path to a YAML file containing your custom configuration:
podcastfy --url https://example.com/tech-news --conversation-config path/to/custom_config.yaml
The `custom_config.yaml` file should contain your configuration in YAML format:
word_count: 200
conversation_style:
  - casual
  - humorous
podcast_name: Tech Chuckles
creativity: 0.7
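If you already keep your configuration as a Python dictionary, one way to produce such a file is sketched below. `to_simple_yaml` is a hypothetical helper written for this example, not part of Podcastfy. It only handles flat configurations (simple scalars and lists of scalars) and relies on JSON-style scalars, including double-quoted strings, being valid YAML. If PyYAML is installed, `yaml.safe_dump` is likely the more robust choice.

```python
import json
from typing import Any


def to_simple_yaml(config: dict[str, Any]) -> str:
    """Hypothetical helper: serialize a flat config dictionary to YAML text.

    Supports scalars (str, int, float, bool) and lists of scalars only.
    """
    lines = []
    for key, value in config.items():
        if isinstance(value, list) and value:
            # Non-empty lists become YAML block sequences.
            lines.append(f"{key}:")
            lines.extend(f"  - {json.dumps(item)}" for item in value)
        else:
            # Scalars (and empty lists, written as []) go on a single line.
            lines.append(f"{key}: {json.dumps(value)}")
    return "\n".join(lines) + "\n"


custom_config = {
    "word_count": 200,
    "conversation_style": ["casual", "humorous"],
    "podcast_name": "Tech Chuckles",
    "creativity": 0.7,
}

yaml_text = to_simple_yaml(custom_config)
print(yaml_text)
```

Saving `yaml_text` as `custom_config.yaml` gives a file that can be passed to `--conversation-config`. The strings come out double-quoted, which YAML parsers read the same way as the unquoted form shown above.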
- The `word_count` is a target, and the AI may generate more or less than the specified word count. Low word counts are more likely to generate high-level discussions, while high word counts are more likely to generate detailed discussions.
- The `output_language` defines both the language of the transcript and the language of the audio. Here's some relevant information:
  - Bottom line: non-English transcripts are good enough, but non-English audio is a work in progress.
  - Transcripts are generated using Google's Gemini 1.5 Pro by default, which supports 100+ languages. Other user-defined models may or may not support non-English languages.
  - Audio is generated using the `openai` (default), `elevenlabs`, `gemini`, `geminimulti`, or `edge` TTS models.
    - The `gemini` (Google) TTS model supports multiple languages and can be controlled by the `output_language` parameter and the respective voice choices, e.g. `output_language="Tamil"`, `question="ta-IN-Standard-A"`, `answer="ta-IN-Standard-B"`. Refer to the Google Cloud Text-to-Speech documentation for more details.
    - The `geminimulti` (Google) TTS model supports only English voices. Also, not every Google Cloud project has access to multi-speaker voices (e.g. `en-US-Studio-MultiSpeaker`). If you get the error `"Multi-speaker voices are only available to allowlisted projects."`, you can fall back to the `gemini` TTS model.
    - The `openai` TTS model supports multiple languages automatically; however, non-English voices are still of sub-par quality in my experience.
    - The `elevenlabs` TTS model has English voices by default. To use a non-English voice, you would need to download a custom voice for the target language in your `elevenlabs` account settings and then set the `text_to_speech.elevenlabs.default_voices` parameters to the voice you want to use in the config.yaml file (this config file is only available in the source code of the project, not in the pip package, so if you are using the pip package you will not be able to change the ElevenLabs voice). For more information on ElevenLabs voices, visit the ElevenLabs Voice Library.
- The