❄️ ArcticMeet ❄️

Powered by Snowflake

Enjoying what you find in this repository? Your star ⭐ would be greatly appreciated!


📖 Short description 📖

ArcticMeet is a Streamlit app designed for meeting analysis using the Snowflake Arctic LLM via Snowflake Cortex LLM functions.*

👉 See the video presentation here.
👉 Try the fully functioning app here. (Note: You need to set up Snowflake credentials to start using ArcticMeet. See the instructions below.)

*ArcticMeet was developed as a project for the Snowflake June 2024 hackathon.

Caution

Please be aware that if you use the publicly available version of ArcticMeet hosted on Streamlit Cloud, the transcriptions of the meetings you upload there can be viewed by anyone in the world. Do not upload any sensitive or private meetings.

I strongly recommend you use the provided sample meeting, which is a simulated, dummy meeting designed for testing purposes.

By proceeding, you acknowledge and understand the risks associated with uploading meetings to ArcticMeet. You absolve ArcticMeet and its developers of any responsibility for the consequences of such uploads.

Thank you for your attention to this matter.


🚀 Getting started 🚀

Step 1: Clone this repository

Run the following in the terminal to clone the repository:

git clone https://github.com/rokbenko/arctic-meet.git

Step 2: Change the directory

Run the following in the terminal to change the directory:

cd arctic-meet

Step 3: Install all requirements and packages

Run the following in the terminal to install all the required packages:

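pip install -r requirements.txt

The packages.txt file is the file Streamlit Cloud uses for system-level (apt) dependencies such as FFmpeg, which pip cannot install. Install those with your operating system's package manager instead; for example, on Debian or Ubuntu:

xargs -a packages.txt sudo apt-get install -y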

Step 4: Set up Snowflake credentials (mandatory) and add them to the secrets.toml file (optional but recommended)

Note

Setting up Snowflake credentials is mandatory. You need your Snowflake credentials if you want to use ArcticMeet.

But adding Snowflake credentials to the secrets.toml file is optional. You have two options for how to use your Snowflake credentials with ArcticMeet:

  • adding them to the secrets.toml file, or
  • typing them into the input fields in ArcticMeet's sidebar during Step 3 (Transcription analysis).

  1. Create a Snowflake account if you haven't already.
  2. Create the secrets.toml file inside the .streamlit folder.
  3. Add the following Snowflake secrets to the secrets.toml file:
# secrets.toml
SNOWFLAKE_ACCOUNT="xxxxxxx-xxxxxxx"
SNOWFLAKE_USER_NAME="xxxxx"
SNOWFLAKE_USER_PASSWORD="xxxxx"

Where:

  • SNOWFLAKE_ACCOUNT is the Snowflake account you want to use.

How to get Snowflake credentials

Important

Your Snowflake account URL is what you use with a Snowflake client to connect to Snowflake. Its hostname is composed of your organization name and your account name, followed by a common domain name:

<organization_name>-<account_name>.snowflakecomputing.com

Let's say your URL is the following: https://abcdefg-hackathon.snowflakecomputing.com

For the SNOWFLAKE_ACCOUNT secret, set only the <organization_name>-<account_name> part.

  • Wrong: SNOWFLAKE_ACCOUNT="https://abcdefg-hackathon.snowflakecomputing.com"
  • Wrong: SNOWFLAKE_ACCOUNT="abcdefg-hackathon.snowflakecomputing.com"
  • Correct: SNOWFLAKE_ACCOUNT="abcdefg-hackathon"

Important

ArcticMeet's generative AI features use the Snowflake Arctic LLM via Snowflake Cortex LLM functions. As of April 2024, the Complete() Snowflake Cortex LLM function with the snowflake-arctic LLM is only supported in the AWS US West 2 (Oregon) region. If your account is in any other region (e.g., Azure West Europe (Netherlands)), you'll get the 400 unknown model "snowflake-arctic" error. This error is misleading: the snowflake-arctic LLM exists, but your account needs to be in the AWS US West 2 (Oregon) region, which you choose when you create the account.

If your existing account is in a different region, simply create a new account in the AWS US West 2 (Oregon) region.

  • SNOWFLAKE_USER_NAME is the name of the Snowflake user you want to use. The user must belong to the account you set for the SNOWFLAKE_ACCOUNT secret.

Important

A Default Warehouse must be set for the Snowflake user you want to use (i.e., the user you set for the SNOWFLAKE_USER_NAME secret). Otherwise, you'll get the following error: "No active warehouse selected in the current session. Select an active warehouse with the 'use warehouse' command."

How to check Default Warehouse

  • SNOWFLAKE_USER_PASSWORD is the password for the Snowflake user that you set for the SNOWFLAKE_USER_NAME secret.

Step 5: Start the Streamlit app

Run the following in the terminal to start the Streamlit app:

streamlit run ArcticMeet.py

Step 6: Access ArcticMeet in your browser

Navigate to http://localhost:8501 to open ArcticMeet in the browser.


🤔 How does it work 🤔

ArcticMeet analyzes your meeting in the following three steps:

  1. Upload a meeting:
    • The goal of this step is to get a transcription of the meeting, i.e., a written version of what was said. ArcticMeet needs it to understand and analyze your meeting in the next two steps. ArcticMeet transcribes the uploaded meeting using Whisper via Hugging Face, specifically the openai/whisper-tiny model.
    • Note 1: You can only upload one meeting at a time. The file must be in MP4 format and not larger than 5 GB.
    • Note 2: Larger Whisper models are more capable, but they need more resources than the free tier of Streamlit Cloud provides. With a larger Whisper model, the hosted app crashes after hitting the resource limit.
  2. Select a transcription:
    • The goal of this step is for the user to select the transcription they want to analyze in the next step. Although only one meeting can be uploaded at a time, the user can analyze multiple meetings one after another. ArcticMeet remembers previously uploaded meetings, so in this step the user can choose between their transcriptions.
  3. Transcription analysis:
    • The goal of this step is for the user to select all the analysis features they want to include. ArcticMeet then analyzes the selected transcription and provides the meeting analysis.
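Step 1 can be sketched as follows, assuming the Hugging Face Transformers pipeline API and FFmpeg on the PATH (the speech-recognition pipeline uses FFmpeg to decode file inputs). The MP4-only rule and the 5 GB limit come from the notes above; validate_upload and transcribe are hypothetical names, and treating 5 GB as 5 × 1024³ bytes is an assumption.

```python
from pathlib import Path

MAX_UPLOAD_BYTES = 5 * 1024**3  # 5 GB limit (assumed to mean binary gigabytes)


def validate_upload(filename: str, size_bytes: int) -> None:
    """Reject anything that is not an MP4 file of at most 5 GB."""
    if Path(filename).suffix.lower() != ".mp4":
        raise ValueError("Only MP4 files are supported.")
    if size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError("The file is larger than 5 GB.")


def transcribe(path: str) -> str:
    """Transcribe a file with openai/whisper-tiny (needs transformers, torch and FFmpeg)."""
    # Imported lazily so validate_upload works without the heavy ML dependencies.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model="openai/whisper-tiny",
        chunk_length_s=30,  # Whisper works on 30-second audio windows
    )
    return asr(path)["text"]
```

In a Streamlit app, the pipeline would normally be created once and cached rather than rebuilt on every call, since loading the model dominates the cost for short meetings.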

⚙️ Analysis features ⚙️

ArcticMeet is able to perform the following:

  • Summarization: Providing a summary of the meeting.
  • Agenda extraction: Providing key topics discussed in the meeting.
  • Participant identification: Providing meeting participants and their gender.
  • Sentiment analysis: Providing the sentiment of the meeting, sentence by sentence.
  • Translation: Providing translation of the meeting to different languages.
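Sentence-by-sentence sentiment analysis could look like the sketch below. It assumes the Snowflake Cortex SENTIMENT SQL function (which returns a score from -1, negative, to 1, positive) and a DB-API cursor such as the one returned by snowflake.connector.connect(...).cursor(). The naive regex sentence splitter and the function names are illustrative assumptions, not ArcticMeet's code.

```python
import re
from typing import Any, Protocol


class Cursor(Protocol):
    """The subset of the DB-API cursor interface used here."""

    def execute(self, query: str, params: Any = ...) -> Any: ...
    def fetchone(self) -> Any: ...


def split_sentences(text: str) -> list[str]:
    """Naively split on '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sentence_sentiments(cursor: Cursor, transcription: str) -> list[tuple[str, float]]:
    """Score each sentence with SNOWFLAKE.CORTEX.SENTIMENT."""
    results = []
    for sentence in split_sentences(transcription):
        # %s placeholders use the connector's default pyformat binding.
        cursor.execute("SELECT SNOWFLAKE.CORTEX.SENTIMENT(%s)", (sentence,))
        (score,) = cursor.fetchone()
        results.append((sentence, float(score)))
    return results
```

One query per sentence keeps the example simple; for long meetings, scoring all sentences in a single query would reduce round trips.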

Note

As of April 2024, the Translate() Snowflake Cortex LLM function supports the following languages:

  • English
  • French
  • German
  • Italian
  • Japanese
  • Korean
  • Polish
  • Portuguese
  • Russian
  • Spanish
  • Swedish
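A minimal sketch of calling the Translate() function for the languages listed above, assuming the SQL form SNOWFLAKE.CORTEX.TRANSLATE(text, source_language, target_language) with two-letter ISO 639-1 language codes and a DB-API cursor from the Snowflake Connector for Python. The translate helper and the LANGUAGE_CODES mapping are hypothetical.

```python
from typing import Any

# ISO 639-1 codes for the languages listed above (assumed to match what TRANSLATE expects).
LANGUAGE_CODES = {
    "English": "en", "French": "fr", "German": "de", "Italian": "it",
    "Japanese": "ja", "Korean": "ko", "Polish": "pl", "Portuguese": "pt",
    "Russian": "ru", "Spanish": "es", "Swedish": "sv",
}


def translate(cursor: Any, text: str, source: str, target: str) -> str:
    """Translate text with SNOWFLAKE.CORTEX.TRANSLATE using language names from the list."""
    try:
        src, tgt = LANGUAGE_CODES[source], LANGUAGE_CODES[target]
    except KeyError as exc:
        raise ValueError(f"Unsupported language: {exc.args[0]}") from None
    # Bind the text as a parameter instead of formatting it into the SQL string.
    cursor.execute("SELECT SNOWFLAKE.CORTEX.TRANSLATE(%s, %s, %s)", (text, src, tgt))
    return cursor.fetchone()[0]
```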

⚒️ Tech stack ⚒️

ArcticMeet works with the following tech stack:

| Tech | Version |
| --- | --- |
| Python | 3.11.8 |
| Streamlit | 1.34.0 |
| Streamlit JS eval | 0.1.7 |
| Snowflake Connector for Python | 3.10.0 |
| Snowpark API for Python | 1.16.0 |
| Snowflake ML for Python | 1.5.0 |
| PyTorch | 2.3.0+cpu |
| Torchvision | 0.18.0 |
| Torchaudio | 2.3.0 |
| FFmpeg | 7.0 |
| Hugging Face Transformers | 4.40.2 |
| Pandas | 2.2.0 |
| Plotly | 5.22.0 |

Tip

You don't have to install the above-mentioned packages one by one. See the instructions above.


🎭 Behind the scenes 🎭

ArcticMeet follows the Streamlit multipage app architecture and leverages a wide range of Streamlit components to deliver the best possible UX:

  • st.set_page_config
  • st.write
  • st.header
  • st.subheader
  • st.markdown
  • st.columns
  • st.button
  • st.switch_page
  • st.image
  • st.form
  • st.form_submit_button
  • st.file_uploader
  • st.info
  • st.status
  • st.spinner
  • st.toast
  • st.error
  • st.session_state
  • st.selectbox
  • st.text_input
  • st.checkbox
  • st.plotly_chart
  • st.line_chart
  • st.data_editor
  • st.stop
  • st.sidebar

To maximize ArcticMeet's performance, the app utilizes Streamlit caching:

  • @st.cache_resource during Step 1: ArcticMeet transcribes a given meeting only once if the user uploads the same meeting again within 1 hour. After 1 hour, the cached transcription expires and is removed from the cache.
  • @st.cache_data during Step 3: ArcticMeet analyzes a transcription only once if the user requests the same analysis features for the same meeting again within 1 hour. After 1 hour, the cached meeting analysis expires and is removed from the cache.
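Streamlit's caching decorators accept a ttl argument (e.g., @st.cache_data(ttl=3600) for one hour). The plain-Python sketch below reproduces only the one-hour time-to-live semantics described above so they can be seen without Streamlit; it is not how Streamlit implements caching, and ttl_cache is a hypothetical name.

```python
import time
from functools import wraps
from typing import Any, Callable


def ttl_cache(ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
    """Memoize a function by its positional (hashable) arguments; entries expire after ttl_seconds."""
    def decorator(func):
        cache: dict[tuple, tuple[float, Any]] = {}

        @wraps(func)
        def wrapper(*args):
            now = clock()
            hit = cache.get(args)
            if hit is not None and now - hit[0] < ttl_seconds:
                return hit[1]  # fresh entry: skip the expensive call
            result = func(*args)
            cache[args] = (now, result)  # store (timestamp, value)
            return result

        return wrapper

    return decorator
```

For example, decorating an analysis function with @ttl_cache(3600) and passing it the transcription plus a frozenset of the chosen feature names means a repeated request with the same features within an hour is served from the cache, while a different feature selection triggers a new analysis.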

Also, ArcticMeet employs a wide range of Snowflake Cortex LLM functions during Step 3:


⚠️ Limitations ⚠️

| Limitation | Solution | Implementation difficulty |
| --- | --- | --- |
| ArcticMeet currently only supports uploading the MP4 file format because most meeting platforms (Zoom, Google Meet, Teams, etc.) let users download meetings in the MP4 file format. | Add support for other file formats using st.file_uploader. | Low |
| ArcticMeet has an upload limit of 5 GB, which is probably enough for most meetings. However, a long meeting, although under 5 GB, could produce a transcription that exceeds the context window of the Snowflake Arctic LLM when used with the Complete() Snowflake Cortex LLM function (i.e., 4,096 tokens as of April 2024). | This is a current model limitation that will probably be solved if the Snowflake Arctic LLM gets an update. | Low or none, if the model gets an update |
| ArcticMeet's Participant identification analysis feature is not very robust because it depends on participants' names being mentioned at some point in the meeting. ArcticMeet might find only some of the participants. The sample meeting is an ideal transcription in this respect, which will not always be the case in real life. | ? | ? |
| ArcticMeet's Translation analysis feature almost always hits the context window limit of the Translate() Snowflake Cortex LLM function (i.e., 1,024 tokens as of April 2024). Even a short meeting usually produces a transcription that is too long to get the full translation back, which is why the translation is cut off. | There are two possible solutions: either the Translate() Snowflake Cortex LLM function gets an update, or I change the code so that the transcription is sent to the Translate() Snowflake Cortex LLM function in chunks. To do the latter, I need to know which tokenizer Snowflake Arctic uses, and I couldn't find this information anywhere. | Low or none, if the function gets an update |
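The chunking solution from the Translation row could be approximated without knowing the tokenizer. The sketch below assumes roughly 4 characters per token (a common rule of thumb for English text, not a figure documented by Snowflake) and keeps a safety margin below the limit; chunk_text and CHARS_PER_TOKEN are hypothetical. It packs whole sentences greedily so each chunk can be translated on its own, and only hard-splits a single sentence that is longer than the budget.

```python
import re

CHARS_PER_TOKEN = 4  # rough heuristic; the real tokenizer is unknown


def chunk_text(text: str, max_tokens: int = 1024, safety: float = 0.75) -> list[str]:
    """Greedily pack whole sentences into chunks under an estimated token budget."""
    budget = int(max_tokens * safety * CHARS_PER_TOKEN)  # max characters per chunk
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single over-long sentence is hard-split into budget-sized pieces.
        while len(sentence) > budget:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:budget])
            sentence = sentence[budget:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= budget:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be sent to Translate() separately and the translated pieces joined; the same helper with max_tokens=4096 would apply to the Complete() limit in the second row. Because the character-per-token ratio is only an estimate, the safety factor should be tuned against real transcriptions.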

Despite all the limitations mentioned above, ArcticMeet, in my humble attempt to maintain objectivity 😅, is pretty impressive considering that:

  • ArcticMeet's core was developed in just 8 days by 1 person (i.e., me).
  • The Snowflake Arctic LLM was added to the Snowflake Cortex LLM functions only 8 days ago, at the time of writing this.
  • The Snowflake Arctic LLM was announced only 20 days ago, at the time of writing this.

ArcticMeet could become even more awesome by making improvements to either the Snowflake Arctic LLM, Snowflake Cortex LLM functions, or the code itself. There's a lot of room for growth and improvement ahead for ArcticMeet.


⭐ Star history ⭐

Star history chart


🤝 Contributing 🤝

Contributions are welcome! Feel free to open issues or create pull requests for any improvements or bug fixes.


📝 License 📝

This project is open source and available under the MIT License.