Running Marimo in Docker Image
GullyBurns committed Jul 2, 2024
1 parent 2a0b23c commit 9af327d
Showing 5 changed files with 26 additions and 13 deletions.
19 changes: 11 additions & 8 deletions README.md
@@ -51,14 +51,17 @@ The preferred method to run Alhazen is through
Note, for correct functionality, set the following environment variables
for the shell from which you are calling Docker:

-**MANDATORY** \* LOCAL_FILE_PATH - the directory where the system will
-store full-text files.
-
-**OPTIONAL** \* OPENAI_API_KEY - if you are using OpenAI large language
-models.
-\* DATABRICKS_API_KEY - if you are using the Databricks AI Playground
-endpoint as an LLM server. \* GROQ_API_KEY - if you are calling LLMs on
-groq.com
+**MANDATORY**
+
+- LOCAL_FILE_PATH - the directory where the system will store full-text
+files.
+
+**OPTIONAL**
+
+- OPENAI_API_KEY - if you are using OpenAI large language models.
+- DATABRICKS_API_KEY - if you are using the Databricks AI Playground
+endpoint as an LLM server.
+- GROQ_API_KEY - if you are calling LLMs on groq.com

#### Quickstart

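The diff above reorganizes the environment-variable list into proper Markdown bullets. For concreteness, here is a minimal sketch of how those variables might be set in the calling shell before the quickstart commands; the path and key values are hypothetical placeholders, not values from the repository:

```bash
# Placeholder values - substitute your own. Only LOCAL_FILE_PATH is mandatory.
export LOCAL_FILE_PATH="$HOME/alhazen_files"   # where full-text files will be stored
export OPENAI_API_KEY="sk-..."                 # optional: OpenAI models
export DATABRICKS_API_KEY="dapi-..."           # optional: Databricks AI Playground endpoint
export GROQ_API_KEY="gsk-..."                  # optional: LLMs on groq.com

docker compose build && docker compose up      # quickstart, per the README
```

Variables exported this way are visible to `docker compose`, which can substitute them in the compose file and pass them into the container, assuming the compose file references them.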
14 changes: 11 additions & 3 deletions docs/index.html
@@ -563,9 +563,17 @@ <h2 class="anchored" data-anchor-id="installation">Installation</h2>
<h3 class="anchored" data-anchor-id="docker">Docker</h3>
<p>The preferred method to run Alhazen is through <a href="https://www.docker.com/">Docker</a>.</p>
<p>Note, for correct functionality, set the following environment variables for the shell from which you are calling Docker:</p>
-<p><strong>MANDATORY</strong> * LOCAL_FILE_PATH - the directory where the system will store full-text files.</p>
-<p><strong>OPTIONAL</strong> * OPENAI_API_KEY - if you are using OpenAI large language models.<br>
-* DATABRICKS_API_KEY - if you are using the Databricks AI Playground endpoint as an LLM server. * GROQ_API_KEY - if you are calling LLMs on groq.com</p>
+<p><strong>MANDATORY</strong></p>
+<ul>
+<li>LOCAL_FILE_PATH - the directory where the system will store full-text files.</li>
+</ul>
+<p><strong>OPTIONAL</strong></p>
+<ul>
+<li>OPENAI_API_KEY - if you are using OpenAI large language models.<br>
+</li>
+<li>DATABRICKS_API_KEY - if you are using the Databricks AI Playground endpoint as an LLM server.</li>
+<li>GROQ_API_KEY - if you are calling LLMs on groq.com</li>
+</ul>
<section id="quickstart" class="level4">
<h4 class="anchored" data-anchor-id="quickstart">Quickstart</h4>
<p>To run the system out of the box, run these commands:</p>
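The quickstart commands themselves are cut off by the collapsed diff here; the same sequence appears in full in the docs/search.json text below:

```bash
$ git clone https://github.com/chanzuckerberg/alhazen
$ cd alhazen
$ docker compose build
$ docker compose up
```

This should print a link of the form http://127.0.0.1:8888/lab?token=LONG-ALPHANUMERIC-STRING, which opens a Jupyter Lab session with access to the repository's notebooks.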
2 changes: 1 addition & 1 deletion docs/search.json
@@ -26,7 +26,7 @@
"href": "index.html#installation",
"title": "Home - Alhazen",
"section": "Installation",
"text": "Installation\n\nDocker\nThe preferred method to run Alhazen is through Docker.\nNote, for correct functionality, set the following environment variables for the shell from which you are calling Docker:\nMANDATORY * LOCAL_FILE_PATH - the directory where the system will store full-text files.\nOPTIONAL * OPENAI_API_KEY - if you are using OpenAI large language models.\n* DATABRICKS_API_KEY - if you are using the Databricks AI Playground endpoint as an LLM server. * GROQ_API_KEY - if you are calling LLMs on groq.com\n\nQuickstart\nTo run the system out of the box, run these commands:\n$ git clone https://github.com/chanzuckerberg/alhazen\n$ cd alhazen\n$ docker compose build\n$ docker compose up\nThis should generate the output that includes a link formatted like this one: http://127.0.0.1:8888/lab?token=LONG-ALPHANUMERIC-STRING.\nOpen a browser to that location and you should get access to a juypter lab notebook that provides access to all notebooks in the repo.\nBrowse to nbs/tutorials/CryoET_Tutorial.ipynb to access a walkthrough of an analysis over papers involving CryoET as a demonstration.\n\n\nRun Huridocs as PDF extraction\nTo run the system with support from the Huridocs PDF extraction system (needed for processing full text articles), you must first run the docker container for that system:\n$ git clone https://github.com/huridocs/pdf_paragraphs_extraction\n$ cd pdf_paragraphs_extraction\n$ docker compose build\n$ docker compose up\nThen repeat as before, but with the huridocs alhazen image\n$ cd ..\n$ git clone https://github.com/chanzuckerberg/alhazen\n$ cd alhazen\n$ docker compose build\n$ docker compose -f docker-compose-huridocs.yml up\n\n\n\nInstall dependencies\n\nPostgresql\nAlhazen requires postgresql@14 to run. Homebrew provides an installer:\n$ brew install postgresql@14\nwhich can be run as a service:\n$ brew services start postgresql@14\n$ brew services list\nIf you install Postgresql via homebrew, you will need to create a postgres superuser to run the psql command.\n$ createuser -s postgres\nNote that the Postgres.app system also provides a nice GUI interface for Postgres but installing the pgvector package is a little more involved.\n\n\nOllama\nThe tool uses the Ollama library to execute large language models locally on your machine. Note that to able to run the best performing models on a Apple Mac M1 or M2 machine, you will need at least 48GB of memory.\n\n\nHuridocs\nWe use a PDF document text extraction and classification system called Huridocs. In particular, our PDF processing requires a docker image of their PDF Paragraphs Extraction system. To run this, perform the following steps:\n1. git clone https://github.com/huridocs/pdf_paragraphs_extraction\n2. cd pdf_paragraphs_extraction\n3. docker-compose up\n\n\n\nInstall Alhazen source code\ngit clone https://github.com/chanzuckerberg/alzhazen\nconda create -n alhazen python=3.11\nconda activate alhazen\ncd alhazen\npip install -e .",
"text": "Installation\n\nDocker\nThe preferred method to run Alhazen is through Docker.\nNote, for correct functionality, set the following environment variables for the shell from which you are calling Docker:\nMANDATORY\n\nLOCAL_FILE_PATH - the directory where the system will store full-text files.\n\nOPTIONAL\n\nOPENAI_API_KEY - if you are using OpenAI large language models.\n\nDATABRICKS_API_KEY - if you are using the Databricks AI Playground endpoint as an LLM server.\nGROQ_API_KEY - if you are calling LLMs on groq.com\n\n\nQuickstart\nTo run the system out of the box, run these commands:\n$ git clone https://github.com/chanzuckerberg/alhazen\n$ cd alhazen\n$ docker compose build\n$ docker compose up\nThis should generate the output that includes a link formatted like this one: http://127.0.0.1:8888/lab?token=LONG-ALPHANUMERIC-STRING.\nOpen a browser to that location and you should get access to a juypter lab notebook that provides access to all notebooks in the repo.\nBrowse to nbs/tutorials/CryoET_Tutorial.ipynb to access a walkthrough of an analysis over papers involving CryoET as a demonstration.\n\n\nRun Huridocs as PDF extraction\nTo run the system with support from the Huridocs PDF extraction system (needed for processing full text articles), you must first run the docker container for that system:\n$ git clone https://github.com/huridocs/pdf_paragraphs_extraction\n$ cd pdf_paragraphs_extraction\n$ docker compose build\n$ docker compose up\nThen repeat as before, but with the huridocs alhazen image\n$ cd ..\n$ git clone https://github.com/chanzuckerberg/alhazen\n$ cd alhazen\n$ docker compose build\n$ docker compose -f docker-compose-huridocs.yml up\n\n\n\nInstall dependencies\n\nPostgresql\nAlhazen requires postgresql@14 to run. Homebrew provides an installer:\n$ brew install postgresql@14\nwhich can be run as a service:\n$ brew services start postgresql@14\n$ brew services list\nIf you install Postgresql via homebrew, you will need to create a postgres superuser to run the psql command.\n$ createuser -s postgres\nNote that the Postgres.app system also provides a nice GUI interface for Postgres but installing the pgvector package is a little more involved.\n\n\nOllama\nThe tool uses the Ollama library to execute large language models locally on your machine. Note that to able to run the best performing models on a Apple Mac M1 or M2 machine, you will need at least 48GB of memory.\n\n\nHuridocs\nWe use a PDF document text extraction and classification system called Huridocs. In particular, our PDF processing requires a docker image of their PDF Paragraphs Extraction system. To run this, perform the following steps:\n1. git clone https://github.com/huridocs/pdf_paragraphs_extraction\n2. cd pdf_paragraphs_extraction\n3. docker-compose up\n\n\n\nInstall Alhazen source code\ngit clone https://github.com/chanzuckerberg/alzhazen\nconda create -n alhazen python=3.11\nconda activate alhazen\ncd alhazen\npip install -e .",
"crumbs": [
"Get Started",
"Home - Alhazen"
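The installation notes embedded above say that Alhazen requires postgresql@14 and that the pgvector package is involved in the Postgres setup. A minimal sketch of a Homebrew-based setup might look like the following; the database name `alhazen` is a hypothetical placeholder, and whether the Homebrew pgvector formula links against postgresql@14 on your system is an assumption worth verifying:

```bash
brew install postgresql@14 pgvector    # assumes the pgvector formula matches this Postgres version
brew services start postgresql@14
createuser -s postgres                 # superuser needed to run psql, per the notes
createdb -U postgres alhazen           # hypothetical database name
psql -U postgres -d alhazen -c 'CREATE EXTENSION IF NOT EXISTS vector;'
```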
2 changes: 1 addition & 1 deletion docs/sitemap.xml
@@ -2,7 +2,7 @@
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://chanzuckerberg.github.io/alhazen/index.html</loc>
-<lastmod>2024-07-02T20:30:04.606Z</lastmod>
+<lastmod>2024-07-02T20:31:46.092Z</lastmod>
</url>
<url>
<loc>https://chanzuckerberg.github.io/alhazen/tutorials/index.html</loc>
2 changes: 2 additions & 0 deletions nbs/index.ipynb
@@ -59,9 +59,11 @@
"Note, for correct functionality, set the following environment variables for the shell from which you are calling Docker:\n",
"\n",
"**MANDATORY**\n",
"\n",
"* LOCAL_FILE_PATH - the directory where the system will store full-text files.\n",
"\n",
"**OPTIONAL**\n",
"\n",
"* OPENAI_API_KEY - if you are using OpenAI large language models. \n",
"* DATABRICKS_API_KEY - if you are using the Databricks AI Playground endpoint as an LLM server. \n",
"* GROQ_API_KEY - if you are calling LLMs on groq.com\n",
