

ADR for using Pull based (Integrated Vectorization) approach throughout #629

Merged: 14 commits, merged Apr 18, 2024


@komalg1 (Collaborator) commented Apr 8, 2024

Purpose

  • Adds an ADR describing the approach for implementing Integrated Vectorization. Required by #321.

Does this introduce a breaking change?

[ ] Yes
[x] No

Pull Request Type

What kind of change does this Pull Request introduce?

[ ] Bugfix
[ ] Feature
[ ] Code style update (formatting, local variables)
[ ] Refactoring (no functional changes, no api changes)
[x] Documentation content changes
[ ] Other... Please describe:


github-actions bot commented Apr 8, 2024

Coverage Report

| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| code | | | | |
| app.py | 13 | 13 | 0% | 1–4, 6, 8, 11–13, 15, 17, 19–20 |
| create_app.py | 148 | 3 | 97% | 199, 204, 327 |
| code/backend | | | | |
| Admin.py | 22 | 22 | 0% | 1–6, 9, 11, 13, 15, 18–19, 21–22, 25, 32, 39, 42–44, 46, 48 |
| code/backend/batch | | | | |
| AddURLEmbeddings.py | 29 | 2 | 93% | 37–38 |
| BatchPushResults.py | 27 | 0 | 100% | |
| BatchStartProcessing.py | 19 | 0 | 100% | |
| GetConversationResponse.py | 32 | 3 | 90% | 63–65 |
| function_app.py | 15 | 15 | 0% | 1–8, 11–12, 14, 17–20 |
| code/backend/batch/utilities/common | | | | |
| Answer.py | 24 | 1 | 95% | 39 |
| SourceDocument.py | 58 | 4 | 93% | 31, 35, 39, 124 |
| code/backend/batch/utilities/document_chunking | | | | |
| DocumentChunkingBase.py | 10 | 2 | 80% | 10, 16 |
| FixedSizeOverlap.py | 19 | 0 | 100% | |
| Layout.py | 19 | 0 | 100% | |
| Page.py | 17 | 0 | 100% | |
| Paragraph.py | 9 | 9 | 0% | 1–4, 7–9, 12, 15 |
| Strategies.py | 25 | 4 | 84% | 24–25, 27, 29 |
| \_\_init\_\_.py | 7 | 0 | 100% | |
| code/backend/batch/utilities/document_loading | | | | |
| DocumentLoadingBase.py | 9 | 1 | 88% | 13 |
| Layout.py | 12 | 12 | 0% | 1–4, 7–9, 11–13, 16, 25 |
| Read.py | 12 | 12 | 0% | 1–4, 7–9, 11–13, 16, 25 |
| Strategies.py | 20 | 8 | 60% | 13, 15, 17, 19, 24–25, 27, 29 |
| Web.py | 19 | 1 | 94% | 23 |
| WordDocument.py | 25 | 25 | 0% | 1–6, 9–12, 21–24, 26–27, 29–30, 32–37, 45 |
| \_\_init\_\_.py | 11 | 0 | 100% | |
| code/backend/batch/utilities/helpers | | | | |
| AzureBlobStorageHelper.py | 71 | 53 | 25% | 16, 20–22, 30, 46, 48–50, 53–54, 59, 63, 67, 70, 73–74, 79, 82, 88–89, 91, 95, 99, 103, 109, 124, 127, 139, 142, 146, 149, 151, 159–163, 186, 190–194, 196, 199–200, 204, 208, 210, 212, 216, 227 |
| AzureFormRecognizerHelper.py | 81 | 81 | 0% | 1–6, 9–11, 13, 16–17, 25, 27, 35, 43–45, 52–55, 60–68, 70, 73–75, 77–78, 81, 84–86, 88–90, 93, 97–98, 105–109, 111–114, 117–131, 133, 135–137, 139–140, 143, 145–147 |
| AzureSearchHelper.py | 20 | 0 | 100% | |
| ConfigHelper.py | 63 | 34 | 46% | 19–22, 30–31, 34, 39, 42, 45, 48, 54–58, 63, 68–69, 75–76, 78–80, 83–86, 88, 92–93, 99–100, 218 |
| DocumentChunkingHelper.py | 12 | 1 | 91% | 21 |
| DocumentLoadingHelper.py | 12 | 1 | 91% | 14 |
| DocumentProcessorHelper.py | 31 | 19 | 38% | 15–17, 22, 25–35, 38–41 |
| EnvHelper.py | 120 | 9 | 92% | 194, 199–200, 203–205, 215–217 |
| LLMHelper.py | 33 | 20 | 39% | 11–13, 15–16, 22, 28–29, 34, 37–38, 47, 58–59, 71, 83–84, 91, 101, 109 |
| OrchestratorHelper.py | 12 | 4 | 66% | 20–22, 25 |
| code/backend/batch/utilities/loggers | | | | |
| ConversationLogger.py | 36 | 28 | 22% | 8, 11–12, 15–24, 27–30, 33–42, 46 |
| TokenLogger.py | 9 | 4 | 55% | 7–8, 11, 15 |
| code/backend/batch/utilities/orchestrator | | | | |
| LangChainAgent.py | 72 | 29 | 59% | 23–28, 30, 65–66, 71–73, 78, 98–101, 118–119, 122–125, 132–133, 138–140, 143 |
| OpenAIFunctions.py | 66 | 66 | 0% | 1–3, 5–12, 14, 17–21, 56, 59, 62–64, 69–71, 76, 79, 81, 87–90, 92, 95, 102–106, 110–111, 113, 119–123, 127–129, 132, 135–136, 139, 144–146, 149–151, 156–158, 161, 164, 169 |
| OrchestratorBase.py | 32 | 15 | 53% | 14–20, 31, 40–42, 49–51, 61 |
| Strategies.py | 12 | 7 | 41% | 10–11, 13–15, 17, 19 |
| \_\_init\_\_.py | 11 | 1 | 90% | 10 |
| code/backend/batch/utilities/parser | | | | |
| OutputParserTool.py | 39 | 0 | 100% | |
| ParserBase.py | 9 | 2 | 77% | 9, 19 |
| \_\_init\_\_.py | 7 | 2 | 71% | 7, 11 |
| code/backend/batch/utilities/tools | | | | |
| AnswerProcessingBase.py | 8 | 2 | 75% | 8, 12 |
| AnsweringToolBase.py | 9 | 2 | 77% | 9, 15 |
| ContentSafetyChecker.py | 41 | 25 | 39% | 16, 18–19, 24, 30–32, 35–36, 42–43, 49–54, 57–59, 61, 65–67, 69 |
| PostPromptTool.py | 22 | 13 | 40% | 11, 14–15, 17–18, 22, 29, 36–37, 45, 51–52, 60 |
| QuestionAnswerTool.py | 36 | 20 | 44% | 21–24, 27–28, 33, 36, 43, 46, 50–51, 53–54, 57–59, 68, 70, 77 |
| TextProcessingTool.py | 16 | 9 | 43% | 9, 12–15, 19, 21, 28, 35 |
| code/backend/pages | | | | |
| 01_Ingest_Data.py | 120 | 120 | 0% | 1–12, 18–22, 24–26, 28, 34, 41, 44, 48–49, 51, 56, 59–60, 63–72, 76–78, 81–84, 86, 89–99, 102–109, 112–114, 116, 119, 121–124, 129–134, 137, 140–141, 144, 150, 163–166, 169–170, 174, 178, 185, 199–202, 205, 210–211, 213–214, 216–218, 222, 225–226, 232–235, 242–243, 248, 250–251 |
| 02_Explore_Data.py | 28 | 28 | 0% | 1–8, 10, 12, 14, 20, 27, 30, 38, 41–43, 45–48, 50, 54, 58–59, 62–63 |
| 03_Delete_Data.py | 51 | 51 | 0% | 1–6, 8, 10, 12, 18, 25, 28, 36, 39–40, 43–47, 49, 51–55, 57–58, 60, 63–65, 67–70, 72–74, 76, 78, 81–83, 85–86, 88–90, 92–93 |
| 04_Configuration.py | 69 | 69 | 0% | 1–6, 8, 10, 12, 19, 26, 28, 33–38, 41–42, 45–46, 48–51, 53–54, 64–68, 71–72, 76–78, 81–82, 85–86, 89–93, 100–102, 104, 106, 114, 121–122, 129, 131, 143–144, 161–162, 166, 168–169, 185, 207–208, 210–211 |
| TOTAL | 1749 | 852 | 51% | |

| Tests | Skipped | Failures | Errors | Time |
|---|---|---|---|---|
| 74 | 0 💤 | 0 ❌ | 0 🔥 | 9.970s ⏱️ |

@komalg1 (Collaborator, Author) commented Apr 8, 2024

This is my initial draft of the design for implementing Integrated Vectorization. I may have missed something, so let's discuss it further and then finalize the design.

@komalg1 changed the title from "ADR for options available for using pull based approach (Integrated Vectorization)" to "ADR for using Pull based (Integrated Vectorization) approach throughout" on Apr 10, 2024
@komalg1 requested a review from ross-p-smith on April 10, 2024 16:50
@adamdougal (Collaborator) left a comment


Looking good! One minor comment!

@komalg1 requested a review from superhindupur on April 15, 2024 12:44
cecheta previously approved these changes on Apr 18, 2024
ross-p-smith previously approved these changes on Apr 18, 2024
@komalg1 dismissed stale reviews from cecheta and ross-p-smith via 61be960 on April 18, 2024 13:25
@komalg1 requested reviews from ross-p-smith and cecheta on April 18, 2024 15:26
@komalg1 added this pull request to the merge queue on Apr 18, 2024
Merged via the queue into Azure-Samples:main with commit 5748db1 on Apr 18, 2024 (2 checks passed)
@komalg1 deleted the komal/adr-iv branch on April 18, 2024 15:57

5 participants