[Feat] Rag scrap notice and embedding for vectorDB #191

Merged
merged 30 commits into from
Jul 22, 2024
Commits
30 commits
24400f2
setting: Configure the Chroma Vector DB dependency
zbqmgldjfh Jul 15, 2024
4241964
feat: Update configuration files
zbqmgldjfh Jul 15, 2024
b571e81
feat(QueryVectorStoreAdapter): QueryVectorStoreAdapter to ChromaVectorS…
zbqmgldjfh Jul 15, 2024
4ed5577
feat(Notice): Add an embedded boolean field to the Notice table
zbqmgldjfh Jul 16, 2024
d997740
feat(NoticeTextParserTemplate): Implement a ParserTemplate that parses a notice's body, title, and ID
zbqmgldjfh Jul 16, 2024
937e233
test: Set up the ChromaDB test container
zbqmgldjfh Jul 17, 2024
64d54af
feat(NoticeApiClient): Implement requestSinglePageWithUrl to scrape a single page
zbqmgldjfh Jul 17, 2024
d285e34
fix(NoticeJdbcRepository): For the embedded field added to Notice, bulk insert method…
zbqmgldjfh Jul 18, 2024
aa7c80b
feat(NoticeRepository): updateNoticeEmbeddingStatus, findNotYetEmbedd…
zbqmgldjfh Jul 18, 2024
4d65d24
fix(KuisHomepageNoticeTextParser): Add logic to parse additional tags that contain the body text
zbqmgldjfh Jul 18, 2024
360b078
feat(KuisHomepageNoticeInfo): Add the textParser dependency
zbqmgldjfh Jul 18, 2024
95dbbae
feat(ChromaVectorStoreAdapter): Implement ChromaVector
zbqmgldjfh Jul 18, 2024
7c71810
test(KuisHomepageNoticeScraperTemplateTest): Embedding test scrapForEmbeddin…
zbqmgldjfh Jul 18, 2024
91c6a93
feat(RAGConfiguration): Implement the RAG configuration
zbqmgldjfh Jul 19, 2024
f808b15
feat(NoticeEmbeddingUpdater): Implement an Updater for notice embedding
zbqmgldjfh Jul 19, 2024
3c0fca8
feat: Change the run time of the notice updater job
zbqmgldjfh Jul 19, 2024
6b4c10c
chore: Add collection-name to the configuration file
zbqmgldjfh Jul 19, 2024
81e0625
fix(ChromaVectorStoreAdapter): Fix the embedding method and add tests
zbqmgldjfh Jul 20, 2024
1889237
feat(ChromaVectorStoreAdapter): Remove the similarity threshold
zbqmgldjfh Jul 20, 2024
828fd4e
feat: Remove the unused RestTemplateConfig
zbqmgldjfh Jul 20, 2024
63d68b3
chore: Remove public access modifiers
zbqmgldjfh Jul 20, 2024
bc88621
feat(ChromaVectorStoreAdapter): Change Top-K to 2
zbqmgldjfh Jul 21, 2024
4e52373
feat(User): Change the monthly question limit to 3
zbqmgldjfh Jul 21, 2024
2aee9bf
feat(UserUpdater#questionCountReset): Implement a job that resets user question counts on the last day of each month
zbqmgldjfh Jul 21, 2024
10488b2
feat(UserRegisterNonChainingFilter): Log the duplicate user registration exception
zbqmgldjfh Jul 21, 2024
d8944be
feat(UserUpdater): Suspend the user removal job
zbqmgldjfh Jul 21, 2024
8351982
setting: Change ai max token to 1000
zbqmgldjfh Jul 21, 2024
d00db85
feat(RAGQueryApiV2): Document RAGQueryApi
zbqmgldjfh Jul 21, 2024
736e835
refactor: Use constants in SecurityRequirement
zbqmgldjfh Jul 22, 2024
645cb77
feat(User): Limit the user question count to 2
zbqmgldjfh Jul 22, 2024
feat(KuisHomepageNoticeInfo): Add the textParser dependency
zbqmgldjfh committed Jul 18, 2024
commit 360b0783f4b2047b5d4ab63acc3b0f42b671940e
Original file line number Diff line number Diff line change
@@ -1,21 +1,24 @@
package com.kustacks.kuring.worker.scrap.noticeinfo;

import com.kustacks.kuring.notice.domain.CategoryName;
import com.kustacks.kuring.worker.parser.notice.KuisHomepageNoticeHtmlParser;
import com.kustacks.kuring.worker.parser.notice.KuisHomepageNoticeTextParser;
import com.kustacks.kuring.worker.scrap.client.notice.KuisHomepageNoticeApiClient;
import com.kustacks.kuring.worker.scrap.client.notice.property.KuisHomepageNoticeProperties;
import com.kustacks.kuring.worker.parser.notice.KuisHomepageNoticeHtmlParser;
import org.springframework.stereotype.Component;

@Component
public class BachelorKuisHomepageNoticeInfo extends KuisHomepageNoticeInfo {
public BachelorKuisHomepageNoticeInfo(
KuisHomepageNoticeApiClient kuisHomepageNoticeApiClient,
KuisHomepageNoticeHtmlParser kuisHomepageNoticeHtmlParser,
KuisHomepageNoticeTextParser kuisHomepageNoticeTextParser,
KuisHomepageNoticeProperties kuisHomepageNoticeProperties
) {
super();
this.noticeApiClient = kuisHomepageNoticeApiClient;
this.htmlParser = kuisHomepageNoticeHtmlParser;
this.textParser = kuisHomepageNoticeTextParser;
this.kuisHomepageNoticeProperties = kuisHomepageNoticeProperties;
this.siteId = 234;
this.categoryName = CategoryName.BACHELOR;
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
package com.kustacks.kuring.worker.scrap.noticeinfo;

import com.kustacks.kuring.worker.parser.notice.KuisHomepageNoticeTextParser;
import com.kustacks.kuring.worker.scrap.client.notice.KuisHomepageNoticeApiClient;
import com.kustacks.kuring.worker.scrap.client.notice.property.KuisHomepageNoticeProperties;
import com.kustacks.kuring.worker.parser.notice.KuisHomepageNoticeHtmlParser;
@@ -12,10 +13,12 @@ public class EmploymentKuisHomepageNoticeInfo extends KuisHomepageNoticeInfo {
public EmploymentKuisHomepageNoticeInfo(
KuisHomepageNoticeApiClient kuisHomepageNoticeApiClient,
KuisHomepageNoticeHtmlParser kuisHomepageNoticeHtmlParser,
KuisHomepageNoticeTextParser kuisHomepageNoticeTextParser,
KuisHomepageNoticeProperties kuisHomepageNoticeProperties
) {
this.noticeApiClient = kuisHomepageNoticeApiClient;
this.htmlParser = kuisHomepageNoticeHtmlParser;
this.textParser = kuisHomepageNoticeTextParser;
this.kuisHomepageNoticeProperties = kuisHomepageNoticeProperties;
this.category = "job";
this.siteId = 4083;
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
package com.kustacks.kuring.worker.scrap.noticeinfo;

import com.kustacks.kuring.notice.domain.CategoryName;
import com.kustacks.kuring.worker.parser.notice.KuisHomepageNoticeTextParser;
import com.kustacks.kuring.worker.scrap.client.notice.KuisHomepageNoticeApiClient;
import com.kustacks.kuring.worker.scrap.client.notice.property.KuisHomepageNoticeProperties;
import com.kustacks.kuring.worker.parser.notice.KuisHomepageNoticeHtmlParser;
@@ -11,10 +12,12 @@ public class IndustryUnivKuisHomepageNoticeInfo extends KuisHomepageNoticeInfo {
public IndustryUnivKuisHomepageNoticeInfo(
KuisHomepageNoticeApiClient kuisHomepageNoticeApiClient,
KuisHomepageNoticeHtmlParser kuisHomepageNoticeHtmlParser,
KuisHomepageNoticeTextParser kuisHomepageNoticeTextParser,
KuisHomepageNoticeProperties kuisHomepageNoticeProperties
) {
this.noticeApiClient = kuisHomepageNoticeApiClient;
this.htmlParser = kuisHomepageNoticeHtmlParser;
this.textParser = kuisHomepageNoticeTextParser;
this.kuisHomepageNoticeProperties = kuisHomepageNoticeProperties;
this.category = "research";
this.siteId = 4214;
Original file line number Diff line number Diff line change
@@ -2,6 +2,8 @@

import com.kustacks.kuring.notice.domain.CategoryName;
import com.kustacks.kuring.worker.dto.ScrapingResultDto;
import com.kustacks.kuring.worker.parser.notice.NoticeTextParserTemplate;
import com.kustacks.kuring.worker.parser.notice.PageTextDto;
import com.kustacks.kuring.worker.scrap.client.notice.NoticeApiClient;
import com.kustacks.kuring.worker.scrap.client.notice.property.KuisHomepageNoticeProperties;
import com.kustacks.kuring.worker.parser.notice.NoticeHtmlParserTemplate;
@@ -18,6 +20,7 @@ public class KuisHomepageNoticeInfo {
protected NoticeApiClient<ScrapingResultDto, KuisHomepageNoticeInfo> noticeApiClient;
protected KuisHomepageNoticeProperties kuisHomepageNoticeProperties;
protected NoticeHtmlParserTemplate htmlParser;
protected NoticeTextParserTemplate textParser;
protected CategoryName categoryName;
protected String category = "konkuk";
protected Integer siteId;
@@ -30,10 +33,18 @@ public List<ScrapingResultDto> scrapAllPageHtml() {
return noticeApiClient.requestAll(this);
}

public ScrapingResultDto scrapSinglePageHtml(String url) {
return noticeApiClient.requestSinglePageWithUrl(this, url);
}

public RowsDto parse(Document document) {
return htmlParser.parse(document);
}

public PageTextDto parseText(Document document) {
return textParser.parse(document);
}

public CategoryName getCategoryName() {
return categoryName;
}
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
package com.kustacks.kuring.worker.scrap.noticeinfo;

import com.kustacks.kuring.notice.domain.CategoryName;
import com.kustacks.kuring.worker.parser.notice.KuisHomepageNoticeTextParser;
import com.kustacks.kuring.worker.scrap.client.notice.KuisHomepageNoticeApiClient;
import com.kustacks.kuring.worker.scrap.client.notice.property.KuisHomepageNoticeProperties;
import com.kustacks.kuring.worker.parser.notice.KuisHomepageNoticeHtmlParser;
@@ -11,10 +12,12 @@ public class NationalKuisHomepageNoticeInfo extends KuisHomepageNoticeInfo {
public NationalKuisHomepageNoticeInfo(
KuisHomepageNoticeApiClient kuisHomepageNoticeApiClient,
KuisHomepageNoticeHtmlParser kuisHomepageNoticeHtmlParser,
KuisHomepageNoticeTextParser kuisHomepageNoticeTextParser,
KuisHomepageNoticeProperties kuisHomepageNoticeProperties
) {
this.noticeApiClient = kuisHomepageNoticeApiClient;
this.htmlParser = kuisHomepageNoticeHtmlParser;
this.textParser = kuisHomepageNoticeTextParser;
this.kuisHomepageNoticeProperties = kuisHomepageNoticeProperties;
this.siteId = 237;
this.categoryName = CategoryName.NATIONAL;
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
package com.kustacks.kuring.worker.scrap.noticeinfo;

import com.kustacks.kuring.notice.domain.CategoryName;
import com.kustacks.kuring.worker.parser.notice.KuisHomepageNoticeTextParser;
import com.kustacks.kuring.worker.scrap.client.notice.KuisHomepageNoticeApiClient;
import com.kustacks.kuring.worker.scrap.client.notice.property.KuisHomepageNoticeProperties;
import com.kustacks.kuring.worker.parser.notice.KuisHomepageNoticeHtmlParser;
@@ -11,10 +12,12 @@ public class NormalKuisHomepageNoticeInfo extends KuisHomepageNoticeInfo {
public NormalKuisHomepageNoticeInfo(
KuisHomepageNoticeApiClient kuisHomepageNoticeApiClient,
KuisHomepageNoticeHtmlParser kuisHomepageNoticeHtmlParser,
KuisHomepageNoticeTextParser kuisHomepageNoticeTextParser,
KuisHomepageNoticeProperties kuisHomepageNoticeProperties
) {
this.noticeApiClient = kuisHomepageNoticeApiClient;
this.htmlParser = kuisHomepageNoticeHtmlParser;
this.textParser = kuisHomepageNoticeTextParser;
this.kuisHomepageNoticeProperties = kuisHomepageNoticeProperties;
this.siteId = 240;
this.categoryName = CategoryName.NORMAL;
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
package com.kustacks.kuring.worker.scrap.noticeinfo;

import com.kustacks.kuring.notice.domain.CategoryName;
import com.kustacks.kuring.worker.parser.notice.KuisHomepageNoticeTextParser;
import com.kustacks.kuring.worker.scrap.client.notice.KuisHomepageNoticeApiClient;
import com.kustacks.kuring.worker.scrap.client.notice.property.KuisHomepageNoticeProperties;
import com.kustacks.kuring.worker.parser.notice.KuisHomepageNoticeHtmlParser;
@@ -11,10 +12,12 @@ public class ScholarshipKuisHomepageNoticeInfo extends KuisHomepageNoticeInfo {
public ScholarshipKuisHomepageNoticeInfo(
KuisHomepageNoticeApiClient kuisHomepageNoticeApiClient,
KuisHomepageNoticeHtmlParser kuisHomepageNoticeHtmlParser,
KuisHomepageNoticeTextParser kuisHomepageNoticeTextParser,
KuisHomepageNoticeProperties kuisHomepageNoticeProperties
) {
this.noticeApiClient = kuisHomepageNoticeApiClient;
this.htmlParser = kuisHomepageNoticeHtmlParser;
this.textParser = kuisHomepageNoticeTextParser;
this.kuisHomepageNoticeProperties = kuisHomepageNoticeProperties;
this.siteId = 235;
this.categoryName = CategoryName.SCHOLARSHIP;
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
package com.kustacks.kuring.worker.scrap.noticeinfo;

import com.kustacks.kuring.notice.domain.CategoryName;
import com.kustacks.kuring.worker.parser.notice.KuisHomepageNoticeTextParser;
import com.kustacks.kuring.worker.scrap.client.notice.KuisHomepageNoticeApiClient;
import com.kustacks.kuring.worker.scrap.client.notice.property.KuisHomepageNoticeProperties;
import com.kustacks.kuring.worker.parser.notice.KuisHomepageNoticeHtmlParser;
@@ -11,10 +12,12 @@ public class StudentKuisHomepageNoticeInfo extends KuisHomepageNoticeInfo {
public StudentKuisHomepageNoticeInfo(
KuisHomepageNoticeApiClient kuisHomepageNoticeApiClient,
KuisHomepageNoticeHtmlParser kuisHomepageNoticeHtmlParser,
KuisHomepageNoticeTextParser kuisHomepageNoticeTextParser,
KuisHomepageNoticeProperties kuisHomepageNoticeProperties
) {
this.noticeApiClient = kuisHomepageNoticeApiClient;
this.htmlParser = kuisHomepageNoticeHtmlParser;
this.textParser = kuisHomepageNoticeTextParser;
this.kuisHomepageNoticeProperties = kuisHomepageNoticeProperties;
this.siteId = 238;
this.categoryName = CategoryName.STUDENT;
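
The diffs in this commit repeat one pattern: each `KuisHomepageNoticeInfo` subclass receives a text parser in its constructor and assigns it to the protected `textParser` field, and the base class exposes `parseText` as a thin delegate. The following is a minimal, self-contained sketch of that wiring, not the project's actual code. `PageText`, `TextParser`, `NoticeInfo`, and `BachelorNoticeInfo` are hypothetical stand-ins, the parser takes a plain `String` rather than a `Document`, and the `|`-separated input format is invented purely for illustration.

```java
import java.util.Objects;

public class TextParserWiringSketch {

    // Hypothetical stand-in for PageTextDto; the real DTO's fields are not shown in this diff.
    record PageText(String articleId, String title, String text) {}

    // Hypothetical stand-in for NoticeTextParserTemplate, simplified to take a raw String.
    interface TextParser {
        PageText parse(String rawPage);
    }

    // Mirrors the shape of the base class: a protected parser field and a delegating method.
    static class NoticeInfo {
        protected TextParser textParser;
        protected Integer siteId;

        PageText parseText(String rawPage) {
            return textParser.parse(rawPage);
        }
    }

    // Mirrors a subclass constructor: the parser is injected and stored on the inherited field.
    static class BachelorNoticeInfo extends NoticeInfo {
        BachelorNoticeInfo(TextParser textParser) {
            this.textParser = Objects.requireNonNull(textParser, "textParser");
            this.siteId = 234;
        }
    }

    public static void main(String[] args) {
        // Toy parser for an invented "id|title|body" format (assumption, not the real page layout).
        TextParser parser = raw -> {
            String[] parts = raw.split("\\|", 3);
            return new PageText(parts[0], parts[1], parts[2]);
        };

        NoticeInfo info = new BachelorNoticeInfo(parser);
        PageText page = info.parseText("1|Sample title|Sample body");
        System.out.println(page.title() + ": " + page.text());
    }
}
```

Because the field is assigned in every subclass rather than passed to a base-class constructor, a subclass that omits the assignment would only fail when `parseText` is first called. The null check in the sketch is one way to surface that earlier; it is an addition of this example, not something the diff does.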