Feat : Update staff scraping for the university website renewal and add job positions during scraping (#223)
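* fix : change the base URL of the staff pages to scrape (department homepages)

* remove : delete the Living Design and Communication Design parsers (unused)

* fix : update the department staff page parser (all departments except Real Estate)

* fix : change the Real Estate department parsing logic

* feat : change the information collected when scraping staff

* feat : update siteId and siteName for all departments

* feat : add a method to check whether staff information is supported

* feat : update getters for the newly added siteId and siteName fields

* remove : delete classes left unused after consolidating the staff scraping API client

* feat : change the department staff scraping API client logic

* feat : add job position information to the staff scraping process

* feat : revise the StaffUpdater scraping logic

* feat : apply distinct() to remove duplicate staff entries shared by the Pre-Veterinary and Veterinary Medicine departments

* feat : add HTML fixtures for tests (Computer Science and Engineering, Real Estate) and move legacy files

* test : write tests for the department staff scraping logic

* fix : revert to the logic from before the entity position field was added

* remove : delete the legacy StaffScraperTest

* remove : remove unnecessary comments

* test : create MockServerSupport

* remove : remove unnecessary DTOs

* test : add the new StaffScraperTest

* remove : remove unnecessary import statements

* feat : add StaffDTO identifier() method

* feat : scrape only the Pre-Veterinary department when scraping College of Veterinary Medicine staff

* feat : extract a phone number utility class

* feat : extract an email utility class

* test : add tests for the email utility class

* test : add tests for the phone number utility class

* feat : add the position column to the Staff table

* feat : add position to Staff & StaffDTO

* feat : add position comparison logic to StaffDTO

* feat : relax the email validation policy (allow empty values)

* feat : include the position when updating Staff

* feat : enable the staff scraping schedule to run once a month

* feat : separate the validation and conversion logic in EmailSupporter & PhoneNumberSupporter

* feat : add tests for the EmailSupporter & PhoneNumberSupporter validation methods

* feat : validate and convert email and phone when creating StaffDTO objects

* refactor : remove unnecessary import statements

* refactor : remove the public modifier from test classes and methods

* refactor : remove TODO keywords from comments and change replaceAll() -> replace()

* feat : change the StaffUpdate logic for the new position field

* refactor : remove unnecessary parentheses in lambdas

* feat : change the default stored value when there is no phone number ("-" -> "")

* fix : update the Staff domain tests for the StaffUpdate logic change (identifier(), phone number)

* test : update tests for the StaffUpdate logic change (identifier(), phone number, position)

* refactor : fix SonarQube issues (variable naming convention)

* remove : remove unnecessary print statements

* feat : change the staff scraping schedule interval to 1 minute (to be reverted to 30 minutes after testing)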
rlagkswn00 authored Dec 3, 2024
1 parent 9b605f6 commit a73c120
Showing 128 changed files with 7,099 additions and 14,741 deletions.
@@ -0,0 +1,68 @@
package com.kustacks.kuring.common.utils.converter;

import java.util.Arrays;
import java.util.regex.Pattern;

public class EmailSupporter {
    private static final Pattern AT_PATTERN = Pattern.compile("\\s+at\\s+");
    private static final Pattern DOT_PATTERN = Pattern.compile("\\s+dot\\s+");
    private static final Pattern EMAIL_PATTERN = Pattern.compile("^[a-zA-Z0-9_!#$%&'\\*+/=?{|}~^.-]+@[a-zA-Z0-9.-]+$");

    private static final String KONKUK_DOMAIN = "@konkuk.ac.kr";
    private static final String EMPTY_EMAIL = "";

    public static boolean isNullOrBlank(String email) {
        return email == null || email.isBlank();
    }

    public static String convertValidEmail(String email) {
        if (isNullOrBlank(email)) {
            return EMPTY_EMAIL;
        }

        String[] emailGroups = splitEmails(email);
        String[] normalizedEmails = normalizeEmails(emailGroups);

        // Among multiple addresses, prefer a konkuk one; otherwise take the first
        return selectPreferredEmail(normalizedEmails);
    }

    private static String[] splitEmails(String email) {
        return email.split("[/,]");
    }

    private static String[] normalizeEmails(String[] emailGroups) {
        return Arrays.stream(emailGroups)
                .map(EmailSupporter::normalizeEmail)
                .toArray(String[]::new);
    }

    private static String normalizeEmail(String email) {
        if (EMAIL_PATTERN.matcher(email).matches()) {
            return email;
        }

        if (containsSubstitutePatterns(email)) {
            return replaceSubstitutePatterns(email);
        }

        return EMPTY_EMAIL;
    }

    private static String replaceSubstitutePatterns(String email) {
        return email.replaceAll(DOT_PATTERN.pattern(), ".")
                .replaceAll(AT_PATTERN.pattern(), "@");
    }

    private static boolean containsSubstitutePatterns(String email) {
        return DOT_PATTERN.matcher(email).find() && AT_PATTERN.matcher(email).find();
    }

    // Prefer the Konkuk domain
    private static String selectPreferredEmail(String[] emails) {
        return Arrays.stream(emails)
                .filter(email -> email.endsWith(KONKUK_DOMAIN))
                .findFirst()
                .orElseGet(() -> emails.length > 0 ? emails[0] : EMPTY_EMAIL);
    }
}
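Reading `convertValidEmail` as written, two inputs behave in ways that may not be intended. After `split("[/,]")`, a space following the separator (as in `lee@gmail.com, lee@konkuk.ac.kr`) leaves ` lee@konkuk.ac.kr` with a leading space, which fails `EMAIL_PATTERN` and is normalized to `""`, so the gmail address wins. The fallback also takes `emails[0]` even when that entry is empty. The sketch below is a hypothetical variant (the class name `EmailNormalizationSketch` is illustrative, not part of the repository) that trims each part, validates the result of the `at`/`dot` rewrite against `EMAIL_PATTERN` instead of requiring both tokens up front, and ignores empty candidates.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical variant of EmailSupporter.convertValidEmail, not the repository's code.
final class EmailNormalizationSketch {
    private static final Pattern AT_PATTERN = Pattern.compile("\\s+at\\s+");
    private static final Pattern DOT_PATTERN = Pattern.compile("\\s+dot\\s+");
    private static final Pattern EMAIL_PATTERN =
            Pattern.compile("^[a-zA-Z0-9_!#$%&'\\*+/=?{|}~^.-]+@[a-zA-Z0-9.-]+$");
    private static final String KONKUK_DOMAIN = "@konkuk.ac.kr";

    static String convertValidEmail(String raw) {
        if (raw == null || raw.isBlank()) {
            return "";
        }
        String[] candidates = Arrays.stream(raw.split("[/,]"))
                .map(String::trim)                          // drop spaces around "/" and ","
                .map(EmailNormalizationSketch::normalize)
                .filter(email -> !email.isEmpty())          // discard parts that could not be parsed
                .toArray(String[]::new);
        return Arrays.stream(candidates)
                .filter(email -> email.endsWith(KONKUK_DOMAIN)) // prefer a konkuk address
                .findFirst()
                .orElse(candidates.length > 0 ? candidates[0] : "");
    }

    private static String normalize(String email) {
        if (EMAIL_PATTERN.matcher(email).matches()) {
            return email;
        }
        // Rewrite obfuscated forms such as "kim at konkuk dot ac dot kr", then re-check the result.
        String rewritten = AT_PATTERN.matcher(DOT_PATTERN.matcher(email).replaceAll(".")).replaceAll("@");
        return EMAIL_PATTERN.matcher(rewritten).matches() ? rewritten : "";
    }
}
```

With this variant, `"not an email / park@gmail.com"` resolves to `park@gmail.com`, whereas the original logic returns `""` because both trimmed-away parts normalize to the empty string.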
@@ -0,0 +1,42 @@
package com.kustacks.kuring.common.utils.converter;

import java.util.regex.Pattern;

public class PhoneNumberSupporter {

    private static final Pattern LAST_FOUR_NUMBER_PATTERN = Pattern.compile("\\d{4}");
    private static final Pattern FULL_NUMBER_PATTERN = Pattern.compile("02-\\d{3,4}-\\d{4}");
    private static final Pattern FULL_NUMBER_WITH_PARENTHESES_PATTERN = Pattern.compile("02[)]\\d{3,4}-\\d{4}");

    private static final String EMPTY_PHONE = "";

    public static boolean isNullOrBlank(String number) {
        return number == null || number.isBlank();
    }

    public static String convertFullExtensionNumber(String number) {
        if (isNullOrBlank(number)) {
            return EMPTY_PHONE;
        }

        if (FULL_NUMBER_PATTERN.matcher(number).matches()) {
            return number;
        }
        if (containsLastFourNumber(number)) {
            return "02-450-" + number;
        }
        if (containsParenthesesPattern(number)) {
            return number.replace(")", "-");
        }

        return EMPTY_PHONE;
    }

    private static boolean containsLastFourNumber(String number) {
        return LAST_FOUR_NUMBER_PATTERN.matcher(number).matches();
    }

    private static boolean containsParenthesesPattern(String number) {
        return FULL_NUMBER_WITH_PARENTHESES_PATTERN.matcher(number).matches();
    }
}
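The rules in `convertFullExtensionNumber` are applied in order: a full `02-xxx(x)-xxxx` number is kept, a bare four-digit extension gets the hard-coded `02-450-` prefix, and a `02)xxx(x)-xxxx` form has its parenthesis swapped for a hyphen; anything else becomes `""`. The standalone restatement below (the class `ExtensionNumberSketch` and method `convert` are illustrative names) makes those outcomes easy to check, including that a number written without the area code, such as `450-3456`, is dropped.

```java
import java.util.regex.Pattern;

// Illustrative restatement of PhoneNumberSupporter.convertFullExtensionNumber (names are hypothetical).
final class ExtensionNumberSketch {
    private static final Pattern LAST_FOUR = Pattern.compile("\\d{4}");
    private static final Pattern FULL = Pattern.compile("02-\\d{3,4}-\\d{4}");
    private static final Pattern WITH_PARENTHESIS = Pattern.compile("02[)]\\d{3,4}-\\d{4}");
    private static final String CAMPUS_PREFIX = "02-450-"; // prefix hard-coded in the diff

    static String convert(String number) {
        if (number == null || number.isBlank()) {
            return "";
        }
        if (FULL.matcher(number).matches()) {
            return number;                        // already "02-xxx(x)-xxxx"
        }
        if (LAST_FOUR.matcher(number).matches()) {
            return CAMPUS_PREFIX + number;        // bare four-digit extension
        }
        if (WITH_PARENTHESIS.matcher(number).matches()) {
            return number.replace(")", "-");      // "02)xxx(x)-xxxx" form
        }
        return "";                                // every other shape is dropped
    }

    public static void main(String[] args) {
        for (String input : new String[]{"3456", "02-450-3456", "02)2049-6012", "450-3456"}) {
            System.out.println(input + " -> \"" + convert(input) + "\"");
        }
    }
}
```

Because `matches()` requires the whole string to match, surrounding whitespace also defeats every rule; the parsers in this commit call `trim()` on the extracted text before it reaches this step.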
src/main/java/com/kustacks/kuring/staff/domain/Email.java (3 changes: 2 additions, 1 deletion)
@@ -30,7 +30,8 @@ public Email(String email) {
}

private boolean isValidEmail(String email) {
return !Objects.isNull(email) && patternMatches(email);
return Objects.nonNull(email) &&
(patternMatches(email) || Objects.equals(email,""));
}

private boolean patternMatches(String email) {
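After this change, `isValidEmail` still rejects `null` but accepts the exact empty string, which is what `EmailSupporter.convertValidEmail` returns for unusable input. The hunk does not show the pattern behind `patternMatches`, so the sketch below substitutes an assumed, simplified pattern (`ASSUMED_PATTERN` and the class name are hypothetical). Once `email` is known to be non-null, `email.isEmpty()` is equivalent to `Objects.equals(email, "")`.

```java
import java.util.Objects;
import java.util.regex.Pattern;

// Hypothetical stand-in for the relaxed Email.isValidEmail check.
final class EmailValidationSketch {
    // Assumption: the real Email class's pattern is not visible in this hunk.
    private static final Pattern ASSUMED_PATTERN = Pattern.compile("^[\\w.+-]+@[\\w.-]+$");

    static boolean isValidEmail(String email) {
        // null is still rejected; "" is accepted so staff without an address can be stored.
        return Objects.nonNull(email)
                && (email.isEmpty() || ASSUMED_PATTERN.matcher(email).matches());
    }
}
```

Note that only the exact empty string is let through; a string of spaces still has to satisfy the pattern, and fails this assumed one.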
src/main/java/com/kustacks/kuring/staff/domain/Phone.java (5 changes: 3 additions, 2 deletions)
@@ -22,13 +22,14 @@ public class Phone {
= Pattern.compile("(\\d{3,4})[-\\s]*(\\d{4})");
private static final String SEOUL_AREA_CODE = "02";
private static final String DELIMITER = "-";
private static final String EMPTY_NUMBER = "";

@Column(name = "phone", length = 64)
private String value;

public Phone(String phone) {
if(isEmptyNumbers(phone)) {
this.value = DELIMITER;
this.value = EMPTY_NUMBER;
return;
}

@@ -71,7 +72,7 @@ private boolean isValidNumbersAndSet(String phone) {
}

private static boolean isEmptyNumbers(String phone) {
return phone == null || phone.isBlank() || phone.equals(DELIMITER);
return phone == null || phone.isBlank();
}

public boolean isSameValue(String phone) {
src/main/java/com/kustacks/kuring/staff/domain/Staff.java (18 changes: 16 additions, 2 deletions)
@@ -29,6 +29,10 @@ public class Staff {
@Column(name = "lab", length = 64)
private String lab;

@Getter(AccessLevel.PUBLIC)
@Column(name = "position", length = 64)
private String position;

@Embedded
private Phone phone;

@@ -45,24 +49,26 @@ public class Staff {
private College college;

@Builder
private Staff(String name, String major, String lab, String phone, String email, String dept, String college) {
private Staff(String name, String major, String lab, String phone, String email, String dept, String college, String position) {
this.name = new Name(name);
this.major = major;
this.lab = lab;
this.phone = new Phone(phone);
this.email = new Email(email);
this.dept = dept;
this.college = College.valueOf(college);
this.position = position;
}

public void updateInformation(String name, String major, String lab, String phone, String email, String deptName, String college) {
public void updateInformation(String name, String major, String lab, String phone, String email, String deptName, String college, String position) {
this.name = new Name(name);
this.major = major;
this.lab = lab;
this.phone = new Phone(phone);
this.email = new Email(email);
this.dept = deptName;
this.college = College.valueOf(college);
this.position = position;
}

public String getEmail() {
@@ -105,6 +111,14 @@ public boolean isSameCollege(String collegeName) {
return this.college == College.valueOf(collegeName);
}

public boolean isSamePosition(String position) {
return this.position.equals(position);
}

public String identifier() {
return String.join(",", getName(), position, dept);
}

@Override
public boolean equals(Object o) {
if (this == o) return true;
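`isSamePosition` dereferences `this.position`, so it throws a `NullPointerException` if a `Staff` row is loaded with a `null` position. That could happen for rows stored before the new `position` column existed, depending on how the column was populated; the diff does not show a migration. `identifier()` does not throw in that case, because `String.join` renders a `null` element as the text `"null"`. The hypothetical, trimmed-down class below (`StaffPositionSketch` is not part of the repository) shows a null-safe comparison using `Objects.equals`.

```java
import java.util.Objects;

// Hypothetical stand-in for Staff, keeping only the position-related members.
final class StaffPositionSketch {
    private final String name;
    private final String position; // may be null for rows saved before the column existed
    private final String dept;

    StaffPositionSketch(String name, String position, String dept) {
        this.name = name;
        this.position = position;
        this.dept = dept;
    }

    boolean isSamePosition(String other) {
        // Objects.equals is null-safe, unlike this.position.equals(other)
        return Objects.equals(this.position, other);
    }

    String identifier() {
        // String.join renders a null element as "null" instead of throwing
        return String.join(",", name, position, dept);
    }
}
```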
@@ -1,8 +1,6 @@
package com.kustacks.kuring.worker.parser.staff;

import com.kustacks.kuring.worker.scrap.deptinfo.DeptInfo;
import com.kustacks.kuring.worker.scrap.deptinfo.art_design.CommunicationDesignDept;
import com.kustacks.kuring.worker.scrap.deptinfo.art_design.LivingDesignDept;
import com.kustacks.kuring.worker.scrap.deptinfo.real_estate.RealEstateDept;
import lombok.NoArgsConstructor;
import lombok.extern.slf4j.Slf4j;
@@ -18,33 +16,22 @@ public class EachDeptStaffHtmlParser extends StaffHtmlParserTemplate {

@Override
public boolean support(DeptInfo deptInfo) {
return !(deptInfo instanceof RealEstateDept) &&
!(deptInfo instanceof LivingDesignDept) &&
!(deptInfo instanceof CommunicationDesignDept);
return !(deptInfo instanceof RealEstateDept);
}

protected Elements selectStaffInfoRows(Document document) {
Element table = document.select(".photo_intro").get(0);
return table.getElementsByTag("dl");
return document.select(".row");
}

protected String[] extractStaffInfoFromRow(Element row) {
Elements infos = row.getElementsByTag("dd");

// Parse in order: professor name, position, specialization, lab, contact number, email
// Lab and contact number are often missing, so the child-node count is checked before accessing childNode
String name = infos.get(0).getElementsByTag("span").get(1).text();

String jobPosition = String.valueOf(infos.get(1).childNodeSize() < 2 ? "" : infos.get(1).childNode(1));
if (jobPosition.contains("명예") || jobPosition.contains("대우") || jobPosition.contains("휴직") || !jobPosition.contains("교수")) {
log.info("스크래핑 스킵 -> {} 교수", name);
return new String[]{};
}

String major = infos.get(2).childNodeSize() < 2 ? "" : String.valueOf(infos.get(2).childNode(1));
String lab = infos.get(3).childNodeSize() < 2 ? "" : String.valueOf(infos.get(3).childNode(1));
String phone = infos.get(4).childNodeSize() < 2 ? "" : String.valueOf(infos.get(4).childNode(1));
String email = infos.get(5).getElementsByTag("a").get(0).text();
return new String[]{name, major, lab, phone, email};
String name = row.select(".info .title .name").text();

Elements detailElement = row.select(".detail");
String jobPosition = detailElement.select(".ico1 dd").text().trim();
String major = detailElement.select(".ico2 dd").text().trim();
String lab = detailElement.select(".ico3 dd").text().trim();
String extensionNumber = detailElement.select(".ico4 dd").text().trim();
String email = detailElement.select(".ico5 dd").text().trim();
return new String[]{name, jobPosition, major, lab, extensionNumber, email};
}
}

This file was deleted.

@@ -14,23 +14,20 @@ public class RealEstateStaffHtmlParser extends StaffHtmlParserTemplate {
public boolean support(DeptInfo deptInfo) {
return deptInfo instanceof RealEstateDept;
}

protected Elements selectStaffInfoRows(Document document) {
Element table = document.select(".sub0201_list").get(0).getElementsByTag("ul").get(0);
return table.getElementsByTag("li");
return document.select(".row");
}

protected String[] extractStaffInfoFromRow(Element row) {
Element content = row.select(".con").get(0);

String name = content.select("dl > dt > a > strong").get(0).text();
String major = String.valueOf(content.select("dl > dd").get(0).childNode(4)).replaceFirst("\\s", "").trim();

Element textMore = content.select(".text_more").get(0);

String lab = String.valueOf(textMore.childNode(4)).split(":")[1].replaceFirst("\\s", "").trim();
String phone = String.valueOf(textMore.childNode(6)).split(":")[1].replaceFirst("\\s", "").trim();
String email = textMore.getElementsByTag("a").get(0).text();
return new String[]{name, major, lab, phone, email};
String name = row.select(".info .title .name").text();

Elements detalTagElement = row.select(".detail");
String jobPosition = detalTagElement.select("dt:contains(직위) + dd").text();
String major = detalTagElement.select("dt:contains(연구분야) + dd").text().trim();
String lab = detalTagElement.select("dt:contains(연구실) + dd").text().trim();
String extensionNumber = detalTagElement.select("dt:contains(연락처) + dd").text().trim();
String email = detalTagElement.select("dt:contains(이메일) + dd").text().trim();
return new String[]{name, jobPosition, major, lab, extensionNumber, email};
}
}

@@ -60,10 +60,11 @@ private static List<StaffDto> convertStaffDtos(DeptInfo deptInfo, List<String[]>
return parseResult.stream()
.map(oneStaffInfo -> StaffDto.builder()
.name(oneStaffInfo[0])
.major(oneStaffInfo[1])
.lab(oneStaffInfo[2])
.phone(oneStaffInfo[3])
.email(oneStaffInfo[4])
.position(oneStaffInfo[1])
.major(oneStaffInfo[2])
.lab(oneStaffInfo[3])
.phone(oneStaffInfo[4])
.email(oneStaffInfo[5])
.deptName(deptInfo.getDeptName())
.collegeName(deptInfo.getCollegeName()
).build()
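Both parsers now return `{name, jobPosition, major, lab, extensionNumber, email}`, so every index after `name` in `convertStaffDtos` shifts by one, and a parser that still returned the old five-element layout would fail with `ArrayIndexOutOfBoundsException` at `oneStaffInfo[5]`. The sketch below is one hedged way to make the layout explicit: hypothetical named index constants, a length check, and deduplication keyed like `Staff.identifier()` (name, position, department). The commit message mentions a `distinct()` step for duplicate veterinary staff, but the diff does not show how it is keyed, so the `LinkedHashMap`-based deduplication here is an assumption, and `ParsedStaffSketch` is an illustrative name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper mapping the parsers' six-element arrays to named fields.
final class ParsedStaffSketch {
    // Index order taken from the parsers' return statements in this commit
    static final int NAME = 0, POSITION = 1, MAJOR = 2, LAB = 3, PHONE = 4, EMAIL = 5;
    static final int FIELD_COUNT = 6;

    final String name, position, major, lab, phone, email;

    private ParsedStaffSketch(String[] row) {
        if (row.length != FIELD_COUNT) {
            throw new IllegalArgumentException("expected " + FIELD_COUNT + " fields but got " + row.length);
        }
        name = row[NAME];
        position = row[POSITION];
        major = row[MAJOR];
        lab = row[LAB];
        phone = row[PHONE];
        email = row[EMAIL];
    }

    // Same key shape as Staff.identifier(): name, position, department
    String identifier(String dept) {
        return String.join(",", name, position, dept);
    }

    // Keeps the first row per identifier, preserving scrape order
    static List<ParsedStaffSketch> fromRows(List<String[]> rows, String dept) {
        Map<String, ParsedStaffSketch> unique = new LinkedHashMap<>();
        for (String[] row : rows) {
            ParsedStaffSketch staff = new ParsedStaffSketch(row);
            unique.putIfAbsent(staff.identifier(dept), staff);
        }
        return new ArrayList<>(unique.values());
    }
}
```

Failing fast on a wrong-length row surfaces a parser/DTO mismatch at the boundary, instead of silently shifting `major` into `position` when the index order changes again.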