
Commit

Merge pull request #117 from Ljzd-PRO/devel
Bump to v0.7.0
Ljzd-PRO authored May 24, 2024
2 parents cc55ad6 + 497ff68 commit 2f3d642
Showing 14 changed files with 502 additions and 456 deletions.
18 changes: 13 additions & 5 deletions .github/workflows/codecov.yml
@@ -25,16 +25,24 @@ on:

jobs:
test:
name: Test Coverage
name: Test With Coverage
runs-on: ${{ matrix.os }}
concurrency:
group: test-coverage-${{ github.ref }}-${{ matrix.os }}-${{ matrix.python-version }}
cancel-in-progress: true
strategy:
matrix:
python-version: ["3.8", "3.9", "3.10", "3.11"]
os: [ubuntu-latest, windows-latest, macos-latest]
fail-fast: false
matrix:
# macos-14 aka. macos-latest has switched to being an ARM runner, only supporting newer versions of Python.
# https://github.com/actions/setup-python/issues/855#issuecomment-2096792205
os: [ ubuntu-latest, windows-latest, macos-13 ]
python-version: [ "3.8", "3.9", "3.10", "3.11" ]
include: # Use macos-latest here because it seems that Pydantic currently does not support macOS ARM64 on 3.11, 3.12
- os: macos-latest
python-version: "3.11"
exclude:
- os: macos-13
python-version: "3.11"
env:
PYTEST_REPORT_FILENAME: report-${{ matrix.os }}-${{ matrix.python-version }}.html
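
The effect of the `include`/`exclude` entries in the matrix above can be sketched in Python. This is a simplified model, not GitHub's implementation: it assumes the documented order (exclusions are applied first, inclusions afterwards) and treats every `include` entry as a new standalone job, which holds here because the entry would otherwise overwrite an existing `os` value. The helper name `expand_matrix` is hypothetical.

```python
from itertools import product


def expand_matrix(axes, include=(), exclude=()):
    """Simplified model of job-matrix expansion (hypothetical helper)."""
    keys = list(axes)
    # Cartesian product of all axis values.
    combos = [dict(zip(keys, values)) for values in product(*axes.values())]
    # Drop combinations that match every key/value pair of an exclude entry.
    combos = [
        combo for combo in combos
        if not any(all(combo.get(k) == v for k, v in entry.items()) for entry in exclude)
    ]
    # Assumption: each include entry conflicts with an axis value,
    # so it is appended as a separate job instead of being merged.
    combos.extend(dict(entry) for entry in include)
    return combos


jobs = expand_matrix(
    {
        "os": ["ubuntu-latest", "windows-latest", "macos-13"],
        "python-version": ["3.8", "3.9", "3.10", "3.11"],
    },
    include=[{"os": "macos-latest", "python-version": "3.11"}],
    exclude=[{"os": "macos-13", "python-version": "3.11"}],
)
```

Under these assumptions the job count stays at twelve: the `macos-13` / `3.11` pair is dropped and the `macos-latest` / `3.11` pair takes its place.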

@@ -51,7 +59,7 @@ jobs:
poetry run pytest -v --capture=sys --cov --cov-report=xml tests/
- name: Upload coverage report
uses: codecov/codecov-action@v3
uses: codecov/codecov-action@v4
env:
CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }}
with:
4 changes: 2 additions & 2 deletions .github/workflows/release.yml
@@ -37,9 +37,9 @@ jobs:
cd ..
- name: Release
uses: softprops/action-gh-release@v1
uses: softprops/action-gh-release@v2
if: startsWith(github.ref, 'refs/tags/')
with:
body_path: CHANGELOG.md
files: artifact/*.zip
prerelease: contains(github.ref, 'beta')
prerelease: ${{ contains(github.ref, 'beta') }}
46 changes: 22 additions & 24 deletions CHANGELOG.md
@@ -2,42 +2,40 @@

### 💡 Feature

- Add support for filename allow-list/block-list to filter downloaded files.
- Use Unix shell-style wildcards
- Edit `KTOOLBOX_JOB__ALLOW_LIST`, `KTOOLBOX_JOB__BLOCK_LIST` in `prod.env` or environment variables to set this option
- Add support for customizing filename:
- Edit `KTOOLBOX_JOB__FILENAME_FORMAT` in `prod.env` or environment variables to set this option (#116)
- 📖More information: [Configuration-Reference-JobConfiguration](https://ktoolbox.readthedocs.io/latest/configuration/reference/#ktoolbox.configuration.JobConfiguration)
```dotenv
# Only download files that match these patterns
KTOOLBOX_JOB__ALLOW_LIST=["*.jpg","*.jpeg","*.png"]
# Rename attachments in numerical order, e.g. `1.png`, `2.png`, ...
KTOOLBOX_JOB__SEQUENTIAL_FILENAME=True
# Do not download files that match these patterns
KTOOLBOX_JOB__BLOCK_LIST=["*.psd"]
# `{}`: Basic filename
# Can be used with the configuration option above.
# Rename attachments to `[2024-1-1]_1.png`, `[2024-1-1]_2.png`, ...
KTOOLBOX_JOB__FILENAME_FORMAT="[{published}]_{}"
```
- Default not to save `creator-indices.ktoolbox` (because it's useless now :(
- Change default post text content filename `index.html` to `content.txt`
### 🪲 Fix
- Fix a missing `Post.file.name` causing the downloaded file (`Post.file`) to be named `None`
[//]: # (### 🪲 Fix)
- - -
### 💡 新特性
- 增加文件名白名单/黑名单支持以进行下载文件的过滤
- 使用 Unix 风格通配符
- 在 `prod.env` 或环境变量中编辑 `KTOOLBOX_JOB__POST_DIRNAME_FORMAT` 以设置该选项
- 📖更多信息: [Configuration-Reference-JobConfiguration](https://ktoolbox.readthedocs.io/latest/configuration/reference/#ktoolbox.configuration.JobConfiguration)
- 支持自定义下载的文件名格式:
- 在 `prod.env` 或环境变量中编辑 `KTOOLBOX_JOB__FILENAME_FORMAT` 以设置该选项 (#116)
- 📖更多信息: [配置-参考-JobConfiguration](https://ktoolbox.readthedocs.io/latest/configuration/reference/#ktoolbox.configuration.JobConfiguration)
```dotenv
# 只下载匹配这些模式的文件
KTOOLBOX_JOB__ALLOW_LIST=["*.jpg","*.jpeg","*.png"]
# 按照数字顺序重命名附件, 例如 `1.png`, `2.png`, ...
KTOOLBOX_JOB__SEQUENTIAL_FILENAME=True
# 不下载匹配这些模式的文件
KTOOLBOX_JOB__BLOCK_LIST=["*.psd"]
# `{}`:基本文件名
# 可以和上面的配置选项搭配使用
# 附件将被重命名为 `[2024-1-1]_1.png`, `[2024-1-1]_2.png`, ...
KTOOLBOX_JOB__FILENAME_FORMAT="[{published}]_{}"
```
- 默认不保存 `creator-indices.ktoolbox` (因为它现在已经没什么用了 :(
### 🪲 修复
- 更改默认的作品文本内容文件名 `index.html` 为 `content.txt`
- 修复缺失 `Post.file.name` 可能导致下载文件(`Post.file`)被命名为 `None`
[//]: # (### 🪲 修复)
**Full Changelog**: https://github.com/Ljzd-PRO/KToolBox/compare/v0.5.2...v0.6.0
**Full Changelog**: https://github.com/Ljzd-PRO/KToolBox/compare/v0.6.0...v0.7.0
7 changes: 7 additions & 0 deletions docs/en/configuration/guide.md
@@ -18,6 +18,13 @@ KTOOLBOX_JOB__POST_STRUCTURE__ATTACHMENTS=./
# Rename attachments in numerical order, e.g. `1.png`, `2.png`, ...
KTOOLBOX_JOB__SEQUENTIAL_FILENAME=True
# Customize the filename format by inserting an empty `{}` to represent the basic filename.
# Similar to `post_dirname_format`, you can use some of the properties in `Post`.
# For example: `{title}_{}` > `HelloWorld_b4b41de2-8736-480d-b5c3-ebf0d917561b`, etc.
# You can also use it with `sequential_filename`. For instance,
# `[{published}]_{}` > `[2024-1-1]_1.png`, `[2024-1-1]_2.png`, etc.
KTOOLBOX_JOB__FILENAME_FORMAT=[{published}]_{}
# Prefix the post directory name with its release/publish date, e.g. `[2024-1-1]HelloWorld`
KTOOLBOX_JOB__POST_DIRNAME_FORMAT=[{published}]{title}
7 changes: 7 additions & 0 deletions docs/zh/configuration/guide.md
@@ -18,6 +18,13 @@ KTOOLBOX_JOB__POST_STRUCTURE__ATTACHMENTS=./
# 按照数字顺序重命名附件, 例如 `1.png`, `2.png`, ...
KTOOLBOX_JOB__SEQUENTIAL_FILENAME=True
# 通过插入一个代表了基本文件名的空白的 `{}` 以自定义文件名格式
# 与 `post_dirname_format` 类似,你可以使用一些 `Post` 类里的属性
# 例如 `{title}_{}` > `HelloWorld_b4b41de2-8736-480d-b5c3-ebf0d917561b`
# 你也可以和 `sequential_filename` 搭配使用
# 例如 `[{published}]_{}` > `[2024-1-1]_1.png`, `[2024-1-1]_2.png`
KTOOLBOX_JOB__FILENAME_FORMAT=[{published}]_{}
# 将发布日期作为作品目录名的开头,例如 `[2024-1-1]HelloWorld`
KTOOLBOX_JOB__POST_DIRNAME_FORMAT=[{published}]{title}
7 changes: 7 additions & 0 deletions example.env
@@ -8,6 +8,13 @@ KTOOLBOX_JOB__POST_STRUCTURE__ATTACHMENTS=./
# Rename attachments in numerical order, e.g. `1.png`, `2.png`, ...
KTOOLBOX_JOB__SEQUENTIAL_FILENAME=True

# Customize the filename format by inserting an empty `{}` to represent the basic filename.
# Similar to `post_dirname_format`, you can use some of the properties in `Post`.
# For example: `{title}_{}` > `HelloWorld_b4b41de2-8736-480d-b5c3-ebf0d917561b`, etc.
# You can also use it with `sequential_filename`. For instance,
# `[{published}]_{}` > `[2024-1-1]_1.png`, `[2024-1-1]_2.png`, etc.
KTOOLBOX_JOB__FILENAME_FORMAT=[{published}]_{}

# Prefix the post directory name with its release/publish date, e.g. `[2024-1-1]HelloWorld`
KTOOLBOX_JOB__POST_DIRNAME_FORMAT=[{published}]{title}
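
The variable names above suggest a naming convention: a `KTOOLBOX_` prefix, then `__` separating nested configuration levels (`job` > `filename_format`). A minimal standard-library sketch of that mapping follows. The prefix and delimiter are inferred from the names only, the helper `env_to_nested` is hypothetical, and the project's real settings loader is not shown in this diff.

```python
def env_to_nested(environ, prefix="KTOOLBOX_", delimiter="__"):
    """Map prefixed environment variables onto a nested dict (hypothetical helper)."""
    nested = {}
    for name, value in environ.items():
        if not name.startswith(prefix):
            continue  # Ignore unrelated variables such as PATH.
        parts = name[len(prefix):].lower().split(delimiter)
        node = nested
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        # Values stay as raw strings; type conversion is out of scope here.
        node[parts[-1]] = value
    return nested


settings = env_to_nested({
    "KTOOLBOX_JOB__SEQUENTIAL_FILENAME": "True",
    "KTOOLBOX_JOB__FILENAME_FORMAT": "[{published}]_{}",
    "KTOOLBOX_JOB__POST_DIRNAME_FORMAT": "[{published}]{title}",
    "PATH": "/usr/bin",
})
```

Here `settings["job"]["filename_format"]` holds the raw format string that the download step later fills in.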

2 changes: 1 addition & 1 deletion ktoolbox/__init__.py
@@ -1,4 +1,4 @@
__title__ = "KToolBox"
# noinspection SpellCheckingInspection
__description__ = "A useful CLI tool for downloading posts in Kemono.party / .su"
__version__ = "0.6.0"
__version__ = "0.7.0"
2 changes: 1 addition & 1 deletion ktoolbox/action/fetch.py
@@ -24,7 +24,7 @@ async def fetch_creator_posts(service: str, creator_id: str, o: int = 0) -> Asyn
:param creator_id: The ID of the creator
:param o: Result offset, stepping of 50 is enforced
:return: Async generator of several list of posts
:raise FetchInterruptError
:raise FetchInterruptError: Exception for interrupt of data fetching
"""
while True:
ret = await get_creator_post(service=service, creator_id=creator_id, o=o)
5 changes: 3 additions & 2 deletions ktoolbox/action/job.py
@@ -11,7 +11,7 @@

from ktoolbox._enum import PostFileTypeEnum, DataStorageNameEnum
from ktoolbox.action import ActionRet, fetch_creator_posts, FetchInterruptError
from ktoolbox.action.utils import generate_post_path_name, filter_posts_by_date
from ktoolbox.action.utils import generate_post_path_name, filter_posts_by_date, generate_filename
from ktoolbox.api.model import Post, Attachment
from ktoolbox.configuration import config, PostStructureConfiguration
from ktoolbox.job import Job, CreatorIndices
@@ -68,7 +68,8 @@ async def create_job_from_post(
config.job.block_list
)
):
alt_filename = f"{i + 1}{file_path_obj.suffix}" if config.job.sequential_filename else file_path_obj.name
basic_filename = f"{i + 1}{file_path_obj.suffix}" if config.job.sequential_filename else file_path_obj.name
alt_filename = generate_filename(post, basic_filename)
jobs.append(
Job(
path=attachments_path,
31 changes: 26 additions & 5 deletions ktoolbox/action/utils.py
@@ -8,32 +8,53 @@
from ktoolbox.configuration import config
from ktoolbox.job import CreatorIndices

__all__ = ["generate_post_path_name", "filter_posts_by_date", "filter_posts_by_indices"]
__all__ = ["generate_post_path_name", "generate_filename", "filter_posts_by_date", "filter_posts_by_indices"]

TIME_FORMAT = "%Y-%m-%d"


def generate_post_path_name(post: Post) -> str:
"""Generate directory name for post to save."""
if config.job.post_id_as_path or not post.title:
return post.id
else:
time_format = "%Y-%m-%d"
try:
return sanitize_filename(
config.job.post_dirname_format.format(
id=post.id,
user=post.user,
service=post.service,
title=post.title,
added=post.added.strftime(time_format) if post.added else "",
published=post.published.strftime(time_format) if post.published else "",
edited=post.edited.strftime(time_format) if post.edited else ""
added=post.added.strftime(TIME_FORMAT) if post.added else "",
published=post.published.strftime(TIME_FORMAT) if post.published else "",
edited=post.edited.strftime(TIME_FORMAT) if post.edited else ""
)
)
except KeyError as e:
logger.error(f"`JobConfiguration.post_dirname_format` contains invalid key: {e}")
exit(1)


def generate_filename(post: Post, basic_name: str) -> str:
"""Generate download filename"""
try:
return sanitize_filename(
config.job.filename_format.format(
basic_name,
id=post.id,
user=post.user,
service=post.service,
title=post.title,
added=post.added.strftime(TIME_FORMAT) if post.added else "",
published=post.published.strftime(TIME_FORMAT) if post.published else "",
edited=post.edited.strftime(TIME_FORMAT) if post.edited else ""
)
)
except KeyError as e:
logger.error(f"`JobConfiguration.filename_format` contains invalid key: {e}")
exit(1)


def _match_post_date(
post: Post,
start_date: Optional[datetime],
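
The formatting step in `generate_filename` relies on `str.format` receiving the basic filename as the single positional argument (the empty `{}`) and the post properties as keyword arguments. The sketch below isolates that behaviour. It is an assumption-laden reduction: `render_filename` is a hypothetical stand-in, only `title` and `published` are modelled, and the `sanitize_filename` call from the diff is left out.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d"


def render_filename(filename_format, basic_name, *, title="", published=None):
    """Fill `{}` with the basic filename and named fields with post data (hypothetical)."""
    return filename_format.format(
        basic_name,
        title=title,
        published=published.strftime(TIME_FORMAT) if published else "",
    )


published = datetime(2024, 1, 1)
sequential = [
    render_filename("[{published}]_{}", f"{i + 1}.png", published=published)
    for i in range(2)
]

# An unknown field name raises KeyError, the case the diff logs before exiting.
try:
    render_filename("{author}_{}", "1.png")
except KeyError as error:
    invalid_key = error.args[0]
```

Because `TIME_FORMAT` is `%Y-%m-%d`, the date comes out zero-padded in this sketch (`[2024-01-01]_1.png`), not `[2024-1-1]_1.png` as the documentation examples write it. A format string with a second empty `{}` would raise `IndexError` instead of `KeyError`, which the `except KeyError` branch shown in the diff would not catch.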
13 changes: 10 additions & 3 deletions ktoolbox/configuration.py
@@ -109,10 +109,10 @@ class PostStructureConfiguration(BaseModel):
```
:ivar attachments: Sub path of attachment directory
:ivar content_filepath: Sub path of post content HTML file
:ivar content_filepath: Sub path of post content file
"""
attachments: Path = Path("attachments")
content_filepath: Path = Path("index.html")
content_filepath: Path = Path("content.txt")


class JobConfiguration(BaseModel):
@@ -134,12 +134,18 @@ class JobConfiguration(BaseModel):
:ivar count: Number of coroutines for concurrent download
:ivar post_id_as_path: (**Deprecated**) Use post ID as post directory name
:ivar post_dirname_format: Customize the post directory name format, you can use some of the \
[properties](/configuration/reference/#ktoolbox.configuration.JobConfiguration) in ``Post``. \
[properties][ktoolbox.configuration.JobConfiguration] in ``Post``. \
e.g. ``[{published}]{id}`` > ``[2024-1-1]123123``, ``{user}_{published}_{title}`` > ``234234_2024-1-1_HelloWorld``
:ivar post_structure: Post path structure
:ivar mix_posts: Save all files from different posts at same path in creator directory. \
It does not create any post directory, and ``CreatorIndices`` is not recorded.
:ivar sequential_filename: Rename attachments in numerical order, e.g. ``1.png``, ``2.png``, ...
:ivar filename_format: Customize the filename format by inserting an empty ``{}`` to represent the basic filename.
Similar to post_dirname_format, you can use some of the [properties][ktoolbox.configuration.JobConfiguration] \
in Post. For example: ``{title}_{}`` could result in filenames like \
``HelloWorld_b4b41de2-8736-480d-b5c3-ebf0d917561b``, ``HelloWorld_af349b25-ac08-46d7-98fb-6ce99a237b90``, etc. \
You can also use it with ``sequential_filename``. For instance, \
``[{published}]_{}`` could result in filenames like ``[2024-1-1]_1.png``, ``[2024-1-1]_2.png``, etc.
:ivar allow_list: Download files which match these patterns (Unix shell-style), e.g. ``["*.png"]``
:ivar block_list: Not to download files which match these patterns (Unix shell-style), e.g. ``["*.psd","*.zip"]``
"""
@@ -149,6 +155,7 @@ class JobConfiguration(BaseModel):
post_structure: PostStructureConfiguration = PostStructureConfiguration()
mix_posts: bool = False
sequential_filename: bool = False
filename_format: str = "{}"
allow_list: Set[str] = set()
block_list: Set[str] = set()
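
The `allow_list` and `block_list` fields hold Unix shell-style patterns. One plausible way to combine them is sketched below with the standard-library `fnmatch` module. The precedence rule (an empty allow list admits everything, the block list is checked afterwards) and the helper name `should_download` are assumptions; the project's actual filter is only partly visible in this diff.

```python
from fnmatch import fnmatchcase


def should_download(filename, allow_list=(), block_list=()):
    """Decide whether a file passes the allow/block patterns (hypothetical helper)."""
    # Assumption: an empty allow list means "allow everything".
    if allow_list and not any(fnmatchcase(filename, pattern) for pattern in allow_list):
        return False
    # Assumption: a block-list match always rejects the file.
    return not any(fnmatchcase(filename, pattern) for pattern in block_list)


allow = {"*.jpg", "*.jpeg", "*.png"}
block = {"*.psd"}
results = {
    name: should_download(name, allow, block)
    for name in ("cover.png", "layers.psd", "notes.txt")
}
```

`fnmatchcase` is used so that matching is case-sensitive on every platform; `fnmatch.fnmatch` would normalise case on Windows.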

2 changes: 1 addition & 1 deletion ktoolbox/downloader/downloader.py
@@ -144,7 +144,7 @@ async def run(
:param tqdm_class: ``tqdm`` class to replace default ``tqdm.asyncio.tqdm``
:param progress: Show progress bar
:return: ``DownloaderRet`` which contain the actual output filename
:raise CancelledError
:raise CancelledError: Job cancelled
"""
# Get filename to check if file exists (First-time duplicate file check)
# Check it before the request to make the process more efficient