Always be Killed #77

Open
m4ra7h0n opened this issue Jul 30, 2024 · 0 comments
m4ra7h0n commented Jul 30, 2024

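I want to crawl a little over 2,000 URLs, but gospider keeps getting killed, so I had to write a script that works out which URLs have not been crawled yet: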

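import os
import sys

# Working directory holding gs_output/ and sub_alive.txt; defaults to the
# current directory when no argument is given.
base = sys.argv[1] if len(sys.argv) > 1 else '.'

# Output file names are assumed to be the crawled host with '.' replaced by '_'.
domains_ok = [name.replace('_', '.') for name in os.listdir(os.path.join(base, 'gs_output'))]

# Keep every URL that does not contain an already-crawled domain.
with open(os.path.join(base, 'sub_alive.txt'), 'r') as f:
    urls = f.read().split()
domains_not_ok = {url for url in urls if not any(d in url for d in domains_ok)}

# Write the remaining URLs as the input list for the next run.
with open(os.path.join(base, 'gs_continue.txt'), 'w') as f:
    f.write('\n'.join(sorted(domains_not_ok)))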
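
The substring check in the script above can mark a URL as done when only a related domain has been crawled (`example.com` is a substring of `https://sub.example.com`). A minimal sketch of an exact host comparison follows. It assumes, as the script above does, that each output file is named after the host with dots replaced by underscores; `host_key` and `pending_urls` are hypothetical helper names, not part of gospider.

```python
from urllib.parse import urlsplit


def host_key(url):
    """Return the URL's host in the assumed output-file form (dots -> underscores)."""
    # urlsplit only fills in hostname when '//' is present, so add it for bare hosts.
    if '//' not in url:
        url = '//' + url
    host = urlsplit(url).hostname or ''
    return host.replace('.', '_')


def pending_urls(urls, output_names):
    """Keep only URLs whose host has no matching output file (exact match)."""
    done = set(output_names)
    return [u for u in urls if host_key(u) not in done]
```

Converting each URL to the file-name form, rather than converting file names back to hosts, also avoids guessing which underscores in a file name were originally dots.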

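Could this be fixed so that the process stops getting killed by the system? I have rerun it about 10 times this way and the 2,000 URLs are still not fully crawled.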

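For now I work around the problem with systemd: it restarts gospider automatically and then updates the list of domains that have not been crawled. I hope the code gets updated soon.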

[Unit]
Description=My Go Application

[Service]
# Path to your Go application executable
ExecStart=/usr/lib/golang/bin/gospider -S /root/assets/dell/gs_continue.txt -o /root/assets/dell/gs_output -c 4 -d 2 --other-source --subs --sitemap --robots

# Command to run after the service stops
ExecStopPost=/usr/local/python3/bin/python3 /root/tools/gs_continue.py /root/assets/dell

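# KillMode controls how systemd signals the unit's processes when stopping it. With mixed, the stop signal (SIGTERM by default, see KillSignal below) is sent to the main process only; if the unit has not exited when the stop timeout expires, SIGKILL is sent to all remaining processes in the unit's control group.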
KillMode=mixed

# Signal systemd sends first when stopping the service. SIGINT is the signal normally generated when a user interrupts a program from the keyboard (Ctrl+C).
KillSignal=SIGINT

# Maximum number of tasks (processes and threads) the unit may create
TasksMax=infinity

# Memory limit (set to 7.5G here)
MemoryMax=7.5G

# CPU time limit (95% of one CPU here)
CPUQuota=95%

# How long to wait for the service to stop before it is killed forcibly
TimeoutStopSec=10

# Restart the service on failure
Restart=on-failure
RestartSec=10s

[Install]
WantedBy=multi-user.target