Fixed gridstat -j <id> + prepared implem of TOO_MANY_RESUBMIT
bzizou committed Sep 23, 2024
1 parent aea9936 commit cf43528
Showing 5 changed files with 21 additions and 2 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -12,6 +12,7 @@ Unreleased
- Fixed notifications
- Fixed OAR_AUTO_RESUBMIT events (resubmit id could be missed)
- Fixed gridstat -f
+- Fixed gridstat -j <id>

version 3.2.0
-------------
1 change: 1 addition & 0 deletions bin/gridstat.rb
@@ -181,6 +181,7 @@
elsif cinfos
puts response.body
else
+j = JSON.parse(response.body)
Cigri::Client.print_job(j)
end
elsif jdl
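
The fix in this hunk appears to be that the response body is parsed into `j` before it is handed to `Cigri::Client.print_job`. A minimal sketch of that parse-then-print pattern follows. The helper `format_job` and the JSON shape (`id`, `state`) are assumptions for illustration only; the actual output of `Cigri::Client.print_job` is not shown in this commit.

```ruby
require "json"

# Hypothetical stand-in for Cigri::Client.print_job: it formats a parsed
# job hash. The keys "id" and "state" are assumed, not taken from CiGri.
def format_job(body)
  j = JSON.parse(body) # the body is a JSON string; it must be parsed first
  "Job #{j['id']}: #{j['state']}"
end

puts format_job('{"id": 42, "state": "running"}')
```

Passing the raw string instead of the parsed hash would fail as soon as the printer indexes into it by key, which is why the parse step matters.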
2 changes: 1 addition & 1 deletion lib/cigri-colombolib.rb
@@ -91,7 +91,7 @@ def check_clusters
if event.props[:class]=="cluster"
COLOMBOLIBLOGGER.debug("Checking event #{event.props[:code]}")
case event.props[:code]
-when "REQUEST_TOO_LARGE","POST_TIMEOUT","TIMEOUT", "CONNECTION_RESET", "SOCKET_ERROR", "CONNECTION_REFUSED", "HOST_UNREACHABLE", "SSL_ERROR", "GET_JOBS", "GET_JOB", "GET_MEDIA","GET_STRESS_FACTOR", "FILL_JOBS_CACHE", "RUNNER_GET_JOB_CHUNK_ERROR"
+when "REQUEST_TOO_LARGE","POST_TIMEOUT","TIMEOUT", "CONNECTION_RESET", "SOCKET_ERROR", "CONNECTION_REFUSED", "HOST_UNREACHABLE", "SSL_ERROR", "GET_JOBS", "GET_JOB", "GET_MEDIA","GET_STRESS_FACTOR", "FILL_JOBS_CACHE", "RUNNER_GET_JOB_CHUNK_ERROR", "TOO_MANY_RESUBMIT"
blacklist_cluster(event.id,event.props[:cluster_id],event.props[:campaign_id])
event.checked
when "CLUSTER_MANUALLY_DISABLED"
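
The change above adds `"TOO_MANY_RESUBMIT"` to the list of event codes that lead to `blacklist_cluster`. As a sketch of how such a `case`/`when` dispatch behaves, the example below maps a code to an action symbol. The constant `BLACKLISTING_CODES` (a shortened subset of the codes), the method `action_for`, and the returned symbols are hypothetical names introduced here; the real branch bodies are only partly visible in the hunk.

```ruby
# Hypothetical, shortened subset of the codes listed in the `when` line.
BLACKLISTING_CODES = %w[TIMEOUT CONNECTION_REFUSED TOO_MANY_RESUBMIT].freeze

# Returns a symbol naming the action a given event code would trigger.
def action_for(code)
  case code
  when *BLACKLISTING_CODES         # splat: matches any code in the list
    :blacklist_cluster
  when "CLUSTER_MANUALLY_DISABLED" # branch body not shown in the hunk
    :manually_disabled
  else
    :ignored
  end
end

puts action_for("TOO_MANY_RESUBMIT")
```

Keeping the codes in one array is a design choice for the sketch only; the commit itself extends the inline list.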
1 change: 0 additions & 1 deletion lib/cigri-iolib.rb
@@ -848,7 +848,6 @@ def get_campaign_tasks(dbh, id, limit, offset)
res << row.to_h
end
sth.finish
-IOLIBLOGGER.debug(res.inspect)
return res
end

18 changes: 18 additions & 0 deletions modules/updator.rb
@@ -63,6 +63,24 @@ def notify_judas
end
end

+##
+# Check for campaigns with too bad resubmit_rate
+##
+# TODO for each cluster! Because blacklisting is by cluster!
+#campaigns=Cigri::Campaignset.new
+#campaigns.get_running
+#campaigns.each do |campaign|
+# if campaign.resubmit_rate > 0.6
+# logger.info("campaign #{campaign.id} has a lot of resubmits!")
+# if campaign.tasks(100,0).length() > 10
+# event=Cigri::Event.new(:class => 'campaign', :state => 'open', :campaign_id => campaign.id,
+# :code => "TOO_MANY_RESUBMIT", :message => "Your campaign #{campaign.id} has too many resubmit jobs. Please, check the duration of your jobs and walltime, then kill and restart your campaign.")
+# notify_judas
+# Cigri::Colombo.new(event).check_clusters
+# end
+# end
+#end
+
##
# Autofix clusters
##
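
The commented-out block in this hunk outlines the intended check: a running campaign whose `resubmit_rate` exceeds 0.6 and that has more than 10 tasks should raise a `TOO_MANY_RESUBMIT` event. The sketch below isolates that selection logic so it can be run on its own. `CampaignSketch`, `too_many_resubmit_events`, and the plain hashes standing in for `Cigri::Event` are hypothetical; the thresholds are the ones in the commented code. It stays per campaign, so it does not address the TODO about checking per cluster.

```ruby
# Hypothetical stand-in for a campaign; task_count plays the role of
# campaign.tasks(100,0).length() in the commented-out code.
CampaignSketch = Struct.new(:id, :resubmit_rate, :task_count)

RESUBMIT_RATE_THRESHOLD = 0.6 # threshold from the commented-out code
MIN_TASKS = 10                # minimum task count from the commented-out code

# Builds one event description (a plain hash, not a Cigri::Event) for each
# campaign that exceeds both thresholds.
def too_many_resubmit_events(campaigns)
  flagged = campaigns.select do |c|
    c.resubmit_rate > RESUBMIT_RATE_THRESHOLD && c.task_count > MIN_TASKS
  end
  flagged.map do |c|
    { class: "campaign", state: "open", campaign_id: c.id,
      code: "TOO_MANY_RESUBMIT" }
  end
end

campaigns = [
  CampaignSketch.new(1, 0.8, 50), # high rate, enough tasks: flagged
  CampaignSketch.new(2, 0.8, 5),  # high rate, too few tasks: skipped
  CampaignSketch.new(3, 0.1, 50)  # low rate: skipped
]
too_many_resubmit_events(campaigns).each { |e| puts e.inspect }
```

Separating the selection from the notification and blacklisting side effects makes the thresholds easy to test before the block is enabled.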