compatibility with SAS drives #22
Problem:

In the current version, my SAS drive had empty fields:

* Drive Model
* Serial Number
* Short/Extended test duration

The script got stuck after the first SMART short test, reporting it would wait for 0 seconds.

Assumptions:

* smartctl reports different values in different fields on SAS drives (e.g. smartctl --capabilities only reports "SCSI device successfully opened")
* Every drive offers a "short" test. SAS drives do not provide information about that
* Every drive offers a "long" test. ATA drives will translate this into "extended"
* A burn-in is not time critical but must be exhaustive

Actions:

* Made grep case insensitive for Serial number on SAS vs Serial Number on ATA
* Changed "discard first two columns" to "discard everything until first colon" in get_smart_info_value
* Added colon at the end of every inquired smart info
* Changed test behavior "if success in smartctl then success; else if error in smartctl then error" to "if ATA error in smartctl then error; else if no test in progress then success" in poll_selftest_complete

Remaining problems:

* For SAS drives it will still print and log "waiting 0 seconds for test completion"

Propositions:

* Show Progress of SMART Test / remaining % instead of a fixed "waiting for ### seconds"
* Print and log actual time until completion instead of reported test duration, or maybe both
* As even SAS drives report a "long test duration", POLL_TIMEOUT_SECONDS could be set relative to that (like 1.5x), and not fixed to 4 hours.
* Maybe warn on POLL_TIMEOUT and report time tested until now, instead of abort?
* Switch to some other output form of smartctl, like json, and parse it with something like jq

INFO ABOUT RUNNING TEST:

SAS running short test

    smartctl -l selftest /dev/sdc
    smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.14.15-arch1-1] (local build)
    Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

    === START OF READ SMART DATA SECTION ===
    Self-test execution status:      84% of test remaining
    SMART Self-test log
    Num  Test              Status                     segment  LifeTime  LBA_first_err [SK ASC ASQ]
         Description                                  number   (hours)
    # 1  Background short  Self test in progress ...     -       NOW         -         [-   -    -]
    # 2  Background short  Completed                     -     29626         -         [-   -    -]
    # 3  Background short  Completed                     -     29626         -         [-   -    -]
    # 4  Background short  Completed                     -     29625         -         [-   -    -]
    # 5  Background short  Completed                     -     29625         -         [-   -    -]
    # 6  Background long   Completed                     -     29625         -         [-   -    -]
    # 7  Background short  Completed                     -     29612         -         [-   -    -]
    # 8  Background long   Completed                     -     29608         -         [-   -    -]

    Long (extended) Self-test duration: 3772 seconds [62.9 minutes]

SAS not running short test: "Self-test execution status" line is missing

ATA running extended test:

    smartctl -c /dev/sdg
    smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.14.15-arch1-1] (local build)
    Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

    === START OF READ SMART DATA SECTION ===
    General SMART Values:
    Offline data collection status:  (0x05) Offline data collection activity
                                            was aborted by an interrupting command from host.
                                            Auto Offline Data Collection: Disabled.
    Self-test execution status:      ( 242) Self-test routine in progress...
                                            20% of test remaining.
    Total time to complete Offline
    data collection:                 (  45) seconds.
    Offline data collection
    capabilities:                    (0x5b) SMART execute Offline immediate.
                                            Auto Offline data collection on/off support.
                                            Suspend Offline collection upon new command.
                                            Offline surface scan supported.
                                            Self-test supported.
                                            No Conveyance Self-test supported.
                                            Selective Self-test supported.
    SMART capabilities:            (0x0003) Saves SMART data before entering
                                            power-saving mode.
                                            Supports SMART auto save timer.
    Error logging capability:        (0x01) Error logging supported.
                                            General Purpose Logging supported.
    Short self-test routine
    recommended polling time:        (   2) minutes.
    Extended self-test routine
    recommended polling time:        ( 113) minutes.
    SCT capabilities:              (0x003d) SCT Status supported.
                                            SCT Error Recovery Control supported.
                                            SCT Feature Control supported.
                                            SCT Data Table supported.

ATA drive after abort:

    Self-test execution status:      (  25) The self-test routine was aborted by
                                            the host.

ATA drive after successful short test:

    Self-test execution status:      (   0) The previous self-test routine completed
                                            without error or no self-test has ever
                                            been run.

SMART INFO ON SAS DRIVE

    # echo $SMART_INFO
    smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.14.15-arch1-1] (local build)
    Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

    === START OF INFORMATION SECTION ===
    Vendor:               TOSHIBA
    Product:              AL13SXB600N
    Revision:             5202
    Compliance:           SPC-3
    User Capacity:        600,127,266,816 bytes [600 GB]
    Logical block size:   512 bytes
    Rotation Rate:        15000 rpm
    Form Factor:          2.5 inches
    Logical Unit id:      0x500003975861d374
    Serial number:        X6S0A03FFIYA
    Device type:          disk
    Transport protocol:   SAS (SPL-3)
    Local Time is:        Sat Nov 20 14:18:01 2021 UTC
    SMART support is:     Available - device has SMART capability.
    SMART support is:     Enabled
    Temperature Warning:  Disabled or Not Supported

FYI:

    # smartctl -A /dev/sdc
    smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.14.15-arch1-1] (local build)
    Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

    === START OF READ SMART DATA SECTION ===
    Current Drive Temperature:     51 C
    Drive Trip Temperature:        65 C

    Accumulated power on time, hours:minutes 29625:34
    Manufactured in week 43 of year 2016
    Specified cycle count over device lifetime:  50000
    Accumulated start-stop cycles:  15
    Specified load-unload count over device lifetime:  600000
    Accumulated load-unload cycles:  15
    Elements in grown defect list: 0

FYI to get info like ATA "reallocated sector" or "pending sector" on SAS devices:

    # smartctl -l error /dev/sdc
    smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.14.15-arch1-1] (local build)
    Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

    === START OF READ SMART DATA SECTION ===
    Error counter log:
               Errors Corrected by           Total   Correction     Gigabytes    Total
                   ECC          rereads/    errors   algorithm      processed    uncorrected
               fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
    read:          0     2567      1986      1986          0     993683.879           0
    write:         0        0         0         0          0      49874.645           0
    verify:        0        0         0         0          0     106222.531           0

    Non-medium error count:        0
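For readers who want to see the idea without opening the diff, here is a minimal sketch of the parsing and polling changes listed under Actions, plus the relative timeout from Propositions. It is not the code from this PR. Assumptions: awk is used instead of grep to keep the example short, the three-state result (error / running / done) and the helper poll_timeout_seconds are hypothetical, and only the status strings quoted in the outputs above are matched.

```shell
#!/bin/sh
# Illustrative sketch only; function bodies are NOT taken from the PR.

# Print the value of the first "Label: value" line in smartctl text.
#   $1 = captured smartctl output, $2 = label without the trailing colon
# The label match is case-insensitive ("Serial number" vs "Serial Number"),
# and everything up to the FIRST colon is discarded, so values that contain
# colons themselves (e.g. "Local Time is:") survive intact.
get_smart_info_value() {
  printf '%s\n' "$1" | awk -v label="$2" '
    BEGIN { want = tolower(label) ":" }
    {
      line = $0
      sub(/^[ \t]+/, "", line)
      if (!found && index(tolower(line), want) == 1) {
        sub(/^[^:]*:[ \t]*/, "", line)   # drop "Label:" and padding
        sub(/[ \t]+$/, "", line)         # drop trailing whitespace
        print line
        found = 1
      }
    }'
}

# Classify self-test state from smartctl text: prints error, running or done.
# Order follows the PR description: an ATA error wins; otherwise "no test in
# progress" counts as success. The error pattern is deliberately narrow,
# because ATA output can also say "Offline data collection activity was
# aborted", which is not a self-test failure.
selftest_state() {
  printf '%s\n' "$1" | awk '
    { text = tolower($0) }
    text ~ /self-test routine was aborted/ { err = 1 }
    text ~ /of test remaining|self[- ]test( routine)? in progress/ { running = 1 }
    END {
      if (err) print "error"
      else if (running) print "running"
      else print "done"
    }'
}

# Hypothetical helper: 1.5x the reported long-test duration in seconds,
# falling back to the fixed 4 hours (14400 s) when no duration is reported.
poll_timeout_seconds() {
  duration=$(get_smart_info_value "$1" 'Long (extended) Self-test duration')
  seconds=${duration%% *}                # "3772 seconds [...]" -> "3772"
  case "$seconds" in
    '' | *[!0-9]*) echo 14400 ;;
    *) echo $(( ${seconds} * 3 / 2 )) ;;
  esac
}
```

Each function takes already-captured smartctl text as its first argument, for example `get_smart_info_value "$SMART_INFO" 'Serial Number'`, so the logic can be tried against the sample outputs above without a drive attached.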
Excellent! I unfortunately forgot about this and no longer have SAS drives to work with, but glad to see this getting worked on. EDIT: Actually, I do have some SAS drives I'll be replacing with SATA, so I'll run this on those and see how it goes.
I came here with the same need for SAS support and am testing this patch out now, but I wanted to suggest that perhaps a better way forward is to make use of the
I still have one SAS drive model lying around for testing and would be open to contributing to a rewrite. @Spearfoot seems to be inactive since 10/2021, though...
@ciscam your version is working brilliantly for me. I have 5 SAS WD Ultrastars from 2018 that I got from eBay... 50k hours, 17 power cycles. Script is working great. Shame @Spearfoot seems to no longer be maintaining this repo.
We might look into forking?
I apologize for letting this project wither for lack of attention... Work and Life have gotten in the way.
Thanks for the kind words @gfilicetti.
I also have had success with SAS drives using ciscam's PR, after the original script errored out. I have done 2 types of drives (4 drives total):
* 2x 6 TB Seagate Exos Enterprise
* 2x 12 TB HGST WD Ultrastar DC HC520