Compute-Cluster: Difference between revisions
Revision as of 19:43, 17 June 2022
compute0.hateotu.de
- Fujitsu RX300 S4
compute0.hateotu.de 10.204.3.220 Proxmox VE 6 SSH @ 22, HTTPS @ 8006
- 48GB memory, 12* 4GB DDR2 FB
- 2* E5420 (4+0 cores per socket, @2.5GHz)
- SAS1
- LSI SAS1068E-based controller, flashed to HBA mode
- 2* 73GB 15k 2.5"
- FC
- FC-HBA Emulex Zephyr-X 2*4G
- 10:00:00:00:c9:77:e8:6c -> sw1-1 p1
- 10:00:00:00:c9:77:e8:6d -> free
- FC-HBA QLogic QLE2460 1* 4G
- 21:00:00:1b:32:09:b9:b6 -> sw2-1 p3
- FC-HBA Emulex Zephyr-X 2*4G
- storage
- OS on ZFS, mirror of 2* 73GB
- imported LUNs from storage0 via redundant FC
- multipathd
out-of-band management
irmc-compute0.hateotu.de 10.204.3.225 with KVM license
compute1.hateotu.de
- Fujitsu RX300 S4
compute1.hateotu.de 10.204.3.221 Proxmox VE 6 SSH @ 22, HTTPS @ 8006
- 48GB memory, 12* 4GB DDR2 FB
- 2* E5420 (4+0 cores per socket, @2.5GHz)
- SAS1
- LSI SAS1068E-based controller, flashed to HBA mode
- 2* 73GB 15k 2.5"
- FC
- FC-HBA Emulex Zephyr-X 2*4G
- 10:00:00:00:c9:77:e3:90 -> sw1-1 p2
- 10:00:00:00:c9:77:e3:91 -> free
- FC-HBA QLogic QLE2460 1* 4G
- 21:00:00:1b:32:09:68:b2 -> sw2-1 p1
- FC-HBA Emulex Zephyr-X 2*4G
- storage
- OS on ZFS, mirror of 2* 73GB
- imported LUNs from storage0 via redundant FC
- multipathd
out-of-band management
irmc-compute1.hateotu.de 10.204.3.226 with KVM license
compute2.hateotu.de
- Fujitsu RX300 S4
compute2.hateotu.de 10.204.3.222 Proxmox VE 6 SSH @ 22, HTTPS @ 8006
- 48GB memory, 12* 4GB DDR2 FB
- 2* E5420 (4+0 cores per socket, @2.5GHz)
- SAS1
- LSI SAS1068E-based controller, flashed to HBA mode
- 2* 73GB 15k 2.5"
- FC
- FC-HBA Emulex Zephyr-X 2*4G
- 10:00:00:00:c9:7c:95:0a -> sw1-1 p3
- 10:00:00:00:c9:7c:95:09 -> free
- FC-HBA QLogic QLE2460 1* 4G
- 21:00:00:1b:32:89:c4:62 -> sw2-1 p2
- FC-HBA Emulex Zephyr-X 2*4G
- storage
- OS on ZFS, mirror of 2* 73GB
- imported LUNs from storage0 via redundant FC
- multipathd
out-of-band management
irmc-compute2.hateotu.de 10.204.3.227 with KVM license
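The FC cabling listed for the three compute nodes can be sanity-checked with a small script. The following is only a sketch: the tuples are copied from the WWPN lines above, the switch names `sw1-1` and `sw2-1` are taken as written there (how they map to fcsw01/fcsw02 is not stated on this page), and the helper names are made up for illustration.

```python
from collections import Counter, defaultdict

# (host, WWPN, switch, port) as listed per node above; "free" HBA ports are omitted.
LINKS = [
    ("compute0", "10:00:00:00:c9:77:e8:6c", "sw1-1", 1),
    ("compute0", "21:00:00:1b:32:09:b9:b6", "sw2-1", 3),
    ("compute1", "10:00:00:00:c9:77:e3:90", "sw1-1", 2),
    ("compute1", "21:00:00:1b:32:09:68:b2", "sw2-1", 1),
    ("compute2", "10:00:00:00:c9:7c:95:0a", "sw1-1", 3),
    ("compute2", "21:00:00:1b:32:89:c4:62", "sw2-1", 2),
]


def switches_per_host(links):
    """Map each host to the set of switches it is cabled to."""
    result = defaultdict(set)
    for host, _wwpn, switch, _port in links:
        result[host].add(switch)
    return dict(result)


def duplicate_ports(links):
    """Return (switch, port) pairs that are assigned more than once."""
    counts = Counter((switch, port) for _host, _wwpn, switch, port in links)
    return sorted(pair for pair, n in counts.items() if n > 1)


if __name__ == "__main__":
    for host, switches in sorted(switches_per_host(LINKS).items()):
        status = "redundant" if len(switches) >= 2 else "SINGLE PATH"
        print(f"{host}: {sorted(switches)} -> {status}")
    print("duplicate switch ports:", duplicate_ports(LINKS))
```

A host cabled to only one switch, or a switch port entered twice, would show up here before it shows up as a missing path in multipathd.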
storage0.hateotu.de
- Fujitsu FibreCAT SX80
limited to ~2.1TB per disk
862820-0807D5276D 500C0FF0D52C7A3C 500C0FF0DA6C263C 500C0FF0DA69103C storage0.hateotu.de 10.204.3.223 00:c0:ff:d5:27:6d HTTPS @ 443
- master shelf + 2* disk shelf, 12 FC-HDDs (via interposer) 3.5" each
- 18* 146GB
- 18* 450GB
- RAIDs
- vdisk146_0: 8*146GB RAID6 -> ~880GB
- vdisk146_1: 8*146GB RAID6 -> ~880GB
- + global spares: 2* 146GB
- vdisk450_0: 8*450GB RAID6 -> ~2700GB
- vdisk450_1: 8*450GB RAID6 -> ~2700GB
- + global spares: 2* 450GB
- SUM 7160 GB = 7.16 TB ≈ 6.51 TiB
- LUNs exported via FC & merged via multipathd, then configured for shared LVM
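The vdisk sizes above follow from RAID6 giving up two disks' worth of capacity per vdisk. A rough sketch of that arithmetic, assuming nominal decimal disk sizes (146 GB and 450 GB) and ignoring any controller overhead; the function name is hypothetical. With nominal sizes the 146GB vdisks come out at 876 GB, which the list above rounds to ~880GB, so the total here is slightly below the rounded sum.

```python
GB = 10**9   # decimal gigabyte, as used on disk labels
TIB = 2**40  # binary tebibyte


def raid6_usable_gb(disks: int, disk_gb: int) -> int:
    """Usable capacity of a RAID6 set: two disks' worth goes to parity."""
    if disks < 4:
        raise ValueError("RAID6 needs at least 4 disks")
    return (disks - 2) * disk_gb


# vdisk name -> (number of disks, nominal disk size in GB), as listed above
VDISKS = {
    "vdisk146_0": (8, 146),
    "vdisk146_1": (8, 146),
    "vdisk450_0": (8, 450),
    "vdisk450_1": (8, 450),
}

usable = {name: raid6_usable_gb(n, size) for name, (n, size) in VDISKS.items()}
total_gb = sum(usable.values())
total_tib = total_gb * GB / TIB

if __name__ == "__main__":
    for name, gb in usable.items():
        print(f"{name}: {gb} GB")
    print(f"total: {total_gb} GB = {total_tib:.2f} TiB")
```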
Info 2020-12-22 18:08:51 58 A1893432 Disk detected error (Channel:0 ID:19 SN:3QQ1T0H800009004Y70Q Encl:1 Slot:3) Key,Code,Qual=(01h,18h,08h) cdb:Rd 0005bd80 0080 Info:0005bde8h CmdSpc:0h FRU:0h SnsKeySpc:800096h Recovered Error
Info 2020-01-19 18:46:55 58 A1887662 Disk detected error (Channel:0 ID:37 SN:3QQ1WFJ500009004YAXC Encl:2 Slot:5) Key,Code,Qual=(01h,18h,01h) cdb:Rd 00046480 0080 Info:0004648eh CmdSpc:0h FRU:0h SnsKeySpc:800039h Recovered Error recovered data with error corr. & retries applied
Info 2021-10-13 07:40:37 58 A1897520 Disk detected error (Channel:0 ID:37 SN:3QQ1WFJ500009004YAXC Encl:2 Slot:5) Key,Code,Qual=(01h,17h,02h) cdb:Rd 0013fa80 0080 Info:0013faa4h CmdSpc:0h FRU:0h SnsKeySpc:800010h Recovered Error recovered data with positive head offset
enc 2 slot 2 3QQ12XVN00009004UMLZ has also complained before
Info 2021-01-06 00:15:30 58 A1893622 Disk detected error (Channel:0 ID:19 SN:3QQ1T0H800009004Y70Q Encl:1 Slot:3) Key,Code,Qual=(03h,11h,00h) cdb:Rd 0005bd80 0080 Info:0005bde8h CmdSpc:0h FRU:81h SnsKeySpc:800096h Medium Error unrecovered read error
Warning 2021-01-06 00:15:35 8 A1893626 Vdisk vdisk450_1 drive down (Channel:0 ID:19 SN:3QQ1T0H800009004Y70Q Encl:1 Slot:3)
Info 2021-01-06 00:15:36 9 A1893629 Spare kicked in (Channel:0 ID:22, SN:3QQ1T3QC00009004Y5MH Encl:1 Slot:6) for critical Vdisk (Vdisk: vdisk450_1, SN: 00c0ffd5276d0048749a235e00000000)
Info 2021-01-06 00:15:36 37 A1893630 Vdisk reconstruct started (Vdisk: vdisk450_1, SN: 00c0ffd5276d0048749a235e00000000) drive: Channel:0 ID:22 SN:3QQ1T3QC00009004Y5MH Encl:1 Slot:6
Info 2021-01-09 00:18:40 58 A1893649 Disk detected error (Channel:0 ID:16 SN:3QQ1T1DG00009004YC2Y Encl:1 Slot:0) Key,Code,Qual=(01h,18h,01h) cdb:Rd 0130cf80 0080 Info:0130cfaah CmdSpc:0h FRU:1h SnsKeySpc:800037h Recovered Error recovered data with error corr. & retries applied
Warning 2021-01-10 17:58:40 8 A1893656 Vdisk vdisk450_0 drive down (Channel:0 ID:32 SN:3QQ1T1SB00009003QT8Z Encl:2 Slot:0)
Critical 2021-01-10 17:58:40 314 A1893657 FRU type: drive, problem: encl 2 deviceID 32. Vendor: SEAGAT Product ID: ST3450856SS , S/N: 3QQ1T1SB00009003QT8Z rev: 0006. Related event ID: 1893656, type: 8
Warning 2021-01-10 17:58:40 1 A1893658 Vdisk critical: vdisk450_0, SN: 00c0ffd5276d00480c9a235e00000000
Critical 2021-01-10 17:58:41 207 A1893659 Vdisk scrub job failed. Command failed (error code: 1) (number of errors found: 0) (vdisk: vdisk450_0, SN: 00c0ffd5276d00480c9a235e00000000)
Warning 2021-01-10 18:02:52 18 A1893665 Vdisk reconstruct failed. Command failed (error code 1). (Vdisk: vdisk450_0, SN: 00c0ffd5276d00480c9a235e00000000)
Warning 2021-01-10 18:02:53 78 A1893666 Spare drive unusable (too small) for Vdisk: vdisk450_0, SN: 00c0ffd5276d00480c9a235e00000000
[...]
Info 2021-01-10 18:02:52 59 A1893664 Disk channel error (Channel:0 ID:34 SN:3QQ12XVN00009004UMLZ Encl:2 Slot:2): I/O Timeout cdb:10 additional
Warning 2021-01-10 18:02:52 18 A1893665 Vdisk reconstruct failed. Command failed (error code 1). (Vdisk: vdisk450_0, SN: 00c0ffd5276d00480c9a235e00000000)
Warning 2021-04-05 14:15:56 58 A1895145 Disk detected error (Channel:0 ID:16 SN:3QQ1T1DG00009004YC2Y Encl:1 Slot:0) Key,Code,Qual=(04h,15h,01h) cdb:Rd 000000e2 0004 Info:000000e2h CmdSpc:0h FRU:83h SnsKeySpc:802049h Hardware mechanical positioning error
Info 2021-07-07 03:10:43 58 A1896752 Disk detected error (Channel:0 ID:7 SN:3LN4L5LV00009834Q4PS Encl:0 Slot:7) Key,Code,Qual=(01h,17h,01h) cdb:Rd 00591d00 0080 Info:00591d68h CmdSpc:0h FRU:0h SnsKeySpc:800031h Recovered Error recovered data with retries
Info 2021-07-04 16:13:11 59 A1896701 Disk channel error (Channel:0 ID:38 SN:3QQ037RW00009004TVCN Encl:2 Slot:6): I/O Timeout cdb:Rd 135c5b00 0080
Info 2021-07-08 01:10:57 58 A1896809 Disk detected error (Channel:0 ID:40 SN:3QQ1DZ8700009004VXWF Encl:2 Slot:8) Key,Code,Qual=(01h,17h,01h) cdb:Rd 00017780 0080 Info:000177a1h CmdSpc:0h FRU:0h SnsKeySpc:800002h Recovered Error recovered data with retries
Info 2021-09-15 13:14:07 58 A1897323 Disk detected error (Channel:0 ID:39 SN:3QQ1T1CC00009004Y5DH Encl:2 Slot:7) Key,Code,Qual=(01h,17h,01h) cdb:Rd 0002d200 0080 Info:0002d275h CmdSpc:0h FRU:0h SnsKeySpc:800003h Recovered Error recovered data with retries
Disk drive (Channel:0 ID:21 SN: Encl:1 Slot:5) reported a SMART event sense key:Recovered Error(01h) ASC:5Dh ASCQ:00h failure prediction threshold exceeded Info:00000000
Disk drive (Channel:0 ID:39 SN: Encl:2 Slot:7) reported a SMART event sense key:Recovered Error(01h) ASC:5Dh ASCQ:00h failure prediction threshold exceeded Info:00000000
Warning 2022-03-09 12:24:53 58 A1898846 Disk detected error (Channel:0 ID:39 SN:3QQ1T1CC00009004Y5DH Encl:2 Slot:7) Key,Code,Qual=(04h,32h,00h) cdb:Rd 0002a880 0080 Info:0002a89ch CmdSpc:0h FRU:9dh SnsKeySpc:800096h Hardware no defect spare location available
Info 2022-03-27 13:29:51 58 A1899000 Disk detected error (Channel:0 ID:34 SN:EA09PB80A1TS Encl:2 Slot:2) Key,Code,Qual=(01h,17h,EFh) cdb:Rd 2ccf8100 0080 Info:2ccf817fh CmdSpc:0h FRU:0h SnsKeySpc:800000h Recovered Error
Info 2022-04-03 18:01:19 58 A1899051 Disk detected error (Channel:0 ID:34 SN:EA09PB80A1TS Encl:2 Slot:2) Key,Code,Qual=(01h,17h,02h) cdb:Rd 2caee780 0080 Info:2caee7adh CmdSpc:0h FRU:0h SnsKeySpc:800004h Recovered Error recovered data with positive head offset
Info 2022-04-03 18:01:17 58 A1899050 Disk detected error (Channel:0 ID:34 SN:EA09PB80A1TS Encl:2 Slot:2) Key,Code,Qual=(01h,18h,00h) cdb:Rd 2caee180 0080 Info:2caee1a0h CmdSpc:0h FRU:0h SnsKeySpc:800001h Recovered Error recovered data with error correction applied
Info 2022-04-04 20:24:25 58 A1899060 Disk detected error (Channel:0 ID:34 SN:EA09PB80A1TS Encl:2 Slot:2) Key,Code,Qual=(01h,17h,03h) cdb:Rd 136d3000 0080 Info:136d3014h CmdSpc:0h FRU:0h SnsKeySpc:800004h Recovered Error recovered data with negative head offset
Info 2022-04-04 20:24:22 58 A1899059 Disk detected error (Channel:0 ID:34 SN:EA09PB80A1TS Encl:2 Slot:2) Key,Code,Qual=(01h,17h,01h) cdb:Rd 136d0d00 0080 Info:136d0d3eh CmdSpc:0h FRU:0h SnsKeySpc:800002h Recovered Error recovered data with retries
Info 2022-04-04 23:00:13 58 A1899063 Disk detected error (Channel:0 ID:34 SN:EA09PB80A1TS Encl:2 Slot:2) Key,Code,Qual=(01h,18h,A0h) cdb:Rd 2d432b00 0080 Info:2d432b27h CmdSpc:0h FRU:0h SnsKeySpc:80000ch Recovered Error
Info 2022-04-04 22:51:22 58 A1899062 Disk detected error (Channel:0 ID:34 SN:EA09PB80A1TS Encl:2 Slot:2) Key,Code,Qual=(01h,17h,03h) cdb:Rd 2c4d8e00 0080 Info:2c4d8e06h CmdSpc:0h FRU:0h SnsKeySpc:800008h Recovered Error recovered data with negative head offset
Info 2022-04-19 13:05:17 58 A1899166 Disk detected error (Channel:0 ID:7 SN:3LN4L5LV00009834Q4PS Encl:0 Slot:7) Key,Code,Qual=(01h,17h,01h) cdb:Rd 0bf9d180 0080 Info:0bf9d18eh CmdSpc:0h FRU:0h SnsKeySpc:800032h Recovered Error recovered data with retries
Info 2022-04-24 23:15:51 58 A1899212 Disk detected error (Channel:0 ID:7 SN:3LN4L5LV00009834Q4PS Encl:0 Slot:7) Key,Code,Qual=(01h,17h,01h) cdb:Rd 0bf9d180 0080 Info:0bf9d18eh CmdSpc:0h FRU:0h SnsKeySpc:800032h Recovered Error recovered data with retries
Info 2022-05-08 05:19:24 58 A1899309 Disk detected error (Channel:0 ID:40 SN:3QQ1DZ8700009004VXWF Encl:2 Slot:8) Key,Code,Qual=(01h,18h,01h) cdb:Rd 00029b80 0080 Info:00029bf2h CmdSpc:0h FRU:1h SnsKeySpc:80000ah Recovered Error recovered data with error corr. & retries applied
Info 2022-05-08 05:19:12 58 A1899308 Disk detected error (Channel:0 ID:40 SN:3QQ1DZ8700009004VXWF Encl:2 Slot:8) Key,Code,Qual=(01h,17h,02h) cdb:Rd 00018180 0080 Info:000181a6h CmdSpc:0h FRU:0h SnsKeySpc:800010h Recovered Error recovered data with positive head offset
Warning 2022-06-11 19:55:53 55 A1899596 Disk drive (Channel:0 ID:34 SN:EA09PB80A1TS Encl:2 Slot:2) reported a SMART event sense key:Recovered Error(01h) ASC:5Dh ASCQ:00h failure prediction threshold exceeded Info:00000000
Critical 2022-06-11 18:50:52 314 A1899595 FRU type: drive, problem: encl 2 deviceID 34. Vendor: IBM-ES Product ID: MBF2600RC , S/N: EA09PB80A1TS rev: SB2F. Related event ID: 1899594, type: 55
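To see which slots keep showing up in the event log, the entries can be tallied per enclosure and slot. This is a minimal sketch, not a tool used on this cluster: the field layout (severity, date, time, event code, event ID, message) is inferred from the entries above, the regular expressions and function names are assumptions, and entries pasted without a severity/timestamp prefix are simply skipped.

```python
import re
from collections import Counter

# Layout inferred from the log above: severity, date, time, event code, event ID, message.
EVENT_RE = re.compile(
    r"^(?P<severity>Info|Warning|Critical) "
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<code>\d+) (?P<event_id>A\d+) (?P<message>.*)$"
)
# Drive descriptor as it appears in messages, e.g. "Channel:0 ID:34 SN:... Encl:2 Slot:2".
DRIVE_RE = re.compile(r"Channel:(\d+) ID:(\d+),? SN:(\S*) Encl:(\d+) Slot:(\d+)")


def parse_event(line):
    """Parse one log entry; return None if it has no severity/timestamp prefix."""
    match = EVENT_RE.match(line.strip())
    if match is None:
        return None
    event = match.groupdict()
    drive = DRIVE_RE.search(event["message"])
    event["drive"] = None if drive is None else {
        "serial": drive.group(3),
        "encl": int(drive.group(4)),
        "slot": int(drive.group(5)),
    }
    return event


def events_per_slot(lines):
    """Count parsed events per (enclosure, slot)."""
    counts = Counter()
    for line in lines:
        event = parse_event(line)
        if event and event["drive"]:
            counts[(event["drive"]["encl"], event["drive"]["slot"])] += 1
    return counts


# Four entries copied from the log above.
SAMPLE = [
    "Info 2022-04-04 20:24:22 58 A1899059 Disk detected error (Channel:0 ID:34 SN:EA09PB80A1TS Encl:2 Slot:2) Key,Code,Qual=(01h,17h,01h) cdb:Rd 136d0d00 0080 Info:136d0d3eh CmdSpc:0h FRU:0h SnsKeySpc:800002h Recovered Error recovered data with retries",
    "Info 2022-05-08 05:19:12 58 A1899308 Disk detected error (Channel:0 ID:40 SN:3QQ1DZ8700009004VXWF Encl:2 Slot:8) Key,Code,Qual=(01h,17h,02h) cdb:Rd 00018180 0080 Info:000181a6h CmdSpc:0h FRU:0h SnsKeySpc:800010h Recovered Error recovered data with positive head offset",
    "Warning 2022-06-11 19:55:53 55 A1899596 Disk drive (Channel:0 ID:34 SN:EA09PB80A1TS Encl:2 Slot:2) reported a SMART event sense key:Recovered Error(01h) ASC:5Dh ASCQ:00h failure prediction threshold exceeded Info:00000000",
    "Critical 2022-06-11 18:50:52 314 A1899595 FRU type: drive, problem: encl 2 deviceID 34. Vendor: IBM-ES Product ID: MBF2600RC , S/N: EA09PB80A1TS rev: SB2F. Related event ID: 1899594, type: 55",
]

if __name__ == "__main__":
    for (encl, slot), n in events_per_slot(SAMPLE).most_common():
        print(f"encl {encl} slot {slot}: {n} event(s)")
```

The FRU entry carries no Channel/Encl/Slot descriptor, so it parses but is not counted against a slot; matching it via "encl 2 deviceID 34" would need an extra rule.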
fcsw01.hateotu.de
- Brocade 300, 8*8G FC licensed
fcsw01.hateotu.de 10.204.3.230 00:27:f8:81:ee:a6
fcsw02.hateotu.de
- Brocade 300, 8*8G FC licensed
fcsw02.hateotu.de 10.204.3.229 00:27:f8:81:ff:ae
Installed the two missing iRMC KVM licenses, which were sponsored by Fujitsu --Carwe (talk) 18:47, 20 December 2020 (CET)