Compute-Cluster: Difference between revisions

From HateotU
Revision as of 19:43, 17 June 2022


compute0.hateotu.de

  • Fujitsu RX300 S4
compute0.hateotu.de
10.204.3.220

Proxmox VE 6
SSH @ 22, HTTPS @ 8006
  • 48GB memory, 12* 4GB DDR2 FB
  • 2* E5420 (4+0 cores per socket, @2.5GHz)
  • SAS1
    • LSI SAS1068E-based controller, flashed to HBA mode
    • 2* 73GB 15k 2.5"
  • FC
    • FC-HBA Emulex Zephyr-X 2*4G
      • 10:00:00:00:c9:77:e8:6c -> sw1-1 p1
      • 10:00:00:00:c9:77:e8:6d -> free
    • FC-HBA QLogic QLE2460 1* 4G
      • 21:00:00:1b:32:09:b9:b6 -> sw2-1 p3
  • storage
    • OS on ZFS, mirror of 2* 73GB
    • imported LUNs from storage0 via redundant FC
      • multipathd



out-of-band management

irmc-compute0.hateotu.de
10.204.3.225

with KVM license


compute1.hateotu.de

  • Fujitsu RX300 S4
compute1.hateotu.de
10.204.3.221

Proxmox VE 6
SSH @ 22, HTTPS @ 8006
  • 48GB memory, 12* 4GB DDR2 FB
  • 2* E5420 (4+0 cores per socket, @2.5GHz)
  • SAS1
    • LSI SAS1068E-based controller, flashed to HBA mode
    • 2* 73GB 15k 2.5"
  • FC
    • FC-HBA Emulex Zephyr-X 2*4G
      • 10:00:00:00:c9:77:e3:90 -> sw1-1 p2
      • 10:00:00:00:c9:77:e3:91 -> free
    • FC-HBA QLogic QLE2460 1* 4G
      • 21:00:00:1b:32:09:68:b2 -> sw2-1 p1
  • storage
    • OS on ZFS, mirror of 2* 73GB
    • imported LUNs from storage0 via redundant FC
      • multipathd


out-of-band management

irmc-compute1.hateotu.de
10.204.3.226

with KVM license



compute2.hateotu.de

  • Fujitsu RX300 S4
compute2.hateotu.de
10.204.3.222

Proxmox VE 6
SSH @ 22, HTTPS @ 8006
  • 48GB memory, 12* 4GB DDR2 FB
  • 2* E5420 (4+0 cores per socket, @2.5GHz)
  • SAS1
    • LSI SAS1068E-based controller, flashed to HBA mode
    • 2* 73GB 15k 2.5"
  • FC
    • FC-HBA Emulex Zephyr-X 2*4G
      • 10:00:00:00:c9:7c:95:0a -> sw1-1 p3
      • 10:00:00:00:c9:7c:95:09 -> free
    • FC-HBA QLogic QLE2460 1* 4G
      • 21:00:00:1b:32:89:c4:62 -> sw2-1 p2


  • storage
    • OS on ZFS, mirror of 2* 73GB
    • imported LUNs from storage0 via redundant FC
      • multipathd


out-of-band management

irmc-compute2.hateotu.de
10.204.3.227

with KVM license
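
The FC cabling listed for compute0 to compute2 can be cross-checked mechanically. The following is a minimal sketch in Python, assuming that sw1-1 and sw2-1 denote two separate switches (the page does not state how these names map to fcsw01/fcsw02). FC_LINKS, fabrics, is_redundant and port_conflicts are illustrative names, not part of any tool used on the cluster.

```python
# WWPN -> switch port, copied from the host lists above (ports marked "free" omitted).
FC_LINKS = {
    "compute0": {"10:00:00:00:c9:77:e8:6c": "sw1-1 p1",
                 "21:00:00:1b:32:09:b9:b6": "sw2-1 p3"},
    "compute1": {"10:00:00:00:c9:77:e3:90": "sw1-1 p2",
                 "21:00:00:1b:32:09:68:b2": "sw2-1 p1"},
    "compute2": {"10:00:00:00:c9:7c:95:0a": "sw1-1 p3",
                 "21:00:00:1b:32:89:c4:62": "sw2-1 p2"},
}


def fabrics(host):
    """Return the set of switches a host is cabled to."""
    return {port.split()[0] for port in FC_LINKS[host].values()}


def is_redundant(host):
    """A host counts as redundant if it reaches at least two switches."""
    return len(fabrics(host)) >= 2


def port_conflicts():
    """Return switch ports that are claimed by more than one WWPN."""
    seen, dupes = set(), set()
    for links in FC_LINKS.values():
        for port in links.values():
            if port in seen:
                dupes.add(port)
            seen.add(port)
    return dupes


if __name__ == "__main__":
    for host in FC_LINKS:
        print(host, sorted(fabrics(host)), "redundant" if is_redundant(host) else "SINGLE PATH")
    print("port conflicts:", sorted(port_conflicts()))
```

With the cabling as documented, every host reaches both switches and no switch port is assigned twice.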



storage0.hateotu.de

  • Fujitsu FibreCAT SX80
limited to ~2.1TB per hard disk
862820-0807D5276D
500C0FF0D52C7A3C
500C0FF0DA6C263C
500C0FF0DA69103C

storage0.hateotu.de
10.204.3.223
00:c0:ff:d5:27:6d
HTTPS @ 443
  • master shelf + 2* disk shelf, 12 FC-HDDs (via interposer) 3.5" each
    • 18* 146GB
    • 18* 450GB
  • RAIDs
    • vdisk146_0: 8*146GB RAID6 -> ~880GB
    • vdisk146_1: 8*146GB RAID6 -> ~880GB
    • + global spares: 2* 146GB
    • vdisk450_0: 8*450GB RAID6 -> ~2700GB
    • vdisk450_1: 8*450GB RAID6 -> ~2700GB
    • + global spares: 2* 450GB
    • SUM: 7160 GB (7.16 TB) ≈ 6.51 TiB
  • LUNs exported via FC & merged via multipathd, then configured for shared LVM
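
The vdisk sizes in the RAID list follow from RAID6 reserving two disks' worth of capacity per vdisk for parity. A minimal sketch of that arithmetic in Python, using the nominal disk sizes from the list; raid6_usable_gb and gb_to_tib are illustrative helper names, and real usable capacity will differ slightly because of formatting overhead.

```python
def raid6_usable_gb(disks, disk_gb):
    """Nominal usable capacity of a RAID6 set: two disks' worth goes to parity."""
    if disks < 4:
        raise ValueError("RAID6 needs at least 4 disks")
    return (disks - 2) * disk_gb


def gb_to_tib(gb):
    """Convert decimal gigabytes (10^9 bytes) to binary tebibytes (2^40 bytes)."""
    return gb * 10**9 / 2**40


# (number of disks, nominal disk size in GB), as listed under "RAIDs"
VDISKS = {
    "vdisk146_0": (8, 146),
    "vdisk146_1": (8, 146),
    "vdisk450_0": (8, 450),
    "vdisk450_1": (8, 450),
}

usable = {name: raid6_usable_gb(n, gb) for name, (n, gb) in VDISKS.items()}
total_gb = sum(usable.values())

if __name__ == "__main__":
    for name, gb in usable.items():
        print(f"{name}: {gb} GB")
    print(f"total: {total_gb} GB = {gb_to_tib(total_gb):.2f} TiB")
```

The exact nominal figure for a 146GB vdisk is 876 GB, which the list rounds to ~880GB; summing the rounded values gives the 7160 GB stated above, while the unrounded total is 7152 GB.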


Info	2020-12-22 18:08:51	58	A1893432	Disk detected error (Channel:0 ID:19 SN:3QQ1T0H800009004Y70Q Encl:1 Slot:3) Key,Code,Qual=(01h,18h,08h) cdb:Rd 0005bd80 0080 Info:0005bde8h CmdSpc:0h FRU:0h SnsKeySpc:800096h Recovered Error


Info	2020-01-19 18:46:55	58	A1887662	Disk detected error (Channel:0 ID:37 SN:3QQ1WFJ500009004YAXC Encl:2 Slot:5) Key,Code,Qual=(01h,18h,01h) cdb:Rd 00046480 0080 Info:0004648eh CmdSpc:0h FRU:0h SnsKeySpc:800039h Recovered Error recovered data with error corr. & retries applied

Info	2021-10-13 07:40:37	58	A1897520	Disk detected error (Channel:0 ID:37 SN:3QQ1WFJ500009004YAXC Encl:2 Slot:5) Key,Code,Qual=(01h,17h,02h) cdb:Rd 0013fa80 0080 Info:0013faa4h CmdSpc:0h FRU:0h SnsKeySpc:800010h Recovered Error recovered data with positive head offset


   encl 2 slot 2 (3QQ12XVN00009004UMLZ) has also complained before


Info	2021-01-06 00:15:30	58	A1893622	Disk detected error (Channel:0 ID:19 SN:3QQ1T0H800009004Y70Q Encl:1 Slot:3) Key,Code,Qual=(03h,11h,00h) cdb:Rd 0005bd80 0080 Info:0005bde8h CmdSpc:0h FRU:81h SnsKeySpc:800096h Medium Error unrecovered read error
Warning	2021-01-06 00:15:35	8	A1893626	Vdisk vdisk450_1 drive down (Channel:0 ID:19 SN:3QQ1T0H800009004Y70Q Encl:1 Slot:3)
Info	2021-01-06 00:15:36	9	A1893629	Spare kicked in (Channel:0 ID:22, SN:3QQ1T3QC00009004Y5MH Encl:1 Slot:6) for critical Vdisk (Vdisk: vdisk450_1, SN: 00c0ffd5276d0048749a235e00000000)
Info	2021-01-06 00:15:36	37	A1893630	Vdisk reconstruct started (Vdisk: vdisk450_1, SN: 00c0ffd5276d0048749a235e00000000) drive: Channel:0 ID:22 SN:3QQ1T3QC00009004Y5MH Encl:1 Slot:6




Info	2021-01-09 00:18:40	58	A1893649	Disk detected error (Channel:0 ID:16 SN:3QQ1T1DG00009004YC2Y Encl:1 Slot:0) Key,Code,Qual=(01h,18h,01h) cdb:Rd 0130cf80 0080 Info:0130cfaah CmdSpc:0h FRU:1h SnsKeySpc:800037h Recovered Error recovered data with error corr. & retries applied

Warning 	2021-01-10 17:58:40	8	A1893656	Vdisk vdisk450_0 drive down (Channel:0 ID:32 SN:3QQ1T1SB00009003QT8Z Encl:2 Slot:0)
Critical	2021-01-10 17:58:40	314	A1893657	FRU type: drive, problem: encl 2 deviceID 32. Vendor: SEAGAT Product ID: ST3450856SS , S/N: 3QQ1T1SB00009003QT8Z rev: 0006. Related event ID: 1893656, type: 8
Warning 	2021-01-10 17:58:40	1	A1893658	Vdisk critical: vdisk450_0, SN: 00c0ffd5276d00480c9a235e00000000
Critical	2021-01-10 17:58:41	207	A1893659	Vdisk scrub job failed. Command failed (error code: 1) (number of errors found: 0) (vdisk: vdisk450_0, SN: 00c0ffd5276d00480c9a235e00000000)
Warning 	2021-01-10 18:02:52	18	A1893665	Vdisk reconstruct failed. Command failed (error code 1). (Vdisk: vdisk450_0, SN: 00c0ffd5276d00480c9a235e00000000)
Warning 	2021-01-10 18:02:53	78	A1893666	Spare drive unusable (too small) for Vdisk: vdisk450_0, SN: 00c0ffd5276d00480c9a235e00000000


[...]
Info	2021-01-10 18:02:52	59	A1893664	Disk channel error (Channel:0 ID:34 SN:3QQ12XVN00009004UMLZ Encl:2 Slot:2): I/O Timeout cdb:10 additional
Warning	2021-01-10 18:02:52	18	A1893665	Vdisk reconstruct failed. Command failed (error code 1). (Vdisk: vdisk450_0, SN: 00c0ffd5276d00480c9a235e00000000)


Warning 	2021-04-05 14:15:56		58		A1895145		Disk detected error (Channel:0 ID:16 SN:3QQ1T1DG00009004YC2Y Encl:1 Slot:0) Key,Code,Qual=(04h,15h,01h) cdb:Rd 000000e2 0004 Info:000000e2h CmdSpc:0h FRU:83h SnsKeySpc:802049h Hardware mechanical positioning error		


Info	2021-07-07 03:10:43	58	A1896752	Disk detected error (Channel:0 ID:7 SN:3LN4L5LV00009834Q4PS Encl:0 Slot:7) Key,Code,Qual=(01h,17h,01h) cdb:Rd 00591d00 0080 Info:00591d68h CmdSpc:0h FRU:0h SnsKeySpc:800031h Recovered Error recovered data with retries


Info	2021-07-04 16:13:11	59	A1896701	Disk channel error (Channel:0 ID:38 SN:3QQ037RW00009004TVCN Encl:2 Slot:6): I/O Timeout cdb:Rd 135c5b00 0080


Info	2021-07-08 01:10:57	58	A1896809	Disk detected error (Channel:0 ID:40 SN:3QQ1DZ8700009004VXWF Encl:2 Slot:8) Key,Code,Qual=(01h,17h,01h) cdb:Rd 00017780 0080 Info:000177a1h CmdSpc:0h FRU:0h SnsKeySpc:800002h Recovered Error recovered data with retries


Info	2021-09-15 13:14:07	58	A1897323	Disk detected error (Channel:0 ID:39 SN:3QQ1T1CC00009004Y5DH Encl:2 Slot:7) Key,Code,Qual=(01h,17h,01h) cdb:Rd 0002d200 0080 Info:0002d275h CmdSpc:0h FRU:0h SnsKeySpc:800003h Recovered Error recovered data with retries

Disk drive (Channel:0 ID:21 SN: Encl:1 Slot:5) reported a SMART event sense key:Recovered Error(01h) ASC:5Dh ASCQ:00h failure prediction threshold exceeded Info:00000000

Disk drive (Channel:0 ID:39 SN: Encl:2 Slot:7) reported a SMART event sense key:Recovered Error(01h) ASC:5Dh ASCQ:00h failure prediction threshold exceeded Info:00000000

Warning 	2022-03-09 12:24:53		58		A1898846		Disk detected error (Channel:0 ID:39 SN:3QQ1T1CC00009004Y5DH Encl:2 Slot:7) Key,Code,Qual=(04h,32h,00h) cdb:Rd 0002a880 0080 Info:0002a89ch CmdSpc:0h FRU:9dh SnsKeySpc:800096h Hardware no defect spare location available

Info 	2022-03-27 13:29:51 	58	A1899000 	Disk detected error (Channel:0 ID:34 SN:EA09PB80A1TS Encl:2 Slot:2) Key,Code,Qual=(01h,17h,EFh) cdb:Rd 2ccf8100 0080 Info:2ccf817fh CmdSpc:0h FRU:0h SnsKeySpc:800000h Recovered Error 

Info 	2022-04-03 18:01:19 	58	A1899051 	Disk detected error (Channel:0 ID:34 SN:EA09PB80A1TS Encl:2 Slot:2) Key,Code,Qual=(01h,17h,02h) cdb:Rd 2caee780 0080 Info:2caee7adh CmdSpc:0h FRU:0h SnsKeySpc:800004h Recovered Error recovered data with positive head offset

Info 	2022-04-03 18:01:17 	58	A1899050 	Disk detected error (Channel:0 ID:34 SN:EA09PB80A1TS Encl:2 Slot:2) Key,Code,Qual=(01h,18h,00h) cdb:Rd 2caee180 0080 Info:2caee1a0h CmdSpc:0h FRU:0h SnsKeySpc:800001h Recovered Error recovered data with error correction applied 

Info 	2022-04-04 20:24:25 	58	A1899060 	Disk detected error (Channel:0 ID:34 SN:EA09PB80A1TS Encl:2 Slot:2) Key,Code,Qual=(01h,17h,03h) cdb:Rd 136d3000 0080 Info:136d3014h CmdSpc:0h FRU:0h SnsKeySpc:800004h Recovered Error recovered data with negative head offset

Info 	2022-04-04 20:24:22 	58	A1899059 	Disk detected error (Channel:0 ID:34 SN:EA09PB80A1TS Encl:2 Slot:2) Key,Code,Qual=(01h,17h,01h) cdb:Rd 136d0d00 0080 Info:136d0d3eh CmdSpc:0h FRU:0h SnsKeySpc:800002h Recovered Error recovered data with retries 

Info 	2022-04-04 23:00:13 	58	A1899063 	Disk detected error (Channel:0 ID:34 SN:EA09PB80A1TS Encl:2 Slot:2) Key,Code,Qual=(01h,18h,A0h) cdb:Rd 2d432b00 0080 Info:2d432b27h CmdSpc:0h FRU:0h SnsKeySpc:80000ch Recovered Error

Info 	2022-04-04 22:51:22 	58	A1899062 	Disk detected error (Channel:0 ID:34 SN:EA09PB80A1TS Encl:2 Slot:2) Key,Code,Qual=(01h,17h,03h) cdb:Rd 2c4d8e00 0080 Info:2c4d8e06h CmdSpc:0h FRU:0h SnsKeySpc:800008h Recovered Error recovered data with negative head offset 

Info 	2022-04-19 13:05:17 	58	A1899166 	Disk detected error (Channel:0 ID:7 SN:3LN4L5LV00009834Q4PS Encl:0 Slot:7) Key,Code,Qual=(01h,17h,01h) cdb:Rd 0bf9d180 0080 Info:0bf9d18eh CmdSpc:0h FRU:0h SnsKeySpc:800032h Recovered Error recovered data with retries 

Info 	2022-04-24 23:15:51 	58	A1899212 	Disk detected error (Channel:0 ID:7 SN:3LN4L5LV00009834Q4PS Encl:0 Slot:7) Key,Code,Qual=(01h,17h,01h) cdb:Rd 0bf9d180 0080 Info:0bf9d18eh CmdSpc:0h FRU:0h SnsKeySpc:800032h Recovered Error recovered data with retries 

Info 	2022-05-08 05:19:24 	58	A1899309 	Disk detected error (Channel:0 ID:40 SN:3QQ1DZ8700009004VXWF Encl:2 Slot:8) Key,Code,Qual=(01h,18h,01h) cdb:Rd 00029b80 0080 Info:00029bf2h CmdSpc:0h FRU:1h SnsKeySpc:80000ah Recovered Error recovered data with error corr. & retries applied

Info 	2022-05-08 05:19:12 	58	A1899308 	Disk detected error (Channel:0 ID:40 SN:3QQ1DZ8700009004VXWF Encl:2 Slot:8) Key,Code,Qual=(01h,17h,02h) cdb:Rd 00018180 0080 Info:000181a6h CmdSpc:0h FRU:0h SnsKeySpc:800010h Recovered Error recovered data with positive head offset 


 Warning 	2022-06-11 19:55:53 	55	A1899596 	Disk drive (Channel:0 ID:34 SN:EA09PB80A1TS Encl:2 Slot:2) reported a SMART event sense key:Recovered Error(01h) ASC:5Dh ASCQ:00h failure prediction threshold exceeded Info:00000000

Critical 	2022-06-11 18:50:52 	314	A1899595 	FRU type: drive, problem: encl 2 deviceID 34. Vendor: IBM-ES Product ID: MBF2600RC , S/N: EA09PB80A1TS rev: SB2F. Related event ID: 1899594, type: 55 
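
To see which drives accumulate errors, the type-58 "Disk detected error" events above can be tallied per enclosure slot. The following is a minimal sketch in Python that only relies on the message layout visible in this log. parse_event and errors_per_disk are illustrative names, the sample messages are shortened after the sense triple, and the sense-key table covers only the keys that occur on this page.

```python
import re
from collections import Counter

# SCSI sense keys that occur in the log above.
SENSE_KEYS = {0x01: "Recovered Error", 0x03: "Medium Error", 0x04: "Hardware Error"}

# Matches "SN:<serial> Encl:<n> Slot:<n>) Key,Code,Qual=(..h,..h,..h)" in a type-58 event.
EVENT_RE = re.compile(
    r"SN:(?P<sn>\S+) Encl:(?P<encl>\d+) Slot:(?P<slot>\d+)\) "
    r"Key,Code,Qual=\((?P<key>[0-9A-Fa-f]{2})h,"
    r"(?P<asc>[0-9A-Fa-f]{2})h,(?P<ascq>[0-9A-Fa-f]{2})h\)"
)


def parse_event(line):
    """Extract drive location and sense data from one log line, or return None."""
    m = EVENT_RE.search(line)
    if m is None:
        return None
    key = int(m["key"], 16)
    return {
        "sn": m["sn"],
        "encl": int(m["encl"]),
        "slot": int(m["slot"]),
        "key": key,
        "asc": int(m["asc"], 16),
        "ascq": int(m["ascq"], 16),
        "key_name": SENSE_KEYS.get(key, "unknown"),
    }


def errors_per_disk(lines):
    """Count matching events per (enclosure, slot, serial number)."""
    counts = Counter()
    for line in lines:
        ev = parse_event(line)
        if ev:
            counts[(ev["encl"], ev["slot"], ev["sn"])] += 1
    return counts


# Message parts of events A1899308, A1899309 and A1898846, shortened after the sense triple.
SAMPLE = [
    "Disk detected error (Channel:0 ID:40 SN:3QQ1DZ8700009004VXWF Encl:2 Slot:8) Key,Code,Qual=(01h,17h,02h)",
    "Disk detected error (Channel:0 ID:40 SN:3QQ1DZ8700009004VXWF Encl:2 Slot:8) Key,Code,Qual=(01h,18h,01h)",
    "Disk detected error (Channel:0 ID:39 SN:3QQ1T1CC00009004Y5DH Encl:2 Slot:7) Key,Code,Qual=(04h,32h,00h)",
]

if __name__ == "__main__":
    for disk, n in errors_per_disk(SAMPLE).most_common():
        print(disk, n)
```

Lines without a sense triple, such as the SMART and Vdisk events, are skipped by design; counting them would need a second pattern.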



fcsw01.hateotu.de

  • Brocade 300, 8*8G FC licensed
fcsw01.hateotu.de
10.204.3.230
00:27:f8:81:ee:a6


fcsw02.hateotu.de

  • Brocade 300, 8*8G FC licensed
fcsw02.hateotu.de
10.204.3.229
00:27:f8:81:ff:ae


Installed the two missing iRMC KVM licenses, which were sponsored by Fujitsu. --Carwe (talk) 18:47, 20 December 2020 (CET)