
Hyper-Converged Infrastructure (HCI)

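Sangfor Hyper-Converged Infrastructure (SANGFOR HCI) is software-defined infrastructure for next-generation data centers. It uses virtualization to converge compute, storage, network, and security resources, and provides advanced features such as operations management, disaster recovery and backup, and intelligent monitoring, helping users build a simple, stable, high-performance foundation for a cloud-based data center.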

[HCI-OS-USB] Server crashes after a USB device is plugged in or unplugged

Updated: 2025-03-15
Module: OS base, server hardware | Hardware change, kernel driver
Applicable versions: HCI 6.7.0 and later

Problem description

When a USB device is plugged in or unplugged, the server crashes and generates a core dump.

Alert information

This is a host crash issue; no alert is raised in the web console.

Effective troubleshooting steps

Check the kernel log recorded at the time of the crash:

[56585606.336851] usb 1-2: USB disconnect, device number 2
[56585610.740837] usb 1-2: new full-speed USB device number 14 using xhci_hcd
[56585610.872850] usb 1-2: device descriptor read/64, error -71
[56585611.143354] usb 1-2: New USB device found, idVendor=096e, idProduct=0705, bcdDevice=63.10
[56585611.143357] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[56585611.143358] usb 1-2: Product: FT Interpass3000
[56585611.143359] usb 1-2: Manufacturer: FS
[56585611.144448] usb-storage 1-2:1.0: USB Mass Storage device detected
[56585611.144551] scsi host30: usb-storage 1-2:1.0
[56585612.172075] scsi 30:0:0:0: CD-ROM            FT       Interpass300004  1.00 PQ: 0 ANSI: 2
[56585612.172079] scsi 30:0:0:0: scsi_device_set_state() [created]->[running]
[56585612.172230] scsi 30:0:0:1: scsi_device_set_state() [created]->[deleted]
[56585612.173312] sr 30:0:0:0: [sr0] scsi3-mmc drive: 0x/0x caddy
[56585612.173966] sr 30:0:0:0: Attached scsi CD-ROM sr0
[56585612.174069] sr 30:0:0:0: Attached scsi generic sg55 type 5
[56585615.816804] device channel1.3705 entered promiscuous mode
[56585619.357557] device channel1.3705 left promiscuous mode
[56585620.560744] device channel1.91 entered promiscuous mode
[56585621.775416] sr 30:0:0:0: scsi_device_set_state() [running]->[cancel]
[56585621.776639] scsi 30:0:0:0: scsi_device_set_state() [cancel]->[deleted]
[56585622.261200] usb 1-2: reset full-speed USB device number 14 using xhci_hcd
[56585622.921195] usb 1-2: reset full-speed USB device number 14 using xhci_hcd
[56585623.593202] usb 1-2: reset full-speed USB device number 14 using xhci_hcd
[56585623.925843] device channel1.91 left promiscuous mode
[56585624.385192] usb 1-2: reset full-speed USB device number 14 using xhci_hcd
[56585624.677175] usb 1-2: reset full-speed USB device number 14 using xhci_hcd
[56585624.981200] usb 1-2: reset full-speed USB device number 14 using xhci_hcd
[56585625.781185] usb 1-2: reset full-speed USB device number 14 using xhci_hcd
[56585625.934135] usb 1-2: device descriptor read/all, error 2
[56585626.061130] usb 1-2: reset full-speed USB device number 14 using xhci_hcd
[56585626.214152] usb 1-2: device descriptor read/all, error 2
[56585626.345118] usb 1-2: reset full-speed USB device number 14 using xhci_hcd
[56585626.366216] usb 1-2: device descriptor read/all, error 2
[56585626.497109] usb 1-2: reset full-speed USB device number 14 using xhci_hcd
[56585626.517802] usb 1-2: device descriptor read/all, error -71
[56585626.517933] usb 1-2: USB disconnect, device number 14
[56585626.518097] general protection fault: 0000 [#1] SMP NOPTI
[56585626.518208] CPU: 40 PID: 7690 Comm: kworker/40:1 Kdump: loaded Tainted: G     U     O     --------- -t - 4.18.0 #1
[56585626.518331] Hardware name: SANGFOR INSPUR/ASERVER-P-2305, BIOS 4.1.8 12/06/2019
[56585626.518445] Workqueue: usb_hub_wq hub_event
[56585626.518553] RIP: 0010:kfree+0x4b/0x170
[56585626.518651] Code: ba 00 00 00 80 48 8b 15 73 28 fd 00 49 01 da 0f 83 d4 00 00 00 49 01 d2 49 c1 ea 0c 49 c1 e2 06 4c 89 d0 48 03 05 05 68 f1 00 <48> 8b 50 08 4c 8d 52 ff 83 e2 01 4c 0f 44 d0 49 8b 52 08 48 8d 42
[56585626.519111] RSP: 0018:ffffc90026563cb0 EFLAGS: 00010207
[56585626.519525] RAX: 03ffe9fe00ddd5c0 RBX: ffff880037757063 RCX: 0000000082000111
[56585626.520248] RDX: 0000777f80000000 RSI: 0000000082000111 RDI: ffff880037757063
[56585626.520956] RBP: ffff88fcd6248800 R08: 0000000000000001 R09: 0000000000000000
[56585626.521647] R10: 03fffffe00ddd5c0 R11: 0000000000000000 R12: ffffffff81628448
[56585626.522339] R13: ffff88fcd62488b0 R14: ffff88b851a2c800 R15: 0000000000000002
[56585626.523042] FS:  0000000000000000(0000) GS:ffff88bf80300000(0000) knlGS:0000000000000000
[56585626.523801] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[56585626.524226] CR2: 00000223a86b0010 CR3: 000000000221c005 CR4: 00000000007626e0
[56585626.524920] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[56585626.525612] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[56585626.526306] PKRU: 55555554
[56585626.526691] Call Trace:
[56585626.527084]  usb_destroy_configuration+0x48/0x120
[56585626.527479]  usb_release_dev+0x1f/0x60
[56585626.527878]  device_release+0x30/0x90
[56585626.528273]  kobject_release+0x68/0x190
[56585626.528663]  hub_port_connect+0x75/0xa70
[56585626.529057]  hub_event+0x752/0xae0
[56585626.529450]  process_one_work+0x15e/0x3f0
[56585626.529841]  worker_thread+0x4c/0x440
[56585626.530230]  ? rescuer_thread+0x350/0x350
[56585626.530621]  kthread+0xf8/0x130
[56585626.531009]  ? kthread_destroy_worker+0x40/0x40
[56585626.531402]  ret_from_fork+0x1f/0x40
[56585626.531793] Modules linked in: ib_iser(O) rdma_cm(O) iw_cm(O) ib_cm(O) squashfs overlay ipmi_poweroff ipmi_watchdog ipmi_devintf bonding iptable_nat nf_nat_ipv4 nf_nat nfsv3 ipt_REJECT nf_reject_ipv4 rpcsec_gss_krb5 nfsv4 dns_resolver xt_comment fuse nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc 8021q garp mrp ip6table_filter ip6_tables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack libcrc32c iptable_filter ip_tables ramoops reed_solomon mpt3sas(O) raid_class scsi_transport_sas iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi vhost_net vhost tap vfio_iommu_type1 vfio_pci vfio_virqfd vfio mlx5_ib(O) mlx5_core(O) mlxfw tls(t) ib_uverbs(O) ib_core(O) mlx_compat(O) dm_round_robin dm_multipath mlx4_en mlx4_core tipc ip6_udp_tunnel udp_tunnel tun nbd skx_edac nfit libnvdimm k10temp coretemp bridge stp llc watch_reboot(O) sffs(O) cl_lock(O) cl_softdog(O) kvm_intel kvm irqbypass igb(O) i2c_algo_bit ixgbe(O) i40e(O) loop dm_mod sr_mod cdrom usbhid sg sd_mod usb_storage
[56585626.531833]  iTCO_wdt iTCO_vendor_support pcspkr megaraid_sas(O) ahci i2c_i801 libahci lpc_ich i2c_core mfd_core ioatdma libata dca wmi ipmi_si ipmi_msghandler acpi_cpufreq acpi_power_meter sch_fq_codel [last unloaded: iw_cm]
[56585626.537056] Features: eBPF/event

Check whether the log contains messages like the following:

[56585625.934135] usb 1-2: device descriptor read/all, error 2

[56585626.214152] usb 1-2: device descriptor read/all, error 2

[56585626.366216] usb 1-2: device descriptor read/all, error 2

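If these messages are present and the call stack is similar to the one above (a general protection fault in kfree, reached from usb_destroy_configuration, usb_release_dev and hub_port_connect in the usb_hub_wq hub_event worker), the crash is the same issue.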
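The matching rule above can be automated. The following is a minimal C sketch, assuming the crash log has already been read into a single NUL-terminated string; the function names count_short_descriptor_reads and looks_like_usb_descriptor_crash are hypothetical, and the list of stack frames is taken from the call trace in this case. A match is only a hint; the final judgment still rests on comparing the full call stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Count "device descriptor read/all, error N" messages where N is positive.
 * A positive N is the number of bytes the device actually returned (a short
 * read); negative values such as -71 are plain transfer errors. */
static int count_short_descriptor_reads(const char *log)
{
    static const char marker[] = "device descriptor read/all, error ";
    int n = 0;

    for (const char *p = strstr(log, marker); p != NULL; p = strstr(p, marker)) {
        p += sizeof marker - 1;          /* skip to the value after "error " */
        if (*p >= '1' && *p <= '9')      /* positive byte count = short read */
            n++;
    }
    return n;
}

/* Hypothetical helper: true if the log shows at least one short descriptor
 * read and a call trace containing every frame seen in this case. */
static bool looks_like_usb_descriptor_crash(const char *log)
{
    static const char *const frames[] = {
        "kfree+",
        "usb_destroy_configuration+",
        "usb_release_dev+",
        "hub_port_connect+",
    };

    assert(log != NULL);
    if (count_short_descriptor_reads(log) == 0)
        return false;
    for (size_t i = 0; i < sizeof frames / sizeof frames[0]; i++) {
        if (strstr(log, frames[i]) == NULL)
            return false;
    }
    return true;
}
```

For example, passing the relevant lines quoted above (one "error 2" message plus the kfree, usb_destroy_configuration, usb_release_dev and hub_port_connect frames) returns true, while a log containing only "error -71" messages returns false.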

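Root cause

When certain USB devices are initialized, the descriptor data read from the device may be incomplete. The subsequent memory copy then pulls stale (dirty) data into the device structure, and a later access based on that data is invalid and crashes the host.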

Solution

This issue is fixed in HCI 6.11.1 (td2024072200006). Upgrading to that version resolves it.

A patch for older versions is still being scheduled.
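To decide whether a host needs the upgrade, its version can be compared against the range stated in this article. A minimal C sketch, assuming versions are plain MAJOR.MINOR.PATCH strings (real HCI build strings may carry extra suffixes, which this sketch ignores); hci_version_in_affected_range is a hypothetical name. A result of 0 for releases older than 6.7.0 only means they fall outside the range this article covers, not that they are confirmed unaffected.

```c
#include <assert.h>
#include <stdio.h>

/* Compare two MAJOR.MINOR.PATCH triples: <0, 0 or >0, like strcmp. */
static int compare_version(const int a[3], const int b[3])
{
    for (int i = 0; i < 3; i++) {
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;
    }
    return 0;
}

/* Hypothetical helper. Returns 1 if `version` is 6.7.0 or later (the range
 * this article applies to) but older than 6.11.1 (the release with the fix),
 * 0 if it is outside that range, and -1 if it cannot be parsed.
 * Only the leading "X.Y.Z" is read; any suffix is ignored. */
static int hci_version_in_affected_range(const char *version)
{
    static const int first_applicable[3] = {6, 7, 0};
    static const int first_fixed[3] = {6, 11, 1};
    int v[3];

    assert(version != NULL);
    if (sscanf(version, "%d.%d.%d", &v[0], &v[1], &v[2]) != 3)
        return -1;
    return compare_version(v, first_applicable) >= 0 &&
           compare_version(v, first_fixed) < 0;
}
```

For instance, "6.10.0" yields 1 (upgrade recommended), while "6.11.1" yields 0.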

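Scope of impact

Upgrading the HCI version requires restarting the cluster hosts.

Recommendations and summary

If the USB device that triggers the fault will be used regularly, schedule an HCI upgrade. If it is rarely used, you may instead wait for the upcoming patch.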

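Investigation details

hub_port_init calls usb_get_device_descriptor and passes USB_DT_DEVICE_SIZE (18 bytes) as the length to read.

In the existing code, whenever usb_get_descriptor returns a value greater than 0, memcpy copies the full fixed length from desc into &dev->descriptor.

It does not handle the case where the number of bytes usb_get_descriptor actually read is smaller than USB_DT_DEVICE_SIZE; the rest of dev->descriptor is then filled with whatever stale data was left in the buffer. The "device descriptor read/all, error 2" messages in the log are consistent with this: a positive value in that message is the number of bytes the device returned, so only 2 of the 18 descriptor bytes were read.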
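The copy pattern above can be modeled in user space to show where the dirty data comes from. A minimal C sketch: fake_get_descriptor is a hypothetical stand-in for usb_get_descriptor that returns only reply_len bytes, the 0xAB fill byte stands in for stale heap contents, and read_descriptor_checked is one possible guard for illustration, not the contents of the 6.11.1 fix. The byte offset 17 for bNumConfigurations follows the standard USB device descriptor layout.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define USB_DT_DEVICE_SIZE 18   /* length of a USB device descriptor */
#define NUM_CONFIGS_OFFSET 17   /* byte offset of bNumConfigurations */
#define STALE_BYTE 0xAB         /* stands in for leftover heap contents */

/* Hypothetical stand-in for usb_get_descriptor(): the simulated device
 * answers with only `reply_len` bytes of a valid descriptor; returns the
 * number of bytes written into buf. */
static int fake_get_descriptor(uint8_t *buf, size_t size, size_t reply_len)
{
    static const uint8_t device_desc[USB_DT_DEVICE_SIZE] = {
        [0] = USB_DT_DEVICE_SIZE,   /* bLength */
        [1] = 0x01,                 /* bDescriptorType: DEVICE */
        [NUM_CONFIGS_OFFSET] = 1,   /* bNumConfigurations */
    };
    size_t n = reply_len < size ? reply_len : size;

    memcpy(buf, device_desc, n);
    return (int)n;
}

/* Pattern described above: copy a fixed USB_DT_DEVICE_SIZE bytes whenever
 * the read returned a positive value, even on a short read. */
static int read_descriptor_unchecked(uint8_t *dev_desc, size_t reply_len)
{
    uint8_t buf[USB_DT_DEVICE_SIZE];
    int ret;

    memset(buf, STALE_BYTE, sizeof buf);   /* kmalloc() memory is not zeroed */
    ret = fake_get_descriptor(buf, sizeof buf, reply_len);
    if (ret > 0)
        memcpy(dev_desc, buf, USB_DT_DEVICE_SIZE);
    return ret;
}

/* Illustrative guard: accept only a complete descriptor and leave
 * dev_desc untouched on a short read. */
static int read_descriptor_checked(uint8_t *dev_desc, size_t reply_len)
{
    uint8_t buf[USB_DT_DEVICE_SIZE];
    int ret;

    assert(dev_desc != NULL);
    memset(buf, STALE_BYTE, sizeof buf);
    ret = fake_get_descriptor(buf, sizeof buf, reply_len);
    if (ret < 0)
        return ret;
    if (ret != USB_DT_DEVICE_SIZE)
        return -EPROTO;
    memcpy(dev_desc, buf, USB_DT_DEVICE_SIZE);
    return ret;
}
```

With reply_len = 2, matching the "error 2" messages, read_descriptor_unchecked reports bNumConfigurations as 0xAB (171) although the simulated device has a single configuration, whereas read_descriptor_checked returns -EPROTO and leaves the destination unchanged. One plausible link to the call trace, an inference not stated in the article, is that usb_destroy_configuration loops over bNumConfigurations when freeing per-configuration data, so a garbage count would make it call kfree on pointers beyond the entries that were actually allocated.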
