| NodeManager: | |
| Node ID: ebe4f699c70098a433806e0a4157788b55e34bb8970a6de5c2fabc84 | |
| Node name: 192.168.0.2 | |
| InitialConfigResources: {CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000, memory: 847245893640000, GPU: 20000, object_store_memory: 21474836480000, node:__internal_head__: 10000} | |
| ClusterTaskManager: | |
| ========== Node: ebe4f699c70098a433806e0a4157788b55e34bb8970a6de5c2fabc84 ================= | |
| Infeasible queue length: 0 | |
| Schedule queue length: 0 | |
| Dispatch queue length: 0 | |
| num_waiting_for_resource: 0 | |
| num_waiting_for_plasma_memory: 0 | |
| num_waiting_for_remote_node_resources: 0 | |
| num_worker_not_started_by_job_config_not_exist: 0 | |
| num_worker_not_started_by_registration_timeout: 0 | |
| num_tasks_waiting_for_workers: 0 | |
| num_cancelled_tasks: 0 | |
| cluster_resource_scheduler state: | |
| Local id: 4007731071187618347 Local resources: {"total":{node:__internal_head__: [10000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], CPU: [200000], memory: [847245893640000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {node:__internal_head__: [10000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], CPU: [180000], memory: [847245893640000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"ebe4f699c70098a433806e0a4157788b55e34bb8970a6de5c2fabc84",} is_draining: 0 is_idle: 0 Cluster resources: node id: 4007731071187618347{"total":{node:__internal_head__: 10000, node:192.168.0.2: 10000, memory: 847245893640000, CPU: 200000, accelerator_type:A40: 10000, GPU: 20000, object_store_memory: 21474836480000}}, "available": {CPU: 180000, accelerator_type:A40: 10000, memory: 847245893640000, node:192.168.0.2: 10000, node:__internal_head__: 10000, GPU: 20000, object_store_memory: 21474836480000}}, "labels":{"ray.io/node_id":"ebe4f699c70098a433806e0a4157788b55e34bb8970a6de5c2fabc84",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []} | |
| Waiting tasks size: 0 | |
| Number of executing tasks: 2 | |
| Number of pinned task arguments: 0 | |
| Number of total spilled tasks: 0 | |
| Number of spilled waiting tasks: 0 | |
| Number of spilled unschedulable tasks: 0 | |
| Resource usage { | |
| - (language=PYTHON actor_or_task=process_single_file pid=2373 worker_id=037fe865c00fe918749fadaf7868ef44a5963a2cef8390074fadf51d): {CPU: 10000} | |
| - (language=PYTHON actor_or_task=process_single_file pid=2384 worker_id=c63d951aeb4df7d1f0380b7e213e452534b60d5774d9f2e9f4222b9c): {CPU: 10000} | |
| } | |
| Backlog Size per scheduling descriptor :{workerId: num backlogs}: | |
| Running tasks by scheduling class: | |
| - {depth=1 function_descriptor={type=PythonFunctionDescriptor, module_name=parse_ray_txt_with_gpu_cpu, class_name=, function_name=process_single_file, function_hash=048a5e2747364791a2fb3cba5ef04fcb} scheduling_strategy=default_scheduling_strategy { | |
| } | |
| resource_set={CPU : 1, }}: 2/20 | |
| ================================================== | |
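Resource quantities in this dump appear to be stored in Ray's internal fixed-point form, scaled by 10000. The evidence is in the dump itself: each running task is listed as `{CPU: 10000}` under Resource usage but as `CPU : 1` in its `resource_set`. Under that assumption, the node has 20 CPUs (18 available, 2 held by the two `process_single_file` tasks), 2 GPUs of accelerator type A40, a 2 GiB object store, and about 78.9 GiB of schedulable memory. The sketch below converts the printed maps. The scale constant `RESOURCE_UNIT_SCALING` and the helpers `parse_resource_map` and `to_units` are assumptions for illustration, not a Ray API.

```python
# Sketch: convert resource maps printed in a raylet debug dump into real units.
# ASSUMPTION: quantities are fixed-point values scaled by 10000; check this
# against your Ray version before relying on it.
RESOURCE_UNIT_SCALING = 10_000  # assumed scale factor, not imported from Ray


def parse_resource_map(text: str) -> dict[str, int]:
    """Parse '{name: value, ...}' as printed in the dump into raw integers.

    Resource names may themselves contain ':' (e.g. 'node:192.168.0.2'),
    so each entry is split on its last colon.
    """
    result = {}
    for item in text.strip().strip("{}").split(","):
        if not item.strip():
            continue
        name, _, value = item.rpartition(":")
        result[name.strip()] = int(value)
    return result


def to_units(raw: dict[str, int]) -> dict[str, float]:
    """Divide every raw value by the assumed fixed-point scale."""
    return {name: value / RESOURCE_UNIT_SCALING for name, value in raw.items()}


# InitialConfigResources, copied from the NodeManager section of the dump.
initial = to_units(parse_resource_map(
    "{CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000, "
    "memory: 847245893640000, GPU: 20000, object_store_memory: 21474836480000, "
    "node:__internal_head__: 10000}"
))
available_cpu = 180000 / RESOURCE_UNIT_SCALING  # "available" CPU in the cluster state
cpus_in_use = initial["CPU"] - available_cpu

print(initial["CPU"], initial["GPU"], cpus_in_use)  # 20.0 2.0 2.0
print(initial["object_store_memory"] / 2**30)        # 2.0 (GiB)
print(round(initial["memory"] / 2**30, 1))            # 78.9 (GiB)
```

The 2 CPUs in use line up with "Number of executing tasks: 2" and the `2/20` running-task count for the `process_single_file` scheduling class.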
| ClusterResources: | |
| LocalObjectManager: | |
| - num pinned objects: 0 | |
| - pinned objects size: 0 | |
| - num objects pending restore: 0 | |
| - num objects pending spill: 0 | |
| - num bytes pending spill: 0 | |
| - num bytes currently spilled: 0 | |
| - cumulative spill requests: 0 | |
| - cumulative restore requests: 0 | |
| - spilled objects pending delete: 0 | |
| ObjectManager: | |
| - num local objects: 0 | |
| - num unfulfilled push requests: 0 | |
| - num object pull requests: 0 | |
| - num chunks received total: 0 | |
| - num chunks received failed (all): 0 | |
| - num chunks received failed / cancelled: 0 | |
| - num chunks received failed / plasma error: 0 | |
| Event stats: | |
| Global stats: 0 total (0 active) | |
| Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s | |
| Execution time: mean = -nan s, total = 0.000 s | |
| Event stats: | |
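Several stats fields show `mean = -nan s` and `min = 9223372036.855 s`, for example the empty ObjectManager block above (`0 total`) and the queueing-time fields of entries such as `NodeManagerService.grpc_server.ReportWorkerBacklog`, whose queueing total is 0.000 s. These look like untouched initial values rather than measurements: 9223372036.855 s is the largest signed 64-bit integer count of nanoseconds, and NaN is what 0/0 produces. The accompanying `max = -0.000 s` likewise appears to be an unset placeholder. The sketch reproduces that arithmetic and adds a hypothetical `is_placeholder_min` filter for skipping such fields; the initialization behavior is inferred from the numbers, not taken from Ray's source.

```python
import math

# ASSUMPTION: the running minimum is tracked in nanoseconds and starts at the
# largest signed 64-bit value, so a counter with zero samples prints that value.
INT64_MAX_NS = 2**63 - 1


def summarize(samples_ns: list[int]) -> tuple[float, float, float]:
    """Return (min_s, mean_s, total_s) the way a running-stats counter would,
    starting min at INT64_MAX_NS. Illustrative only, not Ray's code."""
    min_ns = INT64_MAX_NS
    total_ns = 0
    for s in samples_ns:
        min_ns = min(min_ns, s)
        total_ns += s
    # In C++, 0.0 / 0.0 yields NaN (glibc often prints it as "-nan");
    # Python raises ZeroDivisionError instead, so emulate the NaN explicitly.
    mean_ns = total_ns / len(samples_ns) if samples_ns else math.nan
    return min_ns / 1e9, mean_ns / 1e9, total_ns / 1e9


def is_placeholder_min(line: str) -> bool:
    """Hypothetical helper: flag stats lines whose min is the untouched sentinel."""
    return "min = 9223372036.855 s" in line


min_s, mean_s, total_s = summarize([])
print(f"min = {min_s:.3f} s")       # min = 9223372036.855 s
print(math.isnan(mean_s), total_s)  # True 0.0
```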
| PushManager: | |
| - num pushes in flight: 0 | |
| - num chunks in flight: 0 | |
| - num chunks remaining: 0 | |
| - max chunks allowed: 409 | |
| OwnershipBasedObjectDirectory: | |
| - num listeners: 0 | |
| - cumulative location updates: 0 | |
| - num location updates per second: 0.000 | |
| - num location lookups per second: 0.000 | |
| - num locations added per second: 0.000 | |
| - num locations removed per second: 0.000 | |
| BufferPool: | |
| - create buffer state map size: 0 | |
| PullManager: | |
| - num bytes available for pulled objects: 2147483648 | |
| - num bytes being pulled (all): 0 | |
| - num bytes being pulled / pinned: 0 | |
| - get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} | |
| - wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} | |
| - task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable} | |
| - first get request bundle: N/A | |
| - first wait request bundle: N/A | |
| - first task request bundle: N/A | |
| - num objects queued: 0 | |
| - num objects actively pulled (all): 0 | |
| - num objects actively pulled / pinned: 0 | |
| - num bundles being pulled: 0 | |
| - num pull retries: 0 | |
| - max timeout seconds: 0 | |
| - max timeout request is already processed. No entry. | |
| WorkerPool: | |
| - registered jobs: 1 | |
| - process_failed_job_config_missing: 0 | |
| - process_failed_rate_limited: 0 | |
| - process_failed_pending_registration: 0 | |
| - process_failed_runtime_env_setup_failed: 0 | |
| - num PYTHON workers: 20 | |
| - num PYTHON drivers: 1 | |
| - num PYTHON pending start requests: 0 | |
| - num PYTHON pending registration requests: 0 | |
| - num object spill callbacks queued: 0 | |
| - num object restore queued: 0 | |
| - num util functions queued: 0 | |
| - num idle workers: 18 | |
| TaskDependencyManager: | |
| - task deps map size: 0 | |
| - get req map size: 0 | |
| - wait req map size: 0 | |
| - local objects map size: 0 | |
| WaitManager: | |
| - num active wait requests: 0 | |
| Subscriber: | |
| Channel WORKER_OBJECT_LOCATIONS_CHANNEL | |
| - cumulative subscribe requests: 0 | |
| - cumulative unsubscribe requests: 0 | |
| - active subscribed publishers: 0 | |
| - cumulative published messages: 0 | |
| - cumulative processed messages: 0 | |
| Channel WORKER_REF_REMOVED_CHANNEL | |
| - cumulative subscribe requests: 0 | |
| - cumulative unsubscribe requests: 0 | |
| - active subscribed publishers: 0 | |
| - cumulative published messages: 0 | |
| - cumulative processed messages: 0 | |
| Channel WORKER_OBJECT_EVICTION | |
| - cumulative subscribe requests: 0 | |
| - cumulative unsubscribe requests: 0 | |
| - active subscribed publishers: 0 | |
| - cumulative published messages: 0 | |
| - cumulative processed messages: 0 | |
| num async plasma notifications: 0 | |
| Remote node managers: | |
| Event stats: | |
| Global stats: 2991 total (35 active) | |
| Queueing time: mean = 1.381 ms, max = 878.225 ms, min = 65.000 ns, total = 4.130 s | |
| Execution time: mean = 893.427 us, total = 2.672 s | |
| Event stats: | |
| NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 630 total (0 active), Execution time: mean = 45.039 us, total = 28.374 ms, Queueing time: mean = 116.054 us, max = 312.419 us, min = 2.512 us, total = 73.114 ms | |
| NodeManagerService.grpc_server.ReportWorkerBacklog - 630 total (0 active), Execution time: mean = 611.293 us, total = 385.115 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s | |
| RaySyncer.OnDemandBroadcasting - 300 total (1 active), Execution time: mean = 21.143 us, total = 6.343 ms, Queueing time: mean = 98.133 us, max = 2.501 ms, min = 17.969 us, total = 29.440 ms | |
| ObjectManager.UpdateAvailableMemory - 300 total (0 active), Execution time: mean = 6.717 us, total = 2.015 ms, Queueing time: mean = 113.020 us, max = 373.933 us, min = 7.463 us, total = 33.906 ms | |
| NodeManager.CheckGC - 300 total (1 active), Execution time: mean = 3.553 us, total = 1.066 ms, Queueing time: mean = 114.756 us, max = 2.502 ms, min = 22.455 us, total = 34.427 ms | |
| RayletWorkerPool.deadline_timer.kill_idle_workers - 150 total (1 active), Execution time: mean = 21.689 us, total = 3.253 ms, Queueing time: mean = 160.138 us, max = 11.582 ms, min = 22.229 us, total = 24.021 ms | |
| MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 120 total (1 active), Execution time: mean = 476.822 us, total = 57.219 ms, Queueing time: mean = 76.256 us, max = 169.739 us, min = 15.682 us, total = 9.151 ms | |
| ClientConnection.async_read.ProcessMessageHeader - 94 total (21 active), Execution time: mean = 8.233 us, total = 773.904 us, Queueing time: mean = 40.901 ms, max = 878.225 ms, min = 32.614 us, total = 3.845 s | |
| ClientConnection.async_read.ProcessMessage - 73 total (0 active), Execution time: mean = 990.704 us, total = 72.321 ms, Queueing time: mean = 39.992 us, max = 349.119 us, min = 2.947 us, total = 2.919 ms | |
| NodeManager.ScheduleAndDispatchTasks - 31 total (1 active), Execution time: mean = 16.325 us, total = 506.078 us, Queueing time: mean = 77.851 us, max = 241.007 us, min = 33.440 us, total = 2.413 ms | |
| NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 30 total (0 active), Execution time: mean = 145.194 us, total = 4.356 ms, Queueing time: mean = 95.892 us, max = 139.690 us, min = 34.366 us, total = 2.877 ms | |
| NodeManager.deadline_timer.spill_objects_when_over_threshold - 30 total (1 active), Execution time: mean = 3.611 us, total = 108.336 us, Queueing time: mean = 179.852 us, max = 1.223 ms, min = 38.117 us, total = 5.396 ms | |
| NodeManager.deadline_timer.flush_free_objects - 30 total (1 active), Execution time: mean = 8.273 us, total = 248.184 us, Queueing time: mean = 176.293 us, max = 1.228 ms, min = 36.023 us, total = 5.289 ms | |
| NodeManagerService.grpc_server.GetResourceLoad - 30 total (0 active), Execution time: mean = 687.359 us, total = 20.621 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s | |
| ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.436 us, total = 31.593 us, Queueing time: mean = 49.013 us, max = 157.019 us, min = 14.007 us, total = 1.078 ms | |
| ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 12.277 us, total = 257.825 us, Queueing time: mean = 103.107 us, max = 250.113 us, min = 8.342 us, total = 2.165 ms | |
| NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 114.123 us, total = 2.397 ms, Queueing time: mean = 137.700 us, max = 353.229 us, min = 20.842 us, total = 2.892 ms | |
| NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.104 ms, total = 23.189 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s | |
| ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.118 us, total = 422.484 us, Queueing time: mean = 119.067 us, max = 288.455 us, min = 23.069 us, total = 2.500 ms | |
| PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 229.695 us, total = 2.986 ms, Queueing time: mean = 3.006 ms, max = 8.914 ms, min = 36.505 us, total = 39.079 ms | |
| ClusterResourceManager.ResetRemoteNodeView - 11 total (1 active), Execution time: mean = 9.259 us, total = 101.845 us, Queueing time: mean = 72.953 us, max = 115.108 us, min = 50.989 us, total = 802.479 us | |
| NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 1.060 ms, total = 10.604 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s | |
| NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 128.679 us, total = 1.287 ms, Queueing time: mean = 221.593 us, max = 499.891 us, min = 51.035 us, total = 2.216 ms | |
| WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.581 us, total = 245.814 us, Queueing time: mean = 136.647 us, max = 244.686 us, min = 33.920 us, total = 1.366 ms | |
| - 9 total (0 active), Execution time: mean = 942.556 ns, total = 8.483 us, Queueing time: mean = 78.072 us, max = 180.110 us, min = 18.065 us, total = 702.645 us | |
| RaySyncer.BroadcastMessage - 9 total (0 active), Execution time: mean = 210.305 us, total = 1.893 ms, Queueing time: mean = 772.444 ns, max = 1.260 us, min = 93.000 ns, total = 6.952 us | |
| NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 8 total (0 active), Execution time: mean = 126.457 us, total = 1.012 ms, Queueing time: mean = 107.657 us, max = 158.354 us, min = 40.665 us, total = 861.256 us | |
| NodeManagerService.grpc_server.ReturnWorker - 8 total (0 active), Execution time: mean = 643.188 us, total = 5.146 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s | |
| ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 6 total (0 active), Execution time: mean = 54.225 us, total = 325.353 us, Queueing time: mean = 135.348 us, max = 193.776 us, min = 110.287 us, total = 812.087 us | |
| ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 6 total (0 active), Execution time: mean = 1.508 ms, total = 9.048 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s | |
| NodeManager.GcsCheckAlive - 6 total (1 active), Execution time: mean = 238.280 us, total = 1.430 ms, Queueing time: mean = 516.632 us, max = 1.218 ms, min = 134.433 us, total = 3.100 ms | |
| NodeManager.deadline_timer.record_metrics - 6 total (1 active), Execution time: mean = 496.643 us, total = 2.980 ms, Queueing time: mean = 321.493 us, max = 972.208 us, min = 56.612 us, total = 1.929 ms | |
| NodeManager.deadline_timer.debug_state_dump - 3 total (1 active, 1 running), Execution time: mean = 1.415 ms, total = 4.244 ms, Queueing time: mean = 35.090 us, max = 56.664 us, min = 48.605 us, total = 105.269 us | |
| ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 493.527 ms, total = 987.054 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s | |
| ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.401 ms, total = 2.802 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s | |
| ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 157.136 us, total = 314.272 us, Queueing time: mean = 805.977 us, max = 1.500 ms, min = 112.014 us, total = 1.612 ms | |
| RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.027 us, total = 4.054 us, Queueing time: mean = 236.000 ns, max = 407.000 ns, min = 65.000 ns, total = 472.000 ns | |
| ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 68.789 us, total = 68.789 us, Queueing time: mean = 343.283 us, max = 343.283 us, min = 343.283 us, total = 343.283 us | |
| ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.782 ms, total = 1.782 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s | |
| ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.020 s, total = 1.020 s, Queueing time: mean = 188.617 us, max = 188.617 us, min = 188.617 us, total = 188.617 us | |
| NodeManager.deadline_timer.print_event_loop_stats - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s | |
| ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.872 ms, total = 2.872 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s | |
| ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 221.010 us, total = 221.010 us, Queueing time: mean = 104.753 us, max = 104.753 us, min = 104.753 us, total = 104.753 us | |
| ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 575.837 us, total = 575.837 us, Queueing time: mean = 50.651 us, max = 50.651 us, min = 50.651 us, total = 50.651 us | |
| NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s | |
| ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.146 ms, total = 2.146 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s | |
| ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.814 ms, total = 1.814 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s | |
| ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.830 ms, total = 1.830 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s | |
| ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 206.474 us, total = 206.474 us, Queueing time: mean = 113.161 us, max = 113.161 us, min = 113.161 us, total = 113.161 us | |
| Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 87.039 us, total = 87.039 us, Queueing time: mean = 313.227 us, max = 313.227 us, min = 313.227 us, total = 313.227 us | |
| ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 42.545 us, total = 42.545 us, Queueing time: mean = 109.614 us, max = 109.614 us, min = 109.614 us, total = 109.614 us | |
| DebugString() time ms: 1 | |
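When triaging the event-loop statistics, it can help to rank handlers by total execution time. The sketch below parses per-handler lines of the form `Name - N total (M active), Execution time: mean = ..., total = ..., Queueing time: ...` as they appear in this dump (table pipes included) and sorts them. The regular expression is written against these lines only and may need adjusting for other Ray versions; `EventStat`, `parse_event_stats`, and `top_by_exec_time` are hypothetical names.

```python
import re
from typing import NamedTuple

_UNIT_SECONDS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0}

# Matches "Name - N total (M active[, K running]), Execution time: mean = X u, total = Y u,"
_EVENT_RE = re.compile(
    r"^(?P<name>.*?)\s*-\s*(?P<count>\d+) total \((?P<active>\d+) active[^)]*\), "
    r"Execution time: mean = [\d.]+ \w+, total = (?P<total>[\d.]+) (?P<unit>ns|us|ms|s),"
)


class EventStat(NamedTuple):
    name: str
    count: int
    active: int
    exec_total_s: float


def parse_event_stats(dump: str) -> list[EventStat]:
    """Extract per-handler event stats, ignoring every other line of the dump."""
    stats = []
    for raw in dump.splitlines():
        line = raw.strip().strip("| ")  # drop the markdown-table pipes
        m = _EVENT_RE.match(line)
        if m:
            stats.append(EventStat(
                name=m["name"] or "<unnamed>",  # one handler in the dump has no name
                count=int(m["count"]),
                active=int(m["active"]),
                exec_total_s=float(m["total"]) * _UNIT_SECONDS[m["unit"]],
            ))
    return stats


def top_by_exec_time(stats: list[EventStat], n: int = 3) -> list[EventStat]:
    return sorted(stats, key=lambda s: s.exec_total_s, reverse=True)[:n]


sample = """\
| NodeManagerService.grpc_server.ReportWorkerBacklog - 630 total (0 active), Execution time: mean = 611.293 us, total = 385.115 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s | |
| ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.020 s, total = 1.020 s, Queueing time: mean = 188.617 us, max = 188.617 us, min = 188.617 us, total = 188.617 us | |
| ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 493.527 ms, total = 987.054 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s | |
| NodeManager.deadline_timer.debug_state_dump - 3 total (1 active, 1 running), Execution time: mean = 1.415 ms, total = 4.244 ms, Queueing time: mean = 35.090 us, max = 56.664 us, min = 48.605 us, total = 105.269 us | |
| - 9 total (0 active), Execution time: mean = 942.556 ns, total = 8.483 us, Queueing time: mean = 78.072 us, max = 180.110 us, min = 18.065 us, total = 702.645 us | |
"""

for s in top_by_exec_time(parse_event_stats(sample)):
    print(f"{s.exec_total_s:8.3f} s  {s.name}")
```

Totals are only a rough signal. `GcsSubscriberPoll` is a long-polling GCS RPC, so its execution time probably reflects waiting on a reply rather than raylet CPU work, and the 1.020 s `GetInternalConfig.OnReplyReceived` callback ran exactly once.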