JayKimDevolved/deepseek
NodeManager:
Node ID: ebe4f699c70098a433806e0a4157788b55e34bb8970a6de5c2fabc84
Node name: 192.168.0.2
InitialConfigResources: {CPU: 200000, node:192.168.0.2: 10000, accelerator_type:A40: 10000, memory: 847245893640000, GPU: 20000, object_store_memory: 21474836480000, node:__internal_head__: 10000}
ClusterTaskManager:
========== Node: ebe4f699c70098a433806e0a4157788b55e34bb8970a6de5c2fabc84 =================
Infeasible queue length: 0
Schedule queue length: 0
Dispatch queue length: 0
num_waiting_for_resource: 0
num_waiting_for_plasma_memory: 0
num_waiting_for_remote_node_resources: 0
num_worker_not_started_by_job_config_not_exist: 0
num_worker_not_started_by_registration_timeout: 0
num_tasks_waiting_for_workers: 0
num_cancelled_tasks: 0
cluster_resource_scheduler state:
Local id: 4007731071187618347 Local resources: {"total":{node:__internal_head__: [10000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], CPU: [200000], memory: [847245893640000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "available": {node:__internal_head__: [10000], accelerator_type:A40: [10000], node:192.168.0.2: [10000], CPU: [180000], memory: [847245893640000], GPU: [10000, 10000], object_store_memory: [21474836480000]}}, "labels":{"ray.io/node_id":"ebe4f699c70098a433806e0a4157788b55e34bb8970a6de5c2fabc84",} is_draining: 0 is_idle: 0 Cluster resources: node id: 4007731071187618347{"total":{node:__internal_head__: 10000, node:192.168.0.2: 10000, memory: 847245893640000, CPU: 200000, accelerator_type:A40: 10000, GPU: 20000, object_store_memory: 21474836480000}}, "available": {CPU: 180000, accelerator_type:A40: 10000, memory: 847245893640000, node:192.168.0.2: 10000, node:__internal_head__: 10000, GPU: 20000, object_store_memory: 21474836480000}}, "labels":{"ray.io/node_id":"ebe4f699c70098a433806e0a4157788b55e34bb8970a6de5c2fabc84",}, "is_draining": 0, "draining_deadline_timestamp_ms": -1} { "placment group locations": [], "node to bundles": []}
Waiting tasks size: 0
Number of executing tasks: 2
Number of pinned task arguments: 0
Number of total spilled tasks: 0
Number of spilled waiting tasks: 0
Number of spilled unschedulable tasks: 0
Resource usage {
- (language=PYTHON actor_or_task=process_single_file pid=2373 worker_id=037fe865c00fe918749fadaf7868ef44a5963a2cef8390074fadf51d): {CPU: 10000}
- (language=PYTHON actor_or_task=process_single_file pid=2384 worker_id=c63d951aeb4df7d1f0380b7e213e452534b60d5774d9f2e9f4222b9c): {CPU: 10000}
}
Backlog Size per scheduling descriptor :{workerId: num backlogs}:
Running tasks by scheduling class:
- {depth=1 function_descriptor={type=PythonFunctionDescriptor, module_name=parse_ray_txt_with_gpu_cpu, class_name=, function_name=process_single_file, function_hash=048a5e2747364791a2fb3cba5ef04fcb} scheduling_strategy=default_scheduling_strategy {
}
resource_set={CPU : 1, }}: 2/20
==================================================
ClusterResources:
LocalObjectManager:
- num pinned objects: 0
- pinned objects size: 0
- num objects pending restore: 0
- num objects pending spill: 0
- num bytes pending spill: 0
- num bytes currently spilled: 0
- cumulative spill requests: 0
- cumulative restore requests: 0
- spilled objects pending delete: 0
ObjectManager:
- num local objects: 0
- num unfulfilled push requests: 0
- num object pull requests: 0
- num chunks received total: 0
- num chunks received failed (all): 0
- num chunks received failed / cancelled: 0
- num chunks received failed / plasma error: 0
Event stats:
Global stats: 0 total (0 active)
Queueing time: mean = -nan s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
Execution time: mean = -nan s, total = 0.000 s
Event stats:
PushManager:
- num pushes in flight: 0
- num chunks in flight: 0
- num chunks remaining: 0
- max chunks allowed: 409
OwnershipBasedObjectDirectory:
- num listeners: 0
- cumulative location updates: 0
- num location updates per second: 0.000
- num location lookups per second: 0.000
- num locations added per second: 0.000
- num locations removed per second: 0.000
BufferPool:
- create buffer state map size: 0
PullManager:
- num bytes available for pulled objects: 2147483648
- num bytes being pulled (all): 0
- num bytes being pulled / pinned: 0
- get request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
- wait request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
- task request bundles: BundlePullRequestQueue{0 total, 0 active, 0 inactive, 0 unpullable}
- first get request bundle: N/A
- first wait request bundle: N/A
- first task request bundle: N/A
- num objects queued: 0
- num objects actively pulled (all): 0
- num objects actively pulled / pinned: 0
- num bundles being pulled: 0
- num pull retries: 0
- max timeout seconds: 0
- max timeout request is already processed. No entry.
WorkerPool:
- registered jobs: 1
- process_failed_job_config_missing: 0
- process_failed_rate_limited: 0
- process_failed_pending_registration: 0
- process_failed_runtime_env_setup_failed: 0
- num PYTHON workers: 20
- num PYTHON drivers: 1
- num PYTHON pending start requests: 0
- num PYTHON pending registration requests: 0
- num object spill callbacks queued: 0
- num object restore queued: 0
- num util functions queued: 0
- num idle workers: 18
TaskDependencyManager:
- task deps map size: 0
- get req map size: 0
- wait req map size: 0
- local objects map size: 0
WaitManager:
- num active wait requests: 0
Subscriber:
Channel WORKER_OBJECT_LOCATIONS_CHANNEL
- cumulative subscribe requests: 0
- cumulative unsubscribe requests: 0
- active subscribed publishers: 0
- cumulative published messages: 0
- cumulative processed messages: 0
Channel WORKER_REF_REMOVED_CHANNEL
- cumulative subscribe requests: 0
- cumulative unsubscribe requests: 0
- active subscribed publishers: 0
- cumulative published messages: 0
- cumulative processed messages: 0
Channel WORKER_OBJECT_EVICTION
- cumulative subscribe requests: 0
- cumulative unsubscribe requests: 0
- active subscribed publishers: 0
- cumulative published messages: 0
- cumulative processed messages: 0
num async plasma notifications: 0
Remote node managers:
Event stats:
Global stats: 2991 total (35 active)
Queueing time: mean = 1.381 ms, max = 878.225 ms, min = 65.000 ns, total = 4.130 s
Execution time: mean = 893.427 us, total = 2.672 s
Event stats:
NodeManagerService.grpc_server.ReportWorkerBacklog.HandleRequestImpl - 630 total (0 active), Execution time: mean = 45.039 us, total = 28.374 ms, Queueing time: mean = 116.054 us, max = 312.419 us, min = 2.512 us, total = 73.114 ms
NodeManagerService.grpc_server.ReportWorkerBacklog - 630 total (0 active), Execution time: mean = 611.293 us, total = 385.115 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
RaySyncer.OnDemandBroadcasting - 300 total (1 active), Execution time: mean = 21.143 us, total = 6.343 ms, Queueing time: mean = 98.133 us, max = 2.501 ms, min = 17.969 us, total = 29.440 ms
ObjectManager.UpdateAvailableMemory - 300 total (0 active), Execution time: mean = 6.717 us, total = 2.015 ms, Queueing time: mean = 113.020 us, max = 373.933 us, min = 7.463 us, total = 33.906 ms
NodeManager.CheckGC - 300 total (1 active), Execution time: mean = 3.553 us, total = 1.066 ms, Queueing time: mean = 114.756 us, max = 2.502 ms, min = 22.455 us, total = 34.427 ms
RayletWorkerPool.deadline_timer.kill_idle_workers - 150 total (1 active), Execution time: mean = 21.689 us, total = 3.253 ms, Queueing time: mean = 160.138 us, max = 11.582 ms, min = 22.229 us, total = 24.021 ms
MemoryMonitor.CheckIsMemoryUsageAboveThreshold - 120 total (1 active), Execution time: mean = 476.822 us, total = 57.219 ms, Queueing time: mean = 76.256 us, max = 169.739 us, min = 15.682 us, total = 9.151 ms
ClientConnection.async_read.ProcessMessageHeader - 94 total (21 active), Execution time: mean = 8.233 us, total = 773.904 us, Queueing time: mean = 40.901 ms, max = 878.225 ms, min = 32.614 us, total = 3.845 s
ClientConnection.async_read.ProcessMessage - 73 total (0 active), Execution time: mean = 990.704 us, total = 72.321 ms, Queueing time: mean = 39.992 us, max = 349.119 us, min = 2.947 us, total = 2.919 ms
NodeManager.ScheduleAndDispatchTasks - 31 total (1 active), Execution time: mean = 16.325 us, total = 506.078 us, Queueing time: mean = 77.851 us, max = 241.007 us, min = 33.440 us, total = 2.413 ms
NodeManagerService.grpc_server.GetResourceLoad.HandleRequestImpl - 30 total (0 active), Execution time: mean = 145.194 us, total = 4.356 ms, Queueing time: mean = 95.892 us, max = 139.690 us, min = 34.366 us, total = 2.877 ms
NodeManager.deadline_timer.spill_objects_when_over_threshold - 30 total (1 active), Execution time: mean = 3.611 us, total = 108.336 us, Queueing time: mean = 179.852 us, max = 1.223 ms, min = 38.117 us, total = 5.396 ms
NodeManager.deadline_timer.flush_free_objects - 30 total (1 active), Execution time: mean = 8.273 us, total = 248.184 us, Queueing time: mean = 176.293 us, max = 1.228 ms, min = 36.023 us, total = 5.289 ms
NodeManagerService.grpc_server.GetResourceLoad - 30 total (0 active), Execution time: mean = 687.359 us, total = 20.621 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
ClientConnection.async_write.DoAsyncWrites - 22 total (0 active), Execution time: mean = 1.436 us, total = 31.593 us, Queueing time: mean = 49.013 us, max = 157.019 us, min = 14.007 us, total = 1.078 ms
ObjectManager.ObjectAdded - 21 total (0 active), Execution time: mean = 12.277 us, total = 257.825 us, Queueing time: mean = 103.107 us, max = 250.113 us, min = 8.342 us, total = 2.165 ms
NodeManagerService.grpc_server.GetSystemConfig.HandleRequestImpl - 21 total (0 active), Execution time: mean = 114.123 us, total = 2.397 ms, Queueing time: mean = 137.700 us, max = 353.229 us, min = 20.842 us, total = 2.892 ms
NodeManagerService.grpc_server.GetSystemConfig - 21 total (0 active), Execution time: mean = 1.104 ms, total = 23.189 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
ObjectManager.ObjectDeleted - 21 total (0 active), Execution time: mean = 20.118 us, total = 422.484 us, Queueing time: mean = 119.067 us, max = 288.455 us, min = 23.069 us, total = 2.500 ms
PeriodicalRunner.RunFnPeriodically - 13 total (0 active), Execution time: mean = 229.695 us, total = 2.986 ms, Queueing time: mean = 3.006 ms, max = 8.914 ms, min = 36.505 us, total = 39.079 ms
ClusterResourceManager.ResetRemoteNodeView - 11 total (1 active), Execution time: mean = 9.259 us, total = 101.845 us, Queueing time: mean = 72.953 us, max = 115.108 us, min = 50.989 us, total = 802.479 us
NodeManagerService.grpc_server.RequestWorkerLease - 10 total (0 active), Execution time: mean = 1.060 ms, total = 10.604 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
NodeManagerService.grpc_server.RequestWorkerLease.HandleRequestImpl - 10 total (0 active), Execution time: mean = 128.679 us, total = 1.287 ms, Queueing time: mean = 221.593 us, max = 499.891 us, min = 51.035 us, total = 2.216 ms
WorkerPool.PopWorkerCallback - 10 total (0 active), Execution time: mean = 24.581 us, total = 245.814 us, Queueing time: mean = 136.647 us, max = 244.686 us, min = 33.920 us, total = 1.366 ms
- 9 total (0 active), Execution time: mean = 942.556 ns, total = 8.483 us, Queueing time: mean = 78.072 us, max = 180.110 us, min = 18.065 us, total = 702.645 us
RaySyncer.BroadcastMessage - 9 total (0 active), Execution time: mean = 210.305 us, total = 1.893 ms, Queueing time: mean = 772.444 ns, max = 1.260 us, min = 93.000 ns, total = 6.952 us
NodeManagerService.grpc_server.ReturnWorker.HandleRequestImpl - 8 total (0 active), Execution time: mean = 126.457 us, total = 1.012 ms, Queueing time: mean = 107.657 us, max = 158.354 us, min = 40.665 us, total = 861.256 us
NodeManagerService.grpc_server.ReturnWorker - 8 total (0 active), Execution time: mean = 643.188 us, total = 5.146 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive.OnReplyReceived - 6 total (0 active), Execution time: mean = 54.225 us, total = 325.353 us, Queueing time: mean = 135.348 us, max = 193.776 us, min = 110.287 us, total = 812.087 us
ray::rpc::NodeInfoGcsService.grpc_client.CheckAlive - 6 total (0 active), Execution time: mean = 1.508 ms, total = 9.048 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
NodeManager.GcsCheckAlive - 6 total (1 active), Execution time: mean = 238.280 us, total = 1.430 ms, Queueing time: mean = 516.632 us, max = 1.218 ms, min = 134.433 us, total = 3.100 ms
NodeManager.deadline_timer.record_metrics - 6 total (1 active), Execution time: mean = 496.643 us, total = 2.980 ms, Queueing time: mean = 321.493 us, max = 972.208 us, min = 56.612 us, total = 1.929 ms
NodeManager.deadline_timer.debug_state_dump - 3 total (1 active, 1 running), Execution time: mean = 1.415 ms, total = 4.244 ms, Queueing time: mean = 35.090 us, max = 56.664 us, min = 48.605 us, total = 105.269 us
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 2 total (1 active), Execution time: mean = 493.527 ms, total = 987.054 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 2 total (0 active), Execution time: mean = 1.401 ms, total = 2.802 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch.OnReplyReceived - 2 total (0 active), Execution time: mean = 157.136 us, total = 314.272 us, Queueing time: mean = 805.977 us, max = 1.500 ms, min = 112.014 us, total = 1.612 ms
RaySyncerRegister - 2 total (0 active), Execution time: mean = 2.027 us, total = 4.054 us, Queueing time: mean = 236.000 ns, max = 407.000 ns, min = 65.000 ns, total = 472.000 ns
ray::rpc::JobInfoGcsService.grpc_client.AddJob.OnReplyReceived - 1 total (0 active), Execution time: mean = 68.789 us, total = 68.789 us, Queueing time: mean = 343.283 us, max = 343.283 us, min = 343.283 us, total = 343.283 us
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo - 1 total (0 active), Execution time: mean = 1.782 ms, total = 1.782 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig.OnReplyReceived - 1 total (0 active), Execution time: mean = 1.020 s, total = 1.020 s, Queueing time: mean = 188.617 us, max = 188.617 us, min = 188.617 us, total = 188.617 us
NodeManager.deadline_timer.print_event_loop_stats - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode - 1 total (0 active), Execution time: mean = 2.872 ms, total = 2.872 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll.OnReplyReceived - 1 total (0 active), Execution time: mean = 221.010 us, total = 221.010 us, Queueing time: mean = 104.753 us, max = 104.753 us, min = 104.753 us, total = 104.753 us
ray::rpc::NodeInfoGcsService.grpc_client.RegisterNode.OnReplyReceived - 1 total (0 active), Execution time: mean = 575.837 us, total = 575.837 us, Queueing time: mean = 50.651 us, max = 50.651 us, min = 50.651 us, total = 50.651 us
NodeManager.GCTaskFailureReason - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
ray::rpc::InternalKVGcsService.grpc_client.GetInternalConfig - 1 total (0 active), Execution time: mean = 2.146 ms, total = 2.146 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo - 1 total (0 active), Execution time: mean = 1.814 ms, total = 1.814 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
ray::rpc::JobInfoGcsService.grpc_client.AddJob - 1 total (0 active), Execution time: mean = 1.830 ms, total = 1.830 ms, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
ray::rpc::NodeInfoGcsService.grpc_client.GetAllNodeInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 206.474 us, total = 206.474 us, Queueing time: mean = 113.161 us, max = 113.161 us, min = 113.161 us, total = 113.161 us
Subscriber.HandlePublishedMessage_GCS_JOB_CHANNEL - 1 total (0 active), Execution time: mean = 87.039 us, total = 87.039 us, Queueing time: mean = 313.227 us, max = 313.227 us, min = 313.227 us, total = 313.227 us
ray::rpc::JobInfoGcsService.grpc_client.GetAllJobInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 42.545 us, total = 42.545 us, Queueing time: mean = 109.614 us, max = 109.614 us, min = 109.614 us, total = 109.614 us
DebugString() time ms: 1