I'm trying to determine how large a model I can run on the CPU and on the GPU. Using the code below as a template, I started with a small network and kept increasing the number of parameters until something failed. To my surprise, the first failure was a CPU out-of-memory error, produced by the code below.
My GPU has 12 GB of RAM and my CPU has 128 GB of RAM. Why does the CPU run out of memory before the GPU? And how can I get TensorFlow to use more memory when running on the CPU?
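For reference, here is a rough back-of-the-envelope estimate (weights only, ignoring optimizer slots and activation buffers) of the float32 weight memory of the largest network I tried:

layer_sizes = [32 * 32 * 3, 20000, 20000, 10]  # flattened CIFAR-10 input -> hidden -> hidden -> output
# weights + biases for each Dense layer
params = sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
print(f"{params:,} parameters ~ {params * 4 / 1e9:.2f} GB as float32")
# ~461,680,010 parameters, about 1.85 GB of weights in float32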
import time
import tensorflow as tf
import numpy as np
from tensorflow import keras
(X_train, y_train), (X_test, y_test) = keras.datasets.cifar10.load_data()
X_train_scaled = X_train / 255
X_test_scaled = X_test / 255
y_train_encoded = keras.utils.to_categorical(y_train, num_classes=10, dtype='float32')
y_test_encoded = keras.utils.to_categorical(y_test, num_classes=10, dtype='float32')
def get_model():
    model = keras.Sequential([
        keras.layers.Flatten(input_shape=(32, 32, 3)),
        keras.layers.Dense(20000, activation='relu'),
        keras.layers.Dense(20000, activation='relu'),
        keras.layers.Dense(10, activation='sigmoid')
    ])
    model.compile(optimizer='SGD',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model
with tf.device('/GPU:0'):
    model_gpu = get_model()
    t0 = time.time()
    model_gpu.fit(X_train_scaled, y_train_encoded, epochs=1)
    t1 = time.time()
    print('GPU: ', t1 - t0)

with tf.device('/CPU:0'):
    model_cpu = get_model()
    t0 = time.time()
    model_cpu.fit(X_train_scaled, y_train_encoded, epochs=1)
    t1 = time.time()
    print('CPU: ', t1 - t0)
When I run the code above, I get the following output.
2022-03-15 02:04:41.968970: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] could not open file to read NUMA node: /sys/bus/pci/devices/0000:0b:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-03-15 02:04:41.972553: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] could not open file to read NUMA node: /sys/bus/pci/devices/0000:0b:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-03-15 02:04:41.972749: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] could not open file to read NUMA node: /sys/bus/pci/devices/0000:0b:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-03-15 02:04:41.973141: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-03-15 02:04:41.975086: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] could not open file to read NUMA node: /sys/bus/pci/devices/0000:0b:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-03-15 02:04:41.975318: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] could not open file to read NUMA node: /sys/bus/pci/devices/0000:0b:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-03-15 02:04:41.975535: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] could not open file to read NUMA node: /sys/bus/pci/devices/0000:0b:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-03-15 02:04:42.332615: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] could not open file to read NUMA node: /sys/bus/pci/devices/0000:0b:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-03-15 02:04:42.332868: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] could not open file to read NUMA node: /sys/bus/pci/devices/0000:0b:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-03-15 02:04:42.332901: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1609] Could not identify NUMA node of platform GPU id 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-03-15 02:04:42.333147: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] could not open file to read NUMA node: /sys/bus/pci/devices/0000:0b:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-03-15 02:04:42.333209: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 9396 MB memory: -> device: 0, name: NVIDIA GeForce RTX 3080 Ti, pci bus id: 0000:0b:00.0, compute capability: 8.6
2022-03-15 02:04:44.056461: I tensorflow/stream_executor/cuda/cuda_blas.cc:1774] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
1563/1563 [==============================] - 36s 23ms/step - loss: 1.7839 - accuracy: 0.3684
GPU: 36.7231342792511
2022-03-15 02:05:22.016145: E tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 2147483648 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-03-15 02:05:22.016190: W ./tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 2147483648
2022-03-15 02:05:22.786090: E tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 1932735232 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-03-15 02:05:22.786147: W ./tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 1932735232
2022-03-15 02:05:23.549123: E tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 1739461632 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-03-15 02:05:23.549172: W ./tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 1739461632
2022-03-15 02:05:24.478277: E tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 2147483648 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-03-15 02:05:24.478325: W ./tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 2147483648
2022-03-15 02:05:35.315236: E tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 2147483648 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-03-15 02:05:35.315301: W ./tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 2147483648
2022-03-15 02:05:36.087838: E tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 2147483648 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-03-15 02:05:36.087884: W ./tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 2147483648
2022-03-15 02:05:36.087902: W tensorflow/core/common_runtime/bfc_allocator.cc:462] Allocator (gpu_host_bfc) ran out of memory trying to allocate 1.49GiB (rounded to 1600000000)requested by op SGD/SGD/update_2/ResourceApplyGradientDescent
If the cause is memory fragmentation maybe the environment variable 'TF_GPU_ALLOCATOR=cuda_malloc_async' will improve the situation.
Current allocation summary follows.
Current allocation summary follows.
2022-03-15 02:05:36.087925: I tensorflow/core/common_runtime/bfc_allocator.cc:1010] BFCAllocator dump for gpu_host_bfc
2022-03-15 02:05:36.087942: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (256): Total Chunks: 5, Chunks in use: 5. 1.2KiB allocated for chunks. 1.2KiB in use in bin. 28B client-requested in use in bin.
2022-03-15 02:05:36.087949: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (512): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-03-15 02:05:36.087954: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (1024): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-03-15 02:05:36.087959: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (2048): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-03-15 02:05:36.087964: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (4096): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-03-15 02:05:36.087968: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (8192): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-03-15 02:05:36.087972: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (16384): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-03-15 02:05:36.087977: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (32768): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-03-15 02:05:36.087981: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (65536): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-03-15 02:05:36.087985: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (131072): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-03-15 02:05:36.087990: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (262144): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-03-15 02:05:36.087995: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (524288): Total Chunks: 1, Chunks in use: 1. 781.2KiB allocated for chunks. 781.2KiB in use in bin. 781.2KiB client-requested in use in bin.
2022-03-15 02:05:36.088001: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (1048576): Total Chunks: 1, Chunks in use: 0. 1.24MiB allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-03-15 02:05:36.088005: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (2097152): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-03-15 02:05:36.088009: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (4194304): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-03-15 02:05:36.088014: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (8388608): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-03-15 02:05:36.088018: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (16777216): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-03-15 02:05:36.088023: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (33554432): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-03-15 02:05:36.088028: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (67108864): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-03-15 02:05:36.088033: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (134217728): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-03-15 02:05:36.088037: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (268435456): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-03-15 02:05:36.088041: I tensorflow/core/common_runtime/bfc_allocator.cc:1033] Bin for 1.49GiB was 256.00MiB, Chunk State:
2022-03-15 02:05:36.088046: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] Next region of size 2097152
2022-03-15 02:05:36.088052: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 303850000 of size 256 next 1
2022-03-15 02:05:36.088057: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 303850100 of size 256 next 2
2022-03-15 02:05:36.088060: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 303850200 of size 256 next 3
2022-03-15 02:05:36.088063: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 303850300 of size 256 next 4
2022-03-15 02:05:36.088067: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 303850400 of size 256 next 5
2022-03-15 02:05:36.088071: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] InUse at 303850500 of size 800000 next 6
2022-03-15 02:05:36.088074: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] Free at 303913a00 of size 1295872 next 18446744073709551615
2022-03-15 02:05:36.088079: I tensorflow/core/common_runtime/bfc_allocator.cc:1071] Summary of in-use Chunks by size:
2022-03-15 02:05:36.088083: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 5 Chunks of size 256 totalling 1.2KiB
2022-03-15 02:05:36.088087: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 1 Chunks of size 800000 totalling 781.2KiB
2022-03-15 02:05:36.088091: I tensorflow/core/common_runtime/bfc_allocator.cc:1078] Sum Total of in-use chunks: 782.5KiB
2022-03-15 02:05:36.088095: I tensorflow/core/common_runtime/bfc_allocator.cc:1080] total_region_allocated_bytes_: 2097152 memory_limit_: 68719476736 available bytes: 68717379584 curr_region_allocation_bytes_: 2147483648
2022-03-15 02:05:36.088103: I tensorflow/core/common_runtime/bfc_allocator.cc:1086] Stats:
Limit: 68719476736
InUse: 801280
MaxInUse: 801280
NumAllocs: 6261
MaxAllocSize: 800000
Reserved: 0
PeakReserved: 0
LargestFreeBlock: 0
2022-03-15 02:05:36.088110: W tensorflow/core/common_runtime/bfc_allocator.cc:474] ***************************************_____________________________________________________________
2022-03-15 02:05:36.088143: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at training_ops.cc:973 : RESOURCE_EXHAUSTED: OOM when allocating tensor with shape[20000,20000] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator gpu_host_bfc
I think putting 20,000 neurons in the hidden layers is the problem. You can reduce the number of neurons in the hidden layers to see the difference. I used 128 and 64 neurons in CPU mode to get around the problem.
def get_model():
    model = keras.Sequential([
        keras.layers.Flatten(input_shape=(32, 32, 3)),
        keras.layers.Dense(128, activation='relu'),
        keras.layers.Dense(64, activation='relu'),
        keras.layers.Dense(10, activation='sigmoid')
    ])
    model.compile(optimizer='SGD', loss='categorical_crossentropy', metrics=['accuracy'])
    return model
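For scale, a quick sanity check on the numbers in the log (just arithmetic on the shapes reported there): the kernel of the second Dense(20000) layer, fed by 20,000 units, is a [20000, 20000] float32 tensor, and that single tensor already accounts for the allocation that fails.

bytes_needed = 20000 * 20000 * 4    # float32 kernel of the second Dense(20000) layer
print(bytes_needed)                 # 1600000000, the "rounded to 1600000000" in the OOM message
print(bytes_needed / 2**30)         # ~1.49 GiB, matching "ran out of memory trying to allocate 1.49GiB"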