RTX 2080 Ti with NVLINK - TensorFlow Performance (Includes comparison with GTX 1080 Ti, RTX 2070, 2080, 2080 Ti and Titan V)
Features and Peer-to-Peer data transfer with NVLINK on two RTX 2080 Ti GPUs
RTX 2080 Ti NVLINK "Capability" report from nvidia-smi nvlink -c
RTX 2080 Ti NVLINK Peer-to-Peer performance
simpleP2P
Does NVLINK with the NVIDIA RTX 2080 Ti GPU use both connectors for CUDA memory copies?
p2pBandwidthLatencyTest
TensorFlow performance with 2 RTX 2080 Ti GPUs and NVLINK
TensorFlow CNN: ResNet-50
ResNet-50 - GTX 1080 Ti, RTX 2070, RTX 2080, RTX 2080 Ti and Titan V - TensorFlow - Training performance (images/sec)
TensorFlow LSTM: Big-LSTM 1 Billion Word Dataset
"Big LSTM" - GTX 1080 Ti, RTX 2070, RTX 2080, RTX 2080 Ti and Titan V - TensorFlow - Training performance (words/sec)
Should you get an RTX 2080 Ti (or two, or more) for machine learning work?
This post is a continuation of the NVIDIA RTX GPU testing I have done with TensorFlow in: NVLINK on RTX 2080 TensorFlow and Peer-to-Peer Performance with Linux, and NVIDIA RTX 2080 Ti vs 2080 vs 1080 Ti vs Titan V, TensorFlow Performance with CUDA 10.0. The same job runs performed in those two previous posts are extended here with dual RTX 2080 Ti's. I was also able to add performance numbers for a single RTX 2070.
If you have read the previous posts, you may want to just scroll down and check out the new results tables and plots.
Test system
Hardware
Puget Systems Peak Single
Intel Xeon-W 2175 14-core
128GB memory
1TB Samsung NVMe M.2
GPUs:
GTX 1080 Ti
RTX 2070
RTX 2080 (2)
RTX 2080 Ti (2)
Titan V
Software
Ubuntu
NVIDIA display driver 410.66 (from the CUDA install) NOTE: The 410.48 driver I used in the previous testing was causing system restarts during the Big LSTM job runs with 2 RTX 2080 Ti's and NVLINK.
CUDA 10.0 source builds of:
simpleP2P
p2pBandwidthLatencyTest
TensorFlow 1.10 and 1.4
Docker 18.06.1-ce
NVIDIA-Docker 2.0.3
TensorFlow containers from the NVIDIA NGC registry:
Container image nvcr.io/nvidia/tensorflow:18.08-py3 for "Big LSTM"
Container image nvcr.io/nvidia/tensorflow:18.03-py2 linked with NCCL and CUDA 9.0 for the multi-GPU "CNN"
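The two CUDA sample programs listed above were built from source out of the CUDA 10.0 samples tree. A minimal build sketch follows; the install path is an assumption, so point CUDA_HOME at your actual CUDA location.

```shell
# Build simpleP2P and p2pBandwidthLatencyTest from the CUDA 10.0 samples.
# CUDA_HOME below is an assumption -- adjust to your install location.
CUDA_HOME=${CUDA_HOME:-/usr/local/cuda-10.0}
for src in "$CUDA_HOME/samples/0_Simple/simpleP2P" \
           "$CUDA_HOME/samples/1_Utilities/p2pBandwidthLatencyTest"; do
    if [ -d "$src" ]; then
        make -C "$src"      # builds the test binary in place
    else
        echo "skipping: $src not found (is CUDA 10.0 installed there?)"
    fi
done
```

With the binaries built, running them from their sample directories produces the capability and bandwidth/latency reports quoted later in this post.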
Two versions of TensorFlow were used because the most recent TensorFlow image on NGC does not support multi-GPU for the CNN ResNet-50 training test job that I like to use. For the "Big LSTM billion word" training job, the newest container with TensorFlow 1.10 linked with CUDA 10.0 was used. Both test programs are from "nvidia-examples" in the container instances.
For details on how I have Docker/NVIDIA-Docker configured on my workstation, have a look at the following post along with the links it contains to the rest of that series of posts: How-To Setup NVIDIA Docker and NGC Registry on your Workstation - Part 5 Docker Performance and Resource Tuning
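For reference, the training scripts live under "nvidia-examples" inside the containers. The sketch below shows the general shape of the run lines; the script paths follow the containers' /opt/tensorflow layout, and the flags are illustrative assumptions rather than the exact settings used, so check nvidia-examples in your own container.

```shell
# Approximate run lines for the two benchmarks (inside the NGC containers).
# Paths and flags here are assumptions based on the container layout.
CNN=/opt/tensorflow/nvidia-examples/cnn/nvcnn.py                  # 18.03-py2 container
LSTM=/opt/tensorflow/nvidia-examples/big_lstm/single_lm_train.py  # 18.08-py3 container
for script in "$CNN" "$LSTM"; do
    if [ ! -f "$script" ]; then
        echo "not found: $script (run this inside the matching container)"
    fi
done
# ResNet-50, 2 GPUs:
#   python $CNN --model=resnet50 --batch_size=64 --num_gpus=2
# Big LSTM, 2 GPUs (needs the billion-word dataset under ./data):
#   python $LSTM --mode=train --logdir=./logs --num_gpus=2 --datadir=./data
```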
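As a concrete example, with NVIDIA-Docker 2.x the NGC images above can be launched roughly as sketched here. The --runtime=nvidia flag matches the Docker 18.06 / nvidia-docker 2.0 setup used in this post (newer Docker releases use --gpus=all instead), and nvcr.io requires an NGC login.

```shell
# Minimal sketch of launching the NGC TensorFlow image used for "Big LSTM".
IMG=nvcr.io/nvidia/tensorflow:18.08-py3   # or 18.03-py2 for the CNN job
if command -v docker >/dev/null 2>&1 && command -v nvidia-smi >/dev/null 2>&1; then
    # Requires the nvidia container runtime configured in Docker.
    docker run --runtime=nvidia --rm -it "$IMG"
else
    echo "would run: docker run --runtime=nvidia --rm -it $IMG"
fi
```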
Features and Peer-to-Peer data transfer with NVLINK on two RTX 2080 Ti GPUs
RTX 2080 Ti NVLINK "Capability" report from nvidia-smi nvlink -c
Two links are reported:
GPU 0: GeForce RTX 2080 Ti (UUID: ...)
    Link 0, P2P is supported: true
    Link 0, Access to system memory supported: true
    Link 0, P2P atomics supported: true
    Link 0, System memory atomics supported: true
    Link 0, SLI is supported: true
    Link 0, Link is supported: false
    Link 1, P2P is supported: true
    Link 1, Access to system memory supported: true
    Link 1, P2P atomics supported: true
    Link 1, System memory atomics supported: true
    Link 1, SLI is supported: true
    Link 1, Link is supported: false
Those two links are aggregated over the NVLINK bridge.
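The capability report above comes from nvidia-smi. A small query sketch follows, guarded so it degrades gracefully on a machine without the NVIDIA driver; the -s status query additionally shows whether each link is up and at what speed.

```shell
# Query NVLink capability (-c) and status (-s) via nvidia-smi, if present.
NVSMI=$(command -v nvidia-smi || true)
if [ -n "$NVSMI" ]; then
    "$NVSMI" nvlink -c || true   # per-link capabilities (the report above)
    "$NVSMI" nvlink -s || true   # per-link status, including link speed
else
    echo "nvidia-smi not found: this needs the NVIDIA display driver installed"
fi
```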
RTX 2080 Ti NVLINK Peer-to-Peer performance:
In short, NVLINK with two RTX 2080 Ti GPUs provides the following features and performance,
simpleP2P
Peer-to-Peer memory access: Yes
Unified Virtual Addressing (UVA): Yes
Does NVLINK with the NVIDIA RTX 2080 Ti GPU use both connectors for CUDA memory copies?
Yes, it does!
cudaMemcpyPeer / cudaMemcpy between GPU0 and GPU1: 44.87 GB/s
That is double the unidirectional bandwidth of the RTX 2080.
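That 44.87 GB/s figure is consistent with both links carrying traffic. As a rough sanity check (the 25 GB/s per-link figure is NVIDIA's rated NVLink 2.0 speed per direction, an assumption not stated in the test output):

```latex
2\ \text{links} \times 25\ \mathrm{GB/s} = 50\ \mathrm{GB/s\ theoretical},
\qquad \frac{44.87}{50} \approx 90\%\ \text{of peak measured}
```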
p2pBandwidthLatencyTest
The abbreviated output below shows that two RTX 2080 Ti GPUs with NVLINK provide,
Unidirectional bandwidth: 48 GB/s
Bidirectional bandwidth: 96 GB/s
Latency (Peer-To-Peer disabled),
GPU-GPU: 12 micro seconds
Latency (Peer-To-Peer enabled),
GPU-GPU: 1.3 micro seconds
Bidirectional bandwidth over NVLINK with 2 RTX 2080 Ti GPUs is nearly 100 GB/sec!
P2P Connectivity Matrix
D \ D 0 1
0 1 1
1 1 1
Unidirectional P2P=Disabled Bandwidth Matrix (GB/s)
D \ D 0 1
0 528.83 5.78
1 5.81 531.37
Unidirectional P2P=Enabled Bandwidth (P2P Writes) Matrix (GB/s)
D \ D 0 1
0 532.21 48.37
1 48.38 532.37
Bidirectional P2P=Disabled Bandwidth Matrix (GB/s)
D \ D 0 1
0 535.76 11.31
1 11.42 536.52
Bidirectional P2P=Enabled Bandwidth Matrix (GB/s)
D \ D 0 1
0 535.72 96.40
1 96.40 534.63
P2P=Disabled Latency Matrix (us)