The Tesla P4 is engineered to deliver real-time inference performance and enable smart user experiences in scale-out servers.
RESPONSIVE EXPERIENCE WITH REAL-TIME INFERENCE
Responsiveness is key to user engagement for services such as interactive speech, visual search, and video recommendations. As models grow in accuracy and complexity, CPUs can no longer deliver a responsive user experience. The Tesla P4 delivers 22 TOPS (tera-operations per second) of inference performance with INT8 operations, cutting latency by 15X compared with CPU-based solutions.
UNPRECEDENTED EFFICIENCY FOR LOW-POWER SCALE-OUT SERVERS
The Tesla P4’s small form factor and 50 W/75 W power footprint accelerate density-optimized, scale-out servers. It also provides up to 60X better energy efficiency than CPUs for deep learning inference workloads, letting hyperscale customers meet the exponential growth in demand for AI applications.
UNLOCK NEW AI-BASED VIDEO SERVICES WITH A DEDICATED DECODE ENGINE
Tesla P4 can transcode and infer up to 35 HD video streams in real time, powered by a dedicated hardware-accelerated decode engine that operates in parallel with the GPU cores running inference. By integrating deep learning into the video pipeline, customers can offer smart, innovative video services that were previously impossible.
FASTER DEPLOYMENT WITH TensorRT AND DEEPSTREAM SDK
TensorRT is a library for optimizing deep learning models for production deployment. It takes trained neural networks, typically represented in 32-bit (FP32) or 16-bit (FP16) floating-point precision, and optimizes them for reduced-precision INT8 operations. NVIDIA DeepStream SDK taps into the power of Pascal GPUs to simultaneously decode and analyze video streams.
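The core idea behind reduced-precision inference is mapping floating-point weights onto the small INT8 range using a scale factor. The sketch below illustrates that arithmetic in plain Python; it is a conceptual example only, not the TensorRT API (TensorRT additionally uses calibration data to pick scales that minimize accuracy loss).

```python
# Illustrative sketch of symmetric per-tensor INT8 quantization, the
# reduced-precision scheme applied to trained FP32/FP16 weights.
# Hypothetical helper names; not part of TensorRT.

def quantize_int8(weights):
    """Map float weights onto [-127, 127] with a symmetric scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from INT8 codes."""
    return [v * scale for v in q]

weights = [0.82, -1.27, 0.05, 0.40, -0.63]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
# Each recovered value differs from the original by at most half a scale step.
```

Storing and computing on 8-bit integers instead of 32-bit floats is what lets the P4's INT8 path deliver its throughput and efficiency gains, at the cost of a small, bounded quantization error per weight.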