Job Description
Responsibilities:
· Lead research into state-of-the-art optimization techniques, including Quantization-Aware Training (QAT), Pruning, Knowledge Distillation, and Neural Architecture Search (NAS) to minimize latency.
· Design and implement scalable AI deployment architectures that can handle high-throughput data streams from multiple high-resolution cameras and process sensors simultaneously.
· Conduct hardware-software co-design to optimize models for specific deployment targets (e.g., NVIDIA Jetson, TensorRT, FPGAs, or specialized AI accelerators).
· Develop and manage asynchronous data pipelines that ensure zero-bottleneck performance from image acquisition to "final sentencing" decisions.
· Establish rigorous performance profiling benchmarks to track model latency and memory footprint across various manufacturing environments.
· Work with the System Integrator (SI) to ensure that optimized models are seamlessly integrated into the factory-level software stack.

Job Requirements:
· Ph.D. in Computer Engineering, Computer Science, Electrical Engineering, or a related field with a focus on High-Performance AI.
· Deep understanding of AI Inference Engines (e.g., TensorRT, ONNX Runtime, OpenVINO).
· Mastery of Model Compression techniques (Pruning, Quantization, Distillation).
· Expertise in C++ and Python for high-performance implementation.
· Hands-on experience with Parallel Computing (CUDA, OpenCL).
· Familiarity with Mixed-Precision Training and FP16/INT8 deployment.
· Proven ability to architect end-to-end AI systems that balance the trade-off between throughput, latency, and model precision.