Job Description
Responsibilities:
· Lead research into state-of-the-art model optimization techniques, including Quantization-Aware Training (QAT), Pruning, Knowledge Distillation, and Neural Architecture Search (NAS), to minimize inference latency.
· Design and implement scalable AI deployment architectures that can handle high-throughput data streams from multiple high-resolution cameras and process sensors simultaneously.
· Conduct hardware-software co-design to optimize models for specific deployment targets (e.g., NVIDIA Jetson with TensorRT, FPGAs, or specialized AI accelerators).
· Develop and manage asynchronous data pipelines that sustain bottleneck-free throughput from image acquisition through to the final pass/fail judgment decision.
· Establish rigorous performance profiling benchmarks to track model latency and memory footprint across various manufacturing environments.
· Work with the System Integrator (SI) to ensure that optimized models are seamlessly integrated into the factory-level software stack.
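The asynchronous-pipeline responsibility above can be illustrated with a minimal, framework-agnostic sketch: a bounded queue decouples frame acquisition from inference, so a slow stage applies back-pressure instead of dropping frames or growing memory without bound. The stage functions and frame format here are hypothetical stand-ins, not any specific camera SDK or inference engine.

```python
import queue
import threading

def run_pipeline(num_frames, capacity=8):
    """Toy acquisition -> inference -> decision pipeline (illustrative only)."""
    frames = queue.Queue(maxsize=capacity)  # bounded queue provides back-pressure
    results = []

    def acquire():
        # Stand-in for a camera driver callback producing frames.
        for i in range(num_frames):
            frames.put({"frame_id": i, "pixels": [i] * 4})
        frames.put(None)  # sentinel: acquisition finished

    def infer_and_decide():
        # Stand-in for model inference plus the pass/fail judgment step.
        while True:
            frame = frames.get()
            if frame is None:
                break
            score = sum(frame["pixels"]) / len(frame["pixels"])
            results.append((frame["frame_id"], "pass" if score < 2 else "fail"))

    producer = threading.Thread(target=acquire)
    consumer = threading.Thread(target=infer_and_decide)
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return results
```

In a production system each stage would run in its own process or CUDA stream, but the bounded-queue structure is the same.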
Job Requirements:
· Ph.D. in Computer Engineering, Computer Science, Electrical Engineering, or a related field with a focus on High-Performance AI.
· Deep understanding of AI Inference Engines (e.g., TensorRT, ONNX Runtime, OpenVINO).
· Mastery of Model Compression techniques (Pruning, Quantization, Distillation).
· Expertise in C++ and Python for high-performance implementation.
· Hands-on experience with Parallel Computing (CUDA, OpenCL).
· Familiarity with Mixed-Precision Training and FP16/INT8 deployment.
· Proven ability to architect end-to-end AI systems that balance the competing demands of throughput, latency, and model accuracy.
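The quantization expertise listed above boils down to a simple numerical core. As a sketch, symmetric per-tensor INT8 quantization maps floats to 8-bit codes via a single scale factor (the helper names here are illustrative, not from any particular framework):

```python
def quantize_int8(values):
    """Map floats to int8 codes with a single symmetric scale (per-tensor)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs else 1.0  # map the largest magnitude to 127
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate floats from int8 codes."""
    return [c * scale for c in codes]

weights = [0.5, -1.27, 0.02, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
```

QAT goes a step further by simulating this round-trip during training so the model learns to tolerate the rounding error, rather than being quantized only after the fact.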