
PDMFC
AI Engineer
August 1, 2025
Position Overview
We are seeking a talented and innovative AI Engineer with exceptional Python skills to join our cutting-edge AI team. The ideal candidate will have hands-on experience deploying and optimizing large language models (LLMs) and multimodal AI systems using frameworks such as llama.cpp and vLLM. You will be responsible for building and maintaining robust AI services that power our speech-to-text, text-to-speech, summarization, retrieval-augmented generation (RAG), and multimodal generation capabilities. This role offers an exciting opportunity to work at the intersection of multiple AI domains and develop solutions that transform how our users interact with technology.
Key Responsibilities
- Design, implement, and optimize AI model serving infrastructure using llama.cpp, vLLM, or similar frameworks to ensure efficient inference and deployment
- Develop and maintain speech-to-text and text-to-speech pipelines that deliver high-quality audio processing capabilities
- Create robust RAG systems that effectively retrieve relevant information and generate accurate, contextually appropriate responses
- Build and optimize multimodal AI systems capable of processing and generating text, images, and audio content
- Implement efficient document summarization capabilities for various content types and lengths
- Continuously evaluate and improve model performance, latency, and resource utilization
- Collaborate with cross-functional teams to integrate AI capabilities into our product ecosystem
- Research and implement state-of-the-art techniques to enhance our AI systems' capabilities
- Develop monitoring and evaluation frameworks to track AI system performance and identify areas for improvement
Required Qualifications
- Bachelor's degree in Computer Science, Artificial Intelligence, or a related technical field
- Strong Python programming skills with demonstrated experience in AI/ML development
- Hands-on experience deploying and optimizing LLMs using llama.cpp, vLLM, or similar frameworks
- Experience with speech processing technologies (ASR/TTS) and relevant libraries
- Knowledge of RAG architectures and vector databases (e.g., Pinecone, Weaviate, Milvus)
- Familiarity with multimodal AI systems that process text, images, and audio
- Experience with deep learning frameworks such as PyTorch or TensorFlow
- Understanding of model quantization, optimization, and serving techniques
- Proficiency with containerization technologies (Docker) and cloud environments
Preferred Qualifications
- Master's or PhD in AI, ML, NLP, or a related field
- Experience with ONNX Runtime, TensorRT, or other model optimization frameworks
- Knowledge of distributed systems for large-scale AI model serving
- Experience with streaming data processing for real-time AI applications
- Familiarity with GPU/TPU acceleration and optimization techniques
- Contributions to open-source AI projects or research publications
- Experience with fine-tuning and adapting foundation models for specific tasks
- Knowledge of MLOps practices and tools for AI system deployment and monitoring
- Experience with prompt engineering and LLM evaluation methodologies
What We Offer
- Opportunity to work on cutting-edge AI technologies that impact millions of users
- Access to state-of-the-art computing resources and the latest AI models
- Collaborative environment with leading experts in AI research and engineering
- Continuous learning and professional development opportunities
- Competitive salary package and comprehensive benefits
- Flexible working arrangements to support work-life balance
- Regular opportunities to present and publish your work
Join our team and help shape the future of multimodal AI systems. Your expertise in Python development and AI model serving will be instrumental in creating seamless, efficient, and powerful AI experiences that transform how people interact with technology across multiple modalities.
If you have any questions, please email us at