Mako, Inc.
Summary
Our R&D team is focused on building the most efficient engine for deploying generative AI models, with efforts ranging from low-level GPU kernel tuning to system-wide optimizations.
We're looking for an expert-level engineer with a strong background in CUDA, ROCm, or Triton kernel optimization. You will lead substantial improvements in GPU performance and play a key part in pioneering AI and machine learning initiatives.
This job is based in either Gdansk or New York City. Remote work will be considered for exceptional candidates.
About Mako
Mako is a venture-backed tech startup building software infrastructure for high-performance AI inference and training on any hardware. There are three core components:
- Mako Compiler automatically selects, tunes, and generates GPU kernels for any hardware platform (you’ll be working on this!)
- Mako Runtime serves compiled models at high performance
- Mako Platform enables users to easily deploy and manage deployments across any cloud
Responsibilities
- Explore and analyze performance bottlenecks in ML training and inference.
- Develop and optimize high-performance computing kernels in Triton, CUDA, and/or ROCm.
- Implement solutions in C/C++ and Python.
- Deep dive into GPU performance optimizations to maximize efficiency and speed.
- Collaborate with the team to extend and improve existing machine learning compilers or frameworks such as MLIR, PyTorch, TensorFlow, ONNX Runtime, and TensorRT (optional but beneficial).
Qualifications
- Bachelor's, Master's, or PhD in Computer Science, Electrical Engineering, or a related field.
- Strong programming skills in C/C++ and Python.
- Deep understanding and experience in GPU performance optimizations.
- Proven experience with kernel optimizations on CUDA, ROCm, or other accelerators.
- General experience with the training and deployment of ML models.
- Experience with distributed systems development or distributed ML workloads.
Bonus Points
- Experience with innovative OSS projects such as FlashAttention, mlc-llm, and vLLM.
- Experience with machine learning compilers or frameworks such as TVM, MLIR, PyTorch, TensorFlow, ONNX Runtime, and TensorRT.
Our Benefits
- Competitive salary and equity package
- Comprehensive health insurance coverage for you and your family
- Remote work option for exceptional candidates
- Generous vacation and paid time off policy
- Modern and comfortable work environment with state-of-the-art equipment and facilities
To Apply
**Fill out this form, or:**
- Send an email to [email protected] with the subject line "GPU Kernel Engineer"
- Attach your resume as a PDF file
- Include in the body of the email any additional information we should be aware of, such as a GitHub account or blog posts you’ve authored.