Heterogeneous AI Infrastructure Engineer
Callosum
Location
London
Employment Type
Full time
Location Type
On-site
Department
Intelligent Systems Engineering
Compensation
£101K – £192K • Offers Equity
Compensation reflecting your experience and skills.
About Us
Artificial intelligence scaled on a bet: that bigger models, more identical chips, and more data would keep delivering. As problems become more complex and the requirements of intelligence more diverse, that bet is breaking down.
We believe that the next era of AI belongs to heterogeneous intelligence: diverse models on diverse chips, each with distinct strengths, combining into something greater than the sum of their parts. Novel accelerators are emerging from every direction, but no infrastructure exists to bring them together. We are building it.
Callosum is the Intelligent Systems company. We believe intelligence comes from the system, not the model: chips and models co-evolve to unlock discoveries unreachable under the current paradigm.
We are scientists and engineers solving what others consider impossible. If you thrive on hard problems and are energised by the scale of the challenge, we'd love to hear from you.
About the Role
As the Heterogeneous AI Infrastructure Engineer, you will build and optimise the orchestration and inference serving layers that make Callosum’s heterogeneous AI services fast, reliable, and efficient at scale. You will own the systems that route workloads across diverse accelerators and ensure inference runtimes perform at their limits across heterogeneous silicon. The role combines depth in inference systems with ownership of production infrastructure.
The role has strong growth potential toward technical leadership in silicon-to-system AI platform architecture and reliability, and involves close collaboration with accelerator, runtime, and simulation engineering teams.
Responsibilities
Build and optimise inference serving infrastructure across heterogeneous silicon.
Design and operate heterogeneous compute systems spanning different accelerator types, memory capacities, and interconnect paths.
Own distributed systems concerns end-to-end: coordination, flow control, state management, consistency trade-offs, fault tolerance, and recovery behaviour.
Drive fleet-level performance analysis: profile, benchmark, and eliminate bottlenecks across inference and infrastructure layers.
Build observability and reliability tooling for heterogeneous clusters, including metrics pipelines and performance regression detection.
Skills & Qualifications
Master’s degree or equivalent practical experience in distributed systems, systems engineering, HPC infrastructure, or a related technical field.
Strong experience building or operating production distributed systems with emphasis on performance, reliability, and fault tolerance.
Hands-on experience with Kubernetes or equivalent orchestration systems.
Experience with topology-aware scheduling, NUMA-aware placement, or resource isolation for mixed workloads.
Hands-on experience with inference runtimes (vLLM, SGLang, TensorRT, or similar), with a focus on performance, batching, and latency.
Strong programming ability in C++, Go, Bash, or Python; comfortable with automation in Linux environments.
Comfort working from first principles on heterogeneous infrastructure problems and balancing performance, reliability, and operability.
Nice to Have
Experience with runtimes for distributed training or large-scale inference across clusters.
Experience with network fabrics such as RDMA, InfiniBand, RoCE, or similar.
Familiarity with ML-native job scheduling, such as Ray or Slurm.
Exposure to heterogeneous or experimental AI infrastructure environments.
Experience collaborating with hardware, firmware, or runtime teams.
What We Offer
Competitive Salary: £101,000 - £192,000, determined by skills and experience.
Equity & Ownership
Medical and dental healthcare
We offer visa sponsorship and relocation benefits to hire the best in the world.
We work in person at our London office. You'll have the tools, space, and setup to do your best work, and if you have specific needs, just tell us.
We're committed to building an inclusive workplace where everyone feels welcome, and we believe in equal opportunities for all.