Heterogeneous AI Hardware Engineer
Callosum
Location
London
Employment Type
Full time
Location Type
On-site
Department
Intelligent Systems Engineering
Compensation
£101K – £192K • Offers Equity
Compensation reflecting skills and experience levels.
About Us
Artificial intelligence scaled on a bet: that bigger models, more identical chips, and more data would keep delivering. As problems become more complex and the requirements of intelligence more diverse, that bet is breaking down.
We believe that the next era of AI belongs to heterogeneous intelligence: diverse models on diverse chips, each with distinct strengths, working together to form something greater than the sum of their parts. Novel accelerators are emerging from every direction, but no infrastructure exists to bring them together. We are building it.
Callosum is the Intelligent Systems company. We believe intelligence comes from the system, not the model: a system in which chips and models co-evolve to unlock discoveries unreachable under the current paradigm.
We are scientists and engineers solving what others consider impossible. If you thrive on hard problems, are passionate and energised by the scale of the challenge, we'd love to hear from you.
About the Role
As the Heterogeneous AI Hardware Engineer, you will take novel accelerators from initial bring-up to production-grade performance across Callosum’s heterogeneous infrastructure. You will write custom kernels, optimise communication stacks, and profile hardware at its limits, ensuring every new device runs AI workloads at peak efficiency. The work spans firmware, drivers, runtime integration, and deep performance characterisation for large‑scale AI workloads.
The role has strong growth potential toward technical leadership in heterogeneous system architecture design, and involves close collaboration with infrastructure, orchestration, and simulation teams.
Responsibilities
Bring up and characterise new compute hardware from first power‑on through stable multi‑node operation, establishing performance envelopes, thermal and failure boundaries, and integration constraints.
Develop and maintain low‑level interfaces including firmware hooks, drivers, runtime bindings, and communication plugins.
Design and optimise communication paths across interconnects (PCIe, NVLink-class fabrics, RDMA, InfiniBand, RoCE, or emerging coherent links).
Integrate new hardware into distributed AI frameworks and inference runtimes, validating scaling behaviour.
Qualifications
Master’s degree or equivalent practical experience in computer engineering, systems engineering, HPC, or a related field.
Hands-on experience with accelerators, GPUs, AI ASICs, FPGAs, or emerging compute hardware, with strong hardware performance profiling skills.
Experience with CUDA, HIP, Triton, or other accelerator programming environments.
Strong understanding of low‑level systems including memory hierarchies, DMA, driver models, and hardware/software interfaces.
Experience with high-performance communication libraries and collective frameworks (UCX, NCCL-class collectives, or similar).
Comfort working from first principles on unfamiliar hardware and debugging across hardware, firmware, and runtime boundaries.
Nice to Have
Experience enabling hardware for large‑scale ML training or inference workloads.
Exposure to heterogeneous clusters or experimental hardware environments.
Background in networking hardware, distributed runtimes, or kernel‑level work.
Experience collaborating with silicon vendors or hardware teams.
What We Offer
Competitive Salary: £101,000 - £192,000, determined by skills and experience
Equity & Ownership
Medical and dental healthcare
We offer visa sponsorship and relocation benefits to hire the best in the world.
We work in person at our London office. You'll have the tools, space and setup to do your best work, and if you have specific needs, just tell us.
We're committed to building an inclusive workplace where everyone feels welcome, and believe in equal opportunities for all.