Cloud Systems & Resource Orchestration - Member of Technical Staff

Callosum

IT

London, UK

Posted on May 21, 2026

Location

London

Employment Type

Full time

Location Type

On-site

Department

Intelligent Systems Engineering

About Us

Artificial intelligence scaled on a bet: that bigger models, more identical chips, and more data would keep delivering. As problems grow more complex and the requirements of intelligence more diverse, that bet is breaking down. The next era belongs to heterogeneous intelligence: diverse models on diverse chips, each with distinct strengths, co-evolving into systems whose capabilities are unreachable by any single model or accelerator.

Callosum is the Intelligent Systems company. We built the infrastructure to make heterogeneous intelligence possible. Our co-evolution engine optimises simultaneously across workflows, agents, and silicon. We launched in early 2026, showing orders-of-magnitude improvements in performance and a shift in the cost-performance frontier that no single chip or model provider can deliver.

We believe intelligence comes from the system, not the model.

We are scientists and engineers solving what others consider impossible. If you thrive on hard problems and are energised by the scale of the challenge, we'd love to hear from you.

About the Role

Callosum believes that orders-of-magnitude improvements in AI systems will come through application-aware orchestration across heterogeneous hardware. We are building that vision: infrastructure that looks beyond GPUs and treats the full landscape of compute as a unified, co-evolving system. Current orchestration stacks were built for a homogeneous world: they are naive to the strengths of new chips and blind to the demands of modern multi-agent workflows.

This role defines how Callosum addresses this problem at the cloud and cluster level, transforming a fragmented compute ecosystem into a unified, exploitable resource pool. We are building a new orchestration paradigm that understands accelerator-specific constraints and capabilities. Your work is what makes heterogeneous compute intelligent at scale: every chip placed precisely and allocated efficiently in a stack that is resource-aware and diversity-native.

What You’ll Build

  • Design and build multi-cloud orchestration systems that abstract provider-specific differences behind a unified deployment and scheduling layer

  • Extend Kubernetes, particularly Dynamic Resource Allocation (DRA), to be aware of heterogeneous accelerator topologies and capabilities and of multi-agent AI workflows

  • Implement intelligent load balancing and placement strategies across cloud providers, regions, and hardware types

  • Build control plane systems that enable efficient allocation and management of heterogeneous accelerator capacity while preserving the ability to exploit hardware-specific strengths

  • Collaborate with an Accelerator Systems Software engineer to surface low-level scheduling primitives into the orchestration layer

What You Bring

  • Strong experience with Kubernetes internals: custom controllers, schedulers, device plugins, CRDs, and the DRA framework

  • Experience building or operating multi-cloud infrastructure, with a detailed understanding of the networking, storage, and compute differences between major providers

  • Familiarity with GPU/accelerator resource management in cluster environments (e.g. MIG, time-slicing, device plugins, topology-aware scheduling)

  • Experience with infrastructure-as-code, fleet management, and the reliability engineering required to keep large-scale heterogeneous systems running