    From Rust to Reality: The Hidden Journey of Fetch_max




    How a Job Interview Sent Me Down a Compiler Rabbit Hole

    I occasionally interview candidates for engineering roles. We need people who understand
    concurrent programming. One of our favorite questions involves keeping track of a
    maximum value across multiple producer threads – a classic pattern that appears in many
    real-world systems.

    Candidates can use any language they want.
    In Java (the language I know best), you might write a CAS loop,
    or if you’re feeling functional, use updateAndGet() with a lambda:

    AtomicLong highScore = new AtomicLong(100);
    // [...]
    highScore.updateAndGet(current -> Math.max(current, newScore));

    But that lambda is doing work – it’s still looping under the hood, retrying if
    another thread interferes. You can see the loop right in AtomicLong’s source code.

    Then one candidate chose Rust.

    I was following along as he started typing, expecting to see either an explicit
    CAS loop or some functional wrapper around one. But instead, he just wrote:

    high_score.fetch_max(new_score, Ordering::Relaxed);

    “Rust has fetch_max built in,” he explained casually, moving on to the next
    part of the problem.

    Hold on. This wasn’t a wrapper around a loop pattern – this was a first-class
    atomic operation, sitting right there next to fetch_add and fetch_or. Java
    doesn’t have this. C++ doesn’t have this. How could Rust just… have this?

    After the interview, curiosity got the better of me. Why would Rust provide
    fetch_max as a built-in intrinsic? Intrinsics usually exist to leverage
    specific hardware instructions. But x86-64 doesn’t have an atomic max
    instruction. So there had to be a CAS loop somewhere in the pipeline. Unless…
    maybe some architectures do have this instruction natively? And if so, how
    does the same Rust code work on both?

    I had to find out. Was the loop in Rust’s standard library? Was it in LLVM?
    Was it generated during code generation for x86-64?

    So I started digging. What I found was a fascinating journey through five
    distinct layers of compiler transformations, each one peeling back another level
    of abstraction, until I found exactly where that loop materialized. Let me share
    what I discovered.

    Layer 1: The Rust Code

    Let’s start with what that candidate wrote – a simple high score tracker that can
    be safely updated from multiple threads:

    use std::sync::atomic::{AtomicU64, Ordering};

    fn main() {
        let high_score = AtomicU64::new(100);

        // [...]

        // Another thread reports a new score of 200
        let _old_score = high_score.fetch_max(200, Ordering::Relaxed);

        // [...]
    }

    // Save this snippet as `main.rs`; we are going to use it later.

    This single line does exactly what it promises: atomically fetches the current
    value, compares it with the new one, updates it if the new value is greater, and
    returns the old value. It’s safe, concise, and impossible to mess up. No
    explicit loops, no retry logic visible anywhere. But how does it actually work under
    the hood?
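
    Before we peel back the layers, it helps to see what fetch_max is saving us
    from. Below is a hand-rolled equivalent, a sketch of my own using
    compare_exchange_weak rather than anything from the original snippet. Keep its
    shape in mind; we should expect to find this exact loop hiding somewhere in
    the compilation pipeline.

    use std::sync::atomic::{AtomicU64, Ordering};

    // A hand-written `fetch_max`: load the current value, compute the max, and
    // retry with a CAS until no other thread has changed the value underneath us.
    fn fetch_max_by_hand(slot: &AtomicU64, new_score: u64) -> u64 {
        let mut current = slot.load(Ordering::Relaxed);
        loop {
            if current >= new_score {
                return current; // already at least new_score; nothing to store
            }
            match slot.compare_exchange_weak(
                current,
                new_score,
                Ordering::Relaxed,
                Ordering::Relaxed,
            ) {
                Ok(previous) => return previous,     // our CAS won the race
                Err(observed) => current = observed, // lost the race; retry with the fresh value
            }
        }
    }

    fn main() {
        let high_score = AtomicU64::new(100);
        assert_eq!(fetch_max_by_hand(&high_score, 200), 100); // returns the old value
        assert_eq!(high_score.load(Ordering::Relaxed), 200);
    }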

    Layer 2: The Macro Expansion

    Before our fetch_max call even reaches anywhere close to machine code generation,
    there’s another layer of abstraction at work. The fetch_max method isn’t hand-written
    for each atomic type – it’s generated by a Rust macro called atomic_int!.

    If we peek into Rust’s standard library source code, we find that AtomicU64
    and all its methods are actually created by
    this macro:

    atomic_int! {
        cfg(target_has_atomic = "64"),
        // ... various configuration attributes ...
        atomic_umin, atomic_umax, // The intrinsics to use
        8,                        // Alignment
        u64 AtomicU64             // The type to generate
    }

    Inside this macro, fetch_max is defined as a
    template
    that works for any integer type:

    pub fn fetch_max(&self, val: $int_type, order: Ordering) -> $int_type {
        // SAFETY: data races are prevented by atomic intrinsics.
        unsafe { $max_fn(self.v.get(), val, order) }
    }

    The $max_fn placeholder gets replaced with atomic_umax for unsigned types
    and atomic_max for signed types. This single macro definition generates
    fetch_max methods for AtomicI8, AtomicU8, AtomicI16, AtomicU16, and so
    on – all the way up to AtomicU128.
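
    To get a feel for how one macro body can serve many types, here is a toy
    sketch of the same pattern. It is not the real atomic_int! macro, and the
    names are made up for illustration; it only shows how a declarative macro
    stamps out an identical wrapper for several atomic integer types.

    use std::sync::atomic::{AtomicU32, AtomicU64, Ordering};

    // Toy macro: generate the same `fetch_max` wrapper for different widths.
    macro_rules! max_helper {
        ($fn_name:ident, $atomic:ty, $int:ty) => {
            fn $fn_name(slot: &$atomic, val: $int) -> $int {
                slot.fetch_max(val, Ordering::Relaxed)
            }
        };
    }

    max_helper!(max_u32, AtomicU32, u32);
    max_helper!(max_u64, AtomicU64, u64);

    fn main() {
        let hits = AtomicU64::new(10);
        assert_eq!(max_u64(&hits, 42), 10); // fetch_max returns the previous value
        assert_eq!(hits.load(Ordering::Relaxed), 42);

        let small = AtomicU32::new(7);
        max_u32(&small, 3); // 3 < 7, so the stored value stays 7
        assert_eq!(small.load(Ordering::Relaxed), 7);
    }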

    So our simple fetch_max call is actually invoking generated code. But what
    does the atomic_umax function actually do? To answer that, we need
    to see what the Rust compiler produces next.

    Layer 3: LLVM IR

    Now that we know fetch_max is macro-generated code calling atomic_umax,
    let’s see what happens when the Rust compiler processes it. The compiler
    doesn’t go straight to assembly. First, it translates the code into an
    intermediate representation. Rust uses the LLVM compiler project, so it
    generates LLVM Intermediate Representation (IR).

    If we peek at the LLVM IR for our fetch_max call, we see something like this:

    ; Before the transformation
    bb7:
      %0 = atomicrmw umax ptr %self, i64 %val monotonic, align 8
      ...

    This is LLVM’s language for saying: “I need an atomic read-modify-write
    operation. The modification I want to perform is an unsigned maximum.”

    This is a powerful, high-level instruction within the compiler itself. But it
    poses a critical question: does the CPU actually have a single instruction
    called umax? For most architectures, the answer is no. So how does the
    compiler bridge this gap?

    How to See This Yourself

    My goal is not to merely describe what is happening, but to give you the tools to
    see it for yourself. You can trace this transformation step-by-step on your own
    machine.

    First, tell the Rust compiler to stop after generating the LLVM IR:

    rustc --emit=llvm-ir main.rs

    This creates a main.ll file. This file contains the LLVM IR
    representation of your Rust code, including our atomicrmw umax instruction.
    Keep the file around; we’ll use it in the next steps.
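
    The generated file contains the IR for the whole program, so it can be long.
    Assuming a Unix-like shell, a quick way to jump to the interesting line is:

    grep -n 'atomicrmw umax' main.ll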

    Interlude: Compiler Intrinsics

    We’re missing something important. How does the Rust function atomic_umax
    actually become the LLVM instruction atomicrmw umax? This is where compiler
    intrinsics come into play.

    If you dig into Rust’s source code, you’ll find that atomic_umax is
    defined like this:

    /// Updates `*dst` to the max value of `val` and the old value (unsigned comparison)
    #[inline]
    #[cfg(target_has_atomic)]
    #[cfg_attr(miri, track_caller)] // even without panics, this helps for Miri backtraces
    unsafe fn atomic_umax<T: Copy>(dst: *mut T, val: T, order: Ordering) -> T {
        // SAFETY: the caller must uphold the safety contract for `atomic_umax`
        unsafe {
            match order {
                Relaxed => intrinsics::atomic_umax::<T, { AtomicOrdering::Relaxed }>(dst, val),
                Acquire => intrinsics::atomic_umax::<T, { AtomicOrdering::Acquire }>(dst, val),
                Release => intrinsics::atomic_umax::<T, { AtomicOrdering::Release }>(dst, val),
                AcqRel => intrinsics::atomic_umax::<T, { AtomicOrdering::AcqRel }>(dst, val),
                SeqCst => intrinsics::atomic_umax::<T, { AtomicOrdering::SeqCst }>(dst, val),
            }
        }
    }

    But what is this intrinsics::atomic_umax function? If you look at its
    definition, you find something slightly unusual:

    /// Maximum with the current value using an unsigned comparison.
    /// `T` must be an unsigned integer type.
    ///
    /// The stabilized version of this intrinsic is available on the
    /// [`atomic`] unsigned integer types via the `fetch_max` method. For example, [`AtomicU32::fetch_max`].
    #[rustc_intrinsic]
    #[rustc_nounwind]
    pub unsafe fn atomic_umax<T: Copy, const ORD: AtomicOrdering>(dst: *mut T, src: T) -> T;

    There is no body. This is a declaration, not a definition. The
    #[rustc_intrinsic] attribute tells the Rust compiler that this function
    maps directly to a low-level operation understood by the compiler
    itself. When the Rust compiler sees a call to intrinsics::atomic_umax, it
    knows to replace it with the corresponding LLVM instruction.

    So our journey actually looks like this:

    1. fetch_max method (user-facing API)
    2. Macro expands to call atomic_umax function
    3. atomic_umax is a compiler intrinsic
    4. Rustc replaces the intrinsic with LLVM’s atomicrmw umax ← We are here
    5. LLVM processes this instruction…

    Layer 4: The Transformation

    LLVM runs a series of “passes” that analyze and transform the code. The one we’re interested in is called the
    AtomicExpandPass.

    Its job is to look at high-level atomic operations like atomicrmw umax and ask
    the target architecture, “Can you do this natively?”

    When the x86-64 backend says “No, I can’t,” this pass expands the single
    instruction into a sequence of more fundamental ones that the CPU does
    understand. The result is a
    compare-and-swap (CAS) loop.

    We can see this transformation in action by asking LLVM to emit the
    intermediate representation before and after this pass. To see the IR before
    the AtomicExpandPass, run:

    llc -print-before=atomic-expand main.ll -o /dev/null

    Tip: If you do not have llc installed, you can ask rustc to run the pass for you directly.
    rustc -C llvm-args="-print-before=atomic-expand -print-after=atomic-expand" main.rs

    The code will be printed to your terminal. The function containing our atomic max
    looks like this:

    *** IR Dump Before Expand Atomic instructions (atomic-expand) ***
    ; Function Attrs: inlinehint nonlazybind uwtable
    define internal i64 @_ZN4core4sync6atomic9AtomicU649fetch_max17h6c42d6f2fc1a6124E(ptr align 8 %self, i64 %val, i8 %0) unnamed_addr #1 {
    start:
      %_0 = alloca [8 x i8], align 8
      %order = alloca [1 x i8], align 1
      store i8 %0, ptr %order, align 1
      %1 = load i8, ptr %order, align 1
      %_7 = zext i8 %1 to i64
      switch i64 %_7, label %bb2 [
        i64 0, label %bb7
        i64 1, label %bb5
        i64 2, label %bb6
        i64 3, label %bb4
        i64 4, label %bb3
      ]

    bb2:                              ; preds = %start
      unreachable

    bb7:                              ; preds = %start
      %2 = atomicrmw umax ptr %self, i64 %val monotonic, align 8
      store i64 %2, ptr %_0, align 8
      br label %bb1

    bb5:                              ; preds = %start
      %3 = atomicrmw umax ptr %self, i64 %val release, align 8
      store i64 %3, ptr %_0, align 8
      br label %bb1

    bb6:                              ; preds = %start
      %4 = atomicrmw umax ptr %self, i64 %val acquire, align 8
      store i64 %4, ptr %_0, align 8
      br label %bb1

    bb4:                              ; preds = %start
      %5 = atomicrmw umax ptr %self, i64 %val acq_rel, align 8
      store i64 %5, ptr %_0, align 8
      br label %bb1

    bb3:                              ; preds = %start
      %6 = atomicrmw umax ptr %self, i64 %val seq_cst, align 8
      store i64 %6, ptr %_0, align 8
      br label %bb1

    bb1:                              ; preds = %bb3, %bb4, %bb6, %bb5, %bb7
      %7 = load i64, ptr %_0, align 8
      ret i64 %7
    }

    You can see the atomicrmw umax instruction in multiple places, depending on
    the memory ordering specified. This is the high-level atomic operation that the
    compiler backend understands, but the CPU does not.

    To see the IR after the pass has run, use:

    llc -print-after=atomic-expand main.ll -o /dev/null

    This is the relevant part of the output:

    *** IR Dump After Expand Atomic instructions (atomic-expand) ***
    ; Function Attrs: inlinehint nonlazybind uwtable
    define internal i64 @_ZN4core4sync6atomic9AtomicU649fetch_max17h6c42d6f2fc1a6124E(ptr align 8 %self, i64 %val, i8 %0) unnamed_addr #1 {
    start:
      %_0 = alloca [8 x i8], align 8
      %order = alloca [1 x i8], align 1
      store i8 %0, ptr %order, align 1
      %1 = load i8, ptr %order, align 1
      %_7 = zext i8 %1 to i64
      switch i64 %_7, label %bb2 [
        i64 0, label %bb7
        i64 1, label %bb5
        i64 2, label %bb6
        i64 3, label %bb4
        i64 4, label %bb3
      ]

    bb2:                              ; preds = %start
      unreachable

    bb7:                              ; preds = %start
      %2 = load i64, ptr %self, align 8   ; seed expected value
      br label %atomicrmw.start           ; enter CAS loop

    atomicrmw.start:                  ; preds = %atomicrmw.start, %bb7
      %loaded = phi i64 [ %2, %bb7 ], [ %newloaded, %atomicrmw.start ] ; on first iteration: use %2, on retries: use value observed by last cmpxchg
      %3 = icmp ugt i64 %loaded, %val     ; unsigned compare (umax semantics)
      %new = select i1 %3, i64 %loaded, i64 %val ; desired = max(loaded, val)
      %4 = cmpxchg ptr %self, i64 %loaded, i64 %new monotonic monotonic, align 8 ; CAS: if *self==loaded, store new
      %success = extractvalue { i64, i1 } %4, 1   ; boolean: whether the swap happened
      %newloaded = extractvalue { i64, i1 } %4, 0 ; value seen in memory before the CAS
      br i1 %success, label %atomicrmw.end, label %atomicrmw.start ; loop until CAS succeeds

    atomicrmw.end:                    ; preds = %atomicrmw.start
      store i64 %newloaded, ptr %_0, align 8
      br label %bb1

    [... MORE OF THE SAME, JUST FOR DIFFERENT ORDERING..]

    bb1:                              ; preds = %bb3, %bb4, %bb6, %bb5, %bb7
      %7 = load i64, ptr %_0, align 8
      ret i64 %7
    }

    We can see the pass did not change the first part – it still has the code to dispatch based
    on the memory ordering. But in the bb7 block, where we originally had the
    atomicrmw umax LLVM instruction, we now see a full compare-and-swap loop.
    A compiler engineer would say that the atomicrmw umax instruction has been
    “lowered” into a sequence of more primitive operations that are closer to what
    the hardware can actually execute.

    Here’s the simplified logic:

    1. Read (seed): grab the current value (expected).
    2. Compute: desired = umax(expected, val).
    3. Attempt: observed, success = cmpxchg(ptr, expected, desired, [...]).
    4. If success, return observed (the old value). Otherwise set expected = observed and loop.

    This CAS loop is a fundamental pattern in lock-free programming. The compiler
    just built it for us automatically.
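
    If you want to convince yourself that this loop really does hold up under
    contention, here is a small smoke test of my own (not part of the original
    program): several threads race to report scores, and the tracked maximum must
    equal the largest score any of them produced.

    use std::sync::atomic::{AtomicU64, Ordering};
    use std::thread;

    fn main() {
        let high_score = AtomicU64::new(0);
        thread::scope(|s| {
            for t in 0..8u64 {
                let high_score = &high_score;
                s.spawn(move || {
                    for i in 0..10_000u64 {
                        // every thread hammers the same atomic with fetch_max
                        high_score.fetch_max(t * 10_000 + i, Ordering::Relaxed);
                    }
                });
            }
        });
        // the largest value reported by any thread is 7 * 10_000 + 9_999
        assert_eq!(high_score.load(Ordering::Relaxed), 79_999);
    }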

    Layer 5: The Final Product (x86-64 Assembly)

    We’re at the final step. To see the final machine code, you can tell rustc to
    emit the assembly directly:

    rustc --emit=asm main.rs

    This will produce a main.s file containing the final assembly code.
    Inside, you’ll find the result of the cmpxchg loop:

    .LBB8_2:
        movq   -32(%rsp), %rax      # rax = &self
        movq   (%rax), %rax         # rax = *self (seed 'expected')
        movq   %rax, -48(%rsp)      # spill expected to stack
    .LBB8_3:                        # loop head
        movq   -48(%rsp), %rax      # rax = expected
        movq   -32(%rsp), %rcx      # rcx = &self
        movq   -40(%rsp), %rdx      # rdx = val
        movq   %rax, %rsi           # rsi = expected (scratch)
        subq   %rdx, %rsi           # set flags for unsigned compare: expected - val
        cmovaq %rax, %rdx           # if (expected > val) rdx = expected; else rdx = val (compute max)
        lock cmpxchgq %rdx, (%rcx)  # CAS: if *rcx==rax then *rcx=rdx; rax <- old *rcx; ZF=success
        sete   %cl                  # cl = success
        movq   %rax, -56(%rsp)      # spill observed to stack
        testb  $1, %cl              # branch on success
        movq   %rax, -48(%rsp)      # expected = observed (for retry)
        jne    .LBB8_4              # success -> exit
        jmp    .LBB8_3              # failure -> retry

    The syntax might look a bit different from what you’re used to; that’s because it’s
    in AT&T syntax, which is the default for rustc. If you prefer Intel syntax, you can
    use rustc --emit=asm main.rs -C "llvm-args=-x86-asm-syntax=intel" instead.

    I’m not an assembly expert, but you can see the key parts of the CAS loop here:

    • Seed read (first iteration): Load *self once to initialize the expected value.
    • Compute umax without branching: The pair sub + cmova implements desired = max_u(expected, val).
    • CAS operation: On x86-64, cmpxchg uses RAX as the expected value and returns the observed value in RAX; ZF
      encodes success.
    • Retry or finish: If ZF is clear, we failed and need to retry. Otherwise, we are done.

    Note we did not ask rustc to optimize the code. If we did, the compiler would
    generate more efficient assembly: No spills to the stack, fewer jumps, no
    dispatch on memory ordering, etc. But I wanted to keep the output as close
    to the original IR as possible to make it easier to follow.
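
    If you are curious how much tighter the optimized version is, rebuild with
    optimizations enabled and compare the two listings:

    rustc -O --emit=asm main.rs

    With optimizations on, fetch_max is inlined at its call site, so the ordering
    dispatch disappears and the loop works on registers rather than stack slots.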

    The Beauty of Abstraction

    And there we have it. Our journey is complete. We started with a safe, clear,
    single line of Rust and ended with a CAS loop written in assembly language.

    Rust fetch_max → Macro-generated atomic_umax → LLVM atomicrmw umax → LLVM cmpxchg loop → Assembly lock cmpxchg loop

    This journey is a perfect example of the power of modern compilers. We get to
    work at a high level of abstraction, focusing on safety and logic, while the
    compiler handles the messy, error-prone, and incredibly complex task of
    generating correct and efficient code for the hardware.

    So, next time you use an atomic, take a moment to appreciate the incredible,
    hidden journey your code is about to take.

    PS: After completing this journey, I learned that C++26 adds fetch_max too!

    PPS: We are hiring!

    Bonus: Apple Silicon (AArch64)

    Out of curiosity, I also checked how this looks on Apple Silicon (AArch64).
    This architecture does have a native atomic max instruction, so the
    AtomicExpandPass does not need to lower it into a CAS loop. The LLVM IR before and after
    the pass is identical, still containing the atomicrmw umax instruction.
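
    If you are not on an Apple machine, you can still reproduce this by
    cross-compiling, assuming you have added the target with rustup
    (rustup target add aarch64-apple-darwin):

    rustc --target aarch64-apple-darwin --emit=asm main.rs

    On generic AArch64 Linux targets the LSE atomics may not be enabled by
    default; passing -C target-feature=+lse asks the compiler to use them.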

    The final assembly contains a variant of the LDUMAX instruction. This is the relevant part of the assembly:

    ldr    x8, [sp, #16]  # x8 = value to compare with
    ldr    x9, [sp, #8]   # x9 = pointer to the atomic variable
    ldumax x8, x8, [x9]   # atomic unsigned max (relaxed), [x9] = max(x8, [x9]), x8 = old value
    str    x8, [sp, #40]  # store old value
    b      LBB8_11

    Note that AArch64 uses Unified Assembler Language; when reading the snippet above,
    it’s important to remember that the destination register comes first.

    And that’s really it. We could keep digging into the microarchitecture: how these
    instructions are executed at the hardware level, what the LOCK prefix actually does,
    the differences between memory orderings, and so on. But we’ll leave that for another day.

    Alice: “Would you tell me, please, which way I ought to go from here?”
    The Cat: “That depends a good deal on where you want to get to.”
    Alice: “I don’t much care where.”
    The Cat: “Then it doesn’t much matter which way you go.”
    Alice: “…So long as I get somewhere.”
    The Cat: “Oh, you’re sure to do that, if only you walk long enough.”

    – Lewis Carroll, Alice’s Adventures in Wonderland
