    MySQL transactions per second vs. fsyncs per second (2020)

    Jul 2020

    Wondering how many transactions or writes per second MySQL can handle? While
    it depends on many factors, fundamentally it comes down to how many
    transactions MySQL can commit to disk per second. A modern disk can do ~1,000
    fsyncs per second, but MySQL groups multiple writes into each fsync. An okay
    rule of thumb is 5,000-15,000 writes per second, depending on things like
    writes per transaction, number of indexes, hardware, and the size of the
    writes. Read on to understand this in more depth!

    1. Problem 10: Is MySQL’s maximum transactions per second equivalent to fsyncs per second?
    2. Problem 9: Inverted Index

    Napkin friends, from near and far, it’s time for another napkin problem!

    Since the beginning of this newsletter I’ve posed problems for you to try to
    answer. Then in the next month’s edition, you hear my answer. Talking with a few
    of you, it seems many of you read these as posts regardless of their
    problem-answer format.

    That’s why I’ve decided to experiment with a simpler format: posts where I both
    present a problem and solution in one go. This one will be long, since it’ll
    include an answer to last month’s.

    Hope you enjoy this format! As always, you are encouraged to reach out with
    feedback.

    Problem 10: Is MySQL’s maximum transactions per second equivalent to fsyncs per second?

    How many transactions (‘writes’) per second is MySQL capable of?

    A naive model of how a write (a SQL insert/update/delete) to an ACID-compliant
    database like MySQL works might be the following (this applies equally to
    Postgres, or any other relational/ACID-compliant database, but we'll
    proceed to work with MySQL as it's the one I know best):

    1. Client sends query to MySQL over an existing connection: INSERT INTO products (name, price) VALUES ('Sneaker', 100)
    2. MySQL inserts the new record into the write-ahead-log (WAL) and calls
      fsync(2) to tell the operating system to tell the filesystem to tell the
      disk to make sure that this data is for sure, pinky-swear committed to
      the disk. This step, being the most complex, is the one we'll dig into
      below.
    3. MySQL inserts the record into an in-memory page in the backing storage engine
      (InnoDB) so the record will be visible to subsequent queries. Why commit to
      both the storage engine and the WAL? The storage engine is optimized for
      serving the data to queries, and the WAL for writing it safely; we can't
      serve a SELECT efficiently from the WAL!
    4. MySQL returns OK to the client.
    5. MySQL eventually calls fsync(2) to ensure InnoDB commits the page to disk.

    In the event of power-loss at any of these points, the behaviour is
    well-defined, with no nasty surprises, upholding our dear ACID-compliance.
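
    To make step (2) concrete, here is a minimal sketch of the write-then-fsync
    discipline in Rust. It's illustrative only: the file name and record format
    are made up, and real MySQL appends framed records to a pre-allocated redo
    log rather than reopening a file per write.

    use std::fs::OpenOptions;
    use std::io::Write;

    fn append_to_wal(record: &[u8]) -> std::io::Result<()> {
        let mut wal = OpenOptions::new().create(true).append(true).open("wal.log")?;

        // write(2): lands in the kernel's page cache. Fast, but the record is
        // lost if power fails before the cache reaches the disk.
        wal.write_all(record)?;

        // fsync(2): forces the page cache out to the disk. Slow, but the
        // record now survives an unexpected shutdown.
        wal.sync_all()?;
        Ok(())
    }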

    Splendid! Now that we’ve constructed a naive model of how a relational database
    might handle writes safely, we can consider the latency of inserting a new
    record into the database. When we consult the reference napkin numbers, we
    see that the fsync(2) in step (2) is by far the slowest operation in the
    blocking chain at 1 ms.

    For example, the network handling at step (1) takes roughly ~10 μs (the 'TCP
    Echo Server' entry is what we can classify as 'the TCP overhead'). The
    write(2) itself prior to the fsync(2) is also negligible at ~10 μs, since
    this system call essentially just writes to an in-memory buffer (the 'page
    cache') in the kernel. That alone doesn't guarantee the actual bits are
    committed to disk, which means an unexpected loss of power would erase the
    data, dropping our ACID-compliance on the floor. Calling fsync(2) guarantees
    the bits are persisted on the disk and will survive an unexpected system
    shutdown. The downside is that it's 100x slower.
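
    A quick way to verify these numbers on your own hardware is a micro-benchmark
    along these lines (a rough sketch; the file name, write size, and iteration
    count are arbitrary):

    use std::fs::OpenOptions;
    use std::io::Write;
    use std::time::Instant;

    fn main() -> std::io::Result<()> {
        let mut f = OpenOptions::new().create(true).append(true).open("bench.dat")?;
        let buf = [0u8; 128];

        // write(2) alone: typically ~10 μs, since it only hits the page cache.
        let start = Instant::now();
        for _ in 0..1000 {
            f.write_all(&buf)?;
        }
        println!("write avg: {:?}", start.elapsed() / 1000);

        // write(2) + fsync(2): typically ~1 ms per iteration on a modern disk.
        let start = Instant::now();
        for _ in 0..1000 {
            f.write_all(&buf)?;
            f.sync_all()?;
        }
        println!("write+fsync avg: {:?}", start.elapsed() / 1000);
        Ok(())
    }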

    With that, we should be able to form a simple hypothesis on the maximum
    throughput of MySQL:

    The maximum theoretical throughput of MySQL is equivalent to the maximum
    number of fsync(2) calls per second.

    We know from earlier that fsync(2) takes 1 ms, which means we would naively
    expect MySQL to be able to perform in the neighbourhood of: 1 s / 1 ms per
    fsync = 1,000 fsyncs/s = 1,000 transactions/s.

    Excellent. We followed the first three of the napkin math steps: (1) Model the
    system, (2) Identify the relevant latencies, (3) Do the napkin math, (4) Verify
    the napkin calculations against reality.

    On to (4: Verifying)! We’ll write a simple benchmark in Rust that writes to
    MySQL with 16 threads, doing 1,000 insertions each:

    use mysql::prelude::*;
    use mysql::*;
    use std::thread;

    // The connection URL is illustrative; point it at your own instance.
    let pool = Pool::new("mysql://user:password@localhost:3306/napkin").unwrap();

    let mut handles = vec![];
    for _ in 0..16 {
        handles.push(thread::spawn({
            let pool = pool.clone();
            move || {
                let mut conn = pool.get_conn().unwrap();
                // TODO: we should ideally be popping these off a queue in case of a stall
                // in a thread, but this is likely good enough.
                for _ in 0..1000 {
                    conn.exec_drop(
                        r"INSERT INTO products (shop_id, title) VALUES (:shop_id, :title)",
                        params! { "shop_id" => 123, "title" => "aerodynamic chair" },
                    )
                    .unwrap();
                }
            }
        }));
    }

    // Join outside the spawn loop so all 16 threads run concurrently.
    for handle in handles {
        handle.join().unwrap();
    }
    // ~3 seconds, 16,000 insertions
    

    This takes ~3 seconds to perform the 16,000 insertions, or ~5,300 insertions
    per second. That's 5x more than the 1,000 fsyncs per second our napkin math
    told us would be the theoretical maximum transactional throughput!

    Typically with napkin math we aim to be within an order of magnitude, which
    we are. But when I do napkin math, it usually establishes a best-case ceiling
    for the system: from first principles, how fast could this system perform in
    ideal circumstances?

    Rarely is the system 5x faster than the napkin math. When we identify a
    significant-ish gap between the real-life performance and the expected
    performance, I call it the "first-principle gap." This is where curiosity
    sets in. It typically means there's (1) an opportunity to improve the system,
    or (2) a flaw in our model of the system. In this case, only (2) makes sense,
    because the system is faster than we predicted.

    What’s wrong with our model of how the system works? Why aren’t fsyncs per
    second equal to transactions per second?

    First I examined the benchmark… is something wrong? Nope, SELECT COUNT(*)
    FROM products says 16,000. Is the MySQL I'm using configured to not fsync on
    every write? Nope, it's at the safe default.

    Then I sat down and thought about it. Perhaps MySQL is not doing an fsync
    for every single write? If it’s processing 5,300 insertions per second,
    perhaps it’s batching multiple writes together as part of writing to the WAL,
    step (2) above? Since each transaction is so short, MySQL would benefit from
    waiting a few microseconds to see if other transactions want to ride along
    before calling the expensive fsync(2).
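
    As a toy sketch of the idea (not MySQL's actual implementation): a
    group-commit loop drains whatever records queued up while the previous fsync
    was in flight, and pays for a single fsync per batch.

    use std::fs::File;
    use std::io::Write;
    use std::sync::mpsc;
    use std::thread;

    fn main() -> std::io::Result<()> {
        let (tx, rx) = mpsc::channel::<Vec<u8>>();
        let mut wal = File::create("wal.log")?;

        // Committer: one thread owns the WAL and batches queued records.
        let committer = thread::spawn(move || -> std::io::Result<()> {
            while let Ok(first) = rx.recv() {
                let mut batch = vec![first];
                // Ride-along: grab everything else queued in the meantime.
                while let Ok(record) = rx.try_recv() {
                    batch.push(record);
                }
                for record in &batch {
                    wal.write_all(record)?;
                }
                wal.sync_all()?; // one fsync(2) covers the whole batch
            }
            Ok(())
        });

        // Producers: 16 threads, mimicking the benchmark's 16 connections.
        for i in 0..16 {
            let tx = tx.clone();
            thread::spawn(move || {
                for _ in 0..1000 {
                    tx.send(format!("record from thread {i}\n").into_bytes()).unwrap();
                }
            });
        }
        drop(tx);
        committer.join().unwrap()
    }

    A real implementation would also block each producer until the fsync covering
    its record has completed before acknowledging the commit; the sketch only
    shows the batching.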

    We can test this hypothesis by writing a simple bpftrace script to count the
    number of fsync(2) calls during the ~16,000 insertions:

    // Count fsync(2)/fdatasync(2) calls issued by mysqld.
    tracepoint:syscalls:sys_enter_fsync,tracepoint:syscalls:sys_enter_fdatasync
    /comm == "mysqld"/
    {
            @fsyncs = count();
    }
    

    Running this during the ~3 seconds it takes to insert the 16,000 records, we
    get ~8,000 fsync calls:

    $ sudo bpftrace fsync_count.d
    Attaching 2 probes...
    ^C
    
    @fsyncs: 8037
    

    This is a peculiar number. If MySQL were batching fsyncs, we'd expect
    something far lower. It means we're doing on average ~2,500 fsyncs per
    second, at a latency of ~0.4 ms. That's twice as fast as the ~1 ms fsync
    latency we expect from earlier. For sanity, I ran the script that benchmarks
    fsync outside of MySQL again: no, still 1 ms. I looked at the distribution,
    and it was consistently ~1 ms.

    So there’s two things we can draw from this: (1) We’re able to fsync more than
    twice as fast as we expect, (2) Our hypothesis was correct that MySQL is more
    clever than doing one fsync per transaction, however, since fsync also was
    faster than expected, this didn’t explain everything.

    If you remember from above, while committing the transaction could theoretically
    be a single fsync, other features of MySQL might also call fsync. Perhaps
    they’re adding noise?

    We need to group fsync by file descriptor to get a better idea of how MySQL
    uses fsync. However, the raw file descriptor number doesn’t tell us much. We
    can use readlink and the proc file-system to obtain the file name the file
    descriptor points to. Let’s write a bpftrace script to see what’s being
    fsync‘ed:

    tracepoint:syscalls:sys_enter_fsync,tracepoint:syscalls:sys_enter_fdatasync
    /comm == str($1)/
    {
      @fsyncs[args->fd] = count();
      // Resolve each file descriptor to a filename once, via procfs.
      if (@fd_to_filename[args->fd]) {
      } else {
        @fd_to_filename[args->fd] = 1;
        system("echo -n 'fd %d -> ' && readlink /proc/%d/fd/%d",
               args->fd, pid, args->fd);
      }
    }
    
    END {
      clear(@fd_to_filename);
    }
    

    Running this while inserting the 16,000 transactions into MySQL gives us:

    personal@napkin:~$ sudo bpftrace --unsafe fsync_count_by_fd.d mysqld
    Attaching 5 probes...
    fd 5 -> /var/lib/mysql/ib_logfile0 # redo log, or write-ahead-log
    fd 9 -> /var/lib/mysql/ibdata1 # shared mysql tablespace
    fd 11 -> /var/lib/mysql/#ib_16384_0.dblwr # innodb doublewrite-buffer
    fd 13 -> /var/lib/mysql/undo_001 # undo log, to rollback transactions
    fd 15 -> /var/lib/mysql/undo_002 # undo log, to rollback transactions
    fd 27 -> /var/lib/mysql/mysql.ibd # tablespace 
    fd 34 -> /var/lib/mysql/napkin/products.ibd # innodb storage for our products table
    fd 99 -> /var/lib/mysql/binlog.000019 # binlog for replication
    ^C
    
    @fsyncs[9]: 2
    @fsyncs[12]: 2
    @fsyncs[27]: 12
    @fsyncs[34]: 47
    @fsyncs[13]: 86
    @fsyncs[15]: 93
    @fsyncs[11]: 103
    @fsyncs[99]: 2962
    @fsyncs[5]: 4887
    

    What we can observe here is that the majority of the fsyncs are for the
    "redo log", which is what MySQL calls its write-ahead-log (WAL). There are a
    few fsync calls to commit the InnoDB table-spaces, but not nearly as many,
    since we can always recover those from the WAL if we crash in between. Reads
    work just fine prior to the fsync, as the queries can simply be served out
    of memory from InnoDB.

    The only surprising thing here is the substantial volume of writes to the
    binlog, which we haven't mentioned before. You can think of the binlog as the
    "replication stream." It's a stream of events such as row a changed from x to y, row b was deleted, and table u added column c. The primary
    streams this to the read-replicas, which use it to update their own data.

    When you think about it, the binlog and the WAL need to be kept exactly in
    sync: we can't have something committed to one but not the other. If they're
    not in sync, we could lose data through drift in the read-replicas. For
    example, the primary could commit a change to the WAL, lose power, recover,
    and never write it to the binlog.

    Since fsync(2) can only sync a single file-descriptor at a time, how can you
    possibly ensure that both the WAL and the binlog contain the transaction?

    One solution would be to merge the binlog and the WAL into one log. I’m not
    entirely sure why that’s not the case, but likely the reasons are historic. If
    you know, let me know!

    The solution employed by MySQL is an internal two-phase commit between the
    WAL and the binlog. This requires three fsyncs to commit a transaction. This
    and this reference explain the process in more detail. Because the WAL is
    touched twice as part of the two-phase commit, it explains why we see roughly
    ~2x the number of fsyncs to it compared to the binlog in the bpftrace output
    above. The process of grouping multiple transactions into one two-phase
    commit in MySQL is called "group commit."
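
    Stripped of the real bookkeeping, the choreography looks roughly like the
    sketch below. It is illustrative only: two plain files stand in for the redo
    log and the binlog, and the record format is invented.

    use std::fs::File;
    use std::io::Write;

    fn two_phase_commit(redo_log: &mut File, binlog: &mut File, txn: &[u8]) -> std::io::Result<()> {
        // Phase 1 (prepare): record the transaction as prepared in the WAL.
        redo_log.write_all(b"PREPARE ")?;
        redo_log.write_all(txn)?;
        redo_log.sync_all()?; // fsync #1: WAL

        // Write the transaction's row events to the binlog for replication.
        binlog.write_all(txn)?;
        binlog.sync_all()?; // fsync #2: binlog

        // Phase 2 (commit): mark the transaction committed in the WAL. On
        // crash recovery, a prepared-but-uncommitted transaction is resolved
        // by checking whether it made it into the binlog.
        redo_log.write_all(b"COMMIT\n")?;
        redo_log.sync_all()?; // fsync #3: WAL again
        Ok(())
    }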

    Coming back to the bpftrace numbers: it seems the ~16,000 transactions were,
    thanks to group commit, reduced to ~2,885 commits, or ~5.5 transactions per
    commit on average.

    But there’s still one other thing remaining… why was the average latency per
    fsync twice as fast as in our benchmark? Once again, we write a simple
    bpftrace script:

    tracepoint:syscalls:sys_enter_fsync,tracepoint:syscalls:sys_enter_fdatasync
    /comm == "mysqld"/
    {
            @start[tid] = nsecs;
    }
    
    tracepoint:syscalls:sys_exit_fsync,tracepoint:syscalls:sys_exit_fdatasync
    /comm == "mysqld"/
    {
            // Latency histogram in microseconds. (The map is misleadingly
            // named @bytes, which the output below reflects.)
            @bytes = lhist((nsecs - @start[tid]) / 1000, 0, 1500, 100);
            delete(@start[tid]);
    }
    

    This gives us the following histogram (latency in microseconds), confirming
    that we're seeing some very fast fsyncs:

    personal@napkin:~$ sudo bpftrace fsync_latency.d
    Attaching 4 probes...
    ^C
    
    @bytes:
    [0, 100)             439 |@@@@@@@@@@@@@@@                                     |
    [100, 200)             8 |                                                    |
    [200, 300)             2 |                                                    |
    [300, 400)           242 |@@@@@@@@                                            |
    [400, 500)          1495 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
    [500, 600)           768 |@@@@@@@@@@@@@@@@@@@@@@@@@@                          |
    [600, 700)           376 |@@@@@@@@@@@@@                                       |
    [700, 800)           375 |@@@@@@@@@@@@@                                       |
    [800, 900)           379 |@@@@@@@@@@@@@                                       |
    [900, 1000)          322 |@@@@@@@@@@@                                         |
    [1000, 1100)         256 |@@@@@@@@                                            |
    [1100, 1200)         406 |@@@@@@@@@@@@@@                                      |
    [1200, 1300)         690 |@@@@@@@@@@@@@@@@@@@@@@@@                            |
    [1300, 1400)         803 |@@@@@@@@@@@@@@@@@@@@@@@@@@@                         |
    [1400, 1500)         582 |@@@@@@@@@@@@@@@@@@@@                                |
    [1500, ...)         1402 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@    |
    

    To understand exactly what's going on here, we'd have to dig into the
    file-system we're using. That's out of scope (otherwise I'm never going to
    get anything sent out). But, to not leave you completely hanging: presumably
    ext4 is using techniques similar to MySQL's group commit to batch writes
    together in its journal (the equivalent of MySQL's write-ahead-log). In
    ext4's vocabulary, this seems to be called max_batch_time, but the
    documentation on it is scant at best. The disk could also be doing this in
    addition to, or instead of, the file-system. If you know more about this,
    please enlighten me!

    The bottom line is that fsync can perform faster during real-life workloads
    than the 1 ms I obtain on this machine from repeatedly writing and fsyncing
    a single file. Most likely this comes from the ext4 equivalent of group
    commit, which we won't see in a benchmark that never issues multiple fsyncs
    in parallel.
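
    If you want to look for this effect yourself, here's a rough sketch of such a
    parallel benchmark (thread count, iteration count, and file names are
    arbitrary). On a batching file-system, the aggregate rate should comfortably
    exceed the ~1,000 fsyncs/s that the 1 ms single-threaded latency predicts:

    use std::fs::OpenOptions;
    use std::io::Write;
    use std::thread;
    use std::time::Instant;

    fn main() {
        let (threads, per_thread) = (16, 500);
        let start = Instant::now();
        let handles: Vec<_> = (0..threads)
            .map(|t| {
                thread::spawn(move || {
                    // Each thread fsyncs its own file; the journal commits
                    // can still be merged underneath us.
                    let mut f = OpenOptions::new()
                        .create(true)
                        .append(true)
                        .open(format!("bench-{t}.dat"))
                        .unwrap();
                    for _ in 0..per_thread {
                        f.write_all(b"x").unwrap();
                        f.sync_all().unwrap();
                    }
                })
            })
            .collect();
        for handle in handles {
            handle.join().unwrap();
        }
        let total = threads * per_thread;
        println!(
            "{} fsyncs in {:?} (~{:.0}/s)",
            total,
            start.elapsed(),
            total as f64 / start.elapsed().as_secs_f64()
        );
    }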

    This brings us back around to explaining the discrepancy between real life
    and the napkin math of MySQL's theoretical maximum throughput. We are able
    to achieve at least a 5x increase in throughput over raw, serial fsync calls
    due to:

    1. MySQL merging multiple transactions into fewer fsyncs through 'group commit.'
    2. The file-system and/or disk merging multiple fsyncs performed in parallel
      through its own version of 'group commit,' yielding faster individual fsyncs.

    In essence, the same batching technique is used at every layer to improve
    performance.

    While we didn’t manage to explain everything that’s going on here, I certainly
    learned a lot from this investigation. It’d be interesting light of this to play
    with changing the group commit settings to optimize MySQL for throughput over
    latency. This could also be tuned at the file-system level.

    Problem 9: Inverted Index

    Last month, we looked at the inverted index. This data-structure is what's
    behind full-text search, and the way the documents are packed works well for
    set intersections.

    (A) How long do you estimate it’d take to get the ids for title AND see with 2
    million ids for title, and 1 million for see?

    Let's assume that each document id is stored as a 64-bit integer. Then we're
    dealing with 1 * 10^6 * 8 bytes = 8 MB and 2 * 10^6 * 8 bytes = 16 MB. If we
    use an exceptionally simple set intersection algorithm, essentially two
    pointers walking both sorted lists in lockstep (sketched below), we need to
    scan ~24 MB of sequential memory. According to the reference, we can do this
    in 100 μs/MB * 24 MB = 2.4 ms.
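
    For reference, the kind of scan being modelled here is a merge-style
    intersection of two sorted posting lists, a single sequential pass over both
    (assuming uncompressed 64-bit ids):

    // O(n + m) intersection of two sorted posting lists; this sequential
    // walk over both lists is what the ~24 MB scan estimate models.
    fn intersect(a: &[u64], b: &[u64]) -> Vec<u64> {
        let (mut i, mut j) = (0, 0);
        let mut out = Vec::new();
        while i < a.len() && j < b.len() {
            match a[i].cmp(&b[j]) {
                std::cmp::Ordering::Less => i += 1,
                std::cmp::Ordering::Greater => j += 1,
                std::cmp::Ordering::Equal => {
                    out.push(a[i]);
                    i += 1;
                    j += 1;
                }
            }
        }
        out
    }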

    Strangely, the Lucene nightly benchmarks perform these queries at roughly 22
    QPS, or 1000 ms / 22 = 45 ms per query. That's substantially worse than our
    prediction. I was ready to explain why Lucene might be faster (e.g. by
    compressing postings to fewer than 64 bits per id), but not why it might be
    20x slower! We've got ourselves another first-principle gap.

    Some slowness can be due to reading from disk, but since the access pattern
    is sequential, that should only be 2-3x slower. The hardware could be
    different from the reference's, but hardly by anything that'd explain 20x.
    Sending the data to the client might incur a large penalty, but again, 20x
    seems enormous. This type of gap points towards missing something fundamental
    (as we saw with MySQL). Unfortunately, this month I didn't have time to dig
    much deeper, as I prioritized the MySQL problem.

    (B) What about title OR see?

    In this case we'd have to scan roughly as much memory, but handle more
    documents and potentially transfer more back to the client. We'd expect to
    be in roughly the same performance ballpark, ~2.4 ms.

    Lucene in this case does roughly half the throughput, which aligns with our
    relative expectations. But again, in absolute terms, Lucene handles these
    queries in ~100 ms, which is much, much higher than we expect.

    (C) How do the Lucene nightly benchmarks compare for (A) and (B)? This file
    shows some of the actual terms used. If they don't line up, how might you
    explain the discrepancy?

    Answered inline with (A) and (B).

    (D) Let’s imagine that we want title AND see and order the results by the last
    modification date of each document. How long would you expect that to take?

    If the postings are not stored in that order, we'd naively expect that in
    the worst case we'd need to sort roughly ~24 MB of memory, at 5 ms/MB. This
    would land us in the 5 ms/MB * 24 MB ≈ 120 ms query-time ballpark.

    In reality, this seems like an unintentional trick question. If ordered by
    last modification date, the postings are already in roughly that order,
    since new documents are appended to the end of the list. That means they're
    already stored in roughly the right order, so our sort has to move far fewer
    bits around. Even if that weren't the case, we could store a sorted list for
    just this column, which e.g. Lucene allows with doc values.
