Close Menu

    Subscribe to Updates

    Get the latest creative news from FooBar about art, design and business.

    What's Hot

    Are consumers doomed to pay more for electricity due to data center buildouts?

    The $599 MacBook Neo is Apple’s long-awaited colorful, lower-cost MacBook

    No fooling: NASA targets April 1 for Artemis II launch to the Moon

    Facebook X (Twitter) Instagram
    • Artificial Intelligence
    • Business Technology
    • Cryptocurrency
    • Gadgets
    • Gaming
    • Health
    • Software and Apps
    • Technology
    Facebook X (Twitter) Instagram Pinterest Vimeo
    Tech AI Verse
    • Home
    • Artificial Intelligence

      What the polls say about how Americans are using AI

      February 27, 2026

      Tensions between the Pentagon and AI giant Anthropic reach a boiling point

      February 21, 2026

      Read the extended transcript: President Donald Trump interviewed by ‘NBC Nightly News’ anchor Tom Llamas

      February 6, 2026

      Stocks and bitcoin sink as investors dump software company shares

      February 4, 2026

      AI, crypto and Trump super PACs stash millions to spend on the midterms

      February 2, 2026
    • Business

      Weighing up the enterprise risks of neocloud providers

      March 3, 2026

      A stolen Gemini API key turned a $180 bill into $82,000 in two days

      March 3, 2026

      These ultra-budget laptops “include” 1.2TB storage, but most of it is OneDrive trial space

      March 1, 2026

      FCC approves the merger of cable giants Cox and Charter

      February 28, 2026

      Finding value with AI and Industry 5.0 transformation

      February 28, 2026
    • Crypto

      Strait of Hormuz Shutdown Shakes Asian Energy Markets

      March 3, 2026

      Wall Street’s Inflation Alarm From Iran — What It Means for Crypto

      March 3, 2026

      Ethereum Price Prediction: What To Expect From ETH In March 2026

      March 3, 2026

      Was Bitcoin Hijacked? How Institutional Interests Shaped Its Narrative Since 2015

      March 3, 2026

      XRP Whales Now Hold 83.7% of All Supply – What’s Next For Price?

      March 3, 2026
    • Technology

      Are consumers doomed to pay more for electricity due to data center buildouts?

      March 4, 2026

      The $599 MacBook Neo is Apple’s long-awaited colorful, lower-cost MacBook

      March 4, 2026

      No fooling: NASA targets April 1 for Artemis II launch to the Moon

      March 4, 2026

      Downdetector, Speedtest sold to IT service-provider Accenture in $1.2B deal

      March 4, 2026

      FCC chair calls Paramount/WBD merger “a lot cleaner” than defunct Netflix deal

      March 4, 2026
    • Others
      • Gadgets
      • Gaming
      • Health
      • Software and Apps
    Check BMI
    Tech AI Verse
    You are at:Home»Technology»LLMs contain a LOT of parameters. But what’s a parameter?
    Technology

    LLMs contain a LOT of parameters. But what’s a parameter?

    TechAiVerseBy TechAiVerseJanuary 10, 2026No Comments13 Mins Read2 Views
    Facebook Twitter Pinterest Telegram LinkedIn Tumblr Email Reddit
    LLMs contain a LOT of parameters. But what’s a parameter?
    Share
    Facebook Twitter LinkedIn Pinterest WhatsApp Email

    LLMs contain a LOT of parameters. But what’s a parameter?

    MIT Technology Review Explains: Let our writers untangle the complex, messy world of technology to help you understand what’s coming next. You can read more from the series here.

    I am writing this because one of my editors woke up in the middle of the night and scribbled on a bedside notepad: “What is a parameter?” Unlike a lot of thoughts that hit at 4 a.m., it’s a really good question—one that goes right to the heart of how large language models work. And I’m not just saying that because he’s my boss. (Hi, Boss!)

    A large language model’s parameters are often said to be the dials and levers that control how it behaves. Think of a planet-size pinball machine that sends its balls pinging from one end to the other via billions of paddles and bumpers set just so. Tweak those settings and the balls will behave in a different way.  

    OpenAI’s GPT-3, released in 2020, had 175 billion parameters. Google DeepMind’s latest LLM, Gemini 3, may have at least a trillion—some think it’s probably more like 7 trillion—but the company isn’t saying. (With competition now fierce, AI firms no longer share information about how their models are built.)

    But the basics of what parameters are and how they make LLMs do the remarkable things that they do are the same across different models. Ever wondered what makes an LLM really tick—what’s behind the colorful pinball-machine metaphors? Let’s dive in.  

    What is a parameter?

    Think back to middle school algebra, like 2a + b. Those letters are parameters: Assign them values and you get a result. In math or coding, parameters are used to set limits or determine output. The parameters inside LLMs work in a similar way, just on a mind-boggling scale. 

    How are they assigned their values?

    Short answer: an algorithm. When a model is trained, each parameter is set to a random value. The training process then involves an iterative series of calculations (known as training steps) that update those values. In the early stages of training, a model will make errors. The training algorithm looks at each error and goes back through the model, tweaking the value of each of the model’s many parameters so that next time that error is smaller. This happens over and over again until the model behaves in the way its makers want it to. At that point, training stops and the values of the model’s parameters are fixed.

    Sounds straightforward …

    In theory! In practice, because LLMs are trained on so much data and contain so many parameters, training them requires a huge number of steps and an eye-watering amount of computation. During training, the 175 billion parameters inside a medium-size LLM like GPT-3 will each get updated tens of thousands of times. In total, that adds up to quadrillions (a number with 15 zeros) of individual calculations. That’s why training an LLM takes so much energy. We’re talking about thousands of specialized high-speed computers running nonstop for months.

    Oof. What are all these parameters for, exactly?

    There are three different types of parameters inside an LLM that get their values assigned through training: embeddings, weights, and biases. Let’s take each of those in turn.

    Okay! So, what are embeddings?

    An embedding is the mathematical representation of a word (or part of a word, known as a token) in an LLM’s vocabulary. An LLM’s vocabulary, which might contain up to a few hundred thousand unique tokens, is set by its designers before training starts. But there’s no meaning attached to those words. That comes during training.  

    When a model is trained, each word in its vocabulary is assigned a numerical value that captures the meaning of that word in relation to all the other words, based on how the word appears in countless examples across the model’s training data.

    Each word gets replaced by a kind of code?

    Yeah. But there’s a bit more to it. The numerical value—the embedding—that represents each word is in fact a list of numbers, with each number in the list representing a different facet of meaning that the model has extracted from its training data. The length of this list of numbers is another thing that LLM designers can specify before an LLM is trained. A common size is 4,096.

    Every word inside an LLM is represented by a list of 4,096 numbers?  

    Yup, that’s an embedding. And each of those numbers is tweaked during training. An LLM with embeddings that are 4,096 numbers long is said to have 4,096 dimensions.

    Why 4,096?

    It might look like a strange number. But LLMs (like anything that runs on a computer chip) work best with powers of two—2, 4, 8, 16, 32, 64, and so on. LLM engineers have found that 4,096 is a power of two that hits a sweet spot between capability and efficiency. Models with fewer dimensions are less capable; models with more dimensions are too expensive or slow to train and run. 

    Using more numbers allows the LLM to capture very fine-grained information about how a word is used in many different contexts, what subtle connotations it might have, how it relates to other words, and so on.

    Back in February, OpenAI released GPT-4.5, the firm’s largest LLM yet (some estimates have put its parameter count at more than 10 trillion). Nick Ryder, a research scientist at OpenAI who worked on the model, told me at the time that bigger models can work with extra information, like emotional cues, such as when a speaker’s words signal hostility: “All of these subtle patterns that come through a human conversation—those are the bits that these larger and larger models will pick up on.”

    The upshot is that all the words inside an LLM get encoded into a high-dimensional space. Picture thousands of words floating in the air around you. Words that are closer together have similar meanings. For example, “table” and “chair” will be closer to each other than they are to “astronaut,” which is close to “moon” and “Musk.” Way off in the distance you can see “prestidigitation.” It’s a little like that, but instead of being related to each other across three dimensions, the words inside an LLM are related across 4,096 dimensions.

    Yikes.

    It’s dizzying stuff. In effect, an LLM compresses the entire internet into a single monumental mathematical structure that encodes an unfathomable amount of interconnected information. It’s both why LLMs can do astonishing things and why they’re impossible to fully understand.    

    Okay. So that’s embeddings. What about weights?

    A weight is a parameter that represents the strength of a connection between different parts of a model—and one of the most common types of dial for tuning a model’s behavior. Weights are used when an LLM processes text.

    When an LLM reads a sentence (or a book chapter), it first looks up the embeddings for all the words and then passes those embeddings through a series of neural networks, known as transformers, that are designed to process sequences of data (like text) all at once. Every word in the sentence gets processed in relation to every other word.

    This is where weights come in. An embedding represents the meaning of a word without context. When a word appears in a specific sentence, transformers use weights to process the meaning of that word in that new context. (In practice, this involves multiplying each embedding by the weights for all other words.)

    And biases?

    Biases are another type of dial that complement the effects of the weights. Weights set the thresholds at which different parts of a model fire (and thus pass data on to the next part). Biases are used to adjust those thresholds so that an embedding can trigger activity even when its value is low. (Biases are values that are added to an embedding rather than multiplied with it.) 

    By shifting the thresholds at which parts of a model fire, biases allow the model to pick up information that might otherwise be missed. Imagine you’re trying to hear what somebody is saying in a noisy room. Weights would amplify the loudest voices the most; biases are like a knob on a listening device that pushes quieter voices up in the mix. 

    Here’s the TL;DR: Weights and biases are two different ways that an LLM extracts as much information as it can out of the text it is given. And both types of parameters are adjusted over and over again during training to make sure they do this. 

    Okay. What about neurons? Are they a type of parameter too? 

    No, neurons are more a way to organize all this math—containers for the weights and biases, strung together by a web of pathways between them. It’s all very loosely inspired by biological neurons inside animal brains, with signals from one neuron triggering new signals from the next and so on. 

    Each neuron in a model holds a single bias and weights for every one of the model’s dimensions. In other words, if a model has 4,096 dimensions—and therefore its embeddings are lists of 4,096 numbers—then each of the neurons in that model will hold one bias and 4,096 weights. 

    Neurons are arranged in layers. In most LLMs, each neuron in one layer is connected to every neuron in the layer above. A 175-billion-parameter model like GPT-3 might have around 100 layers with a few tens of thousands of neurons in each layer. And each neuron is running tens of thousands of computations at a time. 

    Dizzy again. That’s a lot of math.

    That’s a lot of math.

    And how does all of that fit together? How does an LLM take a bunch of words and decide what words to give back?

    When an LLM processes a piece of text, the numerical representation of that text—the embedding—gets passed through multiple layers of the model. In each layer, the value of the embedding (that list of 4,096 numbers) gets updated many times by a series of computations involving the model’s weights and biases (attached to the neurons) until it gets to the final layer.

    The idea is that all the meaning and nuance and context of that input text is captured by the final value of the embedding after it has gone through a mind-boggling series of computations. That value is then used to calculate the next word that the LLM should spit out. 

    It won’t be a surprise that this is more complicated than it sounds: The model in fact calculates, for every word in its vocabulary, how likely that word is to come next and ranks the results. It then picks the top word. (Kind of. See below …) 

    That word is appended to the previous block of text, and the whole process repeats until the LLM calculates that the most likely next word to spit out is one that signals the end of its output. 

    That’s it?  

    Sure. Well …

    Go on.

    LLM designers can also specify a handful of other parameters, known as hyperparameters. The main ones are called temperature, top-p, and top-k.

    You’re making this up.

    Temperature is a parameter that acts as a kind of creativity dial. It influences the model’s choice of what word comes next. I just said that the model ranks the words in its vocabulary and picks the top one. But the temperature parameter can be used to push the model to choose the most probable next word, making its output more factual and relevant, or a less probable word, making the output more surprising and less robotic. 

    Top-p and top-k are two more dials that control the model’s choice of next words. They are settings that force the model to pick a word at random from a pool of most probable words instead of the top word. These parameters affect how the model comes across—quirky and creative versus trustworthy and dull.   

    One last question! There has been a lot of buzz about small models that can outperform big models. How does a small model do more with fewer parameters?

    That’s one of the hottest questions in AI right now. There are a lot of different ways it can happen. Researchers have found that the amount of training data makes a huge difference. First you need to make sure the model sees enough data: An LLM trained on too little text won’t make the most of all its parameters, and a smaller model trained on the same amount of data could outperform it. 

    Another trick researchers have hit on is overtraining. Showing models far more data than previously thought necessary seems to make them perform better. The result is that a small model trained on a lot of data can outperform a larger model trained on less data. Take Meta’s Llama LLMs. The 70-billion-parameter Llama 2 was trained on around 2 trillion words of text; the 8-billion-parameter Llama 3 was trained on around 15 trillion words of text. The far smaller Llama 3 is the better model. 

    A third technique, known as distillation, uses a larger model to train a smaller one. The smaller model is trained not only on the raw training data but also on the outputs of the larger model’s internal computations. The idea is that the hard-won lessons encoded in the parameters of the larger model trickle down into the parameters of the smaller model, giving it a boost. 

    In fact, the days of single monolithic models may be over. Even the largest models on the market, like OpenAI’s GPT-5 and Google DeepMind’s Gemini 3, can be thought of as several small models in a trench coat. Using a technique called “mixture of experts,” large models can turn on just the parts of themselves (the “experts”) that are required to process a specific piece of text. This combines the abilities of a large model with the speed and lower power consumption of a small one.

    But that’s not the end of it. Researchers are still figuring out ways to get the most out of a model’s parameters. As the gains from straight-up scaling tail off, jacking up the number of parameters no longer seems to make the difference it once did. It’s not so much how many you have, but what you do with them.

    Can I see one?

    You want to see a parameter? Knock yourself out: Here’s an embedding.

    Share. Facebook Twitter Pinterest LinkedIn Reddit WhatsApp Telegram Email
    Previous ArticleThe man who made India digital isn’t done yet
    Next Article The Download: war in Europe, and the company that wants to cool the planet
    TechAiVerse
    • Website

    Jonathan is a tech enthusiast and the mind behind Tech AI Verse. With a passion for artificial intelligence, consumer tech, and emerging innovations, he deliver clear, insightful content to keep readers informed. From cutting-edge gadgets to AI advancements and cryptocurrency trends, Jonathan breaks down complex topics to make technology accessible to all.

    Related Posts

    Are consumers doomed to pay more for electricity due to data center buildouts?

    March 4, 2026

    The $599 MacBook Neo is Apple’s long-awaited colorful, lower-cost MacBook

    March 4, 2026

    No fooling: NASA targets April 1 for Artemis II launch to the Moon

    March 4, 2026
    Leave A Reply Cancel Reply

    Top Posts

    Ping, You’ve Got Whale: AI detection system alerts ships of whales in their path

    April 22, 2025703 Views

    Lumo vs. Duck AI: Which AI is Better for Your Privacy?

    July 31, 2025288 Views

    6.7 Cummins Lifter Failure: What Years Are Affected (And Possible Fixes)

    April 14, 2025164 Views

    6 Best MagSafe Phone Grips (2025), Tested and Reviewed

    April 6, 2025124 Views
    Don't Miss
    Technology March 4, 2026

    Are consumers doomed to pay more for electricity due to data center buildouts?

    Are consumers doomed to pay more for electricity due to data center buildouts? To avoid…

    The $599 MacBook Neo is Apple’s long-awaited colorful, lower-cost MacBook

    No fooling: NASA targets April 1 for Artemis II launch to the Moon

    Downdetector, Speedtest sold to IT service-provider Accenture in $1.2B deal

    Stay In Touch
    • Facebook
    • Twitter
    • Pinterest
    • Instagram
    • YouTube
    • Vimeo

    Subscribe to Updates

    Get the latest creative news from SmartMag about art & design.

    About Us
    About Us

    Welcome to Tech AI Verse, your go-to destination for everything technology! We bring you the latest news, trends, and insights from the ever-evolving world of tech. Our coverage spans across global technology industry updates, artificial intelligence advancements, machine learning ethics, and automation innovations. Stay connected with us as we explore the limitless possibilities of technology!

    Facebook X (Twitter) Pinterest YouTube WhatsApp
    Our Picks

    Are consumers doomed to pay more for electricity due to data center buildouts?

    March 4, 20262 Views

    The $599 MacBook Neo is Apple’s long-awaited colorful, lower-cost MacBook

    March 4, 20262 Views

    No fooling: NASA targets April 1 for Artemis II launch to the Moon

    March 4, 20261 Views
    Most Popular

    7 Best Kids Bikes (2025): Mountain, Balance, Pedal, Coaster

    March 13, 20250 Views

    VTOMAN FlashSpeed 1500: Plenty Of Power For All Your Gear

    March 13, 20250 Views

    Best TV Antenna of 2025

    March 13, 20250 Views
    © 2026 TechAiVerse. Designed by Divya Tech.
    • Home
    • About Us
    • Contact Us
    • Privacy Policy
    • Terms & Conditions

    Type above and press Enter to search. Press Esc to cancel.