    LLMs contain a LOT of parameters. But what’s a parameter?

    By TechAiVerse, January 10, 2026

    MIT Technology Review Explains: Let our writers untangle the complex, messy world of technology to help you understand what’s coming next.

    I am writing this because one of my editors woke up in the middle of the night and scribbled on a bedside notepad: “What is a parameter?” Unlike a lot of thoughts that hit at 4 a.m., it’s a really good question—one that goes right to the heart of how large language models work. And I’m not just saying that because he’s my boss. (Hi, Boss!)

    A large language model’s parameters are often said to be the dials and levers that control how it behaves. Think of a planet-size pinball machine that sends its balls pinging from one end to the other via billions of paddles and bumpers set just so. Tweak those settings and the balls will behave in a different way.  

    OpenAI’s GPT-3, released in 2020, had 175 billion parameters. Google DeepMind’s latest LLM, Gemini 3, may have at least a trillion—some think it’s probably more like 7 trillion—but the company isn’t saying. (With competition now fierce, AI firms no longer share information about how their models are built.)

    But the basics of what parameters are and how they make LLMs do the remarkable things that they do are the same across different models. Ever wondered what makes an LLM really tick—what’s behind the colorful pinball-machine metaphors? Let’s dive in.  

    What is a parameter?

    Think back to middle-school algebra and an expression like 2a + b. Those letters are parameters: Assign them values and you get a result. In math or coding, parameters are used to set limits or determine output. The parameters inside LLMs work in a similar way, just on a mind-boggling scale. 
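
A minimal Python sketch of the algebra example (the function name `expression` is hypothetical): the expression itself never changes, and only the values assigned to the parameters a and b change the result.

```python
def expression(a: float, b: float) -> float:
    """The expression 2a + b, where a and b are the parameters."""
    return 2 * a + b

# Same expression, different parameter values, different results.
print(expression(a=1, b=3))    # 5
print(expression(a=10, b=-4))  # 16
```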

    How are they assigned their values?

    Short answer: an algorithm. Before training begins, each parameter is set to a random value. The training process then involves an iterative series of calculations (known as training steps) that update those values. In the early stages of training, a model will make errors. The training algorithm looks at each error and works back through the model, tweaking the value of each of the model’s many parameters so that the error will be smaller next time. This happens over and over again until the model behaves the way its makers want it to. At that point, training stops and the values of the model’s parameters are fixed.
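
The loop just described (random starting values, measure the error, nudge every parameter to shrink it, repeat) is known as gradient descent. Here is a minimal sketch on a toy model with only two parameters, w and b, that learns the rule y = 3x + 1 from five examples. The data, learning rate and step count are illustrative assumptions; real LLM training computes the same kind of update for billions of parameters, using backpropagation to work out each parameter's share of the error.

```python
import random

# Toy training data generated from the rule y = 3x + 1 (an assumption for illustration).
data = [(x, 3 * x + 1) for x in [-2.0, -1.0, 0.0, 1.0, 2.0]]

# Step 1: every parameter starts at a random value.
random.seed(0)
w = random.uniform(-1, 1)
b = random.uniform(-1, 1)

learning_rate = 0.05

for _ in range(2000):
    # Step 2: compute the gradient of the mean squared error,
    # i.e. how much each parameter contributed to the error.
    grad_w = 0.0
    grad_b = 0.0
    for x, y in data:
        error = (w * x + b) - y
        grad_w += 2 * error * x / len(data)
        grad_b += 2 * error / len(data)
    # Step 3: nudge each parameter in the direction that shrinks the error.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

# Step 4: training stops and the parameter values are fixed.
print(round(w, 3), round(b, 3))  # 3.0 1.0
```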

    Sounds straightforward …

    In theory! In practice, because LLMs are trained on so much data and contain so many parameters, training them requires a huge number of steps and an eye-watering amount of computation. During training, the 175 billion parameters inside a medium-size LLM like GPT-3 will each get updated tens of thousands of times. In total, that adds up to quadrillions (a number with 15 zeros) of individual calculations. That’s why training an LLM takes so much energy. We’re talking about thousands of specialized high-speed computers running nonstop for months.
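
The back-of-the-envelope arithmetic is easy to check. Taking 10,000 updates per parameter as a stand-in for "tens of thousands" (an assumption), GPT-3's 175 billion parameters already account for well over a quadrillion individual updates:

```python
parameters = 175_000_000_000      # GPT-3's parameter count
updates_per_parameter = 10_000    # assumed low end of "tens of thousands"

total_updates = parameters * updates_per_parameter
print(f"{total_updates:,}")       # 1,750,000,000,000,000
print(total_updates >= 10**15)    # True: at least a quadrillion (a 1 followed by 15 zeros)
```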

    Oof. What are all these parameters for, exactly?

    There are three different types of parameters inside an LLM that get their values assigned through training: embeddings, weights, and biases. Let’s take each of those in turn.

    Okay! So, what are embeddings?

    An embedding is the mathematical representation of a word (or part of a word, known as a token) in an LLM’s vocabulary. An LLM’s vocabulary, which might contain up to a few hundred thousand unique tokens, is set by its designers before training starts. But there’s no meaning attached to those words. That comes during training.  

    When a model is trained, each word in its vocabulary is assigned a numerical value that captures the meaning of that word in relation to all the other words, based on how the word appears in countless examples across the model’s training data.
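
In code, an embedding table amounts to a lookup: each token in the vocabulary maps to its own row of numbers, and those numbers are the parameters that training adjusts. This sketch uses a hypothetical five-token vocabulary and four-number embeddings filled with small random starting values, as they would be before training; the names `token_to_id`, `embedding_table` and `embed` are illustrative.

```python
import random

# A hypothetical, tiny vocabulary; real LLMs have up to a few hundred thousand tokens.
vocab = ["table", "chair", "astronaut", "moon", "<end>"]
token_to_id = {token: i for i, token in enumerate(vocab)}

embedding_size = 4  # real models use thousands, e.g. 4,096
random.seed(42)

# One row of numbers per token. Before training these are random;
# training adjusts every number in every row.
embedding_table = [
    [random.gauss(0.0, 0.02) for _ in range(embedding_size)]
    for _ in vocab
]

def embed(token: str) -> list[float]:
    """Look up the embedding (a list of numbers) for a token."""
    return embedding_table[token_to_id[token]]

print(embed("moon"))  # four small random numbers
```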

    Each word gets replaced by a kind of code?

    Yeah. But there’s a bit more to it. The numerical value—the embedding—that represents each word is in fact a list of numbers, with each number in the list representing a different facet of meaning that the model has extracted from its training data. The length of this list of numbers is another thing that LLM designers can specify before an LLM is trained. A common size is 4,096.

    Every word inside an LLM is represented by a list of 4,096 numbers?  

    Yup, that’s an embedding. And each of those numbers is tweaked during training. An LLM with embeddings that are 4,096 numbers long is said to have 4,096 dimensions.
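
Because every token gets its own list of 4,096 numbers, the embedding table alone holds vocabulary size × 4,096 parameters. Assuming a hypothetical vocabulary of 100,000 tokens (the article gives only "up to a few hundred thousand"), the count works out as follows:

```python
vocab_size = 100_000   # hypothetical vocabulary size
dimensions = 4_096     # embedding length

embedding_parameters = vocab_size * dimensions
print(f"{embedding_parameters:,}")  # 409,600,000
```

That is roughly 410 million parameters before a single weight or bias has been counted.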

    Why 4,096?

    It might look like a strange number. But LLMs (like anything that runs on a computer chip) work best with powers of two—2, 4, 8, 16, 32, 64, and so on. LLM engineers have found that 4,096 is a power of two that hits a sweet spot between capability and efficiency. Models with fewer dimensions are less capable; models with more dimensions are too expensive or slow to train and run. 

    Using more numbers allows the LLM to capture very fine-grained information about how a word is used in many different contexts, what subtle connotations it might have, how it relates to other words, and so on.

    In February 2025, OpenAI released GPT-4.5, the firm’s largest LLM yet (some estimates have put its parameter count at more than 10 trillion). Nick Ryder, a research scientist at OpenAI who worked on the model, told me at the time that bigger models can work with extra information, such as emotional cues that signal hostility in a speaker’s words: “All of these subtle patterns that come through a human conversation—those are the bits that these larger and larger models will pick up on.”

    The upshot is that all the words inside an LLM get encoded into a high-dimensional space. Picture thousands of words floating in the air around you. Words that are closer together have similar meanings. For example, “table” and “chair” will be closer to each other than they are to “astronaut,” which is close to “moon” and “Musk.” Way off in the distance you can see “prestidigitation.” It’s a little like that, but instead of being related to each other across three dimensions, the words inside an LLM are related across 4,096 dimensions.
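
Closeness in that space is commonly measured with cosine similarity, which scores how nearly two embeddings point in the same direction (1 means exactly the same direction). The three-number vectors below are made-up values chosen to mimic the picture above, not embeddings taken from any real model.

```python
import math

# Made-up 3-dimensional embeddings; real ones have thousands of dimensions.
embeddings = {
    "table":     [0.9, 0.8, 0.1],
    "chair":     [0.8, 0.9, 0.2],
    "astronaut": [0.1, 0.2, 0.9],
    "moon":      [0.0, 0.3, 0.95],
}

def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Dot product of u and v divided by the product of their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

print(cosine_similarity(embeddings["table"], embeddings["chair"]))      # about 0.99
print(cosine_similarity(embeddings["table"], embeddings["astronaut"]))  # about 0.30
print(cosine_similarity(embeddings["astronaut"], embeddings["moon"]))   # about 0.99
```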

    Yikes.

    It’s dizzying stuff. In effect, an LLM compresses the entire internet into a single monumental mathematical structure that encodes an unfathomable amount of interconnected information. It’s both why LLMs can do astonishing things and why they’re impossible to fully understand.    

    Okay. So that’s embeddings. What about weights?

    A weight is a parameter that represents the strength of a connection between different parts of a model—and one of the most common types of dial for tuning a model’s behavior. Weights are used when an LLM processes text.

    When an LLM reads a sentence (or a book chapter), it first looks up the embeddings for all the words and then passes those embeddings through a type of neural network known as a transformer, built from a stack of layers that are designed to process sequences of data (like text) all at once. Every word in the sentence gets processed in relation to every other word.

    This is where weights come in. An embedding represents the meaning of a word without context. When a word appears in a specific sentence, transformers use weights to process the meaning of that word in that new context. (In practice, this involves multiplying each word’s embedding by sets of weights and using the results to work out how strongly each word should influence the meaning of every other word, a mechanism known as attention.)
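
The mechanism transformers use for this is called attention. Below is a simplified single-head version in plain Python with tiny hypothetical weight matrices: each embedding is multiplied by three weight matrices (producing a query, a key and a value), the query-key scores decide how much each word attends to every other word, and each word's output is a weighted blend of all the values. Real models learn these matrices during training and run many attention heads in every layer; the matrix values and helper names here are assumptions for illustration.

```python
import math

def matvec(matrix: list[list[float]], vector: list[float]) -> list[float]:
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def softmax(scores: list[float]) -> list[float]:
    """Turn arbitrary scores into positive weights that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(embeddings, w_query, w_key, w_value):
    """Single-head scaled dot-product attention over a list of embeddings."""
    queries = [matvec(w_query, e) for e in embeddings]
    keys = [matvec(w_key, e) for e in embeddings]
    values = [matvec(w_value, e) for e in embeddings]
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        # How strongly this word attends to every word in the sentence.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # Context-aware vector: a weighted blend of all the value vectors.
        outputs.append([
            sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))
        ])
    return outputs

# Three hypothetical 2-number embeddings and 2x2 weight matrices.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_q = [[1.0, 0.0], [0.0, 1.0]]
w_k = [[1.0, 0.0], [0.0, 1.0]]
w_v = [[0.5, 0.5], [0.5, -0.5]]

contextual = self_attention(sentence, w_q, w_k, w_v)
for vector in contextual:
    print([round(x, 3) for x in vector])
```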

    And biases?

    Biases are another type of dial that complements the effects of the weights. Weights scale the incoming signals, which determines whether different parts of a model reach the threshold at which they fire (and thus pass data on to the next part). Biases shift that threshold so that an embedding can trigger activity even when its value is low. (Biases are values that are added after an embedding has been multiplied by the weights, rather than multiplied with it.) 

    By shifting the thresholds at which parts of a model fire, biases allow the model to pick up information that might otherwise be missed. Imagine you’re trying to hear what somebody is saying in a noisy room. Weights would amplify the loudest voices the most; biases are like a knob on a listening device that pushes quieter voices up in the mix. 
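
A single artificial neuron shows the division of labor: the weights multiply the inputs, the bias is added to the sum, and an activation function decides whether the neuron fires. This sketch uses ReLU, a common activation that passes positive values through and outputs zero otherwise; the input, weight and bias values are hypothetical, chosen so that a weak signal only gets through thanks to the bias.

```python
def relu(x: float) -> float:
    """Fire (pass the value on) only if the input is positive."""
    return max(0.0, x)

def neuron(inputs: list[float], weights: list[float], bias: float) -> float:
    """Weighted sum of the inputs, plus the bias, passed through an activation."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return relu(weighted_sum + bias)

quiet_input = [0.1, -0.3]   # a weak signal
weights = [0.5, 0.5]

print(neuron(quiet_input, weights, bias=0.0))  # 0.0 (silent)
print(neuron(quiet_input, weights, bias=0.2))  # about 0.1 (fires)
```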

    Here’s the TL;DR: Weights and biases are two different ways that an LLM extracts as much information as it can out of the text it is given. And both types of parameters are adjusted over and over again during training to make sure they do this. 

    Okay. What about neurons? Are they a type of parameter too? 

    No, neurons are more a way to organize all this math—containers for the weights and biases, strung together by a web of pathways between them. It’s all very loosely inspired by biological neurons inside animal brains, with signals from one neuron triggering new signals from the next and so on. 

    Each neuron in a model holds a single bias and weights for every one of the model’s dimensions. In other words, if a model has 4,096 dimensions—and therefore its embeddings are lists of 4,096 numbers—then each of the neurons in that model will hold one bias and 4,096 weights. 
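
Parameter counts follow directly from that description: a neuron with one weight per dimension plus one bias holds 4,097 parameters, and a layer of such neurons multiplies that figure by its width. The layer width below is a hypothetical number chosen for illustration.

```python
dimensions = 4_096          # length of each embedding
neurons_in_layer = 16_384   # hypothetical layer width

params_per_neuron = dimensions + 1   # 4,096 weights + 1 bias
params_per_layer = neurons_in_layer * params_per_neuron

print(f"{params_per_neuron:,}")   # 4,097
print(f"{params_per_layer:,}")    # 67,125,248
```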

    Neurons are arranged in layers. In most LLMs, each neuron in one layer is connected to every neuron in the layer above. A 175-billion-parameter model like GPT-3 might have around 100 layers with a few tens of thousands of neurons in each layer. And each neuron is running tens of thousands of computations at a time. 
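
Those round numbers roughly add up. The sketch below uses the published sizes of the largest GPT-3 model (96 layers, embeddings 12,288 numbers long, and a vocabulary of 50,257 tokens) together with a common shortcut, assumed here rather than an exact count, that each transformer layer holds about 12 × dimensions² weights: 4 × d² for attention and 8 × d² for the neuron layers, ignoring biases and other small terms. The total lands close to 175 billion.

```python
layers = 96            # GPT-3 175B
dimensions = 12_288    # embedding length in GPT-3 175B
vocab_size = 50_257    # GPT-2/GPT-3 tokenizer vocabulary

# Per layer: 4 * d^2 for the attention weight matrices (query, key, value, output)
# plus 8 * d^2 for the feed-forward neurons (d -> 4d -> d). Biases are ignored.
per_layer = 12 * dimensions ** 2
embedding_table = vocab_size * dimensions

total = layers * per_layer + embedding_table
print(f"{total:,}")   # 174,563,733,504
```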

    Dizzy again. That’s a lot of math.

    That’s a lot of math.

    And how does all of that fit together? How does an LLM take a bunch of words and decide what words to give back?

    When an LLM processes a piece of text, the numerical representation of that text—the embedding—gets passed through multiple layers of the model. In each layer, the value of the embedding (that list of 4,096 numbers) gets updated many times by a series of computations involving the model’s weights and biases (attached to the neurons) until it gets to the final layer.

    The idea is that all the meaning and nuance and context of that input text is captured by the final value of the embedding after it has gone through a mind-boggling series of computations. That value is then used to calculate the next word that the LLM should spit out. 

    It won’t be a surprise that this is more complicated than it sounds: The model in fact calculates, for every word in its vocabulary, how likely that word is to come next and ranks the results. It then picks the top word. (Kind of. See below …) 
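
That ranking step usually works like this: the model produces one raw score (called a logit) for each word in its vocabulary, a softmax function turns the scores into probabilities that sum to 1, and the words are sorted by probability. The four-word vocabulary and its scores are hypothetical.

```python
import math

def softmax(scores: dict[str, float]) -> dict[str, float]:
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(scores.values())
    exps = {word: math.exp(s - largest) for word, s in scores.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}

# Hypothetical raw scores for the word that follows "The astronaut landed on the"
logits = {"moon": 4.0, "table": 0.5, "chair": 0.2, "runway": 2.5}

probabilities = softmax(logits)
ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
for word, p in ranked:
    print(f"{word:>7}: {p:.3f}")

top_word = ranked[0][0]   # the top-ranked word: "moon"
```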

    That word is appended to the previous block of text, and the whole process repeats until the LLM calculates that the most likely next word to spit out is one that signals the end of its output. 
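
That append-and-repeat loop is the whole generation procedure. In this sketch, `next_word_probabilities` is a hypothetical stand-in for the full model (a hard-coded lookup table rather than billions of parameters), and the loop always takes the most probable word, stopping when the end-of-output token comes out on top.

```python
END = "<end>"

def next_word_probabilities(text: list[str]) -> dict[str, float]:
    """Hypothetical stand-in for an LLM: next-word probabilities
    based only on the last word of the text so far."""
    table = {
        "the": {"astronaut": 0.6, "moon": 0.3, END: 0.1},
        "astronaut": {"landed": 0.8, END: 0.2},
        "landed": {"safely": 0.7, END: 0.3},
        "safely": {END: 0.9, "again": 0.1},
    }
    return table.get(text[-1], {END: 1.0})

def generate(prompt: list[str], max_words: int = 20) -> list[str]:
    text = list(prompt)
    for _ in range(max_words):
        probabilities = next_word_probabilities(text)
        word = max(probabilities, key=probabilities.get)  # pick the top word
        if word == END:                                   # the model signals it is done
            break
        text.append(word)                                 # append and repeat
    return text

print(" ".join(generate(["the"])))  # the astronaut landed safely
```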

    That’s it?  

    Sure. Well …

    Go on.

    LLM designers can also specify a handful of other parameters, known as hyperparameters. The main ones are called temperature, top-p, and top-k.

    You’re making this up.

    Temperature is a parameter that acts as a kind of creativity dial. It influences the model’s choice of what word comes next. I just said that the model ranks the words in its vocabulary and picks the top one. But the temperature parameter can be used to push the model to choose the most probable next word, making its output more factual and relevant, or a less probable word, making the output more surprising and less robotic. 
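
Temperature is typically applied by dividing every raw score by the temperature before the softmax step, and the next word is then drawn at random according to the resulting probabilities. Values below 1 sharpen the distribution toward the top word; values above 1 flatten it, so less likely words get picked more often. The scores and the function name are hypothetical.

```python
import math
import random

def softmax_with_temperature(scores: dict[str, float], temperature: float) -> dict[str, float]:
    """Divide raw scores by the temperature, then convert them to probabilities."""
    scaled = {w: s / temperature for w, s in scores.items()}
    largest = max(scaled.values())
    exps = {w: math.exp(s - largest) for w, s in scaled.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}

logits = {"moon": 4.0, "runway": 2.5, "table": 0.5}

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, {w: round(p, 3) for w, p in probs.items()})

# Sampling: draw one word according to the temperature-adjusted probabilities.
random.seed(0)
probs = softmax_with_temperature(logits, 1.5)
word = random.choices(list(probs), weights=list(probs.values()), k=1)[0]
```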

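    Top-p and top-k are two more dials that control the model’s choice of next words. Instead of always taking the top word, they have the model pick a word at random (weighted by probability) from a pool of the most probable words. Top-k limits that pool to a fixed number (k) of the most probable words; top-p limits it to the smallest set of most probable words whose combined probability reaches a threshold (p). These parameters affect how the model comes across: quirky and creative versus trustworthy and dull.   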
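
Both filters can be sketched as functions that shrink the candidate pool before a word is drawn at random from what remains, with the surviving probabilities rescaled to sum to 1. The probabilities and the thresholds k = 2 and p = 0.9 are hypothetical.

```python
import random

def top_k_filter(probs: dict[str, float], k: int) -> dict[str, float]:
    """Keep only the k most probable words, renormalized."""
    kept = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {w: p / total for w, p in kept}

def top_p_filter(probs: dict[str, float], p: float) -> dict[str, float]:
    """Keep the smallest set of most probable words whose probabilities add up to at least p."""
    kept = []
    cumulative = 0.0
    for word, prob in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept.append((word, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {w: prob / total for w, prob in kept}

probs = {"moon": 0.55, "runway": 0.25, "surface": 0.12, "table": 0.05, "chair": 0.03}

print(top_k_filter(probs, k=2))    # moon and runway only
print(top_p_filter(probs, p=0.9))  # moon, runway, surface (0.55 + 0.25 + 0.12 = 0.92)

random.seed(1)
pool = top_p_filter(probs, p=0.9)
choice = random.choices(list(pool), weights=list(pool.values()), k=1)[0]
```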

    One last question! There has been a lot of buzz about small models that can outperform big models. How does a small model do more with fewer parameters?

    That’s one of the hottest questions in AI right now. There are a lot of different ways it can happen. Researchers have found that the amount of training data makes a huge difference. First you need to make sure the model sees enough data: An LLM trained on too little text won’t make the most of all its parameters, and a smaller model trained on the same amount of data could outperform it. 

    Another trick researchers have hit on is overtraining. Showing models far more data than previously thought necessary seems to make them perform better. The result is that a small model trained on a lot of data can outperform a larger model trained on less data. Take Meta’s Llama LLMs. The 70-billion-parameter version of Llama 2 was trained on around 2 trillion tokens of text; the 8-billion-parameter version of Llama 3 was trained on around 15 trillion tokens. The far smaller Llama 3 model is the better of the two. 
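
Dividing the amount of training data by the number of parameters, using the Llama figures above, shows how much further the smaller model was pushed:

```python
models = {
    # name: (parameters, training data in tokens)
    "Llama 2 70B": (70e9, 2e12),
    "Llama 3 8B": (8e9, 15e12),
}

ratios = {name: tokens / params for name, (params, tokens) in models.items()}
for name, ratio in ratios.items():
    print(f"{name}: about {ratio:,.0f} tokens per parameter")
```

That is more than 65 times as much training data per parameter for the smaller model.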

    A third technique, known as distillation, uses a larger model to train a smaller one. The smaller model is trained not only on the raw training data but also on the outputs of the larger model’s internal computations. The idea is that the hard-won lessons encoded in the parameters of the larger model trickle down into the parameters of the smaller model, giving it a boost. 
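
One common variant, assumed here rather than taken from any particular lab's recipe, trains the small model to match the large model's output probabilities (its full ranking of likely next words) as well as the single correct next word. The sketch computes that blended loss for one prediction; the probabilities, the mixing weight `alpha` and the function names are hypothetical. A student whose probabilities sit closer to the teacher's gets a lower loss, so training pushes it in that direction.

```python
import math

def cross_entropy(target: dict[str, float], predicted: dict[str, float]) -> float:
    """How badly `predicted` matches `target` (lower is better)."""
    return -sum(p * math.log(predicted[w]) for w, p in target.items() if p > 0)

def distillation_loss(student: dict[str, float], teacher: dict[str, float],
                      true_word: str, alpha: float = 0.5) -> float:
    """Blend of matching the teacher's probabilities and predicting the true next word."""
    hard_target = {true_word: 1.0}
    return alpha * cross_entropy(teacher, student) + (1 - alpha) * cross_entropy(hard_target, student)

teacher = {"moon": 0.7, "runway": 0.2, "table": 0.1}
student_close = {"moon": 0.65, "runway": 0.25, "table": 0.10}
student_far = {"moon": 0.2, "runway": 0.2, "table": 0.6}

print(distillation_loss(student_close, teacher, "moon"))  # lower, about 0.62
print(distillation_loss(student_far, teacher, "moon"))    # higher, about 1.55
```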

    In fact, the days of single monolithic models may be over. Even the largest models on the market, like OpenAI’s GPT-5 and Google DeepMind’s Gemini 3, can be thought of as several small models in a trench coat. Using a technique called “mixture of experts,” large models can turn on just the parts of themselves (the “experts”) that are required to process a specific piece of text. This combines the abilities of a large model with the speed and lower power consumption of a small one.
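
At the heart of mixture of experts is a small router that scores every expert for each piece of input and switches on only the best few. This sketch shows top-2 routing with hypothetical toy experts (plain functions standing in for neural-network blocks) and made-up router scores; in real models the router's own weights are parameters learned during training.

```python
import math

def softmax(scores: list[float]) -> list[float]:
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical "experts": each would be a full neural-network block in a real model.
experts = [
    lambda x: [v * 2 for v in x],   # expert 0
    lambda x: [v + 1 for v in x],   # expert 1
    lambda x: [-v for v in x],      # expert 2
    lambda x: [v * v for v in x],   # expert 3
]

def moe_layer(x: list[float], router_scores: list[float],
              top_k: int = 2) -> tuple[list[float], list[int]]:
    """Run only the top_k highest-scoring experts and blend their outputs."""
    chosen = sorted(range(len(experts)), key=lambda i: router_scores[i], reverse=True)[:top_k]
    gate = softmax([router_scores[i] for i in chosen])
    outputs = [experts[i](x) for i in chosen]   # the other experts never run
    blended = [sum(g * out[j] for g, out in zip(gate, outputs)) for j in range(len(x))]
    return blended, chosen

output, used = moe_layer([1.0, 2.0], router_scores=[0.1, 2.0, -1.0, 1.5])
print(used)    # [1, 3]: only two of the four experts did any work
print(output)
```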

    But that’s not the end of it. Researchers are still figuring out ways to get the most out of a model’s parameters. As the gains from straight-up scaling tail off, jacking up the number of parameters no longer seems to make the difference it once did. It’s not so much how many you have, but what you do with them.

    Can I see one?

    You want to see a parameter? Knock yourself out: Here’s an embedding.
