
    SpikingBrain 7B – More efficient than classic LLMs

    By TechAiVerse · September 14, 2025

    SpikingBrain: Spiking Brain-inspired Large Models

    📄 Technical Report: Chinese | English
    🚀 Arxiv: arXiv:2509.05276
    🧩 Models: Available Models


    About SpikingBrain

    Inspired by brain mechanisms, SpikingBrain integrates hybrid efficient attention, MoE modules, and spike encoding into its architecture, supported by a universal conversion pipeline compatible with the open-source model ecosystem. This enables continual pre-training with less than 2% of the data while achieving performance comparable to mainstream open-source models. We further adapt frameworks, operators, parallel strategies, and communication primitives for non-NVIDIA (MetaX) clusters, ensuring stable large-scale training and inference. SpikingBrain achieves over 100× speedup in TTFT for 4M-token sequences, while spiking delivers over 69% sparsity at the micro level. Combined with macro-level MoE sparsity, these advances provide valuable guidance for the design of next-generation neuromorphic chips.
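As a back-of-envelope illustration of how the two sparsity levels compound: the >69% spike sparsity figure is from the report, but the MoE expert counts below are hypothetical, chosen only to show the arithmetic.

```python
# Toy estimate of effective activation density when micro-level spike
# sparsity (report: >69% of spike activations are zero) compounds with
# macro-level MoE sparsity. Expert counts are hypothetical placeholders.

spike_sparsity = 0.69                    # fraction of zero spike activations
active_experts, total_experts = 2, 16    # hypothetical MoE routing

micro_density = 1.0 - spike_sparsity             # ~0.31
macro_density = active_experts / total_experts   # 0.125
effective_density = micro_density * macro_density

print(f"effective density ~ {effective_density:.3f}")  # roughly 4% of dense compute
```

Under these illustrative numbers, only about 4% of the dense-equivalent activations would be computed, which is the kind of headroom the report argues next-generation neuromorphic chips could exploit.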


    Project Structure

    This repository provides the full implementation and weights of SpikingBrain-7B, including the HuggingFace version, vLLM inference version, and quantized version, enabling flexible deployment and research across different scenarios.

    SpikingBrain-7B/
    ├── hf_7B_model/ # HuggingFace version
    ├── run_model/   # Model run examples
    ├── vllm_hymeta/ # vLLM plugins and inference support
    ├── W8ASpike/    # Quantized inference version
    ├── setup.py
    ├── requirements.txt 
    └── README.md 
    

    vLLM-HyMeta

    vllm-hymeta is the plugin adaptation of HyMeta (Hybrid Models built on MetaX GPUs) for the vLLM inference framework, providing efficient inference support on NVIDIA GPUs.

    By leveraging the plugin mechanism in vLLM, hardware backends can be integrated in a modular fashion, bringing the following benefits:

    • Decoupled codebase: Backend-specific code remains independent, keeping the vLLM core cleaner.

    • Reduced maintenance cost: vLLM developers can focus on general functionality without being affected by backend-specific implementations.

    • Faster integration: New backends can be integrated quickly and evolve independently with less engineering effort.

    Container Deployment (NVIDIA)

    sudo docker run -itd \
        --entrypoint /bin/bash \
        --network host \
        --name hymeta-bench \
        --shm-size 160g \
        --gpus all \
        --privileged \
        -v /host_path:/container_path \
        docker.1ms.run/vllm/vllm-openai:v0.10.0

    Plugin Installation

    git clone https://github.com/BICLab/SpikingBrain-7B.git
    cd SpikingBrain-7B
    pip install .

    Recommended environment for installing vllm-hymeta on NVIDIA GPUs:

    decorator
    pyyaml
    scipy
    setuptools
    setuptools-scm
    flash_attn==2.7.3
    flash-linear-attention==0.1
    vllm==0.10.0
    torch==2.7.1

    Run with vLLM

    You can serve a model with vLLM through its standard OpenAI-compatible server entry point.

    You may also set --tensor-parallel-size and --pipeline-parallel-size when launching if you want to run with multiple GPUs.
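The exact serve command was not preserved here; a typical vLLM v0.10 invocation might look like the following, with the model path, served name, and port as hypothetical placeholders:

```shell
# Hypothetical sketch -- not the exact command from the original README.
# `vllm serve` launches an OpenAI-compatible API server for the model.
vllm serve /path/to/SpikingBrain-7B \
    --served-model-name spikingbrain-7b \
    --port 8000 \
    --trust-remote-code
```

The `--trust-remote-code` flag is needed whenever a checkpoint ships custom modeling code, as HuggingFace-format models with non-standard architectures typically do.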


    W8ASpike

    W8ASpike is the quantized inference version of SpikingBrain-7B, aiming to reduce inference cost under low-precision settings and explore the potential of Spiking Neural Networks (SNNs).

    The current implementation adopts pseudo-spiking, where activations are approximated as spike-like signals at the tensor level, rather than true asynchronous event-driven spiking on neuromorphic hardware.

    • Pseudo-spiking: Efficient approximation at the tensor level, suitable for prototyping and research.

    • True-spiking: Requires asynchronous hardware and event-driven operator support, which is beyond the scope of this repository.

    The activation spike encoding process here is inspired by the pseudo-spiking interfaces from BICLab/Int2Spike. For additional PyTorch-based spiking interfaces, please refer to the Int2Spike library.
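The actual encoder lives in the Int2Spike library; as a rough, hypothetical sketch of what tensor-level pseudo-spiking means, one can approximate an activation tensor by integer spike counts at a fixed step size:

```python
import numpy as np

def pseudo_spike_encode(x: np.ndarray, step: float) -> np.ndarray:
    """Approximate activations by integer spike counts at the tensor level.
    This is an illustrative sketch, not the Int2Spike implementation."""
    return np.round(x / step).astype(np.int32)

def pseudo_spike_decode(counts: np.ndarray, step: float) -> np.ndarray:
    """Reconstruct the approximated activations from spike counts."""
    return counts.astype(np.float32) * step

x = np.array([0.0, 0.12, -0.5, 1.03], dtype=np.float32)
step = 0.25
counts = pseudo_spike_encode(x, step)   # small integers instead of floats
x_hat = pseudo_spike_decode(counts, step)
# per-element reconstruction error is bounded by step / 2
```

The integer counts are what make low-precision (e.g. 8-bit) inference cheap at the tensor level; true event-driven spiking would instead emit the counts as asynchronous events on neuromorphic hardware, which is out of scope for this repository.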


    Available Models

    The model weights are hosted on ModelScope. Please select the appropriate version based on your needs:

    • Pre-trained model (7B): https://www.modelscope.cn/models/Panyuqi/V1-7B-base
    • Chat model (7B-SFT): https://www.modelscope.cn/models/Panyuqi/V1-7B-sft-s3-reasoning
    • Quantized weights (7B-W8ASpike): https://www.modelscope.cn/models/Abel2076/SpikingBrain-7B-W8ASpike

    Usage

    Example scripts are provided in run_model/ for running the model with the released checkpoints.

    • Hugging Face
      Load with AutoModelForCausalLM and use as a standard CausalLM (forward or generation); see run_model/run_model_hf.py.
      For the SFT model, a chat template is used; see run_model/run_model_hf_chat_template.py.

    • vLLM
      Perform inference using the provided vLLM Hymeta plugin; see run_model/run_model_vllm.py and the vLLM Hymeta section.

    Performance Evaluation

    Table 1: Performance evaluation of the SpikingBrain-7B pre-trained model. All models are tested with the HuggingFace framework and evaluated using a perplexity-based method. Except for Qwen2.5, the other baselines are trained on limited Chinese data, resulting in clear disadvantages on CMMLU and C-Eval.

    Table 2: Performance evaluation of the SpikingBrain-76B pre-trained model. All models are tested with the vLLM framework and evaluated using a perplexity-based method. Except for Qwen2.5, the other baselines are trained on limited Chinese data, resulting in clear disadvantages on CMMLU and C-Eval.


    Citation

    If you find our work useful, please consider citing SpikingBrain:

    @article{pan2025spikingbrain,
      title={SpikingBrain Technical Report: Spiking Brain-inspired Large Models},
      author={Pan, Yuqi and Feng, Yupeng and Zhuang, Jinghao and Ding, Siyu and Liu, Zehao and Sun, Bohan and Chou, Yuhong and Xu, Han and Qiu, Xuerui and Deng, Anlin and others},
      journal={arXiv preprint arXiv:2509.05276},
      year={2025}
    }