CISPO: Clipped Importance Sampling Policy Optimization →
https://huggingface.co/papers/2506.13585
This RL algorithm from the MiniMax-M1 project clips importance-sampling weights instead of per-token updates, so every token (even rare but crucial ones) still contributes to learning rather than being dropped by token-level clipping. CISPO also drops KL penalties and uses group-relative advantages like GRPO (a minimal sketch follows this list).
PAPO: Perception-Aware Policy Optimization → https://huggingface.co/papers/2507.06448
Enhances RL in vision-language tasks by adding a KL-based perception loss to the GRPO objective for better visual alignment during training. It boosts accuracy by 4–8% and reduces perception errors by ~30%.
OPO: On-Policy RL with Optimal Baseline → https://huggingface.co/papers/2505.23585
A simplified RL algorithm from Microsoft that enforces strict on-policy training, using freshly sampled outputs from the current policy for every update to avoid off-policy drift. Its optimal baseline minimizes gradient variance without auxiliary models or extra regularization.
EXPO: Expressive Policy Optimization → https://huggingface.co/papers/2507.07986
Trains complex policies by pairing a large base model with a lightweight edit policy that suggests better actions, selecting the best of both without backpropagating through the base.
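To make the CISPO item above concrete, here is a minimal PyTorch-style sketch of the idea as I read it; the function name, clip range, and reduction are illustrative assumptions, not the MiniMax-M1 implementation.

```python
import torch

def cispo_loss(logp_new, logp_old, advantages, eps_low=0.2, eps_high=0.2):
    """Hedged sketch of a CISPO-style objective (not the official MiniMax-M1 code).

    Instead of clipping the PPO surrogate per token (which zeroes out gradients for
    tokens outside the trust region), the importance-sampling weight itself is clipped
    and treated as a constant (stop-gradient), so every token still contributes a
    REINFORCE-style gradient scaled by its clipped IS weight.
    """
    ratio = torch.exp(logp_new - logp_old)                            # pi_theta / pi_old, per token
    weight = torch.clamp(ratio, 1 - eps_low, 1 + eps_high).detach()   # clipped IS weight, no gradient
    # advantages: group-relative values (as in GRPO), broadcast over the tokens of each response
    return -(weight * advantages * logp_new).mean()
```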

Reinforcement Learning (RL) won't stay stuck in the same old PPO loop: in the last two months alone, researchers have introduced a new wave of techniques that are reshaping how we train and fine-tune LLMs, VLMs, and agents.
Here are 9 fresh policy optimization techniques worth knowing:
1. GSPO: Group Sequence Policy Optimization → Group Sequence Policy Optimization (2507.18071)
Shifts optimization, clipping, and rewarding from the token level to the sequence level, capturing the full picture and improving stability compared to GRPO. A GSPO-token variant still allows token-level fine-tuning (a sketch follows this list).
2. LAPO: Length-Adaptive Policy Optimization → LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization (2507.15758)
A two-stage RL framework that trains models to adaptively control reasoning length by learning typical solution lengths for shorter and more efficient reasoning.
3. HBPO: Hierarchical Budget Policy Optimization → Hierarchical Budget Policy Optimization for Adaptive Reasoning (2507.15844)
This one trains models to adapt reasoning depth to problem complexity. It divides training samples into subgroups with different token budgets, using budget-aware rewards to align reasoning effort with task difficulty.
4. SOPHIA: Semi-off-policy reinforcement learning → Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning (2507.16814)
Combines on-policy visual understanding from vision-language models (VLMs) with off-policy reasoning from a language model, assigning outcome-based rewards and propagating visual rewards backward through the reasoning steps.
5. RePO: Replay-Enhanced Policy Optimization → RePO: Replay-Enhanced Policy Optimization (2506.09340)
Introduces a replay buffer into on-policy RL for LLMs, retrieving diverse off-policy samples to broaden the training data available for each prompt.
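For intuition on the sequence-level shift in GSPO (item 1), here is a hedged sketch; the tensor shapes, clip range, and length normalization follow my reading of the paper and are not the authors' code.

```python
import torch

def gspo_loss(logp_new, logp_old, advantages, mask, eps=0.2):
    """Hedged sketch of a GSPO-style sequence-level objective (illustrative only).

    The importance ratio is a length-normalized ratio for the whole response and is
    clipped once per sequence, instead of per token as in PPO/GRPO.

    logp_new, logp_old: (batch, seq_len) per-token log-probs under current / old policy
    advantages:         (batch,) group-relative advantage, one value per response
    mask:               (batch, seq_len) 1 for response tokens, 0 for padding
    """
    lengths = mask.sum(dim=-1).clamp(min=1)
    seq_log_ratio = ((logp_new - logp_old) * mask).sum(dim=-1) / lengths
    ratio = torch.exp(seq_log_ratio)                      # one ratio per sequence
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * advantages
    return -torch.minimum(unclipped, clipped).mean()
```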
Read further below ⬇️
If you like it, also subscribe to the Turing Post: https://www.turingpost.com/subscribe

Time to look at some free useful resources that can help you upgrade your knowledge of AI and machine learning!
Today we offer you these 6 must-read surveys that can be your perfect guides to the major fields and techniques:
1. Foundations of Large Language Models by Tong Xiao and Jingbo Zhu → https://arxiv.org/abs/2501.09223
Many recommend this 270-page book as a good resource to focus on fundamental concepts, such as pre-training, generative models, prompting, alignment, and inference
2. Large Language Models Post-Training: Surveying Techniques from Alignment to Reasoning -> A Survey on Post-training of Large Language Models (2503.06072)
Read this to master policy optimization (RLHF, DPO, GRPO), supervised and parameter-efficient fine-tuning, reasoning, integration, and adaptation techniques
3. Agentic Large Language Models, a survey by Leiden University → https://arxiv.org/abs/2503.23037
Surveys agentic LLMs across reasoning, tools, and multi-agent collaboration, highlighting their synergy. It also explores their promise, risks, and applications in medicine, finance, and science.
4. A Survey of Context Engineering for Large Language Models → A Survey of Context Engineering for Large Language Models (2507.13334)
Defines Context Engineering as systematic information design for LLMs beyond prompting, covering retrieval, processing, management, and architectures like RAG and multi-agent systems
5. A Survey of Generative Categories and Techniques in Multimodal Large Language Models → https://arxiv.org/abs/2506.10016
Covers multimodal models, exploring six generative modalities, key techniques (SSL, RLHF, CoT), architectural trends, and challenges
6. Large Language models for Time Series Analysis: Techniques, Applications, and Challenges → https://arxiv.org/abs/2506.11040
Explains how LLMs transform time series analysis by enhancing pattern recognition and long-term dependency handling + shows how to build them
Also, subscribe to the Turing Post: https://www.turingpost.com/subscribe

FreeLoRA → https://huggingface.co/papers/2507.01792
Enables training-free image generation with multiple subjects by fine-tuning each LoRA module on one subject. During inference, subject-aware activation applies modules only to their target tokens, ensuring clean, interference-free fusion.
LoRA-Augmented Generation (LAG) → https://huggingface.co/papers/2507.05346
Uses large collections of task-specific LoRA adapters without needing extra training or data. It selects and applies the most relevant adapters at each layer and token, excelling in knowledge-intensive tasks.
ARD-LoRA (Adaptive Rank Dynamic LoRA) → https://huggingface.co/papers/2506.18267
Adjusts the rank of LoRA adapters dynamically across transformer layers and heads by learning per-head scaling factors through a meta-objective. It balances performance and efficiency, using fewer parameters and less memory.
WaRA → https://huggingface.co/papers/2506.24092
Designed for vision tasks, it uses wavelet transforms to decompose weight updates into multiple resolutions, capturing both coarse and detailed patterns.
BayesLoRA → https://huggingface.co/papers/2506.22809
Adds uncertainty estimation to LoRA adapters using MC-Dropout, helping models gauge confidence in unfamiliar situations. It detects variance outside fine-tuned distributions, supporting more cautious and adaptive model behavior.
Dual LoRA Learning (DLoRAL) → https://huggingface.co/papers/2506.15591
Trains two LoRA branches for video super-resolution: C-LoRA captures temporal coherence from degraded input, while D-LoRA improves visual detail, enhancing both spatial detail and temporal consistency.
Safe Pruning LoRA (SPLoRA) → https://huggingface.co/papers/2506.18931
Improves the safety of LoRA-tuned LMs by selectively removing LoRA layers that reduce alignment, using a new E-DIEM metric to detect safety-related shifts without relying on data labels.
PLoP (Precise LoRA Placement) → https://huggingface.co/papers/2506.20629
A lightweight method that automatically selects optimal LoRA adapter placement during fine-tuning based on the model and task

LoRA (Low-Rank Adaptation) is a popular lightweight method for fine-tuning AI models. Instead of updating the full model, it adds small trainable components (low-rank matrices) while keeping the original weights frozen; only these adapters are trained. A minimal sketch of the idea is below.
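Here is that idea in a minimal PyTorch sketch; the rank, scaling, and initialization values are illustrative defaults, not a reference implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA wrapper around a frozen linear layer (illustrative sketch).

    Output = W x + (alpha / r) * B A x, where W stays frozen and only the low-rank
    matrices A (r x in_features) and B (out_features x r) are trained.
    """
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                       # original weights stay frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at step 0
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```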
Recently, many interesting new LoRA variations came out, so it’s a great time to take a look at these 13 clever approaches:
1. T-LoRA → T-LoRA: Single Image Diffusion Model Customization Without Overfitting (2507.05964)
A timestep-dependent LoRA method for adapting diffusion models with a single image. It dynamically adjusts updates and uses orthogonal initialization to reduce overlap, achieving better fidelity–alignment balance than standard LoRA
2. SingLoRA → SingLoRA: Low Rank Adaptation Using a Single Matrix (2507.05566)
Simplifies LoRA by using only one small matrix instead of the usual two, multiplying it by its own transpose (A × Aᵀ). It uses half the parameters of LoRA and avoids scale mismatch between the two matrices (sketched after this list)
3. LiON-LoRA → LiON-LoRA: Rethinking LoRA Fusion to Unify Controllable Spatial and Temporal Generation for Video Diffusion (2507.05678)
Improves control and precision in video diffusion models when training data is limited. It builds on LoRA, adding 3 key principles: linear scalability, orthogonality, and norm consistency. A controllable token and modified self-attention enable smooth adjustment of motion
4. LoRA-Mixer → LoRA-Mixer: Coordinate Modular LoRA Experts Through Serial Attention Routing (2507.00029)
Combines LoRA and mixture-of-experts (MoE) to adapt LLMs for multiple tasks. It dynamically routes task-specific LoRA experts into linear projections of attention modules, supporting both joint training and frozen expert reuse
5. QR-LoRA → QR-LoRA: Efficient and Disentangled Fine-tuning via QR Decomposition for Customized Generation (2507.04599)
Separates content and style when combining multiple LoRA adapters. It implements QR decomposition to structure parameter updates, where the orthogonal Q matrix reduces interference between features, and the R matrix captures specific transformations
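As a companion to item 2, here is a hedged sketch of the single-matrix idea; it omits the paper's training details (such as any ramp-up scheduling) and assumes a square weight matrix, so treat it as an illustration rather than the authors' method.

```python
import torch
import torch.nn as nn

class SingLoRALinear(nn.Module):
    """Illustrative SingLoRA-style layer: one low-rank matrix A, update = A @ A.T."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        assert base.in_features == base.out_features, "sketch assumes a square layer"
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                        # frozen pretrained weights
        self.A = nn.Parameter(torch.randn(base.in_features, r) * 0.01)
        self.scale = alpha / r

    def forward(self, x):
        delta = self.A @ self.A.T                          # symmetric low-rank update (d x d)
        return self.base(x) + self.scale * (x @ delta)
```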
Read further in the comments 👇
If you like it, also subscribe to the Turing Post: https://www.turingpost.com/subscribe

AllVoiceLab MCP Server -> https://github.com/allvoicelab/AllVoiceLab-MCP
Enables AI agents to access advanced text-to-speech, voice conversion, and video translation APIs, powering use cases like global content localization, AI audiobooks, and voice-driven media production.
MCP Email Server -> https://github.com/Shy2593666979/mcp-server-email
For email functionality: write and send emails with multiple recipients, add and search files within specified directories.
Google Admin MCP Server -> https://github.com/securityfortech/google-admin-mcp
Manage Google Workspace users through the Admin Directory API (list, create, get info about users, etc.)
Android MCP Server -> https://github.com/minhalvp/android-mcp-server
Provides programmatic control over Android devices through ADB (Android Debug Bridge).
DeepView MCP -> https://github.com/ai-1st/deepview-mcp
Enables IDEs (Cursor, Windsurf, etc.) to analyze large codebases using Gemini's extensive context window.
Calculator MCP Server -> https://github.com/githejie/mcp-server-calculator
May sound easy, but it's essential for precise numerical calculations within LLMs.
MCP Aggregator -> https://github.com/nazar256/combine-mcp
Combines multiple MCP servers into a single interface for more convenient use

MCP is redefining how AI assistants connect to the world of data and tools, so no wonder MCP servers are in high demand now. That’s why we’ve curated 13 cool MCP servers to upgrade your workflow:
1. Hugging Face Official MCP Server -> https://github.com/evalstate/hf-mcp-server
Provides access to and interaction with Hugging Face models, datasets, and Gradio Spaces for dynamic tool integration and configuration across environments.
2. Browser MCP -> https://browsermcp.io/
An MCP server + Chrome extension that lets you automate your browser with AI apps like VS Code, Claude, Cursor, and Windsurf.
3. Bright Data MCP -> https://github.com/brightdata/brightdata-mcp
This one is for working with data in real-time: searching the web, navigating websites, taking action and retrieving data.
4. JSON MCP -> https://github.com/VadimNastoyashchy/json-mcp
Interact with JSON files: split, merge, find specific data, and validate content within them.
5. Octagon Deep Research MCP -> https://github.com/OctagonAI/octagon-deep-research-mcp
Allows for deep research via AI agents, integrating seamlessly with MCP clients like Claude Desktop and Cursor for powerful, unlimited research capabilities.
6. VLM Run MCP Server -> https://docs.vlm.run/mcp/introduction
Gives an agent the ability to see, understand, and process visual content.
Read further in the comments 👇
P.S.:
Our most read explanation of MCP on Hugging Face https://huggingface.co/blog/Kseniase/mcp
Our first list of 13 awesome MCP servers: https://huggingface.co/posts/Kseniase/204958200717570
If you like it, also subscribe to the Turing Post: https://www.turingpost.com/subscribe

DeepResearcher -> https://github.com/GAIR-NLP/DeepResearcher
An RL framework for training deep research agents end-to-end in real-world environments with web search, exhibiting emergent behaviors like planning, multi-source validation, self-reflection, and honestly admitting when the agent doesn't know the answer.
Search-R1 -> https://github.com/PeterGriffinJin/Search-R1
Features interleaved search access and an open-source RL training pipeline supporting various algorithms (PPO, GRPO, etc.), LLMs (LLaMA3, Qwen2.5, etc.), and search engines (online, local, retrievers).
ReCall -> https://github.com/Agent-RL/ReCall
Trains LLMs to reason with tools via RL, with no supervised tool-use data needed. It enables agentic tool use in the style of OpenAI o3 and supports synthetic data generation across diverse environments and multi-step tasks.
OWL -> https://github.com/camel-ai/owl
Built on the CAMEL-AI framework, it enables dynamic multi-agent collaboration for task automation across diverse domains
Here's an awesome study exploring the entire roadmap of Deep Research assistants. Don't forget to check it out -> https://huggingface.co/papers/2506.18096

Deep Research agents are quickly becoming our daily co-workers — built for complex investigations, not just chat. With modular architecture, advanced tool use and real web access, they go far beyond typical AI. While big-name agents get the spotlight, we want to highlight some powerful recent open-source alternatives:
1. DeerFlow -> https://github.com/bytedance/deer-flow
A modular multi-agent system combining LMs and tools for automated research and code analysis. It links a coordinator, a planner, a team of specialized agents, and a reporter, and converts reports to speech via Text-to-Speech (TTS)
2. Alita -> https://github.com/CharlesQ9/Alita
Uses a single problem-solving module for scalable reasoning through simplicity. It self-evolves by generating and reusing Model Context Protocols (MCPs) from open-source tools to build external capabilities for diverse tasks
3. WebThinker -> https://github.com/RUC-NLPIR/WebThinker
Lets reasoning models autonomously search the web and navigate pages. Deep Web Explorer allows interaction with links and follow-up searches. Through a Think-Search-and-Draft process, models generate and refine reports in real time. RL training with preference pairs improves the workflow
4. SimpleDeepSearcher -> https://github.com/RUCAIBox/SimpleDeepSearcher
A lightweight framework showing that supervised fine-tuning is a real alternative to complex RL, using simulated web interactions and multi-criteria curation to generate high-quality training data
5. AgenticSeek -> https://github.com/Fosowl/agenticSeek
A private, on-device assistant that picks the best agent expert for browsing, coding, or planning—no cloud needed. Includes voice input via speech-to-text
6. Suna -> https://github.com/kortix-ai/suna
Offers web browsing, file and doc handling, CLI execution, site deployment, and API/service integration—all in one assistant
Subscribe to the Turing Post: https://www.turingpost.com/subscribe
Read further ⬇️

Constraint-Based Decoding -> https://huggingface.co/papers/2502.05111
Guide generation using hard constraints, like context-free grammar (CFG) rules. This keeps outputs aligned with task goals, especially in structured prediction or planning. Can be combined with symbolic solvers or logic-checking agents.
Exploration Prompts (Explore-then-Pick) -> https://huggingface.co/papers/2506.09014
Generate multiple diverse responses via sampling, then use a learned Sample Set Aggregator (SSA), trained with reinforcement learning, to pick the best answer. Similar to “draft → verify” strategies, but the final selection is done by a trained model, not heuristics (see the sketch after this list).
Prompt Perturbation Sampling for Inference -> https://huggingface.co/papers/2502.11027
From a pool of diverse model responses sampled with prompt perturbation, distill only the most elegant, logically consistent outputs to improve metrics like Pass@10. This is a post-generation inference technique.
Prompt Ordering via Embedding Clustering -> https://openreview.net/pdf?id=1Iu2Yte5N6
Uncovers that few-shot prompt permutations form clusters in the model’s embedding space — especially by the first demonstration — and uses this to design a cluster-based ordering method for generating strong in-context example sequences.
Controlled Prompting Variations -> https://huggingface.co/papers/2504.02111
Controlled “bad” prompts (like irrelevant info or misleading framing) expose fragilities in model reasoning. So use light adversarial prompting in evaluations to find breaking points. Plus: remove irrelevant info to reduce confusion and improve focus, standardize format to minimize inconsistency and hallucination, and explicitly prompt for step-by-step reasoning to boost accuracy and transparency
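To illustrate the explore-then-pick pattern mentioned above, here is a generic Python sketch; `sample_response` and `score_candidates` are hypothetical callables standing in for the model's sampler and the paper's learned Sample Set Aggregator.

```python
from typing import Callable, List

def explore_then_pick(prompt: str,
                      sample_response: Callable[[str], str],
                      score_candidates: Callable[[str, List[str]], List[float]],
                      n: int = 8) -> str:
    """Sample several diverse candidates, then let a separate selector choose one.
    In the paper the selector is an RL-trained aggregator, not a simple heuristic."""
    candidates = [sample_response(prompt) for _ in range(n)]   # diverse sampling (temperature > 0)
    scores = score_candidates(prompt, candidates)              # learned scorer ranks the whole set
    best = max(range(n), key=lambda i: scores[i])
    return candidates[best]
```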

Everyone’s chasing top reasoning, but it's still the bottleneck for many real-world tasks. This week, let's spotlight some powerful techniques that have shown promise in helping LLMs achieve more consistent logic, planning, and depth:
1. Retrieval-Augmented CoT Chaining (RAG+CoT) -> CoT-RAG: Integrating Chain of Thought and Retrieval-Augmented Generation to Enhance Reasoning in Large Language Models (2504.13534)
Combines Chain-of-Thought prompting with retrieval augmentation at intermediate steps. Relevant documents are fetched after each reasoning subgoal, updating context dynamically. Great for open-domain QA, math, logic, and multi-hop fact-checking (a sketch follows this list)
2. Tool-use by example injection -> Self-Training Large Language Models for Tool-Use Without Demonstrations (2502.05867)
Injects few-shot tool interaction examples during training to implicitly teach calling patterns. Helps in plug-and-play tool use without training new architectures
3. Visual Scratchpads, or multimodal reasoning support -> Imagine while Reasoning in Space: Multimodal Visualization-of-Thought (2501.07542)
Using structured visual inputs or sketchable intermediate steps (diagrams, grids, trees) boosts performance in tasks like planning, geometry, and multi-agent simulation. In practice, GPT-4o, Claude, and Gemini show marked improvements thanks to this
4. System 1 vs System 2 Prompt switching -> Adaptive Deep Reasoning: Triggering Deep Thinking When Needed (2505.20101)
Switching between a fast, intuitive response mode and a slow, deliberate reasoning mode is among the most popular AI trends. E.g., models tend to respond more reliably when explicitly instructed to “think like a researcher.” This can also reduce hallucinations in open-ended generation and debate tasks
5. Adversarial Self-Chat Fine-Tuning -> Self-playing Adversarial Language Game Enhances LLM Reasoning (2404.10642)
Generate debates between model variants or model vs human, then fine-tune on the winner’s response. It helps models learn to better defend their reasoning. Used in Claude’s Constitutional AI and SPPO-style tuning
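For item 1, here is a hedged sketch of retrieval-augmented CoT chaining; `llm` and `retrieve` are hypothetical callables, and the stopping condition is a simplification rather than the CoT-RAG paper's procedure.

```python
from typing import Callable, List

def rag_cot(question: str,
            llm: Callable[[str], str],
            retrieve: Callable[[str], List[str]],
            max_steps: int = 5) -> str:
    """Fetch evidence after each reasoning subgoal and fold it back into the context."""
    context = f"Question: {question}\n"
    for step in range(max_steps):
        subgoal = llm(context + "State the next reasoning step or sub-question, "
                                "or say FINAL ANSWER if you can answer now.")
        if "FINAL ANSWER" in subgoal.upper():
            break
        docs = retrieve(subgoal)                           # retrieval at the intermediate step
        context += f"Step {step + 1}: {subgoal}\nEvidence: {' '.join(docs)}\n"
    return llm(context + "Now give the final answer.")
```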
Read further below👇
Also, subscribe to the Turing Post: https://www.turingpost.com/subscribe

seq-JEPA -> https://huggingface.co/papers/2505.03176
A world modeling framework that learns invariant and equivariant representations from view sequences and transformations, using a transformer to predict future states. Excels in sequence-based tasks.
AD-L-JEPA -> https://huggingface.co/papers/2501.04969
Learns spatial world models via Bird’s Eye View (BEV) embeddings without explicit generation or manual pair creation, simplifying training and boosting representation quality. Excels in LiDAR 3D object detection and transfer learning.
SAR-JEPA -> https://huggingface.co/papers/2311.15153
Predicts multi-scale Synthetic Aperture Radar (SAR) gradient features from locally masked patches. SAR-JEPA handles small targets and speckle noise and integrates domain-specific features to improve SSL signals.
HEP-JEPA -> https://huggingface.co/papers/2502.03933
A transformer-based foundation model for high-energy collider tasks. Using the JetClass dataset of 100M jets, it predicts embeddings of unseen jet constituents from partial context.
ECG-JEPA -> https://huggingface.co/papers/2410.13867
JEPA for self-supervised ECG representation learning designed to excel at ECG-based heart arrhythmia diagnosis
Check out more types of JEPA here -> https://huggingface.co/posts/Kseniase/646284586461230

Since Meta released the newest V-JEPA 2 this week, we thought it's a good time to revisit a few other interesting JEPA variants. JEPA, or Joint Embedding Predictive Architecture, is a self-supervised learning framework that predicts the latent representation of a missing part of the input.
Here are 11 JEPA types that you should know about:
1. V-JEPA 2 -> V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning (2506.09985)
Trained on 1M+ hours of internet videos and a small amount of robot interaction data, V-JEPA 2 can watch, understand, answer questions, and help robots plan and act in the physical world
2. Time-Series-JEPA (TS-JEPA) -> Time-Series JEPA for Predictive Remote Control under Capacity-Limited Networks (2406.04853)
It's a time-series predictive model that learns compact, meaningful representations. A self-supervised semantic actor then uses them to generate control commands without raw data
3. Denoising JEPA (D-JEPA) -> Denoising with a Joint-Embedding Predictive Architecture (2410.03755)
Combines JEPA with diffusion techniques. By treating JEPA as masked image modeling and next-token prediction, D-JEPA generates data auto-regressively, incorporating diffusion and flow-matching losses
4. CNN-JEPA -> CNN-JEPA: Self-Supervised Pretraining Convolutional Neural Networks Using Joint Embedding Predictive Architecture (2408.07514)
This SSL approach applies the JEPA idea to CNNs using a sparse encoder, depthwise separable convolutions, and improved masking. On ImageNet-100, CNN-JEPA outperforms I-JEPA with 73.3% accuracy
5. Stem-JEPA -> Stem-JEPA: A Joint-Embedding Predictive Architecture for Musical Stem Compatibility Estimation (2408.02514)
Identifies instrument stems by mapping mixes and stems into a shared space using an encoder and predictor. It captures timbre, harmony, and rhythm for tasks like stem retrieval, alignment, and genre or key estimation
6. DMT-JEPA (Discriminative Masked Targets JEPA) -> DMT-JEPA: Discriminative Masked Targets for Joint-Embedding Predictive Architecture (2405.17995)
Improves discriminative power by generating masked targets from semantically similar neighboring patches and uses lightweight cross-attention for aggregation
Read further below👇
Also, subscribe to the Turing Post -> https://www.turingpost.com/subscribe

LRM - Large Reasoning Model (DeepSeek-R1, OpenAI's o3) -> https://huggingface.co/papers/2501.09686
Advanced AI systems specifically optimized for multi-step logical reasoning, complex problem-solving, and structured thinking. LRMs incorporate test-time scaling, Chain-of-Thought reasoning, tool use, external memory, strong math and code capabilities, and a more modular design for reliable decision-making.
MoE - Mixture of Experts (e.g. Mixtral) -> https://www.turingpost.com/p/moe
Uses many sub-networks called experts, but activates only a few per input, enabling massive scaling with sparse computation (a toy routing sketch follows this list).
SSM - State Space Model (Mamba, RetNet) -> https://huggingface.co/papers/2111.00396
- our overview of SSMs and Mamba: https://www.turingpost.com/p/mamba
A neural network that defines the sequence as a continuous dynamical system, modeling how hidden state vectors change in response to inputs over time. SSMs are parallelizable and efficient for long contexts
RNN - Recurrent Neural Network (advanced variants: LSTM, GRU) -> https://huggingface.co/papers/1912.05911
- detailed article about LSTM: https://www.turingpost.com/p/xlstm
Processes sequences one step at a time, passing information through a hidden state that acts as memory. RNNs were widely used in early NLP and time-series tasks but struggle with long-range dependencies compared to newer architectures
CNN - Convolutional Neural Network (MobileNet, EfficientNet) -> https://huggingface.co/papers/1511.08458
Automatically learns patterns from visual data. It uses convolutional layers to detect features like edges, textures, or shapes. Less dominant than Transformers now, but still used in edge applications and visual processing.
SAM - Segment Anything Model (developed by Meta AI) -> https://huggingface.co/papers/2304.02643
A foundation model trained on over 1 billion segmentation masks. Given a prompt (like a point or box), it segments the relevant object.
LNN – Liquid Neural Network (LFMs - Liquid Foundation Models by Liquid AI) -> https://arxiv.org/pdf/2006.04439
- more about LFMs https://www.turingpost.com/p/liquidhyena
LNNs use differential equations to model neuronal dynamics, adapting their behavior in real time. They continuously update their internal state, which is great for time-series data, robotics, and real-world decision making
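As promised in the MoE item above, here is a toy routing sketch in PyTorch; real MoE layers add load-balancing losses, capacity limits, and efficient expert dispatch, so treat this as an illustration only.

```python
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    """Toy sparse Mixture-of-Experts layer: only k experts run for each token."""
    def __init__(self, dim: int, num_experts: int = 8, k: int = 2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)          # scores every expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        self.k = k

    def forward(self, x):                                  # x: (num_tokens, dim)
        weights, idx = self.router(x).softmax(dim=-1).topk(self.k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):                         # send each token to its chosen experts
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out
```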

Let’s refresh some fundamentals today to stay fluent in what we all work with. Here are some of the most popular model types that shape the vast world of AI (with examples in brackets):
1. LLM - Large Language Model (GPT, LLaMA) -> Large Language Models: A Survey (2402.06196)
+ history of LLMs: https://www.turingpost.com/t/The%20History%20of%20LLMs
LLMs are trained on massive text datasets to understand and generate human language. They are mostly built on the Transformer architecture and predict the next token. LLMs scale by increasing the overall parameter count across all components (layers, attention heads, MLPs, etc.)
2. SLM - Small Language Model (TinyLLaMA, Phi models, SmolLM) -> A Survey of Small Language Models (2410.20011)
Lightweight LM optimized for efficiency, low memory use, fast inference, and edge use. SLMs work using the same principles as LLMs
3. VLM - Vision-Language Model (CLIP, Flamingo) -> An Introduction to Vision-Language Modeling (2405.17247)
Processes and understands both images and text. VLMs map images and text into a shared embedding space or generate captions/descriptions from both
4. MLLM - Multimodal Large Language Model (Gemini) -> A Survey on Multimodal Large Language Models (2306.13549)
A large-scale model that can understand and process multiple types of data (modalities) — usually text + other formats, like images, videos, audio, structured data, 3D or spatial inputs. MLLMs can be LLMs extended with modality adapters or trained jointly across vision, text, audio, etc.
5. LAM - Large Action Model (InstructDiffusion, RT-2) -> Large Action Models: From Inception to Implementation (2412.10047)
Understands and generates action sequences by predicting action tokens (discrete/continuous instructions) that guide agents. Trained on behavior datasets, LAMs generalize across tasks, environments, and modalities - video, sensor data, etc.
Read about LRM, MoE, SSM, RNN, CNN, SAM and LNN below👇
Also, subscribe to the Turing Post: https://www.turingpost.com/subscribe

Filesystem MCP Server -> https://github.com/modelcontextprotocol/servers/tree/HEAD/src/filesystem
Read, write, and search files, plus create, delete, list, and move directories specified via args.
Notion MCP Server -> https://github.com/makenotion/notion-mcp-server
Enables models to interact with your Notion workspace to automate tasks such as searching, reading, creating, and updating pages and databases.
Markdownify MCP Server -> https://github.com/zcaceres/markdownify-mcp
Converts various file types (PDFs, images, audio) and web pages to Markdown format.
Fetch MCP Server -> https://github.com/modelcontextprotocol/servers/tree/main/src/fetch
Allows LLMs to retrieve and process content from web pages, converting HTML to Markdown.
Mobile Next - MCP server for Mobile Development and Automation -> https://github.com/mobile-next/mobile-mcp
Enables agents and LLMs to interact with iOS/Android apps using accessibility snapshots or screenshot-based taps.
MCP installer -> https://github.com/anaisbetts/mcp-installer
This one is quite hilarious - "MCP for MCP". It allows you to ask your model (Claude, for example) to install MCP servers hosted in npm or PyPi for you.

MCP changed how agents connect with tools.
After writing the most read explanation of MCP on Hugging Face (https://huggingface.co/blog/Kseniase/mcp), we chose these 13 awesome MCP servers that you can work with:
1. Agentset MCP -> https://github.com/agentset-ai/mcp-server
For quick, efficient building of intelligent, doc-based apps using the open-source Agentset platform for RAG
2. GitHub MCP Server -> https://github.com/github/github-mcp-server
Integrates GitHub APIs into your workflow, allowing you to build AI tools and apps that interact with GitHub's ecosystem
3. arXiv MCP -> https://github.com/andybrandt/mcp-simple-arxiv
Allows working with research papers on arXiv through effective search and access to their metadata, abstracts, and links
4. MCP Run Python -> https://github.com/pydantic/pydantic-ai/tree/main/mcp-run-python
Runs Python code in a sandbox via Pyodide in Deno, keeping it isolated from the rest of the operating system
5. Safe Local Python Executor -> https://github.com/maxim-saplin/mcp_safe_local_python_executor
A lightweight tool for running LLM-generated Python code locally, using Hugging Face’s LocalPythonExecutor (from smolagents framework) and exposing it via MCP for AI assistant integration
6. Cursor MCP Installer -> https://github.com/matthewdcage/cursor-mcp-installer
Automatically adds MCP servers to Cursor for development convenience
7. Basic Memory -> https://memory.basicmachines.co/docs/introduction
This knowledge management system connects to LLMs and lets you build a persistent semantic graph from your conversations with AI agents
Read further in the comments 👇
If you like it, also subscribe to the Turing Post: https://www.turingpost.com/subscribe

T-JEPA -> https://huggingface.co/papers/2410.05016
This one is for tabular (structured) data. By masking one subset of a table’s features and predicting their latent representation from another subset, it learns rich, label-agnostic embeddings.
ACT-JEPA -> https://huggingface.co/papers/2501.14622
Merges imitation and self-supervised learning to learn policy embeddings without heavy expert data. It predicts chunked actions and abstract observations in latent space, filtering noise, modeling dynamics, and cutting compounding errors.
Brain-JEPA -> https://huggingface.co/papers/2409.19407
Applies JEPA in a brain-dynamics foundation model for demographic, disease, and trait prediction.
3D-JEPA -> https://huggingface.co/papers/2409.15803
JEPA for 3D representation learning. It samples one rich context block and several target blocks, then predicts each target’s embedding from the context.
Point-JEPA -> https://huggingface.co/papers/2404.16432
Brings joint-embedding predictive learning to point clouds. A lightweight sequencer orders patch embeddings. It lets the model choose context and target patches and reuse distance calculations for speed

JEPA, or Joint Embedding Predictive Architecture, is an approach to building AI models introduced by Yann LeCun. Unlike generative models, it predicts the latent representation of a missing or future part of the input rather than the next token or pixel. This encourages conceptual understanding, not just low-level pattern matching, so JEPA is a step toward teaching AI to reason abstractly. A minimal training-step sketch is shown below.
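Here is a hedged, modality-agnostic sketch of one JEPA training step; the encoder/predictor modules, EMA rate, and loss are illustrative assumptions, and real I-JEPA/V-JEPA recipes add specific masking strategies and ViT backbones.

```python
import torch
import torch.nn.functional as F

def jepa_step(context_encoder, predictor, target_encoder,
              x_context, x_target, optimizer, ema=0.996):
    """One generic JEPA step: predict the latent of the hidden part, not its pixels/tokens."""
    with torch.no_grad():
        target_repr = target_encoder(x_target)             # latent target, no gradient
    pred = predictor(context_encoder(x_context))            # predicted latent from visible context
    loss = F.mse_loss(pred, target_repr)                    # distance measured in embedding space

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # momentum (EMA) update keeps the target encoder a slow-moving copy of the context encoder
    with torch.no_grad():
        for p_t, p_c in zip(target_encoder.parameters(), context_encoder.parameters()):
            p_t.mul_(ema).add_(p_c, alpha=1 - ema)
    return loss.item()
```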
Here are 12 types of JEPA you should know about:
1. I-JEPA -> Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture (2301.08243)
A non-generative, self-supervised learning framework designed for processing images. It works by masking parts of an image and then predicting the latent representations of those masked regions
2. MC-JEPA -> MC-JEPA: A Joint-Embedding Predictive Architecture for Self-Supervised Learning of Motion and Content Features (2307.12698)
Simultaneously interprets video data - dynamic elements (motion) and static details (content) - using a shared encoder
3. V-JEPA -> Revisiting Feature Prediction for Learning Visual Representations from Video (2404.08471)
Presents vision models trained by predicting future video features, without pretrained image encoders, text, negative sampling, or reconstruction
4. UI-JEPA -> UI-JEPA: Towards Active Perception of User Intent through Onscreen User Activity (2409.04081)
Masks unlabeled UI sequences to learn abstract embeddings, then adds a fine-tuned LLM decoder for intent prediction.
5. Audio-based JEPA (A-JEPA) -> A-JEPA: Joint-Embedding Predictive Architecture Can Listen (2311.15830)
Masks spectrogram patches with a curriculum, encodes them, and predicts hidden representations.
6. S-JEPA -> S-JEPA: towards seamless cross-dataset transfer through dynamic spatial attention (2403.11772)
Signal-JEPA is used in EEG analysis. It adds a spatial block-masking scheme and three lightweight downstream classifiers
7. TI-JEPA -> TI-JEPA: An Innovative Energy-based Joint Embedding Strategy for Text-Image Multimodal Systems (2503.06380)
Text-Image JEPA uses self-supervised, energy-based pre-training to map text and images into a shared embedding space, improving cross-modal transfer to downstream tasks
Find more types below 👇
Also, explore the basics of JEPA in our article: https://www.turingpost.com/p/jepa
If you liked it, subscribe to the Turing Post: https://www.turingpost.com/subscribe