Welcome to matlok
matlok
AI & ML interests
Welcome! We share large, open-source multimodal datasets for training and fine-tuning AI to write Python and build AI models. We also curate collections of guides, papers, datasets, models, and tools for tasks like frankenmerging AI models.
Papers - Text - Classification - FastText
Papers - RL - GRPO - Group Relative Policy Optimization
Papers - Coding - Agent
Papers - Fine-tuning - Preference Opt - Reward Free
Papers - Training - Pipeline - Memory - ZB-V
Papers - Training - Pipeline - Scheduler - Multi-GPU
Models - Text - Video
Models - 3D Asset Generator - Image to 3D Mesh
Papers - Interpretability - MoE
Papers - Custom Layers - PEER - Single Embedding
Papers - Training - Memory Augmented
Papers - Coding - Java
Papers - Coding - Dataset - BigCloneBench - BigCloneEval
https://github.com/clonebench/BigCloneBench and https://github.com/jeffsvajlenko/BigCloneEval
Papers - Coding - Defect Detection
Papers - Coding - CodeBert
Papers - Coding - Dataset - Compiler - IR - DeepDataFlow
https://zenodo.org/records/4247595
Papers - Coding - Classification - Algo Prediction - XFG
Papers - Coding - GNNs - CDFG
Papers - Coding - Compilers - Global Common Subexpression
Papers - Coding - Training - Classification - Algorithms
Papers - Coding - GNN - IR
Papers - Coding - Control Flow
Papers - Coding - IR - Intermediate Representations
Papers - Coding - Compilers - LLVM-IR
Papers - Coding - Compilers
Papers - Morphogenesis - Coding - Sorting
Papers - Coding - Sorting
Papers - Coding - Rust - Memory - Borrow Checker
Papers - Coding - Static Analysis
Papers - Coding - Translate - C to Rust - Repo - EverParse
Papers - Coding - Port to Rust
Papers - Coding - Rust
Models - Embeddings - Text - Research Papers - Arxiv
Papers - Embeddings - Freq n-gram Hash - Vocabulary Impacts
Papers - Text - Eval - Character Level - CUTE
Papers - Training - Bytes - Dynamic Patch Sizes
Datasets - Text - Classification - Multitask
Papers - Text - Eval - Coding - Python
Papers - Text - Dataset - Datacomp-LM
Papers - Training - Text - Datasets - Coding - GitHub
Papers - Text - Character Level RNNs
Papers - Training - Scaling - Bytes - BLT >= BPE Tokenizer
Papers - Attention - Flex Attention
https://pytorch.org/blog/flexattention/
Papers - Embeddings - Bytes - Flops - Input Layer Lookup
Papers - Attention - Bytes - Patch Cross Attention
Papers - Embeddings - Text - Byte - Hash ngrams
Papers - Tokenizers - Bytes - Incremental Patching
Note: BPE does not handle incremental patching like BLT
Papers - Tokenizers - Bytes - Space - First Char - Patch Len
Papers - Tokenizers - Bytes - Patches - Entropy-based
Patch start detected by entropy crossing a threshold
Papers - Text - Tokenizer - Bytes - Strided Patches
Papers - Audio - STT - ASR - wav2vec
Papers - Audio - Training - Mask Len Distribution - Ablation
Papers - Audio - Viz - Phoneme - Conditional Probability
Papers - Audio - Fine-tuning - Decoder only - SpecAugment
Papers - Audio - Pretraining - Fairseq
Papers - Audio - Dataset - LibriVox
Papers - Audio - Fine-tuning - Loss - CTC
Papers - Audio - Training - Activation - Gelu
Spaces - Reasoning
Papers - Encodings - BBPE - Byte level byte pair
Papers - Attention - QKV Bias - RMSNorm with Pre-normalization
Models - Qwen
-
Qwen/QwQ-32B-Preview
Text Generation • 33B • Updated • 25.2k • • 1.74k -
bartowski/Qwen2.5-Coder-14B-Instruct-GGUF
Text Generation • 15B • Updated • 2.74k • 38 -
bartowski/Qwen2.5-Coder-32B-Instruct-GGUF
Text Generation • 33B • Updated • 22.2k • 92 -
bartowski/Qwen2.5-72B-Instruct-GGUF
Text Generation • 73B • Updated • 16.3k • 38
Papers - Training - SGD - SGDM - SGD with Momentum
Papers - Training - Eval - Mix of Show
Papers - Training - LR - Optimizer - Prodigy
Papers - Pretraining - Image
Papers - Training - SGD - Regularization
Papers - Training - PyTorch
Papers - Training - LR - Gradient Signal to Noise Ratio
Papers - Training - LR - Learning Rate
Papers - Training - Optimizers
Papers - Training - Backward Masking
Papers - KV Cache - Spectrogram
Papers - Text - Midtraining - Rag - Recall - Rerank - ICL
Papers - Text - Training - Dataset Selection - Filtering
Papers - Text - Datasets - Math - AMC
Papers - Training - Overfitting - Decontamination
Papers - Pretraining - Synthetic Data - Reasoning
Papers - Training - Scaling Laws - Scaling Consistency
Papers - Training - Text - Vocabulary - SentencePiece
Papers - Training - Token Free - Bytes or Characters
Papers - Audio - Encoders - Bert
Papers - Video Games - Starcraft 2
Papers - Fine-tuning - Decoder Only - Frozen Encoder Weights
Papers - Reasoning - CoT - Tree Search - BFS
Papers - Reasoning - CoT - MCTS
Models - Biology - Protein - SAE
Papers - Robotics - Lie Groups
Papers - Math - Differential Geometry - Lie Theory
Papers - NEDL - Differential Geometry - Visualizations
Papers - Math - SGD - Stochastic Gradient Descent
Papers - Training - Convergence - Stoch Gradient Descent
Papers - Training - Convergence - SoftMax
Papers - Image - Rectified Flows
Papers - Image - DDPM - SDE
Papers - Image - Diffusion Coefficient - Deterministic
Papers - Image - Training - Sampler - SDE
Papers - Image - Datasets - Oxford Flowers
Papers - Image - Diffusion - Stochastic Interpolants
Papers - Text - SAE - Sparse Autoencoders
-
Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models
Paper • 2411.14257 • Published • 14 -
Scaling and evaluating sparse autoencoders
Paper • 2406.04093 • Published • 3 -
Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2
Paper • 2408.05147 • Published • 40 -
Disentangling Dense Embeddings with Sparse Autoencoders
Paper • 2408.00657 • Published • 1
Papers - SAT Solver
Papers - BitNet - Research - Classification - SAT Solvers
Papers - Training - Classification - Bit - SAT Solver
Papers - Visualizations - GPU Programming - Memory
Papers - ICL - Text - Classification - Label - Unique Words
Papers - Text - Encoders - DeBERTa
Papers - RL - Text - Prompts - Navigation - Maze Running
Papers - RL - Monte Carlo
Papers - NEDL - Training - Hyperparameters
Papers - Math - Distance - Pearson Correlation
Papers - NEDL - Embedding - Potential Distance
Papers - NEDL - Train - Diffusion - Geodesic
Papers - NEDL - Embedding - Geodesic - Euclidean Distance
Papers - Biology - RNA - Sequencing
Papers - Reasoning - Visualization - Pearson’s R
Papers - Training - Influence Functions - EK-FAC
Papers - Inference - CPU - Intel
Papers - NEDL - Visualization - Non-linearity - tSNE
Papers - NEDL - Fine-tuning - Multimodal Mixup
Papers - NEDL - Fine-tuning - Geometric Contrastive Learning
Papers - NEDL - Hypersphere
Papers - Training - Cauchy-Schwarz Inequality
Papers - Training - Non-linear Learning - Lipschitz
Papers - Training - Non-linear Learning - Kernel
Papers - Training - Kernel
Papers - Text - Fine-tuning - Loss - CCE - Triton
Papers - Gemma 2 - Fine-tuning
Papers - Text - Training - Vocabulary Sorting
Papers - Text - Train - Vocab - Dense Blocks Common Tokens
Papers - Triton
Papers - Text - Training - Batch Scaling - Cut Cross Entropy
Spaces - Image - Editing a Picture
Papers - Image - Fine-tuning - Editing - LAION-Aesthetics
Papers - Image - Benchmarks - Editing - BrushBench
Papers - Image - Generation Quality Models - Aesthetic Score
Papers - Image - Generation Quality Models - Image Reward
Papers - Image - BrushNet
Datasets - Coding - GitHub Issues
Papers - Text - Embedding - Noise - In-Batch Deduplication
Papers - Fine-tuning - LoRA - Text - Embedding - Sentence
Papers - Text - Datasets - GitHub Issues
Models - Text - Embedding - Multilingual
Models - Text - Reranker
Models - Text - Embedding - MRL
Papers - Embedding - Text - Sentence - 2DMSE
Models - Text - Sentence Embedding - Binary Quantization
Papers - CoT - Arch - Reasoning - Layer Depth vs Wider Layer
Papers - Math - Generate Synthetic Data
Papers - Flow Matching - Data Generation - XGBoost
Papers - Fine-tune - Text - Retry
Papers - Audio - Tokenizer
Papers - Text - Embedding - Sentence - R-BM25
Papers - Text - Embedding - Sentence - SONAR
Papers - Text - Encodings - Roberta
Papers - Text - Machine Translation
Papers - Image - OOD - Out of Distribution
Papers - Image - Guidance - Smooth Energy Guidance (SEG)
Papers - Text - Training - Wave Net
Papers - Text - Embedding - Fixed Token - Skip-gram
Papers - Fine-tuning - LoRA - Intruder Dimensions
Papers - Text - Inference - Early Stop - Filter Layers
Papers - Image - Training - Batch
Papers - Image - Datasets - XM-3600
Papers - Text - Tokens - Vocabulary - Herdan’s Law
Papers - Image - Visual Tokens
Papers - Datasets - Multimodal - YFCC100M
Papers - Datasets - Visualization - WizMap
Papers - Datasets - Image to Video
Papers - Fine-tuning - Self-Consistency - ScPO
Papers - Healthcare - CoT
Papers - Healthcare - Benchmarks
Papers - Training - Scaling Properties
Papers - Image - Autoregressive Visual Generation
Papers - Mobile - Android
Papers - Fine-tuning - Machine Unlearning
Papers - Training - Knowledge Distillation - Tool Usage
Papers - Image - CoT
Papers - Attention - Token Parameter - Pattention
Papers - Flow Matching
-
Movie Gen: A Cast of Media Foundation Models
Paper • 2410.13720 • Published • 98 -
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
Paper • 2410.06885 • Published • 46 -
Flow Matching for Generative Modeling
Paper • 2210.02747 • Published • 3 -
Matcha-TTS: A fast TTS architecture with conditional flow matching
Paper • 2309.03199 • Published • 13
Papers - Training - Differential Transformer
matlok - Python Code Instruction Datasets
Python Alpaca instructions from leading AI research and tools repositories. The current focus is on "manager-level" understanding.
-
matlok/python-text-copilot-training-instruct-ai-research-2024-02-11
Viewer • Updated • 130 • 412 -
matlok/python-text-copilot-training-instruct-ai-research-2024-02-10
Viewer • Updated • 123 • 746 -
matlok/python-text-copilot-training-instruct-ai-research-2024-02-03
Viewer • Updated • 2.67k • 3.48k • 1 -
matlok/python-text-copilot-training-instruct-ai-research-2024-01-27
Viewer • Updated • 43.1k • 412
matlok - Python Src Code Datasets (base)
Python code from leading AI research and tools repositories
Dataset - Python Coding Alpaca Instructions
Audio Papers
There are many more on arXiv if you search for CLAP.
-
Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation
Paper • 2211.06687 • Published • 4 -
EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning
Paper • 2401.17690 • Published • 5 -
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
Paper • 2312.09911 • Published • 55 -
Audiobox: Unified Audio Generation with Natural Language Prompts
Paper • 2312.15821 • Published • 17
Multimodal Papers
-
From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations
Paper • 2401.01885 • Published • 28 -
Media2Face: Co-speech Facial Animation Generation With Multi-Modality Guidance
Paper • 2401.15687 • Published • 24 -
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
Paper • 2312.17172 • Published • 30 -
MouSi: Poly-Visual-Expert Vision-Language Models
Paper • 2401.17221 • Published • 9
Coding Papers
There are usually interesting papers in the model cards on the leaderboard: https://huggingface.co/spaces/bigcode/bigcode-models-leaderboard
-
StarCoder: may the source be with you!
Paper • 2305.06161 • Published • 31 -
WizardCoder: Empowering Code Large Language Models with Evol-Instruct
Paper • 2306.08568 • Published • 28 -
SantaCoder: don't reach for the stars!
Paper • 2301.03988 • Published • 7 -
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence
Paper • 2401.14196 • Published • 66
Embedding Papers
-
Improving Text Embeddings with Large Language Models
Paper • 2401.00368 • Published • 82 -
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper • 1810.04805 • Published • 23 -
Metadata Might Make Language Models Better
Paper • 2211.10086 • Published • 4 -
DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers
Paper • 2310.03686 • Published • 3
LMM
Large Multimodal Models
Non-English Embeddings and Models
-
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
Paper • 2211.05100 • Published • 34 -
Contrastive Language-Image Pre-training for the Italian Language
Paper • 2108.08688 • Published • 2 -
IT5: Large-scale Text-to-text Pretraining for Italian Language Understanding and Generation
Paper • 2203.03759 • Published • 5 -
Spanish Pre-trained BERT Model and Evaluation Data
Paper • 2308.02976 • Published • 3
More Alpaca Instruction Datasets
Actor Critic Papers
Search papers from a url
Audio models
-
metavoiceio/metavoice-1B-v0.1
Text-to-Speech • Updated • 438 • 789 -
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data
Paper • 2402.08093 • Published • 62 -
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
Paper • 2402.17485 • Published • 195 -
SWivid/F5-TTS
Text-to-Speech • Updated • 604k • 1.12k
Datasets - Geospatial
Papers - Geospatial
Datasets - Financial
Models - Testing
Papers - Context
-
In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss
Paper • 2402.10790 • Published • 42 -
LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration
Paper • 2402.11550 • Published • 18 -
A Neural Conversational Model
Paper • 1506.05869 • Published • 2 -
Data Engineering for Scaling Language Models to 128K Context
Paper • 2402.10171 • Published • 25
Tuning - Dora
Models - Multimodal
-
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
Paper • 2402.12226 • Published • 45 -
M2-CLIP: A Multimodal, Multi-task Adapting Framework for Video Action Recognition
Paper • 2401.11649 • Published • 3 -
Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition
Paper • 2402.15504 • Published • 22 -
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
Paper • 2402.17485 • Published • 195
Models - n-gram and Kneser-Ney
-
A Generalized Language Model as the Combination of Skipped n-grams and Modified Kneser-Ney Smoothing
Paper • 1404.3377 • Published • 2 -
Skip-gram Language Modeling Using Sparse Non-negative Matrix Probability Estimation
Paper • 1412.1454 • Published • 2 -
Neural Text Generation from Structured Data with Application to the Biography Domain
Paper • 1603.07771 • Published • 2 -
Distributed Representations of Words and Phrases and their Compositionality
Paper • 1310.4546 • Published • 3
Papers - Multi-turn Conversations
Models - Watermarking
Papers - Fintech - Benchmarks
Models - Video
-
VideoPrism: A Foundational Visual Encoder for Video Understanding
Paper • 2402.13217 • Published • 37 -
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
Paper • 2402.17485 • Published • 195 -
Qwen/Qwen-VL-Chat
Text Generation • Updated • 63.3k • 374 -
MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies
Paper • 2403.01422 • Published • 29
Models - NeRFs - Image Radiance Fields
-
Improving Robustness for Joint Optimization of Camera Poses and Decomposed Low-Rank Tensorial Radiance Fields
Paper • 2402.13252 • Published • 19 -
RegNeRF: Regularizing Neural Radiance Fields for View Synthesis from Sparse Inputs
Paper • 2112.00724 • Published • 2 -
Enhancing NeRF akin to Enhancing LLMs: Generalizable NeRF Transformer with Mixture-of-View-Experts
Paper • 2308.11793 • Published • 2
Models - Predicting Models
Datasets - Coding
Models - Text
-
Training Compute-Optimal Large Language Models
Paper • 2203.15556 • Published • 11 -
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Paper • 1909.08053 • Published • 3 -
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Paper • 1910.10683 • Published • 14 -
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
Paper • 2304.01373 • Published • 9
Papers - Decoders
-
Lossless Acceleration for Seq2seq Generation with Aggressive Decoding
Paper • 2205.10350 • Published • 2 -
Blockwise Parallel Decoding for Deep Autoregressive Models
Paper • 1811.03115 • Published • 2 -
Fast Transformer Decoding: One Write-Head is All You Need
Paper • 1911.02150 • Published • 9 -
Sequence-Level Knowledge Distillation
Paper • 1606.07947 • Published • 2
Datasets - Text
Models - Large Scale
Papers - Transfer Learning
Datasets - Binarized
Papers - Pipeline - Multimodal
Models - Gaming
Papers - Learning and Compression
Models - Quants
Models - Image
-
Geometric Algebra Transformers
Paper • 2305.18415 • Published • 2 -
World Model on Million-Length Video And Language With RingAttention
Paper • 2402.08268 • Published • 40 -
Deep Unsupervised Learning using Nonequilibrium Thermodynamics
Paper • 1503.03585 • Published • 5 -
IDKiro/sdxs-512-0.9
Text-to-Image • Updated • 443 • 109
Spaces - Math
Models - Base - 7B
Spaces - Vision
Models - Science
Models - Byte Transformer
Papers - RoPE
-
Resonance RoPE: Improving Context Length Generalization of Large Language Models
Paper • 2403.00071 • Published • 24 -
Scaling Laws of RoPE-based Extrapolation
Paper • 2310.05209 • Published • 8 -
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models
Paper • 2404.12387 • Published • 39 -
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
Paper • 2404.14619 • Published • 126
Papers - Model Scaling
-
Scaling Laws for Neural Language Models
Paper • 2001.08361 • Published • 9 -
An Empirical Model of Large-Batch Training
Paper • 1812.06162 • Published • 3 -
Measuring the Effects of Data Parallelism on Neural Network Training
Paper • 1811.03600 • Published • 2 -
Adafactor: Adaptive Learning Rates with Sublinear Memory Cost
Paper • 1804.04235 • Published • 2
Models - UI - Front-End
Models - Text - Explanation
Papers - QLoRA
Papers - Sequence Parallelism
Helpful - VRAM Calculator
Models - Video Generation
Papers - Masked Sequence Packing
Papers - Fine-tuning - Multimodal
Spaces - Coding
Datasets - Audio
Models - Audio - Sheet Music Gen
Datasets - Text - Instruction (non-Alpaca)
Papers - Benchmarks - Image and Text
Models - Suggest - Audiobooks from Playlist
Papers - MoE
-
Non-asymptotic oracle inequalities for the Lasso in high-dimensional mixture of experts
Paper • 2009.10622 • Published • 1 -
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
Paper • 2401.15947 • Published • 53 -
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
Paper • 2401.04081 • Published • 73 -
MoE-Infinity: Activation-Aware Expert Offloading for Efficient MoE Serving
Paper • 2401.14361 • Published • 2
Models - IoT
Models - MoE - Multimodal
-
EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE
Paper • 2308.11971 • Published • 2 -
HyperFormer: Enhancing Entity and Relation Interaction for Hyper-Relational Knowledge Graph Completion
Paper • 2308.06512 • Published • 2 -
Unraveling Complex Data Diversity in Underwater Acoustic Target Recognition through Convolution-based Mixture of Experts
Paper • 2402.11919 • Published • 2
Papers - Image - Knowledge Graphs
-
Multi-view Contrastive Learning for Entity Typing over Knowledge Graphs
Paper • 2310.12008 • Published • 2 -
HyperFormer: Enhancing Entity and Relation Interaction for Hyper-Relational Knowledge Graph Completion
Paper • 2308.06512 • Published • 2 -
ARIEL: Adversarial Graph Contrastive Learning
Paper • 2208.06956 • Published • 2 -
RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
Paper • 2401.18059 • Published • 46
Papers - Image - MoE
-
Robust Mixture-of-Expert Training for Convolutional Neural Networks
Paper • 2308.10110 • Published • 2 -
HyperFormer: Enhancing Entity and Relation Interaction for Hyper-Relational Knowledge Graph Completion
Paper • 2308.06512 • Published • 2 -
Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts
Paper • 2309.04354 • Published • 15 -
Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints
Paper • 2212.05055 • Published • 6
Papers - Lora - LCM
Models - Image - Lora
Models - MoE - Constitutional Experts
Models - MoE - Training using Lora
Papers - MoE - Prompt Immunity
Models - MoE - Audio - Underwater Acoustics
Papers - MoE - Malicious Queries
Models - MoE - Image
Papers - MoE - Scaling
-
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
Paper • 1701.06538 • Published • 7 -
ST-MoE: Designing Stable and Transferable Sparse Expert Models
Paper • 2202.08906 • Published • 2 -
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
Paper • 2403.07816 • Published • 44
Papers - MoE - Deny an Expert
Papers - MoE - Frankenmerge
Papers - Image - Bounding Box
-
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper • 2401.00908 • Published • 188 -
Unifying Vision, Text, and Layout for Universal Document Processing
Paper • 2212.02623 • Published • 11 -
Grounded Language-Image Pre-training
Paper • 2112.03857 • Published • 3 -
ConsistencyDet: Robust Object Detector with Denoising Paradigm of Consistency Model
Paper • 2404.07773 • Published • 1
Papers - Exploit - Model Layer Retrieval
Papers - Image - Dataset Generator
Papers - Video - Mamba
Papers - Fine-tuning - Home Lab
-
Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU
Paper • 2403.06504 • Published • 55 -
Token-Level Adaptation of LoRA Adapters for Downstream Task Generalization
Paper • 2311.10847 • Published • 2 -
PERL: Parameter Efficient Reinforcement Learning from Human Feedback
Paper • 2403.10704 • Published • 59
Papers - MoE - Attention
Papers - Image - MoE - IoT
Papers - MoE - Router - Task
Papers - MoE - Federated Learning
Papers - MoE - Router - Research
-
Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts
Paper • 2306.04845 • Published • 4 -
Patch-level Routing in Mixture-of-Experts is Provably Sample-efficient for Convolutional Neural Networks
Paper • 2306.04073 • Published • 2 -
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
Paper • 2403.07816 • Published • 44 -
Unified Scaling Laws for Routed Language Models
Paper • 2202.01169 • Published • 2
Papers - MoE - IoT
Papers - Image - OCR Handwriting
-
Vulnerability Analysis of Transformer-based Optical Character Recognition to Adversarial Attacks
Paper • 2311.17128 • Published • 2 -
Data Generation for Post-OCR correction of Cyrillic handwriting
Paper • 2311.15896 • Published • 4 -
An End-to-End OCR Framework for Robust Arabic-Handwriting Recognition using a Novel Transformers-based Model and an Innovative 270 Million-Words Multi-Font Corpus of Classical Arabic with Diacritics
Paper • 2208.11484 • Published • 3 -
Transformer based Urdu Handwritten Text Optical Character Reader
Paper • 2206.04575 • Published • 2
Papers - Image - Segment - Handwriting
-
Character Queries: A Transformer-based Approach to On-Line Handwritten Character Segmentation
Paper • 2309.03072 • Published • 2 -
Prompt me a Dataset: An investigation of text-image prompting for historical image dataset creation using foundation models
Paper • 2309.01674 • Published • 2 -
Segment Anything
Paper • 2304.02643 • Published • 4
Papers - Image - Handwritten Characters
-
Disentangling Writer and Character Styles for Handwriting Generation
Paper • 2303.14736 • Published • 3 -
A Transformer Architecture for Online Gesture Recognition of Mathematical Expressions
Paper • 2211.02643 • Published • 2 -
A tailored Handwritten-Text-Recognition System for Medieval Latin
Paper • 2308.09368 • Published • 3 -
Scalable handwritten text recognition system for lexicographic sources of under-resourced languages and alphabets
Paper • 2303.16256 • Published • 2
Papers - Image - HTR - Math Gestures and Symbols
Models - Text - Multilingual
Papers - Benchmark - Handwriting Recognition
Datasets - Image - Handwritten Recognition
GitHub: https://github.com/Planet-AI-GmbH/tfaip-hybrid-ctc-s2s and math: https://storage.googleapis.com/mathwriting_data/mathwriting-2024.tgz
Papers - Image - Handwriting Recognition - Tetrolets
Papers - Text - Encoders
-
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper • 1810.04805 • Published • 23 -
Transformers Can Achieve Length Generalization But Not Robustly
Paper • 2402.09371 • Published • 15 -
Triple-Encoders: Representations That Fire Together, Wire Together
Paper • 2402.12332 • Published • 2 -
BERTs are Generative In-Context Learners
Paper • 2406.04823 • Published • 1
Papers - Text - Bidirectional - Bio
Papers - Text - Pre-training
-
Pretraining-Based Natural Language Generation for Text Summarization
Paper • 1902.09243 • Published • 2 -
Unified Vision-Language Pre-Training for Image Captioning and VQA
Paper • 1909.11059 • Published • 2 -
When LLMs are Unfit Use FastFit: Fast and Effective Text Classification with Many Classes
Paper • 2404.12365 • Published • 1
Papers - Text - Pre-training - Decoder Multi-Steps
Papers - Image - Multimodal - Handwriting Recognition
-
Representing Online Handwriting for Recognition in Large Vision-Language Models
Paper • 2402.15307 • Published • 3 -
Evaluating Sequence-to-Sequence Models for Handwritten Text Recognition
Paper • 1903.07377 • Published • 2 -
Enhancing Document Information Analysis with Multi-Task Pre-training: A Robust Approach for Information Extraction in Visually-Rich Documents
Paper • 2310.16527 • Published • 2 -
Detecting and recognizing characters in Greek papyri with YOLOv8, DeiT and SimCLR
Paper • 2401.12513 • Published • 1
Papers - Text - Multilingual
-
mT5: A massively multilingual pre-trained text-to-text transformer
Paper • 2010.11934 • Published • 4 -
mSLAM: Massively multilingual joint pre-training for speech and text
Paper • 2202.01374 • Published • 2 -
DeepNet: Scaling Transformers to 1,000 Layers
Paper • 2203.00555 • Published • 2 -
SONAR: Sentence-Level Multimodal and Language-Agnostic Representations
Paper • 2308.11466 • Published • 1
Papers - Multimodal - Speech and Text - Multilingual
Papers - Multimodal - Document Analysis
-
Enhancing Document Information Analysis with Multi-Task Pre-training: A Robust Approach for Information Extraction in Visually-Rich Documents
Paper • 2310.16527 • Published • 2 -
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper • 2401.00908 • Published • 188 -
Unifying Vision, Text, and Layout for Universal Document Processing
Paper • 2212.02623 • Published • 11
Papers - Video - Entity Recognition
-
DragAnything: Motion Control for Anything using Entity Representation
Paper • 2403.07420 • Published • 15 -
Capabilities of Gemini Models in Medicine
Paper • 2404.18416 • Published • 24 -
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos
Paper • 2406.08407 • Published • 28
Papers - Image - Pre-training
Papers - Image - Synthetic Data Generator
-
Synth^2: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings
Paper • 2403.07750 • Published • 24 -
Gaussian Frosting: Editable Complex Radiance Fields with Real-Time Rendering
Paper • 2403.14554 • Published • 14 -
ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion
Paper • 2403.18818 • Published • 28
Papers - ResNet
-
Wide Residual Networks
Paper • 1605.07146 • Published • 2 -
Characterizing signal propagation to close the performance gap in unnormalized ResNets
Paper • 2101.08692 • Published • 2 -
Pareto-Optimal Quantized ResNet Is Mostly 4-bit
Paper • 2105.03536 • Published • 2 -
When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations
Paper • 2106.01548 • Published • 2
Papers - Federated Learning
Papers - Image - Human Motion Generator
Papers - Autonomous Drones
Papers - Multimodal - Drone - Object Manipulation
Papers - Pre-training - Time Series
Papers - Training - Hardware Detection
Papers - Image - IoT - Split Computing
Papers - Image - Segmentation
-
Image Segmentation using U-Net Architecture for Powder X-ray Diffraction Images
Paper • 2310.16186 • Published • 2 -
H-DenseUNet: Hybrid Densely Connected UNet for Liver and Tumor Segmentation from CT Volumes
Paper • 1709.07330 • Published • 2 -
Deep LOGISMOS: Deep Learning Graph-based 3D Segmentation of Pancreatic Tumors on CT scans
Paper • 1801.08599 • Published • 2 -
RTSeg: Real-time Semantic Segmentation Comparative Study
Paper • 1803.02758 • Published • 2
Papers - Video - Synthetic Data Generator
-
MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies
Paper • 2403.01422 • Published • 29 -
VisionGPT-3D: A Generalized Multimodal Agent for Enhanced 3D Vision Understanding
Paper • 2403.09530 • Published • 10 -
VidToMe: Video Token Merging for Zero-Shot Video Editing
Paper • 2312.10656 • Published • 11 -
TC4D: Trajectory-Conditioned Text-to-4D Generation
Paper • 2403.17920 • Published • 18
Papers - Image - Segmentation - Report
-
Generalizability vs. Robustness: Adversarial Examples for Medical Imaging
Paper • 1804.00504 • Published • 2 -
Evaluating Transformer-based Semantic Segmentation Networks for Pathological Image Segmentation
Paper • 2108.11993 • Published • 2 -
From Modern CNNs to Vision Transformers: Assessing the Performance, Robustness, and Classification Strategies of Deep Learning Models in Histopathology
Paper • 2204.05044 • Published • 2
Papers - Image - Segmentation - MRI
Papers - Image - SkipNet
Papers - Image - Hybrid
-
3D Medical Image Segmentation based on multi-scale MPU-Net
Paper • 2307.05799 • Published • 2 -
Joint Liver and Hepatic Lesion Segmentation in MRI using a Hybrid CNN with Transformer Layers
Paper • 2201.10981 • Published • 2 -
Using Multi-scale SwinTransformer-HTC with Data augmentation in CoNIC Challenge
Paper • 2202.13588 • Published • 2
Papers - Image - Hybrid - Swin - U-Net
-
Attention Swin U-Net: Cross-Contextual Attention Mechanism for Skin Lesion Segmentation
Paper • 2210.16898 • Published • 2 -
Cross-Shaped Windows Transformer with Self-supervised Pretraining for Clinically Significant Prostate Cancer Detection in Bi-parametric MRI
Paper • 2305.00385 • Published • 2 -
Event Camera Demosaicing via Swin Transformer and Pixel-focus Loss
Paper • 2404.02731 • Published • 1
Papers - Image - Segmentation - Quantum
Papers - Image - Hybrid - Patient Meta Data - U-Net
Papers - Image - Encoders
-
CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows
Paper • 2107.00652 • Published • 2 -
Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering
Paper • 2403.09622 • Published • 18 -
Veagle: Advancements in Multimodal Representation Learning
Paper • 2403.08773 • Published • 10 -
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
Paper • 2304.14178 • Published • 3
Papers - Image - Attention - BOAT - Bilateral Local Attn
Papers - Image - Swin
-
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Paper • 2103.14030 • Published • 5 -
A Novel Transformer Based Semantic Segmentation Scheme for Fine-Resolution Remote Sensing Images
Paper • 2104.12137 • Published • 2 -
Self-Supervised Learning with Swin Transformers
Paper • 2105.04553 • Published • 3 -
Evaluating Transformer-based Semantic Segmentation Networks for Pathological Image Segmentation
Paper • 2108.11993 • Published • 2
Papers - BYOL
Papers - Text - Model Guided Training
Papers - Image - Hybrid - Hybrid Task Cascade (HTC) - Swin
Papers - Image - Dino
-
Self-Supervised Vision Transformers Learn Visual Concepts in Histopathology
Paper • 2203.00585 • Published • 2 -
Emerging Properties in Self-Supervised Vision Transformers
Paper • 2104.14294 • Published • 3 -
DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting
Paper • 2404.06903 • Published • 21 -
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
Paper • 2404.07973 • Published • 32
Papers - DenseNet
Papers - Image - EfficientNet
Papers - Base Models - Text - Coding
Papers - AI - Social Risks
Papers - Testing - Single Layer Model
Papers - Pre-training
-
Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
Paper • 2310.20587 • Published • 18 -
Chain-of-Thought Reasoning Without Prompting
Paper • 2402.10200 • Published • 109 -
LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement
Paper • 2403.15042 • Published • 27 -
LIMA: Less Is More for Alignment
Paper • 2305.11206 • Published • 26
Papers - Fine-tuning
-
Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
Paper • 2310.20587 • Published • 18 -
SELF: Language-Driven Self-Evolution for Large Language Model
Paper • 2310.00533 • Published • 2 -
QLoRA: Efficient Finetuning of Quantized LLMs
Paper • 2305.14314 • Published • 56 -
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
Paper • 2309.14717 • Published • 45
Papers - Reinforcement Learning
-
Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
Paper • 2310.20587 • Published • 18 -
SELF: Language-Driven Self-Evolution for Large Language Model
Paper • 2310.00533 • Published • 2 -
Bigger, Better, Faster: Human-level Atari with human-level efficiency
Paper • 2305.19452 • Published • 4 -
DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search
Paper • 2408.08152 • Published • 59
Papers - Training
-
SELF: Language-Driven Self-Evolution for Large Language Model
Paper • 2310.00533 • Published • 2 -
GrowLength: Accelerating LLMs Pretraining by Progressively Growing Training Length
Paper • 2310.00576 • Published • 2 -
A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity
Paper • 2305.13169 • Published • 3 -
Transformers Can Achieve Length Generalization But Not Robustly
Paper • 2402.09371 • Published • 15
Papers - Text - Math
-
JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention
Paper • 2310.00535 • Published • 2 -
Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification
Paper • 2308.07921 • Published • 23 -
AutoNumerics-Zero: Automated Discovery of State-of-the-Art Mathematical Functions
Paper • 2312.08472 • Published • 2 -
Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems
Paper • 2408.16293 • Published • 27
Papers - Multimodal - Healthcare
Papers - Named Entity Extraction - Healthcare
Papers - Healthcare
-
MedAlpaca -- An Open-Source Collection of Medical Conversational AI Models and Training Data
Paper • 2304.08247 • Published • 2 -
Structural Similarities Between Language Models and Neural Response Measurements
Paper • 2306.01930 • Published • 2 -
Multimodal ChatGPT for Medical Applications: an Experimental Study of GPT-4V
Paper • 2310.19061 • Published • 8 -
Question-Answering Model for Schizophrenia Symptoms and Their Impact on Daily Life using Mental Health Forums Data
Paper • 2310.00448 • Published
Papers - Image - Clip
-
Demystifying CLIP Data
Paper • 2309.16671 • Published • 20 -
Model Stock: All we need is just a few fine-tuned models
Paper • 2403.19522 • Published • 13 -
Bigger is not Always Better: Scaling Properties of Latent Diffusion Models
Paper • 2404.01367 • Published • 22 -
On the Scalability of Diffusion-based Text-to-Image Generation
Paper • 2404.02883 • Published • 19
Papers - Disaster Recovery
Papers - Neural Architecture Search - Report
Papers - Neural Architecture Search - Tabular Data
Papers - Image - Neural Architecture Search
Papers - Neural Architecture Search - Reinforcement Learning
Papers - AutoML
-
Unified Functional Hashing in Automatic Machine Learning
Paper • 2302.05433 • Published • 2 -
Self-Discover: Large Language Models Self-Compose Reasoning Structures
Paper • 2402.03620 • Published • 117 -
Semi-Supervised Semantic Segmentation using Redesigned Self-Training for White Blood Cells
Paper • 2401.07278 • Published • 2
Papers - Testing - Speech and Text
Papers - Automated Training - Self Discover
Papers - Math - Research
-
AutoNumerics-Zero: Automated Discovery of State-of-the-Art Mathematical Functions
Paper • 2312.08472 • Published • 2 -
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
Paper • 2403.14624 • Published • 53 -
ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline
Paper • 2404.02893 • Published • 22 -
Rho-1: Not All Tokens Are What You Need
Paper • 2404.07965 • Published • 93
Papers - Critical Thinking - Step Back
Papers - Text - Length Generalization
Papers - Image - Multi-Image Reasoning
Papers - Image - Text and Symbolic Image Generator
Papers - Multimodal - Text to 2D to 3D Mesh
Datasets - Multimodal - Text and Image
-
Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset
Paper • 2403.09029 • Published • 55 -
HuggingFaceM4/WebSight
Viewer • Updated • 2.75M • 16.8k • 372 -
HuggingFaceM4/VLM_WebSight_finetuned
Text Generation • 8B • Updated • 584 • 190 -
laion/filtered-wit
Viewer • Updated • 2.8M • 4.67k • 10
Papers - Image - Selective Scan
Papers - Encoders
-
Functional Interpolation for Relative Positions Improves Long Context Transformers
Paper • 2310.04418 • Published • 4 -
SPBERT: An Efficient Pre-training BERT on SPARQL Queries for Question Answering over Knowledge Graphs
Paper • 2106.09997 • Published • 2 -
Neural Machine Translation of Rare Words with Subword Units
Paper • 1508.07909 • Published • 4 -
A Multimodal Approach to Device-Directed Speech Detection with Large Language Models
Paper • 2403.14438 • Published • 2
Papers - Video - Understanding with Many Models
Papers - Image - Understanding
-
Veagle: Advancements in Multimodal Representation Learning
Paper • 2403.08773 • Published • 10 -
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
Paper • 2304.14178 • Published • 3 -
Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs
Paper • 2403.12596 • Published • 11 -
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images
Paper • 2403.11703 • Published • 17
Papers - Image - GiT
Papers - QFormer
Papers - Image - Attention - Window
-
Vision Transformer with Quadrangle Attention
Paper • 2303.15105 • Published • 2 -
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Paper • 2103.14030 • Published • 5 -
MAFormer: A Transformer Network with Multi-scale Attention Fusion for Visual Recognition
Paper • 2209.01620 • Published • 2 -
CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows
Paper • 2107.00652 • Published • 2
Papers - Image - Training - Noise
Papers - Image - Training - Quantized Mask
Papers - Image - Training - Seed Vector
Papers - Blockwise Parallel
Papers - Training - Masked Sequence Packing
Papers - Semantic Segmentation
Papers - Training - Self-Training - Student and Teacher
Papers - Training - Guided Task Flow
Papers - Structured Thoughts
Papers - Chinchilla
Papers - Custom Layers - Hash Layers
Papers - Hallucination - Reduction
Papers - Reading Comprehension
Papers - Training - Chain of Thought
Papers - Ethics
-
Exploring Large Language Models' Cognitive Moral Development through Defining Issues Test
Paper • 2309.13356 • Published • 37 -
Unveiling Safety Vulnerabilities of Large Language Models
Paper • 2311.04124 • Published • 10 -
TrustLLM: Trustworthiness in Large Language Models
Paper • 2401.05561 • Published • 69 -
Evaluating Frontier Models for Dangerous Capabilities
Paper • 2403.13793 • Published • 7
Papers - Fine-tuning - Understanding Tables
Datasets - Text - Tabular
Papers - Qwen
-
Qwen Technical Report
Paper • 2309.16609 • Published • 37 -
Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities
Paper • 2308.12966 • Published • 11 -
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
Paper • 2311.07919 • Published • 10 -
Audio Dialogues: Dialogues dataset for audio and music understanding
Paper • 2404.07616 • Published • 16
Papers - Multimodal - Report
Significant improvements in zero-shot performance require exponentially more data, following a log-linear scaling trend.
Papers - Attention - Custom Encoder
Papers - Research - Safety
Papers - Reduce Model Size - SliceGPT
Papers - Rag
-
MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries
Paper • 2401.15391 • Published • 6 -
RAFT: Adapting Language Model to Domain Specific RAG
Paper • 2403.10131 • Published • 72 -
Superposition Prompting: Improving and Accelerating Retrieval-Augmented Generation
Paper • 2404.06910 • Published • 3 -
Stylus: Automatic Adapter Selection for Diffusion Models
Paper • 2404.18928 • Published • 15
Papers - Encoders - Coding
Embeddings - Coding - CodeBert
Papers - Coding - Fill in the Middle - Infilling
Papers - Training - Knowledge Graphs
Papers - Image - Training - Adversarial
Papers - Text - Tabular - Conditional Formatting
Papers - Coding - Out of Vocabulary
Papers - Automatic Speech Recognition
-
Streaming Transformer ASR with Blockwise Synchronous Beam Search
Paper • 2006.14941 • Published • 2 -
A Multimodal Approach to Device-Directed Speech Detection with Large Language Models
Paper • 2403.14438 • Published • 2 -
SONAR: Sentence-Level Multimodal and Language-Agnostic Representations
Paper • 2308.11466 • Published • 1
Papers - Beam Search
Papers - Training - Synthetic Data - Sycophancy
Papers - Training - Domain Reweighting
Papers - Training - Proxy Model - Group DRO
Papers - Coding - Decoding with Static Analysis
Papers - UDOP
Papers - Multimodal - Document and Text
Papers - Encoder - Byte-Pair Encoding
-
Neural Machine Translation of Rare Words with Subword Units
Paper • 1508.07909 • Published • 4 -
A Formal Perspective on Byte-Pair Encoding
Paper • 2306.16837 • Published • 3 -
Byte-Pair Encoding for Text-to-SQL Generation
Paper • 1910.08962 • Published • 2 -
Pattern Discovery in Time Series with Byte Pair Encoding
Paper • 2106.00614 • Published • 2
Papers - Science - Research Analysis
Papers - Attention - Tree Attention
-
Recurrent Drafter for Fast Speculative Decoding in Large Language Models
Paper • 2403.09919 • Published • 22 -
SpecInfer: Accelerating Generative LLM Serving with Speculative Inference and Token Tree Verification
Paper • 2305.09781 • Published • 4 -
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
Paper • 2408.04093 • Published • 4
Models - Table - Extraction
-
microsoft/table-transformer-detection
Object Detection • 28.8M • Updated • 1.55M • 379 -
RUCKBReasoning/TableLLM-13b
Text Generation • 13B • Updated • 49 • 31 -
RUCKBReasoning/TableLLM-7b
Text Generation • 7B • Updated • 84 • 15 -
TahaDouaji/detr-doc-table-detection
Object Detection • 41.6M • Updated • 185k • 60
Papers - Audio - GAN - Upsampling
Papers - Image - Illumination
Papers - Image - Edit
Papers - Fine-tuning - Parameter Efficiency
Papers - Text - 3D Mesh - Volumetric
Papers - Image - Limited-Training
Papers - Image - Plot - Understanding and Reasoning
Papers - Text - Taxonomy Generator
Papers - Fine-tuning - Language Model Policy with LoRA
Papers - Robotic - Observational Learning
Papers - Training - Skill Learning
Papers - mPlug-Owl
Papers - Document - mPlugOwl
Papers - Prompt - Prompt Compression - Report
Papers - Image - Gaussian Splatting and NeRF
-
GaussianFlow: Splatting Gaussian Dynamics for 4D Content Creation
Paper • 2403.12365 • Published • 11 -
RadSplat: Radiance Field-Informed Gaussian Splatting for Robust Real-Time Rendering with 900+ FPS
Paper • 2403.13806 • Published • 18 -
Gaussian Frosting: Editable Complex Radiance Fields with Real-Time Rendering
Paper • 2403.14554 • Published • 14 -
DreamPolisher: Towards High-Quality Text-to-3D Generation via Geometric Diffusion
Paper • 2403.17237 • Published • 11
Models - Reverse Engineering
Models - Table - Structure - Recognition
Papers - Image - Table
Papers - Image - Object Detection
-
End-to-End Object Detection with Transformers
Paper • 2005.12872 • Published • 7 -
COCONut: Modernizing COCO Segmentation
Paper • 2404.08639 • Published • 30 -
Grounded Language-Image Pre-training
Paper • 2112.03857 • Published • 3 -
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
Paper • 2311.06242 • Published • 94
Papers - Benchmarks - Reward Models
Papers - Science - Molecule
Papers - Image - Frankenmerging
Papers - Attention - Grouped-Query Attention (GQA)
-
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
Paper • 2305.13245 • Published • 6 -
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models
Paper • 2404.12387 • Published • 39 -
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
Paper • 2404.14619 • Published • 126 -
Qwen2 Technical Report
Paper • 2407.10671 • Published • 166
Papers - Benchmarks - Math
-
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
Paper • 2403.14624 • Published • 53 -
Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs
Paper • 2312.17080 • Published • 1 -
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?
Paper • 2407.01284 • Published • 81 -
DynaMath: A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models
Paper • 2411.00836 • Published • 15
Papers - Multimodal - Mamba
Papers - Image - Personalization
Papers - Image - Blip
Papers - Image - Video Generator
-
Explorative Inbetweening of Time and Space
Paper • 2403.14611 • Published • 13 -
StyleCineGAN: Landscape Cinemagraph Generation using a Pre-trained StyleGAN
Paper • 2403.14186 • Published • 10 -
Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction
Paper • 2403.18795 • Published • 20
Papers - Video - Time Reversal Fusion
Papers - Image - Video - Adversarial (GAN)
Papers - Fine-tuning - Toxicity
Papers - Decoders - Chain of Thought
Papers - Image - Flow Matching
Papers - Text - Classification - Social Media
Papers - Text - Training - Classification
Papers - Multimodal - Audio
Papers - Encoders - Audio
Papers - Math - Derive New Math - Function Class
Papers - Agent - Memory
Papers - Critic Models
Papers - Security
-
Python Fuzzing for Trustworthy Machine Learning Frameworks
Paper • 2403.12723 • Published • 2 -
Red Teaming GPT-4V: Are GPT-4V Safe Against Uni/Multi-Modal Jailbreak Attacks?
Paper • 2404.03411 • Published • 11 -
Teams of LLM Agents can Exploit Zero-Day Vulnerabilities
Paper • 2406.01637 • Published • 2 -
AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases
Paper • 2407.12784 • Published • 51
Papers - Reasoning - Critic Pattern
-
CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing
Paper • 2305.11738 • Published • 8 -
CriticBench: Benchmarking LLMs for Critique-Correct Reasoning
Paper • 2402.14809 • Published • 3 -
DRLC: Reinforcement Learning with Dense Rewards from LLM Critic
Paper • 2401.07382 • Published • 2
Papers - Sports
Papers - Pop Culture
Papers - Coding - Training
Papers - Coding - Reasoning
-
V-STaR: Training Verifiers for Self-Taught Reasoners
Paper • 2402.06457 • Published • 9 -
Advancing LLM Reasoning Generalists with Preference Trees
Paper • 2404.02078 • Published • 46 -
Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models
Paper • 2404.02575 • Published • 50
Papers - Video - Streaming
Papers - Encoders - Video
Papers - Multimodal - Captions - Audio
Papers - Multimodal - Captions - Video
Papers - 3D
-
ThemeStation: Generating Theme-Aware 3D Assets from Few Exemplars
Paper • 2403.15383 • Published • 15 -
FlexiDreamer: Single Image-to-3D Generation with FlexiCubes
Paper • 2404.00987 • Published • 23 -
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs
Paper • 2402.15627 • Published • 38 -
Interactive3D: Create What You Want by Interactive 3D Generation
Paper • 2404.16510 • Published • 21
Papers - Document - Understanding
-
FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions
Paper • 2403.15246 • Published • 11 -
Noise-Aware Training of Layout-Aware Language Models
Paper • 2404.00488 • Published • 10 -
SnapKV: LLM Knows What You are Looking for Before Generation
Paper • 2404.14469 • Published • 27
Papers - Compiler
Papers - LLVM
Papers - Tree of Thoughts
Papers - Coding - Stack Traces
Papers - Fine-tuning - Search Based
Papers - Encoders - T5
Papers - T5
-
Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping
Paper • 2402.14083 • Published • 48 -
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
Paper • 2305.13245 • Published • 6 -
Training a T5 Using Lab-sized Resources
Paper • 2208.12097 • Published • 1 -
Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints
Paper • 2212.05055 • Published • 6
Papers - Training - Weighted Average
Papers - Training - Fitness Score
Papers - Fine-tuning - Prompts
Models - T5
Papers - Encoders - VAE
Papers - Image - Synthetic Data - Human Faces
Papers - Fine-tuning - Multilingual
Papers - SAM - Segment Anything Model
-
Prompt me a Dataset: An investigation of text-image prompting for historical image dataset creation using foundation models
Paper • 2309.01674 • Published • 2 -
Segment Anything
Paper • 2304.02643 • Published • 4 -
EgoLifter: Open-world 3D Segmentation for Egocentric Perception
Paper • 2403.18118 • Published • 12 -
A Multimodal Automated Interpretability Agent
Paper • 2404.14394 • Published • 22
Papers - Image - Explainability
Papers - Image - Pattern Recognition
Papers - Training - Distribution-based
Papers - Image - In-Context Learning
Papers - Deepmind - ICL Rule-based Classification
Spaces - Decoders - Beam Search Visualizer
Spaces - Decoders
Papers - FAIR
Papers - Healthcare - Text - Antibodies
Papers - Performance - Intel
Papers - VQA
Papers - Fine-tuning - Report
Papers - Video - Enhance
Papers - Meta
-
LIMA: Less Is More for Alignment
Paper • 2305.11206 • Published • 26 -
Garment3DGen: 3D Garment Stylization and Texture Generation
Paper • 2403.18816 • Published • 25 -
EgoLifter: Open-world 3D Segmentation for Egocentric Perception
Paper • 2403.18118 • Published • 12 -
The Unreasonable Ineffectiveness of the Deeper Layers
Paper • 2403.17887 • Published • 82
Papers - Image - Avatar Generator
Papers - Healthcare - Synthetic Data Generator - 3D
Could also use a DNA repo like: https://github.com/koeng101/dnadesign
Datasets - Fine-tuning
Papers - University - MIT
-
One-step Diffusion with Distribution Matching Distillation
Paper • 2311.18828 • Published • 3 -
The Unreasonable Ineffectiveness of the Deeper Layers
Paper • 2403.17887 • Published • 82 -
Condition-Aware Neural Network for Controlled Image Generation
Paper • 2404.01143 • Published • 13 -
Locating and Editing Factual Associations in GPT
Paper • 2202.05262 • Published • 1
Papers - Image - MultiDiffusion
Papers - Convert - T2I to T2V
Papers - OpenAI
-
The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
Paper • 1801.03924 • Published • 2 -
Fine-Tuning Language Models from Human Preferences
Paper • 1909.08593 • Published • 3 -
Training Verifiers to Solve Math Word Problems
Paper • 2110.14168 • Published • 4 -
Learning Transferable Visual Models From Natural Language Supervision
Paper • 2103.00020 • Published • 17
Papers - RWKV
Papers - Text - Fact Checking
Papers - Healthcare - Text
Papers - University - Stanford University
-
BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text
Paper • 2403.18421 • Published • 23 -
Long-form factuality in large language models
Paper • 2403.18802 • Published • 26 -
stanford-crfm/BioMedLM
Text Generation • Updated • 1.83k • 440 -
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper • 2305.18290 • Published • 63
Models - Healthcare
Papers - Encoders - Synthetic Noise
Papers - Video - Clothing
Papers - IoT - Assistant
Papers - Training Research - AD FOFE
Papers - Image - Editing - Object Insertion
Papers - 3DGS - Feature Rendering
Papers - 3DGS - Security Camera Object Detection
Papers - University - Carnegie Mellon University
-
Can large language models explore in-context?
Paper • 2403.15371 • Published • 33 -
Long-context LLMs Struggle with Long In-context Learning
Paper • 2404.02060 • Published • 37 -
PIQA: Reasoning about Physical Commonsense in Natural Language
Paper • 1911.11641 • Published • 3 -
AQuA: A Benchmarking Tool for Label Quality Assessment
Paper • 2306.09467 • Published • 1
Papers - Healthcare - Image - SynthRAD2023
Models - MoE - GQA
Models - MoE - Coding
Papers - Text - Translation
Papers - Image - Synthetic Noise
Papers - Johns Hopkins
Papers - Intel
-
Mesh2NeRF: Direct Mesh Supervision for Neural Radiance Field Representation and Generation
Paper • 2403.19319 • Published • 14 -
Getting it Right: Improving Spatial Consistency in Text-to-Image Models
Paper • 2404.01197 • Published • 31 -
LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model
Paper • 2404.01331 • Published • 27 -
LVLM-Intrepret: An Interpretability Tool for Large Vision-Language Models
Paper • 2404.03118 • Published • 26
Papers - Image - Encoders - Text
Papers - Video - Reasoning - Time of Events
Papers - Video - Training - Understanding Time
Papers - U-Net - 3D
Models - Fine-tuning
Papers - Fine-tuning - Preference-based RL (PbRL)
-
Dueling RL: Reinforcement Learning with Trajectory Preferences
Paper • 2111.04850 • Published • 2 -
Learning Trajectory Preferences for Manipulators via Iterative Improvement
Paper • 1306.6294 • Published • 3 -
Deep reinforcement learning from human preferences
Paper • 1706.03741 • Published • 4 -
Learning Dynamic Robot-to-Human Object Handover from Human Feedback
Paper • 1603.06390 • Published • 2
Papers - Robotics - Fine-tuning - PbRL
Papers - Reward Model
-
Fine-Tuning Language Models from Human Preferences
Paper • 1909.08593 • Published • 3 -
Transforming and Combining Rewards for Aligning Large Language Models
Paper • 2402.00742 • Published • 12 -
Leverage the Average: an Analysis of KL Regularization in RL
Paper • 2003.14089 • Published • 2 -
Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward
Paper • 2404.01258 • Published • 12
Papers - Reward Model - Training
-
Transforming and Combining Rewards for Aligning Large Language Models
Paper • 2402.00742 • Published • 12 -
UltraFeedback: Boosting Language Models with High-quality Feedback
Paper • 2310.01377 • Published • 5 -
Learn Your Reference Model for Real Good Alignment
Paper • 2404.09656 • Published • 89 -
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
Paper • 2405.01535 • Published • 124
Papers - Reward Models - KL Regularization - RL
Models - Fine-tuning - PPO
Papers - Fine-tuning - Emulator
Datasets - Fine-tuning - RLHF
Papers - top-k - Flat (good) vs Peaked (bad) Dist Sampling
Figure 5: The probability mass assigned to partial human sentences. Flat distributions lead to many moderately probable tokens, while peaked distribut
Papers - Institute - Allen Institute
-
The Curious Case of Neural Text Degeneration
Paper • 1904.09751 • Published • 3 -
PIQA: Reasoning about Physical Commonsense in Natural Language
Paper • 1911.11641 • Published • 3 -
SocialIQA: Commonsense Reasoning about Social Interactions
Paper • 1904.09728 • Published • 3 -
HellaSwag: Can a Machine Really Finish Your Sentence?
Paper • 1905.07830 • Published • 6
Models - 1bit
Papers - Coding - Unit Tests
Papers - Audio - WaveNet
Papers - Audio - TTS
-
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
Paper • 1712.05884 • Published • 3 -
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
Paper • 2403.16973 • Published • 2 -
High Fidelity Neural Audio Compression
Paper • 2210.13438 • Published • 4 -
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis
Paper • 2404.03204 • Published • 10
Papers - Decoders - Audio
Papers - Image - GAN
Papers - GAN - Compression
Papers - Audio - Speech Transcription
Papers - Audio - Voice Activity Detection
Models - Audio - TTS
Models - Audio
Models - Audio - Encoders
Models - FAIR
Models - Audio - Music Generator
Models - TinyLlama
-
keeeeenw/MicroLlama
Text Generation • 0.3B • Updated • 1.47k • 51 -
TinyLlama/TinyLlama-1.1B-intermediate-step-240k-503b
Text Generation • 1B • Updated • 458 • 20 -
TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
Text Generation • 1B • Updated • 20.6k • 180 -
TinyLlama/TinyLlama-1.1B-Chat-v1.0
Text Generation • 1B • Updated • 4.23M • 1.44k
Models - Starling
Datasets - Starling
Papers - Audio - Residual Vector Quantization
Models - Image - Object Detection - DETR
Papers - Audio - Inference - Rescore Models
Inference - Autoregressive and Non-Autoregressive Models
Models - Text - Music Generator
Papers - Touch
Papers - Flan-T5
Papers - Mobile - User Entity Context Understanding
Models - MoE - Mamba
Papers - University of Tokyo
Papers - Duke
Papers - Image - Report
Papers - Trustworthiness
Papers - Healthcare - Surgical Gestures
Papers - Fine-tuning - Dataset - Few-Shot Retrieval (FRet)
Papers - Embeddings
-
Gecko: Versatile Text Embeddings Distilled from Large Language Models
Paper • 2403.20327 • Published • 48 -
Round and Round We Go! What makes Rotary Positional Encodings useful?
Paper • 2410.06205 • Published • 2 -
Byte Latent Transformer: Patches Scale Better Than Tokens
Paper • 2412.09871 • Published • 108 -
MrT5: Dynamic Token Merging for Efficient Byte-level Language Models
Paper • 2410.20771 • Published • 3
Papers - Text - Memorization
Gradients flow differently for memorized and non-memorized text during decoding.
Papers - Huawei
-
DiJiang: Efficient Large Language Models through Compact Kernelization
Paper • 2403.19928 • Published • 12 -
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
Paper • 2309.12284 • Published • 18 -
TextHawk: Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models
Paper • 2404.09204 • Published • 11 -
SAGS: Structure-Aware 3D Gaussian Splatting
Paper • 2404.19149 • Published • 14
Papers - Inference - vLLM
Papers - Fine-tuning - Model Merge
Papers - Naver
Models - Frankenmerge
Papers - Benchmarks
-
The FinBen: An Holistic Financial Benchmark for Large Language Models
Paper • 2402.12659 • Published • 23 -
Long-context LLMs Struggle with Long In-context Learning
Paper • 2404.02060 • Published • 37 -
Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs
Paper • 2312.17080 • Published • 1 -
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Paper • 1804.07461 • Published • 4
Papers - 1bit
-
DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients
Paper • 1606.06160 • Published • 1 -
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper • 2402.17764 • Published • 625 -
mobiuslabsgmbh/Llama-2-7b-chat-hf_1bitgs8_hqq
Text Generation • Updated • 53 • 74
Papers - Video - Fine-tuning
Models - Spright
Papers - Hugging Face
Papers - University - Tsinghua University
-
Condition-Aware Neural Network for Controlled Image Generation
Paper • 2404.01143 • Published • 13 -
FlexiDreamer: Single Image-to-3D Generation with FlexiCubes
Paper • 2404.00987 • Published • 23 -
Advancing LLM Reasoning Generalists with Preference Trees
Paper • 2404.02078 • Published • 46 -
ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline
Paper • 2404.02893 • Published • 22
Papers - Xidian University
Papers - ShengShu
Papers - Non-Autoregressive Transformers
Papers - Safety
Papers - Audio - Chain of Thought
Papers - Audio - Fine-tuning
-
WavLLM: Towards Robust and Adaptive Speech Large Language Model
Paper • 2404.00656 • Published • 11 -
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization
Paper • 2404.09956 • Published • 12 -
Long-form music generation with latent diffusion
Paper • 2404.10301 • Published • 27 -
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
Paper • 2006.11477 • Published • 7
Papers - Image - Continual Training Framework
Papers - Documents - FormNet
-
Noise-Aware Training of Layout-Aware Language Models
Paper • 2404.00488 • Published • 10 -
FormNetV2: Multimodal Graph Contrastive Learning for Form Document Information Extraction
Paper • 2305.02549 • Published • 6 -
FormNet: Structural Encoding beyond Sequential Modeling in Form Document Information Extraction
Paper • 2203.08411 • Published • 1 -
ETC: Encoding Long and Structured Inputs in Transformers
Paper • 2004.08483 • Published • 1
Papers - Ohio State
Papers - Video - Streaming - Captions
Papers - Healthcare - Cardiac MRI - CMRxRecon Challenge 2023
Papers - Image - Healthcare
-
The state-of-the-art in Cardiac MRI Reconstruction: Results of the CMRxRecon Challenge in MICCAI 2023
Paper • 2404.01082 • Published • 1 -
Realism in Action: Anomaly-Aware Diagnosis of Brain Tumors from Medical Images Using YOLOv8 and DeiT
Paper • 2401.03302 • Published • 1 -
Brain2Music: Reconstructing Music from Human Brain Activity
Paper • 2307.11078 • Published • 41
Papers - Coding - C/C++ - Memory
Papers - Coding - Annotations, Decorators and Captions
Papers - Image - Contrastive Graph Learning
Papers - Documents - Tabular
-
FormNetV2: Multimodal Graph Contrastive Learning for Form Document Information Extraction
Paper • 2305.02549 • Published • 6 -
FormNet: Structural Encoding beyond Sequential Modeling in Form Document Information Extraction
Paper • 2203.08411 • Published • 1 -
More efficient manual review of automatically transcribed tabular data
Paper • 2306.16126 • Published • 1 -
CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents
Paper • 2004.12629 • Published • 3
Papers - Documents - Graph Convolutional Network
Papers - Decoders - Bert
Papers - T5 - MoE
Papers - Image - Extract Style
Papers - Image - Use a Model to find a similar image
https://github.com/learn2phoenix/CSD
Papers - Shanghai AI Laboratory
-
CameraCtrl: Enabling Camera Control for Text-to-Video Generation
Paper • 2404.02101 • Published • 24 -
Adapting LLaMA Decoder to Vision Transformer
Paper • 2404.06773 • Published • 18 -
Interactive3D: Create What You Want by Interactive 3D Generation
Paper • 2404.16510 • Published • 21 -
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B
Paper • 2406.07394 • Published • 29
Papers - Government - USA
Papers - Vector Institute
-
Long-context LLMs Struggle with Long In-context Learning
Paper • 2404.02060 • Published • 37 -
Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks
Paper • 2211.12588 • Published • 3 -
StructLM: Towards Building Generalist Models for Structured Knowledge Grounding
Paper • 2402.16671 • Published • 29 -
Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model
Paper • 2404.04167 • Published • 14
Papers - Benchmarks - In-Context Learning
Models - Documents - OCR
Models - Text - Classifier - Deberta
https://github.com/MoritzLaurer/zeroshot-classifier/tree/main
Papers - Network Traffic - 4G and 5G - OTA - Packet Shaping
Papers - Network Traffic - 4G and 5G
Papers - Network Traffic - Packet Shaping
Papers - Network Traffic
Papers - University of Peking
-
LLM-ABR: Designing Adaptive Bitrate Algorithms via Large Language Models
Paper • 2404.01617 • Published • 8 -
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
Paper • 2404.02905 • Published • 74 -
Learning From Mistakes Makes LLM Better Reasoner
Paper • 2310.20689 • Published • 29 -
Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model
Paper • 2404.04167 • Published • 14
Papers - Coding - Understanding Tree Structures
Papers - University - University of Illinois
-
Advancing LLM Reasoning Generalists with Preference Trees
Paper • 2404.02078 • Published • 46 -
PointInfinity: Resolution-Invariant Point Diffusion Models
Paper • 2404.03566 • Published • 16 -
MonoPatchNeRF: Improving Neural Radiance Fields with Patch-based Monocular Guidance
Paper • 2404.08252 • Published • 6 -
SnapKV: LLM Knows What You are Looking for Before Generation
Paper • 2404.14469 • Published • 27
Papers - Multilingual - Finnish
Papers - LLaVA
-
LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model
Paper • 2404.01331 • Published • 27 -
LVLM-Intrepret: An Interpretability Tool for Large Vision-Language Models
Paper • 2404.03118 • Published • 26 -
DesignQA: A Multimodal Benchmark for Evaluating Large Language Models' Understanding of Engineering Documentation
Paper • 2404.07917 • Published • 2 -
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
Paper • 2404.07973 • Published • 32
Papers - Multimodal - Training
-
LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model
Paper • 2404.01331 • Published • 27 -
Data curation via joint example selection further accelerates multimodal learning
Paper • 2406.17711 • Published • 3 -
Unveiling Encoder-Free Vision-Language Models
Paper • 2406.11832 • Published • 54
Papers - Image - Encoders - DinoV2
-
LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model
Paper • 2404.01331 • Published • 27 -
OmniFusion Technical Report
Paper • 2404.06212 • Published • 77 -
MoDE: CLIP Data Experts via Clustering
Paper • 2404.16030 • Published • 15 -
WildGaussians: 3D Gaussian Splatting in the Wild
Paper • 2407.08447 • Published • 9
Papers - Training Research - Smaller vs Larger Models
Papers - Pre-training - Dynamic Context Length
For HyperCLOVA X, they split 90% at 4096 and 10% at 32k context length during pre-training
Papers - Text - Supervised Fine-tuning - Batch Grouping
Batches are grouped by similar token length to improve GPU/hardware utilization. Mini-batch sizes differ, but the maximum number of tokens per batch is the same.
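A minimal sketch of the batch grouping described in the note above, assuming a padded-token budget per mini-batch. The function name `group_by_token_budget`, the sample lengths, and the 1024-token budget are hypothetical and are not taken from the collected papers.

```python
def group_by_token_budget(lengths, max_tokens):
    """Group sample indices into mini-batches that share one token budget.

    Samples are sorted by length so each batch holds similar-length sequences.
    The padded cost of a batch is (longest sequence) * (number of sequences).
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches, current = [], []
    for idx in order:
        if lengths[idx] > max_tokens:
            raise ValueError(f"sample {idx} exceeds the token budget")
        # Sorted ascending, so the newest sample is the longest in the batch.
        padded_cost = lengths[idx] * (len(current) + 1)
        if current and padded_cost > max_tokens:
            batches.append(current)
            current = []
        current.append(idx)
    if current:
        batches.append(current)
    return batches


# Hypothetical token counts for eight training samples.
lengths = [12, 480, 35, 510, 40, 8, 250, 260]
batches = group_by_token_budget(lengths, max_tokens=1024)
for batch in batches:
    print([lengths[i] for i in batch])
```

Short samples end up in larger batches and long samples in smaller ones, while every batch stays under the same padded-token ceiling.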
Papers - Multilingual - Benchmarks
Papers - Image - SDXL
-
On the Scalability of Diffusion-based Text-to-Image Generation
Paper • 2404.02883 • Published • 19 -
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation
Paper • 2404.02733 • Published • 22 -
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching
Paper • 2404.03653 • Published • 36 -
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback
Paper • 2404.07987 • Published • 48
Papers - Video - Autoregressive Model
Papers - Coding - Algorithmic Reasoning
Papers - Coding - Program of Thoughts (PoT)
Papers - Prompts - Detailed Examples
Papers - Mixture of Depths - MLP, residuals, router, tokens
Papers - Yonsei University
Papers - Alibaba
-
Freditor: High-Fidelity and Transferable NeRF Editing by Frequency Decomposition
Paper • 2404.02514 • Published • 11 -
BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer
Paper • 1904.06690 • Published • 1 -
Contrastive Chain-of-Thought Prompting
Paper • 2311.09277 • Published • 36 -
LLM-R2: A Large Language Model Enhanced Rule-based Rewrite System for Boosting Query Efficiency
Paper • 2404.12872 • Published • 11
Papers - Image - Frequency Decomposition
Papers - University - Hong Kong University of Science and Te
-
Event Camera Demosaicing via Swin Transformer and Pixel-focus Loss
Paper • 2404.02731 • Published • 1 -
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
Paper • 2309.12284 • Published • 18 -
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis
Paper • 2404.03204 • Published • 10 -
Adapting LLaMA Decoder to Vision Transformer
Paper • 2404.06773 • Published • 18
Papers - 3D - Interior Design
Papers - 3D - Indoor Scene Synthesis
Papers - Reasoning - Self-Reference Metalinguistic
Papers - PlayTest AI
Papers - Reasoning - MRGSM8k - Meta Math Multi Step
Papers - Tencent
-
Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs
Paper • 2312.17080 • Published • 1 -
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
Paper • 2404.12253 • Published • 55 -
SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension
Paper • 2404.16790 • Published • 10 -
A Thorough Examination of Decoding Methods in the Era of LLMs
Paper • 2402.06925 • Published • 1
Datasets - Reasoning - Meta Math Multi-Step - GSM8k
Papers - University of Cambridge
Papers - Alan Turing Institute
Datasets - Text - QA
Models - Image - Handwriting Comprehension
Papers - Arctic University of Norway
Papers - Documents - Tabular - Census
Papers - Documents - Custom Annotation and Labeling Tools
Papers - CascadeTabNet
Papers - Pune Institute
Papers - Documents - Table Recognition - Fine-tuning
Papers - Image - OCR - Tesseract for Text Location
Papers - Harbin Institute
Papers - Coding - OpenCodeInterpreter
Papers - Coding - Training - Equal-Info Windows
Table 5: Transformers struggle to learn Arithmetic Coding. In the sequence-to-sequence setting,
a model that learns AC compression/decompression shoul
Papers - Coding - Distributed - Adaptive Computation Time
Papers - Training Research - Compression and Multi-Model Inf
Papers - Encoders - Compression
Emergence with scale is unlikely. Given the recent findings of [55], we anticipate that continuing to scale models beyond 2 billion parameters is unlik
Papers - Tokenizer - Neural Compression
Papers - Fine-tuning - ReFT
In this paper, we propose a strong alternative to PEFTs, LoReFT. LoReFT achieves strong performance across benchmarks from four domains while being
Datasets - Reasoning - Commonsense
Papers - Reasoning - Commonsense
-
SocialIQA: Commonsense Reasoning about Social Interactions
Paper • 1904.09728 • Published • 3 -
PIQA: Reasoning about Physical Commonsense in Natural Language
Paper • 1911.11641 • Published • 3 -
BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions
Paper • 1905.10044 • Published • 2 -
HellaSwag: Can a Machine Really Finish Your Sentence?
Paper • 1905.07830 • Published • 6
Papers - University of Houston
Datasets - Reasoning - Math
Papers - Benchmarks - Image
-
AQuA: A Benchmarking Tool for Label Quality Assessment
Paper • 2306.09467 • Published • 1 -
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Paper • 2404.07972 • Published • 50 -
BLINK: Multimodal Large Language Models Can See but Not Perceive
Paper • 2404.12390 • Published • 26 -
Vision language models are blind
Paper • 2407.06581 • Published • 84
Papers - Reasoning - Math - AQuA
https://github.com/google-deepmind/AQuA
Papers - University of IAIR Xi’an Jiaotong
Datasets - Text - Instruction-following
Papers - Benchmarks - Text - General Language Understanding
Datasets - Benchmarks - Glue
Papers - Encoders - Roberta
-
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Paper • 1907.11692 • Published • 9 -
Leveraging Pre-trained Checkpoints for Sequence Generation Tasks
Paper • 1907.12461 • Published • 1 -
Transformer Language Models without Positional Encodings Still Learn Positional Information
Paper • 2203.16634 • Published • 5 -
CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation
Paper • 2102.04664 • Published • 2
Papers - University of California Santa Barbara
Models - StructLM
Papers - Prompts - System Chat
Papers - Tokenizers - LLaMA Byte Pair Encoding (BPE)
Datasets - Documents - OCR - Image with Text from Textract
Papers - University - Harvard University
-
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens
Paper • 2404.03413 • Published • 28 -
Scaling Data-Constrained Language Models
Paper • 2305.16264 • Published • 16 -
Emergence of Hidden Capabilities: Exploring Learning Dynamics in Concept Space
Paper • 2406.19370 • Published • 1
Papers - Image - Point Cloud
Papers - Image - Encoders - RGB-D
Papers - University - Beihang University
Papers - Tokenizers - Image - TrOCR
Spaces - Image - Handwriting Recognition
Papers - Audio - Text to Speech
-
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis
Paper • 2404.03204 • Published • 10 -
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
Paper • 2311.07919 • Published • 10 -
FlashSpeech: Efficient Zero-Shot Speech Synthesis
Paper • 2404.14700 • Published • 32
Papers - Audio - TTS - RALL-E
Papers - Benchmark - Security
Papers - Siemens
Papers - Munich Center for Machine Learning (MCML)
Papers - Web Navigation - Chrome Extension
Papers - Web - Training - Curriculum Learning
Papers - Zhipu AI
Datasets - Benchmarks - CodeEditorBench - OCI
Models - Text - Image
Models - Audio - Understanding
Models - Audio - Edit with Text
Models - Image - Chat
Spaces - Image - Chat
Papers - Audio - Captions
Datasets - SQL
Papers - Redwood Research
Models - Encoders - Bidirectional
Papers - Text - Encoders - Image - Clip
Papers - Training Research - Mamba
Papers - Training Research - Ablation - Factuality
Papers - Training Research - Interpretability
Papers - Interpretability
-
Prompt-to-Prompt Image Editing with Cross Attention Control
Paper • 2208.01626 • Published • 2 -
BERT Rediscovers the Classical NLP Pipeline
Paper • 1905.05950 • Published • 3 -
A Multiscale Visualization of Attention in the Transformer Model
Paper • 1906.05714 • Published • 2 -
Analyzing Transformers in Embedding Space
Paper • 2209.02535 • Published • 3
Papers - Interpretability - Attention
Papers - Training Research - Layer Understanding
Papers - Image - Imagen
Papers - Attention - Weights - Re-Weighting
Datasets - Image - ImageNet
Papers - Recommendation - Cloze Task
Papers - Recommendation
Papers - Recommendation - Bert4rec - SASRec
Papers - University of Zurich
Papers - University - Shanghai Jiao Tong University
-
DeViDe: Faceted medical knowledge for improved medical vision-language pre-training
Paper • 2404.03618 • Published • 2 -
Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model
Paper • 2404.04167 • Published • 14 -
SpecInfer: Accelerating Generative LLM Serving with Speculative Inference and Token Tree Verification
Paper • 2305.09781 • Published • 4 -
McEval: Massively Multilingual Code Evaluation
Paper • 2406.07436 • Published • 41
Papers - Training Research - Vision Language Pre-training
Papers - Multimodal - Encoders - ALBEF
Papers - Fine-tuning - RLHF - Direct Nash Optimization (DNO)
Reward expressed as win-rates related to general preferences
Datasets - Text - Alpaca
Papers - Fine-tuning - Image - Prompt Image Alignment
Papers - Fine-tuning - Stream of Search
Models - Text - Science
Papers - HKUST
Papers - Text - Dialog Inpainting
Papers - 3DGS - Color Transformation
Papers - University of Dalian
Papers - Audio - Encoders - HuBert with EnCodec
Papers - Mobile - Multimodal - Screen Image with Captions
Papers - Healthcare - DeiT
Papers - Healthcare - Image - Cancer - Brain
Papers - Image - Healthcare - DICOM
Papers - Image - DeiT
-
Realism in Action: Anomaly-Aware Diagnosis of Brain Tumors from Medical Images Using YOLOv8 and DeiT
Paper • 2401.03302 • Published • 1 -
MLP Can Be A Good Transformer Learner
Paper • 2404.05657 • Published • 1 -
Detecting and recognizing characters in Greek papyri with YOLOv8, DeiT and SimCLR
Paper • 2401.12513 • Published • 1 -
DeiT-LT Distillation Strikes Back for Vision Transformer Training on Long-Tailed Datasets
Paper • 2404.02900 • Published • 1
Papers - University of Melbourne
Papers - Indian Institute of Technology
Papers - University of Sorbonne
Papers - Regularization - Binary Cross Entropy
Models - Image - Classification
Papers - Image - Training - Mistral
Papers - Sber AI
Papers - Image - LLaVA
Papers - Image - Clip - Coco Testing
-
Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion
Paper • 2310.03502 • Published • 78 -
Transferable and Principled Efficiency for Open-Vocabulary Segmentation
Paper • 2404.07448 • Published • 12 -
RegionGPT: Towards Region Understanding Vision Language Model
Paper • 2403.02330 • Published • 2 -
GLIGEN: Open-Set Grounded Text-to-Image Generation
Paper • 2301.07093 • Published • 4
Papers - Training - Long Context
-
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
Paper • 2404.07143 • Published • 111 -
RULER: What's the Real Context Size of Your Long-Context Language Models?
Paper • 2404.06654 • Published • 39 -
An Evolved Universal Transformer Memory
Paper • 2410.13166 • Published • 6
Papers - Benchmarks - Context - Ruler
Papers - Image - Decoders - ViT
Papers - Image - Training - AS2D RoPE and SwiGLU
Papers - Image - Encoders - ViT
-
DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting
Paper • 2404.06903 • Published • 21 -
CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
Paper • 2404.15653 • Published • 29 -
MoDE: CLIP Data Experts via Clustering
Paper • 2404.16030 • Published • 15 -
BlenderAlchemy: Editing 3D Graphics with Vision-Language Models
Paper • 2404.17672 • Published • 19
Papers - Image - Training - Self Refinement
Papers - Image - Object Detection - DETR
-
End-to-End Object Detection with Transformers
Paper • 2005.12872 • Published • 7 -
ConsistencyDet: Robust Object Detector with Denoising Paradigm of Consistency Model
Paper • 2404.07773 • Published • 1 -
Efficient Transformer Encoders for Mask2Former-style models
Paper • 2404.15244 • Published • 1 -
DETRs Beat YOLOs on Real-time Object Detection
Paper • 2304.08069 • Published • 14
Papers - Text - Social Skills
Papers - KAIST AI
-
ORPO: Monolithic Preference Optimization without Reference Model
Paper • 2403.07691 • Published • 69 -
ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models
Paper • 2404.07738 • Published • 2 -
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
Paper • 2405.01535 • Published • 124
Papers - Image - FNO - Low and High Frequency Data
Papers - Image - FNO - SpecBoost Ensemble
Papers - Image - Spectral Analysis
Papers - Rag - Multiple Documents in Parallel
Papers - Tokens - Real-Valued Positioning
Papers - Models - Griffin - RecurrentGemma
Papers - Fine-tuning - ControlNet
Papers - Reward Model - Consistency Loss - ControlNet
Papers - Qwen - Audio
Papers - Image - Auto - Lane Detection
Papers - Operating Systems
Papers - Benchmarks - Agent - Multimodal - Tasks
Papers - University - Princeton University
-
JetMoE: Reaching Llama2 Performance with 0.1M Dollars
Paper • 2404.07413 • Published • 38 -
Allowing humans to interactively guide machines where to look does not always improve a human-AI team's classification accuracy
Paper • 2404.05238 • Published • 3 -
Cognitive Architectures for Language Agents
Paper • 2309.02427 • Published • 8 -
Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings
Paper • 2305.13571 • Published • 2
Papers - Attention - Mixture of Attention Heads (MoA)
Generalized multi-head attention using RoPE
Papers - Image - Generator - Gaussian Noise - Bounding Boxes
Papers - Image - Object Detection - Bounding Boxes
Datasets - Image - Coco - Obj Det, Segmentation, Captions
Models - Image - DPT - Dino
Papers - Image - TrOCR
Read by Bark: https://drive.google.com/file/d/1apmyvLMEQ97ObHKzQna9URFHF0Xg-EsO/view?usp=sharing
Models - Mistral
Models - Image - Dino
Models - Agent - On-Device
Papers - Chain of Thoughts - Visualization
Papers - Benchmarks - Documentation
Papers - AutoDesk
Papers - University of Shenzhen
Papers - Image - Knowledge Graph
Papers - Knowledge Graph - Tasks
Papers - University of Xiamen
Papers - Fine-tuning - Math
-
Rho-1: Not All Tokens Are What You Need
Paper • 2404.07965 • Published • 93 -
Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems
Paper • 2408.16293 • Published • 27 -
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper • 2402.03300 • Published • 129
Papers - Image - VQA
-
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
Paper • 2404.07973 • Published • 32 -
RegionGPT: Towards Region Understanding Vision Language Model
Paper • 2403.02330 • Published • 2 -
TextSquare: Scaling up Text-Centric Visual Instruction Tuning
Paper • 2404.12803 • Published • 30 -
Pegasus-v1 Technical Report
Paper • 2404.14687 • Published • 33
Papers - University - University of Santa Barbara
Papers - Image - VQA - Ferret
Papers - Image - Referring Object Classification (ROC)
The model is tasked with identifying the object in a region mentioned in a query. We utilize the validation split of the LVIS dataset.
Papers - Image - Grounding
Papers - Image - Captioning
Papers - Documents - Fine-tuning - LayoutLM and UDOP
Papers - Documents - Scientific Charts
Papers - Image - Fine-tuning - ICPR22 dataset
Papers - Image - Fine-tuning - DeGruyter dataset
Papers - Embeddings - Image
Papers - LayoutLM - Fine-tuning - Word Patch Alignment
Papers - Fine-tuning - Hyperparameter - FUNSD
Papers - Timeseries
Papers - Image - Report - Training - CNN RNN LSTM MLP
Papers - Image - Climate - SHAP
Papers - Image - Climate - ERA5
Papers - Image - Mask - box-kMaX over kMaX-DeepLab
Papers - Image - Coco - Panoptic
Papers - NeRF - Training - Photometric Consistency Patches
Papers - Image - Datasets - TanksAndTemples
Papers - Image - Evaluation Metrics - PSNR SSIM LPIPS
Papers - University of Alberta
Papers - Explainability - Image - VQA
Spaces - Chat - QA - Research Papers on Arxiv read by Claude
Audio Reading - 2403.07691 - ORPO Fine-tuning
Read by Bark: https://drive.google.com/file/d/1no3kjSmexQxlS-KjhRB0jB5hz72Yuhsb/view?usp=sharing
Audio Reading - 2404.06209 - Elephants Never Forget
Read by Bark: https://drive.google.com/file/d/13IlbhKh71vxLpdYJ6mkIiiJZOUsf7XFv/view?usp=sharing
Models - Reasoning
-
mlabonne/AlphaMonarch-7B
Text Generation • 7B • Updated • 14.4k • • 148 -
Qwen/QwQ-32B-Preview
Text Generation • 33B • Updated • 25.2k • • 1.74k -
deepseek-ai/DeepSeek-R1-Distill-Llama-8B
Text Generation • 8B • Updated • 573k • • 816 -
deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
Text Generation • 8B • Updated • 632k • • 734
Datasets - Audio - Multilingual
Spaces - Audio - TTS
Datasets - Benchmark - Tasks
Datasets - Chat - Persuasion
Papers - Training - Curriculum Learning
Papers - Training - Curriculum Instruction Tuning
Papers - Training - AI2 Reasoning
Papers - Training - Multilingual - Out of Vocabulary
Papers - Training - Report - LTSM vs LLM vs Ensemble
Papers - University of Seoul National
Papers - University - National University of Singapore
-
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization
Paper • 2404.09956 • Published • 12 -
Contrastive Chain-of-Thought Prompting
Paper • 2311.09277 • Published • 36 -
FlashSpeech: Efficient Zero-Shot Speech Synthesis
Paper • 2404.14700 • Published • 32 -
AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising
Paper • 2406.06911 • Published • 12
Papers - Audio - Fine-tuning - DPO
Papers - Audio - Clap
We use an ensemble filtering strategy based on two different CLAP models: 630k-audioset-best and 630k-best
Papers - Audio - Frechet Audio Distance (FAD) like FID
Papers - University of Southern California
Papers - Multimodal - Long Context - Megalodon
Papers - Multimodal - Speculative Decoding
Papers - Qualcomm
Papers - Dataset Grooming - Report
Papers - Image - Hyperspectral Images (HSI)
Papers - Healthcare - Image - Cancer
Papers - Agent - Research
-
ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models
Paper • 2404.07738 • Published • 2 -
Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations
Paper • 2404.17521 • Published • 13 -
LEGENT: Open Platform for Embodied Agents
Paper • 2404.18243 • Published • 22 -
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B
Paper • 2406.07394 • Published • 29
Papers - Fine-tuning - DPO - KL Divergence vs Learning Rates
Papers - Embeddings - Scalable Positional Encodings
Papers - Image - Layer Pruning
Papers - Inference - Image - Layer Pruning
Audio Reading - 2404.08011 - Review Handwriting Recognition
Read by Bark: https://drive.google.com/file/d/1yCc6rr199rQHHNwKozHhqzz0Rr48z03A/view?usp=sharing (duration: 1 hour 11 min 47 s)
Papers - Pre-training - Pegasus
Papers - Pre-training - Text - Masked Language Models (MLM)
Papers - Pre-training - Warm-Start - Encoders - BPE
Papers - Pre-training - Summarization
Papers - Pre-training - Encoders - Roberta
Papers - Pre-training - Unsupervised
Models - Fintech - Financial Summarization
Datasets - Image - Multilingual - VQA
Papers - Inference
Models - Encoders - Multimodal - Clip - SigLIP
Better loss function: the sigmoid loss operates solely on image-text pairs and does not require a global view of the pairwise similarities.
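A minimal sketch of a pairwise sigmoid loss of the kind the note above describes, written in plain Python for readability. The helper names, the toy unit-length embeddings, and the `temperature` and `bias` defaults are illustrative assumptions, not the reference SigLIP implementation.

```python
import math


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def sigmoid_pair_loss(img, txt, temperature=10.0, bias=-10.0):
    """Average, over images, of independent binary losses on every pair.

    Matching pairs (i == j) are labeled +1 and all other pairs -1. Each term
    depends on a single image-text pair, so no softmax normalization over the
    whole similarity matrix is needed.
    """
    n = len(img)
    total = 0.0
    for i in range(n):
        for j in range(n):
            sim = sum(a * b for a, b in zip(img[i], txt[j]))
            label = 1.0 if i == j else -1.0
            total -= log_sigmoid(label * (temperature * sim + bias))
    return total / n


# Toy unit-length embeddings (hypothetical values).
images = [[1.0, 0.0], [0.0, 1.0]]
aligned_texts = [[1.0, 0.0], [0.0, 1.0]]
shuffled_texts = [[0.0, 1.0], [1.0, 0.0]]

aligned_loss = sigmoid_pair_loss(images, aligned_texts)
shuffled_loss = sigmoid_pair_loss(images, shuffled_texts)
print(aligned_loss < shuffled_loss)
```

Because every pair contributes its own term, the double loop can be split across devices without first gathering all similarities, which is the practical upside of dropping the global view.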
Spaces - Multimodal - Image and Chat
Papers - Audio - Activation - Snake
Papers - Audio - RoPE
Papers - Audio - Embedding - Text - Clap - Cross Attention
Papers - Audio - Encoders - Clap - HTSAT audio RoBERTa text
Papers - Audio - Encoders - Clap - Training - Metadata
Papers - Audio - Encoders - Laion-Clap
Found 5566 memorized, repeated audio sequences
Papers - World Sim - Agent - Tasks
Papers - Video Games
Papers - Video Games - Crafting
Papers - Video Games - Navigation
Papers - Video Games - Farming
Papers - World Sim - Encoder - Image - Sparc
Papers - World Sim - OCR
Papers - World Sim - Cognitive Architectures
Papers - Video - Encoders - C-ViViT
The embeddings of images and video patches from raw frames x are processed by a spatial and then a causal transformer (autoregressive in time) to generate video tokens.
Papers - Embeddings - Text - T5X
Papers - JAX
Papers - Training - GNN
Papers - GNN - Fine-tuning
Papers - GNN - Benchmark - Polaris
Papers - Hybrid Arch - Skip Connections
Papers - GNN - Encoders
Papers - GNN - Fine-tuning - Custom Layer - MLP
Papers - Healthcare - Molecules - GNN
Papers - Healthcare - GNN
Papers - Healthcare - Drug Discovery
Papers - Valence Labs
Papers - University - University of Toronto
Papers - Healthcare - Image - X-ray
Papers - Healthcare - Image - Lung Disease
Papers - XAI - Gradient Weighted Class Activation Mapping
Grad-CAM
Papers - XAI - Fine-tuning
Papers - Healthcare - Image - Covid-19
Papers - Inference - Batch - Hierarchical Sharing Pattern
Papers - Attention - Sliding Window
Papers - Training - 3D Parallelism - Forward - All-Gather
Papers - Training Research - Model FLOPs Utilization (MFU)
Papers - Custom Layers - Decoders - No FFN
Papers - Equall AI
Datasets - Fine-tuning - Orpo
Papers - Emergent Properties - Multiple Choice Grade
Papers - Emergent Properties - Image
Papers - Attention - Mixture-of-Attention (MoA)
Papers - Benchmarks - Safety
Papers - Reward Model - Fine-tuning
Papers - Reward Model - Cross-Lingual
We propose to perform reward optimization using an RM trained for a different language, assuming model generation quality transfers cross-lingually.
Papers - Datasets - Multilingual - OpenAssistant
Multilingual, pairwise human-rated chat transcripts.
For the SFT data, we use the human-preferred response in each pair to finetune the model
Papers - Speculative Decoding - KV Cache
We recognize two memory bottlenecks: model weights and KV cache. The latter gradually becomes the bottleneck as context length increases.
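A rough back-of-the-envelope sketch of why the KV cache overtakes the weights as context grows. The model shape below (32 layers, 32 KV heads, head dimension 128, 7 billion parameters, 16-bit values) is a hypothetical configuration chosen for illustration and is not taken from the collected papers.

```python
def kv_cache_bytes(seq_len, layers=32, kv_heads=32, head_dim=128,
                   bytes_per_value=2, batch_size=1):
    """Bytes needed to cache keys and values for every layer and position."""
    # Factor 2 covers the separate key and value tensors.
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    return per_token * seq_len * batch_size


# Hypothetical 7B-parameter model stored in 16-bit precision.
weight_bytes = 7_000_000_000 * 2

GIB = 1024 ** 3
for seq_len in (4096, 32768, 131072):
    cache = kv_cache_bytes(seq_len)
    print(seq_len, round(cache / GIB, 2), "GiB KV cache;",
          "exceeds weights" if cache > weight_bytes else "below weights")
```

The weights are a fixed cost, while the cache grows linearly with context length and batch size, so under these assumptions the crossover happens somewhere between the 4096 and 32k settings.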
Papers - Inference - Speculative Decoding - Draft - KV Cache
Papers - Speculative Decoding - Long Context
Models - Speculative Decoding - Draft - Base Model
Papers - Speculative Decoding - Token Tree Verification
Papers - TensorRT-LLM - FasterTransformer - deprecated
Papers - Tokenizers - tiktoken
Papers - Animation - Text - Kinetic Typography
Papers - Image - LPIPS
-
Dynamic Typography: Bringing Words to Life
Paper • 2404.11614 • Published • 45 -
Scene Coordinate Reconstruction: Posing of Image Collections via Incremental Learning of a Relocalizer
Paper • 2404.14351 • Published • 6 -
BlenderAlchemy: Editing 3D Graphics with Vision-Language Models
Paper • 2404.17672 • Published • 19 -
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
Paper • 2406.06525 • Published • 71
Models - Fine-tuning - Orpo
Papers - Nota
Papers - Training - 3D - NeRF
Papers - Training - Self-Improvement
Datasets - Benchmarks - Image - QA - Real World Objects
Papers - Benchmarks - Image - Visual Commonsense
Datasets - Benchmarks - Image - QA
Papers - Context - NoPE
Papers - University - East China Normal University
Papers - Context - Length Generalization
Papers - Attention - Training - Context - Head-based Scaling
Papers - Datasets - Training - Context - SlimPajama
Papers - Datasets - Training - Context - Starcoderdata
Papers - Training - Eval - Sliding Window - Proof-pile
Papers - Transformers Without Positional Encoding - NoPE
-
Length Generalization of Causal Transformers without Position Encoding
Paper • 2404.12224 • Published • 1 -
Transformer Language Models without Positional Encodings Still Learn Positional Information
Paper • 2203.16634 • Published • 5 -
Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings
Paper • 2305.13571 • Published • 2 -
The Impact of Positional Encoding on Length Generalization in Transformers
Paper • 2305.19466 • Published • 2
Papers - IBM
Papers - Attention - Multi-Head Attention (MHA)
-
Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings
Paper • 2305.13571 • Published • 2 -
Transformers Can Represent n-gram Language Models
Paper • 2404.14994 • Published • 21 -
Are Sixteen Heads Really Better than One?
Paper • 1905.10650 • Published • 2 -
Reasoning in Large Language Models: A Geometric Perspective
Paper • 2407.02678 • Published • 1
Papers - Text - Encoders - Bert
-
Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings
Paper • 2305.13571 • Published • 2 -
BERTs are Generative In-Context Learners
Paper • 2406.04823 • Published • 1 -
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
Paper • 2412.13663 • Published • 156 -
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
Paper • 2006.11477 • Published • 7
Papers - Embeddings - Absolute Position Embedding (APE)
Papers - Encodings - Rotary - RoPE
-
The Impact of Positional Encoding on Length Generalization in Transformers
Paper • 2305.19466 • Published • 2 -
Qwen2 Technical Report
Paper • 2407.10671 • Published • 166 -
Round and Round We Go! What makes Rotary Positional Encodings useful?
Paper • 2410.06205 • Published • 2 -
ThunderKittens: Simple, Fast, and Adorable AI Kernels
Paper • 2410.20399 • Published • 2
Papers - Embeddings - T5 Relative Bias
Papers - Text - Classification - FastFit
Papers - Text - Datasets - Classification and Labels
Papers - Weather
Papers - Datasets - Weather - ERA5
Papers - University of Aarhus
Datasets - Coding - Code Reviews
Datasets - Text - Web
Datasets - Text - QA - Web
Papers - Image - Graph - Understanding
Papers - Image - Glip
Core techniques: 1) unified grounding loss 2) language-aware deep fusion 3) pre-training with both types of data.
Papers - International Digital Economy Academy (IDEA)
Papers - Image - Bounding Box - Coco - Teacher and Student
Models - Image - GLIGEN
Papers - Image - UMAP
Papers - University - University of Padua
Papers - Text - Named Entity Recognition (NER)
Papers - Text - Eval - SMOTE
Papers - Text - Remove Redaction - Countermeasures
Papers - FDM Business Services
Papers - Inference - Scheduled Sampling
Improved visual quality, as the rough concept location and outline are decided in the early stages, followed by fine-grained details in later stages.
Papers - Image - Inpainting
Papers - 3DGS - Structure from Motion
Papers - SQL - Knowledge Graphs
Papers - SQL - Curriculum Learning
Papers - University - Simon Fraser University
Papers - Coding - Git Commits
Papers - 3DGS - Material Point Method (MPM)
Papers - Video - Simulated Material Dynamics - MLS-MPM
Papers - University - Huazhong University
Papers - Text - Mobile
Papers - Kunlun
Papers - Multimodal - XAI
Papers - XAI - Research in Appendix
Papers - Llama 3
-
How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study
Paper • 2404.14047 • Published • 45 -
Reasoning in Large Language Models: A Geometric Perspective
Paper • 2407.02678 • Published • 1 -
Natural Language Reinforcement Learning
Paper • 2411.14251 • Published • 31 -
Byte Latent Transformer: Patches Scale Better Than Tokens
Paper • 2412.09871 • Published • 108
Papers - Llama 3 - Fine-tuning
-
How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study
Paper • 2404.14047 • Published • 45 -
LiteSearch: Efficacious Tree Search for LLM
Paper • 2407.00320 • Published • 40 -
Cut Your Losses in Large-Vocabulary Language Models
Paper • 2411.09009 • Published • 49 -
LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models
Paper • 2411.09595 • Published • 77
Papers - Image - NeRF - Structure from Motion (SfM)
Papers - Benchmarks - Fintech
Papers - Fintech - Datasets - SEC - Edgar Filings DB - N-CEN
Papers - JP Morgan Chase
Papers - KL Regularization - Diffusion Matching Distillation
Papers - Prompts - Security - Instruction Prioritization
Papers - Image - Adaptive Concept Normalization (ACN)
Papers - Image - Synthetic Generator - Canny
Datasets - Image - Classification
Papers - Image - Datasets - MNIST
Papers - Pre-training - Layer Initialization
Papers - Image - Datasets - ImageNet
-
All you need is a good init
Paper • 1511.06422 • Published • 1 -
Align Your Steps: Optimizing Sampling Schedules in Diffusion Models
Paper • 2404.14507 • Published • 23 -
Efficient Transformer Encoders for Mask2Former-style models
Paper • 2404.15244 • Published • 1 -
Deep Residual Learning for Image Recognition
Paper • 1512.03385 • Published • 8
Papers - Pre-training - Weight Initialization
Models - Phi-3
Papers - Audio - Attention - FlashSpeech
Papers - Cohere
Papers - Training - KL-divergence Upper bound (KLUB)
Papers - Audio - Latent Consistency Model (LCM)
Papers - Audio - Prosody Generator
Papers - MSRA
Papers - University - Beijing University
Papers - OLMo
Papers - Fine-tuning - Dataset - Instruct - UltraFeedback
Papers - Fine-tuning - DoRA
Papers - Fine-tuning - Text - Bottleneck - RMSNorm
Papers - Training Research - Flash Memory - DRAM
Papers - Attention - Hard Attention
Papers - Training - Early Exit - Gating Network
Encoder - Weighted Stochastic Depth
Papers - Image - Segmentation - Cost vs Quality - Gating Net
Papers - NEC Laboratories
Papers - Custom Layers - No Dropout - Batch Normalization
Papers - Pre-training - Batch Normalization
Papers - Healthcare - DNA
Papers - University - University of Massachusetts
Papers - Fine-tuning - Transfer Learning - Cross-Lingual
Papers - Fine-tuning - Part of Speech (POS)
Papers - Cross-lingual
Datasets - Text - Multilingual - Catalan, Spanish, English
Datasets - Text - Web, Medical Journals
Spaces - Image - Clothing
Papers - Knowledge Graphs - Construction and Validation
Papers - Documents - Knowledge Graphs
Papers - Prompt - Knowledge Graphs
Papers - Knowledge Graphs - Validation - Pydantic
Papers - Knowledge Graphs - Llama 2
See Appendix A.2
Papers - Training - Contrastive Loss - CatLIP
Papers - Image - Pre-training - Transfer Learning
Papers - Pre-training - Continual - Expert Onboarding
Papers - Training - Image - MoE - Clip
Papers - Embeddings - Clustering
Papers - Embeddings - Text - SimCSE
Papers - Pre-training - MoE - Flexible Expert Ensembles
Papers - Pre-training - MoE - Continual Learning
Papers - Inference - MoE - Routing with Task Metadata
Papers - Image - Datasets - Flickr
Papers - MoE - Image - MoDE
Papers - Image - Datasets - LAION
-
MoDE: CLIP Data Experts via Clustering
Paper • 2404.16030 • Published • 15 -
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
Paper • 2406.06525 • Published • 71 -
Data curation via joint example selection further accelerates multimodal learning
Paper • 2406.17711 • Published • 3 -
Unveiling Encoder-Free Vision-Language Models
Paper • 2406.11832 • Published • 54
Papers - Attention - BASS
Papers - 3D - NeRF
Papers - 3D - Interactive
Papers - University - The Chinese University of Hong Kong
-
Interactive3D: Create What You Want by Interactive 3D Generation
Paper • 2404.16510 • Published • 21 -
SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension
Paper • 2404.16790 • Published • 10 -
A Thorough Examination of Decoding Methods in the Era of LLMs
Paper • 2402.06925 • Published • 1 -
LLaVA-OneVision: Easy Visual Task Transfer
Paper • 2408.03326 • Published • 60
Papers - Benchmarks - Multimodal
-
SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension
Paper • 2404.16790 • Published • 10 -
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos
Paper • 2406.08407 • Published • 28 -
GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI
Paper • 2408.03361 • Published • 85
Papers - Multimodal - Benchmarks - Report
Papers - Inference - Early Exit
Papers - Prompts - Adversarial
Papers - Agent - Robotics
Papers - Healthcare - Fine-tuning
Papers - Chain of Reasoning (CoR)
Papers - Inference - Uncertainty-Guided Search
Papers - Healthcare - Multimodal
Papers - Gemini
Papers - Healthcare - Surgery - VQA
Papers - Healthcare - Benchmarks - Text - NEJM
Papers - Healthcare - Benchmarks - Long Context - MIMIC-III
Papers - Healthcare - Benchmarks - Video - Cholec80
Papers - Healthcare - Report
Papers - 3D - Garment
Papers - Training - Multi-Model Evaluation - PoLL
Papers - Prompts - Training - Evaluation - Multi-Hop QA
Papers - Agent - Fine-tuning
Papers - Blender
Papers - 3D - Mesh Editing
Papers - 3D - Lighting
Papers - World Sim - VQA
Papers - University - Central South University
Papers - Image - Multi-Model Evaluation
Papers - Image - Annotation Pipeline
Papers - 3DGS - Test - Dataset - RealEstate10k
Papers - Image - Detailed Multi-Object Generation
Papers - 3DGS - Point Cloud - COLMAP
Papers - 3DGS - Test - PSNR
Papers - University - Imperial College London
Papers - Octopus
Papers - Alternative Layers - KAN instead of MLP
Papers - National Science Foundation (NSF)
Papers - ICL - Induction Head
-
What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation
Paper • 2404.07129 • Published • 3 -
In-context Learning and Induction Heads
Paper • 2209.11895 • Published • 2 -
pyvene: A Library for Understanding and Improving PyTorch Models via Interventions
Paper • 2403.07809 • Published • 1
Papers - ICL - Training - Activations - Clamping
See: pattern-preserving ablation
Papers - Audio - Codec - Bitrate - Low
Papers - Image - Comics
Papers - University - Nankai University
Papers - Fine-tuning - LoRA - LoRAX
Papers - Training - Datasets - Few-Shot Learning - OmniGlot
Papers - Custom Layers - No Dropout - Dropout Regularization
Papers - Ablation - Attention - Head Pruning
Causal ablations taking into account LayerNorm
Papers - Attention - Induction Heads
Papers - Training - Ablation
Papers - Training Research - Loss Dynamics - Clamping
Papers - XAI - Induction Head - Phase Change - Components
Papers - ICL - Induction Circuit - Data Dependent Learning
Papers - ICL - Induction Head - Copy vs QK Match
See Figure 6, columns B and C (classes vs. labels): subcircuit B delays the phase change with the number of classes, whereas subcircuit C delays it (dramatically) with the number of labels.
Papers - XAI - Framework - pyvene
Papers - Training - Interventions - Understanding
Papers - ICL - Training - Distributed Alignment Search
Papers - XAI - Attention - LayerNorm
Papers - Reward Model - Preference Collection Construction
Papers - Model Merging - DARE better than TIES
See Appendix E: Merging Method Ablation on MoE Mistral and instruct 7B. TIES merges degenerated, whereas DARE merges did not.
Papers - LG
Papers - Dataset Storage - Orc vs Parquet
Results for local (GPU/CPU), compression (zstd), and over-the-WAN reads (ORC over S3 beats Parquet too)
Papers - Dataset Storage - Orc
Papers - Dataset Storage - cuDF - Parquet and Orc
See Figure 19. ORC with cuDF beat Parquet with cuDF. Parquet with Arrow has dramatically more throughput when no GPUs are available.
Papers - Dataset Storage - Technical Report
Papers - BitNet
Additional paper with FAQ, code, and tips: https://github.com/microsoft/unilm/blob/master/bitnet/The-Era-of-1-bit-LLMs__Training_Tips_Code_FAQ.pdf
-
You Only Cache Once: Decoder-Decoder Architectures for Language Models
Paper • 2405.05254 • Published • 10 -
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper • 2402.17764 • Published • 625 -
BitsFusion: 1.99 bits Weight Quantization of Diffusion Model
Paper • 2406.04333 • Published • 38 -
1-bit AI Infra: Part 1.1, Fast and Lossless BitNet b1.58 Inference on CPUs
Paper • 2410.16144 • Published • 5
Papers - Fine-tuning - Rerankers
Papers - Training - Math
Papers - SSMs
Papers - Mamba - Mamba 2
Papers - Epoch AI
Papers - Training - Report - Historical Cost Estimates
Papers - Reasoning - Complex - TACT
Papers - Reasoning - Prompt - Table and Calculations
Papers - Reasoning - Datasets - TACT
Datasets - Video - Captions
Papers - Training - Piecewise Affine Multiplication
Papers - Training - Multiplication Free
Papers - Training - CNN - Binarized MNIST - Code Examples
Papers - Unsupervised - Distribution Estimation
Papers - Healthcare - Virus Detection - Classification
Appendix A.9
Papers - University - Hong Kong University
Papers - Text to Image - Encoders - Flan-T5 XL
Papers - Image - Training Metrics - SSIM
Papers - Image - Tokenizers - ViT-VQGAN
Papers - Image - Training - AutoRegressive
Papers - Image - Training - Captions created with LLaVA
Stage 2 training using LLaVA to describe the image with a caption
Papers - Image - Tokenizer - L2 Normalization
Papers - Image - Training - Loss - PatchGAN
Papers - Image - Training - Detailed Training Tables
Papers - Image - BigGAN
Papers - Image - Metrics - FID and IS
Papers - Image - Classifier - Inception v2 - JFT-300M
Papers - RL - Monte Carlo Tree Search (MCTS)
Papers - Image - Datasets - JFT-300M
Papers - Image - U-Net - Mask Augmentation
Papers - Monte Carlo Tree Search (MCTS) - Self-Refine MCTSr
Papers - University - Hong Kong Polytechnic University
Papers - Image - Region Proposal Network (RPN)
Papers - Image - Faster RCNN - Region Proposal Network (RPN)
Papers - Image - InceptionResNet
Papers - University - Drexel University
Papers - Agent - Security
Papers - Security - OWASP Testing
Papers - Image - Inference - Model Segmentation
Papers - Video - SDXL - Multi-GPU
Papers - Coding - Benchmarks - McEval
Papers - Coding - Prompts
Papers - Coding - Training - Distributed - PyTorch FSDP
Papers - Coding - Tokenizer - Visualization - t-SNE
Papers - CCSE
Papers - Coding - Classification - Categories Easy Med Hard
Papers - Coding - Fine-tuning - CodeQwen
Papers - World Sim - Video - Benchmarks - MMWorld
Papers - SSMs - Chimera
Papers - SSMs - 2D Mamba
Papers - SSMs - Time Series Anomaly Detection
Papers - Image - ControlNet
-
Adding Conditional Control to Text-to-Image Diffusion Models
Paper • 2302.05543 • Published • 57 -
Smoothed Energy Guidance: Guiding Diffusion Models with Reduced Energy Curvature of Attention
Paper • 2408.00760 • Published • 8 -
MagicQuill: An Intelligent Interactive Image Editing System
Paper • 2411.09703 • Published • 78 -
BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion
Paper • 2403.06976 • Published • 2
Models - Image - ControlNet - Training Annotator
Papers - Image - Datasets - NYUD - NYU Depth
Papers - Image - Pipeline - HED
Papers - 3DGS - Enhancement - Lighting
Papers - 3DGS - Security Camera - Image Enhancement
Papers - Training - Preference Optimization - Code Samples
Papers - Image - Augmentation - Binarization - NAF-DPM
Papers - Image - OCR - Binarization - Sauvola
Papers - Image - OCR - Binarization - D2BFormer
Papers - Image - OCR - Binarization - DocEnTr
Papers - Image - Datasets - OCR - DIBCO
Papers - Image - DPM - Diffusion Probabilistic Model
Papers - Document - Deblurring
Models - Abliterated - Refusal Direction Editing
Models - Image - Augmentation - Depth Estimation
Papers - XAI - Text - WordNet - Noun and Verb Hierarchy
Papers - Quantization
-
QLoRA: Efficient Finetuning of Quantized LLMs
Paper • 2305.14314 • Published • 56 -
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
Paper • 2407.11062 • Published • 10 -
Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models
Paper • 2407.12327 • Published • 79 -
BitNet a4.8: 4-bit Activations for 1-bit LLMs
Paper • 2411.04965 • Published • 69
Papers - 3D - Artist-Created Meshes (AMs)
Papers - Duplex Models
Papers - XAI - Confidence Regulation
Papers - Image - Charts
Papers - In-Context Learning - Concept Learning Geometry
Papers - ICL - Concept Spaces
Papers - Coding - Programming by Example
Papers - Coding - Eval - LambdaBeam Problems
Spaces - Biology - ESM - Proteins
Papers - Healthcare - Datasets - Image - PubMedVision
Papers - Image - Datasets - Biology - Arboretum
Papers - Training - Brier Score - Probabilistic Accuracy
Papers - Rag - Benchmarks
Models - Text - Multi-token Prediction
Papers - Benchmarks - Tables
Papers - Image - Region Zoom
Papers - Image - Florence 2
Papers - 3DGS - Geometry-Bound
Papers - 3DGS - Classifier-Free Guidance (CFG)
Papers - 3DGS - Text - Image - Mesh
Models - Text - Research
Papers - Image - Training - Optimization - SigLIP
Models - Text - Chemistry
Papers - Multimodal - Training - Decoder Only
Papers - Attention - Decoder Only
Papers - Multimodal - Training - LLM Guided Pre-training
Papers - Agent - Math
Papers - Text - Decoding - Truthful
Datasets - CoT - Math
Papers - XAI - Attention - MLP - Partitioning - Affine Maps
Papers - Decoders - Strategy - Beam Search - Report
Papers - Markov Decision Process
Papers - RL - GBT vs GBRL vs XGBoost
Papers - XGBoost
Papers - Decoders - Deterministic - FSD
Papers - Decoders - Deterministic
Deterministic methods with unaligned models usually perform better on all tasks except for open-ended text generation.
Papers - Decoders - Deterministic - DoLa
Papers - Decoders - Deterministic - Contrastive Search
Papers - Decoders - Stochastic - Mirostat Sampling
Papers - Decoders - Stochastic - Temperature Sampling
Papers - Decoders - Stochastic - Top-k Sampling
Papers - Benchmark - Coding - HumanEval
Papers - Text - Datasets - Translation - WMT22
Papers - Text - Benchmark - Factual Knowledge - FActScore
Papers - Fine-tuning - Math - QA
Papers - Encodings - SPE - Sinusoidal Position Encoding
Papers - Text - Reasoning - Causal Chains
Papers - Knowledge Graph - Dataset - Text - WordNet
Papers - CoT - Intermediate Thoughts
Papers - Training - Text - Continual Learning
Datasets - Text - Wiki - Embeddings - SBert
Papers - Positive Geometries - Report
Papers - ICV - PCA - Directional Alignment
Papers - Text - Datasets - Toxicity - ParaDetox
Papers - Text - Safety
Papers - Text - Personalization - ICV
Papers - Text - Role-Play - Shakespeare - Romeo and Juliet
Papers - Text - Role-Play - Ranking Responses - ChatGPT
Papers - Text - Benchmarks - Similarity - Text - ROUGE-1
Papers - ICL - Detox - ICL Fine-tuning vs In-Context Vectors
Papers - Text - Safety - Diagonal Safety for Unsafe Queries
Papers - Text - Jailbreak - ICV
Papers - Text - Datasets - AGNews
Papers - Activation Editing - ICV
Papers - ICV - Task Arithmetics
Papers - Text - Sentiment - Classification
Papers - Attention - Rescale Weights - YARN
Papers - Text - Training - Long Context
Papers - Benchmarks - Alignment - MT-Bench
Papers - Benchmarks - Long Context - Needle in a Haystack
Papers - Benchmarks - Biology
Papers - Text - Benchmarks - Reasoning - Long Context - ATC
Papers - 3DGS - Scene Editing - Day vs Night - t-SNE
Papers - 3DGS - Datasets - Photo Tourism
Papers - 3DGS - Datasets - NeRF on-the-go
Papers - 3DGS - Uncertainty - Per-Pixel Binary Mask
Papers - Ternary
Papers - Quantization - AQLM
Papers - Multimodal - Benchmarks
Papers - Text - Cognitive Science - Participation
Papers - Text - Linguistics - Precarity - Conflict - Tension
Papers - Visualizations - Non-Euclidean Structures
Papers - Visualizations - Topological, Geometric, Algebraic
Papers - Image - Segmentation - High Dimensional Objects
Papers - Visualizations - Dimensionality Reduction
-
Beyond Euclid: An Illustrated Guide to Modern Machine Learning with Geometric, Topological, and Algebraic Structures
Paper • 2407.09468 • Published • 2 -
Efficient Algorithms for t-distributed Stochastic Neighborhood Embedding
Paper • 1712.09005 • Published • 1 -
UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction
Paper • 1802.03426 • Published • 1 -
Principal subbundles for dimension reduction
Paper • 2307.03128 • Published • 1
Papers - Math - Visualizations
-
Beyond Euclid: An Illustrated Guide to Modern Machine Learning with Geometric, Topological, and Algebraic Structures
Paper • 2407.09468 • Published • 2 -
Barycentric Subspace Analysis on Manifolds
Paper • 1607.02833 • Published • 1 -
Template shape estimation: correcting an asymptotic bias
Paper • 1610.01502 • Published • 1 -
A Heat Diffusion Perspective on Geodesic Preserving Dimensionality Reduction
Paper • 2305.19043 • Published • 1
Papers - Math - Geometry - Distance - Riemannian Manifold
Papers - Math - Geometry - Riemannian Geodesic
-
Beyond Euclid: An Illustrated Guide to Modern Machine Learning with Geometric, Topological, and Algebraic Structures
Paper • 2407.09468 • Published • 2 -
Geodesic Multi-Modal Mixup for Robust Fine-Tuning
Paper • 2203.03897 • Published • 1 -
A Heat Diffusion Perspective on Geodesic Preserving Dimensionality Reduction
Paper • 2305.19043 • Published • 1 -
A micro Lie theory for state estimation in robotics
Paper • 1812.01537 • Published • 1
Papers - Coding - Science
Papers - Math - Algebra - Algebraic Transformations
Papers - Coding - Verilog
Papers - Coding - Agentic - Summarization - Prompting
Papers - Math - Structures - Topology, Geometry and Algebra
Models - Attention - GQA
Papers - Math - Algebra - Lie Group - SO(3)
Papers - Math - Structures in Data
Papers - Healthcare - Medical Assistant - Diagnosis
Papers - Topological Deep Learning - Structures in Data
Papers - Training - Research - Data as Signals
Papers - Attention - Algebra SE(d) - Fourier Nonlinearities
Papers - Math - Fourier Components - Fourier Space
Models - Embedding - Text - BGE M3
Models - Text - Fine-tuning - SPPO - Reranker
Papers - MLP
Papers - Multilingual - Greek
Papers - Multilingual - Hebrew
Papers - Math - Non-Euclidean Spaces - Domain and Codomain
Papers - Math - Visualization - Non-Linear - t-SNE
2008 paper: https://www.jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf
Models - Bitnet - Frankenmerge
Papers - Non-Euclidean - Sphere - Frechet Mean - Geodesic
Papers - Math - PCA - Barycentric Subspace Analysis (BSA)
Papers - Math - Manifold - Metric Space - Quotient Space
Papers - Image - Rectified Flow Transformers
Papers - Fine-tuning - LlamaFactory
Papers - Multimodal - Storytelling
Papers - Netflix
Papers - Audio - Segmentation - Cinematic Music
Papers - NTU
Papers - NEML - Manifold - Tangent Space - Exponential Map
Papers - Math - Non-Euclidean Machine Learning (NEML)
See also: https://dawn.cs.stanford.edu/2019/10/10/noneuclidean/
Papers - NEML - Preprocessing - Topological Data Analysis
Papers - NEML - Preprocessing - Algebra - Group Learning
Papers - NEML - Transform - Euclidean to Manifold
Papers - NEML - Manifold - Bayesian - Kernel Regression
Papers - Benchmark - Distractions
Papers - Attention - Topology
Papers - NEML - Manifolds Geometric - Polynomial Regression
Papers - NEML - Frechet Regression - Geodesic Regression
Papers - NEML - Regression - Stochastic - Non-Geodesic
Papers - NEML - Bayesian - Non-Parametric - Gaussian Process
Papers - NEML - Regression - Local Extrinsic
Papers - NEML - Manifold IO - Banerjee Kernel Regression
Papers - NEML - Geometric Structures - Dim Reduction - UMAP
Papers - NEML - Geometric - Dim Reduction - Barycentric Subs
Papers - NEML - Linear - Embeddings - Tangent Space PCA
Papers - NEML - Manifolds - VAE
Papers - NEML - Hyperbolic - Frechet Mean - Poincare
Papers - NEML - Poincare Ball
Papers - Science - Discovery
Papers - NEML - Euclid Latents - Nongeodesic Sub Man - VAE
Papers - NEML - Manifold Latents - Lie Group Latent Space
Papers - NEML - Manifold - Nonparametric Decoder - GPLVM
Papers - NEDL - Model Layer - Euclidean - MLP
Papers - NEDL - Layer - Log Perceptron - Riemannian Log Map
The manifold needs to be known for this layer to be implemented, and manifolds whose Log enjoys an analytical expression are preferred.
Papers - NEDL - Benchmarks - Topology Deep Learning (TDL)
Papers - NEDL - Attention - Equivariance - LieTransformer
Euclidean signal on manifold domain for all inputs / outputs, with domain group action.
Papers - Monte-Carlo Tree Search - MCTS
Papers - Function Calling - LLM Compiler - Parallel
Papers - Video Games - Image - Understanding - QA
Papers - Multimodal - Blip-3
Papers - Math - Polynomial Symmetry - Galois Theory
Papers - NEDL - Topological Deep Learning (TDL)
-
Architectures of Topological Deep Learning: A Survey on Topological Neural Networks
Paper • 2304.10031 • Published • 3 -
Adaptive Topological Feature via Persistent Homology: Filtration Learning for Point Clouds
Paper • 2307.09259 • Published • 1 -
Persistent homology of the cosmic web. I: Hierarchical topology in ΛCDM cosmologies
Paper • 2011.12851 • Published • 1
Papers - Security - Benchmark
Papers - Music - Training - Performer - Finger Location
Papers - Music - Training - Annotation - Piano
Papers - NEDL - Equivariant Transformers
Papers - NEDL - Hyperbolic Rotation
Papers - NEDL - Dim Redct - Principal Geodesics Analysis PGA
Papers - Normalization - NLP - Power vs Batch
Papers - Normalization - Embedding Layer - SVD
Papers - Training - Initialization - Regularization - Fixup
Papers - ResNet - Activation - nonlinear ReLU
Papers - Training - Regularization - MixUp Regularizer
-
Fixup Initialization: Residual Learning Without Normalization
Paper • 1901.09321 • Published • 1 -
RegMixup: Mixup as a Regularizer Can Surprisingly Improve Accuracy and Out Distribution Robustness
Paper • 2206.14502 • Published • 1 -
MixUp as Locally Linear Out-Of-Manifold Regularization
Paper • 1809.02499 • Published • 1
Papers - NEDL - Topology - Attention - Set Transformer
Papers - NEDL - Topology - Attention - Geodesic Transformer
Papers - NEDL - Topology - Attention - SE(3) Transformer
Papers - Text - Controllable Text Generation (CTG)
Papers - Training - Unlearning
Papers - MoE - Jamba
Spaces - Image - Segmentation
Spaces - Multimodal - Image Generation - Text and Image
Papers - Multimodal - Alignment Correspondence Policy
Models - Image - SDXL
Papers - Training - Loss - Multiple Loss - Jacobian Descent
Papers - NEDL - Topology - Lifting Topological Domains
Papers - Text - Metric - Hamming
Papers - Reasoning - Code Training
Papers - Coding - Agent - Arch - Multi-Turn Learning
Models - Training - Reinforcement Learning - Reasoning
Papers - Fine-tuning - GRPO
Papers - Training - Pipeline - Zero Bubble Rate - 1F1B
Papers - Training - Distributed Pipelines - Parallel
Papers - Tokenizers - World Sim
Papers - Training - MoE - Expert Choice
Papers - Training - MoE
Papers - Attention - Gating - Input - Silu Non-linearity
Papers - Custom Layers - Memory - Index
Papers - Coding - Multilingual
Papers - Coding - Bug Fixing
Papers - Coding - Understanding - Masking - Cloze Test - CT
Papers - Coding - Understanding
Papers - Coding - Datasets - POJ-104
Papers - Coding - GGNN
Papers - Coding - Classification - Instruction - inst2vec
Papers - Coding - Compiler Optimization - IR
Papers - Coding - Compilers - IR - Call Flow
Papers - Coding - Embeddings - Compiler - IR2Vec
Papers - Coding - Rag - Compiler IR
Papers - Coding - LLVM
Papers - Coding - GNNs
Papers - Diverse Intelligence
Papers - NEDL - Research - Structures - Coding - Sorting
Papers - Coding - Rust - Traits
Papers - Coding - Split Trees
Papers - Coding - Safe Rust
Papers - Coding - Translate - C to Rust - Repo - HACL
Papers - Coding - Rust - Memory
Papers - Coding - Translation - C++ to Rust
Models - Text - Chat - Research Papers - Arxiv
Papers - Embeddings - n-gram Hash - Vocabulary
Papers - Multilingual - Encoders - Bytes
Papers - Text - Dataset - Classification - Multitask - MMLU
Papers - Text - Dataset - Coding - MBPP
Papers - Embeddings - Bytes - BPB - Larger Patches than BPE
Papers - Embeddings - Bytes - Tokenizer Free
Papers - Text - Character Level Transformers
Papers - Training - Bytes - Lookup - Rolling Poly Hashing
Papers - Training - Scaling - Compute Optimal
Papers - Embeddings - Bytes - BPB - Tokenzr Free Perplexity
Papers - Training - Embeddings Model - Bytes - Entropy Model
Papers - Attention - Bytes - MHA Cross Attention - Perceiver
Papers - Attention - Block Causal
Papers - Tokenizers - Bytes - Entropy Patching - Threshold
Helps with finding the end of the byte patch
Papers - Tokenizers - Bytes - Patches - Space Detection
Papers - Tokenizers - Bytes - Strided Patches - MegaByte
Papers - Training Research - Bytes - No Vocabulary
Papers - Audio - Contrastive Task - Quantized - Speech
Papers - Audio - Training - Masking - Time Steps
Papers - Audio - Training - Self-Supervised - Unlabeled Data
Papers - Audio - Fine-tuning - Metric - WER
Papers - Audio - Dataset - Phoneme Recognition - TIMIT
Papers - Audio - Dataset - Librispeech
Papers - Audio - Training - Activation - Gumbel Softmax
Papers - Audio - Training - Loss - CTC
Models - Image - ViT
Papers - Tokenizer - Qwen
Papers - Training - Activation Function - SwiGLU
Papers - Training - Algorithm - SGD vs Adam vs Prodigy
Papers - Training - CNN
Papers - Training - LR - Optimizer - SGD-Sal
Papers - Pretraining - Image - ViT
Datasets - Text - E2E
Papers - Training - SGD - Decoupled Weight Decay
Papers - Training - LR - Gradient Local Gain - Variance
Papers - Training - Layer Initialization
Papers - Training - Adam
Papers - Training - Dataset Selection - Spectrogram Features
Papers - Training - Feature Extraction - Frequency - STFT
Papers - Attention - Spectrogram - KV Cache
Papers - Training - Midtraining - Context Length
Papers - Text - Training - Mixture
Papers - Training - Eval - Out of Distribution
Papers - Pretraining - Synthetic Data - Problem Solving
Papers - Fine-tuning - DPO - Pivotal Token Search
Models - Image - Sketch - Pencil
Papers - Encoders - Bytes - More Depth than Decoder
Papers - Training - Bytes - No Tokenizer
-
ByT5: Towards a token-free future with pre-trained byte-to-byte models
Paper • 2105.13626 • Published • 3 -
Byte Latent Transformer: Patches Scale Better Than Tokens
Paper • 2412.09871 • Published • 108 -
MrT5: Dynamic Token Merging for Efficient Byte-level Language Models
Paper • 2410.20771 • Published • 3
Papers - Reinforcement Learning - Video Games
Papers - Training - Speed - Reduced Training Time
Papers - CoT - Latent Search Tree
Papers - 3D - SLAT - Structure Latents
Papers - Training - Sparse Learning - k-Sparse Autoencoder
Models - Text - SAE
Papers - Robotics
Papers - Math - Differential Geometry
Papers - Training - Convergence - Gaussian Kernel
Papers - Training - Convergence - SoftMax vs SGD
Papers - Training - Convergence - Kernel - Gaussian
Papers - Image - Normalizing Flows
Papers - Image - Diffusion - SBDM
Papers - Image - Diffusion Coefficient - Fokker-Planck
Papers - Image - Diffusion Coefficient - Stochastic
Papers - Image - Training - Sampler - ODE
Papers - Training - Gaussian Mixtures - Bridging
Papers - Video - Generator - Multiple Views
Papers - SAT Solver - GNN
Papers - Training - Supervised - Classification
Papers - SAT Solver - NeuroSAT
Spaces - Image - Generation - High Res - Wide
Papers - Attention - GPU Programming - Kernel - Cuda
Papers - ICL - Text - Prompts - Learning Unique Words
Papers - RL - Text - Prompts - ASCII ART - Game Board
Papers - RL - PC Board Games - Chess - TicTacToe
Papers - RL - Natural Language
Papers - Math - Distance - Spearman Correlation
Papers - Math - Distance - Chebyshev Polynomials
Papers - NEDL - Embedding - Potential Distance - PHATE
Papers - NEDL - Geodesic Symmetry - Harnack Inequality
Papers - Biology - Dataset - RNA - Swiss Roll
Papers - NEDL - Embedding - Heat Geodesic
Papers - Training - Scaling - Influence Functions
Papers - Inference - CPU - Apple
Papers - Inference - CPU - Intel vs Apple - BitNet
Papers - Fine-tuning - Multimodal - Contrastive Learning
Papers - NEDL - Fine-tuning - Embedding Shift
Papers - Image - Datasets - SIMAT
Papers - NEDL - Geodesic
Papers - Training - PCA - Kernel
Papers - Training - Gradient Descent - Kernel
Papers - Training - Non-linear Learning
Papers - Text - 3D Mesh - Fine-tuning - LLaMa
Papers - Fine-tuning - Memory Reduction Techniques - Text
Papers - Mistral - NeMo - Fine-tuning
Papers - Text - Training - Gradient Filtering
Papers - Text - Training - Loss - Cuda - Triton - SRAM
Papers - Text - Training - Loss - Cut Cross Entropy
Papers - Text - Training - Large Vocabulary - CCE
Papers - Image - Fine-tuning - Dataset - Hand Drawn - DCI
Papers - Fine-tuning - Image - LLaVA
Papers - Image - Editing - BrushNet
Papers - Image - Generation Quality Models - HPS
Papers - Image - Guidance - Masked Image Guidance
Papers - Text - Bit Strings - Hamming Distance
Models - Embedding - Multimodal
Papers - Fine-tuning - Text - Embedding
Papers - Text - Embedding - Angle Optimization
Datasets - Text - GitHub Issues
Models - Text - Embedding - Sentence - German and English
Papers - Text - Embedding - Sentence
Models - Text - Embedding - Matryoshka Representation Lang
Papers - Embeddings - Text - Sentence - Matryoshka
Models - Text - Sentence Embedding
Papers - Math - Generate - Synthetic Data - CoT
Papers - Benchmarks - Math - Reasoning - GSM-Symbolic
Papers - Text - Datasets - Math - Reasoning - iGSM
Papers - Text - Training - Retry
Papers - Image - MiniCPM
Papers - Text - Embedding - Sentence - BM25
Papers - Text - Datasets - Flores-200
Papers - Inria
Papers - Healthcare - Image Segmentation
Papers - Image - Guidance - PAG - Perturbed Attention Guidan
Papers - Text - Training - Complex Vector Token Representati
Papers - Text - Encodings - Complex Vectors
Papers - Text - Embedding - Fixed Token - CBOW
Papers - Image - Fine-tuning - Clip - Self-supervision
Papers - Image - Training - Contrastive Loss - Batch Size
Papers - Image - Fine-tuning - DPO
Papers - Text - Tokens - Vocabulary - Zipfian
Papers - Text - Tokens - Vocabulary - Heaps Law
Papers - Image - Zipf
Papers - Training - Image - SLIP
Papers - Fine-tuning - Video - Video Masked Encoder
Papers - Datasets - Text to Image
Papers - Training - Self-Alignment
Papers - Healthcare - Reasoning
Papers - Quantization - BitNet
Models - Image - Autoregressive
Papers - Benchmarks - Math - VQA
Papers - Interpretability - Sparse Autoencoder (SAE)
Papers - Fine-tuning - ResNet
Papers - Training - Knowledge Distillation - World
Papers - Custom Layers - Persistent Key-Value Vectors
Models - Video Games - Gameplay
Papers - Video - MovieGen
matlok - Python Copilot Image Datasets
More extracted images on github: https://github.com/matlok-ai/python-copilot-image-and-audio-examples/tree/main/png
-
matlok/python-image-copilot-training-using-class-knowledge-graphs-2024-01-27
Viewer • Updated • 773 • 561 -
matlok/python-image-copilot-training-using-function-knowledge-graphs
Viewer • Updated • 88 • 668 -
matlok/python-image-copilot-training-using-inheritance-knowledge-graphs
Viewer • Updated • 88 • 224 -
matlok/python-image-copilot-training-using-import-knowledge-graphs
Viewer • Updated • 88 • 100
matlok - Python Copilot Audio Datasets
More extracted mp3 samples on github: https://github.com/matlok-ai/python-copilot-image-and-audio-examples/tree/main/mp3
-
matlok/python-audio-copilot-training-using-class-knowledge-graphs-2024-01-27
Viewer • Updated • 948 • 903 -
matlok/python-audio-copilot-training-using-function-knowledge-graphs
Viewer • Updated • 120 • 184 • 1 -
matlok/python-audio-copilot-training-using-inheritance-knowledge-graphs
Viewer • Updated • 120 • 115 -
matlok/python-audio-copilot-training-using-import-knowledge-graphs
Viewer • Updated • 48 • 101
How to build a Python Coding Model with Alpaca Instructions
Great article on how this works: https://towardsdatascience.com/a-beginners-guide-to-llm-fine-tuning-4bae7d4da672
Image Papers
-
Visual Instruction Tuning
Paper • 2304.08485 • Published • 20 -
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Paper • 2311.05437 • Published • 51 -
Improved Baselines with Visual Instruction Tuning
Paper • 2310.03744 • Published • 39 -
Aligning Large Multimodal Models with Factually Augmented RLHF
Paper • 2309.14525 • Published • 31
Text Instruction Papers
-
Self-Instruct: Aligning Language Model with Self Generated Instructions
Paper • 2212.10560 • Published • 9 -
Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4
Paper • 2312.16171 • Published • 37 -
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence
Paper • 2401.14196 • Published • 66 -
AlpaCare:Instruction-tuned Large Language Models for Medical Application
Paper • 2310.14558 • Published • 4
Mixture of Experts Papers
MoE
-
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
Paper • 2401.15947 • Published • 53 -
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Paper • 2401.06066 • Published • 56 -
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
Paper • 2312.07987 • Published • 41 -
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
Paper • 2101.03961 • Published • 13
Models - Coding
-
dphn/dolphin-2.6-mistral-7b-dpo-laser
Text Generation • 7B • Updated • 96 • 120 -
Evaluating Large Language Models Trained on Code
Paper • 2107.03374 • Published • 8 -
CodeBERT: A Pre-Trained Model for Programming and Natural Languages
Paper • 2002.08155 • Published • 2 -
code2seq: Generating Sequences from Structured Representations of Code
Paper • 1808.01400 • Published • 2
Transformer Arch
Check out: https://bbycroft.net/llm and http://nlp.seas.harvard.edu/2018/04/03/attention.html
-
Attention Is All You Need
Paper • 1706.03762 • Published • 91 -
ImageNet Large Scale Visual Recognition Challenge
Paper • 1409.0575 • Published • 9 -
Sequence to Sequence Learning with Neural Networks
Paper • 1409.3215 • Published • 3 -
Language Models are Few-Shot Learners
Paper • 2005.14165 • Published • 17
LoRA
-
LCM-LoRA: A Universal Stable-Diffusion Acceleration Module
Paper • 2311.05556 • Published • 87 -
MultiLoRA: Democratizing LoRA for Better Multi-Task Learning
Paper • 2311.11501 • Published • 37 -
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
Paper • 2311.03285 • Published • 32 -
LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B
Paper • 2310.20624 • Published • 13
Fine-Tuning
-
Metadata Might Make Language Models Better
Paper • 2211.10086 • Published • 4 -
Empirical Analysis of the Strengths and Weaknesses of PEFT Techniques for LLMs
Paper • 2304.14999 • Published • 2 -
PEFT for Speech: Unveiling Optimal Placement, Merging Strategies, and Ensemble Techniques
Paper • 2401.02122 • Published • 2 -
Zephyr: Direct Distillation of LM Alignment
Paper • 2310.16944 • Published • 122
Model Benchmarking
-
Spam-T5: Benchmarking Large Language Models for Few-Shot Email Spam Detection
Paper • 2304.01238 • Published • 2 -
The FinBen: An Holistic Financial Benchmark for Large Language Models
Paper • 2402.12659 • Published • 23 -
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
Paper • 2402.13249 • Published • 13 -
Evaluating Large Language Models Trained on Code
Paper • 2107.03374 • Published • 8
Gaming Reinforcement Learning
Chat datasets
Datasets - DPO
Models - Geospatial
Models - Biotech
U-Net was trained in 10 hours on an NVIDIA Titan GPU (6 GB) - 2015
-
U-Net: Convolutional Networks for Biomedical Image Segmentation
Paper • 1505.04597 • Published • 14 -
microsoft/BioGPT-Large
Text Generation • Updated • 23k • 207 -
kuleshov-group/caduceus-ps_seqlen-131k_d_model-256_n_layer-16
Fill-Mask • 7.73M • Updated • 969 • 14 -
kuleshov-group/caduceus-ps_seqlen-1k_d_model-256_n_layer-4_lr-8e-3
Fill-Mask • 1.93M • Updated • 45 • 2
Models - Video Editing
-
LAVE: LLM-Powered Agent Assistance and Language Augmentation for Video Editing
Paper • 2402.10294 • Published • 27 -
Valley: Video Assistant with Large Language model Enhanced abilitY
Paper • 2306.07207 • Published • 2 -
Video Editing via Factorized Diffusion Distillation
Paper • 2403.09334 • Published • 23
Papers - Attention
-
Linear Transformers with Learnable Kernel Functions are Better In-Context Models
Paper • 2402.10644 • Published • 81 -
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
Paper • 2305.13245 • Published • 6 -
ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition
Paper • 2402.15220 • Published • 22 -
Sequence Parallelism: Long Sequence Training from System Perspective
Paper • 2105.13120 • Published • 6
Papers - Synthetic Data
-
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows
Paper • 2402.10379 • Published • 31 -
Using Simulation and Domain Adaptation to Improve Efficiency of Deep Robotic Grasping
Paper • 1709.07857 • Published • 2 -
Simple synthetic data reduces sycophancy in large language models
Paper • 2308.03958 • Published • 22 -
Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities
Paper • 2308.12966 • Published • 11
Models - Fintech
-
FinTral: A Family of GPT-4 Level Multimodal Financial Large Language Models
Paper • 2402.10986 • Published • 81 -
BloombergGPT: A Large Language Model for Finance
Paper • 2303.17564 • Published • 26 -
GPT-InvestAR: Enhancing Stock Investment Strategies through Annual Report Analysis with Large Language Models
Paper • 2309.03079 • Published • 2 -
FinVis-GPT: A Multimodal Large Language Model for Financial Chart Analysis
Paper • 2308.01430 • Published • 2
Models - MultiAgent
Papers - NLP Research
Datasets - Synthetic - Instruct
Models - Captions
Models - Touch and Image
Datasets - Image - Text
Models - Parameter Testing
Models - Robotics
-
Using Simulation and Domain Adaptation to Improve Efficiency of Deep Robotic Grasping
Paper • 1709.07857 • Published • 2 -
Sensor-based Multi-Robot Search and Coverage with Spatial Separation in Unstructured Environments
Paper • 2403.01710 • Published • 2 -
Twisting Lids Off with Two Hands
Paper • 2403.02338 • Published • 7
Models - ReAct - Reasoning and Action
Models - Custom-Training
Exploring speculative sampling with autoregressive models, for example: https://proceedings.mlr.press/v139/song21a.html and https://proceedings.mlr.press/v119/
Papers - Testing a Coding Model
Datasets - Multimodal - Text and Images
Papers - Coding
-
CodeBERT: A Pre-Trained Model for Programming and Natural Languages
Paper • 2002.08155 • Published • 2 -
OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement
Paper • 2402.14658 • Published • 83 -
CodeFusion: A Pre-trained Diffusion Model for Code Generation
Paper • 2310.17680 • Published • 73 -
CodePlan: Repository-level Coding using LLMs and Planning
Paper • 2309.12499 • Published • 79
Datasets - Text - Multiple Choice
Models - Math
-
Orca-Math: Unlocking the potential of SLMs in Grade School Math
Paper • 2402.14830 • Published • 25 -
MathScale: Scaling Instruction Tuning for Mathematical Reasoning
Paper • 2403.02884 • Published • 17 -
meta-math/MetaMath-Mistral-7B
Text Generation • Updated • 688 • 96 -
meta-math/MetaMath-13B-V1.0
Text Generation • Updated • 371 • 13
Papers - Reasoning
-
Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models
Paper • 2402.14848 • Published • 20 -
Teaching Large Language Models to Reason with Reinforcement Learning
Paper • 2403.04642 • Published • 50 -
How Far Are We from Intelligent Visual Deductive Reasoning?
Paper • 2403.04732 • Published • 23 -
Learning to Reason and Memorize with Self-Notes
Paper • 2305.00833 • Published • 5
Papers - IoT
-
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
Paper • 2402.14905 • Published • 134 -
Sensor-based Multi-Robot Search and Coverage with Spatial Separation in Unstructured Environments
Paper • 2403.01710 • Published • 2 -
EdgeMoE: Fast On-Device Inference of MoE-based Large Language Models
Paper • 2308.14352 • Published -
Slimmable Encoders for Flexible Split DNNs in Bandwidth and Resource Constrained IoT Systems
Paper • 2306.12691 • Published • 2
Papers - Conversations
Models - Image - Geometric Algebra
Papers - Video
-
Video as the New Language for Real-World Decision Making
Paper • 2402.17139 • Published • 21 -
VideoCrafter1: Open Diffusion Models for High-Quality Video Generation
Paper • 2310.19512 • Published • 16 -
VideoMamba: State Space Model for Efficient Video Understanding
Paper • 2403.06977 • Published • 30 -
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
Paper • 2401.09047 • Published • 14
Datasets - Math
-
introspector/unimath
Updated • 1.41k • 7 -
MathQA: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms
Paper • 1905.13319 • Published • 2 -
Measuring Mathematical Problem Solving With the MATH Dataset
Paper • 2103.03874 • Published • 5 -
MathScale: Scaling Instruction Tuning for Mathematical Reasoning
Paper • 2403.02884 • Published • 17
Models - Base - 1B
Datasets - Image and Bounding Box
Papers - Sampling
-
Priority Sampling of Large Language Models for Compilers
Paper • 2402.18734 • Published • 19 -
Accelerating Large Language Model Decoding with Speculative Sampling
Paper • 2302.01318 • Published • 3 -
Fast Inference from Transformers via Speculative Decoding
Paper • 2211.17192 • Published • 9 -
AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling
Paper • 2011.09011 • Published • 2
Models - Cooking
Papers - Math - GSM8K
-
Training Verifiers to Solve Math Word Problems
Paper • 2110.14168 • Published • 4 -
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
Paper • 2309.12284 • Published • 18 -
LiteSearch: Efficacious Tree Search for LLM
Paper • 2407.00320 • Published • 40 -
DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models
Paper • 2309.03883 • Published • 35
Papers - Training Research
-
Measuring the Effects of Data Parallelism on Neural Network Training
Paper • 1811.03600 • Published • 2 -
Adafactor: Adaptive Learning Rates with Sublinear Memory Cost
Paper • 1804.04235 • Published • 2 -
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
Paper • 1905.11946 • Published • 3 -
Yi: Open Foundation Models by 01.AI
Paper • 2403.04652 • Published • 65
Papers - Reasoning - Vision
Papers - Multi-Agent
Papers - Ring Attention
-
Sequence Parallelism: Long Sequence Training from System Perspective
Paper • 2105.13120 • Published • 6 -
Ring Attention with Blockwise Transformers for Near-Infinite Context
Paper • 2310.01889 • Published • 13 -
Striped Attention: Faster Ring Attention for Causal Transformers
Paper • 2311.09431 • Published • 4 -
World Model on Million-Length Video And Language With RingAttention
Paper • 2402.08268 • Published • 40
Models - Legal
Models - Audio - Translation
Models - Image - Long Context
Papers - Speculative Decoding
-
Accelerating LLM Inference with Staged Speculative Decoding
Paper • 2308.04623 • Published • 25 -
An Emulator for Fine-Tuning Large Language Models using Small Language Models
Paper • 2310.12962 • Published • 13 -
The Curious Case of Neural Text Degeneration
Paper • 1904.09751 • Published • 3 -
On Speculative Decoding for Multimodal Large Language Models
Paper • 2404.08856 • Published • 13
Datasets - Math - Word Problems
Models - Audio - Music Generation
Datasets - Audio - Fine-tuning
Papers - Striped Attention
Models - Images - Instruct
Papers - Image - Not-using CLIP
-
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
Paper • 2403.05135 • Published • 45 -
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Paper • 2403.05525 • Published • 46 -
CoCa: Contrastive Captioners are Image-Text Foundation Models
Paper • 2205.01917 • Published • 3
Models - MoE
-
Mixtral of Experts
Paper • 2401.04088 • Published • 159 -
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
Paper • 2401.15947 • Published • 53 -
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
Paper • 2401.04081 • Published • 73 -
EdgeMoE: Fast On-Device Inference of MoE-based Large Language Models
Paper • 2308.14352 • Published
Models - MoE - IoT
Models - Mamba
Papers - MoE - Research
-
Adaptive sequential Monte Carlo by means of mixture of experts
Paper • 1108.2836 • Published • 2 -
Convergence Rates for Mixture-of-Experts
Paper • 1110.2058 • Published • 2 -
Multi-view Contrastive Learning for Entity Typing over Knowledge Graphs
Paper • 2310.12008 • Published • 2 -
Enhancing NeRF akin to Enhancing LLMs: Generalizable NeRF Transformer with Mixture-of-View-Experts
Paper • 2308.11793 • Published • 2
Papers - MoE - Training
-
Robust Mixture-of-Expert Training for Convolutional Neural Networks
Paper • 2308.10110 • Published • 2 -
Experts Weights Averaging: A New General Training Scheme for Vision Transformers
Paper • 2308.06093 • Published • 2 -
ConstitutionalExperts: Training a Mixture of Principle-based Prompts
Paper • 2403.04894 • Published • 2 -
Mixture-of-LoRAs: An Efficient Multitask Tuning for Large Language Models
Paper • 2403.03432 • Published • 1
Models - Image - MoE
Models - Image - Drone Photography
Models - MoE - Principles
Models - MoE - Visual Relationship Detection
Papers - Training with Lora
-
Mixture-of-LoRAs: An Efficient Multitask Tuning for Large Language Models
Paper • 2403.03432 • Published • 1 -
Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
Paper • 2310.20587 • Published • 18 -
MedAlpaca -- An Open-Source Collection of Medical Conversational AI Models and Training Data
Paper • 2304.08247 • Published • 2
Papers - MoE - Router
-
Turn Waste into Worth: Rectifying Top-k Router of MoE
Paper • 2402.12399 • Published • 2 -
CompeteSMoE -- Effective Training of Sparse Mixture of Experts via Competition
Paper • 2402.02526 • Published • 3 -
Buffer Overflow in Mixture of Experts
Paper • 2402.05526 • Published • 8 -
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
Paper • 2402.01739 • Published • 28
Models - MoE - Audio
Papers - MoE - Image
-
Scaling Vision with Sparse Mixture of Experts
Paper • 2106.05974 • Published • 4 -
Routers in Vision Mixture of Experts: An Empirical Study
Paper • 2401.15969 • Published • 2 -
Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts
Paper • 2206.02770 • Published • 4 -
Experts Weights Averaging: A New General Training Scheme for Vision Transformers
Paper • 2308.06093 • Published • 2
Papers - MoE - Training - Blocks
Papers - MoE - Adversary Queries
Papers - MoE - Custom Layers
-
LocMoE: A Low-overhead MoE for Large Language Model Training
Paper • 2401.13920 • Published • 2 -
HyperRouter: Towards Efficient Training and Inference of Sparse Mixture of Experts
Paper • 2312.07035 • Published • 2 -
DEMix Layers: Disentangling Domains for Modular Language Modeling
Paper • 2108.05036 • Published • 3
Papers - Multimodal
-
TinyLLaVA: A Framework of Small-scale Large Multimodal Models
Paper • 2402.14289 • Published • 21 -
ImageBind: One Embedding Space To Bind Them All
Paper • 2305.05665 • Published • 6 -
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper • 2401.00908 • Published • 188 -
Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts
Paper • 2206.02770 • Published • 4
Papers - Multimodal - Documents
-
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper • 2401.00908 • Published • 188 -
DeBERTa: Decoding-enhanced BERT with Disentangled Attention
Paper • 2006.03654 • Published • 3 -
DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing
Paper • 2111.09543 • Published • 3
Papers - Image
-
FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation
Paper • 2403.06775 • Published • 5 -
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Paper • 2010.11929 • Published • 14 -
Data Incubation -- Synthesizing Missing Data for Handwriting Recognition
Paper • 2110.07040 • Published • 2 -
A Mixture of Expert Approach for Low-Cost Customization of Deep Neural Networks
Paper • 1811.00056 • Published • 2
Datasets - Text and Video
Papers - Performance Trends in AI
Papers - MoE - Audio
-
SpeechMoE: Scaling to Large Acoustic Models with Dynamic Routing Mixture of Experts
Paper • 2105.03036 • Published • 2 -
Building a great multi-lingual teacher with sparsely-gated mixture of experts for speech recognition
Paper • 2112.05820 • Published • 2 -
SpeechMoE2: Mixture-of-Experts Model with Improved Routing
Paper • 2111.11831 • Published • 2
Papers - Quants
-
QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
Paper • 2310.16795 • Published • 27 -
Pareto-Optimal Quantized ResNet Is Mostly 4-bit
Paper • 2105.03536 • Published • 2 -
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
Paper • 2403.15447 • Published • 16
Papers - MoE - Speech Recognition
Papers - MoE - Multilingual
Papers - MoE - Training - Weight Sharing
Papers - Image - Handwriting Recognition
-
Data Incubation -- Synthesizing Missing Data for Handwriting Recognition
Paper • 2110.07040 • Published • 2 -
A Mixture of Expert Approach for Low-Cost Customization of Deep Neural Networks
Paper • 1811.00056 • Published • 2 -
Vulnerability Analysis of Transformer-based Optical Character Recognition to Adversarial Attacks
Paper • 2311.17128 • Published • 2 -
Data Generation for Post-OCR correction of Cyrillic handwriting
Paper • 2311.15896 • Published • 4
Papers - MoE - Handwriting Recognition
Papers - Image - Adversarial
Papers - Image - Handwriting and Online Gestures
Papers - Image - Fine-tuning
-
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper • 2401.00908 • Published • 188 -
Visual Instruction Tuning
Paper • 2304.08485 • Published • 20 -
Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering
Paper • 2403.09622 • Published • 18 -
Lumiere: A Space-Time Diffusion Model for Video Generation
Paper • 2401.12945 • Published • 86
Papers - Image - Handwritten Generation
Models - Image - Diffusion Probabilistic Models
Papers - Image - Handwriting Recognition - Lexical Features
Papers - Image - Custom Layers
-
Rescoring Sequence-to-Sequence Models for Text Line Recognition with CTC-Prefixes
Paper • 2110.05909 • Published • 2 -
Deep Residual Learning for Image Recognition
Paper • 1512.03385 • Published • 8 -
Wide Residual Networks
Paper • 1605.07146 • Published • 2 -
Comprehensive Survey of Model Compression and Speed up for Vision Transformers
Paper • 2404.10407 • Published • 1
Papers - Image - Handwriting Recognition - Near-Realtime
Papers - Text - Decoders
-
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper • 1810.04805 • Published • 23 -
Transformers Can Achieve Length Generalization But Not Robustly
Paper • 2402.09371 • Published • 15 -
A Thorough Examination of Decoding Methods in the Era of LLMs
Paper • 2402.06925 • Published • 1 -
Byte Latent Transformer: Patches Scale Better Than Tokens
Paper • 2412.09871 • Published • 108
Papers - Text - Bidirectional Encoders
-
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
Paper • 1901.08746 • Published • 6 -
Pretraining-Based Natural Language Generation for Text Summarization
Paper • 1902.09243 • Published • 2 -
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Paper • 1907.11692 • Published • 9 -
DeBERTa: Decoding-enhanced BERT with Disentangled Attention
Paper • 2006.03654 • Published • 3
Papers - Text - Pre-training - Research
-
Pretraining-Based Natural Language Generation for Text Summarization
Paper • 1902.09243 • Published • 2 -
Learning to Reason and Memorize with Self-Notes
Paper • 2305.00833 • Published • 5 -
Text Generation with Diffusion Language Models: A Pre-training Approach with Continuous Paragraph Denoise
Paper • 2212.11685 • Published • 2 -
Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems
Paper • 2408.16293 • Published • 27
Papers - Text - Benchmarks - Quality Diversity
Papers - Text - Research
-
An Interdisciplinary Comparison of Sequence Modeling Methods for Next-Element Prediction
Paper • 1811.00062 • Published • 2 -
mT5: A massively multilingual pre-trained text-to-text transformer
Paper • 2010.11934 • Published • 4 -
Bootstrap Your Own Skills: Learning to Solve New Tasks with Large Language Model Guidance
Paper • 2310.10021 • Published • 2 -
Gemma: Open Models Based on Gemini Research and Technology
Paper • 2403.08295 • Published • 50
Papers - Multimodal - Speech and Text
Papers - Multimodal - Training and Tuning
-
Enhancing Document Information Analysis with Multi-Task Pre-training: A Robust Approach for Information Extraction in Visually-Rich Documents
Paper • 2310.16527 • Published • 2 -
CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection
Paper • 2310.02960 • Published • 1 -
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Paper • 2403.09611 • Published • 129 -
Veagle: Advancements in Multimodal Representation Learning
Paper • 2403.08773 • Published • 10
Papers - Video - Motion Control
Papers - Video - Pre-training
Papers - Image - Caption Generation
Papers - Transformer Research - Custom Layers
Papers - SuperNets
-
Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts
Paper • 2306.04845 • Published • 4 -
Balanced Mixture of SuperNets for Learning the CNN Pooling Architecture
Paper • 2306.11982 • Published • 2 -
AlphaNet: Improved Training of Supernets with Alpha-Divergence
Paper • 2102.07954 • Published • 2
Papers - Mamba - Structured State Space Model
-
Motion Mamba: Efficient and Long Sequence Motion Generation with Hierarchical and Bidirectional Selective SSM
Paper • 2403.07487 • Published • 17 -
LocalMamba: Visual State Space Model with Windowed Selective Scan
Paper • 2403.09338 • Published • 9 -
Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference
Paper • 2403.14520 • Published • 35 -
SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time series
Paper • 2403.15360 • Published • 13
Papers - MoE - Multimodal
Mixtures of experts for text, image and speech
Papers - Multimodal - Drone
Papers - Training Research - Time series
-
Chronos: Learning the Language of Time Series
Paper • 2403.07815 • Published • 46 -
Large Language Models as Optimizers
Paper • 2309.03409 • Published • 77 -
Pattern Discovery in Time Series with Byte Pair Encoding
Paper • 2106.00614 • Published • 2 -
MambaMixer: Efficient Selective State Space Models with Dual Token and Channel Selection
Paper • 2403.19888 • Published • 12
Papers - Neural Architecture Search
-
AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling
Paper • 2011.09011 • Published • 2 -
HAT: Hardware-Aware Transformers for Efficient Natural Language Processing
Paper • 2005.14187 • Published • 2 -
BigNAS: Scaling Up Neural Architecture Search with Big Single-Stage Models
Paper • 2003.11142 • Published • 2 -
Efficient Architecture Search by Network Transformation
Paper • 1707.04873 • Published • 2
Papers - Image - Split Computing
Papers - U-Net
-
U-Net: Convolutional Networks for Biomedical Image Segmentation
Paper • 1505.04597 • Published • 14 -
Image Segmentation using U-Net Architecture for Powder X-ray Diffraction Images
Paper • 2310.16186 • Published • 2 -
H-DenseUNet: Hybrid Densely Connected UNet for Liver and Tumor Segmentation from CT Volumes
Paper • 1709.07330 • Published • 2 -
Deep LOGISMOS: Deep Learning Graph-based 3D Segmentation of Pancreatic Tumors on CT scans
Paper • 1801.08599 • Published • 2
Papers - Image - Segmentation - Cancer
-
H-DenseUNet: Hybrid Densely Connected UNet for Liver and Tumor Segmentation from CT Volumes
Paper • 1709.07330 • Published • 2 -
Deep LOGISMOS: Deep Learning Graph-based 3D Segmentation of Pancreatic Tumors on CT scans
Paper • 1801.08599 • Published • 2 -
Hierarchical multi-class segmentation of glioma images using networks with multi-level activation function
Paper • 1810.09488 • Published • 2 -
Cross-modality (CT-MRI) prior augmented deep learning for robust lung tumor segmentation from small MR datasets
Paper • 1901.11369 • Published • 2
Papers - Image - Segmentation - Drone
Papers - Image - Segmentation - Adversarial
Papers - Image - Segmentation - Stroke Brain Lesions
Papers - Image - IoT
Papers - Image - Hybrid - ResNet - U-Net
Papers - Image - Segmentation - Bio Cell
-
Enforcing Morphological Information in Fully Convolutional Networks to Improve Cell Instance Segmentation in Fluorescence Microscopy Images
Paper • 2106.05843 • Published • 2 -
Semi-Supervised Semantic Segmentation using Redesigned Self-Training for White Blood Cells
Paper • 2401.07278 • Published • 2
Papers - Image - Hybrid - Graph Net - U-Net
Papers - Image - CSWin - Cross-Shaped Windows
-
CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows
Paper • 2107.00652 • Published • 2 -
Cross-Shaped Windows Transformer with Self-supervised Pretraining for Clinically Significant Prostate Cancer Detection in Bi-parametric MRI
Paper • 2305.00385 • Published • 2 -
2nd Place Solution to Google Landmark Recognition Competition 2021
Paper • 2110.02638 • Published • 2 -
BOAT: Bilateral Local Attention Vision Transformer
Paper • 2201.13027 • Published • 2
Papers - Image - Encoders - LePE - Local-Enhanced Pos Enc
Papers - Image - Attention - Multi-Scale
-
MAFormer: A Transformer Network with Multi-scale Attention Fusion for Visual Recognition
Paper • 2209.01620 • Published • 2 -
Using Multi-scale SwinTransformer-HTC with Data augmentation in CoNIC Challenge
Paper • 2202.13588 • Published • 2 -
GasHis-Transformer: A Multi-scale Visual Transformer Approach for Gastric Histopathological Image Detection
Paper • 2104.14528 • Published • 2
Papers - Text - Fine-tuning - Math
Papers - Robot - Tasks - Boss
Papers - Robot - Research
-
Bootstrap Your Own Skills: Learning to Solve New Tasks with Large Language Model Guidance
Paper • 2310.10021 • Published • 2 -
Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
Paper • 2310.20587 • Published • 18 -
Discovering Adaptable Symbolic Algorithms from Scratch
Paper • 2307.16890 • Published • 7 -
DragAPart: Learning a Part-Level Motion Prior for Articulated Objects
Paper • 2403.15382 • Published • 11
Papers - Image - GasHis
Papers - Text - Architecture - Scaling to 1000 Layers
Papers - Adversarial Testing
-
Feature-Guided Black-Box Safety Testing of Deep Neural Networks
Paper • 1710.07859 • Published • 2 -
Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks
Paper • 2309.17410 • Published • 4 -
Intriguing Properties of Adversarial Examples
Paper • 1711.02846 • Published • 2
Papers - Image - Compound Scaling Method
Papers - Image - Visualization - Splatting
Papers - AI - Safety
Papers - Custom Layers
-
Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
Paper • 2310.20587 • Published • 18 -
JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention
Paper • 2310.00535 • Published • 2 -
Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla
Paper • 2307.09458 • Published • 11 -
The Impact of Depth and Width on Transformer Language Model Generalization
Paper • 2310.19956 • Published • 10
Papers - Motion Control
Papers - Fine-tuning - LoRA
-
Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
Paper • 2310.20587 • Published • 18 -
MedAlpaca -- An Open-Source Collection of Medical Conversational AI Models and Training Data
Paper • 2304.08247 • Published • 2 -
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
Paper • 2311.03285 • Published • 32 -
WavLLM: Towards Robust and Adaptive Speech Large Language Model
Paper • 2404.00656 • Published • 11
Papers - AI - Self-refinement - Training and Tuning
Papers - Audio
-
UniAudio: An Audio Foundation Model Toward Universal Audio Generation
Paper • 2310.00704 • Published • 21 -
Structural Similarities Between Language Models and Neural Response Measurements
Paper • 2306.01930 • Published • 2 -
Streaming Transformer ASR with Blockwise Synchronous Beam Search
Paper • 2006.14941 • Published • 2 -
NU-GAN: High resolution neural upsampling with GAN
Paper • 2010.11362 • Published • 2
Papers - Observability and Interpretability
-
JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention
Paper • 2310.00535 • Published • 2 -
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Paper • 2211.00593 • Published • 2 -
Rethinking Interpretability in the Era of Large Language Models
Paper • 2402.01761 • Published • 23 -
Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla
Paper • 2307.09458 • Published • 11
Papers - Interpretability - DAS
Papers - Multilingual
-
A Biomedical Entity Extraction Pipeline for Oncology Health Records in Portuguese
Paper • 2304.08999 • Published • 3 -
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
Paper • 2309.09400 • Published • 85 -
Robust Open-Vocabulary Translation from Visual Text Representations
Paper • 2104.08211 • Published • 1 -
Poro 34B and the Blessing of Multilinguality
Paper • 2404.01856 • Published • 15
Papers - Watermark
Papers - Proof of Learning
Papers - Named Entity Extraction and Disambiguation
Papers - Neural Architecture Search - One-shot
Papers - Hyperparameter Architecture Search
Papers - Neural Architecture Search - RNN
Papers - Neural Architecture Search - Quantization - FLIQS
Papers - Neural Architecture Search - AutoML
Papers - AI - Are models similar to a human brain?
Papers - Math - Automated Discovery
Papers - Alpaca
Papers - Critical Thinking
Papers - Text - Encoders - Fire
Papers - Image - Chain of Thought
Models - Fine-tuning - Mixture of Loras
Datasets - HTML
Papers - Image - Mamba
-
LocalMamba: Visual State Space Model with Windowed Selective Scan
Paper • 2403.09338 • Published • 9 -
SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time series
Paper • 2403.15360 • Published • 13 -
MambaMixer: Efficient Selective State Space Models with Dual Token and Channel Selection
Paper • 2403.19888 • Published • 12
Papers - Healthcare - Mental Health
Papers - Encoders - Fire
Papers - Video - Understanding
-
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding
Paper • 2403.09626 • Published • 16 -
VideoAgent: Long-form Video Understanding with Large Language Model as Agent
Paper • 2403.10517 • Published • 37 -
VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis
Paper • 2403.13501 • Published • 9 -
LITA: Language Instructed Temporal-Localization Assistant
Paper • 2403.19046 • Published • 19
Papers - Multimodal - Encoders
Papers - Text - Star
Papers - Image - Near Real Time
Papers - Image - Editing
-
StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control
Paper • 2403.09055 • Published • 27 -
GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
Paper • 2112.10741 • Published • 4 -
Lightweight Image Inpainting by Stripe Window Transformer with Joint Attention to CNN
Paper • 2301.00553 • Published • 3 -
ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion
Paper • 2403.18818 • Published • 28
Papers - Image - LCM
-
StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control
Paper • 2403.09055 • Published • 27 -
ReNoise: Real Image Inversion Through Iterative Noising
Paper • 2403.14602 • Published • 21 -
EdgeFusion: On-Device Text-to-Image Generation
Paper • 2404.11925 • Published • 23
Papers - Image - Editing - Glide
Papers - Image - Semantic Palette
Papers - Training - Distributed
Datasets - Chess
Papers - Training - FixMatch
Papers - Task Assistant - ExploreLLM
Papers - Training - Problem Solving
Papers - GUI - Task Assistants
Papers - Model Scaling - Effective Parameter Count
Papers - Scaling
Papers - Chain of Verification
Datasets - Text - Multilingual
Papers - CoT - Chain of Thought
-
Contrastive Decoding Improves Reasoning in Large Language Models
Paper • 2309.09117 • Published • 39 -
Chain-of-Thought Reasoning Without Prompting
Paper • 2402.10200 • Published • 109 -
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
Paper • 2403.14624 • Published • 53 -
Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
Paper • 2402.12875 • Published • 13
Papers - Fine-tuning - QA-LoRA
Papers - Text - Perform Tasks on Tabular Data
-
Table-GPT: Table-tuned GPT for Diverse Table Tasks
Paper • 2310.09263 • Published • 41 -
approximatelabs/tablib-v1-full
Viewer • Updated • 10.4B • 5.33k • 64 -
approximatelabs/tablib-v1-sample
Viewer • Updated • 44.9k • 297 • 13 -
TabLib: A Dataset of 627M Tables with Context
Paper • 2310.07875 • Published • 8
Papers - Text - Dataset - TabLib - Tabular
Papers - Qwen - Report
Papers - MoE - Quantization
Papers - Research - Replacing Attention
Embeddings - C4 - Jina
Papers - Decoders - CoT Decoding
Papers - Rag - Multi-hop Queries
-
MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries
Paper • 2401.15391 • Published • 6 -
Superposition Prompting: Improving and Accelerating Retrieval-Augmented Generation
Paper • 2404.06910 • Published • 3 -
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models
Paper • 2404.18796 • Published • 71
Embeddings - Coding
Papers - Training - Synthetic Noise
-
CodeBERT: A Pre-Trained Model for Programming and Natural Languages
Paper • 2002.08155 • Published • 2 -
Text Generation with Diffusion Language Models: A Pre-training Approach with Continuous Paragraph Denoise
Paper • 2212.11685 • Published • 2 -
ReNoise: Real Image Inversion Through Iterative Noising
Paper • 2403.14602 • Published • 21 -
ByT5: Towards a token-free future with pre-trained byte-to-byte models
Paper • 2105.13626 • Published • 3
Papers - Text - Pre-training - Synthetic Noise
Papers - Image - Training - Knowledge Graphs
Papers - Multimodal - Fine-tuning - Report
Papers - Text - Training - Code - Byte Pair Encoding
Papers - Coding - BPE vs Pointer Mixture Network
Papers - Automatic Speech Recognition - Beam Search
Papers - Explainability
-
Neural networks behave as hash encoders: An empirical study
Paper • 2101.05490 • Published • 2 -
A Multiscale Visualization of Attention in the Transformer Model
Paper • 1906.05714 • Published • 2 -
BERT Rediscovers the Classical NLP Pipeline
Paper • 1905.05950 • Published • 3 -
Using Explainable AI and Transfer Learning to understand and predict the maintenance of Atlantic blocking with limited observational data
Paper • 2404.08613 • Published • 1
Papers - Training - DoReMi
Papers - Training - AI training AI
-
DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining
Paper • 2305.10429 • Published • 3 -
LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement
Paper • 2403.15042 • Published • 27 -
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
Paper • 2405.01535 • Published • 124 -
Discovering Preference Optimization Algorithms with and for Large Language Models
Paper • 2406.08414 • Published • 16
Papers - Adafactor
-
Adafactor: Adaptive Learning Rates with Sublinear Memory Cost
Paper • 1804.04235 • Published • 2 -
DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining
Paper • 2305.10429 • Published • 3 -
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
Paper • 2305.13245 • Published • 6
Papers - MoE - Hashing instead of a Router
Datasets - Multimodal - Image and Text
-
DocVQA: A Dataset for VQA on Document Images
Paper • 2007.00398 • Published • 2 -
GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering
Paper • 1902.09506 • Published • 2 -
google/wit
Viewer • Updated • 2.66M • 200 • 59 -
Lin-Chen/MMStar
Viewer • Updated • 1.5k • 10.7k • 43
Datasets - Multimodal - Document and Image
Papers - Text - SQL
Papers - Training - Speculative Decoding - Single Model
Papers - Fine-tuning - Rag
Papers - Video - Agent
Papers - Audio - GAN
Papers - Decoders - 3D Nerf
Papers - ControlNet
-
Adding Conditional Control to Text-to-Image Diffusion Models
Paper • 2302.05543 • Published • 57 -
LightIt: Illumination Modeling and Control for Diffusion Models
Paper • 2403.10615 • Published • 18 -
SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions
Paper • 2403.16627 • Published • 21 -
DreamPolisher: Towards High-Quality Text-to-3D Generation via Geometric Diffusion
Paper • 2403.17237 • Published • 11
Papers - Image - Lightning
Papers - Text - Label Generator
Papers - Image - Chart to Table
Papers - Image - 3D Asset Enhancement
Papers - Training - Reward Model
-
PERL: Parameter Efficient Reinforcement Learning from Human Feedback
Paper • 2403.10704 • Published • 59 -
WARM: On the Benefits of Weight Averaged Reward Models
Paper • 2401.12187 • Published • 19 -
RewardBench: Evaluating Reward Models for Language Modeling
Paper • 2403.13787 • Published • 22 -
DreamReward: Text-to-3D Generation with Human Preference
Paper • 2403.14613 • Published • 37
Papers - Fine-tuning - Mixture of LoRA (MoL)
Papers - Attention - Cross
-
Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers
Paper • 2403.12943 • Published • 15 -
Masked Audio Generation using a Single Non-Autoregressive Transformer
Paper • 2401.04577 • Published • 43 -
Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models
Paper • 2404.02747 • Published • 13 -
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation
Paper • 2404.02733 • Published • 22
Papers - Fine-tuning - Multi-Agent
Papers - Image - Document - mPlugOwl
Papers - Structured Learning - Document
Papers - Prompt
-
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression
Paper • 2403.12968 • Published • 25 -
Improving Text-to-Image Consistency via Automatic Prompt Optimization
Paper • 2403.17804 • Published • 20 -
The Unreasonable Effectiveness of Eccentric Automatic Prompts
Paper • 2402.10949 • Published • 5 -
Orca: Progressive Learning from Complex Explanation Traces of GPT-4
Paper • 2306.02707 • Published • 47
Models - Reverse Engineering - Decompiler
Papers - Text - 3D
-
TexDreamer: Towards Zero-Shot High-Fidelity 3D Human Texture Generation
Paper • 2403.12906 • Published • 7 -
GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation
Paper • 2403.14621 • Published • 16 -
LATTE3D: Large-scale Amortized Text-To-Enhanced3D Synthesis
Paper • 2403.15385 • Published • 8 -
LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models
Paper • 2411.09595 • Published • 77
Papers - Image - Table - Extraction
Papers - Tabular
Converted the Elephants Never Forget paper to audio with Bark: https://drive.google.com/file/d/13IlbhKh71vxLpdYJ6mkIiiJZOUsf7XFv/view?usp=sharing
-
End-to-End Object Detection with Transformers
Paper • 2005.12872 • Published • 7 -
Elephants Never Forget: Memorization and Learning of Tabular Data in Large Language Models
Paper • 2404.06209 • Published • 5 -
TabReD: A Benchmark of Tabular Machine Learning in-the-Wild
Paper • 2406.19380 • Published • 50 -
SpreadsheetLLM: Encoding Spreadsheets for Large Language Models
Paper • 2407.09025 • Published • 139
Models - Image - Object Detection
Papers - 3D - Text
Papers - Frankenmerging
-
Evolutionary Optimization of Model Merging Recipes
Paper • 2403.13187 • Published • 58 -
Model Stock: All we need is just a few fine-tuned models
Paper • 2403.19522 • Published • 13 -
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
Paper • 2405.01535 • Published • 124
Papers - Image - Model Merging
Papers - Image - Math
-
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
Paper • 2403.14624 • Published • 53 -
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?
Paper • 2407.01284 • Published • 81 -
MAVIS: Mathematical Visual Instruction Tuning
Paper • 2407.08739 • Published • 33
Papers - Image - Reward Model
Papers - Video - Editing
Papers - Image - Personalization - Captions
Papers - 3D - Reconstruction
Papers - Video - Upsampler
Papers - Image - Adversarial (GAN)
Papers - Toxicity
-
Recourse for reclamation: Chatting with generative language models
Paper • 2403.14467 • Published • 8 -
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
Paper • 2403.15447 • Published • 16 -
Introducing v0.5 of the AI Safety Benchmark from MLCommons
Paper • 2404.12241 • Published • 13
Papers - Video - Content Motion Latent Diffusion
Papers - Image - Depth Estimation
Papers - Image - Training
-
Lexicon-Level Contrastive Visual-Grounding Improves Language Modeling
Paper • 2403.14551 • Published • 2 -
Adapting LLaMA Decoder to Vision Transformer
Paper • 2404.06773 • Published • 18 -
Toward a Better Understanding of Fourier Neural Operators: Analysis and Improvement from a Spectral Perspective
Paper • 2404.07200 • Published • 2 -
An inclusive review on deep learning techniques and their scope in handwriting recognition
Paper • 2404.08011 • Published • 1
Papers - Text - Classification
-
LLM-Assisted Content Analysis: Using Large Language Models to Support Deductive Coding
Paper • 2306.14924 • Published • 2 -
When LLMs are Unfit Use FastFit: Fast and Effective Text Classification with Many Classes
Paper • 2404.12365 • Published • 1 -
In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering
Paper • 2311.06668 • Published • 5 -
Wave Network: An Ultra-Small Language Model
Paper • 2411.02674 • Published • 3
Papers - Audio - Training
-
A Multimodal Approach to Device-Directed Speech Detection with Large Language Models
Paper • 2403.14438 • Published • 2 -
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
Paper • 2403.17694 • Published • 12 -
FlashSpeech: Efficient Zero-Shot Speech Synthesis
Paper • 2404.14700 • Published • 32 -
SONAR: Sentence-Level Multimodal and Language-Agnostic Representations
Paper • 2308.11466 • Published • 1
Papers - Audio - Whisper vs Clap - Whisper wins with ASR
Papers - ICL - In-Context Learning
-
Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models
Paper • 2311.00871 • Published • 3 -
Can large language models explore in-context?
Paper • 2403.15371 • Published • 33 -
Data Distributional Properties Drive Emergent In-Context Learning in Transformers
Paper • 2205.05055 • Published • 2 -
Long-context LLMs Struggle with Long In-context Learning
Paper • 2404.02060 • Published • 37
Papers - Agent - Architecture
Papers - Fine-tuning - DPO
Refer to additional papers: https://link.springer.com/article/10.1007/s10994-014-5458-8 and https://link.springer.com/article/10.1007/BF00992696
-
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper • 2305.18290 • Published • 63 -
ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization
Paper • 2402.09320 • Published • 6 -
sDPO: Don't Use Your Data All at Once
Paper • 2403.19270 • Published • 41 -
Dueling RL: Reinforcement Learning with Trajectory Preferences
Paper • 2111.04850 • Published • 2
Papers - Training - Critic Model
-
CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing
Paper • 2305.11738 • Published • 8 -
Shepherd: A Critic for Language Model Generation
Paper • 2308.04592 • Published • 32 -
CriticBench: Benchmarking LLMs for Critique-Correct Reasoning
Paper • 2402.14809 • Published • 3 -
DRLC: Reinforcement Learning with Dense Rewards from LLM Critic
Paper • 2401.07382 • Published • 2
Papers - Security - Fuzzing
Papers - Benchmarks - Reasoning
-
CriticBench: Benchmarking LLMs for Critique-Correct Reasoning
Paper • 2402.14809 • Published • 3 -
Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs
Paper • 2312.17080 • Published • 1 -
TACT: Advancing Complex Aggregative Reasoning with Information Extraction Tools
Paper • 2406.03618 • Published • 2
Papers - Music
Papers - Coding - Chain of Thought
-
ReGAL: Refactoring Programs to Discover Generalizable Abstractions
Paper • 2401.16467 • Published • 10 -
Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models
Paper • 2404.02575 • Published • 50 -
How Far Can We Go with Practical Function-Level Program Repair?
Paper • 2404.12833 • Published • 7
Papers - Coding - Fine-tuning
-
V-STaR: Training Verifiers for Self-Taught Reasoners
Paper • 2402.06457 • Published • 9 -
Advancing LLM Reasoning Generalists with Preference Trees
Paper • 2404.02078 • Published • 46 -
McEval: Massively Multilingual Code Evaluation
Paper • 2406.07436 • Published • 41 -
Is Programming by Example solved by LLMs?
Paper • 2406.08316 • Published • 13
Papers - Fine-tuning - Reasoning
-
V-STaR: Training Verifiers for Self-Taught Reasoners
Paper • 2402.06457 • Published • 9 -
Self-Consistency Preference Optimization
Paper • 2411.04109 • Published • 19 -
Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions
Paper • 2411.14405 • Published • 61 -
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper • 2402.03300 • Published • 129
Papers - Mamba - FFT - EinFFT
Papers - Multimodal - Video - Text - Audio
Papers - Multimodal - Captions - Speech
Papers - Synthetic Data - Multimodal
Papers - 3D - Synthetic Data
-
ThemeStation: Generating Theme-Aware 3D Assets from Few Exemplars
Paper • 2403.15383 • Published • 15 -
LATTE3D: Large-scale Amortized Text-To-Enhanced3D Synthesis
Paper • 2403.15385 • Published • 8 -
MaPa: Text-driven Photorealistic Material Painting for 3D Shapes
Paper • 2404.17569 • Published • 13 -
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning
Paper • 2404.16994 • Published • 36
Papers - Documents - Fine-tuning
-
FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions
Paper • 2403.15246 • Published • 11 -
Noise-Aware Training of Layout-Aware Language Models
Paper • 2404.00488 • Published • 10 -
Text Role Classification in Scientific Charts Using Multimodal Transformers
Paper • 2402.14579 • Published • 1
Papers - Coding - Compiler
-
Compiler generated feedback for Large Language Models
Paper • 2403.14714 • Published • 7 -
Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models
Paper • 2404.02575 • Published • 50 -
Compiling C to Safe Rust, Formalized
Paper • 2412.15042 • Published • 1
Papers - Training - Teacher Model
Papers - Searchformer
Papers - Training Research - Stack Traces
Papers - Fine-tuning - Procedure Cloning
Papers - Decoders - T5
Papers - DenseFormer
Papers - Encoders - Image - Clip
-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 28 -
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens
Paper • 2404.03413 • Published • 28 -
openai/clip-vit-large-patch14-336
Zero-Shot Image Classification • Updated • 3.36M • 277 -
openai/clip-vit-base-patch32
Zero-Shot Image Classification • Updated • 19.8M • 793
Papers - Training Research - Exemplary Prompts
Models - TTS
Models - Documents
Papers - Agent - Operating Systems
Papers - Multilingual - Japanese
Papers - Document - Understanding - Historical Images Text
Papers - Image - Historical
-
Insightful analysis of historical sources at scales beyond human capabilities using unsupervised Machine Learning and XAI
Paper • 2310.09091 • Published • 2 -
Evolution and Transformation of Scientific Knowledge over the Sphaera Corpus: A Network Study
Paper • 2004.00520 • Published • 2 -
NAF-DPM: A Nonlinear Activation-Free Diffusion Probabilistic Model for Document Enhancement
Paper • 2404.05669 • Published • 1
Papers - Image - VGG
Papers - Image - Historical - Symbolic and Artistic
Papers - Research - Emergent Properties
Papers - DeepMind - ICL vs RNN vs LSTM
Papers - DeepMind - ICL Small Models are More Exemplar-Based
Spaces - Decoders - Beam
Papers - Video - NeRF
Papers - Fine-tuning - Model Layer Pruning
Papers - Intel - MLP
Papers - Image - Prompt
Papers - Fine-tuning - SFT
-
InternLM2 Technical Report
Paper • 2403.17297 • Published • 34 -
sDPO: Don't Use Your Data All at Once
Paper • 2403.19270 • Published • 41 -
Learn Your Reference Model for Real Good Alignment
Paper • 2404.09656 • Published • 89 -
OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data
Paper • 2404.12195 • Published • 12
Papers - Text - Video Generator
Papers - Image - Gaussian Splatting - 2D
Papers - Audio - Image
Papers - Training Research - Audio
Models - Image - Streaming
Datasets - Meta
Papers - Google
-
Lumiere: A Space-Time Diffusion Model for Video Generation
Paper • 2401.12945 • Published • 86 -
Long-form factuality in large language models
Paper • 2403.18802 • Published • 26 -
ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion
Paper • 2403.18818 • Published • 28 -
TC4D: Trajectory-Conditioned Text-to-4D Generation
Paper • 2403.17920 • Published • 18
Papers - Imagen
Papers - University - University of California Berkeley
-
The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
Paper • 1801.03924 • Published • 2 -
LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement
Paper • 2403.15042 • Published • 27 -
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
Paper • 1712.05884 • Published • 3 -
Efficient Memory Management for Large Language Model Serving with PagedAttention
Paper • 2309.06180 • Published • 25
Papers - Adobe
Papers - 3DGS
-
Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction
Paper • 2403.18795 • Published • 20 -
EgoLifter: Open-world 3D Segmentation for Egocentric Perception
Paper • 2403.18118 • Published • 12 -
GaussianCube: Structuring Gaussian Splatting using Optimal Transport for 3D Generative Modeling
Paper • 2403.19655 • Published • 19 -
Snap-it, Tap-it, Splat-it: Tactile-Informed 3D Gaussian Splatting for Reconstructing Challenging Surfaces
Paper • 2403.20275 • Published • 10
Papers - Text - Factuality
Papers - Healthcare - Training Research
Papers - DataBricks
Papers - Image - Generator - Large Resolution
Papers - Apple
-
Towards a World-English Language Model for On-Device Virtual Assistants
Paper • 2403.18783 • Published • 6 -
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Paper • 2403.09611 • Published • 129 -
ReALM: Reference Resolution As Language Modeling
Paper • 2403.20329 • Published • 22 -
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
Paper • 2404.05719 • Published • 82
Papers - Encoders - Video - MetaCLIP
Papers - Training Research - Mixture FOFE
Papers - Image - Editing - Object Removal
Papers - Image - Editing - Counterfactual Supervision
Papers - 3DGS - Open-world Segmentation
Papers - Microsoft
-
Can large language models explore in-context?
Paper • 2403.15371 • Published • 33 -
GaussianCube: Structuring Gaussian Splatting using Optimal Transport for 3D Generative Modeling
Paper • 2403.19655 • Published • 19 -
WavLLM: Towards Robust and Adaptive Speech Large Language Model
Paper • 2404.00656 • Published • 11 -
Enabling Memory Safety of C Programs using LLMs
Paper • 2404.01096 • Published • 1
Papers - Healthcare - Image Analysis
-
Generating Synthetic Computed Tomography for Radiotherapy: SynthRAD2023 Challenge Report
Paper • 2403.08447 • Published • 2 -
Deformable MRI Sequence Registration for AI-based Prostate Cancer Diagnosis
Paper • 2404.09666 • Published • 1 -
Surgical SAM 2: Real-time Segment Anything in Surgical Video by Efficient Frame Pruning
Paper • 2408.07931 • Published • 22
Papers - Healthcare - Image - CT
Papers - Image - Segmentation - Bounding Box Infilling
Papers - Image - Translation
Papers - Multilingual - German
Papers - Multilingual - Translation
Papers - Multilingual - Synthetic Noise
Papers - Fine-tuning - Text - U-Net
Papers - Image - Encoders - Clip
-
TextCraftor: Your Text Encoder Can be Image Quality Controller
Paper • 2403.18978 • Published • 15 -
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation
Paper • 2404.02733 • Published • 22 -
OmniFusion Technical Report
Paper • 2404.06212 • Published • 77 -
Transferable and Principled Efficiency for Open-Vocabulary Segmentation
Paper • 2404.07448 • Published • 12
Papers - Video - Encoders
Papers - Nvidia
-
LITA: Language Instructed Temporal-Localization Assistant
Paper • 2403.19046 • Published • 19 -
Snap-it, Tap-it, Splat-it: Tactile-Informed 3D Gaussian Splatting for Reconstructing Challenging Surfaces
Paper • 2403.20275 • Published • 10 -
Condition-Aware Neural Network for Controlled Image Generation
Paper • 2404.01143 • Published • 13 -
CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues
Paper • 2404.03820 • Published • 26
Papers - 3DGS - 3D Mesh Generator
Papers - Model - SFT - Alpaca and DPO - Solar
Papers - University - Cornell University
-
Learning Trajectory Preferences for Manipulators via Iterative Improvement
Paper • 1306.6294 • Published • 3 -
MambaMixer: Efficient Selective State Space Models with Dual Token and Channel Selection
Paper • 2403.19888 • Published • 12 -
RL for Consistency Models: Faster Reward Guided Text-to-Image Generation
Paper • 2404.03673 • Published • 16 -
PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation
Paper • 2404.13026 • Published • 24
Papers - Fine-tuning - DPO - Reward Model Training
Papers - Reward Model - Bradley-Terry
https://web.stanford.edu/class/archive/stats/stats200/stats200.1172/Lecture24.pdf
-
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper • 2305.18290 • Published • 63 -
HyperCLOVA X Technical Report
Paper • 2404.01954 • Published • 25 -
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization
Paper • 2404.09956 • Published • 12 -
Learn Your Reference Model for Real Good Alignment
Paper • 2404.09656 • Published • 89
Papers - University of Chicago
Papers - KL Regularization - ADP - Con/Divergence Error Rate
Papers - Fine-tuning - Factuality
Datasets - RLHF
Papers - top-p - Nucleus Sampling
Papers - Distribution - Zipf Analysis
Papers - University - University of Washington
-
The Curious Case of Neural Text Degeneration
Paper • 1904.09751 • Published • 3 -
Getting it Right: Improving Spatial Consistency in Text-to-Image Models
Paper • 2404.01197 • Published • 31 -
BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions
Paper • 1905.10044 • Published • 2 -
PIQA: Reasoning about Physical Commonsense in Natural Language
Paper • 1911.11641 • Published • 3
Models - Bitnet - Text
Papers - Tacotron 2
Spectrogram Prediction Network
As in Tacotron, mel spectrograms are computed through a short-time Fourier transform (STFT) ... and a Hann window function.
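The note above can be illustrated with a minimal sketch of a short-time Fourier transform using a Hann window. It is a sketch under stated assumptions, not the Tacotron 2 implementation: the sample rate, frame size, hop size and test tone are hypothetical values chosen for illustration (the excerpt elides the paper's settings), the helper names are made up, and the mel filterbank step is left out.

```python
import cmath
import math


def hann_window(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stft_magnitude(signal, frame_size, hop_size):
    """Return one magnitude spectrum per frame (bins 0..frame_size // 2)."""
    window = hann_window(frame_size)
    n_bins = frame_size // 2 + 1
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop_size):
        # Taper the frame with the Hann window before transforming it.
        frame = [signal[start + i] * window[i] for i in range(frame_size)]
        spectrum = []
        for k in range(n_bins):
            # Direct DFT of the windowed frame (slow, but dependency-free).
            acc = sum(
                frame[i] * cmath.exp(-2j * math.pi * k * i / frame_size)
                for i in range(frame_size)
            )
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


# Hypothetical settings, NOT the values used in the paper.
sample_rate = 8000
frame_size = 200
hop_size = 50

# 0.1 s test tone at 440 Hz.
tone = [math.sin(2 * math.pi * 440.0 * n / sample_rate) for n in range(800)]
spec = stft_magnitude(tone, frame_size, hop_size)

# Average each bin over time and locate the strongest one.
n_bins = len(spec[0])
mean_per_bin = [sum(frame[k] for frame in spec) / len(spec) for k in range(n_bins)]
peak_bin = max(range(n_bins), key=lambda k: mean_per_bin[k])
peak_hz = peak_bin * sample_rate / frame_size
```

A mel spectrogram would then map these linear-frequency bins onto a mel-scale filterbank; libraries such as librosa or torchaudio provide that step.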
Papers - Audio - Time Domain Waveforms
Papers - Audio - Mel Spectrogram
-
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
Paper • 1712.05884 • Published • 3 -
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization
Paper • 2404.09956 • Published • 12 -
Music Consistency Models
Paper • 2404.13358 • Published • 14 -
FlashSpeech: Efficient Zero-Shot Speech Synthesis
Paper • 2404.14700 • Published • 32
Papers - GAN
Papers - GAN - Compression - Bitstream
Papers - Audio - STT - ASR
-
WhisperX: Time-Accurate Speech Transcription of Long-Form Audio
Paper • 2303.00747 • Published • 5 -
Custom Data Augmentation for low resource ASR using Bark and Retrieval-Based Voice Conversion
Paper • 2311.14836 • Published • 2 -
SONAR: Sentence-Level Multimodal and Language-Agnostic Representations
Paper • 2308.11466 • Published • 1 -
W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training
Paper • 2108.06209 • Published • 1
Papers - Audio - WhisperX
Papers - Audio - VoiceCraft
Papers - Audio - Compression
Models - Audio - Codec
Models - Audio - Decoders
Models - Meta - FAIR
Models - Getting Started - Pre-training
Models - Reward Model
Datasets - Chat - RLHF
Papers - Audio - Masked Language Model
Papers - Audio - Encoders
Models - ResNet
Papers - Inference - Rescore Models
Papers - Kyutai
https://kyutai.org/
Models - Audio - Hybrid - AR with NAR Models
Papers - MoE - Mamba
Papers - IoT - Screen Usage Understanding and Context
Papers - Mamba - Limitations - In-Context Learning (ICL)
Papers - AI21 Labs
Papers - S-Lab
Papers - University of Wisconsin
Papers - Hallucinations
Papers - University of Bristol
Papers - Vanderbilt
Papers - University - New York University
-
MambaMixer: Efficient Selective State Space Models with Dual Token and Channel Selection
Paper • 2403.19888 • Published • 12 -
Measuring Style Similarity in Diffusion Models
Paper • 2404.01292 • Published • 17 -
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Paper • 1804.07461 • Published • 4 -
MoDE: CLIP Data Experts via Clustering
Paper • 2404.16030 • Published • 15
Papers - Embeddings - Text
-
Gecko: Versatile Text Embeddings Distilled from Large Language Models
Paper • 2403.20327 • Published • 48 -
2D Matryoshka Sentence Embeddings
Paper • 2402.14776 • Published • 6 -
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
Paper • 2412.13663 • Published • 156
Papers - Training a 2.8B Model in 38 days
Papers - vLLM
Papers - Attention - PagedAttention
Papers - Frankenmerge - Model Stock - Use Fine-tuned Models
Models - Model Stock
Models - Frankenmerge - Model Stock
Papers - Benchmarks - Financials
Models - 2bit
Papers - Video - Reward Model
Papers - ASU
Papers - University of Maryland
Papers - Chinese Academy of Sciences
Papers - 3D - FlexiCubes
Gradient-based surface extraction method
Papers - Fine-tuning - Llava - DPO
Papers - Salesforce
-
Non-Autoregressive Neural Machine Translation
Paper • 1711.02281 • Published • 1 -
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation
Paper • 2107.07651 • Published • 1 -
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Paper • 2404.07972 • Published • 50 -
AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning
Paper • 2402.15506 • Published • 18
Papers - Speech - Chain of Thought
Papers - Chinese University of Hong Kong
-
WavLLM: Towards Robust and Adaptive Speech Large Language Model
Paper • 2404.00656 • Published • 11 -
CameraCtrl: Enabling Camera Control for Text-to-Video Generation
Paper • 2404.02101 • Published • 24 -
Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs
Paper • 2312.17080 • Published • 1
Papers - Audio - Fine-tuning - Lora
Papers - Documents - LayoutLM
-
Noise-Aware Training of Layout-Aware Language Models
Paper • 2404.00488 • Published • 10 -
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking
Paper • 2204.08387 • Published • 5 -
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding
Paper • 2012.14740 • Published • 2 -
LayoutLM: Pre-training of Text and Layout for Document Image Understanding
Paper • 1912.13318 • Published • 4
Papers - Document - OCR
-
Noise-Aware Training of Layout-Aware Language Models
Paper • 2404.00488 • Published • 10 -
FormNet: Structural Encoding beyond Sequential Modeling in Form Document Information Extraction
Paper • 2203.08411 • Published • 1 -
FormNetV2: Multimodal Graph Contrastive Learning for Form Document Information Extraction
Paper • 2305.02549 • Published • 6 -
ETC: Encoding Long and Structured Inputs in Transformers
Paper • 2004.08483 • Published • 1
Papers - Video - Captions
-
Streaming Dense Video Captioning
Paper • 2404.01297 • Published • 13 -
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning
Paper • 2404.16994 • Published • 36 -
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
Paper • 2406.04325 • Published • 75
Papers - Decoders - Training Decoding Point Supervision
Papers - Image - Healthcare - Cardiac MRI
Papers - Training Research - Optimizers
Papers - Coding - C/C++
Papers - Coding - Operating Systems - Memory
Papers - Extended Transformer Construction
-
FormNetV2: Multimodal Graph Contrastive Learning for Form Document Information Extraction
Paper • 2305.02549 • Published • 6 -
FormNet: Structural Encoding beyond Sequential Modeling in Form Document Information Extraction
Paper • 2203.08411 • Published • 1 -
ETC: Encoding Long and Structured Inputs in Transformers
Paper • 2004.08483 • Published • 1 -
LongT5: Efficient Text-To-Text Transformer for Long Sequences
Paper • 2112.07916 • Published • 2
Papers - Graph Convolutional Network
Papers - Training Research - Contrastive Predictive Coding
Papers - Optimizers - Adafactor
Papers - Georgia Tech
Papers - Image - Contrastive Style Descriptors
Papers - Ellis Institute
Papers - Image - Security Cameras
Papers - University - University of Waterloo
-
Long-context LLMs Struggle with Long In-context Learning
Paper • 2404.02060 • Published • 37 -
Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks
Paper • 2211.12588 • Published • 3 -
StructLM: Towards Building Generalist Models for Structured Knowledge Grounding
Paper • 2402.16671 • Published • 29 -
Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model
Paper • 2404.04167 • Published • 14
Papers - Benchmarks - Text
Papers - Benchmarks - Text - Long Context
Models - Text - Classifier - Zero-Shot
Papers - Network - Adaptive BitRate Algorithms
Papers - Network Traffic - 4G and 5G - OTA
Papers - Network Traffic - OTA
Papers - Network Traffic - Transport Optimization
Papers - University of Texas
Papers - Coding - Preference Trees
Papers - Math - Reasoning
-
Advancing LLM Reasoning Generalists with Preference Trees
Paper • 2404.02078 • Published • 46 -
ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline
Paper • 2404.02893 • Published • 22 -
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
Paper • 2309.12284 • Published • 18 -
Premise Order Matters in Reasoning with Large Language Models
Paper • 2402.08939 • Published • 28
Papers - University - Northeastern University
-
Advancing LLM Reasoning Generalists with Preference Trees
Paper • 2404.02078 • Published • 46 -
Locating and Editing Factual Associations in Mamba
Paper • 2404.03646 • Published • 3 -
Locating and Editing Factual Associations in GPT
Paper • 2202.05262 • Published • 1 -
KAN: Kolmogorov-Arnold Networks
Paper • 2404.19756 • Published • 115
Papers - Multilingual - Encoders - BPE
Papers - Gemma
Papers - Encoders - DinoV2
Papers - Training Research - Scaling Properties - T2I
Papers - Pre-training - In-filling - PSM and SPM ordering
Papers - Text - Supervised Fine-tuning
Papers - Fine-tuning - PPO
-
HyperCLOVA X Technical Report
Paper • 2404.01954 • Published • 25 -
UltraFeedback: Boosting Language Models with High-quality Feedback
Paper • 2310.01377 • Published • 5 -
AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback
Paper • 2305.14387 • Published • 1 -
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper • 2402.03300 • Published • 129
Papers - Amazon
-
On the Scalability of Diffusion-based Text-to-Image Generation
Paper • 2404.02883 • Published • 19 -
MonoPatchNeRF: Improving Neural Radiance Fields with Patch-based Monocular Guidance
Paper • 2404.08252 • Published • 6 -
Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning
Paper • 2303.15647 • Published • 4
Papers - ByteDance
-
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
Paper • 2404.02905 • Published • 74 -
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback
Paper • 2404.07987 • Published • 48 -
COCONut: Modernizing COCO Segmentation
Paper • 2404.08639 • Published • 30 -
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs
Paper • 2402.15627 • Published • 38
Papers - Inference - Performance
-
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
Paper • 2404.02905 • Published • 74 -
On Speculative Decoding for Multimodal Large Language Models
Paper • 2404.08856 • Published • 13 -
Hydragen: High-Throughput LLM Inference with Shared Prefixes
Paper • 2402.05099 • Published • 20
Papers - Coding - Think and Execute vs CoT and PoTs
Papers - Coding - Think and Execute - 7B vs 13B vs GPT
Papers - Infra - Cost - Automatic Compute Planning
Papers - MoD - Router
Papers - Image - NeRF
-
Freditor: High-Fidelity and Transferable NeRF Editing by Frequency Decomposition
Paper • 2404.02514 • Published • 11 -
MonoPatchNeRF: Improving Neural Radiance Fields with Patch-based Monocular Guidance
Paper • 2404.08252 • Published • 6 -
Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video
Paper • 2404.09833 • Published • 30 -
MeshLRM: Large Reconstruction Model for High-Quality Mesh
Paper • 2404.12385 • Published • 27
Papers - University - Fudan University
-
Freditor: High-Fidelity and Transferable NeRF Editing by Frequency Decomposition
Paper • 2404.02514 • Published • 11 -
Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model
Paper • 2404.04167 • Published • 14 -
Length Generalization of Causal Transformers without Position Encoding
Paper • 2404.12224 • Published • 1 -
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B
Paper • 2406.07394 • Published • 29
Papers - Image - Demosaic
Papers - Image - Interior Design
Papers - ETH Zurich
-
I-Design: Personalized LLM Interior Designer
Paper • 2404.02838 • Published • 2 -
Scaling MLPs: A Tale of Inductive Bias
Paper • 2306.13575 • Published • 16 -
Fast Feedforward Networks
Paper • 2308.14711 • Published • 3 -
How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study
Paper • 2404.14047 • Published • 45
Datasets - Reasoning
Papers - University - University of California San Diego
-
I am a Strange Dataset: Metalinguistic Tests for Language Models
Paper • 2401.05300 • Published • 3 -
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
Paper • 2404.08801 • Published • 66 -
SpecInfer: Accelerating Generative LLM Serving with Speculative Inference and Token Tree Verification
Paper • 2305.09781 • Published • 4 -
MeshLRM: Large Reconstruction Model for High-Quality Mesh
Paper • 2404.12385 • Published • 27
Papers - Contextual AI
Papers - Reasoning - GSM8k
-
Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs
Paper • 2312.17080 • Published • 1 -
Premise Order Matters in Reasoning with Large Language Models
Paper • 2402.08939 • Published • 28 -
Reasoning in Large Language Models: A Geometric Perspective
Paper • 2407.02678 • Published • 1 -
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper • 2402.03300 • Published • 129
Papers - Benchmarks - GSM8k
-
Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs
Paper • 2312.17080 • Published • 1 -
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B
Paper • 2406.07394 • Published • 29 -
Qwen2 Technical Report
Paper • 2407.10671 • Published • 166 -
Self-Consistency Preference Optimization
Paper • 2411.04109 • Published • 19
Datasets - Math - Meta Context Reasoning
Papers - Southern University of Science and Technology
Papers - Max Planck Institute
Datasets - Text - System Chat
Models - Table - Handwriting Comprehension
Papers - Document - Tabular - Manual Review
Repo: https://github.com/HistLab/More-efficient-manual-review-of-automatically-transcribed-tabular-data
Papers - Image - Custom Annotation and Labeling Tools
Papers - Image - Tabular
Papers - Image - OCR
-
CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents
Paper • 2004.12629 • Published • 3 -
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking
Paper • 2204.08387 • Published • 5 -
Text Role Classification in Scientific Charts Using Multimodal Transformers
Paper • 2402.14579 • Published • 1 -
An inclusive review on deep learning techniques and their scope in handwriting recognition
Paper • 2404.08011 • Published • 1
Papers - Image - Table Structure Recognition
Papers - Image - Fine-tuning - Tables
Papers - Document AI
-
LayoutLM: Pre-training of Text and Layout for Document Image Understanding
Paper • 1912.13318 • Published • 4 -
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding
Paper • 2012.14740 • Published • 2 -
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking
Paper • 2204.08387 • Published • 5
Papers - Coding - Benchmarks - Report
Papers - Benchmarks - Coding
-
CodeEditorBench: Evaluating Code Editing Capability of Large Language Models
Paper • 2404.03543 • Published • 18 -
McEval: Massively Multilingual Code Evaluation
Paper • 2406.07436 • Published • 41 -
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
Paper • 2406.15877 • Published • 48 -
Qwen2 Technical Report
Paper • 2407.10671 • Published • 166
Papers - Coding - Multi-Model Inference
Papers - Anthropic
Papers - Coding - Encoders
Papers - Coding - Compression
Papers - Inference - Multi-Model
Papers - Fine-tuning - Report - Llama 7B and 13B
Papers - Tokenizers - Roberta
Papers - Reasoning - Social IQ
Papers - Image - Classifier - Label Quality Assessment
Papers - Benchmarks - Image - Labels
Papers - Reasoning - Math
MAWPS paper: https://aclanthology.org/N16-1136.pdf
-
Program Induction by Rationale Generation : Learning to Solve and Explain Algebraic Word Problems
Paper • 1705.04146 • Published • 1 -
Training Verifiers to Solve Math Word Problems
Paper • 2110.14168 • Published • 4 -
Explaining Math Word Problem Solvers
Paper • 2307.13128 • Published • 1 -
MathQA: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms
Paper • 1905.13319 • Published • 2
Papers - University of Oxford
-
Program Induction by Rationale Generation : Learning to Solve and Explain Algebraic Word Problems
Paper • 1705.04146 • Published • 1 -
Red Teaming GPT-4V: Are GPT-4V Safe Against Uni/Multi-Modal Jailbreak Attacks?
Paper • 2404.03411 • Published • 11 -
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance
Paper • 2404.04125 • Published • 29 -
Hydragen: High-Throughput LLM Inference with Shared Prefixes
Paper • 2402.05099 • Published • 20
Papers - Training - Instruction-Following
Alpaca eval: https://github.com/tatsu-lab/alpaca_eval
Papers - RLHF
-
UltraFeedback: Boosting Language Models with High-quality Feedback
Paper • 2310.01377 • Published • 5 -
Learn Your Reference Model for Real Good Alignment
Paper • 2404.09656 • Published • 89 -
Natural Language Reinforcement Learning
Paper • 2411.14251 • Published • 31 -
Group Robust Preference Optimization in Reward-free RLHF
Paper • 2405.20304 • Published • 1
Papers - Benchmarks - Text - Glue
Datasets - Benchmarks - Text
Papers - Reasoning - Program of Thoughts
Papers - StructLM - Understanding Structured Data
Datasets - Text - StructLM
Papers - Prompts - Chain of Thought
Datasets - OCR - Image with Text from Textract
Papers - Benchmarks - Web Browsing Tasks
Papers - Kaust
Papers - Video - MultiView Compressive Coding (MCC)
Papers - Image - Training - Low Res Predicts High Res
Papers - Tokenizers - Documents - TrOCR
Papers - Tokenizers - Image - Handwriting
Papers - University of Zhejiang
Papers - Audio - TTS - VALL-E
Papers - Security - Jailbreak
-
Red Teaming GPT-4V: Are GPT-4V Safe Against Uni/Multi-Modal Jailbreak Attacks?
Paper • 2404.03411 • Published • 11 -
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
Paper • 2404.13208 • Published • 40 -
A False Sense of Safety: Unsafe Information Leakage in 'Safe' AI Responses
Paper • 2407.02551 • Published • 9
Papers - LMU Munich
Papers - University of Wuhan
Papers - Benchmarks - Website Navigation
Papers - Web - Recognition
Papers - Fine-tuning - Rejection Sampling (RFT)
Models - General Purpose
-
CohereLabs/c4ai-command-r-plus
Text Generation • 104B • Updated • 2.89k • 1.76k -
CohereLabs/c4ai-command-r-plus-4bit
Text Generation • 55B • Updated • 22 • 256 -
mistral-community/Mixtral-8x22B-v0.1-4bit
Text Generation • 73B • Updated • 32 • 55 -
CohereLabs/c4ai-command-r-v01
Text Generation • 35B • Updated • 11.5k • 1.1k
Models - Chat
Models - Multimodal - Chat
Models - Synthetic Data - Audio
Models - Audio - Classification and Segmentation
Models - Image - Synthetic Data
Papers - Audio - Understanding
Spaces - Qwen - Image
Models - Audio - STT - ASR
-
openai/whisper-large-v3
Automatic Speech Recognition • 2B • Updated • 4.17M • 5.03k -
openai/whisper-tiny
Automatic Speech Recognition • 37.8M • Updated • 609k • 374 -
openai/whisper-small
Automatic Speech Recognition • 0.2B • Updated • 2.46M • 469 -
openai/whisper-medium
Automatic Speech Recognition • 0.8B • Updated • 771k • 267
Papers - Automated Interpretability
OpenAI has a 2024 tool referring to this technique: https://github.com/openai/transformer-debugger with https://transformer-circuits.pub/2023/monosema
Models - Encoders - Bert
Papers - Training Research - Rank-One Model Editing
Papers - Training Research - Ablation - Mamba
Papers - Training Research - Weights - Activation Patching
Papers - Interpretability - Rome - Factuality Editing
Website: https://rome.baulab.info/
Papers - University of Tel-Aviv
-
Analyzing Transformers in Embedding Space
Paper • 2209.02535 • Published • 3 -
Prompt-to-Prompt Image Editing with Cross Attention Control
Paper • 2208.01626 • Published • 2 -
Dynamic Typography: Bringing Words to Life
Paper • 2404.11614 • Published • 45 -
Transformer Language Models without Positional Encodings Still Learn Positional Information
Paper • 2203.16634 • Published • 5
Papers - University of Brown
Papers - Interpretability - Prompts
Papers - Training Research - Control Attention Reweighting
Papers - Training Research - Text - Token Visualization
https://github.com/jessevig/bertviz
Datasets - Image
Papers - Recommendation - Encoders - Bert
FFNs: using a smoother GELU instead of a ReLU
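To make the GELU versus ReLU note concrete, here is a minimal sketch using only the Python standard library. It uses the exact GELU definition, x * Phi(x), where Phi is the standard normal CDF. The function names `relu` and `gelu` are illustrative and are not taken from any paper in this collection.

```python
import math


def relu(x: float) -> float:
    """ReLU: zero for negative inputs, identity otherwise (kink at 0)."""
    return max(0.0, x)


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


if __name__ == "__main__":
    # GELU passes a small negative signal near zero, where ReLU is flat.
    for value in (-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0):
        print(f"x={value:+.1f}  relu={relu(value):+.4f}  gelu={gelu(value):+.4f}")
```

Unlike ReLU, GELU is differentiable at zero and slightly negative for small negative inputs, which is the "smoother" behaviour the note refers to.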
Papers - Recommendation - Multi-Task Learning
Papers - Recommendation - RTG Balancing
Papers - Healthcare - Radiology
Papers - Training Research - Pre-training - ALBEF
Papers - Pre-training - ALBEF - Multimodal Encoder
Papers - Dataset - MultiModal - MultiLingual - Wiki
Papers - RLHF - Iterative Contrastive Self-Improvement
A batched on-policy algorithm that conducts self-improvement iteratively via contrastive learning
Papers - RL - Consistency Model (RLCM)
a multi-step Markov Decision Process, allowing one to fine-tune consistency models toward a downstream task using just a reward function.
Papers - Harvey Mudd
Papers - Training Research - Search Based (BFS / DFS)
Focuses on policy improvement through search-based sampling
Papers - University of Tubingen
Papers - Kuaishou
Papers - 3DGS - Motion Blur
Papers - Image - Encoders - RGB-T (Thermal)
Models - Image - Stock Market - Pattern Detection
Papers - Audio - Bark
Papers - Training Research - DeiT
Papers - Image - Object Detection - YoloV8
Papers - Image - Hybrid - DeiT and YoloV8
Papers - Image - Healthcare - PTP Metrics
Papers - Custom Layers - MLP
-
MLP Can Be A Good Transformer Learner
Paper • 2404.05657 • Published • 1 -
Toward a Better Understanding of Fourier Neural Operators: Analysis and Improvement from a Spectral Perspective
Paper • 2404.07200 • Published • 2 -
An inclusive review on deep learning techniques and their scope in handwriting recognition
Paper • 2404.08011 • Published • 1 -
Long-form music generation with latent diffusion
Paper • 2404.10301 • Published • 27
Papers - Multilingual - Image - Greek
Papers - Indian Institute of Science
Papers - Regularization - LayerScale
Models - Image - DeiT
Papers - Image - Report - VQA
Papers - AIRI Institute
Papers - Skoltech
Papers - Image - Coco Testing
-
Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion
Paper • 2310.03502 • Published • 78 -
Transferable and Principled Efficiency for Open-Vocabulary Segmentation
Paper • 2404.07448 • Published • 12 -
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
Paper • 2404.07973 • Published • 32 -
COCONut: Modernizing COCO Segmentation
Paper • 2404.08639 • Published • 30
Papers - Image - Frechet Inception Distance (FID)
https://machinelearningmastery.com/how-to-implement-the-frechet-inception-distance-fid-from-scratch/
-
Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion
Paper • 2310.03502 • Published • 78 -
GLIGEN: Open-Set Grounded Text-to-Image Generation
Paper • 2301.07093 • Published • 4 -
Music Consistency Models
Paper • 2404.13358 • Published • 14 -
Align Your Steps: Optimizing Sampling Schedules in Diffusion Models
Paper • 2404.14507 • Published • 23
Papers - Benchmark - Context
Papers - Image - Decoders
Papers - Training - Image - Causal Self Attention
Papers - Training - Detailed Appendices
Papers - 3D - Panoramic View Generator
Papers - Training - Noisy or Unseen Data Drops Accuracy 6%
Spaces - Healthcare - Multimodal
Papers - Fine-tuning - Orpo
Papers - Image - Fourier Neural Operators (FNO) vs CNNs
Papers - Image - Training - Training with an Ensemble
Papers - Image - Differential Equations - FNO - ReLu
Papers - Rag - Prompts
Papers - Tokens - Path Equilibrium Positioning
Like coupled masses connected by springs
Papers - Model - Griffin
Models - Mistral - Orpo
Papers - University of Central Florida
Papers - Audio - Datasets - Dialog
Papers - Advanced Micro Devices
Papers - Image - Auto - Lane - Training Segmentation
Papers - Agents - Operating Systems
Papers - University of Aalto
Papers - Megatron
Papers - DiffusionDet
Papers - Image - Ordinary Differential Equations (ODE)
-
ConsistencyDet: Robust Object Detector with Denoising Paradigm of Consistency Model
Paper • 2404.07773 • Published • 1 -
Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis
Paper • 2404.13686 • Published • 28 -
Align Your Steps: Optimizing Sampling Schedules in Diffusion Models
Paper • 2404.14507 • Published • 23 -
NAF-DPM: A Nonlinear Activation-Free Diffusion Probabilistic Model for Document Enhancement
Paper • 2404.05669 • Published • 1
Papers - Image - Bounding Boxes - Loss - Timeseries
Models - Image - Image Segmentation - Coco
-
facebook/maskformer-swin-base-coco
Image Segmentation • 0.1B • Updated • 1.53k • 26 -
facebook/mask2former-swin-small-coco-panoptic
Image Segmentation • 68.8M • Updated • 661 • 1 -
facebook/mask2former-swin-tiny-coco-instance
Image Segmentation • 47.5M • Updated • 65.3k • 11 -
facebook/mask2former-swin-small-ade-semantic
Image Segmentation • 68.8M • Updated • 10.5k • 8
Papers - Image - ConsistencyDet
Audio reading using bark: https://drive.google.com/file/d/1AlHLzeUd04LXgDj99SOvmQJTy9chufGo/view?usp=sharing
Models - Rag
Models - Image - Clip
Models - Agent
Spaces - Comics
Papers - Visualization of Thought (VoT) - Mind’s Eye
Papers - Benchmark - Multimodal - Image Documentation
Papers - Investing - Stock Forecasting
Papers - Investing - AceFormer - ACEEMD
Papers - Agent
-
ODA: Observation-Driven Agent for integrating LLMs and Knowledge Graphs
Paper • 2404.07677 • Published • 1 -
ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models
Paper • 2404.07738 • Published • 2 -
Scaling Instructable Agents Across Many Simulated Worlds
Paper • 2404.10179 • Published • 28 -
A Multimodal Automated Interpretability Agent
Paper • 2404.14394 • Published • 22
Papers - Panasonic
Papers - Selective Language Modeling vs Causal
Datasets - Chat
Papers - Image - VQA - Captions High Res Alignment
Papers - University - Columbia University
-
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
Paper • 2404.07973 • Published • 32 -
GLIGEN: Open-Set Grounded Text-to-Image Generation
Paper • 2301.07093 • Published • 4 -
PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation
Paper • 2404.13026 • Published • 24 -
MoDE: CLIP Data Experts via Clustering
Paper • 2404.16030 • Published • 15
Papers - Image - Encoders - Dual Vision MLP projectors
Papers - Image - Dataset - LVIS
-
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
Paper • 2404.07973 • Published • 32 -
COCONut: Modernizing COCO Segmentation
Paper • 2404.08639 • Published • 30 -
GLIGEN: Open-Set Grounded Text-to-Image Generation
Paper • 2301.07093 • Published • 4 -
Grounded Language-Image Pre-training
Paper • 2112.03857 • Published • 3
Papers - Image - Training - OCR - High-Res Dense Alignment
Papers - Documents - UDOP
Papers - Image - Scientific Charts
Papers - University of Ulm
Papers - Image - Fine-tuning - CHIME-R and EconBiz datasets
Papers - Embeddings - Text - RoBERTA and BPE
Papers - Embeddings - Image - DiT and dVAE
Papers - Tokenizers - Text - T5
Papers - Classification - F1 Macro and F1 Micro
Papers - University of Panjab
Papers - Image - Connectionist Temporal Classification (CTC)
Papers - Courant Institute
Papers - Image - Coco - Annotation Pipeline
Papers - Image - Coco - Annotation RLHF
Papers - Video - NeRF - Real Estate Walkthroughs
Papers - Image - Datasets - ETH3D
Papers - Image - NeRF - Mesh - TSDF fusion RGBD sequences
Datasets - Research Papers - ARXIV QA
Papers - University of Auburn
Papers - Explainability - Image - VQA - CHM-Corr++
Audio Reading - 2404.08639 - COCONut
Read by Bark: https://drive.google.com/file/d/1qltkY31-013JDQn-u2pmnjPyCaUcOqsV/view?usp=sharing
Audio Reading - 2212.05525 - Extending TrOCR
Read by Bark: https://drive.google.com/file/d/1apmyvLMEQ97ObHKzQna9URFHF0Xg-EsO/view?usp=sharing
Audio Reading - 2404.07773 - ConsistencyDet
Read by Bark: https://drive.google.com/file/d/1AlHLzeUd04LXgDj99SOvmQJTy9chufGo/view?usp=sharing
Datasets - Audio - Large
Datasets - Audio - Multilingual - Large
Models - WizardLM
Models - Image - QA
Papers - Training Research - Dataset Ordering
Papers - Training - Education Stage then Cognitive Hierarchy
Papers - Llama 2
-
Instruction Tuning with Human Curriculum
Paper • 2310.09518 • Published • 3 -
A Thorough Examination of Decoding Methods in the Era of LLMs
Paper • 2402.06925 • Published • 1 -
Distilling System 2 into System 1
Paper • 2407.06023 • Published • 4 -
Byte Latent Transformer: Patches Scale Better Than Tokens
Paper • 2412.09871 • Published • 108
Papers - Training - Out of Vocabulary
Papers - University of Charles
Papers - Training - Filter Low Quality with Contriever
Papers - University of Ewha Womans
Papers - University - University of Michigan
Papers - Audio - Fine-tuning - Alpaca
Papers - Audio - Encoder - Variational Auto-Encoder (VAE)
Papers - University of North Carolina Chapel Hill
Papers - Megalodon - Unlimited Context
Papers - 3DGS - Compression
Papers - Inference - Multimodal
Papers - Inference - Speculative Decoding - Draft Model
Papers - Dataset Generation - Guide
Papers - Mamba - Bidirectional
Papers - Healthcare - Image - Cancer - Prostate
Papers - Research - Automated Research
Papers - Tinkoff AI
Papers - University of Pennsylvania
Papers - Inference - Image
Audio Reading - 2402.16827 - Survey on Data Selection ~3.5h
Reading by Bark: https://drive.google.com/file/d/1cdKmflJ3jRKszi5s3RF4Ru6nUw8tvpHX/view?usp=sharing reading duration 3 hours and 28 minutes
Papers - Pre-training - Warm-Start - Encoder and Decoders
Papers - Imperial College of London
Papers - Pre-training - Self-Supervised for Downstream Tasks
Papers - Pre-training - Warm-Start - Encoders - Unigram
Papers - Pre-training - Encoders - Bert
Papers - Pre-training - Warm-Start
Papers - Pre-training - Checkpoints
Audio Reading - 2310.09518 - Instruct with Human Curriculum
Read by Bark: https://drive.google.com/file/d/1fEZ8uwnfniMljZ5S60NxOav6Qav2A-XB/view?usp=sharing duration is 44m 12s
Datasets - Image - VQA
Papers - Inference - KV Cache
Models - Image - Embeddings
Papers - Stability AI
Papers - Audio - Decoders - DAC - No tanh activation
The DAC decoder tanh caused harmonic distortion
Papers - Audio - Embedding - Time - Sinusoidal Cross Attensi
Papers - Audio - Embedding - Clap - Timestep - Prepended
Papers - Attention - Block-wise
Papers - Audio - Musical Structure Analysis
Papers - Agent - Sima
Papers - Training - Video Games
Papers - Video Games - Survival
Papers - Video Games - Survival - Valheim
Papers - Video Games - Object Tools
Papers - Video Games - Environment Resource Planning
Papers - World Sim - Encoder - Video - Phenaki
An encoder-decoder model which compresses videos to discrete embeddings (tokens) and a transformer model to translate text embeddings to video tokens.
Papers - World Sim - Training - Classifier-Free Guidance
Fig 10. CFG substantially improves language conditionality.
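As a rough illustration of classifier-free guidance (CFG), the sketch below shows the commonly used combination rule: the unconditional prediction is pushed toward the conditional one by a guidance scale. This is the generic formulation, not necessarily the exact variant used in the paper this note refers to. The name `cfg_combine` and the toy vectors are hypothetical.

```python
from typing import List


def cfg_combine(uncond: List[float], cond: List[float], guidance_scale: float) -> List[float]:
    """Generic CFG rule: uncond + scale * (cond - uncond), element-wise.

    scale = 0 returns the unconditional prediction, scale = 1 returns the
    conditional one, and scale > 1 extrapolates past the conditional one.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


if __name__ == "__main__":
    unconditional = [0.0, 1.0]   # toy model output without the text prompt
    conditional = [1.0, 3.0]     # toy model output with the text prompt
    print(cfg_combine(unconditional, conditional, 2.0))
```

Larger guidance scales weight the text condition more heavily, which is consistent with the note that CFG improves language conditionality.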
Papers - Video - Phenaki
Papers - Video - Encoders - C-ViViT - MaskGiT
MaskGiT is trained to reconstruct masked tokens z predicted by a frozen C-ViViT encoder and conditioned on T5X tokens of a given prompt p0
Papers - World Sim - Embeddings - Text - T5X
Papers - GNN
-
On the Scalability of GNNs for Molecular Graphs
Paper • 2404.11568 • Published • 1 -
Octopus v4: Graph of language models
Paper • 2404.19296 • Published • 118 -
Architectures of Topological Deep Learning: A Survey on Topological Neural Networks
Paper • 2304.10031 • Published • 3 -
Meta Flow Matching: Integrating Vector Fields on the Wasserstein Manifold
Paper • 2408.14608 • Published • 8
Papers - GNN - Dataset - LargeMix
Papers - GNN - Benchmark - TDC
Papers - GNN - Benchmark - MoleculeNet
Papers - GNN - MPNN
Papers - GNN - Encoders - Positional and Structural Encoding
1) random walk diagonals 2) Laplacian eigenvectors for geometry and position 3) global structural information about the graph
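A minimal sketch of the first item, random walk diagonals, in plain Python: for each node it records the probability that a k-step random walk returns to its starting node, i.e. the diagonal of (D^-1 A)^k. The function name and the toy path graph are assumptions for illustration. The Laplacian eigenvector encoding in the second item needs an eigen-decomposition from a linear algebra library and is not shown here.

```python
from typing import List


def random_walk_diagonals(adj: List[List[float]], num_steps: int) -> List[List[float]]:
    """Return, per node, the return probabilities for walks of length 1..num_steps."""
    n = len(adj)
    # Row-normalise the adjacency matrix to get the transition matrix P = D^-1 A.
    trans = []
    for row in adj:
        degree = sum(row)
        trans.append([v / degree if degree else 0.0 for v in row])

    power = [row[:] for row in trans]  # holds P^k, starting at k = 1
    features: List[List[float]] = [[] for _ in range(n)]
    for _ in range(num_steps):
        for i in range(n):
            features[i].append(power[i][i])  # diagonal entry of P^k
        # Advance to P^(k+1) with a plain matrix product.
        power = [
            [sum(power[i][k] * trans[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
    return features


if __name__ == "__main__":
    # Path graph 0 - 1 - 2: the middle node always returns after two steps.
    path_graph = [
        [0, 1, 0],
        [1, 0, 1],
        [0, 1, 0],
    ]
    for node, feats in enumerate(random_walk_diagonals(path_graph, 3)):
        print(node, feats)
```

The end nodes and the middle node get different feature vectors even though the graph has no node attributes, which is what makes these diagonals usable as a structural encoding.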
Papers - GNN - MoIE
Papers - Healthcare - Molecules
Papers - GNN - Ensemble
Papers - Healthcare - Drug Discovery - GNN
Papers - University of Montreal
Papers - University of McGill
Papers - Healthcare - Image - Chest - X-ray
Papers - XAI
-
Explainable Lung Disease Classification from Chest X-Ray Images Utilizing Deep Learning and XAI
Paper • 2404.11428 • Published • 1 -
A Multimodal Automated Interpretability Agent
Paper • 2404.14394 • Published • 22 -
What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation
Paper • 2404.07129 • Published • 3 -
The Geometry of Categorical and Hierarchical Concepts in Large Language Models
Paper • 2406.01506 • Published • 3
Papers - XAI - Loc Interpretable Model Agnostic Explanation
LIME
Papers - University of Ahsanullah
Papers - Image - Visual Feature Extractor
Papers - Optimizer - Lamb
Papers - Training - 3D Parallelism - Back - Reduce-Scatter
Papers - Custom Layers - Feedforward Neural Network (FFN)
-
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs
Paper • 2402.15627 • Published • 38 -
One Wide Feedforward is All You Need
Paper • 2309.01826 • Published • 33 -
Fast Feedforward Networks
Paper • 2308.14711 • Published • 3 -
Memory Layers at Scale
Paper • 2412.09764 • Published • 5
Papers - Training Research - Fault Tolerance
Papers - Training - Parameter Reduction - FFN
Papers - Multilingual - Spanish
Papers - Emergent Properties
Papers - Emergent Properties - Exact String Match
Papers - Training - Epoch - 4 Epochs by Default
See Page 7 Figure 5 on right: Repeating for 4 epochs is almost as good as new data
Papers - Surge Global
Papers - Benchmarks - Toxicity
Papers - Fine-tuning - Reward Model
Papers - Datasets - Multilingual - Documents - Seahorse
contains documents and summaries in six languages (German, English, Spanish, Russian, Turkish, and Vietnamese) with pointwise human ratings
Papers - Inference - Speculative Decoding - KV Cache
Papers - KV Cache
-
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
Paper • 2404.11912 • Published • 17 -
SnapKV: LLM Knows What You are Looking for Before Generation
Paper • 2404.14469 • Published • 27 -
LLM in a flash: Efficient Large Language Model Inference with Limited Memory
Paper • 2312.11514 • Published • 260 -
An Evolved Universal Transformer Memory
Paper • 2410.13166 • Published • 6
Papers - Speculative Decoding - Draft - Base Model - JF68M
we utilize a 4K retrieval cache as an intermediate draft cache in our hierarchical system, while leveraging the JackFram/Llama68M (JF68M) [28] model
Papers - Speculative Decoding - Draft - Model - SpecInfer
Models - Speculative Decoding - Draft - SpecInfer
Papers - Speculative Decoding - Token Verification
Papers - Multimodal - Reka - Image Video Text Audio
Papers - Animation - Text
Papers - Video - Text Animation
Papers - Video - Score Distillation Sampling
Papers - Samsung
Papers - 3D - Mesh Generator
-
MeshLRM: Large Reconstruction Model for High-Quality Mesh
Paper • 2404.12385 • Published • 27 -
MaPa: Text-driven Photorealistic Material Painting for 3D Shapes
Paper • 2404.17569 • Published • 13 -
MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers
Paper • 2406.10163 • Published • 33 -
Meta 3D AssetGen: Text-to-Mesh Generation with High-Quality Geometry, Texture, and PBR Materials
Paper • 2407.02445 • Published • 4
Papers - Games - AlphaGo
Papers - University of Turku
Papers - Benchmarks - Image - QA - Abstract
Datasets - Benchmarks - Image
Datasets - Benchmarks - Image - Blink
Papers - International Human Phenome Institute
Papers - Datasets - Training - Context - LongBench
Papers - Attention - NoPE - Long Context with SoftMax Temp
Uniform scaling not as good as Head-based scaling
Papers - TinyLlama
Papers - Training - Eval - Sliding Window Perplexity
Papers - Training - Eval - Sliding Window - PG19
Papers - Context - NoPE vs RoPE - Passkey Retrieval Viz
Page 7 figure shows NoPE extending past the model's context size from pretraining or fine-tuning
Papers - Mila
Papers - ServiceNow
Papers - Training - Residual Connections
Papers - Positional Encodings
Papers - Embeddings - ALiBi
Papers - Encodings - No Positional Encodings - NoPE
-
The Impact of Positional Encoding on Length Generalization in Transformers
Paper • 2305.19466 • Published • 2 -
Transformers Can Do Arithmetic with the Right Embeddings
Paper • 2405.17399 • Published • 54 -
Teaching Transformers Causal Reasoning through Axiomatic Training
Paper • 2407.07612 • Published • 2 -
Round and Round We Go! What makes Rotary Positional Encodings useful?
Paper • 2410.06205 • Published • 2
Papers - Chain of Thought - Scratchpad
Papers - University - Hebrew University of Jerusalem
Papers - Benchmarks - Text - Classification - FewMany
Papers - Datasets - Weather
Papers - Historical - Weather
Papers - University - Berlin Technical University
Datasets - Benchmarks - Coding
Datasets - Text - CommonCrawl
Datasets - Text - Research Papers - QA - QASPER
Papers - Knowledge Graphs
-
RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
Paper • 2401.18059 • Published • 46 -
KGValidator: A Framework for Automatic Validation of Knowledge Graph Construction
Paper • 2404.15923 • Published • 2 -
The Geometry of Categorical and Hierarchical Concepts in Large Language Models
Paper • 2406.01506 • Published • 3
Papers - University - UCLA
Papers - Image - Phrase Grounding
Papers - Image - Grounded Captions
Papers - Text - Instruct - Grounding and Captions
Papers - Text - Legal - Remove Redaction
Papers - Benchmarks - Text - Text Anonymization Benchmark
Papers - Text - Encoders - Sentence Transformers (SBERT)
Papers - ML - XGBoost
Papers - University - Delft University
Papers - Attention - Gated Self-Attention - Spatial Grounding
Papers - Image - Object Detection - YOLO
-
GLIGEN: Open-Set Grounded Text-to-Image Generation
Paper • 2301.07093 • Published • 4 -
YOLO-World: Real-Time Open-Vocabulary Object Detection
Paper • 2401.17270 • Published • 42 -
DETRs Beat YOLOs on Real-time Object Detection
Paper • 2304.08069 • Published • 14 -
RT-DETRv2: Improved Baseline with Bag-of-Freebies for Real-Time Detection Transformer
Paper • 2407.17140 • Published • 2
Papers - Image - Keypoint
Papers - SQL - Database Migrations
Papers - SQL - Query Tree
Papers - Web - Agent
Papers - University - University of British Columbia
Papers - Coding - Defects
Papers - 3DGS - Motion
Papers - 3DGS - K-Means Clustering
Driving particles
Papers - Phi - Technical Report
Papers - Audio - Classifier-Free Guidance (CFG)
Papers - Image - Fine-tuning - LoRA
-
Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis
Paper • 2404.13686 • Published • 28 -
MultiBooth: Towards Generating All Your Concepts in an Image from Text
Paper • 2404.14239 • Published • 9 -
Stylus: Automatic Adapter Selection for Diffusion Models
Paper • 2404.18928 • Published • 15 -
MagicQuill: An Intelligent Interactive Image Editing System
Paper • 2411.09703 • Published • 78
Papers - XAI - Eval - Synthetic Vision Neuron
Papers - XAI - MAIA
Papers - Llama 3 - Fine-tuning - Quantization
Papers - Llama 3 - GPTQ AWQ PB-LLM BiLLM - 1.1-8 bits LoRA
Papers - Niantic
Papers - Coding - Automated Workflows
Papers - Investing - Document QA - SEC Filings
Papers - Image - Consistency Trajectory Model (CTM)
Papers - Security - Prompt Injection
Papers - Image - Multi-Concept Customization (MCC)
Papers - Image - Encoder - Single-Concept Learning - QFormer
Multi-modal Concept Extraction
Papers - Image - Synthetic Generator - Depth
Papers - Image - Datasets - CIFAR
-
All you need is a good init
Paper • 1511.06422 • Published • 1 -
Align Your Steps: Optimizing Sampling Schedules in Diffusion Models
Paper • 2404.14507 • Published • 23 -
Deep Residual Learning for Image Recognition
Paper • 1512.03385 • Published • 8 -
MoDE: CLIP Data Experts via Clustering
Paper • 2404.16030 • Published • 15
Papers - Activation Functions
Papers - Pre-training - Layer Initialization - LSUV
Papers - University - Czech Technical University
Models - Instruct - Context - 128k
Models - Text - Long Context
Papers - Command-R
Papers - Pre-training - Text - Cross-lingual
Papers - Twelve Labs
Papers - Audio - Discriminator - Adversarial Loss
Papers - Audio - Voice Conversion
Papers - University - Inner Mongolia University
Papers - Attention - Flash Attention
Papers - MobiLlama
Papers - Fine-tuning - PEFT
-
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
Paper • 2404.14619 • Published • 126 -
Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning
Paper • 2303.15647 • Published • 4 -
Hyper-X: A Unified Hypernetwork for Multi-Task Multilingual Transfer
Paper • 2205.12148 • Published • 2 -
No More Adam: Learning Rate Scaling at Initialization is All You Need
Paper • 2412.11768 • Published • 43
Papers - OpenELM
-
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
Paper • 2404.14619 • Published • 126 -
apple/OpenELM-450M
Text Generation • 0.5B • Updated • 593 • 26 -
apple/OpenELM-3B
Text Generation • 3B • Updated • 925 • 126 -
apple/OpenELM-3B-Instruct
Text Generation • 3B • Updated • 2.86k • 339
Models - OpenELM
Papers - Attention - Sparse Attention
Papers - Image - Mask2Former
Papers - Image - Detectron2
Papers - University - University of California Riverside
Papers - Image - Cost Reduction - Early Exit
Papers - Model - Inception
Papers - Image - Training - Per-class Regressor (PCR)
Papers - Healthcare - Mamba
Papers - Fine-tuning - Multilingual - Multi-task
Papers - Fine-tuning - Named Entity Recognition (NER)
Papers - University - University of Groningen
Models - Multilingual - Rag - Catalan, Spanish, English
Models - Audio - TTS - Catalan
Spaces - CoT
Papers - Coding - Knowledge Graphs
Papers - Rag - Knowledge Graphs
Papers - Data Extraction - OpenIE
https://stanfordnlp.github.io/CoreNLP/openie.html
Papers - Knowledge Graphs - Prompts
Papers - Quantexa
Papers - Apple - CoreNet
Papers - Image - Classification - WordNet synsets
https://wordnet.princeton.edu/
Papers - Mixture of Data Experts (MoDE)
Papers - Image - MoDE - Clip
Papers - Image - Pre-training - Distribution Clustering
Papers - Image - Encoders - MetaClip
Papers - Embeddings - Text - TF-IDF
Papers - MoE - Training - Expert Prioritization
Using SimCSE
Papers - Pre-Training - MoE - Train One Expert
Papers - MoE - Inference - Routing with Task Metadata
Papers - Image - Encoders - OpenClip
Papers - Image - Benchmarks - Clip
Papers - MoE - Routing - Softmax Normalization
Papers - 3DGS - Segmentation
Papers - 3D - Interactive - Semantic Editing based on Loss
Papers - 3D - Gaussian Splatting and NeRF
Papers - SenseTime Research
Papers - Benchmarks - Multimodal - SEED-Bench
Papers - ARC Lab
Papers - Inference - Draft Model - Early Exit - Dropout
Papers - Agent - Image
Papers - Agent - Tasks
-
LEGENT: Open Platform for Embodied Agents
Paper • 2404.18243 • Published • 22 -
Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations
Paper • 2404.17521 • Published • 13 -
Octopus v4: Graph of language models
Paper • 2404.19296 • Published • 118 -
AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning
Paper • 2402.15506 • Published • 18
Papers - Healthcare - Chain of Reasoning (CoR)
Papers - Training - Self-Guided with Search
Papers - Healthcare - VQA - Understanding
Papers - Healthcare - Biomedical Research
Papers - Healthcare - Prompts
Papers - Healthcare - Radiology Objects in Context (ROCO)
Papers - Healthcare - Benchmarks - Text - MMMU-HM
Papers - Healthcare - Benchmarks - VQA - MedVidQA
Papers - Healthcare - Benchmarks - Video - Cholec80-CVS
Papers - Speculative Decoding - Early Exit
Papers - Training - Multi-Model Evaluation
Papers - Training - Evaluation - Multi-Hop QA
Papers - Agent - Training
Papers - Agent - Evaluation
Papers - 3D - Blender
Papers - 3D - Texture Editing
Papers - Video - Robot Simulator - VQA
Papers - World Sim - Scene Generation
Papers - Image - Fine-tuning - Dataset - StylusDocs
Papers - Image - Datasets - DOCCI
Papers - Image - Annotation UI
Papers - 3DGS - Test - Dataset - Objaverse
Papers - SK Telecom
Papers - 3DGS - Tabular Structure Detection
Papers - 3DGS - Structure Preservation
Papers - Custom Layers - KAN
Papers - Nexa AI
Papers - California Institute of Technology
Papers - University - University College London
Papers - ICL - Induction Circuit
Two layer induction heads
-
What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation
Paper • 2404.07129 • Published • 3 -
In-context Learning and Induction Heads
Paper • 2209.11895 • Published • 2 -
pyvene: A Library for Understanding and Improving PyTorch Models via Interventions
Paper • 2403.07809 • Published • 1
Papers - Ensemble
Papers - Model Editing
Papers - Image - Multi-Caption Generation
Papers - Institute - Nankai Int Advanced Research Institute
Spaces - Comics and Cartoons
Papers - Emergent Properties - ICL - Induction Heads
Additional reading: https://transformer-circuits.pub/2021/framework/index.html
-
In-context Learning and Induction Heads
Paper • 2209.11895 • Published • 2 -
What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation
Paper • 2404.07129 • Published • 3 -
pyvene: A Library for Understanding and Improving PyTorch Models via Interventions
Paper • 2403.07809 • Published • 1
Papers - Custom Layers - Residual Connection - Ablation
Papers - XAI - Attention - Induction Heads
Papers - Attention - Ablation
Papers - Attention - Previous Token Head
Papers - Training Research - Clamping
Modifying activations during training with proper gradient flow
Papers - ICL - Induction Head - Num Labels vs Classes - Loss
Papers - Training - ICL - Induction Circuit Evolution
Papers - ICL - Phase Change - Delay - Classes and Labels
Papers - Pr(Ai)2R Group
Papers - ICL - Locating Early and Late Fact Associations
Papers - ICL - Phase Change Delay - Large Vocabulary Size
A larger vocabulary gives better compression, but may lengthen the ICL phase change delay during training because the Induction Head Copy Subcircuit (C) is slower
Models - MoE - Reward Model
Papers - Reward Model - Model Merging vs Joint Training
Datasets - Reward Model - Preference Collection
Papers - ICL - Residual Head Hypothesis
Papers - Dataset Storage - Parquet
Papers - Voltron Data
Papers - Dataset Storage - Zarr
Papers - Dataset Storage - Lessons Learned
Papers - Pre-training - Rerankers
Models - Coding - Code Interpreter - Agent - Multi-shot
Papers - FNet - Fourier Transformers
Papers - Mamba
-
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
Paper • 2405.21060 • Published • 67 -
Chimera: Effectively Modeling Multivariate Time Series with 2-Dimensional State Space Models
Paper • 2406.04320 • Published • 10 -
MambaMixer: Efficient Selective State Space Models with Dual Token and Channel Selection
Paper • 2403.19888 • Published • 12 -
Graph Mamba: Towards Learning on Graphs with State Space Models
Paper • 2402.08678 • Published • 17
Papers - Training - Cost Estimates
Papers - Training - Historical GPU Cost Trends
Papers - Reasoning - Complex - Alice in Wonderland - AIW
Papers - Text - Table Generation - Pandas DataFrames
Papers - Coding - Table and Calculations using Pandas
Models - Video - Captions
Papers - Chain of Thoughts - Multi-Shot - Buffer of Thoughts
Papers - Training - PAM faster vs MatMul - CPU
Papers - Training - Distribution Estimation - Autoregressive
Papers - Audio - Distribution Estimate - Spectrogram
Section 7.3.3
Papers - Datasets - Biology - SMILES
Papers - Healthcare - Text - Biology QA
Papers - Image - Fine-tuning - Llama
Papers - Image - Training Metrics - PSNR
Papers - Image - Tokenizers - VQGAN
Papers - Image - Metrics - Inception Score (IS)
Papers - Inference - Image - vLLM
Table 7
Datasets - Text - Characters
Papers - Image - Training - Loss - Gradient Estimator
Papers - Image - Training - Arch - 2D RoPE and SwiGLU
Papers - Image - Classifier-Free Guidance (CFG)
Guidance - "The intended effect is to decrease the diversity of the samples while increasing the quality of each individual sample."
-
Classifier-Free Diffusion Guidance
Paper • 2207.12598 • Published • 3 -
Adding Conditional Control to Text-to-Image Diffusion Models
Paper • 2302.05543 • Published • 57 -
Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models
Paper • 2404.07724 • Published • 14 -
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
Paper • 2406.06525 • Published • 71
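A minimal sketch of the guidance step described in the note above, assuming the common formulation that extrapolates from the unconditional prediction toward the conditional one. The function name `cfg_combine`, the `scale` parameter, and the plain Python lists standing in for noise-prediction tensors are illustrative assumptions, not taken from any of the listed papers' code. With `scale = 1 + w` this matches the `(1 + w) * cond - w * uncond` form used in Classifier-Free Diffusion Guidance.

```python
def cfg_combine(eps_uncond, eps_cond, scale):
    """Blend unconditional and conditional predictions.

    scale = 0 returns the unconditional prediction, scale = 1 returns the
    conditional one, and scale > 1 pushes further along the conditional
    direction (less diversity, typically higher per-sample quality).
    """
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("predictions must have the same length")
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy values standing in for per-element noise predictions (hypothetical).
uncond = [0.0, 1.0]
cond = [1.0, 3.0]
guided = cfg_combine(uncond, cond, scale=2.0)
```

In a real sampler the same arithmetic would be applied to tensors at each denoising step; the scale is usually a tuned hyperparameter.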
Papers - University - Heriot-Watt University
Papers - Image - Sampling - Variety, Fidelity, Truncation
Papers - Training - Process Reward Model
Papers - Image - InceptionResNet-v2
Papers - Image - Semantic Segmentation - Benchmark - PASCAL
Papers - Ant Group
Papers - Monte Carlo Tree Search (MCTS) - Math Reasoning
Papers - Image - Faster RCNN
Papers - Image - Faster RCNN - 2nd Stage - Box Classifier
Papers - Image - Human Pose Estimation - Coco
Papers - Image - Deep Fakes - Detecting Video Forgeries
Papers - Prompts - Report
Papers - Security - Pen Testing
Papers - Image - Diffusion - Parallel Denoising
Papers - Image - Denoising - Stride Denoising
Papers - Training - Multi-GPU
Papers - Coding - Training - Annotations
Papers - Coding - Inference - vLLM
Papers - Coding - Tokenizer - CodeBert
Papers - Coding - Tokenizer - Viz - Hierarchical Clustering
Papers - Coding - MCoder
Papers - University - Beijing Information Science and Tech
Papers - Coding - Fine-tuning - DeepSeekCoder
Papers - University - University of California Santa Cruz
Papers - SSMs - Testing - Time Series Forecasting Report
Papers - SSMs - Classification
Papers - Image - Augmentation - Edge Detection - HED
Models - Image - ControlNet - Canny
Papers - Image - Datasets - BSDS 500 - Berkeley Segmentation
Papers - Image - VGGNet
Papers - Image - CFG - CFG Resolution Weighting (CFG-RW)
Papers - 3DGS - Cone Scatter Initialization
Papers - Training - Preference Optimization - DiscoPOP
Papers - Training - Synthetic - Loss Functions
Papers - Image - OCR - Binarization - Otsu
Papers - Image - OCR - Binarization - DE-GAN
Papers - Image - OCR - Binarization - DocDiff
Papers - Image - OCR - CER (Character Error Rate)
Papers - Image - OCR - Metrics - PSNR, F-Measure, Fps
Papers - Image - OCR - Fine-tuning - CTC Loss Function
Papers - Datasets - Multimodal
-
DataComp: In search of the next generation of multimodal datasets
Paper • 2304.14108 • Published • 2 -
The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective
Paper • 2407.08583 • Published • 13 -
TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation
Paper • 2411.04709 • Published • 26 -
YFCC100M: The New Data in Multimedia Research
Paper • 1503.01817 • Published • 1
Papers - Image - Augmentation - Depth - MDE
Papers - Image - Augmentation - Plasma Fractals
Papers - Text - Training - Estimation - LDA
Spaces - Image - Stable Diffusion - 3 - Medium
Papers - Inference - Speculative Decoding
Papers - Embed - Duplex Models - Time-Division Multiplexing
Papers - Image - Charts - QA - Reasoning
-
CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs
Paper • 2406.18521 • Published • 29 -
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?
Paper • 2407.01284 • Published • 81 -
ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild
Paper • 2407.04172 • Published • 26
Papers - Image - Benchmarks - Charts
Papers - ICL - Prompt - Out of Distribution (OOD) Emergence
Papers - NTT Research
Papers - Coding - List Functions, Editing, Logos ASCII Art
Papers - Coding - Building Using Multi-shot Prompts
Models - Biology - Proteins - ESM
Papers - Image - Fine-tuning - LLaVA
-
HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale
Paper • 2406.19280 • Published • 63 -
PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance
Paper • 2411.02327 • Published • 11 -
MagicQuill: An Intelligent Interactive Image Editing System
Paper • 2411.09703 • Published • 78 -
LLaVA-o1: Let Vision Language Models Reason Step-by-Step
Paper • 2411.10440 • Published • 129
Papers - Math - TabMWP
Datasets - Text - Personas
Papers - Rag - Long Context
Papers - 3D - AssetGen
Papers - Text - Inferential Adversaries
Papers - Multimodal - Embeddings
Papers - 3DGS - Text - Enhance
Papers - 3DGS - Loss - Interval Score Matching (ISM)
Papers - 3DGS - Training - Model - Stable Diffusion 2
Papers - Multimodal - 3DGS and Text
Papers - Multimodal - Training - Joint Example Selection
Models - Text - Fine-tuning - Axolotl
Models - SAE - Sparse Auto Encoders
Repo: https://github.com/EleutherAI/sae
Papers - Multimodal - Training - Patch Aligning Layer
Papers - Multimodal - Training - Loss - Cross Entropy
Papers - Agent - Tools
Papers - Agent - Math - Reasoning
Papers - Decoders - Report
Datasets - CoT
Papers - XAI - Token Tracing - Model MLP Layers Plots
Papers - RL - Gradient-Boosting
Papers - RL - Actor-Critic
Papers - RL - Structured Data - Gradient Boosting
Papers - Datasets - Multimodal - Creator Guide
Papers - Decoders - Deterministic - Diverse Beam Search
Papers - Decoders - Stochastic
Papers - Decoders - Deterministic - Greedy Search
Papers - Decoders - Deterministic - Contrastive Decoding
Papers - Decoders - Stochastic - Typical Sampling
Papers - Decoders - Stochastic - Top-p Sampling
Papers - Decoders - Stochastic - n-Sampling
Papers - Coding - Datasets - MBPP
Papers - Text - Benchmark - Translation - BLEU
Papers - Text - Benchmark - Instructions - AlpacaEval
Papers - Image - Reasoning
Papers - Encodings - LPE - Learnable Position Encodings
Papers - Text - Dataset - Knowledge Graph - WordNet
Papers - Knowledge Graph - GraphRag - WordNet
Papers - CoT - Branch Solve Merge (BSM)
Models - Text - Embedding
-
nomic-ai/nomic-embed-text-v1
Sentence Similarity • 0.1B • Updated • 799k • 544 -
nomic-ai/nomic-embed-text-v1.5-GGUF
Sentence Similarity • 0.1B • Updated • 11.2k • 78 -
BAAI/bge-large-en-v1.5
Feature Extraction • 0.3B • Updated • 4.3M • 590 -
mixedbread-ai/mxbai-embed-2d-large-v1
Feature Extraction • 0.3B • Updated • 900 • 40
Papers - ICV - In-Context Vectors (controllable ICL)
Repo: https://github.com/shengliu66/ICV
Papers - ICL - Attention
Papers - Text - Detoxification
Papers - Text - Toxicity - Feature Shifting
Papers - Fine-tuning - Text - Detoxification - LoRA
Papers - Text - Datasets - Formality - Yahoo Answers
Papers - Text - Datasets - Sentiment Transfer - Yelp Reviews
Papers - Vicuna
Papers - Text - Benchmark - Similar - Feature - Bert-Score
Papers - Text - Personalization - Positivity
Papers - ICV - Strength - Tradeoffs Similarity and Fluency
Papers - Text - Role-Play - Style - Speaking
Papers - Text - Activation Editing
Papers - Text - Task Arithmetics - Fine-tune vs Base
Papers - Text - Formality - Classifier - XLM-RoBERTa
Papers - Attention - Dual Chunk
Papers - Activation - SwiGLU
Papers - Training - Data Annotation
Papers - Benchmarks - Text - Long Context - LV-Eval
Papers - Benchmarks - Text - Long Context - NeedleBench
Papers - Text - Long Context
Papers - 3DGS - Benchmarks - LPIPS
Papers - 3DGS - Editing - Appearance Interpolation
Papers - 3DGS - Benchmarks - SSIM
Papers - 3DGS - Fibonacci Sphere Sampling - Sky Handling
Papers - Quantization - EfficientQAT
Papers - Audio - Text - Music Generator
Papers - Security - Red Team - Agents
Papers - Text - Linguistic Agency - Algospeak
Papers - Text - Cognitive Science - Linguistic Agency
Papers - Text - Linguistics - CYOA Game Exploration
Papers - Visualizations - Report
Papers - Visualizations - High Dimensional Approximations
Papers - Visualizations - Graphical Taxonomy
Papers - Math - Non-Euclidean Geometry
Papers - Math - Topology - Discrete Topological Structures
Papers - Math - Geometry - Distance - Riemannian Metric
Papers - Math - Research - Training Loss - Riemannian Metric
Papers - Math - Geometry - Continuous Geometric Structures
Papers - Training - Energy - Carbon Footprint
Papers - Coding - Hardware - FPGA
Papers - Attention - Topology, Geometry and Algebra
See also: https://dawn.cs.stanford.edu/2019/10/10/noneuclidean/
Papers - Training - Math - PCA
See also: https://dawn.cs.stanford.edu/2019/10/10/noneuclidean/
-
In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering
Paper • 2311.06668 • Published • 5 -
Beyond Euclid: An Illustrated Guide to Modern Machine Learning with Geometric, Topological, and Algebraic Structures
Paper • 2407.09468 • Published • 2 -
Efficient Algorithms for t-distributed Stochastic Neighborhood Embedding
Paper • 1712.09005 • Published • 1
Papers - Image - PhotoMaker
Papers - Math - Group Action - Translate, Rotate, Reflect
Papers - Healthcare - CoT - Diagnosis
Papers - Math - Training - Topological Structures
Papers - Graphs
Papers - Training - Noise - Labels
Papers - Attention - Algebra - Equivariant
Papers - Encodings - Equivariant Positional Encodings
Models - Text - Fine-tuning - SPPO
Papers - KAN
Papers - Training - Activation - Nonlinear - B-spline
Papers - Multilingual - Malaysian
Papers - Training - Research - Data as Coordinates
Papers - Math - Non-Euclidean - Covariance Matrix - SPD
Models - Bitnet - Layer Conversion
Papers - Reasoning - Grokking
Papers - Math - Riemannian Manifolds - PCA
Papers - NEML - Frechet Mean - Consistency Bias
Models - Image - Rectified Flow Transformers
Papers - Math - Self-Compressing Models
Papers - Coding - DBA
Papers - Audio - Segmentation - Music - Vocals
Papers - Georgia Institute of Technology
Papers - Image - Training - Instruct - VQA - Multi-Image
Papers - Fine-tuning - LoRA - Rank Stabilized Adapters
Papers - NEML - Math - KNN with Geodesics and Frechet Mean
Recommended to explore constrained manifold areas with limited curvature (possible bias)
Models - Coding - Compiler
Papers - NEML - Latent Manifold - Topological Data Analysis
Papers - NEML - Latent Structure - Algebra - Group Learning
Papers - NEML - Manifold - Local Geodesic Regression
Papers - Image - Multi-Image
Papers - mPLUG
Papers - Math - Regression - Geometric Structures
Papers - NEML - Manifolds Geometric - Bezier Splines
Papers - NEML - Regression - Manifold - Weighted Frechet
Papers - NEML - Regression - Local Frechet Regression
Papers - NEML - Manifold Random Forest
Papers - NEML - Manifold IO - Steinke Regular Splines
Papers - NEML - Geometric Structures - Dim Reduction - tSNE
Papers - NEML - Geometric - Dimension Reduction - Isomap
Papers - NEML - Geometric - Dimension Reduction - Rie-SNE
Papers - NEML - Geometric - Dim Reduction - Poincare Embeds
Papers - NEML - Non-Euclidean Machine Learning
See also: "A Riemannian Framework for Tensor Computing" https://inria.hal.science/inria-00070743/file/RR-5255.pdf
-
Beyond Euclid: An Illustrated Guide to Modern Machine Learning with Geometric, Topological, and Algebraic Structures
Paper • 2407.09468 • Published • 2 -
Barycentric Subspace Analysis on Manifolds
Paper • 1607.02833 • Published • 1 -
Poincaré Embeddings for Learning Hierarchical Representations
Paper • 1705.08039 • Published • 1 -
Efficient Algorithms for t-distributed Stochastic Neighborhood Embedding
Paper • 1712.09005 • Published • 1
Papers - NEML - Hyperbolic Learning - Poincare Ball
Papers - NEML - Datasets - WordNet
Papers - NEML - Euclidean Latents - Decoder - Riemannian LLE
Papers - NEML - Manifold Latents - Hypersphere VAE
Papers - NEML - Manifold Latents - Toroidal Latent Space
Papers - NEDL - Non-Euclidean Deep Learning
See also: https://dawn.cs.stanford.edu/2019/10/10/noneuclidean/
-
Beyond Euclid: An Illustrated Guide to Modern Machine Learning with Geometric, Topological, and Algebraic Structures
Paper • 2407.09468 • Published • 2 -
Architectures of Topological Deep Learning: A Survey on Topological Neural Networks
Paper • 2304.10031 • Published • 3 -
Galois Theory
Paper • 2408.07499 • Published • 1 -
Equivariant Transformer Networks
Paper • 1901.11399 • Published • 1
Papers - NEDL - Layer - Perceptron-Exp - Riemannian Expo Map
The manifold needs to be known for this layer to be implemented, and manifolds whose Exp enjoys an analytical expression are preferred
Papers - NEDL - Model Layers - Topology, Geometry, Algebra
Papers - NEDL - Attention - Equivar - Steerable Transformers
Euclidean signal on Euclidean domains for keys, queries and values, with group action on codomain.
Papers - NEDL - Geometry - Layers - ManifoldNet
Manifold-valued data convolutions. The tangent mean can be used in place of the Fréchet mean to save on compute
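A rough sketch of the tangent-mean shortcut mentioned in the note above, worked on the unit sphere because its Exp and Log maps have simple closed forms. The choice of manifold, the function names, and the fixed iteration count are assumptions made for illustration and are not taken from ManifoldNet. The tangent mean is one log-average-exp step from a base point; repeating that step is a standard way to approach the Fréchet mean, which is where the extra compute goes.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _norm(u):
    return math.sqrt(_dot(u, u))


def log_map(p, q):
    """Tangent vector at p pointing toward q on the unit sphere.

    Note: antipodal points (q == -p) have no unique log; this sketch
    returns the zero vector in that degenerate case.
    """
    c = max(-1.0, min(1.0, _dot(p, q)))
    theta = math.acos(c)                        # geodesic distance
    w = [qi - c * pi for pi, qi in zip(p, q)]   # part of q orthogonal to p
    n = _norm(w)
    if n < 1e-12:
        return [0.0] * len(p)
    return [theta * wi / n for wi in w]


def exp_map(p, v):
    """Walk from p along tangent vector v, staying on the unit sphere."""
    n = _norm(v)
    if n < 1e-12:
        return list(p)
    return [math.cos(n) * pi + math.sin(n) * vi / n for pi, vi in zip(p, v)]


def tangent_mean(points, base):
    """One step: average the points in the tangent space at `base`."""
    logs = [log_map(base, q) for q in points]
    avg = [sum(col) / len(points) for col in zip(*logs)]
    return exp_map(base, avg)


def frechet_mean(points, base, iters=50):
    """Repeat the tangent-mean step; costs `iters` times more work."""
    m = list(base)
    for _ in range(iters):
        m = tangent_mean(points, m)
    return m
```

For tightly clustered points the single-step tangent mean is already close to the iterated result, which is the trade-off the note refers to.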
Papers - Function Calling
Papers - Agent - Web Navigation
Papers - Video - Segmentation
Papers - Image - Summarize as JSON
Papers - NEDL - Research - Symmetry - Group - Galois Groups
Papers - NEDL - Topology - Persistent Homology
Papers - Music - Piano - Performer - Robot - Motion
Papers - Music - Training - Segmentation - Piano
Papers - Audio - Pipeline - Annotation - Finger Placement
Papers - NEDL - Lie Groups
Papers - NEDL - Embeddings - Hyperbolic
Papers - Benchmark - Tables - Reasoning - QA
Papers - Normalization - NLP - Layer vs Batch
Papers - Normalization - No Normalization - Fixup
Papers - ResNet - Training - Init - Exploding Gradients
Papers - Training - Layers - Scalar - Bias and Multipliers
Papers - Training - Feature Space Cluster - Fisher Criterion
Papers - NEDL - Topology - Attn - Point Cloud Transformer
Papers - NEDL - Topology - Attn - Graph Attn Transformer
Papers - NEDL - Dim Reduction - Principal Geodesic Analysis
Papers - NEDL - Latent Space Manipulation
Papers - Text - Survey
Papers - Text - Benchmark - QA - Knowledge Conflicts
Spaces - Image - Prompt with LoRA
Papers - NEDL - Geometry - Wasserstein Manifold
Models - Image - Llava
Papers - Training - Multi-Task Learning - Jacobian Descent
Repo: https://github.com/TorchJD/torchjd
Papers - Training - Hardware - Survey
Papers - Benchmarks - Data Science
Models - Text - Math - GRPO
Papers - Text - Metric - Hamming
Papers - Text - Classification - FastText
Papers - Reasoning - Code Training
Papers - RL - GRPO - Group Relative Policy Optimization
Papers - Coding - Agent - Arch - Multi-Turn Learning
Papers - Coding - Agent
Models - Training - Reinforcement Learning - Reasoning
Papers - Fine-tuning - Preference Opt - Reward Free
Papers - Fine-tuning - GRPO
Papers - Training - Pipeline - Memory - ZB-V
Papers - Training - Pipeline - Zero Bubble Rate - 1F1B
Papers - Training - Pipeline - Scheduler - Multi-GPU
Papers - Training - Distributed Pipelines - Parallel
Models - Text - Video
Papers - Tokenizers - World Sim
Models - 3D Asset Generator - Image to 3D Mesh
Papers - Training - MoE - Expert Choice
Papers - Interpretability - MoE
Papers - Training - MoE
Papers - Custom Layers - PEER - Single Embedding
Papers - Attention - Gating - Input - Silu Non-linearity
Papers - Training - Memory Augmented
Papers - Custom Layers - Memory - Index
Papers - Coding - Java
Papers - Coding - Multilingual
Papers - Coding - Dataset - BigCloneBench - BigCloneEval
https://github.com/clonebench/BigCloneBench and https://github.com/jeffsvajlenko/BigCloneEval
Papers - Coding - Bug Fixing
Papers - Coding - Defect Detection
Papers - Coding - Understanding - Masking - Cloze Test - CT
Papers - Coding - CodeBert
Papers - Coding - Understanding
Papers - Coding - Dataset - Compiler - IR - DeepDataFlow
https://zenodo.org/records/4247595
Papers - Coding - Datasets - POJ-104
Papers - Coding - Classification - Algo Prediction - XFG
Papers - Coding - GGNN
Papers - Coding - GNNs - CDFG
Papers - Coding - Classification - Instruction - inst2vec
Papers - Coding - Compilers - Global Common Subexpression
Papers - Coding - Compiler Optimization - IR
Papers - Coding - Training - Classification - Algorithms
Papers - Coding - Compilers - IR - Call Flow
Papers - Coding - GNN - IR
Papers - Coding - Embeddings - Compiler - IR2Vec
Papers - Coding - Control Flow
Papers - Coding - Rag - Compiler IR
Papers - Coding - IR - Intermediate Representations
Papers - Coding - LLVM
Papers - Coding - Compilers - LLVM-IR
Papers - Coding - GNNs
Papers - Coding - Compilers
Papers - Diverse Intelligence
Papers - Morphogenesis - Coding - Sorting
Papers - NEDL - Research - Structures - Coding - Sorting
Papers - Coding - Sorting
Papers - Coding - Rust - Traits
Papers - Coding - Rust - Memory - Borrow Checker
Papers - Coding - Split Trees
Papers - Coding - Static Analysis
Papers - Coding - Safe Rust
Papers - Coding - Translate - C to Rust - Repo - EverParse
Papers - Coding - Translate - C to Rust - Repo - HACL
Papers - Coding - Port to Rust
Papers - Coding - Rust - Memory
Papers - Coding - Rust
Papers - Coding - Translation - C++ to Rust
Models - Embeddings - Text - Research Papers - Arxiv
Models - Text - Chat - Research Papers - Arxiv
Papers - Embeddings - Freq n-gram Hash - Vocabulary Impacts
Papers - Embeddings - n-gram Hash - Vocabulary
Papers - Text - Eval - Character Level - CUTE
Papers - Multilingual - Encoders - Bytes
Papers - Training - Bytes - Dynamic Patch Sizes
Papers - Text - Dataset - Classification - Multitask - MMLU
Datasets - Text - Classification - Multitask
Papers - Text - Dataset - Coding - MBPP
Papers - Text - Eval - Coding - Python
Papers - Embeddings - Bytes - BPB - Larger Patches than BPE
Papers - Text - Dataset - Datacomp-LM
Papers - Embeddings - Bytes - Tokenizer Free
Papers - Training - Text - Datasets - Coding - GitHub
Papers - Text - Character Level Transformers
Papers - Text - Character Level RNNs
Papers - Training - Bytes - Lookup - Rolling Poly Hashing
Papers - Training - Scaling - Bytes - BLT >= BPE Tokenizer
Papers - Training - Scaling - Compute Optimal
Papers - Attention - Flex Attention
https://pytorch.org/blog/flexattention/
Papers - Embeddings - Bytes - BPB - Tokenizer-Free Perplexity
Papers - Embeddings - Bytes - Flops - Input Layer Lookup
Papers - Training - Embeddings Model - Bytes - Entropy Model
Papers - Attention - Bytes - Patch Cross Attention
Papers - Attention - Bytes - MHA Cross Attention - Perceiver
Papers - Embeddings - Text - Byte - Hash ngrams
Papers - Attention - Block Causal
Papers - Tokenizers - Bytes - Incremental Patching
Note: BPE does not handle incremental patching like BLT
Papers - Tokenizers - Bytes - Entropy Patching - Threshold
Helps with finding the end of the byte patch
Papers - Tokenizers - Bytes - Space - First Char - Patch Len
Papers - Tokenizers - Bytes - Patches - Space Detection
Papers - Tokenizers - Bytes - Patches - Entropy-based
Patch start detected by entropy crossing a threshold
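A small sketch of the threshold rule in the notes above: a new byte patch starts wherever the next-byte entropy crosses a threshold. The per-byte entropies would normally come from a small byte-level entropy model; here they are a hand-written, hypothetical list, and the names `shannon_entropy` and `entropy_patches` are illustrative rather than taken from the Byte Latent Transformer code.

```python
import math


def shannon_entropy(probs):
    """Entropy in bits of one next-byte distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def entropy_patches(data, entropies, threshold):
    """Split `data` into patches, starting a new patch at every position
    whose entropy is above `threshold`.

    entropies[i] is assumed to be the model's uncertainty about byte i
    given the bytes before it.
    """
    if len(data) != len(entropies):
        raise ValueError("need one entropy value per byte")
    patches = []
    start = 0
    for i in range(1, len(data)):
        if entropies[i] > threshold:
            patches.append(data[start:i])
            start = i
    if data:
        patches.append(data[start:])
    return patches


# Hypothetical entropies: high at the start of each word, low inside it.
text = b"hello world"
ents = [3.0, 0.5, 0.4, 0.3, 0.2, 0.9, 2.5, 0.4, 0.3, 0.2, 0.1]
patches = entropy_patches(text, ents, threshold=2.0)
```

Raising the threshold yields fewer, longer patches; lowering it yields more, shorter ones.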
Papers - Tokenizers - Bytes - Strided Patches - MegaByte
Papers - Text - Tokenizer - Bytes - Strided Patches
Papers - Training Research - Bytes - No Vocabulary
Papers - Audio - STT - ASR - wav2vec
Papers - Audio - Contrastive Task - Quantized - Speech
Papers - Audio - Training - Mask Len Distribution - Ablation
Papers - Audio - Training - Masking - Time Steps
Papers - Audio - Viz - Phoneme - Conditional Probability
Papers - Audio - Training - Self-Supervised - Unlabeled Data
Papers - Audio - Fine-tuning - Decoder only - SpecAugment
Papers - Audio - Fine-tuning - Metric - WER
Papers - Audio - Pretraining - Fairseq
Papers - Audio - Dataset - Phoneme Recognition - TIMIT
Papers - Audio - Dataset - LibriVox
Papers - Audio - Dataset - Librispeech
Papers - Audio - Fine-tuning - Loss - CTC
Papers - Audio - Training - Activation - Gumbel Softmax
Papers - Audio - Training - Activation - Gelu
Papers - Audio - Training - Loss - CTC
Spaces - Reasoning
Models - Image - ViT
Papers - Encodings - BBPE - Byte level byte pair
Papers - Tokenizer - Qwen
Papers - Attention - QKV Bias - RMSNorm with Pre-normalization
Papers - Training - Activation Function - SwiGLU
Models - Qwen
-
Qwen/QwQ-32B-Preview
Text Generation • 33B • Updated • 25.2k • 1.74k -
bartowski/Qwen2.5-Coder-14B-Instruct-GGUF
Text Generation • 15B • Updated • 2.74k • 38 -
bartowski/Qwen2.5-Coder-32B-Instruct-GGUF
Text Generation • 33B • Updated • 22.2k • 92 -
bartowski/Qwen2.5-72B-Instruct-GGUF
Text Generation • 73B • Updated • 16.3k • 38
Papers - Training - Algorithm - SGD vs Adam vs Prodigy
Papers - Training - SGD - SGDM - SGD with Momentum
Papers - Training - CNN
Papers - Training - Eval - Mix of Show
Papers - Training - LR - Optimizer - SGD-Sal
Papers - Training - LR - Optimizer - Prodigy
Papers - Pretraining - Image - ViT
Papers - Pretraining - Image
Datasets - Text - E2E
Papers - Training - SGD - Regularization
Papers - Training - SGD - Decoupled Weight Decay
Papers - Training - PyTorch
Papers - Training - LR - Gradient Local Gain - Variance
Papers - Training - LR - Gradient Signal to Noise Ratio
Papers - Training - Layer Initialization
Papers - Training - LR - Learning Rate
Papers - Training - Adam
Papers - Training - Optimizers
Papers - Training - Dataset Selection - Spectrogram Features
Papers - Training - Backward Masking
Papers - Training - Feature Extraction - Frequency - STFT
Papers - KV Cache - Spectrogram
Papers - Attention - Spectrogram - KV Cache
Papers - Text - Midtraining - Rag - Recall - Rerank - ICL
Papers - Training - Midtraining - Context Length
Papers - Text - Training - Dataset Selection - Filtering
Papers - Text - Training - Mixture
Papers - Text - Datasets - Math - AMC
Papers - Training - Eval - Out of Distribution
Papers - Training - Overfitting - Decontamination
Papers - Pretraining - Synthetic Data - Problem Solving
Papers - Pretraining - Synthetic Data - Reasoning
Papers - Fine-tuning - DPO - Pivotal Token Search
Papers - Training - Scaling Laws - Scaling Consistency
Models - Image - Sketch - Pencil
Papers - Training - Text - Vocabulary - SentencePiece
Papers - Encoders - Bytes - More Depth than Decoder
Papers - Training - Token Free - Bytes or Characters
Papers - Training - Bytes - No Tokenizer
-
ByT5: Towards a token-free future with pre-trained byte-to-byte models
Paper • 2105.13626 • Published • 3 -
Byte Latent Transformer: Patches Scale Better Than Tokens
Paper • 2412.09871 • Published • 108 -
MrT5: Dynamic Token Merging for Efficient Byte-level Language Models
Paper • 2410.20771 • Published • 3
Papers - Audio - Encoders - Bert
Papers - Reinforcement Learning - Video Games
Papers - Video Games - Starcraft 2
Papers - Training - Speed - Reduced Training Time
Papers - Fine-tuning - Decoder Only - Frozen Encoder Weights
Papers - CoT - Latent Search Tree
Papers - Reasoning - CoT - Tree Search - BFS
Papers - 3D - SLAT - Structure Latents
Papers - Reasoning - CoT - MCTS
Papers - Training - Sparse Learning - k-Sparse Autoencoder
Models - Biology - Protein - SAE
Models - Text - SAE
Papers - Robotics - Lie Groups
Papers - Robotics
Papers - Math - Differential Geometry - Lie Theory
Papers - Math - Differential Geometry
Papers - NEDL - Differential Geometry - Visualizations
Papers - Training - Convergence - Gaussian Kernel
Papers - Math - SGD - Stochastic Gradient Descent
Papers - Training - Convergence - SoftMax vs SGD
Papers - Training - Convergence - Stoch Gradient Descent
Papers - Training - Convergence - Kernel - Gaussian
Papers - Training - Convergence - SoftMax
Papers - Image - Normalizing Flows
Papers - Image - Rectified Flows
Papers - Image - Diffusion - SBDM
Papers - Image - DDPM - SDE
Papers - Image - Diffusion Coefficient - Fokker-Planck
Papers - Image - Diffusion Coefficient - Deterministic
Papers - Image - Diffusion Coefficient - Stochastic
Papers - Image - Training - Sampler - SDE
Papers - Image - Training - Sampler - ODE
Papers - Image - Datasets - Oxford Flowers
Papers - Training - Gaussian Mixtures - Bridging
Papers - Image - Diffusion - Stochastic Interpolants
Papers - Video - Generator - Multiple Views
Papers - Text - SAE - Sparse Autoencoders
-
Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models
Paper • 2411.14257 • Published • 14 -
Scaling and evaluating sparse autoencoders
Paper • 2406.04093 • Published • 3 -
Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2
Paper • 2408.05147 • Published • 40 -
Disentangling Dense Embeddings with Sparse Autoencoders
Paper • 2408.00657 • Published • 1
Papers - SAT Solver - GNN
Papers - SAT Solver
Papers - Training - Supervised - Classification
Papers - BitNet - Research - Classification - SAT Solvers
Papers - SAT Solver - NeuroSAT
Papers - Training - Classification - Bit - SAT Solver
Spaces - Image - Generation - High Res - Wide
Papers - Visualizations - GPU Programming - Memory
Papers - Attention - GPU Programming - Kernel - Cuda
Papers - ICL - Text - Classification - Label - Unique Words
Papers - ICL - Text - Prompts - Learning Unique Words
Papers - Text - Encoders - DeBERTa
Papers - RL - Text - Prompts - ASCII ART - Game Board
Papers - RL - Text - Prompts - Navigation - Maze Running
Papers - RL - PC Board Games - Chess - TicTacToe
Papers - RL - Monte Carlo
Papers - RL - Natural Language
Papers - NEDL - Training - Hyperparameters
Papers - Math - Distance - Spearman Correlation
Papers - Math - Distance - Pearson Correlation
Papers - Math - Distance - Chebyshev Polynomials
Papers - NEDL - Embedding - Potential Distance
Papers - NEDL - Embedding - Potential Distance - PHATE
Papers - NEDL - Train - Diffusion - Geodesic
Papers - NEDL - Geodesic Symmetry - Harnack Inequality
Papers - NEDL - Embedding - Geodesic - Euclidean Distance
Papers - Biology - Dataset - RNA - Swiss Roll
Papers - Biology - RNA - Sequencing
Papers - NEDL - Embedding - Heat Geodesic
Papers - Reasoning - Visualization - Pearson’s R
Papers - Training - Scaling - Influence Functions
Papers - Training - Influence Functions - EK-FAC
Papers - Inference - CPU - Apple
Papers - Inference - CPU - Intel
Papers - Inference - CPU - Intel vs Apple - BitNet
Papers - NEDL - Visualization - Non-linearity - tSNE
Papers - Fine-tuning - Multimodal - Contrastive Learning
Papers - NEDL - Fine-tuning - Multimodal Mixup
Papers - NEDL - Fine-tuning - Embedding Shift
Papers - NEDL - Fine-tuning - Geometric Contrastive Learning
Papers - Image - Datasets - SIMAT
Papers - NEDL - Hypersphere
Papers - NEDL - Geodesic
Papers - Training - Cauchy-Schwarz Inequality
Papers - Training - PCA - Kernel
Papers - Training - Non-linear Learning - Lipschitz
Papers - Training - Gradient Descent - Kernel
Papers - Training - Non-linear Learning - Kernel
Papers - Training - Non-linear Learning
Papers - Training - Kernel
Papers - Text - 3D Mesh - Fine-tuning - LLaMa
Papers - Text - Fine-tuning - Loss - CCE - Triton
Papers - Fine-tuning - Memory Reduction Techniques - Text
Papers - Gemma 2 - Fine-tuning
Papers - Mistral - NeMo - Fine-tuning
Papers - Text - Training - Vocabulary Sorting
Papers - Text - Training - Gradient Filtering
Papers - Text - Train - Vocab - Dense Blocks Common Tokens
Papers - Text - Training - Loss - Cuda - Triton - SRAM
Papers - Triton
Papers - Text - Training - Loss - Cut Cross Entropy
Papers - Text - Training - Batch Scaling - Cut Cross Entropy
Papers - Text - Training - Large Vocabulary - CCE
Spaces - Image - Editing a Picture
Papers - Image - Fine-tuning - Dataset - Hand Drawn - DCI
Papers - Image - Fine-tuning - Editing - LAION-Aesthetics
Papers - Fine-tuning - Image - LLaVA
Papers - Image - Benchmarks - Editing - BrushBench
Papers - Image - Editing - BrushNet
Papers - Image - Generation Quality Models - Aesthetic Score
Papers - Image - Generation Quality Models - HPS
Papers - Image - Generation Quality Models - Image Reward
Papers - Image - Guidance - Masked Image Guidance
Papers - Image - BrushNet
Papers - Text - Bit Strings - Hamming Distance
Datasets - Coding - GitHub Issues
Models - Embedding - Multimodal
Papers - Text - Embedding - Noise - In-Batch Deduplication
Papers - Fine-tuning - Text - Embedding
Papers - Fine-tuning - LoRA - Text - Embedding - Sentence
Papers - Text - Embedding - Angle Optimization
Papers - Text - Datasets - GitHub Issues
Datasets - Text - GitHub Issues
Models - Text - Embedding - Multilingual
Models - Text - Embedding - Sentence - German and English
Models - Text - Reranker
Papers - Text - Embedding - Sentence
Models - Text - Embedding - MRL
Models - Text - Embedding - Matryoshka Representation Lang
Papers - Embedding - Text - Sentence - 2DMSE
Papers - Embeddings - Text - Sentence - Matryoshka
Models - Text - Sentence Embedding - Binary Quantization
Models - Text - Sentence Embedding
Papers - CoT - Arch - Reasoning - Layer Depth vs Wider Layer
Papers - Math - Generate - Synthetic Data - CoT
Papers - Math - Generate Synthetic Data
Papers - Benchmarks - Math - Reasoning - GSM-Symbolic
Papers - Flow Matching - Data Generation - XGBoost
Papers - Text - Datasets - Math - Reasoning - iGSM
Papers - Fine-tune - Text - Retry
Papers - Text - Training - Retry
Papers - Audio - Tokenizer
Papers - Image - MiniCPM
Papers - Text - Embedding - Sentence - R-BM25
Papers - Text - Embedding - Sentence - BM25
Papers - Text - Embedding - Sentence - SONAR
Papers - Text - Datasets - Flores-200
Papers - Text - Encodings - RoBERTa
Papers - Inria
Papers - Text - Machine Translation
Papers - Healthcare - Image Segmentation
Papers - Image - OOD - Out of Distribution
Papers - Image - Guidance - PAG - Perturbed Attention Guidan
Papers - Image - Guidance - Smooth Energy Guidance (SEG)
Papers - Text - Training - Complex Vector Token Representati
Papers - Text - Training - Wave Net
Papers - Text - Encodings - Complex Vectors
Papers - Text - Embedding - Fixed Token - Skip-gram
Papers - Text - Embedding - Fixed Token - CBOW
Papers - Fine-tuning - LoRA - Intruder Dimensions
Papers - Image - Fine-tuning - Clip - Self-supervision
Papers - Text - Inference - Early Stop - Filter Layers
Papers - Image - Training - Contrastive Loss - Batch Size
Papers - Image - Training - Batch
Papers - Image - Fine-tuning - DPO
Papers - Image - Datasets - XM-3600
Papers - Text - Tokens - Vocabulary - Zipfian
Papers - Text - Tokens - Vocabulary - Herdan’s Law
Papers - Text - Tokens - Vocabulary - Heaps' Law
Papers - Image - Visual Tokens
Papers - Image - Zipf
Papers - Datasets - Multimodal - YFCC100M
Papers - Training - Image - SLIP
Papers - Datasets - Visualization - WizMap
Papers - Fine-tuning - Video - Video Masked Encoder
Papers - Datasets - Image to Video
Papers - Datasets - Text to Image
Papers - Fine-tuning - Self-Consistency - ScPO
Papers - Training - Self-Alignment
Papers - Healthcare - CoT
Papers - Healthcare - Reasoning
Papers - Healthcare - Benchmarks
Papers - Quantization - BitNet
Papers - Training - Scaling Properties
Models - Image - Autoregressive
Papers - Image - Autoregressive Visual Generation
Papers - Benchmarks - Math - VQA
Papers - Mobile - Android
Papers - Interpretability - Sparse Autoencoder (SAE)
Papers - Fine-tuning - Machine Unlearning
Papers - Fine-tuning - ResNet
Papers - Training - Knowledge Distillation - Tool Usage
Papers - Training - Knowledge Distillation - World
Papers - Image - CoT
Papers - Custom Layers - Persistent Key-Value Vectors
Papers - Attention - Token Parameter - Pattention
Models - Video Games - Gameplay
Papers - Flow Matching
-
Movie Gen: A Cast of Media Foundation Models
Paper • 2410.13720 • Published • 98 -
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
Paper • 2410.06885 • Published • 46 -
Flow Matching for Generative Modeling
Paper • 2210.02747 • Published • 3 -
Matcha-TTS: A fast TTS architecture with conditional flow matching
Paper • 2309.03199 • Published • 13
Papers - Video - MovieGen
Papers - Training - Differential Transformer
matlok - Python Copilot Image Datasets
More extracted images on GitHub: https://github.com/matlok-ai/python-copilot-image-and-audio-examples/tree/main/png
-
matlok/python-image-copilot-training-using-class-knowledge-graphs-2024-01-27
Viewer • Updated • 773 • 561 -
matlok/python-image-copilot-training-using-function-knowledge-graphs
Viewer • Updated • 88 • 668 -
matlok/python-image-copilot-training-using-inheritance-knowledge-graphs
Viewer • Updated • 88 • 224 -
matlok/python-image-copilot-training-using-import-knowledge-graphs
Viewer • Updated • 88 • 100
matlok - Python Code Instruction Datasets
Python Alpaca instructions from leading AI research and tools repositories. The current focus is on "Manager level" understanding.
-
matlok/python-text-copilot-training-instruct-ai-research-2024-02-11
Viewer • Updated • 130 • 412 -
matlok/python-text-copilot-training-instruct-ai-research-2024-02-10
Viewer • Updated • 123 • 746 -
matlok/python-text-copilot-training-instruct-ai-research-2024-02-03
Viewer • Updated • 2.67k • 3.48k • 1 -
matlok/python-text-copilot-training-instruct-ai-research-2024-01-27
Viewer • Updated • 43.1k • 412
matlok - Python Copilot Audio Datasets
More extracted mp3 samples on GitHub: https://github.com/matlok-ai/python-copilot-image-and-audio-examples/tree/main/mp3
-
matlok/python-audio-copilot-training-using-class-knowledge-graphs-2024-01-27
Viewer • Updated • 948 • 903 -
matlok/python-audio-copilot-training-using-function-knowledge-graphs
Viewer • Updated • 120 • 184 • 1 -
matlok/python-audio-copilot-training-using-inheritance-knowledge-graphs
Viewer • Updated • 120 • 115 -
matlok/python-audio-copilot-training-using-import-knowledge-graphs
Viewer • Updated • 48 • 101
matlok - Python Src Code Datasets (base)
Python code from leading AI research and tools repositories
How to build a Python Coding Model with Alpaca Instructions
Great article on how this works: https://towardsdatascience.com/a-beginners-guide-to-llm-fine-tuning-4bae7d4da672
Dataset - Python Coding Alpaca Instructions
Image Papers
-
Visual Instruction Tuning
Paper • 2304.08485 • Published • 20 -
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Paper • 2311.05437 • Published • 51 -
Improved Baselines with Visual Instruction Tuning
Paper • 2310.03744 • Published • 39 -
Aligning Large Multimodal Models with Factually Augmented RLHF
Paper • 2309.14525 • Published • 31
Audio Papers
There are many more on arXiv if you search for CLAP
-
Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation
Paper • 2211.06687 • Published • 4 -
EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning
Paper • 2401.17690 • Published • 5 -
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
Paper • 2312.09911 • Published • 55 -
Audiobox: Unified Audio Generation with Natural Language Prompts
Paper • 2312.15821 • Published • 17
Text Instruction Papers
-
Self-Instruct: Aligning Language Model with Self Generated Instructions
Paper • 2212.10560 • Published • 9 -
Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4
Paper • 2312.16171 • Published • 37 -
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence
Paper • 2401.14196 • Published • 66 -
AlpaCare:Instruction-tuned Large Language Models for Medical Application
Paper • 2310.14558 • Published • 4
Multimodal Papers
-
From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations
Paper • 2401.01885 • Published • 28 -
Media2Face: Co-speech Facial Animation Generation With Multi-Modality Guidance
Paper • 2401.15687 • Published • 24 -
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
Paper • 2312.17172 • Published • 30 -
MouSi: Poly-Visual-Expert Vision-Language Models
Paper • 2401.17221 • Published • 9
Mixture of Experts Papers
MoE
-
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
Paper • 2401.15947 • Published • 53 -
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Paper • 2401.06066 • Published • 56 -
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
Paper • 2312.07987 • Published • 41 -
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
Paper • 2101.03961 • Published • 13
Coding Papers
There are usually interesting papers in the model cards on the leaderboard: https://huggingface.co/spaces/bigcode/bigcode-models-leaderboard
-
StarCoder: may the source be with you!
Paper • 2305.06161 • Published • 31 -
WizardCoder: Empowering Code Large Language Models with Evol-Instruct
Paper • 2306.08568 • Published • 28 -
SantaCoder: don't reach for the stars!
Paper • 2301.03988 • Published • 7 -
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence
Paper • 2401.14196 • Published • 66
Models - Coding
-
dphn/dolphin-2.6-mistral-7b-dpo-laser
Text Generation • 7B • Updated • 96 • 120 -
Evaluating Large Language Models Trained on Code
Paper • 2107.03374 • Published • 8 -
CodeBERT: A Pre-Trained Model for Programming and Natural Languages
Paper • 2002.08155 • Published • 2 -
code2seq: Generating Sequences from Structured Representations of Code
Paper • 1808.01400 • Published • 2
Embedding Papers
-
Improving Text Embeddings with Large Language Models
Paper • 2401.00368 • Published • 82 -
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper • 1810.04805 • Published • 23 -
Metadata Might Make Language Models Better
Paper • 2211.10086 • Published • 4 -
DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers
Paper • 2310.03686 • Published • 3
Transformer Arch
Check out: https://bbycroft.net/llm and http://nlp.seas.harvard.edu/2018/04/03/attention.html
-
Attention Is All You Need
Paper • 1706.03762 • Published • 91 -
ImageNet Large Scale Visual Recognition Challenge
Paper • 1409.0575 • Published • 9 -
Sequence to Sequence Learning with Neural Networks
Paper • 1409.3215 • Published • 3 -
Language Models are Few-Shot Learners
Paper • 2005.14165 • Published • 17
LMM
Large Multimodal Models
LoRA
-
LCM-LoRA: A Universal Stable-Diffusion Acceleration Module
Paper • 2311.05556 • Published • 87 -
MultiLoRA: Democratizing LoRA for Better Multi-Task Learning
Paper • 2311.11501 • Published • 37 -
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
Paper • 2311.03285 • Published • 32 -
LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B
Paper • 2310.20624 • Published • 13
Non-English Embeddings and Models
-
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
Paper • 2211.05100 • Published • 34 -
Contrastive Language-Image Pre-training for the Italian Language
Paper • 2108.08688 • Published • 2 -
IT5: Large-scale Text-to-text Pretraining for Italian Language Understanding and Generation
Paper • 2203.03759 • Published • 5 -
Spanish Pre-trained BERT Model and Evaluation Data
Paper • 2308.02976 • Published • 3
Fine-Tuning
-
Metadata Might Make Language Models Better
Paper • 2211.10086 • Published • 4 -
Empirical Analysis of the Strengths and Weaknesses of PEFT Techniques for LLMs
Paper • 2304.14999 • Published • 2 -
PEFT for Speech: Unveiling Optimal Placement, Merging Strategies, and Ensemble Techniques
Paper • 2401.02122 • Published • 2 -
Zephyr: Direct Distillation of LM Alignment
Paper • 2310.16944 • Published • 122
More Alpaca Instruction Datasets
Model Benchmarking
-
Spam-T5: Benchmarking Large Language Models for Few-Shot Email Spam Detection
Paper • 2304.01238 • Published • 2 -
The FinBen: An Holistic Financial Benchmark for Large Language Models
Paper • 2402.12659 • Published • 23 -
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
Paper • 2402.13249 • Published • 13 -
Evaluating Large Language Models Trained on Code
Paper • 2107.03374 • Published • 8
Actor Critic Papers
Gaming Reinforcement Learning
Search papers from a url
Chat datasets
Audio models
-
metavoiceio/metavoice-1B-v0.1
Text-to-Speech • Updated • 438 • 789 -
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data
Paper • 2402.08093 • Published • 62 -
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
Paper • 2402.17485 • Published • 195 -
SWivid/F5-TTS
Text-to-Speech • Updated • 604k • 1.12k
Datasets - DPO
Datasets - Geospatial
Models - Geospatial
Papers - Geospatial
Models - Biotech
U-Net was trained in 10 hours on an NVIDIA Titan GPU (6 GB) - 2015
-
U-Net: Convolutional Networks for Biomedical Image Segmentation
Paper • 1505.04597 • Published • 14 -
microsoft/BioGPT-Large
Text Generation • Updated • 23k • 207 -
kuleshov-group/caduceus-ps_seqlen-131k_d_model-256_n_layer-16
Fill-Mask • 7.73M • Updated • 969 • 14 -
kuleshov-group/caduceus-ps_seqlen-1k_d_model-256_n_layer-4_lr-8e-3
Fill-Mask • 1.93M • Updated • 45 • 2
Datasets - Financial
Models - Video Editing
-
LAVE: LLM-Powered Agent Assistance and Language Augmentation for Video Editing
Paper • 2402.10294 • Published • 27 -
Valley: Video Assistant with Large Language model Enhanced abilitY
Paper • 2306.07207 • Published • 2 -
Video Editing via Factorized Diffusion Distillation
Paper • 2403.09334 • Published • 23
Models - Testing
Papers - Attention
-
Linear Transformers with Learnable Kernel Functions are Better In-Context Models
Paper • 2402.10644 • Published • 81 -
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
Paper • 2305.13245 • Published • 6 -
ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition
Paper • 2402.15220 • Published • 22 -
Sequence Parallelism: Long Sequence Training from System Perspective
Paper • 2105.13120 • Published • 6
Papers - Context
-
In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss
Paper • 2402.10790 • Published • 42 -
LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration
Paper • 2402.11550 • Published • 18 -
A Neural Conversational Model
Paper • 1506.05869 • Published • 2 -
Data Engineering for Scaling Language Models to 128K Context
Paper • 2402.10171 • Published • 25
Papers - Synthetic Data
-
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows
Paper • 2402.10379 • Published • 31 -
Using Simulation and Domain Adaptation to Improve Efficiency of Deep Robotic Grasping
Paper • 1709.07857 • Published • 2 -
Simple synthetic data reduces sycophancy in large language models
Paper • 2308.03958 • Published • 22 -
Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities
Paper • 2308.12966 • Published • 11
Tuning - Dora
Models - Fintech
-
FinTral: A Family of GPT-4 Level Multimodal Financial Large Language Models
Paper • 2402.10986 • Published • 81 -
BloombergGPT: A Large Language Model for Finance
Paper • 2303.17564 • Published • 26 -
GPT-InvestAR: Enhancing Stock Investment Strategies through Annual Report Analysis with Large Language Models
Paper • 2309.03079 • Published • 2 -
FinVis-GPT: A Multimodal Large Language Model for Financial Chart Analysis
Paper • 2308.01430 • Published • 2
Models - Multimodal
-
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
Paper • 2402.12226 • Published • 45 -
M2-CLIP: A Multimodal, Multi-task Adapting Framework for Video Action Recognition
Paper • 2401.11649 • Published • 3 -
Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition
Paper • 2402.15504 • Published • 22 -
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
Paper • 2402.17485 • Published • 195
Models - MultiAgent
Models - n-gram and Kneser-Ney
-
A Generalized Language Model as the Combination of Skipped n-grams and Modified Kneser-Ney Smoothing
Paper • 1404.3377 • Published • 2 -
Skip-gram Language Modeling Using Sparse Non-negative Matrix Probability Estimation
Paper • 1412.1454 • Published • 2 -
Neural Text Generation from Structured Data with Application to the Biography Domain
Paper • 1603.07771 • Published • 2 -
Distributed Representations of Words and Phrases and their Compositionality
Paper • 1310.4546 • Published • 3
Papers - NLP Research
Papers - Multi-turn Conversations
Datasets - Synthetic - Instruct
Models - Watermarking
Models - Captions
Papers - Fintech - Benchmarks
Models - Touch and Image
Models - Video
-
VideoPrism: A Foundational Visual Encoder for Video Understanding
Paper • 2402.13217 • Published • 37 -
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
Paper • 2402.17485 • Published • 195 -
Qwen/Qwen-VL-Chat
Text Generation • Updated • 63.3k • 374 -
MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies
Paper • 2403.01422 • Published • 29
Datasets - Image - Text
Models - NeRFs - Image Radiance Fields
-
Improving Robustness for Joint Optimization of Camera Poses and Decomposed Low-Rank Tensorial Radiance Fields
Paper • 2402.13252 • Published • 19 -
RegNeRF: Regularizing Neural Radiance Fields for View Synthesis from Sparse Inputs
Paper • 2112.00724 • Published • 2 -
Enhancing NeRF akin to Enhancing LLMs: Generalizable NeRF Transformer with Mixture-of-View-Experts
Paper • 2308.11793 • Published • 2
Models - Parameter Testing
Models - Predicting Models
Models - Robotics
-
Using Simulation and Domain Adaptation to Improve Efficiency of Deep Robotic Grasping
Paper • 1709.07857 • Published • 2 -
Sensor-based Multi-Robot Search and Coverage with Spatial Separation in Unstructured Environments
Paper • 2403.01710 • Published • 2 -
Twisting Lids Off with Two Hands
Paper • 2403.02338 • Published • 7
Datasets - Coding
Models - ReAct - Reasoning and Action
Models - Text
-
Training Compute-Optimal Large Language Models
Paper • 2203.15556 • Published • 11 -
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Paper • 1909.08053 • Published • 3 -
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Paper • 1910.10683 • Published • 14 -
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
Paper • 2304.01373 • Published • 9
Models - Custom-Training
Exploring speculative sampling with autoregressive models like: https://proceedings.mlr.press/v139/song21a.html and https://proceedings.mlr.press/v119/
Papers - Decoders
-
Lossless Acceleration for Seq2seq Generation with Aggressive Decoding
Paper • 2205.10350 • Published • 2 -
Blockwise Parallel Decoding for Deep Autoregressive Models
Paper • 1811.03115 • Published • 2 -
Fast Transformer Decoding: One Write-Head is All You Need
Paper • 1911.02150 • Published • 9 -
Sequence-Level Knowledge Distillation
Paper • 1606.07947 • Published • 2
Papers - Testing a Coding Model
Datasets - Text
Datasets - Multimodal - Text and Images
Models - Large Scale
Papers - Coding
-
CodeBERT: A Pre-Trained Model for Programming and Natural Languages
Paper • 2002.08155 • Published • 2 -
OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement
Paper • 2402.14658 • Published • 83 -
CodeFusion: A Pre-trained Diffusion Model for Code Generation
Paper • 2310.17680 • Published • 73 -
CodePlan: Repository-level Coding using LLMs and Planning
Paper • 2309.12499 • Published • 79
Papers - Transfer Learning
Datasets - Text - Multiple Choice
Datasets - Binarized
Models - Math
-
Orca-Math: Unlocking the potential of SLMs in Grade School Math
Paper • 2402.14830 • Published • 25 -
MathScale: Scaling Instruction Tuning for Mathematical Reasoning
Paper • 2403.02884 • Published • 17 -
meta-math/MetaMath-Mistral-7B
Text Generation • Updated • 688 • 96 -
meta-math/MetaMath-13B-V1.0
Text Generation • Updated • 371 • 13
Papers - Pipeline - Multimodal
Papers - Reasoning
-
Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models
Paper • 2402.14848 • Published • 20 -
Teaching Large Language Models to Reason with Reinforcement Learning
Paper • 2403.04642 • Published • 50 -
How Far Are We from Intelligent Visual Deductive Reasoning?
Paper • 2403.04732 • Published • 23 -
Learning to Reason and Memorize with Self-Notes
Paper • 2305.00833 • Published • 5
Models - Gaming
Papers - IoT
-
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
Paper • 2402.14905 • Published • 134 -
Sensor-based Multi-Robot Search and Coverage with Spatial Separation in Unstructured Environments
Paper • 2403.01710 • Published • 2 -
EdgeMoE: Fast On-Device Inference of MoE-based Large Language Models
Paper • 2308.14352 • Published -
Slimmable Encoders for Flexible Split DNNs in Bandwidth and Resource Constrained IoT Systems
Paper • 2306.12691 • Published • 2
Papers - Learning and Compression
Papers - Conversations
Models - Quants
Models - Image - Geometric Algebra
Models - Image
-
Geometric Algebra Transformers
Paper • 2305.18415 • Published • 2 -
World Model on Million-Length Video And Language With RingAttention
Paper • 2402.08268 • Published • 40 -
Deep Unsupervised Learning using Nonequilibrium Thermodynamics
Paper • 1503.03585 • Published • 5 -
IDKiro/sdxs-512-0.9
Text-to-Image • Updated • 443 • 109
Papers - Video
-
Video as the New Language for Real-World Decision Making
Paper • 2402.17139 • Published • 21 -
VideoCrafter1: Open Diffusion Models for High-Quality Video Generation
Paper • 2310.19512 • Published • 16 -
VideoMamba: State Space Model for Efficient Video Understanding
Paper • 2403.06977 • Published • 30 -
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
Paper • 2401.09047 • Published • 14
Spaces - Math
Datasets - Math
-
introspector/unimath
Updated • 1.41k • 7 -
MathQA: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms
Paper • 1905.13319 • Published • 2 -
Measuring Mathematical Problem Solving With the MATH Dataset
Paper • 2103.03874 • Published • 5 -
MathScale: Scaling Instruction Tuning for Mathematical Reasoning
Paper • 2403.02884 • Published • 17
Models - Base - 7B
Models - Base - 1B
Spaces - Vision
Datasets - Image and Bounding Box
Models - Science
Papers - Sampling
-
Priority Sampling of Large Language Models for Compilers
Paper • 2402.18734 • Published • 19 -
Accelerating Large Language Model Decoding with Speculative Sampling
Paper • 2302.01318 • Published • 3 -
Fast Inference from Transformers via Speculative Decoding
Paper • 2211.17192 • Published • 9 -
AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling
Paper • 2011.09011 • Published • 2
Models - Byte Transformer
Models - Cooking
Papers - RoPE
-
Resonance RoPE: Improving Context Length Generalization of Large Language Models
Paper • 2403.00071 • Published • 24 -
Scaling Laws of RoPE-based Extrapolation
Paper • 2310.05209 • Published • 8 -
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models
Paper • 2404.12387 • Published • 39 -
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
Paper • 2404.14619 • Published • 126
Papers - Math - GSM8K
-
Training Verifiers to Solve Math Word Problems
Paper • 2110.14168 • Published • 4 -
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
Paper • 2309.12284 • Published • 18 -
LiteSearch: Efficacious Tree Search for LLM
Paper • 2407.00320 • Published • 40 -
DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models
Paper • 2309.03883 • Published • 35
Papers - Model Scaling
-
Scaling Laws for Neural Language Models
Paper • 2001.08361 • Published • 9 -
An Empirical Model of Large-Batch Training
Paper • 1812.06162 • Published • 3 -
Measuring the Effects of Data Parallelism on Neural Network Training
Paper • 1811.03600 • Published • 2 -
Adafactor: Adaptive Learning Rates with Sublinear Memory Cost
Paper • 1804.04235 • Published • 2
Papers - Training Research
-
Measuring the Effects of Data Parallelism on Neural Network Training
Paper • 1811.03600 • Published • 2 -
Adafactor: Adaptive Learning Rates with Sublinear Memory Cost
Paper • 1804.04235 • Published • 2 -
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
Paper • 1905.11946 • Published • 3 -
Yi: Open Foundation Models by 01.AI
Paper • 2403.04652 • Published • 65
Models - UI - Front-End
Papers - Reasoning - Vision
Models - Text - Explanation
Papers - Multi-Agent
Papers - QLoRA
Papers - Ring Attention
-
Sequence Parallelism: Long Sequence Training from System Perspective
Paper • 2105.13120 • Published • 6 -
Ring Attention with Blockwise Transformers for Near-Infinite Context
Paper • 2310.01889 • Published • 13 -
Striped Attention: Faster Ring Attention for Causal Transformers
Paper • 2311.09431 • Published • 4 -
World Model on Million-Length Video And Language With RingAttention
Paper • 2402.08268 • Published • 40
Papers - Sequence Parallelism
Models - Legal
Helpful - VRAM Calculator
Models - Audio - Translation
Models - Video Generation
Models - Image - Long Context
Papers - Masked Sequence Packing
Papers - Speculative Decoding
-
Accelerating LLM Inference with Staged Speculative Decoding
Paper • 2308.04623 • Published • 25 -
An Emulator for Fine-Tuning Large Language Models using Small Language Models
Paper • 2310.12962 • Published • 13 -
The Curious Case of Neural Text Degeneration
Paper • 1904.09751 • Published • 3 -
On Speculative Decoding for Multimodal Large Language Models
Paper • 2404.08856 • Published • 13
Papers - Fine-tuning - Multimodal
Datasets - Math - Word Problems
Spaces - Coding
Models - Audio - Music Generation
Datasets - Audio
Datasets - Audio - Fine-tuning
Models - Audio - Sheet Music Gen
Papers - Striped Attention
Datasets - Text - Instruction (non-Alpaca)
Models - Images - Instruct
Papers - Benchmarks - Image and Text
Papers - Image - Not-using CLIP
-
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
Paper • 2403.05135 • Published • 45 -
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Paper • 2403.05525 • Published • 46 -
CoCa: Contrastive Captioners are Image-Text Foundation Models
Paper • 2205.01917 • Published • 3
Models - Suggest - Audiobooks from Playlist
Models - MoE
-
Mixtral of Experts
Paper • 2401.04088 • Published • 159 -
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
Paper • 2401.15947 • Published • 53 -
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
Paper • 2401.04081 • Published • 73 -
EdgeMoE: Fast On-Device Inference of MoE-based Large Language Models
Paper • 2308.14352 • Published
Papers - MoE
-
Non-asymptotic oracle inequalities for the Lasso in high-dimensional mixture of experts
Paper • 2009.10622 • Published • 1 -
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
Paper • 2401.15947 • Published • 53 -
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
Paper • 2401.04081 • Published • 73 -
MoE-Infinity: Activation-Aware Expert Offloading for Efficient MoE Serving
Paper • 2401.14361 • Published • 2
Models - MoE - IoT
Models - IoT
Models - Mamba
Models - MoE - Multimodal
-
EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE
Paper • 2308.11971 • Published • 2 -
HyperFormer: Enhancing Entity and Relation Interaction for Hyper-Relational Knowledge Graph Completion
Paper • 2308.06512 • Published • 2 -
Unraveling Complex Data Diversity in Underwater Acoustic Target Recognition through Convolution-based Mixture of Experts
Paper • 2402.11919 • Published • 2
Papers - MoE - Research
-
Adaptive sequential Monte Carlo by means of mixture of experts
Paper • 1108.2836 • Published • 2 -
Convergence Rates for Mixture-of-Experts
Paper • 1110.2058 • Published • 2 -
Multi-view Contrastive Learning for Entity Typing over Knowledge Graphs
Paper • 2310.12008 • Published • 2 -
Enhancing NeRF akin to Enhancing LLMs: Generalizable NeRF Transformer with Mixture-of-View-Experts
Paper • 2308.11793 • Published • 2
Papers - Image - Knowledge Graphs
-
Multi-view Contrastive Learning for Entity Typing over Knowledge Graphs
Paper • 2310.12008 • Published • 2 -
HyperFormer: Enhancing Entity and Relation Interaction for Hyper-Relational Knowledge Graph Completion
Paper • 2308.06512 • Published • 2 -
ARIEL: Adversarial Graph Contrastive Learning
Paper • 2208.06956 • Published • 2 -
RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
Paper • 2401.18059 • Published • 46
Papers - MoE - Training
-
Robust Mixture-of-Expert Training for Convolutional Neural Networks
Paper • 2308.10110 • Published • 2 -
Experts Weights Averaging: A New General Training Scheme for Vision Transformers
Paper • 2308.06093 • Published • 2 -
ConstitutionalExperts: Training a Mixture of Principle-based Prompts
Paper • 2403.04894 • Published • 2 -
Mixture-of-LoRAs: An Efficient Multitask Tuning for Large Language Models
Paper • 2403.03432 • Published • 1
Papers - Image - MoE
-
Robust Mixture-of-Expert Training for Convolutional Neural Networks
Paper • 2308.10110 • Published • 2 -
HyperFormer: Enhancing Entity and Relation Interaction for Hyper-Relational Knowledge Graph Completion
Paper • 2308.06512 • Published • 2 -
Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts
Paper • 2309.04354 • Published • 15 -
Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints
Paper • 2212.05055 • Published • 6
Models - Image - MoE
Papers - Lora - LCM
Models - Image - Drone Photography
Models - Image - Lora
Models - MoE - Principles
Models - MoE - Constitutional Experts
Models - MoE - Visual Relationship Detection
Models - MoE - Training using Lora
Papers - Training with Lora
-
Mixture-of-LoRAs: An Efficient Multitask Tuning for Large Language Models
Paper • 2403.03432 • Published • 1 -
Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
Paper • 2310.20587 • Published • 18 -
MedAlpaca -- An Open-Source Collection of Medical Conversational AI Models and Training Data
Paper • 2304.08247 • Published • 2
Papers - MoE - Prompt Immunity
Papers - MoE - Router
-
Turn Waste into Worth: Rectifying Top-k Router of MoE
Paper • 2402.12399 • Published • 2 -
CompeteSMoE -- Effective Training of Sparse Mixture of Experts via Competition
Paper • 2402.02526 • Published • 3 -
Buffer Overflow in Mixture of Experts
Paper • 2402.05526 • Published • 8 -
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
Paper • 2402.01739 • Published • 28
Models - MoE - Audio - Underwater Acoustics
Models - MoE - Audio
Papers - MoE - Malicious Queries
Papers - MoE - Image
-
Scaling Vision with Sparse Mixture of Experts
Paper • 2106.05974 • Published • 4 -
Routers in Vision Mixture of Experts: An Empirical Study
Paper • 2401.15969 • Published • 2 -
Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts
Paper • 2206.02770 • Published • 4 -
Experts Weights Averaging: A New General Training Scheme for Vision Transformers
Paper • 2308.06093 • Published • 2
Models - MoE - Image
Papers - MoE - Training - Blocks
Papers - MoE - Scaling
-
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
Paper • 1701.06538 • Published • 7 -
ST-MoE: Designing Stable and Transferable Sparse Expert Models
Paper • 2202.08906 • Published • 2 -
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
Paper • 2403.07816 • Published • 44
Papers - MoE - Adversary Queries
Papers - MoE - Deny an Expert
Papers - MoE - Custom Layers
-
LocMoE: A Low-overhead MoE for Large Language Model Training
Paper • 2401.13920 • Published • 2 -
HyperRouter: Towards Efficient Training and Inference of Sparse Mixture of Experts
Paper • 2312.07035 • Published • 2 -
DEMix Layers: Disentangling Domains for Modular Language Modeling
Paper • 2108.05036 • Published • 3
Papers - MoE - Frankenmerge
Papers - Multimodal
-
TinyLLaVA: A Framework of Small-scale Large Multimodal Models
Paper • 2402.14289 • Published • 21 -
ImageBind: One Embedding Space To Bind Them All
Paper • 2305.05665 • Published • 6 -
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper • 2401.00908 • Published • 188 -
Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts
Paper • 2206.02770 • Published • 4
Papers - Image - Bounding Box
-
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper • 2401.00908 • Published • 188 -
Unifying Vision, Text, and Layout for Universal Document Processing
Paper • 2212.02623 • Published • 11 -
Grounded Language-Image Pre-training
Paper • 2112.03857 • Published • 3 -
ConsistencyDet: Robust Object Detector with Denoising Paradigm of Consistency Model
Paper • 2404.07773 • Published • 1
Papers - Multimodal - Documents
-
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper • 2401.00908 • Published • 188 -
DeBERTa: Decoding-enhanced BERT with Disentangled Attention
Paper • 2006.03654 • Published • 3 -
DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing
Paper • 2111.09543 • Published • 3
Papers - Exploit - Model Layer Retrieval
Papers - Image
-
FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation
Paper • 2403.06775 • Published • 5 -
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Paper • 2010.11929 • Published • 14 -
Data Incubation -- Synthesizing Missing Data for Handwriting Recognition
Paper • 2110.07040 • Published • 2 -
A Mixture of Expert Approach for Low-Cost Customization of Deep Neural Networks
Paper • 1811.00056 • Published • 2
Papers - Image - Dataset Generator
Datasets - Text and Video
Papers - Video - Mamba
Papers - Performance Trends in AI
Papers - Fine-tuning - Home Lab
-
Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU
Paper • 2403.06504 • Published • 55 -
Token-Level Adaptation of LoRA Adapters for Downstream Task Generalization
Paper • 2311.10847 • Published • 2 -
PERL: Parameter Efficient Reinforcement Learning from Human Feedback
Paper • 2403.10704 • Published • 59
Papers - MoE - Audio
-
SpeechMoE: Scaling to Large Acoustic Models with Dynamic Routing Mixture of Experts
Paper • 2105.03036 • Published • 2 -
Building a great multi-lingual teacher with sparsely-gated mixture of experts for speech recognition
Paper • 2112.05820 • Published • 2 -
SpeechMoE2: Mixture-of-Experts Model with Improved Routing
Paper • 2111.11831 • Published • 2
Papers - MoE - Attention
Papers - Quants
-
QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
Paper • 2310.16795 • Published • 27 -
Pareto-Optimal Quantized ResNet Is Mostly 4-bit
Paper • 2105.03536 • Published • 2 -
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
Paper • 2403.15447 • Published • 16
Papers - Image - MoE - IoT
Papers - MoE - Speech Recognition
Papers - MoE - Router - Task
Papers - MoE - Multilingual
Papers - MoE - Federated Learning
Papers - MoE - Training - Weight Sharing
Papers - MoE - Router - Research
-
Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts
Paper • 2306.04845 • Published • 4 -
Patch-level Routing in Mixture-of-Experts is Provably Sample-efficient for Convolutional Neural Networks
Paper • 2306.04073 • Published • 2 -
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
Paper • 2403.07816 • Published • 44 -
Unified Scaling Laws for Routed Language Models
Paper • 2202.01169 • Published • 2
Papers - Image - Handwriting Recognition
-
Data Incubation -- Synthesizing Missing Data for Handwriting Recognition
Paper • 2110.07040 • Published • 2 -
A Mixture of Expert Approach for Low-Cost Customization of Deep Neural Networks
Paper • 1811.00056 • Published • 2 -
Vulnerability Analysis of Transformer-based Optical Character Recognition to Adversarial Attacks
Paper • 2311.17128 • Published • 2 -
Data Generation for Post-OCR correction of Cyrillic handwriting
Paper • 2311.15896 • Published • 4
Papers - MoE - IoT
Papers - MoE - Handwriting Recognition
Papers - Image - OCR Handwriting
-
Vulnerability Analysis of Transformer-based Optical Character Recognition to Adversarial Attacks
Paper • 2311.17128 • Published • 2 -
Data Generation for Post-OCR correction of Cyrillic handwriting
Paper • 2311.15896 • Published • 4 -
An End-to-End OCR Framework for Robust Arabic-Handwriting Recognition using a Novel Transformers-based Model and an Innovative 270 Million-Words Multi-Font Corpus of Classical Arabic with Diacritics
Paper • 2208.11484 • Published • 3 -
Transformer based Urdu Handwritten Text Optical Character Reader
Paper • 2206.04575 • Published • 2
Papers - Image - Adversarial
Papers - Image - Segment - Handwriting
-
Character Queries: A Transformer-based Approach to On-Line Handwritten Character Segmentation
Paper • 2309.03072 • Published • 2 -
Prompt me a Dataset: An investigation of text-image prompting for historical image dataset creation using foundation models
Paper • 2309.01674 • Published • 2 -
Segment Anything
Paper • 2304.02643 • Published • 4
Papers - Image - Handwriting and Online Gestures
Papers - Image - Handwritten Characters
-
Disentangling Writer and Character Styles for Handwriting Generation
Paper • 2303.14736 • Published • 3 -
A Transformer Architecture for Online Gesture Recognition of Mathematical Expressions
Paper • 2211.02643 • Published • 2 -
A tailored Handwritten-Text-Recognition System for Medieval Latin
Paper • 2308.09368 • Published • 3 -
Scalable handwritten text recognition system for lexicographic sources of under-resourced languages and alphabets
Paper • 2303.16256 • Published • 2
Papers - Image - Fine-tuning
-
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper • 2401.00908 • Published • 188 -
Visual Instruction Tuning
Paper • 2304.08485 • Published • 20 -
Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering
Paper • 2403.09622 • Published • 18 -
Lumiere: A Space-Time Diffusion Model for Video Generation
Paper • 2401.12945 • Published • 86
Papers - Image - HTR - Math Gestures and Symbols
Papers - Image - Handwritten Generation
Models - Text - Multilingual
Models - Image - Diffusion Probabilistic Models
Papers - Benchmark - Handwriting Recognition
Papers - Image - Handwriting Recognition - Lexical Features
Datasets - Image - Handwritten Recognition
GitHub: https://github.com/Planet-AI-GmbH/tfaip-hybrid-ctc-s2s and MathWriting data: https://storage.googleapis.com/mathwriting_data/mathwriting-2024.tgz
Papers - Image - Custom Layers
-
Rescoring Sequence-to-Sequence Models for Text Line Recognition with CTC-Prefixes
Paper • 2110.05909 • Published • 2 -
Deep Residual Learning for Image Recognition
Paper • 1512.03385 • Published • 8 -
Wide Residual Networks
Paper • 1605.07146 • Published • 2 -
Comprehensive Survey of Model Compression and Speed up for Vision Transformers
Paper • 2404.10407 • Published • 1
Papers - Image - Handwriting Recognition - Tetrolets
Papers - Image - Handwriting Recognition - Near-Realtime
Papers - Text - Encoders
-
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper • 1810.04805 • Published • 23 -
Transformers Can Achieve Length Generalization But Not Robustly
Paper • 2402.09371 • Published • 15 -
Triple-Encoders: Representations That Fire Together, Wire Together
Paper • 2402.12332 • Published • 2 -
BERTs are Generative In-Context Learners
Paper • 2406.04823 • Published • 1
Papers - Text - Decoders
-
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper • 1810.04805 • Published • 23 -
Transformers Can Achieve Length Generalization But Not Robustly
Paper • 2402.09371 • Published • 15 -
A Thorough Examination of Decoding Methods in the Era of LLMs
Paper • 2402.06925 • Published • 1 -
Byte Latent Transformer: Patches Scale Better Than Tokens
Paper • 2412.09871 • Published • 108
Papers - Text - Bidirectional - Bio
Papers - Text - Bidirectional Encoders
-
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
Paper • 1901.08746 • Published • 6 -
Pretraining-Based Natural Language Generation for Text Summarization
Paper • 1902.09243 • Published • 2 -
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Paper • 1907.11692 • Published • 9 -
DeBERTa: Decoding-enhanced BERT with Disentangled Attention
Paper • 2006.03654 • Published • 3
Papers - Text - Pre-training
-
Pretraining-Based Natural Language Generation for Text Summarization
Paper • 1902.09243 • Published • 2 -
Unified Vision-Language Pre-Training for Image Captioning and VQA
Paper • 1909.11059 • Published • 2 -
When LLMs are Unfit Use FastFit: Fast and Effective Text Classification with Many Classes
Paper • 2404.12365 • Published • 1
Papers - Text - Pre-training - Research
-
Pretraining-Based Natural Language Generation for Text Summarization
Paper • 1902.09243 • Published • 2 -
Learning to Reason and Memorize with Self-Notes
Paper • 2305.00833 • Published • 5 -
Text Generation with Diffusion Language Models: A Pre-training Approach with Continuous Paragraph Denoise
Paper • 2212.11685 • Published • 2 -
Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems
Paper • 2408.16293 • Published • 27
Papers - Text - Pre-training - Decoder Multi-Steps
Papers - Text - Benchmarks - Quality Diversity
Papers - Image - Multimodal - Handwriting Recognition
-
Representing Online Handwriting for Recognition in Large Vision-Language Models
Paper • 2402.15307 • Published • 3 -
Evaluating Sequence-to-Sequence Models for Handwritten Text Recognition
Paper • 1903.07377 • Published • 2 -
Enhancing Document Information Analysis with Multi-Task Pre-training: A Robust Approach for Information Extraction in Visually-Rich Documents
Paper • 2310.16527 • Published • 2 -
Detecting and recognizing characters in Greek papyri with YOLOv8, DeiT and SimCLR
Paper • 2401.12513 • Published • 1
Papers - Text - Research
-
An Interdisciplinary Comparison of Sequence Modeling Methods for Next-Element Prediction
Paper • 1811.00062 • Published • 2 -
mT5: A massively multilingual pre-trained text-to-text transformer
Paper • 2010.11934 • Published • 4 -
Bootstrap Your Own Skills: Learning to Solve New Tasks with Large Language Model Guidance
Paper • 2310.10021 • Published • 2 -
Gemma: Open Models Based on Gemini Research and Technology
Paper • 2403.08295 • Published • 50
Papers - Text - Multilingual
-
mT5: A massively multilingual pre-trained text-to-text transformer
Paper • 2010.11934 • Published • 4 -
mSLAM: Massively multilingual joint pre-training for speech and text
Paper • 2202.01374 • Published • 2 -
DeepNet: Scaling Transformers to 1,000 Layers
Paper • 2203.00555 • Published • 2 -
SONAR: Sentence-Level Multimodal and Language-Agnostic Representations
Paper • 2308.11466 • Published • 1
Papers - Multimodal - Speech and Text
Papers - Multimodal - Speech and Text - Multilingual
Papers - Multimodal - Training and Tuning
-
Enhancing Document Information Analysis with Multi-Task Pre-training: A Robust Approach for Information Extraction in Visually-Rich Documents
Paper • 2310.16527 • Published • 2 -
CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection
Paper • 2310.02960 • Published • 1 -
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Paper • 2403.09611 • Published • 129 -
Veagle: Advancements in Multimodal Representation Learning
Paper • 2403.08773 • Published • 10
Papers - Multimodal - Document Analysis
-
Enhancing Document Information Analysis with Multi-Task Pre-training: A Robust Approach for Information Extraction in Visually-Rich Documents
Paper • 2310.16527 • Published • 2 -
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper • 2401.00908 • Published • 188 -
Unifying Vision, Text, and Layout for Universal Document Processing
Paper • 2212.02623 • Published • 11
Papers - Video - Motion Control
Papers - Video - Entity Recognition
-
DragAnything: Motion Control for Anything using Entity Representation
Paper • 2403.07420 • Published • 15 -
Capabilities of Gemini Models in Medicine
Paper • 2404.18416 • Published • 24 -
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos
Paper • 2406.08407 • Published • 28
Papers - Video - Pre-training
Papers - Image - Pre-training
Papers - Image - Caption Generation
Papers - Image - Synthetic Data Generator
-
Synth^2: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings
Paper • 2403.07750 • Published • 24 -
Gaussian Frosting: Editable Complex Radiance Fields with Real-Time Rendering
Paper • 2403.14554 • Published • 14 -
ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion
Paper • 2403.18818 • Published • 28
Papers - Transformer Research - Custom Layers
Papers - ResNet
-
Wide Residual Networks
Paper • 1605.07146 • Published • 2 -
Characterizing signal propagation to close the performance gap in unnormalized ResNets
Paper • 2101.08692 • Published • 2 -
Pareto-Optimal Quantized ResNet Is Mostly 4-bit
Paper • 2105.03536 • Published • 2 -
When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations
Paper • 2106.01548 • Published • 2
Papers - SuperNets
-
Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts
Paper • 2306.04845 • Published • 4 -
Balanced Mixture of SuperNets for Learning the CNN Pooling Architecture
Paper • 2306.11982 • Published • 2 -
AlphaNet: Improved Training of Supernets with Alpha-Divergence
Paper • 2102.07954 • Published • 2
Papers - Federated Learning
Papers - Mamba - Structured State Space Model
-
Motion Mamba: Efficient and Long Sequence Motion Generation with Hierarchical and Bidirectional Selective SSM
Paper • 2403.07487 • Published • 17 -
LocalMamba: Visual State Space Model with Windowed Selective Scan
Paper • 2403.09338 • Published • 9 -
Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference
Paper • 2403.14520 • Published • 35 -
SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time series
Paper • 2403.15360 • Published • 13
Papers - Image - Human Motion Generator
Papers - MoE - Multimodal
Mixtures of experts for text, image and speech
Papers - Autonomous Drones
Papers - Multimodal - Drone
Papers - Multimodal - Drone - Object Manipulation
Papers - Training Research - Time series
-
Chronos: Learning the Language of Time Series
Paper • 2403.07815 • Published • 46 -
Large Language Models as Optimizers
Paper • 2309.03409 • Published • 77 -
Pattern Discovery in Time Series with Byte Pair Encoding
Paper • 2106.00614 • Published • 2 -
MambaMixer: Efficient Selective State Space Models with Dual Token and Channel Selection
Paper • 2403.19888 • Published • 12
Papers - Pre-training - Time Series
Papers - Neural Architecture Search
-
AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling
Paper • 2011.09011 • Published • 2 -
HAT: Hardware-Aware Transformers for Efficient Natural Language Processing
Paper • 2005.14187 • Published • 2 -
BigNAS: Scaling Up Neural Architecture Search with Big Single-Stage Models
Paper • 2003.11142 • Published • 2 -
Efficient Architecture Search by Network Transformation
Paper • 1707.04873 • Published • 2
Papers - Training - Hardware Detection
Papers - Image - Split Computing
Papers - Image - IoT - Split Computing
Papers - U-Net
-
U-Net: Convolutional Networks for Biomedical Image Segmentation
Paper • 1505.04597 • Published • 14 -
Image Segmentation using U-Net Architecture for Powder X-ray Diffraction Images
Paper • 2310.16186 • Published • 2 -
H-DenseUNet: Hybrid Densely Connected UNet for Liver and Tumor Segmentation from CT Volumes
Paper • 1709.07330 • Published • 2 -
Deep LOGISMOS: Deep Learning Graph-based 3D Segmentation of Pancreatic Tumors on CT scans
Paper • 1801.08599 • Published • 2
Papers - Image - Segmentation
-
Image Segmentation using U-Net Architecture for Powder X-ray Diffraction Images
Paper • 2310.16186 • Published • 2 -
H-DenseUNet: Hybrid Densely Connected UNet for Liver and Tumor Segmentation from CT Volumes
Paper • 1709.07330 • Published • 2 -
Deep LOGISMOS: Deep Learning Graph-based 3D Segmentation of Pancreatic Tumors on CT scans
Paper • 1801.08599 • Published • 2 -
RTSeg: Real-time Semantic Segmentation Comparative Study
Paper • 1803.02758 • Published • 2
Papers - Image - Segmentation - Cancer
-
H-DenseUNet: Hybrid Densely Connected UNet for Liver and Tumor Segmentation from CT Volumes
Paper • 1709.07330 • Published • 2 -
Deep LOGISMOS: Deep Learning Graph-based 3D Segmentation of Pancreatic Tumors on CT scans
Paper • 1801.08599 • Published • 2 -
Hierarchical multi-class segmentation of glioma images using networks with multi-level activation function
Paper • 1810.09488 • Published • 2 -
Cross-modality (CT-MRI) prior augmented deep learning for robust lung tumor segmentation from small MR datasets
Paper • 1901.11369 • Published • 2
Papers - Video - Synthetic Data Generator
-
MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies
Paper • 2403.01422 • Published • 29 -
VisionGPT-3D: A Generalized Multimodal Agent for Enhanced 3D Vision Understanding
Paper • 2403.09530 • Published • 10 -
VidToMe: Video Token Merging for Zero-Shot Video Editing
Paper • 2312.10656 • Published • 11 -
TC4D: Trajectory-Conditioned Text-to-4D Generation
Paper • 2403.17920 • Published • 18
Papers - Image - Segmentation - Drone
Papers - Image - Segmentation - Report
-
Generalizability vs. Robustness: Adversarial Examples for Medical Imaging
Paper • 1804.00504 • Published • 2 -
Evaluating Transformer-based Semantic Segmentation Networks for Pathological Image Segmentation
Paper • 2108.11993 • Published • 2 -
From Modern CNNs to Vision Transformers: Assessing the Performance, Robustness, and Classification Strategies of Deep Learning Models in Histopathology
Paper • 2204.05044 • Published • 2
Papers - Image - Segmentation - Adversarial
Papers - Image - Segmentation - MRI
Papers - Image - Segmentation - Stroke Brain Lesions
Papers - Image - SkipNet
Papers - Image - IoT
Papers - Image - Hybrid
-
3D Medical Image Segmentation based on multi-scale MPU-Net
Paper • 2307.05799 • Published • 2 -
Joint Liver and Hepatic Lesion Segmentation in MRI using a Hybrid CNN with Transformer Layers
Paper • 2201.10981 • Published • 2 -
Using Multi-scale SwinTransformer-HTC with Data augmentation in CoNIC Challenge
Paper • 2202.13588 • Published • 2
Papers - Image - Hybrid - ResNet - U-Net
Papers - Image - Hybrid - Swin - U-Net
-
Attention Swin U-Net: Cross-Contextual Attention Mechanism for Skin Lesion Segmentation
Paper • 2210.16898 • Published • 2 -
Cross-Shaped Windows Transformer with Self-supervised Pretraining for Clinically Significant Prostate Cancer Detection in Bi-parametric MRI
Paper • 2305.00385 • Published • 2 -
Event Camera Demosaicing via Swin Transformer and Pixel-focus Loss
Paper • 2404.02731 • Published • 1
Papers - Image - Segmentation - Bio Cell
-
Enforcing Morphological Information in Fully Convolutional Networks to Improve Cell Instance Segmentation in Fluorescence Microscopy Images
Paper • 2106.05843 • Published • 2 -
Semi-Supervised Semantic Segmentation using Redesigned Self-Training for White Blood Cells
Paper • 2401.07278 • Published • 2
Papers - Image - Segmentation - Quantum
Papers - Image - Hybrid - Graph Net - U-Net
Papers - Image - Hybrid - Patient Meta Data - U-Net
Papers - Image - CSWin - Cross-Shaped Windows
-
CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows
Paper • 2107.00652 • Published • 2 -
Cross-Shaped Windows Transformer with Self-supervised Pretraining for Clinically Significant Prostate Cancer Detection in Bi-parametric MRI
Paper • 2305.00385 • Published • 2 -
2nd Place Solution to Google Landmark Recognition Competition 2021
Paper • 2110.02638 • Published • 2 -
BOAT: Bilateral Local Attention Vision Transformer
Paper • 2201.13027 • Published • 2
Papers - Image - Encoders
-
CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows
Paper • 2107.00652 • Published • 2 -
Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering
Paper • 2403.09622 • Published • 18 -
Veagle: Advancements in Multimodal Representation Learning
Paper • 2403.08773 • Published • 10 -
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
Paper • 2304.14178 • Published • 3
Papers - Image - Encoders - LePE - Local-Enhanced Pos Enc
Papers - Image - Attention - BOAT - Bilateral Local Attn
Papers - Image - Attention - Multi-Scale
-
MAFormer: A Transformer Network with Multi-scale Attention Fusion for Visual Recognition
Paper • 2209.01620 • Published • 2 -
Using Multi-scale SwinTransformer-HTC with Data augmentation in CoNIC Challenge
Paper • 2202.13588 • Published • 2 -
GasHis-Transformer: A Multi-scale Visual Transformer Approach for Gastric Histopathological Image Detection
Paper • 2104.14528 • Published • 2
Papers - Image - Swin
-
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Paper • 2103.14030 • Published • 5 -
A Novel Transformer Based Semantic Segmentation Scheme for Fine-Resolution Remote Sensing Images
Paper • 2104.12137 • Published • 2 -
Self-Supervised Learning with Swin Transformers
Paper • 2105.04553 • Published • 3 -
Evaluating Transformer-based Semantic Segmentation Networks for Pathological Image Segmentation
Paper • 2108.11993 • Published • 2
Papers - Text - Fine-tuning - Math
Papers - BYOL
Papers - Robot - Tasks - Boss
Papers - Text - Model Guided Training
Papers - Robot - Research
-
Bootstrap Your Own Skills: Learning to Solve New Tasks with Large Language Model Guidance
Paper • 2310.10021 • Published • 2 -
Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
Paper • 2310.20587 • Published • 18 -
Discovering Adaptable Symbolic Algorithms from Scratch
Paper • 2307.16890 • Published • 7 -
DragAPart: Learning a Part-Level Motion Prior for Articulated Objects
Paper • 2403.15382 • Published • 11
Papers - Image - Hybrid - Hybrid Task Cascade (HTC) - Swin
Papers - Image - GasHis
Papers - Image - Dino
-
Self-Supervised Vision Transformers Learn Visual Concepts in Histopathology
Paper • 2203.00585 • Published • 2 -
Emerging Properties in Self-Supervised Vision Transformers
Paper • 2104.14294 • Published • 3 -
DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting
Paper • 2404.06903 • Published • 21 -
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
Paper • 2404.07973 • Published • 32
Papers - Text - Architecture - Scaling to 1000 Layers
Papers - DenseNet
Papers - Adversarial Testing
-
Feature-Guided Black-Box Safety Testing of Deep Neural Networks
Paper • 1710.07859 • Published • 2 -
Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks
Paper • 2309.17410 • Published • 4 -
Intriguing Properties of Adversarial Examples
Paper • 1711.02846 • Published • 2
Papers - Image - EfficientNet
Papers - Image - Compound Scaling Method
Papers - Base Models - Text - Coding
Papers - Image - Visualization - Splatting
Papers - AI - Social Risks
Papers - AI - Safety
Papers - Testing - Single Layer Model
Papers - Custom Layers
-
Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
Paper • 2310.20587 • Published • 18 -
JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention
Paper • 2310.00535 • Published • 2 -
Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla
Paper • 2307.09458 • Published • 11 -
The Impact of Depth and Width on Transformer Language Model Generalization
Paper • 2310.19956 • Published • 10
Papers - Pre-training
-
Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
Paper • 2310.20587 • Published • 18 -
Chain-of-Thought Reasoning Without Prompting
Paper • 2402.10200 • Published • 109 -
LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement
Paper • 2403.15042 • Published • 27 -
LIMA: Less Is More for Alignment
Paper • 2305.11206 • Published • 26
Papers - Motion Control
Papers - Fine-tuning
-
Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
Paper • 2310.20587 • Published • 18 -
SELF: Language-Driven Self-Evolution for Large Language Model
Paper • 2310.00533 • Published • 2 -
QLoRA: Efficient Finetuning of Quantized LLMs
Paper • 2305.14314 • Published • 56 -
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
Paper • 2309.14717 • Published • 45
Papers - Fine-tuning - LoRA
-
Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
Paper • 2310.20587 • Published • 18 -
MedAlpaca -- An Open-Source Collection of Medical Conversational AI Models and Training Data
Paper • 2304.08247 • Published • 2 -
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
Paper • 2311.03285 • Published • 32 -
WavLLM: Towards Robust and Adaptive Speech Large Language Model
Paper • 2404.00656 • Published • 11
Papers - Reinforcement Learning
-
Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
Paper • 2310.20587 • Published • 18 -
SELF: Language-Driven Self-Evolution for Large Language Model
Paper • 2310.00533 • Published • 2 -
Bigger, Better, Faster: Human-level Atari with human-level efficiency
Paper • 2305.19452 • Published • 4 -
DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search
Paper • 2408.08152 • Published • 59
Papers - AI - Self-refinement - Training and Tuning
Papers - Training
-
SELF: Language-Driven Self-Evolution for Large Language Model
Paper • 2310.00533 • Published • 2 -
GrowLength: Accelerating LLMs Pretraining by Progressively Growing Training Length
Paper • 2310.00576 • Published • 2 -
A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity
Paper • 2305.13169 • Published • 3 -
Transformers Can Achieve Length Generalization But Not Robustly
Paper • 2402.09371 • Published • 15
Papers - Audio
-
UniAudio: An Audio Foundation Model Toward Universal Audio Generation
Paper • 2310.00704 • Published • 21 -
Structural Similarities Between Language Models and Neural Response Measurements
Paper • 2306.01930 • Published • 2 -
Streaming Transformer ASR with Blockwise Synchronous Beam Search
Paper • 2006.14941 • Published • 2 -
NU-GAN: High resolution neural upsampling with GAN
Paper • 2010.11362 • Published • 2
Papers - Text - Math
-
JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention
Paper • 2310.00535 • Published • 2 -
Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification
Paper • 2308.07921 • Published • 23 -
AutoNumerics-Zero: Automated Discovery of State-of-the-Art Mathematical Functions
Paper • 2312.08472 • Published • 2 -
Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems
Paper • 2408.16293 • Published • 27
Papers - Observability and Interpretability
-
JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention
Paper • 2310.00535 • Published • 2 -
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Paper • 2211.00593 • Published • 2 -
Rethinking Interpretability in the Era of Large Language Models
Paper • 2402.01761 • Published • 23 -
Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla
Paper • 2307.09458 • Published • 11
Papers - Multimodal - Healthcare
Papers - Interpretability - DAS
Papers - Named Entity Extraction - Healthcare
Papers - Multilingual
-
A Biomedical Entity Extraction Pipeline for Oncology Health Records in Portuguese
Paper • 2304.08999 • Published • 3 -
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
Paper • 2309.09400 • Published • 85 -
Robust Open-Vocabulary Translation from Visual Text Representations
Paper • 2104.08211 • Published • 1 -
Poro 34B and the Blessing of Multilinguality
Paper • 2404.01856 • Published • 15
Papers - Healthcare
-
MedAlpaca -- An Open-Source Collection of Medical Conversational AI Models and Training Data
Paper • 2304.08247 • Published • 2 -
Structural Similarities Between Language Models and Neural Response Measurements
Paper • 2306.01930 • Published • 2 -
Multimodal ChatGPT for Medical Applications: an Experimental Study of GPT-4V
Paper • 2310.19061 • Published • 8 -
Question-Answering Model for Schizophrenia Symptoms and Their Impact on Daily Life using Mental Health Forums Data
Paper • 2310.00448 • Published
Papers - Watermark
Papers - Image - Clip
-
Demystifying CLIP Data
Paper • 2309.16671 • Published • 20 -
Model Stock: All we need is just a few fine-tuned models
Paper • 2403.19522 • Published • 13 -
Bigger is not Always Better: Scaling Properties of Latent Diffusion Models
Paper • 2404.01367 • Published • 22 -
On the Scalability of Diffusion-based Text-to-Image Generation
Paper • 2404.02883 • Published • 19
Papers - Proof of Learning
Papers - Disaster Recovery
Papers - Named Entity Extraction and Disambiguation
Papers - Neural Architecture Search - Report
Papers - Neural Architecture Search - One-shot
Papers - Neural Architecture Search - Tabular Data
Papers - Hyperparameter Architecture Search
Papers - Image - Neural Architecture Search
Papers - Neural Architecture Search - RNN
Papers - Neural Architecture Search - Reinforcement Learning
Papers - Neural Architecture Search - Quantization - FLIQS
Papers - AutoML
-
Unified Functional Hashing in Automatic Machine Learning
Paper • 2302.05433 • Published • 2 -
Self-Discover: Large Language Models Self-Compose Reasoning Structures
Paper • 2402.03620 • Published • 117 -
Semi-Supervised Semantic Segmentation using Redesigned Self-Training for White Blood Cells
Paper • 2401.07278 • Published • 2
Papers - Neural Architecture Search - AutoML
Papers - Testing - Speech and Text
Papers - AI - Are models similar to a human brain?
Papers - Automated Training - Self Discover
Papers - Math - Automated Discovery
Papers - Math - Research
-
AutoNumerics-Zero: Automated Discovery of State-of-the-Art Mathematical Functions
Paper • 2312.08472 • Published • 2 -
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
Paper • 2403.14624 • Published • 53 -
ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline
Paper • 2404.02893 • Published • 22 -
Rho-1: Not All Tokens Are What You Need
Paper • 2404.07965 • Published • 93
Papers - Alpaca
Papers - Critical Thinking - Step Back
Papers - Critical Thinking
Papers - Text - Length Generalization
Papers - Text - Encoders - Fire
Papers - Image - Multi-Image Reasoning
Papers - Image - Chain of Thought
Papers - Image - Text and Symbolic Image Generator
Models - Fine-tuning - Mixture of Loras
Papers - Multimodal - Text to 2D to 3D Mesh
Datasets - HTML
Datasets - Multimodal - Text and Image
-
Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset
Paper • 2403.09029 • Published • 55 -
HuggingFaceM4/WebSight
Viewer • Updated • 2.75M • 16.8k • 372 -
HuggingFaceM4/VLM_WebSight_finetuned
Text Generation • 8B • Updated • 584 • 190 -
laion/filtered-wit
Viewer • Updated • 2.8M • 4.67k • 10
Papers - Image - Mamba
-
LocalMamba: Visual State Space Model with Windowed Selective Scan
Paper • 2403.09338 • Published • 9 -
SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time series
Paper • 2403.15360 • Published • 13 -
MambaMixer: Efficient Selective State Space Models with Dual Token and Channel Selection
Paper • 2403.19888 • Published • 12
Papers - Image - Selective Scan
Papers - Healthcare - Mental Health
Papers - Encoders
-
Functional Interpolation for Relative Positions Improves Long Context Transformers
Paper • 2310.04418 • Published • 4 -
SPBERT: An Efficient Pre-training BERT on SPARQL Queries for Question Answering over Knowledge Graphs
Paper • 2106.09997 • Published • 2 -
Neural Machine Translation of Rare Words with Subword Units
Paper • 1508.07909 • Published • 4 -
A Multimodal Approach to Device-Directed Speech Detection with Large Language Models
Paper • 2403.14438 • Published • 2
Papers - Encoders - Fire
Papers - Video - Understanding with Many Models
Papers - Video - Understanding
-
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding
Paper • 2403.09626 • Published • 16 -
VideoAgent: Long-form Video Understanding with Large Language Model as Agent
Paper • 2403.10517 • Published • 37 -
VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis
Paper • 2403.13501 • Published • 9 -
LITA: Language Instructed Temporal-Localization Assistant
Paper • 2403.19046 • Published • 19
Papers - Image - Understanding
-
Veagle: Advancements in Multimodal Representation Learning
Paper • 2403.08773 • Published • 10 -
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
Paper • 2304.14178 • Published • 3 -
Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs
Paper • 2403.12596 • Published • 11 -
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images
Paper • 2403.11703 • Published • 17
Papers - Multimodal - Encoders
Papers - Image - GiT
Papers - Text - Star
Papers - QFormer
Papers - Image - Near Real Time
Papers - Image - Attention - Window
-
Vision Transformer with Quadrangle Attention
Paper • 2303.15105 • Published • 2 -
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Paper • 2103.14030 • Published • 5 -
MAFormer: A Transformer Network with Multi-scale Attention Fusion for Visual Recognition
Paper • 2209.01620 • Published • 2 -
CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows
Paper • 2107.00652 • Published • 2
Papers - Image - Editing
-
StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control
Paper • 2403.09055 • Published • 27 -
GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
Paper • 2112.10741 • Published • 4 -
Lightweight Image Inpainting by Stripe Window Transformer with Joint Attention to CNN
Paper • 2301.00553 • Published • 3 -
ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion
Paper • 2403.18818 • Published • 28
Papers - Image - Training - Noise
Papers - Image - LCM
-
StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control
Paper • 2403.09055 • Published • 27 -
ReNoise: Real Image Inversion Through Iterative Noising
Paper • 2403.14602 • Published • 21 -
EdgeFusion: On-Device Text-to-Image Generation
Paper • 2404.11925 • Published • 23
Papers - Image - Training - Quantized Mask
Papers - Image - Editing - Glide
Papers - Image - Training - Seed Vector
Papers - Image - Semantic Palette
Papers - Blockwise Parallel
Papers - Training - Distributed
Papers - Training - Masked Sequence Packing
Datasets - Chess
Papers - Semantic Segmentation
Papers - Training - FixMatch
Papers - Training - Self-Training - Student and Teacher
Papers - Task Assistant - ExploreLLM
Papers - Training - Guided Task Flow
Papers - Training - Problem Solving
Papers - Structured Thoughts
Papers - GUI - Task Assistants
Papers - Chinchilla
Papers - Model Scaling - Effective Parameter Count
Papers - Custom Layers - Hash Layers
Papers - Scaling
Papers - Hallucination - Reduction
Papers - Chain of Verification
Papers - Reading Comprehension
Datasets - Text - Multilingual
Papers - Training - Chain of Thought
Papers - CoT - Chain of Thought
-
Contrastive Decoding Improves Reasoning in Large Language Models
Paper • 2309.09117 • Published • 39 -
Chain-of-Thought Reasoning Without Prompting
Paper • 2402.10200 • Published • 109 -
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
Paper • 2403.14624 • Published • 53 -
Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
Paper • 2402.12875 • Published • 13
Papers - Ethics
-
Exploring Large Language Models' Cognitive Moral Development through Defining Issues Test
Paper • 2309.13356 • Published • 37 -
Unveiling Safety Vulnerabilities of Large Language Models
Paper • 2311.04124 • Published • 10 -
TrustLLM: Trustworthiness in Large Language Models
Paper • 2401.05561 • Published • 69 -
Evaluating Frontier Models for Dangerous Capabilities
Paper • 2403.13793 • Published • 7
Papers - Fine-tuning - QA-LoRA
Papers - Fine-tuning - Understanding Tables
Papers - Text - Perform Tasks on Tabular Data
-
Table-GPT: Table-tuned GPT for Diverse Table Tasks
Paper • 2310.09263 • Published • 41 -
approximatelabs/tablib-v1-full
Viewer • Updated • 10.4B • 5.33k • 64 -
approximatelabs/tablib-v1-sample
Viewer • Updated • 44.9k • 297 • 13 -
TabLib: A Dataset of 627M Tables with Context
Paper • 2310.07875 • Published • 8
Datasets - Text - Tabular
Papers - Text - Dataset - TabLib - Tabular
Papers - Qwen
-
Qwen Technical Report
Paper • 2309.16609 • Published • 37 -
Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities
Paper • 2308.12966 • Published • 11 -
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
Paper • 2311.07919 • Published • 10 -
Audio Dialogues: Dialogues dataset for audio and music understanding
Paper • 2404.07616 • Published • 16
Papers - Qwen - Report
Papers - Multimodal - Report
Significant improvements in zero-shot performance require exponentially more data, following a log-linear scaling trend.
Papers - MoE - Quantization
Papers - Attention - Custom Encoder
Papers - Research - Replacing Attention
Papers - Research - Safety
Embeddings - C4 - Jina
Papers - Reduce Model Size - SliceGPT
Papers - Decoders - CoT Decoding
Papers - Rag
-
MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries
Paper • 2401.15391 • Published • 6 -
RAFT: Adapting Language Model to Domain Specific RAG
Paper • 2403.10131 • Published • 72 -
Superposition Prompting: Improving and Accelerating Retrieval-Augmented Generation
Paper • 2404.06910 • Published • 3 -
Stylus: Automatic Adapter Selection for Diffusion Models
Paper • 2404.18928 • Published • 15
Papers - Rag - Multi-hop Queries
-
MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries
Paper • 2401.15391 • Published • 6 -
Superposition Prompting: Improving and Accelerating Retrieval-Augmented Generation
Paper • 2404.06910 • Published • 3 -
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models
Paper • 2404.18796 • Published • 71
Papers - Encoders - Coding
Embeddings - Coding
Embeddings - Coding - CodeBert
Papers - Training - Synthetic Noise
-
CodeBERT: A Pre-Trained Model for Programming and Natural Languages
Paper • 2002.08155 • Published • 2 -
Text Generation with Diffusion Language Models: A Pre-training Approach with Continuous Paragraph Denoise
Paper • 2212.11685 • Published • 2 -
ReNoise: Real Image Inversion Through Iterative Noising
Paper • 2403.14602 • Published • 21 -
ByT5: Towards a token-free future with pre-trained byte-to-byte models
Paper • 2105.13626 • Published • 3
Papers - Coding - Fill in the Middle - Infilling
Papers - Text - Pre-training - Synthetic Noise
Papers - Training - Knowledge Graphs
Papers - Image - Training - Knowledge Graphs
Papers - Image - Training - Adversarial
Papers - Multimodal - Fine-tuning - Report
Papers - Text - Tabular - Conditional Formatting
Papers - Text - Training - Code - Byte Pair Encoding
Papers - Coding - Out of Vocabulary
Papers - Coding - BPE vs Pointer Mixture Network
Papers - Automatic Speech Recognition
-
Streaming Transformer ASR with Blockwise Synchronous Beam Search
Paper • 2006.14941 • Published • 2 -
A Multimodal Approach to Device-Directed Speech Detection with Large Language Models
Paper • 2403.14438 • Published • 2 -
SONAR: Sentence-Level Multimodal and Language-Agnostic Representations
Paper • 2308.11466 • Published • 1
Papers - Automatic Speech Recognition - Beam Search
Papers - Beam Search
Papers - Explainability
-
Neural networks behave as hash encoders: An empirical study
Paper • 2101.05490 • Published • 2 -
A Multiscale Visualization of Attention in the Transformer Model
Paper • 1906.05714 • Published • 2 -
BERT Rediscovers the Classical NLP Pipeline
Paper • 1905.05950 • Published • 3 -
Using Explainable AI and Transfer Learning to understand and predict the maintenance of Atlantic blocking with limited observational data
Paper • 2404.08613 • Published • 1
Papers - Training - Synthetic Data - Sycophancy
Papers - Training - DoReMi
Papers - Training - Domain Reweighting
Papers - Training - AI training AI
-
DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining
Paper • 2305.10429 • Published • 3 -
LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement
Paper • 2403.15042 • Published • 27 -
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
Paper • 2405.01535 • Published • 124 -
Discovering Preference Optimization Algorithms with and for Large Language Models
Paper • 2406.08414 • Published • 16
Papers - Training - Proxy Model - Group DRO
Papers - Adafactor
-
Adafactor: Adaptive Learning Rates with Sublinear Memory Cost
Paper • 1804.04235 • Published • 2 -
DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining
Paper • 2305.10429 • Published • 3 -
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
Paper • 2305.13245 • Published • 6
Papers - Coding - Decoding with Static Analysis
Papers - MoE - Hashing instead of a Router
Papers - UDOP
Datasets - Multimodal - Image and Text
-
DocVQA: A Dataset for VQA on Document Images
Paper • 2007.00398 • Published • 2 -
GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering
Paper • 1902.09506 • Published • 2 -
google/wit
Viewer • Updated • 2.66M • 200 • 59 -
Lin-Chen/MMStar
Viewer • Updated • 1.5k • 10.7k • 43
Papers - Multimodal - Document and Text
Datasets - Multimodal - Document and Image
Papers - Encoder - Byte-Pair Encoding
-
Neural Machine Translation of Rare Words with Subword Units
Paper • 1508.07909 • Published • 4 -
A Formal Perspective on Byte-Pair Encoding
Paper • 2306.16837 • Published • 3 -
Byte-Pair Encoding for Text-to-SQL Generation
Paper • 1910.08962 • Published • 2 -
Pattern Discovery in Time Series with Byte Pair Encoding
Paper • 2106.00614 • Published • 2
Papers - Text - SQL
Papers - Science - Research Analysis
Papers - Training - Speculative Decoding - Single Model
Papers - Attention - Tree Attention
-
Recurrent Drafter for Fast Speculative Decoding in Large Language Models
Paper • 2403.09919 • Published • 22 -
SpecInfer: Accelerating Generative LLM Serving with Speculative Inference and Token Tree Verification
Paper • 2305.09781 • Published • 4 -
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
Paper • 2408.04093 • Published • 4
Papers - Fine-tuning - Rag
Models - Table - Extraction
-
microsoft/table-transformer-detection
Object Detection • 28.8M • Updated • 1.55M • 379 -
RUCKBReasoning/TableLLM-13b
Text Generation • 13B • Updated • 49 • 31 -
RUCKBReasoning/TableLLM-7b
Text Generation • 7B • Updated • 84 • 15 -
TahaDouaji/detr-doc-table-detection
Object Detection • 41.6M • Updated • 185k • 60
Papers - Video - Agent
Papers - Audio - GAN - Upsampling
Papers - Audio - GAN
Papers - Image - Illumination
Papers - Decoders - 3D Nerf
Papers - Image - Edit
Papers - ControlNet
-
Adding Conditional Control to Text-to-Image Diffusion Models
Paper • 2302.05543 • Published • 57 -
LightIt: Illumination Modeling and Control for Diffusion Models
Paper • 2403.10615 • Published • 18 -
SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions
Paper • 2403.16627 • Published • 21 -
DreamPolisher: Towards High-Quality Text-to-3D Generation via Geometric Diffusion
Paper • 2403.17237 • Published • 11
Papers - Fine-tuning - Parameter Efficiency
Papers - Image - Lightning
Papers - Text - 3D Mesh - Volumetric
Papers - Text - Label Generator
Papers - Image - Limited-Training
Papers - Image - Chart to Table
Papers - Image - Plot - Understanding and Reasoning
Papers - Image - 3D Asset Enhancement
Papers - Text - Taxonomy Generator
Papers - Training - Reward Model
-
PERL: Parameter Efficient Reinforcement Learning from Human Feedback
Paper • 2403.10704 • Published • 59 -
WARM: On the Benefits of Weight Averaged Reward Models
Paper • 2401.12187 • Published • 19 -
RewardBench: Evaluating Reward Models for Language Modeling
Paper • 2403.13787 • Published • 22 -
DreamReward: Text-to-3D Generation with Human Preference
Paper • 2403.14613 • Published • 37
Papers - Fine-tuning - Language Model Policy with LoRA
Papers - Fine-tuning - Mixture of LoRA (MoL)
Papers - Robotic - Observational Learning
Papers - Attention - Cross
-
Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers
Paper • 2403.12943 • Published • 15 -
Masked Audio Generation using a Single Non-Autoregressive Transformer
Paper • 2401.04577 • Published • 43 -
Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models
Paper • 2404.02747 • Published • 13 -
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation
Paper • 2404.02733 • Published • 22
Papers - Training - Skill Learning
Papers - Fine-tuning - Multi-Agent
Papers - mPlug-Owl
Papers - Image - Document - mPlugOwl
Papers - Document - mPlugOwl
Papers - Structured Learning - Document
Papers - Prompt - Prompt Compression - Report
Papers - Prompt
-
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression
Paper • 2403.12968 • Published • 25 -
Improving Text-to-Image Consistency via Automatic Prompt Optimization
Paper • 2403.17804 • Published • 20 -
The Unreasonable Effectiveness of Eccentric Automatic Prompts
Paper • 2402.10949 • Published • 5 -
Orca: Progressive Learning from Complex Explanation Traces of GPT-4
Paper • 2306.02707 • Published • 47
Papers - Image - Gaussian Splatting and NeRF
-
GaussianFlow: Splatting Gaussian Dynamics for 4D Content Creation
Paper • 2403.12365 • Published • 11 -
RadSplat: Radiance Field-Informed Gaussian Splatting for Robust Real-Time Rendering with 900+ FPS
Paper • 2403.13806 • Published • 18 -
Gaussian Frosting: Editable Complex Radiance Fields with Real-Time Rendering
Paper • 2403.14554 • Published • 14 -
DreamPolisher: Towards High-Quality Text-to-3D Generation via Geometric Diffusion
Paper • 2403.17237 • Published • 11
Models - Reverse Engineering - Decompiler
Models - Reverse Engineering
Papers - Text - 3D
-
TexDreamer: Towards Zero-Shot High-Fidelity 3D Human Texture Generation
Paper • 2403.12906 • Published • 7 -
GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation
Paper • 2403.14621 • Published • 16 -
LATTE3D: Large-scale Amortized Text-To-Enhanced3D Synthesis
Paper • 2403.15385 • Published • 8 -
LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models
Paper • 2411.09595 • Published • 77
Models - Table - Structure - Recognition
Papers - Image - Table - Extraction
Papers - Image - Table
Papers - Tabular
Converted the Elephants Never Forget paper to audio with Bark: https://drive.google.com/file/d/13IlbhKh71vxLpdYJ6mkIiiJZOUsf7XFv/view?usp=sharing
-
End-to-End Object Detection with Transformers
Paper • 2005.12872 • Published • 7 -
Elephants Never Forget: Memorization and Learning of Tabular Data in Large Language Models
Paper • 2404.06209 • Published • 5 -
TabReD: A Benchmark of Tabular Machine Learning in-the-Wild
Paper • 2406.19380 • Published • 50 -
SpreadsheetLLM: Encoding Spreadsheets for Large Language Models
Paper • 2407.09025 • Published • 139
Papers - Image - Object Detection
-
End-to-End Object Detection with Transformers
Paper • 2005.12872 • Published • 7 -
COCONut: Modernizing COCO Segmentation
Paper • 2404.08639 • Published • 30 -
Grounded Language-Image Pre-training
Paper • 2112.03857 • Published • 3 -
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
Paper • 2311.06242 • Published • 94
Models - Image - Object Detection
Papers - Benchmarks - Reward Models
Papers - 3D - Text
Papers - Science - Molecule
Papers - Frankenmerging
-
Evolutionary Optimization of Model Merging Recipes
Paper • 2403.13187 • Published • 58 -
Model Stock: All we need is just a few fine-tuned models
Paper • 2403.19522 • Published • 13 -
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
Paper • 2405.01535 • Published • 124
Papers - Image - Frankenmerging
Papers - Image - Model Merging
Papers - Attention - Grouped-Query Attention (GQA)
-
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
Paper • 2305.13245 • Published • 6 -
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models
Paper • 2404.12387 • Published • 39 -
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
Paper • 2404.14619 • Published • 126 -
Qwen2 Technical Report
Paper • 2407.10671 • Published • 166
Papers - Image - Math
-
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
Paper • 2403.14624 • Published • 53 -
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?
Paper • 2407.01284 • Published • 81 -
MAVIS: Mathematical Visual Instruction Tuning
Paper • 2407.08739 • Published • 33
Papers - Benchmarks - Math
-
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
Paper • 2403.14624 • Published • 53 -
Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs
Paper • 2312.17080 • Published • 1 -
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?
Paper • 2407.01284 • Published • 81 -
DynaMath: A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models
Paper • 2411.00836 • Published • 15
Papers - Image - Reward Model
Papers - Multimodal - Mamba
Papers - Video - Editing
Papers - Image - Personalization
Papers - Image - Personalization - Captions
Papers - Image - Blip
Papers - 3D - Reconstruction
Papers - Image - Video Generator
-
Explorative Inbetweening of Time and Space
Paper • 2403.14611 • Published • 13 -
StyleCineGAN: Landscape Cinemagraph Generation using a Pre-trained StyleGAN
Paper • 2403.14186 • Published • 10 -
Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction
Paper • 2403.18795 • Published • 20
Papers - Video - Upsampler
Papers - Video - Time Reversal Fusion
Papers - Image - Adversarial (GAN)
Papers - Image - Video - Adversarial (GAN)
Papers - Toxicity
-
Recourse for reclamation: Chatting with generative language models
Paper • 2403.14467 • Published • 8 -
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
Paper • 2403.15447 • Published • 16 -
Introducing v0.5 of the AI Safety Benchmark from MLCommons
Paper • 2404.12241 • Published • 13
Papers - Fine-tuning - Toxicity
Papers - Video - Content Motion Latent Diffusion
Papers - Decoders - Chain of Thought
Papers - Image - Depth Estimation
Papers - Image - Flow Matching
Papers - Image - Training
-
Lexicon-Level Contrastive Visual-Grounding Improves Language Modeling
Paper • 2403.14551 • Published • 2 -
Adapting LLaMA Decoder to Vision Transformer
Paper • 2404.06773 • Published • 18 -
Toward a Better Understanding of Fourier Neural Operators: Analysis and Improvement from a Spectral Perspective
Paper • 2404.07200 • Published • 2 -
An inclusive review on deep learning techniques and their scope in handwriting recognition
Paper • 2404.08011 • Published • 1
Papers - Text - Classification - Social Media
Papers - Text - Classification
-
LLM-Assisted Content Analysis: Using Large Language Models to Support Deductive Coding
Paper • 2306.14924 • Published • 2 -
When LLMs are Unfit Use FastFit: Fast and Effective Text Classification with Many Classes
Paper • 2404.12365 • Published • 1 -
In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering
Paper • 2311.06668 • Published • 5 -
Wave Network: An Ultra-Small Language Model
Paper • 2411.02674 • Published • 3
Papers - Text - Training - Classification
Papers - Audio - Training
-
A Multimodal Approach to Device-Directed Speech Detection with Large Language Models
Paper • 2403.14438 • Published • 2 -
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
Paper • 2403.17694 • Published • 12 -
FlashSpeech: Efficient Zero-Shot Speech Synthesis
Paper • 2404.14700 • Published • 32 -
SONAR: Sentence-Level Multimodal and Language-Agnostic Representations
Paper • 2308.11466 • Published • 1
Papers - Multimodal - Audio
Papers - Audio - Whisper vs Clap - Whisper wins with ASR
Papers - Encoders - Audio
Papers - ICL - In-Context Learning
-
Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models
Paper • 2311.00871 • Published • 3 -
Can large language models explore in-context?
Paper • 2403.15371 • Published • 33 -
Data Distributional Properties Drive Emergent In-Context Learning in Transformers
Paper • 2205.05055 • Published • 2 -
Long-context LLMs Struggle with Long In-context Learning
Paper • 2404.02060 • Published • 37
Papers - Math - Derive New Math - Function Class
Papers - Agent - Architecture
Papers - Agent - Memory
Papers - Fine-tuning - DPO
Refer to additional papers: https://link.springer.com/article/10.1007/s10994-014-5458-8 and https://link.springer.com/article/10.1007/BF00992696
-
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper • 2305.18290 • Published • 63 -
ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization
Paper • 2402.09320 • Published • 6 -
sDPO: Don't Use Your Data All at Once
Paper • 2403.19270 • Published • 41 -
Dueling RL: Reinforcement Learning with Trajectory Preferences
Paper • 2111.04850 • Published • 2
Papers - Critic Models
Papers - Training - Critic Model
-
CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing
Paper • 2305.11738 • Published • 8 -
Shepherd: A Critic for Language Model Generation
Paper • 2308.04592 • Published • 32 -
CriticBench: Benchmarking LLMs for Critique-Correct Reasoning
Paper • 2402.14809 • Published • 3 -
DRLC: Reinforcement Learning with Dense Rewards from LLM Critic
Paper • 2401.07382 • Published • 2
Papers - Security
-
Python Fuzzing for Trustworthy Machine Learning Frameworks
Paper • 2403.12723 • Published • 2 -
Red Teaming GPT-4V: Are GPT-4V Safe Against Uni/Multi-Modal Jailbreak Attacks?
Paper • 2404.03411 • Published • 11 -
Teams of LLM Agents can Exploit Zero-Day Vulnerabilities
Paper • 2406.01637 • Published • 2 -
AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases
Paper • 2407.12784 • Published • 51
Papers - Security - Fuzzing
Papers - Reasoning - Critic Pattern
-
CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing
Paper • 2305.11738 • Published • 8 -
CriticBench: Benchmarking LLMs for Critique-Correct Reasoning
Paper • 2402.14809 • Published • 3 -
DRLC: Reinforcement Learning with Dense Rewards from LLM Critic
Paper • 2401.07382 • Published • 2
Papers - Benchmarks - Reasoning
-
CriticBench: Benchmarking LLMs for Critique-Correct Reasoning
Paper • 2402.14809 • Published • 3 -
Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs
Paper • 2312.17080 • Published • 1 -
TACT: Advancing Complex Aggregative Reasoning with Information Extraction Tools
Paper • 2406.03618 • Published • 2
Papers - Sports
Papers - Music
Papers - Pop Culture
Papers - Coding - Chain of Thought
-
ReGAL: Refactoring Programs to Discover Generalizable Abstractions
Paper • 2401.16467 • Published • 10 -
Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models
Paper • 2404.02575 • Published • 50 -
How Far Can We Go with Practical Function-Level Program Repair?
Paper • 2404.12833 • Published • 7
Papers - Coding - Training
Papers - Coding - Fine-tuning
-
V-STaR: Training Verifiers for Self-Taught Reasoners
Paper • 2402.06457 • Published • 9 -
Advancing LLM Reasoning Generalists with Preference Trees
Paper • 2404.02078 • Published • 46 -
McEval: Massively Multilingual Code Evaluation
Paper • 2406.07436 • Published • 41 -
Is Programming by Example solved by LLMs?
Paper • 2406.08316 • Published • 13
Papers - Coding - Reasoning
-
V-STaR: Training Verifiers for Self-Taught Reasoners
Paper • 2402.06457 • Published • 9 -
Advancing LLM Reasoning Generalists with Preference Trees
Paper • 2404.02078 • Published • 46 -
Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models
Paper • 2404.02575 • Published • 50
Papers - Fine-tuning - Reasoning
-
V-STaR: Training Verifiers for Self-Taught Reasoners
Paper • 2402.06457 • Published • 9 -
Self-Consistency Preference Optimization
Paper • 2411.04109 • Published • 19 -
Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions
Paper • 2411.14405 • Published • 61 -
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper • 2402.03300 • Published • 129
Papers - Video - Streaming
Papers - Mamba - FFT - EinFFT
Papers - Encoders - Video
Papers - Multimodal - Video - Text - Audio
Papers - Multimodal - Captions - Audio
Papers - Multimodal - Captions - Speech
Papers - Multimodal - Captions - Video
Papers - Synthetic Data - Multimodal
Papers - 3D
-
ThemeStation: Generating Theme-Aware 3D Assets from Few Exemplars
Paper • 2403.15383 • Published • 15 -
FlexiDreamer: Single Image-to-3D Generation with FlexiCubes
Paper • 2404.00987 • Published • 23 -
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs
Paper • 2402.15627 • Published • 38 -
Interactive3D: Create What You Want by Interactive 3D Generation
Paper • 2404.16510 • Published • 21
Papers - 3D - Synthetic Data
-
ThemeStation: Generating Theme-Aware 3D Assets from Few Exemplars
Paper • 2403.15383 • Published • 15 -
LATTE3D: Large-scale Amortized Text-To-Enhanced3D Synthesis
Paper • 2403.15385 • Published • 8 -
MaPa: Text-driven Photorealistic Material Painting for 3D Shapes
Paper • 2404.17569 • Published • 13 -
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning
Paper • 2404.16994 • Published • 36
Papers - Document - Understanding
-
FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions
Paper • 2403.15246 • Published • 11 -
Noise-Aware Training of Layout-Aware Language Models
Paper • 2404.00488 • Published • 10 -
SnapKV: LLM Knows What You are Looking for Before Generation
Paper • 2404.14469 • Published • 27
Papers - Documents - Fine-tuning
-
FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions
Paper • 2403.15246 • Published • 11 -
Noise-Aware Training of Layout-Aware Language Models
Paper • 2404.00488 • Published • 10 -
Text Role Classification in Scientific Charts Using Multimodal Transformers
Paper • 2402.14579 • Published • 1
Papers - Compiler
Papers - Coding - Compiler
-
Compiler generated feedback for Large Language Models
Paper • 2403.14714 • Published • 7 -
Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models
Paper • 2404.02575 • Published • 50 -
Compiling C to Safe Rust, Formalized
Paper • 2412.15042 • Published • 1
Papers - LLVM
Papers - Training - Teacher Model
Papers - Tree of Thoughts
Papers - Searchformer
Papers - Coding - Stack Traces
Papers - Training Research - Stack Traces
Papers - Fine-tuning - Search Based
Papers - Fine-tuning - Procedure Cloning
Papers - Encoders - T5
Papers - Decoders - T5
Papers - T5
-
Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping
Paper • 2402.14083 • Published • 48 -
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
Paper • 2305.13245 • Published • 6 -
Training a T5 Using Lab-sized Resources
Paper • 2208.12097 • Published • 1 -
Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints
Paper • 2212.05055 • Published • 6
Papers - DenseFormer
Papers - Training - Weighted Average
Papers - Encoders - Image - Clip
-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 28 -
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens
Paper • 2404.03413 • Published • 28 -
openai/clip-vit-large-patch14-336
Zero-Shot Image Classification • Updated • 3.36M • 277 -
openai/clip-vit-base-patch32
Zero-Shot Image Classification • Updated • 19.8M • 793
Papers - Training - Fitness Score
Papers - Training Research - Exemplary Prompts
Papers - Fine-tuning - Prompts
Models - TTS
Models - T5
Models - Documents
Papers - Encoders - VAE
Papers - Agent - Operating Systems
Papers - Image - Synthetic Data - Human Faces
Papers - Multilingual - Japanese
Papers - Fine-tuning - Multilingual
Papers - Document - Understanding - Historical Images Text
Papers - SAM - Segment Anything Model
-
Prompt me a Dataset: An investigation of text-image prompting for historical image dataset creation using foundation models
Paper • 2309.01674 • Published • 2 -
Segment Anything
Paper • 2304.02643 • Published • 4 -
EgoLifter: Open-world 3D Segmentation for Egocentric Perception
Paper • 2403.18118 • Published • 12 -
A Multimodal Automated Interpretability Agent
Paper • 2404.14394 • Published • 22
Papers - Image - Historical
-
Insightful analysis of historical sources at scales beyond human capabilities using unsupervised Machine Learning and XAI
Paper • 2310.09091 • Published • 2 -
Evolution and Transformation of Scientific Knowledge over the Sphaera Corpus: A Network Study
Paper • 2004.00520 • Published • 2 -
NAF-DPM: A Nonlinear Activation-Free Diffusion Probabilistic Model for Document Enhancement
Paper • 2404.05669 • Published • 1
Papers - Image - Explainability
Papers - Image - VGG
Papers - Image - Pattern Recognition
Papers - Image - Historical - Symbolic and Artistic
Papers - Training - Distribution-based
Papers - Research - Emergent Properties
Papers - Image - In-Context Learning
Papers - DeepMind - ICL vs RNN vs LSTM
Papers - DeepMind - ICL Rule-based Classification
Papers - DeepMind - ICL Small Models are More Exemplar-Based
Spaces - Decoders - Beam Search Visualizer
Spaces - Decoders - Beam
Spaces - Decoders
Papers - Video - NeRF
Papers - FAIR
Papers - Fine-tuning - Model Layer Pruning
Papers - Healthcare - Text - Antibodies
Papers - Intel - MLP
Papers - Performance - Intel
Papers - Image - Prompt
Papers - VQA
Papers - Fine-tuning - SFT
-
InternLM2 Technical Report
Paper • 2403.17297 • Published • 34 -
sDPO: Don't Use Your Data All at Once
Paper • 2403.19270 • Published • 41 -
Learn Your Reference Model for Real Good Alignment
Paper • 2404.09656 • Published • 89 -
OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data
Paper • 2404.12195 • Published • 12
Papers - Fine-tuning - Report
Papers - Text - Video Generator
Papers - Video - Enhance
Papers - Image - Gaussian Splatting - 2D
Papers - Meta
-
LIMA: Less Is More for Alignment
Paper • 2305.11206 • Published • 26 -
Garment3DGen: 3D Garment Stylization and Texture Generation
Paper • 2403.18816 • Published • 25 -
EgoLifter: Open-world 3D Segmentation for Egocentric Perception
Paper • 2403.18118 • Published • 12 -
The Unreasonable Ineffectiveness of the Deeper Layers
Paper • 2403.17887 • Published • 82
Papers - Audio - Image
Papers - Image - Avatar Generator
Papers - Training Research - Audio
Papers - Healthcare - Synthetic Data Generator - 3D
Could also use a DNA repo such as https://github.com/koeng101/dnadesign
Models - Image - Streaming
Datasets - Fine-tuning
Datasets - Meta
Papers - University - MIT
-
One-step Diffusion with Distribution Matching Distillation
Paper • 2311.18828 • Published • 3 -
The Unreasonable Ineffectiveness of the Deeper Layers
Paper • 2403.17887 • Published • 82 -
Condition-Aware Neural Network for Controlled Image Generation
Paper • 2404.01143 • Published • 13 -
Locating and Editing Factual Associations in GPT
Paper • 2202.05262 • Published • 1
Papers - Google
-
Lumiere: A Space-Time Diffusion Model for Video Generation
Paper • 2401.12945 • Published • 86 -
Long-form factuality in large language models
Paper • 2403.18802 • Published • 26 -
ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion
Paper • 2403.18818 • Published • 28 -
TC4D: Trajectory-Conditioned Text-to-4D Generation
Paper • 2403.17920 • Published • 18
Papers - Image - MultiDiffusion
Papers - Imagen
Papers - Convert - T2I to T2V
Papers - University - University of California Berkeley
-
The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
Paper • 1801.03924 • Published • 2 -
LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement
Paper • 2403.15042 • Published • 27 -
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
Paper • 1712.05884 • Published • 3 -
Efficient Memory Management for Large Language Model Serving with PagedAttention
Paper • 2309.06180 • Published • 25
Papers - OpenAI
-
The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
Paper • 1801.03924 • Published • 2 -
Fine-Tuning Language Models from Human Preferences
Paper • 1909.08593 • Published • 3 -
Training Verifiers to Solve Math Word Problems
Paper • 2110.14168 • Published • 4 -
Learning Transferable Visual Models From Natural Language Supervision
Paper • 2103.00020 • Published • 17
Papers - Adobe
Papers - RWKV
Papers - 3DGS
-
Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction
Paper • 2403.18795 • Published • 20 -
EgoLifter: Open-world 3D Segmentation for Egocentric Perception
Paper • 2403.18118 • Published • 12 -
GaussianCube: Structuring Gaussian Splatting using Optimal Transport for 3D Generative Modeling
Paper • 2403.19655 • Published • 19 -
Snap-it, Tap-it, Splat-it: Tactile-Informed 3D Gaussian Splatting for Reconstructing Challenging Surfaces
Paper • 2403.20275 • Published • 10
Papers - Text - Fact Checking
Papers - Text - Factuality
Papers - Healthcare - Text
Papers - Healthcare - Training Research
Papers - University - Stanford University
-
BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text
Paper • 2403.18421 • Published • 23 -
Long-form factuality in large language models
Paper • 2403.18802 • Published • 26 -
stanford-crfm/BioMedLM
Text Generation • Updated • 1.83k • 440 -
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper • 2305.18290 • Published • 63
Papers - DataBricks
Models - Healthcare
Papers - Image - Generator - Large Resolution
Papers - Encoders - Synthetic Noise
Papers - Apple
-
Towards a World-English Language Model for On-Device Virtual Assistants
Paper • 2403.18783 • Published • 6 -
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Paper • 2403.09611 • Published • 129 -
ReALM: Reference Resolution As Language Modeling
Paper • 2403.20329 • Published • 22 -
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
Paper • 2404.05719 • Published • 82
Papers - Video - Clothing
Papers - Encoders - Video - MetaCLIP
Papers - IoT - Assistant
Papers - Training Research - Mixture FOFE
Papers - Training Research - AD FOFE
Papers - Image - Editing - Object Removal
Papers - Image - Editing - Object Insertion
Papers - Image - Editing - Counterfactual Supervision
Papers - 3DGS - Feature Rendering
Papers - 3DGS - Open-world Segmentation
Papers - 3DGS - Security Camera Object Detection
Papers - Microsoft
-
Can large language models explore in-context?
Paper • 2403.15371 • Published • 33 -
GaussianCube: Structuring Gaussian Splatting using Optimal Transport for 3D Generative Modeling
Paper • 2403.19655 • Published • 19 -
WavLLM: Towards Robust and Adaptive Speech Large Language Model
Paper • 2404.00656 • Published • 11 -
Enabling Memory Safety of C Programs using LLMs
Paper • 2404.01096 • Published • 1
Papers - University - Carnegie Mellon University
-
Can large language models explore in-context?
Paper • 2403.15371 • Published • 33 -
Long-context LLMs Struggle with Long In-context Learning
Paper • 2404.02060 • Published • 37 -
PIQA: Reasoning about Physical Commonsense in Natural Language
Paper • 1911.11641 • Published • 3 -
AQuA: A Benchmarking Tool for Label Quality Assessment
Paper • 2306.09467 • Published • 1
Papers - Healthcare - Image Analysis
-
Generating Synthetic Computed Tomography for Radiotherapy: SynthRAD2023 Challenge Report
Paper • 2403.08447 • Published • 2 -
Deformable MRI Sequence Registration for AI-based Prostate Cancer Diagnosis
Paper • 2404.09666 • Published • 1 -
Surgical SAM 2: Real-time Segment Anything in Surgical Video by Efficient Frame Pruning
Paper • 2408.07931 • Published • 22
Papers - Healthcare - Image - SynthRAD2023
Papers - Healthcare - Image - CT
Models - MoE - GQA
Papers - Image - Segmentation - Bounding Box Infilling
Models - MoE - Coding
Papers - Image - Translation
Papers - Text - Translation
Papers - Multilingual - German
Papers - Image - Synthetic Noise
Papers - Multilingual - Translation
Papers - Johns Hopkins
Papers - Multilingual - Synthetic Noise
Papers - Intel
-
Mesh2NeRF: Direct Mesh Supervision for Neural Radiance Field Representation and Generation
Paper • 2403.19319 • Published • 14 -
Getting it Right: Improving Spatial Consistency in Text-to-Image Models
Paper • 2404.01197 • Published • 31 -
LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model
Paper • 2404.01331 • Published • 27 -
LVLM-Intrepret: An Interpretability Tool for Large Vision-Language Models
Paper • 2404.03118 • Published • 26
Papers - Fine-tuning - Text - U-Net
Papers - Image - Encoders - Text
Papers - Image - Encoders - Clip
-
TextCraftor: Your Text Encoder Can be Image Quality Controller
Paper • 2403.18978 • Published • 15 -
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation
Paper • 2404.02733 • Published • 22 -
OmniFusion Technical Report
Paper • 2404.06212 • Published • 77 -
Transferable and Principled Efficiency for Open-Vocabulary Segmentation
Paper • 2404.07448 • Published • 12
Papers - Video - Reasoning - Time of Events
Papers - Video - Encoders
Papers - Video - Training - Understanding Time
Papers - Nvidia
-
LITA: Language Instructed Temporal-Localization Assistant
Paper • 2403.19046 • Published • 19 -
Snap-it, Tap-it, Splat-it: Tactile-Informed 3D Gaussian Splatting for Reconstructing Challenging Surfaces
Paper • 2403.20275 • Published • 10 -
Condition-Aware Neural Network for Controlled Image Generation
Paper • 2404.01143 • Published • 13 -
CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues
Paper • 2404.03820 • Published • 26
Papers - U-Net - 3D
Papers - 3DGS - 3D Mesh Generator
Models - Fine-tuning
Papers - Model - SFT - Alpaca and DPO - Solar
Papers - Fine-tuning - Preference-based RL (PbRL)
-
Dueling RL: Reinforcement Learning with Trajectory Preferences
Paper • 2111.04850 • Published • 2 -
Learning Trajectory Preferences for Manipulators via Iterative Improvement
Paper • 1306.6294 • Published • 3 -
Deep reinforcement learning from human preferences
Paper • 1706.03741 • Published • 4 -
Learning Dynamic Robot-to-Human Object Handover from Human Feedback
Paper • 1603.06390 • Published • 2
Papers - University - Cornell University
-
Learning Trajectory Preferences for Manipulators via Iterative Improvement
Paper • 1306.6294 • Published • 3 -
MambaMixer: Efficient Selective State Space Models with Dual Token and Channel Selection
Paper • 2403.19888 • Published • 12 -
RL for Consistency Models: Faster Reward Guided Text-to-Image Generation
Paper • 2404.03673 • Published • 16 -
PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation
Paper • 2404.13026 • Published • 24
Papers - Robotics - Fine-tuning - PbRL
Papers - Fine-tuning - DPO - Reward Model Training
Papers - Reward Model
-
Fine-Tuning Language Models from Human Preferences
Paper • 1909.08593 • Published • 3 -
Transforming and Combining Rewards for Aligning Large Language Models
Paper • 2402.00742 • Published • 12 -
Leverage the Average: an Analysis of KL Regularization in RL
Paper • 2003.14089 • Published • 2 -
Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward
Paper • 2404.01258 • Published • 12
Papers - Reward Model - Bradley-Terry
https://web.stanford.edu/class/archive/stats/stats200/stats200.1172/Lecture24.pdf
-
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper • 2305.18290 • Published • 63 -
HyperCLOVA X Technical Report
Paper • 2404.01954 • Published • 25 -
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization
Paper • 2404.09956 • Published • 12 -
Learn Your Reference Model for Real Good Alignment
Paper • 2404.09656 • Published • 89
Papers - Reward Model - Training
-
Transforming and Combining Rewards for Aligning Large Language Models
Paper • 2402.00742 • Published • 12 -
UltraFeedback: Boosting Language Models with High-quality Feedback
Paper • 2310.01377 • Published • 5 -
Learn Your Reference Model for Real Good Alignment
Paper • 2404.09656 • Published • 89 -
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
Paper • 2405.01535 • Published • 124
Papers - University of Chicago
Papers - Reward Models - KL Regularization - RL
Papers - KL Regularization - ADP - Con/Divergence Error Rate
Models - Fine-tuning - PPO
Papers - Fine-tuning - Factuality
Papers - Fine-tuning - Emulator
Datasets - RLHF
Datasets - Fine-tuning - RLHF
Papers - top-p - Nucleus Sampling
Papers - top-k - Flat (good) vs Peaked (bad) Dist Sampling
Figure 5: The probability mass assigned to partial human sentences. Flat distributions lead to many
moderately probable tokens, while peaked distribut
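The flat-versus-peaked contrast in the caption above can be illustrated with a minimal sketch. It assumes the standard definitions: top-k keeps a fixed number of tokens, and nucleus (top-p) sampling keeps the smallest set of most-probable tokens whose cumulative mass reaches p. The function names and the two toy distributions are hypothetical and are not taken from the paper.

```python
def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize their mass."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}


def top_p_filter(probs, p):
    """Keep the smallest set of most-probable tokens with cumulative mass >= p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, mass = [], 0.0
    for i in order:
        keep.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return {i: probs[i] / mass for i in keep}


# Toy next-token distributions (hypothetical values for illustration only).
flat = [0.125] * 8
peaked = [0.9, 0.04, 0.03, 0.01, 0.01, 0.005, 0.003, 0.002]

# top-k always keeps k tokens, whatever the shape of the distribution.
print(len(top_k_filter(flat, 3)), len(top_k_filter(peaked, 3)))

# top-p adapts: many tokens survive when flat, very few when peaked.
print(len(top_p_filter(flat, 0.9)), len(top_p_filter(peaked, 0.9)))
```

With a fixed k, a flat distribution loses plausible tokens and a peaked one admits unlikely ones, while the top-p candidate set grows and shrinks with the shape of the distribution.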
Papers - Distribution - Zipf Analysis
Papers - Institute - Allen Institute
-
The Curious Case of Neural Text Degeneration
Paper • 1904.09751 • Published • 3 -
PIQA: Reasoning about Physical Commonsense in Natural Language
Paper • 1911.11641 • Published • 3 -
SocialIQA: Commonsense Reasoning about Social Interactions
Paper • 1904.09728 • Published • 3 -
HellaSwag: Can a Machine Really Finish Your Sentence?
Paper • 1905.07830 • Published • 6
Papers - University - University of Washington
-
The Curious Case of Neural Text Degeneration
Paper • 1904.09751 • Published • 3 -
Getting it Right: Improving Spatial Consistency in Text-to-Image Models
Paper • 2404.01197 • Published • 31 -
BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions
Paper • 1905.10044 • Published • 2 -
PIQA: Reasoning about Physical Commonsense in Natural Language
Paper • 1911.11641 • Published • 3
Models - 1bit
Models - Bitnet - Text
Papers - Coding - Unit Tests
Papers - Tacotron 2
Spectrogram Prediction Network
As in Tacotron, mel spectrograms are computed through a short-time Fourier transform (STFT) ... and a Hann window func
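As a rough illustration of the STFT step quoted above, the sketch below applies a Hann window to overlapping frames and takes the magnitude of each frame's discrete Fourier transform. The frame size, hop size, and test signal are toy values chosen for the example, not the settings elided in the quote, and the mel filterbank that would follow is omitted.

```python
import cmath
import math


def hann(n_fft):
    """Periodic Hann window of length n_fft."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]


def stft_magnitude(signal, n_fft, hop):
    """Magnitude STFT: one list of n_fft // 2 + 1 bin magnitudes per frame."""
    window = hann(n_fft)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(n_fft)]
        mags = []
        for k in range(n_fft // 2 + 1):
            # Naive DFT for clarity; a real pipeline would use an FFT.
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                for n in range(n_fft)
            )
            mags.append(abs(acc))
        frames.append(mags)
    return frames


# Toy input (assumption): a sine with exactly 4 cycles per 32-sample frame.
n_fft, hop = 32, 8
signal = [math.sin(2 * math.pi * 4 * n / n_fft) for n in range(128)]
spec = stft_magnitude(signal, n_fft, hop)

# The strongest bin in every frame should be the sine's frequency bin.
peak_bins = [max(range(len(f)), key=f.__getitem__) for f in spec]
print(len(spec), len(spec[0]), set(peak_bins))
```

A mel spectrogram would then map these linear-frequency magnitudes onto mel-spaced bands. That step is left out here to keep the sketch short.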
Papers - Audio - WaveNet
Papers - Audio - Time Domain Waveforms
Papers - Audio - TTS
-
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
Paper • 1712.05884 • Published • 3 -
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
Paper • 2403.16973 • Published • 2 -
High Fidelity Neural Audio Compression
Paper • 2210.13438 • Published • 4 -
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis
Paper • 2404.03204 • Published • 10
Papers - Audio - Mel Spectrogram
-
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
Paper • 1712.05884 • Published • 3 -
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization
Paper • 2404.09956 • Published • 12 -
Music Consistency Models
Paper • 2404.13358 • Published • 14 -
FlashSpeech: Efficient Zero-Shot Speech Synthesis
Paper • 2404.14700 • Published • 32
Papers - Decoders - Audio
Papers - GAN
Papers - Image - GAN
Papers - GAN - Compression - Bitstream
Papers - GAN - Compression
Papers - Audio - STT - ASR
-
WhisperX: Time-Accurate Speech Transcription of Long-Form Audio
Paper • 2303.00747 • Published • 5 -
Custom Data Augmentation for low resource ASR using Bark and Retrieval-Based Voice Conversion
Paper • 2311.14836 • Published • 2 -
SONAR: Sentence-Level Multimodal and Language-Agnostic Representations
Paper • 2308.11466 • Published • 1 -
W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training
Paper • 2108.06209 • Published • 1
Papers - Audio - Speech Transcription
Papers - Audio - WhisperX
Papers - Audio - Voice Activity Detection
Papers - Audio - VoiceCraft
Models - Audio - TTS
Papers - Audio - Compression
Models - Audio
Models - Audio - Codec
Models - Audio - Encoders
Models - Audio - Decoders
Models - FAIR
Models - Meta - FAIR
Models - Audio - Music Generator
Models - Getting Started - Pre-training
Models - TinyLlama
-
keeeeenw/MicroLlama
Text Generation • 0.3B • Updated • 1.47k • 51 -
TinyLlama/TinyLlama-1.1B-intermediate-step-240k-503b
Text Generation • 1B • Updated • 458 • 20 -
TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
Text Generation • 1B • Updated • 20.6k • 180 -
TinyLlama/TinyLlama-1.1B-Chat-v1.0
Text Generation • 1B • Updated • 4.23M • 1.44k
Models - Reward Model
Models - Starling
Datasets - Chat - RLHF
Datasets - Starling
Papers - Audio - Masked Language Model
Papers - Audio - Residual Vector Quantization
Papers - Audio - Encoders
Models - Image - Object Detection - DETR
Models - ResNet
Papers - Audio - Inference - Rescore Models
Papers - Inference - Rescore Models
Inference - Autoregressive and Non-Autoregressive Models
Papers - Kyutai
https://kyutai.org/
Models - Text - Music Generator
Models - Audio - Hybrid - AR with NAR Models
Papers - Touch
Papers - MoE - Mamba
Papers - Flan-T5
Papers - IoT - Screen Usage Understanding and Context
Papers - Mobile - User Entity Context Understanding
Papers - Mamba - Limitations - In-Context Learning (ICL)
Models - MoE - Mamba
Papers - AI21 Labs
Papers - University of Tokyo
Papers - S-Lab
Papers - Duke
Papers - University of Wisconsin
Papers - Image - Report
Papers - Hallucinations
Papers - Trustworthiness
Papers - University of Bristol
Papers - Healthcare - Surgical Gestures
Papers - Vanderbilt
Papers - Fine-tuning - Dataset - Few-Shot Retrieval (FRet)
Papers - University - New York University
-
MambaMixer: Efficient Selective State Space Models with Dual Token and Channel Selection
Paper • 2403.19888 • Published • 12 -
Measuring Style Similarity in Diffusion Models
Paper • 2404.01292 • Published • 17 -
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Paper • 1804.07461 • Published • 4 -
MoDE: CLIP Data Experts via Clustering
Paper • 2404.16030 • Published • 15
Papers - Embeddings
-
Gecko: Versatile Text Embeddings Distilled from Large Language Models
Paper • 2403.20327 • Published • 48 -
Round and Round We Go! What makes Rotary Positional Encodings useful?
Paper • 2410.06205 • Published • 2 -
Byte Latent Transformer: Patches Scale Better Than Tokens
Paper • 2412.09871 • Published • 108 -
MrT5: Dynamic Token Merging for Efficient Byte-level Language Models
Paper • 2410.20771 • Published • 3
Papers - Embeddings - Text
-
Gecko: Versatile Text Embeddings Distilled from Large Language Models
Paper • 2403.20327 • Published • 48 -
2D Matryoshka Sentence Embeddings
Paper • 2402.14776 • Published • 6 -
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
Paper • 2412.13663 • Published • 156
Papers - Text - Memorization
Gradients flow differently for memorized and non-memorized text during decoding
Papers - Training a 2.8B Model in 38 days
Papers - Huawei
-
DiJiang: Efficient Large Language Models through Compact Kernelization
Paper • 2403.19928 • Published • 12 -
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
Paper • 2309.12284 • Published • 18 -
TextHawk: Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models
Paper • 2404.09204 • Published • 11 -
SAGS: Structure-Aware 3D Gaussian Splatting
Paper • 2404.19149 • Published • 14
Papers - vLLM
Papers - Inference - vLLM
Papers - Attention - PagedAttention
Papers - Fine-tuning - Model Merge
Papers - Frankenmerge - Model Stock - Use Fine-tuned Models
Papers - Naver
Models - Model Stock
Models - Frankenmerge
Models - Frankenmerge - Model Stock
Papers - Benchmarks
-
The FinBen: An Holistic Financial Benchmark for Large Language Models
Paper • 2402.12659 • Published • 23 -
Long-context LLMs Struggle with Long In-context Learning
Paper • 2404.02060 • Published • 37 -
Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs
Paper • 2312.17080 • Published • 1 -
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Paper • 1804.07461 • Published • 4
Papers - Benchmarks - Financials
Papers - 1bit
-
DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients
Paper • 1606.06160 • Published • 1 -
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper • 2402.17764 • Published • 625 -
mobiuslabsgmbh/Llama-2-7b-chat-hf_1bitgs8_hqq
Text Generation • Updated • 53 • 74
Models - 2bit
Papers - Video - Fine-tuning
Papers - Video - Reward Model
Models - Spright
Papers - ASU
Papers - Hugging Face
Papers - University of Maryland
Papers - University - Tsinghua University
-
Condition-Aware Neural Network for Controlled Image Generation
Paper • 2404.01143 • Published • 13 -
FlexiDreamer: Single Image-to-3D Generation with FlexiCubes
Paper • 2404.00987 • Published • 23 -
Advancing LLM Reasoning Generalists with Preference Trees
Paper • 2404.02078 • Published • 46 -
ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline
Paper • 2404.02893 • Published • 22
Papers - Chinese Academy of Sciences
Papers - Xidian University
Papers - 3D - FlexiCubes
Gradient-based surface extraction method
Papers - ShengShu
Papers - Fine-tuning - Llava - DPO
Papers - Non-Autoregressive Transformers
Papers - Salesforce
-
Non-Autoregressive Neural Machine Translation
Paper • 1711.02281 • Published • 1 -
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation
Paper • 2107.07651 • Published • 1 -
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Paper • 2404.07972 • Published • 50 -
AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning
Paper • 2402.15506 • Published • 18
Papers - Safety
Papers - Speech - Chain of Thought
Papers - Audio - Chain of Thought
Papers - Chinese University of Hong Kong
-
WavLLM: Towards Robust and Adaptive Speech Large Language Model
Paper • 2404.00656 • Published • 11 -
CameraCtrl: Enabling Camera Control for Text-to-Video Generation
Paper • 2404.02101 • Published • 24 -
Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs
Paper • 2312.17080 • Published • 1
Papers - Audio - Fine-tuning
-
WavLLM: Towards Robust and Adaptive Speech Large Language Model
Paper • 2404.00656 • Published • 11 -
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization
Paper • 2404.09956 • Published • 12 -
Long-form music generation with latent diffusion
Paper • 2404.10301 • Published • 27 -
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
Paper • 2006.11477 • Published • 7
Papers - Audio - Fine-tuning - Lora
Papers - Image - Continual Training Framework
Papers - Documents - LayoutLM
-
Noise-Aware Training of Layout-Aware Language Models
Paper • 2404.00488 • Published • 10 -
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking
Paper • 2204.08387 • Published • 5 -
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding
Paper • 2012.14740 • Published • 2 -
LayoutLM: Pre-training of Text and Layout for Document Image Understanding
Paper • 1912.13318 • Published • 4
Papers - Documents - FormNet
-
Noise-Aware Training of Layout-Aware Language Models
Paper • 2404.00488 • Published • 10 -
FormNetV2: Multimodal Graph Contrastive Learning for Form Document Information Extraction
Paper • 2305.02549 • Published • 6 -
FormNet: Structural Encoding beyond Sequential Modeling in Form Document Information Extraction
Paper • 2203.08411 • Published • 1 -
ETC: Encoding Long and Structured Inputs in Transformers
Paper • 2004.08483 • Published • 1
Papers - Document - OCR
-
Noise-Aware Training of Layout-Aware Language Models
Paper • 2404.00488 • Published • 10 -
FormNet: Structural Encoding beyond Sequential Modeling in Form Document Information Extraction
Paper • 2203.08411 • Published • 1 -
FormNetV2: Multimodal Graph Contrastive Learning for Form Document Information Extraction
Paper • 2305.02549 • Published • 6 -
ETC: Encoding Long and Structured Inputs in Transformers
Paper • 2004.08483 • Published • 1
Papers - Ohio State
Papers - Video - Captions
-
Streaming Dense Video Captioning
Paper • 2404.01297 • Published • 13 -
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning
Paper • 2404.16994 • Published • 36 -
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
Paper • 2406.04325 • Published • 75
Papers - Video - Streaming - Captions
Papers - Decoders - Training Decoding Point Supervision
Papers - Healthcare - Cardiac MRI - CMRxRecon Challenge 2023
Papers - Image - Healthcare - Cardiac MRI
Papers - Image - Healthcare
-
The state-of-the-art in Cardiac MRI Reconstruction: Results of the CMRxRecon Challenge in MICCAI 2023
Paper • 2404.01082 • Published • 1 -
Realism in Action: Anomaly-Aware Diagnosis of Brain Tumors from Medical Images Using YOLOv8 and DeiT
Paper • 2401.03302 • Published • 1 -
Brain2Music: Reconstructing Music from Human Brain Activity
Paper • 2307.11078 • Published • 41
Papers - Training Research - Optimizers
Papers - Coding - C/C++ - Memory
Papers - Coding - C/C++
Papers - Coding - Annotations, Decorators and Captions
Papers - Coding - Operating Systems - Memory
Papers - Image - Contrastive Graph Learning
Papers - Extended Transformer Construction
-
FormNetV2: Multimodal Graph Contrastive Learning for Form Document Information Extraction
Paper • 2305.02549 • Published • 6 -
FormNet: Structural Encoding beyond Sequential Modeling in Form Document Information Extraction
Paper • 2203.08411 • Published • 1 -
ETC: Encoding Long and Structured Inputs in Transformers
Paper • 2004.08483 • Published • 1 -
LongT5: Efficient Text-To-Text Transformer for Long Sequences
Paper • 2112.07916 • Published • 2
Papers - Documents - Tabular
-
FormNetV2: Multimodal Graph Contrastive Learning for Form Document Information Extraction
Paper • 2305.02549 • Published • 6 -
FormNet: Structural Encoding beyond Sequential Modeling in Form Document Information Extraction
Paper • 2203.08411 • Published • 1 -
More efficient manual review of automatically transcribed tabular data
Paper • 2306.16126 • Published • 1 -
CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents
Paper • 2004.12629 • Published • 3
Papers - Graph Convolutional Network
Papers - Documents - Graph Convolutional Network
Papers - Training Research - Contrastive Predictive Coding
Papers - Decoders - Bert
Papers - Optimizers - Adafactor
Papers - T5 - MoE
Papers - Georgia Tech
Papers - Image - Extract Style
Papers - Image - Contrastive Style Descriptors
Papers - Image - Use a Model to find a similar image
https://github.com/learn2phoenix/CSD
Papers - Ellis Institute
Papers - Shanghai AI Laboratory
-
CameraCtrl: Enabling Camera Control for Text-to-Video Generation
Paper • 2404.02101 • Published • 24 -
Adapting LLaMA Decoder to Vision Transformer
Paper • 2404.06773 • Published • 18 -
Interactive3D: Create What You Want by Interactive 3D Generation
Paper • 2404.16510 • Published • 21 -
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B
Paper • 2406.07394 • Published • 29
Papers - Image - Security Cameras
Papers - Government - USA
Papers - University - University of Waterloo
-
Long-context LLMs Struggle with Long In-context Learning
Paper • 2404.02060 • Published • 37 -
Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks
Paper • 2211.12588 • Published • 3 -
StructLM: Towards Building Generalist Models for Structured Knowledge Grounding
Paper • 2402.16671 • Published • 29 -
Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model
Paper • 2404.04167 • Published • 14
Papers - Vector Institute
-
Long-context LLMs Struggle with Long In-context Learning
Paper • 2404.02060 • Published • 37 -
Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks
Paper • 2211.12588 • Published • 3 -
StructLM: Towards Building Generalist Models for Structured Knowledge Grounding
Paper • 2402.16671 • Published • 29 -
Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model
Paper • 2404.04167 • Published • 14
Papers - Benchmarks - Text
Papers - Benchmarks - In-Context Learning
Papers - Benchmarks - Text - Long Context
Models - Documents - OCR
Models - Text - Classifier - Zero-Shot
Models - Text - Classifier - Deberta
https://github.com/MoritzLaurer/zeroshot-classifier/tree/main
Papers - Network - Adaptive BitRate Algorithms
Papers - Network Traffic - 4G and 5G - OTA - Packet Shaping
Papers - Network Traffic - 4G and 5G - OTA
Papers - Network Traffic - 4G and 5G
Papers - Network Traffic - OTA
Papers - Network Traffic - Packet Shaping
Papers - Network Traffic - Transport Optimization
Papers - Network Traffic
Papers - University of Texas
Papers - Peking University
-
LLM-ABR: Designing Adaptive Bitrate Algorithms via Large Language Models
Paper • 2404.01617 • Published • 8 -
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
Paper • 2404.02905 • Published • 74 -
Learning From Mistakes Makes LLM Better Reasoner
Paper • 2310.20689 • Published • 29 -
Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model
Paper • 2404.04167 • Published • 14
Papers - Coding - Preference Trees
Papers - Coding - Understanding Tree Structures
Papers - Math - Reasoning
-
Advancing LLM Reasoning Generalists with Preference Trees
Paper • 2404.02078 • Published • 46 -
ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline
Paper • 2404.02893 • Published • 22 -
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
Paper • 2309.12284 • Published • 18 -
Premise Order Matters in Reasoning with Large Language Models
Paper • 2402.08939 • Published • 28
Papers - University - University of Illinois
-
Advancing LLM Reasoning Generalists with Preference Trees
Paper • 2404.02078 • Published • 46 -
PointInfinity: Resolution-Invariant Point Diffusion Models
Paper • 2404.03566 • Published • 16 -
MonoPatchNeRF: Improving Neural Radiance Fields with Patch-based Monocular Guidance
Paper • 2404.08252 • Published • 6 -
SnapKV: LLM Knows What You are Looking for Before Generation
Paper • 2404.14469 • Published • 27
Papers - University - Northeastern University
-
Advancing LLM Reasoning Generalists with Preference Trees
Paper • 2404.02078 • Published • 46 -
Locating and Editing Factual Associations in Mamba
Paper • 2404.03646 • Published • 3 -
Locating and Editing Factual Associations in GPT
Paper • 2202.05262 • Published • 1 -
KAN: Kolmogorov-Arnold Networks
Paper • 2404.19756 • Published • 115
Papers - Multilingual - Finnish
Papers - Multilingual - Encoders - BPE
Papers - LLaVA
-
LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model
Paper • 2404.01331 • Published • 27 -
LVLM-Intrepret: An Interpretability Tool for Large Vision-Language Models
Paper • 2404.03118 • Published • 26 -
DesignQA: A Multimodal Benchmark for Evaluating Large Language Models' Understanding of Engineering Documentation
Paper • 2404.07917 • Published • 2 -
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
Paper • 2404.07973 • Published • 32
Papers - Gemma
Papers - Multimodal - Training
-
LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model
Paper • 2404.01331 • Published • 27 -
Data curation via joint example selection further accelerates multimodal learning
Paper • 2406.17711 • Published • 3 -
Unveiling Encoder-Free Vision-Language Models
Paper • 2406.11832 • Published • 54
Papers - Encoders - DinoV2
Papers - Image - Encoders - DinoV2
-
LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model
Paper • 2404.01331 • Published • 27 -
OmniFusion Technical Report
Paper • 2404.06212 • Published • 77 -
MoDE: CLIP Data Experts via Clustering
Paper • 2404.16030 • Published • 15 -
WildGaussians: 3D Gaussian Splatting in the Wild
Paper • 2407.08447 • Published • 9
Papers - Training Research - Scaling Properties - T2I
Papers - Training Research - Smaller vs Larger Models
Papers - Pre-training - In-filling - PSM and SPM ordering
Papers - Pre-training - Dynamic Context Length
For HyperCLOVA X, pre-training was split 90% at a 4096-token context length and 10% at a 32k context length.
Papers - Text - Supervised Fine-tuning
Papers - Text - Supervised Fine-tuning - Batch Grouping
Batches are grouped by similar token length to make better use of GPU/hardware. Mini-batches differ in sequence length, but the maximum number of tokens per batch is the same.
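Grouping by token length under a fixed token budget can be sketched as follows. This is a minimal illustration, not the pipeline from the report: the function name, the padded-cost rule (longest sample times batch size), and the sample lengths are assumptions made for the example.

```python
from typing import List, Sequence


def group_by_token_budget(lengths: Sequence[int], max_tokens: int) -> List[List[int]]:
    """Return batches of sample indices whose padded size fits in max_tokens."""
    # Sort by length so samples of similar length land in the same batch.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches: List[List[int]] = []
    current: List[int] = []
    for idx in order:
        if lengths[idx] > max_tokens:
            raise ValueError(f"sample {idx} exceeds the token budget")
        # Ascending order means the new sample is the longest in the batch,
        # so the padded cost is its length times the new batch size.
        padded_cost = lengths[idx] * (len(current) + 1)
        if current and padded_cost > max_tokens:
            batches.append(current)
            current = []
        current.append(idx)
    if current:
        batches.append(current)
    return batches


# Hypothetical token counts for six samples and a 200-token budget.
lengths = [5, 100, 7, 98, 6, 50]
batches = group_by_token_budget(lengths, max_tokens=200)
for batch in batches:
    longest = max(lengths[i] for i in batch)
    print(batch, "padded tokens:", longest * len(batch))
```

Short samples end up in batches with many rows and long samples in batches with few rows, so each step processes a similar number of tokens and little capacity is spent on padding.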
Papers - Fine-tuning - PPO
-
HyperCLOVA X Technical Report
Paper • 2404.01954 • Published • 25 -
UltraFeedback: Boosting Language Models with High-quality Feedback
Paper • 2310.01377 • Published • 5 -
AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback
Paper • 2305.14387 • Published • 1 -
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper • 2402.03300 • Published • 129
Papers - Multilingual - Benchmarks
Papers - Amazon
-
On the Scalability of Diffusion-based Text-to-Image Generation
Paper • 2404.02883 • Published • 19 -
MonoPatchNeRF: Improving Neural Radiance Fields with Patch-based Monocular Guidance
Paper • 2404.08252 • Published • 6 -
Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning
Paper • 2303.15647 • Published • 4
Papers - Image - SDXL
-
On the Scalability of Diffusion-based Text-to-Image Generation
Paper • 2404.02883 • Published • 19 -
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation
Paper • 2404.02733 • Published • 22 -
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching
Paper • 2404.03653 • Published • 36 -
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback
Paper • 2404.07987 • Published • 48
Papers - ByteDance
-
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
Paper • 2404.02905 • Published • 74 -
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback
Paper • 2404.07987 • Published • 48 -
COCONut: Modernizing COCO Segmentation
Paper • 2404.08639 • Published • 30 -
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs
Paper • 2402.15627 • Published • 38
Papers - Video - Autoregressive Model
Papers - Inference - Performance
-
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
Paper • 2404.02905 • Published • 74 -
On Speculative Decoding for Multimodal Large Language Models
Paper • 2404.08856 • Published • 13 -
Hydragen: High-Throughput LLM Inference with Shared Prefixes
Paper • 2402.05099 • Published • 20
Papers - Coding - Algorithmic Reasoning
Papers - Coding - Think and Execute vs CoT and PoTs
Papers - Coding - Program of Thoughts (PoT)
Papers - Coding - Think and Execute - 7B vs 13B vs GPT
Papers - Prompts - Detailed Examples
Papers - Infra - Cost - Automatic Compute Planning
Papers - Mixture of Depths - MLP, residuals, router, tokens
Papers - MoD - Router
Papers - Yonsei University
Papers - Image - NeRF
-
Freditor: High-Fidelity and Transferable NeRF Editing by Frequency Decomposition
Paper • 2404.02514 • Published • 11 -
MonoPatchNeRF: Improving Neural Radiance Fields with Patch-based Monocular Guidance
Paper • 2404.08252 • Published • 6 -
Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video
Paper • 2404.09833 • Published • 30 -
MeshLRM: Large Reconstruction Model for High-Quality Mesh
Paper • 2404.12385 • Published • 27
Papers - Alibaba
-
Freditor: High-Fidelity and Transferable NeRF Editing by Frequency Decomposition
Paper • 2404.02514 • Published • 11 -
BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer
Paper • 1904.06690 • Published • 1 -
Contrastive Chain-of-Thought Prompting
Paper • 2311.09277 • Published • 36 -
LLM-R2: A Large Language Model Enhanced Rule-based Rewrite System for Boosting Query Efficiency
Paper • 2404.12872 • Published • 11
Papers - University - Fudan University
-
Freditor: High-Fidelity and Transferable NeRF Editing by Frequency Decomposition
Paper • 2404.02514 • Published • 11 -
Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model
Paper • 2404.04167 • Published • 14 -
Length Generalization of Causal Transformers without Position Encoding
Paper • 2404.12224 • Published • 1 -
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B
Paper • 2406.07394 • Published • 29
Papers - Image - Frequency Decomposition
Papers - Image - Demosaic
Papers - University - Hong Kong University of Science and Te
-
Event Camera Demosaicing via Swin Transformer and Pixel-focus Loss
Paper • 2404.02731 • Published • 1 -
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
Paper • 2309.12284 • Published • 18 -
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis
Paper • 2404.03204 • Published • 10 -
Adapting LLaMA Decoder to Vision Transformer
Paper • 2404.06773 • Published • 18
Papers - Image - Interior Design
Papers - 3D - Interior Design
Papers - ETH Zurich
-
I-Design: Personalized LLM Interior Designer
Paper • 2404.02838 • Published • 2 -
Scaling MLPs: A Tale of Inductive Bias
Paper • 2306.13575 • Published • 16 -
Fast Feedforward Networks
Paper • 2308.14711 • Published • 3 -
How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study
Paper • 2404.14047 • Published • 45
Papers - 3D - Indoor Scene Synthesis
Datasets - Reasoning
Papers - Reasoning - Self-Reference Metalinguistic
Papers - University - University of California San Diego
-
I am a Strange Dataset: Metalinguistic Tests for Language Models
Paper • 2401.05300 • Published • 3 -
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
Paper • 2404.08801 • Published • 66 -
SpecInfer: Accelerating Generative LLM Serving with Speculative Inference and Token Tree Verification
Paper • 2305.09781 • Published • 4 -
MeshLRM: Large Reconstruction Model for High-Quality Mesh
Paper • 2404.12385 • Published • 27
Papers - PlayTest AI
Papers - Contextual AI
Papers - Reasoning - MRGSM8k - Meta Math Multi Step
Papers - Reasoning - GSM8k
-
Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs
Paper • 2312.17080 • Published • 1 -
Premise Order Matters in Reasoning with Large Language Models
Paper • 2402.08939 • Published • 28 -
Reasoning in Large Language Models: A Geometric Perspective
Paper • 2407.02678 • Published • 1 -
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper • 2402.03300 • Published • 129
Papers - Tencent
-
Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs
Paper • 2312.17080 • Published • 1 -
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
Paper • 2404.12253 • Published • 55 -
SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension
Paper • 2404.16790 • Published • 10 -
A Thorough Examination of Decoding Methods in the Era of LLMs
Paper • 2402.06925 • Published • 1
Papers - Benchmarks - GSM8k
-
Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs
Paper • 2312.17080 • Published • 1 -
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B
Paper • 2406.07394 • Published • 29 -
Qwen2 Technical Report
Paper • 2407.10671 • Published • 166 -
Self-Consistency Preference Optimization
Paper • 2411.04109 • Published • 19
Datasets - Reasoning - Meta Math Multi-Step - GSM8k
Datasets - Math - Meta Context Reasoning
Papers - University of Cambridge
Papers - Southern University of Science and Technology
Papers - Alan Turing Institute
Papers - Max Planck Institute
Datasets - Text - QA
Datasets - Text - System Chat
Models - Image - Handwriting Comprehension
Models - Table - Handwriting Comprehension
Papers - Arctic University of Norway
Papers - Document - Tabular - Manual Review
Repo: https://github.com/HistLab/More-efficient-manual-review-of-automatically-transcribed-tabular-data
Papers - Documents - Tabular - Census
Papers - Image - Custom Annotation and Labeling Tools
Papers - Documents - Custom Annotation and Labeling Tools
Papers - Image - Tabular
Papers - CascadeTabNet
Papers - Image - OCR
-
CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents
Paper • 2004.12629 • Published • 3 -
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking
Paper • 2204.08387 • Published • 5 -
Text Role Classification in Scientific Charts Using Multimodal Transformers
Paper • 2402.14579 • Published • 1 -
An inclusive review on deep learning techniques and their scope in handwriting recognition
Paper • 2404.08011 • Published • 1
Papers - Pune Institute
Papers - Image - Table Structure Recognition
Papers - Documents - Table Recognition - Fine-tuning
Papers - Image - Fine-tuning - Tables
Papers - Image - OCR - Tesseract for Text Location
Papers - Document AI
-
LayoutLM: Pre-training of Text and Layout for Document Image Understanding
Paper • 1912.13318 • Published • 4 -
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding
Paper • 2012.14740 • Published • 2 -
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking
Paper • 2204.08387 • Published • 5
Papers - Harbin Institute
Papers - Coding - Benchmarks - Report
Papers - Coding - OpenCodeInterpreter
Papers - Benchmarks - Coding
-
CodeEditorBench: Evaluating Code Editing Capability of Large Language Models
Paper • 2404.03543 • Published • 18 -
McEval: Massively Multilingual Code Evaluation
Paper • 2406.07436 • Published • 41 -
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
Paper • 2406.15877 • Published • 48 -
Qwen2 Technical Report
Paper • 2407.10671 • Published • 166
Papers - Coding - Training - Equal-Info Windows
Table 5: Transformers struggle to learn Arithmetic Coding. In the sequence-to-sequence setting,
a model that learns AC compression/decompression shoul
Papers - Coding - Multi-Model Inference
Papers - Coding - Distributed - Adaptive Computation Time
Papers - Anthropic
Papers - Training Research - Compression and Multi-Model Inf
Papers - Coding - Encoders
Papers - Encoders - Compression
Emergence with scale is unlikely. Given the recent findings of [55], we anticipate that continuing
to scale models beyond 2 billion parameters is unlik
Papers - Coding - Compression
Papers - Tokenizer - Neural Compression
Papers - Inference - Multi-Model
Papers - Fine-tuning - ReFT
In this paper, we propose a strong alternative to PEFTs, LoReFT. LoReFT achieves strong performance across benchmarks from four domains while being
Papers - Fine-tuning - Report - Llama 7B and 13B
Datasets - Reasoning - Commonsense
Papers - Tokenizers - Roberta
Papers - Reasoning - Commonsense
-
SocialIQA: Commonsense Reasoning about Social Interactions
Paper • 1904.09728 • Published • 3 -
PIQA: Reasoning about Physical Commonsense in Natural Language
Paper • 1911.11641 • Published • 3 -
BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions
Paper • 1905.10044 • Published • 2 -
HellaSwag: Can a Machine Really Finish Your Sentence?
Paper • 1905.07830 • Published • 6
Papers - Reasoning - Social IQ
Papers - University of Houston
Papers - Image - Classifier - Label Quality Assessment
Datasets - Reasoning - Math
Papers - Benchmarks - Image - Labels
Papers - Benchmarks - Image
-
AQuA: A Benchmarking Tool for Label Quality Assessment
Paper • 2306.09467 • Published • 1 -
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Paper • 2404.07972 • Published • 50 -
BLINK: Multimodal Large Language Models Can See but Not Perceive
Paper • 2404.12390 • Published • 26 -
Vision language models are blind
Paper • 2407.06581 • Published • 84
Papers - Reasoning - Math
MAWPS paper: https://aclanthology.org/N16-1136.pdf
-
Program Induction by Rationale Generation : Learning to Solve and Explain Algebraic Word Problems
Paper • 1705.04146 • Published • 1 -
Training Verifiers to Solve Math Word Problems
Paper • 2110.14168 • Published • 4 -
Explaining Math Word Problem Solvers
Paper • 2307.13128 • Published • 1 -
MathQA: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms
Paper • 1905.13319 • Published • 2
Papers - Reasoning - Math - AQuA
https://github.com/google-deepmind/AQuA
Papers - University of Oxford
-
Program Induction by Rationale Generation : Learning to Solve and Explain Algebraic Word Problems
Paper • 1705.04146 • Published • 1 -
Red Teaming GPT-4V: Are GPT-4V Safe Against Uni/Multi-Modal Jailbreak Attacks?
Paper • 2404.03411 • Published • 11 -
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance
Paper • 2404.04125 • Published • 29 -
Hydragen: High-Throughput LLM Inference with Shared Prefixes
Paper • 2402.05099 • Published • 20
Papers - University of IAIR Xi’an Jiaotong
Papers - Training - Instruction-Following
Alpaca eval: https://github.com/tatsu-lab/alpaca_eval
Datasets - Text - Instruction-following
Papers - RLHF
-
UltraFeedback: Boosting Language Models with High-quality Feedback
Paper • 2310.01377 • Published • 5 -
Learn Your Reference Model for Real Good Alignment
Paper • 2404.09656 • Published • 89 -
Natural Language Reinforcement Learning
Paper • 2411.14251 • Published • 31 -
Group Robust Preference Optimization in Reward-free RLHF
Paper • 2405.20304 • Published • 1
Papers - Benchmarks - Text - General Language Understanding
Papers - Benchmarks - Text - Glue
Datasets - Benchmarks - Glue
Datasets - Benchmarks - Text
Papers - Encoders - Roberta
-
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Paper • 1907.11692 • Published • 9 -
Leveraging Pre-trained Checkpoints for Sequence Generation Tasks
Paper • 1907.12461 • Published • 1 -
Transformer Language Models without Positional Encodings Still Learn Positional Information
Paper • 2203.16634 • Published • 5 -
CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation
Paper • 2102.04664 • Published • 2
Papers - Reasoning - Program of Thoughts
Papers - University of California Santa Barbara
Papers - StructLM - Understanding Structured Data
Models - StructLM
Datasets - Text - StructLM
Papers - Prompts - System Chat
Papers - Prompts - Chain of Thought
Papers - Tokenizers - LLaMA Byte Pair Encoding (BPE)
Datasets - OCR - Image with Text from Textract
Datasets - Documents - OCR - Image with Text from Textract
Papers - Benchmarks - Web Browsing Tasks
Papers - University - Harvard University
-
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens
Paper • 2404.03413 • Published • 28 -
Scaling Data-Constrained Language Models
Paper • 2305.16264 • Published • 16 -
Emergence of Hidden Capabilities: Exploring Learning Dynamics in Concept Space
Paper • 2406.19370 • Published • 1
Papers - Kaust
Papers - Image - Point Cloud
Papers - Video - MultiView Compressive Coding (MCC)
Papers - Image - Encoders - RGB-D
Papers - Image - Training - Low Res Predicts High Res
Papers - University - Beihang University
Papers - Tokenizers - Documents - TrOCR
Papers - Tokenizers - Image - TrOCR
Papers - Tokenizers - Image - Handwriting
Spaces - Image - Handwriting Recognition
Papers - University of Zhejiang
Papers - Audio - Text to Speech
-
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis
Paper • 2404.03204 • Published • 10 -
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
Paper • 2311.07919 • Published • 10 -
FlashSpeech: Efficient Zero-Shot Speech Synthesis
Paper • 2404.14700 • Published • 32
Papers - Audio - TTS - VALL-E
Papers - Audio - TTS - RALL-E
Papers - Security - Jailbreak
-
Red Teaming GPT-4V: Are GPT-4V Safe Against Uni/Multi-Modal Jailbreak Attacks?
Paper • 2404.03411 • Published • 11 -
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
Paper • 2404.13208 • Published • 40 -
A False Sense of Safety: Unsafe Information Leakage in 'Safe' AI Responses
Paper • 2407.02551 • Published • 9
Papers - Benchmark - Security
Papers - LMU Munich
Papers - Siemens
Papers - University of Wuhan
Papers - Munich Center for Machine Learning (MCML)
Papers - Benchmarks - Website Navigation
Papers - Web Navigation - Chrome Extension
Papers - Web - Recognition
Papers - Web - Training - Curriculum Learning
Papers - Fine-tuning - Rejection Sampling (RFT)
Papers - Zhipu AI
Models - General Purpose
-
CohereLabs/c4ai-command-r-plus
Text Generation • 104B • Updated • 2.89k • 1.76k -
CohereLabs/c4ai-command-r-plus-4bit
Text Generation • 55B • Updated • 22 • 256 -
mistral-community/Mixtral-8x22B-v0.1-4bit
Text Generation • 73B • Updated • 32 • 55 -
CohereLabs/c4ai-command-r-v01
Text Generation • 35B • Updated • 11.5k • 1.1k
Datasets - Benchmarks - CodeEditorBench - OCI
Models - Chat
Models - Text - Image
Models - Multimodal - Chat
Models - Audio - Understanding
Models - Synthetic Data - Audio
Models - Audio - Edit with Text
Models - Audio - Classification and Segmentation
Models - Image - Chat
Models - Image - Synthetic Data
Spaces - Image - Chat
Papers - Audio - Understanding
Papers - Audio - Captions
Spaces - Qwen - Image
Datasets - SQL
Models - Audio - STT - ASR
-
openai/whisper-large-v3
Automatic Speech Recognition • 2B • Updated • 4.17M • 5.03k -
openai/whisper-tiny
Automatic Speech Recognition • 37.8M • Updated • 609k • 374 -
openai/whisper-small
Automatic Speech Recognition • 0.2B • Updated • 2.46M • 469 -
openai/whisper-medium
Automatic Speech Recognition • 0.8B • Updated • 771k • 267
Papers - Redwood Research
Papers - Automated Interpretability
OpenAI has a 2024 tool referring to this technique: https://github.com/openai/transformer-debugger with https://transformer-circuits.pub/2023/monosema
Models - Encoders - Bidirectional
Models - Encoders - Bert
Papers - Text - Encoders - Image - Clip
Papers - Training Research - Rank-One Model Editing
Papers - Training Research - Mamba
Papers - Training Research - Ablation - Mamba
Papers - Training Research - Ablation - Factuality
Papers - Training Research - Weights - Activation Patching
Papers - Training Research - Interpretability
Papers - Interpretability - Rome - Factuality Editing
Website: https://rome.baulab.info/
Papers - Interpretability
-
Prompt-to-Prompt Image Editing with Cross Attention Control
Paper • 2208.01626 • Published • 2 -
BERT Rediscovers the Classical NLP Pipeline
Paper • 1905.05950 • Published • 3 -
A Multiscale Visualization of Attention in the Transformer Model
Paper • 1906.05714 • Published • 2 -
Analyzing Transformers in Embedding Space
Paper • 2209.02535 • Published • 3
Papers - University of Tel-Aviv
-
Analyzing Transformers in Embedding Space
Paper • 2209.02535 • Published • 3 -
Prompt-to-Prompt Image Editing with Cross Attention Control
Paper • 2208.01626 • Published • 2 -
Dynamic Typography: Bringing Words to Life
Paper • 2404.11614 • Published • 45 -
Transformer Language Models without Positional Encodings Still Learn Positional Information
Paper • 2203.16634 • Published • 5
Papers - Interpretability - Attention
Papers - University of Brown
Papers - Training Research - Layer Understanding
Papers - Interpretability - Prompts
Papers - Image - Imagen
Papers - Training Research - Control Attention Reweighting
Papers - Attention - Weights - Re-Weighting
Papers - Training Research - Text - Token Visualization
https://github.com/jessevig/bertviz
Datasets - Image - ImageNet
Datasets - Image
Papers - Recommendation - Cloze Task
Papers - Recommendation - Encoders - Bert
FFNs: uses the smoother GELU activation instead of ReLU
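To illustrate the GELU versus ReLU note above, here is a small sketch using the exact (erf-based) GELU definition, x * Phi(x), where Phi is the standard normal CDF. The function names and sample inputs are chosen for illustration only and are not taken from the papers in this collection.

```python
import math


def relu(x: float) -> float:
    """ReLU: hard cutoff at zero."""
    return max(0.0, x)


def gelu(x: float) -> float:
    """Exact GELU: x times the standard normal CDF evaluated at x."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"x={x:+.1f}  relu={relu(x):.4f}  gelu={gelu(x):.4f}")
```

Unlike ReLU, GELU is smooth around zero and lets small negative values pass through slightly attenuated rather than clipping them to exactly zero.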
Papers - Recommendation
Papers - Recommendation - Multi-Task Learning
Papers - Recommendation - Bert4rec - SASRec
Papers - Recommendation - RTG Balancing
Papers - University of Zurich
Papers - Healthcare - Radiology
Papers - University - Shanghai Jiao Tong University
-
DeViDe: Faceted medical knowledge for improved medical vision-language pre-training
Paper • 2404.03618 • Published • 2 -
Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model
Paper • 2404.04167 • Published • 14 -
SpecInfer: Accelerating Generative LLM Serving with Speculative Inference and Token Tree Verification
Paper • 2305.09781 • Published • 4 -
McEval: Massively Multilingual Code Evaluation
Paper • 2406.07436 • Published • 41
Papers - Training Research - Pre-training - ALBEF
Papers - Training Research - Vision Language Pre-training
Papers - Pre-training - ALBEF - Multimodal Encoder
Papers - Multimodal - Encoders - ALBEF
Papers - Dataset - MultiModal - MultiLingual - Wiki
Papers - Fine-tuning - RLHF - Direct Nash Optimization (DNO)
Reward expressed as win-rates related to general preferences
Papers - RLHF - Iterative Contrastive Self-Improvement
A batched on-policy algorithm that conducts self-improvement iteratively via contrastive learning
Datasets - Text - Alpaca
Papers - RL - Consistency Model (RLCM)
Models inference as a multi-step Markov Decision Process, allowing one to fine-tune consistency models toward a downstream task using just a reward function.
Papers - Fine-tuning - Image - Prompt Image Alignment
Papers - Harvey Mudd
Papers - Fine-tuning - Stream of Search
Papers - Training Research - Search Based (BFS / DFS)
Focuses on policy improvement through search-based sampling
Models - Text - Science
Papers - University of Tubingen
Papers - HKUST
Papers - Kuaishou
Papers - Text - Dialog Inpainting
Papers - 3DGS - Motion Blur
Papers - 3DGS - Color Transformation
Papers - Image - Encoders - RGB-T (Thermal)
Papers - University of Dalian
Models - Image - Stock Market - Pattern Detection
Papers - Audio - Encoders - HuBert with EnCodec
Papers - Audio - Bark
Papers - Mobile - Multimodal - Screen Image with Captions
Papers - Training Research - DeiT
Papers - Healthcare - DeiT
Papers - Image - Object Detection - YoloV8
Papers - Healthcare - Image - Cancer - Brain
Papers - Image - Hybrid - DeiT and YoloV8
Papers - Image - Healthcare - DICOM
Papers - Image - Healthcare - PTP Metrics
Papers - Image - DeiT
-
Realism in Action: Anomaly-Aware Diagnosis of Brain Tumors from Medical Images Using YOLOv8 and DeiT
Paper • 2401.03302 • Published • 1 -
MLP Can Be A Good Transformer Learner
Paper • 2404.05657 • Published • 1 -
Detecting and recognizing characters in Greek papyri with YOLOv8, DeiT and SimCLR
Paper • 2401.12513 • Published • 1 -
DeiT-LT Distillation Strikes Back for Vision Transformer Training on Long-Tailed Datasets
Paper • 2404.02900 • Published • 1
Papers - Custom Layers - MLP
-
MLP Can Be A Good Transformer Learner
Paper • 2404.05657 • Published • 1 -
Toward a Better Understanding of Fourier Neural Operators: Analysis and Improvement from a Spectral Perspective
Paper • 2404.07200 • Published • 2 -
An inclusive review on deep learning techniques and their scope in handwriting recognition
Paper • 2404.08011 • Published • 1 -
Long-form music generation with latent diffusion
Paper • 2404.10301 • Published • 27
Papers - University of Melbourne
Papers - Multilingual - Image - Greek
Papers - Indian Institute of Technology
Papers - Indian Institute of Science
Papers - University of Sorbonne
Papers - Regularization - LayerScale
Papers - Regularization - Binary Cross Entropy
Models - Image - DeiT
Models - Image - Classification
Papers - Image - Report - VQA
Papers - Image - Training - Mistral
Papers - AIRI Institute
Papers - Sber AI
Papers - Skoltech
Papers - Image - LLaVA
Papers - Image - Coco Testing
-
Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion
Paper • 2310.03502 • Published • 78 -
Transferable and Principled Efficiency for Open-Vocabulary Segmentation
Paper • 2404.07448 • Published • 12 -
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
Paper • 2404.07973 • Published • 32 -
COCONut: Modernizing COCO Segmentation
Paper • 2404.08639 • Published • 30
Papers - Image - Clip - Coco Testing
-
Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion
Paper • 2310.03502 • Published • 78 -
Transferable and Principled Efficiency for Open-Vocabulary Segmentation
Paper • 2404.07448 • Published • 12 -
RegionGPT: Towards Region Understanding Vision Language Model
Paper • 2403.02330 • Published • 2 -
GLIGEN: Open-Set Grounded Text-to-Image Generation
Paper • 2301.07093 • Published • 4
Papers - Image - Frechet Inception Distance (FID)
https://machinelearningmastery.com/how-to-implement-the-frechet-inception-distance-fid-from-scratch/
-
Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion
Paper • 2310.03502 • Published • 78 -
GLIGEN: Open-Set Grounded Text-to-Image Generation
Paper • 2301.07093 • Published • 4 -
Music Consistency Models
Paper • 2404.13358 • Published • 14 -
Align Your Steps: Optimizing Sampling Schedules in Diffusion Models
Paper • 2404.14507 • Published • 23
Papers - Training - Long Context
-
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
Paper • 2404.07143 • Published • 111 -
RULER: What's the Real Context Size of Your Long-Context Language Models?
Paper • 2404.06654 • Published • 39 -
An Evolved Universal Transformer Memory
Paper • 2410.13166 • Published • 6
Papers - Benchmark - Context
Papers - Benchmarks - Context - Ruler
Papers - Image - Decoders
Papers - Image - Decoders - ViT
Papers - Training - Image - Causal Self Attention
Papers - Image - Training - AS2D RoPE and SwiGLU
Papers - Training - Detailed Appendices
Papers - Image - Encoders - ViT
-
DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting
Paper • 2404.06903 • Published • 21 -
CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
Paper • 2404.15653 • Published • 29 -
MoDE: CLIP Data Experts via Clustering
Paper • 2404.16030 • Published • 15 -
BlenderAlchemy: Editing 3D Graphics with Vision-Language Models
Paper • 2404.17672 • Published • 19
Papers - 3D - Panoramic View Generator
Papers - Image - Training - Self Refinement
Papers - Training - Noisy or Unseen Data Drops Accuracy 6%
Papers - Image - Object Detection - DETR
-
End-to-End Object Detection with Transformers
Paper • 2005.12872 • Published • 7 -
ConsistencyDet: Robust Object Detector with Denoising Paradigm of Consistency Model
Paper • 2404.07773 • Published • 1 -
Efficient Transformer Encoders for Mask2Former-style models
Paper • 2404.15244 • Published • 1 -
DETRs Beat YOLOs on Real-time Object Detection
Paper • 2304.08069 • Published • 14
Spaces - Healthcare - Multimodal
Papers - Text - Social Skills
Papers - Fine-tuning - Orpo
Papers - KAIST AI
-
ORPO: Monolithic Preference Optimization without Reference Model
Paper • 2403.07691 • Published • 69 -
ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models
Paper • 2404.07738 • Published • 2 -
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
Paper • 2405.01535 • Published • 124
Papers - Image - Fourier Neural Operators (FNO) vs CNNs
Papers - Image - FNO - Low and High Frequency Data
Papers - Image - Training - Training with an Ensemble
Papers - Image - FNO - SpecBoost Ensemble
Papers - Image - Differential Equations - FNO - ReLu
Papers - Image - Spectral Analysis
Papers - Rag - Prompts
Papers - Rag - Multiple Documents in Parallel
Papers - Tokens - Path Equilibrium Positioning
Like coupled masses connected by springs
Papers - Tokens - Real-Valued Positioning
Papers - Model - Griffin
Papers - Models - Griffin - RecurrentGemma
Models - Mistral - Orpo
Papers - Fine-tuning - ControlNet
Papers - University of Central Florida
Papers - Reward Model - Consistency Loss - ControlNet
Papers - Audio - Datasets - Dialog
Papers - Qwen - Audio
Papers - Advanced Micro Devices
Papers - Image - Auto - Lane Detection
Papers - Image - Auto - Lane - Training Segmentation
Papers - Operating Systems
Papers - Agents - Operating Systems
Papers - Benchmarks - Agent - Multimodal - Tasks
Papers - University of Aalto
Papers - University - Princeton University
-
JetMoE: Reaching Llama2 Performance with 0.1M Dollars
Paper • 2404.07413 • Published • 38 -
Allowing humans to interactively guide machines where to look does not always improve a human-AI team's classification accuracy
Paper • 2404.05238 • Published • 3 -
Cognitive Architectures for Language Agents
Paper • 2309.02427 • Published • 8 -
Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings
Paper • 2305.13571 • Published • 2
Papers - Megatron
Papers - Attention - Mixture of Attention Heads (MoA)
Generalized multi head using RoPE
Papers - DiffusionDet
Papers - Image - Generator - Gaussian Noise - Bounding Boxes
Papers - Image - Ordinary Differential Equations (ODE)
-
ConsistencyDet: Robust Object Detector with Denoising Paradigm of Consistency Model
Paper • 2404.07773 • Published • 1 -
Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis
Paper • 2404.13686 • Published • 28 -
Align Your Steps: Optimizing Sampling Schedules in Diffusion Models
Paper • 2404.14507 • Published • 23 -
NAF-DPM: A Nonlinear Activation-Free Diffusion Probabilistic Model for Document Enhancement
Paper • 2404.05669 • Published • 1
Papers - Image - Object Detection - Bounding Boxes
Papers - Image - Bounding Boxes - Loss - Timeseries
Datasets - Image - Coco - Obj Det, Segmentation, Captions
Models - Image - Image Segmentation - Coco
-
facebook/maskformer-swin-base-coco
Image Segmentation • 0.1B • Updated • 1.53k • 26 -
facebook/mask2former-swin-small-coco-panoptic
Image Segmentation • 68.8M • Updated • 661 • 1 -
facebook/mask2former-swin-tiny-coco-instance
Image Segmentation • 47.5M • Updated • 65.3k • 11 -
facebook/mask2former-swin-small-ade-semantic
Image Segmentation • 68.8M • Updated • 10.5k • 8
Models - Image - DPT - Dino
Papers - Image - ConsistencyDet
Audio reading using Bark: https://drive.google.com/file/d/1AlHLzeUd04LXgDj99SOvmQJTy9chufGo/view?usp=sharing
Papers - Image - TrOCR
Read by Bark: https://drive.google.com/file/d/1apmyvLMEQ97ObHKzQna9URFHF0Xg-EsO/view?usp=sharing
Models - Rag
Models - Mistral
Models - Image - Clip
Models - Image - Dino
Models - Agent
Models - Agent - On-Device
Spaces - Comics
Papers - Chain of Thoughts - Visualization
Papers - Visualization of Thought (VoT) - Mind’s Eye
Papers - Benchmarks - Documentation
Papers - Benchmark - Multimodal - Image Documentation
Papers - AutoDesk
Papers - Investing - Stock Forecasting
Papers - University of Shenzhen
Papers - Investing - AceFormer - ACEEMD
Papers - Image - Knowledge Graph
Papers - Agent
-
ODA: Observation-Driven Agent for integrating LLMs and Knowledge Graphs
Paper • 2404.07677 • Published • 1 -
ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models
Paper • 2404.07738 • Published • 2 -
Scaling Instructable Agents Across Many Simulated Worlds
Paper • 2404.10179 • Published • 28 -
A Multimodal Automated Interpretability Agent
Paper • 2404.14394 • Published • 22
Papers - Knowledge Graph - Tasks
Papers - Panasonic
Papers - University of Xiamen
Papers - Selective Language Modeling vs Causal
Papers - Fine-tuning - Math
-
Rho-1: Not All Tokens Are What You Need
Paper • 2404.07965 • Published • 93 -
Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems
Paper • 2408.16293 • Published • 27 -
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper • 2402.03300 • Published • 129
Datasets - Chat
Papers - Image - VQA
-
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
Paper • 2404.07973 • Published • 32 -
RegionGPT: Towards Region Understanding Vision Language Model
Paper • 2403.02330 • Published • 2 -
TextSquare: Scaling up Text-Centric Visual Instruction Tuning
Paper • 2404.12803 • Published • 30 -
Pegasus-v1 Technical Report
Paper • 2404.14687 • Published • 33
Papers - Image - VQA - Captions High Res Alignment
Papers - University - University of Santa Barbara
Papers - University - Columbia University
-
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
Paper • 2404.07973 • Published • 32 -
GLIGEN: Open-Set Grounded Text-to-Image Generation
Paper • 2301.07093 • Published • 4 -
PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation
Paper • 2404.13026 • Published • 24 -
MoDE: CLIP Data Experts via Clustering
Paper • 2404.16030 • Published • 15
Papers - Image - VQA - Ferret
Papers - Image - Encoders - Dual Vision MLP projectors
Papers - Image - Referring Object Classification (ROC)
Where the model is tasked with identifying the object in a region mentioned in a query. We utilize the validation split of the LVIS dataset
Papers - Image - Dataset - LVIS
-
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
Paper • 2404.07973 • Published • 32 -
COCONut: Modernizing COCO Segmentation
Paper • 2404.08639 • Published • 30 -
GLIGEN: Open-Set Grounded Text-to-Image Generation
Paper • 2301.07093 • Published • 4 -
Grounded Language-Image Pre-training
Paper • 2112.03857 • Published • 3
Papers - Image - Grounding
Papers - Image - Training - OCR - High-Res Dense Alignment
Papers - Image - Captioning
Papers - Documents - UDOP
Papers - Documents - Fine-tuning - LayoutLM and UDOP
Papers - Image - Scientific Charts
Papers - Documents - Scientific Charts
Papers - University of Ulm
Papers - Image - Fine-tuning - ICPR22 dataset
Papers - Image - Fine-tuning - CHIME-R and EconBiz datasets
Papers - Image - Fine-tuning - DeGruyter dataset
Papers - Embeddings - Text - RoBERTA and BPE
Papers - Embeddings - Image
Papers - Embeddings - Image - DiT and dVAE
Papers - LayoutLM - Fine-tuning - Word Patch Alignment
Papers - Tokenizers - Text - T5
Papers - Fine-tuning - Hyperparameter - FUNSD
Papers - Classification - F1 Macro and F1 Micro
Papers - Timeseries
Papers - University of Panjab
Papers - Image - Report - Training - CNN RNN LSTM MLP
Papers - Image - Connectionist Temporal Classification (CTC)
Papers - Image - Climate - SHAP
Papers - Courant Institute
Papers - Image - Climate - ERA5
Papers - Image - Coco - Annotation Pipeline
Papers - Image - Mask - box-kMaX over kMaX-DeepLab
Papers - Image - Coco - Annotation RLHF
Papers - Image - Coco - Panoptic
Papers - Video - NeRF - Real Estate Walkthroughs
Papers - NeRF - Training - Photometric Consistency Patches
Papers - Image - Datasets - ETH3D
Papers - Image - Datasets - TanksAndTemples
Papers - Image - NeRF - Mesh - TSDF fusion RGBD sequences
Papers - Image - Evaluation Metrics - PSNR SSIM LPIPS
Datasets - Research Papers - ARXIV QA
Papers - University of Alberta
Papers - University of Auburn
Papers - Explainability - Image - VQA
Papers - Explainability - Image - VQA - CHM-Corr++
Spaces - Chat - QA - Research Papers on Arxiv read by Claude
Audio Reading - 2404.08639 - COCONut
Read by Bark: https://drive.google.com/file/d/1qltkY31-013JDQn-u2pmnjPyCaUcOqsV/view?usp=sharing
Audio Reading - 2403.07691 - ORPO Fine-tuning
Read by Bark: https://drive.google.com/file/d/1no3kjSmexQxlS-KjhRB0jB5hz72Yuhsb/view?usp=sharing
Audio Reading - 2212.05525 - Extending TrOCR
Read by Bark: https://drive.google.com/file/d/1apmyvLMEQ97ObHKzQna9URFHF0Xg-EsO/view?usp=sharing
Audio Reading - 2404.06209 - Elephants Never Forget
Read by Bark: https://drive.google.com/file/d/13IlbhKh71vxLpdYJ6mkIiiJZOUsf7XFv/view?usp=sharing
Audio Reading - 2404.07773 - ConsistencyDet
Read by Bark: https://drive.google.com/file/d/1AlHLzeUd04LXgDj99SOvmQJTy9chufGo/view?usp=sharing
Models - Reasoning
-
mlabonne/AlphaMonarch-7B
Text Generation • 7B • Updated • 14.4k • • 148 -
Qwen/QwQ-32B-Preview
Text Generation • 33B • Updated • 25.2k • • 1.74k -
deepseek-ai/DeepSeek-R1-Distill-Llama-8B
Text Generation • 8B • Updated • 573k • • 816 -
deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
Text Generation • 8B • Updated • 632k • • 734
Datasets - Audio - Large
Datasets - Audio - Multilingual
Datasets - Audio - Multilingual - Large
Spaces - Audio - TTS
Models - WizardLM
Datasets - Benchmark - Tasks
Models - Image - QA
Datasets - Chat - Persuasion
Papers - Training Research - Dataset Ordering
Papers - Training - Curriculum Learning
Papers - Training - Education Stage then Cognitive Hierarchy
Papers - Training - Curriculum Instruction Tuning
Papers - Llama 2
-
Instruction Tuning with Human Curriculum
Paper • 2310.09518 • Published • 3 -
A Thorough Examination of Decoding Methods in the Era of LLMs
Paper • 2402.06925 • Published • 1 -
Distilling System 2 into System 1
Paper • 2407.06023 • Published • 4 -
Byte Latent Transformer: Patches Scale Better Than Tokens
Paper • 2412.09871 • Published • 108
Papers - Training - AI2 Reasoning
Papers - Training - Out of Vocabulary
Papers - Training - Multilingual - Out of Vocabulary
Papers - University of Charles
Papers - Training - Report - LSTM vs LLM vs Ensemble
Papers - Training - Filter Low Quality with Contriever
Papers - University of Seoul National
Papers - University of Ewha Womans
Papers - University - National University of Singapore
-
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization
Paper • 2404.09956 • Published • 12 -
Contrastive Chain-of-Thought Prompting
Paper • 2311.09277 • Published • 36 -
FlashSpeech: Efficient Zero-Shot Speech Synthesis
Paper • 2404.14700 • Published • 32 -
AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising
Paper • 2406.06911 • Published • 12
Papers - University - University of Michigan
Papers - Audio - Fine-tuning - DPO
Papers - Audio - Fine-tuning - Alpaca
Papers - Audio - Clap
We use an ensemble filtering strategy based on two different CLAP models: 630k-audioset-best and 630k-best
Papers - Audio - Encoder - Variational Auto-Encoder (VAE)
Papers - Audio - Frechet Audio Distance (FAD) like FID
Papers - University of North Carolina Chapel Hill
Papers - University of Southern California
Papers - Megalodon - Unlimited Context
Papers - Multimodal - Long Context - Megalodon
Papers - 3DGS - Compression
Papers - Multimodal - Speculative Decoding
Papers - Inference - Multimodal
Papers - Qualcomm
Papers - Inference - Speculative Decoding - Draft Model
Papers - Dataset Grooming - Report
Papers - Dataset Generation - Guide
Papers - Image - Hyperspectral Images (HSI)
Papers - Mamba - Bidirectional
Papers - Healthcare - Image - Cancer
Papers - Healthcare - Image - Cancer - Prostate
Papers - Agent - Research
-
ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models
Paper • 2404.07738 • Published • 2 -
Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations
Paper • 2404.17521 • Published • 13 -
LEGENT: Open Platform for Embodied Agents
Paper • 2404.18243 • Published • 22 -
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B
Paper • 2406.07394 • Published • 29
Papers - Research - Automated Research
Papers - Fine-tuning - DPO - KL Divergence vs Learning Rates
Papers - Tinkoff AI
Papers - Embeddings - Scalable Positional Encodings
Papers - University of Pennsylvania
Papers - Image - Layer Pruning
Papers - Inference - Image
Papers - Inference - Image - Layer Pruning
Audio Reading - 2402.16827 - Survey on Data Selection ~3.5h
Read by Bark: https://drive.google.com/file/d/1cdKmflJ3jRKszi5s3RF4Ru6nUw8tvpHX/view?usp=sharing duration is 3 hours and 28 minutes
Audio Reading - 2404.08011 - Review Handwriting Recognition
Read by Bark: https://drive.google.com/file/d/1yCc6rr199rQHHNwKozHhqzz0Rr48z03A/view?usp=sharing duration is 1 hour and 11 min 47s
Papers - Pre-training - Warm-Start - Encoder and Decoders
Papers - Pre-training - Pegasus
Papers - Imperial College of London
Papers - Pre-training - Text - Masked Language Models (MLM)
Papers - Pre-training - Self-Supervised for Downstream Tasks
Papers - Pre-training - Warm-Start - Encoders - BPE
Papers - Pre-training - Warm-Start - Encoders - Unigram
Papers - Pre-training - Summarization
Papers - Pre-training - Encoders - Bert
Papers - Pre-training - Encoders - Roberta
Papers - Pre-training - Warm-Start
Papers - Pre-training - Unsupervised
Papers - Pre-training - Checkpoints
Models - Fintech - Financial Summarization
Audio Reading - 2310.09518 - Instruct with Human Curriculum
Read by Bark: https://drive.google.com/file/d/1fEZ8uwnfniMljZ5S60NxOav6Qav2A-XB/view?usp=sharing duration is 44m 12s
Datasets - Image - Multilingual - VQA
Datasets - Image - VQA
Papers - Inference
Papers - Inference - KV Cache
Models - Encoders - Multimodal - Clip - SigLIP
Better loss function: the sigmoid loss operates solely on image-text pairs and does not require a global view of the pairwise similarities
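A minimal sketch of the pairwise sigmoid loss described in the note above, in plain Python. The function name, the toy similarity matrices, and the default `temperature` and `bias` values are assumptions for illustration, not taken from this page. Each image-text pair contributes its own binary term, so no softmax over the whole batch is needed.

```python
import math

def sigmoid_pair_loss(sim, temperature=10.0, bias=-10.0):
    """Pairwise sigmoid loss over a square similarity matrix.

    sim[i][j] is the similarity between image i and text j.
    temperature and bias are illustrative defaults (assumptions).
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        for j in range(n):
            # Matching pairs sit on the diagonal (+1), all others are negatives (-1).
            label = 1.0 if i == j else -1.0
            logit = temperature * sim[i][j] + bias
            # -log(sigmoid(label * logit)) written as a numerically stable softplus.
            z = -label * logit
            total += max(z, 0.0) + math.log1p(math.exp(-abs(z)))
    return total / n

aligned = [[1.0, 0.0], [0.0, 1.0]]      # correct pairs score highest
mismatched = [[0.0, 1.0], [1.0, 0.0]]   # wrong pairs score highest
print(sigmoid_pair_loss(aligned), sigmoid_pair_loss(mismatched))
```

Because every term depends on a single pair, the sum can be computed in chunks across devices without gathering the full similarity matrix first.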
Models - Image - Embeddings
Spaces - Multimodal - Image and Chat
Papers - Stability AI
Papers - Audio - Activation - Snake
Papers - Audio - Decoders - DAC - No tanh activation
The DAC decoder tanh caused harmonic distortion
Papers - Audio - RoPE
Papers - Audio - Embedding - Time - Sinusoidal Cross Attention
Papers - Audio - Embedding - Text - Clap - Cross Attention
Papers - Audio - Embedding - Clap - Timestep - Prepended
Papers - Audio - Encoders - Clap - HTSAT audio RoBERTa text
Papers - Attention - Block-wise
Papers - Audio - Encoders - Clap - Training - Metadata
Papers - Audio - Musical Structure Analysis
Papers - Audio - Encoders - Laion-Clap
Found 5566 memorized, repeated audio sequences
Papers - Agent - Sima
Papers - World Sim - Agent - Tasks
Papers - Training - Video Games
Papers - Video Games
Papers - Video Games - Survival
Papers - Video Games - Crafting
Papers - Video Games - Survival - Valheim
Papers - Video Games - Navigation
Papers - Video Games - Object Tools
Papers - Video Games - Farming
Papers - Video Games - Environment Resource Planning
Papers - World Sim - Encoder - Image - Sparc
Papers - World Sim - Encoder - Video - Phenaki
An encoder-decoder model which compresses videos to discrete embeddings (tokens) and a transformer model to translate text embeddings to video tokens.
Papers - World Sim - OCR
Papers - World Sim - Training - Classifier-Free Guidance
Fig 10. CFG substantially improves language conditionality.
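For readers unfamiliar with classifier-free guidance (CFG), the general idea can be sketched as follows. This is the common textbook form, not the paper's code: the function name, the plain-list representation of model outputs, and the convention that a scale of 1.0 means "no extra guidance" are all assumptions.

```python
def cfg_combine(cond, uncond, guidance_scale):
    """Blend conditional and unconditional model outputs.

    cond / uncond: outputs (e.g. logits) with and without the
    language instruction. guidance_scale = 1.0 returns cond unchanged;
    larger values push the result further in the conditioned direction.
    """
    return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]

# Toy outputs: the instruction raises both values by 1.0.
print(cfg_combine([1.0, 2.0], [0.0, 1.0], guidance_scale=3.0))
```

Some implementations write the same rule as `(1 + w) * cond - w * uncond`, which shifts the scale by one.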
Papers - World Sim - Cognitive Architectures
Papers - Video - Phenaki
Papers - Video - Encoders - C-ViViT
The embeddings of images and video patches from raw frames x are processed by a spatial and then a causal transformer (AR in time) to generate video tokens
Papers - Video - Encoders - C-ViViT - MaskGiT
MaskGiT is trained to reconstruct masked tokens z predicted by a frozen C-ViViT encoder and conditioned on T5X tokens of a given prompt p0
Papers - Embeddings - Text - T5X
Papers - World Sim - Embeddings - Text - T5X
Papers - JAX
Papers - GNN
-
On the Scalability of GNNs for Molecular Graphs
Paper • 2404.11568 • Published • 1 -
Octopus v4: Graph of language models
Paper • 2404.19296 • Published • 118 -
Architectures of Topological Deep Learning: A Survey on Topological Neural Networks
Paper • 2304.10031 • Published • 3 -
Meta Flow Matching: Integrating Vector Fields on the Wasserstein Manifold
Paper • 2408.14608 • Published • 8
Papers - Training - GNN
Papers - GNN - Dataset - LargeMix
Papers - GNN - Fine-tuning
Papers - GNN - Benchmark - TDC
Papers - GNN - Benchmark - Polaris
Papers - GNN - Benchmark - MoleculeNet
Papers - Hybrid Arch - Skip Connections
Papers - GNN - MPNN
Papers - GNN - Encoders
Papers - GNN - Encoders - Positional and Structural Encoding
1) random walk diagonals 2) Laplacian eigenvectors for geometry and position 3) global structural information about the graph
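The first item in the note above, random walk diagonals, can be sketched in plain Python. This is an illustrative version under stated assumptions: the function names are hypothetical, the graph is a dense adjacency matrix, and the feature for node i at step k is taken to be the probability that a k-step random walk starting at i returns to i.

```python
def matmul(a, b):
    """Dense matrix product for small square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def random_walk_diagonals(adj, steps):
    """Return, per node, the return probabilities for walks of length 1..steps."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    # Row-normalised transition matrix P = D^-1 A (isolated nodes get zero rows).
    P = [[adj[i][j] / deg[i] if deg[i] else 0.0 for j in range(n)]
         for i in range(n)]
    power = [row[:] for row in P]
    features = [[] for _ in range(n)]
    for _ in range(steps):
        for i in range(n):
            features[i].append(power[i][i])  # diagonal of P^k
        power = matmul(power, P)
    return features

# Triangle graph: every node is connected to the other two.
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(random_walk_diagonals(triangle, steps=3))
```

Nodes with the same local structure receive the same feature vector, which is why this encoding is usually combined with position-aware ones such as Laplacian eigenvectors.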
Papers - GNN - Fine-tuning - Custom Layer - MLP
Papers - GNN - MoIE
Papers - Healthcare - Molecules - GNN
Papers - Healthcare - Molecules
Papers - Healthcare - GNN
Papers - GNN - Ensemble
Papers - Healthcare - Drug Discovery
Papers - Healthcare - Drug Discovery - GNN
Papers - Valence Labs
Papers - University of Montreal
Papers - University - University of Toronto
Papers - University of McGill
Papers - Healthcare - Image - X-ray
Papers - Healthcare - Image - Chest - X-ray
Papers - Healthcare - Image - Lung Disease
Papers - XAI
-
Explainable Lung Disease Classification from Chest X-Ray Images Utilizing Deep Learning and XAI
Paper • 2404.11428 • Published • 1 -
A Multimodal Automated Interpretability Agent
Paper • 2404.14394 • Published • 22 -
What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation
Paper • 2404.07129 • Published • 3 -
The Geometry of Categorical and Hierarchical Concepts in Large Language Models
Paper • 2406.01506 • Published • 3
Papers - XAI - Gradient Weighted Class Activation Mapping
Grad-CAM
Papers - XAI - Loc Interpretable Model Agnostic Explanation
LIME
Papers - XAI - Fine-tuning
Papers - University of Ahsanullah
Papers - Healthcare - Image - Covid-19
Papers - Image - Visual Feature Extractor
Papers - Inference - Batch - Hierarchical Sharing Pattern
Papers - Optimizer - Lamb
Papers - Attention - Sliding Window
Papers - Training - 3D Parallelism - Back - Reduce-Scatter
Papers - Training - 3D Parallelism - Forward - All-Gather
Papers - Custom Layers - Feedforward Neural Network (FFN)
-
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs
Paper • 2402.15627 • Published • 38 -
One Wide Feedforward is All You Need
Paper • 2309.01826 • Published • 33 -
Fast Feedforward Networks
Paper • 2308.14711 • Published • 3 -
Memory Layers at Scale
Paper • 2412.09764 • Published • 5
Papers - Training Research - Model FLOPs Utilization (MFU)
Papers - Training Research - Fault Tolerance
Papers - Custom Layers - Decoders - No FFN
Papers - Training - Parameter Reduction - FFN
Papers - Equall AI
Papers - Multilingual - Spanish
Datasets - Fine-tuning - Orpo
Papers - Emergent Properties
Papers - Emergent Properties - Multiple Choice Grade
Papers - Emergent Properties - Exact String Match
Papers - Emergent Properties - Image
Papers - Training - Epoch - 4 Epochs by Default
See Page 7 Figure 5 on right: Repeating for 4 epochs is almost as good as new data
Papers - Attention - Mixture-of-Attention (MoA)
Papers - Surge Global
Papers - Benchmarks - Safety
Papers - Benchmarks - Toxicity
Papers - Reward Model - Fine-tuning
Papers - Fine-tuning - Reward Model
Papers - Reward Model - Cross-Lingual
We propose to perform reward optimization using an RM trained for a different language, assuming model generation quality transfers cross-lingually
Papers - Datasets - Multilingual - Documents - Seahorse
contains documents and summaries in six languages (German, English, Spanish, Russian, Turkish, and Vietnamese) with pointwise human ratings
Papers - Datasets - Multilingual - OpenAssistant
multilingual, pairwise human-rated chat transcripts.
For the SFT data, we use the human-preferred response in each pair to finetune the model
Papers - Inference - Speculative Decoding - KV Cache
Papers - Speculative Decoding - KV Cache
We recognize two memory bottlenecks: model weights and KV cache, and the latter gradually becomes the bottleneck as context length increases
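A rough worked example of why the KV cache overtakes the weights as context grows. The model shape below (32 layers, 32 KV heads, head dimension 128, 16-bit values, about 7 billion parameters) is a hypothetical configuration chosen for easy arithmetic, not a figure from the paper.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim,
                   bytes_per_value=2, batch=1):
    """Bytes needed to cache keys and values for seq_len tokens."""
    # The leading 2 accounts for storing both a key and a value per head.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * seq_len * batch

GIB = 2 ** 30
weights_bytes = 7_000_000_000 * 2  # assumed 7B parameters in 16-bit precision

for seq_len in (4_096, 32_768, 131_072):
    cache = kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=32, head_dim=128)
    print(f"{seq_len:>7} tokens: cache {cache / GIB:6.1f} GiB, "
          f"weights {weights_bytes / GIB:5.1f} GiB")
```

Under these assumptions the cache costs 0.5 MiB per token, so it passes the weight footprint somewhere below 32K tokens and dominates at 128K.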
Papers - KV Cache
-
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
Paper • 2404.11912 • Published • 17 -
SnapKV: LLM Knows What You are Looking for Before Generation
Paper • 2404.14469 • Published • 27 -
LLM in a flash: Efficient Large Language Model Inference with Limited Memory
Paper • 2312.11514 • Published • 260 -
An Evolved Universal Transformer Memory
Paper • 2410.13166 • Published • 6
Papers - Inference - Speculative Decoding - Draft - KV Cache
Papers - Speculative Decoding - Draft - Base Model - JF68M
we utilize a 4K retrieval cache as an intermediate draft cache in our hierarchical system, while leveraging the JackFram/Llama68M (JF68M) [28] model
Papers - Speculative Decoding - Long Context
Papers - Speculative Decoding - Draft - Model - SpecInfer
Models - Speculative Decoding - Draft - Base Model
Models - Speculative Decoding - Draft - SpecInfer
Papers - Speculative Decoding - Token Tree Verification
Papers - Speculative Decoding - Token Verification
Papers - TensorRT-LLM - FasterTransformer - deprecated
Papers - Multimodal - Reka - Image Video Text Audio
Papers - Tokenizers - tiktoken
Papers - Animation - Text
Papers - Animation - Text - Kinetic Typography
Papers - Video - Text Animation
Papers - Image - LPIPS
-
Dynamic Typography: Bringing Words to Life
Paper • 2404.11614 • Published • 45 -
Scene Coordinate Reconstruction: Posing of Image Collections via Incremental Learning of a Relocalizer
Paper • 2404.14351 • Published • 6 -
BlenderAlchemy: Editing 3D Graphics with Vision-Language Models
Paper • 2404.17672 • Published • 19 -
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
Paper • 2406.06525 • Published • 71
Papers - Video - Score Distillation Sampling
Models - Fine-tuning - Orpo
Papers - Samsung
Papers - Nota
Papers - 3D - Mesh Generator
-
MeshLRM: Large Reconstruction Model for High-Quality Mesh
Paper • 2404.12385 • Published • 27 -
MaPa: Text-driven Photorealistic Material Painting for 3D Shapes
Paper • 2404.17569 • Published • 13 -
MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers
Paper • 2406.10163 • Published • 33 -
Meta 3D AssetGen: Text-to-Mesh Generation with High-Quality Geometry, Texture, and PBR Materials
Paper • 2407.02445 • Published • 4
Papers - Training - 3D - NeRF
Papers - Games - AlphaGo
Papers - Training - Self-Improvement
Papers - University of Turku
Datasets - Benchmarks - Image - QA - Real World Objects
Papers - Benchmarks - Image - QA - Abstract
Papers - Benchmarks - Image - Visual Commonsense
Datasets - Benchmarks - Image
Datasets - Benchmarks - Image - QA
Datasets - Benchmarks - Image - Blink
Papers - Context - NoPE
Papers - International Human Phenome Institute
Papers - University - East China Normal University
Papers - Datasets - Training - Context - LongBench
Papers - Context - Length Generalization
Papers - Attention - NoPE - Long Context with SoftMax Temp
Uniform scaling not as good as Head-based scaling
Papers - Attention - Training - Context - Head-based Scaling
Papers - TinyLlama
Papers - Datasets - Training - Context - SlimPajama
Papers - Training - Eval - Sliding Window Perplexity
Papers - Datasets - Training - Context - Starcoderdata
Papers - Training - Eval - Sliding Window - PG19
Papers - Training - Eval - Sliding Window - Proof-pile
Papers - Context - NoPE vs RoPE - Passkey Retrieval Viz
Page 7 fig shows NoPE extending past the model's context size from pretraining or fine-tuning
Papers - Transformers Without Positional Encoding - NoPE
-
Length Generalization of Causal Transformers without Position Encoding
Paper • 2404.12224 • Published • 1 -
Transformer Language Models without Positional Encodings Still Learn Positional Information
Paper • 2203.16634 • Published • 5 -
Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings
Paper • 2305.13571 • Published • 2 -
The Impact of Positional Encoding on Length Generalization in Transformers
Paper • 2305.19466 • Published • 2
Papers - Mila
Papers - IBM
Papers - ServiceNow
Papers - Attention - Multi-Head Attention (MHA)
-
Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings
Paper • 2305.13571 • Published • 2 -
Transformers Can Represent n-gram Language Models
Paper • 2404.14994 • Published • 21 -
Are Sixteen Heads Really Better than One?
Paper • 1905.10650 • Published • 2 -
Reasoning in Large Language Models: A Geometric Perspective
Paper • 2407.02678 • Published • 1
Papers - Training - Residual Connections
Papers - Text - Encoders - Bert
-
Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings
Paper • 2305.13571 • Published • 2 -
BERTs are Generative In-Context Learners
Paper • 2406.04823 • Published • 1 -
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
Paper • 2412.13663 • Published • 156 -
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
Paper • 2006.11477 • Published • 7
Papers - Positional Encodings
Papers - Embeddings - Absolute Position Embedding (APE)
Papers - Embeddings - ALiBi
Papers - Encodings - Rotary - RoPE
-
The Impact of Positional Encoding on Length Generalization in Transformers
Paper • 2305.19466 • Published • 2 -
Qwen2 Technical Report
Paper • 2407.10671 • Published • 166 -
Round and Round We Go! What makes Rotary Positional Encodings useful?
Paper • 2410.06205 • Published • 2 -
ThunderKittens: Simple, Fast, and Adorable AI Kernels
Paper • 2410.20399 • Published • 2
Papers - Encodings - No Positional Encodings - NoPE
-
The Impact of Positional Encoding on Length Generalization in Transformers
Paper • 2305.19466 • Published • 2 -
Transformers Can Do Arithmetic with the Right Embeddings
Paper • 2405.17399 • Published • 54 -
Teaching Transformers Causal Reasoning through Axiomatic Training
Paper • 2407.07612 • Published • 2 -
Round and Round We Go! What makes Rotary Positional Encodings useful?
Paper • 2410.06205 • Published • 2
Papers - Embeddings - T5 Relative Bias
Papers - Chain of Thought - Scratchpad
Papers - Text - Classification - FastFit
Papers - University - Hebrew University of Jerusalem
Papers - Text - Datasets - Classification and Labels
Papers - Benchmarks - Text - Classification - FewMany
Papers - Weather
Papers - Datasets - Weather
Papers - Datasets - Weather - ERA5
Papers - Historical - Weather
Papers - University of Aarhus
Papers - University - Berlin Technical University
Datasets - Coding - Code Reviews
Datasets - Benchmarks - Coding
Datasets - Text - Web
Datasets - Text - CommonCrawl
Datasets - Text - QA - Web
Datasets - Text - Research Papers - QA - QASPER
Papers - Image - Graph - Understanding
Papers - Knowledge Graphs
-
RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
Paper • 2401.18059 • Published • 46 -
KGValidator: A Framework for Automatic Validation of Knowledge Graph Construction
Paper • 2404.15923 • Published • 2 -
The Geometry of Categorical and Hierarchical Concepts in Large Language Models
Paper • 2406.01506 • Published • 3
Papers - Image - Glip
Core techniques: 1) unified grounding loss 2) language-aware deep fusion 3) pre-training with both types of data.
Papers - University - UCLA
Papers - International Digital Economy Academy (IDEA)
Papers - Image - Phrase Grounding
Papers - Image - Bounding Box - Coco - Teacher and Student
Papers - Image - Grounded Captions
Models - Image - GLIGEN
Papers - Text - Instruct - Grounding and Captions
Papers - Image - UMAP
Papers - Text - Legal - Remove Redaction
Papers - University - University of Padua
Papers - Benchmarks - Text - Text Anonymization Benchmark
Papers - Text - Named Entity Recognition (NER)
Papers - Text - Encoders - Sentence Transformers (SBERT)
Papers - Text - Eval - SMOTE
Papers - ML - XGBoost
Papers - Text - Remove Redaction - Countermeasures
Papers - University - Delft University
Papers - FDM Business Services
Papers - Attention - Gated Self-Attention - Spatial Grounding
Papers - Inference - Scheduled Sampling
improved visual quality as the rough concept location and outline are decided in the early stages, followed by fine-grained details in later stages.
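The note above describes grounding being applied early in sampling and dropped later. A minimal sketch of such a schedule, assuming a 0-indexed step counter and a hypothetical fraction `tau` of steps during which the grounding branch is active (both names are illustrative, not from this page):

```python
def grounding_gate(step, total_steps, tau):
    """Return 1.0 while the grounding branch is active, else 0.0.

    step: 0-indexed sampling step; tau: fraction of steps
    (assumed parameter) that use the grounding inputs.
    """
    return 1.0 if step < tau * total_steps else 0.0

def gated_residual(hidden, grounded_update, gate):
    """Add the grounding branch's update only when the gate is open."""
    return [h + gate * g for h, g in zip(hidden, grounded_update)]

total_steps = 10
schedule = [grounding_gate(s, total_steps, tau=0.5) for s in range(total_steps)]
print(schedule)  # first half uses grounding, second half does not
print(gated_residual([1.0, 1.0], [0.5, -0.5], schedule[0]))
print(gated_residual([1.0, 1.0], [0.5, -0.5], schedule[-1]))
```

With the gate at zero, the later steps run as the original model would, refining detail around the layout fixed in the early steps.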
Papers - Image - Object Detection - YOLO
-
GLIGEN: Open-Set Grounded Text-to-Image Generation
Paper • 2301.07093 • Published • 4 -
YOLO-World: Real-Time Open-Vocabulary Object Detection
Paper • 2401.17270 • Published • 42 -
DETRs Beat YOLOs on Real-time Object Detection
Paper • 2304.08069 • Published • 14 -
RT-DETRv2: Improved Baseline with Bag-of-Freebies for Real-Time Detection Transformer
Paper • 2407.17140 • Published • 2
Papers - Image - Inpainting
Papers - Image - Keypoint
Papers - 3DGS - Structure from Motion
Papers - SQL - Database Migrations
Papers - SQL - Knowledge Graphs
Papers - SQL - Query Tree
Papers - SQL - Curriculum Learning
Papers - Web - Agent
Papers - University - Simon Fraser University
Papers - University - University of British Columbia
Papers - Coding - Git Commits
Papers - Coding - Defects
Papers - 3DGS - Material Point Method (MPM)
Papers - 3DGS - Motion
Papers - Video - Simulated Material Dynamics - MLS-MPM
Papers - 3DGS - K-Means Clustering
Driving particles
Papers - University - Huazhong University
Papers - Phi - Technical Report
Papers - Text - Mobile
Papers - Audio - Classifier-Free Guidance (CFG)
Papers - Kunlun
Papers - Image - Fine-tuning - LoRA
-
Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis
Paper • 2404.13686 • Published • 28 -
MultiBooth: Towards Generating All Your Concepts in an Image from Text
Paper • 2404.14239 • Published • 9 -
Stylus: Automatic Adapter Selection for Diffusion Models
Paper • 2404.18928 • Published • 15 -
MagicQuill: An Intelligent Interactive Image Editing System
Paper • 2411.09703 • Published • 78
Papers - Multimodal - XAI
Papers - XAI - Eval - Synthetic Vision Neuron
Papers - XAI - Research in Appendix
Papers - XAI - MAIA
Papers - Llama 3
-
How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study
Paper • 2404.14047 • Published • 45 -
Reasoning in Large Language Models: A Geometric Perspective
Paper • 2407.02678 • Published • 1 -
Natural Language Reinforcement Learning
Paper • 2411.14251 • Published • 31 -
Byte Latent Transformer: Patches Scale Better Than Tokens
Paper • 2412.09871 • Published • 108
Papers - Llama 3 - Fine-tuning - Quantization
Papers - Llama 3 - Fine-tuning
-
How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study
Paper • 2404.14047 • Published • 45 -
LiteSearch: Efficacious Tree Search for LLM
Paper • 2407.00320 • Published • 40 -
Cut Your Losses in Large-Vocabulary Language Models
Paper • 2411.09009 • Published • 49 -
LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models
Paper • 2411.09595 • Published • 77
Papers - Llama 3 - GPTQ AWQ PB-LLM BiLLM - 1.1-8 bits LoRA
Papers - Image - NeRF - Structure from Motion (SfM)
Papers - Niantic
Papers - Benchmarks - Fintech
Papers - Coding - Automated Workflows
Papers - Fintech - Datasets - SEC - Edgar Filings DB - N-CEN
Papers - Investing - Document QA - SEC Filings
Papers - JP Morgan Chase
Papers - Image - Consistency Trajectory Model (CTM)
Papers - KL Regularization - Diffusion Matching Distillation
Papers - Security - Prompt Injection
Papers - Prompts - Security - Instruction Prioritization
Papers - Image - Multi-Concept Customization (MCC)
Papers - Image - Adaptive Concept Normalization (ACN)
Papers - Image - Encoder - Single-Concept Learning - QFormer
Multi-modal Concept Extraction
Papers - Image - Synthetic Generator - Canny
Papers - Image - Synthetic Generator - Depth
Datasets - Image - Classification
Papers - Image - Datasets - CIFAR
-
All you need is a good init
Paper • 1511.06422 • Published • 1 -
Align Your Steps: Optimizing Sampling Schedules in Diffusion Models
Paper • 2404.14507 • Published • 23 -
Deep Residual Learning for Image Recognition
Paper • 1512.03385 • Published • 8 -
MoDE: CLIP Data Experts via Clustering
Paper • 2404.16030 • Published • 15
Papers - Image - Datasets - MNIST
Papers - Activation Functions
Papers - Pre-training - Layer Initialization
Papers - Pre-training - Layer Initialization - LSUV
Papers - Image - Datasets - ImageNet
-
All you need is a good init
Paper • 1511.06422 • Published • 1 -
Align Your Steps: Optimizing Sampling Schedules in Diffusion Models
Paper • 2404.14507 • Published • 23 -
Efficient Transformer Encoders for Mask2Former-style models
Paper • 2404.15244 • Published • 1 -
Deep Residual Learning for Image Recognition
Paper • 1512.03385 • Published • 8
Papers - University - Czech Technical University
Papers - Pre-training - Weight Initialization
Models - Instruct - Context - 128k
Models - Phi-3
Models - Text - Long Context
Papers - Audio - Attention - FlashSpeech
Papers - Command-R
Papers - Cohere
Papers - Pre-training - Text - Cross-lingual
Papers - Training - KL-divergence Upper bound (KLUB)
Papers - Twelve Labs
Papers - Audio - Latent Consistency Model (LCM)
Papers - Audio - Discriminator - Adversarial Loss
Papers - Audio - Prosody Generator
Papers - Audio - Voice Conversion
Papers - MSRA
Papers - University - Inner Mongolia University
Papers - University - Beijing University
Papers - Attention - Flash Attention
Papers - OLMo
Papers - MobiLlama
Papers - Fine-tuning - Dataset - Instruct - UltraFeedback
Papers - Fine-tuning - PEFT
-
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
Paper • 2404.14619 • Published • 126 -
Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning
Paper • 2303.15647 • Published • 4 -
Hyper-X: A Unified Hypernetwork for Multi-Task Multilingual Transfer
Paper • 2205.12148 • Published • 2 -
No More Adam: Learning Rate Scaling at Initialization is All You Need
Paper • 2412.11768 • Published • 43
Papers - Fine-tuning - DoRA
Papers - OpenELM
-
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
Paper • 2404.14619 • Published • 126 -
apple/OpenELM-450M
Text Generation • 0.5B • Updated • 593 • 26 -
apple/OpenELM-3B
Text Generation • 3B • Updated • 925 • 126 -
apple/OpenELM-3B-Instruct
Text Generation • 3B • Updated • 2.86k • 339
Papers - Fine-tuning - Text - Bottleneck - RMSNorm
Models - OpenELM
Papers - Training Research - Flash Memory - DRAM
Papers - Attention - Sparse Attention
Papers - Attention - Hard Attention
Papers - Image - Mask2Former
Papers - Training - Early Exit - Gating Network
Encoder - Weighted Stochastic Depth
Papers - Image - Detectron2
Papers - Image - Segmentation - Cost vs Quality - Gating Net
Papers - University - University of California Riverside
Papers - NEC Laboratories
Papers - Image - Cost Reduction - Early Exit
Papers - Custom Layers - No Dropout - Batch Normalization
Papers - Model - Inception
Papers - Pre-training - Batch Normalization
Papers - Image - Training - Per-class Regressor (PCR)
Papers - Healthcare - DNA
Papers - Healthcare - Mamba
Papers - University - University of Massachusetts
Papers - Fine-tuning - Multilingual - Multi-task
Papers - Fine-tuning - Transfer Learning - Cross-Lingual
Papers - Fine-tuning - Named Entity Recognition (NER)
Papers - Fine-tuning - Part of Speech (POS)
Papers - University - University of Groningen
Papers - Cross-lingual
Models - Multilingual - Rag - Catalan, Spanish, English
Datasets - Text - Multilingual - Catalan, Spanish, English
Models - Audio - TTS - Catalan
Datasets - Text - Web, Medical Journals
Spaces - CoT
Spaces - Image - Clothing
Papers - Coding - Knowledge Graphs
Papers - Knowledge Graphs - Construction and Validation
Papers - Rag - Knowledge Graphs
Papers - Documents - Knowledge Graphs
Papers - Data Extraction - OpenIE
https://stanfordnlp.github.io/CoreNLP/openie.html
Papers - Prompt - Knowledge Graphs
Papers - Knowledge Graphs - Prompts
Papers - Knowledge Graphs - Validation - Pydantic
Papers - Quantexa
Papers - Knowledge Graphs - Llama 2
See Appendix A.2
Papers - Apple - CoreNet
Papers - Training - Contrastive Loss - CatLIP
Papers - Image - Classification - WordNet synsets
https://wordnet.princeton.edu/
Papers - Image - Pre-training - Transfer Learning
Papers - Mixture of Data Experts (MoDE)
Papers - Pre-training - Continual - Expert Onboarding
Papers - Image - MoDE - Clip
Papers - Training - Image - MoE - Clip
Papers - Image - Pre-training - Distribution Clustering
Papers - Embeddings - Clustering
Papers - Image - Encoders - MetaClip
Papers - Embeddings - Text - SimCSE
Papers - Embeddings - Text - TF-IDF
Papers - Pre-training - MoE - Flexible Expert Ensembles
Papers - MoE - Training - Expert Prioritization
Using SimCSE
Papers - Pre-training - MoE - Continual Learning
Papers - Pre-Training - MoE - Train One Expert
Papers - Inference - MoE - Routing with Task Metadata
Papers - MoE - Inference - Routing with Task Metadata
Papers - Image - Datasets - Flickr
Papers - Image - Encoders - OpenClip
Papers - MoE - Image - MoDE
Papers - Image - Benchmarks - Clip
Papers - Image - Datasets - LAION
-
MoDE: CLIP Data Experts via Clustering
Paper • 2404.16030 • Published • 15 -
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
Paper • 2406.06525 • Published • 71 -
Data curation via joint example selection further accelerates multimodal learning
Paper • 2406.17711 • Published • 3 -
Unveiling Encoder-Free Vision-Language Models
Paper • 2406.11832 • Published • 54
Papers - MoE - Routing - Softmax Normalization
Papers - Attention - BASS
Papers - 3DGS - Segmentation
Papers - 3D - NeRF
Papers - 3D - Interactive - Semantic Editing based on Loss
Papers - 3D - Interactive
Papers - 3D - Gaussian Splatting and NeRF
Papers - University - The Chinese University of Hong Kong
-
Interactive3D: Create What You Want by Interactive 3D Generation
Paper • 2404.16510 • Published • 21 -
SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension
Paper • 2404.16790 • Published • 10 -
A Thorough Examination of Decoding Methods in the Era of LLMs
Paper • 2402.06925 • Published • 1 -
LLaVA-OneVision: Easy Visual Task Transfer
Paper • 2408.03326 • Published • 60
Papers - SenseTime Research
Papers - Benchmarks - Multimodal
-
SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension
Paper • 2404.16790 • Published • 10 -
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos
Paper • 2406.08407 • Published • 28 -
GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI
Paper • 2408.03361 • Published • 85
Papers - Benchmarks - Multimodal - SEED-Bench
Papers - Multimodal - Benchmarks - Report
Papers - ARC Lab
Papers - Inference - Early Exit
Papers - Inference - Draft Model - Early Exit - Dropout
Papers - Prompts - Adversarial
Papers - Agent - Image
Papers - Agent - Robotics
Papers - Agent - Tasks
-
LEGENT: Open Platform for Embodied Agents
Paper • 2404.18243 • Published • 22 -
Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations
Paper • 2404.17521 • Published • 13 -
Octopus v4: Graph of language models
Paper • 2404.19296 • Published • 118 -
AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning
Paper • 2402.15506 • Published • 18
Papers - Healthcare - Fine-tuning
Papers - Healthcare - Chain of Reasoning (CoR)
Papers - Chain of Reasoning (CoR)
Papers - Training - Self-Guided with Search
Papers - Inference - Uncertainty-Guided Search
Papers - Healthcare - VQA - Understanding
Papers - Healthcare - Multimodal
Papers - Healthcare - Biomedical Research
Papers - Gemini
Papers - Healthcare - Prompts
Papers - Healthcare - Surgery - VQA
Papers - Healthcare - Radiology Objects in Context (ROCO)
Papers - Healthcare - Benchmarks - Text - NEJM
Papers - Healthcare - Benchmarks - Text - MMMU-HM
Papers - Healthcare - Benchmarks - Long Context - MIMIC-III
Papers - Healthcare - Benchmarks - VQA - MedVidQA
Papers - Healthcare - Benchmarks - Video - Cholec80
Papers - Healthcare - Benchmarks - Video - Cholec80-CVS
Papers - Healthcare - Report
Papers - Speculative Decoding - Early Exit
Papers - 3D - Garment
Papers - Training - Multi-Model Evaluation
Papers - Training - Multi-Model Evaluation - PoLL
Papers - Training - Evaluation - Multi-Hop QA
Papers - Prompts - Training - Evaluation - Multi-Hop QA
Papers - Agent - Training
Papers - Agent - Fine-tuning
Papers - Agent - Evaluation
Papers - Blender
Papers - 3D - Blender
Papers - 3D - Mesh Editing
Papers - 3D - Texture Editing
Papers - 3D - Lighting
Papers - Video - Robot Simulator - VQA
Papers - World Sim - VQA
Papers - World Sim - Scene Generation
Papers - University - Central South University
Papers - Image - Fine-tuning - Dataset - StylusDocs
Papers - Image - Multi-Model Evaluation
Papers - Image - Datasets - DOCCI
Papers - Image - Annotation Pipeline
Papers - Image - Annotation UI
Papers - 3DGS - Test - Dataset - RealEstate10k
Papers - 3DGS - Test - Dataset - Objaverse
Papers - Image - Detailed Multi-Object Generation
Papers - SK Telecom
Papers - 3DGS - Point Cloud - COLMAP
Papers - 3DGS - Tabular Structure Detection
Papers - 3DGS - Test - PSNR
Papers - 3DGS - Structure Preservation
Papers - University - Imperial College London
Papers - Custom Layers - KAN
Papers - Octopus
Papers - Nexa AI
Papers - Alternative Layers - KAN instead of MLP
Papers - California Institute of Technology
Papers - National Science Foundation (NSF)
Papers - University - University College London
Papers - ICL - Induction Head
-
What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation
Paper • 2404.07129 • Published • 3 -
In-context Learning and Induction Heads
Paper • 2209.11895 • Published • 2 -
pyvene: A Library for Understanding and Improving PyTorch Models via Interventions
Paper • 2403.07809 • Published • 1
Papers - ICL - Induction Circuit
Two-layer induction heads
-
What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation
Paper • 2404.07129 • Published • 3 -
In-context Learning and Induction Heads
Paper • 2209.11895 • Published • 2 -
pyvene: A Library for Understanding and Improving PyTorch Models via Interventions
Paper • 2403.07809 • Published • 1
Papers - ICL - Training - Activations - Clamping
See: pattern-preserving ablation
Papers - Ensemble
Papers - Audio - Codec - Bitrate - Low
Papers - Model Editing
Papers - Image - Comics
Papers - Image - Multi-Caption Generation
Papers - University - Nankai University
Papers - Institute - Nankai Int Advanced Research Institute
Papers - Fine-tuning - LoRA - LoRAX
Spaces - Comics and Cartoons
Papers - Training - Datasets - Few-Shot Learning - OmniGlot
Papers - Emergent Properties - ICL - Induction Heads
Additional reading: https://transformer-circuits.pub/2021/framework/index.html
-
In-context Learning and Induction Heads
Paper • 2209.11895 • Published • 2 -
What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation
Paper • 2404.07129 • Published • 3 -
pyvene: A Library for Understanding and Improving PyTorch Models via Interventions
Paper • 2403.07809 • Published • 1
Papers - Custom Layers - No Dropout - Dropout Regularization
Papers - Custom Layers - Residual Connection - Ablation
Papers - Ablation - Attention - Head Pruning
Causal ablations taking into account LayerNorm
Papers - XAI - Attention - Induction Heads
Papers - Attention - Induction Heads
Papers - Attention - Ablation
Papers - Training - Ablation
Papers - Attention - Previous Token Head
Papers - Training Research - Loss Dynamics - Clamping
Papers - Training Research - Clamping
Modifying activations during training with proper gradient flow
Papers - XAI - Induction Head - Phase Change - Components
Papers - ICL - Induction Head - Num Labels vs Classes - Loss
Papers - ICL - Induction Circuit - Data Dependent Learning
Papers - Training - ICL - Induction Circuit Evolution
Papers - ICL - Induction Head - Copy vs QK Match
See Figure 6, columns B and C (classes vs. labels): subcircuit B delays the phase change as the number of classes grows, while subcircuit C delays it (dramatically) as the number of labels grows.
Papers - ICL - Phase Change - Delay - Classes and Labels
Papers - XAI - Framework - pyvene
Papers - Pr(Ai)2R Group
Papers - Training - Interventions - Understanding
Papers - ICL - Locating Early and Late Fact Associations
Papers - ICL - Training - Distributed Alignment Search
Papers - ICL - Phase Change Delay - Large Vocabulary Size
A larger vocabulary gives better compression, but may lengthen the delay before the ICL phase change during training because of the slower Induction Head Copy Subcircuit (C).
Papers - XAI - Attention - LayerNorm
Models - MoE - Reward Model
Papers - Reward Model - Preference Collection Construction
Papers - Reward Model - Model Merging vs Joint Training
Papers - Model Merging - DARE better than TIES
See Appendix E: Merging Method Ablation on MoE Mistral and Instruct 7B. TIES merges degenerated, while DARE merges did not.
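The DARE note above can be illustrated with a minimal sketch of the drop-and-rescale step: each fine-tuned delta (fine-tuned weight minus base weight) is randomly dropped with probability `drop_rate`, and the survivors are rescaled by `1 / (1 - drop_rate)` so the expected delta is unchanged. The function names, the flat-list weights, and the choice to sum the rescaled deltas onto the base (task-arithmetic style) are assumptions for illustration, not the setup used in the appendix.

```python
import random

def dare_delta(base, finetuned, drop_rate, rng):
    """Drop each delta with probability drop_rate, rescale the rest."""
    keep = 1.0 - drop_rate  # assumes drop_rate < 1
    out = []
    for b, f in zip(base, finetuned):
        delta = f - b
        # Keep with probability `keep`; rescale so the expectation is preserved.
        out.append(delta / keep if rng.random() < keep else 0.0)
    return out

def dare_merge(base, finetuned_models, drop_rate, seed=0):
    """Add the sparsified, rescaled deltas of every model onto the base."""
    rng = random.Random(seed)
    merged = list(base)
    for ft in finetuned_models:
        for i, d in enumerate(dare_delta(base, ft, drop_rate, rng)):
            merged[i] += d
    return merged

# Hypothetical toy weights: one base model and two fine-tuned variants.
base = [1.0, 2.0]
models = [[1.5, 2.0], [1.0, 3.0]]
no_drop = dare_merge(base, models, drop_rate=0.0)
half_drop = dare_merge(base, models, drop_rate=0.5)
print(no_drop, half_drop)
```

With `drop_rate=0.0` nothing is dropped and the merge reduces to adding both deltas to the base; with a higher rate, each surviving delta is scaled up to compensate for the dropped ones.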
Datasets - Reward Model - Preference Collection
Papers - LG
Papers - ICL - Residual Head Hypothesis
Papers - Dataset Storage - Orc vs Parquet
Results cover local reads (GPU/CPU), compression (zstd), and reads over the WAN (Orc over S3 beats Parquet there too).
Papers - Dataset Storage - Parquet
Papers - Dataset Storage - Orc
Papers - Voltron Data
Papers - Dataset Storage - cuDF - Parquet and Orc
See Figure 19. Orc with cuDF beat Parquet with cuDF. Parquet with Arrow has dramatically more throughput when no GPUs are available.
Papers - Dataset Storage - Zarr
Papers - Dataset Storage - Technical Report
Papers - Dataset Storage - Lessons Learned
Papers - BitNet
Additional paper with FAQ, code, and tips: https://github.com/microsoft/unilm/blob/master/bitnet/The-Era-of-1-bit-LLMs__Training_Tips_Code_FAQ.pdf
-
You Only Cache Once: Decoder-Decoder Architectures for Language Models
Paper • 2405.05254 • Published • 10 -
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper • 2402.17764 • Published • 625 -
BitsFusion: 1.99 bits Weight Quantization of Diffusion Model
Paper • 2406.04333 • Published • 38 -
1-bit AI Infra: Part 1.1, Fast and Lossless BitNet b1.58 Inference on CPUs
Paper • 2410.16144 • Published • 5
Papers - Pre-training - Rerankers
Papers - Fine-tuning - Rerankers
Models - Coding - Code Interpreter - Agent - Multi-shot
Papers - Training - Math
Papers - FNet - Fourier Transformers
Papers - SSMs
Papers - Mamba
-
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
Paper • 2405.21060 • Published • 67 -
Chimera: Effectively Modeling Multivariate Time Series with 2-Dimensional State Space Models
Paper • 2406.04320 • Published • 10 -
MambaMixer: Efficient Selective State Space Models with Dual Token and Channel Selection
Paper • 2403.19888 • Published • 12 -
Graph Mamba: Towards Learning on Graphs with State Space Models
Paper • 2402.08678 • Published • 17
Papers - Mamba - Mamba 2
Papers - Training - Cost Estimates
Papers - Epoch AI
Papers - Training - Historical GPU Cost Trends
Papers - Training - Report - Historical Cost Estimates
Papers - Reasoning - Complex - Alice in Wonderland - AIW
Papers - Reasoning - Complex - TACT
Papers - Text - Table Generation - Pandas DataFrames
Papers - Reasoning - Prompt - Table and Calculations
Papers - Coding - Table and Calculations using Pandas
Papers - Reasoning - Datasets - TACT
Models - Video - Captions
Datasets - Video - Captions
Papers - Chain of Thoughts - Multi-Shot - Buffer of Thoughts
Papers - Training - Piecewise Affine Multiplication
Papers - Training - PAM faster vs MatMul - CPU
Papers - Training - Multiplication Free
Papers - Training - Distribution Estimation - Autoregressive
Papers - Training - CNN - Binarized MNIST - Code Examples
Papers - Audio - Distribution Estimate - Spectrogram
Section 7.3.3
Papers - Unsupervised - Distribution Estimation
Papers - Datasets - Biology - SMILES
Papers - Healthcare - Virus Detection - Classification
Appendix A.9
Papers - Healthcare - Text - Biology QA
Papers - University - Hong Kong University
Papers - Image - Fine-tuning - Llama
Papers - Text to Image - Encoders - Flan-T5 XL
Papers - Image - Training Metrics - PSNR
Papers - Image - Training Metrics - SSIM
Papers - Image - Tokenizers - VQGAN
Papers - Image - Tokenizers - ViT-VQGAN
Papers - Image - Metrics - Inception Score (IS)
Papers - Image - Training - AutoRegressive
Papers - Inference - Image - vLLM
Table 7
Papers - Image - Training - Captions created with LLaVA
Stage 2 training using LLaVA to describe the image with a caption
Datasets - Text - Characters
Papers - Image - Tokenizer - L2 Normalization
Papers - Image - Training - Loss - Gradient Estimator
Papers - Image - Training - Loss - PatchGAN
Papers - Image - Training - Arch - 2D RoPE and SwiGLU
Papers - Image - Training - Detailed Training Tables
Papers - Image - Classifier-Free Guidance (CFG)
Guidance - "The intended effect is to decrease the diversity of the samples while increasing the quality of each individual sample."
-
Classifier-Free Diffusion Guidance
Paper • 2207.12598 • Published • 3 -
Adding Conditional Control to Text-to-Image Diffusion Models
Paper • 2302.05543 • Published • 57 -
Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models
Paper • 2404.07724 • Published • 14 -
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
Paper • 2406.06525 • Published • 71
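To make the guidance quote above concrete, here is a minimal sketch of classifier-free guidance applied to next-token logits, assuming the common parameterization `guided = uncond + scale * (cond - uncond)`. The function names and toy logits are hypothetical and not taken from the code of any listed paper; the entropy comparison only illustrates the diversity versus quality trade-off on this single example.

```python
import math

def cfg_logits(cond, uncond, scale):
    """Push conditional logits away from the unconditional ones."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; lower means a less diverse distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Hypothetical logits for a 3-token vocabulary.
cond_logits = [2.0, 1.0, 0.0]
uncond_logits = [1.0, 1.0, 1.0]

for scale in (1.0, 3.0):
    probs = softmax(cfg_logits(cond_logits, uncond_logits, scale))
    print(scale, [round(p, 3) for p in probs], round(entropy(probs), 3))
```

A scale of 1.0 recovers the conditional logits unchanged; in this example a larger scale concentrates probability on the top token, which is the reduced-diversity effect the quote describes.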
Papers - Image - BigGAN
Papers - University - Heriot-Watt University
Papers - Image - Metrics - FID and IS
Papers - Image - Sampling - Variety, Fidelity, Truncation
Papers - Image - Classifier - Inception v2 - JFT-300M
Papers - Training - Process Reward Model
Papers - RL - Monte Carlo Tree Search (MCTS)
Papers - Image - InceptionResNet-v2
Papers - Image - Datasets - JFT-300M
Papers - Image - Semantic Segmentation - Benchmark - PASCAL
Papers - Image - U-Net - Mask Augmentation
Papers - Ant Group
Papers - Monte Carlo Tree Search (MCTS) - Self-Refine MCTSr
Papers - Monte Carlo Tree Search (MCTS) - Math Reasoning
Papers - University - Hong Kong Polytechnic University
Papers - Image - Faster RCNN
Papers - Image - Region Proposal Network (RPN)
Papers - Image - Faster RCNN - 2nd Stage - Box Classifier
Papers - Image - Faster RCNN - Region Proposal Network (RPN)
Papers - Image - Human Pose Estimation - Coco
Papers - Image - InceptionResNet
Papers - Image - Deep Fakes - Detecting Video Forgeries
Papers - University - Drexel University
Papers - Prompts - Report
Papers - Agent - Security
Papers - Security - Pen Testing
Papers - Security - OWASP Testing
Papers - Image - Diffusion - Parallel Denoising
Papers - Image - Inference - Model Segmentation
Papers - Image - Denoising - Stride Denoising
Papers - Video - SDXL - Multi-GPU
Papers - Training - Multi-GPU
Papers - Coding - Benchmarks - McEval
Papers - Coding - Training - Annotations
Papers - Coding - Prompts
Papers - Coding - Inference - vLLM
Papers - Coding - Training - Distributed - PyTorch FSDP
Papers - Coding - Tokenizer - CodeBert
Papers - Coding - Tokenizer - Visualization - t-SNE
Papers - Coding - Tokenizer - Viz - Hierarchical Clustering
Papers - CCSE
Papers - Coding - MCoder
Papers - Coding - Classification - Categories Easy Med Hard
Papers - University - Beijing Information Science and Tech
Papers - Coding - Fine-tuning - CodeQwen
Papers - Coding - Fine-tuning - DeepSeekCoder
Papers - World Sim - Video - Benchmarks - MMWorld
Papers - University - University of California Santa Cruz
Papers - SSMs - Chimera
Papers - SSMs - Testing - Time Series Forecasting Report
Papers - SSMs - 2D Mamba
Papers - SSMs - Classification
Papers - SSMs - Time Series Anomaly Detection
Papers - Image - Augmentation - Edge Detection - HED
Papers - Image - ControlNet
-
Adding Conditional Control to Text-to-Image Diffusion Models
Paper • 2302.05543 • Published • 57 -
Smoothed Energy Guidance: Guiding Diffusion Models with Reduced Energy Curvature of Attention
Paper • 2408.00760 • Published • 8 -
MagicQuill: An Intelligent Interactive Image Editing System
Paper • 2411.09703 • Published • 78 -
BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion
Paper • 2403.06976 • Published • 2
Models - Image - ControlNet - Canny
Models - Image - ControlNet - Training Annotator
Papers - Image - Datasets - BSDS 500 - Berkeley Segmentation
Papers - Image - Datasets - NYUD - NYU Depth
Papers - Image - VGGNet
Papers - Image - Pipeline - HED
Papers - Image - CFG - CFG Resolution Weighting (CFG-RW)
Papers - 3DGS - Enhancement - Lighting
Papers - 3DGS - Cone Scatter Initialization
Papers - 3DGS - Security Camera - Image Enhancement
Papers - Training - Preference Optimization - DiscoPOP
Papers - Training - Preference Optimization - Code Samples
Papers - Training - Synthetic - Loss Functions
Papers - Image - Augmentation - Binarization - NAF-DPM
Papers - Image - OCR - Binarization - Otsu
Papers - Image - OCR - Binarization - Sauvola
Papers - Image - OCR - Binarization - DE-GAN
Papers - Image - OCR - Binarization - D2BFormer
Papers - Image - OCR - Binarization - DocDiff
Papers - Image - OCR - Binarization - DocEnTr
Papers - Image - OCR - CER (Character Error Rate)
Papers - Image - Datasets - OCR - DIBCO
Papers - Image - OCR - Metrics - PSNR, F-Measure, Fps
Papers - Image - DPM - Diffusion Probabilistic Model
Papers - Image - OCR - Fine-tuning - CTC Loss Function
Papers - Document - Deblurring
Papers - Datasets - Multimodal
-
DataComp: In search of the next generation of multimodal datasets
Paper • 2304.14108 • Published • 2 -
The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective
Paper • 2407.08583 • Published • 13 -
TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation
Paper • 2411.04709 • Published • 26 -
YFCC100M: The New Data in Multimedia Research
Paper • 1503.01817 • Published • 1
Models - Abliterated - Refusal Direction Editing
Papers - Image - Augmentation - Depth - MDE
Models - Image - Augmentation - Depth Estimation
Papers - Image - Augmentation - Plasma Fractals
Papers - XAI - Text - WordNet - Noun and Verb Hierarchy
Papers - Text - Training - Estimation - LDA
Papers - Quantization
-
QLoRA: Efficient Finetuning of Quantized LLMs
Paper • 2305.14314 • Published • 56 -
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
Paper • 2407.11062 • Published • 10 -
Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models
Paper • 2407.12327 • Published • 79 -
BitNet a4.8: 4-bit Activations for 1-bit LLMs
Paper • 2411.04965 • Published • 69
Spaces - Image - Stable Diffusion - 3 - Medium
Papers - 3D - Artist-Created Meshes (AMs)
Papers - Inference - Speculative Decoding
Papers - Duplex Models
Papers - Embed - Duplex Models - Time-Division Multiplexing
Papers - XAI - Confidence Regulation
Papers - Image - Charts - QA - Reasoning
-
CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs
Paper • 2406.18521 • Published • 29 -
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?
Paper • 2407.01284 • Published • 81 -
ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild
Paper • 2407.04172 • Published • 26
Papers - Image - Charts
Papers - Image - Benchmarks - Charts
Papers - In-Context Learning - Concept Learning Geometry
Papers - ICL - Prompt - Out of Distribution (OOD) Emergence
Papers - ICL - Concept Spaces
Papers - NTT Research
Papers - Coding - Programming by Example
Papers - Coding - List Functions, Editing, Logos ASCII Art
Papers - Coding - Eval - LambdaBeam Problems
Papers - Coding - Building Using Multi-shot Prompts
Spaces - Biology - ESM - Proteins
Models - Biology - Proteins - ESM
Papers - Healthcare - Datasets - Image - PubMedVision
Papers - Image - Fine-tuning - LLaVA
-
HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale
Paper • 2406.19280 • Published • 63 -
PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance
Paper • 2411.02327 • Published • 11 -
MagicQuill: An Intelligent Interactive Image Editing System
Paper • 2411.09703 • Published • 78 -
LLaVA-o1: Let Vision Language Models Reason Step-by-Step
Paper • 2411.10440 • Published • 129
Papers - Image - Datasets - Biology - Arboretum
Papers - Math - TabMWP
Papers - Training - Brier Score - Probabilistic Accuracy
Datasets - Text - Personas
Papers - Rag - Benchmarks
Papers - Rag - Long Context
Models - Text - Multi-token Prediction
Papers - 3D - AssetGen
Papers - Benchmarks - Tables
Papers - Text - Inferential Adversaries
Papers - Image - Region Zoom
Papers - Multimodal - Embeddings
Papers - Image - Florence 2
Papers - 3DGS - Text - Enhance
Papers - 3DGS - Geometry-Bound
Papers - 3DGS - Loss - Interval Score Matching (ISM)
Papers - 3DGS - Classifier-Free Guidance (CFG)
Papers - 3DGS - Training - Model - Stable Diffusion 2
Papers - 3DGS - Text - Image - Mesh
Papers - Multimodal - 3DGS and Text
Models - Text - Research
Papers - Multimodal - Training - Joint Example Selection
Papers - Image - Training - Optimization - SigLIP
Models - Text - Fine-tuning - Axolotl
Models - Text - Chemistry
Models - SAE - Sparse Auto Encoders
Repo: https://github.com/EleutherAI/sae
Papers - Multimodal - Training - Decoder Only
Papers - Multimodal - Training - Patch Aligning Layer
Papers - Attention - Decoder Only
Papers - Multimodal - Training - Loss - Cross Entropy
Papers - Multimodal - Training - LLM Guided Pre-training
Papers - Agent - Tools
Papers - Agent - Math
Papers - Agent - Math - Reasoning
Papers - Text - Decoding - Truthful
Papers - Decoders - Report
Datasets - CoT - Math
Datasets - CoT
Papers - XAI - Attention - MLP - Partitioning - Affine Maps
Papers - XAI - Token Tracing - Model MLP Layers Plots
Papers - Decoders - Strategy - Beam Search - Report
Papers - RL - Gradient-Boosting
Papers - Markov Decision Process
Papers - RL - Actor-Critic
Papers - RL - GBT vs GBRL vs XGBoost
Papers - RL - Structured Data - Gradient Boosting
Papers - XGBoost
Papers - Datasets - Multimodal - Creator Guide
Papers - Decoders - Deterministic - FSD
Papers - Decoders - Deterministic - Diverse Beam Search
Papers - Decoders - Deterministic
Deterministic methods with unaligned models usually perform better on all tasks except for open-ended text generation.
Papers - Decoders - Stochastic
Papers - Decoders - Deterministic - DoLa
Papers - Decoders - Deterministic - Greedy Search
Papers - Decoders - Deterministic - Contrastive Search
Papers - Decoders - Deterministic - Contrastive Decoding
Papers - Decoders - Stochastic - Mirostat Sampling
Papers - Decoders - Stochastic - Typical Sampling
Papers - Decoders - Stochastic - Temperature Sampling
Papers - Decoders - Stochastic - Top-p Sampling
Papers - Decoders - Stochastic - Top-k Sampling
Papers - Decoders - Stochastic - n-Sampling
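The deterministic and stochastic decoder groups above differ in how the next token is chosen from the model's logits. The following is a minimal sketch, assuming a single decoding step over a toy 4-token vocabulary: greedy search takes the argmax, while temperature, top-k, and top-p sampling reshape or truncate the distribution before drawing from it. All function names and the logits are hypothetical, and this is not the evaluation code of any listed paper.

```python
import math
import random

def softmax(logits, temperature=1.0):
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy(logits):
    """Deterministic: always pick the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])

def top_k_probs(logits, k, temperature=1.0):
    """Keep the k most likely tokens and renormalize."""
    probs = softmax(logits, temperature)
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

def top_p_probs(logits, p, temperature=1.0):
    """Keep the smallest set of tokens whose cumulative mass reaches p."""
    probs = softmax(logits, temperature)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, mass = [], 0.0
    for i in order:
        keep.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return {i: probs[i] / mass for i in keep}

def sample(dist, rng):
    """Stochastic: draw one token id from a {token_id: prob} mapping."""
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]

logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical next-token logits
rng = random.Random(0)          # fixed seed so the sketch is repeatable
greedy_token = greedy(logits)
top_k_dist = top_k_probs(logits, k=2)
top_p_dist = top_p_probs(logits, p=0.8)
sampled_token = sample(top_k_dist, rng)
print(greedy_token, top_k_dist, top_p_dist, sampled_token)
```

Greedy search returns the same token on every call, while the sampled token depends on the random state; that difference is what separates the two groups of methods.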
Papers - Benchmark - Coding - HumanEval
Papers - Coding - Datasets - MBPP
Papers - Text - Datasets - Translation - WMT22
Papers - Text - Benchmark - Translation - BLEU
Papers - Text - Benchmark - Factual Knowledge - FActScore
Papers - Text - Benchmark - Instructions - AlpacaEval
Papers - Fine-tuning - Math - QA
Papers - Image - Reasoning
Papers - Encodings - SPE - Sinusoidal Position Encoding
Papers - Encodings - LPE - Learnable Position Encodings
Papers - Text - Reasoning - Causal Chains
Papers - Text - Dataset - Knowledge Graph - WordNet
Papers - Knowledge Graph - Dataset - Text - WordNet
Papers - Knowledge Graph - GraphRag - WordNet
Papers - CoT - Intermediate Thoughts
Papers - CoT - Branch Solve Merge (BSM)
Papers - Training - Text - Continual Learning
Models - Text - Embedding
-
nomic-ai/nomic-embed-text-v1
Sentence Similarity • 0.1B • Updated • 799k • 544 -
nomic-ai/nomic-embed-text-v1.5-GGUF
Sentence Similarity • 0.1B • Updated • 11.2k • 78 -
BAAI/bge-large-en-v1.5
Feature Extraction • 0.3B • Updated • 4.3M • 590 -
mixedbread-ai/mxbai-embed-2d-large-v1
Feature Extraction • 0.3B • Updated • 900 • 40
Datasets - Text - Wiki - Embeddings - SBert
Papers - ICV - In-Context Vectors (controllable ICL)
Repo: https://github.com/shengliu66/ICV
Papers - Positive Geometries - Report
Papers - ICL - Attention
Papers - ICV - PCA - Directional Alignment
Papers - Text - Detoxification
Papers - Text - Datasets - Toxicity - ParaDetox
Papers - Text - Toxicity - Feature Shifting
Papers - Text - Safety
Papers - Fine-tuning - Text - Detoxification - LoRA
Papers - Text - Personalization - ICV
Papers - Text - Datasets - Formality - Yahoo Answers
Papers - Text - Role-Play - Shakespeare - Romeo and Juliet
Papers - Text - Datasets - Sentiment Transfer - Yelp Reviews
Papers - Text - Role-Play - Ranking Responses - ChatGPT
Papers - Vicuna
Papers - Text - Benchmarks - Similarity - Text - ROUGE-1
Papers - Text - Benchmark - Similar - Feature - Bert-Score
Papers - ICL - Detox - ICL Fine-tuning vs In-Context Vectors
Papers - Text - Personalization - Positivity
Papers - Text - Safety - Diagonal Safety for Unsafe Queries
Papers - ICV - Strength - Tradeoffs Similarity and Fluency
Papers - Text - Jailbreak - ICV
Papers - Text - Role-Play - Style - Speaking
Papers - Text - Datasets - AGNews
Papers - Text - Activation Editing
Papers - Activation Editing - ICV
Papers - Text - Task Arithmetics - Fine-tune vs Base
Papers - ICV - Task Arithmetics
Papers - Text - Formality - Classifier - XLM-RoBERTa
Papers - Text - Sentiment - Classification
Papers - Attention - Dual Chunk
Papers - Attention - Rescale Weights - YARN
Papers - Activation - SwiGLU
Papers - Text - Training - Long Context
Papers - Training - Data Annotation
Papers - Benchmarks - Alignment - MT-Bench
Papers - Benchmarks - Text - Long Context - LV-Eval
Papers - Benchmarks - Long Context - Needle in a Haystack
Papers - Benchmarks - Text - Long Context - NeedleBench
Papers - Benchmarks - Biology
Papers - Text - Long Context
Papers - Text - Benchmarks - Reasoning - Long Context - ATC
Papers - 3DGS - Benchmarks - LPIPS
Papers - 3DGS - Scene Editing - Day vs Night - t-SNE
Papers - 3DGS - Editing - Appearance Interpolation
Papers - 3DGS - Datasets - Photo Tourism
Papers - 3DGS - Benchmarks - SSIM
Papers - 3DGS - Datasets - NeRF on-the-go
Papers - 3DGS - Fibonacci Sphere Sampling - Sky Handling
Papers - 3DGS - Uncertainty - Per-Pixel Binary Mask
Papers - Quantization - EfficientQAT
Papers - Ternary
Papers - Audio - Text - Music Generator
Papers - Quantization - AQLM
Papers - Security - Red Team - Agents
Papers - Multimodal - Benchmarks
Papers - Text - Linguistic Agency - Algospeak
Papers - Text - Cognitive Science - Participation
Papers - Text - Cognitive Science - Linguistic Agency
Papers - Text - Linguistics - Precarity - Conflict - Tension
Papers - Text - Linguistics - CYOA Game Exploration
Papers - Visualizations - Non-Euclidean Structures
Papers - Visualizations - Report
Papers - Visualizations - Topological, Geometric, Algebraic
Papers - Visualizations - High Dimensional Approximations
Papers - Image - Segmentation - High Dimensional Objects
Papers - Visualizations - Graphical Taxonomy
Papers - Visualizations - Dimensionality Reduction
-
Beyond Euclid: An Illustrated Guide to Modern Machine Learning with Geometric, Topological, and Algebraic Structures
Paper • 2407.09468 • Published • 2 -
Efficient Algorithms for t-distributed Stochastic Neighborhood Embedding
Paper • 1712.09005 • Published • 1 -
UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction
Paper • 1802.03426 • Published • 1 -
Principal subbundles for dimension reduction
Paper • 2307.03128 • Published • 1
Papers - Math - Non-Euclidean Geometry
Papers - Math - Visualizations
-
Beyond Euclid: An Illustrated Guide to Modern Machine Learning with Geometric, Topological, and Algebraic Structures
Paper • 2407.09468 • Published • 2 -
Barycentric Subspace Analysis on Manifolds
Paper • 1607.02833 • Published • 1 -
Template shape estimation: correcting an asymptotic bias
Paper • 1610.01502 • Published • 1 -
A Heat Diffusion Perspective on Geodesic Preserving Dimensionality Reduction
Paper • 2305.19043 • Published • 1
Papers - Math - Topology - Discrete Topological Structures
Papers - Math - Geometry - Distance - Riemannian Manifold
Papers - Math - Geometry - Distance - Riemannian Metric
Papers - Math - Geometry - Riemannian Geodesic
-
Beyond Euclid: An Illustrated Guide to Modern Machine Learning with Geometric, Topological, and Algebraic Structures
Paper • 2407.09468 • Published • 2 -
Geodesic Multi-Modal Mixup for Robust Fine-Tuning
Paper • 2203.03897 • Published • 1 -
A Heat Diffusion Perspective on Geodesic Preserving Dimensionality Reduction
Paper • 2305.19043 • Published • 1 -
A micro Lie theory for state estimation in robotics
Paper • 1812.01537 • Published • 1
Papers - Math - Research - Training Loss - Riemannian Metric
Papers - Coding - Science
Papers - Math - Geometry - Continuous Geometric Structures
Papers - Math - Algebra - Algebraic Transformations
Papers - Training - Energy - Carbon Footprint
Papers - Coding - Verilog
Papers - Coding - Hardware - FPGA
Papers - Coding - Agentic - Summarization - Prompting
Papers - Attention - Topology, Geometry and Algebra
See also: https://dawn.cs.stanford.edu/2019/10/10/noneuclidean/
Papers - Math - Structures - Topology, Geometry and Algebra
Papers - Training - Math - PCA
See also: https://dawn.cs.stanford.edu/2019/10/10/noneuclidean/
-
In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering
Paper • 2311.06668 • Published • 5 -
Beyond Euclid: An Illustrated Guide to Modern Machine Learning with Geometric, Topological, and Algebraic Structures
Paper • 2407.09468 • Published • 2 -
Efficient Algorithms for t-distributed Stochastic Neighborhood Embedding
Paper • 1712.09005 • Published • 1
Models - Attention - GQA
Papers - Image - PhotoMaker
Papers - Math - Algebra - Lie Group - SO(3)
Papers - Math - Group Action - Translate, Rotate, Reflect
Papers - Math - Structures in Data
Papers - Healthcare - CoT - Diagnosis
Papers - Healthcare - Medical Assistant - Diagnosis
Papers - Math - Training - Topological Structures
Papers - Topological Deep Learning - Structures in Data
Papers - Graphs
Papers - Training - Research - Data as Signals
Papers - Training - Noise - Labels
Papers - Attention - Algebra SE(d) - Fourier Nonlinearities
Papers - Attention - Algebra - Equivariant
Papers - Math - Fourier Components - Fourier Space
Papers - Encodings - Equivariant Positional Encodings
Models - Embedding - Text - BGE M3
Models - Text - Fine-tuning - SPPO
Models - Text - Fine-tuning - SPPO - Reranker
Papers - KAN
Papers - MLP
Papers - Training - Activation - Nonlinear - B-spline
Papers - Multilingual - Greek
Papers - Multilingual - Malaysian
Papers - Multilingual - Hebrew
Papers - Training - Research - Data as Coordinates
Papers - Math - Non-Euclidean Spaces - Domain and Codomain
Papers - Math - Non-Euclidean - Covariance Matrix - SPD
Papers - Math - Visualization - Non-Linear - t-SNE
2008 paper: https://www.jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf
Models - Bitnet - Layer Conversion
Models - Bitnet - Frankenmerge
Papers - Reasoning - Grokking
Papers - Non-Euclidean - Sphere - Frechet Mean - Geodesic
Papers - Math - Riemannian Manifolds - PCA
Papers - Math - PCA - Barycentric Subspace Analysis (BSA)
Papers - NEML - Frechet Mean - Consistency Bias
Papers - Math - Manifold - Metric Space - Quotient Space
Models - Image - Rectified Flow Transformers
Papers - Image - Rectified Flow Transformers
Papers - Math - Self-Compressing Models
Papers - Fine-tuning - LlamaFactory
Papers - Coding - DBA
Papers - Multimodal - Storytelling
Papers - Audio - Segmentation - Music - Vocals
Papers - Netflix
Papers - Georgia Institute of Technology
Papers - Audio - Segmentation - Cinematic Music
Papers - Image - Training - Instruct - VQA - Multi-Image
Papers - NTU
Papers - Fine-tuning - LoRA - Rank Stabilized Adapters
Papers - NEML - Manifold - Tangent Space - Exponential Map
Papers - NEML - Math - KNN with Geodesics and Frechet Mean
Recommended to explore constrained manifold areas with limited curvature (possible bias)
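As a rough illustration of the Frechet mean mentioned above, the sketch below computes it on the unit sphere with the standard fixed-point iteration: map the points into the tangent space at the current estimate (log map), average them, and step back onto the sphere (exp map). It assumes the points sit in a small region of the sphere, in line with the limited-curvature recommendation; the function names and sample points are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def log_map(p, q):
    """Tangent vector at p pointing to q, with length equal to the geodesic distance."""
    c = max(-1.0, min(1.0, dot(p, q)))
    theta = math.acos(c)
    w = [qi - c * pi for pi, qi in zip(p, q)]
    n = norm(w)
    if n < 1e-12:  # q equals p (antipodal points are not handled in this sketch)
        return [0.0] * len(p)
    return [theta * wi / n for wi in w]

def exp_map(p, v):
    """Walk from p along the geodesic with initial velocity v."""
    n = norm(v)
    if n < 1e-12:
        return list(p)
    q = [math.cos(n) * pi + math.sin(n) * vi / n for pi, vi in zip(p, v)]
    scale = norm(q)  # renormalize to counter floating-point drift
    return [x / scale for x in q]

def frechet_mean(points, steps=100, tol=1e-10):
    mu = list(points[0])
    for _ in range(steps):
        logs = [log_map(mu, x) for x in points]
        step = [sum(col) / len(points) for col in zip(*logs)]
        if norm(step) < tol:
            break
        mu = exp_map(mu, step)
    return mu

# Four hypothetical points placed symmetrically around the north pole.
a = 0.5
points = [
    [math.sin(a), 0.0, math.cos(a)],
    [-math.sin(a), 0.0, math.cos(a)],
    [0.0, math.sin(a), math.cos(a)],
    [0.0, -math.sin(a), math.cos(a)],
]
mean = frechet_mean(points)
print([round(x, 6) for x in mean])
```

A geodesic KNN would use `norm(log_map(p, q))` as its distance in place of the Euclidean one.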
Papers - Math - Non-Euclidean Machine Learning (NEML)
See also: https://dawn.cs.stanford.edu/2019/10/10/noneuclidean/
Models - Coding - Compiler
Papers - NEML - Preprocessing - Topological Data Analysis
Papers - NEML - Latent Manifold - Topological Data Analysis
Papers - NEML - Preprocessing - Algebra - Group Learning
Papers - NEML - Latent Structure - Algebra - Group Learning
Papers - NEML - Transform - Euclidean to Manifold
Papers - NEML - Manifold - Local Geodesic Regression
Papers - NEML - Manifold - Bayesian - Kernel Regression
Papers - Image - Multi-Image
Papers - Benchmark - Distractions
Papers - mPLUG
Papers - Attention - Topology
Papers - Math - Regression - Geometric Structures
Papers - NEML - Manifolds Geometric - Polynomial Regression
Papers - NEML - Manifolds Geometric - Bezier Splines
Papers - NEML - Frechet Regression - Geodesic Regression
Papers - NEML - Regression - Manifold - Weighted Frechet
Papers - NEML - Regression - Stochastic - Non-Geodesic
Papers - NEML - Regression - Local Frechet Regression
Papers - NEML - Bayesian - Non-Parametric - Gaussian Process
Papers - NEML - Manifold Random Forest
Papers - NEML - Regression - Local Extrinsic
Papers - NEML - Manifold IO - Steinke Regular Splines
Papers - NEML - Manifold IO - Banerjee Kernel Regression
Papers - NEML - Geometric Structures - Dim Reduction - tSNE
Papers - NEML - Geometric Structures - Dim Reduction - UMAP
Papers - NEML - Geometric - Dimension Reduction - Isomap
Papers - NEML - Geometric - Dim Reduction - Barycentric Subs
Papers - NEML - Geometric - Dimension Reduction - Rie-SNE
Papers - NEML - Linear - Embeddings - Tangent Space PCA
Papers - NEML - Geometric - Dim Reduction - Poincare Embeds
Papers - NEML - Manifolds - VAE
Papers - NEML - Non-Euclidean Machine Learning
See also: "A Riemannian Framework for Tensor Computing" https://inria.hal.science/inria-00070743/file/RR-5255.pdf
- Beyond Euclid: An Illustrated Guide to Modern Machine Learning with Geometric, Topological, and Algebraic Structures
  Paper • 2407.09468 • Published • 2
- Barycentric Subspace Analysis on Manifolds
  Paper • 1607.02833 • Published • 1
- Poincaré Embeddings for Learning Hierarchical Representations
  Paper • 1705.08039 • Published • 1
- Efficient Algorithms for t-distributed Stochastic Neighborhood Embedding
  Paper • 1712.09005 • Published • 1
Papers - NEML - Hyperbolic - Frechet Mean - Poincare
Papers - NEML - Hyperbolic Learning - Poincare Ball
Papers - NEML - Poincare Ball
Papers - NEML - Datasets - WordNet
Papers - Science - Discovery
Papers - NEML - Euclidean Latents - Decoder - Riemannian LLE
Papers - NEML - Euclid Latents - Nongeodesic Sub Man - VAE
Papers - NEML - Manifold Latents - Hypersphere VAE
Papers - NEML - Manifold Latents - Lie Group Latent Space
Papers - NEML - Manifold Latents - Toroidal Latent Space
Papers - NEML - Manifold - Nonparametric Decoder - GPLVM
Papers - NEDL - Non-Euclidean Deep Learning
See also: https://dawn.cs.stanford.edu/2019/10/10/noneuclidean/
- Beyond Euclid: An Illustrated Guide to Modern Machine Learning with Geometric, Topological, and Algebraic Structures
  Paper • 2407.09468 • Published • 2
- Architectures of Topological Deep Learning: A Survey on Topological Neural Networks
  Paper • 2304.10031 • Published • 3
- Galois Theory
  Paper • 2408.07499 • Published • 1
- Equivariant Transformer Networks
  Paper • 1901.11399 • Published • 1
Papers - NEDL - Model Layer - Euclidean - MLP
Papers - NEDL - Layer - Perceptron-Exp - Riemannian Expo Map
The manifold must be known to implement this layer; manifolds whose Exp map has an analytical expression are preferred.
Papers - NEDL - Layer - Log Perceptron - Riemannian Log Map
The manifold must be known to implement this layer; manifolds whose Log map has an analytical expression are preferred.
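As a rough illustration of why analytical Exp and Log maps are convenient, here is a minimal sketch on the unit sphere, where both maps have closed forms. The choice of manifold and the function names `sphere_exp` and `sphere_log` are assumptions made for this example; they are not taken from the papers in these collections.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sphere_exp(p, v):
    """Exp map on the unit sphere: walk from unit vector p along tangent v."""
    t = math.sqrt(_dot(v, v))  # geodesic distance to travel
    if t < 1e-12:
        return list(p)
    return [math.cos(t) * a + math.sin(t) * b / t for a, b in zip(p, v)]


def sphere_log(p, q):
    """Log map on the unit sphere: tangent vector at p that points to q."""
    c = max(-1.0, min(1.0, _dot(p, q)))
    theta = math.acos(c)  # geodesic distance between p and q
    w = [b - c * a for a, b in zip(p, q)]  # part of q orthogonal to p
    n = math.sqrt(_dot(w, w))
    if n < 1e-12:
        if c > 0:
            return [0.0] * len(p)  # q coincides with p
        raise ValueError("Log map is not unique for antipodal points")
    return [theta * x / n for x in w]


# Round trip: Log undoes Exp for tangent vectors shorter than pi.
north = [0.0, 0.0, 1.0]
tangent = [math.pi / 2, 0.0, 0.0]
point = sphere_exp(north, tangent)
recovered = sphere_log(north, point)
```

On manifolds without closed forms, each of these calls would instead require a numerical geodesic solver, which is presumably why the note prefers manifolds with analytical expressions.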
Papers - NEDL - Model Layers - Topology, Geometry, Algebra
Papers - NEDL - Benchmarks - Topology Deep Learning (TDL)
Papers - NEDL - Attention - Equivar - Steerable Transformers
Euclidean signal on Euclidean domains for keys, queries and values, with group action on codomain.
Papers - NEDL - Attention - Equivariance - LieTransformer
Euclidean signal on manifold domain for all inputs / outputs, with domain group action.
Papers - NEDL - Geometry - Layers - ManifoldNet
Manifold-valued data convolutions. The tangent mean can be used in place of the Fréchet mean to save on compute.
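One way to read the compute trade-off in the note above is sketched below on the unit sphere. The tangent mean averages Log-mapped points once at a fixed base point, while the Fréchet mean repeats that step until the update vanishes. The sphere, the helper names, and the fixed-point iteration are assumptions for illustration only; this is not ManifoldNet's own estimator.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _exp(p, v):
    """Closed-form Exp map on the unit sphere."""
    t = math.sqrt(_dot(v, v))
    if t < 1e-12:
        return list(p)
    return [math.cos(t) * a + math.sin(t) * b / t for a, b in zip(p, v)]


def _log(p, q):
    """Closed-form Log map on the unit sphere (q must not be antipodal to p)."""
    c = max(-1.0, min(1.0, _dot(p, q)))
    w = [b - c * a for a, b in zip(p, q)]
    n = math.sqrt(_dot(w, w))
    if n < 1e-12:
        if c > 0:
            return [0.0] * len(p)
        raise ValueError("Log map is not unique for antipodal points")
    return [math.acos(c) * x / n for x in w]


def _mean_log(base, points):
    """Average of the points' Log images in the tangent space at base."""
    logs = [_log(base, q) for q in points]
    return [sum(col) / len(points) for col in zip(*logs)]


def tangent_mean(points, base):
    """Cheap approximation: a single averaging step at a fixed base point."""
    return _exp(base, _mean_log(base, points))


def frechet_mean(points, base, iters=50, tol=1e-10):
    """Repeat the averaging step until the tangent update is negligible."""
    m = list(base)
    for _ in range(iters):
        step = _mean_log(m, points)
        if math.sqrt(_dot(step, step)) < tol:
            break
        m = _exp(m, step)
    return m


# Three points placed symmetrically around the north pole (hypothetical data).
pts = [
    [math.sin(0.3) * math.cos(a), math.sin(0.3) * math.sin(a), math.cos(0.3)]
    for a in (0.0, 2 * math.pi / 3, 4 * math.pi / 3)
]
start = [math.sin(0.2), 0.0, math.cos(0.2)]
approx = tangent_mean(pts, start)
exact = frechet_mean(pts, start)
```

The tangent mean costs one pass over the data, and its error depends on how far the base point sits from the true mean and on how spread out the points are.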
Papers - Monte-Carlo Tree Search - MCTS
Papers - Function Calling
Papers - Function Calling - LLM Compiler - Parallel
Papers - Agent - Web Navigation
Papers - Video Games - Image - Understanding - QA
Papers - Video - Segmentation
Papers - Multimodal - Blip-3
Papers - Image - Summarize as JSON
Papers - Math - Polynomial Symmetry - Galois Theory
Papers - NEDL - Research - Symmetry - Group - Galois Groups
Papers - NEDL - Topological Deep Learning (TDL)
- Architectures of Topological Deep Learning: A Survey on Topological Neural Networks
  Paper • 2304.10031 • Published • 3
- Adaptive Topological Feature via Persistent Homology: Filtration Learning for Point Clouds
  Paper • 2307.09259 • Published • 1
- Persistent homology of the cosmic web. I: Hierarchical topology in ΛCDM cosmologies
  Paper • 2011.12851 • Published • 1
Papers - NEDL - Topology - Persistent Homology
Papers - Security - Benchmark
Papers - Music - Piano - Performer - Robot - Motion
Papers - Music - Training - Performer - Finger Location
Papers - Music - Training - Segmentation - Piano
Papers - Music - Training - Annotation - Piano
Papers - Audio - Pipeline - Annotation - Finger Placement
Papers - NEDL - Equivariant Transformers
Papers - NEDL - Lie Groups
Papers - NEDL - Hyperbolic Rotation
Papers - NEDL - Embeddings - Hyperbolic
Papers - NEDL - Dim Redct - Principal Geodesics Analysis PGA
Papers - Benchmark - Tables - Reasoning - QA
Papers - Normalization - NLP - Power vs Batch
Papers - Normalization - NLP - Layer vs Batch
Papers - Normalization - Embedding Layer - SVD
Papers - Normalization - No Normalization - Fixup
Papers - Training - Initialization - Regularization - Fixup
Papers - ResNet - Training - Init - Exploding Gradients
Papers - ResNet - Activation - nonlinear ReLU
Papers - Training - Layers - Scalar - Bias and Multipliers
Papers - Training - Regularization - MixUp Regularizer
- Fixup Initialization: Residual Learning Without Normalization
  Paper • 1901.09321 • Published • 1
- RegMixup: Mixup as a Regularizer Can Surprisingly Improve Accuracy and Out Distribution Robustness
  Paper • 2206.14502 • Published • 1
- MixUp as Locally Linear Out-Of-Manifold Regularization
  Paper • 1809.02499 • Published • 1
Papers - Training - Feature Space Cluster - Fisher Criterion
Papers - NEDL - Topology - Attention - Set Transformer
Papers - NEDL - Topology - Attn - Point Cloud Transformer
Papers - NEDL - Topology - Attention - Geodesic Transformer
Papers - NEDL - Topology - Attn - Graph Attn Transformer
Papers - NEDL - Topology - Attention - SE(3) Transformer
Papers - NEDL - Dim Reduction - Principal Geodesic Analysis
Papers - Text - Controllable Text Generation (CTG)
Papers - NEDL - Latent Space Manipulation
Papers - Training - Unlearning
Papers - Text - Survey
Papers - MoE - Jamba
Papers - Text - Benchmark - QA - Knowledge Conflicts
Spaces - Image - Segmentation
Spaces - Image - Prompt with LoRA
Spaces - Multimodal - Image Generation - Text and Image
Papers - NEDL - Geometry - Wasserstein Manifold
Papers - Multimodal - Alignment Correspondence Policy
Models - Image - Llava
Models - Image - SDXL
Papers - Training - Multi-Task Learning - Jacobian Descent
Repo: https://github.com/TorchJD/torchjd
Papers - Training - Loss - Multiple Loss - Jacobian Descent
Papers - Training - Hardware - Survey
Papers - NEDL - Topology - Lifting Topological Domains
Papers - Benchmarks - Data Science