Backdoors research
non-profit
AI & ML interests: Mechinterp, AI safety
Recent Activity
abir-hr196 authored a paper about 19 hours ago: Activation Space Interventions Can Be Transferred Between Large Language Models
abir-hr196 authored a paper about 20 hours ago: TinySQL: A Progressive Text-to-SQL Dataset for Mechanistic Interpretability Research
amirabdullah19852020 authored a paper over 1 year ago: Beyond Training Objectives: Interpreting Reward Model Divergence in Large Language Models
Team members: 6
martian-mech-interp-grant's models: 4
martian-mech-interp-grant/code-backdoor-sft-llama3.1-8b-v0 • 8B • Updated Oct 12, 2024 • 31
martian-mech-interp-grant/code-backdoor-sft-pythia-410m-v0 • 0.4B • Updated Oct 11, 2024 • 5
martian-mech-interp-grant/code-backdoor-sft-gemma2-2b-v0 • 3B • Updated Oct 11, 2024 • 6
martian-mech-interp-grant/code-backdoor-sft-pythia-1.4b-v0 • 1B • Updated Oct 11, 2024 • 9