# stable diffusion training methods

## fine-tuning
- retrains parts of the network with new data, thus modifying the original weights (see the training-step sketch below)
  requires a large and precisely labelled dataset
- size: same as the original model, ~2-7gb
- verdict: prohibitive due to the large dataset and effort required
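A minimal sketch of the training step that fine-tuning (and most of the heavier methods below) builds on: add noise to latents and train the model to predict it. The tiny `nn.Sequential` is a stand-in for the real unet and the noising schedule is deliberately simplified; this is not the actual stable diffusion training code.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64))  # stand-in unet
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

latents = torch.randn(4, 64)           # stand-in for vae-encoded training images
noise = torch.randn_like(latents)
t = torch.rand(4, 1)                   # stand-in timestep weighting
noisy = latents + t * noise            # simplified forward-noising schedule

loss = nn.functional.mse_loss(model(noisy), noise)  # predict the added noise
loss.backward()
opt.step()
opt.zero_grad()
```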
## model merge
- combines weights from multiple models according to specified rules (sketch below)
- verdict: highly desirable for creating pre-set models for specific use-cases
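A minimal sketch of a weighted merge, assuming two checkpoints with identical keys; linear interpolation with a single `alpha` is just the simplest possible merge rule.

```python
import torch

def merge_state_dicts(sd_a, sd_b, alpha=0.5):
    # linear interpolation of matching tensors; real merge tools offer more
    # rules, e.g. add-difference or per-layer weights
    return {k: alpha * sd_a[k] + (1 - alpha) * sd_b[k] for k in sd_a}

a = {"w": torch.ones(2, 2)}
b = {"w": torch.zeros(2, 2)}
merged = merge_state_dicts(a, b, alpha=0.7)  # 70% model a, 30% model b
```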
## textual inversion
- assigns a vector to a new concept; originally one vector per embedding, with hacks to enable multi-vector embeddings (sketch below)
  works by expanding the vocabulary of the model, but the majority of learned content is actually assembled from existing concepts
  can be thought of as a formula for how already-learned weights should be combined to achieve the learned concept
- size: 768 or 1024 values per vector (sd 1.x / sd 2.x)
- verdict: best currently viable short-term training solution
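A sketch of the core mechanic, assuming a toy vocabulary: everything stays frozen except the one new embedding vector, and the denoising loss (here replaced by a dummy target) is backpropagated only into it.

```python
import torch
import torch.nn as nn

embed = nn.Embedding(1001, 768)        # 1000 existing tokens + 1 new concept token
embed.weight.requires_grad_(False)     # frozen vocabulary
new_vec = embed.weight[1000].clone().requires_grad_(True)  # the only trainable tensor
opt = torch.optim.AdamW([new_vec], lr=5e-3)

for _ in range(100):
    # in real training, new_vec is spliced into the prompt embedding and the
    # frozen model's denoising loss flows back into it; a dummy target stands in
    loss = (new_vec - torch.ones(768)).pow(2).mean()
    loss.backward(); opt.step(); opt.zero_grad()
```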
## aesthetic gradient
- uses low-precision trained embeddings to steer clip via classifier guidance (sketch below)
  training is very cheap, but classifier guidance slows down image generation
  result is a basic transfer of style from the learned images to the generated image
- size: same as an embedding
- origin: independent work
- verdict: inconsistent results with minimal value
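A heavily simplified sketch of the idea: before sampling, take a few gradient steps that pull the prompt conditioning toward a precomputed "aesthetic" clip embedding. Both tensors here are random stand-ins; the actual method optimizes through the clip text encoder.

```python
import torch
import torch.nn.functional as F

cond = torch.randn(77, 768, requires_grad=True)  # stand-in text conditioning
aesthetic = torch.randn(768)                     # mean clip embedding of the style images
opt = torch.optim.SGD([cond], lr=1e-2)

for _ in range(20):
    sim = F.cosine_similarity(cond.mean(0), aesthetic, dim=0)
    loss = -sim                                  # maximize similarity to the aesthetic
    loss.backward(); opt.step(); opt.zero_grad()
```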
## custom diffusion
- fine-tunes specific model matrices combined with textual inversion (sketch below)
  similar speed and memory requirements to embedding training, and supposedly gives better results in fewer steps
- size: ~50mb
- origin: cmu
- verdict: possibly promising, requires further investigation; surprisingly little chatter on this topic
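A sketch of the parameter selection, assuming cross-attention modules named with the common `to_k`/`to_v` convention (illustrative, not the exact checkpoint layout): freeze the whole model, then unfreeze only the key/value projection matrices of cross-attention.

```python
import torch.nn as nn

class CrossAttention(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)   # trainable under custom diffusion
        self.to_v = nn.Linear(dim, dim)   # trainable under custom diffusion
        self.to_out = nn.Linear(dim, dim)

unet = nn.ModuleList([CrossAttention() for _ in range(3)])  # stand-in unet
for p in unet.parameters():
    p.requires_grad_(False)
for name, p in unet.named_parameters():
    if ".to_k." in name or ".to_v." in name:
        p.requires_grad_(True)            # only k/v matrices get gradient updates
```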
## hypernetwork
- similar to model fine-tuning, but adds a small neural network that modifies the weights of the last two layers of the main model on-the-fly (sketch below)
  works like an adaptive head that steers the model in a learned direction, so the primary use-case is style transfer, not concept transfer
- size: limited to the learned layers, ~100-200mb
- origin: leaked from novel.ai
- verdict: lower priority, as concept transfer is more important than style transfer
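A sketch of the mechanism, assuming the common implementation where a small residual MLP rewrites the context entering selected attention layers' key/value projections; the shapes match sd 1.x conditioning but everything else is a stand-in.

```python
import torch
import torch.nn as nn

class HyperModule(nn.Module):
    def __init__(self, dim=768):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim * 2), nn.ReLU(), nn.Linear(dim * 2, dim))

    def forward(self, x):
        return x + self.net(x)             # residual: a small learned shift

hyper_k, hyper_v = HyperModule(), HyperModule()
context = torch.randn(2, 77, 768)          # prompt conditioning entering attention
k_in, v_in = hyper_k(context), hyper_v(context)  # fed to the frozen k/v projections
```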
## null-text inversion
- similar concept to textual inversion, but trains the unconditional embedding used for classifier-free guidance instead of the text embedding (sketch below)
  the resulting embedding is apparently more detailed than a standard textual embedding
- size: larger than, but comparable to, textual inversion
- origin: google
- verdict: possibly promising, requires further investigation, but no working prototype as of yet
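A sketch under toy assumptions: classifier-free guidance mixes a conditional and an unconditional prediction, and here the unconditional ("null text") embedding is the only trainable tensor, optimized so the guided prediction matches a reconstruction target.

```python
import torch
import torch.nn as nn

unet = nn.Linear(768, 768)                     # stand-in for the frozen unet
for p in unet.parameters():
    p.requires_grad_(False)

cond = torch.randn(768)                        # frozen text embedding
uncond = torch.zeros(768, requires_grad=True)  # trainable null-text embedding
target = torch.randn(768)                      # stand-in reconstruction target
opt = torch.optim.AdamW([uncond], lr=1e-2)
scale = 7.5                                    # classifier-free guidance scale

for _ in range(50):
    guided = unet(uncond) + scale * (unet(cond) - unet(uncond))
    loss = (guided - target).pow(2).mean()
    loss.backward(); opt.step(); opt.zero_grad()
```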
## clip inversion
- similar concept to textual inversion, but uses a clip embedding instead of a text embedding
- size: same as textual inversion
- origin: google
- verdict: prohibitive due to the requirement of a specially fine-tuned model as a starting point
## dream artist
- variation on textual-inversion training where both positive and negative embeddings are created (sketch below)
- size: same as textual inversion
- origin: independent work
- verdict: skip for now, as the solution does not appear to be sufficiently maintained
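A sketch of the variation, reusing the toy setup from the null-text section: a positive and a negative embedding are trained jointly, and the negative one takes the place of the usual empty prompt in the unconditional branch of classifier-free guidance.

```python
import torch
import torch.nn as nn

unet = nn.Linear(768, 768)                  # stand-in frozen unet
for p in unet.parameters():
    p.requires_grad_(False)

pos = torch.randn(768, requires_grad=True)  # learned concept embedding
neg = torch.randn(768, requires_grad=True)  # learned "anti-concept" embedding
opt = torch.optim.AdamW([pos, neg], lr=1e-2)
target = torch.randn(768)                   # stand-in denoising target
scale = 7.5

for _ in range(50):
    guided = unet(neg) + scale * (unet(pos) - unet(neg))  # cfg with learned negative
    loss = (guided - target).pow(2).mean()
    loss.backward(); opt.step(); opt.zero_grad()
```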
## dreambooth
- similar to model fine-tuning, except it adds information on top of the model instead of forgetting/overwriting existing concepts (sketch below)
- size: equal to the original model, ~2-7gb
- origin: google, but heavily modified by independent work
- verdict: prohibitive due to the resulting size and the requirement to load the full model on-demand
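A sketch of the objective that makes dreambooth additive rather than destructive: the usual fine-tuning loss on the new subject, plus a prior-preservation term on class images the model generated itself. Tensors and the prior weight are stand-ins.

```python
import torch
import torch.nn as nn

unet = nn.Linear(64, 64)                    # stand-in unet
opt = torch.optim.AdamW(unet.parameters(), lr=1e-6)

subject = torch.randn(4, 64)                # latents of the new subject's images
prior = torch.randn(4, 64)                  # latents of model-generated class images
noise_s, noise_p = torch.randn_like(subject), torch.randn_like(prior)

loss = nn.functional.mse_loss(unet(subject + noise_s), noise_s) \
     + 1.0 * nn.functional.mse_loss(unet(prior + noise_p), noise_p)  # prior term
loss.backward(); opt.step(); opt.zero_grad()
```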
## lora
- "low-rank adaptation of large language models"
  injects trainable layers to steer the cross-attention layers (sketch below)
  very flexible, but memory-intensive, so training opportunities on a normal gpu are limited
  multiple incompatible implementations exist: an implementation must be chosen
- size: varies from ~5mb to full-model size, average ~150-300mb
- origin: microsoft
- verdict: very promising, but memory-prohibitive until further optimizations
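A minimal lora layer sketch: the frozen base weight gets a trainable low-rank update `B @ A`, so only `rank * (in + out)` parameters are learned per wrapped layer. The `rank` and `alpha` values are illustrative defaults, not from any particular implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank=4, alpha=4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                       # frozen original layer
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

layer = LoRALinear(nn.Linear(768, 768))
out = layer(torch.randn(2, 77, 768))   # base output plus low-rank correction
```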