fxmarty
/

logs

Model card Files Files and versions Community

logs / index.html

fxmarty

Upload folder using huggingface_hub

442e163 verified 3 months ago

raw

history blame contribute delete

9.92 kB


	<html>
	<head>
	<meta charset="UTF-8">
	</head>
	<style>

	table td { vertical-align: top; }

	.stack-trie { white-space: nowrap; font-family: monospace; }
	.stack-trie ul { padding-left: 1ch; }
	.stack-trie li { margin-left: 1ch; list-style-type: none; }
	.stack-trie .marker {
	cursor: pointer;
	}
	.stack-trie .marker.collapsed::before {
	content: "+ ";
	}
	.stack-trie .marker:not(.collapsed)::before {
	content: "- ";
	}
	.stack-trie a { text-decoration: none; }
	.stack-trie a:hover { text-decoration: underline; }
	.status-missing { background-color: purple; color: white; }
	.status-error { background-color: red; color: white; }
	.status-empty { background-color: white; color: black; }
	.status-ok { background-color: green; color: white; }
	.status-break { background-color: lime; color: black; }
	summary::-webkit-details-marker { color: #00ACF3; font-size: 125%; margin-right: 2px; }
	summary:focus { outline-style: none; }
	article > details > summary { font-size: 28px; margin-top: 16px; }
	details > p { margin-left: 24px; }
	details details summary { font-size: 16px; }

	</style>
	<script>

	function toggleList(toggleItem) {
	const listItem = toggleItem.parentNode;
	const nestedList = listItem.querySelector('ul');
	if (nestedList) {
	nestedList.style.display = nestedList.style.display === 'none' ? 'block' : 'none';

	// Toggle the collapse/expand indicator
	toggleItem.classList.toggle('collapsed');
	}
	}

	</script>
	<body>
	<div>

	<h2>Stack trie</h2>
	<p>
	The <strong>stack trie</strong> is a way of getting a quick orientation on where all the
	compilations in a model take place, esp., if you are compiling a codebase you are unfamiliar with.
	It is a tree of stack frames, for all stacks that triggered PT2 compilation. If only a single
	stack is in the tree, you will simply see a plain list of frames (most recent call last). With
	multiple stacks, at every point where two stacks diverge from having a common prefix, we increase
	the indentation of the list and have a separate sub-list per sub-tree.
	</p>
	<p>
	Links to particular compilation are color coded by status:
	<span class="status-ok">[Success]</span>,
	<span class="status-break">[Success with restart (e.g., graph break)]</span>,
	<span class="status-empty">[Empty graph]</span>,
	<span class="status-error">[Error]</span>,
	<span class="status-missing">[Metrics were missing]</span>
	</p>
	<details><summary>Stack</summary><div class='stack-trie'><ul><li>/shared_volume/repos/quark/bench_qdq.py:161 in <module><br>    mean, median = do_bench(run_scaled_fake_quantize_comp, kwargs_scaled_fake_quantize, num_runs=num_runs, num_warmup=num_warmup, name="quark qdq")</li>
	<li>/shared_volume/repos/quark/bench_qdq.py:70 in do_bench<br>    f(**kwargs)</li>
	<li><a href='#[0/0]' class='status-ok'>[0/0]</a> /shared_volume/repos/quark/bench_qdq.py:7 in run_scaled_fake_quantize<br>    </li>
	</ul></div></details>
	</div>
	<div>

	<h2>IR dumps</h2>
	<p>
	The <strong>IR dumps</strong> collected dumped intermediate products from various points of the PT2
	compilation process. The products are organized by compile id, and then sorted in chronological
	order.
	</p>
	<p>
	A <strong>compile id</strong> uniquely identifies are particular compilation inside a PT2
	program. It is traditionally written as <code>[x/y]</code>, where the <strong>frame id</strong> x
	identifies the particular Python frame which we are compiling, and <strong>frame compile
	id</strong> y identifies how many times we've recompiled this same frame. For example,
	<code>[0/0]</code> refers to the very first frame compiled by PT2; <code>[0/1]</code> refers to the
	first recompilation of this frame, while <code>[1/0]</code> refers to a different frame, within
	distinct code cache, which we are compiling next (perhaps because of a graph break). Although
	Dynamo treats distinct frames as completely unrelated, a frame compilation could overlap with another
	frame; for example, if you graph break in an inlined function, Dynamo will typically try to compile
	the nested frame again on an inner frame. You can identify the hierarchical relationship between
	frames by looking at the stack trie above.
	</p>
	<p>
	In some situations, the compile id will have an extra signifier <code>[x/y_z]</code>, where z is the
	<strong>attempt</strong> for this particular (re)compilation. Certain conditions will cause Dynamo to
	restart analysis, when Dynamo discovers that it needs to undo a decision it previously made. The most
	common cause of recompilation is a graph break in an inlined function call, which forces to restart
	and avoid inlining the function in the first place.
	</p>
	<p>
	When compiled autograd is enabled, the compile id will include a prefix signifier <code>[!a/x/y]</code>,
	where a is the <strong>compiled autograd id</strong>. For instance, <code>[!0/-/-]</code> refers
	to the first graph captured by compiled autograd. It is then traced by torch.compile as <code>[!0/x/y_z]</code>.
	</p>
	<p>
	Here is a high level description of PT2's compilation phases, and the intermediate products each
	phase generates:
	</p>
	<ol>
	<li><em>Optional:</em> If compiled autograd is enabled, and we are processing a backward call, compiled autograd will trace the autograd graph from the autograd engine, and produce an FX graph <code>compiled_autograd_graph</code> that will be Dynamo traced. Otherwise, Dynamo will directly trace user's bytecode.</li>
	<li>Dynamo symbolically evaluates the Python bytecode of a program, producing <code>dynamo_output_graph</code></li>
	<li><em>Optional:</em> If <code>optimize_ddp</code> is enabled, the DDPOptimizer will split the Dynamo output graph to improve pipelining communications. Each split subgraph is <code>optimize_ddp_split_child_submod</code>, and the high level graph that plumbs the graphs together is <code>optimize_ddp_split_graph</code>. If there are multiple splits, each subsequent build product will be produced multiple times, one for each split.</li>
	<li>AOTAutograd traces the (possibly split) Dynamo output graph, producing a <code>aot_joint_graph</code> if backwards is enabled. It then partitions the graph into <code>aot_forward_graph</code> and <code>aot_backward_graph</code>. If training is not needed, there may only be an <code>aot_inference_graph</code>.</li>
	<li>Inductor will apply some post grad FX passes, producing <code>inductor_post_grad_graph</code></li>
	<li>Inductor will perform code generation, producing the final <code>inductor_output_code</code> which will be executed at runtime. This output is a valid Python program and can be directly run.</li>
	</ol>


	<h2> Chromium Events </h2>
	PT2 generates <a href='chromium_events.json'>Chromium Trace Events</a> in JSON on specific events during compilation.
	You can download and view them in a tool like <a href='https://ui.perfetto.dev/'>Perfetto</a>.

	<p>
	Build products below:
	</p>
	<ul>

	<li><a id="[0/0]">[0/0]</a>
	<ul>

	<li><a href="-_0_0_0/dynamo_output_graph_0.txt">-_0_0_0/dynamo_output_graph_0.txt</a> (0)</li>

	<li><a href="-_0_0_0/inductor_pre_grad_graph_1.txt">-_0_0_0/inductor_pre_grad_graph_1.txt</a> (1)</li>

	<li><a href="-_0_0_0/before_recompile_pre_grad_2.txt">-_0_0_0/before_recompile_pre_grad_2.txt</a> (2)</li>

	<li><a href="-_0_0_0/after_recompile_pre_grad_3.txt">-_0_0_0/after_recompile_pre_grad_3.txt</a> (3)</li>

	<li><a href="-_0_0_0/aot_forward_graph_fw_metadata_4.txt">-_0_0_0/aot_forward_graph_fw_metadata_4.txt</a> (4)</li>

	<li><a href="-_0_0_0/aot_inference_graph_5.txt">-_0_0_0/aot_inference_graph_5.txt</a> (5)</li>

	<li><a href="-_0_0_0/torch._functorch.config_6.txt">-_0_0_0/torch._functorch.config_6.txt</a> (6)</li>

	<li><a href="-_0_0_0/inductor_output_code_ch44xxkifazlcpkp6mi44xhqeej2j5mbgwmesiwx6y3oajzmixxp_7.html">-_0_0_0/inductor_output_code_ch44xxkifazlcpkp6mi44xhqeej2j5mbgwmesiwx6y3oajzmixxp_7.html</a> (7)</li>

	<li><a href="-_0_0_0/fx_graph_cache_hit_8.json">-_0_0_0/fx_graph_cache_hit_8.json</a> ✅ (8)</li>

	<li><a href="-_0_0_0/aotautograd_cache_bypass_9.json">-_0_0_0/aotautograd_cache_bypass_9.json</a> ❓ (9)</li>

	<li><a href="-_0_0_0/dynamo_cpp_guards_str_10.txt">-_0_0_0/dynamo_cpp_guards_str_10.txt</a> (10)</li>

	<li><a href="-_0_0_0/compilation_metrics_11.html">-_0_0_0/compilation_metrics_11.html</a> (11)</li>

	</ul>
	</li>

	</ul>
	</div>






	<script>
	document.addEventListener('DOMContentLoaded', function() {

	// Append the current URL's query parameters to all relative links on the page
	const queryParams = new URLSearchParams(window.location.search);
	if (queryParams.size === 0) return url; // No query params, return original URL

	function appendQueryParams(url) {
	const newURL = new URL((new Request(url)).url); // new URL(<relative URL>) but it actually works
	const newSearchParams = new URLSearchParams(newURL.searchParams);
	console.log(newURL.searchParams);
	console.log(newSearchParams);

	// Append query parameters
	for (const [key, value] of queryParams) {
	newSearchParams.set(key, value);
	}

	newURL.search = newSearchParams;
	return newURL;
	}

	// Select all relative links on the page
	const relativeLinks = document.querySelectorAll('a[href]:not([href^="http://"]):not([href^="https://"]):not([href^="\#"])');

	// Append query parameters to each relative link
	relativeLinks.forEach((link) => {
	link.setAttribute("href", appendQueryParams(link.getAttribute("href")))
	});
	});
	</script>

	</body>
	</html>