diff --git a/README.md b/README.md
index 69a18775136..00cf1dad925 100644
--- a/README.md
+++ b/README.md
@@ -71,6 +71,7 @@ For the latest model updates and features, please see [MODEL_UPDATES.md](models/
- [Advanced Performance Optimizations for Models](./tech_reports/AdvancedPerformanceOperationsForModels/AdvancedPerformanceOptimizationsForModels.md) (updated Oct 17th)
- [Programming Mesh of Devices](./tech_reports/Programming%20Mesh%20of%20Devices/Programming%20Mesh%20of%20Devices%20with%20TT-NN.md) (updated Sept 9th)
- [ViT Implementation in TT-NN on GS](./tech_reports/ViT-TTNN/vit.md) (updated Sept 22nd)
+- [LLM Bring-up in TT-NN](./tech_reports/LLMs/llms.md) (updated Oct 29th)
---
diff --git a/tech_reports/LLMs/llms.md b/tech_reports/LLMs/llms.md
new file mode 100644
index 00000000000..4b4a34f6a7c
--- /dev/null
+++ b/tech_reports/LLMs/llms.md
@@ -0,0 +1,112 @@
+# LLMs in TT-NN
+Authors:
+## Contents
+- [LLMs in TT-NN](#llms-in-tt-nn)
+ - [Contents](#contents)
+ - [1. Overview](#1-overview)
+ - [2. Modules](#2-modules)
+ - [2.1 Embedding](#21-embedding)
+ - [2.2 RoPE](#22-rope)
+ - [2.3 Norm](#23-norm)
+ - [2.4 Attention](#24-attention)
+ - [2.5 MLP](#25-mlp)
+ - [2.6 Decoder](#26-decoder)
+ - [2.7 LM Head](#27-lm-head)
+ - [3. Features](#3-features)
+ - [3.1 Generative Decoding](#31-generative-decoding)
+ - [3.2 Prefill and Decode](#32-prefill-and-decode)
+ - [3.3 Multi-Device](#33-multi-device)
+ - [3.4 Continuous Batching](#34-continuous-batching)
+ - [3.5 vLLM Integration](#35-vllm-integration)
+ - [4. Best Practices and Optimizations](#4-best-practices-and-optimizations)
+ - [4.1 Tracing](#41-tracing)
+ - [4.2 Async Mode](#42-async-mode)
+ - [4.3 Multiple CQs](#43-multiple-cqs)
+ - [4.4 Op Configs](#44-op-configs)
+ - [4.5 Accuracy](#45-accuracy)
+ - [4.6 Performance Analysis](#46-performance-analysis)
+ - [4.7 Misc. Performance Optimizations](#47-misc-performance-optimizations)
+ - [4.8 Module Tests](#48-module-tests)
+ - [4.9 Performance Testing](#49-performance-testing)
+ - [4.10 Common Pitfalls](#410-common-pitfalls)
+ - [4.10.1 Error Messages](#4101-error-messages)
+ - [4.10.2 Shard Spec Mismatches](#4102-shard-spec-mismatches)
+ - [4.10.3 Ethernet Dispatch Cores](#4103-ethernet-dispatch-cores)
+ - [4.10.4 Hangs](#4104-hangs)
+ - [4.10.4.1 Tracing](#41041-tracing)
+ - [4.10.4.2 Large Matmuls](#41042-large-matmuls)
+
+## 1. Overview
+## 2. Modules
+### 2.1 Embedding
+### 2.2 RoPE
+ - Iterative update system (the per-position math it maintains is sketched below)
+ - When to use our fused op
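+
+As a reference for what the fused op computes (and what the iterative update system maintains per decode position), here is a minimal host-side torch sketch of rotate-half RoPE. It is illustrative only; the names and shapes are not the TT-NN API.
+
+```python
+import torch
+
+def rope_cos_sin(position: int, head_dim: int, theta: float = 10000.0):
+    # Standard rotary frequencies for a single position.
+    inv_freq = 1.0 / (theta ** (torch.arange(0, head_dim, 2).float() / head_dim))
+    angles = position * inv_freq                     # [head_dim // 2]
+    return torch.cos(angles), torch.sin(angles)
+
+def apply_rope(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor):
+    # x: [batch, n_heads, seq=1, head_dim], rotate-half convention.
+    d = x.shape[-1] // 2
+    x1, x2 = x[..., :d], x[..., d:]
+    return torch.cat([x1 * cos - x2 * sin, x2 * cos + x1 * sin], dim=-1)
+```
+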
+### 2.3 Norm
+ - Replicated layernorm vs. distributed layernorm (the distributed variant is sketched below)
+ - The trick of keeping layernorm/RMSNorm weights in row-major layout, wrapped around the tile size
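+
+A minimal torch sketch of the distributed variant, assuming the hidden dim is sharded across devices: each device reduces over its own shard, and only the scalar statistics need to be exchanged (on device that exchange is a CCL op). Illustrative only.
+
+```python
+import torch
+
+def distributed_rmsnorm(shards, weight_shards, eps: float = 1e-5):
+    # shards / weight_shards: one tensor per device, split along the hidden dim.
+    hidden = sum(s.shape[-1] for s in shards)
+    # Each device contributes a partial sum of squares over its shard...
+    partials = [s.pow(2).sum(dim=-1, keepdim=True) for s in shards]
+    # ...which an all-reduce would combine on device; here we just sum on host.
+    inv_rms = torch.rsqrt(torch.stack(partials).sum(dim=0) / hidden + eps)
+    return [s * inv_rms * w for s, w in zip(shards, weight_shards)]
+```
+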
+### 2.4 Attention
+ - Flash Attention and Flash Decode
+ - general description (a torch reference for the decode case is sketched below)
+ - limitations
+ - which dims are parallelized
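+
+For reference, this is the computation flash decode performs for one new token per user: a single query position attends over the populated part of the KV cache, so the work to parallelize lies along batch, heads, and the cached sequence length. Plain torch with illustrative shapes.
+
+```python
+import torch
+
+def decode_attention(q, k_cache, v_cache, cur_len: int):
+    # q:       [batch, n_heads, 1, head_dim]
+    # k_cache: [batch, n_heads, max_seq_len, head_dim]
+    # v_cache: [batch, n_heads, max_seq_len, head_dim]
+    scale = q.shape[-1] ** -0.5
+    k = k_cache[:, :, :cur_len, :]
+    v = v_cache[:, :, :cur_len, :]
+    scores = torch.matmul(q, k.transpose(-1, -2)) * scale  # [batch, n_heads, 1, cur_len]
+    return torch.matmul(torch.softmax(scores, dim=-1), v)  # [batch, n_heads, 1, head_dim]
+```
+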
+### 2.5 MLP
+### 2.6 Decoder
+### 2.7 LM Head
+## 3. Features
+### 3.1 Generative Decoding
+### 3.2 Prefill and Decode
+ - submodules, tests
+ - how to combine prefill and decode
+ - slicing prefill to fit in L1 (see the chunking sketch below)
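+
+A sketch of the chunking idea: run prefill in fixed-size slices so the activations of any single forward pass fit in L1, while the KV cache accumulates across slices. `model_prefill_forward` and `chunk_size` are hypothetical names, not TT-NN APIs.
+
+```python
+def chunked_prefill(model_prefill_forward, tokens, chunk_size: int = 2048):
+    last_logits = None
+    for start in range(0, len(tokens), chunk_size):
+        chunk = tokens[start:start + chunk_size]
+        # Each call consumes one slice and appends its K/V to the cache;
+        # attention inside the call still sees all previously cached positions.
+        last_logits = model_prefill_forward(chunk, start_pos=start)
+    return last_logits  # logits of the final slice feed the first decode step
+```
+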
+### 3.3 Multi-Device
+ - device mesh
+ - column parallel followed by row parallel (see the tensor-parallel MLP sketch below)
+ - sharding, CCL ops, reducing CCL overheads, etc.
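+
+The column-parallel-then-row-parallel layout is what keeps CCL traffic low in the MLP: each device participates in only one reduction per MLP, after the second matmul. A host-side torch sketch of the math (SiLU stands in for the model's activation; shard lists stand in for per-device tensors):
+
+```python
+import torch
+import torch.nn.functional as F
+
+def tensor_parallel_mlp(x, w1_column_shards, w2_row_shards):
+    # w1 is split along its output (column) dim, w2 along its input (row) dim.
+    partials = []
+    for w1_i, w2_i in zip(w1_column_shards, w2_row_shards):
+        h_i = F.silu(x @ w1_i)        # column-parallel: each device owns a slice of the hidden dim
+        partials.append(h_i @ w2_i)   # row-parallel: each device produces a partial of the full output
+    # On device this final sum is a single CCL reduction (e.g. all-reduce).
+    return torch.stack(partials).sum(dim=0)
+```
+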
+### 3.4 Continuous Batching
+ - quick intro and how it is implemented in the demos (a minimal scheduling loop is sketched below)
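+
+A minimal scheduling loop to illustrate the idea: whenever a user finishes, its batch slot is immediately refilled from the waiting queue, so decode never spends steps on finished sequences. All names here are illustrative, not demo APIs.
+
+```python
+from collections import deque
+
+def serve(prefill_fn, decode_fn, requests, max_batch_size: int):
+    waiting, active = deque(requests), {}
+    while waiting or active:
+        # Fill any free slots by prefilling waiting requests into them.
+        while waiting and len(active) < max_batch_size:
+            slot = next(i for i in range(max_batch_size) if i not in active)
+            active[slot] = prefill_fn(slot, waiting.popleft())
+        # One decode step over the whole batch, then retire finished users.
+        for slot in decode_fn(active):   # returns slots that hit EOS / max length
+            del active[slot]
+```
+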
+### 3.5 vLLM Integration
+ - Our vLLM repo and what's needed to integrate with it.
+## 4. Best Practices and Optimizations
+### 4.1 Tracing
+ - link to the existing tracing doc, and why tracing helps decode more than prefill (see the capture/replay sketch below)
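+
+A hedged capture/replay sketch, assuming the ttnn trace APIs (`ttnn.begin_trace_capture`, `ttnn.end_trace_capture`, `ttnn.execute_trace`, `ttnn.release_trace`); check the tracing doc for exact signatures. `decode_step` and `write_next_input` are illustrative callables that operate on persistent device tensors.
+
+```python
+import ttnn
+
+def run_decode_with_trace(device, decode_step, write_next_input, num_steps: int):
+    decode_step()  # warm-up: compile all kernels before capturing
+
+    trace_id = ttnn.begin_trace_capture(device, cq_id=0)
+    output = decode_step()  # ops issued here are recorded instead of re-dispatched later
+    ttnn.end_trace_capture(device, trace_id, cq_id=0)
+
+    for _ in range(num_steps):
+        write_next_input()  # update input data in place; tensor addresses must not change
+        ttnn.execute_trace(device, trace_id, cq_id=0, blocking=False)
+        # read `output` back to host here as needed
+
+    ttnn.release_trace(device, trace_id)
+```
+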
+### 4.2 Async Mode
+### 4.3 Multiple CQs
+ - how to feed output back to input and read output asynchronously
+### 4.4 Op Configs
+ - Writing correct program configs and shard specs
+ - Deciding how many cores to run an op on
+ - Why did we use 16 cores for MLP
+ - Which matmul to use when (an example program config is sketched below)
+ - 1d, 2d, dram-sharded, ...
+ - Implicitly padding weights in program config for matmuls
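+
+As one example of the kind of config this section will cover, here is a hedged sketch of a 1D matmul program config for a decode-mode projection spread over 16 cores. All blocking values are placeholders, and the field names should be verified against your tt-metal version.
+
+```python
+import ttnn
+
+def projection_decode(activations, weights):
+    program_config = ttnn.MatmulMultiCoreReuseMultiCast1DProgramConfig(
+        compute_with_storage_grid_size=(8, 2),  # 8 x 2 = 16 cores
+        in0_block_w=4,        # K tiles per block; must divide K (in tiles)
+        out_subblock_h=1,
+        out_subblock_w=4,     # out_subblock_h * out_subblock_w is limited by dst registers
+        per_core_M=1,         # decode: activations are a single tile row of users
+        per_core_N=4,
+        fuse_batch=True,
+        fused_activation=None,
+        mcast_in0=True,       # broadcast the small activation; keep weights resident per core
+    )
+    return ttnn.linear(activations, weights, program_config=program_config)
+```
+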
+### 4.5 Accuracy
+ - How we measure it (PCC, perplexity, top-1/top-5, end-user tests, benchmarking); a PCC helper is sketched below
+ - How much PCC is enough? Rules of thumb.
+ - Accuracy tests
+ - Debugging PCC issues
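+
+PCC is typically the first-line accuracy signal in module tests. A small torch helper (the usage names are illustrative):
+
+```python
+import torch
+
+def compute_pcc(expected: torch.Tensor, actual: torch.Tensor) -> float:
+    # Pearson correlation coefficient between flattened reference and device outputs.
+    stacked = torch.stack([expected.flatten().float(), actual.flatten().float()])
+    return torch.corrcoef(stacked)[0, 1].item()
+
+# In a module test (threshold is op/model dependent):
+#   assert compute_pcc(reference_output, ttnn.to_torch(tt_output)) > 0.99
+```
+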
+### 4.6 Performance Analysis
+ - Performance tooling, tracy
+### 4.7 Misc. Performance Optimizations
+ - Which dim to shard matmuls on
+ - DRAM-sharding
+ - Avoiding sharded to interleaved calls
+### 4.8 Module Tests
+### 4.9 Performance Testing
+### 4.10 Common Pitfalls
+#### 4.10.1 Error Messages
+ - Running out of L1
+ - Shard spec and program config mismatches
+ - Some TT-NN ops (e.g. ttnn.all_gather) do not support passing -1 as the dim argument
+ - You'll see an op-invocation error complaining that the arguments don't match (a workaround is sketched below)
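+
+A small sketch of the workaround for the `dim=-1` case: resolve the negative dim to an explicit positive one on the host before invoking the op (the all_gather signature here is simplified).
+
+```python
+import ttnn
+
+def all_gather_last_dim(tensor):
+    last_dim = len(tensor.shape) - 1  # use an explicit positive dim instead of -1
+    return ttnn.all_gather(tensor, dim=last_dim, num_links=1)
+```
+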
+#### 4.10.2 Shard Spec Mismatches
+#### 4.10.3 Ethernet Dispatch Cores
+ - link to any other description, and mention it is needed for N300 and T3K
+#### 4.10.4 Hangs
+##### 4.10.4.1 Tracing
+ - Host communications cause tracing to hang
+ - Running without async mode enabled causes tracing to hang
+ - Be careful with prints inside traced code
+##### 4.10.4.2 Large Matmuls
+ - Large matmuls hanging? Link to the appropriate ticket with the workaround
+ - The issue is being investigated; the current workaround is to set the output subblock to 1x1 and the grid size to 8x7 (see the sketch below)
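+
+A hedged sketch of that workaround applied to a 2D matmul program config; only the 8x7 grid and the 1x1 output subblock are the point of the example, the other blocking values are placeholders and the field names should be checked against your ttnn version.
+
+```python
+import ttnn
+
+workaround_config = ttnn.MatmulMultiCoreReuseMultiCastProgramConfig(
+    compute_with_storage_grid_size=(8, 7),
+    in0_block_w=2,      # placeholder
+    out_subblock_h=1,   # workaround: 1x1 output subblock
+    out_subblock_w=1,
+    per_core_M=8,       # placeholder; size to your shapes
+    per_core_N=4,       # placeholder
+    transpose_mcast=False,
+    fused_activation=None,
+)
+```
+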