diff --git a/latest/CODE_OF_CONDUCT.html b/latest/CODE_OF_CONDUCT.html
index 49b420108..64d878afe 100644
--- a/latest/CODE_OF_CONDUCT.html
+++ b/latest/CODE_OF_CONDUCT.html
@@ -4,7 +4,7 @@
- Contributor Covenant Code of Conduct — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ Contributor Covenant Code of Conduct — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/SECURITY.html b/latest/SECURITY.html
index 6c0bb02b7..7daa7cc66 100644
--- a/latest/SECURITY.html
+++ b/latest/SECURITY.html
@@ -4,7 +4,7 @@
- Security Policy — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ Security Policy — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/_sources/examples/README.md.txt b/latest/_sources/examples/README.md.txt
index b9e164af3..073468f7d 100644
--- a/latest/_sources/examples/README.md.txt
+++ b/latest/_sources/examples/README.md.txt
@@ -15,3 +15,4 @@ A wide variety of examples are provided to demonstrate the usage of Intel® Exte
|[Stable Diffusion Inference for Text2Image on Intel GPU](./stable_diffussion_inference)|Example for running Stable Diffusion Text2Image inference on Intel GPU with the optimizations from Intel® Extension for TensorFlow*.|GPU|
|[Accelerate ResNet50 Training by XPUAutoShard on Intel GPU](./train_resnet50_with_autoshard)|Example on running ResNet50 training on Intel GPU with the XPUAutoShard feature.|GPU|
|[Accelerate BERT-Large Pretraining on Intel GPU](./pretrain_bert)|Example on running BERT-Large pretraining on Intel GPU with the optimizations from Intel® Extension for TensorFlow*.|GPU|
+|[Accelerate 3D-UNet Training w/o horovod for medical image segmentation on Intel GPU](./train_3d_unet)|Example on running 3D-UNet training for medical image segmentation on Intel GPU with the optimizations from Intel® Extension for TensorFlow*.|GPU|
\ No newline at end of file
diff --git a/latest/_sources/examples/train_3d_unet/README.md.txt b/latest/_sources/examples/train_3d_unet/README.md.txt
new file mode 100644
index 000000000..c1f1b7298
--- /dev/null
+++ b/latest/_sources/examples/train_3d_unet/README.md.txt
@@ -0,0 +1,125 @@
+# Accelerate 3D-Unet Training w/o horovod for medical image segmentation on Intel GPU
+
+## Introduction
+
+Intel® Extension for TensorFlow* is compatible with stock TensorFlow*.
+This example shows 3D-UNet training for medical image segmentation. It contains single-tile training scripts and multi-tile training scripts that use Horovod.
+
+Install Intel® Extension for TensorFlow* in your existing TensorFlow environment, and TensorFlow will run the training on the Intel GPU.
+
+## Hardware Requirements
+
+Verified Hardware Platforms:
+
+ - Intel® Data Center GPU Max Series
+
+## Prerequisites
+
+### Model Code change
+
+For better performance, apply one of the provided patches to the official repository instead of using it unmodified, as shown below. Choose either the single-tile patch `3dunet_itex.patch` or the multi-tile patch `3dunet_itex_with_horovod.patch`.
+
+```
+git clone https://github.com/NVIDIA/DeepLearningExamples.git
+cd DeepLearningExamples/TensorFlow/Segmentation/UNet_3D_Medical/
+git checkout 88eb3cff2f03dad85035621d041e23a14345999e
+git apply patch # Use the chosen patch file here; move it into the 3D-UNet directory above before applying it.
+```
+
+### Prepare for GPU
+
+Refer to [Prepare](../common_guide_running.md#Prepare).
+
+### Setup Running Environment
+
+You can use `./pip_set_env.sh` to set up the environment for GPU. The script performs the following two steps: creating a virtual environment and installing the Python packages.
+
++ Create Virtual Environment
+
+```
+python -m venv env_itex
+source env_itex/bin/activate
+```
+
++ Install
+
+```
+pip install --upgrade pip
+pip install --upgrade intel-extension-for-tensorflow[gpu]
+pip install intel-optimization-for-horovod
+pip install tfa-nightly
+pip install git+https://github.com/NVIDIA/dllogger.git
+```
+
+### Enable Running Environment
+
+Enable the oneAPI running environment (only for GPU) and the virtual environment.
+
+ * For GPU, refer to [Running](../common_guide_running.md#Running)
+
+### Prepare Dataset
+
+We use the [Brain Tumor Segmentation 2019 dataset](https://www.med.upenn.edu/cbica/brats-2019/) for 3D-UNet training. Upon registration, the challenge's data is made available through the `https://ipp.cbica.upenn.edu` service.
+
+The training and test datasets are provided as 3D `nifti` volumes that can be read using the Nibabel library and NumPy. They can be converted from `nifti` to `tfrecord` using the `./dataset/preprocess_data.py` script.
+
+## Execute the Example
+
+Assume the current directory is `examples/train_3d_unet/DeepLearningExamples/TensorFlow/Segmentation/UNet_3D_Medical/`.
+
+Here we provide single-tile training scripts and multi-tile training scripts that use Horovod. The data type can be float32 or bfloat16.
+
+```
+DATASET_DIR=/the/path/to/dataset
+OUTPUT_DIR=/the/path/to/output_dir
+```
+
+### Single Tile
+
+First, apply the patch.
+
+```
+git apply 3dunet_itex.patch
+```
+
++ float32
+
+```
+python main.py --benchmark --data_dir $DATASET_DIR --model_dir $OUTPUT_DIR --exec_mode train --batch_size $BATCH_SIZE --warmup_steps 150 --max_steps 1000 --log_every 1
+```
+
++ bfloat16
+
+```
+python main.py --benchmark --data_dir $DATASET_DIR --model_dir $OUTPUT_DIR --exec_mode train --warmup_steps 150 --max_steps 1000 --batch_size=$BATCH_SIZE --log_every 1 --amp
+```
+
+### Multi-tile with horovod
+
+First, apply the patch.
+
+```
+git apply 3dunet_itex_with_horovod.patch
+```
+
++ float32
+
+```
+mpirun -np 2 -prepend-rank -ppn 2 \
+ python main.py --data_dir=$DATASET_DIR --benchmark --model_dir=$OUTPUT_DIR --exec_mode train --warmup_steps 150 --max_steps 1000 --batch_size=$BATCH_SIZE
+```
+
++ bfloat16
+
+```
+mpirun -np 2 -prepend-rank -ppn 2 \
+ python main.py --data_dir=$DATASET_DIR --benchmark --model_dir=$OUTPUT_DIR --exec_mode train --warmup_steps 150 --max_steps 1000 --batch_size=$BATCH_SIZE --amp
+```
+
+## FAQ
+
+1. If you get the following error log, refer to [Enable Running Environment](#Enable-Running-Environment) to enable the oneAPI running environment.
+
+```
+tensorflow.python.framework.errors_impl.NotFoundError: libmkl_sycl.so.2: cannot open shared object file: No such file or directory
+```
\ No newline at end of file
diff --git a/latest/_static/documentation_options.js b/latest/_static/documentation_options.js
index 61055d351..88f3944cc 100644
--- a/latest/_static/documentation_options.js
+++ b/latest/_static/documentation_options.js
@@ -1,6 +1,6 @@
var DOCUMENTATION_OPTIONS = {
URL_ROOT: document.getElementById("documentation_options").getAttribute('data-url_root'),
- VERSION: '0.1.dev1+g53bc0e2',
+ VERSION: '0.1.dev1+g1cf4e36',
LANGUAGE: 'en',
COLLAPSE_INDEX: false,
BUILDER: 'html',
diff --git a/latest/docker/README.html b/latest/docker/README.html
index 62dda4e1b..714b67a0c 100644
--- a/latest/docker/README.html
+++ b/latest/docker/README.html
@@ -4,7 +4,7 @@
- Intel® Extension for TensorFlow* Docker Container Guide — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ Intel® Extension for TensorFlow* Docker Container Guide — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/docker/tensorflow-serving/README.html b/latest/docker/tensorflow-serving/README.html
index 0e60e192b..33d165cf1 100644
--- a/latest/docker/tensorflow-serving/README.html
+++ b/latest/docker/tensorflow-serving/README.html
@@ -4,7 +4,7 @@
- Intel® Extension for TensorFlow* Serving - Docker Container Guide — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ Intel® Extension for TensorFlow* Serving - Docker Container Guide — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/docs/README.html b/latest/docs/README.html
index 10c47745a..84243c4a0 100644
--- a/latest/docs/README.html
+++ b/latest/docs/README.html
@@ -4,7 +4,7 @@
- Welcome to Intel® Extension for TensorFlow* documentation — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ Welcome to Intel® Extension for TensorFlow* documentation — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/docs/build_docs/docs_build_tips.html b/latest/docs/build_docs/docs_build_tips.html
index 56c120710..bc7ada793 100644
--- a/latest/docs/build_docs/docs_build_tips.html
+++ b/latest/docs/build_docs/docs_build_tips.html
@@ -4,7 +4,7 @@
- Online Documentation Build Guide — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ Online Documentation Build Guide — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/docs/build_docs/source/index.html b/latest/docs/build_docs/source/index.html
index 52c8a1eae..14fe16fdc 100644
--- a/latest/docs/build_docs/source/index.html
+++ b/latest/docs/build_docs/source/index.html
@@ -4,7 +4,7 @@
- Welcome to Intel ® Extension for TensorFlow* documentation! — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ Welcome to Intel ® Extension for TensorFlow* documentation! — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/docs/community/contributing.html b/latest/docs/community/contributing.html
index 520210c05..4ec184146 100644
--- a/latest/docs/community/contributing.html
+++ b/latest/docs/community/contributing.html
@@ -4,7 +4,7 @@
- Contributing guidelines — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ Contributing guidelines — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/docs/community/releases.html b/latest/docs/community/releases.html
index 74622d4c0..d971f02ac 100644
--- a/latest/docs/community/releases.html
+++ b/latest/docs/community/releases.html
@@ -4,7 +4,7 @@
- Releases — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ Releases — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/docs/design/directory_structure.html b/latest/docs/design/directory_structure.html
index e6e93f0f2..99e5b907d 100644
--- a/latest/docs/design/directory_structure.html
+++ b/latest/docs/design/directory_structure.html
@@ -4,7 +4,7 @@
- Directory Tree Structure — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ Directory Tree Structure — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/docs/design/extension_design.html b/latest/docs/design/extension_design.html
index aa6f25092..0f987d941 100644
--- a/latest/docs/design/extension_design.html
+++ b/latest/docs/design/extension_design.html
@@ -4,7 +4,7 @@
- Extension Design — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ Extension Design — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/docs/design/how_to_write_custom_op.html b/latest/docs/design/how_to_write_custom_op.html
index 15b6ce17b..27cb59e2a 100644
--- a/latest/docs/design/how_to_write_custom_op.html
+++ b/latest/docs/design/how_to_write_custom_op.html
@@ -4,7 +4,7 @@
- How to write custom op — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ How to write custom op — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/docs/design/optimization/README.html b/latest/docs/design/optimization/README.html
index 160b09b20..5df51fe16 100644
--- a/latest/docs/design/optimization/README.html
+++ b/latest/docs/design/optimization/README.html
@@ -4,7 +4,7 @@
- Optimizations Design — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ Optimizations Design — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/docs/design/optimization/oneDNN_object_cache.html b/latest/docs/design/optimization/oneDNN_object_cache.html
index b9d7f1d8e..5081b0297 100644
--- a/latest/docs/design/optimization/oneDNN_object_cache.html
+++ b/latest/docs/design/optimization/oneDNN_object_cache.html
@@ -4,7 +4,7 @@
- oneDNN object cache optimization — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ oneDNN object cache optimization — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/docs/guide/FAQ.html b/latest/docs/guide/FAQ.html
index 1bbf3d3cc..1065443a6 100644
--- a/latest/docs/guide/FAQ.html
+++ b/latest/docs/guide/FAQ.html
@@ -4,7 +4,7 @@
- Frequently Asked Questions — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ Frequently Asked Questions — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/docs/guide/INT8_quantization.html b/latest/docs/guide/INT8_quantization.html
index 9b5dceab1..5f3e263ef 100644
--- a/latest/docs/guide/INT8_quantization.html
+++ b/latest/docs/guide/INT8_quantization.html
@@ -4,7 +4,7 @@
- INT8 Quantization — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ INT8 Quantization — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/docs/guide/OpenXLA_Support_on_GPU.html b/latest/docs/guide/OpenXLA_Support_on_GPU.html
index 811c7af32..55b04a806 100644
--- a/latest/docs/guide/OpenXLA_Support_on_GPU.html
+++ b/latest/docs/guide/OpenXLA_Support_on_GPU.html
@@ -4,7 +4,7 @@
- OpenXLA Support on GPU via PJRT — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ OpenXLA Support on GPU via PJRT — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/docs/guide/XPUAutoShard.html b/latest/docs/guide/XPUAutoShard.html
index 259e414bc..40128cf65 100644
--- a/latest/docs/guide/XPUAutoShard.html
+++ b/latest/docs/guide/XPUAutoShard.html
@@ -4,7 +4,7 @@
- XPUAutoShard on GPU [Experimental] — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ XPUAutoShard on GPU [Experimental] — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/docs/guide/aamp_tune.html b/latest/docs/guide/aamp_tune.html
index 3d8977894..7a957d7cf 100644
--- a/latest/docs/guide/aamp_tune.html
+++ b/latest/docs/guide/aamp_tune.html
@@ -4,7 +4,7 @@
- Tune Advanced Auto Mixed Precision — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ Tune Advanced Auto Mixed Precision — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/docs/guide/advanced_auto_mixed_precision.html b/latest/docs/guide/advanced_auto_mixed_precision.html
index 6c30370d3..d26849028 100644
--- a/latest/docs/guide/advanced_auto_mixed_precision.html
+++ b/latest/docs/guide/advanced_auto_mixed_precision.html
@@ -4,7 +4,7 @@
- Advanced Auto Mixed Precision — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ Advanced Auto Mixed Precision — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/docs/guide/environment_variables.html b/latest/docs/guide/environment_variables.html
index d7fe62f18..04a0578b8 100644
--- a/latest/docs/guide/environment_variables.html
+++ b/latest/docs/guide/environment_variables.html
@@ -4,7 +4,7 @@
- Environment Variables — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ Environment Variables — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/docs/guide/features.html b/latest/docs/guide/features.html
index ede805a66..b070a65d0 100644
--- a/latest/docs/guide/features.html
+++ b/latest/docs/guide/features.html
@@ -4,7 +4,7 @@
- Features — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ Features — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/docs/guide/how_to_enable_profiler.html b/latest/docs/guide/how_to_enable_profiler.html
index f63c0da5b..52c9eb7fa 100644
--- a/latest/docs/guide/how_to_enable_profiler.html
+++ b/latest/docs/guide/how_to_enable_profiler.html
@@ -4,7 +4,7 @@
- GPU Profiler — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ GPU Profiler — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/docs/guide/infrastructure.html b/latest/docs/guide/infrastructure.html
index c64eeeec6..009569782 100644
--- a/latest/docs/guide/infrastructure.html
+++ b/latest/docs/guide/infrastructure.html
@@ -4,7 +4,7 @@
- Infrastructure — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ Infrastructure — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/docs/guide/itex_fusion.html b/latest/docs/guide/itex_fusion.html
index 717bde53e..09adff846 100644
--- a/latest/docs/guide/itex_fusion.html
+++ b/latest/docs/guide/itex_fusion.html
@@ -4,7 +4,7 @@
- Graph fusion — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ Graph fusion — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/docs/guide/itex_ops.html b/latest/docs/guide/itex_ops.html
index 01a2672da..f86e24c9e 100644
--- a/latest/docs/guide/itex_ops.html
+++ b/latest/docs/guide/itex_ops.html
@@ -4,7 +4,7 @@
- Customized Operators — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ Customized Operators — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/docs/guide/itex_ops_override.html b/latest/docs/guide/itex_ops_override.html
index e90c6dcda..932e2d08c 100644
--- a/latest/docs/guide/itex_ops_override.html
+++ b/latest/docs/guide/itex_ops_override.html
@@ -4,7 +4,7 @@
- Operators Override — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ Operators Override — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/docs/guide/keras_mixed_precision.html b/latest/docs/guide/keras_mixed_precision.html
index cf010a8fd..9eb16cd81 100644
--- a/latest/docs/guide/keras_mixed_precision.html
+++ b/latest/docs/guide/keras_mixed_precision.html
@@ -4,7 +4,7 @@
- Keras Mixed Precision — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ Keras Mixed Precision — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/docs/guide/launch.html b/latest/docs/guide/launch.html
index 248b874f4..ff5cb13bd 100644
--- a/latest/docs/guide/launch.html
+++ b/latest/docs/guide/launch.html
@@ -4,7 +4,7 @@
- Launch Script User Guide — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ Launch Script User Guide — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/docs/guide/practice_guide.html b/latest/docs/guide/practice_guide.html
index fd2c6afe4..023bef03c 100644
--- a/latest/docs/guide/practice_guide.html
+++ b/latest/docs/guide/practice_guide.html
@@ -4,7 +4,7 @@
- Practice Guide — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ Practice Guide — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/docs/guide/python_api.html b/latest/docs/guide/python_api.html
index 5f79396d5..50f299527 100644
--- a/latest/docs/guide/python_api.html
+++ b/latest/docs/guide/python_api.html
@@ -4,7 +4,7 @@
- Python APIs — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ Python APIs — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/docs/guide/tf_serving_install.html b/latest/docs/guide/tf_serving_install.html
index 233c4b5de..96c03f556 100644
--- a/latest/docs/guide/tf_serving_install.html
+++ b/latest/docs/guide/tf_serving_install.html
@@ -4,7 +4,7 @@
- Install TensorFlow Serving with Intel® Extension for TensorFlow* — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ Install TensorFlow Serving with Intel® Extension for TensorFlow* — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/docs/install/experimental/install_for_arc_gpu.html b/latest/docs/install/experimental/install_for_arc_gpu.html
index bccd2a9a8..75df89cbd 100644
--- a/latest/docs/install/experimental/install_for_arc_gpu.html
+++ b/latest/docs/install/experimental/install_for_arc_gpu.html
@@ -4,7 +4,7 @@
- Experimental: Intel® Arc™ A-Series GPU Software Installation — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ Experimental: Intel® Arc™ A-Series GPU Software Installation — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/docs/install/experimental/install_for_gpu_conda.html b/latest/docs/install/experimental/install_for_gpu_conda.html
index 12fa00d89..0992817fd 100644
--- a/latest/docs/install/experimental/install_for_gpu_conda.html
+++ b/latest/docs/install/experimental/install_for_gpu_conda.html
@@ -4,7 +4,7 @@
- Conda Environment Installation Instructions — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ Conda Environment Installation Instructions — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/docs/install/how_to_build.html b/latest/docs/install/how_to_build.html
index 3571b94c6..acce435c2 100644
--- a/latest/docs/install/how_to_build.html
+++ b/latest/docs/install/how_to_build.html
@@ -4,7 +4,7 @@
- Overview — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ Overview — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/docs/install/install_for_cpp.html b/latest/docs/install/install_for_cpp.html
index 953d3b3f9..891b47ea1 100644
--- a/latest/docs/install/install_for_cpp.html
+++ b/latest/docs/install/install_for_cpp.html
@@ -4,7 +4,7 @@
- Intel® Extension for TensorFlow* for C++ — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ Intel® Extension for TensorFlow* for C++ — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/docs/install/install_for_cpu.html b/latest/docs/install/install_for_cpu.html
index d77c1be58..f5e6cc01b 100644
--- a/latest/docs/install/install_for_cpu.html
+++ b/latest/docs/install/install_for_cpu.html
@@ -4,7 +4,7 @@
- Intel CPU Software Installation — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ Intel CPU Software Installation — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/docs/install/install_for_xpu.html b/latest/docs/install/install_for_xpu.html
index 5a79f9a56..4508b7ec8 100644
--- a/latest/docs/install/install_for_xpu.html
+++ b/latest/docs/install/install_for_xpu.html
@@ -4,7 +4,7 @@
- Intel XPU Software Installation — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ Intel XPU Software Installation — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/docs/install/installation_guide.html b/latest/docs/install/installation_guide.html
index c5501637c..b24c198c1 100644
--- a/latest/docs/install/installation_guide.html
+++ b/latest/docs/install/installation_guide.html
@@ -4,7 +4,7 @@
- Installation Guide — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ Installation Guide — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/examples/README.html b/latest/examples/README.html
index d5f505bcb..3b7d6d472 100644
--- a/latest/examples/README.html
+++ b/latest/examples/README.html
@@ -4,7 +4,7 @@
- Examples — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ Examples — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
@@ -150,6 +150,11 @@ Examples
+Example on running 3D-UNet training for medical image segmentation on Intel GPU with the optimizations from Intel® Extension for TensorFlow*.
+GPU
+
diff --git a/latest/examples/accelerate_alexnet_by_quantization/README.html b/latest/examples/accelerate_alexnet_by_quantization/README.html
index 9d84921c2..08d3ca9c1 100644
--- a/latest/examples/accelerate_alexnet_by_quantization/README.html
+++ b/latest/examples/accelerate_alexnet_by_quantization/README.html
@@ -4,7 +4,7 @@
- Accelerate AlexNet by Quantization with Intel® Extension for Tensorflow* — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ Accelerate AlexNet by Quantization with Intel® Extension for Tensorflow* — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/examples/common_guide_running.html b/latest/examples/common_guide_running.html
index caac6493d..32ce82bd2 100644
--- a/latest/examples/common_guide_running.html
+++ b/latest/examples/common_guide_running.html
@@ -4,7 +4,7 @@
- Common Guide for Running — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ Common Guide for Running — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/examples/examples.html b/latest/examples/examples.html
index f8a1df144..42d12b215 100644
--- a/latest/examples/examples.html
+++ b/latest/examples/examples.html
@@ -4,7 +4,7 @@
- Examples — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ Examples — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/examples/infer_inception_v4_amp/README.html b/latest/examples/infer_inception_v4_amp/README.html
index 0d0a01ce6..6f1ba2a9c 100644
--- a/latest/examples/infer_inception_v4_amp/README.html
+++ b/latest/examples/infer_inception_v4_amp/README.html
@@ -4,7 +4,7 @@
- Speed up Inference of Inception v4 by Advanced Automatic Mixed Precision on Intel CPU and GPU via Docker Container or Bare Metal — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ Speed up Inference of Inception v4 by Advanced Automatic Mixed Precision on Intel CPU and GPU via Docker Container or Bare Metal — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/examples/infer_resnet50/README.html b/latest/examples/infer_resnet50/README.html
index ca07f6591..953aaf3e5 100644
--- a/latest/examples/infer_resnet50/README.html
+++ b/latest/examples/infer_resnet50/README.html
@@ -4,7 +4,7 @@
- ResNet50 Inference on Intel CPU and GPU — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ ResNet50 Inference on Intel CPU and GPU — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/examples/model_zoo_example/README.html b/latest/examples/model_zoo_example/README.html
index 560452b4b..4f0778fe9 100644
--- a/latest/examples/model_zoo_example/README.html
+++ b/latest/examples/model_zoo_example/README.html
@@ -4,7 +4,7 @@
- Accelerate Deep Learning Training and Inference for Model Zoo Workloads on Intel GPU — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ Accelerate Deep Learning Training and Inference for Model Zoo Workloads on Intel GPU — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/examples/pretrain_bert/README.html b/latest/examples/pretrain_bert/README.html
index abbdc5be5..a00365035 100644
--- a/latest/examples/pretrain_bert/README.html
+++ b/latest/examples/pretrain_bert/README.html
@@ -4,7 +4,7 @@
- Accelerate BERT-Large Pretraining on Intel GPU — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ Accelerate BERT-Large Pretraining on Intel GPU — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/examples/quantize_inception_v3/README.html b/latest/examples/quantize_inception_v3/README.html
index 2d765c7f3..62e64caa1 100644
--- a/latest/examples/quantize_inception_v3/README.html
+++ b/latest/examples/quantize_inception_v3/README.html
@@ -4,7 +4,7 @@
- Quantize Inception V3 by Intel® Extension for Tensorflow* on Intel® Xeon® — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ Quantize Inception V3 by Intel® Extension for Tensorflow* on Intel® Xeon® — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/examples/quick_example.html b/latest/examples/quick_example.html
index 7f7dd05f4..745deddef 100644
--- a/latest/examples/quick_example.html
+++ b/latest/examples/quick_example.html
@@ -4,7 +4,7 @@
- Quick Example on Intel CPU and GPU — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ Quick Example on Intel CPU and GPU — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/examples/stable_diffussion_inference/README.html b/latest/examples/stable_diffussion_inference/README.html
index 069de8ef5..0457886ba 100644
--- a/latest/examples/stable_diffussion_inference/README.html
+++ b/latest/examples/stable_diffussion_inference/README.html
@@ -4,7 +4,7 @@
- Stable Diffusion Inference for Text2Image on Intel GPU — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ Stable Diffusion Inference for Text2Image on Intel GPU — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/examples/train_3d_unet/README.html b/latest/examples/train_3d_unet/README.html
new file mode 100644
index 000000000..d893d1c44
--- /dev/null
+++ b/latest/examples/train_3d_unet/README.html
@@ -0,0 +1,239 @@
+
+
+
+
+
+
+ Accelerate 3D-Unet Training w/o horovod for medical image segmentation on Intel GPU — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ Intel® Extension for TensorFlow*
+
+
+
+
+
+
+
+ Accelerate 3D-Unet Training w/o horovod for medical image segmentation on Intel GPU
+
+ View page source
+
+
+
+
+
+
+
+
+Accelerate 3D-Unet Training w/o horovod for medical image segmentation on Intel GPU
+
+Introduction
+Intel® Extension for TensorFlow* is compatible with stock TensorFlow*.
+This example shows 3D-UNet training for medical image segmentation. It contains single-tile training scripts and multi-tile training scripts that use Horovod.
+Install Intel® Extension for TensorFlow* in your existing TensorFlow environment, and TensorFlow will run the training on the Intel GPU.
+
+
+Hardware Requirements
+Verified Hardware Platforms:
+Intel® Data Center GPU Max Series
+
+
+Prerequisites
+
+Model Code change
+For better performance, apply one of the provided patches to the official repository instead of using it unmodified, as shown below. Choose either the single-tile patch 3dunet_itex.patch or the multi-tile patch 3dunet_itex_with_horovod.patch.
+git clone https://github.com/NVIDIA/DeepLearningExamples.git
+cd DeepLearningExamples/TensorFlow/Segmentation/UNet_3D_Medical/
+git checkout 88eb3cff2f03dad85035621d041e23a14345999e
+git apply patch # Use the chosen patch file here; move it into the 3D-UNet directory above before applying it.
+
+
+
+
+Prepare for GPU
+Refer to Prepare.
+
+
+Setup Running Environment
+You can use ./pip_set_env.sh to set up the environment for GPU. The script performs the following two steps: creating a virtual environment and installing the Python packages.
+
+python -m venv env_itex
+source env_itex/bin/activate
+
+
+
+pip install --upgrade pip
+pip install --upgrade intel-extension-for-tensorflow[gpu]
+pip install intel-optimization-for-horovod
+pip install tfa-nightly
+pip install git+https://github.com/NVIDIA/dllogger.git
+
+
+
+
+Enable Running Environment
+Enable the oneAPI running environment (only for GPU) and the virtual environment.
+For GPU, refer to Running
+
+
+Prepare Dataset
+We use the Brain Tumor Segmentation 2019 dataset for 3D-UNet training. Upon registration, the challenge’s data is made available through the https://ipp.cbica.upenn.edu service.
+The training and test datasets are provided as 3D nifti volumes that can be read using the Nibabel library and NumPy. They can be converted from nifti to tfrecord using the ./dataset/preprocess_data.py script.
+
+
+
+Execute the Example
+Assume the current directory is examples/train_3d_unet/DeepLearningExamples/TensorFlow/Segmentation/UNet_3D_Medical/.
+Here we provide single-tile training scripts and multi-tile training scripts that use Horovod. The data type can be float32 or bfloat16.
+DATASET_DIR=/the/path/to/dataset
+OUTPUT_DIR=/the/path/to/output_dir
+
+
+
+Single Tile
+First, apply the patch.
+git apply 3dunet_itex.patch
+
+
+float32
+ python main.py --benchmark --data_dir $DATASET_DIR --model_dir $OUTPUT_DIR --exec_mode train --batch_size $BATCH_SIZE --warmup_steps 150 --max_steps 1000 --log_every 1
+
+
+bfloat16
+ python main.py --benchmark --data_dir $DATASET_DIR --model_dir $OUTPUT_DIR --exec_mode train --warmup_steps 150 --max_steps 1000 --batch_size=$BATCH_SIZE --log_every 1 --amp
+
+
+
+
+Multi-tile with horovod
+First apply patch.
+git apply 3dunet_itex_with_horovod.patch
+
+
+
+ mpirun -np 2 -prepend-rank -ppn 2 \
+ python main.py --data_dir=$DATASET_DIR --benchmark --model_dir=$OUTPUT_DIR --exec_mode train --warmup_steps 150 --max_steps 1000 --batch_size=$BATCH_SIZE
+
+
+
+ mpirun -np 2 -prepend-rank -ppn 2 \
+ python main.py --data_dir=$DATASET_DIR --benchmark --model_dir=$OUTPUT_DIR --exec_mode train --warmup_steps 150 --max_steps 1000 --batch_size=$BATCH_SIZE --amp
+
+
+
+
+
+FAQ
+
+If you see the following error, refer to Enable Running Environment to enable the oneAPI running environment.
+
+tensorflow.python.framework.errors_impl.NotFoundError: libmkl_sycl.so.2: cannot open shared object file: No such file or directory
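One way to confirm the cause before re-enabling the environment is to look for the library in the directories listed in `LD_LIBRARY_PATH`. This is a sketch with a hypothetical helper name, `find_in_ld_library_path`. It covers only `LD_LIBRARY_PATH`, not the other locations the dynamic loader may search.

```shell
# Hypothetical helper: print the first LD_LIBRARY_PATH directory that contains
# the given library file; return non-zero if no directory contains it.
find_in_ld_library_path() {
    lib_name="$1"
    old_ifs="$IFS"
    IFS=':'
    for dir in ${LD_LIBRARY_PATH:-}; do
        if [ -n "$dir" ] && [ -e "$dir/$lib_name" ]; then
            IFS="$old_ifs"
            echo "$dir"
            return 0
        fi
    done
    IFS="$old_ifs"
    return 1
}

if ! find_in_ld_library_path "libmkl_sycl.so.2"; then
    echo "libmkl_sycl.so.2 is not on LD_LIBRARY_PATH; enable the oneAPI environment first" >&2
fi
```

If no directory is printed, the oneAPI environment is probably not active in the current shell.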
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/latest/examples/train_bert/README.html b/latest/examples/train_bert/README.html
index 2b54f44ce..a08c9807e 100644
--- a/latest/examples/train_bert/README.html
+++ b/latest/examples/train_bert/README.html
@@ -4,7 +4,7 @@
- BERT Training for Classifying Text on Intel CPU and GPU — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ BERT Training for Classifying Text on Intel CPU and GPU — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/examples/train_bert_fp8/README.html b/latest/examples/train_bert_fp8/README.html
index 17d97e9a9..f851aec7d 100644
--- a/latest/examples/train_bert_fp8/README.html
+++ b/latest/examples/train_bert_fp8/README.html
@@ -4,7 +4,7 @@
- FP8 BERT-Large Fine-tuning for Classifying Text on Intel GPU — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ FP8 BERT-Large Fine-tuning for Classifying Text on Intel GPU — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/examples/train_horovod/mnist/README.html b/latest/examples/train_horovod/mnist/README.html
index a5a12cbe9..7c9c986fd 100644
--- a/latest/examples/train_horovod/mnist/README.html
+++ b/latest/examples/train_horovod/mnist/README.html
@@ -4,7 +4,7 @@
- Distributed Training Example with Intel® Optimization for Horovod* on Intel® GPU — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ Distributed Training Example with Intel® Optimization for Horovod* on Intel® GPU — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/examples/train_horovod/resnet50/README.html b/latest/examples/train_horovod/resnet50/README.html
index 1d71fc7d4..7b43cc19d 100644
--- a/latest/examples/train_horovod/resnet50/README.html
+++ b/latest/examples/train_horovod/resnet50/README.html
@@ -4,7 +4,7 @@
- Distributed Training Example with Intel® Optimization for Horovod* — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ Distributed Training Example with Intel® Optimization for Horovod* — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/examples/train_resnet50_with_autoshard/README.html b/latest/examples/train_resnet50_with_autoshard/README.html
index 1df102bb8..732fd6e23 100644
--- a/latest/examples/train_resnet50_with_autoshard/README.html
+++ b/latest/examples/train_resnet50_with_autoshard/README.html
@@ -4,7 +4,7 @@
- Accelerate ResNet50 Training by XPUAutoShard on Intel GPU — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ Accelerate ResNet50 Training by XPUAutoShard on Intel GPU — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/genindex.html b/latest/genindex.html
index b7a3f77e5..cf8271e49 100644
--- a/latest/genindex.html
+++ b/latest/genindex.html
@@ -3,7 +3,7 @@
- Index — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ Index — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/get_started.html b/latest/get_started.html
index 8c06b85c3..01db23b8d 100644
--- a/latest/get_started.html
+++ b/latest/get_started.html
@@ -4,7 +4,7 @@
- Quick Get Started* — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ Quick Get Started* — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/objects.inv b/latest/objects.inv
index 66dd8e89c..fd9f6fee1 100644
Binary files a/latest/objects.inv and b/latest/objects.inv differ
diff --git a/latest/search.html b/latest/search.html
index f96068f3a..ff5eba567 100644
--- a/latest/search.html
+++ b/latest/search.html
@@ -3,7 +3,7 @@
- Search — Intel® Extension for TensorFlow* 0.1.dev1+g53bc0e2 documentation
+ Search — Intel® Extension for TensorFlow* 0.1.dev1+g1cf4e36 documentation
diff --git a/latest/searchindex.js b/latest/searchindex.js
index ab414f0f9..0f5648267 100644
--- a/latest/searchindex.js
+++ b/latest/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"docnames": ["CODE_OF_CONDUCT", "SECURITY", "docker/README", "docker/tensorflow-serving/README", "docs/README", "docs/build_docs/docs_build_tips", "docs/build_docs/source/index", "docs/community/contributing", "docs/community/releases", "docs/design/directory_structure", "docs/design/extension_design", "docs/design/how_to_write_custom_op", "docs/design/optimization/README", "docs/design/optimization/oneDNN_object_cache", "docs/guide/FAQ", "docs/guide/INT8_quantization", "docs/guide/OpenXLA_Support_on_GPU", "docs/guide/XPUAutoShard", "docs/guide/aamp_tune", "docs/guide/advanced_auto_mixed_precision", "docs/guide/environment_variables", "docs/guide/features", "docs/guide/how_to_enable_profiler", "docs/guide/infrastructure", "docs/guide/itex_fusion", "docs/guide/itex_ops", "docs/guide/itex_ops_override", "docs/guide/keras_mixed_precision", "docs/guide/launch", "docs/guide/practice_guide", "docs/guide/python_api", "docs/guide/tf_serving_install", "docs/install/experimental/install_for_arc_gpu", "docs/install/experimental/install_for_gpu_conda", "docs/install/how_to_build", "docs/install/install_for_cpp", "docs/install/install_for_cpu", "docs/install/install_for_xpu", "docs/install/installation_guide", "examples/README", "examples/accelerate_alexnet_by_quantization/README", "examples/common_guide_running", "examples/examples", "examples/infer_inception_v4_amp/README", "examples/infer_resnet50/README", "examples/model_zoo_example/README", "examples/pretrain_bert/README", "examples/quantize_inception_v3/README", "examples/quick_example", "examples/stable_diffussion_inference/README", "examples/train_bert/README", "examples/train_bert_fp8/README", "examples/train_horovod/mnist/README", "examples/train_horovod/resnet50/README", "examples/train_resnet50_with_autoshard/README", "get_started", "index"], "filenames": ["CODE_OF_CONDUCT.md", "SECURITY.md", "docker/README.md", "docker/tensorflow-serving/README.md", "docs/README.md", 
"docs/build_docs/docs_build_tips.md", "docs/build_docs/source/index.rst", "docs/community/contributing.md", "docs/community/releases.md", "docs/design/directory_structure.md", "docs/design/extension_design.md", "docs/design/how_to_write_custom_op.md", "docs/design/optimization/README.md", "docs/design/optimization/oneDNN_object_cache.md", "docs/guide/FAQ.md", "docs/guide/INT8_quantization.md", "docs/guide/OpenXLA_Support_on_GPU.md", "docs/guide/XPUAutoShard.md", "docs/guide/aamp_tune.md", "docs/guide/advanced_auto_mixed_precision.md", "docs/guide/environment_variables.md", "docs/guide/features.rst", "docs/guide/how_to_enable_profiler.md", "docs/guide/infrastructure.md", "docs/guide/itex_fusion.md", "docs/guide/itex_ops.md", "docs/guide/itex_ops_override.md", "docs/guide/keras_mixed_precision.md", "docs/guide/launch.md", "docs/guide/practice_guide.md", "docs/guide/python_api.md", "docs/guide/tf_serving_install.md", "docs/install/experimental/install_for_arc_gpu.md", "docs/install/experimental/install_for_gpu_conda.md", "docs/install/how_to_build.md", "docs/install/install_for_cpp.md", "docs/install/install_for_cpu.md", "docs/install/install_for_xpu.md", "docs/install/installation_guide.rst", "examples/README.md", "examples/accelerate_alexnet_by_quantization/README.md", "examples/common_guide_running.md", "examples/examples.md", "examples/infer_inception_v4_amp/README.md", "examples/infer_resnet50/README.md", "examples/model_zoo_example/README.md", "examples/pretrain_bert/README.md", "examples/quantize_inception_v3/README.md", "examples/quick_example.md", "examples/stable_diffussion_inference/README.md", "examples/train_bert/README.md", "examples/train_bert_fp8/README.md", "examples/train_horovod/mnist/README.md", "examples/train_horovod/resnet50/README.md", "examples/train_resnet50_with_autoshard/README.md", "get_started.md", "index.rst"], "titles": ["Contributor Covenant Code of Conduct", "Security Policy", "Intel\u00ae Extension for TensorFlow* Docker Container 
Guide", "Intel\u00ae Extension for TensorFlow* Serving - Docker Container Guide", "Welcome to Intel\u00ae Extension for TensorFlow* documentation", "Online Documentation Build Guide", "Welcome to Intel \u00ae Extension for TensorFlow* documentation!", "Contributing guidelines", "Releases", "Directory Tree Structure", "Extension Design", "How to write custom op", "Optimizations Design", "oneDNN object cache optimization", "Frequently Asked Questions", "INT8 Quantization", "OpenXLA Support on GPU via PJRT", "XPUAutoShard on GPU [Experimental]", "Tune Advanced Auto Mixed Precision", "Advanced Auto Mixed Precision", "Environment Variables", "Features", "GPU Profiler", "Infrastructure", "Graph fusion", "Customized Operators", "Operators Override", "Keras Mixed Precision", "Launch Script User Guide", "Practice Guide", "Python APIs", "Install TensorFlow Serving with Intel\u00ae Extension for TensorFlow*", "Experimental: Intel\u00ae Arc\u2122 A-Series GPU Software Installation", "Conda Environment Installation Instructions", "Overview", "Intel\u00ae Extension for TensorFlow* for C++", "Intel CPU Software Installation", "Intel XPU Software Installation", "Installation Guide", "Examples", "Accelerate AlexNet by Quantization with Intel\u00ae Extension for Tensorflow*", "Common Guide for Running", "Examples", "Speed up Inference of Inception v4 by Advanced Automatic Mixed Precision on Intel CPU and GPU via Docker Container or Bare Metal", "ResNet50 Inference on Intel CPU and GPU", "Accelerate Deep Learning Training and Inference for Model Zoo Workloads on Intel GPU", "Accelerate BERT-Large Pretraining on Intel GPU", "Quantize Inception V3 by Intel\u00ae Extension for Tensorflow* on Intel\u00ae Xeon\u00ae", "Quick Example on Intel CPU and GPU", "Stable Diffusion Inference for Text2Image on Intel GPU", "BERT Training for Classifying Text on Intel CPU and GPU", "FP8 BERT-Large Fine-tuning for Classifying Text on Intel GPU", "Distributed Training Example with Intel\u00ae 
Optimization for Horovod* on Intel\u00ae GPU", "Distributed Training Example with Intel\u00ae Optimization for Horovod*", "Accelerate ResNet50 Training by XPUAutoShard on Intel GPU", "Quick Get Started*", "Welcome to Intel \u00ae Extension for TensorFlow* documentation!"], "terms": {"we": [0, 2, 7, 11, 16, 24, 27, 29, 30, 31, 33, 34, 35, 40, 41, 43, 46, 47, 49, 51, 55], "member": [0, 30], "leader": 0, "make": [0, 2, 3, 5, 7, 11, 14, 16, 18, 19, 27, 29, 34, 35, 43], "particip": 0, "commun": [0, 2, 7, 9, 21, 23, 29, 37, 55], "harass": 0, "free": [0, 21, 28], "experi": [0, 4, 21, 23, 29], "everyon": 0, "regardless": 0, "ag": 0, "bodi": 0, "size": [0, 20, 25, 27, 28, 52, 54], "visibl": [0, 2, 11, 31, 53], "invis": 0, "disabl": [0, 15, 19, 28, 29, 30], "ethnic": 0, "sex": 0, "characterist": 0, "gender": 0, "ident": [0, 27], "express": 0, "level": [0, 14, 16, 17, 23, 24, 27, 32], "educ": [0, 53], "socio": 0, "econom": 0, "statu": [0, 11, 19, 35], "nation": 0, "person": 0, "appear": [0, 27], "race": 0, "cast": [0, 18, 24, 27], "color": 0, "religion": 0, "sexual": 0, "orient": 0, "act": [0, 21, 31], "interact": [0, 34], "wai": [0, 14, 19, 27, 31, 33], "contribut": [0, 4, 21, 28, 34], "an": [0, 2, 3, 7, 11, 13, 14, 18, 19, 21, 24, 25, 27, 28, 29, 31, 33, 34, 35, 37, 39, 42, 47, 48, 51, 54, 55], "open": [0, 5, 7, 14, 18, 21, 31, 32, 43, 44, 46, 47, 49, 50, 51, 55], "welcom": [0, 7, 55], "divers": 0, "inclus": 0, "healthi": 0, "exampl": [0, 2, 4, 5, 7, 9, 11, 15, 20, 21, 24, 25, 26, 27, 29, 30, 31, 33, 40, 43, 45, 47, 50, 55], "behavior": [0, 27, 28, 29], "posit": [0, 7], "environ": [0, 4, 11, 13, 15, 19, 21, 22, 23, 27, 29, 31, 35, 38, 39, 42, 53, 55], "includ": [0, 7, 13, 14, 16, 17, 18, 20, 23, 35, 37, 47, 48, 53, 55], "demonstr": [0, 16, 39, 42], "empathi": 0, "kind": [0, 4, 21, 48], "toward": 0, "other": [0, 17, 20, 25, 27, 28, 29, 30, 31, 32, 34, 37, 50, 52, 53, 55], "peopl": 0, "Being": 0, "respect": [0, 28, 46], "differ": [0, 2, 4, 13, 16, 20, 21, 23, 25, 28, 29, 30, 
38, 53], "opinion": 0, "viewpoint": 0, "give": 0, "gracefulli": 0, "accept": [0, 7, 17], "construct": [0, 11, 17, 27], "feedback": [0, 7], "apolog": 0, "those": [0, 18, 19, 31, 53], "affect": [0, 18, 27], "mistak": 0, "learn": [0, 15, 19, 21, 25, 28, 29, 31, 34, 39, 40, 42, 55], "from": [0, 3, 4, 5, 7, 11, 16, 17, 18, 19, 21, 22, 27, 28, 29, 30, 32, 34, 38, 39, 42, 43, 45, 46, 47, 50, 53, 55], "focus": 0, "what": [0, 14, 27], "i": [0, 4, 5, 7, 9, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 29, 30, 31, 32, 33, 34, 35, 36, 37, 39, 40, 42, 43, 44, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55], "best": [0, 32], "just": 0, "u": [0, 16, 22, 28, 37], "individu": [0, 20], "overal": [0, 29], "unaccept": 0, "The": [0, 2, 4, 5, 7, 9, 13, 15, 16, 17, 18, 19, 20, 21, 22, 24, 25, 27, 28, 29, 30, 31, 32, 34, 35, 36, 37, 40, 43, 46, 47, 50, 51, 52, 54], "us": [0, 2, 3, 4, 5, 7, 13, 14, 15, 16, 18, 19, 20, 22, 23, 24, 25, 26, 27, 29, 30, 32, 33, 34, 35, 37, 39, 41, 42, 43, 45, 46, 47, 48, 50, 51, 53, 55], "languag": [0, 35], "imageri": 0, "attent": [0, 20], "advanc": [0, 4, 14, 20, 30, 39, 42, 55], "ani": [0, 4, 11, 20, 21, 23, 24, 27, 28, 32, 33, 34, 37, 40, 48, 50], "troll": 0, "insult": 0, "derogatori": 0, "comment": [0, 7, 14], "polit": 0, "attack": 0, "public": [0, 4, 5, 11, 21, 25, 30, 31], "privat": 0, "publish": [0, 5], "inform": [0, 1, 7, 8, 20, 28, 29, 30, 34, 37, 40, 47, 55], "physic": [0, 29, 54], "email": 0, "address": [0, 29, 32], "without": [0, 4, 18, 20, 21, 23, 27, 34, 39, 42, 47, 50, 55], "explicit": [0, 11, 27, 29], "permiss": [0, 5], "which": [0, 4, 7, 9, 13, 14, 15, 16, 17, 18, 19, 20, 24, 27, 28, 29, 30, 32, 34, 37, 40, 41, 47, 51], "could": [0, 14, 18, 27, 30, 35, 37, 40, 47], "reason": [0, 27], "consid": [0, 18, 52], "inappropri": 0, "profession": 0, "set": [0, 4, 7, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 29, 30, 32, 33, 35, 43, 46, 47, 51, 55], "ar": [0, 2, 4, 5, 7, 11, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 
31, 32, 34, 36, 37, 39, 40, 42, 43, 46, 47, 48, 52, 55], "clarifi": 0, "take": [0, 11, 24, 27, 28, 29, 31, 33, 46], "appropri": [0, 3, 29, 34], "fair": 0, "action": [0, 5], "thei": [0, 18, 27, 28, 29], "deem": 0, "threaten": 0, "offens": 0, "harm": 0, "have": [0, 18, 27, 29, 32, 33, 34, 40, 47], "right": [0, 25], "remov": [0, 11, 18, 24, 53], "edit": [0, 2], "reject": 0, "commit": [0, 5, 17, 31, 53], "wiki": 0, "issu": [0, 1, 7, 14, 18, 27, 32, 34, 37, 50, 55], "align": [0, 13], "thi": [0, 2, 3, 5, 11, 13, 14, 16, 17, 18, 19, 20, 21, 23, 24, 25, 27, 28, 29, 30, 31, 33, 34, 35, 37, 40, 41, 45, 46, 47, 48, 51, 54, 55], "moder": 0, "decis": [0, 17], "when": [0, 5, 14, 17, 19, 24, 27, 28, 29, 31, 32, 34, 46, 47, 50], "appli": [0, 17, 25, 27, 30, 46, 49, 51, 53, 54], "within": [0, 15, 25, 32, 46], "all": [0, 7, 11, 14, 18, 20, 21, 25, 27, 29, 32, 37, 40, 43, 46, 53, 54], "space": [0, 29, 55], "also": [0, 4, 7, 15, 16, 17, 19, 21, 23, 27, 28, 29, 32, 33, 36, 37, 55], "offici": [0, 29, 39, 40, 41, 42, 46, 49, 51, 53, 54], "repres": [0, 17], "e": [0, 2, 3, 5, 11, 17, 27, 28, 31, 35, 37], "mail": 0, "post": [0, 7, 18, 19, 24, 30], "via": [0, 11, 17, 39, 42, 54, 55], "social": 0, "media": 0, "account": 0, "appoint": 0, "onlin": [0, 55], "offlin": 0, "event": 0, "instanc": 0, "abus": 0, "otherwis": [0, 17, 27, 30, 47, 48], "mai": [0, 7, 13, 14, 18, 19, 24, 27, 28, 29, 32, 33, 37, 49, 55], "report": [0, 7, 20, 55], "itex": [0, 2, 3, 4, 8, 9, 11, 13, 14, 16, 17, 18, 19, 20, 21, 23, 26, 27, 28, 31, 32, 33, 34, 35, 36, 37, 41, 43, 47, 49, 53, 54, 55], "maintain": [0, 7, 8, 18, 21, 23, 25, 31], "intel": [0, 1, 5, 8, 9, 11, 14, 15, 16, 17, 18, 19, 20, 21, 22, 24, 25, 26, 27, 28, 33, 38, 39, 42, 55], "com": [0, 5, 7, 8, 16, 21, 27, 29, 31, 32, 33, 34, 35, 37, 40, 43, 46, 47, 49, 50, 51, 52, 53, 54, 55], "complaint": 0, "review": 0, "investig": [0, 28], "promptli": 0, "fairli": 0, "oblig": 0, "privaci": 0, "secur": 0, "incid": 0, "follow": [0, 2, 3, 7, 15, 17, 18, 22, 24, 27, 28, 29, 
30, 31, 32, 33, 34, 35, 36, 37, 38, 40, 43, 44, 46, 48, 49, 50, 51, 54, 55], "impact": [0, 5, 14, 18, 24, 29, 50], "determin": [0, 11, 27, 29], "consequ": 0, "violat": 0, "unprofession": 0, "unwelcom": 0, "A": [0, 5, 16, 17, 18, 24, 27, 28, 29, 30, 31, 37, 39, 42, 43, 52], "written": [0, 7], "provid": [0, 2, 4, 7, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 32, 33, 34, 37, 39, 40, 42, 46, 47, 54, 55], "clariti": 0, "around": [0, 28, 46], "natur": 0, "explan": 0, "why": 0, "wa": [0, 28, 29, 30, 34], "apologi": 0, "request": [0, 7, 55], "through": [0, 14, 27, 29, 34, 55], "singl": [0, 4, 7, 15, 20, 21, 24, 46], "seri": [0, 16, 29, 30, 34, 37, 40, 43, 45, 46, 47, 49, 50, 51, 52, 54, 55], "continu": [0, 14, 18, 27], "No": [0, 14, 19, 22, 34, 43, 44, 46, 49, 50, 51], "involv": 0, "unsolicit": 0, "specifi": [0, 3, 11, 21, 24, 27, 28, 29, 31, 34], "period": [0, 29], "time": [0, 11, 14, 16, 18, 19, 20, 21, 22, 27, 29, 34, 40, 46], "avoid": [0, 24, 27, 28, 29, 33], "well": [0, 2, 8, 11, 21, 26, 27, 28, 29, 46], "extern": [0, 14, 35], "channel": [0, 24, 25, 38], "like": [0, 2, 7, 16, 17, 25, 27, 29, 30, 41, 43, 51, 52], "term": [0, 25, 55], "lead": [0, 18], "seriou": 0, "sustain": 0, "sort": 0, "allow": [0, 16, 18, 27, 29, 50, 55], "dure": [0, 15, 18, 19, 24, 27, 33, 34, 43], "pattern": [0, 4, 15, 21, 24], "aggress": [0, 18, 19], "disparag": 0, "class": [0, 11, 27, 30], "adapt": 0, "version": [0, 2, 11, 14, 16, 27, 29, 32, 33, 34, 36, 37, 40, 41], "avail": [0, 2, 3, 11, 14, 19, 25, 28, 29, 34, 36, 37, 55], "http": [0, 2, 5, 7, 8, 16, 21, 22, 27, 29, 31, 32, 33, 34, 35, 36, 37, 40, 43, 46, 47, 49, 50, 51, 52, 53, 54, 55], "www": [0, 21, 37], "org": [0, 2, 7, 21, 35, 50, 53], "_": [0, 11, 13, 16, 17, 18, 20, 22, 24, 27, 28, 29, 30, 31, 34, 35, 41, 43, 46, 47, 48, 49, 50, 51, 52, 53], "html": [0, 5, 37], "were": [0, 28, 29], "inspir": 0, "mozilla": 0, "": [0, 5, 14, 18, 20, 21, 27, 29, 31, 34, 35, 40, 43, 47, 49, 50, 55], "ladder": 0, "For": [0, 1, 2, 7, 11, 
14, 15, 16, 18, 19, 20, 23, 25, 26, 27, 28, 30, 31, 32, 35, 37, 43, 44, 45, 46, 49, 50, 51, 52, 54], "answer": 0, "common": [0, 11, 14, 17, 21, 29], "question": [0, 4, 55], "about": [0, 7, 19, 29, 31, 40, 46, 47, 52], "see": [0, 1, 2, 7, 22, 25, 27, 28, 29, 31, 32, 34, 47, 55], "faq": 0, "translat": [0, 34], "center": [1, 4, 16, 21, 25, 26, 30, 34, 37, 40, 43, 45, 46, 47, 49, 50, 51, 52, 54, 55], "more": [1, 4, 7, 11, 16, 18, 19, 21, 25, 29, 31, 32, 34, 37, 40, 46, 47, 48, 52], "how": [1, 5, 14, 16, 17, 18, 29, 31, 34, 35, 37, 39, 42, 52, 55], "work": [1, 4, 7, 14, 15, 19, 20, 21, 27, 28, 29, 35, 40, 47], "resolv": 1, "handl": [1, 13], "guidelin": [1, 4, 45, 55], "document": [2, 3, 27, 33], "ha": [2, 3, 14, 18, 19, 27, 29, 32, 35, 46, 54], "instruct": [2, 3, 4, 7, 18, 19, 21, 29, 36, 37, 49, 55], "assumpt": [2, 3], "host": [2, 3, 27, 37, 43], "machin": [2, 3, 21, 27, 28, 29, 31, 36, 37, 48, 52], "linux": [2, 3, 7, 16, 28, 29, 33, 34, 36, 37, 47], "kernel": [2, 3, 9, 10, 15, 16, 20, 22, 23, 24, 25, 27, 32, 34, 36, 37, 46, 47, 49, 55], "compat": [2, 3, 4, 15, 19, 21, 23, 26, 27, 30, 46, 47, 49, 50, 51], "driver": [2, 3, 14, 27, 33, 40, 43, 47, 55], "instal": [2, 3, 4, 7, 9, 14, 18, 19, 21, 22, 23, 26, 27, 28, 29, 30, 40, 41, 43, 44, 46, 47, 49, 50, 51, 53], "softwar": [2, 33, 38, 40, 47, 48, 52], "refer": [2, 3, 7, 11, 15, 16, 17, 18, 19, 20, 21, 23, 27, 29, 30, 31, 32, 34, 35, 37, 40, 41, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 54, 55], "xpu": [2, 4, 11, 14, 16, 17, 19, 22, 25, 26, 27, 30, 38, 48, 49, 52], "cpu": [2, 3, 4, 9, 11, 14, 15, 18, 19, 20, 23, 24, 27, 30, 31, 38, 39, 40, 42], "detail": [2, 3, 11, 15, 16, 17, 18, 19, 21, 23, 25, 27, 29, 30, 32, 34, 37, 40, 43, 46, 55], "download": [2, 8, 27, 29, 32, 35, 37, 46], "copi": [2, 3, 35], "wheel": [2, 33, 34], "model": [2, 3, 13, 15, 16, 17, 18, 19, 20, 21, 22, 29, 30, 39, 40, 42, 47, 50, 52, 54, 55], "directori": [2, 3, 4, 5, 7, 14, 17, 28, 31, 32, 34, 35, 37, 43, 44, 46, 49, 51], "you": [2, 3, 4, 5, 7, 8, 11, 
13, 14, 16, 17, 18, 20, 21, 22, 23, 27, 28, 29, 30, 31, 32, 33, 34, 36, 37, 40, 41, 43, 44, 46, 47, 48, 49, 51, 53, 54], "can": [2, 3, 7, 11, 13, 14, 16, 17, 18, 19, 20, 21, 22, 23, 25, 27, 28, 29, 30, 31, 32, 33, 34, 36, 37, 38, 40, 46, 54, 55], "get": [2, 4, 7, 11, 13, 16, 21, 27, 29, 30, 31, 32, 34, 43, 44, 46, 49, 51], "link": [2, 35, 47], "pypi": [2, 38, 55], "project": [2, 5, 7, 55], "file": [2, 5, 7, 14, 17, 18, 22, 28, 31, 32, 37, 43, 44, 46, 49, 50, 51, 53, 55], "lib": [2, 14, 16, 28, 34, 35, 50], "To": [2, 3, 4, 7, 18, 19, 24, 27, 29, 32, 34, 35, 36, 37, 40, 46, 47, 49], "optim": [2, 4, 9, 14, 15, 16, 17, 18, 23, 25, 26, 27, 28, 29, 30, 32, 33, 37, 39, 40, 42, 43, 45, 46, 47, 49, 55], "horovod": [2, 32, 33, 37, 39, 42], "oneapi": [2, 14, 16, 21, 31, 33, 35, 40, 43, 44, 46, 47, 49, 50, 51, 54, 55], "collect": [2, 29, 37], "librari": [2, 3, 11, 28, 29, 32, 34, 37], "oneccl": [2, 32, 33, 37], "mkdir": [2, 3, 53, 54], "cd": [2, 5, 7, 16, 29, 31, 34, 35, 43, 46, 49, 51, 52, 53, 54], "wget": [2, 7, 29, 32, 34, 35, 37, 43, 50, 52], "sh": [2, 3, 5, 14, 31, 32, 33, 34, 35, 37, 41, 43, 44, 46, 47, 49, 50, 51, 52, 55], "o": [2, 16, 22, 32, 33, 35, 37, 47], "some": [2, 11, 16, 18, 19, 26, 27, 28, 29, 34, 46, 52], "python": [2, 4, 9, 14, 16, 19, 22, 23, 25, 26, 27, 28, 29, 31, 32, 33, 34, 36, 37, 40, 41, 46, 47, 48, 50, 51, 52, 53, 55], "hard": [2, 49], "code": [2, 4, 5, 9, 11, 16, 20, 21, 22, 23, 29, 31, 38, 39, 40, 42, 43, 47, 53], "insid": [2, 55], "If": [2, 3, 5, 16, 20, 22, 25, 26, 27, 28, 29, 30, 32, 34, 36, 37, 40, 43, 44, 46, 47, 48, 49, 51, 53], "re": [2, 29, 41], "3": [2, 7, 18, 20, 22, 24, 25, 26, 27, 28, 29, 30, 33, 34, 35, 36, 37, 40, 41, 47, 48, 54], "10": [2, 14, 16, 18, 19, 25, 27, 28, 32, 34, 36, 37, 47, 54, 55], "2": [2, 14, 15, 17, 18, 19, 20, 24, 25, 27, 28, 29, 30, 33, 34, 36, 37, 40, 43, 44, 46, 47, 48, 49, 51, 52, 53, 54, 55], "13": [2, 16, 32, 33, 34, 35, 36, 37, 40, 47, 52, 54, 55], "ubuntu": [2, 16, 31, 34, 35, 36, 37], "22": [2, 16, 31, 32, 
34, 36, 37, 54], "04": [2, 16, 31, 32, 34, 35, 36, 37], "layer": [2, 9, 19, 25, 27, 47], "updat": [2, 18, 27, 31, 32, 33, 34, 35, 36, 37, 54], "shown": [2, 3, 15, 22, 24, 28, 46, 49], "below": [2, 3, 16, 24, 25, 27, 28, 29, 30, 31, 32, 34, 46, 53], "image_nam": [2, 3], "arg": [2, 13, 30], "ubuntu_vers": 2, "python3": [2, 5, 33, 34, 49, 50], "tf_ver": 2, "whl": [2, 11, 32, 34, 35, 55], "t": [2, 5, 11, 13, 17, 18, 20, 27, 28, 49, 50], "f": [2, 32, 35, 55], "dockerfil": 2, "enter": [2, 3, 22, 33, 34], "folder": [2, 3, 19, 31, 34, 53], "command": [2, 3, 14, 16, 22, 28, 29, 32, 33, 34, 36, 37, 41, 43, 47, 51], "start": [2, 3, 14, 21, 22, 27, 28, 31], "v": [2, 3, 18, 31, 33, 35, 37, 41, 43], "option": [2, 3, 7, 11, 16, 18, 21, 28, 30, 34, 53, 54, 55], "mount": [2, 3], "your": [2, 3, 5, 7, 14, 29, 31, 32, 33, 34, 36, 37, 41, 43, 47, 49, 50, 53, 54, 55], "local": [2, 3, 7, 14, 19, 28, 29, 31, 34, 35, 36, 37, 52], "attach": [2, 3, 27, 29], "devic": [2, 3, 4, 9, 10, 11, 13, 14, 16, 17, 19, 20, 21, 22, 23, 24, 27, 30, 31, 34, 35, 37, 43, 53, 54, 55], "dev": [2, 3, 14, 22, 31, 37, 43, 51], "dri": [2, 3, 31, 37, 43], "dir": [2, 3, 7, 46, 50, 51], "workspac": [2, 3, 31, 53], "path": [2, 3, 7, 16, 18, 19, 20, 22, 28, 29, 30, 31, 32, 33, 34, 35, 37, 43, 47, 51, 53, 54, 55], "privileg": [2, 3, 43], "ipc": [2, 3, 37, 43], "http_proxi": [2, 3], "https_proxi": [2, 3], "no_proxi": [2, 3], "bash": [2, 32, 33, 34, 37, 43, 46, 47, 55], "now": [2, 18, 27, 29, 31], "c": [2, 4, 10, 11, 14, 16, 28, 29, 32, 33, 34, 36, 37, 38, 55], "client": [2, 35], "import": [2, 7, 11, 14, 16, 17, 18, 19, 22, 23, 25, 26, 27, 29, 32, 33, 34, 36, 37, 43, 47, 48, 55], "device_lib": 2, "print": [2, 11, 16, 19, 22, 25, 27, 28, 30, 32, 33, 34, 36, 37, 43, 44, 48, 49, 54, 55], "list_local_devic": 2, "should": [2, 5, 7, 22, 27, 29, 31, 32, 33, 36, 37, 40, 51, 54], "list": [2, 7, 11, 19, 24, 27, 28, 29, 32, 34, 53], "sampl": [2, 22, 40, 47, 49], "output": [2, 7, 11, 13, 19, 20, 24, 25, 27, 30, 32, 34, 35, 43, 47, 
51], "look": [2, 16, 24, 31], "name": [2, 3, 4, 5, 7, 11, 14, 16, 18, 19, 20, 25, 26, 27, 29, 31, 39, 42, 49, 52], "0": [2, 5, 11, 14, 15, 16, 19, 20, 22, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 40, 44, 46, 47, 48, 50, 51, 52, 53, 54, 55], "device_typ": [2, 14, 17, 52, 54], "memory_limit": 2, "268435456": 2, "incarn": 2, "9266936945121049176": 2, "xla_global_id": 2, "1": [2, 4, 5, 14, 18, 19, 20, 21, 22, 25, 26, 27, 28, 29, 30, 33, 34, 43, 46, 47, 48, 51, 52, 53, 54, 55], "bus_id": 2, "15031084974591766410": 2, "physical_device_desc": 2, "intel_xpu": 2, "pci": 2, "bu": 2, "id": [2, 31], "undefin": [2, 16], "17448926295332318308": 2, "step": [3, 16, 17, 18, 25, 27, 29, 31, 40, 52, 53, 54], "cpp": [3, 14, 17, 32], "cc": [3, 11, 14, 16, 17, 27, 31, 37, 52, 54], "sourc": [3, 4, 7, 11, 16, 17, 21, 32, 33, 37, 38, 41, 43, 44, 47, 50, 52, 55], "Then": [3, 11, 16, 22, 30, 36, 37, 47], "packag": [3, 16, 29, 32, 33, 34, 36, 40, 47, 50, 55], "p": [3, 25, 31, 36, 37, 43, 53], "bazel": [3, 11, 16, 31, 35], "bin": [3, 7, 11, 16, 28, 31, 34, 35, 41, 43, 44, 47, 50, 52], "cp": [3, 35], "r": [3, 7, 14, 16, 27, 29, 54], "path_to_itex": 3, "out": [3, 15, 16, 27, 35, 44, 48, 49, 54], "k8": [3, 35], "opt": [3, 11, 14, 16, 32, 34, 35, 37, 41, 52], "st": [3, 35], "tar": [3, 7, 29], "cvfh": 3, "path_to_tensorflow_serv": 3, "tensorflow_serv": [3, 31], "model_serv": [3, 31], "tensorflow_model_serv": [3, 31], "gpu": [3, 4, 9, 11, 14, 15, 18, 19, 20, 23, 24, 25, 27, 30, 31, 33, 38, 39, 42], "sure": [3, 11, 16, 27, 32, 34], "meet": [3, 25, 55], "either": [3, 19], "target": [3, 17, 34, 35], "8500": [3, 31], "model_nam": [3, 31], "model_dir": [3, 31, 53, 54], "overview": 4, "infrastructur": [4, 9, 20], "quick": [4, 11, 39, 42], "releas": [4, 14, 17, 29, 30, 31, 34, 40, 49, 50], "frequent": 4, "ask": [4, 34], "guid": [4, 9, 11, 16, 18, 21, 27, 31, 32, 34, 35, 37, 40, 47], "build": [4, 7, 9, 38, 39, 40, 42, 55], "conda": [4, 14, 38], "distribut": [4, 8, 29, 32, 33, 37, 38, 39, 42, 55], 
"featur": [4, 7, 8, 11, 13, 17, 25, 29, 34, 39, 42, 47, 54, 55], "variabl": [4, 13, 15, 16, 19, 21, 22, 23, 24, 25, 27, 29, 31, 33, 35, 47], "api": [4, 7, 9, 10, 14, 15, 16, 19, 25, 26, 27, 29, 31, 35, 47, 48], "auto": [4, 11, 17, 28, 30, 35], "mix": [4, 30, 39, 42], "precis": [4, 30, 39, 40, 42, 49, 51], "graph": [4, 9, 10, 13, 15, 18, 20, 23, 39, 42, 48, 54, 55], "custom": [4, 7, 9, 18, 21, 26, 28, 30, 32, 37, 46], "oper": [4, 13, 15, 18, 23, 24, 27, 29, 55], "overrid": [4, 11, 18, 27], "int8": [4, 27, 40, 47], "quantiz": [4, 39, 42], "xpuautoshard": [4, 30, 39, 42], "profil": [4, 9, 27, 29], "launcher": [4, 28, 29], "topic": 4, "practic": [4, 27, 28], "support": [4, 7, 13, 14, 15, 17, 18, 19, 22, 24, 27, 28, 29, 30, 32, 34, 36, 37, 40, 43, 47, 53, 54], "openxla": 4, "develop": [4, 16, 21, 29, 32, 34, 36, 37, 55], "design": [4, 7, 9, 14, 21, 31, 40], "structur": [4, 16, 19, 28, 29], "op": [4, 9, 10, 17, 20, 21, 23, 24, 26, 27, 35, 46, 49], "gener": [4, 5, 20, 21, 23, 27, 28, 29, 31, 33, 34, 36, 43, 47], "default": [4, 7, 13, 14, 15, 18, 19, 20, 21, 23, 27, 29, 30, 34, 37, 46, 47, 48, 53, 54], "configur": [4, 8, 11, 14, 16, 17, 19, 21, 23, 27, 28, 30, 32, 37, 55], "good": [4, 19, 21, 23, 29, 31], "perform": [4, 15, 17, 19, 20, 21, 22, 23, 24, 25, 27, 28, 29, 30, 34, 39, 42, 46, 47, 49, 54, 55], "chang": [4, 5, 7, 11, 18, 19, 20, 21, 23, 27, 28, 33, 39, 40, 42, 50, 52, 53], "simpl": [4, 21, 23, 27, 35], "frontend": [4, 21, 23], "util": [4, 9, 11, 14, 21, 23, 28, 29, 50, 54], "user": [4, 5, 7, 11, 13, 19, 20, 21, 23, 32, 34, 36, 37, 38, 43, 49, 55], "onli": [4, 5, 13, 14, 17, 18, 20, 21, 23, 24, 27, 28, 30, 31, 32, 36, 46, 49, 50, 51, 53, 54], "minor": [4, 21, 23], "applic": [4, 21, 23, 29, 30, 31, 40], "scenario": [4, 13, 20, 21, 23, 29, 30], "typic": [4, 21, 23, 27, 29], "need": [4, 8, 13, 14, 16, 17, 20, 21, 23, 27, 28, 31, 32, 33, 34, 35, 37, 43, 47, 48, 50, 53, 54], "add": [4, 5, 17, 18, 19, 24, 29, 31, 32, 35, 43, 49, 53, 54], "two": [4, 13, 14, 19, 21, 23, 
27, 29, 34, 43, 46, 49], "three": [4, 21, 22, 23, 28], "claus": [4, 21, 23], "origin": [4, 18, 21, 23, 24, 25, 35, 40, 43, 50], "amp": [4, 18, 28, 39, 42, 55], "low": [4, 18, 21, 23, 27, 40], "data": [4, 15, 16, 17, 18, 21, 22, 25, 27, 30, 34, 37, 40, 43, 45, 46, 47, 49, 50, 51, 52, 54, 55], "type": [4, 7, 11, 14, 18, 20, 21, 28, 30, 33, 34, 43], "bfloat16": [4, 11, 18, 19, 21, 24, 27, 30, 43, 46, 51], "float16": [4, 18, 19, 21, 27, 30, 43], "nativ": [4, 15, 21, 53], "3rd": [4, 21, 36], "xeon": [4, 21, 29, 34, 36, 39, 42, 43], "scalabl": [4, 21, 31, 36, 43], "processor": [4, 21, 29, 36, 43, 47, 48], "cooper": [4, 21, 39, 42, 47], "lake": [4, 21], "avx512": [4, 21, 37, 47], "further": [4, 21], "boost": [4, 21, 28, 29], "less": [4, 18, 19, 21, 24, 27, 43], "memori": [4, 9, 11, 13, 14, 15, 18, 19, 21, 25, 27, 43, 53], "lower": [4, 15, 18, 19, 21, 43], "fulli": [4, 19, 21], "enabl": [4, 13, 15, 16, 17, 18, 21, 22, 25, 27, 28, 29, 30, 33, 34, 35, 53], "fuse": [4, 16, 18, 19, 21, 24, 26, 46], "specif": [4, 16, 27, 29, 30, 31, 32, 37, 55], "new": [4, 5, 7, 8, 15, 21, 23, 24, 27, 29, 40], "better": [4, 15, 18, 19, 21, 24, 25, 28, 29, 39, 42, 46, 47, 49], "conv2d": [4, 21, 48], "relu": [4, 11, 16, 19, 21, 24, 25, 26, 27, 48], "linear": [4, 19, 21, 25, 27], "benefit": [4, 21, 27, 29, 30], "fusion": [4, 9, 16, 17, 18, 19, 21, 26, 30], "deliv": [4, 19, 21], "transpar": [4, 21], "fashion": [4, 21], "implement": [4, 7, 10, 16, 17, 19, 21, 23, 25, 26, 29, 55], "sever": [4, 21, 28, 29, 34], "namespac": [4, 17, 21, 23, 25, 26, 30, 35], "extend": [4, 14, 21, 23, 25, 29, 30], "defin": [4, 16, 27], "export": [4, 7, 11, 15, 16, 17, 18, 19, 21, 22, 27, 28, 29, 31, 33, 35, 41, 43, 47, 51, 53, 54], "ze_enable_tracing_lay": [4, 21, 22, 27], "usecyclespersecondtim": [4, 21, 22, 27], "enable_tf_profil": [4, 21, 22, 27], "co": [4, 14, 15, 21], "neural": [4, 15, 21, 29, 39, 40, 42, 47], "compressor": [4, 15, 21, 39, 40, 42, 47], "solut": [4, 14, 15, 21], "equival": [4, 27], "experiment": [4, 
13, 14, 16, 22, 30, 34, 37, 53], "automat": [4, 5, 16, 17, 18, 19, 21, 26, 27, 28, 29, 30, 32, 37, 39, 42, 44, 48, 54], "shard": [4, 17, 21, 30, 53], "input": [4, 11, 13, 17, 19, 20, 21, 22, 24, 25, 27, 30, 54], "place": [4, 17, 21, 29, 35], "maxim": [4, 17, 21, 25, 30, 54], "hardwar": [4, 17, 19, 21, 23, 25, 28, 30, 39, 42], "usag": [4, 14, 21, 29, 30, 39, 42], "adopt": [4, 15, 21], "uniform": [4, 16, 21], "pjrt": [4, 21, 55], "plugin": [4, 10, 16, 18, 19, 21, 22, 31, 34, 52, 55], "mechan": [4, 21], "backend": [4, 16, 21, 23, 26, 27, 30, 37, 43, 44, 47, 48, 55], "show": [5, 14, 16, 18, 27, 34, 35, 37, 39, 40, 42, 43, 45, 46, 47, 49, 50, 51, 53, 54], "script": [5, 21, 22, 29, 34, 43, 46, 48, 50, 53], "relat": [5, 28, 31], "save": [5, 11, 17, 28, 30, 51], "doc": [5, 9, 11, 50], "build_doc": 5, "trigger": [5, 19, 30], "merg": 5, "pr": 5, "github": [5, 7, 8, 16, 21, 29, 31, 34, 35, 37, 40, 43, 46, 49, 51, 52, 53, 54, 55], "repo": [5, 32, 33], "main": [5, 16, 17, 21, 32, 35, 52], "branch": [5, 7, 16, 34, 53], "execut": [5, 11, 13, 15, 16, 17, 18, 19, 20, 22, 25, 27, 29, 47, 48], "content": [5, 35, 37], "doesn": [5, 17, 18, 50], "contain": [5, 9, 15, 17, 28, 29, 31, 38, 39, 42, 55], "won": [5, 28], "product": [5, 7, 21, 31, 32], "git": [5, 11, 16, 30, 31, 34, 35, 43, 46, 49, 51, 52, 53, 54], "tag": [5, 53], "must": [5, 15, 27], "ad": [5, 13, 17, 18, 21, 23, 27, 34, 46, 54], "same": [5, 7, 14, 16, 20, 21, 23, 24, 25, 27, 28, 29, 30, 31, 35, 40, 48, 53], "manual": [5, 7, 18, 27, 28], "result": [5, 15, 16, 17, 19, 22, 27, 29, 30, 33, 40, 44, 46, 48, 49, 50, 54], "gh": 5, "page": [5, 21, 22, 23, 29, 55], "io": [5, 31], "site": [5, 8, 32, 33, 34, 37, 50, 55], "note": [5, 11, 17, 18, 20, 25, 27, 28, 30, 31, 34, 35, 37, 43, 49, 52, 53], "write": [5, 7, 19], "abl": 5, "clone": [5, 16, 31, 34, 35, 46, 49, 51, 53, 54], "extens": [5, 8, 9, 11, 14, 15, 16, 17, 18, 19, 20, 21, 22, 24, 25, 26, 27, 28, 29, 33, 38, 39, 41, 42, 43, 44, 45, 46, 48, 49, 50, 51, 52, 53, 54, 55], 
"tensorflow": [5, 8, 9, 10, 11, 13, 14, 15, 16, 17, 20, 22, 24, 25, 26, 27, 28, 29, 33, 38, 39, 41, 42, 43, 44, 45, 46, 48, 49, 50, 51, 52, 53, 54, 55], "checkout": [5, 16, 31, 35, 54], "build_tmp": 5, "m": [5, 16, 28, 29, 40, 41, 49, 52], "push": 5, "befor": [5, 7, 11, 18, 19, 24, 27, 28, 29, 34, 35, 54], "submit": [5, 7, 55], "modifi": [5, 35, 43, 54], "draft": 5, "server": [5, 16, 34, 37], "9000": 5, "web": [5, 50], "browser": [5, 22, 36, 37, 47, 49, 50], "g": [5, 17, 27, 35], "chrome": 5, "127": [5, 31], "localhost": [5, 11, 20, 36, 37, 52], "check": [5, 7, 11, 13, 14, 18, 19, 21, 23, 27, 28, 32, 33, 34, 35, 40, 41, 43, 51, 52, 55], "picker": 5, "function": [5, 16, 17, 20, 21, 23, 25, 26, 27, 29, 30], "want": [5, 7, 27, 28, 32, 34, 37, 49, 51], "switch": [5, 29], "begin": [7, 11, 43], "share": [7, 14, 29, 32, 43, 44, 46, 49, 51], "intent": 7, "team": [7, 49], "base": [7, 11, 14, 15, 16, 18, 19, 25, 29, 32, 33, 36, 43, 46, 47, 51, 52, 54, 55], "bug": [7, 55], "propos": [7, 25], "log": [7, 11, 16, 18, 20, 22, 27, 30, 35, 37, 43, 44, 46, 49, 50, 51, 52, 54], "intend": [7, 55], "approv": 7, "fix": [7, 27, 32], "search": [7, 28], "pick": 7, "d": [7, 32, 34, 35, 53], "pleas": [7, 11, 14, 16, 17, 21, 27, 32, 34, 35, 37, 40, 43, 46, 48, 51, 52, 55], "pull": [7, 31, 36, 37, 43], "ensur": [7, 28], "run": [7, 11, 14, 18, 19, 22, 24, 26, 27, 28, 29, 30, 34, 36, 37, 39, 42, 53, 55], "patch": [7, 31, 46, 49, 51, 53, 54], "signific": [7, 18], "requir": [7, 11, 13, 15, 21, 22, 24, 25, 27, 28, 33, 40], "rfc": [7, 16, 21], "process": [7, 11, 21, 27, 28, 29, 31, 46, 47], "consist": [7, 27], "discuss": 7, "promot": 7, "found": [7, 14, 27, 28, 29, 31, 34], "dedic": 7, "contributor": [7, 55], "coven": [7, 55], "conduct": [7, 28], "full": [7, 37], "locat": [7, 8, 34, 35, 46, 49], "benchmark": 7, "llga": [7, 30], "saniti": [7, 55], "migrat": 7, "path_to_python_unit_test": 7, "ut": 7, "find": [7, 11, 22, 29, 31], "py": [7, 11, 16, 22, 28, 31, 43, 44, 49, 50, 51, 52, 53, 54], "do": [7, 
14, 16, 19, 27, 28, 30, 34, 47], "done": [7, 22, 27, 29, 32], "standard": [7, 25], "pylint": 7, "against": 7, "definit": [7, 18, 23, 30], "root": [7, 34, 35, 50], "pip": [7, 11, 14, 16, 22, 30, 31, 32, 33, 34, 36, 37, 40, 41, 49, 52, 53, 54, 55], "rcfile": 7, "pylintrc": 7, "myfil": 7, "conform": 7, "googl": [7, 14, 16, 21, 22, 31, 51], "both": [7, 14, 15, 18, 19, 23, 28, 29, 30, 37, 43], "clang": 7, "format": [7, 9, 18, 24, 27, 30], "cpplint": 7, "apt": [7, 16, 31, 32, 37], "12": [7, 14, 27, 28, 37, 46, 49, 52, 54, 55], "inplac": 7, "stdout": [7, 28], "filter": 7, "legal": 7, "copyright": 7, "exclud": 7, "third_parti": [7, 9, 31], "recurs": 7, "sometim": 7, "fals": [7, 17, 25, 27, 28, 46, 51, 54], "error": [7, 11, 14, 20, 25, 27, 31, 43, 44, 46, 49, 50, 51], "nolint": 7, "nolintnextlin": 7, "skip": [7, 27, 28, 33], "line": [7, 27, 29, 31, 43, 50, 54], "mkl": [7, 31, 32, 33, 34, 35, 37], "h": [7, 11, 14, 17, 31, 35, 52], "include_subdir": 7, "buildifi": 7, "tool": [7, 9, 11, 14, 18, 29, 32, 33, 34, 37, 55], "bzl": 7, "convent": 7, "xxx": [7, 47, 50], "tpl": 7, "go": [7, 35, 36, 37], "golang": 7, "dl": 7, "go1": 7, "15": [7, 16, 28, 37], "amd64": [7, 32], "gz": [7, 29], "sudo": [7, 16, 31, 32, 34, 37], "usr": [7, 28, 32], "xzf": 7, "bazelbuild": [7, 34], "buildtool": 7, "src": [7, 11, 14, 17, 31], "home": [7, 28, 32, 36, 37, 50], "NOT": [7, 14], "zzz": 7, "view": 8, "latest": [8, 16, 31, 33, 34, 35, 37, 55], "previou": [8, 25, 29], "valid": [8, 30], "here": [8, 11, 17, 18, 24, 34, 46, 49, 54], "contact": 8, "addit": [8, 21, 23, 24, 29, 55], "assist": 8, "none": [8, 25, 26, 27, 28, 30], "docker": [9, 38, 39, 42], "docs_build": 9, "core": [9, 11, 14, 16, 17, 26, 27, 29, 34, 35, 37, 47, 48, 52, 54], "test": [9, 19, 22, 27, 31, 33, 39, 42, 50, 55], "third": [9, 55], "parti": [9, 55], "program": [9, 29, 55], "kei": [9, 16, 17, 20, 32], "parent": 9, "sub": [9, 14, 18, 19, 29, 30], "descript": [9, 13, 18, 28, 29, 30, 39, 42, 50], "onednn": [9, 11, 12, 14, 15, 16, 20, 24, 
29, 30, 39, 42], "propag": [9, 13, 17], "miscellan": 9, "repositori": [9, 32, 46], "modular": 10, "pluggabl": [10, 35, 37], "streamexecutor": [10, 16], "registr": [10, 11], "pluggabledevic": [10, 55], "pass": [11, 15, 16, 17, 27, 30, 49, 54], "procedur": [11, 16, 32, 36, 37], "tf": [11, 14, 15, 19, 22, 25, 26, 27, 28, 30, 32, 34, 36, 37, 47, 48, 53, 54], "__version__": [11, 30, 32, 34, 36, 37, 55], "verbos": [11, 19, 20, 27, 28], "itex_verbos": [11, 16, 17], "onednn_verbos": 11, "familiar": [11, 16], "architectur": 11, "built": [11, 31, 36, 37], "creat": [11, 18, 27, 28, 30, 33, 35, 37, 41, 47, 54], "offcial": 11, "geluop": 11, "init": [11, 53], "void": 11, "register_geluop": 11, "declar": 11, "call": [11, 15, 16, 26, 27, 29, 30, 38, 41, 47, 48, 50, 51], "nn": [11, 16, 25, 26, 30, 48], "itex_vlog": 11, "statusuniqueptr": 11, "tf_newstatu": [11, 35], "tf_opdefinitionbuild": 11, "op_build": 11, "tf_newopdefinitionbuild": 11, "gelu": [11, 30], "tf_opdefinitionbuilderaddinput": 11, "tf_opdefinitionbuilderaddoutput": 11, "activ": [11, 18, 19, 22, 25, 27, 29, 30, 32, 33, 34, 36, 37, 41, 43, 44, 47, 48, 50, 52], "tf_opdefinitionbuilderaddattr": 11, "half": [11, 27], "float": [11, 18, 20, 27, 30, 35, 43], "approxim": [11, 25], "bool": 11, "true": [11, 22, 25, 26, 27, 28, 30, 46, 51, 54], "tf_opdefinitionbuildersetshapeinferencefunct": 11, "unchanged_shape_fn": 11, "tf_registeropdefinit": 11, "itex_check_eq": 11, "tf_ok": [11, 35], "tf_getcod": [11, 35], "fail": [11, 27, 30], "its": [11, 25, 27, 28, 29, 32, 37, 48], "docstr": 11, "attr": [11, 20], "might": [11, 34], "debug": [11, 20, 22, 30], "one": [11, 14, 15, 20, 21, 27, 29, 34, 43, 48, 53], "made": 11, "separ": [11, 16, 23, 24, 27, 29, 33, 34, 55], "register_kernel_build": 11, "device_cpu": 11, "typeconstraint": 11, "cpudevic": 11, "device_gpu": [11, 17, 54], "gpudevic": 11, "engin": [11, 14], "polymorph": 11, "load_ops_librari": 11, "load": [11, 27, 31, 37], "register_": 11, "macro": 11, "directli": [11, 17, 27, 28, 
29, 37], "relubaseop": 11, "eltwisebaseop": 11, "opkernel": 11, "templat": 11, "typenam": 11, "opkernelconstruct": 11, "context": [11, 25, 29], "dnnl": [11, 13], "algorithm": [11, 25], "eltwise_gelu_erf": 11, "0f": 11, "hasattr": [11, 30], "op_requires_ok": 11, "getattr": 11, "approximate_": 11, "alg_kind_": 11, "eltwise_gelu_tanh": 11, "algo": 11, "alpha": 11, "beta": 11, "eltwis": 11, "rewrit": [11, 16, 17], "comput": [11, 15, 16, 25, 27, 29, 32, 40, 48, 49, 55], "ctx": 11, "alpha_": 11, "beta_": 11, "opkernelcontext": 11, "try": [11, 21, 28, 40, 47], "onednn_engin": 11, "creatednnlengin": 11, "tensor": [11, 25, 27, 35, 48], "dst_tensor": 11, "nullptr": 11, "noth": 11, "return": [11, 16, 17, 27, 30, 35], "src_tensor": 11, "shape": [11, 13, 17, 19, 25, 27, 48], "num_el": 11, "allocate_output": 11, "kdstindex": 11, "forward": [11, 27, 49], "descriptor": 11, "primit": [11, 13, 20], "eltwise_forward": 11, "desc": [11, 13], "fwd_desc": 11, "prop_kind": 11, "src_md": 11, "primitive_attr": 11, "set_scratchpad_mod": 11, "scratchpad_mod": 11, "primitive_desc": 11, "fwd_pd": 11, "fwd_primit": 11, "onednn_stream": 11, "creatednnlstream": 11, "std": [11, 35], "unordered_map": 11, "int": [11, 35], "fwd_primitive_arg": 11, "dnnl_arg_src": 11, "src_mem": 11, "dnnl_arg_dst": 11, "dst_mem": 11, "dnnl_arg_scratchpad": 11, "scratchpad_mem": 11, "catch": 11, "protect": 11, "eltwise_relu": 11, "hpp": 11, "It": [11, 14, 15, 16, 17, 18, 19, 20, 21, 27, 29, 33, 34, 39, 42, 47, 50, 55], "elig": 11, "infer": [11, 15, 17, 18, 19, 24, 27, 31, 39, 40, 42, 47, 50], "backward": [11, 27], "descibl": 11, "click": [11, 34], "header": 11, "itex_xpu_librari": 11, "relu_op": 11, "hdr": [11, 31], "relu_op_functor": 11, "eltwise_base_hdr": 11, "copt": [11, 31], "tf_copt": [11, 31], "linkstat": 11, "dep": [11, 31], "alwayslink": [11, 31], "gpu_kernel": 11, "In": [11, 16, 18, 19, 27, 28, 29, 33, 40, 43, 47, 48, 52, 54], "tip": [11, 20, 29, 31], "compil": [11, 14, 16, 19, 21, 27, 29, 30, 31, 32, 33, 34, 
35, 37], "name_scop": 11, "convert_to_tensor": 11, "intel_extension_for_tensorflow": [11, 17, 18, 19, 25, 26, 27, 28, 31, 32, 33, 34, 36, 37, 43, 55], "clean": [11, 35], "xfd": 11, "config": [11, 14, 16, 17, 18, 19, 27, 31, 32, 34, 35, 37, 43, 47, 52, 53, 54], "pip_packag": [11, 34], "build_pip_packag": [11, 34], "uninstal": 11, "intel_extension_for_tensorflow_lib": [11, 34], "x": [11, 19, 25, 26, 27, 34, 35, 43, 48, 52], "constant": [11, 15, 25, 26, 27], "dtype": [11, 19, 25, 26, 48, 54], "float32": [11, 16, 19, 24, 25, 26, 27, 46, 48], "y": [11, 16, 25, 26, 27, 32, 34, 35, 43, 52, 55], "nn_op": 11, "141": 11, "common_runtim": 11, "eager": [11, 25], "1445": 11, "job": [11, 20, 35], "replica": [11, 20], "task": [11, 20, 29, 53], "100": [11, 27, 30, 46, 53], "eltwise_bas": 11, "44": [11, 28], "exec": [11, 13], "ocl": 11, "gen9": 11, "forward_train": 11, "data_f32": 11, "block": [11, 29, 30, 37], "f0": 11, "diff_undef": 11, "undef": 11, "scratchpad": [11, 13], "alg": 11, "5": [11, 18, 19, 20, 22, 25, 27, 30, 32, 34, 35, 36, 46, 48, 51, 54], "xxxxxx": 11, "op_kernel": 11, "773": 11, "object": [12, 14, 18, 27, 29, 30, 43, 44, 46, 49, 50, 51], "cach": [12, 15, 29], "creation": 13, "overhead": [13, 27, 29], "becom": [13, 29], "notic": [13, 27], "especi": [13, 33], "small": [13, 25, 27, 28, 29], "latenc": [13, 43, 49], "bind": [13, 29, 35], "node": [13, 18, 20, 24, 29, 33, 40], "By": [13, 27, 28, 29, 47], "off": [13, 28, 30, 47, 54, 55], "dynam": [13, 27, 29], "mean": [13, 14, 18, 25, 27, 28, 29, 34], "invalid": [13, 29], "dim": 13, "meta": 13, "layout": [13, 28, 30], "parallel": [13, 16, 29], "schedul": [13, 25, 28, 29], "thread": [13, 28, 29, 30, 37], "safe": [13, 18, 30, 55], "stream": [13, 49, 54], "demand": [13, 55], "satisfi": [13, 23], "concurr": [13, 29], "case": [13, 18, 19, 21, 27, 28, 29, 43, 53], "mutex": 13, "lock": 13, "weight": [13, 25, 27, 46, 48, 54], "bia": [13, 20, 24, 25, 48], "temporari": 13, "area": 13, "reorder": 13, "argument": [13, 25, 27, 28, 
30], "whether": [14, 24, 28, 29], "successfulli": [14, 31, 33, 34, 35, 37, 54], "platform": [14, 16, 27, 29, 30, 32, 34, 36, 46, 49, 50, 51, 54], "zero": [14, 16, 25, 26, 27, 32], "opencl": [14, 16, 32, 37], "And": [14, 32, 36, 37], "high": [14, 16, 17, 27, 29, 55], "list_physical_devic": [14, 19, 27], "tell": 14, "regist": [14, 16, 40, 47], "2021": 14, "07": [14, 25, 37, 54], "01": [14, 30], "06": [14, 27], "40": [14, 28], "55": [14, 28, 29, 54], "510076": 14, "dpcpp_runtim": [14, 27], "116": 14, "select": [14, 16, 27, 28, 30, 49, 55], "physicaldevic": [14, 52], "physical_devic": [14, 52], "know": [14, 19, 27], "rate": [14, 15, 18, 25, 31], "system": [14, 21, 29, 31, 33, 34], "monitor": 14, "capabl": [14, 27], "clock": 14, "frequenc": 14, "eu": 14, "count": 14, "amount": [14, 27], "so": [14, 16, 19, 27, 28, 29, 30, 31, 34, 35, 43, 44, 46, 49, 50, 51, 52], "each": [14, 25, 27, 28, 29, 54], "modul": [14, 16, 17, 28], "relationship": [14, 18], "replac": [14, 25, 26, 31, 35], "stock": [14, 23, 24, 27, 32, 33, 36, 37, 40, 46, 49, 50, 51, 54, 55], "sinc": [14, 27, 29], "9": [14, 16, 18, 25, 28, 33, 34, 40, 41, 50, 54], "That": [14, 29, 34, 43], "them": [14, 18, 21, 27, 28, 29, 31, 50, 53], "unknown": [14, 27], "help": [14, 19, 20, 21, 28, 29, 37, 40, 47], "acceler": [14, 16, 30, 39, 42, 43, 47, 55], "q1": 14, "2024": 14, "discontinu": 14, "upstream": [14, 18], "futur": 14, "current": [14, 17, 22, 30, 46, 54], "upgrad": [14, 32, 33, 36, 37, 40, 41, 55], "section": [14, 27, 29, 32], "problem": [14, 24, 27, 29], "encount": 14, "sycl": [14, 16], "level_zero_util": 14, "33": [14, 16, 32, 37, 53], "fatal": 14, "level_zero": 14, "ze_api": 14, "modulenotfounderror": 14, "depend": [14, 19, 28, 29, 32, 34, 35, 37], "framework": [14, 32, 35, 43, 44, 45, 46, 49, 51, 53], "errors_impl": [14, 43, 44, 46, 49, 51], "notfounderror": [14, 43, 44, 46, 49, 51], "libmkl_sycl": [14, 43, 44, 46, 49, 51], "cannot": [14, 18, 43, 44, 46, 49, 51], "setvar": [14, 32, 37, 41, 52], "env": [14, 31, 
33, 34, 35, 37, 41, 47, 49], "var": [14, 31, 33, 34, 35, 37], "toolkit": [14, 16, 32, 33, 40, 43, 52, 55], "glibcxx_3": 14, "4": [14, 17, 18, 20, 24, 25, 27, 28, 29, 33, 46, 48, 52, 54], "30": [14, 35, 54], "forg": 14, "gxx_linux": 14, "64": [14, 16, 17, 19, 27, 28, 32, 34, 35, 36, 37, 46], "higher": [14, 15, 20, 27, 29], "glibcxx": 14, "veri": [15, 27, 46], "popular": 15, "deep": [15, 29, 39, 42, 55], "techniqu": [15, 27], "invent": 15, "improv": [15, 19, 27, 29, 34, 54], "speed": [15, 18, 29, 39, 40, 42], "minim": [15, 29], "number": [15, 24, 27, 29, 39, 40, 42, 46, 49, 53, 54], "bit": [15, 16, 18, 27, 30, 32, 34, 35, 36, 37, 43], "convert": [15, 17, 18, 19, 27, 40, 43], "real": [15, 27, 53], "valu": [15, 17, 18, 20, 25, 27, 28, 29, 30, 53], "represent": 15, "mainli": [15, 17, 28], "phase": [15, 46], "loss": [15, 18, 19, 39, 40, 42, 47, 52], "accuraci": [15, 18, 19, 25, 27, 39, 40, 42, 47, 52], "reduc": [15, 18, 27, 29, 34, 40, 46, 49, 54], "miss": 15, "cost": 15, "network": [15, 29], "v2": [15, 30, 33, 46, 53], "newer": [15, 40, 41, 47], "integr": [15, 16, 29, 34], "box": 15, "green": 15, "subgraph": 15, "onednngraph": 15, "part": [15, 17, 29, 46], "executor": 15, "partit": [15, 29], "deleg": 15, "grappler": [15, 17, 19, 52], "fold": 15, "itex_tf_constant_fold": [15, 47], "incept": [15, 18, 39, 42, 49], "v3": [15, 39, 42], "introduc": [16, 28, 29], "seamlessli": 16, "simplifi": [16, 40], "quickli": [16, 20, 27], "initi": [16, 17, 20, 27, 34, 53], "pytorch": 16, "xla": 16, "numpi": [16, 22, 25, 27, 48], "style": 16, "compos": [16, 17], "transform": [16, 24, 25], "batch": [16, 17, 25, 27, 28, 54], "differenti": [16, 34], "multipl": [16, 18, 20, 29, 53, 54], "_src": 16, "xla_bridg": 16, "register_pjrt_plugin_factori": 16, "getenv": 16, "pjrt_names_and_library_path": 16, "your_itex_path": 16, "libitex_xla_extens": 16, "jaxlib": 16, "xla_extens": 16, "lastest": 16, "interfac": [16, 17, 38, 55], "got": 16, "getpjrtapi": 16, "verifi": [16, 33, 34, 39, 42, 46, 49, 50, 
51, 54], "max": [16, 30, 34, 37, 43, 45, 46, 49, 50, 51, 52, 54], "647": [16, 32, 37], "flex": [16, 34, 37, 40, 43, 45, 47, 49, 51, 55], "170": [16, 34, 37, 49, 51], "arc": [16, 34, 37, 43, 55], "red": [16, 37], "hat": [16, 37], "8": [16, 18, 25, 27, 28, 30, 32, 34, 36, 37, 46, 47, 53], "6": [16, 18, 27, 30, 37, 46], "suse": [16, 37], "enterpris": [16, 37], "sle": [16, 37], "sp3": [16, 37], "sp4": [16, 37], "2023": [16, 32, 33, 37, 52], "19": [16, 28, 32, 36, 37], "later": [16, 29, 32, 36, 37], "manylinux2014": [16, 32, 36, 37], "append": [16, 32, 36, 37], "after": [16, 17, 18, 19, 22, 24, 26, 27, 29, 30, 32, 33, 37, 40, 46], "compon": [16, 17, 19, 30, 32, 33, 34, 37], "icd": [16, 32, 37], "23": [16, 28, 32, 37, 53, 54], "17": [16, 28, 32, 35, 37], "26241": [16, 32, 37], "There": [16, 21, 34, 40, 43, 47], "ye": [16, 19, 33], "wish": [16, 34], "n": [16, 18, 24, 25, 29, 30, 33, 34, 35, 48], "libitex": [16, 35], "ld_library_path": [16, 35], "your_python_sit": 16, "info": [16, 17, 18, 28, 35, 40, 43], "jnp": 16, "jit": 16, "def": [16, 27], "lax_conv": 16, "random": [16, 25, 48], "prngkei": 16, "lh": 16, "rh": 16, "side": 16, "lax": 16, "conv_with_general_pad": 16, "multipli": [16, 27], "itex_gpu_runtim": 16, "129": [16, 28], "servic": 16, "176": [16, 32], "0x56060b5ae740": 16, "doe": [16, 24, 27], "guarante": [16, 32], "184": 16, "0449753": 16, "093208": 16, "1844783": 16, "9769732": 16, "5857391": 16, "6942389": 16, "9218378": 16, "2862523": 16, "1549542": 16, "8367321": 16, "3978379": 16, "3860377": 16, "9456574": 16, "062028": 16, "0365305": 16, "901286": 16, "5255247": 16, "1421617": 16, "0621": 16, "2933435": 16, "1257985": 16, "1095486": 16, "5584903": 16, "1229166": 16, "7746235": 16, "2446113": 16, "7870374": 16, "8216239": 16, "557919": 16, "9832508": 16, "0887792": 16, "5433128": 16, "9749291": 16, "2580051": 16, "6096935": 16, "264905": 16, "175818": 16, "0094342": 16, "005763": 16, "6559253": 16, "3896458": 16, "4036925": 16, "1342552": 16, "8239582": 16, 
"6091168": 16, "434404": 16, "671778": 16, "7397764": 16, "930626": 16, "659667": 16, "6508744": 16, "3305787": 16, "4061482": 16, "0829628": 16, "130649": 16, "6637266": 16, "594426": 16, "2636002": 16, "7168686": 16, "8598001": 16, "9009514": 16, "7938274": 16, "4870623": 16, "6193901": 16, "5297288": 16, "0247464": 16, "0905268": 16, "7598859": 16, "9362347": 16, "9513799": 16, "9403584": 16, "1483061": 16, "hlo_pass_pipelin": 16, "301": 16, "hlo": 16, "pipelin": [16, 39, 40, 42, 47], "jit_lax_conv": 16, "181": 16, "fusion_merg": 16, "multi_output_fus": 16, "conv": [16, 17, 24, 48], "convolut": [16, 29], "gpu_compil": 16, "1221": 16, "llvm": 16, "spir_compil": 16, "255": [16, 19, 27], "compiletargetbinari": 16, "compiletospir": 16, "11": [16, 18, 28, 32, 33, 34, 55], "cumul": 16, "99": 16, "74": 16, "pjrt_stream_executor_cli": 16, "2201": 16, "num_replica": 16, "num_partit": 16, "num_addressable_devic": 16, "2268": 16, "replic": 16, "complet": [16, 29], "1208": 16, "pjrtstreamexecutorbuff": 16, "delet": 16, "1299": 16, "toliter": 16, "v0": [16, 30, 33], "mnist_classifi": 16, "given": [17, 25, 28], "tile": [17, 20, 30, 46, 52, 54], "split": [17, 18, 53], "dimens": 17, "As": [17, 24, 27, 28, 29], "first": [17, 18, 19, 22, 24, 25, 27, 28, 29, 32, 33, 36, 37, 46], "limit": [17, 29, 55], "homogen": 17, "At": [17, 21, 40, 49], "tfg": 17, "mlir": 17, "assum": [17, 27, 29, 33, 34, 46], "matmul": [17, 20, 24, 26, 35], "normal": [17, 20, 25, 27, 29, 34, 43], "autoshard": [17, 54], "back": [17, 27], "under": [17, 23, 26, 28, 30, 34, 47], "primari": [17, 29], "entri": 17, "point": [17, 18, 20, 27, 30, 32, 37, 43], "auto_sharding_pass_mlir": 17, "invok": 17, "hook": 17, "convers": [17, 18, 19, 24], "between": [17, 18, 19, 21, 29, 31, 34, 49, 53, 54], "graphdef": [17, 18], "dialect": 17, "type_infer": 17, "tfg_to_h": 17, "auto_sharding_pass": 17, "hs_to_tfg": 17, "mark": 17, "scope": [17, 35, 53], "unshard": 17, "annot": 17, "uniniti": 17, "properti": [17, 18, 27], "ir": 17, 
"heterogen": [17, 55], "reli": 17, "heurist": 17, "hsp": 17, "per": [17, 27, 28, 29, 33, 52, 54], "semant": [17, 20, 25], "final": [17, 19, 27, 46], "accord": [17, 18, 43, 50, 52, 53], "turn": [17, 55], "graphopt": [17, 18, 19, 43, 54], "ON": [17, 30, 43, 54], "flag": [17, 35], "global": [17, 27, 30, 54], "shardingconfig": [17, 54], "mode": [17, 20, 24, 30, 46, 49, 53], "auto_mod": [17, 54], "paramet": [17, 26, 43], "batch_siz": [17, 19, 27, 54], "stage_num": [17, 54], "decid": 17, "device_num": [17, 54], "graph_opt": [17, 18, 19, 30, 43, 47, 54], "sharding_config": [17, 54], "itex_cfg": [17, 54], "configproto": [17, 18, 19, 43, 47, 54], "set_config": [17, 18, 19, 43, 54], "itex_optimizer_before_shard": 17, "pbtxt": 17, "itex_optimizer_after_shard": 17, "resnet50": [17, 28, 39, 42], "train": [17, 18, 21, 24, 25, 26, 28, 31, 32, 33, 37, 38, 39, 42, 43, 46, 47, 51], "fp16": [18, 19, 39, 42, 43, 46], "bf16": [18, 19, 24, 39, 40, 42, 43, 46, 54], "obvious": 18, "compar": [18, 27, 29, 39, 42], "fp32": [18, 19, 20, 24, 39, 40, 42, 46, 47], "danger": 18, "order": [18, 19, 27, 28, 29, 33, 38], "achiev": [18, 29], "faster": [18, 19, 25, 27, 29, 43], "strong": 18, "four": 18, "allowlist": 18, "denylist": 18, "inferlist": 18, "clearlist": 18, "let": [18, 27, 31], "balanc": [18, 19], "expect": [18, 33, 47, 55], "alwai": [18, 27], "critic": 18, "addition": [18, 27], "downstream": 18, "too": [18, 27, 32, 37], "exp": 18, "gt": [18, 30, 54], "due": [18, 29], "effect": [18, 28, 29], "desir": [18, 28], "explain": 18, "principl": 18, "index": [18, 29, 53], "7": [18, 27, 28, 30, 46, 49], "everi": [18, 20, 49], "ii": [18, 19, 30], "whose": 18, "iii": [18, 19], "deni": 18, "ignor": [18, 27, 31], "iv": [18, 19], "insert": [18, 19, 24, 47], "increas": [18, 27, 47], "priorit": 18, "auto_mixed_precision_opt": [18, 19, 43], "automixedprecosionopt": 18, "16": [18, 27, 28, 30, 36, 43, 46], "32": [18, 25, 26, 27, 28, 30, 43, 46, 51], "data_typ": [18, 19, 43], 
"itex_auto_mixed_precision_data_typ": [18, 19, 43], "ampthre": 18, "default_data_typ": [18, 30], "unsafe_force_al": 18, "itex_auto_mixed_precision_unsafe_force_al": 18, "allowlist_add": [18, 19], "itex_auto_mixed_precision_allowlist_add": [18, 19], "string": [18, 27, 28, 34, 35], "denylist_add": 18, "itex_auto_mixed_precision_denylist_add": 18, "inferlist_add": 18, "itex_auto_mixed_precision_inferlist_add": 18, "clearlist_add": 18, "itex_auto_mixed_precision_clearlist_add": 18, "allowlist_remov": 18, "itex_auto_mixed_precision_allowlist_remov": 18, "denylist_remov": 18, "itex_auto_mixed_precision_denylist_remov": 18, "inferlist_remov": [18, 19], "itex_auto_mixed_precision_inferlist_remov": [18, 19], "clearlist_remov": 18, "itex_auto_mixed_precision_clearlist_remov": 18, "avgpool": [18, 19], "mani": [18, 21, 27, 28, 29, 52], "extra": [18, 27], "up": [18, 22, 27, 29, 32, 35, 39, 42, 46, 49, 51], "tabl": [18, 27, 28], "correspond": [18, 28], "itex_auto_mixed_precision_log_path": [18, 19, 20, 30], "tf_auto_mixed_precision_graph_rewrite_log_path": 18, "tf_auto_mixed_precision_graph_rewrite_level": 18, "tf_auto_mixed_precision_graph_rewrite_allowlist_add": 18, "tf_auto_mixed_precision_graph_rewrite_denylist_add": 18, "tf_auto_mixed_precision_graph_rewrite_inferlist_add": 18, "tf_auto_mixed_precision_graph_rewrite_clearlist_add": 18, "tf_auto_mixed_precision_graph_rewrite_allowlist_remov": 18, "tf_auto_mixed_precision_graph_rewrite_denylist_remov": 18, "tf_auto_mixed_precision_graph_rewrite_inferlist_remov": 18, "tf_auto_mixed_precision_graph_rewrite_clearlist_remov": 18, "With": [18, 19, 27, 28, 40, 44, 48, 49], "most": [18, 19, 27, 28, 29, 43, 50], "basic": [18, 19, 20, 27], "itexauto_mixed_precision_opt": [18, 19], "automixedprecisionopt": [18, 19, 43], "float16graph_opt": [18, 19], "auto_mixed_precision_optionsgraph_opt": 18, "auto_mixed_precis": [18, 19, 30, 43], "onconfig": [18, 19], "itex_auto_mixed_precis": [18, 19, 28, 30, 43], "1export": [18, 19], "avgpool3d": 
[18, 19], "cnn": [18, 29, 39, 40, 42], "v4": [18, 39, 42], "epoch": [18, 19, 27, 46, 52], "slower": [18, 19, 27], "becaus": [18, 19, 27], "subsequ": [18, 19, 27, 29, 49], "alreadi": [18, 27, 33, 40], "howev": [18, 21, 24, 27, 28, 29, 49], "usual": 18, "chanc": [18, 27], "my": [18, 19], "automixedprecis": 18, "1657011814330": 18, "pb": [18, 19, 31, 43], "binari": [18, 31, 34], "txt": [18, 32, 37, 49, 51, 54], "text": [18, 39, 42], "preop": 18, "1657011815538": 18, "pre": [18, 30, 36, 37, 46, 50], "paintbucket": 18, "netron": 18, "softmax": [18, 19, 27], "move": [18, 29, 46], "altern": 18, "abov": [18, 19, 22, 27, 28, 29, 32, 43, 46, 47, 50, 51, 52, 54], "littl": 18, "drop": [18, 28], "occupi": 18, "over": [18, 27], "whole": [18, 20, 30, 46], "runtim": [18, 23, 25, 27, 29, 32, 34, 53, 55], "repeat": 18, "until": [18, 29], "reach": 18, "peak": [18, 23], "consumpt": [19, 21, 27, 43], "kera": [19, 25, 26, 47, 49, 52, 55], "similar": [19, 29], "offer": [19, 29], "frozen": 19, "layernorm": [19, 24, 26], "instancenorm": [19, 26], "swish": [19, 24], "power": [19, 55], "versu": [19, 29], "remapp": [19, 24, 30], "exist": [19, 24, 26, 27, 28, 40], "cover": [19, 21, 24, 28, 29], "than": [19, 25, 27, 29, 32, 37, 43, 48, 52], "knowledg": [19, 29], "possibl": [19, 29, 34], "special": [19, 23, 27, 34], "bfloat16graph_opt": 19, "4096": [19, 27], "unit": [19, 25, 27, 29], "num_unit": [19, 27], "els": [19, 27, 35, 53], "784": [19, 27, 28], "digit": [19, 27], "dens": [19, 20, 27], "dense_1": [19, 27], "dense_2": [19, 27], "dense_logit": [19, 27], "predict": [19, 26, 27, 51], "sparse_categorical_crossentropi": [19, 27], "rmsprop": [19, 27], "metric": [19, 27], "x_train": [19, 27], "y_train": [19, 27], "x_test": [19, 27], "y_test": [19, 27], "dataset": [19, 27, 47, 52], "mnist": [19, 27, 31, 39, 42, 52], "load_data": [19, 27], "reshap": [19, 25, 27], "60000": [19, 27], "astyp": [19, 27, 48], "10000": [19, 25, 27], "histori": [19, 27], "fit": [19, 29], "8192": [19, 27], 
"validation_split": [19, 27], "test_scor": [19, 27], "evalu": [19, 27, 49, 51], "stabil": [19, 27], "rule": 19, "introduct": [19, 55], "adjust": [20, 25], "Not": 20, "rest": [20, 24], "ll": [20, 24], "prioriti": [20, 30], "itex_tile_as_devic": 20, "card": [20, 52], "treat": 20, "itex_fp32_math_mod": 20, "math": [20, 24, 27, 32, 37], "tf32": 20, "bf32": 20, "auto_mixed_precision_log_path": [20, 30], "tf_cpp_max_vlog_level": 20, "itex_cpp_min_log_level": 20, "tf_cpp_min_log_level": 20, "displai": 20, "onc": [20, 27, 29], "across": [20, 25], "iter": [20, 54], "larg": [20, 27, 29, 39], "dump": 20, "bert": [20, 39, 42], "encod": 20, "layer_0": 20, "biasadd": [20, 26], "read": [20, 27, 40], "dt_float": [20, 35], "data_format": [20, 54], "nhwc": [20, 29], "remain": 20, "situat": [20, 30], "preserv": 20, "dpc": [21, 32, 33, 34, 37], "besid": [21, 29], "etc": [21, 32], "aka": 21, "almost": 21, "thing": 21, "expos": [21, 22, 55], "factor": [21, 28], "influenc": [21, 28, 29], "properli": [21, 28], "unifi": [21, 28], "topologi": [21, 28, 29], "combin": [21, 28, 29, 49], "autom": [21, 28], "complic": [21, 28], "launch": [21, 37, 49], "blob": [21, 31], "20230123": 21, "md": 21, "openxla_support_on_gpu": 21, "tfx": 21, "bridg": [21, 31], "streamlin": [21, 31], "deploi": [21, 31], "while": [21, 27, 29, 30, 31, 34, 44, 48, 50], "effici": [21, 29, 31, 54], "easi": [21, 40, 55], "track": [22, 50], "item": 22, "stat": 22, "trace": 22, "viewer": 22, "tensorflow_hub": 22, "tensorboard": [22, 55], "np": [22, 25, 48, 52, 53], "tf_hub": 22, "logpath": 22, "join": [22, 29], "profiler_demo": 22, "set_log_device_plac": 22, "keraslay": 22, "tfhub": 22, "imagenet": [22, 53], "resnet_v1_50": 22, "classif": 22, "ones": [22, 25, 26, 30], "224": 22, "warm": 22, "stop": [22, 29], "demo": 22, "logdir": 22, "bind_al": 22, "analyz": 22, "tab": 22, "dashboard": 22, "refresh": 22, "bring": [23, 27, 28, 55], "deeper": 23, "choos": [23, 25, 27, 28, 29, 34, 38, 43, 47, 48], "These": [24, 27, 28, 55], 
"equal": [24, 29, 53], "notequ": 24, "greaterequ": 24, "greater": [24, 29], "lessequ": 24, "l2loss": 24, "addn": 24, "batchmatmul": [24, 26], "mul": 24, "trainingop": 24, "relu6": 24, "elu": 24, "leakyrelu": 24, "gelu_erf": 24, "gelu_tanh": 24, "tanh": [24, 25, 26], "sigmoid": [24, 25, 26], "fusedbatchnorm": 24, "fusedbatchnormgrad": 24, "relugrad": 24, "biasaddgrad": 24, "convgradfilt": 24, "pad": [24, 25, 48], "break": 24, "closer": 24, "accmatmul": 24, "fusedmatmul": 24, "fusedaccmatmul": 24, "matcher": 24, "withsum": 24, "attribut": [24, 30], "tout": 24, "tpost": 24, "is_bf16_math_mod": 24, "boolean": [24, 28], "indic": [24, 27, 43, 54], "transpos": [24, 26], "conv3d": 24, "maxpool3d": 24, "unnecessari": [24, 27, 29], "ndhwc": 24, "ncdhw": 24, "adam": 25, "decai": 25, "weight_decay_r": 25, "001": [25, 26], "learning_r": [25, 51], "beta_1": 25, "beta_2": 25, "999": 25, "epsilon": [25, 26], "1e": [25, 27], "exclude_from_weight_decai": 25, "layer_norm": 25, "kwarg": [25, 26], "adamw": 25, "describ": [25, 27, 28, 29], "decoupl": 25, "regular": 25, "loshch": 25, "ilov": 25, "hutter": 25, "pdf": 25, "tfa": [25, 26], "trainabl": 25, "piecewiseconstantdecai": 25, "15000": 25, "lr": [25, 52], "wd": 25, "lambda": 25, "ba": 25, "et": 25, "al": 25, "2016": 25, "axi": [25, 26], "scale": [25, 26, 54], "beta_initi": [25, 26], "gamma_initi": [25, 26], "beta_regular": [25, 26], "gamma_regular": [25, 26], "beta_constraint": [25, 26], "gamma_constraint": [25, 26], "independ": [25, 28], "rather": 25, "close": [25, 29], "deviat": 25, "arang": 25, "99998": 25, "group": [25, 29], "yuxin": 25, "wu": 25, "kaim": 25, "he": 25, "divid": [25, 27, 29], "varianc": 25, "empir": 25, "stabl": [25, 27, 39, 42, 55], "norm": 25, "wide": [25, 39, 42], "rang": [25, 27, 29], "linearli": 25, "4d": 25, "gaussian": 25, "where": [25, 27, 29, 34], "nonlinear": 25, "gate": 25, "sign": [25, 32], "arrai": 25, "00404969": 25, "15865526": 25, "8413447": 25, "9959502": 25, "00363725": 25, "158808": 25, 
"841192": 25, "9963627": 25, "long": 25, "short": [25, 27], "hochreit": 25, "schmidhub": 25, "1997": 25, "lstm": 25, "200": [25, 26, 53], "recurrent_activ": [25, 26], "use_bia": [25, 26], "kernel_initi": [25, 26], "glorot_uniform": [25, 26], "recurrent_initi": [25, 26], "orthogon": [25, 26], "bias_initi": [25, 26], "constraint": 25, "fallback": 25, "fast": 25, "mask": 25, "strictli": 25, "outermost": 25, "return_sequ": 25, "return_st": 25, "whole_seq_output": 25, "final_memory_st": 25, "final_carry_st": 25, "experimental_ops_overrid": [26, 30], "overload": 26, "kept": [26, 27], "layernormgrad": 26, "itexlayernorm": 26, "itexlayernormgrad": 26, "itexgelu": 26, "itexgelugrad": 26, "addon": [26, 52, 53], "itexlstm": 26, "itexrnn": 26, "mixed_precis": 27, "mixed_float16": 27, "mixed_bfloat16": 27, "distinguish": 27, "nvidia": [27, 46, 49], "is_gpu_avail": 27, "test_func": 27, "identif": 27, "2022": [27, 28, 30], "14": [27, 52], "02": 27, "52": [27, 28], "41": 27, "061277": 27, "w": 27, "gpu_profil": 27, "111": [27, 29], "warn": [27, 28, 35], "061301": 27, "114": [27, 52], "061306": 27, "118": 27, "063685": 27, "063851": 27, "stream_executor": 27, "cuda": 27, "cuda_driv": 27, "269": 27, "cuinit": 27, "303": 27, "063865": 27, "cuda_diagnost": 27, "156": 27, "dut3046": 27, "atsp": 27, "proc": [27, 29], "caus": [27, 29, 50], "set_global_polici": 27, "slowli": 27, "least": [27, 32, 33], "multi": [27, 29, 30, 33, 34, 54], "worker": [27, 53], "messag": [27, 28], "aspect": 27, "constructor": 27, "numer": 27, "queri": 27, "compute_dtyp": 27, "variable_dtyp": 27, "mention": [27, 29], "next": 27, "domin": 27, "neglig": 27, "therefor": [27, 29], "fewer": 27, "finish": [27, 34, 48, 50], "dense1": 27, "dense2": 27, "previous": 27, "Their": 27, "mismatch": 27, "dtype_polici": 27, "incorrect": 27, "end": [27, 39, 40, 42, 47], "would": [27, 32, 34, 53], "correct": [27, 34], "keep": [27, 29], "middl": 27, "fine": [27, 28, 29, 46], "intermedi": 27, "flow": 27, "occur": 27, "think": 27, 
"But": 27, "necessari": [27, 32, 36, 37, 48], "last": [27, 50], "suffici": 27, "even": [27, 28, 29, 38, 55], "still": 27, "simpli": [27, 54], "particular": 27, "storag": [27, 43, 50], "googleapi": [27, 43, 50], "npz": 27, "11490434": 27, "1u": 27, "don": 27, "divis": 27, "retriev": 27, "scratch": [27, 46], "again": 27, "initial_weight": 27, "get_weight": 27, "6240": 27, "3359": 27, "val_loss": 27, "9755": 27, "val_accuraci": 27, "7494": 27, "83m": 27, "7987": 27, "7520": 27, "3455": 27, "8972": 27, "81m": 27, "3670": 27, "8819": 27, "3753": 27, "8751": 27, "85m": 27, "3555": 27, "8863": 27, "2155": 27, "9377": 27, "84m": 27, "1986": 27, "9410": 27, "4498": 27, "8534": 27, "spend": 27, "afterward": [27, 28, 29], "colab": 27, "rerun": 27, "cell": [27, 49], "On": [27, 29, 32, 36, 37], "significantli": 27, "sped": 27, "world": 27, "doubl": 27, "toi": 27, "entir": 27, "60": [27, 28, 46], "000": 27, "imag": [27, 36, 37, 49, 53], "narrow": 27, "65504": 27, "infin": 27, "much": [27, 29, 47], "256": [27, 54], "inf": 27, "rare": 27, "gradient": 27, "prevent": 27, "concept": [27, 29], "sai": [27, 46], "1024": 27, "greatli": 27, "pseudocod": 27, "loss_scal": 27, "grad": 27, "compute_gradi": 27, "trainable_vari": 27, "tricki": 27, "solv": 27, "explicitli": [27, 28, 30, 47], "wrapper": [27, 37], "lossscaleoptim": 27, "far": 27, "did": [27, 29], "wrap": 27, "highli": 27, "recommend": [27, 29, 30, 31, 32, 33, 34, 36, 37, 41, 47], "been": [27, 29, 49, 54], "known": [27, 50], "loss_object": 27, "sparsecategoricalcrossentropi": 27, "train_dataset": 27, "from_tensor_slic": 27, "shuffl": 27, "test_dataset": 27, "method": [27, 29, 40, 47], "unscal": 27, "get_scaled_loss": 27, "get_unscaled_gradi": 27, "apply_gradi": 27, "nan": 27, "halv": 27, "had": [27, 29], "potenti": [27, 55], "train_step": [27, 46, 54], "gradienttap": 27, "tape": 27, "scaled_loss": 27, "scaled_gradi": 27, "zip": 27, "few": 27, "happen": [27, 50], "qualiti": 27, "test_step": 27, "retrain": 27, "set_weight": 27, 
"epoch_loss_avg": 27, "test_accuraci": 27, "sparsecategoricalaccuraci": 27, "update_st": 27, "924008369445801": 27, "7239000201225281": 27, "5294489860534668": 27, "9168000221252441": 27, "3364005982875824": 27, "9381000399589539": 27, "25294047594070435": 27, "9486000537872314": 27, "26531240344047546": 27, "9536000490188599": 27, "perspect": [28, 29], "numactl": 28, "placement": [28, 29], "polici": [28, 29, 55], "malloc": [28, 29], "unspecifi": 28, "knob": 28, "your_script": 28, "your_script_arg": 28, "latency_mod": 28, "throughput_mod": 28, "often": [28, 32, 36, 37], "calcul": [28, 49], "mutual": 28, "exclus": 28, "infer_resnet50": [28, 44], "undesir": 28, "log_path": 28, "absolut": 28, "rel": 28, "One": [28, 29], "prefix": 28, "_timestamp_inst": 28, "anoth": [28, 29], "_timestamp_instance_n_cor": 28, "run_20210712212258_inst": 28, "run_20210712212258_instance_0_cores_0": 28, "43": [28, 53], "interpret": 28, "no_python": 28, "prepend": [28, 53], "log_file_prefix": 28, "yourself": 28, "ninstanc": 28, "integ": 28, "instance_idx": 28, "among": [28, 29], "ncore_per_inst": 28, "resourc": [28, 29, 50, 53], "node_id": 28, "skip_cross_node_cor": 28, "cross": [28, 29], "disable_numactl": 28, "disable_taskset": 28, "taskset": 28, "use_logical_cor": 28, "core_list": 28, "core_id": 28, "enable_tcmalloc": 28, "enable_jemalloc": 28, "use_default_alloc": 28, "prefer": [28, 32, 36, 37], "certain": [28, 29], "openmp": 28, "kmp_affin": [28, 29], "granular": [28, 29], "compact": [28, 29], "hyper": [28, 29], "our": 28, "enable_itex_amp": 28, "enable_itex_layout_opt": 28, "itex_layout_opt": [28, 29, 30], "num": [28, 29], "intraop": 28, "interop": 28, "run_20221009103552_instance_0_cores_0": 28, "run_20221009103552_inst": 28, "cat": 28, "09": 28, "35": [28, 37], "53": 28, "136": 28, "__main__": 28, "neither": 28, "nor": 28, "conda_prefix": 28, "virtual_env": 28, "lib64": 28, "sdp": 28, "ld_preload": [28, 29], "omp_num_thread": 28, "96": [28, 35], "kmp_blocktim": [28, 29], 
"tf_enable_onednn_opt": 28, "137": 28, "localalloc": 28, "95": 28, "tee": [28, 32, 46, 54], "run_20221009104740_inst": 28, "run_20221009104740_instance_0_cores_0": 28, "191": 28, "47": 28, "908": 28, "909": 28, "192": 28, "run_20221009105044_inst": 28, "run_20221009105044_instance_0_cores_12": 28, "50": 28, "693": 28, "694": 28, "run_20221009105320_inst": 28, "run_20221009105320_instance_0_cores_0": 28, "21": 28, "089": 28, "090": 28, "run_20221009105838_inst": 28, "run_20221009105838_instance_0_cores_0": 28, "run_20221009105838_instance_1_cores_12": 28, "run_20221009105838_instance_2_cores_24": 28, "run_20221009105838_instance_3_cores_36": 28, "run_20221009105838_instance_4_cores_48": 28, "59": 28, "run_20221009105838_instance_5_cores_60": 28, "71": 28, "run_20221009105838_instance_6_cores_72": 28, "83": [28, 29], "run_20221009105838_instance_7_cores_84": 28, "58": 28, "38": 28, "757": 28, "772": 28, "795": 28, "24": [28, 52], "806": 28, "36": 28, "817": 28, "48": 28, "828": 28, "839": 28, "72": 28, "850": 28, "84": [28, 29], "run_20221009110327_inst": 28, "run_20221009110327_instance_0_cores_0": 28, "run_20221009110327_instance_1_cores_4": 28, "run_20221009110327_instance_2_cores_8": 28, "run_20221009110327_instance_3_cores_12": 28, "run_20221009110327_instance_4_cores_16": 28, "run_20221009110327_instance_5_cores_20": 28, "run_20221009110327_instance_6_cores_24": 28, "27": [28, 29, 54], "run_20221009110327_instance_7_cores_28": 28, "31": [28, 32], "run_20221009110327_instance_8_cores_32": 28, "run_20221009110327_instance_9_cores_36": 28, "39": 28, "run_20221009110327_instance_10_cores_40": 28, "run_20221009110327_instance_11_cores_44": 28, "run_20221009110327_instance_12_cores_48": 28, "51": 28, "run_20221009110327_instance_13_cores_52": 28, "run_20221009110327_instance_14_cores_56": 28, "run_20221009110327_instance_15_cores_60": 28, "63": 28, "run_20221009110327_instance_16_cores_64": 28, "67": 28, "run_20221009110327_instance_17_cores_68": 28, 
"run_20221009110327_instance_18_cores_72": 28, "75": 28, "run_20221009110327_instance_19_cores_76": 28, "79": 28, "run_20221009110327_instance_20_cores_80": 28, "run_20221009110327_instance_21_cores_84": 28, "87": 28, "run_20221009110327_instance_22_cores_88": 28, "91": 28, "run_20221009110327_instance_23_cores_92": 28, "03": [28, 53], "198": 28, "215": 28, "216": 28, "229": 28, "241": 28, "254": 28, "266": 28, "278": 28, "20": [28, 35, 36, 54], "290": 28, "302": 28, "28": [28, 29, 33, 37], "315": 28, "327": 28, "339": 28, "351": 28, "364": 28, "376": 28, "388": 28, "56": [28, 29], "400": 28, "413": 28, "425": 28, "68": 28, "438": 28, "452": 28, "76": 28, "465": 28, "80": 28, "480": 28, "494": 28, "88": [28, 51], "509": 28, "92": 28, "run_20221009110849_inst": 28, "run_20221009110849_instance_0_cores_0": 28, "run_20221009110849_instance_1_cores_11": 28, "run_20221009110849_instance_2_cores_22": 28, "run_20221009110849_instance_3_cores_33": 28, "08": 28, "49": [28, 37], "891": 28, "892": 28, "run_20221009110849_instance_1_cores_24": 28, "930": 28, "run_20221009110849_instance_2_cores_48": 28, "951": 28, "run_20221009110849_instance_3_cores_72": 28, "confirm": [28, 34], "34": [28, 53], "586": 28, "assign": [28, 29, 35], "604": 28, "605": 28, "run_20221009111034_instance_0_cores_0": 28, "144": 28, "145": [28, 53, 54], "run_20221009111239_instance_0_cores_24": 28, "run_20221009111753_inst": 28, "run_20221009111753_instance_0_cores_0": 28, "947": 28, "948": 28, "run_20221009111951_inst": 28, "run_20221009111951_instance_0_cores_0": 28, "404": 28, "405": 28, "match": [28, 38], "conf": 28, "549": 28, "550": 28, "malloc_conf": 28, "oversize_threshold": 28, "background_thread": 28, "metadata_thp": 28, "run_20221009112720_instance_0_cores_0": 28, "29": 28, "05": [28, 52], "206": 28, "207": 28, "run_20221009112905_instance_0_cores_0": 28, "911": 28, "run_20221009112956_instance_0_cores_0": 28, "although": 29, "articl": 29, "omp": 29, "briefli": 29, "background": 29, "being": 
29, "socket": [29, 33, 54], "competit": 29, "stall": 29, "busi": 29, "uma": 29, "connect": 29, "control": [29, 39, 42, 47, 54], "remot": 29, "lscpu": [29, 47], "platinum": 29, "8180m": 29, "detect": 29, "onboard": 29, "logic": 29, "thu": 29, "total": [29, 52], "112": 29, "second": [29, 47, 53, 54], "neg": 29, "50ghz": 29, "node0": 29, "node1": 29, "friendli": 29, "nchw": 29, "idea": 29, "bound": 29, "workload": [29, 39, 42, 47, 55], "nth": 29, "man": 29, "cpunodebind": 29, "membind": 29, "wikipedia": [29, 46], "wherebi": 29, "master": [29, 31], "consecut": 29, "fork": 29, "figur": 29, "illustr": 29, "libgomp": 29, "libiomp": 29, "region": 29, "along": 29, "seen": 29, "coupl": 29, "commonli": 29, "gomp": 29, "affin": 29, "comma": 29, "hyphen": 29, "contigu": 29, "gomp_cpu_affin": 29, "omp_proc_bind": 29, "omp_schedul": 29, "static": 29, "ld": 29, "preload": 29, "libiomp5": [29, 35], "kmp": 29, "dramat": 29, "togeth": 29, "thrash": 29, "suppos": [29, 46], "leav": 29, "compet": 29, "strategi": [29, 53], "proclist": 29, "classic": 29, "blocktim": 29, "millisecond": 29, "wait": 29, "sleep": 29, "200m": 29, "elaps": 29, "larger": [29, 34], "reserv": 29, "sole": 29, "penal": 29, "plai": 29, "role": 29, "destruct": 29, "reus": [29, 40], "jemalloc": 29, "hold": 29, "dealloc": 29, "costli": 29, "gperftool": 29, "plu": 29, "nice": 29, "analysi": 29, "xzvf": 29, "heap": 29, "checker": 29, "debugalloc": 29, "flexibl": 30, "protocolmessag": 30, "easili": 30, "tune": [30, 40, 46], "offononoffoff": 30, "itex_onednn_graph": [30, 47], "itex_layout_optitex_remapperitex_auto_mixed_precisionitex_shard": 30, "except": [30, 37], "enum": 30, "itexdatatyp": 30, "datatyp": [30, 35, 46], "toggl": 30, "unless": 30, "field": 30, "onednn_graph": 30, "onednn_graphoverrid": 30, "layout_opt": 30, "itex_remapp": 30, "itex_shard": 30, "xpu_force_sync": 30, "itex_sync_exec": 30, "sync": 30, "hurt": 30, "rais": 30, "valueerror": 30, "git_vers": [30, 33], "7112d33": 30, "onednn_cpu_git_vers": 30, 
"a930253": 30, "onednn_gpu_git_vers": 30, "compiler_vers": 30, "gcc": 30, "20180905": 30, "dpcpp": [30, 32], "122": 30, "tf_compatible_vers": 30, "lt": 30, "put": 31, "libitex_cpu_cc": [31, 35], "libitex_gpu_cc": [31, 35], "l28": 31, "exit": 31, "xxxxx": [31, 54], "kernels_experiment": 31, "tf_cuda_librari": 31, "if_not_mobil": 31, "p1": 31, "tf_serv": 31, "serving_plugin": 31, "l24": 31, "l29": 31, "local_repositori": 31, "org_tensorflow": 31, "wno": 31, "stringop": 31, "truncat": 31, "rm": [31, 35, 41, 43, 53], "rf": [31, 41, 53], "tmp": 31, "mnist_saved_model": 31, "saved_model": 31, "l": [31, 35], "modelserv": 31, "plug": [31, 55], "hub": 31, "port": [31, 49], "rest_api_port": 31, "8501": 31, "model_base_path": 31, "tensorflow_plugin": 31, "path_to_libitex_cpu_cc": 31, "oneapi_install_path": 31, "path_to_libitex_gpu_cc": 31, "mnist_client": 31, "num_test": 31, "1000": 31, "xx": 31, "earli": 32, "effort": 32, "basi": 32, "subystem": 32, "graphic": [32, 34], "101": 32, "4255": 32, "dch": 32, "gpg": 32, "agent": 32, "qo": 32, "dearmor": 32, "keyr": 32, "echo": 32, "deb": 32, "arch": 32, "i386": 32, "jammi": 32, "igc": 32, "cm": 32, "libigc1": 32, "13822": 32, "libigdfcl1": 32, "libigdgmm12": 32, "pub": 32, "sw": 32, "archiv": 32, "instead": [32, 46, 49], "icd_23": 32, "04_amd64": 32, "isol": [32, 36, 37], "basekit": [32, 33, 37], "weekli": 32, "env_check": [32, 33, 37, 55], "quick_exampl": 32, "access": 32, "onemkl": [32, 33, 34, 37], "registrationcent": [32, 37], "akdlm": [32, 37], "irc_na": [32, 37], "992857b9": [32, 37], "624c": [32, 37], "45de": [32, 37], "9701": [32, 37], "f6445d845359": [32, 37], "l_basekit_p_2023": [32, 37], "49397_offlin": [32, 37], "mpi": [32, 33, 37], "deploy": [33, 36, 37], "miniconda": 33, "approach": 33, "easiest": 33, "setup": [33, 36, 38, 40], "press": 33, "curl": 33, "anaconda": 33, "miniconda3": 33, "x86_64": [33, 34], "restart": 33, "termin": 33, "bashrc": 33, "intelpython3_ful": 33, "142f5f29": 33, "ccl": [33, 37], "cluster": 
33, "fi_provid": 33, "though": 34, "virtual": [34, 46, 47, 49, 50, 51, 54], "itex_build": 34, "aot": 34, "ahead": 34, "startup": 34, "prolong": 34, "minut": 34, "tookit": 34, "tree": 34, "prompt": 34, "differenct": 34, "fill": 34, "ats": 34, "m150": 34, "acm": 34, "g11": 34, "ve": 34, "140": 34, "m75": 34, "pvc": 34, "a730m": 34, "g10": 34, "a380": 34, "wrong": 34, "identifi": 34, "libitex_common": 34, "_pywrap_itex": 34, "libitex_cpu": 34, "libitex_gpu": 34, "preconfigur": 34, "bazelrc": 34, "shoul": 35, "diretcori": 35, "llvm_openmp": 35, "pythonhost": 35, "ed": 35, "310fee0477ce46f722c561dd7e21eebca0d1d29bdb3cf4a2335b845fbba4": 35, "cp311": 35, "manylinux_2_17_x86_64": 35, "manylinux2014_x86_64": 35, "b": [35, 43, 47, 53, 54], "unzip": 35, "tensorflow_2": 35, "symbol": 35, "ln": 35, "libtensorflow_cc": 35, "libtensorflow_framework": 35, "libtensorflow": 35, "r2": [35, 54], "install_head": 35, "environment": 35, "library_path": 35, "tf_loadpluggabledevicelibrari": 35, "c_api_experiment": 35, "tf_statu": 35, "lib_path": 35, "client_sess": 35, "standard_op": 35, "newrootscop": 35, "assign_x": 35, "randomnorm": 35, "assign_i": 35, "z": [35, 52], "const": 35, "vz": 35, "vector": 35, "clientsess": 35, "session": [35, 47], "fetch": 35, "tf_check_ok": 35, "matrix": 35, "xpu_lib_path": 35, "c_str": 35, "tf_code": 35, "status_msg": 35, "tf_messag": 35, "makefil": 35, "tf_include_path": 35, "tfcc_path": 35, "example_test": 35, "ltensorflow_framework": 35, "ltensorflow_cc": 35, "wl": 35, "rpath": 35, "tbb": [35, 37], "2nd": 36, "4th": [36, 43], "cento": 36, "sapphir": [36, 43], "rapid": [36, 43], "8888": [36, 37, 43, 47, 49, 50], "pip3": 36, "simultan": 37, "stack": [37, 38], "libiari": 37, "en": 37, "consol": 37, "00": 37, "374832": 37, "itex_cpu_wrapp": 37, "42": 37, "217981": 37, "itex_gpu_wrapp": 37, "205706": 37, "313231": 37, "varieti": [39, 42], "classifi": [39, 42], "bare": [39, 42], "metal": [39, 42], "alexnet": [39, 42], "recogn": [39, 40, 42], "handwrit": [39, 
40, 42], "ai": [39, 40, 42, 45, 47, 55], "zoo": [39, 42], "diffus": [39, 42, 55], "text2imag": [39, 42], "pretrain": 39, "technologi": 40, "big": 40, "blocker": 40, "analyt": 40, "websit": [40, 55], "env_nam": 41, "env_itex": [41, 43, 47, 49, 50, 52], "venv": [41, 52], "internet": 43, "throughput": [43, 49], "seriesintel": 43, "170intel": 43, "seriesne": 43, "seriessupport": 43, "itex_repo": 43, "pwd": [43, 54], "infer_inception_v4_amp": 43, "v1_8": 43, "inceptionv4_fp32_pretrained_model": 43, "set_env_gpu": [43, 44, 50], "ws1": 43, "infer_fp32_vs_amp": 43, "screen": 43, "01837550401687622": 43, "0113076031208038": 43, "fp": 43, "128": [43, 46, 51], "92880015134813": 43, "1691980294577": 43, "6153628825864496": 43, "867908472383153": 43, "wors": 43, "set_env_cpu": [44, 50], "env_itex_cpu": [44, 50], "success": [44, 48, 49, 54], "n02123159": 44, "tiger_cat": 44, "22355853": 44, "legaci": [46, 49, 50, 51, 54], "deeplearningexampl": 46, "tensorflow2": [46, 52], "languagemodel": 46, "pip_set_env": [46, 47, 49, 51], "extract": 46, "squad": [46, 51], "bookcorpu": 46, "data_download": 46, "v1": [46, 47, 51, 55], "google_pretrained_weight": 46, "uncased_l": 46, "24_h": 46, "1024_a": 46, "12_h": 46, "768_a": 46, "tfrecord": 46, "books_wiki_en_corpu": 46, "consum": 46, "v100": 46, "dai": 46, "pretrain_bert": 46, "lamb": 46, "maximum": 46, "sequenc": [46, 49], "length": 46, "phase1": 46, "phase2": 46, "512": [46, 51], "train_batch_size_phase1": 46, "train_batch_size_phase2": 46, "eval_batch_s": 46, "learning_rate_phase1": 46, "5e": 46, "learning_rate_phase2": 46, "usa_xla": 46, "num_gpu": [46, 54], "warmup_steps_phase1": 46, "660": 46, "warmup_steps_phase2": 46, "66": 46, "2600": 46, "save_checkpoint_step": 46, "num_accumulation_steps_phase1": 46, "num_accumulation_steps_phase2": 46, "bert_model": [46, 51], "gbs1": 46, "expr": 46, "gbs2": 46, "pretrain_result_dir": 46, "tf_bert_pretraining_lamb_": 46, "_gbs1_": 46, "_gbs2_": 46, "data_dir": [46, 53], "run_pretraining_lamb": 
46, "pretrain_lamb": 46, "checkpoint": 46, "batch_size_per_gpu": 46, "learning_rate_per_gpu": 46, "use_xla": 46, "squad_vers": 46, "use_mytrain": 46, "pretrain_path": 46, "phase_2": 46, "ckpt": [46, 51], "result_dir": 46, "tf_bert_finetune_": 46, "run_squad": [46, 51], "calibr": 47, "qdq": 47, "dequant": 47, "flower": 47, "photo": 47, "transfer": 47, "stage": 47, "protobuf": 47, "rewriter_config_pb2": 47, "infer_config": 47, "rewrite_opt": 47, "constant_fold": 47, "rewriterconfig": 47, "set_sess": 47, "speedup": [47, 54], "grep": 47, "vnni": 47, "avx_vnni": 47, "amx": 47, "amx_bf16": 47, "amx_int8": 47, "run_jupyt": 47, "yyi": 47, "xxxxxxxx": 47, "ipynb": [47, 49, 50], "mit": 47, "sy": 48, "num_channel": 48, "input_width": 48, "input_height": 48, "filter_width": 48, "filter_height": 48, "rand": 48, "stride": 48, "bias_add": 48, "479142": 48, "7296917": 48, "6456823": 48, "077278": 48, "9259825": 48, "3000765": 48, "3999124": 48, "0527704": 48, "0656753": 48, "85485": 48, "7297122": 48, "9373732": 48, "4818356": 48, "1455178": 48, "4929404": 48, "6422923": 48, "718459": 48, "7090344": 48, "988714": 48, "3391027": 48, "875052": 48, "6461415": 48, "9349675": 48, "327398": 48, "298973": 48, "3905785": 48, "1704025": 48, "9154005": 48, "6926193": 48, "9677248": 48, "481086": 48, "9746864": 48, "8941312": 48, "3221133": 48, "5479512": 48, "197306": 48, "305706": 48, "9873173": 48, "5597944": 48, "250221": 48, "118212": 48, "8672705": 48, "949225": 48, "2636094": 48, "5300783": 48, "1403804": 48, "1729176": 48, "6628485": 48, "2607155": 48, "6342418": 48, "9381838": 48, "6761076": 48, "5063303": 48, "4718971": 48, "8880196": 48, "1658201": 48, "3787665": 48, "1193419": 48, "42261": 48, "318963": 48, "8809638": 48, "6514435": 48, "3549364": 48, "8598063": 48, "517385": 48, "9702091": 48, "9260886": 48, "3804817": 48, "381424": 48, "6027272": 48, "7787259": 48, "9631021": 48, "93901324": 48, "2134862": 48, "89942324": 48, "cv": 49, "concaten": 49, "loop": [49, 54], "hasn": 
49, "reset": 49, "66fa74b6a2a0bb1e563ae8bce66496b118b95200": 49, "ipykernel": 49, "url": [49, 50], "token": [49, 50], "stable_diffussion_infer": 49, "stable_diffusion_infer": 49, "present": 49, "fr\u00e9chet": 49, "distanc": 49, "fid": 49, "outcom": 49, "a100": 49, "stable_diffusion_accuraci": 49, "load_ref_result": 49, "ref_result_dir": 49, "nv_result": 49, "img_arrays_for_acc": 49, "81": [49, 51], "1146879196167": 49, "328223477737884": 49, "tutori": 50, "pacakg": 50, "tensorflow_doc": 50, "classify_text_with_bert": 50, "ip": 50, "f502f0715979ec73c571ca5676ba58431b916f5f58ee3333": 50, "crash": 50, "tri": 50, "traceback": 50, "recent": 50, "174": 50, "__del__": 50, "typeerror": 50, "nonetyp": 50, "callabl": 50, "research": [51, 53], "bert_large_dir": 51, "squad_dir": 51, "output_dir": 51, "vocab_fil": 51, "vocab": 51, "bert_config_fil": 51, "bert_config": 51, "json": 51, "init_checkpoint": 51, "do_train": 51, "train_fil": 51, "do_predict": 51, "predict_fil": 51, "train_batch_s": 51, "3e": 51, "num_train_epoch": 51, "max_seq_length": 51, "doc_strid": 51, "use_tpu": 51, "tpu_nam": 51, "produc": 51, "f1": 51, "41249612335034": 51, "exact_match": 51, "2488174077578": 51, "gin": [52, 53], "raw": 52, "train_horovod": 52, "tensorflow2_keras_mnist": 52, "horovodrun": 52, "18": 52, "54": 52, "006950": 52, "custom_graph_optimizer_registri": 52, "163161": 52, "940695": 52, "107809": 52, "163517": 52, "250": 52, "yym": 52, "xxxx": [52, 53], "yyyi": 52, "zzzz": 52, "yaml": 53, "itex_dummi": 53, "hvd_support_light": 53, "hvd_support": 53, "light": 53, "minimum": 53, "alloc": 53, "growth": 53, "distributedoptim": 53, "lar": 53, "paper": 53, "net": 53, "php": 53, "non": 53, "commerci": 53, "purpos": 53, "pythonpath": 53, "imagenet_data": 53, "config_fil": 53, "number_of_process": 53, "process_per_nod": 53, "correspondingli": 53, "dummi": 53, "rank": 53, "fi": 53, "mpirun": 53, "ppn": 53, "vision": 53, "image_classif": [53, 54], "classifier_train": 53, "train_and_ev": 53, 
"model_typ": 53, "resnet": [53, 54], "i0909": 53, "323099": 53, "140645511436096": 53, "keras_util": [53, 54], "timehistori": [53, 54], "324534": 53, "140611700504384": 53, "037004": 53, "037142": 53, "213994": 53, "300": 53, "214127": 53, "accordingli": 54, "tf_num_interop_thread": 54, "tf_num_intraop_thread": 54, "resnet_ctl_imagenet_main": 54, "train_epoch": 54, "steps_per_loop": 54, "log_step": 54, "skip_ev": 54, "use_synthetic_data": 54, "distribution_strategi": 54, "use_tf_while_loop": 54, "use_tf_funct": 54, "enable_xla": 54, "enable_tensorboard": 54, "enable_checkpoint_and_export": 54, "channels_last": 54, "single_l2_loss_op": 54, "follw": 54, "use_itex_shard": 54, "pramet": 54, "suggest": 54, "2x256x10": 54, "5120": 54, "itex_enable_multiple_stream": 54, "queue": 54, "resnet50_itex": 54, "tfg_optimizer_hook": 54, "289": 54, "i0324": 54, "594147": 54, "140348344015936": 54, "597360": 54, "479": 54, "sec": 54, "train_accuraci": 54, "train_loss": 54, "634554": 54, "161625": 54, "163815": 54, "790632": 54, "792936": 54, "103148": 54, "25": 54, "416651": 54, "419072": 54, "3359284": 54, "025180": 54, "027671": 54, "3343554": 54, "aim": 55, "flexibli": 55, "diagram": 55, "summari": 55, "ecosystem": 55, "estim": 55, "manag": 55, "dockerhub": 55, "come": 55, "soon": 55, "visit": 55, "tour": 55, "collabor": 55, "adher": 55, "innov": 55, "jax": 55, "vulner": 55, "apach": 55, "govern": 55, "forth": 55}, "objects": {}, "objtypes": {}, "objnames": {}, "titleterms": {"contributor": 0, "coven": 0, "code": [0, 7, 17, 19, 34, 35, 46, 48, 49, 50, 51, 52, 54], "conduct": 0, "our": 0, "pledg": 0, "standard": 0, "enforc": 0, "respons": 0, "scope": 0, "guidelin": [0, 7], "1": [0, 11, 16, 31, 32, 35], "correct": 0, "2": [0, 11, 16, 31, 32, 35], "warn": 0, "3": [0, 11, 16, 32], "temporari": 0, "ban": 0, "4": [0, 11, 16, 32], "perman": 0, "attribut": [0, 18], "secur": [1, 55], "polici": [1, 27], "report": 1, "vulner": 1, "intel": [2, 3, 4, 6, 7, 23, 29, 30, 31, 32, 34, 35, 36, 37, 
40, 41, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 56], "extens": [2, 3, 4, 6, 7, 10, 23, 30, 31, 32, 34, 35, 36, 37, 40, 47, 56], "tensorflow": [2, 3, 4, 6, 7, 18, 19, 21, 23, 30, 31, 32, 34, 35, 36, 37, 40, 47, 56], "docker": [2, 3, 31, 36, 37, 43, 45], "contain": [2, 3, 36, 37, 43, 45], "guid": [2, 3, 5, 7, 28, 29, 38, 41, 45], "descript": [2, 3], "binari": [2, 3, 55], "prepar": [2, 3, 35, 41, 43, 44, 46, 49, 50, 51, 52, 53, 54], "usag": [2, 15, 17, 18, 19, 22, 26, 28], "i": [2, 3, 28], "custom": [2, 11, 19, 23, 25, 27], "build": [2, 3, 5, 11, 14, 16, 27, 31, 34, 35, 36, 37], "script": [2, 28, 41], "ii": [2, 3, 28], "iii": [2, 28], "run": [2, 3, 16, 31, 32, 35, 40, 41, 43, 44, 45, 46, 47, 49, 50, 51, 52, 54], "verifi": [2, 11, 32, 36, 37], "That": 2, "gpu": [2, 16, 17, 21, 22, 29, 32, 34, 35, 37, 40, 41, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 54, 55], "access": [2, 29], "from": [2, 14, 31, 35, 36, 37], "serv": [3, 21, 31], "imag": [3, 31], "welcom": [4, 6, 56], "document": [4, 5, 6, 7, 55, 56], "highlight": 4, "onlin": 5, "introduct": [5, 13, 23, 40, 43, 45, 46, 47, 49, 50, 51, 54], "updat": 5, "latest": 5, "version": [5, 30, 47], "creat": [5, 34, 52], "releas": [5, 8, 32], "local": [5, 40, 47], "test": [5, 7, 43], "contribut": [7, 55], "develop": 7, "tip": [7, 19], "debug": 7, "unit": 7, "python": [7, 11, 17, 18, 20, 21, 30, 35, 43, 44, 49, 54], "style": 7, "c": [7, 31, 35], "bazel": [7, 34], "known": 8, "issu": 8, "incompat": 8, "chang": [8, 46, 49, 51], "directori": 9, "tree": 9, "structur": [9, 17], "design": [10, 12, 28], "workflow": [10, 15, 17], "resourc": [10, 55], "how": [11, 27], "write": 11, "op": [11, 25, 30], "prerequisit": [11, 30, 44, 46, 49, 50, 51, 54], "defin": 11, "interfac": 11, "regist": 11, "kernel": 11, "implement": [11, 24], "6": 11, "add": 11, "7": 11, "us": [11, 21, 28, 31], "8": 11, "packag": [11, 35, 37, 54], "9": 11, "instal": [11, 16, 31, 32, 33, 34, 36, 37, 38, 48, 52, 54, 55], "optim": [12, 13, 19, 21, 24, 52, 53], "onednn": 
[13, 47], "object": 13, "cach": 13, "convolut": 13, "frequent": 14, "ask": 14, "question": 14, "troubleshoot": 14, "sourc": [14, 31, 34, 35], "runtim": 14, "int8": [15, 21], "quantiz": [15, 21, 40, 47], "overview": [15, 16, 17, 19, 20, 27, 28, 29, 30, 34], "openxla": [16, 21], "support": [16, 21, 35, 55], "via": [16, 20, 32, 36, 37, 43], "pjrt": 16, "hardwar": [16, 27, 29, 32, 34, 36, 37, 40, 43, 46, 47, 49, 50, 51, 54, 55], "softwar": [16, 29, 32, 36, 37, 55], "requir": [16, 32, 34, 36, 37, 43, 46, 49, 50, 51, 54, 55], "driver": [16, 32, 34, 37, 41], "librari": [16, 31, 35], "jax": 16, "exampl": [16, 17, 18, 19, 22, 28, 34, 35, 39, 42, 44, 46, 48, 49, 51, 52, 53, 54], "xpuautoshard": [17, 21, 54], "experiment": [17, 21, 32], "api": [17, 18, 20, 21, 23, 30, 43, 44, 49, 54], "dump": 17, "graph": [17, 19, 21, 24, 30, 47], "tune": [18, 19, 51], "advanc": [18, 19, 21, 23, 28, 43, 47], "auto": [18, 19, 20, 21], "mix": [18, 19, 20, 21, 24, 27, 43], "precis": [18, 19, 20, 21, 27, 43], "background": [18, 40, 47], "numer": 18, "stabil": 18, "configur": [18, 20, 29, 34, 35, 43, 47], "list": 18, "rule": 18, "improv": 18, "perform": [18, 43, 53], "environ": [18, 20, 28, 30, 32, 33, 34, 36, 37, 40, 41, 43, 44, 46, 47, 49, 50, 51, 52, 54], "variabl": [18, 20, 28, 30, 32, 37, 43], "differ": [18, 27], "stock": [18, 19], "end": 18, "mobilenet": 18, "amp": [19, 21, 43], "v": [19, 28], "data": [19, 24, 53], "type": [19, 24, 27], "featur": [19, 21, 23], "manual": 19, "quick": [19, 45, 48, 55], "train": [19, 27, 45, 50, 52, 53, 54], "setup": [19, 27, 32, 37, 41, 43, 44, 46, 49, 50, 51, 52, 54], "enabl": [19, 41, 43, 44, 46, 47, 49, 50, 51, 52, 54], "origin": 19, "notic": 19, "log": [19, 28], "save": 19, "oper": [19, 21, 25, 26, 30], "itex_verbos": 20, "level": 20, "definit": 20, "backend": 20, "config": [20, 30], "protocol": [20, 30], "option": [20, 32, 35], "eas": 21, "profil": [21, 22], "cpu": [21, 29, 34, 35, 36, 37, 43, 44, 47, 48, 49, 50, 55], "launcher": 21, "faq": [22, 43, 44, 
46, 49, 50, 51], "infrastructur": 23, "architectur": 23, "public": 23, "manag": 23, "xpu": [23, 34, 37, 55], "engin": 23, "fusion": 24, "basic": [24, 28], "detail": 24, "gener": 24, "layout": [24, 29], "itex": [25, 30], "adamwithweightdecayoptim": 25, "layernorm": 25, "groupnorm": 25, "gelu": [25, 26], "itexlstm": 25, "overrid": [26, 30], "layer": 26, "normal": 26, "dens": 26, "activ": 26, "instanc": [26, 28], "lstm": 26, "kera": 27, "identifi": 27, "set": [27, 28, 40, 53, 54], "dtype": 27, "model": [27, 31, 43, 45, 46, 49, 51, 53], "fit": 27, "loss": 27, "scale": 27, "underflow": 27, "overflow": 27, "loop": 27, "launch": 28, "user": 28, "common": [28, 34, 41], "execut": [28, 40, 43, 44, 46, 49, 50, 51, 52, 53, 54], "mode": 28, "latenc": 28, "throughput": 28, "multi": 28, "numa": [28, 29], "control": 28, "memori": [28, 29], "alloc": [28, 29], "singl": 28, "infer": [28, 43, 44, 45, 49], "all": 28, "physic": 28, "core": 28, "includ": 28, "logic": 28, "one": 28, "node": 28, "iv": 28, "your": 28, "number": 28, "multipl": 28, "vi": 28, "vii": 28, "viii": 28, "index": 28, "ix": 28, "tf_num_intraop_thread": 28, "x": 28, "tf_num_interop_thread": 28, "tcmalloc": [28, 29], "jemalloc": 28, "default": 28, "practic": 29, "tabl": [29, 55], "content": 29, "non": 29, "uniform": 29, "format": 29, "numactl": 29, "openmp": 29, "omp_num_thread": 29, "gnu": 29, "import": 30, "intel_extension_for_tensorflow": 30, "name": 30, "preserv": 30, "configproto": 30, "gpuoption": 30, "graphopt": 30, "automixedprecisionopt": 30, "shardingconfig": 30, "debugopt": 30, "set_config": 30, "get_config": 30, "server": [31, 40, 47], "dockerfil": [31, 36, 37], "sampl": 31, "arc": 32, "A": 32, "seri": 32, "window": 32, "subsystem": 32, "linux": 32, "wsl2": 32, "nativ": 32, "directli": 32, "step": [32, 33, 43, 44, 49, 50], "By": 32, "instruct": [32, 33], "ubuntu": 32, "pypi": [32, 34, 36, 37], "wheel": [32, 36, 37], "virtual": [32, 36, 37, 41, 52], "system": [32, 36, 37], "full": 32, "oneapi": [32, 34, 37, 
41, 52], "conda": [33, 34], "precondit": 33, "download": [34, 43, 50, 52, 53], "extra": 34, "onli": [34, 37], "base": [34, 37, 40, 41], "toolkit": [34, 37, 41], "For": 34, "addit": 34, "cc": 35, "header": 35, "file": 35, "extract": 35, "recommend": 35, "integr": 35, "linker": 35, "load": 35, "get": [36, 37, 55], "dockerhub": [36, 37], "bare": [36, 37, 43, 45], "metal": [36, 37, 43, 45], "check": [37, 47], "platform": 37, "acceler": [40, 45, 46, 54], "alexnet": 40, "devcloud": [40, 47], "up": [40, 43], "speed": 43, "incept": [43, 47], "v4": 43, "automat": 43, "skip": [43, 44, 49, 50], "thi": [43, 44, 49, 50], "clone": [43, 52], "repositori": 43, "pretrain": [43, 46], "compar": 43, "fp32": [43, 49], "result": 43, "method": 43, "resnet50": [44, 54], "output": [44, 48, 49, 52, 53, 54], "deep": [45, 47], "learn": [45, 47], "zoo": 45, "workload": 45, "start": [45, 55], "bert": [46, 50, 51], "larg": [46, 51], "dataset": [46, 53], "command": [46, 52, 53, 54], "finetun": 46, "v3": 47, "xeon": 47, "disabl": 47, "constant": 47, "fold": 47, "function": 47, "boost": 47, "matrix": 47, "startup": [47, 50], "jupyt": [47, 49, 50], "notebook": [47, 49, 50], "licens": [47, 55], "quick_exampl": 48, "py": 48, "note": 48, "stabl": 49, "diffus": 49, "text2imag": 49, "fp16": 49, "accuraci": [49, 51], "classifi": [50, 51], "text": [50, 51], "fp8": 51, "fine": 51, "bf16": 51, "distribut": [52, 53], "horovod": [52, 53], "depend": [52, 53], "repo": [52, 53], "patch": 52, "appli": 52, "devic": 52, "count": 52, "inform": 53, "paramet": [53, 54], "hvd": 53, "other": 54, "pythonpath": 54, "without": 54, "With": 54, "shard": 54, "further": 54, "channel": 55, "compat": 55, "weekli": 55}, "envversion": {"sphinx.domains.c": 2, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 8, "sphinx.domains.index": 1, "sphinx.domains.javascript": 2, "sphinx.domains.math": 2, "sphinx.domains.python": 3, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx": 57}, "alltitles": 
{"Contributor Covenant Code of Conduct": [[0, "contributor-covenant-code-of-conduct"]], "Our Pledge": [[0, "our-pledge"]], "Our Standards": [[0, "our-standards"]], "Enforcement Responsibilities": [[0, "enforcement-responsibilities"]], "Scope": [[0, "scope"]], "Enforcement": [[0, "enforcement"]], "Enforcement Guidelines": [[0, "enforcement-guidelines"]], "1. Correction": [[0, "correction"]], "2. Warning": [[0, "warning"]], "3. Temporary Ban": [[0, "temporary-ban"]], "4. Permanent Ban": [[0, "permanent-ban"]], "Attribution": [[0, "attribution"]], "Security Policy": [[1, "security-policy"]], "Report a Vulnerability": [[1, "report-a-vulnerability"]], "Intel\u00ae Extension for TensorFlow* Docker Container Guide": [[2, "intel-extension-for-tensorflow-docker-container-guide"]], "Description": [[2, "description"], [3, "description"]], "Binaries Preparation": [[2, "binaries-preparation"]], "Usage of Docker Container": [[2, "usage-of-docker-container"]], "I. Customize Build Script": [[2, "i-customize-build-script"]], "II. Build the Container": [[2, "ii-build-the-container"], [3, "ii-build-the-container"]], "III. Running the Container": [[2, "iii-running-the-container"]], "Verify That Intel GPU is Accessible From TensorFlow": [[2, "verify-that-intel-gpu-is-accessible-from-tensorflow"]], "Intel\u00ae Extension for TensorFlow* Serving - Docker Container Guide": [[3, "intel-extension-for-tensorflow-serving-docker-container-guide"]], "Build the Docker Image": [[3, "build-the-docker-image"]], "I. 
Binaries Preparation": [[3, "i-binaries-preparation"]], "Running the Container": [[3, "running-the-container"]], "Welcome to Intel\u00ae Extension for TensorFlow* documentation": [[4, "welcome-to-intel-extension-for-tensorflow-documentation"]], "Documentation": [[4, "documentation"], [55, "documentation"]], "Highlights": [[4, "highlights"]], "Online Documentation Build Guide": [[5, "online-documentation-build-guide"]], "Introduction": [[5, "introduction"], [13, "introduction"], [23, "introduction"], [40, "introduction"], [43, "introduction"], [45, "introduction"], [46, "introduction"], [47, "introduction"], [49, "introduction"], [50, "introduction"], [51, "introduction"], [54, "introduction"]], "Update latest Version": [[5, "update-latest-version"]], "Create Release Version": [[5, "create-release-version"]], "Build to Local Test": [[5, "build-to-local-test"]], "Welcome to Intel \u00ae Extension for TensorFlow* documentation!": [[6, "welcome-to-intel-extension-for-tensorflow-documentation"], [56, "welcome-to-intel-extension-for-tensorflow-documentation"]], "Contributing guidelines": [[7, "contributing-guidelines"]], "Contributing to Intel\u00ae Extension for TensorFlow*": [[7, "contributing-to-intel-extension-for-tensorflow"]], "Developing Intel\u00ae Extension for TensorFlow*": [[7, "developing-intel-extension-for-tensorflow"]], "Tips and Debugging": [[7, "tips-and-debugging"]], "Unit testing": [[7, "unit-testing"]], "Python Unit Testing": [[7, "python-unit-testing"]], "Code style guide": [[7, "code-style-guide"]], "Python coding style": [[7, "python-coding-style"]], "C++ coding style": [[7, "c-coding-style"]], "bazel style guide": [[7, "bazel-style-guide"]], "Documentation style guide": [[7, "documentation-style-guide"]], "Releases": [[8, "releases"]], "Known Issues": [[8, "known-issues"]], "Incompatible Changes": [[8, "incompatible-changes"]], "Directory Tree Structure": [[9, "directory-tree-structure"]], "Extension Design": [[10, "extension-design"]], 
"Workflow": [[10, "workflow"], [15, "workflow"], [17, "workflow"]], "Resources": [[10, "resources"], [55, "resources"]], "How to write custom op": [[11, "how-to-write-custom-op"]], "1. Prerequisite": [[11, "prerequisite"]], "2. Define the op interface and Register op": [[11, "define-the-op-interface-and-register-op"]], "3. Register the kernels for the op": [[11, "register-the-kernels-for-the-op"]], "4. Implement the kernels": [[11, "implement-the-kernels"]], "6. Add the op to BUILD": [[11, "add-the-op-to-build"]], "7. Use the op in Python": [[11, "use-the-op-in-python"]], "8. Build the package": [[11, "build-the-package"]], "9. Install and Verify": [[11, "install-and-verify"]], "Optimizations Design": [[12, "optimizations-design"]], "oneDNN object cache optimization": [[13, "onednn-object-cache-optimization"]], "Optimization in convolution": [[13, "optimization-in-convolution"]], "Frequently Asked Questions": [[14, "frequently-asked-questions"]], "Troubleshooting": [[14, "troubleshooting"]], "Build from source": [[14, "build-from-source"], [31, "build-from-source"]], "Runtime": [[14, "runtime"]], "INT8 Quantization": [[15, "int8-quantization"], [21, "int8-quantization"]], "Overview": [[15, "overview"], [17, "overview"], [19, "overview"], [20, "overview"], [27, "overview"], [28, "overview"], [29, "overview"], [30, "overview"], [34, "overview"]], "Usage": [[15, "usage"], [17, "usage"], [18, "usage"], [18, "id1"], [19, "usage"], [22, "usage"], [26, "usage"]], "OpenXLA Support on GPU via PJRT": [[16, "openxla-support-on-gpu-via-pjrt"]], "1. Overview": [[16, "overview"]], "2. 
Hardware and Software Requirement": [[16, "hardware-and-software-requirement"]], "Hardware Requirements": [[16, "hardware-requirements"], [32, "hardware-requirements"], [34, "hardware-requirements"], [36, "hardware-requirements"], [37, "hardware-requirements"], [46, "hardware-requirements"], [49, "hardware-requirements"], [50, "hardware-requirements"], [51, "hardware-requirements"], [54, "hardware-requirements"]], "Software Requirements": [[16, "software-requirements"], [32, "software-requirements"], [36, "software-requirements"], [37, "software-requirements"]], "Install GPU Drivers": [[16, "install-gpu-drivers"], [37, "install-gpu-drivers"]], "3. Build Library for JAX": [[16, "build-library-for-jax"]], "4. Run JAX Example": [[16, "run-jax-example"]], "XPUAutoShard on GPU [Experimental]": [[17, "xpuautoshard-on-gpu-experimental"], [21, "xpuautoshard-on-gpu-experimental"]], "Code Structure": [[17, "code-structure"]], "Python API": [[17, "python-api"], [18, "python-api"], [43, "python-api"], [54, "python-api"]], "Dump the graph": [[17, "dump-the-graph"]], "Examples": [[17, "examples"], [28, "examples"], [39, "examples"], [42, "examples"]], "Tune Advanced Auto Mixed Precision": [[18, "tune-advanced-auto-mixed-precision"]], "Background": [[18, "background"], [40, "background"], [47, "background"]], "Numeric Stability": [[18, "numeric-stability"]], "Configuration List": [[18, "configuration-list"]], "Example of Mix Precision by List": [[18, "example-of-mix-precision-by-list"]], "Rule to Improve Performance by the Configuration List": [[18, "rule-to-improve-performance-by-the-configuration-list"]], "Python API Attribute & Environment Variable": [[18, "python-api-attribute-environment-variable"]], "Environment Variable Difference with Stock TensorFlow": [[18, "environment-variable-difference-with-stock-tensorflow"]], "Example": [[18, "example"], [19, "example"], [35, "example"]], "End-to-end Example": [[18, "end-to-end-example"]], "Tuning Performance Example on 
MobileNet": [[18, "tuning-performance-example-on-mobilenet"]], "Advanced Auto Mixed Precision": [[19, "advanced-auto-mixed-precision"], [19, "id1"]], "Advanced AMP vs. Stock TensorFlow AMP": [[19, "advanced-amp-vs-stock-tensorflow-amp"]], "Data Type": [[19, "data-type"]], "Graph Optimizer": [[19, "graph-optimizer"]], "Feature": [[19, "feature"]], "Tune Advanced AMP Manually": [[19, "tune-advanced-amp-manually"]], "Quick Training Example": [[19, "quick-training-example"]], "Setup": [[19, "setup"], [27, "setup"]], "Enable Advanced AMP": [[19, "enable-advanced-amp"]], "Original Code": [[19, "original-code"]], "Notice": [[19, "notice"]], "Tips": [[19, "tips"]], "Log and Save Optimized Graph": [[19, "log-and-save-optimized-graph"]], "Custom Operation": [[19, "custom-operation"]], "Environment Variables": [[20, "environment-variables"], [28, "environment-variables"]], "Configuration via Environment Variables": [[20, "configuration-via-environment-variables"]], "ITEX_VERBOSE level definition": [[20, "itex-verbose-level-definition"]], "Environment Variables with Python APIs": [[20, "environment-variables-with-python-apis"]], "Backend and Config Protocol": [[20, "backend-and-config-protocol"]], "Auto Mixed Precision Options": [[20, "auto-mixed-precision-options"]], "Features": [[21, "features"]], "Operator Optimization": [[21, "operator-optimization"]], "Graph Optimization": [[21, "graph-optimization"]], "Advanced Auto Mixed Precision (AMP)": [[21, "advanced-auto-mixed-precision-amp"]], "Ease-of-use Python API": [[21, "ease-of-use-python-api"]], "GPU Profiler": [[21, "gpu-profiler"], [22, "gpu-profiler"]], "CPU Launcher [Experimental]": [[21, "cpu-launcher-experimental"]], "OpenXLA Support on GPU [Experimental]": [[21, "openxla-support-on-gpu-experimental"]], "TensorFlow Serving": [[21, "tensorflow-serving"]], "Example:": [[22, "example"]], "FAQ": [[22, "faq"], [43, "faq"], [44, "faq"], [46, "faq"], [49, "faq"], [50, "faq"], [51, "faq"]], "Infrastructure": [[23, 
"infrastructure"]], "Architecture": [[23, "architecture"]], "TensorFlow Public API": [[23, "tensorflow-public-api"]], "Custom API": [[23, "custom-api"]], "Intel Advanced Feature and Extension Management": [[23, "intel-advanced-feature-and-extension-management"]], "XPU Engine": [[23, "xpu-engine"]], "Graph fusion": [[24, "graph-fusion"]], "Basic fusion": [[24, "basic-fusion"]], "Mixed data type fusion": [[24, "mixed-data-type-fusion"]], "Implementation Details": [[24, "implementation-details"]], "Generic layout optimizer": [[24, "generic-layout-optimizer"]], "Customized Operators": [[25, "customized-operators"]], "itex.ops.AdamWithWeightDecayOptimizer": [[25, "itex-ops-adamwithweightdecayoptimizer"]], "itex.ops.LayerNormalization": [[25, "itex-ops-layernormalization"]], "itex.ops.GroupNormalization": [[25, "itex-ops-groupnormalization"]], "itex.ops.gelu": [[25, "itex-ops-gelu"]], "itex.ops.ItexLSTM": [[25, "itex-ops-itexlstm"]], "Operators Override": [[26, "operators-override"]], "Layer Normalization": [[26, "layer-normalization"]], "Dense Layer": [[26, "dense-layer"]], "Gelu Activation": [[26, "gelu-activation"]], "Instance Normalization": [[26, "instance-normalization"]], "LSTM": [[26, "lstm"]], "Keras Mixed Precision": [[27, "keras-mixed-precision"]], "How to identify different hardware types?": [[27, "how-to-identify-different-hardware-types"]], "Setting the dtype policy": [[27, "setting-the-dtype-policy"]], "Building the model": [[27, "building-the-model"]], "Training the model with Model.fit": [[27, "training-the-model-with-model-fit"]], "Loss scaling": [[27, "loss-scaling"]], "Underflow and Overflow": [[27, "underflow-and-overflow"]], "Loss scaling overview": [[27, "loss-scaling-overview"]], "Training the model with a custom training loop": [[27, "training-the-model-with-a-custom-training-loop"]], "Launch Script User Guide": [[28, "launch-script-user-guide"]], "Common Execution Mode": [[28, "common-execution-mode"]], "Latency mode": [[28, "latency-mode"]], 
"Throughput mode": [[28, "throughput-mode"]], "Basic Settings": [[28, "basic-settings"]], "Launch Log": [[28, "launch-log"]], "Advanced Settings": [[28, "advanced-settings"]], "Multi-instance": [[28, "multi-instance"]], "NUMA Control": [[28, "numa-control"]], "Memory Allocator": [[28, "memory-allocator"], [29, "memory-allocator"]], "Single instance for inference": [[28, "single-instance-for-inference"]], "I. Use all physical cores": [[28, "i-use-all-physical-cores"]], "II. Use all cores including logical cores": [[28, "ii-use-all-cores-including-logical-cores"]], "III. Use physical cores on one node": [[28, "iii-use-physical-cores-on-one-node"]], "IV. Use your designated number of cores": [[28, "iv-use-your-designated-number-of-cores"]], "Multiple instances for inference": [[28, "multiple-instances-for-inference"]], "V. Throughput mode": [[28, "v-throughput-mode"]], "VI. Latency mode": [[28, "vi-latency-mode"]], "VII. Your designated number of instances": [[28, "vii-your-designated-number-of-instances"]], "VIII. Your designated number of instances and instance index": [[28, "viii-your-designated-number-of-instances-and-instance-index"]], "Set environment variables for inference": [[28, "set-environment-variables-for-inference"]], "IX. Set environment variable TF_NUM_INTRAOP_THREADS": [[28, "ix-set-environment-variable-tf-num-intraop-threads"]], "X. 
Set environment variable TF_NUM_INTEROP_THREADS": [[28, "x-set-environment-variable-tf-num-interop-threads"]], "Usage of TCMalloc/Jemalloc/Default memory allocator": [[28, "usage-of-tcmalloc-jemalloc-default-memory-allocator"]], "Jemalloc": [[28, "jemalloc"]], "TCMalloc": [[28, "tcmalloc"], [29, "tcmalloc"]], "Default memory allocator": [[28, "default-memory-allocator"]], "Practice Guide": [[29, "practice-guide"]], "Table of Contents": [[29, "table-of-contents"]], "CPU Practice Guide": [[29, "cpu-practice-guide"]], "Hardware Configuration": [[29, "hardware-configuration"]], "Non-Uniform Memory Access (NUMA)": [[29, "non-uniform-memory-access-numa"]], "Software Configuration": [[29, "software-configuration"]], "Memory Layout format": [[29, "memory-layout-format"]], "Numactl": [[29, "numactl"]], "OpenMP": [[29, "openmp"]], "OMP_NUM_THREADS": [[29, "omp-num-threads"]], "GNU OpenMP": [[29, "gnu-openmp"]], "Intel OpenMP": [[29, "intel-openmp"]], "GPU Practice Guide": [[29, "gpu-practice-guide"]], "Python APIs": [[30, "python-apis"]], "Prerequisite: import intel_extension_for_tensorflow as itex": [[30, "prerequisite-import-intel-extension-for-tensorflow-as-itex"]], "Python APIs and Environment Variable Names": [[30, "python-apis-and-environment-variable-names"]], "Python APIs and preserved environment variable Names": [[30, "python-apis-and-preserved-environment-variable-names"]], "Intel\u00ae Extension for TensorFlow* Config Protocol": [[30, "intel-extension-for-tensorflow-config-protocol"]], "itex.ConfigProto": [[30, "itex-configproto"]], "itex.GPUOptions": [[30, "itex-gpuoptions"]], "itex.GraphOptions": [[30, "itex-graphoptions"]], "itex.AutoMixedPrecisionOptions": [[30, "itex-automixedprecisionoptions"]], "itex.ShardingConfig": [[30, "itex-shardingconfig"]], "itex.DebugOptions": [[30, "itex-debugoptions"]], "itex.set_config": [[30, "itex-set-config"]], "itex.get_config": [[30, "itex-get-config"]], "itex operators": [[30, "itex-operators"]], "itex ops override": [[30, 
"itex-ops-override"]], "itex graph": [[30, "itex-graph"]], "itex version": [[30, "itex-version"]], "Install TensorFlow Serving with Intel\u00ae Extension for TensorFlow*": [[31, "install-tensorflow-serving-with-intel-extension-for-tensorflow"]], "Install Model Server": [[31, "install-model-server"]], "Install using Docker": [[31, "install-using-docker"]], "1. Build Intel\u00ae Extension for TensorFlow* C++ library": [[31, "build-intel-extension-for-tensorflow-c-library"]], "2. Build TensorFlow Serving": [[31, "build-tensorflow-serving"]], "Build Docker image from Dockerfile": [[31, "build-docker-image-from-dockerfile"]], "Run sample": [[31, "run-sample"]], "Experimental: Intel\u00ae Arc\u2122 A-Series GPU Software Installation": [[32, "experimental-intel-arc-a-series-gpu-software-installation"]], "Experimental Release": [[32, "experimental-release"]], "Windows Subsystem for Linux 2 (WSL2)": [[32, "windows-subsystem-for-linux-2-wsl2"], [32, "id1"]], "Native Linux Running Directly on Hardware": [[32, "native-linux-running-directly-on-hardware"], [32, "id2"]], "Step-By-Step Instructions": [[32, "step-by-step-instructions"]], "1. Install GPU Drivers": [[32, "install-gpu-drivers"]], "Windows GPU Drivers": [[32, "windows-gpu-drivers"]], "Ubuntu Linux Installed in WSL2": [[32, "ubuntu-linux-installed-in-wsl2"]], "2. Install TensorFlow* via PyPI Wheel in Linux": [[32, "install-tensorflow-via-pypi-wheel-in-linux"]], "Install TensorFlow": [[32, "install-tensorflow"], [34, "install-tensorflow"], [36, "install-tensorflow"], [37, "install-tensorflow"]], "Virtual environment install": [[32, "virtual-environment-install"], [36, "virtual-environment-install"], [37, "virtual-environment-install"]], "System environment install": [[32, "system-environment-install"], [36, "system-environment-install"], [37, "system-environment-install"]], "3. Install Intel\u00ae Extension for TensorFlow*": [[32, "install-intel-extension-for-tensorflow"]], "4. 
Verify the Installation": [[32, "verify-the-installation"]], "Optional: Install Full Intel\u00ae oneAPI": [[32, "optional-install-full-intel-oneapi"]], "Setup environment variables": [[32, "setup-environment-variables"], [37, "setup-environment-variables"]], "Conda Environment Installation Instructions": [[33, "conda-environment-installation-instructions"]], "Preconditions": [[33, "preconditions"]], "Step by step instructions:": [[33, "step-by-step-instructions"]], "Requirements": [[34, "requirements"]], "Common Requirements": [[34, "common-requirements"]], "Install Bazel": [[34, "install-bazel"]], "Download Source Code": [[34, "download-source-code"]], "Create a Conda Environment": [[34, "create-a-conda-environment"]], "Extra Requirements for XPU/GPU Build Only": [[34, "extra-requirements-for-xpu-gpu-build-only"]], "Install Intel GPU Driver": [[34, "install-intel-gpu-driver"]], "Install oneAPI Base Toolkit": [[34, "install-oneapi-base-toolkit"]], "Build Intel\u00ae Extension for TensorFlow* PyPI": [[34, "build-intel-extension-for-tensorflow-pypi"]], "Configure": [[34, "configure"]], "Configure For CPU": [[34, "configure-for-cpu"]], "Configure For GPU/XPU": [[34, "configure-for-gpu-xpu"]], "Build Source Code": [[34, "build-source-code"]], "Additional": [[34, "additional"]], "Configure Example for CPU": [[34, "configure-example-for-cpu"]], "Configure Example For GPU or XPU": [[34, "configure-example-for-gpu-or-xpu"]], "Intel\u00ae Extension for TensorFlow* for C++": [[35, "intel-extension-for-tensorflow-for-c"]], "Prepare": [[35, "prepare"], [41, "prepare"]], "Configure the build": [[35, "configure-the-build"]], "Build the CC library": [[35, "build-the-cc-library"]], "GPU support": [[35, "gpu-support"]], "CPU support": [[35, "cpu-support"]], "Prepare Tensorflow* CC library and header files": [[35, "prepare-tensorflow-cc-library-and-header-files"]], "Option 1: Extract from Tensorflow* python package (Recommended)": [[35, 
"option-1-extract-from-tensorflow-python-package-recommended"]], "Option 2: Build from TensorFlow* source code": [[35, "option-2-build-from-tensorflow-source-code"]], "Integrate the CC library": [[35, "integrate-the-cc-library"]], "Linker": [[35, "linker"]], "Load": [[35, "load"]], "Build and run": [[35, "build-and-run"]], "Intel CPU Software Installation": [[36, "intel-cpu-software-installation"]], "Install via Docker container": [[36, "install-via-docker-container"], [37, "install-via-docker-container"]], "Build Docker container from Dockerfile": [[36, "build-docker-container-from-dockerfile"], [37, "build-docker-container-from-dockerfile"]], "Get docker container from dockerhub": [[36, "get-docker-container-from-dockerhub"], [37, "get-docker-container-from-dockerhub"]], "Install via PyPI wheel in bare metal": [[36, "install-via-pypi-wheel-in-bare-metal"], [37, "install-via-pypi-wheel-in-bare-metal"]], "Install Intel\u00ae Extension for TensorFlow*": [[36, "install-intel-extension-for-tensorflow"], [37, "install-intel-extension-for-tensorflow"]], "Verify the Installation": [[36, "verify-the-installation"], [37, "verify-the-installation"]], "Intel XPU Software Installation": [[37, "intel-xpu-software-installation"]], "Install oneAPI Base Toolkit Packages": [[37, "install-oneapi-base-toolkit-packages"]], "Check the Environment for XPU": [[37, "check-the-environment-for-xpu"]], "XPU for CPU only platform": [[37, "xpu-for-cpu-only-platform"]], "Installation Guide": [[38, "installation-guide"]], "Accelerate AlexNet by Quantization with Intel\u00ae Extension for Tensorflow*": [[40, "accelerate-alexnet-by-quantization-with-intel-extension-for-tensorflow"]], "Hardware Environment": [[40, "hardware-environment"], [47, "hardware-environment"]], "GPU": [[40, "gpu"], [47, "gpu"]], "Local Server": [[40, "local-server"], [47, "local-server"]], "Intel\u00ae DevCloud": [[40, "intel-devcloud"], [47, "intel-devcloud"]], "Running Environment": [[40, "running-environment"], [47, 
"running-environment"]], "Set up Base Running Environment": [[40, "set-up-base-running-environment"]], "Set up Intel\u00ae Extension for Tensorflow* for GPU": [[40, "set-up-intel-extension-for-tensorflow-for-gpu"]], "Execute": [[40, "execute"], [50, "execute"]], "Common Guide for Running": [[41, "common-guide-for-running"]], "Intel GPU Driver": [[41, "intel-gpu-driver"]], "Intel\u00ae oneAPI Base Toolkit": [[41, "intel-oneapi-base-toolkit"]], "Setup Running Environment": [[41, "setup-running-environment"], [43, "setup-running-environment"], [44, "setup-running-environment"], [46, "setup-running-environment"], [49, "setup-running-environment"], [50, "setup-running-environment"], [51, "setup-running-environment"], [52, "setup-running-environment"]], "Running": [[41, "running"]], "Enable oneAPI Running Environment": [[41, "enable-oneapi-running-environment"]], "Enable Virtual Running Environment": [[41, "enable-virtual-running-environment"]], "Run Script": [[41, "run-script"]], "Speed up Inference of Inception v4 by Advanced Automatic Mixed Precision on Intel CPU and GPU via Docker Container or Bare Metal": [[43, "speed-up-inference-of-inception-v4-by-advanced-automatic-mixed-precision-on-intel-cpu-and-gpu-via-docker-container-or-bare-metal"]], "Step": [[43, "step"]], "Hardware Requirement": [[43, "hardware-requirement"], [55, "hardware-requirement"]], "Prepare for GPU (Skip this Step for CPU)": [[43, "prepare-for-gpu-skip-this-step-for-cpu"]], "Clone the Repository": [[43, "clone-the-repository"]], "Download the Pretrained-model": [[43, "download-the-pretrained-model"]], "Enable Running Environment": [[43, "enable-running-environment"], [44, "enable-running-environment"], [46, "enable-running-environment"], [49, "enable-running-environment"], [50, "enable-running-environment"], [51, "enable-running-environment"], [54, "enable-running-environment"]], "Execute Testing and Comparing the Performance of FP32 and Advanced AMP on CPU and GPU in Docker Container or Bare 
Metal": [[43, "execute-testing-and-comparing-the-performance-of-fp32-and-advanced-amp-on-cpu-and-gpu-in-docker-container-or-bare-metal"]], "Environment Variable Configuration": [[43, "environment-variable-configuration"]], "Result": [[43, "result"]], "Advanced: Enable Advanced AMP Method": [[43, "advanced-enable-advanced-amp-method"]], "ResNet50 Inference on Intel CPU and GPU": [[44, "resnet50-inference-on-intel-cpu-and-gpu"]], "Prerequisites": [[44, "prerequisites"], [46, "prerequisites"], [49, "prerequisites"], [50, "prerequisites"], [51, "prerequisites"], [54, "prerequisites"]], "Prepare for GPU (Skip this step for CPU)": [[44, "prepare-for-gpu-skip-this-step-for-cpu"], [49, "prepare-for-gpu-skip-this-step-for-cpu"], [50, "prepare-for-gpu-skip-this-step-for-cpu"]], "Executes the Example with Python API": [[44, "executes-the-example-with-python-api"], [49, "executes-the-example-with-python-api"], [54, "executes-the-example-with-python-api"]], "Example Output": [[44, "example-output"], [48, "example-output"], [49, "example-output"], [54, "example-output"]], "Accelerate Deep Learning Training and Inference for Model Zoo Workloads on Intel GPU": [[45, "accelerate-deep-learning-training-and-inference-for-model-zoo-workloads-on-intel-gpu"]], "Quick Start Guide": [[45, "quick-start-guide"]], "Run Models in the Docker Container": [[45, "run-models-in-the-docker-container"]], "Run Models on Bare Metal": [[45, "run-models-on-bare-metal"]], "Accelerate BERT-Large Pretraining on Intel GPU": [[46, "accelerate-bert-large-pretraining-on-intel-gpu"]], "Model Code change": [[46, "model-code-change"], [49, "model-code-change"], [51, "model-code-change"]], "Prepare for GPU": [[46, "prepare-for-gpu"], [51, "prepare-for-gpu"], [54, "prepare-for-gpu"]], "Prepare Dataset": [[46, "prepare-dataset"]], "Execute the Example": [[46, "execute-the-example"], [51, "execute-the-example"]], "Pretraining Command": [[46, "pretraining-command"]], "Finetune Command": [[46, "finetune-command"]], 
"Quantize Inception V3 by Intel\u00ae Extension for Tensorflow* on Intel\u00ae Xeon\u00ae": [[47, "quantize-inception-v3-by-intel-extension-for-tensorflow-on-intel-xeon"]], "Configuration": [[47, "configuration"]], "Intel\u00ae Extension for Tensorflow* Version": [[47, "intel-extension-for-tensorflow-version"]], "Enable oneDNN Graph": [[47, "enable-onednn-graph"]], "Disable Constant Folding Function": [[47, "disable-constant-folding-function"]], "CPU": [[47, "cpu"]], "Check Intel\u00ae Deep Learning Boost": [[47, "check-intel-deep-learning-boost"]], "Check Intel\u00ae Advanced Matrix Extensions": [[47, "check-intel-advanced-matrix-extensions"]], "Startup Jupyter Notebook": [[47, "startup-jupyter-notebook"], [50, "startup-jupyter-notebook"]], "License": [[47, "license"], [55, "license"]], "Quick Example on Intel CPU and GPU": [[48, "quick-example-on-intel-cpu-and-gpu"]], "Installation": [[48, "installation"]], "Code": [[48, "code"]], "quick_example.py": [[48, "quick-example-py"]], "Notes": [[48, "notes"]], "Stable Diffusion Inference for Text2Image on Intel GPU": [[49, "stable-diffusion-inference-for-text2image-on-intel-gpu"]], "Running the Jupyter Notebook": [[49, "running-the-jupyter-notebook"]], "FP32 Inference": [[49, "fp32-inference"]], "FP16 Inference": [[49, "fp16-inference"]], "Accuracy": [[49, "accuracy"], [51, "accuracy"]], "BERT Training for Classifying Text on Intel CPU and GPU": [[50, "bert-training-for-classifying-text-on-intel-cpu-and-gpu"]], "Download Jupyter Code:": [[50, "download-jupyter-code"]], "FP8 BERT-Large Fine-tuning for Classifying Text on Intel GPU": [[51, "fp8-bert-large-fine-tuning-for-classifying-text-on-intel-gpu"]], "BF16 + FP8 Fine-tuning": [[51, "bf16-fp8-fine-tuning"]], "Distributed Training Example with Intel\u00ae Optimization for Horovod* on Intel\u00ae GPU": [[52, "distributed-training-example-with-intel-optimization-for-horovod-on-intel-gpu"]], "Dependency": [[52, "dependency"], [53, "dependency"]], "Create Virtual 
Environment": [[52, "create-virtual-environment"]], "Install": [[52, "install"], [55, "install"]], "Prepare Example Code": [[52, "prepare-example-code"]], "Clone Horovod Repo": [[52, "clone-horovod-repo"]], "Download Patch": [[52, "download-patch"]], "Apply Patch for Intel GPU": [[52, "apply-patch-for-intel-gpu"]], "Execution": [[52, "execution"], [53, "execution"]], "Enable oneAPI": [[52, "enable-oneapi"]], "Device Count": [[52, "device-count"]], "Running Command": [[52, "running-command"]], "Output": [[52, "output"]], "Distributed Training Example with Intel\u00ae Optimization for Horovod*": [[53, "distributed-training-example-with-intel-optimization-for-horovod"]], "Model Information": [[53, "model-information"]], "Model examples preparation": [[53, "model-examples-preparation"]], "Model Repo": [[53, "model-repo"]], "Download Dataset": [[53, "download-dataset"]], "Set Model Parameters": [[53, "set-model-parameters"]], "HVD command": [[53, "hvd-command"]], "OUTPUT": [[53, "output"]], "Performance Data": [[53, "performance-data"]], "Accelerate ResNet50 Training by XPUAutoShard on Intel GPU": [[54, "accelerate-resnet50-training-by-xpuautoshard-on-intel-gpu"]], "Prepare the Codes": [[54, "prepare-the-codes"]], "Install Other Required Packages": [[54, "install-other-required-packages"]], "Setup PYTHONPATH": [[54, "setup-pythonpath"]], "Without XPUAutoShard": [[54, "without-xpuautoshard"]], "With XPUAutoShard": [[54, "with-xpuautoshard"]], "Sharding Parameters Setting": [[54, "sharding-parameters-setting"]], "Further Settings": [[54, "further-settings"]], "Executing Command": [[54, "executing-command"]], "Quick Get Started*": [[55, "quick-get-started"]], "Software Requirement": [[55, "software-requirement"]], "Installation Channel:": [[55, "installation-channel"]], "Compatibility Table": [[55, "compatibility-table"]], "Install for XPU": [[55, "install-for-xpu"]], "Install for CPU": [[55, "install-for-cpu"]], "Install for weekly binaries": [[55, 
"install-for-weekly-binaries"]], "Install for GPU weekly": [[55, "install-for-gpu-weekly"]], "Contributing": [[55, "contributing"]], "Support": [[55, "support"]], "Security": [[55, "security"]]}, "indexentries": {}})
\ No newline at end of file
+Search.setIndex({"docnames": ["CODE_OF_CONDUCT", "SECURITY", "docker/README", "docker/tensorflow-serving/README", "docs/README", "docs/build_docs/docs_build_tips", "docs/build_docs/source/index", "docs/community/contributing", "docs/community/releases", "docs/design/directory_structure", "docs/design/extension_design", "docs/design/how_to_write_custom_op", "docs/design/optimization/README", "docs/design/optimization/oneDNN_object_cache", "docs/guide/FAQ", "docs/guide/INT8_quantization", "docs/guide/OpenXLA_Support_on_GPU", "docs/guide/XPUAutoShard", "docs/guide/aamp_tune", "docs/guide/advanced_auto_mixed_precision", "docs/guide/environment_variables", "docs/guide/features", "docs/guide/how_to_enable_profiler", "docs/guide/infrastructure", "docs/guide/itex_fusion", "docs/guide/itex_ops", "docs/guide/itex_ops_override", "docs/guide/keras_mixed_precision", "docs/guide/launch", "docs/guide/practice_guide", "docs/guide/python_api", "docs/guide/tf_serving_install", "docs/install/experimental/install_for_arc_gpu", "docs/install/experimental/install_for_gpu_conda", "docs/install/how_to_build", "docs/install/install_for_cpp", "docs/install/install_for_cpu", "docs/install/install_for_xpu", "docs/install/installation_guide", "examples/README", "examples/accelerate_alexnet_by_quantization/README", "examples/common_guide_running", "examples/examples", "examples/infer_inception_v4_amp/README", "examples/infer_resnet50/README", "examples/model_zoo_example/README", "examples/pretrain_bert/README", "examples/quantize_inception_v3/README", "examples/quick_example", "examples/stable_diffussion_inference/README", "examples/train_3d_unet/README", "examples/train_bert/README", "examples/train_bert_fp8/README", "examples/train_horovod/mnist/README", "examples/train_horovod/resnet50/README", "examples/train_resnet50_with_autoshard/README", "get_started", "index"], "filenames": ["CODE_OF_CONDUCT.md", "SECURITY.md", "docker/README.md", "docker/tensorflow-serving/README.md", 
"docs/README.md", "docs/build_docs/docs_build_tips.md", "docs/build_docs/source/index.rst", "docs/community/contributing.md", "docs/community/releases.md", "docs/design/directory_structure.md", "docs/design/extension_design.md", "docs/design/how_to_write_custom_op.md", "docs/design/optimization/README.md", "docs/design/optimization/oneDNN_object_cache.md", "docs/guide/FAQ.md", "docs/guide/INT8_quantization.md", "docs/guide/OpenXLA_Support_on_GPU.md", "docs/guide/XPUAutoShard.md", "docs/guide/aamp_tune.md", "docs/guide/advanced_auto_mixed_precision.md", "docs/guide/environment_variables.md", "docs/guide/features.rst", "docs/guide/how_to_enable_profiler.md", "docs/guide/infrastructure.md", "docs/guide/itex_fusion.md", "docs/guide/itex_ops.md", "docs/guide/itex_ops_override.md", "docs/guide/keras_mixed_precision.md", "docs/guide/launch.md", "docs/guide/practice_guide.md", "docs/guide/python_api.md", "docs/guide/tf_serving_install.md", "docs/install/experimental/install_for_arc_gpu.md", "docs/install/experimental/install_for_gpu_conda.md", "docs/install/how_to_build.md", "docs/install/install_for_cpp.md", "docs/install/install_for_cpu.md", "docs/install/install_for_xpu.md", "docs/install/installation_guide.rst", "examples/README.md", "examples/accelerate_alexnet_by_quantization/README.md", "examples/common_guide_running.md", "examples/examples.md", "examples/infer_inception_v4_amp/README.md", "examples/infer_resnet50/README.md", "examples/model_zoo_example/README.md", "examples/pretrain_bert/README.md", "examples/quantize_inception_v3/README.md", "examples/quick_example.md", "examples/stable_diffussion_inference/README.md", "examples/train_3d_unet/README.md", "examples/train_bert/README.md", "examples/train_bert_fp8/README.md", "examples/train_horovod/mnist/README.md", "examples/train_horovod/resnet50/README.md", "examples/train_resnet50_with_autoshard/README.md", "get_started.md", "index.rst"], "titles": ["Contributor Covenant Code of Conduct", "Security Policy", 
"Intel\u00ae Extension for TensorFlow* Docker Container Guide", "Intel\u00ae Extension for TensorFlow* Serving - Docker Container Guide", "Welcome to Intel\u00ae Extension for TensorFlow* documentation", "Online Documentation Build Guide", "Welcome to Intel \u00ae Extension for TensorFlow* documentation!", "Contributing guidelines", "Releases", "Directory Tree Structure", "Extension Design", "How to write custom op", "Optimizations Design", "oneDNN object cache optimization", "Frequently Asked Questions", "INT8 Quantization", "OpenXLA Support on GPU via PJRT", "XPUAutoShard on GPU [Experimental]", "Tune Advanced Auto Mixed Precision", "Advanced Auto Mixed Precision", "Environment Variables", "Features", "GPU Profiler", "Infrastructure", "Graph fusion", "Customized Operators", "Operators Override", "Keras Mixed Precision", "Launch Script User Guide", "Practice Guide", "Python APIs", "Install TensorFlow Serving with Intel\u00ae Extension for TensorFlow*", "Experimental: Intel\u00ae Arc\u2122 A-Series GPU Software Installation", "Conda Environment Installation Instructions", "Overview", "Intel\u00ae Extension for TensorFlow* for C++", "Intel CPU Software Installation", "Intel XPU Software Installation", "Installation Guide", "Examples", "Accelerate AlexNet by Quantization with Intel\u00ae Extension for Tensorflow*", "Common Guide for Running", "Examples", "Speed up Inference of Inception v4 by Advanced Automatic Mixed Precision on Intel CPU and GPU via Docker Container or Bare Metal", "ResNet50 Inference on Intel CPU and GPU", "Accelerate Deep Learning Training and Inference for Model Zoo Workloads on Intel GPU", "Accelerate BERT-Large Pretraining on Intel GPU", "Quantize Inception V3 by Intel\u00ae Extension for Tensorflow* on Intel\u00ae Xeon\u00ae", "Quick Example on Intel CPU and GPU", "Stable Diffusion Inference for Text2Image on Intel GPU", "Accelerate 3D-Unet Training w/o horovod for medical image segmentation on Intel GPU", "BERT Training for Classifying Text 
on Intel CPU and GPU", "FP8 BERT-Large Fine-tuning for Classifying Text on Intel GPU", "Distributed Training Example with Intel\u00ae Optimization for Horovod* on Intel\u00ae GPU", "Distributed Training Example with Intel\u00ae Optimization for Horovod*", "Accelerate ResNet50 Training by XPUAutoShard on Intel GPU", "Quick Get Started*", "Welcome to Intel \u00ae Extension for TensorFlow* documentation!"], "terms": {"we": [0, 2, 7, 11, 16, 24, 27, 29, 30, 31, 33, 34, 35, 40, 41, 43, 46, 47, 49, 50, 52, 56], "member": [0, 30], "leader": 0, "make": [0, 2, 3, 5, 7, 11, 14, 16, 18, 19, 27, 29, 34, 35, 43], "particip": 0, "commun": [0, 2, 7, 9, 21, 23, 29, 37, 56], "harass": 0, "free": [0, 21, 28], "experi": [0, 4, 21, 23, 29], "everyon": 0, "regardless": 0, "ag": 0, "bodi": 0, "size": [0, 20, 25, 27, 28, 53, 55], "visibl": [0, 2, 11, 31, 54], "invis": 0, "disabl": [0, 15, 19, 28, 29, 30], "ethnic": 0, "sex": 0, "characterist": 0, "gender": 0, "ident": [0, 27], "express": 0, "level": [0, 14, 16, 17, 23, 24, 27, 32], "educ": [0, 54], "socio": 0, "econom": 0, "statu": [0, 11, 19, 35], "nation": 0, "person": 0, "appear": [0, 27], "race": 0, "cast": [0, 18, 24, 27], "color": 0, "religion": 0, "sexual": 0, "orient": 0, "act": [0, 21, 31], "interact": [0, 34], "wai": [0, 14, 19, 27, 31, 33], "contribut": [0, 4, 21, 28, 34], "an": [0, 2, 3, 7, 11, 13, 14, 18, 19, 21, 24, 25, 27, 28, 29, 31, 33, 34, 35, 37, 39, 42, 47, 48, 52, 55, 56], "open": [0, 5, 7, 14, 18, 21, 31, 32, 43, 44, 46, 47, 49, 50, 51, 52, 56], "welcom": [0, 7, 56], "divers": 0, "inclus": 0, "healthi": 0, "exampl": [0, 2, 4, 5, 7, 9, 11, 15, 20, 21, 24, 25, 26, 27, 29, 30, 31, 33, 40, 43, 45, 47, 51, 56], "behavior": [0, 27, 28, 29], "posit": [0, 7], "environ": [0, 4, 11, 13, 15, 19, 21, 22, 23, 27, 29, 31, 35, 38, 39, 42, 54, 56], "includ": [0, 7, 13, 14, 16, 17, 18, 20, 23, 35, 37, 47, 48, 54, 56], "demonstr": [0, 16, 39, 42], "empathi": 0, "kind": [0, 4, 21, 48], "toward": 0, "other": [0, 17, 20, 25, 27, 28, 29, 
30, 31, 32, 34, 37, 51, 53, 54, 56], "peopl": 0, "Being": 0, "respect": [0, 28, 46], "differ": [0, 2, 4, 13, 16, 20, 21, 23, 25, 28, 29, 30, 38, 54], "opinion": 0, "viewpoint": 0, "give": 0, "gracefulli": 0, "accept": [0, 7, 17], "construct": [0, 11, 17, 27], "feedback": [0, 7], "apolog": 0, "those": [0, 18, 19, 31, 54], "affect": [0, 18, 27], "mistak": 0, "learn": [0, 15, 19, 21, 25, 28, 29, 31, 34, 39, 40, 42, 56], "from": [0, 3, 4, 5, 7, 11, 16, 17, 18, 19, 21, 22, 27, 28, 29, 30, 32, 34, 38, 39, 42, 43, 45, 46, 47, 50, 51, 54, 56], "focus": 0, "what": [0, 14, 27], "i": [0, 4, 5, 7, 9, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 29, 30, 31, 32, 33, 34, 35, 36, 37, 39, 40, 42, 43, 44, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56], "best": [0, 32], "just": 0, "u": [0, 16, 22, 28, 37], "individu": [0, 20], "overal": [0, 29], "unaccept": 0, "The": [0, 2, 4, 5, 7, 9, 13, 15, 16, 17, 18, 19, 20, 21, 22, 24, 25, 27, 28, 29, 30, 31, 32, 34, 35, 36, 37, 40, 43, 46, 47, 50, 51, 52, 53, 55], "us": [0, 2, 3, 4, 5, 7, 13, 14, 15, 16, 18, 19, 20, 22, 23, 24, 25, 26, 27, 29, 30, 32, 33, 34, 35, 37, 39, 41, 42, 43, 45, 46, 47, 48, 50, 51, 52, 54, 56], "languag": [0, 35], "imageri": 0, "attent": [0, 20], "advanc": [0, 4, 14, 20, 30, 39, 42, 56], "ani": [0, 4, 11, 20, 21, 23, 24, 27, 28, 32, 33, 34, 37, 40, 48, 51], "troll": 0, "insult": 0, "derogatori": 0, "comment": [0, 7, 14], "polit": 0, "attack": 0, "public": [0, 4, 5, 11, 21, 25, 30, 31], "privat": 0, "publish": [0, 5], "inform": [0, 1, 7, 8, 20, 28, 29, 30, 34, 37, 40, 47, 56], "physic": [0, 29, 55], "email": 0, "address": [0, 29, 32], "without": [0, 4, 18, 20, 21, 23, 27, 34, 39, 42, 47, 51, 56], "explicit": [0, 11, 27, 29], "permiss": [0, 5], "which": [0, 4, 7, 9, 13, 14, 15, 16, 17, 18, 19, 20, 24, 27, 28, 29, 30, 32, 34, 37, 40, 41, 47, 52], "could": [0, 14, 18, 27, 30, 35, 37, 40, 47], "reason": [0, 27], "consid": [0, 18, 53], "inappropri": 0, "profession": 0, "set": [0, 4, 7, 11, 13, 14, 15, 16, 17, 
18, 19, 20, 21, 22, 23, 29, 30, 32, 33, 35, 43, 46, 47, 52, 56], "ar": [0, 2, 4, 5, 7, 11, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 31, 32, 34, 36, 37, 39, 40, 42, 43, 46, 47, 48, 50, 53, 56], "clarifi": 0, "take": [0, 11, 24, 27, 28, 29, 31, 33, 46], "appropri": [0, 3, 29, 34], "fair": 0, "action": [0, 5], "thei": [0, 18, 27, 28, 29], "deem": 0, "threaten": 0, "offens": 0, "harm": 0, "have": [0, 18, 27, 29, 32, 33, 34, 40, 47], "right": [0, 25], "remov": [0, 11, 18, 24, 54], "edit": [0, 2], "reject": 0, "commit": [0, 5, 17, 31, 54], "wiki": 0, "issu": [0, 1, 7, 14, 18, 27, 32, 34, 37, 51, 56], "align": [0, 13], "thi": [0, 2, 3, 5, 11, 13, 14, 16, 17, 18, 19, 20, 21, 23, 24, 25, 27, 28, 29, 30, 31, 33, 34, 35, 37, 40, 41, 45, 46, 47, 48, 50, 52, 55, 56], "moder": 0, "decis": [0, 17], "when": [0, 5, 14, 17, 19, 24, 27, 28, 29, 31, 32, 34, 46, 47, 50, 51], "appli": [0, 17, 25, 27, 30, 46, 49, 50, 52, 54, 55], "within": [0, 15, 25, 32, 46], "all": [0, 7, 11, 14, 18, 20, 21, 25, 27, 29, 32, 37, 40, 43, 46, 54, 55], "space": [0, 29, 56], "also": [0, 4, 7, 15, 16, 17, 19, 21, 23, 27, 28, 29, 32, 33, 36, 37, 56], "offici": [0, 29, 39, 40, 41, 42, 46, 49, 50, 52, 54, 55], "repres": [0, 17], "e": [0, 2, 3, 5, 11, 17, 27, 28, 31, 35, 37], "mail": 0, "post": [0, 7, 18, 19, 24, 30], "via": [0, 11, 17, 39, 42, 55, 56], "social": 0, "media": 0, "account": 0, "appoint": 0, "onlin": [0, 56], "offlin": 0, "event": 0, "instanc": 0, "abus": 0, "otherwis": [0, 17, 27, 30, 47, 48], "mai": [0, 7, 13, 14, 18, 19, 24, 27, 28, 29, 32, 33, 37, 49, 56], "report": [0, 7, 20, 56], "itex": [0, 2, 3, 4, 8, 9, 11, 13, 14, 16, 17, 18, 19, 20, 21, 23, 26, 27, 28, 31, 32, 33, 34, 35, 36, 37, 41, 43, 47, 49, 54, 55, 56], "maintain": [0, 7, 8, 18, 21, 23, 25, 31], "intel": [0, 1, 5, 8, 9, 11, 14, 15, 16, 17, 18, 19, 20, 21, 22, 24, 25, 26, 27, 28, 33, 38, 39, 42, 56], "com": [0, 5, 7, 8, 16, 21, 27, 29, 31, 32, 33, 34, 35, 37, 40, 43, 46, 47, 49, 50, 51, 52, 53, 54, 55, 56], 
"complaint": 0, "review": 0, "investig": [0, 28], "promptli": 0, "fairli": 0, "oblig": 0, "privaci": 0, "secur": 0, "incid": 0, "follow": [0, 2, 3, 7, 15, 17, 18, 22, 24, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 40, 43, 44, 46, 48, 49, 50, 51, 52, 55, 56], "impact": [0, 5, 14, 18, 24, 29, 51], "determin": [0, 11, 27, 29], "consequ": 0, "violat": 0, "unprofession": 0, "unwelcom": 0, "A": [0, 5, 16, 17, 18, 24, 27, 28, 29, 30, 31, 37, 39, 42, 43, 53], "written": [0, 7], "provid": [0, 2, 4, 7, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 32, 33, 34, 37, 39, 40, 42, 46, 47, 50, 55, 56], "clariti": 0, "around": [0, 28, 46], "natur": 0, "explan": 0, "why": 0, "wa": [0, 28, 29, 30, 34], "apologi": 0, "request": [0, 7, 56], "through": [0, 14, 27, 29, 34, 50, 56], "singl": [0, 4, 7, 15, 20, 21, 24, 46], "seri": [0, 16, 29, 30, 34, 37, 40, 43, 45, 46, 47, 49, 50, 51, 52, 53, 55, 56], "continu": [0, 14, 18, 27], "No": [0, 14, 19, 22, 34, 43, 44, 46, 49, 50, 51, 52], "involv": 0, "unsolicit": 0, "specifi": [0, 3, 11, 21, 24, 27, 28, 29, 31, 34], "period": [0, 29], "time": [0, 11, 14, 16, 18, 19, 20, 21, 22, 27, 29, 34, 40, 46], "avoid": [0, 24, 27, 28, 29, 33], "well": [0, 2, 8, 11, 21, 26, 27, 28, 29, 46], "extern": [0, 14, 35], "channel": [0, 24, 25, 38], "like": [0, 2, 7, 16, 17, 25, 27, 29, 30, 41, 43, 52, 53], "term": [0, 25, 56], "lead": [0, 18], "seriou": 0, "sustain": 0, "sort": 0, "allow": [0, 16, 18, 27, 29, 51, 56], "dure": [0, 15, 18, 19, 24, 27, 33, 34, 43], "pattern": [0, 4, 15, 21, 24], "aggress": [0, 18, 19], "disparag": 0, "class": [0, 11, 27, 30], "adapt": 0, "version": [0, 2, 11, 14, 16, 27, 29, 32, 33, 34, 36, 37, 40, 41], "avail": [0, 2, 3, 11, 14, 19, 25, 28, 29, 34, 36, 37, 50, 56], "http": [0, 2, 5, 7, 8, 16, 21, 22, 27, 29, 31, 32, 33, 34, 35, 36, 37, 40, 43, 46, 47, 49, 50, 51, 52, 53, 54, 55, 56], "www": [0, 21, 37], "org": [0, 2, 7, 21, 35, 51, 54], "_": [0, 11, 13, 16, 17, 18, 20, 22, 24, 27, 28, 29, 30, 31, 34, 35, 41, 
43, 46, 47, 48, 49, 50, 51, 52, 53, 54], "html": [0, 5, 37], "were": [0, 28, 29], "inspir": 0, "mozilla": 0, "": [0, 5, 14, 18, 20, 21, 27, 29, 31, 34, 35, 40, 43, 47, 49, 50, 51, 56], "ladder": 0, "For": [0, 1, 2, 7, 11, 14, 15, 16, 18, 19, 20, 23, 25, 26, 27, 28, 30, 31, 32, 35, 37, 43, 44, 45, 46, 49, 50, 51, 52, 53, 55], "answer": 0, "common": [0, 11, 14, 17, 21, 29], "question": [0, 4, 56], "about": [0, 7, 19, 29, 31, 40, 46, 47, 53], "see": [0, 1, 2, 7, 22, 25, 27, 28, 29, 31, 32, 34, 47, 56], "faq": 0, "translat": [0, 34], "center": [1, 4, 16, 21, 25, 26, 30, 34, 37, 40, 43, 45, 46, 47, 49, 50, 51, 52, 53, 55, 56], "more": [1, 4, 7, 11, 16, 18, 19, 21, 25, 29, 31, 32, 34, 37, 40, 46, 47, 48, 53], "how": [1, 5, 14, 16, 17, 18, 29, 31, 34, 35, 37, 39, 42, 53, 56], "work": [1, 4, 7, 14, 15, 19, 20, 21, 27, 28, 29, 35, 40, 47], "resolv": 1, "handl": [1, 13], "guidelin": [1, 4, 45, 56], "document": [2, 3, 27, 33], "ha": [2, 3, 14, 18, 19, 27, 29, 32, 35, 46, 55], "instruct": [2, 3, 4, 7, 18, 19, 21, 29, 36, 37, 49, 56], "assumpt": [2, 3], "host": [2, 3, 27, 37, 43], "machin": [2, 3, 21, 27, 28, 29, 31, 36, 37, 48, 53], "linux": [2, 3, 7, 16, 28, 29, 33, 34, 36, 37, 47], "kernel": [2, 3, 9, 10, 15, 16, 20, 22, 23, 24, 25, 27, 32, 34, 36, 37, 46, 47, 49, 56], "compat": [2, 3, 4, 15, 19, 21, 23, 26, 27, 30, 46, 47, 49, 50, 51, 52], "driver": [2, 3, 14, 27, 33, 40, 43, 47, 56], "instal": [2, 3, 4, 7, 9, 14, 18, 19, 21, 22, 23, 26, 27, 28, 29, 30, 40, 41, 43, 44, 46, 47, 49, 50, 51, 52, 54], "softwar": [2, 33, 38, 40, 47, 48, 53], "refer": [2, 3, 7, 11, 15, 16, 17, 18, 19, 20, 21, 23, 27, 29, 30, 31, 32, 34, 35, 37, 40, 41, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 55, 56], "xpu": [2, 4, 11, 14, 16, 17, 19, 22, 25, 26, 27, 30, 38, 48, 49, 53], "cpu": [2, 3, 4, 9, 11, 14, 15, 18, 19, 20, 23, 24, 27, 30, 31, 38, 39, 40, 42], "detail": [2, 3, 11, 15, 16, 17, 18, 19, 21, 23, 25, 27, 29, 30, 32, 34, 37, 40, 43, 46, 56], "download": [2, 8, 27, 29, 32, 35, 37, 46], "copi": 
[2, 3, 35], "wheel": [2, 33, 34], "model": [2, 3, 13, 15, 16, 17, 18, 19, 20, 21, 22, 29, 30, 39, 40, 42, 47, 51, 53, 55, 56], "directori": [2, 3, 4, 5, 7, 14, 17, 28, 31, 32, 34, 35, 37, 43, 44, 46, 49, 50, 52], "you": [2, 3, 4, 5, 7, 8, 11, 13, 14, 16, 17, 18, 20, 21, 22, 23, 27, 28, 29, 30, 31, 32, 33, 34, 36, 37, 40, 41, 43, 44, 46, 47, 48, 49, 50, 52, 54, 55], "can": [2, 3, 7, 11, 13, 14, 16, 17, 18, 19, 20, 21, 22, 23, 25, 27, 28, 29, 30, 31, 32, 33, 34, 36, 37, 38, 40, 46, 50, 55, 56], "get": [2, 4, 7, 11, 13, 16, 21, 27, 29, 30, 31, 32, 34, 43, 44, 46, 49, 50, 52], "link": [2, 35, 47], "pypi": [2, 38, 56], "project": [2, 5, 7, 56], "file": [2, 5, 7, 14, 17, 18, 22, 28, 31, 32, 37, 43, 44, 46, 49, 50, 51, 52, 54, 56], "lib": [2, 14, 16, 28, 34, 35, 51], "To": [2, 3, 4, 7, 18, 19, 24, 27, 29, 32, 34, 35, 36, 37, 40, 46, 47, 49, 50], "optim": [2, 4, 9, 14, 15, 16, 17, 18, 23, 25, 26, 27, 28, 29, 30, 32, 33, 37, 39, 40, 42, 43, 45, 46, 47, 49, 50, 56], "horovod": [2, 32, 33, 37, 39, 42], "oneapi": [2, 14, 16, 21, 31, 33, 35, 40, 43, 44, 46, 47, 49, 50, 51, 52, 55, 56], "collect": [2, 29, 37], "librari": [2, 3, 11, 28, 29, 32, 34, 37, 50], "oneccl": [2, 32, 33, 37], "mkdir": [2, 3, 54, 55], "cd": [2, 5, 7, 16, 29, 31, 34, 35, 43, 46, 49, 50, 52, 53, 54, 55], "wget": [2, 7, 29, 32, 34, 35, 37, 43, 51, 53], "sh": [2, 3, 5, 14, 31, 32, 33, 34, 35, 37, 41, 43, 44, 46, 47, 49, 50, 51, 52, 53, 56], "o": [2, 16, 22, 32, 33, 35, 37, 39, 47], "some": [2, 11, 16, 18, 19, 26, 27, 28, 29, 34, 46, 53], "python": [2, 4, 9, 14, 16, 19, 22, 23, 25, 26, 27, 28, 29, 31, 32, 33, 34, 36, 37, 40, 41, 46, 47, 48, 50, 51, 52, 53, 54, 56], "hard": [2, 49], "code": [2, 4, 5, 9, 11, 16, 20, 21, 22, 23, 29, 31, 38, 39, 40, 42, 43, 47, 54], "insid": [2, 56], "If": [2, 3, 5, 16, 20, 22, 25, 26, 27, 28, 29, 30, 32, 34, 36, 37, 40, 43, 44, 46, 47, 48, 49, 50, 52, 54], "re": [2, 29, 41], "3": [2, 7, 18, 20, 22, 24, 25, 26, 27, 28, 29, 30, 33, 34, 35, 36, 37, 40, 41, 47, 48, 55], "10": [2, 14, 
16, 18, 19, 25, 27, 28, 32, 34, 36, 37, 47, 55, 56], "2": [2, 14, 15, 17, 18, 19, 20, 24, 25, 27, 28, 29, 30, 33, 34, 36, 37, 40, 43, 44, 46, 47, 48, 49, 50, 52, 53, 54, 55, 56], "13": [2, 16, 32, 33, 34, 35, 36, 37, 40, 47, 53, 55, 56], "ubuntu": [2, 16, 31, 34, 35, 36, 37], "22": [2, 16, 31, 32, 34, 36, 37, 55], "04": [2, 16, 31, 32, 34, 35, 36, 37], "layer": [2, 9, 19, 25, 27, 47], "updat": [2, 18, 27, 31, 32, 33, 34, 35, 36, 37, 55], "shown": [2, 3, 15, 22, 24, 28, 46, 49, 50], "below": [2, 3, 16, 24, 25, 27, 28, 29, 30, 31, 32, 34, 46, 54], "image_nam": [2, 3], "arg": [2, 13, 30], "ubuntu_vers": 2, "python3": [2, 5, 33, 34, 49, 51], "tf_ver": 2, "whl": [2, 11, 32, 34, 35, 56], "t": [2, 5, 11, 13, 17, 18, 20, 27, 28, 49, 51], "f": [2, 32, 35, 56], "dockerfil": 2, "enter": [2, 3, 22, 33, 34], "folder": [2, 3, 19, 31, 34, 54], "command": [2, 3, 14, 16, 22, 28, 29, 32, 33, 34, 36, 37, 41, 43, 47, 52], "start": [2, 3, 14, 21, 22, 27, 28, 31], "v": [2, 3, 18, 31, 33, 35, 37, 41, 43], "option": [2, 3, 7, 11, 16, 18, 21, 28, 30, 34, 54, 55, 56], "mount": [2, 3], "your": [2, 3, 5, 7, 14, 29, 31, 32, 33, 34, 36, 37, 41, 43, 47, 49, 51, 54, 55, 56], "local": [2, 3, 7, 14, 19, 28, 29, 31, 34, 35, 36, 37, 53], "attach": [2, 3, 27, 29], "devic": [2, 3, 4, 9, 10, 11, 13, 14, 16, 17, 19, 20, 21, 22, 23, 24, 27, 30, 31, 34, 35, 37, 43, 54, 55, 56], "dev": [2, 3, 14, 22, 31, 37, 43, 52], "dri": [2, 3, 31, 37, 43], "dir": [2, 3, 7, 46, 50, 51, 52], "workspac": [2, 3, 31, 54], "path": [2, 3, 7, 16, 18, 19, 20, 22, 28, 29, 30, 31, 32, 33, 34, 35, 37, 43, 47, 50, 52, 54, 55, 56], "privileg": [2, 3, 43], "ipc": [2, 3, 37, 43], "http_proxi": [2, 3], "https_proxi": [2, 3], "no_proxi": [2, 3], "bash": [2, 32, 33, 34, 37, 43, 46, 47, 56], "now": [2, 18, 27, 29, 31], "c": [2, 4, 10, 11, 14, 16, 28, 29, 32, 33, 34, 36, 37, 38, 56], "client": [2, 35], "import": [2, 7, 11, 14, 16, 17, 18, 19, 22, 23, 25, 26, 27, 29, 32, 33, 34, 36, 37, 43, 47, 48, 56], "device_lib": 2, "print": [2, 11, 16, 
19, 22, 25, 27, 28, 30, 32, 33, 34, 36, 37, 43, 44, 48, 49, 55, 56], "list_local_devic": 2, "should": [2, 5, 7, 22, 27, 29, 31, 32, 33, 36, 37, 40, 52, 55], "list": [2, 7, 11, 19, 24, 27, 28, 29, 32, 34, 54], "sampl": [2, 22, 40, 47, 49], "output": [2, 7, 11, 13, 19, 20, 24, 25, 27, 30, 32, 34, 35, 43, 47, 52], "look": [2, 16, 24, 31], "name": [2, 3, 4, 5, 7, 11, 14, 16, 18, 19, 20, 25, 26, 27, 29, 31, 39, 42, 49, 53], "0": [2, 5, 11, 14, 15, 16, 19, 20, 22, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 40, 44, 46, 47, 48, 51, 52, 53, 54, 55, 56], "device_typ": [2, 14, 17, 53, 55], "memory_limit": 2, "268435456": 2, "incarn": 2, "9266936945121049176": 2, "xla_global_id": 2, "1": [2, 4, 5, 14, 18, 19, 20, 21, 22, 25, 26, 27, 28, 29, 30, 33, 34, 43, 46, 47, 48, 50, 52, 53, 54, 55, 56], "bus_id": 2, "15031084974591766410": 2, "physical_device_desc": 2, "intel_xpu": 2, "pci": 2, "bu": 2, "id": [2, 31], "undefin": [2, 16], "17448926295332318308": 2, "step": [3, 16, 17, 18, 25, 27, 29, 31, 40, 50, 53, 54, 55], "cpp": [3, 14, 17, 32], "cc": [3, 11, 14, 16, 17, 27, 31, 37, 53, 55], "sourc": [3, 4, 7, 11, 16, 17, 21, 32, 33, 37, 38, 41, 43, 44, 47, 50, 51, 53, 56], "Then": [3, 11, 16, 22, 30, 36, 37, 47], "packag": [3, 16, 29, 32, 33, 34, 36, 40, 47, 50, 51, 56], "p": [3, 25, 31, 36, 37, 43, 54], "bazel": [3, 11, 16, 31, 35], "bin": [3, 7, 11, 16, 28, 31, 34, 35, 41, 43, 44, 47, 50, 51, 53], "cp": [3, 35], "r": [3, 7, 14, 16, 27, 29, 55], "path_to_itex": 3, "out": [3, 15, 16, 27, 35, 44, 48, 49, 55], "k8": [3, 35], "opt": [3, 11, 14, 16, 32, 34, 35, 37, 41, 53], "st": [3, 35], "tar": [3, 7, 29], "cvfh": 3, "path_to_tensorflow_serv": 3, "tensorflow_serv": [3, 31], "model_serv": [3, 31], "tensorflow_model_serv": [3, 31], "gpu": [3, 4, 9, 11, 14, 15, 18, 19, 20, 23, 24, 25, 27, 30, 31, 33, 38, 39, 42], "sure": [3, 11, 16, 27, 32, 34], "meet": [3, 25, 56], "either": [3, 19], "target": [3, 17, 34, 35], "8500": [3, 31], "model_nam": [3, 31], "model_dir": [3, 31, 50, 54, 
55], "overview": 4, "infrastructur": [4, 9, 20], "quick": [4, 11, 39, 42], "releas": [4, 14, 17, 29, 30, 31, 34, 40, 49, 51], "frequent": 4, "ask": [4, 34], "guid": [4, 9, 11, 16, 18, 21, 27, 31, 32, 34, 35, 37, 40, 47], "build": [4, 7, 9, 38, 39, 40, 42, 56], "conda": [4, 14, 38], "distribut": [4, 8, 29, 32, 33, 37, 38, 39, 42, 56], "featur": [4, 7, 8, 11, 13, 17, 25, 29, 34, 39, 42, 47, 55, 56], "variabl": [4, 13, 15, 16, 19, 21, 22, 23, 24, 25, 27, 29, 31, 33, 35, 47], "api": [4, 7, 9, 10, 14, 15, 16, 19, 25, 26, 27, 29, 31, 35, 47, 48], "auto": [4, 11, 17, 28, 30, 35], "mix": [4, 30, 39, 42], "precis": [4, 30, 39, 40, 42, 49, 52], "graph": [4, 9, 10, 13, 15, 18, 20, 23, 39, 42, 48, 55, 56], "custom": [4, 7, 9, 18, 21, 26, 28, 30, 32, 37, 46], "oper": [4, 13, 15, 18, 23, 24, 27, 29, 56], "overrid": [4, 11, 18, 27], "int8": [4, 27, 40, 47], "quantiz": [4, 39, 42], "xpuautoshard": [4, 30, 39, 42], "profil": [4, 9, 27, 29], "launcher": [4, 28, 29], "topic": 4, "practic": [4, 27, 28], "support": [4, 7, 13, 14, 15, 17, 18, 19, 22, 24, 27, 28, 29, 30, 32, 34, 36, 37, 40, 43, 47, 54, 55], "openxla": 4, "develop": [4, 16, 21, 29, 32, 34, 36, 37, 56], "design": [4, 7, 9, 14, 21, 31, 40], "structur": [4, 16, 19, 28, 29], "op": [4, 9, 10, 17, 20, 21, 23, 24, 26, 27, 35, 46, 49], "gener": [4, 5, 20, 21, 23, 27, 28, 29, 31, 33, 34, 36, 43, 47], "default": [4, 7, 13, 14, 15, 18, 19, 20, 21, 23, 27, 29, 30, 34, 37, 46, 47, 48, 54, 55], "configur": [4, 8, 11, 14, 16, 17, 19, 21, 23, 27, 28, 30, 32, 37, 56], "good": [4, 19, 21, 23, 29, 31], "perform": [4, 15, 17, 19, 20, 21, 22, 23, 24, 25, 27, 28, 29, 30, 34, 39, 42, 46, 47, 49, 50, 55, 56], "chang": [4, 5, 7, 11, 18, 19, 20, 21, 23, 27, 28, 33, 39, 40, 42, 51, 53, 54], "simpl": [4, 21, 23, 27, 35], "frontend": [4, 21, 23], "util": [4, 9, 11, 14, 21, 23, 28, 29, 51, 55], "user": [4, 5, 7, 11, 13, 19, 20, 21, 23, 32, 34, 36, 37, 38, 43, 49, 56], "onli": [4, 5, 13, 14, 17, 18, 20, 21, 23, 24, 27, 28, 30, 31, 32, 36, 46, 49, 50, 
51, 52, 54, 55], "minor": [4, 21, 23], "applic": [4, 21, 23, 29, 30, 31, 40], "scenario": [4, 13, 20, 21, 23, 29, 30], "typic": [4, 21, 23, 27, 29], "need": [4, 8, 13, 14, 16, 17, 20, 21, 23, 27, 28, 31, 32, 33, 34, 35, 37, 43, 47, 48, 51, 54, 55], "add": [4, 5, 17, 18, 19, 24, 29, 31, 32, 35, 43, 49, 54, 55], "two": [4, 13, 14, 19, 21, 23, 27, 29, 34, 43, 46, 49, 50], "three": [4, 21, 22, 23, 28], "claus": [4, 21, 23], "origin": [4, 18, 21, 23, 24, 25, 35, 40, 43, 51], "amp": [4, 18, 28, 39, 42, 50, 56], "low": [4, 18, 21, 23, 27, 40], "data": [4, 15, 16, 17, 18, 21, 22, 25, 27, 30, 34, 37, 40, 43, 45, 46, 47, 49, 50, 51, 52, 53, 55, 56], "type": [4, 7, 11, 14, 18, 20, 21, 28, 30, 33, 34, 43], "bfloat16": [4, 11, 18, 19, 21, 24, 27, 30, 43, 46, 50, 52], "float16": [4, 18, 19, 21, 27, 30, 43], "nativ": [4, 15, 21, 54], "3rd": [4, 21, 36], "xeon": [4, 21, 29, 34, 36, 39, 42, 43], "scalabl": [4, 21, 31, 36, 43], "processor": [4, 21, 29, 36, 43, 47, 48], "cooper": [4, 21, 39, 42, 47], "lake": [4, 21], "avx512": [4, 21, 37, 47], "further": [4, 21], "boost": [4, 21, 28, 29], "less": [4, 18, 19, 21, 24, 27, 43], "memori": [4, 9, 11, 13, 14, 15, 18, 19, 21, 25, 27, 43, 54], "lower": [4, 15, 18, 19, 21, 43], "fulli": [4, 19, 21], "enabl": [4, 13, 15, 16, 17, 18, 21, 22, 25, 27, 28, 29, 30, 33, 34, 35, 54], "fuse": [4, 16, 18, 19, 21, 24, 26, 46], "specif": [4, 16, 27, 29, 30, 31, 32, 37, 56], "new": [4, 5, 7, 8, 15, 21, 23, 24, 27, 29, 40], "better": [4, 15, 18, 19, 21, 24, 25, 28, 29, 39, 42, 46, 47, 49, 50], "conv2d": [4, 21, 48], "relu": [4, 11, 16, 19, 21, 24, 25, 26, 27, 48], "linear": [4, 19, 21, 25, 27], "benefit": [4, 21, 27, 29, 30], "fusion": [4, 9, 16, 17, 18, 19, 21, 26, 30], "deliv": [4, 19, 21], "transpar": [4, 21], "fashion": [4, 21], "implement": [4, 7, 10, 16, 17, 19, 21, 23, 25, 26, 29, 56], "sever": [4, 21, 28, 29, 34], "namespac": [4, 17, 21, 23, 25, 26, 30, 35], "extend": [4, 14, 21, 23, 25, 29, 30], "defin": [4, 16, 27], "export": [4, 7, 11, 15, 16, 
17, 18, 19, 21, 22, 27, 28, 29, 31, 33, 35, 41, 43, 47, 52, 54, 55], "ze_enable_tracing_lay": [4, 21, 22, 27], "usecyclespersecondtim": [4, 21, 22, 27], "enable_tf_profil": [4, 21, 22, 27], "co": [4, 14, 15, 21], "neural": [4, 15, 21, 29, 39, 40, 42, 47], "compressor": [4, 15, 21, 39, 40, 42, 47], "solut": [4, 14, 15, 21], "equival": [4, 27], "experiment": [4, 13, 14, 16, 22, 30, 34, 37, 54], "automat": [4, 5, 16, 17, 18, 19, 21, 26, 27, 28, 29, 30, 32, 37, 39, 42, 44, 48, 55], "shard": [4, 17, 21, 30, 54], "input": [4, 11, 13, 17, 19, 20, 21, 22, 24, 25, 27, 30, 55], "place": [4, 17, 21, 29, 35], "maxim": [4, 17, 21, 25, 30, 55], "hardwar": [4, 17, 19, 21, 23, 25, 28, 30, 39, 42], "usag": [4, 14, 21, 29, 30, 39, 42], "adopt": [4, 15, 21], "uniform": [4, 16, 21], "pjrt": [4, 21, 56], "plugin": [4, 10, 16, 18, 19, 21, 22, 31, 34, 53, 56], "mechan": [4, 21], "backend": [4, 16, 21, 23, 26, 27, 30, 37, 43, 44, 47, 48, 56], "show": [5, 14, 16, 18, 27, 34, 35, 37, 39, 40, 42, 43, 45, 46, 47, 49, 50, 51, 52, 54, 55], "script": [5, 21, 22, 29, 34, 43, 46, 48, 50, 51, 54], "relat": [5, 28, 31], "save": [5, 11, 17, 28, 30, 52], "doc": [5, 9, 11, 51], "build_doc": 5, "trigger": [5, 19, 30], "merg": 5, "pr": 5, "github": [5, 7, 8, 16, 21, 29, 31, 34, 35, 37, 40, 43, 46, 49, 50, 52, 53, 54, 55, 56], "repo": [5, 32, 33], "main": [5, 16, 17, 21, 32, 35, 50, 53], "branch": [5, 7, 16, 34, 54], "execut": [5, 11, 13, 15, 16, 17, 18, 19, 20, 22, 25, 27, 29, 47, 48], "content": [5, 35, 37], "doesn": [5, 17, 18, 51], "contain": [5, 9, 15, 17, 28, 29, 31, 38, 39, 42, 50, 56], "won": [5, 28], "product": [5, 7, 21, 31, 32], "git": [5, 11, 16, 30, 31, 34, 35, 43, 46, 49, 50, 52, 53, 54, 55], "tag": [5, 54], "must": [5, 15, 27], "ad": [5, 13, 17, 18, 21, 23, 27, 34, 46, 55], "same": [5, 7, 14, 16, 20, 21, 23, 24, 25, 27, 28, 29, 30, 31, 35, 40, 48, 54], "manual": [5, 7, 18, 27, 28], "result": [5, 15, 16, 17, 19, 22, 27, 29, 30, 33, 40, 44, 46, 48, 49, 51, 55], "gh": 5, "page": [5, 21, 22, 
23, 29, 56], "io": [5, 31], "site": [5, 8, 32, 33, 34, 37, 51, 56], "note": [5, 11, 17, 18, 20, 25, 27, 28, 30, 31, 34, 35, 37, 43, 49, 53, 54], "write": [5, 7, 19], "abl": 5, "clone": [5, 16, 31, 34, 35, 46, 49, 50, 52, 54, 55], "extens": [5, 8, 9, 11, 14, 15, 16, 17, 18, 19, 20, 21, 22, 24, 25, 26, 27, 28, 29, 33, 38, 39, 41, 42, 43, 44, 45, 46, 48, 49, 50, 51, 52, 53, 54, 55, 56], "tensorflow": [5, 8, 9, 10, 11, 13, 14, 15, 16, 17, 20, 22, 24, 25, 26, 27, 28, 29, 33, 38, 39, 41, 42, 43, 44, 45, 46, 48, 49, 50, 51, 52, 53, 54, 55, 56], "checkout": [5, 16, 31, 35, 50, 55], "build_tmp": 5, "m": [5, 16, 28, 29, 40, 41, 49, 50, 53], "push": 5, "befor": [5, 7, 11, 18, 19, 24, 27, 28, 29, 34, 35, 55], "submit": [5, 7, 56], "modifi": [5, 35, 43, 55], "draft": 5, "server": [5, 16, 34, 37], "9000": 5, "web": [5, 51], "browser": [5, 22, 36, 37, 47, 49, 51], "g": [5, 17, 27, 35], "chrome": 5, "127": [5, 31], "localhost": [5, 11, 20, 36, 37, 53], "check": [5, 7, 11, 13, 14, 18, 19, 21, 23, 27, 28, 32, 33, 34, 35, 40, 41, 43, 52, 53, 56], "picker": 5, "function": [5, 16, 17, 20, 21, 23, 25, 26, 27, 29, 30], "want": [5, 7, 27, 28, 32, 34, 37, 49, 52], "switch": [5, 29], "begin": [7, 11, 43], "share": [7, 14, 29, 32, 43, 44, 46, 49, 50, 52], "intent": 7, "team": [7, 49], "base": [7, 11, 14, 15, 16, 18, 19, 25, 29, 32, 33, 36, 43, 46, 47, 52, 53, 55, 56], "bug": [7, 56], "propos": [7, 25], "log": [7, 11, 16, 18, 20, 22, 27, 30, 35, 37, 43, 44, 46, 49, 50, 51, 52, 53, 55], "intend": [7, 56], "approv": 7, "fix": [7, 27, 32], "search": [7, 28], "pick": 7, "d": [7, 32, 34, 35, 54], "pleas": [7, 11, 14, 16, 17, 21, 27, 32, 34, 35, 37, 40, 43, 46, 48, 50, 52, 53, 56], "pull": [7, 31, 36, 37, 43], "ensur": [7, 28], "run": [7, 11, 14, 18, 19, 22, 24, 26, 27, 28, 29, 30, 34, 36, 37, 39, 42, 54, 56], "patch": [7, 31, 46, 49, 50, 52, 54, 55], "signific": [7, 18], "requir": [7, 11, 13, 15, 21, 22, 24, 25, 27, 28, 33, 40], "rfc": [7, 16, 21], "process": [7, 11, 21, 27, 28, 29, 31, 46, 47], 
"consist": [7, 27], "discuss": 7, "promot": 7, "found": [7, 14, 27, 28, 29, 31, 34], "dedic": 7, "contributor": [7, 56], "coven": [7, 56], "conduct": [7, 28], "full": [7, 37], "locat": [7, 8, 34, 35, 46, 49], "benchmark": [7, 50], "llga": [7, 30], "saniti": [7, 56], "migrat": 7, "path_to_python_unit_test": 7, "ut": 7, "find": [7, 11, 22, 29, 31], "py": [7, 11, 16, 22, 28, 31, 43, 44, 49, 50, 51, 52, 53, 54, 55], "do": [7, 14, 16, 19, 27, 28, 30, 34, 47], "done": [7, 22, 27, 29, 32], "standard": [7, 25], "pylint": 7, "against": 7, "definit": [7, 18, 23, 30], "root": [7, 34, 35, 51], "pip": [7, 11, 14, 16, 22, 30, 31, 32, 33, 34, 36, 37, 40, 41, 49, 50, 53, 54, 55, 56], "rcfile": 7, "pylintrc": 7, "myfil": 7, "conform": 7, "googl": [7, 14, 16, 21, 22, 31, 52], "both": [7, 14, 15, 18, 19, 23, 28, 29, 30, 37, 43], "clang": 7, "format": [7, 9, 18, 24, 27, 30], "cpplint": 7, "apt": [7, 16, 31, 32, 37], "12": [7, 14, 27, 28, 37, 46, 49, 53, 55, 56], "inplac": 7, "stdout": [7, 28], "filter": 7, "legal": 7, "copyright": 7, "exclud": 7, "third_parti": [7, 9, 31], "recurs": 7, "sometim": 7, "fals": [7, 17, 25, 27, 28, 46, 52, 55], "error": [7, 11, 14, 20, 25, 27, 31, 43, 44, 46, 49, 50, 51, 52], "nolint": 7, "nolintnextlin": 7, "skip": [7, 27, 28, 33], "line": [7, 27, 29, 31, 43, 51, 55], "mkl": [7, 31, 32, 33, 34, 35, 37], "h": [7, 11, 14, 17, 31, 35, 53], "include_subdir": 7, "buildifi": 7, "tool": [7, 9, 11, 14, 18, 29, 32, 33, 34, 37, 56], "bzl": 7, "convent": 7, "xxx": [7, 47, 51], "tpl": 7, "go": [7, 35, 36, 37], "golang": 7, "dl": 7, "go1": 7, "15": [7, 16, 28, 37], "amd64": [7, 32], "gz": [7, 29], "sudo": [7, 16, 31, 32, 34, 37], "usr": [7, 28, 32], "xzf": 7, "bazelbuild": [7, 34], "buildtool": 7, "src": [7, 11, 14, 17, 31], "home": [7, 28, 32, 36, 37, 51], "NOT": [7, 14], "zzz": 7, "view": 8, "latest": [8, 16, 31, 33, 34, 35, 37, 56], "previou": [8, 25, 29], "valid": [8, 30], "here": [8, 11, 17, 18, 24, 34, 46, 49, 50, 55], "contact": 8, "addit": [8, 21, 23, 24, 29, 
56], "assist": 8, "none": [8, 25, 26, 27, 28, 30], "docker": [9, 38, 39, 42], "docs_build": 9, "core": [9, 11, 14, 16, 17, 26, 27, 29, 34, 35, 37, 47, 48, 53, 55], "test": [9, 19, 22, 27, 31, 33, 39, 42, 50, 51, 56], "third": [9, 56], "parti": [9, 56], "program": [9, 29, 56], "kei": [9, 16, 17, 20, 32], "parent": 9, "sub": [9, 14, 18, 19, 29, 30], "descript": [9, 13, 18, 28, 29, 30, 39, 42, 51], "onednn": [9, 11, 12, 14, 15, 16, 20, 24, 29, 30, 39, 42], "propag": [9, 13, 17], "miscellan": 9, "repositori": [9, 32, 46, 50], "modular": 10, "pluggabl": [10, 35, 37], "streamexecutor": [10, 16], "registr": [10, 11, 50], "pluggabledevic": [10, 56], "pass": [11, 15, 16, 17, 27, 30, 49, 55], "procedur": [11, 16, 32, 36, 37], "tf": [11, 14, 15, 19, 22, 25, 26, 27, 28, 30, 32, 34, 36, 37, 47, 48, 54, 55], "__version__": [11, 30, 32, 34, 36, 37, 56], "verbos": [11, 19, 20, 27, 28], "itex_verbos": [11, 16, 17], "onednn_verbos": 11, "familiar": [11, 16], "architectur": 11, "built": [11, 31, 36, 37], "creat": [11, 18, 27, 28, 30, 33, 35, 37, 41, 47, 50, 55], "offcial": 11, "geluop": 11, "init": [11, 54], "void": 11, "register_geluop": 11, "declar": 11, "call": [11, 15, 16, 26, 27, 29, 30, 38, 41, 47, 48, 51, 52], "nn": [11, 16, 25, 26, 30, 48], "itex_vlog": 11, "statusuniqueptr": 11, "tf_newstatu": [11, 35], "tf_opdefinitionbuild": 11, "op_build": 11, "tf_newopdefinitionbuild": 11, "gelu": [11, 30], "tf_opdefinitionbuilderaddinput": 11, "tf_opdefinitionbuilderaddoutput": 11, "activ": [11, 18, 19, 22, 25, 27, 29, 30, 32, 33, 34, 36, 37, 41, 43, 44, 47, 48, 50, 51, 53], "tf_opdefinitionbuilderaddattr": 11, "half": [11, 27], "float": [11, 18, 20, 27, 30, 35, 43], "approxim": [11, 25], "bool": 11, "true": [11, 22, 25, 26, 27, 28, 30, 46, 52, 55], "tf_opdefinitionbuildersetshapeinferencefunct": 11, "unchanged_shape_fn": 11, "tf_registeropdefinit": 11, "itex_check_eq": 11, "tf_ok": [11, 35], "tf_getcod": [11, 35], "fail": [11, 27, 30], "its": [11, 25, 27, 28, 29, 32, 37, 48], "docstr": 
11, "attr": [11, 20], "might": [11, 34], "debug": [11, 20, 22, 30], "one": [11, 14, 15, 20, 21, 27, 29, 34, 43, 48, 50, 54], "made": [11, 50], "separ": [11, 16, 23, 24, 27, 29, 33, 34, 56], "register_kernel_build": 11, "device_cpu": 11, "typeconstraint": 11, "cpudevic": 11, "device_gpu": [11, 17, 55], "gpudevic": 11, "engin": [11, 14], "polymorph": 11, "load_ops_librari": 11, "load": [11, 27, 31, 37], "register_": 11, "macro": 11, "directli": [11, 17, 27, 28, 29, 37], "relubaseop": 11, "eltwisebaseop": 11, "opkernel": 11, "templat": 11, "typenam": 11, "opkernelconstruct": 11, "context": [11, 25, 29], "dnnl": [11, 13], "algorithm": [11, 25], "eltwise_gelu_erf": 11, "0f": 11, "hasattr": [11, 30], "op_requires_ok": 11, "getattr": 11, "approximate_": 11, "alg_kind_": 11, "eltwise_gelu_tanh": 11, "algo": 11, "alpha": 11, "beta": 11, "eltwis": 11, "rewrit": [11, 16, 17], "comput": [11, 15, 16, 25, 27, 29, 32, 40, 48, 49, 56], "ctx": 11, "alpha_": 11, "beta_": 11, "opkernelcontext": 11, "try": [11, 21, 28, 40, 47], "onednn_engin": 11, "creatednnlengin": 11, "tensor": [11, 25, 27, 35, 48], "dst_tensor": 11, "nullptr": 11, "noth": 11, "return": [11, 16, 17, 27, 30, 35], "src_tensor": 11, "shape": [11, 13, 17, 19, 25, 27, 48], "num_el": 11, "allocate_output": 11, "kdstindex": 11, "forward": [11, 27, 49], "descriptor": 11, "primit": [11, 13, 20], "eltwise_forward": 11, "desc": [11, 13], "fwd_desc": 11, "prop_kind": 11, "src_md": 11, "primitive_attr": 11, "set_scratchpad_mod": 11, "scratchpad_mod": 11, "primitive_desc": 11, "fwd_pd": 11, "fwd_primit": 11, "onednn_stream": 11, "creatednnlstream": 11, "std": [11, 35], "unordered_map": 11, "int": [11, 35], "fwd_primitive_arg": 11, "dnnl_arg_src": 11, "src_mem": 11, "dnnl_arg_dst": 11, "dst_mem": 11, "dnnl_arg_scratchpad": 11, "scratchpad_mem": 11, "catch": 11, "protect": 11, "eltwise_relu": 11, "hpp": 11, "It": [11, 14, 15, 16, 17, 18, 19, 20, 21, 27, 29, 33, 34, 39, 42, 47, 50, 51, 56], "elig": 11, "infer": [11, 15, 17, 18, 19, 
24, 27, 31, 39, 40, 42, 47, 51], "backward": [11, 27], "descibl": 11, "click": [11, 34], "header": 11, "itex_xpu_librari": 11, "relu_op": 11, "hdr": [11, 31], "relu_op_functor": 11, "eltwise_base_hdr": 11, "copt": [11, 31], "tf_copt": [11, 31], "linkstat": 11, "dep": [11, 31], "alwayslink": [11, 31], "gpu_kernel": 11, "In": [11, 16, 18, 19, 27, 28, 29, 33, 40, 43, 47, 48, 53, 55], "tip": [11, 20, 29, 31], "compil": [11, 14, 16, 19, 21, 27, 29, 30, 31, 32, 33, 34, 35, 37], "name_scop": 11, "convert_to_tensor": 11, "intel_extension_for_tensorflow": [11, 17, 18, 19, 25, 26, 27, 28, 31, 32, 33, 34, 36, 37, 43, 56], "clean": [11, 35], "xfd": 11, "config": [11, 14, 16, 17, 18, 19, 27, 31, 32, 34, 35, 37, 43, 47, 53, 54, 55], "pip_packag": [11, 34], "build_pip_packag": [11, 34], "uninstal": 11, "intel_extension_for_tensorflow_lib": [11, 34], "x": [11, 19, 25, 26, 27, 34, 35, 43, 48, 53], "constant": [11, 15, 25, 26, 27], "dtype": [11, 19, 25, 26, 48, 55], "float32": [11, 16, 19, 24, 25, 26, 27, 46, 48, 50], "y": [11, 16, 25, 26, 27, 32, 34, 35, 43, 53, 56], "nn_op": 11, "141": 11, "common_runtim": 11, "eager": [11, 25], "1445": 11, "job": [11, 20, 35], "replica": [11, 20], "task": [11, 20, 29, 54], "100": [11, 27, 30, 46, 54], "eltwise_bas": 11, "44": [11, 28], "exec": [11, 13], "ocl": 11, "gen9": 11, "forward_train": 11, "data_f32": 11, "block": [11, 29, 30, 37], "f0": 11, "diff_undef": 11, "undef": 11, "scratchpad": [11, 13], "alg": 11, "5": [11, 18, 19, 20, 22, 25, 27, 30, 32, 34, 35, 36, 46, 48, 52, 55], "xxxxxx": 11, "op_kernel": 11, "773": 11, "object": [12, 14, 18, 27, 29, 30, 43, 44, 46, 49, 50, 51, 52], "cach": [12, 15, 29], "creation": 13, "overhead": [13, 27, 29], "becom": [13, 29], "notic": [13, 27], "especi": [13, 33], "small": [13, 25, 27, 28, 29], "latenc": [13, 43, 49], "bind": [13, 29, 35], "node": [13, 18, 20, 24, 29, 33, 40], "By": [13, 27, 28, 29, 47], "off": [13, 28, 30, 47, 55, 56], "dynam": [13, 27, 29], "mean": [13, 14, 18, 25, 27, 28, 29, 34], 
"invalid": [13, 29], "dim": 13, "meta": 13, "layout": [13, 28, 30], "parallel": [13, 16, 29], "schedul": [13, 25, 28, 29], "thread": [13, 28, 29, 30, 37], "safe": [13, 18, 30, 56], "stream": [13, 49, 55], "demand": [13, 56], "satisfi": [13, 23], "concurr": [13, 29], "case": [13, 18, 19, 21, 27, 28, 29, 43, 54], "mutex": 13, "lock": 13, "weight": [13, 25, 27, 46, 48, 55], "bia": [13, 20, 24, 25, 48], "temporari": 13, "area": 13, "reorder": 13, "argument": [13, 25, 27, 28, 30], "whether": [14, 24, 28, 29], "successfulli": [14, 31, 33, 34, 35, 37, 55], "platform": [14, 16, 27, 29, 30, 32, 34, 36, 46, 49, 50, 51, 52, 55], "zero": [14, 16, 25, 26, 27, 32], "opencl": [14, 16, 32, 37], "And": [14, 32, 36, 37], "high": [14, 16, 17, 27, 29, 56], "list_physical_devic": [14, 19, 27], "tell": 14, "regist": [14, 16, 40, 47], "2021": 14, "07": [14, 25, 37, 55], "01": [14, 30], "06": [14, 27], "40": [14, 28], "55": [14, 28, 29, 55], "510076": 14, "dpcpp_runtim": [14, 27], "116": 14, "select": [14, 16, 27, 28, 30, 49, 56], "physicaldevic": [14, 53], "physical_devic": [14, 53], "know": [14, 19, 27], "rate": [14, 15, 18, 25, 31], "system": [14, 21, 29, 31, 33, 34], "monitor": 14, "capabl": [14, 27], "clock": 14, "frequenc": 14, "eu": 14, "count": 14, "amount": [14, 27], "so": [14, 16, 19, 27, 28, 29, 30, 31, 34, 35, 43, 44, 46, 49, 50, 51, 52, 53], "each": [14, 25, 27, 28, 29, 55], "modul": [14, 16, 17, 28], "relationship": [14, 18], "replac": [14, 25, 26, 31, 35], "stock": [14, 23, 24, 27, 32, 33, 36, 37, 40, 46, 49, 50, 51, 52, 55, 56], "sinc": [14, 27, 29], "9": [14, 16, 18, 25, 28, 33, 34, 40, 41, 51, 55], "That": [14, 29, 34, 43], "them": [14, 18, 21, 27, 28, 29, 31, 51, 54], "unknown": [14, 27], "help": [14, 19, 20, 21, 28, 29, 37, 40, 47], "acceler": [14, 16, 30, 39, 42, 43, 47, 56], "q1": 14, "2024": 14, "discontinu": 14, "upstream": [14, 18], "futur": 14, "current": [14, 17, 22, 30, 46, 50, 55], "upgrad": [14, 32, 33, 36, 37, 40, 41, 50, 56], "section": [14, 27, 29, 32], 
"problem": [14, 24, 27, 29], "encount": 14, "sycl": [14, 16], "level_zero_util": 14, "33": [14, 16, 32, 37, 54], "fatal": 14, "level_zero": 14, "ze_api": 14, "modulenotfounderror": 14, "depend": [14, 19, 28, 29, 32, 34, 35, 37], "framework": [14, 32, 35, 43, 44, 45, 46, 49, 50, 52, 54], "errors_impl": [14, 43, 44, 46, 49, 50, 52], "notfounderror": [14, 43, 44, 46, 49, 50, 52], "libmkl_sycl": [14, 43, 44, 46, 49, 50, 52], "cannot": [14, 18, 43, 44, 46, 49, 50, 52], "setvar": [14, 32, 37, 41, 53], "env": [14, 31, 33, 34, 35, 37, 41, 47, 49], "var": [14, 31, 33, 34, 35, 37], "toolkit": [14, 16, 32, 33, 40, 43, 53, 56], "glibcxx_3": 14, "4": [14, 17, 18, 20, 24, 25, 27, 28, 29, 33, 46, 48, 53, 55], "30": [14, 35, 55], "forg": 14, "gxx_linux": 14, "64": [14, 16, 17, 19, 27, 28, 32, 34, 35, 36, 37, 46], "higher": [14, 15, 20, 27, 29], "glibcxx": 14, "veri": [15, 27, 46], "popular": 15, "deep": [15, 29, 39, 42, 56], "techniqu": [15, 27], "invent": 15, "improv": [15, 19, 27, 29, 34, 55], "speed": [15, 18, 29, 39, 40, 42], "minim": [15, 29], "number": [15, 24, 27, 29, 39, 40, 42, 46, 49, 54, 55], "bit": [15, 16, 18, 27, 30, 32, 34, 35, 36, 37, 43], "convert": [15, 17, 18, 19, 27, 40, 43, 50], "real": [15, 27, 54], "valu": [15, 17, 18, 20, 25, 27, 28, 29, 30, 54], "represent": 15, "mainli": [15, 17, 28], "phase": [15, 46], "loss": [15, 18, 19, 39, 40, 42, 47, 53], "accuraci": [15, 18, 19, 25, 27, 39, 40, 42, 47, 53], "reduc": [15, 18, 27, 29, 34, 40, 46, 49, 55], "miss": 15, "cost": 15, "network": [15, 29], "v2": [15, 30, 33, 46, 54], "newer": [15, 40, 41, 47], "integr": [15, 16, 29, 34], "box": 15, "green": 15, "subgraph": 15, "onednngraph": 15, "part": [15, 17, 29, 46], "executor": 15, "partit": [15, 29], "deleg": 15, "grappler": [15, 17, 19, 53], "fold": 15, "itex_tf_constant_fold": [15, 47], "incept": [15, 18, 39, 42, 49], "v3": [15, 39, 42], "introduc": [16, 28, 29], "seamlessli": 16, "simplifi": [16, 40], "quickli": [16, 20, 27], "initi": [16, 17, 20, 27, 34, 54], 
"pytorch": 16, "xla": 16, "numpi": [16, 22, 25, 27, 48, 50], "style": 16, "compos": [16, 17], "transform": [16, 24, 25], "batch": [16, 17, 25, 27, 28, 55], "differenti": [16, 34], "multipl": [16, 18, 20, 29, 54, 55], "_src": 16, "xla_bridg": 16, "register_pjrt_plugin_factori": 16, "getenv": 16, "pjrt_names_and_library_path": 16, "your_itex_path": 16, "libitex_xla_extens": 16, "jaxlib": 16, "xla_extens": 16, "lastest": 16, "interfac": [16, 17, 38, 56], "got": 16, "getpjrtapi": 16, "verifi": [16, 33, 34, 39, 42, 46, 49, 50, 51, 52, 55], "max": [16, 30, 34, 37, 43, 45, 46, 49, 50, 51, 52, 53, 55], "647": [16, 32, 37], "flex": [16, 34, 37, 40, 43, 45, 47, 49, 52, 56], "170": [16, 34, 37, 49, 52], "arc": [16, 34, 37, 43, 56], "red": [16, 37], "hat": [16, 37], "8": [16, 18, 25, 27, 28, 30, 32, 34, 36, 37, 46, 47, 54], "6": [16, 18, 27, 30, 37, 46], "suse": [16, 37], "enterpris": [16, 37], "sle": [16, 37], "sp3": [16, 37], "sp4": [16, 37], "2023": [16, 32, 33, 37, 53], "19": [16, 28, 32, 36, 37], "later": [16, 29, 32, 36, 37], "manylinux2014": [16, 32, 36, 37], "append": [16, 32, 36, 37], "after": [16, 17, 18, 19, 22, 24, 26, 27, 29, 30, 32, 33, 37, 40, 46], "compon": [16, 17, 19, 30, 32, 33, 34, 37], "icd": [16, 32, 37], "23": [16, 28, 32, 37, 54, 55], "17": [16, 28, 32, 35, 37], "26241": [16, 32, 37], "There": [16, 21, 34, 40, 43, 47], "ye": [16, 19, 33], "wish": [16, 34], "n": [16, 18, 24, 25, 29, 30, 33, 34, 35, 48], "libitex": [16, 35], "ld_library_path": [16, 35], "your_python_sit": 16, "info": [16, 17, 18, 28, 35, 40, 43], "jnp": 16, "jit": 16, "def": [16, 27], "lax_conv": 16, "random": [16, 25, 48], "prngkei": 16, "lh": 16, "rh": 16, "side": 16, "lax": 16, "conv_with_general_pad": 16, "multipli": [16, 27], "itex_gpu_runtim": 16, "129": [16, 28], "servic": [16, 50], "176": [16, 32], "0x56060b5ae740": 16, "doe": [16, 24, 27], "guarante": [16, 32], "184": 16, "0449753": 16, "093208": 16, "1844783": 16, "9769732": 16, "5857391": 16, "6942389": 16, "9218378": 16, 
"2862523": 16, "1549542": 16, "8367321": 16, "3978379": 16, "3860377": 16, "9456574": 16, "062028": 16, "0365305": 16, "901286": 16, "5255247": 16, "1421617": 16, "0621": 16, "2933435": 16, "1257985": 16, "1095486": 16, "5584903": 16, "1229166": 16, "7746235": 16, "2446113": 16, "7870374": 16, "8216239": 16, "557919": 16, "9832508": 16, "0887792": 16, "5433128": 16, "9749291": 16, "2580051": 16, "6096935": 16, "264905": 16, "175818": 16, "0094342": 16, "005763": 16, "6559253": 16, "3896458": 16, "4036925": 16, "1342552": 16, "8239582": 16, "6091168": 16, "434404": 16, "671778": 16, "7397764": 16, "930626": 16, "659667": 16, "6508744": 16, "3305787": 16, "4061482": 16, "0829628": 16, "130649": 16, "6637266": 16, "594426": 16, "2636002": 16, "7168686": 16, "8598001": 16, "9009514": 16, "7938274": 16, "4870623": 16, "6193901": 16, "5297288": 16, "0247464": 16, "0905268": 16, "7598859": 16, "9362347": 16, "9513799": 16, "9403584": 16, "1483061": 16, "hlo_pass_pipelin": 16, "301": 16, "hlo": 16, "pipelin": [16, 39, 40, 42, 47], "jit_lax_conv": 16, "181": 16, "fusion_merg": 16, "multi_output_fus": 16, "conv": [16, 17, 24, 48], "convolut": [16, 29], "gpu_compil": 16, "1221": 16, "llvm": 16, "spir_compil": 16, "255": [16, 19, 27], "compiletargetbinari": 16, "compiletospir": 16, "11": [16, 18, 28, 32, 33, 34, 56], "cumul": 16, "99": 16, "74": 16, "pjrt_stream_executor_cli": 16, "2201": 16, "num_replica": 16, "num_partit": 16, "num_addressable_devic": 16, "2268": 16, "replic": 16, "complet": [16, 29], "1208": 16, "pjrtstreamexecutorbuff": 16, "delet": 16, "1299": 16, "toliter": 16, "v0": [16, 30, 33], "mnist_classifi": 16, "given": [17, 25, 28, 50], "tile": [17, 20, 30, 46, 53, 55], "split": [17, 18, 54], "dimens": 17, "As": [17, 24, 27, 28, 29], "first": [17, 18, 19, 22, 24, 25, 27, 28, 29, 32, 33, 36, 37, 46, 50], "limit": [17, 29, 56], "homogen": 17, "At": [17, 21, 40, 49], "tfg": 17, "mlir": 17, "assum": [17, 27, 29, 33, 34, 46, 50], "matmul": [17, 20, 24, 26, 35], 
"normal": [17, 20, 25, 27, 29, 34, 43], "autoshard": [17, 55], "back": [17, 27], "under": [17, 23, 26, 28, 30, 34, 47], "primari": [17, 29], "entri": 17, "point": [17, 18, 20, 27, 30, 32, 37, 43], "auto_sharding_pass_mlir": 17, "invok": 17, "hook": 17, "convers": [17, 18, 19, 24], "between": [17, 18, 19, 21, 29, 31, 34, 49, 54, 55], "graphdef": [17, 18], "dialect": 17, "type_infer": 17, "tfg_to_h": 17, "auto_sharding_pass": 17, "hs_to_tfg": 17, "mark": 17, "scope": [17, 35, 54], "unshard": 17, "annot": 17, "uniniti": 17, "properti": [17, 18, 27], "ir": 17, "heterogen": [17, 56], "reli": 17, "heurist": 17, "hsp": 17, "per": [17, 27, 28, 29, 33, 53, 55], "semant": [17, 20, 25], "final": [17, 19, 27, 46], "accord": [17, 18, 43, 51, 53, 54], "turn": [17, 56], "graphopt": [17, 18, 19, 43, 55], "ON": [17, 30, 43, 55], "flag": [17, 35], "global": [17, 27, 30, 55], "shardingconfig": [17, 55], "mode": [17, 20, 24, 30, 46, 49, 54], "auto_mod": [17, 55], "paramet": [17, 26, 43], "batch_siz": [17, 19, 27, 50, 55], "stage_num": [17, 55], "decid": 17, "device_num": [17, 55], "graph_opt": [17, 18, 19, 30, 43, 47, 55], "sharding_config": [17, 55], "itex_cfg": [17, 55], "configproto": [17, 18, 19, 43, 47, 55], "set_config": [17, 18, 19, 43, 55], "itex_optimizer_before_shard": 17, "pbtxt": 17, "itex_optimizer_after_shard": 17, "resnet50": [17, 28, 39, 42], "train": [17, 18, 21, 24, 25, 26, 28, 31, 32, 33, 37, 38, 39, 42, 43, 46, 47, 52], "fp16": [18, 19, 39, 42, 43, 46], "bf16": [18, 19, 24, 39, 40, 42, 43, 46, 50, 55], "obvious": 18, "compar": [18, 27, 29, 39, 42], "fp32": [18, 19, 20, 24, 39, 40, 42, 46, 47], "danger": 18, "order": [18, 19, 27, 28, 29, 33, 38], "achiev": [18, 29], "faster": [18, 19, 25, 27, 29, 43], "strong": 18, "four": 18, "allowlist": 18, "denylist": 18, "inferlist": 18, "clearlist": 18, "let": [18, 27, 31], "balanc": [18, 19], "expect": [18, 33, 47, 56], "alwai": [18, 27], "critic": 18, "addition": [18, 27], "downstream": 18, "too": [18, 27, 32, 37], "exp": 
18, "gt": [18, 30, 55], "due": [18, 29], "effect": [18, 28, 29], "desir": [18, 28], "explain": 18, "principl": 18, "index": [18, 29, 54], "7": [18, 27, 28, 30, 46, 49], "everi": [18, 20, 49], "ii": [18, 19, 30], "whose": 18, "iii": [18, 19], "deni": 18, "ignor": [18, 27, 31], "iv": [18, 19], "insert": [18, 19, 24, 47], "increas": [18, 27, 47], "priorit": 18, "auto_mixed_precision_opt": [18, 19, 43], "automixedprecosionopt": 18, "16": [18, 27, 28, 30, 36, 43, 46], "32": [18, 25, 26, 27, 28, 30, 43, 46, 52], "data_typ": [18, 19, 43], "itex_auto_mixed_precision_data_typ": [18, 19, 43], "ampthre": 18, "default_data_typ": [18, 30], "unsafe_force_al": 18, "itex_auto_mixed_precision_unsafe_force_al": 18, "allowlist_add": [18, 19], "itex_auto_mixed_precision_allowlist_add": [18, 19], "string": [18, 27, 28, 34, 35], "denylist_add": 18, "itex_auto_mixed_precision_denylist_add": 18, "inferlist_add": 18, "itex_auto_mixed_precision_inferlist_add": 18, "clearlist_add": 18, "itex_auto_mixed_precision_clearlist_add": 18, "allowlist_remov": 18, "itex_auto_mixed_precision_allowlist_remov": 18, "denylist_remov": 18, "itex_auto_mixed_precision_denylist_remov": 18, "inferlist_remov": [18, 19], "itex_auto_mixed_precision_inferlist_remov": [18, 19], "clearlist_remov": 18, "itex_auto_mixed_precision_clearlist_remov": 18, "avgpool": [18, 19], "mani": [18, 21, 27, 28, 29, 53], "extra": [18, 27], "up": [18, 22, 27, 29, 32, 35, 39, 42, 46, 49, 52], "tabl": [18, 27, 28], "correspond": [18, 28], "itex_auto_mixed_precision_log_path": [18, 19, 20, 30], "tf_auto_mixed_precision_graph_rewrite_log_path": 18, "tf_auto_mixed_precision_graph_rewrite_level": 18, "tf_auto_mixed_precision_graph_rewrite_allowlist_add": 18, "tf_auto_mixed_precision_graph_rewrite_denylist_add": 18, "tf_auto_mixed_precision_graph_rewrite_inferlist_add": 18, "tf_auto_mixed_precision_graph_rewrite_clearlist_add": 18, "tf_auto_mixed_precision_graph_rewrite_allowlist_remov": 18, 
"tf_auto_mixed_precision_graph_rewrite_denylist_remov": 18, "tf_auto_mixed_precision_graph_rewrite_inferlist_remov": 18, "tf_auto_mixed_precision_graph_rewrite_clearlist_remov": 18, "With": [18, 19, 27, 28, 40, 44, 48, 49], "most": [18, 19, 27, 28, 29, 43, 51], "basic": [18, 19, 20, 27], "itexauto_mixed_precision_opt": [18, 19], "automixedprecisionopt": [18, 19, 43], "float16graph_opt": [18, 19], "auto_mixed_precision_optionsgraph_opt": 18, "auto_mixed_precis": [18, 19, 30, 43], "onconfig": [18, 19], "itex_auto_mixed_precis": [18, 19, 28, 30, 43], "1export": [18, 19], "avgpool3d": [18, 19], "cnn": [18, 29, 39, 40, 42], "v4": [18, 39, 42], "epoch": [18, 19, 27, 46, 53], "slower": [18, 19, 27], "becaus": [18, 19, 27], "subsequ": [18, 19, 27, 29, 49], "alreadi": [18, 27, 33, 40], "howev": [18, 21, 24, 27, 28, 29, 49], "usual": 18, "chanc": [18, 27], "my": [18, 19], "automixedprecis": 18, "1657011814330": 18, "pb": [18, 19, 31, 43], "binari": [18, 31, 34], "txt": [18, 32, 37, 49, 52, 55], "text": [18, 39, 42], "preop": 18, "1657011815538": 18, "pre": [18, 30, 36, 37, 46, 51], "paintbucket": 18, "netron": 18, "softmax": [18, 19, 27], "move": [18, 29, 46, 50], "altern": 18, "abov": [18, 19, 22, 27, 28, 29, 32, 43, 46, 47, 50, 51, 52, 53, 55], "littl": 18, "drop": [18, 28], "occupi": 18, "over": [18, 27], "whole": [18, 20, 30, 46], "runtim": [18, 23, 25, 27, 29, 32, 34, 54, 56], "repeat": 18, "until": [18, 29], "reach": 18, "peak": [18, 23], "consumpt": [19, 21, 27, 43], "kera": [19, 25, 26, 47, 49, 53, 56], "similar": [19, 29], "offer": [19, 29], "frozen": 19, "layernorm": [19, 24, 26], "instancenorm": [19, 26], "swish": [19, 24], "power": [19, 56], "versu": [19, 29], "remapp": [19, 24, 30], "exist": [19, 24, 26, 27, 28, 40], "cover": [19, 21, 24, 28, 29], "than": [19, 25, 27, 29, 32, 37, 43, 48, 53], "knowledg": [19, 29], "possibl": [19, 29, 34], "special": [19, 23, 27, 34], "bfloat16graph_opt": 19, "4096": [19, 27], "unit": [19, 25, 27, 29], "num_unit": [19, 27], 
"els": [19, 27, 35, 54], "784": [19, 27, 28], "digit": [19, 27], "dens": [19, 20, 27], "dense_1": [19, 27], "dense_2": [19, 27], "dense_logit": [19, 27], "predict": [19, 26, 27, 52], "sparse_categorical_crossentropi": [19, 27], "rmsprop": [19, 27], "metric": [19, 27], "x_train": [19, 27], "y_train": [19, 27], "x_test": [19, 27], "y_test": [19, 27], "dataset": [19, 27, 47, 53], "mnist": [19, 27, 31, 39, 42, 53], "load_data": [19, 27], "reshap": [19, 25, 27], "60000": [19, 27], "astyp": [19, 27, 48], "10000": [19, 25, 27], "histori": [19, 27], "fit": [19, 29], "8192": [19, 27], "validation_split": [19, 27], "test_scor": [19, 27], "evalu": [19, 27, 49, 52], "stabil": [19, 27], "rule": 19, "introduct": [19, 56], "adjust": [20, 25], "Not": 20, "rest": [20, 24], "ll": [20, 24], "prioriti": [20, 30], "itex_tile_as_devic": 20, "card": [20, 53], "treat": 20, "itex_fp32_math_mod": 20, "math": [20, 24, 27, 32, 37], "tf32": 20, "bf32": 20, "auto_mixed_precision_log_path": [20, 30], "tf_cpp_max_vlog_level": 20, "itex_cpp_min_log_level": 20, "tf_cpp_min_log_level": 20, "displai": 20, "onc": [20, 27, 29], "across": [20, 25], "iter": [20, 55], "larg": [20, 27, 29, 39], "dump": 20, "bert": [20, 39, 42], "encod": 20, "layer_0": 20, "biasadd": [20, 26], "read": [20, 27, 40, 50], "dt_float": [20, 35], "data_format": [20, 55], "nhwc": [20, 29], "remain": 20, "situat": [20, 30], "preserv": 20, "dpc": [21, 32, 33, 34, 37], "besid": [21, 29], "etc": [21, 32], "aka": 21, "almost": 21, "thing": 21, "expos": [21, 22, 56], "factor": [21, 28], "influenc": [21, 28, 29], "properli": [21, 28], "unifi": [21, 28], "topologi": [21, 28, 29], "combin": [21, 28, 29, 49], "autom": [21, 28], "complic": [21, 28], "launch": [21, 37, 49], "blob": [21, 31], "20230123": 21, "md": 21, "openxla_support_on_gpu": 21, "tfx": 21, "bridg": [21, 31], "streamlin": [21, 31], "deploi": [21, 31], "while": [21, 27, 29, 30, 31, 34, 44, 48, 51], "effici": [21, 29, 31, 55], "easi": [21, 40, 56], "track": [22, 51], "item": 
22, "stat": 22, "trace": 22, "viewer": 22, "tensorflow_hub": 22, "tensorboard": [22, 56], "np": [22, 25, 48, 50, 53, 54], "tf_hub": 22, "logpath": 22, "join": [22, 29], "profiler_demo": 22, "set_log_device_plac": 22, "keraslay": 22, "tfhub": 22, "imagenet": [22, 54], "resnet_v1_50": 22, "classif": 22, "ones": [22, 25, 26, 30], "224": 22, "warm": 22, "stop": [22, 29], "demo": 22, "logdir": 22, "bind_al": 22, "analyz": 22, "tab": 22, "dashboard": 22, "refresh": 22, "bring": [23, 27, 28, 56], "deeper": 23, "choos": [23, 25, 27, 28, 29, 34, 38, 43, 47, 48, 50], "These": [24, 27, 28, 56], "equal": [24, 29, 54], "notequ": 24, "greaterequ": 24, "greater": [24, 29], "lessequ": 24, "l2loss": 24, "addn": 24, "batchmatmul": [24, 26], "mul": 24, "trainingop": 24, "relu6": 24, "elu": 24, "leakyrelu": 24, "gelu_erf": 24, "gelu_tanh": 24, "tanh": [24, 25, 26], "sigmoid": [24, 25, 26], "fusedbatchnorm": 24, "fusedbatchnormgrad": 24, "relugrad": 24, "biasaddgrad": 24, "convgradfilt": 24, "pad": [24, 25, 48], "break": 24, "closer": 24, "accmatmul": 24, "fusedmatmul": 24, "fusedaccmatmul": 24, "matcher": 24, "withsum": 24, "attribut": [24, 30], "tout": 24, "tpost": 24, "is_bf16_math_mod": 24, "boolean": [24, 28], "indic": [24, 27, 43, 55], "transpos": [24, 26], "conv3d": 24, "maxpool3d": 24, "unnecessari": [24, 27, 29], "ndhwc": 24, "ncdhw": 24, "adam": 25, "decai": 25, "weight_decay_r": 25, "001": [25, 26], "learning_r": [25, 52], "beta_1": 25, "beta_2": 25, "999": 25, "epsilon": [25, 26], "1e": [25, 27], "exclude_from_weight_decai": 25, "layer_norm": 25, "kwarg": [25, 26], "adamw": 25, "describ": [25, 27, 28, 29], "decoupl": 25, "regular": 25, "loshch": 25, "ilov": 25, "hutter": 25, "pdf": 25, "tfa": [25, 26, 50], "trainabl": 25, "piecewiseconstantdecai": 25, "15000": 25, "lr": [25, 53], "wd": 25, "lambda": 25, "ba": 25, "et": 25, "al": 25, "2016": 25, "axi": [25, 26], "scale": [25, 26, 55], "beta_initi": [25, 26], "gamma_initi": [25, 26], "beta_regular": [25, 26], "gamma_regular": 
[25, 26], "beta_constraint": [25, 26], "gamma_constraint": [25, 26], "independ": [25, 28], "rather": 25, "close": [25, 29], "deviat": 25, "arang": 25, "99998": 25, "group": [25, 29], "yuxin": 25, "wu": 25, "kaim": 25, "he": 25, "divid": [25, 27, 29], "varianc": 25, "empir": 25, "stabl": [25, 27, 39, 42, 56], "norm": 25, "wide": [25, 39, 42], "rang": [25, 27, 29], "linearli": 25, "4d": 25, "gaussian": 25, "where": [25, 27, 29, 34], "nonlinear": 25, "gate": 25, "sign": [25, 32], "arrai": 25, "00404969": 25, "15865526": 25, "8413447": 25, "9959502": 25, "00363725": 25, "158808": 25, "841192": 25, "9963627": 25, "long": 25, "short": [25, 27], "hochreit": 25, "schmidhub": 25, "1997": 25, "lstm": 25, "200": [25, 26, 54], "recurrent_activ": [25, 26], "use_bia": [25, 26], "kernel_initi": [25, 26], "glorot_uniform": [25, 26], "recurrent_initi": [25, 26], "orthogon": [25, 26], "bias_initi": [25, 26], "constraint": 25, "fallback": 25, "fast": 25, "mask": 25, "strictli": 25, "outermost": 25, "return_sequ": 25, "return_st": 25, "whole_seq_output": 25, "final_memory_st": 25, "final_carry_st": 25, "experimental_ops_overrid": [26, 30], "overload": 26, "kept": [26, 27], "layernormgrad": 26, "itexlayernorm": 26, "itexlayernormgrad": 26, "itexgelu": 26, "itexgelugrad": 26, "addon": [26, 53, 54], "itexlstm": 26, "itexrnn": 26, "mixed_precis": 27, "mixed_float16": 27, "mixed_bfloat16": 27, "distinguish": 27, "nvidia": [27, 46, 49, 50], "is_gpu_avail": 27, "test_func": 27, "identif": 27, "2022": [27, 28, 30], "14": [27, 53], "02": 27, "52": [27, 28], "41": 27, "061277": 27, "w": [27, 39], "gpu_profil": 27, "111": [27, 29], "warn": [27, 28, 35], "061301": 27, "114": [27, 53], "061306": 27, "118": 27, "063685": 27, "063851": 27, "stream_executor": 27, "cuda": 27, "cuda_driv": 27, "269": 27, "cuinit": 27, "303": 27, "063865": 27, "cuda_diagnost": 27, "156": 27, "dut3046": 27, "atsp": 27, "proc": [27, 29], "caus": [27, 29, 51], "set_global_polici": 27, "slowli": 27, "least": [27, 32, 33], 
"multi": [27, 29, 30, 33, 34, 55], "worker": [27, 54], "messag": [27, 28], "aspect": 27, "constructor": 27, "numer": 27, "queri": 27, "compute_dtyp": 27, "variable_dtyp": 27, "mention": [27, 29], "next": 27, "domin": 27, "neglig": 27, "therefor": [27, 29], "fewer": 27, "finish": [27, 34, 48, 51], "dense1": 27, "dense2": 27, "previous": 27, "Their": 27, "mismatch": 27, "dtype_polici": 27, "incorrect": 27, "end": [27, 39, 40, 42, 47], "would": [27, 32, 34, 54], "correct": [27, 34], "keep": [27, 29], "middl": 27, "fine": [27, 28, 29, 46], "intermedi": 27, "flow": 27, "occur": 27, "think": 27, "But": 27, "necessari": [27, 32, 36, 37, 48], "last": [27, 51], "suffici": 27, "even": [27, 28, 29, 38, 56], "still": 27, "simpli": [27, 55], "particular": 27, "storag": [27, 43, 51], "googleapi": [27, 43, 51], "npz": 27, "11490434": 27, "1u": 27, "don": 27, "divis": 27, "retriev": 27, "scratch": [27, 46], "again": 27, "initial_weight": 27, "get_weight": 27, "6240": 27, "3359": 27, "val_loss": 27, "9755": 27, "val_accuraci": 27, "7494": 27, "83m": 27, "7987": 27, "7520": 27, "3455": 27, "8972": 27, "81m": 27, "3670": 27, "8819": 27, "3753": 27, "8751": 27, "85m": 27, "3555": 27, "8863": 27, "2155": 27, "9377": 27, "84m": 27, "1986": 27, "9410": 27, "4498": 27, "8534": 27, "spend": 27, "afterward": [27, 28, 29], "colab": 27, "rerun": 27, "cell": [27, 49], "On": [27, 29, 32, 36, 37], "significantli": 27, "sped": 27, "world": 27, "doubl": 27, "toi": 27, "entir": 27, "60": [27, 28, 46], "000": 27, "imag": [27, 36, 37, 39, 49, 54], "narrow": 27, "65504": 27, "infin": 27, "much": [27, 29, 47], "256": [27, 55], "inf": 27, "rare": 27, "gradient": 27, "prevent": 27, "concept": [27, 29], "sai": [27, 46], "1024": 27, "greatli": 27, "pseudocod": 27, "loss_scal": 27, "grad": 27, "compute_gradi": 27, "trainable_vari": 27, "tricki": 27, "solv": 27, "explicitli": [27, 28, 30, 47], "wrapper": [27, 37], "lossscaleoptim": 27, "far": 27, "did": [27, 29], "wrap": 27, "highli": 27, "recommend": [27, 
29, 30, 31, 32, 33, 34, 36, 37, 41, 47], "been": [27, 29, 49, 55], "known": [27, 51], "loss_object": 27, "sparsecategoricalcrossentropi": 27, "train_dataset": 27, "from_tensor_slic": 27, "shuffl": 27, "test_dataset": 27, "method": [27, 29, 40, 47], "unscal": 27, "get_scaled_loss": 27, "get_unscaled_gradi": 27, "apply_gradi": 27, "nan": 27, "halv": 27, "had": [27, 29], "potenti": [27, 56], "train_step": [27, 46, 55], "gradienttap": 27, "tape": 27, "scaled_loss": 27, "scaled_gradi": 27, "zip": 27, "few": 27, "happen": [27, 51], "qualiti": 27, "test_step": 27, "retrain": 27, "set_weight": 27, "epoch_loss_avg": 27, "test_accuraci": 27, "sparsecategoricalaccuraci": 27, "update_st": 27, "924008369445801": 27, "7239000201225281": 27, "5294489860534668": 27, "9168000221252441": 27, "3364005982875824": 27, "9381000399589539": 27, "25294047594070435": 27, "9486000537872314": 27, "26531240344047546": 27, "9536000490188599": 27, "perspect": [28, 29], "numactl": 28, "placement": [28, 29], "polici": [28, 29, 56], "malloc": [28, 29], "unspecifi": 28, "knob": 28, "your_script": 28, "your_script_arg": 28, "latency_mod": 28, "throughput_mod": 28, "often": [28, 32, 36, 37], "calcul": [28, 49], "mutual": 28, "exclus": 28, "infer_resnet50": [28, 44], "undesir": 28, "log_path": 28, "absolut": 28, "rel": 28, "One": [28, 29], "prefix": 28, "_timestamp_inst": 28, "anoth": [28, 29], "_timestamp_instance_n_cor": 28, "run_20210712212258_inst": 28, "run_20210712212258_instance_0_cores_0": 28, "43": [28, 54], "interpret": 28, "no_python": 28, "prepend": [28, 50, 54], "log_file_prefix": 28, "yourself": 28, "ninstanc": 28, "integ": 28, "instance_idx": 28, "among": [28, 29], "ncore_per_inst": 28, "resourc": [28, 29, 51, 54], "node_id": 28, "skip_cross_node_cor": 28, "cross": [28, 29], "disable_numactl": 28, "disable_taskset": 28, "taskset": 28, "use_logical_cor": 28, "core_list": 28, "core_id": 28, "enable_tcmalloc": 28, "enable_jemalloc": 28, "use_default_alloc": 28, "prefer": [28, 32, 36, 37], 
"certain": [28, 29], "openmp": 28, "kmp_affin": [28, 29], "granular": [28, 29], "compact": [28, 29], "hyper": [28, 29], "our": 28, "enable_itex_amp": 28, "enable_itex_layout_opt": 28, "itex_layout_opt": [28, 29, 30], "num": [28, 29], "intraop": 28, "interop": 28, "run_20221009103552_instance_0_cores_0": 28, "run_20221009103552_inst": 28, "cat": 28, "09": 28, "35": [28, 37], "53": 28, "136": 28, "__main__": 28, "neither": 28, "nor": 28, "conda_prefix": 28, "virtual_env": 28, "lib64": 28, "sdp": 28, "ld_preload": [28, 29], "omp_num_thread": 28, "96": [28, 35], "kmp_blocktim": [28, 29], "tf_enable_onednn_opt": 28, "137": 28, "localalloc": 28, "95": 28, "tee": [28, 32, 46, 55], "run_20221009104740_inst": 28, "run_20221009104740_instance_0_cores_0": 28, "191": 28, "47": 28, "908": 28, "909": 28, "192": 28, "run_20221009105044_inst": 28, "run_20221009105044_instance_0_cores_12": 28, "50": 28, "693": 28, "694": 28, "run_20221009105320_inst": 28, "run_20221009105320_instance_0_cores_0": 28, "21": 28, "089": 28, "090": 28, "run_20221009105838_inst": 28, "run_20221009105838_instance_0_cores_0": 28, "run_20221009105838_instance_1_cores_12": 28, "run_20221009105838_instance_2_cores_24": 28, "run_20221009105838_instance_3_cores_36": 28, "run_20221009105838_instance_4_cores_48": 28, "59": 28, "run_20221009105838_instance_5_cores_60": 28, "71": 28, "run_20221009105838_instance_6_cores_72": 28, "83": [28, 29], "run_20221009105838_instance_7_cores_84": 28, "58": 28, "38": 28, "757": 28, "772": 28, "795": 28, "24": [28, 53], "806": 28, "36": 28, "817": 28, "48": 28, "828": 28, "839": 28, "72": 28, "850": 28, "84": [28, 29], "run_20221009110327_inst": 28, "run_20221009110327_instance_0_cores_0": 28, "run_20221009110327_instance_1_cores_4": 28, "run_20221009110327_instance_2_cores_8": 28, "run_20221009110327_instance_3_cores_12": 28, "run_20221009110327_instance_4_cores_16": 28, "run_20221009110327_instance_5_cores_20": 28, "run_20221009110327_instance_6_cores_24": 28, "27": [28, 29, 
55], "run_20221009110327_instance_7_cores_28": 28, "31": [28, 32], "run_20221009110327_instance_8_cores_32": 28, "run_20221009110327_instance_9_cores_36": 28, "39": 28, "run_20221009110327_instance_10_cores_40": 28, "run_20221009110327_instance_11_cores_44": 28, "run_20221009110327_instance_12_cores_48": 28, "51": 28, "run_20221009110327_instance_13_cores_52": 28, "run_20221009110327_instance_14_cores_56": 28, "run_20221009110327_instance_15_cores_60": 28, "63": 28, "run_20221009110327_instance_16_cores_64": 28, "67": 28, "run_20221009110327_instance_17_cores_68": 28, "run_20221009110327_instance_18_cores_72": 28, "75": 28, "run_20221009110327_instance_19_cores_76": 28, "79": 28, "run_20221009110327_instance_20_cores_80": 28, "run_20221009110327_instance_21_cores_84": 28, "87": 28, "run_20221009110327_instance_22_cores_88": 28, "91": 28, "run_20221009110327_instance_23_cores_92": 28, "03": [28, 54], "198": 28, "215": 28, "216": 28, "229": 28, "241": 28, "254": 28, "266": 28, "278": 28, "20": [28, 35, 36, 55], "290": 28, "302": 28, "28": [28, 29, 33, 37], "315": 28, "327": 28, "339": 28, "351": 28, "364": 28, "376": 28, "388": 28, "56": [28, 29], "400": 28, "413": 28, "425": 28, "68": 28, "438": 28, "452": 28, "76": 28, "465": 28, "80": 28, "480": 28, "494": 28, "88": [28, 52], "509": 28, "92": 28, "run_20221009110849_inst": 28, "run_20221009110849_instance_0_cores_0": 28, "run_20221009110849_instance_1_cores_11": 28, "run_20221009110849_instance_2_cores_22": 28, "run_20221009110849_instance_3_cores_33": 28, "08": 28, "49": [28, 37], "891": 28, "892": 28, "run_20221009110849_instance_1_cores_24": 28, "930": 28, "run_20221009110849_instance_2_cores_48": 28, "951": 28, "run_20221009110849_instance_3_cores_72": 28, "confirm": [28, 34], "34": [28, 54], "586": 28, "assign": [28, 29, 35], "604": 28, "605": 28, "run_20221009111034_instance_0_cores_0": 28, "144": 28, "145": [28, 54, 55], "run_20221009111239_instance_0_cores_24": 28, "run_20221009111753_inst": 28, 
"run_20221009111753_instance_0_cores_0": 28, "947": 28, "948": 28, "run_20221009111951_inst": 28, "run_20221009111951_instance_0_cores_0": 28, "404": 28, "405": 28, "match": [28, 38], "conf": 28, "549": 28, "550": 28, "malloc_conf": 28, "oversize_threshold": 28, "background_thread": 28, "metadata_thp": 28, "run_20221009112720_instance_0_cores_0": 28, "29": 28, "05": [28, 53], "206": 28, "207": 28, "run_20221009112905_instance_0_cores_0": 28, "911": 28, "run_20221009112956_instance_0_cores_0": 28, "although": 29, "articl": 29, "omp": 29, "briefli": 29, "background": 29, "being": 29, "socket": [29, 33, 55], "competit": 29, "stall": 29, "busi": 29, "uma": 29, "connect": 29, "control": [29, 39, 42, 47, 55], "remot": 29, "lscpu": [29, 47], "platinum": 29, "8180m": 29, "detect": 29, "onboard": 29, "logic": 29, "thu": 29, "total": [29, 53], "112": 29, "second": [29, 47, 54, 55], "neg": 29, "50ghz": 29, "node0": 29, "node1": 29, "friendli": 29, "nchw": 29, "idea": 29, "bound": 29, "workload": [29, 39, 42, 47, 56], "nth": 29, "man": 29, "cpunodebind": 29, "membind": 29, "wikipedia": [29, 46], "wherebi": 29, "master": [29, 31], "consecut": 29, "fork": 29, "figur": 29, "illustr": 29, "libgomp": 29, "libiomp": 29, "region": 29, "along": 29, "seen": 29, "coupl": 29, "commonli": 29, "gomp": 29, "affin": 29, "comma": 29, "hyphen": 29, "contigu": 29, "gomp_cpu_affin": 29, "omp_proc_bind": 29, "omp_schedul": 29, "static": 29, "ld": 29, "preload": 29, "libiomp5": [29, 35], "kmp": 29, "dramat": 29, "togeth": 29, "thrash": 29, "suppos": [29, 46], "leav": 29, "compet": 29, "strategi": [29, 54], "proclist": 29, "classic": 29, "blocktim": 29, "millisecond": 29, "wait": 29, "sleep": 29, "200m": 29, "elaps": 29, "larger": [29, 34], "reserv": 29, "sole": 29, "penal": 29, "plai": 29, "role": 29, "destruct": 29, "reus": [29, 40], "jemalloc": 29, "hold": 29, "dealloc": 29, "costli": 29, "gperftool": 29, "plu": 29, "nice": 29, "analysi": 29, "xzvf": 29, "heap": 29, "checker": 29, "debugalloc": 
29, "flexibl": 30, "protocolmessag": 30, "easili": 30, "tune": [30, 40, 46], "offononoffoff": 30, "itex_onednn_graph": [30, 47], "itex_layout_optitex_remapperitex_auto_mixed_precisionitex_shard": 30, "except": [30, 37], "enum": 30, "itexdatatyp": 30, "datatyp": [30, 35, 46, 50], "toggl": 30, "unless": 30, "field": 30, "onednn_graph": 30, "onednn_graphoverrid": 30, "layout_opt": 30, "itex_remapp": 30, "itex_shard": 30, "xpu_force_sync": 30, "itex_sync_exec": 30, "sync": 30, "hurt": 30, "rais": 30, "valueerror": 30, "git_vers": [30, 33], "7112d33": 30, "onednn_cpu_git_vers": 30, "a930253": 30, "onednn_gpu_git_vers": 30, "compiler_vers": 30, "gcc": 30, "20180905": 30, "dpcpp": [30, 32], "122": 30, "tf_compatible_vers": 30, "lt": 30, "put": 31, "libitex_cpu_cc": [31, 35], "libitex_gpu_cc": [31, 35], "l28": 31, "exit": 31, "xxxxx": [31, 55], "kernels_experiment": 31, "tf_cuda_librari": 31, "if_not_mobil": 31, "p1": 31, "tf_serv": 31, "serving_plugin": 31, "l24": 31, "l29": 31, "local_repositori": 31, "org_tensorflow": 31, "wno": 31, "stringop": 31, "truncat": 31, "rm": [31, 35, 41, 43, 54], "rf": [31, 41, 54], "tmp": 31, "mnist_saved_model": 31, "saved_model": 31, "l": [31, 35], "modelserv": 31, "plug": [31, 56], "hub": 31, "port": [31, 49], "rest_api_port": 31, "8501": 31, "model_base_path": 31, "tensorflow_plugin": 31, "path_to_libitex_cpu_cc": 31, "oneapi_install_path": 31, "path_to_libitex_gpu_cc": 31, "mnist_client": 31, "num_test": 31, "1000": [31, 50], "xx": 31, "earli": 32, "effort": 32, "basi": 32, "subystem": 32, "graphic": [32, 34], "101": 32, "4255": 32, "dch": 32, "gpg": 32, "agent": 32, "qo": 32, "dearmor": 32, "keyr": 32, "echo": 32, "deb": 32, "arch": 32, "i386": 32, "jammi": 32, "igc": 32, "cm": 32, "libigc1": 32, "13822": 32, "libigdfcl1": 32, "libigdgmm12": 32, "pub": 32, "sw": 32, "archiv": 32, "instead": [32, 46, 49, 50], "icd_23": 32, "04_amd64": 32, "isol": [32, 36, 37], "basekit": [32, 33, 37], "weekli": 32, "env_check": [32, 33, 37, 56], 
"quick_exampl": 32, "access": 32, "onemkl": [32, 33, 34, 37], "registrationcent": [32, 37], "akdlm": [32, 37], "irc_na": [32, 37], "992857b9": [32, 37], "624c": [32, 37], "45de": [32, 37], "9701": [32, 37], "f6445d845359": [32, 37], "l_basekit_p_2023": [32, 37], "49397_offlin": [32, 37], "mpi": [32, 33, 37], "deploy": [33, 36, 37], "miniconda": 33, "approach": 33, "easiest": 33, "setup": [33, 36, 38, 40], "press": 33, "curl": 33, "anaconda": 33, "miniconda3": 33, "x86_64": [33, 34], "restart": 33, "termin": 33, "bashrc": 33, "intelpython3_ful": 33, "142f5f29": 33, "ccl": [33, 37], "cluster": 33, "fi_provid": 33, "though": 34, "virtual": [34, 46, 47, 49, 50, 51, 52, 55], "itex_build": 34, "aot": 34, "ahead": 34, "startup": 34, "prolong": 34, "minut": 34, "tookit": 34, "tree": 34, "prompt": 34, "differenct": 34, "fill": 34, "ats": 34, "m150": 34, "acm": 34, "g11": 34, "ve": 34, "140": 34, "m75": 34, "pvc": 34, "a730m": 34, "g10": 34, "a380": 34, "wrong": 34, "identifi": 34, "libitex_common": 34, "_pywrap_itex": 34, "libitex_cpu": 34, "libitex_gpu": 34, "preconfigur": 34, "bazelrc": 34, "shoul": 35, "diretcori": 35, "llvm_openmp": 35, "pythonhost": 35, "ed": 35, "310fee0477ce46f722c561dd7e21eebca0d1d29bdb3cf4a2335b845fbba4": 35, "cp311": 35, "manylinux_2_17_x86_64": 35, "manylinux2014_x86_64": 35, "b": [35, 43, 47, 54, 55], "unzip": 35, "tensorflow_2": 35, "symbol": 35, "ln": 35, "libtensorflow_cc": 35, "libtensorflow_framework": 35, "libtensorflow": 35, "r2": [35, 55], "install_head": 35, "environment": 35, "library_path": 35, "tf_loadpluggabledevicelibrari": 35, "c_api_experiment": 35, "tf_statu": 35, "lib_path": 35, "client_sess": 35, "standard_op": 35, "newrootscop": 35, "assign_x": 35, "randomnorm": 35, "assign_i": 35, "z": [35, 53], "const": 35, "vz": 35, "vector": 35, "clientsess": 35, "session": [35, 47], "fetch": 35, "tf_check_ok": 35, "matrix": 35, "xpu_lib_path": 35, "c_str": 35, "tf_code": 35, "status_msg": 35, "tf_messag": 35, "makefil": 35, 
"tf_include_path": 35, "tfcc_path": 35, "example_test": 35, "ltensorflow_framework": 35, "ltensorflow_cc": 35, "wl": 35, "rpath": 35, "tbb": [35, 37], "2nd": 36, "4th": [36, 43], "cento": 36, "sapphir": [36, 43], "rapid": [36, 43], "8888": [36, 37, 43, 47, 49, 51], "pip3": 36, "simultan": 37, "stack": [37, 38], "libiari": 37, "en": 37, "consol": 37, "00": 37, "374832": 37, "itex_cpu_wrapp": 37, "42": 37, "217981": 37, "itex_gpu_wrapp": 37, "205706": 37, "313231": 37, "varieti": [39, 42], "classifi": [39, 42], "bare": [39, 42], "metal": [39, 42], "alexnet": [39, 42], "recogn": [39, 40, 42], "handwrit": [39, 40, 42], "ai": [39, 40, 42, 45, 47, 56], "zoo": [39, 42], "diffus": [39, 42, 56], "text2imag": [39, 42], "pretrain": 39, "3d": 39, "unet": 39, "medic": 39, "segment": 39, "technologi": 40, "big": 40, "blocker": 40, "analyt": 40, "websit": [40, 56], "env_nam": 41, "env_itex": [41, 43, 47, 49, 50, 51, 53], "venv": [41, 50, 53], "internet": 43, "throughput": [43, 49], "seriesintel": 43, "170intel": 43, "seriesne": 43, "seriessupport": 43, "itex_repo": 43, "pwd": [43, 55], "infer_inception_v4_amp": 43, "v1_8": 43, "inceptionv4_fp32_pretrained_model": 43, "set_env_gpu": [43, 44, 51], "ws1": 43, "infer_fp32_vs_amp": 43, "screen": 43, "01837550401687622": 43, "0113076031208038": 43, "fp": 43, "128": [43, 46, 52], "92880015134813": 43, "1691980294577": 43, "6153628825864496": 43, "867908472383153": 43, "wors": 43, "set_env_cpu": [44, 51], "env_itex_cpu": [44, 51], "success": [44, 48, 49, 55], "n02123159": 44, "tiger_cat": 44, "22355853": 44, "legaci": [46, 49, 50, 51, 52, 55], "deeplearningexampl": [46, 50], "tensorflow2": [46, 53], "languagemodel": 46, "pip_set_env": [46, 47, 49, 50, 52], "extract": 46, "squad": [46, 52], "bookcorpu": 46, "data_download": 46, "v1": [46, 47, 52, 56], "google_pretrained_weight": 46, "uncased_l": 46, "24_h": 46, "1024_a": 46, "12_h": 46, "768_a": 46, "tfrecord": [46, 50], "books_wiki_en_corpu": 46, "consum": 46, "v100": 46, "dai": 46, 
"pretrain_bert": 46, "lamb": 46, "maximum": 46, "sequenc": [46, 49], "length": 46, "phase1": 46, "phase2": 46, "512": [46, 52], "train_batch_size_phase1": 46, "train_batch_size_phase2": 46, "eval_batch_s": 46, "learning_rate_phase1": 46, "5e": 46, "learning_rate_phase2": 46, "usa_xla": 46, "num_gpu": [46, 55], "warmup_steps_phase1": 46, "660": 46, "warmup_steps_phase2": 46, "66": 46, "2600": 46, "save_checkpoint_step": 46, "num_accumulation_steps_phase1": 46, "num_accumulation_steps_phase2": 46, "bert_model": [46, 52], "gbs1": 46, "expr": 46, "gbs2": 46, "pretrain_result_dir": 46, "tf_bert_pretraining_lamb_": 46, "_gbs1_": 46, "_gbs2_": 46, "data_dir": [46, 50, 54], "run_pretraining_lamb": 46, "pretrain_lamb": 46, "checkpoint": 46, "batch_size_per_gpu": 46, "learning_rate_per_gpu": 46, "use_xla": 46, "squad_vers": 46, "use_mytrain": 46, "pretrain_path": 46, "phase_2": 46, "ckpt": [46, 52], "result_dir": 46, "tf_bert_finetune_": 46, "run_squad": [46, 52], "calibr": 47, "qdq": 47, "dequant": 47, "flower": 47, "photo": 47, "transfer": 47, "stage": 47, "protobuf": 47, "rewriter_config_pb2": 47, "infer_config": 47, "rewrite_opt": 47, "constant_fold": 47, "rewriterconfig": 47, "set_sess": 47, "speedup": [47, 55], "grep": 47, "vnni": 47, "avx_vnni": 47, "amx": 47, "amx_bf16": 47, "amx_int8": 47, "run_jupyt": 47, "yyi": 47, "xxxxxxxx": 47, "ipynb": [47, 49, 51], "mit": 47, "sy": 48, "num_channel": 48, "input_width": 48, "input_height": 48, "filter_width": 48, "filter_height": 48, "rand": 48, "stride": 48, "bias_add": 48, "479142": 48, "7296917": 48, "6456823": 48, "077278": 48, "9259825": 48, "3000765": 48, "3999124": 48, "0527704": 48, "0656753": 48, "85485": 48, "7297122": 48, "9373732": 48, "4818356": 48, "1455178": 48, "4929404": 48, "6422923": 48, "718459": 48, "7090344": 48, "988714": 48, "3391027": 48, "875052": 48, "6461415": 48, "9349675": 48, "327398": 48, "298973": 48, "3905785": 48, "1704025": 48, "9154005": 48, "6926193": 48, "9677248": 48, "481086": 48, 
"9746864": 48, "8941312": 48, "3221133": 48, "5479512": 48, "197306": 48, "305706": 48, "9873173": 48, "5597944": 48, "250221": 48, "118212": 48, "8672705": 48, "949225": 48, "2636094": 48, "5300783": 48, "1403804": 48, "1729176": 48, "6628485": 48, "2607155": 48, "6342418": 48, "9381838": 48, "6761076": 48, "5063303": 48, "4718971": 48, "8880196": 48, "1658201": 48, "3787665": 48, "1193419": 48, "42261": 48, "318963": 48, "8809638": 48, "6514435": 48, "3549364": 48, "8598063": 48, "517385": 48, "9702091": 48, "9260886": 48, "3804817": 48, "381424": 48, "6027272": 48, "7787259": 48, "9631021": 48, "93901324": 48, "2134862": 48, "89942324": 48, "cv": 49, "concaten": 49, "loop": [49, 55], "hasn": 49, "reset": 49, "66fa74b6a2a0bb1e563ae8bce66496b118b95200": 49, "ipykernel": 49, "url": [49, 51], "token": [49, 51], "stable_diffussion_infer": 49, "stable_diffusion_infer": 49, "present": 49, "fr\u00e9chet": 49, "distanc": 49, "fid": 49, "outcom": 49, "a100": 49, "stable_diffusion_accuraci": 49, "load_ref_result": 49, "ref_result_dir": 49, "nv_result": 49, "img_arrays_for_acc": 49, "81": [49, 52], "1146879196167": 49, "328223477737884": 49, "3dunet_itex": 50, "3dunet_itex_with_horovod": 50, "unet_3d_med": 50, "88eb3cff2f03dad85035621d041e23a14345999": 50, "nightli": 50, "dllogger": 50, "brain": 50, "tumor": 50, "2019": 50, "upon": 50, "challeng": 50, "ipp": 50, "cbica": 50, "upenn": 50, "edu": 50, "nifti": 50, "volum": 50, "nibabel": 50, "preprocess_data": 50, "train_maskrcnn": 50, "dataset_dir": 50, "output_dir": [50, 52], "exec_mod": 50, "warmup_step": 50, "150": 50, "max_step": 50, "log_everi": 50, "dataset_path": 50, "mpirun": [50, 54], "rank": [50, 54], "ppn": [50, 54], "tutori": 51, "pacakg": 51, "tensorflow_doc": 51, "classify_text_with_bert": 51, "ip": 51, "f502f0715979ec73c571ca5676ba58431b916f5f58ee3333": 51, "crash": 51, "tri": 51, "traceback": 51, "recent": 51, "174": 51, "__del__": 51, "typeerror": 51, "nonetyp": 51, "callabl": 51, "research": [52, 54], 
"bert_large_dir": 52, "squad_dir": 52, "vocab_fil": 52, "vocab": 52, "bert_config_fil": 52, "bert_config": 52, "json": 52, "init_checkpoint": 52, "do_train": 52, "train_fil": 52, "do_predict": 52, "predict_fil": 52, "train_batch_s": 52, "3e": 52, "num_train_epoch": 52, "max_seq_length": 52, "doc_strid": 52, "use_tpu": 52, "tpu_nam": 52, "produc": 52, "f1": 52, "41249612335034": 52, "exact_match": 52, "2488174077578": 52, "gin": [53, 54], "raw": 53, "train_horovod": 53, "tensorflow2_keras_mnist": 53, "horovodrun": 53, "18": 53, "54": 53, "006950": 53, "custom_graph_optimizer_registri": 53, "163161": 53, "940695": 53, "107809": 53, "163517": 53, "250": 53, "yym": 53, "xxxx": [53, 54], "yyyi": 53, "zzzz": 53, "yaml": 54, "itex_dummi": 54, "hvd_support_light": 54, "hvd_support": 54, "light": 54, "minimum": 54, "alloc": 54, "growth": 54, "distributedoptim": 54, "lar": 54, "paper": 54, "net": 54, "php": 54, "non": 54, "commerci": 54, "purpos": 54, "pythonpath": 54, "imagenet_data": 54, "config_fil": 54, "number_of_process": 54, "process_per_nod": 54, "correspondingli": 54, "dummi": 54, "fi": 54, "vision": 54, "image_classif": [54, 55], "classifier_train": 54, "train_and_ev": 54, "model_typ": 54, "resnet": [54, 55], "i0909": 54, "323099": 54, "140645511436096": 54, "keras_util": [54, 55], "timehistori": [54, 55], "324534": 54, "140611700504384": 54, "037004": 54, "037142": 54, "213994": 54, "300": 54, "214127": 54, "accordingli": 55, "tf_num_interop_thread": 55, "tf_num_intraop_thread": 55, "resnet_ctl_imagenet_main": 55, "train_epoch": 55, "steps_per_loop": 55, "log_step": 55, "skip_ev": 55, "use_synthetic_data": 55, "distribution_strategi": 55, "use_tf_while_loop": 55, "use_tf_funct": 55, "enable_xla": 55, "enable_tensorboard": 55, "enable_checkpoint_and_export": 55, "channels_last": 55, "single_l2_loss_op": 55, "follw": 55, "use_itex_shard": 55, "pramet": 55, "suggest": 55, "2x256x10": 55, "5120": 55, "itex_enable_multiple_stream": 55, "queue": 55, "resnet50_itex": 55, 
"tfg_optimizer_hook": 55, "289": 55, "i0324": 55, "594147": 55, "140348344015936": 55, "597360": 55, "479": 55, "sec": 55, "train_accuraci": 55, "train_loss": 55, "634554": 55, "161625": 55, "163815": 55, "790632": 55, "792936": 55, "103148": 55, "25": 55, "416651": 55, "419072": 55, "3359284": 55, "025180": 55, "027671": 55, "3343554": 55, "aim": 56, "flexibli": 56, "diagram": 56, "summari": 56, "ecosystem": 56, "estim": 56, "manag": 56, "dockerhub": 56, "come": 56, "soon": 56, "visit": 56, "tour": 56, "collabor": 56, "adher": 56, "innov": 56, "jax": 56, "vulner": 56, "apach": 56, "govern": 56, "forth": 56}, "objects": {}, "objtypes": {}, "objnames": {}, "titleterms": {"contributor": 0, "coven": 0, "code": [0, 7, 17, 19, 34, 35, 46, 48, 49, 50, 51, 52, 53, 55], "conduct": 0, "our": 0, "pledg": 0, "standard": 0, "enforc": 0, "respons": 0, "scope": 0, "guidelin": [0, 7], "1": [0, 11, 16, 31, 32, 35], "correct": 0, "2": [0, 11, 16, 31, 32, 35], "warn": 0, "3": [0, 11, 16, 32], "temporari": 0, "ban": 0, "4": [0, 11, 16, 32], "perman": 0, "attribut": [0, 18], "secur": [1, 56], "polici": [1, 27], "report": 1, "vulner": 1, "intel": [2, 3, 4, 6, 7, 23, 29, 30, 31, 32, 34, 35, 36, 37, 40, 41, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 57], "extens": [2, 3, 4, 6, 7, 10, 23, 30, 31, 32, 34, 35, 36, 37, 40, 47, 57], "tensorflow": [2, 3, 4, 6, 7, 18, 19, 21, 23, 30, 31, 32, 34, 35, 36, 37, 40, 47, 57], "docker": [2, 3, 31, 36, 37, 43, 45], "contain": [2, 3, 36, 37, 43, 45], "guid": [2, 3, 5, 7, 28, 29, 38, 41, 45], "descript": [2, 3], "binari": [2, 3, 56], "prepar": [2, 3, 35, 41, 43, 44, 46, 49, 50, 51, 52, 53, 54, 55], "usag": [2, 15, 17, 18, 19, 22, 26, 28], "i": [2, 3, 28], "custom": [2, 11, 19, 23, 25, 27], "build": [2, 3, 5, 11, 14, 16, 27, 31, 34, 35, 36, 37], "script": [2, 28, 41], "ii": [2, 3, 28], "iii": [2, 28], "run": [2, 3, 16, 31, 32, 35, 40, 41, 43, 44, 45, 46, 47, 49, 50, 51, 52, 53, 55], "verifi": [2, 11, 32, 36, 37], "That": 2, "gpu": [2, 16, 17, 21, 
22, 29, 32, 34, 35, 37, 40, 41, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 55, 56], "access": [2, 29], "from": [2, 14, 31, 35, 36, 37], "serv": [3, 21, 31], "imag": [3, 31, 50], "welcom": [4, 6, 57], "document": [4, 5, 6, 7, 56, 57], "highlight": 4, "onlin": 5, "introduct": [5, 13, 23, 40, 43, 45, 46, 47, 49, 50, 51, 52, 55], "updat": 5, "latest": 5, "version": [5, 30, 47], "creat": [5, 34, 53], "releas": [5, 8, 32], "local": [5, 40, 47], "test": [5, 7, 43], "contribut": [7, 56], "develop": 7, "tip": [7, 19], "debug": 7, "unit": 7, "python": [7, 11, 17, 18, 20, 21, 30, 35, 43, 44, 49, 55], "style": 7, "c": [7, 31, 35], "bazel": [7, 34], "known": 8, "issu": 8, "incompat": 8, "chang": [8, 46, 49, 50, 52], "directori": 9, "tree": 9, "structur": [9, 17], "design": [10, 12, 28], "workflow": [10, 15, 17], "resourc": [10, 56], "how": [11, 27], "write": 11, "op": [11, 25, 30], "prerequisit": [11, 30, 44, 46, 49, 50, 51, 52, 55], "defin": 11, "interfac": 11, "regist": 11, "kernel": 11, "implement": [11, 24], "6": 11, "add": 11, "7": 11, "us": [11, 21, 28, 31], "8": 11, "packag": [11, 35, 37, 55], "9": 11, "instal": [11, 16, 31, 32, 33, 34, 36, 37, 38, 48, 53, 55, 56], "optim": [12, 13, 19, 21, 24, 53, 54], "onednn": [13, 47], "object": 13, "cach": 13, "convolut": 13, "frequent": 14, "ask": 14, "question": 14, "troubleshoot": 14, "sourc": [14, 31, 34, 35], "runtim": 14, "int8": [15, 21], "quantiz": [15, 21, 40, 47], "overview": [15, 16, 17, 19, 20, 27, 28, 29, 30, 34], "openxla": [16, 21], "support": [16, 21, 35, 56], "via": [16, 20, 32, 36, 37, 43], "pjrt": 16, "hardwar": [16, 27, 29, 32, 34, 36, 37, 40, 43, 46, 47, 49, 50, 51, 52, 55, 56], "softwar": [16, 29, 32, 36, 37, 56], "requir": [16, 32, 34, 36, 37, 43, 46, 49, 50, 51, 52, 55, 56], "driver": [16, 32, 34, 37, 41], "librari": [16, 31, 35], "jax": 16, "exampl": [16, 17, 18, 19, 22, 28, 34, 35, 39, 42, 44, 46, 48, 49, 50, 52, 53, 54, 55], "xpuautoshard": [17, 21, 55], "experiment": [17, 21, 32], "api": [17, 18, 20, 21, 
23, 30, 43, 44, 49, 55], "dump": 17, "graph": [17, 19, 21, 24, 30, 47], "tune": [18, 19, 52], "advanc": [18, 19, 21, 23, 28, 43, 47], "auto": [18, 19, 20, 21], "mix": [18, 19, 20, 21, 24, 27, 43], "precis": [18, 19, 20, 21, 27, 43], "background": [18, 40, 47], "numer": 18, "stabil": 18, "configur": [18, 20, 29, 34, 35, 43, 47], "list": 18, "rule": 18, "improv": 18, "perform": [18, 43, 54], "environ": [18, 20, 28, 30, 32, 33, 34, 36, 37, 40, 41, 43, 44, 46, 47, 49, 50, 51, 52, 53, 55], "variabl": [18, 20, 28, 30, 32, 37, 43], "differ": [18, 27], "stock": [18, 19], "end": 18, "mobilenet": 18, "amp": [19, 21, 43], "v": [19, 28], "data": [19, 24, 54], "type": [19, 24, 27], "featur": [19, 21, 23], "manual": 19, "quick": [19, 45, 48, 56], "train": [19, 27, 45, 50, 51, 53, 54, 55], "setup": [19, 27, 32, 37, 41, 43, 44, 46, 49, 50, 51, 52, 53, 55], "enabl": [19, 41, 43, 44, 46, 47, 49, 50, 51, 52, 53, 55], "origin": 19, "notic": 19, "log": [19, 28], "save": 19, "oper": [19, 21, 25, 26, 30], "itex_verbos": 20, "level": 20, "definit": 20, "backend": 20, "config": [20, 30], "protocol": [20, 30], "option": [20, 32, 35], "eas": 21, "profil": [21, 22], "cpu": [21, 29, 34, 35, 36, 37, 43, 44, 47, 48, 49, 51, 56], "launcher": 21, "faq": [22, 43, 44, 46, 49, 50, 51, 52], "infrastructur": 23, "architectur": 23, "public": 23, "manag": 23, "xpu": [23, 34, 37, 56], "engin": 23, "fusion": 24, "basic": [24, 28], "detail": 24, "gener": 24, "layout": [24, 29], "itex": [25, 30], "adamwithweightdecayoptim": 25, "layernorm": 25, "groupnorm": 25, "gelu": [25, 26], "itexlstm": 25, "overrid": [26, 30], "layer": 26, "normal": 26, "dens": 26, "activ": 26, "instanc": [26, 28], "lstm": 26, "kera": 27, "identifi": 27, "set": [27, 28, 40, 54, 55], "dtype": 27, "model": [27, 31, 43, 45, 46, 49, 50, 52, 54], "fit": 27, "loss": 27, "scale": 27, "underflow": 27, "overflow": 27, "loop": 27, "launch": 28, "user": 28, "common": [28, 34, 41], "execut": [28, 40, 43, 44, 46, 49, 50, 51, 52, 53, 54, 55], "mode": 
28, "latenc": 28, "throughput": 28, "multi": [28, 50], "numa": [28, 29], "control": 28, "memori": [28, 29], "alloc": [28, 29], "singl": [28, 50], "infer": [28, 43, 44, 45, 49], "all": 28, "physic": 28, "core": 28, "includ": 28, "logic": 28, "one": 28, "node": 28, "iv": 28, "your": 28, "number": 28, "multipl": 28, "vi": 28, "vii": 28, "viii": 28, "index": 28, "ix": 28, "tf_num_intraop_thread": 28, "x": 28, "tf_num_interop_thread": 28, "tcmalloc": [28, 29], "jemalloc": 28, "default": 28, "practic": 29, "tabl": [29, 56], "content": 29, "non": 29, "uniform": 29, "format": 29, "numactl": 29, "openmp": 29, "omp_num_thread": 29, "gnu": 29, "import": 30, "intel_extension_for_tensorflow": 30, "name": 30, "preserv": 30, "configproto": 30, "gpuoption": 30, "graphopt": 30, "automixedprecisionopt": 30, "shardingconfig": 30, "debugopt": 30, "set_config": 30, "get_config": 30, "server": [31, 40, 47], "dockerfil": [31, 36, 37], "sampl": 31, "arc": 32, "A": 32, "seri": 32, "window": 32, "subsystem": 32, "linux": 32, "wsl2": 32, "nativ": 32, "directli": 32, "step": [32, 33, 43, 44, 49, 51], "By": 32, "instruct": [32, 33], "ubuntu": 32, "pypi": [32, 34, 36, 37], "wheel": [32, 36, 37], "virtual": [32, 36, 37, 41, 53], "system": [32, 36, 37], "full": 32, "oneapi": [32, 34, 37, 41, 53], "conda": [33, 34], "precondit": 33, "download": [34, 43, 51, 53, 54], "extra": 34, "onli": [34, 37], "base": [34, 37, 40, 41], "toolkit": [34, 37, 41], "For": 34, "addit": 34, "cc": 35, "header": 35, "file": 35, "extract": 35, "recommend": 35, "integr": 35, "linker": 35, "load": 35, "get": [36, 37, 56], "dockerhub": [36, 37], "bare": [36, 37, 43, 45], "metal": [36, 37, 43, 45], "check": [37, 47], "platform": 37, "acceler": [40, 45, 46, 50, 55], "alexnet": 40, "devcloud": [40, 47], "up": [40, 43], "speed": 43, "incept": [43, 47], "v4": 43, "automat": 43, "skip": [43, 44, 49, 51], "thi": [43, 44, 49, 51], "clone": [43, 53], "repositori": 43, "pretrain": [43, 46], "compar": 43, "fp32": [43, 49], "result": 
43, "method": 43, "resnet50": [44, 55], "output": [44, 48, 49, 53, 54, 55], "deep": [45, 47], "learn": [45, 47], "zoo": 45, "workload": 45, "start": [45, 56], "bert": [46, 51, 52], "larg": [46, 52], "dataset": [46, 50, 54], "command": [46, 53, 54, 55], "finetun": 46, "v3": 47, "xeon": 47, "disabl": 47, "constant": 47, "fold": 47, "function": 47, "boost": 47, "matrix": 47, "startup": [47, 51], "jupyt": [47, 49, 51], "notebook": [47, 49, 51], "licens": [47, 56], "quick_exampl": 48, "py": 48, "note": 48, "stabl": 49, "diffus": 49, "text2imag": 49, "fp16": 49, "accuraci": [49, 52], "3d": 50, "unet": 50, "w": 50, "o": 50, "horovod": [50, 53, 54], "medic": 50, "segment": 50, "tile": 50, "classifi": [51, 52], "text": [51, 52], "fp8": 52, "fine": 52, "bf16": 52, "distribut": [53, 54], "depend": [53, 54], "repo": [53, 54], "patch": 53, "appli": 53, "devic": 53, "count": 53, "inform": 54, "paramet": [54, 55], "hvd": 54, "other": 55, "pythonpath": 55, "without": 55, "With": 55, "shard": 55, "further": 55, "channel": 56, "compat": 56, "weekli": 56}, "envversion": {"sphinx.domains.c": 2, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 8, "sphinx.domains.index": 1, "sphinx.domains.javascript": 2, "sphinx.domains.math": 2, "sphinx.domains.python": 3, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx": 57}, "alltitles": {"Contributor Covenant Code of Conduct": [[0, "contributor-covenant-code-of-conduct"]], "Our Pledge": [[0, "our-pledge"]], "Our Standards": [[0, "our-standards"]], "Enforcement Responsibilities": [[0, "enforcement-responsibilities"]], "Scope": [[0, "scope"]], "Enforcement": [[0, "enforcement"]], "Enforcement Guidelines": [[0, "enforcement-guidelines"]], "1. Correction": [[0, "correction"]], "2. Warning": [[0, "warning"]], "3. Temporary Ban": [[0, "temporary-ban"]], "4. 
Permanent Ban": [[0, "permanent-ban"]], "Attribution": [[0, "attribution"]], "Security Policy": [[1, "security-policy"]], "Report a Vulnerability": [[1, "report-a-vulnerability"]], "Intel\u00ae Extension for TensorFlow* Docker Container Guide": [[2, "intel-extension-for-tensorflow-docker-container-guide"]], "Description": [[2, "description"], [3, "description"]], "Binaries Preparation": [[2, "binaries-preparation"]], "Usage of Docker Container": [[2, "usage-of-docker-container"]], "I. Customize Build Script": [[2, "i-customize-build-script"]], "II. Build the Container": [[2, "ii-build-the-container"], [3, "ii-build-the-container"]], "III. Running the Container": [[2, "iii-running-the-container"]], "Verify That Intel GPU is Accessible From TensorFlow": [[2, "verify-that-intel-gpu-is-accessible-from-tensorflow"]], "Intel\u00ae Extension for TensorFlow* Serving - Docker Container Guide": [[3, "intel-extension-for-tensorflow-serving-docker-container-guide"]], "Build the Docker Image": [[3, "build-the-docker-image"]], "I. 
Binaries Preparation": [[3, "i-binaries-preparation"]], "Running the Container": [[3, "running-the-container"]], "Welcome to Intel\u00ae Extension for TensorFlow* documentation": [[4, "welcome-to-intel-extension-for-tensorflow-documentation"]], "Documentation": [[4, "documentation"], [56, "documentation"]], "Highlights": [[4, "highlights"]], "Online Documentation Build Guide": [[5, "online-documentation-build-guide"]], "Introduction": [[5, "introduction"], [13, "introduction"], [23, "introduction"], [40, "introduction"], [43, "introduction"], [45, "introduction"], [46, "introduction"], [47, "introduction"], [49, "introduction"], [50, "introduction"], [51, "introduction"], [52, "introduction"], [55, "introduction"]], "Update latest Version": [[5, "update-latest-version"]], "Create Release Version": [[5, "create-release-version"]], "Build to Local Test": [[5, "build-to-local-test"]], "Welcome to Intel \u00ae Extension for TensorFlow* documentation!": [[6, "welcome-to-intel-extension-for-tensorflow-documentation"], [57, "welcome-to-intel-extension-for-tensorflow-documentation"]], "Contributing guidelines": [[7, "contributing-guidelines"]], "Contributing to Intel\u00ae Extension for TensorFlow*": [[7, "contributing-to-intel-extension-for-tensorflow"]], "Developing Intel\u00ae Extension for TensorFlow*": [[7, "developing-intel-extension-for-tensorflow"]], "Tips and Debugging": [[7, "tips-and-debugging"]], "Unit testing": [[7, "unit-testing"]], "Python Unit Testing": [[7, "python-unit-testing"]], "Code style guide": [[7, "code-style-guide"]], "Python coding style": [[7, "python-coding-style"]], "C++ coding style": [[7, "c-coding-style"]], "bazel style guide": [[7, "bazel-style-guide"]], "Documentation style guide": [[7, "documentation-style-guide"]], "Releases": [[8, "releases"]], "Known Issues": [[8, "known-issues"]], "Incompatible Changes": [[8, "incompatible-changes"]], "Directory Tree Structure": [[9, "directory-tree-structure"]], "Extension Design": [[10, 
"extension-design"]], "Workflow": [[10, "workflow"], [15, "workflow"], [17, "workflow"]], "Resources": [[10, "resources"], [56, "resources"]], "How to write custom op": [[11, "how-to-write-custom-op"]], "1. Prerequisite": [[11, "prerequisite"]], "2. Define the op interface and Register op": [[11, "define-the-op-interface-and-register-op"]], "3. Register the kernels for the op": [[11, "register-the-kernels-for-the-op"]], "4. Implement the kernels": [[11, "implement-the-kernels"]], "6. Add the op to BUILD": [[11, "add-the-op-to-build"]], "7. Use the op in Python": [[11, "use-the-op-in-python"]], "8. Build the package": [[11, "build-the-package"]], "9. Install and Verify": [[11, "install-and-verify"]], "Optimizations Design": [[12, "optimizations-design"]], "oneDNN object cache optimization": [[13, "onednn-object-cache-optimization"]], "Optimization in convolution": [[13, "optimization-in-convolution"]], "Frequently Asked Questions": [[14, "frequently-asked-questions"]], "Troubleshooting": [[14, "troubleshooting"]], "Build from source": [[14, "build-from-source"], [31, "build-from-source"]], "Runtime": [[14, "runtime"]], "INT8 Quantization": [[15, "int8-quantization"], [21, "int8-quantization"]], "Overview": [[15, "overview"], [17, "overview"], [19, "overview"], [20, "overview"], [27, "overview"], [28, "overview"], [29, "overview"], [30, "overview"], [34, "overview"]], "Usage": [[15, "usage"], [17, "usage"], [18, "usage"], [18, "id1"], [19, "usage"], [22, "usage"], [26, "usage"]], "OpenXLA Support on GPU via PJRT": [[16, "openxla-support-on-gpu-via-pjrt"]], "1. Overview": [[16, "overview"]], "2. 
Hardware and Software Requirement": [[16, "hardware-and-software-requirement"]], "Hardware Requirements": [[16, "hardware-requirements"], [32, "hardware-requirements"], [34, "hardware-requirements"], [36, "hardware-requirements"], [37, "hardware-requirements"], [46, "hardware-requirements"], [49, "hardware-requirements"], [50, "hardware-requirements"], [51, "hardware-requirements"], [52, "hardware-requirements"], [55, "hardware-requirements"]], "Software Requirements": [[16, "software-requirements"], [32, "software-requirements"], [36, "software-requirements"], [37, "software-requirements"]], "Install GPU Drivers": [[16, "install-gpu-drivers"], [37, "install-gpu-drivers"]], "3. Build Library for JAX": [[16, "build-library-for-jax"]], "4. Run JAX Example": [[16, "run-jax-example"]], "XPUAutoShard on GPU [Experimental]": [[17, "xpuautoshard-on-gpu-experimental"], [21, "xpuautoshard-on-gpu-experimental"]], "Code Structure": [[17, "code-structure"]], "Python API": [[17, "python-api"], [18, "python-api"], [43, "python-api"], [55, "python-api"]], "Dump the graph": [[17, "dump-the-graph"]], "Examples": [[17, "examples"], [28, "examples"], [39, "examples"], [42, "examples"]], "Tune Advanced Auto Mixed Precision": [[18, "tune-advanced-auto-mixed-precision"]], "Background": [[18, "background"], [40, "background"], [47, "background"]], "Numeric Stability": [[18, "numeric-stability"]], "Configuration List": [[18, "configuration-list"]], "Example of Mix Precision by List": [[18, "example-of-mix-precision-by-list"]], "Rule to Improve Performance by the Configuration List": [[18, "rule-to-improve-performance-by-the-configuration-list"]], "Python API Attribute & Environment Variable": [[18, "python-api-attribute-environment-variable"]], "Environment Variable Difference with Stock TensorFlow": [[18, "environment-variable-difference-with-stock-tensorflow"]], "Example": [[18, "example"], [19, "example"], [35, "example"]], "End-to-end Example": [[18, "end-to-end-example"]], "Tuning 
Performance Example on MobileNet": [[18, "tuning-performance-example-on-mobilenet"]], "Advanced Auto Mixed Precision": [[19, "advanced-auto-mixed-precision"], [19, "id1"]], "Advanced AMP vs. Stock TensorFlow AMP": [[19, "advanced-amp-vs-stock-tensorflow-amp"]], "Data Type": [[19, "data-type"]], "Graph Optimizer": [[19, "graph-optimizer"]], "Feature": [[19, "feature"]], "Tune Advanced AMP Manually": [[19, "tune-advanced-amp-manually"]], "Quick Training Example": [[19, "quick-training-example"]], "Setup": [[19, "setup"], [27, "setup"]], "Enable Advanced AMP": [[19, "enable-advanced-amp"]], "Original Code": [[19, "original-code"]], "Notice": [[19, "notice"]], "Tips": [[19, "tips"]], "Log and Save Optimized Graph": [[19, "log-and-save-optimized-graph"]], "Custom Operation": [[19, "custom-operation"]], "Environment Variables": [[20, "environment-variables"], [28, "environment-variables"]], "Configuration via Environment Variables": [[20, "configuration-via-environment-variables"]], "ITEX_VERBOSE level definition": [[20, "itex-verbose-level-definition"]], "Environment Variables with Python APIs": [[20, "environment-variables-with-python-apis"]], "Backend and Config Protocol": [[20, "backend-and-config-protocol"]], "Auto Mixed Precision Options": [[20, "auto-mixed-precision-options"]], "Features": [[21, "features"]], "Operator Optimization": [[21, "operator-optimization"]], "Graph Optimization": [[21, "graph-optimization"]], "Advanced Auto Mixed Precision (AMP)": [[21, "advanced-auto-mixed-precision-amp"]], "Ease-of-use Python API": [[21, "ease-of-use-python-api"]], "GPU Profiler": [[21, "gpu-profiler"], [22, "gpu-profiler"]], "CPU Launcher [Experimental]": [[21, "cpu-launcher-experimental"]], "OpenXLA Support on GPU [Experimental]": [[21, "openxla-support-on-gpu-experimental"]], "TensorFlow Serving": [[21, "tensorflow-serving"]], "Example:": [[22, "example"]], "FAQ": [[22, "faq"], [43, "faq"], [44, "faq"], [46, "faq"], [49, "faq"], [50, "faq"], [51, "faq"], [52, "faq"]], 
"Infrastructure": [[23, "infrastructure"]], "Architecture": [[23, "architecture"]], "TensorFlow Public API": [[23, "tensorflow-public-api"]], "Custom API": [[23, "custom-api"]], "Intel Advanced Feature and Extension Management": [[23, "intel-advanced-feature-and-extension-management"]], "XPU Engine": [[23, "xpu-engine"]], "Graph fusion": [[24, "graph-fusion"]], "Basic fusion": [[24, "basic-fusion"]], "Mixed data type fusion": [[24, "mixed-data-type-fusion"]], "Implementation Details": [[24, "implementation-details"]], "Generic layout optimizer": [[24, "generic-layout-optimizer"]], "Customized Operators": [[25, "customized-operators"]], "itex.ops.AdamWithWeightDecayOptimizer": [[25, "itex-ops-adamwithweightdecayoptimizer"]], "itex.ops.LayerNormalization": [[25, "itex-ops-layernormalization"]], "itex.ops.GroupNormalization": [[25, "itex-ops-groupnormalization"]], "itex.ops.gelu": [[25, "itex-ops-gelu"]], "itex.ops.ItexLSTM": [[25, "itex-ops-itexlstm"]], "Operators Override": [[26, "operators-override"]], "Layer Normalization": [[26, "layer-normalization"]], "Dense Layer": [[26, "dense-layer"]], "Gelu Activation": [[26, "gelu-activation"]], "Instance Normalization": [[26, "instance-normalization"]], "LSTM": [[26, "lstm"]], "Keras Mixed Precision": [[27, "keras-mixed-precision"]], "How to identify different hardware types?": [[27, "how-to-identify-different-hardware-types"]], "Setting the dtype policy": [[27, "setting-the-dtype-policy"]], "Building the model": [[27, "building-the-model"]], "Training the model with Model.fit": [[27, "training-the-model-with-model-fit"]], "Loss scaling": [[27, "loss-scaling"]], "Underflow and Overflow": [[27, "underflow-and-overflow"]], "Loss scaling overview": [[27, "loss-scaling-overview"]], "Training the model with a custom training loop": [[27, "training-the-model-with-a-custom-training-loop"]], "Launch Script User Guide": [[28, "launch-script-user-guide"]], "Common Execution Mode": [[28, "common-execution-mode"]], "Latency mode": 
[[28, "latency-mode"]], "Throughput mode": [[28, "throughput-mode"]], "Basic Settings": [[28, "basic-settings"]], "Launch Log": [[28, "launch-log"]], "Advanced Settings": [[28, "advanced-settings"]], "Multi-instance": [[28, "multi-instance"]], "NUMA Control": [[28, "numa-control"]], "Memory Allocator": [[28, "memory-allocator"], [29, "memory-allocator"]], "Single instance for inference": [[28, "single-instance-for-inference"]], "I. Use all physical cores": [[28, "i-use-all-physical-cores"]], "II. Use all cores including logical cores": [[28, "ii-use-all-cores-including-logical-cores"]], "III. Use physical cores on one node": [[28, "iii-use-physical-cores-on-one-node"]], "IV. Use your designated number of cores": [[28, "iv-use-your-designated-number-of-cores"]], "Multiple instances for inference": [[28, "multiple-instances-for-inference"]], "V. Throughput mode": [[28, "v-throughput-mode"]], "VI. Latency mode": [[28, "vi-latency-mode"]], "VII. Your designated number of instances": [[28, "vii-your-designated-number-of-instances"]], "VIII. Your designated number of instances and instance index": [[28, "viii-your-designated-number-of-instances-and-instance-index"]], "Set environment variables for inference": [[28, "set-environment-variables-for-inference"]], "IX. Set environment variable TF_NUM_INTRAOP_THREADS": [[28, "ix-set-environment-variable-tf-num-intraop-threads"]], "X. 
Set environment variable TF_NUM_INTEROP_THREADS": [[28, "x-set-environment-variable-tf-num-interop-threads"]], "Usage of TCMalloc/Jemalloc/Default memory allocator": [[28, "usage-of-tcmalloc-jemalloc-default-memory-allocator"]], "Jemalloc": [[28, "jemalloc"]], "TCMalloc": [[28, "tcmalloc"], [29, "tcmalloc"]], "Default memory allocator": [[28, "default-memory-allocator"]], "Practice Guide": [[29, "practice-guide"]], "Table of Contents": [[29, "table-of-contents"]], "CPU Practice Guide": [[29, "cpu-practice-guide"]], "Hardware Configuration": [[29, "hardware-configuration"]], "Non-Uniform Memory Access (NUMA)": [[29, "non-uniform-memory-access-numa"]], "Software Configuration": [[29, "software-configuration"]], "Memory Layout format": [[29, "memory-layout-format"]], "Numactl": [[29, "numactl"]], "OpenMP": [[29, "openmp"]], "OMP_NUM_THREADS": [[29, "omp-num-threads"]], "GNU OpenMP": [[29, "gnu-openmp"]], "Intel OpenMP": [[29, "intel-openmp"]], "GPU Practice Guide": [[29, "gpu-practice-guide"]], "Python APIs": [[30, "python-apis"]], "Prerequisite: import intel_extension_for_tensorflow as itex": [[30, "prerequisite-import-intel-extension-for-tensorflow-as-itex"]], "Python APIs and Environment Variable Names": [[30, "python-apis-and-environment-variable-names"]], "Python APIs and preserved environment variable Names": [[30, "python-apis-and-preserved-environment-variable-names"]], "Intel\u00ae Extension for TensorFlow* Config Protocol": [[30, "intel-extension-for-tensorflow-config-protocol"]], "itex.ConfigProto": [[30, "itex-configproto"]], "itex.GPUOptions": [[30, "itex-gpuoptions"]], "itex.GraphOptions": [[30, "itex-graphoptions"]], "itex.AutoMixedPrecisionOptions": [[30, "itex-automixedprecisionoptions"]], "itex.ShardingConfig": [[30, "itex-shardingconfig"]], "itex.DebugOptions": [[30, "itex-debugoptions"]], "itex.set_config": [[30, "itex-set-config"]], "itex.get_config": [[30, "itex-get-config"]], "itex operators": [[30, "itex-operators"]], "itex ops override": [[30, 
"itex-ops-override"]], "itex graph": [[30, "itex-graph"]], "itex version": [[30, "itex-version"]], "Install TensorFlow Serving with Intel\u00ae Extension for TensorFlow*": [[31, "install-tensorflow-serving-with-intel-extension-for-tensorflow"]], "Install Model Server": [[31, "install-model-server"]], "Install using Docker": [[31, "install-using-docker"]], "1. Build Intel\u00ae Extension for TensorFlow* C++ library": [[31, "build-intel-extension-for-tensorflow-c-library"]], "2. Build TensorFlow Serving": [[31, "build-tensorflow-serving"]], "Build Docker image from Dockerfile": [[31, "build-docker-image-from-dockerfile"]], "Run sample": [[31, "run-sample"]], "Experimental: Intel\u00ae Arc\u2122 A-Series GPU Software Installation": [[32, "experimental-intel-arc-a-series-gpu-software-installation"]], "Experimental Release": [[32, "experimental-release"]], "Windows Subsystem for Linux 2 (WSL2)": [[32, "windows-subsystem-for-linux-2-wsl2"], [32, "id1"]], "Native Linux Running Directly on Hardware": [[32, "native-linux-running-directly-on-hardware"], [32, "id2"]], "Step-By-Step Instructions": [[32, "step-by-step-instructions"]], "1. Install GPU Drivers": [[32, "install-gpu-drivers"]], "Windows GPU Drivers": [[32, "windows-gpu-drivers"]], "Ubuntu Linux Installed in WSL2": [[32, "ubuntu-linux-installed-in-wsl2"]], "2. Install TensorFlow* via PyPI Wheel in Linux": [[32, "install-tensorflow-via-pypi-wheel-in-linux"]], "Install TensorFlow": [[32, "install-tensorflow"], [34, "install-tensorflow"], [36, "install-tensorflow"], [37, "install-tensorflow"]], "Virtual environment install": [[32, "virtual-environment-install"], [36, "virtual-environment-install"], [37, "virtual-environment-install"]], "System environment install": [[32, "system-environment-install"], [36, "system-environment-install"], [37, "system-environment-install"]], "3. Install Intel\u00ae Extension for TensorFlow*": [[32, "install-intel-extension-for-tensorflow"]], "4. 
Verify the Installation": [[32, "verify-the-installation"]], "Optional: Install Full Intel\u00ae oneAPI": [[32, "optional-install-full-intel-oneapi"]], "Setup environment variables": [[32, "setup-environment-variables"], [37, "setup-environment-variables"]], "Conda Environment Installation Instructions": [[33, "conda-environment-installation-instructions"]], "Preconditions": [[33, "preconditions"]], "Step by step instructions:": [[33, "step-by-step-instructions"]], "Requirements": [[34, "requirements"]], "Common Requirements": [[34, "common-requirements"]], "Install Bazel": [[34, "install-bazel"]], "Download Source Code": [[34, "download-source-code"]], "Create a Conda Environment": [[34, "create-a-conda-environment"]], "Extra Requirements for XPU/GPU Build Only": [[34, "extra-requirements-for-xpu-gpu-build-only"]], "Install Intel GPU Driver": [[34, "install-intel-gpu-driver"]], "Install oneAPI Base Toolkit": [[34, "install-oneapi-base-toolkit"]], "Build Intel\u00ae Extension for TensorFlow* PyPI": [[34, "build-intel-extension-for-tensorflow-pypi"]], "Configure": [[34, "configure"]], "Configure For CPU": [[34, "configure-for-cpu"]], "Configure For GPU/XPU": [[34, "configure-for-gpu-xpu"]], "Build Source Code": [[34, "build-source-code"]], "Additional": [[34, "additional"]], "Configure Example for CPU": [[34, "configure-example-for-cpu"]], "Configure Example For GPU or XPU": [[34, "configure-example-for-gpu-or-xpu"]], "Intel\u00ae Extension for TensorFlow* for C++": [[35, "intel-extension-for-tensorflow-for-c"]], "Prepare": [[35, "prepare"], [41, "prepare"]], "Configure the build": [[35, "configure-the-build"]], "Build the CC library": [[35, "build-the-cc-library"]], "GPU support": [[35, "gpu-support"]], "CPU support": [[35, "cpu-support"]], "Prepare Tensorflow* CC library and header files": [[35, "prepare-tensorflow-cc-library-and-header-files"]], "Option 1: Extract from Tensorflow* python package (Recommended)": [[35, 
"option-1-extract-from-tensorflow-python-package-recommended"]], "Option 2: Build from TensorFlow* source code": [[35, "option-2-build-from-tensorflow-source-code"]], "Integrate the CC library": [[35, "integrate-the-cc-library"]], "Linker": [[35, "linker"]], "Load": [[35, "load"]], "Build and run": [[35, "build-and-run"]], "Intel CPU Software Installation": [[36, "intel-cpu-software-installation"]], "Install via Docker container": [[36, "install-via-docker-container"], [37, "install-via-docker-container"]], "Build Docker container from Dockerfile": [[36, "build-docker-container-from-dockerfile"], [37, "build-docker-container-from-dockerfile"]], "Get docker container from dockerhub": [[36, "get-docker-container-from-dockerhub"], [37, "get-docker-container-from-dockerhub"]], "Install via PyPI wheel in bare metal": [[36, "install-via-pypi-wheel-in-bare-metal"], [37, "install-via-pypi-wheel-in-bare-metal"]], "Install Intel\u00ae Extension for TensorFlow*": [[36, "install-intel-extension-for-tensorflow"], [37, "install-intel-extension-for-tensorflow"]], "Verify the Installation": [[36, "verify-the-installation"], [37, "verify-the-installation"]], "Intel XPU Software Installation": [[37, "intel-xpu-software-installation"]], "Install oneAPI Base Toolkit Packages": [[37, "install-oneapi-base-toolkit-packages"]], "Check the Environment for XPU": [[37, "check-the-environment-for-xpu"]], "XPU for CPU only platform": [[37, "xpu-for-cpu-only-platform"]], "Installation Guide": [[38, "installation-guide"]], "Accelerate AlexNet by Quantization with Intel\u00ae Extension for Tensorflow*": [[40, "accelerate-alexnet-by-quantization-with-intel-extension-for-tensorflow"]], "Hardware Environment": [[40, "hardware-environment"], [47, "hardware-environment"]], "GPU": [[40, "gpu"], [47, "gpu"]], "Local Server": [[40, "local-server"], [47, "local-server"]], "Intel\u00ae DevCloud": [[40, "intel-devcloud"], [47, "intel-devcloud"]], "Running Environment": [[40, "running-environment"], [47, 
"running-environment"]], "Set up Base Running Environment": [[40, "set-up-base-running-environment"]], "Set up Intel\u00ae Extension for Tensorflow* for GPU": [[40, "set-up-intel-extension-for-tensorflow-for-gpu"]], "Execute": [[40, "execute"], [51, "execute"]], "Common Guide for Running": [[41, "common-guide-for-running"]], "Intel GPU Driver": [[41, "intel-gpu-driver"]], "Intel\u00ae oneAPI Base Toolkit": [[41, "intel-oneapi-base-toolkit"]], "Setup Running Environment": [[41, "setup-running-environment"], [43, "setup-running-environment"], [44, "setup-running-environment"], [46, "setup-running-environment"], [49, "setup-running-environment"], [50, "setup-running-environment"], [51, "setup-running-environment"], [52, "setup-running-environment"], [53, "setup-running-environment"]], "Running": [[41, "running"]], "Enable oneAPI Running Environment": [[41, "enable-oneapi-running-environment"]], "Enable Virtual Running Environment": [[41, "enable-virtual-running-environment"]], "Run Script": [[41, "run-script"]], "Speed up Inference of Inception v4 by Advanced Automatic Mixed Precision on Intel CPU and GPU via Docker Container or Bare Metal": [[43, "speed-up-inference-of-inception-v4-by-advanced-automatic-mixed-precision-on-intel-cpu-and-gpu-via-docker-container-or-bare-metal"]], "Step": [[43, "step"]], "Hardware Requirement": [[43, "hardware-requirement"], [56, "hardware-requirement"]], "Prepare for GPU (Skip this Step for CPU)": [[43, "prepare-for-gpu-skip-this-step-for-cpu"]], "Clone the Repository": [[43, "clone-the-repository"]], "Download the Pretrained-model": [[43, "download-the-pretrained-model"]], "Enable Running Environment": [[43, "enable-running-environment"], [44, "enable-running-environment"], [46, "enable-running-environment"], [49, "enable-running-environment"], [50, "enable-running-environment"], [51, "enable-running-environment"], [52, "enable-running-environment"], [55, "enable-running-environment"]], "Execute Testing and Comparing the Performance 
of FP32 and Advanced AMP on CPU and GPU in Docker Container or Bare Metal": [[43, "execute-testing-and-comparing-the-performance-of-fp32-and-advanced-amp-on-cpu-and-gpu-in-docker-container-or-bare-metal"]], "Environment Variable Configuration": [[43, "environment-variable-configuration"]], "Result": [[43, "result"]], "Advanced: Enable Advanced AMP Method": [[43, "advanced-enable-advanced-amp-method"]], "ResNet50 Inference on Intel CPU and GPU": [[44, "resnet50-inference-on-intel-cpu-and-gpu"]], "Prerequisites": [[44, "prerequisites"], [46, "prerequisites"], [49, "prerequisites"], [50, "prerequisites"], [51, "prerequisites"], [52, "prerequisites"], [55, "prerequisites"]], "Prepare for GPU (Skip this step for CPU)": [[44, "prepare-for-gpu-skip-this-step-for-cpu"], [49, "prepare-for-gpu-skip-this-step-for-cpu"], [51, "prepare-for-gpu-skip-this-step-for-cpu"]], "Executes the Example with Python API": [[44, "executes-the-example-with-python-api"], [49, "executes-the-example-with-python-api"], [55, "executes-the-example-with-python-api"]], "Example Output": [[44, "example-output"], [48, "example-output"], [49, "example-output"], [55, "example-output"]], "Accelerate Deep Learning Training and Inference for Model Zoo Workloads on Intel GPU": [[45, "accelerate-deep-learning-training-and-inference-for-model-zoo-workloads-on-intel-gpu"]], "Quick Start Guide": [[45, "quick-start-guide"]], "Run Models in the Docker Container": [[45, "run-models-in-the-docker-container"]], "Run Models on Bare Metal": [[45, "run-models-on-bare-metal"]], "Accelerate BERT-Large Pretraining on Intel GPU": [[46, "accelerate-bert-large-pretraining-on-intel-gpu"]], "Model Code change": [[46, "model-code-change"], [49, "model-code-change"], [50, "model-code-change"], [52, "model-code-change"]], "Prepare for GPU": [[46, "prepare-for-gpu"], [50, "prepare-for-gpu"], [52, "prepare-for-gpu"], [55, "prepare-for-gpu"]], "Prepare Dataset": [[46, "prepare-dataset"], [50, "prepare-dataset"]], "Execute the 
Example": [[46, "execute-the-example"], [50, "execute-the-example"], [52, "execute-the-example"]], "Pretraining Command": [[46, "pretraining-command"]], "Finetune Command": [[46, "finetune-command"]], "Quantize Inception V3 by Intel\u00ae Extension for Tensorflow* on Intel\u00ae Xeon\u00ae": [[47, "quantize-inception-v3-by-intel-extension-for-tensorflow-on-intel-xeon"]], "Configuration": [[47, "configuration"]], "Intel\u00ae Extension for Tensorflow* Version": [[47, "intel-extension-for-tensorflow-version"]], "Enable oneDNN Graph": [[47, "enable-onednn-graph"]], "Disable Constant Folding Function": [[47, "disable-constant-folding-function"]], "CPU": [[47, "cpu"]], "Check Intel\u00ae Deep Learning Boost": [[47, "check-intel-deep-learning-boost"]], "Check Intel\u00ae Advanced Matrix Extensions": [[47, "check-intel-advanced-matrix-extensions"]], "Startup Jupyter Notebook": [[47, "startup-jupyter-notebook"], [51, "startup-jupyter-notebook"]], "License": [[47, "license"], [56, "license"]], "Quick Example on Intel CPU and GPU": [[48, "quick-example-on-intel-cpu-and-gpu"]], "Installation": [[48, "installation"]], "Code": [[48, "code"]], "quick_example.py": [[48, "quick-example-py"]], "Notes": [[48, "notes"]], "Stable Diffusion Inference for Text2Image on Intel GPU": [[49, "stable-diffusion-inference-for-text2image-on-intel-gpu"]], "Running the Jupyter Notebook": [[49, "running-the-jupyter-notebook"]], "FP32 Inference": [[49, "fp32-inference"]], "FP16 Inference": [[49, "fp16-inference"]], "Accuracy": [[49, "accuracy"], [52, "accuracy"]], "Accelerate 3D-Unet Training w/o horovod for medical image segmentation on Intel GPU": [[50, "accelerate-3d-unet-training-w-o-horovod-for-medical-image-segmentation-on-intel-gpu"]], "Single Tile": [[50, "single-tile"]], "Multi-tile with horovod": [[50, "multi-tile-with-horovod"]], "BERT Training for Classifying Text on Intel CPU and GPU": [[51, "bert-training-for-classifying-text-on-intel-cpu-and-gpu"]], "Download Jupyter Code:": [[51, 
"download-jupyter-code"]], "FP8 BERT-Large Fine-tuning for Classifying Text on Intel GPU": [[52, "fp8-bert-large-fine-tuning-for-classifying-text-on-intel-gpu"]], "BF16 + FP8 Fine-tuning": [[52, "bf16-fp8-fine-tuning"]], "Distributed Training Example with Intel\u00ae Optimization for Horovod* on Intel\u00ae GPU": [[53, "distributed-training-example-with-intel-optimization-for-horovod-on-intel-gpu"]], "Dependency": [[53, "dependency"], [54, "dependency"]], "Create Virtual Environment": [[53, "create-virtual-environment"]], "Install": [[53, "install"], [56, "install"]], "Prepare Example Code": [[53, "prepare-example-code"]], "Clone Horovod Repo": [[53, "clone-horovod-repo"]], "Download Patch": [[53, "download-patch"]], "Apply Patch for Intel GPU": [[53, "apply-patch-for-intel-gpu"]], "Execution": [[53, "execution"], [54, "execution"]], "Enable oneAPI": [[53, "enable-oneapi"]], "Device Count": [[53, "device-count"]], "Running Command": [[53, "running-command"]], "Output": [[53, "output"]], "Distributed Training Example with Intel\u00ae Optimization for Horovod*": [[54, "distributed-training-example-with-intel-optimization-for-horovod"]], "Model Information": [[54, "model-information"]], "Model examples preparation": [[54, "model-examples-preparation"]], "Model Repo": [[54, "model-repo"]], "Download Dataset": [[54, "download-dataset"]], "Set Model Parameters": [[54, "set-model-parameters"]], "HVD command": [[54, "hvd-command"]], "OUTPUT": [[54, "output"]], "Performance Data": [[54, "performance-data"]], "Accelerate ResNet50 Training by XPUAutoShard on Intel GPU": [[55, "accelerate-resnet50-training-by-xpuautoshard-on-intel-gpu"]], "Prepare the Codes": [[55, "prepare-the-codes"]], "Install Other Required Packages": [[55, "install-other-required-packages"]], "Setup PYTHONPATH": [[55, "setup-pythonpath"]], "Without XPUAutoShard": [[55, "without-xpuautoshard"]], "With XPUAutoShard": [[55, "with-xpuautoshard"]], "Sharding Parameters Setting": [[55, 
"sharding-parameters-setting"]], "Further Settings": [[55, "further-settings"]], "Executing Command": [[55, "executing-command"]], "Quick Get Started*": [[56, "quick-get-started"]], "Software Requirement": [[56, "software-requirement"]], "Installation Channel:": [[56, "installation-channel"]], "Compatibility Table": [[56, "compatibility-table"]], "Install for XPU": [[56, "install-for-xpu"]], "Install for CPU": [[56, "install-for-cpu"]], "Install for weekly binaries": [[56, "install-for-weekly-binaries"]], "Install for GPU weekly": [[56, "install-for-gpu-weekly"]], "Contributing": [[56, "contributing"]], "Support": [[56, "support"]], "Security": [[56, "security"]]}, "indexentries": {}})
\ No newline at end of file