-
-

Distributed Training Example with Intel® Optimization for Horovod*

-
-

Model Information

| Use Case | Framework | Model Repo | Branch/Commit/Tag | Optional Patch |
| --- | --- | --- | --- | --- |
| Training | TensorFlow | Tensorflow-Models | v2.8.0 | itex.yaml, itex_dummy.yaml, hvd_support_light.patch or hvd_support.patch |

-
-

Dependency

- -
pip install gin gin-config tensorflow-addons tensorflow-model-optimization tensorflow-datasets
-
-
-
-
-

Model examples preparation

-
-

Model Repo

-
WORKSPACE=xxxx # set your workspace folder
-cd $WORKSPACE
-git clone -b v2.8.0 https://github.com/tensorflow/models.git tensorflow-models
-cd tensorflow-models
-git apply path/to/hvd_support_light.patch  # or path/to/hvd_support.patch
-
-
-

hvd_support_light.patch contains the minimum set of changes:

  • hvd.init(): initializes Horovod, including resource allocation.

  • tf.config.experimental.set_memory_growth(): when memory growth is enabled, runtime initialization does not allocate all of the memory on the device.

  • tf.config.experimental.set_visible_devices(): sets the list of visible devices.

  • strategy_scope: removes TensorFlow's native distribution strategy.

  • hvd.DistributedOptimizer(): uses the Horovod distributed optimizer.

  • dataset.shard(): all workers run the same code but on different data. The dataset is split equally among the workers according to their index.
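To make the dataset.shard() item concrete, here is a minimal shell sketch of the assignment rule: with N workers, the worker with index i receives every N-th element, starting at element i. It only mimics the rule on a toy sequence with awk; the helper name shard_demo is hypothetical and is not part of the patch or of TensorFlow.

```shell
# shard_demo NUM_SHARDS INDEX
# Read a toy dataset from stdin (one element per line) and print only the
# elements that the worker with rank INDEX would receive.
shard_demo() {
    awk -v n="$1" -v i="$2" '(NR - 1) % n == i'
}

# Two workers splitting the elements 0..5 between them.
seq 0 5 | shard_demo 2 0 | xargs   # rank 0 gets: 0 2 4
seq 0 5 | shard_demo 2 1 | xargs   # rank 1 gets: 1 3 5
```

Because every rank applies the same rule with a different index, the shards do not overlap and together cover the whole dataset.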

hvd_support.patch additionally adds the LARS optimizer (see the LARS paper).

-
-
-

Download Dataset

-

Download the ImageNet dataset from https://image-net.org/download-images.php

-

Note: The dataset may be used only for non-commercial research and/or educational purposes.

-
-
-
-

Execution

-
-

Set Model Parameters

-

Export these parameters in your script or environment.

-
export PYTHONPATH=${WORKSPACE}/tensorflow-models
-MODEL_DIR=${WORKSPACE}/output
-DATA_DIR=${WORKSPACE}/imagenet_data/imagenet
-
-CONFIG_FILE=path/to/itex.yaml
-NUMBER_OF_PROCESS=2
-PROCESS_PER_NODE=2
-
-
-
  • Download itex.yaml or itex_dummy.yaml and set one of them as CONFIG_FILE; the model then runs with real data or dummy data, respectively. The default value is itex.yaml.

  • Set NUMBER_OF_PROCESS and PROCESS_PER_NODE according to the number of Horovod ranks you need. The default is a 2-rank task.
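Before launching, it can help to check the two rank settings and see how many nodes they imply. The sketch below is one possible sanity check, not part of the example scripts: the helper name node_count is hypothetical, and it assumes ranks fill nodes in order, PROCESS_PER_NODE at a time.

```shell
# node_count NUMBER_OF_PROCESS PROCESS_PER_NODE
# Print the number of nodes needed to place all ranks, or fail if either
# value is not a positive integer.
node_count() {
    for v in "$1" "$2"; do
        case "$v" in
            ''|*[!0-9]*|0*)
                echo "node_count: expected positive integers" >&2
                return 1 ;;
        esac
    done
    # Ceiling division: a partially filled last node still counts as a node.
    echo $(( ($1 + $2 - 1) / $2 ))
}

# The default 2-rank task with 2 processes per node fits on a single node.
node_count 2 2
```

With the variables from the section above, the call would be node_count "$NUMBER_OF_PROCESS" "$PROCESS_PER_NODE".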
-
-

HVD command

-
if [ ! -d "$MODEL_DIR" ]; then
-    mkdir -p $MODEL_DIR
-else
-    rm -rf $MODEL_DIR && mkdir -p $MODEL_DIR                         
-fi
-
-mpirun -np $NUMBER_OF_PROCESS -ppn $PROCESS_PER_NODE --prepend-rank \
-python ${PYTHONPATH}/official/vision/image_classification/classifier_trainer.py \
---mode=train_and_eval \
---model_type=resnet \
---dataset=imagenet \
---model_dir=$MODEL_DIR \
---data_dir=$DATA_DIR \
---config_file=$CONFIG_FILE
-
-
-
-
-
-
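The directory reset at the top of the command runs rm -rf on an unquoted variable, which is risky if MODEL_DIR is empty or mistyped. A more defensive variant might look like the sketch below; the helper name fresh_dir is hypothetical and is not part of the example.

```shell
# fresh_dir DIR
# Recreate DIR as an empty directory. Refuses an empty argument or "/" so
# that an unset variable cannot turn the rm -rf into something destructive.
fresh_dir() {
    case "$1" in
        ''|/)
            echo "fresh_dir: refusing to reset '$1'" >&2
            return 1 ;;
    esac
    rm -rf -- "$1" && mkdir -p -- "$1"
}
```

It would be called as fresh_dir "$MODEL_DIR" before the mpirun line. Since mkdir -p succeeds whether or not the directory already exists, no if/else is needed.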

OUTPUT

-
-

Performance Data

-
[1] I0909 03:33:23.323099 140645511436096 keras_utils.py:145] TimeHistory: xxxx seconds, xxxx examples/second between steps 0 and 100
-[0] I0909 03:33:23.324534 140611700504384 keras_utils.py:145] TimeHistory: xxxx seconds, xxxx examples/second between steps 0 and 100
-[0] I0909 03:33:43.037004 140611700504384 keras_utils.py:145] TimeHistory: xxxx seconds, xxxx examples/second between steps 100 and 200
-[1] I0909 03:33:43.037142 140645511436096 keras_utils.py:145] TimeHistory: xxxx seconds, xxxx examples/second between steps 100 and 200
-[1] I0909 03:34:03.213994 140645511436096 keras_utils.py:145] TimeHistory: xxxx seconds, xxxx examples/second between steps 200 and 300
-[0] I0909 03:34:03.214127 140611700504384 keras_utils.py:145] TimeHistory: xxxx seconds, xxxx examples/second between steps 200 and 300
-
-
-
-
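Each TimeHistory line reports the throughput of one rank, identified by the [rank] prefix that --prepend-rank adds. As a rough sketch, the per-rank average can be extracted from a captured log with awk. The helper name avg_throughput is hypothetical, and the log is assumed to follow exactly the line layout shown above.

```shell
# avg_throughput
# Read training log lines from stdin and print, for each rank, the mean of
# the "examples/second" values reported on the TimeHistory lines.
avg_throughput() {
    awk '
        /TimeHistory:/ {
            rank = $1                       # e.g. "[0]", added by --prepend-rank
            for (i = 2; i <= NF; i++)
                if ($i == "examples/second") {
                    sum[rank] += $(i - 1)   # the value precedes its unit
                    n[rank]++
                }
        }
        END { for (r in n) printf "%s %.1f\n", r, sum[r] / n[r] }
    ' | LC_ALL=C sort
}
```

Assuming the mpirun output was saved to a file, for example with tee, the summary is produced by feeding that file to the function on stdin: avg_throughput < your_log_file.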
+
+

Refer to train_resnet50

@@ -220,7 +105,7 @@

Performance DataSphinx using a theme provided by Read the Docs. - +

diff --git a/latest/examples/train_maskrcnn/README.html b/latest/examples/train_maskrcnn/README.html index ad8f61c7f..631c43d9a 100644 --- a/latest/examples/train_maskrcnn/README.html +++ b/latest/examples/train_maskrcnn/README.html @@ -4,7 +4,7 @@ - Accelerate Mask R-CNN Training w/o horovod on Intel GPU — Intel® Extension for TensorFlow* 0.1.dev1+g864d43e documentation + Accelerate Mask R-CNN Training w/o horovod on Intel GPU — Intel® Extension for TensorFlow* 0.1.dev1+g2198162 documentation @@ -233,7 +233,7 @@

FAQSphinx using a theme provided by Read the Docs. - +

diff --git a/latest/examples/train_resnet50/README.html b/latest/examples/train_resnet50/README.html index 97e83dc70..c29398dfa 100644 --- a/latest/examples/train_resnet50/README.html +++ b/latest/examples/train_resnet50/README.html @@ -4,7 +4,7 @@ - Resnet50 train on Intel GPU — Intel® Extension for TensorFlow* 0.1.dev1+g864d43e documentation + Resnet50 train on Intel GPU — Intel® Extension for TensorFlow* 0.1.dev1+g2198162 documentation @@ -249,7 +249,7 @@

Example Output with hvdSphinx using a theme provided by Read the Docs. - +

diff --git a/latest/examples/train_resnet50_with_autoshard/README.html b/latest/examples/train_resnet50_with_autoshard/README.html index da7172f9c..cff4c7691 100644 --- a/latest/examples/train_resnet50_with_autoshard/README.html +++ b/latest/examples/train_resnet50_with_autoshard/README.html @@ -4,7 +4,7 @@ - Accelerate ResNet50 Training by XPUAutoShard on Intel GPU — Intel® Extension for TensorFlow* 0.1.dev1+g864d43e documentation + Accelerate ResNet50 Training by XPUAutoShard on Intel GPU — Intel® Extension for TensorFlow* 0.1.dev1+g2198162 documentation @@ -275,7 +275,7 @@

Example OutputSphinx using a theme provided by Read the Docs. - +

diff --git a/latest/genindex.html b/latest/genindex.html index ed516fe36..e8488c994 100644 --- a/latest/genindex.html +++ b/latest/genindex.html @@ -3,7 +3,7 @@ - Index — Intel® Extension for TensorFlow* 0.1.dev1+g864d43e documentation + Index — Intel® Extension for TensorFlow* 0.1.dev1+g2198162 documentation @@ -106,7 +106,7 @@

Index

Built with Sphinx using a theme provided by Read the Docs. - +

diff --git a/latest/get_started.html b/latest/get_started.html index 5a79e1cb3..dba6c99c3 100644 --- a/latest/get_started.html +++ b/latest/get_started.html @@ -4,7 +4,7 @@ - Quick Get Started* — Intel® Extension for TensorFlow* 0.1.dev1+g864d43e documentation + Quick Get Started* — Intel® Extension for TensorFlow* 0.1.dev1+g2198162 documentation @@ -310,7 +310,7 @@

LicenseSphinx using a theme provided by Read the Docs. - +

diff --git a/latest/objects.inv b/latest/objects.inv index 571d9b77f..19bd6427e 100644 Binary files a/latest/objects.inv and b/latest/objects.inv differ diff --git a/latest/search.html b/latest/search.html index 9f547b163..e293e7e8b 100644 --- a/latest/search.html +++ b/latest/search.html @@ -3,7 +3,7 @@ - Search — Intel® Extension for TensorFlow* 0.1.dev1+g864d43e documentation + Search — Intel® Extension for TensorFlow* 0.1.dev1+g2198162 documentation @@ -114,7 +114,7 @@ Built with Sphinx using a theme provided by Read the Docs. - +

diff --git a/latest/searchindex.js b/latest/searchindex.js index 75290fec5..455d8b94e 100644 --- a/latest/searchindex.js +++ b/latest/searchindex.js @@ -1 +1 @@ -Search.setIndex({"docnames": ["CODE_OF_CONDUCT", "SECURITY", "docker/README", "docker/tensorflow-serving/README", "docs/README", "docs/build_docs/docs_build_tips", "docs/build_docs/source/index", "docs/community/contributing", "docs/community/releases", "docs/design/directory_structure", "docs/design/extension_design", "docs/design/how_to_write_custom_op", "docs/design/optimization/README", "docs/design/optimization/oneDNN_object_cache", "docs/guide/FAQ", "docs/guide/INT8_quantization", "docs/guide/OpenXLA_Support_on_GPU", "docs/guide/XPUAutoShard", "docs/guide/aamp_tune", "docs/guide/advanced_auto_mixed_precision", "docs/guide/environment_variables", "docs/guide/features", "docs/guide/how_to_enable_profiler", "docs/guide/infrastructure", "docs/guide/itex_fusion", "docs/guide/itex_ops", "docs/guide/itex_ops_override", "docs/guide/keras_mixed_precision", "docs/guide/launch", "docs/guide/practice_guide", "docs/guide/python_api", "docs/guide/tf_serving_install", "docs/install/experimental/install_for_arc_gpu", "docs/install/experimental/install_for_gpu_conda", "docs/install/how_to_build", "docs/install/install_for_cpp", "docs/install/install_for_cpu", "docs/install/install_for_xpu", "docs/install/installation_guide", "examples/README", "examples/accelerate_alexnet_by_quantization/README", "examples/common_guide_running", "examples/infer_inception_v4_amp/README", "examples/infer_resnet50/README", "examples/model_zoo_example/README", "examples/pretrain_bert/README", "examples/quantize_inception_v3/README", "examples/quick_example", "examples/stable_diffussion_inference/README", "examples/train_3d_unet/README", "examples/train_bert/README", "examples/train_bert_fp8/README", "examples/train_horovod/mnist/README", "examples/train_maskrcnn/README", "examples/train_resnet50/README", 
"examples/train_resnet50_with_autoshard/README", "get_started", "index"], "filenames": ["CODE_OF_CONDUCT.md", "SECURITY.md", "docker/README.md", "docker/tensorflow-serving/README.md", "docs/README.md", "docs/build_docs/docs_build_tips.md", "docs/build_docs/source/index.rst", "docs/community/contributing.md", "docs/community/releases.md", "docs/design/directory_structure.md", "docs/design/extension_design.md", "docs/design/how_to_write_custom_op.md", "docs/design/optimization/README.md", "docs/design/optimization/oneDNN_object_cache.md", "docs/guide/FAQ.md", "docs/guide/INT8_quantization.md", "docs/guide/OpenXLA_Support_on_GPU.md", "docs/guide/XPUAutoShard.md", "docs/guide/aamp_tune.md", "docs/guide/advanced_auto_mixed_precision.md", "docs/guide/environment_variables.md", "docs/guide/features.rst", "docs/guide/how_to_enable_profiler.md", "docs/guide/infrastructure.md", "docs/guide/itex_fusion.md", "docs/guide/itex_ops.md", "docs/guide/itex_ops_override.md", "docs/guide/keras_mixed_precision.md", "docs/guide/launch.md", "docs/guide/practice_guide.md", "docs/guide/python_api.md", "docs/guide/tf_serving_install.md", "docs/install/experimental/install_for_arc_gpu.md", "docs/install/experimental/install_for_gpu_conda.md", "docs/install/how_to_build.md", "docs/install/install_for_cpp.md", "docs/install/install_for_cpu.md", "docs/install/install_for_xpu.md", "docs/install/installation_guide.rst", "examples/README.md", "examples/accelerate_alexnet_by_quantization/README.md", "examples/common_guide_running.md", "examples/infer_inception_v4_amp/README.md", "examples/infer_resnet50/README.md", "examples/model_zoo_example/README.md", "examples/pretrain_bert/README.md", "examples/quantize_inception_v3/README.md", "examples/quick_example.md", "examples/stable_diffussion_inference/README.md", "examples/train_3d_unet/README.md", "examples/train_bert/README.md", "examples/train_bert_fp8/README.md", "examples/train_horovod/mnist/README.md", "examples/train_maskrcnn/README.md", 
"examples/train_resnet50/README.md", "examples/train_resnet50_with_autoshard/README.md", "get_started.md", "index.rst"], "titles": ["Contributor Covenant Code of Conduct", "Security Policy", "Intel\u00ae Extension for TensorFlow* Docker Container Guide", "Intel\u00ae Extension for TensorFlow* Serving - Docker Container Guide", "Welcome to Intel\u00ae Extension for TensorFlow* documentation", "Online Documentation Build Guide", "Welcome to Intel \u00ae Extension for TensorFlow* documentation!", "Contributing guidelines", "Releases", "Directory Tree Structure", "Extension Design", "How to write custom op", "Optimizations Design", "oneDNN object cache optimization", "Frequently Asked Questions", "INT8 Quantization", "OpenXLA Support on GPU via PJRT", "XPUAutoShard on GPU [Experimental]", "Tune Advanced Auto Mixed Precision", "Advanced Auto Mixed Precision", "Environment Variables", "Features", "GPU Profiler", "Infrastructure", "Graph fusion", "Customized Operators", "Operators Override", "Keras Mixed Precision", "Launch Script User Guide", "Practice Guide", "Python APIs", "Install TensorFlow Serving with Intel\u00ae Extension for TensorFlow*", "Experimental: Intel\u00ae Arc\u2122 A-Series GPU Software Installation", "Conda Environment Installation Instructions", "Overview", "Intel\u00ae Extension for TensorFlow* for C++", "Intel CPU Software Installation", "Intel XPU Software Installation", "Installation Guide", "Examples", "Accelerate AlexNet by Quantization with Intel\u00ae Extension for Tensorflow*", "Common Guide for Running", "Speed up Inference of Inception v4 by Advanced Automatic Mixed Precision on Intel CPU and GPU via Docker Container or Bare Metal", "ResNet50 Inference on Intel CPU and GPU", "Accelerate Deep Learning Training and Inference for Model Zoo Workloads on Intel GPU", "Accelerate BERT-Large Pretraining on Intel GPU", "Quantize Inception V3 by Intel\u00ae Extension for Tensorflow* on Intel\u00ae Xeon\u00ae", "Quick Example on Intel CPU and GPU", 
"Stable Diffusion Inference for Text2Image on Intel GPU", "Accelerate 3D-Unet Training w/o horovod for medical image segmentation on Intel GPU", "BERT Training for Classifying Text on Intel CPU and GPU", "FP8 BERT-Large Fine-tuning for Classifying Text on Intel GPU", "Distributed Training Example with Intel\u00ae Optimization for Horovod* on Intel\u00ae GPU", "Accelerate Mask R-CNN Training w/o horovod on Intel GPU", "Resnet50 train on Intel GPU", "Accelerate ResNet50 Training by XPUAutoShard on Intel GPU", "Quick Get Started*", "Welcome to Intel \u00ae Extension for TensorFlow* documentation!"], "terms": {"we": [0, 2, 7, 11, 16, 24, 27, 29, 30, 31, 33, 34, 35, 40, 41, 42, 45, 46, 48, 49, 51, 53, 54, 56], "member": [0, 30], "leader": 0, "make": [0, 2, 3, 5, 7, 11, 14, 16, 18, 19, 27, 29, 34, 35, 42], "particip": 0, "commun": [0, 2, 7, 9, 21, 23, 29, 37, 56], "harass": 0, "free": [0, 21, 28], "experi": [0, 4, 21, 23, 29], "everyon": 0, "regardless": 0, "ag": 0, "bodi": 0, "size": [0, 20, 25, 27, 28, 52, 55], "visibl": [0, 2, 11, 31], "invis": 0, "disabl": [0, 15, 19, 28, 29, 30], "ethnic": 0, "sex": 0, "characterist": 0, "gender": 0, "ident": [0, 27], "express": 0, "level": [0, 14, 16, 17, 23, 24, 27, 32], "educ": 0, "socio": 0, "econom": 0, "statu": [0, 11, 19, 35], "nation": 0, "person": 0, "appear": [0, 27], "race": 0, "cast": [0, 18, 24, 27], "color": 0, "religion": 0, "sexual": 0, "orient": 0, "act": [0, 21, 31], "interact": [0, 34], "wai": [0, 14, 19, 27, 31, 33], "contribut": [0, 4, 21, 28, 34], "an": [0, 2, 3, 7, 11, 13, 14, 18, 19, 21, 24, 25, 27, 28, 29, 31, 33, 34, 35, 37, 39, 46, 47, 51, 55, 56], "open": [0, 5, 7, 14, 18, 21, 31, 32, 42, 43, 45, 46, 48, 49, 50, 51, 53, 56], "welcom": [0, 7, 56], "divers": 0, "inclus": 0, "healthi": 0, "exampl": [0, 2, 4, 5, 7, 9, 11, 15, 20, 21, 24, 25, 26, 27, 29, 30, 31, 32, 33, 40, 42, 44, 46, 50, 56], "behavior": [0, 27, 28, 29], "posit": [0, 7], "environ": [0, 4, 11, 13, 15, 19, 21, 22, 23, 27, 29, 31, 38, 39, 56], 
"includ": [0, 7, 13, 14, 16, 17, 18, 20, 23, 35, 37, 46, 47, 56], "demonstr": [0, 16, 39], "empathi": 0, "kind": [0, 4, 21, 47], "toward": 0, "other": [0, 17, 20, 25, 27, 28, 29, 30, 31, 32, 34, 35, 37, 50, 52, 56], "peopl": 0, "Being": 0, "respect": [0, 28, 45], "differ": [0, 2, 4, 13, 16, 20, 21, 23, 25, 28, 29, 30, 38], "opinion": 0, "viewpoint": 0, "give": 0, "gracefulli": 0, "accept": [0, 7, 17], "construct": [0, 11, 17, 27], "feedback": [0, 7], "apolog": 0, "those": [0, 18, 19, 31, 54], "affect": [0, 18, 27], "mistak": 0, "learn": [0, 15, 19, 21, 25, 28, 29, 31, 34, 35, 39, 40, 56], "from": [0, 3, 4, 5, 7, 11, 16, 17, 18, 19, 21, 22, 27, 28, 29, 30, 32, 34, 38, 39, 42, 44, 45, 46, 49, 50, 56], "focus": 0, "what": [0, 14, 27], "i": [0, 4, 5, 7, 9, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 29, 30, 31, 32, 33, 34, 35, 36, 37, 39, 40, 42, 43, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56], "best": [0, 32], "just": 0, "u": [0, 16, 22, 28, 37], "individu": [0, 20], "overal": [0, 29], "unaccept": 0, "The": [0, 2, 4, 5, 7, 9, 13, 15, 16, 17, 18, 19, 20, 21, 22, 24, 25, 27, 28, 29, 30, 31, 32, 34, 35, 36, 37, 40, 42, 45, 46, 49, 50, 51, 52, 53, 54, 55], "us": [0, 2, 3, 4, 5, 7, 13, 14, 15, 16, 18, 19, 20, 22, 23, 24, 25, 26, 27, 29, 30, 32, 33, 34, 35, 37, 39, 41, 42, 44, 45, 46, 47, 49, 50, 51, 53, 56], "languag": [0, 35], "imageri": 0, "attent": [0, 20], "advanc": [0, 4, 14, 20, 30, 39, 56], "ani": [0, 4, 11, 20, 21, 23, 24, 27, 28, 32, 33, 34, 37, 40, 47, 50], "troll": 0, "insult": 0, "derogatori": 0, "comment": [0, 7, 14], "polit": 0, "attack": 0, "public": [0, 4, 5, 11, 21, 25, 30, 31], "privat": 0, "publish": [0, 5], "inform": [0, 1, 7, 8, 20, 28, 29, 30, 34, 35, 37, 40, 46, 54, 56], "physic": [0, 29, 55], "email": 0, "address": [0, 29, 32], "without": [0, 4, 18, 20, 21, 23, 27, 34, 35, 39, 46, 50, 56], "explicit": [0, 11, 27, 29], "permiss": [0, 5], "which": [0, 4, 7, 9, 13, 14, 15, 16, 17, 18, 19, 20, 24, 27, 28, 29, 30, 32, 34, 35, 37, 
40, 41, 46, 51], "could": [0, 14, 18, 27, 30, 35, 37, 40, 46], "reason": [0, 27], "consid": [0, 18, 52], "inappropri": 0, "profession": 0, "set": [0, 4, 7, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 29, 30, 32, 33, 35, 37, 42, 45, 46, 51, 56], "ar": [0, 2, 4, 5, 7, 11, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 31, 32, 34, 35, 36, 37, 39, 40, 42, 45, 46, 47, 49, 52, 54, 56], "clarifi": 0, "take": [0, 11, 24, 27, 28, 29, 31, 33, 45], "appropri": [0, 3, 29, 34, 35], "fair": 0, "action": [0, 5], "thei": [0, 18, 27, 28, 29], "deem": 0, "threaten": 0, "offens": 0, "harm": 0, "have": [0, 18, 27, 29, 32, 33, 34, 40, 46], "right": [0, 25], "remov": [0, 11, 18, 24], "edit": [0, 2], "reject": 0, "commit": [0, 5, 17, 31], "wiki": 0, "issu": [0, 1, 7, 14, 18, 27, 32, 34, 35, 37, 50, 56], "align": [0, 13], "thi": [0, 2, 3, 5, 11, 13, 14, 16, 17, 18, 19, 20, 21, 23, 24, 25, 27, 28, 29, 30, 31, 33, 34, 35, 37, 40, 41, 44, 45, 46, 47, 49, 51, 53, 55, 56], "moder": 0, "decis": [0, 17], "when": [0, 5, 14, 17, 19, 24, 27, 28, 29, 31, 32, 34, 35, 45, 46, 49, 50, 53], "appli": [0, 17, 25, 27, 30, 45, 48, 49, 51, 53, 55], "within": [0, 15, 25, 32, 45], "all": [0, 7, 11, 14, 18, 20, 21, 25, 27, 29, 32, 37, 40, 42, 45, 55], "space": [0, 29, 56], "also": [0, 4, 7, 15, 16, 17, 19, 21, 23, 27, 28, 29, 32, 33, 36, 37, 56], "offici": [0, 29, 39, 40, 41, 45, 48, 49, 51, 53, 54, 55], "repres": [0, 17], "e": [0, 2, 3, 5, 11, 17, 27, 28, 31, 35, 37, 53, 54], "mail": 0, "post": [0, 7, 18, 19, 24, 30], "via": [0, 11, 17, 39, 55, 56], "social": 0, "media": 0, "account": 0, "appoint": 0, "onlin": [0, 56], "offlin": 0, "event": 0, "instanc": 0, "abus": 0, "otherwis": [0, 17, 27, 30, 46, 47], "mai": [0, 7, 13, 14, 18, 19, 24, 27, 28, 29, 32, 33, 37, 48, 56], "report": [0, 7, 20, 56], "itex": [0, 2, 3, 4, 8, 9, 11, 13, 14, 16, 17, 18, 19, 20, 21, 23, 26, 27, 28, 31, 32, 33, 34, 35, 36, 37, 41, 42, 46, 48, 55, 56], "maintain": [0, 7, 8, 18, 21, 23, 25, 31], "intel": [0, 1, 5, 8, 
9, 11, 14, 15, 16, 17, 18, 19, 20, 21, 22, 24, 25, 26, 27, 28, 33, 38, 39, 56], "com": [0, 5, 7, 8, 16, 21, 27, 29, 31, 32, 33, 34, 35, 37, 40, 42, 45, 46, 48, 49, 50, 51, 52, 53, 54, 55, 56], "complaint": 0, "review": 0, "investig": [0, 28], "promptli": 0, "fairli": 0, "oblig": 0, "privaci": 0, "secur": 0, "incid": 0, "follow": [0, 2, 3, 7, 15, 17, 18, 22, 24, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 40, 42, 43, 45, 47, 48, 49, 50, 51, 53, 54, 55, 56], "impact": [0, 5, 14, 18, 24, 29, 50], "determin": [0, 11, 27, 29], "consequ": 0, "violat": 0, "unprofession": 0, "unwelcom": 0, "A": [0, 5, 16, 17, 18, 24, 27, 28, 29, 30, 31, 37, 39, 42, 52], "written": [0, 7], "provid": [0, 2, 4, 7, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 32, 33, 34, 35, 37, 39, 40, 45, 46, 49, 53, 54, 55, 56], "clariti": 0, "around": [0, 28, 45], "natur": 0, "explan": 0, "why": 0, "wa": [0, 28, 29, 30, 34, 35], "apologi": 0, "request": [0, 7, 56], "through": [0, 14, 27, 29, 34, 35, 49, 56], "singl": [0, 4, 7, 15, 20, 21, 24, 45, 53, 54], "seri": [0, 16, 29, 30, 34, 35, 37, 40, 42, 44, 45, 46, 48, 49, 50, 51, 52, 53, 54, 55, 56], "continu": [0, 14, 18, 27], "No": [0, 14, 19, 22, 34, 42, 43, 45, 48, 49, 50, 51, 53], "involv": 0, "unsolicit": 0, "specifi": [0, 3, 11, 21, 24, 27, 28, 29, 31, 34, 35], "period": [0, 29], "time": [0, 11, 14, 16, 18, 19, 20, 21, 22, 27, 29, 34, 35, 40, 45], "avoid": [0, 24, 27, 28, 29, 33], "well": [0, 2, 8, 11, 21, 26, 27, 28, 29, 45], "extern": [0, 14, 35], "channel": [0, 24, 25, 38], "like": [0, 2, 7, 16, 17, 25, 27, 29, 30, 41, 42, 51, 52], "term": [0, 25, 56], "lead": [0, 18], "seriou": 0, "sustain": 0, "sort": 0, "allow": [0, 16, 18, 27, 29, 50, 56], "dure": [0, 15, 18, 19, 24, 27, 33, 34, 35, 42], "pattern": [0, 4, 15, 21, 24], "aggress": [0, 18, 19], "disparag": 0, "class": [0, 11, 27, 30], "adapt": 0, "version": [0, 2, 11, 14, 16, 27, 29, 32, 33, 34, 35, 36, 37, 40, 41], "avail": [0, 2, 3, 11, 14, 19, 25, 28, 29, 34, 36, 37, 49, 
56], "http": [0, 2, 5, 7, 8, 16, 21, 22, 27, 29, 31, 32, 33, 34, 35, 36, 37, 40, 42, 45, 46, 48, 49, 50, 51, 52, 53, 54, 55, 56], "www": [0, 21, 37], "org": [0, 2, 7, 21, 35, 50], "_": [0, 11, 13, 16, 17, 18, 20, 22, 24, 27, 28, 29, 30, 31, 32, 34, 35, 41, 42, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54], "html": [0, 5, 11, 37], "were": [0, 28, 29], "inspir": 0, "mozilla": 0, "": [0, 5, 14, 18, 20, 21, 27, 29, 31, 34, 35, 40, 42, 46, 48, 49, 50, 56], "ladder": 0, "For": [0, 1, 2, 7, 11, 14, 15, 16, 18, 19, 20, 23, 25, 26, 27, 28, 30, 31, 32, 37, 42, 43, 44, 45, 48, 49, 50, 51, 52, 53, 54, 55], "answer": 0, "common": [0, 11, 14, 17, 21, 29, 39], "question": [0, 4, 56], "about": [0, 7, 19, 29, 31, 40, 45, 46, 52], "see": [0, 1, 2, 7, 22, 25, 27, 28, 29, 31, 32, 34, 46, 54, 56], "faq": 0, "translat": [0, 34, 35], "center": [1, 4, 16, 21, 25, 26, 30, 34, 35, 37, 40, 42, 44, 45, 46, 48, 49, 50, 51, 52, 53, 54, 55, 56], "more": [1, 4, 7, 11, 16, 18, 19, 21, 25, 29, 31, 32, 34, 35, 37, 40, 45, 46, 47, 52, 54], "how": [1, 5, 14, 16, 17, 18, 29, 31, 34, 35, 37, 39, 52, 54, 56], "work": [1, 4, 7, 14, 15, 19, 20, 21, 27, 28, 29, 35, 40, 46], "resolv": 1, "handl": [1, 13], "guidelin": [1, 4, 44, 56], "document": [2, 3, 27, 33], "ha": [2, 3, 14, 18, 19, 27, 29, 32, 35, 45, 55], "instruct": [2, 3, 4, 7, 18, 19, 21, 29, 36, 37, 48, 54, 56], "assumpt": [2, 3], "host": [2, 3, 27, 37, 42], "machin": [2, 3, 21, 27, 28, 29, 31, 36, 37, 47, 52], "linux": [2, 3, 7, 16, 28, 29, 33, 34, 35, 36, 37, 46], "kernel": [2, 3, 9, 10, 15, 16, 20, 22, 23, 24, 25, 27, 32, 34, 35, 36, 37, 45, 46, 48, 56], "compat": [2, 3, 4, 15, 19, 21, 23, 26, 27, 30, 45, 46, 48, 49, 50, 51, 53, 54], "driver": [2, 3, 14, 27, 33, 40, 42, 46, 56], "instal": [2, 3, 4, 7, 9, 14, 18, 19, 21, 22, 23, 26, 27, 28, 29, 30, 40, 41, 42, 43, 45, 46, 48, 49, 50, 51, 53], "softwar": [2, 33, 38, 40, 46, 47, 52], "refer": [2, 3, 7, 11, 15, 16, 17, 18, 19, 20, 21, 23, 27, 29, 30, 31, 32, 34, 35, 37, 39, 40, 41, 42, 43, 44, 45, 46, 47, 
48, 49, 50, 51, 52, 53, 54, 55, 56], "xpu": [2, 4, 11, 14, 16, 17, 19, 22, 25, 26, 27, 30, 38, 47, 48, 52], "cpu": [2, 3, 4, 9, 11, 14, 15, 18, 19, 20, 23, 24, 27, 30, 31, 38, 39, 40], "detail": [2, 3, 11, 15, 16, 17, 18, 19, 21, 23, 25, 27, 29, 30, 32, 34, 35, 37, 40, 42, 45, 56], "download": [2, 8, 27, 29, 32, 37, 45, 53, 54], "copi": [2, 3, 35], "wheel": [2, 33, 34], "model": [2, 3, 13, 15, 16, 17, 18, 19, 20, 21, 22, 29, 30, 39, 40, 46, 50, 52, 55, 56], "directori": [2, 3, 4, 5, 7, 14, 17, 28, 31, 32, 34, 35, 37, 42, 43, 45, 48, 49, 51, 53], "you": [2, 3, 4, 5, 7, 8, 11, 13, 14, 16, 17, 18, 20, 21, 22, 23, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 40, 41, 42, 43, 45, 46, 47, 48, 49, 51, 53, 54, 55], "can": [2, 3, 7, 11, 13, 14, 16, 17, 18, 19, 20, 21, 22, 23, 25, 27, 28, 29, 30, 31, 32, 33, 34, 36, 37, 38, 40, 45, 49, 53, 54, 55, 56], "get": [2, 4, 7, 11, 13, 16, 21, 27, 29, 30, 31, 32, 34, 35, 42, 43, 45, 48, 49, 51, 53], "link": [2, 35, 46], "pypi": [2, 38, 56], "project": [2, 5, 7, 56], "file": [2, 5, 7, 14, 17, 18, 22, 28, 31, 32, 37, 42, 43, 45, 48, 49, 50, 51, 53, 54, 56], "lib": [2, 14, 16, 28, 34, 35, 37, 50], "To": [2, 3, 4, 7, 18, 19, 24, 27, 29, 32, 34, 35, 36, 37, 40, 45, 46, 48, 49, 53], "optim": [2, 4, 9, 14, 15, 16, 17, 18, 23, 25, 26, 27, 28, 29, 30, 32, 33, 37, 39, 40, 42, 44, 45, 46, 48, 49, 53, 54, 56], "horovod": [2, 32, 33, 37, 39], "oneapi": [2, 14, 16, 21, 31, 33, 40, 42, 43, 45, 46, 48, 49, 50, 51, 53, 54, 55, 56], "collect": [2, 29, 37], "librari": [2, 3, 11, 28, 29, 32, 34, 37, 49], "oneccl": [2, 32, 33, 37], "mkdir": [2, 3, 54, 55], "cd": [2, 5, 7, 16, 29, 31, 34, 35, 42, 45, 48, 49, 51, 52, 53, 55], "wget": [2, 7, 29, 32, 34, 35, 37, 42, 50, 52], "sh": [2, 3, 5, 14, 31, 32, 33, 34, 35, 37, 41, 42, 43, 45, 46, 48, 49, 50, 51, 52, 53, 54, 56], "o": [2, 16, 22, 32, 33, 35, 37, 39, 46], "some": [2, 11, 16, 18, 19, 26, 27, 28, 29, 34, 35, 45, 52], "python": [2, 4, 9, 14, 16, 19, 22, 23, 25, 26, 27, 28, 29, 31, 32, 33, 34, 36, 37, 40, 
41, 45, 46, 47, 49, 50, 51, 52, 53, 54, 56], "hard": [2, 48], "code": [2, 4, 5, 9, 11, 16, 20, 21, 22, 23, 29, 31, 38, 39, 40, 42, 46], "insid": [2, 56], "If": [2, 3, 5, 16, 20, 22, 25, 26, 27, 28, 29, 30, 32, 34, 35, 36, 37, 40, 42, 43, 45, 46, 47, 48, 49, 51, 53], "re": [2, 29, 41], "3": [2, 7, 18, 20, 22, 24, 25, 26, 27, 28, 29, 30, 33, 34, 35, 36, 37, 40, 41, 46, 47, 55], "10": [2, 14, 16, 18, 19, 25, 27, 28, 32, 34, 35, 36, 37, 46, 54, 55, 56], "2": [2, 14, 15, 17, 18, 19, 20, 24, 25, 27, 28, 29, 30, 33, 34, 36, 37, 40, 42, 43, 45, 46, 47, 48, 49, 51, 52, 53, 54, 55, 56], "13": [2, 16, 32, 33, 34, 35, 36, 37, 40, 46, 52, 55, 56], "ubuntu": [2, 16, 31, 34, 35, 36, 37], "22": [2, 16, 31, 32, 34, 36, 37, 55], "04": [2, 16, 31, 32, 34, 36, 37], "layer": [2, 9, 19, 25, 27, 46], "updat": [2, 18, 27, 31, 32, 33, 34, 35, 36, 37, 55], "shown": [2, 3, 15, 22, 24, 28, 45, 48, 49, 53], "below": [2, 3, 16, 24, 25, 27, 28, 29, 30, 31, 32, 34, 45], "image_nam": [2, 3], "arg": [2, 13, 30], "ubuntu_vers": 2, "python3": [2, 5, 33, 34, 48, 50], "tf_ver": 2, "whl": [2, 11, 32, 34, 35, 56], "t": [2, 5, 11, 13, 17, 18, 20, 27, 28, 48, 50], "f": [2, 32, 35, 56], "dockerfil": 2, "enter": [2, 3, 22, 33, 34, 35], "folder": [2, 3, 19, 31, 34, 35, 54], "command": [2, 3, 14, 16, 22, 28, 29, 32, 33, 34, 35, 36, 37, 41, 42, 46, 51], "start": [2, 3, 14, 21, 22, 27, 28, 31], "v": [2, 3, 18, 31, 33, 35, 37, 41, 42], "option": [2, 3, 7, 11, 16, 18, 21, 28, 30, 34, 54, 55, 56], "mount": [2, 3], "your": [2, 3, 5, 7, 14, 29, 31, 32, 33, 34, 35, 36, 37, 41, 42, 46, 48, 50, 55, 56], "local": [2, 3, 7, 14, 19, 28, 29, 31, 34, 35, 36, 37, 52], "attach": [2, 3, 27, 29], "devic": [2, 3, 4, 9, 10, 11, 13, 14, 16, 17, 19, 20, 21, 22, 23, 24, 27, 30, 31, 34, 35, 37, 42, 55, 56], "dev": [2, 3, 14, 22, 31, 37, 42, 51], "dri": [2, 3, 31, 37, 42], "dir": [2, 3, 7, 45, 49, 50, 51, 53], "workspac": [2, 3, 31], "path": [2, 3, 7, 16, 18, 19, 20, 22, 28, 29, 30, 31, 32, 33, 34, 35, 37, 42, 46, 49, 51, 53, 54, 55, 
56], "privileg": [2, 3, 42], "ipc": [2, 3, 37, 42], "http_proxi": [2, 3], "https_proxi": [2, 3], "no_proxi": [2, 3], "bash": [2, 32, 33, 34, 35, 37, 42, 45, 46, 53, 56], "now": [2, 18, 27, 29, 31], "c": [2, 4, 10, 11, 14, 16, 28, 29, 32, 33, 34, 36, 37, 38, 56], "client": [2, 35], "import": [2, 7, 11, 14, 16, 17, 18, 19, 22, 23, 25, 26, 27, 29, 32, 33, 34, 35, 36, 37, 42, 46, 47, 56], "device_lib": 2, "print": [2, 11, 16, 19, 22, 25, 27, 28, 30, 32, 33, 34, 35, 36, 37, 42, 43, 47, 48, 55, 56], "list_local_devic": 2, "should": [2, 5, 7, 22, 27, 29, 31, 32, 33, 36, 37, 40, 51, 55], "list": [2, 7, 11, 19, 24, 27, 28, 29, 32, 34, 35], "sampl": [2, 22, 40, 46, 48], "output": [2, 7, 11, 13, 19, 20, 24, 25, 27, 30, 32, 34, 35, 42, 46, 51], "look": [2, 16, 24, 31], "name": [2, 3, 4, 5, 7, 11, 14, 16, 18, 19, 20, 25, 26, 27, 29, 31, 39, 48, 52], "0": [2, 5, 11, 14, 15, 16, 19, 20, 22, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 40, 43, 45, 46, 47, 50, 51, 52, 53, 54, 55, 56], "device_typ": [2, 14, 17, 52, 55], "memory_limit": 2, "268435456": 2, "incarn": 2, "9266936945121049176": 2, "xla_global_id": 2, "1": [2, 4, 5, 14, 18, 19, 20, 21, 22, 25, 26, 27, 28, 29, 30, 33, 34, 42, 45, 46, 47, 49, 51, 52, 53, 54, 55, 56], "bus_id": 2, "15031084974591766410": 2, "physical_device_desc": 2, "intel_xpu": 2, "pci": 2, "bu": 2, "id": [2, 31], "undefin": [2, 16], "17448926295332318308": 2, "step": [3, 16, 17, 18, 25, 27, 29, 31, 39, 40, 49, 52, 53, 55], "cpp": [3, 14, 17, 32], "cc": [3, 11, 14, 16, 17, 27, 31, 37, 52, 55], "sourc": [3, 4, 7, 11, 16, 17, 21, 32, 33, 37, 38, 41, 42, 43, 46, 49, 50, 52, 53, 56], "Then": [3, 11, 16, 22, 30, 36, 37, 46], "packag": [3, 16, 29, 32, 33, 34, 36, 40, 46, 49, 50, 53, 56], "p": [3, 25, 31, 36, 37, 42, 54], "bazel": [3, 11, 16, 31], "bin": [3, 7, 11, 16, 28, 31, 34, 35, 37, 41, 42, 43, 46, 49, 50, 52, 53], "cp": [3, 35], "r": [3, 7, 14, 16, 27, 29, 39, 55], "path_to_itex": 3, "out": [3, 15, 16, 27, 35, 43, 47, 48, 55], "k8": [3, 35], "opt": 
[3, 11, 14, 16, 32, 34, 35, 37, 41, 52], "st": [3, 35], "tar": [3, 7, 29], "cvfh": 3, "path_to_tensorflow_serv": 3, "tensorflow_serv": [3, 31], "model_serv": [3, 31], "tensorflow_model_serv": [3, 31], "gpu": [3, 4, 9, 11, 14, 15, 18, 19, 20, 23, 24, 25, 27, 30, 31, 33, 38, 39], "sure": [3, 11, 16, 27, 32, 34, 35], "meet": [3, 25, 56], "either": [3, 19], "target": [3, 17, 34, 35], "8500": [3, 31], "model_nam": [3, 31], "model_dir": [3, 31, 49, 53, 54, 55], "overview": 4, "infrastructur": [4, 9, 20], "quick": [4, 11, 32, 39], "releas": [4, 14, 17, 29, 30, 31, 34, 35, 40, 48, 50], "frequent": 4, "ask": [4, 34, 35], "guid": [4, 9, 11, 16, 18, 21, 27, 31, 32, 34, 35, 37, 39, 40, 46], "build": [4, 7, 9, 38, 39, 40, 56], "conda": [4, 14, 38], "distribut": [4, 8, 29, 32, 33, 37, 38, 39, 54, 56], "featur": [4, 7, 8, 11, 13, 17, 25, 29, 34, 39, 46, 55, 56], "variabl": [4, 13, 15, 16, 19, 21, 22, 23, 24, 25, 27, 29, 31, 33, 35, 46], "api": [4, 7, 9, 10, 14, 15, 16, 19, 25, 26, 27, 29, 31, 35, 46, 47], "auto": [4, 11, 17, 28, 30, 35], "mix": [4, 30, 39], "precis": [4, 30, 39, 40, 48, 51], "graph": [4, 9, 10, 13, 15, 18, 20, 23, 39, 47, 55, 56], "custom": [4, 7, 9, 18, 21, 26, 28, 30, 32, 37, 45], "oper": [4, 13, 15, 18, 23, 24, 27, 29, 56], "overrid": [4, 11, 18, 27], "int8": [4, 27, 40, 46], "quantiz": [4, 39], "xpuautoshard": [4, 30, 39], "profil": [4, 9, 27, 29], "launcher": [4, 28, 29], "topic": 4, "practic": [4, 27, 28], "support": [4, 7, 13, 14, 15, 17, 18, 19, 22, 24, 27, 28, 29, 30, 32, 34, 35, 36, 37, 40, 42, 46, 54, 55], "openxla": 4, "develop": [4, 16, 21, 29, 32, 34, 35, 36, 37, 56], "design": [4, 7, 9, 14, 21, 31, 40], "structur": [4, 16, 19, 28, 29], "op": [4, 9, 10, 17, 20, 21, 23, 24, 26, 27, 35, 45, 48], "gener": [4, 5, 20, 21, 23, 27, 28, 29, 31, 33, 34, 36, 42, 46], "default": [4, 7, 13, 14, 15, 18, 19, 20, 21, 23, 27, 29, 30, 34, 35, 37, 45, 46, 47, 53, 54, 55], "configur": [4, 8, 11, 14, 16, 17, 19, 21, 23, 27, 28, 30, 32, 37, 54, 56], "good": [4, 19, 21, 
23, 29, 31], "perform": [4, 15, 17, 19, 20, 21, 22, 23, 24, 25, 27, 28, 29, 30, 34, 35, 39, 45, 46, 48, 49, 53, 55, 56], "chang": [4, 5, 7, 11, 18, 19, 20, 21, 23, 27, 28, 33, 39, 40, 50, 52], "simpl": [4, 21, 23, 27, 35], "frontend": [4, 21, 23], "util": [4, 9, 11, 14, 21, 23, 28, 29, 50, 55], "user": [4, 5, 7, 11, 13, 19, 20, 21, 23, 32, 34, 35, 36, 37, 38, 42, 48, 56], "onli": [4, 5, 13, 14, 17, 18, 20, 21, 23, 24, 27, 28, 30, 31, 32, 36, 45, 48, 49, 50, 51, 53, 54, 55], "minor": [4, 21, 23], "applic": [4, 21, 23, 29, 30, 31, 40], "scenario": [4, 13, 20, 21, 23, 29, 30], "typic": [4, 21, 23, 27, 29], "need": [4, 8, 13, 14, 16, 17, 20, 21, 23, 27, 28, 31, 32, 33, 34, 35, 37, 42, 46, 47, 50, 54, 55], "add": [4, 5, 17, 18, 19, 24, 29, 31, 32, 35, 42, 48, 55], "two": [4, 13, 14, 19, 21, 23, 27, 29, 34, 42, 45, 48, 49, 53], "three": [4, 21, 22, 23, 28], "claus": [4, 21, 23], "origin": [4, 18, 21, 23, 24, 25, 35, 40, 42, 50], "amp": [4, 18, 28, 39, 49, 53, 56], "low": [4, 18, 21, 23, 27, 40], "data": [4, 15, 16, 17, 18, 21, 22, 25, 27, 30, 34, 35, 37, 40, 42, 44, 45, 46, 48, 49, 50, 51, 52, 53, 54, 55, 56], "type": [4, 7, 11, 14, 18, 20, 21, 28, 30, 33, 34, 35, 42], "bfloat16": [4, 11, 18, 19, 21, 24, 27, 30, 42, 45, 49, 51, 53], "float16": [4, 18, 19, 21, 27, 30, 42], "nativ": [4, 15, 21], "3rd": [4, 21, 36], "xeon": [4, 21, 29, 34, 35, 36, 39, 42], "scalabl": [4, 21, 31, 36, 42], "processor": [4, 21, 29, 36, 42, 46, 47], "cooper": [4, 21, 39, 46], "lake": [4, 21], "avx512": [4, 21, 37, 46], "further": [4, 21], "boost": [4, 21, 28, 29], "less": [4, 18, 19, 21, 24, 27, 42], "memori": [4, 9, 11, 13, 14, 15, 18, 19, 21, 25, 27, 42], "lower": [4, 15, 18, 19, 21, 42], "fulli": [4, 19, 21], "enabl": [4, 13, 15, 16, 17, 18, 21, 22, 25, 27, 28, 29, 30, 33, 34, 35], "fuse": [4, 16, 18, 19, 21, 24, 26, 45], "specif": [4, 16, 27, 29, 30, 31, 32, 37, 54, 56], "new": [4, 5, 7, 8, 15, 21, 23, 24, 27, 29, 40], "better": [4, 15, 18, 19, 21, 24, 25, 28, 29, 39, 45, 46, 48, 49, 53], 
"conv2d": [4, 21, 47], "relu": [4, 11, 16, 19, 21, 24, 25, 26, 27, 47], "linear": [4, 19, 21, 25, 27], "benefit": [4, 21, 27, 29, 30], "fusion": [4, 9, 16, 17, 18, 19, 21, 26, 30], "deliv": [4, 19, 21], "transpar": [4, 21], "fashion": [4, 21], "implement": [4, 7, 10, 16, 17, 19, 21, 23, 25, 26, 29, 56], "sever": [4, 21, 28, 29, 34, 35, 39, 54], "namespac": [4, 17, 21, 23, 25, 26, 30, 35], "extend": [4, 14, 21, 23, 25, 29, 30], "defin": [4, 16, 27], "export": [4, 7, 11, 15, 16, 17, 18, 19, 21, 22, 27, 28, 29, 31, 33, 35, 37, 41, 42, 46, 51, 54, 55], "ze_enable_tracing_lay": [4, 21, 22, 27], "usecyclespersecondtim": [4, 21, 22, 27], "enable_tf_profil": [4, 21, 22, 27], "co": [4, 14, 15, 21], "neural": [4, 15, 21, 29, 39, 40, 46], "compressor": [4, 15, 21, 39, 40, 46], "solut": [4, 14, 15, 21], "equival": [4, 27], "experiment": [4, 13, 14, 16, 22, 30, 34, 35, 37], "automat": [4, 5, 16, 17, 18, 19, 21, 26, 27, 28, 29, 30, 32, 37, 39, 43, 47, 55], "shard": [4, 17, 21, 30], "input": [4, 11, 13, 17, 19, 20, 21, 22, 24, 25, 27, 30, 55], "place": [4, 17, 21, 29, 35], "maxim": [4, 17, 21, 25, 30, 55], "hardwar": [4, 17, 19, 21, 23, 25, 28, 30, 39], "usag": [4, 14, 21, 29, 30, 39], "adopt": [4, 15, 21], "uniform": [4, 16, 21], "pjrt": [4, 21, 56], "plugin": [4, 10, 16, 18, 19, 21, 22, 31, 34, 52, 56], "mechan": [4, 21], "backend": [4, 16, 21, 23, 26, 27, 30, 37, 42, 43, 46, 47, 56], "show": [5, 14, 16, 18, 27, 34, 35, 37, 39, 40, 42, 44, 45, 46, 48, 49, 50, 51, 53, 54, 55], "script": [5, 21, 22, 29, 34, 35, 42, 45, 47, 49, 50, 53, 54], "relat": [5, 28, 31], "save": [5, 11, 17, 28, 30, 51], "doc": [5, 9, 11, 50], "build_doc": 5, "trigger": [5, 19, 30], "merg": 5, "pr": 5, "github": [5, 7, 8, 16, 21, 29, 31, 34, 35, 37, 40, 42, 45, 48, 49, 51, 52, 53, 54, 55, 56], "repo": [5, 32, 33], "main": [5, 16, 17, 21, 32, 35, 49, 52, 53, 54], "branch": [5, 7, 16, 34], "execut": [5, 11, 13, 15, 16, 17, 18, 19, 20, 22, 25, 27, 29, 39, 46, 47], "content": [5, 35, 37], "doesn": [5, 17, 18, 
50], "contain": [5, 9, 15, 17, 28, 29, 31, 38, 39, 49, 53, 56], "won": [5, 28], "product": [5, 7, 21, 31, 32], "git": [5, 11, 16, 30, 31, 34, 35, 42, 45, 48, 49, 51, 52, 53, 54, 55], "tag": 5, "must": [5, 15, 27], "ad": [5, 13, 17, 18, 21, 23, 27, 34, 45, 55], "same": [5, 7, 14, 16, 20, 21, 23, 24, 25, 27, 28, 29, 30, 31, 35, 40, 47], "manual": [5, 7, 18, 27, 28, 54], "result": [5, 15, 16, 17, 19, 22, 27, 29, 30, 33, 40, 43, 45, 47, 48, 50, 55], "gh": 5, "page": [5, 21, 22, 23, 29, 56], "io": [5, 31], "site": [5, 8, 32, 33, 34, 37, 50, 56], "note": [5, 11, 17, 18, 20, 25, 27, 28, 30, 31, 34, 35, 37, 42, 48, 52, 54], "write": [5, 7, 19], "abl": 5, "clone": [5, 16, 31, 34, 35, 45, 48, 49, 51, 53, 54, 55], "extens": [5, 8, 9, 11, 14, 15, 16, 17, 18, 19, 20, 21, 22, 24, 25, 26, 27, 28, 29, 33, 38, 39, 41, 42, 43, 44, 45, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56], "tensorflow": [5, 8, 9, 10, 11, 13, 14, 15, 16, 17, 20, 22, 24, 25, 26, 27, 28, 29, 33, 38, 39, 41, 42, 43, 44, 45, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56], "checkout": [5, 16, 31, 35, 49, 53, 55], "build_tmp": 5, "m": [5, 16, 28, 29, 40, 41, 48, 49, 52, 53], "push": 5, "befor": [5, 7, 11, 18, 19, 24, 27, 28, 29, 34, 35, 39, 55], "submit": [5, 7, 56], "modifi": [5, 35, 42, 55], "draft": 5, "server": [5, 16, 34, 35, 37], "9000": 5, "web": [5, 50], "browser": [5, 22, 36, 37, 46, 48, 50], "g": [5, 17, 27, 35, 54], "chrome": 5, "127": [5, 31], "localhost": [5, 11, 20, 36, 37, 52], "check": [5, 7, 11, 13, 14, 18, 19, 21, 23, 27, 28, 32, 33, 34, 35, 40, 41, 42, 51, 52, 56], "picker": 5, "function": [5, 16, 17, 20, 21, 23, 25, 26, 27, 29, 30], "want": [5, 7, 27, 28, 32, 34, 37, 48, 51], "switch": [5, 29], "begin": [7, 11, 42], "share": [7, 14, 29, 32, 42, 43, 45, 48, 49, 51, 53], "intent": 7, "team": [7, 48], "base": [7, 11, 14, 15, 16, 18, 19, 25, 29, 32, 33, 36, 39, 42, 45, 46, 51, 52, 55, 56], "bug": [7, 56], "propos": [7, 25], "log": [7, 11, 16, 18, 20, 22, 27, 30, 35, 37, 42, 43, 45, 48, 49, 50, 51, 52, 53, 55], 
"intend": [7, 56], "approv": 7, "fix": [7, 27, 32], "search": [7, 28], "pick": 7, "d": [7, 32, 34, 35, 54], "pleas": [7, 11, 14, 16, 17, 21, 27, 32, 34, 35, 37, 39, 40, 42, 45, 47, 49, 51, 52, 53, 54, 56], "pull": [7, 31, 36, 37, 42], "ensur": [7, 28], "run": [7, 11, 14, 18, 19, 22, 24, 26, 27, 28, 29, 30, 34, 36, 37, 56], "patch": [7, 31, 45, 48, 49, 51, 53, 55], "signific": [7, 18], "requir": [7, 11, 13, 15, 21, 22, 24, 25, 27, 28, 33, 40], "rfc": [7, 16, 21], "process": [7, 11, 21, 27, 28, 29, 31, 45, 46], "consist": [7, 27], "discuss": 7, "promot": 7, "found": [7, 14, 27, 28, 29, 31, 34], "dedic": 7, "contributor": [7, 56], "coven": [7, 56], "conduct": [7, 28], "full": [7, 37], "locat": [7, 8, 34, 35, 45, 48], "benchmark": [7, 49, 54], "llga": [7, 30], "saniti": [7, 56], "migrat": 7, "path_to_python_unit_test": 7, "ut": 7, "find": [7, 11, 22, 29, 31], "py": [7, 11, 16, 22, 28, 31, 42, 43, 48, 49, 50, 51, 52, 53, 54, 55], "do": [7, 14, 16, 19, 27, 28, 30, 34, 46], "done": [7, 22, 27, 29, 32], "standard": [7, 25], "pylint": 7, "against": 7, "definit": [7, 18, 23, 30], "root": [7, 34, 35, 50], "pip": [7, 11, 14, 16, 22, 30, 31, 32, 33, 34, 35, 36, 37, 40, 41, 48, 49, 52, 53, 55, 56], "rcfile": 7, "pylintrc": 7, "myfil": 7, "conform": 7, "googl": [7, 14, 16, 21, 22, 31, 51], "both": [7, 14, 15, 18, 19, 23, 28, 29, 30, 37, 42, 54], "clang": 7, "format": [7, 9, 18, 24, 27, 30, 54], "cpplint": 7, "apt": [7, 16, 31, 32, 37], "12": [7, 14, 27, 28, 37, 45, 48, 52, 54, 55, 56], "inplac": 7, "stdout": [7, 28], "filter": 7, "legal": 7, "copyright": 7, "exclud": 7, "third_parti": [7, 9, 31], "recurs": 7, "sometim": 7, "fals": [7, 17, 25, 27, 28, 45, 51, 55], "error": [7, 11, 14, 20, 25, 27, 31, 42, 43, 45, 48, 49, 50, 51, 53], "nolint": 7, "nolintnextlin": 7, "skip": [7, 27, 28, 33], "line": [7, 27, 29, 31, 42, 50, 55], "mkl": [7, 31, 32, 33, 34, 35, 37], "h": [7, 11, 14, 17, 31, 35, 52], "include_subdir": 7, "buildifi": 7, "tool": [7, 9, 11, 14, 18, 29, 32, 33, 34, 37, 56], 
"bzl": 7, "convent": 7, "xxx": [7, 46, 50], "tpl": 7, "go": [7, 35, 36, 37], "golang": 7, "dl": 7, "go1": 7, "15": [7, 16, 28, 37], "html64": [7, 32], "gz": [7, 29], "sudo": [7, 16, 31, 32, 34, 35, 37], "usr": [7, 28, 32], "xzf": 7, "bazelbuild": [7, 34, 35], "buildtool": 7, "src": [7, 11, 14, 17, 31], "home": [7, 28, 32, 36, 37, 50], "NOT": [7, 14], "zzz": 7, "view": 8, "latest": [8, 16, 31, 33, 34, 35, 37, 56], "previou": [8, 25, 29], "valid": [8, 30], "here": [8, 11, 17, 18, 24, 34, 35, 45, 48, 49, 53, 55], "contact": 8, "addit": [8, 21, 23, 24, 29, 35, 56], "assist": 8, "none": [8, 25, 26, 27, 28, 30], "docker": [9, 38, 39], "docs_build": 9, "core": [9, 11, 14, 16, 17, 26, 27, 29, 34, 35, 37, 46, 47, 52, 55], "test": [9, 19, 22, 27, 31, 33, 39, 49, 50, 56], "third": [9, 56], "parti": [9, 56], "program": [9, 29, 56], "kei": [9, 16, 17, 20, 32], "parent": 9, "sub": [9, 14, 18, 19, 29, 30], "descript": [9, 13, 18, 28, 29, 30, 39, 50], "onednn": [9, 11, 12, 14, 15, 16, 20, 24, 29, 30, 39], "propag": [9, 13, 17], "miscellan": 9, "repositori": [9, 32, 45, 49, 53], "modular": 10, "pluggabl": [10, 35, 37], "streamexecutor": [10, 16], "registr": [10, 11, 49], "pluggabledevic": [10, 56], "pass": [11, 15, 16, 17, 27, 30, 48, 55], "procedur": [11, 16, 32, 36, 37], "tf": [11, 14, 15, 19, 22, 25, 26, 27, 28, 30, 32, 34, 35, 36, 37, 46, 47, 55], "__version__": [11, 30, 32, 34, 35, 36, 37, 56], "verbos": [11, 19, 20, 27, 28], "itex_verbos": [11, 16, 17], "onednn_verbos": 11, "familiar": [11, 16], "architectur": 11, "built": [11, 31, 36, 37], "creat": [11, 18, 27, 28, 30, 33, 37, 41, 46, 49, 53, 55], "offcial": 11, "geluop": 11, "init": 11, "void": 11, "register_geluop": 11, "declar": 11, "call": [11, 15, 16, 26, 27, 29, 30, 38, 41, 46, 47, 50, 51], "nn": [11, 16, 25, 26, 30, 47], "itex_vlog": 11, "statusuniqueptr": 11, "tf_newstatu": [11, 35], "tf_opdefinitionbuild": 11, "op_build": 11, "tf_newopdefinitionbuild": 11, "gelu": [11, 30], "tf_opdefinitionbuilderaddinput": 11, 
"tf_opdefinitionbuilderaddoutput": 11, "activ": [11, 18, 19, 22, 25, 27, 29, 30, 32, 33, 34, 35, 36, 37, 41, 42, 43, 46, 47, 49, 50, 52, 53], "tf_opdefinitionbuilderaddattr": 11, "half": [11, 27], "float": [11, 18, 20, 27, 30, 35, 42], "approxim": [11, 25], "bool": 11, "true": [11, 22, 25, 26, 27, 28, 30, 45, 51, 55], "tf_opdefinitionbuildersetshapeinferencefunct": 11, "unchanged_shape_fn": 11, "tf_registeropdefinit": 11, "itex_check_eq": 11, "tf_ok": [11, 35], "tf_getcod": [11, 35], "fail": [11, 27, 30], "its": [11, 25, 27, 28, 29, 32, 37, 47], "docstr": 11, "attr": [11, 20], "might": [11, 34], "debug": [11, 20, 22, 30], "one": [11, 14, 15, 20, 21, 27, 29, 34, 35, 42, 47, 49, 54], "made": [11, 49], "separ": [11, 16, 23, 24, 27, 29, 33, 34, 56], "register_kernel_build": 11, "device_cpu": 11, "typeconstraint": 11, "cpudevic": 11, "device_gpu": [11, 17, 55], "gpudevic": 11, "engin": [11, 14], "polymorph": 11, "load_ops_librari": 11, "load": [11, 27, 31, 37], "register_": 11, "macro": 11, "directli": [11, 17, 27, 28, 29, 37], "relubaseop": 11, "eltwisebaseop": 11, "opkernel": 11, "templat": 11, "typenam": 11, "opkernelconstruct": 11, "context": [11, 25, 29], "dnnl": [11, 13], "algorithm": [11, 25], "eltwise_gelu_erf": 11, "0f": 11, "hasattr": [11, 30], "op_requires_ok": 11, "getattr": 11, "approximate_": 11, "alg_kind_": 11, "eltwise_gelu_tanh": 11, "algo": 11, "alpha": 11, "beta": 11, "eltwis": 11, "rewrit": [11, 16, 17], "comput": [11, 15, 16, 25, 27, 29, 32, 40, 47, 48, 56], "ctx": 11, "alpha_": 11, "beta_": 11, "opkernelcontext": 11, "try": [11, 21, 28, 40, 46], "onednn_engin": 11, "creatednnlengin": 11, "tensor": [11, 25, 27, 35, 47], "dst_tensor": 11, "nullptr": 11, "noth": 11, "return": [11, 16, 17, 27, 30, 35], "src_tensor": 11, "shape": [11, 13, 17, 19, 25, 27, 47], "num_el": 11, "allocate_output": 11, "kdstindex": 11, "forward": [11, 27, 48], "descriptor": 11, "primit": [11, 13, 20], "eltwise_forward": 11, "desc": [11, 13], "fwd_desc": 11, "prop_kind": 11, 
"primitive_attr": 11, "set_scratchpad_mod": 11, "scratchpad_mod": 11, "primitive_desc": 11, "fwd_pd": 11, "fwd_primit": 11, "onednn_stream": 11, "creatednnlstream": 11, "std": [11, 35], "unordered_map": 11, "int": [11, 35], "fwd_primitive_arg": 11, "dnnl_arg_src": 11, "src_mem": 11, "dnnl_arg_dst": 11, "dst_mem": 11, "dnnl_arg_scratchpad": 11, "scratchpad_mem": 11, "catch": 11, "protect": 11, "eltwise_relu": 11, "hpp": 11, "It": [11, 14, 15, 16, 17, 18, 19, 20, 21, 27, 29, 33, 34, 39, 46, 49, 50, 53, 56], "elig": 11, "infer": [11, 15, 17, 18, 19, 24, 27, 31, 39, 40, 46, 50], "backward": [11, 27], "descibl": 11, "click": [11, 34, 35], "header": 11, "itex_xpu_librari": 11, "relu_op": 11, "hdr": [11, 31], "relu_op_functor": 11, "eltwise_base_hdr": 11, "copt": [11, 31], "tf_copt": [11, 31], "linkstat": 11, "dep": [11, 31], "alwayslink": [11, 31], "gpu_kernel": 11, "In": [11, 16, 18, 19, 27, 28, 29, 33, 40, 42, 46, 47, 52, 55], "tip": [11, 20, 29, 31], "compil": [11, 14, 16, 19, 21, 27, 29, 30, 31, 32, 33, 34, 35, 37], "name_scop": 11, "convert_to_tensor": 11, "intel_extension_for_tensorflow": [11, 17, 18, 19, 25, 26, 27, 28, 31, 32, 33, 34, 36, 37, 42, 56], "clean": [11, 35], "xfd": 11, "config": [11, 14, 16, 17, 18, 19, 27, 31, 32, 34, 35, 37, 42, 46, 52, 54, 55], "pip_packag": [11, 34], "build_pip_packag": [11, 34], "uninstal": 11, "intel_extension_for_tensorflow_lib": [11, 34], "x": [11, 19, 25, 26, 27, 34, 35, 42, 47, 52], "constant": [11, 15, 25, 26, 27], "dtype": [11, 19, 25, 26, 47, 55], "float32": [11, 16, 19, 24, 25, 26, 27, 45, 47, 49, 53], "y": [11, 16, 25, 26, 27, 32, 34, 35, 42, 52, 56], "nn_op": 11, "141": 11, "common_runtim": 11, "eager": [11, 25], "1445": 11, "job": [11, 20, 35], "replica": [11, 20], "task": [11, 20, 29, 54], "100": [11, 27, 30, 45], "eltwise_bas": 11, "44": [11, 28], "exec": [11, 13], "ocl": 11, "gen9": 11, "forward_train": 11, "data_f32": 11, "block": [11, 29, 30, 37], "f0": 11, "diff_undef": 11, "undef": 11, "scratchpad": [11, 13], 
"alg": 11, "5": [11, 18, 19, 20, 22, 25, 27, 30, 32, 34, 35, 36, 45, 47, 51, 55], "xxxxxx": 11, "op_kernel": 11, "773": 11, "object": [12, 14, 18, 27, 29, 30, 42, 43, 45, 48, 49, 50, 51, 53], "cach": [12, 15, 29], "creation": 13, "overhead": [13, 27, 29], "becom": [13, 29], "notic": [13, 27], "especi": [13, 33], "small": [13, 25, 27, 28, 29], "latenc": [13, 42, 48], "bind": [13, 29, 35], "node": [13, 18, 20, 24, 29, 33, 40], "By": [13, 27, 28, 29, 46], "off": [13, 28, 30, 46, 55, 56], "dynam": [13, 27, 29], "mean": [13, 14, 18, 25, 27, 28, 29, 34, 35], "invalid": [13, 29], "dim": 13, "meta": 13, "layout": [13, 28, 30], "parallel": [13, 16, 29], "schedul": [13, 25, 28, 29], "thread": [13, 28, 29, 30, 37], "safe": [13, 18, 30, 56], "stream": [13, 48, 55], "demand": [13, 56], "satisfi": [13, 23], "concurr": [13, 29], "case": [13, 18, 19, 21, 27, 28, 29, 42], "mutex": 13, "lock": 13, "weight": [13, 25, 27, 45, 47, 53, 55], "bia": [13, 20, 24, 25, 47], "temporari": 13, "area": 13, "reorder": 13, "argument": [13, 25, 27, 28, 30], "whether": [14, 24, 28, 29], "successfulli": [14, 31, 33, 34, 35, 37, 55], "platform": [14, 16, 27, 29, 30, 32, 34, 35, 36, 45, 48, 49, 50, 51, 53, 54, 55], "zero": [14, 16, 25, 26, 27, 32], "opencl": [14, 16, 32, 37], "And": [14, 32, 36, 37], "high": [14, 16, 17, 27, 29, 56], "list_physical_devic": [14, 19, 27], "tell": 14, "regist": [14, 16, 40, 46], "2021": 14, "07": [14, 25, 37, 54, 55], "01": [14, 30, 54], "06": [14, 27], "40": [14, 28], "55": [14, 28, 29, 55], "510076": 14, "dpcpp_runtim": [14, 27], "116": 14, "select": [14, 16, 27, 28, 30, 48, 56], "physicaldevic": [14, 52], "physical_devic": [14, 52], "know": [14, 19, 27], "rate": [14, 15, 18, 25, 31], "system": [14, 21, 29, 31, 33, 34, 35], "monitor": 14, "capabl": [14, 27], "clock": 14, "frequenc": 14, "eu": 14, "count": 14, "amount": [14, 27], "so": [14, 16, 19, 27, 28, 29, 30, 31, 34, 35, 42, 43, 45, 48, 49, 50, 51, 52, 53], "each": [14, 25, 27, 28, 29, 55], "modul": [14, 16, 17, 
28], "relationship": [14, 18], "replac": [14, 25, 26, 31, 35], "stock": [14, 23, 24, 27, 32, 33, 36, 37, 40, 45, 48, 49, 50, 51, 53, 54, 55, 56], "sinc": [14, 27, 29], "9": [14, 16, 18, 25, 28, 33, 34, 40, 41, 50, 55], "That": [14, 29, 34, 35, 42], "them": [14, 18, 21, 27, 28, 29, 31, 50, 54], "unknown": [14, 27], "help": [14, 19, 20, 21, 28, 29, 37, 40, 46], "acceler": [14, 16, 30, 39, 42, 46, 56], "q1": 14, "2024": 14, "discontinu": 14, "upstream": [14, 18], "futur": 14, "current": [14, 17, 22, 30, 45, 49, 53, 55], "upgrad": [14, 32, 33, 36, 37, 40, 41, 49, 53, 56], "section": [14, 27, 29, 32], "problem": [14, 24, 27, 29], "encount": 14, "sycl": [14, 16], "level_zero_util": 14, "33": [14, 16, 32, 37], "fatal": 14, "level_zero": 14, "ze_api": 14, "modulenotfounderror": 14, "depend": [14, 19, 28, 29, 32, 34, 35, 37], "framework": [14, 32, 35, 42, 43, 44, 45, 48, 49, 51, 53], "errors_impl": [14, 42, 43, 45, 48, 49, 51, 53], "notfounderror": [14, 42, 43, 45, 48, 49, 51, 53], "libmkl_sycl": [14, 42, 43, 45, 48, 49, 51, 53], "cannot": [14, 18, 42, 43, 45, 48, 49, 51, 53], "setvar": [14, 32, 37, 41, 52], "env": [14, 31, 33, 34, 35, 37, 41, 46, 48], "var": [14, 31, 33, 34, 35, 37], "toolkit": [14, 16, 32, 33, 40, 42, 52, 56], "glibcxx_3": 14, "4": [14, 17, 18, 20, 24, 25, 27, 28, 29, 33, 45, 47, 52, 53, 55], "30": [14, 35, 55], "forg": 14, "gxx_linux": 14, "64": [14, 16, 17, 19, 27, 28, 32, 34, 36, 37, 45], "higher": [14, 15, 20, 27, 29], "glibcxx": 14, "veri": [15, 27, 45], "popular": 15, "deep": [15, 29, 39, 56], "techniqu": [15, 27], "invent": 15, "improv": [15, 19, 27, 29, 34, 35, 55], "speed": [15, 18, 29, 39, 40], "minim": [15, 29], "number": [15, 24, 27, 29, 39, 40, 45, 48, 54, 55], "bit": [15, 16, 18, 27, 30, 32, 34, 36, 37, 42], "convert": [15, 17, 18, 19, 27, 40, 42, 49, 54], "real": [15, 27, 54], "valu": [15, 17, 18, 20, 25, 27, 28, 29, 30, 54], "represent": 15, "mainli": [15, 17, 28], "phase": [15, 45], "loss": [15, 18, 19, 39, 40, 46, 52], "accuraci": [15, 
18, 19, 25, 27, 39, 40, 46, 52, 54], "reduc": [15, 18, 27, 29, 34, 35, 40, 45, 48, 55], "miss": 15, "cost": 15, "network": [15, 29], "v2": [15, 30, 33, 45, 54], "newer": [15, 40, 41, 46], "integr": [15, 16, 29, 34], "box": 15, "green": 15, "subgraph": 15, "onednngraph": 15, "part": [15, 17, 29, 45, 53], "executor": 15, "partit": [15, 29], "deleg": 15, "grappler": [15, 17, 19, 52], "fold": 15, "itex_tf_constant_fold": [15, 46], "incept": [15, 18, 39, 48], "v3": [15, 39], "introduc": [16, 28, 29], "seamlessli": 16, "simplifi": [16, 40], "quickli": [16, 20, 27], "initi": [16, 17, 20, 27, 34, 35], "pytorch": 16, "xla": 16, "numpi": [16, 22, 25, 27, 47, 49], "style": 16, "compos": [16, 17], "transform": [16, 24, 25], "batch": [16, 17, 25, 27, 28, 55], "differenti": [16, 34], "multipl": [16, 18, 20, 29, 55], "_src": 16, "xla_bridg": 16, "register_pjrt_plugin_factori": 16, "getenv": 16, "pjrt_names_and_library_path": 16, "your_itex_path": 16, "libitex_xla_extens": 16, "jaxlib": 16, "xla_extens": 16, "lastest": 16, "interfac": [16, 17, 38, 56], "got": 16, "getpjrtapi": 16, "verifi": [16, 33, 34, 35, 39, 45, 48, 49, 50, 51, 53, 54, 55], "max": [16, 30, 34, 35, 37, 42, 44, 45, 48, 49, 50, 51, 52, 53, 54, 55], "647": [16, 32, 37], "flex": [16, 34, 35, 37, 40, 42, 44, 46, 48, 51, 56], "170": [16, 34, 35, 37, 48, 51], "arc": [16, 34, 35, 37, 42, 56], "red": [16, 37], "hat": [16, 37], "8": [16, 18, 25, 27, 28, 30, 32, 34, 35, 36, 37, 45, 46, 54], "6": [16, 18, 27, 30, 37, 45], "suse": [16, 37], "enterpris": [16, 37], "sle": [16, 37], "sp3": [16, 37], "sp4": [16, 37], "2023": [16, 32, 33, 37, 52], "19": [16, 28, 32, 36, 37], "later": [16, 29, 32, 36, 37], "manylinux2014": [16, 32, 36, 37], "append": [16, 32, 36, 37], "after": [16, 17, 18, 19, 22, 24, 26, 27, 29, 30, 32, 33, 37, 40, 45], "compon": [16, 17, 19, 30, 32, 33, 34, 35, 37], "icd": [16, 32, 37], "23": [16, 28, 32, 37, 55], "17": [16, 28, 32, 35, 37], "26241": [16, 32, 37], "There": [16, 21, 34, 40, 42, 46, 54], "ye": 
[16, 19, 33], "wish": [16, 34], "n": [16, 18, 24, 25, 29, 30, 33, 34, 35, 47], "libitex": [16, 35], "ld_library_path": [16, 35, 37], "your_python_sit": 16, "info": [16, 17, 18, 28, 35, 40, 42], "jnp": 16, "jit": 16, "def": [16, 27], "lax_conv": 16, "random": [16, 25, 47], "prngkei": 16, "lh": 16, "rh": 16, "side": 16, "lax": 16, "conv_with_general_pad": 16, "multipli": [16, 27], "itex_gpu_runtim": 16, "129": [16, 28], "servic": [16, 49], "176": [16, 32], "0x56060b5ae740": 16, "doe": [16, 24, 27], "guarante": [16, 32], "184": 16, "0449753": 16, "093208": 16, "1844783": 16, "9769732": 16, "5857391": 16, "6942389": 16, "9218378": 16, "2862523": 16, "1549542": 16, "8367321": 16, "3978379": 16, "3860377": 16, "9456574": 16, "062028": 16, "0365305": 16, "901286": 16, "5255247": 16, "1421617": 16, "0621": 16, "2933435": 16, "1257985": 16, "1095486": 16, "5584903": 16, "1229166": 16, "7746235": 16, "2446113": 16, "7870374": 16, "8216239": 16, "557919": 16, "9832508": 16, "0887792": 16, "5433128": 16, "9749291": 16, "2580051": 16, "6096935": 16, "264905": 16, "175818": 16, "0094342": 16, "005763": 16, "6559253": 16, "3896458": 16, "4036925": 16, "1342552": 16, "8239582": 16, "6091168": 16, "434404": 16, "671778": 16, "7397764": 16, "930626": 16, "659667": 16, "6508744": 16, "3305787": 16, "4061482": 16, "0829628": 16, "130649": 16, "6637266": 16, "594426": 16, "2636002": 16, "7168686": 16, "8598001": 16, "9009514": 16, "7938274": 16, "4870623": 16, "6193901": 16, "5297288": 16, "0247464": 16, "0905268": 16, "7598859": 16, "9362347": 16, "9513799": 16, "9403584": 16, "1483061": 16, "hlo_pass_pipelin": 16, "301": 16, "hlo": 16, "pipelin": [16, 39, 40, 46], "jit_lax_conv": 16, "181": 16, "fusion_merg": 16, "multi_output_fus": 16, "conv": [16, 17, 24, 47], "convolut": [16, 29], "gpu_compil": 16, "1221": 16, "llvm": 16, "spir_compil": 16, "255": [16, 19, 27], "compiletargetbinari": 16, "compiletospir": 16, "11": [16, 18, 28, 32, 33, 34, 35, 54, 56], "cumul": 16, "99": 16, "74": 
16, "pjrt_stream_executor_cli": 16, "2201": 16, "num_replica": 16, "num_partit": 16, "num_addressable_devic": 16, "2268": 16, "replic": 16, "complet": [16, 29], "1208": 16, "pjrtstreamexecutorbuff": 16, "delet": 16, "1299": 16, "toliter": 16, "v0": [16, 30, 33], "mnist_classifi": 16, "given": [17, 25, 28, 49], "tile": [17, 20, 30, 45, 52, 53, 54, 55], "split": [17, 18], "dimens": 17, "As": [17, 24, 27, 28, 29], "first": [17, 18, 19, 22, 24, 25, 27, 28, 29, 32, 33, 36, 37, 45, 49, 53], "limit": [17, 29, 56], "homogen": 17, "At": [17, 21, 40, 48], "tfg": 17, "mlir": 17, "assum": [17, 27, 29, 33, 34, 35, 45, 49, 53], "matmul": [17, 20, 24, 26, 35], "normal": [17, 20, 25, 27, 29, 34, 42], "autoshard": [17, 55], "back": [17, 27], "under": [17, 23, 26, 28, 30, 34, 46], "primari": [17, 29], "entri": 17, "point": [17, 18, 20, 27, 30, 32, 37, 42], "auto_sharding_pass_mlir": 17, "invok": 17, "hook": 17, "convers": [17, 18, 19, 24], "between": [17, 18, 19, 21, 29, 31, 34, 48, 54, 55], "graphdef": [17, 18], "dialect": 17, "type_infer": 17, "tfg_to_h": 17, "auto_sharding_pass": 17, "hs_to_tfg": 17, "mark": 17, "scope": [17, 35], "unshard": 17, "annot": 17, "uniniti": 17, "properti": [17, 18, 27], "ir": 17, "heterogen": [17, 56], "reli": 17, "heurist": 17, "hsp": 17, "per": [17, 27, 28, 29, 33, 52, 55], "semant": [17, 20, 25], "final": [17, 19, 27, 45], "accord": [17, 18, 42, 50, 52, 54], "turn": [17, 56], "graphopt": [17, 18, 19, 42, 55], "ON": [17, 30, 42, 55], "flag": [17, 35, 53], "global": [17, 27, 30, 55], "shardingconfig": [17, 55], "mode": [17, 20, 24, 30, 45, 48, 54], "auto_mod": [17, 55], "paramet": [17, 26, 42], "batch_siz": [17, 19, 27, 49, 55], "stage_num": [17, 55], "decid": 17, "device_num": [17, 55], "graph_opt": [17, 18, 19, 30, 42, 46, 55], "sharding_config": [17, 55], "itex_cfg": [17, 55], "configproto": [17, 18, 19, 42, 46, 55], "set_config": [17, 18, 19, 42, 55], "itex_optimizer_before_shard": 17, "pbtxt": 17, "itex_optimizer_after_shard": 17, "resnet50": 
[17, 28, 39], "train": [17, 18, 21, 24, 25, 26, 28, 31, 32, 33, 37, 38, 39, 42, 45, 46, 51], "fp16": [18, 19, 39, 42, 45], "bf16": [18, 19, 24, 39, 40, 42, 45, 49, 53, 54, 55], "obvious": 18, "compar": [18, 27, 29, 39], "fp32": [18, 19, 20, 24, 39, 40, 45, 46, 53], "danger": 18, "order": [18, 19, 27, 28, 29, 33, 38], "achiev": [18, 29], "faster": [18, 19, 25, 27, 29, 42], "strong": 18, "four": 18, "allowlist": 18, "denylist": 18, "inferlist": 18, "clearlist": 18, "let": [18, 27, 31], "balanc": [18, 19], "expect": [18, 33, 46, 56], "alwai": [18, 27], "critic": 18, "addition": [18, 27], "downstream": 18, "too": [18, 27, 32, 37], "exp": 18, "gt": [18, 30, 55], "due": [18, 29], "effect": [18, 28, 29], "desir": [18, 28], "explain": 18, "principl": 18, "index": [18, 29], "7": [18, 27, 28, 30, 45, 48], "everi": [18, 20, 48], "ii": [18, 19, 30], "whose": 18, "iii": [18, 19], "deni": 18, "ignor": [18, 27, 31], "iv": [18, 19], "insert": [18, 19, 24, 46], "increas": [18, 27, 46], "priorit": 18, "auto_mixed_precision_opt": [18, 19, 42], "automixedprecosionopt": 18, "16": [18, 27, 28, 30, 36, 42, 45, 54], "32": [18, 25, 26, 27, 28, 30, 42, 45, 51, 54], "data_typ": [18, 19, 42], "itex_auto_mixed_precision_data_typ": [18, 19, 42], "ampthre": 18, "default_data_typ": [18, 30], "unsafe_force_al": 18, "itex_auto_mixed_precision_unsafe_force_al": 18, "allowlist_add": [18, 19], "itex_auto_mixed_precision_allowlist_add": [18, 19], "string": [18, 27, 28, 34, 35], "denylist_add": 18, "itex_auto_mixed_precision_denylist_add": 18, "inferlist_add": 18, "itex_auto_mixed_precision_inferlist_add": 18, "clearlist_add": 18, "itex_auto_mixed_precision_clearlist_add": 18, "allowlist_remov": 18, "itex_auto_mixed_precision_allowlist_remov": 18, "denylist_remov": 18, "itex_auto_mixed_precision_denylist_remov": 18, "inferlist_remov": [18, 19], "itex_auto_mixed_precision_inferlist_remov": [18, 19], "clearlist_remov": 18, "itex_auto_mixed_precision_clearlist_remov": 18, "avgpool": [18, 19], "mani": [18, 
21, 27, 28, 29, 52], "extra": [18, 27], "up": [18, 22, 27, 29, 32, 35, 39, 45, 48, 51], "tabl": [18, 27, 28], "correspond": [18, 28], "itex_auto_mixed_precision_log_path": [18, 19, 20, 30], "tf_auto_mixed_precision_graph_rewrite_log_path": 18, "tf_auto_mixed_precision_graph_rewrite_level": 18, "tf_auto_mixed_precision_graph_rewrite_allowlist_add": 18, "tf_auto_mixed_precision_graph_rewrite_denylist_add": 18, "tf_auto_mixed_precision_graph_rewrite_inferlist_add": 18, "tf_auto_mixed_precision_graph_rewrite_clearlist_add": 18, "tf_auto_mixed_precision_graph_rewrite_allowlist_remov": 18, "tf_auto_mixed_precision_graph_rewrite_denylist_remov": 18, "tf_auto_mixed_precision_graph_rewrite_inferlist_remov": 18, "tf_auto_mixed_precision_graph_rewrite_clearlist_remov": 18, "With": [18, 19, 27, 28, 40, 43, 47, 48], "most": [18, 19, 27, 28, 29, 42, 50], "basic": [18, 19, 20, 27], "itexauto_mixed_precision_opt": [18, 19], "automixedprecisionopt": [18, 19, 42], "float16graph_opt": [18, 19], "auto_mixed_precision_optionsgraph_opt": 18, "auto_mixed_precis": [18, 19, 30, 42], "onconfig": [18, 19], "itex_auto_mixed_precis": [18, 19, 28, 30, 42], "1export": [18, 19], "avgpool3d": [18, 19], "cnn": [18, 29, 39, 40], "v4": [18, 39], "epoch": [18, 19, 27, 45, 52, 53], "slower": [18, 19, 27], "becaus": [18, 19, 27], "subsequ": [18, 19, 27, 29, 48], "alreadi": [18, 27, 33, 40], "howev": [18, 21, 24, 27, 28, 29, 48], "usual": 18, "chanc": [18, 27], "my": [18, 19], "automixedprecis": 18, "1657011814330": 18, "pb": [18, 19, 31, 42], "binari": [18, 31, 34, 35], "txt": [18, 32, 37, 48, 51, 55], "text": [18, 39], "preop": 18, "1657011815538": 18, "pre": [18, 30, 36, 37, 45, 50, 53], "paintbucket": 18, "netron": 18, "softmax": [18, 19, 27], "move": [18, 29, 45, 49, 53], "altern": 18, "abov": [18, 19, 22, 27, 28, 29, 32, 42, 45, 46, 49, 50, 51, 52, 53, 55], "littl": 18, "drop": [18, 28], "occupi": 18, "over": [18, 27], "whole": [18, 20, 30, 45], "runtim": [18, 23, 25, 27, 29, 32, 34, 35, 56], 
"repeat": 18, "until": [18, 29], "reach": 18, "peak": [18, 23], "consumpt": [19, 21, 27, 42], "kera": [19, 25, 26, 46, 48, 52, 56], "similar": [19, 29], "offer": [19, 29], "frozen": 19, "layernorm": [19, 24, 26], "instancenorm": [19, 26], "swish": [19, 24], "power": [19, 56], "versu": [19, 29], "remapp": [19, 24, 30], "exist": [19, 24, 26, 27, 28, 40], "cover": [19, 21, 24, 28, 29], "than": [19, 25, 27, 29, 32, 37, 42, 47, 52], "knowledg": [19, 29], "possibl": [19, 29, 34], "special": [19, 23, 27, 34, 35], "bfloat16graph_opt": 19, "4096": [19, 27], "unit": [19, 25, 27, 29], "num_unit": [19, 27], "els": [19, 27, 35, 54], "784": [19, 27, 28], "digit": [19, 27], "dens": [19, 20, 27], "dense_1": [19, 27], "dense_2": [19, 27], "dense_logit": [19, 27], "predict": [19, 26, 27, 51], "sparse_categorical_crossentropi": [19, 27], "rmsprop": [19, 27], "metric": [19, 27], "x_train": [19, 27], "y_train": [19, 27], "x_test": [19, 27], "y_test": [19, 27], "dataset": [19, 27, 46, 52], "mnist": [19, 27, 31, 39, 52], "load_data": [19, 27], "reshap": [19, 25, 27], "60000": [19, 27], "astyp": [19, 27, 47], "10000": [19, 25, 27], "histori": [19, 27], "fit": [19, 29], "8192": [19, 27], "validation_split": [19, 27], "test_scor": [19, 27], "evalu": [19, 27, 48, 51], "stabil": [19, 27], "rule": 19, "introduct": [19, 56], "adjust": [20, 25], "Not": 20, "rest": [20, 24], "ll": [20, 24], "prioriti": [20, 30], "itex_tile_as_devic": 20, "card": [20, 52], "treat": 20, "itex_fp32_math_mod": 20, "math": [20, 24, 27, 32, 37], "tf32": 20, "bf32": 20, "auto_mixed_precision_log_path": [20, 30], "tf_cpp_max_vlog_level": 20, "itex_cpp_min_log_level": 20, "tf_cpp_min_log_level": 20, "displai": 20, "onc": [20, 27, 29], "across": [20, 25], "iter": [20, 55], "larg": [20, 27, 29, 39], "dump": 20, "bert": [20, 39], "encod": 20, "layer_0": 20, "biasadd": [20, 26], "read": [20, 27, 40, 49], "dt_float": [20, 35], "data_format": [20, 55], "nhwc": [20, 29], "remain": 20, "situat": [20, 30], "preserv": 20, "dpc": 
[21, 32, 33, 34, 35, 37], "besid": [21, 29], "etc": [21, 32], "aka": 21, "almost": 21, "thing": 21, "expos": [21, 22, 56], "factor": [21, 28], "influenc": [21, 28, 29], "properli": [21, 28], "unifi": [21, 28], "topologi": [21, 28, 29], "combin": [21, 28, 29, 48], "autom": [21, 28], "complic": [21, 28], "launch": [21, 37, 48], "blob": [21, 31], "20230123": 21, "md": 21, "openxla_support_on_gpu": 21, "tfx": 21, "bridg": [21, 31], "streamlin": [21, 31], "deploi": [21, 31], "while": [21, 27, 29, 30, 31, 34, 43, 47, 50], "effici": [21, 29, 31, 55], "easi": [21, 40, 56], "track": [22, 50], "item": 22, "stat": 22, "trace": 22, "viewer": 22, "tensorflow_hub": 22, "tensorboard": [22, 56], "np": [22, 25, 47, 49, 52, 53, 54], "tf_hub": 22, "logpath": 22, "join": [22, 29], "profiler_demo": 22, "set_log_device_plac": 22, "keraslay": 22, "tfhub": 22, "imagenet": 22, "resnet_v1_50": 22, "classif": 22, "ones": [22, 25, 26, 30], "224": 22, "warm": 22, "stop": [22, 29], "demo": 22, "logdir": 22, "bind_al": 22, "analyz": 22, "tab": 22, "dashboard": 22, "refresh": 22, "bring": [23, 27, 28, 56], "deeper": 23, "choos": [23, 25, 27, 28, 29, 34, 35, 38, 42, 46, 47, 49], "These": [24, 27, 28, 56], "equal": [24, 29], "notequ": 24, "greaterequ": 24, "greater": [24, 29], "lessequ": 24, "l2loss": 24, "addn": 24, "batchmatmul": [24, 26], "mul": 24, "trainingop": 24, "relu6": 24, "elu": 24, "leakyrelu": 24, "gelu_erf": 24, "gelu_tanh": 24, "tanh": [24, 25, 26], "sigmoid": [24, 25, 26], "fusedbatchnorm": 24, "fusedbatchnormgrad": 24, "relugrad": 24, "biasaddgrad": 24, "convgradfilt": 24, "pad": [24, 25, 47], "break": 24, "closer": 24, "accmatmul": 24, "fusedmatmul": 24, "fusedaccmatmul": 24, "matcher": 24, "withsum": 24, "attribut": [24, 30], "tout": 24, "tpost": 24, "is_bf16_math_mod": 24, "boolean": [24, 28], "indic": [24, 27, 42, 55], "transpos": [24, 26], "conv3d": 24, "maxpool3d": 24, "unnecessari": [24, 27, 29], "ndhwc": 24, "ncdhw": 24, "adam": 25, "decai": 25, "weight_decay_r": 25, "001": 
[25, 26], "learning_r": [25, 51], "beta_1": 25, "beta_2": 25, "999": 25, "epsilon": [25, 26], "1e": [25, 27], "exclude_from_weight_decai": 25, "layer_norm": 25, "kwarg": [25, 26], "adamw": 25, "describ": [25, 27, 28, 29], "decoupl": 25, "regular": 25, "loshch": 25, "ilov": 25, "hutter": 25, "pdf": 25, "tfa": [25, 26, 49], "trainabl": 25, "piecewiseconstantdecai": 25, "15000": 25, "lr": [25, 52], "wd": 25, "lambda": 25, "ba": 25, "et": 25, "al": 25, "2016": 25, "axi": [25, 26], "scale": [25, 26, 55], "beta_initi": [25, 26], "gamma_initi": [25, 26], "beta_regular": [25, 26], "gamma_regular": [25, 26], "beta_constraint": [25, 26], "gamma_constraint": [25, 26], "independ": [25, 28], "rather": 25, "close": [25, 29], "deviat": 25, "arang": 25, "99998": 25, "group": [25, 29], "yuxin": 25, "wu": 25, "kaim": 25, "he": 25, "divid": [25, 27, 29], "varianc": 25, "empir": 25, "stabl": [25, 27, 39, 56], "norm": 25, "wide": [25, 39], "rang": [25, 27, 29], "linearli": 25, "4d": 25, "gaussian": 25, "where": [25, 27, 29, 34], "nonlinear": 25, "gate": 25, "sign": [25, 32], "arrai": 25, "00404969": 25, "15865526": 25, "8413447": 25, "9959502": 25, "00363725": 25, "158808": 25, "841192": 25, "9963627": 25, "long": 25, "short": [25, 27], "hochreit": 25, "schmidhub": 25, "1997": 25, "lstm": 25, "200": [25, 26], "recurrent_activ": [25, 26], "use_bia": [25, 26], "kernel_initi": [25, 26], "glorot_uniform": [25, 26], "recurrent_initi": [25, 26], "orthogon": [25, 26], "bias_initi": [25, 26], "constraint": 25, "fallback": 25, "fast": 25, "mask": [25, 39], "strictli": 25, "outermost": 25, "return_sequ": 25, "return_st": 25, "whole_seq_output": 25, "final_memory_st": 25, "final_carry_st": 25, "experimental_ops_overrid": [26, 30], "overload": 26, "kept": [26, 27], "layernormgrad": 26, "itexlayernorm": 26, "itexlayernormgrad": 26, "itexgelu": 26, "itexgelugrad": 26, "addon": [26, 52], "itexlstm": 26, "itexrnn": 26, "mixed_precis": 27, "mixed_float16": 27, "mixed_bfloat16": 27, "distinguish": 27, 
"nvidia": [27, 45, 48, 49, 53], "is_gpu_avail": 27, "test_func": 27, "identif": 27, "2022": [27, 28, 30], "14": [27, 52], "02": [27, 54], "52": [27, 28], "41": 27, "061277": 27, "w": [27, 39], "gpu_profil": 27, "111": [27, 29], "warn": [27, 28, 35], "061301": 27, "114": [27, 52], "061306": 27, "118": 27, "063685": 27, "063851": 27, "stream_executor": 27, "cuda": 27, "cuda_driv": 27, "269": 27, "cuinit": 27, "303": 27, "063865": 27, "cuda_diagnost": 27, "156": 27, "dut3046": 27, "atsp": 27, "proc": [27, 29], "caus": [27, 29, 50], "set_global_polici": 27, "slowli": 27, "least": [27, 32, 33], "multi": [27, 29, 30, 33, 34, 53, 55], "worker": 27, "messag": [27, 28], "aspect": 27, "constructor": 27, "numer": 27, "queri": 27, "compute_dtyp": 27, "variable_dtyp": 27, "mention": [27, 29], "next": 27, "domin": 27, "neglig": 27, "therefor": [27, 29], "fewer": 27, "finish": [27, 34, 47, 50], "dense1": 27, "dense2": 27, "previous": 27, "Their": 27, "mismatch": 27, "dtype_polici": 27, "incorrect": 27, "end": [27, 39, 40, 46], "would": [27, 32, 34, 54], "correct": [27, 34, 35], "keep": [27, 29], "middl": 27, "fine": [27, 28, 29, 45], "intermedi": 27, "flow": 27, "occur": 27, "think": 27, "But": 27, "necessari": [27, 32, 36, 37, 47], "last": [27, 50], "suffici": 27, "even": [27, 28, 29, 38, 56], "still": 27, "simpli": [27, 55], "particular": 27, "storag": [27, 42, 50], "googleapi": [27, 42, 50], "npz": 27, "11490434": 27, "1u": 27, "don": 27, "divis": 27, "retriev": 27, "scratch": [27, 45], "again": 27, "initial_weight": 27, "get_weight": 27, "6240": 27, "3359": 27, "val_loss": 27, "9755": 27, "val_accuraci": 27, "7494": 27, "83m": 27, "7987": 27, "7520": 27, "3455": 27, "8972": 27, "81m": 27, "3670": 27, "8819": 27, "3753": 27, "8751": 27, "85m": 27, "3555": 27, "8863": 27, "2155": 27, "9377": 27, "84m": 27, "1986": 27, "9410": 27, "4498": 27, "8534": 27, "spend": 27, "afterward": [27, 28, 29], "colab": 27, "rerun": 27, "cell": [27, 48], "On": [27, 29, 32, 36, 37], 
"significantli": 27, "sped": 27, "world": 27, "doubl": 27, "toi": 27, "entir": 27, "60": [27, 28, 45], "000": 27, "imag": [27, 36, 37, 39, 48], "narrow": 27, "65504": 27, "infin": 27, "much": [27, 29, 46], "256": [27, 55], "inf": 27, "rare": 27, "gradient": 27, "prevent": 27, "concept": [27, 29], "sai": [27, 45], "1024": 27, "greatli": 27, "pseudocod": 27, "loss_scal": 27, "grad": 27, "compute_gradi": 27, "trainable_vari": 27, "tricki": 27, "solv": 27, "explicitli": [27, 28, 30, 46], "wrapper": [27, 37], "lossscaleoptim": 27, "far": 27, "did": [27, 29], "wrap": 27, "highli": 27, "recommend": [27, 29, 30, 31, 32, 33, 34, 36, 37, 41, 46], "been": [27, 29, 48, 55], "known": [27, 50], "loss_object": 27, "sparsecategoricalcrossentropi": 27, "train_dataset": 27, "from_tensor_slic": 27, "shuffl": 27, "test_dataset": 27, "method": [27, 29, 40, 46], "unscal": 27, "get_scaled_loss": 27, "get_unscaled_gradi": 27, "apply_gradi": 27, "nan": 27, "halv": 27, "had": [27, 29], "potenti": [27, 56], "train_step": [27, 45, 55], "gradienttap": 27, "tape": 27, "scaled_loss": 27, "scaled_gradi": 27, "zip": 27, "few": [27, 54], "happen": [27, 50], "qualiti": 27, "test_step": 27, "retrain": 27, "set_weight": 27, "epoch_loss_avg": 27, "test_accuraci": 27, "sparsecategoricalaccuraci": 27, "update_st": 27, "924008369445801": 27, "7239000201225281": 27, "5294489860534668": 27, "9168000221252441": 27, "3364005982875824": 27, "9381000399589539": 27, "25294047594070435": 27, "9486000537872314": 27, "26531240344047546": 27, "9536000490188599": 27, "perspect": [28, 29], "numactl": 28, "placement": [28, 29], "polici": [28, 29, 56], "malloc": [28, 29], "unspecifi": 28, "knob": 28, "your_script": 28, "your_script_arg": 28, "latency_mod": 28, "throughput_mod": 28, "often": [28, 32, 36, 37], "calcul": [28, 48], "mutual": 28, "exclus": 28, "infer_resnet50": [28, 43], "undesir": 28, "log_path": 28, "absolut": 28, "rel": 28, "One": [28, 29], "prefix": 28, "_timestamp_inst": 28, "anoth": [28, 29], 
"_timestamp_instance_n_cor": 28, "run_20210712212258_inst": 28, "run_20210712212258_instance_0_cores_0": 28, "43": 28, "interpret": 28, "no_python": 28, "prepend": [28, 49, 53, 54], "log_file_prefix": 28, "yourself": 28, "ninstanc": 28, "integ": 28, "instance_idx": 28, "among": [28, 29], "ncore_per_inst": 28, "resourc": [28, 29, 50], "node_id": 28, "skip_cross_node_cor": 28, "cross": [28, 29], "disable_numactl": 28, "disable_taskset": 28, "taskset": 28, "use_logical_cor": 28, "core_list": 28, "core_id": 28, "enable_tcmalloc": 28, "enable_jemalloc": 28, "use_default_alloc": 28, "prefer": [28, 32, 36, 37], "certain": [28, 29], "openmp": 28, "kmp_affin": [28, 29], "granular": [28, 29], "compact": [28, 29], "hyper": [28, 29], "our": 28, "enable_itex_amp": 28, "enable_itex_layout_opt": 28, "itex_layout_opt": [28, 29, 30], "num": [28, 29], "intraop": 28, "interop": 28, "run_20221009103552_instance_0_cores_0": 28, "run_20221009103552_inst": 28, "cat": 28, "09": [28, 54], "35": [28, 37], "53": 28, "136": 28, "__main__": 28, "neither": 28, "nor": 28, "conda_prefix": 28, "virtual_env": 28, "lib64": 28, "sdp": 28, "ld_preload": [28, 29], "omp_num_thread": 28, "96": [28, 35], "kmp_blocktim": [28, 29], "tf_enable_onednn_opt": 28, "137": 28, "localalloc": 28, "95": 28, "tee": [28, 32, 45, 55], "run_20221009104740_inst": 28, "run_20221009104740_instance_0_cores_0": 28, "191": 28, "47": [28, 54], "908": 28, "909": 28, "192": 28, "run_20221009105044_inst": 28, "run_20221009105044_instance_0_cores_12": 28, "50": [28, 53], "693": 28, "694": 28, "run_20221009105320_inst": 28, "run_20221009105320_instance_0_cores_0": 28, "21": 28, "089": 28, "090": 28, "run_20221009105838_inst": 28, "run_20221009105838_instance_0_cores_0": 28, "run_20221009105838_instance_1_cores_12": 28, "run_20221009105838_instance_2_cores_24": 28, "run_20221009105838_instance_3_cores_36": 28, "run_20221009105838_instance_4_cores_48": 28, "59": 28, "run_20221009105838_instance_5_cores_60": 28, "71": 28, 
"run_20221009105838_instance_6_cores_72": 28, "83": [28, 29], "run_20221009105838_instance_7_cores_84": 28, "58": 28, "38": 28, "757": 28, "772": 28, "795": 28, "24": [28, 52], "806": 28, "36": 28, "817": 28, "48": [28, 54], "828": 28, "839": 28, "72": 28, "850": 28, "84": [28, 29], "run_20221009110327_inst": 28, "run_20221009110327_instance_0_cores_0": 28, "run_20221009110327_instance_1_cores_4": 28, "run_20221009110327_instance_2_cores_8": 28, "run_20221009110327_instance_3_cores_12": 28, "run_20221009110327_instance_4_cores_16": 28, "run_20221009110327_instance_5_cores_20": 28, "run_20221009110327_instance_6_cores_24": 28, "27": [28, 29, 55], "run_20221009110327_instance_7_cores_28": 28, "31": [28, 32], "run_20221009110327_instance_8_cores_32": 28, "run_20221009110327_instance_9_cores_36": 28, "39": 28, "run_20221009110327_instance_10_cores_40": 28, "run_20221009110327_instance_11_cores_44": 28, "run_20221009110327_instance_12_cores_48": 28, "51": 28, "run_20221009110327_instance_13_cores_52": 28, "run_20221009110327_instance_14_cores_56": 28, "run_20221009110327_instance_15_cores_60": 28, "63": 28, "run_20221009110327_instance_16_cores_64": 28, "67": 28, "run_20221009110327_instance_17_cores_68": 28, "run_20221009110327_instance_18_cores_72": 28, "75": 28, "run_20221009110327_instance_19_cores_76": 28, "79": 28, "run_20221009110327_instance_20_cores_80": 28, "run_20221009110327_instance_21_cores_84": 28, "87": 28, "run_20221009110327_instance_22_cores_88": 28, "91": 28, "run_20221009110327_instance_23_cores_92": 28, "03": [28, 54], "198": 28, "215": 28, "216": 28, "229": 28, "241": 28, "254": 28, "266": 28, "278": 28, "20": [28, 36, 53, 55], "290": 28, "302": 28, "28": [28, 29, 33, 37], "315": 28, "327": 28, "339": 28, "351": 28, "364": 28, "376": 28, "388": 28, "56": [28, 29], "400": [28, 54], "413": 28, "425": 28, "68": 28, "438": 28, "452": 28, "76": 28, "465": 28, "80": 28, "480": 28, "494": 28, "88": [28, 51], "509": 28, "92": 28, 
"run_20221009110849_inst": 28, "run_20221009110849_instance_0_cores_0": 28, "run_20221009110849_instance_1_cores_11": 28, "run_20221009110849_instance_2_cores_22": 28, "run_20221009110849_instance_3_cores_33": 28, "08": [28, 54], "49": [28, 37, 54], "891": 28, "892": 28, "run_20221009110849_instance_1_cores_24": 28, "930": 28, "run_20221009110849_instance_2_cores_48": 28, "951": 28, "run_20221009110849_instance_3_cores_72": 28, "confirm": [28, 34, 35], "34": 28, "586": 28, "assign": [28, 29, 35], "604": 28, "605": 28, "run_20221009111034_instance_0_cores_0": 28, "144": 28, "145": [28, 54, 55], "run_20221009111239_instance_0_cores_24": 28, "run_20221009111753_inst": 28, "run_20221009111753_instance_0_cores_0": 28, "947": 28, "948": 28, "run_20221009111951_inst": 28, "run_20221009111951_instance_0_cores_0": 28, "404": 28, "405": 28, "match": [28, 38], "conf": 28, "549": 28, "550": 28, "malloc_conf": 28, "oversize_threshold": 28, "background_thread": 28, "metadata_thp": 28, "run_20221009112720_instance_0_cores_0": 28, "29": 28, "05": [28, 52], "206": 28, "207": 28, "run_20221009112905_instance_0_cores_0": 28, "911": 28, "run_20221009112956_instance_0_cores_0": 28, "although": 29, "articl": 29, "omp": 29, "briefli": 29, "background": 29, "being": 29, "socket": [29, 33, 55], "competit": 29, "stall": 29, "busi": 29, "uma": 29, "connect": 29, "control": [29, 39, 46, 55], "remot": 29, "lscpu": [29, 46], "platinum": 29, "8180m": 29, "detect": 29, "onboard": 29, "logic": 29, "thu": 29, "total": [29, 52], "112": 29, "second": [29, 46, 54, 55], "neg": 29, "50ghz": 29, "node0": 29, "node1": 29, "friendli": 29, "nchw": 29, "idea": 29, "bound": 29, "workload": [29, 39, 46, 56], "nth": 29, "man": 29, "cpunodebind": 29, "membind": 29, "wikipedia": [29, 45], "wherebi": 29, "master": [29, 31], "consecut": 29, "fork": 29, "figur": 29, "illustr": 29, "libgomp": 29, "libiomp": 29, "region": 29, "along": 29, "seen": 29, "coupl": 29, "commonli": 29, "gomp": 29, "affin": 29, "comma": 29, 
"hyphen": 29, "contigu": 29, "gomp_cpu_affin": 29, "omp_proc_bind": 29, "omp_schedul": 29, "static": 29, "ld": 29, "preload": 29, "libiomp5": [29, 35], "kmp": 29, "dramat": 29, "togeth": 29, "thrash": 29, "suppos": [29, 45], "leav": 29, "compet": 29, "strategi": 29, "proclist": 29, "classic": 29, "blocktim": 29, "millisecond": 29, "wait": 29, "sleep": 29, "200m": 29, "elaps": 29, "larger": [29, 34, 35], "reserv": 29, "sole": 29, "penal": 29, "plai": 29, "role": 29, "destruct": 29, "reus": [29, 40], "jemalloc": 29, "hold": 29, "dealloc": 29, "costli": 29, "gperftool": 29, "plu": 29, "nice": 29, "analysi": 29, "xzvf": 29, "heap": 29, "checker": 29, "debugalloc": 29, "flexibl": 30, "protocolmessag": 30, "easili": 30, "tune": [30, 40, 45], "offononoffoff": 30, "itex_onednn_graph": [30, 46], "itex_layout_optitex_remapperitex_auto_mixed_precisionitex_shard": 30, "except": [30, 37], "enum": 30, "itexdatatyp": 30, "datatyp": [30, 35, 45, 49, 53], "toggl": 30, "unless": 30, "field": 30, "onednn_graph": 30, "onednn_graphoverrid": 30, "layout_opt": 30, "itex_remapp": 30, "itex_shard": 30, "xpu_force_sync": 30, "itex_sync_exec": 30, "sync": 30, "hurt": 30, "rais": 30, "valueerror": 30, "git_vers": [30, 33], "7112d33": 30, "onednn_cpu_git_vers": 30, "a930253": 30, "onednn_gpu_git_vers": 30, "compiler_vers": 30, "gcc": 30, "20180905": 30, "dpcpp": [30, 32], "122": 30, "tf_compatible_vers": 30, "lt": 30, "put": 31, "libitex_cpu_cc": [31, 35], "libitex_gpu_cc": [31, 35], "l28": 31, "exit": 31, "xxxxx": [31, 55], "kernels_experiment": 31, "tf_cuda_librari": 31, "if_not_mobil": 31, "p1": 31, "tf_serv": 31, "serving_plugin": 31, "l24": 31, "l29": 31, "local_repositori": 31, "org_tensorflow": 31, "wno": 31, "stringop": 31, "truncat": 31, "rm": [31, 35, 41, 42, 54], "rf": [31, 41, 54], "tmp": 31, "mnist_saved_model": 31, "saved_model": 31, "l": [31, 35], "modelserv": 31, "plug": [31, 56], "hub": 31, "port": [31, 48], "rest_api_port": 31, "8501": 31, "model_base_path": 31, 
"tensorflow_plugin": 31, "path_to_libitex_cpu_cc": 31, "oneapi_install_path": 31, "path_to_libitex_gpu_cc": 31, "mnist_client": 31, "num_test": 31, "1000": [31, 49, 54], "xx": [31, 54], "earli": 32, "effort": 32, "basi": 32, "subystem": 32, "graphic": [32, 34, 35], "101": 32, "4255": 32, "dch": 32, "gpg": 32, "agent": 32, "qo": 32, "dearmor": 32, "keyr": 32, "echo": 32, "deb": 32, "arch": 32, "i386": 32, "jammi": 32, "igc": 32, "cm": 32, "libigc1": 32, "13822": 32, "libigdfcl1": 32, "libigdgmm12": 32, "pub": 32, "sw": 32, "archiv": 32, "instead": [32, 45, 48, 49, 53], "icd_23": 32, "04_": 32, "isol": [32, 36, 37], "basekit": [32, 33, 37], "weekli": 32, "env_check": [32, 33, 37, 56], "access": 32, "onemkl": [32, 33, 34, 35, 37], "registrationcent": [32, 37], "akdlm": [32, 37], "irc_na": [32, 37], "992857b9": [32, 37], "624c": [32, 37], "45de": [32, 37], "9701": [32, 37], "f6445d845359": [32, 37], "l_basekit_p_2023": [32, 37], "49397_offlin": [32, 37], "mpi": [32, 33, 37], "deploy": [33, 36, 37], "miniconda": 33, "approach": 33, "easiest": 33, "setup": [33, 36, 38, 40], "press": 33, "curl": 33, "anaconda": 33, "miniconda3": 33, "x86_64": [33, 34, 35], "restart": 33, "termin": 33, "bashrc": 33, "intelpython3_ful": 33, "142f5f29": 33, "ccl": [33, 37], "cluster": 33, "fi_provid": 33, "though": 34, "virtual": [34, 35, 45, 46, 48, 49, 50, 51, 53, 54, 55], "itex_build": [34, 35], "aot": [34, 35], "ahead": [34, 35], "startup": [34, 35], "prolong": [34, 35], "minut": [34, 35], "tookit": [34, 35], "tree": [34, 35], "prompt": [34, 35], "differenct": [34, 35], "fill": [34, 35], "ats": [34, 35], "m150": [34, 35], "acm": [34, 35], "g11": [34, 35], "ve": [34, 35], "140": [34, 35], "m75": [34, 35], "pvc": [34, 35], "a730m": [34, 35], "g10": [34, 35], "a380": [34, 35], "wrong": [34, 35], "identifi": 34, "libitex_common": 34, "_pywrap_itex": 34, "libitex_cpu": 34, "libitex_gpu": 34, "preconfigur": 34, "bazelrc": 34, "shoul": 35, "diretcori": 35, "llvm_openmp": 35, "pythonhost": 35, 
"ed": 35, "310fee0477ce46f722c561dd7e21eebca0d1d29bdb3cf4a2335b845fbba4": 35, "cp311": 35, "manylinux_2_17_x86_64": 35, "manylinux2014_x86_64": 35, "b": [35, 42, 46, 54, 55], "unzip": 35, "tensorflow_2": 35, "symbol": 35, "ln": 35, "libtensorflow_cc": 35, "libtensorflow_framework": 35, "libtensorflow": 35, "r2": [35, 55], "install_head": 35, "environment": 35, "library_path": 35, "tf_loadpluggabledevicelibrari": 35, "c_api_experiment": 35, "tf_statu": 35, "lib_path": 35, "client_sess": 35, "standard_op": 35, "newrootscop": 35, "assign_x": 35, "randomnorm": 35, "assign_i": 35, "z": [35, 52], "const": 35, "vz": 35, "vector": 35, "clientsess": 35, "session": [35, 46], "fetch": 35, "tf_check_ok": 35, "matrix": 35, "xpu_lib_path": 35, "c_str": 35, "tf_code": 35, "status_msg": 35, "tf_messag": 35, "makefil": 35, "tf_include_path": 35, "tfcc_path": 35, "example_test": 35, "ltensorflow_framework": 35, "ltensorflow_cc": 35, "wl": 35, "rpath": 35, "2nd": 36, "4th": [36, 42], "cento": 36, "sapphir": [36, 42], "rapid": [36, 42], "8888": [36, 37, 42, 46, 48, 50], "pip3": 36, "simultan": 37, "stack": [37, 38], "intel64": 37, "libfabr": 37, "i_mpi_root": 37, "ccl_root": 37, "fi_provider_path": 37, "tbb": 37, "libiari": 37, "en": 37, "consol": 37, "00": [37, 54], "374832": 37, "itex_cpu_wrapp": 37, "42": 37, "217981": 37, "itex_gpu_wrapp": 37, "205706": 37, "313231": 37, "varieti": 39, "classifi": [39, 54], "bare": 39, "metal": 39, "alexnet": 39, "recogn": [39, 40], "handwrit": [39, 40], "ai": [39, 40, 44, 46, 56], "zoo": 39, "diffus": [39, 56], "text2imag": 39, "pretrain": 39, "3d": 39, "unet": 39, "medic": 39, "segment": [39, 53], "technologi": 40, "big": 40, "blocker": 40, "analyt": 40, "websit": [40, 56], "env_nam": 41, "env_itex": [41, 42, 46, 48, 49, 50, 52, 53], "venv": [41, 49, 52, 53], "internet": 42, "throughput": [42, 48], "seriesintel": 42, "170intel": 42, "seriesne": 42, "seriessupport": 42, "itex_repo": 42, "pwd": [42, 55], "infer_inception_v4_amp": 42, "v1_8": 42, 
"inceptionv4_fp32_pretrained_model": 42, "set_env_gpu": [42, 43, 50], "ws1": 42, "infer_fp32_vs_amp": 42, "screen": 42, "01837550401687622": 42, "0113076031208038": 42, "fp": 42, "128": [42, 45, 51], "92880015134813": 42, "1691980294577": 42, "6153628825864496": 42, "867908472383153": 42, "wors": 42, "set_env_cpu": [43, 50], "env_itex_cpu": [43, 50], "success": [43, 47, 48, 55], "n02123159": 43, "tiger_cat": 43, "22355853": 43, "legaci": [45, 48, 49, 50, 51, 53, 54, 55], "deeplearningexampl": [45, 49, 53], "tensorflow2": [45, 52, 53], "languagemodel": 45, "pip_set_env": [45, 46, 48, 49, 51, 53, 54], "extract": 45, "squad": [45, 51], "bookcorpu": 45, "data_download": 45, "v1": [45, 46, 51, 56], "google_pretrained_weight": 45, "uncased_l": 45, "24_h": 45, "1024_a": 45, "12_h": 45, "768_a": 45, "tfrecord": [45, 49, 54], "books_wiki_en_corpu": 45, "consum": 45, "v100": 45, "dai": 45, "pretrain_bert": 45, "lamb": 45, "maximum": 45, "sequenc": [45, 48], "length": 45, "phase1": 45, "phase2": 45, "512": [45, 51], "train_batch_size_phase1": 45, "train_batch_size_phase2": 45, "eval_batch_s": 45, "learning_rate_phase1": 45, "5e": 45, "learning_rate_phase2": 45, "usa_xla": 45, "num_gpu": [45, 55], "warmup_steps_phase1": 45, "660": 45, "warmup_steps_phase2": 45, "66": 45, "2600": 45, "save_checkpoint_step": 45, "num_accumulation_steps_phase1": 45, "num_accumulation_steps_phase2": 45, "bert_model": [45, 51], "gbs1": 45, "expr": 45, "gbs2": 45, "pretrain_result_dir": 45, "tf_bert_pretraining_lamb_": 45, "_gbs1_": 45, "_gbs2_": 45, "data_dir": [45, 49, 53, 54], "run_pretraining_lamb": 45, "pretrain_lamb": 45, "checkpoint": 45, "batch_size_per_gpu": 45, "learning_rate_per_gpu": 45, "use_xla": 45, "squad_vers": 45, "use_mytrain": 45, "pretrain_path": 45, "phase_2": 45, "ckpt": [45, 51], "result_dir": 45, "tf_bert_finetune_": 45, "run_squad": [45, 51], "calibr": 46, "qdq": 46, "dequant": 46, "flower": 46, "photo": 46, "transfer": 46, "stage": 46, "protobuf": 46, 
"rewriter_config_pb2": 46, "infer_config": 46, "rewrite_opt": 46, "constant_fold": 46, "rewriterconfig": 46, "set_sess": 46, "speedup": [46, 55], "grep": 46, "vnni": 46, "avx_vnni": 46, "amx": 46, "amx_bf16": 46, "amx_int8": 46, "run_jupyt": 46, "yyi": 46, "xxxxxxxx": 46, "ipynb": [46, 48, 50], "mit": 46, "sy": 47, "num_channel": 47, "input_width": 47, "input_height": 47, "filter_width": 47, "filter_height": 47, "rand": 47, "stride": 47, "bias_add": 47, "479142": 47, "7296917": 47, "6456823": 47, "077278": 47, "9259825": 47, "3000765": 47, "3999124": 47, "0527704": 47, "0656753": 47, "85485": 47, "7297122": 47, "9373732": 47, "4818356": 47, "1455178": 47, "4929404": 47, "6422923": 47, "718459": 47, "7090344": 47, "988714": 47, "3391027": 47, "875052": 47, "6461415": 47, "9349675": 47, "327398": 47, "298973": 47, "3905785": 47, "1704025": 47, "9154005": 47, "6926193": 47, "9677248": 47, "481086": 47, "9746864": 47, "8941312": 47, "3221133": 47, "5479512": 47, "197306": 47, "305706": 47, "9873173": 47, "5597944": 47, "250221": 47, "118212": 47, "8672705": 47, "949225": 47, "2636094": 47, "5300783": 47, "1403804": 47, "1729176": 47, "6628485": 47, "2607155": 47, "6342418": 47, "9381838": 47, "6761076": 47, "5063303": 47, "4718971": 47, "8880196": 47, "1658201": 47, "3787665": 47, "1193419": 47, "42261": 47, "318963": 47, "8809638": 47, "6514435": 47, "3549364": 47, "8598063": 47, "517385": 47, "9702091": 47, "9260886": 47, "3804817": 47, "381424": 47, "6027272": 47, "7787259": 47, "9631021": 47, "93901324": 47, "2134862": 47, "89942324": 47, "cv": 48, "concaten": 48, "loop": [48, 55], "hasn": 48, "reset": 48, "66fa74b6a2a0bb1e563ae8bce66496b118b95200": 48, "ipykernel": 48, "url": [48, 50], "token": [48, 50], "stable_diffussion_infer": 48, "stable_diffusion_infer": 48, "present": 48, "fr\u00e9chet": 48, "distanc": 48, "fid": 48, "outcom": 48, "a100": 48, "stable_diffusion_accuraci": 48, "load_ref_result": 48, "ref_result_dir": 48, "nv_result": 48, "img_arrays_for_acc": 
48, "81": [48, 51], "1146879196167": 48, "328223477737884": 48, "3dunet_itex": 49, "3dunet_itex_with_horovod": 49, "unet_3d_med": 49, "88eb3cff2f03dad85035621d041e23a14345999": 49, "nightli": 49, "dllogger": [49, 53], "brain": 49, "tumor": 49, "2019": 49, "upon": 49, "challeng": 49, "ipp": 49, "cbica": 49, "upenn": 49, "edu": 49, "nifti": 49, "volum": 49, "nibabel": 49, "preprocess_data": 49, "train_maskrcnn": [49, 53], "dataset_dir": [49, 53], "output_dir": [49, 51, 53], "exec_mod": 49, "warmup_step": 49, "150": 49, "max_step": 49, "log_everi": [49, 53], "dataset_path": 49, "mpirun": [49, 53, 54], "rank": [49, 53, 54], "ppn": [49, 53, 54], "tutori": 50, "pacakg": 50, "tensorflow_doc": 50, "classify_text_with_bert": 50, "ip": 50, "f502f0715979ec73c571ca5676ba58431b916f5f58ee3333": 50, "crash": 50, "tri": 50, "traceback": 50, "recent": 50, "174": 50, "__del__": 50, "typeerror": 50, "nonetyp": 50, "callabl": 50, "research": 51, "bert_large_dir": 51, "squad_dir": 51, "vocab_fil": 51, "vocab": 51, "bert_config_fil": 51, "bert_config": 51, "json": 51, "init_checkpoint": 51, "do_train": 51, "train_fil": 51, "do_predict": 51, "predict_fil": 51, "train_batch_s": [51, 53], "3e": 51, "num_train_epoch": 51, "max_seq_length": 51, "doc_strid": 51, "use_tpu": 51, "tpu_nam": 51, "produc": 51, "f1": 51, "41249612335034": 51, "exact_match": 51, "2488174077578": 51, "gin": 52, "raw": 52, "train_horovod": 52, "tensorflow2_keras_mnist": 52, "horovodrun": 52, "18": 52, "54": 52, "006950": 52, "custom_graph_optimizer_registri": 52, "163161": 52, "940695": 52, "107809": 52, "163517": 52, "250": 52, "yym": 52, "xxxx": [52, 54], "yyyi": 52, "zzzz": 52, "maskrcnn": 53, "c481324031ecf0f70f8939516c02e16cac60446d": 53, "opencv": 53, "headless": 53, "pybind11": 53, "cocoapi": 53, "egg": 53, "pycocotool": 53, "subdirectori": 53, "pythonapi": 53, "preprocess": 53, "coco": 53, "2017": 53, "download_and_preprocess_coco": 53, "resnet": [53, 54, 55], "download_weight": 53, "save_dir": 53, 
"pretrained_dir": 53, "seed": 53, "use_synthetic_data": [53, 55], "steps_per_epoch": 53, "log_warmup_step": 53, "lar": 54, "hvd_configur": 54, "hvd_support": 54, "tfd": 54, "trainer": 54, "snippet": 54, "readm": 54, "runner": 54, "ctl": 54, "wherea": 54, "classifier_train": 54, "builder": 54, "record": 54, "yaml": 54, "correspondli": 54, "dummi": 54, "itex_bf16_lar": 54, "itex_fp32_lar": 54, "itex_dummy_bf16_lar": 54, "itex_dummy_fp32_lar": 54, "pythonpath": 54, "config_fil": 54, "itex_xx": 54, "itex_bf16": 54, "itex_fp32": 54, "itex_dummy_bf16": 54, "itex_dummy_fp32": 54, "fi": 54, "vision": 54, "image_classif": [54, 55], "train_and_ev": 54, "model_typ": 54, "number_of_process": 54, "process_per_nod": 54, "i0203": 54, "006297": 54, "139660941027136": 54, "keras_util": [54, 55], "timehistori": [54, 55], "1900": 54, "2000": 54, "590331": 54, "2100": 54, "178206": 54, "2200": 54, "790128": 54, "2300": 54, "408512": 54, "2400": 54, "i0817": 54, "602742": 54, "139898862851904": 54, "600": 54, "603262": 54, "140612319840064": 54, "917546": 54, "800": 54, "917738": 54, "277716": 54, "277811": 54, "555174": 54, "1200": 54, "555221": 54, "accordingli": 55, "tf_num_interop_thread": 55, "tf_num_intraop_thread": 55, "resnet_ctl_imagenet_main": 55, "train_epoch": 55, "steps_per_loop": 55, "log_step": 55, "skip_ev": 55, "distribution_strategi": 55, "use_tf_while_loop": 55, "use_tf_funct": 55, "enable_xla": 55, "enable_tensorboard": 55, "enable_checkpoint_and_export": 55, "channels_last": 55, "single_l2_loss_op": 55, "follw": 55, "use_itex_shard": 55, "pramet": 55, "suggest": 55, "2x256x10": 55, "5120": 55, "itex_enable_multiple_stream": 55, "queue": 55, "resnet50_itex": 55, "tfg_optimizer_hook": 55, "289": 55, "i0324": 55, "594147": 55, "140348344015936": 55, "597360": 55, "479": 55, "sec": 55, "train_accuraci": 55, "train_loss": 55, "634554": 55, "161625": 55, "163815": 55, "790632": 55, "792936": 55, "103148": 55, "25": 55, "416651": 55, "419072": 55, "3359284": 55, "025180": 
55, "027671": 55, "3343554": 55, "aim": 56, "flexibli": 56, "diagram": 56, "summari": 56, "ecosystem": 56, "estim": 56, "manag": 56, "dockerhub": 56, "come": 56, "soon": 56, "visit": 56, "tour": 56, "collabor": 56, "adher": 56, "innov": 56, "jax": 56, "vulner": 56, "apach": 56, "govern": 56, "forth": 56}, "objects": {}, "objtypes": {}, "objnames": {}, "titleterms": {"contributor": 0, "coven": 0, "code": [0, 7, 17, 19, 34, 35, 45, 47, 48, 49, 50, 51, 52, 53, 54, 55], "conduct": 0, "our": 0, "pledg": 0, "standard": 0, "enforc": 0, "respons": 0, "scope": 0, "guidelin": [0, 7], "1": [0, 11, 16, 31, 32, 35], "correct": 0, "2": [0, 11, 16, 31, 32, 35], "warn": 0, "3": [0, 11, 16, 32], "temporari": 0, "ban": 0, "4": [0, 11, 16, 32], "perman": 0, "attribut": [0, 18], "secur": [1, 56], "polici": [1, 27], "report": 1, "vulner": 1, "intel": [2, 3, 4, 6, 7, 23, 29, 30, 31, 32, 34, 35, 36, 37, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 57], "extens": [2, 3, 4, 6, 7, 10, 23, 30, 31, 32, 34, 35, 36, 37, 40, 46, 57], "tensorflow": [2, 3, 4, 6, 7, 18, 19, 21, 23, 30, 31, 32, 34, 35, 36, 37, 40, 46, 57], "docker": [2, 3, 31, 36, 37, 42, 44], "contain": [2, 3, 36, 37, 42, 44], "guid": [2, 3, 5, 7, 28, 29, 38, 41, 44], "descript": [2, 3], "binari": [2, 3, 56], "prepar": [2, 3, 35, 39, 41, 42, 43, 45, 48, 49, 50, 51, 52, 53, 54, 55], "usag": [2, 15, 17, 18, 19, 22, 26, 28], "i": [2, 3, 28], "custom": [2, 11, 19, 23, 25, 27], "build": [2, 3, 5, 11, 14, 16, 27, 31, 34, 35, 36, 37], "script": [2, 28, 41], "ii": [2, 3, 28], "iii": [2, 28], "run": [2, 3, 16, 31, 32, 35, 39, 40, 41, 42, 43, 44, 45, 46, 48, 49, 50, 51, 52, 53, 54, 55], "verifi": [2, 11, 32, 36, 37], "That": 2, "gpu": [2, 16, 17, 21, 22, 29, 32, 34, 35, 37, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56], "access": [2, 29], "from": [2, 14, 31, 35, 36, 37], "serv": [3, 21, 31], "imag": [3, 31, 49], "welcom": [4, 6, 57], "document": [4, 5, 6, 7, 56, 57], "highlight": 4, "onlin": 5, 
"introduct": [5, 13, 23, 40, 42, 44, 45, 46, 48, 49, 50, 51, 53, 54, 55], "updat": 5, "latest": 5, "version": [5, 30, 46], "creat": [5, 34, 35, 52], "releas": [5, 8, 32], "local": [5, 40, 46], "test": [5, 7, 42], "contribut": [7, 56], "develop": 7, "tip": [7, 19], "debug": 7, "unit": 7, "python": [7, 11, 17, 18, 20, 21, 30, 35, 42, 43, 48, 55], "style": 7, "c": [7, 31, 35], "bazel": [7, 34, 35], "known": 8, "issu": 8, "incompat": 8, "chang": [8, 45, 48, 49, 51, 53, 54], "directori": 9, "tree": 9, "structur": [9, 17], "design": [10, 12, 28], "workflow": [10, 15, 17], "resourc": [10, 56], "how": [11, 27], "write": 11, "op": [11, 25, 30], "prerequisit": [11, 30, 43, 45, 48, 49, 50, 51, 53, 54, 55], "defin": 11, "interfac": 11, "regist": 11, "kernel": 11, "implement": [11, 24], "6": 11, "add": 11, "7": 11, "us": [11, 21, 28, 31, 54], "8": 11, "packag": [11, 35, 37, 55], "9": 11, "instal": [11, 16, 31, 32, 33, 34, 35, 36, 37, 38, 47, 52, 55, 56], "optim": [12, 13, 19, 21, 24, 52], "onednn": [13, 46], "object": 13, "cach": 13, "convolut": 13, "frequent": 14, "ask": 14, "question": 14, "troubleshoot": 14, "sourc": [14, 31, 34, 35], "runtim": 14, "int8": [15, 21], "quantiz": [15, 21, 40, 46], "overview": [15, 16, 17, 19, 20, 27, 28, 29, 30, 34], "openxla": [16, 21], "support": [16, 21, 56], "via": [16, 20, 32, 36, 37, 42], "pjrt": 16, "hardwar": [16, 27, 29, 32, 34, 35, 36, 37, 40, 42, 45, 46, 48, 49, 50, 51, 53, 54, 55, 56], "softwar": [16, 29, 32, 36, 37, 56], "requir": [16, 32, 34, 35, 36, 37, 42, 45, 48, 49, 50, 51, 53, 54, 55, 56], "driver": [16, 32, 34, 35, 37, 41], "librari": [16, 31, 35], "jax": 16, "exampl": [16, 17, 18, 19, 22, 28, 34, 35, 39, 43, 45, 47, 48, 49, 51, 52, 53, 54, 55], "xpuautoshard": [17, 21, 55], "experiment": [17, 21, 32], "api": [17, 18, 20, 21, 23, 30, 42, 43, 48, 55], "dump": 17, "graph": [17, 19, 21, 24, 30, 46], "tune": [18, 19, 51], "advanc": [18, 19, 21, 23, 28, 42, 46], "auto": [18, 19, 20, 21], "mix": [18, 19, 20, 21, 24, 27, 42], 
"precis": [18, 19, 20, 21, 27, 42], "background": [18, 40, 46], "numer": 18, "stabil": 18, "configur": [18, 20, 29, 34, 35, 42, 46], "list": 18, "rule": 18, "improv": 18, "perform": [18, 42], "environ": [18, 20, 28, 30, 32, 33, 34, 35, 36, 37, 40, 41, 42, 43, 45, 46, 48, 49, 50, 51, 52, 53, 54, 55], "variabl": [18, 20, 28, 30, 32, 37, 42], "differ": [18, 27], "stock": [18, 19], "end": 18, "mobilenet": 18, "amp": [19, 21, 42], "v": [19, 28], "data": [19, 24], "type": [19, 24, 27], "featur": [19, 21, 23], "manual": 19, "quick": [19, 44, 47, 56], "train": [19, 27, 44, 49, 50, 52, 53, 54, 55], "setup": [19, 27, 32, 37, 41, 42, 43, 45, 48, 49, 50, 51, 52, 53, 54, 55], "enabl": [19, 41, 42, 43, 45, 46, 48, 49, 50, 51, 52, 53, 54, 55], "origin": 19, "notic": 19, "log": [19, 28], "save": 19, "oper": [19, 21, 25, 26, 30], "itex_verbos": 20, "level": 20, "definit": 20, "backend": 20, "config": [20, 30], "protocol": [20, 30], "option": [20, 32, 35], "eas": 21, "profil": [21, 22], "cpu": [21, 29, 34, 35, 36, 37, 42, 43, 46, 47, 48, 50, 54, 56], "launcher": 21, "faq": [22, 42, 43, 45, 48, 49, 50, 51, 53], "infrastructur": 23, "architectur": 23, "public": 23, "manag": 23, "xpu": [23, 34, 35, 37, 56], "engin": 23, "fusion": 24, "basic": [24, 28], "detail": 24, "gener": 24, "layout": [24, 29], "itex": [25, 30], "adamwithweightdecayoptim": 25, "layernorm": 25, "groupnorm": 25, "gelu": [25, 26], "itexlstm": 25, "overrid": [26, 30], "layer": 26, "normal": 26, "dens": 26, "activ": 26, "instanc": [26, 28], "lstm": 26, "kera": 27, "identifi": 27, "set": [27, 28, 40, 54, 55], "dtype": 27, "model": [27, 31, 42, 44, 45, 48, 49, 51, 53, 54], "fit": 27, "loss": 27, "scale": 27, "underflow": 27, "overflow": 27, "loop": 27, "launch": 28, "user": 28, "common": [28, 34, 35, 41], "execut": [28, 40, 42, 43, 45, 48, 49, 50, 51, 52, 53, 54, 55], "mode": 28, "latenc": 28, "throughput": 28, "multi": [28, 49], "numa": [28, 29], "control": 28, "memori": [28, 29], "alloc": [28, 29], "singl": [28, 49], 
"infer": [28, 42, 43, 44, 48], "all": 28, "physic": 28, "core": 28, "includ": 28, "logic": 28, "one": 28, "node": 28, "iv": 28, "your": 28, "number": 28, "multipl": 28, "vi": 28, "vii": 28, "viii": 28, "index": 28, "ix": 28, "tf_num_intraop_thread": 28, "x": 28, "tf_num_interop_thread": 28, "tcmalloc": [28, 29], "jemalloc": 28, "default": 28, "practic": 29, "tabl": [29, 56], "content": 29, "non": 29, "uniform": 29, "format": 29, "numactl": 29, "openmp": 29, "omp_num_thread": 29, "gnu": 29, "import": 30, "intel_extension_for_tensorflow": 30, "name": 30, "preserv": 30, "configproto": 30, "gpuoption": 30, "graphopt": 30, "automixedprecisionopt": 30, "shardingconfig": 30, "debugopt": 30, "set_config": 30, "get_config": 30, "server": [31, 40, 46], "dockerfil": [31, 36, 37], "sampl": 31, "arc": 32, "A": 32, "seri": 32, "window": 32, "subsystem": 32, "linux": 32, "wsl2": 32, "nativ": 32, "directli": 32, "step": [32, 33, 42, 43, 48, 50, 54], "By": 32, "instruct": [32, 33], "ubuntu": 32, "pypi": [32, 34, 36, 37], "wheel": [32, 36, 37], "virtual": [32, 36, 37, 41, 52], "system": [32, 36, 37], "full": 32, "oneapi": [32, 34, 35, 37, 41, 52], "conda": [33, 34, 35], "precondit": 33, "download": [34, 35, 42, 50, 52], "extra": [34, 35], "onli": [34, 35, 37], "base": [34, 35, 37, 40, 41], "toolkit": [34, 35, 37, 41], "For": [34, 35], "addit": 34, "cc": 35, "header": 35, "file": 35, "extract": 35, "recommend": 35, "integr": 35, "linker": 35, "load": 35, "get": [36, 37, 56], "dockerhub": [36, 37], "bare": [36, 37, 42, 44], "metal": [36, 37, 42, 44], "check": [37, 46], "platform": 37, "acceler": [40, 44, 45, 49, 53, 55], "alexnet": 40, "devcloud": [40, 46], "up": [40, 42], "speed": 42, "incept": [42, 46], "v4": 42, "automat": 42, "skip": [42, 43, 48, 50, 54], "thi": [42, 43, 48, 50, 54], "clone": [42, 52], "repositori": 42, "pretrain": [42, 45], "compar": 42, "fp32": [42, 48], "result": 42, "method": 42, "resnet50": [43, 54, 55], "output": [43, 47, 48, 52, 54, 55], "deep": [44, 46], 
"learn": [44, 46], "zoo": 44, "workload": 44, "start": [44, 56], "bert": [45, 50, 51], "larg": [45, 51], "dataset": [45, 49, 53, 54], "command": [45, 52, 54, 55], "finetun": 45, "v3": 46, "xeon": 46, "disabl": 46, "constant": 46, "fold": 46, "function": 46, "boost": 46, "matrix": 46, "startup": [46, 50], "jupyt": [46, 48, 50], "notebook": [46, 48, 50], "licens": [46, 56], "quick_exampl": 47, "py": 47, "note": 47, "stabl": 48, "diffus": 48, "text2imag": 48, "fp16": 48, "accuraci": [48, 51], "3d": 49, "unet": 49, "w": [49, 53], "o": [49, 53], "horovod": [49, 52, 53, 54], "medic": 49, "segment": 49, "tile": 49, "classifi": [50, 51], "text": [50, 51], "fp8": 51, "fine": 51, "bf16": 51, "distribut": 52, "depend": 52, "repo": 52, "patch": [52, 54], "appli": [52, 54], "devic": 52, "count": 52, "mask": 53, "r": 53, "cnn": 53, "If": 54, "imagenet": 54, "paramet": [54, 55], "without": [54, 55], "hvd": 54, "other": 55, "pythonpath": 55, "With": 55, "shard": 55, "further": 55, "channel": 56, "compat": 56, "weekli": 56}, "envversion": {"sphinx.domains.c": 2, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 8, "sphinx.domains.index": 1, "sphinx.domains.javascript": 2, "sphinx.domains.math": 2, "sphinx.domains.python": 3, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx": 57}, "alltitles": {"Contributor Covenant Code of Conduct": [[0, "contributor-covenant-code-of-conduct"]], "Our Pledge": [[0, "our-pledge"]], "Our Standards": [[0, "our-standards"]], "Enforcement Responsibilities": [[0, "enforcement-responsibilities"]], "Scope": [[0, "scope"]], "Enforcement": [[0, "enforcement"]], "Enforcement Guidelines": [[0, "enforcement-guidelines"]], "1. Correction": [[0, "correction"]], "2. Warning": [[0, "warning"]], "3. Temporary Ban": [[0, "temporary-ban"]], "4. 
Permanent Ban": [[0, "permanent-ban"]], "Attribution": [[0, "attribution"]], "Security Policy": [[1, "security-policy"]], "Report a Vulnerability": [[1, "report-a-vulnerability"]], "Intel\u00ae Extension for TensorFlow* Docker Container Guide": [[2, "intel-extension-for-tensorflow-docker-container-guide"]], "Description": [[2, "description"], [3, "description"]], "Binaries Preparation": [[2, "binaries-preparation"]], "Usage of Docker Container": [[2, "usage-of-docker-container"]], "I. Customize Build Script": [[2, "i-customize-build-script"]], "II. Build the Container": [[2, "ii-build-the-container"], [3, "ii-build-the-container"]], "III. Running the Container": [[2, "iii-running-the-container"]], "Verify That Intel GPU is Accessible From TensorFlow": [[2, "verify-that-intel-gpu-is-accessible-from-tensorflow"]], "Intel\u00ae Extension for TensorFlow* Serving - Docker Container Guide": [[3, "intel-extension-for-tensorflow-serving-docker-container-guide"]], "Build the Docker Image": [[3, "build-the-docker-image"]], "I. 
Binaries Preparation": [[3, "i-binaries-preparation"]], "Running the Container": [[3, "running-the-container"]], "Welcome to Intel\u00ae Extension for TensorFlow* documentation": [[4, "welcome-to-intel-extension-for-tensorflow-documentation"]], "Documentation": [[4, "documentation"], [56, "documentation"]], "Highlights": [[4, "highlights"]], "Online Documentation Build Guide": [[5, "online-documentation-build-guide"]], "Introduction": [[5, "introduction"], [13, "introduction"], [23, "introduction"], [40, "introduction"], [42, "introduction"], [44, "introduction"], [45, "introduction"], [46, "introduction"], [48, "introduction"], [49, "introduction"], [50, "introduction"], [51, "introduction"], [53, "introduction"], [54, "introduction"], [55, "introduction"]], "Update latest Version": [[5, "update-latest-version"]], "Create Release Version": [[5, "create-release-version"]], "Build to Local Test": [[5, "build-to-local-test"]], "Welcome to Intel \u00ae Extension for TensorFlow* documentation!": [[6, "welcome-to-intel-extension-for-tensorflow-documentation"], [57, "welcome-to-intel-extension-for-tensorflow-documentation"]], "Contributing guidelines": [[7, "contributing-guidelines"]], "Contributing to Intel\u00ae Extension for TensorFlow*": [[7, "contributing-to-intel-extension-for-tensorflow"]], "Developing Intel\u00ae Extension for TensorFlow*": [[7, "developing-intel-extension-for-tensorflow"]], "Tips and Debugging": [[7, "tips-and-debugging"]], "Unit testing": [[7, "unit-testing"]], "Python Unit Testing": [[7, "python-unit-testing"]], "Code style guide": [[7, "code-style-guide"]], "Python coding style": [[7, "python-coding-style"]], "C++ coding style": [[7, "c-coding-style"]], "bazel style guide": [[7, "bazel-style-guide"]], "Documentation style guide": [[7, "documentation-style-guide"]], "Releases": [[8, "releases"]], "Known Issues": [[8, "known-issues"]], "Incompatible Changes": [[8, "incompatible-changes"]], "Directory Tree Structure": [[9, 
"directory-tree-structure"]], "Extension Design": [[10, "extension-design"]], "Workflow": [[10, "workflow"], [15, "workflow"], [17, "workflow"]], "Resources": [[10, "resources"], [56, "resources"]], "How to write custom op": [[11, "how-to-write-custom-op"]], "1. Prerequisite": [[11, "prerequisite"]], "2. Define the op interface and Register op": [[11, "define-the-op-interface-and-register-op"]], "3. Register the kernels for the op": [[11, "register-the-kernels-for-the-op"]], "4. Implement the kernels": [[11, "implement-the-kernels"]], "6. Add the op to BUILD": [[11, "add-the-op-to-build"]], "7. Use the op in Python": [[11, "use-the-op-in-python"]], "8. Build the package": [[11, "build-the-package"]], "9. Install and Verify": [[11, "install-and-verify"]], "Optimizations Design": [[12, "optimizations-design"]], "oneDNN object cache optimization": [[13, "onednn-object-cache-optimization"]], "Optimization in convolution": [[13, "optimization-in-convolution"]], "Frequently Asked Questions": [[14, "frequently-asked-questions"]], "Troubleshooting": [[14, "troubleshooting"]], "Build from source": [[14, "build-from-source"], [31, "build-from-source"]], "Runtime": [[14, "runtime"]], "INT8 Quantization": [[15, "int8-quantization"], [21, "int8-quantization"]], "Overview": [[15, "overview"], [17, "overview"], [19, "overview"], [20, "overview"], [27, "overview"], [28, "overview"], [29, "overview"], [30, "overview"], [34, "overview"]], "Usage": [[15, "usage"], [17, "usage"], [18, "usage"], [18, "id1"], [19, "usage"], [22, "usage"], [26, "usage"]], "OpenXLA Support on GPU via PJRT": [[16, "openxla-support-on-gpu-via-pjrt"]], "1. Overview": [[16, "overview"]], "2. 
Hardware and Software Requirement": [[16, "hardware-and-software-requirement"]], "Hardware Requirements": [[16, "hardware-requirements"], [32, "hardware-requirements"], [34, "hardware-requirements"], [35, "hardware-requirements"], [36, "hardware-requirements"], [37, "hardware-requirements"], [45, "hardware-requirements"], [48, "hardware-requirements"], [49, "hardware-requirements"], [50, "hardware-requirements"], [51, "hardware-requirements"], [53, "hardware-requirements"], [54, "hardware-requirements"], [55, "hardware-requirements"]], "Software Requirements": [[16, "software-requirements"], [32, "software-requirements"], [36, "software-requirements"], [37, "software-requirements"]], "Install GPU Drivers": [[16, "install-gpu-drivers"], [37, "install-gpu-drivers"]], "3. Build Library for JAX": [[16, "build-library-for-jax"]], "4. Run JAX Example": [[16, "run-jax-example"]], "XPUAutoShard on GPU [Experimental]": [[17, "xpuautoshard-on-gpu-experimental"], [21, "xpuautoshard-on-gpu-experimental"]], "Code Structure": [[17, "code-structure"]], "Python API": [[17, "python-api"], [18, "python-api"], [42, "python-api"], [55, "python-api"]], "Dump the graph": [[17, "dump-the-graph"]], "Examples": [[17, "examples"], [28, "examples"], [39, "examples"], [39, "id1"]], "Tune Advanced Auto Mixed Precision": [[18, "tune-advanced-auto-mixed-precision"]], "Background": [[18, "background"], [40, "background"], [46, "background"]], "Numeric Stability": [[18, "numeric-stability"]], "Configuration List": [[18, "configuration-list"]], "Example of Mix Precision by List": [[18, "example-of-mix-precision-by-list"]], "Rule to Improve Performance by the Configuration List": [[18, "rule-to-improve-performance-by-the-configuration-list"]], "Python API Attribute & Environment Variable": [[18, "python-api-attribute-environment-variable"]], "Environment Variable Difference with Stock TensorFlow": [[18, "environment-variable-difference-with-stock-tensorflow"]], "Example": [[18, "example"], [19, 
"example"], [35, "example"]], "End-to-end Example": [[18, "end-to-end-example"]], "Tuning Performance Example on MobileNet": [[18, "tuning-performance-example-on-mobilenet"]], "Advanced Auto Mixed Precision": [[19, "advanced-auto-mixed-precision"], [19, "id1"]], "Advanced AMP vs. Stock TensorFlow AMP": [[19, "advanced-amp-vs-stock-tensorflow-amp"]], "Data Type": [[19, "data-type"]], "Graph Optimizer": [[19, "graph-optimizer"]], "Feature": [[19, "feature"]], "Tune Advanced AMP Manually": [[19, "tune-advanced-amp-manually"]], "Quick Training Example": [[19, "quick-training-example"]], "Setup": [[19, "setup"], [27, "setup"]], "Enable Advanced AMP": [[19, "enable-advanced-amp"]], "Original Code": [[19, "original-code"]], "Notice": [[19, "notice"]], "Tips": [[19, "tips"]], "Log and Save Optimized Graph": [[19, "log-and-save-optimized-graph"]], "Custom Operation": [[19, "custom-operation"]], "Environment Variables": [[20, "environment-variables"], [28, "environment-variables"]], "Configuration via Environment Variables": [[20, "configuration-via-environment-variables"]], "ITEX_VERBOSE level definition": [[20, "itex-verbose-level-definition"]], "Environment Variables with Python APIs": [[20, "environment-variables-with-python-apis"]], "Backend and Config Protocol": [[20, "backend-and-config-protocol"]], "Auto Mixed Precision Options": [[20, "auto-mixed-precision-options"]], "Features": [[21, "features"]], "Operator Optimization": [[21, "operator-optimization"]], "Graph Optimization": [[21, "graph-optimization"]], "Advanced Auto Mixed Precision (AMP)": [[21, "advanced-auto-mixed-precision-amp"]], "Ease-of-use Python API": [[21, "ease-of-use-python-api"]], "GPU Profiler": [[21, "gpu-profiler"], [22, "gpu-profiler"]], "CPU Launcher [Experimental]": [[21, "cpu-launcher-experimental"]], "OpenXLA Support on GPU [Experimental]": [[21, "openxla-support-on-gpu-experimental"]], "TensorFlow Serving": [[21, "tensorflow-serving"]], "Example:": [[22, "example"]], "FAQ": [[22, "faq"], 
[42, "faq"], [43, "faq"], [45, "faq"], [48, "faq"], [49, "faq"], [50, "faq"], [51, "faq"], [53, "faq"]], "Infrastructure": [[23, "infrastructure"]], "Architecture": [[23, "architecture"]], "TensorFlow Public API": [[23, "tensorflow-public-api"]], "Custom API": [[23, "custom-api"]], "Intel Advanced Feature and Extension Management": [[23, "intel-advanced-feature-and-extension-management"]], "XPU Engine": [[23, "xpu-engine"]], "Graph fusion": [[24, "graph-fusion"]], "Basic fusion": [[24, "basic-fusion"]], "Mixed data type fusion": [[24, "mixed-data-type-fusion"]], "Implementation Details": [[24, "implementation-details"]], "Generic layout optimizer": [[24, "generic-layout-optimizer"]], "Customized Operators": [[25, "customized-operators"]], "itex.ops.AdamWithWeightDecayOptimizer": [[25, "itex-ops-adamwithweightdecayoptimizer"]], "itex.ops.LayerNormalization": [[25, "itex-ops-layernormalization"]], "itex.ops.GroupNormalization": [[25, "itex-ops-groupnormalization"]], "itex.ops.gelu": [[25, "itex-ops-gelu"]], "itex.ops.ItexLSTM": [[25, "itex-ops-itexlstm"]], "Operators Override": [[26, "operators-override"]], "Layer Normalization": [[26, "layer-normalization"]], "Dense Layer": [[26, "dense-layer"]], "Gelu Activation": [[26, "gelu-activation"]], "Instance Normalization": [[26, "instance-normalization"]], "LSTM": [[26, "lstm"]], "Keras Mixed Precision": [[27, "keras-mixed-precision"]], "How to identify different hardware types?": [[27, "how-to-identify-different-hardware-types"]], "Setting the dtype policy": [[27, "setting-the-dtype-policy"]], "Building the model": [[27, "building-the-model"]], "Training the model with Model.fit": [[27, "training-the-model-with-model-fit"]], "Loss scaling": [[27, "loss-scaling"]], "Underflow and Overflow": [[27, "underflow-and-overflow"]], "Loss scaling overview": [[27, "loss-scaling-overview"]], "Training the model with a custom training loop": [[27, "training-the-model-with-a-custom-training-loop"]], "Launch Script User Guide": [[28, 
"launch-script-user-guide"]], "Common Execution Mode": [[28, "common-execution-mode"]], "Latency mode": [[28, "latency-mode"]], "Throughput mode": [[28, "throughput-mode"]], "Basic Settings": [[28, "basic-settings"]], "Launch Log": [[28, "launch-log"]], "Advanced Settings": [[28, "advanced-settings"]], "Multi-instance": [[28, "multi-instance"]], "NUMA Control": [[28, "numa-control"]], "Memory Allocator": [[28, "memory-allocator"], [29, "memory-allocator"]], "Single instance for inference": [[28, "single-instance-for-inference"]], "I. Use all physical cores": [[28, "i-use-all-physical-cores"]], "II. Use all cores including logical cores": [[28, "ii-use-all-cores-including-logical-cores"]], "III. Use physical cores on one node": [[28, "iii-use-physical-cores-on-one-node"]], "IV. Use your designated number of cores": [[28, "iv-use-your-designated-number-of-cores"]], "Multiple instances for inference": [[28, "multiple-instances-for-inference"]], "V. Throughput mode": [[28, "v-throughput-mode"]], "VI. Latency mode": [[28, "vi-latency-mode"]], "VII. Your designated number of instances": [[28, "vii-your-designated-number-of-instances"]], "VIII. Your designated number of instances and instance index": [[28, "viii-your-designated-number-of-instances-and-instance-index"]], "Set environment variables for inference": [[28, "set-environment-variables-for-inference"]], "IX. Set environment variable TF_NUM_INTRAOP_THREADS": [[28, "ix-set-environment-variable-tf-num-intraop-threads"]], "X. 
Set environment variable TF_NUM_INTEROP_THREADS": [[28, "x-set-environment-variable-tf-num-interop-threads"]], "Usage of TCMalloc/Jemalloc/Default memory allocator": [[28, "usage-of-tcmalloc-jemalloc-default-memory-allocator"]], "Jemalloc": [[28, "jemalloc"]], "TCMalloc": [[28, "tcmalloc"], [29, "tcmalloc"]], "Default memory allocator": [[28, "default-memory-allocator"]], "Practice Guide": [[29, "practice-guide"]], "Table of Contents": [[29, "table-of-contents"]], "CPU Practice Guide": [[29, "cpu-practice-guide"]], "Hardware Configuration": [[29, "hardware-configuration"]], "Non-Uniform Memory Access (NUMA)": [[29, "non-uniform-memory-access-numa"]], "Software Configuration": [[29, "software-configuration"]], "Memory Layout format": [[29, "memory-layout-format"]], "Numactl": [[29, "numactl"]], "OpenMP": [[29, "openmp"]], "OMP_NUM_THREADS": [[29, "omp-num-threads"]], "GNU OpenMP": [[29, "gnu-openmp"]], "Intel OpenMP": [[29, "intel-openmp"]], "GPU Practice Guide": [[29, "gpu-practice-guide"]], "Python APIs": [[30, "python-apis"]], "Prerequisite: import intel_extension_for_tensorflow as itex": [[30, "prerequisite-import-intel-extension-for-tensorflow-as-itex"]], "Python APIs and Environment Variable Names": [[30, "python-apis-and-environment-variable-names"]], "Python APIs and preserved environment variable Names": [[30, "python-apis-and-preserved-environment-variable-names"]], "Intel\u00ae Extension for TensorFlow* Config Protocol": [[30, "intel-extension-for-tensorflow-config-protocol"]], "itex.ConfigProto": [[30, "itex-configproto"]], "itex.GPUOptions": [[30, "itex-gpuoptions"]], "itex.GraphOptions": [[30, "itex-graphoptions"]], "itex.AutoMixedPrecisionOptions": [[30, "itex-automixedprecisionoptions"]], "itex.ShardingConfig": [[30, "itex-shardingconfig"]], "itex.DebugOptions": [[30, "itex-debugoptions"]], "itex.set_config": [[30, "itex-set-config"]], "itex.get_config": [[30, "itex-get-config"]], "itex operators": [[30, "itex-operators"]], "itex ops override": [[30, 
"itex-ops-override"]], "itex graph": [[30, "itex-graph"]], "itex version": [[30, "itex-version"]], "Install TensorFlow Serving with Intel\u00ae Extension for TensorFlow*": [[31, "install-tensorflow-serving-with-intel-extension-for-tensorflow"]], "Install Model Server": [[31, "install-model-server"]], "Install using Docker": [[31, "install-using-docker"]], "1. Build Intel\u00ae Extension for TensorFlow* C++ library": [[31, "build-intel-extension-for-tensorflow-c-library"]], "2. Build TensorFlow Serving": [[31, "build-tensorflow-serving"]], "Build Docker image from Dockerfile": [[31, "build-docker-image-from-dockerfile"]], "Run sample": [[31, "run-sample"]], "Experimental: Intel\u00ae Arc\u2122 A-Series GPU Software Installation": [[32, "experimental-intel-arc-a-series-gpu-software-installation"]], "Experimental Release": [[32, "experimental-release"]], "Windows Subsystem for Linux 2 (WSL2)": [[32, "windows-subsystem-for-linux-2-wsl2"], [32, "id1"]], "Native Linux Running Directly on Hardware": [[32, "native-linux-running-directly-on-hardware"], [32, "id2"]], "Step-By-Step Instructions": [[32, "step-by-step-instructions"]], "1. Install GPU Drivers": [[32, "install-gpu-drivers"]], "Windows GPU Drivers": [[32, "windows-gpu-drivers"]], "Ubuntu Linux Installed in WSL2": [[32, "ubuntu-linux-installed-in-wsl2"]], "2. Install TensorFlow* via PyPI Wheel in Linux": [[32, "install-tensorflow-via-pypi-wheel-in-linux"]], "Install TensorFlow": [[32, "install-tensorflow"], [34, "install-tensorflow"], [35, "install-tensorflow"], [36, "install-tensorflow"], [37, "install-tensorflow"]], "Virtual environment install": [[32, "virtual-environment-install"], [36, "virtual-environment-install"], [37, "virtual-environment-install"]], "System environment install": [[32, "system-environment-install"], [36, "system-environment-install"], [37, "system-environment-install"]], "3. Install Intel\u00ae Extension for TensorFlow*": [[32, "install-intel-extension-for-tensorflow"]], "4. 
Verify the Installation": [[32, "verify-the-installation"]], "Optional: Install Full Intel\u00ae oneAPI": [[32, "optional-install-full-intel-oneapi"]], "Setup environment variables": [[32, "setup-environment-variables"], [37, "setup-environment-variables"]], "Conda Environment Installation Instructions": [[33, "conda-environment-installation-instructions"]], "Preconditions": [[33, "preconditions"]], "Step by step instructions:": [[33, "step-by-step-instructions"]], "Requirements": [[34, "requirements"], [35, "requirements"]], "Common Requirements": [[34, "common-requirements"], [35, "common-requirements"]], "Install Bazel": [[34, "install-bazel"], [35, "install-bazel"]], "Download Source Code": [[34, "download-source-code"], [35, "download-source-code"]], "Create a Conda Environment": [[34, "create-a-conda-environment"], [35, "create-a-conda-environment"]], "Extra Requirements for XPU/GPU Build Only": [[34, "extra-requirements-for-xpu-gpu-build-only"], [35, "extra-requirements-for-xpu-gpu-build-only"]], "Install Intel GPU Driver": [[34, "install-intel-gpu-driver"], [35, "install-intel-gpu-driver"]], "Install oneAPI Base Toolkit": [[34, "install-oneapi-base-toolkit"], [35, "install-oneapi-base-toolkit"]], "Build Intel\u00ae Extension for TensorFlow* PyPI": [[34, "build-intel-extension-for-tensorflow-pypi"]], "Configure": [[34, "configure"], [35, "configure"]], "Configure For CPU": [[34, "configure-for-cpu"], [35, "configure-for-cpu"]], "Configure For GPU/XPU": [[34, "configure-for-gpu-xpu"]], "Build Source Code": [[34, "build-source-code"], [35, "build-source-code"]], "Additional": [[34, "additional"]], "Configure Example for CPU": [[34, "configure-example-for-cpu"]], "Configure Example For GPU or XPU": [[34, "configure-example-for-gpu-or-xpu"]], "Intel\u00ae Extension for TensorFlow* for C++": [[35, "intel-extension-for-tensorflow-for-c"]], "Build Intel\u00ae Extension for TensorFlow* CC library": [[35, "build-intel-extension-for-tensorflow-cc-library"]], 
"Configure For GPU": [[35, "configure-for-gpu"]], "Prepare Tensorflow* CC library and header files": [[35, "prepare-tensorflow-cc-library-and-header-files"]], "Option 1: Extract from Tensorflow* python package (Recommended)": [[35, "option-1-extract-from-tensorflow-python-package-recommended"]], "Option 2: Build from TensorFlow* source code": [[35, "option-2-build-from-tensorflow-source-code"]], "Integrate the CC library": [[35, "integrate-the-cc-library"]], "Linker": [[35, "linker"]], "Load": [[35, "load"]], "Build and run": [[35, "build-and-run"]], "Intel CPU Software Installation": [[36, "intel-cpu-software-installation"]], "Install via Docker container": [[36, "install-via-docker-container"], [37, "install-via-docker-container"]], "Build Docker container from Dockerfile": [[36, "build-docker-container-from-dockerfile"], [37, "build-docker-container-from-dockerfile"]], "Get docker container from dockerhub": [[36, "get-docker-container-from-dockerhub"], [37, "get-docker-container-from-dockerhub"]], "Install via PyPI wheel in bare metal": [[36, "install-via-pypi-wheel-in-bare-metal"], [37, "install-via-pypi-wheel-in-bare-metal"]], "Install Intel\u00ae Extension for TensorFlow*": [[36, "install-intel-extension-for-tensorflow"], [37, "install-intel-extension-for-tensorflow"]], "Verify the Installation": [[36, "verify-the-installation"], [37, "verify-the-installation"]], "Intel XPU Software Installation": [[37, "intel-xpu-software-installation"]], "Install oneAPI Base Toolkit Packages": [[37, "install-oneapi-base-toolkit-packages"]], "Check the Environment for XPU": [[37, "check-the-environment-for-xpu"]], "XPU for CPU only platform": [[37, "xpu-for-cpu-only-platform"]], "Installation Guide": [[38, "installation-guide"]], "Prepare for Running": [[39, "prepare-for-running"]], "Accelerate AlexNet by Quantization with Intel\u00ae Extension for Tensorflow*": [[40, "accelerate-alexnet-by-quantization-with-intel-extension-for-tensorflow"]], "Hardware Environment": [[40, 
"hardware-environment"], [46, "hardware-environment"]], "GPU": [[40, "gpu"], [46, "gpu"]], "Local Server": [[40, "local-server"], [46, "local-server"]], "Intel\u00ae DevCloud": [[40, "intel-devcloud"], [46, "intel-devcloud"]], "Running Environment": [[40, "running-environment"], [46, "running-environment"]], "Set up Base Running Environment": [[40, "set-up-base-running-environment"]], "Set up Intel\u00ae Extension for Tensorflow* for GPU": [[40, "set-up-intel-extension-for-tensorflow-for-gpu"]], "Execute": [[40, "execute"], [50, "execute"]], "Common Guide for Running": [[41, "common-guide-for-running"]], "Prepare": [[41, "prepare"]], "Intel GPU Driver": [[41, "intel-gpu-driver"]], "Intel\u00ae oneAPI Base Toolkit": [[41, "intel-oneapi-base-toolkit"]], "Setup Running Environment": [[41, "setup-running-environment"], [42, "setup-running-environment"], [43, "setup-running-environment"], [45, "setup-running-environment"], [48, "setup-running-environment"], [49, "setup-running-environment"], [50, "setup-running-environment"], [51, "setup-running-environment"], [52, "setup-running-environment"], [53, "setup-running-environment"], [54, "setup-running-environment"]], "Running": [[41, "running"]], "Enable oneAPI Running Environment": [[41, "enable-oneapi-running-environment"]], "Enable Virtual Running Environment": [[41, "enable-virtual-running-environment"]], "Run Script": [[41, "run-script"]], "Speed up Inference of Inception v4 by Advanced Automatic Mixed Precision on Intel CPU and GPU via Docker Container or Bare Metal": [[42, "speed-up-inference-of-inception-v4-by-advanced-automatic-mixed-precision-on-intel-cpu-and-gpu-via-docker-container-or-bare-metal"]], "Step": [[42, "step"]], "Hardware Requirement": [[42, "hardware-requirement"], [56, "hardware-requirement"]], "Prepare for GPU (Skip this Step for CPU)": [[42, "prepare-for-gpu-skip-this-step-for-cpu"]], "Clone the Repository": [[42, "clone-the-repository"]], "Download the Pretrained-model": [[42, 
"download-the-pretrained-model"]], "Enable Running Environment": [[42, "enable-running-environment"], [43, "enable-running-environment"], [45, "enable-running-environment"], [48, "enable-running-environment"], [49, "enable-running-environment"], [50, "enable-running-environment"], [51, "enable-running-environment"], [53, "enable-running-environment"], [54, "enable-running-environment"], [55, "enable-running-environment"]], "Execute Testing and Comparing the Performance of FP32 and Advanced AMP on CPU and GPU in Docker Container or Bare Metal": [[42, "execute-testing-and-comparing-the-performance-of-fp32-and-advanced-amp-on-cpu-and-gpu-in-docker-container-or-bare-metal"]], "Environment Variable Configuration": [[42, "environment-variable-configuration"]], "Result": [[42, "result"]], "Advanced: Enable Advanced AMP Method": [[42, "advanced-enable-advanced-amp-method"]], "ResNet50 Inference on Intel CPU and GPU": [[43, "resnet50-inference-on-intel-cpu-and-gpu"]], "Prerequisites": [[43, "prerequisites"], [45, "prerequisites"], [48, "prerequisites"], [49, "prerequisites"], [50, "prerequisites"], [51, "prerequisites"], [53, "prerequisites"], [54, "prerequisites"], [55, "prerequisites"]], "Prepare for GPU (Skip this step for CPU)": [[43, "prepare-for-gpu-skip-this-step-for-cpu"], [48, "prepare-for-gpu-skip-this-step-for-cpu"], [50, "prepare-for-gpu-skip-this-step-for-cpu"], [54, "prepare-for-gpu-skip-this-step-for-cpu"]], "Executes the Example with Python API": [[43, "executes-the-example-with-python-api"], [48, "executes-the-example-with-python-api"], [55, "executes-the-example-with-python-api"]], "Example Output": [[43, "example-output"], [47, "example-output"], [48, "example-output"], [55, "example-output"]], "Accelerate Deep Learning Training and Inference for Model Zoo Workloads on Intel GPU": [[44, "accelerate-deep-learning-training-and-inference-for-model-zoo-workloads-on-intel-gpu"]], "Quick Start Guide": [[44, "quick-start-guide"]], "Run Models in the Docker 
Container": [[44, "run-models-in-the-docker-container"]], "Run Models on Bare Metal": [[44, "run-models-on-bare-metal"]], "Accelerate BERT-Large Pretraining on Intel GPU": [[45, "accelerate-bert-large-pretraining-on-intel-gpu"]], "Model Code change": [[45, "model-code-change"], [48, "model-code-change"], [49, "model-code-change"], [51, "model-code-change"], [53, "model-code-change"], [54, "model-code-change"]], "Prepare for GPU": [[45, "prepare-for-gpu"], [49, "prepare-for-gpu"], [51, "prepare-for-gpu"], [53, "prepare-for-gpu"], [55, "prepare-for-gpu"]], "Prepare Dataset": [[45, "prepare-dataset"], [49, "prepare-dataset"], [53, "prepare-dataset"]], "Execute the Example": [[45, "execute-the-example"], [49, "execute-the-example"], [51, "execute-the-example"], [53, "execute-the-example"]], "Pretraining Command": [[45, "pretraining-command"]], "Finetune Command": [[45, "finetune-command"]], "Quantize Inception V3 by Intel\u00ae Extension for Tensorflow* on Intel\u00ae Xeon\u00ae": [[46, "quantize-inception-v3-by-intel-extension-for-tensorflow-on-intel-xeon"]], "Configuration": [[46, "configuration"]], "Intel\u00ae Extension for Tensorflow* Version": [[46, "intel-extension-for-tensorflow-version"]], "Enable oneDNN Graph": [[46, "enable-onednn-graph"]], "Disable Constant Folding Function": [[46, "disable-constant-folding-function"]], "CPU": [[46, "cpu"]], "Check Intel\u00ae Deep Learning Boost": [[46, "check-intel-deep-learning-boost"]], "Check Intel\u00ae Advanced Matrix Extensions": [[46, "check-intel-advanced-matrix-extensions"]], "Startup Jupyter Notebook": [[46, "startup-jupyter-notebook"], [50, "startup-jupyter-notebook"]], "License": [[46, "license"], [56, "license"]], "Quick Example on Intel CPU and GPU": [[47, "quick-example-on-intel-cpu-and-gpu"]], "Installation": [[47, "installation"]], "Code": [[47, "code"]], "quick_example.py": [[47, "quick-example-py"]], "Notes": [[47, "notes"]], "Stable Diffusion Inference for Text2Image on Intel GPU": [[48, 
"stable-diffusion-inference-for-text2image-on-intel-gpu"]], "Running the Jupyter Notebook": [[48, "running-the-jupyter-notebook"]], "FP32 Inference": [[48, "fp32-inference"]], "FP16 Inference": [[48, "fp16-inference"]], "Accuracy": [[48, "accuracy"], [51, "accuracy"]], "Accelerate 3D-Unet Training w/o horovod for medical image segmentation on Intel GPU": [[49, "accelerate-3d-unet-training-w-o-horovod-for-medical-image-segmentation-on-intel-gpu"]], "Single Tile": [[49, "single-tile"]], "Multi-tile with horovod": [[49, "multi-tile-with-horovod"]], "BERT Training for Classifying Text on Intel CPU and GPU": [[50, "bert-training-for-classifying-text-on-intel-cpu-and-gpu"]], "Download Jupyter Code:": [[50, "download-jupyter-code"]], "FP8 BERT-Large Fine-tuning for Classifying Text on Intel GPU": [[51, "fp8-bert-large-fine-tuning-for-classifying-text-on-intel-gpu"]], "BF16 + FP8 Fine-tuning": [[51, "bf16-fp8-fine-tuning"]], "Distributed Training Example with Intel\u00ae Optimization for Horovod* on Intel\u00ae GPU": [[52, "distributed-training-example-with-intel-optimization-for-horovod-on-intel-gpu"]], "Dependency": [[52, "dependency"]], "Create Virtual Environment": [[52, "create-virtual-environment"]], "Install": [[52, "install"], [56, "install"]], "Prepare Example Code": [[52, "prepare-example-code"]], "Clone Horovod Repo": [[52, "clone-horovod-repo"]], "Download Patch": [[52, "download-patch"]], "Apply Patch for Intel GPU": [[52, "apply-patch-for-intel-gpu"]], "Execution": [[52, "execution"], [54, "execution"]], "Enable oneAPI": [[52, "enable-oneapi"]], "Device Count": [[52, "device-count"]], "Running Command": [[52, "running-command"]], "Output": [[52, "output"]], "Accelerate Mask R-CNN Training w/o horovod on Intel GPU": [[53, "accelerate-mask-r-cnn-training-w-o-horovod-on-intel-gpu"]], "Resnet50 train on Intel GPU": [[54, "resnet50-train-on-intel-gpu"]], "Apply Patch": [[54, "apply-patch"]], "If not use Horovod": [[54, "if-not-use-horovod"]], "If use Horovod": 
[[54, "if-use-horovod"]], "Prepare ImageNet dataset": [[54, "prepare-imagenet-dataset"]], "Set Model Parameters": [[54, "set-model-parameters"]], "Command": [[54, "command"]], "Command with Horovod": [[54, "command-with-horovod"]], "Example Output without hvd": [[54, "example-output-without-hvd"]], "Example Output with hvd": [[54, "example-output-with-hvd"]], "Accelerate ResNet50 Training by XPUAutoShard on Intel GPU": [[55, "accelerate-resnet50-training-by-xpuautoshard-on-intel-gpu"]], "Prepare the Codes": [[55, "prepare-the-codes"]], "Install Other Required Packages": [[55, "install-other-required-packages"]], "Setup PYTHONPATH": [[55, "setup-pythonpath"]], "Without XPUAutoShard": [[55, "without-xpuautoshard"]], "With XPUAutoShard": [[55, "with-xpuautoshard"]], "Sharding Parameters Setting": [[55, "sharding-parameters-setting"]], "Further Settings": [[55, "further-settings"]], "Executing Command": [[55, "executing-command"]], "Quick Get Started*": [[56, "quick-get-started"]], "Software Requirement": [[56, "software-requirement"]], "Installation Channel:": [[56, "installation-channel"]], "Compatibility Table": [[56, "compatibility-table"]], "Install for XPU": [[56, "install-for-xpu"]], "Install for CPU": [[56, "install-for-cpu"]], "Install for weekly binaries": [[56, "install-for-weekly-binaries"]], "Install for GPU weekly": [[56, "install-for-gpu-weekly"]], "Contributing": [[56, "contributing"]], "Support": [[56, "support"]], "Security": [[56, "security"]]}, "indexentries": {}}) \ No newline at end of file +Search.setIndex({"docnames": ["CODE_OF_CONDUCT", "SECURITY", "docker/README", "docker/tensorflow-serving/README", "docs/README", "docs/build_docs/docs_build_tips", "docs/build_docs/source/index", "docs/community/contributing", "docs/community/releases", "docs/design/directory_structure", "docs/design/extension_design", "docs/design/how_to_write_custom_op", "docs/design/optimization/README", "docs/design/optimization/oneDNN_object_cache", "docs/guide/FAQ", 
"docs/guide/INT8_quantization", "docs/guide/OpenXLA_Support_on_GPU", "docs/guide/XPUAutoShard", "docs/guide/aamp_tune", "docs/guide/advanced_auto_mixed_precision", "docs/guide/environment_variables", "docs/guide/features", "docs/guide/how_to_enable_profiler", "docs/guide/infrastructure", "docs/guide/itex_fusion", "docs/guide/itex_ops", "docs/guide/itex_ops_override", "docs/guide/keras_mixed_precision", "docs/guide/launch", "docs/guide/practice_guide", "docs/guide/python_api", "docs/guide/tf_serving_install", "docs/install/experimental/install_for_arc_gpu", "docs/install/experimental/install_for_gpu_conda", "docs/install/how_to_build", "docs/install/install_for_cpp", "docs/install/install_for_cpu", "docs/install/install_for_xpu", "docs/install/installation_guide", "examples/README", "examples/accelerate_alexnet_by_quantization/README", "examples/common_guide_running", "examples/infer_inception_v4_amp/README", "examples/infer_resnet50/README", "examples/model_zoo_example/README", "examples/pretrain_bert/README", "examples/quantize_inception_v3/README", "examples/quick_example", "examples/stable_diffussion_inference/README", "examples/train_3d_unet/README", "examples/train_bert/README", "examples/train_bert_fp8/README", "examples/train_horovod/mnist/README", "examples/train_horovod/resnet50/README", "examples/train_maskrcnn/README", "examples/train_resnet50/README", "examples/train_resnet50_with_autoshard/README", "get_started", "index"], "filenames": ["CODE_OF_CONDUCT.md", "SECURITY.md", "docker/README.md", "docker/tensorflow-serving/README.md", "docs/README.md", "docs/build_docs/docs_build_tips.md", "docs/build_docs/source/index.rst", "docs/community/contributing.md", "docs/community/releases.md", "docs/design/directory_structure.md", "docs/design/extension_design.md", "docs/design/how_to_write_custom_op.md", "docs/design/optimization/README.md", "docs/design/optimization/oneDNN_object_cache.md", "docs/guide/FAQ.md", "docs/guide/INT8_quantization.md", 
"docs/guide/OpenXLA_Support_on_GPU.md", "docs/guide/XPUAutoShard.md", "docs/guide/aamp_tune.md", "docs/guide/advanced_auto_mixed_precision.md", "docs/guide/environment_variables.md", "docs/guide/features.rst", "docs/guide/how_to_enable_profiler.md", "docs/guide/infrastructure.md", "docs/guide/itex_fusion.md", "docs/guide/itex_ops.md", "docs/guide/itex_ops_override.md", "docs/guide/keras_mixed_precision.md", "docs/guide/launch.md", "docs/guide/practice_guide.md", "docs/guide/python_api.md", "docs/guide/tf_serving_install.md", "docs/install/experimental/install_for_arc_gpu.md", "docs/install/experimental/install_for_gpu_conda.md", "docs/install/how_to_build.md", "docs/install/install_for_cpp.md", "docs/install/install_for_cpu.md", "docs/install/install_for_xpu.md", "docs/install/installation_guide.rst", "examples/README.md", "examples/accelerate_alexnet_by_quantization/README.md", "examples/common_guide_running.md", "examples/infer_inception_v4_amp/README.md", "examples/infer_resnet50/README.md", "examples/model_zoo_example/README.md", "examples/pretrain_bert/README.md", "examples/quantize_inception_v3/README.md", "examples/quick_example.md", "examples/stable_diffussion_inference/README.md", "examples/train_3d_unet/README.md", "examples/train_bert/README.md", "examples/train_bert_fp8/README.md", "examples/train_horovod/mnist/README.md", "examples/train_horovod/resnet50/README.md", "examples/train_maskrcnn/README.md", "examples/train_resnet50/README.md", "examples/train_resnet50_with_autoshard/README.md", "get_started.md", "index.rst"], "titles": ["Contributor Covenant Code of Conduct", "Security Policy", "Intel\u00ae Extension for TensorFlow* Docker Container Guide", "Intel\u00ae Extension for TensorFlow* Serving - Docker Container Guide", "Welcome to Intel\u00ae Extension for TensorFlow* documentation", "Online Documentation Build Guide", "Welcome to Intel \u00ae Extension for TensorFlow* documentation!", "Contributing guidelines", "Releases", "Directory Tree 
Structure", "Extension Design", "How to write custom op", "Optimizations Design", "oneDNN object cache optimization", "Frequently Asked Questions", "INT8 Quantization", "OpenXLA Support on GPU via PJRT", "XPUAutoShard on GPU [Experimental]", "Tune Advanced Auto Mixed Precision", "Advanced Auto Mixed Precision", "Environment Variables", "Features", "GPU Profiler", "Infrastructure", "Graph fusion", "Customized Operators", "Operators Override", "Keras Mixed Precision", "Launch Script User Guide", "Practice Guide", "Python APIs", "Install TensorFlow Serving with Intel\u00ae Extension for TensorFlow*", "Experimental: Intel\u00ae Arc\u2122 A-Series GPU Software Installation", "Conda Environment Installation Instructions", "Overview", "Intel\u00ae Extension for TensorFlow* for C++", "Intel CPU Software Installation", "Intel XPU Software Installation", "Installation Guide", "Examples", "Accelerate AlexNet by Quantization with Intel\u00ae Extension for Tensorflow*", "Common Guide for Running", "Speed up Inference of Inception v4 by Advanced Automatic Mixed Precision on Intel CPU and GPU via Docker Container or Bare Metal", "ResNet50 Inference on Intel CPU and GPU", "Accelerate Deep Learning Training and Inference for Model Zoo Workloads on Intel GPU", "Accelerate BERT-Large Pretraining on Intel GPU", "Quantize Inception V3 by Intel\u00ae Extension for Tensorflow* on Intel\u00ae Xeon\u00ae", "Quick Example on Intel CPU and GPU", "Stable Diffusion Inference for Text2Image on Intel GPU", "Accelerate 3D-Unet Training w/o horovod for medical image segmentation on Intel GPU", "BERT Training for Classifying Text on Intel CPU and GPU", "FP8 BERT-Large Fine-tuning for Classifying Text on Intel GPU", "Distributed Training Example with Intel\u00ae Optimization for Horovod* on Intel\u00ae GPU", "Refer to train_resnet50", "Accelerate Mask R-CNN Training w/o horovod on Intel GPU", "Resnet50 train on Intel GPU", "Accelerate ResNet50 Training by XPUAutoShard on Intel GPU", "Quick Get 
Started*", "Welcome to Intel \u00ae Extension for TensorFlow* documentation!"], "terms": {"we": [0, 2, 7, 11, 16, 24, 27, 29, 30, 31, 33, 34, 35, 40, 41, 42, 45, 46, 48, 49, 51, 54, 55, 57], "member": [0, 30], "leader": 0, "make": [0, 2, 3, 5, 7, 11, 14, 16, 18, 19, 27, 29, 34, 35, 42], "particip": 0, "commun": [0, 2, 7, 9, 21, 23, 29, 37, 57], "harass": 0, "free": [0, 21, 28], "experi": [0, 4, 21, 23, 29], "everyon": 0, "regardless": 0, "ag": 0, "bodi": 0, "size": [0, 20, 25, 27, 28, 52, 56], "visibl": [0, 2, 11, 31], "invis": 0, "disabl": [0, 15, 19, 28, 29, 30], "ethnic": 0, "sex": 0, "characterist": 0, "gender": 0, "ident": [0, 27], "express": 0, "level": [0, 14, 16, 17, 23, 24, 27, 32], "educ": 0, "socio": 0, "econom": 0, "statu": [0, 11, 19, 35], "nation": 0, "person": 0, "appear": [0, 27], "race": 0, "cast": [0, 18, 24, 27], "color": 0, "religion": 0, "sexual": 0, "orient": 0, "act": [0, 21, 31], "interact": [0, 34], "wai": [0, 14, 19, 27, 31, 33], "contribut": [0, 4, 21, 28, 34], "an": [0, 2, 3, 7, 11, 13, 14, 18, 19, 21, 24, 25, 27, 28, 29, 31, 33, 34, 35, 37, 39, 46, 47, 51, 56, 57], "open": [0, 5, 7, 14, 18, 21, 31, 32, 42, 43, 45, 46, 48, 49, 50, 51, 54, 57], "welcom": [0, 7, 57], "divers": 0, "inclus": 0, "healthi": 0, "exampl": [0, 2, 4, 5, 7, 9, 11, 15, 20, 21, 24, 25, 26, 27, 29, 30, 31, 32, 33, 40, 42, 44, 46, 50, 57], "behavior": [0, 27, 28, 29], "posit": [0, 7], "environ": [0, 4, 11, 13, 15, 19, 21, 22, 23, 27, 29, 31, 38, 39, 57], "includ": [0, 7, 13, 14, 16, 17, 18, 20, 23, 35, 37, 46, 47, 57], "demonstr": [0, 16, 39], "empathi": 0, "kind": [0, 4, 21, 47], "toward": 0, "other": [0, 17, 20, 25, 27, 28, 29, 30, 31, 32, 34, 35, 37, 50, 52, 57], "peopl": 0, "Being": 0, "respect": [0, 28, 45], "differ": [0, 2, 4, 13, 16, 20, 21, 23, 25, 28, 29, 30, 38], "opinion": 0, "viewpoint": 0, "give": 0, "gracefulli": 0, "accept": [0, 7, 17], "construct": [0, 11, 17, 27], "feedback": [0, 7], "apolog": 0, "those": [0, 18, 19, 31, 55], "affect": [0, 18, 27], 
"mistak": 0, "learn": [0, 15, 19, 21, 25, 28, 29, 31, 34, 35, 39, 40, 57], "from": [0, 3, 4, 5, 7, 11, 16, 17, 18, 19, 21, 22, 27, 28, 29, 30, 32, 34, 38, 39, 42, 44, 45, 46, 49, 50, 57], "focus": 0, "what": [0, 14, 27], "i": [0, 4, 5, 7, 9, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 29, 30, 31, 32, 33, 34, 35, 36, 37, 39, 40, 42, 43, 45, 46, 47, 48, 49, 50, 51, 52, 54, 55, 56, 57], "best": [0, 32], "just": 0, "u": [0, 16, 22, 28, 37], "individu": [0, 20], "overal": [0, 29], "unaccept": 0, "The": [0, 2, 4, 5, 7, 9, 13, 15, 16, 17, 18, 19, 20, 21, 22, 24, 25, 27, 28, 29, 30, 31, 32, 34, 35, 36, 37, 40, 42, 45, 46, 49, 50, 51, 52, 54, 55, 56], "us": [0, 2, 3, 4, 5, 7, 13, 14, 15, 16, 18, 19, 20, 22, 23, 24, 25, 26, 27, 29, 30, 32, 33, 34, 35, 37, 39, 41, 42, 44, 45, 46, 47, 49, 50, 51, 54, 57], "languag": [0, 35], "imageri": 0, "attent": [0, 20], "advanc": [0, 4, 14, 20, 30, 39, 57], "ani": [0, 4, 11, 20, 21, 23, 24, 27, 28, 32, 33, 34, 37, 40, 47, 50], "troll": 0, "insult": 0, "derogatori": 0, "comment": [0, 7, 14], "polit": 0, "attack": 0, "public": [0, 4, 5, 11, 21, 25, 30, 31], "privat": 0, "publish": [0, 5], "inform": [0, 1, 7, 8, 20, 28, 29, 30, 34, 35, 37, 40, 46, 55, 57], "physic": [0, 29, 56], "email": 0, "address": [0, 29, 32], "without": [0, 4, 18, 20, 21, 23, 27, 34, 35, 39, 46, 50, 57], "explicit": [0, 11, 27, 29], "permiss": [0, 5], "which": [0, 4, 7, 9, 13, 14, 15, 16, 17, 18, 19, 20, 24, 27, 28, 29, 30, 32, 34, 35, 37, 40, 41, 46, 51], "could": [0, 14, 18, 27, 30, 35, 37, 40, 46], "reason": [0, 27], "consid": [0, 18, 52], "inappropri": 0, "profession": 0, "set": [0, 4, 7, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 29, 30, 32, 33, 35, 37, 42, 45, 46, 51, 57], "ar": [0, 2, 4, 5, 7, 11, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 31, 32, 34, 35, 36, 37, 39, 40, 42, 45, 46, 47, 49, 52, 55, 57], "clarifi": 0, "take": [0, 11, 24, 27, 28, 29, 31, 33, 45], "appropri": [0, 3, 29, 34, 35], "fair": 0, "action": [0, 
5], "thei": [0, 18, 27, 28, 29], "deem": 0, "threaten": 0, "offens": 0, "harm": 0, "have": [0, 18, 27, 29, 32, 33, 34, 40, 46], "right": [0, 25], "remov": [0, 11, 18, 24], "edit": [0, 2], "reject": 0, "commit": [0, 5, 17, 31], "wiki": 0, "issu": [0, 1, 7, 14, 18, 27, 32, 34, 35, 37, 50, 57], "align": [0, 13], "thi": [0, 2, 3, 5, 11, 13, 14, 16, 17, 18, 19, 20, 21, 23, 24, 25, 27, 28, 29, 30, 31, 33, 34, 35, 37, 40, 41, 44, 45, 46, 47, 49, 51, 54, 56, 57], "moder": 0, "decis": [0, 17], "when": [0, 5, 14, 17, 19, 24, 27, 28, 29, 31, 32, 34, 35, 45, 46, 49, 50, 54], "appli": [0, 17, 25, 27, 30, 45, 48, 49, 51, 54, 56], "within": [0, 15, 25, 32, 45], "all": [0, 7, 11, 14, 18, 20, 21, 25, 27, 29, 32, 37, 40, 42, 45, 56], "space": [0, 29, 57], "also": [0, 4, 7, 15, 16, 17, 19, 21, 23, 27, 28, 29, 32, 33, 36, 37, 57], "offici": [0, 29, 39, 40, 41, 45, 48, 49, 51, 54, 55, 56], "repres": [0, 17], "e": [0, 2, 3, 5, 11, 17, 27, 28, 31, 35, 37, 54, 55], "mail": 0, "post": [0, 7, 18, 19, 24, 30], "via": [0, 11, 17, 39, 56, 57], "social": 0, "media": 0, "account": 0, "appoint": 0, "onlin": [0, 57], "offlin": 0, "event": 0, "instanc": 0, "abus": 0, "otherwis": [0, 17, 27, 30, 46, 47], "mai": [0, 7, 13, 14, 18, 19, 24, 27, 28, 29, 32, 33, 37, 48, 57], "report": [0, 7, 20, 57], "itex": [0, 2, 3, 4, 8, 9, 11, 13, 14, 16, 17, 18, 19, 20, 21, 23, 26, 27, 28, 31, 32, 33, 34, 35, 36, 37, 41, 42, 46, 48, 56, 57], "maintain": [0, 7, 8, 18, 21, 23, 25, 31], "intel": [0, 1, 5, 8, 9, 11, 14, 15, 16, 17, 18, 19, 20, 21, 22, 24, 25, 26, 27, 28, 33, 38, 39, 57], "com": [0, 5, 7, 8, 16, 21, 27, 29, 31, 32, 33, 34, 35, 37, 40, 42, 45, 46, 48, 49, 50, 51, 52, 54, 55, 56, 57], "complaint": 0, "review": 0, "investig": [0, 28], "promptli": 0, "fairli": 0, "oblig": 0, "privaci": 0, "secur": 0, "incid": 0, "follow": [0, 2, 3, 7, 15, 17, 18, 22, 24, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 40, 42, 43, 45, 47, 48, 49, 50, 51, 54, 55, 56, 57], "impact": [0, 5, 14, 18, 24, 29, 50], "determin": [0, 
11, 27, 29], "consequ": 0, "violat": 0, "unprofession": 0, "unwelcom": 0, "A": [0, 5, 16, 17, 18, 24, 27, 28, 29, 30, 31, 37, 39, 42, 52], "written": [0, 7], "provid": [0, 2, 4, 7, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 32, 33, 34, 35, 37, 39, 40, 45, 46, 49, 54, 55, 56, 57], "clariti": 0, "around": [0, 28, 45], "natur": 0, "explan": 0, "why": 0, "wa": [0, 28, 29, 30, 34, 35], "apologi": 0, "request": [0, 7, 57], "through": [0, 14, 27, 29, 34, 35, 49, 57], "singl": [0, 4, 7, 15, 20, 21, 24, 45, 54, 55], "seri": [0, 16, 29, 30, 34, 35, 37, 40, 42, 44, 45, 46, 48, 49, 50, 51, 52, 54, 55, 56, 57], "continu": [0, 14, 18, 27], "No": [0, 14, 19, 22, 34, 42, 43, 45, 48, 49, 50, 51, 54], "involv": 0, "unsolicit": 0, "specifi": [0, 3, 11, 21, 24, 27, 28, 29, 31, 34, 35], "period": [0, 29], "time": [0, 11, 14, 16, 18, 19, 20, 21, 22, 27, 29, 34, 35, 40, 45], "avoid": [0, 24, 27, 28, 29, 33], "well": [0, 2, 8, 11, 21, 26, 27, 28, 29, 45], "extern": [0, 14, 35], "channel": [0, 24, 25, 38], "like": [0, 2, 7, 16, 17, 25, 27, 29, 30, 41, 42, 51, 52], "term": [0, 25, 57], "lead": [0, 18], "seriou": 0, "sustain": 0, "sort": 0, "allow": [0, 16, 18, 27, 29, 50, 57], "dure": [0, 15, 18, 19, 24, 27, 33, 34, 35, 42], "pattern": [0, 4, 15, 21, 24], "aggress": [0, 18, 19], "disparag": 0, "class": [0, 11, 27, 30], "adapt": 0, "version": [0, 2, 11, 14, 16, 27, 29, 32, 33, 34, 35, 36, 37, 40, 41], "avail": [0, 2, 3, 11, 14, 19, 25, 28, 29, 34, 36, 37, 49, 57], "http": [0, 2, 5, 7, 8, 16, 21, 22, 27, 29, 31, 32, 33, 34, 35, 36, 37, 40, 42, 45, 46, 48, 49, 50, 51, 52, 54, 55, 56, 57], "www": [0, 21, 37], "org": [0, 2, 7, 21, 35, 50], "_": [0, 11, 13, 16, 17, 18, 20, 22, 24, 27, 28, 29, 30, 31, 32, 34, 35, 41, 42, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55], "html": [0, 5, 11, 37], "were": [0, 28, 29], "inspir": 0, "mozilla": 0, "": [0, 5, 14, 18, 20, 21, 27, 29, 31, 34, 35, 40, 42, 46, 48, 49, 50, 57], "ladder": 0, "For": [0, 1, 2, 7, 11, 14, 15, 16, 18, 19, 20, 23, 
25, 26, 27, 28, 30, 31, 32, 37, 42, 43, 44, 45, 48, 49, 50, 51, 52, 54, 55, 56], "answer": 0, "common": [0, 11, 14, 17, 21, 29, 39], "question": [0, 4, 57], "about": [0, 7, 19, 29, 31, 40, 45, 46, 52], "see": [0, 1, 2, 7, 22, 25, 27, 28, 29, 31, 32, 34, 46, 55, 57], "faq": 0, "translat": [0, 34, 35], "center": [1, 4, 16, 21, 25, 26, 30, 34, 35, 37, 40, 42, 44, 45, 46, 48, 49, 50, 51, 52, 54, 55, 56, 57], "more": [1, 4, 7, 11, 16, 18, 19, 21, 25, 29, 31, 32, 34, 35, 37, 40, 45, 46, 47, 52, 55], "how": [1, 5, 14, 16, 17, 18, 29, 31, 34, 35, 37, 39, 52, 55, 57], "work": [1, 4, 7, 14, 15, 19, 20, 21, 27, 28, 29, 35, 40, 46], "resolv": 1, "handl": [1, 13], "guidelin": [1, 4, 44, 57], "document": [2, 3, 27, 33], "ha": [2, 3, 14, 18, 19, 27, 29, 32, 35, 45, 56], "instruct": [2, 3, 4, 7, 18, 19, 21, 29, 36, 37, 48, 55, 57], "assumpt": [2, 3], "host": [2, 3, 27, 37, 42], "machin": [2, 3, 21, 27, 28, 29, 31, 36, 37, 47, 52], "linux": [2, 3, 7, 16, 28, 29, 33, 34, 35, 36, 37, 46], "kernel": [2, 3, 9, 10, 15, 16, 20, 22, 23, 24, 25, 27, 32, 34, 35, 36, 37, 45, 46, 48, 57], "compat": [2, 3, 4, 15, 19, 21, 23, 26, 27, 30, 45, 46, 48, 49, 50, 51, 54, 55], "driver": [2, 3, 14, 27, 33, 40, 42, 46, 57], "instal": [2, 3, 4, 7, 9, 14, 18, 19, 21, 22, 23, 26, 27, 28, 29, 30, 40, 41, 42, 43, 45, 46, 48, 49, 50, 51, 54], "softwar": [2, 33, 38, 40, 46, 47, 52], "refer": [2, 3, 7, 11, 15, 16, 17, 18, 19, 20, 21, 23, 27, 29, 30, 31, 32, 34, 35, 37, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 54, 55, 56, 57], "xpu": [2, 4, 11, 14, 16, 17, 19, 22, 25, 26, 27, 30, 38, 47, 48, 52], "cpu": [2, 3, 4, 9, 11, 14, 15, 18, 19, 20, 23, 24, 27, 30, 31, 38, 39, 40], "detail": [2, 3, 11, 15, 16, 17, 18, 19, 21, 23, 25, 27, 29, 30, 32, 34, 35, 37, 40, 42, 45, 57], "download": [2, 8, 27, 29, 32, 37, 45, 54, 55], "copi": [2, 3, 35], "wheel": [2, 33, 34], "model": [2, 3, 13, 15, 16, 17, 18, 19, 20, 21, 22, 29, 30, 39, 40, 46, 50, 52, 56, 57], "directori": [2, 3, 4, 5, 7, 14, 17, 28, 31, 32, 34, 
35, 37, 42, 43, 45, 48, 49, 51, 54], "you": [2, 3, 4, 5, 7, 8, 11, 13, 14, 16, 17, 18, 20, 21, 22, 23, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 40, 41, 42, 43, 45, 46, 47, 48, 49, 51, 54, 55, 56], "can": [2, 3, 7, 11, 13, 14, 16, 17, 18, 19, 20, 21, 22, 23, 25, 27, 28, 29, 30, 31, 32, 33, 34, 36, 37, 38, 40, 45, 49, 54, 55, 56, 57], "get": [2, 4, 7, 11, 13, 16, 21, 27, 29, 30, 31, 32, 34, 35, 42, 43, 45, 48, 49, 51, 54], "link": [2, 35, 46], "pypi": [2, 38, 57], "project": [2, 5, 7, 57], "file": [2, 5, 7, 14, 17, 18, 22, 28, 31, 32, 37, 42, 43, 45, 48, 49, 50, 51, 54, 55, 57], "lib": [2, 14, 16, 28, 34, 35, 37, 50], "To": [2, 3, 4, 7, 18, 19, 24, 27, 29, 32, 34, 35, 36, 37, 40, 45, 46, 48, 49, 54], "optim": [2, 4, 9, 14, 15, 16, 17, 18, 23, 25, 26, 27, 28, 29, 30, 32, 33, 37, 39, 40, 42, 44, 45, 46, 48, 49, 54, 55, 57], "horovod": [2, 32, 33, 37, 39], "oneapi": [2, 14, 16, 21, 31, 33, 40, 42, 43, 45, 46, 48, 49, 50, 51, 54, 55, 56, 57], "collect": [2, 29, 37], "librari": [2, 3, 11, 28, 29, 32, 34, 37, 49], "oneccl": [2, 32, 33, 37], "mkdir": [2, 3, 55, 56], "cd": [2, 5, 7, 16, 29, 31, 34, 35, 42, 45, 48, 49, 51, 52, 54, 56], "wget": [2, 7, 29, 32, 34, 35, 37, 42, 50, 52], "sh": [2, 3, 5, 14, 31, 32, 33, 34, 35, 37, 41, 42, 43, 45, 46, 48, 49, 50, 51, 52, 54, 55, 57], "o": [2, 16, 22, 32, 33, 35, 37, 39, 46], "some": [2, 11, 16, 18, 19, 26, 27, 28, 29, 34, 35, 45, 52], "python": [2, 4, 9, 14, 16, 19, 22, 23, 25, 26, 27, 28, 29, 31, 32, 33, 34, 36, 37, 40, 41, 45, 46, 47, 49, 50, 51, 52, 54, 55, 57], "hard": [2, 48], "code": [2, 4, 5, 9, 11, 16, 20, 21, 22, 23, 29, 31, 38, 39, 40, 42, 46], "insid": [2, 57], "If": [2, 3, 5, 16, 20, 22, 25, 26, 27, 28, 29, 30, 32, 34, 35, 36, 37, 40, 42, 43, 45, 46, 47, 48, 49, 51, 54], "re": [2, 29, 41], "3": [2, 7, 18, 20, 22, 24, 25, 26, 27, 28, 29, 30, 33, 34, 35, 36, 37, 40, 41, 46, 47, 56], "10": [2, 14, 16, 18, 19, 25, 27, 28, 32, 34, 35, 36, 37, 46, 55, 56, 57], "2": [2, 14, 15, 17, 18, 19, 20, 24, 25, 27, 28, 29, 30, 33, 
34, 36, 37, 40, 42, 43, 45, 46, 47, 48, 49, 51, 52, 54, 55, 56, 57], "13": [2, 16, 32, 33, 34, 35, 36, 37, 40, 46, 52, 56, 57], "ubuntu": [2, 16, 31, 34, 35, 36, 37], "22": [2, 16, 31, 32, 34, 36, 37, 56], "04": [2, 16, 31, 32, 34, 36, 37], "layer": [2, 9, 19, 25, 27, 46], "updat": [2, 18, 27, 31, 32, 33, 34, 35, 36, 37, 56], "shown": [2, 3, 15, 22, 24, 28, 45, 48, 49, 54], "below": [2, 3, 16, 24, 25, 27, 28, 29, 30, 31, 32, 34, 45], "image_nam": [2, 3], "arg": [2, 13, 30], "ubuntu_vers": 2, "python3": [2, 5, 33, 34, 48, 50], "tf_ver": 2, "whl": [2, 11, 32, 34, 35, 57], "t": [2, 5, 11, 13, 17, 18, 20, 27, 28, 48, 50], "f": [2, 32, 35, 57], "dockerfil": 2, "enter": [2, 3, 22, 33, 34, 35], "folder": [2, 3, 19, 31, 34, 35, 55], "command": [2, 3, 14, 16, 22, 28, 29, 32, 33, 34, 35, 36, 37, 41, 42, 46, 51], "start": [2, 3, 14, 21, 22, 27, 28, 31], "v": [2, 3, 18, 31, 33, 35, 37, 41, 42], "option": [2, 3, 7, 11, 16, 18, 21, 28, 30, 34, 55, 56, 57], "mount": [2, 3], "your": [2, 3, 5, 7, 14, 29, 31, 32, 33, 34, 35, 36, 37, 41, 42, 46, 48, 50, 56, 57], "local": [2, 3, 7, 14, 19, 28, 29, 31, 34, 35, 36, 37, 52], "attach": [2, 3, 27, 29], "devic": [2, 3, 4, 9, 10, 11, 13, 14, 16, 17, 19, 20, 21, 22, 23, 24, 27, 30, 31, 34, 35, 37, 42, 56, 57], "dev": [2, 3, 14, 22, 31, 37, 42, 51], "dri": [2, 3, 31, 37, 42], "dir": [2, 3, 7, 45, 49, 50, 51, 54], "workspac": [2, 3, 31], "path": [2, 3, 7, 16, 18, 19, 20, 22, 28, 29, 30, 31, 32, 33, 34, 35, 37, 42, 46, 49, 51, 54, 55, 56, 57], "privileg": [2, 3, 42], "ipc": [2, 3, 37, 42], "http_proxi": [2, 3], "https_proxi": [2, 3], "no_proxi": [2, 3], "bash": [2, 32, 33, 34, 35, 37, 42, 45, 46, 54, 57], "now": [2, 18, 27, 29, 31], "c": [2, 4, 10, 11, 14, 16, 28, 29, 32, 33, 34, 36, 37, 38, 57], "client": [2, 35], "import": [2, 7, 11, 14, 16, 17, 18, 19, 22, 23, 25, 26, 27, 29, 32, 33, 34, 35, 36, 37, 42, 46, 47, 57], "device_lib": 2, "print": [2, 11, 16, 19, 22, 25, 27, 28, 30, 32, 33, 34, 35, 36, 37, 42, 43, 47, 48, 56, 57], 
"list_local_devic": 2, "should": [2, 5, 7, 22, 27, 29, 31, 32, 33, 36, 37, 40, 51, 56], "list": [2, 7, 11, 19, 24, 27, 28, 29, 32, 34, 35], "sampl": [2, 22, 40, 46, 48], "output": [2, 7, 11, 13, 19, 20, 24, 25, 27, 30, 32, 34, 35, 42, 46, 51], "look": [2, 16, 24, 31], "name": [2, 3, 4, 5, 7, 11, 14, 16, 18, 19, 20, 25, 26, 27, 29, 31, 39, 48, 52], "0": [2, 5, 11, 14, 15, 16, 19, 20, 22, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 40, 43, 45, 46, 47, 50, 51, 52, 54, 55, 56, 57], "device_typ": [2, 14, 17, 52, 56], "memory_limit": 2, "268435456": 2, "incarn": 2, "9266936945121049176": 2, "xla_global_id": 2, "1": [2, 4, 5, 14, 18, 19, 20, 21, 22, 25, 26, 27, 28, 29, 30, 33, 34, 42, 45, 46, 47, 49, 51, 52, 54, 55, 56, 57], "bus_id": 2, "15031084974591766410": 2, "physical_device_desc": 2, "intel_xpu": 2, "pci": 2, "bu": 2, "id": [2, 31], "undefin": [2, 16], "17448926295332318308": 2, "step": [3, 16, 17, 18, 25, 27, 29, 31, 39, 40, 49, 52, 54, 56], "cpp": [3, 14, 17, 32], "cc": [3, 11, 14, 16, 17, 27, 31, 37, 52, 56], "sourc": [3, 4, 7, 11, 16, 17, 21, 32, 33, 37, 38, 41, 42, 43, 46, 49, 50, 52, 54, 57], "Then": [3, 11, 16, 22, 30, 36, 37, 46], "packag": [3, 16, 29, 32, 33, 34, 36, 40, 46, 49, 50, 54, 57], "p": [3, 25, 31, 36, 37, 42, 55], "bazel": [3, 11, 16, 31], "bin": [3, 7, 11, 16, 28, 31, 34, 35, 37, 41, 42, 43, 46, 49, 50, 52, 54], "cp": [3, 35], "r": [3, 7, 14, 16, 27, 29, 39, 56], "path_to_itex": 3, "out": [3, 15, 16, 27, 35, 43, 47, 48, 56], "k8": [3, 35], "opt": [3, 11, 14, 16, 32, 34, 35, 37, 41, 52], "st": [3, 35], "tar": [3, 7, 29], "cvfh": 3, "path_to_tensorflow_serv": 3, "tensorflow_serv": [3, 31], "model_serv": [3, 31], "tensorflow_model_serv": [3, 31], "gpu": [3, 4, 9, 11, 14, 15, 18, 19, 20, 23, 24, 25, 27, 30, 31, 33, 38, 39], "sure": [3, 11, 16, 27, 32, 34, 35], "meet": [3, 25, 57], "either": [3, 19], "target": [3, 17, 34, 35], "8500": [3, 31], "model_nam": [3, 31], "model_dir": [3, 31, 49, 54, 55, 56], "overview": 4, "infrastructur": [4, 9, 
20], "quick": [4, 11, 32, 39], "releas": [4, 14, 17, 29, 30, 31, 34, 35, 40, 48, 50], "frequent": 4, "ask": [4, 34, 35], "guid": [4, 9, 11, 16, 18, 21, 27, 31, 32, 34, 35, 37, 39, 40, 46], "build": [4, 7, 9, 38, 39, 40, 57], "conda": [4, 14, 38], "distribut": [4, 8, 29, 32, 33, 37, 38, 39, 55, 57], "featur": [4, 7, 8, 11, 13, 17, 25, 29, 34, 39, 46, 56, 57], "variabl": [4, 13, 15, 16, 19, 21, 22, 23, 24, 25, 27, 29, 31, 33, 35, 46], "api": [4, 7, 9, 10, 14, 15, 16, 19, 25, 26, 27, 29, 31, 35, 46, 47], "auto": [4, 11, 17, 28, 30, 35], "mix": [4, 30, 39], "precis": [4, 30, 39, 40, 48, 51], "graph": [4, 9, 10, 13, 15, 18, 20, 23, 39, 47, 56, 57], "custom": [4, 7, 9, 18, 21, 26, 28, 30, 32, 37, 45], "oper": [4, 13, 15, 18, 23, 24, 27, 29, 57], "overrid": [4, 11, 18, 27], "int8": [4, 27, 40, 46], "quantiz": [4, 39], "xpuautoshard": [4, 30, 39], "profil": [4, 9, 27, 29], "launcher": [4, 28, 29], "topic": 4, "practic": [4, 27, 28], "support": [4, 7, 13, 14, 15, 17, 18, 19, 22, 24, 27, 28, 29, 30, 32, 34, 35, 36, 37, 40, 42, 46, 55, 56], "openxla": 4, "develop": [4, 16, 21, 29, 32, 34, 35, 36, 37, 57], "design": [4, 7, 9, 14, 21, 31, 40], "structur": [4, 16, 19, 28, 29], "op": [4, 9, 10, 17, 20, 21, 23, 24, 26, 27, 35, 45, 48], "gener": [4, 5, 20, 21, 23, 27, 28, 29, 31, 33, 34, 36, 42, 46], "default": [4, 7, 13, 14, 15, 18, 19, 20, 21, 23, 27, 29, 30, 34, 35, 37, 45, 46, 47, 54, 55, 56], "configur": [4, 8, 11, 14, 16, 17, 19, 21, 23, 27, 28, 30, 32, 37, 55, 57], "good": [4, 19, 21, 23, 29, 31], "perform": [4, 15, 17, 19, 20, 21, 22, 23, 24, 25, 27, 28, 29, 30, 34, 35, 39, 45, 46, 48, 49, 54, 56, 57], "chang": [4, 5, 7, 11, 18, 19, 20, 21, 23, 27, 28, 33, 39, 40, 50, 52], "simpl": [4, 21, 23, 27, 35], "frontend": [4, 21, 23], "util": [4, 9, 11, 14, 21, 23, 28, 29, 50, 56], "user": [4, 5, 7, 11, 13, 19, 20, 21, 23, 32, 34, 35, 36, 37, 38, 42, 48, 57], "onli": [4, 5, 13, 14, 17, 18, 20, 21, 23, 24, 27, 28, 30, 31, 32, 36, 45, 48, 49, 50, 51, 54, 55, 56], "minor": [4, 21, 
23], "applic": [4, 21, 23, 29, 30, 31, 40], "scenario": [4, 13, 20, 21, 23, 29, 30], "typic": [4, 21, 23, 27, 29], "need": [4, 8, 13, 14, 16, 17, 20, 21, 23, 27, 28, 31, 32, 33, 34, 35, 37, 42, 46, 47, 50, 55, 56], "add": [4, 5, 17, 18, 19, 24, 29, 31, 32, 35, 42, 48, 56], "two": [4, 13, 14, 19, 21, 23, 27, 29, 34, 42, 45, 48, 49, 54], "three": [4, 21, 22, 23, 28], "claus": [4, 21, 23], "origin": [4, 18, 21, 23, 24, 25, 35, 40, 42, 50], "amp": [4, 18, 28, 39, 49, 54, 57], "low": [4, 18, 21, 23, 27, 40], "data": [4, 15, 16, 17, 18, 21, 22, 25, 27, 30, 34, 35, 37, 40, 42, 44, 45, 46, 48, 49, 50, 51, 52, 54, 55, 56, 57], "type": [4, 7, 11, 14, 18, 20, 21, 28, 30, 33, 34, 35, 42], "bfloat16": [4, 11, 18, 19, 21, 24, 27, 30, 42, 45, 49, 51, 54], "float16": [4, 18, 19, 21, 27, 30, 42], "nativ": [4, 15, 21], "3rd": [4, 21, 36], "xeon": [4, 21, 29, 34, 35, 36, 39, 42], "scalabl": [4, 21, 31, 36, 42], "processor": [4, 21, 29, 36, 42, 46, 47], "cooper": [4, 21, 39, 46], "lake": [4, 21], "avx512": [4, 21, 37, 46], "further": [4, 21], "boost": [4, 21, 28, 29], "less": [4, 18, 19, 21, 24, 27, 42], "memori": [4, 9, 11, 13, 14, 15, 18, 19, 21, 25, 27, 42], "lower": [4, 15, 18, 19, 21, 42], "fulli": [4, 19, 21], "enabl": [4, 13, 15, 16, 17, 18, 21, 22, 25, 27, 28, 29, 30, 33, 34, 35], "fuse": [4, 16, 18, 19, 21, 24, 26, 45], "specif": [4, 16, 27, 29, 30, 31, 32, 37, 55, 57], "new": [4, 5, 7, 8, 15, 21, 23, 24, 27, 29, 40], "better": [4, 15, 18, 19, 21, 24, 25, 28, 29, 39, 45, 46, 48, 49, 54], "conv2d": [4, 21, 47], "relu": [4, 11, 16, 19, 21, 24, 25, 26, 27, 47], "linear": [4, 19, 21, 25, 27], "benefit": [4, 21, 27, 29, 30], "fusion": [4, 9, 16, 17, 18, 19, 21, 26, 30], "deliv": [4, 19, 21], "transpar": [4, 21], "fashion": [4, 21], "implement": [4, 7, 10, 16, 17, 19, 21, 23, 25, 26, 29, 57], "sever": [4, 21, 28, 29, 34, 35, 39, 55], "namespac": [4, 17, 21, 23, 25, 26, 30, 35], "extend": [4, 14, 21, 23, 25, 29, 30], "defin": [4, 16, 27], "export": [4, 7, 11, 15, 16, 17, 18, 19, 21, 
22, 27, 28, 29, 31, 33, 35, 37, 41, 42, 46, 51, 55, 56], "ze_enable_tracing_lay": [4, 21, 22, 27], "usecyclespersecondtim": [4, 21, 22, 27], "enable_tf_profil": [4, 21, 22, 27], "co": [4, 14, 15, 21], "neural": [4, 15, 21, 29, 39, 40, 46], "compressor": [4, 15, 21, 39, 40, 46], "solut": [4, 14, 15, 21], "equival": [4, 27], "experiment": [4, 13, 14, 16, 22, 30, 34, 35, 37], "automat": [4, 5, 16, 17, 18, 19, 21, 26, 27, 28, 29, 30, 32, 37, 39, 43, 47, 56], "shard": [4, 17, 21, 30], "input": [4, 11, 13, 17, 19, 20, 21, 22, 24, 25, 27, 30, 56], "place": [4, 17, 21, 29, 35], "maxim": [4, 17, 21, 25, 30, 56], "hardwar": [4, 17, 19, 21, 23, 25, 28, 30, 39], "usag": [4, 14, 21, 29, 30, 39], "adopt": [4, 15, 21], "uniform": [4, 16, 21], "pjrt": [4, 21, 57], "plugin": [4, 10, 16, 18, 19, 21, 22, 31, 34, 52, 57], "mechan": [4, 21], "backend": [4, 16, 21, 23, 26, 27, 30, 37, 42, 43, 46, 47, 57], "show": [5, 14, 16, 18, 27, 34, 35, 37, 39, 40, 42, 44, 45, 46, 48, 49, 50, 51, 54, 55, 56], "script": [5, 21, 22, 29, 34, 35, 42, 45, 47, 49, 50, 54, 55], "relat": [5, 28, 31], "save": [5, 11, 17, 28, 30, 51], "doc": [5, 9, 11, 50], "build_doc": 5, "trigger": [5, 19, 30], "merg": 5, "pr": 5, "github": [5, 7, 8, 16, 21, 29, 31, 34, 35, 37, 40, 42, 45, 48, 49, 51, 52, 54, 55, 56, 57], "repo": [5, 32, 33], "main": [5, 16, 17, 21, 32, 35, 49, 52, 54, 55], "branch": [5, 7, 16, 34], "execut": [5, 11, 13, 15, 16, 17, 18, 19, 20, 22, 25, 27, 29, 39, 46, 47], "content": [5, 35, 37], "doesn": [5, 17, 18, 50], "contain": [5, 9, 15, 17, 28, 29, 31, 38, 39, 49, 54, 57], "won": [5, 28], "product": [5, 7, 21, 31, 32], "git": [5, 11, 16, 30, 31, 34, 35, 42, 45, 48, 49, 51, 52, 54, 55, 56], "tag": 5, "must": [5, 15, 27], "ad": [5, 13, 17, 18, 21, 23, 27, 34, 45, 56], "same": [5, 7, 14, 16, 20, 21, 23, 24, 25, 27, 28, 29, 30, 31, 35, 40, 47], "manual": [5, 7, 18, 27, 28, 55], "result": [5, 15, 16, 17, 19, 22, 27, 29, 30, 33, 40, 43, 45, 47, 48, 50, 56], "gh": 5, "page": [5, 21, 22, 23, 29, 57], "io": 
[5, 31], "site": [5, 8, 32, 33, 34, 37, 50, 57], "note": [5, 11, 17, 18, 20, 25, 27, 28, 30, 31, 34, 35, 37, 42, 48, 52, 55], "write": [5, 7, 19], "abl": 5, "clone": [5, 16, 31, 34, 35, 45, 48, 49, 51, 54, 55, 56], "extens": [5, 8, 9, 11, 14, 15, 16, 17, 18, 19, 20, 21, 22, 24, 25, 26, 27, 28, 29, 33, 38, 39, 41, 42, 43, 44, 45, 47, 48, 49, 50, 51, 52, 54, 55, 56, 57], "tensorflow": [5, 8, 9, 10, 11, 13, 14, 15, 16, 17, 20, 22, 24, 25, 26, 27, 28, 29, 33, 38, 39, 41, 42, 43, 44, 45, 47, 48, 49, 50, 51, 52, 54, 55, 56, 57], "checkout": [5, 16, 31, 35, 49, 54, 56], "build_tmp": 5, "m": [5, 16, 28, 29, 40, 41, 48, 49, 52, 54], "push": 5, "befor": [5, 7, 11, 18, 19, 24, 27, 28, 29, 34, 35, 39, 56], "submit": [5, 7, 57], "modifi": [5, 35, 42, 56], "draft": 5, "server": [5, 16, 34, 35, 37], "9000": 5, "web": [5, 50], "browser": [5, 22, 36, 37, 46, 48, 50], "g": [5, 17, 27, 35, 55], "chrome": 5, "127": [5, 31], "localhost": [5, 11, 20, 36, 37, 52], "check": [5, 7, 11, 13, 14, 18, 19, 21, 23, 27, 28, 32, 33, 34, 35, 40, 41, 42, 51, 52, 57], "picker": 5, "function": [5, 16, 17, 20, 21, 23, 25, 26, 27, 29, 30], "want": [5, 7, 27, 28, 32, 34, 37, 48, 51], "switch": [5, 29], "begin": [7, 11, 42], "share": [7, 14, 29, 32, 42, 43, 45, 48, 49, 51, 54], "intent": 7, "team": [7, 48], "base": [7, 11, 14, 15, 16, 18, 19, 25, 29, 32, 33, 36, 39, 42, 45, 46, 51, 52, 56, 57], "bug": [7, 57], "propos": [7, 25], "log": [7, 11, 16, 18, 20, 22, 27, 30, 35, 37, 42, 43, 45, 48, 49, 50, 51, 52, 54, 56], "intend": [7, 57], "approv": 7, "fix": [7, 27, 32], "search": [7, 28], "pick": 7, "d": [7, 32, 34, 35, 55], "pleas": [7, 11, 14, 16, 17, 21, 27, 32, 34, 35, 37, 39, 40, 42, 45, 47, 49, 51, 52, 54, 55, 57], "pull": [7, 31, 36, 37, 42], "ensur": [7, 28], "run": [7, 11, 14, 18, 19, 22, 24, 26, 27, 28, 29, 30, 34, 36, 37, 57], "patch": [7, 31, 45, 48, 49, 51, 54, 56], "signific": [7, 18], "requir": [7, 11, 13, 15, 21, 22, 24, 25, 27, 28, 33, 40], "rfc": [7, 16, 21], "process": [7, 11, 21, 27, 28, 
29, 31, 45, 46], "consist": [7, 27], "discuss": 7, "promot": 7, "found": [7, 14, 27, 28, 29, 31, 34], "dedic": 7, "contributor": [7, 57], "coven": [7, 57], "conduct": [7, 28], "full": [7, 37], "locat": [7, 8, 34, 35, 45, 48], "benchmark": [7, 49, 55], "llga": [7, 30], "saniti": [7, 57], "migrat": 7, "path_to_python_unit_test": 7, "ut": 7, "find": [7, 11, 22, 29, 31], "py": [7, 11, 16, 22, 28, 31, 42, 43, 48, 49, 50, 51, 52, 54, 55, 56], "do": [7, 14, 16, 19, 27, 28, 30, 34, 46], "done": [7, 22, 27, 29, 32], "standard": [7, 25], "pylint": 7, "against": 7, "definit": [7, 18, 23, 30], "root": [7, 34, 35, 50], "pip": [7, 11, 14, 16, 22, 30, 31, 32, 33, 34, 35, 36, 37, 40, 41, 48, 49, 52, 54, 56, 57], "rcfile": 7, "pylintrc": 7, "myfil": 7, "conform": 7, "googl": [7, 14, 16, 21, 22, 31, 51], "both": [7, 14, 15, 18, 19, 23, 28, 29, 30, 37, 42, 55], "clang": 7, "format": [7, 9, 18, 24, 27, 30, 55], "cpplint": 7, "apt": [7, 16, 31, 32, 37], "12": [7, 14, 27, 28, 37, 45, 48, 52, 55, 56, 57], "inplac": 7, "stdout": [7, 28], "filter": 7, "legal": 7, "copyright": 7, "exclud": 7, "third_parti": [7, 9, 31], "recurs": 7, "sometim": 7, "fals": [7, 17, 25, 27, 28, 45, 51, 56], "error": [7, 11, 14, 20, 25, 27, 31, 42, 43, 45, 48, 49, 50, 51, 54], "nolint": 7, "nolintnextlin": 7, "skip": [7, 27, 28, 33], "line": [7, 27, 29, 31, 42, 50, 56], "mkl": [7, 31, 32, 33, 34, 35, 37], "h": [7, 11, 14, 17, 31, 35, 52], "include_subdir": 7, "buildifi": 7, "tool": [7, 9, 11, 14, 18, 29, 32, 33, 34, 37, 57], "bzl": 7, "convent": 7, "xxx": [7, 46, 50], "tpl": 7, "go": [7, 35, 36, 37], "golang": 7, "dl": 7, "go1": 7, "15": [7, 16, 28, 37], "html64": [7, 32], "gz": [7, 29], "sudo": [7, 16, 31, 32, 34, 35, 37], "usr": [7, 28, 32], "xzf": 7, "bazelbuild": [7, 34, 35], "buildtool": 7, "src": [7, 11, 14, 17, 31], "home": [7, 28, 32, 36, 37, 50], "NOT": [7, 14], "zzz": 7, "view": 8, "latest": [8, 16, 31, 33, 34, 35, 37, 57], "previou": [8, 25, 29], "valid": [8, 30], "here": [8, 11, 17, 18, 24, 34, 35, 
45, 48, 49, 54, 56], "contact": 8, "addit": [8, 21, 23, 24, 29, 35, 57], "assist": 8, "none": [8, 25, 26, 27, 28, 30], "docker": [9, 38, 39], "docs_build": 9, "core": [9, 11, 14, 16, 17, 26, 27, 29, 34, 35, 37, 46, 47, 52, 56], "test": [9, 19, 22, 27, 31, 33, 39, 49, 50, 57], "third": [9, 57], "parti": [9, 57], "program": [9, 29, 57], "kei": [9, 16, 17, 20, 32], "parent": 9, "sub": [9, 14, 18, 19, 29, 30], "descript": [9, 13, 18, 28, 29, 30, 39, 50], "onednn": [9, 11, 12, 14, 15, 16, 20, 24, 29, 30, 39], "propag": [9, 13, 17], "miscellan": 9, "repositori": [9, 32, 45, 49, 54], "modular": 10, "pluggabl": [10, 35, 37], "streamexecutor": [10, 16], "registr": [10, 11, 49], "pluggabledevic": [10, 57], "pass": [11, 15, 16, 17, 27, 30, 48, 56], "procedur": [11, 16, 32, 36, 37], "tf": [11, 14, 15, 19, 22, 25, 26, 27, 28, 30, 32, 34, 35, 36, 37, 46, 47, 56], "__version__": [11, 30, 32, 34, 35, 36, 37, 57], "verbos": [11, 19, 20, 27, 28], "itex_verbos": [11, 16, 17], "onednn_verbos": 11, "familiar": [11, 16], "architectur": 11, "built": [11, 31, 36, 37], "creat": [11, 18, 27, 28, 30, 33, 37, 41, 46, 49, 54, 56], "offcial": 11, "geluop": 11, "init": 11, "void": 11, "register_geluop": 11, "declar": 11, "call": [11, 15, 16, 26, 27, 29, 30, 38, 41, 46, 47, 50, 51], "nn": [11, 16, 25, 26, 30, 47], "itex_vlog": 11, "statusuniqueptr": 11, "tf_newstatu": [11, 35], "tf_opdefinitionbuild": 11, "op_build": 11, "tf_newopdefinitionbuild": 11, "gelu": [11, 30], "tf_opdefinitionbuilderaddinput": 11, "tf_opdefinitionbuilderaddoutput": 11, "activ": [11, 18, 19, 22, 25, 27, 29, 30, 32, 33, 34, 35, 36, 37, 41, 42, 43, 46, 47, 49, 50, 52, 54], "tf_opdefinitionbuilderaddattr": 11, "half": [11, 27], "float": [11, 18, 20, 27, 30, 35, 42], "approxim": [11, 25], "bool": 11, "true": [11, 22, 25, 26, 27, 28, 30, 45, 51, 56], "tf_opdefinitionbuildersetshapeinferencefunct": 11, "unchanged_shape_fn": 11, "tf_registeropdefinit": 11, "itex_check_eq": 11, "tf_ok": [11, 35], "tf_getcod": [11, 35], "fail": 
[11, 27, 30], "its": [11, 25, 27, 28, 29, 32, 37, 47], "docstr": 11, "attr": [11, 20], "might": [11, 34], "debug": [11, 20, 22, 30], "one": [11, 14, 15, 20, 21, 27, 29, 34, 35, 42, 47, 49, 55], "made": [11, 49], "separ": [11, 16, 23, 24, 27, 29, 33, 34, 57], "register_kernel_build": 11, "device_cpu": 11, "typeconstraint": 11, "cpudevic": 11, "device_gpu": [11, 17, 56], "gpudevic": 11, "engin": [11, 14], "polymorph": 11, "load_ops_librari": 11, "load": [11, 27, 31, 37], "register_": 11, "macro": 11, "directli": [11, 17, 27, 28, 29, 37], "relubaseop": 11, "eltwisebaseop": 11, "opkernel": 11, "templat": 11, "typenam": 11, "opkernelconstruct": 11, "context": [11, 25, 29], "dnnl": [11, 13], "algorithm": [11, 25], "eltwise_gelu_erf": 11, "0f": 11, "hasattr": [11, 30], "op_requires_ok": 11, "getattr": 11, "approximate_": 11, "alg_kind_": 11, "eltwise_gelu_tanh": 11, "algo": 11, "alpha": 11, "beta": 11, "eltwis": 11, "rewrit": [11, 16, 17], "comput": [11, 15, 16, 25, 27, 29, 32, 40, 47, 48, 57], "ctx": 11, "alpha_": 11, "beta_": 11, "opkernelcontext": 11, "try": [11, 21, 28, 40, 46], "onednn_engin": 11, "creatednnlengin": 11, "tensor": [11, 25, 27, 35, 47], "dst_tensor": 11, "nullptr": 11, "noth": 11, "return": [11, 16, 17, 27, 30, 35], "src_tensor": 11, "shape": [11, 13, 17, 19, 25, 27, 47], "num_el": 11, "allocate_output": 11, "kdstindex": 11, "forward": [11, 27, 48], "descriptor": 11, "primit": [11, 13, 20], "eltwise_forward": 11, "desc": [11, 13], "fwd_desc": 11, "prop_kind": 11, "primitive_attr": 11, "set_scratchpad_mod": 11, "scratchpad_mod": 11, "primitive_desc": 11, "fwd_pd": 11, "fwd_primit": 11, "onednn_stream": 11, "creatednnlstream": 11, "std": [11, 35], "unordered_map": 11, "int": [11, 35], "fwd_primitive_arg": 11, "dnnl_arg_src": 11, "src_mem": 11, "dnnl_arg_dst": 11, "dst_mem": 11, "dnnl_arg_scratchpad": 11, "scratchpad_mem": 11, "catch": 11, "protect": 11, "eltwise_relu": 11, "hpp": 11, "It": [11, 14, 15, 16, 17, 18, 19, 20, 21, 27, 29, 33, 34, 39, 46, 49, 
50, 54, 57], "elig": 11, "infer": [11, 15, 17, 18, 19, 24, 27, 31, 39, 40, 46, 50], "backward": [11, 27], "descibl": 11, "click": [11, 34, 35], "header": 11, "itex_xpu_librari": 11, "relu_op": 11, "hdr": [11, 31], "relu_op_functor": 11, "eltwise_base_hdr": 11, "copt": [11, 31], "tf_copt": [11, 31], "linkstat": 11, "dep": [11, 31], "alwayslink": [11, 31], "gpu_kernel": 11, "In": [11, 16, 18, 19, 27, 28, 29, 33, 40, 42, 46, 47, 52, 56], "tip": [11, 20, 29, 31], "compil": [11, 14, 16, 19, 21, 27, 29, 30, 31, 32, 33, 34, 35, 37], "name_scop": 11, "convert_to_tensor": 11, "intel_extension_for_tensorflow": [11, 17, 18, 19, 25, 26, 27, 28, 31, 32, 33, 34, 36, 37, 42, 57], "clean": [11, 35], "xfd": 11, "config": [11, 14, 16, 17, 18, 19, 27, 31, 32, 34, 35, 37, 42, 46, 52, 55, 56], "pip_packag": [11, 34], "build_pip_packag": [11, 34], "uninstal": 11, "intel_extension_for_tensorflow_lib": [11, 34], "x": [11, 19, 25, 26, 27, 34, 35, 42, 47, 52], "constant": [11, 15, 25, 26, 27], "dtype": [11, 19, 25, 26, 47, 56], "float32": [11, 16, 19, 24, 25, 26, 27, 45, 47, 49, 54], "y": [11, 16, 25, 26, 27, 32, 34, 35, 42, 52, 57], "nn_op": 11, "141": 11, "common_runtim": 11, "eager": [11, 25], "1445": 11, "job": [11, 20, 35], "replica": [11, 20], "task": [11, 20, 29, 55], "100": [11, 27, 30, 45], "eltwise_bas": 11, "44": [11, 28], "exec": [11, 13], "ocl": 11, "gen9": 11, "forward_train": 11, "data_f32": 11, "block": [11, 29, 30, 37], "f0": 11, "diff_undef": 11, "undef": 11, "scratchpad": [11, 13], "alg": 11, "5": [11, 18, 19, 20, 22, 25, 27, 30, 32, 34, 35, 36, 45, 47, 51, 56], "xxxxxx": 11, "op_kernel": 11, "773": 11, "object": [12, 14, 18, 27, 29, 30, 42, 43, 45, 48, 49, 50, 51, 54], "cach": [12, 15, 29], "creation": 13, "overhead": [13, 27, 29], "becom": [13, 29], "notic": [13, 27], "especi": [13, 33], "small": [13, 25, 27, 28, 29], "latenc": [13, 42, 48], "bind": [13, 29, 35], "node": [13, 18, 20, 24, 29, 33, 40], "By": [13, 27, 28, 29, 46], "off": [13, 28, 30, 46, 56, 57], "dynam": 
[13, 27, 29], "mean": [13, 14, 18, 25, 27, 28, 29, 34, 35], "invalid": [13, 29], "dim": 13, "meta": 13, "layout": [13, 28, 30], "parallel": [13, 16, 29], "schedul": [13, 25, 28, 29], "thread": [13, 28, 29, 30, 37], "safe": [13, 18, 30, 57], "stream": [13, 48, 56], "demand": [13, 57], "satisfi": [13, 23], "concurr": [13, 29], "case": [13, 18, 19, 21, 27, 28, 29, 42], "mutex": 13, "lock": 13, "weight": [13, 25, 27, 45, 47, 54, 56], "bia": [13, 20, 24, 25, 47], "temporari": 13, "area": 13, "reorder": 13, "argument": [13, 25, 27, 28, 30], "whether": [14, 24, 28, 29], "successfulli": [14, 31, 33, 34, 35, 37, 56], "platform": [14, 16, 27, 29, 30, 32, 34, 35, 36, 45, 48, 49, 50, 51, 54, 55, 56], "zero": [14, 16, 25, 26, 27, 32], "opencl": [14, 16, 32, 37], "And": [14, 32, 36, 37], "high": [14, 16, 17, 27, 29, 57], "list_physical_devic": [14, 19, 27], "tell": 14, "regist": [14, 16, 40, 46], "2021": 14, "07": [14, 25, 37, 55, 56], "01": [14, 30, 55], "06": [14, 27], "40": [14, 28], "55": [14, 28, 29, 56], "510076": 14, "dpcpp_runtim": [14, 27], "116": 14, "select": [14, 16, 27, 28, 30, 48, 57], "physicaldevic": [14, 52], "physical_devic": [14, 52], "know": [14, 19, 27], "rate": [14, 15, 18, 25, 31], "system": [14, 21, 29, 31, 33, 34, 35], "monitor": 14, "capabl": [14, 27], "clock": 14, "frequenc": 14, "eu": 14, "count": 14, "amount": [14, 27], "so": [14, 16, 19, 27, 28, 29, 30, 31, 34, 35, 42, 43, 45, 48, 49, 50, 51, 52, 54], "each": [14, 25, 27, 28, 29, 56], "modul": [14, 16, 17, 28], "relationship": [14, 18], "replac": [14, 25, 26, 31, 35], "stock": [14, 23, 24, 27, 32, 33, 36, 37, 40, 45, 48, 49, 50, 51, 54, 55, 56, 57], "sinc": [14, 27, 29], "9": [14, 16, 18, 25, 28, 33, 34, 40, 41, 50, 56], "That": [14, 29, 34, 35, 42], "them": [14, 18, 21, 27, 28, 29, 31, 50, 55], "unknown": [14, 27], "help": [14, 19, 20, 21, 28, 29, 37, 40, 46], "acceler": [14, 16, 30, 39, 42, 46, 57], "q1": 14, "2024": 14, "discontinu": 14, "upstream": [14, 18], "futur": 14, "current": [14, 17, 22, 
30, 45, 49, 54, 56], "upgrad": [14, 32, 33, 36, 37, 40, 41, 49, 54, 57], "section": [14, 27, 29, 32], "problem": [14, 24, 27, 29], "encount": 14, "sycl": [14, 16], "level_zero_util": 14, "33": [14, 16, 32, 37], "fatal": 14, "level_zero": 14, "ze_api": 14, "modulenotfounderror": 14, "depend": [14, 19, 28, 29, 32, 34, 35, 37], "framework": [14, 32, 35, 42, 43, 44, 45, 48, 49, 51, 54], "errors_impl": [14, 42, 43, 45, 48, 49, 51, 54], "notfounderror": [14, 42, 43, 45, 48, 49, 51, 54], "libmkl_sycl": [14, 42, 43, 45, 48, 49, 51, 54], "cannot": [14, 18, 42, 43, 45, 48, 49, 51, 54], "setvar": [14, 32, 37, 41, 52], "env": [14, 31, 33, 34, 35, 37, 41, 46, 48], "var": [14, 31, 33, 34, 35, 37], "toolkit": [14, 16, 32, 33, 40, 42, 52, 57], "glibcxx_3": 14, "4": [14, 17, 18, 20, 24, 25, 27, 28, 29, 33, 45, 47, 52, 54, 56], "30": [14, 35, 56], "forg": 14, "gxx_linux": 14, "64": [14, 16, 17, 19, 27, 28, 32, 34, 36, 37, 45], "higher": [14, 15, 20, 27, 29], "glibcxx": 14, "veri": [15, 27, 45], "popular": 15, "deep": [15, 29, 39, 57], "techniqu": [15, 27], "invent": 15, "improv": [15, 19, 27, 29, 34, 35, 56], "speed": [15, 18, 29, 39, 40], "minim": [15, 29], "number": [15, 24, 27, 29, 39, 40, 45, 48, 55, 56], "bit": [15, 16, 18, 27, 30, 32, 34, 36, 37, 42], "convert": [15, 17, 18, 19, 27, 40, 42, 49, 55], "real": [15, 27, 55], "valu": [15, 17, 18, 20, 25, 27, 28, 29, 30, 55], "represent": 15, "mainli": [15, 17, 28], "phase": [15, 45], "loss": [15, 18, 19, 39, 40, 46, 52], "accuraci": [15, 18, 19, 25, 27, 39, 40, 46, 52, 55], "reduc": [15, 18, 27, 29, 34, 35, 40, 45, 48, 56], "miss": 15, "cost": 15, "network": [15, 29], "v2": [15, 30, 33, 45, 55], "newer": [15, 40, 41, 46], "integr": [15, 16, 29, 34], "box": 15, "green": 15, "subgraph": 15, "onednngraph": 15, "part": [15, 17, 29, 45, 54], "executor": 15, "partit": [15, 29], "deleg": 15, "grappler": [15, 17, 19, 52], "fold": 15, "itex_tf_constant_fold": [15, 46], "incept": [15, 18, 39, 48], "v3": [15, 39], "introduc": [16, 28, 29], 
"seamlessli": 16, "simplifi": [16, 40], "quickli": [16, 20, 27], "initi": [16, 17, 20, 27, 34, 35], "pytorch": 16, "xla": 16, "numpi": [16, 22, 25, 27, 47, 49], "style": 16, "compos": [16, 17], "transform": [16, 24, 25], "batch": [16, 17, 25, 27, 28, 56], "differenti": [16, 34], "multipl": [16, 18, 20, 29, 56], "_src": 16, "xla_bridg": 16, "register_pjrt_plugin_factori": 16, "getenv": 16, "pjrt_names_and_library_path": 16, "your_itex_path": 16, "libitex_xla_extens": 16, "jaxlib": 16, "xla_extens": 16, "lastest": 16, "interfac": [16, 17, 38, 57], "got": 16, "getpjrtapi": 16, "verifi": [16, 33, 34, 35, 39, 45, 48, 49, 50, 51, 54, 55, 56], "max": [16, 30, 34, 35, 37, 42, 44, 45, 48, 49, 50, 51, 52, 54, 55, 56], "647": [16, 32, 37], "flex": [16, 34, 35, 37, 40, 42, 44, 46, 48, 51, 57], "170": [16, 34, 35, 37, 48, 51], "arc": [16, 34, 35, 37, 42, 57], "red": [16, 37], "hat": [16, 37], "8": [16, 18, 25, 27, 28, 30, 32, 34, 35, 36, 37, 45, 46, 55], "6": [16, 18, 27, 30, 37, 45], "suse": [16, 37], "enterpris": [16, 37], "sle": [16, 37], "sp3": [16, 37], "sp4": [16, 37], "2023": [16, 32, 33, 37, 52], "19": [16, 28, 32, 36, 37], "later": [16, 29, 32, 36, 37], "manylinux2014": [16, 32, 36, 37], "append": [16, 32, 36, 37], "after": [16, 17, 18, 19, 22, 24, 26, 27, 29, 30, 32, 33, 37, 40, 45], "compon": [16, 17, 19, 30, 32, 33, 34, 35, 37], "icd": [16, 32, 37], "23": [16, 28, 32, 37, 56], "17": [16, 28, 32, 35, 37], "26241": [16, 32, 37], "There": [16, 21, 34, 40, 42, 46, 55], "ye": [16, 19, 33], "wish": [16, 34], "n": [16, 18, 24, 25, 29, 30, 33, 34, 35, 47], "libitex": [16, 35], "ld_library_path": [16, 35, 37], "your_python_sit": 16, "info": [16, 17, 18, 28, 35, 40, 42], "jnp": 16, "jit": 16, "def": [16, 27], "lax_conv": 16, "random": [16, 25, 47], "prngkei": 16, "lh": 16, "rh": 16, "side": 16, "lax": 16, "conv_with_general_pad": 16, "multipli": [16, 27], "itex_gpu_runtim": 16, "129": [16, 28], "servic": [16, 49], "176": [16, 32], "0x56060b5ae740": 16, "doe": [16, 24, 27], 
"guarante": [16, 32], "184": 16, "0449753": 16, "093208": 16, "1844783": 16, "9769732": 16, "5857391": 16, "6942389": 16, "9218378": 16, "2862523": 16, "1549542": 16, "8367321": 16, "3978379": 16, "3860377": 16, "9456574": 16, "062028": 16, "0365305": 16, "901286": 16, "5255247": 16, "1421617": 16, "0621": 16, "2933435": 16, "1257985": 16, "1095486": 16, "5584903": 16, "1229166": 16, "7746235": 16, "2446113": 16, "7870374": 16, "8216239": 16, "557919": 16, "9832508": 16, "0887792": 16, "5433128": 16, "9749291": 16, "2580051": 16, "6096935": 16, "264905": 16, "175818": 16, "0094342": 16, "005763": 16, "6559253": 16, "3896458": 16, "4036925": 16, "1342552": 16, "8239582": 16, "6091168": 16, "434404": 16, "671778": 16, "7397764": 16, "930626": 16, "659667": 16, "6508744": 16, "3305787": 16, "4061482": 16, "0829628": 16, "130649": 16, "6637266": 16, "594426": 16, "2636002": 16, "7168686": 16, "8598001": 16, "9009514": 16, "7938274": 16, "4870623": 16, "6193901": 16, "5297288": 16, "0247464": 16, "0905268": 16, "7598859": 16, "9362347": 16, "9513799": 16, "9403584": 16, "1483061": 16, "hlo_pass_pipelin": 16, "301": 16, "hlo": 16, "pipelin": [16, 39, 40, 46], "jit_lax_conv": 16, "181": 16, "fusion_merg": 16, "multi_output_fus": 16, "conv": [16, 17, 24, 47], "convolut": [16, 29], "gpu_compil": 16, "1221": 16, "llvm": 16, "spir_compil": 16, "255": [16, 19, 27], "compiletargetbinari": 16, "compiletospir": 16, "11": [16, 18, 28, 32, 33, 34, 35, 55, 57], "cumul": 16, "99": 16, "74": 16, "pjrt_stream_executor_cli": 16, "2201": 16, "num_replica": 16, "num_partit": 16, "num_addressable_devic": 16, "2268": 16, "replic": 16, "complet": [16, 29], "1208": 16, "pjrtstreamexecutorbuff": 16, "delet": 16, "1299": 16, "toliter": 16, "v0": [16, 30, 33], "mnist_classifi": 16, "given": [17, 25, 28, 49], "tile": [17, 20, 30, 45, 52, 54, 55, 56], "split": [17, 18], "dimens": 17, "As": [17, 24, 27, 28, 29], "first": [17, 18, 19, 22, 24, 25, 27, 28, 29, 32, 33, 36, 37, 45, 49, 54], "limit": 
[17, 29, 57], "homogen": 17, "At": [17, 21, 40, 48], "tfg": 17, "mlir": 17, "assum": [17, 27, 29, 33, 34, 35, 45, 49, 54], "matmul": [17, 20, 24, 26, 35], "normal": [17, 20, 25, 27, 29, 34, 42], "autoshard": [17, 56], "back": [17, 27], "under": [17, 23, 26, 28, 30, 34, 46], "primari": [17, 29], "entri": 17, "point": [17, 18, 20, 27, 30, 32, 37, 42], "auto_sharding_pass_mlir": 17, "invok": 17, "hook": 17, "convers": [17, 18, 19, 24], "between": [17, 18, 19, 21, 29, 31, 34, 48, 55, 56], "graphdef": [17, 18], "dialect": 17, "type_infer": 17, "tfg_to_h": 17, "auto_sharding_pass": 17, "hs_to_tfg": 17, "mark": 17, "scope": [17, 35], "unshard": 17, "annot": 17, "uniniti": 17, "properti": [17, 18, 27], "ir": 17, "heterogen": [17, 57], "reli": 17, "heurist": 17, "hsp": 17, "per": [17, 27, 28, 29, 33, 52, 56], "semant": [17, 20, 25], "final": [17, 19, 27, 45], "accord": [17, 18, 42, 50, 52, 55], "turn": [17, 57], "graphopt": [17, 18, 19, 42, 56], "ON": [17, 30, 42, 56], "flag": [17, 35, 54], "global": [17, 27, 30, 56], "shardingconfig": [17, 56], "mode": [17, 20, 24, 30, 45, 48, 55], "auto_mod": [17, 56], "paramet": [17, 26, 42], "batch_siz": [17, 19, 27, 49, 56], "stage_num": [17, 56], "decid": 17, "device_num": [17, 56], "graph_opt": [17, 18, 19, 30, 42, 46, 56], "sharding_config": [17, 56], "itex_cfg": [17, 56], "configproto": [17, 18, 19, 42, 46, 56], "set_config": [17, 18, 19, 42, 56], "itex_optimizer_before_shard": 17, "pbtxt": 17, "itex_optimizer_after_shard": 17, "resnet50": [17, 28, 39, 53], "train": [17, 18, 21, 24, 25, 26, 28, 31, 32, 33, 37, 38, 39, 42, 45, 46, 51, 53], "fp16": [18, 19, 39, 42, 45], "bf16": [18, 19, 24, 39, 40, 42, 45, 49, 54, 55, 56], "obvious": 18, "compar": [18, 27, 29, 39], "fp32": [18, 19, 20, 24, 39, 40, 45, 46, 54], "danger": 18, "order": [18, 19, 27, 28, 29, 33, 38], "achiev": [18, 29], "faster": [18, 19, 25, 27, 29, 42], "strong": 18, "four": 18, "allowlist": 18, "denylist": 18, "inferlist": 18, "clearlist": 18, "let": [18, 27, 31], 
"balanc": [18, 19], "expect": [18, 33, 46, 57], "alwai": [18, 27], "critic": 18, "addition": [18, 27], "downstream": 18, "too": [18, 27, 32, 37], "exp": 18, "gt": [18, 30, 56], "due": [18, 29], "effect": [18, 28, 29], "desir": [18, 28], "explain": 18, "principl": 18, "index": [18, 29], "7": [18, 27, 28, 30, 45, 48], "everi": [18, 20, 48], "ii": [18, 19, 30], "whose": 18, "iii": [18, 19], "deni": 18, "ignor": [18, 27, 31], "iv": [18, 19], "insert": [18, 19, 24, 46], "increas": [18, 27, 46], "priorit": 18, "auto_mixed_precision_opt": [18, 19, 42], "automixedprecosionopt": 18, "16": [18, 27, 28, 30, 36, 42, 45, 55], "32": [18, 25, 26, 27, 28, 30, 42, 45, 51, 55], "data_typ": [18, 19, 42], "itex_auto_mixed_precision_data_typ": [18, 19, 42], "ampthre": 18, "default_data_typ": [18, 30], "unsafe_force_al": 18, "itex_auto_mixed_precision_unsafe_force_al": 18, "allowlist_add": [18, 19], "itex_auto_mixed_precision_allowlist_add": [18, 19], "string": [18, 27, 28, 34, 35], "denylist_add": 18, "itex_auto_mixed_precision_denylist_add": 18, "inferlist_add": 18, "itex_auto_mixed_precision_inferlist_add": 18, "clearlist_add": 18, "itex_auto_mixed_precision_clearlist_add": 18, "allowlist_remov": 18, "itex_auto_mixed_precision_allowlist_remov": 18, "denylist_remov": 18, "itex_auto_mixed_precision_denylist_remov": 18, "inferlist_remov": [18, 19], "itex_auto_mixed_precision_inferlist_remov": [18, 19], "clearlist_remov": 18, "itex_auto_mixed_precision_clearlist_remov": 18, "avgpool": [18, 19], "mani": [18, 21, 27, 28, 29, 52], "extra": [18, 27], "up": [18, 22, 27, 29, 32, 35, 39, 45, 48, 51], "tabl": [18, 27, 28], "correspond": [18, 28], "itex_auto_mixed_precision_log_path": [18, 19, 20, 30], "tf_auto_mixed_precision_graph_rewrite_log_path": 18, "tf_auto_mixed_precision_graph_rewrite_level": 18, "tf_auto_mixed_precision_graph_rewrite_allowlist_add": 18, "tf_auto_mixed_precision_graph_rewrite_denylist_add": 18, "tf_auto_mixed_precision_graph_rewrite_inferlist_add": 18, 
"tf_auto_mixed_precision_graph_rewrite_clearlist_add": 18, "tf_auto_mixed_precision_graph_rewrite_allowlist_remov": 18, "tf_auto_mixed_precision_graph_rewrite_denylist_remov": 18, "tf_auto_mixed_precision_graph_rewrite_inferlist_remov": 18, "tf_auto_mixed_precision_graph_rewrite_clearlist_remov": 18, "With": [18, 19, 27, 28, 40, 43, 47, 48], "most": [18, 19, 27, 28, 29, 42, 50], "basic": [18, 19, 20, 27], "itexauto_mixed_precision_opt": [18, 19], "automixedprecisionopt": [18, 19, 42], "float16graph_opt": [18, 19], "auto_mixed_precision_optionsgraph_opt": 18, "auto_mixed_precis": [18, 19, 30, 42], "onconfig": [18, 19], "itex_auto_mixed_precis": [18, 19, 28, 30, 42], "1export": [18, 19], "avgpool3d": [18, 19], "cnn": [18, 29, 39, 40], "v4": [18, 39], "epoch": [18, 19, 27, 45, 52, 54], "slower": [18, 19, 27], "becaus": [18, 19, 27], "subsequ": [18, 19, 27, 29, 48], "alreadi": [18, 27, 33, 40], "howev": [18, 21, 24, 27, 28, 29, 48], "usual": 18, "chanc": [18, 27], "my": [18, 19], "automixedprecis": 18, "1657011814330": 18, "pb": [18, 19, 31, 42], "binari": [18, 31, 34, 35], "txt": [18, 32, 37, 48, 51, 56], "text": [18, 39], "preop": 18, "1657011815538": 18, "pre": [18, 30, 36, 37, 45, 50, 54], "paintbucket": 18, "netron": 18, "softmax": [18, 19, 27], "move": [18, 29, 45, 49, 54], "altern": 18, "abov": [18, 19, 22, 27, 28, 29, 32, 42, 45, 46, 49, 50, 51, 52, 54, 56], "littl": 18, "drop": [18, 28], "occupi": 18, "over": [18, 27], "whole": [18, 20, 30, 45], "runtim": [18, 23, 25, 27, 29, 32, 34, 35, 57], "repeat": 18, "until": [18, 29], "reach": 18, "peak": [18, 23], "consumpt": [19, 21, 27, 42], "kera": [19, 25, 26, 46, 48, 52, 57], "similar": [19, 29], "offer": [19, 29], "frozen": 19, "layernorm": [19, 24, 26], "instancenorm": [19, 26], "swish": [19, 24], "power": [19, 57], "versu": [19, 29], "remapp": [19, 24, 30], "exist": [19, 24, 26, 27, 28, 40], "cover": [19, 21, 24, 28, 29], "than": [19, 25, 27, 29, 32, 37, 42, 47, 52], "knowledg": [19, 29], "possibl": [19, 29, 
34], "special": [19, 23, 27, 34, 35], "bfloat16graph_opt": 19, "4096": [19, 27], "unit": [19, 25, 27, 29], "num_unit": [19, 27], "els": [19, 27, 35, 55], "784": [19, 27, 28], "digit": [19, 27], "dens": [19, 20, 27], "dense_1": [19, 27], "dense_2": [19, 27], "dense_logit": [19, 27], "predict": [19, 26, 27, 51], "sparse_categorical_crossentropi": [19, 27], "rmsprop": [19, 27], "metric": [19, 27], "x_train": [19, 27], "y_train": [19, 27], "x_test": [19, 27], "y_test": [19, 27], "dataset": [19, 27, 46, 52], "mnist": [19, 27, 31, 39, 52], "load_data": [19, 27], "reshap": [19, 25, 27], "60000": [19, 27], "astyp": [19, 27, 47], "10000": [19, 25, 27], "histori": [19, 27], "fit": [19, 29], "8192": [19, 27], "validation_split": [19, 27], "test_scor": [19, 27], "evalu": [19, 27, 48, 51], "stabil": [19, 27], "rule": 19, "introduct": [19, 57], "adjust": [20, 25], "Not": 20, "rest": [20, 24], "ll": [20, 24], "prioriti": [20, 30], "itex_tile_as_devic": 20, "card": [20, 52], "treat": 20, "itex_fp32_math_mod": 20, "math": [20, 24, 27, 32, 37], "tf32": 20, "bf32": 20, "auto_mixed_precision_log_path": [20, 30], "tf_cpp_max_vlog_level": 20, "itex_cpp_min_log_level": 20, "tf_cpp_min_log_level": 20, "displai": 20, "onc": [20, 27, 29], "across": [20, 25], "iter": [20, 56], "larg": [20, 27, 29, 39], "dump": 20, "bert": [20, 39], "encod": 20, "layer_0": 20, "biasadd": [20, 26], "read": [20, 27, 40, 49], "dt_float": [20, 35], "data_format": [20, 56], "nhwc": [20, 29], "remain": 20, "situat": [20, 30], "preserv": 20, "dpc": [21, 32, 33, 34, 35, 37], "besid": [21, 29], "etc": [21, 32], "aka": 21, "almost": 21, "thing": 21, "expos": [21, 22, 57], "factor": [21, 28], "influenc": [21, 28, 29], "properli": [21, 28], "unifi": [21, 28], "topologi": [21, 28, 29], "combin": [21, 28, 29, 48], "autom": [21, 28], "complic": [21, 28], "launch": [21, 37, 48], "blob": [21, 31], "20230123": 21, "md": 21, "openxla_support_on_gpu": 21, "tfx": 21, "bridg": [21, 31], "streamlin": [21, 31], "deploi": [21, 31], 
"while": [21, 27, 29, 30, 31, 34, 43, 47, 50], "effici": [21, 29, 31, 56], "easi": [21, 40, 57], "track": [22, 50], "item": 22, "stat": 22, "trace": 22, "viewer": 22, "tensorflow_hub": 22, "tensorboard": [22, 57], "np": [22, 25, 47, 49, 52, 54, 55], "tf_hub": 22, "logpath": 22, "join": [22, 29], "profiler_demo": 22, "set_log_device_plac": 22, "keraslay": 22, "tfhub": 22, "imagenet": 22, "resnet_v1_50": 22, "classif": 22, "ones": [22, 25, 26, 30], "224": 22, "warm": 22, "stop": [22, 29], "demo": 22, "logdir": 22, "bind_al": 22, "analyz": 22, "tab": 22, "dashboard": 22, "refresh": 22, "bring": [23, 27, 28, 57], "deeper": 23, "choos": [23, 25, 27, 28, 29, 34, 35, 38, 42, 46, 47, 49], "These": [24, 27, 28, 57], "equal": [24, 29], "notequ": 24, "greaterequ": 24, "greater": [24, 29], "lessequ": 24, "l2loss": 24, "addn": 24, "batchmatmul": [24, 26], "mul": 24, "trainingop": 24, "relu6": 24, "elu": 24, "leakyrelu": 24, "gelu_erf": 24, "gelu_tanh": 24, "tanh": [24, 25, 26], "sigmoid": [24, 25, 26], "fusedbatchnorm": 24, "fusedbatchnormgrad": 24, "relugrad": 24, "biasaddgrad": 24, "convgradfilt": 24, "pad": [24, 25, 47], "break": 24, "closer": 24, "accmatmul": 24, "fusedmatmul": 24, "fusedaccmatmul": 24, "matcher": 24, "withsum": 24, "attribut": [24, 30], "tout": 24, "tpost": 24, "is_bf16_math_mod": 24, "boolean": [24, 28], "indic": [24, 27, 42, 56], "transpos": [24, 26], "conv3d": 24, "maxpool3d": 24, "unnecessari": [24, 27, 29], "ndhwc": 24, "ncdhw": 24, "adam": 25, "decai": 25, "weight_decay_r": 25, "001": [25, 26], "learning_r": [25, 51], "beta_1": 25, "beta_2": 25, "999": 25, "epsilon": [25, 26], "1e": [25, 27], "exclude_from_weight_decai": 25, "layer_norm": 25, "kwarg": [25, 26], "adamw": 25, "describ": [25, 27, 28, 29], "decoupl": 25, "regular": 25, "loshch": 25, "ilov": 25, "hutter": 25, "pdf": 25, "tfa": [25, 26, 49], "trainabl": 25, "piecewiseconstantdecai": 25, "15000": 25, "lr": [25, 52], "wd": 25, "lambda": 25, "ba": 25, "et": 25, "al": 25, "2016": 25, "axi": 
[25, 26], "scale": [25, 26, 56], "beta_initi": [25, 26], "gamma_initi": [25, 26], "beta_regular": [25, 26], "gamma_regular": [25, 26], "beta_constraint": [25, 26], "gamma_constraint": [25, 26], "independ": [25, 28], "rather": 25, "close": [25, 29], "deviat": 25, "arang": 25, "99998": 25, "group": [25, 29], "yuxin": 25, "wu": 25, "kaim": 25, "he": 25, "divid": [25, 27, 29], "varianc": 25, "empir": 25, "stabl": [25, 27, 39, 57], "norm": 25, "wide": [25, 39], "rang": [25, 27, 29], "linearli": 25, "4d": 25, "gaussian": 25, "where": [25, 27, 29, 34], "nonlinear": 25, "gate": 25, "sign": [25, 32], "arrai": 25, "00404969": 25, "15865526": 25, "8413447": 25, "9959502": 25, "00363725": 25, "158808": 25, "841192": 25, "9963627": 25, "long": 25, "short": [25, 27], "hochreit": 25, "schmidhub": 25, "1997": 25, "lstm": 25, "200": [25, 26], "recurrent_activ": [25, 26], "use_bia": [25, 26], "kernel_initi": [25, 26], "glorot_uniform": [25, 26], "recurrent_initi": [25, 26], "orthogon": [25, 26], "bias_initi": [25, 26], "constraint": 25, "fallback": 25, "fast": 25, "mask": [25, 39], "strictli": 25, "outermost": 25, "return_sequ": 25, "return_st": 25, "whole_seq_output": 25, "final_memory_st": 25, "final_carry_st": 25, "experimental_ops_overrid": [26, 30], "overload": 26, "kept": [26, 27], "layernormgrad": 26, "itexlayernorm": 26, "itexlayernormgrad": 26, "itexgelu": 26, "itexgelugrad": 26, "addon": [26, 52], "itexlstm": 26, "itexrnn": 26, "mixed_precis": 27, "mixed_float16": 27, "mixed_bfloat16": 27, "distinguish": 27, "nvidia": [27, 45, 48, 49, 54], "is_gpu_avail": 27, "test_func": 27, "identif": 27, "2022": [27, 28, 30], "14": [27, 52], "02": [27, 55], "52": [27, 28], "41": 27, "061277": 27, "w": [27, 39], "gpu_profil": 27, "111": [27, 29], "warn": [27, 28, 35], "061301": 27, "114": [27, 52], "061306": 27, "118": 27, "063685": 27, "063851": 27, "stream_executor": 27, "cuda": 27, "cuda_driv": 27, "269": 27, "cuinit": 27, "303": 27, "063865": 27, "cuda_diagnost": 27, "156": 27, 
"dut3046": 27, "atsp": 27, "proc": [27, 29], "caus": [27, 29, 50], "set_global_polici": 27, "slowli": 27, "least": [27, 32, 33], "multi": [27, 29, 30, 33, 34, 54, 56], "worker": 27, "messag": [27, 28], "aspect": 27, "constructor": 27, "numer": 27, "queri": 27, "compute_dtyp": 27, "variable_dtyp": 27, "mention": [27, 29], "next": 27, "domin": 27, "neglig": 27, "therefor": [27, 29], "fewer": 27, "finish": [27, 34, 47, 50], "dense1": 27, "dense2": 27, "previous": 27, "Their": 27, "mismatch": 27, "dtype_polici": 27, "incorrect": 27, "end": [27, 39, 40, 46], "would": [27, 32, 34, 55], "correct": [27, 34, 35], "keep": [27, 29], "middl": 27, "fine": [27, 28, 29, 45], "intermedi": 27, "flow": 27, "occur": 27, "think": 27, "But": 27, "necessari": [27, 32, 36, 37, 47], "last": [27, 50], "suffici": 27, "even": [27, 28, 29, 38, 57], "still": 27, "simpli": [27, 56], "particular": 27, "storag": [27, 42, 50], "googleapi": [27, 42, 50], "npz": 27, "11490434": 27, "1u": 27, "don": 27, "divis": 27, "retriev": 27, "scratch": [27, 45], "again": 27, "initial_weight": 27, "get_weight": 27, "6240": 27, "3359": 27, "val_loss": 27, "9755": 27, "val_accuraci": 27, "7494": 27, "83m": 27, "7987": 27, "7520": 27, "3455": 27, "8972": 27, "81m": 27, "3670": 27, "8819": 27, "3753": 27, "8751": 27, "85m": 27, "3555": 27, "8863": 27, "2155": 27, "9377": 27, "84m": 27, "1986": 27, "9410": 27, "4498": 27, "8534": 27, "spend": 27, "afterward": [27, 28, 29], "colab": 27, "rerun": 27, "cell": [27, 48], "On": [27, 29, 32, 36, 37], "significantli": 27, "sped": 27, "world": 27, "doubl": 27, "toi": 27, "entir": 27, "60": [27, 28, 45], "000": 27, "imag": [27, 36, 37, 39, 48], "narrow": 27, "65504": 27, "infin": 27, "much": [27, 29, 46], "256": [27, 56], "inf": 27, "rare": 27, "gradient": 27, "prevent": 27, "concept": [27, 29], "sai": [27, 45], "1024": 27, "greatli": 27, "pseudocod": 27, "loss_scal": 27, "grad": 27, "compute_gradi": 27, "trainable_vari": 27, "tricki": 27, "solv": 27, "explicitli": [27, 28, 
30, 46], "wrapper": [27, 37], "lossscaleoptim": 27, "far": 27, "did": [27, 29], "wrap": 27, "highli": 27, "recommend": [27, 29, 30, 31, 32, 33, 34, 36, 37, 41, 46], "been": [27, 29, 48, 56], "known": [27, 50], "loss_object": 27, "sparsecategoricalcrossentropi": 27, "train_dataset": 27, "from_tensor_slic": 27, "shuffl": 27, "test_dataset": 27, "method": [27, 29, 40, 46], "unscal": 27, "get_scaled_loss": 27, "get_unscaled_gradi": 27, "apply_gradi": 27, "nan": 27, "halv": 27, "had": [27, 29], "potenti": [27, 57], "train_step": [27, 45, 56], "gradienttap": 27, "tape": 27, "scaled_loss": 27, "scaled_gradi": 27, "zip": 27, "few": [27, 55], "happen": [27, 50], "qualiti": 27, "test_step": 27, "retrain": 27, "set_weight": 27, "epoch_loss_avg": 27, "test_accuraci": 27, "sparsecategoricalaccuraci": 27, "update_st": 27, "924008369445801": 27, "7239000201225281": 27, "5294489860534668": 27, "9168000221252441": 27, "3364005982875824": 27, "9381000399589539": 27, "25294047594070435": 27, "9486000537872314": 27, "26531240344047546": 27, "9536000490188599": 27, "perspect": [28, 29], "numactl": 28, "placement": [28, 29], "polici": [28, 29, 57], "malloc": [28, 29], "unspecifi": 28, "knob": 28, "your_script": 28, "your_script_arg": 28, "latency_mod": 28, "throughput_mod": 28, "often": [28, 32, 36, 37], "calcul": [28, 48], "mutual": 28, "exclus": 28, "infer_resnet50": [28, 43], "undesir": 28, "log_path": 28, "absolut": 28, "rel": 28, "One": [28, 29], "prefix": 28, "_timestamp_inst": 28, "anoth": [28, 29], "_timestamp_instance_n_cor": 28, "run_20210712212258_inst": 28, "run_20210712212258_instance_0_cores_0": 28, "43": 28, "interpret": 28, "no_python": 28, "prepend": [28, 49, 54, 55], "log_file_prefix": 28, "yourself": 28, "ninstanc": 28, "integ": 28, "instance_idx": 28, "among": [28, 29], "ncore_per_inst": 28, "resourc": [28, 29, 50], "node_id": 28, "skip_cross_node_cor": 28, "cross": [28, 29], "disable_numactl": 28, "disable_taskset": 28, "taskset": 28, "use_logical_cor": 28, 
"core_list": 28, "core_id": 28, "enable_tcmalloc": 28, "enable_jemalloc": 28, "use_default_alloc": 28, "prefer": [28, 32, 36, 37], "certain": [28, 29], "openmp": 28, "kmp_affin": [28, 29], "granular": [28, 29], "compact": [28, 29], "hyper": [28, 29], "our": 28, "enable_itex_amp": 28, "enable_itex_layout_opt": 28, "itex_layout_opt": [28, 29, 30], "num": [28, 29], "intraop": 28, "interop": 28, "run_20221009103552_instance_0_cores_0": 28, "run_20221009103552_inst": 28, "cat": 28, "09": [28, 55], "35": [28, 37], "53": 28, "136": 28, "__main__": 28, "neither": 28, "nor": 28, "conda_prefix": 28, "virtual_env": 28, "lib64": 28, "sdp": 28, "ld_preload": [28, 29], "omp_num_thread": 28, "96": [28, 35], "kmp_blocktim": [28, 29], "tf_enable_onednn_opt": 28, "137": 28, "localalloc": 28, "95": 28, "tee": [28, 32, 45, 56], "run_20221009104740_inst": 28, "run_20221009104740_instance_0_cores_0": 28, "191": 28, "47": [28, 55], "908": 28, "909": 28, "192": 28, "run_20221009105044_inst": 28, "run_20221009105044_instance_0_cores_12": 28, "50": [28, 54], "693": 28, "694": 28, "run_20221009105320_inst": 28, "run_20221009105320_instance_0_cores_0": 28, "21": 28, "089": 28, "090": 28, "run_20221009105838_inst": 28, "run_20221009105838_instance_0_cores_0": 28, "run_20221009105838_instance_1_cores_12": 28, "run_20221009105838_instance_2_cores_24": 28, "run_20221009105838_instance_3_cores_36": 28, "run_20221009105838_instance_4_cores_48": 28, "59": 28, "run_20221009105838_instance_5_cores_60": 28, "71": 28, "run_20221009105838_instance_6_cores_72": 28, "83": [28, 29], "run_20221009105838_instance_7_cores_84": 28, "58": 28, "38": 28, "757": 28, "772": 28, "795": 28, "24": [28, 52], "806": 28, "36": 28, "817": 28, "48": [28, 55], "828": 28, "839": 28, "72": 28, "850": 28, "84": [28, 29], "run_20221009110327_inst": 28, "run_20221009110327_instance_0_cores_0": 28, "run_20221009110327_instance_1_cores_4": 28, "run_20221009110327_instance_2_cores_8": 28, "run_20221009110327_instance_3_cores_12": 
28, "run_20221009110327_instance_4_cores_16": 28, "run_20221009110327_instance_5_cores_20": 28, "run_20221009110327_instance_6_cores_24": 28, "27": [28, 29, 56], "run_20221009110327_instance_7_cores_28": 28, "31": [28, 32], "run_20221009110327_instance_8_cores_32": 28, "run_20221009110327_instance_9_cores_36": 28, "39": 28, "run_20221009110327_instance_10_cores_40": 28, "run_20221009110327_instance_11_cores_44": 28, "run_20221009110327_instance_12_cores_48": 28, "51": 28, "run_20221009110327_instance_13_cores_52": 28, "run_20221009110327_instance_14_cores_56": 28, "run_20221009110327_instance_15_cores_60": 28, "63": 28, "run_20221009110327_instance_16_cores_64": 28, "67": 28, "run_20221009110327_instance_17_cores_68": 28, "run_20221009110327_instance_18_cores_72": 28, "75": 28, "run_20221009110327_instance_19_cores_76": 28, "79": 28, "run_20221009110327_instance_20_cores_80": 28, "run_20221009110327_instance_21_cores_84": 28, "87": 28, "run_20221009110327_instance_22_cores_88": 28, "91": 28, "run_20221009110327_instance_23_cores_92": 28, "03": [28, 55], "198": 28, "215": 28, "216": 28, "229": 28, "241": 28, "254": 28, "266": 28, "278": 28, "20": [28, 36, 54, 56], "290": 28, "302": 28, "28": [28, 29, 33, 37], "315": 28, "327": 28, "339": 28, "351": 28, "364": 28, "376": 28, "388": 28, "56": [28, 29], "400": [28, 55], "413": 28, "425": 28, "68": 28, "438": 28, "452": 28, "76": 28, "465": 28, "80": 28, "480": 28, "494": 28, "88": [28, 51], "509": 28, "92": 28, "run_20221009110849_inst": 28, "run_20221009110849_instance_0_cores_0": 28, "run_20221009110849_instance_1_cores_11": 28, "run_20221009110849_instance_2_cores_22": 28, "run_20221009110849_instance_3_cores_33": 28, "08": [28, 55], "49": [28, 37, 55], "891": 28, "892": 28, "run_20221009110849_instance_1_cores_24": 28, "930": 28, "run_20221009110849_instance_2_cores_48": 28, "951": 28, "run_20221009110849_instance_3_cores_72": 28, "confirm": [28, 34, 35], "34": 28, "586": 28, "assign": [28, 29, 35], "604": 28, 
"605": 28, "run_20221009111034_instance_0_cores_0": 28, "144": 28, "145": [28, 55, 56], "run_20221009111239_instance_0_cores_24": 28, "run_20221009111753_inst": 28, "run_20221009111753_instance_0_cores_0": 28, "947": 28, "948": 28, "run_20221009111951_inst": 28, "run_20221009111951_instance_0_cores_0": 28, "404": 28, "405": 28, "match": [28, 38], "conf": 28, "549": 28, "550": 28, "malloc_conf": 28, "oversize_threshold": 28, "background_thread": 28, "metadata_thp": 28, "run_20221009112720_instance_0_cores_0": 28, "29": 28, "05": [28, 52], "206": 28, "207": 28, "run_20221009112905_instance_0_cores_0": 28, "911": 28, "run_20221009112956_instance_0_cores_0": 28, "although": 29, "articl": 29, "omp": 29, "briefli": 29, "background": 29, "being": 29, "socket": [29, 33, 56], "competit": 29, "stall": 29, "busi": 29, "uma": 29, "connect": 29, "control": [29, 39, 46, 56], "remot": 29, "lscpu": [29, 46], "platinum": 29, "8180m": 29, "detect": 29, "onboard": 29, "logic": 29, "thu": 29, "total": [29, 52], "112": 29, "second": [29, 46, 55, 56], "neg": 29, "50ghz": 29, "node0": 29, "node1": 29, "friendli": 29, "nchw": 29, "idea": 29, "bound": 29, "workload": [29, 39, 46, 57], "nth": 29, "man": 29, "cpunodebind": 29, "membind": 29, "wikipedia": [29, 45], "wherebi": 29, "master": [29, 31], "consecut": 29, "fork": 29, "figur": 29, "illustr": 29, "libgomp": 29, "libiomp": 29, "region": 29, "along": 29, "seen": 29, "coupl": 29, "commonli": 29, "gomp": 29, "affin": 29, "comma": 29, "hyphen": 29, "contigu": 29, "gomp_cpu_affin": 29, "omp_proc_bind": 29, "omp_schedul": 29, "static": 29, "ld": 29, "preload": 29, "libiomp5": [29, 35], "kmp": 29, "dramat": 29, "togeth": 29, "thrash": 29, "suppos": [29, 45], "leav": 29, "compet": 29, "strategi": 29, "proclist": 29, "classic": 29, "blocktim": 29, "millisecond": 29, "wait": 29, "sleep": 29, "200m": 29, "elaps": 29, "larger": [29, 34, 35], "reserv": 29, "sole": 29, "penal": 29, "plai": 29, "role": 29, "destruct": 29, "reus": [29, 40], 
"jemalloc": 29, "hold": 29, "dealloc": 29, "costli": 29, "gperftool": 29, "plu": 29, "nice": 29, "analysi": 29, "xzvf": 29, "heap": 29, "checker": 29, "debugalloc": 29, "flexibl": 30, "protocolmessag": 30, "easili": 30, "tune": [30, 40, 45], "offononoffoff": 30, "itex_onednn_graph": [30, 46], "itex_layout_optitex_remapperitex_auto_mixed_precisionitex_shard": 30, "except": [30, 37], "enum": 30, "itexdatatyp": 30, "datatyp": [30, 35, 45, 49, 54], "toggl": 30, "unless": 30, "field": 30, "onednn_graph": 30, "onednn_graphoverrid": 30, "layout_opt": 30, "itex_remapp": 30, "itex_shard": 30, "xpu_force_sync": 30, "itex_sync_exec": 30, "sync": 30, "hurt": 30, "rais": 30, "valueerror": 30, "git_vers": [30, 33], "7112d33": 30, "onednn_cpu_git_vers": 30, "a930253": 30, "onednn_gpu_git_vers": 30, "compiler_vers": 30, "gcc": 30, "20180905": 30, "dpcpp": [30, 32], "122": 30, "tf_compatible_vers": 30, "lt": 30, "put": 31, "libitex_cpu_cc": [31, 35], "libitex_gpu_cc": [31, 35], "l28": 31, "exit": 31, "xxxxx": [31, 56], "kernels_experiment": 31, "tf_cuda_librari": 31, "if_not_mobil": 31, "p1": 31, "tf_serv": 31, "serving_plugin": 31, "l24": 31, "l29": 31, "local_repositori": 31, "org_tensorflow": 31, "wno": 31, "stringop": 31, "truncat": 31, "rm": [31, 35, 41, 42, 55], "rf": [31, 41, 55], "tmp": 31, "mnist_saved_model": 31, "saved_model": 31, "l": [31, 35], "modelserv": 31, "plug": [31, 57], "hub": 31, "port": [31, 48], "rest_api_port": 31, "8501": 31, "model_base_path": 31, "tensorflow_plugin": 31, "path_to_libitex_cpu_cc": 31, "oneapi_install_path": 31, "path_to_libitex_gpu_cc": 31, "mnist_client": 31, "num_test": 31, "1000": [31, 49, 55], "xx": [31, 55], "earli": 32, "effort": 32, "basi": 32, "subystem": 32, "graphic": [32, 34, 35], "101": 32, "4255": 32, "dch": 32, "gpg": 32, "agent": 32, "qo": 32, "dearmor": 32, "keyr": 32, "echo": 32, "deb": 32, "arch": 32, "i386": 32, "jammi": 32, "igc": 32, "cm": 32, "libigc1": 32, "13822": 32, "libigdfcl1": 32, "libigdgmm12": 32, "pub": 32, 
"sw": 32, "archiv": 32, "instead": [32, 45, 48, 49, 54], "icd_23": 32, "04_": 32, "isol": [32, 36, 37], "basekit": [32, 33, 37], "weekli": 32, "env_check": [32, 33, 37, 57], "access": 32, "onemkl": [32, 33, 34, 35, 37], "registrationcent": [32, 37], "akdlm": [32, 37], "irc_na": [32, 37], "992857b9": [32, 37], "624c": [32, 37], "45de": [32, 37], "9701": [32, 37], "f6445d845359": [32, 37], "l_basekit_p_2023": [32, 37], "49397_offlin": [32, 37], "mpi": [32, 33, 37], "deploy": [33, 36, 37], "miniconda": 33, "approach": 33, "easiest": 33, "setup": [33, 36, 38, 40], "press": 33, "curl": 33, "anaconda": 33, "miniconda3": 33, "x86_64": [33, 34, 35], "restart": 33, "termin": 33, "bashrc": 33, "intelpython3_ful": 33, "142f5f29": 33, "ccl": [33, 37], "cluster": 33, "fi_provid": 33, "though": 34, "virtual": [34, 35, 45, 46, 48, 49, 50, 51, 54, 55, 56], "itex_build": [34, 35], "aot": [34, 35], "ahead": [34, 35], "startup": [34, 35], "prolong": [34, 35], "minut": [34, 35], "tookit": [34, 35], "tree": [34, 35], "prompt": [34, 35], "differenct": [34, 35], "fill": [34, 35], "ats": [34, 35], "m150": [34, 35], "acm": [34, 35], "g11": [34, 35], "ve": [34, 35], "140": [34, 35], "m75": [34, 35], "pvc": [34, 35], "a730m": [34, 35], "g10": [34, 35], "a380": [34, 35], "wrong": [34, 35], "identifi": 34, "libitex_common": 34, "_pywrap_itex": 34, "libitex_cpu": 34, "libitex_gpu": 34, "preconfigur": 34, "bazelrc": 34, "shoul": 35, "diretcori": 35, "llvm_openmp": 35, "pythonhost": 35, "ed": 35, "310fee0477ce46f722c561dd7e21eebca0d1d29bdb3cf4a2335b845fbba4": 35, "cp311": 35, "manylinux_2_17_x86_64": 35, "manylinux2014_x86_64": 35, "b": [35, 42, 46, 55, 56], "unzip": 35, "tensorflow_2": 35, "symbol": 35, "ln": 35, "libtensorflow_cc": 35, "libtensorflow_framework": 35, "libtensorflow": 35, "r2": [35, 56], "install_head": 35, "environment": 35, "library_path": 35, "tf_loadpluggabledevicelibrari": 35, "c_api_experiment": 35, "tf_statu": 35, "lib_path": 35, "client_sess": 35, "standard_op": 35, 
"newrootscop": 35, "assign_x": 35, "randomnorm": 35, "assign_i": 35, "z": [35, 52], "const": 35, "vz": 35, "vector": 35, "clientsess": 35, "session": [35, 46], "fetch": 35, "tf_check_ok": 35, "matrix": 35, "xpu_lib_path": 35, "c_str": 35, "tf_code": 35, "status_msg": 35, "tf_messag": 35, "makefil": 35, "tf_include_path": 35, "tfcc_path": 35, "example_test": 35, "ltensorflow_framework": 35, "ltensorflow_cc": 35, "wl": 35, "rpath": 35, "2nd": 36, "4th": [36, 42], "cento": 36, "sapphir": [36, 42], "rapid": [36, 42], "8888": [36, 37, 42, 46, 48, 50], "pip3": 36, "simultan": 37, "stack": [37, 38], "intel64": 37, "libfabr": 37, "i_mpi_root": 37, "ccl_root": 37, "fi_provider_path": 37, "tbb": 37, "libiari": 37, "en": 37, "consol": 37, "00": [37, 55], "374832": 37, "itex_cpu_wrapp": 37, "42": 37, "217981": 37, "itex_gpu_wrapp": 37, "205706": 37, "313231": 37, "varieti": 39, "classifi": [39, 55], "bare": 39, "metal": 39, "alexnet": 39, "recogn": [39, 40], "handwrit": [39, 40], "ai": [39, 40, 44, 46, 57], "zoo": 39, "diffus": [39, 57], "text2imag": 39, "pretrain": 39, "3d": 39, "unet": 39, "medic": 39, "segment": [39, 54], "technologi": 40, "big": 40, "blocker": 40, "analyt": 40, "websit": [40, 57], "env_nam": 41, "env_itex": [41, 42, 46, 48, 49, 50, 52, 54], "venv": [41, 49, 52, 54], "internet": 42, "throughput": [42, 48], "seriesintel": 42, "170intel": 42, "seriesne": 42, "seriessupport": 42, "itex_repo": 42, "pwd": [42, 56], "infer_inception_v4_amp": 42, "v1_8": 42, "inceptionv4_fp32_pretrained_model": 42, "set_env_gpu": [42, 43, 50], "ws1": 42, "infer_fp32_vs_amp": 42, "screen": 42, "01837550401687622": 42, "0113076031208038": 42, "fp": 42, "128": [42, 45, 51], "92880015134813": 42, "1691980294577": 42, "6153628825864496": 42, "867908472383153": 42, "wors": 42, "set_env_cpu": [43, 50], "env_itex_cpu": [43, 50], "success": [43, 47, 48, 56], "n02123159": 43, "tiger_cat": 43, "22355853": 43, "legaci": [45, 48, 49, 50, 51, 54, 55, 56], "deeplearningexampl": [45, 49, 54], 
"tensorflow2": [45, 52, 54], "languagemodel": 45, "pip_set_env": [45, 46, 48, 49, 51, 54, 55], "extract": 45, "squad": [45, 51], "bookcorpu": 45, "data_download": 45, "v1": [45, 46, 51, 57], "google_pretrained_weight": 45, "uncased_l": 45, "24_h": 45, "1024_a": 45, "12_h": 45, "768_a": 45, "tfrecord": [45, 49, 55], "books_wiki_en_corpu": 45, "consum": 45, "v100": 45, "dai": 45, "pretrain_bert": 45, "lamb": 45, "maximum": 45, "sequenc": [45, 48], "length": 45, "phase1": 45, "phase2": 45, "512": [45, 51], "train_batch_size_phase1": 45, "train_batch_size_phase2": 45, "eval_batch_s": 45, "learning_rate_phase1": 45, "5e": 45, "learning_rate_phase2": 45, "usa_xla": 45, "num_gpu": [45, 56], "warmup_steps_phase1": 45, "660": 45, "warmup_steps_phase2": 45, "66": 45, "2600": 45, "save_checkpoint_step": 45, "num_accumulation_steps_phase1": 45, "num_accumulation_steps_phase2": 45, "bert_model": [45, 51], "gbs1": 45, "expr": 45, "gbs2": 45, "pretrain_result_dir": 45, "tf_bert_pretraining_lamb_": 45, "_gbs1_": 45, "_gbs2_": 45, "data_dir": [45, 49, 54, 55], "run_pretraining_lamb": 45, "pretrain_lamb": 45, "checkpoint": 45, "batch_size_per_gpu": 45, "learning_rate_per_gpu": 45, "use_xla": 45, "squad_vers": 45, "use_mytrain": 45, "pretrain_path": 45, "phase_2": 45, "ckpt": [45, 51], "result_dir": 45, "tf_bert_finetune_": 45, "run_squad": [45, 51], "calibr": 46, "qdq": 46, "dequant": 46, "flower": 46, "photo": 46, "transfer": 46, "stage": 46, "protobuf": 46, "rewriter_config_pb2": 46, "infer_config": 46, "rewrite_opt": 46, "constant_fold": 46, "rewriterconfig": 46, "set_sess": 46, "speedup": [46, 56], "grep": 46, "vnni": 46, "avx_vnni": 46, "amx": 46, "amx_bf16": 46, "amx_int8": 46, "run_jupyt": 46, "yyi": 46, "xxxxxxxx": 46, "ipynb": [46, 48, 50], "mit": 46, "sy": 47, "num_channel": 47, "input_width": 47, "input_height": 47, "filter_width": 47, "filter_height": 47, "rand": 47, "stride": 47, "bias_add": 47, "479142": 47, "7296917": 47, "6456823": 47, "077278": 47, "9259825": 47, 
"3000765": 47, "3999124": 47, "0527704": 47, "0656753": 47, "85485": 47, "7297122": 47, "9373732": 47, "4818356": 47, "1455178": 47, "4929404": 47, "6422923": 47, "718459": 47, "7090344": 47, "988714": 47, "3391027": 47, "875052": 47, "6461415": 47, "9349675": 47, "327398": 47, "298973": 47, "3905785": 47, "1704025": 47, "9154005": 47, "6926193": 47, "9677248": 47, "481086": 47, "9746864": 47, "8941312": 47, "3221133": 47, "5479512": 47, "197306": 47, "305706": 47, "9873173": 47, "5597944": 47, "250221": 47, "118212": 47, "8672705": 47, "949225": 47, "2636094": 47, "5300783": 47, "1403804": 47, "1729176": 47, "6628485": 47, "2607155": 47, "6342418": 47, "9381838": 47, "6761076": 47, "5063303": 47, "4718971": 47, "8880196": 47, "1658201": 47, "3787665": 47, "1193419": 47, "42261": 47, "318963": 47, "8809638": 47, "6514435": 47, "3549364": 47, "8598063": 47, "517385": 47, "9702091": 47, "9260886": 47, "3804817": 47, "381424": 47, "6027272": 47, "7787259": 47, "9631021": 47, "93901324": 47, "2134862": 47, "89942324": 47, "cv": 48, "concaten": 48, "loop": [48, 56], "hasn": 48, "reset": 48, "66fa74b6a2a0bb1e563ae8bce66496b118b95200": 48, "ipykernel": 48, "url": [48, 50], "token": [48, 50], "stable_diffussion_infer": 48, "stable_diffusion_infer": 48, "present": 48, "fr\u00e9chet": 48, "distanc": 48, "fid": 48, "outcom": 48, "a100": 48, "stable_diffusion_accuraci": 48, "load_ref_result": 48, "ref_result_dir": 48, "nv_result": 48, "img_arrays_for_acc": 48, "81": [48, 51], "1146879196167": 48, "328223477737884": 48, "3dunet_itex": 49, "3dunet_itex_with_horovod": 49, "unet_3d_med": 49, "88eb3cff2f03dad85035621d041e23a14345999": 49, "nightli": 49, "dllogger": [49, 54], "brain": 49, "tumor": 49, "2019": 49, "upon": 49, "challeng": 49, "ipp": 49, "cbica": 49, "upenn": 49, "edu": 49, "nifti": 49, "volum": 49, "nibabel": 49, "preprocess_data": 49, "train_maskrcnn": [49, 54], "dataset_dir": [49, 54], "output_dir": [49, 51, 54], "exec_mod": 49, "warmup_step": 49, "150": 49, 
"max_step": 49, "log_everi": [49, 54], "dataset_path": 49, "mpirun": [49, 54, 55], "rank": [49, 54, 55], "ppn": [49, 54, 55], "tutori": 50, "pacakg": 50, "tensorflow_doc": 50, "classify_text_with_bert": 50, "ip": 50, "f502f0715979ec73c571ca5676ba58431b916f5f58ee3333": 50, "crash": 50, "tri": 50, "traceback": 50, "recent": 50, "174": 50, "__del__": 50, "typeerror": 50, "nonetyp": 50, "callabl": 50, "research": 51, "bert_large_dir": 51, "squad_dir": 51, "vocab_fil": 51, "vocab": 51, "bert_config_fil": 51, "bert_config": 51, "json": 51, "init_checkpoint": 51, "do_train": 51, "train_fil": 51, "do_predict": 51, "predict_fil": 51, "train_batch_s": [51, 54], "3e": 51, "num_train_epoch": 51, "max_seq_length": 51, "doc_strid": 51, "use_tpu": 51, "tpu_nam": 51, "produc": 51, "f1": 51, "41249612335034": 51, "exact_match": 51, "2488174077578": 51, "gin": 52, "raw": 52, "train_horovod": 52, "tensorflow2_keras_mnist": 52, "horovodrun": 52, "18": 52, "54": 52, "006950": 52, "custom_graph_optimizer_registri": 52, "163161": 52, "940695": 52, "107809": 52, "163517": 52, "250": 52, "yym": 52, "xxxx": [52, 55], "yyyi": 52, "zzzz": 52, "maskrcnn": 54, "c481324031ecf0f70f8939516c02e16cac60446d": 54, "opencv": 54, "headless": 54, "pybind11": 54, "cocoapi": 54, "egg": 54, "pycocotool": 54, "subdirectori": 54, "pythonapi": 54, "preprocess": 54, "coco": 54, "2017": 54, "download_and_preprocess_coco": 54, "resnet": [54, 55, 56], "download_weight": 54, "save_dir": 54, "pretrained_dir": 54, "seed": 54, "use_synthetic_data": [54, 56], "steps_per_epoch": 54, "log_warmup_step": 54, "lar": 55, "hvd_configur": 55, "hvd_support": 55, "tfd": 55, "trainer": 55, "snippet": 55, "readm": 55, "runner": 55, "ctl": 55, "wherea": 55, "classifier_train": 55, "builder": 55, "record": 55, "yaml": 55, "correspondli": 55, "dummi": 55, "itex_bf16_lar": 55, "itex_fp32_lar": 55, "itex_dummy_bf16_lar": 55, "itex_dummy_fp32_lar": 55, "pythonpath": 55, "config_fil": 55, "itex_xx": 55, "itex_bf16": 55, "itex_fp32": 55, 
"itex_dummy_bf16": 55, "itex_dummy_fp32": 55, "fi": 55, "vision": 55, "image_classif": [55, 56], "train_and_ev": 55, "model_typ": 55, "number_of_process": 55, "process_per_nod": 55, "i0203": 55, "006297": 55, "139660941027136": 55, "keras_util": [55, 56], "timehistori": [55, 56], "1900": 55, "2000": 55, "590331": 55, "2100": 55, "178206": 55, "2200": 55, "790128": 55, "2300": 55, "408512": 55, "2400": 55, "i0817": 55, "602742": 55, "139898862851904": 55, "600": 55, "603262": 55, "140612319840064": 55, "917546": 55, "800": 55, "917738": 55, "277716": 55, "277811": 55, "555174": 55, "1200": 55, "555221": 55, "accordingli": 56, "tf_num_interop_thread": 56, "tf_num_intraop_thread": 56, "resnet_ctl_imagenet_main": 56, "train_epoch": 56, "steps_per_loop": 56, "log_step": 56, "skip_ev": 56, "distribution_strategi": 56, "use_tf_while_loop": 56, "use_tf_funct": 56, "enable_xla": 56, "enable_tensorboard": 56, "enable_checkpoint_and_export": 56, "channels_last": 56, "single_l2_loss_op": 56, "follw": 56, "use_itex_shard": 56, "pramet": 56, "suggest": 56, "2x256x10": 56, "5120": 56, "itex_enable_multiple_stream": 56, "queue": 56, "resnet50_itex": 56, "tfg_optimizer_hook": 56, "289": 56, "i0324": 56, "594147": 56, "140348344015936": 56, "597360": 56, "479": 56, "sec": 56, "train_accuraci": 56, "train_loss": 56, "634554": 56, "161625": 56, "163815": 56, "790632": 56, "792936": 56, "103148": 56, "25": 56, "416651": 56, "419072": 56, "3359284": 56, "025180": 56, "027671": 56, "3343554": 56, "aim": 57, "flexibli": 57, "diagram": 57, "summari": 57, "ecosystem": 57, "estim": 57, "manag": 57, "dockerhub": 57, "come": 57, "soon": 57, "visit": 57, "tour": 57, "collabor": 57, "adher": 57, "innov": 57, "jax": 57, "vulner": 57, "apach": 57, "govern": 57, "forth": 57}, "objects": {}, "objtypes": {}, "objnames": {}, "titleterms": {"contributor": 0, "coven": 0, "code": [0, 7, 17, 19, 34, 35, 45, 47, 48, 49, 50, 51, 52, 54, 55, 56], "conduct": 0, "our": 0, "pledg": 0, "standard": 0, "enforc": 
0, "respons": 0, "scope": 0, "guidelin": [0, 7], "1": [0, 11, 16, 31, 32, 35], "correct": 0, "2": [0, 11, 16, 31, 32, 35], "warn": 0, "3": [0, 11, 16, 32], "temporari": 0, "ban": 0, "4": [0, 11, 16, 32], "perman": 0, "attribut": [0, 18], "secur": [1, 57], "polici": [1, 27], "report": 1, "vulner": 1, "intel": [2, 3, 4, 6, 7, 23, 29, 30, 31, 32, 34, 35, 36, 37, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 54, 55, 56, 58], "extens": [2, 3, 4, 6, 7, 10, 23, 30, 31, 32, 34, 35, 36, 37, 40, 46, 58], "tensorflow": [2, 3, 4, 6, 7, 18, 19, 21, 23, 30, 31, 32, 34, 35, 36, 37, 40, 46, 58], "docker": [2, 3, 31, 36, 37, 42, 44], "contain": [2, 3, 36, 37, 42, 44], "guid": [2, 3, 5, 7, 28, 29, 38, 41, 44], "descript": [2, 3], "binari": [2, 3, 57], "prepar": [2, 3, 35, 39, 41, 42, 43, 45, 48, 49, 50, 51, 52, 54, 55, 56], "usag": [2, 15, 17, 18, 19, 22, 26, 28], "i": [2, 3, 28], "custom": [2, 11, 19, 23, 25, 27], "build": [2, 3, 5, 11, 14, 16, 27, 31, 34, 35, 36, 37], "script": [2, 28, 41], "ii": [2, 3, 28], "iii": [2, 28], "run": [2, 3, 16, 31, 32, 35, 39, 40, 41, 42, 43, 44, 45, 46, 48, 49, 50, 51, 52, 54, 55, 56], "verifi": [2, 11, 32, 36, 37], "That": 2, "gpu": [2, 16, 17, 21, 22, 29, 32, 34, 35, 37, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 54, 55, 56, 57], "access": [2, 29], "from": [2, 14, 31, 35, 36, 37], "serv": [3, 21, 31], "imag": [3, 31, 49], "welcom": [4, 6, 58], "document": [4, 5, 6, 7, 57, 58], "highlight": 4, "onlin": 5, "introduct": [5, 13, 23, 40, 42, 44, 45, 46, 48, 49, 50, 51, 54, 55, 56], "updat": 5, "latest": 5, "version": [5, 30, 46], "creat": [5, 34, 35, 52], "releas": [5, 8, 32], "local": [5, 40, 46], "test": [5, 7, 42], "contribut": [7, 57], "develop": 7, "tip": [7, 19], "debug": 7, "unit": 7, "python": [7, 11, 17, 18, 20, 21, 30, 35, 42, 43, 48, 56], "style": 7, "c": [7, 31, 35], "bazel": [7, 34, 35], "known": 8, "issu": 8, "incompat": 8, "chang": [8, 45, 48, 49, 51, 54, 55], "directori": 9, "tree": 9, "structur": [9, 17], "design": [10, 
12, 28], "workflow": [10, 15, 17], "resourc": [10, 57], "how": [11, 27], "write": 11, "op": [11, 25, 30], "prerequisit": [11, 30, 43, 45, 48, 49, 50, 51, 54, 55, 56], "defin": 11, "interfac": 11, "regist": 11, "kernel": 11, "implement": [11, 24], "6": 11, "add": 11, "7": 11, "us": [11, 21, 28, 31, 55], "8": 11, "packag": [11, 35, 37, 56], "9": 11, "instal": [11, 16, 31, 32, 33, 34, 35, 36, 37, 38, 47, 52, 56, 57], "optim": [12, 13, 19, 21, 24, 52], "onednn": [13, 46], "object": 13, "cach": 13, "convolut": 13, "frequent": 14, "ask": 14, "question": 14, "troubleshoot": 14, "sourc": [14, 31, 34, 35], "runtim": 14, "int8": [15, 21], "quantiz": [15, 21, 40, 46], "overview": [15, 16, 17, 19, 20, 27, 28, 29, 30, 34], "openxla": [16, 21], "support": [16, 21, 57], "via": [16, 20, 32, 36, 37, 42], "pjrt": 16, "hardwar": [16, 27, 29, 32, 34, 35, 36, 37, 40, 42, 45, 46, 48, 49, 50, 51, 54, 55, 56, 57], "softwar": [16, 29, 32, 36, 37, 57], "requir": [16, 32, 34, 35, 36, 37, 42, 45, 48, 49, 50, 51, 54, 55, 56, 57], "driver": [16, 32, 34, 35, 37, 41], "librari": [16, 31, 35], "jax": 16, "exampl": [16, 17, 18, 19, 22, 28, 34, 35, 39, 43, 45, 47, 48, 49, 51, 52, 54, 55, 56], "xpuautoshard": [17, 21, 56], "experiment": [17, 21, 32], "api": [17, 18, 20, 21, 23, 30, 42, 43, 48, 56], "dump": 17, "graph": [17, 19, 21, 24, 30, 46], "tune": [18, 19, 51], "advanc": [18, 19, 21, 23, 28, 42, 46], "auto": [18, 19, 20, 21], "mix": [18, 19, 20, 21, 24, 27, 42], "precis": [18, 19, 20, 21, 27, 42], "background": [18, 40, 46], "numer": 18, "stabil": 18, "configur": [18, 20, 29, 34, 35, 42, 46], "list": 18, "rule": 18, "improv": 18, "perform": [18, 42], "environ": [18, 20, 28, 30, 32, 33, 34, 35, 36, 37, 40, 41, 42, 43, 45, 46, 48, 49, 50, 51, 52, 54, 55, 56], "variabl": [18, 20, 28, 30, 32, 37, 42], "differ": [18, 27], "stock": [18, 19], "end": 18, "mobilenet": 18, "amp": [19, 21, 42], "v": [19, 28], "data": [19, 24], "type": [19, 24, 27], "featur": [19, 21, 23], "manual": 19, "quick": [19, 44, 
47, 57], "train": [19, 27, 44, 49, 50, 52, 54, 55, 56], "setup": [19, 27, 32, 37, 41, 42, 43, 45, 48, 49, 50, 51, 52, 54, 55, 56], "enabl": [19, 41, 42, 43, 45, 46, 48, 49, 50, 51, 52, 54, 55, 56], "origin": 19, "notic": 19, "log": [19, 28], "save": 19, "oper": [19, 21, 25, 26, 30], "itex_verbos": 20, "level": 20, "definit": 20, "backend": 20, "config": [20, 30], "protocol": [20, 30], "option": [20, 32, 35], "eas": 21, "profil": [21, 22], "cpu": [21, 29, 34, 35, 36, 37, 42, 43, 46, 47, 48, 50, 55, 57], "launcher": 21, "faq": [22, 42, 43, 45, 48, 49, 50, 51, 54], "infrastructur": 23, "architectur": 23, "public": 23, "manag": 23, "xpu": [23, 34, 35, 37, 57], "engin": 23, "fusion": 24, "basic": [24, 28], "detail": 24, "gener": 24, "layout": [24, 29], "itex": [25, 30], "adamwithweightdecayoptim": 25, "layernorm": 25, "groupnorm": 25, "gelu": [25, 26], "itexlstm": 25, "overrid": [26, 30], "layer": 26, "normal": 26, "dens": 26, "activ": 26, "instanc": [26, 28], "lstm": 26, "kera": 27, "identifi": 27, "set": [27, 28, 40, 55, 56], "dtype": 27, "model": [27, 31, 42, 44, 45, 48, 49, 51, 54, 55], "fit": 27, "loss": 27, "scale": 27, "underflow": 27, "overflow": 27, "loop": 27, "launch": 28, "user": 28, "common": [28, 34, 35, 41], "execut": [28, 40, 42, 43, 45, 48, 49, 50, 51, 52, 54, 55, 56], "mode": 28, "latenc": 28, "throughput": 28, "multi": [28, 49], "numa": [28, 29], "control": 28, "memori": [28, 29], "alloc": [28, 29], "singl": [28, 49], "infer": [28, 42, 43, 44, 48], "all": 28, "physic": 28, "core": 28, "includ": 28, "logic": 28, "one": 28, "node": 28, "iv": 28, "your": 28, "number": 28, "multipl": 28, "vi": 28, "vii": 28, "viii": 28, "index": 28, "ix": 28, "tf_num_intraop_thread": 28, "x": 28, "tf_num_interop_thread": 28, "tcmalloc": [28, 29], "jemalloc": 28, "default": 28, "practic": 29, "tabl": [29, 57], "content": 29, "non": 29, "uniform": 29, "format": 29, "numactl": 29, "openmp": 29, "omp_num_thread": 29, "gnu": 29, "import": 30, "intel_extension_for_tensorflow": 
30, "name": 30, "preserv": 30, "configproto": 30, "gpuoption": 30, "graphopt": 30, "automixedprecisionopt": 30, "shardingconfig": 30, "debugopt": 30, "set_config": 30, "get_config": 30, "server": [31, 40, 46], "dockerfil": [31, 36, 37], "sampl": 31, "arc": 32, "A": 32, "seri": 32, "window": 32, "subsystem": 32, "linux": 32, "wsl2": 32, "nativ": 32, "directli": 32, "step": [32, 33, 42, 43, 48, 50, 55], "By": 32, "instruct": [32, 33], "ubuntu": 32, "pypi": [32, 34, 36, 37], "wheel": [32, 36, 37], "virtual": [32, 36, 37, 41, 52], "system": [32, 36, 37], "full": 32, "oneapi": [32, 34, 35, 37, 41, 52], "conda": [33, 34, 35], "precondit": 33, "download": [34, 35, 42, 50, 52], "extra": [34, 35], "onli": [34, 35, 37], "base": [34, 35, 37, 40, 41], "toolkit": [34, 35, 37, 41], "For": [34, 35], "addit": 34, "cc": 35, "header": 35, "file": 35, "extract": 35, "recommend": 35, "integr": 35, "linker": 35, "load": 35, "get": [36, 37, 57], "dockerhub": [36, 37], "bare": [36, 37, 42, 44], "metal": [36, 37, 42, 44], "check": [37, 46], "platform": 37, "acceler": [40, 44, 45, 49, 54, 56], "alexnet": 40, "devcloud": [40, 46], "up": [40, 42], "speed": 42, "incept": [42, 46], "v4": 42, "automat": 42, "skip": [42, 43, 48, 50, 55], "thi": [42, 43, 48, 50, 55], "clone": [42, 52], "repositori": 42, "pretrain": [42, 45], "compar": 42, "fp32": [42, 48], "result": 42, "method": 42, "resnet50": [43, 55, 56], "output": [43, 47, 48, 52, 55, 56], "deep": [44, 46], "learn": [44, 46], "zoo": 44, "workload": 44, "start": [44, 57], "bert": [45, 50, 51], "larg": [45, 51], "dataset": [45, 49, 54, 55], "command": [45, 52, 55, 56], "finetun": 45, "v3": 46, "xeon": 46, "disabl": 46, "constant": 46, "fold": 46, "function": 46, "boost": 46, "matrix": 46, "startup": [46, 50], "jupyt": [46, 48, 50], "notebook": [46, 48, 50], "licens": [46, 57], "quick_exampl": 47, "py": 47, "note": 47, "stabl": 48, "diffus": 48, "text2imag": 48, "fp16": 48, "accuraci": [48, 51], "3d": 49, "unet": 49, "w": [49, 54], "o": [49, 
54], "horovod": [49, 52, 54, 55], "medic": 49, "segment": 49, "tile": 49, "classifi": [50, 51], "text": [50, 51], "fp8": 51, "fine": 51, "bf16": 51, "distribut": 52, "depend": 52, "repo": 52, "patch": [52, 55], "appli": [52, 55], "devic": 52, "count": 52, "refer": 53, "train_resnet50": 53, "mask": 54, "r": 54, "cnn": 54, "If": 55, "imagenet": 55, "paramet": [55, 56], "without": [55, 56], "hvd": 55, "other": 56, "pythonpath": 56, "With": 56, "shard": 56, "further": 56, "channel": 57, "compat": 57, "weekli": 57}, "envversion": {"sphinx.domains.c": 2, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 8, "sphinx.domains.index": 1, "sphinx.domains.javascript": 2, "sphinx.domains.math": 2, "sphinx.domains.python": 3, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx": 57}, "alltitles": {"Contributor Covenant Code of Conduct": [[0, "contributor-covenant-code-of-conduct"]], "Our Pledge": [[0, "our-pledge"]], "Our Standards": [[0, "our-standards"]], "Enforcement Responsibilities": [[0, "enforcement-responsibilities"]], "Scope": [[0, "scope"]], "Enforcement": [[0, "enforcement"]], "Enforcement Guidelines": [[0, "enforcement-guidelines"]], "1. Correction": [[0, "correction"]], "2. Warning": [[0, "warning"]], "3. Temporary Ban": [[0, "temporary-ban"]], "4. Permanent Ban": [[0, "permanent-ban"]], "Attribution": [[0, "attribution"]], "Security Policy": [[1, "security-policy"]], "Report a Vulnerability": [[1, "report-a-vulnerability"]], "Intel\u00ae Extension for TensorFlow* Docker Container Guide": [[2, "intel-extension-for-tensorflow-docker-container-guide"]], "Description": [[2, "description"], [3, "description"]], "Binaries Preparation": [[2, "binaries-preparation"]], "Usage of Docker Container": [[2, "usage-of-docker-container"]], "I. Customize Build Script": [[2, "i-customize-build-script"]], "II. Build the Container": [[2, "ii-build-the-container"], [3, "ii-build-the-container"]], "III. 
Running the Container": [[2, "iii-running-the-container"]], "Verify That Intel GPU is Accessible From TensorFlow": [[2, "verify-that-intel-gpu-is-accessible-from-tensorflow"]], "Intel\u00ae Extension for TensorFlow* Serving - Docker Container Guide": [[3, "intel-extension-for-tensorflow-serving-docker-container-guide"]], "Build the Docker Image": [[3, "build-the-docker-image"]], "I. Binaries Preparation": [[3, "i-binaries-preparation"]], "Running the Container": [[3, "running-the-container"]], "Welcome to Intel\u00ae Extension for TensorFlow* documentation": [[4, "welcome-to-intel-extension-for-tensorflow-documentation"]], "Documentation": [[4, "documentation"], [57, "documentation"]], "Highlights": [[4, "highlights"]], "Online Documentation Build Guide": [[5, "online-documentation-build-guide"]], "Introduction": [[5, "introduction"], [13, "introduction"], [23, "introduction"], [40, "introduction"], [42, "introduction"], [44, "introduction"], [45, "introduction"], [46, "introduction"], [48, "introduction"], [49, "introduction"], [50, "introduction"], [51, "introduction"], [54, "introduction"], [55, "introduction"], [56, "introduction"]], "Update latest Version": [[5, "update-latest-version"]], "Create Release Version": [[5, "create-release-version"]], "Build to Local Test": [[5, "build-to-local-test"]], "Welcome to Intel \u00ae Extension for TensorFlow* documentation!": [[6, "welcome-to-intel-extension-for-tensorflow-documentation"], [58, "welcome-to-intel-extension-for-tensorflow-documentation"]], "Contributing guidelines": [[7, "contributing-guidelines"]], "Contributing to Intel\u00ae Extension for TensorFlow*": [[7, "contributing-to-intel-extension-for-tensorflow"]], "Developing Intel\u00ae Extension for TensorFlow*": [[7, "developing-intel-extension-for-tensorflow"]], "Tips and Debugging": [[7, "tips-and-debugging"]], "Unit testing": [[7, "unit-testing"]], "Python Unit Testing": [[7, "python-unit-testing"]], "Code style guide": [[7, "code-style-guide"]], 
"Python coding style": [[7, "python-coding-style"]], "C++ coding style": [[7, "c-coding-style"]], "bazel style guide": [[7, "bazel-style-guide"]], "Documentation style guide": [[7, "documentation-style-guide"]], "Releases": [[8, "releases"]], "Known Issues": [[8, "known-issues"]], "Incompatible Changes": [[8, "incompatible-changes"]], "Directory Tree Structure": [[9, "directory-tree-structure"]], "Extension Design": [[10, "extension-design"]], "Workflow": [[10, "workflow"], [15, "workflow"], [17, "workflow"]], "Resources": [[10, "resources"], [57, "resources"]], "How to write custom op": [[11, "how-to-write-custom-op"]], "1. Prerequisite": [[11, "prerequisite"]], "2. Define the op interface and Register op": [[11, "define-the-op-interface-and-register-op"]], "3. Register the kernels for the op": [[11, "register-the-kernels-for-the-op"]], "4. Implement the kernels": [[11, "implement-the-kernels"]], "6. Add the op to BUILD": [[11, "add-the-op-to-build"]], "7. Use the op in Python": [[11, "use-the-op-in-python"]], "8. Build the package": [[11, "build-the-package"]], "9. 
Install and Verify": [[11, "install-and-verify"]], "Optimizations Design": [[12, "optimizations-design"]], "oneDNN object cache optimization": [[13, "onednn-object-cache-optimization"]], "Optimization in convolution": [[13, "optimization-in-convolution"]], "Frequently Asked Questions": [[14, "frequently-asked-questions"]], "Troubleshooting": [[14, "troubleshooting"]], "Build from source": [[14, "build-from-source"], [31, "build-from-source"]], "Runtime": [[14, "runtime"]], "INT8 Quantization": [[15, "int8-quantization"], [21, "int8-quantization"]], "Overview": [[15, "overview"], [17, "overview"], [19, "overview"], [20, "overview"], [27, "overview"], [28, "overview"], [29, "overview"], [30, "overview"], [34, "overview"]], "Usage": [[15, "usage"], [17, "usage"], [18, "usage"], [18, "id1"], [19, "usage"], [22, "usage"], [26, "usage"]], "OpenXLA Support on GPU via PJRT": [[16, "openxla-support-on-gpu-via-pjrt"]], "1. Overview": [[16, "overview"]], "2. Hardware and Software Requirement": [[16, "hardware-and-software-requirement"]], "Hardware Requirements": [[16, "hardware-requirements"], [32, "hardware-requirements"], [34, "hardware-requirements"], [35, "hardware-requirements"], [36, "hardware-requirements"], [37, "hardware-requirements"], [45, "hardware-requirements"], [48, "hardware-requirements"], [49, "hardware-requirements"], [50, "hardware-requirements"], [51, "hardware-requirements"], [54, "hardware-requirements"], [55, "hardware-requirements"], [56, "hardware-requirements"]], "Software Requirements": [[16, "software-requirements"], [32, "software-requirements"], [36, "software-requirements"], [37, "software-requirements"]], "Install GPU Drivers": [[16, "install-gpu-drivers"], [37, "install-gpu-drivers"]], "3. Build Library for JAX": [[16, "build-library-for-jax"]], "4. 
Run JAX Example": [[16, "run-jax-example"]], "XPUAutoShard on GPU [Experimental]": [[17, "xpuautoshard-on-gpu-experimental"], [21, "xpuautoshard-on-gpu-experimental"]], "Code Structure": [[17, "code-structure"]], "Python API": [[17, "python-api"], [18, "python-api"], [42, "python-api"], [56, "python-api"]], "Dump the graph": [[17, "dump-the-graph"]], "Examples": [[17, "examples"], [28, "examples"], [39, "examples"], [39, "id1"]], "Tune Advanced Auto Mixed Precision": [[18, "tune-advanced-auto-mixed-precision"]], "Background": [[18, "background"], [40, "background"], [46, "background"]], "Numeric Stability": [[18, "numeric-stability"]], "Configuration List": [[18, "configuration-list"]], "Example of Mix Precision by List": [[18, "example-of-mix-precision-by-list"]], "Rule to Improve Performance by the Configuration List": [[18, "rule-to-improve-performance-by-the-configuration-list"]], "Python API Attribute & Environment Variable": [[18, "python-api-attribute-environment-variable"]], "Environment Variable Difference with Stock TensorFlow": [[18, "environment-variable-difference-with-stock-tensorflow"]], "Example": [[18, "example"], [19, "example"], [35, "example"]], "End-to-end Example": [[18, "end-to-end-example"]], "Tuning Performance Example on MobileNet": [[18, "tuning-performance-example-on-mobilenet"]], "Advanced Auto Mixed Precision": [[19, "advanced-auto-mixed-precision"], [19, "id1"]], "Advanced AMP vs. 
Stock TensorFlow AMP": [[19, "advanced-amp-vs-stock-tensorflow-amp"]], "Data Type": [[19, "data-type"]], "Graph Optimizer": [[19, "graph-optimizer"]], "Feature": [[19, "feature"]], "Tune Advanced AMP Manually": [[19, "tune-advanced-amp-manually"]], "Quick Training Example": [[19, "quick-training-example"]], "Setup": [[19, "setup"], [27, "setup"]], "Enable Advanced AMP": [[19, "enable-advanced-amp"]], "Original Code": [[19, "original-code"]], "Notice": [[19, "notice"]], "Tips": [[19, "tips"]], "Log and Save Optimized Graph": [[19, "log-and-save-optimized-graph"]], "Custom Operation": [[19, "custom-operation"]], "Environment Variables": [[20, "environment-variables"], [28, "environment-variables"]], "Configuration via Environment Variables": [[20, "configuration-via-environment-variables"]], "ITEX_VERBOSE level definition": [[20, "itex-verbose-level-definition"]], "Environment Variables with Python APIs": [[20, "environment-variables-with-python-apis"]], "Backend and Config Protocol": [[20, "backend-and-config-protocol"]], "Auto Mixed Precision Options": [[20, "auto-mixed-precision-options"]], "Features": [[21, "features"]], "Operator Optimization": [[21, "operator-optimization"]], "Graph Optimization": [[21, "graph-optimization"]], "Advanced Auto Mixed Precision (AMP)": [[21, "advanced-auto-mixed-precision-amp"]], "Ease-of-use Python API": [[21, "ease-of-use-python-api"]], "GPU Profiler": [[21, "gpu-profiler"], [22, "gpu-profiler"]], "CPU Launcher [Experimental]": [[21, "cpu-launcher-experimental"]], "OpenXLA Support on GPU [Experimental]": [[21, "openxla-support-on-gpu-experimental"]], "TensorFlow Serving": [[21, "tensorflow-serving"]], "Example:": [[22, "example"]], "FAQ": [[22, "faq"], [42, "faq"], [43, "faq"], [45, "faq"], [48, "faq"], [49, "faq"], [50, "faq"], [51, "faq"], [54, "faq"]], "Infrastructure": [[23, "infrastructure"]], "Architecture": [[23, "architecture"]], "TensorFlow Public API": [[23, "tensorflow-public-api"]], "Custom API": [[23, "custom-api"]], 
"Intel Advanced Feature and Extension Management": [[23, "intel-advanced-feature-and-extension-management"]], "XPU Engine": [[23, "xpu-engine"]], "Graph fusion": [[24, "graph-fusion"]], "Basic fusion": [[24, "basic-fusion"]], "Mixed data type fusion": [[24, "mixed-data-type-fusion"]], "Implementation Details": [[24, "implementation-details"]], "Generic layout optimizer": [[24, "generic-layout-optimizer"]], "Customized Operators": [[25, "customized-operators"]], "itex.ops.AdamWithWeightDecayOptimizer": [[25, "itex-ops-adamwithweightdecayoptimizer"]], "itex.ops.LayerNormalization": [[25, "itex-ops-layernormalization"]], "itex.ops.GroupNormalization": [[25, "itex-ops-groupnormalization"]], "itex.ops.gelu": [[25, "itex-ops-gelu"]], "itex.ops.ItexLSTM": [[25, "itex-ops-itexlstm"]], "Operators Override": [[26, "operators-override"]], "Layer Normalization": [[26, "layer-normalization"]], "Dense Layer": [[26, "dense-layer"]], "Gelu Activation": [[26, "gelu-activation"]], "Instance Normalization": [[26, "instance-normalization"]], "LSTM": [[26, "lstm"]], "Keras Mixed Precision": [[27, "keras-mixed-precision"]], "How to identify different hardware types?": [[27, "how-to-identify-different-hardware-types"]], "Setting the dtype policy": [[27, "setting-the-dtype-policy"]], "Building the model": [[27, "building-the-model"]], "Training the model with Model.fit": [[27, "training-the-model-with-model-fit"]], "Loss scaling": [[27, "loss-scaling"]], "Underflow and Overflow": [[27, "underflow-and-overflow"]], "Loss scaling overview": [[27, "loss-scaling-overview"]], "Training the model with a custom training loop": [[27, "training-the-model-with-a-custom-training-loop"]], "Launch Script User Guide": [[28, "launch-script-user-guide"]], "Common Execution Mode": [[28, "common-execution-mode"]], "Latency mode": [[28, "latency-mode"]], "Throughput mode": [[28, "throughput-mode"]], "Basic Settings": [[28, "basic-settings"]], "Launch Log": [[28, "launch-log"]], "Advanced Settings": [[28, 
"advanced-settings"]], "Multi-instance": [[28, "multi-instance"]], "NUMA Control": [[28, "numa-control"]], "Memory Allocator": [[28, "memory-allocator"], [29, "memory-allocator"]], "Single instance for inference": [[28, "single-instance-for-inference"]], "I. Use all physical cores": [[28, "i-use-all-physical-cores"]], "II. Use all cores including logical cores": [[28, "ii-use-all-cores-including-logical-cores"]], "III. Use physical cores on one node": [[28, "iii-use-physical-cores-on-one-node"]], "IV. Use your designated number of cores": [[28, "iv-use-your-designated-number-of-cores"]], "Multiple instances for inference": [[28, "multiple-instances-for-inference"]], "V. Throughput mode": [[28, "v-throughput-mode"]], "VI. Latency mode": [[28, "vi-latency-mode"]], "VII. Your designated number of instances": [[28, "vii-your-designated-number-of-instances"]], "VIII. Your designated number of instances and instance index": [[28, "viii-your-designated-number-of-instances-and-instance-index"]], "Set environment variables for inference": [[28, "set-environment-variables-for-inference"]], "IX. Set environment variable TF_NUM_INTRAOP_THREADS": [[28, "ix-set-environment-variable-tf-num-intraop-threads"]], "X. 
Set environment variable TF_NUM_INTEROP_THREADS": [[28, "x-set-environment-variable-tf-num-interop-threads"]], "Usage of TCMalloc/Jemalloc/Default memory allocator": [[28, "usage-of-tcmalloc-jemalloc-default-memory-allocator"]], "Jemalloc": [[28, "jemalloc"]], "TCMalloc": [[28, "tcmalloc"], [29, "tcmalloc"]], "Default memory allocator": [[28, "default-memory-allocator"]], "Practice Guide": [[29, "practice-guide"]], "Table of Contents": [[29, "table-of-contents"]], "CPU Practice Guide": [[29, "cpu-practice-guide"]], "Hardware Configuration": [[29, "hardware-configuration"]], "Non-Uniform Memory Access (NUMA)": [[29, "non-uniform-memory-access-numa"]], "Software Configuration": [[29, "software-configuration"]], "Memory Layout format": [[29, "memory-layout-format"]], "Numactl": [[29, "numactl"]], "OpenMP": [[29, "openmp"]], "OMP_NUM_THREADS": [[29, "omp-num-threads"]], "GNU OpenMP": [[29, "gnu-openmp"]], "Intel OpenMP": [[29, "intel-openmp"]], "GPU Practice Guide": [[29, "gpu-practice-guide"]], "Python APIs": [[30, "python-apis"]], "Prerequisite: import intel_extension_for_tensorflow as itex": [[30, "prerequisite-import-intel-extension-for-tensorflow-as-itex"]], "Python APIs and Environment Variable Names": [[30, "python-apis-and-environment-variable-names"]], "Python APIs and preserved environment variable Names": [[30, "python-apis-and-preserved-environment-variable-names"]], "Intel\u00ae Extension for TensorFlow* Config Protocol": [[30, "intel-extension-for-tensorflow-config-protocol"]], "itex.ConfigProto": [[30, "itex-configproto"]], "itex.GPUOptions": [[30, "itex-gpuoptions"]], "itex.GraphOptions": [[30, "itex-graphoptions"]], "itex.AutoMixedPrecisionOptions": [[30, "itex-automixedprecisionoptions"]], "itex.ShardingConfig": [[30, "itex-shardingconfig"]], "itex.DebugOptions": [[30, "itex-debugoptions"]], "itex.set_config": [[30, "itex-set-config"]], "itex.get_config": [[30, "itex-get-config"]], "itex operators": [[30, "itex-operators"]], "itex ops override": [[30, 
"itex-ops-override"]], "itex graph": [[30, "itex-graph"]], "itex version": [[30, "itex-version"]], "Install TensorFlow Serving with Intel\u00ae Extension for TensorFlow*": [[31, "install-tensorflow-serving-with-intel-extension-for-tensorflow"]], "Install Model Server": [[31, "install-model-server"]], "Install using Docker": [[31, "install-using-docker"]], "1. Build Intel\u00ae Extension for TensorFlow* C++ library": [[31, "build-intel-extension-for-tensorflow-c-library"]], "2. Build TensorFlow Serving": [[31, "build-tensorflow-serving"]], "Build Docker image from Dockerfile": [[31, "build-docker-image-from-dockerfile"]], "Run sample": [[31, "run-sample"]], "Experimental: Intel\u00ae Arc\u2122 A-Series GPU Software Installation": [[32, "experimental-intel-arc-a-series-gpu-software-installation"]], "Experimental Release": [[32, "experimental-release"]], "Windows Subsystem for Linux 2 (WSL2)": [[32, "windows-subsystem-for-linux-2-wsl2"], [32, "id1"]], "Native Linux Running Directly on Hardware": [[32, "native-linux-running-directly-on-hardware"], [32, "id2"]], "Step-By-Step Instructions": [[32, "step-by-step-instructions"]], "1. Install GPU Drivers": [[32, "install-gpu-drivers"]], "Windows GPU Drivers": [[32, "windows-gpu-drivers"]], "Ubuntu Linux Installed in WSL2": [[32, "ubuntu-linux-installed-in-wsl2"]], "2. Install TensorFlow* via PyPI Wheel in Linux": [[32, "install-tensorflow-via-pypi-wheel-in-linux"]], "Install TensorFlow": [[32, "install-tensorflow"], [34, "install-tensorflow"], [35, "install-tensorflow"], [36, "install-tensorflow"], [37, "install-tensorflow"]], "Virtual environment install": [[32, "virtual-environment-install"], [36, "virtual-environment-install"], [37, "virtual-environment-install"]], "System environment install": [[32, "system-environment-install"], [36, "system-environment-install"], [37, "system-environment-install"]], "3. Install Intel\u00ae Extension for TensorFlow*": [[32, "install-intel-extension-for-tensorflow"]], "4. 
Verify the Installation": [[32, "verify-the-installation"]], "Optional: Install Full Intel\u00ae oneAPI": [[32, "optional-install-full-intel-oneapi"]], "Setup environment variables": [[32, "setup-environment-variables"], [37, "setup-environment-variables"]], "Conda Environment Installation Instructions": [[33, "conda-environment-installation-instructions"]], "Preconditions": [[33, "preconditions"]], "Step by step instructions:": [[33, "step-by-step-instructions"]], "Requirements": [[34, "requirements"], [35, "requirements"]], "Common Requirements": [[34, "common-requirements"], [35, "common-requirements"]], "Install Bazel": [[34, "install-bazel"], [35, "install-bazel"]], "Download Source Code": [[34, "download-source-code"], [35, "download-source-code"]], "Create a Conda Environment": [[34, "create-a-conda-environment"], [35, "create-a-conda-environment"]], "Extra Requirements for XPU/GPU Build Only": [[34, "extra-requirements-for-xpu-gpu-build-only"], [35, "extra-requirements-for-xpu-gpu-build-only"]], "Install Intel GPU Driver": [[34, "install-intel-gpu-driver"], [35, "install-intel-gpu-driver"]], "Install oneAPI Base Toolkit": [[34, "install-oneapi-base-toolkit"], [35, "install-oneapi-base-toolkit"]], "Build Intel\u00ae Extension for TensorFlow* PyPI": [[34, "build-intel-extension-for-tensorflow-pypi"]], "Configure": [[34, "configure"], [35, "configure"]], "Configure For CPU": [[34, "configure-for-cpu"], [35, "configure-for-cpu"]], "Configure For GPU/XPU": [[34, "configure-for-gpu-xpu"]], "Build Source Code": [[34, "build-source-code"], [35, "build-source-code"]], "Additional": [[34, "additional"]], "Configure Example for CPU": [[34, "configure-example-for-cpu"]], "Configure Example For GPU or XPU": [[34, "configure-example-for-gpu-or-xpu"]], "Intel\u00ae Extension for TensorFlow* for C++": [[35, "intel-extension-for-tensorflow-for-c"]], "Build Intel\u00ae Extension for TensorFlow* CC library": [[35, "build-intel-extension-for-tensorflow-cc-library"]], 
"Configure For GPU": [[35, "configure-for-gpu"]], "Prepare Tensorflow* CC library and header files": [[35, "prepare-tensorflow-cc-library-and-header-files"]], "Option 1: Extract from Tensorflow* python package (Recommended)": [[35, "option-1-extract-from-tensorflow-python-package-recommended"]], "Option 2: Build from TensorFlow* source code": [[35, "option-2-build-from-tensorflow-source-code"]], "Integrate the CC library": [[35, "integrate-the-cc-library"]], "Linker": [[35, "linker"]], "Load": [[35, "load"]], "Build and run": [[35, "build-and-run"]], "Intel CPU Software Installation": [[36, "intel-cpu-software-installation"]], "Install via Docker container": [[36, "install-via-docker-container"], [37, "install-via-docker-container"]], "Build Docker container from Dockerfile": [[36, "build-docker-container-from-dockerfile"], [37, "build-docker-container-from-dockerfile"]], "Get docker container from dockerhub": [[36, "get-docker-container-from-dockerhub"], [37, "get-docker-container-from-dockerhub"]], "Install via PyPI wheel in bare metal": [[36, "install-via-pypi-wheel-in-bare-metal"], [37, "install-via-pypi-wheel-in-bare-metal"]], "Install Intel\u00ae Extension for TensorFlow*": [[36, "install-intel-extension-for-tensorflow"], [37, "install-intel-extension-for-tensorflow"]], "Verify the Installation": [[36, "verify-the-installation"], [37, "verify-the-installation"]], "Intel XPU Software Installation": [[37, "intel-xpu-software-installation"]], "Install oneAPI Base Toolkit Packages": [[37, "install-oneapi-base-toolkit-packages"]], "Check the Environment for XPU": [[37, "check-the-environment-for-xpu"]], "XPU for CPU only platform": [[37, "xpu-for-cpu-only-platform"]], "Installation Guide": [[38, "installation-guide"]], "Prepare for Running": [[39, "prepare-for-running"]], "Accelerate AlexNet by Quantization with Intel\u00ae Extension for Tensorflow*": [[40, "accelerate-alexnet-by-quantization-with-intel-extension-for-tensorflow"]], "Hardware Environment": [[40, 
"hardware-environment"], [46, "hardware-environment"]], "GPU": [[40, "gpu"], [46, "gpu"]], "Local Server": [[40, "local-server"], [46, "local-server"]], "Intel\u00ae DevCloud": [[40, "intel-devcloud"], [46, "intel-devcloud"]], "Running Environment": [[40, "running-environment"], [46, "running-environment"]], "Set up Base Running Environment": [[40, "set-up-base-running-environment"]], "Set up Intel\u00ae Extension for Tensorflow* for GPU": [[40, "set-up-intel-extension-for-tensorflow-for-gpu"]], "Execute": [[40, "execute"], [50, "execute"]], "Common Guide for Running": [[41, "common-guide-for-running"]], "Prepare": [[41, "prepare"]], "Intel GPU Driver": [[41, "intel-gpu-driver"]], "Intel\u00ae oneAPI Base Toolkit": [[41, "intel-oneapi-base-toolkit"]], "Setup Running Environment": [[41, "setup-running-environment"], [42, "setup-running-environment"], [43, "setup-running-environment"], [45, "setup-running-environment"], [48, "setup-running-environment"], [49, "setup-running-environment"], [50, "setup-running-environment"], [51, "setup-running-environment"], [52, "setup-running-environment"], [54, "setup-running-environment"], [55, "setup-running-environment"]], "Running": [[41, "running"]], "Enable oneAPI Running Environment": [[41, "enable-oneapi-running-environment"]], "Enable Virtual Running Environment": [[41, "enable-virtual-running-environment"]], "Run Script": [[41, "run-script"]], "Speed up Inference of Inception v4 by Advanced Automatic Mixed Precision on Intel CPU and GPU via Docker Container or Bare Metal": [[42, "speed-up-inference-of-inception-v4-by-advanced-automatic-mixed-precision-on-intel-cpu-and-gpu-via-docker-container-or-bare-metal"]], "Step": [[42, "step"]], "Hardware Requirement": [[42, "hardware-requirement"], [57, "hardware-requirement"]], "Prepare for GPU (Skip this Step for CPU)": [[42, "prepare-for-gpu-skip-this-step-for-cpu"]], "Clone the Repository": [[42, "clone-the-repository"]], "Download the Pretrained-model": [[42, 
"download-the-pretrained-model"]], "Enable Running Environment": [[42, "enable-running-environment"], [43, "enable-running-environment"], [45, "enable-running-environment"], [48, "enable-running-environment"], [49, "enable-running-environment"], [50, "enable-running-environment"], [51, "enable-running-environment"], [54, "enable-running-environment"], [55, "enable-running-environment"], [56, "enable-running-environment"]], "Execute Testing and Comparing the Performance of FP32 and Advanced AMP on CPU and GPU in Docker Container or Bare Metal": [[42, "execute-testing-and-comparing-the-performance-of-fp32-and-advanced-amp-on-cpu-and-gpu-in-docker-container-or-bare-metal"]], "Environment Variable Configuration": [[42, "environment-variable-configuration"]], "Result": [[42, "result"]], "Advanced: Enable Advanced AMP Method": [[42, "advanced-enable-advanced-amp-method"]], "ResNet50 Inference on Intel CPU and GPU": [[43, "resnet50-inference-on-intel-cpu-and-gpu"]], "Prerequisites": [[43, "prerequisites"], [45, "prerequisites"], [48, "prerequisites"], [49, "prerequisites"], [50, "prerequisites"], [51, "prerequisites"], [54, "prerequisites"], [55, "prerequisites"], [56, "prerequisites"]], "Prepare for GPU (Skip this step for CPU)": [[43, "prepare-for-gpu-skip-this-step-for-cpu"], [48, "prepare-for-gpu-skip-this-step-for-cpu"], [50, "prepare-for-gpu-skip-this-step-for-cpu"], [55, "prepare-for-gpu-skip-this-step-for-cpu"]], "Executes the Example with Python API": [[43, "executes-the-example-with-python-api"], [48, "executes-the-example-with-python-api"], [56, "executes-the-example-with-python-api"]], "Example Output": [[43, "example-output"], [47, "example-output"], [48, "example-output"], [56, "example-output"]], "Accelerate Deep Learning Training and Inference for Model Zoo Workloads on Intel GPU": [[44, "accelerate-deep-learning-training-and-inference-for-model-zoo-workloads-on-intel-gpu"]], "Quick Start Guide": [[44, "quick-start-guide"]], "Run Models in the Docker 
Container": [[44, "run-models-in-the-docker-container"]], "Run Models on Bare Metal": [[44, "run-models-on-bare-metal"]], "Accelerate BERT-Large Pretraining on Intel GPU": [[45, "accelerate-bert-large-pretraining-on-intel-gpu"]], "Model Code change": [[45, "model-code-change"], [48, "model-code-change"], [49, "model-code-change"], [51, "model-code-change"], [54, "model-code-change"], [55, "model-code-change"]], "Prepare for GPU": [[45, "prepare-for-gpu"], [49, "prepare-for-gpu"], [51, "prepare-for-gpu"], [54, "prepare-for-gpu"], [56, "prepare-for-gpu"]], "Prepare Dataset": [[45, "prepare-dataset"], [49, "prepare-dataset"], [54, "prepare-dataset"]], "Execute the Example": [[45, "execute-the-example"], [49, "execute-the-example"], [51, "execute-the-example"], [54, "execute-the-example"]], "Pretraining Command": [[45, "pretraining-command"]], "Finetune Command": [[45, "finetune-command"]], "Quantize Inception V3 by Intel\u00ae Extension for Tensorflow* on Intel\u00ae Xeon\u00ae": [[46, "quantize-inception-v3-by-intel-extension-for-tensorflow-on-intel-xeon"]], "Configuration": [[46, "configuration"]], "Intel\u00ae Extension for Tensorflow* Version": [[46, "intel-extension-for-tensorflow-version"]], "Enable oneDNN Graph": [[46, "enable-onednn-graph"]], "Disable Constant Folding Function": [[46, "disable-constant-folding-function"]], "CPU": [[46, "cpu"]], "Check Intel\u00ae Deep Learning Boost": [[46, "check-intel-deep-learning-boost"]], "Check Intel\u00ae Advanced Matrix Extensions": [[46, "check-intel-advanced-matrix-extensions"]], "Startup Jupyter Notebook": [[46, "startup-jupyter-notebook"], [50, "startup-jupyter-notebook"]], "License": [[46, "license"], [57, "license"]], "Quick Example on Intel CPU and GPU": [[47, "quick-example-on-intel-cpu-and-gpu"]], "Installation": [[47, "installation"]], "Code": [[47, "code"]], "quick_example.py": [[47, "quick-example-py"]], "Notes": [[47, "notes"]], "Stable Diffusion Inference for Text2Image on Intel GPU": [[48, 
"stable-diffusion-inference-for-text2image-on-intel-gpu"]], "Running the Jupyter Notebook": [[48, "running-the-jupyter-notebook"]], "FP32 Inference": [[48, "fp32-inference"]], "FP16 Inference": [[48, "fp16-inference"]], "Accuracy": [[48, "accuracy"], [51, "accuracy"]], "Accelerate 3D-Unet Training w/o horovod for medical image segmentation on Intel GPU": [[49, "accelerate-3d-unet-training-w-o-horovod-for-medical-image-segmentation-on-intel-gpu"]], "Single Tile": [[49, "single-tile"]], "Multi-tile with horovod": [[49, "multi-tile-with-horovod"]], "BERT Training for Classifying Text on Intel CPU and GPU": [[50, "bert-training-for-classifying-text-on-intel-cpu-and-gpu"]], "Download Jupyter Code:": [[50, "download-jupyter-code"]], "FP8 BERT-Large Fine-tuning for Classifying Text on Intel GPU": [[51, "fp8-bert-large-fine-tuning-for-classifying-text-on-intel-gpu"]], "BF16 + FP8 Fine-tuning": [[51, "bf16-fp8-fine-tuning"]], "Distributed Training Example with Intel\u00ae Optimization for Horovod* on Intel\u00ae GPU": [[52, "distributed-training-example-with-intel-optimization-for-horovod-on-intel-gpu"]], "Dependency": [[52, "dependency"]], "Create Virtual Environment": [[52, "create-virtual-environment"]], "Install": [[52, "install"], [57, "install"]], "Prepare Example Code": [[52, "prepare-example-code"]], "Clone Horovod Repo": [[52, "clone-horovod-repo"]], "Download Patch": [[52, "download-patch"]], "Apply Patch for Intel GPU": [[52, "apply-patch-for-intel-gpu"]], "Execution": [[52, "execution"], [55, "execution"]], "Enable oneAPI": [[52, "enable-oneapi"]], "Device Count": [[52, "device-count"]], "Running Command": [[52, "running-command"]], "Output": [[52, "output"]], "Refer to train_resnet50": [[53, "refer-to-train-resnet50"]], "Accelerate Mask R-CNN Training w/o horovod on Intel GPU": [[54, "accelerate-mask-r-cnn-training-w-o-horovod-on-intel-gpu"]], "Resnet50 train on Intel GPU": [[55, "resnet50-train-on-intel-gpu"]], "Apply Patch": [[55, "apply-patch"]], "If not use 
Horovod": [[55, "if-not-use-horovod"]], "If use Horovod": [[55, "if-use-horovod"]], "Prepare ImageNet dataset": [[55, "prepare-imagenet-dataset"]], "Set Model Parameters": [[55, "set-model-parameters"]], "Command": [[55, "command"]], "Command with Horovod": [[55, "command-with-horovod"]], "Example Output without hvd": [[55, "example-output-without-hvd"]], "Example Output with hvd": [[55, "example-output-with-hvd"]], "Accelerate ResNet50 Training by XPUAutoShard on Intel GPU": [[56, "accelerate-resnet50-training-by-xpuautoshard-on-intel-gpu"]], "Prepare the Codes": [[56, "prepare-the-codes"]], "Install Other Required Packages": [[56, "install-other-required-packages"]], "Setup PYTHONPATH": [[56, "setup-pythonpath"]], "Without XPUAutoShard": [[56, "without-xpuautoshard"]], "With XPUAutoShard": [[56, "with-xpuautoshard"]], "Sharding Parameters Setting": [[56, "sharding-parameters-setting"]], "Further Settings": [[56, "further-settings"]], "Executing Command": [[56, "executing-command"]], "Quick Get Started*": [[57, "quick-get-started"]], "Software Requirement": [[57, "software-requirement"]], "Installation Channel:": [[57, "installation-channel"]], "Compatibility Table": [[57, "compatibility-table"]], "Install for XPU": [[57, "install-for-xpu"]], "Install for CPU": [[57, "install-for-cpu"]], "Install for weekly binaries": [[57, "install-for-weekly-binaries"]], "Install for GPU weekly": [[57, "install-for-gpu-weekly"]], "Contributing": [[57, "contributing"]], "Support": [[57, "support"]], "Security": [[57, "security"]]}, "indexentries": {}}) \ No newline at end of file