diff --git a/INSTALL.md b/INSTALL.md index 42b4a5c..09e0ec8 100644 --- a/INSTALL.md +++ b/INSTALL.md @@ -1,7 +1,9 @@ -Cloc Install Instructions 1.0P -============================== +Cloc 0.9 Install Instructions +============================= + +Warning. These instructions are for HSA 1.0F . -The Cloc utility consists of three bash scripts with file names "cloc.sh" , "snack.sh" , and "snk_genw.sh" . These are found in the bin directory of this repository. Copy these files to a bin directory in your Linux environment PATH such as /usr/local/bin. To update to a new version of Cloc simply replace cloc.sh snack.sh, and snk_genw.sh in that directory. +The Cloc utility consists of three bash scripts with file names "cloc.sh" , "snack.sh" , and "snk_genw.sh" . These are found in the bin directory of this repository. Copy these files to /opt/amd/cloc/bin. To update to a new version of Cloc simply replace cloc.sh snack.sh, and snk_genw.sh in directory /opt/amd/cloc/bin. In addition to the bash scripts, Cloc requires the HSA runtime and the HLC compiler. This set of instructions can be used to install a comprehensive HSA software stack and the Cloc utility for Ubuntu. In addition to Linux, you must have an HSA compatible system such as a Kaveri processor. There are four major steps to this process: @@ -16,7 +18,7 @@ In addition to the bash scripts, Cloc requires the HSA runtime and the HLC compi ## Install Ubuntu 14.04 LTS -Make sure Ubuntu 14.04 LTS 64-bit version has been installed. We recommend the server package set. The utica version of ubuntu (14.10) has not been tested with HSA. Then install these dependencies: +Make sure Ubuntu 14.04 LTS 64-bit version has been installed. Ubunutu 14.04 is also known as trusty. We recommend the server package set. The utica version of ubuntu (14.10) has not been tested with HSA. Then install these dependencies: ``` sudo apt-get update sudo apt-get upgrade @@ -24,7 +26,7 @@ sudo apt-get install git sudo apt-get install make sudo apt-get install g++ sudo apt-get install libstdc++-4.8-dev -sudo apt-get install libelf-dev +sudo apt-get install libelf sudo apt-get install libtinfo-dev sudo apt-get install re2c sudo apt-get install libbsd-dev @@ -47,16 +49,16 @@ mount the appropriate MLNX_OFED iso ## Install HSA Linux Kernel Drivers -Make sure you get the backlevel kfd-v1.0.x branch. This set of instructions is for the provisional HSA runtime. The software stack for the new finalized v1.0F is not yet complete. We will update these install instructions when that is complete. This should be sometime in June 2015. +These instructions are for HSA1.0F. Execute these commands: ``` cd ~/git -git clone -b kfd-v1.0.x https://github.com/HSAfoundation/HSA-Drivers-Linux-AMD.git -sudo dpkg -i HSA-Drivers-Linux-AMD/kfd-1.0/ubuntu/*.deb +git clone https://github.com/HSAfoundation/HSA-Drivers-Linux-AMD.git +sudo dpkg -i HSA-Drivers-Linux-AMD/kfd-1.2/ubuntu/*.deb echo "KERNEL==\"kfd\", MODE=\"0666\"" | sudo tee /etc/udev/rules.d/kfd.rules -sudo cp HSA-Drivers-Linux-AMD/kfd-1.0/libhsakmt/lnx64a/libhsakmt.so.1 /opt/hsa/lib +sudo cp HSA-Drivers-Linux-AMD/kfd-1.2/libhsakmt.so.1 /opt/hsa/lib ``` ## Reboot System @@ -95,75 +97,75 @@ If it does not detect a valid GPU ID (last two entries are NO), it is possible t 3. Install HSA Software ======================= -## Install HSA Runtime +## Install HSA 1.0F Runtime ``` mkdir ~/git cd ~/git -git clone -b release-v1.0 https://github.com/HSAfoundation/HSA-Runtime-AMD.git +git clone https://github.com/HSAfoundation/HSA-Runtime-AMD.git cd HSA-Runtime-AMD/ubuntu sudo dpkg -i hsa-runtime_1.0_amd64.deb ``` -## Install HSAIL Compiler (HLC) - -``` -cd ~/git -git clone https://github.com/HSAfoundation/HSAIL-HLC-Stable.git -cd HSAIL-HLC-Stable/ubuntu -sudo dpkg -i hsail-hlc-stable_1.0_amd64.deb -``` - ## Install and Test Cloc utility -As of Cloc version 0.8 the executable shell script names are changed to cloc.sh, snack.sh and snk_genw.sh. -These scripts need to be copied to a directory that is in users PATH. For example /usr/local/bin is typically in PATH. +As of Cloc version 0.9 the cl frontend clc2 and supporting LLVM 3.6 executables are stored in the same directory as the cloc.sh, snack.sh and snk_genw.sh shell scripts. These scripts need to be copied should be copied into /opt/amd/cloc/bin ``` cd ~/git -git clone -b 0.8 https://github.com/HSAfoundation/CLOC.git -sudo cp CLOC/bin/cloc.sh /usr/local/bin/. -sudo cp CLOC/bin/snack.sh /usr/local/bin/. -sudo cp CLOC/bin/snk_genw.sh /usr/local/bin/. -cd -cp -r git/CLOC/examples . -cd examples/snack/helloworld +git clone -b CLOC-0.9 https://github.com/HSAfoundation/CLOC.git +# Install +mkdir -p /opt/amd/cloc +sudo cp -rp ~/git/CLOC/bin /opt/amd/cloc +sudo cp -rp ~/git/CLOC/examples /opt/amd/cloc +sudo ln -sf /opt/amd/cloc/bin/cloc.sh /usr/local/bin/cloc.sh +sudo ln -sf /opt/amd/cloc/bin/snack.sh /usr/local/bin/snack.sh +sudo ln -sf /opt/amd/cloc/bin/printhsail /usr/local/bin/printhsail +# Test +cp -r /opt/amd/cloc/examples ~ +cd ~/examples/snack/helloworld ./buildrun.sh +cd ~/examples/hsa/vector_copy +make +make test ``` -## Install Kalmar compiler - -This was formerly known as C++AMP. This step is optional because it is not needed for Cloc. However this is becoming a very good HSA compiler. +## Set HSA environment variables +As of Cloc version 0.9, HSA_LLVM_PATH is no longer required because cloc.sh and snack.sh expect the binaries to be in the same directory where cloc.sh and snack.sh are stored. For testing other compilers or versions of the HSA LLVM binaries, you may set HSA_LLVM_PATH or use the -p option as noted in the help. The snack.sh script assumes HSA_RUNTIME_PATH is /opt/hsa. However, we recommend using LD_LIBRARY_PATH to find the current version of he HSA runtime as follows: ``` -mkdir ~/git/deb -cd ~/git/deb -wget https://bitbucket.org/multicoreware/cppamp-driver-ng/downloads/clamp-0.5.0-hsa-milestone4-Linux.deb -wget https://bitbucket.org/multicoreware/cppamp-driver-ng/downloads/libcxxamp-0.5.0-hsa-milestone4-Linux.deb -wget https://bitbucket.org/multicoreware/cppamp-driver-ng/downloads/clamp-bolt-1.2.0-hsa-milestone4-Linux.deb -wget https://bitbucket.org/multicoreware/cppamp-driver-ng/downloads/boost_1_55_0-hsa-milestone3.deb -sudo dpkg -i *.deb +export HSA_RUNTIME_PATH=/opt/hsa +export LD_LIBRARY_PATH=$HSA_RUNTIME_PATH/lib ``` -## Install Okra - -This step is also optional. It is not needed for Cloc. However, it is currently needed for the experimental version of gcc that supports OpenMP accelertion in HSA. +We recommend that cloc.sh, snack,sh, and printhsail be available in your path. You can symbolically link them or add to PATH as follows: ``` -cd ~/git -git clone https://github.com/HSAfoundation/Okra-Interface-to-HSA-Device -sudo mkdir /opt/amd/okra -sudo cp -r Okra-Interface-to-HSA-Device/okra /opt/amd -sudo cp Okra-Interface-to-HSA-Device/okra/dist/bin/libokra_x86_64.so /opt/hsa/lib/. +# +# Either put /opt/amd/cloc/bin in your PATH as follows +export PATH=$PATH:/opt/amd/cloc/bin +# +# OR symbolic link cloc.sh and snack.sh to system path +sudo ln -sf /opt/amd/cloc/bin/cloc.sh /usr/local/bin/cloc.sh +sudo ln -sf /opt/amd/cloc/bin/snack.sh /usr/local/bin/snack.sh +sudo ln -sf /opt/amd/cloc/bin/printhsail /usr/local/bin/printhsail ``` -## Set HSA environment variables +Future package installers (.deb and .rpm) will symbolically link them. -``` -export HSA_LLVM_PATH=/opt/amd/bin -export HSA_RUNTIME_PATH=/opt/hsa -export HSA_OKRA_PATH=/opt/amd/okra -export PATH=$PATH:/opt/amd/bin -export LD_LIBRARY_PATH=/opt/hsa/lib -``` +## Install Kalmar (C++AMP) HSA Compiler (OPTIONAL) + +SKIP THIS STEP TILL KALMAR IS PORTED TO 1.0F + +## Install gcc OpenMP for HSA Compiler (OPTIONAL) + +SKIP THIS STEP TILL IT IS PORTED TO 1.0F + +## Install Codeplay HSA Compiler (OPTIONAL) + +SKIP THIS STEP TILL IT IS PORTED TO 1.0F + +## Install Pathscale HSA Compiler (OPTIONAL) + +SKIP THIS STEP TILL IT IS PORTED TO 1.0F diff --git a/README.md b/README.md index 2b55f2b..f8dc7d6 100644 --- a/README.md +++ b/README.md @@ -1,5 +1,5 @@ -CLOC - Version 0.8.0 -==================== +CLOC - V 0.9.0 (HSA 1.0F) +========================= CLOC: CL Offline Compiler Generate HSAIL or brig from a cl (Kernel c Language) file. @@ -81,19 +81,23 @@ Software License Agreement. -t Default=/tmp/cloc$$, Temp dir for files -o Default=. ft=brig or hsail -opt Default=2, LLVM optimization level - -p Default=$HSA_LLVM_PATH or /opt/amd/bin + -p $HSA_LLVM_PATH or if HSA_LLVM_PATH not set + is actual directory of cloc.sh -clopts Default="-cl-std=CL2.0" - -lkopts Default="--prelink-opt -l $HSA_LLVM_PATH/builtins-hsail.bc" + -lkopts Default="-prelink-opt \ + -l /builtins-hsail.bc -l /builtins-gcn.bc \ + -l /builtins-hsail-amd-ci.bc -l /builtins-ocml.bc" Examples: - cloc my.cl /* create my.brig */ - cloc -hsail my.cl /* create my.hsail and my.brig */ + cloc.sh my.cl /* create my.brig */ + cloc.sh -hsail my.cl /* create my.hsail and my.brig */ - You may set environment variables LLVMOPT, HSA_LLVM_PATH, CLOPTS, or - LKOPTS instead of providing options -opt -p, -clopts, or -lkopts . + You may set environment variables LLVMOPT, HSA_LLVM_PATH, CLOPTS, + or LKOPTS instead of providing options -opt -p, -clopts, or -lkopts . Command line options will take precedence over environment variables. Copyright (c) 2015 ADVANCED MICRO DEVICES, INC. + ``` @@ -126,22 +130,24 @@ Software License Agreement. -gccopt Default=2, gcc optimization for snack wrapper -t Default=/tmp/snk_$$, Temp dir for files -s Default=filename - -p1 Default=$HSA_LLVM_PATH or /opt/amd/bin - -p2 Default=$HSA_RUNTIME_PATH or /opt/hsa + -p $HSA_LLVM_PATH or if HSA_LLVM_PATH not set + is actual directory of snack.sh + -rp Default=$HSA_RUNTIME_PATH or /opt/hsa -o Default=. Examples: - snack my.cl /* create my.snackwrap.c and my.h */ - snack -c my.cl /* gcc compile to create my.o */ - snack -hsail my.cl /* create hsail and snackwrap.c */ - snack -c -hsail my.cl /* create hsail snackwrap.c and .o */ - snack -t /tmp/foo my.cl /* will automatically set -k */ + snack.sh my.cl /* create my.snackwrap.c and my.h */ + snack.sh -c my.cl /* gcc compile to create my.o */ + snack.sh -hsail my.cl /* create hsail and snackwrap.c */ + snack.sh -c -hsail my.cl /* create hsail snackwrap.c and .o */ + snack.sh -t /tmp/foo my.cl /* will automatically set -k */ You may set environment variables HSA_LLVM_PATH, HSA_RUNTIME_PATH, - instead of providing options -p1, -p2. + instead of providing options -p, -rp. Command line options will take precedence over environment variables. Copyright (c) 2015 ADVANCED MICRO DEVICES, INC. + ``` @@ -221,7 +227,7 @@ manual updates to HSAIL. This process has two steps. The first step compiles the .cl file into the object code needed by a SNACK application. For example, if your kernels are in the file myKernels.cl, then you can run step 1 as follows. ``` - snack -c -hsail myKernels.cl + snack.sh -c -hsail myKernels.cl ``` When cloc sees the "-c" option and the "-hsail" option, it will save four files in the same directory as myKernels.cl file. The first two files are always created diff --git a/bin/Dev b/bin/Dev new file mode 100755 index 0000000..fee5d9b Binary files /dev/null and b/bin/Dev differ diff --git a/bin/HSAILTestGen b/bin/HSAILTestGen new file mode 100755 index 0000000..c9a86ed Binary files /dev/null and b/bin/HSAILTestGen differ diff --git a/bin/builtins-gcn.bc b/bin/builtins-gcn.bc new file mode 100644 index 0000000..6c66f5e Binary files /dev/null and b/bin/builtins-gcn.bc differ diff --git a/bin/builtins-hsail-amd-ci.bc b/bin/builtins-hsail-amd-ci.bc new file mode 100644 index 0000000..7556132 Binary files /dev/null and b/bin/builtins-hsail-amd-ci.bc differ diff --git a/bin/builtins-hsail.bc b/bin/builtins-hsail.bc new file mode 100644 index 0000000..150faa9 Binary files /dev/null and b/bin/builtins-hsail.bc differ diff --git a/bin/clang-tblgen b/bin/clang-tblgen new file mode 100755 index 0000000..ef282f7 Binary files /dev/null and b/bin/clang-tblgen differ diff --git a/bin/clc2 b/bin/clc2 new file mode 100755 index 0000000..22c1151 Binary files /dev/null and b/bin/clc2 differ diff --git a/bin/cloc.sh b/bin/cloc.sh index fcb473a..e6945ae 100755 --- a/bin/cloc.sh +++ b/bin/cloc.sh @@ -6,7 +6,7 @@ # Written by Greg Rodgers Gregory.Rodgers@amd.com # Maintained by Shreyas Ramalingam Shreyas.Ramalingam@amd.com # -PROGVERSION=0.8.0 +PROGVERSION=0.9.0 # # Copyright (c) 2014 ADVANCED MICRO DEVICES, INC. # @@ -67,16 +67,19 @@ function usage(){ -t Default=/tmp/cloc$$, Temp dir for files -o Default=. ft=brig or hsail -opt Default=2, LLVM optimization level - -p Default=$HSA_LLVM_PATH or /opt/amd/bin + -p $HSA_LLVM_PATH or if HSA_LLVM_PATH not set + is actual directory of cloc.sh -clopts Default="-cl-std=CL2.0" - -lkopts Default="--prelink-opt -l $HSA_LLVM_PATH/builtins-hsail.bc" + -lkopts Default="-prelink-opt \ + -l /builtins-hsail.bc -l /builtins-gcn.bc \ + -l /builtins-hsail-amd-ci.bc" Examples: - cloc my.cl /* create my.brig */ - cloc -hsail my.cl /* create my.hsail and my.brig */ + cloc.sh my.cl /* create my.brig */ + cloc.sh -hsail my.cl /* create my.hsail and my.brig */ - You may set environment variables LLVMOPT, HSA_LLVM_PATH, CLOPTS, or - LKOPTS instead of providing options -opt -p, -clopts, or -lkopts . + You may set environment variables LLVMOPT, HSA_LLVM_PATH, CLOPTS, + or LKOPTS instead of providing options -opt -p, -clopts, or -lkopts . Command line options will take precedence over environment variables. Copyright (c) 2015 ADVANCED MICRO DEVICES, INC. @@ -161,24 +164,26 @@ fi if [ ! -z $1 ]; then echo " " - echo "WARNING: Cloc can only process one .cl file at a time." + echo "WARNING: cloc.sh can only process one .cl file at a time." echo " You can call cloc multiple times to get multiple outputs." echo " Argument $LASTARG will be processed. " echo " These args are ignored: $@" echo " " fi -# We no longer need CLOCPATH (no _genw). -# In future we expect cloc.sh to be in $HSA_LLVM_PATH -# CLOCPATH=$(getdname $0) +# All binaries and builtins are expected to be in the same directory as cloc.sh +# unless HSA_LLVM_PATH is set. +cdir=$(getdname $0) +[ ! -L "$cdir/cloc.sh" ] || cdir=$(getdname `readlink "$cdir/cloc.sh"`) +# If HSA_LLVM_PATH is set use it, else use cdir +HSA_LLVM_PATH=${HSA_LLVM_PATH:-$cdir} # Set Default values, all CMD_ are started from $HSA_LLVM_PATH LLVMOPT=${LLVMOPT:-2} -HSA_LLVM_PATH=${HSA_LLVM_PATH:-/opt/amd/bin} # no default CLOPTS -cl-std=CL2.0 is a forced option to the clc2 command CMD_CLC=${CMD_CLC:-clc2 -cl-std=CL2.0 $CLOPTS} CMD_LLA=${CMD_LLA:-llvm-dis} -LKOPTS=${LKOPTS:--prelink-opt -l $HSA_LLVM_PATH/builtins-hsail.bc} +LKOPTS=${LKOPTS:--prelink-opt -l $HSA_LLVM_PATH/builtins-hsail.bc -l $HSA_LLVM_PATH/builtins-gcn.bc -l $HSA_LLVM_PATH/builtins-hsail-amd-ci.bc} CMD_LLL=${CMD_LLL:-llvm-link $LKOPTS} CMD_OPT=${CMD_OPT:-opt -O$LLVMOPT -gpu -whole} CMD_LLC=${CMD_LLC:-llc -O$LLVMOPT -march=hsail-64 -filetype=obj} @@ -196,11 +201,6 @@ if [ ! -e "$LASTARG" ] ; then echo "ERROR: The file $LASTARG does not exist." exit $DEADRC fi -if [ ! -d $HSA_LLVM_PATH ] ; then - echo "ERROR: Missing directory $HSA_LLVM_PATH " - echo " Set env variable HSA_LLVM_PATH or use -p option" - exit $DEADRC -fi # Parse LASTARG for directory, filename, and symbolname INDIR=$(getdname $LASTARG) @@ -241,8 +241,7 @@ if [ ! -d $TMPDIR ] && [ ! $DRYRUN ] ; then exit $DEADRC fi if [ ! -e $HSA_LLVM_PATH/hsailasm ] ; then - echo "ERROR: Missing hsailasm in $HSA_LLVM_PATH" - echo " Set env variable HSA_LLVM_PATH or use -p option" + echo "ERROR: Missing binary hsailasm in $HSA_LLVM_PATH" exit $DEADRC fi if [ ! -d $OUTDIR ] && [ ! $DRYRUN ] ; then @@ -298,13 +297,12 @@ fi if [ $DRYRUN ] ; then echo $CMD_LLL -o $TMPDIR/$FNAME.lnkd.bc $TMPDIR/$FNAME.bc else -# Hide the warnings for now - $HSA_LLVM_PATH/$CMD_LLL -o $TMPDIR/$FNAME.lnkd.bc $TMPDIR/$FNAME.bc 2>/dev/null + $HSA_LLVM_PATH/$CMD_LLL -o $TMPDIR/$FNAME.lnkd.bc $TMPDIR/$FNAME.bc rc=$? fi if [ $rc != 0 ] ; then echo "ERROR: The following command failed with return code $rc." - echo " $CMD_LLL -o $TMPDIR/$FNAME.lnkd.bc $TMPDIR/$FNAME.bc" + echo " $HSA_LLVM_PATH/$CMD_LLL -o $TMPDIR/$FNAME.lnkd.bc $TMPDIR/$FNAME.bc" do_err $rc fi diff --git a/bin/hsailasm b/bin/hsailasm new file mode 100755 index 0000000..50958ba Binary files /dev/null and b/bin/hsailasm differ diff --git a/bin/inflate b/bin/inflate new file mode 100755 index 0000000..c08f80a Binary files /dev/null and b/bin/inflate differ diff --git a/bin/llc b/bin/llc new file mode 100755 index 0000000..51e45a1 Binary files /dev/null and b/bin/llc differ diff --git a/bin/llvm-ar b/bin/llvm-ar new file mode 100755 index 0000000..ebf398f Binary files /dev/null and b/bin/llvm-ar differ diff --git a/bin/llvm-as b/bin/llvm-as new file mode 100755 index 0000000..023e70a Binary files /dev/null and b/bin/llvm-as differ diff --git a/bin/llvm-dis b/bin/llvm-dis new file mode 100755 index 0000000..1d03501 Binary files /dev/null and b/bin/llvm-dis differ diff --git a/bin/llvm-dwarfdump b/bin/llvm-dwarfdump new file mode 100755 index 0000000..5113071 Binary files /dev/null and b/bin/llvm-dwarfdump differ diff --git a/bin/llvm-extract b/bin/llvm-extract new file mode 100755 index 0000000..c6b1606 Binary files /dev/null and b/bin/llvm-extract differ diff --git a/bin/llvm-link b/bin/llvm-link new file mode 100755 index 0000000..1e66fcc Binary files /dev/null and b/bin/llvm-link differ diff --git a/bin/llvm-nm b/bin/llvm-nm new file mode 100755 index 0000000..4671bbf Binary files /dev/null and b/bin/llvm-nm differ diff --git a/bin/llvm-ranlib b/bin/llvm-ranlib new file mode 100755 index 0000000..3dfd3f8 Binary files /dev/null and b/bin/llvm-ranlib differ diff --git a/bin/opt b/bin/opt new file mode 100755 index 0000000..30e74eb Binary files /dev/null and b/bin/opt differ diff --git a/bin/printhsail b/bin/printhsail index d62d113..94d5d3e 100755 --- a/bin/printhsail +++ b/bin/printhsail @@ -5,6 +5,27 @@ # # Written by Greg Rodgers. # + +function getdname(){ + local __DIRN=`dirname "$1"` + if [ "$__DIRN" = "." ] ; then + __DIRN=$PWD; + else + if [ ${__DIRN:0:1} != "/" ] ; then + if [ ${__DIRN:0:2} == ".." ] ; then + __DIRN=`dirname $PWD`/${__DIRN:3} + else + if [ ${__DIRN:0:1} = "." ] ; then + __DIRN=$PWD/${__DIRN:2} + else + __DIRN=$PWD/$__DIRN + fi + fi + fi + fi + echo $__DIRN +} + infile=$1 tmpfile=/tmp/printhsail$$ cloc_brigsym="HSA_BrigMem" @@ -12,7 +33,12 @@ cloc_brigszsym="HSA_BrigMemSz" kalmar_brigsym="_binary_kernel_brig_start" kalmar_brigszsym="_binary_kernel_brig_size" -HSAIL_DISASSEMBLE=${HSAIL_DISASSEMBLE:-/opt/amd/bin/hsailasm} +cdir=$(getdname $0) +[ ! -L "$cdir/cloc.sh" ] || cdir=$(getdname `readlink "$cdir/cloc.sh"`) +HSA_LLVM_PATH=${HSA_LLVM_PATH:-$cdir} + +HSAIL_DISASSEMBLE=${HSAIL_DISASSEMBLE:-$HSA_LLVM_PATH/hsailasm} + if [ ! -f "$infile" ] ; then echo "ERROR: Input file \"$1\" does not exist " exit 1 diff --git a/bin/snack.sh b/bin/snack.sh index 9cd8b81..44427d5 100755 --- a/bin/snack.sh +++ b/bin/snack.sh @@ -23,7 +23,7 @@ # Written by Greg Rodgers Gregory.Rodgers@amd.com # Maintained by Shreyas Ramalingam Shreyas.Ramalingam@amd.com # -PROGVERSION=0.8.0 +PROGVERSION=0.9.0 # # Copyright (c) 2015 ADVANCED MICRO DEVICES, INC. Patent pending. # @@ -92,19 +92,20 @@ function usage(){ -gccopt Default=2, gcc optimization for snack wrapper -t Default=/tmp/snk_$$, Temp dir for files -s Default=filename - -p1 Default=$HSA_LLVM_PATH or /opt/amd/bin - -p2 Default=$HSA_RUNTIME_PATH or /opt/hsa + -p $HSA_LLVM_PATH or if HSA_LLVM_PATH not set + is actual directory of snack.sh + -rp Default=$HSA_RUNTIME_PATH or /opt/hsa -o Default=. Examples: - snack my.cl /* create my.snackwrap.c and my.h */ - snack -c my.cl /* gcc compile to create my.o */ - snack -hsail my.cl /* create hsail and snackwrap.c */ - snack -c -hsail my.cl /* create hsail snackwrap.c and .o */ - snack -t /tmp/foo my.cl /* will automatically set -k */ + snack.sh my.cl /* create my.snackwrap.c and my.h */ + snack.sh -c my.cl /* gcc compile to create my.o */ + snack.sh -hsail my.cl /* create hsail and snackwrap.c */ + snack.sh -c -hsail my.cl /* create hsail snackwrap.c and .o */ + snack.sh -t /tmp/foo my.cl /* will automatically set -k */ You may set environment variables HSA_LLVM_PATH, HSA_RUNTIME_PATH, - instead of providing options -p1, -p2. + instead of providing options -p, -rp. Command line options will take precedence over environment variables. Copyright (c) 2015 ADVANCED MICRO DEVICES, INC. @@ -167,8 +168,8 @@ while [ $# -gt 0 ] ; do -s) SYMBOLNAME=$2; shift ;; -o) OUTFILE=$2; shift ;; -t) TMPDIR=$2; shift ;; - -p1) HSA_LLVM_PATH=$2; shift ;; - -p2) HSA_RUNTIME_PATH=$2; shift ;; + -p) HSA_LLVM_PATH=$2; shift ;; + -rp) HSA_RUNTIME_PATH=$2; shift ;; -h) usage ;; -help) usage ;; --help) usage ;; @@ -201,14 +202,14 @@ if [ ! -z $1 ]; then echo " " fi -CLOCPATH=$(getdname $0) +sdir=$(getdname $0) +[ ! -L "$sdir/snack.sh" ] || sdir=$(getdname `readlink "$sdir/snack.sh"`) +HSA_LLVM_PATH=${HSA_LLVM_PATH:-$sdir} # Set Default values GCCOPT=${GCCOPT:-3} LLVMOPT=${LLVMOPT:-2} HSA_RUNTIME_PATH=${HSA_RUNTIME_PATH:-/opt/hsa} -HSA_LLVM_PATH=${HSA_LLVM_PATH:-/opt/amd/bin} -LKOPTS=${LKOPTS:--prelink-opt -l $HSA_LLVM_PATH/builtins-hsail.bc} CMD_BRI=${CMD_BRI:-hsailasm } FORTRAN=${FORTRAN:-0}; @@ -236,19 +237,18 @@ if [ ! -e "$LASTARG" ] ; then fi if [ ! -d $HSA_LLVM_PATH ] ; then echo "ERROR: Missing directory $HSA_LLVM_PATH " - echo " Set env variable HSA_LLVM_PATH or use -p1 option" + echo " Set env variable HSA_LLVM_PATH or use -p option" exit $DEADRC fi -# We need RUNTIME with -c option -if [ ! -d $HSA_RUNTIME_PATH ] ; then - echo "ERROR: Snack needs HSA_RUNTIME_PATH" - echo " Missing directory $HSA_RUNTIME_PATH " - echo " Set env variable HSA_RUNTIME_PATH or use -p2 option" +if [ $MAKEOBJ ] && [ ! -d "$HSA_RUNTIME_PATH/lib" ] ; then + echo "ERROR: snack.sh -c option needs HSA_RUNTIME_PATH" + echo " Missing directory $HSA_RUNTIME_PATH/lib " + echo " Set env variable HSA_RUNTIME_PATH or use -rp option" exit $DEADRC fi -if [ ! -f $HSA_RUNTIME_PATH/include/hsa.h ] ; then +if [ $MAKEOBJ ] && [ ! -f $HSA_RUNTIME_PATH/include/hsa.h ] ; then echo "ERROR: Missing $HSA_RUNTIME_PATH/include/hsa.h" - echo " The -c option requires HSA includes" + echo " snack.sh requires HSA includes" exit $DEADRC fi @@ -323,7 +323,7 @@ if [ ! -d $TMPDIR ] && [ ! $DRYRUN ] ; then fi if [ ! -e $HSA_LLVM_PATH/hsailasm ] ; then echo "ERROR: Missing hsailasm in $HSA_LLVM_PATH" - echo " Set env variable HSA_LLVM_PATH or use -p1 option" + echo " Set env variable HSA_LLVM_PATH or use -p option" exit $DEADRC fi if [ ! -d $OUTDIR ] && [ ! $DRYRUN ] ; then @@ -417,13 +417,13 @@ else # Not step 2, do normal steps [ $VERBOSE ] && echo "#Step: genw cl --> $FNAME.snackwrap.c + $FNAME.h ..." if [ $DRYRUN ] ; then - echo "$CLOCPATH/snk_genw.sh $SYMBOLNAME $INDIR/$CLNAME $PROGVERSION $TMPDIR $CWRAPFILE $OUTDIR/$FNAME.h $TMPDIR/updated.cl $FORTRAN $NOGLOBFUNS" + echo "$HSA_LLVM_PATH/snk_genw.sh $SYMBOLNAME $INDIR/$CLNAME $PROGVERSION $TMPDIR $CWRAPFILE $OUTDIR/$FNAME.h $TMPDIR/updated.cl $FORTRAN $NOGLOBFUNS" else - $CLOCPATH/snk_genw.sh $SYMBOLNAME $INDIR/$CLNAME $PROGVERSION $TMPDIR $CWRAPFILE $OUTDIR/$FNAME.h $TMPDIR/updated.cl $FORTRAN $NOGLOBFUNS + $HSA_LLVM_PATH/snk_genw.sh $SYMBOLNAME $INDIR/$CLNAME $PROGVERSION $TMPDIR $CWRAPFILE $OUTDIR/$FNAME.h $TMPDIR/updated.cl $FORTRAN $NOGLOBFUNS rc=$? if [ $rc != 0 ] ; then echo "ERROR: The following command failed with return code $rc." - echo " $CLOCPATH/snk_genw.sh $SYMBOLNAME $INDIR/$CLNAME $PROGVERSION $TMPDIR $CWRAPFILE $OUTDIR/$FNAME.h $TMPDIR/updated.cl $FORTRAN $NOGLOBFUNS" + echo " $HSA_LLVM_PATH/snk_genw.sh $SYMBOLNAME $INDIR/$CLNAME $PROGVERSION $TMPDIR $CWRAPFILE $OUTDIR/$FNAME.h $TMPDIR/updated.cl $FORTRAN $NOGLOBFUNS" do_err $rc fi fi @@ -434,16 +434,16 @@ else fi [ $VERBOSE ] && echo "#Step: cloc.sh cl --> brig ..." if [ $DRYRUN ] ; then - echo "$CLOCPATH/cloc.sh -t $TMPDIR -k -clopts ""-I$INDIR"" $OTHERCLOCFLAGS $TMPDIR/updated.cl" + echo "$HSA_LLVM_PATH/cloc.sh -t $TMPDIR -k -clopts ""-I$INDIR"" $OTHERCLOCFLAGS $TMPDIR/updated.cl" else [ $CLOCVERBOSE ] && echo " " && echo "#------ Start cloc.sh output ------" - [ $CLOCVERBOSE ] && echo "$CLOCPATH/cloc.sh -t $TMPDIR -k -clopts "-I$INDIR" $OTHERCLOCFLAGS $TMPDIR/updated.cl" - $CLOCPATH/cloc.sh -t $TMPDIR -k -clopts "-I$INDIR" $OTHERCLOCFLAGS $TMPDIR/updated.cl + [ $CLOCVERBOSE ] && echo "$HSA_LLVM_PATH/cloc.sh -t $TMPDIR -k -clopts "-I$INDIR" $OTHERCLOCFLAGS $TMPDIR/updated.cl" + $HSA_LLVM_PATH/cloc.sh -t $TMPDIR -k -clopts "-I$INDIR" $OTHERCLOCFLAGS $TMPDIR/updated.cl rc=$? [ $CLOCVERBOSE ] && echo "#------ End cloc.sh output ------" && echo " " if [ $rc != 0 ] ; then echo "ERROR: cloc.sh failed with return code $rc. Command was:" - echo " $CLOCPATH/cloc.sh -t $TMPDIR -k -clopts "-I$INDIR" $OTHERCLOCFLAGS $TMPDIR/updated.cl" + echo " $HSA_LLVM_PATH/cloc.sh -t $TMPDIR -k -clopts "-I$INDIR" $OTHERCLOCFLAGS $TMPDIR/updated.cl" do_err $rc fi if [ $GEN_IL ] ; then @@ -496,7 +496,7 @@ else if [ $DRYRUN ] ; then echo "hexdump -v -e '""0x"" 1/1 ""%02X"" "",""' $BRIGDIR/$BRIGNAME " else - echo "char HSA_BrigMem[] = {" > $FULLBRIGHFILE + echo "char ${SYMBOLNAME}_HSA_BrigMem[] = {" > $FULLBRIGHFILE hexdump -v -e '"0x" 1/1 "%02X" ","' $BRIGDIR/$BRIGNAME >> $FULLBRIGHFILE rc=$? if [ $rc != 0 ] ; then @@ -504,7 +504,7 @@ else exit $rc fi echo "};" >> $FULLBRIGHFILE - echo "size_t HSA_BrigMemSz = sizeof(HSA_BrigMem);" >> $FULLBRIGHFILE + echo "size_t ${SYMBOLNAME}_HSA_BrigMemSz = sizeof(${SYMBOLNAME}_HSA_BrigMem);" >> $FULLBRIGHFILE fi @@ -535,7 +535,7 @@ if [ ! $KEEPTDIR ] ; then fi fi -[ $GEN_IL ] && [ $VERBOSE ] && echo " " && echo "#WARN: ***** For Step 2, Make hsail updates then run \"snack -c $FNAME.hsail \" ***** " +[ $GEN_IL ] && [ $VERBOSE ] && echo " " && echo "#WARN: ***** For Step 2, Make hsail updates then run \"snack.sh -c $FNAME.hsail \" ***** " [ $VERBOSE ] && echo "#Info: Done" exit 0 diff --git a/bin/snk_genw.sh b/bin/snk_genw.sh index fc7be1f..3411640 100755 --- a/bin/snk_genw.sh +++ b/bin/snk_genw.sh @@ -93,7 +93,6 @@ function write_copyright_template(){ #include #include #include -#include #include "hsa.h" #include "hsa_ext_finalize.h" @@ -104,13 +103,7 @@ function write_copyright_template(){ typedef enum status_t status_t; enum status_t { STATUS_SUCCESS=0, - STATUS_KERNEL_INVALID_SECTION_HEADER=1, - STATUS_KERNEL_ELF_INITIALIZATION_FAILED=2, - STATUS_KERNEL_INVALID_ELF_CONTAINER=3, - STATUS_KERNEL_MISSING_DATA_SECTION=4, - STATUS_KERNEL_MISSING_CODE_SECTION=5, - STATUS_KERNEL_MISSING_OPERAND_SECTION=6, - STATUS_UNKNOWN=7, + STATUS_UNKNOWN=1 }; EOF } @@ -127,33 +120,33 @@ function write_header_template(){ #define SNK_MAX_STREAMS 8 extern _CPPSTRING_ void stream_sync(const int stream_num); -#define SNK_MAXEDGESIN 10 -#define SNK_MAXEDGESOUT 10 #define SNK_ORDERED 1 #define SNK_UNORDERED 0 -#define SNK_GPU 0 -#define SNK_SIM 1 -#define SNK_CPU 2 + +#include +#ifndef HSA_RUNTIME_INC_HSA_H_ +typedef struct hsa_signal_s { uint64_t handle; } hsa_signal_t; +#endif + +typedef struct snk_task_s snk_task_t; +struct snk_task_s { + hsa_signal_t signal ; + snk_task_t* next; +}; typedef struct snk_lparm_s snk_lparm_t; struct snk_lparm_s { - int ndim; /* default = 1 */ - size_t gdims[3]; /* NUMBER OF THREADS TO EXECUTE MUST BE SPECIFIED */ - size_t ldims[3]; /* Default = {64} , e.g. 1 of 8 CU on Kaveri */ - int stream; /* default = -1 , synchrnous */ - int barrier; /* default = SNK_ORDERED */ - int acquire_fence_scope; /* default = 2 */ - int release_fence_scope; /* default = 2 */ - int num_edges_in; /* not yet implemented */ - int num_edges_out; /* not yet implemented */ - int * edges_in; /* not yet implemented */ - int * edges_out; /* not yet implemented */ - int devtype; /* not yet implemented-default=SNK_GPU */ - int rank; /* not yet implemented-used for MPI work sharing */ + int ndim; /* default = 1 */ + size_t gdims[3]; /* NUMBER OF THREADS TO EXECUTE MUST BE SPECIFIED */ + size_t ldims[3]; /* Default = {64} , e.g. 1 of 8 CU on Kaveri */ + int stream; /* default = -1 , synchrnous */ + int barrier; /* default = SNK_UNORDERED */ + int acquire_fence_scope; /* default = 2 */ + int release_fence_scope; /* default = 2 */ } ; /* This string macro is used to declare launch parameters set default values */ -#define SNK_INIT_LPARM(X,Y) snk_lparm_t * X ; snk_lparm_t _ ## X ={.ndim=1,.gdims={Y},.ldims={64},.stream=-1,.barrier=SNK_ORDERED,.acquire_fence_scope=2,.release_fence_scope=2,.num_edges_in=0,.num_edges_out=0,.edges_in=NULL,.edges_out=NULL,.devtype=SNK_GPU,.rank=0} ; X = &_ ## X ; +#define SNK_INIT_LPARM(X,Y) snk_lparm_t * X ; snk_lparm_t _ ## X ={.ndim=1,.gdims={Y},.ldims={64},.stream=-1,.barrier=SNK_UNORDERED,.acquire_fence_scope=2,.release_fence_scope=2} ; X = &_ ## X ; /* Equivalent host data types for kernel data types */ typedef struct snk_image3d_s snk_image3d_t; @@ -173,133 +166,121 @@ EOF function write_global_functions_template(){ /bin/cat <<"EOF" -extern void stream_sync(int stream_num) { - - hsa_queue_t *queue = Stream_CommandQ[stream_num]; - hsa_signal_t signal = Stream_Signal[stream_num]; - - hsa_barrier_packet_t barrier; - memset (&barrier, 0, sizeof(hsa_barrier_packet_t)); - barrier.header.type=HSA_PACKET_TYPE_BARRIER; - barrier.header.acquire_fence_scope=2; - barrier.header.release_fence_scope=2; - barrier.header.barrier=1; - barrier.completion_signal = signal; - - uint64_t index = hsa_queue_load_write_index_relaxed(queue); - const uint32_t queue_mask = queue->size - 1; - ((hsa_barrier_packet_t*)(queue->base_address))[index&queue_mask]=barrier; - hsa_queue_store_write_index_relaxed(queue,index+1); - //Ring the doorbell. - hsa_signal_store_relaxed(queue->doorbell_signal, index); - - //Wait for completion signal - /* printf("DEBUG STREAM_SYNC:Call #%d for stream %d \n",(int) index,stream_num); */ - hsa_signal_wait_acquire(signal, HSA_LT, 1, (uint64_t) -1, HSA_WAIT_EXPECTANCY_UNKNOWN); +void packet_store_release(uint32_t* packet, uint16_t header, uint16_t rest){ + __atomic_store_n(packet,header|(rest<<16),__ATOMIC_RELEASE); } - -EOF +uint16_t header(hsa_packet_type_t type) { + uint16_t header = type << HSA_PACKET_HEADER_TYPE; + header |= HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_ACQUIRE_FENCE_SCOPE; + header |= HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_RELEASE_FENCE_SCOPE; + return header; } -function write_context_template(){ -/bin/cat <<"EOF" -static Elf_Scn* snk_extract_elf_sect (Elf *elfP, Elf_Data *secHdr, char const *brigName, char const *bifName) { - int cnt = 0; - Elf_Scn* scn = NULL; - Elf32_Shdr* shdr = NULL; - char* sectionName = NULL; - - /* Iterate thru the elf sections */ - for (cnt = 1, scn = NULL; scn = elf_nextscn(elfP, scn); cnt++) { - if (((shdr = elf32_getshdr(scn)) == NULL)) { - return NULL; - } - sectionName = (char *)secHdr->d_buf + shdr->sh_name; - if (sectionName && - ((strcmp(sectionName, brigName) == 0) || - (strcmp(sectionName, bifName) == 0))) { - return scn; - } - } - - return NULL; -} +void barrier_sync(int stream_num, snk_task_t *dep_task_list) { + /* This routine will wait for all dependent packets to complete + irrespective of their queue number. This will put a barrier packet in the + stream belonging to the current packet. + */ -/* Extract section and copy into HsaBrig */ -static status_t snk_CopyElfSectToModule (Elf *elfP, Elf_Data *secHdr, char const *brigName, char const *bifName, - hsa_ext_brig_module_t* brig_module, - hsa_ext_brig_section_id_t section_id) { - Elf_Scn* scn = NULL; - Elf_Data* data = NULL; - void* address_to_copy; - size_t section_size=0; + if(stream_num < 0 || dep_task_list == NULL) return; - scn = snk_extract_elf_sect(elfP, secHdr, brigName, bifName); + hsa_queue_t *queue = Stream_CommandQ[stream_num]; + int dep_task_count = 0; + snk_task_t *head = dep_task_list; + while(head != NULL) { + dep_task_count++; + head = head->next; + } - if (scn) { - if ((data = elf_getdata(scn, NULL)) == NULL) { - return STATUS_UNKNOWN; + /* Keep adding barrier packets in multiples of 5 because that is the maximum signals that + the HSA barrier packet can support today + */ + snk_task_t *tasks = dep_task_list; + hsa_signal_t signal; + hsa_signal_create(1, 0, NULL, &signal); + const int HSA_BARRIER_MAX_DEPENDENT_TASKS = 5; + /* round up */ + int barrier_pkt_count = (dep_task_count + HSA_BARRIER_MAX_DEPENDENT_TASKS - 1) / HSA_BARRIER_MAX_DEPENDENT_TASKS; + int barrier_pkt_id = 0; + for(barrier_pkt_id = 0; barrier_pkt_id < barrier_pkt_count; barrier_pkt_id++) { + /* Obtain the write index for the command queue for this stream. */ + uint64_t index = hsa_queue_load_write_index_relaxed(queue); + const uint32_t queueMask = queue->size - 1; + + /* Define the barrier packet to be at the calculated queue index address. */ + hsa_barrier_and_packet_t* barrier = &(((hsa_barrier_and_packet_t*)(queue->base_address))[index&queueMask]); + memset(barrier, 0, sizeof(hsa_barrier_and_packet_t)); + barrier->header |= HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_ACQUIRE_FENCE_SCOPE; + barrier->header |= HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_RELEASE_FENCE_SCOPE; + barrier->header |= 0 << HSA_PACKET_HEADER_BARRIER; + barrier->header |= HSA_PACKET_TYPE_BARRIER_AND << HSA_PACKET_HEADER_TYPE; + + /* populate all dep_signals */ + int dep_signal_id = 0; + for(dep_signal_id = 0; dep_signal_id < HSA_BARRIER_MAX_DEPENDENT_TASKS; dep_signal_id++) { + if(tasks != NULL) { + /* fill out the barrier packet and ring doorbell */ + barrier->dep_signal[dep_signal_id] = tasks->signal; + tasks = tasks->next; + } } - section_size = data->d_size; - if (section_size > 0) { - address_to_copy = malloc(section_size); - memcpy(address_to_copy, data->d_buf, section_size); + if(tasks == NULL) { + /* reached the end of task list */ + barrier->header |= 1 << HSA_PACKET_HEADER_BARRIER; + barrier->completion_signal = signal; } + /* Increment write index and ring doorbell to dispatch the kernel. */ + hsa_queue_store_write_index_relaxed(queue, index+1); + hsa_signal_store_relaxed(queue->doorbell_signal, index); + //printf("barrier pkt submitted: %d\n", barrier_pkt_id); } - if ((!scn || section_size == 0)) { return STATUS_UNKNOWN; } + /* Wait on completion signal til kernel is finished. */ + hsa_signal_wait_acquire(signal, HSA_SIGNAL_CONDITION_LT, 1, UINT64_MAX, HSA_WAIT_STATE_BLOCKED); - /* Create a section header */ - brig_module->section[section_id] = (hsa_ext_brig_section_header_t*) address_to_copy; + hsa_signal_destroy(signal); +} - return STATUS_SUCCESS; -} - -/* Reads binary of BRIG and BIF format */ -static status_t snk_ReadBinary(hsa_ext_brig_module_t **brig_module_t, char* binary, size_t binsz) { - /* Create the brig_module */ - uint32_t number_of_sections = 3; - hsa_ext_brig_module_t* brig_module; - - brig_module = (hsa_ext_brig_module_t*) - (malloc (sizeof(hsa_ext_brig_module_t) + sizeof(void*)*number_of_sections)); - brig_module->section_count = number_of_sections; - - status_t status; - Elf* elfP = NULL; - Elf32_Ehdr* ehdr = NULL; - Elf_Data *secHdr = NULL; - Elf_Scn* scn = NULL; - - if (elf_version ( EV_CURRENT ) == EV_NONE) { return STATUS_KERNEL_ELF_INITIALIZATION_FAILED; } - if ((elfP = elf_memory(binary,binsz)) == NULL) { return STATUS_KERNEL_INVALID_ELF_CONTAINER; } - if (elf_kind (elfP) != ELF_K_ELF) { return STATUS_KERNEL_INVALID_ELF_CONTAINER; } - - if (((ehdr = elf32_getehdr(elfP)) == NULL) || - ((scn = elf_getscn(elfP, ehdr->e_shstrndx)) == NULL) || - ((secHdr = elf_getdata(scn, NULL)) == NULL)) { - return STATUS_KERNEL_INVALID_SECTION_HEADER; - } +extern void stream_sync(int stream_num) { + /* This is a user-callable function that puts a barrier packet into a queue where + all former dispatch packets were put on the queue for asynchronous asynchrnous + executions. This routine will wait for all packets to complete on this queue. + */ - status = snk_CopyElfSectToModule(elfP, secHdr,"hsa_data",".brig_hsa_data", - brig_module, HSA_EXT_BRIG_SECTION_DATA); - if (status != STATUS_SUCCESS) { return STATUS_KERNEL_MISSING_DATA_SECTION; } + hsa_queue_t *queue = Stream_CommandQ[stream_num]; - status = snk_CopyElfSectToModule(elfP, secHdr, "hsa_code",".brig_hsa_code", - brig_module, HSA_EXT_BRIG_SECTION_CODE); - if (status != STATUS_SUCCESS) { return STATUS_KERNEL_MISSING_CODE_SECTION; } + hsa_signal_t signal; + hsa_signal_create(1, 0, NULL, &signal); + + /* Obtain the write index for the command queue for this stream. */ + uint64_t index = hsa_queue_load_write_index_relaxed(queue); + const uint32_t queueMask = queue->size - 1; + + /* Define the barrier packet to be at the calculated queue index address. */ + hsa_barrier_or_packet_t* barrier = &(((hsa_barrier_or_packet_t*)(queue->base_address))[index&queueMask]); + memset(barrier, 0, sizeof(hsa_barrier_or_packet_t)); + barrier->header |= HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_ACQUIRE_FENCE_SCOPE; + barrier->header |= HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_RELEASE_FENCE_SCOPE; + barrier->header |= 1 << HSA_PACKET_HEADER_BARRIER; + barrier->header |= HSA_PACKET_TYPE_BARRIER_AND << HSA_PACKET_HEADER_TYPE; + barrier->completion_signal = signal; + + /* Increment write index and ring doorbell to dispatch the kernel. */ + hsa_queue_store_write_index_relaxed(queue, index+1); + hsa_signal_store_relaxed(queue->doorbell_signal, index); - status = snk_CopyElfSectToModule(elfP, secHdr, "hsa_operand",".brig_hsa_operand", - brig_module, HSA_EXT_BRIG_SECTION_OPERAND); - if (status != STATUS_SUCCESS) { return STATUS_KERNEL_MISSING_OPERAND_SECTION; } + /* Wait on completion signal til kernel is finished. */ + hsa_signal_wait_acquire(signal, HSA_SIGNAL_CONDITION_LT, 1, UINT64_MAX, HSA_WAIT_STATE_BLOCKED); - elf_end(elfP); - *brig_module_t = brig_module; + hsa_signal_destroy(signal); - return STATUS_SUCCESS; -} +} /* End of generated global functions */ +EOF +} # end of bash function write_global_functions_template() +function write_context_template(){ +/bin/cat <<"EOF" #define ErrorCheck(msg, status) \ if (status != HSA_STATUS_SUCCESS) { \ @@ -309,143 +290,49 @@ if (status != HSA_STATUS_SUCCESS) { \ /* printf("%s succeeded.\n", #msg);*/ \ } -/* Define required BRIG data structures. */ -typedef uint32_t BrigCodeOffset32_t; -typedef uint32_t BrigDataOffset32_t; -typedef uint16_t BrigKinds16_t; -typedef uint8_t BrigLinkage8_t; -typedef uint8_t BrigExecutableModifier8_t; -typedef BrigDataOffset32_t BrigDataOffsetString32_t; - -enum BrigKinds { - BRIG_KIND_NONE = 0x0000, - BRIG_KIND_DIRECTIVE_BEGIN = 0x1000, - BRIG_KIND_DIRECTIVE_KERNEL = 0x1008, -}; - -typedef struct BrigBase BrigBase; -struct BrigBase { - uint16_t byteCount; - BrigKinds16_t kind; -}; - -typedef struct BrigExecutableModifier BrigExecutableModifier; -struct BrigExecutableModifier { - BrigExecutableModifier8_t allBits; -}; - -typedef struct BrigDirectiveExecutable BrigDirectiveExecutable; -struct BrigDirectiveExecutable { - uint16_t byteCount; - BrigKinds16_t kind; - BrigDataOffsetString32_t name; - uint16_t outArgCount; - uint16_t inArgCount; - BrigCodeOffset32_t firstInArg; - BrigCodeOffset32_t firstCodeBlockEntry; - BrigCodeOffset32_t nextModuleEntry; - uint32_t codeBlockEntryCount; - BrigExecutableModifier modifier; - BrigLinkage8_t linkage; - uint16_t reserved; -}; - -typedef struct BrigData BrigData; -struct BrigData { - uint32_t byteCount; - uint8_t bytes[1]; -}; - -/* - * Determines if the given agent is of type HSA_DEVICE_TYPE_GPU - * and sets the value of data to the agent handle if it is. - */ -static hsa_status_t snk_FindGPU(hsa_agent_t agent, void *data) { - if (data == NULL) { - return HSA_STATUS_ERROR_INVALID_ARGUMENT; - } +/* Determines if the given agent is of type HSA_DEVICE_TYPE_GPU + and sets the value of data to the agent handle if it is. +*/ +static hsa_status_t get_gpu_agent(hsa_agent_t agent, void *data) { + hsa_status_t status; hsa_device_type_t device_type; - hsa_status_t stat = - hsa_agent_get_info(agent, HSA_AGENT_INFO_DEVICE, &device_type); - if (stat != HSA_STATUS_SUCCESS) { - return stat; - } - if (device_type == HSA_DEVICE_TYPE_GPU) { - *((hsa_agent_t *)data) = agent; + status = hsa_agent_get_info(agent, HSA_AGENT_INFO_DEVICE, &device_type); + if (HSA_STATUS_SUCCESS == status && HSA_DEVICE_TYPE_GPU == device_type) { + hsa_agent_t* ret = (hsa_agent_t*)data; + *ret = agent; + return HSA_STATUS_INFO_BREAK; } return HSA_STATUS_SUCCESS; } -/* Determines if a memory region can be used for kernarg allocations. */ -static hsa_status_t snk_GetKernArrg(hsa_region_t region, void* data) { - hsa_region_flag_t flags; - hsa_region_get_info(region, HSA_REGION_INFO_FLAGS, &flags); - if (flags & HSA_REGION_FLAG_KERNARG) { - hsa_region_t* ret = (hsa_region_t*) data; - *ret = region; +/* Determines if a memory region can be used for kernarg allocations. */ +static hsa_status_t get_kernarg_memory_region(hsa_region_t region, void* data) { + hsa_region_segment_t segment; + hsa_region_get_info(region, HSA_REGION_INFO_SEGMENT, &segment); + if (HSA_REGION_SEGMENT_GLOBAL != segment) { return HSA_STATUS_SUCCESS; } - return HSA_STATUS_SUCCESS; -} -/* Determines if a memory region is device memory */ -static hsa_status_t snk_GetDevRegion(hsa_region_t region, void* data) { - hsa_segment_t segment; - hsa_region_get_info(region, HSA_REGION_INFO_SEGMENT , &segment); - if (segment & HSA_SEGMENT_GROUP ) { + hsa_region_global_flag_t flags; + hsa_region_get_info(region, HSA_REGION_INFO_GLOBAL_FLAGS, &flags); + if (flags & HSA_REGION_GLOBAL_FLAG_KERNARG) { hsa_region_t* ret = (hsa_region_t*) data; *ret = region; - return HSA_STATUS_SUCCESS; + return HSA_STATUS_INFO_BREAK; } - return HSA_STATUS_SUCCESS; -} - -/* - * Finds the specified symbols offset in the specified brig_module. - * If the symbol is found the function returns HSA_STATUS_SUCCESS, - * otherwise it returns HSA_STATUS_ERROR. - */ -static hsa_status_t snk_FindSymbolOffset(hsa_ext_brig_module_t* brig_module, const char* symbol_name, - hsa_ext_brig_code_section_offset32_t* offset) { - - /* Get the data section */ - hsa_ext_brig_section_header_t* data_section_header = - brig_module->section[HSA_EXT_BRIG_SECTION_DATA]; - /* Get the code section */ - hsa_ext_brig_section_header_t* code_section_header = - brig_module->section[HSA_EXT_BRIG_SECTION_CODE]; - - /* First entry into the BRIG code section */ - BrigCodeOffset32_t code_offset = code_section_header->header_byte_count; - BrigBase* code_entry = (BrigBase*) ((char*)code_section_header + code_offset); - while (code_offset != code_section_header->byte_count) { - if (code_entry->kind == BRIG_KIND_DIRECTIVE_KERNEL) { - /* Now find the data in the data section */ - BrigDirectiveExecutable* directive_kernel = (BrigDirectiveExecutable*) (code_entry); - BrigDataOffsetString32_t data_name_offset = directive_kernel->name; - BrigData* data_entry = (BrigData*)((char*) data_section_header + data_name_offset); - if (!strncmp(symbol_name, (char*)data_entry->bytes, strlen(symbol_name))){ - *offset = code_offset; - return HSA_STATUS_SUCCESS; - } - } - code_offset += code_entry->byteCount; - code_entry = (BrigBase*) ((char*)code_section_header + code_offset); - } - return HSA_STATUS_ERROR; + return HSA_STATUS_SUCCESS; } /* Stream specific globals */ -hsa_signal_t Stream_Signal[SNK_MAX_STREAMS]; -hsa_queue_t* Stream_CommandQ[SNK_MAX_STREAMS]; - +hsa_queue_t* Stream_CommandQ[SNK_MAX_STREAMS]; +static int SNK_NextTaskId = 0 ; /* Context(cl file) specific globals */ -hsa_ext_brig_module_t* _CN__BrigModule; -hsa_agent_t _CN__Device; -hsa_ext_program_handle_t _CN__HsaProgram; -hsa_ext_brig_module_handle_t _CN__ModuleHandle; +hsa_agent_t _CN__Agent; +hsa_ext_program_t _CN__HsaProgram; +hsa_executable_t _CN__Executable; +hsa_region_t _CN__KernargRegion; int _CN__FC = 0; /* Global variables */ @@ -460,76 +347,80 @@ status_t _CN__InitContext(){ err = hsa_init(); ErrorCheck(Initializing the hsa runtime, err); - /* Iterate over the agents and pick the gpu agent */ - _CN__Device = 0; - err = hsa_iterate_agents(snk_FindGPU, &_CN__Device); - ErrorCheck(Calling hsa_iterate_agents, err); - - err = (_CN__Device == 0) ? HSA_STATUS_ERROR : HSA_STATUS_SUCCESS; - ErrorCheck(Checking if the GPU device is non-zero, err); -/* - err = hsa_ext_set_memory_type(_CN__Device, HSA_EXT_MEMORY_TYPE_COHERENT ); - ErrorCheck(Calling hsa_ext_set_memory_type, err); -*/ + /* Iterate over the agents and pick the gpu agent */ + err = hsa_iterate_agents(get_gpu_agent, &_CN__Agent); + if(err == HSA_STATUS_INFO_BREAK) { err = HSA_STATUS_SUCCESS; } + ErrorCheck(Getting a gpu agent, err); - /* Query the name of the device. */ + /* Query the name of the agent. */ char name[64] = { 0 }; - err = hsa_agent_get_info(_CN__Device, HSA_AGENT_INFO_NAME, name); - ErrorCheck(Querying the device name, err); -/* - printf("The device name is %s.\n", name); -*/ - /* Load BRIG, encapsulated in an ELF container, into a BRIG module. */ - status_t status = snk_ReadBinary(&_CN__BrigModule,HSA_BrigMem,HSA_BrigMemSz); - if (status != STATUS_SUCCESS) { - printf("Could not create BRIG module: %d\n", status); - if (status == STATUS_KERNEL_INVALID_SECTION_HEADER || - status == STATUS_KERNEL_ELF_INITIALIZATION_FAILED || - status == STATUS_KERNEL_INVALID_ELF_CONTAINER) { - printf("The ELF file is invalid or possibley corrupted.\n"); - } - if (status == STATUS_KERNEL_MISSING_DATA_SECTION || - status == STATUS_KERNEL_MISSING_CODE_SECTION || - status == STATUS_KERNEL_MISSING_OPERAND_SECTION) { - printf("One or more ELF sections are missing. Use readelf command to \ - to check if hsa_data, hsa_code and hsa_operands exist.\n"); - } - } + err = hsa_agent_get_info(_CN__Agent, HSA_AGENT_INFO_NAME, name); + ErrorCheck(Querying the agent name, err); + /* printf("The agent name is %s.\n", name); */ + + /* Query the maximum size of the queue. */ + uint32_t queue_size = 0; + err = hsa_agent_get_info(_CN__Agent, HSA_AGENT_INFO_QUEUE_MAX_SIZE, &queue_size); + ErrorCheck(Querying the agent maximum queue size, err); + /* printf("The maximum queue size is %u.\n", (unsigned int) queue_size); */ - /* Create hsa program for this context */ - err = hsa_ext_program_create(&_CN__Device, 1, HSA_EXT_BRIG_MACHINE_LARGE, HSA_EXT_BRIG_PROFILE_FULL, &_CN__HsaProgram); - ErrorCheck(Creating the hsa program, err); + /* Create hsa program. */ + memset(&_CN__HsaProgram,0,sizeof(hsa_ext_program_t)); + err = hsa_ext_program_create(HSA_MACHINE_MODEL_LARGE, HSA_PROFILE_FULL, HSA_DEFAULT_FLOAT_ROUNDING_MODE_DEFAULT, NULL, &_CN__HsaProgram); + ErrorCheck(Create the program, err); - /* Add the BRIG module to this hsa program. */ - err = hsa_ext_add_module(_CN__HsaProgram, _CN__BrigModule, &_CN__ModuleHandle); + /* Add the BRIG module to hsa program. */ + err = hsa_ext_program_add_module(_CN__HsaProgram, (hsa_ext_module_t) _CN__HSA_BrigMem ); ErrorCheck(Adding the brig module to the program, err); - /* Query the maximum size of the queue. */ - uint32_t queue_size = 0; - err = hsa_agent_get_info(_CN__Device, HSA_AGENT_INFO_QUEUE_MAX_SIZE, &queue_size); - ErrorCheck(Querying the device maximum queue size, err); + /* Determine the agents ISA. */ + hsa_isa_t isa; + err = hsa_agent_get_info(_CN__Agent, HSA_AGENT_INFO_ISA, &isa); + ErrorCheck(Query the agents isa, err); + + /* * Finalize the program and extract the code object. */ + hsa_ext_control_directives_t control_directives; + memset(&control_directives, 0, sizeof(hsa_ext_control_directives_t)); + hsa_code_object_t code_object; + err = hsa_ext_program_finalize(_CN__HsaProgram, isa, 0, control_directives, "", HSA_CODE_OBJECT_TYPE_PROGRAM, &code_object); + ErrorCheck(Finalizing the program, err); - /* printf("DEBUG: The maximum queue size is %u.\n", (unsigned int) queue_size); */ + /* Destroy the program, it is no longer needed. */ + err=hsa_ext_program_destroy(_CN__HsaProgram); + ErrorCheck(Destroying the program, err); + + /* Create the empty executable. */ + err = hsa_executable_create(HSA_PROFILE_FULL, HSA_EXECUTABLE_STATE_UNFROZEN, "", &_CN__Executable); + ErrorCheck(Create the executable, err); + + /* Load the code object. */ + err = hsa_executable_load_code_object(_CN__Executable, _CN__Agent, code_object, ""); + ErrorCheck(Loading the code object, err); + + /* Freeze the executable; it can now be queried for symbols. */ + err = hsa_executable_freeze(_CN__Executable, ""); + ErrorCheck(Freeze the executable, err); + + /* Find a memory region that supports kernel arguments. */ + _CN__KernargRegion.handle=(uint64_t)-1; + hsa_agent_iterate_regions(_CN__Agent, get_kernarg_memory_region, &_CN__KernargRegion); + err = (_CN__KernargRegion.handle == (uint64_t)-1) ? HSA_STATUS_ERROR : HSA_STATUS_SUCCESS; + ErrorCheck(Finding a kernarg memory region, err); /* Create a queue using the maximum size. */ - err = hsa_queue_create(_CN__Device, queue_size, HSA_QUEUE_TYPE_MULTI, NULL, NULL, &Sync_CommandQ); + err = hsa_queue_create(_CN__Agent, queue_size, HSA_QUEUE_TYPE_SINGLE, NULL, NULL, UINT32_MAX, UINT32_MAX, &Sync_CommandQ); ErrorCheck(Creating the queue, err); /* Create signal to wait for the dispatch to finish. this Signal is only used for synchronous execution */ err=hsa_signal_create(1, 0, NULL, &Sync_Signal); ErrorCheck(Creating a HSA signal, err); - /* Create queues and signals for each stream */ + /* Create queues and signals for each stream. */ int stream_num; for ( stream_num = 0 ; stream_num < SNK_MAX_STREAMS ; stream_num++){ - /* printf("calling queue create for stream %d\n",stream_num); */ - err = hsa_queue_create(_CN__Device, queue_size, HSA_QUEUE_TYPE_MULTI, NULL, NULL, &Stream_CommandQ[stream_num]); + err=hsa_queue_create(_CN__Agent, queue_size, HSA_QUEUE_TYPE_SINGLE, NULL, NULL, UINT32_MAX, UINT32_MAX, &Stream_CommandQ[stream_num]); ErrorCheck(Creating the Stream Command Q, err); - - /* Create signal to wait for the dispatch to finish. this Signal is only used for synchronous execution */ - err=hsa_signal_create(1, 0, NULL, &Stream_Signal[stream_num]); - ErrorCheck(Creating the Stream Signal, err); } return STATUS_SUCCESS; @@ -542,13 +433,14 @@ function write_KernelStatics_template(){ /bin/cat <<"EOF" /* Kernel specific globals, one set for each kernel */ -hsa_ext_code_descriptor_t* _KN__HsaCodeDescriptor; -void* _KN__kernel_arg_buffer = NULL; /* Only for syncrhnous calls */ -size_t _KN__kernel_arg_buffer_size ; -hsa_ext_finalization_request_t _KN__FinalizationRequestList; +hsa_executable_symbol_t _KN__Symbol; int _KN__FK = 0 ; status_t _KN__init(); status_t _KN__stop(); +uint64_t _KN__Kernel_Object; +uint32_t _KN__Kernarg_Segment_Size; /* May not need to be global */ +uint32_t _KN__Group_Segment_Size; +uint32_t _KN__Private_Segment_Size; EOF } @@ -565,36 +457,27 @@ extern status_t _KN__init(){ hsa_status_t err; - /* Construct finalization request list for this kernel. */ - _KN__FinalizationRequestList.module = _CN__ModuleHandle; - _KN__FinalizationRequestList.program_call_convention = 0; + /* Extract the symbol from the executable. */ + /* printf("Kernel name _KN__: Looking for symbol %s\n","__OpenCL__KN__kernel"); */ + err = hsa_executable_get_symbol(_CN__Executable, NULL, "&__OpenCL__KN__kernel", _CN__Agent , 0, &_KN__Symbol); + ErrorCheck(Extract the symbol from the executable, err); - err = snk_FindSymbolOffset(_CN__BrigModule, "_FN_" , &_KN__FinalizationRequestList.symbol); - ErrorCheck(Finding the symbol offset for the kernel, err); + /* Extract dispatch information from the symbol */ + err = hsa_executable_symbol_get_info(_KN__Symbol, HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_OBJECT, &_KN__Kernel_Object); + ErrorCheck(Extracting the symbol from the executable, err); + err = hsa_executable_symbol_get_info(_KN__Symbol, HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_KERNARG_SEGMENT_SIZE, &_KN__Kernarg_Segment_Size); + ErrorCheck(Extracting the kernarg segment size from the executable, err); + err = hsa_executable_symbol_get_info(_KN__Symbol, HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_GROUP_SEGMENT_SIZE, &_KN__Group_Segment_Size); + ErrorCheck(Extracting the group segment size from the executable, err); + err = hsa_executable_symbol_get_info(_KN__Symbol, HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_PRIVATE_SEGMENT_SIZE, &_KN__Private_Segment_Size); + ErrorCheck(Extracting the private segment from the executable, err); - /* (RE) Finalize the hsa program with this kernel on the request list */ - err = hsa_ext_finalize_program(_CN__HsaProgram, _CN__Device, 1, &_KN__FinalizationRequestList, NULL, NULL, 0, NULL, 0); - ErrorCheck(Finalizing the program, err); - - /* Get the hsa code descriptor address. */ - err = hsa_ext_query_kernel_descriptor_address(_CN__HsaProgram, _CN__ModuleHandle , _KN__FinalizationRequestList.symbol, &_KN__HsaCodeDescriptor); - ErrorCheck(Querying the kernel descriptor address, err); - - /* Find a memory region that supports kernel arguments. */ - hsa_region_t kernarg_region = 0; - hsa_agent_iterate_regions(_CN__Device, snk_GetKernArrg, &kernarg_region); - err = (kernarg_region == 0) ? HSA_STATUS_ERROR : HSA_STATUS_SUCCESS; - ErrorCheck(Finding a kernarg memory region, err); - - /* Allocate the kernel argument buffer from the correct region. */ - _KN__kernel_arg_buffer_size = _KN__HsaCodeDescriptor->kernarg_segment_byte_size; - err = hsa_memory_allocate(kernarg_region, _KN__kernel_arg_buffer_size, &_KN__kernel_arg_buffer); - ErrorCheck(Allocating kernel argument memory buffer, err); return STATUS_SUCCESS; } /* end of _KN__init */ + extern status_t _KN__stop(){ status_t err; if (_CN__FC == 0 ) { @@ -611,15 +494,13 @@ extern status_t _KN__stop(){ } /* end of _KN__stop */ + EOF } function write_kernel_template(){ /bin/cat <<"EOF" - hsa_status_t err; - status_t status; - /* Get stream number from launch parameters. */ /* This must be less than SNK_MAX_STREAMS. */ /* If negative, then function call is synchrnous. */ @@ -629,92 +510,72 @@ function write_kernel_template(){ return; } - if (_KN__FK == 0 ) { - status = _KN__init(); - if ( status != STATUS_SUCCESS ) return; - _KN__FK = 1; + hsa_queue_t* this_Q ; + if ( stream_num < 0 ) { /* Sychronous execution */ + this_Q = Sync_CommandQ; + } else { /* Asynchrnous execution uses one command Q per stream */ + this_Q = Stream_CommandQ[stream_num]; } - hsa_queue_t* this_Q ; - hsa_signal_t this_sig ; + /* Obtain the current queue write index. increases with each call to kernel */ + uint64_t index = hsa_queue_load_write_index_relaxed(this_Q); + /* printf("DEBUG:Call #%d to kernel \"%s\" \n",(int) index,"_KN_"); */ + + /* Find the queue index address to write the packet info into. */ + const uint32_t queueMask = this_Q->size - 1; + hsa_kernel_dispatch_packet_t* this_aql = &(((hsa_kernel_dispatch_packet_t*)(this_Q->base_address))[index&queueMask]); - /* Setup this call to this kernel dispatch packet from scratch. */ - hsa_dispatch_packet_t this_aql; - memset(&this_aql, 0, sizeof(this_aql)); + /* FIXME: We need to check for queue overflow here. */ if ( stream_num < 0 ) { - /* Sychronous execution */ - this_Q = Sync_CommandQ; - this_sig = Sync_Signal; - this_aql.completion_signal=this_sig; - } else { - /* Asynchrnous */ - this_Q = Stream_CommandQ[stream_num]; - this_sig = Stream_Signal[stream_num]; + /* Use the global synchrnous signal Sync_Signal */ + this_aql->completion_signal=Sync_Signal; + hsa_signal_store_relaxed(Sync_Signal,1); } - /* Reset signal to original value. */ - /* WARNING atomic operation here. */ - hsa_signal_store_relaxed(this_sig,1); - - /* Set the dimensions passed from the application */ - this_aql.dimensions=(uint16_t) lparm->ndim; - this_aql.grid_size_x=lparm->gdims[0]; - this_aql.workgroup_size_x=lparm->ldims[0]; + /* Process lparm values */ + /* this_aql.dimensions=(uint16_t) lparm->ndim; */ + this_aql->setup |= (uint16_t) lparm->ndim << HSA_KERNEL_DISPATCH_PACKET_SETUP_DIMENSIONS; + this_aql->grid_size_x=lparm->gdims[0]; + this_aql->workgroup_size_x=lparm->ldims[0]; if (lparm->ndim>1) { - this_aql.grid_size_y=lparm->gdims[1]; - this_aql.workgroup_size_y=lparm->ldims[1]; + this_aql->grid_size_y=lparm->gdims[1]; + this_aql->workgroup_size_y=lparm->ldims[1]; } else { - this_aql.grid_size_y=1; - this_aql.workgroup_size_y=1; + this_aql->grid_size_y=1; + this_aql->workgroup_size_y=1; } if (lparm->ndim>2) { - this_aql.grid_size_z=lparm->gdims[2]; - this_aql.workgroup_size_z=lparm->ldims[2]; + this_aql->grid_size_z=lparm->gdims[2]; + this_aql->workgroup_size_z=lparm->ldims[2]; } else { - this_aql.grid_size_z=1; - this_aql.workgroup_size_z=1; + this_aql->grid_size_z=1; + this_aql->workgroup_size_z=1; } - this_aql.header.type=HSA_PACKET_TYPE_DISPATCH; - this_aql.header.acquire_fence_scope=lparm->acquire_fence_scope; - this_aql.header.release_fence_scope=lparm->release_fence_scope; - - /* Set user defined barrier, default = 0 implies execution order not gauranteed */ - this_aql.header.barrier=lparm->barrier; - this_aql.group_segment_size=_KN__HsaCodeDescriptor->workgroup_group_segment_byte_size; - this_aql.private_segment_size=_KN__HsaCodeDescriptor->workitem_private_segment_byte_size; - - /* copy args from the custom _KN__args structure */ - /* FIXME We should align kernel_arg_buffer because _KN__args is aligned */ - memcpy(_KN__kernel_arg_buffer, &_KN__args, sizeof(_KN__args)); - - /* Bind kernelcode to the packet. */ - this_aql.kernel_object_address=_KN__HsaCodeDescriptor->code.handle; - + /* thisKernargAddress has already been set up in the beginning of this routine */ /* Bind kernel argument buffer to the aql packet. */ - this_aql.kernarg_address=(uint64_t)_KN__kernel_arg_buffer; - - /* Obtain the current queue write index. increases with each call to kernel */ - uint64_t index = hsa_queue_load_write_index_relaxed(this_Q); - /* printf("DEBUG:Call #%d to kernel \"%s\" \n",(int) index,"_KN_"); */ - - /* Write this_aql at the calculated queue index address. */ - const uint32_t queueMask = this_Q->size - 1; - ((hsa_dispatch_packet_t*)(this_Q->base_address))[index&queueMask]=this_aql; - - /* Increment the write index and ring the doorbell to dispatch the kernel. */ + this_aql->kernarg_address = (void*) thisKernargAddress; + this_aql->kernel_object = _KN__Kernel_Object; + this_aql->private_segment_size = _KN__Private_Segment_Size; + this_aql->group_segment_size = _KN__Group_Segment_Size; + + /* Prepare and set the packet header */ + /* Only set barrier bit if asynchrnous execution */ + if ( stream_num >= 0 ) this_aql->header |= lparm->barrier << HSA_PACKET_HEADER_BARRIER; + this_aql->header |= lparm->acquire_fence_scope << HSA_PACKET_HEADER_ACQUIRE_FENCE_SCOPE; + this_aql->header |= lparm->release_fence_scope << HSA_PACKET_HEADER_RELEASE_FENCE_SCOPE; + __atomic_store_n((uint8_t*)(&this_aql->header), (uint8_t)HSA_PACKET_TYPE_KERNEL_DISPATCH, __ATOMIC_RELEASE); + + /* Increment write index and ring doorbell to dispatch the kernel. */ hsa_queue_store_write_index_relaxed(this_Q, index+1); hsa_signal_store_relaxed(this_Q->doorbell_signal, index); - /* For synchronous execution, wait on the dispatch signal until the kernel is finished. */ if ( stream_num < 0 ) { - err = hsa_signal_wait_acquire(this_sig, HSA_LT, 1, (uint64_t) -1, HSA_WAIT_EXPECTANCY_UNKNOWN); - ErrorCheck(Waiting on the dispatch signal, err); + /* For default synchrnous execution, wait til kernel is finished. */ + hsa_signal_value_t value = hsa_signal_wait_acquire(Sync_Signal, HSA_SIGNAL_CONDITION_LT, 1, UINT64_MAX, HSA_WAIT_STATE_BLOCKED); } - return; - /* *** END OF KERNEL LAUNCH TEMPLATE *** */ EOF } @@ -737,8 +598,6 @@ C INCLUDE launch_params.f in your FORTRAN source so you can set dimensions. integer (C_INT) :: barrier = 1 integer (C_INT) :: acquire_fence_scope = 2 integer (C_INT) :: release_fence_scope = 2 - integer (C_INT) :: num_edges_in = 0 - integer (C_INT) :: num_edges_out = 0 end type snk_lparm_t type (snk_lparm_t) lparm C @@ -846,7 +705,7 @@ __SEDCMD=" " # Read the CLF and build a list of kernels and args, one kernel and set of args per line of KARGLIST file cpp $__CLF | sed -e '/__kernel/,/)/!d' | sed -e ':a;$!N;s/\n/ /;ta;P;D' | sed -e 's/__kernel/\n__kernel/g' | grep "__kernel" | \ - sed -e "s/__kernel//;s/void//;s/__global//g;s/{//g;s/ \*/\*/g" | cut -d\) -f1 | sed -e "s/\*/\* /g;s/__restrict__//g" >$__KARGLIST + sed -e "s/__kernel//;s/__global//g;s/{//g;s/ \*/\*/g" | cut -d\) -f1 | sed -e "s/\*/\* /g;s/__restrict__//g" >$__KARGLIST # The header and extra-cl files must start empty because lines are incrementally added to end of file if [ -f $__EXTRACL ] ; then rm -f $__EXTRACL ; fi @@ -887,8 +746,15 @@ __SEDCMD=" " while read line ; do # parse the kernel name __KN and the native argument list __ARGL - __KN=`echo ${line%(*} | tr -d ' '` + TYPE_NAME=`echo ${line%(*}` + __KN=`echo $TYPE_NAME | awk '{print $2}'` + __KT=`echo $TYPE_NAME | awk '{print $1}'` __ARGL=${line#*(} +# force it to return pointer to snk_task_t + if [ "$__KT" == "snk_task_t" ] ; then + __KT="snk_task_t*" + fi + # Add the kernel initialization routine to the c wrapper write_KernelStatics_template | sed -e "s/_CN_/${__SN}/g;s/_KN_/${__KN}/g" >>$__CWRAP @@ -909,16 +775,31 @@ __SEDCMD=" " echo "/* ------ Start of SNACK function ${__KN} ------ */ " >> $__CWRAP if [ "$__IS_FORTRAN" == "1" ] ; then # Add underscore to kernel name and resolve lparm pointer - echo "extern void ${__KN}_($__CFN_ARGL, const snk_lparm_t * lparm) {" >>$__CWRAP + echo "extern ${__KT} ${__KN}_($__CFN_ARGL, const snk_lparm_t * lparm) {" >>$__CWRAP else if [ "$__CFN_ARGL" == "" ] ; then - echo "extern void $__KN(const snk_lparm_t * lparm) {" >>$__CWRAP + echo "extern ${__KT} $__KN(const snk_lparm_t * lparm) {" >>$__CWRAP else - echo "extern void $__KN($__CFN_ARGL, const snk_lparm_t * lparm) {" >>$__CWRAP + echo "extern ${__KT} $__KN($__CFN_ARGL, const snk_lparm_t * lparm) {" >>$__CWRAP fi fi - -# Write the structure definition for the kernel arguments + + echo " /* Kernel initialization has to be done before kernel arguments are set/inspected */ " >> $__CWRAP + echo " if (${__KN}_FK == 0 ) { " >> $__CWRAP + echo " status_t status = ${__KN}_init(); " >> $__CWRAP + echo " if ( status != STATUS_SUCCESS ) return; " >> $__CWRAP + echo " ${__KN}_FK = 1; " >> $__CWRAP + echo " } " >> $__CWRAP +# Write the structure definition for the kernel arguments. +# Consider eliminating global _KN__args and memcopy and write directly to thisKernargAddress. +# by writing these statements here: + echo " /* Allocate the kernel argument buffer from the correct region. */ " >> $__CWRAP + echo " void* thisKernargAddress; " >> $__CWRAP + echo " /* HSA 1.0F has a bug that serializes all queue operations when hsa_memory_allocate is used. " >> $__CWRAP + echo " Revert back to hsa_memory_allocate once bug is fixed. */ " >> $__CWRAP + echo " thisKernargAddress = malloc(${__KN}_Kernarg_Segment_Size); " >> $__CWRAP + #echo " hsa_memory_allocate(${__SN}_KernargRegion, ${__KN}_Kernarg_Segment_Size, &thisKernargAddress); " >> $__CWRAP +# How to map a structure into an malloced memory area? echo " struct ${__KN}_args_struct {" >> $__CWRAP NEXTI=0 if [ $GENW_ADD_DUMMY ] ; then @@ -946,7 +827,9 @@ __SEDCMD=" " NEXTI=$(( NEXTI + 1 )) done echo " } __attribute__ ((aligned (16))) ; " >> $__CWRAP - echo " struct ${__KN}_args_struct ${__KN}_args ; " >> $__CWRAP + echo " struct ${__KN}_args_struct* ${__KN}_args ; " >> $__CWRAP + echo " /* Setup kernel args */ " >> $__CWRAP + echo " ${__KN}_args = (struct ${__KN}_args_struct*) thisKernargAddress; " >> $__CWRAP # Write statements to fill in the argument structure and # keep track of updated CL arg list and new call list @@ -954,12 +837,12 @@ __SEDCMD=" " # to call the real kernel CL function. NEXTI=0 if [ $GENW_ADD_DUMMY ] ; then - echo " ${__KN}_args.arg0=0 ; " >> $__CWRAP - echo " ${__KN}_args.arg1=0 ; " >> $__CWRAP - echo " ${__KN}_args.arg2=0 ; " >> $__CWRAP - echo " ${__KN}_args.arg3=0 ; " >> $__CWRAP - echo " ${__KN}_args.arg4=0 ; " >> $__CWRAP - echo " ${__KN}_args.arg5=0 ; " >> $__CWRAP + echo " ${__KN}_args->arg0=0 ; " >> $__CWRAP + echo " ${__KN}_args->arg1=0 ; " >> $__CWRAP + echo " ${__KN}_args->arg2=0 ; " >> $__CWRAP + echo " ${__KN}_args->arg3=0 ; " >> $__CWRAP + echo " ${__KN}_args->arg4=0 ; " >> $__CWRAP + echo " ${__KN}_args->arg5=0 ; " >> $__CWRAP NEXTI=6 fi KERN_NEEDS_CL_WRAPPER="FALSE" @@ -975,18 +858,18 @@ __SEDCMD=" " if [ "$last_char" == "*" ] ; then arglistw="${arglistw}${sepchar}${arg_type} ${arg_name}" calllist="${calllist}${sepchar}${arg_name}" - echo " ${__KN}_args.arg${NEXTI} = $arg_name ; " >> $__CWRAP + echo " ${__KN}_args->arg${NEXTI} = $arg_name ; " >> $__CWRAP else is_scalar $simple_arg_type if [ $? == 1 ] ; then arglistw="$arglistw${sepchar}${arg_type} $arg_name" calllist="${calllist}${sepchar}${arg_name}" - echo " ${__KN}_args.arg${NEXTI} = $arg_name ; " >> $__CWRAP + echo " ${__KN}_args->arg${NEXTI} = $arg_name ; " >> $__CWRAP else KERN_NEEDS_CL_WRAPPER="TRUE" arglistw="$arglistw${sepchar}${arg_type}* $arg_name" calllist="${calllist}${sepchar}${arg_name}[0]" - echo " ${__KN}_args.arg${NEXTI} = &$arg_name ; " >> $__CWRAP + echo " ${__KN}_args->arg${NEXTI} = &$arg_name ; " >> $__CWRAP fi fi sepchar="," @@ -1007,12 +890,12 @@ __SEDCMD=" " # Write the prototype to the header file if [ "$__IS_FORTRAN" == "1" ] ; then # don't use headers for fortran but it is a good reference for how to call from fortran - echo "extern _CPPSTRING_ void ${__KN}_($__PROTO_ARGL, const snk_lparm_t * lparm_p);" >>$__HDRF + echo "extern _CPPSTRING_ $__KT ${__KN}_($__PROTO_ARGL, const snk_lparm_t * lparm_p);" >>$__HDRF else if [ "$__PROTO_ARGL" == "" ] ; then - echo "extern _CPPSTRING_ void ${__KN}(const snk_lparm_t * lparm);" >>$__HDRF + echo "extern _CPPSTRING_ $__KT ${__KN}(const snk_lparm_t * lparm);" >>$__HDRF else - echo "extern _CPPSTRING_ void ${__KN}($__PROTO_ARGL, const snk_lparm_t * lparm);" >>$__HDRF + echo "extern _CPPSTRING_ $__KT ${__KN}($__PROTO_ARGL, const snk_lparm_t * lparm);" >>$__HDRF fi fi @@ -1020,6 +903,7 @@ __SEDCMD=" " # 1) Context Name _CN_ 2) Kerneel name _KN_ and 3) Funtion name _FN_ write_kernel_template | sed -e "s/_CN_/${__SN}/g;s/_KN_/${__KN}/g;s/_FN_/${__FN}/g" >>$__CWRAP + echo " return;" >> $__CWRAP echo "} " >> $__CWRAP echo "/* ------ End of SNACK function ${__KN} ------ */ " >> $__CWRAP @@ -1036,13 +920,13 @@ __SEDCMD=" " # Write the updated CL if [ "$__SEDCMD" != " " ] ; then -# Remove extra spaces, then change "__kernel void" to "void" if they have call-by-value structs -# Still could fail if __kernel void _FN_ split across multple lines, FIX THIS - awk '$1=$1' $__CLF | sed -e "$__SEDCMD" > $__UPDATED_CL - cat $__EXTRACL >> $__UPDATED_CL +# Remove extra spaces, then change "__kernel void" to "void" if they have call-by-value structs +# Still could fail if __kernel void _FN_ split across multple lines, FIX THIS + awk '$1=$1' $__CLF | sed -e "$__SEDCMD" > $__UPDATED_CL + cat $__EXTRACL | sed -e "s/ snk_task_t/ void/g" >> $__UPDATED_CL else -# No changes to the CL file is needed, so just make a copy - cp -p $__CLF $__UPDATED_CL +# No changes to the CL file are needed, so just make a copy + cat $__CLF | sed -e "s/ snk_task_t/ void/g" > $__UPDATED_CL fi rm $__KARGLIST diff --git a/examples/README.md b/examples/README.md index 37ee674..a7822aa 100644 --- a/examples/README.md +++ b/examples/README.md @@ -1,34 +1,36 @@ ``` -Note- If HSAIL-HLC-Stable is used. The tests have to be compiled with CFLAGS=-DDUMMY_ARGS=1. Example make all -DDUMMY_ARGS=1. Make sure to have the compiler and cloc installed in opt/amd/bin and the runtime in opt/hsa/ as per the instructions in INSTALL.md +Make sure to have the compiler and cloc.sh installed in /opt/amd/cloc/bin and +the HSA runtime in /opt/hsa as per the instructions in INSTALL.md. + +# Make a copy of the examples in your home directory +cp -r /opt/amd/cloc/examples ~/examples #For building HSA examples. -cd hsa -make all +cd ~/examples/hsa/vector_copy +make make test - -#For building OKRA examples -export OKRA_DISABLE_FIX_HSAIL=1 -cd okra -make all +cd ~/examples/hsa/vector_copy_async +make make test #For SNACK examples -cd snack/csquares +cd ~/examples/snack/csquares ./buildrun.sh -cd snack/fortran +cd ~/examples/snack/fortran ./buildrun.sh -cd snack/helloworld +cd ~/examples/snack/helloworld ./buildrun.sh ./buildrun.sh f ./buildrun.sh cpp -cd snack/matmul +cd ~/examples/snack/matmul +./buildrun.sh +cd ~/examples/snack/multiple_cl_files ./buildrun.sh -cd snack/multiple_cl_files +cd ~/examples/snack/vector_copy ./buildrun.sh -cd snack/vector_copy +cd ~/examples/snack/async_vecsum ./buildrun.sh ``` -You can also build and run individually each test case ---------- diff --git a/examples/hsa/Makefile b/examples/hsa/Makefile deleted file mode 100644 index f76476d..0000000 --- a/examples/hsa/Makefile +++ /dev/null @@ -1,26 +0,0 @@ -DIRS = common vector_copy -# the sets of directories to do various things in -BUILDDIRS = $(DIRS:%=build-%) -CLEANDIRS = $(DIRS:%=clean-%) -TESTDIRS = $(DIRS:%=test-%) - -all: $(BUILDDIRS) -$(DIRS): $(BUILDDIRS) -$(BUILDDIRS): - $(MAKE) -C $(@:build-%=%) - -test: $(TESTDIRS) -$(TESTDIRS): - $(MAKE) -C $(@:test-%=%) test - -clean: $(CLEANDIRS) -$(CLEANDIRS): - $(MAKE) -C $(@:clean-%=%) clean - - -.PHONY: subdirs $(DIRS) -.PHONY: subdirs $(BUILDDIRS) -.PHONY: subdirs $(TESTDIRS) -.PHONY: subdirs $(CLEANDIRS) -.PHONY: all clean test - diff --git a/examples/hsa/common/Makefile b/examples/hsa/common/Makefile deleted file mode 100644 index 57075fe..0000000 --- a/examples/hsa/common/Makefile +++ /dev/null @@ -1,24 +0,0 @@ - -TEST_NAME=elf_utils -LFLAGS= -g -Wl,--unresolved-symbols=ignore-in-shared-libs -INCS += -I $(HSA_RUNTIME_PATH)/include -C_FILES := $(wildcard *.c) -OBJ_FILES := $(addprefix obj/, $(notdir $(C_FILES:.c=.o))) - -all: check-build-env $(OBJ_FILES) - -obj/%.o: %.c - $(CC) -c $(CFLAGS) $(INCS) -o $@ $< -std=c99 - -clean: - rm -rf obj/*o -test: - @echo 'Nothing to be done here' - -check-build-env: -ifndef HSA_RUNTIME_PATH - $(error HSA_RUNTIME_PATH is undefined) -endif - - - diff --git a/examples/hsa/common/elf_utils.c b/examples/hsa/common/elf_utils.c deleted file mode 100644 index d6989ff..0000000 --- a/examples/hsa/common/elf_utils.c +++ /dev/null @@ -1,221 +0,0 @@ -/* Copyright 2014 HSA Foundation Inc. All Rights Reserved. - * - * HSAF is granting you permission to use this software and documentation (if - * any) (collectively, the "Materials") pursuant to the terms and conditions - * of the Software License Agreement included with the Materials. If you do - * not have a copy of the Software License Agreement, contact the HSA Foundation for a copy. - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions and the following disclaimer in the - * documentation and/or other materials provided with the distribution - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR - * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS - * FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE - * CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER - * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING - * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH THE SOFTWARE. - */ - -#include -#include -#include -#include -#include "hsa.h" -#include "elf_utils.h" -#include "hsa_ext_finalize.h" - -enum { - SECTION_HSA_DATA = 0, - SECTION_HSA_CODE, - SECTION_HSA_OPERAND, -}; - -typedef struct SectionDesc SectionDesc; -struct SectionDesc { - int sectionId; - const char *brigName; - const char *bifName; -} - -sectionDescs[] = { - { SECTION_HSA_DATA, "hsa_data",".brig_hsa_data" }, - { SECTION_HSA_CODE, "hsa_code",".brig_hsa_code" }, - { SECTION_HSA_OPERAND,"hsa_operand",".brig_hsa_operand"}, -}; - -extern int fileno(FILE* stream); - -const SectionDesc* get_section_desc(int sectionId) { - const int NUM_PREDEFINED_SECTIONS = sizeof(sectionDescs)/sizeof(sectionDescs[0]); - for(int i=0; id_buf + shdr->sh_name; - if (sectionName && - ((strcmp(sectionName, desc->brigName) == 0) || - (strcmp(sectionName, desc->bifName) == 0))) { - return scn; - } - } - - return NULL; -} - -/* Extract section and copy into HsaBrig */ -static status_t extract_section_and_copy (Elf *elfP, - Elf_Data *secHdr, - const SectionDesc* desc, - hsa_ext_brig_module_t* brig_module, - hsa_ext_brig_section_id_t section_id) { - Elf_Scn* scn = NULL; - Elf_Data* data = NULL; - void* address_to_copy; - size_t section_size=0; - - scn = extract_elf_section(elfP, secHdr, desc); - - if (scn) { - if ((data = elf_getdata(scn, NULL)) == NULL) { - return STATUS_UNKNOWN; - } - section_size = data->d_size; - if (section_size > 0) { - address_to_copy = malloc(section_size); - memcpy(address_to_copy, data->d_buf, section_size); - } - } - - if ((!scn || section_size == 0)) { - return STATUS_UNKNOWN; - } - - /* Create a section header */ - brig_module->section[section_id] = (hsa_ext_brig_section_header_t*) address_to_copy; - - return STATUS_SUCCESS; -} - -/* Reads binary of BRIG and BIF format */ -status_t read_binary(hsa_ext_brig_module_t **brig_module_t, FILE* binary) { - /* Create the brig_module */ - uint32_t number_of_sections = 3; - hsa_ext_brig_module_t* brig_module; - - brig_module = (hsa_ext_brig_module_t*) - (malloc (sizeof(hsa_ext_brig_module_t) + sizeof(void*)*number_of_sections)); - brig_module->section_count = number_of_sections; - - status_t status; - Elf* elfP = NULL; - Elf32_Ehdr* ehdr = NULL; - Elf_Data *secHdr = NULL; - Elf_Scn* scn = NULL; - int fd; - - if (elf_version ( EV_CURRENT ) == EV_NONE) { - return STATUS_KERNEL_ELF_INITIALIZATION_FAILED; - } - - fd = fileno(binary); - if ((elfP = elf_begin(fd, ELF_C_READ, (Elf *)0)) == NULL) { - return STATUS_KERNEL_INVALID_ELF_CONTAINER; - } - - if (elf_kind (elfP) != ELF_K_ELF) { - return STATUS_KERNEL_INVALID_ELF_CONTAINER; - } - - if (((ehdr = elf32_getehdr(elfP)) == NULL) || - ((scn = elf_getscn(elfP, ehdr->e_shstrndx)) == NULL) || - ((secHdr = elf_getdata(scn, NULL)) == NULL)) { - return STATUS_KERNEL_INVALID_SECTION_HEADER; - } - - status = extract_section_and_copy(elfP, - secHdr, - get_section_desc(SECTION_HSA_DATA), - brig_module, - HSA_EXT_BRIG_SECTION_DATA); - - if (status != STATUS_SUCCESS) { - return STATUS_KERNEL_MISSING_DATA_SECTION; - } - - status = extract_section_and_copy(elfP, - secHdr, - get_section_desc(SECTION_HSA_CODE), - brig_module, - HSA_EXT_BRIG_SECTION_CODE); - - if (status != STATUS_SUCCESS) { - return STATUS_KERNEL_MISSING_CODE_SECTION; - } - - status = extract_section_and_copy(elfP, - secHdr, - get_section_desc(SECTION_HSA_OPERAND), - brig_module, - HSA_EXT_BRIG_SECTION_OPERAND); - - if (status != STATUS_SUCCESS) { - return STATUS_KERNEL_MISSING_OPERAND_SECTION; - } - - elf_end(elfP); - *brig_module_t = brig_module; - - return STATUS_SUCCESS; -} - -status_t create_brig_module_from_brig_file(const char* file_name, hsa_ext_brig_module_t** brig_module) { - FILE *fp = fopen(file_name, "rb"); - - status_t status = read_binary(brig_module, fp); - - if (status != STATUS_SUCCESS) { - printf("Could not create BRIG module: %d\n", status); - if (status == STATUS_KERNEL_INVALID_SECTION_HEADER || - status == STATUS_KERNEL_ELF_INITIALIZATION_FAILED || - status == STATUS_KERNEL_INVALID_ELF_CONTAINER) { - printf("The ELF file is invalid or possibley corrupted.\n"); - } - if (status == STATUS_KERNEL_MISSING_DATA_SECTION || - status == STATUS_KERNEL_MISSING_CODE_SECTION || - status == STATUS_KERNEL_MISSING_OPERAND_SECTION) { - printf("One or more ELF sections are missing. Use readelf command to \ - to check if hsa_data, hsa_code and hsa_operands exist.\n"); - } - } - - fclose(fp); - return status; -} - -void destroy_brig_module(hsa_ext_brig_module_t* brig_module) { - for (int i=0; isection_count; i++) { - free (brig_module->section[i]); - } - free (brig_module); -} diff --git a/examples/hsa/common/elf_utils.h b/examples/hsa/common/elf_utils.h deleted file mode 100644 index 365a1a3..0000000 --- a/examples/hsa/common/elf_utils.h +++ /dev/null @@ -1,41 +0,0 @@ -/* Copyright 2014 HSA Foundation Inc. All Rights Reserved. - * - * HSAF is granting you permission to use this software and documentation (if - * any) (collectively, the "Materials") pursuant to the terms and conditions - * of the Software License Agreement included with the Materials. If you do - * not have a copy of the Software License Agreement, contact the HSA Foundation for a copy. - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions and the following disclaimer in the - * documentation and/or other materials provided with the distribution - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR - * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS - * FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE - * CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER - * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING - * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH THE SOFTWARE. - */ - -#pragma once - -#include "hsa_ext_finalize.h" - -typedef enum status_t status_t; -enum status_t { - STATUS_SUCCESS=0, - STATUS_KERNEL_INVALID_SECTION_HEADER=1, - STATUS_KERNEL_ELF_INITIALIZATION_FAILED=2, - STATUS_KERNEL_INVALID_ELF_CONTAINER=3, - STATUS_KERNEL_MISSING_DATA_SECTION=4, - STATUS_KERNEL_MISSING_CODE_SECTION=5, - STATUS_KERNEL_MISSING_OPERAND_SECTION=6, - STATUS_UNKNOWN=7, -}; - -status_t create_brig_module_from_brig_file(const char* file_name, hsa_ext_brig_module_t** brig_module); - -void destroy_brig_module(hsa_ext_brig_module_t* brig_module); diff --git a/examples/hsa/common/obj/.gitignore b/examples/hsa/common/obj/.gitignore deleted file mode 100644 index 5e7d273..0000000 --- a/examples/hsa/common/obj/.gitignore +++ /dev/null @@ -1,4 +0,0 @@ -# Ignore everything in this directory -* -# Except this file -!.gitignore diff --git a/examples/hsa/vector_copy/Makefile b/examples/hsa/vector_copy/Makefile index 2e1ac14..5b9fab9 100644 --- a/examples/hsa/vector_copy/Makefile +++ b/examples/hsa/vector_copy/Makefile @@ -8,16 +8,13 @@ INCS = -I $(HSA_RUNTIME_PATH)/include C_FILES := $(wildcard *.c) OBJ_FILES := $(addprefix obj/, $(notdir $(C_FILES:.c=.o))) -COMMON_C_FILES := $(wildcard ../common/*.c) -COMMON_OBJ_FILES := $(addprefix ../common/obj/, $(notdir $(COMMON_C_FILES:.c=.o))) - all: $(TEST_NAME) $(TEST_NAME).brig $(TEST_NAME): $(OBJ_FILES) $(COMMON_OBJ_FILES) - $(CC) $(LFLAGS) $(COMMON_OBJ_FILES) $(OBJ_FILES) -lelf -L$(HSA_RUNTIME_PATH)/lib -lhsa-runtime64 -o $(TEST_NAME) + $(CC) $(LFLAGS) $(OBJ_FILES) -L$(HSA_RUNTIME_PATH)/lib -lhsa-runtime64 -o $(TEST_NAME) $(TEST_NAME).brig : - cloc $(TEST_NAME).cl + cloc.sh $(TEST_NAME).cl obj/%.o: %.c $(CC) -c $(CFLAGS) $(INCS) -o $@ $< -std=c99 diff --git a/examples/hsa/vector_copy/vector_copy.c b/examples/hsa/vector_copy/vector_copy.c index 1236195..5473d51 100644 --- a/examples/hsa/vector_copy/vector_copy.c +++ b/examples/hsa/vector_copy/vector_copy.c @@ -26,7 +26,9 @@ #include #include "hsa.h" #include "hsa_ext_finalize.h" -#include "../common/elf_utils.h" + +#define GLOBAL_SIZE 1024*1024 +#define LOCAL_SIZE 512 #define check(msg, status) \ if (status != HSA_STATUS_SUCCESS) { \ @@ -36,80 +38,51 @@ if (status != HSA_STATUS_SUCCESS) { \ printf("%s succeeded.\n", #msg); \ } -#define GRID_SIZE_X 1024*1024 -#define GROUP_SIZE_X 256 - /* - * Define required BRIG data structures. + * Loads a BRIG module from a specified file. This + * function does not validate the module. */ +int load_module_from_file(const char* file_name, hsa_ext_module_t* module) { + int rc = -1; + + FILE *fp = fopen(file_name, "rb"); + + rc = fseek(fp, 0, SEEK_END); + + size_t file_size = (size_t) (ftell(fp) * sizeof(char)); + + rc = fseek(fp, 0, SEEK_SET); + + char* buf = (char*) malloc(file_size); + + memset(buf,0,file_size); + + size_t read_size = fread(buf,sizeof(char),file_size,fp); -typedef uint32_t BrigCodeOffset32_t; - -typedef uint32_t BrigDataOffset32_t; - -typedef uint16_t BrigKinds16_t; - -typedef uint8_t BrigLinkage8_t; - -typedef uint8_t BrigExecutableModifier8_t; - -typedef BrigDataOffset32_t BrigDataOffsetString32_t; - -enum BrigKinds { - BRIG_KIND_NONE = 0x0000, - BRIG_KIND_DIRECTIVE_BEGIN = 0x1000, - BRIG_KIND_DIRECTIVE_KERNEL = 0x1008, -}; - -typedef struct BrigBase BrigBase; -struct BrigBase { - uint16_t byteCount; - BrigKinds16_t kind; -}; - -typedef struct BrigExecutableModifier BrigExecutableModifier; -struct BrigExecutableModifier { - BrigExecutableModifier8_t allBits; -}; - -typedef struct BrigDirectiveExecutable BrigDirectiveExecutable; -struct BrigDirectiveExecutable { - uint16_t byteCount; - BrigKinds16_t kind; - BrigDataOffsetString32_t name; - uint16_t outArgCount; - uint16_t inArgCount; - BrigCodeOffset32_t firstInArg; - BrigCodeOffset32_t firstCodeBlockEntry; - BrigCodeOffset32_t nextModuleEntry; - uint32_t codeBlockEntryCount; - BrigExecutableModifier modifier; - BrigLinkage8_t linkage; - uint16_t reserved; -}; - -typedef struct BrigData BrigData; -struct BrigData { - uint32_t byteCount; - uint8_t bytes[1]; -}; + if(read_size != file_size) { + free(buf); + } else { + rc = 0; + *module = (hsa_ext_module_t) buf; + } + + fclose(fp); + + return rc; +} /* * Determines if the given agent is of type HSA_DEVICE_TYPE_GPU * and sets the value of data to the agent handle if it is. */ -static hsa_status_t find_gpu(hsa_agent_t agent, void *data) { - if (data == NULL) { - return HSA_STATUS_ERROR_INVALID_ARGUMENT; - } +static hsa_status_t get_gpu_agent(hsa_agent_t agent, void *data) { + hsa_status_t status; hsa_device_type_t device_type; - hsa_status_t stat = - hsa_agent_get_info(agent, HSA_AGENT_INFO_DEVICE, &device_type); - if (stat != HSA_STATUS_SUCCESS) { - return stat; - } - if (device_type == HSA_DEVICE_TYPE_GPU) { - *((hsa_agent_t *)data) = agent; + status = hsa_agent_get_info(agent, HSA_AGENT_INFO_DEVICE, &device_type); + if (HSA_STATUS_SUCCESS == status && HSA_DEVICE_TYPE_GPU == device_type) { + hsa_agent_t* ret = (hsa_agent_t*)data; + *ret = agent; + return HSA_STATUS_INFO_BREAK; } return HSA_STATUS_SUCCESS; } @@ -118,63 +91,25 @@ static hsa_status_t find_gpu(hsa_agent_t agent, void *data) { * Determines if a memory region can be used for kernarg * allocations. */ -static hsa_status_t get_kernarg(hsa_region_t region, void* data) { - hsa_region_flag_t flags; - hsa_region_get_info(region, HSA_REGION_INFO_FLAGS, &flags); - if (flags & HSA_REGION_FLAG_KERNARG) { - hsa_region_t* ret = (hsa_region_t*) data; - *ret = region; +static hsa_status_t get_kernarg_memory_region(hsa_region_t region, void* data) { + hsa_region_segment_t segment; + hsa_region_get_info(region, HSA_REGION_INFO_SEGMENT, &segment); + if (HSA_REGION_SEGMENT_GLOBAL != segment) { return HSA_STATUS_SUCCESS; } - return HSA_STATUS_SUCCESS; -} -/* - * Finds the specified symbols offset in the specified brig_module. - * If the symbol is found the function returns HSA_STATUS_SUCCESS, - * otherwise it returns HSA_STATUS_ERROR. - */ -hsa_status_t find_symbol_offset(hsa_ext_brig_module_t* brig_module, - char* symbol_name, - hsa_ext_brig_code_section_offset32_t* offset) { - - /* - * Get the data section - */ - hsa_ext_brig_section_header_t* data_section_header = - brig_module->section[HSA_EXT_BRIG_SECTION_DATA]; - /* - * Get the code section - */ - hsa_ext_brig_section_header_t* code_section_header = - brig_module->section[HSA_EXT_BRIG_SECTION_CODE]; - - /* - * First entry into the BRIG code section - */ - BrigCodeOffset32_t code_offset = code_section_header->header_byte_count; - BrigBase* code_entry = (BrigBase*) ((char*)code_section_header + code_offset); - while (code_offset != code_section_header->byte_count) { - if (code_entry->kind == BRIG_KIND_DIRECTIVE_KERNEL) { - /* - * Now find the data in the data section - */ - BrigDirectiveExecutable* directive_kernel = (BrigDirectiveExecutable*) (code_entry); - BrigDataOffsetString32_t data_name_offset = directive_kernel->name; - BrigData* data_entry = (BrigData*)((char*) data_section_header + data_name_offset); - if (!strncmp(symbol_name, (char*)data_entry->bytes, strlen(symbol_name))){ - *offset = code_offset; - return HSA_STATUS_SUCCESS; - } - } - code_offset += code_entry->byteCount; - code_entry = (BrigBase*) ((char*)code_section_header + code_offset); + hsa_region_global_flag_t flags; + hsa_region_get_info(region, HSA_REGION_INFO_GLOBAL_FLAGS, &flags); + if (flags & HSA_REGION_GLOBAL_FLAG_KERNARG) { + hsa_region_t* ret = (hsa_region_t*) data; + *ret = region; + return HSA_STATUS_INFO_BREAK; } - return HSA_STATUS_ERROR; + + return HSA_STATUS_SUCCESS; } -int main(int argc, char **argv) -{ +int main(int argc, char **argv) { hsa_status_t err; err = hsa_init(); @@ -182,194 +117,227 @@ int main(int argc, char **argv) /* * Iterate over the agents and pick the gpu agent using - * the find_gpu callback. + * the get_gpu_agent callback. */ - hsa_agent_t device = 0; - err = hsa_iterate_agents(find_gpu, &device); - check(Calling hsa_iterate_agents, err); - - err = (device == 0) ? HSA_STATUS_ERROR : HSA_STATUS_SUCCESS; - check(Checking if the GPU device is non-zero, err); + hsa_agent_t agent; + err = hsa_iterate_agents(get_gpu_agent, &agent); + if(err == HSA_STATUS_INFO_BREAK) { err = HSA_STATUS_SUCCESS; } + check(Getting a gpu agent, err); /* - * Query the name of the device. + * Query the name of the agent. */ char name[64] = { 0 }; - err = hsa_agent_get_info(device, HSA_AGENT_INFO_NAME, name); - check(Querying the device name, err); - printf("The device name is %s.\n", name); + err = hsa_agent_get_info(agent, HSA_AGENT_INFO_NAME, name); + check(Querying the agent name, err); + printf("The agent name is %s.\n", name); /* * Query the maximum size of the queue. */ uint32_t queue_size = 0; - err = hsa_agent_get_info(device, HSA_AGENT_INFO_QUEUE_MAX_SIZE, &queue_size); - check(Querying the device maximum queue size, err); + err = hsa_agent_get_info(agent, HSA_AGENT_INFO_QUEUE_MAX_SIZE, &queue_size); + check(Querying the agent maximum queue size, err); printf("The maximum queue size is %u.\n", (unsigned int) queue_size); /* * Create a queue using the maximum size. */ - hsa_queue_t* commandQueue; - err = hsa_queue_create(device, queue_size, HSA_QUEUE_TYPE_MULTI, NULL, NULL, &commandQueue); + hsa_queue_t* queue; + err = hsa_queue_create(agent, queue_size, HSA_QUEUE_TYPE_SINGLE, NULL, NULL, UINT32_MAX, UINT32_MAX, &queue); check(Creating the queue, err); /* - * Load BRIG, encapsulated in an ELF container, into a BRIG module. + * Load the BRIG binary. */ - hsa_ext_brig_module_t* brigModule; - char file_name[128] = "vector_copy.brig"; - err = create_brig_module_from_brig_file(file_name, &brigModule); - check(Creating the brig module from vector_copy.brig, err); + hsa_ext_module_t module; + load_module_from_file("vector_copy.brig",&module); /* * Create hsa program. */ - hsa_ext_program_handle_t hsaProgram; - err = hsa_ext_program_create(&device, 1, HSA_EXT_BRIG_MACHINE_LARGE, HSA_EXT_BRIG_PROFILE_FULL, &hsaProgram); - check(Creating the hsa program, err); + hsa_ext_program_t program; + memset(&program,0,sizeof(hsa_ext_program_t)); + err = hsa_ext_program_create(HSA_MACHINE_MODEL_LARGE, HSA_PROFILE_FULL, HSA_DEFAULT_FLOAT_ROUNDING_MODE_DEFAULT, NULL, &program); + check(Create the program, err); /* * Add the BRIG module to hsa program. */ - hsa_ext_brig_module_handle_t module; - err = hsa_ext_add_module(hsaProgram, brigModule, &module); + err = hsa_ext_program_add_module(program, module); check(Adding the brig module to the program, err); - /* - * Construct finalization request list. + /* + * Determine the agents ISA. */ - hsa_ext_finalization_request_t finalization_request_list; - finalization_request_list.module = module; - finalization_request_list.program_call_convention = 0; - char kernel_name[128] = "&__OpenCL_vector_copy_kernel"; - err = find_symbol_offset(brigModule, kernel_name, &finalization_request_list.symbol); - check(Finding the symbol offset for the kernel, err); + hsa_isa_t isa; + err = hsa_agent_get_info(agent, HSA_AGENT_INFO_ISA, &isa); + check(Query the agents isa, err); /* - * Finalize the hsa program. + * Finalize the program and extract the code object. */ - err = hsa_ext_finalize_program(hsaProgram, device, 1, &finalization_request_list, NULL, NULL, 0, NULL, 0); + hsa_ext_control_directives_t control_directives; + memset(&control_directives, 0, sizeof(hsa_ext_control_directives_t)); + hsa_code_object_t code_object; + err = hsa_ext_program_finalize(program, isa, 0, control_directives, "", HSA_CODE_OBJECT_TYPE_PROGRAM, &code_object); check(Finalizing the program, err); /* - * Get the hsa code descriptor address. + * Destroy the program, it is no longer needed. */ - hsa_ext_code_descriptor_t *hsaCodeDescriptor; - err = hsa_ext_query_kernel_descriptor_address(hsaProgram, module, finalization_request_list.symbol, &hsaCodeDescriptor); - check(Querying the kernel descriptor address, err); + err=hsa_ext_program_destroy(program); + check(Destroying the program, err); /* - * Create a signal to wait for the dispatch to finish. - */ - hsa_signal_t signal; - err=hsa_signal_create(1, 0, NULL, &signal); - check(Creating a HSA signal, err); + * Create the empty executable. + */ + hsa_executable_t executable; + err = hsa_executable_create(HSA_PROFILE_FULL, HSA_EXECUTABLE_STATE_UNFROZEN, "", &executable); + check(Create the executable, err); /* - * Initialize the dispatch packet. + * Load the code object. */ - hsa_dispatch_packet_t aql; - memset(&aql, 0, sizeof(aql)); + err = hsa_executable_load_code_object(executable, agent, code_object, ""); + check(Loading the code object, err); /* - * Setup the dispatch information. + * Freeze the executable; it can now be queried for symbols. */ - aql.completion_signal=signal; - aql.dimensions=1; - aql.workgroup_size_x=GROUP_SIZE_X; - aql.workgroup_size_y=1; - aql.workgroup_size_z=1; - aql.grid_size_x=GRID_SIZE_X; - aql.grid_size_y=1; - aql.grid_size_z=1; - aql.header.type=HSA_PACKET_TYPE_DISPATCH; - aql.header.acquire_fence_scope=2; - aql.header.release_fence_scope=2; - aql.header.barrier=1; - aql.group_segment_size= hsaCodeDescriptor->workgroup_group_segment_byte_size; - aql.private_segment_size= hsaCodeDescriptor->workitem_private_segment_byte_size; - + err = hsa_executable_freeze(executable, ""); + check(Freeze the executable, err); + + /* + * Extract the symbol from the executable. + */ + hsa_executable_symbol_t symbol; + err = hsa_executable_get_symbol(executable, NULL, "&__OpenCL_vector_copy_kernel", agent, 0, &symbol); + check(Extract the symbol from the executable, err); + + /* + * Extract dispatch information from the symbol + */ + uint64_t kernel_object; + uint32_t kernarg_segment_size; + uint32_t group_segment_size; + uint32_t private_segment_size; + err = hsa_executable_symbol_get_info(symbol, HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_OBJECT, &kernel_object); + check(Extracting the symbol from the executable, err); + err = hsa_executable_symbol_get_info(symbol, HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_KERNARG_SEGMENT_SIZE, &kernarg_segment_size); + check(Extracting the kernarg segment size from the executable, err); + err = hsa_executable_symbol_get_info(symbol, HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_GROUP_SEGMENT_SIZE, &group_segment_size); + check(Extracting the group segment size from the executable, err); + err = hsa_executable_symbol_get_info(symbol, HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_PRIVATE_SEGMENT_SIZE, &private_segment_size); + check(Extracting the private segment from the executable, err); + + /* + * Create a signal to wait for the dispatch to finish. + */ + hsa_signal_t signal; + err=hsa_signal_create(1, 0, NULL, &signal); + check(Creating a HSA signal, err); + /* - * Allocate and initialize the kernel arguments. + * Allocate and initialize the kernel arguments and data. */ - uint64_t total_buffer_size = GRID_SIZE_X * sizeof(int); - int* in=(int*)malloc(total_buffer_size); - memset(in, 1, total_buffer_size); - err=hsa_memory_register(in, total_buffer_size); + char* in=(char*)malloc(GLOBAL_SIZE*4); + memset(in, 1, GLOBAL_SIZE*4); + err=hsa_memory_register(in, GLOBAL_SIZE*4); check(Registering argument memory for input parameter, err); - int* out=(int*)malloc(total_buffer_size); - memset(out, 0, total_buffer_size); - err=hsa_memory_register(out, total_buffer_size); + + char* out=(char*)malloc(GLOBAL_SIZE*4); + memset(out, 0, GLOBAL_SIZE*4); + err=hsa_memory_register(out, GLOBAL_SIZE*4); check(Registering argument memory for output parameter, err); + + struct __attribute__ ((aligned(16))) args_t { + uint64_t global_offset_0; + uint64_t global_offset_1; + uint64_t global_offset_2; + uint64_t printf_buffer; + uint64_t vqueue_pointer; + uint64_t aqlwrap_pointer; + void* in; + void* out; + } args; + memset(&args, 0, sizeof(args)); + args.in=in; + args.out=out; + /* * Find a memory region that supports kernel arguments. */ - hsa_region_t kernarg_region = 0; - hsa_agent_iterate_regions(device, get_kernarg, &kernarg_region); - err = (kernarg_region == 0) ? HSA_STATUS_ERROR : HSA_STATUS_SUCCESS; + hsa_region_t kernarg_region; + kernarg_region.handle=(uint64_t)-1; + hsa_agent_iterate_regions(agent, get_kernarg_memory_region, &kernarg_region); + err = (kernarg_region.handle == (uint64_t)-1) ? HSA_STATUS_ERROR : HSA_STATUS_SUCCESS; check(Finding a kernarg memory region, err); - void* kernel_arg_buffer = NULL; - - size_t kernel_arg_buffer_size = hsaCodeDescriptor->kernarg_segment_byte_size; + void* kernarg_address = NULL; + /* * Allocate the kernel argument buffer from the correct region. */ - err = hsa_memory_allocate(kernarg_region, kernel_arg_buffer_size, - &kernel_arg_buffer); + err = hsa_memory_allocate(kernarg_region, kernarg_segment_size, &kernarg_address); check(Allocating kernel argument memory buffer, err); - uint64_t kernel_arg_start_offset = 0; -#ifdef DUMMY_ARGS - //This flags should be set if HSA_HLC_Stable is used - // This is because the high level compiler generates 6 extra args - kernel_arg_start_offset += sizeof(uint64_t) * 6; - printf("Using dummy args \n"); -#endif - memset(kernel_arg_buffer, 0, kernel_arg_buffer_size); - void *kernel_arg_buffer_start = - (char*)kernel_arg_buffer + kernel_arg_start_offset; - memcpy(kernel_arg_buffer_start, &in, sizeof(void*)); - memcpy(kernel_arg_buffer_start + sizeof(void*), &out, sizeof(void*)); + memcpy(kernarg_address, &args, sizeof(args)); - /* - * Bind kernel code and the kernel argument buffer to the - * aql packet. - */ - aql.kernel_object_address=hsaCodeDescriptor->code.handle; - aql.kernarg_address=(uint64_t)kernel_arg_buffer; - /* * Obtain the current queue write index. */ - uint64_t index = hsa_queue_load_write_index_relaxed(commandQueue); + uint64_t index = hsa_queue_load_write_index_relaxed(queue); /* * Write the aql packet at the calculated queue index address. */ - const uint32_t queueMask = commandQueue->size - 1; - ((hsa_dispatch_packet_t*)(commandQueue->base_address))[index&queueMask]=aql; + const uint32_t queueMask = queue->size - 1; + hsa_kernel_dispatch_packet_t* dispatch_packet = &(((hsa_kernel_dispatch_packet_t*)(queue->base_address))[index&queueMask]); + + dispatch_packet->header |= HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_ACQUIRE_FENCE_SCOPE; + dispatch_packet->header |= HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_RELEASE_FENCE_SCOPE; + dispatch_packet->setup |= 1 << HSA_KERNEL_DISPATCH_PACKET_SETUP_DIMENSIONS; + dispatch_packet->workgroup_size_x = (uint16_t)LOCAL_SIZE; + dispatch_packet->workgroup_size_y = (uint16_t)1; + dispatch_packet->workgroup_size_z = (uint16_t)1; + dispatch_packet->grid_size_x = (uint32_t) (GLOBAL_SIZE); + dispatch_packet->grid_size_y = 1; + dispatch_packet->grid_size_z = 1; + dispatch_packet->completion_signal = signal; + dispatch_packet->kernel_object = kernel_object; + dispatch_packet->kernarg_address = (void*) kernarg_address; + dispatch_packet->private_segment_size = private_segment_size; + dispatch_packet->group_segment_size = group_segment_size; + __atomic_store_n((uint8_t*)(&dispatch_packet->header), (uint8_t)HSA_PACKET_TYPE_KERNEL_DISPATCH, __ATOMIC_RELEASE); /* * Increment the write index and ring the doorbell to dispatch the kernel. */ - hsa_queue_store_write_index_relaxed(commandQueue, index+1); - hsa_signal_store_relaxed(commandQueue->doorbell_signal, index); + hsa_queue_store_write_index_relaxed(queue, index+1); + hsa_signal_store_relaxed(queue->doorbell_signal, index); check(Dispatching the kernel, err); /* - * Wait on the dispatch signal until the kernel is finished. + * Wait on the dispatch completion signal until the kernel is finished. */ - err = hsa_signal_wait_acquire(signal, HSA_LT, 1, (uint64_t) -1, HSA_WAIT_EXPECTANCY_UNKNOWN); - check(Wating on the dispatch signal, err); + hsa_signal_value_t value = hsa_signal_wait_acquire(signal, HSA_SIGNAL_CONDITION_LT, 1, UINT64_MAX, HSA_WAIT_STATE_BLOCKED); /* * Validate the data in the output buffer. */ - int valid = memcmp(in, out, total_buffer_size); - if(!valid) { + int valid=1; + int fail_index=0; + for(int i=0; i #include "hsa.h" #include "hsa_ext_finalize.h" -#include "../common/elf_utils.h" + +#define GRID_SIZE_X 1024*1024 +#define GROUP_SIZE_X 512 + #define check(msg, status) \ if (status != HSA_STATUS_SUCCESS) { \ printf("%s failed.\n", #msg); \ @@ -35,80 +38,51 @@ if (status != HSA_STATUS_SUCCESS) { \ printf("%s succeeded.\n", #msg); \ } -#define GRID_SIZE_X 2048 -#define GROUP_SIZE_X 256 - /* - * Define required BRIG data structures. + * Loads a BRIG module from a specified file. This + * function does not validate the module. */ +int load_module_from_file(const char* file_name, hsa_ext_module_t* module) { + int rc = -1; + + FILE *fp = fopen(file_name, "rb"); + + rc = fseek(fp, 0, SEEK_END); + + size_t file_size = (size_t) (ftell(fp) * sizeof(char)); + + rc = fseek(fp, 0, SEEK_SET); + + char* buf = (char*) malloc(file_size); + + memset(buf,0,file_size); -typedef uint32_t BrigCodeOffset32_t; - -typedef uint32_t BrigDataOffset32_t; - -typedef uint16_t BrigKinds16_t; - -typedef uint8_t BrigLinkage8_t; - -typedef uint8_t BrigExecutableModifier8_t; - -typedef BrigDataOffset32_t BrigDataOffsetString32_t; - -enum BrigKinds { - BRIG_KIND_NONE = 0x0000, - BRIG_KIND_DIRECTIVE_BEGIN = 0x1000, - BRIG_KIND_DIRECTIVE_KERNEL = 0x1008, -}; - -typedef struct BrigBase BrigBase; -struct BrigBase { - uint16_t byteCount; - BrigKinds16_t kind; -}; - -typedef struct BrigExecutableModifier BrigExecutableModifier; -struct BrigExecutableModifier { - BrigExecutableModifier8_t allBits; -}; - -typedef struct BrigDirectiveExecutable BrigDirectiveExecutable; -struct BrigDirectiveExecutable { - uint16_t byteCount; - BrigKinds16_t kind; - BrigDataOffsetString32_t name; - uint16_t outArgCount; - uint16_t inArgCount; - BrigCodeOffset32_t firstInArg; - BrigCodeOffset32_t firstCodeBlockEntry; - BrigCodeOffset32_t nextModuleEntry; - uint32_t codeBlockEntryCount; - BrigExecutableModifier modifier; - BrigLinkage8_t linkage; - uint16_t reserved; -}; - -typedef struct BrigData BrigData; -struct BrigData { - uint32_t byteCount; - uint8_t bytes[1]; -}; + size_t read_size = fread(buf,sizeof(char),file_size,fp); + + if(read_size != file_size) { + free(buf); + } else { + rc = 0; + *module = (hsa_ext_module_t) buf; + } + + fclose(fp); + + return rc; +} /* * Determines if the given agent is of type HSA_DEVICE_TYPE_GPU * and sets the value of data to the agent handle if it is. */ -static hsa_status_t find_gpu(hsa_agent_t agent, void *data) { - if (data == NULL) { - return HSA_STATUS_ERROR_INVALID_ARGUMENT; - } +static hsa_status_t get_gpu_agent(hsa_agent_t agent, void *data) { + hsa_status_t status; hsa_device_type_t device_type; - hsa_status_t stat = - hsa_agent_get_info(agent, HSA_AGENT_INFO_DEVICE, &device_type); - if (stat != HSA_STATUS_SUCCESS) { - return stat; - } - if (device_type == HSA_DEVICE_TYPE_GPU) { - *((hsa_agent_t *)data) = agent; + status = hsa_agent_get_info(agent, HSA_AGENT_INFO_DEVICE, &device_type); + if (HSA_STATUS_SUCCESS == status && HSA_DEVICE_TYPE_GPU == device_type) { + hsa_agent_t* ret = (hsa_agent_t*)data; + *ret = agent; + return HSA_STATUS_INFO_BREAK; } return HSA_STATUS_SUCCESS; } @@ -117,60 +91,24 @@ static hsa_status_t find_gpu(hsa_agent_t agent, void *data) { * Determines if a memory region can be used for kernarg * allocations. */ -static hsa_status_t get_kernarg(hsa_region_t region, void* data) { - hsa_region_flag_t flags; - hsa_region_get_info(region, HSA_REGION_INFO_FLAGS, &flags); - if (flags & HSA_REGION_FLAG_KERNARG) { +static hsa_status_t get_kernarg_memory_region(hsa_region_t region, void* data) { + hsa_region_segment_t segment; + hsa_region_get_info(region, HSA_REGION_INFO_SEGMENT, &segment); + if (HSA_REGION_SEGMENT_GLOBAL != segment) { + return HSA_STATUS_SUCCESS; + } + + hsa_region_global_flag_t flags; + hsa_region_get_info(region, HSA_REGION_INFO_GLOBAL_FLAGS, &flags); + if (flags & HSA_REGION_GLOBAL_FLAG_KERNARG) { hsa_region_t* ret = (hsa_region_t*) data; *ret = region; - return HSA_STATUS_SUCCESS; + return HSA_STATUS_INFO_BREAK; } + return HSA_STATUS_SUCCESS; } -/* - * Finds the specified symbols offset in the specified brig_module. - * If the symbol is found the function returns HSA_STATUS_SUCCESS, - * otherwise it returns HSA_STATUS_ERROR. - */ -hsa_status_t find_symbol_offset(hsa_ext_brig_module_t* brig_module, - char* symbol_name, - hsa_ext_brig_code_section_offset32_t* offset) { - - /* - * Get the data section - */ - hsa_ext_brig_section_header_t* data_section_header = - brig_module->section[HSA_EXT_BRIG_SECTION_DATA]; - /* - * Get the code section - */ - hsa_ext_brig_section_header_t* code_section_header = - brig_module->section[HSA_EXT_BRIG_SECTION_CODE]; - - /* - * First entry into the BRIG code section - */ - BrigCodeOffset32_t code_offset = code_section_header->header_byte_count; - BrigBase* code_entry = (BrigBase*) ((char*)code_section_header + code_offset); - while (code_offset != code_section_header->byte_count) { - if (code_entry->kind == BRIG_KIND_DIRECTIVE_KERNEL) { - /* - * Now find the data in the data section - */ - BrigDirectiveExecutable* directive_kernel = (BrigDirectiveExecutable*) (code_entry); - BrigDataOffsetString32_t data_name_offset = directive_kernel->name; - BrigData* data_entry = (BrigData*)((char*) data_section_header + data_name_offset); - if (!strncmp(symbol_name, (char*)data_entry->bytes, strlen(symbol_name))){ - *offset = code_offset; - return HSA_STATUS_SUCCESS; - } - } - code_offset += code_entry->byteCount; - code_entry = (BrigBase*) ((char*)code_section_header + code_offset); - } - return HSA_STATUS_ERROR; -} int main(int argc, char **argv) { @@ -181,106 +119,120 @@ int main(int argc, char **argv) /* * Iterate over the agents and pick the gpu agent using - * the find_gpu callback. + * the get_gpu_agent callback. */ - hsa_agent_t device = 0; - err = hsa_iterate_agents(find_gpu, &device); - check(Calling hsa_iterate_agents, err); - - err = (device == 0) ? HSA_STATUS_ERROR : HSA_STATUS_SUCCESS; - check(Checking if the GPU device is non-zero, err); + hsa_agent_t agent; + err = hsa_iterate_agents(get_gpu_agent, &agent); + if(err == HSA_STATUS_INFO_BREAK) { err = HSA_STATUS_SUCCESS; } + check(Getting a gpu agent, err); /* - * Query the name of the device. + * Query the name of the agent. */ char name[64] = { 0 }; - err = hsa_agent_get_info(device, HSA_AGENT_INFO_NAME, name); - check(Querying the device name, err); - printf("The device name is %s.\n", name); + err = hsa_agent_get_info(agent, HSA_AGENT_INFO_NAME, name); + check(Querying the agent name, err); + printf("The agent name is %s.\n", name); /* * Query the maximum size of the queue. */ uint32_t queue_size = 0; - err = hsa_agent_get_info(device, HSA_AGENT_INFO_QUEUE_MAX_SIZE, &queue_size); - check(Querying the device maximum queue size, err); + err = hsa_agent_get_info(agent, HSA_AGENT_INFO_QUEUE_MAX_SIZE, &queue_size); + check(Querying the agent maximum queue size, err); printf("The maximum queue size is %u.\n", (unsigned int) queue_size); /* * Create a queue using the maximum size. */ - hsa_queue_t* commandQueue; - err = hsa_queue_create(device, queue_size, HSA_QUEUE_TYPE_MULTI, NULL, NULL, &commandQueue); + hsa_queue_t* queue; + err = hsa_queue_create(agent, queue_size, HSA_QUEUE_TYPE_SINGLE, NULL, NULL, UINT32_MAX, UINT32_MAX, &queue); check(Creating the queue, err); /* - * Load BRIG, encapsulated in an ELF container, into a BRIG module. + * Load the BRIG binary. */ - hsa_ext_brig_module_t* brigModule; - char file_name[128] = "vector_copy.brig"; - err = create_brig_module_from_brig_file(file_name, &brigModule); - check(Creating the brig module from vector_copy.brig, err); + hsa_ext_module_t module; + load_module_from_file("vector_copy.brig",&module); /* * Create hsa program. */ - hsa_ext_program_handle_t hsaProgram; - err = hsa_ext_program_create(&device, 1, HSA_EXT_BRIG_MACHINE_LARGE, HSA_EXT_BRIG_PROFILE_FULL, &hsaProgram); - check(Creating the hsa program, err); + hsa_ext_program_t program; + memset(&program,0,sizeof(hsa_ext_program_t)); + err = hsa_ext_program_create(HSA_MACHINE_MODEL_LARGE, HSA_PROFILE_FULL, HSA_DEFAULT_FLOAT_ROUNDING_MODE_DEFAULT, NULL, &program); + check(Create the program, err); /* * Add the BRIG module to hsa program. */ - hsa_ext_brig_module_handle_t module; - err = hsa_ext_add_module(hsaProgram, brigModule, &module); + err = hsa_ext_program_add_module(program, module); check(Adding the brig module to the program, err); - /* - * Construct finalization request list. + /* + * Determine the agents ISA. */ - hsa_ext_finalization_request_t finalization_request_list; - finalization_request_list.module = module; - finalization_request_list.program_call_convention = 0; - char kernel_name[128] = "&__OpenCL_vector_copy_kernel"; - err = find_symbol_offset(brigModule, kernel_name, &finalization_request_list.symbol); - check(Finding the symbol offset for the kernel, err); + hsa_isa_t isa; + err = hsa_agent_get_info(agent, HSA_AGENT_INFO_ISA, &isa); + check(Query the agents isa, err); /* - * Finalize the hsa program. + * Finalize the program and extract the code object. */ - err = hsa_ext_finalize_program(hsaProgram, device, 1, &finalization_request_list, NULL, NULL, 0, NULL, 0); + hsa_ext_control_directives_t control_directives; + memset(&control_directives, 0, sizeof(hsa_ext_control_directives_t)); + hsa_code_object_t code_object; + err = hsa_ext_program_finalize(program, isa, 0, control_directives, "", HSA_CODE_OBJECT_TYPE_PROGRAM, &code_object); check(Finalizing the program, err); /* - * Get the hsa code descriptor address. + * Destroy the program, it is no longer needed. */ - hsa_ext_code_descriptor_t *hsaCodeDescriptor; - err = hsa_ext_query_kernel_descriptor_address(hsaProgram, module, finalization_request_list.symbol, &hsaCodeDescriptor); - check(Querying the kernel descriptor address, err); + err=hsa_ext_program_destroy(program); + check(Destroying the program, err); + /* + * Create the empty executable. + */ + hsa_executable_t executable; + err = hsa_executable_create(HSA_PROFILE_FULL, HSA_EXECUTABLE_STATE_UNFROZEN, "", &executable); + check(Create the executable, err); /* - * Initialize the dispatch packet. + * Load the code object. */ - hsa_dispatch_packet_t aql; - memset(&aql, 0, sizeof(aql)); + err = hsa_executable_load_code_object(executable, agent, code_object, ""); + check(Loading the code object, err); /* - * Setup the dispatch information. + * Freeze the executable; it can now be queried for symbols. */ - aql.dimensions=1; - aql.workgroup_size_x=GROUP_SIZE_X; - aql.workgroup_size_y=1; - aql.workgroup_size_z=1; - aql.grid_size_x=GRID_SIZE_X/2; - aql.grid_size_y=1; - aql.grid_size_z=1; - aql.header.type=HSA_PACKET_TYPE_DISPATCH; - aql.header.acquire_fence_scope=2; - aql.header.release_fence_scope=2; - aql.header.barrier=1; - aql.group_segment_size= hsaCodeDescriptor->workgroup_group_segment_byte_size; - aql.private_segment_size= hsaCodeDescriptor->workitem_private_segment_byte_size; + err = hsa_executable_freeze(executable, ""); + check(Freeze the executable, err); + + /* + * Extract the symbol from the executable. + */ + hsa_executable_symbol_t symbol; + err = hsa_executable_get_symbol(executable, NULL, "&__OpenCL_vector_copy_kernel", agent, 0, &symbol); + check(Extract the symbol from the executable, err); + + /* + * Extract dispatch information from the symbol + */ + uint64_t kernel_object; + uint32_t kernarg_segment_size; + uint32_t group_segment_size; + uint32_t private_segment_size; + err = hsa_executable_symbol_get_info(symbol, HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_OBJECT, &kernel_object); + check(Extracting the symbol from the executable, err); + err = hsa_executable_symbol_get_info(symbol, HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_KERNARG_SEGMENT_SIZE, &kernarg_segment_size); + check(Extracting the kernarg segment size from the executable, err); + err = hsa_executable_symbol_get_info(symbol, HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_GROUP_SEGMENT_SIZE, &group_segment_size); + check(Extracting the group segment size from the executable, err); + err = hsa_executable_symbol_get_info(symbol, HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_PRIVATE_SEGMENT_SIZE, &private_segment_size); + check(Extracting the private segment from the executable, err); + /* * Allocate and initialize the kernel arguments. @@ -299,21 +251,22 @@ int main(int argc, char **argv) /* * Find a memory region that supports kernel arguments. */ - hsa_region_t kernarg_region = 0; - hsa_agent_iterate_regions(device, get_kernarg, &kernarg_region); - err = (kernarg_region == 0) ? HSA_STATUS_ERROR : HSA_STATUS_SUCCESS; + hsa_region_t kernarg_region; + kernarg_region.handle=(uint64_t)-1; + hsa_agent_iterate_regions(agent, get_kernarg_memory_region, &kernarg_region); + err = (kernarg_region.handle == (uint64_t)-1) ? HSA_STATUS_ERROR : HSA_STATUS_SUCCESS; check(Finding a kernarg memory region, err); void* kernel_arg_buffer_1 = NULL; void* kernel_arg_buffer_2 = NULL; - size_t kernel_arg_buffer_size = hsaCodeDescriptor->kernarg_segment_byte_size; + size_t kernel_arg_buffer_size = kernarg_segment_size; /* * Allocate the kernel argument buffer from the correct region. */ - err = hsa_memory_allocate(kernarg_region, kernel_arg_buffer_size, + err = hsa_memory_allocate(kernarg_region, kernarg_segment_size, &kernel_arg_buffer_1); check(Allocating kernel argument memory buffer, err); - err = hsa_memory_allocate(kernarg_region, kernel_arg_buffer_size, + err = hsa_memory_allocate(kernarg_region, kernarg_segment_size, &kernel_arg_buffer_2); check(Allocating kernel argument memory buffer, err); uint64_t kernel_arg_start_offset = 0; @@ -336,70 +289,96 @@ int main(int argc, char **argv) memcpy(kernel_arg_buffer_start_2 + sizeof(void*) + sizeof(void*), &offset, sizeof(int)); - /* Dispatch packet to copy elements from 0 to GRID_SIZE_X/2 - * Bind kernel code and the kernel argument buffer to the - * aql packet. - */ - aql.kernel_object_address=hsaCodeDescriptor->code.handle; - aql.kernarg_address=(uint64_t)kernel_arg_buffer_1; - /* * Obtain the current queue write index. */ - uint64_t index = hsa_queue_load_write_index_relaxed(commandQueue); + uint64_t index = hsa_queue_load_write_index_relaxed(queue); /* * Write the aql packet at the calculated queue index address. */ - const uint32_t queueMask = commandQueue->size - 1; - ((hsa_dispatch_packet_t*)(commandQueue->base_address))[index&queueMask]=aql; + const uint32_t queueMask = queue->size - 1; + hsa_kernel_dispatch_packet_t* dispatch_packet = &(((hsa_kernel_dispatch_packet_t*)(queue->base_address))[index&queueMask]); + + dispatch_packet->header |= HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_ACQUIRE_FENCE_SCOPE; + dispatch_packet->header |= HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_RELEASE_FENCE_SCOPE; + dispatch_packet->setup |= 1 << HSA_KERNEL_DISPATCH_PACKET_SETUP_DIMENSIONS; + dispatch_packet->workgroup_size_x = (uint16_t)GROUP_SIZE_X; + dispatch_packet->workgroup_size_y = (uint16_t)1; + dispatch_packet->workgroup_size_z = (uint16_t)1; + dispatch_packet->grid_size_x = (uint32_t) (GRID_SIZE_X/2); + dispatch_packet->grid_size_y = 1; + dispatch_packet->grid_size_z = 1; + dispatch_packet->completion_signal.handle = 0; + dispatch_packet->kernel_object = kernel_object; + dispatch_packet->kernarg_address = (void*) kernel_arg_buffer_1; + dispatch_packet->private_segment_size = private_segment_size; + dispatch_packet->group_segment_size = group_segment_size; + __atomic_store_n((uint8_t*)(&dispatch_packet->header), (uint8_t)HSA_PACKET_TYPE_KERNEL_DISPATCH, __ATOMIC_RELEASE); + + /* + * Increment the write index of the queue. + */ + hsa_queue_store_write_index_relaxed(queue, index+1); /* + * Write the second packet + */ + index = hsa_queue_load_write_index_relaxed(queue); + + dispatch_packet = &(((hsa_kernel_dispatch_packet_t*)(queue->base_address))[index&queueMask]); + + dispatch_packet->header |= HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_ACQUIRE_FENCE_SCOPE; + dispatch_packet->header |= HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_RELEASE_FENCE_SCOPE; + dispatch_packet->setup |= 1 << HSA_KERNEL_DISPATCH_PACKET_SETUP_DIMENSIONS; + dispatch_packet->workgroup_size_x = (uint16_t)GROUP_SIZE_X; + dispatch_packet->workgroup_size_y = (uint16_t)1; + dispatch_packet->workgroup_size_z = (uint16_t)1; + dispatch_packet->grid_size_x = (uint32_t) (GRID_SIZE_X/2); + dispatch_packet->grid_size_y = 1; + dispatch_packet->grid_size_z = 1; + dispatch_packet->completion_signal.handle = 0; + dispatch_packet->kernel_object = kernel_object; + dispatch_packet->kernarg_address = (void*) kernel_arg_buffer_2; + dispatch_packet->private_segment_size = private_segment_size; + dispatch_packet->group_segment_size = group_segment_size; + __atomic_store_n((uint8_t*)(&dispatch_packet->header), (uint8_t)HSA_PACKET_TYPE_KERNEL_DISPATCH, __ATOMIC_RELEASE); + + /* * Increment the write index and ring the doorbell to dispatch the kernel. */ - hsa_queue_store_write_index_relaxed(commandQueue, index+1); - hsa_signal_store_relaxed(commandQueue->doorbell_signal, index); - check(Dispatching the kernel, err); + hsa_queue_store_write_index_relaxed(queue, index+1); + /* Dispatch packet to copy elements from GRID_SIZE_X/2 to GRID_SIZE_X * Bind kernel code and the kernel argument buffer to the * aql packet. */ hsa_signal_t signal; hsa_signal_create(1, 0, NULL, &signal); - - aql.kernarg_address=(uint64_t)kernel_arg_buffer_2; - aql.completion_signal=signal; - index = hsa_queue_load_write_index_relaxed(commandQueue); - ((hsa_dispatch_packet_t*)(commandQueue->base_address))[index&queueMask]=aql; - hsa_queue_store_write_index_relaxed(commandQueue, index+1); - hsa_signal_store_relaxed(commandQueue->doorbell_signal, index); - check(Dispatching the kernel, err); - - - + /* * Dispatch a barrier packet to flush the queue */ - hsa_barrier_packet_t barrier; - memset(&barrier, 0, sizeof(hsa_barrier_packet_t)); - barrier.header.type=HSA_PACKET_TYPE_BARRIER; - barrier.header.acquire_fence_scope=2; - barrier.header.release_fence_scope=2; - barrier.header.barrier=1; + hsa_barrier_or_packet_t barrier; + memset(&barrier, 0, sizeof(hsa_barrier_or_packet_t)); + barrier.header |= HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_ACQUIRE_FENCE_SCOPE; + barrier.header |= HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_RELEASE_FENCE_SCOPE; + barrier.header |= 1 << HSA_PACKET_HEADER_BARRIER; + barrier.header |= HSA_PACKET_TYPE_BARRIER_AND << HSA_PACKET_HEADER_TYPE; + barrier.completion_signal = signal; - barrier.completion_signal = signal; - index = hsa_queue_load_write_index_relaxed(commandQueue); - ((hsa_barrier_packet_t*)(commandQueue->base_address))[index&queueMask]=barrier; - hsa_queue_store_write_index_relaxed(commandQueue, index+1); - hsa_signal_store_relaxed(commandQueue->doorbell_signal, index); + index = hsa_queue_load_write_index_relaxed(queue); + ((hsa_barrier_or_packet_t*)(queue->base_address))[index&queueMask]=barrier; + hsa_queue_store_write_index_relaxed(queue, index+1); + hsa_signal_store_relaxed(queue->doorbell_signal, index); check(Dispatching the kernel, err); //Wait for completion signal - hsa_signal_wait_acquire(signal, HSA_LT, 1, (uint64_t) -1, HSA_WAIT_EXPECTANCY_UNKNOWN); + hsa_signal_wait_acquire(signal, HSA_SIGNAL_CONDITION_LT, 1, UINT64_MAX, HSA_WAIT_STATE_BLOCKED); /* * Validate the data in the output buffer. @@ -414,14 +393,16 @@ int main(int argc, char **argv) /* * Cleanup all allocated resources. */ - destroy_brig_module(brigModule); err=hsa_signal_destroy(signal); check(Destroying the signal, err); - err=hsa_ext_program_destroy(hsaProgram); - check(Destroying the program, err); + err=hsa_executable_destroy(executable); + check(Destroying the executable, err); + + err=hsa_code_object_destroy(code_object); + check(Destroying the code object, err); - err=hsa_queue_destroy(commandQueue); + err=hsa_queue_destroy(queue); check(Destroying the queue, err); err=hsa_shut_down(); diff --git a/examples/okra/BVH/BVH.cl b/examples/okra/BVH/BVH.cl deleted file mode 100644 index e904794..0000000 --- a/examples/okra/BVH/BVH.cl +++ /dev/null @@ -1,163 +0,0 @@ -#define SVM_DATA_STRUCT_OPENCL_DEVICE - -#include "BVH.h" -#include "svm_data_struct.h" - -#define NULL 0 -#define _E 1 - -inline bool containsPoint( Sphere *s, double ox, double oy, double oz) { - double dx = s->x - ox; - double dy = s->y - oy; - double dz = 0; - double radius2 = s->radius2; - if((dx*dx + dy*dy + dz*dz) <= radius2) - return true; - else - return false; -} - -inline bool processLeaf(__global OutIdx *v, __global Sphere *LeafObjects, size_t n, double x, double y, double z) -{ - for (size_t i = 0; i < n; i++) - { - Sphere *s = &LeafObjects[i]; - if (containsPoint(s, x, y, z)) - { - v->idx = s->idx; - } - } - return true; -} - -inline int computeResult() -{ - // [5][5] X [5][5] X [5][1] X [1][5] Matrix multiplication - // Result is used to saturate LUMA - int zz; - int count = 20; - int c[5][5], sum = 0, result2[5][1], compute_result =0; - - for (zz = 1; zz < count; zz++) { - int i,j,k,m=5, n=5, x=5, q=5; - for(i = 0; i < m; i++) - { - for(j = 0; j < x; j++) - { - sum=0; - for(k = 0; k < n; k++) - { - sum = sum + (a_mat[i][k] * b_mat[k][j]); - } - c[i][j] = sum / zz; - } - } - sum = 0; - m=5; n=5; x=1; q=5; - for(i = 0; i < m; i++) - { - for(j = 0; j < x; j++) - { - sum=0; - for(k = 0; k < n; k++) - { - sum = sum + (c[i][k] * c_mat[k][j]); - } - result2[i][j] = sum; - } - } - m=1; n=5; x=1; q=5; - sum = 0; - for(i = 0; i < m; i++) - { - for(j = 0; j < x; j++) - { - sum=0; - for(k = 0; k < n; k++) - { - sum = sum + (d_mat[i][k] * result2[k][j]); - } - compute_result += sum; - } - } - } - return compute_result; -} -inline bool doesPointLieInsideBVH(__global BVH* node, double x, double y, double z) -{ - return ((x >= node->minX) && (y>= node->minY) && (x <= node->maxX) && (y<= node->maxY)); -} - -/* - * This kernel searches a set of points in a BVH. - * Arguments: - * 1. root node of the BVH. - * 2. An array of points to be searched. - * 3. An array of nodes pointers found in the search. - */ -__kernel void bvh_search( - __global void *root_parm, - __global float_3 *search_point, - __global void* found_nodes_parm) -{ - __global BVH *root = (__global BVH *)root_parm; - - int gid = get_global_id(0); - int init_id = gid; - - float x = search_point[init_id].x; - float y = search_point[init_id].y; - float z = search_point[init_id].z; - __global OutIdx *found_nodes_temp = (__global OutIdx *)found_nodes_parm; - __global OutIdx *list=found_nodes_temp+init_id; - - // Allocate traversal stack from thread-local memory, - // and push NULL to indicate that there are no postponed nodes. - //BVH *stack[64]={NULL}; - // BVH **stackPtr = stack; - //stackPtr = NULL; // push - // *stackPtr++ = NULL; - - // Traverse nodes starting from the root. - __global BVH* node = root; - do - { - // Check each child node for overlap. - __global BVH* childL = node->prev; - __global BVH* childR = node->next; - - - bool overlapL = doesPointLieInsideBVH(node->prev,x,y,z); - bool overlapR = doesPointLieInsideBVH(node->next,x,y,z); - - - // Query overlaps a leaf node => report collision. - if (overlapL && childL->nleafObjects!=0) - list->idx = childL->leafObjects[0].idx;// + computeResult(); - - if (overlapR && childR->nleafObjects!=0) - list->idx = childR->leafObjects[0].idx;// +computeResult(); - - /* - list->idx = (overlapL && childL->nleafObjects!=0) * childL->leafObjects[0].idx + (!(overlapL && childL->nleafObjects!=0)) * list->idx; - list->idx = (overlapR && childR->nleafObjects!=0) * childR->leafObjects[0].idx + (!(overlapR && childR->nleafObjects!=0)) * list->idx; - list->idx += (overlapL && childL->nleafObjects!=0) * computeResult(); - list->idx += (overlapR && childR->nleafObjects!=0) * computeResult(); - */ - // Query overlaps an internal node => traverse. - bool traverseL = (overlapL && !node->prev->nleafObjects); - bool traverseR = (overlapR && !node->next->nleafObjects); - - #if 1 - node=NULL; - //if (traverseL || traverseR) - node = (traverseR) ? childR : node; - node = (traverseL) ? childL : node; - - - #endif - //node =(__global BVH*) ((traverseL)*(long) childL + (traverseR)* (long)childR+(1-traverseL-traverseR)*NULL); - //node = (traverseL) ? childL : ((traverseR)? childR:NULL); - } - while (node); -} diff --git a/examples/okra/BVH/BVH.cpp b/examples/okra/BVH/BVH.cpp deleted file mode 100644 index ac08219..0000000 --- a/examples/okra/BVH/BVH.cpp +++ /dev/null @@ -1,384 +0,0 @@ -#include -#include -#include "BVH.h" - -static bool bvhsortx (Sphere s,Sphere t) { return (s.xparent = lparent; - - // Early out check due to bad data - // If the list is empty then we have no BVHGObj, or invalid parameters are passed in - if (!spherelist|| (count == 0)) - { - b->minX = 0; - b->maxX = 0; - b->minY = 0; - b->maxY = 0; - b->minZ = 0; - b->maxZ = 0; - b->prev = NULL; - b->next = NULL; - b->leafObjects = NULL; - return; - } - - // Check if we’re at our LEAF node, and if so, save the objects and stop recursing. Also store the min/max for the leaf node and update the parent appropriately - if (count <= 2) - { - // We need to find the aggregate min/max for all 4 remaining objects - // Start by recording the min max of the first object to have a starting point, then we’ll loop through the remaining - b->minX = spherelist[0].x - spherelist[0].radius; - b->maxX = spherelist[0].x + spherelist[0].radius; - b->minY = spherelist[0].y - spherelist[0].radius; - b->maxY = spherelist[0].y + spherelist[0].radius; - b->minZ = spherelist[0].z - spherelist[0].radius; - b->maxZ = spherelist[0].z + spherelist[0].radius; - - // once we reach the leaf node, we must set prev/next to NULL to signify the end - b->prev = NULL; - b->next = NULL; - - // at the leaf node we store the remaining objects, so initialize a list - b->leafObjects = (Sphere*)objData->mem_malloc(objData, count*sizeof(Sphere)); - b->nleafObjects = count; - - // loop through all the objects to add them to our leaf node, and calculate the min/max values as we go - for (loop = 0; loop < count; loop++) - { - // test min X and max X against the current bounding volume - if ((spherelist[loop].x - spherelist[loop].radius) < b->minX) - b->minX = (spherelist[loop].x - spherelist[loop].radius); - if ((spherelist[loop].x + spherelist[loop].radius) > b->maxX) - b->maxX = (spherelist[loop].x + spherelist[loop].radius); - - // Update the leaf node’s parent if appropriate with the min/max - if (b->parent != NULL && b->minX < b->parent->minX) - b->parent->minX = b->minX; - if (b->parent != NULL && b->maxX > b->parent->maxX) - b->parent->maxX = b->maxX; - - // test min Y and max Y against the current bounding volume - if ((spherelist[loop].y - spherelist[loop].radius) < b->minY) - b->minY = (spherelist[loop].y - spherelist[loop].radius); - if ((spherelist[loop].y + spherelist[loop].radius) > b->maxY) - b->maxY = (spherelist[loop].y + spherelist[loop].radius); - - // Update the leaf node’s parent if appropriate with the min/max - if (b->parent != NULL && b->minY < b->parent->minY) - b->parent->minY = b->minY; - if (b->parent != NULL && b->maxY > b->parent->maxY) - b->parent->maxY = b->maxY; - - // test min Z and max Z against the current bounding volume - if ( (spherelist[loop].z - spherelist[loop].radius) < b->minZ ) - b->minZ = (spherelist[loop].z - spherelist[loop].radius); - if ( (spherelist[loop].z + spherelist[loop].radius) > b->maxZ ) - b->maxZ = (spherelist[loop].z + spherelist[loop].radius); - - // Update the leaf node’s parent if appropriate with the min/max - if (b->parent != NULL && b->minZ < b->parent->minZ) - b->parent->minZ = b->minZ; - if (b->parent != NULL && b->maxZ > b->parent->maxZ) - b->parent->maxZ = b->maxZ; - - // store our object into this nodes object list - b->leafObjects[loop].x = spherelist[loop].x; - b->leafObjects[loop].y = spherelist[loop].y; - b->leafObjects[loop].z = spherelist[loop].z; - b->leafObjects[loop].r = spherelist[loop].r; - b->leafObjects[loop].g = spherelist[loop].g; - b->leafObjects[loop].b = spherelist[loop].b; - b->leafObjects[loop].idx = spherelist[loop].idx; - b->leafObjects[loop].radius = spherelist[loop].radius; - b->leafObjects[loop].radius2 = spherelist[loop].radius2; - - // store this leaf node back in out object so we can quickly find what leaf node our object is stored in - } - // done with this branch, return recursively and on return update the parent min/max bounding volume - return; - } - - b->nleafObjects = 0; - Sphere *newlist = (Sphere*)malloc(count*sizeof(Sphere)); - - // if we have more than one object then sort the list and create the bvhGObj - for (loop = 0; loop < count; loop++) - { - // first create a new list using just the subject of objects from the old list - newlist[loop].x = spherelist[loop].x; - newlist[loop].y = spherelist[loop].y; - newlist[loop].z = spherelist[loop].z; - newlist[loop].r = spherelist[loop].r; - newlist[loop].g = spherelist[loop].g; - newlist[loop].b = spherelist[loop].b; - newlist[loop].idx = spherelist[loop].idx; - newlist[loop].radius = spherelist[loop].radius; - newlist[loop].radius2 = spherelist[loop].radius2; - } - - switch (axisid) // sort along the appropriate axis - { - case X: // X - std::sort(newlist, newlist+count, bvhsortx); - if (newlist[0].x == newlist[count-1].x) - center = (size_t)(count * 0.5f); - else - center = std::distance(newlist, std::lower_bound(newlist, newlist+count, (double)((newlist[0].x + newlist[count-1].x) * 0.5f), bvhcompx)); - break; - case Y: // Y - std::sort(newlist, newlist+count, bvhsorty); - if (newlist[0].y == newlist[count-1].y) - center = (size_t)(count * 0.5f); - else - center = std::distance(newlist, std::lower_bound(newlist, newlist+count, (double)((newlist[0].y + newlist[count-1].y) * 0.5f), bvhcompy)); - break; - case Z: // Z - std::sort(newlist, newlist+count, bvhsortz); - if (newlist[0].z == newlist[count-1].z) - center = (size_t)(count * 0.5f); - else - center = std::distance(newlist, std::lower_bound(newlist, newlist+count, (double)((newlist[0].z + newlist[count-1].z) * 0.5f), bvhcompz)); - break; - } - - // Find the center object in our current sub-list - //center = (size_t)(count * 0.5f); - - // Initialize the branch to a starting value, then we’ll update it based on the leaf node recursion updating the parent - b->minX = newlist[0].x - newlist[0].radius; - b->maxX = newlist[0].x + newlist[0].radius; - b->minY = newlist[0].y - newlist[0].radius; - b->maxY = newlist[0].y + newlist[0].radius; - b->minZ = newlist[0].z - newlist[0].radius; - b->maxZ = newlist[0].z + newlist[0].radius; - b->leafObjects = NULL; - - // if we’re here then we’re still in a leaf node. therefore we need to split prev/next and keep branching until we reach the leaf node - BVH *temp = (BVH*)nodeData->mem_malloc(nodeData, sizeof(BVH)); - buildBVH(temp, newlist, b,center, NextAxis(axisid), nodeData, objData); // Split the Hierarchy to the left - b->prev = temp; - - BVH *temp1 = (BVH*)nodeData->mem_malloc(nodeData, sizeof(BVH)); - buildBVH(temp1, newlist+center, b, count-center, NextAxis(axisid), nodeData, objData); // Split the Hierarchy to the Right - b->next = temp1; - - free(newlist); - - // Update the parent bounding box to ensure it includes the children. Note: the leaf node already updated it’s parent, but now that parent needs to keep updating it’s branch parent until we reach the root level - if (b->parent != NULL && b->minX < b->parent->minX) - b->parent->minX =b-> minX; - if (b->parent != NULL && b->maxX > b->parent->maxX) - b->parent->maxX = b->maxX; - if (b->parent != NULL && b->minY < b->parent->minY) - b->parent->minY = b->minY; - if (b->parent != NULL && b->maxY > b->parent->maxY) - b->parent->maxY = b->maxY; - if (b->parent != NULL && b->minZ < b->parent->minZ) - b->parent->minZ = b->minZ; - if (b->parent != NULL && b->maxZ > b->parent->maxZ) - b->parent->maxZ = b->maxZ; - - return; -} - -static bool doesPointLieInsideBVH(BVH* node, double x, double y, double z) -{ - bool retVal = false; - if ((x >= node->minX) && (y>= node->minY) /*&& (z>= node->minZ)*/ && (x <= node->maxX) && (y<= node->maxY) /* && (z <= node->maxZ)*/) - retVal = true; - return retVal; -} - -static bool processLeaf(OutIdx *v, Sphere *LeafObjects, size_t n, double x, double y, double z) -{ - for (size_t i = 0; i < n; i++) - { - Sphere s = LeafObjects[i]; - if (containsPoint(&s, x, y, z)) - { -#ifdef RAYTRACE - v[s.idx].idx = s.idx; -#else - v->idx = s.idx; -#endif - } - } - return true; -} - -void traverseBVHRecursive(BVH *b, OutIdx *list, size_t *start, double x, double y, double z) -{ - // Bounding box overlaps the query => process node. - if(doesPointLieInsideBVH(b,x,y,z)) - { - // Leaf node => report collision. - if (b->nleafObjects!=0) - { - for(int k=0;knleafObjects;k++) - { - list[*start].idx=b->leafObjects[k].idx; - *start+=1; - } - } - // Internal node => recurse to children. - else - { - if(b->prev) - traverseBVHRecursive(b->prev,list,start,x,y,z); - if(b->next) - traverseBVHRecursive(b->next,list,start,x,y,z); - } - } -} - -int computeResult() -{ - /* ***************************** */ - /* COMPUTE - /* ***************************** */ - // [5][5] X [5][5] X [5][1] X [1][5] Matrix multiplication - // Result is used to saturate LUMA - int zz; - int count = 20; - int c[5][5], sum = 0, result2[5][1], compute_result =0; - - for (zz = 1; zz < count; zz++) { - int i,j,k,m=5, n=5, x=5, q=5; - for(i = 0; i < m; i++) - { - for(j = 0; j < x; j++) - { - sum=0; - for(k = 0; k < n; k++) - { - sum = sum + (a_mat[i][k] * b_mat[k][j]); - } - c[i][j] = sum / zz; - } - } - sum = 0; - m=5; n=5; x=1; q=5; - for(i = 0; i < m; i++) - { - for(j = 0; j < x; j++) - { - sum=0; - for(k = 0; k < n; k++) - { - sum = sum + (c[i][k] * c_mat[k][j]); - } - result2[i][j] = sum; - } - } - m=1; n=5; x=1; q=5; - sum = 0; - for(i = 0; i < m; i++) - { - for(j = 0; j < x; j++) - { - sum=0; - for(k = 0; k < n; k++) - { - sum = sum + (d_mat[i][k] * result2[k][j]); - } - compute_result += sum; - } - } - } - return compute_result; -} -void traverseBVHIterative(BVH *b, OutIdx *list, size_t *start, double x, double y, double z) -{ - // Allocate traversal stack from thread-local memory, - // and push NULL to indicate that there are no postponed nodes. - //BVH *stack[64]={NULL}; - //BVH **stackPtr =stack; - //*stackPtr++=NULL; - - // Traverse nodes starting from the root. - BVH* node = b; - do - { - // Check each child node for overlap. - BVH* childL = node->prev; - BVH* childR = node->next; - bool overlapL = doesPointLieInsideBVH(node->prev,x,y,z); - bool overlapR = doesPointLieInsideBVH(node->next,x,y,z); - - // Query overlaps a leaf node => report collision. - if (overlapL && childL->nleafObjects!=0) -#ifdef RAYTRACE - for(int k=0;knleafObjects;k++) - { - list[*start].idx=childL->leafObjects[k].idx; - *start+=1; - } -#else - //processLeaf(list, childL->leafObjects, childL->nleafObjects, x, y, z); - list[*start].idx=childL->leafObjects[0].idx;//+computeResult(); -#endif - if (overlapR && childR->nleafObjects!=0) -#ifdef RAYTRACE - for(int k=0;knleafObjects;k++) - { - list[*start].idx=childR->leafObjects[k].idx; - *start+=1; - } -#else - //processLeaf(list, childR->leafObjects, childR->nleafObjects, x, y, z); - list[*start].idx=childR->leafObjects[0].idx;//+computeResult(); -#endif - // Query overlaps an internal node => traverse. - bool traverseL = (overlapL && !node->prev->nleafObjects); - bool traverseR = (overlapR && !node->next->nleafObjects); -#if 0 - if (!traverseL && !traverseR) - { - node = *--stackPtr; // pop - } - else - { - node = (traverseL) ? childL : childR; - if (traverseL && traverseR) - *stackPtr++ = childR; // push - } -#endif - node=NULL; - if (traverseL || traverseR) - node = (traverseL) ? childL : childR; - } - while(node); -} - -void destroyBVH(BVH *b) -{ - if((b->prev == NULL) && (b->next == NULL)) - { - //leaf Node - b->leafObjects = NULL; - } - else - { - //not a leaf node - if(b->prev != NULL) - destroyBVH(b->prev); - - if(b->next != NULL) - destroyBVH(b->next); - b->prev = NULL; - b->next = NULL; - } - return; -} diff --git a/examples/okra/BVH/BVH.h b/examples/okra/BVH/BVH.h deleted file mode 100644 index 562b4a4..0000000 --- a/examples/okra/BVH/BVH.h +++ /dev/null @@ -1,90 +0,0 @@ -#ifndef BVH_H -#define BVH_H - -#include "Sphere.h" -#include "mem_mgr.h" -#include "stdint.h" -#ifndef SVM_DATA_STRUCT_OPENCL_DEVICE -#define __global -#endif - -typedef enum _Axis {X,Y,Z} Axis; - -typedef struct _float_3 -{ - double x; - double y; - double z; -} float_3; - -typedef struct _OutIdx -{ - intptr_t idx; -} OutIdx; - -typedef struct _BVH -{ - double minX; - double maxX; - double minY; - double maxY; - double minZ; - double maxZ; - __global struct _BVH *prev; - __global struct _BVH *next; - __global struct _BVH *parent; - __global Sphere *leafObjects; - size_t nleafObjects; -} BVH; - -void buildBVH(BVH *b, Sphere *spherelist, BVH *lparent, size_t count, Axis axisid, mem_mgr m, mem_mgr objdata); -//void buildBVH(BVH *b, Sphere *spherelist, BVH *lparent, size_t start, size_t end, Axis axisid, mem_mgr m, mem_mgr objdata); -void destroyBVH(BVH *b); -void traverseBVHRecursive(BVH *b, OutIdx *list, size_t *start, double x,double y,double z); -void traverseBVHIterative(BVH *b, OutIdx *list, size_t *start, double x,double y,double z); - -#ifdef SVM_DATA_STRUCT_OPENCL_DEVICE -// GPU Compute matrix -__constant int a_mat[5][5] = { 1, 2, 3, 4, 5, - 6, 7, 8, 9, 10, - 11, 12, 13, 14, 15, - 17, 18, 19, 20, 21, - 22, 23, 24, 25, 26 - }; -__constant int b_mat[5][5] = { 1, 2, 3, 4, 5, - 6, 7, 8, 9, 10, - 11, 12, 13, 14, 15, - 11, 12, 13, 14, 15, - 11, 12, 13, 14, 15 - }; -__constant int c_mat[5][1] = {27, - 28, - 29, - 30 - }; -__constant int d_mat[1][5] = {27,28,29,30}; -#else -// GPU Compute matrix -const int a_mat[5][5] = { 1, 2, 3, 4, 5, - 6, 7, 8, 9, 10, - 11, 12, 13, 14, 15, - 17, 18, 19, 20, 21, - 22, 23, 24, 25, 26 - }; -const int b_mat[5][5] = { 1, 2, 3, 4, 5, - 6, 7, 8, 9, 10, - 11, 12, 13, 14, 15, - 11, 12, 13, 14, 15, - 11, 12, 13, 14, 15 - }; -const int c_mat[5][1] = {27, - 28, - 29, - 30 - }; -const int d_mat[1][5] = {27,28,29,30}; -#endif - - - -#endif diff --git a/examples/okra/BVH/HsaApp.cpp b/examples/okra/BVH/HsaApp.cpp deleted file mode 100644 index 9503a61..0000000 --- a/examples/okra/BVH/HsaApp.cpp +++ /dev/null @@ -1,273 +0,0 @@ -/******************************************************************************* -Copyright ©2013 Advanced Micro Devices, Inc. All rights reserved. - -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - -1 Redistributions of source code must retain the above copyright notice, -this list of conditions and the following disclaimer. -2 Redistributions in binary form must reproduce the above copyright notice, -this list of conditions and the following disclaimer in the -documentation and/or other materials provided with the distribution. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF -THE POSSIBILITY OF SUCH DAMAGE. -*******************************************************************************/ - -/** -******************************************************************************** -* @file -* -* @brief This file contains functions for initializing HSA CU. -* It creates a binary search tree in shared virtual memory and also -* enqueues work to the CU for creating node and inserting in the same Binary Search Tree -* -* This shows SVM and atomics functionality of HSA. -******************************************************************************** -*/ - -#include "okra.h" -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#include "BVH.h" -#include "HsaApp.h" - -#define USE_OPENMP -#ifdef _DEBUG -#undef USE_OPENMP -#endif - -using namespace std; - -bool bvhsortx_float (float_3 s,float_3 t) { return (s.x 1) { - int temp = atoi(argv[1]); - runHSA = ((temp & 1) != 0); -#if 0//def USE_OPENMP - if ((!runHSA) && (temp == 0)) - omp_set_num_threads(1); -#endif - } - srand(100); - gNSpheres = 1024*1024*atoi(argv[2]); - size_t _BVH_NODES_SIZE = (size_t) 128*gNSpheres; - size_t _OBJ_SIZE =(size_t) 256*gNSpheres; - /* Generate random sphere data */ - Sphere *tempS = (Sphere*)malloc(sizeof(Sphere) * gNSpheres); - int x = -3500, y= -3500; - for (int i=0; imem_init(nodeData, _BVH_NODES_SIZE); - objectData->mem_init(objectData, _OBJ_SIZE); - - BVH *b; - if ((b = (BVH*)nodeData->mem_malloc(nodeData, sizeof(BVH))) == NULL) { - std::cerr << "Error allocating memory for BVH object." << std::endl; - exit(1); - } - - OutIdx *found_idxs_HSA = NULL; - if ((found_idxs_HSA = (OutIdx *) malloc(num_search_points * sizeof(OutIdx))) == NULL) { - std::cerr << "Error allocating memory for found indices." << std::endl; - exit(1); - } - - buildBVH(b, tempS, NULL, gNSpheres, X, nodeData, objectData); - - /* Run BVH traversal kernel */ - //setup kernel arguments - okra_clear_args(kernel); - -#ifdef DUMMY_ARGS - //This flags should be set if HSA_HLC_Stable is used - // This is because the high level compiler generates 6 extra args - okra_push_pointer(kernel, NULL); - okra_push_pointer(kernel, NULL); - okra_push_pointer(kernel, NULL); - okra_push_pointer(kernel, NULL); - okra_push_pointer(kernel, NULL); - okra_push_pointer(kernel, NULL); -#endif - okra_push_pointer(kernel, b); - okra_push_pointer(kernel, search_points); - okra_push_pointer(kernel, found_idxs_HSA); - - std::cout << "search_per_wi = " << search_per_wi << std::endl; - - bool warmmedup = false; - -// for (int j = 0; j < sizeof(size) / sizeof(int); j++) - for (int j = 0; j < 1; j++) - { - size_t globalThreads = (size_t)(num_search_points / search_per_wi); - size_t localThreads = size[j]; - - memset(found_idxs_HSA, 0, num_search_points * sizeof(OutIdx)); - for (int i = 0; i < 1; i++) - { - okra_range_t range; - range.dimension=1; - range.global_size[0] = globalThreads; - range.global_size[1] = range.global_size[2] = 1; - range.group_size[0] = localThreads; - range.group_size[1] = range.group_size[2] = 1; - - //execute kernel and wait for completion - okra_status_t status = okra_execute_kernel(context, kernel, &range); - if(status != OKRA_SUCCESS) {std::cout << "Error while executing kernel:" << (int)status << std::endl; exit(-1);} - - } - - int numFound = 0; - for(int i=0;i test.sh - echo "./$(TEST_NAME)" >> test.sh - bash test.sh - rm test.sh - - diff --git a/examples/okra/BVH/Readme.txt b/examples/okra/BVH/Readme.txt deleted file mode 100644 index 39f0f0c..0000000 --- a/examples/okra/BVH/Readme.txt +++ /dev/null @@ -1,13 +0,0 @@ -1) To execute the exe please use following: - -bvh.exe <0 or 1> - -0 builds BVH for CPU using malloc and executes 1M search points traversal on CPU -1 builds BVH using clSVMalloc and executes 1 M search points traversal on HSA - -Total numbeer of spheres used = * 1024 * 1024 - -2) Search points range is defined in initialize_search_points function in HsaApp.cpp - -3) Run as ./BVH 1 1 -(where first 1 says run on HSA and second 1 gives size of nodes in Millions) diff --git a/examples/okra/BVH/Sphere.cpp b/examples/okra/BVH/Sphere.cpp deleted file mode 100644 index 5f493a1..0000000 --- a/examples/okra/BVH/Sphere.cpp +++ /dev/null @@ -1,26 +0,0 @@ -#include -#include "Sphere.h" - -double hit(Sphere *s, double ox, double oy, double *n) { - double dx = ox - s->x; - double dy = oy - s->y; - double radius2 = s->radius2; - if (dx*dx + dy*dy < radius2) { - double dz = sqrtf((float)(radius2 - dx*dx - dy*dy)); - *n = dz / s->radius; - return dz + s->z; - } - return (double)-INF; -} - -bool containsPoint(Sphere *s, double ox, double oy, double oz) { - double dx = s->x - ox; - double dy = s->y - oy; - //double dz = s-> - oz; - double dz = 0; - double radius2 = s->radius2; - if((dx*dx + dy*dy + dz*dz) <= radius2) - return true; - else - return false; -} diff --git a/examples/okra/BVH/Sphere.h b/examples/okra/BVH/Sphere.h deleted file mode 100644 index bdabca0..0000000 --- a/examples/okra/BVH/Sphere.h +++ /dev/null @@ -1,33 +0,0 @@ -#ifndef SPHERE_H -#define SPHERE_H - -#define INF 2e10f - -#define random(min, max) (((double)rand() / ((size_t)RAND_MAX + 1)) * (max - min + 1) + min) -#define rnd(max) random(0, max) - -typedef struct _Color { - double r, g, b; -} Color; - -typedef struct _Material { - double ns; // shininess - double transp; // transparency - double reflt; // reflection - Color ambient; - Color diffuse; - Color specular; -} Material; - -typedef struct _Sphere { - double x,y,z; - double radius,radius2; - double r,b,g; - long idx; - Material m; -} Sphere; - -double hit(Sphere *s, double ox, double oy, double *n); -bool containsPoint(Sphere *s, double ox, double oy, double oz); - -#endif diff --git a/examples/okra/BVH/mem_mgr.cpp b/examples/okra/BVH/mem_mgr.cpp deleted file mode 100644 index c1e20b2..0000000 --- a/examples/okra/BVH/mem_mgr.cpp +++ /dev/null @@ -1,60 +0,0 @@ -#include -#include "mem_mgr.h" - -struct mem_mgr_class_ext { - void (*mem_init)(mem_mgr m, size_t size); - void *(*mem_malloc)(mem_mgr m, size_t req_size); - void (*mem_destroy)(mem_mgr m); - size_t (*mem_usage)(mem_mgr m); - void *memory; - void *end; - size_t offset; - void *(*cb_malloc)(size_t size); - void (*cb_free)(void* ptr); -}; - -static void mem_init(mem_mgr m, size_t size) { - mem_mgr_class_ext *mgr = (mem_mgr_class_ext*)m; - if ((mgr->memory = mgr->cb_malloc(size)) == NULL) { - printf("abc Error allocating memory for BVH object.\n"); - exit(1); - } - mgr->end = (void*)((unsigned char*)mgr->memory+size); - mgr->offset = 0; -} - -static void* mem_malloc(mem_mgr m, size_t req_size) { - mem_mgr_class_ext *mgr = (mem_mgr_class_ext*)m; - void *ptr; - ptr = (void*)(((unsigned char*)mgr->memory)+mgr->offset); - - if((void*)((unsigned char*)ptr+req_size) < mgr->end) { - mgr->offset += req_size; - return ptr; - } - return NULL; -} - -static size_t mem_usage(mem_mgr m) { - mem_mgr_class_ext *mgr = (mem_mgr_class_ext*)m; - return mgr->offset; -} - -static void destroy_mem_mgr(mem_mgr m) { - mem_mgr_class_ext *mgr = (mem_mgr_class_ext*)m; - if(mgr->memory) { - mgr->cb_free(mgr->memory); - } - free(m); -} - -void create_mem_mgr(mem_mgr *m, void *(*cb_malloc)(size_t), void (*cb_free)(void*)) { - mem_mgr_class_ext *lmem = (mem_mgr_class_ext*) malloc(sizeof(mem_mgr_class_ext)); - lmem->mem_init = mem_init; - lmem->mem_malloc = mem_malloc; - lmem->mem_usage = mem_usage; - lmem->mem_destroy = destroy_mem_mgr; - lmem->cb_malloc = cb_malloc; - lmem->cb_free = cb_free; - *m = (mem_mgr)lmem; -} diff --git a/examples/okra/BVH/mem_mgr.h b/examples/okra/BVH/mem_mgr.h deleted file mode 100644 index 9a6ba2e..0000000 --- a/examples/okra/BVH/mem_mgr.h +++ /dev/null @@ -1,14 +0,0 @@ -#ifndef MEM_MGR_CLASS_H -#define MEM_MGR_CLASS_H - -typedef struct mem_mgr_class_base *mem_mgr; -struct mem_mgr_class_base { - void (*mem_init)(mem_mgr m, size_t size); - void *(*mem_malloc)(mem_mgr m, size_t req_size); - void (*mem_destroy)(mem_mgr m); - size_t (*mem_usage)(mem_mgr m); -}; - -void create_mem_mgr(mem_mgr *m, void *(*cb_malloc)(size_t), void (*cb_free)(void*)); - -#endif diff --git a/examples/okra/BVH/svm_data_struct.h b/examples/okra/BVH/svm_data_struct.h deleted file mode 100644 index 883575a..0000000 --- a/examples/okra/BVH/svm_data_struct.h +++ /dev/null @@ -1,62 +0,0 @@ - -#ifndef _SVM_DATA_STRUCT_H -#define _SVM_DATA_STRUCT_H - -#define SVM_MUTEX_LOCK 1 -#define SVM_MUTEX_UNLOCK 0 - -#ifndef SVM_DATA_STRUCT_OPENCL_DEVICE - -// C++11 implementation of the mutex. -// It is compatible with the OpenCL implementation -#include - -typedef struct { - std::atomic count; -} svm_mutex; - -void svm_mutex_init(svm_mutex* lock, int value) { - lock->count.store(value, std::memory_order_release); -} - -void svm_mutex_lock(svm_mutex* lock) { - int expected = SVM_MUTEX_UNLOCK; - while(!lock->count.compare_exchange_strong(expected, SVM_MUTEX_LOCK, std::memory_order_acquire)) { - expected = SVM_MUTEX_UNLOCK; - } -} - -void svm_mutex_unlock(svm_mutex* lock) { - lock->count.store(SVM_MUTEX_UNLOCK, std::memory_order_release); -} - -#else /* SVM_DATA_STRUCT_OPENCL_DEVICE */ - -// OpenCL implementation of the mutex. -// It is compatible with the OpenCL implementation - -typedef struct { - // atomic_int count; - volatile int count; -} svm_mutex; - -void svm_mutex_init(__global svm_mutex* lock, int value) { - // atomic_store_explicit(&lock->count, value, memory_order_release, memory_scope_all_svm_devices); - atomic_store_explicit((atomic_int *)&lock->count, value, memory_order_release); -} - -void svm_mutex_lock(__global svm_mutex* lock) { - int expected = SVM_MUTEX_UNLOCK; - while(!atomic_compare_exchange_strong_explicit((atomic_int *)&lock->count, &expected, SVM_MUTEX_LOCK - , memory_order_acquire, memory_order_release, memory_scope_all_svm_devices)) { - expected = SVM_MUTEX_UNLOCK; - } -} - -void svm_mutex_unlock(__global svm_mutex* lock) { - atomic_store_explicit((atomic_int *)&lock->count, SVM_MUTEX_UNLOCK, memory_order_release, memory_scope_all_svm_devices); -} - -#endif /* SVM_DATA_STRUCT_OPENCL_DEVICE */ - -#endif // _SVM_DATA_STRUCT_H diff --git a/examples/okra/CalcPI/CalcPI.cl b/examples/okra/CalcPI/CalcPI.cl deleted file mode 100644 index 3721541..0000000 --- a/examples/okra/CalcPI/CalcPI.cl +++ /dev/null @@ -1,7 +0,0 @@ -__kernel void calcPI(global float *x, global float *y, global int *out) { - int i = get_global_id(0); - - float c = x[i]*x[i] + y[i]*y[i]; - out[i] = 0; - if (c <= 1) out[i] = 1; -} diff --git a/examples/okra/CalcPI/CalcPI.cpp b/examples/okra/CalcPI/CalcPI.cpp deleted file mode 100644 index 28d5503..0000000 --- a/examples/okra/CalcPI/CalcPI.cpp +++ /dev/null @@ -1,107 +0,0 @@ -/* Copyright 2014 HSA Foundation Inc. All Rights Reserved. - * - * HSAF is granting you permission to use this software and documentation (if - * any) (collectively, the "Materials") pursuant to the terms and conditions - * of the Software License Agreement included with the Materials. If you do - * not have a copy of the Software License Agreement, contact the HSA Foundation for a copy. - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions and the following disclaimer in the - * documentation and/or other materials provided with the distribution - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR - * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS - * FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE - * CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER - * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING - * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH THE SOFTWARE. - */ - -#include "okra.h" -#include -#include -#include "../utils.h" -#include -#include - -using namespace std; - -static const int NUMELEMENTS = 100000; -float *rX = new float[NUMELEMENTS]; -float *rY = new float[NUMELEMENTS]; -int *out = new int[NUMELEMENTS]; - -int main(int argc, char *argv[]) { - - string sourceFileName = "CalcPI.hsail"; - char* calcPISource = buildStringFromSourceFile(sourceFileName); - double piValue; - - for (int i=0;i test.sh - echo "./$(TEST_NAME)" >> test.sh - bash test.sh - rm test.sh - - diff --git a/examples/okra/CalcPI/obj/.gitignore b/examples/okra/CalcPI/obj/.gitignore deleted file mode 100644 index 5e7d273..0000000 --- a/examples/okra/CalcPI/obj/.gitignore +++ /dev/null @@ -1,4 +0,0 @@ -# Ignore everything in this directory -* -# Except this file -!.gitignore diff --git a/examples/okra/Makefile b/examples/okra/Makefile deleted file mode 100644 index fa647e6..0000000 --- a/examples/okra/Makefile +++ /dev/null @@ -1,27 +0,0 @@ -DIRS = CalcPI MatMul -# the sets of directories to do various things in -BUILDDIRS = $(DIRS:%=build-%) -INSTALLDIRS = $(DIRS:%=install-%) -CLEANDIRS = $(DIRS:%=clean-%) -TESTDIRS = $(DIRS:%=test-%) - -all: $(BUILDDIRS) -$(DIRS): $(BUILDDIRS) -$(BUILDDIRS): - $(MAKE) -C $(@:build-%=%) - -test: $(TESTDIRS) -$(TESTDIRS): - $(MAKE) -C $(@:test-%=%) test - -clean: $(CLEANDIRS) -$(CLEANDIRS): - $(MAKE) -C $(@:clean-%=%) clean - - -.PHONY: subdirs $(DIRS) -.PHONY: subdirs $(BUILDDIRS) -.PHONY: subdirs $(TESTDIRS) -.PHONY: subdirs $(CLEANDIRS) -.PHONY: all clean test - diff --git a/examples/okra/MatMul/Makefile b/examples/okra/MatMul/Makefile deleted file mode 100644 index dcddf99..0000000 --- a/examples/okra/MatMul/Makefile +++ /dev/null @@ -1,34 +0,0 @@ -#ifndef HSA_RUNTIME_PATH - HSA_RUNTIME_PATH=/opt/hsa -#endif -#ifndef HSA_OKRA_PATH - HSA_OKRA_PATH=/opt/amd/okra -#endif -TEST_NAME=MatMul -LFLAGS= -g -Wl,--unresolved-symbols=ignore-in-shared-libs -INCS += -I $(HSA_RUNTIME_PATH)/include -I $(HSA_OKRA_PATH)/dist/include -CPP_FILES := $(wildcard *.cpp) -OBJ_FILES := $(addprefix obj/, $(notdir $(CPP_FILES:.cpp=.o))) - -all: $(TEST_NAME) $(TEST_NAME).hsail - -$(TEST_NAME): $(OBJ_FILES) - $(CXX) $(LFLAGS) $(OBJ_FILES) -lelf -L$(HSA_RUNTIME_PATH)/lib -lokra_x86_64 -o $(TEST_NAME) - -$(TEST_NAME).hsail : - cloc.sh -hsail $(TEST_NAME).cl - -obj/%.o: %.cpp - $(CC) -c $(CFLAGS) $(INCS) -o $@ $< - -clean: - rm -rf obj/*o *.hsail $(TEST_NAME) - - -test: - echo "export LD_LIBRARY_PATH=$(HSA_RUNTIME_PATH)/lib" > test.sh - echo "./$(TEST_NAME)" >> test.sh - bash test.sh - rm test.sh - - diff --git a/examples/okra/MatMul/MatMul.cl b/examples/okra/MatMul/MatMul.cl deleted file mode 100644 index 056bda2..0000000 --- a/examples/okra/MatMul/MatMul.cl +++ /dev/null @@ -1,14 +0,0 @@ -kernel void matmul(global int *A, global int *B, global int *C, global int *N1, global int *P1) { - int i = get_global_id(0); - int j = get_global_id(1); - int k; - int P = *P1; - int N = *N1; - -// C[i][j] for given i and j - - C[i*P+j] = 0; - for (k=0;k -#include -#include "../utils.h" -#include -#include - -using namespace std; - -static const int M = 4; -static const int N = 5; -static const int P = 6; -int *A, *B, *C, *D; -// C = A * B - -void print_mat(int *A, int m, int n) { -int i,j; -for (i=0;i test.sh - echo "./$(TEST_NAME)" >> test.sh - bash test.sh - rm test.sh - - diff --git a/examples/okra/SVMAtomicsBinaryTreeInsert/README b/examples/okra/SVMAtomicsBinaryTreeInsert/README deleted file mode 100644 index fa90d09..0000000 --- a/examples/okra/SVMAtomicsBinaryTreeInsert/README +++ /dev/null @@ -1,36 +0,0 @@ - -In this application, the kernel inserts given set of keys in the binary search -tree created by the host. Simultaneously the host also inserts different set of keys. -To run the application so the following: - -1. Install HSA stack (runtime, compilers, CLOC, drivers) as given in the HSA Foundation web site GITHUB -2. Set environment variables - -export HSA_RUNTIME_PATH=/home/cas/Prakash/GitObsedian/HSA-Runtime-AMD -export HSA_KMT_PATH=/home/cas/Prakash/GitObsedian/HSA-Drivers-Linux-AMD/kfd-0.8/libhsakmt/ -export HSA_OKRA_PATH=/home/cas/Prakash/GitObsedian/Okra-Interface-to-HSA-Device/okra/ -export HSA_LLVM_PATH=/home/cas/Prakash/GitObsedian/HSAIL-HLC-Stable/bin -export LD_LIBRARY_PATH=$HSA_RUNTIME_PATH/lib/x86_64:$HSA_KMT_PATH/lnx64a:$HSA_OKRA_PATH/dist/bin -export OKRA_DISABLE_FIX_HSAIL=1 - -3. Set these macros to the value you want. Number of nodes refer to the nodes -to be inserted in the tree and host percent is the percentage to be inserted by -the host (rest would be inserted by the kernel/GPU). These are in -SVMAtomicsBinaryTreeInsert.hpp file. - -#define NUMBER_OF_NODES 10 * 1024 * 1024 -#define WORKGROUP_SIZE 256 -#define HOST_PERCENT 10 - -4. Run by giving ./SVMAtomicsBinaryTreeInsert - -Notes: - -a) This needs Stable compiler since we are using platform atomics, which are -not supported yet on Development compiler -b) Therefore you need to define DUMMY_ARGS=1 in the Makefile -c) There is a bug in the compiler and so we need to disable optimization -during "opt" to O0. Otherwise, we need to put atomic_work_item_fence call in the CL -file. We have modified the cloc for this purpose. We send -O0 to CLOC. - -It will report "Passed" if everything is passed. diff --git a/examples/okra/SVMAtomicsBinaryTreeInsert/SVMAtomicsBinaryTreeInsert b/examples/okra/SVMAtomicsBinaryTreeInsert/SVMAtomicsBinaryTreeInsert deleted file mode 100755 index 0eb5379..0000000 Binary files a/examples/okra/SVMAtomicsBinaryTreeInsert/SVMAtomicsBinaryTreeInsert and /dev/null differ diff --git a/examples/okra/SVMAtomicsBinaryTreeInsert/SVMAtomicsBinaryTreeInsert.cl b/examples/okra/SVMAtomicsBinaryTreeInsert/SVMAtomicsBinaryTreeInsert.cl deleted file mode 100644 index 8c09735..0000000 --- a/examples/okra/SVMAtomicsBinaryTreeInsert/SVMAtomicsBinaryTreeInsert.cl +++ /dev/null @@ -1,107 +0,0 @@ -/********************************************************************** -Copyright ©2014 Advanced Micro Devices, Inc. All rights reserved. - -Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: - -• Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. -• Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or - other materials provided with the distribution. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED - WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY - DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS - OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING - NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. -********************************************************************/ -#define SVM_DATA_STRUCT_OPENCL_DEVICE - -#include "SVMBinaryNode.h" - -/* - * This kernel inserts a node on an BST. - * Arguments: - * - */ - -__kernel void binTreeInsert( - __global void *rootNode, - __global void *devStartNode, - __global int *g_nodes - ) -{ - __global volatile svm_mutex *tmp_mutex; - __global node *tmp_node, *tmp_parent, *new_node; - - __global node *root = (__global node *)rootNode; - __global node *data = (__global node *)devStartNode; - int flag; - int key; - size_t gpu_nodes = (__global size_t)*g_nodes; - - size_t gidx = get_global_id(0); - - //return if beyond limits - if (gidx >= gpu_nodes) - { - return; - } - - /* Search the parent node. - * Multiple work-items in the a work-group run this part. */ - flag = 0; - tmp_node = root; - - tmp_parent = root; - new_node = &(data[gidx]); - key = (new_node->value); - - while (tmp_node) - { - tmp_parent = tmp_node; - flag = (key - (tmp_node->value)); - tmp_node = (flag < 0) ? tmp_node->left : tmp_node->right; - } - - __global node *child = tmp_node; - int done = 0; - tmp_mutex = &tmp_parent->mutex_node; - int exFlag, expected; - - do - { - tmp_mutex = &tmp_parent->mutex_node; - expected = SVM_MUTEX_UNLOCK; - - exFlag = atomic_compare_exchange_strong_explicit((atomic_int *)&tmp_mutex->count, &expected, SVM_MUTEX_LOCK, memory_order_seq_cst,memory_order_seq_cst, memory_scope_all_svm_devices); - - - if(exFlag) - { - child = (flag < 0) ? tmp_parent->left : tmp_parent->right; - if(child) - { - tmp_parent = child; - flag = ((new_node->value) - (child->value)); - child = (flag < 0) ? tmp_parent->left : tmp_parent->right; - } - else - { - tmp_parent->left = (flag < 0) ? new_node : tmp_parent->left ; - - tmp_parent->right = (flag >= 0) ? new_node : tmp_parent->right ; - done = 1; - } - - expected = SVM_MUTEX_LOCK; - - atomic_compare_exchange_strong_explicit((atomic_int *)&tmp_mutex->count, &expected, SVM_MUTEX_UNLOCK, memory_order_seq_cst,memory_order_seq_cst, memory_scope_all_svm_devices); - - - } - -// atomic_work_item_fence(CLK_GLOBAL_MEM_FENCE, memory_order_seq_cst, memory_scope_all_svm_devices); - - }while (!done); - -} - diff --git a/examples/okra/SVMAtomicsBinaryTreeInsert/SVMAtomicsBinaryTreeInsert.cpp b/examples/okra/SVMAtomicsBinaryTreeInsert/SVMAtomicsBinaryTreeInsert.cpp deleted file mode 100644 index cfb7b8a..0000000 --- a/examples/okra/SVMAtomicsBinaryTreeInsert/SVMAtomicsBinaryTreeInsert.cpp +++ /dev/null @@ -1,291 +0,0 @@ -/********************************************************************** -Copyright ©2014 Advanced Micro Devices, Inc. All rights reserved. - -Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: - -• Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. -• Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or - other materials provided with the distribution. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED - WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY - DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS - OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING - NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. -********************************************************************/ - - -using namespace std; -#include "SVMAtomicsBinaryTreeInsert.hpp" -#include "SVMAtomicsBinaryTreeInsert_Host.hpp" - -char *SVMAtomicsBinaryTreeInsert::buildStringFromSourceFile(string fname) { - cout << "using source from " << fname << endl; - ifstream infile; - infile.open(fname.c_str()); - if (!infile) {cout << "could not open " << fname << endl; exit(1);} - infile.seekg(0, ios::end); - int len = infile.tellg(); - char *str = new char[len+1]; - infile.seekg(0, ios::beg); - infile.read(str, len); - int lenRead = infile.gcount(); - // if (!infile) {cout << "could not read " << len << " bytes from " << fname << " but read " << lenRead << endl;} - str[lenRead] = (char)0; // terminate - // cout << "Source String -----------\n" << str << "----------- end of Source String\n"; - return str; -}; - -int SVMAtomicsBinaryTreeInsert::setupSVMBinaryTree() -{ - //Ensure that there is atleast 1 node to start with - if (init_tree_insert < 1) - init_tree_insert = 1; - - if (num_insert > 125000000) - num_insert = 125000000; - - //Num of nodes to insert on host and device - host_nodes = (size_t)((double)num_insert * ((float)hostCompPercent / 100)); - device_nodes = num_insert - host_nodes; - - total_nodes = num_insert + init_tree_insert; - - return SDK_SUCCESS; -} - -int SVMAtomicsBinaryTreeInsert::setupCL(void) -{ - - string sourceFileName = "SVMAtomicsBinaryTreeInsert.hsail"; - char* svmTreeInsertSource = buildStringFromSourceFile(sourceFileName); - - okra_status_t status; - - //create okra context - context = NULL; - - status = okra_get_context(&context); - - if (status != OKRA_SUCCESS) {cout << "Error while creating context:" << (int)status << endl; exit(-1);} - - //create kernel from hsail - kernel = NULL; - - status = okra_create_kernel(context, svmTreeInsertSource, "&__OpenCL_binTreeInsert_kernel", &kernel); - - if (status != OKRA_SUCCESS) {cout << "Error while creating kernel:" << (int)status << endl; exit(-1);} - - // initialize any device/SVM memory here. - svmTreeBuf = (node *) malloc( total_nodes*sizeof(node) ); - - if(NULL == svmTreeBuf) { - cout << " Malloc (svmTreeBuf) failed\n"; - exit (-1); - } - - return SDK_SUCCESS; -} - -int SVMAtomicsBinaryTreeInsert::runCLKernels(void) -{ - - if (host_nodes > 0) - { -#pragma omp parallel for - for (long k = 0; k < host_nodes; k++) - { - insertNode(&(currNode[(size_t)k]), &svmRoot); - } - } - - if (device_nodes > 0) - { - - size_t localThreads = WORKGROUP_SIZE; // 256 - size_t globalThreads = device_nodes; - size_t deviceStartNode = init_tree_insert + host_nodes; - - //setup kernel arguments - okra_clear_args(kernel); - -#ifdef DUMMY_ARGS - //This flags should be set if HSA_HLC_Stable is used - // This is because the high level compiler generates 6 extra args - okra_push_pointer(kernel, NULL); - okra_push_pointer(kernel, NULL); - okra_push_pointer(kernel, NULL); - okra_push_pointer(kernel, NULL); - okra_push_pointer(kernel, NULL); - okra_push_pointer(kernel, NULL); -#endif - okra_push_pointer(kernel, (void *)svmTreeBuf); - okra_push_pointer(kernel, (void *)(svmTreeBuf+deviceStartNode)); - okra_push_pointer(kernel, (void *)&device_nodes); - cout << "Setting kernel args done device_nodes = " << globalThreads << " deviceStartNode = " << deviceStartNode << "\n"; - - //setup execution range - okra_range_t range; - range.dimension=1; - range.global_size[0] = globalThreads; - range.global_size[1] = range.global_size[2] = 1; - range.group_size[0] = localThreads; - range.group_size[1] = range.group_size[2] = 1; - - //execute kernel and wait for completion - okra_status_t status = okra_execute_kernel(context, kernel, &range); - if(status != OKRA_SUCCESS) {cout << "Error while executing kernel:" << (int)status << endl; exit(-1);} - } - - return SDK_SUCCESS; -} - -int SVMAtomicsBinaryTreeInsert::setup() -{ - if(setupSVMBinaryTree() != SDK_SUCCESS) - { - return SDK_FAILURE; - } - - if (setupCL() != SDK_SUCCESS) - { - return SDK_FAILURE; - } - - return SDK_SUCCESS; -} - -int SVMAtomicsBinaryTreeInsert::run() -{ - int status = 0; - - //create the initial binary tree with init_tree_insert nodes - status = cpuCreateBinaryTree(); - CHECK_ERROR(status, SDK_SUCCESS, "cpuCreateBinaryTree() failed."); - - //Advance the current node after initial insert - currNode = svmRoot + init_tree_insert; - - cout << "--------------------------------------------------"; - cout << "-----------------------" << endl; - cout << "Inserting " << num_insert << " nodes in a Binary Tree having "; - cout << init_tree_insert << " Nodes..." << endl; - - cout << "--------------------------------------------------"; - cout << "-----------------------" << endl; - - // Arguments are set and execution call is enqueued on command buffer - if(runCLKernels() != SDK_SUCCESS) - { - return SDK_FAILURE; - } - - cout << "Nodes inserted on host = " << host_nodes << endl; - cout << "Nodes inserted on device = " << device_nodes << endl; - - if (printTreeOrder) - recursiveInOrder(svmRoot); - - return SDK_SUCCESS; -} - -size_t SVMAtomicsBinaryTreeInsert::count_nodes(node* root) -{ - size_t count = 0; - if (root) - count = 1; - - if (root->left) - count += count_nodes(root->left); - - if (root->right) - count += count_nodes(root->right); - - return count; -} - -int SVMAtomicsBinaryTreeInsert::verifyResults() -{ - int status = SDK_SUCCESS; - size_t actualNodes = count_nodes(svmTreeBuf); - cout << "Actual Nodes (including the initial nodes) = " << actualNodes << " total_nodes = " << total_nodes << endl; - - if (actualNodes == total_nodes) - { - cout << "Passed!\n" << endl; - } - else - { - cout << "Failed\n" << endl; - } - return status; -} - -int SVMAtomicsBinaryTreeInsert::cleanup() -{ - free(svmTreeBuf); - - okra_dispose_kernel(kernel); - okra_dispose_context(context); - - return SDK_SUCCESS; -} - -int SVMAtomicsBinaryTreeInsert::cpuCreateBinaryTree() -{ - node* root; - - //Initialize the node elements - initialize_nodes(svmTreeBuf, total_nodes, localSeed); - - //Make tree with given initial nodes - init_tree_insert - root = cpuMakeBinaryTree(init_tree_insert, svmTreeBuf); - - /* set the root */ - svmRoot = root; - - return SDK_SUCCESS; -} - -int SVMAtomicsBinaryTreeInsert::recursiveInOrder(node* leaf) -{ - if(NULL != leaf) - { - recursiveInOrder(leaf->left); - cout << leaf->value << ", "; - recursiveInOrder(leaf->right); - } - - return SDK_SUCCESS; -} - -int main(int argc, char * argv[]) -{ - SVMAtomicsBinaryTreeInsert clSVMBinaryTree; - - // Setup - if(clSVMBinaryTree.setup() != SDK_SUCCESS) - { - return SDK_FAILURE; - } - - // Run - if(clSVMBinaryTree.run() != SDK_SUCCESS) - { - return SDK_FAILURE; - } - - // VerifyResults - if(clSVMBinaryTree.verifyResults() != SDK_SUCCESS) - { - return SDK_FAILURE; - } - - // Cleanup - if (clSVMBinaryTree.cleanup() != SDK_SUCCESS) - { - return SDK_FAILURE; - } - - return SDK_SUCCESS; -} diff --git a/examples/okra/SVMAtomicsBinaryTreeInsert/SVMAtomicsBinaryTreeInsert.hpp b/examples/okra/SVMAtomicsBinaryTreeInsert/SVMAtomicsBinaryTreeInsert.hpp deleted file mode 100644 index 978026a..0000000 --- a/examples/okra/SVMAtomicsBinaryTreeInsert/SVMAtomicsBinaryTreeInsert.hpp +++ /dev/null @@ -1,227 +0,0 @@ -/********************************************************************** -Copyright ©2014 Advanced Micro Devices, Inc. All rights reserved. - -Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: - -• Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. -• Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or - other materials provided with the distribution. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED - WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY - DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS - OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING - NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. -********************************************************************/ - - -#ifndef _SVM_BINARY_TREE_H_ -#define _SVM_BINARY_TREE_H_ - -#include "okra.h" -#include -#include -#include -#include -#include -#include - -#include "SVMBinaryNode.h" - -#define NUMBER_OF_NODES 10 * 1024 * 1024 -#define WORKGROUP_SIZE 256 -#define HOST_PERCENT 10 -#define SDK_SUCCESS 0 -#define SDK_FAILURE 1 - -static void error(std::string errorMsg) -{ - std::cout<<"Error: "<count, value, std::memory_order_release); -} - -void svm_mutex_lock(svm_mutex* lock) { - int expected = SVM_MUTEX_UNLOCK; - while(!atomic_compare_exchange_strong_explicit(&lock->count, &expected, SVM_MUTEX_LOCK - , std::memory_order_seq_cst,std::memory_order_seq_cst)) { - expected = SVM_MUTEX_UNLOCK; - } -} - -void svm_mutex_unlock(svm_mutex* lock) { - atomic_store_explicit(&lock->count, SVM_MUTEX_UNLOCK, std::memory_order_release); -} - -void initialize_nodes(node *data, size_t num_nodes, int seed) -{ - node *tmp_node; - int val; - - srand(seed); - for (size_t i = 0; i < num_nodes; i++) - { - tmp_node = &(data[i]); - - val = (((rand() & 255)<<8 | (rand() & 255))<<8 | (rand() & 255))<<7 | (rand() & 127); - - (tmp_node->value) = val; - tmp_node->left = NULL; - tmp_node->right = NULL; - - svm_mutex_init(&tmp_node->mutex_node, SVM_MUTEX_UNLOCK); - } -} - -node* cpuMakeBinaryTree(size_t numNodes, node* inroot) -{ - node* root = NULL; - node* data; - node* nextData; - - if (NULL != inroot) - { - /* allocate first node to root */ - data = (node *)inroot; - nextData = data; - root = nextData; - - /* iterative tree insert */ - for (size_t i = 1; i < numNodes; ++i) - { - nextData = nextData + 1; - - insertNode(nextData, &root); - } - } - - return root; -} - -void insertNode(node* nextData, node** root) -{ - node* nextNode = *root; - node* tmp_parent = NULL; - int key = nextData->value; - int flag = 0; - int done = 0; - - while (nextNode) - { - tmp_parent = nextNode; - flag = (key - (nextNode->value)); - nextNode = (flag < 0) ? nextNode->left : nextNode->right; - } - - node *child = nextNode; - - do - { - svm_mutex *parent_mutex = &tmp_parent->mutex_node; - svm_mutex_lock(parent_mutex); - - child = (flag < 0) ? tmp_parent->left : tmp_parent->right; - if(child) - { - tmp_parent = child; - flag = (key - (child->value)); - child = (flag < 0) ? tmp_parent->left : tmp_parent->right; - } - else - { - tmp_parent->left = (flag < 0) ? nextData : tmp_parent->left ; - - tmp_parent->right = (flag >= 0) ? nextData : tmp_parent->right ; - done = 1; - } - - svm_mutex_unlock(parent_mutex); - - }while (!done); -} diff --git a/examples/okra/SVMAtomicsBinaryTreeInsert/SVMAtomicsBinaryTreeInsert_Host.hpp b/examples/okra/SVMAtomicsBinaryTreeInsert/SVMAtomicsBinaryTreeInsert_Host.hpp deleted file mode 100644 index eea846e..0000000 --- a/examples/okra/SVMAtomicsBinaryTreeInsert/SVMAtomicsBinaryTreeInsert_Host.hpp +++ /dev/null @@ -1,12 +0,0 @@ -#ifndef __SVM_BINARY_NODE_HOST__H -#define __SVM_BINARY_NODE_HOST__H - -#include -#include -#include "SVMBinaryNode.h" - -void initialize_nodes(node *data, size_t num_nodes, int seed); -void insertNode(node* nextData, node** root); -node* cpuMakeBinaryTree(size_t numNodes, node* inroot); - -#endif diff --git a/examples/okra/SVMAtomicsBinaryTreeInsert/SVMBinaryNode.h b/examples/okra/SVMAtomicsBinaryTreeInsert/SVMBinaryNode.h deleted file mode 100644 index af7efee..0000000 --- a/examples/okra/SVMAtomicsBinaryTreeInsert/SVMBinaryNode.h +++ /dev/null @@ -1,32 +0,0 @@ -#ifndef __SVM_BINARY_NODE__ -#define __SVM_BINARY_NODE__ - -#define SVM_MUTEX_LOCK 1 -#define SVM_MUTEX_UNLOCK 0 - -#ifndef SVM_DATA_STRUCT_OPENCL_DEVICE - -#include -#define __global - -#endif - -typedef struct { -#ifndef SVM_DATA_STRUCT_OPENCL_DEVICE - std::atomic count; -#else - volatile int count; -#endif -} svm_mutex; - -typedef struct bin_tree -{ - int value; // Value at a node - __global struct bin_tree *left; // Pointer to the left node - __global struct bin_tree *right; // Pointer to the right node - svm_mutex mutex_node; -} node; - - -#endif //__SVM_BINARY_NODE__ - diff --git a/examples/okra/SVMAtomicsBinaryTreeInsert/hsail.work b/examples/okra/SVMAtomicsBinaryTreeInsert/hsail.work deleted file mode 100644 index dd51d99..0000000 --- a/examples/okra/SVMAtomicsBinaryTreeInsert/hsail.work +++ /dev/null @@ -1,163 +0,0 @@ -version 0:20140528:$full:$large; -extension "amd:gcn"; -extension "IMAGE"; - -decl prog function &__atomic_memfence()( - arg_u32 %arg_p0, - arg_u32 %arg_p1, - arg_u32 %arg_p2); - -decl prog function &abort()(); - -prog kernel &__OpenCL_binTreeInsert_kernel( - kernarg_u64 %global_offset_0, - kernarg_u64 %global_offset_1, - kernarg_u64 %global_offset_2, - kernarg_u64 %printf_buffer, - kernarg_u64 %vqueue_pointer, - kernarg_u64 %aqlwrap_pointer, - kernarg_u64 %rootNode, - kernarg_u64 %devStartNode, - kernarg_u64 %g_nodes) -{ - pragma "AMD RTI", "ARGSTART:__OpenCL_binTreeInsert_kernel"; - pragma "AMD RTI", "version:3:1:104"; - pragma "AMD RTI", "device:generic"; - pragma "AMD RTI", "uniqueid:1030"; - pragma "AMD RTI", "memory:private:16"; - pragma "AMD RTI", "memory:region:0"; - pragma "AMD RTI", "memory:local:0"; - pragma "AMD RTI", "value:global_offset_0:u64:1:1:0"; - pragma "AMD RTI", "value:global_offset_1:u64:1:1:16"; - pragma "AMD RTI", "value:global_offset_2:u64:1:1:32"; - pragma "AMD RTI", "pointer:printf_buffer:u8:1:1:48:uav:8:1:RW:0:0:0"; - pragma "AMD RTI", "value:vqueue_pointer:u64:1:1:64"; - pragma "AMD RTI", "value:aqlwrap_pointer:u64:1:1:80"; - pragma "AMD RTI", "pointer:rootNode:u8:1:1:96:uav:8:1:RW:0:0:0"; - pragma "AMD RTI", "pointer:devStartNode:u8:1:1:112:uav:8:1:RW:0:0:0"; - pragma "AMD RTI", "pointer:g_nodes:u32:1:1:128:uav:8:4:RW:0:0:0"; - pragma "AMD RTI", "function:1:0"; - pragma "AMD RTI", "memory:64bitABI"; - pragma "AMD RTI", "uavid:8"; - pragma "AMD RTI", "privateid:8"; - pragma "AMD RTI", "enqueue_kernel:0"; - pragma "AMD RTI", "kernel_index:0"; - pragma "AMD RTI", "reflection:0:size_t"; - pragma "AMD RTI", "reflection:1:size_t"; - pragma "AMD RTI", "reflection:2:size_t"; - pragma "AMD RTI", "reflection:3:size_t"; - pragma "AMD RTI", "reflection:4:size_t"; - pragma "AMD RTI", "reflection:5:size_t"; - pragma "AMD RTI", "reflection:6:void*"; - pragma "AMD RTI", "reflection:7:void*"; - pragma "AMD RTI", "reflection:8:int*"; - pragma "AMD RTI", "ARGEND:__OpenCL_binTreeInsert_kernel"; - align(4) private_u8 %privateStack[4]; - -@__OpenCL_binTreeInsert_kernel_entry: - // BB#0: // %entry - workitemabsid_u32 $s0, 0; - cvt_u64_u32 $d0, $s0; - ld_kernarg_align(8)_width(all)_u64 $d1, [0]; - add_u64 $d0, $d0, $d1; - ld_kernarg_align(8)_width(all)_u64 $d1, [%g_nodes]; - ld_global_align(4)_const_width(all)_u32 $s0, [$d1]; - cvt_s64_s32 $d1, $s0; - cmp_ge_b1_u64 $c0, $d0, $d1; - cbr_b1 $c0, @BB0_15; - // BB#1: // %if.end - ld_kernarg_align(8)_width(all)_u64 $d3, [%rootNode]; - ld_kernarg_align(8)_width(all)_u64 $d1, [%devStartNode]; - shl_u64 $d0, $d0, 5; - add_u64 $d0, $d1, $d0; - mov_b32 $s0, 0; - nullptr_global_u64 $d1; - cmp_eq_b1_s64 $c0, $d3, $d1; - cbr_b1 $c0, @BB0_2; - // BB#6: - ld_global_align(4)_u32 $s2, [$d0]; - -@BB0_7: - // %while.body - ld_global_align(4)_u32 $s1, [$d3]; - sub_u32 $s1, $s2, $s1; - add_u64 $d2, $d3, 16; - add_u64 $d4, $d3, 8; - cmp_lt_b1_s32 $c0, $s1, 0; - cmov_b64 $d2, $c0, $d4, $d2; - ld_global_align(8)_u64 $d4, [$d2]; - cmp_ne_b1_s64 $c0, $d4, $d1; - mov_b64 $d2, $d3; - mov_b64 $d3, $d4; - cbr_b1 $c0, @BB0_7; - br @BB0_3; - -@BB0_2: - mov_b32 $s1, $s0; - mov_b64 $d2, $d3; - -@BB0_3: - // %do.body.preheader - mov_b32 $s2, 1; - mov_b32 $s3, 2; - mov_b32 $s4, 4; - -@BB0_4: - // %do.body - st_private_align(4)_u32 $s0, [%privateStack]; - ld_private_align(4)_u32 $s5, [%privateStack]; - atomic_cas_global_scar_sys_equiv(255)_b32 $s6, [$d2+24], $s5, $s2; - st_private_align(4)_u32 $s6, [%privateStack]; - cmp_ne_b1_s32 $c0, $s6, $s5; - cbr_b1 $c0, @BB0_5; - // BB#8: // %if.then9 - add_u64 $d3, $d2, 24; - add_u64 $d4, $d2, 16; - add_u64 $d5, $d2, 8; - cmp_lt_b1_s32 $c0, $s1, 0; - cmov_b64 $d6, $c0, $d5, $d4; - ld_global_align(8)_u64 $d6, [$d6]; - cmp_ne_b1_s64 $c0, $d6, $d1; - cbr_b1 $c0, @BB0_9; - // BB#10: // %if.else - cmp_gt_b1_s32 $c0, $s1, -1; - mov_b64 $d6, $d0; - cbr_b1 $c0, @BB0_12; - // BB#11: // %cond.false42 - st_global_align(8)_u64 $d0, [$d5]; - ld_global_align(8)_u64 $d6, [$d4]; - -@BB0_12: - // %cond.end44 - st_global_align(8)_u64 $d6, [$d4]; - mov_b32 $s5, $s2; - br @BB0_13; - -@BB0_5: - mov_b32 $s5, $s0; - br @BB0_14; - -@BB0_9: - // %if.then19 - ld_global_align(4)_u32 $s1, [$d6]; - ld_global_align(4)_u32 $s5, [$d0]; - sub_u32 $s1, $s5, $s1; - mov_b64 $d2, $d6; - mov_b32 $s5, $s0; - -@BB0_13: - // %if.end47 - st_private_align(4)_u32 $s2, [%privateStack]; - ld_private_align(4)_u32 $s6, [%privateStack]; - atomic_cas_global_scar_sys_equiv(255)_b32 $s6, [$d3], $s6, $s0; - st_private_align(4)_u32 $s6, [%privateStack]; - -@BB0_14: - // %if.end50 - cmp_eq_b1_s32 $c0, $s5, 0; - cbr_b1 $c0, @BB0_4; - -@BB0_15: - // %do.end - ret; -}; diff --git a/examples/okra/SVMAtomicsBinaryTreeInsert/obj/SVMAtomicsBinaryTreeInsert.o b/examples/okra/SVMAtomicsBinaryTreeInsert/obj/SVMAtomicsBinaryTreeInsert.o deleted file mode 100644 index 0fcd27c..0000000 Binary files a/examples/okra/SVMAtomicsBinaryTreeInsert/obj/SVMAtomicsBinaryTreeInsert.o and /dev/null differ diff --git a/examples/okra/SVMAtomicsBinaryTreeInsert/obj/SVMAtomicsBinaryTreeInsert_Host.o b/examples/okra/SVMAtomicsBinaryTreeInsert/obj/SVMAtomicsBinaryTreeInsert_Host.o deleted file mode 100644 index 766eed8..0000000 Binary files a/examples/okra/SVMAtomicsBinaryTreeInsert/obj/SVMAtomicsBinaryTreeInsert_Host.o and /dev/null differ diff --git a/examples/okra/SVMBinaryTreeSearch/Makefile b/examples/okra/SVMBinaryTreeSearch/Makefile deleted file mode 100644 index 102f359..0000000 --- a/examples/okra/SVMBinaryTreeSearch/Makefile +++ /dev/null @@ -1,34 +0,0 @@ -#ifndef HSA_RUNTIME_PATH - HSA_RUNTIME_PATH=/opt/hsa -#endif -#ifndef HSA_OKRA_PATH - HSA_OKRA_PATH=/opt/amd/okra -#endif -TEST_NAME=SVMBinaryTreeSearch -LFLAGS= -g -Wl,--unresolved-symbols=ignore-in-shared-libs -INCS += -I $(HSA_RUNTIME_PATH)/include -I $(HSA_OKRA_PATH)/dist/include -CPP_FILES := $(wildcard *.cpp) -OBJ_FILES := $(addprefix obj/, $(notdir $(CPP_FILES:.cpp=.o))) - -all: $(TEST_NAME) $(TEST_NAME).hsail - -$(TEST_NAME): $(OBJ_FILES) - $(CXX) $(LFLAGS) $(OBJ_FILES) -lelf -L$(HSA_RUNTIME_PATH)/lib -lokra_x86_64 -o $(TEST_NAME) - -$(TEST_NAME).hsail : - cloc.sh -hsail $(TEST_NAME).cl - -obj/%.o: %.cpp - $(CXX) -c $(CFLAGS) $(INCS) -o $@ $< - -clean: - rm -rf obj/*o *.hsail $(TEST_NAME) - - -test: - echo "export LD_LIBRARY_PATH=$(HSA_RUNTIME_PATH)/lib" > test.sh - echo "./$(TEST_NAME)" >> test.sh - bash test.sh - rm test.sh - - diff --git a/examples/okra/SVMBinaryTreeSearch/README b/examples/okra/SVMBinaryTreeSearch/README deleted file mode 100644 index 2093116..0000000 --- a/examples/okra/SVMBinaryTreeSearch/README +++ /dev/null @@ -1,24 +0,0 @@ - -In this application, the kernel searches for set of keys in the binary search -tree created by the host. To run the application so the following: - -1. Install HSA stack (runtime, compilers, CLOC, drivers) as given in the HSA Foundation web site GITHUB -2. Set environment variables - -export HSA_RUNTIME_PATH=/home/cas/Prakash/GitObsedian/HSA-Runtime-AMD -export HSA_KMT_PATH=/home/cas/Prakash/GitObsedian/HSA-Drivers-Linux-AMD/kfd-0.8/libhsakmt/ -export HSA_OKRA_PATH=/home/cas/Prakash/GitObsedian/Okra-Interface-to-HSA-Device/okra/ -export HSA_LLVM_PATH=/home/cas/Prakash/GitObsedian/HSAIL-HLC-Development/bin -export LD_LIBRARY_PATH=$HSA_RUNTIME_PATH/lib/x86_64:$HSA_KMT_PATH/lnx64a:$HSA_OKRA_PATH/dist/bin -export OKRA_DISABLE_FIX_HSAIL=1 - -3. Set these macros to the value you want. Number of nodes refer to the nodes -in the tree, search key ratio is the percentage of nodes you want to search. - -#define NUMBER_OF_NODES 1 * 1024 * 1024 -#define SEARCH_KEY_NODE_RATIO (0.25) -#define DEFAULT_LOCAL_SIZE 256 - -4. Run by giving ./SVMBinaryTreeSearch - -It will report "Passed" if everything is passed. diff --git a/examples/okra/SVMBinaryTreeSearch/SDKUtil.hpp b/examples/okra/SVMBinaryTreeSearch/SDKUtil.hpp deleted file mode 100644 index 10096dd..0000000 --- a/examples/okra/SVMBinaryTreeSearch/SDKUtil.hpp +++ /dev/null @@ -1,568 +0,0 @@ -/********************************************************************** -Copyright ©2014 Advanced Micro Devices, Inc. All rights reserved. - -Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: - -1 Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. -2 Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or - other materials provided with the distribution. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED - WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY - DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS - OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING - NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. -********************************************************************/ - - -#ifndef SDKUTIL_HPP_ -#define SDKUTIL_HPP_ - -/****************************************************************************** -* Included header files * -******************************************************************************/ -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - - -#if defined(__MINGW32__) && !defined(__MINGW64_VERSION_MAJOR) -#define _aligned_malloc __mingw_aligned_malloc -#define _aligned_free __mingw_aligned_free -#endif // __MINGW32__ and __MINGW64_VERSION_MAJOR - -#ifndef _WIN32 -#if defined(__INTEL_COMPILER) -#pragma warning(disable : 1125) -#endif -#endif - -#ifdef _WIN32 -#include -#else -#include -#include -#include -#endif - -/****************************************************************************** -* Defined macros * -******************************************************************************/ -#define SDK_SUCCESS 0 -#define SDK_FAILURE 1 -#define SDK_EXPECTED_FAILURE 2 - -#define CHECK_ALLOCATION(actual, msg) \ - if(actual == NULL) \ - { \ - error(msg); \ - std::cout << "Location : " << __FILE__ << ":" << __LINE__<< std::endl; \ - return SDK_FAILURE; \ - } - -#define CHECK_ERROR(actual, reference, msg) \ - if(actual != reference) \ - { \ - error(msg); \ - std::cout << "Location : " << __FILE__ << ":" << __LINE__<< std::endl; \ - return SDK_FAILURE; \ - } - -#define FREE(ptr) \ - { \ - if(ptr != NULL) \ - { \ - free(ptr); \ - ptr = NULL; \ - } \ - } - -#ifdef _WIN32 -#define ALIGNED_FREE(ptr) \ - { \ - if(ptr != NULL) \ - { \ - _aligned_free(ptr); \ - ptr = NULL; \ - } \ - } -#endif - - -/****************************************************************************** -* namespace boltsdk * -******************************************************************************/ -namespace appsdk -{ - -/************************************************************************** -* CmdArgsEnum * -* Enum for datatype of CmdArgs * -**************************************************************************/ -enum CmdArgsEnum -{ - CA_ARG_INT, - CA_ARG_FLOAT, - CA_ARG_DOUBLE, - CA_ARG_STRING, - CA_NO_ARGUMENT -}; - -/** - * error - * constant function, Prints error messages - * @param errorMsg std::string message - */ -static void error(std::string errorMsg) -{ - std::cout<<"Error: "< -std::string toString(T t, std::ios_base & (*r)(std::ios_base&) = std::dec) -{ - std::ostringstream output; - output << r << t; - return output.str(); -} - -/** - * filetoString - * converts any file into a string - * @param file string message - * @param str string message - * @return 0 on success Positive if expected and Non-zero on failure - */ -static int fileToString(std::string &fileName, std::string &str) -{ - size_t size; - char* buf; - // Open file stream - std::fstream f(fileName.c_str(), (std::fstream::in | std::fstream::binary)); - // Check if we have opened file stream - if (f.is_open()) - { - size_t sizeFile; - // Find the stream size - f.seekg(0, std::fstream::end); - size = sizeFile = (size_t)f.tellg(); - f.seekg(0, std::fstream::beg); - buf = new char[size + 1]; - if (!buf) - { - f.close(); - return SDK_FAILURE; - } - // Read file - f.read(buf, sizeFile); - f.close(); - str[size] = '\0'; - str = buf; - return SDK_SUCCESS; - } - else - { - error("Converting file to string. Cannot open file."); - str = ""; - return SDK_FAILURE; - } -} - -/** -******************************************************************* -* @fn printArray -* @brief displays a array on std::out -******************************************************************/ -template -void printArray( - const std::string header, - const T * data, - const int width, - const int height) -{ - std::cout<<"\n"< -void printArray( - const std::string header, - const T * data, - const int width, - const int height, - int veclen) -{ - std::cout<<"\n"< -void printArray( - const std::string header, - const std::vector& data, - const int width, - const int height) -{ - std::cout<<"\n"< stats[i].length())? - statsStr[i].length() : stats[i].length()); - std::cout << " " << std::setw(columnWidth[i]+1) << std::left << statsStr[i] << - "|"; - } - std::cout << std::endl << "|"; - for(int i=0; i -int fillRandom( - T * arrayPtr, - const int width, - const int height, - const T rangeMin, - const T rangeMax, - unsigned int seed=123) -{ - if(!arrayPtr) - { - error("Cannot fill array. NULL pointer."); - return SDK_FAILURE; - } - if(!seed) - { - seed = (unsigned int)time(NULL); - } - srand(seed); - double range = double(rangeMax - rangeMin) + 1.0; - /* random initialisation of input */ - for(int i = 0; i < height; i++) - for(int j = 0; j < width; j++) - { - int index = i*width + j; - arrayPtr[index] = rangeMin + T(range*rand()/(RAND_MAX + 1.0)); - } - return SDK_SUCCESS; -} - -/** - * fillPos - * fill the specified positions - */ -template -int fillPos( - T * arrayPtr, - const int width, - const int height) -{ - if(!arrayPtr) - { - error("Cannot fill array. NULL pointer."); - return SDK_FAILURE; - } - /* initialisation of input with positions*/ - for(T i = 0; i < height; i++) - for(T j = 0; j < width; j++) - { - T index = i*width + j; - arrayPtr[index] = index; - } - return SDK_SUCCESS; -} - -/** - * fillConstant - * fill the array with constant value - */ -template -int fillConstant( - T * arrayPtr, - const int width, - const int height, - const T val) -{ - if(!arrayPtr) - { - error("Cannot fill array. NULL pointer."); - return SDK_FAILURE; - } - /* initialisation of input with constant value*/ - for(int i = 0; i < height; i++) - for(int j = 0; j < width; j++) - { - int index = i*width + j; - arrayPtr[index] = val; - } - return SDK_SUCCESS; -} - - -/** - * roundToPowerOf2 - * rounds to a power of 2 - */ -template -T roundToPowerOf2(T val) -{ - int bytes = sizeof(T); - val--; - for(int i = 0; i < bytes; i++) - { - val |= val >> (1< -int isPowerOf2(T val) -{ - long long _val = val; - if((_val & (-_val))-_val == 0 && _val != 0) - { - return SDK_SUCCESS; - } - else - { - return SDK_FAILURE; - } -} - -/** - * getPath - * @return path of the current directory - */ -static std::string getPath() -{ -#ifdef _WIN32 - char buffer[MAX_PATH]; -#ifdef UNICODE - if(!GetModuleFileName(NULL, (LPWCH)buffer, sizeof(buffer))) - { - throw std::string("GetModuleFileName() failed!"); - } -#else - if(!GetModuleFileName(NULL, buffer, sizeof(buffer))) - { - throw std::string("GetModuleFileName() failed!"); - } -#endif - std::string str(buffer); - /* '\' == 92 */ - int last = (int)str.find_last_of((char)92); -#else - char buffer[PATH_MAX + 1]; - ssize_t len; - if((len = readlink("/proc/self/exe",buffer, sizeof(buffer) - 1)) == -1) - { - throw std::string("readlink() failed!"); - } - buffer[len] = '\0'; - std::string str(buffer); - /* '/' == 47 */ - int last = (int)str.find_last_of((char)47); -#endif - return str.substr(0, last + 1); -} - diff --git a/examples/okra/SVMBinaryTreeSearch/SVMBinaryNode.h b/examples/okra/SVMBinaryTreeSearch/SVMBinaryNode.h deleted file mode 100644 index e716605..0000000 --- a/examples/okra/SVMBinaryTreeSearch/SVMBinaryNode.h +++ /dev/null @@ -1,20 +0,0 @@ -#ifndef __SVM_BINARY_NODE__ -#define __SVM_BINARY_NODE__ - -/* binary tree node definition */ -typedef struct nodeStruct -{ - int value; - struct nodeStruct* left; - struct nodeStruct* right; -} node; - -/* search keys */ -typedef struct searchKeyStruct -{ - int key; - node* oclNode; - node* nativeNode; -} searchKey; - -#endif diff --git a/examples/okra/SVMBinaryTreeSearch/SVMBinaryTreeSearch b/examples/okra/SVMBinaryTreeSearch/SVMBinaryTreeSearch deleted file mode 100755 index 9e4f5cd..0000000 Binary files a/examples/okra/SVMBinaryTreeSearch/SVMBinaryTreeSearch and /dev/null differ diff --git a/examples/okra/SVMBinaryTreeSearch/SVMBinaryTreeSearch.cl b/examples/okra/SVMBinaryTreeSearch/SVMBinaryTreeSearch.cl deleted file mode 100644 index 9bd7b29..0000000 --- a/examples/okra/SVMBinaryTreeSearch/SVMBinaryTreeSearch.cl +++ /dev/null @@ -1,64 +0,0 @@ -/********************************************************************** -Copyright ©2014 Advanced Micro Devices, Inc. All rights reserved. - -Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: - -• Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. -• Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or - other materials provided with the distribution. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED - WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY - DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS - OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING - NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. -********************************************************************/ - -/* binary tree node definition */ -typedef struct nodeStruct -{ - int value; - __global struct nodeStruct* left; - __global struct nodeStruct* right; -} node; - -/* search keys */ -typedef struct searchKeyStruct -{ - int key; - __global node* oclNode; - __global node* nativeNode; -} searchKey; - -/*** - * sample_kernel: - ***/ -__kernel void btree_search(__global void* bstRoot, - __global void* searchKeyVect) -{ - __global node* searchNode = (__global node *)(bstRoot); - __global searchKey* keyPtr = (__global searchKey*)(searchKeyVect); - int gid = get_global_id(0); - __global searchKey* currKey = keyPtr + gid; - - while(NULL != searchNode) - { - if(currKey->key == searchNode->value) - { - /* rejoice on finding key */ - currKey->oclNode = searchNode; - searchNode = NULL; - } - else if(currKey->key < searchNode->value) - { - /* move left */ - searchNode = searchNode->left; - } - else - { - /* move right */ - searchNode = searchNode->right; - } - } -} - diff --git a/examples/okra/SVMBinaryTreeSearch/SVMBinaryTreeSearch.cpp b/examples/okra/SVMBinaryTreeSearch/SVMBinaryTreeSearch.cpp deleted file mode 100644 index 258c083..0000000 --- a/examples/okra/SVMBinaryTreeSearch/SVMBinaryTreeSearch.cpp +++ /dev/null @@ -1,493 +0,0 @@ -/********************************************************************** -Copyright ©2014 Advanced Micro Devices, Inc. All rights reserved. - -Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: - -• Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. -• Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or - other materials provided with the distribution. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED - WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY - DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS - OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING - NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. -********************************************************************/ - - -using namespace std; -#include "SVMBinaryTreeSearch.hpp" - -char *SVMBinaryTreeSearch::buildStringFromSourceFile(string fname) { - cout << "using source from " << fname << endl; - ifstream infile; - infile.open(fname.c_str()); - if (!infile) {cout << "could not open " << fname << endl; exit(1);} - infile.seekg(0, ios::end); - int len = infile.tellg(); - char *str = new char[len+1]; - infile.seekg(0, ios::beg); - infile.read(str, len); - int lenRead = infile.gcount(); - // if (!infile) {cout << "could not read " << len << " bytes from " << fname << " but read " << lenRead << endl;} - str[lenRead] = (char)0; // terminate - // cout << "Source String -----------\n" << str << "----------- end of Source String\n"; - return str; -}; - -int SVMBinaryTreeSearch::setupSVMBinaryTree() -{ - - /* setup number of keys */ - if(numKeys == 0) - numKeys = numNodes*SEARCH_KEY_NODE_RATIO; - - /* if localRandMax GT RAND_MAX set it to RAND_MAX */ - if(localRandMax > RAND_MAX) - localRandMax = RAND_MAX; - - /* initialize random number generator */ - if(localSeed == 0) - srand(time(NULL)); - else - srand(localSeed); - - - return SDK_SUCCESS; -} - -int SVMBinaryTreeSearch::setupCL(void) -{ - - string sourceFileName = "SVMBinaryTreeSearch.hsail"; - char* svmTreeSource = buildStringFromSourceFile(sourceFileName); - - okra_status_t status; - - //create okra context - context = NULL; - - status = okra_get_context(&context); - - if (status != OKRA_SUCCESS) {cout << "Error while creating context:" << (int)status << endl; exit(-1);} - - //create kernel from hsail - kernel = NULL; - - status = okra_create_kernel(context, svmTreeSource, "&__OpenCL_btree_search_kernel", &kernel); - - if (status != OKRA_SUCCESS) {cout << "Error while creating kernel:" << (int)status << endl; exit(-1);} - - // initialize any device/SVM memory here. - svmTreeBuf = malloc( numNodes*sizeof(node) ); - - if(NULL == svmTreeBuf) { - cout << " Malloc (svmTreeBuf) failed\n"; - exit (-1); - } - - svmSearchBuf = malloc( numKeys*sizeof(searchKey)); - - if(NULL == svmSearchBuf) { - cout << " Malloc (SearchBuf) failed\n"; - exit (-1); - } - - return SDK_SUCCESS; -} - -int SVMBinaryTreeSearch::runCLKernels(void) -{ - int status; - - /* run global kernels for stage decided by input length */ - status = runSampleKernel(); - CHECK_ERROR(status, SDK_SUCCESS, "runSampleKernel() failed."); - - return SDK_SUCCESS; -} - -int SVMBinaryTreeSearch::runSampleKernel() -{ - size_t localThreads = WORKGROUP_SIZE; // 256 - size_t globalThreads = numKeys; - - //setup kernel arguments - okra_clear_args(kernel); - -#ifdef DUMMY_ARGS - //This flags should be set if HSA_HLC_Stable is used - // This is because the high level compiler generates 6 extra args - okra_push_pointer(kernel, NULL); - okra_push_pointer(kernel, NULL); - okra_push_pointer(kernel, NULL); - okra_push_pointer(kernel, NULL); - okra_push_pointer(kernel, NULL); - okra_push_pointer(kernel, NULL); -#endif - okra_push_pointer(kernel, svmTreeBuf); - okra_push_pointer(kernel, svmSearchBuf); - cout << "Setting kernel args done!\n"; - - //setup execution range - okra_range_t range; - range.dimension=1; - range.global_size[0] = globalThreads; - range.global_size[1] = range.global_size[2] = 1; - range.group_size[0] = localThreads; - range.group_size[1] = range.group_size[2] = 1; - - //execute kernel and wait for completion - okra_status_t status = okra_execute_kernel(context, kernel, &range); - if(status != OKRA_SUCCESS) {cout << "Error while executing kernel:" << (int)status << endl; exit(-1);} - - return SDK_SUCCESS; -} - -int SVMBinaryTreeSearch::svmBinaryTreeCPUReference() -{ - searchKey* keyPtr = (searchKey*)svmSearchBuf; - searchKey* currKey = keyPtr; - node* searchNode = svmRoot; - int status = SDK_SUCCESS; - - for(int i = 0; i < numKeys; ++i) - { - /* search tree */ - searchNode = svmRoot; - - while(NULL != searchNode) - { - if(currKey->key == searchNode->value) - { - /* rejoice on finding key */ - currKey->nativeNode = searchNode; - searchNode = NULL; - } - else if(currKey->key < searchNode->value) - { - /* move left */ - searchNode = searchNode->left; - } - else - { - /* move right */ - searchNode = searchNode->right; - } - } - - /* move to next key */ - currKey += 1; - } - - return SDK_SUCCESS; -} - -int SVMBinaryTreeSearch::setup() -{ - if(setupSVMBinaryTree() != SDK_SUCCESS) - { - return SDK_FAILURE; - } - - if (setupCL() != SDK_SUCCESS) - { - return SDK_FAILURE; - } - - return SDK_SUCCESS; -} - -int SVMBinaryTreeSearch::run() -{ - int status = 0; - - //create the binary tree - status = cpuCreateBinaryTree(); - CHECK_ERROR(status, SDK_SUCCESS, "cpuCreateBinaryTree() failed."); - - //initialize search keys - status = cpuInitSearchKeys(); - CHECK_ERROR(status, SDK_SUCCESS, "cpuInitSearchKeys() failed."); - - //warm up run - if(runCLKernels() != SDK_SUCCESS) - { - return SDK_FAILURE; - } - - cout << "-------------------------------------------" << std::endl; - cout << "Executing kernel for " << iterations - << " iterations" << std::endl; - cout << "-------------------------------------------" << std::endl; - - for(int i = 0; i < iterations; i++) - { - // Arguments are set and execution call is enqueued on command buffer - if(runCLKernels() != SDK_SUCCESS) - { - return SDK_FAILURE; - } - } - - return SDK_SUCCESS; -} - -int SVMBinaryTreeSearch::verifyResults() -{ - int status = SDK_SUCCESS; - // reference implementation - svmBinaryTreeCPUReference(); - - // compare the results and see if they match - status = compare(); - if(SDK_SUCCESS == status) - { - cout << "Passed!\n" << std::endl; - } - else - { - cout << "Failed\n" << std::endl; - } - return status; -} - -int SVMBinaryTreeSearch::cleanup() -{ - // Releases OpenCL resources (Context, Memory etc.) - int status = 0; - - //dispose okra resources - okra_dispose_kernel(kernel); - okra_dispose_context(context); - - free(svmTreeBuf); - free(svmSearchBuf); - return SDK_SUCCESS; -} - -int SVMBinaryTreeSearch::lrand() -{ - float frand; - - /* generate a real random number between 0 and 1.0 */ - frand = (float)rand()/(float)(RAND_MAX); - - /* convert to the range needed */ - return (int)(frand*localRandMax); -} - -/** - * cpuCreateBinaryTree() - * creates a tree from the data in "svmTreeBuf". If this is NULL returns NULL - * else returns root of the tree. - **/ -int SVMBinaryTreeSearch::cpuCreateBinaryTree() -{ - node* root; - int status; - - status = cpuInitNodes(); - CHECK_ERROR(status, SDK_SUCCESS, "cpuInitNodes() failed."); - - root = cpuMakeBinaryTree(); - CHECK_ERROR(status, SDK_SUCCESS, "cpuMakeBinaryTree() failed."); - - svmRoot = root; - - return SDK_SUCCESS; -} - -node* SVMBinaryTreeSearch::cpuMakeBinaryTree() -{ - node* root = NULL; - node* data; - node* nextData; - node* nextNode; - bool insertedFlag = false; - - if (NULL != svmTreeBuf) - { - /* allocate first node to root */ - data = (node *)svmTreeBuf; - nextData = data; - root = nextData; - - /* iterative tree insert */ - for (int i = 1; i < numNodes; ++i) - { - nextData = nextData + 1; - - nextNode = root; - insertedFlag = false; - - while(false == insertedFlag) - { - if(nextData->value <= nextNode->value) - { - /* move left */ - if(NULL == nextNode->left) - { - nextNode->left = nextData; - insertedFlag = true; - } - else - { - nextNode = nextNode->left; - } - } - else - { - /* move right */ - if(NULL == nextNode->right) - { - nextNode->right = nextData; - insertedFlag = true; - } - else - { - nextNode = nextNode->right; - } - } - } - } - } - - return root; -} - -int SVMBinaryTreeSearch::cpuInitNodes() -{ - node* nextData; - - if (NULL != svmTreeBuf) - { - /* get the first node */ - nextData = (node *)svmTreeBuf; - - /* initialize nodes */ - for (int i = 0; i < numNodes; ++i) - { - /* allocate a random value to node */ - nextData->value = lrand(); - - /* all pointers are null */ - nextData->left = NULL; - nextData->right = NULL; - - nextData = nextData + 1; - } - } - else - { - return SDK_FAILURE; - } - - return SDK_SUCCESS; -} - -int SVMBinaryTreeSearch::cpuInitSearchKeys() -{ - searchKey* nextData; - int status = SDK_SUCCESS; - - if (NULL != svmSearchBuf) - { - /* get the first node */ - nextData = (searchKey *)svmSearchBuf; - - /* initialize nodes */ - for (int i = 0; i < numKeys; ++i) - { - /* allocate a random value to node */ - nextData->key = lrand(); - nextData->oclNode = NULL; - nextData->nativeNode = NULL; - - nextData = nextData + 1; - } - } - else - { - status = SDK_FAILURE; - } - - return status; -} - -int SVMBinaryTreeSearch::compare() -{ - searchKey* keyPtr = (searchKey*)svmSearchBuf; - searchKey* currKey = keyPtr; - int compare_status = SDK_SUCCESS; - int status; - - for(int i = 0; i < numKeys; ++i) - { - /* compare OCL and native nodes */ - if(currKey->oclNode != currKey->nativeNode) - { - compare_status = SDK_FAILURE; - } - - /* next key */ - currKey += 1; - } - - return compare_status; -} - - -int SVMBinaryTreeSearch::printInOrder() -{ - int status; - - status = recursiveInOrder(svmRoot); - - return SDK_SUCCESS; -} - -int SVMBinaryTreeSearch::recursiveInOrder(node* leaf) -{ - if(NULL != leaf) - { - recursiveInOrder(leaf->left); - cout << leaf->value << ", "; - recursiveInOrder(leaf->right); - } - - return SDK_SUCCESS; -} - - - -int main(int argc, char * argv[]) -{ - SVMBinaryTreeSearch clSVMBinaryTree; - - // Setup - if(clSVMBinaryTree.setup() != SDK_SUCCESS) - { - return SDK_FAILURE; - } - - // Run - if(clSVMBinaryTree.run() != SDK_SUCCESS) - { - return SDK_FAILURE; - } - - // VerifyResults - if(clSVMBinaryTree.verifyResults() != SDK_SUCCESS) - { - return SDK_FAILURE; - } - - // Cleanup - if (clSVMBinaryTree.cleanup() != SDK_SUCCESS) - { - return SDK_FAILURE; - } - - return SDK_SUCCESS; -} diff --git a/examples/okra/SVMBinaryTreeSearch/SVMBinaryTreeSearch.hpp b/examples/okra/SVMBinaryTreeSearch/SVMBinaryTreeSearch.hpp deleted file mode 100644 index 9178923..0000000 --- a/examples/okra/SVMBinaryTreeSearch/SVMBinaryTreeSearch.hpp +++ /dev/null @@ -1,322 +0,0 @@ -/********************************************************************** -Copyright ©2014 Advanced Micro Devices, Inc. All rights reserved. - -Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: - -• Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. -• Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or - other materials provided with the distribution. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED - WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY - DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS - OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING - NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. -********************************************************************/ - - -#ifndef _SVM_BINARY_TREE_H_ -#define _SVM_BINARY_TREE_H_ - -#include "okra.h" -#include -#include -#include -#include -#include -#include -#include -#include "SVMBinaryNode.h" - -#define NUMBER_OF_NODES 1 * 1024 * 1024 -#define SEARCH_KEY_NODE_RATIO (0.25) -#define DEFAULT_LOCAL_SIZE 256 - -#define SDK_SUCCESS 0 -#define SDK_FAILURE 1 - -#define WORKGROUP_SIZE 256 - -static void error(std::string errorMsg) -{ - std::cout<<"Error: "< -#include -using namespace std; - -static char *buildStringFromSourceFile(string fname) { - cout << "using source from " << fname << endl; - ifstream infile; - infile.open(fname.c_str()); - if (!infile) {cout << "could not open " << fname << endl; exit(1);} - infile.seekg(0, ios::end); - int len = infile.tellg(); - char *str = new char[len+1]; - infile.seekg(0, ios::beg); - infile.read(str, len); - int lenRead = infile.gcount(); - // if (!infile) {cout << "could not read " << len << " bytes from " << fname << " but read " << lenRead << endl;} - str[lenRead] = (char)0; // terminate - // cout << "Source String -----------\n" << str << "----------- end of Source String\n"; - return str; -}; diff --git a/examples/snack/async_vecsum/buildrun.sh b/examples/snack/async_vecsum/buildrun.sh index 2a77e88..a9d1a6d 100755 --- a/examples/snack/async_vecsum/buildrun.sh +++ b/examples/snack/async_vecsum/buildrun.sh @@ -2,19 +2,18 @@ # Set HSA Environment variables [ -z $HSA_RUNTIME_PATH ] && HSA_RUNTIME_PATH=/opt/hsa -[ -z HSA_LIBHSAIL_PATH ] && HSA_LIBHSAIL_PATH=/opt/hsa/lib -[ -z HSA_LLVM_PATH ] && HSA_LLVM_PATH=/opt/amd/bin +[ -z $HSA_LLVM_PATH ] && HSA_LLVM_PATH=/opt/amd/cloc/bin export LD_LIBRARY_PATH=$HSA_RUNTIME_PATH/lib # Compile accelerated functions echo if [ -f sumKernel.o ] ; then rm sumKernel.o ; fi -echo snack.sh -c sumKernel.cl -snack.sh -c sumKernel.cl +echo $HSA_LLVM_PATH/snack.sh -c sumKernel.cl +$HSA_LLVM_PATH/snack.sh -c sumKernel.cl echo if [ -f vecsum ] ; then rm vecsum ; fi -echo g++ -O3 -o vecsum sumKernel.o vecsum.cpp -L $HSA_RUNTIME_PATH/lib -lhsa-runtime64 -lelf -g++ -O3 -o vecsum sumKernel.o vecsum.cpp -L $HSA_RUNTIME_PATH/lib -lhsa-runtime64 -lelf +echo g++ -O3 -o vecsum sumKernel.o vecsum.cpp -L $HSA_RUNTIME_PATH/lib -lhsa-runtime64 +g++ -O3 -o vecsum sumKernel.o vecsum.cpp -L $HSA_RUNTIME_PATH/lib -lhsa-runtime64 # Execute echo diff --git a/examples/snack/csquares/buildrun.sh b/examples/snack/csquares/buildrun.sh index 3e30736..97deb3e 100755 --- a/examples/snack/csquares/buildrun.sh +++ b/examples/snack/csquares/buildrun.sh @@ -2,22 +2,21 @@ # Set HSA Environment variables [ -z $HSA_RUNTIME_PATH ] && HSA_RUNTIME_PATH=/opt/hsa -[ -z HSA_LIBHSAIL_PATH ] && HSA_LIBHSAIL_PATH=/opt/hsa/lib -[ -z HSA_LLVM_PATH ] && HSA_LLVM_PATH=/opt/amd/bin +[ -z $HSA_LLVM_PATH ] && HSA_LLVM_PATH=/opt/amd/cloc/bin export LD_LIBRARY_PATH=$HSA_RUNTIME_PATH/lib # Compile accelerated functions echo if [ -f CSquares.o ] ; then rm CSquares.o ; fi -echo snack.sh -q -c CSquares.cl -snack.sh -q -c CSquares.cl +echo $HSA_LLVM_PATH/snack.sh -q -c CSquares.cl +$HSA_LLVM_PATH/snack.sh -q -c CSquares.cl # Compile Main and link to accelerated functions in CSquares.o echo if [ -f CSquares ] ; then rm CSquares ; fi -echo "g++ -o CSquares CSquares.o CSquares.cpp -L$HSA_RUNTIME_PATH/lib -lhsa-runtime64 -lelf " -g++ -o CSquares CSquares.o CSquares.cpp -L$HSA_RUNTIME_PATH/lib -lhsa-runtime64 -lelf +echo "g++ -o CSquares CSquares.o CSquares.cpp -L$HSA_RUNTIME_PATH/lib -lhsa-runtime64 " +g++ -o CSquares CSquares.o CSquares.cpp -L$HSA_RUNTIME_PATH/lib -lhsa-runtime64 # Execute echo diff --git a/examples/snack/fortran/buildrun.sh b/examples/snack/fortran/buildrun.sh index 810da77..b10430c 100755 --- a/examples/snack/fortran/buildrun.sh +++ b/examples/snack/fortran/buildrun.sh @@ -2,20 +2,19 @@ # Set HSA Environment variables [ -z $HSA_RUNTIME_PATH ] && HSA_RUNTIME_PATH=/opt/hsa -[ -z HSA_LIBHSAIL_PATH ] && HSA_LIBHSAIL_PATH=/opt/hsa/lib -[ -z HSA_LLVM_PATH ] && HSA_LLVM_PATH=/opt/amd/bin +[ -z $HSA_LLVM_PATH ] && HSA_LLVM_PATH=/opt/amd/cloc/bin export LD_LIBRARY_PATH=$HSA_RUNTIME_PATH/lib # First compile the acclerated functions to create hw.o # Tell cloc to use fortran names for external references echo -echo snack.sh -q -fort -c hw.cl -snack.sh -q -fort -c hw.cl +echo $HSA_LLVM_PATH/snack.sh -q -fort -c hw.cl +$HSA_LLVM_PATH/snack.sh -q -fort -c hw.cl # Compile the main Fortran program and link to hw.o echo -echo "f95 -o HelloWorld hw.o HelloWorld.f -L$HSA_RUNTIME_PATH/lib -lhsa-runtime64 -lelf " -f95 -o HelloWorld hw.o HelloWorld.f -L$HSA_RUNTIME_PATH/lib -lhsa-runtime64 -lelf +echo "f95 -o HelloWorld hw.o HelloWorld.f -L$HSA_RUNTIME_PATH/lib -lhsa-runtime64 " +f95 -o HelloWorld hw.o HelloWorld.f -L$HSA_RUNTIME_PATH/lib -lhsa-runtime64 echo echo ./HelloWorld diff --git a/examples/snack/helloworld/buildrun.sh b/examples/snack/helloworld/buildrun.sh index e0cdf67..599c8c9 100755 --- a/examples/snack/helloworld/buildrun.sh +++ b/examples/snack/helloworld/buildrun.sh @@ -9,32 +9,31 @@ # Set HSA Environment variables [ -z $HSA_RUNTIME_PATH ] && HSA_RUNTIME_PATH=/opt/hsa -[ -z HSA_LIBHSAIL_PATH ] && HSA_LIBHSAIL_PATH=/opt/hsa/lib -[ -z HSA_LLVM_PATH ] && HSA_LLVM_PATH=/opt/amd/bin +[ -z $HSA_LLVM_PATH ] && HSA_LLVM_PATH=/opt/amd/cloc/bin export LD_LIBRARY_PATH=$HSA_RUNTIME_PATH/lib # First compile the acclerated functions to create hw.o echo if [ "$1" == "f" ] ; then - echo snack.sh -q -fort -c hw.cl - snack.sh -q -fort -c hw.cl + echo $HSA_LLVM_PATH/snack.sh -q -fort -c hw.cl + $HSA_LLVM_PATH/snack.sh -q -fort -c hw.cl else - echo snack.sh -q -c hw.cl - snack.sh -q -c hw.cl + echo $HSA_LLVM_PATH/snack.sh -q -c hw.cl + $HSA_LLVM_PATH/snack.sh -q -c hw.cl fi # Compile the main program and link to hw.o # Main program can be c, cpp, or fotran echo if [ "$1" == "cpp" ] ; then - echo "g++ -o HelloWorld hw.o HelloWorld.cpp -L$HSA_RUNTIME_PATH/lib -lhsa-runtime64 -lelf " - g++ -o HelloWorld hw.o HelloWorld.cpp -L$HSA_RUNTIME_PATH/lib -lhsa-runtime64 -lelf + echo "g++ -o HelloWorld hw.o HelloWorld.cpp -L$HSA_RUNTIME_PATH/lib -lhsa-runtime64 " + g++ -o HelloWorld hw.o HelloWorld.cpp -L$HSA_RUNTIME_PATH/lib -lhsa-runtime64 elif [ "$1" == "f" ] ; then - echo "f95 -o HelloWorld hw.o HelloWorld.f -L$HSA_RUNTIME_PATH/lib -lhsa-runtime64 -lelf " - f95 -o HelloWorld hw.o HelloWorld.f -L$HSA_RUNTIME_PATH/lib -lhsa-runtime64 -lelf + echo "f95 -o HelloWorld hw.o HelloWorld.f -L$HSA_RUNTIME_PATH/lib -lhsa-runtime64 " + f95 -o HelloWorld hw.o HelloWorld.f -L$HSA_RUNTIME_PATH/lib -lhsa-runtime64 else - echo "gcc -o HelloWorld hw.o HelloWorld.c -L$HSA_RUNTIME_PATH/lib -lhsa-runtime64 -lelf" - gcc -o HelloWorld hw.o HelloWorld.c -L$HSA_RUNTIME_PATH/lib -lhsa-runtime64 -lelf + echo "gcc -o HelloWorld hw.o HelloWorld.c -L$HSA_RUNTIME_PATH/lib -lhsa-runtime64 " + gcc -o HelloWorld hw.o HelloWorld.c -L$HSA_RUNTIME_PATH/lib -lhsa-runtime64 fi echo diff --git a/examples/snack/matmul/buildrun.sh b/examples/snack/matmul/buildrun.sh index 454e9d6..a9f23c6 100755 --- a/examples/snack/matmul/buildrun.sh +++ b/examples/snack/matmul/buildrun.sh @@ -5,21 +5,20 @@ # # Set HSA Environment variables [ -z $HSA_RUNTIME_PATH ] && HSA_RUNTIME_PATH=/opt/hsa -[ -z HSA_LIBHSAIL_PATH ] && HSA_LIBHSAIL_PATH=/opt/hsa/lib -[ -z HSA_LLVM_PATH ] && HSA_LLVM_PATH=/opt/amd/bin +[ -z $HSA_LLVM_PATH ] && HSA_LLVM_PATH=/opt/amd/cloc/bin export LD_LIBRARY_PATH=$HSA_RUNTIME_PATH/lib # Compile accelerated functions echo if [ -f matmulKernels.o ] ; then rm matmulKernels.o ; fi -echo snack.sh -c -opt 3 -vv matmulKernels.cl -snack.sh -c -opt 3 -vv matmulKernels.cl +echo $HSA_LLVM_PATH/snack.sh -c -opt 3 -vv matmulKernels.cl +$HSA_LLVM_PATH/snack.sh -c -opt 3 -vv matmulKernels.cl # Compile Main .c and link to accelerated functions in matmulKernels.o echo if [ -f matmul ] ; then rm matmul ; fi -echo gcc -O3 -o matmul matmulKernels.o matmul.c -L$HSA_RUNTIME_PATH/lib -lhsa-runtime64 -lelf -lbsd -gcc -O3 -o matmul matmulKernels.o matmul.c -L$HSA_RUNTIME_PATH/lib -lhsa-runtime64 -lelf -lbsd +echo gcc -O3 -o matmul matmulKernels.o matmul.c -L$HSA_RUNTIME_PATH/lib -lhsa-runtime64 -lbsd +gcc -O3 -o matmul matmulKernels.o matmul.c -L$HSA_RUNTIME_PATH/lib -lhsa-runtime64 -lbsd # Execute the application echo diff --git a/examples/snack/multiple_cl_files/buildrun.sh b/examples/snack/multiple_cl_files/buildrun.sh index a8c2f67..8bd6988 100755 --- a/examples/snack/multiple_cl_files/buildrun.sh +++ b/examples/snack/multiple_cl_files/buildrun.sh @@ -4,22 +4,22 @@ # Set HSA Environment variables [ -z $HSA_RUNTIME_PATH ] && HSA_RUNTIME_PATH=/opt/hsa -[ -z HSA_LLVM_PATH ] && HSA_LLVM_PATH=/opt/amd/bin +[ -z $HSA_LLVM_PATH ] && HSA_LLVM_PATH=/opt/amd/cloc/bin export LD_LIBRARY_PATH=$HSA_RUNTIME_PATH/lib # First compile all files with acclerated functions to create hw.o and hw2.o echo -echo snack.sh -q -c hw.cl -snack.sh -q -c hw.cl +echo $HSA_LLVM_PATH snack.sh -q -c hw.cl +$HSA_LLVM_PATH/snack.sh -q -c hw.cl echo -echo snack.sh -q -c -noglobs hw2.cl -snack.sh -q -c -noglobs hw2.cl +echo $HSA_LLVM_PATH/snack.sh -q -c -noglobs hw2.cl +$HSA_LLVM_PATH/snack.sh -q -c -noglobs hw2.cl # Compile the main program and link to hw.o # Main program can be c, cpp, or fotran echo -echo "gcc -o HelloWorld hw.o hw2.o HelloWorld.c -L$HSA_RUNTIME_PATH/lib -lhsa-runtime64 -lelf" -gcc -o HelloWorld hw.o hw2.o HelloWorld.c -L$HSA_RUNTIME_PATH/lib -lhsa-runtime64 -lelf +echo "gcc -o HelloWorld hw.o hw2.o HelloWorld.c -L$HSA_RUNTIME_PATH/lib -lhsa-runtime64 " +gcc -o HelloWorld hw.o hw2.o HelloWorld.c -L$HSA_RUNTIME_PATH/lib -lhsa-runtime64 echo echo ./HelloWorld diff --git a/examples/snack/vector_copy/buildrun.sh b/examples/snack/vector_copy/buildrun.sh index a8a2826..ba6319d 100755 --- a/examples/snack/vector_copy/buildrun.sh +++ b/examples/snack/vector_copy/buildrun.sh @@ -2,19 +2,19 @@ # Set HSA Environment variables [ -z $HSA_RUNTIME_PATH ] && HSA_RUNTIME_PATH=/opt/hsa -[ -z HSA_LLVM_PATH ] && HSA_LLVM_PATH=/opt/amd/bin +[ -z $HSA_LLVM_PATH ] && HSA_LLVM_PATH=/opt/amd/cloc/bin export LD_LIBRARY_PATH=$HSA_RUNTIME_PATH/lib # Compile accelerated functions echo if [ -f vector_copy.o ] ; then rm vector_copy.o ; fi -echo snack.sh -q -c vector_copy.cl -snack.sh -q -c vector_copy.cl +echo $HSA_LLVM_PATH/snack.sh -q -c vector_copy.cl +$HSA_LLVM_PATH/snack.sh -q -c vector_copy.cl # Compile Main and link to accelerated functions in vector_copy.o echo if [ -f VectorCopy ] ; then rm VectorCopy ; fi -echo "g++ -o VectorCopy vector_copy.o VectorCopy.cpp -L $HSA_RUNTIME_PATH/lib -lhsa-runtime64 -lelf " -g++ -o VectorCopy vector_copy.o VectorCopy.cpp -L $HSA_RUNTIME_PATH/lib -lhsa-runtime64 -lelf +echo "g++ -o VectorCopy vector_copy.o VectorCopy.cpp -L $HSA_RUNTIME_PATH/lib -lhsa-runtime64 " +g++ -o VectorCopy vector_copy.o VectorCopy.cpp -L $HSA_RUNTIME_PATH/lib -lhsa-runtime64 # Execute echo diff --git a/ubuntu/cloc_0.7.4_amd64.deb b/ubuntu/cloc_0.7.4_amd64.deb deleted file mode 100644 index 4fee306..0000000 Binary files a/ubuntu/cloc_0.7.4_amd64.deb and /dev/null differ