
Thrust D: Case Studies

  • All modules focus on issues related to provisioning/scheduling

List of SLOs

List of MODULES

D.1. Case-Study: Workflow on local cluster + cloud + Energy/CO2

  • Application: Workflow from WorkflowHub

    • Which workflow applications to use?
      • May be good to have several versions of this case-study for different workflow types
      • Montage
      • Epigenomics
  • One local 128-node cluster, each node with 8 cores (BareMetalComputeService)

  • One cloud on which one can lease various kinds of VMs for a cost (billed by the hour)

  • Question: what should you do?

    • How many VMs of what types for how many hours (within budget)
      • specified by command-line arguments
    • User tags which tasks or groups of tasks should run where
      • specified by command-line arguments
  • The WMS merely implements those choices (it doesn't implement any actual overall scheduling algorithm)

    • Greedy scheduling on each platform
  • Output: time, $ (including the power bill!), "energy" (CO2 footprint, total power consumption in Joules); a minimal cost-model sketch follows this list

    • on the cloud: you don't pay for power (you already pay for the VMs)
    • on the local cluster: you pay for power, with some power-bill model
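
A minimal sketch of this accounting, assuming made-up constants for the VM price, node power draw, electricity price, and grid carbon intensity; none of these names or numbers come from the case study:

```python
import math

HOURLY_VM_RATE = 0.50     # $/VM-hour (assumed)
WATTS_PER_NODE = 200.0    # cluster node power draw (assumed)
DOLLARS_PER_KWH = 0.30    # local electricity price (assumed)
KG_CO2_PER_KWH = 0.7      # grid carbon intensity (assumed)

def cluster_metrics(makespan_h: float, num_nodes: int) -> dict:
    """Local cluster: you pay the power bill, and it has a CO2 footprint."""
    kwh = num_nodes * WATTS_PER_NODE * makespan_h / 1000.0
    return {"time_h": makespan_h,
            "dollars": kwh * DOLLARS_PER_KWH,
            "kg_co2": kwh * KG_CO2_PER_KWH}

def cloud_metrics(makespan_h: float, num_vms: int) -> dict:
    """Cloud: you pay by the (started) VM-hour, not for power."""
    return {"time_h": makespan_h,
            "dollars": num_vms * math.ceil(makespan_h) * HOURLY_VM_RATE,
            "kg_co2": 0.0}   # power is bundled into the VM price in this model
```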
  • TAB #1: NO REMOTE CLOUD, ONLY LOCAL DEDICATED CLUSTER

    • UI:
      • Picture of the Workflow
      • Picture of the cluster
        • some number of hosts, each with some number of cores (enough cores to run a full workflow sub-structure on one node?)
        • a local storage with all input data
      • Describe the WMS as using greedy/list-scheduling (a minimal sketch follows)
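
A minimal sketch of what "greedy/list-scheduling" means here, on a hypothetical Task type (not the WRENCH API): take ready tasks in list order and start each on the next free core, with no lookahead or reordering.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Task:
    name: str
    parents: tuple = ()   # hypothetical stand-in for workflow dependencies

def greedy_schedule(tasks, free_cores: int, completed: set) -> list:
    """Start ready tasks (all parents completed) in list order, one per free core."""
    started = []
    for task in tasks:
        if free_cores == 0:
            break
        if task not in completed and all(p in completed for p in task.parents):
            started.append(task)   # greedy: first come, first served
            free_cores -= 1
    return started
```

One would call this once per scheduling round, e.g. each time a task completes and frees a core.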

Input parameters to the simulator:

- [FIXED] workflow: fixed to be the big Montage workflow
- [FIXED] number of cores per node: fixed at 8
- number of nodes that are powered on: 1 to 64
- pstate of all hosts that are powered on

Output:

- execution time
- $ cost
- CO2 cost

Look at:

- 3-D plot of execution time vs. nodes/pstate
- 3-D plot of $ vs. nodes/pstate
- 3-D plot of CO2 vs. nodes/pstate (a scaling of the $ plot)
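These plots come from an exhaustive sweep over both input parameters. A minimal sketch of that sweep, where run_simulation() is a hypothetical placeholder (a toy model, not the actual simulator or real data):

```python
NUM_PSTATES = 4   # assumed number of available pstates

def run_simulation(nodes: int, pstate: int):
    """Placeholder for one simulator run; returns (exec time, $, CO2)."""
    time_h = 100.0 / (nodes * (1 + 0.2 * pstate))   # toy model, not real data
    kwh = nodes * 0.2 * time_h
    return time_h, kwh * 0.30, kwh * 0.7

# one data point per (nodes, pstate) pair, for each of the three 3-D plots
results = [(n, p, *run_simulation(n, p))
           for n in range(1, 65) for p in range(NUM_PSTATES)]
```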

Based on that: come up with interesting activities.

Questions:

- Activity #1: Run as fast as possible (all nodes, max pstate) ---> runs in XXX hours (or whatever it is)
  - See the time, $, CO2

- Activity #2: Your boss says it has to run in YYYY > XXXX hours
       - Option #1: turn off some cluster nodes and not use them  --->
                     binary search to find that it can be done with XXX nodes
       - Option #2: downclock all cluster nodes  --->
                     binary search to find XXXX GHz
       (see the binary-search sketch after this activity)

       - What about $ and CO2?
       - Conclude on what you should do.
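
A minimal sketch of the Option #1 binary search, assuming the execution time decreases monotonically with the number of powered-on nodes; simulated_time() is a hypothetical placeholder for a simulator run. Option #2 is the same search with the pstate as the knob.

```python
def simulated_time(nodes: int) -> float:
    """Placeholder for one simulator run (toy model, not real data)."""
    return 128.0 / nodes

def min_nodes_for_deadline(deadline_h: float, max_nodes: int = 64) -> int:
    """Smallest node count whose simulated time still meets the deadline."""
    lo, hi = 1, max_nodes
    while lo < hi:
        mid = (lo + hi) // 2
        if simulated_time(mid) <= deadline_h:
            hi = mid          # fast enough: try turning off more nodes
        else:
            lo = mid + 1      # too slow: need more nodes
    return lo
```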

- Activity #3: combine the two? (see the sketch after this activity)

        - STEP ONE: PICK A NON-STUPID # OF HOSTS
            - Parallel efficiency > XXX%
                    - run with 1 node: T(1)
                    - run with n nodes: T(n)
                    - efficiency: (T(1)/T(n))/n
                    - binary search for efficiency >= XXX%

        - STEP TWO: PICK THE CLOCK RATE
            - Binary search

        - By the way, with the BEST option, it's possible to get a $ cost of $XXX (we know this because we've run it exhaustively)
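
A minimal sketch of STEP ONE, assuming parallel efficiency decreases monotonically with the node count; simulated_time() is again a hypothetical placeholder:

```python
def simulated_time(nodes: int) -> float:
    """Placeholder: toy model with growing communication overhead."""
    return 100.0 / nodes + 0.1 * nodes

def max_efficient_nodes(threshold: float, max_nodes: int = 64) -> int:
    """Largest n with parallel efficiency (T(1)/T(n))/n >= threshold."""
    t1 = simulated_time(1)
    lo, hi = 1, max_nodes
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if (t1 / simulated_time(mid)) / mid >= threshold:
            lo = mid          # still efficient: try more nodes
        else:
            hi = mid - 1      # wasteful: back off
    return lo
```

STEP TWO then binary-searches the clock rate (pstate), exactly as in Activity #2.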
  • TAB #2: Now we have a cloud!
    • The cloud has infinite resources, and charges hourly, connected via some bandwidth
    • Only one kind of VM
    • Input: How many VMs should we lease and how should we partition the workflow?
    • Output: Time + Cost + CO2 (a cost-accounting sketch follows)
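
A minimal sketch of that accounting, assuming (hypothetically) that the input data is transferred first over the given link and that every VM is leased for the whole run, rounded up to whole hours; all constants are made up:

```python
import math

HOURLY_RATE = 0.50        # $/VM-hour (assumed)
LINK_MBPS = 1000.0        # cluster <-> cloud bandwidth (assumed)

def cloud_run(num_vms: int, compute_h: float, input_mb: float):
    """Returns (makespan in hours, $ cost) for the cloud part of the workflow."""
    transfer_h = (input_mb * 8.0 / LINK_MBPS) / 3600.0       # data must get there first
    makespan_h = transfer_h + compute_h
    dollars = num_vms * math.ceil(makespan_h) * HOURLY_RATE  # billed per started hour
    return makespan_h, dollars
```

Deciding how to partition the workflow then becomes a comparison of such estimates against the TAB #1 (local cluster) numbers.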

D.2. Case-Study: Reimplementing the application: Local + Batch (+ Energy/CO2 as above)

  • Given platform:
    • Two local machines (BareMetal)

      • some number of cores
      • CO2 (electrical bill for both local and remote)
    • Remote cluster: Batch with background load

      • Cost in $ per CPU-hour: it's really an allocation for which you pay in total, plus CO2 (a nice solar power plant?)
      • CAN'T HAVE MORE THAN 4 PENDING/RUNNING JOBS!
        • Evan's paper - light
      • Question: what do we know about the load?
        • Perhaps a deterministic load, and we say: the app can start at 3PM each day but must be complete by 7AM, and the batch load is lower at night?
        • Perhaps just non-deterministic, and show distributions over multiple runs?
          • leading to a "look at the load and decide what to do" strategy? could be interesting
    • Application: 1 sequential program: I (big) -> T -> O1 (tiny):

      • GOAL: Run it 1000 times before the end of the month (today is the 1st); a back-of-the-envelope sketch follows this list

      • Can be parallelized as a bunch of sequential tasks I->Tx->Ox, and then just concatenate the outputs (free)

        • TIME TO IMPLEMENT: ONE DAY
      • More work can be done to make each task multi-threaded:

        • TIME TO IMPLEMENT: ONE WEEK
          • with 2 threads: 1.5 efficiency
          • with 4 threads: 1.25 efficiency
          • Beyond that, not worth it
      • If using batch: send I to the cluster, and then: "How to group things into jobs?"

        • FIXED:
          • 1 TOTAL job
          • 1 job per task
          • Bundle tasks for 1-node jobs
          • Ask for multiple nodes?
        • SMART:
          • Look at load, and decide what to do...
          • TIME TO IMPLEMENT: ONE DAY
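
A back-of-the-envelope sketch of the local trade-off in this case study. Every constant below is a made-up assumption, not part of the case study, and the notes' "1.25 efficiency" is read here as a 1.25x speedup purely for illustration:

```python
RUNS = 1000
HOURS_LEFT = 30 * 24        # today is the 1st, deadline is the end of the month
TASK_H = 2.0                # one I -> T -> O1 run (assumed)
LOCAL_CORES = 2 * 8         # two local machines x 8 cores each (assumed)

def as_is():
    """No reimplementation: one sequential run at a time."""
    return RUNS * TASK_H

def parallel():
    """One day to implement the I -> Tx -> Ox split; fill all local cores."""
    return 24 + RUNS * TASK_H / LOCAL_CORES

def multithreaded(threads=4, speedup=1.25):
    """One week to implement; fewer concurrent tasks, each one faster."""
    return 7 * 24 + RUNS * (TASK_H / speedup) / (LOCAL_CORES // threads)

for option in (as_is, parallel, multithreaded):
    h = option()
    verdict = "meets" if h <= HOURS_LEFT else "misses"
    print(f"{option.__name__:>13}: {h:7.0f}h -> {verdict} the deadline")
```

With these made-up numbers, multithreading loses to plain task parallelism (1000 independent runs already fill every core), which matches the "beyond that, not worth it" note; the implementation time is the interesting knob.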

D.3. Case-Study: focus on Scheduling

  • Some way to construct a scheduling strategy (coordinator-worker on steroids; perhaps task dependencies, I/O, disks, etc.); a minimal skeleton follows
  • Optimize different metrics: energy, execution time, throughput, etc.
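
A minimal sketch of a coordinator-worker skeleton such a strategy could plug into; the pick_next() policy is the swappable part that would target energy, execution time, or throughput. All names here are hypothetical, not any existing API:

```python
import heapq

def coordinator(tasks, workers, pick_next):
    """Event-driven loop: each time a worker frees up, the policy picks its next task."""
    events = [(0.0, i) for i in range(len(workers))]   # (time worker i becomes free, i)
    heapq.heapify(events)
    pending = list(tasks)                              # tasks as (name, flops) pairs
    makespan = 0.0
    while pending:
        now, i = heapq.heappop(events)                 # next worker to become idle
        task = pick_next(pending, workers[i])          # the scheduling strategy lives here
        pending.remove(task)
        finish = now + task[1] / workers[i]["speed"]
        makespan = max(makespan, finish)
        heapq.heappush(events, (finish, i))
    return makespan

# e.g. a longest-task-first policy aimed at execution time:
tasks = [("t%d" % k, flops) for k, flops in enumerate([30, 10, 50, 20, 40])]
workers = [{"speed": 1.0}, {"speed": 2.0}]
print(coordinator(tasks, workers, lambda pend, w: max(pend, key=lambda t: t[1])))
```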

SLO Map