Generalize RES configuration of login nodes and user/group json
Resolves #250

=====
Fix SlurmFsxLustre ingress rule.

CDK creates an egress rule without a matching ingress rule.

Resolves #252

=====

Fix FSxZ egress rules

Compensate for a bug in ParallelCluster that requires egress rules.

Leave the issue open so that the rules can be removed when the ParallelCluster bug is fixed.

Addresses #253

=====

Document FSx configuration

=====

Add IAM policy required to mount FSx

Add AttachRolePolicy, DetachRolePolicy for HeadNodePolicy.

Resolves #254

=====

Fix SNS notification bug when CreateParallelCluster Lambda is missing a parameter
cartalla committed Sep 10, 2024
1 parent 2d84608 commit 9968c71
Showing 13 changed files with 542 additions and 371 deletions.
@@ -122,6 +122,9 @@ def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
             for dst_sg_name, dst_sg in lustre_security_groups.items():
                 src_sg.connections.allow_to(dst_sg, ec2.Port.tcp(988), f"{src_sg_name} to {dst_sg_name} lustre")
                 src_sg.connections.allow_to(dst_sg, ec2.Port.tcp_range(1018, 1023), f"{src_sg_name} to {dst_sg_name} lustre")
+                # It shouldn't be necessary to do allow_to and allow_from, but CDK left off the ingress rule from lustre to lustre if I didn't add the allow_from.
+                dst_sg.connections.allow_from(src_sg, ec2.Port.tcp(988), f"{src_sg_name} to {dst_sg_name} lustre")
+                dst_sg.connections.allow_from(src_sg, ec2.Port.tcp_range(1018, 1023), f"{src_sg_name} to {dst_sg_name} lustre")
 
         # Rules for FSx Ontap
         for fsx_client_sg_name, fsx_client_sg in fsx_client_security_groups.items():
@@ -138,12 +141,21 @@ def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
                 fsx_client_sg.connections.allow_to(fsx_ontap_sg, ec2.Port.udp(4046), f"{fsx_client_sg_name} to {fsx_ontap_sg_name} Network status monitor for NFS")
 
             for fsx_zfs_sg_name, fsx_zfs_sg in zfs_security_groups.items():
-                fsx_client_sg.connections.allow_to(slurm_fsx_zfs_sg, ec2.Port.tcp(111), f"{fsx_client_sg_name} to {fsx_zfs_sg_name} rpc for NFS")
-                fsx_client_sg.connections.allow_to(slurm_fsx_zfs_sg, ec2.Port.udp(111), f"{fsx_client_sg_name} to {fsx_zfs_sg_name} rpc for NFS")
-                fsx_client_sg.connections.allow_to(slurm_fsx_zfs_sg, ec2.Port.tcp(2049), f"{fsx_client_sg_name} to {fsx_zfs_sg_name} NFS server daemon")
-                fsx_client_sg.connections.allow_to(slurm_fsx_zfs_sg, ec2.Port.udp(2049), f"{fsx_client_sg_name} to {fsx_zfs_sg_name} NFS server daemon")
-                fsx_client_sg.connections.allow_to(slurm_fsx_zfs_sg, ec2.Port.tcp_range(20001, 20003), f"{fsx_client_sg_name} to {fsx_zfs_sg_name} NFS mount, status monitor, and lock daemon")
-                fsx_client_sg.connections.allow_to(slurm_fsx_zfs_sg, ec2.Port.udp_range(20001, 20003), f"{fsx_client_sg_name} to {fsx_zfs_sg_name} NFS mount, status monitor, and lock daemon")
+                fsx_client_sg.connections.allow_to(fsx_zfs_sg, ec2.Port.tcp(111), f"{fsx_client_sg_name} to {fsx_zfs_sg_name} rpc for NFS")
+                fsx_client_sg.connections.allow_to(fsx_zfs_sg, ec2.Port.udp(111), f"{fsx_client_sg_name} to {fsx_zfs_sg_name} rpc for NFS")
+                fsx_client_sg.connections.allow_to(fsx_zfs_sg, ec2.Port.tcp(2049), f"{fsx_client_sg_name} to {fsx_zfs_sg_name} NFS server daemon")
+                fsx_client_sg.connections.allow_to(fsx_zfs_sg, ec2.Port.udp(2049), f"{fsx_client_sg_name} to {fsx_zfs_sg_name} NFS server daemon")
+                fsx_client_sg.connections.allow_to(fsx_zfs_sg, ec2.Port.tcp_range(20001, 20003), f"{fsx_client_sg_name} to {fsx_zfs_sg_name} NFS mount, status monitor, and lock daemon")
+                fsx_client_sg.connections.allow_to(fsx_zfs_sg, ec2.Port.udp_range(20001, 20003), f"{fsx_client_sg_name} to {fsx_zfs_sg_name} NFS mount, status monitor, and lock daemon")
+                # There is a bug in PC 3.10.1 that requires outbound traffic to be enabled even though ZFS doesn't need it.
+                # Remove when bug in PC is fixed.
+                # Tracked by https://github.com/aws-samples/aws-eda-slurm-cluster/issues/253
+                fsx_client_sg.connections.allow_from(fsx_zfs_sg, ec2.Port.tcp(111), f"{fsx_zfs_sg_name} to {fsx_client_sg_name} rpc for NFS")
+                fsx_client_sg.connections.allow_from(fsx_zfs_sg, ec2.Port.udp(111), f"{fsx_zfs_sg_name} to {fsx_client_sg_name} rpc for NFS")
+                fsx_client_sg.connections.allow_from(fsx_zfs_sg, ec2.Port.tcp(2049), f"{fsx_zfs_sg_name} to {fsx_client_sg_name} NFS server daemon")
+                fsx_client_sg.connections.allow_from(fsx_zfs_sg, ec2.Port.udp(2049), f"{fsx_zfs_sg_name} to {fsx_client_sg_name} NFS server daemon")
+                fsx_client_sg.connections.allow_from(fsx_zfs_sg, ec2.Port.tcp_range(20001, 20003), f"{fsx_zfs_sg_name} to {fsx_client_sg_name} NFS mount, status monitor, and lock daemon")
+                fsx_client_sg.connections.allow_from(fsx_zfs_sg, ec2.Port.udp_range(20001, 20003), f"{fsx_zfs_sg_name} to {fsx_client_sg_name} NFS mount, status monitor, and lock daemon")
 
         for sg_name, sg in security_groups.items():
             CfnOutput(self, f"{sg_name}Id",
55 changes: 55 additions & 0 deletions docs/deployment-prerequisites.md
@@ -441,3 +441,58 @@ slurm:
ansys:
Count: 1
```

### Configure File Systems

The `storage/ExtraMounts` parameter lets you mount additional file systems on the compute nodes.
The security groups of those file systems must allow connections from the compute nodes.
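
As a rough aid for checking that requirement, the sketch below lists the ports that the security group rules in this commit open for FSx for Lustre and FSx for OpenZFS, and reports which of them a given set of allowed rules does not cover. The names `PortRule`, `REQUIRED_PORTS`, and `missing_rules` are hypothetical helpers, not part of this repository, and ONTAP is left out because its full port list is not shown here.

```python
from typing import List, NamedTuple


class PortRule(NamedTuple):
    """One allowed port range for a protocol (hypothetical helper type)."""
    protocol: str  # "tcp" or "udp"
    from_port: int
    to_port: int


# Ports copied from the security group rules added in this commit.
REQUIRED_PORTS = {
    "FsxLustre": [
        PortRule("tcp", 988, 988),
        PortRule("tcp", 1018, 1023),
    ],
    "FsxOpenZfs": [
        PortRule("tcp", 111, 111),
        PortRule("udp", 111, 111),
        PortRule("tcp", 2049, 2049),
        PortRule("udp", 2049, 2049),
        PortRule("tcp", 20001, 20003),
        PortRule("udp", 20001, 20003),
    ],
}


def missing_rules(storage_type: str, allowed: List[PortRule]) -> List[PortRule]:
    """Return the required rules that no single allowed rule fully covers."""
    def covered(required: PortRule) -> bool:
        return any(
            rule.protocol == required.protocol
            and rule.from_port <= required.from_port
            and required.to_port <= rule.to_port
            for rule in allowed
        )

    return [rule for rule in REQUIRED_PORTS[storage_type] if not covered(rule)]
```

For example, `missing_rules("FsxLustre", [PortRule("tcp", 988, 988)])` reports that the 1018 to 1023 TCP range is still closed.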

#### Lustre

The following example shows how to add an FSx for Lustre file system.
You can find the mount information in the FSx console.

```
storage:
  ExtraMounts:
    - dest: /lustre
      src: <FileSystemId>.fsx.<Region>.amazonaws.com@tcp:/<MountName>
      StorageType: FsxLustre
      FileSystemId: <FileSystemId>
      type: lustre
      options: relatime,flock
```
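
If you generate the configuration from a script, the Lustre entry can be assembled as in the sketch below. The function `lustre_extra_mount` is a hypothetical helper, not part of this repository, and the file system ID, region, and mount name in the usage note are made-up placeholders. The field names and mount options mirror the example above.

```python
def lustre_extra_mount(file_system_id: str, region: str, mount_name: str,
                       dest: str = "/lustre") -> dict:
    """Build one ExtraMounts entry for FSx for Lustre (hypothetical helper)."""
    return {
        "dest": dest,
        # Same pattern as the example: <FileSystemId>.fsx.<Region>.amazonaws.com@tcp:/<MountName>
        "src": f"{file_system_id}.fsx.{region}.amazonaws.com@tcp:/{mount_name}",
        "StorageType": "FsxLustre",
        "FileSystemId": file_system_id,
        "type": "lustre",
        "options": "relatime,flock",
    }
```

Calling `lustre_extra_mount("fs-0123456789abcdef0", "us-east-1", "abcdefgh")` yields a dictionary that can be serialized under `storage/ExtraMounts`.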

#### ONTAP

The following example shows how to add an FSx for NetApp ONTAP file system.
You can find the mount information in the FSx console.

```
storage:
  ExtraMounts:
    - dest: /ontap
      src: <SvmId>.<FileSystemId>.fsx.<Region>.amazonaws.com:/vol1
      StorageType: FsxOntap
      FileSystemId: <FileSystemId>
      VolumeId: <VolumeId>
      type: nfs
      options: default
```

#### ZFS

The following example shows how to add an FSx for OpenZFS file system.
You can find the mount information in the FSx console.

```
storage:
  ExtraMounts:
    - dest: /zfs
      src: <FileSystemId>.fsx.<Region>.amazonaws.com:/fsx
      StorageType: FsxOpenZfs
      FileSystemId: <FileSystemId>
      VolumeId: <VolumeId>
      type: nfs
      options: noatime,nfsvers=3,sync,nconnect=16,rsize=1048576,wsize=1048576
```
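
The three examples differ mainly in whether a `VolumeId` is present. The sketch below checks an entry against the keys used in those examples. It is an assumption drawn only from the examples above, not from the project's configuration schema, so the real set of mandatory keys may differ. `COMMON_KEYS`, `EXTRA_KEYS`, and `missing_keys` are hypothetical names.

```python
# Keys that appear in every example above.
COMMON_KEYS = {"dest", "src", "StorageType", "FileSystemId", "type", "options"}

# Additional keys that appear only for some storage types in the examples.
EXTRA_KEYS = {
    "FsxLustre": set(),
    "FsxOntap": {"VolumeId"},
    "FsxOpenZfs": {"VolumeId"},
}


def missing_keys(entry: dict) -> list:
    """Return the sorted keys an ExtraMounts entry lacks, judged by the examples."""
    storage_type = entry.get("StorageType")
    if storage_type not in EXTRA_KEYS:
        raise ValueError(f"Unsupported StorageType: {storage_type!r}")
    expected = COMMON_KEYS | EXTRA_KEYS[storage_type]
    return sorted(expected - set(entry))
```

An OpenZFS entry without a `VolumeId` would be reported as `["VolumeId"]`, while a complete Lustre entry returns an empty list.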