Page last updated on: 2019-10-14
CUNY HPCC operates a collection of clusters and servers that can run serial and parallel programs.
To apply for a CUNY HPCC computing account, you need to
- be affiliated with CUNY, and
- have an active research project.
Go to https://hpcreg1.csi.cuny.edu/forms/requirements.php to start the application.
The first page of the application asks about the computing resources you want to request, which is often hard for a new user to estimate.
How to estimate the resources you need?
Let's look at two examples. In the first scenario:
- I have 50 GB of experimental data that needs to be analyzed with factor analysis; the analysis results will then be used to redesign the experiment and optimize the experimental conditions.
- I plan to finish the first round of analysis within one month, and the second round, on the redesigned experiment's data, within one week.
- The analysis tools are written in Python because SciPy offers many data-analysis and optimization modules.
- The intermediate and final results will take twice as much space as the original data.
- The exact computing time of the analysis is unknown.
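As an aside, the analysis step itself might look something like the following minimal sketch. The document only says the tools are written in Python; scikit-learn's `FactorAnalysis` and the random data matrix are stand-ins chosen for illustration, not the author's actual pipeline.

```python
# Hypothetical factor-analysis step (stand-in for the real analysis code).
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))   # stand-in for one chunk of experiment data

fa = FactorAnalysis(n_components=5, random_state=0)
scores = fa.fit_transform(X)      # per-sample factor scores

print(scores.shape)               # (1000, 5)
print(fa.components_.shape)       # (5, 20) factor loadings
```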
The first thing to figure out is CPU time. Compared with simulation, this kind of data analysis takes very little time, so we budget 1 CPU hour per GB of data, or 50 CPU hours for 50 GB.
Code development itself may take much more time, so we multiply by 5, giving 250 CPU hours. Over two rounds of analysis, the total is 500 CPU hours.
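The CPU-hour estimate above is just arithmetic; spelled out as a sketch (the 1 hour/GB rate and the 5x development factor are the assumptions stated in the text):

```python
data_gb = 50        # experiment data size
hours_per_gb = 1    # assumed analysis cost: 1 CPU hour per GB
dev_factor = 5      # extra allowance for code development
rounds = 2          # two rounds of analysis

total_cpu_hours = data_gb * hours_per_gb * dev_factor * rounds
print(total_cpu_hours)  # 500
```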
The second thing is memory. When the exact memory requirement is unknown, 8 GB per core is a reasonable default for data analysis.
In this scenario the storage requirement is easy to work out. Each round needs 50 GB of input data plus 100 GB of intermediate and final results, i.e., 150 GB per round, or 300 GB over two rounds. Since only one round is active at a time, however, we need at most 150 GB at any given moment.
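The storage arithmetic, as a sketch (the 2x results-to-data ratio comes from the scenario description):

```python
data_gb = 50
results_factor = 2                             # results take twice the input size
per_round_gb = data_gb * (1 + results_factor)  # 150 GB active per round
total_gb = per_round_gb * 2                    # 300 GB over two rounds
peak_gb = per_round_gb                         # only one round active at a time

print(total_gb, peak_gb)  # 300 150
```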
Next, let's calculate how many CPU cores we need to finish the first and second round of calculation in the desired time frame:
- First round: 250 CPU hours in 30 days requires only 250/(30*24) = 0.35 CPU cores.
- Second round: 250 CPU hours in 7 days requires 250/(7*24) = 1.5 CPU cores.
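The core-count calculation above, rounding the larger requirement up to whole cores:

```python
import math

cpu_hours = 250
first_round_days, second_round_days = 30, 7

need_first = cpu_hours / (first_round_days * 24)    # ~0.35 cores
need_second = cpu_hours / (second_round_days * 24)  # ~1.49 cores

cores_to_request = math.ceil(max(need_first, need_second))
print(cores_to_request)  # 2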
So we request 2 CPU cores; they can be on the same node or on different nodes. We can fill the above information into the form. Now for the second scenario:
- I am a graduate student and will use Molecular Dynamics (MD) simulation to study a biological system over the next 4 years.
- The code is parallel and uses MPICH.
- My system of interest can be large, but the CPU and memory demands are unknown at this stage.
CPU time in this scenario can be estimated from the expected workload over 4 years. Suppose one 16-thread job runs half of the time over the 4-year period; 4 years at 50% utilization is equivalent to 2 full years, so the total is 16 x 24 x 365 x 2 = 280,320 CPU hours.
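The same estimate as a sketch, with the 50% utilization assumption made explicit:

```python
threads = 16        # cores used by one running job
utilization = 0.5   # job runs half of the time
years = 4

cpu_hours = threads * 24 * 365 * years * utilization
print(int(cpu_hours))  # 280320
```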
A memory allocation of 4 GB per core is reasonable for MD simulation.
The storage requirement depends on how many snapshots we need to save. Since we will regularly download the simulation results, we request 200 GB of revolving storage on the server.
For parallel jobs, we pack threads onto one node to minimize cross-node network traffic. Since it might be hard for the job scheduler to find 16 free cores on a single node, we will request 8 cores per node instead, so a 16-thread job spans 2 nodes.
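The node count implied by that choice, as a quick sketch:

```python
import math

threads = 16        # threads the MPI job needs
cores_per_node = 8  # what we expect the scheduler can realistically grant

nodes = math.ceil(threads / cores_per_node)
print(nodes)  # 2
```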