Stata
Stata is a general-purpose statistical software suite.
Stata homepage: stata.com/
Official Stata Documentation: stata.com/features/documentation/
See also
For examples and tips on submitting jobs, see our SLURM documentation and Best Practices for Jobs
For compute resources, see HPC Queues
Using Stata on M3
Important
Our Stata licenses allow a job to use at most 8 CPU cores. Requesting more than 8 cores will not make jobs run faster.
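As a small illustration of the 8-core limit, the following sketch clamps a requested core count before it is used in a job request. The helper name cap_cores and the variable MAX_STATA_CORES are hypothetical and are not part of Slurm, Stata, or the M3 environment.

```shell
# Hypothetical helper: clamp a requested core count to the Stata license limit.
# MAX_STATA_CORES mirrors the 8-core limit stated above.
MAX_STATA_CORES=8

cap_cores() {
    requested=$1
    if [ "$requested" -gt "$MAX_STATA_CORES" ]; then
        # More cores than the license allows would sit idle
        echo "$MAX_STATA_CORES"
    else
        echo "$requested"
    fi
}

# Example: decide how many cores to request with -c
cap_cores 16
cap_cores 4
```

Here the first call yields 8 and the second yields 4, so a request above the limit is reduced while smaller requests pass through unchanged.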
Files used in the examples are available on M3 at:
/hpc/m3/examples/stata/
Example submission script
The following job script can be submitted from the command line using sbatch stata_example.sbatch
This example should run in a few seconds and use very little memory, so we request 4 GB of memory and 5 minutes of run time to leave some room for error.
We know this from having run the job. It is always a good idea to review the resources your jobs actually use and adjust the requests of future jobs accordingly.
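One way to review what a finished job actually used is Slurm's sacct command. The sketch below wraps it in a hypothetical helper named job_usage. It assumes job accounting is enabled on the cluster, and the job ID in the usage comment is a placeholder.

```shell
# Hypothetical helper: summarize the resources a finished job used.
# Elapsed and MaxRSS can be compared against the requested time (-t)
# and memory (ReqMem) to size future requests.
job_usage() {
    sacct -j "$1" --format=JobID,JobName,Elapsed,TotalCPU,AllocCPUS,ReqMem,MaxRSS,State
}

# Usage (replace 123456 with the job ID reported by sbatch):
#   job_usage 123456
```

If MaxRSS is far below ReqMem, or Elapsed is far below the time limit, later submissions can likely request less.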
#!/bin/bash
#SBATCH --job-name=stata_example # Job name
#SBATCH --output=stata_example_%j.out # job output, %j is the job id
#SBATCH --error=stata_example_%j.err # job error output, %j is the job id
#SBATCH -p standard-s # Queue (partition) to run on
#SBATCH -t 0-00:05:00 # Time in days-HH:MM:SS
#SBATCH --mem=4G # Total memory required per node
#SBATCH -c 8 # number of cores

# unload all modules
# then load stata. Different versions may be available
module purge
module load stata/mp-18

# Run stata
stata-mp -b example.do
The above job script runs the Stata script example.do (on GitHub).
* Example Stata do-file

* Create a text log file that stores the results
log using example.txt, text replace

* Read in the Stata data set example.dta
use example.dta

* Describe the variables in the data set
describe

* List the dataset
list

* Provide summary statistics of the variables in the data set
summarize

* Provide an X,Y scatterplot with a regression line
twoway (scatter cars hhsize) (lfit cars hhsize)

* Save the preceding graph in a file in PNG (Portable Network Graphics) format
graph export carsdata.png, replace

* Regress cars on hhsize
regress cars hhsize
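Because the job runs Stata in batch mode (-b), Stata also writes its output to a log named after the do-file (here example.log) in the working directory, in addition to the example.txt log the do-file opens itself. As far as we are aware, Stata in batch mode can exit with status 0 even when the do-file stops on an error, so it can be worth scanning the log for a return-code line such as r(601);. The helper below is a hypothetical sketch of that check and is not part of Stata or Slurm.

```shell
# Hypothetical helper: check a Stata batch log for error return codes.
# Returns 0 if no error line is found, 1 if a line such as "r(601);"
# is present, and 2 if the log file does not exist.
stata_log_ok() {
    [ -f "$1" ] || return 2
    if grep -Eq '^r\([0-9]+\);' "$1"; then
        return 1
    fi
    return 0
}

# Usage, for example at the end of the job script:
#   stata_log_ok example.log || echo "Stata reported an error; see example.log" >&2
```

This check only detects errors that Stata reports with a return code at the start of a line. It is an assumption-based convenience, not a substitute for reading the log.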
Example submission script for array jobs
It is also possible to submit array jobs, which run multiple parameter sets and/or Stata scripts from a single job submission.
The following job script can be submitted from the command line using sbatch stata_array_example.sbatch
#!/bin/bash
#SBATCH -J stata_example # Job name
#SBATCH -p dev # Partition (queue)
#SBATCH --mem=4G # Total memory required per node
#SBATCH -t 0-00:05:00 # time, days-HH:MM:SS
#SBATCH -c 8 # number of CPU cores
#SBATCH -o stata_array_example_%A-%a.out # Job output; %A is job ID and %a is array index
#SBATCH --array=1-2 # Range of indices to be executed

# unload all modules
# then load stata. Different versions may be available
module purge
module load stata/mp-18

# Run Stata
stata-mp -b array_example_${SLURM_ARRAY_TASK_ID}.do
# Edit Stata script name as needed; ${SLURM_ARRAY_TASK_ID} is array index
The above job script runs the Stata scripts array_example_1.do (on GitHub) and array_example_2.do (on GitHub).
Note that these scripts are identical for demonstration purposes.
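Each array task runs the do-file whose name contains its own index, so every index in the --array range needs a matching script. The sketch below shows that mapping and a check that could be run in the submission directory before calling sbatch. The helper names do_file_for_task and missing_do_files are hypothetical.

```shell
# Hypothetical helper: name of the do-file that a given array index runs,
# following the array_example_<index>.do pattern used in the job script.
do_file_for_task() {
    echo "array_example_$1.do"
}

# Hypothetical helper: print the do-files that are missing for the
# array indices first..last (both inclusive) in the current directory.
missing_do_files() {
    i=$1
    while [ "$i" -le "$2" ]; do
        f=$(do_file_for_task "$i")
        [ -f "$f" ] || echo "$f"
        i=$((i + 1))
    done
}

# Usage, matching --array=1-2:
#   missing_do_files 1 2
```

The check prints nothing when every expected script is present. Any file name it prints corresponds to an array task that would otherwise fail when Stata cannot find its do-file.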