Stata

Stata is a general-purpose statistical software suite.

Stata homepage: stata.com/

Official Stata Documentation: stata.com/features/documentation/

See also

For examples and tips on submitting jobs, see our SLURM documentation and Best Practices for Jobs.

For compute resources, see HPC Queues.

Using Stata on M3

Important

Our Stata licenses allow the use of at most 8 CPU cores. Requesting more than 8 cores will not make jobs run faster.
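One way to respect the 8-core limit inside a job script is to cap the core count before starting Stata. The following is a minimal sketch, not part of the official examples: `MAX_STATA_CORES` and `cap_cores` are hypothetical names, and it assumes Slurm has set `SLURM_CPUS_PER_TASK` because the job was submitted with `-c`.

```shell
#!/bin/bash
# Hypothetical helper: cap a requested core count at the licensed maximum.
MAX_STATA_CORES=8   # taken from the license note above

cap_cores() {
    # $1 is the number of cores requested (defaults to 1 if omitted)
    requested="${1:-1}"
    if [ "$requested" -gt "$MAX_STATA_CORES" ]; then
        echo "$MAX_STATA_CORES"
    else
        echo "$requested"
    fi
}

# SLURM_CPUS_PER_TASK is set by Slurm when the job requests cores with -c;
# fall back to 1 when running outside a job (an assumption for this sketch).
cores="$(cap_cores "${SLURM_CPUS_PER_TASK:-1}")"
echo "Stata/MP can use ${cores} core(s) in this job"
```

With this check in place, a job submitted with `-c 16` would report 8 usable cores, which makes the wasted half of the request visible in the job output.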

Files used in the example are available:

Example submission script

The following job script can be submitted using sbatch stata_example.sbatch from the command line.

This example runs in a few seconds and uses very little memory, so we request only 4 GB of memory and 5 minutes of run time, which still leaves some room for error.

Note: we know this from running the job. It is always a good idea to review the resources your jobs actually use and adjust future requests to match.

#!/bin/bash
#SBATCH --job-name=stata_example       # Job name
#SBATCH --output=stata_example_%j.out  # Job output; %j is the job ID
#SBATCH --error=stata_example_%j.err   # Job error output; %j is the job ID
#SBATCH -p standard-s                  # Queue (partition) to run on
#SBATCH -t 0-00:05:00                  # Time in days-HH:MM:SS
#SBATCH --mem=4G                       # Total memory required per node
#SBATCH -c 8                           # Number of CPU cores

# Unload all modules, then load Stata.
# Different versions may be available.
module purge
module load stata/mp-18

# Run Stata
stata-mp -b example.do

The above job script runs the Stata script example.do (on GitHub):

* Example Stata do-file

* Create a text log file that stores the results
log using example.txt, text replace

* Read in the Stata data set example.dta
use example.dta

* Describe the variables in the data set
describe

* List the data set
list

* Provide summary statistics of the variables in the data set
summarize

* Provide an X,Y scatterplot with a regression line
twoway (scatter cars hhsize) (lfit cars hhsize)

* Save the preceding graph in a file in PNG (Portable Network Graphics) format
graph export carsdata.png, replace

* Regress cars on hhsize
regress cars hhsize

* Close the log file opened at the top of the script
log close
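To follow the earlier advice about reviewing what a job actually used, Slurm's accounting command `sacct` can report elapsed time and peak memory once the job has finished. This is a sketch under assumptions: `job_usage` and `SACCT_FIELDS` are hypothetical names, job accounting is assumed to be enabled on the cluster, and the job ID is whatever `sbatch` printed at submission.

```shell
#!/bin/bash
# Standard sacct format fields: requested versus used time and memory.
SACCT_FIELDS="JobID,JobName,State,Elapsed,Timelimit,ReqMem,MaxRSS,AllocCPUS"

# Hypothetical helper: print an accounting summary for one job ID.
job_usage() {
    if [ -z "$1" ]; then
        echo "usage: job_usage <job_id>" >&2
        return 2
    fi
    if command -v sacct >/dev/null 2>&1; then
        sacct -j "$1" --format="$SACCT_FIELDS"
    else
        # Not on a Slurm system; nothing to query.
        echo "sacct not found; run this on a cluster login node" >&2
        return 127
    fi
}

# Example call (12345 is a placeholder job ID, not a real job):
# job_usage 12345
```

Comparing `Elapsed` with `Timelimit` and `MaxRSS` with `ReqMem` shows how much headroom the request left, which helps when sizing the next submission.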

Example submission script for array jobs

It is also possible to submit array jobs, which run several parameter sets and/or Stata scripts from a single job submission. Each array task receives its own index in the SLURM_ARRAY_TASK_ID environment variable.

The following job script can be submitted using sbatch stata_array_example.sbatch from the command line.

#!/bin/bash
#SBATCH -J stata_example                 # Job name
#SBATCH -p dev                           # Partition (queue)
#SBATCH --mem=4G                         # Total memory required per node
#SBATCH -t 0-00:05:00                    # Time in days-HH:MM:SS
#SBATCH -c 8                             # Number of CPU cores
#SBATCH -o stata_array_example_%A-%a.out # Job output; %A is job ID and %a is array index
#SBATCH --array=1-2                      # Range of indices to be executed

# Unload all modules, then load Stata.
# Different versions may be available.
module purge
module load stata/mp-18

# Run Stata
# Edit the Stata script name as needed; ${SLURM_ARRAY_TASK_ID} is the array index
stata-mp -b array_example_${SLURM_ARRAY_TASK_ID}.do

The above job script runs the Stata scripts array_example_1.do (on GitHub) and array_example_2.do (on GitHub). Note that these scripts are identical for demonstration purposes.
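Because each array task looks for a do-file named after its index, a missing file only shows up once that task starts. A minimal sketch of a guard that could replace the final `stata-mp` line follows; `do_file_for_task` is a hypothetical helper, and the fallback index of 1 is an assumption that only makes the snippet runnable outside Slurm.

```shell
#!/bin/bash
# Hypothetical helper: map an array index to the do-file naming scheme above.
do_file_for_task() {
    echo "array_example_${1}.do"
}

# SLURM_ARRAY_TASK_ID is set by Slurm for each array task; default to 1 elsewhere.
task_id="${SLURM_ARRAY_TASK_ID:-1}"
do_file="$(do_file_for_task "$task_id")"

if [ ! -f "$do_file" ]; then
    # Report the problem clearly instead of letting Stata fail on a missing file.
    echo "Task ${task_id}: ${do_file} not found, skipping" >&2
elif ! command -v stata-mp >/dev/null 2>&1; then
    echo "Task ${task_id}: stata-mp not on PATH (is the module loaded?)" >&2
else
    stata-mp -b "$do_file"
fi
```

Keeping the name mapping in one function means a change to the naming scheme, or to the `--array` range, only needs to be reflected in one place.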