Stata

Stata is a general purpose statistical software suite.

Stata homepage: stata.com/

Official Stata Documentation: stata.com/features/documentation/

Using Stata on M3

Important

Our Stata licenses allow Stata to use at most 8 CPU cores. Requesting more than 8 cores will not make jobs run faster; the extra cores will sit idle.
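If you want a job script to catch an oversized core request, one option is a small shell check near the top of the script. This is a minimal sketch. It assumes the job requests cores with -c, which makes Slurm set SLURM_CPUS_PER_TASK. The function name check_stata_cores and the variable STATA_MAX_CORES are hypothetical.

```shell
#!/bin/sh
# Hypothetical guard against requesting more cores than Stata can use.
# The limit of 8 comes from the M3 Stata license note.
STATA_MAX_CORES=8

check_stata_cores() {
    # Slurm sets SLURM_CPUS_PER_TASK when the job uses -c/--cpus-per-task;
    # assume 1 core if it is unset.
    requested="${SLURM_CPUS_PER_TASK:-1}"
    if [ "$requested" -gt "$STATA_MAX_CORES" ]; then
        echo "warning: $requested cores requested; Stata will use at most $STATA_MAX_CORES" >&2
        return 1
    fi
    return 0
}

# Example use in a job script: stop early instead of holding idle cores.
# check_stata_cores || exit 1
```

Exiting early is a design choice. Printing the warning and continuing would also work, because Stata simply will not use the extra cores.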

The files used in these examples are available:

on GitHub
on M3 at /hpc/m3/examples/stata/

Example submission script

Save the following job script as stata_example.sbatch and submit it from the command line with sbatch stata_example.sbatch.

This example should run in a few seconds and use very little memory, so we request 4 GB of memory and 5 minutes of run time, which leaves some room for error.

We know this from having run the job before. It is always a good idea to review the resources your jobs actually use and adjust the requests of future jobs to match.

#!/bin/bash
#SBATCH --job-name=stata_example       # Job name
#SBATCH --output=stata_example_%j.out  # job output, %j is the job id
#SBATCH --error=stata_example_%j.err   # job error output, %j is the job id
#SBATCH -p standard-s                  # Queue (partition) to run on
#SBATCH -t 0-00:05:00                  # Time in days-HH:MM:SS
#SBATCH --mem=4G                       # Total memory required per node
#SBATCH -c 8                           # number of cores

# unload all modules
# then load stata. Different versions may be available
module purge
module load stata/mp-18

# Run stata
stata-mp -b example.do

The job script above runs the Stata do-file example.do (on GitHub):

* Example Stata do-file 

* Create a text log file that stores the results
log using example.txt, text replace

* Read in the Stata data set example.dta
use example.dta

* Describe the variables in the data set
describe

* List the dataset 
list 

* Provide summary statistics of the variables in the data set
summarize

* Provide an X,Y scatterplot with a regression line
twoway (scatter cars hhsize) (lfit cars hhsize)

* Save the preceding graph in a file in PNG (Portable Network Graphics) format
graph export carsdata.png, replace

* Regress cars on hhsize
regress cars hhsize

* Close the text log file
log close
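In batch mode (-b), Stata writes the do-file's output to a log named after the do-file, so the job above also produces example.log in the directory it runs from. Stata's exit status in batch mode may not reflect errors inside the do-file, so one option is to scan that log once Stata finishes. This sketch assumes errors appear as Stata's return-code lines, such as r(601);, at the start of a line. The function name stata_log_ok is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: fail if a Stata batch log contains an error line
# such as "r(601);", which is how Stata reports a non-zero return code.
stata_log_ok() {
    stata_log="$1"
    if [ ! -f "$stata_log" ]; then
        echo "log file not found: $stata_log" >&2
        return 2
    fi
    if grep -Eq '^r\([0-9]+\);' "$stata_log"; then
        # Show the offending return-code lines on stderr.
        echo "Stata reported an error in $stata_log:" >&2
        grep -E '^r\([0-9]+\);' "$stata_log" >&2
        return 1
    fi
    return 0
}

# Example use in the job script, after the Stata call:
# stata-mp -b example.do
# stata_log_ok example.log || exit 1
```

When the check fails, the job script exits with a non-zero status, so Slurm marks the job as failed. That is easier to spot than a job that reports success while its log contains an error.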

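Slurm's sacct command shows what a finished job actually used. For example, sacct -j <jobid> --format=JobID,Elapsed,Timelimit,MaxRSS,ReqMem lists elapsed time, time limit, peak memory use and requested memory. MaxRSS is reported on the job's steps, such as the batch step. To compare Elapsed with a request such as -t 0-00:05:00, the hypothetical helper slurm_time_to_seconds below converts Slurm times of the form [days-]HH:MM:SS into seconds. It is a sketch that handles only those two forms.

```shell
#!/bin/sh
# Hypothetical helper: convert "[days-]HH:MM:SS" (as in -t 0-00:05:00 or
# sacct's Elapsed column) to a number of seconds.
slurm_time_to_seconds() {
    # Split on "-" and ":"; awk reads fields like "08" as decimal numbers.
    echo "$1" | awk -F'[-:]' '
        NF == 3 { print $1 * 3600 + $2 * 60 + $3; exit }              # HH:MM:SS
        NF == 4 { print $1 * 86400 + $2 * 3600 + $3 * 60 + $4; exit } # D-HH:MM:SS
        { print "unrecognized time: " $0 > "/dev/stderr"; exit 1 }'
}

# Example: 0-00:05:00 gives 300, so a job with Elapsed 00:00:05 used
# about 2% of its requested time.
# slurm_time_to_seconds 0-00:05:00
```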
Example submission script for array jobs

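It is also possible to submit array jobs. An array job runs the same job script several times from a single submission. Each run (task) gets its own index in the SLURM_ARRAY_TASK_ID environment variable, which the script can use to select a different Stata script or set of parameters.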

Save the following job script as stata_array_example.sbatch and submit it from the command line with sbatch stata_array_example.sbatch.

#!/bin/bash
#SBATCH -J stata_example                 # Job name
#SBATCH -p dev                           # Partition (queue)
#SBATCH --mem=4G                         # Total memory required per node
#SBATCH -t 0-00:05:00                    # time, days-HH:MM:SS
#SBATCH -c 8                             # number of CPU cores
#SBATCH -o stata_array_example_%A-%a.out # Job output; %A is job ID and %a is array index
#SBATCH --array=1-2                      # Range of indices to be executed

# unload all modules
# then load stata. Different versions may be available
module purge
module load stata/mp-18

# Run Stata. ${SLURM_ARRAY_TASK_ID} is the array index (1 or 2 here);
# edit the Stata script name as needed
stata-mp -b array_example_${SLURM_ARRAY_TASK_ID}.do

The job script above runs the Stata scripts array_example_1.do (on GitHub) and array_example_2.do (on GitHub). These scripts are identical for demonstration purposes.
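The script above relies on the do-files being numbered to match the array indices. When the names do not follow that pattern, one option is a plain text file listing one do-file per line, with each task picking the line that matches its index. This is a sketch, not a required layout. The list file dofiles.txt and the helper dofile_for_task are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the do-file named on line N of a list file.
dofile_for_task() {
    list="$1"
    idx="$2"
    # sed -n "Np" prints only line N (nothing if the file is shorter).
    dofile=$(sed -n "${idx}p" "$list")
    if [ -z "$dofile" ]; then
        echo "no do-file on line $idx of $list" >&2
        return 1
    fi
    printf '%s\n' "$dofile"
}

# Example use in the array job script (dofiles.txt is hypothetical):
# dofile=$(dofile_for_task dofiles.txt "${SLURM_ARRAY_TASK_ID}") || exit 1
# stata-mp -b "$dofile"
```

With this approach, the --array range should match the number of lines in the list file, for example --array=1-5 for five do-files. Each task still runs a different do-file, so each one writes its own batch log.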