
An Introductory Guide to SAS

An introductory online tutorial for beginners learning to use SAS.

Importing Data

Importing an Excel file

Syntax

proc import

   datafile = 'file path\filename.xlsx'

   out = new_dataset

   dbms = xlsx replace;
 run;

Example

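proc import

   datafile = 'c:\temp\sample.xlsx'

   out = sample

   dbms = xlsx replace;
 run;

The above code reads the Excel file "sample.xlsx" from the folder "c:\temp" and creates a new SAS dataset called "sample" from it. DBMS = XLSX tells SAS the file type, and REPLACE overwrites "sample" if it already exists. By default, the first row of the sheet supplies the variable names.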

 

Importing an SPSS file

Syntax

proc import

   datafile = 'file path\filename.sav'

   out = new_dataset

   dbms = SAV replace;
 run;

Example

proc import

   datafile = 'c:\temp\sample.sav'

   out = sample

   dbms = SAV replace;
 run;

The above code reads an SPSS file called "sample.sav" from "c:\temp" on the computer. It also creates a new dataset called "sample" based on the SPSS file. 

 

Importing a CSV file

Syntax

proc import

   datafile = 'file path\filename.csv'

   out = new_dataset

   dbms = CSV replace;
 run;

Example

proc import

   datafile = 'c:\temp\sample.csv'

   out = sample

   dbms = CSV replace;
 run;

The above code reads a CSV file called "sample.csv" from "c:\temp" and creates a new dataset called "sample" from it. By default, the first row of the file supplies the variable names.
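The CSV import can be tried end to end without an existing file. The sketch below is an illustration, not part of the original tutorial: it assumes a writable WORK library, writes a hypothetical three-row file toy.csv into the WORK folder with a DATA step, and reads it back into a hypothetical dataset called toy.

```sas
/* Hypothetical file location: toy.csv inside the WORK library folder */
%let toypath = %sysfunc(pathname(work))/toy.csv;

/* Write a small illustrative CSV file; the first row holds the names */
data _null_;
   file "&toypath";
   put 'id,age,gender';
   put '1,25,1';
   put '2,17,2';
   put '3,40,1';
run;

/* Read it back; GETNAMES = YES takes variable names from the first row */
proc import
   datafile = "&toypath"
   out = toy
   dbms = CSV replace;
   getnames = yes;
run;

proc print data = toy;
run;
```

The printed dataset should contain three observations and the variables id, age, and gender.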

 

Printing data

Syntax

proc contents data = dataset;

run;

Example

proc contents data = sample;

run;

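The above code prints a description of dataset "sample": the number of observations and, for each variable, its name, type (numeric or character), and length, plus any format or label. PROC CONTENTS does not print the data values themselves; use PROC PRINT (see Printing variables under Summarizing Data) for that.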
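As a sketch, the SASHELP.CLASS sample dataset that ships with SAS can stand in for "sample" (this assumes the SASHELP library is available). The VARNUM option lists variables in the order they are stored, and the OBS= dataset option limits PROC PRINT to the first few rows, which is a quick way to look at the actual values.

```sas
/* Describe the dataset: variables listed in their stored order */
proc contents data = sashelp.class varnum;
run;

/* Show the actual data values for the first 5 observations */
proc print data = sashelp.class (obs = 5);
run;
```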

Computing New Variables

Creating a new variable by copying an existing variable

Syntax

var_new = var_old;

Example

data sample_new;

   set sample;

   age_new = age;

run;

The above code creates a new dataset "sample_new" that contains all variables from "sample" plus a new variable "age_new", which is a copy of the existing variable "age".

 

Creating a new variable with a constant value

Syntax

var_new = constant_number;

Example

data sample_new;

   set sample;

   group = 1;

run;

The above code creates a new variable "group" that equals 1 for every observation.

 

Creating new variables by arithmetic calculations

Arithmetic Operators

Symbol   Definition
**       exponentiation
*        multiplication
/        division
+        addition
-        subtraction

Example

data calculation;

   set sample;

   ID_new = ID + 1;

   age_new = age - 2;

   point = total*5;

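   weight_kg = weight_lb/2.205;

   /* 703 converts weight in pounds and height in inches to BMI (kg/m2) */
   bmi = (weight / (height*height) ) * 703;

run;

The above code creates five new variables in dataset "calculation": ID_new adds 1 to ID, age_new subtracts 2 from age, point multiplies total by 5, weight_kg converts weight from pounds to kilograms, and bmi computes body mass index from weight (pounds) and height (inches). If any variable in a calculation is missing, the result is missing.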
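A worked sketch with hypothetical toy values (not from the tutorial) shows the operators in use. It assumes weight in pounds and height in inches, the units that the factor 703 expects. Because ** is evaluated before /, height_in**2 needs no extra parentheses. For the first row, 150 / 65**2 * 703 = 150 / 4225 * 703 ≈ 24.96.

```sas
/* Hypothetical toy data: weight in pounds, height in inches */
data bmi_demo;
   input id weight_lb height_in;
   weight_kg = weight_lb / 2.205;
   /* ** is evaluated before /, so height_in**2 is computed first */
   bmi = (weight_lb / height_in**2) * 703;
   datalines;
1 150 65
2 200 70
3 . 68
;
run;

proc print data = bmi_demo;
run;
```

The third row has a missing weight, so weight_kg and bmi are missing for it as well.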

 

Using if-then statement

Comparison and Logical Operators

Definition                  Symbol   Alternative
equal to                    =        EQ
not equal to                ^=       NE
greater than                >        GT
less than                   <        LT
greater than or equal to    >=       GE
less than or equal to       <=       LE
equal to one of a list               IN
logical and                 &        AND
logical or                  |        OR
logical not                 ^        NOT

Example

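data sample_new;

   set sample;

   if age >= 18 then adult = 1;

   else if not missing(age) then adult = 0;

   if age >= 18 & gender = 1 then group = 1;

run;

The above code creates a variable "adult" that is 1 for observations aged 18 or older and 0 for younger observations. The ELSE IF condition leaves "adult" missing when age is missing; SAS treats a missing numeric value as smaller than any number, so a plain "if age < 18" would code it as 0. The last statement sets "group" to 1 for observations aged 18 or older with gender equal to 1.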
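The sketch below, with hypothetical toy data and illustrative age cut-offs and race codes, combines an IF-THEN/ELSE chain with the IN operator from the table above. Each observation falls into exactly one age group, and a missing age is handled first so that it is not grouped with the children.

```sas
/* Hypothetical toy data; cut-offs and race codes are illustrative only */
data ifthen_demo;
   input id age race;
   /* IF-THEN/ELSE: each observation gets exactly one age group */
   if missing(age) then age_group = .;
   else if age < 18 then age_group = 1;
   else if age < 65 then age_group = 2;
   else age_group = 3;
   /* IN checks whether race equals any value in the list */
   if race in (1, 2) then race_12 = 1;
   else race_12 = 0;
   datalines;
1 10 1
2 30 3
3 70 2
4 . 4
;
run;

proc print data = ifthen_demo;
run;
```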

Sorting Data

Sorting data in ascending order

Syntax

proc sort data = dataset_old

   out = dataset_new;

   by var_name;

run;

Example

proc sort data = sample

   out = sample_ascending;

   by age;

run;

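The above code sorts the data by age in ascending order and saves the result as a new dataset "sample_ascending". Observations with a missing age come first. Without OUT =, PROC SORT replaces the original dataset with the sorted one.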

 

Sorting data in descending order

Syntax

proc sort data = dataset_old

   out = dataset_new;

   by descending var_name;

run;

Example

proc sort data = sample

   out = sample_descending;

   by descending age;

run;

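The above code sorts the data by age in descending order and saves the result as a new dataset "sample_descending". Observations with a missing age come last.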
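PROC SORT also accepts several BY variables. In this sketch, which assumes the SASHELP.CLASS sample dataset that ships with SAS, observations are sorted by sex in ascending order and, within each sex, by age in descending order. DESCENDING applies only to the variable that immediately follows it.

```sas
/* Sort by sex (ascending), then by age (descending) within each sex */
proc sort data = sashelp.class
   out = class_sorted;
   by sex descending age;
run;

proc print data = class_sorted;
   var name sex age;
run;
```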

Subsetting and Merging

Subsetting data

1. By variables

Example

data sample_new;

   set sample;

   keep number ID age;

run;

The above code creates a new dataset "sample_new" that keeps only the variables number, ID, and age from dataset "sample"; all observations are kept.
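Two other common ways to choose variables are sketched below, assuming the SASHELP.CLASS sample dataset (variables Name, Sex, Age, Height, Weight). The KEEP= dataset option on the SET statement reads only the listed variables, and the DROP statement names the variables to remove instead of the ones to keep. The output names class_keep and class_drop are hypothetical.

```sas
/* KEEP= dataset option: read only name and age from the input */
data class_keep;
   set sashelp.class (keep = name age);
run;

/* DROP statement: keep everything except height and weight */
data class_drop;
   set sashelp.class;
   drop height weight;
run;
```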

 

2. By dropping missing values

data sample_new;

   set sample;

   if missing(ID) then delete;

run;

The above code creates a subset from dataset "sample" that includes only the observations with a non-missing ID value. The MISSING function works for both numeric and character variables.

 

3. By multiple conditions

data sample_new;

   set sample;

   if age >= 18 & group = 1 then delete;

run;

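The above code creates a subset from dataset "sample" that deletes only the observations meeting both conditions (age 18 or older and group equal to 1). The observations kept are those with age less than 18 or a group number other than 1.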
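The same subset can be written with a subsetting IF that states which observations to keep. This sketch uses hypothetical toy data; both DATA steps keep ids 2, 3, and 4, because only id 1 meets both conditions.

```sas
/* Hypothetical toy data */
data cond_demo;
   input id age group;
   datalines;
1 20 1
2 20 2
3 15 1
4 15 2
;
run;

/* Delete observations that meet BOTH conditions */
data kept_delete;
   set cond_demo;
   if age >= 18 & group = 1 then delete;
run;

/* Equivalent subsetting IF: keep observations that do NOT meet both */
data kept_if;
   set cond_demo;
   if not (age >= 18 & group = 1);
run;
```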

 

Merging data

Syntax 

data dataset_new;

   merge dataset1 dataset2;

   by var;

 run;

Example

data sample_new;

   merge sample1 sample2;

   by id;

 run;

The above code merges the two datasets "sample1" and "sample2" into a new dataset "sample_new", matching observations that have the same value of the common variable "id". Both datasets must be sorted by "id" before the merge; otherwise SAS stops with an error that the BY variables are not properly sorted.
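A complete sketch of sort-then-merge, using hypothetical toy datasets merge_a and merge_b (named so they do not overwrite "sample1" and "sample2"). The IN= dataset options create temporary flags that show which dataset contributed each observation; keeping only observations with both flags set behaves like an inner join.

```sas
/* Hypothetical toy datasets sharing the key variable id */
data merge_a;
   input id age;
   datalines;
3 40
1 25
2 17
;
run;

data merge_b;
   input id score;
   datalines;
2 88
1 75
4 90
;
run;

/* Both inputs must be sorted by the BY variable before merging */
proc sort data = merge_a; by id; run;
proc sort data = merge_b; by id; run;

data merge_ab;
   merge merge_a (in = in_a) merge_b (in = in_b);
   by id;
   /* keep only ids present in both datasets */
   if in_a and in_b;
run;
```

Without the last IF statement, ids 3 and 4 would also appear, with missing values for the variables from the dataset that lacks them.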

 

Stacking data

Syntax

data dataset_new;

   set dataset1 dataset2;

 run;

Example

data sample_new;

   set sample1 sample2;

 run;

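The above code stacks the two datasets: the observations of "sample2" are appended after those of "sample1" in the new dataset "sample_new". A variable that exists in only one of the datasets is set to missing for the observations that come from the other.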
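When stacking, it is often useful to record which dataset each observation came from. This sketch, with hypothetical toy datasets stack_a and stack_b, uses the INDSNAME= option of the SET statement. The variable it names is temporary, so its value is copied into a regular variable that is saved in the output.

```sas
/* Hypothetical toy datasets */
data stack_a;
   input id age;
   datalines;
1 25
2 17
;
run;

data stack_b;
   input id age;
   datalines;
3 40
;
run;

data stacked;
   /* src holds the name of the current input dataset, e.g. WORK.STACK_A */
   set stack_a stack_b indsname = src;
   source = src;   /* copy it so it is kept in the output dataset */
run;

proc print data = stacked;
run;
```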

Summarizing Data

Printing variables

Syntax

proc print data = dataset;

   var var_name1 var_name2 var_name3;

run;

Example

proc print data = sample;

   var number age id;

run;

The above code prints values of variables number, age, and id from dataset "sample."

 

Summarizing continuous variables

Syntax

proc means data = dataset_name;

   var analysis_var;

   class group_var;

run;

Example 1

proc means data = sample mean median std min max maxdec = 2;

run;

The above code calculates the mean, median, standard deviation, minimum, and maximum of every numeric variable in dataset "sample", because no VAR statement is given. MAXDEC = 2 displays the results with at most 2 decimal places.

Example 2

proc means data = sample N mean median std min max maxdec = 2;

   var age;

   class gender;

run;

The above code calculates the number of nonmissing observations, mean, median, standard deviation, minimum, and maximum of variable age in dataset "sample", separately for each gender. MAXDEC = 2 displays the results with at most 2 decimal places.
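Summary statistics can also be saved to a dataset for later use. This sketch assumes the SASHELP.CLASS sample dataset; the output name height_stats and the statistic names n_height and mean_height are hypothetical. NOPRINT suppresses the printed table, and NWAY keeps only the rows for each level of the CLASS variable (not the overall row).

```sas
/* Save the count and mean of height for each sex to a dataset */
proc means data = sashelp.class noprint nway;
   class sex;
   var height;
   output out = height_stats n = n_height mean = mean_height;
run;

proc print data = height_stats;
run;
```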

 

Summarizing categorical variables

Syntax

proc freq data = dataset_name;

   tables var_name1 var_name2 var_name3;

run;

Example 1

proc freq data = sample;

   tables race;

run;

The above code creates a frequency table for variable race in dataset "sample". For each category it shows the frequency, percentage, cumulative frequency, and cumulative percentage.

Example 2

proc freq data = sample order = freq;

   tables race;

run;

The above code creates the same frequency table for variable race in dataset "sample", but with the categories sorted from the most to the least frequent.
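The counts can also be saved to a dataset with the OUT= option of the TABLES statement, which writes one row per category with the variables COUNT and PERCENT. This sketch assumes the SASHELP.CLASS sample dataset; the output name sex_counts is hypothetical.

```sas
/* Save the count and percentage of each category to a dataset */
proc freq data = sashelp.class;
   tables sex / out = sex_counts;
run;

proc print data = sex_counts;
run;
```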

Analyzing Data

Pearson Correlation

Syntax

proc corr data = dataset_name;

   var var_name1 var_name2;

run;

Example

proc corr data = sample;

   var math science read;

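run;

The above code computes the Pearson correlation coefficient, with its p-value, for each pair of the variables math, science, and read.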

 

Independent t-test

Syntax

proc ttest data = dataset_name;

   var outcome_var;      /* continuous outcome variable */

   class group_var;      /* grouping variable with exactly two levels */

run;

Example

proc ttest data = sample;

   var write;

   class gender;

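run;

The above code compares the mean of write between the two gender groups. The output includes both the pooled t-test (equal variances) and the Satterthwaite t-test (unequal variances), along with a test for equality of variances.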

 

Chi-Square Test

Syntax

proc freq data = dataset_name;

   tables var_row*var_column /chisq;

run;

Example

proc freq data = sample;

   tables sports*location /chisq;

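run;

The above code creates a two-way frequency table of sports (rows) by location (columns) and uses a chi-square test to check whether the two variables are associated.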

 

One Way ANOVA

Syntax

proc anova data = dataset_name;

   class group_var;

   model outcome_var = group_var;

run;

Example

proc anova data = sample;

   class school;

   model math = school;

run;

The above code tests whether the mean math score differs across schools.
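As a sketch that goes one step beyond the tutorial, the MEANS statement of PROC ANOVA with the TUKEY option compares every pair of group means after the overall F test. It assumes the SASHELP.CARS sample dataset, in which Origin has three groups (Asia, Europe, USA) and MPG_City is city fuel economy. QUIT ends PROC ANOVA, which otherwise stays active waiting for more statements.

```sas
/* One-way ANOVA of city MPG across the three Origin groups */
proc anova data = sashelp.cars;
   class origin;
   model mpg_city = origin;
   /* Tukey pairwise comparisons of the group means */
   means origin / tukey;
run;
quit;
```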

 

Linear Regression

Syntax

proc reg data = dataset_name;

   title "analysis_title";

   model outcome_var = independent_var;

run;

Example 1 (simple linear regression)

proc reg data = sample;

   title "example 1";

   model bmi = weight;

run;

Example 2 (multiple linear regression)

proc reg data = sample;

   title "example 2";

   model bmi = weight height age;

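run;

Example 1 models bmi using weight as the only predictor; example 2 adds height and age as predictors. Each TITLE statement labels the output of its analysis and stays in effect until it is replaced.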
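Predicted values and residuals can be saved for checking the model. This sketch assumes the SASHELP.CLASS sample dataset; the names reg_out, predicted, and residual are hypothetical. In the OUTPUT statement, P= names the predicted values and R= the residuals (observed minus predicted). QUIT ends PROC REG, which is interactive.

```sas
/* Multiple regression; save predicted values and residuals */
proc reg data = sashelp.class;
   model weight = height age;
   output out = reg_out p = predicted r = residual;
run;
quit;

proc print data = reg_out;
   var name weight predicted residual;
run;
```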

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.