# Using MATLAB to Grade

Educators use MATLAB a lot. In addition to using MATLAB for research, many professors and instructors use MATLAB for teaching,
including demonstrating and explaining concepts, creating class notes, and creating and collecting homework assignments and
exams. Today I will show how you might use a `dataset` array from Statistics Toolbox to grade a set of student results that are already recorded.

### What is a dataset Array?

A `dataset` array is basically a two-dimensional array where each column holds data of a single type, but the columns
together may comprise many different data types.

A cell array can do this, but does not enforce the idea that all the elements in a given column must have the same type. To have names with a cell array, you either need to carry around an extra array, or have a special row or column to contain the label.

A scalar structure can similarly perform this service, but it does not enforce the idea that each field must hold the same
number of rows. Additionally, a `dataset` array can have labels that allow you to reference data not just by numeric or logical indexing, but by names as well.
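
To make the comparison concrete, here is a minimal sketch using a made-up two-student sample (not the spreadsheet data used later). The variable names `c`, `s`, and `ds` are only for illustration, and the `dataset` call assumes Statistics Toolbox is installed.

```matlab
% Hypothetical sample data for illustration only.
names = {'Chris'; 'Kris'};
q1 = [0; 1];          % true/false answers stored as numbers
q2 = {'a'; 'b'};      % multiple choice answers stored as strings

% Cell array: holds mixed types, but the names live in a separate array
% and nothing forces a column to keep a single type.
c = [num2cell(q1), q2];

% Scalar structure: fields are named, but nothing forces them to have
% the same number of rows.
s.Q1 = q1;
s.Q2 = q2;

% dataset array: one type per column, equal-length columns, and
% row and column labels stored with the data.
ds = dataset({q1,'Q1'}, {q2,'Q2'}, 'ObsNames', names);

% Reference data by name rather than by position.
krisQ1 = ds.Q1('Kris');
```

The last line uses the same name-based indexing that the rest of this post relies on.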

### Why Use a dataset Array?

You might want to use a `dataset` array if the data you have is natural to think of as a collection of different but related entities, where the relationships between
the collections are more constrained (in this case, 1:1) than cells or structs require. The example I'm
showing here, grading an assignment, shows some of the simpler ways in which you might use a `dataset` array.

### Creating a dataset

I have an assignment with 5 questions, 2 T/F and 3 multiple choice (a:d). The questions have extraordinarily imaginative names.

```
qnames = {'Q1' 'Q2' 'Q3' 'Q4' 'Q5'};
```

I have collected the students' results in a sheet of a spreadsheet labeled "students" and the truth in a sheet labeled "truth".
I can read each of these directly into a `dataset` array so the results are available for analysis.

```
answers = dataset('xlsfile','class answers.xls',...
    'sheet','students','ReadObsNames',true,...
    'ReadVarNames',false,'VarNames',qnames)
```

```
answers = 
                   Q1    Q2     Q3    Q4     Q5 
    Chris          0     'a'    0     'c'    'b'
    Christine      0     'b'    0     'c'    'a'
    Christopher    1     'a'    0     'c'    'a'
    Kris           1     'a'    1     'b'    'c'
    Kristen        1     'a'    0     'd'    'a'
```

As you can see, all my students have similar names. In fact, I believe the most common root name for employees at MathWorks is this same root. Each row represents the results for a given student, and the columns represent the results for a given question.

Here are the "real" answers.

```
truth = dataset('xlsfile','class answers.xls',...
    'sheet','truth','ReadObsNames',true,...
    'ReadVarNames',false,'VarNames',qnames)
```

```
truth = 
               Q1    Q2     Q3    Q4     Q5 
    Answers    1     'a'    0     'c'    'a'
```
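
If you don't have the spreadsheet at hand, one way to follow along is to build equivalent arrays from literals. This is only a sketch: the values are copied from the displays above, and it assumes the spreadsheet import stores T/F answers as doubles and multiple choice answers as cell arrays of strings, as those displays suggest.

```matlab
% Student names, used as observation (row) names.
students = {'Chris';'Christine';'Christopher';'Kris';'Kristen'};

% One {data, name} pair per question, values copied from the display above.
answers = dataset({[0;0;1;1;1],'Q1'}, ...
                  {{'a';'b';'a';'a';'a'},'Q2'}, ...
                  {[0;0;0;1;0],'Q3'}, ...
                  {{'c';'c';'c';'b';'d'},'Q4'}, ...
                  {{'b';'a';'a';'c';'a'},'Q5'}, ...
                  'ObsNames', students);

% The answer key is a single row with the same variable names.
truth = dataset({1,'Q1'}, {{'a'},'Q2'}, {0,'Q3'}, ...
                {{'c'},'Q4'}, {{'a'},'Q5'}, ...
                'ObsNames', {'Answers'});
```

Because both arrays use the same variable names, they can be stacked vertically in the next step.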

### Merge All into a Single dataset

I am placing the answer key in the top row of my array so I have the truth handy for comparison later.

```
alldata = [truth; answers]
```

```
alldata = 
                   Q1    Q2     Q3    Q4     Q5 
    Answers        1     'a'    0     'c'    'a'
    Chris          0     'a'    0     'c'    'b'
    Christine      0     'b'    0     'c'    'a'
    Christopher    1     'a'    0     'c'    'a'
    Kris           1     'a'    1     'b'    'c'
    Kristen        1     'a'    0     'd'    'a'
```

### Transform the Underlying Data

As you can see, the answers show up as a mixture of string values, 1s, and 0s. In fact, I prefer to think of the 1s and 0s
as true and false. Also, the string values are the results from multiple choice questions, where the answers are limited
to values **a** through **d**. I would prefer to see the results reflected in the way I want to think about them. So I now transform the data, column
by column.

```
for q = qnames
    % Get all the values for a question.
    vals = alldata.(q{1});
    % Change numeric values to logical.
    if isnumeric(vals)
        alldata.(q{1}) = logical(vals);
    else
        % Change non-numeric values to the collection 'a':'d'
        alldata.(q{1}) = nominal(vals,[],{'a','b','c','d'});
    end
end
```

Here's a summary of the transformed data.

```
summary(alldata)
```

```
Q1: [6x1 logical]
     true     false 
        4         2 

Q2: [6x1 nominal]
     a     b     c     d 
     5     1     0     0 

Q3: [6x1 logical]
     true     false 
        1         5 

Q4: [6x1 nominal]
     a     b     c     d 
     0     1     4     1 

Q5: [6x1 nominal]
     a     b     c     d 
     4     1     1     0 
```

It shows me, by column, the size and type of the data and, in this case, a count of how many results there are for each possible value.

You may have noticed that I converted the multiple choice columns to a type called `nominal`. The idea of a `nominal` array is to constrain the values in the array to a specific collection of values. If all the acceptable values for the array
are represented in the data, you often don't need more than the first input. Since some of my columns did not include all
possible values a:d, I supplied these as the levels. Since they are strings, I chose to use them as the labels for the data
as well as the values.
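
Here is a small sketch of the difference the third input makes, using a made-up answer vector that happens to contain only **a** and **b**. It assumes the Statistics Toolbox functions `getlabels` and `levelcounts` are available for `nominal` arrays in your release.

```matlab
% Hypothetical answers; no 'c' or 'd' appears in the data.
vals = {'a'; 'b'; 'a'; 'a'};

% Levels inferred from the data: only the values that occur.
n1 = nominal(vals);

% Levels supplied explicitly: all four choices, even the unused ones.
n2 = nominal(vals, [], {'a','b','c','d'});

labels1 = getlabels(n1);    % labels for the inferred levels
labels2 = getlabels(n2);    % labels for the supplied levels
counts2 = levelcounts(n2);  % count per level, including zero counts
```

With the explicit levels, unused choices still show up (with a count of zero), which is why the summary above lists all of a:d for every multiple choice question.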

### Gathering Information Per Question

I am now poised to gather information either by question or by student. Let's first look at the correct answers for questions 1 and
4. The result is another `dataset` array.

```
q14Truth = alldata('Answers',{'Q1' 'Q4'})
```

```
q14Truth = 
               Q1      Q4
    Answers    true    c 
```

To get a single answer, I have another option. This returns a `nominal` value.

```
q4ans = alldata.Q4('Answers')
```

```
q4ans = 
     c 
```

I can get all the students' answers to a particular question. I get a `nominal` vector in return here since the `Q4` column contains the answers to a multiple choice question.

```
q4all = alldata.Q4(2:end)
```

```
q4all = 
     c 
     c 
     c 
     b 
     d 
```

```
whos q4*
```

```
  Name       Size            Bytes  Class      Attributes

  q4all      5x1               314  nominal              
  q4ans      1x1               306  nominal              
```

And I can also find out all answers for one student, resulting in another `dataset` array.

```
ChrisAnswers = alldata('Chris',:)
```

```
ChrisAnswers = 
             Q1       Q2    Q3       Q4    Q5
    Chris    false    a     false    c     b 
```

### Which Questions are Hard?

Suppose I want to find out which questions are hardest for this set of students. I can use the `datasetfun` function, similar to `cellfun` and `arrayfun`, to apply a function to each variable in the data. For each question, I compare the students' answers to the truth (row 1)
and compute the percentage of students who answered correctly.

```
f = @(x) 100*sum(x(1)==x(2:end))/(size(alldata,1)-1)
percentRight = datasetfun(f,alldata)
```

```
f = 
    @(x)100*sum(x(1)==x(2:end))/(size(alldata,1)-1)
percentRight =
    60    80    80    60    60
```
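
To see what the anonymous function does for one column, here is the same arithmetic on a plain vector. The vector `x` reproduces the `Q1` column shown earlier (answer key first, then the five students); `nStudents` and `pct` are names I made up for this sketch.

```matlab
% Q1 column: the answer key in row 1, followed by five student answers.
x = [1; 0; 0; 1; 1; 1];

% Number of students, excluding the answer key row.
nStudents = numel(x) - 1;

% Compare each student answer with the key and convert to a percentage.
% Three of the five students match the key, so pct is 60.
pct = 100*sum(x(1)==x(2:end))/nStudents;
```

`datasetfun` applies this same computation to each of the five question columns in turn.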

### Score the Assignments for Each Student

To score each student, I use `datasetfun` again, comparing all elements in a column for students (i.e., 2:end) with the first element of that column, the right answer. I also label the rows of the result with the students' names.

```
f = @(x) x(1)==x(2:end)
rightWrong = datasetfun(f,alldata,'DatasetOutput',true,...
    'ObsNames',alldata.Properties.ObsNames(2:end))
```

```
f = 
    @(x)x(1)==x(2:end)
rightWrong = 
                   Q1       Q2       Q3       Q4       Q5   
    Chris          false    true     true     true     false
    Christine      false    false    true     true     true 
    Christopher    true     true     true     true     true 
    Kris           true     true     false    false    false
    Kristen        true     true     true     false    true 
```

Next I sum across the rows to get scores for each student and add the score as the last column to the `dataset`. Remember: the first score is for the answer key so I set that score to 100.

```
alldata.grade = [100; ...
100*sum(double(rightWrong),2)/size(alldata,2)]
```

```
alldata = 
                   Q1       Q2    Q3       Q4    Q5    grade
    Answers        true     a     false    c     a     100  
    Chris          false    a     false    c     b      60  
    Christine      false    b     false    c     a      60  
    Christopher    true     a     false    c     a     100  
    Kris           true     a     true     b     c      40  
    Kristen        true     a     false    d     a      80  
```

Notice that `alldata` now contains a new column with numeric values in addition to the `nominal` and `logical` ones.
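
The scoring arithmetic can be checked on a plain logical matrix. In this sketch, `rightWrong` is retyped by hand from the display above (rows are students, columns are questions) rather than taken from the `dataset` array.

```matlab
% Right/wrong results, one row per student, one column per question.
rightWrong = logical([0 1 1 1 0;    % Chris
                      0 0 1 1 1;    % Christine
                      1 1 1 1 1;    % Christopher
                      1 1 0 0 0;    % Kris
                      1 1 1 0 1]);  % Kristen

% Count correct answers per row and divide by the number of questions.
nQuestions = size(rightWrong,2);
grade = 100*sum(rightWrong,2)/nQuestions;   % 60, 60, 100, 40, 80
```

One caveat: the grading code divides by `size(alldata,2)`, which is 5 only while `alldata` holds just the five question columns. If you rerun that statement after the `grade` column exists, the divisor becomes 6, so dividing by the number of questions (as here) may be a safer choice.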

### How Might You Use a dataset Array?

Can you see applications in which you'd be able to take advantage of working with your data as a `dataset` array? Let me know here.

**Category:** Education, New Feature