Our assignments are supported by a generous donation of AWS credits from Amazon.com.
Assignment 3: Generate text using LSTM
Due: 2016.11.10 :: 23:59:59 IST [Link to submit]
Problem Statement: Train an LSTM on Human Action by Ludwig von Mises. This book is supposedly the best defense of capitalism ever written and might make a good read for your winter vacation. For the assignment, you don't have to read it: just train an LSTM on it and generate five samples of random text that sound like the book. Submit a single zip archive with well-commented code, sample output, and any interesting observations you made while training the model.
Marks: 5 for well-documented code that runs, and 5 for the readability of the generated text.
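A character-level LSTM is one common way to approach this. The network itself would be built with a deep-learning library (for example Keras or PyTorch) and is not shown here. The sketch below assumes the book is available as one plain-text string and covers only the library-independent steps: encoding characters as integers, cutting the text into fixed-length input windows with next-character targets, and sampling new text from the model's softmax output with a temperature. The names `build_vocab`, `make_windows`, `sample_next`, `generate`, and `predict_probs`, and the defaults `seq_len=40`, `step=3`, `temperature=0.5`, are illustrative assumptions, not part of the assignment.

```python
import numpy as np


def build_vocab(text):
    """Map each distinct character to an integer index, and back."""
    chars = sorted(set(text))
    char_to_idx = {c: i for i, c in enumerate(chars)}
    idx_to_char = {i: c for c, i in char_to_idx.items()}
    return char_to_idx, idx_to_char


def make_windows(text, char_to_idx, seq_len=40, step=3):
    """Cut the text into overlapping windows of seq_len characters.

    Each window is one input sequence and the character right after it is
    its target. Returns integer arrays X of shape (n, seq_len) and y of shape (n,).
    """
    encoded = np.array([char_to_idx[c] for c in text], dtype=np.int64)
    starts = range(0, len(encoded) - seq_len, step)
    X = np.stack([encoded[s:s + seq_len] for s in starts])
    y = np.array([encoded[s + seq_len] for s in starts])
    return X, y


def sample_next(probs, temperature=1.0, rng=None):
    """Draw the next character index from a softmax output rescaled by temperature.

    temperature < 1 sharpens the distribution (closer to always picking the most
    likely character); temperature > 1 flattens it (more random text).
    """
    rng = np.random.default_rng() if rng is None else rng
    logits = np.log(np.asarray(probs, dtype=np.float64) + 1e-12) / temperature
    logits -= logits.max()  # avoid overflow in exp
    p = np.exp(logits)
    p /= p.sum()
    return int(rng.choice(len(p), p=p))


def generate(predict_probs, seed, char_to_idx, idx_to_char, length=200,
             temperature=0.5, rng=None):
    """Extend `seed` by `length` sampled characters.

    `predict_probs` is a hypothetical stand-in for the trained LSTM: it takes an
    integer-encoded window (the same length as the training windows, so `seed`
    should be seq_len characters long) and returns probabilities over the vocabulary.
    """
    rng = np.random.default_rng() if rng is None else rng
    window = [char_to_idx[c] for c in seed]
    out = []
    for _ in range(length):
        idx = sample_next(predict_probs(np.array(window)), temperature, rng)
        out.append(idx_to_char[idx])
        window = window[1:] + [idx]  # slide the window forward by one character
    return seed + "".join(out)
```

Lower temperatures usually give more repetitive but more readable text, so generating the five samples at a few different temperatures and comparing them is one possible source of the requested observations.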
Assignment 2: Classification of cat vs. dog images
Due: 2016.10.27 :: 23:59:59 IST [Link to submit]
Problem Statement: You are given a dataset of cat and dog images in the zipped file CatDogdata.zip [0.7GB]. Each image is of size 64 × 64 × 3 and is stored as one row of the trainX matrix in the file traindata.mat. The training labels are in the trainY matrix in the same file. You should use 3-fold cross-validation to create separate training and validation datasets. Testing images are provided as the rows of the matrix testX in the file testdata.mat; testing labels are not provided. You may use any method you wish to classify the test dataset.
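One way to set up the 3-fold cross-validation is sketched below in NumPy: shuffle the sample indices once, split them into three folds, and let each fold serve as the validation set exactly once. `k_fold_splits` and `cross_val_accuracy` are hypothetical helper names, and `fit` and `predict` are placeholders for whatever classifier you choose. Because `scipy.io.loadmat` returns at least 2-D arrays, labels such as `trainY` may need `.ravel()` before they are passed in.

```python
import numpy as np


def k_fold_splits(n_samples, k=3, seed=0):
    """Yield (train_idx, val_idx) index arrays for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    # Shuffle once, then cut into k nearly equal folds
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx


def cross_val_accuracy(X, y, fit, predict, k=3, seed=0):
    """Mean validation accuracy over k folds.

    `fit(X_train, y_train)` must return a model and `predict(model, X_val)`
    must return predicted labels; both are supplied by the caller.
    """
    scores = []
    for train_idx, val_idx in k_fold_splits(len(y), k, seed):
        model = fit(X[train_idx], y[train_idx])
        scores.append(np.mean(predict(model, X[val_idx]) == y[val_idx]))
    return float(np.mean(scores))
```

If the cat and dog classes are imbalanced, a stratified split (the same class ratio in every fold) is a reasonable refinement of this sketch.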
Hint: If you find that there are too few training images, try data augmentation: artificially increase the number of samples by mirroring the images or by slightly scaling or translating them.
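The mirroring and translation mentioned in the hint can be sketched in NumPy as follows, assuming the images have already been reshaped into an array of shape (N, 64, 64, 3) and the labels are in a matching array. The helper names and the default shift of 2 pixels are illustrative choices. Scaling is omitted here; it could be done with an interpolation routine such as `scipy.ndimage.zoom` followed by cropping or padding back to 64 × 64.

```python
import numpy as np


def mirror(images):
    """Flip a batch of images of shape (N, H, W, C) left to right."""
    return images[:, :, ::-1, :]


def translate(images, dx, dy):
    """Shift a batch (N, H, W, C) by dx pixels right and dy pixels down.

    Negative values shift left/up. Uncovered border pixels are set to zero.
    Shifts are assumed to be small (|dx| < W, |dy| < H).
    """
    out = np.zeros_like(images)
    h, w = images.shape[1:3]
    # (source slice, destination slice) along the height and width axes
    ys, yd = (slice(0, h - dy), slice(dy, h)) if dy >= 0 else (slice(-dy, h), slice(0, h + dy))
    xs, xd = (slice(0, w - dx), slice(dx, w)) if dx >= 0 else (slice(-dx, w), slice(0, w + dx))
    out[:, yd, xd, :] = images[:, ys, xs, :]
    return out


def augment(images, labels, shifts=((2, 0), (-2, 0), (0, 2), (0, -2))):
    """Return the originals plus a mirrored copy and shifted copies, with matching labels."""
    batches = [images, mirror(images)]
    batches += [translate(images, dx, dy) for dx, dy in shifts]
    return np.concatenate(batches), np.concatenate([labels] * len(batches))
```

Augmenting only the training folds, never the validation fold, keeps the validation accuracy an honest estimate.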
Here is sample Python code to read the .mat files:
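import scipy.io as sio
# Training set: images are the rows of trainX, labels are in trainY
traindata = sio.loadmat('traindata.mat')
trainX = traindata['trainX']
trainY = traindata['trainY']
# Test set: images are the rows of testX (labels are not provided)
testdata = sio.loadmat('testdata.mat')
testX = testdata['testX']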
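Each row of `trainX` and `testX` holds one flattened 64 × 64 × 3 image, but the page does not say in which order the pixels were flattened. The hypothetical helper below therefore supports both row-major unflattening (`order="C"`, NumPy's default) and column-major unflattening (`order="F"`, which is what MATLAB's `img(:)` would produce). Displaying a few reshaped images is a quick way to check which choice is right, since the wrong one typically gives transposed or scrambled pictures.

```python
import numpy as np


def rows_to_images(X, height=64, width=64, channels=3, order="C"):
    """Reshape an (N, height*width*channels) matrix into (N, H, W, C) images.

    order="C" assumes each row was flattened row-major (NumPy default);
    order="F" assumes column-major flattening, as MATLAB's img(:) produces.
    """
    X = np.asarray(X)
    n = X.shape[0]
    if order == "C":
        return X.reshape(n, height, width, channels)
    # Column-major: unflatten each row in Fortran order, keeping N as the first axis
    return np.stack([row.reshape(height, width, channels, order="F") for row in X])
```

For example, `train_images = rows_to_images(trainX)` would give an array of shape (N, 64, 64, 3) ready for augmentation or a convolutional network.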
Submission: Please submit the following as a single zipped file (limit 1 MB) or as a link to an accessible online folder (Dropbox, GDrive, OneDrive, etc.):
1. All code (4 marks for well-commented code, with dependencies and running instructions well documented). [5 marks]
2. A text file containing only 0 or 1 labels (one per line) for the test dataset. [5 * (your_accuracy / highest_accuracy - 0.5), rounded to the next 0.5 marks]
3. A report on the architectures and training methods you tried before the final architecture, and what you learned from each experiment. [5 marks]
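The label file for item 2 can be written with `numpy.savetxt`, which puts each element of a 1-D array on its own line. The helper `write_labels` and the file name are hypothetical. The sketch assumes label 1 means the same class as in `trainY`, and it thresholds at 0.5 so that it accepts either hard 0/1 predictions or predicted probabilities of class 1.

```python
import numpy as np


def write_labels(path, predictions):
    """Write one 0/1 label per line for the test set.

    `predictions` may be hard labels or probabilities of class 1; values are
    thresholded at 0.5 so only 0 or 1 is ever written.
    """
    labels = (np.asarray(predictions, dtype=float).ravel() >= 0.5).astype(int)
    np.savetxt(path, labels, fmt="%d")
```

A call such as `write_labels("test_labels.txt", predicted_probs)` should produce exactly as many lines as there are rows in `testX`.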
Assignment 1: HDLSS analysis of gene expression data
Due: 2016.08.31 :: 23:59:59 IST [Link to submit]
Problem Statement: The Gene Expression Omnibus (GEO) data series GSE4115 contains data from 192 human subjects, each with 22,283 profiled genes. Each subject has one of three disease states: cancer, no cancer, or suspected cancer. Your task is to build a classifier for cancer vs. no cancer using HDLSS (high-dimension, low-sample-size) techniques such as the elastic net.
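To illustrate what an elastic-net classifier does, the sketch below fits logistic regression with the penalty lam * (alpha * ||w||_1 + (1 - alpha)/2 * ||w||_2^2), which is the same parameterisation used by R's glmnet (alpha = 1 gives the lasso, alpha = 0 gives ridge). glmnet is the natural tool when starting from the provided R file; this Python version, solved by proximal gradient descent (ISTA), is only an illustration and not the course's reference method. It assumes features are standardised with training-set statistics and labels are coded 0 (no cancer) / 1 (cancer); function names and default values are illustrative.

```python
import numpy as np


def standardise(X_train, X_test):
    """Scale features using training-set mean and standard deviation only."""
    mu, sd = X_train.mean(axis=0), X_train.std(axis=0)
    sd[sd == 0] = 1.0  # leave constant features unscaled
    return (X_train - mu) / sd, (X_test - mu) / sd


def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def elastic_net_logistic(X, y, lam=0.1, alpha=0.5, n_iter=500):
    """Elastic-net logistic regression by proximal gradient descent (ISTA).

    Minimises  mean logistic loss + lam * (alpha * ||w||_1 + (1 - alpha)/2 * ||w||_2^2)
    for standardised X of shape (n, p) and y in {0, 1}. The intercept b is not penalised.
    """
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    # Step size 1/L, where L bounds the Lipschitz constant of the smooth part's gradient
    L = np.linalg.norm(np.column_stack([X, np.ones(n)]), 2) ** 2 / (4 * n) + lam * (1 - alpha)
    step = 1.0 / L
    for _ in range(n_iter):
        z = np.clip(X @ w + b, -500, 500)
        resid = 1.0 / (1.0 + np.exp(-z)) - y               # predicted P(y=1) minus label
        grad_w = X.T @ resid / n + lam * (1 - alpha) * w   # logistic loss + ridge gradient
        grad_b = resid.mean()
        # Gradient step on the smooth part, then soft-threshold for the L1 part
        w = soft_threshold(w - step * grad_w, step * lam * alpha)
        b -= step * grad_b
    return w, b
```

The nonzero entries of `w` are the genes the model selected; their column indices can be mapped back to probe IDs from the data. In practice `lam` and `alpha` would be chosen by cross-validation, as `cv.glmnet` does for `lambda` in R.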
You can use the attached R file as a starting point. [Here is a tutorial on R]
1. Install RStudio
2. Rename the provided R file by adding your roll number to its name
3. Run the file, following its comments to make any code or installation changes needed, until it runs without errors
Then, start writing your code at the bottom of the file.
You will be graded on:
1. Number of models tried (at least 4). Up to 2 marks.
2. Cleanliness, legibility, and comments in the code. Up to 2 marks.
3. Good machine learning protocol (e.g. n-fold cross validation). Up to 2 marks.
4. A report with the following: tables or figures showing how varying the hyperparameters affects accuracy (to show how the hyperparameters were selected), how the different methods compared, which genes were the most important in each technique, which genes were consistently important across techniques, what some of these genes mean, and how your results compare to the paper "Airway PI3K Pathway Activation Is an Early and Reversible Event in Lung Cancer Development", Gustafson et al., Sci Transl Med, 2010. Up to 4 marks.
5. Bonus marks (5) for doing any other insightful analysis.
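For the question of which genes were consistently important across techniques, one simple approach is to take each model's top-k genes by absolute coefficient (or another per-gene importance score) and count how often each gene appears. The sketch below does that and adds the Jaccard overlap between two top-k sets, which could fill a pairwise comparison table. The helper names, `k`, and the `min_models` rule are illustrative assumptions; coefficient magnitudes are only comparable when features are standardised, and column indices still have to be mapped back to probe or gene IDs from the dataset.

```python
import numpy as np


def top_k_features(coefs, k=20):
    """Indices of the k features with the largest |importance|, as a set."""
    order = np.argsort(-np.abs(np.asarray(coefs, dtype=float)), kind="stable")
    return set(order[:k].tolist())


def consistent_features(coef_by_model, k=20, min_models=None):
    """Features in the top-k of at least `min_models` models (default: all models).

    coef_by_model maps a model name to a vector of per-gene importances.
    """
    min_models = len(coef_by_model) if min_models is None else min_models
    counts = {}
    for coefs in coef_by_model.values():
        for j in top_k_features(coefs, k):
            counts[j] = counts.get(j, 0) + 1
    return sorted(j for j, c in counts.items() if c >= min_models)


def jaccard(a, b):
    """Overlap |a & b| / |a | b| between two sets of feature indices."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```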