Question Description
Q1. Classification
Write a function classify to conduct a classification experiment as follows:
- Take the training and testing file name strings as inputs, e.g. classify(training_file, testing_file).
- Classify text samples in the training file using a linear support vector machine as follows:
a. First, apply grid search with 6-fold cross-validation to find the best values for the parameters min_df, stop_words, and C (the penalty parameter of the SVM) used in the modeling pipeline. Use f1-macro as the scoring metric to select the best parameter values. Potential values for these parameters are:
min_df: [1, 2, 5]
stop_words: [None, "english"]
C: [0.5, 1, 5]
b. Using the best parameter values, train a linear support vector machine classifier with all samples in news_train.csv.
- Test the linear support vector classifier created in Step 2.b using the testing file. Compare the f1-macro score you obtain on the test dataset with the f1-macro score of the best model from grid search, and comment on whether the model is overfitted. Save your comment in a PDF file.
- Your function classify has no return value. However, when it is called, it prints the best parameter values from grid search and the testing precision, recall, and f1 score from Step 3.
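The grid search and test evaluation described above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a reference solution: it assumes scikit-learn, uses TfidfVectorizer as the vectorizer (Q1 does not name one), and takes in-memory lists of texts and labels because the column layout of the CSV files is not given here. The helper names grid_search_svm and evaluate_on_test are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC


def grid_search_svm(train_texts, train_labels):
    """Hypothetical helper: 6-fold grid search over min_df, stop_words and C."""
    # Assumption: TfidfVectorizer is the vectorizer; the question does not say.
    pipeline = Pipeline([
        ("tfidf", TfidfVectorizer()),
        ("clf", LinearSVC()),
    ])
    # Pipeline parameters are addressed as <step name>__<parameter name>.
    param_grid = {
        "tfidf__min_df": [1, 2, 5],
        "tfidf__stop_words": [None, "english"],
        "clf__C": [0.5, 1, 5],
    }
    search = GridSearchCV(pipeline, param_grid, scoring="f1_macro", cv=6)
    # With the default refit=True, the best pipeline is retrained on all
    # training samples after the search, which covers step b.
    search.fit(train_texts, train_labels)
    return search


def evaluate_on_test(search, test_texts, test_labels):
    """Hypothetical helper: macro precision, recall and f1 on the test set."""
    predicted = search.predict(test_texts)
    precision, recall, f1, _ = precision_recall_fscore_support(
        test_labels, predicted, average="macro", zero_division=0
    )
    return precision, recall, f1
```

Inside classify, one would print search.best_params_ and the three test metrics. Comparing search.best_score_ (the mean cross-validated f1-macro) with the test f1 gives the two numbers the overfitting comment is based on.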
Q2. How many samples are enough? Show the impact of sample size on classifier performance
Write a function "impact_of_sample_size" as follows:
Take the full file name path strings for training and test datasets as inputs, e.g.
impact_of_sample_size(train_file, test_file).
Starting with 300 samples from the training file, in each round you build a classifier with 300
more samples, i.e., in round 1 you use samples 0:300, in round 2 you use samples
0:600, …, until you use all samples.
In each round, do the following:
- create tf-idf matrix using TfidfVectorizer with stop words removed
- train a classifier using multinomial Naive Bayes model
- train a classifier using linear support vector machine model
- for each classifier, test its performance using the testing file and collect the following metrics: macro precision and macro recall. Note: make sure you use the same model parameters in all rounds.
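The per-round procedure above might look like the sketch below. It assumes scikit-learn, default model parameters for both classifiers, in-memory lists of texts and labels (the file layout is not specified here), and training samples ordered so that the first 300 already contain every class. The function name sample_size_curve and the result keys are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import precision_score, recall_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC


def sample_size_curve(train_texts, train_labels, test_texts, test_labels, step=300):
    """Hypothetical helper: train on growing prefixes and record test metrics."""
    n_total = len(train_texts)
    # Prefix sizes step, 2*step, ...; the last round always uses all samples.
    sizes = list(range(step, n_total + 1, step))
    if not sizes or sizes[-1] != n_total:
        sizes.append(n_total)

    results = []
    for size in sizes:
        # Refit the vectorizer each round so the vocabulary comes only from
        # the samples available in that round.
        vectorizer = TfidfVectorizer(stop_words="english")
        x_train = vectorizer.fit_transform(train_texts[:size])
        x_test = vectorizer.transform(test_texts)

        row = {"size": size}
        # The same (default) parameters are used in every round.
        for name, model in (("nb", MultinomialNB()), ("svm", LinearSVC())):
            model.fit(x_train, train_labels[:size])
            predicted = model.predict(x_test)
            row[name + "_precision"] = precision_score(
                test_labels, predicted, average="macro", zero_division=0
            )
            row[name + "_recall"] = recall_score(
                test_labels, predicted, average="macro", zero_division=0
            )
        results.append(row)
    return results
```

Each entry of the returned list holds the sample size and the four metric values for that round, which is the data the charts are drawn from.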
Draw a line chart (two lines, one for each classifier) showing the relationship between sample size and precision. Similarly, plot another line chart showing the relationship between sample size and recall.
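One way to draw each chart is sketched here, assuming matplotlib and that the sample sizes and per-classifier metric values have already been collected into lists. The function name plot_metric and its parameters are hypothetical.

```python
import matplotlib.pyplot as plt


def plot_metric(sizes, nb_values, svm_values, metric_name):
    """Hypothetical helper: one chart with a line per classifier."""
    fig, ax = plt.subplots()
    ax.plot(sizes, nb_values, marker="o", label="Naive Bayes")
    ax.plot(sizes, svm_values, marker="o", label="Linear SVM")
    ax.set_xlabel("Training sample size")
    ax.set_ylabel("Macro " + metric_name)
    ax.set_title("Sample size vs. " + metric_name)
    ax.legend()
    return fig
```

Calling plot_metric once with the precision values and once with the recall values, followed by plt.show(), would display the two charts.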
Write your analysis on the following:
How does sample size affect each classifier’s performance?
How many samples do you think each model needs for good performance?
How does the performance of the SVM classifier compare with that of the Naïve Bayes classifier as the sample size increases?
This function has no return value, but it should plot the charts.