## Analysis

The analysis has two parts.

#### Part A

  1. In this question, you will examine how the order of the model (the k-value), the length of the training text, and the number of characters generated affect the runtime of the brute-force code we give you. To do that, determine how long each of the following Markov trials takes to run (a timing-harness sketch follows this list):

    alice.txt, k = 1, 200 characters max:

    alice.txt, k = 1, 400 characters max:

    alice.txt, k = 1, 800 characters max:

    alice.txt, k = 5, 200 characters max:

    alice.txt, k = 5, 400 characters max:

    alice.txt, k = 5, 800 characters max:

    alice.txt, k = 10, 200 characters max:

    alice.txt, k = 10, 400 characters max:

    alice.txt, k = 10, 800 characters max:

    hawthorne.txt, k = 1, 200 characters max:

    hawthorne.txt, k = 1, 400 characters max:

    hawthorne.txt, k = 1, 800 characters max:

  2. Based on these results, what is the relationship between the runtime and the length of the training text, the k-value, and the max characters generated? How long do you think it will take to generate 1600 random characters using an order-5 Markov model when The Complete Works of William Shakespeare is used as the training text? (Our online copy of this text contains roughly 5.5 million characters.) Justify your answer using reasoning rather than empirical testing.

  3. Provide timings using your Map/Smart model for both creating the map and generating 200-, 400-, 800-, and 1600-character random texts with an order-5 model and alice.txt (the timing harness sketched after this list applies here as well). Provide some explanation for the timings you observe.

  4. Provide timings for the WordMarkovModel with different hashCode methods. Time the hashCode method you are given and compare those results with the better hashCode method you developed (a sample hashCode sketch follows this list).

  5. Using a k of your choice and clinton-nh.txt as the training text, if we do not set a maximum number of characters generated (you can effectively do this by making maxLetters a very large number, e.g., 1000 times the size of the training text), what is the average number of characters generated by our Markov text generator? (See the averaging sketch after this list.)
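For the timing questions (items 1 and 3), a harness along these lines may help. This is a minimal sketch: `MarkovModel` and `makeRandomText` are stand-in names for whatever class and method your brute-force code actually provides, so substitute your own.

```java
import java.nio.file.Files;
import java.nio.file.Path;

public class TimeTrials {
    public static void main(String[] args) throws Exception {
        String training = Files.readString(Path.of("alice.txt"));
        for (int k : new int[] {1, 5, 10}) {
            for (int max : new int[] {200, 400, 800}) {
                long start = System.nanoTime();
                // Hypothetical class and method names; substitute the
                // brute-force code you were given.
                MarkovModel model = new MarkovModel(training);
                model.makeRandomText(k, max);
                double secs = (System.nanoTime() - start) / 1e9;
                System.out.printf("k=%d, max=%d chars: %.3f s%n", k, max, secs);
            }
        }
    }
}
```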
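For item 4, one standard improvement is a polynomial combination of the n-gram's words. In the sketch below, `myWords` (an array of the n-gram's words) is an assumed field name; adapt it to your WordNGram's actual representation.

```java
// Inside WordNGram; myWords is an assumed String[] field holding the words.
@Override
public int hashCode() {
    int hash = 17; // arbitrary non-zero seed
    for (String word : myWords) {
        // 31 is the conventional multiplier used throughout java.util
        hash = 31 * hash + word.hashCode();
    }
    return hash;
}
```

If `myWords` is an array, `java.util.Arrays.hashCode(myWords)` computes essentially the same polynomial for you.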
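For item 5, the generated length varies from run to run, so average over many trials. The fragment below reuses the same hypothetical `MarkovModel` interface as the timing sketch; `training`, `k`, and `maxLetters` are assumed to be set as the item describes.

```java
// Average generated length over many runs; MarkovModel and makeRandomText
// are the same hypothetical names used in the timing sketch above.
int trials = 100;
long totalChars = 0;
MarkovModel model = new MarkovModel(training); // training = clinton-nh.txt contents
for (int i = 0; i < trials; i++) {
    totalChars += model.makeRandomText(k, maxLetters).length();
}
System.out.printf("average: %.1f characters%n", (double) totalChars / trials);
```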

#### Part B

The goal of the second part of the analysis is to compare the performance of WordMarkovModel using a HashMap (with the hashCode method you wrote) and a TreeMap (with the compareTo method you wrote). The main difference between them should be their performance as the number of keys (i.e., WordNGram objects used as keys in your map) grows large.

Set up a test with the lexicons we give you and a few of your own. Figure out how many keys each lexicon generates (for a specific n-gram size). Then generate some text and measure how long it takes; a sketch of such an experiment appears below. Graph the results: on one axis put the number of keys, on the other the time it took to generate a fixed number of words (you decide; choose something fairly large to get more accurate measurements). Your two lines will be HashMap and TreeMap. See whether you can detect any difference in their performance as the number of n-grams in the map grows; if you can't, that's fine. Provide some data values, a description of the shape of the graph, and an analysis in your README.txt.
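One way to structure the experiment is sketched below, under two assumptions: that WordMarkovModel can be handed its backing map through the constructor, and that it generates text with a makeRandomText call. The map's value type (a list of the words that follow each n-gram) is also a guess; adapt all of these to your own interface.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MapTimings {
    public static void main(String[] args) throws Exception {
        String training = Files.readString(Path.of("alice.txt"));
        int numWords = 10_000; // fixed, fairly large output for stable timings

        // TreeMap relies on the WordNGram.compareTo you wrote.
        List<Map<WordNGram, List<String>>> maps =
                List.of(new HashMap<>(), new TreeMap<>());

        for (Map<WordNGram, List<String>> map : maps) {
            long start = System.nanoTime();
            // Hypothetical constructor that lets the caller choose the
            // backing map, and a hypothetical generation method; adapt both.
            WordMarkovModel model = new WordMarkovModel(training, map);
            model.makeRandomText(5, numWords);
            double secs = (System.nanoTime() - start) / 1e9;
            System.out.printf("%s: %d keys, %.3f s%n",
                              map.getClass().getSimpleName(), map.size(), secs);
        }
    }
}
```

Printing `map.size()` alongside the elapsed time gives you both coordinates for each point on your graph: keys on one axis, generation time on the other, with one line per map type.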