Using Markov chains and NLTK to create a text-generation machine learning model.

zifuyang/Natural-Language-Processing

The goal is to generate reasonably sensible sentences by reading an input text file and building a dictionary that represents a Markov chain. The chain is second-order: each key is a pair of consecutive words, and each value records the words that follow that pair in the input. Values are lists, so a two-word key that is followed by different words at different places in the text keeps all of them. To generate new text, the program starts with the first two words of the input text as the initial key and selects the third word from that key's value. If the value contains more than one word, one is chosen at random. The program then updates the key (the second word of the old key plus the newly chosen word) and repeats, generating up to 500 words.
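The chain described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the names `build_chain` and `generate` are hypothetical, words are split on whitespace rather than tokenized with NLTK, the 500-word limit is assumed to include the two starting words, and stopping when a key has no recorded successor is an assumption the description does not spell out.

```python
import random
from collections import defaultdict


def build_chain(words):
    """Map each pair of consecutive words to the list of words that follow it."""
    chain = defaultdict(list)
    for first, second, following in zip(words, words[1:], words[2:]):
        # Duplicates are kept, so frequent successors are chosen more often.
        chain[(first, second)].append(following)
    return dict(chain)


def generate(words, chain, max_words=500, rng=random):
    """Generate up to max_words words, starting from the first two input words."""
    if len(words) < 2:
        return list(words)
    output = [words[0], words[1]]
    key = (words[0], words[1])
    while len(output) < max_words:
        candidates = chain.get(key)
        if not candidates:
            # Assumption: stop early if this pair never had a successor.
            break
        following = rng.choice(candidates)
        output.append(following)
        # Slide the two-word window forward by one word.
        key = (key[1], following)
    return output


if __name__ == "__main__":
    sample = "the cat sat on the mat and the cat sat down".split()
    chain = build_chain(sample)
    print(" ".join(generate(sample, chain)))
```

Storing successors in a list (rather than a set) means a word that follows a given pair more often in the input is proportionally more likely to be picked.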

The program prompts the user for an input file name and displays the generated text. It then prompts for an output file name; the user can discard the text by leaving the name blank. If a name is entered, the program writes the new text to that file with no more than 80 characters per line and without splitting words across lines.
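One way to get the 80-character wrapping and the optional save is Python's standard `textwrap` module. The sketch below is an assumption about how this step could look, not the repository's implementation; `wrap_text` and `save_text` are hypothetical names, and a single word longer than 80 characters would be left on its own overlong line rather than split.

```python
import textwrap


def wrap_text(text, width=80):
    """Wrap text to at most `width` characters per line without splitting words."""
    return textwrap.fill(
        text,
        width=width,
        break_long_words=False,  # never cut a word in the middle
        break_on_hyphens=False,  # keep hyphenated words on one line
    )


def save_text(text, filename):
    """Write wrapped text to filename; an empty filename discards the text."""
    if not filename.strip():
        return False
    with open(filename, "w", encoding="utf-8") as handle:
        handle.write(wrap_text(text) + "\n")
    return True


if __name__ == "__main__":
    generated = "some generated text " * 20
    print(wrap_text(generated))
    save_text(generated, input("Output file name (blank to discard): "))
```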