This repo contains different university projects made by me, Alessio Mana and Fabrizio Sanino for our Algorithm and Data Structure Exam done in Turin in June 2021.
- Merge-BinaryInsertion Sort, written in C is about Algorithms Complexity, Sorting Algorithms and Generic Programming
- Edit Distance Algorithm, written in C is about Algorithms Complexity, Recursion and Data Structure (Hash Table)
- Union-Find-Set Structure, written in Java is about Data Structure
- Kruskal Algorithm, written in Java is about Algorithms Complexity, Kruskal Algorithm and Data Structure (Graph)
In this program we tried to combine two really important algorithms: Merge Sort and Binary Insertion Sort.
We were given of a file (.csv) containing 20mln records, with 3 different fields inside each record:
- id : (int) unique identifier;
- field1: (string) containing words extracted from Divina Commedia (we assumed the values don't contain commas or blanks);
- field2: (int);
- field3: (floating point);
Our goal was to sort by each field (increasingly or not) faster than we could, COMBINING Merge Sort and Binary Insertion Sort.
We developed Generic Code so that our library(contained in merge_binary_insertion_sort.c
) manages the sorting of all types (such as field1, filed2 and field3).
We used a constant (we called it K) which manages our program to switch between Merge Sort, with complexity O(n log n)
, and Binary Insertion Sort (basically an Insertion Sort were the array split is done by a binary split) with complexity O(n^2)
.
The goal was to find the best K value which permits the faster sorting time.
The idea behind is that, potentially, Insertion Sort could be faster than Merge Sort while working with really small size arrays.
It turned out to be 67.
So, the algorithm works like this: while the length of the array to sort is bigger than 67 uses Merge Sort, Binary Insertion Sort otherwise.
THE GRAPHIC WHICH FOLLOW IS AN EMPIRICAL RESULT (obtained by bench marking arrays long 500 elements), it could depends on the machine, or other stuffs; we combined the sorting time results (done one by one): integers with Binary Insertion Sort and Merge Sort, and floats with Binary Insertion Sort and Merge Sort.
- GCC compiler
sudo apt install gcc
- Make util
sudo apt install make
- Unity framework for unit testing (:warning: Copyright (c) 2007-14 Mike Karlesky, Mark VanderVoord, Greg Williams), see: https://github.com/ThrowTheSwitch/Unity.git
There is a Makefile which automatically compiles and runs the project and tests:
You just have to know two simple commands to use it. Once you have opened a shell in the root
folder, write:
- make all
to compile the application.
- make tests
to run the 49 unit tests using Unity Framework
- make clean
to clean the root folder by the obj files and the executables in the subdirectories
- make src=file.csv reverse=0 run
to run che application. src
is the file to be sorted name, stored in the src folder; reverse=0
sort by ascending order, 1 otherwise.
- CTRL-c
if you want to force the termination of the program.
src
folder were each record is organized like follow:
1,hello,2735414,68601.754091
Once exectued, the sorted files are printed and saved in the src
folder.
What if we write something wrong on our smartph0n3? The automatic corrector does the work for us.
Keeping this in mind we developed this algorithm.
Once got the file containing the text to correct and a Dictionary, we check (word by word) whether it's contained in the dictionary (which means that's correct) or not. If not, we use the Edit Distance Algorithm to correct the word.
The goal for this project was to compare a recursive version and a Dynamic Programming version of the same application.
We calculate the distance between the wrong word and every dictionary's word; the word with the shortest edit-distance will be the replacement of the wrong one (it may not make sense, doesn't matter).
The operations allowed are Insertion and Deletion of a char to/from the wrong word. An example will follow:
"Home" e "Hoome" have edit-distance equals to 1 (1 deletion)
We wrote two versions of the algorithm, the first is a Recursive One, the second works on Dynamic Programming.
We worked with an Open Indexing - Two Layers Hashing Function, Hash Table which mainteined couple <key, value>
.
- Key is the concatenation between the two strings taken in exam.
- Value rapresents the edit-distance between the two strings calculated by the recursive version.
example: <HomeHoome, 1>
This method permits us to obtain a more efficient application version. In fact, once calculated an edit-distance between two strings, the result is stored in the Hash Table. Its access is done in O(1)
which is far more efficient than calculate again the edit-distance.
At last we added another Hash Table containing the whole dictionary so that if a word is correct, the result is detected in an unique access to this Hash Table. example: <HomeHome, 0>
.
- GCC compiler
sudo apt install gcc
- Make util
sudo apt install make
- Unity framework for unit testing (:warning: Copyright (c) 2007-14 Mike Karlesky, Mark VanderVoord, Greg Williams), see: https://github.com/ThrowTheSwitch/Unity.git
There is a Makefile which automatically compiles and runs the project and tests:
You just have to know two simple commands to use it. Once you have opened a shell in the root
folder, write:
- make all
to compile the application.
- make tests
to run the unit tests using Unity Framework
- make clean
to clean the root folder by the obj files and the executables in the subdirectories
- make cor=correctme.txt dic=dictionary.txt run
to run che application. cor
is the file which you want to be corrected, dic
is the file which rapresents your dictionary, both stored in the bin folder.
- CTRL-c
if you want to force the termination of the program.
bin
folder (one word per row).
Once exectued, the corrected text is printed and saved in the Corrected_file.txt file contained in the bin
folder.
We developed a Generic-Type Java Library containing the Union-Find-Set structure.
Union-Find-Set, aka Disjoint-set, is a data structures which plays a key role in Kruskal's algorithm for finding the minimum spanning tree of a graph.
To maintain an efficient implementation the tree height is controlled using Union By Rank and Path Compression heuristics.
We developed Unit Tests too, thanks to:
- Hamcrest 1.3, (:warning: Copyright (c) 2000-2015 www.hamcrest.org, see:
https://github.com/hamcrest/JavaHamcrest.git
) - J-Unit 4.12, (:warning: Copyright (c), see:
https://github.com/junit-team/junit4.git
)
This Generic class has three interesting methods:
UnionFindSet<T> findSet()
: it returns the collections's representative on which is called.void link(UnionFindSet<T> x, UnionFindSet<T> y)
: the lower rank representative element will be linked to the higher rank representative element. This solution is adopted in order to allows the new collection's rank not to grow (or in the worst case: when the two linked collections have the same rank, to grow the less as possible).void union(UnionFindSet<T> y)
: it merges two collections.
- JRE
- ANT Java
There is a Ant file which automatically compiles the project and tests:
You just have to know few simple commands to use it. Follow README.txt instructions.
We developed a Kruskal implementation.
- Graph package: it implements a Spread Graph using an HashMap which simulates the Adjacent List approach:
HashMap<FromNode, Pair<ToNode, Weight>>
. Take a look to Report.pdf to see the more about methods implementation complexity. - unionFindSet package: already discussed.
- kruskal package: it implements the MST (Minimum Spanning Tree) of a graph passed as parameter.
- ItalyGraph.java: read a .csv passed as parameter which contains the distances between italian cities. (Couldn't load it due to Copyright reasons)
We developed Unit Tests too, thanks to: - Hamcrest 1.3, (:warning: Copyright (c) 2000-2015 www.hamcrest.org, see:
https://github.com/hamcrest/JavaHamcrest.git
) - J-Unit 4.12, (:warning: Copyright (c), see:
https://github.com/junit-team/junit4.git
)
- JRE
- ANT Java
There is a Ant file which automatically compiles the project and tests:
You just have to know few simple commands to use it. Follow README.txt instructions.
Giacomo Perlo: Linkedin, perlogiacomo@gmail.com
Alessio Mana: Linkedin, alessioma20@gmail.com
Fabrizio Sanino: Linkedin, fabrizio.sanino@studenti.polito.it
If you need help or want to know something more about all of this, we are ready and excited to help you!