Java projects for extracting and searching for Wikipedia redirects (alternative titles)
Project created by Michael Gloger for a school assignment at FIIT STU Bratislava: http://vi.ikt.ui.sav.sk/User:Michael.Gloger?view=home
The main goal of this project was to implement a parser that finds alternative titles for Wikipedia pages by parsing the article XML dump files. Among other details, each page record contains the page title and a flag indicating whether the page is a redirect to another page. If a page is a redirect, its title can be treated as an alternative title of the page it refers to.
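The idea can be illustrated with a minimal sketch. This is not the code from this repository: the class and method names (`RedirectSketch`, `extractRedirects`) are hypothetical, and it assumes the MediaWiki export layout in which each `<page>` holds a `<title>` and, for redirects, an empty `<redirect title="..."/>` element. It uses the standard StAX streaming API, so the dump is read record by record instead of being loaded into memory.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration, not the parser shipped in this repository.
class RedirectSketch {

    /** Streams over page records and returns a map: alternative title -> target title. */
    static Map<String, String> extractRedirects(Reader xml) throws XMLStreamException {
        Map<String, String> redirects = new LinkedHashMap<>();
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
        String title = null;   // title of the page currently being read
        String target = null;  // redirect target of that page, if any
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                String name = reader.getLocalName();
                if (name.equals("page")) {
                    // A new record starts: forget the previous one.
                    title = null;
                    target = null;
                } else if (name.equals("title")) {
                    title = reader.getElementText();
                } else if (name.equals("redirect")) {
                    // Assumed layout: <redirect title="Target page"/>
                    target = reader.getAttributeValue(null, "title");
                }
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && reader.getLocalName().equals("page")) {
                // Only redirect pages contribute an alternative title.
                if (title != null && target != null) {
                    redirects.put(title, target);
                }
            }
        }
        reader.close();
        return redirects;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Tiny hand-written sample standing in for a real dump file.
        String sample =
              "<mediawiki>"
            + "<page><title>UK</title><redirect title=\"United Kingdom\"/></page>"
            + "<page><title>United Kingdom</title></page>"
            + "</mediawiki>";
        extractRedirects(new StringReader(sample))
            .forEach((alt, page) -> System.out.println(alt + " -> " + page));
    }
}
```

For a real dump, the `StringReader` would be replaced by a reader over the dump file, and the results would more likely be written out incrementally than collected in an in-memory map.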
Please note that this project does not add any exciting new functionality. Wikipedia already provides online services such as "What links here", which lists, among other things, the pages that refer to a specified page. The project was more of a challenge, because the input XML files are larger than 50 GB and contain more than 14 million page records.
This repository contains two Java projects: