I am interested in data mining certain sections of Wikipedia for a personal project of mine.
Have you previously extracted, formatted, and stored unformatted data from Wikipedia?
For example, I want to find all famous people from a certain city, grouped by occupation. I would like their name, dates of birth and death, a description of them, their country, their famous works, etc. stored in the database.
Basically, the job is converting Wikipedia XML dumps into structured MySQL tables that hold plain-text information rather than wiki markup.
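As a rough illustration of the kind of conversion I have in mind, here is a minimal Python sketch. It is not a finished design. Everything in it is an assumption made for illustration: the sample page, the infobox field names (birth_date, birth_place, and so on), the `people` table layout, and the helper names. It uses the standard-library sqlite3 module as a stand-in for MySQL so it runs without a database server, and it only handles simple one-line infobox fields and basic [[link]] markup.

```python
import io
import re
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample page shaped like a MediaWiki XML export (namespace omitted).
SAMPLE_DUMP = """<mediawiki>
  <page>
    <title>Jane Example</title>
    <revision>
      <text>{{Infobox person
| name = Jane Example
| birth_date = 1 January 1900
| death_date = 2 February 1980
| birth_place = [[Springfield]], [[Exampleland]]
| occupation = [[Novelist|Writer]]
}}
'''Jane Example''' was a [[novelist]].</text>
    </revision>
  </page>
</mediawiki>"""

# Matches simple one-line infobox fields such as "| key = value".
FIELD_RE = re.compile(r"^\|[ \t]*(\w+)[ \t]*=[ \t]*(.*)$", re.MULTILINE)
# Matches [[Target]] and [[Target|Label]], capturing the visible text.
LINK_RE = re.compile(r"\[\[(?:[^|\]]*\|)?([^\]]+)\]\]")


def strip_markup(value):
    """Turn basic wiki markup (links, bold, italics) into plain text."""
    value = LINK_RE.sub(r"\1", value)
    return value.replace("'''", "").replace("''", "").strip()


def local_name(tag):
    """Drop an XML namespace prefix such as '{http://...}page'."""
    return tag.rsplit("}", 1)[-1]


def parse_pages(stream):
    """Stream (title, fields) pairs so a large dump never sits in memory."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        fields = {key: strip_markup(val) for key, val in FIELD_RE.findall(text)}
        yield title, fields
        elem.clear()  # release the parsed page


def store(pages, conn):
    """Insert parsed pages into an assumed 'people' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS people ("
        "name TEXT, birth_date TEXT, death_date TEXT, "
        "birth_place TEXT, occupation TEXT)"
    )
    for title, f in pages:
        conn.execute(
            "INSERT INTO people VALUES (?, ?, ?, ?, ?)",
            (f.get("name", title), f.get("birth_date"), f.get("death_date"),
             f.get("birth_place"), f.get("occupation")),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for a MySQL connection
store(parse_pages(io.BytesIO(SAMPLE_DUMP.encode("utf-8"))), conn)
rows = conn.execute("SELECT name, birth_place, occupation FROM people").fetchall()
print(rows)
```

Real dumps are much messier than this (nested templates, multi-line values, date templates, redirects), so I would expect the actual solution to use a proper wikitext parser rather than regular expressions.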
Let me know and we can talk about details.
Please show samples of your previous work.