
project: Adding Indian place names to Spellcheck Dictionaries #5

Open
answerquest opened this issue Feb 17, 2018 · 0 comments

Where this is coming from:
https://etherpad.net/p/LibreOffice-Hackathon-Gnunify
This was discussed on 17 Feb at Gnunify 2018, in the session on hacking LibreOffice conducted by @geekgod.

Initial task list:

  1. Go to the District Census Handbook page: http://www.censusindia.gov.in/2011census/dchb/DCHB.html
  2. Download the Excel files for each state under the "Town Amenities" and "Village Amenities" headings.
  3. Find the worksheet and column for a. districts and b. sub-districts and, if desired, c. towns and d. villages.
  4. Extract the data, taking care to exclude header rows.
  5. Remove duplicates.
  6. Remove artefacts such as "(MC)", hyphens, asterisks, etc.
  7. Isolate entries with multiple words and decide how to handle them. One option is to add each word as a separate entry and then remove the duplicates.
  8. Diff against the existing dictionary to find the place words that it does not yet contain.
  9. Push this list to update the dictionary in LibreOffice and possibly elsewhere.
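
Steps 5 to 8 could be sketched roughly as below. This is only an illustration under several assumptions: the names have already been extracted from the spreadsheets into a list of strings, the existing dictionary is a Hunspell-style `.dic` file (a word count on the first line, then one entry per line with optional `/FLAGS`), hyphens are treated as word separators, and the comparison is case-insensitive. The function names (`clean_name`, `place_words`, `load_dic_words`, `missing_words`) are hypothetical and do not come from LibreOffice or any existing tool.

```python
import re

# Assumption: artefacts such as "(MC)" always appear as parenthesised tags.
PAREN_TAG = re.compile(r"\([^)]*\)")


def clean_name(raw):
    """Step 6: drop parenthesised tags, asterisks and hyphens from one raw value."""
    text = PAREN_TAG.sub(" ", raw)
    text = text.replace("*", " ").replace("-", " ")
    # Collapse the runs of whitespace left behind by the removals.
    return " ".join(text.split())


def place_words(raw_names):
    """Steps 5 and 7: split multi-word names and keep each distinct word once."""
    words = set()
    for raw in raw_names:
        for word in clean_name(raw).split():
            if word.isalpha():  # skip stray numbers and leftover punctuation
                words.add(word.title())
    return words


def load_dic_words(lines):
    """Read Hunspell-style .dic lines (simplified: ignores escaped slashes)."""
    words = set()
    for index, line in enumerate(lines):
        entry = line.strip()
        if not entry or (index == 0 and entry.isdigit()):
            continue  # blank line, or the word count on the first line
        words.add(entry.split("/")[0].split("\t")[0])
    return words


def missing_words(raw_names, dic_lines):
    """Step 8: place words not present in the dictionary, compared case-insensitively."""
    known = {word.lower() for word in load_dic_words(dic_lines)}
    return sorted(w for w in place_words(raw_names) if w.lower() not in known)


if __name__ == "__main__":
    sample_names = ["Pune (M Corp.)", "Navi Mumbai (MC)", "Pimpri-Chinchwad*", "Pune"]
    sample_dic = ["3", "Pune/M", "Mumbai", "hello"]
    print(missing_words(sample_names, sample_dic))
    # prints ['Chinchwad', 'Navi', 'Pimpri']
```

Splitting every multi-word name into separate words is just the option mentioned in step 7. Names whose parts are not meaningful on their own may need a different rule, so the resulting list is best reviewed by hand before step 9.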