
One quick hack to make government websites more language inclusive
Why making government websites available in all languages is a hard problem and a possible solution
If an Oriya speaker goes to rural.nic.in (MoRD's main portal) or omms.nic.in (PMGSY's main portal), they have to choose between Hindi and English. Most government websites catering to national schemes are used by a diverse set of stakeholders: citizens, Gram Panchayat functionaries, Data Entry Operators, and Block, District and State HQs. A Block officer in Gujarat uses the same portal as the citizen in Assam who is trying to figure out their benefits. Digitization of workflows has already consolidated power in the hands of the tech literate, and the language of communication makes these systems even more exclusive.
A simple example: our team generated PDF reports for Block level functionaries to print and use while completing a task on the ground. Each report listed a recommendation for each edge case, row-wise. One of the recommendations was "Form a new VO". Our product manager reported back from the field that a functionary was struggling to read the word "form" and thought it meant "from", which would change the meaning of the recommendation. The team's visit was meant to get feedback on the format, its columns etc., and we missed the basic fact that the whole report was in English. There are other examples where "Landless" was interpreted as "Landline", leading to Kafkaesque results, which I'll save for a later date.
Most central government websites are eventually available in Hindi and English. Some attempt to cater to more languages but generally limit the exercise to static public interfaces. If you go to COWIN and select Oriya, you'll find the resulting page is a mix of English and Oriya.
Departments often create word-to-word mapping dictionaries when they wish to make their content available in other languages. These mappings are cumbersome one-time exercises and don't cater to new content or dynamic reports. The process involves creating a spreadsheet of word-to-word mappings, sending it to the states for vetting and, if we get a reply, uploading it to our website. This is a broken system.
You can't use Google Translate either: it often breaks formatting and, more importantly, it doesn't understand when to translate and when to transliterate.
Let's take the example of a common MIS report which has the names of beneficiaries, their home districts and the status of their entitlements (With Block for Approval, Delivered, Not Submitted). Here the District and Beneficiary name columns need to be transliterated, but the Status column needs to be translated. You don't want the beneficiary name "Harsh" to be translated into "कठोर" (i.e. rude/mean); it should remain as "हर्ष". At the same time, you want "Not Submitted" to become "प्रस्तुत नहीं किया है".
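To make the distinction concrete, here is a minimal sketch of the per-column handling for the report described above. Everything named here is an assumption for illustration: the column names, the `COLUMN_ACTIONS` table, the one-entry glossary (only the pair quoted above is filled in) and the `transliterate` callable, which stands in for whatever transliteration engine one ends up plugging in.

```python
from typing import Callable, Dict

# Hypothetical rules for the example report: which columns get which treatment.
COLUMN_ACTIONS: Dict[str, str] = {
    "Beneficiary": "transliterate",
    "District": "transliterate",
    "Status": "translate",
}

# Hand-vetted status phrases. Only the pair quoted in the text is filled in;
# a real glossary would be much longer.
STATUS_GLOSSARY_HI: Dict[str, str] = {
    "Not Submitted": "प्रस्तुत नहीं किया है",
}


def localize_row(
    row: Dict[str, str],
    transliterate: Callable[[str], str],
    glossary: Dict[str, str] = STATUS_GLOSSARY_HI,
) -> Dict[str, str]:
    """Return a copy of the row with each cell translated, transliterated or left alone."""
    localized = {}
    for column, value in row.items():
        action = COLUMN_ACTIONS.get(column)
        if action == "translate":
            # Fixed vocabulary: look it up, keep the English if it is not vetted yet.
            localized[column] = glossary.get(value, value)
        elif action == "transliterate":
            # Proper nouns: keep the sound, never the dictionary meaning.
            localized[column] = transliterate(value)
        else:
            # Unknown columns (amounts, IDs, dates) pass through unchanged.
            localized[column] = value
    return localized
```

The point of passing `transliterate` in as a function is that the routing decision (which column gets which treatment) stays separate from the language engine itself, so the engine can be swapped without touching the rules.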
Although people have figured out language models that accurately translate and transliterate as separate tasks (https://github.com/libindic), they still don't solve our problem entirely: something has to decide which of the two to apply to each piece of text.
Many IT systems are also legacy. You can do a word-to-word mapping for the public static interfaces, but the workflows behind logins are a house of cards. On top of that, backend systems vary across schemes. A solution for SQL Server might not work for websites built on Spring or Angular.
A possible solution is an open-source browser plugin specifically trained to operate on sarkari MIS vocabulary and idiosyncrasies. It would run on the client side, so it wouldn't impact our legacy systems. Narrowing the use case to government vocabulary also makes the problem much easier to solve. A mix of unsupervised and rule-based logic could do the job: some custom rules for identifying column types in dynamic sarkari reports, hard-coded cases for translation/transliteration of common government vocabulary, and the rest left to the algorithm's best guess.
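As a rough illustration of the "custom rules for identifying column types" idea, here is a minimal sketch that guesses the treatment for a column from its values alone. It is written in Python for readability; an actual plugin would run equivalent logic in the browser. The function name, the 0.8 threshold and the three-phrase vocabulary (taken from the example report above) are assumptions, not a tested design.

```python
import re
from typing import Iterable, Set

# Hard-coded government vocabulary; here just the statuses from the example report.
KNOWN_STATUS_PHRASES: Set[str] = {
    "With Block for Approval",
    "Delivered",
    "Not Submitted",
}

# Cells made only of digits and number punctuation are left untouched.
_NUMERIC = re.compile(r"[\d\s,.%/-]+")


def classify_column(
    values: Iterable[str],
    glossary: Set[str] = KNOWN_STATUS_PHRASES,
    threshold: float = 0.8,
) -> str:
    """Guess whether a column should be translated, transliterated or skipped."""
    distinct = {v.strip() for v in values if v and v.strip()}
    if not distinct:
        return "skip"
    if all(_NUMERIC.fullmatch(v) for v in distinct):
        return "skip"
    # Rule: if most distinct values are known vocabulary, it is a status-like column.
    known = sum(1 for v in distinct if v in glossary)
    if known / len(distinct) >= threshold:
        return "translate"
    # Best guess for everything else: names and places, so keep the sound.
    return "transliterate"
```

For example, `classify_column(["Delivered", "Not Submitted"])` comes out as `"translate"`, while a column of beneficiary names falls through to `"transliterate"`. The fallback is deliberately the safer one: a transliterated status is merely unhelpful, whereas a translated name can be insulting.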
This would truly be a digital public good. But do solutions with no eventual pathways for market profits count anymore? :P
Submitted please.
PS. While this is the first noting on Substack, I am continuing the numbering from the blog. We all hate it when people start new sarkari files without back referencing.
Previous notes are available at my blog and you can check my tweets for more #sarkari micro-musings
If you do find a solution, do share. We have been struggling for transliteration from English to Tamil and vice versa.