Grants:IdeaLab/Consolidating search results when query text is in Khmer

Consolidating search results when query text is in Khmer
Implement an algorithm that normalises the character sequence of search terms and Wikipedia content when written in Khmer ខ្មែរ​ script.
idea creator: Eltimbalino
this project needs: volunteer, developer, designer
created on 03:42, 15 January 2018 (UTC)


Audience

Who are the people you want to introduce Wikipedia to?

Try to be as specific as possible about this group.
People who predominantly use their native Khmer language to access information.

In what languages do they search for information, online or otherwise?

Khmer ខ្មែរ

In what ways does this group communicate with each other?

This can include services and apps in social media, mailing lists, physical spaces like conferences or lectures, or at specific institutions like at a library.
Facebook. Even business communication is done this way here.

What are some reasons this group would use Wikipedia? How would they benefit from it, or what would they find useful?

Think about what they would be interested in reading and learning about; does Wikipedia provide better access to information this group cares about?
Since the educated and intellectual classes were deliberately massacred between 1975 and 1979, Cambodia has continued to lack reliable information.

Project idea

What language Wikipedia projects will you promote to new readers?

Khmer ខ្មែរ

How will you communicate with new readers? Will you be communicating with them online, in-person, or both?

Enabling search results to be effective in Wikipedia

Describe your idea to engage new readers. How might it be implemented? What will you tell people about Wikipedia?

Think about the steps that might be involved to make this idea happen, and what you might teach people about your experience or others' experiences using Wikipedia.
From the request I posted to Google here: https://productforums.google.com/forum/#!topic/websearch/fenCCXsoZY4;context-place=topicsearchin/websearch/category$3Adesktop---other-please-specify

Khmer is the language and script used by the people of Cambodia.

Depending on the keystroke sequence used, the same word, with the same correct spelling, shows different and incomplete search results.

The Khmer script has been adapted to Unicode in an excellent manner that makes learning to type in Khmer easy, because it is forgiving about keystroke order. This matters for two reasons: different people type the same-looking word in different sequences, and the QWERTY keyboard behaves slightly differently across operating systems. Keymaps are not identical, but all are capable of producing the required words.

For example, a word meaning 'eat' is ញ៉ាំ, pronounced nyarm. It can be written with any of the following keystroke sequences. Note that the script on the left looks the same regardless. Note also that uppercase is achieved by SHIFT + keystroke, so the symbol " is created by SHIFT + '.
ញ៉ាំ J"am
ញុាំ JuaM
ញាុំ JauM
ញំុា JMua
ញំាុ JMau

Even though the Khmer words on the left look the same, pasting each version into Google Search as a search term generates completely different results. I am guessing that this is because Google indexes and searches based on the Unicode code point sequence, not the rendered script.
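To make the difference concrete, here is a minimal Python sketch that lists the code points behind the five spellings of 'eat' and counts how many distinct strings remain after Unicode's standard NFC normalisation. The escape sequences are a reading of the keystroke table in this section, and the names `variants` and `distinct_after_nfc` are only illustrative. The marks involved are expected to have combining class 0, in which case NFC does not reorder them, but that is worth verifying against the Unicode data shipped with your Python version.

```python
import unicodedata

# The five spellings from the keystroke table, written as escapes so the
# code point order is unambiguous.
variants = [
    "\u1789\u17c9\u17b6\u17c6",  # J"am
    "\u1789\u17bb\u17b6\u17c6",  # JuaM
    "\u1789\u17b6\u17bb\u17c6",  # JauM
    "\u1789\u17c6\u17bb\u17b6",  # JMua
    "\u1789\u17c6\u17b6\u17bb",  # JMau
]

# Show the underlying code points of each spelling.
for word in variants:
    print(" ".join(f"U+{ord(ch):04X}" for ch in word))

# Count how many distinct strings remain after standard NFC normalisation.
distinct_after_nfc = {unicodedata.normalize("NFC", w) for w in variants}
print(len(distinct_after_nfc), "distinct sequences after NFC")
```

If the count stays at five, a search engine that relies only on standard normalisation will treat each spelling as a different word, which would be consistent with the behaviour described above.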

This is a significant problem for the Khmer people because all available Internet searches fail to provide comprehensive and ordered results. This is exacerbated by the fact that very few, if any, Khmer people know that results are omitted due to keystroke order.

While not a simple task, an algorithm could be developed, based on the same rules that are implemented in Khmer Unicode, to normalise the character sequence of Khmer script when indexing. The same algorithm would then be used to normalise the search terms.
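As a rough illustration of that idea, the Python sketch below rewrites each run of Khmer marks into one fixed order and applies the same function both when indexing and when searching. Everything in it is an assumption made for illustration: the function names, the ordering classes in `mark_rank`, and especially the folding rule in `normalise_run`, which is derived only from the single 'eat' example and is not a vetted linguistic rule. A real implementation would need the full Khmer ordering rules and proper word segmentation, since Khmer text does not separate words with spaces.

```python
import re
from collections import defaultdict

MUUSIKATOAN, VOWEL_AA, VOWEL_U, NIKAHIT = "\u17c9", "\u17b6", "\u17bb", "\u17c6"

# A run of Khmer dependent marks (vowels and signs) following a base letter.
# COENG (U+17D2) is left out because it introduces a subscript consonant.
MARK_RUN = re.compile("[\u17b6-\u17d1\u17d3\u17dd]+")


def mark_rank(ch):
    """Assumed ordering: register shifters, then vowels, then other signs."""
    cp = ord(ch)
    if cp in (0x17C9, 0x17CA):
        return 0
    if 0x17B6 <= cp <= 0x17C5:
        return 1
    return 2


def normalise_run(match):
    marks = list(match.group())
    # Illustrative folding rule taken only from the 'eat' example: a U sign
    # typed together with AA and NIKAHIT is treated as MUUSIKATOAN.
    if {VOWEL_U, VOWEL_AA, NIKAHIT} <= set(marks):
        marks = [MUUSIKATOAN if m == VOWEL_U else m for m in marks]
    return "".join(sorted(marks, key=lambda m: (mark_rank(m), ord(m))))


def normalise_khmer(text):
    """Rewrite every mark run in one fixed order; other text is untouched."""
    return MARK_RUN.sub(normalise_run, text)


# Toy inverted index: the same normaliser is applied when indexing and
# when searching, so every spelling lands on the same key.
index = defaultdict(set)


def add_document(doc_id, text):
    # Splitting on whitespace is a simplification for this sketch.
    for word in text.split():
        index[normalise_khmer(word)].add(doc_id)


def search(term):
    return index.get(normalise_khmer(term), set())


add_document("page-1", "\u1789\u17c9\u17b6\u17c6")  # typed as J"am
add_document("page-2", "\u1789\u17b6\u17bb\u17c6")  # typed as JauM
print(search("\u1789\u17c6\u17bb\u17b6"))           # query typed as JMua
```

Because the normaliser runs on both sides, the query typed with one keystroke order finds pages typed with the others. The price of any folding rule like the one above is that genuinely different words could be merged, so each rule would need review by Khmer speakers.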

The result of implementing such an algorithm when Khmer script is detected would be a significant improvement in the availability of relevant, quality information to Khmer people. This is particularly important if you recall that all educated Cambodian people were systematically killed between 1975 and 1979, leaving an extreme knowledge gap that remains a significant problem for Cambodia.

Kind regards,

Tim.


Samples based on the script above (try pasting the script yourself): https://www.google.com.kh/?gws_rd=cr,ssl&ei=E0lhV_TXKcm00gSkhJX4CA#newwindow=1&q=%E1%9E%89%E1%9E%BB%E1%9E%B6%E1%9F%86&*

https://www.google.com.kh/?gws_rd=cr,ssl&ei=E0lhV_TXKcm00gSkhJX4CA#newwindow=1&q=%E1%9E%89%E1%9F%89%E1%9E%B6%E1%9F%86&*

https://www.google.com.kh/?gws_rd=cr,ssl&ei=E0lhV_TXKcm00gSkhJX4CA#newwindow=1&q=%E1%9E%89%E1%9E%B6%E1%9E%BB%E1%9F%86&*
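The q= values in these three links can be decoded to check that they really carry different code point sequences. This is a small Python sketch using the standard library; the variable names are illustrative, and the strings are copied from the links above.

```python
from urllib.parse import unquote

# The q= values copied from the three sample links above.
queries = [
    "%E1%9E%89%E1%9E%BB%E1%9E%B6%E1%9F%86",
    "%E1%9E%89%E1%9F%89%E1%9E%B6%E1%9F%86",
    "%E1%9E%89%E1%9E%B6%E1%9E%BB%E1%9F%86",
]

# Percent-decoding (UTF-8 by default) recovers the text sent to the search engine.
decoded = [unquote(q) for q in queries]
for text in decoded:
    print(text, [f"U+{ord(ch):04X}" for ch in text])
```

Each link starts with the same base letter (U+1789) but follows it with a different sequence of marks, so the three queries are different strings even though they render alike.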

How will you know if this project is successful? What are some outcomes that you can share after the project is completed?

If you’re not certain about how to respond to this question when starting your idea, you do not need to answer it right now. Campaign participants and Wikimedia Foundation staff can help you consider some options.
Searches in Khmer on Wikipedia will not end up with "No Results" as often. This is a very easy metric to track. The number of Khmer searches on Wikipedia should also grow more quickly.
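The "No Results" metric could be computed along these lines. This is only a sketch: the function name, the input format (one result count per Khmer search) and the sample numbers are all made up for illustration and do not come from real Wikipedia logs.

```python
def zero_result_rate(result_counts):
    """Share of searches that returned no results (0.0 for an empty log)."""
    if not result_counts:
        return 0.0
    return sum(1 for n in result_counts if n == 0) / len(result_counts)


# Made-up result counts per search, for illustration only.
before = [0, 0, 3, 0, 12, 0]
after = [4, 0, 3, 7, 12, 5]
print(zero_result_rate(before), zero_result_rate(after))
```

Comparing the rate for Khmer searches before and after a change to the search normalisation would show whether fewer searches end with no results.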

Do you think you can implement this idea? What support do you need?

Do you need people with specific skills or resources to complete this idea? Are there any financial needs for this project? Do you need advising from Wikimedia Foundation staff?
No. This is a technical issue that needs to be resolved by Wikipedia. It would be great if, once they have achieved this, they open-sourced their solution so that Google and other search engine providers could use it.

Get Involved

About the idea creator

I am an Australian living in Cambodia, and I am concerned about the difficulties Cambodians face when accessing reliable information.

Participants

Endorsements

Expand your idea

Would a grant from the Wikimedia Foundation help make your idea happen? You can expand this idea into a grant proposal.


No funding needed?

Does your idea not require funding, but you're not sure about what to do next? Not sure how to start a proposal on your local project that needs consensus? Contact Chris Schilling on-wiki at I JethroBT (WMF) or via e-mail at cschilling wikimedia.org for help!