Wiktionary future

See also wikidata:Wikidata:Wiktionary and Wikidata/Notes/Future.

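This page aims to gather ideas for the future of Wiktionary. The introduction of Wikidata opens up many possibilities. Moreover, the proposal to adopt OmegaWiki adds to the current excitement about projects with overlapping goals.

The goal is first to set out the specifics of each project and where they overlap. Proposals can then be discussed on how to pursue the non-overlapping goals and how to avoid wasting resources on duplicated work. Finally, this page should be used to present and discuss long-term plans for Wiktionary.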


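Organization of this page

To help this page grow in a useful way, a recommended structure is proposed. Of course, you may share your own ideas for organizing the document on the discussion page.

Each proposal section should be placed under "Proposals" in the following format:

Proposal title
A level 2 section (===) that gives a hint about the topic.
Heading "Abstract"
A level 3 section (====) that gives a very short description of the proposal.
Heading "Current situation"
A level 3 section (====) that explains the drawbacks of the current situation.
Heading "Recommended improvements"
A level 3 section (====) that explains them, possibly using level 4 (=====) subheadings.
Heading "Comments on the proposal"
A level 3 section (====), initially empty, to be filled in by others.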

Here is a template:

=== Proposal title ===

==== Abstract ====

==== Current situation ====

==== Recommended improvements ====

==== Comments on the proposal ====

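Proposals

Gathering information on Wiktionary projects

Abstract

To make proper recommendations for the future of Wiktionary, the first task is to establish the state of the art. The document should provide summaries and references on:

  • Wiktionary: initial goals and implementation choices.
  • Current state of the Wiktionaries: similarities and differences in how articles are structured in each chapter.
  • A list of related projects such as OmegaWiki and Semantic MediaWiki: why they emerged out of Wiktionary, what their goals are, and which of them could be merged into a single project.
  • What would be the pros and cons of a more structured Wiktionary? Structure can obviously help with automated feedback between chapters. On the other hand, the most important part of a MediaWiki project is its community. A cross-chapter structure could cause community dissatisfaction through loss of control, especially if existing local specificities become impossible within that structure. Ways to prevent such situations should be investigated: broad feedback campaigns, flexible structures, etc.

Current situation

As far as is known, no such report exists yet.

Recommended improvements

The report should be written. It could be proposed as a mentorship project idea.

Comments on the proposal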

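Make a list of expected usages

Abstract

Depending on how the data is structured, certain uses become easier or harder. A list of expected uses therefore helps in designing a structure that is easy to use. Of course, it is impossible to guess every future use imaginable, but considering at least all the ideas that can be captured now will help.

Current situation

Extracting structured information from articles and database dumps can be extremely difficult, because the content is focused only on generating Wiktionary HTML pages and sometimes relies heavily on templates that can hardly be interpreted outside the MediaWiki engine.

Recommended improvements

List other use cases for Wiktionary content and derive the requirements needed to make them easier.

Comments on the proposal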

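Improving the Wiktionary data model

Abstract

The wiki way has largely proved its pros, but cons remain. The communities are willing to overcome them, for example by structuring some elements and centralizing common data, as with the Wikidata initiative.

Current situation

Each language version of Wiktionary has its own way of structuring data, which makes it harder to share common data such as translations, pronunciations, and word relations like synonyms and antonyms.

Recommended improvements

Involve all Wiktionary contributors in the global structuring of Wiktionary entries. With defined ways to extend the structured data, each community could keep developing the specificities it needs while remaining a good actor in the global cross-version collaborative work.

  • Define the uses that Wiktionaries should make easy
  • Define Wiktionary entries:
    • mandatory data
    • a way to tell whether data is unstable
      • not relevant to the entry
      • needs contribution
      • (enter other ideas here)
  • (add other ideas)

Proposal ideas

Comments on the proposal

Improving the Wiktionary data model

Abstract

Current situation

Recommended improvements

Comments on the proposal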

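Works and discussions about Wiktionary

Wikimania 2013 meeting about the support of Wiktionary in Wikidata

At Wikimania 2013 a meeting was organized about the support of Wiktionary in Wikidata. The following notes were taken during the event, which was accompanied by a cross-list discussion online.

11:30-13:00, Room Y520. Attendees: Micru, Aude, Nikerabbit, Duesentrieb, Jaing Bian, Denny, Chorobek, Linda Lin, Amgine (via Skype), Stepro, Vertigo (via Skype), Nirakka, GWicke, Max Klein

  • Micru presents several data models.
    • Several Wiktionaries
    • OmegaWiki
    • WordNet
    • Lemon model
    • DBpedia Wiktionary
  • Denny: The idea of this meeting is not to decide anything, but to discuss and to move the discussion forward. This is obviously not the right set of people to decide for Wiktionary. Our main goal is to support Wiktionary, not to create our own lexical resource.
  • Amgine: Setting up interwiki links on Wiktionary is rather complex. Difference between word/lexeme and concept. Terminology questions about sense, lexeme, definition...
  • Stepro: Structured data available in all languages, in order to have a simple form-based user interface for editing structured entries. Maybe a Wiktionary conference, but how do we get started?
  • Denny: Worried that it won't fly if Wikidata invites people to a meeting. The German Wiktionary wants this kind of structured data. Possible venues: the Teestube on the German Wiktionary, the Beer parlor and the Grease Pit on the English one, and the IRC channel for English at different times of day. The English Wiktionary also has a common discussion area called the "Grease Pit" for technically inclined Wiktionarians.
  • Vertigo: It would be good to try this out beforehand.
  • (Denny) The problem is that we need to know what we should develop. Changing it later is expensive.
  • (Micru) Maybe a mock-up?
    • (Vertigo) A mock-up is a very good idea.
  • Stepro: The first Wiktionary meeting took place this year; the next one will probably be in spring.
  • GWicke: Parsoid parses wikitext. Templates and template blocks are fetched via JavaScript. If Wiktionary has special requirements for VisualEditor, contact Gabriel Wicke. A first demo is available at parsoid.wmflabs.org.
  • GWicke: Have you seen the editor's special characters on en.WT? Is it an extension or a gadget?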

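Contributions which should be structured and merged into proposals

Denny's questions

To me, the proposal here is a bit too advanced. I would first like to gather some background knowledge. I would be very grateful if we could collect it together.

The page below mixes data model ideas with use cases and requirements. I would suggest sorting this page and splitting it into separate points, so that use cases, other requirements and data models are discussed separately. For the data models, I would love to see short descriptions of the existing data models of OmegaWiki, two or three successful Wiktionaries, WordNet, VerbNet and FrameNet. This background would be very useful for building a data model.

Is there anyone who would like to prepare this page accordingly and structure the discussion? --denny (talk) 10:28, 3 April 2013 (UTC)

Comments
A good suggestion, but I think we need a description of the targets BEFORE thinking about the tools (Wikidata, OmegaWiki, WordNet, VerbNet etc.) for reaching them. I do not know the business model of the WT project, and I made my (sketchy) proposal below BEFORE adding the heading "Target, business model of the proposals presented here?". Is there one? Where can it be found? Is it really open to long-term development reaching far into the future? [1][2] Knowing the long-term goals makes it possible to make good proposals and to assess their (imagined) value better. But this does not answer your question about how to structure the proposals. I suggest: short term, medium term, long term. Some medium- and long-term proposals may require many short-term proposals as prerequisites. NoX (talk) 17:38, 3 April 2013 (UTC)
  1. For example, adding voice recognition: I travel to China, I don't understand something, so I take my phone and ask SIRI in English: "What does this mean in English?" I hold the phone to my partner's mouth and my partner says: 朋友. I expect SIRI (or something better) to use the WT project database (because it is still the best). A WT function converts the sound into an IPA string, starts a search in the WT database and returns feedback. SIRI answers: You are currently in China (my GPS location is enabled). The meaning of this word in Mandarin may be "friend". (Does anyone think this could be achieved with the current WT database structure?)
  2. The idea of "translating" voice input into IPA is interesting, but not realistic at present. We would have to overhaul IPA almost completely. A few obstacles among many: (1) How an utterance is transcribed depends in many ways on the language. (2) Acoustic recognition and distinction of silent periods (pauses, glottal stops, plosives) is not possible. (3) Prosody conveys meaning, but IPA offers too few means of writing it down.

I do not regard the data models of OmegaWiki, the Wiktionaries, WordNet etc. as tools. I regard them as clearly relevant work into which decades of experience have flowed. Constructing a new data model without explicitly describing the existing data models, which have already passed several good reality tests, looks like a potential waste of time, assuming that one is neither a genius nor very experienced in the field. So I still stand by my request: before introducing a completely new model, let us first describe the existing data models of at least a few Wiktionaries, OmegaWiki and WordNet. --denny (talk) 11:14, 4 April 2013 (UTC)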

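Target, business model of the proposals presented here?

Anyone adding a proposal here should ask themselves the following question: "Who benefits from my proposal?"

  • Who are my potential customers, the end users of my proposal?
  • What is the business model behind it?
  • Will it lead to a win-win situation
    • for me (only for me?),
    • for the worldwide WT project,
    • for the worldwide (hopefully growing) user group: reading users, contributing users or (very essential) donating users?

A win-win situation here can exist between two parties: the WT project itself and its users.

WT users

The WT project will only prosper in the long run if it meets the real needs of a broad range of users, assesses the (language-related) needs of this user group accurately, now and in the future, and keeps developing towards them.

This user group will only be willing to pay (in the form of donations, which are absolutely necessary for the WT infrastructure, and hopefully not through persistent advertising) if the quality of WT is high and its ease of use increases (easy to understand, easy to contribute to).

This leads to the question that every business has to ask: who are the users of the WT project (worldwide)?

Personally, I identify the following user groups:

  1. Monolingual WT users.
  2. Multilingual WT users.
  3. WT data suckers: competitors who suck data out of WT.

Within the first two groups we need to distinguish between ordinary users (who simply look up a word or learn a foreign language) and language experts (translators, people who have studied languages). Ordinary users are certainly the larger user base. A small share of ordinary users may contribute. The easier WT is to use, the more (mass) input they can generate. The WT project also has to meet the needs of language experts. Do the interests of language experts match those of ordinary users? In my opinion: no. But the two can coexist well. Do language experts understand the real needs of ordinary users? Sometimes my feeling is: no.

How large are these user bases, so that the "real and general benefit of a good proposal" can be estimated?

NoX (talk) 12:22, 29 March 2013 (UTC)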

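Proposal for an improved Wiktionary data model

NoX: Proposal withdrawn.
"Reasons"
  1. The proposal was made from the point of view of a "normal layman" on the Wiktionaries. It reflects the usage needs of a typical cross-WT user (see "Target ..." above) who does not know in detail everything that runs in parallel (e.g. OmegaWiki, Wikidata etc.).
  2. "Improving the Wiktionary data model" is not an adequate title. The main cause was the shortcomings and flaws of the current WTs regarding "translations and translation examples". These shortcomings are clearly identified by OmegaWiki and handled in the way I proposed (structured data), but the "way of realisation" does not seem reasonable to me (two different language worlds).
  3. It is not one proposal, but several proposals targeting "one single WT world".
  4. It is a "bottom-up proposal" made without knowing the top-down targets (see "Target ..." above).
So I still stand behind all of the proposals (some from a slightly different point of view, some with additions) and am looking for a better way. NoX (talk) 20:31, 6 April 2013 (UTC)

Abstract: "The following proposal aims at improving the inner Wiktionary data structure. The current, mostly text- and markup-based structure, in which the quality of the data content is rather accidental, should move step by step towards a more adequate, understandable and usable data model that represents the real long-term requirements of the subject. The view is that of a Wiktionary user. The target is described, not a roadmap."

Currently: One Wiktionary project contains m Wiktionaries (m = many). Each Wiktionary contains m wikis (entry pages). Each wiki contains one text block of text mixed with markup. Such a text block may contain 1 : m words of the same or of different languages (not necessarily the language of the Wiktionary itself). Each word may contain the word token itself (a string of single- or double-byte characters), the word type (noun, verb etc.), gender and other characteristics (if any), 1 : m pronunciations, 1 : m word meanings, 0 : m (word) expressions and 0 : m references (links) to derived words. Each word meaning has 1 : m translations into 1 : m languages. Each translation has 2 links (French view): one within the same Wiktionary, to the wiki containing the word in the translated language, and a second one to the other Wiktionary project of the referenced language, to the wiki containing the translated word. So far, this is well-known stuff. See also [Wiki, Interwiki].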

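Recommendations:

The Wiktionary project is a great worldwide project that helps greatly to improve worldwide communication. The recommendations presented here are therefore those of a user who crosses language borders and does not focus on a single language within a single Wiktionary.

1. Use only ONE tag standard, ONE model (template) standard and ONE model-sequence standard across all Wiktionaries.

The German, French and English Wiktionaries use different tags and model structures.

For example:

  • (German) translation markup: *{{fr}}: [1] {{Ü|fr|maison}} {{f}}, ''Normannisch:'' maisoun
  • (French) translation markup: * {{T|de}} : {{trad+|de|Haus}} {{n}} or * {{T|de}} : {{trad+|de|Haus|n}}
  • (English) translation markup: * French: {{t+|fr|maison|f}}

In part, markup tags exist in parallel.

Example: the gender markup '|f' (feminine) at a defined position and {{f}} (anywhere).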
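To illustrate what this divergence means for tooling, here is a minimal Python sketch that reads the three translation lines quoted above into one common record. It is only an illustration under stated assumptions: the regular expressions cover just the template forms shown in this section ({{Ü}}, {{trad+}}, {{t+}} and a stand-alone gender template), and the names `Translation` and `parse_translation` are hypothetical, not part of any Wiktionary tool.

```python
import re
from typing import NamedTuple, Optional


class Translation(NamedTuple):
    lang: str
    word: str
    gender: Optional[str]


# Assumption: only the template names quoted in this section are handled:
# {{Ü|..}} (German WT), {{trad|..}}/{{trad+|..}} (French WT), {{t|..}}/{{t+|..}} (English WT).
# An optional fourth field carries the gender inline, e.g. {{t+|fr|maison|f}}.
TRANSLATION = re.compile(
    r"\{\{(?:Ü|trad\+?|t\+?)\|([a-z-]+)\|([^|}]+)(?:\|([mfn]))?\}\}"
)
# Stand-alone gender template such as {{f}} or {{n}}.
GENDER = re.compile(r"\{\{([mfn])\}\}")


def parse_translation(line: str) -> Optional[Translation]:
    """Return the first translation found in a line of translation markup."""
    match = TRANSLATION.search(line)
    if match is None:
        return None
    lang, word, gender = match.groups()
    if gender is None:
        # Fall back to a stand-alone gender template after the translation.
        loose = GENDER.search(line, match.end())
        gender = loose.group(1) if loose else None
    return Translation(lang, word, gender)
```

Even for these few lines, the parser needs three template names and two gender conventions. With a single standard, one pattern would be enough.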

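At first glance this seems a little petty.

 
Image 1: Redundancies.

However, with a single standard, tools for upgrading the tag standard would be easy to create and could be used widely across Wiktionary borders. The absence of standardisation prevents functions that were created once from being transferred. A standard would, for example, improve automated data transfer to other Wiktionaries in order to update translations automatically. In the same way, every automation function could be used in one version for all Wiktionaries.

"I propose to use the English tag, model and model-sequence standard as the general standard."

In this case the current English help description of the markup language could, and should, take the lead. Other Wiktionaries would only need to "translate" it. Currently (in the French Wiktionary), model descriptions that are partly poor, partly outdated and partly inaccessible hamper broad use by users who are willing to contribute but lack the know-how.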

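2. Avoid redundancies across the different Wiktionaries.

[NoX: Modified 28 March 2013.]
In my opinion, it is an enormous waste of time and effort that the same word with the same language code exists in each Wiktionary (WT).

All words of a "foreign language" (other than the "host language" of a WT, e.g. non-French words in the French WT) should be "removed". In image 1, these are all the words marked in pink. In my opinion they are misused as the "best representation" in this language. They are redundant, and in a way producing them is an unnecessary waste of time for contributing users. This effort could be used for a better purpose, namely improving bilingual translation examples.

From the Wiktionary view in image 1, this means:

Haus should be removed from the English [1] and French [2] Wiktionaries.
Maison should be removed from the German and English Wiktionaries.
house should be removed from the German and French Wiktionaries.

They would be replaced by a different ENTITY, as described in the following chapter (see TransEx-Entity).

To explain this, let me first take the perspective of a (reading (opposite to contributing)) user, mother language English, strongly interested in French (cross WT view). If he looks for a French word representing house, not knowing maison, he could use English WT house, take the translation reference type 2 (see image 3), cross over to the French WT by clicking on that link and get full and best information concerning maison. I’m sure, the language information concerning maison he finds there is the best one he can find in ANY other WT of WT project. After having enriched his knowledge concerning maison, why should he add this word to English WT (if he is a contributing user)? The other way round (user mother language English interested in French) it works same way.

So my claims are:

  • The best (language) information concerning "house" can be found in the English WT.
  • The best information concerning "maison" can be found in the French WT.
  • The best information concerning "Haus" can be found in the German WT, etc.

In general, the best information concerning a word of a specific language can be found in its host WT (the blue words in image 1). All representations of these words in non-host WTs (the words marked in pink in image 1) are "generally of lower quality".

But: I know that this is not the whole truth. In fact, these pink wiki words do not contain only "redundant" information about the word itself (e.g. pronunciation, gender, declension, conjugation etc.).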

They also provide:

  1. A synonym reference to synonyms in the host language.
    Synonym references are currently of better quality within the pink words (image 1) than in the blue ones. E.g. a contributor whose mother language is English knows the English synonyms for maison better. But why should he add them to maison in the English WT as he does today? Wouldn’t it be better to improve the translations of maison in the French WT? This would allow the blue words to be used as synonym references. (I know this touches on the problem of differences in WT mark-up, which hampers updating other WTs: different translation mark-up tags. See my recommendation 1: Use only ONE tag-standard across all WTs.)
  2. A bilingual representation of the usage of a word in a defined bilingual meaning/sense context (translation examples).
    Today they are, in my opinion, the only strong reason for the existence of the words in pink (which I propose to remove). The true purpose of these bilingual representations is to give the reading user a hint, a helping hand, on how a word is typically and properly used in a (bilateral) language context: in the form of examples, translated into the other (!?!) language. This decisive information must be kept and survive. I propose to transfer it into the TransEx-Entity (translation examples). Its content should be composed (in my example) from house in the French WT and from maison in the English WT. Bilaterally.

NoX (talk) 14:36, 28 March 2013 (UTC) (NoX)

3. Improve the data model. Introduce IDs and attributes.
 
Image 2: Data model.

The data model generally proposed for Wiktionaries is represented as an ERM diagram in crow's foot notation. Data modelling and data representation (as shown to the user or edited) are two different things.

The primary entity type shown is Word. It should contain only the basic information representing one single word in a defined language (Wiktionary language code) and its word type (verb, substantive etc.).

To understand the intention of the data model, it is necessary to look at the identifier of Word. The unique identifier, the key of Word, should be composed of the following attributes:

  1. Word-token. String of single- or double-byte characters representing the written word in any typeface.
  2. Language code.
  3. Word-type. Defined permissible word-types in relation to language code. E.g. substantive, verb, adjective etc.
  4. Homonym-token. String of single- or double-byte characters in any typeface to discern homonyms (words of the same word-token, same language and same word-type). Normally empty.
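The four-part key above can be sketched as a small immutable record. This is a hypothetical Python illustration, not a database design: the field names follow the list above, while the sample entries and the homonym tokens "1" and "2" are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordKey:
    """Composite key of a Word, following the four attributes listed above."""
    word_token: str          # the written word
    language_code: str       # Wiktionary language code, e.g. "fr"
    word_type: str           # e.g. "substantive", "verb"
    homonym_token: str = ""  # normally empty; only set to tell homonyms apart


# A frozen dataclass is hashable, so the key can index entries directly.
entries = {
    WordKey("maison", "fr", "substantive"): "entry for maison",
    # Assumed example: two homonyms sharing token, language and word type.
    WordKey("bank", "en", "substantive", "1"): "entry for bank (first homonym)",
    WordKey("bank", "en", "substantive", "2"): "entry for bank (second homonym)",
}
```

Because every attribute is part of the key, changing any one of them yields a different key. This is why the comparison further below says that such a change would amount to moving the word.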

Each Word has one or more Meanings. Meanings are currently represented in Wikis by

(example DE:) :[1] [[Unterkunft]], [[Gebäude]],
(example FR:) # {{architecture|fr}} [[bâtiment|Bâtiment]] [[servir|servant]] de [[logis]], d’habitation, de demeure.
(example EN:) # {{senseid|en|abode}} A structure serving as an [[abode]] of human beings.

Each Meaning-Entity should contain one and only one meaning and one or more characteristic sentences (repetition is not shown in image 2), using the word under this specific meaning (as currently). Meanings should be ordered by relevance.

Each Meaning has 0 to n Translations. Each Translation has two references.

 
Image 3: Translation references.

I propose to substitute reference type 1 (see image 3) so that it refers to the TransEx-Entity (bi-lingual translation examples). This seems to me to be a decisive change. Reference type 2 should be kept. It needs to point to a word in the language referenced by the translation.

Establishing the TransEx-Entity (bi-lingual translation examples) avoids all the redundancies currently found in foreign-language words within one defined Wiktionary. In data-modelling terms, it is a relationship-type entity that resolves the many-to-many relationships between specific word meanings in different languages. Its content is bi-lingual. It does not belong to one Wiktionary. It’s a bridge between two Wiktionaries.

What could its content look like? Example: TransEx between German first meaning of DE: Haus and French first meaning of FR: maison. (Examples taken from (DE: maison) and (FR : Haus).)

Signification – Bedeutung
DE: Haus im Sinne von: [1] Unterkunft, Gebäude
FR: Maison en sens de: (Architecture) Bâtiment servant de logis, d’habitation, de demeure.
Exemples – Beispiele
FR: [1] Dans quelle maison est-ce que tu habites?
DE: In welchem Haus wohnst du?
FR: Sa maison se trouvait seule sur une colline. De là, on avait une vue sur les toits des autres maisons du village.
DE: Sein Haus stand einsam auf einem Hügel. Von dort blickte man über die Dächer der anderen Häuser der Stadt.

All other information in TransEx should be avoided, e.g. word-type, gender, pronunciation, translation. See the German example of (DE: maison). They are superfluous and redundant at this place. This information consists of attributes of other entities, mostly of the entity Word itself.
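The TransEx idea described above, one bilingual record shared by two word meanings, can be sketched in Python as follows. This is a minimal illustration under assumptions: the class names, the `MeaningRef` tuple and the in-memory store are hypothetical, and the sample content is the Haus/maison example quoted above.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional, Tuple

# Hypothetical reference to one meaning: (language code, word token, meaning number).
MeaningRef = Tuple[str, str, int]


@dataclass
class TransEx:
    """Bilingual translation examples for one pair of word meanings."""
    meanings: Dict[str, str]                                      # language code -> gloss
    examples: List[Dict[str, str]] = field(default_factory=list)  # sentence pairs


class TransExStore:
    """Keeps each TransEx once, reachable from either of its two meanings."""

    def __init__(self) -> None:
        self._items: Dict[FrozenSet[MeaningRef], TransEx] = {}

    def put(self, a: MeaningRef, b: MeaningRef, transex: TransEx) -> None:
        # The key is an unordered pair, so (a, b) and (b, a) are the same entry.
        self._items[frozenset((a, b))] = transex

    def get(self, a: MeaningRef, b: MeaningRef) -> Optional[TransEx]:
        return self._items.get(frozenset((a, b)))


store = TransExStore()
haus: MeaningRef = ("de", "Haus", 1)
maison: MeaningRef = ("fr", "maison", 1)
store.put(haus, maison, TransEx(
    meanings={"de": "Unterkunft, Gebäude",
              "fr": "Bâtiment servant de logis, d’habitation, de demeure."},
    examples=[{"fr": "Dans quelle maison est-ce que tu habites?",
               "de": "In welchem Haus wohnst du?"}],
))
```

The record holds only glosses and sentence pairs. Word type, gender and pronunciation stay with the Word entity, as the paragraph above requires.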

Other entities: They seem to be self-explanatory. Not all of them are detailed.

Comparison between the current data model and the proposed one.

ID of Word.
Currently the ID of a WIKI is only the character string representing a word. The proposed ID of a Word consists of several attributes that should be put into separate (database) data fields, not into text mark-up. (A change to one of these ID attributes would result in a database move of the word.)
Other common attributes.
They could be put into a text-container, containing the well-known, hopefully standardised mark-up. Such a container could also contain (as currently) repeating groups, e.g. translated sentence pairs in TransEx. The same could be the case with entities like See, Expression, Derived word etc. Another possibility could be to put them into separate database elements.
TransEx entity.
As described, this entity represents the deep and broad Jordan River, which has to be crossed.

Future presentation and editing.

If you look at the English style of presentation I do not see big differences (besides the not-yet-extant TransEx entity).

One thing that really needs improvement in the presentation area is the display of translations. The currently usable roll-in roll-out mode seems simple-minded to me. An experienced cross-language user is interested in only two or three languages. It should be possible for users to define a roll-out mode that shows only the translations into the languages they have requested.
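The user-defined roll-out described above amounts to a simple filter. Here is a minimal Python sketch, assuming translations are available as a mapping from language code to words. The function name and the data layout are hypothetical.

```python
from typing import Dict, List, Set


def visible_translations(
    translations: Dict[str, List[str]], preferred: Set[str]
) -> Dict[str, List[str]]:
    """Return only the translations into the languages the user asked for.

    An empty preference set is treated as "show everything" (an assumption).
    """
    if not preferred:
        return dict(translations)
    return {lang: words for lang, words in translations.items() if lang in preferred}


# Translations of the English word "house", keyed by language code.
house = {"fr": ["maison"], "de": ["Haus"], "it": ["casa"]}
shown = visible_translations(house, {"fr", "de"})
```

With structured translation data, this filter is a single expression. With translations embedded in free wikitext, the mark-up would have to be parsed first.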

A big challenge seems to me to be the future editing process. It needs to be greatly improved. Preferably window based pop-up sequences, oriented at the entity structure, containing input fields that do not require the knowledge of the mark-up, except perhaps in an expert mode.

Advantages of the proposal.

  1. One general, commonly usable data structure.
  2. The limits between the Wiktionaries could be demolished. All Wiktionaries could be put into one worldwide Wiktionary-Project data pot.
  3. One general, worldwide usable mark-up language would be established.
  4. Functions need to be developed only once and can then be used in all Wiktionaries. (I know this will kill the beloved babies of many a Wiktionary power user.)
  5. Automation processes (automated content controlling, automated Word-stub creation, automated translation transfer, mark-up upgrade etc.) could be greatly improved.
  6. No word redundancies, less editing effort.

Disadvantages of the proposal.

  1. Strong effort is needed. The TransEx entity is a broad and deep Jordan River to be crossed.
  2. The theme in its entirety is difficult to communicate between single-language users, database experts focused on the WIKI data structure, language experts and those who need to create a Wiktionary-project data representation style guide.

NoX (talk) 19:22, 17 March 2013 (UTC). (NoX)

Your opinion?

NoX (talk) 19:22, 17 March 2013 (UTC). (NoX)

Eirikr's comments (and replies to them)
  • Wow. That's a lot to digest.
Allow me to say for now that the underlying idea, of better data portability, is a good one.
However, your proposal here calls for unifying many many things that really have no business being unified -- many of these variant aspects are different at least in part because they meet the needs of different user communities. For instance, the wikitext markup used in translation tables reflects the host language of each Wiktionary. The DE WT uses {{Ü}} to stand in for Übersetzung; the FR WT uses {{trad}} as shorthand for traduction, since {{t}} there is used to stand in for transitif; the EN WT uses {{t}} as shorthand for translation, since there wasn't the same name collision as on the FR WT. Requiring that all Wiktionaries use {{t}} for translation table items might make English speakers happy, but it would be a poor mnemonic for editors of languages where the relevant term for translation does not start with a "t". It would also require renaming any other existing templates already at {{t}}, and then going through all entries that referenced the previous name to update with the new name.
This does not even begin to address the more complicated issue that different Wiktionaries employ sometimes very different entry structures because of the different ideas about grammar and linguistics held by the different user communities. If you are serious about moving forward with this proposal, I strongly recommend that you do some grass-roots building by addressing each Wiktionary user community directly. I'm pretty much never over here on Meta, and the only reason I learned of your proposal was thanks to another editor who posted on wiktionary:Wiktionary:Beer_Parlor about this. I suspect that I am not alone in having missed this post earlier. -- Eiríkr ÚtlendiTala við mig 22:05, 19 March 2013 (UTC).
I think your worry about the translated templates is not as much of a problem as you think. It wouldn't be difficult to have a "code-common" where templates have a name in English, with each local chapter offering a wrapper template which translates it (as well as the documentation). But I think that you are nonetheless right about the importance of doing such structuring together with the community. We must communicate about this project and work with the whole community, so that every regular contributor will know about it and will hopefully be enthusiastic about getting involved, so that their current specificities can be preserved by making a structure flexible enough for every use case. --Psychoslave (talk) 13:57, 22 March 2013 (UTC).
  • Re: localized labels for various features, your idea of a wrapper is probably a good one. Now that we have Lua, that might be less of a concern, though I have read that editors are running into possible performance concerns when a single Lua module is being called multiple times all at once. Collapsing all translation item templates for all Wiktionaries into a single Lua module might wind up creating an extremely limited bottleneck. -- Eiríkr ÚtlendiTala við mig 21:52, 22 March 2013 (UTC).
Well, I don't know the details of this specific load-balancing problem, but it seems resolvable to me. A simple solution would be to duplicate the code automatically, so that you keep the central editable version but the executed code is distributed. Now, if there really is a load-balancing problem with Lua modules, there should be a serious investigation to resolve it. I just don't have a proper picture of the technical infrastructure to give an appropriate answer just like that, but I have no doubt this can be resolved. --Psychoslave (talk) 08:55, 23 March 2013 (UTC).
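The wrapper idea discussed in this thread can be sketched as an alias table per Wiktionary that maps local template names onto shared ones. This is a hypothetical Python illustration, not how MediaWiki templates or Lua modules are implemented: the common names "translation" and "transitive" are invented, and the local names are the ones mentioned above.

```python
from typing import Dict

# Per-wiki aliases: local template name -> hypothetical common template name.
LOCAL_ALIASES: Dict[str, Dict[str, str]] = {
    "de": {"Ü": "translation"},
    "fr": {"trad": "translation", "t": "transitive"},  # {{t}} means transitif on the FR WT
    "en": {"t": "translation"},
}


def resolve(wiki: str, local_name: str) -> str:
    """Map a local template name to the common name it wraps."""
    return LOCAL_ALIASES[wiki][local_name]
```

Here `resolve("fr", "t")` and `resolve("en", "t")` give different common names, so each community keeps its own mnemonics while tools work against the common names only.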


  •   Ditto what Lars and -sche have said.
@Lars, one difference in the treatment of translations is that the English Wiktionary, for instance, links straight through to the translated term entry pages. The full treatment is available there, but not right in the "Translations" table.
@Nox, your rewrite quite concerns me, particularly this paragraph:

To explain this, let me first take the perspective of a (reading (opposite to contributing)) user, mother language English, strongly interested in French (cross WT view). If he looks for a French word representing house, not knowing maison, he could use English WT house, take the translation reference type 2 (see image 3), cross over to the French WT by clicking on that link and get full and best information concerning maison. I’m sure, the language information concerning maison he finds there is the best one he can find in ANY other WT of WT project. After having enriched his knowledge concerning maison, why should he add this word to English WT (if he is a contributing user)? The other way round (user mother language English interested in French) it works same way.

You assume that this hypothetical English reader is also capable of fully understanding the French Wiktionary entry at wiktionary:fr:maison. This is a seriously flawed assumption. As Lars notes, each Wiktionary represents thousands of hours of work by host-language contributors, writing in the host language.
I am also concerned about some of your operating assumptions about applicable data models. The only commonalities in entry structure and data, across all Wiktionaries that I have seen, are the presence of the lemma term itself, and possibly lists like for translations, derived terms, and descendant terms. It is not even safe to assume common parts of speech for term categorization, as not all host languages treat parts of speech in the same way. For instance, what English grammarians think of as an "adjective" roughly maps to at least three different parts of speech in Japanese (形容詞 [keiyōshi], 形容動詞 [keiyō dōshi], and 連体詞 [rentaishi]). What Japanese grammarians think of as a 語素 (goso) roughly maps to two different parts of speech in English ("prefix" or "suffix"). Meanwhile, it seems that the Russian Wiktionary forgoes such labeling entirely and instead uses running text to describe the morphology of each term. (NB: I'm not a Russian reader; this comes to me as second-hand information.)
Since each Wiktionary describes each term using the host language, there is no guarantee at all that the labels used in the Russian Wiktionary match the labels used in the French Wiktionary match the labels used in the English Wiktionary match the labels used in the Japanese Wiktionary... all for any single given term.
I certainly wish you luck in your research. However, I think this problem is much more complicated, and much more intractable, than your description above suggests. -- Eiríkr ÚtlendiTala við mig 23:59, 28 March 2013 (UTC)
@Eirikr et al.: Since I have worked quite a lot in the Russian Wiktionary (Викисловарь), I can assure you that you can find that kind of labelling there too. But the reason why you don't see it when you just look at a page like this is that you don't know where to look. I think this illustrates very well one of the problems if you want all Wiktionaries to function as one big Wiktionary (and if you want to be able to contribute to many Wiktionaries without starting from scratch every time): all the unnecessary differences. That is where an initiative like this could make a difference.
Lars Gardenius(diskurs) 09:14, 29 March 2013 (UTC)
Psychoslave's comments (and replies to them)
Excellent work. To my mind, though, the Word entity should not have a single orthography, because that doesn't reflect reality: even if you restrict yourself to well-known and widely used spellings, there are words with several accepted spellings. For example, in French you may write clef or clé to refer to a key. So orthography should be another entity, just like meaning, and one word may have one or more orthographies (for a given language). One orthography can correspond to one or more words. Moreover, an orthography should be categorizable, so you can say whether it is considered a correct orthography, a misspelled word, or a special case like the locution all your base are belong to us and the word l33t. The proposal should also be extended to include synonyms, hypernyms, and so on, as well as etymology. Etymology should have its own ERM part, I think, because we can surely establish a well-structured schema of how words slid from one form to another (I'm not a specialist, but I know there is specific vocabulary for many supposed transformations, like an l sliding to r). --Psychoslave (talk) 10:41, 22 March 2013 (UTC).
Also, each spelling could be attached to examples, and examples could have 0 to m translations, as well as a well-defined reference (URL, document with ISBN…). --Psychoslave (talk) 13:14, 22 March 2013 (UTC).
  • Re: orthographies, different spellings often carry different connotations, sometimes different enough that they should be considered different entities in their own rights, even if the underlying concept referred to by the terms is the same thing. English thru and through are two different labels for one concept, but the labels themselves carry sufficiently different semantic information that dictionaries often treat these two different spellings as separate entries.
Even with your French example, I see that clé has a secondary sense of "wrench, spanner", that seems to be missing from the clef entry. Assuming that this difference in meaning is valid and not just an accidental omission by Wiktionary editors, then these two spellings carry different semantic information, and deserve to be treated as different entries, at least for Wiktionary purposes.
Japanese gets much more complicated due to the extremely visually rich nature of the written language. The hiragana spelling つく (tsuku) can mean "to arrive; to turn on; to stab", among other meanings. Meanwhile, the kanji spelling 着く (tsuku) is limited to "to arrive"; 付く (tsuku) is limited to "to turn on"; and 突く (tsuku) is limited to "to stab". (Simplified examples; all of these entries have additional senses.) Whether to use the more-specific kanji spellings is a matter of style and preference, not to mention clarity and disambiguation; which kanji spelling to use depends on semantic context. The hiragana spellings of many short verbs have similar one-to-many correlations to kanji spellings, where the kanji spellings are generally more specific than the hiragana spellings, and often the hiragana spellings are in common use right alongside the kanji spellings.
The data model must ostensibly account for all of this variation. Separating the spelling from the concept, which I think is what @Psychoslave here is proposing, is probably necessary for this. Some commercial terminology management tools that I have used take the concept as the top level of the data structure. One concept may have multiple terms, and one term may point to multiple concepts. One serious potential shortfall of such software is clarity --
  • how are concepts identified within the data model?
  • how does one add a synonym (such as a new orthography) to a concept?
  • when looking at a single term, how are different concepts identified for the user?
  • is each individual sense of any given term (implemented now in Wiktionary as a numbered definition line) to be transformed into a "concept" in the data model?
  • how does one manage different "concept" data objects, to do things such as find potential duplicates (possibly differing only by minor wording choices)?
  • how does one manage "concept" data objects, for purposes of splitting a sense into multiple separate senses when more specific meanings are identified?
  • etc., etc.
This is an enormously complicated problem, even when limited to looking at just one language. Expanding the problem scope to include all languages is both insanely ambitious and deliciously challenging. Good luck to all!  :) -- Eiríkr ÚtlendiTala við mig 21:52, 22 March 2013 (UTC).
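The many-to-many relation between spellings and senses raised in this thread can be sketched with two indexes. This is a minimal Python illustration under a strong assumption: senses are identified by plain gloss strings, which sidesteps exactly the open question above of how concepts are identified. The `Lexicon` class is hypothetical, and the data is the simplified つく example from the comment.

```python
from collections import defaultdict
from typing import DefaultDict, Set


class Lexicon:
    """Many-to-many links between spellings and senses, indexed both ways."""

    def __init__(self) -> None:
        self._senses: DefaultDict[str, Set[str]] = defaultdict(set)
        self._spellings: DefaultDict[str, Set[str]] = defaultdict(set)

    def link(self, spelling: str, sense: str) -> None:
        self._senses[spelling].add(sense)
        self._spellings[sense].add(spelling)

    def senses_of(self, spelling: str) -> Set[str]:
        # .get avoids creating empty entries for unknown spellings.
        return set(self._senses.get(spelling, set()))

    def spellings_of(self, sense: str) -> Set[str]:
        return set(self._spellings.get(sense, set()))


lexicon = Lexicon()
for sense in ("to arrive", "to turn on", "to stab"):
    lexicon.link("つく", sense)       # the hiragana spelling covers all three senses
lexicon.link("着く", "to arrive")     # each kanji spelling is limited to one sense
lexicon.link("付く", "to turn on")
lexicon.link("突く", "to stab")
```

Looking up つく returns three senses, while each kanji spelling returns one. The reverse lookup shows that every sense can be written in two ways.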
Ok, let me begin with the simplest point (from a French native speaker's point of view): clef and clé are exactly the same "word", orthography being the only difference. They have the same meaning, and you pronounce them in the same way. To understand why, you can begin with w:Rectifications orthographiques du français if you want to know more about it (some equivalent articles are available in other chapters). But to stay both on the topic and on the French specificities (or at least, linguistic phenomena which may not happen in all languages), there are words that you write in the same way but pronounce differently according to their meaning. See wikt:Catégorie:Homographes non homophones en français. I have no doubt each language has its own curiosities, so indeed, we are speaking here of a daunting task. Fortunately (and hopefully) this task can rely on a global community (or a global set of communities if you prefer). Probably no single human could afford the time and experience needed to accomplish such a task, but I believe that together we can do it.
As for how we can deal with identification and, more than that, which elements should form the key to a unique entry in our database, I would personally be interested in the OmegaWiki contributors' opinion, because they probably have interesting analyses to share that they gained through their experience.
Also, we surely have to gather information to make sure we can establish a model flexible enough to take account of all the specificities of languages and communities, but how do we decide that we have gathered enough information to freeze a structure? Ideally, to my mind, we should come up with an extensible, solid basic structure. --Psychoslave (talk) 09:57, 23 March 2013 (UTC).

NoX: You (Psychoslave, Eirikr) are absolutely right. My ERM is only a sketch. Relevant entities and relationships are missing. In IT database projects it's a good idea to begin with a simple ERM. Its purpose is to initiate a discussion between IT experts and (in this case) language experts about the necessary and relevant things (entities) and their relationships to each other. Later in the database design process they are mapped, by no means 1:1, onto specific (MS SQL, Oracle, DB2, wiki) databases and tables. In an IT project, your contribution would for example lead to (a discussion and) an extension of the ERM by adding entities (either not seen by me, or left out in the discussion-provoking start-up process). A good ERM of language and language translation would reflect the long-lasting nature and the essence of all things (entities) and their relationships in this environment.
But our current problem is different. We have a multipurpose wiki database with shortcomings in the language area (WT project) and big advantages in other areas (e.g. Wikipedia). So my idea was to look at the current French WT (knowing also the English, German and Italian WTs) and ask what its ERM could look like and what could be improved. I didn't write anything about HOW to do it. The change could be made in an evolutionary way (I'm not sure whether this can work, because I'm not a wiki database expert), or in a revolutionary way. Simply put: 1. Harmonize the mark-up. 2. Export the current WT content (e.g. into an agreed XML structure). 3. Reload it into a better-suited database (see the following proposals by others).

NoX (talk) 21:35, 24 March 2013 (UTC).
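To make step 2 (export into an agreed XML structure) a little more concrete, here is a minimal sketch. No XML structure has actually been agreed; the element and attribute names (`entry`, `sense`, `lang`, `headword`, `n`) and the input dictionary layout are assumptions chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def entry_to_xml(entry):
    """Serialise one already-parsed entry into a small, hypothetical XML shape."""
    root = ET.Element("entry", {"lang": entry["lang"], "headword": entry["headword"]})
    for number, gloss in enumerate(entry["senses"], start=1):
        sense = ET.SubElement(root, "sense", {"n": str(number)})
        sense.text = gloss
    return ET.tostring(root, encoding="unicode")


# Hypothetical input: the result of parsing harmonized mark-up (step 1).
clef = {
    "lang": "fr",
    "headword": "clef",
    "senses": ["key (device for opening a lock)", "clef (musical symbol)"],
}
xml_text = entry_to_xml(clef)
```

The hard part of such an export is step 1: the sketch assumes the mark-up has already been harmonized and parsed into a dictionary, which is where most of the per-Wiktionary work would lie.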

Lars Gardenius' comments (and replies to them)

@NoX?: Since I look upon all Wiktionaries as one big Wiktionary, and so also share your interest in cross-Wiktionary questions, I believe that your approach is basically praiseworthy; however, some of your suggestions above raise questions. Before I start criticizing the proposals too severely, I therefore want to ask you a question concerning the section "Avoid duplicates in different Wiktionaries". What exactly do you mean by that? To make a comparison: do you want all Chinese users to throw away their French-Chinese dictionaries, all Swedes to throw away their French-Swedish dictionaries and so forth, and that they should all start using Larousse's French-French dictionary to get the exact meaning of a French word? Is that the idea you have for Wiktionary, or have I completely misunderstood your vision? Lars Gardenius(diskurs) 13:21, 26 March 2013 (UTC)

NoX: Hi Lars. I partly rewrote chap 2 concerning duplicates. I hope this answers your questions. If not, let me know. What remains unanswered is where (in which WT) to put the TransEx entity, if it is established. NoX (talk) 14:46, 28 March 2013 (UTC)

Thank You for the new and extended version of "Avoid duplicates in different Wiktionaries". However I am still very critical. I would like to stress again that I find Your initiative and approach praiseworthy, and I hope that You will not find what I write below as an attack on Your proposal as a whole. [1] However I believe that You have overlooked some very basic facts about languages in that specific section.
I have worked as a professional translator (from Chinese) during a short period of my life. I, as many others, recommend that you start using monolingual dictionaries as soon as possible, like Oxford Dictionary for English, Larousse for French or 新华字典 for Chinese. The reason is the one you give above, the best explanation you can find is probably in this kind of dictionary. Since it is very costly to produce a (paper) dictionary you have to limit the space given to explanations in bilingual dictionaries. [2]
However, this recommendation is easier to give than to follow. To be able to handle e.g. a monolingual Chinese dictionary, you have to study Chinese at least a couple of years. An effort that perhaps not everybody is ready to make. I don't think it is reasonable to believe that any average user can understand an explanation written in Chinese, however good it is.
You could of course propose a translation of the Chinese articles (on Chinese words) to all other languages but then you are back in the situation you wanted to avoid, and how many can translate from Chinese to Finnish, Romanian, Quechua etc., and keep them updated?
There is also another major reason why this is not a good approach.
Every monolingual dictionary is written in a social and linguistic context. A Chinese monolingual dictionary is written in a Chinese social and linguistic context, that you have to know to really understand the explanations. All languages also have different ways to solve different grammatical and linguistic problems. So what perhaps is not at all mentioned in a Chinese monolingual dictionary, because it is considered trivial to everybody having Chinese as their mother tongue, is perhaps very difficult to understand, and necessary to treat in a dictionary, if you e.g. just speak Portuguese.
These are some of the reasons why I think every serious translator uses all kinds of monolingual and bilingual (both ways) dictionaries when translating. If you are a Swedish translator you simply need a dictionary explaining the word from e.g. a Chinese point of view as well as from a Swedish point of view.
So both the professional translator and the average layman need both monolingual and bilingual dictionaries, now and in the future.
It should also be said that the biggest problem in Wiktionary obviously lies with these bilingual parts of the dictionary. If you want, for instance, to create a Chinese-Romanian Wiki, you will need at least ten people working on it for several years before it reaches an acceptable level of quality and usability. That number of people is obviously often lacking. But throwing these bilingual parts out doesn't solve the problem, it just hides it.
This problem is, I believe, also partly linked to the translation part of the articles. The space devoted to translations is very small (in all Wiktionaries). I once made a comparison with an ordinary (paper) dictionary. While it devoted 40 lines to translating a German word (into a certain language), Wiktionary devoted half a line, which is about as much as you can find in an ordinary cheap pocket dictionary!
I believe that Wiktionary has to find a whole new way to present translations and to link them to the articles.
Lars Gardenius(diskurs) 18:13, 28 March 2013 (UTC)

  1. I don’t have any problem with rational pros and cons. I have problems with some emotional ones (not yet found here). NoX
  2. I agree that the normal way of learning a language starts with a bilingual dictionary and ends with using a monolingual dictionary in the target language. The latter is particularly difficult in cases where the writing systems differ and the cultural differences are big. But in my opinion this is no strong argument against my proposal to remove the pink (non WT host language) words. Though I’m an oldie, I’m still willing to learn. But before rewriting parts of my proposal, if I should, I want to listen to your voices. NoX
I sometimes use Wiktionary in my poetry writing process, so I'm fully aware that you can't expect a few short definitions to give you every key to the overall meaning of a sentence. Meaning is context dependent, or at least that's how I think about it currently. OK, but even if no dictionary will ever be able to give all the keys to understanding a sentence in its idiomatic context, I think that giving some hints is better than nothing. Wiktionary has no deadline, so to my mind time is not a real problem. Gathering more contributors is, in my opinion, a far more important issue. So we should aim at
  • making edition as easy as possible to new contributors
  • having a cross-chapter way to structure articles in a flexible manner which allow:
    • feedback across all chapters: if I add a Shakespeare quotation as a usage example in the English version, it should be propagated to all chapters, with text requesting users to translate it into the current chapter's language if they know the original one. If relevant, text explanations may for example be added in ref anchors.
    • adaptation to locale specificities, let the chapter community decide how to integrate elements in their workflow. --13:01, 5 April 2013 (UTC)
-sche's comments (and replies to them)
  • (-sche here:) I don't have time right now to respond to everything that has been said, but: it's true that many (e.g.) English Wikt entries for (e.g.) French words are currently smaller than French dictionaries' entries for those words, because Wiktionary is incomplete. However, because Wiktionary is not paper, it has the ability to cover all words in all languages in greater detail than any paper dictionary. wikt:de:life (as a result of my work) and wikt:de:be, for example, provide German-language coverage of the English words life and be that is as expansive and detailed as a monolingual English dictionary's. wikt:en:-ak provides English-language coverage of -ak more detailed than any Abenaki-language dictionary's, not that there are (m)any Abenaki-language dictionaries! That's what each Wiktionary can do at its best, and it's what would be lost or made more difficult by proposals to centralise foreign-language content either on Wikidata or on OmegaWiki (cf. my comments on the proposal to adopt OmegaWiki) -sche (talk) 23:34, 28 March 2013 (UTC).
  • In my opinion, the currently (very) rare high-quality pink examples like the cited German WT life (1st bilingual part) are a strong argument for my proposal. I'll stay with your example and have a look at English WT Leben (2nd bilingual part). Why not merge them into one TransEx? 1st part meaning 1. Life, the state between birth and death is the same as 2nd part meaning [2] Leben: die Zeit, in der jemand lebt; persönliche Laufbahn, mit der Geburt beginnend und mit dem Tod endend. And so on with the other meanings. Why not take one of the two as the best representation of the word’s meaning, translate it into the other language and put both on top of the corresponding translation examples, as proposed (by me)? (By the way, in this special case the blue German WT Leben has fewer meanings than the pink German WT life (colors refer to image 1 above). The question arises why the contributing author did not improve the meanings of the blue one, or take them from the blue one and put them in the pink one.) I think we are not far away from each other in this respect. I don’t propose to throw away the content of the pink words. The idea is to transfer the bilingual representation of the usage of a word in a defined bilingual meaning/sense context (perhaps also other non-redundant language-relationship information that I have not yet identified, but that seems to be addressed) into the TransEx entity/database/table/WT. Link type 1 (see Image 3: Translation references above) would address these TransEx(es). The German-to-English translation link type 1 of Leben would reference the identical TransEx Leben*life as the English-to-German translation link type 1 of life. Thinking of an evolutionary expansion of the WT project: to which WT should Leben*life in this case, and referential entity ("bridge") elements in general, belong? Not to the German, not to the English, but to a common new WT containing the elements of the relationship entity TransEx.
The integration should occur in such a way that users have the feeling of working with one WT alone: currently with the English, French etc. WT, in the future with a single big pot. I think it's not a good idea to split up parts of current WTs and transfer them elsewhere. Political parties that do that generally lose. NoX
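The idea that the German-to-English link of Leben and the English-to-German link of life reference one identical TransEx Leben*life can be sketched as a store keyed by an unordered pair of terms. This is only one reading of the proposal; the class name `TransExStore`, its methods and the example sentence are hypothetical.

```python
class TransExStore:
    """Hypothetical store holding one shared record per unordered term pair."""

    def __init__(self):
        self._records = {}

    @staticmethod
    def _key(term_a, term_b):
        # A frozenset ignores order, so (Leben, life) and (life, Leben)
        # address the same record.
        return frozenset((term_a, term_b))

    def add_example(self, term_a, term_b, example):
        self._records.setdefault(self._key(term_a, term_b), []).append(example)

    def examples(self, term_a, term_b):
        return list(self._records.get(self._key(term_a, term_b), []))


store = TransExStore()
leben, life = ("de", "Leben"), ("en", "life")
store.add_example(leben, life, "Das Leben ist kurz. / Life is short.")
```

Looking up `store.examples(life, leben)` returns the same list as `store.examples(leben, life)`, which is the "one record, two directions" property described above. The sketch says nothing about where such records would live or how differing connotations would be recorded.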
  • Re "I think it's not a good idea to split up parts of current WTs and transfer them elsewhere": yet that is what this proposal does; it either splits foreign-language entries out of all the Wiktionaries, or duplicates content.
    Re "Why not merge them into one TransEx?": Many of the comments I made about OmegaWiki—which already exists as an 'all-in-one' Wiktionary that translates and transcludes 'consolidated' definitions into multiple languages—can be repeated here:
    Firstly, words which are denotatively similar enough to translate each other are rarely connotatively synonymous. It is linguistically unsound to presume that terms from different languages have exactly the same nuanced definitions. Freund and друг, for example, denote a closer companion than friend (Freund also denotes boyfriend); wikt:de:friend-vs-wikt:de:Freund and wikt:en:Freund-vs-wikt:en:friend should and can note this, even if (due to incompleteness) they do not note it yet. How would a consolidated 'TransEx' address it?
    Secondly, OmegaWiki already tries to do something similar to what is proposed here... and it fails. New contributors create dozens of translation-table entries and new entries on en.Wikt every day; in any given week, en.Wikt has 40+ regularly active users. Other Wiktionaries, laid out in their own languages, also have many active users. OmegaWiki has 10 regularly active users. I doubt this is without good reason: OmegaWiki's lingua franca is English, and its translation tables are based in English terms (it assigns translations to English senses: for translations of the German laufen, I am taken to DefinedMeaning:run (6323)). This makes it hard for people who do not speak English well to contribute translations or anything else, or to participate in discussions: such users are best served by Wiktionaries in their own languages. WikiData, with English as its lingua franca, has the same handicap as OmegaWiki, and I feel that an attempt to create a second 'consolidated' Wiktionary on WikiData is likely to fail to flourish just as the first consolidated Wiktionary (OmegaWiki) failed to flourish, and just like OmegaWiki, a WikiData 'TransEx project' will then be just another competing 'standard', its entries sure to fall out of sync with the other Wiktionaries. :/ -sche (talk) 07:15, 1 April 2013 (UTC)
At first glance, this seemed to me to be a strong argument. To get a feeling for how the Germans would treat a problem like this, I added a question to the discussion page of wikt:de:Freund.
My interpretation of the (one opinion only) response is: Lack of quality. 1) Missing meaning at wikt:de:Freund. 2) Missing word wikt:de:Freundchen. 3) Wrong translation references at wikt:fr:ami (should refer to wikt:de:Leute) and wikt:en:friend (should refer to wikt:de:Freundchen).
Not only since Heisenberg we know that we are living in a world of uncertainty and we have to live with it. But we can reduce it. My not yet fully presented idea concerning TransEx is, that a presentation construct like
#* {{RQ:Schuster Hepaticae V|vii}} , see word sound
is needed, but without its data being stored inline behind the mark-up. The editing process, using a pop-up window (data presentation differs from data editing and data storage), puts the data into separate storage, unseen by readers and editors. The presentation process takes the data from there.
And this is completely different from a two-tier approach like that of OmegaWiki (which I don’t appreciate). The users need to have the feeling of acting in one single environment. This can be achieved by separating data presentation from data storage. This is currently not the case. Its description needs more words than those currently contained in my proposal.
By the way, thank you for removing my {{sic}} provoking bugs. Hope you get a feeling of the idea behind my poor words. NoX (talk) 19:43, 1 April 2013 (UTC)
Here is what your interesting comment inspires in me: we should not pretend to provide "translations" of words and locutions, but try to list semantic proximity. For example, as far as I can tell, the French je and the English I are semantically equivalent. But I have no idea whether the Japanese わたし (watashi) is (I suspect it isn't, even if my knowledge of Japanese is close to nil). Of course, for Japanese people, having details on the semantics of watashi may not be as interesting as it can be for non-native speakers. We lack consistency:
  • on fr.wikt, fr:わたし links to a wikibook on Japanese grammar (why not), but fr:I doesn't provide such a link (maybe there's no wikibook).
  • wikt:en:I gives far more etymological information than fr:wikt:I.
We should provide a way to propagate this kind of information in a more systematic way, with automation when possible. Typically, a word's etymology is something you can represent well in a form that a computer can manipulate. --Psychoslave (talk) 13:48, 5 April 2013 (UTC)
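As a small illustration of etymology in a form that a computer can manipulate, the sketch below stores "derived from" links as a mapping and walks it. The mapping `DERIVED_FROM`, the language codes and the function names are assumptions for this example; a real dataset would need far richer relations (borrowing, inheritance, uncertainty, dates).

```python
from collections import deque

# (language, word) -> terms it derives from; a tiny illustrative sample.
DERIVED_FROM = {
    ("en", "clef"): [("fr", "clef")],
    ("fr", "clef"): [("la", "clavis")],
    ("fr", "clé"): [("la", "clavis")],
}


def ancestors(term):
    """Return every recorded ancestor of a term, nearest first."""
    seen, order = set(), []
    queue = deque(DERIVED_FROM.get(term, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        order.append(current)
        queue.extend(DERIVED_FROM.get(current, []))
    return order


def share_ancestor(term_a, term_b):
    """True when the two terms have at least one recorded ancestor in common."""
    return bool(set(ancestors(term_a)) & set(ancestors(term_b)))
```

With data in this shape, etymological information entered once in one chapter could in principle be offered to the others automatically, for example by listing `ancestors(("en", "clef"))` on any chapter's page for that word.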
Please note that the assertion of -sche that OmegaWiki's translation tables are based in English terms is wrong. In OmegaWiki we also have terms that have no English equivalent (e.g. safraner). The fact that the German "laufen", takes you to "DefinedMeaning:run (6323)" should not be interpreted as being English-based. In fact, it should take you to the "DefinedMeaning:6323", but it was decided to add a translation in the DefinedMeaning name to make the recentchanges page more readable. As it causes some confusion, it will be changed to number-only in the future, when the recentchanges page will be ajaxized. --Kip (talk) 13:40, 8 April 2013 (UTC)
Purodha's comments (and replies to them)

I am not commenting on presentational matters here at all. I am strictly concerned with structure only.

The approach of OmegaWiki is to have a definition of a concept per expression, or per "word". Definitions are assumed to be expressed in all languages, yet to describe something which is not bound to a single language.

This has a technical disadvantage: it prohibits the mass import of most bilingual word lists and dictionaries, because they lack the required definitions. Remember, a definition must exist in OmegaWiki before another word or translation can be connected to it.

Is the idea of having (mostly) language-independent definitions thus a dead-end? I believe not so. For a really huge class of words such definitions exist, can easily be found, and they serve their purposes. But there are notable exceptions.

In more detail:

  • Typical names, proper names, geographical designations, technical terms, scientific terms, and many more of these kinds are identical or almost identical between many languages, and doubtlessly have shareable definitions. They are, by the way, quite often additionally identifiable by pictures, drawings, maps, etc. They comprise something like 90% of the use cases of a dictionary of more than ten thousand or twenty thousand "words". Thus it would be unwise, and a waste, if we were not willing to use this easily available opportunity.

However:

  • The typical hundred most used words of a language (each language has its own set, of course) most often do not have good shareable definitions, and very often no useful definitions at all. Setting aside very common descriptive words, such as mother, rain, or one, you will find words that are best described by their usage rather than by their meanings. E.g. the English words many, much, and often share some aspects of their meanings and may even have a common translation into another language, but you cannot exchange them for one another in English sentences. Their use cases do not overlap. Somewhat simplified, you use many for countables, much for uncountables, and often for repetitions, which all exclude one another. Descriptions of the proper name type can be entirely on the object language level. The preceding "descriptions" of many, much, and often are on the metalanguage level (which uses language to speak about the language itself rather than about objects alone). Transferring them into another language may prove cumbersome when the target language does not have concepts like countables, uncountables, or repetitions. -- Purodha Blissenbach (talk) 05:49, 21 March 2014 (UTC)
  • The Toki Pona word li has no direct translation in any language that I know of. It is a purely structural word. It separates verbs from objects. E.g. Purodha toki toki. means something like Purodha talks and talks. while Purodha toki li toki. tells us Purodha speaks a language. You see how li introduces the object part of the second sentence. It does not have a "meaning" of its own. It is there for a structural reason. It only indirectly influences the meanings of entire sentences. There are structural words in many languages. Some of them have close or remote look-alikes in some other languages. Generally, they are language specific, have no global translations, have little or no meaning of their own, and need to be explained using metalanguage. OmegaWiki's approach of having a common "definition" has drawbacks when it comes to purely structural words. The good news is that there are not many of them in any language, most often maybe a dozen or two. The bad news is that they are usually among the most frequently used ones. A vocabulary without them would be very incomplete.

(to be continued) -- Purodha Blissenbach (talk) 13:22, 21 March 2014 (UTC)
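The import obstacle described above (a definition must exist before a word or translation can be connected to it) can be illustrated with a toy model. This is not OmegaWiki's actual schema or API: the function `import_word_pairs`, the `meaning_of` mapping and the sample pairs are assumptions made for the example. The identifier 6323 is the DefinedMeaning number quoted earlier on this page for run.

```python
def import_word_pairs(pairs, meaning_of):
    """Split bilingual pairs into those that can attach to an existing
    definition and those that have no definition to attach to."""
    attached, rejected = [], []
    for source, target in pairs:
        # Use the source term's definition if it has one, else the target's.
        meaning = meaning_of.get(source, meaning_of.get(target))
        if meaning is None:
            rejected.append((source, target))
        else:
            attached.append((meaning, source, target))
    return attached, rejected


# Terms that already have a definition in the hypothetical database.
meaning_of = {("en", "run"): 6323}

word_list = [
    (("de", "laufen"), ("en", "run")),  # one side already defined
    (("de", "Leben"), ("fr", "vie")),   # neither side defined yet
]
attached, rejected = import_word_pairs(word_list, meaning_of)
```

In this toy run the German-French pair ends up in `rejected`, which mirrors the point made above: a plain bilingual word list carries no definitions, so most of its rows have nothing to attach to.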

Thinking out of the classical online Wiktionary format and reading usage

Our goal is not only to build dictionaries that are as complete as possible; we also want the result to be as useful as possible, which means it should be easy to integrate them elsewhere and, generally, to use them in innovative ways.

In this part, contributors are encouraged to describe what kinds of usage could be made easier if taken into account at the design step rather than as an afterthought.

Generating standard dictionary output.

Currently, the dumps which are generated are not directly usable in offline applications, for example GNOME Dictionary. As far as I know, we don't provide a standard way to consult Wiktionary, such as through the w:DICT protocol. It would also be convenient to be able to download Wiktionary for e-ink devices. --Psychoslave (talk) 13:40, 22 March 2013 (UTC).
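As a rough sketch of what a standard offline output could look like, the code below builds the two files used by dictd-style DICT servers: a body file with the concatenated articles and an index with one tab-separated line per headword (headword, offset, length). It assumes the commonly described dictd layout, in which offset and length are written as numbers in a base-64 alphabet; the exact sorting and encoding rules should be checked against the dictd documentation before real use, and the function names are hypothetical.

```python
# Digit alphabet assumed for dictd index numbers (A = 0 ... / = 63).
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def b64_number(n):
    """Write a non-negative integer in base 64, most significant digit first."""
    if n == 0:
        return "A"
    digits = []
    while n:
        n, rem = divmod(n, 64)
        digits.append(B64[rem])
    return "".join(reversed(digits))


def build_dict_files(entries):
    """Return (body bytes, index text) for a {headword: definition} mapping."""
    body = bytearray()
    index_lines = []
    # Simplification: plain case-insensitive sort of the headwords.
    for headword in sorted(entries, key=str.casefold):
        article = (headword + "\n  " + entries[headword] + "\n").encode("utf-8")
        index_lines.append(
            f"{headword}\t{b64_number(len(body))}\t{b64_number(len(article))}"
        )
        body += article
    return bytes(body), "\n".join(index_lines) + "\n"


body, index = build_dict_files({"clef": "a key", "be": "to exist"})
```

Each index line tells a reader where an article starts in the body and how many bytes it occupies, so an offline application can look up a headword without parsing a full wiki dump.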


Voice recognition.

One way one might want to access an entry in Wiktionary is to pronounce the word or locution. As smartphones become more common, people acquire a device that is able to take voice input. On the other hand, people will sometimes meet a word they can't spell. For example, two people from distant native cultures become friends, and they like to share their respective knowledge through their talk. So sometimes one will say a specific word of their native language, but the other person won't understand it, and in fact won't even be able to pronounce it because it contains sounds s/he doesn't know (or maybe it's a tone language while s/he doesn't know any tone language). So they take a smartphone, run the wikipronounce app, and voilà: the original spelling, an IPA transcription (possibly a roman transcription if relevant), and a definition in the user's native language. --08:45, 23 March 2013 (UTC).

This is not as easy as one might expect it to be. We have three kinds of obstacles.
  1. There is no technical way to get from sound to IPA in general. While some pieces of sound recordings may be technically identifiable (mostly sonorants, but by far not all of them), other segments are not generally identifiable. Most notably, plosives are silent for a large part of their duration. There is no way to distinguish them based on their main parts (no frequencies, amplitude zero for each of them), but you can try from their coarticulation with neighboring segments. Since that is extremely language dependent, we are currently not capable of doing that in a language-independent way, and we cannot do it for the majority of the major languages either, due to a lack of specific research on them, let alone for so-called smaller languages.
  2. Current state-of-the-art voice recognition systems are either:
    • restricted to a very limited predefined vocabulary. Good ones produce speaker-independent hit rates in the 60% to 75% range.
    • less restricted vocabulary-wise, e.g. accepting all words recognized by a spellchecker of a certain language, but then they need to be trained on a specific speaker for a significant duration before they become usable. Their recognition rate for non-dictionary words is, politely said, limited.
  3. IPA use is both language and tradition dependent. If you have a single recorded utterance and give it to, say, a dozen people from different parts of the earth and from different traditions of using IPA, you will likely get a dozen mostly incompatible or contradictory transcripts. Most usually, IPA transcripts in dictionaries follow a single tradition and are made by native academics. Unless you know both the tradition and the customary language-specific deviations from the formal IPA standard, you cannot correctly pronounce IPA transcripts found in foreign-language dictionaries.
Thus I am sorry to say that your vision is, at the moment at least, limited to very few use cases, and at best to words already existing in a Wiktionary. I wish it were otherwise, but it is certainly not in sight. -- Purodha Blissenbach (talk) 04:31, 21 March 2014 (UTC)

Speech synthesis.

Alongside IPA (and X-SAMPA), Wiktionary also offers pronunciation samples. Currently these sounds need to be recorded and uploaded by contributors, one by one. This solution is better than nothing, and should probably even be kept to give real-world examples of a word's pronunciation. That said, it has many disadvantages. First of all, not all words have such a sample. Some nonetheless have IPA, but probably most people won't be able to read it easily, given that, as far as I know, no primary school in the world teaches it. So it would be very helpful for many readers to have speech synthesis using the IPA data (when present): not only would people have at least a minimal idea of how to pronounce the word, they would also be able to learn IPA through habit. Another advantage would be a unified pronunciation voice across all words (possibly customizable in preferences), whereas recordings vary and reflect the diversity of contributors. This last sentence should not be taken as a criticism of diversity: as previously said, recordings should be considered of great value because they provide real-world examples, and should stay as data complementary to speech synthesis. --Psychoslave (talk) 08:11, 25 March 2013 (UTC).

For the reasons outlined in the section on Voice recognition, the chances of getting correct or usable speech synthesis from IPA transcripts are very limited. Odds are that a majority of cases will simply be incorrect. Do not take this as an argument not to try it, but you must be aware that each language needs its own speech synthesis algorithm for the entire idea to function. --Purodha Blissenbach (talk) 04:46, 21 March 2014 (UTC)

Helping avoiding/creating neologisms.

The primary purpose of language is to communicate and share ideas. Often people know no specific word to express what they are thinking and want to communicate. Usually, one may use a sentence built from a set of words which expresses, more or less accurately, what they think. But when a new concept is central to a thought, one may decide to create a new word to express it. Different strategies may be used to coin such a word, each having pros and cons:

  • Use etymological knowledge of the given language to build a word which doesn't add new roots, and which will be both short and hopefully understandable to someone with a good knowledge of this language. The advantage here is that it extends the language in a way more or less familiar to speakers, possibly with a word that they will understand even if they have never heard it before. For many words of this kind, no advanced knowledge is really needed, as many (all?) languages have affixes which make it possible to coin such ad hoc words. But sometimes making such a construction can require a high level of linguistic skill, especially in specific fields such as science, where people may be competent in their specialty but not in linguistics.
  • Make an acronym. The clear advantage here is that you need no linguistic competence to coin a word. The evident con is that the coined word is completely opaque, and native speakers won't be able to use their lexical knowledge to deduce its meaning. An acronym is not necessarily used because no specific expression exists; it's often a matter of brevity. Thus DNA, which stands for deoxyribonucleic acid, trades ten syllables for three.
  • Use a loanword. The advantages are that the word exists, possibly just needing some pronunciation tuning, and that it probably has a known, meaningful etymology. The con is that the word may be opaque to native speakers of the target language, so they can't establish semantic relations based on their already acquired mental network of words and meanings.

Here Wiktionaries should help by:

  • first, avoiding unwanted[1] redundant neologisms, by making it easy to find existing expressions for a given concept,
  • making it easier to create neologisms that are as relevant as possible given the existing lexicon of the target language.

Add a user-friendly method for adding new pronunciations

Using for example Recorderjs (see demo).

Create a tool to help users batch-move their own pronunciations from Forvo to Wikimedia Commons

Forvo is a website that allows users to pronounce words in many different languages. Unfortunately, the recorded sounds are licensed under the Creative Commons Attribution-NonCommercial-ShareAlike license, so only the authors themselves could import their existing Forvo pronunciations to Commons.

Notes and references

  1. The goal is not to prevent people from creating new words or languages if they want to, just to let them know when existing expressions are available, in case they would like to avoid coining a new one.

See also

Certain Wiktionary bugs: