Module:Sandbox/AbstractWikipedia/TextAssembler
This module is part of the Abstract Wikipedia template-renderer prototype. It corresponds to the last block in the proposed NLG architecture.
It exposes the function constructText, which is responsible for assembling a string of text from a list of lexemes passed to it.
While assembling the text, it takes care of spacing, punctuation and capitalization, according to the information given in the trailing_punctuation and capitalization tables.
Note that this code was previously part of the main module, so it is not mentioned in the recorded demo of the prototype.
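The behaviour described above can be tried outside MediaWiki with the rough sketch below. It is not the module itself: the `lexeme` helper, the `assemble` function and the ASCII-only `ucfirst` stand-in (used in place of `mw.getLanguage(...):ucfirst`) are assumptions made for illustration, and the lexeme shape (a table with a `pos` field whose `tostring` yields the surface text) is inferred from how the module reads its input.

```lua
-- Hypothetical helper: model a lexeme as a table with a `pos` field whose
-- tostring() returns its surface text (an assumption about the caller's data).
local function lexeme(text, pos)
    return setmetatable({ pos = pos }, { __tostring = function() return text end })
end

-- Same contents as the module's tables: a smaller number is a higher rank.
local trailing_punctuation = { ['.'] = 1, [','] = 2 }
local capitalization = { ['.'] = true }

-- ASCII-only stand-in for mw.getLanguage(code):ucfirst(text).
local function ucfirst(text)
    return text:sub(1, 1):upper() .. text:sub(2)
end

-- Condensed restatement of the assembly loop, runnable in plain Lua.
local function assemble(lexemes)
    local result, pending_space, pending_punctuation = '', '', ''
    for _, lex in ipairs(lexemes) do
        local text = tostring(lex)
        if lex.pos == 'spacing' then
            -- Only the most recent spacing is kept.
            pending_space = text
        elseif lex.pos == 'punctuation' and trailing_punctuation[text] then
            -- Keep the mark of higher rank; on a tie the later mark wins.
            if pending_punctuation == ''
                or trailing_punctuation[pending_punctuation] >= trailing_punctuation[text] then
                pending_punctuation = text
            end
            -- Trailing punctuation removes the space before it.
            pending_space = ''
        elseif text ~= '' then
            if result == '' or capitalization[pending_punctuation] then
                text = ucfirst(text)
            end
            result = result .. pending_punctuation .. pending_space .. text
            pending_punctuation, pending_space = '', ''
        end
    end
    return result .. pending_punctuation
end

local sentence = assemble({
    lexeme('paris', 'noun'), lexeme(',', 'punctuation'), lexeme('.', 'punctuation'),
    lexeme(' ', 'spacing'), lexeme('it', 'pronoun'), lexeme(' ', 'spacing'),
    lexeme('shines', 'verb'), lexeme(' ', 'spacing'), lexeme('.', 'punctuation'),
})
print(sentence) -- prints "Paris. It shines."
```

In this run the comma is superseded by the adjacent full stop, the space before the final full stop is dropped, and "it" is capitalized because it follows a mark listed in the capitalization table.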
local p = {}
-- The following table lists the trailing punctuation marks and their relative
-- rank. A smaller number means a higher rank: a lower-ranked mark is superseded
-- by an adjacent higher-ranked mark. Between punctuation marks of equal rank,
-- the later one supersedes the earlier one.
local trailing_punctuation = { ['.'] = 1, [','] = 2 }
-- The following lists punctuation marks which should trigger capitalization:
local capitalization = { ['.'] = true }
-- This function constructs the final string from the lexemes.
-- It reduces runs of consecutive spacings to a single one, handles trailing
-- punctuation specially, and concatenates the rest of the text.
-- It also handles capitalization: the first word of the text and the first
-- word after a mark listed in the capitalization table are capitalized.
function p.constructText(lexemes)
    local result = ''
    local pending_space = ''
    local pending_punctuation = ''
    for _, lexeme in ipairs(lexemes) do
        local text = tostring(lexeme)
        if lexeme.pos == 'spacing' then
            -- Only the most recent spacing is kept
            pending_space = text
        elseif lexeme.pos == 'punctuation' and trailing_punctuation[text] then
            -- Keep the higher-ranked mark; between equal ranks the later one wins
            if #pending_punctuation == 0 or trailing_punctuation[pending_punctuation] >= trailing_punctuation[text] then
                pending_punctuation = text
            end
            -- Trailing punctuation removes prior space
            pending_space = ''
        elseif text ~= "" then -- Empty text can be ignored
            if result == '' or capitalization[pending_punctuation] then
                -- NOTE: `language` is not defined in this module; it is assumed
                -- to be a global language code set by the calling module.
                text = mw.getLanguage(language):ucfirst(text)
            end
            result = result .. pending_punctuation .. pending_space .. text
            pending_punctuation = ''
            pending_space = ''
        end
    end
    -- Flush any punctuation left pending at the end of the text
    result = result .. pending_punctuation
    return result
end
return p