Document worth reading: “A Large-Scale Comparison of Historical Text Normalization Systems”
There is no consensus on the state-of-the-art approach to historical text normalization. Many techniques have been proposed, including rule-based methods, distance metrics, character-based statistical machine translation, and neural encoder–decoder models, but studies have used different datasets and different evaluation methods, and have come to different conclusions. This paper presents the largest study of historical text normalization to date. The authors critically survey the existing literature and report experiments on eight languages, comparing systems spanning all categories of proposed normalization techniques, analysing the effect of training data quantity, and using different evaluation methods. The datasets and scripts are made publicly available.
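As a rough illustration of the simplest category the paper compares, here is a minimal sketch (not taken from the paper) of a distance-based normalization baseline: each historical spelling is mapped to the closest entry of a modern wordlist by edit distance, and predictions are scored with word-level accuracy, the metric typically used for this task. The lexicon and example tokens below are illustrative assumptions.

```python
# Minimal sketch of a distance-based normalization baseline and word accuracy.
# The wordlist, historical tokens, and reference normalizations are assumptions
# made up for illustration; they do not come from the paper's datasets.

def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def normalize(token: str, lexicon: list[str]) -> str:
    """Return the lexicon entry with the smallest edit distance to the token."""
    return min(lexicon, key=lambda w: levenshtein(token, w))

def word_accuracy(predictions: list[str], gold: list[str]) -> float:
    """Fraction of tokens whose predicted normalization matches the reference."""
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)

if __name__ == "__main__":
    lexicon = ["over", "knight", "night", "said", "was"]   # assumed modern wordlist
    historical = ["ouer", "knyght", "sayde"]               # assumed historical spellings
    gold = ["over", "knight", "said"]                      # assumed references
    preds = [normalize(t, lexicon) for t in historical]
    print(preds, f"accuracy={word_accuracy(preds, gold):.2f}")
```

Real systems in the rule-based, statistical, and neural categories are of course far more involved; this sketch only shows the shape of the task and of the evaluation.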