Statistical machine translation (SMT) is an approach to machine translation that uses algorithms to estimate probabilities between segments of source-language and target-language documents, and then uses those probabilities to propose translation candidates. It is also known as data-driven machine translation, a name that contrasts the approach with rule-based machine translation.
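
As a rough illustration of how such probabilities might be established, the following minimal Python sketch estimates the probability of a target segment given a source segment by relative frequency. It assumes the segments have already been aligned, which real systems must learn from large parallel corpora. The toy data and the function names estimate_translation_probs and best_candidate are hypothetical and not taken from any particular SMT toolkit.

```python
from collections import Counter, defaultdict


def estimate_translation_probs(aligned_pairs):
    """Relative-frequency estimate of p(target | source) from aligned segment pairs."""
    pair_counts = Counter(aligned_pairs)
    source_counts = Counter(src for src, _ in aligned_pairs)
    probs = defaultdict(dict)
    for (src, tgt), count in pair_counts.items():
        # Share of this source segment's occurrences that were paired with tgt.
        probs[src][tgt] = count / source_counts[src]
    return dict(probs)


def best_candidate(probs, source_segment):
    """Return the most probable target segment, or None if the source was never seen."""
    candidates = probs.get(source_segment)
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# Hypothetical toy data: (source segment, target segment) pairs.
PAIRS = [
    ("das Haus", "the house"),
    ("das Haus", "the house"),
    ("das Haus", "the building"),
    ("ein Buch", "a book"),
]

PROBS = estimate_translation_probs(PAIRS)
```

With this toy data, best_candidate(PROBS, "das Haus") picks "the house" because it was seen in two of the three pairs for that source segment. A segment that never occurred in the data has no candidates at all.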

The main disadvantage of statistical machine translation is that it tends to fail when it is presented with texts that are not similar to the material in its training corpora. For example, a translation engine trained on technical texts will have difficulty translating texts written in a casual style. It is therefore important to train the engine with texts that are similar to the material you will be translating on an ongoing basis.
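
One rough way to gauge how similar a new text is to the training material is to measure the share of its words that never occur in the training corpus, often called the out-of-vocabulary rate. The sketch below is only an illustration under simple assumptions: whitespace tokenization, lowercase matching, and made-up sentences. The function names vocabulary and oov_rate are hypothetical.

```python
def vocabulary(sentences):
    """Collect the set of lowercase, whitespace-separated tokens in the sentences."""
    return {token for sentence in sentences for token in sentence.lower().split()}


def oov_rate(training_sentences, new_text):
    """Fraction of tokens in new_text that never occur in the training sentences."""
    vocab = vocabulary(training_sentences)
    tokens = new_text.lower().split()
    if not tokens:
        return 0.0
    unseen = sum(1 for token in tokens if token not in vocab)
    return unseen / len(tokens)


# Hypothetical training material in a technical style.
TRAINING = [
    "tighten the bolt with a torque wrench",
    "replace the filter every six months",
]

TECHNICAL = "replace the bolt"
CASUAL = "hey wanna grab lunch"
```

A high rate for the casual sentence compared with the technical one suggests that the engine has little evidence to draw on for that style of text. This is a crude signal: it ignores word order and context.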

Even with large and suitable training corpora, statistical machine translation does not generally produce publication-quality text. It frequently translates items out of context or uses the wrong word order. However, it generally translates well enough for a reader to understand the content.

For publication-quality translation, it is necessary to implement a human review and post-editing process, which many commercial machine translation providers offer as an option.

See also