Bilingual Evaluation Understudy, an algorithm for evaluating machine translation output by measuring its n-gram overlap with one or more human reference translations. Best used to track improvements of a machine translation system over several cycles of training. BLEU is not a useful metric for machine translation end users trying to judge translation quality.
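
To make the definition concrete, here is a minimal sketch of a sentence-level BLEU score. It assumes input that is already tokenized (here, by whitespace), n-grams up to length 4 with uniform weights, and no smoothing. The function names `ngrams`, `modified_precision`, and `bleu` are illustrative and do not come from any library; real evaluations normally score a whole corpus with an established tool rather than code like this.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of the candidate against the references."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    # For each n-gram, the highest count seen in any single reference.
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    # Clip each candidate count so repeated words cannot inflate the score.
    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform n-gram weights."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        # Without smoothing, any zero precision makes the geometric mean zero.
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c = len(candidate)
    # Reference length closest to the candidate length (ties: the shorter one).
    r = min((len(ref) for ref in references),
            key=lambda length: (abs(length - c), length))
    # Brevity penalty: penalize candidates shorter than the reference.
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_mean)


reference = "the cat sat on the mat".split()
print(bleu("the cat sat on the mat".split(), [reference]))
print(bleu("the cat sat on the".split(), [reference]))
```

A candidate identical to its reference scores 1.0, while a candidate that drops the final word matches every n-gram it contains but is reduced by the brevity penalty. Because the score only counts surface overlap, it is more informative as a relative number between versions of one system than as an absolute measure of quality.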

