A method of using large language models in machine translation, in which a translation model is partitioned into a plurality of language model partitions stored on a plurality of different language model servers. Segments of text are distributed to the servers for translation according to server workload.
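
The two ideas in the abstract, partitioning a model across servers and distributing text segments by workload, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the source: the function names `partition_model` and `distribute_segments`, the use of a CRC32 hash to choose a partition, and the use of word count as a stand-in for server workload.

```python
import heapq
import zlib


def partition_model(entries, num_servers):
    """Split model entries (n-gram -> score) into one partition per server.

    Hypothetical scheme: the partition index is a stable hash of the n-gram.
    """
    partitions = [{} for _ in range(num_servers)]
    for ngram, score in entries.items():
        # crc32 is stable across runs, unlike Python's built-in hash() for str.
        index = zlib.crc32(ngram.encode("utf-8")) % num_servers
        partitions[index][ngram] = score
    return partitions


def distribute_segments(segments, num_servers):
    """Assign each segment to the server with the least pending work.

    Assumption: workload is approximated by the number of words queued.
    """
    # Min-heap of (pending_load, server_id); ties go to the lower server_id.
    heap = [(0, server_id) for server_id in range(num_servers)]
    heapq.heapify(heap)
    assignment = {server_id: [] for server_id in range(num_servers)}
    for segment in segments:
        load, server_id = heapq.heappop(heap)
        assignment[server_id].append(segment)
        heapq.heappush(heap, (load + len(segment.split()), server_id))
    return assignment
```

Calling `distribute_segments(["a b c d", "e", "f g", "h"], 2)` sends the long first segment to server 0 and the three short ones to server 1, since server 1 stays the less loaded server throughout. A real system would measure workload from the servers themselves rather than estimating it from segment length.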