Codon Usage

In general, codons can be grouped into 20 disjoint families, one for each of the standard amino acids, with a 21st family for the translation termination signal. Each family in the universal genetic code contains between 1 and 6 codons. Where a family contains more than one codon, the alternative codons are termed synonymous. Although the choice among synonymous codons would not be expected to alter the primary structure of a protein, it has been known for the past 20 years that alternative synonymous codons are not used randomly. This in itself is not startling, as codon usage might be expected to be influenced, at the very least, by mutational biases.
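The grouping described above can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard genetic code (NCBI translation table 1) written as a 64-letter string in T, C, A, G order; the names BASES, AMINO_ACIDS and codon_families are chosen here for illustration only.

```python
from collections import defaultdict
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
# '*' marks the translation termination signal.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def codon_families():
    """Group the 64 codons by the amino acid (or stop signal) they encode."""
    families = defaultdict(list)
    codons = ("".join(triplet) for triplet in product(BASES, repeat=3))
    for codon, aa in zip(codons, AMINO_ACIDS):
        families[aa].append(codon)
    return dict(families)


families = codon_families()
sizes = {aa: len(codons) for aa, codons in families.items()}

# 20 amino-acid families plus one termination family
print(len(families))
# Smallest and largest family sizes
print(min(sizes.values()), max(sizes.values()))
```

Methionine and tryptophan form the single-codon families, while leucine, serine and arginine each have six synonymous codons.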

Analysis of genes from the RNA bacteriophage MS2 identified differences between the codon usage of phage genes and that of genes from its host, E. coli. Codon bias in MS2 might result from selection for the rate of chain elongation during protein translation. It was suggested that the most frequent synonyms in MS2 were those translated by the major tRNAs of its host. The observation of codon usage bias implied that not all synonymous mutations were neutral. The codon usage of the bacteriophage ΦX174 (5,386 bp), the first DNA genome to be sequenced entirely, was found to be non-random, with a bias towards codons whose third position was thymidine (T) and away from codons starting with adenosine (A) or guanosine (G).

Optimal Codons

The number of species for which the abundance and structures of tRNAs are known is limited relative to the number of organisms from which sequence data have been obtained. Indeed, what knowledge there is of tRNA abundance is potentially biased, because measurements are made under laboratory growth conditions. It is therefore desirable to define an optimal codon in terms of a more readily estimated characteristic. The most commonly used characteristic is the pattern of codon usage itself. The definition used in this thesis is “an optimal codon is any codon whose frequency of usage is significantly higher in putatively highly expressed genes”. Significance is estimated using a two-way chi-squared contingency test, with a cut-off at p<0.01. Under this definition, the most frequent codon for an amino acid is not necessarily an optimal codon. This is subtly different from the original definition used by Ikemura, who defined optimal codons as those codons occurring most often in biased genes.
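As a rough sketch of how such a test might look, the Python below applies a Pearson chi-squared test to a 2x2 table. The layout of the table is an assumption made here, not taken from the text: rows are the highly and lowly expressed gene sets, columns are counts of the codon of interest and of its synonyms. No continuity correction is applied, and the function names are illustrative.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic and p-value (1 d.f.) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # An empty row or column carries no information about association.
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For one degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


def is_optimal(codon_high, other_high, codon_low, other_low, alpha=0.01):
    """True if the codon is used significantly MORE often in the high set.

    codon_*: counts of the codon of interest in each gene set.
    other_*: counts of its synonymous codons in the same gene set.
    """
    total_high = codon_high + other_high
    total_low = codon_low + other_low
    if total_high == 0 or total_low == 0:
        return False
    _, p = chi2_2x2(codon_high, other_high, codon_low, other_low)
    # The test is two-sided, so the direction is checked separately.
    return p < alpha and codon_high / total_high > codon_low / total_low


# Made-up counts for illustration: 80/100 uses in the high set, 40/100 in the low set.
print(is_optimal(80, 20, 40, 60))
```

Because the chi-squared test itself is symmetric, the explicit direction check is what keeps a significantly avoided codon from being labelled optimal.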

Codon Usage and Heterologous Gene Expression

In all known organisms, from bacteria to man, the same triplets of DNA bases code for the same amino acids. However, this does not mean that all species encode their genomes in exactly the same way. The code is redundant: a number of triplets code for the same amino acid. Although all species are able to translate any coding sequence, E. coli prefers certain triplets for certain amino acids, and these may differ from the ones we use. This 'preference' is reflected in the levels of the tRNAs that match each triplet. In this project we resynthesised a number of genes de novo and were thus able to codon-optimise them for expression in E. coli.
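One simple way to picture codon optimisation is to replace every codon with the synonym the host uses most often. The Python sketch below does exactly that and nothing more; it is not the procedure used in this project, and the usage counts in the example are invented for illustration rather than measured E. coli values.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate an in-frame coding sequence into one-letter amino acids."""
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, len(cds), 3))


def preferred_codons(usage):
    """Pick the most frequent codon per amino acid from `usage` (codon -> count).

    Ties, including amino acids absent from `usage`, go to the first codon
    in table order.
    """
    best = {}
    for codon, aa in CODON_TO_AA.items():
        if aa not in best or usage.get(codon, 0) > usage.get(best[aa], 0):
            best[aa] = codon
    return best


def recode(cds, usage):
    """Swap each codon for the preferred synonym; the protein is unchanged."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    best = preferred_codons(usage)
    return "".join(best[CODON_TO_AA[cds[i:i + 3]]] for i in range(0, len(cds), 3))


# Hypothetical host usage counts, for illustration only.
toy_usage = {"CGT": 30, "CGC": 25, "AGA": 2, "AGG": 1, "CTG": 50, "CTA": 3}
original = "ATGAGGCTATAA"
optimised = recode(original, toy_usage)
print(optimised)
print(translate(original) == translate(optimised))
```

A real design would normally start from a measured codon usage table for the host and may weigh other factors as well; the sketch only shows that synonymous substitution leaves the encoded protein untouched.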

The presence of rare codons per se does not imply weak expression. Despite the poor overlap between the codon usage of Halobacterium halobium (70% G+C) and E. coli (50% G+C), genes from H. halobium can be highly expressed. In E. coli, mutation of the ribosomal binding site of atpH can increase its level of expression 20-fold. An oligonucleotide of rare codons within the coding sequence of B. subtilis sspB (small acid-soluble spore protein) did not have a discernible effect on yield. The addition of rare AGG codons near the terminus actually enhanced expression of chloramphenicol acetyltransferase in E. coli.

However, the expression of heterologous genes can be adversely affected by unusual codon usage or context. The presence of rare codons in a recombinant gene can be compensated for either by supplying the appropriate tRNA or by synthesising the gene to remove the rare codons. The expression of human granulocyte-macrophage colony-stimulating factor was enhanced after argU was induced (even though the recombinant protein had only a single AGG codon). The human rap74 gene (RNA polymerase associating protein) was expressed more efficiently in E. coli after its codon usage was adjusted; previously there had been a large number of amino-terminal fragments due to frameshifts. Similarly, altering the codon usage of avidin, tropoelastin (Martin et al. 1995) and isovaleryl-CoA dehydrogenase enhanced their expression.
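Before deciding whether to supply a tRNA or resynthesise a gene, it helps to know where the rare codons fall. The Python sketch below lists the in-frame positions of a caller-supplied set of codons. The example set contains AGG, which is named in the text, plus AGA, which is added here on the assumption that both arginine codons are read by the argU tRNA; the function name is illustrative.

```python
def find_codons(cds, targets):
    """Return (codon_index, codon) for each in-frame codon of `cds` in `targets`.

    codon_index is zero-based and counts codons from the start of `cds`,
    so the sequence is assumed to begin in the correct reading frame.
    """
    cds = cds.upper()
    targets = {t.upper() for t in targets}
    return [
        (i // 3, cds[i:i + 3])
        for i in range(0, len(cds) - 2, 3)
        if cds[i:i + 3] in targets
    ]


# Illustrative rare-codon set; AGA is an assumption added to the AGG of the text.
rare = {"AGG", "AGA"}

# Codons: ATG AGG AAA AGA TAA
print(find_codons("ATGAGGAAAAGATAA", rare))

# 'AGG' occurs here only out of frame (ATG AAG GAA), so nothing is reported.
print(find_codons("ATGAAGGAA", rare))
```

Reporting positions rather than a simple count makes it easier to see clusters of rare codons, for example near the amino terminus.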

http://www.biologicscorp.com/tools/CodonUsageCalculator
