Protein Expression and Purification
Overview of protein expression and purification
The genetics and biochemistry of Escherichia coli are probably the best understood of any known organism. The knowledge gained in the study of E. coli biology has been applied to the development of many of today's molecular cloning techniques. Most cloning vectors and methods utilize E. coli or its phages as a preferred host, primarily because of the ease with which the bacterium can be grown and genetically manipulated. These same characteristics made E. coli an attractive early choice as a host for the production of large quantities of protein encoded by cloned genes. Aside from its well-studied biology, E. coli is suitable as the basis of an expression system because of its rapid doubling time and its ability to grow in inexpensive media. Years of study devoted to gene expression in E. coli have provided numerous choices for transcriptional and translational control elements that can be applied to the expression of foreign genes.
Expression plasmids contain sequences encoding a selectable marker to ensure maintenance of the vector in the host cell. Commonly used selectable markers in E. coli include bla (which encodes β-lactamase and confers resistance to ampicillin and other β-lactam antibiotics), cat (which encodes chloramphenicol acetyltransferase and confers resistance to chloramphenicol), and tet (which encodes a membrane protein that confers resistance to tetracycline).
Initiation of translation on mRNAs requires the presence of a so-called Shine-Dalgarno sequence, or ribosome binding site (RBS), in close proximity to the initiator methionine codon (Shine and Dalgarno, 1974). The RBS consists of a purine-rich stretch of nucleotides complementary to the 3′ end of 16S rRNA, located 5 to 13 bases 5′ to the initiator ATG. RBS elements typically used in expression vectors derive from well-translated E. coli or bacteriophage genes. For instance, the pTrcHis, pRSET, and pGEMEX vectors use the T7 gene 10 RBS.
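The spacing rule described above can be illustrated with a short Python sketch. It is only an approximation made for illustration: it treats any 4-nucleotide piece of the consensus AGGAGG as a candidate RBS, and it measures the gap as the number of bases between the end of that piece and the ATG. The function name, the motif set, and the test sequence are assumptions of this sketch, not part of any vector's documentation.

```python
import re

# Assumption: any 4-nt window of the consensus AGGAGG counts as an SD-like motif.
SD_CORE = re.compile(r"(?=(AGGA|GGAG|GAGG))")
ATG = re.compile(r"(?=ATG)")


def find_rbs_candidates(dna, min_gap=5, max_gap=13):
    """Return (sd_start, atg_start, gap) tuples for SD-like motifs 5' of an ATG.

    gap is the number of bases between the end of the 4-nt motif and the ATG
    (a definition chosen for this sketch).
    """
    seq = dna.upper()
    hits = []
    for atg in (m.start() for m in ATG.finditer(seq)):
        # Only look at sequence upstream (5') of this ATG.
        for m in SD_CORE.finditer(seq[:atg]):
            sd_start = m.start()
            gap = atg - (sd_start + 4)
            if min_gap <= gap <= max_gap:
                hits.append((sd_start, atg, gap))
    return hits


if __name__ == "__main__":
    # Made-up test sequence with an AGGAG stretch upstream of an ATG.
    for hit in find_rbs_candidates("TTTAAGGAGATATACATATGGCT"):
        print(hit)
```

Overlapping motifs are reported separately, so one purine-rich stretch can produce more than one hit. A real analysis would also score complementarity to the 16S rRNA 3′ end rather than rely on exact matches.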
Direct expression refers to the fusing of the coding sequence of interest to transcriptional and translational control sequences on an expression vector, with an initiator methionine codon preceding the open reading frame. This approach can be used to produce cytoplasmic proteins, and it can also be used for the intracellular expression of normally secreted proteins. In the latter case, the DNA sequence encoding the signal peptide is replaced by the initiator methionine codon. Success with the direct approach is variable, for two reasons. First, translation initiation is inconsistent because sequences 3′ to the initiator methionine codon can influence the efficiency of ribosome binding (Looman et al., 1987; Bucheler et al., 1990). For reasons that are not fully understood, maximizing the A+T content of the 5′ end of the coding sequence (taking advantage of the degeneracy of the genetic code) can sometimes improve the efficiency of translation initiation (De Lamarter et al., 1985; Devlin et al., 1988). Second, recombinant proteins produced in the cytoplasm often form dense, insoluble aggregates called inclusion bodies (Schein, 1989).
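As a rough illustration of how the degeneracy of the genetic code can be used to raise the A+T content at the 5′ end of a coding sequence, the Python sketch below swaps each of the first few codons after the ATG for its most A+T-rich synonym. The function names and the default window of six codons are assumptions made for this example. The sketch deliberately ignores codon usage bias, so some of the codons it picks may be rare in E. coli.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def split_codons(cds):
    """Split a coding sequence into complete codons (trailing bases dropped)."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def translate(cds):
    """Translate a coding sequence with the standard genetic code."""
    return "".join(CODON_TABLE[c] for c in split_codons(cds))


def at_count(seq):
    """Number of A or T bases in a sequence."""
    return sum(base in "AT" for base in seq)


def at_rich_recode(cds, n_codons=6):
    """Replace codons 2..n_codons with their most A+T-rich synonyms.

    The initiator codon (index 0) and everything past the window are kept.
    Ties are broken in favour of the original codon.
    """
    out = []
    for idx, codon in enumerate(split_codons(cds)):
        if idx == 0 or idx >= n_codons:
            out.append(codon)
            continue
        aa = CODON_TABLE[codon]
        synonyms = [c for c, a in CODON_TABLE.items() if a == aa]
        out.append(max(synonyms, key=lambda c: (at_count(c), c == codon)))
    return "".join(out)


if __name__ == "__main__":
    original = "ATGGCCCTGAGCGGC"  # hypothetical 5' end encoding M-A-L-S-G
    recoded = at_rich_recode(original)
    print(original, at_count(original))
    print(recoded, at_count(recoded))
```

The encoded protein is unchanged by construction, because only synonymous codons are considered. Whether a given substitution actually improves initiation has to be tested experimentally, as the text notes that the effect is only sometimes observed.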
Secretion of proteins in E. coli is mediated by the presence of an N-terminal signal sequence that is cleaved after translocation of the protein. Expression of cloned gene products as secreted proteins in E. coli has been utilized as an alternative to cytoplasmic expression for proteins that are normally secreted. In E. coli the protein is secreted to the periplasmic space between the cytoplasmic and outer membranes, in contrast to extracellular secretion that occurs in gram-positive bacteria and eukaryotic cells. The result is that in E. coli the secreted protein remains cell-associated, although in a "compartment" separated from the cytoplasmic proteins that make up the vast majority of the total cellular protein. This can be advantageous in terms of protein purification if techniques are used that release only periplasmic contents while leaving the cytoplasmic membrane intact (Neu and Heppel, 1965). Secretion of heterologous gene products has been successfully employed for various proteins that are difficult to produce in the cytoplasm of E. coli as soluble and active proteins, including various growth factors (Cheah et al., 1994), receptors (Fuh et al., 1990), and recombinant Fab fragments (Skerra, 1994).
Methods for the overexpression of cloned gene products in E. coli have improved significantly since it was first attempted. Common problems such as variable expression levels, inclusion body formation, and purification difficulties have been successfully addressed by advancements in expression technology. Probably the most significant of these advancements has been the development of fusion proteins and fusion tag expression and purification techniques. These methods have resulted in more consistent production of soluble and active protein, and have allowed for simple and efficient purification of the proteins from bacterial lysates. Although the production of soluble, properly folded, and active recombinant proteins in E. coli is still not guaranteed, the likelihood of success is far greater than it was just a few years ago. This progress should help ensure that E. coli will continue to be the host organism of choice for recombinant protein production.
Baculoviruses have emerged as a popular system for overproducing recombinant proteins in eukaryotic cells (Luckow and Summers, 1988; Miller, 1988; Miller et al., 1986; Luckow, 1991). Several factors have contributed to this popularity. First, unlike bacterial expression systems, the baculovirus-based system is a eukaryotic expression system and thus uses many of the protein modification, processing, and transport systems present in higher eukaryotic cells. In addition, the baculovirus expression system uses a helper-independent virus that can be propagated to high titers in insect cells adapted for growth in suspension cultures, making it possible to obtain large amounts of recombinant protein with relative ease. The majority of this overproduced protein remains soluble in insect cells, in contrast to the insoluble proteins often obtained from bacteria. Furthermore, the viral genome is large (130 kbp) and thus can accommodate large segments of foreign DNA. Finally, baculoviruses are noninfectious to vertebrates, and their promoters have been shown to be inactive in mammalian cells (Carbonell et al., 1985), which gives them a possible advantage over other systems when expressing oncogenes or potentially toxic proteins.
Currently, the most widely used baculovirus expression system utilizes a lytic virus known as Autographa californica multiple nuclear polyhedrosis virus (AcMNPV; hereafter called baculovirus). This virus is the prototype of the family Baculoviridae. It is a large, enveloped, double-stranded DNA virus that infects arthropods. The baculovirus expression system takes advantage of some unique features of the viral life cycle (Fig. 5.4.1). See Doerfler and Bohm (1986) for a comprehensive review. As with mammalian DNA viruses, the baculovirus life cycle is divided temporally into immediate early, early, late, and very late phases. Viruses enter the cell by adsorptive endocytosis and move to the nucleus, where their DNA is released. DNA replication begins 6 hr after infection. Replication is followed by viral assembly in the nucleus of the infected cell. Two types of viral progeny are produced during the life cycle of the virus: extracellular virus particles (nonoccluded viruses) during the late phase and polyhedra-derived virus particles (occluded viruses) during the very late phase of infection.
The baculovirus expression system takes advantage of several facts about polyhedrin protein: (1) that it is expressed at very high levels in infected cells, constituting more than half of the total cellular protein late in the infectious cycle; (2) that it is nonessential for infection or replication of the virus, meaning that the recombinant virus does not require any helper function; and (3) that viruses lacking the polyhedrin gene have a plaque morphology which is distinct from that of viruses containing the gene. Recombinant baculoviruses are generated by replacing the polyhedrin gene with a foreign gene through homologous recombination. In this system, the distinctive plaque morphology provides a simple visual screen for identifying the recombinants. To produce a recombinant virus that expresses the gene of interest, the gene is first cloned into a transfer vector (described below). Most baculovirus transfer vectors contain the polyhedrin promoter followed by one or more restriction enzyme recognition sites for foreign gene insertion. Once cloned into the transfer vector, the gene is flanked both 5′ and 3′ by viral-specific sequences. Next, the recombinant vector is transfected along with wild-type viral DNA into insect cells. In a homologous recombination event, the foreign gene is inserted into the viral genome and the polyhedrin gene is excised. Recombinant viruses lack the polyhedrin gene and in its place contain the inserted gene, whose expression is under the control of the polyhedrin promoter.
Because baculoviruses infect invertebrate cells, it is possible that the processing of proteins produced by them is different from the processing of proteins produced by vertebrate cells. Although this seems to be the case for some post-translational modifications, it is not the case for others. For example, two of the three post-translational modifications of the tyrosine protein kinase, pp60c-src, that occur in higher eukaryotic cells (myristylation of the N-terminal glycine residue and phosphorylation of serine 17) also take place in insect cells. However, another modification of pp60c-src observed in vertebrate cells, phosphorylation of tyrosine 527, is almost undetectable in insect cells (Piwnica-Worms et al., 1990). In addition to myristylation, palmitylation has been shown to take place in insect cells. However, it has not been determined whether all or merely a subfraction of the total recombinant protein contains these modifications. Cleavage of signal sequences, removal of hormonal prosequences, and polyprotein cleavages have also been reported, although cleavage varies in its efficiency. Internal proteolytic cleavages at arginine- or lysine-rich sequences have been reported to be highly inefficient, and alpha-amidation, although it does not occur in cell culture, has been reported in larvae and pupae (Hellers et al., 1991).
For more than a decade, the yeast Saccharomyces cerevisiae has been extensively utilized for the production of foreign proteins. Numerous characteristics of this system account for its popularity, among them (1) the existence of well-developed tools for manipulation of yeast DNA, including expression vectors and host strains; (2) a vast knowledge base regarding the genetics and biochemistry of the organism; (3) a eukaryotic secretory pathway; (4) eukaryotic post-translational modification pathways, such as those for N-linked and O-linked glycosylation; and (5) an extensive track record as a safe organism. Foreign gene expression in S. cerevisiae has been reported for a wide variety of proteins derived from fungal and mammalian species, and many laboratories have contributed to a substantial base of vectors and host strains. Romanos et al. (1992) provide an extensive overview of strategies and progress toward the expression of foreign genes in S. cerevisiae as well as other yeasts. However, although S. cerevisiae has a well-developed eukaryotic secretory pathway, it is not the most efficient yeast for high-level export of proteins to the extracellular medium.
Through the efforts of a diverse and talented community, there are now many vectors and host strains available to direct gene expression in S. cerevisiae (Romanos et al., 1992). A variety of choices is available with respect to the specific elements used to direct expression and secretion, e.g., the promoter to direct expression, the signal sequence for secretion, the expression cassette copy number and mechanism for replication, and the selectable marker to establish and/or maintain transformants. Some representative vectors that have been successfully used to direct foreign gene expression in S. cerevisiae are listed in Table 5.6.1. These examples illustrate the diversity of choices available to effect production of foreign proteins in this yeast.
During the early 1970s, the methylotrophic yeast Pichia pastoris was developed as a biological tool to convert methanol (a cheap and readily available carbon source that was being discarded by the oil industry as a waste product) into high-quality protein that could be used in the livestock industry. A combination of the oil crisis that began in the mid-1970s, the fact that Pichia as a component of livestock feed was never able to compete in cost with soybeans, and the emergence of the biotechnology industry led to new attempts to find potential uses for the high protein-production capacity of Pichia.
As a yeast, Pichia pastoris is a microbial eukaryote and is as easy to manipulate as Escherichia coli. It has many of the advantages of eukaryotic expression (e.g., protein processing, folding, and post-translational modifications), and it is faster, easier, and cheaper to use than other eukaryotic expression systems, such as baculovirus or mammalian tissue culture. It also generally gives higher expression levels. P. pastoris is completely amenable to the genetic, biochemical, and molecular biological techniques that have been developed over the past several decades for S. cerevisiae with little or no modification. In particular, methods for transformation by complementation, gene disruption, and gene replacement developed for S. cerevisiae work equally well for P. pastoris (Cregg et al., 1987, 1989; Guthrie and Fink, 1991).
P. pastoris belongs to one of four genera of methylotrophic yeasts (the others being Candida, Hansenula, and Torulopsis) and is capable of metabolizing methanol as its sole carbon source. The first step in this process is the oxidation of methanol to formaldehyde by the enzyme alcohol oxidase, generating hydrogen peroxide in the process. To avoid hydrogen peroxide toxicity, methanol metabolism takes place within a specialized cell organelle, the peroxisome, which sequesters toxic intermediates from the rest of the cell. Alcohol oxidase is sequestered in the peroxisomes and functions as a homo-octamer, with each subunit containing one noncovalently bound flavin adenine dinucleotide (FAD) cofactor. Alcohol oxidase has a poor affinity for O2, and P. pastoris compensates for this by generating large amounts of the enzyme. There are two genes in P. pastoris that code for alcohol oxidase, AOX1 and AOX2, but the AOX1 gene is responsible for the vast majority of alcohol oxidase activity in the cell. Expression of the AOX1 gene is tightly regulated and is induced by methanol to very high levels, typically ≥30% of the total soluble protein in cells grown with methanol as the carbon source. The AOX1 gene has been isolated, and a plasmid-borne version of its promoter is used to drive expression of the gene of interest encoding a desired heterologous protein (Ellis et al., 1985; Tschopp et al., 1987a; Koutz et al., 1989).
In recent years, mammalian cells have been used in the production of recombinant proteins, antibodies, viruses, viral-subunit proteins, and gene-therapy vectors. In addition to being used in commercial biotechnology, mammalian cell systems have served as a means for examining fundamental aspects of gene replication, transcription, translation, and post-translational protein processing. The availability of transformable cell lines, along with viral and plasmid-based mammalian-cell vector systems, has provided tools through which important aspects of mammalian gene function can be investigated. The following are typical uses for mammalian expression systems: 1. verification of a cloned gene product; 2. analysis of the effects of protein expression on cell physiology; 3. production and isolation of genes from cDNA libraries; 4. production of correctly folded and glycosylated proteins for assessment of biological activity in both in vitro and in vivo systems; 5. production of suitable quantities of proteins and glycoproteins for structural characterization of protein and carbohydrate moieties; 6. production of important clinically active viral surface antigens, e.g., prehepatitis B virus surface antigen (preS2 HBVsAg), as well as therapeutic proteins, e.g., interferon, tissue plasminogen activator (tPA), erythropoietin (EPO), and Factor VIII; and 7. production of monoclonal antibodies. Important features of mammalian cells include their ability to perform post-translational modifications and to secrete glycoproteins that are correctly folded and contain complex antennary oligosaccharides with terminal sialic acid. These covalent modifications may modulate the clinical efficacy of the protein (e.g., circulatory half-life and biospecificity) or result in properties that are of interest for biochemical characterization, e.g., with respect to structural stabilization, functional groups, and biological role.
Mammalian-produced proteins are quality-controlled through a process whereby the progress of incompletely folded, misassembled, and unassembled proteins into the secretory pathway is selectively inhibited (Hurtley and Helenius, 1989). The correctly processed material progresses and is generally secreted as fully active protein.
Although a variety of mammalian cell hosts are available for protein production, only a small number have emerged as systems of choice for production of proteins to be used clinically. The most common of these cell hosts are summarized in Table 5.9.1. The narrowing down of choices is largely due to the need for cell lines that: (1) are capable of continuous growth; (2) can be grown in suspension (in bioreactors); (3) have low risk of adventitious infection by potentially pathogenic viruses; (4) have genetic stability; and (5) can be readily characterized with respect to karyology, morphology, isoenzymes, and gene copy number. The existence of a variety of host-cell systems, the availability of viral or cDNA-based vectors, and the possibility of either stable or transient expression require that the prospective user define an expression strategy based on ultimate goals. When the researcher's objectives require <1 mg of protein, transient expression in COS-7 cells is the relevant route. Transient expression in the COS-cell and vaccinia systems has been recently reviewed (Moss and Earl, 1991; Aruffo, 1997), and detailed protocols for construction of suitable vectors and protein expression by these systems can be found in those articles. In transient expression a burst of production occurs in the host cell and is usually accompanied by death and rapid lysis of the cell. This presents the purification scientist with the challenge of fishing out the protein of interest, which may be present at 5 µg/ml, from a soup of lysed cellular protein, nucleic acids, and viral particles. The yield of product during purification may be low as a result of the low titer and starting purity; however, when only small quantities of protein are required, transient expression in COS cells or the vaccinia system is a quick and suitable system to employ.
For production of larger quantities of protein, stable expression must be used because of the difficulty in scaling up transient expression into a bioreactor system. Some cell lines have successfully been used as hosts in production of viruses and useful proteins. These include the human embryonic lung cell line MRC-5 and the normal embryonic cell line WI-38. These cells are attachment-dependent for growth and have a finite lifespan of ~50 generations, after which time the cells enter a senescent phase and begin to die. Cell lines that have undergone transformation by virus or that have experienced alteration of chromosomes can exhibit a capacity for infinite growth as well as an ability to grow in suspension. Examples of such cell lines are HeLa cells from a human cervical cancer and Namalwa cells from a human lymphoma. There has been some reluctance to use these cell lines in production of clinical agents because of the possibility of transfer of tumorigenic agents to the product. Stable expression of larger amounts of recombinant protein has therefore been performed primarily in CHO cells, baby hamster kidney (BHK-21) cells, or myeloma cells (e.g., NS0). These cell lines are capable of indefinite growth on a large scale and are suitable hosts for stable integration of heterologous DNA. Stable expression results from integration of the gene encoding the heterologous protein into the host-cell genome. The integrated gene is transcribed efficiently and the protein is expressed persistently over many generations by the host cell. A stable producer cell line typically provides 1 to 10 mg of secreted protein per 10^9 cells per day (specific productivity). For monoclonal antibody production, productivity levels of 35 to 100 mg per 10^9 viable cells per day in myeloma cells (Bebbington et al., 1992; Shitara et al., 1994) and 15 to 110 mg per 10^9 viable cells per day in CHO cells (Page and Sydenham, 1991) are obtainable.
Such levels may allow secreted-antibody titers of 1 to 1.5 mg/liter to be achieved in optimized large-scale systems; hence these systems have been popular in biotechnology. Where protein is accumulated intracellularly, it may reach little more than 0.1% to 0.5% of total cell protein.
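To make the specific-productivity figures quoted above easier to work with, the following Python sketch converts a value in mg per 10^9 cells per day into a daily output for a culture of a given size, and estimates how many days are needed to reach a target amount. The function names and the culture size of 2 × 10^9 cells are hypothetical, and the sketch assumes a constant number of viable cells with constant productivity, which a real culture will not maintain.

```python
import math


def daily_output_mg(specific_productivity, viable_cells):
    """Secreted protein per day in mg.

    specific_productivity: mg per 1e9 viable cells per day.
    viable_cells: total viable cells in the culture (assumed constant).
    """
    return specific_productivity * viable_cells / 1e9


def days_to_target(target_mg, specific_productivity, viable_cells):
    """Whole days needed to accumulate target_mg at a constant daily output."""
    per_day = daily_output_mg(specific_productivity, viable_cells)
    if per_day <= 0:
        raise ValueError("daily output must be positive")
    return math.ceil(target_mg / per_day)


if __name__ == "__main__":
    cells = 2e9  # hypothetical culture size
    for q in (1, 10):  # typical range quoted for a stable producer cell line
        print(q, daily_output_mg(q, cells), days_to_target(100, q, cells))
```

Under these simplifying assumptions, a culture of 2 × 10^9 cells producing 1 to 10 mg per 10^9 cells per day would secrete roughly 2 to 20 mg per day.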
CHO cells have been used extensively as a host for stable expression of proteins. A number of CHO mutants have been developed that provide the user with tools to examine the synthesis of DNA, RNA, and protein as well as protein secretion, protein glycosylation, and intermediary metabolism. The most popular CHO sublines (also see Urlaub and Chasin, 1980) are CHO DXB11 (dhfr+/dhfr−) and CHO DG44 (dhfr−/dhfr−). The DXB11 cell was derived from CHO-K1 in 1978 by Lawrence Chasin. The CHO-K1 cells were originally derived in the 1950s, and the cells that were used in Chasin's laboratory for the development of the DXB11 line were obtained from Ted Puck and Fa-ten Kao at the Eleanor Roosevelt Cancer Institute in Denver in 1970. The DG44 cell line contains a double mutation in the dhfr genes and is not capable of natural reversion to the dhfr+ phenotype. The DG44 cell line was derived from CHO pro3− cells by Chasin in 1982. The CHO pro3− line was derived from the cell line established in the 1950s and is sometimes referred to as CHO Toronto, because it was used extensively in that city by Louis Siminovitch and colleagues.
©2013 BiologicsCorp. All rights reserved.