De Novo Protein Design: Using Generative AI to Create Synthetic Enzymes that Eat Plastic Waste
A practical developer guide to designing, validating, and iterating synthetic plastic-degrading enzymes with generative AI and computational pipelines.
Intro — why this matters to engineers
Plastic pollution is a trillion-piece materials problem. Biocatalysis offers a promising route: PETase hydrolyzes PET mainly to MHET, and MHETase then splits MHET into the monomers terephthalic acid and ethylene glycol. But natural enzymes are often slow, unstable, or poorly expressed. De novo protein design, which creates sequences and folds that never existed in nature, combined with generative AI gives engineers a practical path to design enzymes tailored to degrade specific plastics at industrial scale.
This post is a sharp, practical guide for developers building computational pipelines: which models and tools to use, how to validate designs in silico, how to prepare libraries for wet-lab testing, and what safety and ethical checks to adopt.
How de novo design pipelines are structured
At a high level the pipeline splits into four phases:
- Generative backbone and sequence design.
- Structure prediction and refinement.
- In silico scoring and filtering.
- Experimental hand-off and iterative evolution.
Each phase has automated components you can stitch together with workflow tools (Airflow, Nextflow, or a simple Python script). The common pattern: generate many candidates, apply increasingly expensive filters, and output a manageable library for wet-lab screening.
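The "cheap filters first" pattern can be sketched as a small funnel. This is a minimal illustration, not a prescribed implementation: the `Stage` class, the `run_funnel` helper, the metric names, and all thresholds are hypothetical, and the metrics are assumed to have been computed already by upstream tools.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Stage:
    name: str
    keep: Callable[[dict], bool]  # returns True if the candidate survives


def run_funnel(candidates: Iterable[dict], stages: list[Stage]):
    """Apply stages in order (cheapest first) and count survivors per stage."""
    survivors = list(candidates)
    counts = {"input": len(survivors)}
    for stage in stages:
        survivors = [c for c in survivors if stage.keep(c)]
        counts[stage.name] = len(survivors)
    return survivors, counts


# Toy candidates with made-up metric values (illustration only).
candidates = [
    {"id": "d1", "length": 260, "plddt": 88.0, "rosetta": -310.0},
    {"id": "d2", "length": 900, "plddt": 91.0, "rosetta": -400.0},
    {"id": "d3", "length": 250, "plddt": 55.0, "rosetta": -290.0},
    {"id": "d4", "length": 270, "plddt": 82.0, "rosetta": -120.0},
]

# Hypothetical thresholds, ordered from cheapest to most expensive check.
stages = [
    Stage("length", lambda c: c["length"] <= 400),
    Stage("plddt", lambda c: c["plddt"] > 70),
    Stage("rosetta", lambda c: c["rosetta"] < -200),
]

library, counts = run_funnel(candidates, stages)
print(counts)
print([c["id"] for c in library])
```

Logging the per-stage counts makes it easy to see which filter is removing most designs, which is usually the first thing to check when the final library comes out empty.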
Core building blocks (tools)
- Structure prediction: AlphaFold2, ESMFold. Use them to verify foldability.
- Sequence design: ProteinMPNN or ESM-IF1 for backbone-conditioned (inverse folding) design; protein language models such as ProtGPT2 or the ProtTrans family for diverse sequences.
- Energy and stability scoring: Rosetta, FoldX for ΔΔG and packing checks.
- Molecular dynamics: OpenMM or GROMACS for short stability runs around the active site.
- Docking and substrate modeling: AutoDock Vina, RosettaLigand, or flexible docking for polymer fragments.
Practical pipeline for developers
Here is a minimal, practical pipeline you can implement:
- Define your substrate model (a PET oligomer, a fragment representing the ester bond you want to cleave).
- Choose a catalytic template: use known active-site motifs (e.g., serine-histidine-aspartate triad) or extract catalytic residues from PETase-like structures.
- Generate backbone scaffolds either by: (a) scaffolding known folds around the motif, (b) hallucinating new backbones with generative models.
- Design sequences for each backbone using ProteinMPNN or language-model-guided sampling.
- Predict structures with AlphaFold/ESMFold and filter on RMSD to the scaffold and confidence metrics (pLDDT).
- Score remaining candidates with Rosetta energy, solubility predictors, and docking against the substrate model.
- Run short MD simulations for top candidates to check active-site integrity.
- Output a prioritized library (a dozen to a few hundred variants) for synthesis and experimental screening.
Example pseudocode (developer-friendly)
# Pseudo-Python pipeline outline; every function below is a placeholder.
scaffold_list = generate_backbones(template_motif)

# Keep each sequence paired with the backbone it was designed for.
sequence_pool = []
for backbone in scaffold_list:
    seqs = protein_mpnn_design(backbone, num_samples=50)
    sequence_pool.extend((seq, backbone) for seq in seqs)

# Keep designs that fold confidently and match their intended backbone.
predicted = []
for seq, backbone in sequence_pool:
    structure = alphafold_predict(seq)
    if structure.pLDDT_mean > 70 and rmsd_to_backbone(structure, backbone) < 2.5:
        predicted.append((seq, structure))

scored = score_with_rosetta(predicted)
top_candidates = select_top(scored, n=100)
md_results = run_md_on(top_candidates)
final_lib = prioritize_for_synthesis(
    top_candidates,
    md_results=md_results,
    criteria=["stability", "docking_score", "expressibility"],
)
Note: function names are illustrative. Replace with actual library calls or API clients.
Designing for plastic-degrading activity
Plastic degradation has unique constraints compared with small-molecule substrates:
- Substrate is polymeric and bulky. Docking single oligomers is an approximation.
- The active site should accommodate a chain end or an accessible loop and position the scissile ester bond for nucleophilic attack.
- Catalysis often benefits from substrate binding pockets that recognize aromatic rings (for PET) and position the ester.
Key engineering targets:
- Catalytic geometry: ensure nucleophile, acid/base, and oxyanion hole are properly oriented and within hydrogen-bonding distance.
- Surface loops: design flexible loops to bind polymer surfaces; consider cation-pi interactions for aromatic PET.
- Thermostability: higher temperatures can accelerate depolymerization. Aim for thermostable scaffolds or include stabilizing mutations from consensus design.
- Secretability and expression: add signal peptides or tags for secretion when designing for extracellular applications.
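The catalytic-geometry target can be turned into a quick distance check on a predicted model. The sketch below assumes a Ser-His-Asp triad and tests the two hydrogen bonds that usually define it (Ser OG to His NE2, and His ND1 to an Asp carboxylate oxygen). The function names and the 3.5 Å heavy-atom cutoff are assumptions chosen for illustration, and the coordinates are made up; in practice they would be read from the predicted structure.

```python
import numpy as np


def distance(a, b) -> float:
    """Euclidean distance between two 3D points, in the units of the input."""
    return float(np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)))


def triad_geometry_ok(ser_og, his_ne2, his_nd1, asp_oxygens, max_hbond=3.5) -> bool:
    """Check Ser OG..His NE2 and His ND1..Asp OD1/OD2 heavy-atom distances.

    max_hbond is an assumed cutoff in angstroms, not a universal constant.
    """
    ser_his = distance(ser_og, his_ne2)
    his_asp = min(distance(his_nd1, o) for o in asp_oxygens)
    return ser_his <= max_hbond and his_asp <= max_hbond


# Made-up coordinates (angstroms) for illustration only.
good = triad_geometry_ok(
    ser_og=(0.0, 0.0, 0.0),
    his_ne2=(2.8, 0.0, 0.0),
    his_nd1=(4.9, 0.0, 0.0),
    asp_oxygens=[(7.6, 0.0, 0.0), (8.5, 1.5, 0.0)],
)
broken = triad_geometry_ok(
    ser_og=(0.0, 0.0, 0.0),
    his_ne2=(5.0, 0.0, 0.0),  # serine too far from histidine
    his_nd1=(7.1, 0.0, 0.0),
    asp_oxygens=[(9.8, 0.0, 0.0)],
)
print(good, broken)
```

A distance check like this is only a coarse gate: it says nothing about angles, the oxyanion hole, or dynamics, so designs that pass still need the downstream docking and MD steps.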
Metrics to optimize:
- Predicted stability (ΔG, Rosetta score).
- pLDDT/confidence from AlphaFold.
- Docking/free energy of binding for a model oligomer.
- Predicted solubility and aggregation propensity.
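Because these metrics live on different scales and point in different directions, one simple way to combine them is a weighted sum of z-scores. The sketch below is an assumption-laden illustration: the metric names, the sign convention (lower Rosetta, docking, and aggregation scores treated as better), the equal default weights, and the example values are all hypothetical.

```python
import statistics

# +1 means higher is better, -1 means lower is better (assumed conventions).
METRICS = {"plddt": +1, "rosetta_score": -1, "docking_score": -1, "aggregation": -1}


def zscores(values):
    """Standardize values; return zeros if all values are identical."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [0.0 if sd == 0 else (v - mean) / sd for v in values]


def rank_designs(designs, weights=None):
    """Return (id, composite score) pairs sorted from best to worst."""
    weights = weights or {m: 1.0 for m in METRICS}
    totals = [0.0] * len(designs)
    for metric, sign in METRICS.items():
        for i, z in enumerate(zscores([d[metric] for d in designs])):
            totals[i] += sign * weights[metric] * z
    order = sorted(range(len(designs)), key=lambda i: totals[i], reverse=True)
    return [(designs[i]["id"], totals[i]) for i in order]


# Made-up metric values for three designs.
designs = [
    {"id": "b", "plddt": 80, "rosetta_score": -250, "docking_score": -6.0, "aggregation": 0.3},
    {"id": "c", "plddt": 60, "rosetta_score": -200, "docking_score": -5.0, "aggregation": 0.6},
    {"id": "a", "plddt": 90, "rosetta_score": -300, "docking_score": -8.0, "aggregation": 0.1},
]
ranking = rank_designs(designs)
print([design_id for design_id, _ in ranking])
```

Z-scores are relative to the current batch, so composite scores are not comparable across design rounds unless the batches are pooled before ranking.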
In silico validation: what to run and why
- Structure confidence: AlphaFold pLDDT gives coarse foldability. Reject low-confidence designs.
- Rosetta or FoldX ΔΔG: flag designs predicted to be destabilized by > 2 kcal/mol.
- Short MD (10–100 ns): check active-site geometry retention and backbone RMSD.
- Docking: sample multiple oligomer conformers and dock both chain-end and interior cuts.
- Mutational scanning: run single-point substitutions in silico to identify stabilizing mutations for later library design.
Automation tips:
- Use batching and asynchronous prediction to scale AlphaFold or remote model APIs.
- Cache intermediate results (predicted structures, scores) to avoid recomputation.
- Track provenance: store the exact model versions, hyperparameters, and random seeds.
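Caching and provenance can share one mechanism: key each result by a hash of everything that determines it, and store that record alongside the output. The sketch below is a minimal file-based version using only the standard library; `cache_key`, `cached_predict`, the JSON layout, and the `fake_predict` stand-in for a real predictor are all hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def cache_key(record: dict) -> str:
    """Stable SHA-256 over the inputs that determine a prediction."""
    blob = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def cached_predict(cache_dir: Path, predict_fn, sequence, model, version, params, seed):
    """Return a cached result if present; otherwise compute, store, and return it."""
    provenance = {
        "sequence": sequence,
        "model": model,
        "version": version,
        "params": params,
        "seed": seed,
    }
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{cache_key(provenance)}.json"
    if path.exists():
        return json.loads(path.read_text())
    payload = {"provenance": provenance, "result": predict_fn(sequence)}
    path.write_text(json.dumps(payload, sort_keys=True))
    return payload


calls = []


def fake_predict(seq):
    """Stand-in for an expensive structure prediction call."""
    calls.append(seq)
    return {"mean_plddt": 80.0}


with tempfile.TemporaryDirectory() as tmp:
    args = (Path(tmp), fake_predict, "MKT", "toy-model", "0.0", {"recycles": 3}, 0)
    first = cached_predict(*args)
    second = cached_predict(*args)  # served from disk, predictor not called again

print(len(calls))
```

Because the model version and seed are part of the key, upgrading a model or changing a hyperparameter naturally invalidates old entries instead of silently reusing them.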
Wet-lab handoff and iteration
Design begins in silico but lives or dies in the lab:
- Synthesize genes for top candidates, clone into expression vectors (E. coli, Pichia, or secretion systems as required).
- Establish high-throughput assays: colorimetric release assays, HPLC quantification of monomers, or mass-spec for depolymerization products.
- Use multiplexed fitness or droplet assays to screen thousands of variants, then apply next-round computational design informed by real activity data (active learning loop).
- Combine de novo designs with directed evolution libraries (error-prone PCR or targeted site-saturation) focused on binding pockets and thermostability residues.
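Targeted site-saturation libraries are easy to enumerate in silico before ordering them. The sketch below lists every single-residue substitution at chosen positions of a parent sequence, labeled in the usual wild-type/position/mutant style. The function name, the toy parent sequence, and the chosen positions are hypothetical, and the sketch works at the protein level only (no codon choice).

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def site_saturation(parent: str, positions: list[int]) -> dict[str, str]:
    """Return {label: sequence} for all single substitutions at 1-based positions."""
    variants = {}
    for pos in positions:
        if not 1 <= pos <= len(parent):
            raise ValueError(f"position {pos} is outside the parent sequence")
        wild_type = parent[pos - 1]
        for aa in AMINO_ACIDS:
            if aa != wild_type:
                label = f"{wild_type}{pos}{aa}"
                variants[label] = parent[: pos - 1] + aa + parent[pos:]
    return variants


# Toy parent sequence and positions, for illustration only.
library = site_saturation("MSHDA", positions=[2, 4])
print(len(library))
print(library["S2A"], library["D4E"])
```

Each position contributes 19 variants, so the library size grows linearly with the number of positions; combinatorial libraries across positions grow much faster and usually need experimental data to prune.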
Limitations, risks, and ethics
- Dual-use: Powerful enzymes that degrade plastics could also affect non-target polymers or ecosystems. Avoid open release without rigorous ecological risk assessment.
- False confidence in predictions: computational metrics are proxies. pLDDT or Rosetta scores do not guarantee catalytic activity.
- Intellectual property and materials: ensure clearance for commercial sequences, and be mindful of gene-synthesis screening policies.
- Regulatory: environmental release requires compliance and long-term monitoring.
> Practical developers must treat wet-lab validation and containment as first-class concerns, not afterthoughts.
Summary and checklist
- Understand the target substrate and catalytic motif before generating designs.
- Use generative models for diversity, but always filter with structure predictors like AlphaFold.
- Apply layered filtering: quick heuristics first, expensive MD/docking only on top candidates.
- Prioritize expressibility and thermostability alongside catalytic geometry.
- Design candidate libraries for directed evolution and plan for high-throughput assays.
- Implement provenance, versioning, and safety reviews at every handoff.
Checklist for a first experimental run:
- Substrate model (oligomer) prepared and parameterized.
- Catalytic motif defined and anchored in backbones.
- Sequence pool generated (hundreds to thousands).
- Structural prediction and confidence filtering completed.
- Energetic scoring and docking completed for top candidates.
- Short MD validation for final selection.
- Library synthesized and expression constructs validated.
- Assay pipeline and containment procedures in place.
De novo enzyme design for plastic degradation is now practical for developer teams with computational resources. The bottleneck is no longer imagination but rigorous filtering, safe experimental design, and a tight computational–experimental loop. Follow the pipeline above, instrument your workflow, and iterate based on real activity data—this is how generative AI moves from novelty to impact.