Rapid proteome-wide prediction of lipid-interacting proteins through ligand-guided structural genomics
Inventors & their inventions
Axial: https://linktr.ee/axialxyz
The paper "Rapid proteome-wide prediction of lipid-interacting proteins through ligand-guided structural genomics" by Chou et al. describes the development and application of a bioinformatic tool called SLiPP (Structure-based Lipid-interacting Pocket Predictor). The tool is designed to rapidly and accurately identify the proteins in a given proteome that interact with lipids.
Lipids are core components of cellular membranes and signaling pathways, and they are implicated in various diseases, including cancer and infectious diseases. Understanding how lipids and proteins interact is essential for working out their roles in cellular processes and for identifying potential therapeutic targets. However, lipid-binding proteins have been hard to identify, both because they lack conserved amino acid motifs and because current experimental techniques are limited.
SLiPP addresses this challenge by leveraging the growing availability of protein structures, both experimentally determined and computationally predicted using AlphaFold. The tool employs a machine learning algorithm trained on a dataset of lipid-binding and non-lipid-binding pockets extracted from experimentally determined protein structures in the Protein Data Bank (PDB). This training enables SLiPP to distinguish pockets that are likely to bind lipids from those that are not, based on their physicochemical properties.
The development of SLiPP involved several key steps. First, the researchers curated a dataset of protein structures with known ligands, including lipids and non-lipids. They then used the fpocket software to identify potential ligand-binding pockets within these structures, which were further categorized as lipid-binding pockets (LBPs), non-lipid binding pockets (nLBPs), and pseudo-pockets (PPs), representing unliganded pockets identified by fpocket.
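The three pocket categories can be illustrated with a small labeling sketch. This is not the authors' procedure: the distance-based contact rule, the 4.0 Å cutoff, the `label_pocket` function, and the short `LIPID_CODES` set are all assumptions made here for illustration. It only shows how a detected pocket could be assigned to LBP, nLBP, or PP depending on which ligand, if any, sits in it.

```python
import math

# Illustrative subset of PDB ligand codes treated as lipids (assumption):
# PLM = palmitic acid, OLA = oleic acid, CLR = cholesterol.
LIPID_CODES = {"PLM", "OLA", "CLR"}


def label_pocket(pocket_atoms, ligands, cutoff=4.0):
    """Label one pocket as 'LBP', 'nLBP', or 'PP'.

    pocket_atoms: list of (x, y, z) coordinates lining the pocket.
    ligands: list of (ligand_code, [(x, y, z), ...]) tuples.
    cutoff: contact distance in angstroms (hypothetical value).
    """
    touching = set()
    for code, coords in ligands:
        # A ligand "occupies" the pocket if any of its atoms is within
        # the cutoff of any pocket atom.
        if any(math.dist(a, b) <= cutoff for a in pocket_atoms for b in coords):
            touching.add(code)
    if touching & LIPID_CODES:
        return "LBP"   # at least one lipid in the pocket
    if touching:
        return "nLBP"  # only non-lipid ligands in the pocket
    return "PP"        # detected pocket with no ligand: pseudo-pocket


pocket = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
label_lipid = label_pocket(pocket, [("PLM", [(2.0, 0.0, 0.0)])])
label_other = label_pocket(pocket, [("ATP", [(3.0, 0.0, 0.0)])])
label_empty = label_pocket(pocket, [("ATP", [(50.0, 0.0, 0.0)])])
```

In this sketch a pocket contacted by both a lipid and a non-lipid is counted as an LBP; a real pipeline would need an explicit rule for such mixed cases.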
The researchers performed principal component analysis (PCA) on these pockets using 17 physicochemical descriptors provided by fpocket. The PCA revealed a clear separation between LBPs and nLBPs, particularly along the second principal component, which was dominated by hydrophobicity-related properties. This separation suggested that hydrophobicity plays a significant role in lipid binding, as expected given the hydrophobic nature of lipid tails.
Next, the researchers evaluated six different machine learning algorithms to build a classifier capable of accurately distinguishing LBPs from other types of pockets. They found that the random forest (RF) algorithm exhibited the best performance, achieving a high F1 score and accuracy. However, the initial model suffered from low sensitivity due to the highly imbalanced nature of the dataset, with a much larger number of PPs compared to LBPs and nLBPs.
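To see why a classifier can score high accuracy yet have low sensitivity on an imbalanced dataset, it helps to compute the metrics from a confusion matrix. The sketch below uses the standard definitions of accuracy, precision, sensitivity (recall), and F1; the counts are hypothetical and are not taken from the paper.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": f1,
    }


# Hypothetical example: 100 true LBPs hidden among 10,000 pockets.
# The classifier finds only 40 of them but rarely raises false alarms.
metrics = classification_metrics(tp=40, fp=5, fn=60, tn=9895)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, accuracy exceeds 99% even though 60% of the true lipid-binding pockets are missed, which is why sensitivity and F1 are more informative than accuracy when negatives dominate.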
To improve the sensitivity of the classifier, the researchers explored two strategies. First, they tested different ratios of PPs to LBPs in the training dataset, finding that a dataset with 20-fold more PPs than LBPs performed optimally. Second, they fine-tuned the hyperparameters of the RF algorithm to further enhance its performance, although the improvements were marginal. Ultimately, the researchers prioritized sensitivity over precision, opting for a model that produced more potential hits for further validation.
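The ratio-based rebalancing can be sketched as a random subsampling step. This is a minimal illustration that assumes each pocket is a dictionary with a `label` key; the `subsample_pseudo_pockets` function, the fixed seed, and the toy counts are hypothetical, and the paper's exact sampling procedure may differ. Only the 20:1 ratio comes from the text above.

```python
import random


def subsample_pseudo_pockets(pockets, ratio=20, seed=0):
    """Keep every LBP and nLBP, and at most `ratio` PPs per LBP."""
    lbps = [p for p in pockets if p["label"] == "LBP"]
    pps = [p for p in pockets if p["label"] == "PP"]
    others = [p for p in pockets if p["label"] not in ("LBP", "PP")]

    # Never ask for more pseudo-pockets than are available.
    n_keep = min(len(pps), ratio * len(lbps))
    rng = random.Random(seed)  # seeded for reproducibility
    kept_pps = rng.sample(pps, n_keep)
    return lbps + others + kept_pps


# Toy dataset: 5 LBPs, 10 nLBPs, 500 PPs.
pockets = (
    [{"label": "LBP"} for _ in range(5)]
    + [{"label": "nLBP"} for _ in range(10)]
    + [{"label": "PP"} for _ in range(500)]
)
balanced = subsample_pseudo_pockets(pockets, ratio=20)
n_pp = sum(p["label"] == "PP" for p in balanced)
```

Trying several values of `ratio` and comparing sensitivity on a held-out set would mirror the kind of comparison described above.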
The performance of the optimized SLiPP model was rigorously assessed using various test datasets, including an independent test dataset, a dataset of apo (ligand-free) structures, and a dataset of AlphaFold-predicted models. The model performed remarkably well on the independent test dataset, achieving an accuracy of 96.8% and an F1 score of 0.869. However, its performance on the apo and AlphaFold datasets was slightly lower, likely due to the inherent limitations of fpocket in accurately identifying ligand-binding pockets in the absence of ligands. Despite this limitation, the high precision of SLiPP makes it a powerful tool for discovering novel lipid-binding proteins.
To demonstrate the utility of SLiPP in real-world applications, the researchers applied it to predict lipid-binding proteins in the proteomes of three well-annotated organisms: Escherichia coli (E. coli), Saccharomyces cerevisiae (yeast), and Homo sapiens (human). They used AlphaFold-predicted structures for these proteomes, removing signal peptides and filtering out proteins with low confidence scores or fewer than 100 amino acids.
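The preprocessing described here can be sketched as a simple filter over predicted models. Several details are assumptions made for illustration: the `PredictedModel` record, the use of mean pLDDT (AlphaFold's per-residue confidence score) as the confidence measure, the cutoff of 70, and applying the length filter after signal peptide removal. Only the 100-residue minimum comes from the text.

```python
from dataclasses import dataclass


@dataclass
class PredictedModel:
    protein_id: str
    sequence: str
    mean_plddt: float            # average AlphaFold confidence, 0 to 100
    signal_peptide_len: int = 0  # 0 if no signal peptide is predicted


MIN_LENGTH = 100  # from the text: shorter proteins are excluded
MIN_PLDDT = 70.0  # hypothetical cutoff; the summary gives no value


def prepare_models(models):
    """Trim signal peptides, then drop short or low-confidence models."""
    kept = []
    for m in models:
        mature = m.sequence[m.signal_peptide_len:]  # remove signal peptide
        if len(mature) < MIN_LENGTH:
            continue
        if m.mean_plddt < MIN_PLDDT:
            continue
        kept.append((m.protein_id, mature))
    return kept


models = [
    PredictedModel("A", "M" * 120, mean_plddt=90.0),
    PredictedModel("B", "M" * 110, mean_plddt=90.0, signal_peptide_len=20),
    PredictedModel("C", "M" * 150, mean_plddt=50.0),
]
kept = prepare_models(models)
```

In this toy input only protein A survives: B falls below 100 residues once its signal peptide is trimmed, and C fails the assumed confidence cutoff.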
In the E. coli proteome, SLiPP identified 159 putative lipid-binding proteins, representing 3.6% of the proteome. Notably, many of the top hits were already annotated LBPs, providing confidence in the accuracy of the predictions. Additionally, SLiPP highlighted several uncharacterized proteins as potential lipid binders, suggesting new avenues for research.
Similarly, in the yeast proteome, SLiPP predicted 273 hits (4.5% of the proteome). Gene ontology (GO) enrichment analysis of these hits revealed a significant overrepresentation of lipid-related processes, further supporting the validity of the predictions. Interestingly, the analysis also pointed to a potential role of lipids in cation homeostasis and ion transport, suggesting an interplay between these processes.
In the human proteome, SLiPP identified 935 putative lipid-binding proteins (4.6% of the proteome). GO enrichment analysis again confirmed the enrichment of lipid-related processes among the hits, with a focus on transport functions. The researchers also noted that several hits were associated with diseases, highlighting the potential of SLiPP for identifying drug targets.
One compelling example is SLiPP's prediction that GDAP2, a protein implicated in spinocerebellar ataxia, is a lipid-binding protein. SLiPP identified a large hydrophobic pocket within the CRAL-TRIO domain of GDAP2 as a candidate lipid-binding site. Interestingly, pathogenic mutations in GDAP2 co-localize with this predicted pocket, pointing to a possible link between lipid binding and disease pathogenesis.
To understand the underlying features driving SLiPP's predictions, the researchers analyzed the importance of the 17 pocket descriptors used by the classifier model. They found that hydrophobicity-related features, particularly the hydrophobicity score and mean local hydrophobicity density, were the most important factors in distinguishing LBPs from other pockets. This finding reinforces the importance of hydrophobicity in lipid binding and explains the tendency of SLiPP to misidentify heme-binding pockets as LBPs due to their shared hydrophobic characteristics.
In conclusion, SLiPP represents a significant advancement in the field of lipid biology by providing a rapid and accurate method for identifying lipid-binding proteins on a proteome-wide scale. Because it relies on physicochemical properties rather than sequence homology, it can discover novel lipid-binding domains and expand our understanding of lipid-protein interactions. SLiPP has limitations, notably its inability to detect lipid-binding sites that form only upon protein oligomerization. Even so, its high precision and its ability to predict lipid-binding proteins in poorly studied organisms make it a valuable tool for advancing research in this field. The researchers anticipate that SLiPP will be particularly useful for identifying new drug targets for bacterial infections, by revealing lipid-binding proteins in pathogenic bacteria that rely on host lipids for survival.