Spatial Proteomics: Conquering High-plex Analysis of Spatial Proteomics Images

DOI: https://doi.org/10.47184/tp.2025.01.04

Spatially resolved proteomics was named Method of the Year 2024 by Nature Methods. To understand and reverse-engineer processes within the tumor microenvironment, in developmental biology, in autoimmune diseases or in other areas, researchers need a complete picture of the tissue and a deep characterization of its cells. Spatially resolved proteomics subsumes a group of methods that promise to facilitate exactly this: they capture the proteome while preserving its spatial origin, which makes it possible to uncover causal relationships and pathways based on observations of which cell types attract or repel each other. While fluorescence microscopy has long been used to investigate the co-expression of 2, 3 or 4 markers, spatial proteomics uses 10, 20, or even up to 100 markers in parallel, allowing unprecedented insights. Such a large cocktail of antibodies helps decipher the status and function of individual cells or find rare cell types. Establishing such panels in a lab and for a specific tissue is laborious and expensive; nonetheless, spatial proteomics has gained a lot of traction and has become available in many labs and core facilities. While instruments and kits are now more widely available, the bioinformatic analysis of such datasets remains a challenge. This article focuses on methods and gives an overview of primary and secondary image analysis.

Keywords: immunofluorescence, slide alignment, IHC, AI, computational pathology

Cyclic Immunofluorescence

The most widespread method for spatial proteomics is cyclic staining. Instruments that use this method, such as the Phenocycler by Akoya Biosciences, Comet by Lunaphore, MACSima by Miltenyi Biotec, CellScape by Canopy Biosciences or Cell DIVE by Leica Microsystems, are integrated stainers and fluorescence imagers. In a first cycle, they apply a nuclei marker (e.g. DAPI) plus a set of 3 or 4 additional antibodies conjugated to a standard set of fluorophores and capture this low-plex immunofluorescence image. Then, they wash off the applied antibodies, apply a second set of antibodies conjugated to the identical set of fluorophores and capture a second low-plex image. This procedure can be repeated for multiple cycles to obtain an increasingly high plexity. The nuclei marker can be imaged in each round, or at least intermittently, to serve as a reference for later aligning and fusing the set of low-plex images into a single high-plex image. An advantage of this method is the high plexity that can be achieved. A drawback is the long scan duration, which can range from many hours up to a day.
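
The alignment and fusion step can be illustrated with a minimal sketch. It assumes that cycles differ only by a whole-pixel translation and estimates that translation from the nuclei channel using phase correlation, a standard registration technique that the instruments listed above do not necessarily use. The function names and the array layout (channels, height, width) are illustrative assumptions; real data usually also needs sub-pixel and non-rigid correction.

```python
import numpy as np


def estimate_shift(reference_nuclei, moving_nuclei):
    """Return the integer (dy, dx) shift that maps the moving image onto the reference."""
    f_ref = np.fft.fft2(reference_nuclei)
    f_mov = np.fft.fft2(moving_nuclei)
    cross_power = f_ref * np.conj(f_mov)
    cross_power /= np.abs(cross_power) + 1e-12  # keep the phase only
    correlation = np.fft.ifft2(cross_power).real
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)
    # Peaks beyond the image midpoint correspond to negative shifts (FFT wrap-around).
    return tuple(int(p) if p <= s // 2 else int(p) - s
                 for p, s in zip(peak, correlation.shape))


def fuse_cycles(cycles, nuclei_channel=0):
    """Align each cycle (channels, height, width) to the first one and stack all channels."""
    reference = cycles[0][nuclei_channel]
    aligned = [cycles[0]]
    for cycle in cycles[1:]:
        dy, dx = estimate_shift(reference, cycle[nuclei_channel])
        # np.roll wraps around at the border; a real pipeline would pad or crop instead.
        aligned.append(np.roll(cycle, shift=(dy, dx), axis=(1, 2)))
    return np.concatenate(aligned, axis=0)
```

In this sketch the nuclei channel of every cycle ends up in the fused stack; keeping only the reference copy would be a reasonable refinement.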

Manual Sequential Immunofluorescence

When no such instrument is available, an alternative is to manually stain, scan and wash multiple cycles of antibodies and then use image registration software to merge the scans into a single high-plex image. Essentially any fluorescence microscope or scanner, such as the Zeiss AxioScan or Olympus VS200, can be used in this workflow.

High-plex Spectral Imaging

An alternative imaging method is used by instruments such as Orion by RareCyte (up to 20-plex) or PhenoImager by Akoya Biosciences (up to 6-plex). They are equipped with a multispectral optical system that can resolve a larger number of fluorescence channels, so no cyclic procedure is required. Instead, all antibodies can be captured in parallel. A drawback is that the achievable plexity of this method is more limited compared to cyclic IF. An advantage is that the scan durations are much shorter and that no co-registration of cycles, which carries the risk of introducing alignment errors, is required.

Co-Alignment of IHC Serial Sections

To a limited degree, spatial proteomics is also possible using brightfield immunohistochemistry. Resorting to this method has the advantage that IHC antibodies may already be established in many labs and the risk of wasting expensive antibodies or obtaining unreliable signals is low. Restaining the same tissue section is possible but rare. Staining serial sections is more common. After all sections are digitized, they need to be aligned to one another. The center section can serve as a reference, and adjacent sections are rotated, translated and non-rigidly morphed to obtain an image that can be overlaid on top of the reference. Nonetheless, co-expression at cellular resolution is hardly feasible with this approach, since the same cell is likely not visible in both sections. The downstream image analysis should consider this limitation and evaluate co-expression patterns on a local but slightly coarser level. The cell-cell connections analysis or grid analysis methods outlined below are suitable in this scenario.

Example Use Case: Biomarker Discovery

Before diving into the details of the bioinformatic analysis of spatial proteomics data, this section provides an example of where this technology is used. Researchers at the Max Delbrück Center for Molecular Medicine (MDC) in Berlin analyze tissue from a cohort of patients with head and neck squamous cell carcinoma (HNSCC) who have been treated with PD-L1 immune checkpoint therapy. All figures in this article were created with MIKAIA studio, developed by Fraunhofer IIS [1], on this cohort.

HNSCC arises in the laryngeal, pharyngeal, or oral cavities, and the main factors contributing to the development of these tumors are HPV infection and/or alcohol consumption and smoking. HPV-positive patients generally have a better prognosis due to immune system activation triggered by the infection itself. Patients with recurrent and metastatic tumors are treated with PD-L1 immunotherapy, provided they demonstrate positive PD-L1 expression. However, this therapy is only effective in 15 to 20 % of patients. Due to the lack of alternative therapies, some patients without PD-L1 expression also receive this form of immunotherapy. 

The goal now is to find alternative therapies as well as better prognostic markers for identifying patients who will benefit from immunotherapy. By locating and phenotyping the different cell types, such as immune and tumor cells, their distribution, abundance and interaction can be quantitatively measured. Cells of the same type can behave differently depending on their cellular communication within distinct neighborhoods [5]. This information is then used to compare patients who respond to therapy with those who do not.

AI Cell Segmentation

Regardless of which imaging method is used, the primary analysis steps are similar. Imaged fields of view are stitched and illumination-corrected; typically, this is already done by the scanning application. In the case of fluorescence microscopy, autofluorescence should then be measured and subtracted from the marker channel images. For cell segmentation, AI-based approaches such as CellPose, StarDist or Mesmer are superior to classical computer-vision methods, and their use is highly recommended despite the longer computation times, since all secondary analyses rely on robustly detected cells. If a membrane stain (or a cocktail of membrane stains) is available, it should be fed into the AI to delineate the true cell boundary. The DNA marker (e.g. DAPI or Hoechst) is additionally fed into the AI and serves as a seed for locating cells. While CellPose will find either the nuclei or the membrane contour, other AIs such as Mesmer can identify both in a single run. If no membrane stain is available, it is common to estimate the cell contour by dilating nuclei by a fixed radius.
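
The fixed-radius estimate of the cell contour can be sketched as follows. This is a minimal illustration, assuming a labeled nuclei mask (0 = background, one positive integer per nucleus) has already been produced by a segmentation model; the function name and the radius values are hypothetical. Each background pixel within the radius is assigned to its nearest nucleus, so neighboring cells do not overlap.

```python
import numpy as np
from scipy import ndimage


def expand_nuclei(nuclei_labels, radius_px):
    """Grow each labeled nucleus by up to radius_px pixels without merging neighbors."""
    nuclei_labels = np.asarray(nuclei_labels)
    # For every background pixel: distance to, and index of, the nearest nucleus pixel.
    distance, nearest = ndimage.distance_transform_edt(
        nuclei_labels == 0, return_indices=True
    )
    expanded = nuclei_labels[tuple(nearest)]  # label of the nearest nucleus (a copy)
    expanded[distance > radius_px] = 0        # too far away: stays background
    return expanded
```

The radius is given in pixels here; converting a radius in µm requires the pixel size of the scan.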

For brightfield IHC, cell segmentation AIs trained on IHC images can be used. An alternative is to deconvolve both stains (e.g. hematoxylin and DAB) and convert these channels into optical density (OD), which produces images that look similar to fluorescence images because pixels with a high (low) stain intensity map to brighter (darker) pixels in the resulting OD grey-level image (Fig. 1).
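
A minimal sketch of this conversion is given below. It assumes 8-bit RGB input with a white background of 255 and uses the commonly cited hematoxylin and DAB stain vectors attributed to Ruifrok and Johnston; these values, the function names and the cross-product residual channel are assumptions that should be calibrated per scanner and staining protocol. The sketch converts to OD first and unmixes there, which is the usual order for linear color deconvolution.

```python
import numpy as np

# Commonly cited RGB optical-density vectors for hematoxylin and DAB;
# treat them as placeholder assumptions and calibrate on your own slides.
HEMATOXYLIN = [0.65, 0.70, 0.29]
DAB = [0.27, 0.57, 0.78]


def rgb_to_od(rgb, background=255.0):
    """Optical density per channel: OD = -log10(I / I0). Dark pixels get high values."""
    rgb = np.clip(np.asarray(rgb, dtype=float), 1.0, background)
    return -np.log10(rgb / background)


def unmix_h_dab(rgb):
    """Return per-pixel stain amounts; last axis is (hematoxylin, DAB, residual)."""
    stains = np.array([HEMATOXYLIN, DAB, np.cross(HEMATOXYLIN, DAB)], dtype=float)
    stains /= np.linalg.norm(stains, axis=1, keepdims=True)
    od = rgb_to_od(rgb)
    # Model: od = amounts @ stains, hence amounts = od @ inverse(stains).
    amounts = od.reshape(-1, 3) @ np.linalg.inv(stains)
    return amounts.reshape(od.shape)
```

The hematoxylin and DAB channels returned by `unmix_h_dab` can then be passed to a fluorescence-style segmentation or measurement step.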

Cell Typing

Cell typing here means the step of identifying the cell phenotype. A straightforward approach is to measure the mean intensity per marker, decide for each marker whether it is expressed (positive) or not (negative), and then define a phenotype for each encountered combination of expressed markers. Some markers additionally require differentiating between weak and strong expression. Here, the selection of the threshold has a significant impact on the resulting type and might introduce a bias. An unbiased alternative is to determine the threshold per marker automatically, for instance using k-means clustering or Otsu thresholding on the measurements obtained per cell. In either case, regarding each combination of markers as a separate phenotype may serve well for small panels, but for large antibody panels the number of encountered combinations will likely become too high. A way around this problem is to group different marker combinations into a single phenotype. To this end, a cell type map (lineage) can be defined that contains a set of rules describing which markers are required to be positive or negative for a particular cell type. Example: call it a “T helper cell” when both CD3 and CD4 are expressed. All cells that fit this pattern will then be assigned this phenotype, unless a more specific rule is also met. This is a targeted approach, where the user knows what cell types they are looking for.

An untargeted approach is to determine cell phenotypes automatically in a purely data-driven manner, for which clustering algorithms in particular can be used. Depending on the algorithm, either the number of clusters must be predefined or the maximum variance a cluster may reach before it is broken down into subclusters. When working with such an unbiased method, it is critical to be able to investigate the properties of each cluster, i.e. to figure out the defining markers for a particular cluster and in turn identify the cell type based on expert knowledge or literature research. Here, it may be necessary to fuse or divide clusters until each cluster matches a known cell type.
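
The targeted route (automatic thresholds followed by a rule-based cell type map) can be sketched as below. This is an illustration under stated assumptions, not the implementation of any particular tool: only the T helper rule is taken from the text, the other rules are hypothetical, and "most specific" is interpreted here as the matching rule with the most marker conditions.

```python
import numpy as np

# Illustrative cell type map: marker -> required state (True = positive, False = negative).
# Only the T helper rule comes from the text; the other entries are hypothetical examples.
CELL_TYPE_MAP = {
    "T cell": {"CD3": True},
    "T helper cell": {"CD3": True, "CD4": True},
    "Cytotoxic T cell": {"CD3": True, "CD8": True, "CD4": False},
}


def otsu_threshold(values, bins=256):
    """Cut that maximizes the between-class variance (needs at least two distinct values)."""
    hist, edges = np.histogram(np.asarray(values, dtype=float), bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    weight_low = np.cumsum(hist)[:-1]        # cells below each candidate cut
    weight_high = hist.sum() - weight_low
    sum_low = np.cumsum(hist * centers)[:-1]
    sum_total = (hist * centers).sum()
    valid = (weight_low > 0) & (weight_high > 0)
    mean_low = sum_low[valid] / weight_low[valid]
    mean_high = (sum_total - sum_low[valid]) / weight_high[valid]
    between = weight_low[valid] * weight_high[valid] * (mean_low - mean_high) ** 2
    return float(edges[1:-1][valid][np.argmax(between)])


def call_markers(intensities):
    """intensities: marker -> per-cell mean intensities. Returns one positivity dict per cell."""
    thresholds = {marker: otsu_threshold(v) for marker, v in intensities.items()}
    n_cells = len(next(iter(intensities.values())))
    return [
        {marker: bool(intensities[marker][i] > thresholds[marker]) for marker in intensities}
        for i in range(n_cells)
    ]


def assign_phenotype(positivity, cell_type_map=CELL_TYPE_MAP):
    """Pick the matching rule with the most conditions, i.e. the most specific one."""
    matches = [
        (len(rule), name)
        for name, rule in cell_type_map.items()
        if all(positivity.get(marker, False) == state for marker, state in rule.items())
    ]
    return max(matches)[1] if matches else "unclassified"
```

A typical call would be `[assign_phenotype(p) for p in call_markers(intensities)]`, where `intensities` holds one list of per-cell mean intensities per marker.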

Gating, as in cytometry or FACS, is yet another alternative and could be regarded as a hybrid approach utilizing both biased and unbiased decisions. Here, various morphological and stain attributes are first measured for the cell population and then the user can configure various gates – simple ones such as a size threshold or complex ones such as a 2- or 3-dimensional subarea of a t-SNE or UMAP representation of the feature space – to successively divide the cell population into subclusters.
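
Successive gating can be sketched as a chain of filters over a per-cell feature table. The feature names (`area_um2`, `umap_x`, `umap_y`), the gate values and the helper names below are hypothetical; the embedding coordinates are assumed to have been computed beforehand with a t-SNE or UMAP implementation.

```python
def size_gate(min_area, max_area):
    """Simple gate: keep cells whose area lies within [min_area, max_area]."""
    return lambda cell: min_area <= cell["area_um2"] <= max_area


def rectangle_gate(x_key, y_key, x_range, y_range):
    """2D gate: keep cells inside a rectangular subarea of two features."""
    return lambda cell: (
        x_range[0] <= cell[x_key] <= x_range[1]
        and y_range[0] <= cell[y_key] <= y_range[1]
    )


def apply_gates(cells, gates):
    """Apply gates in order; return surviving cells and the population size after each gate."""
    population = list(cells)
    sizes = []
    for name, gate in gates:
        population = [cell for cell in population if gate(cell)]
        sizes.append((name, len(population)))
    return population, sizes


cells = [
    {"id": 1, "area_um2": 80.0, "umap_x": 1.0, "umap_y": 2.0},
    {"id": 2, "area_um2": 5.0, "umap_x": 1.2, "umap_y": 2.1},   # too small (debris)
    {"id": 3, "area_um2": 90.0, "umap_x": 8.0, "umap_y": -3.0},  # outside the 2D gate
    {"id": 4, "area_um2": 70.0, "umap_x": 0.5, "umap_y": 1.5},
]
gates = [
    ("size", size_gate(20.0, 400.0)),
    ("umap region", rectangle_gate("umap_x", "umap_y", (0.0, 2.0), (1.0, 3.0))),
]
selected, sizes = apply_gates(cells, gates)
```

Recording the population size after each gate makes it easy to see how many cells each gating decision removes.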

Proteogenomics

When FISH probes are used in addition to protein antibodies, information on both the transcriptome and the proteome is available and can be integrated. Lunaphore’s COMET instrument, for instance, can image a combined panel of protein markers and RNAscope probes. In terms of primary image analysis, the FISH signals need to be identified by running a spot detection algorithm on the FISH channels. Both AI methods such as DeepFish or DECODE and computer vision algorithms such as Big-Fish are available. Spatial transcriptomics instruments can identify hundreds of RNAs today, e.g. by encoding a particular RNA with a barcode that is successively built from only a handful of fluorescent probes imaged across multiple staining and imaging cycles. In integrated proteogenomics imaging, the available number of RNAs is more limited (e.g. 12 for RNAscope on a COMET). Detected transcripts are then assigned to a cell (cell-by-gene mapping). The abundance or ratio of transcripts in a cell can be used in the cell typing or deliberately held out in order to investigate correlations between transcriptome and cell phenotype.
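
The cell-by-gene mapping can be sketched as a lookup of each detected spot in the cell label mask. The sketch assumes that spot detection has already produced (y, x, gene) tuples in pixel coordinates and that the mask uses 0 for background; the function name and the gene names are hypothetical placeholders.

```python
from collections import Counter, defaultdict


def assign_transcripts(spots, cell_labels):
    """Count transcripts per cell and gene; spots on background or off-image are unassigned."""
    counts = defaultdict(Counter)
    unassigned = 0
    n_rows, n_cols = len(cell_labels), len(cell_labels[0])
    for y, x, gene in spots:
        row, col = int(round(y)), int(round(x))
        inside = 0 <= row < n_rows and 0 <= col < n_cols
        cell_id = cell_labels[row][col] if inside else 0
        if cell_id == 0:
            unassigned += 1
        else:
            counts[cell_id][gene] += 1
    return dict(counts), unassigned


cell_labels = [
    [0, 1, 1, 0],
    [0, 1, 1, 2],
    [0, 0, 2, 2],
]
spots = [
    (0.2, 1.1, "GENE_A"),
    (1.0, 2.0, "GENE_A"),
    (1.9, 2.8, "GENE_B"),
    (0.0, 0.0, "GENE_A"),  # lands on background
    (5.0, 5.0, "GENE_B"),  # lands outside the image
]
cell_by_gene, n_unassigned = assign_transcripts(spots, cell_labels)
```

The resulting per-cell counts can be appended to the protein measurements or kept separate, depending on whether the transcripts should take part in cell typing.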

Cellular Neighborhood Analysis

Once cells have been detected and labeled, a range of secondary analyses can be conducted to uncover spatial relationships. Nolan et al. introduced the concept of cellular neighborhoods (CN) [2]. In simple terms, this analysis centers on each cell and counts the abundance of cell types in the cell’s neighborhood. The neighborhood is defined by a fixed radius, by the k nearest cells or by a combination of both. This neighborhood analysis alone already provides valuable insights, such as which cell types are on average located next to a particular cell type of interest and how this composition changes with increasing distance. Cellular neighborhood types (“niches”) can then be detected by feeding the neighborhood composition of each cell into a clustering algorithm, which assigns each cell to a particular CN. It is then interesting to look at which CNs are adjacent to each other or whether the presence or frequency of a particular CN is of prognostic relevance (Fig. 2).
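
A minimal sketch of both steps (neighborhood composition, then clustering into CNs) follows. It is not the implementation from [2]: the function names are hypothetical, the center cell is excluded from its own neighborhood as a design assumption, distances are computed by brute force, and a plain k-means with deterministic farthest-point initialization stands in for whichever clustering algorithm is actually used.

```python
import numpy as np


def neighborhood_composition(coords, cell_types, k=10, max_radius=None):
    """Fraction of each cell type among the k nearest cells (optionally within max_radius)."""
    coords = np.asarray(coords, dtype=float)
    type_names = sorted(set(cell_types))
    type_index = np.array([type_names.index(t) for t in cell_types])
    # Brute-force distance matrix; fine for a sketch, use a spatial index for whole slides.
    distances = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(distances, np.inf)  # assumption: the center cell is not counted
    limit = np.inf if max_radius is None else max_radius
    composition = np.zeros((len(coords), len(type_names)))
    for i, row in enumerate(distances):
        nearest = np.argsort(row)[:k]
        nearest = nearest[np.isfinite(row[nearest]) & (row[nearest] <= limit)]
        if nearest.size:
            counts = np.bincount(type_index[nearest], minlength=len(type_names))
            composition[i] = counts / nearest.size
    return composition, type_names


def cluster_neighborhoods(composition, n_neighborhoods, n_iter=50):
    """Plain k-means on the composition vectors; returns one CN label per cell."""
    centers = [composition[0]]
    while len(centers) < n_neighborhoods:  # farthest-point initialization
        gap = np.min([np.linalg.norm(composition - c, axis=1) for c in centers], axis=0)
        centers.append(composition[np.argmax(gap)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        to_centers = np.linalg.norm(composition[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(to_centers, axis=1)
        for c in range(n_neighborhoods):
            if np.any(labels == c):
                centers[c] = composition[labels == c].mean(axis=0)
    return labels
```

Averaging the composition rows per cell type gives the "what is located next to type X" summary; the CN labels can be painted back onto the tissue to inspect which neighborhoods are adjacent.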

Cell-Cell Connections

A similar yet subtly different analysis is to investigate only direct neighborhood relationships. These can be detected using the Delaunay triangulation algorithm. The triangulation yields an undirected graph in which each node is a cell and each edge is a cell-cell connection. From this graph, connections that are too long can optionally be filtered out. Statistics can then be collected per connection type, where a type is defined by the cell types that are interconnected, e.g. “from macrophage to tumor cell”. It is then straightforward to obtain a bystander analysis, which states for a cell type A how many neighbors of type B it has on average. Similarly, the average distance in µm and its standard deviation can be collected. More advanced metrics such as the SpatialScore consider the ratio of two connections, e.g. in the original work, for each CD4+ T cell, the ratio of the distance to its nearest tumor cell relative to the distance to its nearest regulatory T cell [3]. Using advanced analysis, it is even possible to assign a direction to each edge in the graph, with the goal of identifying for neighboring cell types which cell was the attractor and which the attractee [4] (Fig. 3).
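
The graph construction and the bystander statistic can be sketched with SciPy's Delaunay triangulation. Coordinates are assumed to be cell centroids in µm; the function names, the length cutoff and the toy cell types are illustrative assumptions.

```python
import numpy as np
from scipy.spatial import Delaunay


def cell_cell_edges(coords, max_length=None):
    """Undirected cell-cell connections from a Delaunay triangulation of the centroids."""
    coords = np.asarray(coords, dtype=float)
    edges = set()
    for simplex in Delaunay(coords).simplices:
        for a, b in ((0, 1), (1, 2), (0, 2)):
            i, j = sorted((int(simplex[a]), int(simplex[b])))
            edges.add((i, j))
    if max_length is not None:  # optional filter for connections that are too long
        edges = {(i, j) for i, j in edges
                 if np.linalg.norm(coords[i] - coords[j]) <= max_length}
    return sorted(edges)


def bystander_average(edges, cell_types, type_a, type_b):
    """Average number of direct neighbors of type_b per cell of type_a."""
    n_cells_a = sum(1 for t in cell_types if t == type_a)
    if n_cells_a == 0:
        return 0.0
    n_links = 0
    for i, j in edges:
        if cell_types[i] == type_a and cell_types[j] == type_b:
            n_links += 1
        if cell_types[j] == type_a and cell_types[i] == type_b:
            n_links += 1
    return n_links / n_cells_a


# Toy example: one tumor cell surrounded by four macrophages.
coords = [(0, 0), (2, 0), (0, 2), (2, 2), (1, 1)]
cell_types = ["macrophage"] * 4 + ["tumor"]
edges = cell_cell_edges(coords)
```

Per-connection-type distance statistics follow the same pattern: collect the edge lengths for each pair of cell types and report their mean and standard deviation.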

When analyzing a stack of aligned IHC-stained serial sections, this type of analysis can be used to realize a “co-expression analysis with tolerance”, where two connected neighbors, unless they have the same type, stem from different serial sections. By constraining the connection length, it is possible to detect and quantify co-located marker occurrences even if it is not possible to detect co-expression within a single cell due to the inherent limitation of working with serial sections.

Quantifying Spatial Heterogeneity Using Grid Analysis

For many of the metrics listed above, it may not be sufficient to regard only the average or absolute abundance across the entire analyzed tissue region or across a larger region such as a tumor or tissue layer, because this does not capture any spatial heterogeneity that may be present. A grid analysis is a simple way to capture it: a virtual grid is positioned over the tissue area. The size of each tile in the grid is user-definable, e.g. 100 × 100 µm. Metrics are then measured per tile. Examples are the percentage of a particular cell type in that tile, the ratio of two cell types such as tumor to immune cells, or the tumor-stroma ratio. The measurement per tile can then be color-coded, which results in a heatmap that serves to visually recognize hotspots and judge uniformity versus heterogeneity. The analyzed region can be summarized by the histogram over the per-tile measurements, which in turn can be condensed further into a scalar such as the histogram’s peak magnitude, entropy or variance.
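
A minimal sketch of such a grid analysis is shown below, using the fraction of one cell type per tile as the metric. The function names, the number of histogram bins and the choice of summary values are assumptions; coordinates are taken to be (x, y) cell centroids in µm.

```python
import numpy as np


def grid_fraction(coords_um, cell_types, target_type, tile_size_um=100.0):
    """Per-tile fraction of target_type cells; tiles without cells are NaN."""
    coords = np.asarray(coords_um, dtype=float)
    tiles = np.floor(coords / tile_size_um).astype(int)
    tiles -= tiles.min(axis=0)                   # start the grid at the first occupied tile
    n_cols, n_rows = tiles.max(axis=0) + 1       # coordinates are (x, y)
    total = np.zeros((n_rows, n_cols))
    target = np.zeros((n_rows, n_cols))
    is_target = np.array([t == target_type for t in cell_types], dtype=float)
    np.add.at(total, (tiles[:, 1], tiles[:, 0]), 1.0)
    np.add.at(target, (tiles[:, 1], tiles[:, 0]), is_target)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(total > 0, target / total, np.nan)


def summarize_heterogeneity(fraction, bins=10):
    """Condense the per-tile histogram into scalars: peak magnitude, entropy, variance."""
    values = fraction[~np.isnan(fraction)]
    hist, _ = np.histogram(values, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    nonzero = p[p > 0]
    return {
        "peak": float(p.max()),
        "entropy_bits": float(-(nonzero * np.log2(nonzero)).sum()),
        "variance": float(values.var()),
    }
```

The array returned by `grid_fraction` can be displayed directly as a heatmap; low entropy and variance indicate a uniform distribution, high values indicate heterogeneity.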

Summary

The richness of information hidden within spatial proteomics data calls for a multitude of different analyses. This article provided an overview of some of the possibilities. Developing and applying custom algorithms is always an option but requires computer and AI scientists, as well as time and resources. An alternative is to opt for commercial or open-source tools that can be operated directly by medical domain experts. Prominent options are MIKAIA by Fraunhofer IIS [1], HALO by Indica Labs, Oncotopix Discovery by Visiopharm, QuPath or CellProfiler.

Authors
Dr. Volker Bruns
Group Leader, Medical Image Analysis
Fraunhofer IIS
Sonja Fritzsche
PhD student
Spatial Proteomics
Max Delbrück Center for Molecular Medicine (MDC)
Dr. Fabian Coscia
Head of the Spatial Proteomics research group
Max Delbrück Center for Molecular Medicine (MDC)