Cryo-electron tomography (cryo-ET) has the potential to reveal cell structure down to atomic resolution. Nevertheless, cellular cryo-ET data is highly complex, requiring image segmentation for visualization and quantification of subcellular structures. Due to noise and anisotropic resolution in cryo-ET data, automatic segmentation based on classical computer vision approaches usually does not perform satisfactorily. Communication between neurons relies on neurotransmitter-filled synaptic vesicle (SV) exocytosis. Cryo-ET study of the spatial organization of SVs and their interconnections allows a better understanding of the mechanisms of exocytosis regulation. Accurate SV segmentation is a prerequisite to obtaining a faithful connectivity representation. Hundreds of SVs are present in a synapse, and their manual segmentation is a bottleneck. We addressed this by designing a workflow consisting of a convolutional network followed by post-processing steps. Alongside, we provide an interactive tool for accurately segmenting spherical vesicles. Our pipeline can in principle segment spherical vesicles in any cell type as well as extracellular and in vitro spherical vesicles.
Introduction
The fine architecture of cells can be investigated by cryo-electron tomography (cryo-ET) (Bäuerlein and Baumeister, 2021). Cellular structures are preserved down to the atomic scale through vitrification and observation of the samples in a fully hydrated state. When a macromolecule is present in a sufficient number of copies in the cells imaged by cryo-ET, it is possible to obtain its atomic structure in situ using subtomogram averaging (Ni et al., 2022; Obr et al., 2022; Tegunov et al., 2021). Cellular cryo-ET datasets are usually extremely complex, making them difficult to analyze. This is aggravated by the sensitivity of biological samples to electron radiation, which limits the signal-to-noise ratio in cryo-ET datasets (Lucić et al., 2005). Tomographic reconstructions are generated from a series of images of the sample acquired at different viewing angles. The geometry of the samples prevents acquisition at certain angles, resulting in anisotropic spatial coverage. The resolution in the directions close to the axis of the electron beam incident on the untilted sample is strongly reduced. This effect, commonly referred to as the missing-wedge artifact, further complicates data analysis. In particular, organelles fully bounded by a membrane appear to have holes at their top and bottom (relative to the electron beam axis) (Lucić et al., 2005).
The synapse is a specialized cellular contact at which information is transmitted from one neuron to another, the presynaptic and postsynaptic neurons, respectively. In most cases, the signal is transmitted by the release of neurotransmitters into the intercellular space. Neurotransmitters are stored in synaptic vesicles (SVs) and are released following the fusion of an SV with the presynaptic plasma membrane. A synapse contains hundreds of SVs, and their mobility and recruitability for neurotransmitter release depend on inter-vesicle interactions through so-called connector structures (Fernández-Busnadiego et al., 2010; Radecke et al., 2023; Zuber and Lucić, 2019, 2022). The characterization of these interactions can be performed automatically with the Pyto software, which implements a hierarchical connectivity approach to segment connectors (Lucić et al., 2016). For accurate connector segmentation, precise segmentation of SVs is a prerequisite. To date, SV segmentation has been performed manually, but given the large number of SVs per dataset, it is an extremely time-consuming process. Typically, one person spends three to eight working days segmenting a single dataset. Attempts to perform this task automatically based on classical computer vision algorithms have not yielded sufficiently accurate results (Martinez-Sanchez et al., 2014). Subsequent automation efforts, such as 3D ART VeSElecT (Kaltdorf et al., 2017), enabled automatic segmentation of SVs in freeze-substituted resin-embedded samples. However, these methods still faced challenges in achieving the level of accuracy required for detailed analyses of SV pools and their interactions in fully hydrated, vitrified cryo-ET samples. To alleviate this situation, we decided to develop an approach based on deep learning.
Convolutional neural networks (CNNs) have shown promise in segmenting cryo-ET data and have been integrated into the EMAN2 package (Chen et al., 2017). However, while these approaches have been sufficient for some visualization purposes, they have not yet met the stringent requirements for segmenting tethers and connectors in detailed analyses such as those performed with Pyto. Later on, Imbrosci et al. described accurate SV segmentation of transmission electron microscopy images using a CNN, but this approach is limited to 2-dimensional (2D) images of resin-embedded synapses (Imbrosci et al., 2022). In the EMAN2 approach, cryo-ET data are decomposed into individual 2D slices, which are handed as separate inputs to the CNN. The independent output 2D prediction images are then reassembled into a 3-dimensional (3D) stack (Chen et al., 2017; Held et al., 2024). As discussed above, membranes oriented approximately parallel to the plane of the 2D tomographic images are not resolved. In the absence of contextual knowledge of the other 2D images, the CNN fails to segment these regions of the vesicles. Hence, spherical vesicles appear open, whereas we expect closed spherical objects. Recently, Zhou et al. addressed this issue by implementing a downstream fitting step based on a Gaussian process approach, allowing for the smooth closure of the membranes (Zhou et al., 2023). Ideally, 3D networks should be used to segment 3D cryo-ET data. In this context, several groups have published applications of 3D networks in cryo-ET for other tasks, such as particle picking and classification, to perform subtomogram averaging (de Teresa-Trueba et al., 2023; Lamm et al., 2022; Moebel et al., 2021). However, these papers have not focused on accurately segmenting membranes in cryo-ET data.
We opted to employ a 3D U-Net CNN to process 3D images as input (Çiçek et al., 2016). Weigert et al. (2018) implemented a U-Net for content-aware restoration (CARE) of 3D fluorescence microscopy datasets. They showed that it can restore information from anisotropic and very noisy datasets. Such networks have been used in the last couple of years in cryo-ET analysis, mainly to perform denoising and object detection (Buchholz et al., 2019; de Teresa-Trueba et al., 2023; Liu et al., 2022; Moebel et al., 2021). As is typically the case, segmentation methods can benefit from denoising techniques. However, a dedicated tool is still required to achieve accurate segmentation. Such a tool is essential for enabling detailed studies of SV pool regulation in the presynaptic terminal. Recent work combined a 3D network with multiple rounds of manual annotation to obtain better results, which comes with the extra cost of active learning (Lamm et al., 2024, Preprint). We implemented a 3D U-Net based on CARE building blocks and trained it with manually segmented datasets. This method provided good accuracy and was not strongly affected by the missing wedge artifact. Nevertheless, it was not sufficient for our downstream Pyto analysis. Hence, we developed a post-processing method, which transforms the segmented objects into spheres and refines their radius and center location. The workflow includes outlier detection based on the radial profile features of the segmented objects. These mis-segmented vesicles can then be either removed or refined. This leads to a substantial improvement in accuracy, which is reflected in Pyto performance comparable with that obtained after manual SV segmentation. We also introduced a semiautomatic method to quickly fix wrongly segmented or missed SVs. This tool can potentially be used to generate a larger training set.
Although our set of procedures was developed with the use case of SV segmentation in mind, it can be used to segment any other type of biological spherical vesicle, such as transport vesicles, secretory vesicles, endocytic vesicles, and extracellular vesicles. Furthermore, the method can also be extended to segment membrane-bound organelles that deviate mildly from a spherical shape, such as endosomes.
Results
Overview of training and test paradigm
In view of the effort required for the manual segmentation of SVs, we decided to develop an automatic segmentation procedure. Since we had previously manually segmented a number of tomograms with the program IMOD, we could use these segmentations as the ground truth (Kremer et al., 1996). We trained a U-Net with a set of nine segmented tomograms of rat synaptosomes (see Materials and methods).
We sought to further improve segmentation accuracy by feeding the probability map output by the U-Net to a series of post-processing steps (Fig. 1). Three sets of tomograms were used to assess the performance of the pipeline:
- (1) Train tomograms: the nine rat synaptosome tomograms that have been used for U-Net training.
- (2) Within-distribution test tomograms: nine additional rat synaptosome tomograms.
- (3) Out-of-distribution test tomograms: 12 mouse primary neuronal culture tomograms.
Additionally, to further explore the generalizability of CryoVesNet across data from different species and its potential applicability to non-synaptic spherical vesicles, we applied our method to a set of publicly available tomograms. Due to the absence of ground truth annotations for these datasets, we conducted a qualitative assessment of the segmentation results on these tomograms:
- (4) Generalization tomograms:
  - Excitatory and inhibitory synapses in cultured rat hippocampal neurons (EMD-30364, EMD-30365) (Tao et al., 2018).
  - In vitro reconstituted SVs (EMPIAR-10498) (Ginger et al., 2020).
  - Cell bodies in a Drosophila melanogaster brain cryo-FIB lamella (EMD-12727) (Bäuerlein et al., 2021, Preprint).
  - Axons in a human neuronal organoid culture (EMPIAR-10805) (Hoffmann et al., 2021).
  - Cell body in a Caenorhabditis elegans cryo-FIB lamella (EMD-4869) (Schaffer et al., 2019).
U-Net segmentation workflow
Our automated SV segmentation pipeline consists of several key steps, beginning with the generation of a probability map and then combining this with spherical refinement to optimize the segmentation. This approach aims to accurately identify and segment SVs in cryo-electron tomograms, addressing challenges such as noise and the missing wedge artifact inherent in cryo-ET data.
Each tomogram was split into patches of 32³ voxels. These patches were fed into the U-Net, which outputs a probability map for those patches. To obtain a complete probability map, the patches were stitched back together (Figs. 1 and S1). The resulting probability map underwent binarization using a global threshold to create an initial segmentation. Except where mentioned, we restricted the analysis to the region of the tomogram corresponding to the presynaptic terminal by masking it.
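The split, stitch, and binarize steps can be illustrated with a minimal NumPy sketch. It is not the CryoVesNet implementation: the function names, the zero padding, the non-overlapping tiling, and the threshold value of 0.5 are assumptions made for illustration, and the network itself is left out (any function mapping a patch to a probability patch could be applied between the two helpers).

```python
import numpy as np

def split_into_patches(volume, size=32):
    """Zero-pad a 3D volume to a multiple of `size` and cut it into cubic patches."""
    pad = [(0, (-s) % size) for s in volume.shape]
    padded = np.pad(volume, pad, mode="constant")
    nz, ny, nx = (s // size for s in padded.shape)
    patches = (padded.reshape(nz, size, ny, size, nx, size)
                     .transpose(0, 2, 4, 1, 3, 5)
                     .reshape(-1, size, size, size))
    return patches, (nz, ny, nx)

def stitch_patches(patches, grid, original_shape, size=32):
    """Inverse of split_into_patches: reassemble patches and crop the padding."""
    nz, ny, nx = grid
    full = (patches.reshape(nz, ny, nx, size, size, size)
                   .transpose(0, 3, 1, 4, 2, 5)
                   .reshape(nz * size, ny * size, nx * size))
    return full[tuple(slice(0, s) for s in original_shape)]

def binarize(probability_map, threshold=0.5):
    """Global threshold; the value 0.5 is a placeholder, not the published setting."""
    return probability_map > threshold
```

Reshaping rather than looping keeps the round trip exact, which makes it easy to check that stitching introduces no voxel shifts before a real network is plugged in.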
At this stage, we noticed that some vesicles were not segmented accurately. Indeed, vesicles in close proximity to one another were at times misidentified as a single entity. Separating them necessitated adjusting the detection threshold to a more stringent value (see “Adaptive local thresholding” section in Materials and methods and Fig. S2). Additionally, there were instances where the initial detection captured only a fraction of a vesicle. In this situation, our adaptive local thresholding strategy loosened the threshold for a more accurate segmentation. In any case, the assignment of a unique label to each segmented vesicle was essential for the subsequent analysis steps.
Sphericalization and radial profile-based refinement
Although at first glance the segmentation looked good after these steps, we noticed that it was not sufficiently accurate. For example, the vesicles were not always centered in the segment, or the radius of the segment was inexact. Very often, the vesicle segment looked shrunk in the z-direction, whereas the actual vesicles were spherical. This would be highly problematic for automatic connector and tether segmentation. To address these issues, we represented each vesicle as a sphere. We determined the center and radius of the sphere as described in the Materials and methods section. We then performed a radial averaging of the intensity around the center of the sphere and iteratively adjusted the position and radius of the sphere to match the actual structure in the tomogram (Fig. 2 and see Materials and methods). The radial profile refinement is a pivotal tool as it ensures that the segmented vesicles are a true representation of their form in the tomogram.
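The radial averaging step can be sketched as follows. This is an illustrative NumPy version only: the function name, the one-voxel shell width, and the `(z, y, x)` center convention are assumptions, and the iterative adjustment of center and radius described in Materials and methods is not reproduced here.

```python
import numpy as np

def radial_profile(volume, center, max_radius):
    """Mean intensity in one-voxel-thick spherical shells around `center` (z, y, x).

    Returns an array of length max_radius + 1 whose entry r is the average
    intensity of all voxels whose rounded distance to the center equals r.
    """
    zz, yy, xx = np.indices(volume.shape)
    dist = np.sqrt((zz - center[0]) ** 2
                   + (yy - center[1]) ** 2
                   + (xx - center[2]) ** 2)
    shell = np.rint(dist).astype(int)
    keep = shell <= max_radius
    sums = np.bincount(shell[keep], weights=volume[keep], minlength=max_radius + 1)
    counts = np.bincount(shell[keep], minlength=max_radius + 1)
    return sums / np.maximum(counts, 1)
```

Assuming the membrane appears darker than its surroundings, the position of the profile minimum gives a rough radius estimate, and a sharper, deeper minimum suggests a better-centered sphere.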
While research conducted on synapses from rodent brains indicates that SVs are predominantly spherical, elliptical vesicles can be observed with a higher prevalence in inhibitory synapses (Tao et al., 2018). To distinguish between spherical and elliptical vesicles, we adopted the criteria of Tao et al. (2018). Using this approach, we could either retain the initial segmentation of elliptical vesicles or discard them entirely (see Materials and methods). We demonstrate its use in an excitatory and an inhibitory synapse (Fig. 3). We did not restrict our analysis to the presynaptic terminal, as a higher proportion of elliptical vesicles was observed in regions outside of it. The majority of vesicles were spherical (blue), while some were elliptical (yellow). In a magnified view, elliptical vesicles are marked with asterisks (Fig. 3 C), and these vesicles were discarded in this instance. In contrast, Fig. 3 D shows the segmentation retaining them after thresholding. Although our network was initially trained on spherical vesicles, this figure indicates that it is also capable of detecting and segmenting elliptical vesicles.
Outlier detection and refinement
Despite the improvement brought by the radial profile refinement, some vesicles were still not segmented accurately. By quantifying several parameters of the segmented vesicles, such as radius, membrane thickness, membrane intensity, and lumen intensity, we were able to spot outliers using multivariate statistics, specifically by calculating the Mahalanobis distance (see Materials and methods). Our outlier detection method proved particularly effective in identifying three main types of segmentation inaccuracies: (1) non-vesicular structures mistakenly segmented as vesicles, (2) correctly segmented vesicles with abnormal characteristics (e.g., unusually large radius), and (3) misplaced vesicles with otherwise normal features.
This outlier detection step was important in preparing our data for downstream analyses, as it helped eliminate potential sources of error that could have skewed our results. An example of outlier detection is shown in Fig. 4. In this example, three outliers are highlighted. Outlier 1 (red, top row) corresponds to the mistaken segmentation of a non-vesicular membrane-bound structure. The high Mahalanobis distance of this outlier can be explained by a vesicle radius, membrane thickness, and intensity that are very different from the average of the dataset. Outlier 2 (green, middle row) is correctly segmented but is flagged for its abnormally high radius. Indeed, both the membrane thickness and intensity are close to the average of the dataset but the highly increased radius leads to a high Mahalanobis distance. Outlier 3 (blue, bottom row) is initially detected but is misplaced. Its radius was not divergent from the average but its membrane thickness and intensity were. We could then refine these outliers or remove them if refinement failed.
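The Mahalanobis-based flagging described above can be sketched in a few lines of NumPy. The feature layout (one row per vesicle, one column per parameter such as radius, membrane thickness, membrane intensity, and lumen intensity) follows the text, but the function name and any cutoff applied to the distances are hypothetical; the actual threshold is given in Materials and methods and is not reproduced here.

```python
import numpy as np

def mahalanobis_distances(features):
    """Distance of each vesicle's feature vector to the dataset mean.

    features: array of shape (n_vesicles, n_features).
    The covariance is estimated from the same data; a pseudo-inverse is used
    so that nearly collinear features do not cause a failure.
    """
    features = np.asarray(features, dtype=float)
    centered = features - features.mean(axis=0)
    cov = np.atleast_2d(np.cov(features, rowvar=False))
    inv_cov = np.linalg.pinv(cov)
    squared = np.einsum("ij,jk,ik->i", centered, inv_cov, centered)
    return np.sqrt(np.maximum(squared, 0.0))
```

Because the distance accounts for the covariance between features, a vesicle can be flagged either for one strongly deviating parameter (as for outlier 2) or for a combination of moderately unusual ones.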
Expanding segmentation scope
Traditional manual segmentation, while precise, is time-consuming and often limited in scope. In previous cryo-ET studies of presynaptic terminals, the analysis of spatial organization was restricted within 250 nm of the active zone to keep segmentation time reasonable (yellow SVs in Fig. 5). This limitation inherently narrows the scope of synaptic analyses. The advent of deep learning-based segmentation offers a promising alternative, providing both speed and scalability. We were able to segment all SVs (blue, Fig. 5 D) in a fraction of the time that manual segmentation would take. This increased efficiency enables comprehensive analysis of all SV pools, providing a more complete picture of synaptic organization and function.
Performance
The performance of all steps was quantitatively assessed by comparing the obtained segmentation with the ground truth using the Sørensen-Dice coefficient (DICE) metric (see Materials and methods). The DICE of the probability map was 0.80 ± 0.04 for the train tomograms, 0.78 ± 0.03 for the within-distribution test tomograms, and 0.71 ± 0.08 for the out-of-distribution test tomograms (Tables 1, 2, 3, S1, S2, S3, S4, and S5; and Fig. 6). The probability map was then binarized with a global threshold step, which led to a DICE of 0.80 ± 0.05 and 0.83 ± 0.05 in the train and within-distribution test datasets, respectively, and to a slight increase of DICE to 0.73 ± 0.10 in the out-of-distribution test dataset. While the localized thresholding step does not significantly improve the DICE, it is essential for reducing false negative vesicle detections. Specifically, the ability of this step to separate falsely connected vesicles is not reflected in the DICE score. This separation corrects instances where two distinct vesicles were initially identified as a single object. Although not captured by the DICE, this process is necessary for accurately determining the center and radius of individual vesicles. The radial profile refinement and final outlier removal steps led to a DICE of 0.88 ± 0.04, 0.85 ± 0.05, and 0.82 ± 0.08 for the train, within-distribution test, and out-of-distribution test datasets, respectively.
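For two binary masks A (prediction) and B (ground truth), the Sørensen-Dice coefficient is 2|A ∩ B| / (|A| + |B|). A minimal sketch for binary masks is given below; the function name and the convention of returning 1.0 for two empty masks are choices made here, and how the non-binary probability map is scored is described in Materials and methods rather than in this sketch.

```python
import numpy as np

def dice_coefficient(pred, truth):
    """Sørensen-Dice coefficient between two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        # Both masks empty: treated as perfect agreement (a convention).
        return 1.0
    return 2.0 * np.logical_and(pred, truth).sum() / denom
```

As a voxel-wise score, the value changes little when two touching vesicles are merged into one object, which is why the vesicle-wise measures are reported in addition.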
In addition to the DICE metric, which is a voxel-wise evaluation, we performed a vesicle-wise evaluation. Namely, we quantified vesicle diameter deviation and center residual (Fig. 6; and Tables 1, 2, and 3). The results show that our method transfers well across datasets even without fine-tuning, which indicates robustness and generalization, with an F1-score of 0.96 ± 0.04, 0.96 ± 0.03, and 0.90 ± 0.06 for the train, within-distribution test, and out-of-distribution datasets, respectively (Tables S3, S4, and S5).
Comparison with other methods
To provide a comparison with non-deep learning-based approaches, we assessed the performance of 3D ART VeSElecT, a pipeline designed to automatically segment SVs in electron tomograms of freeze-substituted samples (Kaltdorf et al., 2017). The workflow of 3D ART VeSElecT involves several preprocessing steps, including smoothing, contrast, and edge enhancement. The segmentation process involves image thresholding and the watershed algorithm for 2D and 3D segmentation. Fig. 7 A shows the cryo-ET dataset used for the test, while Fig. 7 B shows the 3D distance map obtained from the 3D ART VeSElecT algorithm before binarization and the final labels output by the algorithm. It is apparent that most SVs are not correctly segmented by this procedure. This can likely be attributed to the low signal-to-noise ratio inherent to cryo-ET.
We then went on to compare our approach to well-known 2D network architectures and the model implemented in EMAN2 (Chen et al., 2017). Held et al. (2024) have recently trained the latter model on a set of manually annotated SV cross-sections and they have subsequently automatically segmented additional SVs. For the comparison, we conducted extensive training experiments using several well-known 2D networks (Last et al., 2024, Preprint). These networks consisted of InceptionNet (Szegedy et al., 2015, Preprint), ResNet (Szegedy et al., 2017, Preprint), 2D U-Net (Ronneberger et al., 2015), VGGNet (Simonyan and Zisserman, 2014, Preprint), and the model in EMAN2 (Galaz-Montoya et al., 2015). They were trained under conditions similar to those used for our proposed method, ensuring a fair and accurate comparison (see Materials and methods and Fig. S3).
Fig. 7, C and D illustrate the probability maps and the labels after initial thresholding generated by the best-performing 2D network (VGGNet) and by CryoVesNet, respectively. While 2D networks can effectively capture information in the x-y plane, they struggle with the full 3D complexity of the data. This limitation is particularly evident when dealing with the missing wedge effect in cryo-electron tomography, which results in anisotropic resolution and incomplete information in the reconstructed volumes (Fig. S4). Consequently, 2D networks trained on these tomograms often miss the top and bottom parts of the vesicles along the z-axis, as can be observed in Fig. 7 C. This shortcoming highlights the need for a 3D approach like CryoVesNet.
We quantitatively compared performance, measured by DICE score, of six different deep learning models (CryoVesNet, VGGNet, 2D U-Net, EMAN2, ResNet, and InceptionNet) across all post-processing steps except adaptive thresholding (Fig. 8). This step was omitted due to its tendency to expand small vesicles in noisy probability maps, which were obtained by some 2D networks, leading to poorer results. The results show that our post-processing steps consistently improve segmentation performance across all assessed models. This highlights the potential of our pipeline as a versatile tool in cryo-ET data analysis. Despite the superior performance of 3D models in theory, the practical reality of cryo-ET data analysis often relies on 2D annotations due to time constraints and the complexity of 3D manual segmentation (Last et al., 2024, Preprint). Our approach bridges this gap, allowing researchers to leverage existing 2D annotations while benefiting from improved 3D segmentation accuracy. By providing a robust post-processing pipeline that enhances results across multiple architectures, we offer a practical solution that can be integrated into various existing workflows, potentially reducing the need for time-consuming 3D manual annotations while improving overall segmentation quality.
Generalization and interactive tool usage
To assess the generalization capabilities of our procedure, we used our tools on publicly available datasets on the Electron Microscopy Public Image Archive (EMPIAR) and Electron Microscopy Data Bank (EMDB). The generalization dataset contains in vitro–reconstituted SVs (EMPIAR-10498), cell bodies in a D. melanogaster brain cryo-FIB lamella (EMD-12727), axons in a human neuronal organoid culture (EMPIAR-10805), and a cell body in a C. elegans cryo-FIB lamella (EMD-4869). Due to the lack of ground truth annotations, quantitative performance metrics could not be calculated. However, visual inspection and qualitative analysis demonstrate the method’s potential applicability across diverse sample types (Fig. 9).
Although the pipeline efficiently and accurately segments vesicles, we typically get a few percent of regions mistakenly segmented as vesicles (false positives) and missed vesicles (false negatives; Tables 1, 2, and 3). To fix them, we developed an interactive tool using Napari, a multidimensional image viewer for Python. It enables users to manually remove incorrectly identified vesicles in a single click. Users can also click the approximate center of a missed vesicle and the tool will automatically refine the center position and find the radius of the vesicle. An example of the output of the tool is shown in Fig. 9; and Videos 1 and 2. The right column shows in green the vesicles that were correctly segmented by CryoVesNet without any user intervention. The wrongly segmented vesicles are depicted in red, while the missed vesicles were segmented using the Napari tool and are shown in blue. More details on the function of the tool are given in Materials and methods and in Videos 1 and 2.
Downstream analysis and application
Pyto is a software package designed for the analysis of pleomorphic membrane-bound molecular complexes in 3D images, particularly in the context of synaptic cryo-ET (Lucić et al., 2016). A key feature of Pyto is its ability to accurately segment connectors and tethers within the pre-synaptic terminal, a task that requires a high level of vesicle segmentation precision. This segmentation process is hierarchical and connectivity-based, detecting densities interconnecting vesicles (connectors) and densities connecting vesicles to the active-zone plasma membrane (tethers). CryoVesNet has been designed to be compatible with Pyto (Lucić et al., 2016) and an application is demonstrated in Fig. 10 and Video 3. This enables us to investigate SV connectivity and priming at the ultrastructural level to better understand the structural basis of SV exocytosis regulation. A detailed close-up of Fig. 10 D is provided in Fig. S5.
Discussion
Synaptic vesicles: Molecular interactions and functions
SVs play a central role in neurotransmission, facilitating the release of neurotransmitters into the synaptic cleft. These vesicles undergo a series of molecular interactions with various protein complexes, transitioning from a tethered to a primed state, and eventually to neurotransmitter release through exocytosis. Synapsins have been identified as key proteins in regulating the availability of SVs for exocytosis. It has been hypothesized that synapsins crosslink SVs, thereby preventing their premature release. Cryo-ET emerges as a powerful tool to address these challenges, offering unparalleled insights into the molecular architecture of synapses. Accurate segmentation of structures such as vesicles, connectors, and tethers is essential for a comprehensive understanding of synaptic function. Cryo-ET, however, is not without its challenges. The technique suffers from a high level of noise and anisotropic resolution (known as the missing wedge phenomenon), which complicates data analysis and interpretation. Addressing these challenges is crucial for obtaining clear and accurate tomographic reconstructions.
CryoVesNet: Automatic vesicle segmentation in Cryo-ET
By utilizing a U-Net architecture trained on manually segmented tomograms and post-processing steps, we have developed a system that can efficiently and accurately segment SVs in tomographic datasets. In particular, CryoVesNet is uniquely insensitive to the missing wedge and can segment complete vesicles even if the membrane is not fully visible in the tomogram. Recent methods like IsoNet learn about the effects of the missing wedge by artificially introducing additional missing wedge artifacts during training (Liu et al., 2022). This approach enables them to partially restore information lost due to the experimental missing wedge. IsoNet applies this strategy to reduce tomogram resolution anisotropy. In contrast, our approach takes a different path. CryoVesNet inherently addresses the missing information problem during training as the ground truth segmentations consist of perfect spheres. By learning from these idealized representations, the network can accurately segment vesicles even when faced with incomplete data in real tomograms. The results obtained from our method, as evidenced by the DICE and other evaluation metrics, demonstrate its robustness and accuracy. Notably, the applicability of our method across different datasets, namely from rat synaptosomes and primary neuronal cultures, underscores its versatility. The potential of CryoVesNet to generalize across species and to segment not only SVs but also other spherical membrane-bound organelles illustrates its broad utility in structural biology research. This adaptability suggests that researchers could potentially apply CryoVesNet to a broad range of studies involving spherical vesicles or similar structures. The enhancement in segmentation quality provided by the post-processing steps was observed for both our 3D U-Net and popular 2D networks. Such improvements underscore the crucial role of post-processing in refining results, especially when dealing with image noise and anisotropic resolution, which are characteristic of cryo-ET data.
In our segmentation approach, the use of both global and adaptive localized thresholding techniques further refines the segmentation, addressing challenges posed by closely packed vesicles. Our results highlight the effectiveness and robustness of our post-processing steps, including radial profile refinement and the removal of outliers. The radial profile, in particular, ensures that the segmented vesicles closely match their actual structure in the tomogram, providing a more accurate representation. Although our network was trained exclusively on spherical vesicles, it also detects and segments elliptical vesicles. This facilitates the analysis of inhibitory synapses, which consistently contain a modest yet significant proportion of elliptical SVs. Furthermore, our method’s compatibility with software like Pyto, which is designed for the analysis of pleomorphic membrane-bound molecular complexes in cryo-electron tomograms, enhances its utility. By integrating our segmentation approach with tools like Pyto, researchers can gain deeper insights into vesicle interactions.
By visualizing and quantifying changes in vesicle distribution, connectivity, and tether morphology, we can infer how molecular manipulations translate into structural alterations that ultimately affect synaptic transmission. We focus on nanometer-scale morphological analysis that quantifies SV distribution, tether morphology, and connectivity patterns. This level of analysis is essential for understanding the spatial organization and interactions of synaptic components, offering a comprehensive view of synaptic architecture that bridges the gap between molecular-level interactions and overall synaptic function.
Conclusion
In conclusion, CryoVesNet for automatic segmentation in cryo-ET represents a step forward in the study of SVs and their associated structures. By combining the power of deep learning with optimized post-processing techniques, we offer a solution that is both efficient and precise. As the emerging field of structural cell biology develops, tools like ours will contribute to advancing our understanding of complex cellular structures and processes. Future studies could further expand on this approach by combining our morphological analysis with higher-resolution techniques like subtomogram averaging, potentially providing a multiscale view of synaptic organization from individual protein complexes to overall vesicle arrangements. This integration of methods could significantly advance our understanding of the structure–function relationships in synapses and how they are regulated in various physiological and pathological conditions.
Materials and methods
Cryo-electron tomography datasets
In this study, we used datasets originating from either rat synaptosomes or mouse primary neuron cultures. They represent a total of 30 tomograms with heterogeneous pixel sizes, defocus, and resolution, and we split them into three groups: (1) Train set: nine synaptosome tomograms were used for training. (2) Within-distribution test set: nine independent synaptosome tomograms were used for testing. (3) Out-of-distribution test set: 12 neuron tomograms were used for assessing transfer learning potential. In addition, we used (4) a generalization test set of six tomograms from publicly available databanks to demonstrate the generalization across species. The preparation procedure of the samples from which the datasets were obtained as well as the biological analysis of these datasets was previously reported (Radecke et al., 2023).
Manual segmentation and automatic interboundary segment detection
Manual segmentation of SVs, the presynaptic cytoplasm, and the active zone plasma membrane was done in IMOD (Kremer et al., 1996). SVs were segmented as spheres. The presynaptic cytoplasm marked the region to be analyzed by Pyto (Lucić et al., 2016). Later on, we refer to this region as the cytoplasmic segmentation region. It consisted of the volume comprising the active zone and the cluster of SVs. The analysis by Pyto was essentially the same as described previously (Fernández-Busnadiego et al., 2010; Lucić et al., 2016). In short, the segmented region is divided into one-voxel-thick layers parallel to the active zone for distance calculations. A hierarchical connectivity segmentation detects densities interconnecting boundaries. The boundaries were SVs and the active zone plasma membrane. Detected intervesicular segments are termed connectors and segments connecting vesicles to the active zone plasma membrane are called tethers. Distance calculations respective to SVs were done from the SV center. The segmentation procedure is conservative and tends to miss some tethers and connectors because of noise. Consequently, the numbers of tethers and connectors should not be considered as absolute values, but rather used to compare experimental groups. As it was done before, an upper limit was set between 2,100 and 3,200 nm³ on segment volume. The tomograms that were used for this analysis were binned by a factor of 2–3, resulting in voxel sizes between 2.1 and 2.4 nm.
Train and validation set generation
In the preparation of our train set, we utilized segmented 3D image volumes. The primary volume was systematically divided into cubic sub-volumes of 32³ voxels. To ensure the relevance and richness of the data, only those sub-volumes that were sufficiently occupied by vesicles, specifically containing >1,000 vesicle-labeled voxels, were retained. In total, 900 sub-volumes were used for training and 200 sub-volumes were used for validation. In our experimental setup, we configured the model with a dropout rate of 0.2 to stabilize the validation loss.
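The sub-volume selection described above can be sketched as follows. This is a minimal illustration, not the CryoVesNet implementation: the function name `extract_subvolumes`, the non-overlapping tiling, and the synthetic demo arrays are assumptions made for the example. Only the 32-voxel cube edge and the >1,000 labeled-voxel criterion come from the text.

```python
import numpy as np

def extract_subvolumes(image, labels, size=32, min_label_voxels=1000):
    """Tile a volume into non-overlapping cubes (assumed tiling scheme) and
    keep only the cubes containing more than `min_label_voxels` label voxels."""
    patches = []
    nz, ny, nx = image.shape
    for z in range(0, nz - size + 1, size):
        for y in range(0, ny - size + 1, size):
            for x in range(0, nx - size + 1, size):
                sl = (slice(z, z + size), slice(y, y + size), slice(x, x + size))
                if np.count_nonzero(labels[sl]) > min_label_voxels:
                    patches.append((image[sl], labels[sl]))
    return patches

# Synthetic demo: one densely labeled cube and one sparsely labeled cube.
rng = np.random.default_rng(0)
demo_image = rng.normal(size=(64, 64, 64))
demo_labels = np.zeros((64, 64, 64), dtype=np.uint8)
demo_labels[4:20, 4:20, 4:20] = 1      # 4,096 label voxels: kept
demo_labels[40:45, 40:45, 40:45] = 1   # 125 label voxels: discarded
demo_patches = extract_subvolumes(demo_image, demo_labels)
```

The retained image and label cubes can then be split into train and validation subsets.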
Network architecture and training procedure
We used a U-Net with two downsampling stages and two convolutional layers per stage, with a kernel size of 3 and ReLU activation function, based on the open-source CARE framework (Fig. 1) (Weigert et al., 2018). We employed the Adam optimizer on a weighted (10:1) binary cross-entropy loss function. The learning progress was tracked by calculating the DICE and the loss value after each training epoch (Fig. S6). The DICE for the train set started at ∼0.25 and rose to just below 0.9 during training, whereas the validation DICE remained stable around 0.8 after 50 epochs. The loss for the train set decreased from over 1.2 to values close to 0.2 after 50 epochs, whereas for the validation set, the loss initially dropped slightly below 0.5 and then exhibited slight increases and fluctuations. A dropout rate of 0.2 was used to prevent overfitting. We used 50 sub-volumes per batch, and the training was conducted for 200 epochs.
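To make the weighted loss concrete, the following NumPy sketch computes a class-weighted binary cross-entropy. It assumes that the weight of 10 applies to vesicle (foreground) voxels and the weight of 1 to background voxels; the text only states the 10:1 ratio. The function name `weighted_bce` is hypothetical, and the actual training used the CARE/TensorFlow stack rather than this code.

```python
import numpy as np

def weighted_bce(y_true, y_pred, pos_weight=10.0, neg_weight=1.0, eps=1e-7):
    """Mean binary cross-entropy with separate weights for the two classes.
    Assumption: pos_weight applies to vesicle voxels (label 1)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)
    loss = -(pos_weight * y_true * np.log(y_pred)
             + neg_weight * (1.0 - y_true) * np.log(1.0 - y_pred))
    return float(loss.mean())

# A missed vesicle voxel is penalized ten times more than a false alarm
# of the same confidence.
missed_vesicle = weighted_bce([1.0], [0.1])
false_alarm = weighted_bce([0.0], [0.9])
```

With such a weighting, the sparse foreground class contributes more strongly to the gradient than the abundant background.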
Probability map construction
Our U-Net model, trained on 32³-voxel patches, utilizes a 24-voxel region of interest (ROI). To mitigate tiling effects during testing, the network input can be expanded to accommodate larger volumes, in this case 64³ voxels. The tomogram is padded to align with the ROI, which reduces edge artifacts. Segmentation is executed in tiles, where the U-Net predicts the SV probability for each tile. Only the central part of the segmented patch, corresponding to the ROI, is retained. Finally, the segmented tiles are reassembled, yielding a continuous SV probability map of the entire volume.
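The tiling scheme can be illustrated with the sketch below. It is written under several assumptions: the function name `predict_tiled`, reflect padding, and the symmetric margin around the ROI are choices made for the example, and `predict_fn` stands in for the trained U-Net. The defaults (tile of 32 voxels, ROI of 24 voxels) follow the values quoted in the text.

```python
import numpy as np

def predict_tiled(volume, predict_fn, tile=32, roi=24):
    """Predict a volume tile by tile, keeping only the central ROI of each tile."""
    margin = (tile - roi) // 2
    shape = np.array(volume.shape)
    n_tiles = -(-shape // roi)                 # ceiling division per axis
    padded_shape = n_tiles * roi
    # Pad so that every ROI-sized block is surrounded by a full margin.
    pad = [(margin, int(p - s) + margin) for p, s in zip(padded_shape, shape)]
    padded = np.pad(volume, pad, mode="reflect")
    out = np.zeros(tuple(padded_shape), dtype=np.float32)
    c = slice(margin, margin + roi)            # central ROI inside a tile
    for iz in range(n_tiles[0]):
        for iy in range(n_tiles[1]):
            for ix in range(n_tiles[2]):
                z, y, x = iz * roi, iy * roi, ix * roi
                patch = padded[z:z + tile, y:y + tile, x:x + tile]
                pred = predict_fn(patch)
                out[z:z + roi, y:y + roi, x:x + roi] = pred[c, c, c]
    # Crop the padding away to recover the original shape.
    return out[:shape[0], :shape[1], :shape[2]]

# Demo with an identity "network": the reassembled map must equal the input.
demo_volume = np.random.default_rng(0).normal(size=(50, 40, 30))
demo_out = predict_tiled(demo_volume, predict_fn=lambda patch: patch)
```

Because only tile centers are kept, each retained voxel is predicted with context on all sides, which is what suppresses seams between tiles.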
Global thresholding
Segmenting implies turning the probability map into a binary mask. To find the optimal threshold value, we iterated through potential threshold values ranging from 0.8 to 1 in increments of 0.01. A binary mask was generated for each threshold. Subsequently, an erosion operation was applied to the binary mask, and the difference between the original and eroded masks produced the vesicle shell. The voxel intensity values of the original image corresponding to this shell were recorded for each threshold. We minimized the average intensity of the shell voxels to determine the optimal threshold value since the shell of correctly segmented vesicles corresponds to the vesicle membrane, which in cryo-ET appears darker, i.e., with lower intensity values.
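A minimal sketch of this threshold search is given below. The helper names (`erode`, `find_global_threshold`), the 6-connected one-voxel erosion, and the synthetic test vesicle are assumptions for illustration; the text does not specify the structuring element used for the erosion.

```python
import numpy as np

def erode(mask):
    """One-voxel binary erosion with a 6-connected neighborhood (assumed)."""
    p = np.pad(mask, 1, mode="constant", constant_values=False)
    return (p[1:-1, 1:-1, 1:-1]
            & p[:-2, 1:-1, 1:-1] & p[2:, 1:-1, 1:-1]
            & p[1:-1, :-2, 1:-1] & p[1:-1, 2:, 1:-1]
            & p[1:-1, 1:-1, :-2] & p[1:-1, 1:-1, 2:])

def find_global_threshold(image, prob_map, thresholds=None):
    """Return the threshold whose mask shell has the lowest mean image intensity."""
    if thresholds is None:
        thresholds = np.linspace(0.80, 1.00, 21)   # 0.80 to 1.00 in steps of 0.01
    best_t, best_mean = None, np.inf
    for t in thresholds:
        mask = prob_map > t
        shell = mask & ~erode(mask)                # mask minus eroded mask
        if not shell.any():
            continue
        mean_intensity = float(image[shell].mean())
        if mean_intensity < best_mean:
            best_t, best_mean = float(t), mean_intensity
    return best_t, best_mean

# Synthetic vesicle: dark membrane between radii 7.5 and 10.5 voxels.
zz, yy, xx = np.indices((32, 32, 32))
r = np.sqrt((zz - 16) ** 2 + (yy - 16) ** 2 + (xx - 16) ** 2)
demo_image = np.where((r >= 7.5) & (r <= 10.5), -1.0, 0.0)
demo_prob = np.clip(1.0 - 0.02 * r, 0.0, 1.0)
best_t, best_mean = find_global_threshold(demo_image, demo_prob)
```

In the synthetic example the selected threshold places the shell entirely on the dark membrane, which is the behavior the criterion is designed to reward.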
Adaptive local thresholding
Segmentation refinement using radial profile
- (1) The radial average was computed:
- (2) The radius of the vesicle r was updated as:
- (3) The radial average was back-projected in three dimensions:
- (4) We computed by cross-correlation the shift between the obtained 3-D average and the 3-D image in the cubic box with central coordinates C and edge length l = 2r + c, where c is a constant. C was updated by subtraction of the shift.
- (5) Steps 1–4 were repeated for a maximum of 10 iterations until convergence or until a total shift of , where lo is the edge length of the initial box.
The feature space of predicted vesicle labels was computed, containing membrane thickness tm, membrane intensity ρ, and vesicle radius r. ρ was defined as the mean intensity of the radial average within the radial distance interval .
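Steps 1 to 3 of the procedure above can be sketched as follows. This is an illustrative approximation only: the binning of distances to the nearest integer, the function names, and in particular the radius update rule (taking the darkest ring of the profile as the membrane position) are assumptions, since the exact formulas are not reproduced here.

```python
import numpy as np

def _radial_bins(shape, center=None):
    """Integer-rounded distance of every voxel from the center."""
    if center is None:
        center = (np.array(shape) - 1) / 2.0
    grids = np.indices(shape)
    r = np.sqrt(sum((g - c) ** 2 for g, c in zip(grids, center)))
    return np.rint(r).astype(int)

def radial_average(volume, center=None):
    """Step 1: mean intensity as a function of distance from the center."""
    bins = _radial_bins(volume.shape, center).ravel()
    sums = np.bincount(bins, weights=volume.ravel())
    counts = np.bincount(bins)
    return sums / np.maximum(counts, 1)

def estimate_radius(profile, r_min=2):
    """Step 2 (assumed rule): radius of the darkest ring, i.e. the membrane."""
    return int(np.argmin(profile[r_min:]) + r_min)

def back_project(profile, shape, center=None):
    """Step 3: spread the 1D radial profile back into a 3D volume."""
    return profile[_radial_bins(shape, center)]

# Synthetic vesicle with a dark membrane at a radius of 9 voxels.
demo_bins = _radial_bins((33, 33, 33))
demo_vesicle = np.where(demo_bins == 9, -1.0, 0.0)
demo_profile = radial_average(demo_vesicle)
demo_radius = estimate_radius(demo_profile)
demo_average_3d = back_project(demo_profile, demo_vesicle.shape)
```

The back-projected volume is the spherically symmetric template that is cross-correlated with the image in step 4 to update the center.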
Outlier detection and refinement
With the computed P values, outlier detection and refinement were conducted. Each vesicle with a P value lower than a given threshold was defined as an outlier. In this study, we empirically set the threshold to 0.3, but other values can be used, depending on the use case. The radial profile and P value of the outliers were recalculated using a different box size. We performed this step iteratively. At each iteration, the box size was made larger by 2 × 2 × 2 voxels. For each outlier, the iteration stopped when its P value was higher than the threshold. A maximum of 10 iterations was performed. Vesicles whose P value remained below the threshold after these iterations were removed from the dataset.
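The control flow of this refinement loop can be sketched as below. The callable `score_fn`, which maps a box edge length to a P value, is a hypothetical placeholder for the radial profile recalculation; only the threshold of 0.3, the increment of 2 voxels per axis, and the limit of 10 iterations are taken from the text.

```python
def refine_outlier(score_fn, box_size, threshold=0.3, max_iter=10, step=2):
    """Enlarge the box until the P value reaches the threshold.

    score_fn is a hypothetical callable: box edge length -> P value.
    Returns (keep, final_box_size, final_p_value).
    """
    p_value = score_fn(box_size)
    n_iter = 0
    while p_value < threshold and n_iter < max_iter:
        box_size += step                 # grow the box by 2 voxels per axis
        p_value = score_fn(box_size)
        n_iter += 1
    return p_value >= threshold, box_size, p_value

# Toy scores: one vesicle recovers with a larger box, one never does.
recovered = refine_outlier(lambda b: b / 100.0, box_size=25)
rejected = refine_outlier(lambda b: 0.0, box_size=20)
```

In this toy run the first vesicle is kept once the box reaches 31 voxels, whereas the second is flagged for removal after 10 enlargements.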
Criteria for spherical vesicle selection
Eccentricity values range from 0 to 1, with 0 indicating a perfect sphere and values closer to 1 indicating more elongated shapes. Vesicles with an eccentricity <0.48 are considered spherical. We classify vesicles with an eccentricity exceeding 0.95 as too elongated to be considered SVs.
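As an illustration of these selection criteria, the sketch below classifies vesicles by eccentricity. The eccentricity formula used here, computed from the longest and shortest semi-axes, is a common definition and an assumption of this example; the class name "intermediate" for values between the two cutoffs is likewise hypothetical. The cutoffs 0.48 and 0.95 come from the text.

```python
import math

def eccentricity(semi_major, semi_minor):
    """Assumed definition: e = sqrt(1 - (b / a)^2), with a >= b > 0."""
    a, b = max(semi_major, semi_minor), min(semi_major, semi_minor)
    return math.sqrt(1.0 - (b / a) ** 2)

def classify_vesicle(ecc, spherical_max=0.48, elongated_min=0.95):
    """Apply the two eccentricity cutoffs."""
    if ecc < spherical_max:
        return "spherical"
    if ecc > elongated_min:
        return "too_elongated"
    return "intermediate"   # hypothetical name for the remaining range

labels = [classify_vesicle(eccentricity(10.0, b)) for b in (10.0, 9.0, 5.0, 3.0)]
```

Note how tolerant the spherical cutoff is: an axis ratio of 0.9 still gives an eccentricity of about 0.44.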
Method comparison
For our 2D networks, we adopted code from the Ais codebase (Last et al., 2024, Preprint). To ensure a fair comparison, we extracted slices with label density matching our model’s training set. These networks were trained under conditions identical to our proposed method, including the use of the same optimizer and class weight ratio (10:1). Training continued for 200 epochs, during which we observed a consistent decrease in training loss and an increase in validation DICE (Fig. S3). We utilized a batch size of 25 slices throughout the training process.
To compare our approach with non-deep learning algorithms, we employed the FIJI Macro 3D ART VeSElecT v2. To align our cryo-ET data more closely with the characteristics expected by this method, we applied nonlinear anisotropic diffusion (NAD) denoising and histogram matching as preprocessing steps. We used the default characterization parameters of the VeSElecT method, which include minimum radius, sphericity, and elongation. This preparation ensured that this non-deep learning approach could be applied to our dataset, allowing for a more meaningful comparison between traditional and deep learning-based segmentation techniques.
Interactive vesicle segmentation tool
The interactive Napari tool offers several key features for image visualization and segmentation editing. It displays the original cryo-ET image alongside an editable label layer representing the current segmentation. Users can add, modify, or remove vesicle annotations using a point-based interface. It is particularly useful for addressing false positives and false negatives in vesicle detection.
The tool provides various modification operations, including computing labels, removing labels, and adding spherical labels. The “compute labels” function uses a radial profile refinement step to segment vesicles based on user-specified points as vesicle center approximations. In this process, the function initially creates a bounding box around each point using a default diameter of 45 nm (larger than a typical SV diameter) as a starting point for analysis. The refinement step then examines the intensity profile radially from the center of each potential vesicle, adjusting the position and radius to best fit the actual vesicle boundaries in the image data iteratively (see “Segmentation refinement using radial profile” section). The “remove labels” function will remove any label under the user-specified points. Users can also employ the “add spherical labels” feature as a fully manual annotation method, in which they interactively define the center and radius of the label to add. To enhance user efficiency, keyboard shortcuts for common operations have been implemented. The procedure is demonstrated in Video 1.
Evaluation metrics
The evaluation framework was designed to assess the capabilities of the proposed toolbox for automatic SV segmentation. We defined as ground truth the manual segmentation of SVs. The evaluation was performed within the cytoplasmic segmentation region (see “Manual segmentation and automatic interboundary segment detection”). We performed per-vesicle evaluation and voxel-wise evaluation. For the former, we defined a vesicle as correctly segmented if the center of the predicted vesicle was located inside the ground truth vesicle. Based on that we calculated an F1 score. For the voxel-wise evaluation, we calculated the DICE between the prediction and the ground truth.
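The per-vesicle criterion can be sketched as follows for spherical ground truth vesicles. The function name `vesicle_f1` and the rule that each ground truth vesicle can be matched by at most one prediction are assumptions of this example; the text only states that a prediction is correct when its center lies inside a ground truth vesicle.

```python
import numpy as np

def vesicle_f1(pred_centers, gt_centers, gt_radii):
    """Per-vesicle F1 score; returns (f1, true_pos, false_pos, false_neg)."""
    pred = np.asarray(pred_centers, dtype=float).reshape(-1, 3)
    gt = np.asarray(gt_centers, dtype=float).reshape(-1, 3)
    radii = np.asarray(gt_radii, dtype=float)
    matched = set()
    tp = 0
    for p in pred:
        if len(gt) == 0:
            break
        d = np.linalg.norm(gt - p, axis=1)
        # Ground truth spheres containing this center and not yet matched.
        candidates = [int(i) for i in np.flatnonzero(d <= radii)
                      if int(i) not in matched]
        if candidates:
            matched.add(min(candidates, key=lambda i: d[i]))
            tp += 1
    fp = len(pred) - tp
    fn = len(gt) - tp
    denom = 2 * tp + fp + fn
    return (2.0 * tp / denom if denom else 0.0), tp, fp, fn

demo_gt = [(10, 10, 10), (30, 30, 30), (50, 50, 50)]
demo_pred = [(11, 10, 10), (30, 31, 30), (80, 80, 80)]
demo_f1, demo_tp, demo_fp, demo_fn = vesicle_f1(demo_pred, demo_gt, [5, 5, 5])
```

In the demo, two predictions fall inside ground truth vesicles, one is a false positive, and one ground truth vesicle is missed.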
Voxel-wise evaluation
The DICE was also employed to monitor all stages of post-processing of the labels and to observe the effect of each post-processing step.
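For reference, the DICE between two binary masks follows the standard definition 2|A ∩ B| / (|A| + |B|). A minimal NumPy sketch is given below; the function name `dice` and the convention of returning 1 for two empty masks are choices made for this example.

```python
import numpy as np

def dice(pred, truth):
    """DICE coefficient between two binary masks of identical shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    intersection = np.count_nonzero(pred & truth)
    total = np.count_nonzero(pred) + np.count_nonzero(truth)
    return 2.0 * intersection / total if total else 1.0

# Two slabs of 32 voxels each that overlap in 16 voxels.
mask_a = np.zeros((4, 4, 4), dtype=bool)
mask_b = np.zeros((4, 4, 4), dtype=bool)
mask_a[:2] = True
mask_b[1:3] = True
demo_dice = dice(mask_a, mask_b)
```

Applying the same function to the labels after each post-processing step gives the per-step DICE values referred to above.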
Vesicle diameter and position deviation
Similarly, the average deviation in position estimation across all vesicles can be expressed as Δc. It corresponds to the average Euclidean distance between the center of ground truth vesicles and the center of predicted vesicles.
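The position deviation Δc can be computed as in the sketch below, assuming that predicted and ground truth vesicles have already been paired so that row i of both arrays refers to the same vesicle. The function name and the optional `voxel_size_nm` scaling factor are hypothetical additions for this example.

```python
import numpy as np

def mean_center_deviation(gt_centers, pred_centers, voxel_size_nm=1.0):
    """Average Euclidean distance between paired vesicle centers."""
    gt = np.asarray(gt_centers, dtype=float).reshape(-1, 3)
    pred = np.asarray(pred_centers, dtype=float).reshape(-1, 3)
    distances = np.linalg.norm(gt - pred, axis=1)
    return float(distances.mean() * voxel_size_nm)

# Two paired vesicles with center offsets of 5 and 1 voxels.
delta_c = mean_center_deviation([(0, 0, 0), (0, 0, 0)], [(3, 4, 0), (0, 0, 1)])
```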
Statistical comparison
Multiple pairwise ANOVA comparisons with Benjamini-Hochberg correction were performed on the DICE values summarized in Table S2 to assess the statistical significance of the differences between the DICE values (Benjamini and Hochberg, 1995). We performed Benjamini-Hochberg correction with the multipletests function implemented in the Python module statsmodels (Seabold and Perktold, 2010). A list of P-values resulting from pairwise comparisons was input, and multipletests output a list of corrected P-values. The implementation used does not require a false discovery rate to be input. This variation of the original Benjamini-Hochberg correction algorithm was proposed by Yekutieli and Benjamini (1999). If a corrected P-value is smaller than the defined acceptable false discovery rate, then the null hypothesis is rejected, i.e., the difference is considered statistically significant. This algorithm makes it possible to test multiple false discovery rates in one step, and its conclusions are exactly the same as those of the original Benjamini-Hochberg correction algorithm run multiple times with different false discovery rates.
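The corrected P-values described above can be reproduced with a few lines of NumPy. The sketch below is a standalone illustration of the adjusted P-value form of the Benjamini-Hochberg procedure, not the statsmodels source; in statsmodels, the equivalent call is `multipletests(pvals, method="fdr_bh")`, whose second return value holds the corrected P-values. The input values are made up for the example.

```python
import numpy as np

def benjamini_hochberg_adjust(pvals):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the k-th smallest P-value by m / k.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, starting from the largest P-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted

demo_pvals = [0.01, 0.04, 0.03, 0.005]
demo_adjusted = benjamini_hochberg_adjust(demo_pvals)
significant_at_5_percent = demo_adjusted < 0.05
```

Comparing the adjusted values against any desired false discovery rate gives the same decisions as rerunning the original step-up procedure at that rate.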
Computational setup
All experiments were conducted using 4 × NVIDIA 2080 Ti GPUs with CUDA 10.1. The software environment was set up with Python 3. Key libraries and packages used include TensorFlow 2.4.1 with GPU support and Keras 2.4.3. Image visualization was achieved with UCSF ChimeraX and Amira 2022.2 (Thermo Fisher Scientific). The contouring shown in Fig. S2 was done using MATLAB version R2023a (The MathWorks, Inc.). Surface rendering was performed using the volume tracer and color zone in UCSF ChimeraX.
Manuscript preparation
The first version of the manuscript was written with the open and collaborative scientific writing package Manubot (Himmelstein et al., 2019).
Online supplemental material
Fig. S1 shows a 2D slice of an automatically segmented dataset, including a presynaptic terminal section and predicted probability maps. Fig. S2 illustrates the adaptive local thresholding applied to the probability map. Fig. S3 presents the training performance of various 2D networks. Fig. S4 shows the reconstructed neuron tomogram and probability map output of VGGNet and CryoVesNet. Fig. S5 provides a close-up of segmented SVs and connectors. Fig. S6 illustrates the DICE coefficient and loss values for both training and validation sets over epochs. Table S1 lists DICE statistical values across different algorithm steps for various datasets. Table S2 provides corrected P-values for DICE value comparisons between different algorithms. Table S3 summarizes per-tomogram train set metrics, including F1 scores and false positives. Table S4 presents metrics for within-distribution test sets. Table S5 details out-of-distribution test set metrics. Video 1 demonstrates the usage of the Napari Interactive Tool for segmenting SVs. Video 2 shows how to use the tool for segmenting SVs from scratch. Video 3 shows a tomogram and segmentation of a cultured mouse neuron synapse, highlighting various neuronal structures.
Data availability
The training dataset annotation, along with the corresponding raw data, binned tomograms, and processed tomograms, has been deposited to EMPIAR (Iudin et al., 2023) under accession code EMPIAR-12195. CryoVesNet is available on GitHub at https://github.com/Zuber-group/CryoVesNet.
Acknowledgments
This work was supported by the Swiss National Science Foundation (grant number 179520 to B. Zuber), ERA-NET NEURON (NEURON-119 to B. Zuber), and the University of Bern Research Foundation (to Ioan Iacovache).
Author contributions: A. Khosrozadeh: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Validation, Visualization, Writing - original draft, Writing - review & editing, R. Seeger: Data curation, Formal analysis, Investigation, Validation, Visualization, Writing - original draft, G. Witz: Methodology, Software, J. Radecke: Investigation, Resources, Writing - review & editing, J.B. Sørensen: Resources, Writing - review & editing, B. Zuber: Conceptualization, Data curation, Formal analysis, Funding acquisition, Methodology, Project administration, Software, Supervision, Writing - review & editing.
References
Author notes
Disclosures: The authors declare no competing interests exist.