The authors have declared that no competing interests exist.

Conceived and designed the experiments: TP AC. Performed the experiments: TP AC. Analyzed the data: TP. Contributed reagents/materials/analysis tools: TP AC SP. Wrote the paper: TP SP LS. Assisted with writing and analysis: LS.

Electron Microscopy (EM) image (or volume) segmentation has become increasingly important in recent years as a tool for connectomics. This paper proposes a novel agglomerative framework for EM segmentation that, given an over-segmented image or volume, accurately clusters regions belonging to the same neuron. Unlike existing agglomerative methods, the proposed context-aware algorithm divides superpixels (over-segmented regions) of different biological entities into different subsets and agglomerates them separately. In addition, this paper describes a “delayed” scheme for agglomerative clustering that postpones some of the merge decisions pertaining to newly formed bodies in order to generate a more confident boundary prediction. We report significant improvements in segmentation accuracy over existing standard methods on 2D and 3D datasets.

Extracting the network structure among neurons in the animal brain has recently gained prominence in neuroscience. Rapid advances in imaging technology, in particular Electron Microscopy (EM) techniques, have enabled us to trace neural bodies at an unprecedented level of detail. However, recording at such high resolution (nanometer scale) generates massive amounts of data that are too large to annotate manually. Automated region labeling, or segmentation, is considered the most viable strategy for generating a dense reconstruction of neural anatomy. Some recent reconstruction efforts yielded impressive results using machine learning and computer vision tools such as image segmentation, and offered valuable biological insights to the neuroscience community [

Image segmentation for natural scenes has a long history in computer vision literature [

Different approaches resort to different methods to generate the final segmentation by merging or clustering the over-segmented bodies into their corresponding neuronal cells. Andres et al. [

Biologically, the interior of a neuron cell comprises several distinct sub-structures (or sub-categories) such as cytoplasm, mitochondria, vesicles etc. An ideal binary pixel classifier—which assigns a pixel to one of the two categories: cell boundary and cell interior—should label all locations within these sub-categories to cell-interior class. Several past studies [

However, we believe the methods of [

(a) one plane of input volume, (b) mitochondria detection on that plane, (c) the output of GALA [

This paper introduces a context-aware scheme for combining over-segmented regions by utilizing the prior knowledge of sub-classes. We adopt an agglomerative or hierarchical clustering framework [

We also propose a modified version of the hierarchical clustering algorithm to cluster the superpixels in both phases of the context-aware framework. The proposed clustering method emphasizes minimizing under-segmentation errors, since these errors are conventionally costlier to correct than over-segmentation errors [

The paper is organized as follows. We define the problem in Section 2 and briefly describe the existing clustering-based segmentation algorithms in Section 2.1. We then explain the proposed delayed agglomeration scheme in Section 2.2. This delayed strategy is employed in both stages of our context-aware algorithm, which is discussed in Section 2.3. Section 3 reports our experimental setup, quantitative and qualitative results, and their analyses. We conclude and discuss our findings further in Section 4.

A formal definition of the problem we are addressing assumes an initial over-segmentation, comprising _{1}, _{2}, …, _{N}} ⊆ 𝓢, of an EM image or volume with _{i}, _{i}).

We denote a boundary between two superpixels (i.e., oversegmented regions) by a pair of regions _{i}, _{j}} and the set of all such boundaries by _{i} is considered to be a node and the boundary or face between two regions is regarded as an edge—a notation we will be using throughout the paper. Also, let the boundary label map _{i} to its corresponding _{i}) is similar to a clustering problem where the number of clusters cannot be computed a priori. Following [

In our context-aware scheme, the set of superpixels is divided into two subsets: 1) the set 𝓢_{c} of potential cytoplasm superpixels, and 2) the set 𝓢_{m} of potential mitochondria superpixels. The set of cytoplasm superpixels is clustered first with the proposed delayed agglomeration algorithm. Agglomeration of the mitochondria superpixels is also performed by the proposed delayed method, but with a different merge criterion. To help the reader appreciate the novelty of the proposed approach, we first introduce the prior studies on agglomerative clustering for EM segmentation [

Several existing EM segmentation approaches [_{i}, _{j}}) ∈ [0, 1] indicates how confident the estimator is about the existence of a true boundary between _{i} and _{j}: a large _{i}, _{j}}) implies the estimator is very confident that the boundary {_{i}, _{j}} is correct while a small value implies the boundary was probably generated as an artifact of over-segmentation and therefore is false. Given such a function

_{1}, _{2}, …, _{N} and confidence function |

_{1}, _{2}, …, _{N′} |

_{i}_{i} |

_{i}, _{j}} = _{e ∈ E} |

_{j} to _{i} and update |

_{b} ∈ _{j}) |

_{i}, _{b}}); |
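The standard agglomeration loop described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the region features (a mean intensity and a size), the function `toy_confidence`, and the version counters used to discard stale queue entries are all assumptions made for illustration. The sketch only shows the control flow: pick the least-confident boundary, merge its two regions, and recompute the confidences of the boundaries around the merged region.

```python
import heapq

def agglomerate(features, edges, confidence, threshold):
    """Greedy agglomeration: repeatedly dissolve the least-confident boundary.

    features   : dict region -> (mean, size); a toy stand-in for real features
    edges      : list of (a, b) pairs of adjacent regions
    confidence : callable(feat_a, feat_b) -> boundary confidence in [0, 1]
    threshold  : stop once the least-confident boundary reaches this value
    Returns a dict mapping every input region to the region that absorbed it.
    """
    feats = dict(features)
    nbrs = {r: set() for r in feats}
    for a, b in edges:
        nbrs[a].add(b)
        nbrs[b].add(a)
    parent = {r: r for r in feats}
    version = {r: 0 for r in feats}  # bumped on change; marks old entries stale
    heap = []

    def push(a, b):
        c = confidence(feats[a], feats[b])
        heapq.heappush(heap, (c, a, b, version[a], version[b]))

    for a, b in edges:
        push(a, b)

    while heap:
        c, a, b, va, vb = heapq.heappop(heap)
        if a not in feats or b not in feats:
            continue  # one endpoint was already absorbed
        if version[a] != va or version[b] != vb:
            continue  # confidence was computed before a later merge
        if c >= threshold:
            break  # every remaining valid boundary is at least this confident
        # Merge b into a and update the toy features of the merged region.
        (ma, sa), (mb, sb) = feats[a], feats[b]
        feats[a] = ((ma * sa + mb * sb) / (sa + sb), sa + sb)
        del feats[b]
        parent[b] = a
        version[a] += 1
        nbrs[a].discard(b)
        for n in nbrs.pop(b):
            if n != a:
                nbrs[n].discard(b)
                nbrs[n].add(a)
                nbrs[a].add(n)
        for n in nbrs[a]:
            push(a, n)  # recompute confidences around the merged region

    def root(r):
        while parent[r] != r:
            r = parent[r]
        return r

    return {r: root(r) for r in parent}

def toy_confidence(fa, fb):
    # Hypothetical estimator: difference of mean intensities, capped at 1.
    return min(1.0, abs(fa[0] - fb[0]))

features = {0: (0.10, 10), 1: (0.12, 10), 2: (0.80, 10), 3: (0.82, 10)}
edges = [(0, 1), (1, 2), (2, 3)]
labels = agglomerate(features, edges, toy_confidence, threshold=0.2)
print(labels)
```

In a real pipeline the confidence would come from a trained boundary classifier rather than from an intensity difference, and the features would be recomputed from the pixels of the merged region.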

Each time a superpixel border is dissolved in standard agglomerative clustering, it modifies the characteristic representations of the pixels within the superpixels and on the boundary. This demands the confidences of the estimator function

Our adaptation of segmentation starts with the boundary that has the lowest estimator confidence and repeatedly dissolves edges in ascending order of

This method is described in _{j} is absorbed into _{i}, we do not _{i}, _{b}} between the recently merged _{i} and its updated neighbors _{b}. We maintain a set of edges _{i}, _{b}} only if its confidence increases from that of {_{j}, _{b}} after _{j} is absorbed into _{i} (Line 2 in _{i}, _{b}}) decreases from previous value, are kept aside until there are no members left in _{i}, _{b}} is less than the agglomeration threshold (Line 2 in

_{1}, _{2}, …, _{N} and confidence function |

_{1}, _{2}, …, _{N′} |

_{i}_{i}, Flag( |

_{i}, _{j}} = _{e ∈ W} |

_{j} to _{i}, i.e., _{i} = {_{i}∪_{j}}, and update |

_{b} ∈ Nbr(_{i}) do |

_{i}, _{b}}) > _{j}, _{b}}) then |

_{i}, _{b}}) = ACTIVE; |

_{i}, _{b}}) = DELAY; |

Effectively, the proposed strategy ‘delays’ the merging of new edges {_{i}, _{b}} resulting from a merge: either due to an increase in _{i}, _{b}}) or deliberately if _{i}, _{b}}) decreases. To avoid propagating wrong decisions made on smaller superpixels to the larger ones, this design postpones the merge decisions on the newly formed bodies for a later time. Our analyses support that deferring decisions on these edges significantly reduces false merges during agglomeration.
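The delayed policy can be sketched in Python as follows. This is an illustrative reading of the description above, not the authors' code. The toy features, `toy_confidence`, and the helper `key` are hypothetical. One further assumption is made explicit in the comments: for a neighbor that touched only the absorbing region before the merge, the new confidence is compared with the previous confidence of that same edge.

```python
import heapq

ACTIVE, DELAY = "ACTIVE", "DELAY"

def key(a, b):
    # Canonical (sorted) name for the undirected edge {a, b}.
    return (a, b) if a < b else (b, a)

def delayed_agglomerate(features, edges, confidence, threshold):
    feats = dict(features)
    nbrs = {r: set() for r in feats}
    conf, flag = {}, {}
    for a, b in edges:
        nbrs[a].add(b)
        nbrs[b].add(a)
        conf[key(a, b)] = confidence(feats[a], feats[b])
        flag[key(a, b)] = ACTIVE
    parent = {r: r for r in feats}

    while True:
        # Build the queue from ACTIVE edges below the threshold only.
        heap = [(c, e) for e, c in conf.items()
                if flag[e] == ACTIVE and c < threshold]
        if not heap:
            break
        heapq.heapify(heap)
        while heap:
            c, (a, b) = heapq.heappop(heap)
            if flag.get((a, b)) != ACTIVE or conf[(a, b)] != c:
                continue  # edge vanished, was delayed, or was re-scored
            # Pre-merge confidence that each new edge is compared against.
            old = {n: conf[key(b, n)] for n in nbrs[b] if n != a}
            for n in nbrs[a]:
                if n != b:
                    old.setdefault(n, conf[key(a, n)])  # assumption, see lead-in
            for r in (a, b):
                for n in nbrs[r]:
                    conf.pop(key(r, n), None)
                    flag.pop(key(r, n), None)
                    nbrs[n].discard(r)
            del nbrs[b]
            (ma, sa), (mb, sb) = feats[a], feats.pop(b)
            feats[a] = ((ma * sa + mb * sb) / (sa + sb), sa + sb)
            parent[b] = a
            nbrs[a] = set(old)
            for n, old_c in old.items():
                nbrs[n].add(a)
                new_c = confidence(feats[a], feats[n])
                conf[key(a, n)] = new_c
                if new_c > old_c:
                    flag[key(a, n)] = ACTIVE  # sinks in the queue on its own
                    if new_c < threshold:
                        heapq.heappush(heap, (new_c, key(a, n)))
                else:
                    flag[key(a, n)] = DELAY   # set aside until queue is empty
        # The active queue is exhausted: release all delayed edges.
        for e in flag:
            flag[e] = ACTIVE

    def root(r):
        while parent[r] != r:
            r = parent[r]
        return r

    return {r: root(r) for r in parent}

def toy_confidence(fa, fb):
    # Hypothetical estimator: difference of mean intensities, capped at 1.
    return min(1.0, abs(fa[0] - fb[0]))

features = {0: (0.10, 10), 1: (0.12, 10), 2: (0.80, 10), 3: (0.82, 10)}
edges = [(0, 1), (1, 2), (2, 3)]
delayed_labels = delayed_agglomerate(features, edges, toy_confidence, 0.2)
print(delayed_labels)
```

The outer loop rebuilds the queue only from the edges that were set aside, which mirrors the observation that the delayed list is shorter than the full edge list.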

Asymptotically, the running time of the delayed algorithm remains the same as the traditional agglomerative clustering in the worst case. Instead of adding the adjacent boundaries to the priority queue, the delayed algorithm stores them in a separate list. Later, building a queue from this list would require _{1}) time where the length _{1} of new list must be smaller than that of the previous one (which contains all edges): _{1}.

Our implementation is tuned to reduce the running time of delayed agglomeration. Notice that a subset of adjacent boundaries is not pushed back or updated into the queue (Line 13 of _{2} and

The proposed context-aware agglomeration is composed of two different phases. We separate the set 𝓢_{m} of potential mitochondria superpixels from the set 𝓢_{c} of potential cytoplasm superpixels assuming the existence of an effective mitochondria superpixel detector (e.g., [_{c} are agglomerated first by the proposed delayed policy. Motivated by [_{c} is trained to act as the boundary predictor function for clustering the set 𝓢_{c} of cytoplasm superpixels. During _{c} training, mitochondria-cytoplasm borders are treated the same way as cell membrane.

In the second step, the mitochondria-cytoplasm edges are merged in the same delayed scheme as explained in Section 2.2, but with a different estimator function _{m}. In order to absorb mitochondria into corresponding cells, we apply the delayed-agglomeration algorithm with a small alteration. The _{c}, _{m}} ∣ type(_{c}) = Cyto, type(_{m}) = Mito, Flag({_{c}, _{m}}) = ACTIVE}; _{m}, _{c}}) to be the fraction of the total boundary of _{m} which separates _{m} from _{c}: _{m}, _{c}} with a mitochondria superpixel _{m} and a cytoplasm superpixel _{c}, the confidence is defined as _{m}({_{m}, _{c}}) = 1−_{m}, _{c}}).

In effect, the mitochondria superpixels are combined with the cytoplasm superpixels in the descending order of the overlap ratio between these two types of regions. That is, a mitochondria superpixel is merged into the adjacent cytoplasm region with the largest overlap between their boundaries. The combined cytoplasm-mitochondria superpixel created by such merge then identifies the next mitochondria superpixel with the largest overlap to absorb in the next step. We show snapshots of this process, at different values of _{m}, _{c}) in

The figure shows mitochondria superpixels absorbed into cytoplasm superpixels up to different values of overlap ratio _{m}, _{c})
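The second phase can be illustrated with a small Python sketch. It assumes that the shared boundary size between every pair of adjacent regions is available as a dictionary, writes the overlap ratio as `rho`, and omits the delay bookkeeping and the priority queue for brevity. The function name `absorb_mitochondria`, the region names, and the boundary sizes are hypothetical.

```python
def absorb_mitochondria(boundary, kind, threshold):
    """Merge mitochondria regions into cytoplasm regions by overlap ratio.

    boundary  : dict {(u, v): size of the boundary shared by u and v}
    kind      : dict region -> "cyto" or "mito"
    threshold : merge while 1 - rho stays below this value
    Returns a dict mito region -> cytoplasm region that absorbed it.
    """
    adj = {r: {} for r in kind}
    for (u, v), size in boundary.items():
        adj[u][v] = adj[u].get(v, 0) + size
        adj[v][u] = adj[v].get(u, 0) + size
    owner = {}
    while True:
        best = None
        for m in adj:
            if kind[m] != "mito":
                continue
            total = sum(adj[m].values())  # total boundary of the mito region
            for c, size in adj[m].items():
                if kind[c] == "cyto":
                    rho = size / total    # fraction of boundary facing c
                    if best is None or rho > best[0]:
                        best = (rho, m, c)
        if best is None or 1.0 - best[0] >= threshold:
            break
        _, m, c = best
        # Absorb m into c: m's other boundaries now belong to the merged body.
        for n, size in adj.pop(m).items():
            del adj[n][m]
            if n != c:
                adj[c][n] = adj[c].get(n, 0) + size
                adj[n][c] = adj[n].get(c, 0) + size
        owner[m] = c
    return owner

kind = {"A": "cyto", "B": "cyto", "m1": "mito", "m2": "mito"}
boundary = {("m1", "A"): 10, ("m1", "B"): 2, ("m1", "m2"): 6,
            ("m2", "B"): 2, ("A", "B"): 20}
owner = absorb_mitochondria(boundary, kind, threshold=0.5)
print(owner)
```

In this toy input, `m2` initially shares most of its boundary with the other mitochondria region `m1`. Once `m1` is absorbed into `A`, that shared boundary counts toward `A`, so `m2` follows it into the same cell.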

We have applied the proposed method to EM images of two different modalities: isotropic Focused Ion Beam Scanning Electron Microscope (FIBSEM) data and anisotropic serial section Transmission Electron Microscopy (ssTEM) data. For both types of input data, the image (volume for the isotropic data) is first over-segmented for the agglomeration to be applied on. In the following sections, we explain our over-segmentation process and the error measures used to evaluate segmentation performance before reporting the results on FIBSEM and ssTEM data in Sections 3.3 and 3.4 respectively.

We learn a classifier to assign each individual pixel into multiple categories, such as cell boundary, cytoplasm, mitochondria and mitochondria boundary, using the interactive tool Ilastik [

The set 𝓢_{m} of probable mitochondria superpixels is populated with all regions possessing mean mitochondria probability (estimated by our pixelwise RF classifier trained by Ilastik) above a certain threshold. The rest of the superpixels constitute the set 𝓢_{c} of possible cytoplasm regions. The training set for superpixel boundary classifier _{c} consists of all boundaries among members of 𝓢_{c} as well as the mitochondria-cytoplasm borders. Similar to [
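The partition of superpixels into the two sets can be sketched as below. The flat label list, the per-pixel probabilities, and the threshold value of 0.5 are made-up inputs for illustration; the paper only states that a threshold on the mean mitochondria probability is used.

```python
from collections import defaultdict

def split_superpixels(labels, mito_prob, threshold):
    """Partition superpixels by their mean mitochondria probability.

    labels    : superpixel label of each pixel (flattened)
    mito_prob : mitochondria probability of each pixel, same order
    threshold : mean probability above which a superpixel goes to the mito set
    Returns (cyto_set, mito_set).
    """
    total = defaultdict(float)
    count = defaultdict(int)
    for lab, p in zip(labels, mito_prob):
        total[lab] += p
        count[lab] += 1
    mito = {lab for lab in total if total[lab] / count[lab] > threshold}
    cyto = set(total) - mito
    return cyto, mito

labels = [1, 1, 1, 2, 2, 3, 3, 3]
mito_prob = [0.1, 0.2, 0.0, 0.9, 0.8, 0.4, 0.3, 0.2]
cyto, mito = split_superpixels(labels, mito_prob, threshold=0.5)
print(cyto, mito)
```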

We report segmentation error of both types, namely under- and over-segmentation, separately because one of these errors (under-segmentation) is costlier than the other. Split versions of variation of information (VI) [_{1}, …, _{M}}, and a segmentation (SG), _{1}, …, _{P}}, we compute the over-segmentation (OE) and under-segmentation (UE) errors by splitting the terms in VI and RE. For split-VI, the over- and under-segmentation are quantified as follows.
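One common way to split VI, shown here as a hedged sketch, uses the two conditional entropies: H(SG | GT) grows when a ground-truth region is split, and H(GT | SG) grows when distinct ground-truth regions are merged. The base-2 logarithm and the function name `split_vi` are assumptions, and the exact form used in the paper may differ.

```python
import math
from collections import Counter

def split_vi(gt, sg):
    """Return (over-segmentation, under-segmentation) parts of VI.

    gt, sg : ground-truth and predicted labels of each voxel, same order
    """
    n = len(gt)
    joint = Counter(zip(gt, sg))
    n_gt = Counter(gt)
    n_sg = Counter(sg)
    # H(SG | GT): uncertainty about the predicted label given the true one.
    oe = sum(c / n * math.log2(n_gt[g] / c) for (g, s), c in joint.items())
    # H(GT | SG): uncertainty about the true label given the predicted one.
    ue = sum(c / n * math.log2(n_sg[s] / c) for (g, s), c in joint.items())
    return oe, ue

gt = [1, 1, 1, 1, 2, 2, 2, 2]
sg = [1, 1, 2, 2, 3, 3, 3, 3]  # true region 1 is split into two pieces
vi_oe, vi_ue = split_vi(gt, sg)
print(vi_oe, vi_ue)
```

The sum of the two returned terms is the usual (unsplit) variation of information.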

We also quantify segmentation error by the average percentage (× 10^{−5}) of pairs of voxels falsely merged and falsely split by any method. Formally, the over-segmentation (OE) and under-segmentation (UE) errors are computed based on the following formula.
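A pair-counting version of these two errors can be sketched as follows. Dividing by the total number of voxel pairs is an assumption, since the normalization used in the paper is not recoverable from the text here, and the function name `split_rand_error` is hypothetical.

```python
from collections import Counter
from math import comb

def split_rand_error(gt, sg):
    """Return (falsely split, falsely merged) fractions of voxel pairs.

    gt, sg : ground-truth and predicted labels of each voxel, same order
    """
    total = comb(len(gt), 2)
    # Pairs placed together by both labelings.
    both = sum(comb(c, 2) for c in Counter(zip(gt, sg)).values())
    same_gt = sum(comb(c, 2) for c in Counter(gt).values())
    same_sg = sum(comb(c, 2) for c in Counter(sg).values())
    oe = (same_gt - both) / total  # together in GT, apart in SG
    ue = (same_sg - both) / total  # together in SG, apart in GT
    return oe, ue

gt = [1, 1, 1, 1, 2, 2, 2, 2]
sg = [1, 1, 2, 2, 3, 3, 3, 3]  # true region 1 is split into two pieces
re_oe, re_ue = split_rand_error(gt, sg)
print(re_oe, re_ue)
```

Counting pairs through the contingency table avoids enumerating all voxel pairs, which would be infeasible for a 520^{3} volume.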

^{3} volume and applied on two 520^{3} test volumes.

^{3} volume) and segmented two 520^{3} volumes 5 times and averaged their scores. We plot the average _{UE} and _{OE} respectively on x and y-axis respectively in plots on the left column of _{UE} and _{OE} errors on x and y-axis respectively on right columns of _{c} ∈ [0.1,0.2] which was used as stopping criterion for cytoplasm merging. For [

Top: Test volume 1 and bottom: Test volume 2. Left column shows split-VI error: _{UE} in x-axis, _{OE} in y-axis; right column shows split-RE: _{UE} in x-axis, _{OE} in y-axis. Each curve is the average of results in 5 trials. Each point represents either a stopping point for clustering or bias parameter for [

As the plots show, both the delayed agglomeration and two-phase segmentation process attained significant improvement over past methods: compare the performance of LASH (red +) with LASH-D (black x) and that of GALA (cyan *) with CADA variants (green square and blue circle). Compared to the rest of the techniques, the two variants of proposed methods, namely CADA-L and CADA-F, appear to achieve the most favorable segmentations by reducing the over-segmentation steeply without increasing the false merge numbers much. During segmentation, the delayed version decreases the time needed for segmentation approximately 5 times among the agglomerative approaches.

It is also worth mentioning that, in a two stage segmentation scheme, the performance of a depth limited RF (i.e., CADA-L, green square), learned without accumulating training set over multiple passes, is very similar to that of the standard RF (CADA-F, blue circle) trained over cumulative learning passes. Training full-depth RF (CADA-F) with multiple passes needed several hours whereas training a depth limited single iteration (CADA-L) required ≤ 5 minutes.

Three columns show segmentation outputs overlaid with random colors on three planes of the FIBSEM volume. The rows, from top to bottom, show the output of LASH-D, Global multicut [

Method | Run time (min) |
---|---|
LASH: Standard Agglomeration, context-oblivious | 5.35 ± 0.2 |
LASH-D: Delayed Agglomeration, context-oblivious | 2.72 ± 0.06 |
CADA: Delayed Agglomeration, context-aware | 4.69 ± 0.02 |
Global multicut | 7.13 ± 1.1 |

All the algorithms except CADA-L and Global multicut perform standard agglomeration multiple times (we repeated it 5 times) in order to obtain extensive training sets for superpixel boundary learning. Both CADA-L and the Global method used the same classifier, learned from the initial set of boundaries that existed in the over-segmented data (without training set augmentation).

In the following subsections, we analyze why the proposed strategies improve the segmentation performance over the existing approaches.

It is perhaps intuitive that traditional context-oblivious agglomeration will result in a higher degree of over-segmentation than the context-aware method. The mitochondria-cytoplasm borders have strong feature similarity with cell membranes, and consequently superpixel boundary predictors cannot perfectly distinguish between these two types of borders. Recall that, for segmentation, we need to dissolve the mitochondria-cytoplasm borders but retain the cell boundaries. To substantiate our claim, we trained a superpixel boundary classifier in a context-oblivious fashion (0: false cell membrane, 1: true cell membrane) and computed its confidences on these two types of boundaries.

The plot is clipped at

In addition, due to appearance dissimilarity, the distribution of same features computed on cytoplasm and mitochondria will be substantially different from each other. Combining these two types of feature value distribution will impede the identification of false boundaries between cytoplasm superpixels such as the one in the lower left corner of the output of GALA in

In practice, mitochondria from two different cells could also lead to false merges. Often the mitochondria regions from two cells are located close to the cell membrane, or to other mitochondria regions from neighboring cells, blurring the boundary. Figs

The split-VI plot in _{c}(

were incorrectly split (over-segmented) by the Global method,

were correctly merged by proposed algorithm.

Left column: test volume 1, right column: test volume 2. Each curve is the average of results in 5 trials. Each point represents either a stopping point for clustering or bias parameter.

These boundary predictions were plotted on x-axis of _{c}(_{c} = 0.2.

Left: False splits (over-segmentation) of the Global method corrected by the proposed CADA-L. Each point corresponds to a false boundary that the Global method failed to dissolve. The x-axis shows the predictor confidence at the beginning of the proposed agglomeration and the y-axis shows the predictor confidence at the point where the boundary was correctly merged by the agglomeration. Right: False merges (under-segmentation) of standard agglomeration corrected by the delayed method (x-axis: boundary indices, y-axis: predictor confidence). The confidences computed for the same correct edge in traditional agglomeration and in the proposed delayed version are plotted as blue squares and red ‘+’ marks, respectively. The confidences on many true boundaries were increased by the delayed approach.

Note that, the agglomerative process correctly reduced the confidences of many false boundaries that received a high score by the predictor at the beginning (high x value but low y value). This refinement is possible through the evolution of the superpixels in the agglomerative process—an advantage the Global method of [_{c}(_{c} within the rectangular region in

In order to illustrate the improved accuracy attained by the delayed agglomeration over the standard one, we collected all _{c} = 0.14 The confidences (clipped to 0.25) of these 534 edges generated by standard and delayed agglomeration are plotted in _{c} of many of these faces, among which, 41 exceeded the threshold of 0.14 (green line) and avoided a false merge. In addition to these common supervoxel edges, the standard and delayed algorithms independently generated 163 and 4 more incorrect merges respectively.

This section reports the 2D segmentation results that our method and others produced on a different data modality, namely ssTEM images. These images were part of those generated for the work of [

Left column shows split-VI error: _{UE} in x-axis, _{OE} in y-axis; right column shows split-RE: _{UE} in x-axis, _{OE} in y-axis. The curves are averages of errors on 15 1000 × 1000 images. The results of Global method [

In

The segmentation outputs are overlaid with random colors on the grayscale images. Top row: input; middle row: GALA; bottom row: proposed CADA-L. Significant over-segmentation and under-segmentation errors are marked with yellow rectangles and red ellipses, respectively.

We argue that, due to considerable ambiguity in appearances, it is only rational for an EM segmentation algorithm to be context-aware in each of its stages, i.e., in both pixel and superpixel levels (and in alignment for anisotropic data). The results reported in this paper support our claim that a context-aware clustering of sub-classes such as cytoplasm and mitochondria can improve segmentation accuracy significantly given fairly accurate sub-class detection. Our examination of both isotropic and anisotropic data suggests cell structures cannot be meaningfully identified without mitochondria regions and it is non-trivial to combine detection with a segmentation that ignores it (e.g., [

In addition to reducing the over- and under-segmentation errors, one of the variants of our classifier, namely CADA-L, can be trained considerably faster than those in other methods because CADA-L demands substantially fewer training examples and no training iterations. A context-oblivious strategy gains significantly (compare LASH-D with GALA in

We further investigated this conjecture and developed a semi-supervised active learning algorithm to train the supervoxel boundary classifier with fewer than 20% of the total examples [

We have applied our context-aware algorithm to segment 216 FIBSEM volumes of 520^{3} voxels each, with a 10 nm isotropic resolution, from the Medulla region of fly retina. To our knowledge, this is an attempt to reconstruct one of the largest volumes for such an animal. Compared to the result of [^{3} blocks, our segmentation resulted in an estimated 30% reduction in subsequent manual correction time. In addition, our segmentation was sufficiently accurate for regions that pertain to post-synaptic densities (PSDs), i.e., the synaptic partners of a cell. During the manual annotation of these PSDs, the output of our segmentation method helped the experts improve their performance [

The authors wish to thank Stuart Berg and Bill Katz for their contributions in software development; Pat Rivlin and Shinye Takemura for providing the annotated dataset of [