Neurobot Notebook Welcome

Neurobot is a collection of tools for analyzing connectomic datasets, and the HHMI Janelia Farm seven-column-medulla dataset in particular. The original inspiration came from experimenting with Pawel Dlotko's Neurotop software, which implements a class of simplicial complex called a directed flag complex, described in a paper published on arXiv [1]. The Dlotko et al. paper is self-contained in terms of explaining the relevant mathematics, but you might want to look at David Cox's excellent primer on clique topology for a painless introduction.

In addition to specialized graph and visualization tools, Neurobot includes a convolution operator that applies Dlotko's code to compute topologically invariant properties of the subgraphs embedded in spherical subvolumes, as defined by diameter and stride parameters. These properties are used to construct local feature vectors and classify regions of the connectome graph. This notebook introduces the reader to some of the most useful tools by demonstrating a typical workflow analyzing the Janelia dataset.
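
To make the convolution idea concrete, here is a minimal sketch, assuming synapse positions arrive as an N x 3 NumPy array (the hypothetical synapse_xyz below); the function name and parameters are illustrative, not Neurobot's actual API.

    import numpy as np

    def sphere_convolve(points, filter_fn, diameter, stride):
        """Slide a spherical kernel across the bounding box of `points`,
        applying `filter_fn` to the points inside each sphere (sketch)."""
        radius = diameter / 2.0
        lo, hi = points.min(axis=0), points.max(axis=0)
        grids = [np.arange(a, b + stride, stride) for a, b in zip(lo, hi)]
        features = {}
        for cx in grids[0]:
            for cy in grids[1]:
                for cz in grids[2]:
                    center = np.array([cx, cy, cz])
                    inside = np.linalg.norm(points - center, axis=1) <= radius
                    features[(cx, cy, cz)] = filter_fn(points[inside])
        return features

    # Example: count synapses per 20-micron sphere placed every 10 microns.
    # features = sphere_convolve(synapse_xyz, len, diameter=20.0, stride=10.0)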

[1] Pawel Dlotko, Kathryn Hess, Ran Levi, Max Nolte, Michael Reimann, Martina Scolamiero, Katharine Turner, Eilif Muller, and Henry Markram. Topological analysis of the connectome of digital reconstructions of neural microcircuits. CoRR, arXiv: 1601.01580, 2016.
[2] Bratislav Misic and Olaf Sporns. From regions to connections and networks: new bridges between brain and behavior. Current Opinion in Neurobiology, 40: 1-7, 2016.
[3] Ann Sizemore, Chad Giusti, Richard F. Betzel, and Danielle S. Bassett. Closures and cavities in the human connectome. CoRR, arXiv: 1608.03520, 2016.

Seven Column Medulla Dataset

Flies have multi-faceted, or compound, eyes. The number of facets, or ommatidia, in an insect compound eye varies widely, from roughly 30,000 in the dragonfly to around 20 in some subterranean insects. Even within Diptera, the phylogenetic order of so-called true flies, there is significant variation: the common fruit fly Drosophila melanogaster has ~800 ommatidia, a house fly ~4,000, and a horse fly ~10,000.

The fly visual system is highly conserved and extraordinarily stereotyped, thereby facilitating structural studies involving multiple organisms of the same species. It is structurally divided into three successive visual neuropils: the lamina, the medulla, and the lobula complex, which is further divided into the lobula and the lobula plate. Our focus is on the medulla, but see Alexander Borst's lab page for an excellent overview.

The number of ommatidia is directly related to the number of columns in the medulla. These columns are generally characterized as having ten layers, analogous to the six functionally distinct layers of the mammalian striate cortex. There are as many columns in the medulla as there are cartridges in the lamina, and as many cartridges as there are ommatidia in the eye.

Each column consists of approximately 50 neurons. The Janelia dataset mentioned in the introduction includes seven complete columns and several additional partially completed columns. To estimate the number of neurons in seven columns of the Drosophila medulla, multiply the total number of neurons in the medulla (approximately 40,000) by 7 and divide by the total number of columns: 40,000 × 7 / 800 = 350.
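
The same back-of-the-envelope estimate, spelled out:

    total_neurons = 40_000   # approximate neuron count for the whole medulla
    total_columns = 800      # one column per ommatidium in Drosophila
    per_column = total_neurons / total_columns   # = 50 neurons per column
    seven_columns = 7 * per_column               # = 350 neurons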

The Janelia seven-column-medulla dataset includes approximately 500 completely reconstructed and carefully annotated neurons. In addition to the 350 neurons in the seven columns, there are neurons in the next ring of columns adjoining the seven centrally located ones. All told, there are on the order of 10,000 partially reconstructed neurons, 200,000 individual synapses, and 50,000 multi-synapse T-bar structures.

The voxel resolution of the original EM dataset is 10 x 10 x 10 nanometers. The imaged tissue is on the order of a 100-micron cube. The adult Drosophila brain measures roughly 590 x 340 x 120 microns. Borst describes the fly brain as a supercomputer unrivaled by anything we can currently engineer.

Seven Column Workflow

This section illustrates a typical workflow using some of the tools that we’ve built for exploring the Janelia seven-column-medulla and similar connectomic datasets. The presentation consists primarily of interactive plots, generally followed by a short explanation.

This slide shows a cloud of points that reveals the outlines of seven neurons called type-one medullary intrinsic neurons, or Mi1 for short. The points of the central-column Mi1 are blue, and those of the surrounding six Mi1 neurons are shaded green. I’ve fit a line to each neuron: a red line for the central-column Mi1 and yellow lines for the adjacent Mi1 neurons.
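
One straightforward way to fit such an axis line (a sketch, not necessarily the method used to produce this plot) is to take the first principal component of each neuron's point cloud:

    import numpy as np

    def fit_axis(points):
        """Fit a line to an N x 3 point cloud; returns (centroid, unit
        direction) via the direction of maximum variance."""
        centroid = points.mean(axis=0)
        _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
        return centroid, vt[0]   # first right singular vector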

Here I’ve used the central Mi1 axis to define a cylindrical volume for closer study. The blue and green lines orthogonal to the cylinder axis represent the distances from the central column to the centroids of the six adjacent-column Mi1 cells and to a sample of the more distant, partially reconstructed neurons that originate outside the centrally located seven. Using this tool, I can selectively turn on specific neurons, classes of neurons, or even neurons that participate in particular subgraphs of the full connectome graph.
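
A minimal sketch of the underlying geometry, assuming the axis is given as an origin point and a unit direction vector (for instance, from a fit like fit_axis above):

    import numpy as np

    def in_cylinder(points, origin, direction, radius):
        """Boolean mask of points within `radius` of the axis defined by
        `origin` and the unit vector `direction`."""
        v = points - origin
        along = v @ direction                    # projection onto the axis
        radial = v - np.outer(along, direction)  # component orthogonal to it
        return np.linalg.norm(radial, axis=1) <= radius

The orthogonal blue and green lines in the plot correspond to the radial component computed above.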

The next few plots illustrate tools for analyzing skeletons, examining how pairs of highly connected neurons overlap, and looking at the distribution of synapses.

Here we see a skeleton drawn with its original coordinates shifted to the centroid of the central-column Mi1 neuron. The diameter of the circle marking each point along the skeleton is proportional to the estimated diameter of the process (dendrite, axon, or cell body) at that point.

Here are the same skeleton coordinates as above, but with tight x, y, z axis limits and with the locations along the skeleton having the largest estimated diameters highlighted as possible locations for the soma, or cell body.
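
Picking soma candidates can be as simple as sorting by estimated diameter; a sketch, assuming parallel arrays of skeleton coordinates and per-point diameters:

    import numpy as np

    def soma_candidates(skeleton_xyz, diameters, k=5):
        """Return the k skeleton points with the largest estimated process
        diameters as candidate cell-body locations (illustrative)."""
        idx = np.argsort(diameters)[-k:]
        return skeleton_xyz[idx]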

Here is the same skeleton with coordinates shifted and scaled to the unit cube to facilitate alignment with other neurons and to simplify interpreting the results of running convolutions with nonlinear geometrical or topological filters.
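
A sketch of the normalization; passing shared lo/hi bounds lets many neurons be scaled consistently:

    import numpy as np

    def to_unit_cube(points, lo=None, hi=None):
        """Shift and scale an N x 3 array so each axis spans [0, 1].
        Each dimension is scaled independently (hence the distortion
        noted below)."""
        lo = points.min(axis=0) if lo is None else lo
        hi = points.max(axis=0) if hi is None else hi
        return (points - lo) / (hi - lo)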

Finally, we scale and render the plot with tight axis limits, noting the z-axis distortion due to each dimension being scaled independently.

We select two neurons that share a substantial number of synapses and display the skeleton in the unit cube centered at [0.5, 0.5, 0.5]. Given that we shifted all of the coordinates with respect to the central-column Mi1 neuron centroid, that centroid is now located at the center of the unit cube. The first of the two neurons is the same one that we displayed in the previous plots.

As a sanity check, we compute the bounds of all the scaled coordinates. The scaling parameters were computed from the coordinates of the estimated cell-body locations of all neurons (or of all neurons within a selected cylindrical volume as constructed above), and so we expect that for each axis either the minimum will be 0.0 or the maximum will be 1.0, depending on the distribution of the points around the central column. Some of the synapse coordinates could fall outside the unit cube.
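
The check itself is short, using the to_unit_cube sketch above and a hypothetical soma_xyz array of estimated cell-body locations:

    import numpy as np

    scaled = to_unit_cube(soma_xyz)   # soma_xyz: hypothetical N x 3 array
    lo, hi = scaled.min(axis=0), scaled.max(axis=0)
    for axis in range(3):
        # Each axis should touch 0.0 at its minimum or 1.0 at its maximum.
        assert np.isclose(lo[axis], 0.0) or np.isclose(hi[axis], 1.0)
    # Synapse coordinates scaled with the same lo/hi may land outside [0, 1].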

We calculate the mean and the maximum number of synapses shared by pairs of neurons, select a pair with the maximum number of synapses, and then display the skeleton of the source neuron (SRC) as we did earlier, showing possible locations for the cell body.
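
A sketch of the pair statistics, assuming a pandas DataFrame named synapses with hypothetical 'pre' and 'post' columns holding the pre- and post-synaptic neuron ids, one row per synapse:

    import pandas as pd

    pair_counts = synapses.groupby(['pre', 'post']).size()
    print('mean synapses per connected pair:', pair_counts.mean())
    print('max synapses for a single pair:', pair_counts.max())
    src, dst = pair_counts.idxmax()   # the most strongly connected pair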

Here we show the point cloud for the destination neuron (DST) in a contrasting color, with marker size proportional to the estimated diameter at each point on the skeleton.

We use a different marker shape to further distinguish visually between the SRC and DST neurons.

Here we see the point clouds of the two neurons superimposed and their shared synapses visualized as red triangles.

This graphic focuses on the DST neuron and shows, in purple, the synapses for which the pre-synaptic neuron is SRC and, in red, those for which the pre-synaptic neuron is DST.

SLIDE 14

We’re primarily interested in categorizing the local microstructure of the tissue sample. The cartoon in panel (a) depicts the connectome graph embedded in a 3D volume. We run a non-linear convolution filter over the 3D volume, illustrated as a 2D tiling in panel (b). Despite my inadequate rendering in panel (c), the enclosed subgraph for a 3D kernel spanning a sub-volume approximately 20 microns on a side is generally quite complex.

SLIDE 13

Sub-volume-enclosed subgraphs are defined by the positions of synapses, not cell bodies. Consider the simple network shown in panel (a). The subgraph with cell bodies as vertices, shown in (b), is not completely contained in the volume bounded by the two dashed horizontal lines, whereas the subgraph shown in panel (c), which employs synapses as vertices, is completely contained within the lines. In defining sub-circuits contained in local sub-volumes, we use the convention illustrated in panel (c).
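
A sketch of that convention, assuming each synapse record carries its pre- and post-synaptic neuron ids along with its coordinates:

    import networkx as nx

    def subvolume_graph(synapses, lo, hi):
        """Directed graph induced by the synapses located inside the box
        [lo, hi]; membership is decided by synapse position, not by
        cell-body position, per the panel (c) convention."""
        g = nx.DiGraph()
        for pre, post, xyz in synapses:   # (pre_id, post_id, (x, y, z))
            if all(a <= c <= b for c, a, b in zip(xyz, lo, hi)):
                g.add_edge(pre, post)
        return g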

SLIDE 15

There is a long history of work analyzing neural circuits in terms of their graph-theoretic properties, though most of it has been applied to fMRI data, where the resolution (the size of a voxel) is on the order of 5 millimeters, compared with 10 nanometers in the case of the Janelia dataset. Intuitively, a network motif is a repeating subgraph that defines a pattern of connectivity exhibiting some degree of functional specificity.

[1] Yu Hu, James Trousdale, Krešimir Josić, and Eric Shea-Brown. Motif statistics and spike correlations in neuronal networks. CoRR, arXiv: 1206.3537, 2015.
[2] Marcus Kaiser. A tutorial in connectome analysis: Topological and spatial features of brain networks. CoRR, arXiv: 1105.4705, 2011.
[3] Arun S. Konagurthu and Arthur M. Lesk. On the origin of distribution patterns of motifs in biological networks. BMC Systems Biology, 2: 1-8, 2008.
[4] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon. Network motifs: simple building blocks of complex networks. Science, 298(5594): 824-827, 2002.

SLIDE 16

Here is an example subgraph. The darker green nodes and black edges give you some idea of just how complex the subgraphs are, even in small volumes. This one subgraph involves 148 neurons and over 10,000 synapses. The simplicial complex consists of all k-simplices for k > 0, where a k-simplex is a complete, or fully connected, subgraph (also referred to as a clique) with k + 1 vertices in the underlying undirected graph that has a single sink vertex in the directed graph. This slide shows a 4-simplex, of which there are thousands in the simplicial complex associated with this subgraph, typically involving one of a few specialized types of neurons as the sink.
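
For intuition, here is a brute-force sketch that enumerates 2-simplices under the single-sink convention just described, using networkx; Neurotop's actual enumeration is far more efficient:

    import networkx as nx

    def directed_two_simplices(g):
        """Yield 3-cliques of the underlying undirected graph whose induced
        directed subgraph has exactly one sink (out-degree zero)."""
        for clique in nx.enumerate_all_cliques(g.to_undirected()):
            if len(clique) > 3:
                break                      # cliques arrive in size order
            if len(clique) < 3:
                continue
            sub = g.subgraph(clique)
            sinks = [v for v in clique if sub.out_degree(v) == 0]
            if len(sinks) == 1:
                yield tuple(clique)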

Here is another method for visualizing subgraphs and simplices, this time in 3D, allowing interactive examination of the data and manipulation of the axes.

SLIDE 17

Happily, decades of painstaking bench work on Drosophila have helped us in sorting out possible, functionally discriminative motifs consisting of typed k-complexes. If the pattern of connectivity is more or less random, or the strength of the connections uncertain, then a topological or graph-theoretic analysis may not be particularly informative. The seven-column dataset does include a confidence field for each synapse, but it is only ever assigned 1.0 or 0.0. While the Janelia dataset does not include connection-strength metadata, one can infer synaptic weights from vesicle counts and synaptic cleft measurements and thereby enrich the microcircuit connectome graph. However, the preferred way to assign cell types and connection weights is to train an artificial neural network.

We construct feature vectors consisting of $k$-simplex statistics and topological invariants including the Euler characteristic. In the case of simplicial complexes, the Euler characteristic $\chi$ is defined as the alternating sum $\chi = k_0 - k_1 + k_2 - k_3 + \cdots + (-1)^N k_N$, where $k_n$ denotes the number of $n$-simplices in the simplicial complex and $N$ is the largest integer for which at least one $N$-simplex exists. Good luck finding a satisfying interpretation.
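
Computing $\chi$ from the simplex counts is a one-liner; a minimal sketch with two sanity checks:

    def euler_characteristic(simplex_counts):
        """Alternating sum over simplex counts [k0, k1, ..., kN]."""
        return sum((-1) ** n * k for n, k in enumerate(simplex_counts))

    assert euler_characteristic([3, 3]) == 0      # hollow triangle: 3 vertices, 3 edges
    assert euler_characteristic([3, 3, 1]) == 1   # filled triangle: adds one 2-simplex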

We also use another class of topological invariants called Betti numbers $\{\beta_0, \beta_1, \ldots, \beta_N\}$ that are too complicated to define here, and so a few examples will have to suffice: $\beta_0$ is the number of connected components, $\beta_1$ is the number of one-dimensional "holes", and $\beta_2$ is the number of two-dimensional "voids". Even relatively simple unsupervised algorithms like $k$-means can cluster the resulting feature vectors to reconstruct the layered, columnar structure of the medulla.
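
A sketch of the clustering step, with a stand-in feature matrix holding one hypothetical row of simplex counts, $\chi$, and Betti numbers per subvolume:

    import numpy as np
    from sklearn.cluster import KMeans

    # Stand-in for the real feature matrix: one row per subvolume with
    # columns such as [k0, k1, ..., chi, beta0, beta1, ...].
    features = np.random.rand(1000, 8)
    labels = KMeans(n_clusters=10, n_init=10).fit_predict(features)
    # Plotting `labels` at each subvolume's center should recover the
    # layered, columnar organization described above.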

The sorts of analyses considered in the last few slides are necessary but not sufficient for building models of neural circuitry. They are necessary because machine learning technology isn't so advanced that it can be trusted to get things right without any supervision whatsoever. Machine learning often fails spectacularly, since it can easily miss the forest for the trees. Humans can often apply their common-sense reasoning and skill at pattern recognition to catch the most egregious errors.

Machine learning is necessary because complex brains constitute alternative universes in which the dominating laws of physics are different from those governing the sort of phenomena we can (directly) observe in the macroscale universe in which our physical intuitions evolved. Modern machine learning tools such as deep recurrent networks excel in modeling these alien universes because they have few built-in biases aside from those implicit in our selecting a network architecture.

I'm confident we can train an artificial neural network to significantly improve on my poor attempt to be clever by channeling algebraic topology and in particular homology theory. I might have learned less by using a neural network had I taken that route, but I'm not convinced that what I did learn from the exercise is particularly relevant to my primary interest in constructing mesoscale models. It all depends on your loss function.