Task: Apply and extend the methods described in Kato et al [4] (PDF) to the original data from Alipasha Vaziri's lab [7] (PDF) and to newly published data from Andrew Leifer's lab; see Nguyen et al [6] (PDF).
Data: The data described in Nguyen et al [6] is available from the PNAS page for the published paper under the SI (Supplementary Information) tab. We also have the data (footnote 1) from Manuel Zimmer's lab used in their 2015 paper [4] in Cell. The Worm Atlas includes connectivity for the hermaphrodite and male C. elegans plus diverse metadata.
Task: Implement and apply variants of the algorithms described in Jonas and Kording [3] (PDF) using models for connectivity such as the infinite stochastic block model [5] (PDF) to new datasets.
Data: Inner plexiform layer in the mouse retina from Helmstaedter [2] (PDF). Drosophila melanogaster data from FlyEM at HHMI Janelia, including a one-column dataset of the Drosophila medulla with annotations and tools (GIT); a seven-column dataset is in the works.
Task: Write a parallel version of Algorithm 1 in Dlotko et al [1] ST2.3 (PDF) (TAR) for constructing directed-clique simplicial complexes and performing all required homology computations in ST2.4. Write a multi-scale convolutional filter (footnote 2) that computes a 3D feature map characterizing the local topology of the 3D volumetric projection of a neural microcircuit (footnote 3).
Data: Simulated mouse visual and rat vibrissal barrel cortex from, respectively, Costas Anastassiou's lab—see here for details—at the Allen Institute and the Neocortical Microcircuitry (NMC) portal described in Ramaswamy et al [8] (PDF).
1 The data from Manuel's lab is password protected. Historically neural-recording data has been hard to come by and was generally acquired painstakingly by graduate students and postdocs who quite reasonably want to sequester the data for some period of time so as to amortize the costs of obtaining it—in the highly competitive world they work in, careers depend on protecting this window of opportunity. The situation is changing somewhat with the advent of new acquisition technologies, but attitudes and policies toward sharing will lag behind these advances and so we have to be patient and, at the same time, careful in working with data from other labs.
I've password-protected the data from the Zimmer lab associated with the experiments reported in [4]. Manuel has been generous in letting us work with their data. You are welcome to use it in your class project provided that: (a) you do not share it with anyone else, and (b) you obtain permission from the authors if you wish to publish anything about your findings. If you agree to these terms, contact me and I will send you the password. Here is a note from Manuel, sans the URL for the data, including some additional metadata:
From: Manuel Zimmer

Dear Tom,

Below is a link for downloading the 5 datasets for unstimulated worms from the Cell paper. Each MatLab file contains the timevector 'tv', neural activity traces uncorrected ('deltaFOverF') and corrected for bleaching ('deltaFOverF_bc') as well as derivatives ('deltaFOverF_deriv'). The identity of neurons is in the same order in the cell array 'ID', if IDs are ambiguous, the cell array contains multiple entries. It also contains a dataset name 'FlNm'. If you have questions about a particular datasets please include this dataset name.

The file sevenStateColoring.mat contains the inferred motor command states for each dataset:

'FWD' forward crawling
'SLOW' forward slowing
'DT' dorsal post reversal turn
'VT' ventral post reversal turn
'REV1' reverse crawling
'REV2' reverse crawling
'REVSUS' sustained reverse crawling
'NOSTATE' - ambiguous

Let me know if you have questions.

Best wishes,
Manuel
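Because 'ID' can hold several candidate names for one trace, it may help to separate confidently identified neurons from ambiguous ones before any analysis. The following is a minimal sketch, assuming the 'ID' cell array has already been read into a Python list (for example with scipy.io.loadmat, which I have not tested against these files) with one entry per trace, each entry being a string, a list of strings, or empty. The function names and the example list are hypothetical and are not taken from the datasets.

```python
from collections import defaultdict


def normalize_id(entry):
    """Return the candidate neuron names for one trace as a tuple of strings."""
    if entry is None:
        return ()
    if isinstance(entry, str):
        return (entry,) if entry.strip() else ()
    # Assumed layout: any other entry is a sequence of candidate names.
    return tuple(str(name) for name in entry if str(name).strip())


def index_by_neuron(ids):
    """Map neuron names to the trace columns that may correspond to them.

    Returns (unambiguous, ambiguous), two dicts from name to column indices.
    A column with exactly one candidate name counts as unambiguous.
    """
    unambiguous = defaultdict(list)
    ambiguous = defaultdict(list)
    for column, entry in enumerate(ids):
        names = normalize_id(entry)
        target = unambiguous if len(names) == 1 else ambiguous
        for name in names:
            target[name].append(column)
    return dict(unambiguous), dict(ambiguous)


# Made-up ID list in the layout the note describes (not real data).
example_ids = ["AVAL", ["RIML", "RIMR"], [], "AVAR", ["RIML", "RIMR"]]
unambiguous, ambiguous = index_by_neuron(example_ids)
```

The column indices can then be used to select the matching traces from 'deltaFOverF_bc', since the note says the IDs are in the same order as the traces.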
2 The term convolution in this context refers to its application in computer vision and artificial neural networks. In a convolutional network, a feature map is obtained by repeated application of a function across sub-regions of a 2D image plane, a 3D volume or, more generally, a multi-dimensional feature map. The sub-regions can partition or tile the target map, allowing no overlap, or, more generally, they can cover the target in the topological sense, thereby allowing overlap. In our case, the repeatedly applied functions compute topological invariants, e.g., Betti numbers or the Euler characteristic, that describe the properties (network motifs) of each local sub-region.
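To make the idea concrete, here is a minimal sketch of such a filter, assuming each neuron has a 3D position and the circuit is given as a list of directed edges without self-loops. For each cubic window it counts the directed cliques (ordered tuples in which every earlier vertex has an edge to every later one) in the induced subgraph and returns the alternating sum of the counts as the Euler characteristic. The function names, the window parameters size and stride, and the toy circuit are all hypothetical; the enumeration is brute force and is not the algorithm from [1].

```python
from itertools import product


def simplex_counts(vertices, edges):
    """Count directed cliques by dimension in the subgraph induced on `vertices`."""
    vertices = set(vertices)
    out = {v: set() for v in vertices}
    for u, v in edges:
        if u in vertices and v in vertices and u != v:
            out[u].add(v)
    counts = []
    # One candidate set per clique of the current dimension: the vertices
    # that every member of the clique has an edge to.
    frontier = [out[v] for v in vertices]
    while frontier:
        counts.append(len(frontier))
        frontier = [cands & out[c] for cands in frontier for c in cands]
    return counts


def euler_characteristic(counts):
    """Alternating sum of the simplex counts, starting with the vertices."""
    return sum((-1) ** k * n for k, n in enumerate(counts))


def euler_feature_map(positions, edges, size, stride):
    """Slide a cubic window of side `size` over the volume in steps of `stride`.

    Returns a dict from window index (i, j, k) to the Euler characteristic of
    the sub-circuit inside that window. stride == size tiles the volume;
    stride < size gives overlapping windows.
    """
    lo = [min(p[a] for p in positions.values()) for a in range(3)]
    hi = [max(p[a] for p in positions.values()) for a in range(3)]
    steps = [int((hi[a] - lo[a]) // stride) + 1 for a in range(3)]
    fmap = {}
    for idx in product(*(range(n) for n in steps)):
        corner = [lo[a] + idx[a] * stride for a in range(3)]
        inside = [v for v, p in positions.items()
                  if all(corner[a] <= p[a] < corner[a] + size for a in range(3))]
        fmap[idx] = euler_characteristic(simplex_counts(inside, edges))
    return fmap


# Toy circuit (made up): one feed-forward triangle a, b, c and a distant cell d.
positions = {"a": (0, 0, 0), "b": (0.5, 0, 0), "c": (0, 0.5, 0), "d": (2, 0, 0)}
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
fmap = euler_feature_map(positions, edges, size=1, stride=1)
```

Calling euler_feature_map with several values of size is one simple way to obtain a multi-scale description; Betti numbers would require a homology computation in place of the alternating sum.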
3 The most obvious data structures for efficiently storing (once) and retrieving (repeatedly) 3D data include 3D spill trees, KD-trees and the structures used by various approximate nearest-neighbor algorithms, e.g., [9] and popular libraries such as ANN. If K is the directed-clique complex for the graph G = (V, E) and K′ the corresponding complex for G′ = (V, E′) where E′ ⊆ E, then the Hasse diagram H′ representing K′, which is a directed acyclic graph, is a subgraph of the Hasse diagram H representing K. Since every transmission-response graph is a subgraph of the original graph of the reconstructed microcircuit, it would seem we can reuse the reference-based data structure described in ST2.1 and therefore apply Algorithm 1 [Page 23, Dlotko et al [1]] only once. However, I'd like to see a proof of this before we start writing algorithms that depend on such a property.
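Short of a proof, the containment can at least be checked empirically on small graphs. The sketch below enumerates directed cliques by brute force, builds the Hasse diagram by linking each simplex to the faces obtained by dropping one vertex, and compares the results for a graph and a subgraph on the same vertices. It assumes a directed clique is an ordered tuple in which every earlier vertex has an edge to every later one, and that there are no self-loops; the function names and the example graph are hypothetical, and this is a sanity check rather than an implementation of Algorithm 1 in [1]. Under that definition the containment appears to follow directly, since every edge a clique needs in E′ is also in E, but that argument should be checked against the exact definitions in [1].

```python
def directed_cliques(vertices, edges):
    """Return all directed cliques as ordered tuples (source vertex first).

    Every edge endpoint is assumed to appear in `vertices`.
    """
    out = {v: set() for v in vertices}
    for u, v in edges:
        if u != v:
            out[u].add(v)
    cliques = set()
    # Each stack item is a clique plus the vertices that can extend it.
    stack = [((v,), out[v]) for v in vertices]
    while stack:
        clique, cands = stack.pop()
        cliques.add(clique)
        stack.extend((clique + (c,), cands & out[c]) for c in cands)
    return cliques


def hasse_edges(cliques):
    """Link each simplex to its codimension-1 faces (drop one vertex)."""
    return {(s, s[:i] + s[i + 1:])
            for s in cliques if len(s) > 1
            for i in range(len(s))}


# Made-up example: G and a subgraph G' on the same vertex set.
V = "abcd"
E = {("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("b", "d")}
E_sub = {("a", "b"), ("b", "c"), ("c", "d")}

K = directed_cliques(V, E)
K_sub = directed_cliques(V, E_sub)
is_subcomplex = K_sub <= K
is_subdiagram = hasse_edges(K_sub) <= hasse_edges(K)
```

Running the same comparison on random graphs and random edge subsets would give more confidence, but whether the reference-based data structure of ST2.1 can be reused is a separate question that this check does not answer.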