Stochastic inference with spiking neurons in the high-conductance state

Mihai A. Petrovici¹*, Johannes Bill¹,²*, Ilja Bytschok¹, Johannes Schemmel¹, Karlheinz Meier¹
¹ Kirchhoff Institute for Physics, University of Heidelberg
² Institute for Theoretical Computer Science, Graz University of Technology
(Dated: October 25, 2016)

The highly variable dynamics of neocortical circuits observed in vivo have been hypothesized to represent a signature of ongoing stochastic inference, but stand in apparent contrast to the deterministic response of neurons measured in vitro. Based on a propagation of the membrane autocorrelation across spike bursts, we provide an analytical derivation of the neural activation function that holds for a large parameter space, including the high-conductance state. On this basis, we show how an ensemble of leaky integrate-and-fire neurons with conductance-based synapses embedded in a spiking environment can attain the correct firing statistics for sampling from a well-defined target distribution. For recurrent networks, we examine convergence towards stationarity in computer simulations and demonstrate sample-based Bayesian inference in a mixed graphical model. This points to a new computational role of high-conductance states and establishes a rigorous link between deterministic neuron models and functional stochastic dynamics on the network level.

Introduction

In responding to environmental stimuli, brains have to make predictions based on incomplete, noisy and ambiguous data. The recent hypothesis that the brain copes with this challenge by performing Bayesian, rather than logical inference [1-3], has been strengthened by electrophysiological data which identified neural correlates of the involved computations [4, 5] and theoretical work on potential spiking network implementations [6-8].

In probabilistic inference, possible values of a quantity are described by a random variable (RV) z_k, and all dependencies between RVs are stored in a joint distribution p(z_1, ..., z_K). The belief about a set of unobserved RVs {z_1, ..., z_M} given an observed set of RVs is represented by the posterior distribution p(z_1, ..., z_M | z_{M+1}, ..., z_K). In particular, the posterior contains information on the most likely conclusion, as well as on all potential alternatives.

With regard to the representation of probability distributions in the brain, theoretical work [2] has argued in favor of sample-based codes, i.e., instead of providing the entire distribution at any point in time, samples z(t) ~ p(z_1, ..., z_K) are used as a proxy. When modeling large systems, this offers three important advantages. First, approximate solutions can be provided at any time, with increasingly reliable results as the calculation progresses ("anytime computing"). Secondly, marginalization comes at no cost, as p(z_k) can be determined by simply neglecting the values of all other RVs. Thirdly, some sampling algorithms support a high degree of parallelization, with an algorithmic structure that is reminiscent of neural networks [9]. Recently, a theory has been suggested which combines these advantages by implementing Markov chain Monte Carlo (MCMC) sampling in spiking networks of abstract model neurons [7].
FIG. 1. (A) Spike patterns as samples of a random vector z. The RV z_k is active for duration τ_on (gray bar) after each spike. (B) In stochastic neuron models, internal state variables (red) modulate the firing probability (gray). (C) In contrast, deterministic neurons elicit a spike when the membrane potential (blue) reaches a threshold voltage.

In this framework, spike patterns are interpreted as samples of binary RVs as follows (see Fig. 1A):

$$ z_k(t) = 1 \;\Leftrightarrow\; \text{neuron } k \text{ fired in } (t - \tau_{\text{on}}, t] . \quad (1) $$

The duration τ_on of the active state following a spike is a free parameter for the timescale on which a neuron affects downstream cells. The neuron model in [7] is inherently stochastic, with an instantaneous firing probability

$$ r_k(t) = \lim_{\Delta t \to 0} \frac{p(\text{spike in } [t, t + \Delta t))}{\Delta t} = \begin{cases} \exp(v_k)/\tau & \text{if } z_k = 0 \\ 0 & \text{if } z_k = 1 \end{cases} , \quad (2) $$

where v_k is an abstract membrane potential (Fig. 1B).

In contrast to this idealized stochastic neuron model, in vitro experiments have demonstrated the largely deterministic nature of single neurons [10]. Similarly, microscopic models of neural circuits typically rely on deterministic dynamics of their constituents (see Fig. 1C). The aim of this letter is to demonstrate how a network of deterministic neurons in a biologically plausible spiking environment can quantitatively reproduce the stochastic dynamics required for sampling from a well-defined distribution and perform inference given observations. In this letter, we extend our earlier discussion of this approach [11].

We start by calculating the dynamics of a single leaky integrate-and-fire (LIF) neuron in a noisy spiking environment and derive its activation function in the high-conductance state. For this biologically relevant regime, existing analytical descriptions of neuronal response functions (e.g., [16, 17]) are not applicable. Here, we provide an approach based on the propagation of the membrane autocorrelation throughout spike bursts. This establishes an equivalence to the abstract, inherently stochastic units (Eq. 2). On the network level, we show how conductance-based synapses approximate the interaction required for sampling from a well-defined target distribution. Furthermore, we show that the distribution sampled by LIF networks remains a good approximation of the target distribution even for large networks with strong recurrent interaction. We complement our study with a demonstration of probabilistic inference by implementing a small graphical model for pattern recognition in a recurrent network of LIF neurons.

LIF dynamics in a high-conductance state

We consider deterministic LIF neurons defined by C_m du_k/dt = g_l (E_l - u_k) + I_k, with membrane potential u_k, capacitance C_m, leak potential E_l, leak conductance g_l and input current I_k. When u_k crosses a threshold ϑ from below, a spike is emitted and u_k is reset to ϱ for a refractory period τ_ref. We formally partition the total input current I_k into recurrent synaptic input, diffuse synaptic noise and additional external currents: I_k = I_k^rec + I_k^noise + I_k^ext. The currents I_k^rec and I_k^noise are mediated through synapses and obey I_k^syn = Σ_i g_ki (E_i^rev - u_k), with reversal potential E_i^rev of the i-th synapse. The temporal evolution of the conductance g_ki is modeled as a low-pass filter on presynaptic spikes: dg_ki/dt = -g_ki/τ_syn + Σ_s w_ki δ(t - t_i^s), with synaptic time constant τ_syn, weight w_ki and spike times t_i^s.
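To make the model concrete, the following minimal sketch integrates a single conductance-based LIF neuron under excitatory and inhibitory Poisson bombardment with forward Euler. It is an illustration of the equations above under assumed parameter values (anticipating Table I in AVII), not the simulation setup of the paper, which uses NEURON/PyNN (see AVII).

```python
import numpy as np

rng = np.random.default_rng(0)

# Minimal forward-Euler sketch of a conductance-based LIF neuron under
# excitatory and inhibitory Poisson bombardment. Units are chosen consistently
# as nF, uS, mV, ms, nA (so tau = C/g comes out in ms).
C_m, g_l, E_l = 0.1, 0.005, -65.0           # capacitance, leak conductance/potential
theta, rho, tau_ref = -52.0, -53.0, 10.0    # threshold, reset, refractory period
tau_syn, E_exc, E_inh = 10.0, 0.0, -90.0
nu = 5.0                                    # Poisson rate per channel (5000 Hz)
w_exc, w_inh = 0.0035, 0.0055               # noise weights (uS), cf. Eq. (71)
I_ext, dt, T = 0.0, 0.01, 2000.0            # nA, ms, ms

u, g_exc, g_inh, refrac, spikes = E_l, 0.0, 0.0, 0.0, []
for step in range(int(T / dt)):
    # synaptic conductances: exponential decay plus Poisson-driven jumps
    g_exc += w_exc * rng.poisson(nu * dt) - g_exc / tau_syn * dt
    g_inh += w_inh * rng.poisson(nu * dt) - g_inh / tau_syn * dt
    if refrac > 0.0:                        # clamped at rho while refractory
        refrac -= dt
        continue
    I_syn = g_exc * (E_exc - u) + g_inh * (E_inh - u)
    u += dt / C_m * (g_l * (E_l - u) + I_syn + I_ext)
    if u >= theta:                          # threshold crossing: spike and reset
        spikes.append(step * dt)
        u, refrac = rho, tau_ref

print(f"{len(spikes)} spikes; p(z=1) estimate: {len(spikes) * tau_ref / T:.2f}")
```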
We start by considering a single neuron that receives diffuse synaptic noise I_k^noise in the form of Poisson spike trains from its surroundings. The capacity of recurrent networks to produce such noise has been shown in, e.g., [12]. For the following analysis of individual neurons in a noisy environment, we omit the index k and set I_rec = 0.

When a conductance-based LIF neuron receives strong synaptic stimulation, it enters a so-called high-conductance state (HCS, [13]), characterized by accelerated membrane dynamics. For an analytical treatment, it is advantageous to rewrite the membrane dynamics as τ_eff du/dt = u_eff(t) - u, such that u decays towards an effective leak potential u_eff with an effective time constant τ_eff = C_m/g_tot [14] (see Appendix AI). The total conductance g_tot subsumes both leakage and synaptic conductances. In a high input rate regime, τ_eff → 0 and the effective potential u_eff simply becomes a linear transformation of the synaptic noise input (see AII). Using methods similar to [15], it can be shown that, in this regime, u_eff(t) can be described as an Ornstein-Uhlenbeck (OU) process du = (ū - u)/τ_syn dt + σ dW with parameters (see AIII):

$$ \bar u = \left\{ I^{\text{ext}} + g_l E_l + \textstyle\sum_i \nu_i w_i E_i^{\text{rev}} \tau_{\text{syn}} \right\} / \langle g_{\text{tot}} \rangle , \quad (3) $$

$$ \sigma^2 = \left\{ \textstyle\sum_i \nu_i \left[ w_i \left( E_i^{\text{rev}} - \bar u \right) \right]^2 \tau_{\text{syn}} \right\} / \langle g_{\text{tot}} \rangle^2 . \quad (4) $$

The activation function of LIF neurons in the HCS

Similarly to the abstract model [7], we define the refractory state of a neuron as z(t) = 1. The mapping of spikes to RV states (Eq. 1) naturally leads to the concept of an activation function p(z = 1 | ū), where ū = ⟨u_eff⟩_t. In the following, we derive a general expression for the activation function of an LIF neuron under Poisson stimulus, which is of particular use in the cortically relevant HCS regime. Fig. 2A shows an exemplary simulation (membrane potential and spike train) of such a scenario. The activation function in Fig. 2B is obtained by sweeping over I_ext (see AVII and AVIII).

Related setups have already been examined in [16, 17]. However, the methods in [16, 17] are tailored to certain parameter ranges which, in particular, do not include the HCS regime with refractoriness. This is because [16] requires τ_syn ≪ τ_eff, whereas [17] assumes τ_ref ≪ τ_eff, τ_syn, which leads to discrepancies in the predicted activation functions (Fig. 2B). The deeper reason for the observed discrepancies is found in the lack of an appropriate propagation of the autocorrelation of u_eff through τ_ref. Here, we propose a derivation that explicitly includes this propagation and thereby covers a large parameter space, including the cases studied in [16, 17], as well as the HCS.

In Fig. 2A, two modes of firing can be observed: a "bursting" mode, where the effective membrane potential u_eff after the refractory period is still above threshold, and a freely evolving mode, where the neuron does not spike again immediately after the refractory period. This is illustrated by the distributions in Fig. 2C. Our approach relies on the calculation of burst lengths and their associated occurrence probabilities P_n.
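The OU moments translate directly into code. This short helper (function name and unit conventions are ours) evaluates Eqs. (3)-(4) for the noise configuration used later in AVII:

```python
import numpy as np

def ou_params(nu, w, E_rev, g_l=0.005, E_l=-65.0, C_m=0.1,
              tau_syn=10.0, I_ext=0.0):
    """Evaluate the OU parameters of u_eff, Eqs. (3)-(4).

    nu, w, E_rev: afferent Poisson rates (1/ms), synaptic weights (uS) and
    reversal potentials (mV). Returns (u_bar, sigma2, tau_eff)."""
    nu, w, E_rev = map(np.asarray, (nu, w, E_rev))
    g_tot = g_l + np.sum(nu * w * tau_syn)        # mean total conductance, Eq. (26)
    u_bar = (I_ext + g_l * E_l
             + np.sum(nu * w * E_rev * tau_syn)) / g_tot                     # Eq. (3)
    sigma2 = np.sum(nu * (w * (E_rev - u_bar)) ** 2 * tau_syn) / g_tot ** 2  # Eq. (4)
    return u_bar, sigma2, C_m / g_tot             # tau_eff = C_m/g_tot, Eq. (22)

# Noise configuration of Appendix VII: 5000 Hz exc. and inh. Poisson input.
u_bar, sigma2, tau_eff = ou_params(nu=[5.0, 5.0], w=[0.0035, 0.0055],
                                   E_rev=[0.0, -90.0])
print(f"u_bar = {u_bar:.1f} mV, sigma^2 = {sigma2:.1f} mV^2, "
      f"tau_eff = {tau_eff:.2f} ms (HCS: tau_eff << tau_syn)")
```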
Denoting the relative occurrence of burst lengths n by P_n, the average drift time from ϱ to ϑ between the k-th and (k+1)-st spike in a burst by τ_b^k, and the average duration of the freely evolving mode that follows an n-spike burst by T_n, we identify the following relation:

$$ p(z = 1 \mid \bar u) = \frac{\sum_n P_n \, n \, \tau_{\text{ref}}}{\sum_n P_n \left( n \tau_{\text{ref}} + \sum_{k=1}^{n-1} \tau_b^k + T_n \right)} . \quad (5) $$

Knowing the OU process governing u_eff, recursive expressions for P_n and T_n can be derived. These terms have considerable impact on the activation function if τ_ref ≈ τ_syn. In the limit of strong noise stimuli, we can calculate the average drift time τ_b^k in a quasistatic approximation [17] of u_eff and assume that u ≈ u_eff for the freely evolving mode, thus obtaining (see AIV):

$$ P_n = \left( 1 - \sum_{i=1}^{n-1} P_i \right) \int_{\vartheta}^{\infty} du_{n-1} \, p(u_{n-1} \mid u_{n-1} \geq \vartheta) \left[ \int_{-\infty}^{\vartheta} du_n \, p(u_n \mid u_{n-1}) \right] , \quad (6) $$

$$ T_n = \int_{\vartheta}^{\infty} du_{n-1} \, p(u_{n-1} \mid u_{n-1} \geq \vartheta) \left[ \int_{-\infty}^{\vartheta} du_n \, p(u_n \mid u_n < \vartheta, u_{n-1}) \, \langle T(\vartheta, u_n) \rangle \right] , \quad (7) $$

$$ \tau_b^k = \int_{\vartheta}^{\infty} du_k \, \tau_{\text{eff}} \ln\left( \frac{\varrho - u_k}{\vartheta - u_k} \right) \int_{\vartheta}^{\infty} du_{k-1} \, p(u_k \mid u_k \geq \vartheta, u_{k-1}) . \quad (8) $$

FIG. 2. (A) Membrane potential u(t) and spikes of an LIF neuron in a spiking noisy environment. (B) Prediction of the activation function (red) compared to simulation results (green), as well as to other predictions from the literature [16, 17]. (C) In a HCS, u (blue) and u_eff (red) are nearly identical when the neuron is not refractory. After each refractory period (gray), the predicted distribution of u_eff (pink) is used for the propagation in Eq. (6). (D) High-noise regime: theoretical prediction (red) vs. simulation results (green), with a fitted logistic function σ(ū) (blue).

Fig. 2C displays an intuitive picture of the integrals in Eq. (6) and Eq. (7). The transfer function p(u_n | u_{n-1}) is the Green's function of the OU process at time t - t_s = τ_on, and ⟨T(ϑ, u_n)⟩ denotes the average time needed for the membrane to reach ϑ starting from u_n, which can be given in closed form [18]. The dependence of P_n, T_n and τ_b^k on the moments of the OU process (Eqs. 3, 4) renders Eq. (5) a function of ū.

So far, we have only considered τ_eff = 0. To further improve the prediction, we take into account finite values of τ_eff by means of an expansion in √(τ_eff/τ_syn). Due to the symmetry of the PSP shape in τ_eff and τ_syn, this can be done analogously to [16, 17], where the opposite limit of large τ_eff and small τ_syn is used (see AIV). A comparison between our prediction of p(z = 1 | ū) and results from a numerical simulation is shown in Fig. 2B.
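As a numerical illustration of this recursion, the following sketch propagates the distribution of u_eff across refractory periods with the OU Green's function (Eq. 32 in AIII), accumulates P_n and T_n, and evaluates Eq. (5) in the deep-HCS limit τ_eff → 0, where the burst drift times τ_b^k vanish. The noise amplitude s_stat and the grid are assumed illustrative values; this is our own sketch, not the authors' implementation.

```python
import numpy as np
from scipy.special import erf

# Burst recursion of Eqs. (5)-(8): the suprathreshold probability mass keeps
# the burst alive and is propagated over one refractory period per step.
theta_spk, rho = -52.0, -53.0      # threshold and reset (mV)
tau_ref = tau_syn = 10.0           # ms, with tau_on = tau_ref
theta_ou = 1.0 / tau_syn           # OU rate, Eq. (41)
s_stat = 4.0                       # stationary std of u_eff (mV, assumed)

u = np.linspace(-90.0, -20.0, 1401)          # grid for the potential PDF
du = u[1] - u[0]
above = u >= theta_spk

def green(x, x0, mu, t=tau_ref):
    """OU Green's function f(x, t | x0) of Eq. (32): a Gaussian in x."""
    m = mu + (x0 - mu) * np.exp(-theta_ou * t)
    v = s_stat ** 2 * (1.0 - np.exp(-2.0 * theta_ou * t))
    return np.exp(-(x - m) ** 2 / (2.0 * v)) / np.sqrt(2.0 * np.pi * v)

def mfpt_to_threshold(mu):
    """<T(theta, x)> for all grid points x; standard OU result [18], Eq. (49)."""
    y = (u - mu) / (np.sqrt(2.0) * s_stat)
    g = np.exp(y * y) * (1.0 + erf(y))
    F = np.concatenate(([0.0], np.cumsum(0.5 * (g[1:] + g[:-1]) * np.diff(y))))
    return np.sqrt(np.pi) / theta_ou * (F[np.searchsorted(u, theta_spk)] - F)

def activation(mu, n_max=15):
    """p(z=1 | u_bar) from Eq. (5), truncating the burst recursion at n_max."""
    T_fp = mfpt_to_threshold(mu)
    G = green(u[:, None], u[None, :], mu)      # propagator over one tau_ref
    q = green(u, theta_spk, mu)                # PDF after the first refractory period
    P, T = np.zeros(n_max), np.zeros(n_max)
    for n in range(n_max):
        sub = np.where(above, 0.0, q)          # bursts of length n+1 end here
        P[n] = sub.sum() * du                                   # Eq. (6)
        T[n] = (sub * T_fp).sum() * du / max(P[n], 1e-300)      # Eq. (7)
        q = G @ np.where(above, q, 0.0) * du   # survivors spike again immediately
    n_arr = np.arange(1, n_max + 1)
    return (P * n_arr * tau_ref).sum() / (P * (n_arr * tau_ref + T)).sum()

for mu in (-60.0, -56.0, -52.0, -48.0):
    print(f"u_bar = {mu:5.1f} mV  ->  p(z=1 | u_bar) = {activation(mu):.3f}")
```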
Sampling with networks of LIF neurons

We can now reconcile the response of LIF neurons with the inherently stochastic neuron model (Eq. 2), which requires a logistic activation function for a constant potential v: p(z = 1 | v) = σ(v) := [1 + exp(-v)]^(-1). In the HCS regime, this logistic activation can be approximated by LIF neurons with high accuracy. Fig. 2D shows our theoretical prediction and simulation results for the activation function alongside a fitted logistic function. For the translation from the LIF domain to the abstract model (Eq. 2) we have employed a linear mapping

$$ v = (\bar u - \bar u_0)/\alpha , \quad (9) $$

with scaling factor α and ū_0 denoting the potential for which p(z = 1) = 1/2.

We next connect the neurons to form a recurrent network. In addition to noise stimuli, an LIF neuron in a network receives synaptic currents I_k^rec from other neurons. For certain connectivity structures, it is possible to predict the target distribution [7, 19] of states z(t) that arise from the stochastic dynamics of the recurrent network. We use the emulation of Boltzmann machines (BMs) as an example case. The joint distribution reads

$$ p_B(z) = Z^{-1} \exp\left( z^T W z / 2 + z^T b \right) , \quad (10) $$

where W is a symmetric zero-diagonal weight matrix, b is a bias vector and Z is the normalizing partition function. This probabilistic model underlies state-of-the-art machine learning algorithms for image [20] and speech recognition [21]. It has been shown [7] that a network of abstract neurons (Eq. 2) with linear membrane potentials

$$ v_k = b_k + \sum_{j=1}^{K} W_{kj} z_j \quad (11) $$

will sample from the desired target distribution (Eq. 10). This finding uses the fact that individual neurons sample from the conditionals p(z_k = 1 | z_\k) = σ(v_k), with z_\k = {z_j | j ≠ k}, in an MCMC updating scheme.

As shown above, LIF neurons in a spiking noisy environment closely approximate this logistic activation function if the synaptic currents I_k^rec shift the mean membrane potential ū_k according to the linear interaction (Eq. 11). Using the linear transformation (Eq. 9) between v_k and ū_k, and estimating the effect of a conductance-based synapse of weight w_kj, we arrive at the following translation between the abstract and the LIF domain (see AV):

$$ b_k = \left( \bar u_k^b - \bar u_k^0 \right)/\alpha , \quad (12) $$

$$ W_{kj} = \frac{1}{\alpha C_m} w_{kj} \left( E_{kj}^{\text{rev}} - \mu \right) \left( 1 - \frac{\tau_{\text{syn}}}{\tau_{\text{eff}}} \right)^{-1} \left[ \tau_{\text{syn}} \left( e^{-1} - 1 \right) - \tau_{\text{eff}} \left( e^{-\tau_{\text{syn}}/\tau_{\text{eff}}} - 1 \right) \right] , \quad (13) $$

where ū_k^b is the mean potential ū_k that establishes p(z_k = 1 | ū_k = ū_k^b) = σ(b_k) in Eq. (5), and E_kj^rev denotes the reversal potential of synapse w_kj. The idea behind Eq. (13) is to match the integrals of individual postsynaptic potentials (PSPs) on v_k and ū_k. Since the membrane loses any memory following a reset, in contrast to [7], we use the synaptic conductance as a memory carrier.
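The translation rule is straightforward to apply in code. In the following sketch, α and ū_0 are assumed placeholder values (in practice they would be obtained by fitting the logistic σ((ū - ū_0)/α) to the predicted activation function), and τ_eff is an illustrative HCS value:

```python
import numpy as np

# Parameter translation between the Boltzmann and LIF domains, Eqs. (12)-(13).
C_m, tau_syn = 0.1, 10.0        # nF, ms
tau_eff = 0.22                  # ms, from <g_tot> in the HCS (illustrative)
alpha, u0 = 1.6, -55.0          # scaling and inflection point (assumed fit values)
mu = -55.0                      # mean free membrane potential (mV)

def lif_bias_potential(b):
    """Mean potential u_b implementing abstract bias b; inverse of Eq. (12)."""
    return u0 + alpha * b

def lif_weight(W, E_rev):
    """LIF synaptic weight w_kj realizing Boltzmann weight W_kj, from Eq. (13)."""
    psp_factor = (1.0 - tau_syn / tau_eff) ** -1 * (
        tau_syn * (np.exp(-1.0) - 1.0)
        - tau_eff * (np.exp(-tau_syn / tau_eff) - 1.0))
    return alpha * C_m * W / ((E_rev - mu) * psp_factor)

print(f"w_kj = {lif_weight(0.5, E_rev=0.0):.4f} uS, "
      f"u_b = {lif_bias_potential(0.3):.2f} mV")
```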
The remaining systematic difference to the abstract model lies in the additive, rather than renewing, nature of PSPs elicited by the same presynaptic neuron, which has a noticeable effect in the case of fast consecutive spikes (bursts). Since the membrane potential closely follows the effective potential in the HCS, renewing PSPs can be achieved by using renewing postsynaptic conductances. For this, we have used short-term synaptic depression [22] with a recovery time constant equal to τ_syn.

The sampling quality of networks of LIF neurons was examined in computer simulations of BMs with randomly drawn parameters b_k and W_kj (Fig. 3A,B). The target distribution p_B(z) is approximated by the distribution p_N(z) of network states with τ_on = 10 ms. The chosen integration time T = 10 s represents a conservative estimate of the maximum duration for which a neuronal ensemble will experience stable stimulus conditions in a behaving organism, and can thus be expected to sample from a stable target distribution. For this integration time, we find that the recurrent network of LIF neurons accurately encodes the target distribution, within the precision imposed by the sample-based representation. The network distribution p_N(z) becomes increasingly more reliable as more samples are considered (Fig. 3C). After a few samples, the network has generated a coarse approximation of p_B(z) that could serve as an "educated guess" in online computation tasks. Only for simulation times T well beyond biologically relevant timescales do systematic errors in p_N(z) become apparent. The sampling quality holds for a variety of target distributions (Fig. 3D, see AIX for simulation details).

FIG. 3. (A) Spike pattern of a recurrent network of K = 5 LIF neurons during sampling from a randomly generated Boltzmann machine. (B) Sampled distribution p_N(z) of network states (blue) and target distribution p_B(z) (red). p_N(z) was estimated from ten 10 s simulation runs (error bars: standard deviation between runs). (C) D_KL(p_N || p_B) as a function of integration time T for 10 trials. The red dotted line shows convergence for the theoretically optimal abstract model [7]. (D) D_KL(p_N || p_B) when sampling for T = 10^6 ms from 100 different randomly generated target distributions.

These observations hold for larger-scale networks as well (Fig. 4). For weak coupling, the networks are in an asynchronous irregular state of firing, which enables highly accurate sampling of the target distribution. As synaptic weights increase, the network activity expectedly becomes more synchronous and the sampled distribution overall less accurate, especially in low-probability regions of the state space, but the high-probability modes are always sampled from with high fidelity.

FIG. 4. Sampling in large networks. Spike raster (top left), joint distribution (bottom) and KL divergence (top right) as in Fig. 3A,B,C. Since the target distribution cannot be computed analytically for 500 RVs, we define the Gibbs-sampling estimate after 10^6 steps as a reference distribution. For the joint, the sampled distribution over 5 RVs (out of 500) obtained from the LIF network (after T_sim = 10^4 ms) is plotted alongside the Gibbs estimate. Error bars indicate standard deviation between 10 simulation runs. In all panels, biases were drawn from a beta distribution: b_k ~ 1.2 · [B(0.5, 0.5) - 0.5]. (A) Small weights: W_kj ~ 0.6 · [B(0.5, 0.5) - 0.5]. (B) Moderate weights: W_kj ~ 1.2 · [B(0.5, 0.5) - 0.5]. (C) Strong weights: W_kj ~ 2.4 · [B(0.5, 0.5) - 0.5]. Systematic deviations between the sampled distribution and the reference manifest mainly in low-probability modes and are due to the difference in PSP shapes between the abstract model and LIF neurons.

FIG. 5. (A) Graphical model used for the probabilistic inference task. (B) A Gaussian likelihood model provides input to the sampling neurons. (C) Two-dimensional projection of states z(t) when sampling from the prior p(z), which was trained to store hand-written digits 0, 3 and 4. Solid line: network trajectory over 200 ms. Color maps: marginals of z_k averaged over 25 ms. The time arrow covers the duration of the red trajectory and consecutive snapshots are 25 ms apart. (D) As in (C) when provided with incomplete input y that is incompatible with digit 0 and ambiguous with respect to digits 3 and 4.

Demonstration of probabilistic inference

We conclude our investigation of sampling in recurrent networks of LIF neurons with an example of Bayesian inference based on incomplete observations. A fully connected BM of K = 144 neurons, aligned on a 12 × 12 grid, was trained as an associative network [23] to store patterns of hand-written digits 0, 3 and 4 in the weights W_kj and biases b_k.
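For reference, the following sketch mirrors the evaluation protocol of Figs. 3 and 4 in the abstract domain: it draws a random Boltzmann machine with the same parameter distributions, samples it with a textbook Gibbs sampler over the conditionals p(z_k = 1 | z_\k) = σ(v_k) (the reference sampler named in the Fig. 4 caption), and evaluates D_KL(p_N || p_B) against the exactly enumerated target of Eq. (10). All function names are ours.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)

# Random Boltzmann machine with the parameter distributions of Fig. 3/4.
K = 5
W = 1.2 * (rng.beta(0.5, 0.5, (K, K)) - 0.5)
W = np.triu(W, 1) + np.triu(W, 1).T              # symmetric, zero diagonal
b = 1.2 * (rng.beta(0.5, 0.5, K) - 0.5)

# Exact target distribution, Eq. (10); tractable by enumeration for small K.
states = np.array(list(product([0, 1], repeat=K)), dtype=float)
logp = 0.5 * np.einsum('si,ij,sj->s', states, W, states) + states @ b
p_B = np.exp(logp - logp.max())
p_B /= p_B.sum()

# Gibbs sampling over the conditionals p(z_k = 1 | z_\k) = sigma(b_k + W_k z).
pow2 = 2 ** np.arange(K - 1, -1, -1)             # state index matching `states`
z = rng.integers(0, 2, K).astype(float)
counts = np.zeros(2 ** K)
for step in range(52_000):
    k = rng.integers(K)
    p_on = 1.0 / (1.0 + np.exp(-(b[k] + W[k] @ z)))
    z[k] = float(rng.random() < p_on)
    if step >= 2_000:                            # discard burn-in samples
        counts[int(z @ pow2)] += 1

p_N = counts / counts.sum()
mask = p_N > 0
dkl = np.sum(p_N[mask] * np.log(p_N[mask] / p_B[mask]))   # Eq. (76)
print(f"D_KL(p_N || p_B) = {dkl:.4f}")
```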
Each pixel of the image grid was assigned to one network neuron. The resulting joint distribution p(z) represents the "prior knowledge" stored by the network. The probabilistic model was augmented by adding real-valued input channels for each pixel, associated with random variables y_k ∈ ℝ, 1 ≤ k ≤ K. The resulting generative model p(y, z) has the structure shown in Fig. 5A and connects the latent network variables z_k to observable inputs y_k by means of likelihood functions p(y_k | z_k), which we have chosen to be Gaussian with unit variance (Fig. 5B). The likelihoods p(y_k | z_k) tend to align the network state with the observation, i.e., z_k = 1 for y_k > 0, while the prior p(z) reconciles the observations with knowledge of consistent activation patterns z. The task for the network is to calculate and represent the posterior distribution according to Bayes' rule: p(z | y) ∝ p(z) · p(y | z). A short derivation (see AVI) shows that the posterior p(z | y) is a BM for any input y, with the abstract membrane potential

$$ v_k = b_k + y_k + \sum_j W_{kj} z_j . \quad (14) $$

In the LIF domain, the sum b_k + y_k is equivalent to an effective bias (Eq. 74) and corresponds to an external current I_k^ext = I_k^b + I_k^y that shifts ū_k appropriately. Thus, a neuron receives synaptic input from recurrent connections and noise sources, as well as an external current, i.e., I_k = I_k^rec + I_k^noise + I_k^ext.

In the case of I_k^y = 0 ∀k, the network samples from the prior distribution p(z) = p(z | y = 0). A two-dimensional projection of network states z(t) ~ p(z) is shown in Fig. 5C. The sampled distribution has three distinct modes that correspond to the three hand-written digits stored in the recurrent weight matrix. A closer look at the network trajectory reveals that the system stays in one mode ("digit") for some duration, traverses the state space and then samples from a different mode of the distribution.

A typical inference scenario with incomplete observations is shown in Fig. 5D. Four input channels at the center were picked to inject positive currents I_k^y > 0 into the network while all other inputs remained uninformative. The positive currents I_k^y were chosen such that the observation y appeared incompatible with digit 0 and remained ambiguous with respect to digits 3 and 4 (see AX). In accordance with Bayes' rule, the resulting bimodal posterior distribution p(z | y) has a suppressed 0-mode, but preserves the 3- and 4-modes.
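The statement behind Eq. (14) can be verified numerically by brute force: with unit-variance Gaussian likelihoods centered at ±1/2 (Eqs. 66-69 in AVI), the posterior p(z | y) is again a Boltzmann distribution whose biases are shifted by y. In the sketch below, W, b and y are random placeholders and K is kept small so that everything can be enumerated exactly.

```python
import numpy as np
from itertools import product
from scipy.stats import norm

rng = np.random.default_rng(2)

K = 4
W = rng.normal(0.0, 0.5, (K, K))
W = np.triu(W, 1) + np.triu(W, 1).T         # symmetric, zero diagonal
b = rng.normal(0.0, 0.5, K)
y = rng.normal(0.0, 1.0, K)                 # observed inputs

states = np.array(list(product([0, 1], repeat=K)), dtype=float)

def boltzmann(bias):
    """Boltzmann distribution exp(z^T W z / 2 + z^T bias)/Z over all states."""
    logp = 0.5 * np.einsum('si,ij,sj->s', states, W, states) + states @ bias
    p = np.exp(logp - logp.max())
    return p / p.sum()

# Brute-force posterior: p(z | y) ~ p(z) * prod_k N(y_k; z_k - 1/2, 1), Eq. (67).
likelihood = np.prod(norm.pdf(y[None, :], loc=states - 0.5, scale=1.0), axis=1)
posterior = boltzmann(b) * likelihood
posterior /= posterior.sum()

print(np.allclose(posterior, boltzmann(b + y)))   # True: effective bias is b + y
```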
Discussion

We have shown how recurrent networks of conductance-based neurons in a spiking noisy environment can perform probabilistic inference through sampling from a well-defined posterior distribution. Our approach extends Bayesian spiking network implementations to deterministic neuron models widely used in computational neuroscience. We have provided an analytical derivation of the bursty firing response of LIF neurons under Poisson bombardment, which holds for a wide range of parameter regimes (both high and low ratios of τ_syn/τ_eff, as well as both high and low τ_ref). Our approach is based on the propagation of the membrane autocorrelation throughout bursts and can thereby provide a prediction of the activation function in a regime where existing approaches [16, 17] do not hold (see also Fig. 6 in AXI).

We have further shown how high-frequency spiking inputs that could be provided by the surrounding network can lead to fast membrane dynamics, which enable individual LIF neurons to correctly encode conditional distributions given information from their presynaptic partners. Thereby, our derivation also identifies a potential functional role of biologically observed high-conductance states and synaptic memory within a Bayesian framework of brain computation.

For mathematical tractability, simplifying modeling assumptions had to be made. The neuron model uses an absolute refractory time τ_ref, which matches the activation time constant τ_on, and neglects any gradual recovery effects. On the network level, we have assumed statistically independent noise sources and instantaneous axonal transmission. One important difference to cortical structure is the requirement of a symmetric connectivity matrix. The precise symmetry is a consequence of all neurons sharing the same parameters and can be relaxed as neurons become diverse (α ↦ α_k in Eq. 13). Furthermore, while Dale's principle is known to not hold universally [24], negative coupling between otherwise excitatory neurons in large networks could, in principle, be introduced through populations of inhibitory interneurons. LIF PSPs differ from the theoretically optimal rectangular shape, which could impair convergence to the target distribution outside of the high-noise regime [25]. However, computer simulations indicate that in many biologically relevant scenarios the above approximations are not critical (see also Fig. 7 in AXII). In particular, the sampling properties of LIF networks remain preserved as network size and interaction strength increase (Fig. 4). Embedding LIF sampling in cortical-size networks appears feasible in light of our results, but remains a matter for future work.

For neuroscientific modeling, our analysis of LIF neurons can be readily transferred to other neuron [26] and synapse models. Beyond neuroscience, the ability to perform probabilistic inference with deterministic neurons represents a promising computing paradigm for neuromorphic hardware systems, which typically implement physical models of integrate-and-fire neurons [27, 28]. The distributed nature of the proposed LIF sampling networks allows them to exploit the inherent parallelism of neuromorphic architectures and fosters their application to online data evaluation and robotics. In this context, our results have already provided the basis for the implementation of Bayesian networks [29] and learning [30].

APPENDIX

Appendix I: Conductance-based LIF neuron model

We restate the set of equations that govern the conductance-based LIF model. For the membrane potential, we have

$$ C_m \frac{du}{dt} = g_l (E_l - u) + I , \quad (15) $$

with capacitance C_m, membrane potential u, leak potential E_l, leak conductance g_l and input current I. The input current I can be formally partitioned into recurrent synaptic inputs, diffuse synaptic noise and additional "external" currents, i.e.,

$$ I = I^{\text{rec}} + I^{\text{noise}} + I^{\text{ext}} . \quad (16) $$

I^ext may represent external current stimuli, average synaptic stimulus currents, or changes in the leak mechanism (i.e., changes in g_l or E_l).
The total synaptic current I^syn = I^rec + I^noise obeys

$$ I^{\text{syn}} = \sum_{\text{syn } i} g_i^{\text{syn}} \left( E_i^{\text{rev}} - u \right) , \quad (17) $$

where g_i^syn represents the conductance at the i-th synapse and E_i^rev the corresponding reversal potential. The synaptic conductance obeys the ODE

$$ \frac{dg_i^{\text{syn}}}{dt} = -\frac{g_i^{\text{syn}}}{\tau_{\text{syn}}} + \sum_{\text{spk } s} w_i \, \delta(t - t_s) , \quad (18) $$

with synaptic time constant τ_syn and weight w_i. The sum runs over all presynaptic spikes s. The solution to this equation is a superposition of exponentials:

$$ g_i^{\text{syn}} = \sum_{\text{spk } s} w_i \, \Theta(t - t_s) \exp\left( -\frac{t - t_s}{\tau_{\text{syn}}} \right) . \quad (19) $$

Putting all of the above together, we obtain the full ODE for the membrane potential:

$$ C_m \frac{du}{dt} = g_l (E_l - u) + \sum_i \sum_{\text{spk } s} w_i \, \Theta(t - t_s) \exp\left( -\frac{t - t_s}{\tau_{\text{syn}}} \right) \left( E_i^{\text{rev}} - u \right) + I^{\text{ext}} . \quad (20) $$

We can now divide the RHS of Eq. (20) by g_tot = g_l + Σ_i g_i^syn and rearrange the terms in order to obtain

$$ \tau_{\text{eff}} \frac{du}{dt} = u_{\text{eff}} - u , \quad (21) $$

with a new effective membrane time constant

$$ \tau_{\text{eff}} = \frac{C_m}{g_{\text{tot}}} \quad (22) $$

and effective leak potential

$$ u_{\text{eff}}(t) = \frac{g_l E_l + \sum_i g_i^{\text{syn}}(t) E_i^{\text{rev}} + I^{\text{ext}}}{g_{\text{tot}}(t)} . \quad (23) $$

This transformation is routinely used in studies of conductance-based neurons (see, e.g., [14]). Here, we have made the time dependencies explicit, since the following HCS approximation will serve to eliminate t in the denominator.

Appendix II: The high-conductance state (HCS)

In a first approximation, assuming a rapidly firing Poisson background (ν_syn → ∞), the total average conductance can become arbitrarily large (⟨g_tot⟩ → ∞), causing the membrane potential to follow the effective potential nearly instantaneously (⟨τ_eff⟩ → 0). Eq. (23) can then be rewritten as

$$ u \approx u_{\text{eff}} = \frac{g_l E_l + \sum_i \langle g_i^{\text{syn}} \rangle E_i^{\text{rev}} + \sum_i \Delta g_i^{\text{syn}} E_i^{\text{rev}} + I^{\text{ext}}}{\langle g_{\text{tot}} \rangle + \sum_i \Delta g_i^{\text{syn}}} , \quad (24) $$

where

$$ \Delta g_i^{\text{syn}} = g_i^{\text{syn}} - \langle g_i^{\text{syn}} \rangle \quad (25) $$

denotes the fluctuations of the synaptic conductances.

For a single Poisson source with rate ν_i connected to the neuron by a synapse with weight w_i and time constant τ_syn, the conductance course can be seen as a sum of independent random variables, each of them representing the conductance change caused by a single spike. In the limit of large ν_i, the central limit theorem guarantees the convergence of the conductance distribution to a Gaussian, with moments given by

$$ \langle g_i^{\text{syn}} \rangle = \sum_{\text{spk } s} \left\langle w_i \Theta(t - t_s) \exp\left( -\frac{t - t_s}{\tau_{\text{syn}}} \right) \right\rangle = \lim_{T \to \infty} \frac{\langle N \rangle}{T} w_i \int_0^T \exp\left( -\frac{t}{\tau_{\text{syn}}} \right) dt = w_i \nu_i \tau_{\text{syn}} \quad (26) $$

and

$$ \mathrm{Var}\left[ g_i^{\text{syn}} \right] = \lim_{T \to \infty} \nu_i T \left\{ \frac{w_i^2}{T} \int_0^T \exp\left( -\frac{2t}{\tau_{\text{syn}}} \right) dt - \frac{w_i^2}{T^2} \left[ \int_0^T \exp\left( -\frac{t}{\tau_{\text{syn}}} \right) dt \right]^2 \right\} = \frac{1}{2} w_i^2 \nu_i \tau_{\text{syn}} , \quad (27) $$

so the relative fluctuations of g_i^syn are of the order

$$ \frac{\sqrt{\mathrm{Var}[g_i^{\text{syn}}]}}{\langle g_i^{\text{syn}} \rangle} = \sqrt{\frac{1}{2 \nu_i \tau_{\text{syn}}}} \quad (28) $$

and vanish in the limit of large firing rates. This warrants an expansion of Eq. (24) in ∆g_i^syn for all i. Considering only the first-order term, we obtain

$$ u(t) = \frac{I^{\text{ext}} + g_l E_l + \sum_i g_i^{\text{syn}}(t) E_i^{\text{rev}}}{\langle g_{\text{tot}} \rangle} , \quad (29) $$

which renders u simply a linear transformation of the synaptic noise current J^syn = Σ_i g_i^syn E_i^rev.
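As a quick consistency check of these moments, the following sketch simulates the shot-noise conductance of a single Poisson-driven exponential synapse and compares its empirical mean and variance with Eqs. (26)-(27), and hence the fluctuation scaling of Eq. (28). Parameter values follow Appendix VII; the discretized recursion is our own construction.

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(3)

# Monte-Carlo check: for Poisson drive of rate nu through an exponential
# synapse, <g> = w*nu*tau_syn and Var[g] = w^2*nu*tau_syn/2, so relative
# fluctuations scale as 1/sqrt(2*nu*tau_syn).
w, nu, tau_syn = 0.0035, 5.0, 10.0          # uS, spikes/ms, ms
dt, T = 0.01, 100_000.0                     # ms
spikes = rng.poisson(nu * dt, int(T / dt))  # Poisson spike counts per bin
# forward-Euler recursion g[n] = (1 - dt/tau_syn) * g[n-1] + w * spikes[n]
g = lfilter([w], [1.0, -(1.0 - dt / tau_syn)], spikes)
g = g[int(100.0 / dt):]                     # drop the initial transient
print(f"mean {g.mean():.4f}  vs  w*nu*tau       = {w * nu * tau_syn:.4f}")
print(f"var  {g.var():.6f}  vs  w^2*nu*tau/2 = {w**2 * nu * tau_syn / 2:.6f}")
```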
Appendix III: Derivation of the equivalence to an OU process

From Eq. (18), we can find that the synaptic noise current J^syn obeys the first-order inhomogeneous ODE

$$ \frac{dJ^{\text{syn}}}{dt} = -\frac{J^{\text{syn}}}{\tau_{\text{syn}}} + \sum_{\text{syn } i} \sum_{\text{spk } s} \Delta J_i^{\text{syn}} \, \delta(t - t_s) , \quad (30) $$

where ∆J_i^syn = w_i E_i^rev. This equation is highly reminiscent of the ODE that defines the OU process,

$$ dx(t) = \theta \left[ \mu - x(t) \right] dt + \sigma \, dW(t) . \quad (31) $$

It is well known that the PDF of the OU process,

$$ f(x, t \mid x_0) = \sqrt{\frac{\theta}{\pi \sigma^2 \left( 1 - e^{-2\theta t} \right)}} \exp\left[ -\frac{\theta}{\sigma^2} \frac{\left( x - \mu + (\mu - x_0) e^{-\theta t} \right)^2}{1 - e^{-2\theta t}} \right] , \quad (32) $$

is the unique solution of the Fokker-Planck equation

$$ \frac{1}{\theta} \frac{\partial f(x, t)}{\partial t} = \frac{\partial}{\partial x} \left[ (x - \mu) f \right] + \frac{\sigma^2}{2\theta} \frac{\partial^2 f}{\partial x^2} \quad (33) $$

with starting condition x_0 := x(t = 0). In the following, we prove that, under certain assumptions, the distribution of the synaptic input J^syn obeys the same Fokker-Planck equation. To this end, we follow an approach similar to [15].

Consider the PDF of the synaptic input f(J^syn, t). We can use the Chapman-Kolmogorov equation to describe its evolution after a short time interval ∆t as an integral over all possible intermediate states J′:

$$ f(J^{\text{syn}}, t + \Delta t) = \int_{-\infty}^{\infty} f(J^{\text{syn}}, t + \Delta t \mid J', t) \, f(J', t) \, dJ' . \quad (34) $$

For a small enough ∆t, the probability of the occurrence of multiple spikes within ∆t can be neglected. As incoming spikes are assumed to be generated by Poisson processes, the probability of a single spike occurring in ∆t is ∆t Σ_i ν_i. By summing over the two possible histories of J^syn within ∆t (either a single incoming spike or no spike at all), we can use Eq. (30) to find

$$ f(J^{\text{syn}}, t + \Delta t \mid J') = \left[ 1 - \Delta t \sum_i \nu_i \right] \delta\left( J^{\text{syn}} - J' e^{-\Delta t/\tau_{\text{syn}}} \right) + \Delta t \sum_i \nu_i \, \delta\left( J^{\text{syn}} - \left( J' + \Delta J_i^{\text{syn}} \right) e^{-\Delta t/\tau_{\text{syn}}} \right) , \quad (35) $$

where ν_i represents the afferent firing frequency at the i-th synapse. Plugging this into Eq. (34) and integrating over J′ yields

$$ f(J^{\text{syn}}, t + \Delta t) = \left( 1 - \Delta t \sum_i \nu_i \right) e^{\Delta t/\tau_{\text{syn}}} f\left( J^{\text{syn}} e^{\Delta t/\tau_{\text{syn}}}, t \right) + \Delta t \sum_i \nu_i \, e^{\Delta t/\tau_{\text{syn}}} f\left( J^{\text{syn}} e^{\Delta t/\tau_{\text{syn}}} - \Delta J_i^{\text{syn}}, t \right) . \quad (36) $$

We can now expand f(J^syn, t + ∆t) up to first order in ∆t,

$$ f(J^{\text{syn}}, t + \Delta t) \approx f(J^{\text{syn}}, t) + \left. \frac{\partial f(J^{\text{syn}}, t + \Delta t)}{\partial \Delta t} \right|_{\Delta t = 0} \Delta t , \quad (37) $$

and rearrange the terms into the difference quotient

$$ \frac{f(J^{\text{syn}}, t + \Delta t) - f(J^{\text{syn}}, t)}{\Delta t} = \left. \frac{\partial f(J^{\text{syn}}, t + \Delta t)}{\partial \Delta t} \right|_{\Delta t = 0} , \quad (38) $$

where the derivative on the right-hand side follows from differentiating Eq. (36) with respect to ∆t. By taking the limit ∆t → 0, we obtain

$$ \frac{\partial f(J^{\text{syn}}, t)}{\partial t} = \frac{1}{\tau_{\text{syn}}} \frac{\partial}{\partial J^{\text{syn}}} \left[ J^{\text{syn}} f(J^{\text{syn}}, t) \right] + \sum_i \nu_i \left[ f\left( J^{\text{syn}} - \Delta J_i^{\text{syn}}, t \right) - f(J^{\text{syn}}, t) \right] . \quad (39) $$

In the limit of small synaptic weights (i.e., ∆J_i^syn → 0), we can expand the second term on the RHS up to second order in ∆J_i^syn. After some rearrangement, this yields

$$ \frac{\partial f(J^{\text{syn}}, t)}{\partial t} = \frac{1}{\tau_{\text{syn}}} \frac{\partial}{\partial J^{\text{syn}}} \left[ \left( J^{\text{syn}} - \sum_i \nu_i \Delta J_i^{\text{syn}} \tau_{\text{syn}} \right) f(J^{\text{syn}}, t) \right] + \frac{\sum_i \nu_i {\Delta J_i^{\text{syn}}}^2}{2} \frac{\partial^2 f(J^{\text{syn}}, t)}{\partial {J^{\text{syn}}}^2} , \quad (40) $$

which is the exact equivalent of the Fokker-Planck equation of the OU process (Eq. 33). Since u(t) is only a linear transformation of J^syn(t), it can also be approximated by an OU process in the limit of large input frequencies and small synaptic weights, with Eq. (29) and Eq. (40) giving the specific time constant, mean value and variance:

$$ \theta = \frac{1}{\tau_{\text{syn}}} , \quad (41) $$

$$ \mu = \frac{I^{\text{ext}} + g_l E_l + \sum_i \nu_i w_i E_i^{\text{rev}} \tau_{\text{syn}}}{\langle g_{\text{tot}} \rangle} , \quad (42) $$

$$ \frac{\sigma^2}{2} = \frac{\sum_i \nu_i \left[ w_i \left( E_i^{\text{rev}} - \mu \right) \right]^2 \tau_{\text{syn}}}{2 \langle g_{\text{tot}} \rangle^2} . \quad (43) $$
We conclude this section with two important notes. Firstly, for the above methodology to be generally applicable, we must be able to take the limit ∆J_i^syn → 0 for arbitrary first and second moments of f(J^syn, t) without modifying them. This is possible if at least one excitatory and one inhibitory input is present, which then give us two degrees of freedom with a proper choice of ν_exc → ∞ and ν_inh → ∞. Secondly, all higher moments (third and above) need to vanish in the abovementioned limit. This has been shown to also be the case under the above conditions [31].

Appendix IV: Derivation of the activation function

We can distinguish between two firing modes of the neuron. The first mode can be classified as "burst spiking" and occurs when multiple spikes occur in rapid succession, with an expected ISI of ⟨∆t_k⟩ = τ_ref + ⟨t_k^{s+1} - t_k^s⟩_s = τ_ref + τ_b^k, where τ_b^k represents the average drift time from the reset to the threshold potential following the k-th refractory period within a burst. In this case, for each spike within a burst,

$$ u_{\text{eff}}(t_s) \geq \vartheta , \quad (44) $$

and also, for all but the last spike,

$$ u_{\text{eff}}(t_s + \tau_{\text{ref}}) \geq \vartheta . \quad (45) $$

The second mode appears between such bursts, where the membrane potential evolves freely in the subthreshold regime. If we define, just like in the abstract model, that the k-th neuron is in the state z_k = 1 for a duration τ_on = τ_ref following a spike, we can write

$$ p(z = 1) = \frac{\sum_n P_n \, n \, \tau_{\text{ref}}}{\sum_n P_n \left( n \tau_{\text{ref}} + \sum_{k=1}^{n-1} \tau_b^k + T_n \right)} , \quad (46) $$

where P_n represents the distribution of burst lengths (conditioned on the existence of the first spike) and T_n is the mean time interval between the end of a burst (i.e., the endpoint of its last refractory period) and the next spike. The variables P_n, T_n and τ_b^k depend on all neuron and noise parameters, but for calculating the activation function (Fig. 2 in the main manuscript and Fig. 6 in the Appendix), we only vary ū.

We can now calculate both P_n and T_n iteratively. The idea behind this approach is to propagate the membrane potential PDF from spike to spike within a burst and cut off the irrelevant parts for a particular burst length n. We denote the spike times within a burst of length n by t_0, ..., t_{n-1} and the endpoint of such a burst by t_n := t_{n-1} + τ_ref. For brevity, we also use u_i := u(t_i). Assuming a first spike at some time t_0 (u_0 := u(t_0) = ϑ), a "burst" of length n = 1 requires a subthreshold free membrane potential after the first refractory period (u_1 := u(t_0 + τ_ref) < ϑ). This occurs with probability

$$ P_1 := p(u_1 < \vartheta \mid u_0 = \vartheta) = \underbrace{\int_{-\infty}^{\vartheta} du_1 \, p(u_1 \mid u_0 = \vartheta)}_{I_1} , \quad (47) $$

where p(u_{i+1} | u_i) := f(u, τ_ref | u_i), which was defined in Eq. (32). On average, the neuron then stays in the subthreshold regime for a period equal to the mean first-passage time from u_1 to ϑ, so the mean duration of the time interval until the onset of the next burst can be expressed as

$$ T_1 = \int_{-\infty}^{\vartheta} du_1 \, p(u_1 \mid u_0 = \vartheta) \, \langle T(\vartheta, u_1) \rangle . \quad (48) $$

The first-passage time problem of the OU process has often been discussed [32]. While no closed-form expression for the distribution of first-passage times T(b, a) = inf{t ≥ 0 : x(t) = b | x(0) = a} is known, its moments can be computed analytically [18].
In particular, the mean first-passage time reads

$$ \langle T(b, a) \rangle = \frac{\sqrt{\pi}}{\theta} \int_{\sqrt{\theta}(a - \mu)/\sigma}^{\sqrt{\theta}(b - \mu)/\sigma} e^{x^2} \left[ 1 + \mathrm{erf}(x) \right] dx . \quad (49) $$

A burst of n = 2 spikes can only occur when the effective membrane potential lies above the spiking threshold (u_1 ≥ ϑ) after the first refractory period and below it after the second (u_2 < ϑ). This makes P_2 and T_2 recursive functions of P_1:

$$ P_2 = p(u_2 < \vartheta, u_1 \geq \vartheta \mid u_0 = \vartheta) = p(u_1 \geq \vartheta \mid u_0 = \vartheta) \, p(u_2 < \vartheta \mid u_1 \geq \vartheta, u_0 = \vartheta) \stackrel{(47)}{=} \underbrace{(1 - I_1)}_{= 1 - P_1} \int_{\vartheta}^{\infty} du_1 \, p(u_1 \mid u_1 \geq \vartheta) \underbrace{\left[ \int_{-\infty}^{\vartheta} du_2 \, p(u_2 \mid u_1) \right]}_{I_2} , \quad (50) $$

$$ T_2 = \int_{\vartheta}^{\infty} du_1 \, p(u_1 \mid u_1 \geq \vartheta) \left[ \int_{-\infty}^{\vartheta} du_2 \, p(u_2 \mid u_2 < \vartheta, u_1) \, \langle T(\vartheta, u_2) \rangle \right] , \quad (51) $$

where p(u_i | u_i ≥ ϑ) is a shorthand notation for p(u_i | u_i ≥ ϑ, u_{i-1} ≥ ϑ, ..., u_1 ≥ ϑ, u_0 = ϑ). In particular, this represents a renormalization of the PDF of the effective membrane potential to values above the spiking threshold after i refractory periods. We can now continue this recursion up to an arbitrary burst length: repeated factorization of the conditional probabilities, one refractory period at a time, gives

$$ P_n = p(u_n < \vartheta, u_{n-1} \geq \vartheta, \dots, u_1 \geq \vartheta \mid u_0 = \vartheta) = \prod_{i=1}^{n-1} (1 - I_i) \, p(u_n < \vartheta \mid u_{n-1} \geq \vartheta, \dots, u_1 \geq \vartheta, u_0 = \vartheta) \quad (52) $$

$$ = \left( 1 - \sum_{i=1}^{n-1} P_i \right) \int_{\vartheta}^{\infty} du_{n-1} \, p(u_{n-1} \mid u_{n-1} \geq \vartheta) \underbrace{\left[ \int_{-\infty}^{\vartheta} du_n \, p(u_n \mid u_{n-1}) \right]}_{I_n} , \quad (53) $$

$$ T_n = \int_{\vartheta}^{\infty} du_{n-1} \, p(u_{n-1} \mid u_{n-1} \geq \vartheta) \left[ \int_{-\infty}^{\vartheta} du_n \, p(u_n \mid u_n < \vartheta, u_{n-1}) \, \langle T(\vartheta, u_n) \rangle \right] . \quad (54) $$

The transition from a product to a sum between Eq. (52) and Eq. (53) requires the identity

$$ \prod_{i=1}^{n-1} (1 - I_i) = 1 - \sum_{i=1}^{n-1} P_i , \quad (55) $$

which can easily be shown by induction from P_n = I_n ∏_{i=1}^{n-1}(1 - I_i) (Eq. 52) and P_1 = I_1 (Eq. 47). Since lim_{n→∞} P_n = 0, one can stop the recursion at some small enough P_n.

What remains to be calculated is the average time-to-threshold τ_b^k within a burst that follows the k-th refractory period. Since we assume a HCS, we are looking at a regime in which τ_eff ≪ τ_syn. Therefore, we can assume u_eff to be approximately unchanged during the short time interval τ_b^k (adiabatic approximation, see also [17]). For a fixed u_k, the jump time can be easily calculated from Eq. (21):

$$ \tau_b^k(u_k) = \tau_{\text{eff}} \ln\left( \frac{\varrho - u_k}{\vartheta - u_k} \right) . \quad (56) $$

The average jump time can then be obtained by integrating over all suprathreshold values of u_k, which in turn have probabilities that follow from integrating over all suprathreshold values of u_{k-1}:

$$ \tau_b^k = \int_{\vartheta}^{\infty} du_k \, \tau_{\text{eff}} \ln\left( \frac{\varrho - u_k}{\vartheta - u_k} \right) \int_{\vartheta}^{\infty} du_{k-1} \, p(u_k \mid u_k > \vartheta, u_{k-1}) . \quad (57) $$

With Eqs. (46), (53), (54), (57) and (49), one could now predict the activation function of an LIF unit in an extreme high-noise regime (τ_eff → 0). We can, however, generalize our approach by taking the finite nature of the effective time constant into account. If we go back to Eq. (21) and leave τ_eff = C_m/⟨g_tot⟩ small but finite, we can still perform all the remaining approximations, but are required to modify Eq. (29):

$$ \tau_{\text{eff}} \, \dot u(t) = \frac{I^{\text{ext}} + g_l E_l}{\langle g_{\text{tot}} \rangle} + \frac{J^{\text{syn}}(t)}{\langle g_{\text{tot}} \rangle} - u(t) . \quad (58) $$
Together with Eq. (30), we now have a system of first-order ODEs which can be solved analytically by standard techniques (variation of constants). The PSPs are then no longer a linear transformation of the exponentially shaped PSCs, but rather alpha-shaped (more precisely, a difference of exponentials):

$$ u_s(t) = \Theta(t - t_s) \, A \, \frac{e^{-(t - t_s)/\tau_{\text{eff}}} - e^{-(t - t_s)/\tau_{\text{syn}}}}{\tau_{\text{eff}} - \tau_{\text{syn}}} , \quad (59) $$

with A = w_i (E_i^rev - ⟨u_eff⟩) τ_syn / ⟨g_tot⟩. This shape causes a lower PSP peak than in the case of exponential PSPs, decreasing the overall width of the membrane potential distribution. Intuitively speaking, this results in a horizontal shift and compression of the activation function.

More recently, analytical treatments of these phenomena have been proposed [33]. In these approaches, large membrane time constants (equivalent to a long τ_eff) and small synaptic time constants are usually considered. However, Eq. (59) is symmetric in τ_eff and τ_syn, so the same argument applies to our case as well, with the two time constants switched. It is, for example, possible to correct the first-passage time from the reset to the threshold potential by using an expansion in √(τ′/τ) (with τ′ and τ being the smaller and the larger of the two time constants, respectively) [16]:

$$ \langle T(\vartheta, u) \rangle = \tau \sqrt{\pi} \int_{(u - \mu)/\sigma}^{(\vartheta_{\text{eff}} - \mu)/\sigma} dx \, \exp(x^2) \left[ \mathrm{erf}(x) + 1 \right] , \quad (60) $$

with µ and σ² the first two moments of the free membrane potential distribution and an effective threshold

$$ \vartheta_{\text{eff}} \approx \vartheta - \zeta\left( \tfrac{1}{2} \right) \sqrt{\frac{\tau'}{2\tau}} \, \sigma , \quad (61) $$

in which ζ denotes the Riemann zeta function. In our particular case, the expansion is done in √(τ_eff/τ_syn), so τ′ = τ_eff and τ = τ_syn. With this approximation, we assume that u converges from ϱ to u_eff in negligible time after it is released from the refractory state. Afterwards, its convergence to u_eff is determined by Eq. (21). Note how Eq. (60) is equivalent to a change of the integration variable and limits in the original equation for the first-passage time (Eq. 49).

Appendix V: Translation of synaptic weights

A sufficient condition for a single neuron to sample from the correct conditional distribution is given by its activation function:

$$ p(z_k = 1 \mid z_{\setminus k}) = \sigma(v_k) . \quad (62) $$

The relationship between the abstract model and the LIF implementation is defined by the lateral dilation α and relative offset ū_k^0 of the LIF activation function:

$$ p(z_k = 1 \mid z_{\setminus k}) = \sigma\left( \frac{\bar u_k - \bar u_k^0}{\alpha} \right) . \quad (63) $$

The parameters α and ū_k^0 can be determined by fitting Eq. (63) either to simulation results or to the theoretical prediction (Eq. 46). As a consequence, synaptic weights also need to be rescaled by the factor α. Additionally, the difference in PSP shapes needs to be taken into account. We choose a translation rule in which we set the LIF synaptic weights w_kj such that the area under a PSP (Eq. 59) during the refractory state of the corresponding afferent neuron (i.e., for a duration τ_ref) is equal to W_kj τ_ref α:

$$ W_{kj} \tau_{\text{ref}} \, \alpha = \int_0^{\tau_{\text{ref}}} \frac{w_{kj} \left( E_{kj}^{\text{rev}} - \langle u_{\text{eff}} \rangle \right) \tau_{\text{syn}}}{\langle g_{\text{tot}} \rangle} \cdot \frac{e^{-t/\tau_{\text{eff}}} - e^{-t/\tau_{\text{syn}}}}{\tau_{\text{eff}} - \tau_{\text{syn}}} \, dt = \frac{w_{kj} \tau_{\text{syn}} \left( E_{kj}^{\text{rev}} - \mu \right)}{\langle g_{\text{tot}} \rangle \left( \tau_{\text{eff}} - \tau_{\text{syn}} \right)} \left[ \tau_{\text{syn}} \left( e^{-\tau_{\text{ref}}/\tau_{\text{syn}}} - 1 \right) - \tau_{\text{eff}} \left( e^{-\tau_{\text{ref}}/\tau_{\text{eff}}} - 1 \right) \right] . \quad (64) $$

By setting τ_ref = τ_syn, we obtain the mapping between the abstract and LIF synaptic weight domains:

$$ W_{kj} = \frac{1}{\alpha C_m} w_{kj} \left( E_{kj}^{\text{rev}} - \mu \right) \left( 1 - \frac{\tau_{\text{syn}}}{\tau_{\text{eff}}} \right)^{-1} \left[ \tau_{\text{syn}} \left( e^{-1} - 1 \right) - \tau_{\text{eff}} \left( e^{-\tau_{\text{syn}}/\tau_{\text{eff}}} - 1 \right) \right] . \quad (65) $$
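The closed form can be cross-checked numerically: with τ_ref = τ_syn, the bracketed factor of Eq. (65), including its prefactor (1 - τ_syn/τ_eff)^(-1), equals τ_eff times the integral of the PSP kernel in Eq. (64). A short verification with illustrative time constants (our own check):

```python
import numpy as np
from scipy.integrate import quad

tau_eff, tau_syn = 0.5, 10.0
tau_ref = tau_syn                    # the sampling framework sets tau_ref = tau_syn

# difference-of-exponentials PSP kernel of Eq. (59), without the amplitude A
kernel = lambda t: (np.exp(-t / tau_eff) - np.exp(-t / tau_syn)) / (tau_eff - tau_syn)
area, _ = quad(kernel, 0.0, tau_ref)           # the integral in Eq. (64)

closed = (1.0 - tau_syn / tau_eff) ** -1 * (
    tau_syn * (np.exp(-1.0) - 1.0) - tau_eff * (np.exp(-tau_syn / tau_eff) - 1.0))

print(np.isclose(tau_eff * area, closed))      # True
```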
Additionally, depressing short-term plasticity [22] has been applied to attenuate the amplitudes of consecutively arriving alpha-shaped PSPs from a network neuron and emulate renewing synapses. In particular, within the Tsodyks-Markram short-term plasticity model [22], the synaptic efficacy parameter and recovery time constant have been chosen as U_SE = 1 and τ_rec = τ_syn, respectively.

Analogously to the weights, the biases can be determined from the condition σ(b_k) = σ((ū_k^b - ū_k^0)/α) in the absence of recurrent activity, i.e., I_k^rec = 0.

Appendix VI: Probabilistic model for the demonstration of inference

We define a joint model p(y, z) over real-valued input nodes y = (y_1, ..., y_K) and binary latent variables z = (z_1, ..., z_K) as sketched in Fig. 5A of the main manuscript. The real-valued variables y_k encode intensities of the input pixels. The latent variables z_k correspond to neurons in the network. Our aim is to demonstrate that the network can sample from the posterior distribution p(z | y).

For this example, we have chosen a particularly simple likelihood function, namely a Gaussian emission model with variance σ² = 1 and mean values µ = ±1/2. From the graphical model we identify the structure of the joint distribution

$$ p(y, z) = p(z) \cdot \prod_{k=1}^{K} p(y_k \mid z_k) , \quad (66) $$

where p(z) is a Boltzmann distribution and the likelihood is defined by

$$ p(y_k \mid z_k) = \mathcal{N}(y_k; \mu = 1/2)^{z_k} \cdot \mathcal{N}(y_k; \mu = -1/2)^{1 - z_k} , \quad (67) $$

which is equivalent to

$$ \log p(y_k \mid z_k) = z_k \left[ \log \mathcal{N}(y_k; 1/2) - \log \mathcal{N}(y_k; -1/2) \right] + \log \mathcal{N}(y_k; -1/2) = z_k \, y_k + \log \mathcal{N}(y_k; -1/2) , \quad (68) $$

using normal distributions

$$ \mathcal{N}(y; \mu) := \exp\left[ -(y - \mu)^2 / 2 \right] / \sqrt{2\pi} \quad (69) $$

with unit variance. The posterior of this model reads

$$ p(z \mid y) = \frac{p(y, z)}{p(y)} = \frac{\exp\left( \frac{1}{2} z^T W z + z^T (b + y) \right)}{\text{Norm.}} , \quad (70) $$

where the normalizing constant depends on the input y, but is independent of the network variables and thus defines a Boltzmann distribution over z for any y. In particular, we identify the abstract membrane potential v_k = b_k + y_k + Σ_j W_kj z_j for sampling from p(z | y) by means of a spiking network.

Appendix VII: Simulation parameters

All simulations have been performed with the NEURON simulation package [34] and the PyNN API [35], with a time step of dt = 0.01 ms. For the LIF neurons, we have chosen the following parameters (compare with, e.g., [36] for parameters fitted to experimental data):

  C_m         0.1 nF    membrane capacitance
  g_l         5 nS      leak conductance
  E_l         -65 mV    leak potential
  ϱ           -53 mV    reset potential
  E_exc^rev   0 mV      excitatory reversal potential
  E_inh^rev   -90 mV    inhibitory reversal potential
  ϑ           -52 mV    threshold voltage
  τ_syn       10 ms     synaptic time constant
  τ_ref       10 ms     refractory time constant

TABLE I. Neuron parameters used for the simulations in the main manuscript.

Synaptic noise was implemented as bombardment by inhibitory and excitatory Poisson stimuli with rates ν_inh = ν_exc = 5000 Hz. The excitatory synaptic weight for the noise stimuli was set to w_exc^noise = 0.0035 µS. The inhibitory weight w_inh^noise was adjusted so as to yield p(z_k = 1) ≈ 0.5 with no current stimulus present. For the above parameters, this happens at an average free membrane potential of ū = -55 mV.
This determines w_inh^noise according to

$$ \left| \frac{E_{\text{inh}}^{\text{rev}} - \bar u}{E_{\text{exc}}^{\text{rev}} - \bar u} \right| = \frac{w_{\text{exc}}^{\text{noise}}}{w_{\text{inh}}^{\text{noise}}} . \quad (71) $$

Appendix VIII: The activation function of LIF neurons in a spiking noisy environment (Fig. 2 in the main manuscript)

In order to sweep through the activation function, the external current I^ext was varied. However, in order to facilitate a comparison with the logistic activation function of the abstract model, we have represented p(z = 1) as a function of ū instead. The latter is equivalent to the mean µ of the corresponding Ornstein-Uhlenbeck process, with Eq. (42) allowing a direct translation between I^ext and ū.

The abscissa values in Fig. 2D in the main manuscript represent averages of the free membrane potential obtained from 10 simulation runs with a total duration of T_sim = 100 s and the firing threshold ϑ set to E_exc^rev = 0 mV (preventing spikes while the free membrane potential is recorded). The deviations from the theoretical prediction (Eq. 46) are smaller than the size of the symbols, therefore no error bars are shown.

The ordinate values and standard errors were calculated from the simulated spike train data according to

$$ p(z = 1) = \frac{1}{N} \sum_{i=1}^{N} p_i , \quad (72) $$

$$ s = \sqrt{ \frac{1}{N - 1} \sum_{i=1}^{N} \left[ p_i - p(z = 1) \right]^2 } , \quad (73) $$

with p_i = N_i^spk τ_on / T_sim being the fraction of time spent in z = 1 and N_i^spk representing the total number of spikes in the i-th out of N = 10 performed simulations. Since the respective standard errors of the mean are smaller than the size of the symbols, no error bars are shown.

Appendix IX: Sampling via recurrent networks of LIF neurons (Fig. 3 and 4 in the main manuscript)

The simulated network consists of K = 5 neurons with a synaptic weight matrix W and a bias vector b (both in the Boltzmann domain). All entries were drawn from a beta distribution B(0.5, 0.5) and mapped linearly to the interval [-0.6, 0.6]; more specifically, b_k, W_kj ~ 1.2 · [B(0.5, 0.5) - 0.5]. The parameters and mapping of the beta distribution were chosen with the intent of generating diverse distributions, spanning multiple orders of magnitude.

The bias b_k, defined in the Boltzmann domain, determines the probability p(z_k = 1 | z_\k = 0) for neuron k. In the LIF domain, the probability p(z_k = 1 | z_\k = 0) = 0.5 corresponds to the mean free membrane potential ū_k^0. A nonzero bias can then be described in the LIF domain as a linear shift from ū_k^0 to a mean membrane potential ū_k^b. This yields the linear transformation

$$ b_k = \left( \bar u_k^b - \bar u_k^0 \right)/\alpha , \quad (74) $$

where α represents the scaling factor between the two domains. Both quantities ū_k^0 and α can be determined from the predicted activation function of a single LIF unit. The first quantity constitutes the inflection point of the activation function (at p(z_k = 1 | z_\k = 0) = 0.5), the latter follows from the slope of the function.

By computing ū_k^b, we can map any bias b_k of a single unit of the Boltzmann machine onto a yet unconnected LIF neuron. In simulations, ū_k^b was established by injecting a temporally constant external current I_k^ext according to

$$ I_k^{\text{ext}} = \left( \alpha b_k + \bar u_k^0 \right) \langle g_{\text{tot}} \rangle - g_l E_l - \sum_i \nu_i w_i^{\text{noise}} E_i^{\text{rev}} \tau_{\text{syn}} . \quad (75) $$

In order to achieve sampling network dynamics in the LIF domain faithful to those displayed by an equivalent Boltzmann machine, the Boltzmann weight matrix W was translated into LIF network weights w_kj according to Eq. (65). Thus, superposing PSPs saturate the membrane potential, approximating the constant amplitude of a PSP in the abstract neuron model.
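The two calibration steps above compose as follows; in this sketch, α is assumed to come from a fit of the activation function, and all other values follow Table I and the noise configuration above:

```python
import numpy as np

# Noise and bias calibration, Eqs. (71) and (75): choose the inhibitory noise
# weight so that the free membrane potential settles at the inflection point
# u0, then compute the external current implementing an abstract bias b.
g_l, E_l, tau_syn = 0.005, -65.0, 10.0     # uS, mV, ms
E_exc, E_inh = 0.0, -90.0                  # reversal potentials (mV)
nu, w_exc = 5.0, 0.0035                    # spikes/ms (5000 Hz), uS
u0, alpha = -55.0, 1.6                     # mV; alpha is an assumed fit value

w_inh = w_exc * abs(E_exc - u0) / abs(E_inh - u0)      # Eq. (71), rearranged
g_tot = g_l + nu * tau_syn * (w_exc + w_inh)           # mean total conductance

def bias_current(b):
    """Constant current I_ext implementing abstract bias b, Eq. (75)."""
    noise_drive = nu * tau_syn * (w_exc * E_exc + w_inh * E_inh)
    return (alpha * b + u0) * g_tot - g_l * E_l - noise_drive   # nA

print(f"w_inh = {w_inh:.4f} uS, I_ext(b=0) = {bias_current(0.0):+.3f} nA")
```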
For Fig. 3B in the main manuscript, this setup of a random Boltzmann machine was simulated N = 10 times with different random seeds for the Poisson background, for a duration of T_sim = 10 s each. The red bars show the analytically computed target joint distribution p_B(z). The blue bars depict the network distribution p_N(z), calculated from the firing activity of the simulated LIF network set up to match p_B(z). The means and error bars have been calculated as in Eqs. (72) and (73), respectively.

The above simulations were repeated with a significantly longer duration in order to study systematic deviations due to the LIF implementation. Fig. 3C in the main manuscript shows the distance between the target distribution p_B(z) and its LIF network representation p_N(z) in the form of the Kullback-Leibler divergence

$$ D_{\text{KL}}(p_N \,\|\, p_B) = \left\langle \log\left[ \frac{p_N(z)}{p_B(z)} \right] \right\rangle_{p_N(z)} . \quad (76) $$

This estimate has been taken for one set of parameters (W, b) for ten independent trials (thin lines) in an LIF network at integration times T: 0 ≤ T ≤ T_sim = 10^6 ms. The red dashed line displays the averaged D_KL(p_N || p_B) for the abstract network model with identical parameters (W, b). The decrease of D_KL(p_N || p_B) for longer integration times indicates the increasing precision of the sampling network over time. Eventually, the D_KL(p_N || p_B) of the LIF network converges to a nonzero value, reflecting small systematic errors. Fig. 3D in the main manuscript shows the distribution of D_KL(p_N || p_B) values for 100 randomly drawn Boltzmann machines emulated by LIF networks, evaluated from a single run of T_sim = 10^6 ms each. Fig. 4 was generated in the same way as Fig. 3, except for the network size and weight distribution.

Appendix X: Demonstration of probabilistic inference (Fig. 5 in the main manuscript)

We trained a fully visible, fully connected Boltzmann machine to store three hand-written digits (0, 3, 4) that were taken from the MNIST data set [37] and scaled down to 12x12 pixels. The pixel intensities of these patterns (ranging from 0 to 1) were linearly scaled to an activation between 0.05 and 0.95, defining the target statistics ⟨z_k⟩_T and ⟨z_k z_j⟩_T for the Boltzmann machine. Then, external currents I_k^b and synaptic weights w_kj were optimized via the update rules ∆I_k^b ∝ ⟨z_k⟩_T - z_k and ∆w_kj ∝ ⟨z_k z_j⟩_T - z_k z_j, with samples z obtained through sampling from an LIF network set up with synaptic noise and neuron parameters as described above.

The recurrent connections, defined by w_kj, induce synaptic currents I_k^rec in addition to the noise currents I_k^noise. Additionally, each neuron's mean effective membrane potential (Eq. 42) is shifted by external currents: the trained quantities I_k^b as well as input currents I_k^y. The latter encode observations y, which are defined in the context of the probabilistic model described in AVI. Hence, the total received current of neuron k in the network amounts to

$$ I_k = I_k^{\text{rec}} + I_k^{\text{noise}} + I_k^b + I_k^y . \quad (77) $$

The resulting samples z(t) displayed in subplots 5C and 5D were taken for 4000 ms, after a burn-in time that ensured that the network had converged to its equilibrium distribution.

Figure 5C, main manuscript: Sampling from the prior.
From the probabilistic model it follows that p(z | y = 0) = p(z). This means that the LIF network will sample from the prior when I_k^y = 0, ∀k. The figure makes use of two projections of network states z(t) to illustrate the sampled distribution:

1. Star plot: In order to illustrate the 12x12-dimensional network states z(t), a two-dimensional linear projection in a star plot has been chosen (blue dots; see also the code sketch after this list). The axes indicate the three basis vectors B, representing the pixel intensities ⟨z_k⟩_T of the digits (0, 3, 4),

$$ \langle z_k \rangle_T^{034} = \left( B^0, B^3, B^4 \right)^T , \quad (78) $$

with a total intensity normalization ||B^i|| = √(Σ_j |B_j^i|²) = 1. The network states z(t) acquired from the simulation are projected onto this basis:

$$ z^{034}(t) = \left( B^0 \cdot z(t), \; B^3 \cdot z(t), \; B^4 \cdot z(t) \right)^T . \quad (79) $$

This three-dimensional vector is then projected onto a two-dimensional plane in coordinates

$$ z^{\text{proj}}(t) = \begin{pmatrix} \sin(\varphi_B^0) & \sin(\varphi_B^3) & \sin(\varphi_B^4) \\ \cos(\varphi_B^0) & \cos(\varphi_B^3) & \cos(\varphi_B^4) \end{pmatrix} z^{034}(t) , \quad (80) $$

with (φ_B^0, φ_B^3, φ_B^4) = (0, 2π/3, 4π/3) indicating the directions of the normalized basis vectors.

These linear projections z^proj(t) of network states z(t) are used to illustrate the similarity of states as distance in the two-dimensional plane. In Fig. 5C in the main manuscript, a network evolution time of 4000 ms is shown, with samples z(t) taken every 2 ms, i.e., in total 2000 projected states are displayed. We ensured in longer simulations that the total simulation runtime is sufficient to represent the distribution under the mixing of the Markov chain.

The significant clustering of the network states around the directions of the arrows indicates the proximity to the target states (0, 3, 4) for a majority of the time.

The transitions of the network states z(t) are depicted as a red trajectory in the star plot. This trajectory connects 100 projected network states within a time interval of 200 ms, demonstrating the time evolution of z^proj(t).

2. Snapshots: The square color maps on the time axis display the averaged pixel intensity of the network in a 25 ms time window:

$$ \bar z_k(t) = \frac{1}{25\,\text{ms}} \int_{t - 12.5\,\text{ms}}^{t + 12.5\,\text{ms}} z_k(t') \, dt' . \quad (81) $$

The equidistant snapshots were taken every 25 ms within the time frame of the abovementioned red trajectory of 200 ms.

In summary, the network spends most of the time in distinct modes (digits) and only little time in "blurred" states that could not be clearly assigned to one digit. Furthermore, it spends an approximately equal amount of time in each mode, indicating that the prior p(z) does not favor one of the three digits.

Figure 5D, main manuscript: Sampling from the posterior.

Incomplete and ambiguous input y was provided to the network by setting four of the inputs different from zero: y_k ≠ 0 for k ∈ I, with I denoting an index set and |I| = 4. These inputs were chosen at the center of the image, where digits 3 and 4 both have black pixels, while digit 0 is white. More precisely, I = {77, 78, 79, 80} when pixels are indexed row-wise starting from the top-left corner. In the LIF implementation, positive values y_k correspond to positive currents I_k^y, as specified in Eq. (75). We set I_k^y = 0.831 nA for the four non-zero inputs, inducing an effective bias b_k + y_k by injecting a total current I_k^b + I_k^y.
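For completeness, the star-plot projection of Eqs. (78)-(80) in code form; the digit templates here are random placeholders standing in for the trained pixel intensities ⟨z_k⟩_T, and the sampled states are mock data:

```python
import numpy as np

rng = np.random.default_rng(4)

K = 144
B = rng.random((3, K))                               # placeholder digit templates
B /= np.linalg.norm(B, axis=1, keepdims=True)        # ||B_i|| = 1, Eq. (78)

phi = np.array([0.0, 2.0 * np.pi / 3.0, 4.0 * np.pi / 3.0])
P = np.vstack([np.sin(phi), np.cos(phi)])            # 2x3 matrix of Eq. (80)

z = rng.integers(0, 2, (2000, K)).astype(float)      # mock samples z(t), 2 ms apart
z_034 = z @ B.T                                      # basis projection, Eq. (79)
z_proj = z_034 @ P.T                                 # plane coordinates, Eq. (80)
print(z_proj.shape)                                  # (2000, 2), ready to scatter
```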
Figure 5D, main manuscript: Sampling from the posterior. Incomplete and ambiguous input y was provided to the network by setting four of the inputs different from zero: y_k ≠ 0 for k ∈ I, with I denoting an index set of size |I| = 4. These inputs were chosen at the center of the image, where digits 3 and 4 both have black pixels, while digit 0 is white. More precisely, I = {77, 78, 79, 80} when pixels are indexed row-wise starting from the top-left corner. In the LIF implementation, positive values y_k correspond to positive currents I^y_k, as specified in Eq. (75). We set I^y_k = 0.831 nA for the four non-zero inputs, inducing an effective bias b_k + y_k by injecting a total current I^b_k + I^y_k.

For the plot, the same projections of network states were used as in Fig. 5C in the main manuscript. Under the incomplete input, the network spends only little time in the 0-mode or in "blurred" states, while the equilibrium distribution exhibits two distinct modes in the 3- and 4-directions. Thus, the posterior reflects both the almost certain conclusion that "the input is not a zero" and the remaining uncertainty that "the input could either be a three or a four". In particular, the network response is well-suited for further processing (e.g., by other cortical populations or in a technical application). For instance, the network states could be integrated by a linear classifier to recognize the digit class.

Appendix XI: Prediction of the activation function for various parameter sets

We have used Eq. (46) for predicting the activation functions of LIF neurons in the HCS regime. In this regime, the refractory time τ_ref and the synaptic time constant τ_syn become the dominant time constants:

    \tau_{\mathrm{eff}} \ll \tau_{\mathrm{ref}} \approx \tau_{\mathrm{syn}} .    (82)

So far, we have compared our prediction to simulation data, as well as to two other predictions by [16] and [17], in Fig. 2B of the main manuscript. Here, we depart from these assumptions and show in Fig. 6 that our prediction holds for several different parameter sets. In particular, the predictions from [16] and [17] are only valid when either the synaptic time constants (Fig. 6C) or the refractory times (Fig. 6E) become shorter.

Appendix XII: Validity of the LIF sampling framework for various parameter sets

For our various network simulations (Figs. 3-5 in the main manuscript), we have used a set of biologically plausible parameters from [36] (see also "Appendix VII: Simulation parameters"). In particular, we have used high input noise rates and weights in order to achieve the HCS, as well as a small distance between firing threshold and reset in order to reduce τ^b_k, the period during which a neuron falsely encodes the state z = 0. Furthermore, we have assumed τ_ref = τ_syn. Fig. 7 shows that these are by no means strict constraints.
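The shape of the activation function discussed in Appendix XI can also be probed directly in simulation. The sketch below is a bare-bones forward-Euler integration of a single conductance-based LIF neuron under Poisson bombardment; all parameter values are illustrative assumptions (not the values of Tab. I), and the neuron is counted as encoding z = 1 while refractory, as in the sampling framework. It is a minimal sketch, not the simulation setup used for Fig. 6.

```python
# Numerical estimate of an LIF activation function under Poisson bombardment
# (cf. Appendix XI). Forward-Euler sketch with illustrative, assumed
# parameter values; units: ms, mV, nA, nF, uS.
import numpy as np

rng = np.random.default_rng(0)
dt, T = 0.01, 2e4            # integration step and duration per point (ms)
tau_syn, tau_ref = 10.0, 10.0
C_m, g_l = 0.2, 0.01         # capacitance (nF) and leak conductance (uS)
E_l, E_e, E_i = -60.0, 0.0, -90.0
theta, rho = -50.0, -53.0    # threshold and reset (mV)
nu, w = 5.0, 0.002           # Poisson rate (1/ms) and synaptic weight (uS)

def p_on(I_b):
    """Fraction of time the neuron encodes z = 1 (refractory after a spike)."""
    V, g_e, g_i, ref, t_on = rho, 0.0, 0.0, 0.0, 0.0
    for _ in range(int(T / dt)):
        # exponentially decaying conductances driven by Poisson spike counts
        g_e += -g_e * dt / tau_syn + w * rng.poisson(nu * dt)
        g_i += -g_i * dt / tau_syn + w * rng.poisson(nu * dt)
        if ref > 0.0:
            ref -= dt
            t_on += dt
        else:
            I = g_l * (E_l - V) + g_e * (E_e - V) + g_i * (E_i - V) + I_b
            V += dt * I / C_m
            if V >= theta:
                V, ref = rho, tau_ref
    return t_on / T

# Sweeping a constant bias current traces out the activation function;
# in the HCS the resulting curve is expected to be close to logistic.
for I_b in np.linspace(-2.0, 0.0, 9):
    print(f"I_b = {I_b:+.2f} nA  ->  p(z=1) = {p_on(I_b):.3f}")
```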
FIG. 6. Comparison of our prediction of the activation function (Eq. 46) to simulation data, as well as to the predictions given by [16] and [17], for several parameter sets. (A) Standard parameter set as given in Tab. I. This panel shows the same curves as Fig. 2B in the main manuscript. (B) Same as A, but with quadrupled membrane capacitance C_m and one quarter of the leak conductance g_l, input rates ν_exc,inh and weights w_exc,inh. This parameter set is identical to the one used for the top left corner of Fig. 7 and has the effect of slowing the membrane, i.e., increasing τ_eff by a factor of 16. (C) Same as A, but with a decreased synaptic time constant τ_syn = 3 ms. The prediction from [16] is improved, since the correlations in the pre- and post-refractory effective membrane potential are smaller in this scenario. (D) Same as A, but with an increased synaptic time constant τ_syn = 30 ms. The prediction from [16] deteriorates due to the longer-range membrane potential autocorrelation. Conversely, the prediction from [17] improves, since the refractory time becomes less important. (E) Same as A, but with a very short refractory time τ_ref = 1 ms. Here, we enter the parameter range where [17] provides good predictions. (F) Same as A, but with the input rates ν_exc,inh and weights w_exc,inh decreased by a factor of 10, thereby slowing the membrane considerably (imperfect HCS). Additionally, we have chosen a large reset-to-threshold distance of ϑ − ϱ = 10 mV. In this scenario, the τ^b_k-term in Eq. (46) becomes dominant and the activation function departs from the logistic shape that it has in the HCS.

FIG. 7. Sampling with LIF neurons over a broad range of relevant model parameters. All plots depict the Kullback-Leibler divergence between the sampled distribution (with a certain set of parameters) and the target distribution, which is identical to the one used in Fig. 3B in the main manuscript. (A) Sweep over neuron size and leakage, as well as background input parameters. The axes represent multiplicative scaling values for four parameters: background (Poisson) synaptic weights w_exc,inh and firing rates ν_exc,inh on the abscissa, neuron capacitance C_m and leak conductance g_l on the ordinate. The parameter values used throughout the main manuscript (see also "Appendix VII: Simulation parameters") therefore have the coordinates (1, 1). The network simulation runtimes were chosen as T_sim = 10^6 ms. As expected, the large-neuron/weak-noise scenario (top left square) does not permit accurate sampling, as the activation function is no longer logistic (see also Fig. 6F). In general, the plot shows that good sampling quality can be achieved for any neuron capacitance and leak as long as the background noise is strong enough (HCS). (B) Sweep over the ratio of the synaptic and refractory time constants. The best sampling performance is, indeed, achieved for τ_syn ≈ τ_ref, but the network still produces good approximations of the target distribution when the two time constants are not precisely identical. All D_KL data points result from 20 simulations with runtimes of T_sim = 10^5 ms each. The error bars represent the standard error of the mean. (C) Effect of an increased distance from reset to threshold potential. An increase in ϑ − ϱ causes a gradual decay of the sampling quality, since the membrane requires additional time to reach a suprathreshold u_eff when the refractory period is over. This can, in principle, be accommodated by defining a larger time window τ_on during which the neuron is considered to encode the state z = 1. Nevertheless, in an HCS, the effective time constant can be low enough to render the threshold-to-reset distance irrelevant.
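The D_KL values reported in these panels follow Eq. (76), with p_N estimated as a normalized histogram over recorded network states. The following is a minimal sketch of that estimate, assuming the Boltzmann form p_B(z) ∝ exp(z^T W z / 2 + b^T z) with symmetric W and zero diagonal; full state enumeration restricts it to small networks such as those of Fig. 3, and the `samples` array is a placeholder for recorded LIF states.

```python
# Sketch: estimating D_KL(p_N || p_B) (Eq. 76) from recorded binary states.
# Assumes p_B(z) ~ exp(z^T W z / 2 + b^T z), symmetric W with zero diagonal;
# feasible only for small K, since all 2^K states are enumerated.
import itertools
import numpy as np

def target_distribution(W, b):
    """Exact Boltzmann distribution p_B over all 2^K binary states."""
    K = len(b)
    states = np.array(list(itertools.product([0, 1], repeat=K)), dtype=float)
    energies = 0.5 * np.einsum('si,ij,sj->s', states, W, states) + states @ b
    p = np.exp(energies)
    return states, p / p.sum()

def dkl_network_vs_target(samples, W, b):
    """D_KL(p_N || p_B) = <log(p_N / p_B)>_{p_N}, over visited states only."""
    K = samples.shape[1]
    _, p_B = target_distribution(W, b)
    # map each binary state to its index in the itertools enumeration
    weights = 2 ** np.arange(K)[::-1]
    idx = (samples @ weights).astype(int)
    p_N = np.bincount(idx, minlength=2 ** K) / len(samples)
    visited = p_N > 0
    return float(np.sum(p_N[visited] * np.log(p_N[visited] / p_B[visited])))

# Example with a random 5-neuron Boltzmann machine and placeholder "recorded"
# states; in practice, `samples` would be read off the LIF network activity.
rng = np.random.default_rng(42)
W = rng.normal(0, 0.5, (5, 5)); W = (W + W.T) / 2; np.fill_diagonal(W, 0)
b = rng.normal(0, 0.5, 5)
samples = rng.integers(0, 2, (10000, 5))
print(dkl_network_vs_target(samples, W, b))
```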
∗ The first two authors have contributed equally to this work. We thank W. Maass for his essential support, as well as S. Habenschuss, M. Brixner and the three anonymous reviewers for their helpful comments. This research was supported by EU grants #269921 (BrainScaleS), #237955 (FACETS-ITN), #604102 (Human Brain Project), the Austrian Science Fund FWF #I753-N23 (PNEUMA) and the Manfred Stärk Foundation.

[1] K. Körding and D. Wolpert, Nature 427, 244 (2004).
[2] J. Fiser, P. Berkes, G. Orbán, and M. Lengyel, Trends in Cognitive Sciences 14, 119 (2010).
[3] K. Friston, J. Mattout, and J. Kilner, Biological Cybernetics 104, 137 (2011).
[4] T. Yang and M. N. Shadlen, Nature 447, 1075 (2007).
[5] P. Berkes, G. Orbán, M. Lengyel, and J. Fiser, Science 331, 83 (2011).
[6] S. Deneve, Neural Computation 20, 91 (2008).
[7] L. Buesing, J. Bill, B. Nessler, and W. Maass, PLoS Computational Biology 7, e1002211 (2011).
[8] R. P. Rao (2004) pp. 1113–1120.
[9] P. O. Hoyer and A. Hyvärinen, Advances in Neural Information Processing Systems, 293 (2003).
[10] Z. F. Mainen and T. J. Sejnowski, Science 268, 1503 (1995).
[11] M. A. Petrovici, J. Bill, I. Bytschok, J. Schemmel, and K. Meier, arXiv preprint arXiv:1311.3211 (2013).
[12] A. Destexhe, J. Comp. Neurosci. 27, 493 (2009).
[13] A. Destexhe, M. Rudolph, and D. Pare, Nature Reviews Neuroscience 4, 739 (2003).
[14] M. J. Richardson and W. Gerstner, Neural Computation 17, 923 (2005).
[15] L. M. Ricciardi and L. Sacerdote, Biol. Cybern. 35, 1 (1979).
[16] N. Brunel and S. Sergi, Journal of Theoretical Biology 195, 87 (1998).
[17] R. Moreno-Bote and N. Parga, Physical Review Letters 92, 028102 (2004).
[18] L. M. Ricciardi and S. Sato, Journal of Applied Probability 25, 43 (1988).
[19] D. Pecevski, L. Buesing, and W. Maass, PLoS Computational Biology 7, e1002294 (2011).
[20] R. Salakhutdinov and G. E. Hinton (2009) pp. 448–455.
[21] A.-R. Mohamed, G. E. Dahl, and G. Hinton, IEEE Trans. on Audio, Speech, and Lang. Proc. 20, 14 (2012).
[22] M. Tsodyks and H. Markram, Proceedings of the National Academy of Sciences of the United States of America 94, 719 (1997).
[23] G. Hinton, P. Dayan, B. Frey, and R. Neal, Science 268, 1158 (1995).
[24] D. Sulzer and S. Rayport, Amino Acids 19, 45 (2000).
[25] W. Gerstner and J. L. van Hemmen, Network: Computation in Neural Systems 3, 139 (1992).
[26] R. Brette and W. Gerstner, J. Neurophysiol. 94, 3637 (2005).
[27] S. Mitra, S. Fusi, and G. Indiveri, IEEE Transactions on Biomedical Circuits and Systems 3, 32 (2009).
[28] M. A. Petrovici, B. Vogginger, P. Müller, O. Breitwieser, et al., PLOS ONE 9, e108590 (2014).
[29] D. Probst, M. A. Petrovici, I. Bytschok, et al., Frontiers in Computational Neuroscience 9, 13 (2015).
[30] E. Neftci, S. Das, B. Pedroni, et al., Frontiers in Neuromorphic Engineering (2014).
[31] P. Lánský, Physical Review E 55, 2040 (1997).
[32] M. U. Thomas, Journal of Applied Probability, 600 (1975).
[33] A. N. Burkitt, Biological Cybernetics 95, 97 (2006).
[34] M. L. Hines and N. T. Carnevale, The NEURON Book (Cambridge University Press, Cambridge, UK, 2006).
[35] A. P. Davison, D. Brüderle, J. Eppler, J. Kremkow, E. Muller, D. Pecevski, L. Perrinet, and P. Yger, Front. Neuroinform. 2 (2008).
[36] R. Naud, N. Marcille, C. Clopath, and W. Gerstner, Biological Cybernetics 99, 335 (2008).
[37] Y. LeCun and C. Cortes, (1998).
