Programmable Spectrometry -- Per-pixel Classification of Materials using Learned Spectral Filters

Pr ogrammable Spectr ometry — P er -pixel Classiﬁcation of Materials using Learned Spectral Filters V ishw anath Saragadam, and Aswin C. Sankaranarayanan Department of ECE, Carnegie Mellon Uni versity , USA vishwanathsrv@cmu.edu Abstract Many materials have distinct spectral pr oﬁles. This fa- cilitates estimation of the material composition of a scene at eac h pixel by ﬁrst acquiring its hyperspectr al image, and subsequently ﬁltering it using a bank of spectral pr oﬁles. This pr ocess is inherently wasteful since only a set of lin- ear pr ojections of the acquired measur ements contribute to the classiﬁcation task. W e pr opose a novel pr ogrammable camera that is capable of pr oducing images of a scene with an arbitrary spectral ﬁlter . W e use this camera to optically implement the spectr al ﬁltering of the scene’ s hyper spectral image with the bank of spectral pr oﬁles needed to perform per-pixel material classiﬁcation. This pr ovides gains both in terms of acquisition speed — since only the rele vant mea- sur ements ar e acquir ed — and in signal-to-noise ratio — since we in variably avoid narr owband ﬁlters that ar e light inefﬁcient. Given tr aining data, we use a rang e of classi- cal and modern techniques including SVMs and neural net- works to identify the bank of spectral pr oﬁles that facilitate material classiﬁcation. W e verify the method in simulations on standar d datasets as well as r eal data using a lab pr oto- type of the camera. 1. Introduction Material composition of a scene can often be identiﬁed by analyzing variations of light intensity as a function of spectrum or wa velengths. Since materials tend to hav e unique spectral proﬁles, spectrum-based material classiﬁca- tion has found widespread use in numerous scientiﬁc disci- plines including molecular identiﬁcation using Raman spec- troscopy [ 5 ], tagging of ke y cellular components in ﬂuores- cence microscopy [ 17 ], land coverage and weather moni- toring [ 4 , 11 ], and even the study of chemical composition of stars and astronomical objects using line spectroscopy . It would not be a stretch to suggest that spectroscopy or its imaging variant, hyperspectral imaging (HSI), is an impor- tant scientiﬁc tool for material identiﬁcation. F r am e 1 F r am e 40 F ra m e 75 L ab e l R GB O p ti c a l S V M Figure 1: V ideo-rate material classiﬁer . W e propose an optical setup that is capable of classifying material on a per-pixel basis. This is achie ved by building a programmable spectral ﬁlter that can image at high spatial resolution. The images here show a video sequence of a identifying real plants from plastic plants. W e cap- tured the data at 4 fps, and performed only a per-pix el thresholding to get the video result. While hyperspectral imaging has also found application in computer vision tasks [ 14 , 25 , 31 ], its widespread adop- tion has been hindered due to inherent challenges in acqui- sition them. Capturing a HSI requires sampling of a very high dimensional signal; for e xample, me ga-pixel images at hundreds of spectral bands, a process that is daunting to do at video rate. This problem is further aggrav ated by the fact that hyperspectral measurements have to combat lo w sig- nal to noise ratios, as a ﬁxed amount of light is divided in to several spectral bands — leading to long exposure times that can ev en span several minutes per HSI. This paper proposes a nov el approach for enabling spectrometry-based per-pix el material classiﬁcation by 1 ov ercoming the limitations posed by HSI acquisition. T o understand our proposed approach, we ﬁrst need to delve deeper into the process of classiﬁcation itself. Classiﬁca- tion techniques in volv e comparing the spectral proﬁle at each pixel with known or learned spectra by taking a lin- ear projection. Intuitiv ely , giv en K material classes, we would compute O ( K ) such linear projections. F or exam- ple, a support vector machine (SVM) classiﬁes by ﬁnding distance of features from the separating hyperplane; in the context of spectral classiﬁcation, this translates to spectrally ﬁltering the scene with the hyperplane coefﬁcients. Hence, spectral classiﬁcation can be made practical if we can cap- ture the linear projections directly without ha ving to acquire the complete HSI. Such an operation translates to optically ﬁltering the scene’ s HSI using kno wn spectral ﬁlters, which can be achie ved if the camera’ s spectral response can be ar- bitrarily programmed. T o enable per-pix el material classiﬁcation, we propose a new imaging architecture with a programmable spectral response that can changed on-the-ﬂy at video rate. Gi ven a training dataset of spectral proﬁles, we use of f-the-shelf classiﬁcation techniques like SVMs and deep neural net- works to identify linear projections that facilitate material classiﬁcation. For a novel scene, the camera captures multi- ple images, each with a different spectral response; the cap- tured measurements are used with the classiﬁer to perform per-pix el material classiﬁcation. The proposed pipeline has numerous beneﬁts. Optical computing of the linear projections allo ws us to circum- vent the measurement of sampling the full HSI. This has the dual beneﬁt of reducing the acquisition time (from min- utes to hundreds of milliseconds) as well as increasing light efﬁcienc y of each captured image since the linear projec- tions often correspond to broadband spectral proﬁles. For binary classiﬁcation problem, our lab prototype provides a classiﬁcation result ev ery second frame thereby providing material labels at 4 frames per second. W e also show re- sults on multi-class labeling problems using a classiﬁer that can differentiate between ﬁ ve distinct material types. 2. Prior W ork W e discuss prior w ork in the areas of material classiﬁca- tion using HSIs as well as optical computing and design of programmable spectral ﬁlters. Hyperspectral classiﬁcation. Consider the HSI of a scene, H ( x, y, λ ) , where each pixel ( x, y ) is assumed to be- long to one of K material classes. Speciﬁcally , the spectra at each pixel can be written as, H ( x, y , λ ) = α ( x, y ) S L ( x,y ) ( λ ) , (1) where L ( x, y ) is label of the material contrib uting to spec- trum at ( x, y ) , and α ( x, y ) is scaling parameter . Note that the model above assumes all spatial pix els are pure, i.e., e v- ery pixel gets contribution from only one material. W e use this model for the sak e of e xposition and later discuss about how to relax it later to handle mix ed pixel. The goal of classiﬁcation is to estimate the label at each pixel, L ( x, y ) , which forms a label map. There are broadly two approaches to spectral classiﬁcation — generativ e and discriminativ e. Generativ e techniques rely on decomposing the HSI as a linear combination of basic materials that are called end-members [ 6 ]. Speciﬁcally , the HSI of the scene is decomposed as, H ( x, y , λ ) = K X k =1 s k ( λ ) a k ( x, y ) , (2) where s k ( λ ) is the spectra of k th material, and a ( x, y ) is the relativ e contribution of material k at ( x, y ) . The abundances at each pixel along with the end-member spectra provide a feature vector that can be used to spatially cluster the mate- rials and subsequently identify them. Discriminativ e techniques rely on directly learning dis- cerning features from the HSI without the intermittent stage of low-dimensional decomposition. Here, we identify a set of spectral ﬁlters, { ( d k ( λ ) , β k ) } M k =1 that generate per-pixel feature vector via spectral-domain ﬁltering: F k ( x, y ) = Z λ H ( x, y , λ ) d k ( λ ) dλ + β k . (3) Hence, each image F k ( x, y ) is a spectrally-ﬁltered version of the HSI with an added of fset. In case of SVMs, the learned spectral ﬁlters form separating hyperplanes; this has been a de facto way of HSI classiﬁcation [ 7 , 22 ]. More sophisticated learning techniques based on neural net- works use spectral features [ 13 ] or spatio-spectral features [ 30 , 18 , 10 , 15 , 3 , 16 , 21 , 12 ] for classiﬁcation. Inv ariably , the number of spectral features used, i.e, the dimensional- ity of the projection, tends to be smaller than the number of spectral channels in the HSI. Hence, we seek to measure the features directly , by computing ( 3 ) optically . As is to expected, such a paradigm of optical classiﬁcation requires the design of cameras that can be programmed with arbi- trary spectral ﬁlters. Optical computing. Instead of relying on both spatial and spectral information, we consider a simpler approach which relies only on the spectral proﬁles for classiﬁcation. Such a strategy is less accurate than spatial and spectral versions [ 30 , 18 , 10 , 15 , 3 , 16 , 21 , 12 ], b ut signiﬁcantly reduces the complexity of the imaging system. This approach is similar , in spirit, to using BRDFs to perform per -pixel classiﬁcation by v arying the incident illumination [ 19 , 9 ], or using ﬁrst layer of a neural network to capture light ﬁelds [ 2 ]. Such a setup offers tw o-fold advantage: 1. F ewer measurements. Since the number of material classes is far fe wer than number of spectral bands, we need to measure far fe wer measurements. For example, we sho w in our experiments that 3 − 5 images suf ﬁce for a 5-class classiﬁcation task. 2. Incr eased SNR. The discriminating ﬁlters tend to be spectrally broadband, and hence each image is measured at higher light le vels than any individual narrow spectral band. Hence, the images can be captured at higher SNR or at faster acquisition rates. Optical computing has found use in various computer vision tasks such as capturing light transport matrices [ 24 ], low- rank approximation of hyperspectral images [ 29 ], and spec- tral classiﬁcation using programmable light sources [ 8 , 26 ]. W e adopt the paradigm of optical computing to make dis- criminativ e ﬁlter measurements by building a camera whose spectral response can be arbitrarily programmed. Dynamic spectral ﬁlters. Spectral ﬁltering can be achiev ed by modiﬁed the response of the camera; a canoni- cal and static example being the Bayer pattern or more inter- estingly , the case of ﬂuorescence ﬁlters in microscopy . It is howe ver more useful to have a camera whose response can be altered arbitrarily in a fast manner . Numerous techniques to achiev e spectral ﬁltering have been proposed in the past. Agile spectral imager [ 23 ] rely on the coding the so-called “rainbow plane” to achie ve arbitrary spectral ﬁltering. This was further developed by [ 20 ] where they placed a digital micromirror de vice (DMD) on the rainbow plane to achieve dynamic spectral ﬁltering. Howe ver , such architectures come with a debilitating problem — usage of simple pupil codes such as open aper- ture or a slit directly tradeoff spatial resolution for spectral resolution. This was ﬁrst identiﬁed in [ 29 ] in the context of hyperspectral imaging. They showed that a slit, a common choice for spectrometry , leads to large spatial blur . Simi- larly an open aperture, a common choice for high-resolution imaging, leads to large spectral blur . Hence, such apertures are not capable of spectral classiﬁcation with high accurac y . W e instead rely on the optical setup in [ 29 ] to overcome the spatial-spectral tradeoff. The key idea is to use a coded aperture that introduces an in vertible blur in both spatial and spectral domains. An important difference is that the setup in [ 29 ] is designed for HSI image acquisition; this paper adapts the underlying ideas for performing material classi- ﬁcation in the scene. 3. Programmable Spectral Filter Our optical setup is a modiﬁcation of the optical setup proposed in [ 29 ]. W e brieﬂy explain the rele vant parts of the optical setup here. The interested reader is referred to [ 29 ] as well as appendix for a detailed deriv ation. l e n s w i t h c ode d a pe r t u r e d i f f r a c t i o n g r at in g r a i n bow p lan e s p at ial p lan e f f f f r e l a y l e ns r e l a y l e ns P1 P2 P3 (a) Setup schematic Dif fr action grating NIR bandpass filter L1 L2 Coded apertur e Relay + coded apertur e L3 PBS Spectral coding Objective l ens Unpolarized light P-polarized light S-polarized light PBS Polarizing Beam Splitt er LCoS display NIR camera (b) Practical realization Figure 2: Schematic for programmable spectral ﬁlter . The optical architecture in (a) consists of a lens assembly with coded aperture which introduces spatial and spectral blurs. By placing an SLM in P2, the HSI of the scene can be spectrally ﬁltered and sensed by a camera sensor on P3. (b) sho ws a compact realization of the optical setup. 4f system f or spectral pr ogramming. W e borrow the op- tical schematic for spectral programming from [ 29 ], sho wn in Fig. 2 (a). Given the HSI, H ( x, y , λ ) , that is focused on the grating at P1, we seek to derive the intensity on planes P2 and P3. The intensity on rainbo w plane P2, I 4 ( x, y ) = a 2 ( − x, − y ) ∗  S  x f ν 0  e c  x f ν 0  , (4) where S ( λ ) = R ( x,y ) H ( x, y , λ ) is spectrum of the scene, e c ( λ ) is response of the optical system, and ν 0 is the density of grov es in mm − 1 . The intensity on image plane P3, I 5 ( x, y ) = Z λ H ( x, y , λ ) ∗     1 λ 2 f 2 A  − x λf , − y λf      2 ! dλ, (5) where A ( u, v ) is the 2D Fourier transform of a ( x, y ) . The key observation from ( 4 ), ( 5 ) is that a coded aperture placed on plane P2 causes a spectral blur giv en by a ( x, y ) and a spatial blur given by    A  − x λf , − y λf     2 . As shown in Fig. 3 , a slit causes a severe spatial blur , whereas an open aper- ture causes large spectral blur . The solution is to introduce an in vertible blur in both domains, which can be achiev ed using a coded aperture, shown in the last column. W e use sp e c tr um 3 0 μm s lit op e n I n v e r t i bl e c o de sp a c e Figure 3: Spatio-spectral r esolution tradeoff. A slit is capable of high spectral resolution whereas an open aperture is capable of high spatial resolution b ut both are inappropriate for high spatio- spectral HSI imaging. In contrast, a coded aperture introduces an inv ertible spatial and spectral blurs which can then be decon- volv ed. Figure reproduced with permission from [ 29 ]. the same coded aperture that was used in [ 29 ], as it is de- signed to promote in vertibility in both domains. Optical setup. Our optical setups is in principle similar to Fig. 2 (a). W e place a spatial light modulator on the rainbow plane (P2) and sensor on spatial plane (P3) to achie ve spec- tral ﬁltering. The optimized binary code [ 29 ] is placed in the lens assembly Figure 2 (b) shows a schematic of a prac- tical implementation of the same optical setup. W e use a Liquid Crystal on Silicon (LCoS) display as a spatial light modulator for spectral ﬁltering. Effect of coded aperture. Gi ven the HSI of the scene, H ( x, y , λ ) , the coded aperture introduces spatial and spec- tral blurs in the following w ay , b H ( x, y , λ ) =  A  x λf , y λf  ∗ H ( x, y , λ )  ∗ a ( λν 0 f , y ) , (6) i.e., all operations are no w performed on a modiﬁed version of the HSI of the scene. Giv en a spectral proﬁle s k ( λ ) , the proposed setup directly computes ﬁltered image,: b f k ( x, y ) = Z λ b H ( x, y , λ ) s k ( λ ) c ( λ ) dλ, (7) by loading s k ( λ ) on the spatial light modulator . W ith the optical setup in place, we will next see how to use the pro- grammable spectral ﬁlter to perform optical classiﬁcation. 4. Learning Discriminant Filters W ith camera that is capable of capturing images with ar - bitrary spectral proﬁles, we pursue tw o questions; one, ho w many ﬁlters are required for classifying K classes, and two, what spectral ﬁlters maximize classiﬁcation accuracy . The L e a r ning a c la s s ific a t io n ne t w o r k 2 56 ×   × 25 6 2 56 × 12 8 1 28 × 6 4 S p e c t r a lly p r o g r ammab l e came r a L a b e le d t r a ining d a t a L ea r n ed filt e r s P r e d ic t io n 6 4 × 3 2  × 2 56 25 6 × 1 28 12 8 × 6 4 64 × 32 C o m p ut e fir s t la y e r o p t ic a lly in t e s t ing p ha s e  pre d Figure 4: Proposed optical classiﬁer . The proposed optical clas- siﬁer broadly consists of two stages. In the ﬁrst stage, we learn the weights of a neural network with spectrum as input and class label as output. The training process outputs the set of discerning ﬁlters, marked ”learned ﬁlters” in the image. In testing stage, we ﬁlter the HSI of the scene with the learned ﬁlters, thereby replac- ing the ﬁrst layer of the classiﬁer with an optical implementation. This results in a high accurac y , per -pixel classiﬁer while requiring far fe wer measurements than the size of the HSI. questions above are closely tied to the type of classiﬁer un- der consideration. W e detail the two classiﬁer architectures we explore in this paper which help answer the questions abov e. Note that any classiﬁer which relies on the linear projection can be used. For the sake of exposition, we only ev aluate SVM and neural networks. 4.1. Support V ector Machine SVMs provide a binary , linear classiﬁer by learning a separating hyperplane on the training dataset. Giv en a set of data points { x k , y k } N k =1 , where y k ∈ { 0 , 1 } is the la- bel of x k , SVM seeks to solve the following optimization problem, min w ,c 1 N k X k =1 max(0 , 1 − y k ( w > x k + c )) + λ k w k 2 , (8) where λ is a tuning parameter . The output of solving the optimization problem is the vector w and intercept c . In the context of optical classiﬁcation, w is the ﬁlter that maxi- mizes accuracy for binary decision. F or K -class decision, we choose a one-vs-all classiﬁcation strategy , which uses K hyperplanes, and hence K spectral ﬁlters. 4.2. Deep Neural Networks Deep neural networks (DNNs) provide a richer alterna- tiv e to SVMs. W e model the ﬁrst linear unit of the DNN to be the programmable spectral ﬁlter and train a model whose input is the spectral proﬁle at a pixel and whose output is the material class label as a one-hot vector . While there are many possible architectures, we choose a simple, ﬁv e-layer Method Classifier Coding strategy #Measurements Accura cy Santara et al. DNN Non - linear, spatial and spectral 220 96.7% ( reported ) Hu et al. DNN Convol utional , spectrum - only 220 90.16% ( reported ) Lee et al. DNN Convol utional , spatial and spectral 220 93.6% ( reported ) Melgani et al. SVM Linear , spectrum - only 16 84% ( computed ) This paper DNN Linea r , spectrum - only 16 90% ( Computed ) Figure 5: Simulations on the Indian Pines dataset. W e compare state-of-the-art classiﬁers against the classiﬁers proposed in this paper . By r eported we report the accuracy ﬁgures listed in the respectiv e papers, while computed results were generated by us. A key feature of our optical setup is that it can only compute linear projections of spectra. While this leads to reduction in accuracy , the number of captured images are far fe wer. neural network as an example with all layers being fully connected. Figure 4 gives a brief overvie w of the proposed training and testing methodology . The weights of ﬁrst fully connected layer , A 1 are the learned discriminating ﬁlters, and hence the ﬁrst layer can be ev aluated optically , thereby circumventing the need to measure the full spectrum at each pixel. The number of ﬁlters, Q depends on the number of materials and ho w easily they can be separated. In our ex- periments, we classiﬁed a total of 5 objects. W e then v aried the number of ﬁlters and computed mean classiﬁcation ac- curacy . Based on this, we picked the optimal number of ﬁlters. W e note that the idea of optically computing the ﬁrst layer has been explored before in the context of designing color ﬁlter arrays [ 1 ] and processing light ﬁelds [ 2 ]. 4.3. Simulations. W e compare SVM and the 5-layer DNN classiﬁer to some of the state-of-the-art techniques in spectral- classiﬁcation on the NASA Indian Pine dataset which con- sists of 220 spectral bands with 16 object classes. Figure 5 tabulates the accuracies with classiﬁers used in this pa- per in bold. W e observe that the accuracy is lower than state-of-the-art, which is expected as we only use spectral information, while the other techniques use both spatial and spectral information. Howe ver , relying on a spectrum-only classiﬁer lets us capture far fe wer images than the number of spectral bands. 5. Experiments W e demonstrate capabilities of our setup for video- rate binary classiﬁcation with binary SVM as well as matched ﬁltering, and multi-class classiﬁcation with multi- class SVM and DNNs. Gray s c al e c am e ra ( H a m a m a t s u F l a s h 4 .0 L T ) C o d e d ape rture O b j ec t i v e l en s D i f f rac ti o n grati ng LC o S S L M ( H ol oey e ) Figure 6: Lab prototype. The picture shows the lab proto- type we built with only the major components marked. W e used an objectiv e lens of 8 mm focal length, while all other lenses were 100 mm. (a) RGB (b) 650 nm (c) 750 nm (d) 850 nm 650 700 750 800 (nm) 0.2 0.4 0.6 0.8 Rel. intensity Point 1 Point 2 Point 3 Figure 7: Example HSI. Our prototype is designed to cap- ture images from 600 nm to 900 nm. (a) was captured using a cellphone while (b)-(d) are images captured by our setup. Bottom row shows spectral proﬁles at three marked points. Note how all the pigments disappear at λ = 850 nm in (d). Optical setup. Figure 6 shows a photograph of the lab prototype we built along with labels for relev ant compo- nents. A detailed optical layout along with the list of com- ponents is in appendix. Our SLM is a Holoeye LCoS SLM with a frame rate of 60 Hz that works as a secondary mon- itor . W e used an NIR-sensitive sCMOS camera (Hamatasu ORCA Flash 4.0 L T). In order to classify materials accu- rately , we designed our system to image from 600 nm to 900 nm, which is the near infrared (NIR) regime. Our setup is capable of coding spectrum at a resolution of 3 . 3 nm, gi v- ing us 100 spectral bands. Finally , the SLM acts as a dy- namic spectrally-selectiv e camera and hence can be directly used for measuring the complete HSI. T o do so, we dis- play permuted Hadamard patterns on the SLM to capture a 512 × 512 × 256 dimensional HSI. Figure 7 shows an ex- ample of captured HSI of an acrylic painting. Calibration. Our optical setup broadly requires calibra- tion of the code resulting in spectral blur , calibration of wa velengths and ﬁnally , spatial PSF . W e use narrow-band RGB Cardboard NIR Wood Plant Wood - textu red paper Plas tic p lant (a) V isible and NIR images. 650 700 750 800 850 (nm) 0.2 0.4 0.6 0.8 1 Rel. intensity Cardboard Fake plant Fake wood Plant Wood Figure 8: Material dataset. The ﬁgure sho ws false colored images of the 5 different materials we collected for classiﬁ- cation purpose. (b) shows av erage spectra of the materials as measured by our lab prototype. 500 600 700 800 900 (nm) 0 1 2 3 4 5 Relative intensity Filter 1 Filter 2 Filter 3 Filter 4 Filter 5 (a) SVM ﬁlters 500 600 700 800 900 (nm) 0 1 2 3 4 5 Relative intensity Filter 1 Filter 2 Filter 3 Filter 4 Filter 5 (b) DNN ﬁlters Figure 9: Learned ﬁlters. The output of a multi-class SVM is K separating hyperplanes, which results in K ﬁlters, shown in (a). Similarly , the DNN architecture consists of sev eral layers, of which the ﬁrst layer is linear . Hence the training process results in weights shown in (b) that can be used as spectral ﬁlters. lasers for calibrating both code and wav elengths, and use a 10 µ m pinhole for calibrating spatial PSF . Details are av ail- able in the appendix. Dataset. W e show classiﬁcation results with a total of ﬁ ve types of subjects: 1) black cardboard, 2) varnished wood, 3) wood-te xtured paper , 4) real plants, and 5) plastic plants. The choice of objects stems from similarity of these mate- rials (plants vs plastic plants) in visible wav elengths, while (a) RGB image (b) Full HSI scan + projection (256 meas.) (c) optical projection (2 meas.) 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 False positive 10 -3 0.996 0.997 0.998 0.999 1 True positive Full measurement Optical computing Figure 10: Advantage of optical computing. W e show an example of binary classiﬁcation between cardboard and wood (a) using per-pixel SVM. Optical computing achiev es higher accuracy with far fe wer measurements. 2 4 6 8 10 12 14 16 Number of filters 75 80 85 90 Mean accuracy Neural net. SVM Figure 11: Accuracy vs. number of ﬁlters. The plot shows accuracy as a function of number of ﬁlters. The accuracy increases initially and then saturates. W e hence use the knee point of the curve as the optimal number of ﬁlters. (a) RGB (b) SVM score (c) Label Figure 12: P er-pixel classiﬁcation. Due to per-pix el op- eration with high spatial resolution, our imager can clearly identify the micro-structures such as the cactus thorns by capturing only two images instead of the complete HSI. having distinctly dif ferent spectra in NIR domain. W e col- lect one HSI for each of the materials and manually label them, giving a total of 5 HSI for training. Figure 8 shows P l a n ts W o o d R G B NIR SVM i m a g e SVM l a b e l M F i m a g e M F l a b e l Figure 13: V arious binary classiﬁers. W e compare binary classiﬁcation using SVM and matched ﬁltering (MF). First ro w is a comparison of real wood (rhino) and fake wood (background, printed paper), while the second row is real and fake plants. Due to dynamic programming capability , we can classify with arbitrary ﬁlters and hence utilize any classiﬁer that relies on linear projection of spectrum at each pixel. visible and false-NIR images, as well as the av erage spec- trum for each material. W e note that none of the objects used for training the classiﬁers were reused in testing phase. T raining classiﬁers. W e trained two classiﬁers – multi- class SVM and DNN with v arying number of ﬁlters. F or SVM, we used Scikit-Learn [ 28 ] in a one-vs-all conﬁgura- tion which learned a total of 5 spectral ﬁlters. The learned spectral ﬁlters are shown in Fig. 9 (a) DNNs were trained with the network architecture sho wn in Fig. 4 with loss function set to cross entropy . The num- ber of spectral ﬁlters were v aried from 1 to 20 to compare performance. W e learned the network using the PyT orch framew ork [ 27 ] with learning rate set to 10 − 3 for a total of 50 epochs. W e then extracted weights of ﬁrst layer and used them as spectral ﬁlters. The learned ﬁlters are sho wn in Fig. 9 (b). Further details about the learning process are included in appendix. Figure 11 shows a plot of accuracy as a function of num- ber of ﬁlters, Q . Accuracy of the classiﬁer increases sharply initially and then saturates which implies that more spectral ﬁlters do increase accuracy but there is diminishing returns after a point. Based on this, we used 3 , 5 , 10 ﬁlters for com- parisons in our real experiments. Handling scale of features. A key requirement of any classiﬁer is that the scale of features be same during training and testing. A common practice is to set the norm of feature at ( x 0 , y 0 ) , k H ( x 0 , y 0 , λ ) k to unity , or the maximum value to unity . In our case, this requires having knowledge of the complete spectral proﬁle, which defeats the purpose of op- tical computing. instead, we normalize our measurements with sum of the spectrum, R λ H ( x 0 , y 0 , λ ) , which can be measured by displaying a spectral proﬁle with all ones. The measured featured vectors are then, I sum ( x 0 , y 0 ) = Z λ H ( x 0 , y 0 , λ ) dλ (9) e I k ( x 0 , y 0 ) = Z λ H ( x 0 , y 0 , λ ) s k ( λ ) dλ (10) I k ( x 0 , y 0 ) = e I k ( x 0 , y 0 ) I sum ( x 0 , y 0 ) (11) W e scale the spectra the same way ev en while training, which makes the scaling consistent. Hence any set of mea- surements with spectral proﬁles requires one extra image. Binary classiﬁcation. The simplest task possible with our optical setup is a binary classiﬁcation, where the label at each pixel belongs to one of the two possible classes. In such a situation, one may either use a linear SVM where the spectral ﬁlter is the learned supporting hyperplane, w , or use a matched ﬁlter , where the spectral ﬁlter is difference of spectra of the tw o classes, s 1 ( λ ) − s 2 ( λ ) . Figure 10 e val- uates the advantages of optical classiﬁcation. (b) visualizes the SVM score at each pix el obtained by scanning the com- plete HSI and then computing the projection to the SVM hyperplane, which requires a total of 256 measurements. In contrast, optical projection, shown in (c) requires only two images. Bottom row shows the Receiver operating Charac- teristic (RoC) of classiﬁcation for both cases. The SNR ad- vantage is e vident; the area under the curve for optical pro- jection ( 0 . 7194 ) is higher than full measurement and then projection ( 0 . 7912 ). Figure 12 shows classiﬁcation of a real cactus surrounded by sev eral plastic plants. The SVM score R GB scen e N I R scen e S VM D NN Q = 3 D NN Q = 5 D NN Q = 1 0 G r o und tr uth C ar d b o ar d Pl a nt W ood F a k e p l a nt F a ke w ood L eg en d Figure 14: Optical classiﬁcation. W e sho w two e xamples of classiﬁcation where the linear operations are directly computed in the optical domain. The ground truth labels were obtained by hand annotation. SVM required a total of 11 measurements for ﬁve ﬁlters, whereas DNN with 3, 5, 10 ﬁlters required 7, 11, 21 images respecti vely . The RGB images shown how the objects are not easily discernable in the visible domain, while they are accurately identiﬁed in the NIR domain along with optical classiﬁcation. Accuracy: 72.55% 89.2% 70456 3.9% 3099 0.4% 278 0.8% 597 5.8% 4551 9.4% 1393 56.4% 8329 12.3% 1820 3.6% 533 18.2% 2681 0.0% 0 8.6% 5971 63.4% 43822 0.3% 174 27.8% 19204 0.0% 0 3.6% 804 10.3% 2334 72.3% 16321 13.7% 3100 0.0% 0 10.0% 2419 34.8% 8399 0.7% 174 54.5% 13150 Cardboard Fake plant Fake wood Plant Wood Target Class Cardboard Fake plant Fake wood Plant Wood Output Class (a) SVM Accuracy: 77.33% 91.1% 69997 1.8% 1413 0.9% 654 1.2% 887 5.1% 3900 11.6% 1753 47.2% 7112 26.1% 3929 0.9% 132 14.2% 2140 0.1% 21 5.7% 2226 91.1% 35541 0.7% 276 2.5% 960 0.0% 0 0.3% 47 2.5% 351 97.2% 13753 0.0% 3 0.1% 78 15.2% 9824 25.1% 16178 4.3% 2751 55.3% 35683 Cardboard Fake plant Fake wood Plant Wood Target Class Cardboard Fake plant Fake wood Plant Wood Output Class (b) Neural net. Figure 15: Confusion matrices for classiﬁers. Neural net- works typically outperform SVM. Among object classes, “wood” and “fake wood” get confused the most, as their spectra are similar . In contrast, “cardboard” is most differ- ent from all other spectra and hence has high accuracy . in (b) as well as the labels sho w that our setup is capable of resolving very thin structures such as the cactus thorns. Fig- ure 13 shows classiﬁcation results for real vs. plastic plants and real vs. fake wood with binary SVM as well as matched ﬁltering. Note that the objects are not easily discernable in RGB domain, while they are easily isolated after spectral ﬁltering. Figure 1 sho ws a video rate classiﬁcation of a real plant and a fake plant. The video was captured at 4 frames per second with alternating spectral proﬁle and all ones pat- tern. Note how the real plant is tracked across all frames, while the fake plant is ignored. Multi-class classiﬁcation. W e test the SVM and DNN ﬁl- ters learned on training data to classify a scene made of vari- ous materials from the set of ﬁv e materials. Figure 14 shows classiﬁcation results for two scenes for various techniques. The ground truth annotation was obtained by manually an- notating the objects, and then this was used for measuring the accuracy of classiﬁcation. V isually , DNNs outperform SVM, as is visible from the accurate classiﬁcation of the wooden rhinoceros head. Figure 15 shows a confusion ma- trix for SVM and DNN with 5 ﬁlters. The accuracies are not very high as we depend on spectral features alone. Ac- curacy can be signiﬁcantly increased if spatial information is used along with spectral proﬁles. This is done by ﬁrst capturing the Q images and then using the spatial informa- tion to classify . 6. Discussions and Conclusion W e propose a per-pixel material classiﬁer that relies on a high resolution programmable spectral ﬁlter . W e achieve this by learning spectral ﬁlters that can achiev e high classiﬁ- cation accuracy and then measure images of the scene with the learned ﬁlters. Owing to a simple, per-pixel decoding strategy , we can achiev e classiﬁcation at video rates. W e showed several compelling real world examples with em- phasis on binary video-rate and multi-class classiﬁcation. Limitations. A ke y limitation of our setup is the assump- tion that the pixels come from a single material class. Some real world examples are made of a mixture of materials at each class, an example being land cover . In such a case, out- putting just a class label may not sufﬁce but relative prob- abilities of each class is desired. This can be achieved by modifying the classiﬁers to output a score for each material at each pixel instead of most probable class. 7. Acknowledgment The authors acknowledge support from the National Sci- ence Foundation under the CAREER aw ard CCF-1652569, the Expeditions a ward IIS-1730147, and the National Geospatial-Intelligence Agencys Academic Research Pro- gram (A ward No. HM0476-17-1-2000). A. Theoretical backgr ound A.1. Image f ormation model. W e speciﬁed a simpliﬁed version of image formation model where we said that the HSI of the scene can be rep- resented as H ( x, y , λ ) . W e discuss a more precise model here. Consider the scene’ s spectral reﬂectance function, H R ( x, y , λ ) , where we assume that each point in 3D space is well modeled by Lambertian reﬂectance. Let L ( λ ) be the spectral distrib ution of a spatially uniform light source. The HSI of the scene under this illumination is then giv en by , H o ( x, y , λ ) = H R ( x, y , λ ) L ( λ ) , (12) which was the signal model we used in the main paper . Then, given a camera with spectral response C ( λ ) , the mea- sured image is, I ( x, y ) = Z λ H o ( x, y , λ ) C ( λ ) dλ = Z λ H R ( x, y , λ ) L ( λ ) C ( λ ) dλ (13) From the abov e equation, we see that the camera measures spectral albedo of the scene’ s HSI and not the spectral re- ﬂectance of models. Ho we ver , this is not a problem, as long as the light’ s spectral distribution is kno wn a priori . B. Learning details W e provide details about our training process with em- phasis on choice of parameters and hyperparameters. W e captured a total of 1 , 000 , 000 spectral proﬁles ov er 5 ma- terial types. For each classiﬁer , we used 20% for training, 5% for v alidation and 80% for testing. W e found that the testing accuracy did not improve ev en if we used more than 20% data. Support V ector Machine. W e used the function LinearSVC from Scikit-Learn [ 28 ] for training a one-vs- all SVM. The only hyperparamter of tuning was penalty for the hyperplanes, C , which was tuned by performing a grid search over the log space from 10 − 5 to 1 . Hyperparamter search was done through a 3-fold cross-v alidation. Neural Netw orks. W e used PyT orch [ 27 ] for training our neural network (DNN) classiﬁers. The architecture used for learning is shown in 4 and the details of each layer is pro- vided in T able 1 . Q is the number of ﬁlters and was varied Layer Components 1 ( Filters ) Linear (256xQ), ReLU , Dropout (0.1) 2 Linear (Qx256), ReLU , Dropout (0.1) 3 Linear (256x128), ReLU , Dropout (0.1) 4 Linear (128x64), ReLU , Dropout (0.1) 5 Linear (64x32), ReLU , Dropout (0.1) 6 ( Output ) Linear (32x5) T able 1: Components of our DNN classiﬁer . All the layers are formed of fully connected layers with a ReLU and dropout added after each linear layer. Here, Q is the number of spectral ﬁlters and w as variable in our experiments to compare performance. The output was a single linear layer . During training process, we used cross-entropy as loss function. from 1 to 20 to ev aluate performance as a function of mea- surements. W e trained the network with an initial learning rate of 10 − 3 and trained for a total of 60 epochs. The ﬁlters were initialized with a principal component analysis (PCA) decomposition of training data. This lead to smoother ﬁlters and higher accuracy . For each Q , we picked the model with best accuracy on v alidation dataset. C. Hardwar e details Hardwar e prototype. Figure 16 shows a picture of our lab prototype with names of major components. The last lens in the setup was replaced by a 50 mm objectiv e lens focused at inﬁnity . This led to a better spatial resolution than an achromat. Calibration. As described in the main paper, our setup re- quired calibration of coded aperture, wa velengths and spa- tial PSF . W e detail the calibration procedure here. 1. Coded apertur e calibration : This is required to capture the code that blurs the spectrum. W e measure the coded aperture by illuminating a spectrally ﬂat object (such as spectralon) with a laser of known wa velength and scan- ning the complete HSI. W e then average all spatial pix- els to get the spectrum of the scene. Since a laser can be treated as a discrete delta, the measured spectrum will be the coded aperture. W e threshold the measured spectrum appropriately to get the binary coded aperture, as shown in Fig. 17 (a). 2. W avelength calibration : T o ﬁnd the correspondence be- tween band inde x (1 - 256) and the corresponding wa ve- lengths, we capture two scenes, each one comprised of a spectrally ﬂat object illuminated by a narrowband laser light source. The averaged spectrum of the HSI is a blurred version of the laser spectrum. By deconv olving with the previously estimated coded aperture, we get lo- cation of the laser in terms of band index. W e use this 1 2 3 4 5 6 1 8mm Objectiv e Lens Thorlabs M VL8M23 3 Polarizing B eam Splitter Moxtex Wi regrid Polar izing Beam splitter 4 300 groves/m m Diffraction Gr ating Dynasil G30 0TN31.7CC 5 NIR LCoS Disp lay Holoeye HE D6001 with N IR-enhancemen t 6 sCMOS camera Hamamatsu O RCA Flash 4.0 L T 100mm Achrom at Thorlabs A C254-100-B 2 Coded Ape rture Printed by Photoplot store with a feature size of 10μm Figure 16: Lab prototype. A picture of the lab prototype along with major components marked with details. W e skipped details about opto-mechanical components such as cage plates and posts to av oid clutter . The inset image shows the printed mask we used as coded aperture. 100 200 300 400 500 Band index 0 0.2 0.4 0.6 0.8 1 Measured spectrum Computed code (a) Code calibration 600 700 800 (nm) 0 0.2 0.4 0.6 0.8 1 635 nm (calib) 850 nm (calib) 780 nm (test) 830 nm (test) (b) W avelength calibration Figure 17: W avelength calibration. W e ﬁrst estimate the blur due to coded aperture by capturing a scene illuminated by a narrowband light source (635nm laser), giving us the code in (a). W e then calibrate the correspondence between band index and wa velengths by capturing two separate scenes illuminated by known laser light sources (635nm, 850nm). The results of the two calibration are show in (b), where we capture two more scenes with 780nm and 830nm laser . information along with laser wa velength to calibrate the correspondence. 3. Spatial PSF : T o ﬁnd the spatial blur kernel, we capture a single image of a 10 µ m pinhole. Since the PSF is well conditioned, deblurring the spatial images is well condi- tioned. 600 650 700 750 800 850 (nm) 0.2 0.4 0.6 0.8 (a) T arget proﬁle (b) W ithout correction (c) W ith correction Figure 18: Displaying desir ed spectral proﬁle. Gi ven a target proﬁle (a), we display a binary image on the SLM, as shown in (b), which ensures grayscale modulation despite wa velength-dependent gamma curve. Howe ver , since the SLM is 2 f aw ay from the camera sensor , there will be ef- fects of dif fraction. W e counter this by adding a small DC offset, as sho wn in (c). 4. Radiometric calibration of SLM : The LCoS SLM in our optical setup is based on twisted-nematic design, and hence has different gamma curves for different wav e- (a) Raw (b) Decon volved 0.2 0.4 0.6 0.8 1 Line pairs/pixel 0.2 0.4 0.6 MTF Raw image Deconvolved (c) MTF Figure 19: Spatial decon volution. Due to design of an in- vertible spatial blur , the optical setup is capable of high res- olution after decon volution. (a) shows a raw image, with enlarged PSF in inset, (b) shows result of wiener decon vo- lution, and (c) sho ws a comparison of modulation transfer function (MTF). There is a marked increase in resolution both quantitativ ely and qualitatively . lengths. Since the spectrum on the SLM is a blurred v er- sion of the true spectrum, we cannot perform a column- wise gamma correction. Instead, we use the SLM only as a binary modulator and achie ve grayscale modulation by varying height of each column as shown in Fig 18 (b). This way , the SLM has a linear gamma curv e for all wa velengths. Figures of merit. Our setup is capable of achie ving spec- tral resolution of up to 3 . 3 nm over the wavelength range of 600 − 900 nm, which is the designed resolution (see KRISM.pdf for further details). Due to inv ertible spatial blur , our setup is capable of high resolution after decon vo- lution. Figure 19 visualizes the captured image in (a) and decon volved image in (b) of a sector star target. (c) shows plot of Modulation T ransfer Function (MTF) as a function of line pairs per pixel. Image was deconv olved using sim- ple W iener deconv olution. The MTF30 after deconv olution was 0 . 45 linepairs/pixel. Handling diffraction due to SLM. Since the SLM is placed 2 f away from the image plane, any pattern displayed on SLM will lead to a diffraction blur . T o counter this effect, we alw ays display ones in the middle of the pattern to be displayed on the SLM. This reduces the effect of dif fraction while adding a simple offset to the data, which can be re- mov ed by capturing image with only the central part open. References [1] A. Chakrabarti. Learning sensor multiplexing design through back-propagation. In Advances in Neural Information Pr o- cessing Systems , pages 3081–3089, 2016. 5 [2] H. G. Chen, S. Jayasuriya, J. Y ang, J. Stephen, S. Si vara- makrishnan, A. V eeraraghav an, and A. Molnar . Asp vision: Optically computing the ﬁrst layer of con volutional neural networks using angle sensitiv e pixels. In IEEE Conf. Com- puter V ision and P attern Recognition , 2016. 2 , 5 [3] Y . Chen, H. Jiang, C. Li, X. Jia, and P . Ghamisi. Deep feature extraction and classiﬁcation of hyperspectral images based on con volutional neural networks. T rans. Geoscience and Remote Sensing , 54(10):6232–6251, 2016. 2 [4] E. Cloutis. Revie w article hyperspectral geological remote sensing: ev aluation of analytical techniques. International J . Remote Sensing , 17(12):2215–2242, 1996. 1 [5] N. Colthup. Intr oduction to infrar ed and Raman spec- tr oscopy . Else vier, 2012. 1 [6] N. Dobigeon, J.-Y . T ourneret, C. Richard, J. C. M. Bermudez, S. McLaughlin, and A. O. Hero. Nonlinear un- mixing of hyperspectral images: Models and algorithms. IEEE Signal Pr ocessing Magazine , 31(1):82–94, 2014. 2 [7] M. Fauvel, J. Chanussot, J. A. Benediktsson, and J. R. Sveinsson. Spectral and spatial classiﬁcation of hyperspec- tral data using svms and morphological proﬁles. In IEEE Int. Geoscience and Remote Sensing Symposium , pages 4834– 4837, 2007. 2 [8] M. Goel, E. Whitmire, A. Mariakakis, T . S. Saponas, N. Joshi, D. Morris, B. Guenter, M. Gavriliu, G. Borriello, and S. N. Patel. Hypercam: hyperspectral imaging for ubiq- uitous computing applications. In A CM Intl. J oint Conf. P er- vasive and Ubiquitous Computing , pages 145–156, 2015. 3 [9] R. Gross, I. Matthews, and S. Baker . Fisher light-ﬁelds for face recognition across pose and illumination. In Joint P at- tern Recognition Symposium , pages 481–489, 2002. 2 [10] A. B. Hamida, A. Benoit, P . Lambert, and C. B. Amar . 3-d deep learning approach for remote sensing image classiﬁca- tion. T rans. Geoscience and Remote Sensing , 56(8):4420– 4434, 2018. 2 [11] J. C. Harsanyi and C.-I. Chang. Hyperspectral image clas- siﬁcation and dimensionality reduction: An orthogonal sub- space projection approach. IEEE T rans. Geoscience and Re- mote Sensing , 32(4):779–785, 1994. 1 [12] M. He, B. Li, and H. Chen. Multi-scale 3d deep con volu- tional neural network for hyperspectral image classiﬁcation. In Intl. Conf. Imag e Processing , 2017. 2 [13] W . Hu, Y . Huang, L. W ei, F . Zhang, and H. Li. Deep con volu- tional neural netw orks for hyperspectral image classiﬁcation. Journal of Sensor s , 2015, 2015. 2 [14] M. H. Kim, T . A. Harvey , D. S. Kittle, H. Rushmeier , J. Dorsey , R. O. Prum, and D. J. Brady . 3d imaging spec- troscopy for measuring hyperspectral patterns on solid ob- jects. ACM T rans. Graphics (TOG) , 31(4):38, 2012. 1 [15] H. Lee and H. Kwon. Contextual deep cnn based hyperspec- tral classiﬁcation. In Intl. Geoscience and Remote Sensing Symposium , 2016. 2 [16] Y . Li, H. Zhang, and Q. Shen. Spectral–spatial classiﬁca- tion of hyperspectral imagery with 3d conv olutional neural network. Remote Sensing , 9(1):67, 2017. 2 [17] J. W . Lichtman and J.-A. Conchello. Fluorescence mi- croscopy . Natur e methods , 2(12):910, 2005. 1 [18] B. Liu, X. Y u, P . Zhang, X. T an, A. Y u, and Z. Xue. A semi- supervised con volutional neural network for hyperspectral image classiﬁcation. Remote Sensing Letters , 8(9):839–848, 2017. 2 [19] C. Liu and J. Gu. Discriminati ve illumination: Per-pixel clas- siﬁcation of raw materials based on optimal projections of spectral brdf. IEEE trans. pattern analysis and machine in- telligence , 36(1):86–98, 2014. 2 [20] S. P . Love and D. L. Graff. Full-frame programmable spectral ﬁlters based on micromirror arrays. J. Mi- cr o/Nanolithography , MEMS, and MOEMS , 13(1):011108, 2014. 3 [21] Y . Luo, J. Zou, C. Y ao, X. Zhao, T . Li, and G. Bai. Hsi-cnn: A novel conv olution neural network for hyperspectral image. In Intl. Conf. A udio, Language and Image Pr ocessing , 2018. 2 [22] F . Melgani and L. Bruzzone. Classiﬁcation of hyperspec- tral remote sensing images with support vector machines. IEEE T rans. geoscience and r emote sensing , 42(8):1778– 1790, 2004. 2 [23] A. Mohan, R. Raskar, and J. T umblin. Agile spectrum imag- ing: Programmable wav elength modulation for cameras and projectors. In Computer Graphics F orum , 2008. 3 [24] M. O’T oole and K. N. Kutulakos. Optical computing for fast light transport analysis. ACM T rans. Graph. , 29(6):164, 2010. 3 [25] Z. Pan, G. Healey , M. Prasad, and B. T romberg. Face recog- nition in hyperspectral images. IEEE T rans. P attern Analysis and Machine Intelligence , 25(12):1552–1560, 2003. 1 [26] J.-I. Park, M.-H. Lee, M. D. Grossberg, and S. K. Nayar . Multispectral imaging using multiplexed illumination. In IEEE Intl. Conf. Computer V ision , 2007. 3 [27] A. P aszke, S. Gross, S. Chintala, G. Chanan, E. Y ang, Z. De- V ito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer . Auto- matic differentiation in p ytorch. In NIPS-W , 2017. 7 , 9 [28] F . Pedre gosa, G. V aroquaux, A. Gramfort, V . Michel, B. Thirion, O. Grisel, M. Blondel, P . Prettenhofer, R. W eiss, V . Dubour g, J. V anderplas, A. Passos, D. Cournapeau, M. Brucher , M. Perrot, and E. Duchesnay . Scikit-learn: Ma- chine learning in Python. Journal of Machine Learning Re- sear ch , 12:2825–2830, 2011. 7 , 9 [29] V . Saragadam and A. Sarankaranarayanan. KRISM—krylov subspace-based optical computing of hyperspectral images. arXiv:1801.09343 , 2018. 3 , 4 [30] V . Sharma, A. Diba, T . T uytelaars, and L. V an Gool. Hyperspectral cnn for image classiﬁcation & band selec- tion, with application to face recognition. T echnical report KUL/ESA T/PSI/1604, KU Leuven, ESA T , Leuven, Belgium , 2016. 2 [31] Y . T arabalka, J. Chanussot, and J. A. Benediktsson. Segmen- tation and classiﬁcation of hyperspectral images using water - shed transformation. P attern Reco gnition , 43(7):2367–2379, 2010. 1

Programmable Spectrometry -- Per-pixel Classification of Materials using Learned Spectral Filters

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment