Morphing a Stereogram into Hologram

This paper develops a simple and fast method to reconstruct reality from stereoscopic images. We bring together ideas from robust optical flow techniques, morphing deformations and lightfield 3D rendering in order to create unsupervised multiview images of a scene.

Authors: Enrique Canessa, Livio Tenze

Enrique Canessa¹ and Livio Tenze
Science Dissemination Unit (SDU), ICTP - International Centre for Theoretical Physics, Trieste, Italy

Abstract

This paper develops a simple and fast method to reconstruct reality from stereoscopic images. We bring together ideas from robust optical flow techniques, morphing deformations and lightfield 3D rendering in order to create unsupervised multiview images of a scene. The reconstruction algorithm provides a good visualization of the virtual 3D imagery behind stereograms upon display on a headset-free Looking Glass 3D monitor. We discuss the possibility of applying the method for live 3D streaming optimized via an associated lookup table.

Keywords: 2D to hologram conversion, autostereoscopic, multiview display, lightfield rendering, optical flow, morphing, deep learning, cultural heritage.

Introduction

Virtual reality (VR) to engage with 3D objects and environments has been a burgeoning and popular subject in recent years. This technology is highly beneficial in a range of diverse innovative scientific, engineering and entertainment applications, including medicine, astro stereo photography, robotic vision, intelligent transportation systems and video games, among others [1]. For example, it has recently been shown that the advantages of 3D stereoscopic visualization over a conventional 2D planar screen can shorten the dissection time of specialized surgery [2]. Although many advances have already been achieved, a key limitation of VR technologies is that, in some cases, they can produce contagious yawning [3]. It is also necessary to wear special goggles or headsets, which cause motion sickness and limit user time. Notwithstanding such constraints, VR applications can still give science a new dimension, allowing researchers to view and share 3D data [4]. The final goal, however, would be to obtain and visualize similar results without wearing any device.
Motivated by the study of Ryan Baumann on animating stereograms with optical flow morphing [5], in this paper we make a first attempt to reconstruct reality in similarly simple terms. We develop a method to combine observations of 2D stereoscopic images with 3D virtual interpretations of reality. We apply the unsupervised torch-warp optical flow algorithm [5,6] to animate stereo pairs and retrieve distinct 2D views from morphing deformations between left (L) and right (R) images. Since all these sequential frames put together can give an acceptable illusion of depth and parallax in the horizontal direction [5], we place them in a single standard Quilt collage [7]. We then convert the Quilt into a native lightfield image using the open source SURFsara visualization Python scripts [8]. This Quilt provides a good visualization of the virtual 3D imagery of stereograms by direct display of the lightfield output images on the new class of standalone Looking Glass 3D monitors [9]. Multiple viewers can see the scene inside stereoscopic images without the need for glasses and from different angles. We discuss the possibility of applying this method for producing multiview 3D streaming with just a stereo webcam.

¹ Author for correspondence: canessae@ictp.it

Hardware

We use low-cost ELP-960P2CAM (V90 and LC1100) USB stereo webcams, with distortion-free dual lenses and synchronized M12 mounts, to obtain the principal 2D stereoscopic images [10]. According to the specifications, the two camera videos –with low power consumption, 90 degree lenses, a standard electronic rolling shutter and a 1/3" CMOS OV9750 sensor for high image quality– can reach high frame rates in MJPEG compression format of 2560(H)x960(V)p@60fps, with a sensitivity of 3.7V/lux-sec@550nm. Its small size of 80x16.5 mm is useful for embedded applications. It supports the Linux USB video class (UVC), with adjustable parameters such as brightness, contrast, saturation, hue, sharpness, color balance and exposure.
For the multi-angle visualization of our morphing-to-hologram image files, generated in full color with the present algorithm, we use the glasses-free standard Looking Glass 8.9″ (also known as HoloPlay) as an HDMI external monitor [9]. This HoloPlay device combines lightfield and volumetric technologies within a single new type of display [10], which allows the display of a hologram of 32 (or 45) simultaneous different views at 60 fps, formed via an 8x4 (or 9x5) Quilt input. The technology used in this class of monitor is described in U.S. Patent application number 2017-0078655: "Printed Plane 3D Volumetric Display". Besides standard 64-bit Windows 10, viewing under Linux Ubuntu 19.04 was also possible using a Notebook Aspire E 15, Intel Core i5, 64-bit, 8 GiB RAM, 1366x768p resolution, with an Nvidia GeForce 820M graphics card outputting 2560x1600p. The small Raspberry Pi 3 single-board computer, Model B+ with a 1.4GHz 64-bit quad-core processor and extra power supply, was also able to display our stereograms morphed into a multiview 3D display.

Method

The procedure adopted to get a stereogram is as follows: we first position the principal object in the scene at least 1.6 meters away from the two lenses of the ELP stereo webcam. This distance enables us to avoid image distortions due to well-known technical limitations of stereoscopic webcams [11]. The dual lenses are aligned parallel toward the horizon, avoiding any tilt of the ELP device. The most outstanding feature of the compact ELP synchronized stereo webcam we use is that the two cameras' video frames are synchronous. This unique feature makes it possible to simulate the manner in which human eyes observe a scene 'simultaneously' from two different viewpoints [10]. This is ideal for binocular stereo vision development like the one studied in this work.
With one single shot we retrieve a single image in HD resolution containing the L and R views, without the need for any extra, complex prior calibration of the stereo webcams, as required by most compact industrial camera devices that produce and display the two L and R images independently.

We take a stereoscopic test picture with the ELP using the ffmpeg command and cut the resulting single HD image, of max. 1280x960p resolution, into two equal parts to generate the L and R images. We then resize each view, initially setting the widths to 512p, and then crop their heights to 256p (anywhere along the vertical axis, starting from the same upper corner in both images). This image manipulation is carried out before the morphing rendering, to speed up the computation using smaller images and to avoid an 'out-of-memory (RAM) condition before sgemm' in the deep convolution matching [6] when trying to match the full HD images.

Fig. 1: 32 intermediate views (frames 0-31) obtained by morphing a stereoscopic image via the 'torch-warp' algorithm described in [5]. Also shown are the morphed animated GIF and the backward and forward displacements of the optical flow data.

Our method for 3D multiview then closely follows the algorithm of Baumann for animating stereograms [5]. We automatically apply optical flow deformations based on DeepFlow (which outputs .flo files) to our pairs of L-R images in order to morph between them. This alignment of the two (dissimilar) images, followed by a gradual fade from one image to the other, can give an acceptable illusion of depth and (parallax) motion to the viewer (see examples in [5]). In particular, this continuous interpolation by optical flow-based image warping enabled us to control the gradual cross-dissolving between pairs of stereo images. By fading from the L image to the R image, we can then split a whole scene into a given number of intermediary component frames, as illustrated in Figure 1.
We first generate .flow data for 32 different views (frames numbered 0-31 in the figure), starting from the L to the R image, in an entirely unsupervised manner. This choice for the morphing process is based on the nature of the Looking Glass HoloPlay [7].

Fig. 2: Example of an 8x4 Quilt input for the Looking Glass HoloPlay from the intermediary component (morphed) frames in Figure 1.

Next, the 32 different (.png) views from the morphing data are converted to PNG24 to form the needed Quilt. We place the set of 32 view images (512x256p) in a single standard 4x8 Quilt image (2048x2048p), as in Figure 2, using 'make_quilt.py' from the SURFsara scripts [8]. The views run from the bottom left, holding the leftmost view (L-image), to the top right, holding the rightmost view (R-image).

Fig. 3: Native image output of the Quilt in Figure 2, based on the per-device HoloPlay calibration '.json' values.

Finally, we generate a native image targeted to a specific HoloPlay device using the 'quilt2native.py' script in [8], as shown in the example of Figure 3. We get the display calibration values (in the form of a standard data interchange .json file) from a Looking Glass display using 'get_calibration_from_eeprom.py' from [8].

The multiview 3D rendering of our morphed stereogram of Figure 3 can then be displayed on a HoloPlay device as it appears in Figure 4. This output can also be seen in the video: https://www.youtube.com/watch?v=6FAhmI-vtLQ

Fig. 4: Multiview hologram output from morphed stereoscopic images. See also the video demo at: https://www.youtube.com/watch?v=6FAhmI-vtLQ

In order to produce beautiful holograms, as in Figure 4, the Looking Glass provides 32 (or 45) discrete views or frames of a 3D scene, displaying these views over a ~50°-wide view cone.
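The bottom-left-to-top-right ordering of the 32 tiles inside the 4x8 Quilt (4 columns of 512p, 8 rows of 256p, 2048x2048p total) can be sketched as below. This is our own illustration, assuming row-major filling from the bottom row upward; the function name is not taken from the SURFsara scripts [8].

```python
# Quilt geometry as used in Figure 2 (illustrative constants)
COLS, ROWS = 4, 8
TILE_W, TILE_H = 512, 256

def quilt_position(view):
    """Top-left pixel (x, y) of a view tile inside the quilt (y grows downward)."""
    col = view % COLS
    row = view // COLS               # row 0 = bottom row of the quilt
    x = col * TILE_W
    y = (ROWS - 1 - row) * TILE_H    # flip: image origin is at the top-left
    return x, y
```

View 0 (the L image) lands at the bottom-left corner and view 31 (the R image) at the top-right, matching the ordering expected by the HoloPlay.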
This lightfield arrangement tricks our visual perception into seeing 3D objects by parallax (i.e., moving the head around the scene) and by stereo vision (i.e., presenting different perspectives to each eye).

Discussion

Each Looking Glass holds its own calibration data for correct rendering and, inside its render volume, different depths have different optical properties [7]. The depth where things look sharpest is the so-called Zero-Parallax Plane, in the middle of the display (in our case, around frame Nr. 16). Objects in this plane show up in the same pixel-space position for all 32 (or 45) views. Objects in the scene that are nearer or further than this plane undergo parallax. The Looking Glass HoloPlay provides a novel glasses-free way to preview 3D objects and scenes, as in Figure 4.

A generic expression for the relation between the pixels of a slanted lenticular 3D-LCD and the multiple perspective views was first derived by Cees van Berkel [12]. Each sub-pixel on the 3D-LCD is mapped to a certain view number and colour value (i.e., in the lightfield domain). If i and j denote the panel coordinates for each sub-pixel, then

    N_{i,j} = N_tot (i - i_off - 3 j tan(α)) mod(P_x) / P_x ,    (1)

where N denotes the view number of a certain viewpoint, α the slant angle between the lenticular lens and the LCD panel, and P_x the lenticular pitch. Upscaling multiple views (e.g., upscaling from the Quilt to the native Looking Glass image) requires significant CPU resources and increases system complexity [13-15].

We have created multiview images via Eq.(1), starting from a stereoscopic scene. We have put together ideas from optical flow techniques, morphing deformations and lightfield 3D rendering. 2D morphing yields reasonable and better 3D visual results when it works correctly along the horizontal direction, as required by the optical elements adopted by the Looking Glass HoloPlay [7].
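Eq.(1) can be transcribed directly into code. The calibration values used in the example below (slant angle, lenticular pitch, offset, number of views) are invented for illustration and do not correspond to any real Looking Glass unit, whose per-device values come from the '.json' calibration.

```python
import math

def view_number(i, j, n_tot, alpha, p_x, i_off=0.0):
    """View index N_{i,j} of Eq.(1) for the sub-pixel at panel coordinates (i, j)."""
    phase = (i - i_off - 3.0 * j * math.tan(alpha)) % p_x   # position under the lens
    return n_tot * phase / p_x                              # scaled to N_tot views
```

The `mod` by the lenticular pitch makes the view index repeat under successive lenses, which is why neighboring sub-pixels feed different viewpoints of the same scene.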
Fading from misaligned L to R images can cause the other intermediate parts of the whole scene to blur and get distorted. The limitation of the morphing process in stereo animation, as compared with the use of a depth map, is that morphing cannot provide complete information on distant regions. It mainly extrapolates the parallax encoded in the images by combining information from nearby pixels only; it is essentially a purely local method [16]. However, the application of 2D morphing to create the (32 or 45) required 2D views for the Looking Glass Quilt poses minimal geometric constraints on the reconstructed 3D scene via the alternate lightfield projections. Morphing between the two images taken simultaneously with the ELP stereo webcam produces a nice illusion of 3D throughout the multiviews of a scene.

Optical flow techniques are relatively sensitive to the presence of occlusions, illumination changes and out-of-plane movements. These factors lead to noise and to translational motion discontinuities between the neighborhoods of two consecutive images. The key processing steps in the flow field [5,6] include matching by polynomial interpolation to approximate pixel intensities in the neighborhood, warping, and optimization without an explicit regularization. We have found that, when generating .flow data for 32 views, optical flow techniques can reach a reasonable accuracy for reconstructing 3D reality from stereoscopic images.

The CPU time for creating these sets of view frames, and especially the final Looking Glass native image, can be a few minutes. This process still needs to be optimized. To map the Quilt into the final native image for display on the holographic Looking Glass HoloPlay, it is necessary to apply the complex algorithm of Eq.(1), which depends on input parameters of the physical structure of this device.
Every HoloPlay monitor in fact possesses unique calibration parameters (such as pitch, slope, dpi) set at the manufacturing stage. By making use of this calibration, and applying lightfield geometric transformations, one then gets the multiview image reproduced on the Looking Glass HoloPlay. This mapping procedure requires considerable computing power, since the final native image has a resolution of 2560x1600p with 3 color channels –such that, once the pixel to be mapped is fixed, the map value for each color channel implies separate calculations. In essence this procedure, as such, would become computationally expensive and difficult to apply in applications for real-time 3D video.

Since the mapping matrix depends on the geometric position of each pixel, and on the calibration parameters of the display, we have constructed a Lookup Table (LUT) to replace runtime computation and save processing time. This LUT is created only once, at the beginning of the Quilt→HoloPlay image mapping process, and is then used for all the images that need to be visualized. We create the array as follows. First, we allocate 3 matrices, one for each of the three RGB color channels, of size 2560x1600p x 2. Each matrix provides the X coordinate of the Quilt from which we take the corresponding value, and the Y coordinate; this explains the multiplication of the 2560x1600p resolution by 2. To avoid an unnecessary waste of resources and to consume the least possible amount of RAM memory, each element of the matrices is of type uint16_t (the uint8_t type would only allow addressing maximum values of 255). Secondly, we scroll through all the 2560x1600p pixel positions and calculate the mapping value on the Quilt image for each pixel. Next, once a mapping value has been calculated, it is stored in the three allocated matrices. Finally, once the map-filling procedure has been completed, we save the 3 matrices in binary format.
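The four-step LUT procedure above can be sketched as follows. The actual implementation is a C library; this Python sketch uses `array('H')` for the uint16_t storage, and `map_fn` is a placeholder for the expensive Eq.(1) evaluation with the per-device calibration. All names and the small demo dimensions are ours (the real native image is 2560x1600p).

```python
from array import array

def build_lut(w, h, map_fn):
    """Three flat uint16 tables (R, G, B), holding an (x, y) Quilt pair per pixel."""
    luts = []
    for c in range(3):                      # one table per color channel
        lut = array('H')                    # 'H' = unsigned 16-bit, like uint16_t
        for y in range(h):
            for x in range(w):
                qx, qy = map_fn(x, y, c)    # expensive mapping, computed only once
                lut.extend((qx, qy))
        luts.append(lut)
    return luts

def save_lut(luts, path):
    """Persist the three tables in raw binary, ready to be reloaded at startup."""
    with open(path, 'wb') as f:
        for lut in luts:
            lut.tofile(f)

def render_native(luts, quilt, w, h):
    """Runtime fast path: build each native channel by pure table lookups."""
    native = []
    for lut in luts:
        channel = [quilt[lut[2 * p + 1]][lut[2 * p]] for p in range(w * h)]
        native.append(channel)
    return native
```

Once saved, the tables are simply reloaded at startup, so every subsequent frame costs only indexed reads rather than re-evaluating the mapping per sub-pixel.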
Then, at each successive time step it is possible to reload the matrices (without recalculating them) and apply the mapping automatically to all the necessary images. This procedure significantly speeds up the mapping: the rendering of the final native image is essentially achieved by accessing the elements of the 3 matrices to map the Quilt pixels onto the final native image, for dynamic display on the HoloPlay.

We have implemented anew a C library that includes the LUT, in order to benchmark the time needed to convert the Quilt into the native image against the direct, classical method of Eq.(1) –or, similarly, against the SURFsara scripts [8]. A test with gprof on the statistics of the single files inside the C library indicates a total conversion time of 1.18s against 0.67s for the LUT. This means that the implementation of the LUT reduces the computing time by about 50%. One could reduce the computation time even further, by a factor of 4, by allocating the 4 threads of a Raspberry Pi 3 with 4 processors to the Quilt image of Figure 2 divided into four parts (with 8 successive views each). These possibilities open the path for producing multiview 3D streaming in real time with a simpler and faster algorithm.

Conclusions

We have proposed a rapid conversion approach that transforms stereoscopic images into holograms via the automatic implementation of a LUT and unsupervised morphing deformations between the L and R images, with no use of a depth map. The key idea is to start from just the pair of structured stereoscopic 2D images, with no additional information about any intermediate view, to create the holograms. With this method, we establish correlations between 2D image observations and their 3D display on a Looking Glass HoloPlay monitor. Future work could include the potential application of this method in the field of live 3D streaming using cost-effective hardware and no headsets.
In principle it would also be possible to interact with such real-time holograms displayed on the HoloPlay screen using a Leap Motion controller [7]. Ultimately, the different sets of techniques discussed in this work still need to be optimized and further tested when combined together. Work along these lines is under development.

References

[1] van Beurden, M.H.P.H., IJsselsteijn, W.A. and Juola, J.F., "Effectiveness of stereoscopic displays in medicine: a review", 3D Research 3 (2012) 3; https://doi.org/10.1007/3DRes.01(2012)3

[2] Itatani, Y. et al., "Three-dimensional stereoscopic visualization shortens operative time in laparoscopic gastrectomy for gastric cancer", Nature Scientific Reports 9 (2019) 4108; https://doi.org/10.1038/s41598-019-40269-3

[3] Gallup, A.C., Vasilyev, D., Anderson, N., and Kingstone, A., "Contagious yawning in virtual reality is affected by actual, but not simulated, social presence", Nature Scientific Reports 9 (2019) 294; https://doi.org/10.1038/s41598-018-36570-2

[4] Matthews, D., "Virtual-reality applications give science a new dimension", Nature 557 (2018) 127; http://doi.org/10.1038/d41586-018-04997-2

[5] Baumann, R., (blog), 17 Aug 2016, "Animating Stereograms with Optical Flow Morphing", https://ryanfb.github.io/etc/2016/08/17/animating_stereograms_with_optical_flow_morphing.html (accessed 4 May 2019).

[6] Weinzaepfel, P., Revaud, J., Harchaoui, Z., and Schmid, C., "DeepFlow: Large displacement optical flow with deep matching", ICCV - IEEE International Conference on Computer Vision, Dec 2013, Sydney, Australia, pp. 1385-1392; 10.1109/ICCV.2013.175. hal-00873592

[7] "How the Looking Glass monitor works": https://docs.lookingglassfactory.com/Appendix/how-it-works/ (accessed 4 May 2019).

[8] Visualization @ SURFsara - Some python scripts and tools related to the Looking Glass: https://github.com/surfsara-visualization/looking-glass (accessed 4 May 2019).
[9] The Looking Glass is a patent-pending combination of lightfield and volumetric display technologies within a single 3D display system: https://lookingglassfactory.com/

[10] Ailipu Technology Co., manufacturer of surveillance systems: http://www.webcamerausb.com

[11] Woods, A., Docherty, T., and Koch, R., "Image distortions in stereoscopic video systems", Stereoscopic Displays and Applications IV, Proc. SPIE Vol. 1915, San Jose, CA, Feb. 1993; https://doi.org/10.1117/12.157041

[12] van Berkel, C., "Image preparation for 3D-LCD", Proc. SPIE 3639, Stereoscopic Displays and Virtual Reality Systems VI (24 May 1999); https://doi.org/10.1117/12.349368

[13] Takaki, Y., "Multi-view 3-D display employing a flat-panel display with slanted pixel arrangement", J. Soc. Info. Display 18 (2012) 476; https://doi.org/10.1889/JSID18.7.476

[14] Don Lee, E. et al., "Upscaled sub-pixel multiplexing of multi-view images based on weighted view overlay", Electronics Lett. 51 (2015) 828; https://doi.org/10.1049/el.2014.3668

[15] Jeong, Y.J., Chang, H.S., Nam, D., and Kuo, C.-C. Jay, "Direct light field rendering without 2D image generation", J. Soc. Info. Display 24 (2017) 686; https://doi.org/10.1002/jsid.513

[16] Davis, J.A., and McAllister, D.F., "Morphing in stereo animation", Proc. SPIE 3639, Stereoscopic Displays and Virtual Reality Systems VI (24 May 1999); https://doi.org/10.1117/12.349397
