Feature Extraction in Augmented Reality
Jekishan K. Parmar
Department of Computer Science & Engineering, Babaria Institute of Technology, Vadodara
Email: jekishan@aol.in

Ankit Desai
IQM Corporation & School of Engineering & Applied Science, Ahmedabad University, Ahmedabad
Email: desaiankitb@gmail.com

Abstract—Augmented Reality (AR) is used for various applications associated with the real world. This paper first describes the characteristics and essential services of AR. A brief history of Virtual Reality (VR) and AR is also given in the introductory section. AR technologies and their workflow are then depicted, covering the complete AR process: Image Acquisition, Feature Extraction, Feature Matching, Geometric Verification, and Associated Information Retrieval. Feature extraction is the essence of AR, hence its details are furnished in this paper.

I. INTRODUCTION

The term Augmented Reality holds multiple definitions, but one of the most popular, and the one that still holds, is that augmented reality is a field in which 3D virtual objects are integrated into a 3D real environment in real time [1]. When AR is viewed as an enhancement of reality, it can be defined as the use of computers to enhance the richness of the real world [2]. It can also simply be called user interaction in a 3D environment [3]. When it is perceived as an interaction between humans and virtual objects, it can be defined as combining the real and the virtual in order to assist the user in performing a task in a physical setting [4]. According to [5], Augmented Reality supports context-aware computing.

For an AR user, the real world and virtual objects coexist in the same view. For example, when the Mona Lisa in a museum is viewed through an augmented reality smartphone, information about the painting is superimposed on top of it.
In addition, it may include information such as the artist, da Vinci, and the estimated year in which the picture was painted. Figure 1 shows one such example of an augmented reality camera viewing the Taj Mahal: as the camera captures the Taj in its lenses and displays it to the user, information related to the Taj is displayed alongside.

The next section compares AR and VR with insights into current advancements and research challenges, Section III depicts the various technologies involved in AR, Section IV furnishes details about feature extraction, and finally Section V concludes the paper.

II. AUGMENTED REALITY AND VIRTUAL REALITY

Jaron Lanier coined the phrase Virtual Reality in 1989 [6], and Thomas Caudell defined Augmented Reality in 1990 [7]. These concepts are therefore quite old, but their complete and fully reliable implementation has become possible only recently.

A. Reasons for Recent Advancements in Augmented Reality

• High-resolution cameras, which enable accurate image and object identification, have become available only recently, including on smart devices.
• Very high-performing Central Processing Units and Graphics Processing Units have recently become available, enabling the fast and reliable image processing and feature extraction necessary for Augmented Reality.
• Large amounts of memory and fast input/output access are now cheaply available, making it possible to store object information and access it quickly, which is highly essential for AR.
• Sharp virtual text and images, superimposed on smart devices in an elegant yet easy-on-the-eye fashion, have become possible only due to the availability of High Definition displays on these devices.
• Information needs to be retrieved speedily from AR servers and databases and brought to the device so that it can be displayed quickly to the user, which has recently become possible due to the advent of high-speed broadband, wireless, and wired networking technologies.

Due to all these advancements, AR is going to become better and quicker. In spite of these technological advancements, there are some challenges in AR that need to be worked on and that require proper study and research in order to get more from AR.

B. Research Challenges in Augmented Reality

Despite the growing interest and development in AR, the following challenges exist in the field and offer a very wide scope for research.

• A precise, fast, and robust registration between synthetic augmentations and the real environment is one of the most important challenges [8].
• An AR system needs to deal with the vast amount of information that exists in the real world, which requires very quick and portable hardware [9].
• Highly efficient energy consumption is desired from AR devices due to their limited battery life [9].
• High network dependency is also a challenge, as there may not be network coverage in some areas [10].
• AR assumes everything to be static, while the real world is very dynamic: many things keep changing, such as people passing by, weather conditions, and the colour of buildings, which may be repainted after a gap of a few years. This lack of dynamism in AR is also a challenge [10].

Fig. 1. Augmented Reality camera example
Fig. 2. Augmented Reality (AR) versus Virtual Reality (VR)

C. Comparison of Augmented Reality and Virtual Reality

A virtual reality user is fully immersed in an animated environment, much like a commonly used game-playing space. A user or game player will typically use an avatar to exist and interact inside the virtual space.
The view of the user in virtual reality is different from the real environment, and fantasies and illusions surrounding the avatar are easy to create in the virtual world. Augmented reality, in contrast, is a mixture of real life and virtual reality [11]; this is how it differs from virtual reality. Augmented reality combines information and does various things, but its basis is real life, what the user actually sees [12].

An augmented reality user is able to obtain useful information about locations or objects and can interact with virtual contents in the real world. The virtual contents are therefore superimposed onto the real-world image seen by the user. The augmented reality user can distinguish the superimposed virtual objects and is hence able to turn selected AR functions, which may be related to certain objects, on or off. In other words, if a user is interested only in objects and not in locations, the user can adjust the augmented reality functions so that location information does not pop up on the screen and only object information is shown. Such things are controllable in futuristic systems.

In comparison to virtual reality, augmented reality users commonly feel less separated from the real world, because the foundation of their view is exactly the real world; things are just superimposed on that view. Fantasies and illusions can thus be created and superimposed on the real-world view.

Some definitions of AR have already been described, but its definition in the context of virtual reality can also be formed and compared with the definition of virtual reality itself. The comparison is shown in figure 2.

(Image source for Fig. 1: http://www.arlab.com/img/content/products/matching01.jpg)
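The per-category toggling of AR functions described above can be sketched as a simple filter over superimposed annotations. This is only a minimal illustration; the annotation records and category names are hypothetical, not taken from the paper:

```python
# Minimal sketch of per-category AR function toggling, as described above.
# The annotation records and category names are hypothetical examples.
annotations = [
    {"category": "object", "text": "Object information A"},
    {"category": "location", "text": "Location pop-up B"},
    {"category": "object", "text": "Object information C"},
]

def visible_annotations(annotations, enabled):
    """Keep only annotations whose category the user has switched on."""
    return [a for a in annotations if a["category"] in enabled]

# A user interested only in objects disables location pop-ups:
for a in visible_annotations(annotations, {"object"}):
    print(a["text"])
```

The same filter applied with `{"location"}` would show only the location pop-ups, mirroring the user-controlled behaviour described in the text.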
It can be seen in figure 2 that on one side there is the real environment and on the other side the virtual environment, and that these two combine into a mixed reality: augmented reality lies toward the real side, adding virtual content to reality, while augmented virtuality lies toward the virtual side, adding real content to virtuality. Hence, augmented reality is based on reality, where virtual information and images are overlaid and superimposed in the right positions.

III. AUGMENTED REALITY: TECHNOLOGY

Figure 3 shows the technological components present in augmented reality. On the left there is a content-provider server, which contains contents viz. 3D geographical assets, geographical information, text, images, movies, and point-of-interest information. On the other side there are three major units that make augmented reality possible: detection and tracking engines, a rendering system, and interaction devices. Detection and Tracking uses computer vision, tracking sensors, camera APIs, and markers and features in order to carry out its task of recognition. The necessary visualization is provided to the user by using computer graphics APIs, video composition, and depth composition to render images. Interaction is then provided to the user by picking up gestures, voice, and haptics, and by providing a UI. These three units are a combination of software framework and application programming, including browser access, as shown in figure 3.

A. Workflow of AR

As figure 4 shows, the user sends an input, which is detected by the interaction unit; the interaction unit then sends information to the rendering device. The rendering device takes combined input from the detection and tracking unit. In addition, the contents need to be streamed together; all of this is combined by the rendering unit, which sends visual feedback back to the user.

Fig. 3. Technological Components in Augmented Reality
Fig. 4. AR Workflow

B. AR Process

The AR process is divided into five steps, which can be described with an example.

Fig. 5. Augmented Reality Process Example

• Image Acquisition: In this step, the image is retrieved from the augmented reality camera. In our example, the picture shown in figure 5 is at the image acquisition stage, with the Leaning Tower of Pisa in the middle of the device display.
• Feature Extraction: This step is based on an initial set of measured data. The extraction process generates informative, non-redundant values that facilitate the subsequent feature learning and generalization steps. As depicted in part 2) of figure 5, red circles are drawn on top of the image, denoting the extraction of features. Once the features are extracted, these feature points are used to identify and learn what the object is.
• Feature Matching: This is the process of computing abstractions of the image information and making a local decision, for every image point, as to whether there is an image feature at that point or not. In order to identify the extracted points, they need to be matched against other features, which is performed in this step; the matched objects are sent for verification in the next step.
• Geometric Verification: This is an identification process that finds geometrically related images in the image data set, which is a subset of the overall augmented reality image database. The objects matched in the previous step are verified in this step by comparing them with objects in the database, and are then sent for retrieval in the next step.
• Associated Information Retrieval: This process searches for and retrieves metadata, text, and content-based indexing information for the identified image or object. The associated information is displayed on the augmented reality screen near the corresponding image or object.
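The five steps above can be sketched as a skeletal pipeline. This is only a minimal illustration under toy assumptions; the function names and stub logic are hypothetical stand-ins for real computer-vision and database components, not an implementation from the paper:

```python
# Skeletal sketch of the five-step AR process described above.
# All stage functions are hypothetical stubs for illustration only.

def acquire_image(camera_frame):
    """Image Acquisition: take the raw frame from the AR camera."""
    return camera_frame

def extract_features(image):
    """Feature Extraction: derive informative, non-redundant values.
    Stub: treat each distinct pixel value as a 'feature point'."""
    return sorted(set(image))

def match_features(features, known_objects):
    """Feature Matching: match extracted points against known object features."""
    return [name for name, feats in known_objects.items()
            if set(feats) <= set(features)]

def verify_geometry(candidates, database):
    """Geometric Verification: keep only candidates present in the database."""
    return [c for c in candidates if c in database]

def retrieve_information(verified, database):
    """Associated Information Retrieval: fetch metadata for verified objects."""
    return {obj: database[obj] for obj in verified}

# Toy data: 'images' are flat lists of pixel values; the database maps
# object names to the metadata shown on the AR display.
known_objects = {"tower": [7, 9], "statue": [2, 4]}
database = {"tower": "Leaning Tower of Pisa"}

frame = [7, 7, 9, 1, 3]
features = extract_features(acquire_image(frame))
candidates = match_features(features, known_objects)
verified = verify_geometry(candidates, database)
info = retrieve_information(verified, database)
print(info)
```

Each function corresponds to one stage of the process; in a real system the stubs would be replaced by camera capture, descriptor extraction, descriptor matching, geometric consistency checks, and a metadata lookup against the AR server.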
As shown in the lower middle part of figure 5, the Leaning Tower of Pisa has its information superimposed on top of the image on the smartphone.

IV. FEATURE EXTRACTION PROCESS IN AUGMENTED REALITY

Feature Extraction is the second step in the AR process, and it is itself composed of six stages, as shown in figure 6. This section discusses them in detail.

• Grayscale Image Generation (GIG): The original image captured by the augmented reality device is converted into a grayscale image in order to make the process robust to colour modifications.
• Integral Image Generation (IIG): An integral image is built from the grayscale image. This procedure enables fast calculation of summations over image sub-regions.
• Response Map Generation (RMG): In order to detect Interest Points (IPs) using the determinant of the Hessian matrix of the image, the RMG process constructs the scale space of the image. Only once IPs are available can the various operations leading to actual feature extraction be performed.
• Interest Point Detection (IPD): Based on the generated scaled response maps, the maxima and minima are detected and used as the Interest Points.
• Orientation Assignment (OA): Each detected Interest Point is assigned a reproducible orientation to provide rotation invariance, i.e., invariance to image rotation.
• Descriptor Extraction (DE): Each Interest Point is described uniquely so that it can be distinguished from other Interest Points.

Fig. 6. Augmented Reality Process Example

A. Feature Extraction with an Example

The feature extraction process is about finding the Interest Points in the image or video, detecting the descriptors from the interest points, and then comparing the descriptors with the data in the database, as shown in figure 7.
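The Integral Image Generation (IIG) stage described above can be sketched in a few lines. This is the standard summed-area-table technique, not code from the paper: once the integral image is built, the sum over any rectangular sub-region is obtained in constant time, which is what makes the later Hessian-based response maps fast to compute.

```python
# Sketch of Integral Image Generation (IIG): a summed-area table that
# lets any rectangular sub-region of the grayscale image be summed in O(1).

def integral_image(img):
    """Return ii with ii[y][x] = sum of img[0..y][0..x] (inclusive)."""
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y][x] = row_sum + (ii[y - 1][x] if y > 0 else 0)
    return ii

def region_sum(ii, y0, x0, y1, x1):
    """Sum of the inclusive rectangle (y0, x0)..(y1, x1) in O(1)."""
    total = ii[y1][x1]
    if y0 > 0:
        total -= ii[y0 - 1][x1]
    if x0 > 0:
        total -= ii[y1][x0 - 1]
    if y0 > 0 and x0 > 0:
        total += ii[y0 - 1][x0 - 1]
    return total

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
print(region_sum(ii, 1, 1, 2, 2))  # 5 + 6 + 8 + 9 = 28
```

Building the table costs one pass over the image; every subsequent box sum needs at most four lookups regardless of the box size.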
Here, first is the original image; then the grayscale image, which is processed to obtain the interest points. As shown, certain locations with specific characteristics are identified and marked. In the next processing stage, based on some colour coordination, the descriptors are found. With these descriptors the database can be queried to find matching information about what the object is. As shown in figure 7, accurate descriptors require a keen and sharp image. With the new, enhanced cameras on smart devices, more accurate descriptors are therefore found, so the augmented reality information is more reliable, there are fewer errors, and it is processed much more quickly.

Fig. 7. Feature Extraction Example

B. Blob Detection

Feature extraction requires descriptors to be qualified in a very accurate way; therefore invariance to noise, scale, and rotation needs to be maintained. There are various kinds of descriptors, e.g. corner descriptors, blob descriptors, and region descriptors. This section describes blob detection with the help of an example.

A blob is a region of an image that has constant or approximately constant image properties. All the points in a blob are considered to be similar to each other. These image properties, which include brightness and colour, are used in the comparison with surrounding regions [13].

Fig. 8. Blob Detection Example

C. Algorithm for Blob Detection

The following steps need to be followed for blob detection [14]. Figure 9 shows an image with its blobs detected.

• Start from the first line of the image and find groups of one or more white (or black) pixels, known as lineblobs.
• Find the X, Y co-ordinates of each such blob and number each of these groups.
• Repeat this sequence on the next line.
• While collecting the lineblobs, check whether all lineblobs have been collected and whether they overlap.
• If lineblobs overlap, merge them into one blob using their X and Y co-ordinates, and treat the result as a whole blob. Repeat this for every line to obtain a collection of blobs.

D. Other Feature Extraction Techniques

The following are some feature extraction techniques:

• Haar [15]
• Scale Invariant Feature Transform (SIFT) [16]
• Histogram of Oriented Gradients (HOG) [17]
• Speeded Up Robust Features (SURF) [18]
• Oriented FAST and Rotated BRIEF (ORB) [19]

SIFT and SURF are described in more detail in this section. SIFT is the most widely used feature extraction algorithm. It extracts features from images accurately and efficiently, and it overcomes various adverse effects on extraction, such as transformation, noise, and lightness. The SIFT algorithm has four steps:

• Step 1. Scale-space extrema detection
• Step 2. Key point localization and filtering
• Step 3. Orientation assignment
• Step 4. Descriptor construction

SURF is an improvement over SIFT in terms of speed. It is based on the same algorithmic principles as SIFT, but uses procedures that require less computation in order to enhance processing speed. SURF therefore makes it possible to carry out feature extraction in a near-real-time or real-time manner. The SURF algorithm has three steps:

• Step 1. Interest Point Detection, which is about high-speed detection of interest points.
• Step 2. Local Neighbourhood Description, which is about descriptors built from Haar-wavelet responses.
• Step 3. Matching Process, where a faster matching algorithm is applied by using the Laplacian operator.

V. CONCLUSION

Looking at the recent advancements in AR technologies, viz. accurate cameras and smart devices, more and more AR-based devices and applications are bound to capture market share. As a pillar to that, the core component of AR, Feature Extraction, is going to be technologically challenging.
For the same, SIFT and SURF, presented in this paper, are ready to be used as feature extraction algorithms. Apart from that, more insight can be gained from the other feature extraction techniques described in the previous section. Emphasis on the accuracy of the feature extraction and matching process is still a field of research, which can be worked on using new machine learning techniques and algorithms.

REFERENCES

[1] R. Azuma, "A Survey of Augmented Reality", Presence, Vol. 6, No. 4, pp. 355-385, MIT Press, 1997.
[2] B. B. Bederson, "Audio Augmented Reality: A Prototype Automated Tour Guide", In Conference Companion on Human Factors in Computing Systems (CHI '95), New York, NY, USA: ACM, 1995, pp. 210-11.
[3] R. Carey, T. Fields, A. van Dam and D. Venolia, "Why Is 3-D Interaction So Hard and What Can We Really Do About It?", In Proceedings of the 21st Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '94), New York, NY, USA: ACM, 1994, pp. 492-93.
[4] A. Dünser and E. Hornecker, "Lessons from an AR Book Study", In Proceedings of the 1st International Conference on Tangible and Embedded Interaction (TEI '07), New York, NY, USA: ACM, 2007, pp. 179-82.
[5] J. Grubert, T. Langlotz, S. Zollmann and H. Regenbrecht, "Towards Pervasive Augmented Reality: Context-Awareness in Augmented Reality", IEEE Trans. Vis. Comput. Graph., March 2016.
[6] C. Conn, J. Lanier, M. Minsky, S. Fisher and A. Druin, "Virtual Environments and Interactivity: Windows to the Future", In ACM SIGGRAPH 89 Panel Proceedings (SIGGRAPH '89), ACM, New York, NY, USA, pp. 7-18. DOI: http://dx.doi.org/10.1145/77276.77278
[7] Digital Trends. <http://www.digitaltrends.com/features/what-is-augmented-reality-iphone-apps-games-flash-yelp-android-ar-software-and-more/>
[8] M. Mallem, "Augmented Reality: Issues, Trends and Challenges", In Proceedings of the 2nd IEEE International Conference on Image Processing Theory, Tools and Applications (IPTA), 2010.
[9] M. Mekni and A. Lemieux, "Augmented Reality: Applications, Challenges and Future Trends", In Proc. of the 13th International Conference on Applied Computer and Applied Computational Science (ACACOS), Kuala Lumpur, Malaysia, April 23-25, 2014.
[10] C. Arth and D. Schmalstieg, "Challenges of Large-Scale Augmented Reality on Smartphones", Graz University of Technology, Graz, 2011.
[11] P. Milgram and F. Kishino, "A Taxonomy of Mixed Reality Visual Displays", IEICE Trans. on Information Systems, Vol. E77-D, No. 12, December 1994.
[12] J. M. Harley, E. G. Poitras and A. Jarrell, "Comparing Virtual and Location-Based Augmented Reality Mobile Learning: Emotions and Learning Outcomes", Education Tech Research Dev (2016) 64: 359. doi:10.1007/s11423-015-9420-7
[13] Chao Du, Yen-Lin Chen, Mao Ye and Liu Ren, "Edge Snapping-Based Depth Enhancement for Dynamic Occlusion Handling in Augmented Reality", In Proc. of the 15th IEEE International Symposium on Mixed and Augmented Reality (ISMAR), Mexico, 2016.
[14] S. S. Vanjire, E. Khan, P. Mahamulkar, S. Kohli and P. Kedar, "Implementation of Augmented Reality using Hand Gesture Recognition", International Journal of Technical Research and Applications, Vol. 3, Issue 2, pp. 253-256, April 2015.
[15] C. Papageorgiou, M. Oren and T. Poggio, "A General Framework for Object Detection", In Proc. of the 6th IEEE International Conference on Computer Vision, India, 1998.
[16] D. G. Lowe, "Method and apparatus for identifying scale invariant features in an image and use of same for locating an object in an image", U.S. Patent 6 711 293, March 23, 2004.
[17] N. Dalal and B. Triggs, "Histograms of Oriented Gradients for Human Detection", In Proc. of the 2nd IEEE Int. Conf. on Computer Vision and Pattern Recognition (CVPR), San Diego, CA, 2005.
[18] H. Bay, A. Ess, T. Tuytelaars and L. Van Gool, "Speeded-Up Robust Features (SURF)", Computer Vision and Image Understanding, Vol. 110, Issue 3, pp. 346-359, June 2008.
[19] E. Rublee, V. Rabaud, K. Konolige and G. Bradski, "ORB: An Efficient Alternative to SIFT or SURF", In Proc. of the 13th IEEE Int. Conf. on Computer Vision, Spain, 2011.