Field Label Prediction for Autofill in Web Browsers
Authors: Joy Bose
Affiliation: Microsoft IDC, Hyderabad, India. Email: joy.bose@ieee.org
Abstract: Automatic form fill is an important productivity feature present in major web browsers. It predicts the field labels of a web form and automatically fills values in a new form based on the values previously filled for the same field in other forms. This feature increases the convenience and efficiency of users who have to fill similar information into fields in multiple forms. In this paper we describe a machine learning solution for predicting the form field labels, implemented as a web service using Azure ML Studio.
Keywords: Auto form fill, web browser, prediction, machine learning, web service, Azure ML
I. INTRODUCTION
Automatic form fill [1-3] is a feature in web browsers where the fields in a web form are filled automatically when the form loads. It works by predicting the field labels and then automatically suggesting or filling the previously stored information for those fields, based on the user's historical data stored locally in the browser. This feature is present in all major web browsers and improves productivity, since the user does not have to fill in the same form field repeatedly across multiple forms.
For this feature to work, the field labels for a new form have to be predicted correctly. This is the problem we seek to solve in this paper. A naïve approach for predicting the form labels is to keep a file with the extracted features and the predicted form labels for each field in each form. However, such an approach would not scale, since the web has billions of forms. Chrome [1], Firefox and other major browsers use a combination of heuristic rules for this purpose.
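A heuristic rule set of this kind can be sketched in a few lines. This is a minimal illustration under stated assumptions: the regular expressions and label names in the hypothetical `HEURISTIC_RULES` table are invented for the example and are not the rules shipped in Chrome, Firefox or any other browser. Each rule is tried against the field's id, name and label text, and the first match decides the predicted label.

```python
import re
from typing import Optional

# Hypothetical rule table: each regular expression is tried against a field's
# id, name and label text, and the first match decides the predicted label.
# These patterns are illustrative assumptions, not any browser's real rules.
HEURISTIC_RULES = [
    (re.compile(r"e-?mail", re.IGNORECASE), "email"),
    (re.compile(r"pass(word)?|pwd", re.IGNORECASE), "password"),
    (re.compile(r"user(name)?|login", re.IGNORECASE), "username"),
    (re.compile(r"zip|postal", re.IGNORECASE), "postal_code"),
    (re.compile(r"addr(ess)?", re.IGNORECASE), "address"),
]


def predict_label_by_rules(field_id: str, name: str, label: str) -> Optional[str]:
    """Return the label of the first rule matching any attribute, else None."""
    for pattern, predicted in HEURISTIC_RULES:
        for text in (field_id, name, label):
            if text and pattern.search(text):
                return predicted
    # No rule fired, e.g. for opaque or ambiguous ids: the field is left unlabeled.
    return None


if __name__ == "__main__":
    print(predict_label_by_rules("txtEmail", "", ""))  # email
    print(predict_label_by_rules("f_01", "fld2", ""))  # None
```

Rule order matters: a field named `user_password` is labeled `password` only because that rule precedes the username rule. A field whose id and name are opaque, such as `f_01` and `fld2`, matches no rule at all, which is the kind of ambiguity a learned model is meant to handle.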
An example of such a rule may be: if the id, name or label of the field in the HTML matches a specific regular expression, then the predicted field label is set to a specific value. However, such an approach may not give the desired results for some kinds of forms, such as forms where the id is ambiguous, or forms that are dynamic, such as forms behind a login or paywall. In such cases, a machine learning approach that learns to predict the field labels from a labeled dataset might give better results.
In this paper we describe the design and implementation of a machine learning solution, comprising a web service that predicts the field labels for the fields in a given web form. A machine learning model is trained on the server and predicts the label of each form field in real time, given a set of features extracted from the HTML of the web form. Alternatively, the trained model file could be included as part of the web browser executable or shipped in a browser extension. However, we focus on the web service approach in this paper, since it makes it easier to update the model when new data is acquired. Note that the actual form data remains in the web browser client, since that data is private; only the features related to the form and field details extracted from the HTML are sent to the server and used to predict the field labels.
Fig. 1 shows a sample autofill interface. Fig. 2 shows the steps of training the machine learning model, and Fig. 3 shows the architecture of the web service that predicts the field labels.
Fig. 1. A sample interface for autofill suggestions for a web form
Fig. 2. Steps for training the machine learning model to predict form fields
Fig. 3. Architecture of the web service to predict the labels for form fields
II. RELATED WORK
The autofill feature in browsers [1-3] has been around for a long time, and a number of patents [4-6] also exist in this area. Liddle et al. [7] built an early prototype for extracting user data for auto form fill. They explored the feasibility of getting the relevant user information to enable autofill, experimenting with different queries. However, this solution was proposed before autofill was widely available in major web browsers, so it concentrates on feasibility rather than accuracy. Winckler et al. [8] proposed an autofill solution that gets user information from different sources of user data with varying levels of privacy. Hartmann and Muhlhauser [9] explored the feasibility of building a context-aware auto form fill solution, using a mapping between a context store and the user interface of the form to get more accurate labels. Wang et al. [10] analyzed web user interface components, using clustering to find semantically similar UIs to enable autofill, rather than using fixed labels as in major web browsers.
However, the above solutions did not focus on the practical issues encountered in a large-scale implementation. They also did not directly use machine learning to classify the labels of the form fields, which is the focus of this paper. In the following sections, we detail the steps of preparing the dataset and training the machine learning model.
III. DATASET PREPARATION
To train a machine learning model to predict the field labels in a web form, the first step is to generate a dataset. This is done by extracting features from the HTML of multiple web forms and manually providing a label for each of the fields. Common labels used in most web forms are name, address, username, password, age, etc.
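The feature extraction step can be sketched with Python's standard `html.parser` module. This is an illustration under assumptions, not the extraction code used to build the dataset: `FormFieldExtractor` and `extract_features` are hypothetical names, labels are matched only through `<label for=...>` elements, and hidden, submit and button inputs are skipped. In keeping with the privacy constraint from Section I, the field's `value` attribute is never read, so each record carries only the structural features used in this work (label, name, id and the page URL).

```python
from html.parser import HTMLParser

# Assumption: input types that never need a predicted label.
SKIPPED_INPUT_TYPES = {"hidden", "submit", "button", "reset", "image"}


class FormFieldExtractor(HTMLParser):
    """Collects the id, name and <label for=...> text of each form field."""

    def __init__(self):
        super().__init__()
        self.fields = []        # one {"id", "name"} dict per field, in document order
        self.labels = {}        # field id -> text of the <label> pointing at it
        self._label_for = None  # id targeted by the <label> currently open, if any
        self._label_text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("input", "select", "textarea"):
            if (attrs.get("type") or "").lower() in SKIPPED_INPUT_TYPES:
                return
            # Only structural attributes are kept. The "value" attribute, which
            # may contain the user's data, is deliberately never read.
            self.fields.append({"id": attrs.get("id") or "", "name": attrs.get("name") or ""})
        elif tag == "label":
            self._label_for = attrs.get("for")
            self._label_text = []

    def handle_data(self, data):
        if self._label_for is not None:
            self._label_text.append(data)

    def handle_endtag(self, tag):
        if tag == "label" and self._label_for is not None:
            # Collapse internal whitespace so "E-mail\n  address" becomes "E-mail address".
            self.labels[self._label_for] = " ".join("".join(self._label_text).split())
            self._label_for = None


def extract_features(html, url):
    """Return one {label, name, id, url} record per visible form field."""
    parser = FormFieldExtractor()
    parser.feed(html)
    parser.close()
    return [
        {"label": parser.labels.get(f["id"], ""), "name": f["name"], "id": f["id"], "url": url}
        for f in parser.fields
    ]
```

Each returned record would then receive a manually assigned label (email, password and so on) to become one training row. Labels are resolved after parsing, so a `<label>` that appears after its field is still matched.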
We used a crowdsourced method with human labelers to generate a dataset of around 4000 values from commonly used web forms. We use the label, name, id and URL as the input features. We performed some basic preprocessing, such as removing stop words. For each of the form field labels, we train a separate binary classifier, although one can also experiment with a multi-class classification approach.
To encode the input dataset as a feature vector, we used a one-hot encoding approach: we built a dictionary of the possible values of each feature, and each feature was encoded as a value in that dictionary. For example, we collected all the possible field names in our dataset (such as username, username_01, etc.) and built a feature vector from them. The final feature vector is built by concatenating the individual encodings of the different features. Fig. 4 shows a sample dataset for an email label predictor.
Fig. 4. A sample dataset for predicting whether the label is email. The last column is the label to be predicted, as a binary value.
IV. CLASSIFICATION USING AZURE ML STUDIO
Azure Machine Learning (ML) Studio [11] provides a simple, easy-to-use interface for training and deploying machine learning models. One simply drags and drops modules from a menu, which can also include custom code, and draws connections between the different modules.
Fig. 5. Interface for training a machine learning model using Azure ML Studio
Azure ML Studio provides features for ingesting data in different input formats such as CSV, for different types of input preprocessing, and for experimenting with different types of models, along with scoring and evaluation of models. Aside from ease of use, an additional advantage of the studio is that it can easily convert the model to a web service hosted on Azure and provide the APIs to call in order to get the result from the web service. That is why we preferred to use it.
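The one-hot encoding described in Section III, with one dictionary per feature and the per-feature encodings concatenated, does not depend on the training platform and can be sketched in plain Python. The stop-word list, the record keys (including `field_label` for the manually assigned label) and the helper names `normalize`, `build_vocabularies`, `encode` and `binary_target` are illustrative assumptions. Values not seen when the dictionaries were built are encoded as all zeros, which is one reasonable choice among several.

```python
# Hypothetical stop-word list; the paper does not specify which words were removed.
STOP_WORDS = {"your", "the", "a", "an", "please", "enter"}
FEATURES = ("label", "name", "id", "url")


def normalize(text):
    """Lower-case the text and drop stop words."""
    return " ".join(w for w in text.lower().split() if w not in STOP_WORDS)


def build_vocabularies(records):
    """Build one dictionary per feature, mapping each observed value to a position."""
    return {
        f: {value: i for i, value in enumerate(sorted({normalize(r[f]) for r in records}))}
        for f in FEATURES
    }


def encode(record, vocabs):
    """Concatenate the one-hot encodings of all features into one vector."""
    vector = []
    for f in FEATURES:
        one_hot = [0] * len(vocabs[f])
        position = vocabs[f].get(normalize(record[f]))
        if position is not None:  # values unseen when building vocabularies stay all zeros
            one_hot[position] = 1
        vector.extend(one_hot)
    return vector


def binary_target(record, positive_label="email"):
    """1 if the manually assigned label is the one this classifier predicts, else 0."""
    return 1 if record["field_label"] == positive_label else 0
```

With one binary classifier per label, the same encoded vectors are reused for every classifier; only the target produced by `binary_target` changes with `positive_label`.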
However, one may use any other machine learning platform and should obtain comparable results with similar models and parameters. Using Azure ML Studio, we trained a model on our dataset and experimented with different machine learning algorithms and parameters, trying to optimize the accuracy. Once we had obtained the desired level of accuracy for each of the labels (email classifier, state classifier, multi-class classifier, etc.), we exposed the model as a web service to be called from the web browser client using a browser extension. Fig. 5 gives the screenshot of the interface for a field label predictor using Azure ML Studio.
V. MODEL AND PRELIMINARY RESULTS
We configured a train:test ratio of 70:30 for our solution. After experimenting with different models, including linear regression, support vector machines and decision trees, we found that a multi-class decision forest (one of the off-the-shelf ML algorithms available in Azure ML Studio) gave the best results for our dataset. The parameters we used for the decision forest are as follows: resampling method = bagging, number of decision trees = 16, maximum depth = 100, random splits per node = 128, minimum samples per leaf node = 1. After tuning the model parameters to improve the accuracy, we obtained an overall precision of around 95% for the email classifier. We obtained comparable results for the multi-class classifier as well.
VI. CONCLUSION AND FUTURE WORK
In this paper, we discussed a machine learning based solution for the autofill feature of web browsers. We trained the model and implemented the system as a web service using Azure Machine Learning Studio, and obtained good results for our form field label classification (around 95% precision for the email classifier). In future, we intend to train our model on a bigger and more varied dataset.
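Returning to the evaluation in Section V, the protocol of a shuffled 70:30 train:test split scored by precision on the positive class of a binary classifier (such as the email classifier) can be sketched in plain Python. The helper names and the fixed shuffle seed are assumptions, and the decision forest itself is not reproduced here. Because the text reports precision after tuning for accuracy, the sketch computes both, since they can differ considerably on the same predictions.

```python
import random


def split_train_test(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and cut it into train and test parts (70:30 by default)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def precision(y_true, y_pred, positive=1):
    """TP / (TP + FP): the share of positive predictions that are correct."""
    true_positives = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    predicted_positives = sum(1 for p in y_pred if p == positive)
    return true_positives / predicted_positives if predicted_positives else 0.0


def accuracy(y_true, y_pred):
    """Share of all predictions that are correct, for comparison with precision."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
```

For example, with `y_true = [1, 0, 0, 0]` and `y_pred = [1, 1, 0, 0]`, the accuracy is 0.75 but the precision is 0.5.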
We also plan to experiment with a hybrid approach based on an ensemble of different approaches (lookup table, regular expressions and machine learning). Such a hybrid approach might be expected to work better in most cases than any of the individual approaches.
REFERENCES
[1] Fill out forms automatically. Google Chrome Help. [Online]. Available: https://support.google.com/chrome/answer/142893
[2] J. T. Kaines, "How to set up autofill in Google Chrome," Dummies. [Online]. Available: https://www.dummies.com/education/internet-basics/how-to-set-up-autofill-in-google-chrome/
[3] Automatically fill your address in web forms. Firefox Help. [Online]. Available: https://support.mozilla.org/en-US/kb/automatically-fill-your-address-web-forms
[4] J. Light and J. Garney, inventors; Intel Corp., assignee. Automatic web based form fill-in. United States patent US 6,192,380, 2001 Feb 20.
[5] S. Yolleck, E. Yang, D. Walters and D. Glasgow, inventors; Microsoft Corp., assignee. Method for a browser auto form fill. United States patent application US 11/053,217, 2006 Aug 10.
[6] J. T. Goodman, C. M. Kadie, D. M. Chickering, D. E. Bradford and D. A. Glasgow, inventors; Microsoft Corp., assignee. Intelligent autofill. United States patent US 7,254,569, 2007 Aug 7.
[7] S. W. Liddle, D. W. Embley, D. T. Scott and S. H. Yau, "Extracting data behind web forms," in International Conference on Conceptual Modeling, Springer, Berlin, Heidelberg, 2002 Oct 7, pp. 402-413.
[8] M. Winckler, V. Gaits, D.-B. Vo, S. Firmenich and G. Rossi, "An approach and tool support for assisting users to fill-in web forms with personal information," in Proceedings of the 29th ACM International Conference on Design of Communication (SIGDOC '11), ACM, New York, NY, USA, 2011, pp. 195-202.
[9] M. Hartmann and M. Muhlhauser, "Context-aware form filling for web applications," in Proceedings of the 2009 IEEE International Conference on Semantic Computing (ICSC '09), IEEE Computer Society, Washington, DC, USA, 2009.
[10] S. Wang, Y. Zou, B. Upadhyaya and J. Ng, "An intelligent framework for auto-filling web forms from different web applications," in 2013 IEEE Ninth World Congress on Services, IEEE, 2013 Jun 28, pp. 175-179.
[11] Azure Machine Learning. Microsoft Azure. [Online]. Available: https://azure.microsoft.com/en-in/services/machine-learning/