Approaches for user profile Investigation in Orkut Social Network

The Internet has become a large and rich repository of information about us as individuals. Anything from user profile information to the friend links a user subscribes to reflects the social interactions the user has in the real world. Social networking …

Authors: Rajni Ranjan Singh, Deepak Singh Tomar

Rajni Ranjan Singh, Deepak Singh Tomar
Maulana Azad National Institute of Technology (MANIT), Bhopal, India
ranjansingh06@gmail.com, deepaktomar@manit.ac.in

Abstract- The Internet has become a large and rich repository of information about us as individuals. Anything from user profile information to the friend links a user subscribes to reflects the social interactions the user has in the real world. Social networking has created new ways to communicate and share information. Social networking websites are used regularly by millions of people, and it now seems that social networking will be an enduring part of everyday life. Social networks such as Orkut, Bebo, MySpace, Flickr, Facebook, Friendster and LinkedIn have attracted millions of Internet users who are involved in blogging, participatory book reviewing, personal networking and photo sharing. Social network services are increasingly being used in legal and criminal investigations. Information posted on sites such as Orkut and Facebook has been used by police, probation, and university officials to prosecute users of those sites. In some situations, content posted on a web social network has been used in court. In the proposed work, the degree of closeness is identified by link weight approaches, and information matrices are generated and matched on the basis of similarity in user profile information. The proposed technique is useful for investigating a user profile and calculating the closeness/interaction between users.

Keywords- Social networks, similarity measure, user profile, web communities, link analysis.

I. INTRODUCTION

Orkut is one of the earliest and most famous web social networks. Run by Google, it plays an important role in communicating and sharing private and public information in the web environment, and it facilitates blogging (scrapping), personal networking, photo sharing, chatting, private messaging and friend search. An interesting aspect of the Orkut SNS is that a user can see not only other users' profile information but also their friends networks. Recent work has attempted to find communities of web pages by analysing their graph structure [1], to mine directed social networks from message boards [2], to evaluate similarity measures in Orkut networks [3], to discover the behaviour of Turkish people in Orkut [4], and to build trust-based recommendation systems. Our work focuses on individuals' homepages and the connections between them, which we can use to characterize relationships between people. Beyond developing the interface, we quantitatively evaluated the matchmaking approach for all kinds of information about the user. To predict whether one person is a friend of another, and how close the two are, we rank all users by their similarity to that person. Intuitively, our matchmaking approach guesses that the more similar a person is, the more likely they are to be a friend. Similarity is measured by analysing profile information, mutual friends, and mutual communities. To evaluate the likelihood that user A is linked to user B, we sum the number of items the two users have in common. Similarity between profiles reflects closeness and interaction between users. We considered only direct friends as candidates for matching: we compute the similarity score between an individual and each friend on a direct link, and rank the friends according to their similarity scores. We expect some friends to be more similar than others. The rest of the paper is organized as follows.
Section 2 gives background on Orkut networks, user profiles, and friends networks. Section 3 discusses approaches for user profile investigation. Section 4 illustrates the proposed framework for relation identification on the basis of profile similarity. Section 5 shows experimental results, Section 6 discusses challenges, Section 7 concludes, and Section 8 lists the references used in this paper.

II. BACKGROUND

A. Orkut: an Overview
Orkut, our topic of interest, is a free-access social networking service owned and operated by Google. The service is designed to help users meet new friends and maintain existing relationships. It is one of the most visited websites in India and Brazil.

B. Crime over Orkut
Orkut social networks are now targeted by criminals and terrorists to spread misinformation and plan attacks. Recently, cyber police have noticed many cases of abduction and terrorist activity; terrorists now use the Internet and tools such as e-mail, chat and social networking sites to plan attacks.

C. User profile
A user profile is an individual user's home page. It consists mainly of:
(1) Social, professional and personal information
(2) Links to other friends' profiles, called the friend list
(3) Communities
(4) Photograph of the user
(5) Scrapbook
(6) Photo album

(IJCSIS) International Journal of Computer Science and Information Security, Vol. 6, No. 2, 2009, p. 259. http://sites.google.com/site/ijcsis/ ISSN 1947-5500

TABLE 1 SITE FEATURES OF ORKUT SOCIAL NETWORKS
Site Feature | Orkut
Profiles | Publicly viewable profile
Advertisement | Yes
Interface | Simple and easy to understand
Chat | Yes, Google chat
Search | Yes, Google search
Customizable | No
Online/offline communication | (Orkut scrapbook)
Support multiple languages |
Friends rating and profile view | Yes, you can rate your friends

D. Friends Networks from User Profiles
The friends network of Orkut, our topic of study, has two varieties of accounts: users and communities.

TABLE 2 USER, COMMUNITY, LINK RELATIONSHIP
Start | End | Link Denotes
User | User | Trust or friendship
User | Community | Readership or subscribership
Community | User | Membership, posting access, maintainer
Community | Community | Obsolete

1) Friend list: A user is connected to friends by maintaining a friend list, which consists of links to all of the friends' profiles and reflects the user's social relations. A user explicitly adds a friend by accepting the friend's request.

Fig. 1 Friend list

2) Communities: A community is a group of user profiles that share a common interest. Anyone with an Orkut account can create a community on anything.

Fig. 2 Communities

III. APPROACHES FOR USER PROFILE INVESTIGATION

A) Identifying connection
This approach determines how two profiles are connected and how close they are. It is used to detect the connection between two or more suspicious profiles, that is, how two criminals are connected in the web environment. Users in a web social network are visualized as nodes, and links between users reflect relationships. A user is connected to friends by maintaining a friend list consisting of links to all of the friends' profiles. Our initial approach to link identification consisted of dividing friends network features into graph features [3].

Figure 3 Friends/neighbours

1. In-degree of u: popularity of the user
2. In-degree of v: popularity of the candidate
3. Out-degree of u: number of other friends besides the candidate; saturation of the friends list
4. Out-degree of v: number of existing friends of the candidate besides the user; correlates loosely with the likelihood of a reciprocal link
5. Number of mutual friends w such that u → w ∧ w → v
6. "Forward deleted distance": minimum alternative distance from u to v in the graph without the edge (u, v)
7. Backward distance from v to u in the graph

These were supplemented by interest-based features:
8. Number of mutual interests between u and v
9. Number of interests listed by u
10. Number of interests listed by v
11. Ratio of the number of mutual interests to the number listed by u
12. Ratio of the number of mutual interests to the number listed by v
13. Path length: number of links (edges) between u and v in a given path
14. Hop count: number of vertices (users) between u and v in a given path

1) Investigate a user profile to find strong connections to other profiles: extract friends' profiles that share the same city, education, college, etc.
2) Investigate the relationship link between two or more profiles (used to find the connection between suspicious profiles). W is the closeness/interaction level.

B) Closeness identification
W can be calculated by one of two methods:
1) On the basis of communication
2) On the basis of profile similarity

1) On the basis of communication
In a social network based upon online communication, the distance between individuals does not mean "geographical distance", because each person lives in a virtual world. Instead, distance can be considered "psychological distance", and this can be measured by the "influence" wielded among the members of the network. Consider the situation where an individual p has a great deal of influence on an individual q [2]. In this case, we can consider three types of relationship.
Case 1: p is close to q.
Case 2: q is close to p.
Case 3: p and q are close to each other.

Figure 4: A message chain of messages sent by three individuals.

2) On the basis of profile similarity
2.1) On the basis of contents uniqueness
2.2) On the basis of contents similarity

2.1) On the basis of contents uniqueness
Similarity is measured by analyzing text and links. To evaluate the likelihood that user A is linked to user B, we sum the number of items the two users have in common. Items that are unique to a few users are weighted more than commonly occurring items. The weighting scheme we use is the inverse log frequency of their occurrence. For example, if only two people mention an item, then the weight of that item is 1/log(2), or about 1.4 [1]. With this algorithm it is possible to evaluate each shared item type independently (i.e. links, mailing lists, text) or to combine them into a single likeness score.

2.2) On the basis of contents similarity
The proposed work focuses on contents-based similarity.

IV. PROPOSED WORK: SIMILARITY MEASUREMENT BASED ON CONTENTS SIMILARITY

Similarity between profiles reflects closeness and interaction between users. Similarity is measured from the profile information given by the user. The similarity measurement process consists of the following steps.

Fig. 5 Proposed architecture

A. Extraction of Orkut network
Online social networks are part of the Web, but their data representations are very different from general web pages. The web pages that describe an individual in an online social network are typically well structured, as they are usually generated automatically, unlike general web pages, which could be authored by any person. Therefore, we can be very certain about what pieces of data we can obtain after crawling a particular individual's web pages. In Orkut, links are undirected and link creation requires consent from the target.
At the time of the crawl, new users had to be invited by an existing user to join the system. Because Orkut does not export an API, we resort to HTML screen scraping to conduct the crawl. Furthermore, Orkut limits the rate at which a single IP address can download information and requires a logged-in account to browse the network. As a result, it took more than a month to crawl millions of users [5].

From the user profile we obtain the user's social, professional and personal information, such as religion, ethnicity, age, hometown, city, country, languages spoken, education and college/university: 68 profile fields in total. Besides this local information, there are usually links that we can use to trace the user's connections to others; these are hyperlinked to those friends' profile pages. Thus, by extracting these hyperlinks, we can construct the graph of connections between all the users in the social network. Every user profile and community in the Orkut social network is assigned a unique ID by which it is uniquely identified.

Examples:
Hyperlinks to friends' profiles:
http://www.orkut.co.in/Main#FriendsList.aspx?uid=12760208310579966367
http://www.orkut.co.in/Main#FriendsList.aspx?uid=13317425663991525398
http://www.orkut.co.in/Main#FriendsList.aspx?uid=13973160759338911611
http://www.orkut.co.in/Main#FriendsList.aspx?uid=14636742881815046201
http://www.orkut.co.in/Main#FriendsList.aspx?uid=17422035763385012687
Hyperlinks to communities:
http://www.orkut.co.in/Main#Community.aspx?cmm=18034370
http://www.orkut.co.in/Main#Community.aspx?cmm=8312468

Friends profile IDs:
12760208310579966367
13317425663991525398
13973160759338911611
14636742881815046201
17422035763385012687

Community IDs:
18034370
8312468

A web data extractor is incorporated to extract profiles from orkut.com. It extracts the text and links of a given Orkut profile.
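
The mapping from the example hyperlinks to the profile and community IDs listed above can be sketched as follows. This is a minimal illustration in Python, not the authors' extractor (which the paper describes as a manually run web data extractor followed by C pre-processing); the function name `extract_ids` and the two patterns are hypothetical and assume only the URL shapes shown in the examples.

```python
import re

# Hypothetical patterns: they assume the "uid=" and "cmm=" parameters
# seen in the example hyperlinks, nothing else about Orkut's URL scheme.
UID_PATTERN = re.compile(r"[?&]uid=(\d+)")
CMM_PATTERN = re.compile(r"[?&]cmm=(\d+)")

def extract_ids(links):
    """Split extracted hyperlinks into friend profile IDs and community IDs."""
    friend_ids, community_ids = [], []
    for link in links:
        uid = UID_PATTERN.search(link)
        cmm = CMM_PATTERN.search(link)
        if uid:
            friend_ids.append(uid.group(1))
        elif cmm:
            community_ids.append(cmm.group(1))
    return friend_ids, community_ids

links = [
    "http://www.orkut.co.in/Main#FriendsList.aspx?uid=12760208310579966367",
    "http://www.orkut.co.in/Main#Community.aspx?cmm=18034370",
]
friends, communities = extract_ids(links)
print(friends)      # ['12760208310579966367']
print(communities)  # ['18034370']
```

IDs are kept as strings here so that they can be compared and stored exactly as they appear in the links.
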
Text represents general profile information, and links represent social connections such as friends and communities. The web data extractor extracts the web page currently displayed in the browser. To extract friends' (neighbours') profiles, we have to open every individual profile in the browser (Internet Explorer) and run the web extractor program manually.

Figure 6 Orkut user profile extraction
Figure 7 Extracted text file, containing profile information
Figure 8 Extracted link information (friends' profile links, communities' links)

B. Pre-processing
Extracted data is in unstructured form. It cannot be used directly for relationship identification, so the data must be processed and converted into structured form. During resource extraction, a separate file containing text and link information is created for every Orkut profile. C code based on a token-searching approach has been developed; it takes the extracted file as input and generates output consisting of:
1) Friends' profile IDs
2) Community IDs
3) The user's general information: social, professional, educational, interest and contact

1) Extraction of valuable information: Not all information on the user's homepage is required for similarity measurement, so only the relevant information is extracted. Out of 68 fields, 20 fields are considered for similarity measurement.

TABLE 3 EXTRACTED PROFILE INFORMATION
Social: Gender, Relationship status, Languages spoken, Ethnicity, Religion, Smoking, Drinking, Sports, Hometown, Activity, City, Zip/postal code, State, Country
Professional: Education, College/University, Degree, Occupation, Industry, Company

2) Categorization: Extracted information is classified according to its characteristics.

Figure 5 Categorised Information

C. Matching process
Categorised fields are used for similarity measurement. In this proposed work, similarity is measured between a user profile and the friends that are directly linked/hyperlinked from the user's homepage. So we calculate the similarity weight between a source profile and its friends network.

1) Weight Assignment: Some profile fields are better predictors of connection than others. For example, if two people mention the same city or work in the same organization, this shows a stronger connection than between people who mention different cities. For professional information, if two people study at the same university/college, the college/university field weight should always be higher than that of the education and degree fields. So, for better prediction of relationships, different weights are assigned to different profile fields. Two methods are proposed to decide the weights of the fields:
• Binary weight assignment
• Weight on the basis of hierarchy

1.1) Binary Weight Assignment: A binary weight is assigned to every field.

Professional & Edu. | Abbreviation | Weight
Education | W_ed | 1/0
Degree | W_d | 1/0
College/University | W_cu | 1/0
Industry | W_in | 1/0
Occupation | W_cy | 1/0
Company | W_oc | 1/0

Contact Info. | Abbreviation | Weight
Home Town | W_t | 1/0
PIN/Postal Code | W_p | 1/0
City | W_c | 1/0
State | W_s | 1/0
Country | W_co | 1/0

Figure 6 Binary Weight Assignments

A binary weight (0 or 1) is assigned to every field, and these field weights are multiplied with the matching result discussed in the next section. Binary weights can be used to mask the result (to show something or to hide something).

1.2) Hierarchy based weight assignment

Figure 9: Contact information hierarchy
Figure 10: Educational information hierarchy
Figure 11: Professional information hierarchy
Weight is assigned according to the position in the hierarchy.

Contact Info. | Weight
Home Town | 3
PIN/Postal Code | 4
City | 3
State | 2
Country | 1

Job description | Weight
Industry | 1
Occupation | 2
Company/organization | 3

Educational Info. | Weight
Education | 1
Degree | 2
College/University | 3

Figure 12 Hierarchy Based Weight Assignment

Some categories, such as interest and personal, do not form a hierarchy, whereas others, such as contact, professional and educational, do. For the latter, the weight is assigned according to the position in the hierarchy.

Binary weights for the personal and interest fields (continuation of the binary weight assignment tables):

Personal Info. | Abbreviation | Weight
Gender | W_g | 1/0
Language | W_la | 1/0
Religion | W_re | 1/0
Ethnicity | W_et | 1/0
Relationship status | W_rs | 1/0

Interest | Abbreviation | Weight
Sports | W_sp | 1/0
Activities | W_ac | 1/0
Smoking | W_sm | 1/0
Drinking | W_dr | 1/0

2) Matching Matrix/Table Creation: A separate matching matrix is created for every category of information.

TABLE 4 GENERAL MATRIX FORMAT
Friends | Field 1 | Field 2 | Field N | Total Weight
f1 | 0/1 | 0/1 | 0/1 |
f2 | 0/1 | 0/1 | 0/1 |
… | … | … | … |
fn | 0/1 | 0/1 | 0/1 |
1: match, 0: no match

Table 4 shows the general format of the matching matrix. Columns specify the fields and rows specify each friend's matching result, in the form 0/1 (1 for a perfect match, 0 for no match).

Figure 13 User connections

f1, f2, …, fn are friends of U. U is the source profile from which we perform matching.

TABLE 5 CONTACT INFO MATCHING MATRIX/TABLE
Friends | Home town | Pin code | City | State | Country | Total Weight
f1 | 0/1 | 0/1 | 0/1 | 0/1 | 0/1 | Wc1
f2 | 0/1 | 0/1 | 0/1 | 0/1 | 0/1 | Wc2
… | … | … | … | … | … | …
fn | 0/1 | 0/1 | 0/1 | 0/1 | 0/1 | Wcn
Multiplied by (×) | W_t | W_p | W_c | W_s | W_co |

W_t, W_p, W_c, W_s, W_co are the weights of hometown, pin code, city, state and country respectively, and these weights are multiplied with the matching result. Wc1, Wc2, …, Wcn are the total matching scores.
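
As a compact restatement of the contact matching table, the total contact score of friend f_i can be written as a weighted sum of the 0/1 matching results. The symbol m_{ik} (the matching result of friend f_i on field k) is notation introduced here for illustration only; it does not appear in the original tables. The worked line assumes a hypothetical friend who matches U on city, state and country but not on hometown or pin code.

```latex
\[
  Wc_i \;=\; \sum_{k \in \{t,\,p,\,c,\,s,\,co\}} W_k \, m_{ik},
  \qquad m_{ik} \in \{0, 1\}
\]
% Worked example with the hierarchy-based contact weights
% (W_t = 3, W_p = 4, W_c = 3, W_s = 2, W_co = 1):
\[
  Wc_i \;=\; 3 \cdot 0 + 4 \cdot 0 + 3 \cdot 1 + 2 \cdot 1 + 1 \cdot 1 \;=\; 6
\]
```

With binary weights (every W_k = 1) the same friend would score 3, which illustrates how the hierarchy-based scheme rewards matches on the more specific fields.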
The matching matrix specifies the similarity between U and their friends according to the similarity in contact information.

TABLE 6 EDUCATIONAL & PROFESSIONAL INFO. MATCHING MATRIX/TABLE
Friends | Education | Degree | College | Industry | Occupation | Company | Total Weight
f1 | 0/1 | 0/1 | 0/1 | 0/1 | 0/1 | 0/1 | Wp1
f2 | 0/1 | 0/1 | 0/1 | 0/1 | 0/1 | 0/1 | Wp2
… | … | … | … | … | … | … | …
fn | 0/1 | 0/1 | 0/1 | 0/1 | 0/1 | 0/1 | Wpn
Multiplied by (×) | W_ed | W_d | W_cu | W_in | W_cy | W_oc |

W_ed, W_d, W_cu, W_in, W_cy, W_oc are the weights of the fields education, degree, college/university, industry, occupation and company respectively, and are multiplied with the matching result (1/0). Wp1, Wp2, …, Wpn are the total matching scores. The matching matrix specifies the similarity between U and their friends according to the similarity in professional and educational information.

TABLE 7 PERSONAL INFORMATION MATCHING MATRIX/TABLE
Friends | Gender | Language | Ethnicity | Religion | Relationship status | Total Weight
f1 | 0/1 | 0/1 | 0/1 | 0/1 | 0/1 | Wpe1
f2 | 0/1 | 0/1 | 0/1 | 0/1 | 0/1 | Wpe2
… | … | … | … | … | … | …
fn | 0/1 | 0/1 | 0/1 | 0/1 | 0/1 | Wpen
Multiplied by (×) | W_g | W_la | W_et | W_re | W_rs |

W_g, W_la, W_et, W_re, W_rs are the weights of the fields gender, language, ethnicity, religion and relationship status respectively, and are multiplied with the matching result (1/0). Wpe1, Wpe2, …, Wpen are the total matching scores of friends f1, f2, …, fn respectively with U. The matching matrix specifies the similarity between U and their friends according to the similarity in personal information.

TABLE 8 INTEREST INFORMATION MATCHING MATRIX
Friends | Sports | Smoking | Drinking | Activities | Total Weight
f1 | 0/1 | 0/1 | 0/1 | 0/1 | Wi1
f2 | 0/1 | 0/1 | 0/1 | 0/1 | Wi2
… | … | … | … | … | …
fn | 0/1 | 0/1 | 0/1 | 0/1 | Win
Multiplied by (×) | W_sp | W_sm | W_dr | W_ac |

W_sp, W_sm, W_dr, W_ac are the weights of the fields sports, smoking, drinking and activities respectively, and are multiplied with the matching result (1/0). Wi1, Wi2, …, Win are the total matching scores of friends f1, f2, …, fn respectively with U.
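
The per-category matching described by the matching matrices can be sketched in Python as follows. This is an illustrative sketch, not the authors' C implementation: the field keys, the dictionary-based profile representation, the function names, and the choice to compare values case-insensitively are all assumptions made here. Fields without an explicit weight default to a binary weight of 1.

```python
# Field lists follow the four matching tables; the key names are hypothetical.
CATEGORIES = {
    "contact": ["hometown", "pin_code", "city", "state", "country"],
    "professional": ["education", "degree", "college", "industry",
                     "occupation", "company"],
    "personal": ["gender", "language", "ethnicity", "religion",
                 "relationship_status"],
    "interest": ["sports", "smoking", "drinking", "activities"],
}

def match_row(base, friend, fields):
    """Return one 0/1 row of a matching matrix.

    A field scores 1 only when both profiles give the same non-empty value
    (compared case-insensitively, which is an assumption of this sketch).
    """
    row = []
    for field in fields:
        a = base.get(field, "").strip().lower()
        b = friend.get(field, "").strip().lower()
        row.append(1 if a and a == b else 0)
    return row

def category_scores(base, friend, weights=None):
    """Multiply each 0/1 match by its field weight and sum per category."""
    weights = weights or {}
    scores = {}
    for category, fields in CATEGORIES.items():
        row = match_row(base, friend, fields)
        scores[category] = sum(m * weights.get(f, 1) for m, f in zip(row, fields))
    return scores

# Made-up profiles for illustration only.
base = {"city": "Bhopal", "state": "Madhya Pradesh", "country": "India",
        "degree": "M Tech", "gender": "male"}
friend = {"city": "bhopal", "state": "Madhya Pradesh", "country": "India",
          "degree": "BE", "gender": "male"}

print(category_scores(base, friend))
# {'contact': 3, 'professional': 0, 'personal': 1, 'interest': 0}
```

Passing the hierarchy-based contact weights (hometown 3, pin code 4, city 3, state 2, country 1) as `weights` raises the contact score of this example friend from 3 to 6, since the three matching fields then contribute 3 + 2 + 1.
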
The matching matrix specifies the similarity between U and their friends according to the similarity in user interest information.

3) Weight of mutual friends and mutual communities

1) Mutual friends: These specify the mutual social connections between users. When two friends have a greater number of mutual friends, they form a mutual social network. So, to find closeness, the weight of mutual friends is added to the resultant similarity score.
Number of mutual friends w such that u → w ∧ w → fi
U: base profile; fi: friends' profiles.
Mfi: mutual friends weight between u and fi.
WAF: weight adjustment factor, which decides the upper limit of the weight; if WAF is 10, the weight of mutual friends cannot exceed 10.

2) Mutual communities: These specify the mutual interests between users. A user may belong to any number of communities that reflect the user's interests. When two friends have a greater number of mutual communities, they are closer according to shared interest. So the weight of mutual communities is added to the resultant weight.
U: base profile; fi: friends' profiles.
Mci: mutual communities weight of user i.
Mutual communities (u, fi) = mutual communities between u and fi
WAF: weight adjustment factor, which defines the upper limit of the weight; if WAF is 10, the weight of mutual communities cannot exceed 10.

D. Visual result Correlation

Fig. 14 Resultant Similarity Weight

Wc1, Wc2, …, Wcn: contact information similarity score.
Wp1, Wp2, …, Wpn: professional similarity matching score.
Wpe1, Wpe2, …, Wpen: personal info. similarity matching score.
Wi1, Wi2, …, Win: interest matching score.
Mf1, Mf2, …, Mfn: mutual friends weight.
Mc1, Mc2, …, Mcn: mutual communities weight.
TW1, TW2, …, TWn: total similarity weight between U and f1, f2, …, fn respectively.

TW1 = Wc1 + Wi1 + Wp1 + Wpe1 + Mf1 + Mc1
TW2 = Wc2 + Wi2 + Wp2 + Wpe2 + Mf2 + Mc2
TWn = Wcn + Win + Wpn + Wpen + Mfn + Mcn

V. EXPERIMENT RESULTS

Data is gathered from www.orkut.com. In this work, 9 Orkut profiles are used as sample data, extracted using the web data extractor. Pre-processing and matching are performed by a pre-processing engine developed in the C language (using file handling and string functions). A token-searching approach is used to extract useful patterns. In this experiment, matching is performed between a base profile (e.g. Rajni Ranjan Singh) and its neighbours' profiles. One-to-many profile matching is performed.

Result matrix: binary weight assignment
Result matrix: on the basis of hierarchy
Chart I: Similarity score using binary weight assignment
Chart II: Similarity score using hierarchical weight assignment

Two result matrices have been generated, one using binary weights and the other using hierarchical weight assignment. The total similarity score is the successive addition of each categorized matching result. As shown in Chart I, Nilesh's profile is closest to the base profile (Rajni Ranjan's profile), since the personal, contact and mutual communities fields are more similar to the base profile. In Chart II, Devvrit's profile is closest to the base profile, with contact information playing an important role.

VI. CHALLENGES

A) Orkut privacy issues and extraction of meaningful patterns
Of course, not all users provide their social, professional, personal and interest information, and even if they did, privacy settings may prevent us from viewing their profile contents.
This availability of data was not a huge problem, but it could potentially skew our ability to extract meaningful patterns. The bigger issue, however, is the natural language processing problem: we are ultimately interested in the meaning behind the words. If different users list "Madhya Pradesh" and "M.P.", or in the College/University field list "MANIT", "Maulana Azad NIT" and "NITB", it is hard for us to realize that these are the same state and the same college. Moreover, this is only a syntactic issue; more interesting situations arise when different users list an interest (Activities) as "jogging through the streets," "jogging," "jogging!," and "jogging, but only on a treadmill." Even for a human it can be difficult to determine how specific we should be when classifying. Last, even if we only look for keywords, there could be errors: "school" and "anything but school" are clearly not the same. To reduce these problems, we try to work on data that comes from fields containing only a predefined set of values, in programming terms data taken from combo boxes, such as country, religion, ethnicity, smoking, drinking, gender, relationship status, education, degree, industry and languages spoken. Web social networks are dynamic by nature: a user may add more friends, join many communities and change profile contents, so the similarity score may change over time.

VII. CONCLUSION

This paper aims to answer the question: are social links valid indicators of real user interaction? Profile-based similarity indicates the relationship between users.
Similarity between two user profiles on Orkut is measured on the basis of social, geographical, educational and professional information, shared interests (including mutual communities) and mutual social connections (mutual friends). The measured similarity score may be used as a measure of trust between users. Profile-based similarity measurement is useful for investigating user profiles. Similarity between user profiles reflects closeness and interaction between users.

VIII. REFERENCES

[1] L. A. Adamic and E. Adar, "Friends and Neighbors on the Web," Social Networks, Vol. 25, 2003, pp. 211-230.
[2] Naohiro Matsumura (Osaka University), David E. Goldberg (UIUC) and Xavier Llorà (UIUC), "Mining Directed Social Network from Message Board," WWW 2005, May 10-14, 2005, Chiba, Japan. ACM 1595930515/05/0005.
[3] William H. Hsu, Joseph Lancaster, Martin S. R. Paradesi and Tim Weninger, "Structural Link Analysis from User Profiles and Friends Networks: A Feature Construction Approach," ICWSM'2007, Boulder, Colorado, USA.
[4] Alberto Ochoa-Zezzatti, Javier Martínez, Alberto Hernández and Jaime Muñoz (UAIE UAZ / Instituto Tecnológico de León, Maestría en Sistemas Inteligentes), "Discover Behaviour of Turkish People in Orkut," 17th International Conference on Electronics, Communications and Computers (CONIELECOMP'07), 0-7695-2799-X/07, 2007 IEEE.
[5] Alan Mislove, Massimiliano Marcon, Krishna P. Gummadi, Peter Druschel and Bobby Bhattacharjee, "Measurement and Analysis of Online Social Networks," IMC'07, October 24-26, 2007, San Diego, California, USA. Copyright 2007 ACM 978-1-59593-908-1/07/0010.
[6] Masahiro Hamasaki and Yutaka Matsuo, "An Integrated Method for Social Network Extraction," WWW 2006, May 23-26, 2006. ACM 1-59593-332-9/06/0005.
[7] Jennifer Golbeck, "Trust and Nuanced Profile Similarity in Online Social Networks," University of Maryland, College Park, 8400 Baltimore Avenue, Suite 200, College Park, Maryland 20740. golbeck@cs.umd.edu
[8] Deepak Singh Tomar, SC Shrivastava and Rajni Ranjan Singh, "A Framework to Identify Relationship among User Profile in Web Social Environment," 2nd National Conference on Emerging Principal and Practices of Computer Science and Information Technology (EPPCSIT 2009), Ludhiana. ISBN: 81-89652-36-6.

Authors' profile:
Mr. Rajni Ranjan Singh received his M Tech in Information Security from Maulana Azad National Institute of Technology, Bhopal, India in 2009. He obtained his BE in Computer Science and Engineering from SATI Vidisha in 2006. His research activities are based on system and network security and web technology.
Mr. Deepak Singh Tomar is working as an Assistant Professor in the Department of Computer Science at MANIT Bhopal. His research activities are based on digital forensics, data mining and network security.
