# Weibo

## Introduction

Weibo is a Twitter-like website in China. The data we are studying is a subgraph generated by star-sampling with 1M seeds.

## Dataset

The raw data is stored as JSON. Each record consists of a user ID, the user's profile as a JSON object, and a JSON object holding a list of user IDs (`ids`). It is formatted as follows:

```
1000010031 {"id":1000010031,"screen_name":"刘峰Arthur","name":"刘峰Arthur","province":"11","city":"5","location":"北京 朝阳区","description":"请给我一个美丽的名字,好让她在夜里可以低唤我,让我在奔驰的岁月里,记得我们相爱的事...","url":"","profile_image_url":"http://tp4.sinaimg.cn/1000010031/50/5620733478/1","domain":"arthurlf","gender":"m","followers_count":161,"friends_count":42,"statuses_count":104,"favourites_count":0,"created_at":"Sat Aug 13 00:00:00 +0800 2011","following":false,"allow_all_act_msg":true,"geo_enabled":true,"verified":false,"status":{"created_at":"Wed Jan 11 22:07:56 +0800 2012","id":3400772775784942,"text":"拖着疲惫的身躯仍丢不掉心中对爱的信仰,坚信你我只要有一颗宁静的心,能始终经得住生活的冲击...流风...挺住...[酷]","source":"<a href=\"http://weibo.com/mobile/iphone.php\" rel=\"nofollow\">iPhone客户端</a>","favorited":false,"truncated":false,"in_reply_to_status_id":"","in_reply_to_user_id":"","in_reply_to_screen_name":"","thumbnail_pic":"http://ww4.sinaimg.cn/thumbnail/3b9af12fjw1doyugws0b0j.jpg","bmiddle_pic":"http://ww4.sinaimg.cn/bmiddle/3b9af12fjw1doyugws0b0j.jpg","original_pic":"http://ww4.sinaimg.cn/large/3b9af12fjw1doyugws0b0j.jpg","geo":null,"mid":"3400772775784942","annotations":[{"server_ip":"10.73.19.157"}]}} {"ids":[1961307151,1784484997,2500538567,2476469867,1566399614,1767316885,1804207797,2036818020,1973555873,2135127421,1880426505,1466328827,1716642711,1774181370,1820803222,1919234703,2036534861,2043477285,1251974920,1717997681,1240827002,1919526085,1926940725,1811872815,1720905464,1751895117,2032362671,1211195393,1281742123,2142881192,2285672001,1830921181,2344575775,1784092934,1892528834,1731328944,1302344127,1780829445,1695700554,1074578341,1802486035,1342200407],"next_cursor":0,"previous_cursor":0}
...
```

http://zhang18f.myweb.cs.uwindsor.ca/datasets/weibo/blacknode_info-utf8.txt.gz

## Similarity Check

We calculated the pairwise Jaccard similarity between the top 10,000 nodes. The Jaccard similarity file can be found [here](http://datasets.zhang18f.myweb.cs.uwindsor.ca/weibo.js).

For a pair of nodes there is an expected Jaccard similarity that depends on their degrees. The plots below compare the expected Jaccard similarity with the Jaccard similarity observed in Sina Weibo:

![](http://datasets.zhang18f.myweb.cs.uwindsor.ca/zb-boxplot.png)

![](http://datasets.zhang18f.myweb.cs.uwindsor.ca/js.png)

![](http://datasets.zhang18f.myweb.cs.uwindsor.ca/js-log.png)

We also built a website that can query the Jaccard similarity in real time: [link](http://569.asxzy.net)

## Zombie Detection

While processing the pairwise Jaccard similarity, we found that the Jaccard similarity between some big nodes is abnormally high. We call the followers these nodes share zombies. Our goal is to find these zombies and show that they are indeed zombie accounts.

### Spammed Target

By analyzing the pairwise Jaccard similarity of the top 10,000 nodes, we were able to identify 395 spammed target nodes. [link](http://datasets.zhang18f.myweb.cs.uwindsor.ca/weibo.395.js)

### Attributes in Zombie Network

By calculating the pairwise Jaccard similarity of the top 10,000 nodes, we identified 395 groups of zombie networks.
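
The pairwise comparison described above can be sketched as follows. This is a minimal illustration, not the code used for the study: it assumes each node's followers are already available as a Python set of user IDs, and the names `jaccard` and `pairwise_jaccard` as well as the toy follower sets are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two follower sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def pairwise_jaccard(followers):
    """Return {(u, v): similarity} for every unordered pair of nodes."""
    return {
        (u, v): jaccard(followers[u], followers[v])
        for u, v in combinations(sorted(followers), 2)
    }


# Toy data (made up for illustration): node ID -> set of follower IDs.
followers = {
    1: {10, 11, 12, 13},
    2: {10, 11, 12, 14},
    3: {20, 21},
}
scores = pairwise_jaccard(followers)
```

With 10,000 nodes this loop visits about 50 million pairs, so a real run would likely need a more economical approach than plain Python sets held in memory.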
Here are the plots of the zombies' attributes:

#### created time ####

![](http://datasets.zhang18f.myweb.cs.uwindsor.ca/zb-created_at.png)

#### description ####

![](http://datasets.zhang18f.myweb.cs.uwindsor.ca/zb-description.png)

#### gender ####

![](http://datasets.zhang18f.myweb.cs.uwindsor.ca/zb-gender.png)

#### location ####

![](http://datasets.zhang18f.myweb.cs.uwindsor.ca/zb-location.png)

Here is the information on the 395 zombie groups:

| attribute | url |
|:---------------:|:---------------------------------------------------------------|
| zombie leaders | http://datasets.zhang18f.myweb.cs.uwindsor.ca/weibo.395 |
| zombie groups | http://datasets.zhang18f.myweb.cs.uwindsor.ca/weibo.395.cmt |
| zombie list | http://datasets.zhang18f.myweb.cs.uwindsor.ca/weibo.395.zb |
| all attributes | http://datasets.zhang18f.myweb.cs.uwindsor.ca/zb-showall.pdf |
| attribute list: created_at | http://datasets.zhang18f.myweb.cs.uwindsor.ca/zb-created_at.pdf |
| attribute list: description | http://datasets.zhang18f.myweb.cs.uwindsor.ca/zb-description.pdf |
| attribute list: gender | http://datasets.zhang18f.myweb.cs.uwindsor.ca/zb-gender.pdf |
| attribute list: location | http://datasets.zhang18f.myweb.cs.uwindsor.ca/zb-location.pdf |
| attribute list: degree distribution | http://datasets.zhang18f.myweb.cs.uwindsor.ca/zb-degree.pdf |

### Dating group

The users 爱约会美女 (1787728323), 爱约会帅哥 (1787709495) and 爱约会 (1786915491) have over 10,000 followers, and the Jaccard similarity between the three nodes is about 0.95. In other words, we believe that about 95% of the followers in that group are zombies.
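
One way to read a figure like 0.95 is as the share of the combined follower pool that follows every account in the group. The sketch below computes that share for any number of follower sets. It is an illustration under assumptions: the source does not say whether the reported value is a pairwise or a multi-set measure, and the function name `shared_fraction` and the toy sets are hypothetical.

```python
def shared_fraction(follower_sets):
    """Fraction of all distinct followers that follow every account."""
    sets = [set(s) for s in follower_sets]
    if not sets:
        return 0.0
    union = set().union(*sets)
    common = set.intersection(*sets)
    return len(common) / len(union) if union else 0.0


# Toy follower sets (made up): three accounts with a large overlap.
a = set(range(0, 100))
b = set(range(0, 98)) | {100}
c = set(range(0, 97)) | {101}
fraction = shared_fraction([a, b, c])  # 97 common followers out of 102
```

For two sets this reduces to the ordinary Jaccard similarity.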
The group data can be found here:

| attribute | url |
|:---------------:|:---------------------------------------------------------------|
| zombie list | http://datasets.zhang18f.myweb.cs.uwindsor.ca/aiyuehui.zb |
| edge list | http://datasets.zhang18f.myweb.cs.uwindsor.ca/aiyuehui.graph |
| attribute list: city | http://datasets.zhang18f.myweb.cs.uwindsor.ca/aiyuehui.zb.city |
| attribute list: created_at | http://datasets.zhang18f.myweb.cs.uwindsor.ca/aiyuehui.zb.created_at |
| attribute list: description | http://datasets.zhang18f.myweb.cs.uwindsor.ca/aiyuehui.zb.description |
| attribute list: followers_count | http://datasets.zhang18f.myweb.cs.uwindsor.ca/aiyuehui.zb.followers_count |
| attribute list: friends_count | http://datasets.zhang18f.myweb.cs.uwindsor.ca/aiyuehui.zb.friends_count |
| attribute list: gender | http://datasets.zhang18f.myweb.cs.uwindsor.ca/aiyuehui.zb.gender |
| attribute list: location | http://datasets.zhang18f.myweb.cs.uwindsor.ca/aiyuehui.zb.location |
| attribute list: name | http://datasets.zhang18f.myweb.cs.uwindsor.ca/aiyuehui.zb.name |
| attribute list: province | http://datasets.zhang18f.myweb.cs.uwindsor.ca/aiyuehui.zb.province |
| attribute list: screen_name | http://datasets.zhang18f.myweb.cs.uwindsor.ca/aiyuehui.zb.screen_name |
| attribute list: source | http://datasets.zhang18f.myweb.cs.uwindsor.ca/aiyuehui.zb.source |
| attribute list: text | http://datasets.zhang18f.myweb.cs.uwindsor.ca/aiyuehui.zb.text |

The plot of the graph can be found at: http://datasets.zhang18f.myweb.cs.uwindsor.ca/aiyuehui.pdf

![](http://datasets.zhang18f.myweb.cs.uwindsor.ca/aiyuehui.png)

### TianYi group

The users 中国电信物联网 (1961266125) and 中国电信协同通信 (2093070470) have over 3,000 followers, and the Jaccard similarity between the two nodes is about 0.95.
The tianyi data can be found here:

| attribute | url |
|:---------------:|:---------------------------------------------------------------|
| zombie list | http://datasets.zhang18f.myweb.cs.uwindsor.ca/tianyi.zb |
| edge list | http://datasets.zhang18f.myweb.cs.uwindsor.ca/tianyi.graph |
| attribute list: city | http://datasets.zhang18f.myweb.cs.uwindsor.ca/tianyi.zb.city |
| attribute list: created_at | http://datasets.zhang18f.myweb.cs.uwindsor.ca/tianyi.zb.created_at |
| attribute list: description | http://datasets.zhang18f.myweb.cs.uwindsor.ca/tianyi.zb.description |
| attribute list: followers_count | http://datasets.zhang18f.myweb.cs.uwindsor.ca/tianyi.zb.followers_count |
| attribute list: friends_count | http://datasets.zhang18f.myweb.cs.uwindsor.ca/tianyi.zb.friends_count |
| attribute list: gender | http://datasets.zhang18f.myweb.cs.uwindsor.ca/tianyi.zb.gender |
| attribute list: location | http://datasets.zhang18f.myweb.cs.uwindsor.ca/tianyi.zb.location |
| attribute list: name | http://datasets.zhang18f.myweb.cs.uwindsor.ca/tianyi.zb.name |
| attribute list: province | http://datasets.zhang18f.myweb.cs.uwindsor.ca/tianyi.zb.province |
| attribute list: screen_name | http://datasets.zhang18f.myweb.cs.uwindsor.ca/tianyi.zb.screen_name |
| attribute list: source | http://datasets.zhang18f.myweb.cs.uwindsor.ca/tianyi.zb.source |
| attribute list: text | http://datasets.zhang18f.myweb.cs.uwindsor.ca/tianyi.zb.text |

The plot of the graph can be found at: http://datasets.zhang18f.myweb.cs.uwindsor.ca/tianyi.pdf

![](http://datasets.zhang18f.myweb.cs.uwindsor.ca/tianyi.png)

### WenWanTianXia group

We believe that a website called 文玩天下 created a huge zombie network. In that network, some [users](http://datasets.zhang18f.myweb.cs.uwindsor.ca/wwtx) are linked to each other, and the Jaccard similarity among them is about 0.9.
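
Groups of mutually similar nodes like this one can be recovered from pairwise scores by linking every pair whose similarity reaches a threshold and then taking connected components. The source does not state how its groups were formed, so the sketch below is only one plausible approach. The threshold of 0.9, the function name `group_by_similarity`, and the toy scores are assumptions.

```python
def group_by_similarity(scores, threshold=0.9):
    """Group nodes linked by pairs with similarity >= threshold.

    `scores` maps (u, v) pairs to a similarity value. Uses union-find
    and returns only groups with at least two members.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (u, v), s in scores.items():
        ru, rv = find(u), find(v)
        if s >= threshold and ru != rv:
            parent[ru] = rv  # merge the two components

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return [g for g in groups.values() if len(g) > 1]


# Toy scores (made up): a, b, c are tightly linked; d and e are not.
toy_scores = {
    ("a", "b"): 0.95,
    ("b", "c"): 0.92,
    ("c", "d"): 0.10,
    ("d", "e"): 0.05,
}
groups = group_by_similarity(toy_scores)
```

Connected components are a loose criterion: a chain of high-similarity pairs ends up in one group even if its two ends are dissimilar, so a stricter rule may suit some data better.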
The wwtx data can be found here:

| attribute | url |
|:---------------:|:---------------------------------------------------------------|
| zombie list | http://datasets.zhang18f.myweb.cs.uwindsor.ca/wwtx.zb |
| edge list | http://datasets.zhang18f.myweb.cs.uwindsor.ca/wwtx.graph |
| attribute list: city | http://datasets.zhang18f.myweb.cs.uwindsor.ca/wwtx.zb.city |
| attribute list: created_at | http://datasets.zhang18f.myweb.cs.uwindsor.ca/wwtx.zb.created_at |
| attribute list: description | http://datasets.zhang18f.myweb.cs.uwindsor.ca/wwtx.zb.description |
| attribute list: followers_count | http://datasets.zhang18f.myweb.cs.uwindsor.ca/wwtx.zb.followers_count |
| attribute list: friends_count | http://datasets.zhang18f.myweb.cs.uwindsor.ca/wwtx.zb.friends_count |
| attribute list: gender | http://datasets.zhang18f.myweb.cs.uwindsor.ca/wwtx.zb.gender |
| attribute list: location | http://datasets.zhang18f.myweb.cs.uwindsor.ca/wwtx.zb.location |
| attribute list: name | http://datasets.zhang18f.myweb.cs.uwindsor.ca/wwtx.zb.name |
| attribute list: province | http://datasets.zhang18f.myweb.cs.uwindsor.ca/wwtx.zb.province |
| attribute list: screen_name | http://datasets.zhang18f.myweb.cs.uwindsor.ca/wwtx.zb.screen_name |
| attribute list: source | http://datasets.zhang18f.myweb.cs.uwindsor.ca/wwtx.zb.source |
| attribute list: text | http://datasets.zhang18f.myweb.cs.uwindsor.ca/wwtx.zb.text |

The plot of the graph can be found at: http://datasets.zhang18f.myweb.cs.uwindsor.ca/tianyi.pdf

![](http://datasets.zhang18f.myweb.cs.uwindsor.ca/wwtx.png)

### Unknown group

The data can be found here:

| attribute | url |
|:---------------:|:---------------------------------------------------------------|
| zombie list | http://datasets.zhang18f.myweb.cs.uwindsor.ca/unknown.zb |
| edge list | http://datasets.zhang18f.myweb.cs.uwindsor.ca/unknown.graph |
| attribute list: city | http://datasets.zhang18f.myweb.cs.uwindsor.ca/unknown.zb.city |
| attribute list: created_at | http://datasets.zhang18f.myweb.cs.uwindsor.ca/unknown.zb.created_at |
| attribute list: description | http://datasets.zhang18f.myweb.cs.uwindsor.ca/unknown.zb.description |
| attribute list: followers_count | http://datasets.zhang18f.myweb.cs.uwindsor.ca/unknown.zb.followers_count |
| attribute list: friends_count | http://datasets.zhang18f.myweb.cs.uwindsor.ca/unknown.zb.friends_count |
| attribute list: gender | http://datasets.zhang18f.myweb.cs.uwindsor.ca/unknown.zb.gender |
| attribute list: location | http://datasets.zhang18f.myweb.cs.uwindsor.ca/unknown.zb.location |
| attribute list: name | http://datasets.zhang18f.myweb.cs.uwindsor.ca/unknown.zb.name |
| attribute list: province | http://datasets.zhang18f.myweb.cs.uwindsor.ca/unknown.zb.province |
| attribute list: screen_name | http://datasets.zhang18f.myweb.cs.uwindsor.ca/unknown.zb.screen_name |
| attribute list: source | http://datasets.zhang18f.myweb.cs.uwindsor.ca/unknown.zb.source |
| attribute list: text | http://datasets.zhang18f.myweb.cs.uwindsor.ca/unknown.zb.text |