- 12) Visualize the clusters with a force graph on lightning-viz.
- Create the input data for lightning-viz:
- a list of group memberships (the populations)
- a list of people (the sample IDs)
- a nested list of graph links, with each person linked to their predicted cluster
- >
- %python
- #prepare our data into a suitable format for our viz
- from pyspark.sql.functions import row_number  # named rowNumber before Spark 1.6
- from pyspark.sql.window import Window
- #ensure that our data arrays come out in the same order
- df = sqlContext.sql("select sample, popcode, prediction from final_results_table").coalesce(1)
- # row_number needs an ordered window; ordering by sample is an assumption, any stable column works
- w = Window.orderBy("sample")
- df = df.withColumn("rownumber", row_number().over(w))
- pop = df.select("popcode", "rownumber").collect()
- peeps = df.select("sample", "rownumber").collect()
- preds= df.select("prediction", "rownumber").collect()
- pop = [(str(x),str(y)) for (x,y) in pop]
- peeps = [(str(x),str(y)) for (x,y) in peeps]
- preds = [( x, str(y), 1) for (x,y) in preds]
- def getKey(item):
-     return item[1]
- g = sorted(pop, key=getKey)
- l = sorted(peeps, key=getKey)
- pr = sorted(preds, key=getKey)
- # add 3 points for our force graph centers
- groups = ["0","1","2"] + [x[0] for x in g]
- labels = ["0","1","2"] + [x[0] for x in l]
- predictions = [[x[0],x[2]] for x in pr]
- listIndices = list(range(3, len(predictions) + 3))
- # prepend each person's node index so every link reads [person, cluster, weight]
- for i, sublist in enumerate(predictions):
-     sublist.insert(0, listIndices[i])
- >
- %python
- #create the viz
- from lightning import Lightning
- lgn = Lightning(host='http://public.lightning-viz.org')
- lgn.create_session("new")
- viz = lgn.force(predictions, group=groups, labels=labels)
- viz.get_public_link()
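To make the shape of the three inputs easier to see, here is a minimal sketch in plain Python with no Spark or Lightning server needed. The rows, the sample IDs, the population codes and the helper name `build_force_inputs` are all made up for illustration; they are not taken from `final_results_table`. The sketch assumes three clusters, matching the three center nodes added in the code above.

```python
# Toy rows standing in for final_results_table: (sample, popcode, prediction).
# These values are invented for illustration only.
rows = [
    ("S1", "GBR", 0),
    ("S2", "ASW", 1),
    ("S3", "CHB", 2),
    ("S4", "GBR", 0),
]
n_clusters = 3  # assumption: one center node per predicted cluster


def build_force_inputs(rows, n_clusters):
    """Return (links, groups, labels) with the cluster-center nodes placed first."""
    centers = [str(c) for c in range(n_clusters)]
    groups = centers + [pop for (_, pop, _) in rows]
    labels = centers + [sample for (sample, _, _) in rows]
    # Person i sits at node index i + n_clusters and links to its cluster node with weight 1.
    links = [[i + n_clusters, pred, 1] for i, (_, _, pred) in enumerate(rows)]
    return links, groups, labels


links, groups, labels = build_force_inputs(rows, n_clusters)
print(links)   # [[3, 0, 1], [4, 1, 1], [5, 2, 1], [6, 0, 1]]
print(groups)  # ['0', '1', '2', 'GBR', 'ASW', 'CHB', 'GBR']
print(labels)  # ['0', '1', '2', 'S1', 'S2', 'S3', 'S4']
```

Each link is `[person node, cluster node, weight]`, and `groups` and `labels` must have one entry per node in the same order, which is why the three center entries are prepended to both lists.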