Featured Cloud Server Content

  • Preparing the data
    First, Enterprise A and big data vendor B need to agree on the scope of the data each will provide and on the corresponding metadata. The two parties initially decide to use the existing user conversion data from the last three months as the training and evaluation sets for federated training, and then to use the new data generated each week as the prediction set for federated prediction.

    Table 1 Enterprise A's data
    Field name    Field type    Description
    id            string        Hashed mobile phone number string
    col0-col4     float         Enterprise A's data features
    label         int           Label that Enterprise A assigns to the user

    industry_all.csv
    id,col0,col1,col2,col3,col4,label
    5feceb66ffc86f38d952786c6d696c79c2dbc239dd4e91b46729d73a27fb57e9,-1.4092505981594734,-0.5893679205612337,-4.467396692737264,1.370376187747878,-1.236832500268279,1
    6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b,-1.5143756509526236,-1.9007475942180778,-5.617412558508785,2.2624690030531363,0.2886799132470795,0
    d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35,-1.768367116508903,1.2721845837988317,1.1497337351126178,-1.3322677230347135,0.9716103319957519,1
    4b227777d4dd1fc61c6f884f48641d02b4d121d3fd328cb08b5531fcacdabf8a,0.37260755643902965,-0.2919401803207504,0.08086265459068624,0.3915016044811785,-0.01227642831882032,1
    ef2d127de37b942baad06145e54b0c619a1f22327b2ebbcfbec78f5564afe39d,-2.963183239713765,0.15113195842028704,-3.8749664899828824,1.0598464836794779,-4.400883309764479,1
    e7f6c011776e8db7cd330b54174fd76f7d0216b612387a5ffcfb81e6f0919683,-0.35120767987472346,1.8018318746365054,1.4431627055321963,0.33307198119824927,0.8626132267902704,0
    7902699be42c8a8e46fbbb4501726517e86b22c56a189f7625a6da49081b2451,-2.6642415757243825,0.8836647864509011,-1.2340786744195096,-1.4945873871135977,-2.6999504889710626,1
    2c624232cdd221771294dfbb310aca000a0df6ac8b66b696d90ef06fdefb64a3,3.0418810956792526,-0.6516843409674193,3.6616499550343105,0.035548733627266224,3.477873903864847,0
    19581e27de7ced00ff1ce50b2047e7a567c76b1cbaebabe5ef03f7c3017bb5b7,-0.8239137547429756,0.7877120377027675,0.4296355963569869,-1.315646485980162,-1.652321610851379,1
    4a44dc15364204a80fe80e9039455cc1608281820fe2b24f1e5233ade6af1dd5,0.24150521920304757,-0.21911471888817458,1.5143874504690156,-0.6652345113435701,0.17857570592695637,0
    6b51d431df5d7f141cbececcf79edf3dd861c3b4069f0b11661a3eefacbba918,0.9669487046029339,-1.5427187535294289,2.490658334326762,0.4233920429380765,2.972622142213776,0
    3fdba35f04dc8c462986c992bcf875546257113072a909c162f7e470e581e278,-1.847252571492643,0.4969814473631169,1.6544165211185982,-1.9450069019776826,0.39415199332185435,1
    8527a891e224136950ff32ca212b45bc93f69fbb801c3b1ebedac52775f99e61,0.1622108420432964,0.1771676208189943,4.55368226430978,-1.1032207991089722,2.375621631048501,0
    e629fa6598d732768f7c726b4b621285f9c3b85303900aa912017db7617d8bdb,4.0527809556953,1.2053939486734313,3.260708709473611,1.1400990661834884,5.025657734758696,0
    b17ef6d19c7a5b1ee83b907c595526dcb1eb06db8227d650d5dda0a9f4ce8cd9,-0.21563539406333465,0.5231489445682316,-2.639937297036372,2.3738020768486425,0.34341393069722226,1
    4523540f1504cd17100c4835e85b7eefd49911580f8efff0599a8f283be6b9e3,-0.5935568930535046,-0.35175055806960276,0.9645122559090376,-0.017390131639078914,0.09256256476781644,1
    4ec9599fc203d176a301536c2e091a19bc852759b255bd6818810a42c5fed14a,1.0066513658973761,-0.9724037855292317,1.314115256428494,0.363296291355055,5.171128738363806,0
    9400f1b21cb527d7fa3d3eabba93557a18ebe7a2ca4e471cfe5e4c5b4ca7f767,0.1406977237605178,-1.455646778048175,-0.7223212422509906,1.265951206785454,-0.5504387433588089,1

    Table 2 Big data vendor B's data
    Field name    Field type    Description
    id            string        Hashed mobile phone number string
    f0-f4         float         Big data vendor B's data features

    bigdata_all.csv
    id,f0,f1,f2,f3,f4
    5feceb66ffc86f38d952786c6d696c79c2dbc239dd4e91b46729d73a27fb57e9,-0.246852445,-1.761531756,-2.840375975,-0.562750693,-2.23499737
    6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b,-1.216062821,-1.093614452,-1.632396806,0.887601314,-4.40930101
    4e07408562bedb8b60ce05c1decfe3ad16b72230967de01f640b7e4729b49fce,-0.150047899,-1.323266508,3.01679156,1.728583156,0.656158732
    4b227777d4dd1fc61c6f884f48641d02b4d121d3fd328cb08b5531fcacdabf8a,-0.333871414,-1.21968931,-0.082894791,0.020390259,-0.076884947
    ef2d127de37b942baad06145e54b0c619a1f22327b2ebbcfbec78f5564afe39d,-2.438861166,0.111880807,-3.51428545,1.123004835,0.228893969
    e7f6c011776e8db7cd330b54174fd76f7d0216b612387a5ffcfb81e6f0919683,-2.759963795,0.405262468,1.264947591,1.027350049,1.293868423
    7902699be42c8a8e46fbbb4501726517e86b22c56a189f7625a6da49081b2451,0.189352371,-0.607297495,-0.808339321,2.048455567,1.303872778
    2c624232cdd221771294dfbb310aca000a0df6ac8b66b696d90ef06fdefb64a3,0.390064223,0.664175034,3.20228741,0.380574513,0.017733811
    19581e27de7ced00ff1ce50b2047e7a567c76b1cbaebabe5ef03f7c3017bb5b7,0.379250902,1.962293246,0.066277661,3.083228267,1.952626328
    4a44dc15364204a80fe80e9039455cc1608281820fe2b24f1e5233ade6af1dd5,-0.070919538,-2.219653517,1.461645551,1.66185096,0.778770954
    4fc82b26aecb47d2868c4efbe3581732a3e7cbcc6c2efb32062c08170a05eeb8,-0.771151327,-1.184821181,-0.674077615,-0.379858223,0.158957184
    6b51d431df5d7f141cbececcf79edf3dd861c3b4069f0b11661a3eefacbba918,-0.738091802,-1.474822882,2.93475295,-3.763763721,-1.817301398
    3fdba35f04dc8c462986c992bcf875546257113072a909c162f7e470e581e278,-0.483250226,0.616586578,3.001851708,2.407914633,0.856369412
    8527a891e224136950ff32ca212b45bc93f69fbb801c3b1ebedac52775f99e61,-0.789268594,1.071733834,3.763254446,-3.760298263,0.49776472
    e629fa6598d732768f7c726b4b621285f9c3b85303900aa912017db7617d8bdb,-0.372531118,1.559382514,2.403559204,-0.041093457,0.169341125
    b17ef6d19c7a5b1ee83b907c595526dcb1eb06db8227d650d5dda0a9f4ce8cd9,-2.773477116,-1.137653133,-1.50133841,0.82842642,-1.25476711
    4523540f1504cd17100c4835e85b7eefd49911580f8efff0599a8f283be6b9e3,-1.542814756,1.019110477,1.395515599,0.539956076,0.100325065
    4ec9599fc203d176a301536c2e091a19bc852759b255bd6818810a42c5fed14a,0.024227451,-1.087235302,3.67470964,-2.420729037,-3.132456573

    To keep the data secure, Enterprise A and big data vendor B agreed to use the hashed mobile phone number as the unique identifier (the id field) of the existing data, and to use that identifier as the key for aligning the two datasets.

    Parent topic: Federated Modeling with TICS Trusted Federated Learning
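As a rough local illustration of how a hashed id can serve as the alignment key, the sketch below hashes raw identifiers and keeps only the ids present on both sides. Several things here are assumptions and not taken from the TICS documentation: the hash function is assumed to be SHA-256 (the sample ids are 64-character hex strings, which is consistent with it, but the source does not name the function), `hash_id` and `align_ids` are hypothetical helper names, and the placeholder identifiers "0" to "4" stand in for phone numbers. A plain set intersection like this is not the privacy-preserving alignment a federated platform would run; it only shows the idea of matching on a shared hashed key.

```python
import hashlib


def hash_id(raw_identifier: str) -> str:
    """Return the SHA-256 hex digest of a raw identifier (assumed hash function)."""
    return hashlib.sha256(raw_identifier.encode("utf-8")).hexdigest()


def align_ids(ids_a, ids_b):
    """Keep the ids from party A that also appear in party B, preserving A's order."""
    ids_b_set = set(ids_b)
    return [sample_id for sample_id in ids_a if sample_id in ids_b_set]


# Placeholder raw identifiers (hypothetical, not real phone numbers).
ids_a = [hash_id(x) for x in ["0", "1", "2", "4"]]  # party A holds 0, 1, 2, 4
ids_b = [hash_id(x) for x in ["0", "1", "3", "4"]]  # party B holds 0, 1, 3, 4

aligned = align_ids(ids_a, ids_b)
print(len(aligned), "ids are shared by both parties")
```

Only rows whose id survives this intersection would carry both Enterprise A's features and label and vendor B's features, which is why both parties must apply the same hash function to the same normalized phone number format.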
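  • Selecting features
    After sample alignment finishes, click Next to go to the "Feature Selection" page. In this step, Enterprise A selects the features and the label, from both its own data and big data vendor B's data, to use in the subsequent training.
    After choosing the features and the label, Enterprise A can click "Start Binning and IV Calculation" to compute the IV (information value) of each selected feature through a federated statistics algorithm. In general, features with a higher IV are more discriminative and should be preferred as training features. A feature whose IV is too low has no discriminative power and wastes training resources, while a feature whose IV is too high stands out so much that it may unduly dominate the trained model.
    For example, the f4 feature provided by the big data vendor here has an IV of 0, which means it does not help distinguish the label and can be left out. The f0 and f2 features have moderate IVs and are suitable as training features.
    Based on the computed IVs, Enterprise A adjusted the features used for training: instead of using the full set of features from both parties, it removed some features with low IV, which reduced wasted computation.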
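To make the IV idea concrete, here is a minimal single-machine sketch of the usual information value definition, IV = Σ (p_i − q_i) · ln(p_i / q_i), where p_i and q_i are the shares of positive and negative samples that fall into bin i. It is not the federated algorithm TICS runs, which the source does not describe. The equal-frequency binning, the smoothing constant `eps`, the function names, the thresholds `low=0.02` and `high=0.5`, and all numeric inputs are assumptions chosen for illustration.

```python
import math


def information_value(values, labels, n_bins=2, eps=0.5):
    """IV of one numeric feature against a binary label (equal-frequency bins)."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    total_pos = sum(labels)
    total_neg = n - total_pos
    iv = 0.0
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        pos = sum(label for _, label in chunk)
        neg = len(chunk) - pos
        # Additive smoothing (assumed) keeps the logarithm finite for empty classes.
        p_pos = (pos + eps) / (total_pos + eps * n_bins)
        p_neg = (neg + eps) / (total_neg + eps * n_bins)
        iv += (p_pos - p_neg) * math.log(p_pos / p_neg)
    return iv


def select_features(iv_by_feature, low=0.02, high=0.5):
    """Keep features whose IV lies in [low, high]; both thresholds are assumptions."""
    return [name for name, iv in iv_by_feature.items() if low <= iv <= high]


labels = [0, 1, 0, 1, 0, 1, 0, 1]
flat_feature = [1, 1, 2, 2, 3, 3, 4, 4]                    # label mix is the same in each bin
strong_feature = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.4, 0.6]  # high values go with label 1

iv_flat = information_value(flat_feature, labels)
iv_strong = information_value(strong_feature, labels)
print(iv_flat, iv_strong)

# Hypothetical IVs, loosely mirroring the f0/f2/f4 example in the text.
kept = select_features({"f0": 0.21, "f2": 0.15, "f4": 0.0})
print(kept)
```

A feature whose bins all contain the same mix of labels contributes nothing to the sum, which matches the f4 case with an IV of 0. The upper threshold reflects the caution about overly dominant features and would need tuning for real data.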