The Transform TFX Pipeline Component
The Transform TFX pipeline component performs feature engineering on the tf.Examples emitted by an ExampleGen component, using the data schema created by a SchemaGen component. It emits a SavedModel together with statistics on both the pre-transform and post-transform data. When executed, the SavedModel accepts tf.Examples emitted by an ExampleGen component and emits the transformed feature data.
Consumes: tf.Examples from an ExampleGen component, and a data schema from a SchemaGen component.
Emits: a SavedModel to a Trainer component, plus pre-transform and post-transform statistics.
Configuring a Transform Component
Once your preprocessing_fn is written, it must be defined in a Python module, which is then provided to the Transform component as an input. Transform loads this module, finds the function named preprocessing_fn, and uses it to construct the preprocessing pipeline.
You can also supply options for the TFDV-based computation of pre-transform or post-transform statistics. To do so, define a stats_options_updater_fn in the same module.
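The wiring described above can be sketched roughly as follows. This is a minimal sketch, assuming upstream example_gen and schema_gen components already exist in your pipeline. The helper name build_transform and the module path are hypothetical. The TFX import is deferred into the function only so that the sketch can be read and exercised without TFX installed.

```python
import os


def build_transform(example_gen, schema_gen, module_file):
    """Wire a Transform component to upstream ExampleGen and SchemaGen outputs.

    `module_file` is the path to a Python module that defines
    `preprocessing_fn` (and, optionally, `stats_options_updater_fn`).
    """
    # Deferred import: TFX is only needed when the component is actually built.
    from tfx.components import Transform

    return Transform(
        examples=example_gen.outputs['examples'],  # tf.Examples from ExampleGen
        schema=schema_gen.outputs['schema'],       # data schema from SchemaGen
        module_file=os.path.abspath(module_file),
    )
```

In a pipeline definition you would call something like build_transform(example_gen, schema_gen, 'my_transform_module.py') and add the returned component to the pipeline's component list.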
Transform and TensorFlow Transform
Transform makes extensive use of TensorFlow Transform to perform feature engineering on your dataset. TensorFlow Transform is well suited to transforming feature data before it reaches the model, as part of the training process. Common feature transformations include:
Embedding: converts sparse features (such as the integer IDs produced by a vocabulary) into dense features by finding a meaningful mapping from a high-dimensional space to a low-dimensional space. For an introduction to embeddings, see the Embeddings unit in the Machine Learning Crash Course.
Vocabulary generation: converts strings or other non-numeric features into integers by creating a vocabulary that maps each unique value to an ID number.
Normalizing values: transforms numeric features so that they all fall within a similar range.
Bucketization: converts continuous-valued features into categorical features by assigning values to discrete buckets.
Enriching text features: produces features from raw data such as tokens, n-grams, entities, and sentiment to enrich the feature set.
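To make three of the transformations listed above concrete, here is a small pure-Python illustration on toy data. It is only a conceptual sketch and does not use TensorFlow Transform. The function names and the sample values are invented for this example.

```python
from bisect import bisect_right
from collections import Counter


def build_vocabulary(values):
    """Map each unique value to an integer ID, most frequent value first."""
    counts = Counter(values)
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {value: index for index, value in enumerate(ordered)}


def scale_to_unit_range(values):
    """Rescale numeric values linearly into [0, 1]."""
    low, high = min(values), max(values)
    span = high - low
    return [(v - low) / span if span else 0.0 for v in values]


def bucketize(values, boundaries):
    """Assign each value the index of the bucket it falls into."""
    return [bisect_right(boundaries, v) for v in values]


cities = ["tokyo", "osaka", "tokyo", "kyoto", "tokyo", "osaka"]
vocab = build_vocabulary(cities)
print(vocab)                                     # {'tokyo': 0, 'osaka': 1, 'kyoto': 2}
print([vocab[c] for c in cities])                # [0, 1, 0, 2, 0, 1]
print(scale_to_unit_range([10.0, 20.0, 30.0]))   # [0.0, 0.5, 1.0]
print(bucketize([3, 18, 42, 70], [18, 40, 65]))  # [0, 1, 2, 3]
```

Note that the vocabulary and the min/max values both depend on the whole dataset, which is why such transformations need a full pass over the data.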
TensorFlow Transform provides support for these and many other kinds of transformations:
Automatically generate a vocabulary from your latest data.
Perform arbitrary transformations on your data before sending it to your model. TensorFlow Transform builds transformations into the TensorFlow graph for your model, so the same transformations are performed at training and inference time. You can define transformations that refer to global properties of the data, such as the maximum value of a feature across all training instances.
You can transform your data however you like before running TFX. If you do it within TensorFlow Transform, however, the transformations become part of the TensorFlow graph. This approach helps avoid training/serving skew.
Transformations inside your modeling code use FeatureColumns. With FeatureColumns you can define bucketizations, integerizations that use predefined vocabularies, or any other transformations that can be defined without looking at the data.
By contrast, TensorFlow Transform is designed for transformations that require a full pass over the data to compute values that are not known in advance. For example, vocabulary generation requires a full pass over the data.
Note: These computations are implemented in Apache Beam under the hood.
In addition to computing values with Apache Beam, TensorFlow Transform lets you embed these values into a TensorFlow graph, which can then be loaded into the training graph. For example, when normalizing features, the tft.scale_to_z_score function computes the mean and standard deviation of a feature and also produces, in a TensorFlow graph, a representation of the function that subtracts the mean and divides by the standard deviation. By emitting a TensorFlow graph and not just statistics, TensorFlow Transform simplifies the process of authoring your preprocessing pipeline.
Because the preprocessing is expressed as a graph, it can run on the server, and it is guaranteed to be consistent between training and serving. This consistency eliminates one source of training/serving skew.
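The idea of computing statistics once and then shipping them as constants inside the transformation can be illustrated without TensorFlow. The following is a conceptual sketch only: a Python closure stands in for the exported graph, and analyze_z_score is a hypothetical name, not a TensorFlow Transform function.

```python
import statistics


def analyze_z_score(training_values):
    """Full pass over the training data: compute the constants once."""
    mean = statistics.fmean(training_values)
    stddev = statistics.pstdev(training_values)

    def transform(batch):
        # `mean` and `stddev` are frozen in the closure, much as an analyzer's
        # results are embedded as constants in the exported graph.
        return [(x - mean) / stddev for x in batch]

    return transform


training_data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
scale = analyze_z_score(training_data)

print(scale([5.0, 9.0]))  # training-time batch  -> [0.0, 2.0]
print(scale([3.0]))       # serving-time request -> [-1.0]
```

Because the same transform object is used in both places, a serving request is normalized with exactly the mean and standard deviation seen during training.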
TensorFlow Transform lets you specify your preprocessing pipeline using TensorFlow code, so a pipeline is constructed in the same manner as a TensorFlow graph. If only TensorFlow ops were used in this graph, the pipeline would be a pure map that accepts batches of input and returns batches of output. Such a pipeline would be equivalent to placing this graph inside your input_fn when using the tf.Estimator API. To specify full-pass operations such as computing quantiles, TensorFlow Transform provides special functions called analyzers. They look like TensorFlow ops, but in fact specify a deferred computation that is run by Apache Beam, with the output inserted into the graph as a constant. An ordinary TensorFlow op takes a single batch as input, performs some computation on just that batch, and emits a batch. An analyzer, by contrast, performs a global reduction (implemented in Apache Beam) over all batches and returns the result.
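The difference between a per-batch op and an analyzer can be sketched in plain Python. This is an illustration of the two-phase idea under simplifying assumptions (everything fits in memory, no Apache Beam). The function names are invented for the example.

```python
import statistics
from bisect import bisect_right


def quantile_boundaries(batches, num_buckets):
    """Analyzer-like step: a global reduction over every batch."""
    all_values = [value for batch in batches for value in batch]
    # statistics.quantiles returns the num_buckets - 1 cut points.
    return statistics.quantiles(all_values, n=num_buckets, method="inclusive")


def bucketize_batch(batch, boundaries):
    """Op-like step: a pure map that only looks at the batch it is given."""
    return [bisect_right(boundaries, value) for value in batch]


batches = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]

# Phase 1: one full pass over all batches computes the boundaries.
boundaries = quantile_boundaries(batches, num_buckets=4)
print(boundaries)  # [3.0, 5.0, 7.0]

# Phase 2: the boundaries act as constants in a per-batch map.
print([bucketize_batch(batch, boundaries) for batch in batches])
# [[0, 0, 1], [1, 2, 2], [3, 3, 3]]
```

No single batch contains enough information to choose the boundaries, which is why the first phase has to see the full dataset.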
By combining ordinary TensorFlow ops and TensorFlow Transform analyzers, you can create complex pipelines to preprocess your data. For example, the tft.scale_to_z_score function takes an input tensor and returns that tensor normalized to have mean 0 and variance 1. Internally it calls the mean and var analyzers, which effectively generate constants in the graph equal to the mean and variance of the input tensor. It then uses TensorFlow ops to subtract the mean and divide by the standard deviation.
The TensorFlow Transform preprocessing_fn
The TFX Transform component simplifies the use of Transform by handling the API calls related to reading and writing data, and by writing the output SavedModel to disk. As a TFX user, you only have to define a single function called preprocessing_fn. In preprocessing_fn you define a series of functions that manipulate the input dict of tensors to produce the output dict of tensors. The TensorFlow Transform API provides helper functions such as scale_to_0_1 and compute_and_apply_vocabulary, and you can also use regular TensorFlow functions, as shown below.
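A minimal preprocessing_fn might look like the following sketch. The feature names (trip_miles, company, fare) are hypothetical, and the inputs are assumed to be dense tensors. The imports are deferred into the function only so that the sketch can be read without TensorFlow installed. A real module file would normally import tensorflow and tensorflow_transform at the top.

```python
def preprocessing_fn(inputs):
    """Map a dict of raw feature tensors to a dict of transformed tensors."""
    # Deferred imports so the module can be read without TF or TFT installed.
    import tensorflow as tf
    import tensorflow_transform as tft

    outputs = {}
    # Numeric feature: rescale into [0, 1] using the dataset-wide min and max.
    outputs['trip_miles_scaled'] = tft.scale_to_0_1(inputs['trip_miles'])
    # String feature: build a vocabulary over the full dataset, then apply it.
    outputs['company_id'] = tft.compute_and_apply_vocabulary(inputs['company'])
    # Regular TensorFlow function: a stateless, per-batch computation.
    outputs['log_fare'] = tf.math.log1p(inputs['fare'])
    return outputs
```

The first two outputs rely on analyzers and therefore need a full pass over the data, while the third is an ordinary op applied batch by batch.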
Understanding the inputs to the preprocessing_fn
The preprocessing_fn describes a series of operations on tensors (that is, Tensors, SparseTensors, or RaggedTensors). To define the preprocessing_fn correctly, you need to understand how the data is represented as tensors. The input to the preprocessing_fn is determined by the schema. A Schema proto is eventually converted to a "feature spec" (sometimes called a "parsing spec") that is used for data parsing. See the referenced documentation for details about the conversion logic.
Using TensorFlow Transform to handle string labels
TensorFlow Transform is usually used both to generate a vocabulary and to apply that vocabulary to convert strings to integers. When following this workflow, the input_fn constructed in the model outputs the integerized string. Labels are an exception, however. For the model to be able to map the output (integer) labels back to strings, the model needs an input_fn that outputs a string label, together with a list of the possible values of the label. For example, if the labels are cat and dog, then the output of the input_fn should be these raw strings, and the keys ["cat", "dog"] need to be passed into the Estimator as a parameter (see details below).
To handle the mapping of string labels to integers, you should use TensorFlow Transform to generate a vocabulary. The code snippet below demonstrates this.
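A rough sketch of such a preprocessing function follows. It assumes the newer tft.vocabulary function, which replaced the tft.uniques name used in this guide, so adjust it to your installed TensorFlow Transform version. The key name RAW_LABEL_KEY and its value are hypothetical, and the import is deferred only so the sketch can be read without TensorFlow Transform installed.

```python
RAW_LABEL_KEY = 'education'  # hypothetical name of the raw string label


def preprocessing_fn(inputs):
    """Generate a vocabulary for the raw string label without integerizing it."""
    import tensorflow_transform as tft  # deferred import; TFT may be absent

    outputs = {}
    # ... transformations of the other features would go here ...

    education = inputs[RAW_LABEL_KEY]
    # Only the side effect matters: a vocabulary file named after the label
    # key is written with the transform output. The return value is discarded.
    _ = tft.vocabulary(education, vocab_filename=RAW_LABEL_KEY)
    # The label itself is passed through as a raw string.
    outputs[RAW_LABEL_KEY] = education
    return outputs
```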
The preprocessing function above takes the raw input feature (which is also returned as part of the output of the preprocessing function) and calls tft.uniques on it (named tft.vocabulary in later TensorFlow Transform releases). This generates a vocabulary for education that can be accessed in the model.
In the model code, the classifier must be given the vocabulary generated by tft.uniques as the label_vocabulary argument. This is done by first reading the vocabulary as a list with a helper function, as shown in the snippet below. Note that the sample code uses the transformed label described above, whereas the code here uses the raw label.
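The helper-function step can be sketched as follows. This is a simplified illustration that assumes the vocabulary is stored as a plain-text file with one entry per line. The file contents and the function name read_vocabulary are made up for the example, and a temporary file stands in for the one Transform would write. In a real pipeline the path would typically be obtained from the transform output, for example via tft.TFTransformOutput(...).vocabulary_file_by_name(...).

```python
import os
import tempfile


def read_vocabulary(vocab_path):
    """Read a plain-text vocabulary file (one entry per line) into a list."""
    with open(vocab_path, encoding='utf-8') as vocab_file:
        return [line.rstrip('\n') for line in vocab_file]


# Stand-in for the vocabulary file that Transform would have written.
with tempfile.TemporaryDirectory() as tmp_dir:
    vocab_path = os.path.join(tmp_dir, 'education')
    with open(vocab_path, 'w', encoding='utf-8') as vocab_file:
        vocab_file.write('HS-grad\nBachelors\nMasters\n')
    label_vocabulary = read_vocabulary(vocab_path)

print(label_vocabulary)  # ['HS-grad', 'Bachelors', 'Masters']

# A classifier that was given label_vocabulary can report class 1 as a string.
predicted_class_id = 1
print(label_vocabulary[predicted_class_id])  # Bachelors
```

The resulting list is what you would pass as the label_vocabulary argument when constructing the classifier.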
Configuring pre-transform and post-transform statistics
As mentioned above, the Transform component invokes TFDV to compute both pre-transform and post-transform statistics. TFDV takes an optional StatsOptions object as input. You may want to configure this object to enable certain additional statistics (for example, NLP statistics) or to set thresholds that are validated (for example, minimum and maximum token frequency). To do so, define a stats_options_updater_fn in the module file.
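A stats_options_updater_fn could be sketched along these lines. The two options set here (enable_semantic_domain_stats and num_histogram_buckets) are assumptions chosen for illustration, so check them against the StatsOptions documentation for your TFDV version. The stats type is compared by enum member name only so that the sketch runs without TFX installed. A real module would import stats_options_util from tfx.components.transform and compare against StatsType.PRE_TRANSFORM and StatsType.POST_TRANSFORM directly.

```python
def stats_options_updater_fn(stats_type, stats_options):
    """Adjust the StatsOptions used for pre- and post-transform statistics."""
    # Compared by enum member name so the sketch runs without TFX installed;
    # a real module would compare against stats_options_util.StatsType members.
    if stats_type.name == 'PRE_TRANSFORM':
        # Assumed option: compute statistics for semantic domains such as text.
        stats_options.enable_semantic_domain_stats = True
    elif stats_type.name == 'POST_TRANSFORM':
        # Assumed option: change the histogram resolution for transformed data.
        stats_options.num_histogram_buckets = 20
    return stats_options
```

Returning the (possibly modified) stats_options object lets Transform use it for the corresponding statistics computation.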
Post-transform statistics often benefit from knowledge of the vocabulary used to preprocess a feature. The mapping from vocabulary name to path is provided to StatsOptions (and hence to TFDV) for every vocabulary generated by TFT. In addition, mappings for externally created vocabularies can be added either by (i) directly modifying the vocab_paths dictionary within StatsOptions or (ii) using tft.annotate_asset.