Path: blob/main/a4/__pycache__/vocab.cpython-310.pyc
CS224N 2022-23: Homework 4
vocab.py: Vocabulary Generation
Pencheng Yin <[email protected]>
Sahil Chopra <[email protected]>
Vera Lin <[email protected]>
Siyan Li <[email protected]>

Usage:
    vocab.py --train-src=<file> --train-tgt=<file> [options] VOCAB_FILE

Options:
    -h --help                  Show this screen.
    --train-src=<file>         File of training source sentences
    --train-tgt=<file>         File of training target sentences
    --size=<int>               vocab size [default: 50000]
    --freq-cutoff=<int>        frequency cutoff [default: 2]

VocabEntry
    Vocabulary Entry, i.e. structure containing either src or tgt language terms.
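The vocab.py usage text describes a vocabulary builder with a `--size` cap (default 50000) and a `--freq-cutoff` threshold (default 2). The following is a minimal Python sketch of that kind of frequency-cutoff vocabulary. It is not recovered from the compiled file and should not be read as the homework's actual code.

Several parts are assumptions. The helper names `build_vocab`, `words2indices`, and `pad_to_longest` are hypothetical. So is the special-token list `<pad>`, `<s>`, `</s>`, `<unk>`. Of the names used here, only `Counter` and `chain` also appear in the .pyc.

```python
from collections import Counter
from itertools import chain
from typing import Dict, List

# Hypothetical special tokens (assumed, not recovered from the .pyc).
SPECIAL_TOKENS = ["<pad>", "<s>", "</s>", "<unk>"]


def build_vocab(corpus: List[List[str]], size: int = 50000,
                freq_cutoff: int = 2) -> Dict[str, int]:
    """Map words to ids: special tokens first, then at most `size` of the
    most frequent words whose count is >= `freq_cutoff`."""
    word2id = {tok: i for i, tok in enumerate(SPECIAL_TOKENS)}
    word_freq = Counter(chain.from_iterable(corpus))
    valid_words = [w for w, c in word_freq.items() if c >= freq_cutoff]
    # Descending frequency; ties broken alphabetically for deterministic ids.
    top_words = sorted(valid_words, key=lambda w: (-word_freq[w], w))[:size]
    for w in top_words:
        if w not in word2id:
            word2id[w] = len(word2id)
    return word2id


def words2indices(sents: List[List[str]],
                  word2id: Dict[str, int]) -> List[List[int]]:
    """Convert tokens to ids, mapping out-of-vocabulary words to <unk>."""
    unk = word2id["<unk>"]
    return [[word2id.get(w, unk) for w in s] for s in sents]


def pad_to_longest(ids: List[List[int]], pad_id: int) -> List[List[int]]:
    """Right-pad each id sequence with pad_id to the longest sequence's length."""
    max_len = max((len(s) for s in ids), default=0)
    return [s + [pad_id] * (max_len - len(s)) for s in ids]


if __name__ == "__main__":
    corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["a", "cat"]]
    vocab = build_vocab(corpus, size=2, freq_cutoff=2)
    print(vocab)
    ids = words2indices([["the", "cat"], ["a", "dog", "sat"]], vocab)
    print(pad_to_longest(ids, vocab["<pad>"]))
```

In the toy corpus, "the", "cat", and "sat" each occur twice and pass the cutoff. With `size=2`, only "cat" and "sat" survive the tie-broken sort. The script therefore prints `{'<pad>': 0, '<s>': 1, '</s>': 2, '<unk>': 3, 'cat': 4, 'sat': 5}` and then `[[3, 4, 0], [3, 3, 5]]`, with unseen or cut words mapped to `<unk>` (id 3).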