Path: blob/main/a4/__pycache__/vocab.cpython-310.pyc
CS224N 2022-23: Homework 4
vocab.py: Vocabulary Generation
Pengcheng Yin <[email protected]>
Sahil Chopra <[email protected]>
Vera Lin <[email protected]>
Siyan Li <[email protected]>
Usage:
    vocab.py --train-src=<file> --train-tgt=<file> [options] VOCAB_FILE

Options:
    -h --help                  Show this screen.
    --train-src=<file>         File of training source sentences
    --train-tgt=<file>         File of training target sentences
    --size=<int>               vocab size [default: 50000]
    --freq-cutoff=<int>        frequency cutoff [default: 2]
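The `--size` and `--freq-cutoff` options are only described in one line each, so the following is a minimal sketch of how such limits are commonly applied when building a word list. It is not recovered from this file: the helper name `build_word_list` is hypothetical, and it assumes that the cutoff keeps words seen at least `freq_cutoff` times and that `size` caps the list after sorting by frequency.

```python
from collections import Counter
from itertools import chain
from typing import List


def build_word_list(corpus: List[List[str]], size: int = 50000,
                    freq_cutoff: int = 2) -> List[str]:
    """Hypothetical helper: filter words by frequency, then cap the list length."""
    # Count every token across all sentences.
    word_freq = Counter(chain.from_iterable(corpus))
    # Assumed meaning of --freq-cutoff: keep words seen at least this many times.
    frequent = [w for w, n in word_freq.items() if n >= freq_cutoff]
    # Most frequent first; ties broken alphabetically so the result is deterministic.
    frequent.sort(key=lambda w: (-word_freq[w], w))
    # Assumed meaning of --size: keep at most this many words.
    return frequent[:size]


corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "end"]]
words = build_word_list(corpus, size=2, freq_cutoff=2)
print(words)
```

With the defaults shown in the options list, a word would need to appear at least twice to be kept, and no more than 50000 words would survive.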
Imported names: Counter, docopt, chain, List, read_corpus, pad_sents

class VocabEntry
Vocabulary Entry, i.e. structure containing either
src or tgt language terms.