GitHub Repository: yiming-wange/cs224n-2023-solution
Path: blob/main/a4/__pycache__/nmt_model.cpython-310.pyc
"""
CS224N 2022-23: Homework 4
nmt_model.py: NMT Model
Pencheng Yin <[email protected]>
Sahil Chopra <[email protected]>
Vera Lin <[email protected]>
Siyan Li <[email protected]>
"""
from collections import namedtuple
import sys
from typing import List, Tuple, Dict, Set, Union

import torch
import torch.nn as nn
import torch.nn.utils
import torch.nn.functional as F
from torch.nn.utils.rnn import pad_packed_sequence, pack_padded_sequence

from model_embeddings import ModelEmbeddings

Hypothesis = namedtuple('Hypothesis', ['value', 'score'])


class NMT(nn.Module):
    """ Simple Neural Machine Translation Model:
        - Bidirectional LSTM Encoder
        - Unidirectional LSTM Decoder
        - Global Attention Model (Luong et al., 2015)
    """

    def __init__(self, embed_size, hidden_size, vocab, dropout_rate=0.2):
        """ Init NMT Model.

        @param embed_size (int): Embedding size (dimensionality)
        @param hidden_size (int): Hidden Size, the size of hidden states (dimensionality)
        @param vocab (Vocab): Vocabulary object containing src and tgt languages
                              See vocab.py for documentation.
        @param dropout_rate (float): Dropout probability, for attention
        """
        super(NMT, self).__init__()
        self.model_embeddings = ModelEmbeddings(embed_size, vocab)
        self.hidden_size = hidden_size
        self.dropout_rate = dropout_rate
        self.vocab = vocab
        self.gen_sanity_check = False
        self.counter = 0

        # 1-D convolution applied over the sequence dimension right after the source embeddings
        # (kernel size of 2 assumed from the 2022-23 handout)
        self.post_embed_cnn = nn.Conv1d(embed_size, embed_size, kernel_size=2, padding='same')
        # Bidirectional LSTM encoder; LSTMCell decoder fed with [y_t; o_prev]
        self.encoder = nn.LSTM(input_size=embed_size, hidden_size=self.hidden_size, bidirectional=True)
        self.decoder = nn.LSTMCell(embed_size + self.hidden_size, self.hidden_size)
        # Projections from the bidirectional encoder space (2h) down to the decoder space (h)
        self.h_projection = nn.Linear(2 * self.hidden_size, self.hidden_size, bias=False)
        self.c_projection = nn.Linear(2 * self.hidden_size, self.hidden_size, bias=False)
        self.att_projection = nn.Linear(2 * self.hidden_size, self.hidden_size, bias=False)
        self.combined_output_projection = nn.Linear(3 * self.hidden_size, self.hidden_size, bias=False)
        self.target_vocab_projection = nn.Linear(self.hidden_size, len(self.vocab.tgt), bias=False)
        self.dropout = nn.Dropout(self.dropout_rate)
    def forward(self, source: List[List[str]], target: List[List[str]]) -> torch.Tensor:
        """ Take a mini-batch of source and target sentences, compute the log-likelihood of
        target sentences under the language models learned by the NMT system.

        @param source (List[List[str]]): list of source sentence tokens
        @param target (List[List[str]]): list of target sentence tokens, wrapped by `<s>` and `</s>`

        @returns scores (Tensor): a variable/tensor of shape (b, ) representing the
                                    log-likelihood of generating the gold-standard target sentence for
                                    each example in the input batch. Here b = batch size.
        """
        # Compute sentence lengths
        source_lengths = [len(s) for s in source]

        # Convert list of lists into tensors
        source_padded = self.vocab.src.to_input_tensor(source, device=self.device)   # (src_len, b)
        target_padded = self.vocab.tgt.to_input_tensor(target, device=self.device)   # (tgt_len, b)

        enc_hiddens, dec_init_state = self.encode(source_padded, source_lengths)
        enc_masks = self.generate_sent_masks(enc_hiddens, source_lengths)
        combined_outputs = self.decode(enc_hiddens, enc_masks, dec_init_state, target_padded)
        P = F.log_softmax(self.target_vocab_projection(combined_outputs), dim=-1)

        # Zero out the probabilities for padding positions in the target text
        target_masks = (target_padded != self.vocab.tgt['<pad>']).float()

        # Compute log probability of generating the gold target words
        target_gold_words_log_prob = torch.gather(P, index=target_padded[1:].unsqueeze(-1), dim=-1).squeeze(-1) * target_masks[1:]
        scores = target_gold_words_log_prob.sum(dim=0)
        return scores

    def encode(self, source_padded: torch.Tensor, source_lengths: List[int]) -> Tuple[
            torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]:
        """ Apply the encoder to source sentences to obtain encoder hidden states.
            Additionally, take the final states of the encoder and project them to obtain initial states for decoder.

        @param source_padded (Tensor): Tensor of padded source sentences with shape (src_len, b), where
                                        b = batch_size, src_len = maximum source sentence length. Note that
                                       these have already been sorted in order of longest to shortest sentence.
        @param source_lengths (List[int]): List of actual lengths for each of the source sentences in the batch
        @returns enc_hiddens (Tensor): Tensor of hidden units with shape (b, src_len, h*2), where
                                        b = batch size, src_len = maximum source sentence length, h = hidden size.
        @returns dec_init_state (tuple(Tensor, Tensor)): Tuple of tensors representing the decoder's initial
                                                hidden state and cell. Both tensors should have shape (2, b, h).
        """
        enc_hiddens, dec_init_state = None, None
        _, B = source_padded.shape

        X = self.model_embeddings.source(source_padded)                       # (src_len, b, e)
        # Run the post-embedding CNN over the sequence dimension:
        # (src_len, b, e) -> (b, e, src_len) -> Conv1d -> back to (src_len, b, e)
        X = self.post_embed_cnn(torch.permute(X, (1, 2, 0))).permute(2, 0, 1)

        x, (last_hidden, last_cell) = self.encoder(pack_padded_sequence(X, source_lengths))
        enc_hiddens, _ = pad_packed_sequence(x)
        enc_hiddens = enc_hiddens.permute(1, 0, 2)                            # (b, src_len, 2h)

        # Concatenate the final forward/backward states and project them down to h
        init_decoder_hidden = self.h_projection(last_hidden.permute(1, 0, 2).reshape(B, -1))
        init_decoder_cell = self.c_projection(last_cell.permute(1, 0, 2).reshape(B, -1))
        dec_init_state = (init_decoder_hidden, init_decoder_cell)

        return enc_hiddens, dec_init_state

    def decode(self, enc_hiddens: torch.Tensor, enc_masks: torch.Tensor,
               dec_init_state: Tuple[torch.Tensor, torch.Tensor], target_padded: torch.Tensor) -> torch.Tensor:
        """Compute combined output vectors for a batch.

        @param enc_hiddens (Tensor): Hidden states (b, src_len, h*2), where
                                     b = batch size, src_len = maximum source sentence length, h = hidden size.
        @param enc_masks (Tensor): Tensor of sentence masks (b, src_len), where
                                     b = batch size, src_len = maximum source sentence length.
        @param dec_init_state (tuple(Tensor, Tensor)): Initial state and cell for decoder
        @param target_padded (Tensor): Gold-standard padded target sentences (tgt_len, b), where
                                       tgt_len = maximum target sentence length, b = batch size.

        @returns combined_outputs (Tensor): combined output tensor  (tgt_len, b,  h), where
                                        tgt_len = maximum target sentence length, b = batch_size,  h = hidden size
        """
        # Chop off the <END> token for max length sentences.
        target_padded = target_padded[:-1]

        dec_state = dec_init_state
        batch_size = enc_hiddens.size(0)
        o_prev = torch.zeros(batch_size, self.hidden_size, device=self.device)
        combined_outputs = []

        enc_hiddens_proj = self.att_projection(enc_hiddens)                   # (b, src_len, h)
        Y = self.model_embeddings.target(target_padded)                       # (tgt_len, b, e)

        # Unroll the decoder one target timestep at a time
        for y in torch.split(Y, 1, dim=0):
            Ybar_t = torch.cat([y.squeeze(dim=0), o_prev], dim=-1)            # (b, e + h)
            dec_state, combined_output, _ = self.step(Ybar_t, dec_state, enc_hiddens, enc_hiddens_proj, enc_masks)
            combined_outputs.append(combined_output)
            o_prev = combined_output

        combined_outputs = torch.stack(combined_outputs)                      # (tgt_len, b, h)
        return combined_outputs

    def step(self, Ybar_t: torch.Tensor,
             dec_state: Tuple[torch.Tensor, torch.Tensor],
             enc_hiddens: torch.Tensor,
             enc_hiddens_proj: torch.Tensor,
             enc_masks: torch.Tensor) -> Tuple[Tuple, torch.Tensor, torch.Tensor]:
        """ Compute one forward step of the LSTM decoder, including the attention computation.

        @param Ybar_t (Tensor): Concatenated Tensor of [Y_t o_prev], with shape (b, e + h). The input for the decoder,
                                where b = batch size, e = embedding size, h = hidden size.
        @param dec_state (tuple(Tensor, Tensor)): Tuple of tensors both with shape (b, h), where b = batch size, h = hidden size.
                First tensor is decoder's prev hidden state, second tensor is decoder's prev cell.
        @param enc_hiddens (Tensor): Encoder hidden states Tensor, with shape (b, src_len, h * 2), where b = batch size,
                                    src_len = maximum source length, h = hidden size.
        @param enc_hiddens_proj (Tensor): Encoder hidden states Tensor, projected from (h * 2) to h. Tensor is with shape (b, src_len, h),
                                    where b = batch size, src_len = maximum source length, h = hidden size.
        @param enc_masks (Tensor): Tensor of sentence masks shape (b, src_len),
                                    where b = batch size, src_len is maximum source length.

        @returns dec_state (tuple (Tensor, Tensor)): Tuple of tensors both shape (b, h), where b = batch size, h = hidden size.
                First tensor is decoder's new hidden state, second tensor is decoder's new cell.
        @returns combined_output (Tensor): Combined output Tensor at timestep t, shape (b, h), where b = batch size, h = hidden size.
        @returns e_t (Tensor): Tensor of shape (b, src_len). It is attention scores distribution.
                                Note: You will not use this outside of this function.
                                      We are simply returning this value so that we can sanity check
                                      your implementation.
        """
        combined_output = None

        dec_state = self.decoder(Ybar_t, dec_state)
        dec_hidden = dec_state[0]

        # Attention scores: batched dot product of the projected encoder states with the decoder hidden state
        # (b, src_len, h) x (b, h, 1) -> (b, src_len)
        e_t = torch.bmm(enc_hiddens_proj, dec_hidden.unsqueeze(-1)).squeeze(-1)

        # Set e_t to -inf where enc_masks has 1 (i.e. padding positions)
        if enc_masks is not None:
            e_t.data.masked_fill_(enc_masks.bool(), -float('inf'))

        alpha_t = F.softmax(e_t, dim=-1)
        # Attention output: (b, 1, src_len) x (b, src_len, 2h) -> (b, 2h)
        a_t = torch.bmm(alpha_t.unsqueeze(1), enc_hiddens).squeeze(1)

        U_t = torch.cat([dec_hidden, a_t], dim=-1)                            # (b, 3h)
        V_t = self.combined_output_projection(U_t)
        O_t = self.dropout(torch.tanh(V_t))

        combined_output = O_t
        return dec_state, combined_output, e_t

    def generate_sent_masks(self, enc_hiddens: torch.Tensor, source_lengths: List[int]) -> torch.Tensor:
        """ Generate sentence masks for encoder hidden states.

        @param enc_hiddens (Tensor): encodings of shape (b, src_len, 2*h), where b = batch size,
                                     src_len = max source length, h = hidden size.
        @param source_lengths (List[int]): List of actual lengths for each of the sentences in the batch.

        @returns enc_masks (Tensor): Tensor of sentence masks of shape (b, src_len),
                                    where src_len = max source length, h = hidden size.
        """
        enc_masks = torch.zeros(enc_hiddens.size(0), enc_hiddens.size(1), dtype=torch.float)
        for e_id, src_len in enumerate(source_lengths):
            enc_masks[e_id, src_len:] = 1
        return enc_masks.to(self.device)

    def beam_search(self, src_sent: List[str], beam_size: int = 5, max_decoding_time_step: int = 70) -> List[Hypothesis]:
        """ Given a single source sentence, perform beam search, yielding translations in the target language.
        @param src_sent (List[str]): a single source sentence (words)
        @param beam_size (int): beam size
        @param max_decoding_time_step (int): maximum number of time steps to unroll the decoding RNN
        @returns hypotheses (List[Hypothesis]): a list of hypothesis, each hypothesis has two fields:
                value: List[str]: the decoded target sentence, represented as a list of words
                score: float: the log-likelihood of the target sentence
        """
        src_sents_var = self.vocab.src.to_input_tensor([src_sent], self.device)

        src_encodings, dec_init_vec = self.encode(src_sents_var, [len(src_sent)])
        src_encodings_att_linear = self.att_projection(src_encodings)

        h_tm1 = dec_init_vec
        att_tm1 = torch.zeros(1, self.hidden_size, device=self.device)

        eos_id = self.vocab.tgt['</s>']

        hypotheses = [['<s>']]
        hyp_scores = torch.zeros(len(hypotheses), dtype=torch.float, device=self.device)
        completed_hypotheses = []

        t = 0
        while len(completed_hypotheses) < beam_size and t < max_decoding_time_step:
            t += 1
            hyp_num = len(hypotheses)

            # Expand the single source encoding so every live hypothesis attends over it
            exp_src_encodings = src_encodings.expand(hyp_num,
                                                     src_encodings.size(1),
                                                     src_encodings.size(2))
            exp_src_encodings_att_linear = src_encodings_att_linear.expand(hyp_num,
                                                                           src_encodings_att_linear.size(1),
                                                                           src_encodings_att_linear.size(2))

            y_tm1 = torch.tensor([self.vocab.tgt[hyp[-1]] for hyp in hypotheses], dtype=torch.long, device=self.device)
            y_t_embed = self.model_embeddings.target(y_tm1)

            x = torch.cat([y_t_embed, att_tm1], dim=-1)

            (h_t, cell_t), att_t, _ = self.step(x, h_tm1,
                                                exp_src_encodings, exp_src_encodings_att_linear, enc_masks=None)

            # log probabilities over target words
            log_p_t = F.log_softmax(self.target_vocab_projection(att_t), dim=-1)

            live_hyp_num = beam_size - len(completed_hypotheses)
            contiuating_hyp_scores = (hyp_scores.unsqueeze(1).expand_as(log_p_t) + log_p_t).view(-1)
            top_cand_hyp_scores, top_cand_hyp_pos = torch.topk(contiuating_hyp_scores, k=live_hyp_num)

            prev_hyp_ids = torch.div(top_cand_hyp_pos, len(self.vocab.tgt), rounding_mode='floor')
            hyp_word_ids = top_cand_hyp_pos % len(self.vocab.tgt)

            new_hypotheses = []
            live_hyp_ids = []
            new_hyp_scores = []

            for prev_hyp_id, hyp_word_id, cand_new_hyp_score in zip(prev_hyp_ids, hyp_word_ids, top_cand_hyp_scores):
                prev_hyp_id = prev_hyp_id.item()
                hyp_word_id = hyp_word_id.item()
                cand_new_hyp_score = cand_new_hyp_score.item()

                hyp_word = self.vocab.tgt.id2word[hyp_word_id]
                new_hyp_sent = hypotheses[prev_hyp_id] + [hyp_word]
                if hyp_word == '</s>':
                    completed_hypotheses.append(Hypothesis(value=new_hyp_sent[1:-1],
                                                           score=cand_new_hyp_score))
                else:
                    new_hypotheses.append(new_hyp_sent)
                    live_hyp_ids.append(prev_hyp_id)
                    new_hyp_scores.append(cand_new_hyp_score)

            if len(completed_hypotheses) == beam_size:
                break

            live_hyp_ids = torch.tensor(live_hyp_ids, dtype=torch.long, device=self.device)
            h_tm1 = (h_t[live_hyp_ids], cell_t[live_hyp_ids])
            att_tm1 = att_t[live_hyp_ids]

            hypotheses = new_hypotheses
            hyp_scores = torch.tensor(new_hyp_scores, dtype=torch.float, device=self.device)

        if len(completed_hypotheses) == 0:
            completed_hypotheses.append(Hypothesis(value=hypotheses[0][1:],
                                                   score=hyp_scores[0].item()))

        completed_hypotheses.sort(key=lambda hyp: hyp.score, reverse=True)

        return completed_hypotheses

    @property
    def device(self) -> torch.device:
        """ Determine which device to place the Tensors upon, CPU or GPU.
        """
        return self.model_embeddings.source.weight.device

    @staticmethod
    def load(model_path: str):
        """ Load the model from a file.
        @param model_path (str): path to model
        """
        params = torch.load(model_path, map_location=lambda storage, loc: storage)
        args = params['args']
        model = NMT(vocab=params['vocab'], **args)
        model.load_state_dict(params['state_dict'])

        return model

    def save(self, path: str):
        """ Save the model to a file.
        @param path (str): path to the model
        """
        print('save model parameters to [%s]' % path, file=sys.stderr)

        params = {
            'args': dict(embed_size=self.model_embeddings.embed_size, hidden_size=self.hidden_size,
                         dropout_rate=self.dropout_rate),
            'vocab': self.vocab,
            'state_dict': self.state_dict()
        }

        torch.save(params, path)
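
For orientation, a minimal usage sketch of the class above follows. It is illustrative only: the helper names `Vocab.load` and `read_corpus`, the file paths, and the hyperparameter values are assumptions drawn from the rest of the CS224N a4 starter code, not anything defined in this file.

# Illustrative sketch -- Vocab.load / read_corpus and the file names below are assumed,
# not defined in nmt_model.py; adapt them to your local setup.
from vocab import Vocab
from utils import read_corpus
from nmt_model import NMT

vocab = Vocab.load('vocab.json')                           # source/target vocabularies
model = NMT(embed_size=256, hidden_size=256, dropout_rate=0.2, vocab=vocab)

# Scoring a few training pairs: forward() returns one log-likelihood per sentence pair.
src_sents = read_corpus('train.src.txt', source='src')     # List[List[str]]
tgt_sents = read_corpus('train.tgt.txt', source='tgt')     # target sentences wrapped with <s> ... </s>
scores = model(src_sents[:4], tgt_sents[:4])               # Tensor of shape (4,)

# Translating a single sentence with beam search.
hyps = model.beam_search(src_sents[0], beam_size=5, max_decoding_time_step=70)
print(' '.join(hyps[0].value), hyps[0].score)

# Checkpointing round-trip via save()/load().
model.save('model.bin')
model = NMT.load('model.bin')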