GitHub Repository: suyashi29/python-su
Path: blob/master/Generative NLP Models using Python/Updated Language Translation model.ipynb
Kernel: Python 3 (ipykernel)
from transformers import pipeline

# Step 1: Define supported target languages and their pipeline tasks
# (the Hindi entry is restored here to match the menu in the run output below).
# Note: the default pipeline only ships models for a few pairs (e.g. en->de,
# en->fr); tasks without a default model raise an error unless a model is
# passed explicitly.
supported_languages = {
    "1": ("Hindi", "translation_en_to_hi"),
    "2": ("German", "translation_en_to_de"),
    "3": ("French", "translation_en_to_fr"),
    "4": ("Spanish", "translation_en_to_es"),
}

# Step 2: Display the language menu
print("Select the language to translate English into:")
for key, (language, _) in supported_languages.items():
    print(f"{key}. {language}")

# Step 3: Read the language choice and the text to translate
lang_choice = input("Enter choice (1-4): ").strip()
if lang_choice not in supported_languages:
    print("Invalid choice. Exiting.")
    exit()

target_lang, pipeline_task = supported_languages[lang_choice]
text_to_translate = input(f"Enter English text to translate into {target_lang}: ")

# Step 4: Load the translation pipeline for the chosen task
translator = pipeline(pipeline_task)

# Step 5: Translate and display the result
result = translator(text_to_translate)
print(f"\nTranslated text in {target_lang}: {result[0]['translation_text']}")
Select the language to translate English into:
1. Hindi
2. German
3. French
4. Spanish
Enter choice (1-4): 2
Enter English text to translate into German: living is good
No model was supplied, defaulted to google-t5/t5-base and revision 686f1db (https://huggingface.co/google-t5/t5-base). Using a pipeline without specifying a model name and revision in production is not recommended.
Translated text in German: Das Leben ist gut
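
As the warning above notes, an unpinned pipeline silently falls back to google-t5/t5-base, which is not recommended in production. A minimal sketch of pinning the checkpoint and revision explicitly, using the model and revision reported in the warning, so runs stay reproducible:

from transformers import pipeline

# Pin the checkpoint and revision instead of relying on the task default;
# "686f1db" is the revision reported in the warning above.
translator = pipeline(
    "translation_en_to_de",
    model="google-t5/t5-base",
    revision="686f1db",
)
print(translator("living is good")[0]["translation_text"])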

Second Option: load a MarianMT model and tokenizer directly

from transformers import MarianTokenizer, MarianMTModel

# Map menu choices to MarianMT checkpoints
language_models = {
    "1": ("Hindi", "Helsinki-NLP/opus-mt-en-hi"),
    "2": ("German", "Helsinki-NLP/opus-mt-en-de"),
    "3": ("French", "Helsinki-NLP/opus-mt-en-fr"),
    "4": ("Spanish", "Helsinki-NLP/opus-mt-en-es"),
}

# Show menu
print("Select target language:")
for key, (lang, _) in language_models.items():
    print(f"{key}. {lang}")

# Get input
choice = input("Enter your choice (1-4): ").strip()
if choice not in language_models:
    print("Invalid selection.")
    exit()

lang_name, model_name = language_models[choice]
text = input(f"Enter English text to translate to {lang_name}: ")

# Load tokenizer and model
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# Translate: calling the tokenizer directly replaces the deprecated
# prepare_seq2seq_batch helper
tokens = tokenizer([text], return_tensors="pt", padding=True)
translated = model.generate(**tokens)
output = tokenizer.decode(translated[0], skip_special_tokens=True)

# Result
print(f"\nTranslated to {lang_name}: {output}")
Select target language:
1. Hindi
2. German
3. French
4. Spanish
Enter your choice (1-4): 2
Enter English text to translate to German: hello hey
Translated to German: Guten Tag.
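
MarianMT also generates on batches, which is usually faster than looping over sentences one at a time. A short sketch under the same Helsinki-NLP/opus-mt-en-de checkpoint (the example sentences are illustrative; padding is needed so inputs of different lengths fit in one tensor):

from transformers import MarianTokenizer, MarianMTModel

model_name = "Helsinki-NLP/opus-mt-en-de"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

sentences = ["Living is good.", "Hello there!", "How are you today?"]
# Pad to the longest sentence so the batch shares a single tensor
batch = tokenizer(sentences, return_tensors="pt", padding=True)
generated = model.generate(**batch)
for src, ids in zip(sentences, generated):
    print(src, "->", tokenizer.decode(ids, skip_special_tokens=True))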