.. role:: hidden
    :class: hidden-section

Parse Addresses
***************

.. code-block:: python

    import pandas as pd

    from deepparse import download_from_public_repository
    from deepparse.dataset_container import PickleDatasetContainer
    from deepparse.parser import AddressParser

Here is an example of how to parse multiple addresses. First, let's download the ``predict`` test dataset from the public repository.

.. code-block:: python

    saving_dir = "./data"
    file_extension = "p"
    test_dataset_name = "predict"
    download_from_public_repository(test_dataset_name, saving_dir, file_extension=file_extension)

Now let's load the dataset using one of our dataset containers.

.. code-block:: python

    addresses_to_parse = PickleDatasetContainer("./data/predict.p", is_training_container=False)

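
If you want to parse your own addresses instead of the ``predict`` dataset, the sketch below writes a file in the format we assume ``PickleDatasetContainer`` reads when ``is_training_container=False``: a pickled Python list of address strings. The file name ``my_addresses``, the temporary directory and the addresses are hypothetical; the path is built the same way as the downloaded file above (``<saving dir>/<name>.<extension>``).

.. code-block:: python

    import os
    import pickle
    import tempfile

    # Hypothetical addresses; assumption: a non-training pickle dataset is a plain list of strings.
    my_addresses = [
        "350 rue des Lilas Ouest Quebec city Quebec G1L 1B6",
        "2325 Rue de l'Universite Quebec QC G1V 0A6",
    ]

    # Build the path like the downloaded dataset: <saving dir>/<name>.<extension>.
    saving_dir = tempfile.mkdtemp()
    file_name = "my_addresses"
    file_extension = "p"
    file_path = os.path.join(saving_dir, f"{file_name}.{file_extension}")

    # Serialize the list with pickle.
    with open(file_path, "wb") as file:
        pickle.dump(my_addresses, file)

    # Read it back to check that the content round-trips unchanged.
    with open(file_path, "rb") as file:
        loaded = pickle.load(file)

    print(loaded[0:1])

The resulting file could then be loaded with ``PickleDatasetContainer(file_path, is_training_container=False)``, exactly like ``./data/predict.p``.
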
Let's use the ``BPEmb`` model on the first GPU (``device=0``).

.. code-block:: python

    address_parser = AddressParser(model_type="bpemb", device=0)

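
If you are not sure a GPU is available, a small helper such as the hypothetical ``pick_device`` below can choose the GPU index ``0`` when CUDA is present and fall back to the CPU otherwise. This sketch assumes that the ``device`` argument of ``AddressParser`` also accepts the string ``"cpu"``; the ``torch`` import is guarded only so the snippet runs on its own.

.. code-block:: python

    def pick_device():
        """Return GPU index 0 if CUDA is available, otherwise the string "cpu" (assumed accepted by AddressParser)."""
        try:
            import torch  # deepparse relies on PyTorch; guarded so this sketch runs anywhere
        except ImportError:
            return "cpu"
        return 0 if torch.cuda.is_available() else "cpu"


    device = pick_device()
    print(device)
    # address_parser = AddressParser(model_type="bpemb", device=device)
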
We can now parse the first 300 addresses of the dataset.

.. code-block:: python

    parsed_addresses = address_parser(addresses_to_parse[0:300])

    # Print one of the parsed addresses
    print(parsed_addresses[0])

When parsing addresses, the parser applies some data quality checks to its input.
First, it validates that no address to parse is empty.
Second, it validates that no address is whitespace-only.
Each of the next two lines raises a ``DataError``.

.. code-block:: python

    address_parser("")  # Raises a DataError: the address is empty
    address_parser(" ")  # Raises a DataError: the address is whitespace-only

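
To make the two checks concrete, here is a minimal sketch of equivalent validation logic that you could run on your own data before parsing. It is not deepparse's implementation: ``validate_addresses`` is a hypothetical helper and ``DataError`` is a local stand-in class defined only so the snippet is self-contained. The empty check comes first because an empty string would also pass the whitespace-only test.

.. code-block:: python

    class DataError(Exception):
        """Local stand-in for deepparse's DataError, so this sketch runs on its own."""


    def validate_addresses(addresses):
        """Hypothetical helper applying the empty and whitespace-only checks described above."""
        if isinstance(addresses, str):
            addresses = [addresses]
        for index, address in enumerate(addresses):
            if address == "":
                raise DataError(f"Address at index {index} is empty.")
            if address.strip() == "":
                raise DataError(f"Address at index {index} is whitespace-only.")
        return addresses


    try:
        validate_addresses(["350 rue des Lilas", " "])
    except DataError as error:
        print(error)  # Address at index 1 is whitespace-only.

    # Dropping invalid entries up front avoids the error entirely.
    raw_addresses = ["350 rue des Lilas", "", "   ", "2325 Rue de l'Universite"]
    clean_addresses = [address for address in raw_addresses if address.strip()]
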
We can also put the parsed addresses into a pandas ``DataFrame`` for analysis. You can choose which fields to include or
use the default ones.

.. code-block:: python

    fields = ['StreetNumber', 'StreetName', 'Municipality', 'Province', 'PostalCode']
    parsed_address_data_frame = pd.DataFrame(
        [parsed_address.to_dict(fields=fields) for parsed_address in parsed_addresses],
        columns=fields,
    )
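
As an example of a first analysis, the sketch below counts how many addresses are missing a postal code. So that it runs without downloading a model, hand-written dictionaries stand in for the ``to_dict(fields=fields)`` output; these values, and the assumption that a tag the parser did not find shows up as ``None``, are hypothetical.

.. code-block:: python

    import pandas as pd

    fields = ["StreetNumber", "StreetName", "Municipality", "Province", "PostalCode"]

    # Hypothetical stand-ins for parsed_address.to_dict(fields=fields).
    records = [
        {"StreetNumber": "350", "StreetName": "rue des lilas", "Municipality": "quebec city",
         "Province": "quebec", "PostalCode": "g1l 1b6"},
        # Assumption: a tag absent from the address is reported as None.
        {"StreetNumber": "2325", "StreetName": "rue de l'universite", "Municipality": "quebec",
         "Province": "qc", "PostalCode": None},
    ]

    # Passing columns=fields keeps the columns in the same order as the chosen fields.
    data_frame = pd.DataFrame(records, columns=fields)

    # Count the rows whose postal code is missing.
    missing_postal_codes = int(data_frame["PostalCode"].isna().sum())
    print(data_frame.shape, missing_postal_codes)
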