GitHub Repository: GRAAL-Research/deepparse
Path: blob/main/examples/parse_addresses.py
# pylint: skip-file
###################
"""
IMPORTANT:
THE EXAMPLE IN THIS FILE IS CURRENTLY NOT FUNCTIONAL
BECAUSE THE `download_from_public_repository` FUNCTION
NO LONGER EXISTS. WE HAD TO MAKE A QUICK RELEASE TO
REMEDIATE AN ISSUE IN OUR PREVIOUS STORAGE SOLUTION.
THIS WILL BE FIXED IN A FUTURE RELEASE.

IN THE MEANTIME, IF YOU NEED ANY CLARIFICATION
REGARDING THE PACKAGE, PLEASE FEEL FREE TO OPEN AN ISSUE.
"""
import pandas as pd

from deepparse import download_from_public_repository
from deepparse.dataset_container import PickleDatasetContainer
from deepparse.parser import AddressParser

# Here is an example of how to parse multiple addresses.
# First, let's download the "predict" dataset from the public repository.
saving_dir = "./data"
file_extension = "p"
test_dataset_name = "predict"
download_from_public_repository(test_dataset_name, saving_dir, file_extension=file_extension)

# Now let's load the dataset using one of our dataset containers.
addresses_to_parse = PickleDatasetContainer("./data/predict.p", is_training_container=False)

# We can take a sneak peek at some addresses.
print(addresses_to_parse[:2])

# Let's use the BPEmb model on a GPU (device 0).
address_parser = AddressParser(model_type="bpemb", device=0)

# We can now parse the first 300 addresses.
parsed_addresses = address_parser(addresses_to_parse[0:300])

# Print one of the parsed addresses.
print(parsed_addresses[0])

# When parsing addresses, some data quality tests are applied to the dataset.
# First, it validates that no addresses to parse are empty.
# Second, it validates that no addresses are whitespace-only.
# Each of the next two calls raises a DataError.
address_parser("")  # Raises a DataError (empty address)
address_parser(" ")  # Raises a DataError (whitespace-only address)

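# The two data quality tests described above can be approximated before calling
# the parser. This is only a minimal sketch under stated assumptions, not
# deepparse's own validation code: `find_invalid_addresses` is a hypothetical
# helper introduced here for illustration, and the sample address is made up.
def find_invalid_addresses(addresses):
    """Return the indices of addresses that are empty or whitespace-only."""
    # str.strip() reduces a whitespace-only string to "", which is falsy,
    # so one test covers both the empty and the whitespace-only case.
    return [index for index, address in enumerate(addresses) if not address.strip()]


# Usage: the entries at indices 1 and 2 are the kind the parser would reject.
invalid_indices = find_invalid_addresses(["350 rue des Lilas Ouest Quebec", "", " "])
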
# We can also put our parsed addresses into a pandas DataFrame for analysis.
# You can choose the fields to use or use the default ones.
fields = ["StreetNumber", "StreetName", "Municipality", "Province", "PostalCode"]
parsed_address_data_frame = pd.DataFrame(
    [parsed_address.to_dict(fields=fields) for parsed_address in parsed_addresses],
    columns=fields,
)