Kernel: Python 3 (Anaconda 2020)
!pip install pandas_read_xml
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: pandas_read_xml in /home/user/.local/lib/python3.7/site-packages (0.3.1)
Requirement already satisfied: pyarrow in /home/user/.local/lib/python3.7/site-packages (from pandas_read_xml) (3.0.0)
Requirement already satisfied: requests in /ext/anaconda2020.02/lib/python3.7/site-packages (from pandas_read_xml) (2.24.0)
Requirement already satisfied: zipfile36 in /home/user/.local/lib/python3.7/site-packages (from pandas_read_xml) (0.1.3)
Requirement already satisfied: distlib in /ext/anaconda2020.02/lib/python3.7/site-packages (from pandas_read_xml) (0.3.1)
Requirement already satisfied: xmltodict in /ext/anaconda2020.02/lib/python3.7/site-packages (from pandas_read_xml) (0.12.0)
Requirement already satisfied: pandas in /ext/anaconda2020.02/lib/python3.7/site-packages (from pandas_read_xml) (1.1.5)
Requirement already satisfied: urllib3>=1.26.3 in /home/user/.local/lib/python3.7/site-packages (from pandas_read_xml) (1.26.4)
Requirement already satisfied: numpy>=1.16.6 in /ext/anaconda2020.02/lib/python3.7/site-packages (from pyarrow->pandas_read_xml) (1.18.5)
Requirement already satisfied: chardet<4,>=3.0.2 in /ext/anaconda2020.02/lib/python3.7/site-packages (from requests->pandas_read_xml) (3.0.4)
Requirement already satisfied: idna<3,>=2.5 in /ext/anaconda2020.02/lib/python3.7/site-packages (from requests->pandas_read_xml) (2.8)
Requirement already satisfied: certifi>=2017.4.17 in /ext/anaconda2020.02/lib/python3.7/site-packages (from requests->pandas_read_xml) (2020.12.5)
Requirement already satisfied: python-dateutil>=2.7.3 in /ext/anaconda2020.02/lib/python3.7/site-packages (from pandas->pandas_read_xml) (2.8.0)
Requirement already satisfied: pytz>=2017.2 in /ext/anaconda2020.02/lib/python3.7/site-packages (from pandas->pandas_read_xml) (2019.3)
Requirement already satisfied: six>=1.5 in /ext/anaconda2020.02/lib/python3.7/site-packages (from python-dateutil>=2.7.3->pandas->pandas_read_xml) (1.14.0)
test_xml = """<?xml version="1.0" encoding="UTF-8"?>
<!-- bookstore.xml -->
<bookstore>
  <book ISBN="0123456001">
    <title>Java For Dummies</title>
    <author>Tan Ah Teck</author>
    <category>Programming</category>
    <year>2009</year>
    <edition>7</edition>
    <price>19.99</price>
  </book>
  <book ISBN="0123456002">
    <title>More Java For Dummies</title>
    <author>Tan Ah Teck</author>
    <category>Programming</category>
    <year>2008</year>
    <price>25.99</price>
  </book>
  <book ISBN="0123456010">
    <title>The Complete Guide to Fishing</title>
    <author>Bill Jones</author>
    <author>James Cook</author>
    <author>Mary Turing</author>
    <category>Fishing</category>
    <category>Leisure</category>
    <language>French</language>
    <year>2000</year>
    <edition>2</edition>
    <price>49.99</price>
  </book>
</bookstore>"""
import pandas_read_xml as pdxi
from pandas_read_xml import flatten, fully_flatten, auto_separate_tables
/ext/anaconda2020.02/lib/python3.7/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.4) or chardet (3.0.4) doesn't match a supported version!
  RequestsDependencyWarning)
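# Parse the XML string, descending into the 'bookstore' root element.
df = pdxi.read_xml(test_xml, ['bookstore'])
df
# First flattening pass over the nested book data.
df = df.pipe(flatten)
df
# Second pass: nested data can need more than one call to flatten.
df = df.pipe(flatten)
df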
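The `book|@ISBN` column name used in the next cell suggests that flattening joins nested keys with `|` and keeps the `@` prefix that xmltodict gives to XML attributes. The following is a minimal standard-library sketch of that naming scheme only. `flatten_record` is a hypothetical helper, not part of pandas_read_xml, and it does not reproduce how the library handles repeated elements or DataFrames.

```python
def flatten_record(value, prefix=""):
    """Flatten nested dicts into one dict whose keys are joined with '|'.

    Hypothetical helper for illustration; not the pandas_read_xml code.
    Non-dict values (including lists) are kept as-is under the joined key.
    """
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            name = f"{prefix}|{key}" if prefix else key
            flat.update(flatten_record(child, name))
    else:
        flat[prefix] = value
    return flat


# A single book in the dict shape xmltodict produces ('@' marks an attribute).
book = {"book": {"@ISBN": "0123456001", "title": "Java For Dummies", "price": "19.99"}}
flat_book = flatten_record(book)
print(flat_book)
```

Each nested key becomes one flat column name, which is why the ISBN attribute is later addressed as `book|@ISBN`.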
# Split the flattened data into separate tables, using the ISBN as the key column.
key_columns = ['book|@ISBN']
data = df.pipe(auto_separate_tables, key_columns)
data.keys()
dict_keys(['author', 'category', 'book'])
data['author']
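To make the idea of a separate `author` table concrete, here is a rough standard-library sketch that builds one row per (ISBN, author) pair from a shortened copy of the bookstore XML. It is an illustration under assumptions, not the output of `auto_separate_tables`: `child_table` and `sample_xml` are hypothetical names, and the library's actual column names and layout may differ.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the bookstore data, kept small for illustration.
sample_xml = """<bookstore>
  <book ISBN="0123456001">
    <title>Java For Dummies</title>
    <author>Tan Ah Teck</author>
  </book>
  <book ISBN="0123456010">
    <title>The Complete Guide to Fishing</title>
    <author>Bill Jones</author>
    <author>James Cook</author>
    <author>Mary Turing</author>
  </book>
</bookstore>"""


def child_table(xml_text, child_tag, key_attr="ISBN"):
    """Return one row per repeated child element, keyed by the book attribute.

    Hypothetical helper; it mimics the idea of a key-linked child table only.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for book in root.findall("book"):
        for child in book.findall(child_tag):
            rows.append({key_attr: book.get(key_attr), child_tag: child.text})
    return rows


author_rows = child_table(sample_xml, "author")
for row in author_rows:
    print(row)
```

A book with several authors contributes several rows that share the same ISBN, so the rows can be joined back to the main book table on that key.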