ubuntu2004
Kernel: Python 3 (Anaconda 2020)
!pip install pandas_read_xml
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: pandas_read_xml in /home/user/.local/lib/python3.7/site-packages (0.3.1)
Requirement already satisfied: zipfile36 in /home/user/.local/lib/python3.7/site-packages (from pandas_read_xml) (0.1.3)
Requirement already satisfied: distlib in /ext/anaconda2020.02/lib/python3.7/site-packages (from pandas_read_xml) (0.3.1)
Requirement already satisfied: pandas in /ext/anaconda2020.02/lib/python3.7/site-packages (from pandas_read_xml) (1.1.5)
Requirement already satisfied: urllib3>=1.26.3 in /home/user/.local/lib/python3.7/site-packages (from pandas_read_xml) (1.26.4)
Requirement already satisfied: requests in /ext/anaconda2020.02/lib/python3.7/site-packages (from pandas_read_xml) (2.24.0)
Requirement already satisfied: xmltodict in /ext/anaconda2020.02/lib/python3.7/site-packages (from pandas_read_xml) (0.12.0)
Requirement already satisfied: pyarrow in /home/user/.local/lib/python3.7/site-packages (from pandas_read_xml) (3.0.0)
Requirement already satisfied: python-dateutil>=2.7.3 in /ext/anaconda2020.02/lib/python3.7/site-packages (from pandas->pandas_read_xml) (2.8.0)
Requirement already satisfied: pytz>=2017.2 in /ext/anaconda2020.02/lib/python3.7/site-packages (from pandas->pandas_read_xml) (2019.3)
Requirement already satisfied: numpy>=1.15.4 in /ext/anaconda2020.02/lib/python3.7/site-packages (from pandas->pandas_read_xml) (1.18.5)
Requirement already satisfied: idna<3,>=2.5 in /ext/anaconda2020.02/lib/python3.7/site-packages (from requests->pandas_read_xml) (2.8)
Requirement already satisfied: certifi>=2017.4.17 in /ext/anaconda2020.02/lib/python3.7/site-packages (from requests->pandas_read_xml) (2020.12.5)
Requirement already satisfied: chardet<4,>=3.0.2 in /ext/anaconda2020.02/lib/python3.7/site-packages (from requests->pandas_read_xml) (3.0.4)
Requirement already satisfied: six>=1.5 in /ext/anaconda2020.02/lib/python3.7/site-packages (from python-dateutil>=2.7.3->pandas->pandas_read_xml) (1.14.0)
test_xml = """<?xml version="1.0" encoding="UTF-8"?>
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="research-article" dtd-version="1.0" xml:lang="en">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">jreligion</journal-id>
      <journal-title-group>
        <journal-title>The Journal of Religion</journal-title>
      </journal-title-group>
      <publisher>
        <publisher-name>The University of Chicago Press</publisher-name>
      </publisher>
      <issn pub-type="ppub">00224189</issn>
      <issn pub-type="epub">15496538</issn>
      <custom-meta-group/>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="jstor-stable">4625926</article-id>
      <article-id pub-id-type="doi">10.1086/522275</article-id>
      <article-id pub-id-type="msid">JR0700335</article-id>
      <title-group>
        <article-title>A Critique of Gordon Kaufman’s Theological Method, with Special Reference to His Theory of Religion</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author" xlink:type="simple">
          <string-name>
            <given-names>Joshua</given-names>
            <x xml:space="preserve"> </x>
            <surname>Braley</surname>
          </string-name>
        </contrib>
        <aff id="aff_1">Santa Fe Community College</aff>
      </contrib-group>
      <pub-date pub-type="ppub">
        <month>01</month>
        <year>2008</year>
        <string-date>January 2008</string-date>
      </pub-date>
      <volume>88</volume>
      <issue>1</issue>
      <issue-id>522213</issue-id>
      <fpage>29</fpage>
      <lpage>52</lpage>
      <permissions>
        <copyright-statement>© 2008 by The University of Chicago. All rights reserved.</copyright-statement>
        <copyright-year>2008</copyright-year>
        <copyright-holder>The University of Chicago</copyright-holder>
      </permissions>
      <self-uri xlink:href="https://www.jstor.org/stable/10.1086/522275"/>
      <custom-meta-group>
        <custom-meta>
          <meta-name>lang</meta-name>
          <meta-value>en</meta-value>
        </custom-meta>
      </custom-meta-group>
    </article-meta>
    <notes notes-type="epigraph">
      <disp-quote>
        <p>But then, you’ll say, God and religion are the same! (Jonathan Edwards)</p>
      </disp-quote>
    </notes>
  </front>
  <back>
  </back>
</article>
"""
import pandas_read_xml as pdxi
from pandas_read_xml import flatten, fully_flatten, auto_separate_tables
/ext/anaconda2020.02/lib/python3.7/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.4) or chardet (3.0.4) doesn't match a supported version! RequestsDependencyWarning)
df = pdxi.read_xml(test_xml, ['article'])
df
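The repeated `flatten` calls that follow suggest that `read_xml` stores the parsed XML as nested Python dicts, and the pip log lists `xmltodict` among its dependencies, so that is presumably the parser; the second argument, `['article']`, appears to be the key path to the element whose contents become the rows. To make that nesting visible, here is a rough stdlib-only sketch of the conversion, assuming xmltodict's default conventions: attributes become `@name` keys, element text becomes `#text` when attributes or children are also present, text-only elements become plain strings, and repeated tags are collected into a list. `element_to_dict` is a hypothetical helper, not part of pandas_read_xml. One difference to keep in mind: ElementTree expands prefixed names such as `xlink:href` into `{http://www.w3.org/1999/xlink}href`, whereas xmltodict keeps the prefix by default, so the sample sticks to un-namespaced tags taken from `test_xml`.

```python
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert an Element into xmltodict-style nested dicts (hypothetical helper).

    Assumed conventions: attributes -> '@name' keys, text -> '#text' when the
    element also has attributes or children, repeated child tags -> a list,
    text-only elements -> a plain string, empty elements -> None.
    """
    node = {f"@{k}": v for k, v in elem.attrib.items()}
    for child in elem:
        value = element_to_dict(child)
        if child.tag in node:
            # A repeated tag: promote the existing value to a list and append.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    text = (elem.text or "").strip()
    if text and node:
        node["#text"] = text
    elif text:
        return text
    return node or None


sample = """<article article-type="research-article">
  <front>
    <journal-meta>
      <issn pub-type="ppub">00224189</issn>
      <issn pub-type="epub">15496538</issn>
    </journal-meta>
  </front>
</article>"""

root = ET.fromstring(sample)
doc = {root.tag: element_to_dict(root)}
print(doc["article"]["front"]["journal-meta"]["issn"])
# [{'@pub-type': 'ppub', '#text': '00224189'}, {'@pub-type': 'epub', '#text': '15496538'}]
```

The two `issn` elements end up as a list of dicts, which is the kind of repeated structure that later becomes its own table.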
df = df.pipe(flatten)
df

df = df.pipe(flatten)
df

df = df.pipe(flatten)
df

df = df.pipe(flatten)
df

df = df.pipe(flatten)
df

df = df.pipe(flatten)
df

df = df.pipe(flatten)
df
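Each `df.pipe(flatten)` call appears to unwrap one more level of nesting, joining child keys onto the parent column name with `|` (the naming visible in `key_columns` and in `data.keys()`). The hypothetical `flatten_once` and `flatten_fully` helpers below sketch that idea on plain lists of row dicts. They are not the library's implementation: they assume only dict-valued cells are expanded and deliberately leave list-valued cells (repeated elements such as the two `issn` entries) untouched rather than guess how pandas_read_xml treats them. The imported `fully_flatten` presumably plays the role of the repeat-until-done loop.

```python
def flatten_once(rows, sep="|"):
    """Expand dict-valued cells by one level into 'parent|child' columns (hypothetical helper)."""
    flat_rows = []
    for row in rows:
        new_row = {}
        for col, value in row.items():
            if isinstance(value, dict):
                # Each child key becomes its own column, prefixed by the parent name.
                for key, inner in value.items():
                    new_row[f"{col}{sep}{key}"] = inner
            else:
                # Scalars and lists are copied unchanged.
                new_row[col] = value
        flat_rows.append(new_row)
    return flat_rows


def flatten_fully(rows, sep="|"):
    """Call flatten_once until no dict-valued cells remain."""
    while any(isinstance(v, dict) for row in rows for v in row.values()):
        rows = flatten_once(rows, sep)
    return rows


rows = [{"front": {"article-meta": {
    "pub-date": {"year": "2008", "string-date": "January 2008"},
    "volume": "88",
}}}]
print(flatten_fully(rows))
# [{'front|article-meta|pub-date|year': '2008', 'front|article-meta|pub-date|string-date': 'January 2008', 'front|article-meta|volume': '88'}]
```

With pandas available, `pd.DataFrame(flatten_fully(rows))` turns the result into a one-row frame with `|`-joined column names such as `front|article-meta|pub-date|string-date`, the column used as the key below.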
key_columns = ['front|article-meta|pub-date|string-date']
data = df.pipe(auto_separate_tables, key_columns)
data.keys()
dict_keys(['journal-meta|issn', 'article-meta|article-id', 'article-meta|contrib-group|contrib', 'article-meta|contrib-group|aff', 'article-meta|custom-meta-group|custom-meta', 'front'])
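`auto_separate_tables` splits the flattened frame into several tables that can be related through the key column. Its exact rules are not shown in this notebook, and the output above also gives single elements such as `aff` a table of their own, so it clearly does more than the sketch here. As a rough illustration of the general normalization idea only, this hypothetical helper moves list-valued columns (repeated elements) into child tables that carry copies of the key columns, and keeps the scalar columns in a parent table. Unlike the library, it keeps full column paths as table names instead of shortened ones like `journal-meta|issn`.

```python
def separate_list_columns(rows, key_columns, sep="|"):
    """Split list-valued columns into child tables keyed by key_columns (hypothetical helper).

    Returns (parent_rows, child_tables): child_tables maps each list-valued
    column name to a list of row dicts that carry copies of the key columns.
    """
    parent_rows = []
    child_tables = {}
    for row in rows:
        keys = {k: row[k] for k in key_columns}
        parent_row = {}
        for col, value in row.items():
            if isinstance(value, list):
                child = child_tables.setdefault(col, [])
                for item in value:
                    child_row = dict(keys)
                    if isinstance(item, dict):
                        # Dict items (e.g. attributes plus text) become columns.
                        child_row.update(item)
                    else:
                        # Scalar items go under the last segment of the column path.
                        child_row[col.split(sep)[-1]] = item
                    child.append(child_row)
            else:
                parent_row[col] = value
        parent_rows.append(parent_row)
    return parent_rows, child_tables


key = "front|article-meta|pub-date|string-date"
rows = [{
    key: "January 2008",
    "front|article-meta|volume": "88",
    "front|journal-meta|issn": [
        {"@pub-type": "ppub", "#text": "00224189"},
        {"@pub-type": "epub", "#text": "15496538"},
    ],
}]
parent, children = separate_list_columns(rows, [key])
print(parent)
print(children["front|journal-meta|issn"])
```

Each `issn` row keeps the `string-date` value, so the child table can be joined back to the parent on that column, which is the role `key_columns` plays in the call above.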
data['article-meta|contrib-group|aff']
data['article-meta|article-id']