# coding: utf-8
# #Parsing VOEvent XML packets with ``voevent-parse``#
# ##Getting started##
# In[ ]:
from __future__ import print_function
import voeventparse as vp
# **IPython Tip #1**: In IPython (terminal *or* notebook) you can quickly check the docstring for something by putting a question mark in front, e.g.
# In[ ]:
# Uncomment the following and hit enter:
# ?vp.load
# Alternatively, you can always [read the docs](http://voevent-parse.readthedocs.org),
# which include autogenerated
# [API specs](http://voevent-parse.rtfd.org/en/master/reference.html#voeventparse.voevent.load).
#
# Ok, let's load up a [voevent (click here to see the raw XML)](voevent.xml):
# In[ ]:
# Binary mode leaves it to the XML parser to handle the file's encoding
with open('voevent.xml', 'rb') as f:
    v = vp.load(f)
# **IPython Tip #2**: We also get tab-completion. Simply start typing the name of a function (or even just the '.' operator) and hit tab to see valid possible options - this is handy for exploring VOEvent packets:
# In[ ]:
# Uncomment the following and hit tab:
# v.
# ##Accessing data##
#
# ###Text-values###
#
#
# **XML Tip #1**:
# An XML packet is a tree-structure made up of [elements](http://www.w3schools.com/xml/xml_elements.asp).
# We can dig into the tree structure of the VOEvent, and inspect values:
# In[ ]:
v.Who.Date.text
# In[ ]:
print("Inferred reason is", v.Why.Inference.Name.text)
print("(A string of length {})".format(len(v.Why.Inference.Name.text)))
type(v.Why.Inference.Name.text)
# ###Attributes###
# **XML Tip #2**:
# Note that there are [two ways](http://www.w3schools.com/dtd/dtd_el_vs_attr.asp) to store data in an XML packet:
# * A single string can be stored as an element's text-value - like the two we just saw.
# * Alternatively, we can attach a number of key-value strings to an element, storing them as [attributes](http://www.w3schools.com/xml/xml_attributes.asp). We can access these via ``attrib``, which behaves like a Python dictionary, e.g.:
# In[ ]:
print(v.attrib['ivorn'])
print(v.attrib['role'])
# In[ ]:
v.Why.Inference.attrib
# ###'Sibling' elements and list-style access###
# So far, each of the elements we've accessed has been the only one of that name - i.e. our VOEvent has only one ``Who`` child-element, and likewise there's only one ``Inference`` under the ``Why`` entry in this particular packet. But that's not always the case; for example, the ``What`` section contains a ``Group`` with two child-elements called ``Param``:
#
# In[ ]:
print(vp.prettystr(v.What.Group))
# So how do we access all of these?
# This is where we start getting into the details of [lxml.objectify syntax](http://lxml.de/objectify.html#the-lxml-objectify-api) (which voevent-parse uses under the hood).
# **lxml.objectify uses a neat, but occasionally confusing, trick: when we access a child-element by name, what's returned behaves like a list**:
# In[ ]:
v.What[0] # v.What behaves like a list!
# However, to save having to type something like ``v.foo[0].bar[0].baz[0]``, the first element of the list can also be accessed without the ``[0]`` operator (aka ['syntactic sugar'](http://en.wikipedia.org/wiki/Syntactic_sugar)):
# In[ ]:
v.What is v.What[0]
# Knowing that it's 'just a list', we have a couple of options. We can iterate:
# In[ ]:
for par in v.What.Group.Param:
    print(par.Description)
# Or we can check the length, access elements by index, etc:
# In[ ]:
len(v.What.Group.Param)
# In[ ]:
v.What.Group.Param[1].Description
# Note that another example of this 'syntactic sugar' is that we can display the text-value of an element without adding the ``.text`` suffix.
#
# However, see below for why it's a good idea to always use ``.text`` when you really do want the text-value of an element:
# In[ ]:
print(v.Why.Inference.Name) # More syntax sugar - if it has a string-value but no children, print the string
print(v.Why.Inference.Name.text) # The safe option
print(v.Why.Inference.Name.text[:3]) # Indexing on the string as you'd expect
print(v.Why.Inference.Name[:3]) # This is indexing on the *list of elements*, not the string!
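# The distinction above can be reproduced on a tiny stand-alone document. This is only a sketch: the XML snippet and the name ``demo`` are made up for illustration (they are not taken from ``voevent.xml``), and it calls lxml.objectify directly rather than going through voevent-parse:
# In[ ]:
from lxml import objectify

# Hypothetical two-element document, just to exercise the access rules
demo = objectify.fromstring(
    "<Why><Name>Fast radio burst</Name><Name>Flare star</Name></Why>")

print(len(demo.Name))  # Number of sibling elements called 'Name'
print(demo.Name.text[:4])  # String slice of the first sibling's text-value
print([el.text for el in demo.Name[:1]])  # Element slice: a list holding the first sibling
print(demo.Name[1].text)  # Index into the siblings, then take the text-value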
# If that all sounds awfully messy, help is at hand: you're most likely to encounter sibling elements under the ``What`` entry of a VOEvent, and voevent-parse has a function to convert that to a nested Python dictionary for you:
# In[ ]:
# Uncomment the following to consult the docstring:
# ?vp.pull_params
# In[ ]:
what_dict = vp.pull_params(v)
what_dict
# In[ ]:
what_dict['source_flux']['peak_flux']['value']
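# Because the result is plain nested dictionaries, ordinary dictionary idioms apply. The sketch below runs on a hand-written stand-in (``example_params``, with made-up values) that assumes the same group -> param -> attribute layout as the lookup above; substitute ``what_dict`` to walk a real packet:
# In[ ]:
# Hypothetical stand-in for the output of vp.pull_params; the values are invented
example_params = {
    'source_flux': {
        'peak_flux': {'value': '0.0015', 'unit': 'Jy'},
        'int_flux': {'value': '2.0e-3', 'unit': 'Jy'},
    },
}

flux_by_param = {}
for group_name, group in sorted(example_params.items()):
    for param_name, attribs in sorted(group.items()):
        # Attribute values are strings, so convert before doing arithmetic
        flux_by_param[param_name] = float(attribs['value'])
        print(group_name, param_name, attribs.get('unit', ''))

print(flux_by_param)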
# ##Advanced##
# Since voevent-parse uses lxml.objectify, the full power of the lxml library is available when handling VOEvents loaded with voevent-parse.
#
# ###Iterating over child-elements###
# We already saw how you can access a group of child-elements by name, in list-like fashion. But you can also iterate over all the children of an element, even if you don't know the names ('tags', in XML-speak) ahead of time:
# In[ ]:
for child in v.Who.iterchildren():
    print(child.tag, child.text, child.attrib)
# In[ ]:
for child in v.WhereWhen.ObsDataLocation.ObservationLocation.iterchildren():
    print(child.tag, child.text, child.attrib)
# ###Querying a VOEvent###
# Another powerful technique is to find elements using XPath or [ElementPath](http://lxml.de/tutorial.html#elementpath) queries. A full treatment is beyond the scope of this tutorial, so we leave you with just a single example:
# In[ ]:
v.find(".//Param[@name='int_flux']").attrib['value']
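# To give a flavour of what else such queries can do without relying on ``voevent.xml``, here is a small stand-alone sketch. The XML fragment is invented for illustration, and it uses the standard library's ``xml.etree.ElementTree`` (which understands the ElementPath expressions used here) rather than a packet loaded with voevent-parse:
# In[ ]:
import xml.etree.ElementTree as ET

# Hypothetical fragment mimicking a 'What' section; the values are made up
fragment = ET.fromstring("""
<What>
  <Group name="source_flux">
    <Param name="peak_flux" value="0.0015" unit="Jy"/>
    <Param name="int_flux" value="2.0e-3" unit="Jy"/>
  </Group>
  <Param name="note" value="demo"/>
</What>
""")

# find() returns the first match (or None); findall() returns every match
int_flux = fragment.find(".//Param[@name='int_flux']")
print(int_flux.attrib['value'])

# Restrict the search to Params inside one named Group
grouped = fragment.findall("./Group[@name='source_flux']/Param")
print([p.attrib['name'] for p in grouped])

# './/' searches at any depth, so this also picks up the top-level Param
all_params = fragment.findall(".//Param")
print(len(all_params))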
# ##Final words##
# Congratulations! You should now be able to extract data from just about any VOEvent packet.
# Note that voevent-parse comes with a few
# [convenience routines](http://voevent-parse.readthedocs.org/en/master/reference.html#module-voeventparse.convenience) to help with common, tedious operations, but you can always compose your own.
#
# If you put together something that you think others could use (or find a bug!), pull requests are welcome.
#
# Next stop: [authoring your own VOEvent](02-authoring.ipynb).