Get the XML document and look it over
- Download the XML document and save it locally.
- Open Oxygen and open the document you just downloaded.
- What is the document?
- (Optional) Download the PDF version of the XML document.
Question: Do the XML and PDF versions appear to be identical?
Note: The XML and PDF documents are available from FDsys, a service of the U.S. Government Publishing Office.
Goal: We want to create an HTML table of contents for the entries in this file to put on our website. We will include content from the section "Public Papers of Barack Obama" and from Appendix A, "Digest of Other White House Announcements". For each entry, the table of contents should list the entry title, entry date, and beginning page number.
Oxygen: Evaluate the XML
- Document > Validate > Check well-formedness. (Alternatively, use the checkmark toolbar button > Check well-formedness.)
- Document > Validate > Validate. (Alternatively, use the checkmark toolbar button > Validate.)
- Question: What is the difference between well-formed and valid XML?
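Outside Oxygen, the well-formedness half of this check can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two sample strings and the function name `is_well_formed` are made up for the example, and the standard-library parser used here checks well-formedness only. It does not validate a document against a DTD or schema, which is a separate step.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical samples: the second one closes <granule> before <item-head>.
good = "<XML><granule><item-head>Remarks</item-head></granule></XML>"
bad = "<XML><granule><item-head>Remarks</granule></XML>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

A document can pass this check and still be invalid, for example if it uses an element that its schema does not allow.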
Oxygen: View the data and its structure
- "Pretty print": Document > Source > Format and Indent
- Scroll through the document.
- It would be nice to have an outline of the document's structure. See the "Outline" box on the left: click an element there and the corresponding element in the document is highlighted. Watch the breadcrumbs as you click. (If the box is not displayed: Window > Show View > Outline.)
- It would also be nice to see how the elements are organized hierarchically and what they contain. See the "Model" box on the right. "The Model view presents the structure of the currently selected tag, and its documentation, defined as annotation in the schema of the current document." (If the box is not displayed: Window > Show View > Model.) Note: #PCDATA stands for parsed character data.
- Question: Click on the elements <item-date> in line 396, <item-head> in line 394, <PRTPAGE> in line 418, and <date> in line 41235 of the file. What do those elements contain?
- Well-formed XML has exactly one root (or document element).
Question: What is the root of this document?
"Definition: There is exactly one element, called the root, or document element, no part of which appears in the content of any other element." (Source: W3C Recommendation, Extensible Markup Language (XML) 1.0.)
Note: The optional XML declaration, which begins with "<?xml", is not an element and is ignored when identifying the root.
- Question: What is the best way to find out what elements are children of the root?
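As a cross-check on what the Outline view shows, the root and its distinct child element names can also be listed programmatically. The sketch below uses a small made-up document (the element names follow the ones used in this exercise, but the content is hypothetical), so it shows the technique rather than the answer for the real file.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature document; the real file is far larger.
sample = """<?xml version="1.0"?>
<XML>
  <PRTPAGE P="1"/>
  <granule><item-head>Example</item-head></granule>
  <PRTPAGE P="2"/>
  <granule><item-head>Another</item-head></granule>
</XML>"""

root = ET.fromstring(sample)

# Iterating over an element yields its direct children only.
child_names = sorted({child.tag for child in root})

print(root.tag)      # XML
print(child_names)   # ['PRTPAGE', 'granule']
```

In Oxygen, the XPath expression /* selects the root element and /*/* selects its children.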
Oxygen: Get to know the data and check it for consistency and correctness using XPath path expressions
"The path expression [...] provides a means of hierarchic addressing of the nodes in an XML tree." Source
- We want to verify that the pages in the XML document are in the expected order. The <PRTPAGE> element appears to contain the page numbers as the value of the attribute "P", e.g. <PRTPAGE P="11"/>. We know that <PRTPAGE> is a child of <XML>. Click anywhere in the document and enter this XPath expression, which is an absolute path, in the XPath box at the top left: /XML/PRTPAGE
Question: How many <PRTPAGE> elements are children of <XML>? (Look in the Results View, the box at the bottom.)
Question: Are the pages in the expected order? Do you see anything that you might want to follow up on?
- We want to know whether there are <PRTPAGE> elements that are not children of <XML>. Use this XPath expression to find all <PRTPAGE> elements in the document: //PRTPAGE
Question: How many are there?
- The page number appears as the value of the attribute P, e.g. <PRTPAGE P="11"/>. We want to know whether all <PRTPAGE> elements have the attribute P: //PRTPAGE[@P]
Question: How many are there? Alternatively use: //PRTPAGE[not(@P)]
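The same counting checks can be sketched in Python for comparison. The document below is a hypothetical toy, not the real file, and Python's standard-library ElementTree supports only a subset of XPath: it has no not() function, so the "missing attribute" check is written as a list comprehension instead. The page-order test assumes that every P value is a plain integer, which may not hold for the real document.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy document with one <PRTPAGE> that lacks the P attribute.
sample = """<XML>
  <PRTPAGE P="1"/>
  <granule>
    <item-head>Example entry</item-head>
    <PRTPAGE P="2"/>
    <PRTPAGE/>
  </granule>
  <PRTPAGE P="3"/>
</XML>"""

root = ET.fromstring(sample)

children_of_root = root.findall("PRTPAGE")       # like /XML/PRTPAGE
anywhere = root.findall(".//PRTPAGE")            # like //PRTPAGE
with_p = root.findall(".//PRTPAGE[@P]")          # like //PRTPAGE[@P]
# ElementTree has no not(), so filter in Python: like //PRTPAGE[not(@P)]
without_p = [el for el in anywhere if "P" not in el.attrib]

# Assumes numeric page values; results are returned in document order.
pages = [int(el.get("P")) for el in with_p]
in_order = pages == sorted(pages)

print(len(children_of_root), len(anywhere), len(with_p), len(without_p))
print(pages, in_order)
```

If the counts for //PRTPAGE and //PRTPAGE[@P] differ, the elements returned by the not(@P) variant are the ones to follow up on.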
- We want to see only the <PRTPAGE> elements that are children of <granule>.
Question: What part of the document do you think is included in the element <granule>?
- One way is to use an absolute path: /XML/granule/PRTPAGE
- Another way is to click on the <granule> element in the document and use a relative path as the XPath expression: PRTPAGE
- The result of the XPath expression /XML/PRTPAGE is the element <PRTPAGE P="62"/>. Write an XPath expression to find the 14th <PRTPAGE> element that is a child of <XML>.
- Question: What are the results of each of the following XPath expressions?
- /XML/PRTPAGE[@P = "13"]
- Question: We want to create a mashup of the individual dated entries in the XML document, as described in our goal above. At least two elements in the XML document contain dates: <item-date> (used in the section "Public Papers of Barack Obama") and <date> (used in Appendix A, "Digest of Other White House Announcements"). What potential problems might we run into in creating the mashup? Hint: //item-date and //date.
Question: Wouldn't it be easier if we just converted all <item-date> elements to <date>?
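One way to look for such problems is to compare the "shape" of the text in each kind of date element. The sketch below is an assumption-laden illustration: the sample dates, the <appendix> wrapper element, and the helper names `shape` and `date_shapes` are all hypothetical, and the real file's date formats may differ from these. It collapses runs of letters to "A" and runs of digits to "9" so that differently formatted values stand out.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical toy data; <appendix> is an invented wrapper element.
sample = """<XML>
  <granule><item-date>January 20, 2009</item-date></granule>
  <granule><item-date>January 21, 2009</item-date></granule>
  <appendix>
    <date>January 20</date>
    <date>January 21</date>
  </appendix>
</XML>"""


def shape(text):
    """Collapse letter runs to 'A' and digit runs to '9' to expose the format."""
    collapsed = re.sub(r"[A-Za-z]+", "A", text.strip())
    return re.sub(r"[0-9]+", "9", collapsed)


def date_shapes(root, tag):
    """Count the distinct text shapes of every element named tag."""
    return Counter(shape(el.text or "") for el in root.iter(tag))


root = ET.fromstring(sample)
item_date_shapes = date_shapes(root, "item-date")
plain_date_shapes = date_shapes(root, "date")

print(item_date_shapes)
print(plain_date_shapes)
```

In this toy data the two elements use different shapes, which is the kind of mismatch that would have to be reconciled before the entries could be merged and sorted by date.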
Oxygen: Transform the XML using XSLT
Use XSLT to clean the data, subset it, create XHTML, etc.
- We will output the text content of the elements <item-head> and <item-date> and the value of the attribute P in the element <PRTPAGE>. To do this, we will transform the XML using XSLT (Extensible Stylesheet Language Transformations).
- We will also need to output HTML and CSS that can be used by the browser to lay out and style the text that we are outputting. XSLT can do that too.
- Within the XSLT, we will use XPath path expressions to locate the data we want to output.
- Question: What are the advantages of storing the data in XML and using XSLT to transform it?
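To make the select-and-output idea concrete before writing a stylesheet, here is a rough sketch of the same steps in Python. It is not XSLT: Python's standard library has no XSLT processor, so ElementTree path expressions stand in for the XPath that the stylesheet would use. The sample document, the placement of <PRTPAGE> inside each <granule>, and the function names `toc_rows` and `to_html` are assumptions made for the illustration and may not match the real file.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical toy document; the real structure may place <PRTPAGE> elsewhere.
sample = """<XML>
  <granule>
    <PRTPAGE P="1"/>
    <item-head>First example entry</item-head>
    <item-date>January 20, 2009</item-date>
  </granule>
  <granule>
    <PRTPAGE P="5"/>
    <item-head>Second example entry</item-head>
    <item-date>January 21, 2009</item-date>
  </granule>
</XML>"""


def toc_rows(root):
    """Collect (title, date, page) for each <granule> child of the root."""
    rows = []
    for granule in root.findall("granule"):
        title = granule.findtext("item-head", default="").strip()
        date = granule.findtext("item-date", default="").strip()
        page = granule.find(".//PRTPAGE[@P]")  # first page marker with a P value
        rows.append((title, date, page.get("P") if page is not None else ""))
    return rows


def to_html(rows):
    """Render the rows as a simple HTML table, escaping the text content."""
    body = "\n".join(
        f"<tr><td>{escape(t)}</td><td>{escape(d)}</td><td>{escape(p)}</td></tr>"
        for t, d, p in rows
    )
    header = "<tr><th>Title</th><th>Date</th><th>Page</th></tr>"
    return f"<table>\n{header}\n{body}\n</table>"


rows = toc_rows(ET.fromstring(sample))
html_table = to_html(rows)
print(html_table)
```

An XSLT stylesheet expresses the same logic declaratively, with a template that matches each entry and outputs one table row.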
Unless otherwise stated, quotations are from the Oxygen help system.