Title: Building EDEN NonXML Files November 2005
1Building EDEN Non-XML FilesNovember 2005
U.S. DEPARTMENT OF EDUCATION
2Agenda
- ESS Introduction A brief overview of the
features of ESS. - ESS Databases The path data takes from
submission to the data repository. - File formats High level overview of available
file formats. - File Specification Review Step through File
Specifications. - Order of File Submission Importance of
Directory file and managing data identifiers. - Review of Errors and Edits Sharing of solutions
to common problems. Open session with group to
discuss their states issues. - SEA Tools - What do you use? Open session for
states to share how they prepare files. Review
successes and failures.
3ESS Sub-Applications
Access to ESS Home Page
Access to transmit data files
Access to view status of transmitted files.
Including format and validation errors,
reasonability edit warnings.
Access to view submission status by SEA, LEA and
School.
This sub-application is available to view and
verify data at a State, LEA, or School level.
This area allows you to change your password as
needed.
4ESS Databases
5System Components
- Data Receipt and Processing System (DRPS)
- Receives files and checks for errors
- Staging Database
- Performs reasonability edits
- Data Repository
- Responsibility of data transferred to ED
6Data Receipt and Processing System
- Manages transmission of files from the SEAS to
EDEN - Receives files from SEA
- Performs format and validation edits
- Generates submission and error reports
- Emails file receipt and status to submitter
- If a file contains any format or validation
errors, its data does not go beyond this point
7Staging Database
- Contains data from files that passed format and
validation edits - Contents can be viewed from the Education Unit
Profile - Performs reasonability edits on submitted data
- Data requiring explanations from the SEA does not
move past this point - Before leaving this stage, ED will contact SEA
regarding any data quality concerns
8Data Repository
- Contains cleaned and edited files
- Responsibility of the data has transferred from
SEA to ED - SEA will receive notice 96 hours before data is
moved to DR - Additional review of data in the DR performed by
ED subject matter experts before data is used or
made available to public
9(No Transcript)
10 11File Formats
- Fixed Format (.txt)
- Comma-delimited Format (.csv)
- Tab-delimited Format (.tab)
- XML Format (.xml)
12- Fixed Format
- Each record has a pre-defined length
- Each field in a record is assigned a beginning
position and has a pre-defined length - All fields must be included in a record, whether
used or not - Filler fields and optional fields for which no
value is reported should be blank in fixed
position records - Unused positions of a field should be blank
13- Fixed Format
- For a fixed format file, the external file name
in the header record must have a file extension
of .txt - Numeric fields must not contain commas
- Data in decimal fields must contain the decimal
point (e.g., 1.123)
14 15- Comma-delimited Format
-
- Similar to the fixed format for order and format
of each field in a record - Each field is separated by a comma instead of
beginning and ending in a specified position - All fields, including filler fields must be
accounted for
16- Comma-delimited Format
- Except for the last field in a record, each field
whether valued or not, must be designated by a
comma, including filler fields - Example ( indicates CRLF)
- First Field Three Bears School
- Second Field Filler
- Third Field 123 Main Street
- Last Field Filler
- THREE BEARS SCHOOL,,123 MAIN STREET,
17- Comma-delimited Format
- When the last field in a record is a filler, the
record will appear to end with a comma, but it
actually ends with a blank filler field - For valued fields, the comma should immediately
follow the data value, except for the last field
in the record - For a comma-delimited file, the external file
name in the header record must have a file
extension of .csv
18- Comma-delimited Format Examples
- Header Record Field Content
- Field Name Format Data Value
- File Type STRING LEA DIRECTORY INFO
- Total Records in File NUMBER 123
- Filename STRING EULEADIRECTORYVer0001.csv
- File Identifier STRING LEA Directory Example
- File Reporting Period STRING 2003-2004
- LEA DIRECTORY INFO,123,EULEADIRECTORYVer0001.csv,L
EA Directory Example, - 2003-2004
19- Comma-delimited Format Examples
20- Tab-delimited Format
-
- Almost identical to comma-delimited format except
for the following differences - Each field is separated by a tab character
instead of a comma - Blank fields and fillers need only be designated
with a tab character
21- Tab-delimited Format
-
- For a tab-delimited file, the external file name
in the header record must have a file extension
of .tab - When the last field in a record is a filler, the
record will appear to end with a tab character,
but it actually ends with a blank filler field
22- Tab-delimited Format
- Except for the last field in a record, each field
whether valued or not, must be designated by a
tab character, including filler fields - Example (gt indicates a tab character -
indicates CRLF) - First Field Three Bears School
- Second Field Filler
- Third Field 123 Main Street
- Last Field Filler
- THREE BEARS SCHOOLgtgt123 MAIN STREETgt
23- Tab-delimited Format Examples
- Header Record Field Content
- Field Name Format Data Value
- File Type STRING LEA DIRECTORY INFO
- Total Records in File NUMBER 123
- Filename STRING EULEADIRECTORYVer0001.tab
- File Identifier STRING LEA Directory Example
- File Reporting Period STRING 2003-2004
- LEA DIRECTORY INFOgt123gtEULEADIRECTORYVer0001.tabgtL
EA Directory Examplegt - 2003-2004
24- Tab-delimited Format Examples
25- Extensible Markup Language - XML
- XML is a standard for creating documents that
describe and apply a structure to data - XML document are self-describing making them
readable by both humans and computers - Extensible means there not a fixed set of
elements like HTML. Authors can define their own
tags. - A common format for exchanging data over the Web.
26- Example of XML
- ltFILETRANSMISSION FILELAYOUTTYPE"SEA MEMBERSHIP
TABLE" - FILEID"SEA MEMBERSHIP 15 OCT 2005"
- SCHOOLYEAR"2005-2006"gt
- ltAGENCY FIPSSTATECODE"99" STATEAGENCYIDNUMBER
"01" - STATELEAIDNUMBER"518"gt
- ltTABLETYPE TYPEABBRV"MEMBER"
TOTALINDICATOR"N"gt - ltCATEGORY TYPE"GRADELVMEM"
VALUE"03"/gt - ltCATEGORY TYPE"RACEETHNIC"
VALUE"AS"/gt - ltCATEGORY TYPE"GENDER" VALUE"F"/gt
- ltAMOUNTgt456lt/AMOUNTgt
- ltEXPLANATION/gt
- lt/TABLETYPEgt
- lt/AGENCYgt
- lt/FILETRANSMISSIONgt
27 File Specifications
28File Specifications
- Contains layout of all non-XML file types
- 83 separate specifications- 1 for each file type
- All have similar format
- Primary reference point for preparing and
submitting files - Revised for the 2005-2006 collection period
- Current specifications available at
http//www.ed.gov/about/inits/ed/pbdmi/file-specif
ications.html
29Section 1.0- Purpose
- Provides an overview of what data is submitted in
the file - Lists changes from previous school years file
specifications - Clarifies the set of Educational Units from which
the data should be collected - The Category Set table is now located in this
section - Easier to determine requirements of a file at a
glance
30Section 1.0- Purpose, contd
- Categories
- Lists the data groups that make up a specific
table entry - Details which fields should be populated for any
given record in a file
31Section 2.0- File Naming Convention
- Describes naming convention for consistency
- File names can be no longer than 25 characters
- Not strictly necessary, but strongly recommended
32Section 3.0- Guidance for Submitting this File
- Helpful hints and frequently asked questions
appear in this section - Information is specific to each particular file
specification - Clarifies when zero rows need to be submitted
33Sections 4-6 File Level Specifications
- Sections 4, 5, and 6 provide the technical
details needed to prepare the SEA, LEA, and
School level files - If data is not collected at a given level, that
level is left out - Includes tables listing the data groups and their
permitted values - Consists of 3 subsections
- Header Record Definition
- Data Record Definition
- Examples
34Sections 4-6 File Level Specifications, contd
- The Header and Data Record tables contain 7
columns - Data Group Name
- Start Position
- Length
- Type
- Mandatory/Optional
- Definition/Comments
- Permitted Values
35Sections 4-6 File Level Specifications, contd
- Data groups are listed in the order they should
exist within each record - Start position
- Indicates the starting point of a data group
measured in characters from the beginning of a
line - Only applies to fixed-width files
- Length
- The maximum number of characters for a data group
value - Values exceeding this length can cause format
errors - Applies to all non-XML file types
36Sections 4-6 File Level Specifications, contd
Types
- String
- Can consist of any alphanumerical character
- Leading zeros are significant in numbers
expressed as strings - Number
- Can consist of any integer value -1 or greater
- Leading zeros are not required
- Decimal
- Used to report percentages
- Consists of a digit followed by 4 decimal places
(0.8355 83.55)
37Mandatory vs. Optional
- An M in the Mandatory/Optional column indicates
that the field is mandatory and must contain some
data for each record within the file - An O in the Mandatory/Optional column indicates
that the field is Optional and data is not
required for that field - Optional fields may still be required in
accordance with the category chart of the file
specification
38Definition, contd
- Definition/Comments- gives a brief explanation of
the data group along with additional information
on preparing the file - Permitted Values- lists the only accepted values
for a given field - If blank, permitted values must be consistent
with the data type
39Examples
- The examples section includes an excerpt of
properly formatted file in each of the 3 non-XML
formats
40Order of File Submissions
41- 05-06 Transmittals
- Importance of Directory Files
- Relationships Among Files
42- Timing for 2005-2006 EDEN Data Submission
- The submission time frame for School Year
2005-2006 data begins in January 2006. - SEAs should begin submitting data as soon as the
data is available after January 1, 2006. - See Workbook
43- Importance of Directory Files
- Prior to any other data being submitted for an
education unit, a complete and correct Directory
record must be in place - SEA Directory data must be in place prior to
submission of LEA Directory data - LEA Directory data must be in place prior to
submission of School Directory data
44- Importance of Directory Files
- Directory and Grades Offered files are not school
year specific, but continue from year to year - They only need to be submitted when data changes
or if data for the education unit does not exist
in the database - Operational Status is now collected with the
directory file.
45- Relationships Among Files
- A Membership record (or any other non-directory
record) for a school will not be accepted unless
a record for that school exists in the directory
file - Grades Offered files should be submitted for any
LEA or School which has students enrolled
46- Updating Directory Files
- Submitting a transmittal file with only an
additional or corrected education unit is ok, or
the entire directory transmittal file may be
resubmitted - If the entire Directory transmittal file is
submitted, be sure to include the additional
and/or corrected education unit(s) - Submit grades offered for the additional
education unit - Submit membership data for the additional
education unit
47- Directory Files Identifiers
- State Identifiers are the primary identifiers
used to match a submitted education unit
directory record to an education unit directory
record on the EDEN database - NCES Identifiers may be provided, but are only
used a secondary check
48- Identifiers for SEA Data
- FIPS State Code and State Agency Number
- State Agency Number (default 01)
49- Identifiers for LEA Data
- FIPS State Code
- State Agency Number
- State-assigned LEA ID
50- Identifiers for School Data
- FIPS State Code
- State Agency Number
- State-assigned LEA ID
- State-assigned School Identifier
51- Adding/Updating an SEA
- FIPS/Agency Match - FIPS Code and State Agency
Number in submission match the FIPS Code and
State Agency Number on EDEN - Treated as update - Submission changes applied
- FIPS Match Agency Not Match - FIPS Code in
submission matches FIPS Code in EDEN - State
Agency Number in submission does not match State
Agency Number in EDEN - Treated as new state agency Agency added
52- Adding/Updating an LEA
- Primary IDs Match, No Secondary IDs/Secondary IDs
Match - Treated as update - Submission LEA changes
applied - State SEA IDs Match, LEA ID Not Match, No
Secondary IDs - Treated as new LEA added to EDEN
- State IDs Match, LEA ID Not Match, Secondary IDs
Match - Error (An attempt is being made to associate 2
different State LEA IDs to a single NCES ID.)
53- Modifying State Identifiers
- It is not possible to modify a States ID by
submitting an updated directory file - EDEN will see the modified ID as a new unit and
add it to the system, resulting in duplicate
records
54- Adding/Updating a School
- The same logic for identifiers applies at the
school level
55EDEN Edits, Errors, and Warnings
56 - Format Edits Errors
- Validation Edits Errors
- Reasonability Edits Errors
57- Format Edits and Errors
- Definition transmitted file must be in an
allowable specified format - Purpose If the transmitted file is not in an
allowable specified format, DRPS cannot interpret
it - Result If the file is acceptably formatted,
Validation Edits are performed. If not, the file
is rejected and an email will be sent providing a
link to the report. - A single format error will result in rejection
58- Invalid File Format
- Message Data is not in correct fixed (txt)
file format. - Message Data is not in correct delimited
(csv/tab) file format.
59- Validation (Hard) Edits and Errors
- Definition Identification of submissions where
invalid data has been used for a field where
specific values or a domain has been specified - Purpose To ensure that the data stored in the
Data Repository can be interpreted according to
agreed upon rules - Result Any validation error in any submission
will cause the entire file to be rejected.
However, all validation errors will be identified
up to a threshold - High Error Rates currently 1000 or more errors
identified will result in the termination of
validation edit processing - If a file contains no validation errors, any
applicable reasonability edits will be performed.
All submissions in the Transmittal will be
loaded into the Staging Database.
60- Most Common Validation Error 1
- Message The Category Set combination is invalid
for Table Type ltTable Type Namegt for the reported
ltcount or subtotalgt. - Cause A count or sub-total for an invalid
combination of Categories has been provided. For
example, a Membership student count was provided
only for grade level, where the Race/Ethnicity
and Gender are also required. In the case of a
Membership sub-total, a subtotal was provided for
all 3 Categories (Grade Level, Race/Ethnicity and
Gender) where each sub-total required only
contains 2 Categories each. - Resolution Refer to the appropriate file
specifications document for the correct Category
Sets and their valid combinations for a count or
subtotal for the table type.
61- Most Common Validation Error 2
- Message "Submitted State ID is not found in
EDEN." - Cause An attempt was made to submit metric
information (e.g., Membership) for an LEA or
school that does not exist in the EDEN Staging
Database. - Resolution Verify that the State School
Identifier in the membership file submission is
correct. If the State Identifier in the
membership file is correct, submit a directory
record for the school and then resubmit the
membership file. Refer to the EDEN Workbook for
a discussion on "Relationships Among Files."
62- Most Common Validation Error 3
- Message "The Category Code ltvaluegt, which was
submitted for the reported ltTable Type Namegt, is
not a Permitted Code." - Cause An invalid Category Permitted Value is
provided. - Resolution Refer to the appropriate file
specifications document for the permitted code
values that can be reported for each Category in
a Category Set for the data group in error.
63- Most Common Validation Error 4
- Message The State Agency Number is not found in
EDEN Directory. - Cause The State Agency Number in a submission
file cannot be found in the EDEN data base. - Resolution Ensure that the State Agency Number
for the FIPS State Code being submitted is
correct.
64- Most Common Validation Error 5
- Message Sub-total missing for ltTable Type Namegt
Table Type. - Cause A required sub-total is not provided for a
count type. - Resolution Refer to the appropriate file
specifications document for the correct Category
Sets and their valid combinations for a sub-total
for this Data Group.
65- Most Common Validation Error 6
- Message Field DisplayName contains value that
is not part of the valid set. - Cause The permitted value specified in the
displayed field name is not valid. - Resolution Correct the permitted value in error.
Refer to the appropriate file specifications
document for valid permitted values.
66- Most Common Validation Error 7
- Message The table type abbreviation does not
exist. - Cause An invalid Table Type Name was provided in
the Header Record. - Resolution Ensure that the Table Type Name is
not misspelled. (e.g., MUMBER instead of
MEMBER). In Fixed Format (.txt) files, ensure
that the Table Type Name is left-justified in the
field. Refer to the File Format Specifications
for the Table Type being submitted for the valid
Table Type Name.
67- Most Common Validation Error 8
- Message The Grand Total for ltTable Type Namegt
is less than the sum of its parts. - Cause A grand total was provided that was less
than the sum of the student counts for the
Category Set being reported. For example, the
grand total for the MEMBER Table Type is a total
of the Grade Level, Race Ethnicity and Gender
student counts for each education unit being
reported (i.e., a grand total is reported for
each education unit and must be equal to or
greater than the sum of the student counts for
that education unit). - Resolution Correct the grand total for the Table
Name and Category Set in question and resubmit
the file.
68- Most Common Validation Error 9
- Message Grand Total is missing for ltTable Type
Namegt Table Type. - Cause A required grand total is not provided for
a count type. - Resolution Refer to the appropriate file
specifications document for the correct Category
Sets and their valid combinations for a grand
total for this Data Group.
69- Most Common Validation Error 10
- Message The LEA ID is invalid for this State.
- Cause In a submission for an LEA, the
combination of State FIPS Code, State Agency
Number and LEA State Identifier cannot be found
in the EDEN Staging Database. - Resolution Add the LEA Directory information as
appropriate and then resubmit the school
Directory.
70- Reasonability Edits
- Definition Identification of submissions where
a questionable data has been submitted - Purpose While the data may fall outside normal
expectation there may be reasonable explanations
for it. An explanation is supplied to save
future communication between ED analysts or other
interpreters of the same data in the future - Result
- All submissions in the file are loaded into the
Staging Database but those submissions where
reasonability edits have failed must be
explained. - Such submissions will not be transferred to the
Data Repository until a reasonable explanation
has been approved
71 - Reasonability Warning 1
- Message School Membership count changed
significantly from last year. - Cause The student membership school count for a
count of more than 10 students has an increase or
decrease of more than 50 in the student count
for that category group. - Resolution The State should verify that the
count in question has been reported correctly and
provide an explanation accordingly.
72 - Reasonability Warning 2
- Message School Membership subtotal changed
significantly from last year. - Cause The submitted student membership sub-total
for a school has an increase or decrease of more
than 25 than the prior year subtotal. - Resolution The State should verify that the
sub-total in question has been reported correctly
and provide an explanation accordingly.
73 - Reasonability Warning 3
- Message LEA Membership count changed
significantly from last year. - Cause The student membership LEA count for a
submitted count of more than 25 students has an
increase or decrease of more than 25 than the
prior year student count for that category group.
- Resolution The State should verify that the
count in question has been reported correctly and
provide an explanation accordingly.
74Errors and Edits
- Questions concerning specific errors?
- Questions concerning problems or concerns with
regards to transmittals?
Open Session
75SEA Tools
- What the tools, programs your states uses to
build your EDEN Files? - Share any failures?
- Share any successes?
Open Session
76Recap Objectives
- Provided a basic understanding of ESS and its
sub-applications. - Provided an understanding of the data file
submission lifecycle. - Provided an in-depth knowledge of the File
Specifications and how to read them. - Provided time to learn from other SEA Partners
concerning internally used tools and processes.