Is it possible to configure a fixed length window to use a variable length? #90

Open
David-Jacobsen opened this issue Oct 19, 2022 · 3 comments


@David-Jacobsen

I know the question sounds stupid, but I'm facing a challenge where some record formats are variable length.

Essentially, I have a file with multiple record formats. Most are fixed length with a space between each field; however, there is also a comment record type. A comment record is identified by starting with COM, and the comment is the remainder of the line. For example:

COM This is the comment
COM This is also a comment

Those lines are not space padded up to a defined length.

I've used the FixedLengthTypeMapperSelector to be able to map the various fixed length fields to their own classes, but I'm struggling to figure out a configuration to accommodate the comment record types. In reading the code, it looks like this might be possible as is with the Trailing window, but I'm not sure how to configure it.

@jehugaleahsa
Owner

Take a look at this section in the README: https://github.com/jehugaleahsa/FlatFiles#skipping-records Basically, you add an event handler to the reader that lets you look at and skip a row before it gets processed. Let me know if you run into any issues.

@David-Jacobsen
Author

David-Jacobsen commented Oct 19, 2022

Thank you, that's what I was doing initially but I was informed I need to load the comment records as well. It's a bit of a mess and I don't have control over the source format, nor the destination format.

Sample file:

COM This class has had multiple drop outs
COM This class was impacted by poor accessibility to computer lab
STU 897654 098 095
STU 876532 070 074
END

I have to load each record as the following entity:

class ClassData
{
 public string RecordType {get;set;}
 public string StudentId {get;set;}
 public string MidTerm {get;set;}
 public string Final {get;set;}
 public string Comment {get;set;}
}

So, I have a List<ClassData>, and I'm using FlatFiles to parse each record of each file and add it to the list.

var STUMapper = FixedLengthTypeMapper.Define<ClassData>();
STUMapper.Property(s => s.RecordType, new Window(4));
STUMapper.Property(s => s.StudentId, new Window(7)); // 6 digits plus the separating space, so MidTerm and Final stay aligned
STUMapper.Property(s => s.MidTerm, new Window(4));
STUMapper.Property(s => s.Final, new Window(3));

var ENDMapper = FixedLengthTypeMapper.Define<ClassData>();
ENDMapper.Property(e => e.RecordType, new Window(3));

var COMMapper = FixedLengthTypeMapper.Define<ClassData>();
COMMapper.Property(c => c.RecordType, new Window(4));
COMMapper.Property(c => c.Comment, new Window(???));

var selector = new FixedLengthTypeMapperSelector();
selector.When(x => x.StartsWith("STU")).Use(STUMapper);
selector.When(x => x.StartsWith("COM")).Use(COMMapper);
selector.When(x => x.StartsWith("END")).Use(ENDMapper);

var records = new List<ClassData>();
using var reader = new StreamReader(file);
var fixedReader = selector.GetReader(reader);
while (await fixedReader.ReadAsync().ConfigureAwait(false))
{
 records.Add((ClassData)fixedReader.Current);
}

If I set the window for the comment arbitrarily small, it works, but it obviously truncates the comments. If I skip the comment records, it also works, but then the comments are never loaded. If I treat the file as delimited, the space separator would split every comment into multiple fields... unless a DelimitedTypeMapper has the ability to 'combine' all trailing fields into a single field?

Edit:
I actually do have a very janky workaround, but it will likely come at the cost of performance. Essentially, I scan the file twice: once using the FixedLengthTypeMapperSelector, which parses all of the fixed-length records, and then again as a DelimitedTypeMapper with "COM " as the delimiter, hardcoding the RecordType to "COM"...

var comOptions = new DelimitedOptions();
comOptions.Separator = "COM ";
var comMapper = DelimitedTypeMapper.Define<ClassData>();

comMapper.Property(p => p.RecordType).OnParsed((o,t) => o.RecordContext.Values[0] = "COM");
comMapper.Property(p => p.Comment);

The FixedLengthTypeMapperSelector will skip all records starting with "COM", and the DelimitedTypeMapper will skip all records with Values.Length < 2.

Again, it's really janky but it works unless you have a better idea.
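
For comparison, the "combine all trailing fields" idea can be sketched in plain .NET, without FlatFiles: split each line on the first space only, so everything after the record type stays in one piece. This is only an illustration of the idea, not a FlatFiles feature; the TrailingFieldSketch class and SplitRecord helper are hypothetical names.

```csharp
using System;

public static class TrailingFieldSketch
{
    // Hypothetical helper: returns the record type and the untouched remainder of the line.
    public static (string RecordType, string Rest) SplitRecord(string line)
    {
        // A count of 2 stops splitting after the first space,
        // so any further spaces remain inside the second element.
        string[] parts = line.Split(new[] { ' ' }, 2);
        string rest = parts.Length > 1 ? parts[1] : string.Empty;
        return (parts[0], rest);
    }

    public static void Main()
    {
        var (type, rest) = SplitRecord("COM This is the comment");
        Console.WriteLine($"{type}|{rest}"); // prints "COM|This is the comment"
    }
}
```

A line with no space at all, such as END, comes back with an empty remainder, so the same helper covers every record type in a single pass.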

@David-Jacobsen
Author

Not sure what's preferred, editing the original or adding another comment, but I believe I have found a solution with Window.Trailing.

var COMMapper = FixedLengthTypeMapper.Define<ClassData>();
COMMapper.Property(c => c.RecordType, new Window(4));
COMMapper.Property(c => c.Comment, Window.Trailing);

Initial tests have this working. Thanks for such an excellent and configurable product!
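
Putting the pieces from this thread together, a single-pass version might look like the sketch below. It uses only the FlatFiles calls that appear above (FixedLengthTypeMapper.Define, Window, Window.Trailing, FixedLengthTypeMapperSelector, GetReader), plus the synchronous Read() in place of ReadAsync(). The namespaces, the TrailingWindowSketch class name, and the window widths are assumptions: the widths are chosen so that each window includes the space separating the fields in the sample file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using FlatFiles;
using FlatFiles.TypeMapping;

public class ClassData
{
    public string RecordType { get; set; }
    public string StudentId { get; set; }
    public string MidTerm { get; set; }
    public string Final { get; set; }
    public string Comment { get; set; }
}

public static class TrailingWindowSketch
{
    public static List<ClassData> Parse(TextReader input)
    {
        // STU records: every window except the last includes the separating space.
        var stuMapper = FixedLengthTypeMapper.Define<ClassData>();
        stuMapper.Property(s => s.RecordType, new Window(4));
        stuMapper.Property(s => s.StudentId, new Window(7));
        stuMapper.Property(s => s.MidTerm, new Window(4));
        stuMapper.Property(s => s.Final, new Window(3));

        // COM records: a fixed prefix, then whatever is left on the line.
        var comMapper = FixedLengthTypeMapper.Define<ClassData>();
        comMapper.Property(c => c.RecordType, new Window(4));
        comMapper.Property(c => c.Comment, Window.Trailing);

        // END records: only the record type.
        var endMapper = FixedLengthTypeMapper.Define<ClassData>();
        endMapper.Property(e => e.RecordType, new Window(3));

        // Pick a mapper per line based on its prefix.
        var selector = new FixedLengthTypeMapperSelector();
        selector.When(x => x.StartsWith("STU")).Use(stuMapper);
        selector.When(x => x.StartsWith("COM")).Use(comMapper);
        selector.When(x => x.StartsWith("END")).Use(endMapper);

        var records = new List<ClassData>();
        var reader = selector.GetReader(input);
        while (reader.Read())
        {
            records.Add((ClassData)reader.Current);
        }
        return records;
    }

    public static void Main()
    {
        string sample = string.Join("\n",
            "COM This class has had multiple drop outs",
            "STU 897654 098 095",
            "END");
        foreach (var r in Parse(new StringReader(sample)))
        {
            Console.WriteLine($"{r.RecordType}|{r.StudentId}|{r.MidTerm}|{r.Final}|{r.Comment}");
        }
    }
}
```

Because all three record types are handled by one selector, this avoids the second scan of the file described in the earlier workaround.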
