CSV Parser 2.1.0 (#131)
* Update csv_row.cpp

* Simplified handling of quoted fields

* Added tokenizer

* Some minor code clean up

* Update raw_csv_data.hpp

* Some more code clean up

* Added thread safety test

* read_csv() now makes calls that are aligned to the CSV

* Added more CSV writer specializations

* Update test_round_trip.cpp

* Update test_round_trip.cpp

* Simplified CSVWriter

* Update csv_writer.hpp

* Updated docs

* Some code clean up

* Changed how custom types are serialized

* Update csv_writer.hpp

* Attempt to make BasicCSVParser no-copy

* Created different basic_csv_parser specializations

* Got CSVGuessing working again

* Got round trip test working again

* Got more tests working again

* Fixed some failing tests

* Most unit tests working again

* CSVStat no longer inherits from CSVReader

* Update single header

* Code clean up

* More code clean up

* Fix CSVStat segfault

* Fixed bug in parse_loop() & c++11 compatibility issues

* Fixed g++-6 compatibility

* Added some optimizations

* Code clean up

* Added CSVReader constructor over an ifstream

* Attempt to fix ifstream parsing

* Fixed std::ifstream parsing

* Added more comments

* Fixed a bunch of warnings

* Last update of docs

* Update README.md
vincentlaucsb authored Oct 18, 2020
1 parent 9269423 commit 621a9d9
Showing 39 changed files with 4,733 additions and 3,957 deletions.
39 changes: 35 additions & 4 deletions README.md
@@ -8,7 +8,8 @@
* [Single Header](#single-header)
* [CMake Instructions](#cmake-instructions)
* [Features & Examples](#features--examples)
* [Reading a Large File (with Iterators)](#reading-a-large-file-with-iterators)
* [Reading an Arbitrarily Large File (with Iterators)](#reading-an-arbitrarily-large-file-with-iterators)
* [Memory-Mapped Files vs. Streams](#memory-mapped-files-vs-streams)
* [Indexing by Column Names](#indexing-by-column-names)
* [Numeric Conversions](#numeric-conversions)
* [Specifying the CSV Format](#specifying-the-csv-format)
@@ -86,7 +87,7 @@ target_link_libraries(<your program> csv)
```

## Features & Examples
### Reading a Large File (with Iterators)
### Reading an Arbitrarily Large File (with Iterators)
With this library, you can easily stream over a large file without reading its entirety into memory.

**C++ Style**
@@ -125,6 +126,29 @@ while (reader.read_row(row)) {
...
```

#### Memory-Mapped Files vs. Streams
By default, passing in a file path string to the constructor of `CSVReader`
causes memory-mapped IO to be used. In general, this option is the most
performant.

However, `std::ifstream` and in-memory sources such as `std::stringstream` may also be used.

**Note**: Currently CSV guessing only works for memory-mapped files. The CSV dialect
must be manually defined for other sources.

```cpp
CSVFormat format;
// custom formatting options go here

CSVReader mmap("some_file.csv", format);

std::ifstream infile("some_file.csv", std::ios::binary);
CSVReader ifstream_reader(infile, format);

std::stringstream my_csv;
CSVReader sstream_reader(my_csv, format);
```
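One constructor shape can serve both file streams and string streams because `std::ifstream` and `std::stringstream` both derive from `std::istream`. The sketch below illustrates this with two hypothetical helpers, `split_line()` and `read_simple_delimited()`, which are not part of this library. Nothing is guessed, so the caller must pass the delimiter explicitly, much like the manually defined `CSVFormat` for non-mmap sources. The helpers do not handle quoted fields, so treat them as an illustration only.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, not part of this library: split one line on `delim`,
// keeping empty fields (so "a,b," yields three fields). No quote handling.
std::vector<std::string> split_line(const std::string& line, char delim) {
    std::vector<std::string> fields;
    std::size_t start = 0;
    while (true) {
        std::size_t pos = line.find(delim, start);
        if (pos == std::string::npos) {
            fields.push_back(line.substr(start));
            return fields;
        }
        fields.push_back(line.substr(start, pos - start));
        start = pos + 1;
    }
}

// Hypothetical helper: accepts any std::istream, so the same code path
// serves std::ifstream and std::stringstream alike. The delimiter must be
// supplied by the caller because nothing is guessed here.
std::vector<std::vector<std::string>> read_simple_delimited(std::istream& in, char delim) {
    std::vector<std::vector<std::string>> rows;
    std::string line;
    while (std::getline(in, line)) {
        // Streams opened in binary mode keep "\r\n" endings; drop the '\r'.
        if (!line.empty() && line.back() == '\r') line.pop_back();
        rows.push_back(split_line(line, delim));
    }
    return rows;
}
```

For example, `std::ifstream infile("some_file.csv", std::ios::binary); auto rows = read_simple_delimited(infile, ',');` reads a file, and passing a `std::stringstream` instead works unchanged. Binary mode preserves `\r\n` line endings, which is why the helper strips a trailing `\r`.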
### Indexing by Column Names
Retrieving values using a column name string is a cheap, constant-time operation.
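A common way to make name-based lookup constant time is to map each header name to its column index once, up front. The `ColumnIndex` class and `get_field()` function below are hypothetical names used for illustration; they are not this library's internals. Each lookup is an average O(1) hash-map probe (plus hashing the name) followed by an O(1) vector index.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sketch: build a name -> index map once from the header row.
class ColumnIndex {
public:
    explicit ColumnIndex(const std::vector<std::string>& header) {
        for (std::size_t i = 0; i < header.size(); ++i) {
            index_.emplace(header[i], i);  // on duplicate names, the first occurrence wins
        }
    }

    // Average O(1): hash the name, probe the map.
    std::size_t at(const std::string& name) const {
        auto it = index_.find(name);
        if (it == index_.end()) throw std::out_of_range("no such column: " + name);
        return it->second;
    }

private:
    std::unordered_map<std::string, std::size_t> index_;
};

// Fetch a field from a row by column name: one map lookup, one vector index.
const std::string& get_field(const std::vector<std::string>& row,
                             const ColumnIndex& cols, const std::string& name) {
    return row.at(cols.at(name));
}
```

The header is processed once per file, so the cost of building the map is amortized over every row that is subsequently indexed by name.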
@@ -314,15 +338,22 @@ using namespace std;
...

stringstream ss; // Can also use ofstream, etc.

auto writer = make_csv_writer(ss);
// auto writer = make_tsv_writer(ss); // For tab-separated files
// DelimWriter<stringstream, '|', '"'> writer(ss); // Your own custom format

writer << vector<string>({ "A", "B", "C" })
<< deque<string>({ "I'm", "too", "tired" })
<< list<string>({ "to", "write", "documentation." });

writer << array<string, 2>({ "The quick brown "fox", "jumps over the lazy dog" });
writer << array<string, 3>({ "The quick brown", "fox", "jumps over the lazy dog" });
writer << make_tuple(1, 2.0, "Three");
...

```
You can pass arbitrary types to `DelimWriter` by defining a conversion function
for that type to `std::string`.
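The following standalone `SimpleDelimWriter` sketches two ideas from above: a writer templated on delimiter and quote characters, and arbitrary values converted to `std::string` through a `to_field()` hook before being written. Both names are hypothetical; this is not the library's `DelimWriter`. Quoting follows the RFC 4180 convention of doubling embedded quote characters, which is an assumption about the desired output.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <type_traits>

// Hypothetical conversion hooks: strings pass through, arithmetic types use
// std::to_string. Users add overloads for their own types.
inline std::string to_field(const std::string& s) { return s; }
inline std::string to_field(const char* s) { return s; }
template <typename T,
          typename = typename std::enable_if<std::is_arithmetic<T>::value>::type>
std::string to_field(T value) { return std::to_string(value); }

// Hypothetical writer parameterized on delimiter and quote characters.
template <char Delim, char Quote>
class SimpleDelimWriter {
public:
    explicit SimpleDelimWriter(std::ostream& out) : out_(out) {}

    // Write one row; each argument is converted with to_field().
    template <typename... Ts>
    SimpleDelimWriter& write_row(const Ts&... values) {
        bool first = true;
        // Braced-init-list expansion evaluates left to right (C++11-friendly).
        int expand[] = {0, (write_field(to_field(values), first), 0)...};
        (void)expand;
        out_ << '\n';
        return *this;
    }

private:
    void write_field(const std::string& field, bool& first) {
        if (!first) out_ << Delim;
        first = false;
        // Only quote fields containing the delimiter, quote char, or a newline.
        if (field.find_first_of(std::string{Delim, Quote, '\n', '\r'}) == std::string::npos) {
            out_ << field;
            return;
        }
        out_ << Quote;
        for (char c : field) {
            if (c == Quote) out_ << Quote;  // double embedded quotes
            out_ << c;
        }
        out_ << Quote;
    }

    std::ostream& out_;
};
```

A `SimpleDelimWriter<'\t', '"'>` would play the role of a TSV writer. For a custom type, declare an overload such as `std::string to_field(const Point& p)` next to the type, before the call site; argument-dependent lookup finds it when `write_row()` is instantiated. Note that `std::to_string(2.5)` yields `"2.500000"`, so a real writer would likely want finer control over floating-point formatting.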
## Contributing
Bug reports, feature requests, and so on are always welcome. Feel free to leave a note in the Issues section.
37 changes: 22 additions & 15 deletions docs/source/Doxy.md
@@ -8,26 +8,24 @@ For quick examples, go to this project's [GitHub page](https://github.com/vincen
### CSV Reading
* csv::CSVFormat: \copybrief csv::CSVFormat
* csv::CSVReader
* csv::CSVReader::size(): \copybrief csv::CSVReader::size()
* csv::CSVReader::n_rows(): \copybrief csv::CSVReader::n_rows()
* csv::CSVReader::utf8_bom(): \copybrief csv::CSVReader::utf8_bom()
* csv::CSVReader::get_format(): \copybrief csv::CSVReader::get_format()
* Manually parsing string fragments
* csv::CSVReader::feed()
* Retrieving data
* csv::CSVReader::read_row()
* csv::CSVReader::iterator
* csv::CSVReader::iterator: Recommended
* csv::CSVReader::begin()
* csv::CSVReader::end()
* csv::CSVReader::read_row()
* Convenience Functions
* csv::parse()
* csv::operator ""_csv()
* csv::parse_no_header()
* csv::operator ""_csv_no_header()

### See also
#### See also
[Dealing with Variable Length CSV Rows](md_docs_source_variable_row_lengths.html)

### Working with parsed data
#### Working with parsed data
* csv::CSVRow: \copybrief csv::CSVRow
* csv::CSVRow::operator std::vector<std::string>()
* csv::CSVRow::iterator
@@ -42,10 +40,15 @@ For quick examples, go to this project's [GitHub page](https://github.com/vincen
### Statistics
* csv::CSVStat

### Writing
### CSV Writing
* csv::make_csv_writer(): Construct a csv::CSVWriter
* csv::make_tsv_writer(): Construct a csv::TSVWriter
* csv::DelimWriter
* csv::CSVWriter
* csv::TSVWriter
* Pre-Defined Specializations
* csv::CSVWriter
* csv::TSVWriter
* Methods
* csv::DelimWriter::operator<<()

## Frequently Asked Questions

@@ -65,8 +68,12 @@ is chosen as the starting row.
Because you can subclass csv::CSVReader, you can implement your own guessing heuristic. csv::internals::CSVGuesser may be used as a helpful guide in doing so.
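To illustrate what a custom guessing heuristic might look like, the sketch below chooses a delimiter by finding which candidate splits the most lines into the same number of fields. `guess_delimiter()` is a hypothetical standalone function and is not csv::internals::CSVGuesser's exact algorithm. It ignores quoting, and it guesses only the delimiter, not the header row.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Hypothetical heuristic: for each candidate delimiter, tally how many lines
// split into each field count. The delimiter whose most common field count
// (greater than 1) is shared by the most lines wins. Falls back to the first
// candidate, or ',' if there are no candidates.
char guess_delimiter(const std::string& sample, const std::string& candidates) {
    char best = candidates.empty() ? ',' : candidates[0];
    std::size_t best_score = 0;
    for (char delim : candidates) {
        std::map<std::size_t, std::size_t> width_counts;  // field count -> number of lines
        std::istringstream in(sample);
        std::string line;
        while (std::getline(in, line)) {
            if (line.empty()) continue;
            std::size_t fields = 1;
            for (char c : line) {
                if (c == delim) ++fields;
            }
            ++width_counts[fields];
        }
        for (const auto& wc : width_counts) {
            // Lines without the delimiter (width 1) are not evidence for it.
            if (wc.first > 1 && wc.second > best_score) {
                best_score = wc.second;
                best = delim;
            }
        }
    }
    return best;
}
```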

### Is the CSV parser thread-safe?
The csv::CSVReader iterators are intended to be used from one thread at a time. However, csv::CSVRow and csv::CSVField objects should be
thread-safe (since they mainly involve reading data). If you want to perform computations on multiple columns in parallel,
you may want to avoid using the iterators and
use csv::CSVReader::read_row() to manually chunk your data. csv::CSVStat provides an example of how parallel computations
may be performed. (Specifically, look at csv::CSVStat::calc() and csv::CSVStat::calc_worker() in csv_stat.cpp).
This library already does a lot of work behind the scenes to use threads to squeeze
performance from your CPU. However, ambitious users who are in the mood for
experimenting should follow these guidelines:
* csv::CSVReader::iterator should only be used from one thread
* A workaround is to chunk blocks of `CSVRow` objects together and
create separate threads to process each column
* csv::CSVRow may be safely processed from multiple threads
* csv::CSVField objects should only be read from one thread at a time
* **Note**: csv::CSVRow::operator[]() produces separate copies of `csv::CSVField` objects
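The chunking workaround described above might look roughly like the following. Rows are collected on one thread, and a finished chunk is then handed to one worker thread per column. Plain `std::vector<std::string>` rows stand in for `csv::CSVRow`, and `column_sums()` is a hypothetical helper. Each worker only reads the shared chunk and writes to its own element of the result vector, so no locking is needed.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper: sum each column of an already-read chunk of rows,
// using one thread per column. Non-numeric or missing cells are skipped.
std::vector<double> column_sums(const std::vector<std::vector<std::string>>& chunk,
                                std::size_t n_cols) {
    std::vector<double> sums(n_cols, 0.0);
    std::vector<std::thread> workers;
    for (std::size_t col = 0; col < n_cols; ++col) {
        workers.emplace_back([&chunk, &sums, col] {
            double total = 0.0;
            for (const auto& row : chunk) {
                if (col >= row.size()) continue;
                try {
                    total += std::stod(row[col]);
                } catch (const std::exception&) {
                    // Skip cells that are not numbers (or are out of range).
                }
            }
            sums[col] = total;  // each thread writes a distinct element
        });
    }
    for (auto& t : workers) t.join();
    return sums;
}
```

One thread per column is reasonable only for a modest number of columns; for wide files, a fixed-size thread pool would be the more typical choice.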
2 changes: 1 addition & 1 deletion include/csv.hpp
@@ -1,5 +1,5 @@
/*
CSV for C++, version 2.0.1
CSV for C++, version 2.1.0
https://github.com/vincentlaucsb/csv-parser
MIT License
9 changes: 3 additions & 6 deletions include/internal/CMakeLists.txt
@@ -2,16 +2,15 @@ add_library(csv STATIC "")

target_sources(csv
PRIVATE
basic_csv_parser.hpp
basic_csv_parser.cpp
col_names.cpp
col_names.hpp
compatibility.hpp
constants.hpp
common.hpp
csv_format.hpp
csv_format.cpp
csv_reader.hpp
csv_reader.cpp
csv_reader_internals.hpp
csv_reader_internals.cpp
csv_reader_iterator.cpp
csv_row.hpp
csv_row.cpp
@@ -22,8 +21,6 @@ target_sources(csv
csv_utility.hpp
csv_writer.hpp
data_type.h
raw_csv_data.hpp
raw_csv_data.cpp
)

set_target_properties(csv PROPERTIES LINKER_LANGUAGE CXX)