Statistical learning: Datasets

The main drawback of statistical learning is that it requires huge quantities of data before it becomes useful and sufficiently smart for our purposes.

It will be nearly impossible to produce all the required data by hand, hence the need to automate as much of the work as possible.

Here, we need to identify dataset sources that can be used to train the statistical model without too much manual effort.

Datasets

English Grammar Databases

Use English grammar dictionaries. Look up the word "is" and copy all of its usage examples, labeling each one "is_handling". Look up the word "has" and copy all of its usage examples, labeling each one "has_handling", and so on for the other words of interest.
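The labeling step could be sketched roughly as follows. This is only an illustration: `USAGE_EXAMPLES` is a hypothetical in-memory stand-in for entries copied out of a grammar dictionary, and `label_for` and `build_dataset` are made-up helper names, not part of any existing tool.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for usage examples copied from a grammar dictionary.
USAGE_EXAMPLES: Dict[str, List[str]] = {
    "is": ["The sky is blue.", "She is a doctor."],
    "has": ["He has a car."],
}


def label_for(word: str) -> str:
    """Build the label for a word, e.g. "is" -> "is_handling"."""
    return f"{word}_handling"


def build_dataset(entries: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Pair every usage example with the label of the word it illustrates."""
    dataset: List[Tuple[str, str]] = []
    for word, examples in entries.items():
        for sentence in examples:
            dataset.append((sentence, label_for(word)))
    return dataset


if __name__ == "__main__":
    for sentence, label in build_dataset(USAGE_EXAMPLES):
        print(f"{label}\t{sentence}")
```

In practice the dictionary entries would come from a file or a scraper rather than a hard-coded mapping; the pairing of sentence and label stays the same.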

One could go a step further and look up all synonyms of "is", then, for each close synonym found, collect that word's grammar usage examples as well.
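A minimal sketch of the synonym expansion, under two assumptions: the `SYNONYMS` and `USAGE_EXAMPLES` mappings are hypothetical placeholders for a thesaurus and a grammar dictionary, and examples found through a synonym are given the label of the original word (the text above does not say which label they should carry).

```python
from typing import Dict, List, Tuple

# Hypothetical thesaurus and grammar-dictionary data, for illustration only.
SYNONYMS: Dict[str, List[str]] = {"is": ["exists", "equals"]}
USAGE_EXAMPLES: Dict[str, List[str]] = {
    "is": ["The sky is blue."],
    "exists": ["A solution exists."],
}


def expand_with_synonyms(
    word: str,
    synonyms: Dict[str, List[str]],
    examples: Dict[str, List[str]],
) -> List[Tuple[str, str]]:
    """Collect examples for a word and its synonyms under one label."""
    label = f"{word}_handling"  # assumption: synonyms share the original label
    rows: List[Tuple[str, str]] = []
    for candidate in [word, *synonyms.get(word, [])]:
        # Synonyms with no recorded usage examples are simply skipped.
        for sentence in examples.get(candidate, []):
            rows.append((sentence, label))
    return rows


if __name__ == "__main__":
    for sentence, label in expand_with_synonyms("is", SYNONYMS, USAGE_EXAMPLES):
        print(f"{label}\t{sentence}")
```

Whether a synonym is "close" enough to include is left to whoever curates the synonym list; the sketch takes the list as given.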
