Data

| Project | Corpus | Train | Dev | Test | Download |
|---|---|---|---|---|---|
| Semantics Proto-Roles | Penn TreeBank | 7800 | 969 | 969 | v1 (tar.gz) |
| | English Web TreeBank | 4877 | 632 | 582 | v2 (tar.gz) |
| Factuality | English Web TreeBank | 5668 | 652 | 600 | v1 (tar.gz) |
| | English Web TreeBank | 22279 | 2660 | 2561 | v2 (tar.gz) |
| Genericity | English Web TreeBank | 26721 | 3274 | 3119 | pred (zip) |
| | English Web TreeBank | 30035 | 3611 | 3500 | arg (zip) |
| Time | English Web TreeBank | 59593 | 16914 | 15411 | v1 (zip) |
| Word Sense | English Web TreeBank | 17202 | 1943 | 1876 | v1 (tar.gz) |
| Common Sense Inference | See paper | - | - | - | Full JOCI (zip) |
| | SNLI | 2379 | 299 | 298 | Subset A (zip) |
| | SNLI + Gigaword | 5091 | 643 | 641 | Subset B (zip) |
| Diverse Natural Language Inference | See paper | 249947 | 31696 | 31232 | Inference Is Everything (zip) |
| | See paper | - | - | 570459 | DNC v0.1 (url) |
| ParaBank | See paper | - | - | - | ParaBank v1.0 Full (~9 GB) (zip) |
| | See paper | - | - | - | ParaBank v1.0 Large, 50m pairs (~3 GB) (zip) |
| | See paper | - | - | - | ParaBank v1.0 Small Diverse, 5m pairs (zip) |
| | See paper | - | - | - | ParaBank v1.0 Large Diverse, 50m pairs (zip) |
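
The split sizes listed above vary considerably in their train/dev/test proportions. As a rough illustration, the Python sketch below computes those proportions for three rows whose counts are copied from the listing. The names `SPLITS` and `split_fractions` are hypothetical and are not part of any released tooling, and the listing does not say what unit the counts are in.

```python
# Minimal sketch: train/dev/test proportions for a few rows of the listing.
# SPLITS and split_fractions are illustrative names, not part of any release.

# (train, dev, test) counts copied from the listing above.
SPLITS = {
    "Proto-Roles v1 (Penn TreeBank)": (7800, 969, 969),
    "Factuality v2 (English Web TreeBank)": (22279, 2660, 2561),
    "Time v1 (English Web TreeBank)": (59593, 16914, 15411),
}


def split_fractions(train, dev, test):
    """Return the share of the total count that falls in each split."""
    total = train + dev + test
    return tuple(n / total for n in (train, dev, test))


for name, sizes in SPLITS.items():
    train_f, dev_f, test_f = split_fractions(*sizes)
    print(
        f"{name}: total={sum(sizes)} "
        f"train={train_f:.1%} dev={dev_f:.1%} test={test_f:.1%}"
    )
```

Rows marked with `-` have no published count for that split, so they are left out of the sketch rather than treated as zero.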