
Commit 0bbfe81

Cleaned up and updated READMEs (#29)
* Cleaned up and updated READMEs
* Updated README to remove redundant pip command
1 parent a46ae9e commit 0bbfe81

5 files changed

Lines changed: 27 additions & 22 deletions


README.md

Lines changed: 2 additions & 2 deletions
@@ -14,5 +14,5 @@ This will download the CoNLL-2003 corpus to `original_corpus/`, apply correction
 corrected corpus in `corrected_corpus/`.

 NOTE: [Text Extensions for Pandas](https://github.com/CODAIT/text-extensions-for-pandas) must be
-installed to run the script. To install in your Python environment, use the command
-`pip install text-extensions-for-pandas`.
+installed to run the script. It provides utilities to download and work with the CoNLL-2003
+corpus and assist with NLP analysis on Pandas.

corrected_corpus/README.md

Lines changed: 6 additions & 4 deletions
@@ -1,8 +1,10 @@
 # Creating the Corrected CoNLL-2003 Corpus

+This directory will contain the corrected CoNLL-2003 corpus that is the result
+of applying label corrections on the original corpus after running the
+following command from the project home directory:
+
+$ python scripts/download_corpus_and_correct_labels.py
+
 The CoNLL-2003 corpus is licensed for research use only. Be sure to
 adhere to the terms of the license when using this data set!
-
-In the project home directory, run `python scripts/download_corpus_and_correct_labels.py`,
-which will download the corpus and apply label corrections developed in
-this project to write the corrected corpus to this directory.

corrected_labels/README.md

Lines changed: 7 additions & 8 deletions
@@ -1,13 +1,12 @@
-# data directory
-
-Data for hand-labeling.
+This directory contains corrected labels for the different error types, as well as a single file with
+all corrections combined: `all_corrections_combined.csv`. Sub-directories contain the different stages
+of model outputs, human labels and final audited corrections.

 File Name | Produced By | Description
 ------------------------------- | -------------- | --------------------------------------------------------------------
 `all_conll_corrections_combined.csv` | `Label_stats.ipynb` | A consolidated list of all the corrections that we will perform on the corpus
+`annotator_rubric.csv` | | A list of owners of annotations/audit of each underlying file
 `sentence_corrections.json` | `sentence_correction_preprocessing.ipynb` | Final list of lines to be deleted from the corpus/submissions
-`model_outputs` | original model outputs |
-`human_labels` | human annotations on top of model outputs |
-`human_labels_auditted` | secondary review (audit) of above human labels
-`Inter Annotator Agreement.ipynb` | A notebook analysing the relations between different stages of the correction process
-`annotator_rubric.csv` | A list of owners of annotations/audit of each underlying file
+`model_outputs` | trained model ensemble outputs | model predictions of correct labels
+`human_labels` | Manual inspection of labels | human annotations on top of model outputs |
+`human_labels_auditted` | Peer-reviewed audits | secondary review (audit) of above human labels

original_corpus/README.md

Lines changed: 5 additions & 8 deletions
@@ -1,12 +1,9 @@
 # Downloading the CoNLL-2003 Corpus

-The CoNLL-2003 corpus is licensed for research use only. Be sure to
-adhere to the terms of the license when using this data set!
+This directory will contain the original, unmodified CoNLL-2003 corpus after
+running the following command from the project home directory:

-The corpus can be downloaded using the package "Text Extensions for
-Pandas." To install use the command `pip install text-extensions-for-pandas`,
-see https://github.com/CODAIT/text-extensions-for-pandas for more info.
+$ python scripts/download_corpus_and_correct_labels.py

-To download the corpus and apply label corrections developed in
-this project, run the following in the project home directory:
-`python scripts/download_corpus_and_correct_labels.py`
+The CoNLL-2003 corpus is licensed for research use only. Be sure to
+adhere to the terms of the license when using this data set!

scripts/README.md

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+This directory contains scripts for
+
+File Name | Description
+------------------------------- | --------------------------------------------------------------------
+`download_and_correct_corpus.py` | Script to download the original CoNLL-2003 corpus and apply corrections
+`compute_precision_and_recall.py` | Compute precision and recall for CoNLL-2003 team submissions, see [instructions](../reproduce_experiments/README.md).
+`Inter Annotator Agreement.ipynb` | A notebook analysing the relations between different stages of the correction process
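
The scripts table lists `compute_precision_and_recall.py` without describing how scores are computed. As a rough illustration only, here is a minimal sketch of exact-match, entity-level precision and recall. It is not that script's implementation: the `Entity` tuple and `precision_recall` function are hypothetical names for this example, and it assumes each entity is identified by document, token span and entity type.

```python
from typing import Iterable, NamedTuple, Tuple


class Entity(NamedTuple):
    """Hypothetical entity mention: a token span in a document plus its type."""
    doc_id: str
    begin: int     # index of the first token (inclusive)
    end: int       # index one past the last token (exclusive)
    ent_type: str  # e.g. "PER", "LOC", "ORG", "MISC"


def precision_recall(gold: Iterable[Entity],
                     predicted: Iterable[Entity]) -> Tuple[float, float]:
    """Exact-match entity-level precision and recall.

    A predicted entity is a true positive only if some gold entity has the
    same document, the same span boundaries and the same type.
    """
    gold_set = set(gold)
    pred_set = set(predicted)
    true_positives = len(gold_set & pred_set)
    # Guard against division by zero when either side is empty.
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    return precision, recall


if __name__ == "__main__":
    gold = [Entity("doc1", 0, 2, "PER"), Entity("doc1", 5, 6, "LOC")]
    pred = [Entity("doc1", 0, 2, "PER"), Entity("doc1", 5, 6, "ORG")]
    print(precision_recall(gold, pred))  # (0.5, 0.5)
```

Scoring the same submission once against the original labels and once against the corrected labels with a function like this would show how the label corrections shift a team's precision and recall.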
