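NICER: the Nagoya Interlanguage Corpus of English Reborn 1.0 contains essay files written by native speakers of English and by learners of English. Below is an excerpt from one learner file, JPN501: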
@Begin
@Participants:	JPN501
@PID:	PIDJP501
@Age:	21
@Sex:	F
@YearInSchool:	U2
@Major:	agriculture
@StudyHistory:	8
@OtherLanguage:	Chinese=1.0;none=
@Qualification:	TOEIC=590(2013);none=;none=
@Abroad:	none=;none=
@Reading:	3
@Writing:	2
@Listening:	2
@Speaking:	1
@JapaneseEssay:	4
@EnglishEssayEx:	3
@EnglishEssay:	2
@Difficulty:
@EssayTraining:	3
@SelfEval:	2
@TopicEase:	4
@Topic:	sports
@Criterion:	4
@Proctor:	1
@Comments:
@Date:	2013-12-17
@Version:	1.0
*JPN501:	What kind of sports do you like?
%NTV:	OK
%COM:
*JPN501:	Do you like soccer, base ball or swimming?
%NTV:	Do you like soccer, baseball, or swimming?
%COM:	"Baseball" is one word. In lists with three or more items, put a comma between each item, including one before the final "and".
*JPN501:	There are many and variety sports around the world.
%NTV:	There are many varieties of sports around the world.
%COM:
*JPN501:	A country has some traditional sports.
%NTV:	Most countries have some traditional sports.
%COM:
*JPN501:	Of course, there are some traditional sports in Japan.
%NTV:	OK
%COM:
*JPN501:	They are called "BUDO".
%NTV:	They are called budo.
%COM:	This word does not require capitalization.
*JPN501:	BUDO are JYUDO, KENDO, KYUDO and so on.
%NTV:	Budo include judo, kendo, kyudo, and so on.
%COM:	These words do not require capitalization.
*JPN501:	If you play BUDO, there is an important thing that you must remember.
%NTV:	If you play budo, there is one important thing you must remember.
%COM:
*JPN501:	It is "REI".
%NTV:	It is rei.
%COM:
%par:
(middle part omitted)
*JPN501:	We Japanese should be proud of and teach more many people around the world about this traditional sports "BUDO".
%NTV:	We Japanese should be proud of and teach many more people around the world about our traditional sports, budo.
%COM:
@End
The goal is to keep only the learner's own sentences, that is, the *JPN501 lines:
*JPN501:	What kind of sports do you like?
*JPN501:	Do you like soccer, base ball or swimming?
*JPN501:	There are many and variety sports around the world.
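If you only want to look at these lines, a read-only grep reproduces a listing like the one above without modifying any file. This is a sketch rather than part of the original procedure: `list_learner_lines` is a hypothetical helper name, and it assumes every learner line starts with `*JPN` followed by digits and a colon.

```shell
# Hypothetical helper: list_learner_lines FILE...
# Prints every learner utterance line (the "*JPNnnn:" tier), tag included.
# The files are only read, never modified.
list_learner_lines() {
  # -h: do not prefix each match with its file name when several files are given
  grep -h '^\*JPN[0-9]*:' "$@"
}
```

Usage: `list_learner_lines JPN501.txt`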
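# GNU sed (Linux): delete the 28 header lines (@Begin ... @Version) in place
for file in *.txt; do sed -i '1,28d' "$file"; done
# BSD sed (macOS): -i needs an explicit backup suffix, here an empty one
for file in *.txt; do sed -i '' '1,28d' "$file"; done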
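`1,28d` only works if every file has exactly 28 header lines, as the sample does (@Begin through @Version). Since the deletion happens in place, it may be worth checking that assumption first. The sketch below uses hypothetical helper names (`header_length`, `check_headers`) and assumes that the first line starting with `*` is the first utterance.

```shell
# Hypothetical helper: header_length FILE
# Prints how many lines precede the first utterance line (the first line
# starting with "*"), or -1 if the file contains no utterance line.
header_length() {
  awk '/^\*/ { print NR - 1; found = 1; exit }
       END   { if (!found) print -1 }' "$1"
}

# Hypothetical helper: check_headers FILE...
# Reports every file whose header is not exactly 28 lines long,
# i.e. every file on which sed '1,28d' would delete the wrong lines.
check_headers() {
  for f in "$@"; do
    n="$(header_length "$f")"
    if [ "$n" -ne 28 ]; then
      printf '%s: %s header lines\n' "$f" "$n"
    fi
  done
}
```

Usage: `check_headers *.txt` prints nothing when every file has the expected 28-line header.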
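# Blank out the %NTV/%COM tiers and @End, and strip the "*JPNnnn:<TAB>" tag (\@ stops Perl interpolating @End as an array)
perl -pi -e 's/(%NTV:\t.+|%COM:\t.+|%COM:|\@End|\*JPN\d\d\d:\t)//g' *.txt
# Delete the lines that are now empty
perl -pi -e 's/^\n//g' *.txt
# Remove the %par: markers; this runs after the blank-line pass, so each one leaves an empty line behind as a paragraph break
perl -pi -e 's/%par://g' *.txt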
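The sed and perl passes can also be collapsed into a single read-only awk pass that writes the cleaned text to standard output instead of editing the files in place. This is an alternative technique, not the procedure above: `extract_learner_text` is a hypothetical name, and the sketch assumes the same conventions the perl commands rely on (learner lines begin with `*JPN`, digits, a colon and a tab; a `%par:` line marks a paragraph break). Because it keeps lines by pattern rather than deleting a fixed count, it needs no separate header-deletion step.

```shell
# Hypothetical helper: extract_learner_text FILE...
# Prints only the learner's sentences with the "*JPNnnn:<TAB>" tag removed,
# and an empty line wherever a "%par:" marker appears (assumed paragraph break).
# Header lines, %NTV/%COM tiers and @End match neither pattern and are skipped.
extract_learner_text() {
  awk '
    /^\*JPN[0-9]+:/ { sub(/^\*JPN[0-9]+:[ \t]*/, ""); print; next }
    /^%par:/        { print "" }
  ' "$@"
}
```

Usage, writing one cleaned file per input into a separate directory so the originals stay untouched:
`mkdir -p learner; for file in *.txt; do extract_learner_text "$file" > "learner/$file"; done`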