About

This repo contains the datasets and code for the EMNLP 2020 paper Assessing Phrasal Representation and Composition in Transformers, by Lang Yu and Allyson Ettinger.

Our follow-up paper, On the Interplay Between Fine-tuning and Composition in Transformers, will appear in Findings of ACL 2021.

Notes

I wrote two notes explaining the paper. They can be found here.

Dependencies

The code is implemented with Hugging Face's transformers v2.5.1. Their recent releases include multiple API changes, so you may need to adjust the related code if you are using a different version.
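Before running the experiments, it can help to confirm which version of transformers is installed. The following is a minimal sketch, not part of this repo: the helper names `version_matches` and `installed_transformers_version` are hypothetical, and it only assumes the package is distributed under the name `transformers`.

```python
from importlib import metadata

EXPECTED_VERSION = "2.5.1"  # the version named in this README


def version_matches(installed, expected=EXPECTED_VERSION):
    """Return True when the installed version string equals the expected one."""
    return installed.strip() == expected


def installed_transformers_version():
    """Return the installed transformers version, or None if it is missing."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_transformers_version()
    if found is None:
        print("transformers is not installed")
    elif not version_matches(found):
        print(f"transformers {found} found; this code targets {EXPECTED_VERSION}")
    else:
        print(f"transformers {found} matches")
```

If the versions differ, the code may still run, but calls into the library may need adjusting.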

Repo structure

src/ contains the source code to generate the datasets and perform the analysis.
datasets/ contains the dataset for the landmark test from (Kintsch 2001) and the datasets we used to get the metrics reported in the paper.

Dataset

Similarity correlation dataset: As mentioned in the paper, the full dataset can be downloaded here: http://saifmohammad.com/WebPages/BiRD.html. Please refer to the original paper for details about the dataset.
Paraphrase classification dataset: You can download ppdb-2.0-tldr from http://paraphrase.org.
Landmark experiment dataset: included in the repo. Extracted from the series of papers by Kintsch.

You can use the datasets included in the datasets folder, or download the full datasets and run the code to generate the controlled datasets yourself. Several options in configuration.py let you specify the number of samples used, the proportion of negative samples, and so on.
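To illustrate what controlling the number of samples and the proportion of negative samples can look like, here is a minimal sketch. It is not the logic in workload_generator.py, and the function name `sample_controlled` and its parameters (`n_samples`, `negative_ratio`, `seed`) are hypothetical rather than the actual options in configuration.py.

```python
import random


def sample_controlled(positives, negatives, n_samples, negative_ratio, seed=0):
    """Draw n_samples labeled items with the requested share of negatives.

    Hypothetical helper for illustration; labels are 1 (positive) and 0 (negative).
    """
    if not 0.0 <= negative_ratio <= 1.0:
        raise ValueError("negative_ratio must be between 0 and 1")
    n_neg = round(n_samples * negative_ratio)
    n_pos = n_samples - n_neg
    if n_pos > len(positives) or n_neg > len(negatives):
        raise ValueError("not enough examples to satisfy the request")
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    sample = [(item, 1) for item in rng.sample(positives, n_pos)]
    sample += [(item, 0) for item in rng.sample(negatives, n_neg)]
    rng.shuffle(sample)
    return sample


if __name__ == "__main__":
    pos = [f"pos-{i}" for i in range(20)]
    neg = [f"neg-{i}" for i in range(20)]
    for item, label in sample_controlled(pos, neg, n_samples=10, negative_ratio=0.5):
        print(label, item)
```

Fixing the seed is a design choice that makes repeated runs draw the same controlled subset.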

Code

main.py: main entry point for the correlation and classification experiments
workload_generator.py: preprocessing logic. PPDB filtering and controlled dataset generation are implemented in this file.
configuration.py: configuration
kintsch_exp.py: main entry point for the landmark experiment
analyzer: helper class to perform analysis

Usage

Update configuration.py to specify: a) the dataset location, b) the experiment you want to run, and c) the model class you want to test
To run the correlation or classification experiment, run python main.py (after you set the proper configuration in configuration.py)
To run the landmark experiment, run python kintsch_exp.py
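For readers who want a feel for the correlation experiment before running main.py, the sketch below shows one way such a measurement can be computed. It assumes the experiment compares cosine similarity between phrase representations with human similarity ratings; that assumption, the function names, and the toy vectors are illustrative only and are not taken from main.py or the analyzer.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy phrase-pair vectors and made-up ratings, for illustration only.
    pairs = [
        ([1.0, 0.0], [1.0, 0.1]),
        ([1.0, 0.0], [0.5, 0.5]),
        ([1.0, 0.0], [0.0, 1.0]),
    ]
    ratings = [0.9, 0.5, 0.1]
    model_scores = [cosine_similarity(u, v) for u, v in pairs]
    print(pearson(model_scores, ratings))
```

In practice the vectors would come from the model class selected in configuration.py, and the ratings from the similarity dataset.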
