For applications that require processing large amounts of text at inference
time, Large Language Models (LLMs) are handicapped by their limited context
windows, which are typically 2048 tokens long. In-context learning, an
emergent phenomenon in LLMs above a certain parameter threshold, is one
prominent example, since it can leverage only training examples that fit into
the context window. Existing efforts to address the context window limitation
involve training specialized architectures, which, because of the memory
footprint of processing long texts, tend to be trained at scales smaller than
those at which in-context learning emerges. We present Parallel Context
Windows (PCW),
a method that alleviates the context window restriction for any off-the-shelf
LLM without further training. The key to the approach is to carve a long
context into chunks (``windows'') that each fit within the architecture's
context window, restrict the attention mechanism to apply only within each
window, and re-use the same positional embeddings across the windows. We test
the PCW approach on in-context
learning with models ranging in size from 750 million to 178 billion
parameters, and show substantial improvements for tasks with diverse input and
output spaces. Our results motivate further investigation of Parallel Context
Windows as a method for applying off-the-shelf LLMs in other settings that
require long text sequences.
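
To make the windowing recipe concrete, the minimal NumPy sketch below
illustrates the two ingredients named above: positional ids that are re-used
across windows and a causal attention mask restricted to each window. The
helper \texttt{parallel\_context\_windows} and its interface are illustrative
assumptions rather than the PCW reference implementation, and the sketch omits
how the task's own tokens attend to the windows.

\begin{verbatim}
import numpy as np

def parallel_context_windows(token_ids, window_size):
    """Illustrative PCW-style preprocessing (hypothetical helper).

    Carves a token sequence into windows, re-uses the same position ids
    in every window, and builds a causal attention mask that applies
    only within each window.
    """
    n = len(token_ids)
    # Window index of every token.
    window_of = np.arange(n) // window_size
    # Re-used positional ids: positions restart at 0 in every window,
    # so no window exceeds the position range the model was trained on.
    position_ids = np.arange(n) % window_size
    # Token i may attend to token j iff both share a window and j <= i.
    same_window = window_of[:, None] == window_of[None, :]
    causal = np.tril(np.ones((n, n), dtype=bool))
    attention_mask = same_window & causal
    return position_ids, attention_mask

# Example: 8 context tokens carved into two parallel windows of 4.
pos, mask = parallel_context_windows(list(range(8)), window_size=4)
print(pos)                # [0 1 2 3 0 1 2 3]
print(mask.astype(int))   # two lower-triangular blocks on the diagonal
\end{verbatim}

In practice, applying such a mask and these position ids to an off-the-shelf
decoder-only LLM requires a model interface that accepts them explicitly; the
sketch only shows how the two objects are constructed.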