# Data structure for text editor

36,404

## Solution 1

On good old ZX-Spectrum one (or more, I do not know) text exditor used very simple structure.

There was one big buffer, which occupied all free RAM. Text was split in two parts at the cursor. Part before the cursor, was placed at the beginning of the buffer, and the rest at the end of the buffer. As text typed, data simply added to the end of first part, and when cursor is moved, text is copied forth and back.

Buffer layout:

``````Hello, World!
^------Cursor here

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|H|e|l|l|o|,| |W| <free>  |o|r|l|d|!|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                ^         ^        |
begin           cur1      cur2    end
``````

That's, how some edit operations was made:

``````Type a char:    buffer[cur1++] = character

Backspace:      cur1--

Cursor left:    buffer[--cur2] = buffer[--cur1]

Cursor right:   buffer[cur1++] = buffer[cur2++]
``````

Buffer in action:

``````             Hello, W..............orld!
Press right          ^             ^
Hello, Wo..............rld!
Press backspace       ^             ^
Hello, W...............rld!
Press 0              ^              ^
Hello, W0..............rld!
^             ^
``````

## Solution 2

Rope

A rope is essentially a binary tree whose leaves are arrays of characters. A node in the tree has a left child and a right child - the left child is the first part of the string, while the right child is the final part of the string. Concatenation of two ropes simply involves the creation of a new tree node with both ropes as children. To ensure logarithmic time indexing and sub-string operations the resulting rope may need to be balanced. Various balancing strategies are possible.

The main advantages of ropes as compared to storing strings as character arrays is that they enable much faster concatenation than ordinary strings, and don't require a large contiguous memory space to store a large string. The main disadvantages are greater overall space usage and slower indexing, both of which become more severe as the tree structure becomes larger and deeper. However, many practical applications of indexing involve only iteration over the string, which remains fast as long as the leaf nodes are large enough to benefit from cache effects.

## Solution 3

I know it's too late for an answer, but I found The Craft of Text Editing book really useful. It contains description of several buffer models with their pros and cons. Unfortunately, it doesn't mention Ropes data structure.

## Solution 4

You might find this interesting, even if it does not exactly answer your question:

Most efficient data structure to add styles to text

I am hoping that the discussion will go to fascinating places :-)

## Solution 5

As @Vovanium already mentioned the basic theory of how gap buffer can be used, I have implemented a version of C/C++.

Code:

``````#include <stdio.h>
#define SZ 1000

char leftArray[SZ], rightArray[SZ];
int leftCount, rightCount;
int cursorPos;

/*
* Private APIs
*/

void printArray(){

for(register int i = 0; i < leftCount; i++){
printf("%c", leftArray[i]);
}

for(register int i = rightCount - 1; i >= 0; i--){
printf("%c", rightArray[i]);
}
printf("\n");
}

while(leftCount > pos){
rightArray[rightCount++] = leftArray[--leftCount];
}

while(leftCount < pos){
leftArray[leftCount++] = rightArray[--rightCount];
}
}

/*
* Public APIs for Text Editor
*/

void init(){

cursorPos = leftCount = rightCount = 0;
}

void addWord(char word[], int len){

for(register int i = 0; i < len; i++){
leftArray[leftCount++] = word[i];
}
leftArray[leftCount] = 0;
}

leftCount--;
}

void moveCurson(int newPosition){

cursorPos = newPosition;
}

void subString(int pos, int length, char result[]){

int k = 0;
int l = 0;
while(k + pos < leftCount && k < length){
result[k] = leftArray[pos + k];
k++;
}

length -= k;
while( l < length){
result[k++] = rightArray[rightCount - 1 - l];
l++;
}
}
``````
Share:
36,404
Author by

### Michael

Scala (moslty) programmer

Updated on March 04, 2020

• Michael over 3 years

This is an interview question. What data structure would you use to store the text in a text editor?

• thkala almost 13 years
+1 for telling me what the structure I re-invented (sic), described and suggested in one of my posts is called officially :-)
• Tony Delroy almost 13 years
@thkala: pays to do some research first - nothing new under the sun ;-)
• Tony Delroy almost 13 years
As well as concatenation, Rope's are typically relatively good (compared to entirely contiguous storage) for many deletion, insertion and substitution actions, especially near the start of the document or where growing the contiguous storage would require a move in memory.
• thkala almost 13 years
@Tony: I learnt about half a ton of Data structures in a class a few years back, but English is not my native language and the professors were not that consistent at providing the English terminology. My English is quite good, but some times it can be quite hard matching what you remember with its generally accepted name... "it was a tree with this and that feature" is not always helpful :-/
• thkala almost 13 years
BTW I just updated the Wikipedia page on Binary trees to contain a link to the Rope data structure... would have been very helpful a few days ago :-)
• Aaron Digulla almost 13 years
For reference: This is called a "gab buffer". Most implementations don't move the buffer when you move the cursor. They just do it on insert/delete operations.
• Vovanium almost 13 years
@Aaron Digulla: Thanks, good addition. Both implementations have their reasons.
• Andres Kievsky about 12 years
There's a typo there: it's called gap buffer And here's more information from Wikipedia
• tigrou over 5 years
How do you manage multiple lines ? is there a gap buffer for each line or just a single one for the whole document ? AFAIK in case of single gap buffer, it means a lot of data to move cursor is moved from start to end. Or in such editors is moving cursor across lines is not allowed ? (only left / right)