Data structure for text editor

36,404

Solution 1

On good old ZX-Spectrum one (or more, I do not know) text exditor used very simple structure.

There was one big buffer, which occupied all free RAM. Text was split in two parts at the cursor. Part before the cursor, was placed at the beginning of the buffer, and the rest at the end of the buffer. As text typed, data simply added to the end of first part, and when cursor is moved, text is copied forth and back.

Buffer layout:

Hello, World!
        ^------Cursor here

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|H|e|l|l|o|,| |W| <free>  |o|r|l|d|!|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                ^         ^        |
begin           cur1      cur2    end

That's, how some edit operations was made:

Type a char:    buffer[cur1++] = character

Backspace:      cur1--

Cursor left:    buffer[--cur2] = buffer[--cur1]

Cursor right:   buffer[cur1++] = buffer[cur2++]

Buffer in action:

             Hello, W..............orld!
Press right          ^             ^
             Hello, Wo..............rld!
Press backspace       ^             ^
             Hello, W...............rld!
Press 0              ^              ^
             Hello, W0..............rld!
                      ^             ^

Solution 2

Rope

A rope is essentially a binary tree whose leaves are arrays of characters. A node in the tree has a left child and a right child - the left child is the first part of the string, while the right child is the final part of the string. Concatenation of two ropes simply involves the creation of a new tree node with both ropes as children. To ensure logarithmic time indexing and sub-string operations the resulting rope may need to be balanced. Various balancing strategies are possible.

The main advantages of ropes as compared to storing strings as character arrays is that they enable much faster concatenation than ordinary strings, and don't require a large contiguous memory space to store a large string. The main disadvantages are greater overall space usage and slower indexing, both of which become more severe as the tree structure becomes larger and deeper. However, many practical applications of indexing involve only iteration over the string, which remains fast as long as the leaf nodes are large enough to benefit from cache effects.

Solution 3

I know it's too late for an answer, but I found The Craft of Text Editing book really useful. It contains description of several buffer models with their pros and cons. Unfortunately, it doesn't mention Ropes data structure.

Solution 4

You might find this interesting, even if it does not exactly answer your question:

Most efficient data structure to add styles to text

I am hoping that the discussion will go to fascinating places :-)

Solution 5

As @Vovanium already mentioned the basic theory of how gap buffer can be used, I have implemented a version of C/C++.

Code:

#include <stdio.h>
#define SZ 1000

char leftArray[SZ], rightArray[SZ];
int leftCount, rightCount;
int cursorPos;

/*
 * Private APIs
 */

void printArray(){

    for(register int i = 0; i < leftCount; i++){
        printf("%c", leftArray[i]);
    }

    for(register int i = rightCount - 1; i >= 0; i--){
        printf("%c", rightArray[i]);
    }
    printf("\n");
}

void adjust(int pos){

    while(leftCount > pos){
        rightArray[rightCount++] = leftArray[--leftCount];
    }

    while(leftCount < pos){
        leftArray[leftCount++] = rightArray[--rightCount];
    }
}


/*
 * Public APIs for Text Editor
 */

void init(){

    cursorPos = leftCount = rightCount = 0;
}

void addWord(char word[], int len){

    adjust(cursorPos);

    for(register int i = 0; i < len; i++){
        leftArray[leftCount++] = word[i];
    }
    leftArray[leftCount] = 0;
}

void addBackSpace(){

    adjust(cursorPos);
    leftCount--;
}

void moveCurson(int newPosition){

    cursorPos = newPosition;
}

void subString(int pos, int length, char result[]){

        adjust(cursorPos);

    int k = 0;
        int l = 0;
    while(k + pos < leftCount && k < length){
        result[k] = leftArray[pos + k];
        k++;
    }

    length -= k;
    while( l < length){
        result[k++] = rightArray[rightCount - 1 - l];
        l++;
    }
}
Share:
36,404
Michael
Author by

Michael

Scala (moslty) programmer

Updated on March 04, 2020

Comments

  • Michael
    Michael about 4 years

    This is an interview question. What data structure would you use to store the text in a text editor?

  • thkala
    thkala over 13 years
    +1 for telling me what the structure I re-invented (sic), described and suggested in one of my posts is called officially :-)
  • Tony Delroy
    Tony Delroy over 13 years
    @thkala: pays to do some research first - nothing new under the sun ;-)
  • Tony Delroy
    Tony Delroy over 13 years
    As well as concatenation, Rope's are typically relatively good (compared to entirely contiguous storage) for many deletion, insertion and substitution actions, especially near the start of the document or where growing the contiguous storage would require a move in memory.
  • thkala
    thkala over 13 years
    @Tony: I learnt about half a ton of Data structures in a class a few years back, but English is not my native language and the professors were not that consistent at providing the English terminology. My English is quite good, but some times it can be quite hard matching what you remember with its generally accepted name... "it was a tree with this and that feature" is not always helpful :-/
  • thkala
    thkala over 13 years
    BTW I just updated the Wikipedia page on Binary trees to contain a link to the Rope data structure... would have been very helpful a few days ago :-)
  • Aaron Digulla
    Aaron Digulla over 13 years
    For reference: This is called a "gab buffer". Most implementations don't move the buffer when you move the cursor. They just do it on insert/delete operations.
  • Vovanium
    Vovanium over 13 years
    @Aaron Digulla: Thanks, good addition. Both implementations have their reasons.
  • Andres Kievsky
    Andres Kievsky over 12 years
    There's a typo there: it's called gap buffer And here's more information from Wikipedia
  • tigrou
    tigrou almost 6 years
    How do you manage multiple lines ? is there a gap buffer for each line or just a single one for the whole document ? AFAIK in case of single gap buffer, it means a lot of data to move cursor is moved from start to end. Or in such editors is moving cursor across lines is not allowed ? (only left / right)