Remove extra white spaces in C++
Solution 1
Here's a simple, non-C++11 solution, using the same remove_extra_whitespaces()
signature as in the question:
#include <cstdio>

void remove_extra_whitespaces(char* input, char* output)
{
    int inputIndex = 0;
    int outputIndex = 0;
    while(input[inputIndex] != '\0')
    {
        output[outputIndex] = input[inputIndex];
        if(input[inputIndex] == ' ')
        {
            while(input[inputIndex + 1] == ' ')
            {
                // skip over any extra spaces
                inputIndex++;
            }
        }
        outputIndex++;
        inputIndex++;
    }
    // null-terminate output
    output[outputIndex] = '\0';
}

int main(int argc, char **argv)
{
    char input[0x255] = "asfa sas f f dgdgd dg ggg";
    char output[0x255] = "NO_OUTPUT_YET";
    remove_extra_whitespaces(input,output);
    printf("input: %s\noutput: %s\n", input, output);
    return 0; // 0 signals success to the calling environment
}
Output:
input: asfa sas f f dgdgd dg ggg
output: asfa sas f f dgdgd dg ggg
Solution 2
There are already plenty of nice solutions. Here is an alternative based on a dedicated <algorithm> function meant to avoid consecutive duplicates: unique_copy():
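#include <algorithm>
#include <cctype>
#include <iostream>
#include <iterator>
#include <string>
using namespace std;

void remove_extra_whitespaces(const string &input, string &output)
{
    output.clear();  // unless you want to append to the end of an existing string...
    // unsigned char avoids undefined behaviour in isspace() for negative char values
    unique_copy(input.begin(), input.end(), back_insert_iterator<string>(output),
                [](unsigned char a, unsigned char b){ return isspace(a) && isspace(b); });
    cout << output << endl;
}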
Here is a live demo. Note that I changed from C-style strings to the safer and more powerful C++ strings.
Edit: if keeping C-style strings is required in your code, you could use almost the same code with pointers instead of iterators. That's the magic of C++. Here is another live demo.
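The pointer-based variant mentioned in the edit might look roughly like the sketch below. It is not the code from the linked demo: it assumes the question's two-buffer signature (with a const input) and assumes that output has room for at least strlen(input) + 1 characters.

```cpp
#include <algorithm>
#include <cctype>
#include <cstring>

// Sketch only: unique_copy() applied to raw pointers instead of iterators.
// Assumes `output` can hold at least strlen(input) + 1 characters.
void remove_extra_whitespaces(const char* input, char* output)
{
    const char* end = input + std::strlen(input);
    // Two adjacent characters count as "duplicates" when both are whitespace,
    // so only the first character of each whitespace run is copied.
    char* out_end = std::unique_copy(input, end, output,
        [](unsigned char a, unsigned char b) {
            return std::isspace(a) && std::isspace(b);
        });
    *out_end = '\0';  // unique_copy does not write the terminator
}
```

unique_copy() returns a pointer just past the last character written, which is where the terminator belongs.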
Solution 3
Since you use C++, you can take advantage of standard-library features designed for that sort of work. You could use std::string (instead of char[0x255]) and std::istringstream, which will replace most of the pointer arithmetic.
First, make a string stream:
std::istringstream stream(input);
Then, read strings from it. It will remove the whitespace delimiters automatically:
std::string word;
while (stream >> word)
{
...
}
Inside the loop, build your output string:
if (!output.empty()) // special case: no space before first word
    output += ' ';
output += word;
A disadvantage of this method is that it allocates memory dynamically (including several reallocations, performed when the output string grows).
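Put together, the fragments above could form a function like the following sketch. The function name and the choice to return the result by value are assumptions made for illustration. Note that this approach also drops leading and trailing whitespace, because operator>> skips it.

```cpp
#include <sstream>
#include <string>

// Sketch assembling the steps above; name and signature are illustrative.
std::string remove_extra_whitespaces(const std::string& input)
{
    std::istringstream stream(input);
    std::string output;
    std::string word;
    // operator>> skips any amount of whitespace between words.
    while (stream >> word)
    {
        if (!output.empty())  // special case: no space before first word
            output += ' ';
        output += word;
    }
    return output;
}
```

If the reallocations are a concern, calling output.reserve(input.size()) before the loop should avoid most of them, since the result is never longer than the input.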
Solution 4
You can use std::unique, which reduces adjacent duplicates to a single instance according to how you define what makes two elements equal.
Here I have defined elements as equal if they are both whitespace characters:
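#include <algorithm>
#include <cctype>
#include <iterator>
#include <string>

// Collapses whitespace runs in place and returns the same string.
inline std::string& remove_extra_ws_mute(std::string& s)
{
    s.erase(std::unique(std::begin(s), std::end(s), [](unsigned char a, unsigned char b){
        return std::isspace(a) && std::isspace(b);
    }), std::end(s));
    return s;
}

// Works on a copy, leaving the caller's string untouched.
inline std::string remove_extra_ws_copy(std::string s)
{
    return remove_extra_ws_mute(s);
}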
std::unique shifts the retained characters to the front of the string and returns an iterator to the new logical end. The characters from that iterator to the old end are left in an unspecified state, so they are erased.
Additionally, if you must work with low-level strings then you can still use std::unique on the pointers:
// The caller owns the returned buffer and must release it with delete[].
char* remove_extra_ws(char const* s)
{
    std::size_t len = std::strlen(s);
    char* buf = new char[len + 1];
    std::strcpy(buf, s);
    // Note that std::unique will also retain the null terminator
    // in its correct position at the end of the valid portion
    // of the string
    std::unique(buf, buf + len + 1, [](unsigned char a, unsigned char b){
        return (a && std::isspace(a)) && (b && std::isspace(b));
    });
    return buf;
}
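Since remove_extra_ws() hands back memory allocated with new[], the caller has to remember to delete[] it. One way to make that harder to forget is to return a std::unique_ptr<char[]>. The sketch below uses the same algorithm; the name remove_extra_ws_owned is hypothetical.

```cpp
#include <algorithm>
#include <cctype>
#include <cstring>
#include <memory>

// Hypothetical variant of remove_extra_ws(): same std::unique call,
// but the buffer is released automatically when the pointer goes out of scope.
std::unique_ptr<char[]> remove_extra_ws_owned(char const* s)
{
    std::size_t len = std::strlen(s);
    std::unique_ptr<char[]> buf(new char[len + 1]);
    std::strcpy(buf.get(), s);
    // The range includes the null terminator, which never compares "equal"
    // to its neighbour, so it ends up right after the last kept character.
    std::unique(buf.get(), buf.get() + len + 1, [](unsigned char a, unsigned char b){
        return (a && std::isspace(a)) && (b && std::isspace(b));
    });
    return buf;
}
```

Code that still needs a plain char* can call get() on the result for as long as the unique_ptr is alive.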
Solution 5
There are plenty of ways of doing this (e.g., using regular expressions), but one way you could do this is using std::copy_if with a stateful functor remembering whether the last character was a space:
#include <algorithm>
#include <iterator>
#include <string>
#include <iostream>

struct if_not_prev_space
{
    // Is last encountered character space.
    bool m_is = false;

    bool operator()(const char c)
    {
        // Copy if last was not space, or current is not space.
        const bool ret = !m_is || c != ' ';
        m_is = c == ' ';
        return ret;
    }
};

int main()
{
    const std::string s("abc sssd g g sdg gg gf into abc sssd g g sdg gg gf");
    std::string o;
    std::copy_if(std::begin(s), std::end(s), std::back_inserter(o), if_not_prev_space());
    std::cout << o << std::endl;
}
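A variation on the same idea keeps the state in a local bool captured by reference, so it starts fresh on every call and is not affected if the algorithm copies the predicate. The sketch below also removes a single trailing space after copy_if, for inputs that end in whitespace. The function name collapse_spaces is hypothetical.

```cpp
#include <algorithm>
#include <iterator>
#include <string>

// Hypothetical helper: copy_if with state held in a local variable.
std::string collapse_spaces(const std::string& s)
{
    std::string o;
    bool prev_is_space = false;  // reset on every call
    std::copy_if(s.begin(), s.end(), std::back_inserter(o),
        [&prev_is_space](char c) {
            // Copy if last was not space, or current is not space.
            const bool keep = !prev_is_space || c != ' ';
            prev_is_space = (c == ' ');
            return keep;
        });
    // At most one trailing space can survive the copy; drop it.
    if (!o.empty() && o.back() == ' ')
        o.pop_back();
    return o;
}
```

A single leading space is still kept; whether that matters depends on the requirements.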
Damian
Updated on July 12, 2022
Comments
-
Damian almost 2 years
I tried to write a script that removes extra white spaces but I didn't manage to finish it.
Basically I want to transform
abc sssd g g sdg gg gf
into
abc sssd g g sdg gg gf
In languages like PHP or C#, it would be very easy, but not in C++, I see. This is my code:
#include <iostream>
#include <stdio.h>
#include <stdlib.h>
#include <cstring>
#include <unistd.h>
#include <string.h>

char* trim3(char* s)
{
    int l = strlen(s);
    while(isspace(s[l - 1])) --l;
    while(* s && isspace(* s)) ++s, --l;
    return strndup(s, l);
}

char *str_replace(char * t1, char * t2, char * t6)
{
    char*t4;
    char*t5=(char *)malloc(10);
    memset(t5, 0, 10);
    while(strstr(t6,t1))
    {
        t4=strstr(t6,t1);
        strncpy(t5+strlen(t5),t6,t4-t6);
        strcat(t5,t2);
        t4+=strlen(t1);
        t6=t4;
    }
    return strcat(t5,t4);
}

void remove_extra_whitespaces(char* input,char* output)
{
    char* inputPtr = input; // init inputPtr always at the last moment.
    int spacecount = 0;
    while(*inputPtr != '\0')
    {
        char* substr;
        strncpy(substr, inputPtr+0, 1);
        if(substr == " ")
        {
            spacecount++;
        }
        else
        {
            spacecount = 0;
        }
        printf("[%p] -> %d\n",*substr,spacecount);
        // Assume the string last with \0
        // some code
        inputPtr++; // After "some code" (instead of what you wrote).
    }
}

int main(int argc, char **argv)
{
    printf("testing 2 ..\n");
    char input[0x255] = "asfa sas f f dgdgd dg ggg";
    char output[0x255] = "NO_OUTPUT_YET";
    remove_extra_whitespaces(input,output);
    return 1;
}
It doesn't work. I tried several methods. What I am trying to do is to iterate the string letter by letter and dump it in another string as long as there is only one space in a row; if there are two spaces, don't write the second character to the new string.
How can I solve this?
-
Damian over 8 years
yes, string > char[0x255], I agree, but I want to stick with char* because all the code is in char* ...
-
Damian over 8 years
yes, string > char[0x255], I agree, but I want to stick with char* because all the code is in char* ... can it be done?
-
anatolyg over 8 years
You can convert back and forth: from char* to string by a constructor, and back by c_str() and strcpy. Lots of unnecessary work for the CPU, but less headache for you.
-
Ami Tavory over 8 years
Not sure you meant to address the comment to me, but see string::c_str.
-
jaggedSpire over 8 years
This leaves one extra space at the end of the string if it ends in whitespace. Not sure if OP's shifting requirements need that to be taken care of...
-
Deduplicator over 8 years
@anatolyg: If it's done at the right places at the right times, there's probably at most a little amount of extra work for the optimizer.
-
Christophe over 8 years
Nice, but the static prev_is_space would not be reset if you executed this block several times (in a loop, in a function, or in several threads). For this to work safely you'd need to capture a local bool that you can reset when needed.
-
Ami Tavory over 8 years
@jaggedSpire Good point. I must say I thought of that, and decided (perhaps wishful-thinkingly) that it fits the problem requirements. If not, though, it can be solved with a single line after the application of copy_if.
-
Deduplicator over 8 years
That's a nice one. Though it should have the original's signature, probably.
-
Lol4t0 over 8 years
@Christophe, I see. Thanks.
-
Christophe over 8 years
@Deduplicator yes, I edited to recommend switching to std::string
-
Damian over 8 years
yes, I agree as well, string is the best, but all the script is written (2000 lines) using char* ... and this script must run on centos 4, 5.1, debian 4, unix based systems ... and so on, and it is better to use the simplest functions possible, to not get segmentation fault ...
-
Damian over 8 years
hmm, very interesting, so basically your int remove_whitesaces(char *p) function does not have to take two parameters, just modify it "on the fly" with the power of pointers, right?
-
Jts over 8 years
Yeah, because the output length will always be equal to or lower than the input length, so there's no need to create another object. I also overloaded it to support std::strings (and again no memory allocation takes place). I thought you would accept my answer since it's actually customizable (and doesn't accept tabs ('\t'), which are considered spaces by almost everyone). And it can ignore line breaks if needed.
-
Jts over 8 years
Your function doesn't work properly. If there are spaces at the beginning or the end, it keeps them. Not what the OP wants.
-
villapx over 8 years
No problem. Note also that remove_extra_whitespaces() assumes that the final string won't overflow the memory allocated for output; if it does, you'd likely get a segmentation fault.
-
Christophe over 8 years
@José my function removes redundant spaces as requested by the OP. I couldn't find any evidence in the question that the starting space or the ending space should be removed. If this were a requirement, you'd just replace input.begin() with a find_if() and add a conditional erase before returning.
-
Christophe over 8 years
@Damian the nice thing with the algorithm library is that many algorithms also work with pointers instead of iterators. Here is the online demo using the same algorithm, yet keeping C-style strings as you like them ;-)
-
Deduplicator over 8 years
Just two general comments: 1. using namespace is a scourge, only acceptable when the namespace is guaranteed to only contain the symbols you want to import. 2. std::endl does a manual flush, which is generally simply wasteful.
-
Deduplicator over 8 years
BTW: You might want to add the cstring-solution to your answer.
-
Peter Cordes over 8 years
@Damian: using simpler functions is no guarantee of avoiding bugs. The more code you have to write yourself, instead of using library tools, the more chance there is of having a bug. Obviously you have to understand the library functions you use, and C++ has way more than C.
-
Peter - Reinstate Monica over 8 years
This is an elegant solution (stateful predicate).
-
Damian about 8 years
sscanf is a function that can be used in ANSI C (plain C) as well?
-
Peter - Reinstate Monica about 8 years
@Damian Oh yes, it is. It's part of the C standard (and with it, part of the POSIX standard for Unix-like systems).
-
Damian about 8 years
thank you, you know, C is a very old programming language, it gives me headaches all the time ... look at this: stackoverflow.com/questions/35873677/…
-
Damian about 8 years
C is a very old programming language, it gives me headaches all the time ... look at this: stackoverflow.com/questions/35873677/…