How to select the first 10 words of a sentence?

Solution 1

implode(' ', array_slice(explode(' ', $sentence), 0, 10));

To add support for other word breaks like commas and dashes, preg_match gives a quick way and doesn't require splitting the string:

function get_words($sentence, $count = 10) {
  preg_match("/(?:\w+(?:\W+|$)){0,$count}/", $sentence, $matches);
  return $matches[0];
}

As Pebbl mentions, PHP doesn't handle UTF-8 or Unicode all that well, so if that is a concern you can replace \w with [^\s,\.;\?\!] and \W with [\s,\.;\?\!].
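As a rough sketch of that substitution: the function name get_words_custom is made up here, and the u modifier is my own addition so that multi-byte characters are matched whole. It assumes the input is valid UTF-8.

```php
<?php
// Hypothetical variant of get_words() using explicit character classes
// instead of \w and \W. Assumes $sentence is valid UTF-8.
function get_words_custom($sentence, $count = 10) {
    $gap  = '[\s,\.;\?\!]';    // characters treated as word breaks
    $word = '[^\s,\.;\?\!]';   // everything else counts as part of a word
    $pattern = '/(?:' . $word . '+(?:' . $gap . '+|$)){0,' . (int) $count . '}/u';

    // preg_match() returns false on invalid UTF-8, leaving $matches empty.
    if (!preg_match($pattern, $sentence, $matches)) {
        return '';
    }
    // The match keeps the break after the last word, so trim trailing whitespace.
    return rtrim($matches[0]);
}

echo get_words_custom('Это не те дроиды, которые вы ищете', 5);
```

Note that only trailing whitespace is trimmed, so a comma or semicolon directly after the last word stays in the result.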

Solution 2

Simply splitting on spaces will give wrong results if the sentence has an unexpected character in place of a space, or if it contains several consecutive spaces.
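To make the failure concrete, here is a small sketch (the sample strings and the helper name first_words_by_space are made up for illustration) showing how a doubled space or a tab throws off the count when splitting on a single space:

```php
<?php
// Hypothetical helper: the naive approach of splitting on a single space.
function first_words_by_space($sentence, $count) {
    return implode(' ', array_slice(explode(' ', $sentence), 0, $count));
}

// A doubled space produces an empty "word", so only one real word comes back.
$doubled = first_words_by_space('one  two three four', 2);   // "one "

// A tab is not a space, so "one\ttwo" is counted as a single word.
$tabbed = first_words_by_space("one\ttwo three four", 2);    // "one\ttwo three"

var_dump($doubled, $tabbed);
```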

The following version works no matter what kind of "space" separates the words, and it can easily be extended to handle other characters. It currently supports any whitespace character plus , . ; ? !

function get_snippet( $str, $wordCount = 10 ) {
  return implode( 
    '', 
    array_slice( 
      preg_split(
        '/([\s,\.;\?\!]+)/', 
        $str, 
        $wordCount*2+1, 
        PREG_SPLIT_DELIM_CAPTURE
      ),
      0,
      $wordCount*2-1
    )
  );
}

Regular expressions are perfect for this issue, because you can easily make the code as flexible or strict as you like. You do have to be careful however. I specifically approached the above targeting the gaps between words — rather than the words themselves — because it is rather difficult to state unequivocally what will define a word.

Take the \w word-character class, or its inverse \W. I rarely rely on these, mainly because, depending on the software you are using (like certain versions of PHP), they don't always match Unicode characters in UTF-8 text.
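A quick way to see this for yourself is the sketch below. It assumes the script is saved as UTF-8 and that the locale has not been changed with setlocale(); the variable names are only for illustration. Without the u modifier, PCRE works on single bytes and \w covers only ASCII letters, digits and the underscore. With it, PHP treats the pattern and subject as UTF-8 and \w also matches Unicode letters.

```php
<?php
$word = 'дроиды';

// Byte mode: none of the UTF-8 bytes count as \w, so nothing matches.
// (Assumes the default "C" locale.)
$ascii_only = preg_match('/\w+/', $word);

// UTF-8 mode: \w matches Unicode letters, so the whole word matches.
$unicode = preg_match('/^\w+$/u', $word);

var_dump($ascii_only, $unicode);
```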

In regular expressions it is better to be specific at all times, so that your expressions can handle things like the following, no matter where they are run:

echo get_snippet('Это не те дроиды, которые вы ищете', 5);

/// outputs: Это не те дроиды, которые

Avoiding splitting could be worthwhile, however, in terms of performance. So you could use Kelly's updated approach but replace \w+ with [^\s,\.;\?\!]+ and \W+ with [\s,\.;\?\!]+. Personally, though, I like the simplicity of the splitting expression used above: it is easier to read and therefore to modify. The stack of PHP functions, however, is a bit ugly :)

Solution 3

http://snipplr.com/view/8480/a-php-function-to-return-the-first-n-words-from-a-string/

function shorten_string($string, $wordsreturned)
{
    $retval = $string;  //  Just in case of a problem
    $array = explode(" ", $string);
    /*  Already short enough, return the whole thing*/
    if (count($array)<=$wordsreturned)
    {
        $retval = $string;
    }
    /*  Need to chop off some words*/
    else
    {
        array_splice($array, $wordsreturned);
        $retval = implode(" ", $array)." ...";
    }
    return $retval;
}

Solution 4

I suggest using str_word_count:

<?php
$str = "Lorem ipsum       dolor sit    amet, 
        consectetur        adipiscing elit";
print_r(str_word_count($str, 1));
?>

The above example will output:

Array
(
    [0] => Lorem
    [1] => ipsum
    [2] => dolor
    [3] => sit
    [4] => amet
    [5] => consectetur
    [6] => adipiscing
    [7] => elit
)

Then use a loop to get the words you want.

Source: http://php.net/str_word_count
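For instance, a loop along these lines would collect the first few words (a minimal sketch; $limit and $selected are names I picked for the example, not part of the manual's snippet):

```php
<?php
$str = "Lorem ipsum       dolor sit    amet, 
        consectetur        adipiscing elit";
$limit = 5;            // hypothetical: how many words to keep
$selected = array();

foreach (str_word_count($str, 1) as $word) {
    if (count($selected) >= $limit) {
        break;         // stop once enough words are collected
    }
    $selected[] = $word;
}

echo implode(' ', $selected);   // Lorem ipsum dolor sit amet
```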

Solution 5

This can easily be done using str_word_count():

$first10words = implode(' ', array_slice(str_word_count($sentence,1), 0, 10));
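
One thing to keep in mind with this approach: str_word_count() returns only the words themselves, so punctuation and the original spacing are not preserved in the result. A small sketch with a made-up sentence:

```php
<?php
$sentence = "Hello, world!  This is a test; nothing more.";
$first4 = implode(' ', array_slice(str_word_count($sentence, 1), 0, 4));

echo $first4;   // Hello world This is
```

If the punctuation matters, one of the regex-based solutions is probably a better fit.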
Author by

AAA

Updated on May 11, 2020

Comments

  • AAA about 4 years

    How do I, from an output, only select the first 10 words?

  • Farzher over 11 years
    +1 Why was this at 0 votes? It's a better solution than the other answers. Although, people shouldn't be using camel case in PHP.
  • Pebbl over 11 years
    @StephenSarcsamKamenar thanks... and good point, I'd been doing too much javascripting that day :)
  • JeanValjean over 11 years
    I do agree with @StephenSarcsamKamenar's question! I suppose that there are too many answers here. It is the duty of whoever asked the question to mark the right answer. This is the best for me: +1 without a doubt!
  • NotJay over 10 years
    This worked great for me. I needed to display only the first 5 sentences, however, so I switched the 10 to a 5, then switched the ' ' to '. ' in the implode and explode, and it worked just fine. I did have to put a period after I displayed the text because the last period was omitted. Thank you.
  • Pebbl about 9 years
    Nice update, +1 for avoiding the splitting (and using regular expressions!). You'll want to watch out for those word boundaries however, as per my updated answer.
  • Kelly about 9 years
    It's unfortunate that PHP still hasn't figured out how to handle Unicode -- thanks for the info, I've updated my answer.
  • ingalcala about 8 years
    Thank you very much! This worked on my site with WPIMPORTALL to select only the first 6 letters. The Unicode note was also an excellent addition. Wonderful!
  • Greeso about 8 years
    Great answer. However, I would like to add that you may need to use trim() around your $str before you process it. This way you eliminate any whitespace at the ends. This would help if you want to check whether to add an ellipsis to the end of the string when the resulting string is a subset of the original.
  • Mostafa over 5 years
    How can I return 10 words if the string contains <p>? This does not work with a string that contains HTML from a theme...
  • Kelly over 5 years
    You're going to have to strip the HTML out of the string. Try using strip_tags.
  • Alex over 2 years
    I think the above snippet can be slightly optimized by replacing "$str, $wordCount*2+1" with "$str, $wordCount+1", as the count of resulting chunks does not include the splitting characters/words.
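
Pulling the suggestions from the comments together, here is a sketch rather than a tested drop-in: it strips HTML with strip_tags(), trims the input, uses the smaller limit Alex proposes, and appends an ellipsis when something was cut. The function name get_snippet_plain and the $ellipsis parameter are made up for this example.

```php
<?php
// Hypothetical helper combining the ideas from the comments.
function get_snippet_plain($str, $wordCount = 10, $ellipsis = ' ...') {
    // Remove tags, then surrounding whitespace, so the text starts with a word.
    $text = trim(strip_tags($str));

    // With PREG_SPLIT_DELIM_CAPTURE the limit counts only the pieces between
    // delimiters, so $wordCount + 1 pieces are enough (words plus the remainder).
    $parts = preg_split(
        '/([\s,\.;\?\!]+)/',
        $text,
        $wordCount + 1,
        PREG_SPLIT_DELIM_CAPTURE
    );

    // Keep word, gap, word, ... ending on the last wanted word.
    $snippet = implode('', array_slice($parts, 0, $wordCount * 2 - 1));

    // Add the ellipsis whenever anything (even trailing punctuation) was cut.
    return $snippet === $text ? $snippet : $snippet . $ellipsis;
}

echo get_snippet_plain('<p>Это не те дроиды, которые вы ищете</p>', 5);
```

The comparison against the cleaned input is deliberately simple, so a sentence that merely loses its final full stop also gets the ellipsis. Adjust that check if it matters for your output.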