Unix: Using find to list all .html files (without shell wildcards or the ls command)

11,325

Solution 1

What you needed was

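find -name '*.html'

(GNU find searches the current directory when no starting path is given; POSIX find requires a path, as in find . -name '*.html'.)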

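Or, with a regular expression. Note that -regex matches against the whole path (for example ./docs/index.html), not just the file name, which is why the pattern begins with .*/: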

find -regex '.*/.*\.html'

To ignore case, use -iname or -iregex:

find -iname '*.html'
find -iregex '.*/.*\.html'
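A quick way to see how these tests differ is to run them against a small throwaway tree. The sketch below assumes GNU findutils (for -iname) and a POSIX shell; the directory and file names are invented for the demo. The last test checks the whole-path behaviour of -regex: a pattern like 'page\.html' on its own matches nothing.

```shell
#!/bin/sh
# Build a throwaway tree (names are made up for this demo).
tmp=$(mktemp -d)
cd "$tmp" || exit 1
mkdir -p docs/sub
touch index.html docs/About.HTML docs/sub/page.html notes.txt

# -name tests only the base name against a shell-style pattern.
by_name=$(find . -name '*.html' | LC_ALL=C sort)

# -regex tests the WHOLE path (e.g. ./docs/sub/page.html).
by_regex=$(find . -regex '.*/.*\.html' | LC_ALL=C sort)

# A regex that only describes the base name therefore matches nothing.
partial=$(find . -regex 'page\.html')

# -iname is the case-insensitive variant of -name (GNU find).
by_iname=$(find . -iname '*.html' | LC_ALL=C sort)

# Leave the scratch directory before deleting it.
cd / && rm -rf "$tmp"
printf '%s\n' "$by_name" "---" "$by_iname"
```

Sorting with LC_ALL=C sort only makes the output order predictable; find itself lists entries in directory order.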

Manual for -name:

   -name pattern
          Base of file name (the path with the leading directories
          removed) matches shell pattern pattern.  The metacharacters
          (`*', `?', and `[]') match a `.' at the start of the base name
          (this is a change in findutils-4.2.2; see section STANDARDS
          CONFORMANCE below).  To ignore a directory and the files under it,
          use -prune; see an example in the description of -path.  Braces
          are not recognised as being special, despite the fact that some
          shells including Bash imbue braces with a special meaning in
          shell patterns.  The filename matching is performed with the use
          of the fnmatch(3) library function.   Don't forget to enclose
          the pattern in quotes in order to protect it from expansion by
          the shell.
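Two details from this manual entry are easy to check in a scratch directory: find's * also matches a leading dot (so hidden files are listed), and braces are not expanded. The sketch assumes GNU findutils 4.2.2 or later and uses made-up file names. Matching several extensions is done here with -o and escaped parentheses, which is one common way to get what a brace pattern would do in the shell.

```shell
#!/bin/sh
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch .hidden.html page.html page.htm

# Unlike a shell glob, find's '*' also matches a leading '.',
# so .hidden.html is listed.
dotted=$(find . -name '*.html' | LC_ALL=C sort)

# Braces are not special to -name: this only matches a name that
# literally ends in ".{html,htm}", so nothing is found here.
braces=$(find . -name '*.{html,htm}')

# OR two -name tests together; the parentheses are escaped so the
# shell passes them to find.
either=$(find . \( -name '*.html' -o -name '*.htm' \) | LC_ALL=C sort)

cd / && rm -rf "$tmp"
printf '%s\n' "$dotted" "---" "$either"
```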

Solution 2

find . -name '*.html'

You have to quote the pattern (single quotes are the safest choice) to keep the shell from globbing it before it is passed to find.
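What "globbing" means here can be made visible by handing the pattern to printf instead of find, since printf simply prints back whatever arguments it receives. This is a minimal sketch in a scratch directory with two invented .html files; the same expansion would happen to find's argument.

```shell
#!/bin/sh
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch a.html b.html

# Unquoted: the shell expands *.html BEFORE the command runs,
# so the command receives the matching file names.
unquoted=$(printf '[%s]' *.html)

# Quoted (single or double): the pattern reaches the command untouched.
single=$(printf '[%s]' '*.html')
double=$(printf '[%s]' "*.html")

cd / && rm -rf "$tmp"
printf '%s\n' "$unquoted" "$single" "$double"
```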

Solution 3

You want

find . -name "*.html"

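If you use -regex, note that GNU find uses Emacs-style regular expressions by default, not the POSIX syntax you are probably used to; -regextype posix-extended switches to POSIX extended syntax.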
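For example, matching either .html or .htm needs different escaping under the two syntaxes. The sketch assumes GNU find, where -regextype is available; the file names are invented.

```shell
#!/bin/sh
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch index.html index.htm notes.txt

# Default (emacs) syntax: grouping and alternation need backslashes.
emacs=$(find . -regex '.*\.\(html\|htm\)' | LC_ALL=C sort)

# -regextype must come before -regex; in POSIX extended syntax
# ( ) and | are special without backslashes.
ere=$(find . -regextype posix-extended -regex '.*\.(html|htm)' | LC_ALL=C sort)

cd / && rm -rf "$tmp"
printf '%s\n' "$emacs" "---" "$ere"
```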

Solution 4

You are missing a couple of things here. First, the path: to search from the current directory, give . as the starting point. For example, find . lists every file and directory under the current directory, recursively. Second, * is a wildcard, and it must be quoted so that find, not the shell, does the matching. So to find all the .html files in the current directory and its subdirectories, try

find . -name '*.html'
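Without quotes, the result depends on what happens to be in the current directory. In this sketch (invented file names, GNU find and a POSIX shell assumed) index.html sits in the current directory, so the shell turns the unquoted command into find . -name index.html and sub/page.html is silently missed.

```shell
#!/bin/sh
tmp=$(mktemp -d)
cd "$tmp" || exit 1
mkdir sub
touch index.html sub/page.html

# Unquoted: the shell rewrites this to `find . -name index.html`
# because index.html exists in the current directory.
unquoted=$(find . -name *.html)

# Quoted: find itself applies the pattern at every level.
quoted=$(find . -name '*.html' | LC_ALL=C sort)

cd / && rm -rf "$tmp"
printf '%s\n' "$unquoted" "---" "$quoted"
```

If two or more .html files are in the current directory, GNU find instead stops with a "paths must precede expression" error; if there are none, a POSIX shell passes *.html through unchanged and the unquoted command only appears to work.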
Author: Zoe

Updated on June 07, 2022

Comments

  • Zoe (almost 2 years ago)

    I've tried 'find -name .html$' and 'find -name .html\>'.
    Neither worked.

    I'd like to know why these two are wrong, and what the right command is without wildcards.

    • ott-- (over 10 years ago)
      find . -type f | grep '\.html$'. What's wrong with using wildcards? find . -name '*.html'.
    • matth (over 10 years ago)
      @Zoe, in future when you have seemingly arbitrary constraints, please explain the source of the constraint so that people may provide appropriate help. Is this a bar bet? An online quiz? Or do you have some specific engineering reason to avoid wildcard characters?
    • Zoe (over 10 years ago)
      It's just an exercise. I guess the restriction is there to stop us taking the 'easy pass' and make us understand the other options. Thanks for the suggestion!
  • Zoe (over 10 years ago)
    But isn't that a wildcard?
  • Zoe (over 10 years ago)
    But isn't that a shell wildcard?
  • matth (over 10 years ago)
    Since it is quoted, it isn't a shell wildcard. It is a find wildcard.
  • Zoe (over 10 years ago)
    Can you explain the difference in more detail? Thanks!
  • matth (over 10 years ago)
    A command-line parameter like *.html is interpreted by the shell. A command-line parameter like '*.html' is passed, uninterpreted, to the command. In this instance, find uses * in a similar, but not identical, manner to the shell. As a more obvious example, consider echo '***HELLO!***'. The * characters in that command are most certainly not wildcards; they are simply arguments to echo.
  • Zoe (over 10 years ago)
    What's the difference between using "" and ''? Thanks!
  • Andrew Stubbs (over 10 years ago)
    In this case there is no difference. In general, double quotes would let you abstract this into find . -name "*.$EXTENSION" and define the variable $EXTENSION elsewhere; single quotes would not allow this.
  • matth (over 10 years ago)
    @Zoe - "Enclosing characters in single quotes (') preserves the literal value of each character within the quotes.", and "Enclosing characters in double quotes (") preserves the literal value of all characters within the quotes, with the exception of $, `, \ , and, when history expansion is enabled, !." -- Bash Reference Manual
  • konsolebox (over 10 years ago)
    @Zoe It is, but the pattern is interpreted by find, not the shell, because it is quoted with single quotes. Also, in Windows/DOS these are called wildcards, but in Linux or UNIX they are mostly known as glob patterns. See my update, by the way, for more info.
  • Zoe (over 10 years ago)
    find -regex '.*/.*\.html' What's .*/.* doing here?
  • zwol (over 10 years ago)
    I am deeply amused by your referring to POSIX basic regular expressions as "emacs regex" and contrasting them with "posix [regex]" by which I assume you mean POSIX extended regular expressions and/or the even more extended (but not POSIX-standardized, and technically not even "regular expressions" anymore) Perl regexes.
  • Andrew Stubbs (over 10 years ago)
    @Zack - I was merely reciting from the find manpage, but I am glad to have caused you much amusement: "-regextype type ... Currently implemented types are emacs (this is the default)"
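Two ideas from these comments can be tried side by side: the no-wildcard grep pipeline suggested by ott-- (with the dot escaped, since 'html$' alone also matches a name such as xhtml), and Andrew Stubbs's double-quoted pattern built from $EXTENSION. A minimal sketch with invented file names, assuming a standard find and grep:

```shell
#!/bin/sh
tmp=$(mktemp -d)
cd "$tmp" || exit 1
mkdir sub
touch index.html sub/page.html xhtml notes.txt

# No wildcard at all: list regular files, keep names ending in ".html".
via_grep=$(find . -type f | grep '\.html$' | LC_ALL=C sort)

# Without the escaped dot, "xhtml" slips through as well.
loose=$(find . -type f | grep 'html$' | LC_ALL=C sort)

# Double quotes let the shell substitute $EXTENSION but still stop globbing.
EXTENSION=html
via_var=$(find . -type f -name "*.$EXTENSION" | LC_ALL=C sort)

cd / && rm -rf "$tmp"
printf '%s\n' "$via_grep" "---" "$loose"
```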