Copy only specific text from one file to another

26,924

Solution 1

I assume every line of the file follows the same pattern. If that is the case, you can use a command like the one below.

grep -o ' path=.*$' file.txt | cut -c8- |rev | cut -c 4- | rev

Here grep -o prints only the part of each line from ' path=' to the end. cut -c8- then drops the leading ' path="' (7 characters), and the rev | cut -c 4- | rev step removes the trailing '"/>' by reversing each line, cutting off its first 3 characters, and reversing it back.
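
To make the stages easier to follow, here is a small sketch that runs the same pipeline on a single sample line and keeps each intermediate result. The variable names (line, stage1, stage2, stage3) are only for illustration and are not part of the original answer.

```shell
# One sample line, taken from the test data in this answer.
line='<classpathentry kind="con" path="WOFramework/ERJars"/>'

# Stage 1: keep only the text from ' path=' to the end of the line.
stage1=$(printf '%s\n' "$line" | grep -o ' path=.*$')

# Stage 2: drop the first 7 characters, i.e. the leading ' path="'.
stage2=$(printf '%s\n' "$stage1" | cut -c8-)

# Stage 3: reverse, drop the first 3 characters (the reversed '"/>'), reverse back.
stage3=$(printf '%s\n' "$stage2" | rev | cut -c4- | rev)

printf '[%s]\n' "$stage1" "$stage2" "$stage3"
# [ path="WOFramework/ERJars"/>]
# [WOFramework/ERJars"/>]
# [WOFramework/ERJars]
```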

Another approach, using awk:

awk -F'path="' '{print $2}' file.txt |rev | cut -c 4- | rev

Here path=" is the field separator, so $2 is everything after it on the line. The rev | cut | rev step then strips the trailing "/> as above.
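
If you would rather not pipe through rev at all, one possible variant (my own sketch, not part of the original answer) is to let awk split on either path=" or "/> so that the path is already isolated in field 2. It assumes every line ends in "/> as in the sample data; the variable names are illustrative.

```shell
# Two lines in the same format as file.txt.
sample='<classpathentry kind="src" path="Sources"/>
<classpathentry kind="con" path="WOFramework/ERJars"/>'

# A field separator longer than one character is treated as a regular
# expression, so each line is split on either 'path="' or '"/>'.
paths=$(printf '%s\n' "$sample" | awk -F'path="|"/>' '{print $2}')

printf '%s\n' "$paths"
# Sources
# WOFramework/ERJars
```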

Testing

cat file.txt
<classpathentry kind="src" path="Sources"/>
<classpathentry kind="con" path="WOFramework/ERExtensions"/>
<classpathentry kind="con" path="WOFramework/ERJars"/>
<classpathentry kind="con" path="WOFramework/ERPrototypes"/>
<classpathentry kind="con" path="WOFramework/JavaEOAccess"/>
<classpathentry kind="con" path="WOFramework/JavaEOControl"/>
<classpathentry kind="con" path="WOFramework/JavaFoundation"/>
<classpathentry kind="con" path="WOFramework/JavaJDBCAdaptor"/>

After running either command, the output is:

Sources
WOFramework/ERExtensions
WOFramework/ERJars
WOFramework/ERPrototypes
WOFramework/JavaEOAccess
WOFramework/JavaEOControl
WOFramework/JavaFoundation
WOFramework/JavaJDBCAdaptor

A better approach, suggested by Stéphane Chazelas in the comments (like the commands above, it also prints the Sources line):

cut -d '"' -f4 file.txt
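
To see why it is field 4, this sketch prints the first five quote-delimited fields of one sample line. The loop and the variable names are just for illustration.

```shell
line='<classpathentry kind="con" path="WOFramework/ERJars"/>'

# Print fields 1 to 5 of the line, using the double quote as delimiter.
fields=$(
    for n in 1 2 3 4 5; do
        printf 'field %s: [%s]\n' "$n" "$(printf '%s\n' "$line" | cut -d '"' -f "$n")"
    done
)

printf '%s\n' "$fields"
# field 1: [<classpathentry kind=]
# field 2: [con]
# field 3: [ path=]
# field 4: [WOFramework/ERJars]
# field 5: [/>]
```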

Solution 2

A simple approach with awk:

awk -F\" '/WOF/ {print $4}' abc.txt > outfile
  • -F\" changes the field separator from the default (whitespace) to a quote mark (escaped with \)
  • /WOF/ restricts the output to the records (lines of the file) that match the pattern WOF
  • $4 is the fourth field of each matching record: the path.
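
A possible variation, assuming you want to select by the kind attribute rather than by the WOF prefix of the path: with the quote as separator, the kind value is field 2, so it can be compared directly. This is a sketch of mine, not part of the original answer, and the variable names are illustrative.

```shell
sample='<classpathentry kind="src" path="Sources"/>
<classpathentry kind="con" path="WOFramework/ERJars"/>
<classpathentry kind="con" path="WOFramework/JavaEOAccess"/>'

# Field 2 is the value of kind, field 4 is the value of path.
paths=$(printf '%s\n' "$sample" | awk -F'"' '$2 == "con" {print $4}')

printf '%s\n' "$paths"
# WOFramework/ERJars
# WOFramework/JavaEOAccess
```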

Solution 3

sed -n '/.*="con"[^"]*./{s///;s/..>//p;}' <<\DATA

<classpathentry kind="src" path="Sources"/>
<classpathentry kind="con" path="WOFramework/ERExtensions"/>
<classpathentry kind="con" path="WOFramework/ERJars"/>
<classpathentry kind="con" path="WOFramework/ERPrototypes"/>
<classpathentry kind="con" path="WOFramework/JavaEOAccess"/>
<classpathentry kind="con" path="WOFramework/JavaEOControl"/>
<classpathentry kind="con" path="WOFramework/JavaFoundation"/>
<classpathentry kind="con" path="WOFramework/JavaJDBCAdaptor"/>
DATA

OUTPUT

WOFramework/ERExtensions
WOFramework/ERJars
WOFramework/ERPrototypes
WOFramework/JavaEOAccess
WOFramework/JavaEOControl
WOFramework/JavaFoundation
WOFramework/JavaJDBCAdaptor

This should print only the WO... paths, I think, because it acts only on lines containing ="con". It's also fully portable.
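
The script is terse, so here is a sketch that splits it into its two substitutions on a single sample line. The variable names are mine. The key point is that the empty pattern in s/// reuses the last regular expression, which here is the address.

```shell
line='<classpathentry kind="con" path="WOFramework/ERJars"/>'

# Step 1: the address matches everything up to and including the quote that
# opens the path value; s/// (empty pattern) reuses that regex and deletes it.
step1=$(printf '%s\n' "$line" | sed -n '/.*="con"[^"]*./{s///;p;}')

# Step 2: delete the two characters before '>' together with the '>' itself,
# i.e. the trailing '"/>'.
step2=$(printf '%s\n' "$step1" | sed 's/..>//')

printf '[%s]\n' "$step1" "$step2"
# [WOFramework/ERJars"/>]
# [WOFramework/ERJars]
```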

Solution 4

Another approach with grep and cut:

grep "kind=\"con\"" sample.txt | cut -d \" -f 4 > sample_edited.txt

This selects the lines containing kind="con" with grep, then uses cut with " as the delimiter to print the fourth field of each line, which is the path.

Solution 5

Another solution, if your version of grep supports PCRE-style lookarounds (the -P option):

grep -oP '(?<=kind="con" path=").+?(?="/>)' abc.txt
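
Since -P is not available in every grep, a script could probe for it first and fall back to something else. The sketch below is one way to do that; the probe, the sed fallback, and the variable names are my own assumptions, not part of the original answer.

```shell
sample='<classpathentry kind="src" path="Sources"/>
<classpathentry kind="con" path="WOFramework/ERJars"/>
<classpathentry kind="con" path="WOFramework/JavaEOAccess"/>'

if printf 'x\n' | grep -qP 'x' 2>/dev/null; then
    # PCRE is available: the lookbehind and lookahead must match,
    # but only the text between them is printed by -o.
    paths=$(printf '%s\n' "$sample" | grep -oP '(?<=kind="con" path=").+?(?="/>)')
else
    # No -P support: a sed capture group gives the same result here.
    paths=$(printf '%s\n' "$sample" | sed -n 's|.*kind="con" path="\([^"]*\)"/>.*|\1|p')
fi

printf '%s\n' "$paths"
# WOFramework/ERJars
# WOFramework/JavaEOAccess
```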

Author: gkmohit

Updated on September 18, 2022

Comments

  • gkmohit
    gkmohit almost 2 years

    I have a file abc.txt the contents are

    <classpathentry kind="src" path="Sources"/>
    <classpathentry kind="con" path="WOFramework/ERExtensions"/>
    <classpathentry kind="con" path="WOFramework/ERJars"/>
    <classpathentry kind="con" path="WOFramework/ERPrototypes"/>
    <classpathentry kind="con" path="WOFramework/JavaEOAccess"/>
    <classpathentry kind="con" path="WOFramework/JavaEOControl"/>
    <classpathentry kind="con" path="WOFramework/JavaFoundation"/>
    <classpathentry kind="con" path="WOFramework/JavaJDBCAdaptor"/>
    

    I want to copy all the paths into another file. That is, I want my output text file to look like:

        WOFramework/ERExtensions
        WOFramework/ERJars
        WOFramework/ERPrototypes
        WOFramework/JavaEOAccess
        WOFramework/JavaEOControl
        WOFramework/JavaFoundation
        WOFramework/JavaJDBCAdaptor
    
    • Remon
      Remon about 10 years
      you want to copy depending on kind?
    • Mikel
      Mikel about 10 years
      Looks like you're trying to extract parts of an XML document. Try an XML tool such as xmlstarlet or xmllint. stackoverflow.com/questions/91791/…
    • Stéphane Chazelas
      Stéphane Chazelas about 10 years
      cut -d '"' -f4?
    • Ramesh
      Ramesh about 10 years
      @StephaneChazelas, your answer should be the best solution :)
  • mikeserv
    mikeserv about 10 years
    He doesn't want "Sources"
  • mikeserv
    mikeserv about 10 years
    Actually, I guess he accepted it - so what do I know?
  • text
    text about 10 years
    sed -e 's/.*path="//' -e 's:".*$::' abc.txt > output_file -- dropping everything after the last quote instead of specific matching at the end.
  • Avinash Raj
    Avinash Raj about 10 years
    It displays extra lines.
  • Mathias Begert
    Mathias Begert about 10 years
    @AvinashRaj, where are you seeing extra lines in the OP's input data? The answer above is tailored to the OP's data.
  • Avinash Raj
    Avinash Raj about 10 years
    It also displays Sources, given the OP's input.