Friday, August 18, 2017

Process text streams using filters (sort, nl, wc, expand, cut, paste)



We can see the text files that we created.

[ user@tcox1 ~]$ mkdir tmp
[ user@tcox1 ~]$  ll
[ user@tcox1 ~]$  cd tmp
[ user@tcox1 tmp]$  cp ../*.txt .

These are the text files we are going to work with.
The first command we are going to cover is sort.
It allows us to sort either numerically or alphabetically.
By default, it compares whole lines as text, starting from the first character.


[ user@tcox1 tmp]$  cat numbers.txt
1
5
10
13
55
511
6
2
49

[ user@tcox1 tmp]$ sort numbers.txt

[ user@tcox1 tmp]$  sort -n numbers.txt

1
2
5
6
10
13
49
55
511
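
The difference between the default sort and sort -n can be sketched as follows. This is a minimal example that recreates numbers.txt in a scratch directory; the mktemp location and the variable names are assumptions for illustration, not part of the original session.

```shell
# Recreate the sample data in a scratch directory (hypothetical location).
workdir=$(mktemp -d)
printf '%s\n' 1 5 10 13 55 511 6 2 49 > "$workdir/numbers.txt"

# Default sort compares lines as text, so "10" sorts before "2".
# LC_ALL=C pins the collation so the result does not depend on the locale.
text_sorted=$(LC_ALL=C sort "$workdir/numbers.txt")

# -n compares the numeric value of each line instead.
num_sorted=$(sort -n "$workdir/numbers.txt")

echo "Text order:"
echo "$text_sorted"
echo "Numeric order:"
echo "$num_sorted"
```

In text order the second line is 10, while in numeric order it is 2, which is why -n is needed for this file.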

[ user@tcox1 tmp]$  cat alpha.txt

The sort command doesn’t change the original file.
To keep the sorted values, we capture the output and redirect it to another file.


[ user@tcox1 tmp]$  sort alpha.txt   >  sortedalpha.txt
[ user@tcox1 tmp]$ cat sortedalpha.txt


alpha
beta
epsilon
gamma
theta
zeta

[ user@tcox1 tmp]$
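
To check that sort really leaves the input untouched, something like the following could be used. The unsorted order of alpha.txt is an assumption here, since the original listing does not show it; the scratch directory is also hypothetical.

```shell
# Hypothetical unsorted input; the real order of alpha.txt is not shown above.
workdir=$(mktemp -d)
printf '%s\n' gamma alpha zeta beta theta epsilon > "$workdir/alpha.txt"

before=$(cat "$workdir/alpha.txt")

# Write the sorted lines to a second file.
sort "$workdir/alpha.txt" > "$workdir/sortedalpha.txt"

after=$(cat "$workdir/alpha.txt")

# The original file has the same content before and after the sort.
[ "$before" = "$after" ] && echo "alpha.txt is unchanged"
cat "$workdir/sortedalpha.txt"
```

Redirecting back into the same file (sort alpha.txt > alpha.txt) would empty it before sort reads it; sort -o alpha.txt alpha.txt is the safe way to sort a file in place.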

How do we number the lines in a file?
The nl command numbers the lines for us.

[ user@tcox1 tmp]$ cat /etc/passwd

adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin


[ user@tcox1 tmp]$ nl /etc/passwd



It assigns a line number to each line of the file.
By default, nl does not number blank lines.

If you want blank lines to be numbered as well, use nl -b a.

In this case, we don’t have any blank lines in /etc/passwd, so every line gets a number.
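
Since /etc/passwd has no blank lines, the blank-line behaviour is easier to see on a small test file. The sketch below assumes a made-up file called blank.txt in a scratch directory; -b a is the nl option that numbers all lines.

```shell
# Hypothetical three-line file whose middle line is blank.
workdir=$(mktemp -d)
printf 'first\n\nthird\n' > "$workdir/blank.txt"

# Default: the blank line is printed but not numbered.
default_numbering=$(nl "$workdir/blank.txt")

# -b a: number every line, blank ones included.
all_numbering=$(nl -b a "$workdir/blank.txt")

echo "$default_numbering"
echo "$all_numbering"
```

With the default, "third" is numbered 2; with -b a it is numbered 3.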
Next, we count the lines, words and bytes in a file.

Word count (wc): counts the number of lines, words and bytes.




[ user@tcox1 tmp]$  wc  -l  /etc/passwd
31  /etc/passwd

[ user@tcox1 tmp]$ wc -w /etc/passwd
50  /etc/passwd

[ user@tcox1 tmp]$ wc -c   /etc/passwd
1538  /etc/passwd

[ user@tcox1 tmp]$ ls -al  /etc/passwd


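That is 1538 bytes, or characters; ls -al shows the file size for comparison.
wc is often used in conjunction with another command.
For example, if I want to know the number of lines in the listing of a particular directory, I pipe the output of ls into wc.

This is pipelining: the text stream produced by one command becomes the input of the next.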



[ user@tcox1 tmp]$ ls -al  /var  | wc -l
20

[ user@tcox1 tmp]$ cat /var/log/messages | wc -l
cat: /var/log/messages: Permission denied
0

[ user@tcox1 tmp]$ sudo cat /var/log/messages | wc -l
[sudo]  password for user:
1768
[ user@tcox1 tmp]$ 
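
The three wc counts and the pipeline form can be tried on a small file where the answers are easy to verify by hand. The file name sample.txt and the scratch directory are assumptions made for this sketch.

```shell
# Hypothetical two-line file: "one two" and "three".
workdir=$(mktemp -d)
printf 'one two\nthree\n' > "$workdir/sample.txt"

# Reading from stdin (<) makes wc print only the number, without the file name.
lines=$(wc -l < "$workdir/sample.txt")   # newline characters
words=$(wc -w < "$workdir/sample.txt")   # whitespace-separated words
bytes=$(wc -c < "$workdir/sample.txt")   # bytes

# Pipeline: count the entries that ls prints for the directory.
entries=$(ls "$workdir" | wc -l)

echo "lines=$lines words=$words bytes=$bytes entries=$entries"
```

The file has 2 lines, 3 words and 14 bytes (8 for the first line and 6 for the second, newlines included), and the directory holds a single entry.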

expand is used when you have values that are separated by tabs:

[ user@tcox1 tmp]$  ll

[ user@tcox1 tmp]$  cat tabs.txt
column1 column2 column3
mycolumn1  mycolumn2  mycolumn3
c1  c2  c3
[ user@tcox1 tmp]$ 


It is very difficult to read, so I can convert the tabs into a consistent number of spaces.


Expand command:

[ user@tcox1 tmp]$  expand -t  10 tabs.txt
column1       column2      column3
mycolumn1       mycolumn2      mycolumn3
c1               c2            c3


[ user@tcox1 tmp]$ 

What I am saying is that every tab is replaced with spaces, with a tab stop every 10 columns.
This puts space between the values and helps the columns line up.
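
A small sketch of what expand -t 10 does to real tab characters. The file is rebuilt here with printf so that the tabs are guaranteed to be tabs; the scratch directory is an assumption.

```shell
# Rebuild tabs.txt with real tab characters between the values.
workdir=$(mktemp -d)
printf 'column1\tcolumn2\tcolumn3\n' >  "$workdir/tabs.txt"
printf 'mycolumn1\tmycolumn2\tmycolumn3\n' >> "$workdir/tabs.txt"
printf 'c1\tc2\tc3\n' >> "$workdir/tabs.txt"

# Replace each tab with spaces up to the next tab stop (every 10 columns).
expanded=$(expand -t 10 "$workdir/tabs.txt")

printf '%s\n' "$expanded"
```

Each value now starts at column 1, 11 or 21, so a short value such as c1 gets more padding than a long one such as mycolumn1.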

Cut command:

Next is the cut command.
It allows me to extract certain fields or columns of data from the indicated file,
using a specified delimiter.

Of course, a delimiter is a character that is used to separate the fields in the file.


[ user@tcox1 tmp]$  cat columns.txt
first:last:hero:email
clark:kent:superman:iamsuperman@justiceleague.com
barry:allen:flash:zippy@justiceleague.com

[ user@tcox1 tmp]$ 

I am just cutting out character 5 of each line:
  cut -c 5 columns.txt

[ user@tcox1 tmp]$  cut  -c 5 columns.txt
t
k
e
y

[ user@tcox1 tmp]$  cut  -c  1-5   columns.txt
first
clark
bruce
barry

[ user@tcox1 tmp]$ 

Now I want to cut out fields instead of characters.
With -d we set the delimiter and with -f we pick one or more fields.


[ user@tcox1 tmp]$   cut -d: -f 1 columns.txt
first
clark
bruce
barry

[ user@tcox1 tmp]$ 



[ user@tcox1 tmp]$   cut -d: -f 2 columns.txt
last
kent
wayne
allen

[ user@tcox1 tmp]$   cut -d: -f 1,2 columns.txt
first:last
clark:kent
bruce:wayne
barry:allen

[ user@tcox1 tmp]$   cut -d: -f 1,2,4 columns.txt
first:last:email
clark:kent:iamsuperman@justiceleague.com


[ user@tcox1 tmp]$   cut -d: -f 1,2,4 columns.txt   > newcolumns.txt
[ user@tcox1 tmp]$   cat newcolumns.txt
first:last:email
kaushikgattu:kaushikosr@gmail.com
[ user@tcox1 tmp]$   

The fields separated by the delimiter are cut out of that particular file.
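
Both modes of cut, by character position and by delimited field, can be put side by side in a short sketch. It uses only the three rows shown by cat columns.txt above; the scratch directory and variable names are assumptions.

```shell
# Recreate columns.txt from the rows listed by cat above.
workdir=$(mktemp -d)
cat > "$workdir/columns.txt" <<'EOF'
first:last:hero:email
clark:kent:superman:iamsuperman@justiceleague.com
barry:allen:flash:zippy@justiceleague.com
EOF

# -c selects character positions: here characters 1 through 5 of each line.
chars=$(cut -c 1-5 "$workdir/columns.txt")

# -d sets the delimiter and -f selects fields: here fields 1 and 4.
fields=$(cut -d: -f 1,4 "$workdir/columns.txt")

echo "$chars"
echo "$fields"
```

The selected fields are printed joined by the same delimiter, so the last row comes out as barry:zippy@justiceleague.com.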

Paste command:

Next is the paste command.
It combines files line by line.
It doesn’t remove any data during the combination; it really just concatenates the two files.
We can see that it concatenates the two values side by side.

The next command is join, which is similar to what we see in a database.

[ user@tcox1 tmp]$    paste file1.txt   file2.txt
value1  value1
value2  value2
value3  value3

[ user@tcox1 tmp]$   join file1.txt  file2.txt
value1
value2
value3
value4

[ user@tcox1 tmp]$   
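
The difference between paste and join can be sketched with two small files. The file contents below are assumptions chosen for illustration (the second file ends in value4 instead of value3); they are not necessarily the files used in the session above.

```shell
# Two hypothetical sorted files that share value1 and value2 only.
workdir=$(mktemp -d)
printf '%s\n' value1 value2 value3 > "$workdir/file1.txt"
printf '%s\n' value1 value2 value4 > "$workdir/file2.txt"

# paste glues line N of each file together, separated by a tab.
pasted=$(paste "$workdir/file1.txt" "$workdir/file2.txt")

# join matches lines on the first field and prints only the matches.
# Both inputs must already be sorted on that field.
joined=$(join "$workdir/file1.txt" "$workdir/file2.txt")

echo "$pasted"
echo "$joined"
```

paste keeps all three lines from both files, while join prints only value1 and value2 because value3 and value4 have no partner in the other file.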

Here, I get only the values that match between the two of them.

Unique command:

Here, I changed a value to 4.
join behaves like a database-style join when combining files.
Now we move on to the uniq command.
It allows me to get the unique lines of information from the data.
It only compares adjacent lines, so duplicates must sit next to each other to be collapsed.



[ user@tcox1 tmp]$  cat unique.txt
This is a line
This is a line
This is a  different line
This is a  different line
This is a  same line
This is a  same line
different line  1
different line  2


[ user@tcox1 tmp]$  uniq  unique.txt
This is a line
This is a  different line
This is a  same line
different line  1
different line  2


If you want to see only the lines that are duplicated, we can use -d (one copy of each) or -D (every copy):

[ user@tcox1 tmp]$  uniq   -d unique.txt
This is a line
This is a  different line
This is a  same line


[ user@tcox1 tmp]$  uniq   -D unique.txt
This is a line
This is a line
This is a  different line
This is a  different line
This is a  same line
This is a  same line


[ user@tcox1 tmp]$  uniq  unique.txt
This is a line
This is a  different line
This is a  same line
different line  1
different line  2
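
In unique.txt the duplicates happen to be next to each other. When they are not, uniq on its own misses them, so it is usually combined with sort. A minimal sketch, assuming a made-up file fruits.txt:

```shell
# Hypothetical file where "apple" appears before and after "banana".
workdir=$(mktemp -d)
printf '%s\n' apple banana apple apple > "$workdir/fruits.txt"

# uniq alone only collapses adjacent repeats: apple, banana, apple.
adjacent_only=$(uniq "$workdir/fruits.txt")

# Sorting first brings all repeats together: apple, banana.
fully_unique=$(sort "$workdir/fruits.txt" | uniq)

# -c prefixes each line with the number of times it occurred.
counts=$(sort "$workdir/fruits.txt" | uniq -c)

echo "$adjacent_only"
echo "$fully_unique"
echo "$counts"
```

uniq alone leaves three lines, while sort | uniq leaves two, and uniq -c reports 3 for apple and 1 for banana.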

Next command is head.
By default, head gives us the first ten lines of a file.


[ user@tcox1 tmp]$   sudo head /var/log/messages
xxxxxxx  rsyslogd ............................................................


This is mainly used for log files. With -n we choose how many lines to show:


[ user@tcox1 tmp]$ sudo head -n 15 /var/log/messages
xxxxxxxxxxx
rsyslogd  was HUPed


Related to that, we have another command called tail.
It shows us the bottom part of the log file.
So now I get the last ten lines:


[ user@tcox1 tmp]$ sudo tail  /var/log/messages

xxxxxxxxx    Intializing Xen virtual ethernet drivers


A cool feature of tail is that we can follow a file as it grows with -f:


[ user@tcox1 tmp]$  tail -f /var/log/yum.log
xxxxxxxxxxx    Erased: telnet
xxxxxxxxxxx     Installed: 1:telnet-0.17


tail keeps running and prints new lines as they are appended to the file.
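
head and tail can be tried without sudo on a generated file. The sketch below assumes a made-up 100-line file called app.log; tail -f is left out because it runs until it is interrupted with Ctrl-C.

```shell
# Generate a hypothetical 100-line log file: "line 1" .. "line 100".
workdir=$(mktemp -d)
i=1
while [ "$i" -le 100 ]; do
  echo "line $i"
  i=$((i + 1))
done > "$workdir/app.log"

# Default: the first ten lines.
first_ten=$(head "$workdir/app.log")

# -n picks the number of lines: here the last three.
last_three=$(tail -n 3 "$workdir/app.log")

# Combining both in a pipeline extracts a range: lines 11 to 15.
middle=$(head -n 15 "$workdir/app.log" | tail -n 5)

echo "$first_ten"
echo "$last_three"
echo "$middle"
```

Piping head into tail is a simple way to look at a slice from the middle of a long log.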
