Friday, August 18, 2017

Process text streams using filters (split, cat, od, pr, fmt, sed, more and less)

cat displays the contents of a file.
tac prints the file with its lines in reverse order.

[ user@tcox1 tmp]$ ll
total 44
xxxxxxxxxxxxxxxxxxxxxxxxxxxxx   tabs.txt


[ user@tcox1 tmp]$ cat tabs.txt
column1       column2 column3
mycolumn1    mycolumn2    mycolumn3
c1    c2   c3
[ user@tcox1 tmp]$ tac tabs.txt
c1    c2   c3
mycolumn1    mycolumn2    mycolumn3
column1       column2   column3
[ user@tcox1 tmp]$


The split command:

split takes a file and breaks it into multiple smaller files,
each of which contains part of the data.

[ user@tcox1 tmp]$  ll
xxxxxxxxxxxxxxxxxxxx            users.txt

[ user@tcox1 tmp]$ split users.txt
[ user@tcox1 tmp]$  ll
total 48

xxxxxxxxxxxxxxxxxxxxxxx   users.txt
xxxxxxxxxxxxxxxxxxxxxxx   xaa


[ user@tcox1 tmp]$  

The name of every file that split creates begins with x.
By default the x is followed by a two-letter suffix.


xxxxxxxxxxxxxxxxxxxxxxxxx   xaa


The first file created is named xaa. The -a option changes the length of the suffix:

[ user@tcox1 tmp]$  split -a 4  users.txt
[ user@tcox1 tmp]$   ll
total 52

xxxxxxxxxxxxxxxxxxxxxxx   users.txt
xxxxxxxxxxxxxxxxxxxxxxx   xaa
xxxxxxxxxxxxxxxxxxxxxxx   xaaaa

Example2:


[ user@tcox1 tmp]$  split -b 512 -a 3 users.txt
[ user@tcox1 tmp]$  ll

xxxxxxxxxxxxxxxxxxxxxxx   users.txt
xxxxxxxxxxxxxxxxxxxxxxx   xaaa
xxxxxxxxxxxxxxxxxxxxxxx   xaab
xxxxxxxxxxxxxxxxxxxxxxx   xaac
xxxxxxxxxxxxxxxxxxxxxxx   xaad

You can also break the file up by a number of lines, using the -l option.
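As a small sketch of splitting by line count: the file name lines.txt, the prefix part_ and the sizes below are made up for illustration and are not from the session above. The -l option sets how many lines go into each piece, and cat can join the pieces back together.

```shell
# Work in a scratch directory so the pieces do not mix with other files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a 10-line sample file (a hypothetical stand-in for users.txt).
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > lines.txt

# Split into pieces of 4 lines each, named part_aa, part_ab, part_ac.
split -l 4 lines.txt part_

# Count the lines in the last piece: 10 = 4 + 4 + 2.
last_count=$(wc -l < part_ac | tr -d ' ')
echo "last piece has $last_count lines"

# Joining the pieces in name order reproduces the original file.
cat part_* > rejoined.txt
cmp lines.txt rejoined.txt && echo "pieces rejoin to the original"
```

Giving split a prefix such as part_ replaces the default x at the start of each name.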


The next utility, od, allows us to safely view binary data on the terminal.
To get a binary file to look at, yumdownloader simply lets us download an rpm file,
here the one used to install telnet on our system.


[ user@tcox1 tmp]$   sudo yumdownloader telnet
xxxxxx
xxxxxx
xxxxxx
xxxxx
telnet-0.17-48.el6.x86_64.rpm

[ user@tcox1 tmp]$   ll
total 120

xxxxxxxxxxxxxxxxxx    telnet-0.17-48.el6.x86_64.rpm


When we cat the telnet rpm,

we see a lot of binary data.

aaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaa



The od command lets us view such a file without messing up the terminal.


[ user@tcox1 tmp]$  od   telnet-0.17-48.el6.x86_64.rpm


0163020  062031  053750  054151 024141 020240 144117


od displays the contents in octal format. This is its default for displaying binary files.


However, we have a couple of other options: -d displays the data in decimal format,
-f displays it in floating-point format, and -x displays it in hexadecimal format.
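A minimal sketch of these output formats on a tiny file with known contents. The four-byte sample is invented for illustration. The -An and -c options are not covered above: -An hides the offset column and -c prints one character per byte.

```shell
# Create a tiny scratch file holding four known bytes.
sample=$(mktemp)
printf 'ABCD' > "$sample"

od "$sample"       # default: octal, two bytes per unit
od -d "$sample"    # unsigned decimal
od -x "$sample"    # hexadecimal

# One character per byte, without the offset column.
chars=$(od -An -c "$sample")
echo "$chars"
```

The grouping of bytes in the default, -d and -x output depends on the machine's byte order, so the numbers shown can differ between systems.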
The next utility is pr.
It is meant for text such as source code and unstructured documents.


We won't use pr on a binary file; that's not really going to help you.
It is mainly designed for text documents.




[ user@tcox1 tmp]$ cat nowisthetime.txt

acddfgigrgnrigrgergemgkermgkremgkemebhjfhjbfjefbjfgbgjebg
fhjbfjbjbdfjbdfjbdfjbjdbjbufwijhfiwhfwehfhfuiweufhweuhfuhfuhhef
efegheruheriwoiefuhfuewefnwuhguhjdnfwjhfuqhiqwhhqwueqiliufhqu
huwehfuwhfuwhguwhefuhufhiwehfuhuhuehfiweghurhwifhuihguhguh
fweuhguefhguiwefkjhwuegfkjafhuuthjwefuheuhqbfeuguhwojrwjhehtueh
fygfehwuhhuefhweugheuhguweuhqwuhueuhgheguhwUAKGUKERGUHUHGU



I want to format this for printing.


[ user@tcox1 tmp]$ pr nowisthetime.txt > printed
[ user@tcox1 tmp]$ cat printed





[ user@tcox1 tmp]$ pr --columns=2 nowisthetime.txt > printed


acddfgigrgnrigrgergemgkermgkremgkemebhjfhjbfjefbjfgbgjebg
fhjbfjbjbdfjbdfjbdfjbjdbjbufwijhfiwhfwehfhfuiweufhweuhfuhfuhhef
efegheruheriwoiefuhfuewefnwuhguhjdnfwjhfuqhiqwhhqwueqiliufhqu  

huwehfuwhfuwhguwhefuhufhiwehfuhuhuehfiweghurhwifhuihguhguh
fweuhguefhguiwefkjhwuegfkjafhuuthjwefuheuhqbfeuguhwojrwjhehtueh
fygfehwuhhuefhweugheuhguweuhqwuhueuhgheguhwUAKGUKERGUHUHGU

pr is very commonly used together with the format command, fmt,
which is also used to format text for printing.
Here I want each line to be no more than 10 characters wide.



[ user@tcox1 tmp]$ clear

[ user@tcox1 tmp]$ fmt   -10 nowisthetime.txt

xxxxx
vvvvvv
eeeeee
tttttttt


Now we are going to get some more interesting output.



[ user@tcox1 tmp]$ fmt   -15 nowisthetime.txt    | pr --columns=2

Here the output stream of fmt is piped in as the input to pr.
Because pr is reading a stream, there is no filename to take the header information from,
so we supply a header with -h:


[ user@tcox1 tmp]$ fmt   -15 nowisthetime.txt    | pr --columns=2  -h "nowisthetime  text file"

date           nowisthetime  text file   Page 1


xxxxxxx
cccccccc
vvvvvvvv
ffffffffffff


The next command is tr, or translate.
It helps us change one or more characters that match a simple pattern or range in a file
or stream. Keep in mind that tr is not intended to substitute words or phrases, and it is not
capable of doing regex.
I can substitute simple letters.
For example, to use tr to replace the lowercase 'a' with the uppercase 'A':



[ user@tcox1 tmp]$  cat alpha.txt
beta
alpha
gamma
zeta
theta
epsilon

[ user@tcox1 tmp]$ tr 'a' 'A' alpha.txt
tr:  extra operand  'alpha.txt'
Try 'tr  --help' for more information.
[ user@tcox1 tmp]$


tr does not take a filename operand the way most utilities do; it only reads standard input.
So we redirect the file itself into the utility:




[ user@tcox1 tmp]$  tr  'a'  'A'  < alpha.txt

[ user@tcox1 tmp]$

I can also replace a range of letters.
Note: we cannot replace entire words; that is not the purpose of this utility.


[ user@tcox1 tmp]$ tr 'a-e' 'A-E'  <  alpha.txt
BEtA
AlphA
gAmmA
zEtA
thEtA
Epsilon
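Since tr only reads standard input, it works just as well at the end of a pipe. The sketch below feeds it the same six words as alpha.txt through printf instead of a file. The variable names are illustrative only.

```shell
# The same six words as alpha.txt, held in a variable instead of a file.
words=$(printf '%s\n' beta alpha gamma zeta theta epsilon)

# Single character: every lowercase a becomes A.
single=$(printf '%s\n' "$words" | tr 'a' 'A')

# Range: a-e maps to A-E, position by position.
ranged=$(printf '%s\n' "$words" | tr 'a-e' 'A-E')

printf '%s\n' "$single" "$ranged"
```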

Next we are going to deal with a sed example.
sed is the stream editor; here we use it as the substitute utility.
In this video we cover sed only as an introduction to processing text in a stream.
We will cover sed in terms of regular expressions later,
when we cover regular expressions with egrep and fgrep.
For example, I want to substitute a value.
We have to look at a few of the options for sed before we go ahead.
By default, you do not have to enclose the value in quotes, although the examples here do.

[ user@tcox1 tmp]$  cat sedexample.txt
xxxxxxyyyyyyyyyuuuuuuuuuuuuiiiiiiiiiioooooooo


I want to substitute THE for the, and there are two ways I can do this.
With sed 's/the/THE/' the expression ends after the last slash, and only the first lowercase
'the' on each line is replaced with the uppercase version.
Or
with sed 's/the/THE/g', ending with the 'g' makes sed look globally through the entire
file and replace every occurrence.
Either way the result goes to the output; sed is a text stream processing utility.
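The difference between the two forms is easiest to see on a line that contains the word twice. This is a small sketch with a made-up sample line, not the contents of sedexample.txt.

```shell
# A made-up sample line containing "the" twice.
line='now is the time for the test'

# Without g: only the first match on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/the/THE/')

# With g: every match on the line is replaced.
all=$(printf '%s\n' "$line" | sed 's/the/THE/g')

printf '%s\n' "$first_only" "$all"
```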


If I want to replace 'the' with a different word, all I have to do is indicate that word.




[ user@tcox1 tmp]$  sed  's/the/NOW/g'  sedexample.txt
xxxxxxNOWvvvvvvvvvvv

We can also build more complex commands by chaining expressions with -e:
sed -e 's/the/NOW/g' -e 's/NOW/NEVER/g' sedexample.txt
The first expression substitutes (s) any occurrence of 'the' with NOW, globally (g) in the file.
The second substitutes any occurrence of NOW with NEVER.
(-e gives us the ability to keep adding expressions.)



[ user@tcox1 tmp]$  sed  -e 's/the/NOW/g' -e 's/NOW/NEVER/g' sedexample.txt
xxxxxxxNEVERtime

Now I put what I want to do in a list, one command per line:

s/the/THE/g
s/THE/NOW/g
s/NOW/NEVER/g


 [user@tcox1 tmp]$  vim sedopts.txt

With -f we are telling sed to read this list and apply the commands to the file.
It reads all three in order and gives me the final output.


[user@tcox1 tmp]$ sed -f sedopts.txt sedexample.txt
xxxxxNEVERxxxxxxxxxNEVERxxxxxxxNEVER
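A self-contained sketch of the same idea, using scratch files so that it can be run anywhere. The temporary file names and the sample sentence are invented for illustration.

```shell
# Scratch files for the command list and the input text.
script=$(mktemp)
input=$(mktemp)

# One sed command per line, as in sedopts.txt above.
printf '%s\n' 's/the/THE/g' 's/THE/NOW/g' 's/NOW/NEVER/g' > "$script"
printf '%s\n' 'the time is the present' > "$input"

# sed runs the commands in order on each line: the -> THE -> NOW -> NEVER.
result=$(sed -f "$script" "$input")
echo "$result"

# The input file itself is unchanged; sed only writes to standard output.
cat "$input"
```

Because each command works on the output of the one before it, the order of the lines in the list matters.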


These are simple uses of sed from a text processing perspective, and they are all the items
that we need to know for now.
Now we go from the complex utilities to simpler ones.
more allows us to page through a file, such as a log file.


[user@tcox1 tmp]$ sudo cp   /var/log/messages  messages.txt


To exit the more utility, press q (quit).
I can pass a couple of options to the more command:


[user@tcox1 tmp]$  more -d  messages.txt

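[user@tcox1 tmp]$  more -p  messages.txt

-d shows a prompt explaining how to continue or quit, and -p clears the screen before
displaying each page instead of scrolling.
more offers at best limited support for going backward (none when reading from a pipe).
When we use the less command, we can move both forward and backward through the .txt file.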




[user@tcox1 tmp]$  less -d  messages.txt


b moves backward.
d moves forward.
These are the text processing utilities that work as filters.





















