Saturday, January 28, 2017

Troubleshooting of Linux Issues Part - 4

Screen command

watch -n 1 "ps aux |grep httpd |wc -l"
ctrl+a then d to detach from the session
screen -ls will show the session ids
screen -x <session id> to attach to a session
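
A typical end-to-end workflow, as a sketch (the session name watch-httpd is just an example):

screen -S watch-httpd                       # start a named session
watch -n 1 "ps aux | grep httpd | wc -l"    # run the monitor inside it
# press ctrl+a then d to detach
screen -ls                                  # list sessions and their ids
screen -x watch-httpd                       # reattach (the id works too)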
----------------------------------------------------------------

The Apache log file has the IP address in column one, so use awk or cut to get column one, then sort the results and pipe them through uniq. The commands below generate a list of IP addresses along with the number of times each was encountered, sorted with the greatest count at the bottom.

Find the IP addresses in an Apache access log:


cat access_81-2017-01-28.log | awk '{print $1}' | sort | uniq -c | sort -n
cat access_81-2017-01-28.log | awk '{print $1}' | sort -n | uniq -c | sort -nr | head -20
awk '{print $1}' access_81-2017-01-28.log | sort | uniq -c | sort -nr | head -n 10
cat access.log | cut -d ' ' -f 1 | sort | uniq -c
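The same report can also be produced in a single awk pass (a sketch, assuming the IP is the first field):

awk '{count[$1]++} END {for (ip in count) print count[ip], ip}' access.log | sort -rn | head -20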

View Apache requests per hour (example.com here is the name of the log file)
grep "23/Jan" example.com | cut -d[ -f2 | cut -d] -f1 | awk -F: '{print $2":00"}' | sort -n | uniq -c

View Apache requests per minute
grep "28/Jan/2017:06" example.com | cut -d[ -f2 | cut -d] -f1 | awk -F: '{print $2":"$3}' | sort -nk1 -nk2 | uniq -c | awk '{ if ($1 > 10) print $0}'
In the commands above:

awk – extracts the first field (the client IP address) from each line of the log file.
sort – sorts the lines; the -n option compares lines based on the numerical value of strings and the -r option reverses the order.
uniq – reports repeated lines; the -c option prefixes each line with its number of occurrences.
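
For reference, the per-hour report can also be produced in a single awk pass (a sketch, assuming the standard [dd/Mon/yyyy:HH:MM:SS zone] timestamp in field 4):

awk '{ hour = substr($4, 2, 14); count[hour]++ }   # e.g. "18/Sep/2004:11"
     END { for (h in count) print count[h], h }
' access_81-2017-01-28.log | sort -k2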

--------------------------------------------------------------------

The following code will extract all IPs that have made more than 2 unsuccessful attempts in one minute. You first need to build an awk array indexed on [date time ip], e.g.: Apr 28 20:18 123.123.123.123

Code:
awk -F'[ :]' '{_[$1 $2 $3 $4 $13]++} _[$1 $2 $3 $4 $13]>2 {print $13}' access.log
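
The same one-liner, expanded for readability (a sketch; whether the IP really sits in field 13 depends on the exact log format):

awk -F'[ :]' '
    { key = $1 " " $2 " " $3 ":" $4 " " $13; seen[key]++ }   # key = month day hour:minute ip
    seen[key] > 2 { print $13 }                              # print once the count passes 2
' access.log | sort -u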

------------------------------------------------------------------
1. Combined log format

Each entry in an Apache combined-format access log records:
  • the source IP address
  • the client's identity
  • the remote user name (if using HTTP authentication)
  • the date, time, and time zone of the request
  • the actual content of the request
  • the server's response code to the request
  • the size of the data block returned to the client, in bytes

In terms of Apache log format directives, each combined-format entry has the following layout:

%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"

where:

%h = IP address of the client (remote host) which made the request
%l = RFC 1413 identity of the client
%u = userid of the person requesting the document
%t = Time that the server finished processing the request
%r = Request line from the client in double quotes
%>s = Status code that the server sends back to the client
%b = Size of the object returned to the client

The final two items, Referer and User-agent, give details on where the request originated and what type of agent made the request.

Sample log entries:


66.249.64.13 - - [18/Sep/2004:11:07:48 +1000] "GET /robots.txt HTTP/1.0" 200 468 "-" "Googlebot/2.1"
66.249.64.13 - - [18/Sep/2004:11:07:48 +1000] "GET / HTTP/1.0" 200 6433 "-" "Googlebot/2.1"


2. Using awk

The principal use of awk is to break up each line of a file into 'fields' or 'columns' using a pre-defined separator. Because each line of the log file is based on the standard format we can do many things quite easily.

Using the default separator which is any white-space (spaces or tabs) we get the following:

awk '{print $1}' combined_log            # ip address (%h)
awk '{print $2}' combined_log            # RFC 1413 identity (%l)
awk '{print $3}' combined_log            # userid (%u)
awk '{print $4,$5}' combined_log         # date/time (%t)
awk '{print $9}' combined_log            # status code (%>s)
awk '{print $10}' combined_log           # size (%b)

You might notice that we've missed out some items. To get to them we need to set the delimiter to the " character which changes the way the lines are 'exploded' and allows the following:

awk -F\" '{print $2}' combined_log # request line (%r) 
awk -F\" '{print $4}' combined_log # referer 
awk -F\" '{print $6}' combined_log # user agent



3. Examples

You want to list all user agents ordered by the number of times they appear (descending order):

awk -F\" '{print $6}' combined_log | sort | uniq -c | sort -fr

All we're doing here is extracting the user agent field from the log file and 'piping' it through some other commands. The first sort enables uniq to properly identify and count unique user agents. The final sort orders the result by number and name (both descending).
The result will look similar to a user agents report generated by a log-analysis package. The difference is that you can generate this ANY time from ANY log file or files.
If you're not particularly interested in which operating system the visitor is using, or what browser extensions they have, then you can use something like the following:

awk -F\" '{print $6}' combined_log \
       | sed 's/(\([^;]\+; [^;]\+\)[^)]*)/(\1)/' \
       | sort | uniq -c | sort -fr

Note: The \ at the end of a line simply indicates that the command will continue on the next line.

This will strip out the third and subsequent values in the 'bracketed' component of the user agent string. For example: 

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR)

becomes: 

Mozilla/4.0 (compatible; MSIE 6.0)

The next step is to start filtering the output so you can narrow down on a certain page or referer. Would you like to know which pages Google has been requesting from your site?

awk -F\" '($6 ~ /Googlebot/){print $2}' combined_log | awk '{print $2}'

Or who's been looking at your guestbook?

awk -F\" '($2 ~ /guestbook\.html/){print $6}' combined_log


4. Using log files to identify problems with your site

The steps outlined below will let you identify problems with your site by identifying the different server responses and the requests that caused them:

awk '{print $9}' combined_log | sort | uniq -c | sort

The output shows how many of each type of request your site is getting. A 'normal' request results in a 200 code which means a page or file has been requested and delivered but there are many other possibilities.

The most common responses are:

200 - OK 
206 - Partial Content 
301 - Moved Permanently 
302 - Found 
304 - Not Modified 
401 - Unauthorised (password required) 
403 - Forbidden 
404 - Not Found

200 - OK
indicates a successful request resulting in a file being returned.

206 - Partial Content
indicates that a file was only partially downloaded. The download could have been interrupted by someone leaving the page before it fully loaded (in the case of embedded images) or cancelling a download (in the case of PDF, MP3 and similar file types).

301 - Moved Permanently
the server has indicated that the requested file is now located at a new address. Search engines should update their index by removing the old address and replacing it (PR intact) with the new one.

302 - Found
the user has been redirected, but as it's not a permanent redirect no further action needs to be taken. This could be as simple as the server adding a / to the end of the request, or the result of a header command in PHP.

304 - Not Modified
an intelligent user agent (browser) has made a request for a file which is already present in its cache. A 304 indicates that the cached version has the same timestamp as the 'live' version of the file, so it doesn't need to be downloaded again. If the 'live' file was newer, the response would instead be a 200.

400 - Bad Request
the server couldn't make sense of the request.

401 - Unauthorised (password required)
an attempt has been made to access a directory or file that requires authentication (username and password). Subsequent requests would normally contain a username and password, resulting in either a 200 (user has been authenticated) or 401 (authentication failed).

403 - Forbidden
the server has blocked access to a directory or file. This typically applies to requests that would otherwise result in a directory listing being displayed.

404 - Not Found
the requested file does not exist on the server. This normally indicates a broken link (internal or external).

408 - Request Timeout
the client/server connection process was so slow that the server decided to 'hang up'.

410 - Gone
the server has indicated that the requested file used to exist but has now been permanently removed. Search engines should remove the address from their index.

414 - Request-URI Too Long
the request was too long. This normally indicates an attempt to compromise the server using a buffer overflow exploit.


Note: For more on Status Codes you can read the article HTTP Server Status Codes.

A 301 or 302 code means that the request has been re-directed. What you'd like to see, if you're concerned about bandwidth usage, is a lot of 304 responses - meaning that the file didn't have to be delivered because they already had a cached version.

A 404 code may indicate that you have a problem - a broken internal link or someone linking to a page that no longer exists. You might need to fix the link, contact the site with the broken link, or set up a PURL so that the link can work again.

The next step is to identify which pages/files are generating the different codes. The following command will summarise the 404 ("Not Found") requests:


# list all 404 requests
awk '($9 ~ /404/)' combined_log

# summarise 404 requests
awk '($9 ~ /404/)' combined_log | awk '{print $9,$7}' | sort | uniq

Or, you can use an inverted regular expression to summarise the requests that didn't return 200 ("OK"):

awk '($9 !~ /200/)' combined_log | awk '{print $9,$7}' | sort | uniq

Or, you can include (or exclude in this case) a range of responses, in this case requests that returned 200 ("OK") or 304 ("Not Modified"):

awk '($9 !~ /200|304/)' combined_log | awk '{print $9,$7}' | sort | uniq

Suppose you've identified a link that's generating a lot of 404 errors. Let's see where the requests are coming from:

awk -F\" '($2 ~ "^GET /path/to/brokenlink\.html"){print $4,$6}' combined_log

Now you can see not just the referer, but the user-agent making the request. You should be able to identify whether there is a broken link within your site, on an external site, or if a search engine or similar agent has an invalid address.

If you can't fix the link, you should look at using Apache mod_rewrite or a similar scheme to redirect (301) the requests to the most appropriate page on your site. By using a 301 instead of a normal (302) redirect you are indicating to search engines and other intelligent agents that they need to update their link as the content has 'Moved Permanently'.
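
For example, a minimal mod_rewrite rule for such a redirect (a sketch; the file names are hypothetical):

# in the vhost config or .htaccess, with mod_rewrite enabled
RewriteEngine On
RewriteRule ^/?old-page\.html$ /new-page.html [R=301,L]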


5. Who's 'hotlinking' my images?

Something that really annoys some people is having their bandwidth eaten by other websites linking directly to their images.

Here's how you can see who's doing this to your site. Just change www.example.net to your domain, and combined_log to your combined log file.

awk -F\" '($2 ~ /\.(jpg|gif)/ && $4 !~ /^http:\/\/www\.example\.net/){print $4}' combined_log \ | sort | uniq -c | sort

Translation:
  • explode each row using the " character;
  • the request line (%r) must contain ".jpg" or ".gif";
  • the referer must not start with your website address (www.example.net in this example);
  • display the referer and summarise.

You can block hot-linking using mod_rewrite but that can also result in blocking various search engine result pages, caches and online translation software. To see if this is happening, we look for 403 ("Forbidden") errors in the image requests:


# list image requests that returned 403 Forbidden
awk '($9 ~ /403/)' combined_log \
  | awk -F\" '($2 ~ /\.(jpg|gif)/){print $4}' \
  | sort | uniq -c | sort

Translation:

  • the status code (%>s) is 403 Forbidden;
  • the request line (%r) contains ".jpg" or ".gif";
  • display the referer and summarise.

You might notice that the above command is simply a combination of the previous one and one presented earlier. It is necessary to call awk more than once because the 'referer' field is only available after the separator is set to \", whereas the 'status code' is available directly.
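
As mentioned above, hot-linking itself can be blocked with mod_rewrite; a typical ruleset looks like this (a sketch, reusing the example domain):

RewriteEngine On
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?example\.net/ [NC]
RewriteRule \.(jpg|gif)$ - [F]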


6. Blank User Agents

A 'blank' user agent is typically an indication that the request is from an automated script or someone who really values their privacy. The following command will give you a list of ip addresses for those user agents so you can decide if any need to be blocked:

awk -F\" '($6 ~ /^-?$/)' combined_log | awk '{print $1}' | sort | uniq

A further pipe through logresolve will give you the hostnames of those addresses.
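
For example (logresolve ships with Apache and resolves the leading IP on each line it reads):

awk -F\" '($6 ~ /^-?$/)' combined_log | awk '{print $1}' | sort | uniq | logresolve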
----------------------------------------------------------------------

service httpd status: 
apache httpd dead but subsys locked

pkill -9 apache
pkill -9 httpd

$ ipcs -s | grep apache
0x00000000 98306 apache 600 1
0x00000000 131075 apache 600 1
0x00000000 163844 apache 600 1

So I delete these resources:
$ ipcs -s | grep apache | perl -e 'while (<STDIN>) { @a=split(/\s+/);  print `ipcrm sem $a[1]`}'

resource(s) deleted
resource(s) deleted
resource(s) deleted
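
For reference, the same semaphore cleanup works without perl, using the newer ipcrm -s syntax (a sketch):

ipcs -s | awk '/apache/ {print $2}' | xargs -r -n1 ipcrm -s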

and I delete the lock from the subsys folder:

$ rm -f /var/lock/subsys/httpd

and when I check the status, I get:

$ service httpd status
httpd is stopped

So I try to start the service:

$ /etc/init.d/httpd start
Starting httpd: [ OK ]

Tuesday, January 24, 2017

Linux Interview Questions - Part 6

1. The dmesg command
kernel log messages

2. The command "mknod myfifo b 4 16"
will create a block device (major 4, minor 16) if the user is root

3. Which command is used to set terminal I/O characteristics?
stty

4. Which command is used to record a user login session in a file?
script
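
For example:

script mysession.log    # start recording (mysession.log is just an example name)
# ... run commands ...
exit                    # stop recording; the transcript is saved to mysession.log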

5. Which command is used to display the operating system name?
uname

6. Which command is used to display the Unix kernel version?
uname -r

7. Which command is used to print a file?
lpr

8. Which command is used to find the resource limits of the session?
ulimit

9. Which option of the ls command is used to view a file's inode number?
-i (similarly, df -i reports filesystem inode usage)

10. find / -name '*' will
List all files and directories recursively starting from /

11. Solaris is the name of a flavor of UNIX from
Sun Microsystems

12. Which of the following is NOT a UNIX variant?
AS/400

13. The system calls in UNIX are written in which language?
C

14. Which of the following enables multi-tasking in UNIX?
Time Sharing

15. Which of the following is considered the super daemon in Unix?
init

16. Unix is which kind of Operating System?
a) Multi User
b) Multi Process
c) Multi Tasking
(all of the above)

17. The lp0 device file is used to access
Printer

18. Syntax of any Unix command is
command [options] [arguments]

19. Which of these is not a Unix flavor?
a) MAC  b) BSD  c) AIX  d) IRIX
Answer: a) MAC

20. How do you make Vim display line numbers permanently?

Add the following line to /etc/vimrc or ~/.vimrc using your favorite text editor:
set number

Vim will read the configuration file every time it's started, and will display the line numbers.

21. Which among the following is used to write small programs to control Unix functionalities?
C Language

22. What control character signals the end of the input file?
ctrl + d

23. How do you get help about the command "cp"?
man cp

24. To increase the response time and throughput, the kernel minimizes the frequency of disk access by keeping a pool of internal data buffer called
Buffer cache

25. At the start of process execution, STDOUT & STDERR
point to the current terminal device

26. wtmp and utmp files contain:
the user login-logout log

27. Which is the core of the operating system?
Kernel

28. Which among the following interacts directly with system hardware?
Kernel

29. Applications communicate with the kernel by using:
System Calls

30. Which command is used to display the octal value of a text file?
od

31. Which command is used to view compressed text file contents?
zcat

32. Which command changes a file's group owner?
chgrp

33. Which command is used to extract an intermediate result in a pipeline?
tee
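
For example, capturing the middle of a pipeline while it keeps flowing (a sketch; processes.txt is an example name):

ps aux | tee processes.txt | grep httpd    # processes.txt receives the full listing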

34. Which command is used to extract a column from a text file?
cut

35. Which command is used to display the disk consumption of a specific directory?
du

36. Which command creates an empty file if the file does not exist?
touch

37. Which command is used to perform a backup in Unix?
cpio
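
Typical usage (a sketch; the paths are examples):

find /home -depth | cpio -ov > /tmp/backup.cpio    # create the archive
cpio -idv < /tmp/backup.cpio                       # restore from it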

38. Which option of the rm command is used to remove a directory with all its subdirectories?
-r

39. Which command is used to identify a file's type?
file

40. The command used to determine the path of an executable file is
which

41. The command used to count the number of characters in a file is
wc

42. Which of these commands could you use to show one page of output at a time?
less

43. Which command will give you information about how much disk space each file in the current directory uses?
du

44. Which command is used to display all the files, including hidden files, in your current directory and its subdirectories?
ls -aR

45. Which of the following commands can be used to copy files across systems?
ftp

46. The pwd command displays:
the present working directory

47. Which of the following commands can be used to change the default permissions for files and directories at the time of creation?
umask

48. Which tar command option is used to list the files in a tape archive?
tvf

49. Which of the following commands will allow the user to search the contents of a file for a particular pattern?
grep


Monday, January 23, 2017

Google Cloud SDK, Cloud Storage & Python 2.7 installation

Install the build dependencies:

yum install -y zlib-devel openssl-devel sqlite-devel bzip2-devel gcc make

Then download the source using the following command:

wget https://www.python.org/ftp/python/2.7.12/Python-2.7.12.tgz



Extract and go to the directory:

tar -xvf Python-2.7.12.tgz 
cd Python-2.7.12


Now configure, build and install:

./configure
make 
make install

Check the Python locations:

whereis python


python: /usr/bin/python /usr/bin/python2.6 /usr/lib/python2.6 /usr/local/bin/python2.7 /usr/local/bin/python /usr/local/bin/python2.7-config /usr/local/lib/python2.7 /usr/include/python2.6 /usr/share/man/man1/python.1.gz


Rename the old Python binaries out of the way:

mv /usr/bin/python /usr/bin/python_old


mv /usr/bin/python2.6 /usr/bin/python2.6_old


Create a symlink:

ln -s /usr/local/bin/python2.7 /usr/bin/python

python --version
Python 2.7.12


After the above steps yum will fail, since it depends on Python 2.6; follow the steps below to resolve the issue.

Symlink the old version (python2.6 was renamed to python2.6_old above):
ln -sf /usr/bin/python2.6_old /usr/bin/python_old

Edit the yum script and point its shebang at the old Python symlink:

vim /usr/bin/yum
#!/usr/bin/python_old

Update the locate database:
updatedb


Google Cloud SDK Installation 

curl sdk.cloud.google.com | bash


Restart your shell:
exec -l $SHELL
source /root/.bashrc

Run the gcloud setup:

gcloud init

gsutil ls gs://svn-logs
gsutil cp gs://svn-logs/<object> <local path>
gsutil mv gs://svn-logs/<object> gs://svn-logs/<new name>
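
A few other common gsutil operations against the same bucket (a sketch; the local paths are examples):

gsutil cp /var/log/httpd/access_log gs://svn-logs/    # upload a file
gsutil rsync -r /var/log/svn gs://svn-logs/svn        # mirror a local directory
gsutil rm gs://svn-logs/old.log                       # delete an object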