Title: select Subroutine
1select Subroutine
- Purpose
- Checks the I/O status of multiple file
descriptors and message queues.
- Library
- Standard C Library (libc.a)
2Select Subroutine (cont.)
- Syntax
- #include <sys/time.h>
- #include <sys/select.h>
- #include <sys/types.h>
- int select (Nfdsmsgs, ReadList, WriteList, ExceptList, TimeOut)
- int Nfdsmsgs;
- struct sellist *ReadList, *WriteList, *ExceptList;
- struct timeval *TimeOut;
3Nfdsmsgs
- Specifies the number of file descriptors and the
number of message queues to check.
- The low-order 16 bits give the length of a bit
mask that specifies which file descriptors to
check.
4Nfdsmsgs (cont.)
- The high-order 16 bits give the size of an array
that contains message queue identifiers.
- If either half of the Nfdsmsgs parameter is equal
to a value of 0, the corresponding bit mask or
array is assumed not to be present.
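The split of Nfdsmsgs into two 16-bit halves can be sketched as a small helper. The name pack_nfdsmsgs is hypothetical and not part of any library; the sketch assumes both counts fit in 16 bits.

```c
/* Hypothetical helper: build the Nfdsmsgs argument.
 * Low-order 16 bits:  number of bits in the file descriptor mask.
 * High-order 16 bits: number of elements in the message queue array. */
int pack_nfdsmsgs(unsigned int nfd_bits, unsigned int nmsgs)
{
    return (int)(((nmsgs & 0xFFFFu) << 16) | (nfd_bits & 0xFFFFu));
}
```

For example, pack_nfdsmsgs(64, 0) describes a 64-bit descriptor mask with no message queue array.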
5Nfdsmsgs (notes)
- The low-order 16 bits of the Nfdsmsgs parameter
specify the number of bits (not elements) in the
fdsmask array that make up the file descriptor
mask.
- If only part of the last int is included in the
mask, the appropriate number of low-order bits
are used, and the remaining high-order bits are
ignored.
6Nfdsmsgs (notes)
- If you set the low-order 16 bits of the Nfdsmsgs
parameter to 0, you must not define an fdsmask
array in the sellist structure.
- Each int of the msgids array specifies a message
queue identifier whose status is to be checked.
7Nfdsmsgs (notes)
- Elements with a value of -1 are ignored.
- The high-order 16 bits of the Nfdsmsgs parameter
specify the number of elements in the msgids
array.
- If you set the high-order 16 bits of the
Nfdsmsgs parameter to 0, you must not define a
msgids array in the sellist structure.
8TimeOut
- Specifies either a null pointer or a pointer to a
timeval structure that specifies the maximum
length of time to wait for at least one of the
selection criteria to be met.
- The timeval structure is defined in the
/usr/include/sys/time.h file (for AIX).
9TimeOut (cont.)
- The timeval structure in time.h contains the following members:
- struct timeval {
-     int tv_sec;    /* seconds */
-     int tv_usec;   /* microseconds */
- };
10TimeOut (cont.)
- The number of microseconds specified in
TimeOut.tv_usec, a value from 0 to 999999, is set to one
millisecond if the process does not have root
user authority and the value is less than one
millisecond.
11TimeOut (cont.)
- If the TimeOut parameter is a null pointer, the
select subroutine waits indefinitely, until at
least one of the selection criteria is met.
- If the TimeOut parameter points to a timeval
structure that contains zeros, the file and
message queue status is polled, and the select
subroutine returns immediately.
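A minimal sketch of the polling case, assuming the portable fd_set form of select rather than the AIX sellist form. The function name read_nowait and the NOT_READY sentinel are hypothetical. Passing a null pointer in place of &tv would make the same call wait indefinitely.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

#define NOT_READY (-2)   /* hypothetical sentinel: nothing to read yet */

/* Polls fd with a zeroed timeval and reads only if data is waiting.
 * Returns bytes read, NOT_READY if nothing is pending, -1 on error. */
ssize_t read_nowait(int fd, void *buf, size_t len)
{
    fd_set readfds;
    struct timeval tv = { 0, 0 };   /* zeros: poll and return at once */

    if (fd < 0 || fd >= FD_SETSIZE)
        return -1;
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    int rc = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (rc < 0)
        return -1;
    if (rc == 0)
        return NOT_READY;
    return read(fd, buf, len);
}
```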
12ReadList, WriteList, ExceptList
- Specify what to check for reading, writing, and
exceptions, respectively.
- Together, they specify the selection criteria.
- Each of these parameters points to a sellist
structure, which can specify both file
descriptors and message queues.
13ReadList, WriteList, ExceptList (cont.)
- Your program must define the sellist structure in
the following form:
- struct sellist {
-     int fdsmask[F];   /* file descriptor bit mask */
-     int msgids[M];    /* message queue identifiers */
- };
14ReadList, WriteList, ExceptList (cont.)
- The fdsmask array is treated as a bit string in
which each bit corresponds to a file descriptor.
- File descriptor n is represented by the bit
(1 << (n mod bits)) in the array element
fdsmask[n / BITS(int)], where bits is the number of bits in an int.
- The BITS macro is defined in the values.h file.
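The bit arithmetic above can be sketched as follows. INT_BITS is a stand-in for BITS(int) from values.h, built from limits.h so the sketch compiles on systems without that header; the function names are hypothetical.

```c
#include <limits.h>

/* Stand-in for BITS(int): number of bits in an int. */
#define INT_BITS ((int)(sizeof(int) * CHAR_BIT))

/* Mark file descriptor n as "to be checked" in the mask. */
void fdsmask_set(int *fdsmask, int n)
{
    unsigned int word = (unsigned int)fdsmask[n / INT_BITS];
    word |= 1u << (n % INT_BITS);
    fdsmask[n / INT_BITS] = (int)word;
}

/* Returns 1 if the bit for file descriptor n is set, 0 otherwise. */
int fdsmask_isset(const int *fdsmask, int n)
{
    return (int)(((unsigned int)fdsmask[n / INT_BITS] >> (n % INT_BITS)) & 1u);
}
```

The caller is assumed to supply an array large enough to hold bit n.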
15ReadList, WriteList, ExceptList (cont.)
- Each bit that is set to 1 indicates that the
status of the corresponding file descriptor is to
be checked.
- The arrays specified by the ReadList, WriteList,
and ExceptList parameters are the same size
because each of these parameters points to the
same sellist structure type.
16ReadList, WriteList, ExceptList (notes)
- You need not specify the same number of file
descriptors or message queues in each. Set the
file descriptor bits that are not of interest to
0, and set the extra elements of the msgids array
to -1.
17select return values
- Upon successful completion, the select subroutine
returns a value that indicates the total number
of file descriptors and message queues that
satisfy the selection criteria.
- The fdsmask bit masks are modified so that bits
set to 1 indicate file descriptors that meet the
criteria.
18select return values (cont.)
- The msgids arrays are altered so that message
queue identifiers that do not meet the criteria
are replaced with a value of -1.
- The return value is similar to the Nfdsmsgs
parameter in that the low-order 16 bits give the
number of file descriptors, and the high-order 16
bits give the number of message queue identifiers.
19select return values (cont.)
- These values indicate the sum total that meet
each of the read, write, and exception criteria.
- Therefore, the same file descriptor or message
queue can be counted up to three times.
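A sketch of how a successful (non-negative) return value could be split back into its two counts. The helper names are hypothetical. Each count is a sum over the read, write, and exception lists, so one descriptor that is both readable and writable contributes 2.

```c
/* Hypothetical helpers for a non-negative select return value rc. */

/* Low-order 16 bits: file descriptors that met the criteria. */
int ready_fd_count(int rc)
{
    return rc & 0xFFFF;
}

/* High-order 16 bits: message queues that met the criteria. */
int ready_msg_count(int rc)
{
    return (rc >> 16) & 0xFFFF;
}
```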
20select return values (cont.)
- If the time limit specified by the TimeOut
parameter expires, the select subroutine returns
a value of 0.
21select return values (cont.)
- If a connection-based socket is specified in the
ReadList parameter and the connection
disconnects, the select subroutine returns
successfully, but the recv subroutine on the
socket will return a value of 0 to indicate the
socket connection has been closed.
22select return values (cont.)
- For non-blocking connection-based sockets, both
successful and unsuccessful connections will
cause the select subroutine to return
successfully without any error.
23select return values (cont.)
- When the connection completes successfully, the
socket becomes writable, and if the connection
encounters an error, the socket becomes both
readable and writable.
24select return values (cont.)
- When using the select subroutine, you cannot
check for pending errors on the socket. You need
to call the getsockopt subroutine with SOL_SOCKET
and SO_ERROR to check for a pending error.
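A minimal sketch of that check, using the standard SO_ERROR socket option. The function name pending_socket_error is hypothetical. After select reports a non-blocking connect as finished, a result of 0 would mean the connection succeeded.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Returns the pending error on sockfd (0 if none), or -1 if
 * getsockopt itself fails. Reading SO_ERROR also clears it. */
int pending_socket_error(int sockfd)
{
    int err = 0;
    socklen_t len = sizeof(err);

    if (getsockopt(sockfd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return -1;
    return err;
}
```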
25select return values (cont.)
- If the select subroutine is unsuccessful, it
returns a value of -1 and sets the global
variable errno to indicate the error. In this
case, the contents of the structures pointed to
by the ReadList, WriteList, and ExceptList
parameters are unpredictable.
26select error codes
- EBADF
- An invalid file descriptor or message queue
identifier was specified.
- EAGAIN
- Allocation of internal data structures was
unsuccessful.
27select error codes (cont.)
- EINTR
- A signal was caught during the select subroutine
and the signal handler was installed with an
indication that subroutines are not to be
restarted.
- EINVAL
- One of the parameters to the select subroutine
contained a value that is not valid.
28select error codes (cont.)
- EFAULT
- The ReadList, WriteList, ExceptList, or TimeOut
parameter points to a location outside of the
address space of the process.
29Code example
- #include <sys/types.h>
- #include <sys/socket.h>
- #include <netinet/in.h>
- #include <netinet/tcp.h>
- #include <arpa/inet.h>   /* inet_addr */
- #include <fcntl.h>
- #include <sys/time.h>
- #include <errno.h>
- #include <stdio.h>
- #include <stdlib.h>      /* exit */
- #include <strings.h>     /* bzero */
30Code example (cont.)
- int main()
- {
-     int sockfd, cnt, i = 1;
-     struct sockaddr_in serv_addr;
-     bzero((char *)&serv_addr, sizeof(serv_addr));
-     serv_addr.sin_family = AF_INET;
-     serv_addr.sin_addr.s_addr = inet_addr("172.16.55.25");
-     serv_addr.sin_port = htons(102);
-     if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0)
-         exit(1);
31Code example (cont.)
-     /* FNONBLOCK as in the AIX example; portable code would use O_NONBLOCK */
-     if (fcntl(sockfd, F_SETFL, FNONBLOCK) < 0)
-         exit(1);
-     if (connect(sockfd, (struct sockaddr *)&serv_addr,
-                 sizeof(serv_addr)) < 0 && errno != EINPROGRESS)
-         exit(1);
-     for (cnt = 0; cnt < 2; cnt++) {
-         fd_set readfds, writefds;
-         FD_ZERO(&readfds);
-         FD_SET(sockfd, &readfds);
-         FD_ZERO(&writefds);
-         FD_SET(sockfd, &writefds);
32Code example (cont.)
-         if (select(sockfd + 1, &readfds, &writefds, NULL, NULL) < 0)
-             exit(1);
-         printf("Iteration %d ==============\n", i);
-         printf("FD_ISSET(sockfd, &readfds) == %d\n",
-                FD_ISSET(sockfd, &readfds));
-         printf("FD_ISSET(sockfd, &writefds) == %d\n",
-                FD_ISSET(sockfd, &writefds));
-         i++;
-     }
-     return 0;
- }
33Code example (cont.)
- Here is the output of the above program
- Iteration 1 ==============
- FD_ISSET(sockfd, &readfds) == 0
- FD_ISSET(sockfd, &writefds) == 1
- Iteration 2 ==============
- FD_ISSET(sockfd, &readfds) == 1
- FD_ISSET(sockfd, &writefds) == 1
34Performance Issues and Recommended Coding
Practices
- The select subroutine can be a very
compute-intensive system call, depending on the number of
open file descriptors used and the lengths of the
bit maps used.
- Most examples shown in older textbooks were
written when the number of open files supported
was small, and thus the bit maps were short.
35Performance Issues and Recommended Coding
Practices (cont.)
- You should avoid the following (where select is
being passed FD_SETSIZE as the number of FDs to
process):
- select(FD_SETSIZE, ....)
- Performance will be poor
if the program uses FD_ZERO and the default
FD_SETSIZE.
36Performance Issues and Recommended Coding
Practices (cont.)
- FD_ZERO should not be used in any loops or before
each select call.
- However, using it one time to zero the bit string
will not cause problems.
- If you plan to use this simple programming
method, you should override FD_SETSIZE to define
a smaller number of FDs.
37Performance Issues and Recommended Coding
Practices (cont.)
- For example, if your process will only open two
FDs that you will be selecting on, and there will
never be more than a few hundred other FDs open
in the process, you should lower FD_SETSIZE to
approximately 1024.
- Do not pass FD_SETSIZE as the first parameter to
select. The first parameter specifies the maximum number of file
descriptors the system should check.
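One way to follow both recommendations is sketched below, assuming the portable fd_set interface. The helper names are hypothetical. The first helper computes the first argument to select from the descriptors actually in use. The second copies a master set that was built once, so FD_ZERO is not repeated before every call.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical helper: first argument for select, that is, the highest
 * descriptor of interest plus one (0 for an empty list). */
int select_nfds(const int *fds, size_t count)
{
    int maxfd = -1;

    for (size_t i = 0; i < count; i++)
        if (fds[i] > maxfd)
            maxfd = fds[i];
    return maxfd + 1;
}

/* Waits for any descriptor in master to become readable. master is
 * built once by the caller; the struct copy below replaces a per-call
 * FD_ZERO/FD_SET sequence. Returns what select returns. */
int wait_readable_set(const fd_set *master, int nfds, fd_set *ready,
                      struct timeval *timeout)
{
    *ready = *master;
    return select(nfds, ready, NULL, NULL, timeout);
}
```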
38Performance Issues and Recommended Coding
Practices (cont.)
- The program should keep track of the highest FD
that has been assigned or use the getdtablesize
subroutine to determine this value.
- This saves passing excessively long bit maps in
and out of the kernel and reduces the number of
FDs that select must check.
- Use the poll system call instead of select.
39Performance Issues and Recommended Coding
Practices (cont.)
- The poll system call has the same functionality
as select, but it uses a list of FDs instead of a
bit map.
- Thus, if you are only selecting on a single FD,
you would only pass one FD to poll.
40Performance Issues and Recommended Coding
Practices (cont.)
- With select, you have to pass a bit map that is
as long as the FD number assigned for that FD.
- If AIX assigned FD 4000, for example, you would
have to pass a bit map 4001 bits long.
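For comparison, a minimal sketch of waiting on a single descriptor with poll. The function name poll_read and the -2 timeout sentinel are hypothetical. Only one pollfd entry is passed, however large the descriptor number is.

```c
#include <poll.h>
#include <unistd.h>

/* Waits up to timeout_ms for fd to become readable, then reads.
 * Returns bytes read (0 at end of file), -1 on error, or -2
 * (a hypothetical sentinel) on timeout. */
ssize_t poll_read(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };

    int rc = poll(&pfd, 1, timeout_ms);   /* one entry, not a bit map */
    if (rc < 0)
        return -1;
    if (rc == 0)
        return -2;
    if (pfd.revents & POLLNVAL)
        return -1;                        /* fd is not open */
    return read(fd, buf, len);
}
```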
41Is the socket at the other end closed?
- Peer Condition
- Calls close() or exits
- Without touching SO_LINGER
- read()?
- Should return 0.
42Is the socket at the other end closed? (cont.)
- write()?
- It is less clear what happens in this case.
- Expect EPIPE, not on the next call, but on the one
after.
43Peer condition
- The peer reboots, or sets l_onoff = 1, l_linger =
0 and then closes.
- You should eventually get ECONNRESET from read()
- or EPIPE from write().
44when write() returns EPIPE
- It also raises the SIGPIPE signal.
- Unless you handle or ignore the signal, you will
never see the EPIPE error.
- If the peer remains unreachable, you should get
some other error.
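A minimal sketch of ignoring SIGPIPE so that the EPIPE error becomes visible to the caller. The function names are hypothetical. Ignoring the signal process-wide is one option; installing a handler is another.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Ignore SIGPIPE for the whole process so that write() reports a
 * closed peer through errno instead of terminating the program. */
int ignore_sigpipe(void)
{
    return (signal(SIGPIPE, SIG_IGN) == SIG_ERR) ? -1 : 0;
}

/* Returns 0 if the write succeeded, 1 if it failed with EPIPE
 * (the peer is gone), -1 for any other error. */
int write_detect_closed(int fd, const void *buf, size_t len)
{
    if (write(fd, buf, len) >= 0)
        return 0;
    return (errno == EPIPE) ? 1 : -1;
}
```

On a pipe, EPIPE appears on the first write after the reader closes. On a TCP socket it typically appears only after the peer's RST has arrived.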
45when write() returns EPIPE (cont.)
- write() should not return 0.
- read() should return 0 on receipt of a FIN from
the peer, and on all following calls.
- You must therefore expect read() to return 0.
46Example code
- rc = read(sock, buf, sizeof(buf));
- if (rc > 0) {
-     write(file, buf, rc);   /* error checking on file omitted */
- } else if (rc == 0) {
-     close(file);
-     close(sock);            /* file received successfully */
- } else {                    /* rc < 0 */
-     /* close file and delete it, since data is not complete; report error */
- }
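The fragment above handles a single read. A sketch of the surrounding loop, with the write errors checked, might look like this. The function name copy_until_eof is hypothetical, and cleanup of an incomplete file is left to the caller.

```c
#include <unistd.h>

/* Copies everything from sock to file until the peer closes.
 * Returns the number of bytes copied, or -1 on a read or write
 * error (the data is then incomplete). */
long copy_until_eof(int sock, int file)
{
    char buf[4096];
    long total = 0;

    for (;;) {
        ssize_t rc = read(sock, buf, sizeof(buf));
        if (rc == 0)
            return total;            /* FIN received: transfer complete */
        if (rc < 0)
            return -1;

        ssize_t off = 0;
        while (off < rc) {           /* write() may accept only part */
            ssize_t w = write(file, buf + off, (size_t)(rc - off));
            if (w < 0)
                return -1;
            off += w;
        }
        total += rc;
    }
}
```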
47Get the port number for a service
- How?
- Use the getservbyname() routine.
- returns a pointer to a servent structure.
- Look at the s_port field, which contains the port
number, with correct byte ordering
48Code description
- Take a service name.
- Take a service type.
- Return a port number.
49Code description (cont.)
- If the service name is not found,
- It tries it as a decimal number.
- The number returned is byte ordered for the
network.
50Code
- #include <netdb.h>        /* getservbyname, struct servent */
- #include <netinet/in.h>   /* htons */
- #include <stdlib.h>       /* strtol */
- int atoport(char *service, char *proto)
- {
-     int port;
-     long int lport;
-     struct servent *serv;
-     char *errpos;
-     /* First try to read it from /etc/services */
-     serv = getservbyname(service, proto);
-     if (serv != NULL)
-         port = serv->s_port;
-     else { /* Not in services, maybe a number? */
-         lport = strtol(service, &errpos, 0);
-         if ((errpos[0] != 0) || (lport < 1) || (lport > 5000))
-             return -1; /* Invalid port address */
-         port = htons(lport);
-     }
-     return port;
- }
51When bind() fails
- What to do with the socket descriptor?
- If exiting
- All Unix operating systems will close open file
descriptors on exit (though it is always better to do it
yourself).
- If not exiting though,
- Close it with a regular close() call.
52How to properly close a socket?
- close() is the correct method.
- netstat might show that the socket is still
active; this is because of the TIME_WAIT state.
53The TIME_WAIT state
- TCP guarantees all data transmitted will be
delivered, if at all possible.
- The server goes into a TIME_WAIT state, to be
really sure that all the data has gone through.
54The TIME_WAIT state (cont.)
- When a socket is closed, both sides agree by
sending messages to each other that they will
stop sending data.
55The problem (TIME_WAIT)
- First, there is no way to be sure that the last
ack was communicated successfully.
- Second, there may be "wandering duplicates" left
on the net that must be dealt with if they are
delivered.
56TIME_WAIT (cont.)
- The end that sends the first FIN goes into the
TIME_WAIT state, because that is the end that
sends the final ACK.
57TIME_WAIT (cont.)
- If the other end's FIN is lost, or if the final
ACK is lost, having the end that sends the first
FIN maintain state about the connection
guarantees that it has enough information to
retransmit the final ACK.
58TIME_WAIT (cont.)
- The reason that the duration of the TIME_WAIT
state is 2MSL is that the maximum amount of time
a packet can wander around a network is assumed
to be MSL seconds. The factor of 2 is for the
round-trip.
- The recommended value for MSL is 120 seconds.
This means a TIME_WAIT delay can reach up to 4
minutes.
59Detecting a peer's death when receiving data
- No packets are sent on the TCP connection unless
there is data to send or acknowledge.
- If you are waiting for data from the peer, there is no
way to tell if the peer has silently gone away,
or just isn't ready to send any more data yet
(imagine a PC reboot).
60What to do?
- One option is to use the SO_KEEPALIVE option.
This option enables periodic probing of the
connection to ensure that the peer is still
present.
- However, the default timeout for this option is
at least 2 hours.
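Turning the option on is a single setsockopt call, sketched below. The function names are hypothetical. The sketch only enables probing; it does not change the probe interval, which is system-specific.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Enable keep-alive probing on sockfd. Returns 0 on success, -1 on error. */
int enable_keepalive(int sockfd)
{
    int on = 1;

    return setsockopt(sockfd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on));
}

/* Returns 1 if keep-alive is enabled on sockfd, 0 if not, -1 on error. */
int keepalive_enabled(int sockfd)
{
    int on = 0;
    socklen_t len = sizeof(on);

    if (getsockopt(sockfd, SOL_SOCKET, SO_KEEPALIVE, &on, &len) < 0)
        return -1;
    return on != 0;
}
```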
61Detecting a peer's death when sending data
- Sending data implies receiving ACKs from the
peer, therefore the retransmit timeout will
indicate whether the peer is still alive.
62Detecting a peer's death when sending data (cont.)
- However, the retransmit timeout is designed to
allow for various contingencies, in order to avoid
dropping TCP connections during minor network
problems. You should therefore still expect a delay
of several minutes before getting notification of
the failure.
63Sender's death
- The current approach is to implement read
timeouts on the server end: the server gives up
on the client if no requests are received in a
given time period.
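One possible way to get such a read timeout is the standard SO_RCVTIMEO socket option, sketched below. This is an assumption about technique, not necessarily what the slide has in mind; calling select with a TimeOut before each read would work as well. The function names are hypothetical.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Make recv()/read() on sockfd give up after ms milliseconds of
 * silence. Returns 0 on success, -1 on error. */
int set_read_timeout_ms(int sockfd, long ms)
{
    struct timeval tv;

    tv.tv_sec = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;
    return setsockopt(sockfd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}

/* Call right after a recv() that returned -1: returns 1 if the
 * failure was a timeout rather than a real error. */
int timed_out(void)
{
    return errno == EAGAIN || errno == EWOULDBLOCK;
}
```

A server would close the client's socket when a receive fails and timed_out() reports 1.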
64Sender's death (cont.)
- Protocols where the connection is maintained have
two choices:
- use SO_KEEPALIVE
- use a higher-level keep-alive mechanism (such as
sending a null request to the server every so
often).
65Pros/cons of select(), non-blocking I/O and SIGIO
- Non-blocking I/O implies that the application
has to poll sockets to see if there is data to
be read from them. Polling uses more CPU time
than other techniques.
66Pros/cons of select() (cont.)
- Using SIGIO enables the operating system to send
a signal when there is data waiting on a socket.
The drawback is that when dealing with multiple
sockets you will have to do a select() to find
out which one(s) is ready to be read.
67Pros/cons of select() (cont.)
- Using select() is great if your application has
to accept data from more than one socket at a
time, because it will block until any one of a
number of sockets is ready with data.
68Pros/cons of select() (cont.)
- One other advantage to select() is that you can
set a time-out value, after which control is returned to the
program whether any of the sockets have data for
you or not.
69Getting EPROTO from read()
- The protocol encountered an unrecoverable error
for that endpoint.
- It is a catch-all error code used by STREAMS-based
drivers when lacking a better option.
70select says there is data, but read returns zero
- The event that makes select report the socket as
readable is the EOF, because the
other side has closed the connection.
- This causes read to return zero.
71Difference between read() and recv()
- read() is equivalent to recv() with a flags
parameter of 0.
- Other values for the flags parameter change the
behavior of recv().
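As an illustration of a non-zero flags value, the sketch below uses the standard MSG_PEEK flag, which returns data without removing it from the receive queue. The function name peek_byte and its return convention are hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Looks at the next byte without consuming it. Returns the byte as a
 * non-negative value, -1 on error, or -2 if the peer has closed. */
int peek_byte(int sockfd)
{
    unsigned char c;
    ssize_t rc = recv(sockfd, &c, 1, MSG_PEEK);

    if (rc < 0)
        return -1;
    if (rc == 0)
        return -2;
    return c;
}
```

A later recv() or read() with flags of 0 still returns the same byte.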
72Difference between write() and send()
- write() is equivalent to send() with a flags
parameter of 0.
- Other values for the flags parameter change the
behavior of send().
73calls to socket() failures after the chroot()
- On systems where sockets are implemented on top
of Streams (e.g. SystemV based systems), the
socket() function will actually be opening
certain special files in /dev.
74calls to socket() failures after the chroot()
(cont.)
- You will need to create a /dev directory under
your fake root and populate it with the required
device nodes (only).
75EINTR from the socket calls
- More than an error this is an exit condition.
- The call was interrupted by a signal.
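A common response to EINTR is simply to restart the call, as sketched below. The function name read_retry is hypothetical. Whether to retry or to stop depends on why the signal was sent, so treat this as one option rather than a rule.

```c
#include <errno.h>
#include <unistd.h>

/* Like read(), but restarts the call if a signal interrupts it. */
ssize_t read_retry(int fd, void *buf, size_t len)
{
    ssize_t rc;

    do {
        rc = read(fd, buf, len);
    } while (rc < 0 && errno == EINTR);
    return rc;
}
```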
76Receiving SIGPIPE
- With TCP you get SIGPIPE if your end of the
connection has received an RST from the other end.
77Receiving SIGPIPE (cont.)
- If you were using select instead of write, the
select would have indicated the socket as being
readable, since the RST is there for you to read
(read will return an error with errno set to
ECONNRESET).
78Receiving SIGPIPE (cont.)
- Basically an RST is TCP's response to some packet
that it doesn't expect and has no other way of
dealing with.
- A common case is when the peer closes the
connection (sending you a FIN) but it is ignored
because the application is writing and not
reading.
79Receiving SIGPIPE (cont.)
- If the application does not use select to notice
the FIN, it writes to a
connection that has been closed by the other end
and the other end's TCP responds with an RST.
80socket exceptions out-of-band data.
- Socket exceptions do not indicate that an error
has occurred.
- Socket exceptions usually refer to the
notification that out-of-band data has arrived.
81socket exceptions out-of-band data. (cont.)
- Out-of-band data (called "urgent data" in TCP)
looks to the application like a separate stream
of data from the main data stream.
- This can be useful for separating two different
kinds of data.
82socket exceptions out-of-band data. (cont.)
- "Urgent data" does not mean that it will be
delivered any faster, or with higher priority
than data in the in-band data stream.
- Unlike the main data stream, the out-of-band
data may be lost if your application can't keep
up with it.
83Finding full hostname (FQDN)
- Some systems set the hostname to the FQDN and
others set it to just the unqualified host name.
- Systems supporting POSIX provide uname() to get the host name,
but older BSD systems only provide gethostname().
84Finding full hostname (FQDN)
- Call gethostbyname() to find your IP address.
Then take the IP address and call
gethostbyaddr().
- The h_name member of the hostent should then be
your FQDN.
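The lookup sequence can be sketched as follows, assuming an IPv4 host and using gethostname() to obtain the starting name. The function name get_fqdn is hypothetical. The result depends entirely on how name resolution is configured on the machine.

```c
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Writes the fully qualified host name into out. Returns 0 on
 * success, -1 if any lookup fails or the name does not fit. */
int get_fqdn(char *out, size_t outlen)
{
    char name[256];
    struct hostent *he;
    struct in_addr addr;

    if (gethostname(name, sizeof(name)) < 0)
        return -1;
    name[sizeof(name) - 1] = '\0';

    he = gethostbyname(name);                  /* name -> IP address */
    if (he == NULL || he->h_addrtype != AF_INET ||
        he->h_addr_list[0] == NULL)
        return -1;
    memcpy(&addr, he->h_addr_list[0], sizeof(addr));

    he = gethostbyaddr(&addr, sizeof(addr), AF_INET);  /* address -> name */
    if (he == NULL || strlen(he->h_name) >= outlen)
        return -1;
    strcpy(out, he->h_name);                   /* h_name should be the FQDN */
    return 0;
}
```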