This article introduces the basic concepts of industrial storage, covering topics from NAS and SAN to hosts, switches, networks, interface protocols, and disk arrays.
Network Attached Storage (NAS):
NAS is a dedicated data storage technology: a NAS device connects directly to a computer network and provides centralized data access services to heterogeneous network clients.
The difference between a NAS device and a traditional file server or direct-attached storage device is that the operating system and software on a NAS device provide only data storage, data access, and the related management functions; in addition, a NAS device offers more than one file transfer protocol. A NAS system usually contains more than one hard disk, and, as in a traditional file server, the disks are usually organized into a RAID to provide service; with a NAS in place, other servers on the network no longer need to act as file servers. NAS devices come in many forms: they can be mass-produced embedded appliances, or NAS software running on a general-purpose computer.
NAS uses file-based protocols such as NFS (common on UNIX systems) or SMB (common on Windows systems), whose modes of operation are well known. In contrast, a storage area network (SAN) uses block-based protocols, usually SCSI carried over Fibre Channel or iSCSI. (There are other SAN protocols, such as ATA over Ethernet and HyperSCSI, but they are uncommon.)
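To make the file-level versus block-level distinction concrete, here is a minimal Python sketch. It assumes, purely for illustration, that an NFS or SMB share is already mounted at /mnt/nas and that a SAN presents a block device at /dev/sdb; both paths are hypothetical examples, not part of any particular product.

    # File-level access (NAS): the client names an abstract file and the NAS
    # server decides how the bytes are laid out on its own disks.
    with open("/mnt/nas/reports/2023.csv", "r") as f:
        header = f.readline()

    # Block-level access (SAN): the host addresses raw blocks directly and is
    # itself responsible for any filesystem layered on top of them.
    with open("/dev/sdb", "rb") as dev:
        dev.seek(4096)           # jump to a byte offset on the device
        block = dev.read(512)    # read one 512-byte block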
NAS computers or appliances usually run a streamlined operating system that provides only basic file services and the related protocols; for example, FreeNAS, an open-source NAS package built on a trimmed-down FreeBSD, can run on ordinary computer hardware, while commercial embedded devices use closed-source operating systems and protocol stacks.
Storage Area Network (SAN):
It is an architecture that attaches external storage devices to servers, built with technologies such as Fibre Channel, disk arrays, tape libraries, and optical jukeboxes. The defining feature of this architecture is that storage attached over the network appears to the server's operating system as directly attached storage. Although the cost and complexity of SANs have been falling, they remain uncommon outside large-scale enterprise storage deployments.
In contrast to a SAN, network-attached storage (NAS) uses file-based protocols such as NFS or SMB/CIFS, where the storage is clearly remote: computers request a portion of an abstract file rather than performing block operations directly on a disk.
HBA:
We know that a network card connects a computer to a computer network. The card is typically inserted into one of the computer's bus expansion slots and exposes a port for the network connection; physically, it bridges the computer's internal bus (PCI, PCI-X, PCI-E, Sun SBus, and so on) and the network, such as Ethernet. Storage systems have an analogous device that bridges the computer's internal bus and the storage network. This device, which sits in the server and attaches it to the storage network, is generally called a Host Bus Adapter (HBA). The HBA is the physical connection between the server's internal I/O channel and the storage system's I/O channel. The most common internal I/O channels in servers are PCI and SBus, the protocols that connect the server CPU to peripheral devices, while the storage system's I/O channel is typically Fibre Channel. The role of the HBA is to translate between the internal channel protocol (PCI or SBus) and the Fibre Channel protocol.
Common data communication protocols between servers and storage devices are IDE, SCSI, and Fibre Channel. For a server and a storage device to communicate, both ends must implement the same protocol. A storage device usually has a controller that implements one or more of these protocols and translates IDE, SCSI, or Fibre Channel commands into the operations of the physical storage medium. On the server side, the protocol is implemented by an expansion card or by circuitry on the motherboard, which translates between the server's bus protocol and the storage protocol such as IDE or SCSI. For example, a PC motherboard normally includes IDE support, and IDE disk controllers implement the same protocol, so an IDE disk can be attached directly to the PC's IDE connector. If a disk supports only SCSI, it cannot be attached to the PC directly; instead, a SCSI card is inserted into an expansion slot and the SCSI disk is attached to that card. The SCSI card performs the conversion from the PC bus to SCSI, and that is exactly the function of a host bus adapter. If a disk supports only Fibre Channel, the server must support Fibre Channel as well; because of Fibre Channel's high speed, server motherboards generally do not support it natively, and a dedicated host bus adapter card is required. Once the HBA is installed, the server can be connected over Fibre Channel to disks that support it.
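As a small illustration of where an HBA shows up on the host side, the following Python sketch lists Fibre Channel HBAs via the fc_host class that the Linux kernel exposes under sysfs. It assumes a Linux server with FC HBAs installed; on other systems, or without HBAs, the directory simply will not exist.

    import os

    FC_HOST_DIR = "/sys/class/fc_host"   # Linux sysfs location for FC HBAs

    def list_fc_hbas():
        if not os.path.isdir(FC_HOST_DIR):
            return []                     # no FC HBAs (or no FC support) here
        hbas = []
        for host in sorted(os.listdir(FC_HOST_DIR)):
            entry = {"host": host}
            for attr in ("port_name", "node_name", "port_state", "speed"):
                try:
                    with open(os.path.join(FC_HOST_DIR, host, attr)) as f:
                        entry[attr] = f.read().strip()
                except OSError:
                    entry[attr] = "unknown"
            hbas.append(entry)
        return hbas

    for hba in list_fc_hbas():
        print(hba)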
Failover:
In computing, failover means that when an active service or application stops unexpectedly, a redundant or standby server, system, hardware component, or network is activated quickly to take over. Failover is essentially the same operation as switchover, except that failover normally happens automatically and without warning, whereas switchover requires manual intervention.
For servers, systems, or networks that require high availability and stability, designers usually build in failover capability.
At the server level, automatic failover usually relies on a "heartbeat" link between two servers. As long as the pulse, or heartbeat, between the primary server and the backup server continues, the backup server is not activated. To allow hot switchover and avoid service interruption, a third server running spare components may also be kept on standby. When a heartbeat failure from the primary server is detected, the backup server takes over the service. Some systems can also send failover notifications.
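The heartbeat mechanism can be sketched in a few lines of Python. This is a simplified illustration only: the interval, the miss threshold, and the last_heartbeat and promote hooks are assumptions made for the example, not the behaviour of any particular HA product.

    import time

    HEARTBEAT_INTERVAL = 1.0   # seconds between expected heartbeats
    MISSED_LIMIT = 3           # tolerate brief hiccups before failing over

    def monitor(last_heartbeat, promote):
        """Watch the primary's heartbeat and promote the backup after
        MISSED_LIMIT consecutive intervals with no fresh heartbeat."""
        missed = 0
        while missed < MISSED_LIMIT:
            time.sleep(HEARTBEAT_INTERVAL)
            if time.time() - last_heartbeat() > HEARTBEAT_INTERVAL:
                missed += 1        # no heartbeat seen in this interval
            else:
                missed = 0         # heartbeat arrived, reset the counter
        promote()                  # backup server takes over the service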
Some systems are deliberately designed not to fail over fully automatically and instead require administrator intervention. In this "automatic failover with manual approval" configuration, once the administrator confirms the failover, the rest of the process completes automatically.
Load balancing:
Load balancing is a computer networking technique that distributes load across multiple computers (a computer cluster), network links, CPUs, disk drives, or other resources in order to optimize resource usage, maximize throughput, minimize response time, and avoid overloading any single resource.
Using multiple load-balanced components instead of a single component also improves reliability through redundancy. Load balancing is usually provided by dedicated software or hardware.
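A round-robin distributor is the simplest form of load balancing and is easy to sketch. The backend addresses below are made-up examples; a real balancer would also track backend health and weights.

    import itertools

    backends = ["10.0.0.11:80", "10.0.0.12:80", "10.0.0.13:80"]
    rotation = itertools.cycle(backends)   # endless round-robin iterator

    def pick_backend():
        return next(rotation)

    for request_id in range(6):            # requests spread evenly over the pool
        print(f"request {request_id} -> {pick_backend()}")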
Cluster:
A computer cluster is a type of computer system built from a group of loosely coupled computers, connected by software and/or hardware, that cooperate closely to perform computational work; in a sense, the group can be viewed as a single computer. An individual computer in a cluster is usually called a node, and nodes are usually connected by a local area network, although other interconnects are possible. Clusters are generally used to improve computing speed and/or reliability beyond what a single computer offers, and they typically provide much better price/performance than comparable single machines such as workstations or supercomputers.
Switch:
A network switch is a device that expands a network by providing more ports in a subnet so that more computers can be connected. A switch operates at layer 2 of the OSI reference model, the data link layer. As frames arrive on each port, the switch learns the MAC addresses of the attached devices from the frames' source addresses and stores them in a MAC address (forwarding) table. In subsequent communication, frames destined for a known MAC address are sent only to the corresponding port rather than to all ports. A switch therefore segments collision domains at the data link layer, but it does not segment broadcast domains; that requires a network-layer device.
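The learning behaviour can be modelled with a small dictionary. This toy sketch only shows the idea of learning source MAC addresses and flooding unknown destinations; real switches use dedicated hardware tables and ageing timers.

    mac_table = {}   # MAC address -> switch port

    def handle_frame(src_mac, dst_mac, ingress_port, all_ports):
        mac_table[src_mac] = ingress_port          # learn/refresh the source
        if dst_mac in mac_table:
            return [mac_table[dst_mac]]            # forward out the known port
        return [p for p in all_ports if p != ingress_port]   # flood unknowns

    ports = [1, 2, 3, 4]
    print(handle_frame("aa:aa", "bb:bb", 1, ports))   # bb:bb unknown -> flood
    print(handle_frame("bb:bb", "aa:aa", 2, ports))   # aa:aa learned -> port 1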
Soft zoning:
Soft zoning means the switch places a device's World Wide Name (WWN) in a zone, regardless of which port the device is plugged into. For example, if WWN Q and WWN Z are in the same zone, they can talk to each other; similarly, if Z and A are in another zone, Z and A can see each other, but A cannot see Q. This is where the complexity of zoning comes from, and the feature is not common in Ethernet switches.
The concept of soft zoning is not hard to grasp: zone membership is based on a node's WWN rather than on a physical port. The advantage is that a device can be connected to any port of the switch, and as long as it can see the other nodes in its zone, it can access them.
From a management point of view, a soft-zoned environment can be a mess. When performing maintenance you must know where every node is plugged in, yet with soft zoning the switch port descriptions are rarely maintained, because that information goes stale quickly. Soft zoning also carries some security risk: as far as anyone knows, no attacker has been seen spoofing a WWN in the wild, but it is possible. In practice it is hard to move a device into a different zone by changing its WWN, because the attacker does not know which WWNs are allowed into the zone they want to reach; after all, you would never publish your switch configuration, would you?
Hard zoning:
Hard zoning is closer to VLANs in the Ethernet world. If a port is placed in a zone, any device connected to that port belongs to that zone (or to whatever set of zones the port is configured for). Of course, if someone can physically move a fibre cable, this kind of zoning is not secure against physical attack, but do you really need to worry about that? For a SAN, the best practice is therefore hard zoning on the switch combined with restricting access to the array's (target's) logical unit numbers (LUNs) by WWN: the storage array should also apply WWN-based masking so that many initiators can share the array while each sees only its own LUNs.
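The difference between the two zoning styles, plus LUN masking on the array, can be illustrated with a toy model. All WWNs, port numbers, and LUN numbers below are invented for the example.

    # Soft zoning: membership follows the device's WWN, wherever it plugs in.
    soft_zone = {"10:00:00:00:c9:aa:aa:aa", "50:06:01:60:bb:bb:bb:bb"}

    # Hard zoning: membership follows the physical switch port.
    hard_zone = {1, 5}

    # LUN masking on the array: each initiator WWN sees only its own LUNs.
    lun_masking = {"10:00:00:00:c9:aa:aa:aa": {0, 1}}

    def soft_zone_allows(initiator_wwn, target_wwn):
        return initiator_wwn in soft_zone and target_wwn in soft_zone

    def hard_zone_allows(initiator_port, target_port):
        return initiator_port in hard_zone and target_port in hard_zone

    def lun_visible(initiator_wwn, lun):
        return lun in lun_masking.get(initiator_wwn, set())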
Some people have odd ideas about zone design. Putting systems that run the same operating system into the same zone may sound appealing, but in practice it serves no purpose. People have traditionally been nervous about placing Windows servers in a zone together with hosts running other operating systems: when Windows sees a new LUN it pops up a "Do you want to initialize the new volume?" dialog, and if the Windows administrator casually clicks "Yes", someone else's LUN is destroyed. With LUN masking on the storage array, however, this is not a problem.
SCSI:
Small Computer System Interface (SCSI) is a processor-independent standard for system-level interfaces between computers and their peripheral devices (hard disks, floppy drives, optical drives, printers, scanners, and so on). The SCSI standard defines commands, protocols, and electrical characteristics (in OSI terms it spans the physical, link, transport, and application layers). Its largest area of application is storage devices such as hard disks and tape drives, but in practice SCSI can also attach scanners, optical devices (CD, DVD), printers, and more. The SCSI command set includes a list of supported peripheral device types; since SCSI cannot, in theory, cover every possible device, the code "1Fh - unknown or no device type" exists.
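The peripheral device type codes mentioned above come back in the response to the SCSI INQUIRY command. A few common codes, including the 1Fh "unknown or no device type" value, can be sketched as a simple lookup table:

    # A handful of SCSI peripheral device type codes (from the INQUIRY data).
    PERIPHERAL_DEVICE_TYPES = {
        0x00: "direct-access device (e.g. disk)",
        0x01: "sequential-access device (e.g. tape)",
        0x02: "printer",
        0x05: "CD/DVD device",
        0x06: "scanner",
        0x1F: "unknown or no device type",
    }

    def describe(code):
        return PERIPHERAL_DEVICE_TYPES.get(code, "other/reserved")

    print(hex(0x1F), "->", describe(0x1F))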
Fibre Channel:
Fibre Channel (FC) is a high-speed network interconnect technology (commonly running at 2, 4, 8, and 16 Gbps) used mainly to connect computer storage. Fibre Channel is standardized by the T11 Technical Committee of the International Committee for Information Technology Standards (INCITS), which is accredited by the American National Standards Institute (ANSI). Fibre Channel was originally used mostly in supercomputing, but it has become a common connection type in enterprise storage SANs. Despite the name, Fibre Channel signals can also run over twisted-pair copper, not just optical fibre.
Fibre Channel Protocol (FCP) is a transport protocol, similar in role to TCP, used mainly to carry SCSI commands over Fibre Channel.
iSCSI:
iSCSI, also known as IP-SAN, is a storage technology based on the Internet and the SCSI-3 protocol. It was proposed by the IETF and became an official standard on February 11, 2003. Compared with traditional SCSI, iSCSI brings three revolutionary changes:
SCSI, which was originally limited to the local machine, is carried over a TCP/IP network, so the connection distance can be extended almost without limit;
the number of connected servers is unlimited (the upper limit in the original SCSI-3 was 15);
because it is built on a server architecture, online capacity expansion and even dynamic deployment are also possible.
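On a Linux initiator, discovering and logging in to an iSCSI target is typically done with the open-iscsi tool iscsiadm; the sketch below drives it from Python. The portal address and target IQN are placeholder examples, and the commands assume open-iscsi is installed and run with sufficient privileges.

    import subprocess

    PORTAL = "192.168.1.100"                              # example portal IP
    TARGET_IQN = "iqn.2003-01.example.com:storage.lun1"   # example target name

    # SendTargets discovery: ask the portal which targets it exports.
    subprocess.run(
        ["iscsiadm", "-m", "discovery", "-t", "sendtargets", "-p", PORTAL],
        check=True,
    )

    # Log in to the target; its LUNs then appear as local block devices.
    subprocess.run(
        ["iscsiadm", "-m", "node", "-T", TARGET_IQN, "-p", PORTAL, "--login"],
        check=True,
    )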
FCoE:
Fibre Channel over Ethernet (FCoE) is a communication technology standard that encapsulates Fibre Channel frames in Ethernet, allowing Fibre Channel traffic to be carried over a 10 Gigabit Ethernet backbone while still using the Fibre Channel protocol. It is part of the INCITS T11 FC-BB-5 standard.
SATA:
Serial ATA (SATA, Serial Advanced Technology Attachment) is the twin sibling of Serial Attached SCSI (SAS): the two use compatible cabling, and a SATA hard disk can be connected to a SAS interface. SATA is a computer bus whose main function is to transfer data between the motherboard and mass storage devices such as hard disks and optical drives.
Established by the "Serial ATA Working Group" in November 2000, SATA has completely replaced the old hard disks with the old PATA (Parallel ATA or formerly known as IDE) interface, named after the serial transmission of data. In terms of data transmission, SATA is faster than ever, and supports hot-plugging, so that hardware can be plugged in or removed from the computer while it is operating. On the other hand, the SATA bus uses embedded clock frequency signals and has stronger error correction capabilities than before. It can check transmission instructions (not only data). If errors are found, they will be corrected automatically, which improves the reliability of data transmission. sex.
The most visible difference from the past is the much thinner cable, which improves airflow inside the chassis and, to some extent, the stability of the whole platform.
Currently, SATA has three specifications: SATA 1.5Gbit/s, SATA 3Gbit/s and SATA 6Gbit/s. There will be a faster SATA Express specification in the future.
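Because all three SATA generations use 8b/10b encoding (ten line bits per data byte), the usable throughput is roughly one tenth of the line rate in bytes, as this small calculation shows:

    # Back-of-the-envelope usable throughput for the three SATA generations.
    line_rates_gbps = {
        "SATA 1.5Gbit/s": 1.5,
        "SATA 3Gbit/s": 3.0,
        "SATA 6Gbit/s": 6.0,
    }

    for name, gbps in line_rates_gbps.items():
        usable_mb_s = gbps * 1e9 / 10 / 1e6   # 10 line bits per data byte
        print(f"{name}: about {usable_mb_s:.0f} MB/s usable")
    # -> roughly 150, 300, and 600 MB/s respectively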
RAID:
Redundant Array of Independent Disks (RAID), originally Redundant Array of Inexpensive Disks and often simply called a disk array, is built on a basic idea: combine several relatively inexpensive hard disks into an array whose performance can reach or even exceed that of a single expensive, high-capacity disk. Depending on the level chosen, RAID offers one or more of the following advantages over a single disk: improved data integrity, improved fault tolerance, and increased throughput or capacity. To the computer, the array appears as a single hard disk or logical storage unit. Common levels include RAID-0, RAID-1, RAID-1E, RAID-5, RAID-6, RAID-7, RAID-10, RAID-50, and RAID-60.
Put simply, RAID combines multiple hard disks into a single logical unit, so the operating system treats them as one drive. RAID is commonly used in servers and usually combines identical disks. With falling disk prices and RAID functions increasingly integrated into motherboards, it has also become an option for enthusiasts, particularly for workloads that need large amounts of storage such as video and audio production.
RAID was originally divided into different levels, each with its own theoretical advantages and disadvantages; the levels strike different balances between the two goals of increasing data reliability and increasing the read/write performance of the storage array. Over the years, further applications of the RAID concept have appeared.
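The trade-off between capacity and redundancy across common levels can be summarised with a simple calculation. The figures below use the standard textbook formulas for usable capacity and the minimum number of drive failures each level survives; any given implementation may differ in detail.

    def raid_usable(level, n, drive_tb):
        """Return (usable TB, minimum tolerated drive failures) for n drives."""
        if level == "RAID-0":    # striping only, no redundancy
            return n * drive_tb, 0
        if level == "RAID-1":    # mirroring: one drive's worth of capacity
            return drive_tb, n - 1
        if level == "RAID-5":    # striping with one drive's worth of parity
            return (n - 1) * drive_tb, 1
        if level == "RAID-6":    # striping with two drives' worth of parity
            return (n - 2) * drive_tb, 2
        if level == "RAID-10":   # mirrored pairs, then striped
            return (n // 2) * drive_tb, 1
        raise ValueError(level)

    for level in ("RAID-0", "RAID-5", "RAID-6", "RAID-10"):
        cap, tol = raid_usable(level, n=6, drive_tb=4)
        print(f"{level}: {cap} TB usable, survives at least {tol} failure(s)")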