A distributed database is a database that is stored on multiple computers. In an age of rapid advances in information technology, it is important for people to have access to up-to-date information. With a distributed database, users can access information from anywhere in the network at any time. The security issues and concurrency control of distributed databases are discussed in paper 1.
According to paper 1, distributed database systems have improved communication and data processing, because the data in a distributed database is spread across different sites of a computer network. This not only increases the speed of data access but also gives users local control of their data and makes a single point of failure much less likely. A distributed database is spread across multiple computers that are connected via data communication links. Because the data is distributed, network traffic can be reduced. Moreover, if the company's network is temporarily down, the local database is not affected and continues to work. Because the database is stored on multiple computers, the work of one branch is not affected when problems arise in another branch. However, it becomes more difficult to ensure that the information and indexes are not altered. In addition, a distributed database does not perform well when there is heavy interaction between sites.
The design of a distributed database involves fragmentation, replication and data allocation. According to the research of Shin and Irani 2, fragmentation is a design method that divides a relation into two or more partitions. One of its advantages is parallelism: because a transaction can be divided into several subqueries that operate on different fragments, the degree of concurrency and parallelism increases. However, because the data is stored at different sites, overall performance can suffer and integrity control becomes more difficult. There are three types of fragmentation: horizontal, vertical and hybrid. Coronel and Morris describe data replication as the storage of data copies at multiple sites served by a computer network 3. The main problem in managing replicated data is keeping the copies consistent. The advantages of replication include improved response time, reduced network traffic, and increased reliability and availability. Data allocation is the process of deciding where to locate the data, and allocation algorithms take into account factors such as performance and data availability goals 4. Data allocation strategies are classified as centralized, partitioned or replicated.
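The difference between horizontal and vertical fragmentation can be illustrated with a minimal Python sketch. The relation, its column names and all three helper functions are hypothetical and are not taken from the cited papers; the fragments are plain in-memory lists standing in for data that would be stored at different sites.

```python
# Hypothetical relation: each row is a dict and "id" is the primary key.
EMPLOYEES = [
    {"id": 1, "name": "Ana", "branch": "north", "salary": 4200},
    {"id": 2, "name": "Ben", "branch": "south", "salary": 3900},
    {"id": 3, "name": "Chen", "branch": "north", "salary": 5100},
]


def horizontal_fragments(rows, column):
    """Split the rows into fragments by the value of one column."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[column], []).append(row)
    return fragments


def vertical_fragments(rows, key, column_groups):
    """Split the columns into fragments; each fragment keeps the key."""
    return [
        [{c: row[c] for c in (key, *group)} for row in rows]
        for group in column_groups
    ]


def rejoin_vertical(fragments, key):
    """Rebuild the full rows by joining vertical fragments on the key."""
    merged = {}
    for fragment in fragments:
        for row in fragment:
            merged.setdefault(row[key], {}).update(row)
    return [merged[k] for k in sorted(merged)]


# Horizontal: one fragment of whole rows per branch.
by_branch = horizontal_fragments(EMPLOYEES, "branch")

# Vertical: personal details in one fragment, salary in another.
by_columns = vertical_fragments(EMPLOYEES, "id", [("name", "branch"), ("salary",)])
```

Hybrid fragmentation would correspond to applying `vertical_fragments` to one of the fragments returned by `horizontal_fragments`. Keeping the key in every vertical fragment is what allows `rejoin_vertical` to reconstruct the original relation.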
Paper 1 explains concurrency control and security in distributed databases in detail. Concurrency control in a distributed database is the management of concurrent access to the database. Distributed two-phase locking (2PL) is the best-known distributed concurrency control protocol. Its main approach is “read any, write all”, and it is used as the basic concurrency control protocol 5. Each transaction in 2PL executes in two phases: a growing phase, in which the transaction obtains locks, and a shrinking phase, in which it releases them. The lock managers in 2PL are distributed across all sites, and each one is responsible for locking the data at its own site. The distributed optimistic protocol is another concurrency control protocol; it operates by exchanging certification information. Security is also important in a distributed database, as it prevents information and data from being modified or misused by unauthorized people. The paper presents four security components: authentication, authorization, encryption and access control. Moreover, deadlock is identified as the major problem that occurs in distributed systems. The paper finds that the 2PL algorithm with a timestamp mechanism is effective for concurrency control in distributed databases.
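The two phases of 2PL and the per-site lock managers described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the protocol as specified in the cited papers: the class names `LockManager` and `Transaction` are hypothetical, only exclusive locks are modeled, a refused lock request returns `False` instead of waiting, and neither timestamps nor deadlock handling are included.

```python
class LockManager:
    """Exclusive-lock table for the data items stored at one site."""

    def __init__(self):
        self.owners = {}  # item -> id of the transaction holding the lock

    def acquire(self, item, txn_id):
        # Grant the lock if the item is free or already held by txn_id.
        holder = self.owners.setdefault(item, txn_id)
        return holder == txn_id

    def release(self, item, txn_id):
        if self.owners.get(item) == txn_id:
            del self.owners[item]


class Transaction:
    """A transaction that obeys the two-phase rule."""

    def __init__(self, txn_id, sites):
        self.txn_id = txn_id
        self.sites = sites      # site name -> LockManager at that site
        self.held = []          # (site, item) pairs locked so far
        self.shrinking = False  # becomes True at the first release

    def lock(self, site, item):
        # Growing phase: locks may only be requested before any release.
        if self.shrinking:
            raise RuntimeError("2PL violation: lock requested after a release")
        granted = self.sites[site].acquire(item, self.txn_id)
        if granted and (site, item) not in self.held:
            self.held.append((site, item))
        return granted

    def unlock_all(self):
        # Shrinking phase: release every lock; no new locks afterwards.
        self.shrinking = True
        while self.held:
            site, item = self.held.pop()
            self.sites[site].release(item, self.txn_id)


# Two sites, each with its own lock manager.
sites = {"A": LockManager(), "B": LockManager()}
t1 = Transaction("T1", sites)
t2 = Transaction("T2", sites)

first = t1.lock("A", "x")    # T1 obtains the lock on x at site A
blocked = t2.lock("A", "x")  # T2 is refused while T1 holds the lock
t1.unlock_all()              # T1 enters its shrinking phase
retry = t2.lock("A", "x")    # T2 can now obtain the lock
```

Releasing all locks at once, as `unlock_all` does, is the strict variant of the rule; it is chosen here only to keep the sketch short.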
Paper 1 presents the design, concurrency control and security of distributed databases. Security is one of the most important aspects of a distributed database, as it is required to ensure that information and data are handled in a secure environment and retain their integrity. Distributed databases are becoming increasingly popular in computer science. Hence, we need to understand them and look for solutions that address their weaknesses.