To understand how Git works, we first need to understand how Git saves data about changes made to one or more files.
Many other version control systems track the changes made to each file and save those changes. We end up with a base file and a list of changes committed to that file over time. So, with every version (each commit generates a new version), we see only the changes made to a particular file in that version. Storing changes this way makes it harder to see the state of the whole project, especially for files that did not change in a particular version.
In Git, saving and thinking about data is slightly different. Git sees its data as a series of snapshots of the whole project taken over time. Every time we commit, Git generates a snapshot of our project at that moment, including all the files as they are at that moment, and saves a reference to that snapshot. We can recall any snapshot and see the state of the entire project with all its files at that point in time. This differs from the previous case, where we looked at past versions and saw only the changes made to the files that had changed. In this case, every snapshot contains all the files.
Git's specialty is the way it manages files that have not changed in a particular snapshot. It is efficient enough not to save another copy of an unchanged file in a snapshot; instead, it stores a link to the identical version it has already saved.
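To make the snapshot idea concrete, here is a minimal Python sketch of a store that keeps each distinct file content only once and lets every snapshot link to it. The class name SnapshotStore and its methods are hypothetical, and this is a toy model under simplifying assumptions, not Git's real on-disk format.

```python
import hashlib


class SnapshotStore:
    """Toy content-addressed store (illustrative only, not Git's real format)."""

    def __init__(self):
        self.objects = {}    # content hash -> file content, stored once
        self.snapshots = []  # each snapshot maps path -> content hash

    def commit(self, files):
        """Record a snapshot of every file and return its index."""
        snapshot = {}
        for path, content in files.items():
            key = hashlib.sha1(content).hexdigest()
            # Unchanged content hashes to a key we already have,
            # so no second copy is stored; the snapshot just links to it.
            self.objects.setdefault(key, content)
            snapshot[path] = key
        self.snapshots.append(snapshot)
        return len(self.snapshots) - 1

    def checkout(self, index):
        """Rebuild the full project as it was at the given snapshot."""
        snapshot = self.snapshots[index]
        return {path: self.objects[key] for path, key in snapshot.items()}


store = SnapshotStore()
first = store.commit({"README": b"hello\n", "main.py": b"print(1)\n"})
second = store.commit({"README": b"hello\n", "main.py": b"print(2)\n"})

# Two snapshots list four files in total, but only three contents are stored.
print(len(store.objects))
print(store.checkout(first)["main.py"])
```

Both snapshots list every file, yet the unchanged README is stored only once, and either snapshot can still be rebuilt in full.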
Git Functions Locally
Another of Git's advantages is that nearly everything functions locally. Because Git is a distributed version control system, all changes are saved locally on every machine. They can then be pushed to the server or pulled from it.
When we first copy a project from the server, we also receive the whole history of changes, and every later pull adds the new commits to it. Our local machine has the complete history of changes made to our project. Most Git operations are fast, almost instantaneous, because they read from our local database.
For instance, if we'd like to see the changes made to our project a few weeks or months ago, Git reads that history from our local repository and does not need to contact the remote repository at all. We can actually work with Git without any access to the network, and then upload our changes once we are connected again.
A very important aspect of Git is the way in which changes are saved to our project. Before anything is saved to the database, Git computes a checksum for it. Git stores everything in its database not by filename but by the hash value of its content. When we want to find a particular commit, we use its checksum to look it up.
By default, Git uses the SHA-1 cryptographic hash function to generate checksums. SHA-1 produces a 40-character string of hexadecimal characters, calculated from the content of the input data. The same input data always produces exactly the same hash, and even the slightest change to the input data produces a different hash.
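As an illustration, the short Python sketch below reproduces the checksum Git assigns to a file's content. Git hashes the content prefixed with a small header of the form "blob", a space, the size in bytes and a zero byte. The function name git_blob_id is our own, hypothetical choice, and the sketch assumes the default SHA-1 object format.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 ID Git gives to file content (assumes SHA-1 format)."""
    # Git prepends "blob <size in bytes>\0" to the content before hashing.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


original = git_blob_id(b"hello world\n")
changed = git_blob_id(b"hello world!\n")  # one extra character

print(original)             # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(len(original))        # 40
print(original == changed)  # False
```

In a terminal, the same content can be checked against the real tool with git hash-object, which should print the same ID for the same bytes.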
Identifying a commit by its checksum enables Git to preserve data integrity. Since this is a distributed version control system, we need to be certain that content is identified unambiguously and that two people are looking at exactly the same file. When Git checks whether two files are identical, it compares their checksums. Even the slightest change to any of the project files changes the checksum of the entire project for that commit.
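The effect of a single file change on the checksum of the whole project can be sketched in Python as follows. This is a simplified model in which one hash is computed over every path together with the hash of its content; the function snapshot_id is hypothetical, and Git's real tree and commit objects use a different, more detailed format.

```python
import hashlib


def snapshot_id(files):
    """Hash a whole project from each path and the hash of its content.

    Simplified illustration only; not Git's actual tree or commit encoding.
    """
    digest = hashlib.sha1()
    for path in sorted(files):  # fixed order, so equal projects hash equally
        content_hash = hashlib.sha1(files[path]).hexdigest()
        digest.update(f"{path}\0{content_hash}\n".encode("utf-8"))
    return digest.hexdigest()


before = {"README": b"hello\n", "main.py": b"print(1)\n"}
same = dict(before)                          # identical copy of the project
after = {**before, "main.py": b"print(2)\n"}  # one file changed

print(snapshot_id(before) == snapshot_id(same))   # True
print(snapshot_id(before) == snapshot_id(after))  # False
```

Two people holding identical projects compute the same identifier, while a change to any one file gives the project a different one.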
Other version control systems use simple sequential reference marks instead of checksums, such as V00234, V00235 and so on.