How to Sync Files From My Laptop to a Remote Server

Context

I received a phone call from a man who said in a frightened voice: “Dermatologist!” I wanted to tell him kindly that he probably had the wrong number, because I am not a dermatologist and cannot give any medical advice about skin issues except to suggest a visit to a dermatologist. But he went on: “I am a dermatologist. I have a very big problem. I have to transfer a huge number of files from one computer to another. I heard that you can help me. I have more than fifteen thousand images and PDF files, and I have to transfer them to some server. I have been told that I will receive SSH details. What is that?”

We met, and in our conversation I realized that a huge repository of medical images, scientific texts and program files had to be migrated to a new server which scientists would use to share their research. The dermatologist has a laptop running GNU/Linux and no previous experience with the procedures and software needed to accomplish his task.

“It is doable,” I said. “But you have to be precise in typing commands and respect the order of the steps. Focus, concentration and accuracy are the keywords,” I emphasized. “I hope I will be able to do that,” he said.

First step – SSH

SSH is a secure communication protocol used to connect to a remote GNU/Linux server. SSH stands for Secure SHell. Many people who are not used to working with servers expect that they will see some complicated graphical interface. There is no complicated graphical interface; there is a textual interface without buttons, dropdown menus, radio buttons and other boxes. Is that even more complicated? No. But if you are a dermatologist or another professional not familiar with IT, you can ask someone to help you or just be happy to learn a very small number of commands. Just follow and remember this guide and you will not need to learn anything complicated in order to copy a large number of various files from one computer to a server that someone gave you access to.

Our dermatologist has those files copied to the laptop, which he will use to transfer them to the other server. Since he was informed that the remote server was set up with SSH enabled, he has to learn how to connect to it.
To connect to the remote server he turned on his laptop and started the terminal application, the program with a text interface in which he will start his SSH session.

A terminal application window usually looks like this:

Terminal application window

The terminal application window will show you a prompt with your username and the computer name. No buttons, no complicated menus. Only a prompt for commands. (Users who rely on screen readers can easily perform all required activities in a textual interface.) However, the most important thing is that you understand the logic of what you want to do and can express it as a command. Once you have the right commands you can literally copy them to the prompt.

Why Secure Shell? Why not some nice graphical user interface with my password? Security is not just a feature. It must be a principle. So, how does SSH work? Relying on passwords alone can be a vulnerable method, since many malicious bots and people with powerful hardware can use various techniques to steal passwords from you. Where security is concerned, SSH is still the better method. SSH can use so-called SSH keys, which are a cryptographically matching pair of keys. One is the private key and one is the public key. The public key can be shared and exposed to others (still, do not hand it out without a reason), but the private key must not be exposed to anyone. The private key and the processes of encryption and decryption during the establishment of the connection are essential for the security of an SSH connection.

First, we have to generate a pair of keys using the command:

ssh-keygen

When we issue that command at our prompt, the system will ask us to name the file in which the key will be saved, and after that we will be asked to enter a passphrase. Please type a passphrase that you can remember; nothing is displayed on the screen while you type it. Your screen will look similar to the image below:

Outputs of ssh-keygen command

Recent GNU/Linux distributions usually configure ssh-keygen to generate keys with a high level of security by default. In our case the default values gave us an RSA key with 2048 bits.
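If you would rather state the key type and file name yourself than accept the defaults, the options can be given on the command line. The sketch below is only an illustration under several assumptions: it writes a throwaway key into a scratch folder so that nothing in ~/.ssh is touched, it uses the Ed25519 key type (the server must accept it; -t rsa -b 4096 is the usual alternative), and it sets an empty passphrase with -N "" only so that the demonstration runs without asking questions. For a real key, leave out -N and let ssh-keygen ask you for a passphrase.

```shell
# Throwaway location for the demonstration key (assumption: a real key
# would normally be written to ~/.ssh by ssh-keygen itself).
key_dir="$(mktemp -d)"
key_file="$key_dir/id_ed25519_demo"

# -t key type, -f output file, -C comment stored in the public key,
# -N passphrase (empty here for the demo only), -q suppress progress output.
ssh-keygen -q -t ed25519 -f "$key_file" -C "demo key" -N ""

# Two files are created: the private key and the matching .pub file.
ls -l "$key_file" "$key_file.pub"

# Show the fingerprint of the public key.
ssh-keygen -l -f "$key_file.pub"
```

The file ending in .pub is the public key that goes to the server; the other file is the private key and stays on the laptop.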

After generating the keys we should transfer the public key to the remote server by issuing the command:

ssh-copy-id username@remote_host

Usually the administrator of that server, or the hosting company that set it up, gives you the username and password, while remote_host can be an address such as medicaljournal.org or an IP number which looks like 193.243.27.183 or so. (This IP number is unknown to me, so please do not use it; I typed it just to show people without an IT background what it can look like.) If your server accepts SSH on a port other than the default 22, as in the example further below, add -p and the port number to this command as well.
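For the curious, here is roughly what ssh-copy-id does on the server side: it adds your public key as one line to the file .ssh/authorized_keys in your user's home folder and makes sure that the folder and the file are readable only by you. The sketch below simulates this in a scratch folder on the laptop, so no server is needed; the folder stands in for the remote home folder and the key line is a made-up placeholder, not a real key.

```shell
# Real command when the server listens on port 2020 (the port used in this article):
#   ssh-copy-id -p 2020 username@remote_host

# Simulation: remote_home stands in for /home/username on the server.
remote_home="$(mktemp -d)"

# Placeholder text in the shape of a public key line (not a real key).
pub_key_line="ssh-ed25519 AAAAexampleOnlyNotARealKey dermatology@laptop"

# The .ssh folder must be accessible only to its owner.
mkdir -p "$remote_home/.ssh"
chmod 700 "$remote_home/.ssh"

# Each authorized public key is one line in authorized_keys.
printf '%s\n' "$pub_key_line" >> "$remote_home/.ssh/authorized_keys"
chmod 600 "$remote_home/.ssh/authorized_keys"

cat "$remote_home/.ssh/authorized_keys"
```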

After that you can connect to your server by issuing the command:

ssh -p 2020 username@remote_host

Please note that we use the option -p to say that SSH is open for connection on port 2020, because hosting companies sometimes use another port instead of the default 22. If your hosting company uses port 22 you do not need to write “-p 2020”. If they use another port, put that port number there, so that the option reads “-p numberofport”. After issuing the command we will see something like this at the prompt:

The authenticity of host '[name or ip number of your host will be here]:numberofport ([name or number of your host]:numberofport)' can't be established.
ECDSA key fingerprint is SHA256:y9aVJtMpIZusjf3bmSEtWg/9RwjTrCbAT0Tli9pvLmM.
Are you sure you want to continue connecting (yes/no)?

When you type “yes” and press Enter, it will ask you for the password of your user on that server (or for the passphrase of your key, if the public key has already been copied there). Once you are logged in, if you are scared you can type “logout” and press Enter and the system will log you out. So far so good. Nothing exploded, you are safe, and you have done wonderful work which you have to do only once.
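If typing the port and username every time feels error-prone, SSH can remember them under a short name in its configuration file, ~/.ssh/config. The sketch below writes such an entry into a scratch file rather than into your real configuration; the alias research-server is a made-up name, and remote_host and username are the same placeholders as in the commands above.

```shell
# Write the example to a scratch file instead of ~/.ssh/config.
config_file="$(mktemp)"

# "research-server" is a hypothetical alias; replace remote_host and
# username with the details you received for your server.
cat > "$config_file" <<'EOF'
Host research-server
    HostName remote_host
    User username
    Port 2020
EOF

cat "$config_file"

# With these lines in ~/.ssh/config, the connection command becomes:
#   ssh research-server
# To try the scratch file without touching ~/.ssh/config:
#   ssh -F "$config_file" research-server
```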

Finally copying. But, I have questions!

When I showed that to the dermatologist he felt something between the happiness of a new discovery and a sort of stage fright. “Will I be able to do this properly to the end?” was visible on his face. “But I have some questions,” he said. “I have been told that I have to do this with port 2020. Secondly, there will always be a number of new files on my computer; how can I copy them to the server? One by one? Should I keep some paper evidence of what I have copied?”

“Well,” I started, “there is an easy answer to your questions. If you have many files, we can combine the commands rsync and ssh.” First, you have to keep your files in one folder on your computer, and then we issue this command:

rsync -avz -e "ssh -p $portNumber" source destination

The first time, rsync copies all files and folders from the source folder on your computer to the remote computer. On every later run it copies only what is new or changed. (rsync stands for remote sync.) In the command above, source is the folder on your laptop, destination is the folder on the server, and $portNumber is the SSH port.
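This behaviour (everything on the first run, only new files afterwards) can be tried safely on your own computer before the server is involved. The sketch below is only a rehearsal under assumptions: both source and destination are scratch folders on the laptop, and the folder and file names are made up. The option -i (--itemize-changes) makes rsync print one line for each item it updates.

```shell
# Scratch folders standing in for the laptop folder and the server folder.
src="$(mktemp -d)"
dest="$(mktemp -d)"
mkdir -p "$src/research-files/photos"
echo "first image" > "$src/research-files/photos/image-001.txt"

# First run: everything is new, so everything is copied.
rsync -ai "$src/research-files" "$dest/"

# Later, a new file appears on the laptop.
echo "second image" > "$src/research-files/photos/image-002.txt"

# Second run: only the new file is transferred
# (folders whose dates changed may be listed as well).
rsync -ai "$src/research-files" "$dest/"

ls -R "$dest"
```

Because the source is written without a trailing slash, the folder research-files itself is created inside the destination, with all its subfolders under it.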

In the case of our dermatologist, there is a folder research-files with a lot of subfolders named after researchers, and each of them has the subfolders articles, statistics, measurements, photos and diagrams. He wants the same structure of folders and subfolders on the remote server. On the remote server the administrator created the user researcher, whose home folder is /home/researcher. All files should be copied to that folder using port 2020 for SSH. It sounds complicated, but it is not. On his laptop the files are kept in the home folder of the user dermatology, that is, in /home/dermatology/research-files.

The command will be like this:

rsync -avz -e "ssh -p 2020" /home/dermatology/research-files researcher@remote_host:/home/researcher/

Instead of remote_host type your host name or IP number. After issuing this command the system will ask you to type the password of your user. After you type the password, the process of remote sync will begin. Thanks to the flag v in “avz”, which stands for “verbose”, you will see the whole process on your screen (a stands for “archive”, which keeps the folder structure and file dates, and z compresses data during transfer). The duration of the process will depend on your upload speed. Meanwhile you can send mail, open documents or play music on your computer. The first run can last longer if you have many files. But the second and later runs will transfer only the difference, which means files that were added or changed after the previous run. That will probably be considerably shorter. That’s all. Not too hard.
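As for the dermatologist's question about keeping evidence of what was copied, rsync has two options that may help: -n (--dry-run) shows what would be copied without copying anything, and --log-file writes a dated record of every transferred file. The sketch below tries both on scratch folders on the laptop; the folder names, the file name and the log file location are assumptions made for the illustration.

```shell
# Scratch folders and a scratch log file for the rehearsal.
src="$(mktemp -d)"
dest="$(mktemp -d)"
log_file="$(mktemp)"
mkdir -p "$src/research-files/articles"
echo "draft" > "$src/research-files/articles/paper.txt"

# Rehearsal: -n lists what would be copied but copies nothing.
rsync -avn "$src/research-files" "$dest/"

# Real run: --log-file keeps a dated record of every transferred file.
rsync -av --log-file="$log_file" "$src/research-files" "$dest/"

# The log can be kept or printed as evidence of what was copied.
cat "$log_file"

# The same option can be added to the remote command from this article, e.g.:
#   rsync -avz --log-file="$HOME/rsync-research.log" -e "ssh -p 2020" \
#       /home/dermatology/research-files researcher@remote_host:/home/researcher/
```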

SSL, TLS-Security

Developers of applications for open access publishing and repositories are aware that users will sometimes have very long interactive sessions while doing their work.

An author may spend considerable time uploading a submission, completing metadata forms, checking various criteria such as copyright policy, privacy policy and conflicts of interest, and exchanging messages with the editor and reviewers. After a submission is accepted, the author selected as the principal contact person may need to pay the submission fee through the open access publishing web application. A high level of security is needed for the payment, especially bearing in mind that the money transferred is sometimes part of a project budget given by a donor, ministry or (inter)state agency. Reviewers may spend a lot of time writing their reviews and completing the review form. Readers and librarians may spend a lot of time logged in while reading, paying subscriptions or collecting metadata of articles related to some specific topic. Journal secretaries and subscription managers may need to upload data on users’ payments or other information that should stay private. Editors may spend a long time checking reviews, checking that privacy concerns are respected in various supplementary files (especially relevant in the humanities, medicine and health sciences), following preview and review discussions, and reading communications from the journal management on subscription and submission fee financial statements.

During these interactions various bad things can happen. Passwords may be stolen, credit card numbers can be stolen and later used for illegal purposes, and malicious code can be injected so that the server is used for various illegal purposes. The risk is higher if the user’s computer is infected, not properly maintained and cleaned, or used irresponsibly.

Security measures were not invented only for military, police and special purpose computers. Security should be a principle, not an optional feature. System administrators should take preventive measures seriously and help editorial boards, librarians and other organizations and persons while they prepare their projects for open access publishing and repositories. We cannot control who will at some point try to intercept our interaction with a web application and do damage to our work and the work of many other scholars, scientists and the general public. CrossRef and some other organizations involved in open access publishing have required the implementation of security standards in web applications, especially where payments are made through a web platform. Several editors told me that they were asked to put “s” after “http” and to add the icon of a green padlock there.

green padlock and https

The letter “s” after http means that HTTP (Hypertext Transfer Protocol) is secure. HTTPS is an extension of HTTP. The green padlock is a small icon which the browser displays once it has checked that the site’s certificate is valid and the communication with the site is encrypted. Many companies and international organizations gathering IT security scientists and engineers have dedicated a considerable amount of time and resources to establishing standards, technologies, protocols and software tools that make internet communications secure and check the level of security.

Many people use the term SSL for all security measures, although strictly speaking SSL stands for Secure Sockets Layer. SSL was the standard security technology for establishing an encrypted link between a web server and the user’s browser. A link protected by SSL ensures that all data passed between the web server and browsers remain private and integral. SSL was developed further, and after version 3.0 it was succeeded by a new protocol, TLS, which stands for Transport Layer Security. The Internet Engineering Task Force published a statement requesting that SSL be deprecated. Servers today should use TLS version 1.3, which was published in August 2018, or TLS 1.2, which is still widely supported; the older versions 1.0 and 1.1 have since been deprecated as well.


Owners of sites should purchase, or get for free, a TLS certificate from a CA (Certificate Authority). Some states have established their own certificate authorities. Many companies are also involved in the sale, installation and maintenance of certificates. Depending on the required security level, you should decide which certificate to install on your server. For example, if you maintain a repository of primary data related to the humanities which might contain private information on people involved in medical, psychological or social research, you should consider a stronger certificate, which is more expensive, along with additional measures and policies that keep some of the data private. If you manage a lot of payment transactions you should also consider purchasing a certificate aimed at a higher security level. If you have no payments and no special privacy-related concerns, you can consider a basic certificate or a free certificate from an open certificate authority such as Let’s Encrypt.

Installation of a certificate is sometimes a process in which the CA or its reseller guides you, and during that process they may perform some security and identity checks. The process is usually short, but for some more expensive certificates it may take some time, depending on the checking procedures. You may be asked to check your mail a couple of times and click on confirmation links. Some hosting companies and CAs such as Let’s Encrypt have prepared online instructions that are easy to follow. However, editors who are not familiar with IT technologies and standards should ask the hosting company or their system administrator to install the certificate properly. After installation of the certificate a green padlock should appear on the left side of the address bar in the browser. If there is no green padlock, or you see a padlock with an exclamation mark, please check the Why No Padlock site and use their testing tool to find and fix potential issues. The output of the test may look like the image below:

checking ssl certificate

Some people use SSL Tools or other online certificate checkers in order to check the validity of a certificate.

Some CAs offer additional checks of installed certificates, since there are known attacks on certificates. After I installed one relatively expensive certificate for a site that manages online payments, I noticed that their server still had TLS 1.0 active, which was vulnerable to the so-called BEAST attack. Although the latest versions of TLS are not vulnerable to the BEAST attack, it is always good to check whether your hosting company or institutional servers are updated to the latest versions of the security protocols. Additional information on some other attacks on certificates is described on one very interesting security-oriented blog.
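One way to check from your own terminal which TLS versions a server still accepts is the openssl command, if it is installed on your computer. The sketch below is a rough check, not a full security audit: the function name check_tls_versions is made up for this example, and it assumes that openssl s_client finishes with an error status when the handshake fails. Note that some builds of openssl cannot speak the oldest versions at all, so “not accepted” can also mean that your own openssl refused to try.

```shell
# check_tls_versions HOST [PORT]
# Tries one handshake per protocol version and reports the result.
check_tls_versions() {
    host="$1"
    port="${2:-443}"
    for version in tls1 tls1_1 tls1_2 tls1_3; do
        # </dev/null ends the session right after the handshake.
        if openssl s_client -connect "$host:$port" -servername "$host" \
               "-$version" </dev/null >/dev/null 2>&1; then
            echo "$version: accepted"
        else
            echo "$version: not accepted (or not supported by this openssl)"
        fi
    done
}

# Example call (replace the placeholder with the address of your own site):
#   check_tls_versions your-site.example
```

If tls1 or tls1_1 is reported as accepted, that is a good reason to ask the hosting company to switch those old versions off.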

Note: Despite the popularity of mobile phones, I would not use a mobile phone for work on very important data or for administering a web application or server holding all kinds of important information.

Malware Intrusion

We know that there is no ideally secure server. I have witnessed many times that hosting companies and their employees suffer from a lack of resources, equipment and skilled people to take care of the security of servers. One of them tried to convince me that the permissions for folders in public_html should be 777, which would let every user on the server write to them. (If you are new to web applications and are setting up your system for open access publishing, please look up information about permissions on your server. The majority of hosting companies with shared hosting accounts set folder permissions to 755 and file permissions to 644 by default.) People who want to compromise your server usually inject code designed to exploit vulnerabilities and use your server for some, usually illegal, operations, as in the image below. When you are choosing an application, a hosting company and the person who will administer the server, security should be the top priority.
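To make the numbers less abstract: 755 on a folder means the owner can read, write and enter it while everyone else can only read and enter, and 644 on a file means the owner can read and write it while everyone else can only read. The sketch below shows one common way to reset permissions to these values. It works on a scratch folder that stands in for public_html, and the folder and file names inside it are made up.

```shell
# Scratch folder standing in for public_html (assumption for the demo).
site_root="$(mktemp -d)"
mkdir -p "$site_root/journal/uploads"
touch "$site_root/journal/index.php"

# Simulate the bad advice: everything writable by everyone.
chmod -R 777 "$site_root"

# Reset folders to 755 and files to 644.
find "$site_root" -type d -exec chmod 755 {} +
find "$site_root" -type f -exec chmod 644 {} +

# List anything still writable by everyone (should print nothing).
find "$site_root" -perm -002

ls -lR "$site_root"
```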

example of intrusion code

There are various methods of doing that. The example presented here was part of a larger file found on a server used to publish scientific journals. Sometimes servers are safe but the applications installed on them are very vulnerable. Strong competition and financial pressure force developers to release a product as soon as they can, without proper testing. Several times I have come across software written for very obsolete and insecure versions of PHP, which poses additional risks for the security of a site. On the other hand, additions of custom code that has not been tested can also make a system insecure.

Such incidents can endanger your reputation and the trust of authors, readers, reviewers and librarians who would like to visit your site often. Above all, sometimes drivers, firmware or operating systems are vulnerable, and you as the user of one account cannot do anything to prevent that. It is the job of the people in the hosting company, and of the manufacturers of hardware with vulnerable software, to fix the vulnerable parts. Nevertheless, this should not discourage you from publishing open access. Constructive and proactive caution is always necessary and welcome.

Once, I received a call from an association that publishes a scientific journal. They informed me that some strange code had appeared on their site. I used various malware testing tools, and my result was like the one in the image below. I soon found that the server was infected with the so-called db.php infection. Once the malware has been successfully uploaded to a server, it receives GET requests and infects every JavaScript (.js) file with JavaScript malware code.

encoded intrusion

I decoded the strings displayed on the page and found the IP address of an infected server which was used for distributing the malware and which redirected users to other sites. Since such code was all over the site, it was very hard to read pages and visitors were prevented from using the open access content.
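When code like this is spread over many .js files, the terminal can help to find which files are affected. The sketch below is an assumption-heavy illustration, not a malware scanner: it builds a scratch folder with one clean and one “infected” file, and the search text is only an example of an encoded pattern that injected code often uses, not the actual signature of the db.php infection.

```shell
# Scratch folder standing in for the site's folder on the server.
site_root="$(mktemp -d)"
printf 'function greet() { return "hello"; }\n' > "$site_root/clean.js"
printf 'function greet() { return "hello"; }\neval(String.fromCharCode(104,105));\n' \
    > "$site_root/infected.js"

# Illustrative search text only (hypothetical, not a real malware signature).
suspicious_pattern='eval(String.fromCharCode'

# List every .js file that contains the text.
# grep -F treats the pattern as plain text, -l prints only file names.
find "$site_root" -type f -name '*.js' -exec grep -F -l "$suspicious_pattern" {} +

# List .js files changed during the last 7 days, which may also be worth a look.
find "$site_root" -type f -name '*.js' -mtime -7
```

A list like this is only a starting point for the clean-up; restoring from a fresh backup remains the safer route.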

I reported my findings to the editorial board of the journal, and we informed the hosting company and the registrar of the domains used to spread the malware, asking them to check the issue and take the necessary measures to stop the abuse of our site and possibly of other sites infected by that malware.

The process was rather tense, stressful and painful for the editorial board and everyone concerned. The hosting company that hosted the server with the domain used for spreading the malware informed us that they would take care of the case.

We used other tools to block the IPs that were detected as attackers. That day we had more than 290 attacks from computers in Panama and more than 150 attacks from computers in Ukraine. We restored our site by using fresh backups and reinstalling the web applications we use. Our hosting company upgraded the PHP version, which was obsolete, unsupported and insecure at the time.