Blog

How to Sync Files From My Laptop to Remote Server

Context

I received a phone call from a man who said in a frightened voice: Dermatologist! I wanted to tell him kindly that he probably had the wrong number, because I am not a dermatologist and cannot give any medical advice about skin issues except to suggest a visit to a dermatologist. But he said: I am a dermatologist. I have a very big problem. I have to transfer a huge number of files from one computer to another. I heard that you can help me. I have more than fifteen thousand images and PDF files, and I have to transfer them to some server. I have been told that I will receive SSH details. What is that?

We met, and in our conversation I realized that a huge repository of medical images, scientific texts and program files had to be migrated to a new server that scientists will use to share their research. The dermatologist has a laptop with GNU/Linux, but he has no previous experience with the procedures and software needed to accomplish this task.

It is doable, I said. But you have to be precise in typing commands and respect the order of the steps. Focus, concentration and accuracy are the keywords, I emphasized. I hope I will be able to do that, he said.

First step – SSH

SSH is a secure communication protocol used to connect to a remote GNU/Linux server. SSH stands for Secure SHell. Many people who are not used to working with servers expect that they will see some complicated graphical interface. There is no complicated graphical interface; there is a text interface without buttons, dropdown menus, radio buttons or other boxes. Is that even more complicated? No. But if you are a dermatologist or another professional not familiar with IT, you can ask someone to help you or just be happy to learn a very small number of commands. Just follow and remember this guide and you will not need to learn anything complicated in order to copy a large number of files from one computer to a server that someone gave you.

Our dermatologist has those files copied to the laptop, which he has to use to transfer them to the other server. Since he was told that the remote server was set up with SSH enabled, he has to learn how to connect to it.
In order to connect to the remote server he turned on his laptop and started the terminal application, the program with a text interface that he needs in order to start an SSH session.

Terminal application window usually looks like this:

Terminal application window

The terminal application window will show a prompt with your username and computer name. No buttons, no complicated menus. Only a prompt for commands. (Users of screen readers can easily perform all the required activities in a text interface.) However, the most important thing is that you understand the logic of what you want to do and can express it in a command. Once you have the right commands you can literally copy them to the prompt.

Why Secure Shell? Why not some nice graphical user interface with my password? Security is not only a feature. It must be a principle. So, how does SSH work? Relying on passwords alone is a vulnerable method, since malicious bots and people with powerful hardware can use various techniques to steal or guess passwords. Where security is concerned, SSH keys are the better method. SSH uses so-called SSH keys, a cryptographically matched pair of keys. One is the private key and the other is the public key. The public key can be shared with others (there is still no reason to hand it out needlessly), but the private key must never be exposed to anyone. The private key and the encryption and decryption performed while the connection is being established are essential for the security of an SSH connection.

First, we have to generate a pair of keys using the command:

ssh-keygen

When we issue that command at the prompt, the system will ask us to name the file in which the key will be saved (pressing Enter accepts the suggested default), and after that we will be asked to enter a passphrase. Please type a passphrase that you can remember. Your screen will look similar to the image below:

Outputs of ssh-keygen command

Recent GNU/Linux distributions usually configure ssh-keygen to generate keys with a high level of security by default. In our case the default values gave us an RSA key with 2048 bits; newer versions of OpenSSH choose a longer RSA key or a different key type by default, so your output may differ.
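
If you prefer not to depend on whatever default your version of ssh-keygen uses, you can name the key type yourself. The following is a minimal sketch, assuming an OpenSSH version that supports Ed25519 keys; the temporary folder, the file name id_ed25519_demo and the comment dermatology-laptop are hypothetical and serve only to keep the demo away from any real key in ~/.ssh.

```shell
#!/bin/sh
# Sketch only: the folder and file name are hypothetical, chosen so that the
# demo never touches an existing key in ~/.ssh.
KEY_DIR="$(mktemp -d)"
KEY_FILE="$KEY_DIR/id_ed25519_demo"

# -t names the key type explicitly, -f the file to write, -C a comment that
# helps you recognize the key later.
# -N "" gives an empty passphrase ONLY so this demo runs without questions;
# for a real key leave -N out and type a passphrase when asked.
ssh-keygen -q -t ed25519 -f "$KEY_FILE" -C "dermatology-laptop" -N ""

# Two files appear: the private key (keep it secret) and the public key (.pub).
ls "$KEY_DIR"

# Show the fingerprint and type of the new public key.
ssh-keygen -l -f "$KEY_FILE.pub"
```

For everyday use the plain ssh-keygen command from the text is enough; the explicit options only make the result independent of the installed version.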

After generating the keys we should transfer the public key to the remote server by issuing the command:

ssh-copy-id username@remote_host

Usually the administrator of that server, or the hosting company that set it up, gives you the username and password, while remote_host can be a name such as medicaljournal.org or an IP address that looks like 193.243.27.183. (This IP address is unknown to me, so please do not use it; I typed it just to show people without an IT background what it can look like.) If the server listens on a port other than 22, add the port here as well, for example: ssh-copy-id -p 2020 username@remote_host

After that you can connect to your server by issuing the command:

ssh -p 2020 username@remote_host

Please note that we tell SSH to connect on port 2020, because hosting companies sometimes use that port instead of the default 22. If your hosting company uses port 22 you do not need to write “-p 2020”. If they use another port, write that port instead: “-p numberofport”. After issuing the command we will see something like this at the prompt:

The authenticity of host '[name or ip number of your host will be here]:numberofport ([name or number of your host]:numberofport)' can't be established.
ECDSA key fingerprint is SHA256:y9aVJtMpIZusjf3bmSEtWg/9RwjTrCbAT0Tli9pvLmM.
Are you sure you want to continue connecting (yes/no)?

When you type “yes” and press Enter, it will ask for the passphrase of your key (or for your password on that server if the public key has not been copied there yet). If you are scared, you can type “logout” and press Enter and the system will log you out. So far so good. Nothing exploded, you are safe, and you have done wonderful work that you have to do only once.
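
If typing the port and username every time feels error-prone, SSH can remember them for you. The sketch below writes a small configuration entry; the nickname journal-server is hypothetical, and remote_host and username are the same placeholders as above. The demo writes to a temporary file so that it cannot damage an existing configuration; for real use the same four lines would go into the file ~/.ssh/config.

```shell
#!/bin/sh
# Stand-in for ~/.ssh/config, so the demo does not overwrite a real file.
SSH_CONFIG="$(mktemp)"

# "journal-server" is a hypothetical nickname; remote_host and username are
# the placeholders used in the text and must be replaced with real values.
cat > "$SSH_CONFIG" <<'EOF'
Host journal-server
    HostName remote_host
    User username
    Port 2020
EOF

# Show what was written.
cat "$SSH_CONFIG"
```

With such an entry in ~/.ssh/config, typing ssh journal-server does the same as ssh -p 2020 username@remote_host.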

Finally copying. But, I have questions!

When I showed that to the dermatologist he felt something between happiness at a new discovery and a sort of stage fright. Will I be able to do this properly to the end? was visible on his face. But I have some questions, he said. I have been told that I have to do this over port 2020. Secondly, I will always have new files on my computer; how can I copy them to the server? One by one? Should I keep some paper evidence of what I copied?

Well, I started, there is an easy answer to your questions. We can combine the commands rsync and ssh if you have many files. First, you have to keep your files in one folder on your computer, and then we issue this command:

rsync -avz -e "ssh -p $portNumber" source destination

The first time, the system will copy all files from the folders on your computer to the remote computer. On the next runs it will copy only new or changed files. (rsync stands for remote sync.)

In the case of our dermatologist, he has a folder research-files with many subfolders named after researchers, and each of those has the subfolders articles, statistics, measurements, photos and diagrams. He wants the same structure of folders and subfolders on the remote server. On the remote server the administrator created the user researcher, whose home folder is /home/researcher. All files should be copied to that folder using port 2020 for ssh. It sounds complicated, but it is not. On his laptop his user is dermatology, so the folder with all the files is /home/dermatology/research-files.

The command will be like this:

rsync -avz -e "ssh -p 2020" /home/dermatology/research-files researcher@remote_host:/home/researcher/

Instead of remote_host type your host name or IP address. Note that there is no space after the colon. After you issue this command the system will ask for the password of your user (or the passphrase of your key). After you type it, the remote sync will begin. Thanks to the flag v in “avz”, which stands for “verbose”, you will see the whole process on your screen. (The flag a, “archive”, copies folders with everything inside them and keeps file attributes, and z compresses data during transfer.) The duration will depend on your upload speed. Meanwhile you can send mail, open documents or play music on your computer. The first run can take a long time if you have many files. But the second and later runs transfer only the difference, that is, files added or changed after the previous run. That will probably be considerably shorter. That’s all. Not too hard.
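
The dermatologist also asked about evidence of what was copied. rsync can help with that: the option -n (--dry-run) only lists what would be transferred, and --log-file writes a record of the real transfer. The sketch below does not copy anything; it only prints the two commands so that they can be checked and then pasted at the prompt. The helper name print_rsync and the log file location are my own assumptions; the paths are the ones from the example above.

```shell
#!/bin/sh
# Values from the example above; replace remote_host with your host name or IP.
PORT=2020
SOURCE="/home/dermatology/research-files"   # no trailing slash: the folder
                                            # itself is created on the server
REMOTE_USER="researcher"
REMOTE_HOST="remote_host"
REMOTE_DIR="/home/researcher/"
LOG_FILE="$HOME/rsync-research.log"         # hypothetical location of the log

# Print an rsync command for review; extra options go in the first argument.
print_rsync() {
    printf 'rsync -avz %s -e "ssh -p %s" %s %s@%s:%s\n' \
        "$1" "$PORT" "$SOURCE" "$REMOTE_USER" "$REMOTE_HOST" "$REMOTE_DIR"
}

# 1. Rehearsal: -n lists what would be copied without copying anything.
print_rsync "-n"

# 2. Real transfer, with a log file as written evidence of what was copied.
print_rsync "--log-file=$LOG_FILE"
```

Running the rehearsal command first is a cheap way to catch a mistyped path before fifteen thousand files start travelling.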

Change of DOI and Associated URL

The Digital Object Identifier (DOI) is an identifier system introduced to advance the indexing, searching, publishing and distribution of publications. It is standardized by the International Organization for Standardization as ISO 26324.

DOI services and registration are provided through the International DOI Foundation (IDF), which is a not-for-profit membership organization. The IDF is the governance and management body for the federation of Registration Agencies that provide DOI services and registration. Detailed information about the DOI system is published on the site of the IDF. Resources about DOI, including video materials, are free to download there, so those interested in learning more can check the materials and resolve their dilemmas about the use and management of DOIs.

One very important feature of a DOI is that it is unique and that the metadata associated with it can be updated when necessary. This matters because a DOI, once registered, stays the same while its metadata can follow any changes that happen in reality to the digital object (article, journal issue, book, conference proceedings, supplementary files, etc.).

Let me describe a situation I encountered recently while working with one editorial board; something similar may occur in the development of any scientific journal or the publishing of any publication. In our case the editorial board had to change servers and the domains pointing to them, because of changes in partner organizations, the establishment of an NGO that will support some journals, and some technical reasons. They publish their journals in three installations of Open Journal Systems, which had three domains and two DOI prefixes. They moved those installations to new servers, and the three installations now run on two domains (URLs) with DOIs created using two DOI prefixes.

Thanks to the flexible options for managing DOIs in Open Journal Systems, editors can define an efficient and easy-to-use pattern for creating DOIs, or create a custom DOI for each article or other digital object they publish.

DOI options in OJS

The editorial board I work with decided that the DOI for each article of their scientific journal should be created like this:

DOIprefix/acronymofjournal.volumeno.issueno.articleviewno

Such a DOI will be associated with a URL built on the following principle:

journaldomain/index.php/journalacronym/article/view/articleviewno

Thus it is easy to associate each article DOI with a unique URL. That URL points to the page with the article metadata and the downloadable article. If the URL changes, only the “journaldomain” part changes. The rest remains the same. That enables the editorial board and the system administrator to update any change of URL easily.
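
For readers who are comfortable with a terminal, the two patterns can be written down as two tiny helpers. This is only a sketch of the naming principle: the helper names make_doi and make_url, the DOI prefix 10.12345, the acronym vetj and the domains are all hypothetical placeholders.

```shell
#!/bin/sh
# Build a DOI from: prefix, journal acronym, volume, issue, article view number.
make_doi() {
    printf '%s/%s.%s.%s.%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Build the article URL from: journal domain, acronym, article view number.
make_url() {
    printf '%s/index.php/%s/article/view/%s\n' "$1" "$2" "$3"
}

# Hypothetical article: volume 12, issue 3, article view number 456.
make_doi "10.12345" "vetj" 12 3 456
make_url "https://journals.example.org" "vetj" 456

# After a move to a new domain only the first argument changes.
make_url "https://new-journals.example.org" "vetj" 456
```

The DOI itself never changes when the journal moves; only the URL registered for it does.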

Well, that is fine. But how do we accomplish that task without programming skills? And we do not have funds to buy software for it. Thus the solution is to use free software in a way that does not require programming skills. Imagine that the editor is an expert in veterinary medicine for sheep and goats. Very important for the local village population. Most probably the editor never received any training in PHP, SQL or JavaScript. So we have to manage this issue without programming skills.

We will use the following software:

phpMyAdmin, which is free and usually preinstalled on the server

LibreOffice – an advanced office suite with spreadsheet, text writing, presentation and drawing capabilities

Geany – a text editor (you can use Notepad++ or another similar text editor)

Download LibreOffice and Geany to your computer and you are equipped with powerful software without proprietary licensing limiting your work.

First, we need to export the table from the database in which the DOIs are stored, then sort out those DOIs and associate the proper URLs with them. Your hosting company will give you a link to phpMyAdmin. It is usually part of your cPanel. You will find it in the Databases section, as in the image below:

phpmyadmin icon

When you click the phpMyAdmin icon you will be taken to a screen that displays a list of databases on the left side. Click the proper database and phpMyAdmin will show a list of tables, among them a table called submission_settings, as in the image below.

submission settings

When you click on that table the system will show the following screen:

rows in database table

In some cases such a table has 2500 or more rows. You can select how many rows you see and export the data page by page. The image above shows that we selected 25 rows, that we are on page one, and that we checked all of them. Since the table submission_settings contains many rows, feel free to display 250 or 500 rows at a time. Next, click Export, which you see on the right side below the rows. The system will take us to a page that performs the export in the desired format. In our case we choose the OpenDocument Spreadsheet format, as in the image below, by clicking the little drop-down arrow to the right of the export format.

selecting format of export

Then we click the Go button. So far we are only clicking; no programming skills required. After we click Go, a dialog will offer to open the file with LibreOffice Calc, which we installed previously on our computer.

selecting LibreOfficeCalc

When you click the OK button your computer will open the rows that you exported, and it will look like the image below:

LibreOffice opened rows

Well, although this looks abstract, please notice that we are still only clicking. But now we have to apply some logic, not programming skills. We have to filter out the DOIs we need and copy them to a separate file. Click the menu Data > More Filters > Standard Filter and you will get the following dialog, which controls what to do with the data in the columns. In our case we know that Column C should have the value displayed in the image below and that Column D should contain the DOI prefix that you received from CrossRef or another authority that provides DOI services.

DOI filter

After that you will see a table with the DOIs found in that batch of rows. Copy those DOIs to a separate spreadsheet and repeat the process until you reach the last row of the submission_settings table. It can sometimes have 4500 or more rows, but if you follow this procedure and choose, let us say, 500 rows in the first step, you will finish easily.

After you copy all those DOIs into one column you will have the complete list. If you follow the pattern mentioned above you can easily generate the URLs and put them in the second column. You can write the URLs in Geany, which we mentioned, neatly one below the other, then copy them and paste them into the second column of your spreadsheet. When you finish, you can send that file to CrossRef support and they will manage the deposits for you, so that each DOI is associated with the proper URL. So we accomplished the task only by clicking and, at the end, applying some logic. Well, it is needed sometimes.
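
If someone on the team is comfortable with a terminal, the second column can also be produced automatically instead of being typed by hand. The sketch below assumes that the DOIs follow the pattern described earlier, that the journal acronym contains no dots, and that the DOIs are saved one per line in a text file; the helper name doi_url_table, the prefix, the acronyms and the domain are hypothetical.

```shell
#!/bin/sh
# Hypothetical input: one DOI per line, as copied from the spreadsheet column.
DOI_LIST="$(mktemp)"
cat > "$DOI_LIST" <<'EOF'
10.12345/vetj.12.3.456
10.12345/vetj.12.4.471
10.54321/otherj.2.1.88
EOF

# Print "DOI<TAB>URL" for every DOI that starts with the given prefix.
# Arguments: file with DOIs, DOI prefix, journal domain.
doi_url_table() {
    awk -v prefix="$2" -v domain="$3" '
        index($0, prefix "/") == 1 {
            suffix = substr($0, length(prefix) + 2)  # text after "prefix/"
            n = split(suffix, part, ".")             # acronym.volume.issue.id
            printf "%s\t%s/index.php/%s/article/view/%s\n", $0, domain, part[1], part[n]
        }
    ' "$1"
}

doi_url_table "$DOI_LIST" "10.12345" "https://journals.example.org"
```

The output is tab-separated, so it can be saved to a file and opened in LibreOffice Calc as two columns. With two DOI prefixes, as in our case, the helper would simply be run once per prefix with the matching domain.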


SSL, TLS-Security

Developers of applications for open access publishing and repositories are aware that users will sometimes have very long interactive sessions while doing their work.

An author may spend considerable time uploading a submission, completing metadata forms, checking criteria such as the copyright policy, privacy policy and conflict of interest statement, and exchanging messages with the editor and reviewers. After the submission is accepted, the author selected as principal contact may need to pay the submission fee through the open access publishing web application. A high level of security is needed for the payment, especially bearing in mind that the money transferred is sometimes part of a project budget given by a donor, ministry or (inter)state agency. Reviewers may spend a lot of time writing their reviews and completing the review form. Readers and librarians may stay logged in for a long time while reading, paying subscriptions or collecting metadata of articles on a specific topic. Journal secretaries and subscription managers may need to upload data on users’ payments or other information that should stay private. Editors may spend a long time checking reviews, checking that privacy concerns are respected in supplementary files (especially relevant in the humanities, medicine and health science), following preview and review discussions, and reading communications from the journal management about subscription and submission fee financial statements.

During this interaction various bad things can happen. Passwords may be stolen, credit card numbers can be stolen and later used for illegal purposes, and malicious code can be injected so that the server is used for illegal purposes. The risk is higher if the user’s computer is infected, not properly maintained and cleaned, or used irresponsibly.

Security measures were not invented only for military, police and special-purpose computers. Security should be a principle, not an optional feature. System administrators should take preventive measures seriously and help editorial boards, librarians and other organizations and persons while they prepare their projects for open access publishing and repositories. We cannot control who will at some point try to intercept our interaction with a web application and damage our work and the work of many other scholars, scientists and the general public. CrossRef and some other organizations involved in open access publishing require the implementation of security standards in web applications, especially if payments are made through the web platform. Several editors told me that they were asked to put “s” after “http” and add the green padlock icon there.

green padlock and https

The letter “s” after http means that HTTP (Hypertext Transfer Protocol) is secure. HTTPS is an extension of HTTP. The green padlock is a small icon that the browser shows after checking whether the content and communication on the site are secure. Many companies and international organizations that gather IT security scientists and engineers have dedicated considerable time and resources to establishing standards, technologies, protocols and software tools that make internet communication secure and check its level of security.

Many people use the term SSL for all security measures although, strictly speaking, SSL stands for Secure Sockets Layer. SSL was the standard security technology for establishing an encrypted link between a web server and the user’s browser. A link protected by SSL ensures that all data passed between the web server and the browser remains private and unaltered. SSL was developed up to version 3.0, after which it was succeeded by a newer protocol, TLS, which stands for Transport Layer Security. The Internet Engineering Task Force published a statement requesting that SSL no longer be used (RFC 7568 deprecated SSL 3.0 in 2015). Servers currently use TLS version 1.2 or 1.3. Version 1.3 of TLS was published in August 2018; the older versions 1.0 and 1.1 were formally deprecated in 2021, while version 1.2 remains in use.

The basics of functioning of the protocol can be learned from this video:

Owners of sites should purchase or obtain a free TLS certificate from a CA (Certificate Authority). Some states have established their own certificate authorities. Many companies are involved in the sale, installation and maintenance/support of certificates. Depending on the required security level you should decide which certificate to install on your server. For example, if you maintain a repository of primary data in the humanities that might contain private information on people involved in medical, psychological or social research, you should consider a stronger certificate, which is more expensive, along with additional measures and policies that keep some of the data private. If you handle many payment transactions you should also consider purchasing a certificate aimed at a higher security level. If you have no payments and no special privacy concerns you can consider a basic certificate, or a certificate from an open certificate authority such as Let’s Encrypt.

Installation of a certificate is sometimes a process in which the CA or its reseller guides you, and during that process they may perform security and identity checks. The process is usually short, but for some more expensive certificates it may take some time depending on the checking procedures. You may be asked to check your mail a couple of times and click on confirmation links. Some hosting companies and CAs such as Let’s Encrypt have prepared online instructions that are easy to follow. However, editors who are not familiar with IT technologies and standards should ask their hosting company or system administrator to install the certificate properly. After the certificate is installed, a green padlock should appear on the left side of the address bar in the browser. If there is no green padlock, or you see a padlock with an exclamation mark, please check the Why No Padlock site and use their testing tool to fix potential issues. The output of the test may look like the image below:

checking ssl certificate

Some people use SSL Tools or other online certificate checkers to check the validity of a certificate.

Some CAs offer additional checks of installed certificates, since there are known attacks. After I installed a relatively expensive certificate for a site that handles online payments, I noticed that their server still had TLS 1.0 active, which is vulnerable to the so-called BEAST attack. Although the latest versions of TLS are not vulnerable to BEAST, it is always good to check whether your hosting company or institutional servers are updated to the latest versions of the security protocols. Additional information on some other attacks is given on one very interesting security-oriented blog.
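
A check like the one that revealed the active TLS 1.0 can be sketched with the openssl command line tool, which tries one connection per protocol version. The helper names tls_flag and check_tls_versions and the host journals.example.org are hypothetical. The sketch assumes that openssl returns a zero exit status only when the handshake succeeds; it also cannot tell a refusing server apart from a local openssl built without support for the old versions, so treat the output as a hint rather than a verdict.

```shell
#!/bin/sh
# Map a TLS version number to the matching "openssl s_client" option.
tls_flag() {
    case "$1" in
        1.0) echo "-tls1" ;;
        1.1) echo "-tls1_1" ;;
        1.2) echo "-tls1_2" ;;
        1.3) echo "-tls1_3" ;;
        *)   return 1 ;;
    esac
}

# Try one handshake per protocol version and report the result.
# Assumption: a zero exit status from openssl means the handshake succeeded.
check_tls_versions() {
    host="$1"
    port="${2:-443}"
    for version in 1.0 1.1 1.2 1.3; do
        flag="$(tls_flag "$version")"
        if openssl s_client "$flag" -connect "$host:$port" -servername "$host" \
               </dev/null >/dev/null 2>&1; then
            echo "TLS $version: accepted"
        else
            echo "TLS $version: not accepted (or not supported by the local openssl)"
        fi
    done
}

# Usage (needs network access):
#   check_tls_versions journals.example.org
```

If the report shows that TLS 1.0 or 1.1 is still accepted, that is a good reason to ask the hosting company to switch those versions off.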

Note: Despite the popularity of mobile phones, I would not use a mobile phone to work on very important data or to administer a web application or server holding all kinds of important information.


Intuition is better than manual?

In more than 25 years of working with software I have passed through various stages of using it. I struggled with bad books and nearly empty help files. I stumbled over parts of the interface, guessing at the logic behind them and wondering why the workflow for some operations and procedures was not obvious. I could not do even simple things, since books were scarce and expensive.

Translators of manuals and technical books often invented new terminology and sometimes used awkward sentences in order to translate quite technical text into a language that did not have those words in its vocabulary. Some linguists were horrified by this intrusion of foreign languages and often coined their own terms, based on a lack of knowledge about the technology concerned and a misinterpretation of concepts and meaning.

The writing style was often very modest, to say the least. There were no typographic differences between narrative text and command lines, and illustrations were too general and did not represent the process described in the paragraph. Instructions were written in a way that only those who already knew the software could understand. Those who did not know how to use it were mostly lost. Procedures were not described concisely, some steps were missing and, above all, some examples did not work.

On the other hand, we have to be honest and admit that at the time, although more than 20 years had passed since operating systems and software development came into wider use, software technology was only beginning to reach the wider user population. The code for the Apollo missions was written by hand. The image below, taken from Wikipedia, shows Margaret Hamilton next to a stack of the code she and her team wrote for the Apollo mission computers.

software developer with books with code

The development of software and its maintenance was a hard road with a number of difficulties.

I strongly recommend that those interested in the history of software development read the article “No Silver Bullet – Essence and Accident in Software Engineering”, published by Fred Brooks in 1986. A very interesting Wikipedia article states: “Brooks distinguishes between two different types of complexity: accidental complexity and essential complexity. Accidental complexity relates to problems which engineers create and can fix; for example, the details of writing and optimizing assembly code or the delays caused by batch processing. Essential complexity is caused by the problem to be solved, and nothing can remove it; if users want a program to do 30 different things, then those 30 things are essential and the program must do those 30 different things.” Many scientists, software developers and businessmen have noted that software grows in size and complexity faster than methods for handling that complexity are invented. Some software companies adopted the marketing slogan that their software is “intuitive and user friendly”, which proved to be just a marketing slogan, far from the truth.
All of that sometimes created great confusion. I felt like a botanist looking for a magic flower in the middle of the jungle. I felt stuck in the middle of a rainforest with gigantic trees and huge bushes, my skin crisscrossed with scratches and covered with blood stains. It felt like trying to fit a square peg into a round hole. The enormous insects around me and the snarling of hungry beasts were frightening. How did I get here? How do I get out of this?

In addition, increasingly complex software was often not well developed and crashed frequently, which caused even more confusion. It was sometimes fun, but sometimes I had embarrassing experiences. Hours wasted, data lost, a broken hard disk. I asked myself: What have I done wrong? Is all of this really so sensitive that pressing the wrong key on the keyboard can toast my computer? Who will use this stuff if it is so easy to ruin everything I have done?

At the same time, businesses followed the old devastating principle that everything should become a commodity, and they invested less in the development of supporting documentation. Instead they established support services that are neither cheap nor efficient. The free software movement gave everyone free access to code protected by the GNU GPL license, which grants everyone the right to develop, modify, distribute and document the software. That was a promising framework and a social, legal and technological basis for more responsible development and use of software. At the same time, users of many software packages experienced a lack of support, partly documented manuals, a variety of undocumented features, and sometimes unhelpful, arrogant and cynical support guys.

Even when manuals were written properly, they were often very large and nicely illustrated books. Too few people had enough time to read them. I strongly believe that those who did have time to study such manuals benefited a lot.

But the use of software manuals does not depend only on the existence of well illustrated and well written books. Since I have been involved in the open access movement for more than 10 years, I strongly believe(d) that people from academia would use software manuals in a more systematic and principled way. But my experience is very different.

There may be different reasons for that:

  • not enough time for reading detailed manuals
  • scientists and various scholars develop their own software for some purpose, so they know it
  • software is developed for a special purpose and will not be used by a wider audience
  • manuals and support are expensive or nonexistent
  • scholars and scientists are tired of reading and of the exhausting, tedious work in academia, so any additional obligation to read is avoided
  • some scholars and scientists display various psychological traits when confronted with new scientific areas, including software
  • some scholars and scientists have (un)diagnosed dyslexia, dyscalculia or other neurological conditions on a (sub)clinical level, which prevents them from reading manuals in detail, following procedures, and remembering the relationships and hierarchies involved in the proper use of software features
  • some scholars and scientists are, for various reasons, print disabled and need assistive technologies to help them learn to use software efficiently

I am sure that there are other reasons too. But those who write software and support scientists, especially in the field of open access, should observe the following principles:

  • manuals should be written for and tailored to the needs of a particular institution/library/journal
  • manuals should present procedures with examples from real use of the software in a given context, according to the version of the software that is in production use by the users
  • manuals should describe each step in a procedure or workflow and present a screenshot of what the user sees on the screen
  • manuals should be written in an accessible format
  • manuals should be enriched with infographics and other illustrations intended to present information in a brief and clear way. They can improve the learning process by using graphics to enhance the user’s ability to understand structures, workflows and procedures
  • manuals should offer links to screencasts that show users how some tasks can be accomplished
  • software features invented to improve the workflow of users should be documented with examples of real scenarios and contexts for a given group of users (e.g. journal, institute, faculty, library)
  • manuals, infographics and screencasts should be licensed under an acceptable Creative Commons license or the GNU Free Documentation License

Well, one can say that this is not easy to do. But software and manuals are a significant part of the social interaction of various types of users (in the context of open access these may be readers, authors, reviewers, editors, librarians, lawyers, students, businessmen, policy makers). They are not a commodity in a box. Manuals and software are part of an ongoing incremental build model, because the standards and services associated with academic publishing develop very dynamically. Contemporary software platforms require developers and users to keep pace with the development of the internet and academic publishing and with various technological, cultural, social, legal and scientific challenges and developments. Consequently, when an editorial board or institution plans to deploy a software platform for academic and scientific publishing, it needs to plan for continuous development, testing, support, and the development and distribution of documentation.

Malware Intrusion

We know that there is no ideally secure server. I have witnessed many times that hosting companies and their employees suffer from a lack of resources, equipment and skilled people who should take care of server security. One of them tried to convince me that the permissions for folders in public_html should be 777. (If you are new to web applications and to setting up a system for open access publishing, please look up information about permissions on your server. Most hosting companies with shared hosting accounts set folder permissions to 755 and file permissions to 644 by default.) People who want to compromise your server usually inject code designed to exploit vulnerabilities and use your server for some, usually illegal, operations, as in the image below. When you are choosing an application, a hosting company and a person to administer the server, security should be the top priority.

[Image: example of intrusion code]

There are various methods of doing that. The example presented here was part of a larger file found on a server used to publish scientific journals. Sometimes servers are safe but the applications installed on them are very vulnerable. Strong competition and financial pressure push developers to release a product as soon as they can, without proper testing. Several times I have come across software written for very obsolete and insecure versions of PHP, which poses additional risks to the security of a site. On the other hand, untested custom code added to a system can also make it insecure.

Such incidents can endanger your reputation and the trust of the authors, readers, reviewers and librarians who would like to visit your site often. Above all, some drivers, firmware and operating systems are vulnerable, and as the user of a single account you cannot do anything to prevent that. Fixing the vulnerable software is the job of the hosting company and of the manufacturers of the affected hardware. Nevertheless, this should not discourage you from publishing open access. Constructive and proactive caution is always necessary and welcome.

 

Once, I received a call from an association that publishes a scientific journal. They informed me that some strange code had appeared on their site. I used various malware testing tools, and my result was like the one in the image below. I soon found that the server was infected with the so-called db.php infection. Once the malware has been successfully uploaded to a server, it receives GET requests and infects every JavaScript file (.js) with JavaScript malware code.

[Image: encoded intrusion]

I decoded the strings displayed on the page and found the IP address of an infected server that was used to distribute the malware and to redirect users to other sites. Since such code was all over the site, the pages were very hard to read and visitors were prevented from using the open access content.
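One way to see how far such an infection has spread is to compare the .js files on the site with a clean copy, for example a freshly unpacked release of the application or a backup you trust. The sketch below does that with SHA-256 checksums. It is an illustration, not the tool I used in this case; the names sha256_of and changed_js_files are made up for this example, and it assumes you have the clean copy in a separate folder.

```python
import hashlib
import os


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_js_files(site_root, clean_root):
    """List .js files under site_root that are new or differ from clean_root."""
    suspicious = []
    for dirpath, _dirnames, filenames in os.walk(site_root):
        for name in filenames:
            if not name.endswith(".js"):
                continue
            live = os.path.join(dirpath, name)
            relative = os.path.relpath(live, site_root)
            clean = os.path.join(clean_root, relative)
            if not os.path.isfile(clean):
                suspicious.append((relative, "not in clean copy"))
            elif sha256_of(live) != sha256_of(clean):
                suspicious.append((relative, "content differs"))
    return sorted(suspicious)
```

A file that differs is not necessarily infected, since it may have been customized on purpose, but the list tells you where to look first.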

I reported my findings to the editorial board of the journal, and we informed the hosting company and the registrar of the domains used to spread the malware, asking them to check the issue and take the necessary measures to stop the abuse of our site and of other sites possibly infected by that malware.

The process was rather tense, stressful and painful for the editorial board and everyone concerned. The hosting company that hosted the server with the domain used for spreading the malware informed us that they would take care of the case.

We used other tools to block the IP addresses detected as attackers. That day we had more than 290 attacks from computers in Panama and more than 150 attacks from computers in Ukraine. We restored our site from fresh backups and reinstalled the web applications we use. Our hosting company upgraded the PHP version, which was obsolete, unsupported and insecure at the time.
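To get a first impression of which addresses send the most traffic to a site, you can count requests per client address in the access log of the web server. The sketch below assumes the widespread log layout in which the client address is the first field on each line; please check that this is true for your server. The function name top_clients is made up for this example. It only counts requests, so deciding which clients are attackers still needs human judgment or a dedicated tool.

```python
from collections import Counter


def top_clients(log_lines, limit=10):
    """Count requests per client address (first field of each log line)."""
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return counts.most_common(limit)
```

You can pass it an open log file directly, because a file opened for reading yields its lines one by one.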

 

GDPR Compliance

I have seen several times that companies purchased databases of e-mail addresses and other information about people who might be potential customers for their products. Those companies used that information to send out their mail campaigns. Sometimes they receive information about clients who are retired, young or athletes, or sorted according to various other criteria. Many people asked themselves: how do they know that I retired recently, or that my kids have just enrolled in secondary school? Many people felt embarrassed and confused when they realized that their privacy was not protected and that their private information was distributed to third parties without their consent.

Search engines often allowed anyone to easily find information about people registered in any online system.

Journal editors who enter archives of previous issues, articles and information about authors into web applications such as OJS are sometimes faced with the repetitive work of entering information about the same authors. Some of them asked developers to build a plugin that would give them a drop-down list of users, so that they could easily select a user and insert it into the list of authors of a scientific article. They had no intention of making that list public or of using the feature anywhere except in the administration panel of their web application. But their benevolent intention can in some contexts produce unpleasant consequences for some authors. Thus, privacy needs to be protected by design, not just by the possibly honest intentions of people who use data about other people.

Numerous complaints in previous years motivated legislators in the EU to pass a very strict rule to protect data about people. The EU adopted the General Data Protection Regulation.

“The EU General Data Protection Regulation (GDPR) replaces the Data Protection Directive 95/46/EC and was designed to harmonize data privacy laws across Europe, to protect and empower all EU citizens data privacy and to reshape the way organizations across the region approach data privacy.” It will have a very strong impact on entities within the EU and on those that store and use information about citizens of EU member states. The GDPR was approved by the EU Parliament on April 14, 2016 and its enforcement date is May 25, 2018. Organizations in non-compliance can face heavy fines. It is important to read the part on extra-territorial applicability, which reads:

“Arguably the biggest change to the regulatory landscape of data privacy comes with the extended jurisdiction of the GDPR, as it applies to all companies processing the personal data of data subjects residing in the Union, regardless of the company’s location. Previously, territorial applicability of the directive was ambiguous and referred to data process ‘in context of an establishment’. This topic has arisen in a number of high profile court cases. GPDR makes its applicability very clear – it will apply to the processing of personal data by controllers and processors in the EU, regardless of whether the processing takes place in the EU or not. The GDPR will also apply to the processing of personal data of data subjects in the EU by a controller or processor not established in the EU, where the activities relate to: offering goods or services to EU citizens (irrespective of whether payment is required) and the monitoring of behaviour that takes place within the EU. Non-Eu businesses processing the data of EU citizens will also have to appoint a representative in the EU. ”

There is still time to prepare. To do so properly, please read the adopted text of the EU General Data Protection Regulation (GDPR).

 

Manuals for Open Journal Systems

I have found that many editorial boards struggle with a lack of concise instruction materials and a lack of people who can train them with a hands-on approach. They usually find some solutions in online forums, but it is time-consuming for editorial boards to search for partial information. Sometimes the people who write manuals do not explain each step. Several people contacted me and asked: “What do I have to do now? Something is missing.”

System administrators and software developers assume that what is easy for them should be easy for everyone. They plan the training to be done in one evening because “It is easy.” In my experience, such practice often leads to misconfiguration of the application, underuse of its features, and mistakes in performing workflow tasks and procedures. Working with applications such as Open Journal Systems is not hard, but it is complex, and it takes some time until a user is familiar with its functionality and with the simple procedures for configuration and efficient use.

I wrote manuals for authors, editorial boards and reviewers of scientific journals according to their needs.

Here you can find the manuals for authors, editorial boards and reviewers.

I will soon publish here a manual that puts together some basic administrative and editorial functions, aimed at the successful configuration of the Open Journal Systems application for your journal.

 

Spellcheck-Scientific Texts

Open Journal Systems users use the embedded TinyMCE editor to enter various information while publishing their articles in journals. Some users may, for various reasons, need a spellchecker to make sure that the texts they enter are spelled correctly. The TinyMCE developers developed a very simple spellchecker plugin that is not configured by default in Open Journal Systems. There is a free version and a Spell Checker Pro version. The TinyMCE developers state on the web page for Spell Checker Pro: “A TinyMCE Enterprise subscription includes the ability to download and install a spell check as-you-type feature for the editor.”

If you are an editor, editorial board member or administrator of your journal's site, the various payment schemes for spellchecking might not be affordable.

For general-purpose texts you can use the spellcheck capability of your browser.

But if you need a free, easy-to-install TinyMCE spellchecker capable of checking medical, scientific and legal terms, you can consider the Nanospell spellchecker. The developers of Nanospell state on their site that it “is perfect for secure applications and websites where user experience counts”:

  • Guaranteed dictionary availability across all browsers, including medical, scientific and legal words.
  • Never sends your secure data to any remote servers: everything is done locally.
  • Works in older browsers which do not have spellchecking capabilities of their own.
  • No popup Ads

I helped some members of editorial boards to install it and configure it properly in their Open Journal Systems installations. It is a very easy and straightforward process that should not take more than 5 minutes, after which its icon should appear in the toolbar of the TinyMCE editor in your Open Journal Systems installation. It is also easy to install dictionaries that meet your editorial and authoring needs. However, please be aware of their licensing policy.

 

 

A number of virtual machines on one server

I was recently invited by high-level officials of an institution to help them publish several journals online. I recommended Open Journal Systems, which they gladly accepted.

They showed me their server, and one of the officials said: “You know, we do not have anyone who knows Linux. We heard that you know it. We have several installations for different web platforms on our server, but we have no idea how to fix several small issues and how to make everything work smoothly.” I looked at each of those GNU/Linux installations and realized that many of them had been installed as desktop machines with some additional applications such as a web server, PHP, MySQL etc. Many of them were also missing several dependencies, so some modules in the web applications did not work. The people who had installed them were elsewhere, and the institution had no documentation on the settings, active services, software package versions or other important information about the virtual machines and the operating systems and web applications installed on them.

I suggested that it was necessary to standardize those installations: to choose a GNU/Linux distribution that is efficient and easy to administer, to migrate the web applications to newly installed virtual machines, and to create documentation with precise information on the operating system version, the versions of important applications and the infrastructure requirements. Those specifications help an administrator manage backups, upgrades, maintenance and testing, instead of guessing where each application is and which other applications it is or is not compatible with. They were scared, since they were not sure how to do that or how many work hours it would need. Well, it will save many work hours otherwise spent rescuing damaged or corrupted data, fixing misbehaving applications, or dealing with the consequences of virtual machines compromised because software packages were not upgraded when needed.
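Part of such documentation can be collected by a small script run on each virtual machine. The following is a minimal sketch in Python under a few assumptions: the names tool_version and build_inventory are made up for this example, and you decide yourself which commands to ask for their version. It records the host name, the kernel release and the first line that each command prints.

```python
import platform
import shutil
import subprocess


def tool_version(command):
    """Return the first line a command prints, or None if it is not installed."""
    if shutil.which(command[0]) is None:
        return None
    result = subprocess.run(command, capture_output=True, text=True, timeout=10)
    # Some programs print their version to stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else ""


def build_inventory(commands):
    """Collect basic facts about this machine and the given tools."""
    return {
        "hostname": platform.node(),
        "system": platform.system(),
        "kernel": platform.release(),
        "tools": {name: tool_version(cmd) for name, cmd in commands.items()},
    }
```

For example, build_inventory({"php": ["php", "--version"], "mysql": ["mysql", "--version"]}) returns a dictionary that you can save with json.dumps and keep next to your other notes about the machine. A tool that is not installed is recorded as None, which is useful information in itself.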

Thanks to the experience and knowledge of free software developers and of users of GNU/Linux and many other free software applications, it is possible to plan, design and implement the whole infrastructure and its web applications in a way that assures users and administrators that everything will work smoothly.

It is necessary to take care of:

  • scalability
  • security
  • ease of use
  • ease of quality administration
  • price
  • maintenance
  • documentation
  • backups

We could add more criteria and discuss all of them, but that is beyond the scope of this post. I want to stress that free software and open access are not just a sandbox for benevolence and good will. They are, rather, a very serious activity that requires a lot of work to make sure that the users of the information and knowledge we publish with web applications designed for open access publishing have a positive experience that helps them learn.

 

 

Backup-Simple to do?

I prepared to run a training for the members of the editorial boards of two journals issued by an institute of economic sciences.

I came to the premises where I was supposed to run the training, but I noticed that the people who entered data into the online publishing system were shocked and confused. “Everything is gone!”, said one of them. “What is gone?”, I asked. “The data we entered in the last two months!”, she replied sadly. I started to examine what had happened and discovered that a bad backup had been restored over our virtual machine hosted on their server. “Hmm, did anyone restore a wrong backup?”, I asked. “I do not know how to do that, and our administrator is away. He left a couple of weeks ago.”, she said hopelessly. Another person was invited to examine what had happened. By checking the logs and possible scenarios, we found that the former system administrator had assumed that a so-called bare-metal backup was sufficient. In addition, he had not checked whether the backup was complete or what the software vendor recommends for backing up guests on Hyper-V. Too many mistakes resulted in the complete loss of the data that two people had entered over a period of two months.

So, when you plan the hosting and backup of your data, please check the documentation carefully, test your backup methods, and only after checking various possible disaster scenarios implement them on the production (virtual) machine with the open access application and samples of the data entered.
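Testing a backup means restoring it somewhere harmless and checking that everything is there. The sketch below shows the idea on a small scale in Python: it packs a folder into a tar.gz archive, restores the archive into a temporary folder and reports files that are missing or different. It is an illustration only and not a replacement for the backup procedure your virtualization vendor recommends; the names make_backup and verify_backup are made up for this example, and the archive should be stored outside the folder being backed up.

```python
import filecmp
import os
import tarfile
import tempfile


def make_backup(source_dir, archive_path):
    """Pack the contents of source_dir into a gzip-compressed tar archive."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for name in sorted(os.listdir(source_dir)):
            tar.add(os.path.join(source_dir, name), arcname=name)


def verify_backup(source_dir, archive_path):
    """Restore the archive to a scratch folder and list missing or changed files."""
    problems = []
    # Newer Python versions offer a safer extraction filter; use it if present.
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tempfile.TemporaryDirectory() as scratch:
        with tarfile.open(archive_path, "r:gz") as tar:
            tar.extractall(scratch, **extra)
        for dirpath, _dirnames, filenames in os.walk(source_dir):
            for name in filenames:
                original = os.path.join(dirpath, name)
                relative = os.path.relpath(original, source_dir)
                restored = os.path.join(scratch, relative)
                if not os.path.isfile(restored):
                    problems.append((relative, "missing"))
                elif not filecmp.cmp(original, restored, shallow=False):
                    problems.append((relative, "differs"))
    return sorted(problems)
```

An empty list right after make_backup means the archive really contains what you think it contains. If you run verify_backup later, files entered since the backup show up as missing, which also tells you how much work you would lose if you had to restore that backup today.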