Power prior distributions are elicited from available historical data by raising the likelihood function of the historical data to a fractional power δ ∈ [0, 1], which quantifies how heavily the historical data are discounted when making inference with the current data. However, when δ is not prespecified but is instead estimated from the data using a Bayesian approach under the original form of the joint power prior, an arbitrary positive constant multiplying the likelihood of the historical data can change the inferential results, which violates the likelihood principle. This article provides a comprehensive study of a modified form, known as the normalized power prior, that obeys the likelihood principle. The optimality properties of the normalized power prior, in the sense of maximizing Shannon's mutual information, are established. By examining the posterior for several commonly used distributions, we show that the discrepancy between historical and current data can be better quantified under the normalized power prior. Efficient algorithms to compute the extra scaling factor in the normalized power prior are also proposed. Finally, we illustrate its use with three data examples and provide an implementation in the R package NPP.
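
The dependence on an arbitrary constant can be seen in a small numerical sketch. The following Python snippet is illustrative only and is not the NPP package's interface. It assumes a Bernoulli model, a Beta(a, b) initial prior on θ, a uniform prior on δ, and a grid approximation over δ; the counts are hypothetical. The function names (`log_beta`, `delta_posterior`, `posterior_mean`) are introduced here for illustration. The constant c stands for any positive multiplier of the historical likelihood, for example the binomial coefficient that appears when the data are recorded as a count rather than as individual trials.

```python
import math


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def delta_posterior(y0, n0, y, n, log_c=0.0, normalized=False,
                    a=1.0, b=1.0, grid_size=201):
    """Marginal posterior of delta on an even grid over [0, 1].

    y0 successes in n0 historical trials, y successes in n current trials.
    log_c is the log of an arbitrary constant multiplying the historical
    likelihood. A uniform prior on delta is assumed.
    """
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    log_w = []
    for d in grid:
        # theta integrated out of (c * L0)^d * L * Beta(a, b) prior
        lw = d * log_c + log_beta(a + d * y0 + y, b + d * (n0 - y0) + n - y)
        if normalized:
            # divide by the scaling factor: integral of (c * L0)^d * prior
            lw -= d * log_c + log_beta(a + d * y0, b + d * (n0 - y0))
        log_w.append(lw)
    m = max(log_w)
    w = [math.exp(v - m) for v in log_w]
    total = sum(w)
    return grid, [v / total for v in w]


def posterior_mean(grid, probs):
    """Mean of a discrete distribution on the grid."""
    return sum(g * p for g, p in zip(grid, probs))


# Hypothetical data: 30/50 historical successes, 12/40 current successes.
y0, n0, y, n = 30, 50, 12, 40
# Log of the binomial coefficient C(n0, y0), one possible choice of c.
log_c = math.lgamma(n0 + 1) - math.lgamma(y0 + 1) - math.lgamma(n0 - y0 + 1)

for normalized in (False, True):
    for lc in (0.0, log_c):
        grid, probs = delta_posterior(y0, n0, y, n, log_c=lc,
                                      normalized=normalized)
        print(normalized, round(lc, 2), posterior_mean(grid, probs))
```

Under these assumptions, the posterior mean of δ from the joint power prior shifts when the binomial coefficient is included, whereas under the normalized form the factor c^δ cancels against the scaling factor and the two results agree up to floating-point rounding.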